
June 03, 2008

Sitemaps and the Dynamic Web

Google championed something called Sitemap files so that webmasters could help Google do a better job of crawling their websites. Sitemaps are essential for dynamic websites that generate their pages on the fly and want to be found through search engines. Although basic Sitemap files are now standardized, and there's even a sitemaps.org website, Google has also managed to make Sitemaps into a competitive advantage.

Here's the description of Sitemaps from sitemaps.org:

"Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

"Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site."

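The format the quote describes is simple enough to generate yourself. Here is a minimal sketch in Python (the post itself has no code) using only the standard library's ElementTree. The example.com URLs and the metadata values are made up for illustration; a real Sitemap generator would pull them from your site's content.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return Sitemap XML text for a list of dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are the optional per-URL metadata the protocol describes.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit only the fields present, in the order the protocol lists them.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://www.example.com/", "lastmod": "2008-06-03",
         "changefreq": "weekly", "priority": 0.8},
        {"loc": "http://www.example.com/hr-policy.html"},
    ]))
```

ElementTree also takes care of escaping characters such as `&` inside URLs, which the protocol requires.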
In the early days of the web, most pages were static, sitting out on a server somewhere waiting to be retrieved. Web crawlers (which were NOT invented by Google) followed the links from page to page until they had found a large percentage of the pages on the web, which they then indexed. The crawler part of Google's search engine works that way as well. In those days you presumably had some way for users to navigate from page to page of your website, so if Google found one page through an external link, it could follow the links between pages to find the rest. For example, suppose the crawler followed a link from an external site to a page on your HR policy: that page probably links to your home page; your home page probably links to various sections; the sections link among themselves; and so all your pages get crawled.
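The link-following described above is essentially a breadth-first search. Here is a minimal sketch, assuming a made-up site whose pages and links live in a Python dict instead of being fetched over HTTP and parsed, as a real crawler would do.

```python
from collections import deque


def crawl(site, start):
    """Breadth-first crawl of a link graph, returning every page reached.

    `site` maps each page URL to the list of URLs it links to.
    """
    found = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in site.get(page, []):
            if link not in found:  # visit each page only once
                found.add(link)
                queue.append(link)
    return found


# Hypothetical static site shaped like the HR-policy example.
site = {
    "/hr-policy": ["/"],
    "/": ["/products", "/about", "/hr-policy"],
    "/products": ["/products/widgets", "/"],
    "/about": ["/"],
    "/products/widgets": ["/products"],
}

if __name__ == "__main__":
    print(sorted(crawl(site, "/hr-policy")))
```

Starting from the HR policy page, the crawl reaches all five pages. A page that nothing links to would never turn up, no matter where the crawl starts.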

That was back in the days of the static web.

Now many of your pages may be created on the fly in response to specific requests from people browsing your site. Perhaps the page for a specific product doesn't exist until someone types in a part number or part name. At that point your server fetches the product from the database, constructs a webpage on the fly, and feeds it back to your customer's browser. Since there is no link to that specific product page from anywhere else, web crawlers aren't going to discover it and, most important to you, Ms. Merchant, your site isn't going to show up when someone Googles that product. Bad!
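To make "created on the fly" concrete, here is a minimal sketch of the server side. It assumes the search box sends its text as a `q` query parameter (as in Mildred's example in the next paragraph) and uses a hypothetical in-memory `CATALOG` in place of a real database. Until a request arrives with the right `q`, the product page exists nowhere, so there is nothing for link-following to find.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Hypothetical product "database"; a real site would query a DB server.
CATALOG = {
    "XY768-0": {"name": "white widgets", "price": "4.95"},
}


def find_product(query):
    """Match a part number first, then an exact product name."""
    if query in CATALOG:
        return query, CATALOG[query]
    for part, info in CATALOG.items():
        if info["name"] == query:
            return part, info
    return None


def handle_request(url):
    """Build the product page for a search URL on the fly; None if no match."""
    params = parse_qs(urlsplit(url).query)
    query = params.get("q", [""])[0]  # parse_qs turns '+' back into a space
    match = find_product(query)
    if match is None:
        return None
    part, info = match
    name = escape(info["name"])
    return (f"<html><head><title>{name}</title></head>"
            f"<body><h1>{name}</h1>"
            f"<p>Part {escape(part)}: ${escape(info['price'])}</p>"
            "</body></html>")
```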

The solution is to create a Sitemap file that essentially tells the web crawler how to get that dynamic page created. Suppose you have a website named www.mildredsstuffforsale.com with a search box in which customers enter product names or product numbers. If someone types in "white widgets" or XY768-0, chances are that the request is sent to your server as http://www.mildredsstuffforsale.com?q=white+widgets or http://www.mildredsstuffforsale.com?q=XY768-0. You put one of these URLs in your Sitemap file as an instruction to web crawlers to make that request. When they do, your server serves up the page all about this fine product, it gets indexed, and, if you're lucky (have good Google juice), it will come back as a search result for people looking to buy white widgets. You can buy a program that creates these instructions for every product in your catalog and writes them into a Sitemap file.
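A program of the kind mentioned above might do something like this minimal sketch. It assumes, as in Mildred's example, that the search box submits its text in a parameter called `q`; your site's search URLs may look different. It writes the plain-text Sitemap format that the sitemaps.org protocol accepts as an alternative to XML: one URL per line.

```python
from urllib.parse import urlencode

SITE = "http://www.mildredsstuffforsale.com"  # the post's example site


def search_url(site, query):
    """The URL a search for `query` would produce, in the ?q= form.

    Assumes the search box submits its text as a parameter named "q".
    """
    # urlencode percent-encodes the value and turns spaces into '+'.
    return f"{site}?{urlencode({'q': query})}"


def text_sitemap(site, queries):
    """Plain-text Sitemap: one URL per line."""
    return "".join(search_url(site, q) + "\n" for q in queries)


if __name__ == "__main__":
    # In practice `queries` would be every name or part number in the catalog.
    print(text_sitemap(SITE, ["white widgets", "XY768-0"]), end="")
```

Encoding each query with `urlencode` rather than pasting it into the URL by hand matters for product names containing characters like `&` or `?`, which would otherwise change the meaning of the URL.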

You can then follow the standard and put the Sitemap file in a place on your website where web crawlers will find it. You probably want to do that. But that's not all you want to do; you also want to "submit" it to Google and other search engines. Why? Stay tuned; the answer is in the next post, about how Google has, quite properly, made Sitemap files into a competitive advantage and why the rich get richer.
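One place the sitemaps.org protocol tells crawlers to look is robots.txt: a `Sitemap:` line there gives the file's full URL. As a minimal sketch, assuming a hypothetical robots.txt for Mildred's site with the Sitemap at /sitemap.xml, here is how a crawler built on Python's standard `urllib.robotparser` (Python 3.8 or later) would pick it up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for Mildred's site. The "Sitemap:" line is
# independent of the User-agent groups, so it can go anywhere in the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout

Sitemap: http://www.mildredsstuffforsale.com/sitemap.xml
"""


def sitemaps_from_robots(text):
    """Return the Sitemap URLs a crawler would find in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None, not an empty list, when there are none.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```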


