Microsoft Live Search has changed the way its Web crawler operates when indexing millions of Web sites for its search engine, adding compression and other techniques intended to reduce the work required of Web servers, the company announced today.

Changes To Lower Bandwidth Requirements

The changes include implementation of HTTP compression, which uses standard Web server utilities to shrink the files the crawler downloads while looking for changes to add to the Live Search results on Microsoft's servers. The common server-side GZIP and Deflate utilities are supported by the new Live Search HTTP compression.

A second significant change is the addition of support for "Conditional Get," a method that saves bandwidth, and in turn server processor cycles, by downloading and re-indexing only the pages of a Web site that have changed since the last visit by Microsoft's Live Search crawler. This is done with HTTP headers: for each page it requests, the crawler sends an "If-Modified-Since" header carrying the time that page was last downloaded by Live Search.

New Live Search User Agent

The updates announced today to Microsoft's Live Search crawler are considered significant enough to warrant a new name for the identifying user agent that lets webmasters know which search engine is indexing their Web sites. Live Search's new user agent is called "msnbot/1.1".

Besides HTTP compression and "Conditional Get" support, "there are many more improvements in performance that should help further optimize our crawling," Fabrice Canel of Microsoft's Live Search Crawling Team wrote in the announcement, posted on the company's webmaster blog.
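As a rough illustration of the compression negotiation described above, the following Python sketch shows a server choosing GZIP or Deflate based on a client's "Accept-Encoding" header, and the client restoring the original page. The function names and the sample header value are assumptions made for this example; they are not taken from Microsoft's crawler, and the header parsing is deliberately simplified (q-values are ignored).

```python
import gzip
import zlib
from typing import Optional, Tuple

# Hypothetical header value a compression-aware crawler might send.
CRAWLER_ACCEPT_ENCODING = "gzip, deflate"


def choose_encoding(accept_encoding: str) -> Optional[str]:
    """Return "gzip" or "deflate" if the client lists it, else None.

    Simplification: parameters such as q-values are stripped and ignored.
    """
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    for candidate in ("gzip", "deflate"):
        if candidate in offered:
            return candidate
    return None


def encode_body(body: bytes, accept_encoding: str) -> Tuple[bytes, Optional[str]]:
    """Server side: compress the body and report the Content-Encoding used."""
    encoding = choose_encoding(accept_encoding)
    if encoding == "gzip":
        return gzip.compress(body), "gzip"
    if encoding == "deflate":
        # HTTP "deflate" is the zlib format, which zlib.compress produces.
        return zlib.compress(body), "deflate"
    return body, None


def decode_body(payload: bytes, content_encoding: Optional[str]) -> bytes:
    """Client side: undo the Content-Encoding reported by the server."""
    if content_encoding == "gzip":
        return gzip.decompress(payload)
    if content_encoding == "deflate":
        return zlib.decompress(payload)
    return payload


if __name__ == "__main__":
    page = b"<p>Example page content</p>" * 200
    payload, encoding = encode_body(page, CRAWLER_ACCEPT_ENCODING)
    print(encoding, len(page), "bytes ->", len(payload), "bytes")
    assert decode_body(payload, encoding) == page
```

Repetitive markup such as HTML tends to compress well, which is where the bandwidth savings come from; the exact ratio depends on the page.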
HTTP Compression Implemented

Microsoft's Live Search crawler has added support for the common World Wide Web Consortium (W3C) definition of HTTP compression, which is described on the group's Web site in Request For Comments (RFC) 2616, in sections 14.11 and 14.39.

Microsoft has added a new utility to its Live Search Webmaster Center that allows webmasters to test how HTTP compression and "Conditional Get" are implemented on their own Web servers.

Microsoft has provided links to three resources for webmasters seeking more information about the type of HTTP compression today's changes implement: "Enabling Compression in IIS 6.0," from its own TechNet system, and two from SitePoint Pty., Ltd., "Compress Web Output Using mod_gzip and Apache" and "Compress Web Output Using mod_deflate and Apache 2.0.x".

Conditional Get Added

The addition of "Conditional Get" support to Microsoft Live Search will also follow the W3C's RFC 2616 implementation, in section 14.25, and Microsoft notes that its crawler will "generally" refrain from downloading any page for indexing unless it has changed since the previous crawler visit.

The new "msnbot/1.1" crawler will include additional information in the requests it makes to Web servers. "Our crawler will include the 'If-Modified-Since' header and time of last download in the GET request and when available, our crawler will include the 'If-None-Match' header and the ETag value in the GET request," Canel wrote in the Microsoft statement. "If the content hasn't changed the web server will respond with a 304 HTTP response," Canel added.

Webmasters, such as those who frequent online forums like WebmasterWorld, can use the Microsoft webmaster tool previously mentioned to test for "If-Modified-Since" HTTP headers. Users of Firefox can opt for a browser add-on called "Live HTTP Headers," written by Daniel Savard and Nikolas Coukouma, to perform similar tests.
Internet Explorer users can use an extension called Fiddler, a Web debugging proxy, to perform the test. Additional information about the Fiddler program is available on the Microsoft Developer Network.

Microsoft Live Search Streamlines Web Crawler Efficiency

Microsoft urges webmasters to make sure their servers are ready to take advantage of the Live Search features added today. "If you have not yet configured conditional get on your site, we would strongly encourage you to do so, as it can significantly help reduce server load as most browsers and crawlers already support this feature," Canel wrote in the Microsoft statement.
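For webmasters who want to see the logic behind a conditional GET, the Python sketch below models the exchange described in the article: the crawler sends "If-Modified-Since" (and "If-None-Match" when it has an ETag), and the server answers 304 if the page is unchanged or 200 otherwise. The function names, dates, and ETag values are hypothetical and chosen only for illustration; real Web servers such as IIS and Apache implement this internally, and the ETag comparison here is a simplified exact match.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Mapping, Optional


def crawler_request_headers(last_download: datetime,
                            etag: Optional[str] = None) -> Dict[str, str]:
    """Build the conditional headers a crawler could send (UTC datetime required)."""
    headers = {"If-Modified-Since": format_datetime(last_download, usegmt=True)}
    if etag:
        headers["If-None-Match"] = etag
    return headers


def response_status(headers: Mapping[str, str],
                    last_modified: datetime,
                    etag: str) -> int:
    """Return 304 if the page is unchanged for this request, else 200."""
    if_none_match = headers.get("If-None-Match")
    if if_none_match is not None:
        # When an ETag condition is present it decides the outcome,
        # and If-Modified-Since is not consulted.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if ("*" in tags or etag in tags) else 200

    if_modified_since = headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304
        except (TypeError, ValueError):
            # Unparseable or incomparable date: fall back to a full response.
            return 200
    return 200


if __name__ == "__main__":
    page_changed = datetime(2008, 1, 1, 12, 0, tzinfo=timezone.utc)
    last_crawl = datetime(2008, 2, 1, tzinfo=timezone.utc)
    request = crawler_request_headers(last_crawl)
    print(request)
    print(response_status(request, page_changed, '"abc"'))
```

A 304 response carries no page body, which is why an unchanged page costs the server far less than a full download.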