Redmond, Washington-based Microsoft has begun testing the next generation of its Web crawler, an application designed to better discover and index online information for inclusion in the software maker's search engine, Live Search, the company announced Thursday.

Microsoft considered the new test version of the Live Search crawler significant enough to warrant a new name for its identifying user agent, the string that lets webmasters know which search engine is indexing their Web sites. Live Search's new user agent is called "msnbot/2.0b" and, during a test period expected to last several weeks, it will run simultaneously with the existing "msnbot/1.1" crawler, Microsoft said.

MSNBot/2.0b To Replace Old Crawler User Agent Identification

Microsoft did not release details of what has changed in the new Web crawler application, which indexes millions of Web sites for inclusion in its search engine.

"In the coming weeks, we will be testing an update to MSNBot, which may show up as a new crawler name in your referrer logs," noted Microsoft Live Search program manager Jeremiah Andrick in a message announcing the test, posted Thursday on the company's Webmaster Center blog.

The full user agent string webmasters will see in their log files when the new Web crawler visits their sites is:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)

While both versions of the crawler will operate concurrently, the existing 1.1 version will remain the primary user agent until Microsoft makes the new version the default crawler, a move expected in early 2009.

Microsoft introduced the 1.1 version of the crawler in February, with features aimed at making indexing more efficient by using compression and other techniques to reduce the amount of work required of Web servers.
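For webmasters who want to see which version of the crawler is visiting during the test, the user agent strings quoted above can be picked out of a server log. The following is a minimal sketch, not an official Microsoft tool. The function names are hypothetical, the sample log lines are made up, and the code assumes the user agent is the last double-quoted field on each line, as in the common "combined" access log layout.

```python
import re
from collections import Counter

# Matches the crawler token and version in strings such as
# "msnbot/2.0b (+http://search.msn.com/msnbot.htm)".
MSNBOT_PATTERN = re.compile(r"\bmsnbot/(\d+\.\d+[a-z]?)", re.IGNORECASE)


def msnbot_version(user_agent):
    """Return the MSNBot version found in a user agent string, or None."""
    match = MSNBOT_PATTERN.search(user_agent)
    return match.group(1) if match else None


def count_msnbot_hits(log_lines):
    """Count requests per MSNBot version.

    Assumption: the user agent is the last double-quoted field on
    each log line. Lines without quoted fields are skipped.
    """
    counts = Counter()
    for line in log_lines:
        quoted_fields = re.findall(r'"([^"]*)"', line)
        if not quoted_fields:
            continue
        version = msnbot_version(quoted_fields[-1])
        if version is not None:
            counts[version] += 1
    return counts


# Made-up log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2000:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.11 - - [01/Jan/2000:10:00:05 +0000] "GET /about.html HTTP/1.1" '
    '200 734 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.12 - - [01/Jan/2000:10:00:09 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for version, hits in sorted(count_msnbot_hits(SAMPLE_LOG).items()):
        print(f"msnbot/{version}: {hits} request(s)")
```

A user agent string can be set to anything by the client, so a count like this shows what visitors claim to be rather than proving the traffic came from Microsoft.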
The changes in version 1.1 included HTTP compression, which uses standard Web server utilities to shrink the files the crawler downloads while looking for changes to add to the Live Search results on Microsoft's servers. The 1.1 version also added support for "Conditional Get" functionality, a method that saves bandwidth and, in turn, server processor cycles by downloading and re-indexing only the parts of a Web site that have changed since the last visit by Microsoft's Live Search crawler.

Microsoft Live Search Tests Next Generation Web Crawler

With the new crawler test announced Thursday, webmasters will not need to make any changes to their robots.txt files, Microsoft said.

"We intend to ensure that any robots exclusion protocol you are using is respected. As such, you don’t need to update your Robots.txt file," Andrick wrote in the announcement.

Robots.txt files, which have been in use for some 14 years, are text files that most search engines look for and use to control how, or whether, a Web site is indexed for inclusion in search engine results pages, or SERPs, a term coined by Brett Tabke, CEO of the WebmasterWorld Inc. online discussion forums.

During the test, Microsoft will operate the new 2.0 MSNBot at a reduced speed to lower the processor load on Web servers already handling crawler traffic from the existing 1.1 version of Microsoft's spider.

"We plan on crawling at a slow speed during the tests with the updated version," noted Andrick, who added that Microsoft would make an announcement when it replaces the old Web crawler.
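One way to see why an existing robots.txt rule can cover both crawler versions is that rules usually name a crawler by its product token ("msnbot") rather than by a version number. The sketch below uses Python's standard urllib.robotparser module to check both user agent strings against the same rule. The robots.txt content, the example.com URLs, and the is_allowed helper are hypothetical, and the result reflects how Python's parser matches user agents, not necessarily how Microsoft's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only. The rule
# names the crawler by product token, with no version number.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /private/

User-agent: *
Disallow:
"""

# User agent strings for the existing and the new crawler.
OLD_AGENT = "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
NEW_AGENT = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in (OLD_AGENT, NEW_AGENT):
        for url in ("http://example.com/index.html",
                    "http://example.com/private/report.html"):
            verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
            print(f"{agent.split(' ')[0]} -> {url}: {verdict}")
```

In this sketch the parser compares only the part of the user agent before the slash, so a rule written for "msnbot" applies to the 1.1 and 2.0b strings alike.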