Sunnyvale, California-based Yahoo has updated its automated Web search engine crawler, called Slurp, with a new version that will be released in phases over the next several weeks, the search engine and online media company announced Monday in a bulletin aimed at webmasters and Web site publishers.

The new Yahoo Slurp 3.0 crawler, a computer program Yahoo uses to traverse the Web and "slurp" up content for indexing by the second most popular search engine, will show up under a slightly different name in Web server log files and will rely more heavily on reverse domain name system (DNS) identification, Yahoo said.

Slurping The Web To Index Information For Search

The latest version of the Yahoo Slurp search crawler, sometimes also called a spider or a bot, will continue to honor "robots.txt" files, the text files most search engines look for and use to control how or whether a Web site is indexed for inclusion in search engine results pages, or SERPs, a term coined by Brett Tabke, CEO of the WebmasterWorld Inc. online discussion forums.

"Yahoo Slurp 3.0 recognizes the same user-agent and all robots.txt directives for 'Yahoo Slurp,' though it'll identify itself as Slurp 3.0 in your web logs," noted Yahoo search engineers Sharad Verma and Yoram Arnon in an announcement on Yahoo's search engine blog.

All of Yahoo's indexing servers will begin using the new crawler within the next several weeks, Arnon and Verma noted.

Specific Changes For Webmasters

The new Yahoo Slurp will operate from company servers using a new and smaller range of Internet protocol (IP) addresses; however, it will continue to come from the same "crawl.yahoo.net" domain Yahoo has used since June 2007.
Each of the new IP addresses associated with Yahoo Slurp 3.0 will continue to resolve to this domain, Verma and Arnon noted in the Monday blog announcement. Because the IP addresses are changing, however, the two "strongly recommend" that webmasters switch from IP-based crawler recognition to the reverse DNS method.

Over the next few weeks, Yahoo expects to stop using the old range of IP addresses that served Yahoo Slurp 2.0 and to change its identifying user-agent to "Yahoo! Slurp/3.0," Arnon and Verma noted. They recommended that webmasters still using robots.txt directives that refer to "Slurp/2.0" change them to a simpler and shorter version.

"We recommend specifying the shorter version of: User-agent: Slurp," Yahoo added.

Directives referring to "Yahoo! Slurp" will continue to work, Yahoo said, noting that the robots.txt directive changes will affect only Yahoo's primary Web search crawlers and not its regional versions such as Yahoo Slurp China.

Yahoo Updates Web Search Crawler, Slurp 3.0

Webmasters and Web site publishers who don't use reverse DNS to identify Yahoo's crawlers, and who don't switch to the method within the next several weeks, risk turning away Yahoo crawlers and having new Web site content excluded from Yahoo SERPs.

One member of WebmasterWorld's community of mostly technically savvy webmasters and search engine marketing (SEM) professionals, using the handle "incrediBILL," warned of the consequences facing those who don't make the switch to reverse DNS crawler identification in light of the Yahoo change announced Monday.

"Anyone that validates Slurp by IP address instead of reverse DNS-based identification is about to be in a world of hurt until the new IP addresses are known," the member wrote.

"Many sites will start bouncing Yahoo Slurp that didn't heed the call to use reverse DNS validation for major search engines, so this will be ugly," the member added.
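
For webmasters wondering what reverse DNS identification looks like in practice, the following is a minimal Python sketch, not Yahoo's own code. It assumes the common two-step check: look up the hostname for the visiting IP address, confirm the hostname falls under the "crawl.yahoo.net" domain named in the announcement, then resolve that hostname forward and confirm it maps back to the same IP. The function names and the injectable lookup parameters are hypothetical, chosen for illustration.

```python
import socket

# Domain named in Yahoo's announcement. Treating any host under it as a
# crawler host is an assumption of this sketch.
CRAWLER_DOMAIN = "crawl.yahoo.net"


def reverse_lookup(ip):
    """Return the PTR hostname for an IP address, or None if lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the IPv4 addresses for a hostname (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_yahoo_crawler(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Two-step check: reverse lookup, domain match, forward confirmation."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # Require an exact match or a true subdomain, so a name such as
    # "crawl.yahoo.net.example.org" is rejected.
    if hostname != CRAWLER_DOMAIN and not hostname.endswith("." + CRAWLER_DOMAIN):
        return False
    # The forward lookup guards against a forged PTR record.
    return ip in forward(hostname)
```

A site would call is_yahoo_crawler() with the visitor's IP address taken from its server logs or request data. Because the check depends on the domain rather than on a fixed list of addresses, it keeps working when the crawler's IP range changes.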
The robots.txt system used by Yahoo and other search engines was put together in 1994 by a loose affiliation of interested webmasters including Martijn Koster, who helped author an early version called "A Standard for Robot Exclusion."

Yahoo has not yet released information about any changes in the way Yahoo Slurp 3.0 indexes Web site content; however, some search engine industry analysts have speculated that changes have been made and are apparent in recent Yahoo search results.
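
Webmasters who want to try out the shorter "User-agent: Slurp" directive Yahoo recommends can do so locally. The sketch below uses Python's standard urllib.robotparser module, whose matching rules are its own and may not mirror Yahoo's crawler exactly. The "/private/" path and the example.com URLs are hypothetical, as is the second crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the short directive form Yahoo recommends.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# User-agent string the new crawler reports, per Yahoo's announcement.
SLURP_AGENT = "Yahoo! Slurp/3.0"

# Python's parser treats the short "Slurp" token as matching this agent.
blocked = parser.can_fetch(SLURP_AGENT, "http://example.com/private/page.html")
allowed = parser.can_fetch(SLURP_AGENT, "http://example.com/public/index.html")

# A crawler with an unrelated name is not covered by the Slurp rule.
other = parser.can_fetch("ExampleBot/1.0", "http://example.com/private/page.html")

print(blocked, allowed, other)
```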