The web is still waiting for the worldwide roll-out of Google's next-generation search infrastructure, the mysterious indexing system overhaul known as "Caffeine."
A recent Wired profile of Google's search team indicates that Caffeine has already been deployed. But it seems the technology is still limited to a single data center, and though Google had planned to roll it out to other facilities after the New Year, this has yet to happen.
According to Search Engine Land, a Google spokesperson says that Caffeine will roll out across the company's global network of data centers "over the coming months." Previously, über-Googler Matt Cutts had indicated that Caffeine would be rolled out to multiple data centers "after the holidays," meaning after the first of the year. And we're now two months on from January 1.
In early November, after testing Caffeine in a public sandbox for several weeks, Cutts indicated the platform would soon be rolled out to a single data center for use on the company's live search engine and that the company would follow suit with other data centers in a matter of weeks.
"Caffeine will go live at one data center so that we can continue to collect data and improve the technology, but I don’t expect Caffeine to go live at additional data centers until after the holidays are over," Cutts wrote on November 10. "Most searchers wouldn’t immediately notice any changes with Caffeine, but going slowly not only gives us time to collect feedback and improve, but will also minimize the stress on webmasters during the holidays."
Google did not immediately respond to our requests for comment. But that Google spokesperson tells Search Engine Land that the company expects to "roll [Caffeine] out to all data centers over the coming months." The company operates roughly 36 custom-built data centers across the globe.
"We run lots of tests with this big a change [sic] to our infrastructure,” the spokesperson says. “We want the new system to meet or exceed the abilities of our current system, and it can take time to ensure that everything looks good.”
It should be noted that Cutts never gave an exact date for the roll-out. He said only that it would not happen until after the holidays and, later, "until at least January."
Caffeine continues to run in that single data center. In late November, according to Search Engine Roundtable, Cutts said that the Google IP address 220.127.116.11 was hitting that single Caffeinated data center 50 per cent of the time, and it appears Google search-engine IPs are still mapping to the same data center.
"The data center remains the same,” the Google spokesperson tells Search Engine Land, “but different IP addresses can map to different data centers at different times due to how Google manages its traffic. Because of how Google employs custom load-balancing, there is not a single IP address that will always reach the Caffeine data center.”
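Google hasn't said how its custom load-balancing works, but a toy model shows why a single IP can land on the Caffeine data center only some of the time. This is a sketch under loudly stated assumptions: it assumes each front-end IP forwards requests to a weighted set of data centers, chosen at random per request. The IP labels, data center names, and weights in `ROUTES` are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical routing table: for each front-end IP, the data centers it may
# forward to and their weights. Names and weights are invented for illustration;
# Google's real load-balancing scheme is not public.
ROUTES = {
    "ip-A": {"caffeine-dc": 0.5, "dc-2": 0.5},
    "ip-B": {"dc-2": 0.7, "dc-3": 0.3},
}


def route(ip, rng):
    """Pick a data center for one request arriving at `ip`, by weight."""
    centers = list(ROUTES[ip])
    weights = list(ROUTES[ip].values())
    return rng.choices(centers, weights=weights, k=1)[0]


def share(ip, target, requests=10_000, seed=0):
    """Fraction of `requests` sent to `ip` that end up at `target`."""
    rng = random.Random(seed)
    hits = Counter(route(ip, rng) for _ in range(requests))
    return hits[target] / requests


if __name__ == "__main__":
    # Roughly half of ip-A's traffic reaches the Caffeine data center.
    print(share("ip-A", "caffeine-dc"))
    # ip-B currently never routes there at all.
    print(share("ip-B", "caffeine-dc"))
```

Because the table itself can be rewritten at any moment (say, shifting ip-A's weight entirely to `dc-2`), no IP is guaranteed to keep reaching the same data center, which is the spokesperson's point.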
Cutts first unveiled Caffeine, at least partially, in August with a post to the official Google Webmaster Central blog, calling it a "secret project" to build the "next-generation architecture for Google's web search," before pointing users to a sandbox where they could test it. Speaking with The Reg days later, he called it "a fundamental re-architecting" of Google's search indexing system.
"It's larger than a revamp," he told us. "It's more along the lines of a rewrite. And it's really great. It gives us a lot more flexibility, a lot more power. The ability to index more documents. Indexing speeds - that is, how quickly you can put a document through our indexing system and make it searchable - is much, much better."
This is not a change to Google's search philosophy. It's not a change to its famous search algorithms. It's a change to the way the company builds its index of all known websites and the metadata needed to describe them: the index the algorithms rely on. "The new infrastructure sits 'under the hood' of Google's search engine," read Cutts' original blog post, "which means that most users won't notice a difference in search results."
After interviews with Google's search team, Wired's Steve Levy described Caffeine as something that makes it even easier for engineers to add "signals" - i.e. "contextual clues that help the search engine rank the millions of possible results to any query, ensuring that the most useful ones float to the top."
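The split between the index and the ranking algorithms that sit on top of it can be illustrated with a toy search engine. This is a minimal sketch, not Google's design: it assumes a simple inverted index as the "under the hood" layer and a ranking step that sums weighted signal functions. The class, the two signals, and their weights are hypothetical.

```python
from collections import defaultdict

# Toy model: the index layer (the part Caffeine reportedly rebuilds) is kept
# separate from the ranking layer (the algorithms, which stay the same).


class InvertedIndex:
    """Hypothetical index: maps each term to the set of document IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        # In this toy, a document is searchable as soon as it is added.
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def candidates(self, query):
        """Return the IDs of documents containing every query term."""
        sets = [self.postings.get(t, set()) for t in query.lower().split()]
        return set.intersection(*sets) if sets else set()


# Ranking "signals": each is a function (query, doc text) -> score.
# These two signals and their weights are invented for illustration.
def term_frequency(query, text):
    words = text.lower().split()
    return sum(words.count(t) for t in query.lower().split())


def shortness(query, text):
    return 1.0 / len(text.split())


SIGNALS = [(term_frequency, 1.0), (shortness, 0.5)]


def rank(index, query):
    """Score candidate documents by a weighted sum of signals, best first."""
    scored = [
        (sum(w * s(query, index.docs[d]) for s, w in SIGNALS), d)
        for d in index.candidates(query)
    ]
    return [d for _, d in sorted(scored, reverse=True)]
```

Swapping `InvertedIndex` for a faster implementation with the same `candidates` and `docs` interface leaves `rank` untouched, and adding a signal is one more entry in `SIGNALS`. That is roughly the separation Cutts and Levy describe, though the real system is vastly more complex.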
Cutts confirmed to The Reg that, as we had reported earlier, Caffeine includes an overhaul of the company's distributed Google File System, or GFS. A technology two years in the making, the so-called GFS2 is a significant departure from the original Google File System that debuted almost ten years ago and now drives services across the Google empire.
With GFS, a single master node oversees data that's spread across a series of distributed chunkservers, an architecture that's not exactly suited to apps that require low latency, such as YouTube and Gmail. That lone master is also a single point of failure. To solve this problem, GFS2 uses not only distributed slaves but distributed masters as well.
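Why a lone master matters can be shown with a toy simulation. Google hasn't detailed how GFS2 distributes its masters, so this sketch simply assumes file metadata is partitioned across several masters by hashing the file path; the class names, paths, and chunkserver labels are hypothetical. It takes one master offline and measures how much file metadata can still be looked up.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical metadata server: maps file paths to chunkserver names."""
    name: str
    alive: bool = True
    table: dict = field(default_factory=dict)

    def lookup(self, path):
        if not self.alive:
            raise ConnectionError(f"master {self.name} is down")
        return self.table[path]


class Cluster:
    """Toy cluster whose file metadata is partitioned across N masters (assumed)."""

    def __init__(self, num_masters):
        self.masters = [Master(f"m{i}") for i in range(num_masters)]

    def _master_for(self, path):
        # crc32 is deterministic, so a given path always maps to the same master.
        return self.masters[zlib.crc32(path.encode()) % len(self.masters)]

    def add_file(self, path, chunkservers):
        self._master_for(path).table[path] = chunkservers

    def lookup(self, path):
        return self._master_for(path).lookup(path)


def reachable_fraction(cluster, paths):
    """Fraction of files whose metadata can still be looked up."""
    ok = 0
    for p in paths:
        try:
            cluster.lookup(p)
            ok += 1
        except ConnectionError:
            pass
    return ok / len(paths)


def simulate(num_masters, num_files=1000):
    """Build a cluster, take master m0 offline, and report what is still reachable."""
    paths = [f"/index/shard-{i}" for i in range(num_files)]
    cluster = Cluster(num_masters)
    for p in paths:
        cluster.add_file(p, ["cs-1", "cs-2", "cs-3"])
    cluster.masters[0].alive = False  # simulate one master failing
    return reachable_fraction(cluster, paths)


if __name__ == "__main__":
    print("1 master:", simulate(1))   # every lookup fails
    print("4 masters:", simulate(4))  # roughly three-quarters still succeed
```

With one master, losing it takes out every lookup; with four, only that master's share of files goes dark. A production system would also replicate each master so that even that share survives, but the toy is enough to show why spreading the masters removes the single point of failure.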
Cutts also said that Caffeine uses other back-end technologies recently developed by the company, but he declined to name them. He indicated that these did not include updates to MapReduce, Google's distributed number-crunching platform, or BigTable, its distributed database.
Whatever new infrastructure technologies underpin Caffeine, they have not been deployed across other Google services. But Cutts indicated that Google hopes to do so with at least some of them. Google's distributed global infrastructure is designed to operate like a single machine, running all its online services. Certainly, GFS2 will be deployed across the Googlenet. ®