Google reinvents web indexing with jolt of ‘Caffeine’

9 Jun 2010

Google has completed a new web indexing system called Caffeine which, it says, provides 50pc fresher results for web searches than its traditional web-indexing system.

As reported last week, Matt Cutts, the head of Google’s webspam team and the webmaster who applies Google’s Quality Guidelines, said Google has been re-architecting its entire indexing infrastructure under project ‘Caffeine’.

“The idea of that is to be able to return documents much faster, like within a minute or two, not just a day or after a few hours,” he said. “I think the areas we are stretching is the ability to incorporate an order of magnitude for more documents and web pages, but also the ability to index those much faster and return those to users.”

Caffeine, he said, prepares Google for a world where search won’t be done solely on computers but on the 4 billion-plus mobile phones in the world.

Google software engineer Carrie Grimes said that content on the web is blossoming, not just in size and numbers but with the advent of video, images, news and real-time updates.

“Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyse the entire web, which meant there was a significant delay between when we found a page and made it available to you.

“With Caffeine, we analyse the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before — no matter when or where it was published.”
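The difference Grimes describes, between refreshing a whole layer and updating page by page, can be illustrated with a toy inverted index. This is only a minimal sketch for readers who want to see the idea in code, not Google's implementation: the names `tokenize`, `rebuild_index` and `IncrementalIndex` and the example URLs are hypothetical, and a real index would handle ranking, scale and distribution that are ignored here.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase a page and split it into a set of words."""
    return set(text.lower().split())


def rebuild_index(pages):
    """Batch style: re-analyse every page to produce a fresh index."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


class IncrementalIndex:
    """Continuous style: add or refresh one page at a time (hypothetical)."""

    def __init__(self):
        self.index = defaultdict(set)
        self.words_by_url = {}  # what each page currently contributes

    def update(self, url, text):
        new_words = tokenize(text)
        old_words = self.words_by_url.get(url, set())
        # Remove the page from words it no longer contains.
        for word in old_words - new_words:
            self.index[word].discard(url)
        # Add the page under words that are new to it.
        for word in new_words - old_words:
            self.index[word].add(url)
        self.words_by_url[url] = new_words

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


live = IncrementalIndex()
live.update("example.com/a", "caffeine makes search fresher")
live.update("example.com/b", "old index had layers")
# Page "a" changes; only its own entries are touched.
live.update("example.com/a", "caffeine index updates continuously")
print(live.search("fresher"))  # []
print(live.search("index"))    # ['example.com/a', 'example.com/b']
```

In the batch version, every change means calling `rebuild_index` over all pages again, whereas `update` does work in proportion to the one page that changed, so a new or edited page becomes searchable without waiting for a full refresh.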

She explained that every second, Caffeine processes hundreds of thousands of pages in parallel.

“If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.

“We’ve built Caffeine with the future in mind. Not only is it fresher, it’s a robust foundation that makes it possible for us to build an even faster and comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you,” Grimes said.
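The iPod comparison Grimes gives can be sanity-checked with simple arithmetic. The short sketch below uses only the figures quoted above, rounding "nearly 100 million gigabytes" up to 100 million and taking "more than 40 miles" as exactly 40; the 1,609.344 metres-per-mile conversion is the standard one, and the variable names are purely illustrative.

```python
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes", rounded up
IPOD_COUNT = 625_000            # number of iPods quoted
STACK_MILES = 40                # "more than 40 miles", taken as 40
METRES_PER_MILE = 1609.344      # standard conversion factor

# Storage each iPod would need to hold.
gb_per_ipod = TOTAL_STORAGE_GB / IPOD_COUNT

# Length each iPod would contribute to the end-to-end stack.
cm_per_ipod = STACK_MILES * METRES_PER_MILE * 100 / IPOD_COUNT

print(gb_per_ipod)             # 160.0
print(round(cm_per_ipod, 1))   # 10.3
```

On these assumptions the figures work out to about 160GB and roughly 10cm per device, so the two numbers in the quote are consistent with each other.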

John Kennedy is a journalist who served as editor of Silicon Republic for 17 years.