Google announced its new Web indexing system, Caffeine, yesterday. According to the announcement, it “provides 50 percent fresher results for web searches than our last index, and it’s the largest collection of web content we’ve offered.” (There’s no exact document count, though; Google stopped providing those a long time ago.)
Google has an illustration to show how the Web is being indexed. Before, it said, the old index had several layers. Now, the Web is analyzed in small portions and the index is being continually updated.
I’m afraid I misinterpreted that as “Before, Googleman stood beside a neatly-stacked index of information and had a fairly good idea of what was going on. Now, Googleman stands helplessly inside a maelstrom of content, constantly getting bombarded by multimedia.”
But I kid Google, though I wasn’t kidding about the maelstrom. Caffeine is huge. From the announcement: “If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.”
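Out of curiosity, here is a quick sanity check of the iPod comparison, using only the numbers quoted above. It is a rough sketch: I'm assuming decimal gigabytes and iPods laid lengthwise, and the variable and function names are mine, not Google's.

```python
# Figures quoted in the announcement.
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes"
IPOD_COUNT = 625_000            # "625,000 of the largest iPods"
STACK_MILES = 40                # "more than 40 miles"

METERS_PER_MILE = 1609.344      # international mile, exact by definition


def implied_capacity_gb(total_gb, count):
    """Storage each iPod would need to hold for the comparison to work."""
    return total_gb / count


def implied_length_cm(miles, count):
    """Length of each iPod if `count` of them span `miles` end-to-end."""
    return miles * METERS_PER_MILE * 100 / count


capacity_gb = implied_capacity_gb(TOTAL_STORAGE_GB, IPOD_COUNT)
length_cm = implied_length_cm(STACK_MILES, IPOD_COUNT)

print(f"Implied capacity per iPod: {capacity_gb:.0f} GB")
print(f"Implied length per iPod: {length_cm:.1f} cm")
```

That works out to 160 GB and roughly 10.3 cm per iPod, so the comparison at least hangs together internally.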
I remember when Google’s index went to a billion pages and that blew my mind. I don’t know what Google’s index stands at now, but I can do some guessing. I can do a search for “a”, for example (apparently it’s not a stop word anymore), and get 18,020,000,000 results at this writing. Doing the same search for the last 24 hours gets 17,070,000,000 results. A search for “the”? About 11,490,000,000 results, with, very strangely, about 16,610,000,000 for the last 24 hours (sometimes Google’s numbers are odd). site:com? 13,440,000,000. site:google.com? About 108,000,000 results.
As you know, I’m very much into information trapping. I spend a lot of time figuring out how to find new places and resources without having to wait for other people to find them for me. Google Alerts on Google’s own index helps me a lot; I have many alerts set up to let me know when Google indexes particular kinds of content. (Are you interested in learning more about how to do this? Leave me a comment or drop me an e-mail and I’ll write an article.) In the last 24 hours I have not seen any big change in the kind or amount of content that I’m getting, but I’ll keep an eye on it and let you know how it changes, if at all.