News

Google Makes Change to Daterange Syntax

When Google first released its API (which I think was in the spring of 2002), the API documentation mentioned things that weren’t normally discussed on the Google site, like a syntax called daterange:.

Daterange allowed you to search for all pages added to Google’s index, or refreshed in Google’s index, within a certain date range. Dates were specified as Julian dates, a continuous count of days in which each new day begins at noon (UTC) rather than at midnight. (Today’s Julian date, for example, is 2454252.)
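
If you'd rather do the Julian conversion yourself, here's a minimal Python sketch. It assumes whole calendar dates and ignores the noon rollover (a time before noon UTC technically still belongs to the previous Julian day). The function names are just for illustration; this isn't how GooFresh is written. It relies on the fact that Python's date.toordinal() and the Julian Day Number differ by a constant 1721425. By that arithmetic, 2454252 works out to 31 May 2007.

```python
from datetime import date

# Constant gap between Python's proleptic Gregorian ordinal
# (0001-01-01 == 1) and the Julian Day Number of the same calendar day.
JDN_OFFSET = 1721425


def to_julian_day(d: date) -> int:
    """Return the Julian Day Number for a calendar date.

    The JDN labels the Julian day that begins at noon UTC on this date;
    time of day is ignored in this sketch.
    """
    return d.toordinal() + JDN_OFFSET


def from_julian_day(jdn: int) -> date:
    """Inverse of to_julian_day: map a Julian Day Number back to a date."""
    return date.fromordinal(jdn - JDN_OFFSET)


if __name__ == "__main__":
    print(to_julian_day(date(2007, 5, 31)))  # 2454252
    print(from_julian_day(2454252))           # 2007-05-31
```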

Using what I knew about the daterange syntax, I created a tool called GooFresh, which you can use at http://www.researchbuzz.org/wp/tools/goofresh/ . GooFresh lets you search Google’s Web results for date-based content. (It converts dates to Julian for you, too.)

I got an e-mail from a Google engineer today telling me that the way the daterange syntax is handled has changed slightly. Before, a Web page was marked with a date when it was added to Google’s index and again whenever it was refreshed in Google’s index (in other words, whenever Google re-indexed it). NOW, a Web page will be found on one date only: the date it was added to Google’s index. It doesn’t matter how many times Google’s spider refreshes it; the only date on which it’ll appear in response to a daterange search is the date it was first indexed.
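
To make the difference concrete, here's a toy Python model of the two behaviors as the engineer described them. It's only an illustration, not Google's implementation: the Page class, the matches_old/matches_new functions, the example.com URLs and the crawl days are all hypothetical. Each page carries the Julian days on which it was crawled; the old rule matches if any crawl falls in the range, the new rule only looks at the first one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """A hypothetical indexed page plus the Julian days it was (re)crawled."""
    url: str
    crawl_days: List[int] = field(default_factory=list)


def matches_old(page: Page, start: int, end: int) -> bool:
    # Old behavior: the first add OR any later refresh inside the range counts.
    return any(start <= d <= end for d in page.crawl_days)


def matches_new(page: Page, start: int, end: int) -> bool:
    # New behavior: only the day the page first entered the index counts.
    return bool(page.crawl_days) and start <= min(page.crawl_days) <= end


if __name__ == "__main__":
    pages = [
        Page("http://example.com/", [2454100, 2454200, 2454251]),  # home page, recrawled often
        Page("http://example.com/new-story", [2454251]),            # first seen that day
    ]
    print([p.url for p in pages if matches_old(p, 2454251, 2454251)])
    print([p.url for p in pages if matches_new(p, 2454251, 2454251)])
```

Under the old rule both URLs match the single day 2454251; under the new rule only the new story does, which is exactly the "fewer repeat pages" effect.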

(Note that just because Google indexes a page on a certain date doesn’t mean it’s a brand-new page; Google may have just discovered it. Keep that in mind.)

For those of us who like to monitor additions to the Web, this is great news. It’ll let you do date-based searches without plowing through tons and tons of repeat pages that get re-indexed on a regular basis (like home pages or section pages).

The engineer I spoke to said that the new way of handling daterange is currently being rolled out and should be finished by the end of the week, so if you try a daterange search now, you may get weird results. A couple of times I ran a search and got huge numbers of results (site:us daterange:2454251-2454251), but when I revised the search to add a little more syntax I got reasonable numbers of results (site:us inurl:us daterange:2454251-2454251). You’ll have to do some experimenting.
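
If you want to generate test queries like the two above without looking up Julian dates by hand, a small Python helper can build the daterange: clause from ordinary dates. This is a sketch under my own assumptions: daterange_query is a hypothetical name, it only builds the query string (it doesn't send anything to Google), and a single day is written as the same number twice, as in the examples above.

```python
from datetime import date
from typing import Optional

# date.toordinal() + JDN_OFFSET == Julian Day Number for that calendar day.
JDN_OFFSET = 1721425


def daterange_query(terms: str, start: date, end: Optional[date] = None) -> str:
    """Append a daterange: clause (in Julian day numbers) to a query string."""
    if end is None:
        end = start  # a single day is written as N-N
    if end < start:
        raise ValueError("end date must not be before start date")
    start_jdn = start.toordinal() + JDN_OFFSET
    end_jdn = end.toordinal() + JDN_OFFSET
    return f"{terms} daterange:{start_jdn}-{end_jdn}".strip()


if __name__ == "__main__":
    # Reproduces the two queries from the post (Julian day 2454251 = 30 May 2007).
    print(daterange_query("site:us", date(2007, 5, 30)))
    print(daterange_query("site:us inurl:us", date(2007, 5, 30)))
```

Adding or removing extra syntax like inurl: is then just a change to the terms string, which makes it easy to compare result counts for the same day.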

Feel free to use GooFresh for that experimenting. But be warned: Google was awfully quick on the trigger about stopping me after a search and saying, “Pardon us but you look like a scraper. Please prove you are not with this CAPTCHA.” This got real old real quick… I didn’t think I was typing THAT fast…

This post came from ResearchBuzz, a site with news and information about online data collections. Visit us at ResearchBuzz.com .
