The New York Times announced yesterday that a new content API is available: an article search API. It covers articles going back to 1981 (more than 2.8 million of them), is updated hourly, and lets you search across 35 different fields.
The documentation for the new API is at http://developer.nytimes.com/docs/article_search_api. You’ll need an API key to use it; keys are free, but you’ll have to register. Keys for this particular API are good for 10,000 requests a day. As with many modern APIs, you build the request into the search URL, so using it is less about programming per se and more about composing a good query.
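Since the whole request lives in the URL, a few lines of Python can assemble it. This is a minimal sketch: the endpoint and the `query`, `facets`, and `api-key` parameter names come from the example URL later in this post, `build_search_url` is a hypothetical helper, and any extra parameters you pass through it are assumptions to check against the documentation.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in this post (version 1 of the API).
BASE_URL = "http://api.nytimes.com/svc/search/v1/article"


def build_search_url(query, api_key, **extra_params):
    """Return a search URL for the given query and API key.

    Hypothetical helper. Only `query`, `facets`, and `api-key` appear in the
    post's example; anything passed via extra_params is an assumption.
    """
    params = {"query": query}
    params.update(extra_params)
    # "api-key" contains a hyphen, so it is set here rather than as a keyword argument.
    params["api-key"] = api_key
    # urlencode percent-encodes brackets, colons, commas, and turns spaces into "+".
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("twitter", "YOURKEYGOESHERE", facets="per_facet"))
```

Letting `urlencode` handle the escaping means a query such as `per_facet:[MADOFF, BERNARD L]` can be written naturally instead of hand-encoded.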
So what can you search? Be sure to read through the whole documentation. The standard date range, query words, and sort order are all available, but there’s also something called “Facets”. Facets let you do things like search within particular sections of the newspaper, search for related descriptive terms, or even filter by the day of the week an item was published. I’m going to have to play with them a lot before I can wrap my head around them completely.
Apparently you can run a facet search that gives you general information about the search results before pointing you to the articles themselves. Here’s a fun example:
When you run this search, the beginning of the results lists the people who, according to the New York Times, are associated with the text string “Twitter”. Barack Obama, sure. Shaq? Of course. William Shakespeare? Um, really? Okay. Bear in mind, though, that the names in this API’s results may not always be accurate: I ran a vanity search on ResearchBuzz and found that term associated with “Rael Cornfest” (instead of Rael Dornfest). Also, in my experiments I couldn’t always tell when names were indexed and when they weren’t.
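The facet data comes back as JSON, so pulling out the associated names takes only a few lines. This sketch assumes a response with a top-level `facets` object that maps each facet name (such as `per_facet`) to a list of `term`/`count` entries; that shape is a guess, not something confirmed here, so check it against the documentation. The helper name `facet_terms` is hypothetical.

```python
import json


def facet_terms(response_text, facet="per_facet"):
    """Return (term, count) pairs for one facet, most frequent first.

    ASSUMPTION: the response is shaped like
    {"facets": {"per_facet": [{"term": "...", "count": 12}, ...]}}.
    Verify the real shape before relying on this.
    """
    data = json.loads(response_text)
    # Missing facets come back as an empty list instead of raising KeyError.
    entries = data.get("facets", {}).get(facet, [])
    pairs = [(entry["term"], entry["count"]) for entry in entries]
    # Highest count first, so the most strongly associated names lead the list.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

The terms are returned exactly as indexed, so a misspelling like the one above would show up here unchanged.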
But it gets kind of addictive to play with the facets. You can even search on a name facet to find the names associated with it:
http://api.nytimes.com/svc/search/v1/article?query=per_facet:%5BMADOFF,%20BERNARD%20L%5D&facets=per_facet&api-key=YOURKEYGOESHERE
This search finds people associated with Bernie Madoff across the API’s full date range. It would be interesting to run date-based searches (say, five years at a time) to see who was associated with him when.
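Running the Madoff search in five-year slices is mostly a matter of generating date windows and building one URL per window. This is a sketch under assumptions: the endpoint and the `per_facet:[...]`, `facets`, and `api-key` parameters come from the example URL above, but `begin_date`/`end_date` and their YYYYMMDD format are hypothetical stand-ins for whatever date-range parameters the documentation actually specifies.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://api.nytimes.com/svc/search/v1/article"


def year_windows(first_year, last_year, span=5):
    """Yield (begin, end) YYYYMMDD strings covering the years in blocks of `span`."""
    for start in range(first_year, last_year + 1, span):
        end = min(start + span - 1, last_year)  # the last block may be shorter
        yield date(start, 1, 1).strftime("%Y%m%d"), date(end, 12, 31).strftime("%Y%m%d")


def windowed_facet_urls(name, api_key, first_year, last_year, span=5):
    """Build one per_facet search URL per date window for an indexed name.

    ASSUMPTION: begin_date and end_date are hypothetical parameter names;
    confirm the real date-range parameters in the API documentation.
    """
    urls = []
    for begin, end in year_windows(first_year, last_year, span):
        params = {
            "query": f"per_facet:[{name}]",
            "facets": "per_facet",
            "begin_date": begin,
            "end_date": end,
            "api-key": api_key,
        }
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in windowed_facet_urls("MADOFF, BERNARD L", "YOURKEYGOESHERE", 1981, 2009):
        print(url)
```

Comparing the `per_facet` lists from each window side by side would show how the cast of associated names shifts over time.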
When you look at the results, keep in mind that although the queries run against the full text, you get back only the first paragraph of each article, and only ten articles at a time. Also, the results are returned in JSON, though XML is said to be coming soon.
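Because each response holds only ten articles, pulling a large result set means paging. This sketch works out which pages to request, capped at the 10,000-requests-a-day key limit. It assumes a hypothetical zero-based `offset` parameter that selects a page of ten articles (0 for articles 1 to 10, 1 for 11 to 20, and so on); confirm the real paging parameter in the documentation.

```python
import math

PAGE_SIZE = 10        # articles per response, per this post
DAILY_LIMIT = 10_000  # requests per key per day, per this post


def page_offsets(total_results, daily_limit=DAILY_LIMIT):
    """Return the page offsets needed to fetch every result, capped by the quota.

    ASSUMPTION: `offset` is a hypothetical zero-based page index, one page
    per request, so each offset costs one request against the daily limit.
    """
    pages = math.ceil(total_results / PAGE_SIZE)
    return list(range(min(pages, daily_limit)))
```

The arithmetic is sobering: at ten articles per request, a full day’s 10,000 requests reaches at most 100,000 articles, a small slice of the 2.8 million in the archive.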
I really wish the output format were XML; I have about three hundred ideas for what I want to do with this data. If you’re at all interested in data mining or the NYT, check out the documentation for this API.