Blog Archives

New York Times Publishing More Subject Headings

The New York Times has published a bunch more subject headings to the Linked Data Cloud. I wrote about this last November when the NYT released 5,000 person/place/organization names as subject headings (or, as I noted then, you can think of them as tags.) The new subject headings are what the NYT Open blog describes as “subject descriptors”: keywords related to article content instead of proper nouns. This release includes 498 of the most commonly-used subject descriptors, which, like the names, are mapped to DBpedia and/or Freebase. The NYT hopes to eventually release all 3,500 of its subject descriptors.

You can browse through all the available subject headings (the descriptors and the proper nouns) at http://data.nytimes.com/. Look for the alphabetical browsing links in the middle of the page. I looked through the Ds and the first one I found was DNA (Deoxyribonucleic Acid) at http://data.nytimes.com/26507891352660881440. This page of information shows the first and most recent use of the subject descriptor, Freebase and DBPedia links, and an associated article count. There’s also a “scope note,” that explains exactly what the subject heading covers. In this case the scope is “Used for any coverage that focuses on D.N.A. — whether in research, forensic science, genetics, etc.”

For details about using these headings in an article search, visit the Search API documentation. There’s also an API request tool for experimenting with searches without having to use an API key or build queries in a URL; here’s an example search for the DNA subject heading.

New York Times Offers Most Popular API

Woo! It’s API Wednesday. The New York Times announced a couple days ago its new API, the Most Popular API, the documentation for which is available at http://developer.nytimes.com/docs/most_popular_api. The Most Popular API is for getting links and data associated with the most-frequently e-mailed, shared, and viewed NYT blog posts and content.

The Most Popular API request uses a REST format with responses in JSON, XML, and the new-to-me serialized PHP (.sphp). Essentially when building a query for this API you’re requesting three things: what you want (most e-mailed, most shared, or most viewed), which section (all sections or one or more sections) and the time period you want to include (either 1, 7, or 30 days.) I really like the share-types option, though: you can limit the “most shared” results to the method used to share the items (one or more types, with your options being Digg, e-mail, Facebook, Mixx, MySpace, Permalink, TimesPeople, Twitter, or Yahoo Buzz.) It’d be neat to slice up data and see what people are sharing on Facebook vs. Twitter vs. MySpace.
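
To make the shape of a request concrete, here is a minimal Python sketch that only validates the three choices described above and bundles them up. The function name, the labels, and the dictionary layout are my own inventions for illustration. They are not the API's real parameter names or URL format, so check the documentation before building an actual request.

```python
# Illustrative only: these labels mirror the prose above,
# not the API's real parameter names or URL layout.
LISTS = ("most e-mailed", "most shared", "most viewed")
TIME_PERIODS = (1, 7, 30)  # days, as described in the post


def describe_request(which, sections=("all",), days=7, share_types=()):
    """Validate the three request choices and return them as a plain dict."""
    if which not in LISTS:
        raise ValueError(f"unknown list: {which!r}")
    if days not in TIME_PERIODS:
        raise ValueError(f"time period must be one of {TIME_PERIODS}")
    if share_types and which != "most shared":
        raise ValueError("share types only narrow the 'most shared' list")
    return {
        "list": which,
        "sections": list(sections),
        "days": days,
        "share_types": list(share_types),
    }


# The most shared-to-Facebook items for the last seven days.
request = describe_request("most shared", days=7, share_types=("Facebook",))
print(request)
```

Asking for an unsupported time period (say, 14 days) raises a ValueError here, which is roughly the kind of check you would want before spending one of your daily requests.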

You need a key to use the Most Popular API, but NYT Open has a prototyping tool where you can experiment with the different request parameters without having to build a request or get a key. Here’s my request for the most shared-to-Facebook items for the last seven days.

New York Times Releases Version 3 Of Times Newswire API

New York Times Open announced yesterday Version 3 of the Times Newswire API. If you’re using version 2, don’t worry; that will be supported until August 2010. The Newswire API site with documentation and changes is at http://developer.nytimes.com/docs/times_newswire_api.

There’s not a huge number of changes here, but the new version does allow you to filter by sections and sources, and apparently integrates better with the Times Article Search API, though I haven’t tried that yet. The new section parameter is called section; you can either specify all or you can give specific section names; the documentation provides a URL for getting a full list of available sections. The source parameter only has three options: you can specify that you want items coming only from the New York Times, only from the International Herald Tribune, or from both papers.

While there are examples with the new parameters, I did not see any applications designed to take advantage of the newly-available parameters. However, there’s always the Times Developer Network Gallery, which shows various applications built on Times APIs. New Apps include We Read, We Tweet and Nooblast.

New York Times Congress API: Version 3

Oh, New York Times, how you irritate me with your constant talk of a paywall (I don’t care if you institute it, I’m just tired of hearing the endless coy reveal and discussion. Paywall or get off the pot.) But no matter how much the NYT gets on my nerves, I always forgive it after a visit to the Open Blog. Latest from NYT Open: version 3 of the Congress API, which was announced in late February. Version 2 of the API will be supported until June 2010.

There are several additions and changes in the new version. New responses for the API include a list of members leaving office, chamber schedule, votes by date, and member sponsorship comparison. As for changes to existing responses: vote responses now include bill information, member bio responses now include current party and state attributes (and some social media information if available), and bill details responses now include version information. You can get an overview of all the changes here. Full documentation is here.

So how can you put the new API to work? The Open folks have put together a sample app at http://nytcongress.appspot.com/ that compares voting records between a pair of senators (they’re random; refresh the page to see different pairs.) You also get a list of bills that the pair has cosponsored with links to additional legislation information. (You can get the code for this application at http://github.com/dwillis/NYT-Congress-API-Demo.) There’s also a forum available for discussion of new applications but it’s not active at the moment.

More great stuff from the NYT Open blog. Maybe as the US government starts releasing more data sets the NYT will start integrating some of that information into its APIs?…

More Subject Headings for NYT Data

The New York Times’ Open Blog announced this morning the addition of about 5,000 new subject headings to the Linked Open Data repository at http://data.nytimes.com/. I covered the initial release of 5,000 subject headings last November. These new subjects include geographic identifiers, organizations, and publicly-traded companies.

These subject headings, like the first crop, have been mapped to DBpedia and Freebase. In the case where subject headings are geographic, they’ve also been mapped to GeoNames, which you can learn more about at http://www.geonames.org/. Aaaaannnd you developer types will be interested to know that the resources are being published in JSON along with RDF/XML and HTML.

To get a sense of what the New York Times has available, you can download data records for people, organizations, and locations at http://data.nytimes.com. You will have to agree to a Creative Commons license and you will have to wait a while — you’ll be getting an XML file but it’s still a pretty big download!

New York Times Open Announces API Tool

The New York Times actually announced its API tool last week, but when I tried to use it I got a 404 error. Now the link from the announcement of July 20th actually works. You can try the API tool at http://prototype.nytimes.com/gst/apitool/index.html.

So what the heck is an API tool? It’s a quick place to play with the different New York Times APIs — for getting information about articles, movie reviews, Congress, and a lot more — without having to write code or figure out just the right URL structure. I have a screen shot below showing the result of one query for the article search API.

New York Times' API Tool

What you do is choose which API you want to work with. The nav on the left will change to show you different options depending on the API. Enter your query and any options you want to enact. The pane on the right shows what the results will look like (mostly it looks like you have the option of getting them in XML or JSON, though I saw one API that had SPHP as an option, and at times the only result option was JSON.)

At the top of the page you’ll also get a view of what the proper request URL would look like for your query. You can’t use this request URL as-is (you need to have a working NYT API key), but it’ll at least show you what proper syntax looks like.

Some of the APIs are more self-explanatory than others. For some you will definitely need to use the documentation. (Look for a documentation link right under the pulldown box for choosing an API.) It’s a nifty sandbox for seeing just how much you can slice and dice the New York Times’ vast amount of data.

New York Times Announces Article Search API

The New York Times announced yesterday that a new content API is available — this one is an article search API. This new API covers articles going back to 1981 — that’s over 2.8 million articles, and it’s updated hourly — with search in 35 different fields.

The documentation for the new API is at http://developer.nytimes.com/docs/article_search_api. You’ll need to have an API key to use it; keys are free but you’ll have to register. Keys for this particular API are good for 10,000 requests a day. Like many modern APIs, you build the data request in the search URL, so it’s less about programming per se and more about making a good query.

So what can you search? Be sure to go through the whole documentation. The standard date range, query words, and order sorting are all available, but you also have this thing called “Facets”. “Facets” let you do things like search in different sections of the newspaper, or search for similar descriptive terms, or even the day of the week the item was published. I’m going to have to do a lot of playing with them before I can wrap my head around them completely.

Apparently you can do a facet search that will provide you with general information about search results before pointing you to articles themselves. Here’s a fun example:

http://api.nytimes.com/svc/search/v1/article?query=twitter&facets=per_facet&api-key=YOURKEYGOESHERE
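
If you would rather not paste URLs together by hand, here is a minimal Python sketch that assembles the same request from its parts. The base URL and the three parameter names are copied from the example above; the function name is just mine, and I have only sketched the single-facet case.

```python
from urllib.parse import urlencode

# Base URL copied from the example request in this post.
BASE_URL = "http://api.nytimes.com/svc/search/v1/article"


def build_search_url(query, facet, api_key):
    """Return an Article Search request URL for one query and one facet."""
    params = {"query": query, "facets": facet, "api-key": api_key}
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("twitter", "per_facet", "YOURKEYGOESHERE")
print(url)
```

Swap in your own key for YOURKEYGOESHERE and the printed URL should be ready to request.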

When you run this search, the results begin with the people that, according to the New York Times, are associated with the text string “Twitter”. Barack Obama, sure. Shaq? Of course. William Shakespeare? Um, really? Okay. Bear in mind though that the names given by this API’s results may not always be accurate; I ran a vanity search on ResearchBuzz and discovered that that term was associated with “Rael Cornfest” (instead of Rael Dornfest.) Also in my experiments I didn’t always understand when names were indexed and when they weren’t.

But it gets kind of addictive to play with the facets. You can even search for a name facet and then search for associated names:

http://api.nytimes.com/svc/search/v1/article?query=per_facet:%5BMADOFF,%20BERNARD%20L%5D&facets=per_facet&api-key=YOURKEYGOESHERE
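
Square brackets and spaces in a facet query need percent-encoding before they go into a URL. Here is a small Python sketch of that step using only the standard library. The helper name is mine, and the facet:[VALUE] shape is simply copied from the example above rather than from the documentation.

```python
from urllib.parse import quote


def facet_term(facet, value):
    """Format a facet search term the way the example URL writes it."""
    return f"{facet}:[{value}]"


term = facet_term("per_facet", "MADOFF, BERNARD L")
# Leave the colon and comma readable; encode brackets and spaces.
encoded = quote(term, safe=":,")
print(encoded)
```

The encoded string is what goes after query= in the request URL.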

This search finds people associated with Bernie Madoff. This is a full-spectrum search; it would be interesting to run date based searches — say, five years at a time — and see who’s associated with him when.
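
For the five-years-at-a-time idea, the fiddly part is carving the archive into windows. Here is a Python sketch that does only that, starting from 1981 (where the archive begins) and stopping at 2009, which is an arbitrary end year I picked. The YYYYMMDD formatting is an assumption about what a date filter might want; check the documentation for the real date parameters before wiring this into a request.

```python
from datetime import date


def year_windows(first_year, last_year, span=5):
    """Yield (start, end) dates covering first_year..last_year in span-year blocks."""
    start = first_year
    while start <= last_year:
        end = min(start + span - 1, last_year)
        yield date(start, 1, 1), date(end, 12, 31)
        start += span


windows = list(year_windows(1981, 2009))
for begin, end in windows:
    # Assumed format: compact YYYYMMDD strings.
    print(begin.strftime("%Y%m%d"), end.strftime("%Y%m%d"))
```

Run one facet search per window and you could compare which names turn up in each period.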

When you look at the results keep in mind that while you’re getting query results on full-text searches, you’re being returned only the first paragraph of each article, and you get back only ten articles at a time. Also the search results are returned in JSON, though XML is said to be coming soon.
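
Ten articles at a time means a big result set takes a lot of requests, so it is worth doing the arithmetic before you start. This Python sketch works out how many pages a given number of hits needs and which results land on each page. The total of 25 is a made-up figure, and how you actually ask the API for the next page is not covered here; that is a question for the documentation.

```python
import math

PAGE_SIZE = 10  # the post notes results come back ten articles at a time


def pages_needed(total_hits):
    """Number of requests needed to walk through total_hits results."""
    return math.ceil(total_hits / PAGE_SIZE)


def page_ranges(total_hits):
    """Yield (page_index, first_result, last_result) with zero-based page numbers."""
    for page in range(pages_needed(total_hits)):
        first = page * PAGE_SIZE + 1
        last = min((page + 1) * PAGE_SIZE, total_hits)
        yield page, first, last


ranges = list(page_ranges(25))  # 25 is a hypothetical hit count
print(ranges)
```

With a daily request limit on your key, the page count tells you quickly whether a search is worth walking through in full.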

I really wish the output format was XML; I have about three hundred ideas of what I want to do with this. If you’re at all interested in data mining or the NYT check out the documentation for this API.
