New York Times Publishing More Subject Headings

The New York Times has published a bunch more subject headings to the Linked Data Cloud. I wrote about this last November when the NYT released 5,000 person/place/organization names as subject headings (or, as I noted then, you can think of them as tags). The new subject headings are what the NYT Open blog describes as “subject descriptors”: keywords related to article content instead of proper nouns. This release includes 498 of the most commonly used subject descriptors, which, like the names, are mapped to DBPedia and/or Freebase. The NYT hopes to eventually release all 3,500 of its subject descriptors.

You can browse through all the available subject headings (the descriptors and the proper nouns) at http://data.nytimes.com/. Look for the alphabetical browsing links in the middle of the page. I looked through the Ds and the first one I found was DNA (Deoxyribonucleic Acid) at http://data.nytimes.com/26507891352660881440. This page shows the first and most recent use of the subject descriptor, Freebase and DBPedia links, and an associated article count. There’s also a “scope note” that explains exactly what the subject heading covers. In this case the scope is “Used for any coverage that focuses on D.N.A. — whether in research, forensic science, genetics, etc.”

For details about using these headings in an article search, visit the Search API documentation. There’s also an API request tool for experimenting with searches without having to use an API key or build queries in a URL; here’s an example search for the DNA subject heading.

New York Times Offers Most Popular API

Woo! It’s API Wednesday. A couple of days ago the New York Times announced a new API, the Most Popular API; the documentation is available at http://developer.nytimes.com/docs/most_popular_api. The Most Popular API is for getting links and data associated with the most frequently e-mailed, shared, and viewed NYT blog posts and content.

The Most Popular API uses REST-style requests with responses in JSON, XML, and the new-to-me serialized PHP (.sphp). Essentially, when building a query for this API you’re specifying three things: what you want (most e-mailed, most shared, or most viewed), which section (all sections or one or more sections), and the time period you want to include (1, 7, or 30 days). I really like the share-types option, though: you can limit the “most shared” results by the method used to share the items (one or more types, with your options being Digg, e-mail, Facebook, Mixx, MySpace, Permalink, TimesPeople, Twitter, or Yahoo Buzz). It’d be neat to slice up the data and see what people are sharing on Facebook vs. Twitter vs. MySpace.
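To make the three-part query a little more concrete, here is a minimal Python sketch that assembles a request URL from the resource type, section, time period, and optional share types. The base URL, the path layout, the semicolon separator, and the "all-sections" and "facebook" identifiers are my assumptions for illustration, not something confirmed in this post; check them against the documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and path layout; verify against the API documentation.
BASE_URL = "http://api.nytimes.com/svc/mostpopular/v2"

RESOURCES = {"mostemailed", "mostshared", "mostviewed"}  # assumed resource names
TIME_PERIODS = {1, 7, 30}  # the three time periods described in the post


def build_most_popular_url(resource, section, days, api_key,
                           share_types=None, fmt="json"):
    """Assemble a request URL from the what / which section / how long choices."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    if days not in TIME_PERIODS:
        raise ValueError("time period must be 1, 7, or 30 days")
    if share_types and resource != "mostshared":
        raise ValueError("share types only apply to 'mostshared'")

    parts = [BASE_URL, resource, quote(section)]
    if share_types:
        # Joining several share types with semicolons is an assumption.
        parts.append(";".join(quote(s) for s in share_types))
    parts.append(f"{days}.{fmt}")
    return "/".join(parts) + "?" + urlencode({"api-key": api_key})


# Most shared-to-Facebook items for the last seven days (placeholder key).
print(build_most_popular_url("mostshared", "all-sections", 7, "YOUR_KEY",
                             share_types=["facebook"]))
```

The function only builds the URL string and makes no network request, so you can experiment with parameter combinations before you have a key.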

You need a key to use the Most Popular API, but NYT Open has a prototyping tool where you can experiment with the different request parameters without having to build a request or get a key. Here’s my request for the most shared-to-Facebook items for the last seven days.

New York Times Releases Version 3 Of Times Newswire API

New York Times Open announced yesterday Version 3 of the Times Newswire API. If you’re using version 2, don’t worry; that will be supported until August 2010. The Newswire API site with documentation and changes is at http://developer.nytimes.com/docs/times_newswire_api.

There aren’t a huge number of changes here, but the new version does allow you to filter by section and source, and it apparently integrates better with the Times Article Search API, though I haven’t tried that yet. The new section parameter is called section; you can either specify all or give specific section names, and the documentation provides a URL for getting a full list of available sections. The source parameter has only three options: you can ask for items only from the New York Times, only from the International Herald Tribune, or from both papers.
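As a rough illustration of the section and source filters, here is a small Python sketch that builds a Newswire request URL. The base URL, the path order, the source identifiers ("nyt", "iht", "all"), the semicolon separator, and the example section names are all assumptions on my part; the documentation linked above is the authority on the real values.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and path layout for version 3; verify before use.
NEWSWIRE_BASE = "http://api.nytimes.com/svc/news/v3/content"

# Assumed identifiers for the three source options described in the post.
SOURCES = {"nyt", "iht", "all"}


def build_newswire_url(api_key, source="all", sections=("all",)):
    """Build a request URL filtered by source and by one or more sections."""
    if source not in SOURCES:
        raise ValueError(f"source must be one of {sorted(SOURCES)}")
    if not sections:
        raise ValueError("give at least one section, or 'all'")
    # Joining several sections with semicolons is an assumption.
    section_path = ";".join(quote(s) for s in sections)
    query = urlencode({"api-key": api_key})
    return f"{NEWSWIRE_BASE}/{source}/{section_path}.json?{query}"


# Items from the International Herald Tribune only, in two hypothetical sections.
print(build_newswire_url("YOUR_KEY", source="iht", sections=["world", "technology"]))
```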

While there are examples using the new parameters, I did not see any applications designed to take advantage of them. However, there’s always the Times Developer Network Gallery, which shows various applications built on Times APIs. New apps include We Read, We Tweet and Nooblast.

Google, the NYT, and Washington Post Team Up for Living Stories

Google’s hard at it in the labs again with a new feature called Living Stories. Living stories are more like living “topics,” with story topics put together into a permanent place and updated in a variety of ways. It’s available at http://livingstories.googlelabs.com/.

Once you get to the site, pick a story you want to follow. I chose executive pay, which is at http://livingstories.googlelabs.com/lsps/executivepay. Note the permanent URL. The story page is laid out as follows.

There’s a summary at the top of the page, then a timeline, and the stories themselves run down the middle of the page. The left side of the page beneath the timeline provides pointers to different kinds of media/information, while the right side provides links to important stories in the timeline.

When you click on a story you find interesting, you don’t leave the page — it opens right in the middle, which does make it easier to take in several stories without losing your place or getting hugely distracted. Key people and companies in the stories are highlighted; click on them and you’ll get a popout with a one-sentence explanation and sometimes a picture.

I took a look at the supplemental materials. There were plenty of images and quotes available, but fewer videos. There were some great graphics showing the evolution of executive pay.

So how do you keep up with the changes to the stories if the URL is permanent? You have a couple of options. The first is a good old-fashioned RSS feed. The second is signing in with a Google account and getting updates to the story e-mailed to you. Personally I’d prefer RSS.

Google’s put together a great way to group a lot of stories in one place. The one problem is the sources. The New York Times and Washington Post are great papers, but there are lots of other great papers, too, and you could get an even more multifaceted look at a story if you used several different sources. I’ll be a lot more interested in this when I can go to a story about something happening in Virginia, for example, and get stories from every indexed news source in the state.

NYT Kicks out 5,000 Subject Headings to Data Clouds

I love the RSS feed for New York Times Open. The blog doesn’t update very often but whenever it does I know I’m going to have something good to read. At the end of October the blog announced the release of 5,000 of its person name subject headings as “Linked Open Data.”

Let’s back up. The New York Times has developed subject headings to index its archives. So you can think “tags” instead of “subject headings” if you like. (Sorry NYT.) And the “Linked Open Data” means that the 5,000 subject headings/tags have been manually mapped to the data sources Freebase and DBPedia. Two very exciting things about this: 5,000 is just a fraction of the NYT’s subject headings — there are in total over 30,000 — and the NYT intends to map and release them all. Further, the NYT is releasing this data under a Creative Commons license!

You can explore what’s been done so far at http://data.nytimes.com/. You can download all the data records in one file (you have to agree to the CC license first) or you can browse by last name. I went and looked at the E’s to see if Elmo had been indexed. He hadn’t, but there were several other people under E, from Eagleburger, Lawrence S, to Eyre, Richard. Each name has a URL associated with it. Click on it to get more data.

I clicked on Herm Edwards’ name, which is located at http://data.nytimes.com/57985207950391437243.html. Data here includes the number of mentions in the NYT, the first and last time the subject heading was used (I’m suspicious of that “first time”; it seemed to be 2001 for most of the names I looked at, and that didn’t seem right for the historical figures), and a pointer to the New York Times “Topic Page”. (If you’re looking for the latest news and other information on a figure, in an easy-to-read format, use the Topic Page. It even has an RSS feed.)

Even more interesting than the aggregated data are the pointers to Freebase and DBPedia. Each person’s data URL is also associated with links to pages of data at Freebase and DBPedia. These two pages are in XML and RDF formats respectively, so they’re less for reading by humans and more for mixing and reoutputting by computer programs.
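For a sense of what that “mixing” might look like, here is a minimal Python sketch that pulls the external links out of an RDF/XML record. I’m assuming the mappings are expressed with the standard owl:sameAs property, and the sample record and its example.org URIs are made up purely for illustration; they are not real NYT, Freebase, or DBPedia data.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Hypothetical record: every URI below is a placeholder, not real data.
SAMPLE_RDF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://data.example.org/person/123">
    <owl:sameAs rdf:resource="http://dbpedia.example.org/resource/Some_Person"/>
    <owl:sameAs rdf:resource="http://freebase.example.org/guid/abc"/>
  </rdf:Description>
</rdf:RDF>
"""


def same_as_links(rdf_xml):
    """Return the target URI of every owl:sameAs element in an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    links = []
    for element in root.iter(f"{{{OWL_NS}}}sameAs"):
        target = element.get(f"{{{RDF_NS}}}resource")
        if target:
            links.append(target)
    return links


for link in same_as_links(SAMPLE_RDF):
    print(link)
```

A real tool would more likely use a dedicated RDF library, but the standard-library XML parser is enough to show the idea: one record, several pointers out to other data sources.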

The NYT release of these subject headings/tags helps pull three data sources together. I expect to see some great tools made from this. If you’d like to see how people are discussing extending and using the new release, you can check out the Linked Open Data community at http://groups.google.com/group/nyt_linked_open_data.