NYT Kicks out 5,000 Subject Headings to Data Clouds
I love the RSS feed for New York Times Open. The blog doesn’t update very often but whenever it does I know I’m going to have something good to read. At the end of October the blog announced the release of 5,000 of its person name subject headings as “Linked Open Data.”
Let’s back up. The New York Times has developed subject headings to index its archives. So you can think “tags” instead of “subject headings” if you like. (Sorry NYT.) And the “Linked Open Data” means that the 5,000 subject headings/tags have been manually mapped to the data sources Freebase and DBPedia. Two things about this are very exciting. First, 5,000 is just a fraction of the NYT’s subject headings (there are over 30,000 in total), and the NYT intends to map and release them all. Second, the NYT is releasing this data under a Creative Commons license!
You can explore what’s been done so far at http://data.nytimes.com/. You can download all the data records in one file (you have to agree to the CC license first) or you can browse by last name. I went and looked at the E’s to see if Elmo had been indexed. He hadn’t, but there were several other people under E, from Eagleburger, Lawrence S, to Eyre, Richard. Each name has a URL associated with it. Click on it to get more data.
I clicked on Herm Edwards’ name, which is located at http://data.nytimes.com/57985207950391437243.html. Data here includes the number of mentions in the NYT, the first and last times the subject heading was used (I’m suspicious of that “first time”: it seemed to be 2001 for most of the names I looked at, and that didn’t seem right for the historical figures), and a pointer to the New York Times “Topic Page”. (If you’re looking for the latest news and other information on a figure, in an easy-to-read format, use the Topic Page. It even has an RSS feed.)
Even more interesting than the aggregated data are the pointers to Freebase and DBPedia. Each person’s data URL is also associated with links to pages of data at Freebase and DBPedia. These two pages are in XML and RDF formats respectively, so they’re meant less for human readers and more for computer programs to mix and republish.
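To give a feel for what “mixing by computer programs” might look like, here is a minimal sketch in Python that pulls the cross-links out of an RDF/XML record. It rests on some assumptions: the sample record, its example.org URLs, and the same_as_links helper are all made up for illustration, and I’m assuming the links are expressed with owl:sameAs, which is a common Linked Data convention but not something I’ve confirmed against the actual NYT files.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and OWL.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

# Hypothetical record: the URLs are placeholders, not real NYT,
# Freebase, or DBPedia identifiers.
SAMPLE_RECORD = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://example.org/nyt/person-0001">
    <owl:sameAs rdf:resource="http://example.org/dbpedia/Example_Person"/>
    <owl:sameAs rdf:resource="http://example.org/freebase/example_person"/>
  </rdf:Description>
</rdf:RDF>
"""


def same_as_links(rdf_xml):
    """Map each described subject URI to the URIs it is linked to via owl:sameAs."""
    root = ET.fromstring(rdf_xml)
    links = {}
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        targets = [
            element.get(f"{{{RDF_NS}}}resource")
            for element in description.findall(f"{{{OWL_NS}}}sameAs")
        ]
        # Skip descriptions that have no subject or no cross-links.
        if subject and targets:
            links[subject] = targets
    return links


if __name__ == "__main__":
    for subject, targets in same_as_links(SAMPLE_RECORD).items():
        print(subject)
        for target in targets:
            print("  same as:", target)
```

Anything serious would probably use a proper RDF library rather than a plain XML parser, since RDF/XML can express the same data in several different shapes. For a quick look at one record, though, the standard library is enough.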
The NYT release of these subject headings/tags helps pull three data sources together. I expect to see some great tools made from this. If you’d like to see how people are discussing extending and using the new release, you can check out the Linked Open Data community at http://groups.google.com/group/nyt_linked_open_data.