Playing with the National Data Catalog

I first read the announcement about the National Data Catalog over a week ago, but decided to play with it a little before I wrote about it. But first, what the heck is the National Data Catalog? From the site, located at “The National Data Catalog is an open platform for government data sets and APIs, making it easy to find datasets by and about government, across all levels (federal, state, and local) and across all branches (executive, legislative, and judicial).” At this writing, data is cataloged from, the District of Columbia, Utah, and the Sunlight Foundation. For once, a site I’m reviewing is not in beta! No no no. It’s in alpha. D’oh!

You can do a keyword search or you can browse the data, and I started with browsing. Simply browsing the data brings up about 1485 results, but you can narrow that somewhat by specifying jurisdiction, organization, source type, and release year. Available data sets are presented in a table which includes the dataset name and (usually) some kind of description, a star rating (if any; the ones I looked at usually didn’t have one), jurisdiction, organization, and formats. I saw all kinds of formats: XML, MAP, CSV, ATOM, XLS, ESRI, etc. (You can download data sets right from the browsing table if you like.) Data sets have their own detail pages, which also allow you to download the data and which have a little more information and space for comments.

Examples of data sets I found: School Election Districts in DC, Active Mines and Mineral Plants in the US, and gall bladder removals in Utah hospitals (huh?). It looks like the Federal government has the most data sets here.

While there’s plenty you can do with a CSV file and a spreadsheet, several of these formats require more intense manipulation. As you might expect, the National Data Catalog has an API along with a certain amount of documentation. Impressive, but naturally I want more data sets….

