Enigma Relaunches Collection of Public Data Sets

Enigma announced this week the relaunch of its public data set collection, Enigma Public. If you’re interested in data sets but haven’t found a site to explore, this is your chance. In this article I’m going to show you around a bit.

Enigma Public is available at https://public.enigma.com . You can get an account (more about that later) but you don’t need one to browse the various data sets. You’ll start out with a menu of public collections in broad categories.

Note that “United States” is its own category, so if you’re looking for US data sets, start there and don’t go jumping into Governments like I did.

Once you choose a collection category, you’ll get a list of organizations within that category. Choose one of those and you’ll get a list of available data sets.

In the case of the Museum of Modern Art (MoMA to the cool kids) there are two data sets available. Click on one and the rightmost panel will update with statistics and other data.

There’s a brief description and a link to the source, but what I really like here is that information is provided about the size of the data set, answering my question: “Can I explore this without getting hopelessly lost?”

Scroll down that rightmost panel a little and you’ll get information on the fields in the data set as well as a link to view it in an online viewer.

I will confess: I have a bit of a nerdcrush on this data viewer. There are a couple of things to look out for, though. Let’s take a look at the MoMA artists data set.

If you’ve used spreadsheets at all, this data viewer will feel familiar. You can sort by each column, hide columns, and filter data by string, even in multiple columns. Want to filter your data to include only female French artists who are still alive? You can do that!

… But keep an eye out for errors. The 0 in the death year column is supposed to indicate the artist is still alive, but that seems extremely unlikely for Odette Des Garets, who was born in 1891. Meanwhile, living artist Camille Henrot does not have anything in the birth year field, though the Artist Biography field notes she was born in 1978.

Data sets have errors but that doesn’t mean they’re useless. Just don’t accept everything unconditionally. (You can also report data errors to public-support@enigma.com .)
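If you’d rather hunt for this kind of problem in bulk than spot it by eye, here is a minimal Python sketch of a sanity check. It is not anything Enigma provides. The column names (name, artist_bio, birth_year, death_year), the 110-year age cutoff, and the reference year are all my own assumptions for illustration, and the two sample rows only encode what is described above (Odette Des Garets’s biography field is left blank because I don’t know its contents).

```python
import csv
import io
import re

# Illustrative sample only. Column names are assumptions, not the real
# Enigma Public headers; the values mirror the two cases described in the post.
SAMPLE_CSV = """name,artist_bio,nationality,gender,birth_year,death_year
Odette Des Garets,,French,Female,1891,0
Camille Henrot,"French, born 1978",French,Female,,0
"""

MAX_PLAUSIBLE_AGE = 110  # assumed cutoff for "probably not still alive"
REFERENCE_YEAR = 2017    # assumed "current" year for the age check


def to_year(value):
    """Return the value as an int, or None when it is blank, zero, or not numeric."""
    value = (value or "").strip()
    if not value.isdigit() or int(value) == 0:
        return None
    return int(value)


def find_suspect_rows(rows, reference_year=REFERENCE_YEAR):
    """Return (name, reason) pairs for rows whose year fields look wrong."""
    suspects = []
    for row in rows:
        birth = to_year(row["birth_year"])
        death = to_year(row["death_year"])
        if birth is None:
            # A biography such as "French, born 1978" may hold the missing year.
            bio_year = re.search(r"born (\d{4})", row["artist_bio"])
            hint = f"; bio suggests {bio_year.group(1)}" if bio_year else ""
            suspects.append((row["name"], "missing birth year" + hint))
        elif death is None and reference_year - birth > MAX_PLAUSIBLE_AGE:
            suspects.append((row["name"], f"marked alive but born in {birth}"))
    return suspects


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, reason in find_suspect_rows(rows):
    print(f"{name}: {reason}")
```

Both sample rows get flagged: one for being “alive” at an implausible age, the other for a blank birth year that the biography text can fill in.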

I had a little bit of an issue figuring out how to get rid of the filters. Look for the Matching Rows button in the lower right corner. Click on that and it’ll show you the active filters, and from there it’s just clicking an x to get rid of them.

It’s fun to use the Web-based data viewer, but what if you want to do more exploration on your own, or want to use one of these data sets as a base to build another set? If you have an account you can download an entire data set; in this case, the MoMA artists data set is about 1MB.
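Once you have a downloaded copy, the viewer’s multi-column string filter is easy to reproduce yourself. The sketch below uses Python’s standard csv module on a tiny inline sample so it runs as-is. The column names are my assumption rather than the real file’s headers, and the two “Placeholder Artist” rows are made up purely to give the filter something to exclude. For a real download you would read from the file you saved instead of the inline string.

```python
import csv
import io

# Inline stand-in for a downloaded file. Column names are assumptions, and the
# "Placeholder Artist" rows are invented for illustration only.
SAMPLE_CSV = """name,nationality,gender,birth_year,death_year
Camille Henrot,French,Female,,0
Odette Des Garets,French,Female,1891,0
Placeholder Artist A,French,Male,1950,0
Placeholder Artist B,American,Female,1920,1999
"""


def matches(row, **wanted):
    """True when every wanted column equals the row's value (case-insensitive)."""
    return all(
        row[column].strip().lower() == str(value).lower()
        for column, value in wanted.items()
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Same idea as the viewer filter: female, French, and a death year of 0,
# which the data set uses to mean "still alive".
living_french_women = [
    row["name"]
    for row in rows
    if matches(row, nationality="French", gender="Female", death_year="0")
]
print(living_french_women)
```

Note that Odette Des Garets still passes this filter, because it trusts the 0 in the death year column exactly as the online viewer does.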

You can also access data sets through the Enigma Public API; again, you will have to be registered, and you will need an API key (which you can get when you register). There’s pretty extensive documentation for the API functions at https://docs.public.enigma.com/index.html .

If you’ve been looking for an opportunity to get your feet wet with public data sets, this is a great way to start exploring. The only vital thing missing as far as I’m concerned is the book information set from the Library of Congress; cross your fingers!
