So what exactly does it do? Poligraft comes as a standalone Web site or as a bookmarklet. I’m going to do this writeup using the standalone Web site as it’s easier to show. When you visit a Web page or a news story that contains political content, you can run it through Poligraft. Poligraft will give you the story along with context in a sidebar — which lawmakers have been receiving political donations from whom, where aggregated donations from companies go, etc.
For example, take this article from The New York Times: “Education Department Deals Out Big Awards”. I can take that URL and copy and paste it at Poligraft. (I can also paste the contents of an article if I don’t have access to the URL.)
Poligraft reprints the article, but with an information bar on the left. In this case the information bar is showing where political donations from one individual went, and where aggregated donations from several institutions went (to Democrats or Republicans). The information presented in the bar is just a pie chart, which is a little misleading: you’ll note that all of Cornelia Grumman’s donations were to Democrats (well, her one $250 donation). Meanwhile Johns Hopkins University has well over a million dollars in aggregate donations listed for the last 21 years, but has the same kind of little pie chart.
Each chunk of data on the information bar has a page with more details. The Ohio State University page shows top politicians donated to, as well as money spent on lobbying and issues lobbied about. Many of the individual names in the report pages are clickable, leading you, if you wish, down a political wonk rabbit hole.
I myself am enough of a wonk to appreciate this as a tool, but not enough of a wonk to really know how to use it (I had to go through several political stories before I found one that provided a lot of information). I think as we get closer to the midterm elections it’ll be more useful as there will be more topical stories and more quotes from all sorts of organizations. Sunlight Labs is promising to add more data sets over time, too; I look forward to seeing that.
I first read the announcement about the National Data Catalog over a week ago, but decided to play with it a little before I wrote about it. But first, what the heck is the National Data Catalog? From the site, located at http://nationaldatacatalog.com/: “The National Data Catalog is an open platform for government data sets and APIs, making it easy to find datasets by and about government, across all levels (federal, state, and local) and across all branches (executive, legislative, and judicial).” At this writing data is cataloged from Data.gov, the District of Columbia, Utah, and the Sunlight Foundation. For once, a site I’m reviewing is not in beta! No no no. It’s in alpha. D’oh!
You can do a keyword search or you can browse the data, and I started with browsing. Just going to browse the data brings up about 1,485 results, but you can narrow that somewhat by specifying jurisdiction, organization, source type, and release year. Available data sets are presented in a table which includes dataset name and (usually) some kind of description, star rating (if any; the ones I looked at usually didn’t have one), jurisdiction, organization, and formats. I saw all kinds of formats: XML, MAP, CSV, ATOM, XLS, ESRI, etc. (You can download data sets right from the collection browsing if you like.) Collections have their own detail pages, which also allow you to do data downloads and which have a little more information and spaces for comment.
Examples of data sets I found: School Election Districts in DC, Active Mines and Mineral Plants in the US, and gall bladder removals in Utah hospitals (huh?). It looks like the Federal government has the most data sets here.
While there’s plenty you can do with a CSV file and a spreadsheet, several of these formats require more intense manipulation. As you might expect the National Data Catalog has an API along with a certain amount of documentation. Impressive, but naturally I want more data sets….
Thanks to reader APS for pointing me to The Straight Choice (http://www.thestraightchoice.org), a Web site containing almost 3500 (at this writing) election leaflets from UK general election candidates. The front of the site contains a list of latest leaflets found, the top parties, top constituencies, and campaign “not spots” (sorry, Aberdeen North.) You can also search the leaflets by postal code or browse them by party or category. There’s also a fairly substantial tag cloud of keywords.
I went looking at the parties, and found literally dozens — unfortunately some of the most interesting looking ones had no fliers associated with them. (The Dungeons Death and Taxes party?) I did find one flyer from the “Best of a Bad Bunch” party. The party pages for fliers contain links for getting an RSS feed or e-mail alert, and even embed codes if you want to feature a party’s leaflets on your own site. There’s a little data about where the leaflet came from and when it was uploaded to the site, and a few relevant categories listed.
The image quality of the leaflets themselves varied a lot — visitors are encouraged to scan or photograph leaflets and send ‘em on in — but all the ones I looked at were available in a large enough size that they were easily readable. I know this wasn’t the intention, but if you wander through a site with almost 3500 flyer designs you can learn at least a little something about layout.
As long as you’re looking at UK election campaign materials, drop by http://www.electionchampion.com, which is attempting to document election billboards. There’s a leaderboard where you can get points for taking pictures of and sending in billboards. Each billboard has some data about the associated campaign and a map of the area where it was found. At least one billboard I looked at seemed to have suffered a bit of annotation. The site is not as extensive as the leaflets site, but there seems to be more elections data here.
Google was a busy little bee last week. Another of the things it announced was a new Government Requests tool, which shows information about requests for user data or content removal from government agencies worldwide. (This is for Google and YouTube.) This iteration shows data from July through December of 2009, and there are plans to add data in six-month increments. The tool is available at http://www.google.com/governmentrequests/.
The site is basically a map with two menus. You can see government requests for data, and you can see requests for content removal. Brazil tops the list both times, not what I would have expected. There were many, many more requests for data than there were for content removal.
Countries with some data associated with them are tagged with numbers. Click on a country’s number and you’ll get a window showing how many requests were made for data and content removal. You’ll also see what percentage of content removal requests were complied with, and what kind of removal requests they were. A lot of Brazil’s were court orders. I wonder why South Korea had so many AdWords removal requests?
I want more context. I know Brazil has a lot of requests, but how many Brazil pages are in Google’s index? (using site:br as the measuring stick.) 193 million, approximately, at this writing. So that’s .66 content removal requests per million pages of indexed, country-code-specific content in Google’s index. Meanwhile, #2 Germany has 626 million pages (approximately at this writing) which means, what, .30 content removal requests per million pages of indexed, country-code-specific content in Google’s index? Pardon me while I math out for a while…
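For anybody who wants to math out along with me, here is a minimal Python sketch of that per-million-pages normalization. The function name is my own invention, the page counts are the approximate site: figures quoted above, and the request count is a made-up placeholder, not a number from Google's tool.

```python
def requests_per_million(requests, indexed_pages):
    """Return the number of requests per one million indexed pages."""
    if indexed_pages <= 0:
        raise ValueError("indexed_pages must be positive")
    return requests / (indexed_pages / 1_000_000)


# Approximate page counts from the site: searches described above.
indexed_pages = {"Brazil": 193_000_000, "Germany": 626_000_000}

# Placeholder only: swap in the real request count for each country.
placeholder_requests = 100

for country, pages in indexed_pages.items():
    rate = requests_per_million(placeholder_requests, pages)
    print(f"{country}: {rate:.2f} requests per million indexed pages")
```

Dividing by indexed pages is only one possible measuring stick; population or number of Internet users would give different rankings.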
Thanks to Data Surfer for the pointer to the Traffic Safety Legislation Tracking Database, which tracks information on bills and chaptered laws in all fifty states and DC. It covers legislation from 2007 through 2010 and it’s available at http://www.ncsl.org/?TABID=13599. (The last update was April 6 so I guess updates to the site are ongoing.)
You can search by state, keyword, year, primary sponsor, topic, status, or bill number. I did a search for texting and got results for 34 bills in 14 states. Results are arranged alphabetically by state and each item of legislation (some of them enacted, many failed) includes status, date of last action, and summary. There’s also a history that you can expand for more details.
There were surprisingly few topics to browse through; you’ll have more luck doing keyword searching, especially if you’re interested in the intersection (um, no pun intended) between electronic communications and driving. I did get a couple of weird results on test searches; searching for Internet got me information on “Internet Violence Prevention” legislation, and searching for domestic found enacted legislation about texting and driving that was simply called “Criminal Law.”
It’s a DATA BUFFET! Sunlight Labs announced last week the launch of TransparencyData.com, a site that is designed to let you query and download bulk data about government transparency. The big data set available at launch is campaign contributions at the state and federal level. TransparencyData.com is available at — ta-dah! — http://transparencydata.com/.
The front page has a 4-minute video that shows how TransparencyData.com works — below that is a search form that lets you specify what time cycle you want to find donations in and whether you’re looking for donations for or against a candidate. The top field lets you choose one of many other variables to search, including the amount, the contributor, the contributor’s state, the recipient’s name, the recipient’s state, etc. What’s not immediately obvious is that you can “stack” variables by using the pulldown menu over and over again. For example, I can find contributions of less than $10,000 to candidates in Florida.
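If it helps to think about "stacking" in code terms, each variable you add behaves like one more condition that every contribution has to pass. Here is a rough Python sketch of that idea. The field names and sample rows are hypothetical and are not taken from TransparencyData's actual schema.

```python
def stack_filters(rows, *predicates):
    """Keep only the rows that satisfy every predicate."""
    return [row for row in rows if all(test(row) for test in predicates)]


def under_10k(row):
    """True when the contribution is less than $10,000."""
    return row["amount"] < 10_000


def to_florida(row):
    """True when the recipient is in Florida."""
    return row["recipient_state"] == "FL"


# Hypothetical sample rows, for illustration only.
sample = [
    {"amount": 500.0, "recipient_state": "FL", "recipient_name": "Candidate A"},
    {"amount": 25000.0, "recipient_state": "FL", "recipient_name": "Candidate B"},
    {"amount": 750.0, "recipient_state": "GA", "recipient_name": "Candidate C"},
]

matches = stack_filters(sample, under_10k, to_florida)
for row in matches:
    print(row["recipient_name"], row["amount"])
```

Adding a third condition is just a matter of passing one more function, which is roughly what using the pulldown menu again does on the site.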
Once you’ve specified your contributions and selected “Preview Data,” TransparencyData will think about it for a few minutes and then present you with a preview of what your data looks like. You’ll get a table of results that shows several things including contribution amount, date, organization, and recipient.
Now, why is this data “previewed”? Because you’re not supposed to browse it on the site; instead, you’re supposed to check out this data and make sure it’s what you want to find, and then you download it. In this case, clicking the download data got me a CSV (Comma Separated Values) text file, suitable for opening in the spreadsheet program of your choice. The CSV file has literally dozens of fields, far more than you’ll see on the data preview page.
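If you would rather poke at the download with a script than a spreadsheet, Python's built-in csv module is enough to see which fields you got. This is a small sketch under assumptions: the column names in the sample are stand-ins based on the preview table, not the real headers, and the sample data is invented.

```python
import csv
import io


def summarize_csv(handle, preview_rows=3):
    """Return (field names, first few rows, total row count) for a CSV file."""
    reader = csv.DictReader(handle)
    preview = []
    total = 0
    for row in reader:
        if total < preview_rows:
            preview.append(row)
        total += 1
    return reader.fieldnames, preview, total


# Invented sample standing in for a downloaded file.
sample = io.StringIO(
    "amount,date,organization,recipient\n"
    "250.00,2009-03-01,Example Org,Candidate A\n"
    "1000.00,2009-04-15,Another Org,Candidate B\n"
)

fields, preview, total = summarize_csv(sample)
print(f"{len(fields)} fields, {total} rows")
print(fields)
```

For a real download you would pass an open file instead, something like open("contributions.csv", newline=""), where the file name is whatever you saved it as.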
There are so many fields available, in fact, you might appreciate spending a few minutes with the documentation page, which does a good job of explaining the schema. If you find you want access to large lumps of raw data, visit the bulk download page, but I hope you have a high-speed connection. Finally, if you’re a programmer, check out the REST API. I am loving that CSV response format.
What a great thing to read in the New York Times this morning! C-SPAN as you may know stands for Cable-Satellite Public Affairs Network, and is a set of networks that broadcasts nothing but government proceedings and public affairs programming. Now this network has taken “virtually every minute” (according to the New York Times article) of its archives and made them available on the Internet.
The archive Web site is available at http://www.c-spanvideo.org/videoLibrary/. The site currently has more than 160,000 hours of footage dating back to 1987. C-SPAN actually started in 1979, but according to the NYT article many of the early broadcasts are not available. There are about 10,000 hours of pre-1987 footage which, the article notes, will have to be formatted for the Web before it can go online.
The front page of the C-SPAN archives actually has many ways you can browse the video; you can look at the most recent video as well as the most shared and most e-mailed video in a variety of categories. There are a few articles, too, pointing to video content. But I always like to start with a nice simple keyword search. And I knew exactly what to search for.
Let me nerd out on you for a minute. I taped the Enron hearings. You remember Enron? The energy company that also generated massive amounts of bogus accounting? Yeah, them. They were the subject of Congressional Hearings in early 2002, so I did a search for that (maybe I can toss these tapes.)
The search results were divided up in several ways — as you can see from the screenshot I got results from people (Skilling, Lay, Watkins, etc.) and by program. You can sort programs by relevance, newest, or oldest. So while I did eventually find the Enron hearings, I also found Jeffrey Skilling testifying about electricity deregulation in 1997, and Ken Lay participating in a forum about energy regulatory issues in 1990. The individual pages for videos are nice — there’s an in-page player with links to embed the video on your own site if you like, and links for sharing on Facebook or Twitter. There’s also a list of related videos and of the people who are in the video.
Oh yes, people have their own pages as well, though sometimes the archiving is a little off (if you do a search for Yahoo you’ll get Jerry Yang in the list of people in Yahoo-related videos, but there’s also one appearance from that lesser known Yahoo co-founder, Jerry Young.)
Take Meg Whitman, former eBay CEO and current candidate for Governor of California.
Her page is at http://www.c-spanvideo.org/person/58256. Here you’ll find links to her latest appearances (and an RSS feed!), a list of people with whom she appears, a photo gallery, and a list of appearances by year. I wish they had a similar gallery for companies; I would love an RSS feed of a specified company’s representatives appearing before Congress.
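Until a company feed exists, one workaround would be to pull a person's RSS feed and filter the items by keyword yourself. Here is a rough sketch using Python's standard library. It assumes a plain RSS 2.0 layout with title and link elements; I have not checked what the C-SPAN feeds actually contain, and the sample feed below is invented.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; a real feed may carry different elements.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example appearances feed</title>
    <item>
      <title>Example Hearing on Online Commerce</title>
      <link>http://example.com/video/1</link>
    </item>
    <item>
      <title>Example Forum on Energy Policy</title>
      <link>http://example.com/video/2</link>
    </item>
  </channel>
</rss>"""


def feed_items(xml_text):
    """Return a list of (title, link) pairs, one per item in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def filter_items(items, keyword):
    """Keep the items whose title mentions the keyword, ignoring case."""
    keyword = keyword.lower()
    return [item for item in items if keyword in item[0].lower()]


for title, link in filter_items(feed_items(SAMPLE_FEED), "energy"):
    print(title, link)
```

Matching on titles alone will miss appearances where the company is not named in the title, so treat this as a stopgap.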
There were a couple of other disappointments with the video archives as well. Many of these videos are fairly long — an hour plus. Many of the videos I looked at did have transcripts with time stamps, so if you wanted to find something in the video you could go through the transcript, find it, and then pull the player slider to the appropriate timestamp. But I wish that appearances of people had been marked in some way so you could jump to different places in the video.
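The transcript-then-slider routine is easy enough to picture as code. This sketch searches a list of timestamped transcript lines for a keyword and converts each matching timestamp to seconds. The transcript format and the sample lines are my own assumptions, not something exported from the C-SPAN site.

```python
def timestamp_to_seconds(stamp):
    """Convert 'HH:MM:SS' or 'MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def find_mentions(transcript, keyword):
    """Return (timestamp, seconds) for each line that mentions the keyword."""
    keyword = keyword.lower()
    return [
        (stamp, timestamp_to_seconds(stamp))
        for stamp, text in transcript
        if keyword in text.lower()
    ]


# Invented transcript lines, for illustration only.
transcript = [
    ("00:00:05", "The committee will come to order."),
    ("00:42:10", "Let me turn to the accounting questions."),
    ("01:15:30", "The accounting firm was not consulted."),
]

hits = find_mentions(transcript, "accounting")
for stamp, seconds in hits:
    print(f"Drag the slider to {stamp} ({seconds} seconds in)")
```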
Overall, though, the video pages are nicely organized with a ton of information, the people pages have RSS feeds (RSS for search results — how cool would THAT be?), and the promise of more, older archives to come. I can’t decide who’s going to have more fun with this archive — Jon Stewart or The Gregory Brothers.