The Library of Congress has launched a new crowdsourcing platform for transcribing the vast number of documents in its collection. As the new platform is supposed to start yielding transcribed documents early next year, I get the impression that the LoC is hitting the ground running on this one. Let’s take a look.
The site is available at https://crowd.loc.gov/ , and it’s apparently in beta. If you’ve ever done any crowdsourcing, you know that the work is usually divided into projects. LoC Crowd works the same way, though it calls them campaigns; the site has launched with five:
- Branch Rickey: Changing the Game
- Civil War Soldiers: “Disabled but not disheartened”
- Clara Barton: “Angel of the Battlefield”
- Letters to Lincoln
- Mary Church Terrell: Advocate for African Americans and Women
If any of this rings a bell, it might be because some of it is fairly recent: Branch Rickey’s scouting reports went online this past April. As you might imagine, the spotlight project is Abraham Lincoln’s letter collection (the LoC is hoping to get all 28,000 pages transcribed by the end of the year), but I’m going to walk through the Branch Rickey project.
The Branch Rickey project page is at https://crowd.loc.gov/campaigns/branch-rickey-changing-the-game/ . I was kind of shocked at how bare it was. There’s a paragraph about Mr. Rickey and external links to the collection, the release announcement, and a timeline of his life. The actual link to the items that need transcribing is on an image at the bottom of the page.
Click on that and you’ll get to https://crowd.loc.gov/campaigns/branch-rickey-changing-the-game/scouting-reports/ , which lists the scouting reports that need transcription. Again, there’s not much in the way of instructions here. I clicked on the first set of papers, Branch Rickey Papers: Baseball File, 1906-1971; Scouting reports; 1951; A-F. A helpful bar at the top of these pages told me that the pages were 100% in progress and 97% complete. It would be nice if that bar appeared on the previous page as well, so I could focus on finding page sets that were less complete.
At any rate, I tried again, this time with Branch Rickey Papers: Baseball File, 1906-1971; Scouting reports; 1963; M-O. This set was marked as only 2% complete, but when I looked, it turned out that the vast majority of the pages were under review. So I skipped that one and tried once more, with Branch Rickey Papers: Baseball File, 1906-1971; Scouting reports; 1963; A-B. This was marked as 0% complete, but again many pages were in review.
The papers in each set are presented as a series of images. Each one is labeled Transcribe if it still needs to be transcribed, or Review if it has already been transcribed.
Pick an image and you’ll get a page with the image on the left and a transcription box on the right. Then you start typing. There are controls on the image side for panning and zooming, which you’ll probably need with some of these images: the typing is faded and some of the letters aren’t clear. While there is a link to some quick hints at the bottom of the page, I was a bit taken aback at how little guidance there was.
You don’t have to be logged in to contribute, so I quickly typed up a transcription of image #68. When I clicked Save, I had to solve a really obscure CAPTCHA; I’m glad they give second chances, because it took me three tries. (Sometimes the lines obscuring the letters aren’t good for actual humans either.)
Once you’ve successfully submitted the transcript, you’ll get a confirmation note. If you are logged in, you’ll also get the option to tag the transcription with information about teams, players, locations, and other relevant details. As I was not logged in, that option was grayed out.
While this new crowdsourcing platform from the Library of Congress is good in many ways (it’s responsive, loads quickly, and has good tools for pulling up the materials to be transcribed), I was disappointed in parts of its user interface.
It’s sometimes hard to find materials that haven’t been transcribed, so you end up paging through a lot of material that has already been transcribed and is waiting for review. I know there are links to transcription hints at the bottom of the page, but it would help to make them more front-and-center. It would also be good to give the transcription projects a bit more structure so that newer volunteers find the platform friendlier.
For example? For example.
The Danish Butterflies and Moths 2020 project on Zooniverse walks participants through a multi-step process. It is so easy and friendly that people who neither speak nor read Danish (like me) can easily contribute. And as you might notice from the screenshot, the project is already done.
The Smithsonian’s Phyllis Diller project (also complete) uses large visuals to make clear which jokes have been completed, and puts its help materials more front-and-center.
I’m not saying the new platform from the Library of Congress is bad; it isn’t. I am saying that now that the Library of Congress has a crowdsourcing platform, there’s a great opportunity for citizens to get involved in crowdsourcing and to learn more about the United States’ history and culture in general. Because of that, it would be useful to see more help and more step-by-step guidance to get potential volunteers engaged and productive.