The Library of Congress (LOC) has started a Web site devoted to its program to capture and preserve what it calls “historically important” Web sites. You can check it out at http://www.loc.gov/webcapture/.
As you’ll see when you first visit the site, the LOC has actually been active in Web capture and preservation since 2000, and has collections of election Web sites for 2000 and 2002, as well as sites about the events of September 11.
If you take a look at the current project list, you’ll see that while several projects are organized by date, several others are more thematic. Current projects include Election 2006, the Papal transition, Prints and Photographs Acquisitions, and a “General Collections Archiving Pilot,” which covers several months of collecting sites that are not organized around a specific theme. Nerds might want to check out the technical information page, which not only describes some of the open source tools the LOC is using but also announces that the LOC is developing a tool that will be made available as open source in “late 2006.”
It might seem counterintuitive that the Library of Congress is putting so much effort into Web archiving when there’s a perfectly good Internet Archive which has archived literally tens of billions of Web pages. But given both the amount of material to be archived and the different ways it can be archived (all the different project lines the LOC is coming up with), the more the merrier.
I am a little concerned — this is going to sound incredibly lowbrow — I am a little concerned that some of the popular culture aspects of the Internet are going to be missed with these preservation programs. Of course, there are some things that for the good of humanity should probably be missed and kept missing. But on the other hand the Internet has been evolving now for over ten years, and there are Web sites that one could argue were either pivotal in the development or showed the evolution of the Internet as a power.
For example, take the Blair Witch Project site. Nowadays everybody talks about viral marketing and buzz and all that, and the Blair Witch Project site was one of the first Web sites to help build buzz about a project. Or some of the early “Soap Opera” type sites, like The Spot. Or early corporate sites, like Amazon back when it had a TBBS interface. Or, more recently, the guy who sold advertising by the pixel on a Web site. These are things that point out and play up why things have changed, or when they changed, or how they changed, but don’t fit neatly into a project template.
Maybe we’ll have to wait until 2025 when college students are doing their theses on “Pivotal Events In the Early Development of the Internet” and start putting together these kinds of collections…