The US Patent and Trademark Office announced today that it was entering into a two-year, no-cost agreement with Google to make bulk electronic patent and trademark public data available. The USPTO provides the data, and Google hosts it for the public.
By the USPTO’s estimate, this is going to be about ten terabytes of data. It includes patent grants and applications, trademark applications, and patent and trademark assignments, with more data (like trademark file histories) available in the future. Google’s hosting this data right now; you can see it at http://www.google.com/googlebooks/uspto.html.
Google divides the hosted files into two pages: patents and trademarks. Google notes that it is only hosting the data provided by the USPTO; it isn’t altering it or changing it in any way. And this is BULK hosting. The patent data download page lets you choose between several different types of information (Grant images, Grant full text, Grant bibliographic data, Published applications, Assignments, Maintenance fee events, USPTO Red Book and Classification information), but when you pick one of them you’ll get a list of years and a list of files. I downloaded a patent assignment file and looked at it. It was a small XML file.
I looked at the trademark data page, which consists of Grants & applications, 1870-2008, Recent applications, Recent assignments, and Trademark Trial and Appeal Board decisions. Recent assignments was another huge set of ZIP files for the last three years. I downloaded one at random and opened it. It’s an XML file with information about the schema at the top and a bewildering array of text below, using that schema, that was meant for text parsers or bots, not people.
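If you do want to point a parser at one of these downloads, here is a minimal Python sketch of a first step: open a ZIP file, stream through any XML inside it, and count how often each element name appears, which gives a rough map of the schema without reading it by eye. The function names (`count_tags_in_zip`, `make_sample_zip`) and the tiny sample archive are my own assumptions for illustration; I have not mirrored the actual USPTO schema or file names here, so treat this as a starting point rather than a tool built for these specific files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags_in_zip(zip_source):
    """Count element names across every .xml member of a ZIP archive.

    zip_source may be a path to a downloaded file or a file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            with archive.open(name) as member:
                # iterparse streams the XML, so a very large file is
                # processed piece by piece instead of loaded all at once.
                for _event, element in ET.iterparse(member, events=("end",)):
                    counts[element.tag] += 1
                    element.clear()  # release text and children we no longer need
    return counts


def make_sample_zip():
    """Build a tiny in-memory ZIP (hypothetical data, not a USPTO file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "sample.xml",
            "<records><record><id>1</id></record>"
            "<record><id>2</id></record></records>",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for tag, total in count_tags_in_zip(make_sample_zip()).most_common():
        print(tag, total)
```

To try it on a real download, pass the path of the ZIP file you saved to `count_tags_in_zip` instead of the sample archive.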
I’m kind of surprised that Google is making this available as ZIP files; maybe it wants you to download it to your own machines before you start slicing and dicing it. I don’t do a lot of trademark and patent research; all I know is that there’s a LOT of data here, and according to the press release there’s going to be even more.