I was going to start out this writeup with an obscure reference to Jane Siberry and "Lena Is a White Table," but I decided hmm, better not. (And her name is Issa now; I missed that.)
What was I talking about? Oh yes. Penn State developers have come up with a new search engine. TableSeer allows searchers to identify and extract tables from PDF documents. The engine will also rank table results based on factors relating to the table, like title and date of publication. Check it out at http://chemxseer.ist.psu.edu .
That URL is actually for ChemxSeer, but it does have a link to TableSeer. The direct link is http://search.ist.psu.edu:8085/tableidx.jsp , but my attempts to access that page repeatedly timed out, so I can’t give you any direct feedback about the search engine.
I can point you to a PDF about TableSeer, available at chemxseer.ist.psu.edu/about/digital_library/Liu-JCDL2007.pdf . I can point you to the press release about TableSeer, which includes some interesting stats about tables in papers, at http://www.psu.edu/ur/2007/tableseer.htm . But I can’t actually use the engine! Note to self: check this later….