This one is for the text nerds! Oh, and the SEO people. You too. Topicalizer lets you enter a block of text or a URL from which text is to be extracted, and gives you a variety of statistics on that text. Check it out at http://www.topicalizer.com/ .
Prompted to enter either a URL or a block of text, I entered http://www.poemuseum.org/selected_works/print_telltale.html , which is a fairly plain copy of Poe’s The Tell-Tale Heart. (If you’re going to analyze URLs, make sure they have as little extraneous text as possible, because everything on the page gets thrown into the hopper for analysis.) Once you do that and give Topicalizer a few seconds, you’ll get a dizzying array of information about that particular block of text.
Like what? Tons of stuff, including word count, average number of words per sentence and per paragraph, most frequent words and phrases (the phrases vary in length), longest and shortest sentences, three different estimates of readability, suggested keywords, and an abstract. Now, for a short story the abstract doesn’t produce anything very comprehensible, but for the nonfiction pages I tested it wasn’t bad.
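If you're curious what a few of those numbers involve, here is a minimal Python sketch that computes word count, average words per sentence, the most frequent words and phrases, and the longest and shortest sentences. To be clear, this is not how Topicalizer does it; the function name `text_stats`, its parameters, and the naive sentence splitting are all my own assumptions for illustration, and the sample text is made up.

```python
import re
from collections import Counter


def text_stats(text, top_n=3, phrase_len=2):
    """Rough text statistics (hypothetical helper, not Topicalizer's method)."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    # Lowercased word tokens; straight apostrophes are kept inside words.
    words = re.findall(r"[a-z']+", text.lower())
    # Phrases are runs of phrase_len consecutive words
    # (they ignore sentence boundaries in this sketch).
    phrases = [" ".join(words[i:i + phrase_len])
               for i in range(len(words) - phrase_len + 1)]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "top_words": Counter(words).most_common(top_n),
        "top_phrases": Counter(phrases).most_common(top_n),
        # Longest and shortest are measured in characters here.
        "longest_sentence": max(sentences, key=len, default=""),
        "shortest_sentence": min(sentences, key=len, default=""),
    }


sample = "The cat sat. The cat ran away. Why did the cat run?"
stats = text_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Readability scores, keyword suggestions, and abstracts take considerably more work than this, which is part of what makes a tool like Topicalizer handy.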
The main part of the site is interesting, especially if you want to get an idea of the general structure of a chunk of text. But Topicalizer also has a tools page. (Some of these tools require a block of text; you can’t use a URL.) From this page you can do several things, including finding similar documents and doing augmented keyword extraction (REALLY augmented, like getting “fibrocystic disease of the breast” from the first paragraph of The Tell-Tale Heart).
Topicalizer has both a FAQ and a blog. An API is also available. VERY interesting.