Three cheers to Anne F, who let me know about the new Chicago Foreign Language Press Survey from the Newberry Library. It’s available at
The Chicago Foreign Language Press Survey was actually published over 70 years ago; the Newberry Library has brought it into the 21st century. Here’s how the site describes it: “The Chicago Foreign Language Press Survey was published in 1942 by the Chicago Public Library Omnibus Project of the Works Projects Administration of Illinois. The purpose of the project was to translate and classify selected news articles that appeared in the foreign language press from 1855 to 1938. The project consists of 120,000 typewritten pages translated from newspapers of 22 different foreign language communities of Chicago.”
There are over 48,000 articles in the collection. They can be searched by keyword, browsed by groups (groups include Albanian, Filipino, Lithuanian, Croatian, and Slovak), browsed by year (1855-1940), browsed by “Codes” (this is a tree of subject headings, and a huge one), or browsed by source (there are over 400, from the 1933 World’s Fair Weekly to Zwei Jahrhunderte Chicago).
The subject matter spans a great deal, but there’s a lot to be found on the topics of immigration laws, assimilation, education, economics, and social mores. I found many interesting articles just searching for the names of figures of the time. A Russian newspaper wrote a very kind eulogy to Will Rogers in 1935, while in a Lithuanian newspaper I found a reference to a letter from Upton Sinclair (though, sadly, not the letter itself.)
I did a search for computer and got 45 results, mostly because the search engine was matching on things like compute. Attempts to alleviate this by searching for “computer” and +computer didn’t work; in fact, they made the results a lot worse. So be sure to use very precise or, ideally, multiple keywords when you search this resource.
That aside, I love the elegance of the results page. A permanent link to the search results is available at the top of the page. After that there are summaries of matching articles along with information about the original language, source, and date. Click on a summary for the full article, and, beneath the full article, images of the cards from which the article came. Clicking on the headline of the article took me to a direct link to the article with a little additional information, including the article and its information in raw XML.
Though the articles were translations, I did not find them awkward or difficult to read. I did find myself at times interested in a particular source, but didn’t find any additional information at Newberry. Going to the LOC’s historical US Newspaper Directory got me more data about titles. One time it didn’t have the title I was looking for (Cesky Odd Fellow), but it did have a similar title (Cesky republikan) which was also in Chicago.
With the wide matching that the keyword search does, you might have to do some experimental searching before you get the best results, but even a casual browse here turned up fascinating historical material.
Thanks to Drake Library for the pointer to Native Languages of the Americas, available at http://www.native-languages.org/. This site contains over two thousand pages of information about several hundred Native American languages. The site’s been around for a long time, so I’m not giving you a cutting edge site, but what a huge wealth of data.
There’s actually a lot of information about various Native American tribes here, but the center of the site is a master list of Native American languages, from A’ananin to Zuni. Information about the languages varies from just a couple of sentences (Wazhazhe) to multiple paragraphs divided into several topical headings (Blackfoot).
Along with an overview, each language page has links to language resources, language lessons, literature and texts, course descriptions, language archives, and tapes. Of course how much is available varies by language, but I was pleasantly surprised that even an extinct language like Mattole had at least a few links available.
To get a sense of the wide variety of materials available on this site, visit its site map. I especially liked the “IAQ” (Infrequently Asked Questions) right next to the Frequently Asked Questions. Plenty to see here.
Google announced last week some updates to its translate project. I normally don’t use Google Translate outside the regular Web interface, so I’m sure I’ve missed a lot as it’s evolved.
Google Translate lives at http://translate.google.com/. It looks a bit different from what I remember, with more access to tools “up front” and an overt language autodetect for anything you might type into the search box. And of course you can translate a document as well. Over fifty languages are available for translation. But as I got further into looking at the new Google Translate, I discovered that all the good stuff, once again, lies outside of Google Translate’s home page.
For example: Google Translate Search. When you run a search you can choose to have results from Google Translate Search, which you can find at the very bottom of the search options toolbar. Google will decide which languages are appropriate for your search, run the search, report which languages it used, and translate the search results for you.
I searched for pierogi. Google decided I should have results from Polish and Lithuanian searches, and gave me translated pages of results. I didn’t dig too far into them, but the snippets indicated perfectly acceptable translations by machine-translation standards (of course, they were mostly recipes.) I was amused to note that one of the results was the Polish Wikipedia translated into English. (There were far, far more Polish results than Lithuanian.)
If you don’t want to get that deeply into non-native-language search, keep an eye out for the Translate this Page links by the search results. Whenever you find a page that’s not in your native language, Google will give you a translation link.
Finally, Google has made Google translation a part of its shortcuts. You can do short translations from the Google search box. I found things like hello in German worked fine, but sometimes I had to specify using the word translate, like the query translate how are you in Swahili.
Note that you only get the translation. If you want to know what it sounds like, you’ll have to click on the link and go to the Google Translate page, where you’ll get a link to hear the translation.
I’m going to find the translate shortcut useful, but Google Translate Search’ll be pretty fun too, if I can remember to use it.
Thanks to HeraldScotland.com for the pointer to a new guide on Gaelic place names.
The National Gazetteer of Gaelic Place Names is located at http://www.ainmean-aite.org/ and is available in English and Gaelic. Currently it contains information on about 1000 Gaelic place names throughout Scotland. You can do a simple search by keyword, an advanced search (across several fields), or view all place names from A-Z.
I did a search for Glasgow. I got an information page showing the Gaelic name (Glaschu) and meaning, along with details about the place including location and local authority, elements (“G/P glas, ‘green, grey’ + *cu, ‘hollow’”), and pointers to external resources and more information.
In addition to the Gaelic names database the site also has some Gaelic maps, guidelines to Gaelic place names and orthography, a link list, and a blog (with one entry so far.)
Ever needed example sentences translated into lots and lots of languages? Here you go. Tatoeba (http://tatoeba.org/) is a database of sentences translated into many (over 40) languages. It’s in beta, which means that some of the features (like audio pronunciations of example sentences) are woefully underdone. But there’s still material for language lovers.
The front page does have a random sentence function, but the search box is at the top of the page. Specify from what language to what language you want to search, and any keywords you want to include. I searched from English to Any and searched for the keyword Hello. I got 58 results, from Hello? Are you still here? to Hello, it’s me, Nancy! I must say some of these sentences had plenty of personality (“Hello, what’s that? Somebody doing street theatre or something?”) so play around with your keywords.
Search results show the sentence in the language you’re searching from, and then in all the other languages available. Some sentences had just the English version and one other version (usually Japanese or French.) Other sentences had results in French, German, Polish, Vietnamese, Czech, Arabic, Portuguese… Each sentence has its own standalone page, where you can (if you’re registered) post a comment. (Registration also allows you to contribute to the Tatoeba site.) There’s also a log of changes to the sentence. Sometimes you’ll see that sentences are also “owned” by people. If sentences are not owned, you can “adopt” them and make changes yourself.
You’ll notice that most sentences have an audio icon next to them and in almost all cases that icon is marked with a red slash. That’s because while there is audio pronunciation on the site, it is at the moment extremely limited, so for the most part you will not see it as an available option for the sentences.
I liked the idea of a database like this — languages broken down into simple sentences, with many available — but I didn’t have a lot of hope for the sentences themselves. I am happy to report I was wrong; I love a database that has sentences like “Math is like love – a simple idea, but it can get complicated.” Or the vaguely sinister Bring everything to ruin. I found that the search engine was the most useful way to explore the site, but you might like the lists of topical sentences created by users or even, if you just want to browse, the random sentence option.
Of course I’m looking forward to the audio but the site has a lot to offer already.
The Official Google Blog announced yesterday that Google Translate was getting more text-to-speech translation options. English and Haitian Creole were the initial languages, and French, Italian, German, Hindi, and Spanish were added a couple weeks ago (I musta missed that!)
Google Translate has added the speech synthesizer eSpeak, which is adding text-to-speech for Afrikaans, Albanian, Catalan, Chinese (Mandarin), Croatian, Czech, Danish, Dutch, Finnish, Greek, Hungarian, Icelandic, Indonesian, Latvian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Swahili, Swedish, Turkish, Vietnamese and Welsh. (Does this strike anyone else as kind of an odd selection of languages? Where’s Japanese, for example? Why have Icelandic and not, say, Arabic or Hebrew? No offense intended to anybody’s language.)
You can try this for yourself. Google Translate’s URL is http://translate.google.com/, while if we wanted to get an English to Hungarian translation for good food, the URL would be http://translate.google.com/#en|hu|good%20food. (By the way, that URL is gorgeous. I love that structure.) There’s a little speaker icon by the translation; click on it and a rather tinny machine voice will tell you a jó étel. You can contribute a better translation, but that’s for text only, not speech.
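If you wanted to build those URLs yourself, here is a minimal Python sketch of the pattern shown above: the base address, a # sign, then the source language code, the target language code, and the text, separated by pipes. The function name translate_url is my own invention for illustration, and the sketch only assumes the URL structure as it appears in this post. It does not call any Google API, and Google may change or drop this format at any time.

```python
from urllib.parse import quote

# Base address as given in the post.
BASE_URL = "http://translate.google.com/"


def translate_url(source_lang: str, target_lang: str, text: str) -> str:
    """Build a URL in the '#source|target|text' form shown in the post.

    translate_url is a hypothetical helper name, not part of any Google library.
    """
    # Percent-encode the text so that spaces become %20, as in the example URL.
    encoded_text = quote(text, safe="")
    return f"{BASE_URL}#{source_lang}|{target_lang}|{encoded_text}"


# The English to Hungarian example from the post.
print(translate_url("en", "hu", "good food"))
```

Swap in other language codes and phrases to generate links for whatever pairs you want to try.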
What I wanted to do after listening to this was go to Forvo and see how the pronunciation compared to a human’s pronunciation, but while Forvo had plenty of Hungarian words I couldn’t find either of those particular Hungarian words. Machine translation always makes me a little nervous, while machine pronunciation makes me slightly less nervous but still concerned. Google’s making all these languages available is a huge step forward, but I wish I had something with which I could compare these translations…
Thanks to Lifehacker for the pointer to the U.S. Foreign Service Institute language courses online. They’re free and available at http://fsi-language-courses.org/Content.php. Unfortunately I wasn’t the only one who saw this announcement, not by a long shot. Apparently traffic from Lifehacker overwhelmed the site’s servers and the site had taken down some downloadable materials, though every language I looked at had at least PDFs of course material available.
The site covers 41 languages, from Amharic to Yoruba. Pick a language from the list on the left, click it, and you’ll get a list of student materials on the right. (In the case of the screen shot I chose Bulgarian.) There are student texts (available in PDF) and teaching tapes (available in MP3.) There was almost no annotation for the materials. To stop the site from being overloaded again, you may wish to just download one section or text at a time.
If there’s not enough here, check the OffSite page where you will get pointers to other language lessons, including Polish, Persian, and Dari.
If that’s REALLY not enough, you may wish to explore the following sites for more free language lessons: