Example Sentences? There’s a Database for That

Ever needed example sentences translated into lots and lots of languages? Here you go. Tatoeba (http://tatoeba.org/) is a database of sentences translated into many (over 40) languages. It’s in beta, which means that some of the features (like audio pronunciations of example sentences) are woefully underdone. But there’s still material for language lovers.

The front page has a random sentence function, and the search box is at the top of the page. Specify which language to search from, which language to search to, and any keywords you want to include. I searched from English to Any for the keyword “Hello” and got 58 results, from “Hello? Are you still here?” to “Hello, it’s me, Nancy!” I must say some of these sentences had plenty of personality (“Hello, what’s that? Somebody doing street theatre or something?”) so play around with your keywords.

Search results show each sentence in the language you’re searching from, followed by the same sentence in every other language available. Some sentences had just the English version and one other version (usually Japanese or French). Other sentences had results in French, German, Polish, Vietnamese, Czech, Arabic, Portuguese… Each sentence has its own standalone page, where you can (if you’re registered) post a comment. (Registration also allows you to contribute to the Tatoeba site.) There’s also a log of changes to the sentence. Sometimes you’ll see that sentences are also “owned” by people. If sentences are not owned, you can “adopt” them and make changes yourself.

You’ll notice that most sentences have an audio icon next to them and in almost all cases that icon is marked with a red slash. That’s because while there is audio pronunciation on the site, it is at the moment extremely limited, so for the most part you will not see it as an available option for the sentences.

I liked the idea of a database like this (languages broken down into simple sentences, with many available) but I didn’t have a lot of hope for the sentences themselves. I am happy to report I was wrong; I love a database that has sentences like “Math is like love – a simple idea, but it can get complicated.” Or the vaguely sinister “Bring everything to ruin.” I found that the search engine was the most useful way to explore the site, but you might like the lists of topical sentences created by users or even, if you just want to browse, the random sentence option.

Of course I’m looking forward to the audio, but the site has a lot to offer already.

About Tara Calishain

Covering the world of search engines, databases, and other online information collections since 1996.

Posted on May 31, 2010, in Uncategorized. 2 Comments.

  1. Sharon Hockensmith

    I’m usually really enthusiastic about your incredible web finds and so I’m a great fan, but I have to say, Tatoeba is a miss for me. I travel a lot so I like language web sites. But so many of the English language sentences on this site are grammatically incorrect, I don’t think I trust it to generate a foreign language sentence that wouldn’t have similar errors. Perhaps user input will make corrections over time, but right now Tatoeba isn’t ready for “prime time.”

  2. Hi, I’m one of the guys behind Tatoeba,

    I’ve just discovered your article; glad you like our website. As for the audio: our policy is “recorded by a native speaker with a good mic” only, and it’s a feature we added a month ago, so it is still something “rare”. Moreover, the current audio is available only in Chinese, French, Shanghainese and Dutch, but the number of recorded sentences and the number of languages will continue to grow.

    @Sharon,

    As for the English mistakes, yeah, we all admit there are a loooot of them, but there is a reason: most of these sentences come from a previous project and were in fact written by Japanese students to form the Tanaka Corpus, so all these sentences are really likely to contain mistakes.

    All these sentences were imported into Tatoeba a long time ago, at the very beginning of the project (in 2007 or something like that), so basically 90% of the Japanese and English sentences come from this corpus.
    You will note that all these sentences have no owner; that’s basically how you can tell whether a sentence is “reliable” or not:
    if it belongs to someone => reliable
    if it doesn’t belong to someone => unreliable

    But all the other sentences have been added within Tatoeba, by Tatoeba contributors. All of them add sentences in their native languages, and we have a lot of discussions along the lines of “Is my translation good? Does it carry the full meaning of your sentence?” and so on. Moreover, the English sentences are most of the time not used as the “source” sentences.

    So for the moment, reviewing all these English sentences is still one of the major tasks in Tatoeba.

    But I still think Tatoeba is very useful for language learners because:
    1 – for people who don’t care about English (a Chinese speaker who wants French/Chinese sentences, for example), all the other languages are maintained by native speakers, so they can trust the sentences 99%.

    2 – for a native English speaker learning a foreign language, English is their native language, so they will be able to get past the English mistakes; the important thing for them is to have reliable foreign sentences.

    3 – for a non-native English speaker who wants to use Tatoeba to find English sentences, yeah, I admit this is our current “fail” segment, but they can still find reliable results by trusting only “owned” sentences, or by posting a comment on any sentence that seems weird.

    I think this is really a strong point of Tatoeba: the community. If you ask for clarification about a sentence, even in some “rare” languages (Uighur / Estonian, for example), there will always be someone to answer you nicely. And if it’s about a “weird” English sentence, it lets us correct that sentence, and everyone benefits:
    1 you will get clarification about the grammar point/usage etc.
    2 people who search after you will find a corrected and commented sentence

    Anyway, the project is growing really fast; we’re literally adding/improving features every week (you can follow them on our blog http://blog.tatoeba.org/ )

    So I hope I haven’t written too much and haven’t made too many mistakes (I’m French and my English sucks :( ), and I hope our constant improvements to Tatoeba will make you feel that it will soon be ready for “prime time” :)
