Monday February 5, 2007

Fold the Exalted Dog

This quarter I'm working on a machine-translation project.  For starters, we're working with a set of seventeen sentences that exercise some simple grammatical phenomena.  I don't speak most of the languages that have landed on my plate, so as a first pass I've been running the sentences through machine translation systems on the web.  I realize it's old news that round-trip translations are funny, but the results for the English-Korean-English loop are especially dreadful.

Here are the results produced by Google's language tools.  The first line is the original English, the second line is the Korean translation, and the third is the Korean sentence translated back to English:

  1. Dogs sleep.
    개는 잔다.
    It folds and it sleeps.
  2. Dogs chase cars.
    개는 차를 쫓ㄴ다.
    When it folds [ccoch] all.
  3. I chase you.
    나는 당신을 쫓ㄴ다.
    I you [ccoch] all.
  4. These dogs sleep.
    이 개는 잔다.
    Folds and it sleeps.
  5. Dogs eat.
    개는 먹는다.
    It folds and it eats.
  6. I can eat glass.
    나는 유리를 먹어서 좋다.
    I eat a separation and.
  7. It doesn't hurt me.
    그것은 저를 낙상하지 않는다.
    It does not hurt me.
  8. The dogs chase cars.
    개는 차를 쫓ㄴ다.
    When it folds [ccoch] all.
  9. I think that you know that dogs chase cars.
    나는 개가 차를 쫓ㄴㄴ다는 것을 당신이 알고 있다고 생각한다.
    I think the triumph car that [ccoch] you are knowing the fact that it wears out.
  10. I ask whether you know that dogs chase cars.
    나는 개가 차를 쫓ㄴㄴ다는 것을 당신이 알고 있는지 질문한다.
    I [ccoch] you should have been knowing the fact that it wears out, question the triumph car.
  11. Cats and dogs chase cars.
    고양이와 개는 차를 쫓ㄴ다.
    With the cat when it folds [ccoch] all.
  12. Dogs chase cars and cats chase dogs.
    개는 차를 쫓고 고양이는 개를 쫓ㄴ다.
    It purses when it folds exaltation the dog [ccoch] all.
  13. Cats chase dogs and sleep.
    고양이는 개를 쫓고 잔다.
    Exaltation it purses the dog and it sleeps.
  14. Do cats chase dogs?
    고양이는 개를 쫓는가?
    Exaltation it purses the dog?
  15. Chase the dog!
    개를 쫓십시오!
    Dog [ccoch] ten:00 five!
  16. Hungry dogs eat.
    배고픈 개는 먹는다.
    It is hungry and it folds and it eats.
  17. Hungry dogs eat quickly.
    배고픈 개는 빨리 먹는다.
    It is hungry and it folds and it eats quickly.

Now, I should point out that I speak absolutely no Korean.  I can't even read Hangul.  Nonetheless, I can say with confidence that these results are bad, bad, bad.  Some of the English results are flatly ungrammatical (e.g. "I eat a separation and").  The noun "dog" has, more often than not, been turned into a form of the verb "fold", while many of the sentences have mysteriously gained the word "exaltation".  And what's that [ccoch] everywhere and "ten:00 five" in sentence 15?

You might argue that Korean and English are such different languages, with such different writing systems, that such wacky results are only to be expected, but have a look at the results for the same sentences round-tripped through Japanese, a language that generally resembles Korean syntactically and has a much trickier writing system:

  1. Dogs sleep.
    The dog sleeps.
  2. Dogs chase cars.
    The dog pursues the car.
  3. I chase you.
    I pursue.
  4. These dogs sleep.
    These dogs sleep.
  5. Dogs eat.
    The dog eats.
  6. I can eat glass.
    I may eat the glass.
  7. It doesn't hurt me.
    That does not damage me.
  8. The dogs chase cars.
    The dog pursues the car.
  9. I think that you know that dogs chase cars.
    I think that you know that the dog pursues the car.
  10. I ask whether you know that dogs chase cars.
    I ask whether or not you know that the dog pursues the car.
  11. Cats and dogs chase cars.
    The cat and the dog pursue the car.
  12. Dogs chase cars and cats chase dogs.
    The dog pursues the car, the cat pursues the dog.
  13. Cats chase dogs and sleep.
    The cat pursues the dog, sleeps.
  14. Do cats chase dogs?
    Does the cat pursue the dog?
  15. Chase the dog!
    Pursue the dog!
  16. Hungry dogs eat.
    The hungry dog eats.
  17. Hungry dogs eat quickly.
    The hungry dog eats directly.

Now, I don't mean to say that these translations are perfect.  Grammatical number and definite articles have generally been lost because Japanese doesn't (straightforwardly) mark for those, and there's some minor variation in word choice (e.g. "pursue" for "chase").  What's more, some of the Japanese sentences (which I can read, thank you very much) strike this non-native speaker as pretty awkward.  For instance: これらの犬 for "these dogs" is odd (though it does preserve the plurality), 食べてもいい for "can eat" changes potential into permission, and 猫および犬 for "cats and dogs" is just weird (I had to look up および 'and; as well as', which I'd never encountered before).  Still, compared to the Korean round-trip, the Japanese round-trip has done remarkably little violence to the original sentences.

I wonder if there's a bug in Google's Korean translator.  Is it possible I somehow fed it bad input?  Either way, it looks like I'm going to have to crack open a grammar and learn a little Korean.

I am The Tensor, and I approve this post.
02:00 AM in Computers, Linguistics


Listed below are links to weblogs that reference Fold the Exalted Dog:

» Roundup from mike's web log
Language-y stuff today. Hink Pinks. Riddles with rhyming answers. Example ... Q. What do you call a chubby kitty? A. Fat cat. A hink pink has answers of one syllable

Tracked on Feb 7, 2007 7:56:10 PM


Interestingly, the Korean is broken (literally) in a number of the sentences. A ㄴ standing alone is not allowed. Most likely the correct piece there is 은. The reason you keep seeing "fold" is that 개 is "dog", but 개다 (verb) is "fold up".

Additionally a couple of the Korean sentences are just wrong:

I can eat glass.
나는 유리를 먹어서 좋다.

The Korean literally translates as "I ate glass, so it is good". The correct translation of the original sentence should be 나는 유리를 먹을수 있다.

I ask whether you know that dogs chase cars. 나는 개가 차를 쫓ㄴㄴ다는 것을 당신이 알고 있는지 질문한다.
I don't think this is necessarily wrong, but it seems to be very unnatural and should probably read: 나는 개가 차를 쫓은다는 것을 (당신이 not necessary) 알고 있냐고 한다.

Of course I'm not a native speaker of Korean, so I may have made a mistake or two here. But I can guarantee that not all of the Korean is a correct translation of the original English. And the evidence is there in the round trip already. You should see some of the machine translation emails I receive from my students when they are too lazy to try and compose in English. It's even better when they submit homework that has been machine translated - instant fail.

Posted by: EFL Geek at Feb 5, 2007 3:21:14 AM

"And what's that [ccoch] everywhere and ten:00 five in sentence 15?"
I missed the above in my original comment. The [ccoch] is an attempt at transliterating 쫓, which is part of the verb 쫓다. "ten:00 five" comes from taking the conjugation of the final imperative of the sentence and turning the individual syllables into English words: 십 = 10, 오 = 5. I'm not sure why 시 = 00, though.

I have no idea what exaltation is in Korean, but I'm positive it is a homonym for one of the words in the sentence.

Posted by: EFL Geek at Feb 5, 2007 4:55:10 AM
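The [ccoch] fallback described in the comment above is easy to reproduce. The string happens to match what the Yale romanization scheme gives for 쫓 (ㅉ = cc, ㅗ = o, ㅊ = ch); whether the translator actually uses Yale is an assumption on my part, and the tables and the name `yale_fallback` below are my own illustration, not anything taken from Google's software. A minimal sketch:

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 in Unicode and are laid
# out arithmetically: 19 leading consonants x 21 vowels x 28 finals.
S_BASE, S_LAST = 0xAC00, 0xD7A3

# Yale-style letter values, in Unicode jamo order (illustrative tables).
YALE_LEAD = ["k", "kk", "n", "t", "tt", "l", "m", "p", "pp", "s", "ss", "",
             "c", "cc", "ch", "kh", "th", "ph", "h"]
YALE_VOWEL = ["a", "ay", "ya", "yay", "e", "ey", "ye", "yey", "o", "wa",
              "way", "oy", "yo", "wu", "we", "wey", "wi", "yu", "u", "uy", "i"]
YALE_TAIL = ["", "k", "kk", "ks", "n", "nc", "nh", "t", "l", "lk", "lm", "lp",
             "ls", "lth", "lph", "lh", "m", "p", "ps", "s", "ss", "ng", "c",
             "ch", "kh", "th", "ph", "h"]


def yale_fallback(text):
    """Romanize each Hangul syllable block and wrap the result in brackets."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if S_BASE <= code <= S_LAST:
            lead, rest = divmod(code - S_BASE, 21 * 28)
            vowel, tail = divmod(rest, 28)
            pieces.append(YALE_LEAD[lead] + YALE_VOWEL[vowel] + YALE_TAIL[tail])
        else:
            pieces.append(ch)  # leave anything that is not a syllable block alone
    return "[" + "".join(pieces) + "]"


print(yale_fallback("쫓"))
```

Running it on 쫓 prints [ccoch], the same bracketed string that litters the round-trip output.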

I love that Japanese translation of "I can eat glass" because it implies the existence of a whole meta-level of toughness: above those who eat glass are those who grant or deny PERMISSION to eat glass. Damn, that's tough!

Posted by: Matt at Feb 5, 2007 9:36:00 PM

私はガラスを食べてもいい sounds to me like something a young child might say in order to impress his peers. "Well, I get to eat glass, so nyah nyah!"

I also tried some simple sentences on the E-K-E translators, using typical college text vocabulary (I study Japanese language and culture; your book is behind my desk; etc.), and it fared...well, okay...maybe.

His book is behind my desk.
그의 책은 나의 책상의 뒤에 이다.
His book is on rear of my desk.

In this case the K-to-E was as reasonable as you might expect, but the initial E-to-K wasn't. Instead of using dui 'back/behind(?)' as a postposition, it used it as a full noun, so instead of idiomatic 'desk-behind-LOC', we have 'desk-POSS behind-LOC' (or something like that; I have about the equivalent of 5 weeks of college Korean).

(okay, there's also the fact that the main verb is in the infinitival/dictionary form, which, unlike Japanese, is actually not usable on its own in a main clause, AFAIK)

Posted by: Russell at Feb 6, 2007 11:09:59 PM

I took it to mean, deadpan, "I may eat glass" to indicate contemplation of potential imminent total mental mindfusk due to high and mounting personal stress.

But Matt's idea of a hyalinophile regulatory beaureau:cratic system (or shaven headed temple denizens who've already run up all the walls there are and are dancing on the undersides of clouds) was better.

Posted by: LFE O'Mel at Feb 7, 2007 6:44:23 AM

Some of the confusion is also the result of misdivision when the translator parses the Korean. For instance, how does "cat" become "exaltation"? Like this:

고양이는 > 고양 이는
cat-TOP > exaltation this-TOP

Korean lacks true third-person pronouns, so the deictic 이 can be used anaphorically. That explains how 이는 can be translated as "it".

Posted by: Da at Feb 8, 2007 4:15:25 PM
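The misdivision described in the comment above can be shown with a toy dictionary segmenter. This is only a sketch: the five-entry `LEXICON` is hypothetical, with glosses taken from this thread, and a real translator would rank the candidate splits rather than simply list them.

```python
# Hypothetical toy lexicon; glosses come from the comments in this thread.
LEXICON = {
    "고양이": "cat",
    "고양": "exaltation",
    "이": "this",
    "는": "TOP",
    "개": "dog",
}


def segmentations(word):
    """Return every way to split `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in LEXICON:
            for rest in segmentations(word[end:]):
                results.append([head] + rest)
    return results


for split in segmentations("고양이는"):
    print(" + ".join(f"{piece} ({LEXICON[piece]})" for piece in split))
```

Both 고양이 + 는 and 고양 + 이 + 는 come out as valid splits, so a system that picks the wrong one gets "exaltation this" where it wanted "cat".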

There's a very approachable introduction to the Korean alphabet at [link]. I know it isn't your purpose to learn Korean, but I thought that I would mention it. I used it to learn Hangul a few years ago. (I'm not a linguist, but rather a computer hacker (in the positive sense) with an interest in writing systems.)

I can see what is happening with [ccoch]. When it gets confused, it apparently falls back on simple transliteration, and marks it with brackets. As an earlier poster noted, [ccoch] is a transliteration of 쫓.

Hangul boxes up the symbols for each syllable into a group, either consonant-vowel, consonant-vowel-consonant, or consonant-vowel-consonant-cluster. (Where the first consonant can be a special silent one, for what would otherwise be a syllable starting with a vowel.) That's why ㄴ (n) by itself is invalid.

The consonants themselves generally come in three forms: minimally-aspirated, regular, aspirated, lemon-lime, and wint-o-green. Except for the last two, of course.

There is definitely a bug in Google's Korean translator. I think they still have Korean marked as "BETA", right? I hope that you provided feedback. I'm pretty sure Google just licenses the translation software, but they'd certainly appreciate a bug report.

Posted by: David Conrad at Feb 13, 2007 7:00:59 PM
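The syllable-block structure described in the comment above is visible directly in Unicode, where every precomposed block is an arithmetic combination of a leading consonant, a vowel, and an optional final. A rough sketch follows; the function names `decompose` and `stray_jamo` are my own, and it assumes the lone ㄴ in the translator's output is encoded as a Hangul Compatibility Jamo character (U+3131..U+318E).

```python
S_BASE, S_LAST = 0xAC00, 0xD7A3  # range of precomposed Hangul syllable blocks

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"           # 19 leading consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
         "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
         "ㅋ", "ㅌ", "ㅍ", "ㅎ"]                       # 27 finals plus "no final"


def decompose(ch):
    """Split one precomposed syllable block into (lead, vowel, tail)."""
    code = ord(ch)
    if not S_BASE <= code <= S_LAST:
        raise ValueError(f"{ch!r} is not a Hangul syllable block")
    lead, rest = divmod(code - S_BASE, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


def stray_jamo(text):
    """List compatibility jamo that sit outside any syllable block."""
    return [ch for ch in text if 0x3131 <= ord(ch) <= 0x318E]


print(decompose("쫓"))
print(stray_jamo("개는 차를 쫓ㄴ다."))
```

A check like `stray_jamo` would have flagged sentences 2, 3, 8, 9, 10, 11 and 12 in the Korean output before they were ever fed back through the translator.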

Yeah, the Korean translator's quality is just lower than the Japanese one. I remember a co-worker getting a frantic email from a student he'd failed, and at the end of every sentence, the word "bedspread" appeared. He thought he'd hit the crooked-teacher jackpot, with sentences like, "I would like to discuss my grade bedspread... I would like to increase my mark bedspread. Can we meet and discuss together bedspread?"

Turned out the student was just being polite, and the translating program he was using was so dumb it couldn't differentiate between "yo" as "floor mat for sleeping on" and "yo" as the honorific attached to all the verbs in his polite sentences.

Seems to me one could implement a kind of XML-like markup system to fix some of that. A translator clever enough to track a "yo" after a verb could mark it up as an honorific and not a noun; likewise, "dogs" could be marked up as "plural noun" so that when translated into a language where plurals are implied, and then back, the plurality wouldn't be lost.

Then again, I'm sure I'm not the first to think of this.

Neat site, btw! SF-linguistics blogging...

Posted by: gordsellar at Feb 17, 2007 12:45:52 AM
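The markup idea in the comment above can be sketched in a few lines: analyze each word into a base form plus tags, translate only the base form, and let the tags ride along so they can be restored on the way back. Everything here is hypothetical (the three-word lexicon, the `Token` class, and the naive strip-the-s plural analysis); it shows the shape of the idea, not how any real translator works.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon covering nouns from the test sentences.
EN_TO_KO = {"dog": "개", "cat": "고양이", "car": "차"}
KO_TO_EN = {ko: en for en, ko in EN_TO_KO.items()}


@dataclass(frozen=True)
class Token:
    surface: str                   # the word in whatever language we are in
    tags: frozenset = frozenset()  # annotations that survive translation


def analyze_english(word):
    """Naive analysis: treat 'known noun + s' as a plural of that noun."""
    if word.endswith("s") and word[:-1] in EN_TO_KO:
        return Token(word[:-1], frozenset({"plural"}))
    return Token(word)


def to_korean(token):
    # The Korean surface form carries no plural marking; the tag carries it.
    return Token(EN_TO_KO[token.surface], token.tags)


def to_english(token):
    word = KO_TO_EN[token.surface]
    return word + "s" if "plural" in token.tags else word


def round_trip(word):
    return to_english(to_korean(analyze_english(word)))


print(round_trip("dogs"), round_trip("dog"))
```

With the tag carried through, "dogs" comes back as "dogs"; drop the tag in `to_korean` and it comes back as "dog", which is the loss seen in the Japanese round trip.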

I can eat glass actually comes out much more like "may I eat the glass?"

Posted by: Japanese words at Apr 14, 2009 9:12:47 AM