Image: Ian Lamont/Creative Commons
Czech this out: Instead of using a dictionary to look up the meaning of a word in another language, three researchers have come up with a more complex yet more elegant method: figuring out how specific words relate to one another and using those relationships to essentially "reverse engineer" translation dictionaries. Their theory goes like this: if you map the relationships between a word like "dog" and other related words ("animal," "pet," "bark") in two different languages, you are likely to create very similar-looking maps. That's because, they argue, these relationships are statistically similar across languages.
Throw in enough data to create better language maps (these are known as "vector spaces," because each word is represented as a vector, essentially a point with coordinates) and math can do the rest, identifying more specific correspondences between your two maps and making an educated guess about the meaning of the words within. “Despite its simplicity, our method is surprisingly effective: we can achieve almost 90 percent precision for translation of words between English and Spanish,” write Tomas Mikolov, Quoc V. Le, and Ilya Sutskever.
You still need humans in the loop: to begin to understand how to transform these relationships from one language to another, the researchers started with a small set of human-compiled definitions and translations. And relying on computers to "read" different languages and create dictionaries sort of ex nihilo doesn't seem foolproof, of course, especially when we're talking about (or talking in) languages from different families: Chinese and English have very different structures, for instance. Still, the researchers contend that because their method "makes little assumption about the languages ... it can be used to extend and refine dictionaries and translation tables for any language pairs."
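For the curious, here is a rough sketch of what "transforming" one language map into another could look like. It assumes the transformation is a simple linear map (a matrix `W`) fitted by gradient descent on a few seed translations, which is one plausible reading of the approach rather than the researchers' actual code. The two-dimensional vectors, the word lists, and the function names (`fit_linear_map`, `translate`) are all invented for illustration; real word vectors have hundreds of dimensions and are learned from large amounts of text.

```python
import math

# Toy 2-D word vectors, made up for this example. The Spanish map is the
# English map rotated by 90 degrees, so the two "maps" have the same shape.
EN = {"dog": [1.0, 0.2], "cat": [0.9, 0.4], "house": [0.1, 1.0],
      "sun": [0.3, 0.8], "water": [-0.8, 0.5]}
ES = {"perro": [-0.2, 1.0], "gato": [-0.4, 0.9], "casa": [-1.0, 0.1],
      "sol": [-0.8, 0.3], "agua": [-0.5, -0.8]}

# The small human-compiled seed dictionary used to fit the transformation.
SEED_PAIRS = [("dog", "perro"), ("house", "casa"), ("water", "agua")]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_linear_map(pairs, dim=2, lr=0.1, steps=500):
    """Gradient descent on the sum of squared errors ||W x - z||^2 over seed pairs."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for src, tgt in pairs:
            x, z = EN[src], ES[tgt]
            err = [p - t for p, t in zip(matvec(W, x), z)]
            for i in range(dim):
                for j in range(dim):
                    grad[i][j] += 2 * err[i] * x[j]
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * grad[i][j]
    return W


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def translate(word, W):
    """Project an English word into the Spanish space and pick the closest word."""
    mapped = matvec(W, EN[word])
    return max(ES, key=lambda w: cosine(mapped, ES[w]))


W = fit_linear_map(SEED_PAIRS)
print(translate("cat", W), translate("sun", W))
```

Neither "cat" nor "sun" appears in the seed list, so any correct guess comes purely from the shape of the two maps. In this toy setup the Spanish map is an exact rotation of the English one, which real language data never is; that gap is why the researchers report precision figures rather than certainties.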
Compared to existing methods of machine translation, which rely on some statistical analysis but primarily on human-compiled dictionaries, the technique is an impressive foray into a new kind of translation, one where, as with so much else, the meaning of content gives way to the form of data. For now, the process can be useful, they demonstrate, to help improve existing traditional machine translations (they proved this by finding multiple errors in an English-Czech dictionary). Of course, it helps to have a lot of language data to work with, not to mention time and money, and they do: all three engineers work at Google.
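As for catching dictionary errors, one way to picture it is as a similarity check: if an English word, once projected into the other language's map, lands far from the word the dictionary says it should match, the entry looks suspect. The sketch below assumes the projection has already been done and uses a cosine-similarity cutoff; the vectors, the threshold of 0.5, and the `suspect_entries` function are hypothetical, and the dictionary contains one deliberately wrong entry.

```python
import math

# Hypothetical English vectors, already projected into the Czech space.
MAPPED_EN = {"dog": [0.9, 0.1], "water": [0.1, 0.9], "tree": [-0.7, 0.6]}
# Made-up Czech word vectors.
CS = {"pes": [1.0, 0.2], "voda": [0.0, 1.0],
      "strom": [-0.8, 0.5], "ryba": [0.6, -0.7]}
# Toy dictionary to audit; "tree" -> "ryba" (fish) is deliberately wrong.
DICTIONARY = {"dog": "pes", "water": "voda", "tree": "ryba"}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def suspect_entries(dictionary, threshold=0.5):
    """Return (english, listed, nearest) for entries whose similarity is below threshold."""
    flagged = []
    for en_word, cs_word in dictionary.items():
        score = cosine(MAPPED_EN[en_word], CS[cs_word])
        if score < threshold:
            nearest = max(CS, key=lambda w: cosine(MAPPED_EN[en_word], CS[w]))
            flagged.append((en_word, cs_word, nearest))
    return flagged


for en_word, listed, nearest in suspect_entries(DICTIONARY):
    print(f"{en_word}: dictionary says {listed!r}, vectors suggest {nearest!r}")
```

A flagged entry is only a candidate for a human to review, not proof of a mistake: a low score can also mean the word vectors themselves are poor.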