Friday, December 31, 2004

(Potentially relevant link)

Because words in natural languages don't correspond one-to-one, translating a word such as 'hot' (spicy? high temperature? sexy?) into German (which has a word that covers high temperature and sexy, but uses a different word to mean spicy) requires considering the context of the word. And to do that, one must in turn consider the context of each of the surrounding words.
It sounds analogous to Google's PageRank, and it seems like what you need is a recursive analysis that slowly approaches a correct translation. But there's a difference: PageRank gives you a number for each item, but words aren't numbers. You can't really say the German word 'scharf' (sharp/spicy/sexy) is somehow closer to the meaning of 'hot' (when used to mean high temperature) than the word 'Gürteltier' (armadillo). They're just both 100% wrong.
Now, perhaps you can say 'ganz gut' (literally 'entirely good', actually just 'alright') is closer to 'fabulous' than 'Gürteltier' (armadillo), but further from it than 'sehr gut' (very good, and quite high praise indeed). But if we could get to the point where we were only worrying about minor things like that, we'd be mostly done. The tough part is taking 'hot' and knowing whether it's 'scharf' (sharp/spicy/sexy) or 'heiß' (high temperature/sexy). Recursion only helps if you get closer to the correct answer as the number of iterations increases. This problem is perhaps better suited to a backtracking solution, as you might use in chess: if one branch doesn't make sense, go up your line of reasoning a level and try the next option. (Prolog may be the language of the future. :-)
But then you have the difficulty of determining whether a particular branch 'makes sense'. How is the software to know when to give up on that line of substitution and backtrack?
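To make the backtracking idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny lexicon, the hand-made list of plausible word pairs, and the names CANDIDATES, PLAUSIBLE_PAIRS, makes_sense and translate are all hypothetical, and German inflection is ignored. The makes_sense function is exactly the hard part asked about above; here it is faked with a lookup table.

```python
# Hypothetical toy lexicon: each English word maps to candidate German words,
# tried in the order listed.
CANDIDATES = {
    "hot": ["heiß", "scharf"],
    "salsa": ["Salsa"],
    "stove": ["Herd"],
}

# Hand-made stand-in for real knowledge about which word pairs "make sense".
PLAUSIBLE_PAIRS = {
    ("scharf", "Salsa"),
    ("heiß", "Herd"),
}


def makes_sense(partial):
    """Return True if every adjacent pair in the partial translation is plausible."""
    return all((a, b) in PLAUSIBLE_PAIRS for a, b in zip(partial, partial[1:]))


def translate(words, chosen=()):
    """Pick one candidate per word, backtracking when a branch stops making sense.

    Returns a list of German words, or None if no combination passes the check.
    """
    if len(chosen) == len(words):
        return list(chosen)
    next_word = words[len(chosen)]
    for candidate in CANDIDATES.get(next_word, []):
        attempt = chosen + (candidate,)
        if makes_sense(attempt):
            result = translate(words, attempt)
            if result is not None:
                return result
        # Otherwise this branch is dead: fall through and try the next candidate.
    return None  # Nothing worked at this level, so the caller backs up a level.


print(translate(["hot", "salsa"]))
print(translate(["hot", "stove"]))
```

For 'hot salsa' the search tries 'heiß' first, finds that the pair with 'Salsa' isn't in the table, backs up, and settles on 'scharf'. The lookup table obviously doesn't scale, which is the whole problem: the search is easy, the test for 'makes sense' is not.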
