Google's Literary Pretensions
Via Arts & Letters Daily comes Ben Macintyre's London Times musings on the really nouveau roman:
Literature is constantly being refashioned, if not actually rewritten. The whole of Austen has recently been repackaged as chick-lit, complete with pastel covers and skinny women with handbags. So-called fanfiction is booming, on websites where amateur writers continue their favourite stories: the further adventures of the Darcys, the Hobbits, Sherlock Holmes and Captain Kirk. The Fanfic.net website has more than 200,000 Harry Potter stories that J. K. Rowling never wrote.
This huge wave of derivative literature is a homage to the contagious power of fiction; soon it may be generated by the push of a button. Mathematicians at Google have invented a new algorithm (how's that for a gripping opening line?) that will soon be able to produce perfect instant translation. Within a given context of prose, they say, it is possible to work out mathematically the most appropriate translation for every word.
If computers can translate English into perfect French, then they can presumably translate English into perfect Shakespeare in the same way. Thus, in some distant future realm of literature, we may be able to feed, say, a work by Stephen King into your computer and then get the same story out, but as Shakespeare would have written it, at the other end.
Maybe we'll even get L. Ron Hubbard's version of South Park.
Reason's Charles Paul Freund looked at wonderful, politically charged cultural appropriations here. And I talked about culture as a "perpetual meaning machine" here. And Reason took a tour of fan fiction here and here.
Google's got a long way to go... their current translation technology is pretty weak.
Courtesy of The Dialectizer:
Literature be constantly bein' refashioned, if not actually rewritten. 'S coo', bro. De whole uh Austen gots recently been repackaged as chick-lit, complete wid pastel covers and skinny honky chicks wid handbags. So-called fanficshun be boomin', on websites where beginna' scribblers continue deir favourite sto'ies, dig dis: de furda' adventures uh de Darcys, de Hobbits, Sherlock Holmes and Captain Kirk. Ya' know? De Fanfic. Co' got d' beat!net website gots mo'e dan 200,000 Harry Potta' sto'ies dat J. K. Rowlin' neva' wrote.
I have an awful hard time seeing how they can translate slang properly, not to mention words like 'cause written without their apostrophe (which is the downside to spellcheckers, by the way). Not to mention improper grammar from posters like yours truly. And if they can't do that, then there will definitely be "lost in translation" foolishisms that remind me of translations by foreigners into English. (Not to bash foreigners; I'm sure Americans do an equally poor job of translating things into non-English.)
Context means a lot, and just 'cause a word in a translation dictionary is correct at least some of the time does not mean it will be correct all the time. They should have no problem writing an algorithm that gets the grammar rules correct though. Assuming of course they have the brainpower to write algorithms in the first place....
Mathematicians at Google have invented a new algorithm (how's that for a gripping opening line?) that will soon be able to produce perfect instant translation. Within a given context of prose, they say, it is possible to work out mathematically the most appropriate translation for every word.
I don't think this will produce perfect translations. The key word here is "mathematically."
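The quoted claim reads like a description of statistical machine translation, where working it out "mathematically" means scoring each candidate word by probabilities estimated from data and keeping the highest scorer. A minimal sketch of that idea, with invented words and numbers rather than anything Google has published:

# Minimal sketch of context-sensitive word choice in statistical translation.
# All probabilities below are invented for illustration; a real system would
# estimate them from large parallel and monolingual corpora.

# P(french_word | english_word): a toy translation model
translation_prob = {
    "bank": {"banque": 0.6, "rive": 0.4},  # financial bank vs. river bank
}

# P(word | previous word): a crude language model over the target language
language_model = {
    ("la", "banque"): 0.05,
    ("la", "rive"): 0.02,
}

def best_translation(english_word, previous_french_word):
    """Pick the candidate maximizing translation prob * language-model prob."""
    candidates = translation_prob[english_word]
    return max(
        candidates,
        key=lambda f: candidates[f] * language_model.get((previous_french_word, f), 1e-6),
    )

print(best_translation("bank", "la"))  # prints 'banque'; the preceding word supplies the context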
They should have no problem writing an algorithm that gets the grammar rules correct though.
I'm not sure that's true, at least in the case of translating English into another language. Its rules are so malleable, especially when it comes to creative writing, that it can be extremely difficult to fit it into more rigid grammatical structures.
...it is possible to work out mathematically the most appropriate translation for every word.
I wonder if they have an accurate equation for connotation, or if such a thing is even possible. Certainly when translating an older text or one from an uncommon language, getting the meaning correct is paramount. But I would say that in translating a contemporary novel from French to English or what have you, the strict meaning is fairly easy to capture, while connotation, wordplay, etc., seem beyond the capabilities of current programs.
This brings to mind the episode of News Radio where Jimmy James had his book translated into Japanese and then from Japanese back into English. His work, originally titled "Capitalist Lion Tamer" came back as "Macho Business Donkey Wrestler".
Speaking as an actual expert here, I'm extremely skeptical that Google will be able to do any better than anyone else in the past forty years (which is about as long as 'machine translation' has been 'just around the corner'). For many of the reasons given above by other commentators, recovery of the intended message from written text requires an understanding human being. And while quite a bit can be done with statistical models, I think John Searle got it right quite a few years ago when he suggested that without a living mind (a real one) comprehension is not really going on.
Further to all the additional comments, the number of spelling and punctuation errors in the above comments only further underlines the problem of automatically understanding language: only someone with the ability to empathize with the speaker (or writer) stands any chance of understanding a message.
the number of spelling and punctuation errors in the above comments only further underlines the problem of automatically understanding language
Hey! I resemble that remark.
Not to mention translating bad puns.
yeah, let's pick on the fiction writers...it is not enough that readership is falling, but now their jobs will soon be replaced by an iMac.
I know, the buggy whip factory and all, but as a good and pure libertarian, the best point to lament the loss of a bygone age should be right before it is bygone....and then let it happen.
I'm extremely skeptical that Google will be able to do any better than anyone else in the past forty years
what do you have in your skull?? an organic computer...when will computers have power sufficient to match the language centers in your brain?...how about in the next 5 years...
It is a solvable problem that will be solved.
hey, at one time they couldn't cure a bacterial infection...and then one day they could.
What they haven't been able to invent is a computer that can translate as well as a human brain using less processing power...you might be right that such a thing is impossible...but with a processor as powerful as a human brain...well, you are dead wrong there.
but with a processor as powerful as a human brain...well, you are dead wrong there.
All of which raises the question: so why bother making such a computer? If all you're doing is trying to create something that just does what a human can already do anyway, then why not just make another human instead? It's easier and more fun.
but with a processor as powerful as a human brain...well, you are dead wrong there.
joshua corning,
Would you mind sharing your sophisticated mathematical calculations with the rest of the class?
Or maybe my comment should have read:
It is a solvable problem that will be solved.
Would you mind sharing your sophisticated mathematical calculations with the rest of the class?
what do you have in your skull?? an organic computer...when will computers have power sufficient to match the language centers in your brain?...how about in the next 5 years...
It is a solvable problem that will be solved.
Quite a few programmers and linguists have been trying their entire lives to do so. The harder they try, the more clearly the statistical modeling necessary to get computers to parse language proves impossible. Because language is infinite.
Think about it. There's a certain probability that a noun will follow "the". But after that? The possibilities skyrocket. When you add in the fact that sentences are recursive (in theory, infinitely so), you can see why it's impossible to predict. (And this is just for parsing...it doesn't begin to get into language PRODUCTION.)
Steve Abney, formerly at Bell Labs and one of the best that I've read in this field, predicts that even the best system will never get above 98%, IIRC. Will that be "good enough" for applications like this? Depends what you want to use it for, I suppose.
Maybe the question ought to be not "Can we" but "Should we". Does the world really need something called "Passion's Captive" translated automatically upon release into 753 languages? Or, if you thought people had fun with a few cartoons, just wait till they get hold of E! Online!
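For what it's worth, the point above about the word after "the" can be made concrete with a toy bigram model; the corpus below is invented, and even models trained on billions of words leave an enormous number of plausible continuations:

# Toy bigram model: estimate P(next word | "the") from a tiny invented corpus.
from collections import Counter

corpus = "the cat sat on the mat and the cat saw the dog".split()

bigram_counts = Counter(zip(corpus, corpus[1:]))
after_the = {second: count for (first, second), count in bigram_counts.items() if first == "the"}
total = sum(after_the.values())

for word, count in sorted(after_the.items(), key=lambda kv: -kv[1]):
    print(f"P({word!r} | 'the') = {count / total:.2f}")

Conditioning on more than one previous word multiplies the number of parameters explosively, which is the "possibilities skyrocket" problem.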
what do you have in your skull?? an organic computer
Add me to the list of skeptics. The human brain does more than just "calculations" - it makes leaps and connections that are beyond any computer. If Google has the computing power to overcome such limitations--i.e., more computing power than ever seen before--then I'm impressed.
Add me to the long list of skeptics above, though I limit my skepticism to the claim of being able to produce "perfect instant translation" anytime soon. This is not my area of expertise, but I have not read any literature that suggests that state-of-the-art machine learning can do all that. In fact, I ran this claim by a couple of friends who work in this field at a very well-known competitor to Google, and they were extreeeemly dubious. I think the catch here may be the "Within a given context of prose" qualifier. If the given context is "mary had a little lamb", then, well!
Hey! I resemble that remark.
Courtesy of Google, translating from English to a few other common lingos, and back again:
Hey! I am similar that observation.
Hey! I resemble this note.
Hey! I resemble it this comment.
Hey! I resemble myself that observation.
will soon be able to produce perfect instant translation.
There is no such thing (perfect or instant). Blame the writer, not google.
Within a given context of prose, they say, it is possible to work out mathematically the most appropriate translation for every word.
I'm far more interested in what the Google people might have to say than in what this writer is trying to say.
Think about it. There's a certain probability that a noun will follow "the". But after that? The possibilities skyrocket.
After "the" comes whatever word you have in the source you're translating from.
When you add in the fact that sentences are recursive (in theory, infinitely so), you can see why it's impossible to predict. (And this is just for parsing...it doesn't begin to get into language PRODUCTION.)
I won't be surprised when, in N years, computers can translate at least as well as a person who's fluent in both languages...because language isn't infinite. If "infinite" is some magical barrier to computer translation, why isn't it a barrier to human translation?
[insert 20 to 50 year old funny quote about computers here]
to the skeptics...the human mind does it somehow (or perhaps you think something else does it, in which case i can't help you...a priest might be able to)
Is it your position that we will never understand how the human mind does it?
Or is it your position that we will but we will never be able to replicate it?
both positions seem rather amusing.
First let me add to my earlier comment that the problems I cited are just in modeling ONE language. Now when you take into account the various grammars, idiosyncrasies, and idiomatic expressions that perfect translation would require...well, forget it.
Joshua...it may seem amusing to you. But look at the numbers. Literally. Computers work on numbers. The infinite is impossible for humans to truly comprehend; how could we program computers to handle it? The problem is, as others have noted, that humans use lots of contextual clues (who they're talking to, how drunk that person is, when something was written, etc., etc.) to interpret language. Not only is grammar recursive, but the range of situations in which a human brain has to interpret it is infinite.
You have a lot of faith in technology. For the most part so do I, and I agree that it's theoretically possible to model a human brain. But to date we have nothing close to a working model of the neural pathways and functions a human must use to interpret even one language, and are nowhere near having a settled theory of how to do so, and nowhere near having computational power that could seriously attempt it.
We can get percentages that seem pretty high, numbers-wise, but is a translation that's 90% correct good enough? Think about how one word out of ten being wrong could change the entire meaning of a text.
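A rough way to see why 90% per-word accuracy is a weak guarantee: if each word were translated correctly with independent probability 0.9 (an assumption made only for illustration, since real errors cluster), the chance of getting a whole sentence right collapses quickly:

# Probability that an n-word sentence comes out entirely correct, assuming
# each word is right with probability 0.9 and errors are independent.
# (They are not independent in practice; this is just to show the trend.)
for n in (5, 10, 20, 40):
    print(f"{n:>2} words: {0.9 ** n:.1%} chance the whole sentence is correct")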
Hmmm. Now you've got me thinking.
The problem is on the input and output ends. I'm sure we could program computers with English grammar well enough that they could talk to each other. But getting the information back into a human brain poses yet another hurdle. We might not understand what those English-speaking computers were saying. Because it lacks context. And yet, given the context that they are computers talking, our brains CAN make an educated guess at what the meaning is.
I think that's what it comes down to. Humans use their contextual experience to guess at the meaning; computers can only guess statistically based on the input they're given. Psycholinguistic studies indicate we only "really" listen to or read every third word or so. The rest we fill in based on educated guesswork and other inputs.
So, unless we can figure out how our own brains are programmed and ALL the inputs they use to make judgments, we can never program a computer to do the same. We'd probably also have to figure out why miscommunications happen between humans...
Hmmm...perhaps I should take that offer to go back for a PhD and look into this further!
My position is that unless Google is running a top-secret black-ops machine learning/natural language processing project that is advanced beyond what current industry and research publications suggest is possible, it's OK to wonder where these claims are coming from. Whether we will ever truly understand the functioning of the human brain, etc., is moot.
Companies have been known to occasionally hype their products, you know?
BTW - Pubs from people who work at Google labs.
http://labs.google.com/papers.html
Is it your position that we will never understand how the human mind does it?
Or is it your position that we will but we will never be able to replicate it?
both positions seem rather amusing.
joshua corning,
I suppose my flippant remark about your sophisticated mathematical calculations was not informative enough.
This is not a matter of position or opinion. I'm pretty sure there is a mathematical way of either proving or disproving the possibility (that computers could ever perfectly translate anything). Like linguist said, it is a matter of input and output. It is a formal mathematical language problem. And if I had paid closer attention in my Formal Languages and Automata class in college, I might be able to explain the problem and possibly prove or disprove it. Unfortunately, I was never talented at mathematical proofs.
But to date we have nothing close to a working model of the neural pathways
I've been out of the loop for a while. Have they figured out how to "back propagate" yet? In my day, before I resigned myself to the inferior academy that is law school, that was what always caused the models to be dismissed as "un-biological." *sigh*
Just to add another informed voice here.
The problem with machine translation is that meaning is under-determined in the code. Semantics is a matter of controlled inference, not cryptography. It has little to do with computing power.
The cool thing is that humans may be able to crack the inference even when the computer-translated code is way off the mark.
Smacky: I think you want to look into Wittgenstein. Supposedly (I haven't read the stuff directly, so I don't know how strong the arguments are) he showed pretty conclusively that it's impossible to strip a meaningful language down to pure logic: it always has too many contextual elements and requires too many inferences, as Science says. But if a statement can't be formulated in purely logical terms, a computer can't handle it. On the other hand, it may be theoretically possible to build a computer capable of complicated contextual inference ("What could he reasonably have been trying to say?"). But given that even we have a lot of trouble with that, I'm thinking it'll be a while before a computer can do it well enough to do good translations.
Deep Blue did manage to beat me in the end, but then again, with a calculating power about a zillion times bigger than mine. A rhino can beat Lennox Lewis, but that doesn't mean that the rhino is a boxer.
Supposedly (I haven't read the stuff directly, so I don't know how strong the arguments are) he showed pretty conclusively that it's impossible to strip a meaningful language down to pure logic: it always has too many contextual elements and requires too many inferences, as Science says. But if a statement can't be formulated in purely logical terms, a computer can't handle it.
You (and many others) are jumping to conclusions, I think. Did google claim their new translator will work by pure logic?
I'm guessing the card up google's sleeve is their access to a BIG chunk of everything ever written by anybody - in many different languages, and in many different historical periods: complete, large sets of "from" and "to." That's a far bigger contextual database than any human (mind) has reasonable access to.
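If the advantage really is a mountain of paired "from"/"to" text, the standard move is to count how often words co-occur across aligned sentence pairs and normalize those counts into translation probabilities. A stripped-down sketch with invented sentence pairs; real systems refine the raw counts with iterative alignment (for example, expectation-maximization in the IBM models):

# Stripped-down word-translation estimates from aligned sentence pairs.
# The pairs are invented; real systems use millions of pairs and iterative
# alignment rather than raw co-occurrence counts.
from collections import defaultdict

aligned_pairs = [
    ("the house is small", "la maison est petite"),
    ("the house is big", "la maison est grande"),
    ("the book is small", "le livre est petit"),
]

cooccur = defaultdict(lambda: defaultdict(int))
for english, french in aligned_pairs:
    for e in english.split():
        for f in french.split():
            cooccur[e][f] += 1

def translation_probs(english_word):
    counts = cooccur[english_word]
    total = sum(counts.values())
    return {f: round(c / total, 2) for f, c in counts.items()}

# With only three pairs, "house" is tied between "maison", "la", and "est";
# that ambiguity is exactly what iterative alignment exists to resolve.
print(translation_probs("house"))
print(translation_probs("book"))   # "livre", "le", "est", "petit" each appear once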
Why do people get hung up on "perfect" translation? Human experts cannot do it. Some words or phrases have no perfect translation, many equivocations will be language-dependent, and some shades of expression will be limited by the structure of context-dependency of the language. At some point, you pick between translating for intent or literal meaning, between keeping the style or being comfortable translating a few words into a very long sentence. I routinely criticize professionally translated works for dropping difficult shades of meaning.
I have no idea how you translate multi-language puns. In English, we also have the habit of just adopting useful words. What is the English translation for tornado, sushi, or schadenfreude?
How about "good enough" translation? "As good as a competent human translator"? Language translation is one of the hardest tasks for computers, and the most optimistic estimates I have seen do not expect great translations within the next decade.
The reference to Searle's Chinese room is relevant but misguided: the computer will need to understand both languages to translate effectively, just as humans do. It is a sort of multi-language Turing test, but it should be feasible given sufficient algorithms and computing power. The only arguments I have seen for its theoretical impossibility assume what is to be proved: that computers cannot understand things. "[G]uess statistically based on the input they're given" means "use their contextual experience to guess at the meaning." That is what your brain is doing.
How many translations of The Iliad are there? Are most (all?) of them wrong, or just reasonable but different translations of the same text? I do not know that "perfect translation" is a meaningful concept, but I look forward to seeing human-comparable translation quality from computers. In a long while.
Mr. F has a point.
Not that it would solve the problem completely.
I wonder if Google is making their claim based on this line of research (you can get a copy of the pre-print from a link halfway down the article).
http://www.newscientist.com/article.ns?id=dn6924
The google distance is very cool. Don't think it ramps up past lexical semantics much, but very cool.
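For reference, the measure described in that New Scientist piece appears to be Cilibrasi and Vitányi's normalized Google distance, which needs nothing beyond page-hit counts; a small sketch with made-up counts:

# Normalized Google distance (Cilibrasi & Vitanyi): a similarity measure built
# from search-engine page counts. The counts below are made up for illustration.
from math import log

def ngd(f_x, f_y, f_xy, n):
    """f_x, f_y: pages containing each term; f_xy: pages with both; n: pages indexed."""
    return (max(log(f_x), log(f_y)) - log(f_xy)) / (log(n) - min(log(f_x), log(f_y)))

# Terms that almost always appear together score near 0; rarely co-occurring
# terms score higher.
print(ngd(f_x=9_000_000, f_y=8_000_000, f_xy=6_000_000, n=8_000_000_000))  # ~0.06
print(ngd(f_x=9_000_000, f_y=8_000_000, f_xy=20_000,    n=8_000_000_000))  # ~0.89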
By the way,
Skip the Wittgenstein.
A better place to start is Dan Sperber's work.