Sunday, February 20, 2011

Q: How Smart is IBM’s Watson?

A. Really smart.
B. Not at all smart.
C. Wrong question.

As you’ve probably heard, IBM’s Watson computer system has beaten two Jeopardy champions, Ken Jennings and Brad Rutter, in a two-game match. There’s been a fair amount of press about the match, and some discussion in the blogosphere (e.g. at Language Log). Commentators have variously argued versions of A and B, but I’m inclined more to C myself.

Back in the mid-1970s the US Defense Department sponsored a speech understanding research project that ran, I believe, for three years. Three different research groups were tasked with building a system that could answer a spoken question about naval vessels. This was exciting research by the top people in computational linguistics. Measured against those systems, Watson’s performance is astounding. To be sure, some of the difference is in raw computing power. But that’s not all. Watson is using techniques that didn’t exist back then.

At the same time, Watson would probably not be able to hold a decent conversation about the topics it produced in response to Jeopardy clues. To some extent, that’s by design. We do know things about conversational interaction that could be employed to make Watson a better conversationalist. But conversation wasn’t required of Watson, so Watson wasn’t built to do it. Still, even given further development, one wouldn’t expect Watson to be a particularly scintillating conversationalist.

There’s a whole list of language behaviors one wouldn’t expect of Watson, or of any other current computer system. This is not controversial nor, as far as I know, has anyone connected with the Watson project asserted otherwise. Watson was created to perform a specific task, and that’s what it has done, and done well.

How difficult is that task? How difficult is it to become a championship Jeopardy player? It’s not clear to me that anyone knows. I’m a smart guy, and I know a lot of things, but I doubt that I am a championship Jeopardy player. It’s quite possible that, given the right topics and opponents, I might win a few games. But a week’s worth of games? Probably not. I could no doubt do better by actively practicing and boning up on all sorts of trivia. But I have no particular reason to believe that I could train myself up to championship level. We don’t know what’s required for humans to play high-caliber Jeopardy, though we now know at least one way to enable a computer to do so.

The thing about Jeopardy is that there’s little prestige associated with it. It’s just a TV game show that started in the 1960s and is still going. It has a large and loyal audience, but prestige? No. It’s entertainment. It’s not serious.

Now, chess, chess IS serious business. The game is 1500 years old. It became an international sport in the 19th century. By the mid-20th century considerable national prestige had become attached to the World Chess Championship. After WWII players from the Soviet Union dominated the international scene until Bobby Fischer beat Boris Spassky in 1972. Between the interest inherent in the game itself and the Cold War context, that event aroused international interest, even among people who knew little or nothing about chess.

Chess is known as a difficult game, one at which few excel. In the West, at least, it is the archetype of difficult games. One has to be a brainiac to succeed at chess.

Does chess in fact require more smarts than Jeopardy? It certainly requires different smarts, but more? No one knows; it’s a silly question.

Heading back toward AI: chess-playing is a classic task in the world of artificial intelligence, older than question-answering by two decades or so. The field has devoted considerable time and attention to chess, and some may even have believed that a computer able to beat all human opponents at chess would put AI on a level with human intelligence. Thus considerable intellectual capital was on the line in 1997 when an IBM computer known as Deep Blue played Garry Kasparov in a chess match. That was an event; it made news. Deep Blue won. The philosophers chattered about it, if not endlessly, at least at some length.

I haven’t noticed much philosophical chatter about Watson’s Jeopardy win, yet one could argue that it’s a more significant technological achievement. Have the philosophers become blasé about things computational? Or is it simply that Jeopardy lacks both the cultural weight of chess and the gee-whiz allure of The Coming Singularity, even though some see Watson as a portent of that Singularity?

And so we’re back where we began: How smart is Watson? No one knows. We know that building Watson was a difficult technical achievement, more difficult than building Deep Blue. And yet championship chess is regarded as more difficult than championship Jeopardy. How do we reconcile these two assertions? What might we learn from attempting to do so?

1 comment:

  1. I don't know, Bill. I thought we stopped treating "smart" as a useful word. It reminds me of the all-too-snide "If you're so smart, why ain't you rich?" Or, in the case of computer systems, "If you're so smart, how come *I* get to tell *you* what to do?"

    Put the Watson question answering system in my phone, its betting strategies in my spreadsheet, and give Deep Blue's look-ahead capabilities to the diplomatic corps of some country interested in furthering human rights. But don't ask me which of them is smarter.

    It's like asking "Who's smarter?" of (1) the scintillating conversationalist who doesn't even understand the question "What's the derivative of X squared?", (2) Einstein, (3) Ken Jennings and (4) Obama.
