Monday, February 25, 2008

I Fought the Law...

Historically, many of the major advances in science have involved the coining of a "Law". A good law is pithy, describes the world in a way that enables applied use, and tells people who are strong in mathematics something about the nature of the world that explains why it fits the equation. For example, the inverse square law governing the apparent brightness of a star follows neatly from the fact that the light streaming outward from it spreads out over successively larger spheres, whose surface area grows as the square of the distance. Einstein's famous E=mc2 tells you that mass and energy are two forms of the same thing, related by the square of the speed of light.
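To put the inverse square example in symbols (standard physics, nothing specific to NLP): a star radiating total luminosity L spreads that light evenly over a sphere of area 4πr² at distance r, so its apparent brightness is

B(r) = L / (4πr²)

and doubling your distance from the star cuts its apparent brightness to a quarter.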

Stepping back from the methods and techniques that have been used in NLP, we can hope for a law that describes the progress made as researchers in industry and academia pursue better solutions to the hard problems. If you listen to Ray Kurzweil, you'll conclude that progress is going to approach a vertical asymptote. In other words, things are going to improve so much that the future will be off the charts -- things, sooner or later, are going to become infinitely good. Maybe quite soon. If you think that's true, you should invest heavily in AI-style technologies now!

The history of NLP, however, as well as that of some other fields of AI, has produced repeated evidence for a different law governing progress. A lot of the cycle of "hype, then winter" comes from people failing to recognize this law. The law that the history of NLP describes is quite different from Kurzweil's rose-colored vision. In fact, it's at right angles to it. Rather than the state of the art approaching a vertical asymptote (infinite goodness soon!), it has been approaching a horizontal asymptote (progress is all done!).

Obviously, these two worldviews could not be more opposed. But the facts clearly point to which one is valid. Let's take some examples.

An important task in NLP is called Part of Speech Tagging. It consists of marking all of the words in a piece of text with their grammatical category. Not many people want to do this for its own sake, but it's a highly useful initial step in analyzing text more deeply. The first effort to do this electronically was the work on the Brown Corpus by Greene and Rubin in 1971. They achieved, with a very simple approach, performance of 70%. Not too bad for a first try. By the early 1980s, researchers had pushed this number way up, with a system called CLAWS achieving about 94%, meaning that we'd done away with about four fifths of the errors made by Greene and Rubin's approach. Big progress! But in the 25 years since then, progress has been extremely minor, perhaps to 96%. In fact, there is good reason to believe that tagging better than 97% is impossible, because even human annotators tagging a text do not agree more than that often. Moreover, some very different approaches to tagging all yield similar accuracy rates, just shy of that theoretical maximum. When much more progress happens in the first decade than in the succeeding quarter century, you have found yourself a horizontal asymptote.
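To make the task concrete, here is a minimal sketch of part-of-speech tagging using NLTK's off-the-shelf tagger. The library, the pre-trained models, and the example sentence are my own illustrative choices, not the historical systems discussed above, but a modern tagger along these lines performs right around the 96-97% ceiling described here.

```python
# A minimal sketch of part-of-speech tagging with NLTK (an assumed, illustrative
# choice of library -- not the historical systems discussed in the post).
import nltk

# One-time model downloads (tokenizer and tagger).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "The asymptote for tagging accuracy appears to be horizontal."
tokens = nltk.word_tokenize(sentence)   # split the sentence into word tokens
tagged = nltk.pos_tag(tokens)           # label each token with a Penn Treebank tag

print(tagged)
# e.g. [('The', 'DT'), ('asymptote', 'NN'), ('for', 'IN'), ...]
```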

As it happens, 96% is pretty good, and a tagger that performs that well is highly useful, errors be damned. And of course, no one would dispute that a horizontal asymptote for accuracy must exist for any task, since you can't do better than 100%. So in principle, this isn't a problem.

But in practice, it is! Because while Part of Speech Tagging is ready for application, some other very important tasks in NLP are not. Suppose the asymptote were not at 96%, but much lower. That's what you have with another highly interesting problem called Word Sense Disambiguation -- the determination of which meaning of a word a writer had in mind. (For example, "bank" as in the side of a river, or "bank" as in a financial institution.) Here, though the numbers vary considerably according to how you measure accuracy, it is clear that there are no excellent methods available. WSD has not advanced dramatically in the half century since it became an area of interest, and the approaches that exist are tailor-made to disappoint the people who understandably would like a good solution to this problem.
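For a sense of what the existing approaches look like, here is a minimal sketch using the simplified Lesk algorithm as implemented in NLTK. Lesk is one of the classic dictionary-overlap methods; the choice of library and the example sentence are mine, purely for illustration, and methods of this kind are exactly the sort that top out well below what applications need.

```python
# A minimal sketch of word sense disambiguation using NLTK's simplified Lesk
# algorithm (an assumed, illustrative choice -- one classic dictionary-overlap
# approach, not a state-of-the-art system).
import nltk
from nltk.wsd import lesk

# One-time downloads of WordNet and the tokenizer models.
nltk.download("wordnet", quiet=True)
nltk.download("punkt", quiet=True)

sentence = "I sat on the bank of the river and watched the water flow past."
tokens = nltk.word_tokenize(sentence)

# Pick the WordNet sense of "bank" whose dictionary gloss overlaps most with the context.
sense = lesk(tokens, "bank")
if sense is not None:
    print(sense.name(), "-", sense.definition())
```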

A very big problem for NLP as an enterprise is the cycle of hype, disappointment, and mistrust that I identified a few weeks ago. And a very big part of the hype end of the problem is that people who are the recipients of hype (customer bases, venture capitalists, managers, etc.) don't know which law has been in effect. Innovation does exist. Progress does happen. But before an idea becomes a software project, anyone reviewing it should determine whether the NLP needed to make the approach successful has already hit a horizontal asymptote, and if it has, whether the level it plateaued at is good enough for the application.

Friday, February 1, 2008

Winter or Spring?

NLP is a sub-field of Artificial Intelligence. Many sub-fields of AI share with NLP a history in which periods of optimism give way to periods of pessimism. The usual term for this is "AI Winter". Expectations are set high, results are promised, funding comes from all corners, work begins, results fall short of goals, disappointment reigns, and funding goes away. Then, after enough time has passed, new expectations are formed and the cycle begins anew.

Of course, the cycles of failure are just part of the story. Speech understanding eventually found worthwhile (though limited) applications after surviving its own cycles of hype and disappointment. Over time, the technology got better, the hardware got better and cheaper, and now it's rare to call a major corporation's customer support line without being routed by a system that uses speech-understanding software.

But the cycles of failure keep coming, and they are continuing to the present day. Is NLP in Winter now or in Spring? Probably both. There has been a lot of funding for NLP-based ventures in the last few years. Some of it will lead to particular successes and some to particular failures. One beneficiary of high expectations of late has been San Francisco-based Powerset. Whether or not Powerset eventually delivers on the high expectations it has generated may have a lot to do with whether the field as a whole spends the next few years in spring or in winter.