Saturday, February 15, 2014
Is Google Racist?
Sometimes, though, you see something like this:
In this case, I deliberately "baited" Google by starting out with a query that I knew would elicit a result like this, and it certainly took the bait. Maybe someone is wondering why black people are more likely than other races to have sickle-cell anemia, but Google guesses instead that they have a racist question they want to research.
So Google was wrong, and failed to save the person a couple of seconds. But it did something worse than that: it exposed the user to a couple of stereotypes that range from unflattering to extremely offensive. And by varying the query a little, you can find a treasure trove of other stereotypes in Google's completions:
Californians are fake, stupid, and weird. Texans are stupid idiots. New Yorkers are rude and arrogant. (Actually, most groups of people are rude, if you go by the Google completions.) Americans are obese and ignorant. Chinese people are smart. Jews are rich. Asians are bad drivers. At least, this is what Google completions suggest, in response to a partially-typed query.
Why do these completions exist? Who is actually putting these stereotypes forth as true? You'd have to know Google's backend architecture in detail to answer that exactly, but some experimentation indicates the following:
1) People who type queries into Google. Common queries are more likely to appear as completions.
2) People who create web pages. Some completions are oddly-worded and unlikely to have originated as queries, but appear verbatim in various web pages.
3) People who see these suggested queries and then select them. This is where the algorithm becomes insidious. The intention of a completion is to save someone the effort of typing. But an interesting completion might sidetrack someone from their original purpose, leading them to click on it just to see what it's about. For example:
Benjamin Franklin had syphilis? Maybe some student has to write a term paper on the great thinker, and now they've been given the lurid suggestion that the man had a sexually transmitted disease! Even if true, it's certainly not why Franklin became famous, but there it is as one (in fact, two) of the top four relevant facts about the man. Never mind his efforts in founding the United States, publishing newspapers, discovering the electrical nature of lightning, and so on: he had V.D.! That would be bad enough if it were true, but in fact, it appears not to be! If you follow the links these completions lead to, none of them offers any evidence that Franklin had syphilis – just people asking if he did. But I'll admit, the completion made me want to click on it. And that's exactly the problem. I just "voted" for the Franklin-syphilis link, elevating it a little higher than the other possibilities. If syphilis starts off as the #4 completion, people like me vote it up to a higher position on the list, so more sensational queries tend to "win".
As a consultant for a news aggregator startup, I once had access to the data of which headlines people clicked on. An unmistakable trend was that headlines with exciting, sensational words in them were clicked on more often. This was true even if the word was simply being used as a metaphor ("Interest Rates Explode", "President Attacks His Critics").
It doesn't require that a lot of people believe that Franklin had syphilis or that any of the stereotypes listed above are true. It only requires that enough people type that query (or click on web pages that assert, or even just ask, the question) for it to land somewhere on the list of completions (maybe #10); then other people see that completion, get intrigued, and vote it up. In fact, a lot of the people who vote up the racial stereotypes could even be people who are incredibly offended by them.
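To make that feedback loop concrete, here's a minimal sketch (in Python, and emphatically not Google's actual backend) of a completion ranker in which selecting a shown suggestion counts for more than typing the query out:

```python
# A toy illustration of the feedback loop described above: completions are
# scored by how often the query is typed, plus a larger bonus every time a
# shown suggestion is clicked. All names here are hypothetical.
from collections import defaultdict

class CompletionRanker:
    def __init__(self, click_weight=5.0):
        self.type_counts = defaultdict(float)   # times a query was typed in full
        self.click_counts = defaultdict(float)  # times a shown completion was selected
        self.click_weight = click_weight        # a click "votes" harder than a keystroke

    def record_typed(self, query):
        self.type_counts[query] += 1

    def record_clicked(self, completion):
        self.click_counts[completion] += 1

    def suggest(self, prefix, k=4):
        candidates = [q for q in self.type_counts if q.startswith(prefix)]
        score = lambda q: self.type_counts[q] + self.click_weight * self.click_counts[q]
        return sorted(candidates, key=score, reverse=True)[:k]

ranker = CompletionRanker()
for _ in range(100):
    ranker.record_typed("benjamin franklin inventions")
for _ in range(30):
    ranker.record_typed("benjamin franklin syphilis")

# The lurid completion starts out ranked lower...
print(ranker.suggest("benjamin franklin"))

# ...but every curious click votes it up, regardless of whether the clicker
# believes it, until it overtakes the mundane queries.
for _ in range(20):
    ranker.record_clicked("benjamin franklin syphilis")
print(ranker.suggest("benjamin franklin"))
```

Nobody in this loop has to believe the claim; curiosity alone is enough to promote it.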
Benjamin Franklin is dead, but these sorts of completions exist for living celebrities, too. I did a search for three former NFL quarterbacks, and one of them had a completion for "gay". The man in question has denied being gay. How many people see this completion every day? Google is in effect spreading rumors about the man's personal life, just as it flashes a number of racial stereotypes before our eyes.
So what is Google's role in this? Surely no one at Google decided that these racial stereotypes are useful suggestions. Google only built the system; the data voted for these completions to rank so high. If Google's algorithms work so well in general, then Google can claim neutrality on these questions and say: "Sorry, but these are the completions lots of people type and choose."
Except Google isn't neutral. Google does sanitize their completions in many cases.
If you type "scarlett johansson photos" into Google's search bar and then add any letter of the alphabet, it will show you completions. Any letter, that is, besides "n": "scarlett johansson photos n" produces no completions at all. Why? And so what? The reason is that Google has specifically censored the completion that would allow the word "nude" to appear. I've had access to the query logs of search engines, and I guarantee you that the completions Google shows for "scarlett johansson photos" are not more common than "scarlett johansson nude" or "scarlett johansson photos nude". In fact, Google shows no completions for the word "nude" all by itself, even though it shows completions for "Mohorovičić discontinuity"... which one do you think people are searching for more?
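If you want to poke at this yourself, the experiment is easy to script. The sketch below assumes the unofficial suggest endpoint that browsers use (suggestqueries.google.com); it's undocumented, so it may be rate-limited, change shape, or disappear:

```python
# Probe Google's completions for a base query plus each letter of the
# alphabet, reproducing the experiment above. This relies on an unofficial,
# undocumented endpoint -- not a supported Google API.
import json
import string
import urllib.parse
import urllib.request

def completions(query):
    url = ("https://suggestqueries.google.com/complete/search"
           "?client=firefox&q=" + urllib.parse.quote(query))
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        # Response shape: [original_query, [suggestion, suggestion, ...]]
        return json.loads(resp.read().decode(charset))[1]

base = "scarlett johansson photos "
for letter in string.ascii_lowercase:
    result = completions(base + letter)
    print(letter, "->", result if result else "(no completions)")
```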
So Google isn't completely neutral. They let the data vote for itself sometimes, even usually, but they censor some completions, apparently on the suspicion that the results would be offensive or not family-friendly.
Given that, there's not much excuse for letting these racial slurs show up. If it's offensive to suggest that an actress has taken her clothes off, it's certainly more offensive to allow the data to promote the racial stereotypes listed above. "It's just data" is a valid excuse for a person or company who uses big data as a tool. But once human hands go to work in the system, selecting what does and doesn't show up, those hands start to take some of the blame for the whole system. One imagines that these stereotypes have simply been below Google's radar, and that the "Don't Be Evil" company would want to censor those completions once they're aware of them.
Sunday, February 28, 2010
Speech To Speech
In 2007, I evaluated a state-of-the-art speech-to-speech machine translation (StS MT) system that was built for military applications (think of the Somali pirate crisis from 2009 or any scene from The Hurt Locker). I don't think it delivered. The issue is quality. StS MT can "work" if:
a) Both parties are extremely cooperative and are willing to put some work into the exchange. (E.g., training the system on their voices; repeating things, maybe more than once, if the result is unclear the first time; tolerating the inherent delays.) This could mean a life-or-death situation where the alternative -- no translation -- is worse than a bad translation. (Although the stakes could be higher, even deadly, if the translation is faulty.)
b) The comprehension of the content is held to a low standard. E.g., two businesspeople or recreational chatters who want to feel like they're getting to know each other, rather than trying to hammer out the fine details of a legal agreement.
c) There is no alternative -- because if typing were an option, having the parties type content into an online, text-based MT system would immediately remove one source of error.
As long as the sources of error are as great as they are now, I have trouble thinking of many contexts where people would be willing to tolerate the flaws. Maybe chatters who are only looking for entertainment and have no bottom line regarding accuracy. In war / emergency contexts, perhaps. In business, I think the problems just about doom the effort, unless a cultural adjustment makes people value "meeting" someone in this way even when the comprehension is shaky.
I was, however, impressed by how well the speech recognition worked with my voice once I trained the system to recognize me. Perhaps if Google trains the system on enough people, just about any new speaker would sound close enough to one of them. The speech-to-text part of the problem just might be solvable for a large segment of people.
But that leaves the machine translation link in the chain, and there's little reason to suspect that a quantum leap is about to happen there. I took Mr. Och's own quotation above and translated it to Spanish and then back to English using Google's text-to-text machine translation. It did a pretty good job, but it changed his use of the word "should" to "must". That's the sort of thing you don't want to happen when a Somali pirate has his gun trained on you.
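The round-trip test is easy to reproduce. The sketch below uses googletrans, an unofficial Python client for Google Translate's web endpoint (the synchronous API shown is from the 3.x line of the library, which breaks from time to time as the endpoint changes); the input string is a stand-in for whatever text you want to round-trip:

```python
# Round-trip a sentence English -> Spanish -> English and compare.
# googletrans is an unofficial client, so treat this as a sketch.
from googletrans import Translator

translator = Translator()
original = "This should work."  # hypothetical stand-in; substitute your own text

spanish = translator.translate(original, src="en", dest="es").text
round_trip = translator.translate(spanish, src="es", dest="en").text

print("Original:  ", original)
print("Spanish:   ", spanish)
print("Round trip:", round_trip)
# Watch for small but meaningful drift, e.g. "should" coming back as "must".
```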