Sunday, September 25, 2011
And laughs are something people like to share. When people communicate via social media, they type "laughs." In a sample of a million words of Twitter messages in ten different languages, I found that about 0.5% of all "words" are laughs – "haha", "LOL", or other ways of typing out a chuckle.
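Here is a minimal sketch of the measurement described above. It assumes tweets have already been collected and that "words" means whitespace-separated tokens. The regular expression `LAUGH_RE` is a hypothetical stand-in, because the post names only "haha" and "LOL". This pattern accepts repeated ha/he/ja-style syllables (haha, hehe, jajaja) and lol/LOL. It is an illustration, not the author's actual list of laugh spellings.

```python
import re
import string

# Hypothetical laugh pattern: two or more "ha"/"he"/"ja"-style syllables
# (haha, hehe, jajaja), optionally ending in "h", or lol/lool/LOL.
LAUGH_RE = re.compile(r"(?:[hj]+[aeiou]+){2,}h?|l+o+l+", re.IGNORECASE)


def is_laugh(word: str) -> bool:
    """True if the word, stripped of surrounding punctuation, looks like a typed laugh."""
    return LAUGH_RE.fullmatch(word.strip(string.punctuation)) is not None


def laugh_share(tweets: list[str]) -> float:
    """Fraction of all whitespace-separated words in `tweets` that are laughs."""
    words = [w for tweet in tweets for w in tweet.split()]
    if not words:
        return 0.0
    return sum(is_laugh(w) for w in words) / len(words)


if __name__ == "__main__":
    sample = ["haha that was great", "LOL!!! no way", "see you tomorrow"]
    print(f"{laugh_share(sample):.1%}")  # 2 laughs out of 10 words
```

On the toy sample this prints 20.0%. A pattern this crude also has false positives. For example, the Spanish words "hoja" and "hija" match it, so a real study would need a more careful laugh list for each language.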
Do people everywhere laugh equally?
Not on your life.
In a study of ten Western languages (English, German, Dutch, Norwegian, Swedish, Danish, French, Spanish, Italian, and Portuguese), I found enormous differences in the frequency of Twitter-laughs.
The Germans laugh least, with Twitter laughs making up under 0.1% of all words.
Tweets in the other languages of Northern Europe were somewhat more prone to laughs than German ones. In increasing order of laugh frequency, Norwegian, French, Swedish, English, and Danish all came in below 0.4%.
And then there are the happy Latins. Just edging out the Danes, Portuguese tweets have 0.5% laughs, and that's nothing compared to the Italians, who Twitter-laugh in 0.9% of words. But the runaway laugh champions are Spanish speakers, who type Twitter laughs in 1.4% of words.
The North-South pattern is noteworthy, but is broken by the Dutch, who out-laugh their neighbors like they're misplaced Latins, finishing way up at 0.8%.
The Dutch notwithstanding, the North-South trend is sharp and undeniable, as this color-coded map makes clear.
What's even funnier: in the languages where people laugh more often, they also type longer laughs.
When you take into account the length as well as the frequency of laughs, Spanish Twitter has 24 times as much laughing as German Twitter, as measured in character count. This is not a subtle difference!
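One way to see how frequency and length combine is to assume the character-count measure is "laugh characters per word". The post does not spell out its normalization, so that is an assumption. Under it, the measure factors into (share of words that are laughs) × (mean laugh length). The ratio between two languages is then a frequency factor times a length factor. The function names below are hypothetical.

```python
def laugh_chars_per_word(word_share: float, mean_laugh_len: float) -> float:
    """Laugh characters per word = (fraction of words that are laughs) * (mean laugh length)."""
    return word_share * mean_laugh_len


def intensity_ratio(share_a: float, len_a: float, share_b: float, len_b: float) -> float:
    """How many times more laugh characters per word language A has than language B.

    Equivalent to (share_a / share_b) * (len_a / len_b): frequency factor times length factor.
    """
    return laugh_chars_per_word(share_a, len_a) / laugh_chars_per_word(share_b, len_b)


if __name__ == "__main__":
    # Word shares from the post: Spanish 1.4%, German taken at its 0.1% upper bound.
    # The post gives no laugh lengths, so equal placeholder lengths are used;
    # they cancel out, leaving only the frequency factor.
    print(round(intensity_ratio(0.014, 4.0, 0.001, 4.0), 6))
```

With equal lengths the ratio is 14, which is the frequency factor alone. German laughs are actually under 0.1% of words, so the true frequency factor is larger than 14. Under this assumed measure, the longer Spanish laughs account for the rest of the 24-fold gap.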
So, why is all of this happening? It's clear that more Twitter laughs come from the warmer and sunnier countries. This is true not only in Europe but also in the Americas, where most speakers of English, Spanish, and Portuguese live. Statistically, the laugh rate is highly correlated with the latitude of the corresponding European capital (farther south: r=0.66) and with how sunny that city is (more sun: r=0.74). It is inversely correlated with the suicide rate (r=-0.74), and that result is the same if you choose the U.S., Mexico, and Brazil instead of the U.K., Spain, and Portugal.
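The r values above are presumably Pearson correlation coefficients computed across the ten languages. The post does not name the method, so that is an assumption. Each language is one data point, pairing its laugh share with a property of its capital such as latitude, sunshine, or suicide rate. Sign conventions matter here. "Farther south: r=0.66" means latitude enters with its sign flipped (degrees south) or is treated as distance from the pole. The sketch below computes Pearson's r from scratch. On Python 3.10+, `statistics.correlation` gives the same result.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance and standard deviations, all without the 1/n factor (it cancels).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

The post does not list the per-language inputs (latitudes, sunshine hours, suicide rates), so no real data is reproduced here. As a sanity check, perfectly proportional inputs give r = 1, and a perfectly reversed ordering gives r = -1.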
So is it as simple as this: warm, sunny weather makes people laugh a lot and leaves them immune to depression?
That may be part of it. But another idea to consider is that in Germany and Scandinavia, Twitter is used comparatively more for business and less for chatting. Subtract the social chat, and naturally less laughter remains.
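The business-versus-chat idea is a composition effect. Suppose chatters laughed at exactly the same rate everywhere. A corpus with a smaller share of chat would still show fewer laughs overall. The sketch below illustrates this with invented numbers, since the post gives no figures for how Twitter use splits between business and chat. The function name is hypothetical.

```python
def overall_laugh_share(chat_fraction: float, chat_rate: float, other_rate: float) -> float:
    """Laugh share of a corpus that mixes social chat with other (e.g. business) tweets.

    chat_fraction: fraction of all words that come from social chat, between 0 and 1.
    chat_rate:     share of chat words that are laughs.
    other_rate:    share of non-chat words that are laughs.
    """
    if not 0.0 <= chat_fraction <= 1.0:
        raise ValueError("chat_fraction must be between 0 and 1")
    # Weighted average of the two sub-corpora.
    return chat_fraction * chat_rate + (1.0 - chat_fraction) * other_rate


if __name__ == "__main__":
    # Invented numbers: identical laugh habits in chat (1% of words), none in
    # business tweets; only the chat/business mix differs between the two.
    chatty = overall_laugh_share(0.8, 0.01, 0.0)
    businesslike = overall_laugh_share(0.3, 0.01, 0.0)
    print(f"{chatty:.2%} vs {businesslike:.2%}")
```

This prints 0.80% vs 0.30%. The gap comes entirely from the mix of tweet types, with no difference in how funny anyone finds anything.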
Overall, it's not clear how much Twitter reflects life as a whole. Until we plant microphones everywhere and monitor all human communication, studies like this can only suggest larger truths. But as far as it goes, this study of Twitter laughs supports a lot of existing cultural stereotypes.
Monday, August 29, 2011
Sunday, June 26, 2011
Friday, June 17, 2011
The NBA Finals ended last weekend with the Dallas Mavericks beating the Miami Heat in six games. A significant minority of the Twitter users predicted this outcome: it was the second-most common prediction (out of eight logical possibilities), chosen by 24.8% of users.
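The "eight logical possibilities" follow from the series format. The NBA Finals is a best-of-seven series that ends as soon as one team wins four games, so either team can win in four, five, six, or seven games. A short enumeration:

```python
from itertools import product

# A best-of-seven series ends when one team reaches four wins,
# so the winner can clinch in game 4, 5, 6, or 7.
TEAMS = ("Mavericks", "Heat")
GAME_COUNTS = (4, 5, 6, 7)

outcomes = [f"{team} in {games}" for team, games in product(TEAMS, GAME_COUNTS)]

if __name__ == "__main__":
    print(len(outcomes))  # 8
    print(outcomes)
```

The actual result, "Mavericks in 6", is one of these eight.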
Sunday, June 5, 2011
Tuesday, May 31, 2011
Sunday, February 28, 2010
In 2007, I evaluated a state-of-the-art speech-to-speech machine translation (StS MT) system built for military applications (think of the Somali pirate crisis of 2009 or any scene from The Hurt Locker). I don't think it delivered. The issue is quality. StS MT can "work" if:
a) Both parties are extremely cooperative and willing to put some work into the exchange (e.g., training the system on their voices; repeating things, perhaps more than once, if the result is unclear the first time; tolerating the inherent delays). This could mean a life-or-death situation in which the alternative, no translation at all, is worse than a bad translation. (Then again, a faulty translation could raise the stakes even higher, possibly to deadly levels.)
b) Comprehension of the content is held to a low standard. For example, two businesspeople or recreational chatters might just want to feel like they're getting to know each other, not hammer out the fine details of a legal agreement.
c) There is no alternative. If there were, having the parties type their content into an online text-based MT system would immediately remove one source of error: speech recognition.
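Point (c) rests on a simple observation: every stage in a pipeline is another chance for error. The rough sketch below assumes that each stage independently gets an utterance right with some probability. The accuracies are invented placeholders, not measurements from the 2007 evaluation. Under that assumption, the chance of a clean end-to-end result is the product of the stage accuracies, so dropping the speech-recognition stage by having people type can only help.

```python
from math import prod


def chain_success(stage_accuracy: dict[str, float]) -> float:
    """Chance that an utterance passes through every stage without a damaging error.

    Simplifying assumption: each stage errs independently of the others.
    """
    return prod(stage_accuracy.values())


if __name__ == "__main__":
    # Placeholder accuracies for illustration only; not measured values.
    spoken = {"speech recognition": 0.9, "machine translation": 0.8}
    typed = {"machine translation": 0.8}  # typing removes the recognition stage
    print(f"spoken: {chain_success(spoken):.0%}, typed: {chain_success(typed):.0%}")
```

With these placeholder numbers the spoken chain succeeds 72% of the time and the typed one 80%. Real errors are not independent (a misrecognized word tends to produce a garbled translation), so treat this as a lower-bound intuition rather than a model.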
As long as the sources of error are as great as they are now, I have trouble thinking of many contexts where people would be willing to tolerate the flaws. Perhaps chatters who are only looking for entertainment and have no bottom line regarding accuracy. Perhaps war or emergency contexts. In business, I think the problems all but doom the effort, unless a cultural shift makes people value "meeting" someone this way even when comprehension is shaky.
I was, however, impressed by how well the speech recognition worked with my voice once I trained the system to recognize me. Perhaps if Google trains the system on enough people, just about anyone new who comes along would sound close enough to one of them. The speech-to-text part of the problem just might be solvable for a large segment of people.
But that leaves the machine translation link in the chain, and there's little reason to expect a quantum leap there any time soon. I took Mr. Och's own quotation above, translated it into Spanish with Google's text-to-text machine translation, and then translated the result back into English. It did a pretty good job, but it changed his "should" to "must". That's the sort of thing you don't want to happen when a Somali pirate has his gun trained on you.
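The round-trip test described above is easy to automate. The sketch below assumes a hypothetical `translate(text, source_lang, target_lang)` function supplied by whatever MT service you use; no real service API is assumed. It translates into a pivot language and back, then uses Python's standard `difflib` to flag words that changed. `fake_translate` is a toy stand-in that simply mimics the "should" to "must" substitution, and the example sentence is made up, not Mr. Och's quotation.

```python
import difflib
from typing import Callable

# Hypothetical signature: translate(text, source_lang, target_lang) -> str.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, translate: Translator, source: str = "en", pivot: str = "es") -> str:
    """Translate `text` into the pivot language and back again."""
    return translate(translate(text, source, pivot), pivot, source)


def changed_words(original: str, back: str) -> list[tuple[str, str]]:
    """(original words, round-trip words) for every span that differs."""
    a, b = original.split(), back.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def fake_translate(text: str, source: str, target: str) -> str:
    """Toy stand-in for an MT service: the trip back to English turns "should" into "must"."""
    return text.replace("should", "must") if target == "en" else text


if __name__ == "__main__":
    sentence = "You should check the result"  # made-up example sentence
    print(changed_words(sentence, round_trip(sentence, fake_translate)))
```

With the toy translator this reports `[('should', 'must')]`. A round trip can only surface changes that survive both legs, so an empty result is not proof that the one-way translation was faithful.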