In 10 languages, happy words beat sad ones

Photo of two men gesturing and discussing 'wordy' research; the happiest and saddest words used on Twitter float in the background space.
Peter Dodds (left) and Chris Danforth, mathematicians at the University of Vermont, led a study that confirms the 1969 Pollyanna Hypothesis that there is a universal human tendency to “look on (and talk about) the bright side of life.”
Original photo: University of Vermont; adaptation: The Why Files

Amid the everyday storm of flaming, bitching, cursing and general bad-mouthing in music and film, and on Twitter and the web, how’s this for bizarre? A new study of billions of words actually used in 10 major languages finds that writers in each language prefer positive, happy words. Whether it’s tweets in English, movie subtitles in Korean, books in Russian or websites in French, this general preference for positivity showed up in every single realm.

Stack of bell curves showing the distribution of word ratings for different languages and text media. Spanish has the highest average word-happiness rating in every medium, and Chinese books have the lowest.
Language-medium pairs are ranked from most positive to most negative. Blue = negative; yellow = positive; shape of graph shows distribution of words according to positivity.
Modified original graphic by Dodds et al. 2015

The study is a classic example of using big data to address longstanding questions. The authors cast a wide net, looking at English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese (simplified), Russian, Indonesian and Arabic. All told, they drift-netted 24 subtypes of media, including books, news outlets, social media, websites, television and movie subtitles, and music lyrics.

Each medium provided at least 5,000 words, and each language, at least 10,000. “Lists that were used in the past were created by experts,” says Peter Dodds, who collaborated with Chris Danforth on the study. Both men are mathematicians at the University of Vermont’s Computational Story Lab. “We thought, ‘Let’s go find all the words we use most commonly.'”

The researchers then asked 1,900 native speakers of the various languages to rate the words on a scale of 1 (illustrated with a deeply frowning face) to 9 (a broadly smiling face).

The result was 50 ratings for each word, or 5 million evaluations in all.
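At its simplest, turning many individual ratings into one score per word means taking an average. The minimal sketch below assumes each word's score is the plain mean of its 1-to-9 ratings. The function name `average_ratings` and the sample words and numbers are hypothetical and are not data from the study. The totals also check out: with roughly 10,000 words in each of the 10 languages, 10 × 10,000 × 50 = 5,000,000 evaluations.

```python
from statistics import mean


def average_ratings(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Return each word's mean rating on the 1 (frowning) to 9 (smiling) scale."""
    averages = {}
    for word, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for {word!r}")
        if any(not 1 <= s <= 9 for s in scores):
            raise ValueError(f"rating outside 1-9 for {word!r}")
        averages[word] = mean(scores)
    return averages


# Hypothetical ratings for illustration only (the study collected 50 per word).
sample = {
    "laughter": [9, 8, 9, 8, 9],
    "the": [5, 5, 5, 5, 5],
    "killing": [1, 2, 1, 1, 2],
}
averages = average_ratings(sample)
print(averages)

# Sanity check on the study's totals: 10 languages x 10,000 words x 50 ratings.
assert 10 * 10_000 * 50 == 5_000_000
```

A word like "the" lands at the neutral midpoint. Note that a single word such as "killing" scores low no matter what context it came from.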

But what about this?

Could words removed from their context be misleading? Wouldn’t the first word of the love ballad “Killing Me Softly With His Song” register as negative? Yes, Dodds says, “but they wash out when the sample gets large enough. It’s like measuring the temperature of a room. If you look at a few molecules, maybe they are more or less active,” but temperature reflects the average activity of all the molecules. In the same way, “We are trying to get at the whole picture” of language, he says.

Time series graph of average word ratings for the pages of Moby-Dick.
How does word use change across the pages of the classic “Moby-Dick”? The unsettling conclusion is clearly evident in this analysis of word use.
Modified from original graph by Hedonometer
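You can approximate a plot like the Moby-Dick graph by sliding a fixed-size window along the text and averaging the rated words inside each window. This is a minimal sketch under assumed parameters. The window and step sizes, the function name `windowed_happiness`, and the small score table are all hypothetical. Words without a rating are skipped.

```python
from typing import Optional


def windowed_happiness(words: list[str], scores: dict[str, float],
                       window: int, step: Optional[int] = None) -> list[Optional[float]]:
    """Average score of the rated words in each window; None if a window has none."""
    step = step or window  # default: non-overlapping windows
    series: list[Optional[float]] = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        rated = [scores[w] for w in words[start:start + window] if w in scores]
        series.append(sum(rated) / len(rated) if rated else None)
    return series


# Hypothetical scores, not values from the study's word lists.
scores = {"calm": 7.0, "sea": 6.0, "storm": 3.0, "death": 1.5}
words = "calm sea storm death whale calm".split()
series = windowed_happiness(words, scores, window=2)
print(series)
```

For a real book, the window would need to span thousands of words. That way, as Dodds says, individual oddities wash out. Passing a `step` smaller than `window` gives overlapping windows and a smoother curve.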

Neutral words, such as “the” and “for,” were rated around 5, as expected. So why not just ignore them? “We wanted the instrument we created to fit the language we would be using it on,” says Dodds. It’s easy to remove these “function words” from evaluations later on, he says, “but we can do this in a principled way. We don’t bring in our biases. We let people tell us what their language is.”

Once the words and their ratings were nailed down, scoring the words actually used in each medium and language was a straightforward, if big, computing task.
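One way to turn word ratings into a score for a whole text is a frequency-weighted average: h_avg = Σ f_w·h_w / Σ f_w. Here h_w is a word's average rating and f_w is how often the word appears. The sketch below assumes that formula. It adds an optional, hypothetical `exclude_within` parameter that drops words rated within that distance of the neutral 5. This is one way to set aside function words such as “the” after the fact, in the spirit of Dodds's remark about doing so in a principled way. The whitespace tokenizer and sample scores are simplifying assumptions.

```python
from collections import Counter
from typing import Optional


def text_happiness(text: str, scores: dict[str, float],
                   exclude_within: float = 0.0) -> Optional[float]:
    """Frequency-weighted mean score of the rated words in `text`.

    Words with |score - 5| < exclude_within are treated as neutral and skipped.
    Returns None if no rated words remain.
    """
    counts = Counter(text.lower().split())  # naive whitespace tokenizer
    weighted_sum = 0.0
    total_count = 0
    for word, freq in counts.items():
        h = scores.get(word)
        if h is None or abs(h - 5.0) < exclude_within:
            continue  # unrated, or too close to neutral
        weighted_sum += h * freq
        total_count += freq
    return weighted_sum / total_count if total_count else None


# Hypothetical scores for illustration only.
scores = {"the": 5.0, "happy": 8.0, "day": 6.0, "sad": 2.0}
text = "The happy day the sad day"
full = text_happiness(text, scores)
lensed = text_happiness(text, scores, exclude_within=1.0)
print(full, lensed)
```

Keeping every rated word in the average and filtering only at scoring time matches the approach Dodds describes. The instrument itself stays unbiased, and the choice to drop neutral words becomes an explicit, adjustable parameter.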

And here’s the weird part: The average rating for every language — and every medium — was significantly above 5.

So did the study show that people have, in general, positive emotions? “That’s strong,” Dodds says. “I think it’s proof of a positive bias in language, and language is our code for how we interact; it’s our great social technology, an amazing invention that allows our communication in a powerful way.”

A long debate in linguistics boils down to this: Language shapes us. Or we shape language. The reality, Dodds says, is somewhere in the middle. “I think language encodes our sociality. We are social beings. You can argue that we are selfish or altruistic, but language tells a story about how we behave.”

Time series graph of average daily Twitter word usage for the last 13 months.
A time series of positivity and negativity in Twitter posts in the U.S. since January 2014. Holidays such as Christmas and Thanksgiving stand out as the happiest days, and the Ferguson, Mo., protests and the death of Robin Williams as the saddest. Note the fine-scale weekly “wobble” of happiness between the start and end of each week, and the fact that the rating stays consistently above a neutral 5.

The sweet tweet

Map of the U.S. with cities shown as pixels colored on a red-to-blue scale according to how happy or sad their Twitter word usage was
The 2011 geography of happiness, or at least of happy word usage, on Twitter.

After collecting about 100 billion words in tweets, the researchers were able to track emotional expression by time and, for geotagged tweets, by place.

The data showed how emotional expression can vary from day to day. On a weekly cycle, Saturdays proved most positive and Tuesdays most negative. Holidays, especially “merry” Christmas and “happy” Thanksgiving, marked the biggest spikes; deaths, murders and other outrages brought the biggest lows. Among the 10 languages, Spanish (taken from Mexican websites, books and tweets) was the most positive, and Chinese, taken from Google Books, was the most negative.
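You can find a weekly cycle like the Saturday high and Tuesday low by grouping daily average scores by day of the week and averaging each group. This minimal sketch assumes the daily scores have already been computed, for example with a frequency-weighted average of each day's words. The function name `weekday_profile` and the four sample values are hypothetical, not the study's measurements.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_profile(daily_scores: dict[date, float]) -> dict[str, float]:
    """Average the daily happiness scores that fall on each day of the week."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for day, score in daily_scores.items():
        buckets[day.weekday()].append(score)  # weekday(): Monday == 0
    return {DAY_NAMES[d]: sum(v) / len(v) for d, v in sorted(buckets.items())}


# Hypothetical daily scores. Jan. 4 and 11, 2014 were Saturdays;
# Jan. 7 and 14 were Tuesdays.
daily = {
    date(2014, 1, 4): 6.10, date(2014, 1, 7): 5.96,
    date(2014, 1, 11): 6.08, date(2014, 1, 14): 5.98,
}
profile = weekday_profile(daily)
print(profile)
print("happiest:", max(profile, key=profile.get))
```

Averaging across many weeks keeps one-off events, such as a holiday or a tragedy, from dominating the weekly pattern.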

With its consistent tilt toward positive words across languages and media, the study confirmed the 1969 “Pollyanna hypothesis,” which holds that most human communication tends toward the positive.

Among cities, Boulder, Colo., ranked highest for positive word usage, while Racine, Wis., was the most negative. In Racine, Dodds says, “There is a lot more swearing, and ‘don’t,’ ‘never,’ ‘no,’ ‘nobody,’ and less ‘haha’ — that’s important on Twitter — and less ‘happy,’ ‘best,’ and ‘awesome.'”

We returned to our attempt to summarize the study, asking, “Are you saying people are happy based on the words they choose?” No, says Dodds. “We are not telling you what people are thinking inside their heads. We are telling you how people react to the words people use.”

– David J. Tenenbaum


Kevin Barrett, project assistant; Terry Devitt, editor; S.V. Medaris, designer/illustrator; David J. Tenenbaum, feature writer

Bibliography

  1. Human language reveals a universal positivity bias, Peter Sheridan Dodds et al., Proceedings of the National Academy of Sciences, Feb. 9, 2015.
  2. Eric Idle – “Always Look On The Bright Side Of Life”
  3. Big data. Big obstacles.
  4. How Big Data can bring medical benefits, without compromising privacy rights.