New study shows value of translating user-generated content


Getting the most out of translating your user-generated content

Language service providers have long known from experience that translating business content into the customer’s native language yields significant financial returns.

Now they have evidence to back up what they knew intuitively all along. A new study illustrates the power of translating your user-generated content into multiple languages, and shows which language pairs are correlated with higher ratings.

A study by Scott Hale, Data Scientist at the Oxford Internet Institute, shows how language affects ratings on TripAdvisor. Hale used data mined from the site to map out whether translating between certain language pairs had beneficial effects on the overall ratings customers gave on TripAdvisor. Below is a map of the correlations between all the language pairs that Hale looked at.

TripAdvisor Language Ratings

This chart shows how similarly user reviews in different languages rated the same London attractions on TripAdvisor. Each cell represents the mean correlation of the star ratings between a pair of languages. Image credit:
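To make the idea of a per-cell correlation concrete, here is a minimal sketch of how one such number could be computed for each language pair. It is not Hale's actual method: it assumes we average the star ratings per attraction within each language and then take the Pearson correlation across the attractions that both languages reviewed. The sample reviews, the language codes, and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical sample data: (attraction, review language, stars).
# These are illustrative values, not figures from the study.
reviews = [
    ("Tower of London", "en", 5), ("Tower of London", "en", 4),
    ("Tower of London", "fr", 4), ("Tower of London", "ja", 5),
    ("London Eye", "en", 3), ("London Eye", "fr", 2),
    ("London Eye", "ja", 4),
    ("British Museum", "en", 5), ("British Museum", "fr", 5),
    ("British Museum", "ja", 3),
]


def mean_ratings(reviews):
    """Average star rating for each (language, attraction) pair."""
    buckets = defaultdict(list)
    for attraction, lang, stars in reviews:
        buckets[(lang, attraction)].append(stars)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation; returns None if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return None
    return cov / (sd_x * sd_y)


def language_pair_correlations(reviews):
    """Correlate mean ratings over the attractions two languages share."""
    means = mean_ratings(reviews)
    langs = sorted({lang for lang, _ in means})
    attractions = sorted({attraction for _, attraction in means})
    result = {}
    for a, b in combinations(langs, 2):
        shared = [x for x in attractions
                  if (a, x) in means and (b, x) in means]
        if len(shared) < 2:
            continue  # a correlation needs at least two shared attractions
        r = pearson([means[(a, x)] for x in shared],
                    [means[(b, x)] for x in shared])
        if r is not None:
            result[(a, b)] = r
    return result


if __name__ == "__main__":
    for pair, r in language_pair_correlations(reviews).items():
        print(pair, round(r, 2))
```

In this toy data the English and French reviewers rank the three attractions in the same order, so their correlation is high, while the Japanese reviewers rank them differently, so their correlation with both is negative. A real analysis would need far more reviews per attraction before any cell could be trusted.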

A quick look at the numbers reveals some telling takeaways that can help any company to attract and retain customers.

The first is that the volume of user-generated content has skyrocketed. More and more, Internet users themselves, rather than institutional entities and site managers, are creating content. This user-generated content (UGC) has high commercial value: reviews, social media comments, and personal blogs can all help promote a business’s services and products. For example, a cluster of positive Yelp reviews of a restaurant can do more than any explicit advertising to bump up its revenue. Simply put, customers trust the unsolicited experience of other customers more than promotion or marketing sponsored by a commercial entity. With widespread Internet access and use among consumers, more people than ever can create their own digital content.

More UGC means more demand for translation. Unless translated, UGC stays in the native language of its creator, severely limiting the reach of that content. According to Common Sense Advisory, 74% of online shoppers say they are “more likely to purchase the same brand again if the after-sales care is in their language,” even if they speak good English. Thus, not translating your business’s UGC comes at a significant commercial opportunity cost.

Second, more and more UGC is being written in languages other than English.

In his study, Hale says that, while the earliest TripAdvisor reviews from 2001 were all in English, from 2006 on “non-English reviews grew quickly,” with the top languages including French, Spanish, Danish, Italian, Japanese, Portuguese, and Russian. Below, Hale graphs the increase in content in these languages.

TripAdvisor Top 8 Languages

The eight languages with the most user reviews on TripAdvisor about London attractions from 2001 to 2015. Image credit:

Third, Hale’s study demonstrates the positive effect that translation between certain language pairs can have for businesses. Hale found that certain language pairs dramatically increased the number of stars customers gave to attractions and shows. Conversely, in the absence of translation, some language pairs actually hurt the ratings customers gave on TripAdvisor.

For example, another study by Hale, on bilingual editing of Wikipedia, showed that Japanese speakers were less likely to engage with foreign-language content than speakers of other languages. Thus, translating content into Japanese for Japanese customers might yield a higher marginal benefit for businesses than translating into other languages, because Japanese customers more strongly prefer engaging with content in their native language. It is empirical confirmation of something that intuitively makes sense: customers respond best to content in the language they feel most at home in.

As more UGC is generated, language service providers will have more data to analyze, which should yield more accurate correlations showing which language pairs perform best commercially. That way, a company will in future be able to optimize its translation budget. Hale envisions a business with, say, many reviews in Russian being able to ask which languages would be most worth translating those reviews into.