DCU researchers analyse World Cup tweets to improve machine translation

8 Jul 2014

Image via lazyllama/Shutterstock

Researchers from Dublin City University (DCU) at the CNGL Centre for Global Intelligent Content in Dublin, working in collaboration with Microsoft Research, have developed a live tweet translation streaming service in time for the 2014 World Cup’s final stages.

The 20-strong research team has named its live FIFA World Cup translation service ‘Brazilator’. With it, fans can follow what supporters in 24 of the 32 original competing countries are saying.

Brazilator tracks tweets in Irish, German, French, Spanish, Italian, Portuguese, Croatian, Greek, Japanese, Korean, Chinese and Farsi that include hashtags such as #WorldCup and #WC2014. It then publishes each tweet in its original form along with translations into English, French, Portuguese, Spanish and German.

The stream also provides sentiment analysis for each team during matches, and visitors can look back at previous games to see whether the Twitterverse reacted positively or negatively to a team’s performance.


Brazilator’s sentiment analysis of Portugal vs USA

While translating and analysing 140-character tweets may seem manageable, the kind of language used on social media presents a significant technical challenge.

“Tweets typically contain noisy, diverse and unstructured language, such as incomplete sentences, misspellings, abbreviations, web links, emoticons and hashtags,” explains Dr Lamia Tounsi, research integration officer with CNGL and co-leader of the Brazilator project team.

That’s where the DCU team’s expertise comes into play. CNGL, which is co-led by DCU and Trinity College Dublin, is one of just 11 Microsoft Translator partners worldwide and its machine translation group has won international recognition for its ability to translate the kind of unstructured language found in social media.

Apart from providing another way to follow the World Cup online, Brazilator is helping to build and evaluate machine translation engines, adapt existing engines to the football domain, analyse sentiment, and create new Web 2.0 resources.

“The Brazilator World Cup service evaluates machine translation systems and helps to identify the most effective translation options for this type of web content,” adds Tounsi.


Elaine Burke is the host of For Tech’s Sake, a co-production from Silicon Republic and The HeadStuff Podcast Network. She was previously the editor of Silicon Republic.