How DCU machine translation experts are transforming communication



Prof Andy Way of Dublin City University. Image: DCU


Machine translation is helping people work through language barriers, and creating a world that will make communication easier, even in times of crisis.

Prof Andy Way is a machine translation (MT) expert. He has been working in this area for the past 30 years, and currently leads the Adapt Centre’s MT team at Dublin City University (DCU).

Way explained how MT was a good career choice for him: “My first degree was French, German and linguistics, and then I did a master’s in computing, so doing MT was a perfect marriage of the two disciplines.”

After completing a master’s, Way went on to undertake a PhD in MT.

The MT evolution

Way explained that there have been some significant changes in the approaches to MT over the years. “From the late ’80s to about 2015, the dominant approach to MT was statistical (SMT). We needed large amounts of parallel data, ie source sentences and their human-provided translations, to build our statistical translation models, which essentially would suggest target-language words and phrases, which the model believed to be translations of the source sentence.

“Following this process, it was then the job of the target language model – built from a large collection of monolingual data – to rearrange these words and phrases to produce the most fluent output according to the model.”

Essentially, SMT worked by identifying patterns in large collections of text, “which could be brought to bear when processing new, previously unseen texts”.
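The two-stage process Way describes, with a translation model proposing target words and a language model choosing the most fluent arrangement, can be illustrated with a toy noisy-channel scorer. This is a minimal sketch, not Adapt's system. The French-English phrase table, the bigram probabilities and the `UNSEEN` floor are all hypothetical. It also brute-forces every word choice and ordering, whereas real SMT decoders used phrase-based models with beam search.

```python
import itertools
import math

# Hypothetical toy translation model: source word -> {target word: P(source | target)}.
# Every source word passed to translate() must appear here.
PHRASE_TABLE = {
    "maison": {"house": 0.8, "home": 0.2},
    "bleue": {"blue": 0.9, "sad": 0.1},
}

# Hypothetical bigram language model P(word | previous word); "<s>" marks the sentence start.
BIGRAM_LM = {
    ("<s>", "blue"): 0.3,
    ("<s>", "house"): 0.2,
    ("blue", "house"): 0.4,
    ("blue", "home"): 0.1,
    ("house", "blue"): 0.01,
}
UNSEEN = 1e-6  # probability assumed for any bigram missing from the toy model


def lm_logprob(words):
    """Log probability of a target word sequence under the bigram model."""
    prev, total = "<s>", 0.0
    for word in words:
        total += math.log(BIGRAM_LM.get((prev, word), UNSEEN))
        prev = word
    return total


def translate(source):
    """Pick the target sentence maximising log P(source | target) + log P(target)."""
    tokens = source.split()
    best, best_score = None, -math.inf
    # Translation model: try every combination of candidate target words.
    for choice in itertools.product(*(PHRASE_TABLE[t].items() for t in tokens)):
        tm_score = sum(math.log(p) for _, p in choice)
        words = [w for w, _ in choice]
        # Language model: try every ordering and keep the most fluent overall.
        for order in itertools.permutations(words):
            score = tm_score + lm_logprob(order)
            if score > best_score:
                best, best_score = order, score
    return " ".join(best)


print(translate("maison bleue"))  # → blue house
```

Summing log probabilities rather than multiplying raw ones avoids numerical underflow on longer sentences; the exhaustive search, by contrast, grows factorially with sentence length, which is why real decoders prune their hypotheses.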

The last three years in MT have also seen neural MT (NMT) come to the fore. With NMT, all a research team needs is parallel data. The dominant model encodes the source sentence into a numerical vector representation, “which is in turn sent en bloc to the target-language decoder, whose job it is to generate the most likely target text from that vector”.
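The encoder-decoder flow in that quote can be sketched in plain Python. An encoder RNN folds the source tokens into one fixed-size vector, and a decoder RNN starting from that vector emits target tokens greedily. This is a minimal sketch under stated assumptions: the vocabulary sizes, the simple (Elman) RNN cell and the random, untrained weights are all hypothetical, so the output ids are meaningless and only the data flow is illustrated. Production NMT systems are trained on large parallel corpora, and later architectures add attention so the decoder is not limited to a single vector.

```python
import math
import random

HIDDEN = 4      # size of the sentence vector (hypothetical)
SRC_VOCAB = 6   # hypothetical source vocabulary size
TGT_VOCAB = 5   # hypothetical target vocabulary size
EOS = 0         # id 0 doubles as start and end-of-sentence token, for brevity

rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def one_hot(i, n):
    v = [0.0] * n
    v[i] = 1.0
    return v


def rnn_step(w_in, w_h, x, h):
    """One simple RNN step: h' = tanh(W_in x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


# Untrained random parameters: these show the shapes and data flow only.
ENC_IN, ENC_H = rand_matrix(HIDDEN, SRC_VOCAB), rand_matrix(HIDDEN, HIDDEN)
DEC_IN, DEC_H = rand_matrix(HIDDEN, TGT_VOCAB), rand_matrix(HIDDEN, HIDDEN)
DEC_OUT = rand_matrix(TGT_VOCAB, HIDDEN)


def encode(src_ids):
    """Compress the whole source sentence into one fixed-size vector."""
    h = [0.0] * HIDDEN
    for i in src_ids:
        h = rnn_step(ENC_IN, ENC_H, one_hot(i, SRC_VOCAB), h)
    return h


def decode(vector, max_len=10):
    """Greedily generate target ids from the encoder's vector until EOS or max_len."""
    h, prev, out = vector, EOS, []
    for _ in range(max_len):
        h = rnn_step(DEC_IN, DEC_H, one_hot(prev, TGT_VOCAB), h)
        logits = matvec(DEC_OUT, h)
        prev = max(range(TGT_VOCAB), key=lambda k: logits[k])  # most likely next token
        if prev == EOS:
            break
        out.append(prev)
    return out


vec = encode([1, 4, 2])
print(len(vec), decode(vec))
```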

Way explained that NMT typically outperforms SMT and could be considered the “new state of the art”, citing more fluent translations and better word order among its advantages. However, NMT requires much bigger training datasets, and its models generally take longer to train.

Adapt Centre breaking down language barriers

In terms of the application of MT at the Adapt Centre, Way and his team tackle language barriers that are “key challenges in enabling content to flow fluently across the globe”.

He explained that language shouldn’t be an obstacle when it comes to accessing information online, and machine translation is “becoming ever more important in facilitating access to content using dynamic transformation techniques”.

Way also explained how Adapt’s industry partners reap many benefits by engaging with the MT team at DCU. “[They] increase their competitiveness and reach across a wider set of industry vertical sectors.”

The team also works on developing “robust, high-quality Irish-English MT systems”, which Government departments use on a daily basis. These systems provide increasingly important societal and economic benefits to people in Ireland, especially as we approach the expiration of the Irish language derogation afforded by the European Commission.

MT in crisis situations

The work done by the Adapt MT team, led by Way, is a wonderful example of the real-life benefits of advances in computer science, which are already producing amazing results.

Interact is a project coordinated by Prof Sharon O’Brien at DCU and has partners such as Translators Without Borders, Microsoft, Unbabel, UCL and ASU. The project is concerned with the provision of reliable MT services in crisis scenarios.

Way points to the example of the 2010 Haiti earthquake, when “the world’s relief services arrived only to find that the locals only spoke Haitian Creole, so communication was impossible”. At the time, Microsoft put together an MT system, and the work Interact is doing runs along similar lines.

Way explained how the process works: “We’re trying to do much the same here, for a range of non-major language pairs, by pivoting through a well-resourced language; for example, there isn’t much Greek-Arabic parallel data around, but there is plenty of Greek-English and English-Arabic data, so we will be able to build a Greek-English-Arabic system with English as the pivot language. Other pivot set-ups include German-English-Arabic and French-English-Swahili.”
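Pivoting amounts to composing two systems, so that the output of the first becomes the input of the second. This minimal sketch assumes hypothetical word-by-word lookup tables in place of trained Greek-English and English-Arabic models; `dictionary_translator` and `pivot` are illustrative names, not part of any Interact software.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def dictionary_translator(table: Dict[str, str]) -> Translator:
    """Hypothetical stand-in for a trained MT system: word-by-word lookup."""
    def translate(sentence: str) -> str:
        # Unknown words pass through unchanged.
        return " ".join(table.get(w, w) for w in sentence.split())
    return translate


def pivot(src_to_pivot: Translator, pivot_to_tgt: Translator) -> Translator:
    """Chain two systems so the well-resourced pivot language bridges the missing pair."""
    return lambda sentence: pivot_to_tgt(src_to_pivot(sentence))


greek_english = dictionary_translator({"νερό": "water", "γιατρός": "doctor"})
english_arabic = dictionary_translator({"water": "ماء", "doctor": "طبيب"})
greek_arabic = pivot(greek_english, english_arabic)

print(greek_arabic("γιατρός νερό"))  # → طبيب ماء
```

One consequence of chaining is that any error made in the source-to-pivot step is passed on to, and possibly compounded by, the pivot-to-target step.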

MT benefiting millions

Another MT project that benefits millions is a software translation tool created by Adapt Centre graduate and Microsoft researcher Dr Sandipan Dandapat.

Bangla is the seventh most spoken language in the world. Development of the tool, which can help 215m Bangla speakers, began while Dandapat was at Adapt, during a Microsoft internship.

Dandapat said: “As Bangla is also my native language, it was especially rewarding to have the opportunity to bring this project to completion.”

Helming a team that works on such major projects, Way is in a good position to examine the future impact MT could have on the world and how we all communicate. “MT quality is now good enough that it has been demonstrated to be a key enabler in the translation pipeline for many use cases in different domains and for a range of end users. For example, Google Translate alone translates around 150bn words per day, every day.”

New use cases for MT

Way cited the emergence of more and more use cases, such as the instantaneous translation of massive volumes of tweets and other user-generated content. He said this area is where the “Adapt MT team is playing a leading role, where MT is the only hope. Human translators cannot work fast enough to meet such a huge translation demand and, in any case, the MT output is ‘good enough’ to allow users to understand the original content.”

In 2014, Adapt’s Brazilator project translated 83m words for 26 language pairs in real time, in tweets related to the FIFA World Cup. “By our estimation, around 1,300 human translators would have had to work full-time to do the same job in the time available, at a cost of over €3m. Clearly that would have been unfeasible, so there are opportunities for MT to offer new services where there currently is no provision.”
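The figures in Way's estimate can be unpacked with a little arithmetic. This sketch uses only the numbers quoted for Brazilator, except `WORDS_PER_DAY`, a hypothetical translator throughput chosen purely for illustration. Because the cost was “over €3m”, the per-word figure is a lower bound.

```python
# Figures quoted for the 2014 Brazilator project.
WORDS = 83_000_000
TRANSLATORS = 1_300
COST_EUR = 3_000_000  # "over €3m", so treat this as a lower bound

# Hypothetical full-time throughput of one translator (an assumption, not from the source).
WORDS_PER_DAY = 2_000

words_per_translator = WORDS / TRANSLATORS
cost_per_word = COST_EUR / WORDS
days_needed = words_per_translator / WORDS_PER_DAY

print(round(words_per_translator))  # → 63846
print(round(cost_per_word, 3))      # → 0.036
print(round(days_needed, 1))        # → 31.9
```

Under that assumed rate, each of the 1,300 translators would need about 32 days of full-time work, at a minimum of roughly €0.036 per word.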

What about human translators?

So, you might ask, where does the MT boom leave real-life, human translators?

Way explained that it is not the case that MT will ever replace what an individual expert can do. “This is no threat to human translators; it is estimated that only around 5pc of all content that could be translated actually is, as there simply aren’t enough human translators to meet this demand.”

He also emphasised that there would always be a need for a “human in the loop” to catch errors that MT models make, and to post-edit them. Although MT quality is now generally good, Way said: “It will always make some mistakes, which are often difficult to predict.

“For example, NMT now produces some excellent translations but, all of a sudden, it’ll produce a translation which has very little to do with the sentence being translated.

“Given that a lot of the ‘knowledge’ is encoded in the system’s hidden layers, although quality has definitely improved, our understanding of what’s going on has not.”
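The failure Way describes, an output with little to do with its source, is one reason post-editing workflows route some segments to people. As a crude illustration only, this sketch assumes a simple length-ratio heuristic with hypothetical thresholds (`low`, `high`); it is not a method attributed to Adapt, and `Segment` and `needs_human_review` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    mt_output: str


def needs_human_review(seg: Segment, low: float = 0.5, high: float = 2.0) -> bool:
    """Flag segments whose target/source word-count ratio falls outside [low, high]."""
    src_len = len(seg.source.split())
    tgt_len = len(seg.mt_output.split())
    if src_len == 0 or tgt_len == 0:
        return True  # empty input or output always goes to a human
    ratio = tgt_len / src_len
    return not (low <= ratio <= high)


segments = [
    Segment("the clinic opens at nine", "la clinique ouvre à neuf heures"),
    Segment("the clinic opens at nine", "oui"),
]
print([needs_human_review(s) for s in segments])  # → [False, True]
```

A check like this catches only gross failures: a fluent but wrong translation of plausible length passes straight through, which is exactly why Way argues that human expertise stays in the loop.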

He also made the point that the MT model training data itself comes from human translators. “We use some of that data for automatic testing of our systems. We need human expertise to tell us what errors our systems are making.”
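Automatic testing against human translations usually means scoring system output against a human reference. The article does not say which metric the Adapt team uses; as an illustration, this is a simplified, unsmoothed, sentence-level version of BLEU, a widely used metric that combines clipped n-gram precisions (up to 4-grams) by geometric mean and multiplies by a brevity penalty. For real evaluations, a standard implementation such as sacreBLEU is preferable.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the human reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # unsmoothed: any zero precision gives a score of 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages outputs shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the relief team arrived in Haiti"
print(sentence_bleu(reference, reference))  # → 1.0
print(round(sentence_bleu("the relief team arrived in Port-au-Prince", reference), 3))  # → 0.76
```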

Ethics are paramount

As with all machine learning endeavours, MT raises ethical concerns, and Way said the experts are constantly cognisant of them.

For example, who owns the parallel translation data (and any derivatives that ensue) used to train the MT systems? Who is liable if someone suffers (in all meanings of the word) from an MT error? Way also stressed the need for fair rates of pay for translators and post-editors who do vital work.

For him, the Adapt team, and machine translation experts in general, “the human in the loop will always be the most important link in the chain. We MT developers are just trying to make technology-savvy translators better, not replace them.”

It’s a fascinating field, and one that is metamorphosing at a rapid rate. MT models are helping all of us – from when we use Google Translate to find the meaning of a Spanish word on holidays, to creating an environment where more lives can be saved through clear communications in the aftermath of natural disasters.

Way and the Adapt team are at the coalface of major and exciting developments that will improve how we all connect and engage with one another.

Ellen Tannam is a writer covering all manner of business and tech subjects

editorial@siliconrepublic.com