EU bonanza for language software developers


21 Jan 2008

The European Commission has made its collection of about one million sentences and their translations in 22 languages available for free. The data is highly sought after by developers of computer-assisted translation technology and is expected to spur development of linguistic software tools.

The Commission’s collection of translated sentences is the largest of its kind to cover so many languages: 22 of the EU’s 23 official languages. It will be used in machine translation systems, where automatic translation software ‘learns’ from manually translated texts how words and phrases are translated correctly and in context.

“By this initiative the European Commission intends to boost human language technologies, support multilingualism and make computer-assisted translation easier, cheaper and more accessible,” said Leonard Orban, Commissioner for Multilingualism. “Citizens belonging to the smaller linguistic communities will now have easier access to documents and web pages previously only available in the most used languages.”

“This unique collection of language data contributes to the creation of a new generation of software tools for human language processing and helps foster the competitiveness of the language industry, which is already one of the fastest-growing industries in the EU,” said Janez Potočnik, Commissioner for Science and Research.

The EU has more multilingual texts than any other body because EU law must exist in each of the Union’s 23 official languages. The EU translation services, which work with 253 possible language-pair combinations, produce around 1.5 million translated pages a year.
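The figure of 253 language pairs follows from counting every unordered pair among 23 languages. A minimal sketch of that arithmetic is below; the function name `unordered_pair_count` and the use of two-letter language codes are illustrative assumptions, not part of the Commission's release.

```python
from itertools import combinations
from math import comb

# Two-letter codes for the 23 official EU languages as of 2008.
# The codes are an illustrative labelling, not taken from the article.
LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hu",
    "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def unordered_pair_count(n_languages: int) -> int:
    """Number of language pairs when direction is ignored: n * (n - 1) / 2."""
    return comb(n_languages, 2)


# Enumerate the pairs explicitly to cross-check the closed-form count.
pairs = list(combinations(LANGUAGES, 2))

print(len(LANGUAGES))                        # number of languages
print(unordered_pair_count(len(LANGUAGES)))  # pairs among all 23 languages
print(len(pairs))                            # same count, by enumeration
print(unordered_pair_count(22))              # pairs among 22 languages (no Irish)
```

Counting each translation direction separately (for example, Latvian to Romanian and Romanian to Latvian) would double these figures.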

Whereas large volumes of translated English or French text can be found on the internet, such resources are scarce for languages such as Latvian or Romanian, and they are practically non-existent for pairs of languages that each have few resources.

The Commission is releasing large collections of sentences from legal documents covering technical, political and social issues, which are available in 22 languages. In this translation repository it is possible to find sentences with their equivalent in all other official languages. Only Irish translations are not yet available.

The Commission has extensive experience in the development of multilingual text processing tools and is at the forefront of multilingualism, offering publicly accessible news-search sites covering up to 35 languages via its European Media Monitoring tool. The 7th Framework programme for research and development supports research on machine translation and other language-related technologies.

By Niall Bryne