Meta said the AI model was designed to bring machine translation to low-resource languages and is being used to help Wikipedia editors.

Facebook’s parent company Meta has released an open-source AI model that can translate 200 languages.

Meta said its single AI model, which has a particular focus on African languages, is the first able to translate across so many languages.

The tech company said a handful of languages dominate the internet, which means only a fraction of the world can access content and contribute to the web in their own language. Meta CEO Mark Zuckerberg said the new natural language processing (NLP) model will allow “25bn translations every day” across the company’s apps.

“We call this project No Language Left Behind, and the AI modelling techniques we used are helping make high quality translations for languages spoken by billions of people around the world,” Zuckerberg said.

Zuckerberg said the AI model, also called NLLB-200, has more than 50bn parameters and was trained on the company’s supercomputer called Research SuperCluster.

Meta first shared details of this long-term project in February, when the company showcased where its AI research is focused for the year ahead. This included ambitious plans for a Universal Speech Translator, to better support languages that lack a standardised writing system.

Meta said the new AI model covers 55 African languages, including low-resource languages that have few written examples available online. The tech giant said it has worked with professional translators so it can automatically assess the translation quality of low-resource languages.

The tech company has released a research paper on the AI model and said it will provide tools for other researchers to extend the work to other languages. Meta said lessons from the model are being trialled on translation systems used by Wikipedia editors.

Potential pressure for reviewers

Commenting on the announcement, Victor Botev, CTO of start-up Iris.ai, said the engineering prowess needed to present enough data for these obscure datasets is a “marvel”.

However, he said these types of AI models are not necessarily the “cure-all” they first appear to be, as their size can make them struggle with specific tasks.

“The models that Meta uses are massive, unwieldy beasts,” Botev said. “So, when you get into the minutiae of individualised use-cases, they can easily find themselves out of their depth – overgeneralised and incapable of performing the specific tasks required of them.”

Botev said that the validity of Meta’s measurements has not been scientifically proven and verified by their peers. He said the datasets for different languages are too small and the metric Meta is using, BLEU, “is not particularly applicable”. Botev also said the article has not been published for peer review.

“Doing a kind of peer review through Meta’s media publication creates bias for future reviews and puts public pressure on the reviewers,” Botev said. “But despite all of this, I’m hoping that these points will be addressed and it will be a good foundation for some great work in the next few months in NLP.”
