Meta launches AI tool that can improve accuracy of Wikipedia citations

12 Jul 2022

Meta said its AI model can automatically verify hundreds of thousands of Wikipedia citations at once.

Researchers at Meta have developed an open-source AI tool that can automatically scan citations to check whether they’re accurate.

Meta said this is possible through Sphere, an AI model built on a dataset of 134m documents taken from public web pages. Rather than relying on traditional search engines, Sphere draws on open web data to better leverage “real-world knowledge”.

Meta said the model can review citations in Wikipedia and verify hundreds of thousands of them at once.

The eventual goal is to build a platform that helps Wikipedia editors spot citation issues easily and fix errors in citations or the corresponding article content quickly and at scale.

“While Wikipedia is accurate, well formatted and small enough for the majority of architectures to navigate, it’s also crowdsourced and doesn’t capture all the knowledge available on the web,” Meta said. “And its continued growth has made it challenging for editors to double-check every citation or inadvertent biases.”

Meta fed its algorithms 4m claims from Wikipedia, teaching them to zero in on a single source from a large pool of web pages to validate each statement. The model is then able to rank the cited source and retrieved alternatives according to the likelihood that they support the claim.
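
To make the ranking step concrete, here is a minimal Python sketch of the idea. It is not Meta's implementation: the scoring function is a toy word-overlap measure standing in for the trained model, and the function names, URLs and passages are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(claim, passage):
    """Toy stand-in for a learned verification model (assumption):
    Jaccard overlap between the claim's and the passage's tokens."""
    claim_tokens = tokenize(claim)
    passage_tokens = tokenize(passage)
    if not claim_tokens or not passage_tokens:
        return 0.0
    shared = claim_tokens & passage_tokens
    return len(shared) / len(claim_tokens | passage_tokens)


def rank_sources(claim, candidates):
    """Return (url, score) pairs, most supportive source first."""
    scored = [(url, support_score(claim, text)) for url, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: one cited source plus two retrieved alternatives.
claim = "The Eiffel Tower was completed in 1889."
candidates = {
    "https://example.org/cited": "The tower is a popular tourist attraction in Paris.",
    "https://example.org/alt-1": "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "https://example.org/alt-2": "Gustave Eiffel founded an engineering company.",
}

ranking = rank_sources(claim, candidates)
for url, score in ranking:
    print(f"{score:.2f}  {url}")
```

In this sketch a retrieved alternative outranks the cited source, which is the kind of case that would be flagged for a human editor to review.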

“When deployed in the real world, the model will offer the most relevant URLs as prospective citations for a human editor to review and approve,” Meta said.

It added that Sphere’s dataset represents “orders of magnitude more data” than other knowledge sources used in question-answering or fact-checking tasks, known as knowledge-intensive natural language processing (KI-NLP).

“Because Sphere can access far more public information than today’s standard models, it could provide useful information that they cannot,” Meta said in a blogpost.

It added that KI-NLP systems typically depend on “commercial black box” search engines to find relevant web knowledge to answer questions, which can lead to information being missed because it has a low ranking in search algorithms.

Sphere is an open-source tool and Meta aims to turn it into a universal source of knowledge that can solve multiple KI-NLP tasks at once.

The company said that in time, these models could deal with harmful web content and enhance people’s skills in digital literacy and critical thinking.

Although the AI tool was tested on Wikipedia articles, Meta said it is not partnering with Wikimedia, the organisation behind Wikipedia, on this project. Meta added that Sphere is still in the research phase and is not being used to automatically update Wikipedia content.

Last week, Meta released an AI translation tool that the company said works across 200 languages. It added that lessons from this open-source project are being applied to translation systems used by Wikipedia editors.

Leigh Mc Gowran is a journalist with Silicon Republic

editorial@siliconrepublic.com