The Internet Archive helps Wikipedia resurrect 9m dead links

3 Oct 2018

Wikipedia homepage. Image: pproman/Shutterstock

Wikipedia has millions of articles across an array of languages, so it often struggles to keep up with the volume of broken links. The Internet Archive is helping to solve this issue.

While it is often said that ‘the internet is forever’, this is not always true. Firms, organisations and individuals put up and pull down websites at a rapid rate, creating an issue for Wikipedia. What happens when a page that is cited to support the veracity of an article is pulled?

Because the internet is now viewed as a historical record, these disappearances leave gaps in it, and Wikipedia becomes littered with dead links.

Restoring millions of links

To help with this issue, the Internet Archive has recovered 9m broken links on Wikipedia. The links previously directed users to webpages that no longer exist, but they now go to archived versions of those pages in the Internet Archive’s Wayback Machine.

The Wayback Machine houses an archive of more than 338bn webpages, dating back to the very beginning of the world wide web. It can be difficult to search manually, so one Wikipedia contributor, Maximilian Doerr, created a program called IAbot (Internet Archive bot), which identifies links that return a 404 or ‘page not found’ error. Wikipedia community volunteer Stephen Balbach also played a crucial role in the project.

Once IAbot found the links, it searched the Internet Archive for the corresponding page. It then linked the article to the archived content. The software helped fix 6m links across 22 Wikipedia sites, with an additional 3m manually fixed by volunteers.
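The process described above (find a link that returns a 404, look up an archived copy, point the article at it) can be sketched in a few lines of Python. This is only an illustration and not IAbot's actual code. It assumes the Internet Archive's public Wayback availability endpoint at archive.org/wayback/available, and the function names is_dead, closest_snapshot and repair_links are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the public Wayback Machine availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def is_dead(url, timeout=10):
    """Return True only if the URL answers with HTTP 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        urllib.request.urlopen(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        return err.code == 404
    except OSError:
        # Timeouts and DNS failures are not a 404; leave these for review.
        return False
    return False


def closest_snapshot(url, timeout=10):
    """Return the URL of the closest archived snapshot, or None."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{WAYBACK_API}?{query}", timeout=timeout) as resp:
        data = json.load(resp)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def repair_links(links, check_dead=is_dead, find_snapshot=closest_snapshot):
    """Map each dead link to an archived copy, skipping links with no snapshot."""
    repaired = {}
    for link in links:
        if check_dead(link):
            snapshot = find_snapshot(link)
            if snapshot:
                repaired[link] = snapshot
    return repaired
```

Calling repair_links with a list of an article's external links returns a dictionary from each dead link to its archived replacement. The two checks are passed in as parameters so they can be swapped out, for example to try the loop without touching the network.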

More work to do

The Internet Archive is planning further work down the line. This includes checking more Wikipedia versions and boosting how rapidly it can repair broken links. It is also going to investigate using the same process for academic papers or other media.

This project is part of the Internet Archive’s Building A Better Web initiative, which aims to “bring you knowledge in all its many forms that is richer, deeper, more trustworthy and openly accessible on the web”. This is a positive development, particularly for those who use Wikipedia to find credible original articles to cite in reports, academic essays or other works. Without working links, readers may doubt the credibility of the source.

The Wikimedia Foundation conducted a study with researchers from Stanford and France’s EPFL, which found that users cite the Wayback Machine more than any other site. It averages approximately 25,000 click-throughs per day.


Ellen Tannam was a journalist with Silicon Republic, covering all manner of business and tech subjects.