Here’s how rubbish April Fools’ stories could help AI identify ‘fake news’

1 Apr 2019

Man in a suit and tie unimpressed at reading an April Fool's story in a newspaper.

Image: © studiovespa/Stock.adobe.com

1 April marks the day when media outlets flock to write stories too silly to be true, but those same stories could also help us tackle so-called ‘fake news’.

On the same day that a journalist’s inbox is full of PR pitches attempting to catch their attention with an April Fools’ Day joke, media outlets across the world are trying to play a prank on their readers with silly stories that almost sound like they could be true.

While you might roll your eyes at the annual tradition, researchers from Lancaster University have found that these stories could offer clues to artificial intelligence (AI) algorithms designed to spot attempts at disinformation, or so-called ‘fake news’. They are soon to present their findings at the 20th International Conference on Computational Linguistics and Intelligent Text Processing.

Having compiled a dataset of more than 500 April Fools’ articles published over a period of 14 years, the researchers claim that false stories and April Fools’ Day stories have a similar structure.

“April Fools’ hoaxes are very useful because they provide us with a verifiable body of deceptive texts that give us an opportunity to find out about the linguistic techniques used when an author writes something fictitious disguised as a factual account,” said Edward Dearden, lead author of the research. “By looking at the language used in April Fools’ and comparing them with fake news stories, we can get a better picture of the kinds of language used by authors of disinformation.”

While the team admitted that not all of the features of April Fools’ pieces are useful for detecting disinformation, both types of article tend to favour less complex language and use longer sentences than genuine news.

Additionally, important new information – such as names, places, dates and times – was found less often within both April Fools’ stories and attempts at disinformation. However, proper nouns – eg the names of prominent politicians such as Donald Trump or Hillary Clinton – are more abundant in disinformation than in genuine news articles or April Fools’ pieces, which have significantly fewer.

A surprising discovery

One surprise in the findings was that first-person pronouns such as ‘we’ are quite common in April Fools’ and false stories, going against traditional thinking in deception detection that suggests liars use them less frequently.
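To make the kinds of signals described above more concrete, here is a minimal Python sketch that measures three of them in a piece of text: average sentence length, the share of first-person pronouns and a rough stand-in for proper nouns. It is not the researchers’ method. The function names, the pronoun list and the use of capitalised mid-sentence words as a proxy for proper nouns are all assumptions made for illustration.

```python
import re

# Assumed pronoun list for illustration; the study's actual list is not given.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return the alphabetic words in a sentence."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)


def extract_features(text):
    """Compute three simple stylistic features for a text."""
    sentences = split_sentences(text)
    total_words = 0
    first_person = 0
    capitalised = 0
    for sentence in sentences:
        words = tokenize(sentence)
        total_words += len(words)
        first_person += sum(1 for w in words if w.lower() in FIRST_PERSON)
        # Rough proxy (an assumption): a capitalised word that does not start
        # the sentence, and is not a pronoun such as "I", is a proper noun.
        capitalised += sum(
            1 for w in words[1:]
            if w[0].isupper() and w.lower() not in FIRST_PERSON
        )
    if not sentences or total_words == 0:
        return {"avg_sentence_length": 0.0,
                "first_person_rate": 0.0,
                "proper_noun_rate": 0.0}
    return {
        "avg_sentence_length": total_words / len(sentences),
        "first_person_rate": first_person / total_words,
        "proper_noun_rate": capitalised / total_words,
    }
```

Calling `extract_features("We went to Dublin. Our friend Anna met us there.")` gives an average sentence length of five words, with roughly three in 10 words being first-person pronouns and two in 10 counted as proper nouns. A real system would likely use a proper part-of-speech tagger rather than this capitalisation shortcut.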

The researchers also created a machine-learning ‘classifier’ to identify whether articles were April Fools’ hoaxes, attempts at disinformation or genuine news stories. This AI achieved 75pc accuracy in identifying April Fools’ stories, and 72pc for false stories.
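For readers unfamiliar with the term, accuracy here is simply the share of articles a classifier labels correctly. The short Python sketch below shows that calculation. The labels in the usage note are made up for illustration and are not the researchers’ data, and the `accuracy` function is a hypothetical helper rather than part of their system.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)
```

With the invented true labels `["hoax", "genuine", "hoax", "genuine", "hoax"]` and predictions `["hoax", "genuine", "genuine", "genuine", "hoax"]`, four of the five match, so `accuracy` returns 0.8, or 80pc.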

Dr Alistair Baron, co-author of the paper, said: “Looking at details and complexities within a text are crucial when trying to determine if an article is a hoax. Although there are many differences, our results suggest that April Fools’ and fake news articles share some similar features, mostly involving structural complexity.

“Our findings suggest that there are certain features in common between different forms of disinformation, and exploring these similarities may provide important insights for future research into deceptive news stories.”