OpenAI claims NYT ‘hacked’ ChatGPT for copyright lawsuit

28 Feb 2024


In a motion to dismiss, the AI company claimed the US media outlet exploited a bug to create its evidence and that the lawsuit focuses on fringe behaviours.

ChatGPT creator OpenAI has filed a motion to dismiss several claims in the New York Times’ copyright lawsuit, arguing that the newspaper’s complaint does not meet its own “famously rigorous journalistic standards”.

The AI company claims The New York Times “paid someone to hack OpenAI’s products” in order to generate “highly anomalous results” used as evidence in its AI copyright case.

“They were able to do so only by targeting and exploiting a bug (which OpenAI has committed to addressing) by using deceptive prompts that blatantly violate OpenAI’s terms of use,” the motion read.

“And even then, they had to feed the tool portions of the very articles they sought to elicit verbatim passages of, virtually all of which already appear on multiple public websites. Normal people do not use OpenAI’s products in this way.”

The company claims the newspaper’s “contrived attacks” were carried out by a “hired gun” and denies that the AI technology threatens journalism.

In a legal battle launched at the end of 2023, the media outlet claimed that AI models such as ChatGPT have copied and used millions of copyrighted news articles, in-depth investigations and other journalistic work.

“Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment,” The New York Times said. In January, OpenAI said it was “surprised and disappointed” by the lawsuit and added the newspaper was “not telling the full story”.

As well as claiming that the news organisation hacked its product to obtain its evidence, OpenAI’s motion to dismiss, filed this week (26 February), said the media outlet’s case focuses on what the ChatGPT creator claims are “two uncommon and unintended phenomena” in the AI model: regurgitation and hallucination.

“Training data regurgitation – sometimes referred to as unintended ‘memorisation’ or ‘overfitting’ – is a problem that researchers at OpenAI and elsewhere work hard to address, including by making sure that their datasets are sufficiently diverse,” the motion read.

“The second phenomenon – hallucination – occurs when a model generates ‘seemingly realistic’ answers that turn out to be wrong … An ongoing challenge of AI development is minimising and (eventually) eliminating hallucination, including by using more complete training datasets to improve the accuracy of the models’ predictions.”
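To make the regurgitation concept concrete, the kind of probe the lawsuit describes, feeding a model the opening of an article and checking whether its continuation reproduces the original text verbatim, can be sketched in a few lines of Python. In the sketch below, generate_continuation is a hypothetical placeholder for a call to whichever model is under test, and the prefix length and scoring approach are illustrative assumptions, not details taken from the court filings.

```python
# A minimal sketch of a verbatim-regurgitation probe.
# generate_continuation() is a hypothetical stand-in, not a real API;
# wire it to the completion endpoint of the model being tested.
from difflib import SequenceMatcher


def generate_continuation(prompt: str) -> str:
    """Hypothetical placeholder for a call to the model under test."""
    raise NotImplementedError("connect this to your model's completion API")


def regurgitation_score(article: str, prefix_chars: int = 200) -> float:
    """Prompt the model with the opening of an article and measure how
    closely its continuation matches the real text that follows."""
    prefix, reference = article[:prefix_chars], article[prefix_chars:]
    continuation = generate_continuation(prefix)
    # Longest contiguous match as a fraction of the reference continuation;
    # values near 1.0 suggest the passage was memorised verbatim.
    match = SequenceMatcher(None, continuation, reference).find_longest_match(
        0, len(continuation), 0, len(reference)
    )
    return match.size / max(len(reference), 1)
```

Under these assumptions, a score near 1.0 would indicate the model is echoing training text verbatim rather than generating novel prose, which is the behaviour OpenAI characterises as an unintended bug it has committed to addressing.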

OpenAI also asks the court to dismiss part of the media outlet’s claim on the grounds that some of the alleged conduct occurred more than three years ago, placing it outside the statute of limitations.

In a statement sent to SiliconRepublic.com, Ian Crosby, a partner at Susman Godfrey and lead counsel for The New York Times, noted that OpenAI did not dispute that it copied millions of articles from the media outlet to build its products. “What OpenAI bizarrely mischaracterises as ‘hacking’ is simply using OpenAI’s products to look for evidence that they stole and reproduced the Times’s copyrighted works,” he said. “And that is exactly what we found.”

Crosby also said that building new products is no excuse for violating copyright law. “OpenAI’s response also shows that it is tracking users’ queries and outputs, which is particularly surprising given that they claimed not to do so. We look forward to exploring that issue in discovery.”

Alon Yamin, CEO of AI plagiarism detection company Copyleaks, said we can expect more lawsuits similar to this one. “The argument regarding how these models are trained and what content was used will continue for a while because, as this technology expands and becomes widely utilised in more and more industries, so will the concern regarding the ethics surrounding AI and its development,” he said.

Updated, 3.54pm, 28 February 2024: This article has been updated to include a further statement from lead counsel for The New York Times.


Jenny Darmody is the editor of Silicon Republic

editorial@siliconrepublic.com