NYT AI lawsuit is ‘without merit’, says OpenAI

9 Jan 2024


The ChatGPT creator said the lawsuit is an opportunity to clarify its business and technology and added that the US newspaper ‘is not telling the full story’.

OpenAI, the company behind popular generative AI tool ChatGPT, has responded to the legal action The New York Times has taken against it.

In a legal battle launched at the end of 2023, the US media outlet claims AI chatbots such as ChatGPT are trained on millions of articles published by The New York Times and that the newspaper now competes with these chatbots as a source of reliable information.

In response, OpenAI said The New York Times is “not telling the full story”, and that the AI company collaborates with news organisations and offers an opt-out option for training data.

“Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators and critical for US competitiveness,” the company said in a blogpost.

“That being said, legal right is less important to us than being good citizens. We have led the AI industry in providing a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our tools from accessing their sites.”
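For readers curious about the mechanics, OpenAI's opt-out relies on the long-standing robots.txt convention: a publisher lists the crawler's user agent, GPTBot, and disallows it. The following is a minimal sketch of how such a rule is read, using Python's standard library. The robots.txt content, the example.com URL and the `crawler_allowed` helper are illustrative assumptions and are not taken from any real publisher's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not copied from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/some-article"  # placeholder URL
    print("GPTBot allowed:", crawler_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", url))
    print("Other bot allowed:", crawler_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", url))
```

A rule like this only works if the crawler chooses to honour it; robots.txt is a convention rather than an enforcement mechanism.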

The company also said ‘regurgitation’ or ‘memorisation’ is a rare failure of the AI’s learning process. “We have measures in place to limit inadvertent memorisation and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use,” it said.

“Because models learn from the enormous aggregate of human knowledge, any one sector – including news – is a tiny slice of overall training data, and any single data source – including The New York Times – is not significant for the model’s intended learning.”

The lawsuit

The New York Times filed a lawsuit in December 2023 claiming that AI models from both OpenAI and Microsoft have copied and used millions of copyrighted news articles, in-depth investigations and other journalistic work.

“Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment,” the media outlet said.

In one example, The Times claims Microsoft’s Bing search index generates responses that contain “verbatim excerpts and detailed summaries” of articles from the news organisation.

“By providing Times content without The Times’s permission or authorisation, defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising and affiliate revenue,” it said in its court filing.

OpenAI said it was “surprised and disappointed” by the lawsuit, having had discussions over several months with The New York Times about collaboration, particularly around a real-time display with attribution in ChatGPT. “Along the way, they had mentioned seeing some regurgitation of their content but repeatedly refused to share any examples, despite our commitment to investigate and fix any issues.”

The AI company also claimed that the regurgitated passages the media outlet refers to appear to come from older articles that have proliferated across third-party websites, which can increase the chances of a model reproducing them.

“We regard The New York Times’ lawsuit to be without merit. Still, we are hopeful for a constructive partnership with The New York Times and respect its long history,” OpenAI said.

In a statement sent to SiliconRepublic.com, Susman Godfrey partner Ian Crosby, lead counsel for The New York Times, said: “The blog concedes that OpenAI used The Times’s work, along with the work of many others, to build ChatGPT. As The Times’s complaint states, ‘Through Microsoft’s Bing Chat (recently rebranded as Copilot) and OpenAI’s ChatGPT, defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.’ That’s not fair use by any measure.”

The copyright concerns extend beyond the NYT’s lawsuit. In November 2023, a report from the News Media Alliance claimed many large language models use training datasets that contain copyrighted content from news, magazine and digital media organisations.

Meanwhile, authors and other content creators have raised concerns about these platforms using copyrighted materials without permission.


Updated, 3.10pm, 9 January 2024: This article has been updated to include a statement from lead counsel for The New York Times.

Jenny Darmody is the editor of Silicon Republic
