OpenAI sued for allegedly ‘stealing’ data from millions

29 Jun 2023

Sam Altman in 2019. Image: Steve Jennings/Getty Images for TechCrunch (CC BY 2.0)

US law firm Clarkson said it is taking OpenAI and Microsoft to court to ‘represent real people whose information was stolen’ to create ChatGPT.

ChatGPT creator OpenAI is facing a major class-action lawsuit from a US law firm on the grounds that it scraped the internet to train its generative AI chatbot, potentially violating the rights of millions.

Filed in a federal court in San Francisco, California yesterday (28 June), the $3bn lawsuit by Clarkson alleges OpenAI ignored any legal means of obtaining training data for ChatGPT and chose to gather it from the web without consent from the millions who have uploaded content.

Data obtained from the web include everything from social media content and blog posts to Wikipedia articles and even cooking recipes, all of which the law firm believes OpenAI had no right to take without consent and use for its own profit.

“Despite established protocols for the purchase and use of personal information, defendants took a different approach: theft,” the complaint reads.

“They systematically scraped 300bn words from the internet, ‘books, articles, websites and posts – including personal information obtained without consent’. OpenAI did so in secret, and without registering as a data broker as it was required to do under applicable law.”

Clarkson managing partner Ryan Clarkson told The Washington Post that the firm represents “real people whose information was stolen and commercially misappropriated to create this very powerful technology”.

“All of that information is being taken at scale when it was never intended to be utilised by a large language model,” Clarkson said, adding that he hopes the court institutes some guardrails on how AI algorithms are trained and how people are compensated when their data is used.

The complaint also targets Microsoft, which has invested billions in OpenAI.

Through their AI products, the law firm claimed, the two companies “collect, store, track, share and disclose” the personal information of millions of people, including product details, account information, names, contact details, login credentials, emails, payment information and so on.

“With respect to personally identifiable information, defendants fail sufficiently to filter it out of the training models, putting millions at risk of having that information disclosed on prompt or otherwise to strangers around the world,” it went on.

News of the lawsuit comes on the same day that OpenAI, led by founder and CEO Sam Altman, chose London for its first corporate office outside the US, as the tech company continues to grow rapidly to meet global demand for its AI services.

