xAI, Google, OpenAI Sued for Scraping 'pirated books' for AI and Search Training

Investigative journalist John Carreyrou, best known for exposing the Theranos fraud, has filed a federal lawsuit accusing major AI companies of illegally using pirated books to train their models.

The case, filed in California federal court, names OpenAI, Google, Meta, and xAI as defendants. Carreyrou and five other authors claim these companies copied copyrighted books without permission to build commercial large language models. The complaint alleges large-scale copyright infringement through the use of pirated book repositories such as LibGen and Z-Library. The plaintiffs argue that entire books were copied, stored, and processed to accelerate AI development.

The lawsuit calls this conduct deliberate and systematic, not incidental scraping.

Why This Case Stands Out

Unlike many recent AI copyright cases, this lawsuit is not a class action. The plaintiffs are pursuing very specific individual claims, a move that sharply raises potential damages.

Under U.S. copyright law, statutory damages can reach up to $150,000 per infringed work. With so many books involved, liability could escalate quickly if the court rejects fair use defenses. The filing also criticizes recent settlements that paid authors only a fraction of that amount, arguing that such deals vastly undervalue the original reporting and long-form writing.
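Because statutory damages accrue per infringed work, exposure scales linearly with the number of books at issue. A rough back-of-the-envelope sketch (the $150,000 ceiling is the statutory maximum for willful infringement under 17 U.S.C. § 504(c); the work counts below are purely illustrative, not figures from the complaint):

```python
# Hypothetical statutory-damages ceiling estimate.
# $150,000 is the per-work maximum for willful infringement under
# 17 U.S.C. § 504(c)(2); the work counts are illustrative only.
MAX_STATUTORY_PER_WORK = 150_000  # USD, willful-infringement ceiling

def max_statutory_exposure(works_infringed: int) -> int:
    """Upper bound on statutory damages for a given number of works."""
    return works_infringed * MAX_STATUTORY_PER_WORK

for works in (6, 50, 1_000):
    print(f"{works:>5} works -> up to ${max_statutory_exposure(works):,}")
```

Even a few dozen titles puts the theoretical ceiling in the millions, which is why the fair-use ruling matters far more than the per-work figure itself.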

Fair Use Is the Central Fight

AI companies continue to argue that training on copyrighted material qualifies as fair use because models generate new outputs rather than reproducing books verbatim. Courts have not come close to settling the issue: some rulings suggest training itself may qualify as fair use, while others have held that storing pirated works, even temporarily, infringes copyright.

Do We Care at This Stage?

The lawsuit reinforces a growing legal trend: newsrooms, image libraries, and publishers are pushing back directly against search and AI tools that increasingly answer queries without sending traffic back to original sources.

Related cases include The New York Times' lawsuit against Microsoft and OpenAI, Getty Images' suit over AI training on licensed photos, Reuters v. Ross, and the major music labels' suits against Suno and Udio.

Discovery Happens Next

As is always the case in these suits, the defendants are expected to file motions to dismiss. If the case survives, discovery could expose how training data was sourced and handled internally (that could be fun if it happens, though it is doubtful much of it would ever see the light of day).

For publishers and site owners, this lawsuit is less about a single journalist and more about a line being drawn (and boy, do we love to see that). It asks whether AI companies can build commercial systems on unpaid content, or whether creators retain enforceable rights when their work becomes training fuel.

Thanks - but No Thanks?

While this action is welcome news for content owners and site publishers, the reality is that the businesses involved are, at this stage, too big to care. This suit will not have legal legs. If there is one thing I have learned about US civil courts in my six decades, it is this simple golden rule: <i>he who has the most money, wins</i>. Let this story stand as the footnote.

| Case | Defendant(s) | Status / notable ruling |
| --- | --- | --- |
| Authors Guild v. Anthropic | Anthropic | Training = fair use; storing pirated copies ≠ fair use |
| New York Times v. OpenAI/MS | OpenAI, Microsoft | Ongoing |
| Getty Images v. Stability AI | Stability AI | Ongoing |
| Music publishers v. Suno/Udio | Suno, Udio | Ongoing |
| Carreyrou et al. v. OpenAI et al. | OpenAI, Google, Meta, xAI | Filed Dec 2025; early stage |