Reviewing the Top AI Training Cases from 2025
January 15th, 2026
As we start a new year, which is sure to be full of new developments in both artificial intelligence (AI) law and technology, this seems like a good time to review some major decisions from the year just past.
As background, generative artificial intelligence (GenAI) tools built on large language models (LLMs), such as ChatGPT and Grok, are trained on materials (including books and newspaper and magazine articles) that may be protected by copyright.
Some GenAI companies have entered into deals with copyright owners to license these materials. Still, many have not – or they only sought licenses after getting caught using the materials without a license.
There are two fundamental copyright law issues when it comes to GenAI training:
- Does training GenAI tools using unlicensed copyrighted materials violate the exclusive rights of the copyright holders?
- Did the GenAI companies obtain and process the copyrighted materials in ways that violate the exclusive rights of the copyright holders?
Under Section 106 of the Copyright Act, and subject to Sections 107 through 122, the owner of a copyright has the exclusive rights to do and to authorize any of the following:
- to reproduce the copyrighted work in copies or phonorecords;
- to prepare derivative works based upon the copyrighted work;
- to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending;
- in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly;
- in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and
- in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.
It’s clear that “using the work to train GenAI” (or any other form of training) isn’t expressly listed among those exclusive rights.
However, “copying” is an exclusive right of a copyright holder.
When a human being uses a copyrighted work for training, there’s no inherent need to copy the work.
For example, a human can learn to play guitar or cook pasta by reading a book on the subject. The knowledge is “stored” in the human’s brain – the “wetware.”
However, for a computer to learn from copyrighted material, the material must usually be loaded onto a server (hardware) for processing.
Humans can obtain books legally or illegally. They can be purchased new or used from bookstores, downloaded legally from e-book sites (including as audiobooks), borrowed or stolen from friends or libraries, etc.
Current litigation has revealed that many millions of books and other materials used to train GenAI tools were “stolen”: copies illegally “ripped” from e-books and other sources were uploaded to pirate book websites and then downloaded (i.e., copied again) by GenAI companies.
If an act would otherwise be infringing – i.e., if it violates one of the exclusive rights of the copyright holder – it can be allowed under the doctrine of “fair use.”
As the Stanford Libraries note,
Unfortunately, the only way to get a definitive answer on whether a particular use is a fair use is to have it resolved in federal court. Judges use four factors to resolve fair use disputes, as discussed in detail below. It’s important to understand that these factors are only guidelines that courts are free to adapt to particular situations on a case-by-case basis. In other words, a judge has a great deal of freedom when making a fair use determination, so the outcome in any given case can be hard to predict.
The four factors judges consider are:
- the purpose and character of your use
- the nature of the copyrighted work
- the amount and substantiality of the portion taken, and
- the effect of the use upon the potential market.
Bartz v. Anthropic
In September of 2025, Anthropic settled a class action case involving book authors and publishers, agreeing to pay $1.5 billion after a judge ruled that the company had illegally downloaded and stored millions of copyrighted works.
If the company hadn’t settled, statutory damages could have exceeded $1 trillion, according to Wired.
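To see how an estimate of that size can arise, here is a rough back-of-the-envelope sketch. It assumes the general statutory damages range in 17 U.S.C. § 504(c) ($750 to $30,000 per infringed work, and up to $150,000 per work for willful infringement). The figure of 7 million works is purely an illustrative assumption, not a number taken from this article, and the function name is hypothetical. Actual exposure would depend on how many works qualify for statutory damages and on what a court or jury decides to award.

```python
# Per-work statutory damages range under 17 U.S.C. § 504(c).
MIN_PER_WORK = 750        # general statutory minimum
MAX_ORDINARY = 30_000     # general statutory maximum
MAX_WILLFUL = 150_000     # maximum when infringement is willful


def statutory_exposure(num_works: int) -> dict[str, int]:
    """Return total exposure at each per-work level for num_works works."""
    return {
        "minimum": num_works * MIN_PER_WORK,
        "ordinary_max": num_works * MAX_ORDINARY,
        "willful_max": num_works * MAX_WILLFUL,
    }


# ASSUMPTION: 7 million works is an illustrative figure only.
exposure = statutory_exposure(7_000_000)
for label, total in exposure.items():
    print(f"{label}: ${total:,}")
```

Under these assumptions, only the willful maximum applied to every work pushes the total past $1 trillion; the statutory minimum applied to the same number of works comes to a few billion dollars.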
As we wrote in this blog,
…in Anthropic, the court ruled that while using legally acquired copyrighted books to train AI large language models constitutes fair use, downloading pirated copies of those books for permanent storage violates copyright law.
One witness at the hearing noted that the use of pirate book sites by GenAI companies is especially concerning.
He noted that documents showed that Meta employees knew that using pirated sites was illegal, but that Meta chair Mark Zuckerberg decided to proceed anyway. The witness concluded that “There is no carve out in the Copyright Act for AI companies to engage in mass piracy.”
Kadrey v. Meta
Just two days after the Anthropic fair-use ruling, another judge in the same court, the Northern District of California, also held that training an LLM using copyrighted books was fair use.
The judge in that case didn’t have a problem with the LLM being trained on pirated books, finding the entire training process (including all its steps) to be fair use.
On the fourth fair-use factor listed above, the judge found no triable issue of fact on the impact on the potential market for the original work because there was no well-defined market for licensing copyrighted works for AI training. However, that market has been developing since that decision, with several reported licensing deals.
Thomson Reuters v. Ross Intelligence
In this case, Ross was trying to build an AI-powered legal research tool to compete with Westlaw, which is owned by Thomson Reuters.
When Thomson Reuters refused to license its content, Ross obtained “bulk memos” generated from Westlaw’s copyrighted headnotes and used them for training.
The headnotes were copied only to generate numerical weights – not displayed to end users.
The court held that because Ross used Westlaw’s own materials to build a product that would compete with Westlaw, the copying was not fair use.
