California Court Rules on Motion to Dismiss in AI Training Case
April 10th, 2024
A California federal court has granted in part and denied in part a motion to dismiss a case brought by computer programmers who alleged that their work had been used to train artificial intelligence (AI) models to generate code.
The case is J. Doe 1 v. GitHub, Inc.
In their original complaint, filed in 2022, the developers, known as J. Doe 1 and J. Doe 2, alleged that Copilot and Codex, AI programs that can generate computer code in response to text prompts, were trained on the plaintiffs’ copyrighted code without their permission.
The plaintiffs’ original complaint named as defendants GitHub (an open-source platform where the plaintiffs publish the code at issue and that also distributes Copilot), Microsoft (the owner of GitHub), and several OpenAI entities that programmed, trained, and/or maintained Codex.
The plaintiffs’ code was available under broad open-source licenses, so they couldn’t allege direct copyright infringement as had other plaintiffs in similar lawsuits over copyrighted training data used for generative AI (GAI).
Instead, the plaintiffs raised other related claims, including:
- Violation of the Digital Millennium Copyright Act (DMCA), because they claimed copyright management information (CMI) had been removed from the plaintiffs' code.
- Breach of contract, because the open-source licenses under which their code was distributed required a statement of attribution of authorship that was not included in the AI models or the “new” code they produced.
- Violation of the California Consumer Privacy Act.
- Tortious interference with the plaintiffs' contractual relationships.
The defendants moved to dismiss, and the court ruled on that motion in May 2023. Among other things, the court found that:
- The plaintiffs did have standing to seek an injunction because of the risk that their code would be reproduced in Codex and Copilot outputs.
- The plaintiffs didn’t have standing to seek monetary damages because they hadn’t shown that Copilot's output already reproduced their code.
- The plaintiffs properly alleged that the defendants intentionally designed their programs to remove CMI from the AI output.
The plaintiffs amended their complaint, this time providing examples of Copilot outputting the code of some of the plaintiffs either verbatim or with trivial changes.
The defendants again moved to dismiss the complaint, and the court again denied the motions in part and granted them in part.
The court found:
- Because the plaintiffs provided examples of AI output that copied some of their code, they had standing to seek monetary damages for copyright infringement.
- Violations of DMCA Section 1202(b) only occur “when CMI is removed or altered from an identical copy of a copyrighted work,” which was not the case here.
- The plaintiffs’ state law claims were preempted by federal copyright law.
Interestingly, the court found that plaintiffs can have standing to seek monetary damages for copyright infringement even when they themselves entered the prompts to the GAI that produced the allegedly infringing output.
The court noted:
In support of their position, Defendants contend that Plaintiffs “have not alleged any facts giving reason to believe that a real-world user plausibly has or would enter the sorts of prompts Plaintiffs used in their examples.” … They assert that Plaintiffs have neither explained that their code “frequently recurs in GitHub repositories,” nor that “anyone would want to copy their code.” … Maybe so, but Article III does not impose such requirements to confer standing for monetary damages. Further, the amount of damages for past harm suffered is a separate inquiry from whether Plaintiffs have alleged standing for damages in the first place.
(As the court noted, Article III of the US Constitution confines the federal judicial power to the resolution of ‘Cases’ and ‘Controversies.’ Lack of Article III standing requires dismissal for lack of subject matter jurisdiction under Federal Rule of Civil Procedure 12(b)(1).)
In other cases brought by copyright owners against GAI companies, the owners have also made claims based on prompts they entered that generated copies or derivatives of their works.
In Authors Guild v. OpenAI Inc., the Authors Guild, John Grisham, Jodi Picoult, David Baldacci, George R.R. Martin, and 13 other authors filed a class-action suit against OpenAI.
Plaintiffs seek to represent a class of all people in the US who own a copyright in any work that was used as training data for OpenAI language models during the class period.
As the Authors Guild noted,
The complaint draws attention to the fact that the plaintiffs’ books were downloaded from pirate ebook repositories and then copied into the fabric of GPT 3.5 and GPT 4, which power ChatGPT and thousands of applications and enterprise uses—from which OpenAI expects to earn many billions. These “professionally authored, edited, and published books” are “an especially important source of LLM ‘training’ data,” as the complaint states, because they allow GPT to provide better, more commercial outputs.
Attempts have been made to use GAI to generate the long-awaited volumes 6 and 7 of plaintiff George R.R. Martin’s A Song of Ice and Fire novel series, the basis for the television series Game of Thrones.
The complaint alleges that “Open AI’s LLMs can spit out derivative works: material that is based on, mimics, summarizes or paraphrases Plaintiffs’ works, and harms the market for them.”
(An “LLM” is a large language model, a type of AI program that can recognize and generate text, among other things.)
The Authors Guild complaint charges that
until recently, ChatGPT provided verbatim quotes of copyrighted text. Currently, it instead readily offers to produce summaries of such text. These summaries are themselves derivative works, the creation of which is inherently based on the original unlawfully copied work and could be—but for ChatGPT—licensed by the authors of the underlying works to willing, paying licensees.
As Publishers Weekly reported, in February, a federal court dismissed four out of six claims in the Authors Guild lawsuit. The court gave the plaintiffs leave to amend their complaint, and the core claim of direct copyright infringement remains active.
The court order agreed with the complaint that
OpenAI copied Plaintiffs’ copyrighted books and used them in its training dataset. … When prompted to summarize books written by each of the Plaintiffs, ChatGPT generated accurate summaries of the books’ content and themes.