Does using art to train AI violate copyright law?


We’ve previously discussed the uproar over copyrighted works being used to train artificial intelligence (AI) tools to create “new” works using generative AI software like ChatGPT.

Reuters reports that a group of artists sued Stability AI, Midjourney, and DeviantArt, accusing them of committing mass copyright infringement by using the artists' work in generative AI (GAI) systems.

In Andersen et al. v. Stability AI Ltd. et al., three artists—Sarah Andersen, Kelly McKernan, and Karla Ortiz—sued Stability AI Ltd., Stability AI, Inc., Midjourney, Inc., and DeviantArt, Inc. They alleged that Stability AI “copied and scraped” billions of images (including their art) to train an AI tool called “Stable Diffusion.”

According to the complaint:


When used to produce images from user prompts, Stable Diffusion uses the Training Images to create seemingly new images through a mathematical software process. These “new” images are based entirely on the Training Images and are derivative works of the particular pictures Stable Diffusion draws from when assembling a given output. Ultimately, it is merely a complex collage tool.

These resulting derived images compete in the marketplace with the original images. Until now, when a purchaser sought a new image “in the style” of a given artist, they had to pay to commission or license an original image from that artist. Now, those purchasers can use the artist’s works contained in Stable Diffusion and the artist’s name to generate new works in the artist’s style without compensating the artist at all. As used herein, the phrase “in the style of” refers to a work that others would accept as a work created by that artist whose “style” was called upon, not the general category of work, such as fantasy or impressionism. Only a very small number of incredibly talented artists can perform this feat for even a single other artist (i.e., reproducing art that is convincingly in that artist’s style), let alone for countless other artists. AI Image Products do so with ease by violating the rights of millions of artists.

The plaintiffs allege that these copycat works are already being sold online, “siphoning commissions from the artists themselves.”

The companies have asked the court to dismiss the suit, arguing that “the AI-created images are not similar to the artists' work, and that the lawsuit did not note specific images that were allegedly misused.”

The defendants also said that only plaintiff Sarah Andersen had asserted copyright registrations and thus could sue for infringement.

The plaintiffs conceded that none of the images generated by the AI would “likely . . . be a close match” to their original art used to train the AI tool because of the way the AI “mixes up” the content it has been trained on.

According to the plaintiffs, the AI violated their exclusive rights to copy their work even if the resulting images weren’t substantially similar.

However, they said,

It is reasonable to infer that substantial similarity exists between the output image [generated by the AI] and the source images [created by human artists] that have been blended to create a particular output image…

The California federal judge in charge of the case recently indicated at a hearing that he was inclined to dismiss the bulk of the plaintiffs’ complaint while allowing them to re-plead their claims.

He suggested that the claims most likely to be allowed to go forward were those of Andersen, presumably because of her copyright registrations.

He also stated that the plaintiffs needed to better allege which of the defendants infringed which of the plaintiffs’ works.

A published decision is expected soon and will likely influence how similar pending lawsuits involving other forms of intellectual property (IP), such as books, are handled.

As the New York Times reported, the best-selling author and comedian Sarah Silverman has joined a class-action lawsuit against OpenAI and another against Meta, accusing them of copyright infringement for using her works, such as her memoir “The Bedwetter,” without permission to train a GAI tool.

The suits allege, as The Verge reports,

that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally-acquired datasets containing their works, which they say were acquired from “shadow library” websites like Bibliotik, Library Genesis, Z-Library, and others, noting the books are “available in bulk via torrent systems.”

Exhibits presented by the plaintiffs show that ChatGPT will summarize the plaintiffs’ books when prompted. The only way this could happen, the plaintiffs claim, is if the GAI tool copied the books without the authors’ permission.

The lawsuits include claims for copyright violations, negligence, unjust enrichment, and unfair competition.

As discussed in a previous blog post, Adobe recently released an AI image generator called Adobe Firefly that it claims won’t infringe on third-party IP rights.

This is because the Adobe GAI tool is trained on licensed Adobe Stock and public domain images where the copyright has expired.

Adobe is so confident that its AI generator doesn’t infringe third-party IP and that its users won’t get sued for infringement that “Enterprises also have the opportunity to obtain an IP indemnity from Adobe for content generated by select workflows powered by Firefly.”

However, as Fast Company notes,

One question that hasn’t yet been answered is how the stock imagery creators used to train Firefly will be compensated for their work. In an FAQ on Firefly’s website, Adobe says only that it is “developing a compensation model for Adobe Stock contributors, and we’ll share the details of this model when Firefly exits beta.”

Getty Images, the stock photo company, is also suing Stability AI, the maker of Stable Diffusion, saying it “unlawfully copied and processed millions of images protected by copyright” to train its GAI software.

As The Verge notes,

AI firms claim this practice is covered by laws like the US fair use doctrine, but many rights holders disagree and say it constitutes a copyright violation. Legal experts are divided on the issue but agree that such questions must be asked.
