New York Times Sues OpenAI and Microsoft for Copyright Infringement
January 9th, 2024
At the end of December, the New York Times sued OpenAI and Microsoft for copyright infringement, claiming that their generative AI (GAI or GenAI) tools were trained on articles published by the Times without its authorization.
There have been similar previous cases filed by others, including writer and performer Sarah Silverman, but the Times case is the first by a major US media organization.
According to the 69-page complaint in the lawsuit, GAI is an existential threat to independent journalism:
Independent journalism is vital to our democracy. It is also increasingly rare and valuable. For more than 170 years, The Times has given the world deeply reported, expert, independent journalism. Times journalists go where the story is, often at great risk and cost, to inform the public about important and pressing issues. They bear witness to conflict and disasters, provide accountability for the use of power, and illuminate truths that would otherwise go unseen. Their essential work is made possible through the efforts of a large and expensive organization that provides legal, security, and operational support, as well as editors who ensure their journalism meets the highest standards of accuracy and fairness. This work has always been important. But within a damaged information ecosystem that is awash in unreliable content, The Times’s journalism provides a service that has grown even more valuable to the public by supplying trustworthy information, news analysis, and commentary.
Defendants’ unlawful use of The Times’s work to create artificial intelligence products that compete with it threatens The Times’s ability to provide that service.
The complaint alleges that although the defendants copied from many sources to train their large language models (LLMs), they gave content from The Times priority in recognition of its value.
The complaint notes that
The Times’s coverage has been widely recognized with many industry and peer accolades, including 135 Pulitzer Prizes since its first Pulitzer award in 1918 (nearly twice as many as any other organization).
The complaint alleges that Microsoft’s Bing Chat (recently rebranded as “Copilot”) and OpenAI’s ChatGPT “seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.”
The complaint includes exhibits that show that the defendants’ GAI tools “can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style.”
The Times alleges that Bing generates “responses that contain verbatim excerpts and detailed summaries of Times articles that are significantly longer and more detailed than those returned by traditional search engines.”
This, says The Times, undermines and damages its relationship with its readers and deprives it of subscription, licensing, advertising, and affiliate revenue.
The GAI tools also allegedly “wrongly attribute false information to The Times.”
This is a phenomenon known as an AI “hallucination.”
As IBM explains,
AI hallucination is a phenomenon wherein a large language model (LLM)—often a generative AI chatbot or computer vision tool—perceives patterns or objects that are nonexistent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate.
Generally, if a user makes a request of a generative AI tool, they desire an output that appropriately addresses the prompt (i.e., a correct answer to a question). However, sometimes AI algorithms produce outputs that are not based on training data, are incorrectly decoded by the transformer or do not follow any identifiable pattern. In other words, it “hallucinates” the response.
Put simply, AI tools make things up and thus can’t be relied upon for accurate information in some contexts.
For example, in June, as Reuters reported, an attorney faced a sanctions hearing after using OpenAI’s ChatGPT to help research a brief: the tool cited six non-existent cases, which he included in his court filing without checking them.
As Reuters noted,
a federal judge in Texas last week issued a requirement for lawyers in cases before him to certify that they did not use AI to draft their filings without a human checking their accuracy.
The Times asserts that using intellectual property (IP) of others has been lucrative for the defendants:
Microsoft’s deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone. And OpenAI’s release of ChatGPT has driven its valuation to as high as $90 billion.
The Times said it had been trying for months to reach a negotiated settlement with the defendants, but to no avail.
The complaint notes that
The Times requires third parties to obtain permission before using Times content and trademarks for commercial purposes, and for decades The Times has licensed its content under negotiated licensing agreements. These agreements help ensure that The Times controls how, where, and for how long its content and brand appears and that it receives fair compensation for third-party use. Third parties, including large tech platforms, pay The Times significant royalties under these agreements in exchange for the right to use Times content for narrowly defined purposes.
The defendants contend that their use of The Times’s IP is allowed as “fair use” under US copyright law because the use of copyrighted content serves a new “transformative” purpose.
The Times contends this is not fair use “because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them.”
The complaint notes that defendant OpenAI was founded in 2015 as a non-profit research company with $1 billion in seed money from funders like “Elon Musk, the CEO of Tesla and X Corp. (formerly known as Twitter); Reid Hoffman, the co-founder of LinkedIn; Sam Altman, the former president of Y Combinator; and Greg Brockman, the former Chief Technology Officer of Stripe.”
However, OpenAI has since become a multi-billion-dollar for-profit business.
As an article in the Times notes,
The suit does not include an exact monetary demand. But it says the defendants should be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.” It also calls for the companies to destroy any chatbot models and training data that use copyrighted material from The Times.