OpenAI and Microsoft Confront Copyright Lawsuits in New York Over AI Training Methods

In a significant development that could have far-reaching implications for the AI industry, OpenAI and Microsoft are facing a series of lawsuits over allegations of copyright infringement. Nonfiction authors Nicholas Basbanes and Nicholas Gage, along with The New York Times, have filed separate legal actions, accusing the tech giants of using copyrighted material without authorization to train large language models, including ChatGPT.

The Class-Action Lawsuit by Authors

The lawsuit filed by Basbanes and Gage in a Manhattan federal court is the latest in a string of legal challenges against the practice of using existing creative content for AI training. The authors, renowned for their contributions to journalism and nonfiction, allege that their copyrighted works were used to train AI models without their permission. This claim follows similar suits from other authors, including George R.R. Martin and Jonathan Franzen, indicating a growing unease within the literary community about AI’s use of their intellectual property.

The authors are seeking up to $150,000 in damages for each infringed work. Depending on the number of works involved, the total compensation could amount to millions of dollars. Furthermore, this lawsuit aims to represent a broader group of authors and creators, potentially expanding the legal battle to include thousands more claimants.

The New York Times’ Separate Legal Action

Adding to the pressure on Microsoft and OpenAI, The New York Times has filed its own lawsuit, alleging the unauthorized use of its journalistic content to train AI models. This separate legal action by a leading global news organization underscores the potential scale of the issue, as it involves a wide array of copyrighted content used in AI training.

The New York Times is seeking not only damages but also an order requiring the companies to stop using its content for AI training and to destroy any data already collected. The exact amount of damages has not been specified, but the newspaper estimates it could amount to billions of dollars.

The Fair Use Defense and its Limitations

OpenAI and Microsoft have defended their use of copyrighted works for AI training as falling under the umbrella of “fair use,” a legal doctrine that permits unlicensed use of copyrighted material under certain conditions. However, The New York Times counters that the use of its content is not transformative and directly competes with its own operations, potentially reducing traffic and affecting revenue streams.

The lawsuit highlights instances in which AI chatbots provided users with almost verbatim excerpts from NYT articles. The newspaper also expressed concerns about the impact on quality journalism and the difficulty readers face in distinguishing fact from fiction, citing cases in which AI technology falsely attributed information to the newspaper.

OpenAI’s Response and Microsoft’s Silence

OpenAI expressed surprise and disappointment at the lawsuit, saying that its ongoing discussions with The New York Times had been progressing constructively. Microsoft, for its part, has not yet released a public statement addressing the issue.

The Larger Implications for the AI Industry

These lawsuits mark a crucial juncture for the AI industry, spotlighting the ethical and legal complexities surrounding the use of copyrighted material for AI training. The outcomes of these legal battles could set precedents for how AI companies source training data and respect intellectual property rights.

The legal challenges faced by OpenAI and Microsoft highlight a critical conversation about copyright laws in the age of AI. As AI technology continues to evolve and become more integrated into various sectors, the need for clear guidelines on the use of copyrighted material becomes increasingly pressing. These lawsuits could pave the way for more stringent regulations and a better understanding of the ethical implications of AI development, shaping the future of technology and its interaction with creative content.