Meta Accused of Using Pirated Books to Train AI, Court Documents Reveal

By CPI | January 9, 2025

Meta Platforms, the parent company of Facebook, is facing fresh accusations from a group of authors claiming the tech giant knowingly used pirated books to train its artificial intelligence systems, according to court documents recently made public. The filings, disclosed on Wednesday in a California federal court, allege that Meta’s internal communications reveal CEO Mark Zuckerberg approved the use of the unauthorized materials despite concerns from company executives.

The lawsuit, originally filed in 2023, was brought by prominent figures including author Ta-Nehisi Coates and comedian Sarah Silverman. Per Reuters, the authors argue that Meta unlawfully used their works to train its large language model, LLaMA. The court documents suggest that Meta used the controversial dataset known as “LibGen,” a repository that allegedly contains millions of pirated books, and that the company distributed it via peer-to-peer torrent networks.

According to Reuters, internal documents indicate that Meta executives, including Zuckerberg, were aware of the dataset’s illicit nature. The court filing quotes a Meta communication stating that the company used “a dataset we know to be pirated,” despite internal concerns about its legality.

The lawsuit is part of a broader wave of legal challenges faced by tech companies over the use of copyrighted materials in developing AI tools. Defendants in these cases, including Meta, have largely argued that their practices fall under the doctrine of “fair use,” which allows limited use of copyrighted works under certain circumstances. However, the authors involved in this case argue that Meta crossed a legal boundary by knowingly incorporating unauthorized texts into its AI training process.

The authors have now asked the court for permission to submit an amended complaint, citing new evidence uncovered during the case. In their request, the plaintiffs also seek to revive a previously dismissed claim related to copyright management information (CMI) and add a new allegation of computer fraud. They contend that Meta’s use of the LibGen dataset strengthens their case for both copyright infringement and CMI violations.

Last year, U.S. District Judge Vince Chhabria dismissed some of the authors’ initial claims, including allegations that text generated by Meta’s AI systems directly infringed on their copyrights. He also rejected accusations that Meta had unlawfully removed copyright management information from the books. However, during a hearing on Thursday, Chhabria indicated he would allow the plaintiffs to file an updated complaint.

Despite permitting the amended filing, Chhabria expressed doubts about the viability of the new fraud and CMI claims. According to Reuters, he stated during the hearing that while he was open to letting the case proceed, he remained skeptical about the strength of those specific allegations.

Source: Reuters
