Content Creators Press for IP Protections in Face of AI

James Patterson has published hundreds of thrillers, but what frightens him is artificial intelligence (AI).

More to the point, the idea that his books were used — without his blessing — to train generative AI to write more books. 

“This will not end well for creatives,” Patterson told The Wall Street Journal on Sunday (July 30).

As that report notes, Patterson was among the thousands of authors who earlier this month signed an open letter demanding that AI firms obtain permission from writers, and pay them, for the use of their words to train AI models. Another group of writers has filed suit against OpenAI and Meta Platforms, accusing the companies of training their AI models on illegal copies of their books pulled from the internet.

Another lawsuit, filed last month by authors Paul Tremblay and Mona Awad, claims that OpenAI’s ChatGPT generates summaries of their books accurate to a degree that would only be possible if the model had been trained on their works. The writers also allege that this training on their copyrighted works was done without their permission.

“At no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published work,” their complaint said.

Last month also saw reports that news and magazine publishers were looking to band together to safeguard their businesses from AI companies.

Among these publishers’ worries is how their content, including text and images, has been used to train AI tools and whether they should be compensated. The publishers are also concerned that AI provides readers with information without requiring them to click links to go to the source.

However, lawyers told the WSJ in Sunday’s report that the sheer scale of the companies’ models could pose a roadblock for plaintiffs seeking copyright protection.

“The cases are new and dealing with questions of a scale that we haven’t seen before,” said Mehtab Khan, a resident fellow at Yale Law School’s Information Society Project. “The question becomes about feasibility. How will they reach out to every single author?” 

As noted here earlier this year, the ways businesses gather and use data to power their AI solutions should be the key focus of regulatory frameworks.

“By enacting guardrails around the provenance of data used in large language models (LLMs) and other training models, making it obvious when an AI model is generating synthetic content, including text, images and even voice applications and flagging its source, governments and regulators can protect consumer privacy without hampering private sector innovation and growth,” PYMNTS wrote in April.
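
To make the “flagging its source” idea concrete, here is a minimal sketch of attaching provenance metadata to AI-generated text. Everything in it (the ProvenanceRecord fields, the tag_generated_text helper, the model ID) is hypothetical and illustrative, not drawn from any existing standard or from the lawsuits described above.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance metadata attached to AI-generated content."""
    model_id: str          # which model produced the content
    generated_at: str      # ISO-8601 timestamp of generation
    content_sha256: str    # hash binding the record to this exact output
    synthetic: bool = True # explicit flag that the content is machine-generated


def tag_generated_text(text: str, model_id: str) -> dict:
    """Bundle generated text with a provenance record so downstream
    consumers can see that it is synthetic and which model produced it."""
    record = ProvenanceRecord(
        model_id=model_id,
        generated_at=datetime.now(timezone.utc).isoformat(),
        content_sha256=hashlib.sha256(text.encode("utf-8")).hexdigest(),
    )
    return {"content": text, "provenance": asdict(record)}


if __name__ == "__main__":
    tagged = tag_generated_text("Once upon a midnight dreary...", "example-llm-v1")
    print(json.dumps(tagged, indent=2))
```

Real-world provenance efforts, such as the C2PA specification, define far richer signed manifests; the point of this sketch is only that a record can bind a synthetic-content flag and a source identifier to the exact output it describes.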