
Big Tech and Big Media don’t agree on much when it comes to artificial intelligence, but there is one surprising area of concurrence: TDM opt-outs. Both sides acknowledge there currently is no technically feasible means to enforce discretionary restrictions on access to data that is otherwise publicly available. That’s as far as the harmony goes, however.
With respect to both the EU AI Act and the U.K.’s proposed Data (Use and Access) Bill, the creative industries have objected to the laws’ broad text-and-data-mining (TDM) exceptions to copyright, despite both measures allowing rights holders to opt out of having their works included in TDM processes such as AI training. While their objections are mainly grounded in economic and competitive concerns, they also argue that the opt-out provisions are unworkable given the lack of technical standards for asserting and enforcing such preferences.
The U.K. government has recently sought to placate copyright owners by offering to study the economic impact and technical feasibility of the TDM proposal. The third, and penultimate, draft of the EU’s AI Code of Practice likewise seeks to cushion the blow to rights owners by adding greater data transparency requirements for AI companies. But AI companies have emerged as unlikely allies of the creative industries on the technical feasibility of opting out.
In comments submitted in the U.K. government’s public consultation on the Data Bill, OpenAI insists neither the British nor European opt-out provisions are workable.
“In the EU, the lack of clear and scalable technical standards has created uncertainty about what opt-out methods are workable and valid, causing uncertainty for both AI companies and rightsholders,” the ChatGPT-maker wrote. “The UK has a rare opportunity to cement itself as the AI capital of Europe by making choices that avoid policy uncertainty, foster innovation, and drive economic growth.”
Google voiced similar concerns in its comments to the Department for Science, Innovation and Technology, which is conducting the consultation. “Excessive transparency requirements… could hinder AI development and impact the UK’s competitiveness in this space,” the search giant wrote.
Currently, the only widely used method for restricting access to data on public websites is the decades-old robots.txt protocol. Introduced long before generative AI technology was publicly available, the robots protocol is a blunt instrument: it either permits or denies access to a site by web crawlers. It cannot distinguish among text, audio, and video data, nor can it prevent access to downstream copies of the data.
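To illustrate the protocol’s all-or-nothing character, here is a minimal sketch using Python’s standard-library parser. The "GPTBot" user-agent token is the one OpenAI publishes for its crawler; the site URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that shuts out OpenAI's crawler entirely
# while allowing everyone else. Note there is no way to say
# "text yes, images no," or to bind copies of the data hosted elsewhere.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The rule applies per crawler and per path -- nothing finer-grained.
print(parser.can_fetch("GPTBot", "https://example.com/article.html"))    # False
print(parser.can_fetch("OtherBot", "https://example.com/article.html"))  # True
```

Compliance, of course, depends entirely on the crawler choosing to honor the file, which is the voluntariness problem the article describes.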
Critically, while the AI Act does require AI developers to comply with robots.txt restrictions, compliance with the protocol is otherwise voluntary.
The most ambitious effort to create a more robust method for asserting rights reservations comes from the Adobe-led Coalition for Content Provenance and Authenticity (C2PA). Originally intended to provide provenance metadata for digital content, the initiative added a “Do Not Train” assertion to the standard after the introduction of ChatGPT in 2022. While the C2PA technical standard has gained support in the past year from several leading AI developers, including OpenAI, Google, Microsoft, and Meta, it, too, remains voluntary.
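Unlike robots.txt, the C2PA approach travels with the content itself as signed metadata. The fragment below is a rough sketch, based on the C2PA specification’s training-and-data-mining assertion, of how such a preference is expressed; the exact entry names and values should be checked against the current version of the standard.

```json
{
  "label": "c2pa.training-mining",
  "data": {
    "entries": {
      "c2pa.ai_training": { "use": "notAllowed" },
      "c2pa.ai_generative_training": { "use": "notAllowed" },
      "c2pa.data_mining": { "use": "notAllowed" }
    }
  }
}
```

Because the assertion is embedded in a cryptographically signed manifest, it can in principle survive copying and redistribution in a way a site-level robots.txt cannot, though honoring it still depends on the AI developer’s cooperation.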
The opt-out debate is not only a question of technical feasibility, however. Rights owners maintain that a reliable means of controlling access is essential to sustaining a robust market for AI training data. Technology companies are keen to prevent such a market from developing, both for reasons of cost and competitiveness.
“If, as the Government proposes, reserving rights against any text and data mining can be done collectively and ‘easily,’ there is a risk that most copyrighted data will become unavailable for text and data mining uses,” OpenAI wrote. “Under such a system, the Government’s access objective would not be met, because only the wealthiest and already data-rich technology companies would be able to access the quantity and diversity of data needed to train advanced models. This would not be a workable or sensible system for AI developers, who would be forced to disclose their most sensitive training details in order to strengthen the negotiating position of copyright owners.”
“We believe training on the open web must be free,” Google added.