
Several leading artificial intelligence models are struggling to meet stringent European Union regulations in areas such as cybersecurity resilience and the prevention of discriminatory outputs, according to data reviewed by Reuters. The results come from a newly developed tool designed to test compliance with the EU’s upcoming Artificial Intelligence Act.
The European Union has long debated regulations for AI systems, but the public release of OpenAI’s ChatGPT in 2022 accelerated these discussions. The chatbot’s rapid popularity and the surrounding concerns over potential existential risks prompted lawmakers to draw up specific rules aimed at “general-purpose” AI (GPAI) systems. In response, a new AI evaluation framework has been developed, offering insights into the performance of top-tier models against the incoming legal standards.
New AI Compliance Tool Highlights Concerns
A tool designed by Swiss startup LatticeFlow AI, in collaboration with research institutes ETH Zurich and Bulgaria’s INSAIT, tested AI models from companies like OpenAI, Meta, Alibaba and others across numerous categories aligned with the EU’s AI Act. This tool has been praised by European officials as a valuable resource for measuring AI models’ readiness for compliance.
According to Reuters, the AI models were assessed in areas like technical robustness, safety and other critical factors. The models received scores ranging from 0 to 1, with a higher score indicating greater compliance. Most models tested, including those from OpenAI, Meta and Alibaba, scored an average of 0.75 or above. However, the “Large Language Model (LLM) Checker” also revealed significant shortcomings in areas that will need improvement if these companies hope to avoid regulatory penalties.
Companies that fail to meet the requirements of the AI Act could face fines of up to 35 million euros ($38 million), or 7% of their global annual turnover. Although the EU is still defining how rules around generative AI, such as ChatGPT, will be enforced, this tool provides early indicators of areas where compliance may be lacking.
Key Shortcomings: Bias and Cybersecurity
One of the most critical areas highlighted by the LLM Checker is the issue of discriminatory output. Many generative AI models have been found to reflect human biases related to gender, race and other factors. In this category, OpenAI’s GPT-3.5 Turbo model received a score of 0.46, while Alibaba’s Qwen1.5 72B Chat model fared even worse, scoring just 0.37.
Cybersecurity vulnerabilities were also spotlighted. LatticeFlow tested for “prompt hijacking,” a form of attack in which hackers use deceptive prompts to extract sensitive information. Meta’s Llama 2 13B Chat model scored 0.42 in this category, while French startup Mistral’s 8x7B Instruct model scored 0.38, according to Reuters.
Anthropic’s Claude 3 Opus, backed by Google, performed the best overall, receiving an average score of 0.89, making it the top performer across most categories.
A Step Toward Regulatory Compliance
The LLM Checker was developed to align with the AI Act’s evolving requirements and is expected to play a larger role as enforcement measures are introduced over the next two years. LatticeFlow has made the tool freely available, allowing developers to test their models’ compliance online.
Petar Tsankov, CEO and co-founder of LatticeFlow, told Reuters that while the results were generally positive, they also serve as a roadmap for companies to make necessary improvements. “The EU is still working out all the compliance benchmarks, but we can already see some gaps in the models,” he said. Tsankov emphasized that with more focus on optimizing for compliance, AI developers can better prepare their models to meet the stringent standards of the AI Act.
Although some companies declined to comment, including Meta and Mistral, and others like OpenAI, Anthropic and Alibaba did not respond to requests for comment, the European Commission has been following the tool’s development closely. A spokesperson for the Commission stated that the platform represents “a first step” in translating the EU AI Act into technical compliance requirements, signaling that more detailed enforcement measures are on the way.
This new test provides tech companies with valuable insights into the challenges ahead as they work to meet the EU’s AI regulations, which are expected to be fully implemented by 2025.
Source: Reuters
Featured News
Cautious Optimism From AI Execs Over Planned Lifting of Export Controls, But Concerns Remain
May 8, 2025 by
CPI
UK Holds Firm on Digital Tax for US Tech Giants Despite New Trade Deal
May 8, 2025 by
CPI
Pro Tennis Governing Body Barred from Influencing Players in Antitrust Lawsuit
May 8, 2025 by
CPI
Mastercard Wins Dismissal of Antitrust Suit Over Digital Wallet Access
May 8, 2025 by
CPI
J&J Antitrust Trial Heats Up as Innovative Health CEO Testifies on Market Suppression
May 8, 2025 by
CPI
Antitrust Mix by CPI
Antitrust Chronicle® – Mergers in Digital Markets
Apr 21, 2025 by
CPI
Catching a Killer? Six “Genetic Markers” to Assess Nascent Competitor Acquisitions
Apr 21, 2025 by
John Taladay & Christine Ryu-Naya
Digital Decoded: Is There More Scope for Digital Mergers In 2025?
Apr 21, 2025 by
Colin Raftery, Michele Davis, Sarah Jensen & Martin Dickson
AI In the Mix – An Ever-Evolving Approach to Jurisdiction Over Digital Mergers in Europe
Apr 21, 2025 by
Ingrid Vandenborre & Ketevan Zukakishvili
Antitrust Enforcement Errors Due to a Failure to Understand Organizational Capabilities and Dynamic Competition
Apr 21, 2025 by
Magdalena Kuyterink & David J. Teece