COLIBRIX ONE × BitGN: 1,000 Engineers Tested AI Agents in Ecommerce. Only 2.3% Passed the Benchmark

Following the first wave of the Agentic Ecommerce Challenge, COLIBRIX ONE shares early findings and announces ECOM2, opening this June

Valletta, Malta, 26 May 2026 — Over 1,000 engineers across 97 cities spent weeks stress-testing autonomous AI agents against real ecommerce and acquiring scenarios inside the ECOM1 sandbox. What emerged was not simply a developer competition, but one of the first large-scale industry datasets showing where agentic commerce systems succeed, fail, and break under real operational pressure. As lead partner, COLIBRIX ONE is publishing the findings ahead of ECOM2.

COLIBRIX ONE is among the first European EMIs to run a benchmark of this kind, systematically documenting how AI agents behave across real acquiring scenarios at this scale. The sandbox was launched to answer one question the industry has not yet resolved: which AI agents actually hold up in live commercial operations?

What the Agents Actually Did

Across 931,000 scored sandbox trials, clear patterns began to emerge around how AI agents behave under real commercial pressure. 

“ECOM1 revealed that the real challenge for agents is not simply completing a purchase. It is operating within the boundaries of a real commerce process: verifying customer identity, confirming the right to act, checking payment status, validating discount authority, and respecting privacy. In the DEV segment, across more than 246,000 task trials, the average score was around 20%, and only 2.3% of runs completed the benchmark in full,” said Rinat Abdullin, Founder of BitGN. 

The challenge sandbox was modelled on the friction points COLIBRIX ONE sees in live operations: fraud scenarios, Strong Customer Authentication failures, cross-border routing decisions, installment eligibility, and chargeback disputes. The full findings, covering which architectures held up, which broke, and where the gaps remain, will be published on the COLIBRIX ONE website before ECOM2 opens.

ECOM2 Opens in June

In recent days alone, ECOM1 generated more than 177,000 task-level attempts and over 2.5 million agent calls across 82 active teams. Due to strong participation, COLIBRIX ONE and BitGN are moving directly into a second competition: ECOM2 launches in June.

The format is designed to compound. Insights and top solutions from ECOM1 will be published before ECOM2 opens, so participants can learn from them and build stronger agents for the next round. ECOM2 will introduce scenarios closely aligned with fintech-specific requirements, with additional partners from the payments space.

ECOM1 Is Still Open

If you are building AI agents and want to test them against real payment and commerce logic, there is still time to enter ECOM1, compete, and get your results onto the leaderboard before the round closes.

Registration: https://colibrix.one/agentic-ecommerce-challenge

Media Contact 

Aleksandra Kitina, CMO, COLIBRIX ONE 

alk@colibrix.one 

About COLIBRIX ONE 

COLIBRIX ONE is a payments infrastructure platform providing acquiring, current business accounts, fraud prevention, cross-border routing, and compliance frameworks for ecommerce businesses globally. COLIBRIX ONE is a trading name of Mellifera Operations Limited and Mellifera Kartiera Limited. Mellifera Kartiera Limited is an MFSA-authorised EMI (ref. C107685).

About BitGN 

BitGN is a technology innovation organisation focused on advancing agentic AI applications for commerce and payments, operating from its Vienna headquarters within AI Factory Austria.

Photo: Participants of the BitGN PAC1 Challenge, Vienna HQ Hub.
