Report: Reasoning AI Models Fail When Problems Get Too Complicated


The latest large reasoning models (LRMs) experience “complete accuracy collapse” when faced with highly complex tasks, according to a new paper co-authored by researchers from Apple.


These artificial intelligence (AI) models outperform standard models on some problems but fare no better once problems become too complicated.

LRMs are trained to solve complex problems by showing them how to “think” step by step, much as a person might work through a puzzle. They generate detailed internal “thinking processes” before giving an answer, which has led to better performance on many tests.
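To make the idea concrete, here is a minimal, hypothetical Python sketch of how a model's raw output might be split into its step-by-step reasoning trace and its final answer. The "FINAL ANSWER:" marker and the helper function are illustrative assumptions, not the output format of any particular model discussed in the paper.

```python
# Minimal sketch under an assumed output format: the model's raw text is
# taken to contain a step-by-step "thinking" trace followed by a final
# answer, separated by a marker such as "FINAL ANSWER:". Both the marker
# and this helper are hypothetical, for illustration only.
def split_reasoning_and_answer(raw_output: str, marker: str = "FINAL ANSWER:"):
    """Split a model response into (thinking trace, final answer)."""
    if marker in raw_output:
        thinking, answer = raw_output.rsplit(marker, 1)
        return thinking.strip(), answer.strip()
    # No marker found: treat the whole output as the answer.
    return "", raw_output.strip()


example = (
    "Step 1: move the small disk to peg C.\n"
    "Step 2: move the medium disk to peg B.\n"
    "FINAL ANSWER: 3 moves"
)
thinking, answer = split_reasoning_and_answer(example)
print(len(thinking.splitlines()), "reasoning lines ->", answer)  # 2 reasoning lines -> 3 moves
```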

But the new paper suggests these advanced AI models might hit a wall: their performance collapses completely when problems go beyond a certain level of complexity.

The researchers wanted to look inside how these models operate, not just at their final answers. They felt that standard tests for AI performance might not tell the whole story, possibly because the AI had already seen similar problems during training.

The researchers used controllable puzzles such as the Tower of Hanoi, Checkers Jumping, River Crossing and Blocks World, which gave them precise control over difficulty by adding more disks, checkers, people or blocks while keeping the basic rules the same. This let them see exactly when and how the AI’s reasoning broke down as problems got harder.
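As a rough illustration of how such puzzles scale, the Python sketch below uses the standard textbook Tower of Hanoi solver: each added disk keeps the rules identical but roughly doubles the length of the shortest solution (2^n - 1 moves). The code is a generic example, not taken from the paper.

```python
# Illustrative sketch: the Tower of Hanoi is one of the controllable puzzles
# mentioned above. Adding one disk keeps the rules the same but roughly
# doubles the optimal solution length (2**n - 1 moves), which is how the
# puzzle's difficulty can be dialed up in a measurable way.
def hanoi_moves(n: int, source: str = "A", target: str = "C", spare: str = "B"):
    """Return the optimal sequence of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # move n-1 disks out of the way
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # move n-1 disks back on top
    )


for n in range(1, 11):
    print(f"{n} disks -> {len(hanoi_moves(n))} moves")  # 1, 3, 7, ..., 1023
```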

As puzzle complexity increased, the performance of these frontier LRMs didn’t just get a little worse; it suffered a “complete accuracy collapse,” often dropping to zero successful solutions beyond a certain point.

The researchers found that as the problems approached the point where the AI started failing, the LRMs began to reduce their reasoning effort, using fewer “thinking” steps or tokens, pointing to a fundamental limit in how they handle increasing difficulty.

On simple problems, the LRMs sometimes found the correct answer early but kept exploring wrong solutions, a form of “overthinking” that wastes effort. On harder problems, correct solutions appeared later, if at all. Beyond the collapse point, no correct solutions were found in the thinking process.
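A hypothetical sketch of how that observation could be quantified: scan the candidate solutions that appear in a reasoning trace and record where the first correct one shows up. The candidate list, the correctness check and the helper below are illustrative assumptions, not the paper's actual evaluation code.

```python
# Hypothetical illustration of measuring "overthinking": given the candidate
# solutions a model proposes while "thinking", find how early the first
# correct one appears. Per the paper's description, it appears early on easy
# puzzles (followed by wasted exploration) and late or never on hard ones.
def first_correct_position(candidates, is_correct):
    """Return the index of the first correct candidate, or None if absent."""
    for i, candidate in enumerate(candidates):
        if is_correct(candidate):
            return i
    return None


trace = ["move A->C", "move A->B then A->C", "move A->B, A->C, B->C"]
target = "move A->B, A->C, B->C"
print(first_correct_position(trace, lambda c: c == target))  # 2
```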

The study concluded that these findings point to fundamental limitations in how current LRMs tackle problems. While the “thinking” process helps delay failure, it doesn’t overcome these core barriers. The research raises questions about whether simply adding more “thinking” steps is enough to achieve truly general AI that can handle highly complex, novel problems.