AI models still far from AGI-level reasoning: Apple researchers

Current “thinking” AI models can’t yet reason at the level expected of humanlike artificial general intelligence, the researchers found.
The race to develop artificial general intelligence (AGI) still has a long way to run, according to Apple researchers who found that leading AI models have trouble reasoning.
Recent updates to leading AI large language models (LLMs) such as OpenAI’s ChatGPT and Anthropic’s Claude have included large reasoning models (LRMs), but their fundamental capabilities, scaling properties, and limitations “remain insufficiently understood,” said the Apple researchers in a June paper called “The Illusion of Thinking.”
They noted that current evaluations primarily focus on established mathematical and coding benchmarks, “emphasizing final answer accuracy.”
However, this evaluation does not provide insights into the reasoning capabilities of the AI models, they said.
The findings contrast with expectations that artificial general intelligence is just a few years away.
Apple researchers test “thinking” AI models
To go beyond the standard mathematical benchmarks, the researchers devised different puzzle games to test “thinking” and “non-thinking” variants of Claude Sonnet, OpenAI’s o3-mini and o1, and DeepSeek-R1 and V3 chatbots.
They discovered that “frontier LRMs face a complete accuracy collapse beyond certain complexities,” do not generalize reasoning effectively, and lose their edge as complexity rises, contrary to expectations for AGI capabilities.
“We found that LRMs have limitations in exact computation: they fail to use explicit algorithms and reason inconsistently across puzzles.”
AI chatbots are overthinking, say researchers
They found inconsistent and shallow reasoning in the models and also observed overthinking, with AI chatbots generating correct answers early and then wandering into incorrect reasoning.
The researchers concluded that LRMs mimic reasoning patterns without truly internalizing or generalizing them, which falls short of AGI-level reasoning.
“These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning.”
The race to develop AGI
AGI is the holy grail of AI development, a state where the machine can think and reason like a human and is on a par with human intelligence.
In January, OpenAI CEO Sam Altman said the firm was closer to building AGI than ever before. “We are now confident we know how to build AGI as we have traditionally understood it,” he said at the time.
In November, Anthropic CEO Dario Amodei said that AGI would exceed human capabilities in the next year or two. “If you just eyeball the rate at which these capabilities are increasing, it does make you think that we’ll get there by 2026 or 2027,” he said.