Technology

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark

VentureBeat, Jun 10, 2026
📋 Article Summary
Researchers from the University of California, Berkeley's Center for Responsible, Decentralized Intelligence (RDI), alongside an advisory committee of more than 300 domain experts, have launched Agents’ Last Exam (ALE), a grueling new benchmark built to measure whether artificial intelligence can actually execute economically valuable, long-horizon professional workflows.

In a shocking upset, OpenAI’s GPT-5.5, released in April and running through the Codex harness, secured the top spot on the new ALE Leaderboard with a 24.0% pass rate. It beat Anthropic's highly anticipated Mythos-class Claude Fable 5, released just yesterday, which came in third with a score of 22.0%.

Rather than testing models on isolated coding puzzles, ALE is explicitly designed to close the gap between academic benchmark hype and real, GDP-relevant labor impact. And right now, the data shows that even the most advanced models in the world are fundamentally failing the exam: the best pass rate is still below one in four.

Ending the Era of 'Cheating' and Brittle Graders

The fundamental shift in ALE lies in its evaluation architecture and the demands it…
Full story on VentureBeat (venturebeat.com)