VentureBeat | Large Language Models | 10 June 2026

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark

Researchers from the University of California, Berkeley's Center for Responsible, Decentralized Intelligence (RDI), alongside an advisory committee of over 300 domain experts, have launched Agents’ Last Exam (ALE), a grueling new benchmark built to measure whether artificial intelligence can actually execute economically valuable, long-horizon professional workflows. In a shocking upset, OpenAI’s GPT-5.5 from April, operating through the Codex harness, secured the absolute top spot on the new...

Read the full article at VentureBeat

Related news

Wired

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

The company changed course after researchers spoke out against the policy, which would have covertly...

VentureBeat

Researchers say they trained a foundation model from scratch for about $1,500

Training a foundation LLM from scratch costs millions and requires internet-scale data — which is wh...

VentureBeat

Anthropic CEO calls for FAA-style regulation of powerful AI models: what enterprises should know

In a sweeping new essay titled "Policy on the AI Exponential," Anthropic co-founder and CEO Dario Am...

The Verge

Claude Fable won’t answer basic biology questions

Anthropic just released Claude Fable 5, calling it the most powerful AI model it has ever made widel...