Submission + - We are nowhere near AGI (x.com)
schwit1 writes: Humans: 100%
Gemini 3.1 Pro: 0.37%
GPT 5.4: 0.26%
Opus 4.6: 0.25%
Grok-4.20: 0.00%
François Chollet just released ARC-AGI-3 — the hardest AI test ever created.
135 novel game environments. No instructions. No rules. No goals given.
Figure it out or fail.
Untrained humans solved every single one. Every frontier AI model scored below 1%.
Each environment was handcrafted by game designers. The AI gets dropped in and has to explore, discover what winning looks like, and adapt in real time.
The scoring punishes brute force. If a human needs 10 actions and the AI needs 100, the AI doesn't get 10%. It gets 1%. You can't throw more compute at this.
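The quoted penalty (10 actions vs. 100 yielding 1%, not 10%) is consistent with squaring the action-efficiency ratio. The actual ARC-AGI-3 scoring formula is not given in this post, so the following is purely an illustrative sketch of a quadratic efficiency penalty that matches the quoted numbers:

```python
# Hypothetical scoring sketch -- NOT the real ARC-AGI-3 formula.
# A squared efficiency ratio is one simple rule consistent with the
# quoted example: 10x more actions than a human -> 1%, not 10%.
def efficiency_score(human_actions: int, ai_actions: int) -> float:
    """Return a score in [0, 1], penalizing extra actions quadratically."""
    ratio = human_actions / ai_actions
    return min(1.0, ratio) ** 2

print(efficiency_score(10, 100))  # ~0.01 (1%)
print(efficiency_score(10, 10))   # 1.0 (matching human efficiency)
```

Under any such super-linear penalty, flooding the environment with extra actions costs more than it gains, which is why brute-force search can't close the gap.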
For context: ARC-AGI-1 is basically solved. Gemini scores 98% on it. ARC-AGI-2 went from 3% to 77% in under a year. Labs spent millions training on earlier versions.
ARC-AGI-3 resets the entire scoreboard to near zero.
Abstract and more here.