Aletheia tackles FirstProof autonomously

📅 2026-02-24

🤖 AI Summary

This work reports the performance of Aletheia, a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Working autonomously and within the challenge's allowed timeframe, Aletheia solved six of the ten problems (Problems 2, 5, 7, 8, 9, and 10) according to majority expert assessment; the experts were not unanimous on Problem 8 only. The authors also explain their interpretation of FirstProof, disclose details of their experiments and evaluation, and release the raw prompts and outputs.

📝 Abstract

We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 problems (2, 5, 7, 8, 9, 10) out of 10 according to majority expert assessments; we note that experts were not unanimous on Problem 8 (only). For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.

Problem

Research questions and friction points this paper is trying to address.

automated theorem proving

mathematical reasoning

AI research agent

FirstProof challenge

autonomous problem solving

Innovation

Methods, ideas, or system contributions that make the work stand out.

autonomous mathematical reasoning

AI theorem proving

Gemini 3 Deep Think

FirstProof challenge

Aletheia