🤖 AI Summary
This work reports the performance of Aletheia, a mathematics research agent built on the Gemini 3 Deep Think large language model, on the inaugural FirstProof challenge. Working autonomously within the challenge's allowed timeframe, Aletheia solved six of the ten problems (Problems 2, 5, 7, 8, 9, and 10) according to majority expert assessment; the experts were not unanimous on Problem 8. The authors also explain their interpretation of FirstProof, disclose details of their experiments and evaluation, and release the raw prompts and outputs.
📝 Abstract
We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. Within the allowed timeframe of the challenge, Aletheia autonomously solved 6 of the 10 problems (2, 5, 7, 8, 9, 10) according to majority expert assessments; we note that Problem 8 was the only one on which experts were not unanimous. For full transparency, we explain our interpretation of FirstProof and disclose details about our experiments as well as our evaluation. Raw prompts and outputs are available at https://github.com/google-deepmind/superhuman/tree/main/aletheia.