🤖 AI Summary
This work investigates how large language models can assist frontier research in mathematics and theoretical computer science. The authors present Bolzano, an open-source multi-agent system that runs rounds of interaction between parallel prover agents and a verifier agent and retains knowledge across rounds in a persistent knowledge base. Using Bolzano, they report new results on six problems. Under the significance-autonomy taxonomy of Feng et al., four of the six results reach the level of publishable research, and three of the six were produced essentially autonomously by the system. These outcomes provide evidence that large language models can contribute meaningfully to mathematical research.
📝 Abstract
We report new results on six problems in mathematics and theoretical computer science, produced with the assistance of Bolzano, an open-source multi-agent LLM system. Bolzano orchestrates rounds of interaction between parallel prover agents and a verifier agent while maintaining a persistent knowledge base that is carried across rounds. Classified using the significance-autonomy taxonomy of Feng et al., four of the six results reach the level of publishable research, and three of the six were produced essentially autonomously by Bolzano. Our results provide evidence that LLMs can contribute meaningfully to mathematical research, complementing recent reports by Bubeck et al., Woodruff et al., and others.
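The round structure described in the abstract (parallel provers, a single verifier, and a knowledge base carried across rounds) can be illustrated with a minimal sketch. This is not Bolzano's implementation: every name here (`KnowledgeBase`, `run_rounds`, the toy prover and verifier functions) is hypothetical, and the agents are stand-in functions rather than LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class KnowledgeBase:
    """Hypothetical persistent store of notes carried across rounds."""
    notes: List[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)


# Assumed interfaces: a prover maps (problem, notes) to an attempt;
# a verifier maps (problem, attempt) to (accepted, feedback).
Prover = Callable[[str, List[str]], str]
Verifier = Callable[[str, str], Tuple[bool, str]]


def run_rounds(
    problem: str,
    provers: List[Prover],
    verifier: Verifier,
    max_rounds: int = 3,
) -> Tuple[Optional[str], KnowledgeBase]:
    """Run prover/verifier rounds until an attempt is accepted or rounds run out."""
    kb = KnowledgeBase()
    for round_index in range(max_rounds):
        snapshot = list(kb.notes)  # every prover sees the same notes this round
        with ThreadPoolExecutor(max_workers=len(provers)) as pool:
            attempts = list(pool.map(lambda p: p(problem, snapshot), provers))
        for attempt in attempts:
            accepted, feedback = verifier(problem, attempt)
            if accepted:
                return attempt, kb
            # Rejected attempts still leave feedback for later rounds.
            kb.add(f"round {round_index}: {feedback}")
    return None, kb


# Toy stand-ins for LLM agents (assumptions for illustration only).
def stubborn_prover(problem: str, notes: List[str]) -> str:
    return "guess"


def adaptive_prover(problem: str, notes: List[str]) -> str:
    if any("try induction" in note for note in notes):
        return "proof by induction"
    return "direct computation"


def toy_verifier(problem: str, attempt: str) -> Tuple[bool, str]:
    if "induction" in attempt:
        return True, "accepted"
    return False, "try induction"


result, kb = run_rounds(
    "toy problem", [stubborn_prover, adaptive_prover], toy_verifier
)
```

In this toy run the first round fails for both provers, but the verifier's feedback is stored, so the adaptive prover succeeds in the second round. The sketch only shows why carrying knowledge across rounds can matter; how the real system structures its agents and knowledge base is not specified in the abstract.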