🤖 AI Summary
Automated solving of advanced undergraduate mathematics problems, such as those from MIT and Columbia University courses and the MATH benchmark, remains a significant challenge for AI. This paper proposes an end-to-end framework that integrates zero-shot learning, symbolic mathematical reasoning, and program synthesis, augmented by a reasoning enhancement mechanism. The framework solves problems, explains them step by step, and generates executable code while reducing reliance on large-scale annotated data. The core contribution is jointly modeling formal logical reasoning and executable programs, aimed at improving logical rigor and out-of-distribution generalization. On the evaluated problem sets (MIT and Columbia course problems and selected MATH tasks), the method achieves 90.15% accuracy, 9.15 percentage points above the prior state of the art (81.00%). The work points toward a data-efficient, interpretable paradigm for complex mathematical reasoning.
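To make the size of the reported gain concrete, the same two accuracies (90.15% and 81.00%) can be compared in three ways. Only the absolute difference is stated in the source; the relative gain and the error-rate reduction below are simple derivations from those two numbers, not figures reported by the authors.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 90.15\% - 81.00\% = 9.15 \text{ percentage points},\\
\Delta_{\text{rel}} &= \frac{90.15 - 81.00}{81.00} \approx 11.3\%,\\
\text{error-rate reduction} &= \frac{(100 - 81.00) - (100 - 90.15)}{100 - 81.00}
  = \frac{9.15}{19.00} \approx 48.2\%.
\end{aligned}
```

Read this way, the 9.15-point gain removes roughly half of the errors left by the previous best result.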
📝 Abstract
Solving complex university-level mathematics problems, particularly those from MIT and Columbia University courses and selected tasks from the MATH dataset, remains a significant obstacle in artificial intelligence. Conventional methods have consistently fallen short in this domain, highlighting the need for more advanced approaches. In this paper, we introduce a language-based solution that leverages zero-shot learning and mathematical reasoning to solve, explain, and generate solutions for these advanced mathematics problems. By integrating program synthesis, our method reduces reliance on large-scale training data while significantly improving problem-solving accuracy. Our approach achieves an accuracy of 90.15%, a substantial improvement over the previous best result of 81%, setting a new state of the art in automated mathematical problem solving. These findings highlight the potential of advanced AI methods to tackle some of the most challenging mathematics courses and datasets.
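The abstract credits program synthesis with much of the accuracy gain: instead of stating a final answer directly, the system produces a program whose execution yields the answer. Here is a minimal sketch of that execute-the-program step, under loudly stated assumptions. The language model that would write the program is not shown. The sample problem, the hard-coded "synthesized" program, and the names `PROBLEM`, `PROGRAM`, and `run_program` are all hypothetical, not the authors' code or prompts.

```python
"""Minimal sketch: answer a math question by executing a synthesized program.

Assumption: in a real pipeline a language model would write PROGRAM from the
problem text. Here it is hard-coded for illustration, and every name below
(PROBLEM, PROGRAM, run_program) is hypothetical.
"""
from fractions import Fraction

PROBLEM = "Compute the exact value of sum_{k=1}^{99} 1 / (k (k + 1))."

# Hypothetical program a model might synthesize for PROBLEM.
# Exact rational arithmetic avoids floating-point rounding errors.
PROGRAM = """
from fractions import Fraction
answer = sum(Fraction(1, k * (k + 1)) for k in range(1, 100))
"""


def run_program(source: str) -> object:
    """Execute synthesized source in a fresh namespace and return `answer`."""
    namespace: dict = {}
    exec(source, namespace)  # not sandboxed: only run trusted code this way
    return namespace["answer"]


if __name__ == "__main__":
    result = run_program(PROGRAM)
    # The sum telescopes to 1 - 1/100, which gives an independent check.
    assert result == 1 - Fraction(1, 100)
    print(PROBLEM)
    print("answer =", result)  # prints: answer = 99/100
```

Delegating the arithmetic to an interpreter means the model only has to get the *method* right, and the program itself doubles as an inspectable explanation. A real system would also need to sandbox `exec`, since model-written code is untrusted.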