Hierarchical Attention Generates Better Proofs

📅 2025-04-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) struggle to model the hierarchical structure of mathematical reasoning in formal theorem proving. Method: We propose Hierarchical Attention Regularization (HAR), the first approach to explicitly encode a five-level mathematical reasoning hierarchy—proposition → lemma → step → subgoal → atomic operation—into the Transformer attention mechanism via structural alignment constraints. HAR integrates mathematically grounded attention regularization with formal proof fine-tuning on miniF2F and ProofNet. Contribution/Results: On miniF2F, HAR improves proof success rate by 2.05% and reduces average proof steps by 23.81%; on ProofNet, it achieves a 1.69% gain in success rate and a 16.50% reduction in proof steps. These results demonstrate substantially enhanced modeling of abstract reasoning structures and improved generation efficiency for formal proofs.
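The paper does not spell out the regularizer, but the idea of penalizing attention that violates a level hierarchy can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes each token is tagged with one of the five levels (atomic operation → subgoal → step → lemma → proposition) and that attention is only "structurally aligned" when it flows between tokens at most one level apart.

```python
import numpy as np

# Five hierarchy levels from the paper, lowest to highest.
# The per-token level tags and the one-hop constraint below are assumptions
# made for illustration; the paper's actual constraint may differ.
LEVELS = ["atomic operation", "subgoal", "step", "lemma", "proposition"]

def hierarchy_mask(levels, max_hop=1):
    """Return a 0/1 matrix: 1 where attention between two tokens is
    structurally allowed (level distance <= max_hop), 0 where penalized."""
    lv = np.asarray(levels)
    return (np.abs(lv[:, None] - lv[None, :]) <= max_hop).astype(float)

def har_penalty(attn, levels, max_hop=1):
    """Average attention mass a token places on structurally disallowed
    positions; added to the training loss as a regularization term."""
    mask = hierarchy_mask(levels, max_hop)
    return float((attn * (1.0 - mask)).sum() / attn.shape[0])

# Toy example: 4 tokens at levels 0, 1, 3, 4 with uniform attention.
levels = [0, 1, 3, 4]
attn = np.full((4, 4), 0.25)
print(har_penalty(attn, levels))  # 0.5: half the mass crosses >1 level
```

In training, a term like this would be weighted and added to the language-modeling loss, so minimizing it pushes attention heads toward the proposition → lemma → step → subgoal → atomic-operation structure.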

📝 Abstract
Large language models (LLMs) have shown promise in formal theorem proving, but their token-level processing often fails to capture the inherent hierarchical nature of mathematical proofs. We introduce Hierarchical Attention, a regularization method that aligns LLMs' attention mechanisms with mathematical reasoning structures. Our approach establishes a five-level hierarchy from foundational elements to high-level concepts, ensuring structured information flow in proof generation. Experiments demonstrate that our method improves proof success rates by 2.05% on miniF2F and 1.69% on ProofNet while reducing proof complexity by 23.81% and 16.50% respectively. The code is available at https://github.com/Car-pe/HAGBP.
Problem

Research questions and friction points this paper is trying to address.

Token-level processing in LLMs fails to capture the hierarchical structure of mathematical proofs
LLM attention mechanisms are not aligned with mathematical reasoning structures
Generated formal proofs are longer and less efficient than necessary
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Attention Regularization (HAR) aligns LLM attention with mathematical proof structures
A five-level hierarchy (proposition → lemma → step → subgoal → atomic operation) ensures structured information flow
Improves proof success rates on miniF2F and ProofNet while reducing proof length
Jianlong Chen
The Chinese University of Hong Kong, Shenzhen
Chao Li
Shanghai Qi Zhi Institute
Yang Yuan
Tsinghua University
Machine learning · Optimization
Andrew C. Yao
Shanghai Qi Zhi Institute, IIIS, Tsinghua University