EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
In automated theorem proving (ATP), large language models (LLMs) employ test-time scaling—e.g., reflective chain-of-thought (CoT) and multi-sampling—to improve proof success, yet incur prohibitive inference overhead; existing cost analyses overlook strategy-dependent sampling costs. Method: The authors propose EconRL, a cost-aware training pipeline featuring (1) a dynamic CoT switching mechanism that activates complex reasoning only when necessary, and (2) a cost-sensitive parallel reinforcement learning architecture jointly optimizing trainable prefixes, sampling policies, and token efficiency. Contribution/Results: Evaluated on miniF2F and ProofNet, the resulting EconProver achieves theorem-proving performance comparable to baselines using only 12% of their computational cost—in terms of both token count and sample count—demonstrating fine-grained cost control and efficient co-optimization of test-time scaling in ATP.

📝 Abstract
Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP), attaining substantial performance gains through widely adopted test-time scaling strategies, notably reflective Chain-of-Thought (CoT) reasoning and increased sampling passes. However, both strategies introduce significant computational overhead for inference. Moreover, existing cost analyses typically regulate only the number of sampling passes, while neglecting the substantial disparities in sampling costs introduced by different scaling strategies. In this paper, we systematically compare the efficiency of different test-time scaling strategies for ATP models and demonstrate the inefficiency of the current state-of-the-art (SOTA) open-source approaches. We then investigate approaches to significantly reduce token usage and sampling passes while maintaining the original performance. Specifically, we propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits: (1) a dynamic Chain-of-Thought (CoT) switching mechanism designed to mitigate unnecessary token consumption, and (2) diverse parallel-scaled reinforcement learning (RL) with trainable prefixes to enhance pass rates under constrained sampling passes. Experiments on miniF2F and ProofNet demonstrate that our EconProver achieves comparable performance to baseline methods with only 12% of the computational cost. This work provides actionable insights for deploying lightweight ATP models without sacrificing performance.
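The dynamic CoT switching idea described in the abstract can be pictured as an escalation policy: spend a cheap non-reflective attempt first, and pay for expensive reflective reasoning only when that attempt fails verification. The sketch below is purely illustrative—`try_direct`, `try_reflective_cot`, and `verify` are hypothetical stand-ins for a prover call and a proof checker, not the paper's actual interface.

```python
# Illustrative sketch of dynamic CoT switching (not the paper's code).
# Cheap non-CoT generation runs first; reflective CoT is invoked only
# when the cheap attempt does not produce a verified proof, so easy
# theorems never pay the full reasoning-token cost.

def prove_with_switching(statement, try_direct, try_reflective_cot, verify):
    """Return (proof, tokens_used), escalating to CoT only on failure."""
    proof, cost = try_direct(statement)
    if proof is not None and verify(statement, proof):
        return proof, cost  # cheap path succeeded
    cot_proof, cot_cost = try_reflective_cot(statement)
    return cot_proof, cost + cot_cost  # total spend includes failed attempt


# Stub provers simulating the two modes, for demonstration only.
def fake_direct(statement):
    # Succeeds cheaply on "easy" statements, fails otherwise.
    return ("direct_proof", 50) if statement == "easy" else (None, 50)

def fake_reflective_cot(statement):
    # Always succeeds, but at roughly 10x the token cost.
    return ("cot_proof", 500)

def fake_verify(statement, proof):
    return proof is not None
```

Under these stubs an easy statement costs 50 tokens while a hard one costs 550, which is the kind of per-problem cost disparity the switching mechanism exploits.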
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in automated theorem proving
Optimizing token usage and sampling passes efficiently
Maintaining performance while minimizing inference costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic CoT switching to reduce tokens
Diverse parallel-scaled RL with trainable prefixes
Unified EconRL pipeline for economical scaling
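The second innovation, diverse parallel scaling with trainable prefixes, can be sketched as conditioning each sampling pass on a distinct prefix so that a small sampling budget covers different proof strategies rather than near-duplicate samples. The code below is a hedged illustration under assumed names—`sample_fn`, `verify`, and the prefix strings are hypothetical, and in the paper the prefixes are learned via RL rather than hand-written.

```python
# Illustrative sketch of diverse parallel-scaled sampling (not the
# paper's implementation). Each pass conditions generation on a
# different prefix; the first verified proof wins, so diversity
# raises pass rates without increasing the number of passes.

def parallel_sample(statement, sample_fn, prefixes, verify):
    """Spend one sampling pass per prefix; return (proof, prefix) on success."""
    for prefix in prefixes:
        proof = sample_fn(prefix + statement)
        if verify(statement, proof):
            return proof, prefix
    return None, None  # budget exhausted without a verified proof


# Stubs simulating a prover that only succeeds under one strategy prefix.
def fake_sample(prompt):
    return "proof" if prompt.startswith("induction:") else None

def fake_verify(statement, proof):
    return proof is not None
```

With prefixes `["algebra: ", "induction: "]`, the stub prover fails on the first strategy and succeeds on the second, showing how distinct prefixes let a fixed budget of passes probe multiple approaches.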