🤖 AI Summary
This work addresses the substantial overhead in parallel proof search within Lean 4, which arises from repeatedly reconstructing proof states—such as library imports and theorem unfoldings—across different branches. To mitigate this redundancy, the authors propose a proof state snapshotting mechanism orthogonal to existing caching strategies. This approach captures and serializes the fully elaborated proof state once within the Lean 4 language server, enabling direct reuse across multiple search branches without recomputation. Integrated into the Snapshot-DSP inference pipeline, the technique achieves end-to-end speedups of 5.6× to 50× (averaging 14×) on the miniF2F-v2 benchmark, with performance gains scaling markedly as the number of parallel branches increases.
📝 Abstract
Automated theorem proving systems built on Lean 4 increasingly rely on parallel tactic search over partially specified proofs, such as those generated by Draft-Sketch-Prove (DSP) pipelines. In current systems, each search branch reconstructs a proof state by re-running elaboration, leading to substantial per-branch overhead. In Lean 4 with Mathlib, this cost has two components: (1) import loading, which deserializes pre-compiled libraries (~60 s per branch); and (2) theorem-body elaboration, which re-checks the theorem context up to the target goal (estimated 18-735 s depending on proof complexity). Together, these account for >99% of per-branch wall time, making portfolio-based search impractical at scale.
We observe that this overhead arises from a mismatch between the structure of proof search and its execution model: branching is implemented via repeated reconstruction of proof states rather than direct reuse. To address this, we introduce proof-state snapshotting, which captures the elaborated proof state once and reuses it across branches via a small extension to the Lean 4 language server.
Across 48 miniF2F-v2 problems (45 prove-phase benchmarks and 3 full end-to-end runs), our approach achieves a 5.6-50x wall-time speedup over the standard fallback (average 14x, median 9.7x). Speedup increases with the number of proof branches.
Our method is orthogonal to import-level caching (e.g., Kimina Lean Server), which avoids import loading but not theorem-body elaboration. The patched Lean binary and the Snapshot-DSP pipeline will be released as open source upon publication.