miniCTX: Neural Theorem Proving with (Long-)Contexts

📅 2024-08-05
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
This work evaluates neural theorem provers' ability to reason over long (tens of thousands of tokens), previously unseen contexts—containing new definitions, lemmas, and code structure—by introducing miniCTX, a Lean benchmark for context-dependent theorem proving. Methodologically, the authors provide ntp-toolkit, a framework for automatically extracting and annotating theorem-proving data, and test fine-tuning and prompting baselines that condition proof generation on the preceding file context. Key findings include: (i) both context-conditioned approaches substantially outperform traditional methods that rely solely on proof-state information; and (ii) this ability to use context is not captured by earlier benchmarks such as miniF2F. miniCTX thereby offers a challenging and realistic evaluation of long-context reasoning in interactive theorem proving.

📝 Abstract
Real-world formal theorem proving often depends on a wealth of context, including definitions, lemmas, comments, file structure, and other information. We introduce miniCTX, which tests a model's ability to prove formal mathematical theorems that depend on new context that is not seen during training. miniCTX contains theorems sourced from real Lean projects and textbooks, each associated with a context that can span tens of thousands of tokens. Models are tasked with proving a theorem given access to code from the theorem's repository, which contains context that is needed for the proof. As a baseline for miniCTX, we tested fine-tuning and prompting methods that condition theorem proving on preceding context. Both approaches substantially outperform traditional methods that rely solely on state information. We found that this ability to use context is not captured by previous benchmarks such as miniF2F. Alongside miniCTX, we offer ntp-toolkit for automatically extracting and annotating theorem proving data, making it easy to add new projects into miniCTX to ensure that contexts are not seen during training. miniCTX offers a challenging and realistic evaluation of neural theorem provers.
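The kind of context dependence the abstract describes can be illustrated with a small Lean sketch (a hypothetical example, not drawn from the benchmark): the second theorem cannot be proved from its goal state alone, because it relies on a definition and a lemma introduced earlier in the same file.

```lean
-- Hypothetical file: `double` and its characterizing lemma appear
-- earlier in the file, i.e. in the theorem's preceding context.
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega

-- A prover shown only the goal would not know how `double` unfolds;
-- one conditioned on the preceding context can reuse the lemma above.
theorem double_add (n m : Nat) : double n + double m = 2 * (n + m) := by
  rw [double_eq_two_mul, double_eq_two_mul]
  omega
```

A state-only prover sees just the goal `double n + double m = 2 * (n + m)`, in which `double` is an opaque name; a context-conditioned prover sees the definition and lemma it needs.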
Problem

Research questions and friction points this paper is trying to address.

Real proofs depend on surrounding context (definitions, lemmas, comments, file structure) that existing benchmarks largely ignore.
Models must prove theorems that depend on context not seen during training.
Prior benchmarks such as miniF2F do not measure this ability to use new context.
Innovation

Methods, ideas, or system contributions that make the work stand out.

miniCTX pairs theorems from real Lean projects and textbooks with repository contexts spanning tens of thousands of tokens
Fine-tuning and prompting baselines that condition on preceding context substantially outperform state-only methods
ntp-toolkit automates extraction and annotation of theorem-proving data, so new projects can be added to keep contexts unseen during training
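The context-conditioned prompting baseline can be sketched as follows. This is a minimal illustration in the spirit of the paper's setup, not its actual code; the function name, prompt format, and character-based budget are all assumptions.

```python
# Sketch of context-conditioned prompting: instead of showing the prover
# only the proof state, prepend the theorem's preceding file contents,
# truncated from the left to fit a budget. (Illustrative, not the
# paper's exact implementation; a real system would count tokens.)

def build_context_prompt(preceding_file: str, statement: str,
                         max_context_chars: int = 2000) -> str:
    """Keep the most recent slice of the file, which tends to contain
    the definitions and lemmas the theorem depends on."""
    context = preceding_file[-max_context_chars:]
    # Ask the model to complete the proof after `:= by`.
    return f"{context}\n\n{statement} := by\n"

# Hypothetical usage: the theorem depends on `double`, defined earlier.
file_so_far = "def double (n : Nat) : Nat := n + n\n"
prompt = build_context_prompt(file_so_far,
                              "theorem double_zero : double 0 = 0")
print(prompt.endswith(":= by\n"))  # → True
```

Truncating from the left is a simple heuristic: definitions closest to the theorem are the most likely to be needed, and the oldest context is dropped first when the budget is exceeded.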
Jiewen Hu
Carnegie Mellon University
Thomas (Hanwen) Zhu
Carnegie Mellon University
S. Welleck
Carnegie Mellon University