🤖 AI Summary
Ambiguous task formulations hinder the reasoning performance of large language models (LLMs) in Coq theorem proving. Method: To enhance task clarity, we propose (i) structured semantic context modeling, (ii) a selective concept expansion mechanism, and (iii) a Planner–Executor two-stage reasoning architecture; we further design a concept-level clarity metric for quantitative evaluation. Using DeepSeek-V3 as the base model, we integrate context augmentation and lightweight fine-tuning. Results: Evaluated on 1,386 theorems, our approach increases task clarity by 1.85× and boosts proof success rate from 21.8% to 45.8%, surpassing the prior SOTA Graph2Tac (33.2%). Our core contribution is the first systematic quantification and optimization of task clarity in formal theorem proving, establishing an interpretable, optimization-friendly paradigm for LLM-based formal reasoning.
📝 Abstract
In this work, we investigate whether improving task clarity can enhance the reasoning ability of large language models, focusing on theorem proving in Coq. We introduce a concept-level metric to evaluate task clarity and show that adding structured semantic context to the standard input used by modern LLMs leads to a 1.85× improvement in clarity score (44.5% → 82.3%). Using the general-purpose model DeepSeek-V3, our approach yields a 2.1× improvement in proof success (21.8% → 45.8%) and outperforms the previous state-of-the-art Graph2Tac (33.2%). We evaluate this on 1,386 theorems randomly sampled from 15 standard Coq packages, following the same evaluation protocol as Graph2Tac. Furthermore, fine-tuning smaller models on our structured data achieves even higher performance (48.6%). Our method uses selective concept unfolding to enrich task descriptions and employs a Planner–Executor architecture. These findings highlight the value of structured task representations in bridging the gap between understanding and reasoning.
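To make the two key ingredients concrete, the following is a minimal, hypothetical sketch of a concept-level clarity score and of selective concept unfolding. All names, the scoring rule (fraction of statement concepts whose definitions appear in the context), and the toy Coq definitions are illustrative assumptions, not the paper's actual metric or API.

```python
# Hypothetical sketch: concept-level clarity + selective concept unfolding.
# The scoring rule and all identifiers are illustrative assumptions.

def clarity(statement_concepts, context_definitions):
    """Fraction of concepts in the theorem statement whose definitions
    appear in the supplied context (0.0 = opaque, 1.0 = fully clear)."""
    if not statement_concepts:
        return 1.0
    defined = sum(1 for c in statement_concepts if c in context_definitions)
    return defined / len(statement_concepts)

def unfold_selectively(statement_concepts, global_definitions, context):
    """Add definitions only for concepts still missing from the context,
    keeping the prompt compact (selective rather than exhaustive unfolding)."""
    enriched = dict(context)
    for c in statement_concepts:
        if c not in enriched and c in global_definitions:
            enriched[c] = global_definitions[c]
    return enriched

# Toy example: a Coq-style theorem statement mentioning two concepts,
# only one of which is defined in the initial context.
concepts = ["even", "double"]
global_defs = {
    "even": "Fixpoint even (n : nat) : bool := ...",
    "double": "Definition double (n : nat) := n + n.",
}
ctx = {"even": global_defs["even"]}

print(clarity(concepts, ctx))            # 0.5 before unfolding
ctx = unfold_selectively(concepts, global_defs, ctx)
print(clarity(concepts, ctx))            # 1.0 after unfolding
```

The enriched context would then be serialized into the prompt, with the Planner proposing a proof outline and the Executor emitting concrete Coq tactics; that two-stage split is described only at a high level in the source.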