🤖 AI Summary
Existing LLM-based approaches struggle to generate high-quality formal specifications for programs with complex loops, often yielding irrelevant, incomplete, or ambiguous assertions that hinder end-to-end automated verification. To address this, we propose SLD-Spec, a specification generation framework that adds two phases: (1) program slicing, which decomposes each function into fragments containing independent loop structures and thereby reduces modeling complexity; and (2) logical deletion, which uses LLM-based reasoning to prune incorrect candidate assertions, improving the relevance, completeness, and correctness of the generated specifications. The framework integrates LLM reasoning with formal verification tools to enable closed-loop assertion filtering. On our manually constructed complex-loop dataset, 95.1% of generated assertions and 90.91% of target programs pass formal verification. On the simple dataset, SLD-Spec verifies five more programs than the state-of-the-art AutoSpec and reduces runtime by 23.73%.
📝 Abstract
Automatically generating formal specifications from program code can greatly improve the efficiency of program verification and enable end-to-end automation from requirements to reliable software. However, existing LLM-based approaches often struggle with programs that contain complex loop structures, producing irrelevant specifications. Moreover, the rigorous proof obligations and design constraints imposed by verification tools can lead to incomplete and ambiguous specifications. To address these challenges, we propose SLD-Spec, an LLM-assisted specification generation method tailored to programs with complex loop constructs. SLD-Spec introduces two novel phases into the traditional specification generation framework: (1) a slicing phase, which decomposes each function into code fragments containing independent loop structures, thereby reducing the complexity of specification generation; and (2) a logical deletion phase, which applies LLM-based reasoning to filter out incorrect candidate specifications, especially those not easily identified by verification tools, while retaining valid ones. Experimental results show that on the simple dataset, SLD-Spec successfully verifies five more programs than the state-of-the-art AutoSpec and reduces runtime by 23.73%. To address the limitations of existing research, we manually construct a dataset comprising four categories of complex loop programs. On this dataset, SLD-Spec significantly improves the correctness, relevance, and completeness of generated specifications compared with baseline methods, enabling 95.1% of assertions and 90.91% of programs to pass verification. Ablation studies further reveal that logical deletion is critical for specification correctness and relevance, while program slicing contributes significantly to specification completeness. Our code and data are publicly available.
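The two phases described above can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the SLD-Spec implementation: every function name is hypothetical, the slicer, LLM calls, and verifier are replaced by toy stand-ins, and the ordering (logical deletion before the verifier) is one plausible reading of the abstract.

```python
from typing import Callable, List


def generate_specs(
    function_code: str,
    slice_fn: Callable[[str], List[str]],     # hypothetical: slicing phase
    propose_fn: Callable[[str], List[str]],   # hypothetical: LLM proposes candidates
    keep_fn: Callable[[str, str], bool],      # hypothetical: logical deletion check
    verify_fn: Callable[[str, str], bool],    # hypothetical: formal verifier
) -> List[str]:
    """Slice, propose candidates per slice, prune, then verify what survives."""
    accepted: List[str] = []
    for fragment in slice_fn(function_code):
        candidates = propose_fn(fragment)
        # Logical deletion: drop candidates judged incorrect or irrelevant.
        survivors = [c for c in candidates if keep_fn(fragment, c)]
        # Assumed here: only surviving candidates are sent to the verifier.
        accepted.extend(c for c in survivors if verify_fn(fragment, c))
    return accepted


# Toy stand-ins so the sketch runs without an LLM or a verification tool.
def toy_slice(code: str) -> List[str]:
    # Pretend each blank-line-separated chunk is one independent loop.
    return [part for part in code.split("\n\n") if part.strip()]


def toy_propose(fragment: str) -> List[str]:
    # A real system would query an LLM; this returns fixed candidates.
    return ["idx >= 0", "k >= 0", "zzz == 1"]


def toy_keep(fragment: str, candidate: str) -> bool:
    # Keep a candidate only if the variable it constrains occurs in the slice.
    return candidate.split()[0] in fragment.split()


def toy_verify(fragment: str, candidate: str) -> bool:
    # Placeholder verifier that accepts everything it is given.
    return True


DEMO = "for idx in range(n): pass\n\nfor k in range(m): pass"
specs = generate_specs(DEMO, toy_slice, toy_propose, toy_keep, toy_verify)
print(specs)
```

In this toy run each slice keeps only the candidate that mentions its own loop variable, which mirrors the intent of pruning irrelevant assertions before they reach the verifier.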