Seeking Specifications: The Case for Neuro-Symbolic Specification Synthesis

📅 2025-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the automatic generation of ACSL formal specifications for C programs. We propose a neuro-symbolic approach that combines the DeepSeek-R1 large language model with two tools from the Frama-C ecosystem, the EVA abstract interpreter and the PathCrawler path-coverage test generator, and use it to study how code defects influence the specifications the model produces. We design a user-controllable prompting mechanism that steers the model toward either intent-oriented or implementation-oriented specifications, explicitly distinguishing high-level behavioral specifications from low-level, implementation-specific ones. We also introduce a multi-stage, symbolically augmented reasoning workflow that injects feedback from symbolic analysis into the LLM's generation process. Experimental evaluation demonstrates substantial improvements in specification accuracy and semantic soundness: critical specification error rates decrease by 37%, and the method enables on-demand generation of high-quality, mechanically verifiable ACSL annotations.

📝 Abstract

This work is concerned with the generation of formal specifications from code, using Large Language Models (LLMs) in combination with symbolic methods. Concretely, in our study, the programming language is C, the specification language is ACSL, and the LLM is DeepSeek-R1. In this context, we address two research directions: the specification of intent versus implementation, and the combination of symbolic analyses with LLMs. For the first, we investigate how the absence or presence of bugs in the code affects the generated specifications, and whether and how a user can direct the LLM to specify intent or implementation. For the second, we investigate how results from symbolic analyses affect the specifications generated by the LLM. The LLM prompts are augmented with outputs from two formal methods tools in the Frama-C ecosystem, PathCrawler and EVA. We demonstrate how adding symbolic analysis to the workflow affects the quality of the annotations.
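
To make the intent versus implementation distinction concrete, the following minimal sketch shows a deliberately buggy C function together with two candidate contracts. The function `buggy_max` and both contracts are hypothetical illustrations and are not taken from the paper. The ACSL contract attached to the function describes what the code actually does (it returns the minimum), while the contract in the plain comment describes what the name suggests it should do. The two helper predicates are ordinary C stand-ins for the postconditions, so the difference can also be observed at run time.

```c
#include <limits.h>

/* Hypothetical example: the name promises a maximum, but the comparison is
   inverted, so the function returns the minimum of its two arguments. */

/*@ assigns \nothing;
  @ ensures a < b  ==> \result == a;
  @ ensures a >= b ==> \result == b;
  @*/
int buggy_max(int a, int b)
{
    return a < b ? a : b; /* bug: should be a > b ? a : b */
}

// Intent-oriented contract (what the name suggests). It would not be
// provable against the buggy body above:
//   assigns \nothing;
//   ensures \result >= a && \result >= b;
//   ensures \result == a || \result == b;

/* Run-time stand-in for the implementation-oriented postconditions. */
int meets_implementation_spec(int a, int b, int result)
{
    return (a < b) ? (result == a) : (result == b);
}

/* Run-time stand-in for the intent-oriented postconditions. */
int meets_intent_spec(int a, int b, int result)
{
    return result >= a && result >= b && (result == a || result == b);
}
```

For inputs such as `a = 1, b = 2`, the result satisfies the implementation-oriented predicate but violates the intent-oriented one, which is the kind of divergence a bug introduces between the two styles of specification.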

Problem

Research questions and friction points this paper is trying to address.

Generating formal specifications from C code using LLMs and symbolic methods

Investigating impact of code bugs on specification generation for intent vs implementation

Enhancing LLM-generated specifications with symbolic analysis from Frama-C tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs with symbolic methods for specifications

Uses DeepSeek-R1 to generate ACSL annotations for C code

Augments LLM prompts with Frama-C tool outputs
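
The prompt augmentation described in the items above can be sketched roughly as follows. This is an assumption-laden illustration, not the paper's actual prompt format: the `spec_mode` enum, the `build_prompt` function, and all prompt wording are hypothetical, and the analysis text is passed in as an opaque string rather than parsed from real EVA or PathCrawler output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical user-selected specification mode. */
typedef enum { SPEC_INTENT, SPEC_IMPLEMENTATION } spec_mode;

/* Assemble an LLM prompt from C source code and optional symbolic analysis
   output. Writes into `out` (capacity `cap`). Returns 0 on success, or -1 if
   an argument is invalid or the prompt does not fit in the buffer.
   `analysis` may be NULL or empty when no tool output is available. */
int build_prompt(char *out, size_t cap, const char *code,
                 const char *analysis, spec_mode mode)
{
    if (out == NULL || code == NULL || cap == 0) {
        return -1;
    }

    /* Hypothetical wording for the two prompting modes. */
    const char *goal = (mode == SPEC_INTENT)
        ? "Specify the intended behavior, even if the code deviates from it."
        : "Specify what the code actually does, including any bugs.";

    int n;
    if (analysis != NULL && analysis[0] != '\0') {
        /* Augmented prompt: the analysis output is appended after the code. */
        n = snprintf(out, cap,
                     "Write ACSL annotations for the C code below.\n%s\n\n"
                     "C code:\n%s\n\nSymbolic analysis output:\n%s\n",
                     goal, code, analysis);
    } else {
        /* Baseline prompt without symbolic analysis output. */
        n = snprintf(out, cap,
                     "Write ACSL annotations for the C code below.\n%s\n\n"
                     "C code:\n%s\n",
                     goal, code);
    }

    /* snprintf returns the length it wanted to write; reject truncation. */
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Keeping the analysis text as a separate, optional argument makes it straightforward to generate both a baseline prompt and an augmented prompt for the same code, which is the comparison needed to observe what the symbolic analysis contributes.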

🔎 Similar Papers

SpecGen: Automated Generation of Formal Program Specifications via Large Language Models