Doc2Spec: Synthesizing Formal Programming Specifications from Natural Language via Grammar Induction

📅 2026-01-30
🤖 AI Summary
Manually translating natural-language API documentation into consistent, reliable formal specifications is difficult and costly. This work proposes Doc2Spec, a multi-agent framework that uses large language models (LLMs) to induce a specification grammar directly from natural-language rules and then generates formal specifications guided by that grammar, so no manually defined grammar is required. The induced grammar captures domain knowledge, constrains the specification space, and enforces consistent representations. Evaluated on seven benchmarks across three programming languages, Doc2Spec outperforms a baseline without grammar induction and achieves results competitive with a technique that relies on a manually crafted grammar.

📝 Abstract
Ensuring that API implementations and usage comply with natural language programming rules is critical for software correctness, security, and reliability. Formal verification can provide strong guarantees but requires precise specifications, which are difficult and costly to write manually. To address this challenge, we present Doc2Spec, a multi-agent framework that uses LLMs to automatically induce a specification grammar from natural-language rules and then generates formal specifications guided by the induced grammar. The grammar captures essential domain knowledge, constrains the specification space, and enforces consistent representations, thereby improving the reliability and quality of generated specifications. Evaluated on seven benchmarks across three programming languages, Doc2Spec outperforms a baseline without grammar induction and achieves competitive results against a technique with a manually crafted grammar, demonstrating the effectiveness of automated grammar induction for formalizing natural-language rules.
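
To make the idea of grammar-guided generation concrete, here is a minimal sketch of how an induced grammar could constrain the specification space: a candidate specification is accepted only if it can be derived from the grammar. Everything in it is an assumption for illustration, not the paper's implementation: the toy `GRAMMAR`, the `must precede` / `must follow` specification syntax, and the helper names `tokenize`, `match`, and `conforms` are all hypothetical, and the grammar is written by hand here where Doc2Spec would induce it with LLM agents.

```python
import re

# Hypothetical toy grammar standing in for an induced one.
# Keys are nonterminals; each maps to a list of alternative productions.
# Any symbol that is not a key is a terminal ("IDENT" matches identifiers).
GRAMMAR = {
    "spec": [["call", "must", "cond"]],
    "call": [["IDENT", "(", ")"]],
    "cond": [["precede", "call"], ["follow", "call"]],
}
KEYWORDS = {"must", "precede", "follow"}


def tokenize(text):
    """Split into identifiers and single non-space characters."""
    return re.findall(r"[A-Za-z_]\w*|\S", text)


def match(symbol, tokens, pos):
    """Return every token position reachable after matching `symbol` at `pos`."""
    if symbol in GRAMMAR:
        ends = set()
        for production in GRAMMAR[symbol]:
            positions = {pos}
            for sym in production:
                positions = {e for p in positions for e in match(sym, tokens, p)}
            ends |= positions
        return ends
    if pos >= len(tokens):
        return set()
    tok = tokens[pos]
    if symbol == "IDENT":
        ok = (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS
    else:
        ok = tok == symbol
    return {pos + 1} if ok else set()


def conforms(spec_text):
    """True if the whole candidate specification derives from 'spec'."""
    tokens = tokenize(spec_text)
    return len(tokens) in match("spec", tokens, 0)


if __name__ == "__main__":
    print(conforms("lock() must precede unlock()"))
    print(conforms("lock() should precede unlock()"))
```

In a pipeline of this shape, a candidate that fails `conforms` would be rejected or regenerated, which is one simple way a grammar can enforce consistent representations across generated specifications.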
Problem

Research questions and friction points this paper is trying to address.

formal specification
natural language
grammar induction
API compliance
software correctness
Innovation

Methods, ideas, or system contributions that make the work stand out.

grammar induction
formal specification
natural language to formal methods
multi-agent LLM framework
API compliance
Shihao Xia
The Pennsylvania State University
Mengting He
The Pennsylvania State University
Haomin Jia
University of the Chinese Academy of Sciences
Linhai Song
Professor, Institute of Computing Technology, Chinese Academy of Sciences
Operating Systems, Software Engineering, Security, Programming Languages