🤖 AI Summary
To automate testing against natural language protocol specifications, particularly in safety-critical systems, this paper proposes a two-stage methodology. First, large language models (LLMs) extract protocol elements from unstructured textual sources (e.g., RFCs). Second, an I/O-grammar-based synthesis step generates executable, traceable formal protocol specifications that require no LLM at test-generation time. A key contribution is a curated corpus of mappings from natural-language descriptions to formal syntax, enabling version-aware specification evolution and iterative refinement. Evaluated on SMTP, POP3, IMAP, FTP, and ManageSieve, the approach recovers on average 92.8% of client and 80.2% of server message types; deployed against real systems, it attains an 81.5% message acceptance rate. This significantly improves the accuracy, maintainability, and scalability of automated test generation for protocol implementations.
📝 Abstract
Safety- and security-critical systems have to be thoroughly tested against their specifications. The state of practice is to have _natural language_ specifications, from which test cases are derived manually, a process that is slow, error-prone, and difficult to scale. _Formal_ specifications, on the other hand, are well suited for automated test generation, but are tedious to write and maintain. In this work, we propose a two-stage pipeline that uses large language models (LLMs) to bridge the gap: First, we extract _protocol elements_ from natural-language specifications; second, leveraging a protocol implementation, we synthesize and refine a formal _protocol specification_ from these elements, which we can then use to test further implementations at scale.
We see this two-stage approach as superior to end-to-end LLM-based test generation, as 1. it produces an _inspectable specification_ that preserves traceability to the original text; 2. generating actual test cases _no longer requires an LLM_; 3. the resulting formal specs are _human-readable_ and can be reviewed, version-controlled, and incrementally refined; and 4. over time, we can build a _corpus_ of natural-language-to-formal-specification mappings that can be used to further train and refine LLMs for more automatic translations.
Our prototype, AUTOSPEC, demonstrated the feasibility of our approach on five widely used _internet protocols_ (SMTP, POP3, IMAP, FTP, and ManageSieve) by applying its methods to their _RFC specifications_, written in natural language, using the recent _I/O grammar_ formalism for protocol specification and fuzzing. In its evaluation, AUTOSPEC recovers on average 92.8% of client and 80.2% of server message types, and achieves an 81.5% message acceptance rate across diverse, real-world systems.
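To make the grammar-based target of the pipeline concrete, here is a minimal, hypothetical sketch of a client-side message grammar for a tiny SMTP fragment, written as a context-free grammar dictionary in the style commonly used for grammar-based fuzzing. The rule names, the `produce` helper, and the SMTP subset shown are illustrative assumptions only, not AUTOSPEC's actual I/O grammar syntax, which additionally distinguishes client (output) from server (input) messages.

```python
import random

# Hypothetical sketch: a context-free grammar for a fragment of SMTP
# client commands. Nonterminals are "<...>" keys; each maps to a list
# of alternatives, each alternative a list of (non)terminal strings.
SMTP_CLIENT_GRAMMAR = {
    "<start>":   [["<command>"]],
    "<command>": [["HELO ", "<domain>", "\r\n"],
                  ["MAIL FROM:<", "<mailbox>", ">\r\n"],
                  ["RCPT TO:<", "<mailbox>", ">\r\n"],
                  ["DATA\r\n"],
                  ["QUIT\r\n"]],
    "<domain>":  [["example.org"], ["mail.example.com"]],
    "<mailbox>": [["alice@example.org"], ["bob@example.com"]],
}

def produce(grammar, symbol="<start>", rng=random):
    """Expand `symbol` by recursively picking random alternatives."""
    if symbol not in grammar:  # terminal string: emit as-is
        return symbol
    alternative = rng.choice(grammar[symbol])
    return "".join(produce(grammar, s, rng) for s in alternative)
```

Calling `produce(SMTP_CLIENT_GRAMMAR)` yields one well-formed client message such as `HELO example.org\r\n`; a fuzzer can call it repeatedly to mass-generate test inputs, which is how a formal spec of this shape supports LLM-free test generation.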