Reducing the Costs of Proof Synthesis on Rust Systems by Scaling Up a Seed Training Set

📅 2026-02-04
🤖 AI Summary
This work addresses the lack of formal correctness guarantees in code generated by large language models and the scarcity of annotated formal verification data for Rust, which is costly to produce. To overcome these challenges, the authors propose VeruSyn, a data synthesis pipeline that, for the first time, enables large-scale joint generation of Rust programs and their formal proofs with broad coverage of Verus features. The pipeline combines self-synthesis, tutorial-guided synthesis, and agent trajectory synthesis, the last of which supplies long-chain-of-thought training data and reduces proof annotation costs. Using this pipeline, the authors construct a dataset of 6.9 million verified samples and fine-tune Qwen2.5-Coder-32B-Instruct; the resulting model offers a more appealing cost-proof tradeoff than commercial models such as Claude Sonnet 4.5 and significantly outperforms o4-mini and prior research models in proof generation.

📝 Abstract
Large Language Models (LLMs) are widely used for code generation. However, the correctness of code generated by LLMs remains a concern. A potential remedy to this concern is to have LLMs generate formal correctness proofs along with such code. However, compared with code generation, code-proof generation requires much higher reasoning capability and has much less existing data to learn from. In this paper, we present VeruSyn, a data synthesis pipeline for Verus, a state-of-the-art verification tool for system software written in Rust. Through self-synthesis and tutorial-based synthesis, VeruSyn achieves much larger scale and Verus-feature coverage than previous data-synthesis techniques designed for Verus; VeruSyn also supplements its dataset with long-chain-of-thought (CoT) data through agent trajectory synthesis. With VeruSyn, we synthesize the largest set of Verus verified programs: 6.9 million Rust programs, each with a formal specification and a proof that it meets that specification. This dataset lets us create a fine-tuned Qwen2.5-Coder-32B-Instruct model with appealing cost-proof tradeoff compared with state-of-the-art commercial models like Claude Sonnet 4.5. It also significantly outperforms models like o4-mini and previously proposed research models.
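To make concrete what one sample in such a dataset consists of, below is a minimal, hand-written Verus example (an illustration, not drawn from the paper's dataset): a Rust function, a formal postcondition stated with `ensures`, and a body simple enough that the Verus verifier discharges the proof automatically. Larger programs of the kind VeruSyn synthesizes additionally require explicit proof blocks and loop invariants. Note this is checked with the Verus tool, not plain `rustc`.

```rust
use vstd::prelude::*;

verus! {

// Specification: the result is the larger of the two inputs,
// and it is always one of them.
fn max_u64(a: u64, b: u64) -> (r: u64)
    ensures
        r >= a,
        r >= b,
        r == a || r == b,
{
    if a >= b { a } else { b }
}

} // verus!
```

The `ensures` clauses are the formal specification; the "proof" here is implicit because the SMT solver behind Verus can verify the branch on its own. Generating programs where the proof obligations are non-trivial, and pairing them with the needed proof annotations, is what makes the dataset costly to produce by hand.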
Problem

Research questions and friction points this paper is trying to address.

proof synthesis
Rust systems
formal verification
large language models
data scarcity
Innovation

Methods, ideas, or system contributions that make the work stand out.

VeruSyn
proof synthesis
formal verification
chain-of-thought
Rust