🤖 AI Summary
Verifying the satisfiability of string constraints specified in natural language (NL) remains challenging: traditional SMT solvers face theoretical limitations and require labor-intensive formalization, while the efficacy of large language models (LLMs) for this task has not been systematically investigated.
Method: We propose an LLM-driven generate-and-check approach. State-of-the-art LLMs (e.g., GPT-4, Claude) both generate a candidate string satisfying the NL requirements (when deemed satisfiable) and, in parallel, synthesize two validators: a declarative SMT-LIB formula and an imperative Python checker. Validator feedback iteratively refines generation, yielding an end-to-end, falsifiable pipeline: NL requirement → concrete string instance → machine-checkable validation.
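To make the imperative-validator side concrete, here is a minimal sketch for a hypothetical toy requirement (not taken from the paper): "the string starts with 'ab', has length between 5 and 10, and contains no digits". A Python checker synthesized for such a requirement might look like:

```python
def check(s: str) -> bool:
    # Imperative checker for the toy requirement:
    # starts with "ab", length in [5, 10], no digit characters.
    return (
        s.startswith("ab")
        and 5 <= len(s) <= 10
        and not any(c.isdigit() for c in s)
    )

# Candidate strings proposed by the LLM are accepted or rejected:
print(check("abcde"))  # True: all three constraints hold
print(check("ab1de"))  # False: contains a digit
print(check("abc"))    # False: too short
```

In the pipeline, a rejection like `check("ab1de") == False` is fed back to the LLM to refine the next candidate.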
Results: The Python validators achieve 100% testing accuracy; integrating validation more than doubles the consistent string generation success rate and F1-score in some cases compared to baselines without checkers, substantially improving correctness and reliability.
📝 Abstract
Requirements over strings, commonly represented using natural language (NL), are particularly relevant for software systems due to their heavy reliance on string data manipulation. While individual requirements can usually be analyzed manually, verifying properties (e.g., satisfiability) over sets of NL requirements is especially challenging. Formal approaches (e.g., SMT solvers) may efficiently verify such properties, but are known to have theoretical limitations. Additionally, the translation of NL requirements into formal constraints typically requires significant manual effort. Recently, large language models (LLMs) have emerged as an alternative approach for formal reasoning tasks, but their effectiveness in verifying requirements over strings is less studied. In this paper, we introduce a hybrid approach that verifies the satisfiability of NL requirements over strings by using LLMs (1) to derive a satisfiability outcome (and a consistent string, if possible), and (2) to generate declarative (i.e., SMT) and imperative (i.e., Python) checkers, used to validate the correctness of (1). In our experiments, we assess the performance of four LLMs. Results show that LLMs effectively translate natural language into checkers, even achieving perfect testing accuracy for Python-based checkers. These checkers substantially help LLMs in generating a consistent string and accurately identifying unsatisfiable requirements, leading to more than doubled generation success rate and F1-score in certain cases compared to baselines without generated checkers.
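For the declarative modality, a synthesized checker takes the form of an SMT-LIB script over the theory of strings. Below is a hedged sketch (not from the paper) for a hypothetical toy requirement, "the string starts with 'ab', has length between 5 and 10, and contains no digits", which a string-capable solver such as Z3 or cvc5 could discharge:

```smt2
; Toy requirement: starts with "ab", length in [5, 10], no digit characters.
(declare-const s String)
(assert (str.prefixof "ab" s))
(assert (and (>= (str.len s) 5) (<= (str.len s) 10)))
; "No digits": s belongs to (any non-digit character)*.
(assert (str.in_re s (re.* (re.diff re.allchar (re.range "0" "9")))))
(check-sat)
(get-model)
```

A `sat` answer with a model for `s` yields a consistent string witness; `unsat` would mark the requirement set as unsatisfiable.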