Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification

Date: 2025-10-03
AI Summary
This work addresses the problem of checking whether natural language plans generated by large language models (LLMs) match their intended behavior. The framework combines LLM-based semantic parsing with formal verification: an LLM (e.g., GPT-5) translates a natural language plan into a Kripke structure and linear temporal logic (LTL) formulas, and a model checker then verifies the structure against the formulas. The main contribution is an automated, end-to-end pipeline from natural language to formally checkable models. On a simplified version of the PlanBench plan verification dataset, GPT-5 achieves a 96.3% F1 score and almost always produces syntactically valid formal representations; producing semantically correct models is left as future work.

Abstract
We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.
Problem

Research questions and friction points this paper is trying to address.

Verifying alignment between natural language plans and expected behavior
Converting natural language plans into formal Kripke structures and LTL
Evaluating plan verification using model checking and LLMs
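
To make the three points above concrete, here is a minimal Python sketch of a plan encoded as a Kripke structure and checked against LTL formulas. It is not the paper's implementation: the `Kripke` class, the tuple encoding of formulas, the `linear_path` and `holds` helpers, and the Blocksworld-style propositions are all illustrative assumptions. The sketch only handles a linear plan (one successor per state, with a self-loop on the final state); a real model checker also handles branching structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kripke:
    initial: str
    transitions: dict  # state -> list of successor states
    labels: dict       # state -> set of atomic propositions true in that state


def linear_path(k):
    """Follow the unique successor from the initial state until the self-loop."""
    path = [k.initial]
    while True:
        successors = k.transitions[path[-1]]
        if len(successors) != 1:
            raise ValueError("this sketch assumes exactly one successor per state")
        nxt = successors[0]
        if nxt == path[-1]:
            return path  # final state loops on itself forever
        if nxt in path:
            raise ValueError("this sketch only supports a self-loop on the final state")
        path.append(nxt)


def holds(phi, k, path, i=0):
    """Evaluate an LTL formula at position i of the path.

    The infinite word is the path followed by its last state repeated,
    so positions beyond the end collapse onto the last index.
    """
    op, last = phi[0], len(path) - 1
    if op == "ap":
        return phi[1] in k.labels[path[i]]
    if op == "not":
        return not holds(phi[1], k, path, i)
    if op == "and":
        return holds(phi[1], k, path, i) and holds(phi[2], k, path, i)
    if op == "X":  # next
        return holds(phi[1], k, path, min(i + 1, last))
    if op == "G":  # globally
        return all(holds(phi[1], k, path, j) for j in range(i, last + 1))
    if op == "F":  # eventually
        return any(holds(phi[1], k, path, j) for j in range(i, last + 1))
    if op == "U":  # phi[1] until phi[2]
        for j in range(i, last + 1):
            if holds(phi[2], k, path, j):
                return True
            if not holds(phi[1], k, path, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op}")


# Hypothetical plan: "pick up block A, then stack A on B".
plan = Kripke(
    initial="s0",
    transitions={"s0": ["s1"], "s1": ["s2"], "s2": ["s2"]},
    labels={
        "s0": {"handempty", "clear_A"},
        "s1": {"holding_A"},
        "s2": {"on_A_B", "handempty"},
    },
)
path = linear_path(plan)

goal = ("F", ("ap", "on_A_B"))  # the goal is eventually reached
safety = ("G", ("not", ("and", ("ap", "holding_A"), ("ap", "handempty"))))

print(holds(goal, plan, path), holds(safety, plan, path))
```

In this setup, a plan would be classified as valid when every LTL formula derived from the expected behavior holds on the structure, and as invalid otherwise.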
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts natural language plans into Kripke structures
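Uses LLMs to generate LTL specifications, which a model checker then verifies against the Kripke structures
Reaches a 96.3% F1 score with GPT-5 on a simplified version of the PlanBench plan verification dataset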
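
The reported metrics (Accuracy, Precision, Recall, and F1) follow the standard binary-classification definitions. The Python sketch below shows how they could be computed from model-checking verdicts; the function name `classification_metrics` and the sample labels are hypothetical and are not data from the paper. It assumes that `True` means "plan judged valid".

```python
def classification_metrics(gold, predicted):
    """Return accuracy, precision, recall, and F1 for boolean labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # valid, judged valid
    tn = sum(1 for g, p in pairs if not g and not p)  # invalid, judged invalid
    fp = sum(1 for g, p in pairs if not g and p)      # invalid, judged valid
    fn = sum(1 for g, p in pairs if g and not p)      # valid, judged invalid

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
gold = [True, True, False, False]
predicted = [True, False, False, True]
metrics = classification_metrics(gold, predicted)
print(metrics)
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high, which makes it a common single summary number for validity classification.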