Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification

📅 2025-10-03

🤖 AI Summary

This work addresses the challenge of verifying that natural language plans generated by large language models (LLMs) are consistent with their intended behavior. It introduces an automated framework that combines LLM-based semantic parsing with formal verification: an LLM (e.g., GPT-5) translates a natural language plan into a Kripke structure and a set of linear temporal logic (LTL) formulas, and a model checker then decides whether the structure satisfies the formulas. The key contribution is an end-to-end, automated translation from natural language to formally verifiable models. Evaluated on a simplified version of the PlanBench plan verification dataset, GPT-5 achieves a 96.3% F1 score, and the generated formal representations are almost always syntactically valid. Producing semantically perfect formal models remains open for future work.
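To make the Kripke-structure and LTL step concrete, here is a minimal sketch in Python. It is not the authors' implementation, and this summary does not say which model checker they use. The sketch assumes that a plan unfolds into a single path of states whose final state loops on itself, and it encodes formulas as nested tuples; the class `LinearKripke`, the function `holds`, and the proposition names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearKripke:
    """Path-shaped Kripke structure (an assumption of this sketch).

    State i steps to state i + 1, and the last state loops on itself,
    so the structure has exactly one infinite path.
    labels[i] is the set of atomic propositions true in state i.
    """
    labels: tuple


def holds(formula, k, i=0):
    """Evaluate an LTL formula at position i of the single path of k."""
    last = len(k.labels) - 1
    i = min(i, last)  # every position past the end is the looping last state
    op = formula[0]
    if op == "ap":
        return formula[1] in k.labels[i]
    if op == "not":
        return not holds(formula[1], k, i)
    if op == "and":
        return holds(formula[1], k, i) and holds(formula[2], k, i)
    if op == "or":
        return holds(formula[1], k, i) or holds(formula[2], k, i)
    if op == "X":  # next
        return holds(formula[1], k, i + 1)
    if op == "G":  # globally: checking up to the looping state is enough
        return all(holds(formula[1], k, j) for j in range(i, last + 1))
    if op == "F":  # eventually
        return any(holds(formula[1], k, j) for j in range(i, last + 1))
    if op == "U":  # until: the right side must occur, the left side holds before it
        for j in range(i, last + 1):
            if holds(formula[2], k, j):
                return True
            if not holds(formula[1], k, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op}")


# Hypothetical blocks-world plan: pick up block a, then stack it on b.
plan = LinearKripke((
    frozenset({"clear_a"}),
    frozenset({"holding_a"}),
    frozenset({"on_a_b"}),
))

goal = ("F", ("ap", "on_a_b"))                    # the goal is eventually reached
safety = ("G", ("not", ("and", ("ap", "holding_a"), ("ap", "on_a_b"))))
plan_is_valid = holds(goal, plan) and holds(safety, plan)
```

A general model checker handles branching structures and arbitrary loops; the single-path restriction here only keeps the example short.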

📝 Abstract

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.
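The reported metrics follow the usual binary-classification definitions. The sketch below computes them from predicted and ground-truth verdicts. It assumes that "plan is valid" is the positive class, which this summary does not state, and the function name `classification_metrics` and the sample data are hypothetical.

```python
def classification_metrics(predicted, actual):
    """Accuracy, precision, recall and F1 for boolean verdicts.

    Assumption of this sketch: True means "plan judged valid" (positive class).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up verdicts for five plans, for illustration only.
model_checker_verdicts = [True, True, False, False, True]
ground_truth = [True, False, False, True, True]
metrics = classification_metrics(model_checker_verdicts, ground_truth)
```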

Problem

Research questions and friction points this paper is trying to address.

Verifying alignment between natural language plans and expected behavior

Converting natural language plans into formal Kripke structures and LTL

Evaluating plan verification using model checking and LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts natural language plans into Kripke structures

Uses LLMs to generate LTL formulas, which a model checker then verifies against the Kripke structure

Achieves a 96.3% F1 score with GPT-5 on a simplified version of the PlanBench plan verification dataset
