Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a core limitation of large language models (LLMs) as embodied planners: they lack formal reasoning capabilities, which compromises safety and hinders effective repair of unsafe plans. To overcome this, the authors propose the Verifiable Iterative Refinement Framework (VIRF), a novel collaborative paradigm that pairs a logic-based tutor with an LLM planner. By integrating formal safety ontologies, causal reasoning, and pedagogical feedback, VIRF enables active, verifiable plan repair. This neuro-symbolic architecture combines scalable knowledge acquisition with an iterative correction mechanism, achieving a 0% hazardous action rate and a 77.3% task success rate in household safety scenarios with an average of only 1.1 refinement iterations, significantly outperforming existing approaches.
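To make the tutor-apprentice loop in the summary concrete, here is a minimal sketch of how such a verify-then-repair cycle could be wired up. Every identifier below (LogicTutor, Verdict, virf_refine, the rule interface) is an illustrative assumption, not the authors' published API.

```python
# Hypothetical sketch of a VIRF-style tutor-apprentice loop.
# All names are assumptions for illustration; the paper's actual
# interfaces are not reproduced in this summary.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Verdict:
    safe: bool
    feedback: str = ""  # causal, pedagogical explanation of the violation

class LogicTutor:
    """Deterministic checker grounded in a formal safety ontology."""
    def __init__(self, rules: List):
        self.rules = rules  # e.g., "never heat a sealed container"

    def check(self, plan: List[str]) -> Verdict:
        for step in plan:
            for rule in self.rules:
                if rule.violated_by(step):
                    # Explain *why* the step is unsafe, not just reject it.
                    return Verdict(safe=False, feedback=rule.explain(step))
        return Verdict(safe=True)

def virf_refine(task: str,
                propose: Callable[[str, Optional[str]], List[str]],
                tutor: LogicTutor,
                max_iters: int = 5) -> Tuple[List[str], int]:
    """Iterate plan -> verify -> repair until the tutor accepts the plan."""
    plan = propose(task, None)
    for i in range(max_iters):
        verdict = tutor.check(plan)
        if verdict.safe:
            return plan, i  # verified plan, number of repair rounds used
        # Feed the tutor's causal feedback back to the planner so it
        # repairs the specific violation instead of merely avoiding it.
        plan = propose(task, verdict.feedback)
    raise RuntimeError("plan not verifiable within the iteration budget")
```

Under this reading, the reported average of 1.1 refinement iterations would correspond to the loop counter `i` above: most plans pass verification after at most one repair round.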

📝 Abstract
Large Language Models (LLMs) show promise as planners for embodied AI, but their stochastic nature lacks formal reasoning, preventing strict safety guarantees for physical deployment. Current approaches often rely on unreliable LLMs for safety checks or simply reject unsafe plans without offering repairs. We introduce the Verifiable Iterative Refinement Framework (VIRF), a neuro-symbolic architecture that shifts the paradigm from passive safety gatekeeping to active collaboration. Our core contribution is a tutor-apprentice dialogue where a deterministic Logic Tutor, grounded in a formal safety ontology, provides causal and pedagogical feedback to an LLM planner. This enables intelligent plan repairs rather than mere avoidance. We also introduce a scalable knowledge acquisition pipeline that synthesizes safety knowledge bases from real-world documents, correcting blind spots in existing benchmarks. In challenging home safety tasks, VIRF achieves a perfect 0 percent Hazardous Action Rate (HAR) and a 77.3 percent Goal-Condition Rate (GCR), which is the highest among all baselines. It is highly efficient, requiring only 1.1 correction iterations on average. VIRF demonstrates a principled pathway toward building fundamentally trustworthy and verifiably safe embodied agents.
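For concreteness, the two headline metrics can be computed roughly as below. The paper's precise definitions may differ; this sketch assumes the common readings of HAR (share of executed actions flagged hazardous) and GCR (fraction of goal conditions satisfied, averaged over episodes).

```python
# Hedged sketch of the evaluation metrics quoted in the abstract,
# under assumed (not confirmed) definitions.
from typing import Dict, List

def hazardous_action_rate(episodes: List[Dict]) -> float:
    """HAR: share of all executed actions that violate a safety rule."""
    actions = [a for ep in episodes for a in ep["actions"]]
    return sum(a["hazardous"] for a in actions) / max(len(actions), 1)

def goal_condition_rate(episodes: List[Dict]) -> float:
    """GCR: mean fraction of per-episode goal conditions satisfied."""
    fractions = [ep["satisfied_goals"] / ep["total_goals"] for ep in episodes]
    return sum(fractions) / max(len(fractions), 1)
```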
Problem

Research questions and friction points this paper is trying to address.

Embodied AI
Safety Verification
Formal Reasoning
Plan Repair
Trustworthy AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuro-Symbolic Architecture
Verifiable Safety
Iterative Plan Refinement
Logic Tutor
Embodied AI
👥 Authors
Feiyu Wu
School of Cyber Engineering, Xidian University
Xu Zheng
School of Cyber Engineering, Xidian University
Yue Qu
School of Cyber Engineering, Xidian University
Zhuocheng Wang
School of Cyber Engineering, Xidian University
Zicheng Feng
School of Cyber Engineering, Xidian University
Hui Li
Xidian University
Research interests: wireless network security, security in cloud computing, information theory