Ensuring Logic in the Fog: Sound POMDP Synthesis with LTL Objectives

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of synthesizing policies for partially observable Markov decision processes (POMDPs) that satisfy linear temporal logic (LTL) specifications. The authors propose a reward-shaping mechanism that generates belief-dependent rewards grounded in certified LTL satisfaction and integrate it into an enhanced Monte Carlo tree search (MCTS) framework. The shaping is sound: it combines formal verification with approximate planning, so the reward signals remain reliable even though qualitative verification of LTL satisfaction in POMDPs is undecidable. Experiments show that the method succeeds in scenarios where existing solvers fail and remains effective and scalable across multiple benchmark domains.

📝 Abstract

Synthesising autonomous agents that can navigate uncertain environments while adhering to complex temporal constraints remains a fundamental challenge. While Linear Temporal Logic (LTL) provides a rigorous language for specifying such tasks, the inherent undecidability of qualitatively verifying LTL satisfaction in partially observable Markov decision processes renders quantitative synthesis difficult, especially when designing reliable reward signals for approximate solvers. In this paper, we bridge this gap with a novel, sound reward-shaping mechanism that dynamically generates belief-dependent rewards grounded in certified LTL satisfaction. By integrating this mechanism into an enhanced Monte Carlo Planning framework, we empower agents to navigate the 'fog' of partial observability with a search process focused on maximising verifiable success. Our experiments demonstrate that this approach not only thrives in scenarios where existing solvers fail but also maintains effectiveness and scalability across diverse benchmark domains.
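
To make the idea of a belief-dependent reward grounded in certified satisfaction more concrete, the sketch below shows one possible reading. It is an illustration under stated assumptions, not the paper's algorithm. It assumes the belief is a bag of particles over product states (environment state paired with an automaton state for the LTL formula), and that some external verification step has produced a set `certified_winning` of product states from which satisfaction is certified. The names `belief_from_particles`, `shaped_reward`, and `certified_winning` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple

# A product state pairs an environment state with an automaton state
# tracking progress on the LTL formula (assumed representation).
ProductState = Tuple[Hashable, Hashable]


def belief_from_particles(particles: Iterable[ProductState]) -> Dict[ProductState, float]:
    """Turn a bag of sampled product states into a normalised belief."""
    counts = Counter(particles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("belief needs at least one particle")
    return {state: count / total for state, count in counts.items()}


def shaped_reward(belief: Dict[ProductState, float],
                  certified_winning: Set[ProductState]) -> float:
    """Belief mass on states certified as satisfying (hypothetical shaping).

    If every state in certified_winning truly guarantees satisfaction,
    this value never exceeds the satisfaction probability under the belief,
    so it never over-promises.
    """
    return sum(prob for state, prob in belief.items() if state in certified_winning)


particles = [("s0", "q0"), ("s1", "q1"), ("s1", "q1"), ("s2", "q0")]
certified = {("s1", "q1")}
reward = shaped_reward(belief_from_particles(particles), certified)
```

In a Monte Carlo planner, a value like `reward` could be returned at the leaves of the search tree in place of a hand-tuned heuristic. How the certified set is computed, and how the paper combines it with the search, is not specified in the abstract.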

Problem

Research questions and friction points this paper is trying to address.

POMDP

LTL

reward shaping

partial observability

temporal logic

Innovation

Methods, ideas, or system contributions that make the work stand out.

reward shaping

LTL synthesis

POMDP