Value Functions for Temporal Logic: Optimal Policies and Safety Filters

📅 2026-05-01
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the synthesis of optimal policies for temporal logic specifications with nested operators such as Until and Globally in the undiscounted infinite-horizon setting, where greedily maximizing the Q-function can defer task satisfaction indefinitely even when the value function is optimal. Building on a decomposition of the temporal logic value function into a graph of constituent value functions, the paper constructs non-Markovian policies that condition on state history and proves them optimal with respect to the quantitative robustness score for nested Until, Globally, and Globally-Until specifications. It further shows how the Q-function can serve as a safety filter for complex temporal logic specifications, extending prior results beyond simple avoid and reach-avoid tasks.
πŸ“ Abstract
While Bellman equations for basic reach, avoid, and reach-avoid problems are well studied, the relationship between value optimality and policy optimality becomes subtle in the undiscounted infinite-horizon setting, particularly for more complicated tasks. Greedily maximizing the Q-function can produce policies that indefinitely defer task completion for reach-avoid problems, or equivalently, Until specifications, even when the value function is optimal. Building upon recent results decomposing the value function for temporal logic (TL) into a graph of constituent value functions, we construct non-Markovian policies based on state history that avoid this pathology and prove their optimality with respect to the quantitative robustness score for nested Until, Globally, and Globally-Until specifications. We further show how the Q function can serve as a safety filter for complex TL specifications, extending prior results beyond simple avoid or reach-avoid tasks.
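The deferral pathology described in the abstract can be illustrated on a toy problem. The sketch below is not taken from the paper: the four-state deterministic chain, the margin values, and every name in it (`NEXT`, `REACH`, `SAFE`, `q_values`, `solve_value`, `rollout_robustness`, `safety_filter`) are assumptions chosen for illustration. It uses the standard undiscounted reach-avoid backup V(s) = min(h(s), max(r(s), max_a V(s'))) and a simple threshold-style Q-function filter of the kind used for avoid and reach-avoid tasks, not the paper's construction for nested specifications.

```python
import numpy as np

# Hypothetical deterministic chain: 0 = start, 1 = middle, 2 = target, 3 = unsafe.
# Actions: 0 = stay, 1 = advance, 2 = hazard (jumps to the unsafe state).
NEXT = np.array([
    [0, 1, 3],
    [1, 2, 3],
    [2, 2, 2],
    [3, 3, 3],
])
REACH = np.array([-1.0, -1.0, 1.0, -1.0])  # reach margin r(s): positive only in the target
SAFE = np.array([1.0, 1.0, 1.0, -1.0])     # safety margin h(s): negative only in the unsafe state


def q_values(v):
    """Undiscounted reach-avoid backup: Q(s, a) = min(h(s), max(r(s), V(s')))."""
    return np.minimum(SAFE[:, None], np.maximum(REACH[:, None], v[NEXT]))


def solve_value(max_iters=100):
    """Iterate the backup from the zero-step value min(h, r) until it stops changing."""
    v = np.minimum(SAFE, REACH)
    for _ in range(max_iters):
        v_new = q_values(v).max(axis=1)
        if np.array_equal(v_new, v):
            break
        v = v_new
    return v


def rollout_robustness(policy, s0, horizon=50):
    """Robustness of 'safe Until reach' on a rollout: max_t min(r(s_t), min_{k<=t} h(s_k))."""
    s, safe_so_far, best = s0, np.inf, -np.inf
    for _ in range(horizon):
        safe_so_far = min(safe_so_far, SAFE[s])
        best = max(best, min(REACH[s], safe_so_far))
        s = NEXT[s, policy(s)]
    return best


def safety_filter(q, s, nominal_action, threshold=0.0):
    """Keep the nominal action if its Q-value clears the threshold, else use a maximizing action."""
    if q[s, nominal_action] >= threshold:
        return nominal_action
    return int(np.argmax(q[s]))


V = solve_value()
Q = q_values(V)

# Greedy policy; np.argmax breaks ties toward the lowest index, which is "stay" here.
greedy = lambda s: int(np.argmax(Q[s]))
always_advance = lambda s: 1

print("V:", V)
print("Q at start:", Q[0])
print("greedy rollout robustness:", rollout_robustness(greedy, 0))
print("advance rollout robustness:", rollout_robustness(always_advance, 0))
print("filtered hazard action:", safety_filter(Q, 0, 2))
```

In this toy problem the start state has value 1, and both "stay" and "advance" attain that Q-value, so a greedy policy that breaks the tie toward "stay" never reaches the target and its rollout scores -1, while always advancing scores 1. The threshold filter overrides the hazardous action but, on its own, does nothing about the tie between staying and advancing, which is why the deferral issue needs separate treatment.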
Problem

Research questions and friction points this paper is trying to address.

Temporal Logic
Value Function
Policy Optimality
Q-function
Infinite-horizon
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Logic
Non-Markovian Policies
Safety Filters
Robustness Score
Q-function
🔎 Similar Papers