One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning struggles to generalize to complex, long-horizon temporal tasks with safety constraints, particularly when a subgoal turns out to be unsatisfiable and the agent must switch to an alternative. To address this, the authors propose GenZ-LTL, a framework that uses Büchi automata to decompose arbitrary Linear Temporal Logic (LTL) specifications into sequences of reach-avoid subgoals and solves them one subgoal at a time via safe RL, aided by a subgoal-induced observation reduction that keeps per-subgoal inputs compact. This enables zero-shot generalization to LTL specifications never seen during training. Experiments show that GenZ-LTL substantially outperforms state-of-the-art methods in success rate on unseen LTL tasks.

📝 Abstract
Generalizing to complex and temporally extended task objectives and safety constraints remains a critical challenge in reinforcement learning (RL). Linear temporal logic (LTL) offers a unified formalism to specify such requirements, yet existing methods are limited in their abilities to handle nested long-horizon tasks and safety constraints, and cannot identify situations when a subgoal is not satisfiable and an alternative should be sought. In this paper, we introduce GenZ-LTL, a method that enables zero-shot generalization to arbitrary LTL specifications. GenZ-LTL leverages the structure of Büchi automata to decompose an LTL task specification into sequences of reach-avoid subgoals. Contrary to the current state-of-the-art method that conditions on subgoal sequences, we show that it is more effective to achieve zero-shot generalization by solving these reach-avoid problems *one subgoal at a time* through proper safe RL formulations. In addition, we introduce a novel subgoal-induced observation reduction technique that can mitigate the exponential complexity of subgoal-state combinations under realistic assumptions. Empirical results show that GenZ-LTL substantially outperforms existing methods in zero-shot generalization to unseen LTL specifications.
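The core idea, solving reach-avoid subgoals one at a time rather than conditioning on whole subgoal sequences, can be illustrated with a toy sketch. Everything here is hypothetical: the `Subgoal` type, the hand-written subgoal list (which the paper would instead derive from a Büchi automaton), the 1-D world, and `toy_policy` standing in for a learned safe-RL policy.

```python
# Illustrative sketch, NOT the paper's implementation: executing an LTL task
# like F(a & F b) & G !c as a sequence of reach-avoid subgoals.
from dataclasses import dataclass

@dataclass(frozen=True)
class Subgoal:
    reach: str   # proposition to make true
    avoid: str   # proposition that must stay false

# Hand-written stand-in for the subgoal sequence a Buchi automaton would induce:
# reach a while avoiding c, then reach b while avoiding c.
SUBGOALS = [Subgoal(reach="a", avoid="c"), Subgoal(reach="b", avoid="c")]

# Toy 1-D world: the agent is an integer position; each proposition is a cell.
PROPS = {"a": 3, "b": 6, "c": 9}

def toy_policy(pos: int, subgoal: Subgoal) -> int:
    """Stand-in for a goal-conditioned safe-RL policy: step toward the
    current subgoal's reach cell."""
    target = PROPS[subgoal.reach]
    return pos + (1 if target > pos else -1 if target < pos else 0)

def run(pos: int = 0, max_steps: int = 50):
    """Solve the subgoals strictly one at a time; fail on a safety
    violation or if a subgoal cannot be reached within the step budget."""
    trace = [pos]
    for sg in SUBGOALS:                   # one subgoal at a time
        for _ in range(max_steps):
            if pos == PROPS[sg.avoid]:    # safety violation -> fail
                return trace, False
            if pos == PROPS[sg.reach]:    # subgoal reached -> advance
                break
            pos = toy_policy(pos, sg)
            trace.append(pos)
        else:
            return trace, False           # subgoal unsatisfiable in budget
    return trace, True

trace, ok = run()
print(ok, trace)  # -> True [0, 1, 2, 3, 4, 5, 6]
```

The point of the per-subgoal loop is that the policy only ever conditions on a single reach-avoid pair, which is what allows a fixed policy to be reused across arbitrary automaton-derived subgoal sequences at test time.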
Problem

Research questions and friction points this paper is trying to address.

Generalizing to complex temporal tasks in reinforcement learning
Handling nested long-horizon tasks and safety constraints
Zero-shot generalization to arbitrary LTL specifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes LTL tasks into reach-avoid subgoals
Solves subgoals sequentially via safe RL
Reduces observation complexity via a subgoal-induced observation reduction
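A minimal sketch of what a subgoal-induced observation reduction might look like (the function name, scene encoding, and offsets are all hypothetical, not the paper's actual construction): condition the policy only on features relative to the current subgoal's reach and avoid objects, so the input stays fixed-size instead of growing with the number of subgoal-state combinations.

```python
def reduce_observation(agent_pos, objects, reach, avoid):
    """Project a full scene observation onto the current subgoal: keep only
    the relative offsets to the 'reach' and 'avoid' objects, drop the rest.
    One small input per subgoal instead of one per subgoal-state combination."""
    def rel(name):
        return tuple(o - a for o, a in zip(objects[name], agent_pos))
    return rel(reach) + rel(avoid)

# Full scene with many objects; only two matter for the current subgoal.
scene = {"a": (3, 0), "b": (6, 2), "c": (9, 9), "d": (1, 5)}
obs = reduce_observation((1, 1), scene, reach="a", avoid="c")
print(obs)  # -> (2, -1, 8, 8)
```

Because the reduced observation is the same shape for every subgoal, a single policy network can be shared across all reach-avoid problems the automaton produces.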