Automaton Constrained Q-Learning

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world robotic tasks require achieving temporally ordered objectives under dynamic safety constraints, yet existing reinforcement learning (RL) methods struggle to satisfy both jointly, particularly in continuous domains, where linear temporal logic (LTL)-guided RL suffers from poor sample efficiency and low success rates. To address this, the paper proposes Automaton Constrained Q-Learning (ACQL), a framework that combines automaton-guided reinforcement with goal-conditioned value function learning. ACQL supports most LTL-specified temporal objectives together with both stationary and non-stationary safety constraints. Built on a Q-learning foundation, it leverages the structure of LTL task automata to guide policy learning. Empirical evaluation on multiple continuous-control benchmarks demonstrates substantial improvements in sample efficiency and task success rate. The authors further validate deployment robustness and practicality on a challenging 6-DOF robotic arm task: safe grasping within a cluttered, cabinet-like space under time-varying safety constraints.

📝 Abstract
Real-world robotic tasks often require agents to achieve sequences of goals while respecting time-varying safety constraints. However, standard Reinforcement Learning (RL) paradigms are fundamentally limited in these settings. A natural approach to these problems is to combine RL with Linear-time Temporal Logic (LTL), a formal language for specifying complex, temporally extended tasks and safety constraints. Yet, existing RL methods for LTL objectives exhibit poor empirical performance in complex and continuous environments. As a result, no scalable methods support both temporally ordered goals and safety simultaneously, making them ill-suited for realistic robotics scenarios. We propose Automaton Constrained Q-Learning (ACQL), an algorithm that addresses this gap by combining goal-conditioned value learning with automaton-guided reinforcement. ACQL supports most LTL task specifications and leverages their automaton representation to explicitly encode stage-wise goal progression and both stationary and non-stationary safety constraints. We show that ACQL outperforms existing methods across a range of continuous control tasks, including cases where prior methods fail to satisfy either goal-reaching or safety constraints. We further validate its real-world applicability by deploying ACQL on a 6-DOF robotic arm performing a goal-reaching task in a cluttered, cabinet-like space with safety constraints. Our results demonstrate that ACQL is a robust and scalable solution for learning robotic behaviors according to rich temporal specifications.
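The core idea in the abstract, learning values over the product of the environment state and an LTL task-automaton state, can be illustrated with a minimal toy sketch. Everything below is a hypothetical illustration, not the paper's actual ACQL algorithm: the transition table, proposition names (`g1`, `g2`, `u`), and the simple progress-based reward are assumptions made for the example.

```python
from collections import defaultdict

def automaton_step(q, labels, delta):
    """Advance the task automaton on the set of propositions true now."""
    return delta.get((q, labels), q)

def q_learning_update(Q, s, q, a, r, s2, q2, actions, alpha=0.1, gamma=0.99):
    """Standard Q-learning backup over the product state (s, q)."""
    best_next = max(Q[(s2, q2, a2)] for a2 in actions)
    Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])

# Tiny worked task: "reach goal g1, then g2, and never touch unsafe u",
# as a 3-state DFA (0 -> 1 -> 2) with a trap state -1 for violations.
delta = {
    (0, frozenset({"g1"})): 1,   # first reach g1
    (1, frozenset({"g2"})): 2,   # then reach g2 (accepting)
    (0, frozenset({"u"})): -1,   # unsafe region -> trap
    (1, frozenset({"u"})): -1,
}

Q = defaultdict(float)
actions = [0, 1]

# One illustrative transition: from env state "s0" in automaton state 0,
# action 0 leads to env state "s1" where proposition g1 holds.
q_next = automaton_step(0, frozenset({"g1"}), delta)   # automaton advances to 1
reward = 1.0 if q_next == 1 else 0.0                   # reward automaton progress
q_learning_update(Q, "s0", 0, 0, reward, "s1", q_next, actions)
print(round(Q[("s0", 0, 0)], 3))  # -> 0.1
```

Because the automaton state is part of the learned state, the same environment state can have different values depending on which task stage the agent is in, which is what lets a single value function represent stage-wise goal progression.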
Problem

Research questions and friction points this paper is trying to address.

Achieving sequential goals under time-varying safety constraints in robotics
Combining reinforcement learning with temporal logic for complex tasks
Scaling robot learning to handle temporal ordering and safety simultaneously
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines goal-conditioned value learning with automaton-guided reinforcement
Encodes stage-wise goal progression and safety constraints
Leverages automaton representation for temporal logic specifications
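The automaton representation also makes non-stationary safety constraints explicit, since a forbidden region can be tied to a specific task stage. The sketch below is a hypothetical toy (stage numbering and proposition names `u1`/`u2` are illustrative assumptions, not the paper's benchmarks):

```python
# Stages: 0 = heading to g1 (region u1 forbidden),
#         1 = heading to g2 (region u2 forbidden),
#         2 = done, -1 = safety violation (trap).
delta = {
    (0, "g1"): 1,
    (1, "g2"): 2,
    (0, "u1"): -1,   # u1 is unsafe only during stage 0
    (1, "u2"): -1,   # u2 is unsafe only during stage 1
}

def step(q, label):
    """Advance the automaton; unlisted (state, label) pairs are self-loops."""
    return delta.get((q, label), q)

# The same region u2 is harmless in stage 0 but a violation in stage 1:
print(step(0, "u2"))  # -> 0 (no effect yet)
print(step(1, "u2"))  # -> -1 (violation)
```

Conditioning the policy on the automaton state thus lets one learned controller respect constraints that change over the course of the task.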