Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications

๐Ÿ“… 2023-11-28
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 3
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This paper addresses the problem of learning sample-efficient, optimal control policies for mobile robots operating in unknown stochastic dynamic and geometric environments under Linear Temporal Logic (LTL) specifications. We propose a task-driven deep reinforcement learning framework that jointly integrates: (i) LTL automaton-guided task decomposition, (ii) a neural dynamics model to capture unknown system dynamics, and (iii) an intrinsic reward mechanism shaped by task progress to enable mission-oriented exploration within PPO- or SAC-style algorithms. Our approach overcomes key limitations of conventional deep RL in logical task settingsโ€”namely, low sample efficiency and slow convergence. Experiments on multi-obstacle navigation demonstrate a 2.3ร— improvement in sample efficiency and a 37% increase in task success rate over baseline methods. The core contribution is a novel learning paradigm synergizing LTL semantic guidance, dynamics-aware modeling, and progress-driven exploration incentives.
๐Ÿ“ Abstract
This paper addresses the problem of designing optimal control policies for mobile robots with mission and safety requirements specified using Linear Temporal Logic (LTL). We consider robots with unknown stochastic dynamics operating in environments with unknown geometric structure. The robots are equipped with sensors allowing them to detect obstacles. Our goal is to synthesize a control policy that maximizes the probability of satisfying an LTL-encoded task in the presence of motion and environmental uncertainty. Several deep reinforcement learning (DRL) algorithms have been proposed recently to address similar problems. A common limitation in related works is that of slow learning performance. In order to address this issue, we propose a novel DRL algorithm, which has the capability to learn control policies at a notably faster rate compared to similar methods. Its sample efficiency is due to a mission-driven exploration strategy that prioritizes exploration towards directions that may contribute to mission accomplishment. Identifying these directions relies on an automaton representation of the LTL task as well as a learned neural network that (partially) models the unknown system dynamics. We provide comparative experiments demonstrating the efficiency of our algorithm on robot navigation tasks in unknown environments.
Problem

Research questions and friction points this paper is trying to address.

Design control policies for agents with unknown stochastic dynamics
Improve learning speed of DRL for LTL task specifications
Enhance sample efficiency via mission-driven exploration strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel Deep Q-learning for faster LTL policy learning
Mission-driven exploration prioritizes success directions
Combines automaton LTL representation with neural modeling
๐Ÿ”Ž Similar Papers
No similar papers found.
J
Jun Wang
Department of Electrical and Systems Engineering, Washington University in St. Louis (WashU), St. Louis, MO
Hosein Hasanbeig
Hosein Hasanbeig
Microsoft Research
Machine LearningDeep LearningFormal MethodsAutomatic Control
K
Kaiyuan Tan
Department of Electrical and Systems Engineering, Washington University in St. Louis (WashU), St. Louis, MO
Z
Zihe Sun
Department of Mechanical Engineering, WashU
Y
Y. Kantaros
Department of Electrical and Systems Engineering, Washington University in St. Louis (WashU), St. Louis, MO