🤖 AI Summary
This paper addresses the problem of learning sample-efficient, optimal control policies for mobile robots with unknown stochastic dynamics operating in environments of unknown geometry, under Linear Temporal Logic (LTL) specifications. We propose a task-driven deep reinforcement learning framework that integrates: (i) LTL automaton-guided task decomposition, (ii) a neural dynamics model that captures the unknown system dynamics, and (iii) an intrinsic reward shaped by task progress that enables mission-oriented exploration within PPO- or SAC-style algorithms. Our approach overcomes key limitations of conventional deep RL in logical task settings, namely low sample efficiency and slow convergence. Experiments on multi-obstacle navigation demonstrate a 2.3× improvement in sample efficiency and a 37% increase in task success rate over baseline methods. The core contribution is a novel learning paradigm synergizing LTL semantic guidance, dynamics-aware modeling, and progress-driven exploration incentives.
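The progress-shaped intrinsic reward described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Automaton` class, `dist_to_accept`, and `intrinsic_reward` names are hypothetical, and the reward is simply the decrease in graph distance from the current automaton state to an accepting state.

```python
from collections import deque

class Automaton:
    """Minimal DFA for an LTL task: labeled transitions plus an accepting set.
    (Illustrative stand-in for the task automaton used in the paper.)"""
    def __init__(self, transitions, accepting):
        self.transitions = transitions  # dict: (state, label) -> next state
        self.accepting = accepting      # set of accepting states

    def step(self, q, label):
        # Stay in place on labels with no outgoing transition.
        return self.transitions.get((q, label), q)

def dist_to_accept(aut, q):
    """BFS distance from automaton state q to the nearest accepting state."""
    frontier, seen = deque([(q, 0)]), {q}
    while frontier:
        s, d = frontier.popleft()
        if s in aut.accepting:
            return d
        for (src, _), dst in aut.transitions.items():
            if src == s and dst not in seen:
                seen.add(dst)
                frontier.append((dst, d + 1))
    return float("inf")

def intrinsic_reward(aut, q_prev, q_next):
    """Positive when the automaton moved closer to acceptance, negative on regress."""
    return dist_to_accept(aut, q_prev) - dist_to_accept(aut, q_next)
```

Adding this signal to the environment reward densifies the otherwise sparse feedback of an LTL task, which is the mechanism behind the sample-efficiency gains claimed above.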
📄 Abstract
This paper addresses the problem of designing optimal control policies for mobile robots with mission and safety requirements specified using Linear Temporal Logic (LTL). We consider robots with unknown stochastic dynamics operating in environments with unknown geometric structure. The robots are equipped with sensors allowing them to detect obstacles. Our goal is to synthesize a control policy that maximizes the probability of satisfying an LTL-encoded task in the presence of motion and environmental uncertainty. Several deep reinforcement learning (DRL) algorithms have been proposed recently to address similar problems. A common limitation of related works is slow learning. To address this issue, we propose a novel DRL algorithm that learns control policies at a notably faster rate than similar methods. Its sample efficiency is due to a mission-driven exploration strategy that prioritizes exploration towards directions that may contribute to mission accomplishment. Identifying these directions relies on an automaton representation of the LTL task as well as a learned neural network that (partially) models the unknown system dynamics. We provide comparative experiments demonstrating the efficiency of our algorithm on robot navigation tasks in unknown environments.
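The mission-driven exploration strategy can be sketched as biased action sampling: instead of exploring uniformly, candidate actions are scored by how much task progress the learned dynamics model predicts they will produce. This is a hedged illustration under stated assumptions; `model`, `label_fn`, and `progress` are hypothetical stand-ins for the paper's learned dynamics network, the labeling of states into automaton transitions, and the task-progress measure.

```python
import math
import random

def biased_explore(state, q, actions, model, label_fn, progress, temp=1.0):
    """Sample an exploratory action with probability proportional to
    exp(predicted task progress / temp) rather than uniformly.

    state    -- current robot state
    q        -- current automaton state
    actions  -- finite set of candidate actions
    model    -- learned one-step dynamics: (state, action) -> next state
    label_fn -- maps a predicted next state to the resulting automaton state
    progress -- scores an automaton transition (e.g. decrease in distance
                to an accepting state); higher means more mission progress
    """
    scores = []
    for a in actions:
        next_state = model(state, a)       # predict where the action leads
        q_next = label_fn(next_state, q)   # predicted automaton transition
        scores.append(progress(q, q_next))
    weights = [math.exp(s / temp) for s in scores]
    total = sum(weights)
    return random.choices(actions, weights=[w / total for w in weights])[0]
```

A low temperature concentrates exploration on the directions the automaton and dynamics model jointly deem useful, while a high temperature recovers near-uniform exploration; this trade-off is one plausible way to realize the prioritized exploration described in the abstract.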