🤖 AI Summary
This paper addresses the problem of learning sample-efficient, optimal control policies for mobile robots with unknown stochastic dynamics operating in environments of unknown geometry, under Linear Temporal Logic (LTL) specifications. We propose a task-driven deep reinforcement learning framework that integrates: (i) LTL automaton-guided task decomposition, (ii) a neural dynamics model that captures the unknown system dynamics, and (iii) an intrinsic reward shaped by task progress that enables mission-oriented exploration within PPO- or SAC-style algorithms. Our approach overcomes key limitations of conventional deep RL in logical task settings, namely low sample efficiency and slow convergence. Experiments on multi-obstacle navigation demonstrate a 2.3× improvement in sample efficiency and a 37% increase in task success rate over baseline methods. The core contribution is a novel learning paradigm synergizing LTL semantic guidance, dynamics-aware modeling, and progress-driven exploration incentives.
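The progress-shaped intrinsic reward described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Automaton` class, `dist_to_accept`, and `intrinsic_reward` names are hypothetical, and the reward is simply the decrease in graph distance from the current automaton state to an accepting state.

```python
from collections import deque

class Automaton:
    """Minimal DFA for an LTL task: labeled transitions plus an accepting set.
    (Illustrative stand-in for the task automaton used in the paper.)"""
    def __init__(self, transitions, accepting):
        self.transitions = transitions  # dict: (state, label) -> next state
        self.accepting = accepting      # set of accepting states

    def step(self, q, label):
        # Stay in place on labels with no outgoing transition.
        return self.transitions.get((q, label), q)

def dist_to_accept(aut, q):
    """BFS distance from automaton state q to the nearest accepting state."""
    frontier, seen = deque([(q, 0)]), {q}
    while frontier:
        s, d = frontier.popleft()
        if s in aut.accepting:
            return d
        for (src, _), dst in aut.transitions.items():
            if src == s and dst not in seen:
                seen.add(dst)
                frontier.append((dst, d + 1))
    return float("inf")

def intrinsic_reward(aut, q_prev, q_next):
    """Positive when the automaton moved closer to acceptance, negative on regress."""
    return dist_to_accept(aut, q_prev) - dist_to_accept(aut, q_next)
```

Adding this signal to the environment reward densifies the otherwise sparse feedback of an LTL task, which is the mechanism behind the sample-efficiency gains claimed above.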
📄 Abstract
This paper addresses the problem of designing optimal control policies for mobile robots with mission and safety requirements specified using Linear Temporal Logic (LTL). We consider robots with unknown stochastic dynamics operating in environments with unknown geometric structure. The robots are equipped with sensors allowing them to detect obstacles. Our goal is to synthesize a control policy that maximizes the probability of satisfying an LTL-encoded task in the presence of motion and environmental uncertainty. Several deep reinforcement learning (DRL) algorithms have been proposed recently to address similar problems. A common limitation of related works is slow learning. To address this issue, we propose a novel DRL algorithm that learns control policies at a notably faster rate than similar methods. Its sample efficiency is due to a mission-driven exploration strategy that prioritizes exploration towards directions that may contribute to mission accomplishment. Identifying these directions relies on an automaton representation of the LTL task as well as a learned neural network that (partially) models the unknown system dynamics. We provide comparative experiments demonstrating the efficiency of our algorithm on robot navigation tasks in unknown environments.
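The mission-driven exploration strategy can be sketched as biased action sampling: instead of exploring uniformly, candidate actions are scored by how much task progress the learned dynamics model predicts they will produce. This is a hedged illustration under stated assumptions; `model`, `label_fn`, and `progress` are hypothetical stand-ins for the paper's learned dynamics network, the labeling of states into automaton transitions, and the task-progress measure.

```python
import math
import random

def biased_explore(state, q, actions, model, label_fn, progress, temp=1.0):
    """Sample an exploratory action with probability proportional to
    exp(predicted task progress / temp) rather than uniformly.

    state    -- current robot state
    q        -- current automaton state
    actions  -- finite set of candidate actions
    model    -- learned one-step dynamics: (state, action) -> next state
    label_fn -- maps a predicted next state to the resulting automaton state
    progress -- scores an automaton transition (e.g. decrease in distance
                to an accepting state); higher means more mission progress
    """
    scores = []
    for a in actions:
        next_state = model(state, a)       # predict where the action leads
        q_next = label_fn(next_state, q)   # predicted automaton transition
        scores.append(progress(q, q_next))
    weights = [math.exp(s / temp) for s in scores]
    total = sum(weights)
    return random.choices(actions, weights=[w / total for w in weights])[0]
```

A low temperature concentrates exploration on the directions the automaton and dynamics model jointly deem useful, while a high temperature recovers near-uniform exploration; this trade-off is one plausible way to realize the prioritized exploration described in the abstract.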