Reinforcement learning with timed constraints for robotics motion planning

📅 2025-12-31
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling robots to satisfy complex temporal tasks under time constraints in dynamic and uncertain environments, where integrating temporal logic specifications with reinforcement learning is hindered by stochasticity and partial observability. The paper proposes the first unified framework that compiles Metric Interval Temporal Logic (MITL) specifications into Timed Limit-Deterministic Generalized Büchi Automata (Timed-LDGBA), constructs a product temporal model with MDPs or POMDPs, and introduces a concise reward mechanism that jointly ensures temporal correctness and optimizes performance. By leveraging automaton synchronization and a scalable reward structure, the approach enables joint modeling of MITL specifications and reinforcement learning in both fully and partially observable settings. Empirical evaluations in grid-world and service-robot simulations demonstrate the method’s effectiveness, scalability, and robustness.
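To make the product construction concrete, the synchronization step might look like the minimal sketch below. The automaton is a hypothetical fragment for "reach the goal within 5 time units" (roughly F[0,5] goal); its states, labels, and reward values are illustrative assumptions, not the paper's actual Timed-LDGBA construction:

```python
# Illustrative sketch: synchronizing one MDP transition with a timed automaton.
def automaton_step(q, label, clock):
    """Advance the timed automaton on an observed label and clock value."""
    if q == "q0":                              # still trying
        if label == "goal" and clock <= 5:
            return "q_acc"                     # deadline met: accept
        if clock > 5:
            return "q_rej"                     # deadline missed: reject
        return "q0"
    return q                                   # q_acc / q_rej are absorbing

def product_step(env_step, s, q, clock, action, labeling):
    """One transition of the product timed model: (MDP state, automaton state, clock)."""
    s_next = env_step(s, action)               # underlying (possibly stochastic) MDP transition
    clock_next = clock + 1                     # one time unit per step
    q_next = automaton_step(q, labeling(s_next), clock_next)
    reward = 1.0 if q_next == "q_acc" else (-1.0 if q_next == "q_rej" else 0.0)
    return (s_next, q_next, clock_next), reward
```

The reward depends only on the automaton state, which is what lets a standard RL algorithm enforce temporal correctness without hand-crafted shaping.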

📝 Abstract
Robotic systems operating in dynamic and uncertain environments increasingly require planners that satisfy complex task sequences while adhering to strict temporal constraints. Metric Interval Temporal Logic (MITL) offers a formal and expressive framework for specifying such time-bounded requirements; however, integrating MITL with reinforcement learning (RL) remains challenging due to stochastic dynamics and partial observability. This paper presents a unified automata-based RL framework for synthesizing policies in both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) under MITL specifications. MITL formulas are translated into Timed Limit-Deterministic Generalized Büchi Automata (Timed-LDGBA) and synchronized with the underlying decision process to construct product timed models suitable for Q-learning. A simple yet expressive reward structure enforces temporal correctness while allowing additional performance objectives. The approach is validated in three simulation studies: a 5 × 5 grid-world formulated as an MDP, a 10 × 10 grid-world formulated as a POMDP, and an office-like service-robot scenario. Results demonstrate that the proposed framework consistently learns policies that satisfy strict time-bounded requirements under stochastic transitions, scales to larger state spaces, and remains effective in partially observable environments, highlighting its potential for reliable robotic planning in time-critical and uncertain settings.
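The Q-learning-on-product-model pipeline the abstract describes can be sketched in miniature. The following toy stands in for the grid-world experiments with a 1-D corridor and the specification "reach cell 4 within 6 steps"; the environment, deadline, learning rates, and reward values are all illustrative assumptions, not the paper's benchmarks:

```python
import random

# Toy sketch: tabular Q-learning on the product of a 1-D corridor MDP
# (cells 0..4) with a timed automaton for "reach cell 4 within 6 steps".
GOAL, DEADLINE, N = 4, 6, 5
ACTIONS = (-1, +1)

def move(s, a):
    """Deterministic corridor dynamics, clipped to [0, N-1]."""
    return min(max(s + a, 0), N - 1)

def auto_step(q, s, c):
    """Timed-automaton update: 0 = trying, 1 = accepted, -1 = violated."""
    if q == 0:
        if s == GOAL and c <= DEADLINE:
            return 1
        if c > DEADLINE:
            return -1
    return q

def train(episodes=3000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Epsilon-greedy Q-learning over product states (cell, clock)."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s, q, c = 0, 0, 0
        while q == 0:                          # episode ends on accept/violate
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: Q.get((s, c, i), 0.0))
            s2, c2 = move(s, ACTIONS[a]), c + 1
            q2 = auto_step(0, s2, c2)
            r = 1.0 if q2 == 1 else -1.0 if q2 == -1 else 0.0
            best = 0.0 if q2 != 0 else max(Q.get((s2, c2, i), 0.0) for i in (0, 1))
            old = Q.get((s, c, a), 0.0)
            Q[(s, c, a)] = old + alpha * (r + gamma * best - old)
            s, q, c = s2, q2, c2
    return Q

def rollout(Q):
    """Follow the greedy policy; return the final automaton state."""
    s, q, c = 0, 0, 0
    while q == 0:
        a = max((0, 1), key=lambda i: Q.get((s, c, i), 0.0))
        s, c = move(s, ACTIONS[a]), c + 1
        q = auto_step(0, s, c)
    return q
```

Because acceptance and violation are both absorbing, every episode terminates within the deadline horizon, and the learned greedy policy satisfies the time-bounded specification once the positive reward has propagated back from the accepting transitions.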
Problem

Research questions and friction points this paper is trying to address.

Reinforcement learning
Timed constraints
Metric Interval Temporal Logic
Robotics motion planning
Partial observability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Metric Interval Temporal Logic
Timed Automata
Reinforcement Learning
POMDP
Time-constrained Planning