Adaptive Reinforcement and Model Predictive Control Switching for Safe Human-Robot Cooperative Navigation

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling mobile collaborative robots to navigate safely and interact in close proximity to humans under human guidance. The authors propose ARMS, a hybrid framework that integrates a PPO-based reinforcement learning (RL) follower with a one-step model predictive control (MPC) safety filter formulated as a quadratic program (QP). To handle partial observability, the system employs an LSTM temporal encoder for the human-robot relative state and a spatial encoder for 360-degree LiDAR scans. A learnable adaptive neural switcher enables context-aware soft action fusion: it favors the conservative MPC action in low-risk regions and shifts control authority to the RL policy in high-density obstacle scenarios or when the QP becomes infeasible. Experiments demonstrate an 82.5% task success rate in dense environments, outperforming DWA and pure-RL baselines by 7.1% and 3.1%, respectively, with a low computational latency of 5.2 ms (33% faster than a multi-step MPC baseline), validated in both Gazebo simulations and real-robot trials.
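The switching behavior summarized above can be sketched as a soft blend of the two controllers' actions, with a hard fallback to the RL action when the safety QP is infeasible. This is a minimal illustration under stated assumptions, not the paper's implementation: the gating weight `alpha` would come from the learned neural switcher, which is simply a number here.

```python
import numpy as np

def fuse_actions(alpha, u_rl, u_mpc, qp_feasible):
    """Context-aware soft action fusion (hypothetical sketch).

    alpha       : gating weight in [0, 1] from the learned switcher
                  (1.0 -> fully trust the conservative MPC action)
    u_rl, u_mpc : candidate actions, e.g. [linear_vel, angular_vel]
    qp_feasible : whether the MPC quadratic program had a solution
    """
    if not qp_feasible:
        # revert to the learned follower when the safety QP is infeasible
        return np.asarray(u_rl, dtype=float)
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return alpha * np.asarray(u_mpc, float) + (1.0 - alpha) * np.asarray(u_rl, float)

# example: open space -> lean on MPC; clutter -> lean on RL
u_rl, u_mpc = np.array([0.8, 0.3]), np.array([0.4, 0.0])
open_space = fuse_actions(0.9, u_rl, u_mpc, qp_feasible=True)
cluttered  = fuse_actions(0.2, u_rl, u_mpc, qp_feasible=True)
fallback   = fuse_actions(0.9, u_rl, u_mpc, qp_feasible=False)
```

The fallback branch mirrors the summary's description of reverting to the RL policy on MPC infeasibility; everything else is a plain convex combination.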

📝 Abstract
This paper addresses the challenge of human-guided navigation for mobile collaborative robots under simultaneous proximity regulation and safety constraints. We introduce Adaptive Reinforcement and Model Predictive Control Switching (ARMS), a hybrid learning-control framework that integrates a reinforcement learning follower trained with Proximal Policy Optimization (PPO) and an analytical one-step Model Predictive Control (MPC) formulated as a quadratic program safety filter. To enable robust perception under partial observability and non-stationary human motion, ARMS employs a decoupled sensing architecture with a Long Short-Term Memory (LSTM) temporal encoder for the human-robot relative state and a spatial encoder for 360-degree LiDAR scans. The core contribution is a learned adaptive neural switcher that performs context-aware soft action fusion between the two controllers, favoring conservative, constraint-aware QP-based control in low-risk regions while progressively shifting control authority to the learned follower in highly cluttered or constrained scenarios where maneuverability is critical, and reverting to the follower action when the QP becomes infeasible. Extensive evaluations against Pure Pursuit, Dynamic Window Approach (DWA), and an RL-only baseline demonstrate that ARMS achieves an 82.5 percent success rate in highly cluttered environments, outperforming DWA and RL-only approaches by 7.1 percent and 3.1 percent, respectively, while reducing average computational latency by 33 percent to 5.2 milliseconds compared to a multi-step MPC baseline. Additional simulation transfer in Gazebo and initial real-world deployment results further indicate the practicality and robustness of ARMS for safe and efficient human-robot collaboration. Source code and a demonstration video are available at https://github.com/21ning/ARMS.git.
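The abstract's one-step MPC safety filter is a quadratic program that minimally alters a desired action to satisfy safety constraints. With a single linear constraint a·u ≥ b, the QP min ‖u − u_des‖² has a closed-form solution (projection onto a half-space), which the sketch below uses purely for illustration; the paper's actual constraint set, horizon, and solver are not shown, and `a`, `b` are assumed constraint data.

```python
import numpy as np

def qp_safety_filter(u_des, a, b):
    """Minimally modify u_des to satisfy one linear safety constraint.

    Solves  min ||u - u_des||^2  s.t.  a . u >= b  in closed form
    (projection onto a half-space). Returns (u_safe, feasible).
    Illustrative special case of a QP safety filter, not the paper's MPC.
    """
    u_des, a = np.asarray(u_des, float), np.asarray(a, float)
    slack = a @ u_des - b
    if slack >= 0.0:
        return u_des, True           # already safe: pass through unchanged
    norm_sq = a @ a
    if norm_sq < 1e-12:
        return u_des, False          # degenerate constraint: report infeasible
    # project onto the constraint boundary {u : a . u = b}
    return u_des - (slack / norm_sq) * a, True

# example: require u[1] >= 0.5; the filter lifts the action onto the boundary
u_safe, ok = qp_safety_filter(u_des=[1.0, 0.0], a=[0.0, 1.0], b=0.5)
```

An infeasible return value here is exactly the signal the adaptive switcher would use to hand control back to the RL follower.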
Problem

Research questions and friction points this paper is trying to address.

human-robot collaboration
safe navigation
proximity regulation
safety constraints
mobile collaborative robots
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Switching
Reinforcement Learning
Model Predictive Control
Human-Robot Collaboration
Safe Navigation
Ning Liu
School of Engineering, The University of Western Australia, 35 Stirling Highway, Perth, 6009, WA, Australia
Sen Shen
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR
Zheng Li
School of Engineering, The University of Western Australia, 35 Stirling Highway, Perth, 6009, WA, Australia
Matthew D'Souza
School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane, 4072, QLD, Australia
Jen Jen Chung
The University of Queensland
Thomas Bräunl
School of Engineering, The University of Western Australia, 35 Stirling Highway, Perth, 6009, WA, Australia