🤖 AI Summary
This work addresses the challenge of enabling mobile collaborative robots to navigate and interact safely in close proximity to humans while following human guidance. The authors propose ARMS, a hybrid framework that combines a PPO-trained reinforcement learning (RL) follower with a one-step model predictive control (MPC) safety filter formulated as a quadratic program (QP). To handle partial observability, the system uses an LSTM temporal encoder for the human-robot relative state and a separate spatial encoder for 360-degree LiDAR scans. A learned adaptive neural switcher performs context-aware soft fusion of the two controllers' actions. It favors the conservative MPC action in low-risk regions, shifts control authority toward the RL policy in densely cluttered scenes, and falls back to the RL action when the MPC problem becomes infeasible. Experiments report an 82.5% task success rate in highly cluttered environments, outperforming DWA and an RL-only baseline by 7.1% and 3.1%, respectively, with an average computational latency of 5.2 ms, 33% lower than a multi-step MPC baseline. The approach is further evaluated in Gazebo simulations and initial real-robot trials.
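The soft action fusion summarized above can be illustrated with a short sketch. It is not the ARMS implementation: the learned switcher is replaced by a logistic gate on a single hand-crafted clutter cue (the fraction of LiDAR beams within a hypothetical 1 m clearance), the gate parameters `w` and `b` are hypothetical stand-ins for learned weights, actions are assumed to be `(v, omega)` pairs, and `None` is used to mark an infeasible QP.

```python
import math
from typing import Optional, Sequence, Tuple

# Assumed action format: (linear velocity v [m/s], angular velocity omega [rad/s]).
Action = Tuple[float, float]


def clutter_feature(lidar_ranges: Sequence[float], clearance: float = 1.0) -> float:
    """Fraction of LiDAR beams closer than `clearance` metres (hypothetical risk cue)."""
    return sum(r < clearance for r in lidar_ranges) / len(lidar_ranges)


def gate_weight(clutter: float, w: float = 8.0, b: float = -3.0) -> float:
    """Logistic gate returning the weight placed on the RL action.

    `w` and `b` are hypothetical stand-ins for the learned switcher's parameters.
    """
    return 1.0 / (1.0 + math.exp(-(w * clutter + b)))


def fuse_actions(
    a_rl: Action, a_qp: Optional[Action], lidar_ranges: Sequence[float]
) -> Tuple[Action, float]:
    """Soft fusion of RL and QP actions; returns (fused action, RL weight)."""
    if a_qp is None:
        # QP infeasible: give full authority to the RL follower.
        return a_rl, 1.0
    alpha = gate_weight(clutter_feature(lidar_ranges))
    # Convex combination: low clutter -> mostly QP action, high clutter -> mostly RL action.
    v = alpha * a_rl[0] + (1.0 - alpha) * a_qp[0]
    omega = alpha * a_rl[1] + (1.0 - alpha) * a_qp[1]
    return (v, omega), alpha


if __name__ == "__main__":
    a_rl, a_qp = (0.8, 0.3), (0.2, 0.0)
    print(fuse_actions(a_rl, a_qp, [5.0] * 360))  # open space: QP-dominated
    print(fuse_actions(a_rl, a_qp, [0.5] * 360))  # dense clutter: RL-dominated
    print(fuse_actions(a_rl, None, [0.5] * 360))  # infeasible QP: RL fallback
```

A convex combination keeps the fused command between the two controllers' outputs, so the gate only decides how much authority each one receives; the infeasibility branch bypasses the gate entirely.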
📝 Abstract
This paper addresses the challenge of human-guided navigation for mobile collaborative robots subject to simultaneous proximity-regulation and safety constraints. We introduce Adaptive Reinforcement and Model Predictive Control Switching (ARMS), a hybrid learning-control framework that integrates a reinforcement learning follower trained with Proximal Policy Optimization (PPO) and an analytical one-step Model Predictive Control (MPC) formulated as a quadratic program (QP) safety filter. To enable robust perception under partial observability and non-stationary human motion, ARMS employs a decoupled sensing architecture with a Long Short-Term Memory (LSTM) temporal encoder for the human-robot relative state and a spatial encoder for 360-degree LiDAR scans. The core contribution is a learned adaptive neural switcher that performs context-aware soft action fusion between the two controllers. It favors conservative, constraint-aware QP-based control in low-risk regions, progressively shifts control authority to the learned follower in highly cluttered or constrained scenarios where maneuverability is critical, and reverts to the follower action when the QP becomes infeasible. Extensive evaluations against Pure Pursuit, Dynamic Window Approach (DWA), and an RL-only baseline demonstrate that ARMS achieves an 82.5 percent success rate in highly cluttered environments, outperforming DWA and RL-only approaches by 7.1 percent and 3.1 percent, respectively, while reducing average computational latency by 33 percent, to 5.2 milliseconds, compared to a multi-step MPC baseline. Transfer to Gazebo simulation and initial real-world deployment results further indicate the practicality and robustness of ARMS for safe and efficient human-robot collaboration. Source code and a demonstration video are available at https://github.com/21ning/ARMS.git.
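One way to realize a one-step QP safety filter is to project the follower's command onto the set of admissible commands, that is, to minimize ||u - u_ref||² subject to linear constraints. The sketch below rests on assumptions not stated in the abstract: a single-integrator model p_next = p + u * dt with a planar velocity command u = (u_x, u_y), a box bound on each command component, and one linearized clearance constraint per obstacle, with hypothetical values for `dt`, `d_safe`, and `u_max`. Instead of calling a QP solver, it uses Dykstra's alternating projection, which converges to the exact minimizer over the intersection of convex sets when that intersection is non-empty. The paper's actual model, constraints, and solver may differ.

```python
import numpy as np


def _project_box(u, u_max):
    """Euclidean projection onto the box |u_i| <= u_max."""
    return np.clip(u, -u_max, u_max)


def _project_halfspace(u, a, b):
    """Euclidean projection onto the half-space {x : a @ x <= b}."""
    violation = a @ u - b
    if violation <= 0.0:
        return u
    return u - (violation / (a @ a)) * a


def one_step_safety_filter(u_ref, p, obstacles, dt=0.1, d_safe=0.5, u_max=1.0,
                           iters=1000, tol=1e-9):
    """Solve min ||u - u_ref||^2 s.t. box bounds and linearized clearance constraints.

    Assumes a hypothetical single-integrator model p_next = p + u * dt.
    Returns the filtered command, or None if the constraint set appears empty.
    """
    u_ref = np.asarray(u_ref, dtype=float)
    p = np.asarray(p, dtype=float)
    constraints = []
    for o in obstacles:
        diff = p - np.asarray(o, dtype=float)
        dist = np.linalg.norm(diff)
        if dist < 1e-9:
            return None  # robot coincides with an obstacle: no safe command
        n = diff / dist  # unit normal pointing from the obstacle to the robot
        # n @ (p + u*dt - o) >= d_safe  <=>  (-dt*n) @ u <= dist - d_safe
        constraints.append((-dt * n, dist - d_safe))

    projections = [lambda x: _project_box(x, u_max)]
    projections += [lambda x, a=a, b=b: _project_halfspace(x, a, b) for a, b in constraints]

    # Dykstra's algorithm: cyclic projections with one correction term per set.
    x = u_ref.copy()
    corrections = [np.zeros_like(x) for _ in projections]
    for _ in range(iters):
        moved = 0.0
        for i, project in enumerate(projections):
            y = project(x + corrections[i])
            corrections[i] = x + corrections[i] - y
            moved += np.linalg.norm(y - x)
            x = y
        if moved < tol:  # no projection moved x: fixed point reached
            break

    feasible = np.all(np.abs(x) <= u_max + 1e-6) and all(
        a @ x <= b + 1e-6 for a, b in constraints
    )
    return x if feasible else None
```

For example, with the robot at the origin and an obstacle at (0.55, 0), a reference command of (1.0, 0.0) is filtered to roughly (0.5, 0.0), while an obstacle at (0.2, 0) yields `None` because no bounded command restores the required clearance in one step. Returning `None` mirrors the fallback rule described in the abstract, under which control reverts to the follower action when the QP is infeasible.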