AI Summary
Problem: Handcrafted constraints in multi-agent coordination struggle to handle dynamic environments and fine-grained behaviors (e.g., crowd avoidance). Method: This paper proposes a decentralized hybrid framework that retains verifiable expert controllers while using multi-agent reinforcement learning (MARL) with local communication to synthesize dynamic safety constraints online, thereby combining reliability with adaptability. A tunable controller-dependency mechanism lets the framework trade off prior knowledge against data-driven decisions. Contribution/Results: Theoretical analysis guarantees convergence. Experiments on multi-agent navigation tasks demonstrate significant improvements over purely handcrafted designs, conventional hybrid approaches, and standard MARL baselines. The method is further validated on real robotic platforms, confirming its practical efficacy and robustness.
Abstract
Constraint-based optimization is a cornerstone of robotics, enabling the design of controllers that reliably encode task and safety requirements such as collision avoidance or formation adherence. However, handcrafted constraints can fail in multi-agent settings that demand complex coordination. We introduce ReCoDe (Reinforcement-based Constraint Design), a decentralized, hybrid framework that merges the reliability of optimization-based controllers with the adaptability of multi-agent reinforcement learning. Rather than discarding expert controllers, ReCoDe improves them by learning additional, dynamic constraints that capture subtler behaviors, for example by constraining agent movements to prevent congestion in cluttered scenarios. Through local communication, agents collectively constrain their allowed actions to coordinate more effectively under changing conditions. In this work, we focus on applications of ReCoDe to multi-agent navigation tasks requiring intricate, context-based movements and consensus, where we show that it outperforms purely handcrafted controllers, other hybrid approaches, and standard MARL baselines. We give empirical (real-robot) and theoretical evidence that retaining a user-defined controller, even when it is imperfect, is more efficient than learning from scratch, especially because ReCoDe can dynamically change the degree to which it relies on this controller.
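To make the idea of a learned constraint layered on an expert controller concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the expert controller proposes a 2D velocity `u_expert`, and that a learned policy outputs a ball-shaped constraint with a `center` and a `radius`. The executed action is then the feasible point closest to the expert's proposal. The function name `project_onto_ball` and all parameter names are hypothetical, and the handcrafted safety constraints that a full controller would also enforce are omitted for brevity.

```python
import math

def project_onto_ball(u_expert, center, radius):
    """Solve min ||u - u_expert||^2 subject to ||u - center|| <= radius (2D).

    u_expert: action proposed by the handcrafted controller (assumed form).
    center, radius: constraint parameters that a learned policy would output
    in this illustration (hypothetical, not taken from the paper).
    """
    dx = u_expert[0] - center[0]
    dy = u_expert[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        # The expert's action already satisfies the learned constraint.
        return (u_expert[0], u_expert[1])
    # Otherwise move to the nearest point on the boundary of the ball.
    scale = radius / dist
    return (center[0] + scale * dx, center[1] + scale * dy)

u_expert = (1.0, 0.0)
center = (0.0, 0.0)

loose = project_onto_ball(u_expert, center, 2.0)   # expert action passes through
tight = project_onto_ball(u_expert, center, 0.5)   # action pulled toward center
pinned = project_onto_ball(u_expert, center, 0.0)  # policy fully dictates action
```

In this toy setting the radius acts as a reliance knob: a large radius leaves the expert's action unchanged, while a radius of zero forces the action to the policy's chosen center. This is one simple way to picture how the degree of dependence on a user-defined controller could be varied; the paper's actual constraint form may differ.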