🤖 AI Summary
This work addresses the coupled interference between object representation learning and coordination policy learning in decentralized multi-robot cooperative transport, which arises from partial observability and non-stationarity. To disentangle these components, the paper proposes DeReCo, a three-stage training framework: first, a coordination policy is trained in a centralized setting using privileged object information; second, object representations are reconstructed from local observations; and third, privileged information is progressively removed to enable fully decentralized execution. This approach substantially improves sample efficiency and cross-scenario generalization to objects with diverse shapes and physical properties. Experimental results demonstrate that DeReCo outperforms existing baselines in simulation, successfully generalizes to six previously unseen objects, and enables real-world robots to efficiently accomplish cooperative transport tasks with two novel objects.
📝 Abstract
Generalizing decentralized multi-robot cooperative transport across objects with diverse shapes and physical properties remains a fundamental challenge. Under decentralized execution, two key difficulties arise: object-dependent representation learning under partial observability and coordination learning in multi-agent reinforcement learning (MARL) under non-stationarity. A typical approach jointly optimizes object-dependent representations and coordinated policies in an end-to-end manner while randomizing object shapes and physical properties during training. However, this joint optimization tightly couples representation and coordination learning, introducing bidirectional interference: inaccurate representations under partial observability destabilize coordination learning, while non-stationarity in MARL further degrades representation learning, resulting in sample-inefficient training. To address this structural coupling, we propose DeReCo, a novel MARL framework that decouples representation and coordination learning for object-adaptive multi-robot cooperative transport, improving sample efficiency and generalization across objects and transport scenarios. DeReCo adopts a three-stage training strategy: (1) centralized coordination learning with privileged object information, (2) reconstruction of object-dependent representations from local observations, and (3) progressive removal of privileged information for decentralized execution. This decoupling mitigates interference between representation and coordination learning and enables stable and sample-efficient training. Experimental results show that DeReCo outperforms baselines in simulation on three training objects, generalizes to six unseen objects with varying masses and friction coefficients, and achieves superior performance on two unseen objects in real-robot experiments.
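The three-stage schedule above can be sketched with a toy 1-D example. This is a minimal illustration, not the authors' implementation: the "privileged information" is an object's true mass, the "reconstruction" is a simple estimator of mass from local force/acceleration observations, and the "progressive removal" is a mixing coefficient `alpha` annealed from 1 (privileged) to 0 (reconstructed). All function names and the linear force-control policy are assumptions made for illustration.

```python
def privileged_action(true_mass, target_accel=1.0):
    """Stage 1 (toy): a centralized policy conditioned on privileged
    object information -- here, the true mass -- picks force F = m * a."""
    return true_mass * target_accel

def reconstruct_mass(local_samples):
    """Stage 2 (toy): reconstruct the privileged quantity from local
    observations. Each sample is an observed (force, acceleration) pair;
    we average the implied mass F / a over the samples."""
    return sum(f / a for f, a in local_samples) / len(local_samples)

def annealed_action(true_mass, est_mass, alpha, target_accel=1.0):
    """Stage 3 (toy): progressively remove privileged information by
    mixing true and reconstructed mass; alpha is annealed 1 -> 0, after
    which the policy runs on reconstructed features alone (decentralized)."""
    mixed_mass = alpha * true_mass + (1.0 - alpha) * est_mass
    return mixed_mass * target_accel

# Example: an object of true mass 2.0 kg.
local_samples = [(2.0, 1.0), (4.0, 2.0)]        # noiseless (F, a) observations
est = reconstruct_mass(local_samples)           # -> 2.0
force_start = annealed_action(2.0, est, alpha=1.0)  # fully privileged
force_end = annealed_action(2.0, est, alpha=0.0)    # fully decentralized
```

In the real framework each stage is a full MARL or supervised training phase rather than a closed-form computation, but the structure is the same: the coordination policy is fixed in form while the source of its object-dependent input is gradually swapped from privileged to reconstructed.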