CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment

📅 2026-04-07
🤖 AI Summary
This work addresses the challenges of spatial coordination, temporal reasoning, and shared workspace awareness in multi-agent embodied systems performing complex collaborative manipulation. The authors propose a compositional environment paradigm that integrates real-world and simulated components into a unified decision-making space. Collaboration runs through a three-stage pipeline: real-to-sim scene reconstruction that digitizes the physical workspace, vision-language-model (VLM)-driven action synthesis, and sim-to-real transfer in which plans are checked for collisions before execution. Because cognitive planning is decoupled from physical execution, the approach supports both real-time planning through high-level action interfaces and iterative planning that generates trajectories as code. On challenging multi-arm manipulation benchmarks, the method achieves high task success rates and execution efficiency, which the authors present as a new paradigm for multi-agent embodied AI.
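The decoupling of planning from execution can be illustrated with a small loop: plans are proposed and checked against the simulated copy of the scene, and only a plan that passes validation is ever sent to hardware. This is a minimal sketch under stated assumptions, not the CoEnv implementation. The function plan_validate_execute, the Scene and Plan types, and the toy planner and validator are all hypothetical; in CoEnv the planner role is played by a VLM, whose interface is not described in the abstract.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
Plan = Dict[str, List[Point]]  # hypothetical: arm name -> waypoints on a shared time grid
Scene = Dict[str, Point]       # hypothetical: object name -> position in the simulated copy


def plan_validate_execute(
    scene: Scene,
    planner: Callable[[Scene, Optional[str]], Plan],
    validate: Callable[[Plan], Optional[str]],
    execute: Callable[[Plan], None],
    max_iters: int = 3,
) -> bool:
    """Hypothetical plan/validate/execute loop (not the CoEnv implementation).

    `validate` runs against the simulated scene and returns None when the plan
    is acceptable, otherwise a failure message fed back to the planner.
    """
    feedback: Optional[str] = None
    for _ in range(max_iters):
        plan = planner(scene, feedback)  # e.g. a VLM prompted with scene + feedback
        feedback = validate(plan)        # simulation only; no hardware involved
        if feedback is None:
            execute(plan)                # the only step that moves real robots
            return True
    return False                         # give up rather than run an unvalidated plan


# Toy usage with stand-in planner and validator.
scene: Scene = {"cup": (0.4, 0.0, 0.05)}
attempts: List[Optional[str]] = []
executed: List[Plan] = []


def toy_planner(scene: Scene, feedback: Optional[str]) -> Plan:
    attempts.append(feedback)
    x, y, _ = scene["cup"]
    # Approach low first; after negative feedback, stop higher above the cup.
    z = 0.05 if feedback is None else 0.15
    return {"left": [(0.0, 0.0, 0.3), (x, y, z)]}


def toy_validate(plan: Plan) -> Optional[str]:
    return None if plan["left"][-1][2] > 0.1 else "approach height too low"


ok = plan_validate_execute(scene, toy_planner, toy_validate, executed.append)
print(ok, attempts)  # True [None, 'approach height too low']
```

Returning False instead of executing the last rejected plan keeps the safety property explicit: hardware only ever receives a plan that passed the simulated check.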
📝 Abstract
Multi-agent embodied systems hold promise for complex collaborative manipulation, yet face critical challenges in spatial coordination, temporal reasoning, and shared workspace awareness. Inspired by human collaboration where cognitive planning occurs separately from physical execution, we introduce the concept of compositional environment -- a synergistic integration of real-world and simulation components that enables multiple robotic agents to perceive intentions and operate within a unified decision-making space. Building on this concept, we present CoEnv, a framework that leverages simulation for safe strategy exploration while ensuring reliable real-world deployment. CoEnv operates through three stages: real-to-sim scene reconstruction that digitizes physical workspaces, VLM-driven action synthesis supporting both real-time planning with high-level interfaces and iterative planning with code-based trajectory generation, and validated sim-to-real transfer with collision detection for safe deployment. Extensive experiments on challenging multi-arm manipulation benchmarks demonstrate CoEnv's effectiveness in achieving high task success rates and execution efficiency, establishing a new paradigm for multi-agent embodied AI.
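The abstract does not say how collision detection is performed before sim-to-real transfer. As a minimal sketch of the general idea, assuming each arm's end effector is approximated by a sphere and all trajectories are sampled on the same time grid, a pairwise distance check over waypoints flags the first time step at which two arms come too close. The function first_conflict and its radius and margin parameters are hypothetical; a practical checker would test full link geometry and the continuous motion between waypoints, not just sampled end-effector positions.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]


def first_conflict(
    plan: Dict[str, List[Point]],
    radius: float = 0.05,
    margin: float = 0.02,
) -> Optional[Tuple[int, str, str]]:
    """Return (time step, arm A, arm B) of the first pairwise conflict, or None.

    Hypothetical simplification: each end effector is a sphere of `radius`
    metres, and two arms conflict when their centres are closer than
    2 * radius + margin at the same time step.
    """
    steps = min((len(traj) for traj in plan.values()), default=0)
    threshold = 2 * radius + margin
    for t in range(steps):
        for a, b in combinations(sorted(plan), 2):
            if math.dist(plan[a][t], plan[b][t]) < threshold:
                return (t, a, b)
    return None


# Two arms converging on the same spot; the final waypoints are 0.10 m apart.
plan = {
    "left": [(0.0, 0.3, 0.2), (0.2, 0.1, 0.2), (0.3, 0.05, 0.2)],
    "right": [(0.6, -0.3, 0.2), (0.4, -0.1, 0.2), (0.3, -0.05, 0.2)],
}
# Same plan with the right arm stopping farther away (0.15 m separation).
safe_plan = {
    "left": plan["left"],
    "right": [(0.6, -0.3, 0.2), (0.4, -0.1, 0.2), (0.3, -0.1, 0.2)],
}
print(first_conflict(plan))       # (2, 'left', 'right')
print(first_conflict(safe_plan))  # None
```

A check like this could serve as the validation step that gates execution, with the returned time step and arm names turned into feedback for re-planning.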
Problem

Research questions and friction points this paper is trying to address.

multi-agent
embodied AI
collaborative manipulation
spatial coordination
shared workspace awareness
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional environment
multi-agent embodied AI
sim-to-real transfer
vision-language model
collaborative manipulation