CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment

📅 2026-04-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses spatial coordination, temporal reasoning, and shared workspace awareness in multi-agent embodied systems performing collaborative manipulation. The authors propose a compositional environment that integrates real-world and simulated components into a unified decision-making space, and build the CoEnv framework on it as a three-stage pipeline: real-to-sim scene reconstruction that digitizes the physical workspace, vision-language-model (VLM) driven action synthesis, and validated sim-to-real transfer with collision detection. By separating cognitive planning from physical execution, the approach supports both real-time planning through high-level interfaces and iterative planning through code-based trajectory generation. On multi-arm manipulation benchmarks, the authors report high task success rates and execution efficiency, and present the approach as a new paradigm for multi-agent embodied AI.

📝 Abstract

Multi-agent embodied systems hold promise for complex collaborative manipulation, yet face critical challenges in spatial coordination, temporal reasoning, and shared workspace awareness. Inspired by human collaboration where cognitive planning occurs separately from physical execution, we introduce the concept of compositional environment -- a synergistic integration of real-world and simulation components that enables multiple robotic agents to perceive intentions and operate within a unified decision-making space. Building on this concept, we present CoEnv, a framework that leverages simulation for safe strategy exploration while ensuring reliable real-world deployment. CoEnv operates through three stages: real-to-sim scene reconstruction that digitizes physical workspaces, VLM-driven action synthesis supporting both real-time planning with high-level interfaces and iterative planning with code-based trajectory generation, and validated sim-to-real transfer with collision detection for safe deployment. Extensive experiments on challenging multi-arm manipulation benchmarks demonstrate CoEnv's effectiveness in achieving high task success rates and execution efficiency, establishing a new paradigm for multi-agent embodied AI.
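
The three stages described in the abstract can be pictured as a plan, validate, execute loop. The following is only a minimal sketch under stated assumptions, not CoEnv's implementation: every name here (`SimScene`, `real_to_sim`, `collides`, `plan_validate_execute`) is hypothetical, the "simulation" is a list of 3D points, the planner is any callable standing in for the VLM, and collision detection is reduced to a distance check between end-effector waypoints.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]
# Hypothetical plan format: arm name -> waypoints, one per time step.
# Assumption: every arm has at least one waypoint.
Plan = Dict[str, List[Point]]


@dataclass
class SimScene:
    """Toy stand-in for a digitized workspace (stage 1 output)."""
    obstacles: List[Point]
    safety_radius: float = 0.1  # assumed clearance, in meters


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def real_to_sim(observed_objects: List[Point]) -> SimScene:
    """Stage 1 (sketch): copy observed object positions into the sim scene."""
    return SimScene(obstacles=list(observed_objects))


def collides(scene: SimScene, plan: Plan) -> bool:
    """Stage 3 check (sketch): arm-obstacle and arm-arm proximity per step."""
    arms = list(plan)
    steps = max((len(w) for w in plan.values()), default=0)
    for t in range(steps):
        # An arm whose trajectory is shorter holds its last waypoint.
        positions = [plan[a][min(t, len(plan[a]) - 1)] for a in arms]
        for i, p in enumerate(positions):
            if any(dist(p, o) < scene.safety_radius for o in scene.obstacles):
                return True
            if any(dist(p, q) < scene.safety_radius for q in positions[i + 1:]):
                return True
    return False


def plan_validate_execute(
    observed_objects: List[Point],
    planner: Callable[[SimScene, int], Plan],  # stand-in for the VLM planner
    execute: Callable[[Plan], None],           # stand-in for robot execution
    max_attempts: int = 3,
) -> bool:
    """Run a plan on the 'real' side only after it passes the sim check."""
    scene = real_to_sim(observed_objects)
    for attempt in range(max_attempts):
        plan = planner(scene, attempt)  # stage 2: propose a joint plan
        if not collides(scene, plan):   # stage 3: validate before deployment
            execute(plan)
            return True
    return False  # no collision-free plan found; nothing was executed
```

The design point the sketch tries to capture is that rejected plans never reach `execute`: exploration and failure happen in the simulated scene, and only a validated plan is deployed.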

Problem

Research questions and friction points this paper is trying to address.

multi-agent

embodied AI

collaborative manipulation

spatial coordination

shared workspace awareness

Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional environment

multi-agent embodied AI

sim-to-real transfer