Scaling Multi-Agent Environment Co-Design with Diffusion Models

📅 2025-11-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address scalability limitations, inefficient high-dimensional search, and low sample efficiency under the moving-target objectives of joint policy-environment optimization in multi-agent systems, this paper proposes Diffusion Co-Design (DiCoDe), a diffusion-model-based co-design framework. The method integrates generative environment search, joint policy-environment optimization, and constraint-aware sampling. Key contributions include: (1) Projected Universal Guidance, a sampling technique enabling efficient exploration of high-reward environments while strictly satisfying hard constraints; and (2) critic distillation, which allows the diffusion model to incorporate dense reinforcement learning feedback and adapt to evolving policies. Evaluated on multi-agent pathfinding, warehouse automation, and wind farm optimization benchmarks, the approach significantly outperforms existing methods: it achieves a 39% improvement in task reward for warehouse automation while reducing simulation sample requirements by 66%.
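The constraint-aware sampling idea can be illustrated with a minimal NumPy sketch (an assumption-laden toy, not the paper's implementation): each sampling step applies a reward-gradient guidance term plus annealed noise, then projects the candidate obstacle layout back onto the feasible set where every pair of obstacles keeps a minimum separation. The names `guided_sample` and `project_min_separation`, and the simple pairwise-push projection, are hypothetical choices for this sketch.

```python
import numpy as np

def project_min_separation(positions, min_dist, iters=50):
    """Project obstacle positions onto the feasible set where every
    pair is at least `min_dist` apart (iterative pairwise pushing;
    an illustrative stand-in for the paper's projection step)."""
    pos = positions.copy()
    for _ in range(iters):
        moved = False
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                diff = pos[i] - pos[j]
                d = np.linalg.norm(diff)
                if d < min_dist:
                    # Push the violating pair apart symmetrically.
                    direction = diff / d if d > 1e-8 else np.array([1.0, 0.0])
                    shift = 0.5 * (min_dist - d) * direction
                    pos[i] += shift
                    pos[j] -= shift
                    moved = True
        if not moved:
            break
    return pos

def guided_sample(reward_grad, n_obstacles=4, steps=20, min_dist=1.0,
                  guidance_scale=0.1, rng=None):
    """Toy projected-guidance loop: noisy refinement steps with
    reward-gradient guidance, each followed by a projection onto
    the hard-constraint set so constraints hold at every step."""
    rng = np.random.default_rng(rng)
    x = rng.normal(size=(n_obstacles, 2))          # start from noise
    for t in range(steps):
        noise_scale = 1.0 - t / steps              # anneal the noise
        x = x + guidance_scale * reward_grad(x)    # guidance toward reward
        x = x + 0.05 * noise_scale * rng.normal(size=x.shape)
        x = project_min_separation(x, min_dist)    # enforce hard constraints
    return x
```

Because the projection runs after every step, the final sample satisfies the separation constraint by construction, which is the property the summary attributes to projection-based guidance.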

📝 Abstract
The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance. With application domains ranging from warehouse logistics to windfarm management, co-design promises to fundamentally change how we deploy multi-agent systems. However, current co-design methods struggle to scale. They collapse under high-dimensional environment design spaces and suffer from sample inefficiency when addressing moving targets inherent to joint optimisation. We address these challenges by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework pushing co-design towards practically relevant settings. DiCoDe incorporates two core innovations. First, we introduce Projected Universal Guidance (PUG), a sampling technique that enables DiCoDe to explore a distribution of reward-maximising environments while satisfying hard constraints such as spatial separation between obstacles. Second, we devise a critic distillation mechanism to share knowledge from the reinforcement learning critic, ensuring that the guided diffusion model adapts to evolving agent policies using a dense and up-to-date learning signal. Together, these improvements lead to superior environment-policy pairs when validated on challenging multi-agent environment co-design benchmarks including warehouse automation, multi-agent pathfinding and wind farm optimisation. Our method consistently exceeds the state-of-the-art, achieving, for example, 39% higher rewards in the warehouse setting with 66% fewer simulation samples. This sets a new standard in agent-environment co-design, and is a stepping stone towards reaping the rewards of co-design in real world domains.
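The joint optimisation the abstract describes, with its "moving targets" (the environment distribution shifts as the policy improves, and vice versa), can be sketched as a simple alternating loop. The interfaces `sample_envs`, `train_policy`, and `evaluate` are assumptions for illustration, not DiCoDe's actual training procedure:

```python
import numpy as np

def codesign_loop(sample_envs, train_policy, evaluate, rounds=3, batch=8):
    """Alternate between (1) sampling candidate environments from a
    generative model and (2) updating the agent policy on the best
    candidate; each policy update shifts the target the next round
    of environment sampling must optimise for."""
    best_env, best_score, policy = None, -np.inf, None
    for _ in range(rounds):
        envs = sample_envs(batch)                 # generative environment search
        scores = [evaluate(policy, e) for e in envs]
        i = int(np.argmax(scores))
        policy = train_policy(policy, envs[i])    # adapt policy to chosen env
        score = evaluate(policy, envs[i])
        if score > best_score:
            best_env, best_score = envs[i], score
    return best_env, policy, best_score
```

The sample-inefficiency problem the abstract highlights is visible here: every round re-evaluates a fresh batch of environments in simulation, which is what guided generation and critic distillation aim to cut down.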
Problem

Research questions and friction points this paper is trying to address.

Scaling multi-agent environment co-design in high-dimensional spaces
Improving sample efficiency for joint optimization of policies and environments
Addressing moving targets inherent in agent-environment co-design optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Co-Design framework enables scalable environment co-design
Projected Universal Guidance samples constraint-satisfying reward-maximizing environments
Critic distillation shares knowledge for adaptive diffusion model updates
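One way to picture critic distillation is fitting a cheap differentiable surrogate to the RL critic's scores over environment parameters, so its gradient can serve as a dense, up-to-date guidance signal for the sampler. The sketch below uses a linear least-squares surrogate purely for illustration; the function names and the linear model are assumptions, not the paper's mechanism (which distills into the guided diffusion model itself):

```python
import numpy as np

def distill_critic(critic, envs):
    """Fit a linear surrogate of the critic over flattened environment
    parameters by least squares; the surrogate's analytic gradient
    then provides a dense guidance signal."""
    X = envs.reshape(len(envs), -1)
    X1 = np.hstack([X, np.ones((len(X), 1))])    # bias column
    y = np.array([critic(e) for e in envs])       # dense critic feedback
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    weights, bias = w[:-1], w[-1]

    def surrogate(env):
        return env.reshape(-1) @ weights + bias

    def surrogate_grad(env):
        # Gradient of a linear model is constant in the parameters.
        return weights.reshape(env.shape)

    return surrogate, surrogate_grad
```

Re-fitting the surrogate as the critic is updated is what keeps the guidance signal aligned with the evolving policy, which is the adaptivity the bullet above refers to.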