🤖 AI Summary
This work addresses key challenges in urban multi-agent systems—namely, dynamically varying agent populations, asynchronous action execution, and behavioral homogenization caused by shared policies—which collectively hinder collaborative efficiency. To this end, the paper proposes an Adaptive Value Decomposition (AVD) framework that, for the first time, integrates a lightweight behavioral diversity mechanism within a semi-MARL setting characterized by dynamic agent counts and asynchronous decision-making. The approach synergistically combines adaptive agent modeling, asynchronous training and execution strategies, and a value decomposition architecture. Evaluated on real-world bike-sharing rebalancing tasks in London and Washington, D.C., AVD significantly outperforms existing methods, demonstrating both its effectiveness and strong generalization capability.
📝 Abstract
Multi-agent reinforcement learning (MARL) provides a promising paradigm for coordinating multi-agent systems (MAS). However, most existing methods rely on restrictive assumptions, such as a fixed number of agents and fully synchronous action execution. These assumptions are often violated in urban systems, where the number of active agents varies over time and actions may have heterogeneous durations, resulting in a semi-MARL setting. Moreover, while sharing policy parameters among agents is commonly adopted to improve learning efficiency, it can lead to highly homogeneous actions when a subset of agents makes decisions concurrently under similar observations, potentially degrading coordination quality. To address these challenges, we propose Adaptive Value Decomposition (AVD), a cooperative MARL framework that adapts to a dynamically changing agent population. AVD further incorporates a lightweight mechanism to mitigate the action homogenization induced by shared policies, thereby encouraging behavioral diversity and maintaining effective cooperation among agents. In addition, we design a training-execution strategy tailored to the semi-MARL setting, accommodating asynchronous decision-making when agents act at different times. Experiments on real-world bike-sharing redistribution tasks in two major cities, London and Washington, D.C., demonstrate that AVD outperforms state-of-the-art baselines, confirming its effectiveness and generalizability.
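To make the value-decomposition idea with a varying agent population concrete, here is a minimal sketch. The abstract does not specify AVD's actual architecture, so this illustrates only the generic ingredients it builds on: a shared per-agent utility network and an additive (VDN-style) mixing of per-agent Q-values, extended with an activity mask so the joint value is computed over whichever agents are currently active. All class and parameter names below are hypothetical.

```python
import torch
import torch.nn as nn


class MaskedValueDecomposition(nn.Module):
    """Hypothetical sketch, not AVD itself: additive value decomposition
    where a binary mask selects the currently active agents, so the joint
    Q-value adapts to a dynamically changing agent population."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # One utility network shared by all agents (parameter sharing).
        self.utility = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, actions, active_mask):
        # obs:         [batch, max_agents, obs_dim]
        # actions:     [batch, max_agents] (indices of taken actions)
        # active_mask: [batch, max_agents], 1.0 for active agents else 0.0
        q_all = self.utility(obs)                                   # [B, N, A]
        q_taken = q_all.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # [B, N]
        # Inactive agents contribute nothing to the joint value.
        return (q_taken * active_mask).sum(dim=-1)                  # [B]
```

Because the mixing is a masked sum, the same network handles any number of active agents up to `max_agents`; padding slots are simply zeroed out, which is one common way to accommodate dynamic populations in value-decomposition methods.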