AI Summary
This paper addresses the optimal composition design problem in partially controlled multi-agent systems (PCMAS), where the number and policies of controllable agents must be optimized jointly with the equilibrium response policies of uncontrollable agents. This is a computationally intensive bilevel game optimization problem. To tackle it, we propose a hypernetwork-based meta-policy generation framework that models both agent types in a unified way, enabling knowledge transfer across configurations. We further introduce reward-parameter adaptive optimization and mean-action approximation to overcome the limitations of conventional training, in which each configuration is trained independently. Experiments on a real-world New York City taxi dataset show that our method significantly improves order response rate and served demand while approximating equilibrium policies more accurately than existing methods. These results support the effectiveness of controllable agents in enhancing system-level coordinated decision-making.
Abstract
Partially Controlled Multi-Agent Systems (PCMAS) consist of controllable agents, managed by a system designer, and uncontrollable agents, operating autonomously. This study addresses an optimal composition design problem in PCMAS, which involves two coupled problems: the system designer's problem of determining the optimal number and policies of controllable agents, and the uncontrollable agents' problem of identifying their best-response policies. Solving this bi-level optimization problem is computationally intensive, as it requires repeatedly solving multi-agent reinforcement learning problems for both types of agents under various compositions. To address this challenge, we propose a novel hypernetwork-based framework that jointly optimizes the system's composition and agent policies. Unlike traditional methods that train separate policy networks for each composition, the proposed framework generates policies for both controllable and uncontrollable agents through a unified hypernetwork. This approach enables efficient information sharing across similar configurations, thereby reducing computational overhead. Additional improvements are achieved by incorporating reward parameter optimization and mean action networks. Using real-world New York City taxi data, we demonstrate that our framework outperforms existing methods in approximating equilibrium policies. Our experimental results show significant improvements in key performance metrics, such as order response rate and served demand, highlighting the practical utility of controlling agents and their potential to enhance decision-making in PCMAS.
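To make the idea of a unified hypernetwork more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a composition can be summarized by the fraction of controllable agents plus a flag for the agent type, and that the generated policy is a small linear softmax policy. The class name `HyperPolicy`, the context encoding, and all sizes are hypothetical choices made for illustration; training, reward parameter optimization, and mean action networks are omitted.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
OBS_DIM, N_ACTIONS, HIDDEN = 4, 3, 16


class HyperPolicy:
    """One shared hypernetwork that emits the weights of a linear softmax policy."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        n_out = OBS_DIM * N_ACTIONS + N_ACTIONS  # policy weights plus biases
        # Assumed context: [fraction of controllable agents, is_controllable flag].
        self.w1 = rng.normal(0.0, 0.5, size=(2, HIDDEN))
        self.b1 = np.zeros(HIDDEN)
        self.w2 = rng.normal(0.0, 0.5, size=(HIDDEN, n_out))
        self.b2 = np.zeros(n_out)

    def policy_params(self, frac_controllable, is_controllable):
        """Map a composition context to the parameters of a policy network."""
        ctx = np.array([frac_controllable, float(is_controllable)])
        hidden = np.tanh(ctx @ self.w1 + self.b1)
        flat = hidden @ self.w2 + self.b2
        weights = flat[: OBS_DIM * N_ACTIONS].reshape(OBS_DIM, N_ACTIONS)
        bias = flat[OBS_DIM * N_ACTIONS:]
        return weights, bias

    def action_probs(self, obs, frac_controllable, is_controllable):
        """Evaluate the generated policy on one observation."""
        weights, bias = self.policy_params(frac_controllable, is_controllable)
        logits = obs @ weights + bias
        logits = logits - logits.max()  # subtract the max for numerical stability
        exp_logits = np.exp(logits)
        return exp_logits / exp_logits.sum()


# The same hypernetwork parameters serve both agent types and any composition.
hyper = HyperPolicy()
obs = np.ones(OBS_DIM)
p_ctrl = hyper.action_probs(obs, frac_controllable=0.3, is_controllable=True)
p_unctrl = hyper.action_probs(obs, frac_controllable=0.3, is_controllable=False)
```

Because every composition is served by the same `w1` and `w2`, a gradient update driven by one composition also changes the policies generated for nearby compositions. This is the kind of information sharing the abstract contrasts with training a separate policy network per composition.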