🤖 AI Summary
In multi-service-provider (SP) federated learning, non-cooperative behavior arises from privacy constraints and competing interests among SPs. To address this, we propose a game-theoretic multi-agent reinforcement learning framework that jointly optimizes client assignment, adaptive quantization, and resource scheduling. Our method integrates Pareto Actor-Critic with expectile regression, and introduces a ternary Cartesian action decomposition (TCAD) mechanism and a parameterized conjecture generator to enable scalable computation of Pareto-optimal equilibria in high-dimensional action spaces. Experiments demonstrate that our approach outperforms state-of-the-art methods, achieving a 5.8% higher total reward and a 4.2% improvement in the hypervolume indicator (HVI). It further attains superior trade-offs between individual SP utility and system-wide performance under data heterogeneity and large-scale deployment.
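The expectile regression mentioned above is, in general, an asymmetric squared loss that weights positive and negative errors differently; this is how risk-sensitive value estimates are commonly obtained in RL. As a minimal illustrative sketch (not the paper's actual critic update; the function name and interface are hypothetical):

```python
import numpy as np

def expectile_loss(td_errors, tau):
    """Asymmetric squared (expectile) loss on TD errors.

    tau > 0.5 penalizes underestimation more (optimistic / risk-seeking
    value estimates); tau < 0.5 penalizes overestimation more
    (pessimistic / risk-averse); tau = 0.5 recovers the ordinary
    mean-squared error.
    """
    weights = np.where(td_errors < 0.0, 1.0 - tau, tau)
    return float(np.mean(weights * td_errors ** 2))
```

Varying `tau` per agent is one standard way to encode the heterogeneous risk profiles the summary refers to.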
📝 Abstract
Federated learning (FL) in multi-service-provider (SP) ecosystems is fundamentally hampered by non-cooperative dynamics, where privacy constraints and competing interests preclude the centralized optimization of multi-SP communication and computation resources. In this paper, we introduce PAC-MCoFL, a game-theoretic multi-agent reinforcement learning (MARL) framework where SPs act as agents to jointly optimize client assignment, adaptive quantization, and resource allocation. Within the framework, we integrate Pareto Actor-Critic (PAC) principles with expectile regression, enabling agents to conjecture optimal joint policies to achieve Pareto-optimal equilibria while modeling heterogeneous risk profiles. To manage the high-dimensional action space, we devise a ternary Cartesian action decomposition (TCAD) mechanism that facilitates fine-grained control. Further, we develop PAC-MCoFL-p, a scalable variant featuring a parameterized conjecture generator that substantially reduces computational complexity with a provably bounded error. Alongside theoretical convergence guarantees, our framework's superiority is validated through extensive simulations: PAC-MCoFL achieves approximately 5.8% and 4.2% improvements in total reward and hypervolume indicator (HVI), respectively, over the latest MARL solutions. The results also demonstrate that our method can more effectively balance individual SP and system performance in scaled deployments and under diverse data heterogeneity.
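A ternary Cartesian decomposition of the action space can be read as factoring each agent's joint action into three sub-decisions (here: client assignment, quantization level, resource allocation), so that a policy over the Cartesian product of three small sets replaces one over a single huge flat set. The sketch below illustrates only this indexing idea with a mixed-radix encode/decode; the sub-action sizes and function names are assumptions, not the paper's implementation:

```python
def tcad_encode(assign, quant, alloc, sizes):
    """Flatten a triple of sub-actions into one joint-action index.

    sizes = (n_assign, n_quant, n_alloc) are the cardinalities of the
    three sub-action sets; the joint space has their product as size.
    """
    n_assign, n_quant, n_alloc = sizes
    assert 0 <= assign < n_assign and 0 <= quant < n_quant and 0 <= alloc < n_alloc
    return (assign * n_quant + quant) * n_alloc + alloc

def tcad_decode(index, sizes):
    """Recover the (assign, quant, alloc) triple from a joint index."""
    _, n_quant, n_alloc = sizes
    assign, rem = divmod(index, n_quant * n_alloc)
    quant, alloc = divmod(rem, n_alloc)
    return assign, quant, alloc
```

The practical benefit is that three policy heads of sizes (A, B, C) need A + B + C outputs instead of A * B * C, which is what makes fine-grained control tractable in high-dimensional action spaces.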