DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play

📅 2025-11-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multi-agent self-play in continuous decision spaces often suffers from slow or failed convergence to Nash equilibria and limited policy generalization. To address this, we propose DiffFP, a framework that introduces diffusion models into fictitious play (FP), using generative modeling to approximate multimodal best-response policies. The diffusion policy captures the multimodality of strategy distributions and learns adaptive, diverse behaviors. In continuous-space zero-sum settings, including racing and multi-particle games, DiffFP converges towards ε-Nash equilibria. Empirical results show up to 3× faster convergence and, on average, 30× higher success rates than reinforcement learning baselines, along with improved policy robustness, diversity, and adaptability to unseen opponents, addressing key limitations of conventional FP and RL-based approaches in continuous multi-agent learning.

📝 Abstract
Self-play reinforcement learning has demonstrated significant success in learning complex strategic and interactive behaviors in competitive multi-agent games. However, achieving such behaviors in continuous decision spaces remains challenging. Ensuring adaptability and generalization in self-play settings is critical for achieving competitive performance in dynamic multi-agent environments. These challenges often cause methods to converge slowly or fail to converge at all to a Nash equilibrium, making agents vulnerable to strategic exploitation by unseen opponents. To address these challenges, we propose DiffFP, a fictitious play (FP) framework that estimates the best response to unseen opponents while learning a robust and multimodal behavioral policy. Specifically, we approximate the best response using a diffusion policy that leverages generative modeling to learn adaptive and diverse strategies. Through empirical evaluation, we demonstrate that the proposed FP framework converges towards $\epsilon$-Nash equilibria in continuous-space zero-sum games. We validate our method on complex multi-agent environments, including racing and multi-particle zero-sum games. Simulation results show that the learned policies are robust against diverse opponents and outperform baseline reinforcement learning policies. Our approach achieves up to 3$\times$ faster convergence and 30$\times$ higher success rates on average against RL-based baselines, demonstrating its robustness to opponent strategies and stability across training iterations.
Problem

Research questions and friction points this paper is trying to address.

Learning strategic behaviors in continuous multi-agent decision spaces
Achieving robust policy convergence against unseen opponent strategies
Overcoming slow Nash equilibrium convergence in competitive environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion policy approximates best response via generative modeling
Fictitious play framework enables robust multimodal behavioral policies
Achieves faster convergence and higher success rates than baselines
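The fictitious play loop underlying this idea can be illustrated with a toy example. The sketch below runs classical fictitious play on a zero-sum matrix game (matching pennies), where each player best-responds to the opponent's empirical average strategy; DiffFP replaces this exact discrete best response with a learned diffusion policy in continuous spaces, which this simplified example does not implement. All names here (`best_response`, `fictitious_play`) are illustrative, not from the paper.

```python
import numpy as np

# Row player's payoff matrix for matching pennies (zero-sum).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def best_response(payoff, opponent_avg):
    # Pure best response to the opponent's empirical average strategy.
    # DiffFP instead samples this response from a diffusion policy.
    return int(np.argmax(payoff @ opponent_avg))

def fictitious_play(A, iters=10000):
    # Start each player's action counts at 1 (uniform prior belief).
    row_counts = np.ones(A.shape[0])
    col_counts = np.ones(A.shape[1])
    for _ in range(iters):
        row_avg = row_counts / row_counts.sum()
        col_avg = col_counts / col_counts.sum()
        # Each player best-responds to the other's historical average;
        # the column player's payoff matrix is -A transposed.
        row_counts[best_response(A, col_avg)] += 1
        col_counts[best_response(-A.T, row_avg)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

row_strategy, col_strategy = fictitious_play(A)
# Empirical frequencies approach the uniform (0.5, 0.5) mixed equilibrium.
print(row_strategy, col_strategy)
```

In zero-sum games the empirical strategies of fictitious play converge to a Nash equilibrium; the paper's contribution is making the best-response step tractable and multimodal in continuous action spaces via a diffusion model.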
Akash Karthikeyan
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
Yash Vardhan Pant
Assistant Professor, ECE, University of Waterloo
Control Theory, Robotics, Machine Learning, Formal Methods, Optimization