SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models

📅 2025-12-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance bottleneck that scarce expert demonstrations impose on adversarial imitation learning (AIL), this paper proposes SD2AIL, an AIL framework that uses a diffusion model in the discriminator to generate synthetic demonstrations as pseudo-expert data. It also introduces prioritized expert demonstration replay (PEDR), a strategy that selectively replays the most valuable samples from the combined pool of real and synthetic demonstrations. On the Hopper task, SD2AIL achieves an average return of 3441, surpassing the state-of-the-art method by 89, and experiments on simulation tasks indicate that the method is effective and robust. The core contributions are: (1) diffusion-based synthesis of pseudo-expert demonstrations, and (2) a prioritized replay strategy over the (pseudo-)expert demonstration pool.

📝 Abstract
Adversarial Imitation Learning (AIL) is a dominant framework in imitation learning that infers rewards from expert demonstrations to guide policy optimization. Although providing more expert demonstrations typically leads to improved performance and greater stability, collecting such demonstrations can be challenging in certain scenarios. Inspired by the success of diffusion models in data generation, we propose SD2AIL, which utilizes synthetic demonstrations via diffusion models. We first employ a diffusion model in the discriminator to generate synthetic demonstrations as pseudo-expert data that augment the expert demonstrations. To selectively replay the most valuable demonstrations from the large pool of (pseudo-) expert demonstrations, we further introduce a prioritized expert demonstration replay strategy (PEDR). The experimental results on simulation tasks demonstrate the effectiveness and robustness of our method. In particular, in the Hopper task, our method achieves an average return of 3441, surpassing the state-of-the-art method by 89. Our code will be available at https://github.com/positron-lpc/SD2AIL.
Problem

Research questions and friction points this paper is trying to address.

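Expert demonstrations are hard to collect in some scenarios, yet AIL performance and stability improve with more of them
How to generate synthetic demonstrations that can serve as pseudo-expert data when real expert data is scarce
How to select the most valuable demonstrations from a large pool of (pseudo-)expert demonstrations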
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models to generate synthetic expert demonstrations
Introduces prioritized replay strategy for valuable demonstrations
Augments real expert demonstrations with pseudo-expert data produced by the diffusion model in the discriminator
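
The prioritized replay idea can be illustrated with a minimal sketch. This summary does not state how PEDR scores demonstrations, so the example below assumes proportional prioritization in the style of prioritized experience replay: each demonstration is sampled with probability proportional to its priority raised to an exponent. The class name `DemoPool`, the parameters `alpha` and `eps`, and the priority values are all hypothetical and are not taken from the SD2AIL code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DemoPool:
    """Pool of real and synthetic demonstrations with sampling priorities.

    Assumption: proportional prioritization, P(i) = p_i^alpha / sum_j p_j^alpha.
    This is an illustrative stand-in, not the paper's PEDR definition.
    """
    alpha: float = 0.6   # hypothetical exponent; 0 gives uniform sampling
    eps: float = 1e-6    # keeps zero-priority items sampleable
    items: list = field(default_factory=list)
    priorities: list = field(default_factory=list)

    def add(self, demo, priority=1.0):
        # Store a demonstration (real or synthetic) with its initial priority.
        self.items.append(demo)
        self.priorities.append(priority)

    def probabilities(self):
        # Normalize the exponentiated priorities into a distribution.
        weights = [(p + self.eps) ** self.alpha for p in self.priorities]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, batch_size, rng=random):
        # Draw indices with replacement according to the priorities.
        idx = rng.choices(range(len(self.items)),
                          weights=self.probabilities(), k=batch_size)
        return idx, [self.items[i] for i in idx]

    def update(self, indices, new_priorities):
        # Refresh priorities after the sampled items have been re-scored.
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = abs(p)  # priorities must be non-negative


# Usage: mix one real and one synthetic demonstration, then draw a batch.
pool = DemoPool()
pool.add({"source": "real", "id": 0}, priority=1.0)
pool.add({"source": "synthetic", "id": 1}, priority=3.0)
indices, batch = pool.sample(4, rng=random.Random(0))
```

In this sketch the synthetic demonstration is drawn more often only because it was given a larger priority; in practice the priority would come from whatever quality signal the method defines.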
Pengcheng Li
Ph.D. in Computer Science, University of Rochester; Google (present)
Programming Systems, Compilers, Runtimes
Qiang Fang
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China
Tong Zhao
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China
Yixing Lan
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China
Xin Xu
College of Intelligence Science and Technology, National University of Defense Technology, Changsha, Hunan, China