🤖 AI Summary
Adversarial Imitation Learning (AIL) performs well under expert data scarcity but suffers from training instability and over-reliance on hand-crafted reward assignment (RA) functions, whose critical impact on policy performance is often overlooked. Method: We propose a data-driven framework for automatic RA function discovery, introducing large language model-guided evolutionary search to AIL for the first time. Our approach jointly optimizes density ratio estimation and meta-learning to efficiently explore high-performing RA structures within the functional space. Contribution/Results: The resulting algorithm, DAIL, significantly outperforms state-of-the-art methods across multiple benchmark tasks, achieving both enhanced training stability and superior policy performance. By automating RA design, DAIL overcomes the generalization and scalability limitations inherent in conventional manual RA engineering, establishing a new paradigm for robust and adaptive imitation learning.
📄 Abstract
Adversarial Imitation Learning (AIL) methods, while effective in settings with limited expert demonstrations, are often considered unstable. These approaches typically decompose into two components: Density Ratio (DR) estimation $\frac{\rho_E}{\rho_\pi}$, where a discriminator estimates the relative occupancy of state-action pairs under the expert versus the policy; and Reward Assignment (RA), where this ratio is transformed into a reward signal used to train the policy. While significant research has focused on improving density estimation, the role of reward assignment in influencing training dynamics and final policy performance has been largely overlooked. RA functions in AIL are typically derived from divergence minimization objectives, relying heavily on human design and ingenuity. In this work, we take a different approach: we investigate the discovery of data-driven RA functions, i.e., functions selected directly on the basis of the performance of the resulting imitation policy. To this end, we leverage an LLM-guided evolutionary framework that efficiently explores the space of RA functions, yielding *Discovered Adversarial Imitation Learning* (DAIL), the first meta-learnt AIL algorithm. Remarkably, DAIL generalises across unseen environments and policy optimization algorithms, outperforming the current state-of-the-art of *human-designed* baselines. Finally, we analyse why DAIL leads to more stable training, offering novel insights into the role of RA functions in the stability of AIL. Code is publicly available: https://github.com/shshnkreddy/DAIL.
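To make the DR/RA decomposition concrete, the sketch below shows two classic *hand-designed* RA functions from the AIL literature (GAIL- and AIRL-style rewards), which map a discriminator output $D(s,a) \in (0,1)$, whose implied density ratio is $D/(1-D)$, into a policy reward. This is an illustrative assumption about the baselines the paper contrasts against, not the RA function that DAIL discovers.

```python
import numpy as np

def ra_gail(d: np.ndarray) -> np.ndarray:
    """GAIL-style reward assignment: r = -log(1 - D(s, a)).

    Unbounded above as D -> 1, which encourages the policy toward
    state-action pairs the discriminator attributes to the expert.
    """
    return -np.log(1.0 - d)

def ra_airl(d: np.ndarray) -> np.ndarray:
    """AIRL-style reward assignment: r = log D - log(1 - D).

    This is the log density ratio log(rho_E / rho_pi) implied by an
    optimal discriminator; it is zero when D = 0.5 (policy and expert
    occupancies indistinguishable).
    """
    return np.log(d) - np.log(1.0 - d)

# Same discriminator outputs, different reward landscapes: the choice
# of RA function changes the training signal even with a fixed DR estimate.
d = np.array([0.2, 0.5, 0.8])
print(ra_gail(d))
print(ra_airl(d))
```

The point of the comparison is that both functions are monotone in the same density-ratio estimate yet shape the reward (and hence training dynamics) differently, which is exactly the design space DAIL searches over automatically.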