🤖 AI Summary
This work addresses policy drift in flow matching models under unseen conditions by proposing FAIL (Flow Matching Adversarial Imitation Learning), a reward- and preference-free adversarial imitation learning framework that aligns policies with expert demonstrations through post-training distributional matching. The approach pairs a variant built on differentiable ODE solvers (FAIL-PD) with a policy gradient variant (FAIL-PG), combining low-variance pathwise gradients with black-box compatibility and thereby extending to discrete image and video generation tasks. Using only 13,000 demonstration samples to fine-tune the FLUX model, the method achieves competitive performance in prompt adherence and aesthetic quality while effectively mitigating reward hacking.
📝 Abstract
Post-training of flow matching models, i.e., aligning the output distribution with a high-quality target, is mathematically equivalent to imitation learning. While Supervised Fine-Tuning mimics expert demonstrations effectively, it cannot correct policy drift in unseen states. Preference optimization methods address this but require costly preference pairs or reward modeling. We propose Flow Matching Adversarial Imitation Learning (FAIL), which minimizes the divergence between policy and expert through adversarial training, without explicit rewards or pairwise comparisons. We derive two algorithms: FAIL-PD exploits differentiable ODE solvers for low-variance pathwise gradients, while FAIL-PG provides a black-box alternative for discrete or computationally constrained settings. Fine-tuning FLUX with only 13,000 demonstrations from Nano Banana Pro, FAIL achieves competitive performance on prompt-following and aesthetic benchmarks. Furthermore, the framework generalizes effectively to discrete image and video generation and functions as a robust regularizer that mitigates reward hacking in reward-based optimization. Code and data are available at https://github.com/HansPolo113/FAIL.
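To make the FAIL-PD idea concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation): a discriminator separates expert samples from policy samples, the policy is a toy velocity field integrated with a differentiable Euler ODE solver, and the generator update backpropagates pathwise through every solver step to fool the discriminator. All network sizes, step counts, and the stand-in "expert" distribution are illustrative assumptions.

```python
# Hypothetical sketch of adversarial imitation with pathwise gradients
# through a differentiable ODE solve (FAIL-PD-style); shapes and
# hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

dim, steps = 2, 8

# Toy velocity field v(x, t) and discriminator D(x).
velocity = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))
disc = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))

def rollout(noise):
    """Differentiable Euler integration of the learned velocity field."""
    x, dt = noise, 1.0 / steps
    for k in range(steps):
        t = torch.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity(torch.cat([x, t], dim=1))  # gradients flow through each step
    return x

opt_g = torch.optim.Adam(velocity.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

expert = torch.randn(64, dim) + 3.0  # stand-in "expert" distribution
for _ in range(50):
    # Discriminator step: expert -> 1, policy rollout -> 0.
    fake = rollout(torch.randn(64, dim))
    d_loss = bce(disc(expert), torch.ones(64, 1)) + \
             bce(disc(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: pathwise gradient through the ODE solve.
    g_loss = bce(disc(rollout(torch.randn(64, dim))), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

FAIL-PG would replace the pathwise generator update with a score-function (policy gradient) estimate of the same objective, trading gradient variance for black-box compatibility with non-differentiable samplers.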