Fair Benchmarking of Emerging One-Step Generative Models Against Multistep Diffusion and Flow Models

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of fair, unified evaluation protocols for comparing single-step and multi-step text-to-image generative models, particularly under classifier-free guidance (CFG), varying inference steps, and out-of-distribution generalization. The authors propose a standardized cross-model evaluation framework under consistent class-conditional settings, benchmarking eight prominent models—including single-step flow models, multi-step baselines, and established systems—and introduce reLAIONet, an out-of-distribution test set aligned with ImageNet labels. They further develop the MinMax Harmonic Mean (MMHM), a composite metric integrating FID, Inception Score, CLIP Score, and Pick Score, to holistically assess performance across CFG strengths and inference budgets. Key findings reveal that optimizing solely for FID in few-step regimes compromises text-image alignment and human preference; and while leading single-step models become substantially more competitive with additional inference steps, they still exhibit localized distortions.

📝 Abstract
State-of-the-art text-to-image models produce high-quality images, but inference remains expensive because generation requires several sequential ODE or denoising steps. Native one-step models aim to reduce this cost by mapping noise to an image in a single step, yet fair comparisons to multi-step systems are difficult because studies use mismatched sampling steps and different classifier-free guidance (CFG) settings, where CFG can shift FID, Inception Score, and CLIP-based alignment in opposing directions. It is also unclear how well one-step models scale to multi-step inference, and there is limited standardized out-of-distribution evaluation for label-ID-conditioned generators beyond ImageNet. To address this, we benchmark eight models spanning one-step flows (MeanFlow, Improved MeanFlow, SoFlow), multi-step baselines (RAE, Scale-RAE), and established systems (SiT, Stable Diffusion 3.5, FLUX.1) under a controlled class-conditional protocol on ImageNet validation, ImageNetV2, and reLAIONet, our new proofread out-of-distribution dataset aligned to ImageNet label IDs. Using FID, Inception Score, CLIP Score, and Pick Score, we show that FID-focused model development and CFG selection can be misleading in few-step regimes, where guidance changes can improve FID while degrading text-image alignment, human-preference signals, and perceived quality. We further show that leading one-step models benefit from step scaling and become substantially more competitive under multi-step inference, although they still exhibit characteristic local distortions. To capture these tradeoffs, we introduce MinMax Harmonic Mean (MMHM), a composite proxy over all four metrics that stabilizes hyperparameter selection across guidance and step sweeps.
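The abstract does not give the exact MMHM formula, but the name suggests min-max normalizing each metric across a guidance/step sweep (flipping FID, where lower is better) and then taking the harmonic mean per setting, so that no single metric can dominate selection. The sketch below illustrates that reading; the function names, the epsilon, and the sweep values are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a MinMax-Harmonic-Mean-style composite metric:
# min-max normalize each metric over a hyperparameter sweep, then take
# the harmonic mean of the four normalized scores per setting.

def minmax_normalize(values, higher_is_better=True):
    """Scale a metric's sweep values to [0, 1]; flip when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def harmonic_mean(scores, eps=1e-8):
    """Harmonic mean heavily penalizes any single near-zero metric."""
    return len(scores) / sum(1.0 / (s + eps) for s in scores)

def mmhm(sweep):
    """sweep: list of dicts with fid (lower better) and inception/clip/pick
    (higher better). Returns one composite score per sweep setting."""
    fid  = minmax_normalize([r["fid"] for r in sweep], higher_is_better=False)
    inc  = minmax_normalize([r["inception"] for r in sweep])
    clip = minmax_normalize([r["clip"] for r in sweep])
    pick = minmax_normalize([r["pick"] for r in sweep])
    return [harmonic_mean(s) for s in zip(fid, inc, clip, pick)]

# Toy example: pick the CFG strength with the best composite score.
sweep = [
    {"cfg": 1.0, "fid": 12.0, "inception": 80.0, "clip": 0.24, "pick": 0.20},
    {"cfg": 2.0, "fid": 8.0,  "inception": 95.0, "clip": 0.27, "pick": 0.22},
    {"cfg": 4.0, "fid": 9.5,  "inception": 99.0, "clip": 0.26, "pick": 0.21},
]
scores = mmhm(sweep)
best = max(range(len(sweep)), key=lambda i: scores[i])
```

Under this reading, a setting that wins on FID but collapses on CLIP or Pick Score gets a near-zero composite, which is exactly the failure mode the abstract attributes to FID-only CFG selection in few-step regimes.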
Problem

Research questions and friction points this paper is trying to address.

one-step generative models
multi-step diffusion models
fair benchmarking
classifier-free guidance
out-of-distribution evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

one-step generative models
fair benchmarking
out-of-distribution evaluation
MinMax Harmonic Mean
classifier-free guidance
Advaith Ravishankar
Harvard AI and Robotics Lab, Harvard University
Serena Liu
Harvard AI and Robotics Lab, Harvard University
Mingyang Wang
University of Munich (LMU Munich)
Natural Language Processing
Todd Zhou
Harvard AI and Robotics Lab, Harvard University
Jeffrey Zhou
Harvard AI and Robotics Lab, Harvard University
Arnav Sharma
Harvard AI and Robotics Lab, Harvard University
Ziling Hu
Harvard AI and Robotics Lab, Harvard University
Léopold Das
Harvard AI and Robotics Lab, Harvard University
Abdulaziz Sobirov
Harvard AI and Robotics Lab, Harvard University
Faizaan Siddique
Harvard AI and Robotics Lab, Harvard University
Freddy Yu
Harvard AI and Robotics Lab, Harvard University
Seungjoo Baek
Harvard AI and Robotics Lab, Harvard University
Yan Luo
Harvard University
Computer Vision, Machine Learning, Biomedical Imaging, AI for Medicine
Mengyu Wang
Assistant Professor, Harvard Medical School
Artificial Intelligence, Machine Learning, Ophthalmology, Glaucoma, Computational Mechanics