AdaGen: Learning Adaptive Policy for Image Synthesis

📅 2025-10-31
🏛️ IEEE Transactions on Pattern Analysis and Machine Intelligence
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing image generation methods rely on fixed scheduling strategies that cannot adapt to sample-specific characteristics, which limits generative performance. This work proposes a lightweight, learnable adaptive scheduling framework that formulates parameter scheduling as a Markov decision process and learns sample-aware dynamic schedules through reinforcement learning. The approach incorporates an adversarial reward mechanism alongside a pre-trained reward model to mitigate reward hacking, and it allows flexible trade-offs between fidelity and diversity during inference. Experiments across four mainstream generative paradigms demonstrate its effectiveness. For instance, it reduces VAR's FID from 1.92 to 1.59 and achieves better generation quality on DiT-XL at roughly one-third of the original inference cost.

📝 Abstract
Recent advances in image synthesis have been propelled by powerful generative models, such as Masked Generative Transformers (MaskGIT), autoregressive models, diffusion models, and rectified flow models. A common principle behind their success is the decomposition of complex synthesis tasks into multiple tractable steps. However, this introduces a proliferation of step-specific parameters that must be configured to modulate the iterative generation process (e.g., mask ratio, noise level, or temperature at each step). Existing approaches typically rely on manually designed scheduling rules to manage this complexity, demanding expert knowledge and extensive trial-and-error. Furthermore, these static schedules lack the flexibility to adapt to the unique characteristics of each individual sample, yielding sub-optimal performance. To address this issue, we present AdaGen, a general, learnable, and sample-adaptive framework for scheduling the iterative generation process. Specifically, we formulate the scheduling problem as a Markov Decision Process, in which a lightweight policy network adaptively determines the most suitable parameters given the current generation state and is trained through reinforcement learning. Importantly, we demonstrate that simple reward designs, such as FID or pre-trained reward models, can be easily hacked and may not reliably guarantee the desired quality or diversity of generated samples. Therefore, we propose an adversarial reward design to guide the training of the policy networks effectively. Finally, we introduce an inference-time refinement strategy and a controllable fidelity-diversity trade-off mechanism to further enhance the performance and flexibility of AdaGen. Comprehensive experiments across five benchmark datasets (ImageNet 256×256 and 512×512, MS-COCO, CC3M, and LAION-5B) and four distinct generative paradigms validate the superiority of AdaGen.
For example, AdaGen achieves better performance on DiT-XL with ∼3× lower inference cost and improves the FID of VAR from 1.92 to 1.59 with negligible additional computational overhead.
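
To make the contrast between a static schedule and a sample-adaptive policy concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a MaskGIT-style process in which the scheduled parameter is the mask ratio at each step. The names `State`, `LinearPolicy`, `fixed_cosine_schedule`, and `rollout`, the `confidence` signal, and the policy weights are all hypothetical and chosen only for illustration. Reward computation and reinforcement learning are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Observation given to the scheduler at each generation step."""
    step: int          # index of the current step
    total_steps: int   # number of steps in the episode
    masked: float      # fraction of tokens still masked (1.0 = nothing generated)
    confidence: float  # hypothetical per-sample signal from the generator


def fixed_cosine_schedule(state: State) -> float:
    """Hand-designed rule: the mask ratio depends only on the step index."""
    progress = (state.step + 1) / state.total_steps
    return math.cos(0.5 * math.pi * progress)


class LinearPolicy:
    """Toy stand-in for a lightweight policy network (weights are made up)."""

    def __init__(self, w_progress: float, w_conf: float, bias: float):
        self.w_progress = w_progress
        self.w_conf = w_conf
        self.bias = bias

    def act(self, state: State) -> float:
        # The action depends on the sample-specific signal, not just the step.
        progress = (state.step + 1) / state.total_steps
        logit = (self.w_progress * progress
                 + self.w_conf * state.confidence
                 + self.bias)
        return 1.0 / (1.0 + math.exp(-logit))  # mask ratio in (0, 1)


def rollout(schedule, confidence: float, total_steps: int = 8) -> list:
    """Run one episode and return the mask ratio after every step."""
    masked = 1.0
    trajectory = []
    for step in range(total_steps):
        state = State(step, total_steps, masked, confidence)
        ratio = schedule(state)
        if step == total_steps - 1:
            ratio = 0.0                 # the final step reveals every token
        masked = min(masked, ratio)     # tokens are never re-masked here
        trajectory.append(masked)
    return trajectory


if __name__ == "__main__":
    policy = LinearPolicy(w_progress=-6.0, w_conf=-2.0, bias=3.0)
    for conf in (0.2, 0.9):
        fixed = rollout(fixed_cosine_schedule, conf)
        adaptive = rollout(policy.act, conf)
        print(f"confidence={conf}")
        print("  fixed   :", [round(r, 2) for r in fixed])
        print("  adaptive:", [round(r, 2) for r in adaptive])
```

In this sketch the fixed schedule produces the same trajectory for every sample, while the policy unmasks faster when the hypothetical confidence signal is high. In the framework described above, the policy would instead be a learned network trained with a reward signal.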
Problem

Research questions and friction points this paper is trying to address.

image synthesis
adaptive scheduling
generative models
sample-specific parameters
iterative generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive policy
reinforcement learning
image synthesis
adversarial reward
generative scheduling
Zanlin Ni
Tsinghua University
Computer Vision, Deep Learning
Yulin Wang
Shanghai Jiao Tong University
Yeguo Hua
Department of Automation, BNRist, Tsinghua University, Beijing, China
Renping Zhou
Department of Automation, BNRist, Tsinghua University, Beijing, China
Jiayi Guo
PhD student, Tsinghua University
computer vision, machine learning, generative models
Jun Song
Shenzhen University
nanophotonics
Bo Zheng
Researcher, Alibaba Group
AI, Network, E-Commerce
Gao Huang
Department of Automation, BNRist, Tsinghua University, Beijing, China