T2I-ConBench: Text-to-Image Benchmark for Continual Post-training

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
A standardized evaluation protocol for continual post-training of text-to-image diffusion models is currently lacking, hindering systematic progress in this direction. Method: We introduce the first dedicated benchmark covering two practical scenarios—item customization and domain enhancement—and comprehensively evaluate methods along four dimensions: generalization preservation, task performance, catastrophic forgetting, and cross-task generalization. Our framework features a novel multi-dimensional quantitative evaluation suite, incorporating human preference modeling and vision-language question answering (VL-QA) for the first time to overcome limitations of purely automated metrics, and supports multi-stage task sequence evaluation. Contribution/Results: We systematically assess 10 state-of-the-art methods across three realistic task sequences, revealing that even oracle joint training fails to balance all metrics, with particularly weak cross-task generalization. We open-source the full dataset, codebase, and toolchain, establishing a new standard and actionable roadmap for continual text-to-image learning.

📝 Abstract
Continual post-training adapts a single text-to-image diffusion model to learn new tasks without incurring the cost of separate models, but naive post-training causes forgetting of pretrained knowledge and undermines zero-shot compositionality. We observe that the absence of a standardized evaluation protocol hampers related research for continual post-training. To address this, we introduce T2I-ConBench, a unified benchmark for continual post-training of text-to-image models. T2I-ConBench focuses on two practical scenarios, item customization and domain enhancement, and analyzes four dimensions: (1) retention of generality, (2) target-task performance, (3) catastrophic forgetting, and (4) cross-task generalization. It combines automated metrics, human-preference modeling, and vision-language QA for comprehensive assessment. We benchmark ten representative methods across three realistic task sequences and find that no approach excels on all fronts. Even joint "oracle" training does not succeed for every task, and cross-task generalization remains unsolved. We release all datasets, code, and evaluation tools to accelerate research in continual post-training for text-to-image models.
Problem

Research questions and friction points this paper is trying to address.

A standardized evaluation protocol for continual post-training is lacking
Catastrophic forgetting during adaptation undermines zero-shot compositionality
No method excels in all continual post-training dimensions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces T2I-ConBench for continual post-training evaluation
Combines automated metrics and human-preference modeling
Benchmarks ten methods across three task sequences
Zhehao Huang
Shanghai Jiao Tong University
Yuhang Liu
The University of Adelaide
Representation Learning, LLMs, Latent Variable Models, Responsible AI
Yixin Lou
Shanghai Jiao Tong University
Zhengbao He
Shanghai Jiao Tong University
Mingzhen He
Shanghai Jiao Tong University
Machine learning
Wenxing Zhou
Shanghai Jiao Tong University
Tao Li
Shanghai Jiao Tong University
Kehan Li
Stanford University
Zeyi Huang
Huawei
Xiaolin Huang
Professor, Shanghai Jiao Tong University
machine learning, kernel method, deep neural network training, piecewise linear model