Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing open-source text-to-image (T2I) fine-tuning datasets—such as low resolution, poor text-image alignment, and insufficient diversity—which significantly hinder the performance of open models compared to proprietary systems. To bridge this gap, the authors construct the first large-scale open-source T2I dataset that approaches pretraining-scale size while maintaining fine-tuning-level quality. The dataset integrates synthetically generated images with real photographs from professional photographers, covering 10 task combinations, 32 prompt categories, 11 visual styles, and 5 prompt templates. Through rigorous multi-stage automated and manual filtering focused on text-image alignment, visual fidelity, and prompt quality, only approximately 5% of the highest-quality samples are retained. Experiments across multiple diffusion and autoregressive models demonstrate substantial improvements in image generation quality and instruction-following capability, effectively narrowing the data divide between open-source and enterprise-grade T2I models.
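The multi-stage filtering described above can be pictured as a sequence of quality gates, each discarding candidates below a threshold. The sketch below is illustrative only: the field names, scoring stages, and threshold values are assumptions for demonstration, not the paper's actual implementation (which also includes manual review), and the scores stand in for model-based metrics such as text-image similarity.

```python
# Hypothetical sketch of a multi-stage quality filter, assuming each
# candidate sample carries precomputed scores for text-image alignment,
# visual fidelity, and prompt quality. Stage names and thresholds are
# illustrative assumptions, not the paper's actual values.

def multi_stage_filter(samples, thresholds):
    """Keep only samples that pass every quality stage in order."""
    stages = ["alignment", "fidelity", "prompt_quality"]
    survivors = samples
    for stage in stages:
        # Each stage prunes the survivors of the previous one.
        survivors = [s for s in survivors if s[stage] >= thresholds[stage]]
    return survivors

candidates = [
    {"id": 0, "alignment": 0.92, "fidelity": 0.88, "prompt_quality": 0.90},
    {"id": 1, "alignment": 0.40, "fidelity": 0.95, "prompt_quality": 0.85},
    {"id": 2, "alignment": 0.91, "fidelity": 0.30, "prompt_quality": 0.80},
]
kept = multi_stage_filter(
    candidates,
    {"alignment": 0.9, "fidelity": 0.8, "prompt_quality": 0.7},
)
print([s["id"] for s in kept])  # only sample 0 passes all three stages
```

Applying such gates sequentially is what produces an aggressive overall retention rate (here 1 of 3; in the paper, roughly 5% of candidates survive).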

📝 Abstract
High-quality and open datasets remain a major bottleneck for text-to-image (T2I) fine-tuning. Despite rapid progress in model architectures and training pipelines, most publicly available fine-tuning datasets suffer from low resolution, poor text-image alignment, or limited diversity, resulting in a clear performance gap between open research models and enterprise-grade models. In this work, we present Fine-T2I, a large-scale, high-quality, and fully open dataset for T2I fine-tuning. Fine-T2I spans 10 task combinations, 32 prompt categories, 11 visual styles, and 5 prompt templates, and combines synthetic images generated by strong modern models with carefully curated real images from professional photographers. All samples are rigorously filtered for text-image alignment, visual fidelity, and prompt quality, with over 95% of initial candidates removed. The final dataset contains over 6 million text-image pairs, around 2 TB on disk, approaching the scale of pretraining datasets while maintaining fine-tuning-level quality. Across a diverse set of pretrained diffusion and autoregressive models, fine-tuning on Fine-T2I consistently improves both generation quality and instruction adherence, as validated by human evaluation, visual comparison, and automatic metrics. We release Fine-T2I under an open license to help close the data gap in T2I fine-tuning in the open community.
Problem

Research questions and friction points this paper is trying to address.

text-to-image
fine-tuning
dataset
data gap
open research
Innovation

Methods, ideas, or system contributions that make the work stand out.

text-to-image
fine-tuning dataset
high-quality data
open dataset
instruction adherence
🔎 Similar Papers
2024-08-10 · AAAI Conference on Artificial Intelligence · Citations: 30
Xu Ma
Northeastern University
Computer Vision · Machine Learning · Generative AI · Multimodal LLMs
Yitian Zhang
Northeastern University
Computer Vision
Qihua Dong
Department of Electrical & Computer Engineering, Northeastern University, Boston
Yun Fu
Department of Electrical & Computer Engineering, Northeastern University, Boston