🤖 AI Summary
This work addresses the shortage of large-scale, permissively licensed, and reliably accessible image datasets for research on visual generative modeling. The authors introduce GPIC (Giant Permissive Image Corpus), which contains 100 million training images plus 200K validation and 1M test images, approximately 28 trillion pixels in total. All images are permissively licensed for both research and commercial use, safety-filtered, and deduplicated, and each is paired with a caption generated by a state-of-the-art vision-language model. GPIC is centrally hosted on Hugging Face. Alongside the training, validation, and test splits, the release includes a benchmarking protocol for generative modeling and a reference baseline trained with pixel-space flow matching, making GPIC a scalable, safety-filtered, ready-to-use benchmark resource for visual generation.
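The summary names pixel-space flow matching as the baseline method but gives no details. As a rough illustration only, here is a generic conditional flow-matching objective with a straight-line (rectified-flow-style) path from Gaussian noise to pixels, plus Euler sampling. This is not the authors' baseline. The function names, the hypothetical `velocity_fn(x_t, t)` model interface, uniform time sampling, and the omission of caption conditioning are all assumptions made for the sketch.

```python
import numpy as np


def flow_matching_loss(velocity_fn, x1, rng):
    """Conditional flow-matching loss with a linear noise-to-data path (sketch).

    x1: batch of images in pixel space, shape (B, C, H, W).
    velocity_fn: hypothetical model v(x_t, t) returning a tensor shaped like x_t.
    """
    b = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)               # Gaussian noise endpoint
    t = rng.uniform(0.0, 1.0, size=(b, 1, 1, 1))     # one time per example, broadcast over pixels
    xt = (1.0 - t) * x0 + t * x1                     # point on the straight path at time t
    target = x1 - x0                                 # d(x_t)/dt along that path
    pred = velocity_fn(xt, t)
    return float(np.mean((pred - target) ** 2))      # mean squared velocity error


def sample(velocity_fn, shape, steps, rng):
    """Euler integration of dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0], 1, 1, 1), i * dt)     # current time, broadcast over pixels
        x = x + dt * velocity_fn(x, t)
    return x
```

As a sanity check, if every training image equals one fixed array `c`, the exact velocity field is `v(x, t) = (c - x) / (1 - t)`: with it the loss is essentially zero, and Euler sampling reaches `c` at the final step. A learned pixel-space model would replace this oracle with a network conditioned on the caption.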
📝 Abstract
Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, including 100M training, 200K validation, and 1M test examples. Moreover, all GPIC images are permissively licensed for both research and commercial use. GPIC is safety-filtered, deduplicated, and centrally hosted on Hugging Face. We provide a benchmarking protocol for generative modeling on GPIC. Finally, we provide a reference baseline for pixel-space flow matching on GPIC. Our dataset, benchmark, and models are available at https://huggingface.co/datasets/stanford-vision-lab/gpic. Evaluation toolkit and code are available at https://gpic.stanford.edu.
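The abstract gives a pixel total and split sizes but no resolution statistics. A back-of-the-envelope check of what the ~28 trillion figure implies per image, assuming the total covers all three splits (the abstract does not state this explicitly); the constant and function names are illustrative:

```python
import math

# Figures quoted in the abstract.
TOTAL_PIXELS = 28e12  # "approximately 28 trillion pixels"
SPLITS = {"train": 100_000_000, "validation": 200_000, "test": 1_000_000}


def mean_pixels_per_image(total_pixels, splits):
    """Average pixel count per image, assuming the total spans every split."""
    return total_pixels / sum(splits.values())


def equivalent_square_side(pixels):
    """Side length of a square image with the given pixel count."""
    return math.sqrt(pixels)


avg = mean_pixels_per_image(TOTAL_PIXELS, SPLITS)
side = equivalent_square_side(avg)
print(f"{sum(SPLITS.values()):,} images, ~{avg:,.0f} px/image, ~{side:.0f}x{side:.0f}")
# prints: 101,200,000 images, ~276,680 px/image, ~526x526
```

If the 28 trillion figure instead covered only the 100M training images, the average would be 280,000 pixels (about 529 x 529). Either way this is a mean over the corpus, not a fixed resolution.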