PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

📅 2026-05-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of text-to-image synthesis at native 100-megapixel (100MP) resolution, which has been hindered by the scarcity of high-quality ultra-high-resolution (UHR) data and the inherent complexity of image content. To overcome these limitations, the authors introduce PixVerve-95K, an open-source dataset comprising 95,000 UHR images accompanied by seven-dimensional fine-grained annotations. Leveraging this dataset, they scale multiple text-to-image foundation models to natively generate 100MP images. Additionally, they propose PixVerve-Bench, a multimodal evaluation benchmark that integrates visual fidelity and semantic alignment metrics. Experimental results demonstrate that the proposed approach substantially improves both generation quality and text-image consistency in the UHR regime, achieving, for the first time, native 100MP text-to-image synthesis and establishing an integrated foundation of data, models, and evaluation for future research in this domain.
📝 Abstract
Text-to-Image (T2I) models have recently made notable progress at around 1K and 2K resolution. With growing expectations for richer visual experiences and rapid advances in imaging technology, the demand for Ultra-High-Resolution (UHR) image generation has grown significantly. However, UHR image generation remains highly challenging because high-resolution content is both scarce and complex. In this paper, we first introduce PixVerve-95K, a high-quality, open-source UHR T2I dataset curated with a carefully designed data pipeline. It contains 95K images across diverse scenarios, each with at least 100M pixels, along with seven-dimensional annotations. Building on this large-scale image-text dataset, we take a pioneering step and extend various T2I foundation models to native 100MP generation using three training schemes. Finally, our proposed PixVerve-Bench benchmark combines conventional metrics with multimodal large language model-based assessments to establish a comprehensive evaluation protocol for UHR images that covers visual quality and semantic alignment. Extensive experimental results on our benchmark, together with a constructive exploration of training strategies, provide valuable insights for future breakthroughs.
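To make the 100M-pixel floor mentioned in the abstract concrete, the following is a minimal sketch of a resolution filter. It is not the authors' data pipeline, whose details are not given here; the names `MIN_PIXELS`, `meets_uhr_threshold`, and the sample image sizes are hypothetical and chosen only for illustration.

```python
# Minimal sketch: keep only images whose pixel count reaches 100 MP.
# Assumption: the threshold is applied to width * height, as the abstract's
# "minimum pixel-count of 100M" suggests. Everything else is illustrative.

MIN_PIXELS = 100_000_000  # 100 megapixels


def pixel_count(width: int, height: int) -> int:
    """Return the total number of pixels in a width x height image."""
    return width * height


def meets_uhr_threshold(width: int, height: int, min_pixels: int = MIN_PIXELS) -> bool:
    """True if the image has at least `min_pixels` pixels."""
    return pixel_count(width, height) >= min_pixels


# Hypothetical candidate images: name -> (width, height)
candidates = {
    "square_10k": (10_000, 10_000),   # exactly 100,000,000 pixels
    "uhd_8k": (7_680, 4_320),         # 33,177,600 pixels
    "wide_12k": (12_000, 9_000),      # 108,000,000 pixels
}

kept = [name for name, (w, h) in candidates.items() if meets_uhr_threshold(w, h)]
print(kept)
```

For scale, a standard 8K UHD frame (7680 x 4320) holds about 33 million pixels, roughly a third of the 100MP floor, so it would be excluded by this check.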
Problem

Research questions and friction points this paper is trying to address.

Ultra-High-Resolution
Text-to-Image
100MP
image generation
data scarcity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ultra-High-Resolution (UHR)
Text-to-Image Generation
100MP Native Generation
Large-Scale Dataset
Multimodal Evaluation Benchmark