UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset

📅 2025-10-23
🤖 AI Summary
To address the scarcity of large-scale, high-quality datasets and of training strategies tailored to fine-grained detail synthesis in ultra-high-resolution (UHR) text-to-image generation, this work introduces UltraHR-100K, a dataset of 100K UHR images, each exceeding 3K resolution and paired with a rich caption. It further proposes a frequency-aware post-training framework that combines Detail-Oriented Timestep Sampling (DOTS) with Soft-Weighting Frequency Regularization (SWFR), a discrete Fourier transform (DFT)-based soft constraint on frequency components, to improve the model's ability to capture high-frequency textures and structural details. On the newly constructed UltraHR-eval4K benchmark, the method improves fine-grained detail quality and overall fidelity. The work illustrates the value of co-optimizing dataset curation and training design for UHR generative modeling.

📝 Abstract
Ultra-high-resolution (UHR) text-to-image (T2I) generation has seen notable progress. However, two key challenges remain: (1) the absence of a large-scale, high-quality UHR T2I dataset, and (2) the neglect of tailored training strategies for fine-grained detail synthesis in UHR scenarios. To tackle the first challenge, we introduce UltraHR-100K, a high-quality dataset of 100K UHR images with rich captions, offering diverse content and strong visual fidelity. Each image exceeds 3K resolution and is rigorously curated based on detail richness, content complexity, and aesthetic quality. To tackle the second challenge, we propose a frequency-aware post-training method that enhances fine-detail generation in T2I diffusion models. Specifically, we design (i) Detail-Oriented Timestep Sampling (DOTS) to focus learning on detail-critical denoising steps, and (ii) Soft-Weighting Frequency Regularization (SWFR), which leverages the Discrete Fourier Transform (DFT) to softly constrain frequency components, encouraging high-frequency detail preservation. Extensive experiments on our proposed UltraHR-eval4K benchmark demonstrate that our approach significantly improves the fine-grained detail quality and overall fidelity of UHR image generation. The code is available at https://github.com/NJU-PCALab/UltraHR-100k.
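
As a rough illustration of what a DFT-based soft frequency constraint can look like, the sketch below compares a prediction and a target in the frequency domain and weights each frequency bin by a factor that grows smoothly with its radial frequency. This is not the paper's SWFR formulation, which the abstract does not spell out: the linear weight `1 + alpha * r`, the function names, and the `alpha` parameter are all assumptions made for illustration.

```python
import numpy as np


def radial_frequency(height, width):
    """Radial frequency of each 2D DFT bin, normalized to [0, 1]."""
    fy = np.fft.fftfreq(height)[:, None]  # cycles per sample, rows
    fx = np.fft.fftfreq(width)[None, :]   # cycles per sample, columns
    radius = np.sqrt(fx ** 2 + fy ** 2)
    peak = radius.max()
    return radius / peak if peak > 0 else radius


def soft_weighted_frequency_loss(pred, target, alpha=1.0):
    """Weighted mean squared difference between the 2D DFTs of pred and target.

    Hypothetical weighting (an assumption, not taken from the paper):
    weight = 1 + alpha * r, where r is the normalized radial frequency,
    so errors in high-frequency bins cost more but no bin is masked out.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    height, width = pred.shape[-2:]
    weights = 1.0 + alpha * radial_frequency(height, width)
    # norm="ortho" keeps the transform unitary, so alpha=0 reduces to plain MSE.
    diff = np.fft.fft2(pred, norm="ortho") - np.fft.fft2(target, norm="ortho")
    return float(np.mean(weights * np.abs(diff) ** 2))
```

With `alpha=0` the loss equals the ordinary pixel-space mean squared error (by Parseval's theorem), so `alpha` controls how much extra emphasis high-frequency errors receive. A term like this would typically be added to the standard training loss rather than replace it.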
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of large-scale high-quality UHR image datasets
Developing tailored training strategies for fine-grained detail synthesis
Enhancing high-frequency detail preservation in UHR image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

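Created the UltraHR-100K dataset of 100K UHR images, each exceeding 3K resolution and paired with a rich caption
Proposed Detail-Oriented Timestep Sampling (DOTS) to focus training on detail-critical denoising steps
Introduced Soft-Weighting Frequency Regularization (SWFR), a DFT-based soft constraint on frequency components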
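
The idea of focusing training on particular denoising steps can be sketched as drawing timesteps from a non-uniform distribution in place of the usual uniform one. The example below is only an illustration under stated assumptions: this page does not say which steps DOTS treats as detail-critical or which distribution it uses, so the Beta distribution, the assumption that low timestep indices (low noise) carry fine detail, and the names `sample_timesteps`, `a`, and `b` are all hypothetical.

```python
import numpy as np


def sample_timesteps(batch_size, num_steps=1000, a=1.0, b=3.0, rng=None):
    """Draw integer timesteps in [0, num_steps - 1] from a Beta(a, b) profile.

    Assumption for illustration: timestep 0 is the least noisy step, and
    fine detail is refined at low-noise steps. With a < b the samples
    concentrate near 0; a = b = 1 recovers uniform sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.beta(a, b, size=batch_size)            # values in [0, 1]
    steps = (u * num_steps).astype(np.int64)       # scale to step indices
    return np.minimum(steps, num_steps - 1)        # guard the u == 1.0 edge


# Usage: one batch of training timesteps biased toward low-noise steps.
timesteps = sample_timesteps(batch_size=8, rng=np.random.default_rng(0))
```

Changing `a` and `b` shifts where the sampling mass sits, so the same function can emphasize early, middle, or late steps depending on which ones turn out to matter for detail.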
Chen Zhao
State Key Laboratory of Novel Software Technology, Nanjing University, China
En Ci
State Key Laboratory of Novel Software Technology, Nanjing University, China
Yunzhe Xu
State Key Laboratory of Novel Software Technology, Nanjing University, China
Tiehan Fan
Nanjing University
AIGC, MultiModal Learning
Shanyan Guan
vivo Mobile Communication Co., Ltd., China
Yanhao Ge
vivo Mobile Communication Co., Ltd., China
Jian Yang
State Key Laboratory of Novel Software Technology, Nanjing University, China
Ying Tai
State Key Laboratory of Novel Software Technology, Nanjing University, China