VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

📅 2026-05-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limited progress in ultra-high-resolution (UHR) image editing, which stems from the scarcity of high-quality data and the difficulty of modeling high-frequency texture details. To overcome these challenges, we introduce VINS-120K, the first large-scale dataset for instruction-based UHR image editing, comprising 120K carefully curated triplets of instruction, input image, and edited image, each image at 4K resolution or above and filtered through a multi-stage pipeline. We further propose a high-frequency-aware post-adaptation strategy that extends pretrained non-high-resolution models to the UHR regime. Additionally, we establish VINS-4KEval, a benchmark covering diverse editing types. Experiments show that our approach improves fine-grained detail synthesis and texture realism in UHR image editing.
📝 Abstract
Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the difficulty of modeling high-frequency texture details. We introduce VINS-120K, the first large-scale dataset for instruction-based UHR image editing, comprising 120K carefully curated triplets of instruction, input image, and edited image. Each image is at least 4K resolution ($\geq 4096 \times 4096$) and is filtered through a rigorous multi-stage pipeline to ensure visual quality, instruction alignment, and aesthetic fidelity. Building on VINS-120K, we further develop a high-frequency-aware post-adaptation strategy to extend pretrained non-high-resolution models to the UHR regime. We also present VINS-4KEval, a benchmark covering diverse editing types, to facilitate consistent evaluation in UHR settings. Experiments confirm that our work improves fine-grained detail synthesis and texture realism in UHR image editing.
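The triplet structure and the resolution threshold described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `EditTriplet` record, its field names, and the helper functions are hypothetical, and reading "$\geq 4096 \times 4096$" as "both width and height are at least 4096 pixels" is an assumption. The quality, instruction-alignment, and aesthetic stages of the actual filtering are not specified in the abstract and are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumption: the threshold applies to each side of the image independently.
MIN_SIDE = 4096


@dataclass(frozen=True)
class EditTriplet:
    """Hypothetical record: instruction, input image, and edited image.

    Only image sizes are stored so the sketch stays dependency-free.
    """
    instruction: str
    input_size: Tuple[int, int]   # (width, height) of the input image
    edited_size: Tuple[int, int]  # (width, height) of the edited image


def meets_resolution(size: Tuple[int, int], min_side: int = MIN_SIDE) -> bool:
    """Return True when both sides reach the minimum side length."""
    width, height = size
    return width >= min_side and height >= min_side


def keep_triplet(triplet: EditTriplet) -> bool:
    """Resolution stage only: non-empty instruction and two large-enough images."""
    return (
        bool(triplet.instruction.strip())
        and meets_resolution(triplet.input_size)
        and meets_resolution(triplet.edited_size)
    )


def filter_triplets(triplets: Iterable[EditTriplet]) -> List[EditTriplet]:
    """Keep only the triplets that pass the resolution stage."""
    return [t for t in triplets if keep_triplet(t)]
```

For example, a triplet whose edited image is 4096 × 2160 would be dropped under this reading, because its height falls below 4096 even though its width qualifies.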
Problem

Research questions and friction points this paper is trying to address.

ultra-high-resolution image editing
high-frequency texture
large-scale dataset
instruction-based editing
4K resolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

ultra-high-resolution image editing
large-scale dataset
high-frequency-aware adaptation
instruction-based editing
4K evaluation benchmark