InstructRestore: Region-Customized Image Restoration with Human Instructions

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image restoration methods typically process the whole image uniformly, so they cannot follow natural-language instructions that target a specific region. This paper proposes InstructRestore, an instruction-driven, region-adjustable diffusion restoration framework that supports local control such as “background bokeh” or “enhance the face in the top-left”. Key contributions include: (1) a data generation engine and a dataset of 536,945 (about 537K) triplets, each pairing a high-quality image with a target region description and the corresponding region mask; (2) a ControlNet-like model that identifies the target region named in the instruction; and (3) region-wise control of detail enhancement, achieved by assigning different integration scales for low-quality image features to the target and surrounding regions. Experiments show that the approach handles instructed tasks including bokeh effects and localized enhancement.

📝 Abstract
Despite the significant progress in diffusion prior-based image restoration, most existing methods apply uniform processing to the entire image, lacking the capability to perform region-customized image restoration according to user instructions. In this work, we propose a new framework, namely InstructRestore, to perform region-adjustable image restoration following human instructions. To achieve this, we first develop a data generation engine to produce training triplets, each consisting of a high-quality image, the target region description, and the corresponding region mask. With this engine and careful data screening, we construct a comprehensive dataset comprising 536,945 triplets to support the training and evaluation of this task. We then examine how to integrate the low-quality image features under the ControlNet architecture to adjust the degree of image detail enhancement. Consequently, we develop a ControlNet-like model to identify the target region and allocate different integration scales to the target and surrounding regions, enabling region-customized image restoration that aligns with user instructions. Experimental results demonstrate that our proposed InstructRestore approach enables effective human-instructed image restoration, such as images with bokeh effects and user-instructed local enhancement. Our work advances the investigation of interactive image restoration and enhancement techniques. Data, code, and models will be available at https://github.com/shuaizhengliu/InstructRestore.git.
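
The region-wise integration scale described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a binary region mask is already available, treats feature maps as plain 2D grids, and uses hypothetical names (`region_scale_map`, `fuse_features`, `target_scale`, `background_scale`) chosen only for illustration. It shows one plausible way a ControlNet-style residual could be weighted differently inside and outside the target region.

```python
from typing import List

Grid = List[List[float]]


def region_scale_map(mask: Grid, target_scale: float, background_scale: float) -> Grid:
    """Build a per-pixel scale: target_scale where mask is 1, background_scale where it is 0."""
    return [
        [m * target_scale + (1.0 - m) * background_scale for m in row]
        for row in mask
    ]


def fuse_features(backbone: Grid, control: Grid, scale_map: Grid) -> Grid:
    """Add the control-branch features to the backbone features, weighted per pixel."""
    return [
        [b + s * c for b, c, s in zip(b_row, c_row, s_row)]
        for b_row, c_row, s_row in zip(backbone, control, scale_map)
    ]


# Toy 2x2 example: only the top-left pixel belongs to the target region.
mask = [[1.0, 0.0], [0.0, 0.0]]
backbone = [[1.0, 1.0], [1.0, 1.0]]   # stand-in for backbone features
control = [[2.0, 2.0], [2.0, 2.0]]    # stand-in for low-quality-image control features

scales = region_scale_map(mask, target_scale=1.0, background_scale=0.25)
fused = fuse_features(backbone, control, scales)
print(scales)  # [[1.0, 0.25], [0.25, 0.25]]
print(fused)   # [[3.0, 1.5], [1.5, 1.5]]
```

In this toy setup the target pixel receives the full control signal while the surrounding pixels receive a quarter of it. How the scale values map to stronger or weaker detail enhancement in the actual model is not specified here and is left as an assumption.
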
Problem

Research questions and friction points this paper is trying to address.

Enabling region-customized image restoration driven by user instructions
Generating training data for target-region enhancement tasks
Adjusting the degree of detail enhancement per region within a ControlNet architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-customized restoration via human instructions
ControlNet-like model for targeted region enhancement
Data generation engine for training triplets
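
For readers who want a concrete picture of a training triplet (high-quality image, target region description, region mask), a minimal sketch follows. The class name, field names, the grayscale nested-list image format, and the consistency check are all assumptions made for illustration; the released dataset may be organized quite differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RestoreTriplet:
    """Hypothetical container for one image / description / mask triplet."""
    image: List[List[float]]       # high-quality image (grayscale H x W here for brevity)
    region_description: str        # text naming the target region
    region_mask: List[List[int]]   # 1 inside the target region, 0 elsewhere

    def is_consistent(self) -> bool:
        """Check that the mask is binary, matches the image size, and has a description."""
        same_height = len(self.image) == len(self.region_mask)
        same_width = all(
            len(img_row) == len(mask_row)
            for img_row, mask_row in zip(self.image, self.region_mask)
        )
        binary = all(v in (0, 1) for row in self.region_mask for v in row)
        return same_height and same_width and binary and bool(self.region_description.strip())

    def region_fraction(self) -> float:
        """Fraction of pixels covered by the target region."""
        total = sum(len(row) for row in self.region_mask)
        return sum(sum(row) for row in self.region_mask) / total if total else 0.0


# Illustrative sample; the description text is made up for this example.
sample = RestoreTriplet(
    image=[[0.1, 0.2], [0.3, 0.4]],
    region_description="the face in the top-left",
    region_mask=[[1, 0], [0, 0]],
)
print(sample.is_consistent())    # True
print(sample.region_fraction())  # 0.25
```

A check like `is_consistent` is one simple form the "careful data screening" step could take, though the screening criteria actually used are not described in this summary.
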