🤖 AI Summary
Existing image restoration methods typically apply global, uniform processing, which limits their ability to support natural-language-guided, region-specific editing. This paper proposes an instruction-driven, region-adaptive diffusion-based framework that, for the first time, enables fine-grained, interpretable, and interactive local editing (e.g., "background bokeh" or "enhance the face in the top-left"). Key contributions include: (1) constructing the first large-scale dataset of 537K high-resolution image–mask–instruction triplets; (2) designing a ControlNet-inspired, region-aware architecture that fuses multi-scale visual features with instruction-aligned image representations; and (3) achieving precise spatial control via mask-guided conditioning. Extensive experiments demonstrate significant improvements over global restoration baselines on tasks including bokeh synthesis and localized detail enhancement, validating both effectiveness and controllability.
📝 Abstract
Despite the significant progress in diffusion prior-based image restoration, most existing methods apply uniform processing to the entire image, lacking the capability to perform region-customized restoration according to user instructions. In this work, we propose a new framework, namely InstructRestore, to perform region-adjustable image restoration following human instructions. To achieve this, we first develop a data generation engine to produce training triplets, each consisting of a high-quality image, the target region description, and the corresponding region mask. With this engine and careful data screening, we construct a comprehensive dataset comprising 536,945 triplets to support the training and evaluation of this task. We then examine how to integrate the low-quality image features under the ControlNet architecture so as to adjust the degree of image detail enhancement. Consequently, we develop a ControlNet-like model that identifies the target region and allocates different integration scales to the target and surrounding regions, enabling region-customized image restoration aligned with user instructions. Experimental results demonstrate that our proposed InstructRestore approach enables effective human-instructed image restoration, such as images with bokeh effects and user-instructed local enhancement. Our work advances the investigation of interactive image restoration and enhancement techniques. Data, code, and models can be found at https://github.com/shuaizhengliu/InstructRestore.git.
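The core mechanism described above (allocating different ControlNet feature-integration scales to the masked target region versus its surroundings) can be sketched as a simple mask-weighted feature blend. This is an illustrative simplification, not the paper's actual implementation: the function name, the per-region scalars `s_target` and `s_background`, and the linear blending rule are all assumptions for exposition.

```python
import numpy as np

def region_adaptive_fusion(base_feats, control_feats, mask,
                           s_target=1.0, s_background=0.3):
    """Blend control-branch features into base diffusion features with
    region-dependent integration scales (illustrative sketch only).

    base_feats, control_feats: (C, H, W) feature maps
    mask: (H, W) binary map, 1 inside the user-instructed region
    s_target / s_background: hypothetical scales, not from the paper
    """
    # Per-pixel scale: strong integration inside the region, weak outside.
    scale_map = s_target * mask + s_background * (1.0 - mask)  # (H, W)
    # Broadcast the scale map over the channel dimension and fuse.
    return base_feats + scale_map[None, :, :] * control_feats

# Toy demo: 2-channel 4x4 features; the instructed region is the left half.
F = np.zeros((2, 4, 4))          # base diffusion features
C = np.ones((2, 4, 4))           # control features
M = np.zeros((4, 4)); M[:, :2] = 1.0
out = region_adaptive_fusion(F, C, M)
```

In this toy example the left half of `out` receives the full control signal (scale 1.0) while the right half receives only the attenuated 0.3 contribution, which is the behavior the abstract attributes to its region-aware integration scheme.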