BrushEdit: All-In-One Image Inpainting and Editing

📅 2024-12-13

🏛️ arXiv.org

📈 Citations: 6

✨ Influential: 1

🤖 AI Summary

Existing image editing methods hit two bottlenecks on large modifications such as adding or removing objects: inversion-based approaches struggle because the structured inversion noise resists substantial change, while instruction-based methods work as black boxes and give users little direct control over the editing region and intensity. This paper proposes BrushEdit, an inpainting-based, instruction-guided editing paradigm that combines a multimodal large language model (MLLM) with a dual-branch inpainting model in an agent-cooperative framework. The framework handles four steps: editing category classification, main object identification, mask acquisition, and inpainting of the editing area. Users can give free-form instructions and interact directly with the editing region and intensity. The paper reports superior performance across seven metrics, including mask region preservation and editing effect coherence.

📝 Abstract

Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods. However, current inversion-based approaches struggle with big modifications (e.g., adding or removing objects) due to the structured nature of inversion noise, which hinders substantial changes. Meanwhile, instruction-based methods often constrain users to black-box operations, limiting direct interaction for specifying editing regions and intensity. To address these limitations, we propose BrushEdit, a novel inpainting-based instruction-guided image editing paradigm, which leverages multimodal large language models (MLLMs) and image inpainting models to enable autonomous, user-friendly, and interactive free-form instruction editing. Specifically, we devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model in an agent-cooperative framework to perform editing category classification, main object identification, mask acquisition, and editing area inpainting. Extensive experiments show that our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics including mask region preservation and editing effect coherence.
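The four steps named in the abstract (editing category classification, main object identification, mask acquisition, and editing area inpainting) can be read as a simple sequential pipeline. The sketch below is only an illustration under stated assumptions, not the BrushEdit implementation: `run_edit`, the `toy_*` functions, the category names, and the optional `user_mask` argument are all hypothetical, and the stand-ins replace the MLLM and the dual-branch inpainting model with trivial rules so the control flow can run on a toy image.

```python
from typing import Callable, List, Optional

Image = List[List[int]]  # toy grayscale image: rows of pixel values
Mask = List[List[int]]   # 1 = region to edit, 0 = region to preserve


def run_edit(
    image: Image,
    instruction: str,
    classify: Callable[[str], str],
    identify: Callable[[str, str], str],
    acquire_mask: Callable[[Image, str], Mask],
    inpaint: Callable[[Image, Mask, str], Image],
    user_mask: Optional[Mask] = None,
) -> Image:
    """Hypothetical orchestration of the four steps listed in the abstract."""
    category = classify(instruction)           # 1. editing category classification
    target = identify(instruction, category)   # 2. main object identification
    # 3. mask acquisition; a user-supplied mask (an assumption here) takes priority
    mask = user_mask if user_mask is not None else acquire_mask(image, target)
    return inpaint(image, mask, instruction)   # 4. editing area inpainting


# Toy stand-ins for the MLLM and the inpainting model (illustrative only).
def toy_classify(instruction: str) -> str:
    return "remove" if "remove" in instruction.lower() else "add"


def toy_identify(instruction: str, category: str) -> str:
    return instruction.split()[-1]  # pretend the last word names the object


def toy_mask(image: Image, target: str) -> Mask:
    width = len(image[0])
    return [[1 if x < width // 2 else 0 for x in range(width)] for _ in image]


def toy_inpaint(image: Image, mask: Mask, instruction: str) -> Image:
    # Overwrite masked pixels with a placeholder value; keep the rest unchanged.
    return [
        [255 if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


image = [[10, 20, 30, 40], [50, 60, 70, 80]]
edited = run_edit(image, "remove the dog", toy_classify, toy_identify,
                  toy_mask, toy_inpaint)
```

Pixels outside the mask pass through untouched in this toy version, which is the property the abstract's "mask region preservation" metric is concerned with. Passing `user_mask` skips the automatic mask step, one plausible way to model direct user control over the editing region.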

Problem

Research questions and friction points this paper is trying to address.

Overcoming the limits of inversion-based editing on large modifications such as adding or removing objects

Addressing the black-box nature of instruction-based methods, which limits user control over editing region and intensity

Integrating MLLMs and inpainting models for free-form instruction editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates MLLMs and inpainting models for editing

Enables free-form instruction-based interactive editing

Uses a dual-branch inpainting model to fill the editing area once the mask has been acquired

🔎 Similar Papers

Streamlining Image Editing with Layered Diffusion Brushes