🤖 AI Summary
Existing image editing methods struggle to simultaneously preserve the sparse structural integrity of line art, support high-level semantic modifications, and enable precise local redrawing, which limits efficiency in digital illustration workflows. This paper proposes the first unified editing framework tailored to line art creation. We design a chained data generation pipeline based on attribute addition and removal, and introduce a task-oriented vision-text dual-path Mixture-of-Experts LoRA (MoE-LoRA) architecture that jointly optimizes semantic-instruction-driven global editing and stroke-guided local redrawing. Key innovations include RGB-channel reuse encoding, a style-aware attribute-removal module, and cross-sequence multi-step editing chains. Our method achieves state-of-the-art performance on both semantic editing and local redrawing, significantly outperforming baselines in instruction adherence, structural fidelity, and style consistency.
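The RGB-channel reuse idea can be pictured as packing the different editing inputs into the three channels of a single image, so one input interface serves both editing modes. The sketch below is a minimal illustration under assumed channel roles (sketch in R, edit-region mask in G, stroke guidance in B); the function name and exact layout are hypothetical, not the paper's specification.

```python
import numpy as np

def pack_inputs(sketch, mask=None, strokes=None):
    """Hypothetical packing of editing inputs into one 3-channel image.

    Assumed layout (for illustration only):
      R = source sketch,
      G = edit-region mask (all ones for a global, instruction-only edit),
      B = user stroke guidance (all zeros when no strokes are given).
    """
    h, w = sketch.shape
    if mask is None:
        # No region specified: treat the whole canvas as editable.
        mask = np.ones((h, w), dtype=sketch.dtype)
    if strokes is None:
        # Instruction-guided mode: no line guidance present.
        strokes = np.zeros((h, w), dtype=sketch.dtype)
    return np.stack([sketch, mask, strokes], axis=-1)  # shape (H, W, 3)
```

With this convention, switching between instruction-guided editing and line-guided redrawing is just a matter of which channels carry non-trivial content, so the downstream editor sees one fixed input format.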
📝 Abstract
Sketch editing is central to digital illustration, yet existing image editing systems struggle to preserve the sparse, style-sensitive structure of line art while supporting both high-level semantic changes and precise local redrawing. We present SketchAssist, an interactive sketch drawing assistant that accelerates creation by unifying instruction-guided global edits with line-guided region redrawing, while keeping unrelated regions and overall composition intact. To enable this assistant at scale, we introduce a controllable data generation pipeline that (i) constructs attribute-addition sequences from attribute-free base sketches, (ii) forms multi-step edit chains via cross-sequence sampling, and (iii) expands stylistic coverage with a style-preserving attribute-removal model applied to diverse sketches. Building on this data, SketchAssist employs a unified sketch editing framework with minimal changes to DiT-based editors. We repurpose the RGB channels to encode the inputs, enabling seamless switching between instruction-guided edits and line-guided redrawing within a single input interface. To further specialize behavior across modes, we integrate a task-guided mixture-of-experts into LoRA layers, routing by text and visual cues to improve semantic controllability, structural fidelity, and style preservation. Extensive experiments show state-of-the-art results on both tasks, with superior instruction adherence and style/structure preservation compared to recent baselines. Together, our dataset and SketchAssist provide a practical, controllable assistant for sketch creation and revision.
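The task-guided mixture-of-experts in the LoRA layers can be sketched as a frozen base weight plus several low-rank expert deltas, mixed by a gate conditioned on a task cue (e.g. a text or visual embedding). The class below is a minimal NumPy sketch under assumed shapes and a single-vector gate; all names (`MoELoRALinear`, `task_cue`) are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

class MoELoRALinear:
    """Hypothetical task-guided MoE-LoRA linear layer (illustrative only):
    y = W x + sum_e g_e(task_cue) * B_e A_e x, with W frozen."""

    def __init__(self, d_in, d_out, rank=4, n_experts=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in)) * 0.02       # frozen base weight
        self.A = rng.normal(size=(n_experts, rank, d_in)) * 0.02
        self.B = np.zeros((n_experts, d_out, rank))          # zero-init, LoRA style
        self.Wg = rng.normal(size=(n_experts, d_in)) * 0.02  # gate over task cues

    def forward(self, x, task_cue):
        g = softmax(self.Wg @ task_cue)  # soft routing weights over experts
        y = self.W @ x
        for e in range(len(g)):
            y = y + g[e] * (self.B[e] @ (self.A[e] @ x))
        return y, g
```

Because the `B` matrices are zero-initialized, the layer starts as the identity over the frozen editor and only the routed low-rank deltas are learned, which is what lets the two editing modes specialize without disturbing the base model.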