Image Referenced Sketch Colorization Based on Animation Creation Workflow

📅 2025-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing sketch colorization methods suffer from imprecise text-guided colorization, heavy reliance on manual prompt engineering, and artifact-prone image references. Addressing these challenges for animation production, this paper proposes a diffusion-based structure-color disentanglement colorization framework that leverages sketches as geometric guidance and RGB images as color references. It introduces, for the first time, a split cross-attention mechanism with LoRA fine-tuning modules to independently model and controllably edit foreground and background features. Additionally, spatial masking guidance and a switchable inference mode are incorporated to mitigate inter-region interference and spatial artifacts. Experiments demonstrate that the method consistently produces high-fidelity, artifact-free results, even under severe geometric misalignment, outperforming state-of-the-art approaches in qualitative evaluation, quantitative metrics (e.g., LPIPS, FID), and user studies. Ablation studies validate the effectiveness of each component.

📝 Abstract
Sketch colorization plays an important role in animation and digital illustration production. However, existing methods still face problems: text-guided methods fail to provide accurate color and style reference, hint-guided methods still involve manual operation, and image-referenced methods are prone to artifacts. To address these limitations, we propose a diffusion-based framework inspired by real-world animation production workflows. Our approach leverages the sketch as spatial guidance and an RGB image as the color reference, and separately extracts foreground and background from the reference image with spatial masks. In particular, we introduce a split cross-attention mechanism with LoRA (Low-Rank Adaptation) modules. They are trained separately on foreground and background regions to control the corresponding key and value embeddings in cross-attention. This design allows the diffusion model to integrate information from foreground and background independently, preventing interference and eliminating spatial artifacts. During inference, we design switchable inference modes for diverse use scenarios by changing which modules are activated in the framework. Extensive qualitative and quantitative experiments, along with user studies, demonstrate our advantages over existing methods in generating high-quality, artifact-free results with geometrically mismatched references. Ablation studies further confirm the effectiveness of each component. Code is available at https://github.com/tellurion-kanata/colorizeDiffusion.
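The core idea in the abstract — attending to foreground and background reference embeddings through separate LoRA-modulated key/value projections, then merging the two attention outputs with a spatial mask — can be sketched as follows. This is a minimal single-head numpy illustration, not the paper's implementation: all shapes, parameter names (`Wk`, `Wv`, `lora_fg_k`, etc.), and the LoRA rank are hypothetical choices for the toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lora_proj(x, W, A, B, scale=1.0):
    # Shared base projection W plus a region-specific low-rank update A @ B.
    return x @ W + scale * (x @ A) @ B

def cross_attn(q, ref, Wk, Wv, lora_k, lora_v):
    # Single-head cross-attention: keys/values come from reference
    # embeddings, modulated by that region's LoRA weights.
    k = lora_proj(ref, Wk, *lora_k)
    v = lora_proj(ref, Wv, *lora_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def split_cross_attention(q, ref_fg, ref_bg, mask, p):
    # Attend to foreground and background references independently,
    # then merge with a spatial mask so the regions cannot interfere.
    out_fg = cross_attn(q, ref_fg, p["Wk"], p["Wv"], p["lora_fg_k"], p["lora_fg_v"])
    out_bg = cross_attn(q, ref_bg, p["Wk"], p["Wv"], p["lora_bg_k"], p["lora_bg_v"])
    return mask * out_fg + (1.0 - mask) * out_bg

# Toy shapes: 6 query (sketch latent) tokens, 4 reference tokens,
# feature dim 8, LoRA rank 2.
d, r = 8, 2
def lora_pair():
    return (rng.standard_normal((d, r)) * 0.1, rng.standard_normal((r, d)) * 0.1)

params = {
    "Wk": rng.standard_normal((d, d)) * 0.1,
    "Wv": rng.standard_normal((d, d)) * 0.1,
    "lora_fg_k": lora_pair(), "lora_fg_v": lora_pair(),
    "lora_bg_k": lora_pair(), "lora_bg_v": lora_pair(),
}
q = rng.standard_normal((6, d))
ref_fg = rng.standard_normal((4, d))
ref_bg = rng.standard_normal((4, d))
mask = (rng.random((6, 1)) > 0.5).astype(float)  # 1 = foreground pixel

out = split_cross_attention(q, ref_fg, ref_bg, mask, params)
print(out.shape)  # (6, 8)
```

In the actual framework the base projections would sit inside a pretrained diffusion U-Net's cross-attention layers, and only the low-rank adapters for each region are trained; the mask here stands in for the paper's spatial masking guidance.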
Problem

Research questions and friction points this paper is trying to address.

Addresses inaccurate color and style in text-guided methods
Reduces manual operations in hint-guided methods
Eliminates artifacts in image-referenced methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based framework
Split cross-attention mechanism
LoRA modules integration
Dingkun Yan
Institute of Science Tokyo
Xinrui Wang
The University of Tokyo
Zhuoru Li
Project HAT
Suguru Saito
Institute of Science Tokyo
Yusuke Iwasawa
The University of Tokyo
deep learning, transfer learning, foundation model, meta learning
Yutaka Matsuo
The University of Tokyo
Jiaxian Guo
Google Research
Efficient Foundation Model, Reinforcement Learning, Causality