🤖 AI Summary
Existing diffusion models struggle to deliver high-precision, efficient, contextually consistent, and controllable line art coloring for manga production, particularly in handling many reference images, keeping inference latency low, and supporting user control. Method: We propose a causal sparse DiT architecture tailored for long-context reference conditioning, integrating customized positional encoding, causal sparse attention, and KV caching. This design processes over 200 reference images together with color hints while preserving color identity consistency and substantially reducing inference latency. Contribution/Results: The method achieves accurate line art coloring at inference speeds suited to industrial use, enabling a high-quality, low-latency interactive coloring system. It meets core production requirements for efficiency, cross-frame color consistency, and fine-grained controllability, a significant step toward practical AI-assisted manga coloring.
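Neither the summary nor the abstract spells out the exact attention pattern. As a minimal sketch of what "causal sparse attention" over many references can look like, assume a hypothetical token layout `[line-art tokens | ref 0 | ref 1 | ...]` in which line-art tokens attend to everything, while each reference image attends only to its own tokens (no cross-reference attention and no attention back to the line art). The layout, the function names `causal_sparse_mask` and `attended_pairs`, and the sizes are illustrative assumptions, not Cobra's implementation.

```python
import numpy as np

def causal_sparse_mask(n_line: int, n_refs: int, ref_len: int) -> np.ndarray:
    """Boolean mask; mask[i, j] is True if query token i may attend to key token j.

    Hypothetical token layout: [line-art tokens | ref 0 | ref 1 | ... | ref n_refs-1].
    """
    total = n_line + n_refs * ref_len
    mask = np.zeros((total, total), dtype=bool)
    # Line-art tokens see everything: themselves and every reference token.
    mask[:n_line, :] = True
    # Each reference image sees only its own tokens: no attention to other
    # references and none back to the line-art tokens (the "causal" direction).
    for r in range(n_refs):
        start = n_line + r * ref_len
        mask[start:start + ref_len, start:start + ref_len] = True
    return mask

def attended_pairs(n_line: int, n_refs: int, ref_len: int) -> tuple[int, int]:
    """(dense, sparse) counts of query-key pairs for the layout above."""
    total = n_line + n_refs * ref_len
    dense = total * total
    sparse = n_line * total + n_refs * ref_len * ref_len
    return dense, sparse

mask = causal_sparse_mask(n_line=4, n_refs=3, ref_len=2)
print(attended_pairs(4, 3, 2))  # (100, 52)
print(int(mask.sum()))          # 52
```

Under this assumed pattern the sparse count grows linearly with the number of references (`n_line * n_refs * ref_len + n_refs * ref_len**2`), whereas dense attention grows quadratically with it, which is why such a mask matters once the reference count reaches the hundreds.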
📝 Abstract
The comic production industry requires reference-based line art colorization with high accuracy, efficiency, contextual consistency, and flexible control. A comic page often involves diverse characters, objects, and backgrounds, which complicates the coloring process. Despite advances in diffusion models for image generation, their application to line art colorization remains limited: they struggle to handle extensive reference images, inference is time-consuming, and control is inflexible. We investigate how extensive contextual image guidance affects the quality of line art colorization. To address these challenges, we introduce Cobra, an efficient and versatile method that supports color hints and uses over 200 reference images while maintaining low latency. Central to Cobra is a Causal Sparse DiT architecture, which leverages specially designed positional encodings, causal sparse attention, and a key-value (KV) cache to manage long-context references effectively and ensure color identity consistency. Results demonstrate that Cobra achieves accurate line art colorization through extensive contextual reference while significantly enhancing inference speed and interactivity, thereby meeting critical industrial demands. We release our code and models on our project page: https://zhuang2002.github.io/Cobra/.
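Why a key-value cache helps with long reference contexts can be shown with a small single-head, single-layer NumPy sketch. It assumes that reference tokens never attend to line-art tokens and are not conditioned on the denoising timestep; under those assumptions the reference keys and values are identical at every denoising step, so they can be projected once and reused. The names `ReferenceKVCache` and `line_art_attention`, the random weights, and all sizes are hypothetical; this is an illustration of the general technique, not Cobra's code.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Plain scaled dot-product attention (numerically stabilized softmax)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

class ReferenceKVCache:
    """Hypothetical cache holding the reference keys and values."""

    def __init__(self, w_k, w_v):
        self.w_k, self.w_v = w_k, w_v
        self.k = self.v = None
        self.builds = 0  # how many times the reference projection actually ran

    def get(self, ref_tokens):
        # Project the references only on first use; later calls reuse the
        # stored K/V. Valid only if references never read the line-art tokens
        # and carry no timestep conditioning (assumption).
        if self.k is None:
            self.k = ref_tokens @ self.w_k
            self.v = ref_tokens @ self.w_v
            self.builds += 1
        return self.k, self.v

def line_art_attention(x, ref_tokens, cache, w_q, w_k, w_v):
    """Line-art queries attend to their own keys plus the cached reference keys."""
    ref_k, ref_v = cache.get(ref_tokens)
    k = np.concatenate([x @ w_k, ref_k], axis=0)
    v = np.concatenate([x @ w_v, ref_v], axis=0)
    return softmax_attention(x @ w_q, k, v)

rng = np.random.default_rng(0)
d, n_line, n_ref_tokens, steps = 8, 4, 15, 10
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
refs = rng.standard_normal((n_ref_tokens, d))
cache = ReferenceKVCache(w_k, w_v)

for _ in range(steps):
    x = rng.standard_normal((n_line, d))  # stand-in for the noisy latent at this step
    cached = line_art_attention(x, refs, cache, w_q, w_k, w_v)
    # Uncached check: re-project the full sequence and attend from the line-art queries.
    seq = np.concatenate([x, refs], axis=0)
    uncached = softmax_attention(x @ w_q, seq @ w_k, seq @ w_v)
    assert np.allclose(cached, uncached)

print(cache.builds)  # 1
```

The reference projection runs once for all ten steps while the outputs match the uncached computation. A real DiT would typically keep such a cache per layer (and per head); how Cobra organizes its cache is not described in the abstract.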