🤖 AI Summary
Existing pretrained image editing models perform poorly on manga, as they are trained primarily on natural images, and fine-tuning them on manga is hindered by computational cost and copyright constraints. This work proposes an inference-time adaptation strategy that leverages only the input manga image itself: the generation trajectory is optimized during inference so that the input is reconstructed with high fidelity under an empty prompt, requiring no additional training data or model fine-tuning. The method significantly outperforms current baselines while incurring negligible computational overhead. To the best of our knowledge, this is the first approach to enable efficient, per-image editing adaptation for manga, substantially improving both editing fidelity and general applicability.
📝 Abstract
We present an inference-time adaptation method that tailors a pretrained image editing model to each input manga image using only the input image itself. Despite recent progress in pretrained image editing, such models often underperform on manga because they are trained predominantly on natural-image data. Re-training or fine-tuning large-scale models on manga is, however, generally impractical due to both computational cost and copyright constraints. To address these obstacles, our method slightly corrects the generation trajectory at inference time so that the input image is reconstructed more faithfully under an empty prompt. Experimental results show that our method consistently outperforms existing baselines while incurring only negligible computational overhead.
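The trajectory-correction idea described above can be illustrated with a minimal toy sketch. This is not the paper's actual model or code: the "denoiser" below is a stand-in linear map, and all names, step counts, and learning rates are illustrative assumptions. The sketch only shows the general pattern of optimizing small per-step corrections so that a frozen generation trajectory reconstructs a given input image.

```python
import numpy as np

SHRINK = 0.9  # toy frozen "pretrained" step: x <- SHRINK * x

def denoise_step(x):
    """Stand-in for one frozen denoising step of a pretrained editor."""
    return SHRINK * x

def run_trajectory(x_start, deltas):
    """Run the frozen trajectory, adding a small correction each step."""
    x = x_start
    for d in deltas:
        x = denoise_step(x) + d
    return x

def adapt_to_image(target, n_steps=8, iters=200, lr=0.1, seed=0):
    """Optimize per-step corrections so the trajectory reconstructs `target`."""
    rng = np.random.default_rng(seed)
    x_start = rng.standard_normal(target.shape)
    deltas = [np.zeros_like(target) for _ in range(n_steps)]
    for _ in range(iters):
        x0 = run_trajectory(x_start, deltas)
        resid = x0 - target  # reconstruction error under the "empty prompt" pass
        for t in range(n_steps):
            # Analytic gradient for this linear toy: deltas[t] reaches the
            # output through (n_steps - 1 - t) remaining shrink steps.
            deltas[t] -= lr * (SHRINK ** (n_steps - 1 - t)) * resid
    return deltas, run_trajectory(x_start, deltas)

target = np.ones((4, 4))           # stand-in for the input manga image
deltas, recon = adapt_to_image(target)
print(np.abs(recon - target).max())  # reconstruction error shrinks toward zero
```

In the paper's setting the frozen step would be a full diffusion-model update and the corrections would be kept small to preserve the model's editing ability, but the optimization loop has the same shape: no extra training data, only the input image as the reconstruction target.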