AI Summary
To address the challenge of generating stylistically consistent yet highly diverse novel characters from only a few reference examples, this paper proposes an efficient fine-tuning method for text-to-image diffusion models. The approach decouples individual identity from shared artistic style via a multi-token clustering assignment mechanism; replaces the class-specific regularization set with random token augmentation to enhance diversity; and integrates LoRA for parameter-efficient adaptation within the DreamBooth fine-tuning framework. Experiments on five specialized small-scale datasets demonstrate significant improvements over state-of-the-art baselines: superior quantitative performance (lower FID, lower LPIPS) and human evaluations confirming advantages in style fidelity, detail richness, and character novelty. The method enables virtually unlimited, controllable, style-aware character generation with minimal supervision.
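As a rough illustration of why LoRA counts as parameter-efficient, the sketch below applies the standard low-rank update `W x + (alpha / r) * B (A x)` with a frozen weight `W`. The function names, shapes, and the `alpha` scaling value are assumptions chosen for illustration; the summary does not state the rank or which layers the authors adapt.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a rank-`rank` adapter on a d_out x d_in weight."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_out + d_in)


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    rank = len(A)
    base = matvec(W, x)               # output of the frozen pretrained weight
    update = matvec(B, matvec(A, x))  # low-rank correction
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]
```

For a hypothetical 768 x 768 projection, `lora_param_count(768, 768, 4)` gives 6144 trainable values, against 589824 for full fine-tuning of the same matrix. Initializing `B` to zeros makes the adapted layer start out identical to the pretrained one.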
Abstract
The audiovisual industry is undergoing a profound transformation as it integrates AI developments, not only to automate routine tasks but also to inspire new forms of art. This paper addresses the problem of producing a virtually unlimited number of novel characters that preserve the artistic style and shared visual traits of a small set of human-designed reference characters, thus broadening creative possibilities in animation, gaming, and related domains. Our solution builds upon DreamBooth, a well-established fine-tuning technique for text-to-image diffusion models, and adapts it to tackle two core challenges: capturing intricate character details that textual prompts cannot convey, and coping with the few-shot nature of the training data. To achieve this, we propose a multi-token strategy that uses clustering to assign separate tokens to individual characters and to their collective style, combined with LoRA-based parameter-efficient fine-tuning. By removing the class-specific regularization set and introducing random tokens and embeddings during generation, our approach allows for unlimited character creation while preserving the learned style. We evaluate our method on five small specialized datasets, comparing it to relevant baselines using both quantitative metrics and a human evaluation study. Our results demonstrate that our approach produces high-quality, diverse characters while preserving the distinctive aesthetic features of the reference characters, and the human evaluation further supports its effectiveness.
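The multi-token idea can be pictured at the prompt level with a minimal sketch. It assumes cluster labels have already been computed for the reference images, and it uses a hypothetical prompt template and hypothetical placeholder tokens (`<style>`, `<char0>`, ...). The abstract does not specify the actual template, the clustering algorithm, or how random embeddings are sampled, so none of this should be read as the authors' implementation.

```python
import random

STYLE_TOKEN = "<style>"  # hypothetical placeholder for the shared style token


def build_training_prompts(cluster_labels, style_token=STYLE_TOKEN):
    """One prompt per reference image: a per-cluster character token plus the shared style token."""
    prompts = []
    for label in cluster_labels:
        char_token = f"<char{label}>"  # hypothetical naming scheme
        prompts.append(f"a {char_token} character in {style_token} style")
    return prompts


def build_generation_prompt(rng, style_token=STYLE_TOKEN, token_len=6):
    """Swap the learned character token for a random one while keeping the style token."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    random_token = "".join(rng.choice(letters) for _ in range(token_len))
    return f"a <{random_token}> character in {style_token} style"


# Three reference images, the first two grouped into the same cluster.
train_prompts = build_training_prompts([0, 0, 1])
new_prompt = build_generation_prompt(random.Random(0))
```

Because the style token is shared across all training prompts while character tokens vary by cluster, a fresh random character token at generation time is meant to request a new identity in the same learned style.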