🤖 AI Summary
Existing story visualization methods struggle to achieve precise character customization, semantic consistency, and continual integration of new characters at the same time. To address this challenge, this work proposes EverTale, a story world simulator that uses a single unified LoRA module for continual character adaptation. EverTale introduces three core mechanisms: an All-in-One-World Character Integrator, a Character Quality Gate that applies chain-of-thought reasoning with a multimodal large language model (MLLM) as judge, and a Character-Aware Region-Focus Sampling strategy. The reported experiments show that the approach outperforms the compared methods on both single- and multi-character story generation, mitigating identity degradation and layout conflicts while supporting coherent, scalable story visualization.
📝 Abstract
Story visualization has gained increasing attention in computer vision. However, current methods often fail to combine accurate character customization, semantic alignment, and continuous integration of new identities. To tackle this challenge, we present EverTale, a story world simulator for continuous story character customization. We first propose an All-in-One-World Character Integrator that achieves continuous character adaptation within a unified LoRA module, eliminating the per-character optimization modules required by previous methods. We then incorporate a Character Quality Gate via MLLM-as-Judge, which uses chain-of-thought reasoning to ensure the fidelity of each character adaptation and to determine whether the model can proceed to the next character or requires additional training on the current one. We also introduce a Character-Aware Region-Focus Sampling strategy to address the identity degradation and layout conflicts found in existing multi-character visual storytelling. It produces natural multi-character generation by harmonizing local character-specific details with the global scene context, and does so with higher efficiency. Experimental results show that EverTale achieves superior performance against a wide range of compared methods on both single- and multi-character story visualization. Code will be made available.
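The gating behavior described above (train on one character, ask a judge, then either move on or train further) can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the function names `adapt_characters`, `train_step`, and `judge_passes`, the `max_rounds` cap, and the stub trainer and judge are all assumptions introduced for illustration, and no real LoRA training or MLLM call is made.

```python
from typing import Callable, Iterable, List, Tuple


def adapt_characters(
    characters: Iterable[str],
    train_step: Callable[[str], None],
    judge_passes: Callable[[str], bool],
    max_rounds: int = 3,
) -> List[Tuple[str, int, bool]]:
    """Adapt characters one at a time, gating each on a judge verdict.

    Returns one (character, rounds_used, passed) tuple per character.
    `max_rounds` is a hypothetical safety cap, not taken from the paper.
    """
    log: List[Tuple[str, int, bool]] = []
    for name in characters:
        passed = False
        rounds = 0
        # Keep training the current character until the judge accepts it
        # or the round budget is exhausted.
        while rounds < max_rounds and not passed:
            train_step(name)             # hypothetical: update the shared adapter
            rounds += 1
            passed = judge_passes(name)  # hypothetical: MLLM-as-judge verdict
        log.append((name, rounds, passed))
    return log


# Stub trainer and judge: the judge accepts a character after two training rounds.
counts = {}


def fake_train(name: str) -> None:
    counts[name] = counts.get(name, 0) + 1


def fake_judge(name: str) -> bool:
    return counts[name] >= 2


result = adapt_characters(["alice", "bob"], fake_train, fake_judge)
print(result)  # [('alice', 2, True), ('bob', 2, True)]
```

Passing the trainer and judge as callables keeps the loop independent of any particular diffusion or MLLM library, so the same skeleton could wrap whatever training and evaluation routines are actually used.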