StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization

Date: 2024-12-10
Venue: arXiv.org
Citations: 2
Influential: 0
AI Summary
To address the challenge of jointly preserving character identity consistency and ensuring fine-grained text-semantic alignment in story visualization, this paper proposes StoryWeaver, a novel framework comprising three key components. First, we construct the first story-oriented Character Graph (CG), explicitly modeling fine-grained semantic relationships among characters, scenes, and events. Second, we design a CG-customized generation mechanism (C-CG) and a knowledge-enhanced spatial guidance module (KE-SG) to jointly optimize identity constraints and multi-character semantic coherence. Third, we adopt a lightweight world model architecture to improve computational efficiency. Evaluated on TBC-Bench, StoryWeaver achieves a +9.03% improvement in DINO-I similarity and +13.44% in CLIP-T textual fidelity, significantly enhancing both multi-character consistency and text-image alignment. The code and dataset are publicly released.
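The DINO-I and CLIP-T figures above are embedding-similarity scores. As a rough illustration, assuming (as is common for these metrics, though this summary does not spell it out) that each score is a mean cosine similarity between embedding vectors, the computation might look like the sketch below. The function names are hypothetical, and the plain lists stand in for DINO or CLIP encoder outputs; this is not the paper's evaluation code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(
    generated: Sequence[Vector], references: Sequence[Vector]
) -> float:
    """Average cosine similarity over every (generated, reference) pair.

    Assumption: for an identity score such as DINO-I, `references` would hold
    embeddings of real character images; for a text score such as CLIP-T,
    it would hold the embedding of the prompt.
    """
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    if not scores:
        raise ValueError("need at least one generated and one reference vector")
    return sum(scores) / len(scores)
```

Under this reading, a higher score means the generated images sit closer to the reference images (identity) or to the prompt (text alignment) in embedding space.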

Abstract
Story visualization has gained increasing attention in artificial intelligence. However, existing methods still struggle with maintaining a balance between character identity preservation and text-semantics alignment, largely due to a lack of detailed semantic modeling of the story scene. To tackle this challenge, we propose a novel knowledge graph, namely Character Graph (CG), which comprehensively represents various story-related knowledge, including the characters, the attributes related to characters, and the relationships between characters. We then introduce StoryWeaver, an image generator that achieves Customization via Character Graph (C-CG), capable of consistent story visualization with rich text semantics. To further improve the multi-character generation performance, we incorporate knowledge-enhanced spatial guidance (KE-SG) into StoryWeaver to precisely inject character semantics into generation. To validate the effectiveness of our proposed method, extensive experiments are conducted using a new benchmark called TBC-Bench. The experiments confirm that our StoryWeaver excels not only in creating vivid visual story plots but also in accurately conveying character identities across various scenarios with considerable storage efficiency, e.g., achieving an average increase of +9.03% DINO-I and +13.44% CLIP-T. Furthermore, ablation experiments are conducted to verify the superiority of the proposed module. Code and datasets are released at https://github.com/Aria-Zhangjl/StoryWeaver.
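To make the Character Graph idea concrete, here is a minimal sketch of a structure holding the three kinds of knowledge the abstract lists: characters, their attributes, and relationships between characters. The class, method names, and text format are hypothetical illustrations and are not taken from the StoryWeaver code base.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CharacterGraph:
    """Toy character graph: nodes are characters, edges are labeled relations."""

    # character name -> set of attribute phrases (hypothetical representation)
    attributes: dict[str, set[str]] = field(default_factory=dict)
    # (subject, predicate, object) triples between characters
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_character(self, name: str, *attrs: str) -> None:
        """Register a character and attach any attribute phrases to it."""
        self.attributes.setdefault(name, set()).update(attrs)

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        """Add a directed relation, registering both endpoints if needed."""
        for name in (subject, obj):
            self.attributes.setdefault(name, set())
        self.relations.append((subject, predicate, obj))

    def describe(self, name: str) -> str:
        """Flatten one character's attributes and relations into a text line."""
        attrs = ", ".join(sorted(self.attributes.get(name, ())))
        parts = [f"{name} ({attrs})" if attrs else name]
        parts += [f"{s} {p} {o}" for s, p, o in self.relations if name in (s, o)]
        return "; ".join(parts)


graph = CharacterGraph()
graph.add_character("Alice", "red coat", "short hair")
graph.add_character("Bob")
graph.add_relation("Alice", "waves at", "Bob")
print(graph.describe("Alice"))  # Alice (red coat, short hair); Alice waves at Bob
```

Keeping attributes and relations separate means a per-character description can be assembled on demand, which is one plausible way such a graph could feed character-specific semantics into a generator.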
Problem

Research questions and friction points this paper is trying to address.

Balancing character identity and text alignment
Enhancing story visualization with detailed semantics
Improving multi-character generation via knowledge injection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Character Graph for story knowledge
StoryWeaver for consistent visualization
Knowledge-enhanced spatial guidance
Jinlu Zhang
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China
Jiji Tang
Fuxi AI Lab, NetEase Inc.
Rongsheng Zhang
Fuxi AI Lab, NetEase Inc., Hangzhou, China
NLP
Tangjie Lv
NetEase
reinforcement learning
Xiaoshuai Sun
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China