ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation

📅 2026-02-01
🤖 AI Summary
This work addresses cross-frame semantic interference in multi-frame visual story generation. Existing training-free methods fuse identity and frame prompts into a single entangled representation, which makes it hard to keep character identity consistent while preserving per-frame semantics. The authors propose a training-free, inference-time approach that reorganizes text prompt embeddings within a diffusion framework. It decomposes the embeddings into identity-related and frame-specific components and decorrelates the frame embeddings by suppressing directions shared across frames, which reduces semantic interference. The method requires no changes to model parameters and no additional supervision, and on the ConsiStory+ benchmark it shows consistent gains over the 1Prompt1Story baseline on multiple identity consistency metrics.

📝 Abstract
Generating coherent visual stories requires maintaining subject identity across multiple images while preserving frame-specific semantics. Recent training-free methods concatenate identity and frame prompts into a unified representation, but this often introduces inter-frame semantic interference that weakens identity preservation in complex stories. We propose ReDiStory, a training-free framework that improves multi-frame story generation via inference-time prompt embedding reorganization. ReDiStory explicitly decomposes text embeddings into identity-related and frame-specific components, then decorrelates frame embeddings by suppressing shared directions across frames. This reduces cross-frame interference without modifying diffusion parameters or requiring additional supervision. Under identical diffusion backbones and inference settings, ReDiStory improves identity consistency while maintaining prompt fidelity. Experiments on the ConsiStory+ benchmark show consistent gains over 1Prompt1Story on multiple identity consistency metrics. Code is available at: https://github.com/YuZhenyuLindy/ReDiStory
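The two steps named in the abstract (splitting the prompt embedding into identity and frame parts, then suppressing directions shared across frames) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the token-span split, the use of the mean of per-frame mean embeddings as the "shared direction", and the `strength` parameter are all assumptions made here for illustration. See the linked repository for the actual method.

```python
import numpy as np


def split_identity_and_frames(token_embeddings, identity_len, frame_lens):
    """Split a (tokens, dim) matrix into an identity block and per-frame blocks.

    Assumption: the concatenated prompt lays out identity tokens first,
    followed by each frame's tokens in order.
    """
    if identity_len + sum(frame_lens) != token_embeddings.shape[0]:
        raise ValueError("span lengths must cover every token exactly once")
    identity = token_embeddings[:identity_len]
    frames, start = [], identity_len
    for n in frame_lens:
        frames.append(token_embeddings[start:start + n])
        start += n
    return identity, frames


def decorrelate_frames(frames, strength=1.0):
    """Reduce the component that all frames share (hypothetical choice:
    the normalized mean of the per-frame mean embeddings)."""
    frame_means = np.stack([f.mean(axis=0) for f in frames])
    shared = frame_means.mean(axis=0)
    norm = np.linalg.norm(shared)
    if norm < 1e-12:
        # No meaningful shared direction; return copies unchanged.
        return [f.copy() for f in frames]
    unit = shared / norm
    # Subtract each token's projection onto the shared direction.
    return [f - strength * np.outer(f @ unit, unit) for f in frames]


def reorganize(token_embeddings, identity_len, frame_lens, strength=1.0):
    """Rebuild the prompt embedding: identity block untouched, frames decorrelated."""
    identity, frames = split_identity_and_frames(
        token_embeddings, identity_len, frame_lens
    )
    new_frames = decorrelate_frames(frames, strength)
    return np.concatenate([identity] + new_frames, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for text-encoder output: 4 identity tokens, two 3-token frames.
    embeddings = rng.normal(size=(10, 8))
    reorganized = reorganize(embeddings, identity_len=4, frame_lens=[3, 3])
    print(reorganized.shape)
```

With `strength=1.0` the shared direction is removed entirely from the frame tokens, while the identity tokens pass through unchanged. A smaller `strength` would only attenuate it, which may be preferable if full removal hurts prompt fidelity.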
Problem

Research questions and friction points this paper is trying to address.

visual story generation
identity consistency
semantic interference
multi-frame generation
subject identity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-Disentangled Diffusion
Visual Story Generation
Identity Consistency
Prompt Embedding Reorganization
Training-Free Framework