ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation

📅 2026-02-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses cross-frame semantic interference in multi-frame visual story generation. Existing training-free methods fuse identity and frame prompts into a single entangled representation, which makes it hard to keep character identity consistent while preserving per-frame semantics. The authors propose a training-free, inference-time approach that reorganizes text prompt embeddings: it decomposes them into identity-related and frame-specific components, then decorrelates the frame embeddings by suppressing directions shared across frames. The method requires no changes to diffusion model parameters and no additional supervision. On the ConsiStory+ benchmark, it shows consistent gains over the 1Prompt1Story baseline on multiple identity consistency metrics while maintaining prompt fidelity.

📝 Abstract

Generating coherent visual stories requires maintaining subject identity across multiple images while preserving frame-specific semantics. Recent training-free methods concatenate identity and frame prompts into a unified representation, but this often introduces inter-frame semantic interference that weakens identity preservation in complex stories. We propose ReDiStory, a training-free framework that improves multi-frame story generation via inference-time prompt embedding reorganization. ReDiStory explicitly decomposes text embeddings into identity-related and frame-specific components, then decorrelates frame embeddings by suppressing shared directions across frames. This reduces cross-frame interference without modifying diffusion parameters or requiring additional supervision. Under identical diffusion backbones and inference settings, ReDiStory improves identity consistency while maintaining prompt fidelity. Experiments on the ConsiStory+ benchmark show consistent gains over 1Prompt1Story on multiple identity consistency metrics. Code is available at: https://github.com/YuZhenyuLindy/ReDiStory
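
The two steps named in the abstract, decomposition and decorrelation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes the concatenated prompt embedding can be split by token position, and it assumes "suppressing shared directions" means projecting out the mean direction of the frame embeddings. The function names, the `strength` parameter, and the random toy data are all hypothetical.

```python
import numpy as np


def split_prompt_embedding(token_embs, id_len, frame_lens):
    """Split a concatenated prompt embedding (seq_len, dim) by token position.

    Assumption: the first `id_len` tokens describe the identity and the
    following blocks of `frame_lens` tokens describe each frame in order.
    """
    token_embs = np.asarray(token_embs, dtype=np.float64)
    identity = token_embs[:id_len]
    frames, start = [], id_len
    for length in frame_lens:
        frames.append(token_embs[start:start + length])
        start += length
    return identity, frames


def suppress_shared_direction(frame_embs, strength=1.0):
    """Remove the component of each frame embedding along the shared direction.

    Assumption: the shared direction is the normalized mean of the frame
    embeddings. `strength` = 1.0 removes that component entirely.
    """
    frame_embs = np.asarray(frame_embs, dtype=np.float64)
    mean = frame_embs.mean(axis=0)
    norm = np.linalg.norm(mean)
    if norm < 1e-12:  # no meaningful shared direction to remove
        return frame_embs.copy()
    unit = mean / norm
    coeffs = frame_embs @ unit  # projection coefficient per frame
    return frame_embs - strength * np.outer(coeffs, unit)


def mean_pairwise_cosine(x):
    """Average cosine similarity over all distinct pairs of rows."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    n = len(x)
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    # Toy stand-in for a text encoder output: 3 identity tokens, then 4 frames
    # of 2 tokens each, where all frame tokens share a common component.
    shared = rng.normal(size=dim)
    identity_tokens = rng.normal(size=(3, dim))
    frame_tokens = shared + 0.5 * rng.normal(size=(8, dim))
    prompt = np.concatenate([identity_tokens, frame_tokens], axis=0)

    identity, frames = split_prompt_embedding(prompt, 3, [2, 2, 2, 2])
    pooled = np.stack([f.mean(axis=0) for f in frames])  # one vector per frame
    decorrelated = suppress_shared_direction(pooled)

    print("similarity before:", mean_pairwise_cosine(pooled))
    print("similarity after: ", mean_pairwise_cosine(decorrelated))
```

In this toy setup the pooled frame vectors start out highly similar because they share a common component, and the projection lowers their average pairwise similarity. In a real pipeline the embeddings would come from the diffusion model's text encoder, and the identity component would be left untouched so that it can anchor the subject across frames.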

Problem

Research questions and friction points this paper is trying to address.

visual story generation

identity consistency

semantic interference

multi-frame generation

subject identity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-Disentangled Diffusion

Visual Story Generation

Identity Consistency