RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of generating coherent multi-character image sequences that preserve character identity while keeping the narrative dynamic. Existing methods tend to trade one goal for the other, either distorting character appearances or letting the plot stagnate. The authors propose RealDiffusion, a unified framework that uses heat diffusion as a dissipative prior to stabilize character features and a region-aware stochastic process to drive pose and scene evolution. A key component is a training-free Physics-informed Attention mechanism that models feature evolution as a configurable physical system and injects controllable physical priors into the self-attention layers during inference, regularizing spatio-temporal relationships without suppressing prompt-driven changes. The authors report that RealDiffusion outperforms state-of-the-art approaches in character coherence while preserving narrative dynamism.
📝 Abstract
While modern diffusion models excel at generating diverse single images, extending this to sequential generation reveals a fundamental challenge: balancing narrative dynamism with multi-character coherence. Existing methods often falter at this trade-off, leading to artifacts where characters lose their identity or the story stagnates. To resolve this critical tension, we introduce RealDiffusion, a unified framework designed to reconcile robust coherence with narrative dynamism. Heat diffusion serves as a dissipative prior that averages neighboring features along the sequence and removes high-frequency noise within the subject region. This suppresses attribute drift and stabilizes identity across frames. A region-aware stochastic process then introduces small perturbations that explore nearby modes and prevent collapse so the story maintains pose change and scene evolution. We thus introduce a lightweight, training-free Physics-informed Attention mechanism that injects controllable physical priors into the self-attention layers during inference. By modeling feature evolution as a configurable physical system, our method regularizes spatio-temporal relationships without suppressing intentional, prompt-driven changes. Extensive experiments demonstrate that RealDiffusion achieves substantial gains in character coherence while preserving narrative dynamism, outperforming state-of-the-art approaches. Code is available at https://github.com/ShmilyQi-CN/RealDiffusion.
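As a rough illustration of the two priors the abstract describes, the sketch below applies an explicit heat-diffusion step along the frame axis inside a subject region, then adds small Gaussian perturbations inside a chosen region. It is not the authors' implementation (the linked repository is the reference for that). The function names `heat_step` and `region_noise`, the `(T, N, D)` feature layout, the token masks, and the values of `alpha` and `sigma` are all assumptions made for this example, and the step operates on plain arrays rather than on real self-attention features.

```python
import numpy as np


def heat_step(feats, mask, alpha=0.25):
    """One explicit heat-diffusion step along the frame axis.

    feats: (T, N, D) array of per-frame token features (assumed layout).
    mask:  (N,) boolean array marking subject-region tokens (assumed).
    alpha: step size; this explicit scheme is stable for 0 < alpha <= 0.5.
    """
    # Replicate the first and last frames so no flux crosses the boundary.
    padded = np.concatenate([feats[:1], feats, feats[-1:]], axis=0)
    # Discrete Laplacian over neighboring frames: f[t-1] - 2 f[t] + f[t+1].
    laplacian = padded[:-2] - 2.0 * feats + padded[2:]
    smoothed = feats + alpha * laplacian
    # Only subject-region tokens are smoothed; all others pass through.
    return np.where(mask[None, :, None], smoothed, feats)


def region_noise(feats, mask, sigma, rng):
    """Add zero-mean Gaussian perturbations to tokens selected by mask."""
    noise = rng.normal(0.0, sigma, size=feats.shape)
    return feats + noise * mask[None, :, None]


# Hypothetical toy input: 6 frames, 8 tokens, 4 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8, 4))
subject = np.arange(8) < 4  # pretend the first 4 tokens cover the character

stabilized = feats
for _ in range(10):
    stabilized = heat_step(stabilized, subject)

# Which region receives the perturbation is an assumption here: the
# non-subject tokens are perturbed so the scene can vary between frames.
varied = region_noise(stabilized, ~subject, sigma=0.05, rng=rng)
```

In this toy setup, repeated heat steps shrink the frame-to-frame variance of the subject tokens while keeping their mean over frames, which mirrors the "averages neighboring features along the sequence" behavior described above. The noise step leaves the subject tokens untouched.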
Problem

Research questions and friction points this paper is trying to address.

multi-character coherence
narrative dynamism
sequential image generation
identity preservation
storybook generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-informed Attention
Heat Diffusion Prior
Multi-character Coherence
Narrative Dynamism
Training-free Regularization