RealDiffusion: Physics-informed Attention for Multi-character Storybook Generation

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of generating multi-character image sequences that preserve character identity across frames while keeping the narrative dynamic. Existing methods often mishandle this trade-off, either distorting character appearances or letting the plot stagnate. The authors propose RealDiffusion, a unified framework that uses heat diffusion as a dissipative prior to stabilize character features and a region-aware stochastic process to drive pose and scene evolution. A key innovation is a training-free Physics-informed Attention mechanism that models feature evolution as a configurable physical system and injects controllable physical priors into the self-attention layers during inference, regularizing spatio-temporal relationships without suppressing prompt-driven changes. The authors report substantial gains in character coherence while preserving narrative dynamism, outperforming state-of-the-art approaches.

📝 Abstract

While modern diffusion models excel at generating diverse single images, extending this to sequential generation reveals a fundamental challenge: balancing narrative dynamism with multi-character coherence. Existing methods often falter at this trade-off, leading to artifacts where characters lose their identity or the story stagnates. To resolve this critical tension, we introduce RealDiffusion, a unified framework designed to reconcile robust coherence with narrative dynamism. Heat diffusion serves as a dissipative prior that averages neighboring features along the sequence and removes high-frequency noise within the subject region. This suppresses attribute drift and stabilizes identity across frames. A region-aware stochastic process then introduces small perturbations that explore nearby modes and prevent collapse so the story maintains pose change and scene evolution. We thus introduce a lightweight, training-free Physics-informed Attention mechanism that injects controllable physical priors into the self-attention layers during inference. By modeling feature evolution as a configurable physical system, our method regularizes spatio-temporal relationships without suppressing intentional, prompt-driven changes. Extensive experiments demonstrate that RealDiffusion achieves substantial gains in character coherence while preserving narrative dynamism, outperforming state-of-the-art approaches. Code is available at https://github.com/ShmilyQi-CN/RealDiffusion.
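
The two priors described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes features are stored as a (frames, tokens, channels) array, uses a plain explicit finite-difference heat step along the frame axis, and adds Gaussian noise weighted by a region map. The names `heat_step`, `noise_step`, `alpha`, `sigma`, and `weight`, and the choice to perturb only outside the subject mask, are all hypothetical.

```python
import numpy as np

def heat_step(feats, mask, alpha=0.2):
    """One explicit heat-diffusion step along the frame axis (illustrative only).

    feats: (T, N, D) per-frame token features.
    mask:  (T, N), 1.0 inside the subject region and 0.0 elsewhere.
    alpha: diffusion strength; this explicit scheme is stable for 0 <= alpha <= 0.5.
    """
    # Repeat the first and last frames so no feature "leaks" past the sequence ends.
    padded = np.concatenate([feats[:1], feats, feats[-1:]], axis=0)
    # Discrete Laplacian over neighboring frames: f[t-1] - 2 f[t] + f[t+1].
    laplacian = padded[:-2] - 2.0 * feats + padded[2:]
    # Smooth only where the mask is on; features elsewhere are left unchanged.
    return feats + alpha * mask[..., None] * laplacian

def noise_step(feats, weight, sigma=0.05, rng=None):
    """Add small Gaussian perturbations scaled by a (T, N) region weight map."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(feats.shape)
    return feats + sigma * weight[..., None] * noise

# Toy usage: 4 frames, 6 tokens, 8 channels; the first 3 tokens are the "subject".
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 6, 8))
mask = np.zeros((4, 6))
mask[:, :3] = 1.0

smoothed = heat_step(feats, mask)                    # damp frame-to-frame drift in the subject
varied = noise_step(smoothed, 1.0 - mask, rng=rng)   # assumption: perturb only outside the subject
```

In this sketch the heat step pulls each subject token toward its neighbors in adjacent frames, which reduces frame-to-frame variation, while the noise step keeps the remaining tokens from collapsing to a single configuration. How the actual method couples these terms to self-attention is not specified in the abstract.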

Problem

Research questions and friction points this paper is trying to address.

multi-character coherence

narrative dynamism

sequential image generation

identity preservation

storybook generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-informed Attention

Heat Diffusion Prior

Multi-character Coherence