Anchoring and Rescaling Attention for Semantically Coherent Inbetweening

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses semantic inconsistency, temporal instability, and inter-frame misalignment in generative inbetweening (intermediate frame synthesis) when keyframes are sparse and motion is large. The authors propose a keyframe-anchored attention bias combined with a rescaled temporal RoPE (Rotary Position Embedding) mechanism, which together provide semantic and temporal guidance and make self-attention attend to the keyframes more faithfully. The approach improves generation coherence and semantic consistency for both short and long sequences without requiring additional training. The study also introduces TGI-Bench, the first benchmark for evaluating text-conditioned generative inbetweening. Experiments report state-of-the-art frame consistency, semantic fidelity, and pace stability across diverse challenging scenes.
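
The summary does not say how the temporal RoPE is rescaled. As a rough illustration only, the sketch below assumes the rescaling multiplies each frame index by a hypothetical factor `scale` before the standard rotary rotation, so that intermediate frames sit at a smaller rotary distance from the keyframes. The names `rope_angles`, `rotate`, `dot`, and `scale` are assumptions for this example, not the authors' implementation.

```python
import math

def rope_angles(position, dim, base=10000.0):
    """Rotation angles for one position under standard RoPE (dim must be even)."""
    return [position / (base ** (2 * i / dim)) for i in range(dim // 2)]

def rotate(vec, position, scale=1.0):
    """Apply RoPE to vec at a temporal position multiplied by `scale` (assumed rescaling)."""
    angles = rope_angles(position * scale, len(vec))
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        # Rotate each (x, y) feature pair by its position-dependent angle.
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

def dot(a, b):
    """Plain dot product, standing in for one attention logit."""
    return sum(x * y for x, y in zip(a, b))

# Same feature vector used as the query (frame 8) and as the key (keyframe 0).
v = [1.0, 0.0, 1.0, 0.0]
unscaled = dot(rotate(v, 8), rotate(v, 0))
rescaled = dot(rotate(v, 8, scale=0.125), rotate(v, 0, scale=0.125))
```

With these toy values, rescaling by 0.125 makes frame 8 look like frame 1 to the rotary embedding, so its logit against the keyframe is larger than in the unscaled case. Whether the paper uses a uniform factor like this is not stated in the summary.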

📝 Abstract
Generative inbetweening (GI) seeks to synthesize realistic intermediate frames between the first and last keyframes, going beyond mere interpolation. As sequences become sparser and motions larger, previous GI models produce inconsistent frames with unstable pacing and semantic misalignment. Because GI involves fixed endpoints and many plausible paths between them, the task requires additional guidance from the keyframes and text to specify the intended path. We therefore inject semantic and temporal guidance from the keyframes and text into each intermediate frame through Keyframe-anchored Attention Bias. We also enforce frame consistency more strongly with Rescaled Temporal RoPE, which allows self-attention to attend to keyframes more faithfully. TGI-Bench, the first benchmark specifically designed for text-conditioned GI evaluation, enables challenge-targeted analysis of GI models. Without additional training, our method achieves state-of-the-art frame consistency, semantic fidelity, and pace stability for both short and long sequences across diverse challenges.
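
The abstract names a Keyframe-anchored Attention Bias but does not give its form. A minimal sketch, assuming the bias is a constant added to the attention logits of keyframe positions before the softmax, could look like the following. The function `anchored_attention`, the `bias` value, and the toy logits are hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def anchored_attention(logits, keyframe_idx, bias):
    """Add a constant bias to the logits of keyframe positions, then normalize.

    Assumption: the bias is additive and uniform across keyframes.
    """
    biased = [s + (bias if j in keyframe_idx else 0.0)
              for j, s in enumerate(logits)]
    return softmax(biased)

# One intermediate-frame query attending over 5 frames; frames 0 and 4 are keyframes.
logits = [0.2, 1.0, 0.9, 1.1, 0.1]
plain = anchored_attention(logits, {0, 4}, bias=0.0)
anchored = anchored_attention(logits, {0, 4}, bias=1.5)
```

Because the bias is applied before normalization, no weights change and no retraining is needed, which is consistent with the training-free claim in the abstract. Under this assumption, a positive bias shifts attention mass from neighboring intermediate frames toward the two keyframes.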
Problem

Research questions and friction points this paper is trying to address.

Generative Inbetweening
semantic coherence
frame consistency
temporal pacing
keyframe alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Keyframe-anchored Attention
Rescaled Temporal RoPE
Generative Inbetweening
Semantic Coherence
Text-conditioned Video Generation