🤖 AI Summary
Image and video generation models often produce sequences with uneven semantic progression: periods of near-stagnation alternate with abrupt changes, which compromises visual coherence. This work proposes a model-agnostic, one-dimensional Semantic Progress Function that captures the semantic rhythm of a sequence by measuring distances between per-frame semantic embeddings and fitting a smooth curve to the cumulative change. A semantic linearization step then retimes the sequence so that semantic change proceeds at a constant rate. The method enables visual diagnosis of semantic pacing, facilitates cross-model comparison, and supports steering toward arbitrary target pacing. It yields smoother, more coherent transitions and offers control over the semantic dynamics of both real and generated video.
📝 Abstract
Transformations produced by image and video generation models often evolve in a highly non-linear manner: long stretches where the content barely changes are followed by sudden, abrupt semantic jumps. To analyze and correct this behavior, we introduce a Semantic Progress Function, a one-dimensional representation that captures how the meaning of a given sequence evolves over time. For each frame, we compute distances between semantic embeddings and fit a smooth curve that reflects the cumulative semantic shift across the sequence. Departures of this curve from a straight line reveal uneven semantic pacing. Building on this insight, we propose a semantic linearization procedure that reparameterizes (or retimes) the sequence so that semantic change unfolds at a constant rate, yielding smoother and more coherent transitions. Beyond linearization, our framework provides a model-agnostic foundation for identifying temporal irregularities, comparing semantic pacing across different generators, and steering both generated and real-world video sequences toward arbitrary target pacing.
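The pipeline described above (embedding distances, a smoothed cumulative curve, then retiming) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance metric or the curve-fitting method, so cosine distance between consecutive frames and a moving-average smoother are stand-ins, and the names `semantic_progress` and `linearize` are hypothetical.

```python
import numpy as np

def semantic_progress(embeddings, window=3, eps=1e-9):
    """Normalized cumulative semantic change per frame, from 0 to 1.

    Assumptions (not from the paper): cosine distance between consecutive
    frame embeddings, and a moving average as the smoothing step.
    """
    emb = np.asarray(embeddings, dtype=float)
    if emb.ndim != 2 or len(emb) < 2:
        raise ValueError("need a (num_frames, dim) array with at least 2 frames")
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Per-step semantic change: cosine distance between neighbouring frames.
    step = np.clip(1.0 - np.sum(emb[1:] * emb[:-1], axis=1), 0.0, None)
    if window > 1:
        if window % 2 == 0:
            raise ValueError("window must be odd")
        padded = np.pad(step, window // 2, mode="edge")
        step = np.convolve(padded, np.ones(window) / window, mode="valid")
    # A tiny floor keeps the curve strictly increasing, so it can be inverted.
    progress = np.concatenate([[0.0], np.cumsum(step + eps)])
    return progress / progress[-1]

def linearize(progress, num_out=None):
    """Fractional source-frame times at which progress is evenly spaced."""
    progress = np.asarray(progress, dtype=float)
    n = len(progress)
    targets = np.linspace(0.0, 1.0, n if num_out is None else num_out)
    # Invert the monotone progress curve by linear interpolation.
    return np.interp(targets, progress, np.arange(n))

# Toy sequence: three nearly static steps followed by one abrupt jump.
angles = np.array([0.0, 0.01, 0.02, 0.03, 1.0])
toy_embeddings = np.stack([np.cos(angles), np.sin(angles)], axis=1)
curve = semantic_progress(toy_embeddings, window=1)
retimed = linearize(curve)
print(curve)    # departs strongly from a straight line
print(retimed)  # most output samples fall inside the abrupt jump
```

The returned times are fractional frame indices. How frames are produced at those times (nearest-frame selection, frame interpolation, or regeneration) is left open here, since the abstract does not say. Steering toward a non-linear target pacing would amount to replacing the evenly spaced `targets` with samples of the desired curve.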