FlowC2S: Flowing from Current to Succeeding Frames for Fast and Memory-Efficient Video Continuation

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the high computational cost, large memory footprint, and limited generation quality of video continuation with an efficient flow-matching approach. The method, FlowC2S, fine-tunes a pretrained text-to-video flow model to learn a vector field that maps the current video chunk directly to the succeeding one. Because the flow starts from the current frames rather than from current frames combined with noise, the model input dimensionality is halved. Two mechanisms support this: inherent optimal couplings, which use temporally adjacent chunks as a practical proxy for true optimal couplings and yield straighter flows, and target inversion, which injects the inverted latent of the target chunk into the input representation to strengthen correspondences and improve visual fidelity. Fine-tuned from LTXV and Wan, the method surpasses state-of-the-art FID and FVD scores with as few as five neural function evaluations.

📝 Abstract

This paper introduces a novel methodology for generating fast and memory-efficient video continuations. Our method, dubbed FlowC2S, fine-tunes a pre-trained text-to-video flow model to learn a vector field between the current and succeeding video chunks. Two design choices are key. First, we introduce inherent optimal couplings, utilizing temporally adjacent video chunks during training as a practical proxy for true optimal couplings, resulting in straighter flows. Second, we incorporate target inversion, injecting the inverted latent of the target chunk into the input representation to strengthen correspondences and improve visual fidelity. By flowing directly from current to succeeding frames, instead of the common combination of current frames with noise to generate a video continuation, we reduce the dimensionality of the model input by a factor of two. The proposed method, fine-tuned from LTXV and Wan, surpasses the state-of-the-art scores across quantitative evaluations with FID and FVD, with as few as five neural function evaluations.
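
The idea of flowing directly from the current chunk to the succeeding one can be illustrated with a minimal sketch. It assumes a straight-line (linear interpolation) flow-matching path between paired latents and plain Euler integration with five steps; the abstract does not specify the exact path or solver, so both are assumptions here. The function names, the toy latent shape, and the `oracle` velocity field (a stand-in for the fine-tuned network) are hypothetical, and target inversion is not modeled.

```python
import numpy as np

def flow_matching_pair(x_cur, x_next, t):
    """Return the point at time t on the straight path from x_cur to x_next,
    together with the constant velocity a network would regress onto.
    (Assumed linear path; not confirmed by the source.)"""
    x_t = (1.0 - t) * x_cur + t * x_next
    v_target = x_next - x_cur
    return x_t, v_target

def euler_continue(x_cur, velocity_fn, num_steps=5):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps.
    Each step costs one function evaluation, so num_steps=5 means five NFEs."""
    x = x_cur.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

# Toy latents: 4 frames x 8 channels (hypothetical shape).
rng = np.random.default_rng(0)
x_cur = rng.normal(size=(4, 8))    # current chunk (start of the flow)
x_next = rng.normal(size=(4, 8))   # succeeding chunk (temporally adjacent pair)

# Training-side view: a sample on the path and its regression target.
x_half, v_target = flow_matching_pair(x_cur, x_next, t=0.5)

# Sampling-side view: an oracle field stands in for the trained model.
oracle = lambda x, t: x_next - x_cur
x_pred = euler_continue(x_cur, oracle, num_steps=5)

# Input-size comparison: conditioning on current frames plus noise
# doubles the number of frames fed to the model.
noise = rng.normal(size=x_cur.shape)
frames_with_noise = np.concatenate([x_cur, noise], axis=0).shape[0]
frames_direct = x_cur.shape[0]
```

In this toy setting the path is perfectly straight, so five Euler steps land on the target chunk up to floating-point error. A learned field would only approximate this, which is why straighter flows matter when the step budget is small.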

Problem

Research questions and friction points this paper is trying to address.

video continuation

memory efficiency

fast generation

flow model

Innovation

Methods, ideas, or system contributions that make the work stand out.

FlowC2S

optimal couplings

target inversion

video continuation