StreamingEffect: Real-Time Human-Centric Video Effect Generation

📅 2026-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work targets real-time human-centric (portrait) video effect generation, where the model must add expressive effects while preserving identity, background content, and temporal consistency. Built on an in-context video editing architecture, the method trains a bidirectional teacher, distills it into a causal autoregressive student, and adds keyframe control so that reference effect frames can be injected online and propagated through the stream for interactive editing. The main contributions are VideoEffect-130K, which the authors describe as, to their knowledge, the largest human-centric video effect dataset (70K effect videos and 60K editing videos across 600 effect categories); a distillation strategy that reduces sampling from 50 steps to 4; and real-time, high-quality 720p video editing on a single H200 GPU. The abstract notes that prior acceleration work focuses mainly on text-to-video generation, leaving efficient distillation for video editing largely underexplored.
📝 Abstract
Streaming video effect generation is highly desirable for live human-centric applications such as e-commerce streaming, entertainment, and vlogging, yet remains difficult due to the lack of suitable data and deployable editing models. Unlike generic video generation, this task requires real-time video-to-video editing that adds expressive effects while preserving human identity, background content, and temporal consistency. Existing acceleration efforts mainly focus on text-to-video generation, while efficient distillation for video editing remains largely underexplored. In this paper, we present StreamingEffect, a real-time human-centric streaming video effect framework. We adopt an in-context video editing architecture and train a high-quality bidirectional teacher, then distill it into a causal autoregressive student and further reduce sampling from 50 steps to 4 steps. We also introduce keyframe control, allowing reference effect frames to be injected online and propagated through the stream for interactive editing. To address the data bottleneck, we construct VideoEffect-130K, to our knowledge the largest human-centric video effect dataset, containing 70K effect videos and 60K editing videos across 600 effect categories curated from short-video and editing platforms. Experiments show that our method enables real-time, high-quality 720p video editing on a single H200 GPU.
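To make the streaming setup in the abstract concrete, here is a minimal toy sketch of a causal, few-step editing loop with online keyframe injection. It is not the authors' implementation: `toy_step`, `stream_edit`, `context_len`, and the list-of-floats `Chunk` type are hypothetical names invented for illustration, and the averaging update merely stands in for a learned sampling step. Only the step counts (50 for the teacher, 4 for the student) come from the abstract.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Chunk = List[float]   # toy stand-in for one chunk of video latents

TEACHER_STEPS = 50    # sampling steps of the bidirectional teacher (from the abstract)
STUDENT_STEPS = 4     # sampling steps of the distilled student (from the abstract)


def toy_step(x: Chunk, source: Chunk, past: List[Chunk], ref: Optional[Chunk]) -> Chunk:
    """Hypothetical placeholder for one student sampling step (NOT the real model).

    Moves the current estimate halfway toward a target built from the source
    chunk, the most recent past output, and the optional reference keyframe.
    """
    target = list(source)
    if past:
        target = [(t + p) / 2 for t, p in zip(target, past[-1])]
    if ref is not None:
        target = [(t + r) / 2 for t, r in zip(target, ref)]
    return [(xi + ti) / 2 for xi, ti in zip(x, target)]


def stream_edit(
    source_chunks: List[Chunk],
    keyframes: Dict[int, Chunk],
    context_len: int = 2,
) -> Tuple[List[Chunk], int]:
    """Edit chunks in arrival order; each chunk sees only past outputs (causal)."""
    past: Deque[Chunk] = deque(maxlen=context_len)  # bounded rolling context
    active_ref: Optional[Chunk] = None
    calls = 0
    outputs: List[Chunk] = []
    for i, src in enumerate(source_chunks):
        if i in keyframes:
            # A reference effect frame injected online; it stays active afterwards.
            active_ref = keyframes[i]
        x = [0.0] * len(src)  # toy initial state; a real sampler would start from noise
        for _ in range(STUDENT_STEPS):
            x = toy_step(x, src, list(past), active_ref)
            calls += 1
        past.append(x)
        outputs.append(x)
    return outputs, calls


if __name__ == "__main__":
    chunks = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    outs, calls = stream_edit(chunks, keyframes={1: [3.0, 3.0]})
    print(outs, calls)
    print("model evaluations per chunk, teacher vs student:",
          TEACHER_STEPS / STUDENT_STEPS)
```

Two properties of the sketch mirror the description: a chunk's output never depends on later input, and a keyframe affects only the chunks that arrive at or after its injection point. Going from 50 to 4 steps cuts model evaluations per chunk by a factor of 12.5, though the wall-clock speedup of a real system would also depend on per-step cost.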
Problem

Research questions and friction points this paper is trying to address.

streaming video effect
real-time video editing
human-centric video
temporal consistency
video-to-video editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

streaming video editing
model distillation
keyframe control
human-centric effects
real-time generation