In-Context Sync-LoRA for Portrait Video Editing

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Portrait video editing requires fine-grained modifications—such as appearance, expression, or object replacement—while preserving temporal consistency of the subject’s motion and identity; its core challenges lie in frame-level synchronization and identity preservation. This paper proposes an in-context LoRA training paradigm for image-to-video diffusion models: lightweight LoRA modules are trained on synchronized, filtered paired video data to explicitly decouple motion dynamics from appearance. A synchrony-aware filtering mechanism constructs high-consistency training sets, enabling cross-identity and multi-type editing. Experiments demonstrate substantial improvements in visual fidelity and temporal coherence across diverse editing tasks, achieving a superior trade-off between editing accuracy and motion preservation compared to prior methods.
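The "lightweight LoRA modules" referred to above follow the standard low-rank adaptation scheme: a frozen pretrained weight W is augmented with a trainable low-rank update scaled by alpha/r. A minimal numpy sketch of that mechanism (illustrative only; the paper applies LoRA inside an image-to-video diffusion model, and the function and variable names here are hypothetical):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Apply a frozen linear layer W plus a low-rank LoRA update.

    x : (batch, d_in) activations
    W : (d_out, d_in) frozen pretrained weight
    A : (r, d_in)   trainable down-projection (small random init)
    B : (d_out, r)  trainable up-projection (zero init)
    """
    r = A.shape[0]
    delta = B @ A                      # (d_out, d_in) low-rank update
    return x @ (W + (alpha / r) * delta).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
x = rng.normal(size=(4, d_in))
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))               # zero init: adapter starts as a no-op

# With B = 0 the adapted layer matches the frozen layer exactly,
# so training starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Only A and B are trained, which is why such adapters can be learned from a compact, curated dataset without disturbing the base model's motion priors.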

📝 Abstract
Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, expression edits, or the addition of objects. The key difficulty lies in preserving the subject's original temporal behavior, demanding that every edited frame remains precisely synchronized with the corresponding source frame. We present Sync-LoRA, a method for editing portrait videos that achieves high-quality visual modifications while maintaining frame-accurate synchronization and identity consistency. Our approach uses an image-to-video diffusion model, where the edit is defined by modifying the first frame and then propagated to the entire sequence. To enable accurate synchronization, we train an in-context LoRA using paired videos that depict identical motion trajectories but differ in appearance. These pairs are automatically generated and curated through a synchronization-based filtering process that selects only the most temporally aligned examples for training. This training setup teaches the model to combine motion cues from the source video with the visual changes introduced in the edited first frame. Trained on a compact, highly curated set of synchronized human portraits, Sync-LoRA generalizes to unseen identities and diverse edits (e.g., modifying appearance, adding objects, or changing backgrounds), robustly handling variations in pose and expression. Our results demonstrate high visual fidelity and strong temporal coherence, achieving a robust balance between edit fidelity and precise motion preservation.
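The synchronization-based filtering described in the abstract can be approximated as follows: extract motion trajectories (e.g., facial landmarks) from each generated source/target pair, score the pair by its mean per-frame landmark displacement, and keep only the best-aligned pairs for training. This is a hedged sketch; the landmark representation, the threshold, and all names below are assumptions, not the paper's exact procedure:

```python
import numpy as np

def sync_error(src_lm, tgt_lm):
    """Mean per-frame L2 distance between two landmark trajectories.

    src_lm, tgt_lm : (T, K, 2) arrays of K 2-D landmarks over T frames.
    Lower is better; 0 means the pair is perfectly synchronized.
    """
    return float(np.linalg.norm(src_lm - tgt_lm, axis=-1).mean())

def filter_pairs(pairs, max_error=2.0):
    """Keep only video pairs whose motion trajectories stay tightly aligned."""
    return [p for p in pairs if sync_error(p["src"], p["tgt"]) <= max_error]

rng = np.random.default_rng(1)
base = rng.normal(size=(16, 68, 2)) * 10                 # shared motion trajectory
good = {"src": base, "tgt": base + rng.normal(scale=0.1, size=base.shape)}
bad = {"src": base, "tgt": rng.normal(size=base.shape) * 10}  # unrelated motion

kept = filter_pairs([good, bad], max_error=2.0)
assert len(kept) == 1 and kept[0] is good
```

Filtering on measured synchrony rather than generation settings is what lets an automatically generated pair set reach the "high-consistency" quality the training relies on.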
Problem

Research questions and friction points this paper is trying to address.

- Preserving temporal synchronization during portrait video editing
- Maintaining identity consistency while modifying appearance or expressions
- Balancing edit fidelity with precise motion preservation in videos
Innovation

Methods, ideas, or system contributions that make the work stand out.

- In-context LoRA training for synchronization
- Synchronization-based filtering for training data curation
- Image-to-video diffusion with first-frame edit propagation
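The "in-context" part of the training can be pictured as packing a source video and its synchronized, differently-styled counterpart into one training example, so the model learns to take motion cues from one half and appearance from the other. A minimal sketch of such pair packing (side-by-side spatial concatenation is an illustrative choice here, not necessarily the paper's layout):

```python
import numpy as np

def pack_in_context(src_frames, tgt_frames):
    """Concatenate source and target frames side by side per time step.

    src_frames, tgt_frames : (T, H, W, C) videos with identical motion.
    Returns a (T, H, 2*W, C) sequence: the left half supplies motion cues,
    the right half supplies the edited appearance the model must produce.
    """
    assert src_frames.shape == tgt_frames.shape
    return np.concatenate([src_frames, tgt_frames], axis=2)

T, H, W, C = 8, 64, 64, 3
src = np.zeros((T, H, W, C), dtype=np.float32)   # stand-in source video
tgt = np.ones((T, H, W, C), dtype=np.float32)    # stand-in edited video
packed = pack_in_context(src, tgt)
assert packed.shape == (T, H, 2 * W, C)
```

At inference the edited half is seeded only with the modified first frame, and the image-to-video model propagates that edit across the sequence while following the source half's motion.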