Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-video generation methods handle dynamic concept personalization (disentangling and transferring subject appearance and motion from a single reference video) through instance-specific fine-tuning, which scales poorly. This paper introduces the first zero-shot framework for the task: it requires no fine-tuning and generates the target video in a single forward pass, given only a structured 2×2 video grid comprising a reference subject, driving motion, and corresponding masks. The authors propose Grid-LoRA, a lightweight adapter that enables cross-grid parameter sharing, and a Grid Fill module that jointly enforces spatial layout constraints and temporal consistency. Evaluated across diverse subjects and editing scenarios, the method produces videos with high visual fidelity, strong temporal coherence, and faithful identity preservation, significantly advancing the generalizability and practicality of personalized dynamic video synthesis.
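The paper does not spell out the grid construction in this summary; as a minimal sketch (assuming per-clip frame tensors of shape (T, H, W, C) and a hypothetical quadrant layout, since the source only states that the grid holds a reference subject, driving motion, and their masks), the 2×2 grid input could be assembled like this:

```python
import numpy as np

def make_grid_input(reference, motion, ref_mask, motion_mask):
    """Tile four equally-shaped video clips into one 2x2 grid video.

    Assumed layout (not specified in the source):
        top-left:    reference subject    top-right:    driving motion
        bottom-left: reference mask       bottom-right: motion mask
    All inputs are (T, H, W, C) arrays.
    """
    top = np.concatenate([reference, motion], axis=2)       # join along width
    bottom = np.concatenate([ref_mask, motion_mask], axis=2)
    return np.concatenate([top, bottom], axis=1)            # stack along height

# Example: four 8-frame RGB clips at 64x64 become one 8-frame 128x128 grid video
T, H, W, C = 8, 64, 64, 3
clips = [np.zeros((T, H, W, C), dtype=np.float32) for _ in range(4)]
grid = make_grid_input(*clips)
print(grid.shape)  # (8, 128, 128, 3)
```

At inference, the Grid Fill module would then complete a partially observed layout of this kind, so only some quadrants need to be provided.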

📝 Abstract
Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now feasible, most existing methods require per-instance fine-tuning, limiting scalability. We introduce a fully zero-shot framework for dynamic concept personalization in text-to-video models. Our method leverages structured 2×2 video grids that spatially organize input and output pairs, enabling the training of lightweight Grid-LoRA adapters for editing and composition within these grids. At inference, a dedicated Grid Fill module completes partially observed layouts, producing temporally coherent and identity-preserving outputs. Once trained, the entire system operates in a single forward pass, generalizing to previously unseen dynamic concepts without any test-time optimization. Extensive experiments demonstrate high-quality and consistent results across a wide range of subjects beyond trained concepts and editing scenarios.
Problem

Research questions and friction points this paper is trying to address.

Zero-shot personalization of dynamic video concepts
Eliminates per-instance fine-tuning for scalability
Generalizes to unseen concepts without test-time optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot dynamic concept personalization framework
Grid-LoRA adapters for video editing
Grid Fill module for coherent outputs
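The summary describes Grid-LoRA only as a lightweight adapter with cross-grid parameter sharing. As background on the underlying mechanism, a standard LoRA update to a frozen linear layer (W plus a low-rank correction B·A) can be sketched as below; how Grid-LoRA attaches such adapters to the video model's layers is not detailed in the source:

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer with a trainable low-rank update: W + (alpha/r) * B @ A.

    Generic LoRA formulation for illustration; rank, alpha, and the
    initialization scheme here are conventional defaults, not values
    taken from the paper.
    """
    def __init__(self, W, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen base weight
        self.A = rng.normal(0, 0.02, (rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))             # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base path plus scaled low-rank adapter path
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

W = np.eye(16)                 # stand-in for a pretrained weight
layer = LoRALinear(W, rank=4)
x = np.ones((2, 16))
y = layer(x)
print(y.shape)  # (2, 16)
```

Because B starts at zero, the adapter initially leaves the pretrained layer's output unchanged; training only A and B keeps the per-adapter parameter count small, which is what makes sharing one set of adapters across grids cheap.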