🤖 AI Summary
This work investigates lifelong learning for autoregressive video diffusion models under a single-video-stream training paradigm, aiming to match offline training performance during continual streaming. Methodologically, it introduces a lightweight subset-based experience replay mechanism that selectively caches critical historical frames to mitigate catastrophic forgetting, and establishes three million-scale synthetic benchmarks (Bouncing Balls, 3D Maze, and PLAICraft) to enable controlled, reproducible lifelong learning evaluation. Key contributions are threefold: (1) the first empirical demonstration that single-video-stream lifelong training achieves performance on par with offline training; (2) an efficient replay strategy that substantially reduces memory footprint and retraining overhead; and (3) the open-sourcing of the first benchmark suite dedicated to lifelong learning for video generation. Experiments show no statistically significant performance gap versus offline training on standard metrics, including FVD and LPIPS, under identical gradient step budgets.
📝 Abstract
This work demonstrates that training autoregressive video diffusion models from a single, continuous video stream is not only possible but, remarkably, can be competitive with standard offline training approaches given the same number of gradient steps. We further show that this main result can be achieved using experience replay that retains only a subset of the preceding video stream. We also contribute three new single-video generative modeling datasets suitable for evaluating lifelong video model learning: Lifelong Bouncing Balls, Lifelong 3D Maze, and Lifelong PLAICraft. Each dataset contains over a million consecutive frames from a synthetic environment, with the three environments increasing in complexity.
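To make the replay idea concrete, here is a minimal sketch of a fixed-capacity buffer that retains a subset of a frame stream. The selection strategy shown (uniform reservoir sampling) and all names (`SubsetReplayBuffer`, `add`, `sample`) are illustrative assumptions, not the paper's actual method, whose selection criterion for "critical" frames is not specified in this summary.

```python
import random

class SubsetReplayBuffer:
    """Illustrative fixed-capacity replay buffer for a video frame stream.

    Keeps a uniform random subset of all frames seen so far via reservoir
    sampling (an assumed strategy; the paper's criterion may differ).
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []              # cached historical frames
        self.seen = 0                 # total frames observed so far
        self.rng = random.Random(seed)

    def add(self, frame):
        """Observe one new frame from the stream."""
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(frame)
        else:
            # After n frames, each has probability capacity / n of being kept.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = frame

    def sample(self, k):
        """Draw a mini-batch of cached frames to mix with fresh stream frames."""
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))
```

In a lifelong training loop, each gradient step could combine the newest stream frames with a sample from this buffer, letting the model revisit earlier parts of the stream without storing it in full.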