Dynamic-Aware Video Distillation: Optimizing Temporal Resolution Based on Video Semantics

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video dataset distillation methods overlook semantic diversity and rigidly assume uniform temporal redundancy across classes, limiting both compression efficiency and fidelity. This paper proposes a dynamic-aware video distillation framework that, for the first time, adapts temporal resolution to video semantics, dynamically adjusting keyframe density according to the content of each class. It further designs a teacher-in-the-loop reinforcement learning strategy that uses downstream task performance as the reward signal to optimize temporal compression decisions end to end. Across multiple benchmarks, the method reaches higher downstream accuracy (+3.1% top-1) with significantly fewer frames (42% reduction on average), showing consistent advantages in compactness, fidelity, and generalization over prior approaches.

📝 Abstract
With the rapid development of vision tasks and the scaling of datasets and models, redundancy reduction in vision datasets has become a key area of research. To address this issue, dataset distillation (DD) has emerged as a promising approach to generating highly compact synthetic datasets with significantly less redundancy while preserving essential information. However, while DD has been extensively studied for image datasets, DD on video datasets remains underexplored. Video datasets present unique challenges due to the presence of temporal information and varying levels of redundancy across different classes. Existing DD approaches assume a uniform level of temporal redundancy across all video semantics, which limits their effectiveness on video datasets. In this work, we propose Dynamic-Aware Video Distillation (DAViD), a Reinforcement Learning (RL) approach to predict the optimal temporal resolution of the synthetic videos. A teacher-in-the-loop reward function is proposed to update the RL agent policy. To the best of our knowledge, this is the first study to introduce adaptive temporal resolution based on video semantics in video dataset distillation. Our approach significantly outperforms existing DD methods, and paves the way for future work on more efficient, semantics-adaptive video dataset distillation.
Problem

Research questions and friction points this paper is trying to address.

Reducing redundancy in video datasets while preserving essential information
Addressing varying temporal redundancy across different video semantics
Optimizing temporal resolution adaptively based on video content
Innovation

Methods, ideas, or system contributions that make the work stand out.

RL predicts optimal video temporal resolution
Teacher-in-the-loop reward updates RL policy
Adaptive resolution based on video semantics
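The innovation bullets above can be sketched as a toy REINFORCE loop: a per-class categorical policy picks one of several candidate temporal resolutions, and a stand-in teacher reward favors just enough frames to cover that class's motion complexity. The candidate resolutions, the reward shape, and all names below are illustrative assumptions, not DAViD's actual implementation, which scores candidates with a pretrained teacher network on the downstream task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate temporal resolutions (frames per synthetic video); values are assumed.
RESOLUTIONS = np.array([1, 2, 4, 8, 16])

def teacher_reward(semantic_dynamics, n_frames):
    """Stand-in for the teacher-in-the-loop reward. Downstream utility is imagined
    to saturate once n_frames covers the class's motion complexity, minus a small
    per-frame cost; the paper instead uses a pretrained teacher as the scorer."""
    needed = max(1, round(semantic_dynamics * 16))
    coverage = min(n_frames / needed, 1.0)
    return coverage - 0.03 * n_frames

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def train_policy(semantic_dynamics, steps=3000, lr=0.5):
    """REINFORCE with an EMA baseline over a categorical policy on RESOLUTIONS."""
    logits = np.zeros(len(RESOLUTIONS))
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choice(len(RESOLUTIONS), p=probs)
        reward = teacher_reward(semantic_dynamics, RESOLUTIONS[action])
        baseline = 0.9 * baseline + 0.1 * reward   # variance-reducing baseline
        grad_log_p = -probs
        grad_log_p[action] += 1.0                  # gradient of log p(action)
        logits += lr * (reward - baseline) * grad_log_p
    return int(RESOLUTIONS[np.argmax(logits)])

# Low-dynamics classes should be assigned fewer frames than high-dynamics ones.
static_frames = train_policy(semantic_dynamics=0.1)
dynamic_frames = train_policy(semantic_dynamics=0.9)
print(static_frames, dynamic_frames)
```

The EMA baseline keeps the policy-gradient updates low-variance; in the paper, the agent's decisions are coupled to the synthetic-video optimization rather than to a closed-form reward as here.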
Yinjie Zhao
PhD Student in AI at EEE, NTU, Singapore
Deep Learning, Multimodal, AI
Heng Zhao
The Rockefeller University
Image Restoration, Deep Learning, Inverse Problems
Bihan Wen
Associate Professor, Nanyang Technological University
Machine Learning, Image Processing, Computational Imaging, Computer Vision, Trustworthy AI
Y. Ong
CFAR, Agency for Science, Technology and Research (A*STAR), Singapore; CCDS, Nanyang Technological University, Singapore
J. Zhou
CFAR, Agency for Science, Technology and Research (A*STAR), Singapore