PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video dataset condensation faces challenges including the difficulty of modeling spatiotemporal coupling and high computational overhead. Existing approaches predominantly adopt static-dynamic decoupling paradigms, which struggle to jointly preserve motion sparsity and representation fidelity. To address this, we propose an end-to-end differentiable video distillation framework that abandons conventional decoupling and instead explicitly models inter-frame gradient relationships for joint spatiotemporal optimization. Our method introduces three key components: (i) gradient-driven progressive frame selection, (ii) motion-aware frame refinement, and (iii) adaptive frame insertion. Evaluated on standard action recognition benchmarks, our approach achieves an over 40% higher compression ratio with less than 1% top-1 accuracy degradation compared to state-of-the-art methods. Moreover, the resulting condensed data is lightweight and well-suited for edge deployment.
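Component (i), gradient-driven progressive frame selection, can be illustrated with a minimal sketch. The sketch below is a hypothetical reading of the idea, not the paper's exact rule: given per-frame gradient vectors, it greedily keeps the frames whose gradients are least redundant (lowest cosine similarity) with the already-selected set, so sparse motion is captured with few frames. The function name and the greedy criterion are assumptions for illustration.

```python
import numpy as np

def select_frames(frame_grads, budget):
    """Greedily pick `budget` frames whose per-frame gradients are least
    redundant with the frames already selected (illustrative sketch of
    gradient-driven progressive selection; not the paper's algorithm)."""
    grads = np.asarray(frame_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)   # unit-normalize gradients
    selected = [0]                               # keep the first frame as an anchor
    while len(selected) < budget:
        # redundancy = max cosine similarity to any already-selected frame
        sims = unit @ unit[selected].T
        redundancy = sims.max(axis=1)
        redundancy[selected] = np.inf            # never re-pick a chosen frame
        selected.append(int(redundancy.argmin()))
    return sorted(selected)

# Example: frames 0, 1, 3 share a gradient direction; frame 2 is orthogonal.
print(select_frames([[1, 0], [1, 0], [0, 1], [1, 0], [0.7, 0.7]], 2))
```

With a budget of 2 the selector keeps frame 0 plus the orthogonal frame 2, skipping the redundant duplicates in between.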

📝 Abstract
Video dataset condensation has emerged as a critical technique for addressing the computational challenges associated with large-scale video data processing in deep learning applications. While significant progress has been made in image dataset condensation, the video domain presents unique challenges due to the complex interplay between spatial content and temporal dynamics. This paper introduces PRISM (Progressive Refinement and Insertion for Sparse Motion), a novel approach to video dataset condensation that fundamentally reconsiders how video data should be condensed. Unlike previous methods that separate static content from dynamic motion, our method preserves the essential interdependence between these elements. Our approach progressively refines and inserts frames to fully accommodate the motion in an action, exploiting the relation between per-frame gradients to achieve better performance with less storage. Extensive experiments across standard video action recognition benchmarks demonstrate that PRISM outperforms existing disentangled approaches while maintaining compact representations suitable for resource-constrained environments.
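The progressive refine-and-insert loop described in the abstract can be sketched on a toy 1D "motion signal": start from a minimal set of keyframes and repeatedly insert the frame where the current sparse reconstruction errs the most. This is only an analogy for adaptive frame insertion, assuming linear interpolation as a stand-in for the model's reconstruction; the paper's actual criterion is gradient-based.

```python
import numpy as np

def progressive_insert(signal, num_keyframes):
    """Toy illustration of progressive insertion: begin with the two endpoint
    frames and greedily insert the frame with the largest reconstruction
    error under linear interpolation (an analogy, not PRISM itself)."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal))
    keys = [0, len(signal) - 1]
    while len(keys) < num_keyframes:
        ks = sorted(keys)
        recon = np.interp(t, ks, signal[ks])   # sparse reconstruction
        err = np.abs(recon - signal)
        err[ks] = -1.0                         # existing keyframes are excluded
        keys.append(int(err.argmax()))
    return sorted(keys)

# Example: a motion spike at frame 3 is the first frame inserted.
print(progressive_insert([0, 0, 0, 5, 0, 0, 0], 3))
```

The greedy choice means storage grows only where the sparse representation actually fails to capture the motion, mirroring the "less storage, same fidelity" goal.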
Problem

Research questions and friction points this paper is trying to address.

Condenses video datasets to reduce the computational cost of large-scale processing
Preserves spatial-temporal interdependence in video data
Improves performance while keeping storage compact for resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive refinement and insertion for sparse motion
Preserves the interdependence between static and dynamic elements
Exploits inter-frame gradient relations for compact video representation