Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space

📅 2024-09-09
🏛️ British Machine Vision Conference
🤖 AI Summary
Frame sampling for video classification faces a search space that grows combinatorially, as $\binom{T}{N}$, and quickly becomes computationally intractable. To address this, the work proposes a decoupled framework that assesses the value of each frame on its own. It reformulates frame selection as an independent per-frame confidence scoring problem and approximates the optimal $N$-frame subset by selecting the $N$ highest-scoring frames. This reduces the search space from $O(T^N)$ to $O(T)$, and the authors verify that the resulting semi-optimal policy can efficiently approximate the optimal policy, particularly under practical settings. The method is lightweight, model-agnostic, and operates on a fixed downstream classifier with no fine-tuning. Extensive experiments across multiple benchmarks and architectures show stable, high performance relative to existing sampling methods, robustness to variations in both frame count $N$ and video length $T$, and a reported inference speedup of over 100×.

📝 Abstract
Given a video with $T$ frames, frame sampling is the task of selecting $N \ll T$ frames so as to maximize the performance of a fixed video classifier. Not only brute-force search but also most existing methods suffer from the vast search space of $\binom{T}{N}$, especially when $N$ gets large. To address this challenge, we introduce a novel perspective of reducing the search space from $O(T^N)$ to $O(T)$. Instead of exploring the entire $O(T^N)$ space, our proposed semi-optimal policy selects the top $N$ frames based on the independently estimated value of each frame, using per-frame confidence, which significantly reduces the computational complexity. We verify that our semi-optimal policy can efficiently approximate the optimal policy, particularly under practical settings. Additionally, through extensive experiments on various datasets and model architectures, we demonstrate that learning our semi-optimal policy ensures stable and high performance regardless of the size of $N$ and $T$.
Problem

Research questions and friction points this paper is trying to address.

Reducing vast search space in video frame sampling
Maximizing video classifier performance efficiently
Approximating optimal policy with reduced complexity
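
To give a sense of scale for the search-space problem listed above, the following minimal sketch counts the $\binom{T}{N}$ candidate subsets a joint search would face and compares that with the $T$ per-frame scores needed when frames are valued independently. The function name `search_space_sizes` and the example values of `T` and `N` are illustrative assumptions, not taken from the paper.

```python
import math


def search_space_sizes(T: int, N: int) -> tuple[int, int]:
    """Return (number of N-frame subsets, number of per-frame scores).

    Hypothetical helper for illustration only.
    """
    if not 0 < N <= T:
        raise ValueError("N must satisfy 0 < N <= T")
    subsets = math.comb(T, N)  # candidates a joint (subset-level) search considers
    per_frame = T              # scores needed when each frame is valued on its own
    return subsets, per_frame


# Example values chosen for illustration: a 100-frame video, 8 sampled frames.
subsets, per_frame = search_space_sizes(T=100, N=8)
# subsets == 186087894300, per_frame == 100
```

Even at this modest size, the joint search space is roughly nine orders of magnitude larger than the per-frame one, and the gap widens as $N$ grows.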
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduces search space from O(T^N) to O(T)
Selects top N frames using per-frame confidence
Ensures stable performance across varying N and T
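
The top-$N$ selection step described above can be sketched as follows. This is a minimal illustration that assumes the per-frame confidence scores are already available as a list of floats; how the paper estimates those scores is not shown here, and the function name `select_top_n_frames` and the sample scores are hypothetical.

```python
import heapq
from typing import Sequence


def select_top_n_frames(confidences: Sequence[float], n: int) -> list[int]:
    """Pick the n frames with the highest per-frame confidence.

    Assumption: confidences[t] is an independently estimated value for
    frame t. Returns the chosen frame indices in temporal order.
    """
    if not 0 < n <= len(confidences):
        raise ValueError("n must satisfy 0 < n <= number of frames")
    # One pass over T scores; no enumeration of N-frame subsets is needed.
    top = heapq.nlargest(n, range(len(confidences)), key=confidences.__getitem__)
    # Restore temporal order so the classifier sees frames in sequence.
    return sorted(top)


# Hypothetical confidence scores for a 6-frame clip.
scores = [0.10, 0.92, 0.35, 0.80, 0.05, 0.77]
selected = select_top_n_frames(scores, n=3)  # -> [1, 3, 5]
```

Because each frame is scored on its own, the selection touches each of the $T$ scores once rather than evaluating subsets. Whether the chosen subset matches the jointly optimal one depends on how well independent per-frame values capture the classifier's behavior, which is the approximation the paper examines.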