🤖 AI Summary
To address the computationally intractable explosion of the frame-sampling search space—combinatorially scaling as $\binom{T}{N}$—in video classification, this work proposes a decoupled single-frame value assessment framework. It reformulates frame selection as an independent confidence scoring problem and approximates the optimal $N$-frame subset via greedy selection of the top-$N$ highest-scoring frames. This reduces time complexity from $O(T^N)$ to $O(T)$, and for the first time provides theoretical guarantees on approximation optimality and scalability. The method is lightweight, model-agnostic, and requires no fine-tuning of downstream classifiers. Extensive experiments across multiple benchmarks and architectures demonstrate consistent superiority over state-of-the-art sampling methods, robustness to variations in both frame count $N$ and video length $T$, and over 100× inference speedup.
📝 Abstract
Given a video with $T$ frames, frame sampling is the task of selecting $N \ll T$ frames so as to maximize the performance of a fixed video classifier. Not only brute-force search but also most existing methods suffer from the vast search space of $\binom{T}{N}$ candidates, especially when $N$ gets large. To address this challenge, we introduce a novel perspective of reducing the search space from $O(T^N)$ to $O(T)$. Instead of exploring the entire $O(T^N)$ space, our proposed semi-optimal policy selects the top $N$ frames based on the independently estimated value of each frame using per-frame confidence, significantly reducing the computational complexity. We verify that our semi-optimal policy can efficiently approximate the optimal policy, particularly under practical settings. Additionally, through extensive experiments on various datasets and model architectures, we demonstrate that learning our semi-optimal policy ensures stable and high performance regardless of the size of $N$ and $T$.
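The core idea of the semi-optimal policy—score each frame independently, then greedily keep the top $N$—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-frame confidence scores are assumed to come from some lightweight scorer (hypothetical here), and only the selection step is shown.

```python
import numpy as np

def select_frames(frame_scores, n):
    """Semi-optimal policy sketch: keep the N highest-scoring frames.

    frame_scores: length-T array of independently estimated per-frame
    confidences (produced by a hypothetical lightweight scorer).
    Returns the selected frame indices in temporal order.

    Selecting the top N by score replaces the exhaustive search over
    all C(T, N) subsets with a single linear pass over the frames.
    """
    scores = np.asarray(frame_scores)
    top_n = np.argpartition(scores, -n)[-n:]  # top-N indices, O(T)
    return np.sort(top_n)  # restore temporal order for the classifier

# Toy example: T = 8 frames, select N = 3
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.5]
print(select_frames(scores, 3))  # -> [1 3 5]
```

Because each frame is scored independently, the policy is agnostic to the downstream classifier and its cost grows linearly in $T$ rather than combinatorially in $N$.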