Switch-a-View: Few-Shot View Selection Learned from Edited Videos

📅 2024-12-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the viewpoint-adaptive selection problem for instructional multi-view operational videos. Methodologically, it proposes a few-shot transferable viewpoint orchestration framework that—uniquely—leverages unlabeled yet human-edited videos via self-supervised learning to model viewpoint transition patterns. The framework integrates pseudo-label generation using a pre-trained vision-language model, temporal attention modeling, cross-view contrastive learning, and a few-shot adaptation mechanism. Key contributions include: (1) rapid cross-domain adaptation without extensive manual annotation; (2) support for zero-shot domain transfer and real-time inference; and (3) state-of-the-art performance—achieving an 18.7% absolute improvement in viewpoint selection accuracy over baseline methods on both HowTo100M and Ego-Exo4D benchmarks.

Technology Category

Application Category

📝 Abstract
We introduce Switch-a-View, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is how to train such a model from unlabeled--but human-edited--video samples. We pose a pretext task that pseudo-labels segments in the training videos for their primary viewpoint (egocentric or exocentric), and then discovers the patterns between those view-switch moments on the one hand and the visual and spoken content in the how-to video on the other hand. Armed with this predictor, our model then takes an unseen multi-view video as input and orchestrates which viewpoint should be displayed when. We further introduce a few-shot training setting that permits steering the model towards a new data domain. We demonstrate our idea on a variety of real-world video from HowTo100M and Ego-Exo4D and rigorously validate its advantages.
Problem

Research questions and friction points this paper is trying to address.

Multi-angle Video
Automatic View Selection
Optimized Viewing Experience
Innovation

Methods, ideas, or system contributions that make the work stand out.

Switch-a-View
Automatic Perspective Selection
Few-shot Learning
🔎 Similar Papers
No similar papers found.