🤖 AI Summary
Existing video recognition models rely on fixed, coarse-grained taxonomies that struggle to adapt cost-effectively to evolving demands for fine-grained categories. This work introduces and addresses, for the first time, the zero-shot category splitting problem for video classifiers: by uncovering the latent compositional structure within a trained classifier, it automatically refines coarse categories into meaningful subcategories without requiring any additional labeled data. The approach further incorporates low-shot fine-tuning to enhance performance on the newly split classes. Evaluated on a newly established video category splitting benchmark, the method significantly outperforms vision-language baselines, achieving substantial gains in accuracy on novel subcategories while preserving classification performance on the original parent categories.
📝 Abstract
Video recognition models are typically trained on fixed taxonomies that are often too coarse, collapsing distinctions in object, manner, or outcome under a single label. As tasks and definitions evolve, such models cannot capture emerging distinctions, and collecting new annotations and retraining to accommodate such changes is costly. To address these challenges, we introduce category splitting, a new task in which an existing classifier is edited to refine a coarse category into finer subcategories while preserving accuracy elsewhere. We propose a zero-shot editing method that leverages the latent compositional structure of video classifiers to expose fine-grained distinctions without additional data. We further show that low-shot fine-tuning, while simple, is highly effective and benefits from our zero-shot initialization. Experiments on our new video benchmarks for category splitting demonstrate that our method substantially outperforms vision-language baselines, improving accuracy on the newly split categories without sacrificing performance on the rest. Project page: https://kaitingliu.github.io/Category-Splitting/.