🤖 AI Summary
Existing open-world visual learning research predominantly relies on in-distribution, canonical video data. It overlooks the potential of atypical videos, such as science fiction or animated content whose semantics deviate from real-world distributions, to enhance model generalization and novel concept discovery.
Method: We systematically investigate the role of such unconventional videos in open-world representation learning, proposing a hybrid training paradigm that integrates self-supervised and supervised contrastive learning to explicitly incorporate these samples and enrich the semantic diversity of the feature space (see the sketch after this summary).
Contribution/Results: Experiments demonstrate that semantic diversity induced by atypical videos significantly outperforms mere dataset scaling, yielding consistent improvements in out-of-distribution detection accuracy, novel class discovery recall, and zero-shot action recognition performance. To our knowledge, this is the first work to empirically validate atypical videos as effective “concept perturbation sources,” offering both a novel theoretical perspective and a practical pathway for advancing open-world visual learning.
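The hybrid objective mentioned above could plausibly be realised as a weighted sum of a self-supervised NT-Xent term over two augmented views of each clip and a supervised contrastive term over labelled (typical) clips. The sketch below illustrates that combination only as an assumption; the function names, the mixing weight `lam`, and the way labels are shared across views are ours, not taken from the paper.

```python
# Minimal sketch (not the paper's code) of a hybrid contrastive objective:
# a self-supervised NT-Xent term plus a supervised contrastive term.
import torch
import torch.nn.functional as F


def nt_xent(z1, z2, tau=0.1):
    """Self-supervised contrastive loss over two augmented views of the same clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                       # (2N, D)
    sim = z @ z.t() / tau                                 # temperature-scaled cosine similarities
    n = z1.size(0)
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))       # exclude self-similarity
    # The positive for view i is the other view of the same clip: i <-> i + n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def sup_con(z, labels, tau=0.1):
    """Supervised contrastive loss: embeddings with the same label attract each other."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)              # samples without positives contribute 0
    return -((log_prob * pos_mask).sum(1) / pos_counts).mean()


def hybrid_loss(z1, z2, labels, lam=0.5):
    """Weighted sum of the two terms; lam is a hypothetical mixing weight."""
    return nt_xent(z1, z2) + lam * sup_con(torch.cat([z1, z2]), labels.repeat(2))


# Toy usage: 8 clips, 128-D embeddings from two augmented views, 4 pseudo-classes.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
labels = torch.randint(0, 4, (8,))
loss = hybrid_loss(z1, z2, labels)
```

In such a setup, atypical clips without reliable labels would contribute only through the self-supervised term, while labelled typical clips shape the class structure via the supervised term; how the paper balances the two is not specified here.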
📝 Abstract
Humans usually show exceptional generalisation and discovery ability in the open world when shown uncommon new concepts. Whereas most existing studies in the literature focus on common typical data from closed sets, open-world novel discovery is under-explored in videos. In this paper, we are interested in asking: *What if atypical unusual videos are exposed in the learning process?* To this end, we collect a new video dataset consisting of various types of unusual atypical data (e.g., sci-fi, animation, etc.). To study how such atypical data may benefit open-world learning, we feed them into the model training process for representation learning. Focusing on three key tasks in open-world learning: out-of-distribution (OOD) detection, novel category discovery (NCD), and zero-shot action recognition (ZSAR), we found that even straightforward learning approaches with atypical data consistently improve performance across various settings. Furthermore, we found that increasing the categorical diversity of the atypical samples further boosts OOD detection performance. Additionally, in the NCD task, using a smaller yet more semantically diverse set of atypical samples leads to better performance compared to using a larger but more typical dataset. In the ZSAR setting, the semantic diversity of atypical videos helps the model generalise better to unseen action classes. These observations from our extensive experimental evaluations reveal the benefits of atypical videos for visual representation learning in the open world and, together with the newly proposed dataset, encourage further studies in this direction.