🤖 AI Summary
Existing knowledge transfer among pretrained models relies on manually predefined teacher-student relationships and struggles to exploit open model repositories effectively. Method: This paper proposes a role-agnostic, autonomous knowledge transfer framework centered on data-partition-based bidirectional knowledge distillation: each model adaptively assumes the teacher or student role on different data subsets, enabling collaborative knowledge flow across architectures and among multiple models without structural alignment constraints, and supporting plug-and-play cooperation among heterogeneous models. Contribution/Results: Evaluated on image classification (ViT-B gains roughly 1.4% accuracy), semantic segmentation, and video saliency prediction, the method achieves consistent improvements and sets a new state-of-the-art on video saliency prediction. To our knowledge, this is the first work to systematically model public model repositories as cooperative knowledge sources, establishing a general paradigm for lightweight model evolution.
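The per-sample role switching described above can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the partitioning criterion (per-sample cross-entropy against ground truth), the distillation temperature, and all function names here are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax, with optional distillation temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def assign_roles(logits_a, logits_b, labels):
    """Partition the batch: on each sample, the model with lower
    cross-entropy acts as the teacher (a hypothetical criterion).
    Returns a boolean mask, True where model A teaches model B."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    idx = np.arange(len(labels))
    ce_a = -np.log(pa[idx, labels] + 1e-12)
    ce_b = -np.log(pb[idx, labels] + 1e-12)
    return ce_a <= ce_b

def bidirectional_kd_losses(logits_a, logits_b, labels, temperature=2.0):
    """Each model receives a distillation loss only on the subset
    where the other model was assigned the teacher role."""
    a_teaches = assign_roles(logits_a, logits_b, labels)
    pa = softmax(logits_a, temperature)
    pb = softmax(logits_b, temperature)
    kl = lambda p, q: (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(-1)
    # B is the student where A teaches; A is the student elsewhere.
    loss_b = kl(pa[a_teaches], pb[a_teaches]).mean() if a_teaches.any() else 0.0
    loss_a = kl(pb[~a_teaches], pa[~a_teaches]).mean() if (~a_teaches).any() else 0.0
    return loss_a, loss_b
```

In a training loop, each loss would be added to the corresponding model's task loss, so that knowledge flows in both directions without ever fixing a global teacher.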
📝 Abstract
Pretrained models are ubiquitous in the current deep learning landscape, offering strong results on a broad range of tasks. Recent works have shown that models differing in various design choices exhibit categorically diverse generalization behavior, with each model grasping distinct data-specific insights unavailable to the other. In this paper, we propose to leverage large publicly available model repositories as an auxiliary source of model improvements. We introduce a data partitioning strategy under which pretrained models autonomously adopt either the role of a student, seeking knowledge, or that of a teacher, imparting knowledge. Experiments across various tasks demonstrate the effectiveness of the proposed approach. In image classification, bidirectional knowledge transfer with ViT-T improves the performance of ViT-B by approximately 1.4%. For semantic segmentation, our method boosts all evaluation metrics by enabling knowledge transfer both within and across backbone architectures. In video saliency prediction, our approach achieves a new state-of-the-art. We further extend our approach to knowledge transfer between multiple models, leading to considerable performance improvements for all participating models.