Learning Task-Agnostic Representations through Multi-Teacher Distillation

📅 2025-10-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the problem of learning task-agnostic, general-purpose representations from heterogeneous teacher models without access to task labels or task-specific prior knowledge. It proposes a multi-teacher knowledge distillation framework built on a "majority vote" objective. The authors show that this objective is bounded by the mutual information between the student's and the teachers' embeddings, which yields a distillation loss that needs no labels. The method is evaluated on text, vision, and molecular modeling. In these experiments, the learned embeddings lead to better performance on a range of downstream tasks, including classification, clustering, and regression, which supports their use as general-purpose representations.

📝 Abstract

Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision models, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.
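
To make the idea of a label-free, mutual-information-based distillation loss concrete, here is a minimal sketch. It is not taken from the paper: the abstract does not specify the estimator, so the sketch assumes one standard construction. With frozen teachers, I(S; T_k) = h(T_k) - h(T_k | S), and h(T_k) does not depend on the student. The negative log-likelihood of a teacher embedding under any conditional model given the student embedding upper-bounds h(T_k | S), so minimizing it raises a lower bound on the mutual information. The diagonal-Gaussian model, the linear prediction heads, and the names `gaussian_nll`, `multi_teacher_loss`, and `heads` are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_nll(teacher_emb, mean, log_var):
    """Mean negative log-likelihood of teacher embeddings under a diagonal
    Gaussian with the given mean (n, d) and per-dimension log-variance (d,)."""
    var = np.exp(log_var)
    per_dim = 0.5 * (np.log(2.0 * np.pi) + log_var + (teacher_emb - mean) ** 2 / var)
    return per_dim.sum(axis=1).mean()


def multi_teacher_loss(student_emb, teacher_embs, heads):
    """Sum of per-teacher conditional NLL terms; no task labels are used.

    student_emb : (n, d_s) student embeddings for a batch
    teacher_embs: list of (n, d_k) frozen teacher embeddings
    heads       : list of (W, b, log_var), one hypothetical linear head per
                  teacher (an assumption, not the paper's architecture)
    """
    total = 0.0
    for t_emb, (w, b, log_var) in zip(teacher_embs, heads):
        mean = student_emb @ w + b  # predicted teacher embedding
        total += gaussian_nll(t_emb, mean, log_var)
    return total


# Toy batch: one student, two teachers with different embedding sizes.
rng = np.random.default_rng(0)
student = rng.normal(size=(8, 4))
true_w = rng.normal(size=(4, 3))
teachers = [student @ true_w, rng.normal(size=(8, 5))]

good_heads = [(true_w, np.zeros(3), np.zeros(3)),
              (np.zeros((4, 5)), np.zeros(5), np.zeros(5))]
bad_heads = [(np.zeros((4, 3)), np.zeros(3), np.zeros(3)),
             (np.zeros((4, 5)), np.zeros(5), np.zeros(5))]

loss_good = multi_teacher_loss(student, teachers, good_heads)
loss_bad = multi_teacher_loss(student, teachers, bad_heads)
```

In this toy setup the first teacher is an exact linear function of the student, so the head that recovers that mapping gives a lower loss than a head that ignores the student. In practice the student and the heads would be trained jointly with an autodiff framework; NumPy is used here only to keep the arithmetic visible.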

Problem

Research questions and friction points this paper is trying to address.

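Multi-teacher distillation often remains tailored to specific tasks instead of yielding task-agnostic representations

Distillation objectives commonly depend on task-specific labels or prior knowledge

Embeddings need to stay useful across diverse downstream tasks such as classification, clustering, and regression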

Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-agnostic framework using majority vote objective

Distillation loss eliminates task-specific label dependence

Leverages teacher diversity to improve representations across text, vision, and molecular modeling
