Learning Unified Distance Metric Across Diverse Data Distributions with Parameter-Efficient Transfer Learning

📅 2023-09-16
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the challenges of imbalanced, heterogeneous multi-source data distributions in real-world scenarios and the poor generalization of single-dataset metric learning, this paper proposes Unified Metric Learning (UML)—a new paradigm for jointly learning a single, robust distance metric across multiple data distributions. Methodologically, the authors introduce PUMA, a parameter-efficient framework that freezes a pretrained backbone, adds stochastic adapters and a learnable prompt pool, and combines contrastive learning with joint optimization over multiple distributions to mitigate distributional bias and sample imbalance. The contributions are threefold: (1) the first UML benchmark, comprising eight heterogeneous datasets; (2) a model that trains only 1.4% of its parameters—about 69× fewer than state-of-the-art (SOTA) methods—while significantly improving cross-distribution generalization and fairness; and (3) consistent superiority over single-dataset SOTA methods across all tasks on the unified benchmark.
📝 Abstract
A common practice in metric learning is to train and test an embedding model for each dataset. This dataset-specific approach fails to simulate real-world scenarios that involve multiple heterogeneous distributions of data. In this regard, we explore a new metric learning paradigm, called Unified Metric Learning (UML), which learns a unified distance metric capable of capturing relations across multiple data distributions. UML presents new challenges, such as imbalanced data distribution and bias towards dominant distributions. These issues cause standard metric learning methods to fail in learning a unified metric. To address these challenges, we propose Parameter-efficient Unified Metric leArning (PUMA), which consists of a pre-trained frozen model and two additional modules, a stochastic adapter and a prompt pool. These modules enable the model to capture dataset-specific knowledge while avoiding bias towards dominant distributions. Additionally, we compile a new unified metric learning benchmark with a total of 8 different datasets. PUMA outperforms the state-of-the-art dataset-specific models while using about 69 times fewer trainable parameters.
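To make the frozen-backbone-plus-adapter idea concrete, below is a minimal NumPy sketch of a stochastically applied adapter on top of a frozen embedding model. All names, dimensions, the activation probability, and the inference-time scaling are illustrative assumptions, not the paper's actual implementation; the real PUMA operates on a pretrained transformer and also uses a prompt pool, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
D, BOTTLENECK = 64, 8

# Frozen pretrained backbone, stood in for by a fixed random projection.
W_frozen = rng.standard_normal((D, D)) / np.sqrt(D)

# Lightweight bottleneck adapter: down-project, ReLU, up-project.
# Only these weights would be trained; zero-init keeps the adapter
# an identity at the start of training.
W_down = rng.standard_normal((D, BOTTLENECK)) / np.sqrt(D)
W_up = np.zeros((BOTTLENECK, D))

def embed(x, p_adapter=0.5, training=True):
    """Embed x with the frozen backbone plus a stochastically active adapter."""
    h = x @ W_frozen  # frozen path: never updated
    adapter_out = np.maximum(x @ W_down, 0.0) @ W_up  # trainable path
    if training:
        # Stochastic activation: apply the adapter with probability p_adapter,
        # so no single residual path dominates the learned metric.
        if rng.random() < p_adapter:
            h = h + adapter_out
    else:
        # At inference, use the expected adapter contribution (an assumption
        # here, analogous to dropout-style rescaling).
        h = h + p_adapter * adapter_out
    # L2-normalize, as is standard for metric-learning embeddings.
    return h / np.linalg.norm(h)

x = rng.standard_normal(D)
z = embed(x, training=False)
```

In this toy setup the adapter holds 2 × 64 × 8 = 1,024 trainable weights against 4,096 frozen backbone weights, illustrating how only a small fraction of parameters needs gradient updates.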
Problem

Research questions and friction points this paper is trying to address.

Metric Learning
Data Imbalance
Multi-Dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-Efficient
Unified Metric Learning (PUMA)
Pre-trained Models and Stochastic Adapters