Efficient Long-Tail Learning in Latent Space by Sampling Synthetic Data

📅 2025-09-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing methods for imbalanced classification under long-tailed distributions often struggle to balance performance and computational efficiency. This paper proposes an efficient long-tailed learning paradigm: leveraging the semantic latent space of vision foundation models (e.g., CLIP), it performs controllable sampling to generate high-fidelity, diverse synthetic samples for tail classes, then mixes in a small number of real examples to train a lightweight linear classifier. By avoiding fine-tuning of large backbone networks, the approach significantly reduces computational overhead while improving minority-class recognition. It achieves state-of-the-art performance on CIFAR-100-LT and strong results on Places-LT, demonstrating its effectiveness, generalizability, and ease of deployment. The core contribution is the first systematic exploitation of foundation models' semantic latent spaces for low-cost, high-fidelity long-tailed data augmentation.

πŸ“ Abstract
Imbalanced classification datasets pose significant challenges in machine learning, often leading to biased models that perform poorly on underrepresented classes. With the rise of foundation models, recent research has focused on full, partial, and parameter-efficient fine-tuning of these models for long-tail classification. Despite their impressive performance on benchmark datasets, these works still fail to close the gap with networks trained on balanced datasets, and they require substantial computational resources even for relatively small datasets. Underscoring the importance of computational efficiency and simplicity, in this work we propose a novel framework that leverages the rich semantic latent space of Vision Foundation Models to generate synthetic data and trains a simple linear classifier on a mixture of real and synthetic data for long-tail classification. The computational efficiency gain arises from reducing the trainable parameters to just those of the linear model. Our method sets a new state of the art on the CIFAR-100-LT benchmark and demonstrates strong performance on the Places-LT benchmark, highlighting the effectiveness and adaptability of our simple approach.
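The pipeline described in the abstract can be sketched end-to-end with toy vectors standing in for frozen foundation-model features. The Gaussian per-class sampler, the dimensions, and the class sizes below are all illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy feature dimension (stand-in for frozen CLIP embeddings)

# Two head classes with many real samples, one tail class with only 5.
def make_class(n):
    mu = rng.normal(size=D)               # random class center in latent space
    return mu + 0.3 * rng.normal(size=(n, D))

feats = {0: make_class(200), 1: make_class(200), 2: make_class(5)}

# Controllable sampling in latent space: fit a diagonal Gaussian to the few
# real tail features and draw synthetic vectors to rebalance the class.
def synthesize(x, n_new):
    mu, sigma = x.mean(0), x.std(0) + 1e-6
    return mu + sigma * rng.normal(size=(n_new, D))

X = np.vstack([feats[0], feats[1], feats[2], synthesize(feats[2], 195)])
y = np.concatenate([np.full(200, 0), np.full(200, 1), np.full(200, 2)])

# Lightweight linear classifier (multinomial logistic regression) trained on
# the real + synthetic mixture; W and b are the only trainable parameters.
W, b = np.zeros((D, 3)), np.zeros(3)
for _ in range(300):
    logits = X @ W + b
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    p[np.arange(len(y)), y] -= 1          # softmax cross-entropy gradient
    W -= 0.1 * (X.T @ p) / len(y)
    b -= 0.1 * p.mean(0)

acc = (np.argmax(X @ W + b, 1) == y).mean()
```

On this toy data the balanced mixture lets the linear head fit the tail class despite its five real examples; the backbone is never touched, which is the source of the efficiency claim.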
Problem

Research questions and friction points this paper is trying to address.

Addresses imbalanced classification in long-tail datasets
Reduces computational resources for training models
Generates synthetic data using Vision Foundation Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic data in latent space
Uses linear classifier with real-synthetic mixture
Reduces trainable parameters for efficiency
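The last bullet is easy to quantify. Assuming a CLIP ViT-B/16 backbone (512-dimensional image features, roughly 86M parameters; both figures are approximate and the backbone choice is an assumption, not stated in this entry) and CIFAR-100's 100 classes:

```python
# Illustrative parameter counts; ViT-B/16 figures are approximate and the
# specific backbone is an assumption, not taken from the paper.
feat_dim, num_classes = 512, 100                       # CLIP ViT-B/16 features, CIFAR-100-LT
linear_params = feat_dim * num_classes + num_classes   # weight matrix + biases
backbone_params = 86_000_000                           # rough size of the vision tower
ratio = backbone_params // linear_params               # backbone is ~1,600x larger
```

Training only the linear head means on the order of fifty thousand trainable parameters instead of tens of millions, which is where the computational savings come from.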