🤖 AI Summary
In real-world long-tailed data, parameter-efficient fine-tuning (PEFT) methods often improve tail-class performance at the expense of head-class accuracy, while the critical factor of head-tail ratio remains largely overlooked.
Method: We propose LT-Soups, the first framework to systematically characterize the impact of head-tail ratio on PEFT generalization. It is a two-stage model soups approach: (i) averaging the weights of models fine-tuned on balanced subsets of the data to reduce head-class bias; (ii) fine-tuning only the classifier head on the full dataset to restore head-class accuracy.
Contribution/Results: Evaluated on six long-tailed benchmarks including CIFAR100, LT-Soups consistently outperforms both PEFT and traditional model soups baselines across diverse imbalance levels, achieving a superior trade-off between head-class and tail-class performance.
📝 Abstract
Real-world datasets typically exhibit long-tailed (LT) distributions, where a few head classes dominate and many tail classes are severely underrepresented. While recent work shows that parameter-efficient fine-tuning (PEFT) methods like LoRA and AdaptFormer preserve tail-class performance on foundation models such as CLIP, we find that they do so at the cost of head-class accuracy. We identify the head-tail ratio, the proportion of head to tail classes, as a crucial but overlooked factor influencing this trade-off. Through controlled experiments on CIFAR100 with varying imbalance ratio ($\rho$) and head-tail ratio ($\beta$), we show that PEFT excels in tail-heavy scenarios but degrades in more balanced and head-heavy distributions. To overcome these limitations, we propose LT-Soups, a two-stage model soups framework designed to generalize across diverse LT regimes. In the first stage, LT-Soups averages models fine-tuned on balanced subsets to reduce head-class bias; in the second, it fine-tunes only the classifier on the full dataset to restore head-class accuracy. Experiments across six benchmark datasets show that LT-Soups achieves superior trade-offs compared to both PEFT and traditional model soups across a wide range of imbalance regimes.
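The stage-1 averaging step described above can be sketched as plain parameter averaging over fine-tuned checkpoints. This is a minimal illustration, not the paper's implementation: the `uniform_soup` helper, the toy parameter dictionaries, and the key names are all hypothetical, and a real pipeline would average full model state dicts and then run the stage-2 classifier-only fine-tune on the complete long-tailed dataset.

```python
import numpy as np

def uniform_soup(state_dicts):
    """Stage 1 (sketch): uniformly average each parameter tensor across
    models fine-tuned on balanced subsets of the long-tailed data."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Two toy "checkpoints" standing in for models fine-tuned on different
# balanced subsets (hypothetical parameter names and values).
models = [
    {"backbone.w": np.array([1.0, 2.0]), "head.w": np.array([0.0])},
    {"backbone.w": np.array([3.0, 4.0]), "head.w": np.array([2.0])},
]

soup = uniform_soup(models)
# soup["backbone.w"] -> array([2., 3.]); soup["head.w"] -> array([1.])
# Stage 2 (not shown) would freeze the souped backbone and fine-tune
# only the classifier head on the full dataset to restore head accuracy.
```

The design choice in stage 1 is that averaging in weight space, rather than ensembling predictions, keeps inference cost identical to a single model.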