🤖 AI Summary
This work tackles the severe performance degradation on tail classes in long-tailed image classification by establishing a theoretical connection between the worst-class error and the spectral norm of the confusion matrix, viewed through the lens of inter-class confusion. To this end, the authors propose the Confusion-Aware Spectral Regularizer (CAR), which minimizes the spectral norm of a differentiable proxy of the confusion matrix during training, thereby reducing inter-class confusion and improving generalization on tail classes. The method incorporates an Exponential Moving Average (EMA)-based confusion estimator and is combined with ConCutMix data augmentation. Extensive experiments on benchmarks including ImageNet-LT, CIFAR100-LT, and iNaturalist show that CAR consistently outperforms existing approaches, in both training-from-scratch and fine-tuning settings, improving worst-class accuracy and overall performance at the same time.
📝 Abstract
Long-tailed image classification remains a long-standing challenge, as real-world data typically follow highly imbalanced distributions in which a few head classes dominate and many tail classes contain only limited samples. This imbalance biases feature learning toward head categories and causes significant degradation on rare classes. Although recent studies have proposed re-sampling, re-weighting, and decoupled learning strategies, the improvement on the most underrepresented classes remains marginal compared with overall accuracy. In this work, we present a confusion-centric perspective on long-tailed recognition that explicitly targets worst-class generalization. We first establish a new theoretical framework for class-specific error analysis, which shows that the worst-class error can be tightly upper-bounded by the spectral norm of the frequency-weighted confusion matrix plus a model-dependent complexity term. Guided by this insight, we propose the Confusion-Aware Spectral Regularizer (CAR), which minimizes the spectral norm of the confusion matrix during training to reduce inter-class confusion and enhance tail-class generalization. To enable stable and efficient optimization, CAR integrates a Differentiable Confusion Matrix Surrogate and an EMA-based Confusion Estimator that maintain smooth, low-variance estimates across mini-batches. Extensive experiments on multiple long-tailed benchmarks demonstrate that CAR substantially improves both worst-class accuracy and overall performance. When combined with ConCutMix augmentation, CAR consistently surpasses existing state-of-the-art long-tailed learning methods in both the training-from-scratch setting (by 2.37% to 4.83%) and the fine-tuning-from-pretrained setting (by 2.42% to 4.17%) across ImageNet-LT, CIFAR100-LT, and iNaturalist.
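The three ingredients the abstract names (a differentiable confusion-matrix surrogate, its spectral norm, and an EMA-smoothed estimate) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the surrogate here takes each true class's mean predicted probability vector as a confusion-matrix row, the spectral norm is computed as the largest singular value, and the EMA momentum and identity initialization are illustrative choices.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over logits.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_confusion_matrix(logits, labels, num_classes):
    # Differentiable surrogate (one plausible form): row c is the mean
    # predicted probability vector over samples whose true class is c,
    # so off-diagonal mass measures inter-class confusion.
    probs = softmax(logits)
    C = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            C[c] = probs[mask].mean(axis=0)
    return C

def spectral_norm(M):
    # Largest singular value of M; this is the quantity CAR penalizes.
    return np.linalg.svd(M, compute_uv=False)[0]

class EMAConfusionEstimator:
    # Smooths noisy per-batch confusion estimates across mini-batches.
    # Momentum value and identity initialization are assumptions here.
    def __init__(self, num_classes, momentum=0.9):
        self.C = np.eye(num_classes)
        self.momentum = momentum

    def update(self, batch_C):
        self.C = self.momentum * self.C + (1 - self.momentum) * batch_C
        return self.C
```

In a training loop, the regularization term added to the classification loss would be a coefficient times `spectral_norm` of the (EMA-smoothed) surrogate; an actual implementation would express the same computation in an autograd framework such as PyTorch so the penalty backpropagates through the logits.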