🤖 AI Summary
This work addresses the practical difficulty of reaching Neural Collapse (NC), the theoretically optimal state in supervised classification, by unifying cross-entropy (CE) and supervised contrastive learning (SCL) under a single view: prototype contrast on the unit hypersphere. On the CE side, it proposes two normalized losses, NTCE and NONL, which provide a large effective negative set and decouple the alignment and uniformity terms. On the SCL side, it proves that the SCL objective already optimizes, throughout training, for a classifier whose weights are the class-mean embeddings, making linear probing unnecessary. On four benchmarks including ImageNet-1K, NTCE and NONL surpass CE accuracy, closely approximate NC (at least 95%), and match CE's converged NC on 4 of 5 metrics in under 7.5% of CE's iterations, while SCL with fixed prototypes matches linear probing without the separate classifier training phase. The learned geometry also yields a 5.5% mean relative improvement in transfer learning, gains of up to 8.7% under severe class imbalance, and lower mCE on ImageNet-C.
📝 Abstract
Supervised classification has a theoretical optimum, Neural Collapse (NC), yet neither of its two dominant paradigms reaches it in practice. Cross entropy (CE) leaves radial degrees of freedom unconstrained and converges to a degenerate geometry, while supervised contrastive learning (SCL) drives features toward NC during pretraining but discards this structure in a post hoc linear probing phase. We show that both paradigms are different appearances of the same method, prototype contrast on the unit hypersphere, and that closing the gap requires fixing each at its specific point of failure. From the CE side, we propose NTCE and NONL, two normalized losses that import contrastive optimization's missing ingredients into classifier learning: a large effective negative set and decoupled alignment and uniformity terms. From the SCL side, we prove that SCL's objective already optimizes throughout training for a principled classifier whose weights are the class mean embeddings, making linear probing both redundant and harmful. Empirically, on four benchmarks including ImageNet-1K, NTCE and NONL surpass CE accuracy, closely approximate NC ($\geq 95\%$), and match CE's converged NC on 4/5 metrics in under $7.5\%$ of its iterations, while SCL with fixed prototypes matches linear probing without the hours-long classifier training phase. The learned geometry yields $+5.5\%$ mean relative improvement in transfer learning, up to $+8.7\%$ under severe class imbalance, and lower mCE on ImageNet-C, recasting supervised learning as prototype learning on the hypersphere, with NC reached by design on both paths.
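To make the idea of a classifier whose weights are the class mean embeddings concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes embeddings are projected onto the unit hypersphere, each class prototype is the re-normalized mean of that class's embeddings, and prediction is by highest cosine similarity. The function names (`l2_normalize`, `class_mean_prototypes`, `predict`) and the toy data are hypothetical, and the sketch does not implement NTCE, NONL, or any training loss.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit hypersphere (eps guards against zero norm)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def class_mean_prototypes(embeddings, labels, num_classes):
    """Return one unit-norm prototype per class: the normalized class-mean embedding.

    Assumption: every class in range(num_classes) has at least one sample.
    """
    z = l2_normalize(embeddings)
    means = np.stack([z[labels == k].mean(axis=0) for k in range(num_classes)])
    return l2_normalize(means)


def predict(embeddings, prototypes):
    """Assign each embedding to the prototype with the highest cosine similarity."""
    z = l2_normalize(embeddings)
    return np.argmax(z @ prototypes.T, axis=1)


# Toy data (hypothetical): three classes clustered around distinct coordinate axes.
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 8, 20
centers = np.eye(num_classes, dim)
labels = np.repeat(np.arange(num_classes), per_class)
embeddings = centers[labels] + 0.05 * rng.normal(size=(labels.size, dim))

prototypes = class_mean_prototypes(embeddings, labels, num_classes)
preds = predict(embeddings, prototypes)
```

Because the prototypes are computed directly from the embeddings, a classifier of this kind needs no separately trained linear layer, which is the property the abstract contrasts with post hoc linear probing.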