🤖 AI Summary
Problem: Hierarchical classification often neglects structural relationships between classes and suffers from severe class imbalance (long-tail distributions). Method: This paper proposes two hierarchical contrastive learning approaches that model hierarchical structure and long-tail distributions jointly within a single contrastive framework, which the summary presents as the first of its kind. The first approach uses a Gaussian Mixture Model (GMM) to capture hierarchy-specific feature distributions; the second uses an attention mechanism to extract hierarchy-specific features and encode semantic associations across levels. Both aim for fine-grained feature disentanglement and clustering at every hierarchy level, in a way meant to imitate human hierarchical processing. Contribution/Results: Under linear evaluation on CIFAR-100 and ModelNet40, the proposed methods improve accuracy by 2 percentage points over existing hierarchical contrastive learning methods, indicating effectiveness and generalization in hierarchical representation learning.
📝 Abstract
Hierarchical classification is a crucial task in many applications, where objects are organized into multiple levels of categories. However, conventional classification approaches often neglect the inherent inter-class relationships at different hierarchy levels and therefore miss important supervisory signals. To address this, we propose two novel hierarchical contrastive learning (HMLC) methods. The first leverages a Gaussian Mixture Model (G-HMLC), and the second uses an attention mechanism (A-HMLC) to capture hierarchy-specific features, imitating human processing. Our approach explicitly models inter-class relationships and the imbalanced class distribution at higher hierarchy levels, enabling fine-grained clustering across all hierarchy levels. On the competitive CIFAR-100 and ModelNet40 datasets, our approach achieves state-of-the-art performance in linear evaluation, outperforming existing hierarchical contrastive learning methods by 2 percentage points in accuracy. Both quantitative and qualitative results support the effectiveness of our approach and highlight its potential for applications in computer vision and beyond.
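The abstract does not give the loss itself, so the following is only a minimal sketch of the general idea behind hierarchical contrastive learning: a standard supervised contrastive loss is computed once per hierarchy level (coarse labels, then fine labels) and the per-level losses are summed. It is not G-HMLC or A-HMLC; there is no Gaussian mixture and no attention here. The function names, the temperature value, and the equal level weights are assumptions made for illustration.

```python
import math


def sup_con_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss on plain Python lists.

    Illustrative only: NOT the paper's G-HMLC or A-HMLC objective.
    """
    n = len(embeddings)
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all other samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def hierarchical_loss(embeddings, labels_per_level, level_weights=None, temperature=0.1):
    """Weighted sum of per-level losses (equal weights are an assumption)."""
    weights = level_weights or [1.0] * len(labels_per_level)
    return sum(
        w * sup_con_loss(embeddings, labels, temperature)
        for w, labels in zip(weights, labels_per_level)
    )


# Toy data: two tight clusters in 2-D.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
coarse = [0, 0, 1, 1]  # coarse labels follow the clusters
fine = [0, 0, 1, 2]    # fine labels split the second coarse class

aligned = hierarchical_loss(emb, [coarse, fine])
misaligned = hierarchical_loss(emb, [[0, 1, 0, 1], [0, 1, 0, 2]])
print(aligned < misaligned)
```

In this toy setup, labels that agree with the geometric clusters give a lower loss than labels that cut across them, which is the signal a per-level contrastive objective uses to shape the embedding at every level of the hierarchy.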