Graph Regularized Encoder Training for Extreme Classification

📅 2024-02-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address data sparsity for tail labels and the prohibitive computational overhead of Graph Convolutional Networks (GCNs) in extreme classification (XC) with million-scale label spaces, this paper proposes RAMEN, a framework that abandons conventional GCN-based label graph modeling. The authors show, both theoretically and empirically, that GCN computation can be eliminated entirely: RAMEN instead uses the label relationship graph to impose structured regularization on a deep encoder, improving training while adding zero inference overhead. The method balances expressiveness and efficiency: on public million-label benchmarks it improves accuracy by up to 15% over state-of-the-art methods, and on proprietary search-and-recommendation datasets it outperforms the best baseline by 10%. The core contribution is the "graph regularization instead of graph modeling" paradigm, an efficient and scalable approach to ultra-large-scale sparse-label classification.

📝 Abstract
Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.
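The abstract's central idea, regularizing encoder training with the label graph instead of running a GCN at inference, can be illustrated with a minimal sketch. The function below is a hypothetical Laplacian-style penalty (not the paper's actual RAMEN objective, whose exact form is not given here): it pulls embeddings of graph-linked labels together, and its value is added to the task loss only during training, so prediction-time cost is unchanged.

```python
import numpy as np

def graph_regularizer(Z, edges, weights=None):
    """Laplacian-style graph penalty on label embeddings.

    Z      : (n_labels, dim) array of label embeddings from the encoder.
    edges  : list of (i, j) index pairs from the label relationship graph.
    weights: optional per-edge weights (defaults to 1.0 for every edge).

    Returns sum over edges of w_ij * ||Z[i] - Z[j]||^2 -- small when
    graph-adjacent labels have similar embeddings.
    """
    if weights is None:
        weights = np.ones(len(edges))
    diffs = np.array([Z[i] - Z[j] for i, j in edges])  # (n_edges, dim)
    return float(np.sum(weights * np.sum(diffs ** 2, axis=1)))

# Toy usage: 4 label embeddings, a 2-edge label graph.
Z = np.array([[1.0, 0.0],
              [1.0, 0.1],
              [0.0, 1.0],
              [0.0, 0.9]])
penalty = graph_regularizer(Z, edges=[(0, 1), (2, 3)])
# total training loss would be: task_loss + lam * penalty
```

Because the penalty touches only the training objective, the deployed model is a plain (non-GCN) encoder, which is how a method of this kind achieves zero inference overhead.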
Problem

Research questions and friction points this paper is trying to address.

Extreme Classification
Computational Efficiency
Model Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAMEN method
Graph information optimization
Extreme classification problems
Anshul Mittal
Microsoft, India
Shikhar Mohan
Microsoft, India
Deepak Saini
Microsoft, India
Suchith C. Prabhu
IIT Delhi, India
Jian Jiao
Microsoft, India
Sumeet Agarwal
IIT Delhi, India
Soumen Chakrabarti
IIT Bombay, India
Purushottam Kar
IIT Kanpur, India
M. Varma
Microsoft, IIT Delhi, India