Retrieval-augmented Encoders for Extreme Multi-label Text Classification

📅 2025-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses a central tension in eXtreme Multi-Label Classification (XMC): achieving strong generalization on tail labels while retaining robust memorization of head labels. To this end, the authors propose RAEXMC, a Retrieval-Augmented Encoder framework built on a dual-encoder (DE) architecture and the first to introduce a non-parametric retrieval-augmentation mechanism into XMC. RAEXMC constructs an external knowledge memory as a key-value store and performs end-to-end inference via approximate nearest neighbor search, requiring no additional trainable parameters. It is trained with a single unified contrastive objective, avoiding the complex joint training pipelines that combine DE and one-vs-all (OVA) models. RAEXMC outperforms the state-of-the-art DEXML on four LF-XMC benchmarks and trains more than 10× faster on the LF-AmazonTitles-1.3M dataset using eight A100 GPUs.

📝 Abstract
Extreme multi-label classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. To tackle such a vast label space, current state-of-the-art methods fall into two categories. The one-versus-all (OVA) method uses learnable label embeddings for each label, excelling at memorization (i.e., capturing detailed training signals for accurate head label prediction). In contrast, the dual-encoder (DE) model maps input and label text into a shared embedding space for better generalization (i.e., the capability of predicting tail labels with limited training data), but may fall short at memorization. To achieve both generalization and memorization, existing XMC methods often combine DE and OVA models, which involves complex training pipelines. Inspired by the success of retrieval-augmented language models, we propose the Retrieval-augmented Encoders for XMC (RAEXMC), a novel framework that equips a DE model with retrieval-augmented capability for efficient memorization without additional trainable parameters. During training, RAEXMC is optimized by a contrastive loss over a knowledge memory that consists of both input instances and labels. During inference, given a test input, RAEXMC retrieves the top-$K$ keys from the knowledge memory and aggregates the corresponding values as the prediction scores. We showcase the effectiveness and efficiency of RAEXMC on four public LF-XMC benchmarks. RAEXMC not only advances over the state-of-the-art (SOTA) DE method DEXML, but also achieves more than a 10× training speedup on the largest LF-AmazonTitles-1.3M dataset under the same 8×A100 GPU training environment.
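The inference procedure described in the abstract — retrieve the top-$K$ keys from the knowledge memory, then aggregate their values into label scores — can be sketched roughly as below. The function name, the softmax weighting of retrieved keys, and the brute-force similarity search are illustrative assumptions; the paper's exact aggregation rule and ANN backend may differ.

```python
import numpy as np

def raexmc_inference_sketch(query_emb, memory_keys, memory_values,
                            num_labels, top_k=5):
    """Score labels by retrieving top-K memory keys and aggregating values.

    memory_keys:   (N, d) array of embeddings for training inputs and labels.
    memory_values: list of N label-index lists (a label's own key simply
                   maps to that label's index).
    """
    # Inner-product similarity to every key; a brute-force stand-in for
    # the approximate nearest neighbor search used at scale.
    sims = memory_keys @ query_emb
    top = np.argsort(-sims)[:top_k]
    # Softmax-weight the retrieved keys (an assumption for illustration),
    # then let each key vote for the labels stored as its value.
    weights = np.exp(sims[top] - sims[top].max())
    weights /= weights.sum()
    scores = np.zeros(num_labels)
    for w, key_idx in zip(weights, top):
        for label in memory_values[key_idx]:
            scores[label] += w
    return scores
```

Because the memory is a fixed key-value store built from encoded training instances and label texts, this step adds no trainable parameters — only the dual encoder itself is learned.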
Problem

Research questions and friction points this paper is trying to address.

Tail labels demand generalization while head labels demand memorization, and neither DE nor OVA models deliver both alone
Existing hybrids of DE and OVA models require complex training pipelines
Balancing training efficiency with prediction accuracy at extreme label scales
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-parametric retrieval augmentation of a dual encoder via a key-value knowledge memory
Unified contrastive loss over a memory of both input instances and labels
Top-K key retrieval with score aggregation at inference
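The unified contrastive objective listed above can be sketched as an InfoNCE-style log-softmax that pulls a query embedding toward its positive key in the knowledge memory and pushes it away from the rest. The function name and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def unified_contrastive_loss(query_emb, memory_keys, positive_idx,
                             temperature=0.05):
    """InfoNCE-style loss over a memory holding both instance and label
    embeddings: negative log-probability of the positive key under a
    softmax of scaled inner-product similarities. Illustrative sketch."""
    logits = memory_keys @ query_emb / temperature
    # Numerically stable log-sum-exp for the softmax denominator.
    m = logits.max()
    lse = m + np.log(np.exp(logits - m).sum())
    return -(logits[positive_idx] - lse)
```

A single objective of this shape is what lets RAEXMC avoid the separate DE and OVA training stages used by earlier hybrid pipelines.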