🤖 AI Summary
To address the challenge of automatic keyword extraction from diverse texts, particularly on smaller corpora, this paper proposes SEKE, the first supervised keyword extraction framework based on a mixture of experts (MoE). Methodologically, it introduces a learnable routing sub-network that directs each token to specialised experts, integrated with a DeBERTa backbone and a recurrent neural network for token-level classification. Its key contributions are: (i) the first application of MoE to supervised keyword extraction, with expert modules taking on interpretable, task-specific roles; and (ii) improved robustness on smaller corpora together with enhanced model transparency. The framework achieves state-of-the-art performance across multiple English benchmarks, outperforming strong supervised and unsupervised baselines. Analysis and visualisation confirm that, depending on data size and type, experts autonomously specialise into distinct submodules, such as punctuation detectors, stopword filters, part-of-speech classifiers, and named entity recognisers.
📝 Abstract
Keyword extraction involves identifying the most descriptive words in a document, allowing automatic categorisation and summarisation of large quantities of diverse textual data. Relying on the insight that real-world keyword detection often requires handling of diverse content, we propose a novel supervised keyword extraction approach based on the mixture of experts (MoE) technique. MoE uses a learnable routing sub-network to direct information to specialised experts, allowing them to focus on distinct regions of the input space. SEKE, a mixture of Specialised Experts for supervised Keyword Extraction, uses DeBERTa as the backbone model and builds on the MoE framework, in which experts attend to individual tokens, integrating it with a recurrent neural network (RNN) to enable successful extraction even on smaller corpora, where specialisation is harder due to the lack of training data. The MoE framework also provides insight into the inner workings of individual experts, enhancing the explainability of the approach. We benchmark SEKE on multiple English datasets, achieving state-of-the-art performance compared to strong supervised and unsupervised baselines. Our analysis reveals that, depending on data size and type, experts specialise in distinct syntactic and semantic components, such as punctuation, stopwords, parts of speech, or named entities. Code is available at: https://github.com/matejMartinc/SEKE_keyword_extraction
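To illustrate the core mechanism the abstract describes, the sketch below shows token-level MoE routing in plain NumPy: a learnable gating matrix scores each token embedding against the experts, and the top-scoring expert processes that token. This is a generic, minimal illustration of the MoE technique, not the authors' implementation; the dimensions, top-1 routing choice, and linear experts are assumptions for brevity (in SEKE, token embeddings would come from the DeBERTa backbone and feed a downstream RNN classifier).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Minimal token-level mixture-of-experts layer (illustrative sketch).

    A learnable gating matrix routes each token embedding to its top-1
    expert; each expert here is just a small linear map.
    """
    def __init__(self, dim, num_experts):
        self.gate = rng.normal(size=(dim, num_experts))      # router weights
        self.experts = [rng.normal(size=(dim, dim)) * 0.1    # expert weights
                        for _ in range(num_experts)]

    def forward(self, tokens):
        # tokens: (seq_len, dim) contextual embeddings, e.g. from DeBERTa
        scores = softmax(tokens @ self.gate)                 # (seq_len, E)
        chosen = scores.argmax(axis=-1)                      # top-1 expert id
        out = np.stack([tokens[i] @ self.experts[e]
                        for i, e in enumerate(chosen)])      # (seq_len, dim)
        # scale each output by its gate probability (common MoE practice)
        out *= scores[np.arange(len(chosen)), chosen, None]
        return out, chosen

layer = MoELayer(dim=16, num_experts=4)
tokens = rng.normal(size=(10, 16))       # 10 tokens in the sequence
out, chosen = layer.forward(tokens)
print(out.shape, chosen.shape)           # (10, 16) (10,)
```

Inspecting `chosen` per token is also what makes the approach explainable: grouping tokens by their assigned expert reveals whether an expert has specialised in, say, punctuation or stopwords.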