🤖 AI Summary
This work proposes a semantic-aware sparse mixture-of-experts (MoE) architecture to address the limitations of existing definition modeling approaches in semantic diversity and domain specificity. By clustering training data to delineate semantic domains, the method trains compact, domain-specific language models as specialized semantic experts and introduces a domain-level routing mechanism to enable fine-grained expert specialization. The architecture supports efficient inference and scalable expert expansion at test time. Evaluated on five mainstream benchmarks, the model achieves a 7% improvement in BLEU score over the previous state-of-the-art, with expert specialization contributing nearly a 10% gain in definition quality. Moreover, the domain-level routing strategy is more computationally efficient than conventional token-level routing.
📝 Abstract
We introduce LM-Lexicon, an innovative definition modeling approach that incorporates data clustering, semantic expert learning, and model merging using a sparse mixture-of-experts architecture. By decomposing the definition modeling task into specialized semantic domains, where small language models are trained as domain experts, LM-Lexicon achieves substantial improvements over existing methods on five widely used benchmarks (+7% BLEU score compared with the prior state-of-the-art model). Empirically, we demonstrate that 1) the clustering strategy enables fine-grained expert specialization with nearly 10% improvement in definition quality; 2) the semantic-aware domain-level routing mechanism achieves higher expert efficacy (+1%) than conventional token-level routing; and 3) further performance gains can be obtained through test-time compute and semantic expert scaling. Our work advances definition modeling while providing insights into the development of efficient language models for semantic-intensive applications.
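The domain-level routing idea described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (all names and shapes are assumptions, not the paper's implementation): clustering the training data yields one centroid per semantic domain, each domain gets its own compact expert, and each query is routed once to the expert of its nearest centroid, rather than making a routing decision for every token as in conventional token-level MoE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assume k=3 semantic domains, each represented by a centroid obtained by
# clustering training-data embeddings (random stand-ins here).
centroids = rng.normal(size=(3, 8))  # (num_domains, embed_dim)

# Stand-in "experts": one small domain-specific model per cluster; here
# each is just a stub that labels which expert produced the definition.
experts = [lambda q, i=i: f"definition from expert {i}" for i in range(3)]

def route(query_embedding: np.ndarray) -> int:
    """Domain-level routing: pick the domain with the nearest centroid."""
    dists = np.linalg.norm(centroids - query_embedding, axis=1)
    return int(np.argmin(dists))

def define(query_embedding: np.ndarray) -> str:
    """One expert call per query (not per token, as in token-level MoE)."""
    return experts[route(query_embedding)](query_embedding)

q = rng.normal(size=8)
print(define(q))
```

Because routing happens once per query at the domain level, only a single expert is activated per definition, which is the intuition behind the efficiency claim relative to token-level routing.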