AI Summary
Existing decoder-only large language models struggle to effectively leverage structured lexical semantic knowledge during text generation, resulting in insufficient inheritance of semantic relationships. This work proposes a lexical semantic knowledge distillation framework tailored for such models, which, for the first time, integrates structured semantic knowledge from lexical resources into the training process. By combining lexical knowledge distillation, contextual embedding alignment, and decoder fine-tuning, the approach enhances the model's semantic expressiveness without requiring on-the-fly dictionary lookups during inference. Experimental results demonstrate that the method significantly improves the model's ability to inherit lexical semantic relations across multiple benchmarks, achieving a favorable balance between semantic richness and training efficiency.
Abstract
Large language models (LLMs) learn contextual embeddings that capture rich semantic information, yet they often overlook structured lexical knowledge such as word senses and relationships. Prior work has shown that incorporating sense dictionaries can improve knowledge distillation for encoder models, but applying them to decoder-style generative models remains challenging. In this paper, we introduce Decoder-based Sense Knowledge Distillation (DSKD), a framework that integrates lexical resources into the training of decoder-style LLMs without requiring dictionary lookup at inference time. Extensive experiments on diverse benchmarks demonstrate that DSKD significantly enhances knowledge distillation performance for decoders, enabling generative models to inherit structured semantics while maintaining efficient training.
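For readers new to knowledge distillation, the generic objective that such frameworks build on can be sketched as follows. This is the standard temperature-softened KL formulation, not the DSKD objective, which the abstract does not specify. The function names and the default temperature are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Generic distillation loss: KL(teacher || student) on softened
    distributions, scaled by temperature squared.

    Assumption: this is the textbook formulation, used here only for
    illustration; it is not taken from the DSKD paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over a three-token vocabulary.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. A sense-aware method would presumably add lexical supervision on top of a term like this, but how DSKD does so is not stated here.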