🤖 AI Summary
To address the dual challenges of scarce high-quality annotated data and the high deployment cost of large language models in spoken language understanding (SLU), this paper proposes an adaptive feature distillation framework. The method employs a residual projection neural network (RPNN) to align the heterogeneous feature spaces of a teacher model (based on GTE) and a lightweight student model, and introduces a dynamic distillation coefficient (DDC) mechanism that adaptively adjusts distillation strength based on real-time performance feedback from the intent classification and slot filling tasks. This design improves knowledge transfer efficiency and generalization. Evaluated on the Chinese ProSLU benchmark, the proposed approach achieves 95.67% intent accuracy, 92.02% slot F1 score, and 85.50% overall (joint) accuracy, setting a new state of the art.
📝 Abstract
Spoken Language Understanding (SLU) is a core component of conversational systems, enabling machines to interpret user utterances. Despite its importance, developing effective SLU systems remains challenging due to the scarcity of labeled training data and the computational burden of deploying Large Language Models (LLMs) in real-world applications. To address these issues, we propose AFD-SLU, an Adaptive Feature Distillation framework that transfers rich semantic representations from a General Text Embeddings (GTE)-based teacher model to a lightweight student model. Our method introduces a dynamic adapter equipped with a Residual Projection Neural Network (RPNN) to align heterogeneous feature spaces, and a Dynamic Distillation Coefficient (DDC) that adaptively modulates the distillation strength based on real-time feedback from intent and slot prediction performance. Experiments on the Chinese profile-based ProSLU benchmark demonstrate that AFD-SLU achieves state-of-the-art results, with 95.67% intent accuracy, 92.02% slot F1 score, and 85.50% overall accuracy.
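The two mechanisms above can be sketched in a few lines. Everything here is an illustrative assumption, not the paper's implementation: the toy dimensions, the single-layer residual projection standing in for the RPNN, and the threshold-style DDC update rule are all hypothetical.

```python
import random

# Illustrative sketch only: architecture, dimensions, and the DDC update
# rule are assumptions for exposition, not the authors' implementation.
random.seed(0)
d_student, d_teacher = 4, 8          # toy feature dimensions

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

W = rand_matrix(d_student, d_teacher)    # main projection into teacher space
P = rand_matrix(d_student, d_teacher)    # residual projection path

def matvec(M, v):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def rpnn(h_student):
    # Map student features into the teacher's feature space: a main
    # projection plus a residual shortcut, so gradients have a direct path.
    return [p + r for p, r in zip(matvec(W, h_student), matvec(P, h_student))]

def update_ddc(ddc, prev_score, curr_score, step=0.1, lo=0.0, hi=1.0):
    # Dynamic Distillation Coefficient: when task performance (e.g. intent
    # accuracy or slot F1 on a dev set) stalls, lean more on the teacher;
    # when it improves, let the task losses dominate.
    if curr_score <= prev_score:
        return min(hi, ddc + step)
    return max(lo, ddc - step)

h = [random.gauss(0.0, 1.0) for _ in range(d_student)]
aligned = rpnn(h)                        # now comparable to teacher features
ddc = update_ddc(0.5, prev_score=0.90, curr_score=0.89)
print(len(aligned), round(ddc, 2))       # 8 0.6
```

During training, the feature-distillation loss between `rpnn(h_student)` and the teacher embedding would be scaled by the current coefficient before being added to the intent and slot losses.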