AFD-SLU: Adaptive Feature Distillation for Spoken Language Understanding

📅 2025-09-05
🤖 AI Summary
To address the dual challenges of scarce high-quality annotated data and high deployment costs of large language models in spoken language understanding (SLU), this paper proposes an adaptive feature distillation framework. The method employs a residual projection neural network (RPNN) to align heterogeneous feature spaces between a teacher model (based on GTE) and a lightweight student model, and introduces a dynamic distillation coefficient (DDC) mechanism that adaptively adjusts distillation intensity based on real-time performance feedback from intent classification and slot filling tasks. This design significantly enhances knowledge transfer efficiency and generalization capability. Evaluated on the Chinese ProSLU benchmark, the proposed approach achieves 95.67% intent accuracy, 92.02% slot F1-score, and 85.50% joint accuracy—setting new state-of-the-art results.
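The feature alignment described above can be sketched in a few lines. The following is a minimal NumPy illustration of a residual projection mapping student features into the teacher's embedding space, paired with a mean-squared distillation loss; the two-layer layout, ReLU activation, and initialization scale are assumptions for illustration, not the paper's exact RPNN architecture.

```python
import numpy as np

class ResidualProjection:
    """Sketch of a residual projection aligning a student feature space
    (dimension d_s) with a teacher feature space (dimension d_t).
    The layer count and shapes are illustrative assumptions."""

    def __init__(self, d_s, d_t, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.02, size=(d_s, d_t))  # main projection
        self.W2 = rng.normal(scale=0.02, size=(d_t, d_t))  # refinement layer
        self.Wr = rng.normal(scale=0.02, size=(d_s, d_t))  # residual shortcut

    def __call__(self, h_s):
        z = np.maximum(h_s @ self.W1, 0.0)    # ReLU on the main path
        return z @ self.W2 + h_s @ self.Wr    # residual connection in teacher space

def distill_loss(h_s_proj, h_t):
    """Mean-squared feature distillation loss between projected student
    features and teacher features."""
    return float(np.mean((h_s_proj - h_t) ** 2))
```

In training, `distill_loss` would be minimized alongside the intent and slot task losses, letting the lightweight student mimic the GTE teacher's representations despite the dimensionality mismatch.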

📝 Abstract
Spoken Language Understanding (SLU) is a core component of conversational systems, enabling machines to interpret user utterances. Despite its importance, developing effective SLU systems remains challenging due to the scarcity of labeled training data and the computational burden of deploying Large Language Models (LLMs) in real-world applications. To further alleviate these issues, we propose an Adaptive Feature Distillation framework that transfers rich semantic representations from a General Text Embeddings (GTE)-based teacher model to a lightweight student model. Our method introduces a dynamic adapter equipped with a Residual Projection Neural Network (RPNN) to align heterogeneous feature spaces, and a Dynamic Distillation Coefficient (DDC) that adaptively modulates the distillation strength based on real-time feedback from intent and slot prediction performance. Experiments on the Chinese profile-based ProSLU benchmark demonstrate that AFD-SLU achieves state-of-the-art results, with 95.67% intent accuracy, 92.02% slot F1 score, and 85.50% overall accuracy.
Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of labeled SLU training data
Reduces computational burden of deploying LLMs
Aligns heterogeneous feature spaces for distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Feature Distillation framework transfers semantic representations
Dynamic adapter with Residual Projection Neural Network aligns features
Dynamic Distillation Coefficient modulates strength using real-time feedback
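The DDC idea in the last bullet can be sketched as a simple schedule: distill harder while the student's task performance is low, and ease off as intent accuracy and slot F1 improve. The averaging of the two metrics, the linear schedule, and the bounds `lam_min`/`lam_max` below are illustrative assumptions, not the paper's exact formula.

```python
def dynamic_distillation_coefficient(intent_acc, slot_f1,
                                     lam_min=0.1, lam_max=1.0):
    """Sketch of a dynamic distillation coefficient driven by real-time
    feedback from intent classification and slot filling performance."""
    perf = 0.5 * (intent_acc + slot_f1)           # combined task feedback in [0, 1]
    lam = lam_max - (lam_max - lam_min) * perf    # better performance -> less distillation
    return max(lam_min, min(lam_max, lam))        # clamp to the configured range
```

A training step would then combine losses as `task_loss + lam * distill_loss`, so the distillation signal adapts to how well the student is already doing on both subtasks.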
Yan Xie
School of Advanced Manufacturing and Robotics, Peking University, Beijing, China
Yibo Cui
National Institute of Defense Technology Innovation, Academy of Military Sciences, Beijing, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
Liang Xie
Wuhan University of Technology
Time Series Forecasting · Cross-modal Learning
Erwei Yin
National Institute of Defense Technology Innovation, Academy of Military Sciences, Beijing, China; Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China