🤖 AI Summary
Existing natural language understanding (NLU) models for multi-turn dialogue suffer from weak contextual modeling capabilities and struggle to jointly parse intent, slots, and domain semantics. Method: This paper proposes a hierarchical knowledge distillation framework featuring a novel multi-teacher collaborative distillation mechanism: three specialized teacher models—intent recognizer, slot tagger, and domain classifier—are constructed hierarchically, and a unified multi-teacher loss function guides a lightweight student model. Additionally, the framework integrates hierarchical semantic modeling with Transformer-based multi-turn dialogue encoding to jointly optimize sentence-level, token-level, and dialogue-level representations. Contribution/Results: Evaluated on mainstream multi-turn NLU benchmarks, the method achieves state-of-the-art performance across all three tasks—intent detection, slot filling, and domain classification—significantly outperforming both single-turn and existing dialogue-aware NLU models.
📝 Abstract
Although Large Language Models (LLMs) can generate coherent and contextually relevant text, they often struggle to recognise the intent behind the human user's query. Natural Language Understanding (NLU) models, however, interpret the purpose and key information of the user's input to enable responsive interactions. Existing NLU models generally map an individual utterance to a dual-level semantic frame, involving sentence-level intent and word-level slot labels. However, real-life conversations are primarily multi-turn, requiring the interpretation of complex and extended dialogues. Researchers face challenges in addressing all facets of multi-turn dialogue with a single unified NLU model. This paper introduces MIDAS, a novel approach that leverages multi-level intent, domain, and slot knowledge distillation for multi-turn NLU. To achieve this, we construct distinct teachers for varying levels of conversation knowledge, namely sentence-level intent detection, word-level slot filling, and conversation-level domain classification. These teachers are then fine-tuned to acquire specific knowledge of their designated levels. A multi-teacher loss is proposed to combine these multi-level teachers and guide a student model in multi-turn dialogue tasks. The experimental results demonstrate the efficacy of our model in improving overall multi-turn conversation understanding, showcasing the potential for advancing NLU models through the incorporation of multi-level dialogue knowledge distillation.
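The abstract does not spell out the form of the multi-teacher loss, so the following is only a minimal PyTorch sketch of how soft labels from three level-specific teachers (intent, slot, domain) might be blended with the student's hard-label losses. The tensor names, the equal weighting of the three teachers, and the `alpha`/`temperature` hyperparameters are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(
    student_intent_logits,   # (batch, num_intents)
    student_slot_logits,     # (batch, seq_len, num_slots)
    student_domain_logits,   # (batch, num_domains)
    teacher_intent_logits,   # from the sentence-level intent teacher (assumed)
    teacher_slot_logits,     # from the word-level slot teacher (assumed)
    teacher_domain_logits,   # from the conversation-level domain teacher (assumed)
    intent_labels,           # (batch,)
    slot_labels,             # (batch, seq_len), padding marked with -100
    domain_labels,           # (batch,)
    temperature=2.0,         # assumed softening temperature
    alpha=0.5,               # assumed weight between hard-label and distillation terms
):
    """Illustrative sketch: hard-label task losses plus soft-label KL terms
    from three level-specific teachers, combined with a single weight."""

    def kd(student_logits, teacher_logits):
        # Standard soft-target KL divergence, scaled by T^2.
        s = F.log_softmax(student_logits / temperature, dim=-1)
        t = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

    # Hard-label cross-entropy for each of the three tasks.
    ce_intent = F.cross_entropy(student_intent_logits, intent_labels)
    ce_slot = F.cross_entropy(
        student_slot_logits.reshape(-1, student_slot_logits.size(-1)),
        slot_labels.reshape(-1),
        ignore_index=-100,  # skip padded tokens
    )
    ce_domain = F.cross_entropy(student_domain_logits, domain_labels)

    # Soft-label distillation from each teacher (padding handling simplified).
    kd_intent = kd(student_intent_logits, teacher_intent_logits)
    kd_slot = kd(
        student_slot_logits.reshape(-1, student_slot_logits.size(-1)),
        teacher_slot_logits.reshape(-1, teacher_slot_logits.size(-1)),
    )
    kd_domain = kd(student_domain_logits, teacher_domain_logits)

    hard = ce_intent + ce_slot + ce_domain
    soft = kd_intent + kd_slot + kd_domain
    return alpha * hard + (1 - alpha) * soft
```

In this sketch the three teachers contribute equally; the paper's actual loss may weight the sentence-, word-, and conversation-level signals differently or schedule them during training.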