Do Large Language Models Understand Multi-Intent Spoken Language?

📅 2024-03-07
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
To address two key limitations of large language models (LLMs) in multi-intent spoken language understanding (SLU)—word-level alignment bias arising from autoregressive generation and insufficient modeling of semantic-level intent relationships—this paper proposes the EN-LLM and ENSI-LLM architectures. Methodologically, it introduces: (1) a restructured entity-slot modeling paradigm; (2) the Sub-Intent Instruction (SII) mechanism, the first of its kind to explicitly guide fine-grained intent decomposition; (3) synthetic benchmarks LM-MixATIS and LM-MixSNIPS; and (4) two novel fine-grained evaluation metrics—Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA). Experiments demonstrate state-of-the-art performance on multi-intent SLU tasks, with significant improvements in robustness to complex intent compositions and distributional shifts, as well as enhanced generalization capability.
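The two metrics named above (ESA and CSA) are not formally defined on this page. The minimal Python sketch below is an assumption-laden illustration only: it assumes ESA is utterance-level exact match over predicted (slot, entity) pairs and that CSA additionally requires the predicted intent set to match; the paper's actual definitions may differ.

```python
# Hedged sketch of exact-match style metrics for multi-intent SLU.
# Assumption: ESA = utterance-level exact match of (slot, entity) pairs;
# CSA = both the intent set and the entity-slot pairs must be correct.

def entity_slot_accuracy(pred_slots, gold_slots):
    """Fraction of utterances whose predicted entity-slot pairs exactly
    match the gold pairs (order-insensitive)."""
    correct = sum(set(p) == set(g) for p, g in zip(pred_slots, gold_slots))
    return correct / len(gold_slots)

def combined_semantic_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    """Fraction of utterances where both the multi-intent set and the
    entity-slot pairs are exactly correct."""
    correct = sum(
        set(pi) == set(gi) and set(ps) == set(gs)
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)

# Example on a single MixATIS-style utterance:
pred_intents = [["atis_flight", "atis_airfare"]]
gold_intents = [["atis_flight", "atis_airfare"]]
pred_slots = [[("fromloc.city_name", "boston"), ("toloc.city_name", "denver")]]
gold_slots = [[("fromloc.city_name", "boston"), ("toloc.city_name", "denver")]]
print(entity_slot_accuracy(pred_slots, gold_slots))                        # 1.0
print(combined_semantic_accuracy(pred_intents, gold_intents,
                                 pred_slots, gold_slots))                  # 1.0
```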

Technology Category
Large Language Model

Application Category
Natural Language Processing
📝 Abstract
This research signifies a considerable breakthrough in leveraging Large Language Models (LLMs) for multi-intent spoken language understanding (SLU). Our approach re-imagines the use of entity slots in multi-intent SLU applications, making the most of the generative potential of LLMs within the SLU landscape, leading to the development of the EN-LLM series. Furthermore, we introduce the concept of Sub-Intent Instruction (SII) to amplify the analysis and interpretation of complex, multi-intent communications, which further supports the creation of the ENSI-LLM model series. Our novel datasets, identified as LM-MixATIS and LM-MixSNIPS, are synthesized from existing benchmarks. The study evidences that LLMs may match or even surpass the performance of the current best multi-intent SLU models. We also scrutinize the performance of LLMs across a spectrum of intent configurations and dataset distributions. On top of this, we present two revolutionary metrics - Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA) - to facilitate a detailed assessment of LLM competence in this multifaceted field. Our code and datasets are available at https://github.com/SJY8460/SLM.
Problem

Research questions and friction points this paper is trying to address.

Addresses word-level alignment bias of autoregressive LLMs in token-level SLU tasks
Improves modeling of semantic-level relationships among multiple intents
Enables step-by-step multi-intent recognition through Chain of Intent (a rough illustration follows this list)
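As a rough illustration of the step-by-step decomposition idea referenced in the last item, the sketch below builds a hypothetical Sub-Intent-Instruction-style prompt and one plausible output shape; the template, intent labels, and slot names are assumptions, not the paper's actual format.

```python
# Hypothetical prompt for chain-of-intent style decomposition (assumed format,
# not the paper's exact template): ask the model to split a multi-intent
# utterance into sub-utterances, then label each with an intent and its slots.

utterance = "show me flights from boston to denver and what is the cheapest fare"

prompt = (
    "Decompose the utterance into sub-utterances, one per intent, "
    "then assign each sub-utterance an intent label and its entity slots.\n"
    f"Utterance: {utterance}\n"
    "Sub-intents:"
)

# One plausible shape for the expected answer:
expected_format = [
    {"sub_utterance": "show me flights from boston to denver",
     "intent": "atis_flight",
     "entity_slots": [("fromloc.city_name", "boston"),
                      ("toloc.city_name", "denver")]},
    {"sub_utterance": "what is the cheapest fare",
     "intent": "atis_airfare",
     "entity_slots": []},
]
print(prompt)
print(expected_format)
```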
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates slot filling as an entity recognition task (see the sketch after this list)
Introduces Chain of Intent for multi-intent recognition
Improves spoken language understanding over standard fine-tuning
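As referenced in the first item above, the sketch below shows one plausible way to collapse BIO-tagged slot annotations into (slot type, entity text) pairs that a generative LLM can emit directly. The exact representation used by the EN-LLM series is not specified on this page, so treat this as an assumption.

```python
# Hedged illustration of recasting token-level slot filling (BIO tags) as
# entity recognition: instead of predicting one tag per token, the model
# outputs (slot_type, entity_text) pairs.

def bio_to_entity_slots(tokens, bio_tags):
    """Collapse BIO-tagged tokens into (slot_type, entity_text) pairs."""
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            if current_type:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["show", "flights", "from", "boston", "to", "denver"]
tags = ["O", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
print(bio_to_entity_slots(tokens, tags))
# [('fromloc.city_name', 'boston'), ('toloc.city_name', 'denver')]
```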
Shangjian Yin
College of Mathematics and Informatics, South China Agricultural University, China
Peijie Huang
College of Mathematics and Informatics, South China Agricultural University, China
Yuhong Xu
College of Mathematics and Informatics, South China Agricultural University, China
Haojing Huang
Tsinghua University
Jiatian Chen
College of Mathematics and Informatics, South China Agricultural University, China