Do Large Language Models Understand Multi-Intent Spoken Language?

📅 2024-03-07
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
To address two key limitations of large language models (LLMs) in multi-intent spoken language understanding (SLU)—word-level alignment bias arising from autoregressive generation and insufficient modeling of semantic-level intent relationships—this paper proposes the EN-LLM and ENSI-LLM architectures. Methodologically, it introduces: (1) a restructured entity-slot modeling paradigm; (2) the Sub-Intent Instruction (SII) mechanism, the first of its kind to explicitly guide fine-grained intent decomposition; (3) synthetic benchmarks LM-MixATIS and LM-MixSNIPS; and (4) two novel fine-grained evaluation metrics—Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA). Experiments demonstrate state-of-the-art performance on multi-intent SLU tasks, with significant improvements in robustness to complex intent compositions and distributional shifts, as well as enhanced generalization capability.
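The two metrics named above (ESA and CSA) are not formally defined on this page. The minimal Python sketch below is an assumption-laden illustration only: it assumes ESA is utterance-level exact match over predicted (slot, entity) pairs and that CSA additionally requires the predicted intent set to match; the paper's actual definitions may differ.

```python
# Hedged sketch of exact-match style metrics for multi-intent SLU.
# Assumption: ESA = utterance-level exact match of (slot, entity) pairs;
# CSA = both the intent set and the entity-slot pairs must be correct.

def entity_slot_accuracy(pred_slots, gold_slots):
    """Fraction of utterances whose predicted entity-slot pairs exactly
    match the gold pairs (order-insensitive)."""
    correct = sum(set(p) == set(g) for p, g in zip(pred_slots, gold_slots))
    return correct / len(gold_slots)

def combined_semantic_accuracy(pred_intents, gold_intents, pred_slots, gold_slots):
    """Fraction of utterances where both the multi-intent set and the
    entity-slot pairs are exactly correct."""
    correct = sum(
        set(pi) == set(gi) and set(ps) == set(gs)
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)

# Example on a single MixATIS-style utterance:
pred_intents = [["atis_flight", "atis_airfare"]]
gold_intents = [["atis_flight", "atis_airfare"]]
pred_slots = [[("fromloc.city_name", "boston"), ("toloc.city_name", "denver")]]
gold_slots = [[("fromloc.city_name", "boston"), ("toloc.city_name", "denver")]]
print(entity_slot_accuracy(pred_slots, gold_slots))                        # 1.0
print(combined_semantic_accuracy(pred_intents, gold_intents,
                                 pred_slots, gold_slots))                  # 1.0
```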

Technology Category
Large Language Model

Application Category
Natural Language Processing
📝 Abstract
This research signifies a considerable breakthrough in leveraging Large Language Models (LLMs) for multi-intent spoken language understanding (SLU). Our approach re-imagines the use of entity slots in multi-intent SLU applications, making the most of the generative potential of LLMs within the SLU landscape, leading to the development of the EN-LLM series. Furthermore, we introduce the concept of Sub-Intent Instruction (SII) to amplify the analysis and interpretation of complex, multi-intent communications, which further supports the creation of the ENSI-LLM model series. Our novel datasets, identified as LM-MixATIS and LM-MixSNIPS, are synthesized from existing benchmarks. The study evidences that LLMs may match or even surpass the performance of the current best multi-intent SLU models. We also scrutinize the performance of LLMs across a spectrum of intent configurations and dataset distributions. On top of this, we present two revolutionary metrics - Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA) - to facilitate a detailed assessment of LLM competence in this multifaceted field. Our code and datasets are available at https://github.com/SJY8460/SLM.
Problem

Research questions and friction points this paper is trying to address.

Addresses word-level alignment bias of autoregressive LLMs in token-level SLU tasks
Improves modeling of semantic-level relationships among multiple intents
Enables step-by-step multi-intent recognition through Chain of Intent (a rough illustration follows this list)
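As a rough illustration of the step-by-step decomposition idea referenced in the last item, the sketch below builds a hypothetical Sub-Intent-Instruction-style prompt and one plausible output shape; the template, intent labels, and slot names are assumptions, not the paper's actual format.

```python
# Hypothetical prompt for chain-of-intent style decomposition (assumed format,
# not the paper's exact template): ask the model to split a multi-intent
# utterance into sub-utterances, then label each with an intent and its slots.

utterance = "show me flights from boston to denver and what is the cheapest fare"

prompt = (
    "Decompose the utterance into sub-utterances, one per intent, "
    "then assign each sub-utterance an intent label and its entity slots.\n"
    f"Utterance: {utterance}\n"
    "Sub-intents:"
)

# One plausible shape for the expected answer:
expected_format = [
    {"sub_utterance": "show me flights from boston to denver",
     "intent": "atis_flight",
     "entity_slots": [("fromloc.city_name", "boston"),
                      ("toloc.city_name", "denver")]},
    {"sub_utterance": "what is the cheapest fare",
     "intent": "atis_airfare",
     "entity_slots": []},
]
print(prompt)
print(expected_format)
```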
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates slot filling as an entity recognition task (see the sketch after this list)
Introduces Chain of Intent for multi-intent recognition
Improves spoken language understanding over standard fine-tuning
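As referenced in the first item above, the sketch below shows one plausible way to collapse BIO-tagged slot annotations into (slot type, entity text) pairs that a generative LLM can emit directly. The exact representation used by the EN-LLM series is not specified on this page, so treat this as an assumption.

```python
# Hedged illustration of recasting token-level slot filling (BIO tags) as
# entity recognition: instead of predicting one tag per token, the model
# outputs (slot_type, entity_text) pairs.

def bio_to_entity_slots(tokens, bio_tags):
    """Collapse BIO-tagged tokens into (slot_type, entity_text) pairs."""
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            if current_type:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["show", "flights", "from", "boston", "to", "denver"]
tags = ["O", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
print(bio_to_entity_slots(tokens, tags))
# [('fromloc.city_name', 'boston'), ('toloc.city_name', 'denver')]
```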
Shangjian Yin
College of Mathematics and Informatics, South China Agricultural University, China
Peijie Huang
College of Mathematics and Informatics, South China Agricultural University, China
Yuhong Xu
College of Mathematics and Informatics, South China Agricultural University, China
Haojing Huang
Tsinghua University
Jiatian Chen
College of Mathematics and Informatics, South China Agricultural University, China