MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

📅 2025-12-01

🤖 AI Summary

Existing SLU datasets suffer from limited scenario diversity and simplistic intent structures, and no unified evaluation benchmark exists for large models. To address these limitations, the authors introduce MAC-SLU, described as the first multi-intent spoken language understanding dataset designed specifically for in-vehicle cabin environments. It covers realistic complex interactions, context dependency, and utterances that carry several intents at once. MAC-SLU supports both end-to-end and pipeline-based SLU paradigms and establishes a unified, fine-grained evaluation benchmark for large language models (LLMs) and large audio-language models (LALMs) in the in-vehicle domain. Experimental results show that supervised fine-tuning substantially outperforms in-context learning, and that end-to-end LALMs achieve performance comparable to pipeline methods while avoiding ASR error propagation. The work fills a gap in multi-intent in-vehicle SLU benchmarking and supports more rigorous evaluation of foundation models on real-world speech understanding tasks.

📝 Abstract

Spoken Language Understanding (SLU), which aims to extract user semantics to execute downstream tasks, is a crucial component of task-oriented dialog systems. Existing SLU datasets generally lack sufficient diversity and complexity, and there is an absence of a unified benchmark for the latest Large Language Models (LLMs) and Large Audio Language Models (LALMs). This work introduces MAC-SLU, a novel Multi-Intent Automotive Cabin Spoken Language Understanding Dataset, which increases the difficulty of the SLU task by incorporating authentic and complex multi-intent data. Based on MAC-SLU, we conducted a comprehensive benchmark of leading open-source LLMs and LALMs, covering methods like in-context learning, supervised fine-tuning (SFT), and end-to-end (E2E) and pipeline paradigms. Our experiments show that while LLMs and LALMs have the potential to complete SLU tasks through in-context learning, their performance still lags significantly behind SFT. Meanwhile, E2E LALMs demonstrate performance comparable to pipeline approaches and effectively avoid error propagation from speech recognition. Code (https://github.com/Gatsby-web/MAC_SLU) and datasets (huggingface.co/datasets/Gatsby1984/MAC_SLU) are released publicly.
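
To make the multi-intent setting concrete, the following minimal sketch represents one utterance as a set of intent frames and scores a prediction against a reference. Everything in it is an assumption for illustration: the `IntentFrame` fields, the intent and slot names, and the frame-level exact-match and F1 scoring are not taken from the MAC-SLU schema or from the paper's evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentFrame:
    """One intent plus its slots. Field names are hypothetical, not the dataset schema."""
    intent: str
    slots: frozenset  # frozenset of (slot_name, slot_value) pairs


def frame(intent, **slots):
    """Build a hashable frame from keyword slots."""
    return IntentFrame(intent, frozenset(slots.items()))


def score_utterance(predicted, gold):
    """Compare predicted and gold frames for a single utterance.

    A frame counts as correct only if its intent and all of its slots match.
    This is one plausible scoring choice, assumed here for illustration.
    """
    pred_set, gold_set = set(predicted), set(gold)
    matched = len(pred_set & gold_set)
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return {
        "exact_match": pred_set == gold_set,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# A made-up two-intent utterance: "open the driver window and set it to 22 degrees".
gold = [frame("open_window", position="driver"), frame("set_temperature", value="22")]
# The prediction gets the first intent right but the temperature slot wrong.
pred = [frame("open_window", position="driver"), frame("set_temperature", value="24")]

result = score_utterance(pred, gold)
print(result)
```

Scoring at the frame level means a single wrong slot value costs the whole frame, which is why multi-intent utterances are harder to get fully right than single-intent ones.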

Problem

Research questions and friction points this paper is trying to address.

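Existing SLU datasets lack diversity and complexity, in particular authentic multi-intent utterances from automotive cabin scenarios

No unified benchmark covers recent LLMs and LALMs on SLU under both in-context learning and supervised fine-tuning

Pipeline SLU systems propagate speech recognition errors, and it is unclear whether end-to-end LALMs can match them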

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MAC-SLU dataset with complex multi-intent data

Benchmarks LLMs and LALMs using in-context learning and fine-tuning

Shows end-to-end LALMs avoid speech recognition error propagation
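
The error-propagation point can be illustrated with a toy pipeline. In this sketch, assumed purely for illustration, the same text-based understanding step is run once on a reference transcript and once on a transcript containing a simulated recognition error. The keyword rules, intent labels, and transcripts are all hypothetical stand-ins, not the models or data used in the paper.

```python
def toy_text_slu(transcript):
    """Stand-in for the text understanding stage of a pipeline.

    Uses hypothetical keyword rules; a real system would use an LLM or a
    trained classifier here.
    """
    rules = {
        "seat heating": "seat_heating_on",
        "open the sunroof": "sunroof_open",
    }
    return [intent for phrase, intent in rules.items() if phrase in transcript]


# What the speaker actually said (made-up example utterance).
reference_transcript = "turn on the seat heating and open the sunroof"
# What a recognizer might output after mishearing one word (simulated error).
asr_transcript = "turn on the seat eating and open the sunroof"

from_reference = toy_text_slu(reference_transcript)
from_asr = toy_text_slu(asr_transcript)

print("intents from reference transcript:", from_reference)
print("intents from ASR transcript:      ", from_asr)
```

The understanding stage is identical in both runs, so the lost intent comes entirely from the upstream transcript. An end-to-end model that consumes audio directly has no intermediate transcript in which such an error can occur, although it can still make mistakes of its own.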

🔎 Similar Papers

Do Large Language Model Understand Multi-Intent Spoken Language ?

2024-03-07, arXiv.org, Citations: 6