Slot Filling as a Reasoning Task for SpeechLLMs

📅 2025-10-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited reasoning capability of speech large language models (speechLLMs) in end-to-end slot filling. Methodologically, it integrates chain-of-thought (CoT) reasoning into speechLLMs in three ways. (1) It decomposes slot filling into multiple reasoning steps and builds a reasoning dataset for the task. (2) It distinguishes regular speechLLMs, which predict slots directly, from reasoning speechLLMs, which first generate intermediate steps, and it builds hybrid speechLLMs that support both modes. (3) It applies supervised fine-tuning with text foundation LLMs of different types and sizes, training the hybrid models to preserve both the direct and the reasoning mode. The experiments show three findings: adding intermediate reasoning steps improves slot-filling performance; a reasoning text LLM developed mainly for math, logic and coding may be a weaker foundation for a reasoning speechLLM; and hybrid speechLLMs fine-tuned on both modes outperform those fine-tuned on only one mode.
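To illustrate the reasoning-oriented training data described above, the sketch below turns one annotated utterance into a reasoning-mode target: intermediate steps first, then a final structured answer. It is a minimal sketch under stated assumptions; the `Example` schema, the three step labels and the final `Answer:` JSON line are hypothetical and are not the authors' dataset format. In a real speechLLM the Step 1 transcript would be generated from the audio; here the reference transcript is used only to write the target text.

```python
import json
from dataclasses import dataclass


@dataclass
class Example:
    """One annotated utterance (hypothetical schema, not the paper's)."""
    transcript: str        # reference transcript, used only to write the target
    slots: dict[str, str]  # gold slot type -> slot value


def build_reasoning_target(ex: Example) -> str:
    """Write slot filling as explicit intermediate steps ending in a final answer."""
    # Order slots by where their value first appears in the transcript
    # (a value absent from the transcript gets -1 and sorts first).
    ordered = sorted(ex.slots.items(), key=lambda kv: ex.transcript.find(kv[1]))
    steps = [
        f"Step 1 (transcript): {ex.transcript}",
        "Step 2 (spans): " + ", ".join(f'"{value}"' for _, value in ordered),
        "Step 3 (slot types): " + ", ".join(f'"{value}" -> {slot}' for slot, value in ordered),
        # Final line: the same structured answer a direct-mode model would emit.
        "Answer: " + json.dumps(ex.slots, sort_keys=True),
    ]
    return "\n".join(steps)


target = build_reasoning_target(
    Example("wake me up at seven am tomorrow", {"time": "seven am", "date": "tomorrow"})
)
print(target)
```

Ending every reasoning target with the same answer line that a direct-mode model would produce keeps the two modes comparable at evaluation time.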

📝 Abstract

We propose integrating reasoning into speech large language models (speechLLMs) for the end-to-end slot-filling task. Inspired by the recent development of reasoning LLMs, we use a chain-of-thought framework to decompose the slot-filling task into multiple reasoning steps, create a reasoning dataset, and apply supervised fine-tuning to a speechLLM. We distinguish between regular and reasoning speechLLMs and experiment with different types and sizes of LLMs as their text foundation models. We demonstrate performance improvements from introducing reasoning (intermediate) steps. However, we show that a reasoning textual LLM developed mainly for math, logic and coding domains might be inferior as a foundation model for a reasoning speechLLM. We further show that hybrid speechLLMs, built on a hybrid text foundation LLM and fine-tuned to preserve both direct and reasoning modes of operation, perform better than those fine-tuned with only one mode of operation.
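The hybrid fine-tuning described in the abstract can be pictured as a training mixture in which every utterance appears once in each mode. This is a minimal sketch assuming a hypothetical record format (`audio_path`, `slots`, `reasoning`) and hypothetical mode instructions (`DIRECT_PROMPT`, `REASONING_PROMPT`); the paper's actual prompts, mixing ratio and audio handling are not given in this summary.

```python
import json
import random
from typing import TypedDict


class Sample(TypedDict):
    audio_path: str
    prompt: str
    target: str


# Hypothetical mode-switch instructions (assumptions, not the paper's prompts).
DIRECT_PROMPT = "Extract the slots. Answer with JSON only."
REASONING_PROMPT = "Extract the slots. Reason step by step, then give the answer."


def hybrid_mixture(examples: list, seed: int = 0) -> list[Sample]:
    """Emit one direct-mode and one reasoning-mode sample per example, then shuffle.

    Each example is a dict with hypothetical keys 'audio_path', 'slots'
    (slot type -> value) and 'reasoning' (intermediate steps as text).
    """
    samples: list[Sample] = []
    for ex in examples:
        answer = "Answer: " + json.dumps(ex["slots"], sort_keys=True)
        # Direct mode: the target is only the final answer.
        samples.append({"audio_path": ex["audio_path"], "prompt": DIRECT_PROMPT,
                        "target": answer})
        # Reasoning mode: intermediate steps precede the same final answer.
        samples.append({"audio_path": ex["audio_path"], "prompt": REASONING_PROMPT,
                        "target": ex["reasoning"] + "\n" + answer})
    # Shuffle so neither mode dominates a contiguous stretch of training.
    random.Random(seed).shuffle(samples)
    return samples


data = [{"audio_path": "utt1.wav", "slots": {"time": "seven am"},
         "reasoning": "Step 1 (transcript): wake me at seven am"}]
mixture = hybrid_mixture(data)
```

Keeping only the samples with one prompt yields the corresponding single-mode training set, which is the kind of baseline the abstract compares the hybrid models against.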

Problem

Integrating reasoning into speechLLMs for slot-filling

Decomposing slot-filling into multiple reasoning steps

Evaluating hybrid speechLLMs with dual operation modes
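Evaluating a hybrid speechLLM in both modes requires reading the final answer out of outputs that may or may not contain intermediate steps. The sketch below assumes outputs end with a hypothetical `Answer: {json}` line and scores them with exact-match, micro-averaged slot F1; this metric is an assumption, since the summary does not state which slot-filling metric the paper reports.

```python
import json


def parse_answer(output: str) -> dict:
    """Return the slots after the last 'Answer:' marker, or {} if missing or malformed.

    Direct-mode outputs contain only the answer line; reasoning-mode outputs
    have intermediate steps before it, so the same parser serves both modes.
    """
    marker = "Answer:"
    idx = output.rfind(marker)
    if idx == -1:
        return {}
    try:
        slots = json.loads(output[idx + len(marker):].strip())
    except json.JSONDecodeError:
        return {}
    return slots if isinstance(slots, dict) else {}


def _pairs(slots: dict) -> set:
    # Normalize to hashable (slot type, value) string pairs for exact matching.
    return {(str(k), str(v)) for k, v in slots.items()}


def slot_f1(predictions: list, references: list) -> float:
    """Micro-averaged F1 over exact (slot type, value) pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        p, r = _pairs(pred), _pairs(ref)
        tp += len(p & r)
        fp += len(p - r)
        fn += len(r - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


outputs = [
    'Answer: {"time": "seven am", "date": "tomorrow"}',  # direct mode
    'Step 1 (transcript): wake me at seven am\nAnswer: {"time": "seven am"}',  # reasoning mode
]
references = [{"time": "seven am", "date": "tomorrow"}, {"time": "seven am", "date": "today"}]
score = slot_f1([parse_answer(o) for o in outputs], references)
```

Treating an unparseable output as an empty prediction counts every gold slot of that utterance as a miss, so malformed reasoning traces are penalized rather than silently skipped.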

Innovation

Integrates reasoning into speechLLMs for slot-filling

Uses chain-of-thought framework to decompose task steps

Employs hybrid foundation models for dual operational modes

🔎 Similar Papers

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data