Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

📅 2026-03-15
🤖 AI Summary
This study addresses the challenge of enhancing chain-of-thought (CoT) reasoning in large audio-language models (LALMs) without additional training. The authors study training-free, inference-time model steering and introduce three strategies that draw on diverse information sources to construct steering vectors. Combining hidden-state perturbation with CoT prompting yields accuracy gains of up to 4.4% over CoT prompting alone across four LALMs and four benchmarks. The study also identifies a cross-modal transfer effect: steering vectors derived from a few text samples effectively guide speech-based reasoning, which indicates high data efficiency. A hyperparameter sensitivity analysis examines the robustness of the approaches.

πŸ“ Abstract
Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse information sources and evaluate them across four LALMs and four benchmarks. Results show general accuracy gains up to 4.4% over CoT prompting. Notably, we identify a cross-modal transfer where steering vectors derived from few text samples effectively guide speech-based reasoning, demonstrating high data efficiency. We also examine hyperparameter sensitivity to understand the robustness of these approaches. Our findings position model steering as a practical direction for strengthening LALM reasoning.
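The abstract does not spell out how the steering vectors are built or where they are injected, so the following is only a minimal sketch of one common construction (a difference of mean hidden states, added back at inference time with a strength `alpha`). It is an assumption for illustration, not the authors' three strategies. All names (`steering_vector`, `steer`, `alpha`) and the toy numbers are hypothetical. The example mirrors the cross-modal idea by deriving the vector from text-side states and applying it to a speech-side state.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(states: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty batch of equal-length hidden states."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def steering_vector(positive: Sequence[Vector], negative: Sequence[Vector]) -> Vector:
    """Difference of means: the direction from 'negative' toward 'positive' states.

    Assumption: this is a generic construction, not confirmed as the paper's method.
    """
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(hidden: Vector, vector: Vector, alpha: float) -> Vector:
    """Nudge a single hidden state along the steering vector with strength alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional hidden states (made-up values, purely illustrative).
text_cot_states = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]    # text inputs with CoT exemplars
text_plain_states = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]  # the same inputs without them

v = steering_vector(text_cot_states, text_plain_states)

# Apply the text-derived vector to a (hypothetical) speech-input hidden state.
speech_hidden = [0.5, 0.5, 0.5]
steered = steer(speech_hidden, v, alpha=0.5)
```

In a real model the same addition would typically be applied to the activations of a chosen layer during the forward pass, and both the layer and `alpha` are the kind of hyperparameters whose sensitivity the abstract says the authors examine.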
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
Large Audio-Language Models
Model Steering
Training-Free
Inference-Time Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-Free Steering
Cross-Modal Transfer
Chain-of-Thought Reasoning
Large Audio-Language Models
Inference-Time Intervention
Lok-Lam Ieong
National Taiwan University, Taiwan
Chia-Chien Chen
National Taiwan University, Taiwan
Chih-Kai Yang
National Taiwan University
Deep Learning, Speech Processing, Natural Language Processing, Machine Learning
Yu-Han Huang
National Taiwan University, Taiwan
An-Yu Cheng
National Taiwan University, Taiwan
Hung-yi Lee
National Taiwan University
Deep Learning, Spoken Language Understanding, Speech Processing