Index-ASR Technical Report

📅 2025-12-31
🏛️ arXiv.org
🤖 AI Summary
This work proposes Index-ASR, a large language model (LLM)-based automatic speech recognition (ASR) system that targets two weaknesses of existing LLM-based ASR: hallucination errors and limited contextual customization. It combines an LLM with large-scale training data enriched with background noise and contextual information, aiming to improve robustness and to support customizable hotword recognition. The authors report strong performance on both open-source benchmarks and in-house test sets.

📝 Abstract
Automatic speech recognition (ASR) has witnessed remarkable progress in recent years, largely driven by the emergence of the LLM-based ASR paradigm. Despite their strong performance on a variety of open-source benchmarks, existing LLM-based ASR systems still suffer from two critical limitations. First, they are prone to hallucination errors, often generating excessively long and repetitive outputs that are not well grounded in the acoustic input. Second, they provide limited support for flexible and fine-grained contextual customization. To address these challenges, we propose Index-ASR, a large-scale LLM-based ASR system designed to simultaneously enhance robustness and support customizable hotword recognition. The core idea of Index-ASR lies in the integration of an LLM with large-scale training data enriched with background noise and contextual information. Experimental results show that Index-ASR achieves strong performance on both open-source benchmarks and in-house test sets, highlighting its robustness and practicality for real-world ASR applications.
Problem

Research questions and friction points this paper is trying to address.

hallucination
contextual customization
automatic speech recognition
LLM-based ASR
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based ASR
hallucination mitigation
contextual customization
hotword recognition
robustness
Zheshu Song
Artificial Intelligence Platform Department, bilibili, China
Lu Wang
Artificial Intelligence Platform Department, bilibili, China
Wei Deng
Artificial Intelligence Platform Department, bilibili, China
Zhuo Yang
Xidian University & Shanghai AI Laboratory
Large Language Model, AI for Science
Yong Wu
Artificial Intelligence Platform Department, bilibili, China
Bin Xia
Artificial Intelligence Platform Department, bilibili, China