Development and multi-center evaluation of domain-adapted speech recognition for human-AI teaming in real-world gastrointestinal endoscopy

2026-04-02
AI Summary
This work addresses the poor performance of general-purpose automatic speech recognition (ASR) systems in gastrointestinal endoscopy settings, where dense medical terminology and challenging acoustic conditions degrade accuracy. The authors propose EndoASR, a lightweight domain-adapted ASR system built on the Paraformer architecture (220M parameters). It uses a two-stage adaptation strategy driven by synthetic endoscopy reports, with one stage targeting domain-specific language modeling and the other targeting noise robustness, and it supports real-time edge deployment. EndoASR is evaluated in multi-center real-world environments and generalizes well: in retrospective testing, it reduces character error rate (CER) from 20.52% to 14.14% and improves medical term accuracy (Med ACC) from 54.30% to 87.59%; in prospective trials, it achieves a CER of 14.97% and a Med ACC of 84.16%. Its real-time factor (RTF) is 0.005, compared with 0.055 for Whisper-large-v3, and the improved transcripts raise downstream large language model performance in structured information extraction.
๐Ÿ“ Abstract
Automatic speech recognition (ASR) is a critical interface for human-AI interaction in gastrointestinal endoscopy, yet its reliability in real-world clinical settings is limited by domain-specific terminology and complex acoustic conditions. Here, we present EndoASR, a domain-adapted ASR system designed for real-time deployment in endoscopic workflows. We develop a two-stage adaptation strategy based on synthetic endoscopy reports, targeting domain-specific language modeling and noise robustness. In retrospective evaluation across six endoscopists, EndoASR substantially improves both transcription accuracy and clinical usability, reducing character error rate (CER) from 20.52% to 14.14% and increasing medical term accuracy (Med ACC) from 54.30% to 87.59%. In a prospective multi-center study spanning five independent endoscopy centers, EndoASR demonstrates consistent generalization under heterogeneous real-world conditions. Compared with the baseline Paraformer model, CER is reduced from 16.20% to 14.97%, while Med ACC is improved from 61.63% to 84.16%, confirming its robustness in practical deployment scenarios. Notably, EndoASR achieves a real-time factor (RTF) of 0.005, significantly faster than Whisper-large-v3 (RTF 0.055), while maintaining a compact model size of 220M parameters, enabling efficient edge deployment. Furthermore, integration with large language models demonstrates that improved ASR quality directly enhances downstream structured information extraction and clinician-AI interaction. These results demonstrate that domain-adapted ASR can serve as a reliable interface for human-AI teaming in gastrointestinal endoscopy, with consistent performance validated across multi-center real-world clinical settings.
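The abstract reports CER and RTF without spelling out how they are computed. The following is a minimal sketch using the standard definitions: CER as character-level edit distance divided by reference length, and RTF as processing time divided by audio duration. The function names are hypothetical and this is not the authors' evaluation code; Med ACC is omitted because its exact definition is not given here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: values below 1 mean faster than real time."""
    return processing_seconds / audio_seconds


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (before - after) / before


if __name__ == "__main__":
    # Retrospective CER figures quoted in the abstract (percent).
    print(f"relative CER reduction: {relative_reduction(20.52, 14.14):.1%}")
    # An RTF of 0.005 corresponds to 0.5 s of compute per 100 s of audio.
    print(f"RTF: {rtf(0.5, 100.0)}")
```

Under these definitions, the retrospective drop from 20.52% to 14.14% CER is a relative reduction of roughly 31%, and the reported RTFs of 0.005 and 0.055 differ by about a factor of eleven.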
Problem

Research questions and friction points this paper is trying to address.

automatic speech recognition
domain adaptation
gastrointestinal endoscopy
clinical usability
real-world deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-adapted ASR
endoscopy
multi-center evaluation
real-time speech recognition
medical term accuracy
Ruijie Yang
Zhejiang University, Hangzhou, China.
Yan Zhu
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
Peiyao Fu
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
Te Luo
Shanghai Key Laboratory of MICCAI, Shanghai, China.
Zhihua Wang
City University of Hong Kong
Computer Vision, Biomedical Engineering, Robotics
Xian Yang
University of Manchester
Artificial Intelligence, Machine Learning, Healthcare AI, Natural Language Processing
Quanlin Li
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
Pinghong Zhou
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China.
Shuo Wang
Shanghai Jiao Tong University
AI4CyberSecurity, Responsible AI, Privacy