🤖 AI Summary
This work addresses the challenge of inaccurate automatic speech recognition (ASR) in medical question-answering scenarios, where errors in recognizing domain-specific terminology degrade downstream task performance. The authors propose MedSpeak, a framework that, for the first time, jointly integrates semantic information from a medical knowledge graph with the reasoning capabilities of large language models (LLMs) to correct ASR transcripts. By fusing acoustic, semantic, and phonetic features, MedSpeak significantly improves both medical term recognition accuracy and downstream question-answering performance, achieving state-of-the-art results across multiple healthcare benchmarks.
📝 Abstract
Spoken question-answering (SQA) systems relying on automatic speech recognition (ASR) often struggle to accurately recognize medical terminology. To address this, we propose MedSpeak, a novel knowledge graph-aided ASR error correction framework that refines noisy transcripts and improves downstream answer prediction by leveraging both the semantic relationships and the phonetic information encoded in a medical knowledge graph, together with the reasoning power of LLMs. Comprehensive experiments on medical SQA benchmarks demonstrate that MedSpeak significantly improves medical term recognition accuracy and overall SQA performance, establishing it as a state-of-the-art solution for medical SQA. The code is available at https://github.com/RainieLLM/MedSpeak.
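To make the core idea concrete, here is a minimal, illustrative sketch of phonetics-aware transcript correction against a knowledge-graph vocabulary. It is not MedSpeak's actual method: the tiny `KG_TERMS` list, the crude `phonetic_key` normalization, and the similarity threshold are all hypothetical stand-ins for the paper's medical knowledge graph, phoneme encoding, and LLM reasoning.

```python
import difflib

# Hypothetical miniature "knowledge graph" vocabulary of medical terms.
# MedSpeak uses a full medical KG; this short list is purely illustrative.
KG_TERMS = ["metformin", "hypertension", "ibuprofen", "angioplasty"]

def phonetic_key(word: str) -> str:
    """Very crude phonetic normalization: lowercase, drop vowels after the
    first letter, collapse adjacent repeats. A stand-in for a real
    grapheme-to-phoneme encoding."""
    w = word.lower()
    key = w[0]
    for ch in w[1:]:
        if ch in "aeiou":
            continue
        if key[-1] != ch:
            key += ch
    return key

def correct_token(token: str, threshold: float = 0.6) -> str:
    """Replace a possibly mis-recognized token with the phonetically closest
    KG term, but only when the match clears the similarity threshold."""
    best, best_score = token, threshold
    tk = phonetic_key(token)
    for term in KG_TERMS:
        score = difflib.SequenceMatcher(None, tk, phonetic_key(term)).ratio()
        if score > best_score:
            best, best_score = term, score
    return best

def correct_transcript(text: str) -> str:
    """Correct an ASR transcript token by token."""
    return " ".join(correct_token(t) for t in text.split())

print(correct_transcript("patient takes metformen"))
```

In the real framework, candidate corrections would additionally be filtered by the KG's semantic relationships and re-ranked by an LLM in context, rather than accepted on phonetic similarity alone.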