🤖 AI Summary
Real-time speech summarization for doctor-patient dialogues remains underexplored, particularly for clinical deployment. Method: We propose the first deployable real-time medical dialogue speech summarization system, which generates a local summary after every N utterances during the interaction and a final global summary once the dialogue ends. Our approach integrates automatic speech recognition (ASR), sequential abstractive summarization, and large language models (LLMs) within a real-time segmentation architecture. To address data scarcity, we introduce VietMed-Sum, a medical speech summarization dataset released in Vietnamese with English translations, whose gold-standard and synthetic summaries were created collaboratively by an LLM and human annotators. Contribution/Results: We report baseline results of state-of-the-art models on VietMed-Sum. All code, data, and models are publicly released to support reproducible research.
📝 Abstract
In doctor-patient conversations, identifying medically relevant information is crucial, which creates a need for conversation summarization. First, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary after the end of a conversation. Our system could enhance user experience from a business standpoint, while also reducing computational costs from a technical perspective. Second, we present VietMed-Sum which, to our knowledge, is the first speech summarization dataset for medical conversations. Third, we are the first to use an LLM and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese) and models are available online: https://github.com/leduckhai/MultiMed/tree/master/VietMed-Sum
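The local/global scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `summarize_stream` and `toy_summarize` are hypothetical, the summarizer is a placeholder standing in for an ASR-plus-summarization model, and building the global summary from the collected local summaries is an assumption made here for illustration (the abstract does not say how the global summary is computed).

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Summarizer = Callable[[List[str]], str]


def summarize_stream(
    utterances: Iterable[str], n: int, summarize: Summarizer
) -> Iterator[Tuple[str, str]]:
    """Yield ("local", s) after every n utterances, then ("global", s) at the end."""
    if n < 1:
        raise ValueError("n must be at least 1")
    buffer: List[str] = []           # utterances since the last local summary
    local_summaries: List[str] = []  # all local summaries produced so far
    for utterance in utterances:
        buffer.append(utterance)
        if len(buffer) == n:
            local = summarize(buffer)
            local_summaries.append(local)
            yield ("local", local)
            buffer = []
    if buffer:
        # Summarize the leftover utterances when the conversation
        # length is not a multiple of n.
        local = summarize(buffer)
        local_summaries.append(local)
        yield ("local", local)
    # Assumption: the global summary is derived from the local summaries
    # rather than from the full transcript.
    yield ("global", summarize(local_summaries))


def toy_summarize(texts: List[str]) -> str:
    """Placeholder summarizer: keeps the first word of each non-empty text."""
    return " | ".join(t.split()[0] for t in texts if t.strip())


if __name__ == "__main__":
    conversation = [
        "cough for three days",
        "fever at night",
        "take paracetamol",
        "drink fluids",
        "return on Friday",
    ]
    for kind, summary in summarize_stream(conversation, 2, toy_summarize):
        print(kind, "->", summary)
```

In a real deployment, `summarize` would wrap the chosen summarization model and `utterances` would be a live stream of ASR transcripts; N trades off how often the user sees an update against how often the model is called.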