🤖 AI Summary
People with voice disorders worldwide face barriers including limited access to diagnosis and insufficient multilingual support. To address these, we propose VocalAgent, the first audio large language model (Audio-LLM) tailored for vocal fold health diagnosis. It is built on the Qwen-Audio-Chat architecture and trained on hospital-collected tri-modal data (speech audio, transcribed text, and clinical labels) via instruction tuning and safety alignment for clinical deployment. Our key contributions are: (1) a safety-aware evaluation framework integrating diagnostic bias mitigation, cross-lingual robustness validation, and modality ablation analysis; and (2) an empirical demonstration of state-of-the-art performance on multilingual voice disorder classification, with superior accuracy, strong generalization across languages and demographics, and practical clinical deployability. This work lays the groundwork for equitable, globally scalable voice health diagnostics.
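To make the instruction-tuning setup concrete, below is a minimal sketch of how one tri-modal record (speech audio, transcript, clinical label) might be serialized into a chat-format training sample. The JSON schema, prompt wording, and disorder label set here are illustrative assumptions, not the paper's released format.

```python
# Illustrative only: the schema, prompt text, and label set are assumptions,
# not the paper's actual data format.
import json

# Hypothetical label set; the paper's clinical taxonomy may differ.
DISORDER_LABELS = ["healthy", "vocal fold polyp", "vocal fold paralysis",
                   "laryngitis", "vocal fold nodules"]

def build_sample(audio_path: str, transcript: str, label: str) -> dict:
    """Serialize one tri-modal record into a chat-style instruction-tuning example."""
    assert label in DISORDER_LABELS
    prompt = (
        f"Audio: <audio>{audio_path}</audio>\n"
        f"Transcript: {transcript}\n"
        f"Classify the speaker's vocal fold condition. "
        f"Choose one of: {', '.join(DISORDER_LABELS)}."
    )
    return {
        "conversations": [
            {"from": "user", "value": prompt},
            {"from": "assistant", "value": label},  # clinical label as target
        ]
    }

if __name__ == "__main__":
    sample = build_sample("patient_0042.wav",
                          "the rainbow is a division of white light",
                          "laryngitis")
    print(json.dumps(sample, indent=2))
```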
📝 Abstract
Vocal health plays a crucial role in people's lives, significantly impacting their ability to communicate and interact. Yet despite the global prevalence of voice disorders, many people lack access to convenient diagnosis and treatment. This paper introduces VocalAgent, an audio large language model (LLM) that addresses these challenges through vocal health diagnosis. We fine-tune Qwen-Audio-Chat on three datasets collected in situ from hospital patients, and present a multifaceted evaluation framework encompassing a safety assessment to mitigate diagnostic biases, a cross-lingual performance analysis, and modality ablation studies. VocalAgent achieves superior accuracy on voice disorder classification compared to state-of-the-art baselines. Its LLM-based approach offers a scalable path toward broader adoption of voice health diagnostics, while underscoring the importance of ethical and technical validation.
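As a sketch of how such a system can be queried at inference time, the snippet below follows the published Qwen-Audio-Chat chat interface that VocalAgent builds on. The diagnostic prompt, audio filename, and candidate label list are illustrative assumptions, not VocalAgent's exact protocol, and the base checkpoint stands in for the fine-tuned weights.

```python
# Inference sketch via the Qwen-Audio-Chat interface. The prompt, filename,
# and label list are illustrative; substitute fine-tuned weights where available.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen-Audio-Chat"  # base model that VocalAgent fine-tunes

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", trust_remote_code=True
).eval()

# Qwen-Audio-Chat accepts interleaved audio/text inputs via from_list_format.
query = tokenizer.from_list_format([
    {"audio": "patient_0042.wav"},  # hypothetical patient recording
    {"text": "Classify the speaker's vocal fold condition. Choose one of: "
             "healthy, vocal fold polyp, vocal fold paralysis, laryngitis, "
             "vocal fold nodules."},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```

A modality ablation of the kind the paper reports could reuse this pattern by issuing the query with audio only, text only, or both, and comparing classification accuracy across the three conditions.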