🤖 AI Summary
Existing voice-based health assessment methods are largely confined to single-disease detection and fail to capture the cross-disease, multidimensional information embedded in speech. To address this, we propose MARVEL, the first multitask framework enabling unified screening of nine neurological, respiratory, and voice disorders using de-identified acoustic features. MARVEL employs a dual-branch architecture that integrates a self-supervised pretrained backbone with task-specific heads, facilitating cross-disease knowledge transfer while preserving privacy, since raw audio is never transmitted. Evaluated on Bridge2AI-Voice v2.0, MARVEL achieves an overall AUROC of 0.78, reaching 0.97 for Alzheimer's disease/mild cognitive impairment detection. It outperforms state-of-the-art methods on seven of the nine tasks, improving over single-modal baselines by 5–19%. Crucially, MARVEL provides the first empirical validation of consistency between learned speech representations and clinically grounded acoustic features.
📝 Abstract
Voice-based health assessment offers unprecedented opportunities for scalable, non-invasive disease screening, yet existing approaches typically focus on single conditions and fail to leverage the rich, multi-faceted information embedded in speech. We present MARVEL (Multi-task Acoustic Representations for Voice-based Health Analysis), a privacy-conscious multitask learning framework that simultaneously detects nine distinct neurological, respiratory, and voice disorders using only derived acoustic features, eliminating the need for raw audio transmission. Our dual-branch architecture employs specialized encoders with task-specific heads sharing a common acoustic backbone, enabling effective cross-condition knowledge transfer. Evaluated on the large-scale Bridge2AI-Voice v2.0 dataset, MARVEL achieves an overall AUROC of 0.78, with exceptional performance on neurological disorders (AUROC = 0.89), particularly for Alzheimer's disease/mild cognitive impairment (AUROC = 0.97). Our framework consistently outperforms single-modal baselines by 5–19% and surpasses state-of-the-art self-supervised models on 7 of 9 tasks, while correlation analysis reveals that the learned representations align meaningfully with established acoustic features, indicating consistency with clinically recognized acoustic patterns. By demonstrating that a single unified model can effectively screen for diverse conditions, this work establishes a foundation for deployable voice-based diagnostics in resource-constrained and remote healthcare settings.
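To make the shared-backbone, task-specific-head design concrete, the sketch below shows one minimal way such a multitask screener could be wired up in PyTorch. The feature dimensionality, layer sizes, head structure, and loss handling are illustrative assumptions and do not reflect MARVEL's published architecture or the Bridge2AI-Voice feature set.

```python
import torch
import torch.nn as nn

class MultiTaskVoiceScreener(nn.Module):
    """Minimal sketch of a shared-backbone, multi-head screening model.

    Dimensions and the number of heads are hypothetical placeholders,
    not MARVEL's actual configuration.
    """

    def __init__(self, feat_dim=512, hidden_dim=256, num_tasks=9):
        super().__init__()
        # Shared encoder over derived (de-identified) acoustic features,
        # standing in for the pretrained acoustic backbone described above.
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # One lightweight binary head per screening task (e.g. AD/MCI and
        # the other neurological, respiratory, and voice disorders).
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(num_tasks)]
        )

    def forward(self, x):
        shared = self.backbone(x)          # cross-condition representation
        logits = [head(shared) for head in self.heads]
        return torch.cat(logits, dim=-1)   # (batch, num_tasks) logits


# Toy usage: a batch of 4 feature vectors with labels for 9 binary tasks.
model = MultiTaskVoiceScreener()
features = torch.randn(4, 512)
labels = torch.randint(0, 2, (4, 9)).float()
loss = nn.BCEWithLogitsLoss()(model(features), labels)
loss.backward()
```

Sharing the backbone is what lets gradient signal from one disorder's labels shape the representation used by all nine heads, which is the mechanism behind the cross-condition knowledge transfer described above.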