Breaking Data Efficiency Dilemma: A Federated and Augmented Learning Framework For Alzheimer's Disease Detection via Speech

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses data inefficiency in speech-based Alzheimer’s disease (AD) detection, which stems from scarce medical data and strict privacy constraints. The authors propose FAL-AD, a framework that combines voice conversion-based augmentation (cross-category voice-content recombination), adaptive federated learning, and an attentive cross-modal fusion model that aligns acoustic and textual representations at the word level. The framework enables privacy-preserving collaboration across institutions and targets three kinds of efficiency: absolute data efficiency, collaborative training efficiency, and representation learning efficiency. On the ADReSSo dataset it attains a multimodal accuracy of 91.52%, outperforming all centralized baselines.
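As a rough illustration of the cross-modal attention idea in the summary, the sketch below lets each word embedding attend over acoustic frame embeddings and concatenates the resulting acoustic context onto the word vector. It is not the authors' model: the function names, the absence of learned projections, and the concatenation step are all assumptions made for illustration, and both modalities are assumed to share the same embedding size.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(word_embs: Matrix, frame_embs: Matrix) -> Matrix:
    """Hypothetical word-level fusion: each word (query) attends over
    acoustic frames (keys and values). No learned projections are used;
    a real model would add them."""
    dim = len(word_embs[0])
    scale = math.sqrt(dim)
    fused = []
    for query in word_embs:
        # Scaled dot-product score between this word and every frame.
        scores = [sum(q * k for q, k in zip(query, frame)) / scale
                  for frame in frame_embs]
        weights = softmax(scores)
        # Attention-weighted average of the acoustic frames.
        context = [sum(w * frame[j] for w, frame in zip(weights, frame_embs))
                   for j in range(dim)]
        # Assumption: fuse by concatenating text and acoustic context.
        fused.append(list(query) + context)
    return fused


# Toy inputs: 2 words and 3 acoustic frames, embedding size 2.
words = [[1.0, 0.0], [0.0, 1.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(words, frames)
```

Each row of `fused` holds the original word embedding followed by its acoustic context, so downstream layers see both modalities per word.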

📝 Abstract
Early diagnosis of Alzheimer's Disease (AD) is crucial for delaying its progression. While AI-based speech detection is non-invasive and cost-effective, it faces a critical data efficiency dilemma due to medical data scarcity and privacy barriers. Therefore, we propose FAL-AD, a novel framework that synergistically integrates federated learning with data augmentation to systematically optimize data efficiency. Our approach delivers three key breakthroughs: First, absolute efficiency improvement through voice conversion-based augmentation, which generates diverse pathological speech samples via cross-category voice-content recombination. Second, collaborative efficiency breakthrough via an adaptive federated learning paradigm, maximizing cross-institutional benefits under privacy constraints. Finally, representational efficiency optimization by an attentive cross-modal fusion model, which achieves fine-grained word-level alignment and acoustic-textual interaction. Evaluated on ADReSSo, FAL-AD achieves a state-of-the-art multi-modal accuracy of 91.52%, outperforming all centralized baselines and demonstrating a practical solution to the data efficiency dilemma. Our source code is publicly available at https://github.com/smileix/fal-ad.
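The abstract does not spell out how the adaptive federated learning paradigm aggregates client models. As a point of reference only, the sketch below shows plain federated averaging (FedAvg), where each institution's parameters are weighted by its local sample count. This is a standard baseline swapped in for illustration, not the adaptive scheme of FAL-AD; the parameter names and site sizes are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Plain FedAvg: sample-size-weighted mean of client parameters.
    The adaptive weighting used by FAL-AD is not reproduced here."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical institutions. Only model parameters leave each site;
# the speech recordings themselves are never shared.
site_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
site_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
global_weights = federated_average([site_a, site_b], client_sizes=[30, 10])
```

With 30 and 10 local samples, the first site contributes three quarters of each averaged parameter, which is the sense in which larger sites dominate under plain FedAvg.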
Problem

Research questions and friction points this paper is trying to address.

Alzheimer's Disease
data efficiency
speech-based detection
medical data scarcity
privacy constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

federated learning
data augmentation
voice conversion
cross-modal fusion
Alzheimer's disease detection
Xiao Wei
Duke University
robotics, robot learning, reinforcement learning
Bin Wen
Kuaishou
MLLM
Yuqin Lin
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; College of Computer and Data Science, Fuzhou University, Fuzhou, China
Kai Li
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Mingyang Gu
Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Xiaobao Wang
Associate Professor, Tianjin University
Artificial intelligence, large-model generation safety, graph machine learning
Longbiao Wang
Professor, Tianjin University
Speech Processing, Speech recognition, speaker recognition, acoustic signal processing, speech enhancement
Jianwu Dang
JAIST, Japan / Tianjin Univ., China
Speech Science, speech production, EEG, disorder speech