🤖 AI Summary
This work exposes severe vulnerabilities of automatic speech recognition (ASR) models trained with federated learning to attribute inference attacks. Focusing on three mainstream ASR architectures—Wav2Vec2, HuBERT, and Whisper—we propose a non-parametric white-box attack under a passive threat model that requires no access to raw speech data and operates solely on differences in model weights. We systematically evaluate the inferability of sensitive demographic and clinical attributes, including gender, age, accent, emotion, and articulation disorders. Experimental results show that accent is highly inferable across all models, and that attributes underrepresented or absent in pretraining corpora exhibit significantly higher leakage risk. To our knowledge, this is the first empirical study to uncover these systemic privacy flaws in federated ASR models. Our findings provide critical insights for privacy risk assessment and inform the design of robust mitigation mechanisms for federated ASR systems.
📝 Abstract
Federated learning is a common method for privacy-preserving training of machine learning models. In this paper, we analyze the vulnerability of ASR models to attribute inference attacks in the federated setting. We test a non-parametric white-box attack under a passive threat model on three ASR models: Wav2Vec2, HuBERT, and Whisper. The attack operates solely on weight differentials, without access to raw speech from target speakers. We demonstrate the attack's feasibility on sensitive demographic and clinical attributes: gender, age, accent, emotion, and dysarthria. Our findings indicate that attributes underrepresented or absent in the pre-training data are more vulnerable to such inference attacks; in particular, information about accents can be reliably inferred from all models. These findings expose previously undocumented vulnerabilities in federated ASR models and offer insights toward improving their security.
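To make the threat model concrete, the following is a minimal, self-contained sketch of a non-parametric attribute inference attack over weight differentials. The abstract does not specify the attack's internals, so everything here is a hypothetical illustration: weight updates are simulated as random vectors with a small attribute-correlated component, and the "non-parametric" attacker is a k-nearest-neighbour vote in cosine distance against shadow updates with known attributes. The real attack on Wav2Vec2/HuBERT/Whisper would operate on actual client updates, not this toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 64           # flattened model-weight dimension (toy stand-in)
SIGNAL_DIMS = 8    # coordinates assumed to correlate with the attribute

def simulate_weight_delta(attr):
    """Toy stand-in for one client's update (w_after - w_before).

    Hypothetical assumption: speakers sharing an attribute (e.g., the
    same accent) push a subset of weights in a consistent direction.
    """
    delta = rng.normal(0.0, 1.0, DIM)
    delta[:SIGNAL_DIMS] += 2.0 if attr == 1 else -2.0
    return delta

def knn_infer(query, ref_deltas, ref_attrs, k=5):
    """Non-parametric attack: k-NN majority vote in cosine similarity."""
    refs = np.asarray(ref_deltas)
    sims = refs @ query / (np.linalg.norm(refs, axis=1) * np.linalg.norm(query))
    top = np.argsort(-sims)[:k]
    votes = np.asarray(ref_attrs)[top]
    return int(votes.sum() * 2 >= k)

# Shadow speakers with known attributes (attacker's auxiliary knowledge).
ref_attrs = [i % 2 for i in range(40)]
ref_deltas = [simulate_weight_delta(a) for a in ref_attrs]

# Target speakers: the passive attacker sees only their weight updates.
targets = [(a, simulate_weight_delta(a)) for a in [0, 1] * 10]
acc = np.mean([knn_infer(d, ref_deltas, ref_attrs) == a for a, d in targets])
```

Note that the attacker here never touches audio: the only inputs are weight deltas, which is what makes a passive observer of federated aggregation a realistic adversary.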