When Helpful Context Leaks: Privacy Risks in Domain-Adapted ASR

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines an underexplored privacy vulnerability in domain-adapted speech recognition: a prompted or fine-tuned model may transcribe a spoken word as a phonetically similar sensitive term drawn from its prompt context or training data, even though that term was never spoken, thereby leaking private information. The authors construct a controlled dataset and measure leakage rates under prompting, fine-tuning, and their combination. Both mechanisms cause measurable leakage, and the leakage compounds when they are combined. The authors also evaluate a prompt-level mitigation strategy and find that fine-tuning without context prompts offers the best balance between transcription accuracy and leakage. The code and dataset are released publicly.

📝 Abstract

SpeechLLMs are increasingly deployed in professional settings where domain customisation is standard practice: users supply context in prompts with sensitive information, fine-tune on proprietary recordings, or both. We identify and systematically investigate an overlooked privacy risk of such customisation: a model adapted to recognise domain-specific terminology can be nudged into transcribing a phonetically similar word from its context or training data, even when a different word is spoken, thereby leaking private information. To evaluate this risk, we construct a controlled dataset and measure leakage rates across two customisation mechanisms, prompting and fine-tuning. Both mechanisms cause measurable leakage, compounding when combined. We evaluate a prompt-level mitigation strategy and analyse the accuracy-leakage trade-off across customisation approaches, finding that fine-tuning without context prompts offers the best balance. We release our code and dataset publicly.
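
The abstract does not define how a leakage rate is computed, so the following is only a minimal sketch under one plausible assumption: a trial counts as leaked when the model's transcript contains a sensitive term from the prompt or fine-tuning data that does not appear in what was actually spoken. The `Trial` fields, the helper names, and the example sentences are all hypothetical and are not taken from the paper's code or dataset.

```python
import re
from dataclasses import dataclass


@dataclass
class Trial:
    """One evaluation item (hypothetical schema, not the paper's)."""
    spoken: str          # reference transcript of what was actually said
    sensitive_term: str  # term that appears only in the prompt or training data
    hypothesis: str      # transcript produced by the adapted model


def _tokens(text):
    # Lowercase and split into word tokens so matching ignores case and punctuation.
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_term(text, term):
    """Return True if `term` occurs in `text` as a whole-token sequence."""
    words, target = _tokens(text), _tokens(term)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


def leakage_rate(trials):
    """Fraction of trials whose hypothesis contains the unspoken sensitive term.

    Trials where the sensitive term was genuinely spoken are excluded,
    because emitting it there is a correct transcription, not a leak.
    """
    eligible = [t for t in trials if not contains_term(t.spoken, t.sensitive_term)]
    if not eligible:
        return 0.0
    leaked = sum(contains_term(t.hypothesis, t.sensitive_term) for t in eligible)
    return leaked / len(eligible)


# Invented example: the speaker says "hydroxyzine", while "hydralazine"
# is the phonetically similar term present only in the context.
example_trials = [
    Trial("the patient takes hydroxyzine", "hydralazine",
          "the patient takes hydralazine"),   # leaked
    Trial("the patient takes hydroxyzine", "hydralazine",
          "the patient takes hydroxyzine"),   # not leaked
]
example_rate = leakage_rate(example_trials)
```

Computing the same quantity separately for prompting only, fine-tuning only, and both together would give the kind of comparison the abstract describes, although the paper's actual metric may differ from this definition.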

Problem

Research questions and friction points this paper is trying to address.

privacy leakage

domain-adapted ASR

speech recognition

context prompting

fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy leakage

domain-adapted ASR

SpeechLLMs