Content Anonymization for Privacy in Long-form Audio

📅 2025-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing voice anonymization techniques conceal only acoustic features, which leaves them vulnerable to content-level re-identification attacks that exploit linguistic style (lexical choices, syntactic patterns, and idiosyncratic expressions). The risk is greatest in long-form speech, where many utterances from the same speaker are available. This work demonstrates such a content-based attack on voice-anonymized speech and proposes content anonymization within an ASR-TTS pipeline: speech is first transcribed via automatic speech recognition (ASR), the transcript is then rewritten with a context-aware, meaning-preserving paraphrase that removes speaker-specific linguistic traits, and the result is converted back to speech using text-to-speech (TTS). Unlike acoustic-only anonymization, the approach operates on what is said rather than only on how it sounds. Evaluated on long-form telephone conversation data, paraphrasing mitigates content-based re-identification while preserving speech utility.

📝 Abstract

Voice anonymization techniques have been found to successfully obscure a speaker's acoustic identity in short, isolated utterances in benchmarks such as the VoicePrivacy Challenge. In practice, however, utterances seldom occur in isolation: long-form audio is commonplace in domains such as interviews, phone calls, and meetings. In these cases, many utterances from the same speaker are available, which poses a significantly greater privacy risk: given multiple utterances from the same speaker, an attacker could exploit an individual's vocabulary, syntax, and turns of phrase to re-identify them, even when their voice is completely disguised. To address this risk, we propose new content anonymization approaches. These approaches perform a contextual rewriting of the transcripts in an ASR-TTS pipeline to eliminate speaker-specific style while preserving meaning. We present results in a long-form telephone conversation setting demonstrating the effectiveness of a content-based attack on voice-anonymized speech. We then show how the proposed content-based anonymization methods can mitigate this risk while preserving speech utility. Overall, we find that paraphrasing is an effective defense against content-based attacks and recommend that stakeholders adopt this step to ensure anonymity in long-form audio.

Problem

Research questions and friction points this paper is trying to address.

Addressing privacy risks in long-form audio content

Mitigating speaker re-identification through contextual content anonymization

Preserving meaning while eliminating speaker-specific stylistic features
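
To make the re-identification risk concrete, the following is a minimal sketch of a content-based attack that compares word n-gram profiles of transcripts. It is an illustration only, not the attack evaluated in the paper: the function names (`style_profile`, `cosine`, `reidentify`), the n-gram features, and the toy sentences are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def style_profile(utterances, n=2):
    """Count word n-grams (sizes 1..n) over all utterances of one speaker."""
    profile = Counter()
    for text in utterances:
        tokens = re.findall(r"[a-z']+", text.lower())
        for size in range(1, n + 1):
            for i in range(len(tokens) - size + 1):
                profile[" ".join(tokens[i:i + size])] += 1
    return profile


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reidentify(unknown_utterances, enrolled):
    """Return the enrolled speaker whose style profile best matches the text."""
    unknown = style_profile(unknown_utterances)
    scores = {
        name: cosine(unknown, style_profile(utts))
        for name, utts in enrolled.items()
    }
    return max(scores, key=scores.get), scores


# Toy transcripts (invented for illustration, not from the paper's data).
enrolled = {
    "A": ["you know, I reckon it's fine, you know", "I reckon we should go"],
    "B": ["basically it is what it is", "basically we are done, at the end of the day"],
}
best, scores = reidentify(["I reckon it's late, you know"], enrolled)
print(best, scores)
```

The attack never touches the audio, so disguising the voice does not affect it. With more utterances per speaker the profiles become more distinctive, which is why long-form audio raises the risk.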

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual transcript rewriting in ASR-TTS pipeline

Eliminating speaker-specific style while preserving meaning

Using paraphrasing as defense against content-based attacks
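
The pipeline described above can be sketched as three pluggable stages. This is a hedged outline under stated assumptions, not the authors' implementation: `anonymize_content`, its parameters, and the `context_size` window are hypothetical names, the ASR, paraphrasing, and TTS models are passed in as plain callables, and feeding the preceding original transcripts to the paraphraser as context is an assumption about what "contextual rewriting" means in practice.

```python
from typing import Callable, List, Sequence


def anonymize_content(
    segments: Sequence[bytes],
    transcribe: Callable[[bytes], str],           # ASR stage (assumed interface)
    paraphrase: Callable[[str, List[str]], str],  # rewriting stage (assumed interface)
    synthesize: Callable[[str], bytes],           # TTS stage (assumed interface)
    context_size: int = 3,
) -> List[bytes]:
    """Transcribe each segment, paraphrase it with context, and resynthesize it."""
    transcripts = [transcribe(segment) for segment in segments]
    outputs = []
    for i, text in enumerate(transcripts):
        # Give the paraphraser up to `context_size` preceding transcripts.
        context = transcripts[max(0, i - context_size):i]
        outputs.append(synthesize(paraphrase(text, context)))
    return outputs


# Toy stand-ins so the sketch runs without any speech or language models.
def toy_transcribe(segment: bytes) -> str:
    return segment.decode("utf-8")


def toy_paraphrase(text: str, context: List[str]) -> str:
    # Replace two speaker-specific habits with neutral wording.
    return text.replace("you know, ", "").replace("I reckon", "I think")


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


audio = [b"you know, I reckon it is fine", b"I reckon so"]
print(anonymize_content(audio, toy_transcribe, toy_paraphrase, toy_synthesize))
```

Keeping the stages as callables means any ASR, paraphrasing, or TTS system can be swapped in, and the rewriting step can be evaluated on transcripts alone before audio is resynthesized.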
