AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing audio safety mechanisms in handling multidimensional threats, including harmful sounds, children’s speech, voice spoofing, and combined audio-content risks. To this end, the authors propose AudioGuard, a unified defense framework that integrates waveform-level detection (SoundGuard) with policy-based semantic content moderation (ContentGuard). They also introduce AudioSafetyBench, the first comprehensive audio safety benchmark encompassing multilingual data, suspicious human voices, risk combinations, and non-speech audio. Experimental results demonstrate that AudioGuard significantly outperforms strong baseline audio foundation models across multiple benchmarks, achieving higher accuracy with low latency and offering a systematic solution for robust audio safety.

📝 Abstract

Audio has rapidly become a primary interface for foundation models, powering real-time voice assistants. Ensuring safety in audio systems is inherently more complex than just "unsafe text spoken aloud": real-world risks can hinge on audio-native harmful sound events, speaker attributes (e.g., child voice), impersonation/voice-cloning misuse, and voice-content compositional harms, such as child voice plus sexual content. The nature of audio makes it challenging to develop comprehensive benchmarks or guardrails against this unique risk landscape. To close this gap, we conduct large-scale red teaming on audio systems, systematically uncover vulnerabilities in audio, and develop a comprehensive, policy-grounded audio risk taxonomy and AudioSafetyBench, the first policy-based audio safety benchmark across diverse threat models. AudioSafetyBench supports diverse languages, suspicious voices (e.g., celebrity/impersonation and child voice), risky voice-content combinations, and non-speech sound events. To defend against these threats, we propose AudioGuard, a unified guardrail consisting of 1) SoundGuard for waveform-level audio-native detection and 2) ContentGuard for policy-grounded semantic protection. Extensive experiments on AudioSafetyBench and four complementary benchmarks show that AudioGuard consistently improves guardrail accuracy over strong audio-LLM-based baselines with substantially lower latency.
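
The two-component structure described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Rule` type, the tag names, the stub detectors, and the rule-matching logic are all hypothetical, and the abstract does not say how SoundGuard and ContentGuard outputs are actually combined. The sketch only shows why a compositional harm such as "child voice plus sexual content" needs signals from both a waveform-level detector and a content-level detector.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set

# Hypothetical detector signature: takes raw audio bytes, returns a set of tags.
Detector = Callable[[bytes], Set[str]]


@dataclass(frozen=True)
class Rule:
    """A hypothetical policy rule: fires when all required tags are present."""
    name: str
    required: FrozenSet[str]


def evaluate(tags: Set[str], policy: List[Rule]) -> List[str]:
    """Return the names of every rule whose required tags are all in `tags`."""
    return [rule.name for rule in policy if rule.required <= tags]


def audio_guard(
    waveform: bytes,
    sound_guard: Detector,    # stands in for waveform-level detection
    content_guard: Detector,  # stands in for semantic content moderation
    policy: List[Rule],
) -> List[str]:
    """Merge tags from both detectors, then check them against the policy.

    Assumption: the two detectors are combined by a simple union of tags.
    """
    tags = set(sound_guard(waveform)) | set(content_guard(waveform))
    return evaluate(tags, policy)


# Illustrative policy with one audio-native rule and one compositional rule.
POLICY = [
    Rule("harmful_sound_event", frozenset({"gunshot"})),
    Rule("child_voice_with_sexual_content",
         frozenset({"child_voice", "sexual_content"})),
]

# Stub detectors returning fixed tags; real detectors would be trained models.
def stub_sound_guard(waveform: bytes) -> Set[str]:
    return {"child_voice"}

def stub_content_guard(waveform: bytes) -> Set[str]:
    return {"sexual_content"}

violations = audio_guard(b"", stub_sound_guard, stub_content_guard, POLICY)
```

In this sketch neither detector alone triggers the compositional rule; only the union of their tags does, which is the property the abstract highlights for voice-content harms.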

Problem

Research questions and friction points this paper is trying to address.

audio safety

threat models

voice impersonation

harmful sound events

voice-content composition

Innovation

Methods, ideas, or system contributions that make the work stand out.

AudioGuard

AudioSafetyBench

audio-native safety