HoliAntiSpoof: Audio LLM for Holistic Speech Anti-Spoofing

📅 2026-02-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing speech anti-spoofing approaches, which predominantly rely on binary classification and struggle to model how diverse spoofing techniques jointly affect acoustic attributes and semantic content. To overcome this, we propose the first unified anti-spoofing framework based on Audio Large Language Models (ALLMs), reframing the task as a text generation problem that jointly infers the spoofing method, the affected speech attributes, and the resulting semantic distortions. We introduce DailyTalkEdit, a new benchmark enabling semantic-level evaluation, and explore in-context learning to improve out-of-domain generalization. Experimental results show that our approach outperforms conventional baselines across multiple settings while providing interpretable analyses of spoofing behaviors; preliminary results indicate that in-context learning further improves out-of-domain generalization.

📝 Abstract

Recent advances in speech synthesis and editing have made spoofed speech increasingly difficult to detect. However, most existing methods treat spoofing as binary classification, overlooking that diverse spoofing techniques manipulate multiple, coupled speech attributes and their semantic effects. In this paper, we introduce HoliAntiSpoof, the first audio large language model (ALLM) framework for holistic speech anti-spoofing analysis. HoliAntiSpoof reformulates spoofing analysis as a unified text generation task, enabling joint reasoning over spoofing methods, affected speech attributes, and their semantic impacts. To support semantic-level analysis, we introduce DailyTalkEdit, a new anti-spoofing benchmark that simulates realistic conversational manipulations and provides annotations of semantic influence. Extensive experiments demonstrate that HoliAntiSpoof outperforms conventional baselines across multiple settings, while preliminary results show that in-context learning further improves out-of-domain generalization. These findings indicate that ALLMs not only enhance speech spoofing detection performance but also enable interpretable analysis of spoofing behaviors and their semantic effects, pointing towards more trustworthy and explainable speech security. Data and code are publicly available.
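To illustrate what "reformulating spoofing analysis as a unified text generation task" can look like, here is a minimal Python sketch. It assumes a hypothetical label schema (`HolisticLabel`) and a hypothetical `key: value` text layout; neither is the format used by HoliAntiSpoof. The helpers `to_target_text`, `parse_generated_text`, and `build_icl_prompt` are illustrative names, and the audio references in the in-context prompt are plain-text placeholders standing in for the audio inputs a real ALLM would receive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical schema: field names and values are illustrative assumptions,
# not HoliAntiSpoof's actual annotation format.
@dataclass
class HolisticLabel:
    is_spoofed: bool
    method: Optional[str] = None                          # e.g. "speech editing"
    attributes: List[str] = field(default_factory=list)   # e.g. ["content"]
    semantic_impact: Optional[str] = None                 # free-text meaning change


def to_target_text(label: HolisticLabel) -> str:
    """Serialize a label into the text a model would be trained to generate."""
    if not label.is_spoofed:
        return "verdict: bona fide"
    return "\n".join([
        "verdict: spoofed",
        f"method: {label.method or 'unknown'}",
        f"attributes: {', '.join(label.attributes) or 'none'}",
        f"semantic impact: {label.semantic_impact or 'none'}",
    ])


def parse_generated_text(text: str) -> HolisticLabel:
    """Recover a structured label from generated text, tolerating missing fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip().lower()] = value.strip()
    if fields.get("verdict", "").lower() != "spoofed":
        return HolisticLabel(is_spoofed=False)
    attrs = fields.get("attributes", "none")
    impact = fields.get("semantic impact", "none")
    return HolisticLabel(
        is_spoofed=True,
        method=fields.get("method"),
        attributes=[] if attrs == "none" else [a.strip() for a in attrs.split(",")],
        semantic_impact=None if impact == "none" else impact,
    )


def build_icl_prompt(demos: List[Tuple[str, HolisticLabel]],
                     query: str = "<query audio>") -> str:
    """Assemble an in-context prompt from (audio placeholder, label) demonstrations."""
    parts = ["Is this speech spoofed? If so, give the method, "
             "affected attributes, and semantic impact."]
    for i, (audio_ref, label) in enumerate(demos, start=1):
        parts.append(f"Example {i}: {audio_ref}\n{to_target_text(label)}")
    parts.append(f"Query: {query}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = HolisticLabel(True, "speech editing", ["content"],
                         "a negation was inserted, reversing the intent")
    print(to_target_text(demo))
    print(parse_generated_text(to_target_text(demo)) == demo)
```

One design point this sketch makes visible: the binary decision of a conventional detector is still recoverable as `parse_generated_text(output).is_spoofed`, so a holistic text output can be scored against binary baselines while also carrying method, attribute, and semantic information.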

Problem

Research questions and friction points this paper is trying to address.

speech anti-spoofing

audio large language model

semantic impact

speech attributes

spoofing detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Audio Large Language Model

Holistic Anti-Spoofing

Text Generation for Spoofing Analysis