Robust CAPTCHA Using Audio Illusions in the Era of Large Language Models: from Evaluation to Advances

📅 2026-01-13

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work examines how vulnerable existing audio CAPTCHAs are to large audio language models (LALMs) and automatic speech recognition (ASR) systems. The authors propose AI-CAPTCHA, a framework comprising an evaluation platform, ACEval, and a new audio CAPTCHA method, IllusionAudio. Using ACEval on seven widely deployed audio CAPTCHAs, they find that most can be solved with high success rates by advanced LALMs and ASR models. IllusionAudio instead builds on human auditory perceptual illusions, generating challenges that humans solve easily but that resist AI-based recognition. In the reported experiments, IllusionAudio achieves a 100% human success rate while defeating all tested LALM- and ASR-based attacks, significantly outperforming existing audio CAPTCHA methods in both usability and security.


📝 Abstract

CAPTCHAs are widely used by websites to block bots and spam by presenting challenges that are easy for humans but difficult for automated programs to solve. To improve accessibility, audio CAPTCHAs are designed to complement visual ones. However, the robustness of audio CAPTCHAs against advanced Large Audio Language Models (LALMs) and Automatic Speech Recognition (ASR) models remains unclear. In this paper, we introduce AI-CAPTCHA, a unified framework that offers (i) an evaluation framework, ACEval, which includes advanced LALM- and ASR-based solvers, and (ii) a novel audio CAPTCHA approach, IllusionAudio, leveraging audio illusions. Through extensive evaluations of seven widely deployed audio CAPTCHAs, we show that most existing methods can be solved with high success rates by advanced LALMs and ASR models, exposing critical security weaknesses. To address these vulnerabilities, we design a new audio CAPTCHA approach, IllusionAudio, which exploits perceptual illusion cues rooted in human auditory mechanisms. Extensive experiments demonstrate that our method defeats all tested LALM- and ASR-based attacks while achieving a 100% human pass rate, significantly outperforming existing audio CAPTCHA methods.
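The abstract reports that most existing audio CAPTCHAs "can be solved with high success rates" by LALM- and ASR-based solvers, but this page does not describe how ACEval scores them. The following is a minimal sketch of one plausible scoring scheme, assuming each solver maps a CAPTCHA clip to a text answer and a challenge counts as solved only when the normalized answer exactly matches the ground truth. The names `normalize`, `success_rate`, `evaluate_solvers`, and the stub solvers are hypothetical; in a real harness, LALM or ASR calls would replace the stubs. The same `success_rate` function could score human responses to give a human pass rate.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical solver type: takes a CAPTCHA audio clip, returns the transcribed answer.
Solver = Callable[[bytes], str]


def normalize(answer: str) -> str:
    """Lowercase and keep only ASCII letters and digits, so 'A 7 b-3' matches 'a7b3'."""
    return re.sub(r"[^0-9a-z]", "", answer.lower())


def success_rate(predictions: List[str], ground_truth: List[str]) -> float:
    """Fraction of challenges whose normalized prediction exactly matches the answer."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, ground_truth))
    return hits / len(ground_truth)


def evaluate_solvers(
    solvers: Dict[str, Solver], clips: Sequence[bytes], answers: Sequence[str]
) -> Dict[str, float]:
    """Run every solver on every clip and report its success rate."""
    return {
        name: success_rate([solve(clip) for clip in clips], list(answers))
        for name, solve in solvers.items()
    }


# Usage with placeholder clips and stub solvers (assumptions, not real models).
clips = [b"clip-1", b"clip-2", b"clip-3", b"clip-4"]
answers = ["4 7 1 9", "x3k8", "2 2 0 5", "m9q1"]
lookup = dict(zip(clips, answers))

solvers: Dict[str, Solver] = {
    "oracle": lambda clip: lookup[clip],   # always correct
    "guesser": lambda clip: "0000",        # always wrong on this set
    "partial": lambda clip: lookup[clip] if clip in (b"clip-1", b"clip-2") else "?",
}

results = evaluate_solvers(solvers, clips, answers)
print(results)  # {'oracle': 1.0, 'guesser': 0.0, 'partial': 0.5}
```

Exact match after normalization is a strict criterion. A deployed CAPTCHA might accept small variations, which would raise the measured attack success rates.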

Problem

Research questions and friction points this paper is trying to address.

audio CAPTCHA

Large Audio Language Models

Automatic Speech Recognition

security vulnerability

human-bot distinction

Innovation

Methods, ideas, or system contributions that make the work stand out.

audio CAPTCHA

audio illusions

Large Audio Language Models

perceptual robustness

accessibility
