AI Summary
Traditional CAPTCHAs struggle to distinguish humans from machines in the face of multimodal GUI agents endowed with advanced reasoning capabilities. This work proposes the first scalable human verification framework based on dynamic task generation, leveraging a backend-driven, infinitely generative mechanism to design challenges that exploit cognitive gaps between humans and AI in interactive perception, memory, and intuitive decision-making. By transcending the limitations of static datasets, the framework employs adaptive, non-procedural tasks that effectively counter state-of-the-art models such as GPT-5.2-Xhigh, drastically reducing their success rate on complex logical puzzles from 90% to near-chance levels, thereby reestablishing a robust boundary for human-machine differentiation in the age of intelligent agents.
Abstract
The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks like OpenCaptchaWorld established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models, such as Gemini3-Pro-High and GPT-5.2-Xhigh, have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles like "Bingo". In response, we introduce Next-Gen CAPTCHAs, a scalable defense framework designed to secure the next-generation web against advanced agents. Unlike static datasets, our benchmark is built upon a robust data generation pipeline, allowing for large-scale and easily scalable evaluations. Notably, for backend-supported types, our system is capable of generating effectively unbounded CAPTCHA instances. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents, offering a scalable and diverse defense mechanism for the agentic era.