CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a barrier to end-to-end automation of intelligent agents on real-world websites: modern CAPTCHAs demand complex multi-step visual reasoning and interaction, and training solvers for them has been held back by limited training data and the absence of process-level annotations. The authors introduce CaptchaBench, the first CAPTCHA benchmark designed to support large-scale training, with fine-grained region annotations and explicit reasoning-process annotations, and CaptchaMind, a reinforcement-learning-based solver that uses explicit reasoning supervision to better handle intricate visual details and region-comparison tasks. Experiments show an average success rate of 82.9% across eight task categories and 71.0% on real-world CAPTCHA instances, substantially outperforming all existing methods that do not rely on proprietary APIs.

📝 Abstract

CAPTCHAs are widely deployed as human verification mechanisms and frequently block intelligent agents from completing end-to-end automation in real-world web environments. Solving modern CAPTCHAs requires robust multi-step visual reasoning and interaction capabilities, yet training-based approaches have remained absent due to the lack of large-scale training data and process-level annotations. We introduce CaptchaBench, the first CAPTCHA benchmark designed to support large-scale training, comprising 16,000 programmatically generated samples across eight task categories with detailed region and process-level annotations. Systematic evaluation on CaptchaBench reveals that existing methods fail consistently on tasks requiring fine-grained visual detail capture and region-level comparison. We therefore present CaptchaMind, an RL-based solver trained with explicit reasoning-process supervision, which achieves an 82.9% average success rate across the eight task categories and 71.0% on real-world instances, substantially outperforming all existing methods that do not use closed-source APIs.
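
To make "region annotations" and "explicit reasoning process supervision" more concrete, the sketch below shows one way a reward could combine an outcome check against an annotated region with a term that scores the reasoning trace. This is not the paper's method: the abstract does not specify the annotation format, the reward, or any weighting. The `Region` class, the three reward functions, the exact-match step comparison, and the `process_weight` parameter are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical annotation format: an axis-aligned box in pixel coordinates.
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def outcome_reward(click, target: Region) -> float:
    """Return 1.0 if the predicted (x, y) click lands inside the target region."""
    return 1.0 if target.contains(*click) else 0.0


def process_reward(predicted_steps, reference_steps) -> float:
    """Fraction of reference steps matched at the same position (case-insensitive).

    Exact string matching is an assumption made for simplicity; a real system
    would likely need a softer comparison.
    """
    if not reference_steps:
        return 0.0
    matched = sum(
        1
        for pred, ref in zip(predicted_steps, reference_steps)
        if pred.strip().lower() == ref.strip().lower()
    )
    return matched / len(reference_steps)


def total_reward(click, target, predicted_steps, reference_steps, process_weight=0.5):
    """Outcome reward plus a weighted process term (the weight is an assumed value)."""
    return outcome_reward(click, target) + process_weight * process_reward(
        predicted_steps, reference_steps
    )
```

Under these assumptions, a rollout that clicks inside the target region but reproduces only half of the reference reasoning steps would score 1.0 + 0.5 × 0.5 = 1.25, so a correct answer reached through a faulty process earns less than one reached through a correct process.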

Problem

Research questions and friction points this paper is trying to address.

CAPTCHA

visual reasoning

intelligent agents

human verification

end-to-end automation

Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning

explicit reasoning supervision

CAPTCHA solving