🤖 AI Summary
This work addresses the susceptibility of vision-language models (VLMs) to memory bias when interpreting classic optical illusions: they tend to answer from what they remember about a well-known illusion and overlook genuine perceptual differences in the image. To mitigate this, the authors propose a training-free anti-illusion framework that combines type-specific image preprocessing (edge extraction, color isolation, morphological operations, and reference-line overlay) with prompt engineering oriented toward qualitative comparison. A multi-vote ensemble strategy is added to improve robustness and keep the model attending to actual visual cues. Evaluated on the official test set of the CVPR 2026 DataCV Challenge, the method achieves an accuracy of 90.48%, and 98.41% on a human-verified subset, securing second place in the challenge.
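To make the preprocessing idea concrete, here is a minimal sketch of two of the listed transformations, edge extraction and reference-line overlay, in pure Python. It is not the authors' implementation: the Sobel operator, the threshold value, and the function names `sobel_edges` and `overlay_horizontal_line` are assumptions chosen for illustration, and a real pipeline would more likely use an image library.

```python
from typing import List

# Grayscale image as a row-major list of rows, pixel values in 0..255.
Image = List[List[int]]


def sobel_edges(img: Image, threshold: int = 128) -> Image:
    """Return a binary edge map (0 or 255) using Sobel gradients.

    Illustrative stand-in for "edge extraction"; border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                out[y][x] = 255
    return out


def overlay_horizontal_line(img: Image, row: int, value: int = 255) -> Image:
    """Return a copy of img with one full-width reference line drawn at `row`."""
    out = [r[:] for r in img]
    out[row] = [value] * len(out[row])
    return out
```

The intent in both cases is the same as described above: remove or override the surrounding context that induces the illusion, so that the model compares the actual shapes rather than recognizing a familiar figure.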
📝 Abstract
Vision-Language Models (VLMs) exhibit systematic bias when viewing visual illusions, recalling memorized facts rather than perceiving actual visual differences. This paper presents a training-free framework for the 5th DataCV Challenge Task 1 at CVPR 2026, addressing this perception-versus-memory conflict through three complementary strategies: (1) illusion-aware image preprocessing that weakens illusion-inducing context via type-specific transformations (edge extraction, color isolation, morphological processing, and reference-line overlay), (2) anti-illusion prompt engineering that guides VLMs toward qualitative visual comparison, and (3) a multi-vote ensemble that further improves robustness. Our method achieves 90.48% accuracy on the official 630-image test set using Claude (claude-opus-4-6) with a 5-vote majority ensemble, and 98.41% on a human-verified subset. The approach requires no fine-tuning, relying solely on visual manipulation and prompt design. Our solution secured 2nd place in the challenge, only 0.47 percentage points behind the 1st-place solution. Code is available at https://github.com/jasminezz/sf-illusion-aware-vlm.git.
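The 5-vote majority ensemble can be sketched as follows. This is an illustration under stated assumptions, not the released code: `ask_model` is a hypothetical callable standing in for one VLM query on a preprocessed image, and the answer normalization (trimming whitespace, lowercasing) is assumed rather than taken from the paper.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer after simple normalization."""
    if not answers:
        raise ValueError("at least one answer is required")
    normalized = [a.strip().lower() for a in answers]
    # most_common(1) yields the answer with the highest count.
    return Counter(normalized).most_common(1)[0][0]


def ensemble_answer(ask_model: Callable[[], str], n_votes: int = 5) -> str:
    """Query the (hypothetical) model n_votes times and take the majority."""
    return majority_vote([ask_model() for _ in range(n_votes)])


# Usage with a stubbed model that returns canned replies.
replies = iter(["no", "yes", "no", "No ", "yes"])
final = ensemble_answer(lambda: next(replies))
```

An odd number of votes avoids ties when the question has two possible answers, which is one plausible reason for choosing five.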