ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large Vision-Language Models (LVLMs) suffer from two critical reliability issues: hallucination and adversarial vulnerability. To address these, we propose ORCA, a modular agent-based framework that implements an “Observe–Reason–Critique–Act” loop. The loop uses a suite of small vision models (under 3B parameters) as tools for iterative verification and correction, without accessing model internals or requiring retraining. The approach relies on structured reasoning chains to mitigate object-level hallucinations and combines evidence-driven questioning, cross-model consistency checking, and test-time reasoning refinement. Although ORCA is designed primarily against hallucination, it also shows emergent adversarial robustness without adversarial training. Experiments show consistent gains: on the POPE hallucination benchmark, ORCA improves standalone LVLM performance by +3.64% to +40.67% across subsets; under adversarial perturbations on POPE, average accuracy rises by +20.11% across LVLMs; and when combined with defense techniques on adversarially perturbed AMBER images, gains range from +1.20% to +48.00% across evaluation metrics. The framework also stores intermediate reasoning traces, so each decision can be audited step by step.
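The "cross-model consistency checking" mentioned above can be illustrated with a minimal sketch. It assumes a POPE-style setting in which each claim is a yes/no answer about whether one object is present, and that each small vision tool returns its own yes/no verdict to the same evidential question. The names `check_object_claim`, `ConsistencyReport`, and the `min_agreement` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConsistencyReport:
    """Outcome of cross-checking one LVLM claim against several vision tools."""
    object_name: str
    lvlm_says_present: bool
    tool_votes: dict[str, bool]
    consistent: bool


def check_object_claim(object_name: str, lvlm_says_present: bool,
                       tool_votes: dict[str, bool],
                       min_agreement: float = 0.5) -> ConsistencyReport:
    """Flag the LVLM's yes/no claim as inconsistent if too few tools agree.

    `tool_votes` maps a hypothetical tool name to that tool's answer to the
    evidential question "Is there a <object_name> in the image?".
    The `min_agreement` threshold is an assumption of this sketch.
    """
    if not tool_votes:
        # No evidence collected, so nothing contradicts the LVLM.
        return ConsistencyReport(object_name, lvlm_says_present, {}, True)
    # Count the tools whose verdict matches the LVLM's claim.
    agree = sum(vote == lvlm_says_present for vote in tool_votes.values())
    consistent = agree / len(tool_votes) >= min_agreement
    return ConsistencyReport(object_name, lvlm_says_present, dict(tool_votes), consistent)


# Usage: the LVLM claims a dog is present; two of three tools say it is not.
report = check_object_claim(
    "dog", True, {"detector_a": False, "detector_b": False, "vqa_small": True})
print(report.consistent)  # False: only 1 of 3 tools agrees with the LVLM
```

A claim flagged as inconsistent here is the kind of case an agentic loop would send back for further reasoning rather than accept outright.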

📝 Abstract

Large Vision-Language Models (LVLMs) exhibit strong multimodal capabilities but remain vulnerable to hallucinations from intrinsic errors and adversarial attacks from external exploitations, limiting their reliability in real-world applications. We present ORCA, an agentic reasoning framework that improves the factual accuracy and adversarial robustness of pretrained LVLMs through test-time structured inference reasoning with a suite of small vision models (less than 3B parameters). ORCA operates via an Observe–Reason–Critique–Act loop, querying multiple visual tools with evidential questions, validating cross-model inconsistencies, and refining predictions iteratively without access to model internals or retraining. ORCA also stores intermediate reasoning traces, which supports auditable decision-making. Though designed primarily to mitigate object-level hallucinations, ORCA also exhibits emergent adversarial robustness without requiring adversarial training or defense mechanisms. We evaluate ORCA across three settings: (1) clean images on hallucination benchmarks, (2) adversarially perturbed images without defense, and (3) adversarially perturbed images with defense applied. On the POPE hallucination benchmark, ORCA improves standalone LVLM performance by +3.64% to +40.67% across different subsets. Under adversarial perturbations on POPE, ORCA achieves an average accuracy gain of +20.11% across LVLMs. When combined with defense techniques on adversarially perturbed AMBER images, ORCA further improves standalone LVLM performance, with gains ranging from +1.20% to +48.00% across evaluation metrics. These results demonstrate that ORCA offers a promising path toward building more reliable and robust multimodal systems.
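The Observe–Reason–Critique–Act loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the LVLM and each small vision tool are modeled as hypothetical callables that return a yes/no answer about a single object (a POPE-style query), the critique step is a simple "at least half the tools agree" rule, and the names `orca_style_loop`, `TraceStep`, `LoopResult`, and `max_iters` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interfaces (not the paper's API):
#   lvlm(image_id, prompt) -> bool          yes/no answer from the large model
#   tool(image_id, object_name) -> bool     yes/no verdict from a small vision model
LVLM = Callable[[str, str], bool]
Tool = Callable[[str, str], bool]


@dataclass
class TraceStep:
    """One Observe-Reason-Critique-Act iteration, stored for later auditing."""
    iteration: int
    answer: bool
    tool_votes: dict[str, bool]
    accepted: bool


@dataclass
class LoopResult:
    answer: bool
    verified: bool
    trace: list[TraceStep] = field(default_factory=list)


def orca_style_loop(image_id: str, object_name: str, lvlm: LVLM,
                    tools: dict[str, Tool], max_iters: int = 3) -> LoopResult:
    question = f"Is there a {object_name} in the image? Answer yes or no."
    # Observe: get the LVLM's initial answer.
    answer = lvlm(image_id, question)
    trace: list[TraceStep] = []
    for it in range(max_iters):
        # Reason: ask every small vision tool the same evidential question.
        votes = {name: tool(image_id, object_name) for name, tool in tools.items()}
        # Critique: accept if at least half of the tools agree with the answer.
        agree = sum(vote == answer for vote in votes.values())
        accepted = 2 * agree >= len(votes)
        trace.append(TraceStep(it, answer, votes, accepted))
        if accepted:
            return LoopResult(answer, True, trace)
        # Act: re-prompt the LVLM with the conflicting evidence and loop again.
        evidence = ", ".join(f"{n}={'yes' if v else 'no'}" for n, v in votes.items())
        answer = lvlm(image_id, f"{question} Tool evidence: {evidence}.")
    # Budget exhausted without agreement: return the last answer, flagged unverified.
    return LoopResult(answer, False, trace)


# Usage with stub models: the LVLM hallucinates "yes", both tools say "no".
def fake_lvlm(image_id: str, prompt: str) -> bool:
    if "Tool evidence" in prompt:
        return prompt.count("=yes") > prompt.count("=no")  # defer to the evidence
    return True


tools = {"detector": lambda img, obj: False, "segmenter": lambda img, obj: False}
result = orca_style_loop("img_001", "dog", fake_lvlm, tools)
print(result.answer, result.verified, len(result.trace))  # False True 2
```

Keeping every iteration in `trace` mirrors the abstract's point about auditable decision-making: a reviewer can see which tools disagreed and how the answer changed. The fixed iteration budget with an explicit `verified` flag is a design choice of this sketch, so an answer that never reaches agreement is surfaced rather than silently trusted.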

Problem

Reducing hallucinations in vision-language models

Improving adversarial robustness without retraining

Enhancing reliability through an agentic reasoning framework

Innovation

Agentic reasoning framework with an Observe-Reason-Critique-Act loop

Uses small vision models (under 3B parameters) for test-time structured inference

Improves accuracy without model retraining or internal access
