🤖 AI Summary
The core challenge in interpretable visual reasoning lies in generating transparent decision processes grounded in human-understandable concepts. To address this, we propose OCEAN, a novel framework that couples a game-theoretic multi-agent negotiation mechanism with end-to-end learned object-centric representations to enable intrinsically interpretable visual reasoning. Each agent models an individual scene object, and the agents jointly negotiate to produce coherent, human-identifiable reasoning evidence without post-hoc processing. On multi-object benchmarks, OCEAN achieves accuracy competitive with state-of-the-art black-box models. A user study shows that its explanations are rated significantly more intuitive and trustworthy than those of post-hoc methods (e.g., Grad-CAM, LIME), with substantial gains in both faithfulness and comprehensibility. This work establishes a paradigm for explainable AI that combines theoretical rigor, rooted in game-theoretic principles, with cognitive alignment to human reasoning.
📝 Abstract
A central challenge in explainable AI, particularly in the visual domain, is producing explanations grounded in human-understandable concepts. To tackle this, we introduce OCEAN (Object-Centric Explananda via Agent Negotiation), a novel, inherently interpretable framework built on object-centric representations and a transparent multi-agent reasoning process. Its game-theoretic negotiation drives agents to agree on coherent and discriminative evidence, yielding a faithful and interpretable decision-making process. We train OCEAN end-to-end and benchmark it against standard visual classifiers and popular post-hoc explanation tools such as Grad-CAM and LIME on two diagnostic multi-object datasets. Our results demonstrate performance competitive with state-of-the-art black-box models while preserving a faithful reasoning process, a finding reflected in our user study, in which participants consistently rated OCEAN's explanations as more intuitive and trustworthy.
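The abstract does not spell out the negotiation mechanics, but the core idea of per-object agents converging on shared evidence can be sketched in a few lines. The sketch below is a hypothetical illustration, not OCEAN's actual algorithm: the `ObjectAgent` class, the `negotiate` loop, the shared linear scoring head, and all sizes and names are assumptions introduced here, standing in for whatever learned components the paper uses.

```python
import numpy as np

# Hypothetical sketch of object-centric agent negotiation.
# NOT the paper's implementation: slot features, the scoring head,
# and the consensus update below are illustrative stand-ins.

rng = np.random.default_rng(0)
N_SLOTS, N_CLASSES, DIM = 4, 3, 8  # assumed sizes

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class ObjectAgent:
    """One agent per object slot; proposes class evidence from its features."""
    def __init__(self, slot_features, head):
        self.slot = slot_features   # object-centric representation of one object
        self.head = head            # shared linear scoring head (assumed)
        self.weight = 1.0           # negotiation weight, i.e. current influence

    def propose(self):
        # Per-agent class distribution derived from this object's features.
        return softmax(self.slot @ self.head)

def negotiate(agents, rounds=5, temp=5.0):
    """Iteratively re-weight agents toward an evolving consensus.

    Agents whose proposals agree with the current weighted consensus gain
    influence (a simple best-response-style update); a fixed round budget
    stands in for whatever convergence criterion the real method uses.
    """
    for _ in range(rounds):
        proposals = np.stack([a.propose() for a in agents])   # (slots, classes)
        weights = np.array([a.weight for a in agents])
        consensus = weights @ proposals / weights.sum()       # (classes,)
        agreement = proposals @ consensus                     # dot with consensus
        for a, w in zip(agents, softmax(temp * agreement)):
            a.weight = w
    # Recompute the consensus under the final weights.
    weights = np.array([a.weight for a in agents])
    consensus = weights @ proposals / weights.sum()
    return consensus, weights

# Toy usage: random slots stand in for learned object representations.
head = rng.normal(size=(DIM, N_CLASSES))
agents = [ObjectAgent(rng.normal(size=DIM), head) for _ in range(N_SLOTS)]
decision, evidence_weights = negotiate(agents)
print("class posterior:", decision.round(3))
print("per-object evidence weights:", evidence_weights.round(3))
```

Under this reading, the final per-agent weights are what makes the decision intrinsically interpretable: each weight ties the prediction to a specific, human-identifiable object in the scene rather than to a post-hoc saliency map.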