Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning

📅 2025-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the insufficient trustworthiness of multimodal multi-agent systems in zero-shot scenarios, this paper proposes a trust-aware modular visual classification architecture. It decouples perception (CLIP-based image retrieval) from meta-reasoning (RAG-augmented language modeling) and introduces a dynamic trust calibration mechanism coupled with an iterative re-evaluation loop. Confidence is quantified and regulated via metrics including Expected Calibration Error (ECE) and the Concordance Correlation Coefficient (CCC), effectively mitigating agent overconfidence. Evaluated on a zero-shot apple leaf disease diagnosis task, the system achieves 85.63% accuracy, a 77.94% improvement over baseline methods. GPT-4o demonstrates superior calibration, while image-specific RAG substantially enhances reasoning reliability. All code and experimental configurations are fully open-sourced to ensure reproducibility.
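The two calibration metrics named in the summary have standard definitions. As a hedged illustration (not the paper's actual evaluation code), a minimal NumPy sketch of binned ECE and Lin's CCC might look like this:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average gap between mean confidence
    and empirical accuracy within each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

def concordance_correlation_coefficient(x, y):
    """Lin's CCC between two paired score vectors (1.0 = perfect agreement)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)
```

A perfectly calibrated agent (confidence matching empirical accuracy in every bin) yields an ECE of 0, and identical score vectors yield a CCC of 1; overconfident agents such as the paper's Qwen-2.5-VL would show a large ECE driven by high-confidence bins with lower accuracy.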

📝 Abstract
Modern Artificial Intelligence (AI) increasingly relies on multi-agent architectures that blend visual and language understanding. Yet a pressing challenge remains: how can we trust these agents, especially in zero-shot settings with no fine-tuning? We introduce a novel modular Agentic AI visual classification framework that integrates generalist multimodal agents with a non-visual reasoning orchestrator and a Retrieval-Augmented Generation (RAG) module. Applied to apple leaf disease diagnosis, we benchmark three configurations: (I) zero-shot with confidence-based orchestration, (II) fine-tuned agents with improved performance, and (III) trust-calibrated orchestration enhanced by CLIP-based image retrieval and re-evaluation loops. Using confidence calibration metrics (ECE, OCR, CCC), the orchestrator modulates trust across agents. Our results demonstrate a 77.94% accuracy improvement in the zero-shot setting using trust-aware orchestration and RAG, achieving 85.63% overall. GPT-4o showed better calibration, while Qwen-2.5-VL displayed overconfidence. Furthermore, image-RAG grounded predictions in visually similar cases, enabling correction of agent overconfidence via iterative re-evaluation. The proposed system separates perception (vision agents) from meta-reasoning (orchestrator), enabling scalable and interpretable multi-agent AI. This blueprint is extensible to diagnostics, biology, and other trust-critical domains. All models, prompts, results, and system components, including the complete software source code, are openly released to support reproducibility, transparency, and community benchmarking on GitHub: https://github.com/Applied-AI-Research-Lab/Orchestrator-Agent-Trust
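The abstract's core control flow, a trust-weighted decision over agent outputs with a re-evaluation pass when confidence is low, can be sketched as follows. This is a hypothetical illustration of the idea, not the released implementation; the names (`AgentPrediction`, `orchestrate`, `reevaluate`, the 0.75 threshold) are assumptions for the sketch:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class AgentPrediction:
    label: str        # predicted disease class
    confidence: float # agent's self-reported confidence in [0, 1]

def orchestrate(predictions: Dict[str, AgentPrediction],
                trust_weights: Dict[str, float],
                threshold: float = 0.75,
                reevaluate: Optional[Callable] = None) -> str:
    """Aggregate agent predictions by trust-weighted confidence.
    If the winning label's share of the total weighted score falls
    below the threshold, run one re-evaluation pass (e.g. prompting
    agents again with retrieved similar images) and accept its result."""
    scored: Dict[str, float] = {}
    for agent, pred in predictions.items():
        w = trust_weights.get(agent, 1.0)  # calibration-derived weight
        scored[pred.label] = scored.get(pred.label, 0.0) + w * pred.confidence
    best_label, best_score = max(scored.items(), key=lambda kv: kv[1])
    total = sum(scored.values())
    if best_score / total < threshold and reevaluate is not None:
        # Second pass: accept whatever the re-evaluated agents agree on.
        return orchestrate(reevaluate(predictions), trust_weights, threshold=0.0)
    return best_label
```

In this sketch the trust weights would come from the calibration metrics (e.g. down-weighting an agent with high ECE), and the `reevaluate` callback is where image-RAG context would be injected before a second round of predictions.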
Problem

Research questions and friction points this paper is trying to address.

Enhancing trust in multi-agent AI for zero-shot visual classification
Improving accuracy via trust-aware orchestration and RAG-based reasoning
Calibrating agent confidence in disease diagnosis without fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular AI with trust-aware orchestrator
RAG-based reasoning for visual classification
CLIP-enhanced trust calibration metrics