🤖 AI Summary
Current large language models lack intermediate verification during reasoning, which lets early errors propagate and limits auditability. This work proposes MAVEN, a blackboard-inspired multi-agent framework that decouples roles into an adversarial Skeptic-Researcher-Judge deliberation loop, separating logical defense from factual grounding to produce modular, verifiable reasoning traces. It introduces a step-level epistemic auditing mechanism intended to validate and correct reasoning during inference rather than after the fact. Evaluated on four benchmarks (OpenBookQA, TruthfulQA, HaluEval, and StrategyQA), the method consistently outperforms GEMINI-3.1-Pro and ReConcile across four fine-grained reasoning-quality metrics. The authors also report that the approach is model-agnostic and improves performance across diverse backbone models, while making reasoning more modular, transparent, and trustworthy.
📝 Abstract
While explicit reasoning trajectories enhance model interpretability, existing paradigms often rely on monolithic chains that lack intermediate verification, allowing early errors to cascade unchecked. This lack of modularity impedes granular auditing and compromises the epistemic trust required for high-stakes applications. We propose MAVEN (Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing), a blackboard-inspired framework designed to transform LLMs into deliberate reasoners through explicit role-decoupling. At its core, MAVEN operationalizes an adversarial Skeptic-Researcher-Judge loop, simulating expert deliberation by functionally separating logical defense from factual grounding. Experiments on the OpenBookQA, TruthfulQA, HaluEval, and StrategyQA benchmarks demonstrate that MAVEN delivers superior reasoning quality across four fine-grained metrics. Notably, MAVEN consistently outperforms latent reasoning models such as GEMINI-3.1-Pro and consensus-based baselines (e.g., ReConcile) by generating explicitly structured, modular, and verifiable deliberation trajectories, rather than relying on implicit internal states or post-hoc consensus. Moreover, comprehensive evaluations confirm that MAVEN is fully model-agnostic, serving as a strong and transferable reasoning booster that yields substantial performance improvements across diverse backbone models.
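To make the Skeptic-Researcher-Judge loop more concrete, here is a minimal Python sketch of how a blackboard with in-step auditing could be organized. It is an illustration under stated assumptions, not MAVEN's implementation: the abstract does not specify prompts, data structures, or control flow, so `Blackboard`, `Entry`, `audit_steps`, the stop-on-reject policy, and the toy role functions are all hypothetical. In practice each role would presumably be an LLM call; here they are simple string functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Entry:
    step_index: int
    role: str       # "candidate", "skeptic", "researcher", or "judge"
    content: str


@dataclass
class Blackboard:
    """Shared workspace that every role reads from and writes to (hypothetical)."""
    question: str
    entries: List[Entry] = field(default_factory=list)
    accepted: List[str] = field(default_factory=list)

    def post(self, step_index: int, role: str, content: str) -> None:
        self.entries.append(Entry(step_index, role, content))


def audit_steps(
    question: str,
    candidate_steps: List[str],
    skeptic: Callable[[str], str],
    researcher: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
) -> Blackboard:
    """Audit each candidate reasoning step before the next one is considered."""
    board = Blackboard(question)
    for i, step in enumerate(candidate_steps):
        board.post(i, "candidate", step)
        objection = skeptic(step)            # challenge the step's logic
        board.post(i, "skeptic", objection)
        evidence = researcher(objection)     # ground the challenge in facts
        board.post(i, "researcher", evidence)
        ok = judge(step, objection, evidence)
        board.post(i, "judge", "accept" if ok else "reject")
        if not ok:
            break  # assumed policy: stop so a rejected step cannot cascade
        board.accepted.append(step)
    return board


# Toy stand-ins for the three roles (assumptions for illustration only).
KNOWN_FACTS = {"birds have feathers", "penguins are birds"}


def toy_skeptic(step: str) -> str:
    return f"What supports the claim that {step}?"


def toy_researcher(objection: str) -> str:
    # Return every known fact that the objection mentions verbatim.
    return "; ".join(sorted(f for f in KNOWN_FACTS if f in objection))


def toy_judge(step: str, objection: str, evidence: str) -> bool:
    # Accept only when the retrieved evidence contains the step itself.
    return step in evidence


board = audit_steps(
    "Can penguins fly?",
    ["penguins are birds", "penguins can fly", "birds have feathers"],
    toy_skeptic,
    toy_researcher,
    toy_judge,
)
print(board.accepted)  # ['penguins are birds']
```

The point of the sketch is the shape of the trace: every step leaves a separate skeptic, researcher, and judge entry on the blackboard, so a reader can audit why a step was accepted or rejected instead of inspecting one monolithic chain.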