🤖 AI Summary
This work addresses the lack of reliable, decentralized verification for large reasoning models and multi-agent systems in high-stakes scenarios, where existing centralized approaches suffer from poor robustness, limited scalability, opacity, and privacy risks. To overcome these limitations, the authors propose TRUST, a decentralized framework that decomposes Chain-of-Thought reasoning into hierarchical directed acyclic graphs (HDAGs) for parallel distributed auditing, and uses the DAAN protocol to project multi-agent interactions into causal interaction graphs (CIGs) for root-cause attribution. TRUST further introduces a stake-weighted, multi-tier consensus mechanism that guarantees correctness under 30% adversarial participation. Experimental results show that TRUST achieves 72.4% accuracy across multiple LLMs and benchmarks, 4–18% above baselines, and remains resilient against 20% corruption. The DAAN protocol attains 70% root-cause attribution accuracy (7–16 percentage points above the 54–63% of standard methods) while reducing token usage by 60%. Human studies yield an F1 score of 0.89 and a Brier score of 0.074.
📝 Abstract
Large Reasoning Models (LRMs) and Multi-Agent Systems (MAS) in high-stakes domains demand reliable verification, yet centralized approaches suffer four limitations: (1) Robustness, with single points of failure vulnerable to attacks and bias; (2) Scalability, as reasoning complexity creates bottlenecks; (3) Opacity, as hidden auditing erodes trust; and (4) Privacy, as exposed reasoning traces risk model theft. We introduce TRUST (Transparent, Robust, and Unified Services for Trustworthy AI), a decentralized framework with three innovations: (i) Hierarchical Directed Acyclic Graphs (HDAGs) that decompose Chain-of-Thought reasoning into five abstraction levels for parallel distributed auditing; (ii) the DAAN protocol, which projects multi-agent interactions into Causal Interaction Graphs (CIGs) for deterministic root-cause attribution; and (iii) a multi-tier consensus mechanism among computational checkers, LLM evaluators, and human experts with stake-weighted voting that guarantees correctness under 30% adversarial participation. We prove a Safety-Profitability Theorem ensuring honest auditors profit while malicious actors incur losses. All decisions are recorded on-chain, while privacy-by-design segmentation prevents reconstruction of proprietary logic. Across multiple LLMs and benchmarks, TRUST attains 72.4% accuracy (4-18% above baselines) and remains resilient against 20% corruption. DAAN reaches 70% root-cause attribution (vs. 54-63% for standard methods) with 60% token savings. Human studies validate the design (F1 = 0.89, Brier = 0.074). The framework supports (A1) decentralized auditing, (A2) tamper-proof leaderboards, (A3) trustless data annotation, and (A4) governed autonomous agents, pioneering decentralized AI auditing for safe, accountable deployment of reasoning-capable systems.
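As a rough illustration of the stake-weighted voting mentioned in the abstract, the sketch below aggregates auditor votes by stake. The abstract does not specify the aggregation rule, so the simple majority-of-stake threshold, the `Vote` type, the auditor names, and the stake values are all assumptions made for illustration, not the TRUST implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    auditor: str    # hypothetical auditor identifier
    stake: float    # stake backing this vote (assumed non-negative)
    approves: bool  # True if the auditor accepts the audited reasoning segment


def stake_weighted_verdict(votes: list[Vote], threshold: float = 0.5) -> bool:
    """Accept when the approving share of total stake strictly exceeds `threshold`.

    Assumption: a plain majority-of-stake rule; the paper's actual
    multi-tier consensus may differ.
    """
    total = sum(v.stake for v in votes)
    if total <= 0:
        raise ValueError("total stake must be positive")
    approving = sum(v.stake for v in votes if v.approves)
    return approving / total > threshold


# Illustrative round: 30% of the stake is adversarial and votes against
# a segment that the honest auditors (70% of the stake) approve.
votes = [
    Vote("computational_checker", 40.0, True),
    Vote("llm_evaluator", 30.0, True),
    Vote("adversary", 30.0, False),
]
verdict = stake_weighted_verdict(votes)
```

Under this assumed rule, a coalition holding 30% of the stake cannot flip the outcome on its own, which is the intuition behind tolerating a bounded adversarial share.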