TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In partially observable multi-agent environments, the decisions of large language model (LLM) agents depend on evolving beliefs and on other agents' behavior, which makes their reasoning hard to inspect. This work proposes TriEx, a tri-view explanation framework for multi-agent LLMs that combines first-person self-reasoning bound to an action, second-person belief states about opponents, and third-person oracle audits anchored in environment-derived reference signals. By turning explanations into comparable, checkable evidence artifacts, TriEx exposes systematic mismatches between what agents say, what they believe, and what they do, and frames explainability as an interaction-dependent property. Using imperfect-information strategic games as a testbed, the authors show that TriEx supports scalable analysis of explanation faithfulness, belief dynamics, and evaluator reliability.

📝 Abstract

Explainability for Large Language Model (LLM) agents is especially challenging in interactive, partially observable settings, where decisions depend on evolving beliefs and other agents. We present TriEx, a tri-view explainability framework that instruments sequential decision making with aligned artifacts: (i) structured first-person self-reasoning bound to an action, (ii) explicit second-person belief states about opponents updated over time, and (iii) third-person oracle audits grounded in environment-derived reference signals. This design turns explanations from free-form narratives into evidence-anchored objects that can be compared and checked across time and perspectives. Using imperfect-information strategic games as a controlled testbed, we show that TriEx enables scalable analysis of explanation faithfulness, belief dynamics, and evaluator reliability, revealing systematic mismatches between what agents say, what they believe, and what they do. Our results highlight explainability as an interaction-dependent property and motivate multi-view, evidence-grounded evaluation for LLM agents. Code is available at https://github.com/Einsam1819/TriEx.
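
To make the three aligned artifacts concrete, the sketch below shows one way a single turn could be recorded and checked. It is an illustration only and is not taken from the TriEx repository: every class, field, and function name (`SelfReasoning`, `BeliefState`, `OracleAudit`, `TurnRecord`, `belief_error`, `self_report_gap`) is hypothetical, and the use of a single "hand strength" number in [0, 1] as the reference signal is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SelfReasoning:
    """First-person view: the agent's stated reasoning, bound to one action."""
    agent: str
    turn: int
    action: str
    rationale: str
    claimed_strength: float  # agent's stated estimate of its own hand, in [0, 1] (assumed)


@dataclass
class BeliefState:
    """Second-person view: what the agent believes about each opponent."""
    agent: str
    turn: int
    opponent_strength: Dict[str, float]  # opponent name -> believed strength in [0, 1]


@dataclass
class OracleAudit:
    """Third-person view: reference signals derived from the full environment state."""
    turn: int
    true_strength: Dict[str, float]  # player name -> actual strength in [0, 1]


@dataclass
class TurnRecord:
    """The three views for one agent at one turn, kept together for comparison."""
    self_view: SelfReasoning
    belief_view: BeliefState
    audit: OracleAudit


def belief_error(record: TurnRecord) -> float:
    """Mean absolute gap between believed and actual opponent strength."""
    beliefs = record.belief_view.opponent_strength
    truth = record.audit.true_strength
    shared = [name for name in beliefs if name in truth]
    if not shared:
        return 0.0
    return sum(abs(beliefs[name] - truth[name]) for name in shared) / len(shared)


def self_report_gap(record: TurnRecord) -> float:
    """Gap between what the agent says about its own hand and the oracle value."""
    agent = record.self_view.agent
    return abs(record.self_view.claimed_strength - record.audit.true_strength[agent])


# Hypothetical turn: alice raises, claims a strong hand, and believes bob is strong.
record = TurnRecord(
    self_view=SelfReasoning("alice", 3, "raise", "I hold a strong hand.", 0.9),
    belief_view=BeliefState("alice", 3, {"bob": 0.8}),
    audit=OracleAudit(3, {"alice": 0.3, "bob": 0.4}),
)
```

Calling `belief_error(record)` and `self_report_gap(record)` on this record gives two per-turn numbers that can be tracked over time. A large self-report gap paired with an aggressive action is the kind of say/believe/do mismatch the abstract describes, though the paper's actual metrics may be defined differently.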

Problem

Research questions and friction points this paper is trying to address.

Explainability

Multi-Agent LLMs

Partially Observable Environments

Belief Dynamics

Interactive Decision Making

Innovation

Methods, ideas, or system contributions that make the work stand out.

TriEx

multi-agent LLMs

explainability