Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration

📅 2026-02-04
🤖 AI Summary
This work challenges the common assumption in multi-expert systems, where multiple large language models collaborate on a task, that how often an expert is routed to reflects its functional importance. The authors propose INFORM, an interpretability analysis that separates expert interaction structure, execution order, and causal attribution, and thereby distinguishes relational importance (routing mass and interaction topology) from intrinsic importance (gradient-based causal attribution). Evaluating an orchestrator on GSM8K, HumanEval, and MMLU with controlled decoding-temperature variation, they find that frequently selected experts often act as interaction hubs with limited causal influence, while sparsely routed experts can be structurally critical. Targeted ablations show that masking intrinsically important experts collapses the interaction structure disproportionately more than masking frequently routed ones. Orchestration behaviors also emerge asynchronously: expert centralization precedes stable routing confidence, and expert ordering remains non-deterministic.

📝 Abstract
Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration policies governing expert interaction and sequencing remain largely opaque. We introduce INFORM, an interpretability analysis that treats orchestration as an explicit, analyzable computation, enabling the decoupling of expert interaction structure, execution order, and causal attribution. We use INFORM to evaluate an orchestrator on GSM8K, HumanEval, and MMLU using a homogeneous consortium of ten instruction-tuned experts drawn from LLaMA-3.1 8B, Qwen-3 8B, and DeepSeek-R1 8B, with controlled decoding-temperature variation, and a secondary heterogeneous consortium spanning 1B-7B parameter models. Across tasks, routing dominance is a poor proxy for functional necessity. We reveal a divergence between relational importance, captured by routing mass and interaction topology, and intrinsic importance, measured via gradient-based causal attribution: frequently selected experts often act as interaction hubs with limited causal influence, while sparsely routed experts can be structurally critical. Orchestration behaviors emerge asynchronously, with expert centralization preceding stable routing confidence and expert ordering remaining non-deterministic. Targeted ablations show that masking intrinsically important experts induces disproportionate collapse in interaction structure compared to masking frequent peers, confirming that INFORM exposes causal and structural dependencies beyond accuracy metrics alone.
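
The divergence between relational and intrinsic importance described in the abstract can be illustrated with a toy sketch. This is not the INFORM implementation: it assumes a hypothetical linear aggregator `out = sum_i w_i * y_i`, for which the gradient of the output with respect to routing weight `w_i` is simply the expert output `y_i`, and it uses made-up numbers for three hypothetical experts. The function names `routing_mass` and `grad_times_input` are illustrative only.

```python
import numpy as np


def routing_mass(weights):
    """Mean routing weight per expert: a proxy for relational importance."""
    return weights.mean(axis=0)


def grad_times_input(weights, expert_outputs):
    """Gradient-times-input attribution per expert.

    Assumes out = sum_i w_i * y_i, so d(out)/d(w_i) = y_i and the
    attribution for expert i reduces to mean |w_i * y_i| over queries.
    """
    return np.abs(weights * expert_outputs).mean(axis=0)


# Toy data (assumed, not from the paper): 4 queries x 3 experts.
# Each row of routing weights sums to 1.
weights = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
])
# Expert 0 is routed to most often but contributes little to the output;
# expert 2 is rarely routed to but dominates the output.
expert_outputs = np.array([
    [0.1, 1.0, 9.0],
    [0.2, 1.5, 8.0],
    [0.1, 0.5, 9.5],
    [0.0, 1.0, 9.0],
])

mass = routing_mass(weights)
attr = grad_times_input(weights, expert_outputs)

most_routed = int(np.argmax(mass))   # expert favored by routing frequency
most_causal = int(np.argmax(attr))   # expert favored by attribution

print("routing mass:", mass)
print("attribution: ", attr)
print("most routed expert:", most_routed, "| most causal expert:", most_causal)
```

In this toy setting the most frequently routed expert and the expert with the highest attribution are different, which is the kind of mismatch the abstract reports between routing dominance and functional necessity.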
Problem

Research questions and friction points this paper is trying to address.

multi-expert orchestration
causal attribution
emergent structure
interpretability
expert routing
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-expert orchestration
causal attribution
interpretability
emergent structure
LLM collaboration