Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration

📅 2026-02-04

🤖 AI Summary

This work challenges a common assumption about multi-expert LLM systems: that how often an orchestrator routes to an expert reflects how much that expert matters functionally. The authors propose INFORM, an interpretability analysis that decouples expert interaction structure, execution order, and causal attribution, and so separates relational importance (routing mass and interaction topology) from intrinsic importance (gradient-based causal attribution). Evaluating an orchestrator on GSM8K, HumanEval, and MMLU with controlled decoding-temperature variation and targeted ablations, they show that masking intrinsically important experts causes a disproportionate collapse in interaction structure compared with masking frequently routed ones. Frequent invocation therefore does not imply functional necessity, and sparsely routed experts can be structurally critical. Orchestration behaviors also emerge asynchronously: expert centralization precedes stable routing confidence, and expert ordering remains non-deterministic.

📝 Abstract

Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration policies governing expert interaction and sequencing remain largely opaque. We introduce INFORM, an interpretability analysis that treats orchestration as an explicit, analyzable computation, enabling the decoupling of expert interaction structure, execution order, and causal attribution. We use INFORM to evaluate an orchestrator on GSM8K, HumanEval, and MMLU using a homogeneous consortium of ten instruction-tuned experts drawn from LLaMA-3.1 8B, Qwen-3 8B, and DeepSeek-R1 8B, with controlled decoding-temperature variation, and a secondary heterogeneous consortium spanning 1B-7B parameter models. Across tasks, routing dominance is a poor proxy for functional necessity. We reveal a divergence between relational importance, captured by routing mass and interaction topology, and intrinsic importance, measured via gradient-based causal attribution: frequently selected experts often act as interaction hubs with limited causal influence, while sparsely routed experts can be structurally critical. Orchestration behaviors emerge asynchronously, with expert centralization preceding stable routing confidence and expert ordering remaining non-deterministic. Targeted ablations show that masking intrinsically important experts induces disproportionate collapse in interaction structure compared to masking frequent peers, confirming that INFORM exposes causal and structural dependencies beyond accuracy metrics alone.
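
The distinction between relational and intrinsic importance can be illustrated with a toy example. The sketch below is not INFORM's implementation: the linear scoring model, the function names (`routing_mass`, `gate_gradient_attribution`, `ablation_drop`), and the numbers are all assumptions made for illustration. It treats the task score as a gated sum over experts, uses mean routing probability as the relational proxy, and uses the gradient of the score with respect to a per-expert gate as the intrinsic proxy. It also measures a scalar score under ablation, whereas the abstract reports effects on interaction structure.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the expert axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def routing_mass(router_logits):
    """Relational proxy: mean routing probability per expert."""
    return softmax(router_logits).mean(axis=0)

def gate_gradient_attribution(router_logits, expert_scores):
    """Intrinsic proxy (toy): |d score / d gate_k| evaluated at gate = 1.

    Assumed model: score(gate) = sum_k gate_k * p_k * s_k,
    so the gradient with respect to gate_k is p_k * s_k.
    """
    probs = softmax(router_logits)
    return np.abs(probs * expert_scores).mean(axis=0)

def ablation_drop(router_logits, expert_scores, k):
    """Mean score lost when expert k's gate is set to zero."""
    probs = softmax(router_logits)
    baseline = (probs * expert_scores).sum(axis=1).mean()
    probs[:, k] = 0.0  # mask expert k without renormalising
    masked = (probs * expert_scores).sum(axis=1).mean()
    return baseline - masked

# Hypothetical data: expert 0 is routed most often but contributes little;
# expert 2 is routed rarely but contributes the most per unit of routing.
router_logits = np.tile(np.array([2.0, 0.0, 0.0]), (4, 1))
expert_scores = np.tile(np.array([0.1, 0.1, 1.0]), (4, 1))

mass = routing_mass(router_logits)
attribution = gate_gradient_attribution(router_logits, expert_scores)
drops = [ablation_drop(router_logits, expert_scores, k) for k in range(3)]

print("most routed expert:", int(mass.argmax()))
print("highest attribution:", int(attribution.argmax()))
print("ablation drops:", np.round(drops, 4))
```

In this toy setup the most frequently routed expert and the expert with the highest attribution differ, and masking the latter costs more score than masking the former. Because the assumed model is linear in the gates, the ablation drop coincides with the gate gradient here; in a real orchestrator the two would have to be measured separately.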

Problem

Research questions and friction points this paper is trying to address.

multi-expert orchestration

causal attribution

emergent structure

interpretability

expert routing

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-expert orchestration

causal attribution

interpretability