🤖 AI Summary
This work addresses an overlooked challenge in multi-module large language model (LLM) agents: directly repairing a diagnosed bottleneck module can degrade overall system performance. The authors introduce the “Diagnostic Paradox” and the “Linguistic Contract” hypothesis, under which each downstream module implicitly co-adapts to the errors of its upstream module. Using causal analysis, they identify the routing module as the primary bottleneck and show experimentally that patching an upstream query-rewriting module improves outcomes, whereas injecting correction examples into the bottleneck itself degrades them. They also propose a co-adaptation measure, derived from diagnosis alone, that is associated with the risk of module-level patching. The effect is statistically significant on two LLM agent families and directionally consistent on a third, which argues for a more cautious approach to maintaining complex LLM pipelines.
📝 Abstract
When a multi-module LLM agent fails, the module most responsible for the failure is not necessarily the best place to intervene. We demonstrate this Diagnostic Paradox empirically: causal analysis consistently identifies the routing module, which chooses the next tool to call, as the primary bottleneck across three independent agent families. Yet injecting prompt-level correction examples into this module consistently degrades performance, sometimes severely, whereas patching an upstream query-rewriting module reliably improves outcomes. The effect holds with statistical significance on two agent families and with directional consistency on a third. Alternative repair strategies at the routing module (instruction rewriting, model upgrade) are neutral, indicating that the harm is specific to correction-injection patching.
We explain this asymmetry through the Linguistic Contract hypothesis: each downstream module implicitly adapts to the characteristic error distribution of its upstream module, so correcting the bottleneck breaks this implicit alignment in a way that upstream corrections do not. We operationalize this via a per-agent co-adaptation measure, derived from diagnosis alone, and show that it is associated with patching harm: higher co-adaptation co-occurs with harm, lower co-adaptation with safety. This trend holds across all three agent families, providing preliminary support for the hypothesis beyond a single-agent observation.