An Information Theoretic Perspective on Agentic System Design

📅 2025-12-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Compressor-predictor design in LM-based agent systems lacks theoretical foundations, and compressor quality remains difficult to evaluate in a task-agnostic manner. Method: This paper models agent architectures as information transmission processes and introduces a mutual-information-based framework for quantifying compressor quality, enabling performance prediction and system optimization without downstream-task evaluation. Contribution/Results: The framework reveals that scaling compressors yields better cost-performance trade-offs than scaling predictors. Experiments across the Qwen, Llama, and Phi model families show that a 7B compressor is 1.6× more accurate, 4.6× more concise, and 5.5× denser in information per token than a 1.5B baseline; a 3B local compressor recovers 99% of frontier-model accuracy while cutting API costs by 74%. This work establishes the first task-agnostic, information-theoretic design principle for LM agent systems.

📝 Abstract
Agentic language model (LM) systems power modern applications like "Deep Research" and "Claude Code," and leverage multi-LM architectures to overcome context limitations. Beneath their apparent diversity lies a recurring pattern: smaller "compressor" LMs (that can even run locally) distill raw context into compact text that is then consumed by larger "predictor" LMs. Despite their popularity, the design of compressor-predictor systems remains largely ad hoc, with little guidance on how compressor and predictor choices shape downstream performance. In practice, attributing gains to compression versus prediction requires costly, task-specific pairwise sweeps. We argue that these agentic system design questions are, at root, information-theoretic. Viewing the compressor LM as a noisy channel, we introduce a simple estimator of mutual information between the context and its compression to quantify compression quality in a task-independent way. We show that mutual information strongly predicts downstream performance, independent of any specific task. Through an information-theoretic framework, we perform a comprehensive empirical analysis across five datasets and three model families. Results reveal that larger compressors not only are more accurate, but also more token-efficient, conveying more bits of information per token. A 7B Qwen-2.5 compressor, for instance, is 1.6× more accurate, 4.6× more concise, and conveys 5.5× more bits of mutual information per token than its 1.5B sibling. Across datasets, scaling compressors is substantially more effective than scaling predictors, enabling larger on-device compressors to pair with smaller cloud predictors. Applied to a Deep Research system, these principles enable local compressors as small as 3B parameters to recover 99% of frontier-LM accuracy at 26% of API costs.
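The mutual-information quantity at the heart of the framework can be illustrated with a minimal plug-in estimator over discrete symbols. This is a hypothetical sketch of the underlying quantity only; the paper's actual estimator operates on LM token distributions over text, and the function and example symbols below are not from the paper.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Plug-in estimate of I(X; Z) in bits from observed (x, z) samples.

    Intuition for the compressor-as-channel view: a compression z that
    uniquely identifies its context x maximizes I(X; Z), while a summary
    that ignores the context drives it to zero.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    pz = Counter(z for _, z in pairs)
    mi = 0.0
    for (x, z), c in joint.items():
        # p(x,z) * log2( p(x,z) / (p(x) p(z)) ), with counts normalized by n
        mi += (c / n) * math.log2((c * n) / (px[x] * pz[z]))
    return mi

# Perfectly informative compression: each context maps to a distinct summary.
print(mutual_information_bits([("ctx0", "s0"), ("ctx1", "s1")] * 8))  # 1.0
# Uninformative compression: the same summary regardless of context.
print(mutual_information_bits([("ctx0", "s"), ("ctx1", "s")] * 8))    # 0.0
```

Dividing such an estimate by the number of summary tokens gives a per-token information density, the quantity behind the paper's "bits per token" comparisons across compressor sizes.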
Problem

Research questions and friction points this paper is trying to address.

The design of compressor-predictor LM systems lacks theoretical guidance for performance optimization.
Attributing performance gains to compression vs. prediction requires costly, task-specific evaluations.
Choosing the right compressor and predictor for a downstream task remains ad hoc.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using mutual information to measure compression quality
Larger compressors are more accurate and token-efficient
Scaling compressors is more effective than scaling predictors
Shizhe He
Department of Computer Science, Stanford University
Avanika Narayan
Department of Computer Science, Stanford University
Ishan S. Khare
Department of Computer Science, Stanford University
Scott W. Linderman
Department of Statistics, Stanford University
Christopher Ré
Department of Computer Science, Stanford University
machine learning, artificial intelligence, machine learning systems, data management, AI systems
Dan Biderman
Stanford University
Machine Learning, Theoretical Neuroscience, Cognitive Science