To Call or Not to Call: Diagnosing Intrinsic Over-Calling Bias in LLM Agents

📅 2026-05-16

🤖 AI Summary
This work addresses the tendency of large language model agents to call external tools even when none is needed, a bias that lowers overall decision accuracy. The authors propose the Intrinsic Bias Hypothesis (IBH), which recasts over-calling as a quantifiable mechanistic object: an activation-independent offset toward calling. Using sparse autoencoders (SAEs) to extract behavior-aligned features, they construct a signed activation margin to measure the bias and introduce Adaptive Margin-Calibrated Steering (AMCS), a closed-form causal intervention along SAE decoder directions. Experiments across six models from three families on the When2Call benchmark show that AMCS mitigates over-calling and improves overall accuracy with a negligible drop in call accuracy.
📝 Abstract
LLM agents exhibit a consistent tendency to over-call, invoking tools even in situations where none is needed. On the When2Call benchmark, six models from three families show high call accuracy but much lower no-call accuracy, leaving overall accuracy in the 55%-70% range. We trace this to an Intrinsic Bias Hypothesis (IBH): the call/no-call decision mapping carries an activation-independent call offset, so the model favors call even at activation parity. Using Sparse Autoencoders (SAEs), we recover behavior-aligned feature bases for the call/no_call decision, reduce them to a signed activation margin, and estimate the offset directly. Across all six models, the model is decision-neutral only when no_call activation outweighs call activation, consistent with IBH. We then causally test IBH with Adaptive Margin-Calibrated Steering (AMCS), a closed-form counter-bias shift along SAE decoder directions. Cancelling the diagnosed offset mitigates over-calling and improves overall accuracy with a negligible drop in call accuracy. Our work recasts over-calling from an empirical phenomenon into a mechanistic object amenable to causal correction. Code is available at https://github.com/SKURA502/agent-sae/.
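The offset idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the authors' repository: the logistic decision model, the function names (`signed_margin`, `fit_offset`, `neutral_margin`), the choice of four hypothetical SAE features, and the synthetic data. The sketch builds a signed margin from call and no_call feature activations, fits P(call) = sigmoid(w * margin + b), and reads off the margin at which the decision is neutral. Under IBH one would expect b > 0, so the neutral margin is negative, meaning no_call activation has to outweigh call activation. The last two lines cancel the offset in margin space only. The paper's AMCS operates on model activations along SAE decoder directions, which this sketch does not reproduce.

```python
import numpy as np

def signed_margin(acts, call_idx, no_call_idx):
    """Summed call-feature activation minus summed no_call-feature activation."""
    acts = np.asarray(acts, dtype=float)
    return acts[:, call_idx].sum(axis=1) - acts[:, no_call_idx].sum(axis=1)

def fit_offset(margin, called, steps=3000, lr=0.5):
    """Fit P(call) = sigmoid(w * margin + b) by gradient descent.

    b plays the role of an activation-independent call offset
    (an illustrative assumption, not the paper's estimator).
    """
    margin = np.asarray(margin, dtype=float)
    called = np.asarray(called, dtype=float)
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * margin + b)))
        err = p - called
        w -= lr * np.mean(err * margin)
        b -= lr * np.mean(err)
    return w, b

def neutral_margin(w, b):
    """Margin at which the fitted model is indifferent between call and no_call."""
    return -b / w

# Synthetic data: 4 hypothetical SAE features, a built-in positive call offset.
rng = np.random.default_rng(0)
acts = rng.exponential(1.0, size=(5000, 4))
m = signed_margin(acts, call_idx=[0, 1], no_call_idx=[2, 3])
true_w, true_b = 2.0, 1.0  # assumed ground truth for the toy data
called = rng.random(5000) < 1.0 / (1.0 + np.exp(-(true_w * m + true_b)))

w, b = fit_offset(m, called)
m_star = neutral_margin(w, b)
print(f"fitted w={w:.2f}, b={b:.2f}, neutral margin={m_star:.2f}")

# Margin-space counter-bias: drop the fitted offset and compare call rates.
rate_before = np.mean(w * m + b > 0)
rate_after = np.mean(w * m > 0)
print(f"call rate before={rate_before:.2f}, after={rate_after:.2f}")
```

In this toy setting the margin distribution is symmetric around zero, so removing a positive offset moves the call rate back toward one half. On a real model the margin would come from SAE feature activations on benchmark prompts, and the correction would have to be applied to the model's activations rather than to a fitted logit.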
Problem

Research questions and friction points this paper is trying to address.

over-calling
LLM agents
tool usage
decision bias
intrinsic bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intrinsic Bias Hypothesis
Sparse Autoencoders
Over-calling
Adaptive Margin-Calibrated Steering
Mechanistic Interpretability