Same Signal, Opposite Meaning: Direction-Informed Adaptive Learning for LLM Agents

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing adaptive test-time compute methods for large language model agents rely on fixed-direction gating signals, which do not reliably indicate whether additional computation helps across environments and backbone models; a gate with the wrong direction can even degrade performance. The paper shows that the utility direction of a gating signal can reverse across settings and attributes the reversal to the difference between compute need and compute suitability. It introduces Direction-Informed Adaptive Learning (DIAL), a sparse gate trained from signal-agnostic counterfactual exploration that learns the utility direction of state features for each (environment, backbone model) pair. Across six environments and three backbone models, DIAL achieves a stronger overall trade-off between task success and computational cost than fixed-direction baselines.

📝 Abstract

Adaptive test-time compute for LLM agents aims to invoke extra computation only when it improves performance. Existing methods typically use confidence-, uncertainty-, or difficulty-based gates, assuming a fixed direction from the gating signal through compute need to the value of computation. This makes gating a utility-calibration problem: gating signals should align with whether extra computation improves the final outcome over the base policy. We show that this alignment is unstable: the same signal predicts rollout benefit in one setting and rollout harm in another, with reversals across environments and backbones even when the task is fixed. Wrong-direction gates can therefore worsen performance by precisely selecting harmful states. This reversal reflects a deeper distinction between compute need and compute suitability: a high uncertainty signal may indicate decision-difficult states where rollouts help compare alternatives, or intervention-unsuitable states where the current context does not support useful rollout-based improvement. Under this two-source model, fixed-direction gates are unreliable across heterogeneous settings. To address this, we propose DIAL (Direction-Informed Adaptive Learning), a sparse gate trained from signal-agnostic counterfactual exploration to learn the utility direction of state features per (environment, backbone). Across six environments and three backbones, DIAL yields a stronger overall success-cost trade-off than fixed-direction baselines.
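
The direction reversal described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it replaces DIAL's sparse gate over state features with the sign of a covariance on a single scalar signal, and it uses synthetic data in place of real rollouts. The setting names (env_a, backbone_x, backbone_y), the 0.5 threshold, and all function names are hypothetical. Utility here means the outcome with the extra rollout minus the outcome of the base policy, sampled independently of the signal.

```python
import random
from statistics import mean


def estimate_direction(samples):
    """Return +1 if a higher signal predicts rollout benefit, -1 if it
    predicts rollout harm, and 0 if there is no linear relationship.

    samples: list of (signal, utility) pairs, where utility is the
    outcome with extra compute minus the outcome of the base policy.
    """
    mean_signal = mean(s for s, _ in samples)
    mean_utility = mean(u for _, u in samples)
    cov = sum((s - mean_signal) * (u - mean_utility) for s, u in samples)
    return (cov > 0) - (cov < 0)


def make_gate(direction, threshold=0.5):
    """Build a gate that invokes extra compute on the side of the
    threshold where the learned direction says it helps."""
    def gate(signal):
        return direction * (signal - threshold) > 0
    return gate


def simulate_setting(true_direction, n, rng):
    """Synthetic stand-in for counterfactual exploration: every state gets
    a rollout regardless of its signal, so utility is observed everywhere."""
    samples = []
    for _ in range(n):
        signal = rng.random()  # e.g. an uncertainty score in [0, 1]
        utility = true_direction * (signal - 0.5) + rng.gauss(0, 0.1)
        samples.append((signal, utility))
    return samples


rng = random.Random(0)

# Hypothetical settings: same environment, two backbones, opposite directions.
true_directions = {
    ("env_a", "backbone_x"): +1,  # high signal -> rollouts help
    ("env_a", "backbone_y"): -1,  # high signal -> rollouts hurt
}

learned = {}
for setting, direction in true_directions.items():
    data = simulate_setting(direction, 500, rng)
    learned[setting] = estimate_direction(data)

fixed_gate = make_gate(+1)  # fixed-direction baseline: "high signal means compute"
adaptive_gate = make_gate(learned[("env_a", "backbone_y")])

# On a high-signal state in the reversed setting, the fixed gate spends
# compute exactly where it is harmful; the learned gate does not.
print(fixed_gate(0.9), adaptive_gate(0.9))
```

In this toy setup the fixed-direction gate fires on the high-signal state in both settings, while the gate built from the learned direction fires there only in the setting where high signal predicts benefit. Sampling utility independently of the signal is what lets the direction be estimated without assuming it in advance.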

Problem

Research questions and friction points this paper is trying to address.

adaptive test-time compute

gating signal

utility direction

compute suitability

LLM agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive test-time compute

direction-informed gating

counterfactual exploration