Grounded Test-Time Adaptation for LLM Agents

📅 2025-11-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
LLMs exhibit poor generalization when deployed in novel web environments or interfaces because of two distinct distribution shifts between pretraining and test-time settings: syntactic (e.g., observation-format mismatches) and semantic (e.g., discrepancies in state-transition dynamics). To address this, we propose a test-time adaptation framework with two complementary components: (1) lightweight online distribution alignment via an adaptive output-format vector that mitigates syntactic bias, and (2) persona-driven exploration that builds a nonparametric world model, explicitly decoupling and learning the environment's semantics. The method integrates environment-aware prompt biasing, causal dynamics modeling, and real-time optimization guided by deployment feedback. Evaluated across multiple benchmarks, including WebArena, our approach boosts success rates on the multi-site split from 2% to 23% with negligible computational overhead. Results demonstrate strong effectiveness, broad applicability across diverse UI environments, and practical deployability.
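The first component can be illustrated with a minimal sketch. The paper describes a lightweight adaptation vector that biases the model's output distribution toward the environment's response format; the class below is a hypothetical implementation (all names and the online cross-entropy update are assumptions, not the authors' code) in which a learnable bias over the vocabulary is added to the frozen model's logits and updated from each observed, environment-accepted token.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class AdaptationVector:
    """Hypothetical sketch of online distributional adaptation: a bias
    vector added to the frozen LLM's output logits, updated online so the
    output distribution drifts toward tokens the environment accepts."""

    def __init__(self, vocab_size, lr=0.5):
        self.bias = np.zeros(vocab_size)  # the lightweight adaptation vector
        self.lr = lr

    def adapt_logits(self, logits):
        # Bias the frozen model's logits; the base model is untouched.
        return logits + self.bias

    def update(self, logits, observed_token):
        # One online gradient step on cross-entropy w.r.t. the bias:
        # grad = softmax(logits + bias) - onehot(observed_token)
        p = softmax(self.adapt_logits(logits))
        grad = p.copy()
        grad[observed_token] -= 1.0
        self.bias -= self.lr * grad
```

Because only a vocabulary-sized vector is trained, each update is a single softmax and subtraction, consistent with the "negligible computational overhead" claim.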

📝 Abstract
Large language model (LLM)-based agents struggle to generalize to novel and complex environments, such as unseen websites or new sets of functions, due to a fundamental mismatch between their pre-training and test-time conditions. This challenge stems from two distinct failure modes: a syntactic misunderstanding of environment-specific components like observation formats, and a semantic misunderstanding of state-transition dynamics, which are only revealed at test time. To address these issues, we propose two distinct and complementary strategies for adapting LLM agents by leveraging environment-specific information available during deployment. First, an online distributional adaptation method parameterizes environmental nuances by learning a lightweight adaptation vector that biases the model's output distribution, enabling rapid alignment with an environment response format. Second, a deployment-time dynamics grounding method employs a persona-driven exploration phase to systematically probe and learn the environment's causal dynamics before task execution, equipping the agent with a nonparametric world model. We evaluate these strategies across diverse agentic benchmarks, including function calling and web navigation. Our empirical results show the effectiveness of both strategies across all benchmarks with minimal computational cost. We find that dynamics grounding is particularly effective in complex environments where unpredictable dynamics pose a major obstacle, demonstrating a robust path toward more generalizable and capable LLM-based agents. For example, on the WebArena multi-site split, this method increases the agent's success rate from 2% to 23%.
Problem

Research questions and friction points this paper is trying to address.

LLM agents struggle with generalization to novel test-time environments
Agents misunderstand both syntactic formats and semantic state transitions
Methods address environment adaptation through distributional and dynamics learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online adaptation vector aligns output with environment format
Persona-driven exploration learns environment causal dynamics
Lightweight methods improve generalization with minimal computational cost