FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

📅 2025-10-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language model (LLM)-driven web agents face challenges in long-webpage interaction, including context saturation, high computational overhead, and susceptibility to prompt injection attacks. To address these issues, this paper proposes FocusAgent—a task-oriented, lightweight pruning method leveraging an LLM-based retriever. Its core innovation lies in semantic-guided extraction of action-relevant nodes from the accessibility tree (AxTree), enabling context compression that preserves task-critical information while ensuring semantic consistency. Experiments on WorkArena and WebArena demonstrate that FocusAgent reduces input token count by over 50% on average, maintains baseline performance, accelerates inference significantly, and decreases prompt injection success rate by 62.3%. By jointly optimizing efficiency, security, and generalizability, FocusAgent provides a lightweight and robust solution for long-context web interaction.

📝 Abstract
Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens of thousands of tokens. This saturates context limits and increases computational cost; moreover, processing full pages exposes agents to security risks such as prompt injection. Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. We introduce FocusAgent, a simple yet effective approach that leverages a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations, guided by task goals. By pruning noisy and irrelevant content, FocusAgent enables efficient reasoning while reducing vulnerability to injection attacks. Experiments on the WorkArena and WebArena benchmarks show that FocusAgent matches the performance of strong baselines while reducing observation size by over 50%. Furthermore, a variant of FocusAgent significantly reduces the success rate of prompt-injection attacks, including banner and pop-up attacks, while maintaining task success in attack-free settings. Our results highlight that targeted LLM-based retrieval is a practical and robust strategy for building web agents that are efficient, effective, and secure.
Problem

Research questions and friction points this paper is trying to address.

Web agents face excessive context from lengthy web pages
Existing pruning methods lose relevant or keep irrelevant content
Large observations increase computational costs and security risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses lightweight LLM retriever for content extraction
Prunes irrelevant AxTree observations using task goals
Reduces observation size by over 50% while maintaining benchmark performance
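The pruning idea above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: `prune_axtree` and `is_relevant` are invented names, and in practice the relevance judgment would be made by a lightweight LLM retriever prompted with the task goal, rather than the keyword stub used here.

```python
# Hypothetical sketch of FocusAgent-style AxTree pruning (illustrative names,
# not the paper's actual API). A retriever scores each AxTree line for
# relevance to the task goal; relevant lines are kept along with their
# indentation-based ancestors so the tree structure stays readable.

def prune_axtree(axtree: str, goal: str, is_relevant) -> str:
    """Keep lines the retriever marks relevant, plus their ancestors."""
    lines = axtree.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if is_relevant(line, goal):
            keep.add(i)
            # Walk upward, keeping each less-indented ancestor line.
            depth = len(line) - len(line.lstrip())
            for j in range(i - 1, -1, -1):
                d = len(lines[j]) - len(lines[j].lstrip())
                if d < depth:
                    keep.add(j)
                    depth = d
                if depth == 0:
                    break
    return "\n".join(lines[i] for i in sorted(keep))


# Keyword-overlap stub standing in for the LLM retriever.
def is_relevant(line: str, goal: str) -> bool:
    return any(word in line.lower() for word in goal.lower().split())


axtree = (
    "RootWebArea 'Orders'\n"
    "  button 'Menu'\n"
    "  textbox 'Search orders'\n"
    "  link 'Logout'"
)
print(prune_axtree(axtree, "search orders", is_relevant))
```

With the stub retriever and the goal "search orders", only the root node and the search textbox survive; the unrelated menu button and logout link are pruned, shrinking the observation while preserving what the action model needs.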
Imene Kerboua
LIRIS - CNRS, INSA Lyon, Université Claude Bernard Lyon 1
Sahar Omidi Shayegan
ServiceNow Research
Megh Thakkar
MILA - Quebec AI Institute
Natural Language Processing · Deep Learning
Xing Han Lù
PhD Student at McGill University; Mila
Natural Language Processing · Machine Learning
Léo Boisvert
PhD student. Polytechnique Montréal, MILA
LLM-based Agents · AI Agent Security
Massimo Caccia
ServiceNow Research
Jérémy Espinas
Esker
Alexandre Aussem
LIRIS - CNRS, INSA Lyon, Université Claude Bernard Lyon 1
Véronique Eglin
LIRIS - CNRS, INSA Lyon, Université Claude Bernard Lyon 1
Alexandre Lacoste
Staff Research Scientist, ServiceNow Research
Machine Learning