FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents

Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, Xing Han Lù, Léo Boisvert, Massimo Caccia, Jérémy Espinas, Alexandre Aussem, Véronique Eglin, Alexandre Lacoste

Abstract

Web agents powered by large language models (LLMs) must process lengthy web page observations to complete user goals; these pages often exceed tens of thousands of tokens. This saturates context limits and increases computational cost processing; moreover, processing full pages exposes agents to security risks such as prompt injection.

Existing pruning strategies either discard relevant content or retain irrelevant context, leading to suboptimal action prediction. We introduce FocusAgent, a simple yet effective approach that leverages a lightweight LLM retriever to extract the most relevant lines from accessibility tree (AxTree) observations, guided by task goals.

By pruning noisy and irrelevant content, FocusAgent enables efficient reasoning while reducing vulnerability to injection attacks. Experiments on WorkArena and WebArena benchmarks show that FocusAgent matches the performance of strong baselines, while reducing observation size by over 50%. Furthermore, a variant of FocusAgent significantly reduces the success rate of prompt-injection attacks, including banner and pop-up attacks, while maintaining task success performance in attack-free settings.

Key Innovations

Context Trimming

Intelligent pruning of web page observations using lightweight LLM retrievers, reducing context size by over 50% while maintaining performance.

Security Enhancement

Significant reduction in vulnerability to prompt injection attacks, including banner and pop-up attacks, while preserving task performance.

Efficiency Gains

Reduced computational costs and faster processing by focusing on relevant content extracted from accessibility tree observations.

Goal-Guided Retrieval

Task-aware content extraction that maintains relevance while eliminating noise and irrelevant context from web page observations.

Technical Approach

Lightweight LLM Retriever

FocusAgent employs a lightweight language model as a retriever to identify and extract the most relevant lines from accessibility tree observations. This approach ensures that critical information is preserved while eliminating redundant or irrelevant content.

  • Task-goal guided selection
  • Accessibility tree processing
  • Relevance-based filtering
  • Computational efficiency

Observation Pruning Strategy

Unlike existing approaches that either discard relevant content or retain irrelevant context, FocusAgent's pruning strategy is designed to maintain optimal action prediction capabilities while significantly reducing observation size.

  • Smart content selection
  • Noise reduction
  • Context size optimization
  • Performance preservation

Security-Aware Design

By removing potentially malicious content and focusing on task-relevant information, FocusAgent reduces exposure to prompt injection attacks while maintaining high performance on legitimate tasks.

  • Injection attack mitigation
  • Banner attack protection
  • Pop-up attack defense
  • Maintained task success

Experimental Results

50%+

Context Reduction

Significant reduction in observation size while maintaining performance

2

Benchmark Evaluation

Validated on WorkArena and WebArena benchmarks

Performance Metrics

  • Efficiency: Over 50% reduction in observation size
  • Accuracy: Matches performance of strong baselines
  • Security: Significant reduction in attack success rates
  • Robustness: Maintained performance in attack-free settings

Evaluation Benchmarks

  • WorkArena: Enterprise-focused web agent tasks
  • WebArena: Realistic web-based scenarios
  • Attack Scenarios: Banner and pop-up injection attacks
  • Baseline Comparison: State-of-the-art web agents

Impact and Applications

Practical Deployment

FocusAgent provides a practical solution for deploying web agents in real-world scenarios where computational efficiency and security are critical.

Cost Reduction

Significant reduction in computational costs through efficient context processing, making web agents more accessible and scalable.

Security Benefits

Enhanced security against prompt injection attacks, enabling safer deployment of web agents in enterprise and production environments.

General Applicability

The approach is simple yet effective, making it easily adaptable to various web agent architectures and deployment scenarios.

BibTeX

@article{kerboua2025focusagent,
  title={FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents},
  author={Kerboua, Imene and Shayegan, Sahar Omidi and Thakkar, Megh and L{\\`u}, Xing Han and Boisvert, L{\\'e}o and Caccia, Massimo and Espinas, J{\\'e}r{\\'e}my and Aussem, Alexandre and Gasse, Maxime and Chapados, Nicolas and others},
  journal={arXiv preprint arXiv:2510.03204},
  year={2025}
}