AgentLab Research Hub

A comprehensive framework for developing and evaluating web agents

About

AgentLab is a framework for developing and evaluating web agents on the benchmarks supported by BrowserGym. It provides building blocks for creating web agents, unified LLM APIs, and reproducibility features for rigorous research.

The framework supports large-scale parallel agent experiments using Ray, includes various agent architectures, and maintains a unified leaderboard across multiple benchmarks including WebArena, WorkArena, VisualWebArena, AssistantBench, and more.
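
As a rough illustration of the parallel-experiment pattern described above, the sketch below fans a set of (agent, task, seed) episodes out to a worker pool and aggregates a success rate per agent. It uses Python's standard-library `concurrent.futures` as a stand-in for Ray so that it runs without extra dependencies. `Episode`, `run_episode`, and `run_study` are hypothetical names chosen for illustration and are not part of AgentLab's API; the rollout itself is replaced by a seeded coin flip.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Episode:
    """One unit of work: a single agent attempting a single task with a fixed seed."""
    agent: str
    task: str
    seed: int


def run_episode(ep: Episode) -> tuple[Episode, bool]:
    # Placeholder for a real agent rollout (assumption): success is a seeded
    # coin flip, so rerunning the same episode always gives the same outcome.
    rng = random.Random(f"{ep.agent}|{ep.task}|{ep.seed}")
    return ep, rng.random() < 0.5


def run_study(episodes: list[Episode], n_jobs: int) -> dict[str, float]:
    """Run episodes on n_jobs workers and return the success rate per agent."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        outcomes = list(pool.map(run_episode, episodes))
    wins, totals = defaultdict(int), defaultdict(int)
    for ep, success in outcomes:
        totals[ep.agent] += 1
        wins[ep.agent] += int(success)
    return {agent: wins[agent] / totals[agent] for agent in totals}


# 2 agents x 2 tasks x 5 seeds = 20 independent episodes.
episodes = [
    Episode(agent, task, seed)
    for agent in ("agent-a", "agent-b")
    for task in ("click-button", "fill-form")
    for seed in range(5)
]
rates = run_study(episodes, n_jobs=4)
```

Because every episode carries its own seed, the aggregated rates do not depend on how many workers are used or on the order in which episodes finish, which is the property that makes large parallel runs reproducible.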

Research Projects

BrowserGym Ecosystem

A unified environment for web agent research across multiple benchmarks. Provides standardized interfaces and evaluation metrics.

WorkArena Benchmark

Enterprise-focused web agent benchmark with realistic workplace tasks. Multiple difficulty levels (L1, L2, L3) with high seed diversity.
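
To illustrate what seed diversity can mean in practice, the sketch below derives different concrete instances of one task template from different seeds while keeping each instance reproducible. The value pools, the goal wording, and the `make_task_instance` function are all hypothetical and do not reflect how WorkArena actually generates its tasks.

```python
import random

# Hypothetical pools of values for a single task template.
DEPARTMENTS = ["Finance", "HR", "IT", "Sales"]
PRIORITIES = ["low", "medium", "high"]


def make_task_instance(seed: int) -> dict[str, str]:
    """Build one concrete task instance; the same seed always yields the same instance."""
    rng = random.Random(seed)
    department = rng.choice(DEPARTMENTS)
    priority = rng.choice(PRIORITIES)
    return {
        "goal": f"Create a {priority}-priority incident for the {department} department",
        "department": department,
        "priority": priority,
    }


# Twenty seeds produce twenty instances drawn from the template's value pools.
instances = [make_task_instance(seed) for seed in range(20)]
distinct_goals = {inst["goal"] for inst in instances}
```

Evaluating an agent over many seeds of the same template helps separate genuine task competence from memorizing one fixed instance.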

WorkArena++

Advanced benchmark with 682 compositional planning and reasoning tasks for evaluating autonomous agents in enterprise workflows.

FocusAgent

An approach to trimming the context given to web agents, intended to improve both efficiency and security.

Supported Benchmarks

Benchmark         Tasks    Max Steps   Multi-tab   Status
WebArena          812      30                      Available
WorkArena L1-L3   33-341   30-50                   Available
VisualWebArena    910      30                      Available
AssistantBench    214      30                      Available
MiniWoB           125      10                      Available
OSWorld           369      Variable                Available

Quick Start

Installation

pip install agentlab
playwright install

Basic Usage

from agentlab.agents.generic_agent import AGENT_4o_MINI
from agentlab.experiments.study import make_study

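# Build a study that pairs one agent configuration with one benchmark.
study = make_study(
    benchmark="miniwob",
    agent_args=[AGENT_4o_MINI],
    comment="My first study",
)

# Launch the experiments with 5 parallel jobs.
study.run(n_jobs=5)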

Interactive Assistant

agentlab-assistant --start_url https://www.google.com