Metadata-Version: 2.4
Name: agent-as-annotators
Version: 0.1.0
Summary: Agent-as-Annotators: Structured Distillation of Web Agent Capabilities
Author-email: Xing Han Lu <your.email@example.com>
License: TBD
Project-URL: Homepage, https://github.com/McGill-NLP/agent-as-annotators
Project-URL: Repository, https://github.com/McGill-NLP/agent-as-annotators
Project-URL: Bug Tracker, https://github.com/McGill-NLP/agent-as-annotators/issues
Keywords: llm,annotation,machine-learning,nlp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: accelerate>=1.11.0
Requires-Dist: agentlab
Requires-Dist: bitsandbytes>=0.48.2
Requires-Dist: browsergym-meta
Requires-Dist: browsergym-workarena==0.5.3
Requires-Dist: flash-attn>=2.8.3
Requires-Dist: google-auth>=2.43.0
Requires-Dist: langchain>=0.3.27
Requires-Dist: peft>=0.18.1
Requires-Dist: qwen-vl-utils>=0.0.14
Requires-Dist: transformers==4.57.*
Requires-Dist: trl>=0.25.0
Requires-Dist: vllm>=0.11.0
Requires-Dist: gradio<6,>=5
Requires-Dist: playwright-stealth>=2.0.2
Provides-Extra: dev
Requires-Dist: black>=21.0; extra == "dev"
Provides-Extra: docs

<div align="center">

# Agent-as-Annotators (A3)

| [**💾 Code**](https://github.com/McGill-NLP/agent-as-annotators) | [**📄 Paper**](https://arxiv.org/abs/2604.07776) | [**🌐 Website**](https://agent-as-annotators.github.io) |
| :--: | :--: | :--: |
| [**🤗 Dataset**](https://huggingface.co/datasets/McGill-NLP/A3-Synth) | [**🤖 Model**](https://huggingface.co/McGill-NLP/A3-Qwen3.5-9B) | [**📦 PyPI**](https://pypi.org/project/agent-as-annotators/) |

[**Structured Distillation of Web Agent Capabilities Enables Generalization**](https://arxiv.org/abs/2604.07776)

*Xing Han Lù, Siva Reddy*

</div>

This repository contains the code for the A3 framework, which uses LLMs to systematically generate synthetic web agent training data by decomposing the annotation process into three roles: **Task Designer**, **Annotator**, and **Supervisor**.

## Installation

```bash
pip install agent-as-annotators
```

Or install from source:

```bash
git clone https://github.com/McGill-NLP/agent-as-annotators.git
cd agent-as-annotators
pip install -e .
```

## Quick Start: Evaluation

### 1. Serve a model with vLLM

```bash
vllm serve --config configs/vllm/Qwen3.5-9B.yaml
```

### 2. Run evaluation

```bash
a3-eval --benchmark webarena_test --model A3-qwen3.5-9b
```

## Pipeline: Generating A3-Synth

The A3 pipeline generates synthetic training data in 5 steps:

### Step 1: Create personas
```bash
python scripts/create_personas.py
```

### Step 2: Generate task intents (via exploration)
```bash
a3-explore
python scripts/generate_task_intents.py
```

### Step 3: Create A3-Synth task configs
```bash
python scripts/create_synth_configs.py
```

### Step 4: Collect trajectories
```bash
a3-synth --benchmark a3_synth --model gemini-3-pro
```

### Step 5: Convert to training data
```bash
python scripts/convert_trajectories_to_json.py
python scripts/generate_rft_data.py
```

## Training

```bash
a3-train --config configs/train/qwen3.5-9b.json
```

Training uses SFT with FSDP for multi-GPU parallelism. See `configs/train/` for hyperparameters and `configs/accelerate/` for FSDP configuration.

## CLI Commands

| Command | Description |
|---------|-------------|
| `a3-eval` | Run evaluation on WebArena, VisualWebArena, WorkArena, MiniWoB |
| `a3-synth` | Run trajectory collection for A3-Synth |
| `a3-explore` | Run environment exploration |
| `a3-train` | Fine-tune a model with SFT |
| `a3-screen-utils` | Screen session management utilities |

## Project Structure

```
agent-as-annotators/
  agent_as_annotators/       # Core package
    cli/                     # CLI entry points (eval, synth, explore, train)
    modeling.py              # Agent model wrapper (vLLM, Gemini, OpenAI)
    prompts/                 # All prompt templates
    judge/                   # Inverted evaluation protocol (Judge module)
    benchmarks/a3_synth/     # A3-Synth benchmark registration
    exploration/             # Exploration task registration
    utils/                   # Utilities
    configs/a3_synth/        # A3-Synth task configurations
  configs/
    model_configs.json       # Model registry
    train/                   # Training hyperparameters
    vllm/                    # vLLM serving configs
    accelerate/              # FSDP configs
  scripts/                   # Data pipeline scripts
```
