Running Benchmarks#
This guide explains how to run benchmarks using the argscape benchmark command-line tool.
Prerequisites#
Before running benchmarks, ensure:
ARGscape is installed with spatial dependencies:
pip install argscape[spatial]
The ARGscape server is running:
argscape serve --no-browser
For visualization benchmarks, Playwright browsers are installed:
pip install playwright
playwright install chromium
Basic Usage#
The simplest way to run all benchmarks:
argscape benchmark run
This will:
Check server availability at http://localhost:8000
Look for datasets in the default location
Run all inference methods on all datasets
Run visualization benchmarks
Write results to ./results/
The run Subcommand#
Command Syntax#
argscape benchmark run [OPTIONS]
Options#
| Option | Short | Description | Default |
|---|---|---|---|
| --output | | Output directory for results | ./results/ |
| --datasets | | Dataset directory | Generated datasets |
| --methods | | Comma-separated list of methods | All methods |
| --inference-only | | Only run inference benchmarks | False |
| --visualization-only | | Only run visualization benchmarks | False |
| --repetitions | | Number of repetitions per benchmark | 3 |
| --timeout | | Timeout per inference run (seconds) | 600 |
| --server-url | | ARGscape server URL | http://localhost:8000 |
| --show-browser | | Show browser during visualization tests | False (headless) |
| --resume | | Resume from previous run, skip successful benchmarks | False |
| | | Suppress progress output | False |
| --debug | | Enable debug output | False |
Examples#
Run only inference benchmarks:
argscape benchmark run --inference-only
Run specific methods:
argscape benchmark run --methods fastgaia,midpoint,tsdate
Use custom dataset directory:
argscape benchmark run --datasets /path/to/my/datasets
Run with more repetitions for stability:
argscape benchmark run --repetitions 5
Connect to a different server:
argscape benchmark run --server-url http://localhost:9000
Resume a failed benchmark run:
argscape benchmark run --resume
The --resume flag reads previous results and skips benchmarks that completed successfully. This is useful for long-running benchmark suites that may be interrupted.
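The skip logic described above can be pictured as a filter over previously recorded results. The following is a minimal sketch of that idea, not ARGscape's implementation: the record keys `method`, `dataset`, and `error` are hypothetical names chosen for illustration, and the sample records are made up.

```python
from typing import Iterable


def pending_jobs(
    jobs: Iterable[tuple[str, str]],
    previous: Iterable[dict],
) -> list[tuple[str, str]]:
    """Return the (method, dataset) jobs that still need to run.

    `previous` holds records from an earlier run. The keys "method",
    "dataset", and "error" are hypothetical field names for this sketch.
    """
    # A job counts as done only if it was recorded without an error.
    done = {
        (rec["method"], rec["dataset"])
        for rec in previous
        if not rec.get("error")
    }
    return [job for job in jobs if job not in done]


jobs = [
    ("fastgaia", "samples_50"),
    ("fastgaia", "samples_100"),
    ("midpoint", "samples_50"),
]
previous = [
    {"method": "fastgaia", "dataset": "samples_50", "error": ""},
    {"method": "fastgaia", "dataset": "samples_100", "error": "timeout"},
]
print(pending_jobs(jobs, previous))
# [('fastgaia', 'samples_100'), ('midpoint', 'samples_50')]
```

Jobs that failed earlier stay in the list, so only the successful ones are skipped.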
Debug mode with visible browser:
argscape benchmark run --visualization-only --show-browser --debug
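When benchmark runs are launched from a script, it can help to assemble the command line programmatically. The helper below is a hypothetical convenience wrapper, not part of ARGscape; it uses only flags that appear in the examples above and prints the resulting command rather than executing it.

```python
import shlex


def benchmark_command(
    output=None,
    methods=None,
    repetitions=None,
    timeout=None,
    server_url=None,
    resume=False,
):
    """Build the argument list for `argscape benchmark run`."""
    cmd = ["argscape", "benchmark", "run"]
    if output is not None:
        cmd += ["--output", str(output)]
    if methods:
        # --methods takes a single comma-separated value.
        cmd += ["--methods", ",".join(methods)]
    if repetitions is not None:
        cmd += ["--repetitions", str(repetitions)]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if server_url is not None:
        cmd += ["--server-url", server_url]
    if resume:
        cmd.append("--resume")
    return cmd


cmd = benchmark_command(
    output="my_results",
    methods=["fastgaia", "midpoint"],
    repetitions=5,
)
print(shlex.join(cmd))
# argscape benchmark run --output my_results --methods fastgaia,midpoint --repetitions 5
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.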
Workflow#
Step 1: Generate Datasets (Optional)#
If you do not have datasets, generate them first:
argscape benchmark generate-datasets
See Benchmark Datasets for details on dataset generation.
Step 2: Start the Server#
In a separate terminal:
argscape serve --no-browser
The benchmark tool will verify the server is running before proceeding.
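To confirm the server is reachable before starting a long run, a quick probe from Python is enough. This sketch uses only the standard library and checks that something answers HTTP at the given URL; it assumes nothing about ARGscape's endpoints, and the check the benchmark tool performs internally may differ.

```python
import urllib.error
import urllib.request


def server_available(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url`, whatever the status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return False


if __name__ == "__main__":
    print(server_available("http://localhost:8000"))
```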
Step 3: Run Benchmarks#
argscape benchmark run --output my_results
Progress is displayed as benchmarks run:
Checking server at http://localhost:8000...
Server is available
Clearing session...
Found 7 datasets
=== Running Inference Benchmarks ===
Using server at http://localhost:8000
[1/56] fastgaia on samples_50 (API)...
[2/56] fastgaia on samples_100 (API)...
...
=== Running Visualization Benchmarks ===
Using server at http://localhost:8000
Running visualization benchmarks on 7 datasets...
Results written to my_results
Step 4: Generate Reports#
After benchmarks complete, generate publication-ready reports:
argscape benchmark report --results my_results
This creates:
LaTeX tables in my_results/tables/
Matplotlib plots in my_results/figures/
Output Directory Structure#
After a complete benchmark run, the output directory contains:
results/
    inference/
        raw_metrics.csv             # All inference measurements
        outputs/                    # Inference output files (if using subprocess mode)
    visualization/
        raw_metrics.csv             # All visualization measurements
    tables/
        inference_comparison.tex    # LaTeX table
    figures/
        scaling_plots.png           # PNG plots
        scaling_plots.pdf           # PDF plots for papers
    summary.json                    # Complete JSON with system info
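For custom analysis, the raw files can be loaded with the standard library alone. The sketch below relies only on the file locations listed above and makes no assumption about CSV column names or the JSON schema; `load_results` is a hypothetical helper, not an ARGscape function.

```python
import csv
import json
from pathlib import Path


def load_results(results_dir):
    """Load raw metrics and the summary from a benchmark output directory.

    Missing files are returned as an empty list (CSV) or None (JSON).
    """
    root = Path(results_dir)

    def read_csv(path):
        if not path.exists():
            return []
        with path.open(newline="") as fh:
            # One dict per row, keyed by whatever header the file has.
            return list(csv.DictReader(fh))

    summary_path = root / "summary.json"
    summary = json.loads(summary_path.read_text()) if summary_path.exists() else None
    return {
        "inference": read_csv(root / "inference" / "raw_metrics.csv"),
        "visualization": read_csv(root / "visualization" / "raw_metrics.csv"),
        "summary": summary,
    }


results = load_results("my_results")
print(len(results["inference"]), "inference rows")
```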
Timing and Repetitions#
Each benchmark is run multiple times (default: 3) to improve measurement stability. The result with the median wall time is reported.
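Selecting the median repetition can be sketched as follows. The field names `rep` and `wall_time_s` and the timings are made up for illustration, and the tie-breaking rule for an even number of repetitions is an assumption of this sketch rather than documented ARGscape behavior.

```python
import statistics


def median_run(runs):
    """Return the run whose wall time is the median of all repetitions.

    With an even number of runs there is no single middle value, so this
    sketch picks the lower of the two middle runs (median_low).
    """
    target = statistics.median_low(r["wall_time_s"] for r in runs)
    return next(r for r in runs if r["wall_time_s"] == target)


# Illustrative timings for three repetitions of one benchmark.
runs = [
    {"rep": 1, "wall_time_s": 12.4},
    {"rep": 2, "wall_time_s": 10.9},
    {"rep": 3, "wall_time_s": 11.3},
]
print(median_run(runs))
# {'rep': 3, 'wall_time_s': 11.3}
```

Reporting the median run rather than the mean keeps a single slow outlier, such as a repetition disturbed by another process, from skewing the result.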
For accurate timing:
Close other applications to reduce CPU contention
Use a machine with consistent performance (avoid laptops in power-saving mode)
Consider longer timeouts for large datasets
Memory Measurement#
Memory usage is measured server-side when running via the API. The peak_memory_mb metric reflects the maximum memory consumed during the inference process.
Timeout Handling#
If an inference method exceeds the timeout (default: 10 minutes):
The benchmark is terminated
The result is recorded with an error message
The benchmark suite continues with the next method/dataset
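The terminate, record, and continue pattern listed above can be illustrated with local subprocesses. This is a stand-in sketch only: ARGscape runs inference through its server API, so its actual mechanism may differ, and the job names and result keys here are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(jobs, timeout_s):
    """Run each command in turn, recording timeouts instead of aborting."""
    results = []
    for name, cmd in jobs:
        try:
            subprocess.run(cmd, check=True, timeout=timeout_s)
            results.append({"job": name, "error": None})
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising TimeoutExpired.
            results.append({"job": name, "error": f"timed out after {timeout_s} s"})
        except subprocess.CalledProcessError as exc:
            results.append({"job": name, "error": f"exit code {exc.returncode}"})
    return results


# Stand-in jobs: one sleeps past the limit, one finishes immediately.
jobs = [
    ("slow", [sys.executable, "-c", "import time; time.sleep(5)"]),
    ("fast", [sys.executable, "-c", "pass"]),
]
for result in run_with_timeout(jobs, timeout_s=2):
    print(result)
```

The slow job is recorded with an error message and the loop moves on to the fast job, which mirrors how a single timeout does not stop the rest of the suite.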
Increase the timeout for large datasets:
argscape benchmark run --timeout 1800 # 30 minutes
Troubleshooting#
Server Not Available#
Error: ARGscape server not available at http://localhost:8000.
Start the server first with: argscape serve --no-browser
Ensure the server is running and accessible at the specified URL.
No Datasets Found#
Error: No datasets found. Run 'argscape benchmark generate-datasets' first.
Generate datasets or specify a directory containing .trees files.
Playwright Not Installed#
RuntimeError: Playwright is required for visualization benchmarks.
Install with: pip install playwright && playwright install
Install Playwright and browser drivers:
pip install playwright
playwright install chromium
Timeout Errors#
If methods consistently time out, consider:
Using smaller datasets
Increasing the timeout value
Excluding slow methods by listing only the faster ones with --methods