Metadata-Version: 2.4
Name: localbench
Version: 0.0.1
Summary: A benchmarking tool for local LLMs
Author-email: Ramon Perez <ramon@menlo.ai>, Minh Nguyen <minh@menlo.ai>
License-Expression: MIT
Keywords: benchmark,cortex,llm,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.12
Requires-Dist: aiohttp>=3.11.12
Requires-Dist: gpustat>=1.1.1
Requires-Dist: psutil>=6.1.1
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: rich>=13.9.4
Requires-Dist: typer>=0.15.1
Requires-Dist: websockets>=14.2
Description-Content-Type: text/markdown

# localbench

<p align="center">
  <img width="1280" alt="localbench Banner" src="./images/robo-bench1.jpg">
</p>

A benchmarking tool for local LLMs. Currently works with [Cortex.cpp](https://github.com/janhq/cortex.cpp).

## What is this?

`localbench` measures performance metrics, resource utilization, and stability characteristics of your local LLM deployments, from first model load through long-running workloads.

## Features

- Model initialization metrics (`--type init`)
- Runtime performance (`--type runtime`)
- Resource utilization (CPU, memory, and GPU, via `psutil` and `gpustat`)
- Advanced processing scenarios (`--type advanced`)
- Workload-specific benchmarks (`--type workload`)
- System integration metrics
- Stability analysis (`--type stability`)

## Installation

Using `uv`:
```bash
uv tool install localbench
```

Using pip:
```bash
pip install localbench
```
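
Either way, a quick way to verify the install is to print the CLI help; since `localbench` is built on Typer, `--help` is generated automatically:
```bash
localbench --help
```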

## Usage

### Basic Benchmarking
```bash
# Standard benchmark
localbench benchmark "llama3.2:3b-gguf-q2-k"

# With detailed metrics
localbench benchmark "llama3.2:3b-gguf-q2-k" --verbose
```
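
Since `localbench` is an ordinary CLI, it also scripts cleanly. A minimal sketch for benchmarking several models back to back (the model names are just examples):
```bash
# Run the standard benchmark for each model in turn.
for model in "llama3.2:3b-gguf-q2-k" "tinyllama:1b-gguf-q4"; do
    localbench benchmark "$model" --verbose
done
```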

### Specific Benchmarks
```bash
# Initialization only
localbench benchmark "llama3.2:3b-gguf-q2-k" --type init

# Runtime metrics
localbench benchmark "llama3.2:3b-gguf-q2-k" --type runtime

# Long-running stability test
localbench benchmark "llama3.2:3b-gguf-q2-k" --type stability --stability-duration 24
```

### Advanced Usage
```bash
# Custom benchmark prompts
localbench benchmark "llama3.2:3b-gguf-q2-k" --type workload --prompts my_prompts.json

# Multi-model benchmarking
localbench benchmark "llama3.2:3b-gguf-q2-k" --type advanced \
    --secondary-models "tinyllama:1b-gguf-q4" "phi2:3b-gguf-q4"

# Export results
localbench benchmark "llama3.2:3b-gguf-q2-k" --json results.json
```
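
The format of the prompts file isn't documented here, so treat the following purely as an illustrative sketch; a JSON array of prompt strings is one plausible shape, but check the source for the actual schema:
```bash
# Hypothetical prompts file -- the schema below is an assumption, not a documented format.
cat > my_prompts.json <<'EOF'
[
  "Summarize the plot of Hamlet in two sentences.",
  "Write a Python function that reverses a string."
]
EOF
```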
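
Exported results are plain JSON, so any JSON tooling works for a quick look. For example, pretty-printing with the Python standard library (this makes no assumptions about the field names `localbench` emits):
```bash
# Pretty-print the exported results for inspection.
python -m json.tool results.json
```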

## Status

Under active development. Support for additional frameworks is planned.

## Roadmap

- Framework-agnostic benchmarking
- Additional performance metrics
- Enhanced visualizations
- Extended stability testing

## Contributing

Issues and pull requests are welcome. Please have a look at the existing ones first, though.
