Metadata-Version: 2.4
Name: llm-aggregator
Version: 0.1.3
Summary: A model aggregator service for multiple LLM backends.
Project-URL: Homepage, https://github.com/Wuodan/llm-aggregator
Project-URL: Repository, https://github.com/Wuodan/llm-aggregator
Project-URL: Issues, https://github.com/Wuodan/llm-aggregator/issues
License: Apache-2.0
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: aiohttp
Requires-Dist: apscheduler
Requires-Dist: extract2md
Requires-Dist: fastapi
Requires-Dist: httpx
Requires-Dist: psutil
Requires-Dist: pydantic
Requires-Dist: pydantic-settings
Requires-Dist: pyyaml
Requires-Dist: uvicorn
Description-Content-Type: text/markdown

# LLM Aggregator

LLM Aggregator keeps a live list of every model exposed by your local OpenAI-compatible servers.

## Web Interface

The UI is a single table plus a small RAM widget, so you immediately see what is running:

<!-- pyml disable line-length -->

| Model       | Base URL                     | Types     | Family    | Context | Quant    | Params | Summary                        |
|-------------|------------------------------|-----------|-----------|---------|----------|--------|--------------------------------|
| llama3.1:8b | `http://10.7.2.100:11434/v1` | llm       | Llama 3.1 | 8K      | Q4\_K\_M | 8B     | General chat tuned for balance |
| qwen2.5:14b | `http://10.7.2.100:8080/v1`  | llm,embed | Qwen 2.5  | 32K     | Q5\_0    | 14B    | Multilingual reasoning focused |

<!-- pyml enable line-length -->

Columns:

- `Model` – identifier reported by the provider.
- `Base URL` – where the model is served.
- `Types` – capabilities (LLM, VLM, embedder, etc.).
- `Family` – base architecture inferred by the brain (enrichment) LLM.
- `Context` – approximate context window in tokens.
- `Quant` – quantization level inferred from the model name or documentation.
- `Params` – estimated parameter count.
- `Summary` – one-line description generated by the brain LLM.
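If you work with the catalog from your own scripts, one way to model a row client-side is a small dataclass with one field per column. This is a minimal sketch only. The class name `CatalogRow`, its field names, and the `from_cells` helper are hypothetical and do not describe the service's REST API schema. It also assumes the `Types` cell is comma-separated, as in the table above.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRow:
    """Hypothetical client-side view of one catalog row (not the service's API schema)."""

    model: str
    base_url: str
    types: list[str] = field(default_factory=list)
    family: str = ""
    context: str = ""  # kept as displayed, e.g. "8K"
    quant: str = ""
    params: str = ""
    summary: str = ""

    @classmethod
    def from_cells(cls, cells: list[str]) -> "CatalogRow":
        """Build a row from the eight table cells, splitting the comma-separated Types cell."""
        model, base_url, types, family, context, quant, params, summary = cells
        return cls(
            model=model,
            base_url=base_url,
            types=[t.strip() for t in types.split(",") if t.strip()],
            family=family,
            context=context,
            quant=quant,
            params=params,
            summary=summary,
        )


row = CatalogRow.from_cells(
    ["qwen2.5:14b", "http://10.7.2.100:8080/v1", "llm,embed", "Qwen 2.5",
     "32K", "Q5_0", "14B", "Multilingual reasoning focused"]
)
print(row.types)  # ['llm', 'embed']
```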

## Features

- **Multi-Provider Discovery**: Automatically collects the models exposed by every configured LLM server, for example several servers running on different ports
- **AI-Powered Enrichment**: Uses a configurable "brain" LLM to enrich model metadata with details like model family,
  context size, quantization, and capabilities
- **Web Catalog Interface**: Clean web UI for browsing your model collection
- **Real-time Statistics**: Monitors system resources like RAM usage
- **REST API**: Programmatic access to model data and statistics
- **Background Processing**: Continuous model discovery and enrichment without blocking the UI
- **OpenAI-Compatible**: Works with any LLM server that implements the OpenAI-style `/v1/models` endpoint
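Discovery relies on the OpenAI-style `GET /v1/models` endpoint, which returns a JSON object whose `data` array holds one entry per model, each with an `id`. The sketch below shows that request using only the standard library. It is an illustration, not the aggregator's own code; the helper names `fetch_model_ids` and `parse_model_ids` are hypothetical, and it assumes each provider base URL already ends in `/v1`, as in the table above.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """GET <base_url>/models; base_url is assumed to end in /v1."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    # Offline example using a response shaped like the OpenAI list-models format.
    sample = {"object": "list", "data": [{"id": "llama3.1:8b", "object": "model"}]}
    print(parse_model_ids(sample))  # ['llama3.1:8b']
```

Against a live server you would call, for example, `fetch_model_ids("http://10.7.2.100:11434/v1")`.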

## Installation

### Prerequisites

- Python 3.10 or higher
- One or more running LLM servers (Ollama, llama.cpp, nexa, etc.) with OpenAI-compatible APIs

### Install from PyPI

```bash
pip install llm-aggregator
```

## Configuration

All runtime behavior is controlled through the YAML file pointed to by the `LLM_AGGREGATOR_CONFIG` environment variable.
Use [config.yaml](config.yaml) as a reference template.

### Configuration Options

- **host / port** – Address and port to which the FastAPI server and static frontend bind.
- **log_level** – Logging verbosity (`DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`). Defaults to `INFO` if omitted.
- **log_format** – Optional `logging` format string. When omitted, the service leaves the existing logging configuration
  untouched.
- **logger_overrides** – Map from logger name to the logging level that logger should use (e.g., `httpx: WARNING`).
- **brain** – Settings for the enrichment LLM:
  - `base_url` – HTTP endpoint of the enrichment provider.
  - `id` – Model identifier passed to the provider.
  - `api_key` – Optional bearer token injected into requests.
  - `max_batch_size` – Number of models to enrich at once (defaults to 1).
- **time** – Background scheduling knobs (all in seconds):
  - `fetch_models_interval`
  - `fetch_models_timeout`
  - `enrich_models_timeout`
  - `enrich_idle_sleep`
- **providers** – Each entry describes an OpenAI-compatible backend to query:
  - `base_url` – Public URL returned via the REST API.
  - `internal_base_url` – Optional internal URL used for server-to-server calls; defaults to `base_url` when omitted.
- **model_info_sources** – Ordered list of external websites from which Markdown context is fetched for enrichment prompts.
  Each entry requires a human-readable `name` (shown to the LLM) and a `url_template` containing a `{model_id}` placeholder.
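To make the provider, source, and batching options concrete, here is a minimal sketch of how the documented keys could be interpreted, written against a plain Python dict instead of the YAML file. The values, the example source URL, and the helpers `internal_url`, `source_urls`, and `batches` are hypothetical. The `{model_id}` substitution is assumed to behave like `str.format`, which may differ from the service's actual implementation.

```python
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical values mirroring the documented keys; the real service reads
# them from the YAML file named by LLM_AGGREGATOR_CONFIG.
config = {
    "brain": {"base_url": "http://10.7.2.100:8080/v1", "id": "qwen2.5:14b"},
    "providers": [
        {"base_url": "http://10.7.2.100:11434/v1"},
        {
            "base_url": "http://llm.example.com/v1",
            "internal_base_url": "http://10.7.2.100:8080/v1",
        },
    ],
    "model_info_sources": [
        {"name": "Example docs", "url_template": "https://models.example.com/{model_id}"},
    ],
}


def internal_url(provider: dict) -> str:
    """URL for server-to-server calls; falls back to base_url when internal_base_url is omitted."""
    return provider.get("internal_base_url") or provider["base_url"]


def source_urls(model_id: str, sources: list[dict]) -> list[tuple[str, str]]:
    """Expand each url_template in order (assumes str.format-style {model_id} substitution)."""
    return [(s["name"], s["url_template"].format(model_id=model_id)) for s in sources]


def batches(model_ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield groups of at most `size` model IDs, mirroring max_batch_size."""
    it = iter(model_ids)
    while chunk := list(islice(it, size)):
        yield chunk


batch_size = config["brain"].get("max_batch_size", 1)  # documented default is 1
print([internal_url(p) for p in config["providers"]])
print(source_urls("llama3.1:8b", config["model_info_sources"]))
print(list(batches(["llama3.1:8b", "qwen2.5:14b"], batch_size)))
```

With the default batch size of 1, each model would be enriched on its own; raising `max_batch_size` trades fewer brain requests for larger prompts.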

## Usage

Set the `LLM_AGGREGATOR_CONFIG` environment variable to the path of your [config.yaml](config.yaml); the service loads
it on startup.

### Starting the Service

```bash
export LLM_AGGREGATOR_CONFIG=/path/to/config.yaml
llm-aggregator
```

Alternatively, run the module directly:

```bash
export LLM_AGGREGATOR_CONFIG=/path/to/config.yaml
python -m llm_aggregator
```

By default, the web interface is available at `http://localhost:8888`; change `host` and `port` in the config to bind elsewhere.
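When scripting around the service (for example, in a container health check), a simple readiness probe is to poll the web interface until it answers. This is a minimal standard-library sketch. The function names `is_up` and `wait_until_up` and their defaults are hypothetical, and the probe only confirms that something answers HTTP at the URL, not that model discovery or enrichment has finished.

```python
import time
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if any HTTP response (even an error status) comes back from url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a 2xx status
    except OSError:
        return False  # connection refused, DNS failure, timeout, ...


def wait_until_up(url: str = "http://localhost:8888", deadline_s: float = 30.0,
                  poll_s: float = 1.0) -> bool:
    """Poll url until it responds or deadline_s elapses."""
    end = time.monotonic() + deadline_s
    while True:
        if is_up(url):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    print("ready" if wait_until_up() else "not reachable")
```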
