Metadata-Version: 2.4
Name: fitmyllm
Version: 0.3.110
Summary: Find the best local AI model for your GPU — terminal UI
Project-URL: Homepage, https://www.fitmyllm.com
Project-URL: Documentation, https://www.fitmyllm.com/?tab=mcp
Author: Davide Zingaro
License-Expression: AGPL-3.0-or-later
Keywords: gpu,llm,local-ai,model-recommendation,ollama,tui,vram
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: System :: Hardware
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Requires-Dist: textual>=3.0
Description-Content-Type: text/markdown

# fitmyllm

Run the right LLM locally. Automatically.

## Install

```bash
pip install fitmyllm
```

Or run without installing:

```bash
pipx run fitmyllm
```

## Setup

Get your free API key at [fitmyllm.com/?tab=mcp](https://www.fitmyllm.com/?tab=mcp), then:

```bash
fitmyllm setup
# Paste your API key (starts with fml_)
```

Or set it as an environment variable:

```bash
export FITMYLLM_API_KEY=fml_your_key_here
```
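
For scripts that need the same key, lookup could be sketched as below. This is a minimal illustration, not the CLI's actual logic: the `api_key` field name inside `config.json` and the precedence of the environment variable over the config file are both assumptions.

```python
import json
import os
from pathlib import Path

# Assumption: the key lives under "api_key" in config.json (hypothetical field name).
CONFIG_PATH = Path.home() / ".fitmyllm" / "config.json"


def resolve_api_key(env=None, config_path=CONFIG_PATH):
    """Return the key from the environment, else from config.json, else None."""
    env = os.environ if env is None else env
    key = env.get("FITMYLLM_API_KEY", "").strip()
    config_path = Path(config_path)
    if not key and config_path.is_file():
        try:
            data = json.loads(config_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            data = {}
        if isinstance(data, dict):
            key = str(data.get("api_key", "")).strip()
    # Keys are documented to start with "fml_"; reject anything else early.
    return key if key.startswith("fml_") else None
```

Calling `resolve_api_key()` with no arguments reads the real environment and the default config path.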

## Run

```bash
fitmyllm                      # Interactive TUI (9 modes)
fitmyllm chat <model>         # Chat directly with a model
fitmyllm benchmark            # Run a speed benchmark
fitmyllm my-benchmarks        # View your submitted benchmarks
fitmyllm telemetry on|off     # Toggle anonymous speed telemetry
```

## Features

### Main screens

| Screen | Description |
|--------|-------------|
| **Quick Run** | Zero-config: detect GPU → recommend best model → download GGUF → start server → chat. No decisions needed |
| **Find Models** | Auto-detect GPU, 18+ filters (use case, context, size, family, quant, speed, KV cache, capabilities, 14 benchmark minimums, 19 sort options including per-benchmark ranking), multi-GPU support |
| **Find GPU** | GPU recommendations for any model with budget, speed, vendor, and quant filters |
| **Enterprise** | 10-tab deployment analysis: overview, risk, checklist, TCO, scaling, SLA, GPU matrix, performance, fine-tuning, architecture |
| **Model Library** | Browse all installed models from every backend (Ollama, llama-server, local GGUF). Chat with a model, delete it, or check disk usage |
| **Tier List** | Models and GPUs ranked S-F with cloud GPU alternatives |
| **Benchmarks** | Leaderboard sortable by 8 benchmark metrics |
| **GPU Prices** | Search and compare GPU pricing with vendor filter |
| **Run Benchmark** | Select from installed/recommended models, backend-agnostic speed test with community comparison |

### Live Speed Metrics

Chat shows real-time tok/s during streaming and a summary after each response:

```
42.3 tok/s · 210ms TTFT · 156 tokens
```
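
A summary line of this shape can be derived from three measurements: token count, time to first token, and total elapsed time. The sketch below is illustrative only. It assumes throughput is measured over the generation phase (after the first token arrives) and that each streamed chunk counts as one token; the CLI may measure either differently.

```python
import time


def format_speed_summary(tokens, ttft_s, total_s):
    """Format a line like '42.3 tok/s · 210ms TTFT · 156 tokens'."""
    # Assumption: tok/s excludes the wait for the first token.
    gen_s = max(total_s - ttft_s, 1e-9)
    tok_s = tokens / gen_s
    return f"{tok_s:.1f} tok/s · {ttft_s * 1000:.0f}ms TTFT · {tokens} tokens"


def measure_stream(chunks, clock=time.perf_counter):
    """Consume a stream and return (chunk_count, ttft_seconds, total_seconds)."""
    start = clock()
    first = None
    count = 0
    for _ in chunks:
        if first is None:
            first = clock()  # timestamp of the first chunk
        count += 1
    end = clock()
    ttft = (first - start) if first is not None else 0.0
    return count, ttft, end - start
```

The `clock` parameter exists so the timing logic can be exercised with a fake clock instead of real delays.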

### Community Speed Telemetry

When opted in (`fitmyllm telemetry on`), the CLI collects anonymous speed metrics (tok/s, TTFT) in the background during chat sessions and uploads them to improve speed predictions. No message content is ever sent.

Community speed data feeds back into the CLI and the [web UI](https://www.fitmyllm.com):

- **Find Models** detail panel: `Community 42 tok/s (12 reports)` alongside predicted speed
- **Model Detail**: per-quant breakdown with median, range, and report count
- **Benchmark results**: your speed vs community median comparison
- **Web model pages**: community speed section on [fitmyllm.com](https://www.fitmyllm.com) model detail pages
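
The per-quant breakdown (median, range, report count) amounts to a simple aggregation. The following sketch shows one way to compute it from `(quant, tok_s)` pairs; the input shape and function names are hypothetical and do not reflect the service's actual data model.

```python
from collections import defaultdict
from statistics import median


def summarize_reports(reports):
    """Group (quant, tok_s) reports by quant: median, min, max, and count."""
    by_quant = defaultdict(list)
    for quant, tok_s in reports:
        by_quant[quant].append(tok_s)
    return {
        quant: {
            "median": median(speeds),
            "min": min(speeds),
            "max": max(speeds),
            "reports": len(speeds),
        }
        for quant, speeds in by_quant.items()
    }


def community_label(stats):
    """Render one quant's stats in the 'Community N tok/s (M reports)' style."""
    return f"Community {stats['median']:.0f} tok/s ({stats['reports']} reports)"
```

The median is used rather than the mean so that a few unusually fast or slow reports do not skew the displayed figure.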

### Available from within screens

| Feature | Access | Description |
|---------|--------|-------------|
| **Compare** | `Space` to mark, `c` to compare | Side-by-side comparison of up to 4 models with all metrics |
| **Install** | `i` on any model | Choose quantization, pick engine (8 supported), or download GGUF from HuggingFace with progress bar |
| **Chat** | `c` from Model Library | Talk to models via any backend with real-time streaming and collapsible thinking blocks |
| **Charts** | `v` from Find Models | ASCII score/speed/VRAM bars and quality-vs-speed scatter plot |
| **Command Simulator** | `t` from model detail | Interactive parameter tuning for 8 engines (context, batch size, KV quant, GPU layers) |
| **Export** | `e` from Find Models | Export results as Markdown |

## Multi-Backend Support

The CLI auto-detects running inference backends and works with any of them:

| Backend | Port | Notes |
|---------|------|-------|
| **Ollama** | 11434 | Full support: pull, run, chat, model listing |
| **llama-server** | 8080 | llama.cpp HTTP server — auto-started or manual |
| **OpenAI-compatible** | 8080 | vLLM, LM Studio, or any `/v1/chat/completions` server |

Quick Run can auto-start `llama-server` with optimal parameters (GPU layers, context length, batch size) calculated from your hardware.
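
To check which backends are reachable before launching the TUI, a port probe is a reasonable first step. The sketch below uses only the ports from the table above; the function names and probe order are assumptions, and it is not the CLI's detection code. An open port alone cannot tell llama-server apart from another OpenAI-compatible server on 8080, so a real check would likely follow up with an HTTP request.

```python
import socket

# Ports taken from the table above; the probe order is an assumption.
BACKENDS = [
    ("Ollama", 11434),
    ("llama-server", 8080),
]


def port_is_open(port, host="127.0.0.1", timeout=0.3):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_backends(probe=port_is_open):
    """Return the names of backends whose default port answers the probe."""
    return [name for name, port in BACKENDS if probe(port)]
```

Passing a custom `probe` makes the function testable without any server running.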

## GGUF Model Management

Download and manage GGUF models without Ollama:

- **Download** from any HuggingFace repo by quantization level
- **Inventory** tracked in `~/.fitmyllm/models/inventory.json`
- **Storage** in `~/.fitmyllm/models/` (configurable)
- **No extra dependencies**: downloads use httpx, which fitmyllm already requires
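
Downloading by quantization level comes down to picking the matching `.gguf` file in a repo and building its download URL. The sketch below illustrates that step under stated assumptions: the matching rule and function names are hypothetical, and the repo and file names in the usage note are placeholders. The URL follows the Hugging Face `resolve` pattern.

```python
import re


def pick_gguf(filenames, quant):
    """Return the first .gguf file whose name contains the exact quant tag."""
    # Lookarounds keep "Q4_K" from matching "Q4_K_M" and "Q4_XS" from matching "IQ4_XS".
    pattern = re.compile(
        rf"(?<![A-Za-z0-9]){re.escape(quant)}(?![A-Za-z0-9_])", re.IGNORECASE
    )
    for name in sorted(filenames):
        if name.lower().endswith(".gguf") and pattern.search(name):
            return name
    return None


def resolve_url(repo_id, filename, revision="main"):
    """Build a direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"
```

For example, `pick_gguf(["demo-Q8_0.gguf", "demo-Q4_K_M.gguf"], "Q4_K_M")` selects `demo-Q4_K_M.gguf`, which `resolve_url("org/repo", ...)` turns into a URL that an HTTP client can stream to disk.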

## Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `f` | Toggle filter panel |
| `g` | Search/change GPU |
| `Space` | Mark model for comparison |
| `c` | Compare marked models / Chat from library |
| `d` | Delete model (in Model Library) |
| `i` | Install model |
| `m` | Manual input (in Run Benchmark) |
| `t` | Command simulator / Toggle thinking |
| `s` | Save/unsave model |
| `r` | Refresh / Show HuggingFace README |
| `e` | Export results as Markdown |
| `v` | Show ASCII charts |
| `Ctrl+S` | Save current filters as defaults |
| `Ctrl+T` | Toggle thinking blocks in chat |
| `Esc` | Go back |
| `q` | Quit |

## Supported Engines

Ollama, llama-server, vLLM, LM Studio, llama.cpp, KoboldCpp, Jan, Docker Model Runner

## Data Storage

```
~/.fitmyllm/
  config.json     Preferences, API key, saved models, backend preference, telemetry opt-in
  cache/          API response cache (24h TTL, offline fallback)
  models/         Downloaded GGUF files + inventory.json
```
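
The cache policy (24h TTL with offline fallback) can be sketched as follows. This is an illustration of the policy only: it assumes each cache entry is a JSON file and that freshness is judged by file modification time, neither of which is confirmed by the layout above.

```python
import json
import time
from pathlib import Path

TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as listed above


def cache_is_usable(age_seconds, online, ttl=TTL_SECONDS):
    """Fresh entries are always usable; stale ones only when offline."""
    return age_seconds <= ttl or not online


def read_cache(path, now=None, online=True):
    """Return cached JSON if usable under the policy above, else None."""
    path = Path(path)
    if not path.is_file():
        return None
    now = time.time() if now is None else now
    # Assumption: entry age is derived from the file's modification time.
    age = now - path.stat().st_mtime
    if not cache_is_usable(age, online):
        return None
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
```

Keeping the policy in `cache_is_usable` separates the decision from file access, so the rule can be checked without touching disk.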

## Requirements

- Python 3.10+
- API key from [fitmyllm.com](https://www.fitmyllm.com/?tab=mcp)
- Ollama or llama-server (optional — for chat/benchmark features)
