Metadata-Version: 2.4
Name: ollama-spark
Version: 0.1.0
Summary: Terminal toolkit for local Ollama model recommendation, benchmarking, and comparison.
Author-email: Maniraj <your@email.com>
License: MIT
Project-URL: Homepage, https://github.com/maniraj21/ollama-spark
Project-URL: Repository, https://github.com/maniraj21/ollama-spark
Project-URL: Issues, https://github.com/maniraj21/ollama-spark/issues
Keywords: ollama,llm,benchmark,cli,local-ai,model-comparison
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Environment :: Console
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer<1.0,>=0.12
Requires-Dist: rich<14.0,>=13.7
Requires-Dist: pydantic<3.0,>=2.7
Requires-Dist: psutil<7.0,>=5.9
Requires-Dist: httpx<1.0,>=0.27
Requires-Dist: platformdirs<5.0,>=4.2
Requires-Dist: pyyaml<7.0,>=6.0
Provides-Extra: dev
Requires-Dist: pytest<9.0,>=8.0; extra == "dev"
Requires-Dist: pytest-cov<6.0,>=5.0; extra == "dev"
Requires-Dist: ruff<1.0,>=0.5; extra == "dev"
Requires-Dist: mypy<2.0,>=1.10; extra == "dev"
Dynamic: license-file

# ollama-spark

ollama-spark is a terminal-first toolkit for picking, downloading, benchmarking, and comparing Ollama models on your local hardware. The project provides:

- hardware detection (CPU, RAM, GPU)
- a curated model catalog with metadata and task capabilities
- model compatibility recommendations for common use-cases (chat, coding, instruct, vision, etc.)
- an Ollama HTTP client for listing/pulling/generating
- a lightweight benchmark runner (TTFT, TPS, latency) and aggregation
- a CLI (`ollama-spark`) to run everything from your terminal

This repository is organized as a Python package and designed to be released to PyPI as `ollama-spark`.

---

## Goals

- Help users identify which Ollama models are compatible with their local hardware.
- Provide a simple benchmark to measure real-world performance on your machine.
- Make it easy to pull recommended models via the local Ollama daemon and compare model trade-offs.
- Be lightweight, well-documented, and easy to extend.

---

## Table of Contents

- [Install](#install)
- [Quick start](#quick-start)
- [Concepts](#concepts)
- [CLI reference](#cli-reference)
- [How the benchmark works](#how-the-benchmark-works)
- [Project layout](#project-layout)
- [Contributing](#contributing)
- [Development & CI](#development--ci)
- [Roadmap / Next steps](#roadmap--next-steps)
- [Security & privacy notes](#security--privacy-notes)
- [License](#license)

---

## Install

Recommended: create a virtual environment and install from the project root.

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

If you want development dependencies (tests/lint):

```bash
# quote the extras so shells such as zsh do not expand the brackets
pip install -e ".[dev]"
```

Notes:
- The CLI assumes a running Ollama daemon for list/pull/generate operations (default address: `http://127.0.0.1:11434`).
- On macOS with Apple Silicon, GPU detection relies on MPS heuristics; for NVIDIA/AMD GPUs the tool uses `nvidia-smi` / `rocm-smi` / `lspci` where available.
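
To confirm that the daemon is reachable before running the CLI, you can query Ollama's `/api/tags` endpoint, which lists locally available models. The following is a minimal standard-library sketch; the helper names `parse_model_names` and `list_local_models` are illustrative assumptions and are not part of ollama-spark.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "http://127.0.0.1:11434"  # default daemon address noted above


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_local_models(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> list[str]:
    """Return local model names; raises OSError if the daemon is unreachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling `list_local_models()` inside a `try`/`except OSError` block gives a simple reachability check, since `urllib.error.URLError` is a subclass of `OSError`.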

---

## Quick start

Detect hardware:

```bash
# show a friendly hardware summary
ollama-spark detect
```

List models available in your local Ollama daemon:

```bash
ollama-spark list-models
```

Get recommendations for coding tasks:

```bash
ollama-spark recommend --task coding --top-k 5
```

Pull a model (streams download progress from Ollama):

```bash
ollama-spark pull "llama3.1:8b"
```

Run a quick benchmark (TTFT, TPS, latency):

```bash
ollama-spark benchmark "llama3.1:8b" \
  --prompt "Write a short Python function that sorts a list" \
  --runs 2 --warmup 1 --timeout 60
```

Compare models (feature comparison + optional runtime micro-benchmark):

```bash
ollama-spark compare llama3.1:8b qwen2.5:7b --task coding --runtime \
  --prompt "Write a function to compute fibonacci numbers efficiently"
```

---

## Concepts

- Hardware profile: collected via `ollama_spark.hardware` (CPU, RAM, GPUs). This is converted into a canonical `HardwareProfile` used by the recommender.
- Model spec: each model in the bundled `data/models.yaml` contains `min_ram_gb`, `recommended_ram_gb`, `min_vram_gb`, `parameter_billions`, `capabilities` (task scores), and `tags`.
- CompatibilityResult: result of hardware vs model checks (Compatible / Borderline / Incompatible) with reasons and estimated memory needs.
- Benchmark: the runner captures TTFT (time to first token), total latency, TPS (tokens per second), and lightweight resource samples via `psutil`. GPU sampling is best-effort and currently limited.
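
The relationship between a hardware profile, a model spec, and a compatibility result can be sketched as follows. This is an illustration under stated assumptions, not the package's actual implementation: the classes below are simplified stand-ins that reuse the field names from `data/models.yaml`, and the rule that insufficient VRAM yields "Borderline" rather than "Incompatible" is a guess.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(str, Enum):
    COMPATIBLE = "Compatible"
    BORDERLINE = "Borderline"
    INCOMPATIBLE = "Incompatible"


@dataclass
class HardwareProfile:  # simplified stand-in, not the real class
    ram_gb: float
    vram_gb: float = 0.0  # 0 means no dedicated GPU memory detected


@dataclass
class ModelSpec:  # subset of the fields listed for models.yaml
    name: str
    min_ram_gb: float
    recommended_ram_gb: float
    min_vram_gb: float = 0.0


def check_compatibility(hw: HardwareProfile, model: ModelSpec) -> tuple[Verdict, list[str]]:
    """Classify a model against a hardware profile and collect the reasons."""
    reasons: list[str] = []
    if hw.ram_gb < model.min_ram_gb:
        reasons.append(f"needs at least {model.min_ram_gb} GB RAM, found {hw.ram_gb} GB")
        return Verdict.INCOMPATIBLE, reasons
    verdict = Verdict.COMPATIBLE
    if hw.ram_gb < model.recommended_ram_gb:
        reasons.append(f"below recommended {model.recommended_ram_gb} GB RAM")
        verdict = Verdict.BORDERLINE
    if hw.vram_gb < model.min_vram_gb:
        # Assumption: too little VRAM degrades performance instead of blocking the model.
        reasons.append(f"VRAM below {model.min_vram_gb} GB; may fall back to CPU")
        verdict = Verdict.BORDERLINE
    return verdict, reasons
```

For example, a hypothetical spec with `min_ram_gb=8` and `recommended_ram_gb=16` would be Incompatible on a 4 GB machine and Borderline on a 12 GB machine under these rules.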

---

## CLI reference

The package installs a console script `ollama-spark` with the following commands:

- `detect` — detect and display hardware
- `list-models` — list models available to local Ollama
- `recommend` — recommend models for a task using your hardware
- `pull` — pull a model (streams progress)
- `benchmark` — run micro-benchmarks for a model
- `compare` — feature & optional runtime comparison for 2–4 models

Run `ollama-spark --help` or `ollama-spark <command> --help` for details.

Example:

```bash
ollama-spark recommend --task instruct --top-k 5
```

---

## How the benchmark works

- Warmup runs (configurable) are executed first (not recorded).
- Measured runs call Ollama's `generate` streaming endpoint and:
  - record wall-clock time until the first token (TTFT)
  - record total time the request takes
  - sample CPU usage and resident memory periodically using `psutil`
  - estimate tokens generated (tries to use server counts if provided; otherwise naive splitting)
- After all runs the tool computes median and p95 for TTFT and TPS, median latency, error rate, and resource aggregations.
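
The measurement and aggregation steps above can be sketched roughly as follows. This is an illustration, not the code in `benchmark.py`: the function names are hypothetical, events are assumed to be already-decoded dicts from Ollama's streaming `generate` endpoint (whose final event carries `eval_count`), TPS is assumed to be tokens divided by total latency, and p95 uses the nearest-rank method. Error rate and resource aggregation are left out.

```python
import math
import statistics
import time
from typing import Callable, Iterable


def measure_run(events: Iterable[dict], clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume streamed generate events and return TTFT, latency, tokens, and TPS."""
    start = clock()
    ttft = None
    text_parts: list[str] = []
    server_tokens = None
    for event in events:
        piece = event.get("response", "")
        if piece and ttft is None:
            ttft = clock() - start  # time until the first non-empty chunk
        text_parts.append(piece)
        if event.get("done"):
            server_tokens = event.get("eval_count")  # server-side token count, if any
    latency = clock() - start
    # Prefer the server's count; otherwise fall back to naive whitespace splitting.
    tokens = server_tokens if server_tokens is not None else len("".join(text_parts).split())
    tps = tokens / latency if latency > 0 else 0.0
    return {"ttft_s": ttft, "latency_s": latency, "tokens": tokens, "tps": tps}


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def aggregate(runs: list[dict]) -> dict:
    """Summarize measured runs; warmup runs are assumed to be excluded already."""
    ttfts = [r["ttft_s"] for r in runs if r["ttft_s"] is not None]
    tps = [r["tps"] for r in runs]
    return {
        "ttft_median_s": statistics.median(ttfts),
        "ttft_p95_s": p95(ttfts),
        "tps_median": statistics.median(tps),
        "tps_p95": p95(tps),
        "latency_median_s": statistics.median(r["latency_s"] for r in runs),
    }
```

Passing the clock in as a parameter keeps the timing logic testable without a running daemon. `aggregate` assumes at least one run produced a token; with an empty list `statistics.median` raises `StatisticsError`.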

Limitations:
- GPU utilization and VRAM peak require polling vendor tools (`nvidia-smi`, `rocm-smi`) — these are not yet fully implemented in the main aggregated report.
- Token counting is approximate unless the Ollama server includes token counts in streaming events.
- Benchmarks will be affected by other local processes and background CPU/GPU load; run them on as quiet a system as possible for repeatable results.
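
As an illustration of what polling a vendor tool could look like, here is a minimal sketch for NVIDIA GPUs. It relies on `nvidia-smi`'s `--query-gpu` and `--format=csv,noheader,nounits` options; the function names and the returned dict shape are assumptions made for this example and are not part of ollama-spark.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_sample(output: str) -> list[dict]:
    """Parse one CSV line per GPU: '<utilization %>, <memory used in MiB>'."""
    samples = []
    for line in output.strip().splitlines():
        try:
            util, mem = (field.strip() for field in line.split(","))
            samples.append({"util_percent": float(util), "vram_used_mib": float(mem)})
        except ValueError:
            continue  # skip lines with unsupported fields such as "[N/A]"
    return samples


def sample_gpus() -> list[dict]:
    """Return one sample per NVIDIA GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, timeout=5)
    return parse_gpu_sample(result.stdout) if result.returncode == 0 else []
```

Calling `sample_gpus()` at a fixed interval during a run and keeping the maximum `vram_used_mib` would give an approximate VRAM peak.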

---

## Project layout

Key files and directories:

```text
ollama-spark/
├─ ollama_spark/
│  ├─ __init__.py
│  ├─ cli.py
│  ├─ hardware.py
│  ├─ models.py
│  ├─ ollama_client.py
│  ├─ registry.py
│  ├─ recommender.py
│  ├─ benchmark.py
│  └─ data/
│     └─ models.yaml
├─ tests/
└─ pyproject.toml
```

---

## Contributing

Contributions are welcome. You can help in several ways:

- File issues for bugs or feature requests on the repository issue tracker.
- Improve/extend the `data/models.yaml` catalog — accuracy of RAM/VRAM values and task scores improves recommendations dramatically.
- Add tests in `tests/` for:
  - registry parsing and validation
  - recommender ranking behavior (unit tests with several hardware profiles)
  - Ollama client error handling (mock HTTP responses)
- Help implement GPU metrics collection for benchmark aggregation (NVIDIA + ROCm + Apple).
- Improve the streaming parsing to match your version of Ollama (event formats vary).
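
For contributors looking at the streaming parser: Ollama streams newline-delimited JSON objects, so a tolerant reader can skip blank or malformed lines instead of failing the whole run. The sketch below shows one possible shape; `iter_events` is a hypothetical name and not an existing function in `ollama_client.py`.

```python
import json
from typing import Iterable, Iterator, Union


def iter_events(lines: Iterable[Union[bytes, str]]) -> Iterator[dict]:
    """Yield one dict per well-formed JSON line, skipping blanks and malformed lines."""
    for raw in lines:
        text = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw
        text = text.strip()
        if not text:
            continue
        try:
            event = json.loads(text)
        except json.JSONDecodeError:
            continue  # tolerate partial or unexpected lines
        if isinstance(event, dict):
            yield event
```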

Before you create PRs:
1. Fork the repository.
2. Create a feature branch.
3. Add tests for new behavior and ensure `pytest` passes.
4. Open a PR with a clear description and link to any issues.

---

## Development & CI

Recommended dev commands:

```bash
# run tests
pytest

# run linter (if configured)
ruff check .

# run CLI locally (editable install)
python -m ollama_spark.cli detect
```

A GitHub Actions workflow that runs tests and lint on pull requests and pushes to `main` is planned; a typical matrix would be Ubuntu and macOS with Python 3.10–3.12.

---

## Roadmap / Next steps

Planned work, roughly in priority order:

1. README and MIT `LICENSE` file. (README done)
2. Add unit tests for registry parsing and recommender logic. (High priority)
3. Add CI workflow (GitHub Actions) for linting and tests. (High priority)
4. Implement GPU usage & VRAM sampling (NVIDIA / ROCm) in the benchmark runner. (Medium)
5. Improve token counting (integrate tokenizers or use server-provided token counts). (Medium)
6. Persist benchmark results to a small local DB and add `history` CLI. (Lower)
7. Prepare packaging and PyPI release (bump version and add release workflow). (Lower)

---

## Security & privacy notes

- By default, the tool talks only to a local Ollama daemon. It does not upload hardware information to any remote service.
- If you add remote registries or model repositories, handle credentials carefully and always use secure transport (HTTPS).

---

## License

This project is released under the MIT License. See the `LICENSE` file for details.

