Metadata-Version: 2.4
Name: aurestral
Version: 1.0.0
Summary: Local GGUF AI inference library built on llama-cpp-python with hardware auto-tuning
Author: AyaX_CreationZ
License-Expression: MIT
Project-URL: Homepage, https://github.com/AyaX_CreationZ/aurestral
Project-URL: Documentation, https://github.com/AyaX_CreationZ/aurestral#readme
Keywords: llm,gguf,llama-cpp,inference,local-ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: llama-cpp-python>=0.2.90
Requires-Dist: psutil>=5.9.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=5.0.0; extra == "dev"
Dynamic: license-file

# Aurestral

Local GGUF inference for Python, powered by [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). Aurestral discovers models in your project’s `models/` folder, auto-tunes thread counts, context size, and GPU offload for your hardware, and ships with an interactive chatbot CLI.

## Requirements

- Python 3.9+
- A GGUF model file (e.g. from [Hugging Face](https://huggingface.co/models?library=gguf))

## Installation

```bash
pip install aurestral
```

For **NVIDIA GPU** acceleration, install `llama-cpp-python` with CUDA support first, then Aurestral:

```bash
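# Linux (bash): set the build flags inline for this one command
CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir

# Windows (cmd.exe): `set` only works here, not in bash
set CMAKE_ARGS=-DGGML_CUDA=on
set FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --no-cache-dir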

pip install aurestral
```

On **macOS**, the default `llama-cpp-python` wheel typically includes Metal acceleration.

## Project layout

Place GGUF files in a `models/` directory at your project root (or set `AURESTRAL_MODELS_DIR`):

```
my-project/
├── models/
│   └── llama-3.2-3b-instruct.Q4_K_M.gguf
└── main.py
```
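
To check which files Aurestral should see before loading anything, the lookup can be approximated in a few lines. This is a minimal sketch, not Aurestral's actual discovery code: `resolve_models_dir` and `list_gguf_files` are hypothetical helper names, and the assumption is that the environment variable simply replaces `./models` and that only files directly inside the folder are considered.

```python
import os
from pathlib import Path


def resolve_models_dir(project_root="."):
    """Return the models directory, honoring AURESTRAL_MODELS_DIR if set.

    Hypothetical helper: assumes the variable fully replaces ./models.
    """
    override = os.environ.get("AURESTRAL_MODELS_DIR")
    if override:
        return Path(override).expanduser()
    return Path(project_root) / "models"


def list_gguf_files(models_dir):
    """Return sorted names of the .gguf files directly inside models_dir."""
    models_dir = Path(models_dir)
    if not models_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in models_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".gguf"
    )


if __name__ == "__main__":
    print(list_gguf_files(resolve_models_dir()))
```

Running it from `my-project/` should print the GGUF file names in `models/`, or an empty list if the folder is missing.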

## Quick start

### Interactive chatbot

```bash
cd my-project
aurestral
# or explicitly:
aurestral chat -m llama-3.2-3b-instruct.Q4_K_M.gguf
```

Chat commands: `/help`, `/clear`, `/exit`

### Python API

```python
from aurestral import load_model, ChatSession, generate

# One-shot completion
text = generate("Explain quantum entanglement in one sentence.")
print(text)

# Reusable model handle
model = load_model()  # auto-picks sole GGUF, or pass name="my-model"
reply = model.chat([
    {"role": "user", "content": "Hello!"},
])
print(reply)

# Multi-turn session with streaming
session = ChatSession.create(system_prompt="You are a concise coding assistant.")
session.send("Write a Python hello world.", stream=True)
```
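
The comment on `load_model()` says it auto-picks the sole GGUF unless a name is passed. The sketch below shows one way that rule could work. `pick_model` is a hypothetical function, and two behaviors are assumptions rather than documented Aurestral behavior: accepting names with or without the `.gguf` extension, and raising an error when several models exist and no name is given.

```python
def pick_model(available, name=None):
    """Choose one GGUF file name from the list `available`.

    With no name, succeed only if exactly one model exists.
    With a name, accept it with or without the .gguf extension (assumed).
    """
    if name is None:
        if len(available) == 1:
            return available[0]
        if not available:
            raise FileNotFoundError("no GGUF models found")
        raise ValueError(
            f"several models found, pass a name: {sorted(available)}"
        )

    # Normalize the requested name so "my-model" matches "my-model.gguf".
    wanted = name if name.lower().endswith(".gguf") else name + ".gguf"
    for candidate in available:
        if candidate.lower() == wanted.lower():
            return candidate
    raise FileNotFoundError(f"model not found: {name}")
```

Failing loudly in the ambiguous case is a design choice of this sketch: it avoids silently loading a different model than the one intended.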

### Other CLI commands

```bash
aurestral list    # list discovered GGUF models
aurestral info    # show detected hardware info
aurestral run "The capital of France is" --stream    # one-shot completion, streamed
```

## Hardware auto-tuning

On load, Aurestral inspects CPU cores, RAM, and whether `llama-cpp-python` was built with GPU offload support. It sets:

| Setting | Behavior |
|--------|----------|
| `n_threads` | Physical cores minus one |
| `n_ctx` | 1k–8k based on available RAM |
| `n_gpu_layers` | `-1` (all layers) when GPU offload is available |
| `use_mlock` | Enabled on high-RAM CPU-only setups |
| `flash_attn` | Enabled when GPU offload is available |
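
The rules in the table can be sketched as a pure function. This is an illustration only, not Aurestral's tuner: `auto_tune` is a hypothetical name, the RAM thresholds for `n_ctx` and `use_mlock` are made-up values, and falling back to `n_gpu_layers=0` without GPU offload is an assumption (the table only specifies the GPU case).

```python
def auto_tune(physical_cores, available_ram_gb, gpu_offload):
    """Map detected hardware to load-time settings, following the table above.

    The RAM thresholds are illustrative guesses, not Aurestral's values.
    """
    # Leave one physical core free, but never drop below one thread.
    n_threads = max(1, physical_cores - 1)

    # Step the context window from 1k up to 8k as available RAM grows.
    if available_ram_gb >= 16:
        n_ctx = 8192
    elif available_ram_gb >= 8:
        n_ctx = 4096
    elif available_ram_gb >= 4:
        n_ctx = 2048
    else:
        n_ctx = 1024

    return {
        "n_threads": n_threads,
        "n_ctx": n_ctx,
        # -1 offloads all layers; 0 (assumed fallback) keeps them on the CPU.
        "n_gpu_layers": -1 if gpu_offload else 0,
        # Pin memory only on CPU-only machines with plenty of RAM (assumed 32 GB).
        "use_mlock": (not gpu_offload) and available_ram_gb >= 32,
        "flash_attn": gpu_offload,
    }
```

In practice the inputs could come from `psutil.cpu_count(logical=False)` (which may return `None` on some platforms) and `psutil.virtual_memory().available`, with the GPU flag taken from however the installed `llama-cpp-python` build reports offload support.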

To override these defaults, pass an `InferenceConfig`, set `auto_tune=False`, or both:

```python
from aurestral import InferenceConfig, load_model

cfg = InferenceConfig(n_ctx=8192, n_gpu_layers=35)
model = load_model("my-model.gguf", config=cfg, auto_tune=False)
```

## Configuration reference

**Environment**

- `AURESTRAL_MODELS_DIR` — path to models folder (instead of `./models`)

**`InferenceConfig`** — load-time: `n_ctx`, `n_batch`, `n_threads`, `n_gpu_layers`, `use_mmap`, `use_mlock`, `flash_attn`

**`GenerateConfig`** — generation-time: `max_tokens`, `temperature`, `top_p`, `top_k`, `repeat_penalty`, `stop`, `stream`
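
For readers unfamiliar with the sampling fields, the sketch below shows what `temperature`, `top_k`, and `top_p` conventionally do to a list of raw logits. It is a textbook version written for illustration, not the sampler that llama.cpp or Aurestral runs. `sample_token` and its default values are hypothetical, and `repeat_penalty`, `stop`, `max_tokens`, and `stream` are not covered.

```python
import math
import random


def sample_token(logits, temperature=0.8, top_k=40, top_p=0.95, rng=random):
    """Pick a token index from raw logits using temperature, top-k, and top-p."""
    if temperature <= 0:
        # Treat non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature rescales logits: lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(
        ((w / total, i) for i, w in enumerate(weights)), reverse=True
    )

    # top_k keeps only the k most likely tokens (0 disables the filter here).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p keeps the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for prob, index in ranked:
        kept.append((prob, index))
        cumulative += prob
        if cumulative >= top_p:
            break

    # random.choices renormalizes the surviving weights before drawing.
    probs, indices = zip(*kept)
    return rng.choices(indices, weights=probs, k=1)[0]
```

Lower `temperature`, smaller `top_k`, and smaller `top_p` all narrow the set of likely tokens, which makes output more repeatable at the cost of variety.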

## Publishing to PyPI

```bash
pip install build twine
python -m build
twine upload dist/*
```

## License

MIT License — see [LICENSE](LICENSE).
