Metadata-Version: 2.4
Name: infera-deploy
Version: 0.1.0
Summary: Deploy GGUF models to RunPod or Replicate with one command.
Author-email: Gabriel Cicotoste <gabrielmurilocicotoste6@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Ga0512/infera
Project-URL: Repository, https://github.com/Ga0512/infera
Project-URL: Issues, https://github.com/Ga0512/infera/issues
Keywords: llm,gguf,llama-cpp,runpod,replicate,deploy,serverless,inference
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.12
Requires-Dist: rich>=13
Requires-Dist: pyyaml>=6
Requires-Dist: requests>=2.31
Requires-Dist: runpod>=1.8
Requires-Dist: llama-cpp-python>=0.3
Dynamic: license-file

<p align="center">
  <img src="https://raw.githubusercontent.com/Ga0512/infera/main/assets/image.png" alt="infera — deploy & chill" width="420">
</p>

# infera

Deploy GGUF (`llama-cpp-python`) models to **RunPod** or **Replicate** with one command.

```bash
pip install infera-deploy

infera init my-project
cd my-project
cp ~/Downloads/llama.gguf models/
infera deploy runpod        # or: replicate
```

That's it. No Dockerfile, no Cog config, no GraphQL — `infera` writes the runtime, builds the image, uploads the model, and registers the serverless endpoint.

> Package name on PyPI is `infera-deploy`; the Python module and CLI are both `infera`.

## What you'll need

- Python 3.10+
- A `.gguf` model file (e.g. from [TheBloke](https://huggingface.co/TheBloke) on Hugging Face)
- For **RunPod**: a running Docker daemon, a [RunPod API key](https://www.runpod.io/console/user/settings), and a Docker Hub login (`docker login`)
- For **Replicate**: [cog](https://github.com/replicate/cog) (Linux/macOS or WSL), `cog login`

## What `infera deploy` actually does

1. Bundles a runtime tailored to the provider (`Dockerfile` + handler for RunPod, `predict.py` + `cog.yaml` for Replicate)
2. Builds and pushes the container image
3. (RunPod) Creates a network volume and uploads `.gguf` files to it — idempotent, skips unchanged models via MD5
4. Registers / upserts the serverless endpoint
5. Smoke-tests it and prints the URL

Re-runs are idempotent: same template, same volume, only changed bits get re-shipped.
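The MD5 check in step 3 can be pictured roughly as follows. This is a minimal sketch, not `infera`'s actual code. `remote_hashes` stands in for whatever record of already-uploaded files the tool keeps, and `md5_of` / `plan_uploads` are hypothetical names.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB .gguf files never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_uploads(models_dir: Path, remote_hashes: dict[str, str]) -> list[Path]:
    """Return local .gguf files whose MD5 is missing from, or differs from, remote_hashes.

    remote_hashes is a hypothetical {filename: md5} record of what is already on the volume.
    """
    to_upload = []
    for path in sorted(models_dir.glob("*.gguf")):
        if remote_hashes.get(path.name) != md5_of(path):
            to_upload.append(path)
    return to_upload
```

Hashing in fixed-size chunks keeps memory flat regardless of model size, which matters when a single `.gguf` can be several gigabytes.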

## Calling a deployed endpoint

The job input follows the OpenAI chat-completions shape:

```json
{
  "input": {
    "messages":    [{"role": "user", "content": "Hello"}],
    "model":       "llama",
    "temperature": 0.7,
    "max_tokens":  512
  }
}
```

`model` is optional — it's the filename stem (e.g. `llama-3.2-1b` for `llama-3.2-1b.gguf`). If omitted, the first model alphabetically is used.

For embeddings, put `"endpoint": "embeddings"` in the job input and pass the text as `"input"` (a string or a list of strings).

For function calling / structured output: pass `tools`, `response_format`, or `grammar` (GBNF) the same way you would to OpenAI.
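Put together, job inputs for those two cases might look like the dicts below. This is a sketch based only on the field names above: the `get_weather` tool and its schema are invented for illustration, and the `tools` entries use OpenAI's function-tool shape on the assumption that `infera` forwards them unchanged.

```python
import json

# Embeddings: "endpoint" selects the embeddings route; "input" is a string or a list of strings.
embeddings_job = {
    "input": {
        "endpoint": "embeddings",
        "input": ["first sentence", "second sentence"],
    }
}

# Function calling: each tool uses OpenAI's {"type": "function", "function": {...}} shape.
# get_weather is a hypothetical tool used only for illustration.
tools_job = {
    "input": {
        "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
}

print(json.dumps(tools_job, indent=2))
```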

- **RunPod:** `POST https://api.runpod.ai/v2/<endpoint>/runsync` with `Authorization: Bearer <RUNPOD_KEY>`.
- **Replicate:** the standard Replicate API; `messages` and `tools` must be JSON-encoded strings (a Cog limitation).
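A minimal client sketch using only the standard library. The URL and `Authorization` header come from the RunPod line above; the endpoint ID and API key are placeholders you supply. `replicate_input` only shows the JSON-string encoding that Cog requires, not how you submit the prediction, and all three helper names are hypothetical.

```python
import json
import urllib.request


def runpod_runsync_request(endpoint_id: str, api_key: str, job_input: dict) -> urllib.request.Request:
    """Build (but do not send) a synchronous RunPod job request."""
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps({"input": job_input}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_runpod(endpoint_id: str, api_key: str, job_input: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode RunPod's JSON response."""
    req = runpod_runsync_request(endpoint_id, api_key, job_input)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def replicate_input(job_input: dict) -> dict:
    """Per the Cog limitation noted above, send messages and tools as JSON-encoded strings."""
    encoded = dict(job_input)
    for key in ("messages", "tools"):
        if key in encoded:
            encoded[key] = json.dumps(encoded[key])
    return encoded
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.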

## Adding a model to a deployed project

```bash
cp another.gguf models/
infera deploy runpod
```

Idempotent — only the new `.gguf` gets uploaded. Multiple models live side-by-side on the volume; pick one per request via the `model` field.

## Provider configs

The first `infera deploy <provider>` run writes `<provider>.yaml` into the project root. Edit it and re-deploy.

```yaml
# runpod.yaml
gpu:           AMPERE_16,AMPERE_24
gpu_vram_min:  8
workers_min:   0
workers_max:   1
idle_timeout:  5
datacenter:    EU-RO-1
```

## Using the engine locally (advanced)

```python
from infera import Engine

engine = Engine("./models")
print(engine.chat([{"role": "user", "content": "Hello"}]))
```

## Support

If `infera` saved you an afternoon of Dockerfile yak-shaving, consider buying me a coffee:

<a href="https://buymeacoffee.com/gabrielcicotoste">
  <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" height="48">
</a>

## License

MIT
