Metadata-Version: 2.4
Name: llamacpp-client
Version: 0.1.3
Summary: Client for LlamaCpp HTTP Server
Author-email: Luís Gomes <lmdgomes@fc.ul.pt>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: httpx
Requires-Dist: requests

# llamacpp_client

`llamacpp_client` is a Python client library designed to simplify communication with the [LlamaCpp HTTP server](https://github.com/ggerganov/llama.cpp) (as provided by the official Docker image). It provides both synchronous and asynchronous interfaces for interacting with LlamaCpp's completion and chat endpoints.

## Features

- **Synchronous and Asynchronous API:**
  - `LlamaCppClient.completion`: Synchronous API for blocking calls to `/completion` endpoint on LlamaCpp server.
  - `LlamaCppClient.v1_chat_completions`: Synchronous API for synchronous, blocking calls, to OpenAI `/v1/chat/completions` endpoint on LlamaCpp server (or any other OpenAI compatible server).
  - `LlamaCppClient.async_completion`: Asynchronous API for non-blocking, streaming responses to `/completion` endpoint on LlamaCpp server.
  - `LlamaCppClient.async_v1_chat_completions`: Asynchronous API for non-blocking, streaming responses to OpenAI `/v1/chat/completions` endpoint on LlamaCpp server (or any other OpenAI compatible server).
- **Multiple Endpoints Support:**
  Load-balance requests across multiple LlamaCpp server endpoints.
- **Easy Configuration:**
  Use [`LlamaCppServerAddress`](src/llamacpp_client/config.py) to specify server host and port.

## Installation

```sh
pip install .
```

Or add to your pyproject.toml dependencies:

```toml
llamacpp_client = { path = "path/to/llamacpp_client" }
```

## Usage

Synchronous example:

```python
from llamacpp_client import LlamaCppClient, LlamaCppServerAddress

endpoints = [
    LlamaCppServerAddress(host="localhost", port=8080)
]
client = LlamaCppClient(endpoints)
response = client.completion(prompt="Hello, world!")
print(response)
```

Asynchronous example:

```python
import asyncio
from llamacpp_client import LlamaCppClient, LlamaCppServerAddress

async def main():
    endpoints = [
        LlamaCppServerAddress(host="localhost", port=8080)
    ]
    client = LlamaCppClient(endpoints)
    async for chunk in client.async_completion(prompt="Hello, world!"):
        print(chunk.decode(), end="")

asyncio.run(main())
```


## Requirements

- Python 3.9+
- [httpx](https://www.python-httpx.org/)
- [requests](https://docs.python-requests.org/)


## License

MIT

--
Developed by Luís Gomes <luismsgomes@gmail.com>.

