Metadata-Version: 2.4
Name: anyllm
Version: 0.2.0
Summary: A thin, unified LLM abstraction layer. Call any LLM with a single API.
Project-URL: Homepage, https://github.com/vietanhdev/anyllm
Project-URL: Documentation, https://github.com/vietanhdev/anyllm#readme
Project-URL: Repository, https://github.com/vietanhdev/anyllm
Project-URL: Issues, https://github.com/vietanhdev/anyllm/issues
Author-email: Viet-Anh Nguyen <vietanh.dev@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,anthropic,chatbot,llama-cpp,llm,ollama,openai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Requires-Dist: httpx>=0.24.0
Provides-Extra: all
Requires-Dist: anthropic>=0.20.0; extra == 'all'
Requires-Dist: openai>=1.0.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20.0; extra == 'anthropic'
Provides-Extra: benchmark
Requires-Dist: numpy; extra == 'benchmark'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: respx>=0.20; extra == 'dev'
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == 'openai'
Description-Content-Type: text/markdown

# anyllm

<p align="center"><img src="logo.svg" alt="anyllm logo" width="120"></p>

![PyPI](https://img.shields.io/pypi/v/anyllm)
![Python](https://img.shields.io/pypi/pyversions/anyllm)
![License](https://img.shields.io/pypi/l/anyllm)

A thin, unified LLM abstraction layer. Call any LLM with a single API.

**Local-first: uses Ollama/llama.cpp by default; cloud APIs are optional.** When local models are available, anyllm prefers them over cloud providers, so your applications can keep working offline.

Simpler than litellm -- focused on the essentials. No bloat, no complex abstractions. Just `anyllm.chat()`.

## Provider Support

| Provider | API Key Env Var | Streaming | Local |
|----------|----------------|-----------|-------|
| **OpenAI** (+ compatible APIs) | `OPENAI_API_KEY` | Yes | No |
| **Anthropic** Claude | `ANTHROPIC_API_KEY` | Yes | No |
| **Ollama** | -- | Yes | Yes |
| **llama.cpp** server | -- | Yes | Yes |

## Installation

```bash
pip install anyllm
```

With optional provider SDKs:

```bash
pip install anyllm[openai]      # OpenAI SDK
pip install anyllm[anthropic]   # Anthropic SDK
pip install anyllm[all]         # All optional SDKs
```

> **Note:** anyllm works without any provider SDKs installed -- it uses `httpx` for all HTTP calls by default.

## Quick Start

### Simple Chat

```python
import anyllm

# Auto-detects the best available provider
response = anyllm.chat("What is the meaning of life?")
print(response.content)
```

### OpenAI

```bash
export OPENAI_API_KEY="sk-..."
```

```python
response = anyllm.chat("Hello!", model="openai/gpt-4")
print(response.content)
print(f"Tokens used: {response.usage.total_tokens}")
```

### Anthropic Claude

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

```python
response = anyllm.chat("Hello!", model="anthropic/claude-sonnet-4-20250514")
print(response.content)
```

### Ollama (Local)

```bash
ollama pull llama3
```

```python
response = anyllm.chat("Hello!", model="ollama/llama3")
print(response.content)
```

### llama.cpp Server

```bash
./llama-server -m model.gguf
```

```python
response = anyllm.chat("Hello!", model="llamacpp/default")
print(response.content)
```
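
The model identifiers used above share a `provider/model` shape (`openai/gpt-4`, `ollama/llama3`, `llamacpp/default`). As a rough illustration of that convention, such a string can be split on its first slash. The helper `split_model` below is hypothetical and not part of anyllm's API; how the library parses identifiers internally is not documented here.

```python
from typing import Tuple


def split_model(identifier: str) -> Tuple[str, str]:
    """Split 'provider/model' on the first slash (hypothetical helper, not anyllm API)."""
    provider, sep, model = identifier.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {identifier!r}")
    return provider, model


print(split_model("openai/gpt-4"))   # ('openai', 'gpt-4')
print(split_model("ollama/llama3"))  # ('ollama', 'llama3')
```

Splitting only on the first slash keeps any later slashes inside the model name intact.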

## Streaming

```python
import anyllm

for chunk in anyllm.chat("Tell me a story", model="openai/gpt-4", stream=True):
    print(chunk, end="", flush=True)
print()
```
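
If the full text is needed once streaming finishes, the chunks can be accumulated as they arrive. This sketch assumes each chunk is a plain string, as the `print(chunk, ...)` call above suggests. `collect_stream` is a hypothetical helper, and a list stands in for the real stream so the snippet runs without a provider.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str]) -> str:
    """Print chunks as they arrive and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


# Stand-in for anyllm.chat(..., stream=True); assumes chunks are str.
fake_stream = ["Once ", "upon ", "a time."]
story = collect_stream(fake_stream)
```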

## Full Message Format

```python
import anyllm

response = anyllm.chat(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"},
    ],
    model="openai/gpt-4",
    temperature=0.7,
    max_tokens=500,
)
```

Or use the `system` parameter shorthand:

```python
response = anyllm.chat(
    "What is Python?",
    model="openai/gpt-4",
    system="You are a helpful assistant.",
)
```
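
The two calls above appear to describe the same conversation. The sketch below shows that presumed equivalence, assuming the `system` shorthand simply prepends a system message (the README does not spell this out). `build_messages` is a hypothetical helper, not an anyllm function.

```python
from typing import Dict, List, Optional


def build_messages(prompt: str, system: Optional[str] = None) -> List[Dict[str, str]]:
    """Expand a prompt plus optional system text into a role/content message list."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


messages = build_messages("What is Python?", system="You are a helpful assistant.")
```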

## Configuration

### Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `OPENAI_BASE_URL` | Custom OpenAI-compatible API base URL |
| `ANTHROPIC_API_KEY` | Anthropic API key |
| `OLLAMA_HOST` | Ollama server URL (default: `http://localhost:11434`) |
| `ANYLLM_DEFAULT_MODEL` | Default model to use |

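These variables can also be set from Python before the first call, which can be handy in notebooks and tests. The snippet below only touches `os.environ`. The URL `http://localhost:1234/v1` is borrowed from the programmatic example in this README, and whether anyllm reads the environment at import time or at call time is an assumption worth verifying.

```python
import os

# Point the OpenAI-compatible base URL at a local server (example URL).
os.environ["OPENAI_BASE_URL"] = "http://localhost:1234/v1"
os.environ["ANYLLM_DEFAULT_MODEL"] = "ollama/llama3"

# OLLAMA_HOST falls back to the documented default when unset.
ollama_host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
print(ollama_host)
```
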
### Config File

Create `~/.anyllm/config.json`:

```json
{
  "default_model": "openai/gpt-4",
  "openai_api_key": "sk-...",
  "ollama_base_url": "http://localhost:11434"
}
```
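
The file can also be generated from Python. This is a minimal sketch using only the standard library. `write_config` is a hypothetical helper, the keys mirror the JSON above, and the demo writes to a temporary directory rather than `~/.anyllm` so it has no side effects.

```python
import json
import tempfile
from pathlib import Path


def write_config(path: Path, settings: dict) -> Path:
    """Write settings as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return path


settings = {
    "default_model": "ollama/llama3",
    "ollama_base_url": "http://localhost:11434",
}

# For real use, pass Path.home() / ".anyllm" / "config.json" instead.
demo_path = write_config(Path(tempfile.mkdtemp()) / ".anyllm" / "config.json", settings)
loaded = json.loads(demo_path.read_text(encoding="utf-8"))
```

Keeping API keys out of the generated file and in environment variables instead may be preferable on shared machines.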

### Programmatic Configuration

```python
import anyllm

# Set default model
anyllm.set_default("ollama/llama3")

# Check available providers
print(anyllm.available_providers())  # ['openai', 'ollama']

# List models
print(anyllm.list_models())
# {'openai': ['gpt-4', 'gpt-3.5-turbo', ...], 'ollama': ['llama3', ...]}

# Direct config access
config = anyllm.get_config()
config.set("openai_base_url", "http://localhost:1234/v1")
```

## Response Object

```python
import anyllm

response = anyllm.chat("Hello!", model="openai/gpt-4")

response.content        # "Hello! How can I help you?"
response.model          # "gpt-4"
response.usage          # Usage(prompt_tokens=10, completion_tokens=8, total_tokens=18)
response.raw_response   # Raw API response dict
```

## Local-First / Edge AI

anyllm is designed with a local-first philosophy. When auto-detecting providers,
it checks local options (Ollama, llama.cpp) before cloud APIs. This means your
applications work offline by default when local models are available.

```python
import anyllm

# See what local models are available
print(anyllm.list_local_models())
# {'ollama': ['llama3', 'mistral', ...]}

# Auto-detection prefers local -- this uses Ollama if running
response = anyllm.chat("Hello!")

# Explicitly use a local model
response = anyllm.chat("Hello!", model="ollama/llama3")
```
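
The detection order described above can be pictured as a priority list that is walked until one provider responds. This is an illustrative sketch, not anyllm's actual detection code: `pick_provider`, `LOCAL_FIRST_ORDER`, and the probe callables are hypothetical, and the relative order within the local and cloud groups is an assumption.

```python
from typing import Callable, Dict, Optional, Sequence

# Assumed order: local providers first, then cloud APIs.
LOCAL_FIRST_ORDER = ("ollama", "llamacpp", "openai", "anthropic")


def pick_provider(
    probes: Dict[str, Callable[[], bool]],
    order: Sequence[str] = LOCAL_FIRST_ORDER,
) -> Optional[str]:
    """Return the first provider in `order` whose probe reports it as available."""
    for name in order:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    return None


# Stub probes; real ones might check a local port or an API key env var.
probes = {"ollama": lambda: False, "llamacpp": lambda: True, "openai": lambda: True}
print(pick_provider(probes))  # llamacpp
```

Because the cloud entries sit at the end of the list, they are only reached when no local probe succeeds.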

## Error Handling

anyllm automatically retries transient errors with exponential backoff:

```python
import anyllm

# max_retries defaults to 3; it is passed explicitly here for illustration
response = anyllm.chat("Hello!", model="openai/gpt-4", max_retries=3)
```
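
For readers unfamiliar with the pattern, exponential backoff doubles the wait after each failed attempt. The sketch below is a generic illustration, not anyllm's implementation: `retry_with_backoff`, the `base_delay` value, and the choice of `ConnectionError` as the transient error are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying up to `max_retries` times with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable when max_retries >= 0")


# Demo: fail twice, then succeed; delays are recorded instead of slept.
attempts = {"n": 0}
delays = []


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry_with_backoff(flaky, sleep=delays.append)
```

Injecting `sleep` keeps the demo instant; production code would typically also add random jitter to the delay.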

## License

MIT License. See [LICENSE](LICENSE) for details.

## Author

**Viet-Anh Nguyen** -- [nrl.ai](https://www.nrl.ai) -- [GitHub](https://github.com/vietanhdev)
