Metadata-Version: 2.4
Name: vrraj-llm-adapter
Version: 1.0.5
Summary: Standalone LLM adapter/routing layer with a demo UI for testing provider connections and comparing call and output signatures.
Author: Rajkumar Velliavitil
License: MIT License
        
        
        Copyright (c) 2025 Rajkumar Velliavitil. All rights reserved.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/vrraj/llm-adapter
Project-URL: Repository, https://github.com/vrraj/llm-adapter
Project-URL: Issues, https://github.com/vrraj/llm-adapter/issues
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openai==2.8.1
Requires-Dist: google-genai==1.60.0
Requires-Dist: python-dotenv<2.0.0,>=1.0.0
Requires-Dist: numpy<3.0.0,>=2.0.0
Requires-Dist: httpx<1.0.0,>=0.23.1
Requires-Dist: pydantic<3.0.0,>=2.11.0
Requires-Dist: pydantic-settings<3.0.0,>=2.7.0
Provides-Extra: tokens
Requires-Dist: tiktoken<1.0.0,>=0.7.0; extra == "tokens"
Provides-Extra: server
Requires-Dist: fastapi<1.0.0,>=0.135.0; extra == "server"
Requires-Dist: uvicorn<1.0.0,>=0.42.0; extra == "server"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Dynamic: license-file

# vrraj-llm-adapter 
[![PyPI - Version](https://img.shields.io/pypi/v/vrraj-llm-adapter?color=blue&logo=pypi&logoColor=white)](https://pypi.org/project/vrraj-llm-adapter/)
[![GitHub Release](https://img.shields.io/github/v/release/vrraj/llm-adapter?label=github%20release&color=orange&logo=github)](https://github.com/vrraj/llm-adapter/releases)
![CI Status](https://github.com/vrraj/llm-adapter/actions/workflows/ci.yml/badge.svg)


> **Development and Demo UI:**  
> This repository ships with a FastAPI-powered **Interactive Playground** for validating text generation, embeddings, and registry configuration end-to-end. See the **[Development And Demo UI](#development-and-demo-ui)** section below for details and setup instructions.


Provider-agnostic LLM adapter for **text generation + embeddings** with a **registry-driven routing layer** (capabilities, param policies, pricing metadata, access control), plus **normalized outputs** (text, tool calls, reasoning, usage).

Currently supports OpenAI and Gemini (extensible architecture for additional providers).

- **PyPI:** https://pypi.org/project/vrraj-llm-adapter
- **GitHub:** https://github.com/vrraj/llm-adapter
- **Documentation:** https://vrraj.github.io/llm-adapter/


## Install

```bash
pip install vrraj-llm-adapter
```




## What you get

- **One interface** for generation + embeddings across providers
- **Registry-driven routing (default + extensible)** — ships with built-in model keys and supports **custom** registry extensions
- **Parameter policies** (allowed/disabled filtering per model)
- **Normalized responses** (text, tool calls, reasoning, usage)
- **Model Allowlist** (access control)
- **Pricing metadata** in registry for cost visibility
- **Embedding controls** (optional normalization + configurable output dimensionality)




## Quickstart

> **Requires API keys:** `OPENAI_API_KEY` and/or `GEMINI_API_KEY`
> 
> **Setup:** Copy `.env.example` to `.env` and configure your API keys

```bash
cp .env.example .env
# Edit .env with your API keys
```

The examples below pass a registry model key as `model=` (for example: `openai:gpt-4o-mini`, `gemini:openai-3-flash-preview`). For a complete list of default model keys, see [model-registry.md](https://github.com/vrraj/llm-adapter/blob/main/docs/model-registry.md) or print the keys programmatically (snippet below).

### Option A: Run a ready-to-use example script
Download and run a ready-to-use example script covering text generation and embeddings for OpenAI and Gemini.

```bash
curl -L -O https://raw.githubusercontent.com/vrraj/llm-adapter/main/examples/llm_adapter_basic_usage.py

python llm_adapter_basic_usage.py
```


### Option B: Call the API directly

```python
from llm_adapter import llm_adapter

resp = llm_adapter.create(
    model="openai:gpt-4o-mini", # for gemini, use "gemini:openai-3-flash-preview"
    input="Write a one-sentence bedtime story about a unicorn.",
    max_output_tokens=100,
)

# Normalize to stable app-facing schema
result = llm_adapter.normalize_adapter_response(resp)

print(result["text"])
print(result["usage"])
```

### Discover available model keys

The package ships with a default registry. To list available keys:

```python
from llm_adapter import LLMAdapter

adapter = LLMAdapter()
for key in sorted(adapter.model_registry.keys()):
    print(key)
```




## Interactive Playground (GitHub)

The repo includes a small FastAPI demo + UI to try models, inspect registry metadata, and view normalized responses.

The source includes developer tooling to test **custom model registries** (overrides/extensions) end-to-end in the UI. See the **[Development And Demo UI](#development-and-demo-ui)** section below.

![LLM Adapter Interactive Playground](https://github.com/vrraj/llm-adapter/blob/main/images/llm_adapter_interactive_playground.png)


## Public API (overview)

- `llm_adapter.create(...) -> AdapterResponse` — text generation (supports tools + optional streaming)
- `llm_adapter.normalize_adapter_response(...) -> LLMResult` — normalize `AdapterResponse` into a consistent dict schema
- `llm_adapter.create_embedding(...) -> EmbeddingResponse` — create embeddings
- `llm_adapter.get_pricing_for_model(...) -> Pricing | None` — pricing metadata lookup

>📋 For **complete method signatures, parameter details, and full response structures**, see: [api-reference.md](https://github.com/vrraj/llm-adapter/blob/main/docs/api-reference.md)
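
As a quick illustration of the pricing lookup, here is a hedged sketch. It assumes `get_pricing_for_model` accepts a registry model key and that the returned `Pricing` object exposes the `input_per_mm` / `output_per_mm` fields shown in the custom-registry example further down; check the API reference for the exact shape.

```python
from llm_adapter import llm_adapter

# Hedged sketch: pricing metadata lookup. Assumes get_pricing_for_model
# accepts a registry model key and that Pricing exposes input_per_mm /
# output_per_mm, as in the "Example Custom Registry" section below.
pricing = llm_adapter.get_pricing_for_model("openai:gpt-4o-mini")
if pricing is not None:
    print(f"Input  $/1M tokens: {pricing.input_per_mm}")
    print(f"Output $/1M tokens: {pricing.output_per_mm}")
else:
    print("No pricing metadata for this model key.")
```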

### AdapterResponse (from `create`)

Top-level fields (stable surface; note: `output_text` may include provider thought markup for some Gemini paths):

```python
AdapterResponse(
  output_text: str,
  model: str,
  usage: dict,
  status: str,
  finish_reason: str | None,
  tool_calls: list | None,
  metadata: dict | None,
  adapter_response: Any | None,  # debug/opaque
  model_response: Any | None,    # debug/opaque
)
```
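
A minimal sketch of reading these fields from a `create(...)` call (attribute names as listed above):

```python
from llm_adapter import llm_adapter

resp = llm_adapter.create(
    model="openai:gpt-4o-mini",
    input="Say hello in one sentence.",
)

# Stable top-level fields from the AdapterResponse surface above
print(resp.output_text)                             # may include provider thought markup on some Gemini paths
print(resp.model, resp.status, resp.finish_reason)
print(resp.usage)                                   # provider-reported token usage
if resp.tool_calls:
    print(resp.tool_calls)                          # normalized tool calls, if any were requested
```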

### EmbeddingResponse (from `create_embedding`)

Top-level fields:

```python
EmbeddingResponse(
  data: List[List[float]],
  usage: EmbeddingUsage,
  normalized: bool | None,
  vector_dim: int | None,
  metadata: dict | None,
  raw: Any | None,
)
```
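
A minimal sketch of reading these fields after a `create_embedding(...)` call (model key and field names as used elsewhere in this README):

```python
from llm_adapter import llm_adapter

emb = llm_adapter.create_embedding(
    model="openai:embed_small",
    input="The quick brown fox jumps over the lazy dog",
)

# Fields from the EmbeddingResponse surface above
print(f"Vectors returned: {len(emb.data)}")
print(f"Vector dimension: {emb.vector_dim}")
print(f"Normalized:       {emb.normalized}")
print(f"Usage:            {emb.usage}")
```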

### LLMResult (from `normalize_adapter_response`)

Top-level fields:

```python
{
  "text": str,
  "reasoning": str | None,
  "role": str,
  "status": str,
  "finish_reason": str | None,
  "usage": dict,
  "tool_calls": list,
  "metadata": dict | None,
  "raw": Any,
}
```

### Recommended flow (create → normalize)

The adapter intentionally separates the **provider boundary** from your app-facing schema:

```text
User Input
   │
   ▼
llm_adapter.create(...)  ─────────────►  AdapterResponse
   │                                  (provider-aware: raw responses, metadata)
   │
   ▼
llm_adapter.normalize_adapter_response(resp)  ─►  LLMResult
                                          (stable dict schema for apps)

Notes:
- `create()` performs the network call.
- `normalize_adapter_response()` is a local transform (no additional provider request).
```

Normalize to `LLMResult` for stable, application-facing output.
Use `result["text"]` from `normalize_adapter_response()` for display-safe text; `resp.output_text` may include provider thought markup depending on model configuration.



## Documentation & References

- **Complete API Reference:** [api-reference.md](https://github.com/vrraj/llm-adapter/blob/main/docs/api-reference.md)
- **Model Registry docs:** [model-registry.md](https://github.com/vrraj/llm-adapter/blob/main/docs/model-registry.md)
- **Ready to use Examples:** [examples](https://github.com/vrraj/llm-adapter/tree/main/examples)
- **Dev notes:** [development.md](https://github.com/vrraj/llm-adapter/blob/main/docs/development.md)

---


## Usage Examples (PyPI)

Install the adapter from PyPI, then download and run the standalone example scripts to explore common usage patterns such as chat, embeddings, streaming, and custom registry overrides.

### Text Generation - Application Wrapper Pattern

Some applications prefer a one-step helper that standardizes on `LLMResult` internally:

```python
from llm_adapter import llm_adapter


def create_result(**kwargs):
    resp = llm_adapter.create(**kwargs)
    return llm_adapter.normalize_adapter_response(resp)

result = create_result(
    model="openai:gpt-4o-mini",
    input="Hello"
)

print(result["text"])
```

This pattern keeps the library surface minimal while allowing your application to standardize on the normalized contract.


**Core Examples:**
- `llm_adapter_basic_usage.py` - Basic usage and normalization
- `create_and_normalize_example.py` - Recommended create → normalize flow (Gemini-safe)
- `llm_adapter_model_spec_example.py` - ModelSpec configuration

**Provider-Specific Examples:**
- `openai_embedding_example.py` - OpenAI embeddings
- `openai_adapter_example.py` - OpenAI chat
- `streaming_call_example.py` - Streaming responses

**Advanced Examples:**
- `set_adapter_allowed_models.py` - Allowlist demo
  *(See the "Model Allowlist" section for environment variable details)*
- `custom_registry.py` - Custom registry

> For application-facing output, use the create → normalize flow (see **Text Generation - Application Wrapper Pattern** above).
> If you need the raw provider boundary object for debugging, `llm_adapter.create(...)` returns an `AdapterResponse`.

### Accessing Reasoning Content

Some models (like Gemini) return reasoning content separately.

```python
from llm_adapter import llm_adapter, LLMError

try:
    response = llm_adapter.create(
        model="gemini:native-sdk-reasoning-2.5-flash",
        input="Explain why the sky is blue",
        reasoning_effort="high",   # adapter-level reasoning knob
        max_output_tokens=1000
    )

    normalized_response = llm_adapter.normalize_adapter_response(response)

    if normalized_response.get('reasoning'):
        print(f"Reasoning: {normalized_response['reasoning']}")

    print(normalized_response['text'])
except LLMError as e:
    print(f"Error: {e.code} - {e}")
```

### Streaming

```python
from llm_adapter import llm_adapter

for event in llm_adapter.create(model="openai:gpt-4o-mini", input="Hello", stream=True):
    if event.type == "output_text.delta":
        print(event.delta, end="")
```

## Tool Calling

The adapter supports provider-agnostic tool calling using the **OpenAI-style function schema**.

Pass tool definitions to `llm_adapter.create(...)`. The model may return one or more tool calls with structured arguments. The host application is responsible for executing those tools and sending the tool results back to the adapter as follow-up context for the final response.

### Tool definition format

Tool definitions use an OpenAI-style JSON schema:

```python
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather information for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name or location (for example: 'New York, NY')"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["location"]
        }
    }
]
```

### Tool call output format

When the model decides to call a tool, normalized tool calls are returned in `AdapterResponse.tool_calls` and `LLMResult.tool_calls`:

```python
tool_calls = [
    {
        "id": "call_12345",
        "name": "get_weather",
        "args": {
            "location": "New York, NY",
            "units": "celsius"
        }
    }
]
```

### Example

```python
from llm_adapter import llm_adapter

tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
]

response = llm_adapter.create(
    model="openai:gpt-4o-mini",
    input="What's the weather like in New York?",
    tools=tools
)

if response.tool_calls:
    for call in response.tool_calls:
        tool_name = call["name"]
        tool_args = call["args"]
        tool_id = call["id"]

        # Execute the tool in your application
        result = execute_tool(tool_name, tool_args)

        # Send tool results back to the adapter in your follow-up call
```
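
The follow-up format is up to your application. Below is one hedged sketch that simply folds the tool output into a role/content message list (the same `input` shape used in the ModelSpec example later in this README); `execute_tool` and the message wording are illustrative placeholders, not adapter API.

```python
# Hedged sketch of the follow-up call; execute_tool() and the message
# phrasing are placeholders, and the role/content list mirrors the
# ModelSpec example below rather than a dedicated tool-result API.
tool_result = execute_tool("get_weather", {"location": "New York, NY"})

followup = llm_adapter.create(
    model="openai:gpt-4o-mini",
    input=[
        {"role": "user", "content": "What's the weather like in New York?"},
        {"role": "user", "content": f"Tool result from get_weather: {tool_result}"},
    ],
)
print(llm_adapter.normalize_adapter_response(followup)["text"])
```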

### Notes

- The adapter normalizes tool definitions and emitted tool calls across providers.
- Tool execution is intentionally handled by the host application, not by the adapter.
- For application-facing output, use the `create -> normalize_adapter_response` flow.

## Model Registry & Extensibility

The LLM adapter uses a registry of model definitions (ModelInfo) that control:
- Provider routing
- Endpoint selection
- Parameter policies (allowed/disabled)
- Pricing and limits
- Capabilities (reasoning, tools, dimensions, etc.)

You can override or extend the registry by passing your own mapping to `LLMAdapter(...)`.

```python
from llm_adapter.model_registry import ModelInfo, validate_registry
from llm_adapter import ModelSpec
```

### Example Custom Registry

```python
from llm_adapter import LLMAdapter
from llm_adapter.model_registry import ModelInfo, Pricing

custom_registry = {
    "my-openai-model": ModelInfo(
        provider="openai",
        model="gpt-4o-mini",
        endpoint="chat_completions",
        pricing=Pricing(input_per_mm=0.05, output_per_mm=0.15),
        param_policy={"allowed": {"temperature", "max_tokens"}},
        limits={"max_output_tokens": 1000}
    )
}

adapter = LLMAdapter(model_registry=custom_registry)
```

### Model Allowlist

```bash
export LLM_ADAPTER_ALLOWED_MODELS="openai:gpt-4o-mini,openai:embed_small"
```

**For comprehensive registry documentation, see:**
- https://github.com/vrraj/llm-adapter/blob/main/docs/model-registry.md
- https://github.com/vrraj/llm-adapter/blob/main/examples/custom_registry.py
- https://github.com/vrraj/llm-adapter/blob/main/src/llm_adapter/model_registry.py

### Validate Custom Registry

```python
from llm_adapter.model_registry import validate_registry
validate_registry(custom_registry, strict=False)
```

## Embeddings

```python
from llm_adapter import llm_adapter, LLMError

try:
    response = llm_adapter.create_embedding(
        model="openai:embed_small",
        input="The quick brown fox jumps over the lazy dog"
    )
    print(f"Generated {len(response.data)} embeddings")
    print(f"First embedding dimension: {len(response.data[0])}")
except LLMError as e:
    print(f"Error: {e.code} - {e}")
```

## Development And Demo UI

Follow these steps to run the **demo UI** (served on port 8100) or to **customize** the code.

1. Clone the repository and run the setup script.

```bash
git clone https://github.com/vrraj/llm-adapter.git
cd llm-adapter
bash scripts/llm_adapter_setup.sh
```

>This script (`scripts/llm_adapter_setup.sh`) checks prerequisites (`python3`, `make`), creates `.env` if missing, sets up a local `.venv`, installs the package (`pip install -e ".[server]"`), and shows **next steps**. The demo UI and FastAPI server run in this `.venv` virtual environment. It is safe to run multiple times.

2. Set required API keys (see **Environment variables** section below).

3. Start the application.

```bash
make start
```

>**Note:** Run `make start` to run the server in the foreground, or `make start-bg` to run it in the background. Use `make stop` to stop the server.

4. Open the demo UI:

- http://localhost:8100/ui/


### Manual start (optional)

If you prefer not to use the Makefile helpers, you can start the FastAPI server directly:

```bash
uvicorn llm_adapter_demo.api:app --reload --port 8100
```

The Interactive Playground will be available at:

```
http://localhost:8100/ui/
```

### For Developers: Running Tests

#### Install Dev dependencies

```bash
pip install -e ".[dev]"
```

#### Run Tests

```bash
pytest
pytest -m integration
pytest -m "integration or unit"
```

## Project structure

For internal design and architecture notes, see [development.md](https://github.com/vrraj/llm-adapter/blob/main/docs/development.md).

## ModelSpec: Structured Configuration

`ModelSpec` provides a type-safe, reusable way to configure model parameters as an alternative to passing individual parameters.

>**Note**: See `examples/llm_adapter_model_spec_example.py` for a comprehensive example demonstrating ModelSpec usage with different providers and parameter configurations.

### Using ModelSpec

```python
from llm_adapter import llm_adapter
from llm_adapter import ModelSpec

chat_spec = ModelSpec(
    provider="openai",
    model="gpt-4o-mini",
    temperature=0.7,
    max_output_tokens=1000,
    extra={"custom_param": "value"}
)

resp1 = llm_adapter.create(spec=chat_spec, input=[{"role": "user", "content": "Hello"}])
resp2 = llm_adapter.create(spec=chat_spec, input=[{"role": "user", "content": "How are you?"}])

embed_spec = ModelSpec(
    provider="openai",
    model="embed_small"
)
resp = llm_adapter.create_embedding(spec=embed_spec, input="Text to embed")
```

### ModelSpec vs Individual Parameters

| Approach | Provider | Model Name | Auto-detection | Type Safety |
|----------|----------|------------|----------------|-------------|
| **Individual params** | Optional (auto-detected from registry) | Registry key (`openai:gpt-4o-mini`) | ✅ Yes | ❌ Runtime |
| **ModelSpec** | Required (explicit) | Provider-native (`gpt-4o-mini`) | ❌ No | ✅ Static type-checkers |
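
To make the contrast concrete, here is a minimal sketch of the same request expressed both ways (model names as used elsewhere in this README):

```python
from llm_adapter import llm_adapter, ModelSpec

# Individual parameters: provider is auto-detected from the registry key
resp_a = llm_adapter.create(model="openai:gpt-4o-mini", input="Hello")

# ModelSpec: provider is explicit and the model name is provider-native
spec = ModelSpec(provider="openai", model="gpt-4o-mini")
resp_b = llm_adapter.create(spec=spec, input="Hello")
```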


## Unified Token Accounting

LLMAdapter returns a consistent usage schema across all providers:

### Usage Schema

```json
{
  "prompt_tokens": 0,
  "cached_tokens": 0,
  "output_tokens": 0,
  "reasoning_tokens": 0,
  "answer_tokens": 0,
  "total_tokens": 0
}
```

**Key relationships:**
- `output_tokens = answer_tokens + reasoning_tokens`
- `total_tokens = prompt_tokens + cached_tokens + output_tokens`
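
A small sketch that checks these identities on a usage dict (field names as in the schema above):

```python
def usage_is_consistent(usage: dict) -> bool:
    """Check the documented token-accounting identities."""
    output_ok = usage["output_tokens"] == usage["answer_tokens"] + usage["reasoning_tokens"]
    total_ok = usage["total_tokens"] == (
        usage["prompt_tokens"] + usage["cached_tokens"] + usage["output_tokens"]
    )
    return output_ok and total_ok


# Example values consistent with the schema above
usage = {
    "prompt_tokens": 12,
    "cached_tokens": 0,
    "output_tokens": 30,
    "reasoning_tokens": 10,
    "answer_tokens": 20,
    "total_tokens": 42,
}
assert usage_is_consistent(usage)
```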

## Environment variables

Copy `.env.example` to `.env` and set up your API keys (or use your existing environment variables):

```bash
cp .env.example .env
```

Supported env vars:

**Minimal working sets:**
- **OpenAI-only**: `OPENAI_API_KEY`
- **Gemini native SDK**: `GEMINI_API_KEY`
- **Gemini OpenAI-compatible**: `GEMINI_API_KEY` + `GEMINI_OPENAI_BASE_URL`

**All supported variables:**
- `OPENAI_API_KEY`
- `GEMINI_API_KEY`
- `GEMINI_OPENAI_BASE_URL`
- `LLM_ADAPTER_ALLOWED_MODELS` (comma-separated list) - Restrict which models can be used in each environment.

## Model Allowlist

The `LLM_ADAPTER_ALLOWED_MODELS` environment variable allows you to restrict which models can be used. *By default, all models are allowed*.

```bash
export LLM_ADAPTER_ALLOWED_MODELS="openai:gpt-4o-mini,gemini:native-sdk-reasoning-2.5-flash"
```


## Supported Providers

Supports:
- **OpenAI** (Responses API, Chat Completions API, Embeddings API)
- **Gemini** (native `google-genai` SDK and OpenAI-compatible endpoint)

Models and capabilities are defined in `src/llm_adapter/model_registry.py`.

## Adding New Models

To add support for new models or override existing configurations, use **custom registries** rather than modifying the core registry:

1. **Create a custom registry** - See `examples/custom_registry.py` for a complete example
2. **Define ModelInfo entries** - Configure endpoints, capabilities, pricing, and parameter policies
3. **Load your registry** - Use environment variable or pass it to `LLMAdapter(model_registry=your_registry)`
4. **Test via Demo UI** - The Interactive Playground supports custom registry testing

### Environment Variable Configuration (Recommended)

For easy configuration without code changes, set the `CUSTOM_REGISTRY_PATH` environment variable:

```bash
# Configure environment (optional):
export CUSTOM_REGISTRY_PATH=/path/to/your/custom_registry.py
```

The adapter will automatically load and merge your custom registry with the default registry. This is useful for:
- Development environments with custom models
- Production deployments with organization-specific configurations
- Testing different registry configurations without code changes

📖 **For complete custom registry documentation**, see:
- [model-registry.md - Custom Registry](https://github.com/vrraj/llm-adapter/blob/main/docs/model-registry.md#custom-registry)


## Development

This is a standalone package. Development happens directly in this repo.

```bash
pip install -e .
make start
```

## License

This project is licensed under the MIT License.

