Metadata-Version: 2.4
Name: variably-sdk
Version: 2.4.0
Summary: Official Python SDK for Variably feature flags, LLM experimentation, and prompt optimization platform
Author: Variably
Author-email: Variably <support@variably.com>
License: MIT
Project-URL: Homepage, https://github.com/variably/variably-python-sdk
Project-URL: Documentation, https://docs.variably.com/sdks/python
Project-URL: Repository, https://github.com/variably/variably-python-sdk
Project-URL: Issues, https://github.com/variably/variably-python-sdk/issues
Keywords: feature-flags,experimentation,a-b-testing,variably,llm,prompt-experimentation,llmops
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.25.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.18.0; extra == "dev"
Requires-Dist: pytest-cov>=2.10; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Requires-Dist: mypy>=0.800; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: responses>=0.18.0; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=6.0; extra == "test"
Requires-Dist: pytest-cov>=2.10; extra == "test"
Requires-Dist: responses>=0.18.0; extra == "test"
Dynamic: author
Dynamic: requires-python

# Variably Python SDK

Official Python SDK for Variably — feature flags, LLM experimentation, and prompt optimization.

## Installation

```bash
pip install variably-sdk
```

## Quick Start

```python
from variably import VariablyClient

# Initialize the client
client = VariablyClient({
    "api_key": "your-api-key",
    "base_url": "https://api.variably.com",  # optional, defaults to localhost:8080
    "environment": "production"  # optional
})

# Evaluate a boolean feature flag
user_context = {
    "user_id": "user-123",
    "email": "user@example.com",
    "country": "US"
}

is_feature_enabled = client.evaluate_flag_bool(
    "new-checkout-flow",
    False,  # default value
    user_context
)

if is_feature_enabled:
    # Show new checkout flow
    pass

# Evaluate a feature gate
has_access = client.evaluate_gate("premium-features", user_context)

# Track events
client.track({
    "name": "button_clicked",
    "user_id": "user-123",
    "properties": {
        "button_name": "checkout",
        "page": "product-detail"
    }
})

# Clean up resources
client.close()
```

## Prompt Experimentation

Variably provides two modes for LLM prompt experimentation:

### BYOR (Bring Your Own Runtime)

You call your own LLM. Variably handles variant allocation and 41-dimensional evaluation.

```python
from variably import VariablyClient
import time

client = VariablyClient({"api_key": "your-api-key"})

user_context = {"user_id": "user-123"}
input_variables = {"query": "What are the symptoms of Type 2 diabetes?"}

# Step 1: Get the allocated variant
variant = client.get_variant("rag-prompt-experiment", user_context, input_variables)
print(f"Variant: {variant.variant_key}, Model: {variant.model}")

# Step 2: Call your LLM with the variant's prompt template
prompt = variant.prompt_template.format(**input_variables)
start = time.time()
llm_response = call_your_llm(prompt, model=variant.model)  # your LLM call
latency = int((time.time() - start) * 1000)

# Step 3: Submit the response for 41-dimensional evaluation
result = client.submit_response(
    experiment_key="rag-prompt-experiment",
    variant_key=variant.variant_key,
    executed_prompt=prompt,
    response=llm_response,
    user_context=user_context,
    input_variables=input_variables,
    provider=variant.provider,
    model=variant.model,
    latency_ms=latency,
)
print(f"Submitted: {result.status}")
```

### Managed Execution

Variably selects the variant, calls the LLM, and evaluates — all in one call.

```python
response = client.evaluate_prompt(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
    evaluation_mode="full",  # "full" | "fast"
)

print(f"Content: {response.content}")
print(f"Model: {response.model}, Latency: {response.latency_ms}ms")
print(f"Tokens: {response.token_usage}")
print(f"Quality Score: {response.quality_score}")
```

### Managed Execution with Streaming (v2.1.0+)

Same as managed execution, but tokens stream in real time — ideal for chatbot UIs.

```python
from variably import VariablyClient

client = VariablyClient({"api_key": "your-api-key"})

stream = client.evaluate_prompt_stream(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What are the symptoms of Type 2 diabetes?"},
)

# Tokens arrive one-by-one for real-time display
for token in stream:
    print(token, end="", flush=True)

print()  # newline after stream ends

# After iteration, metadata is available (token usage, latency, quality score)
meta = stream.metadata
if meta:
    print(f"Model: {meta.model}, Latency: {meta.latency_ms}ms")
    print(f"Tokens: {meta.token_usage}")
```

### Context-Aware Evaluation (Better RAG Quality) — v2.2.0+

For RAG chatbots, passing conversation history and retrieved chunks enables **groundedness scoring, hallucination detection, and conversational coherence** — dimensions that are impossible to evaluate in isolation.

The `evaluation_context` parameter is **not sent to the LLM** — it's only used by Variably's evaluator for richer scoring.

```python
# Step 1: Collect conversation history from your session
workflow_history = [
    {"role": "user", "content": "What causes diabetes?"},
    {"role": "assistant", "content": "Key factors include genetics, diet..."},
    {"role": "user", "content": "What about potatoes?"},
]

# Step 2: Collect retrieved RAG chunks (after your retrieval step)
reference_materials = [
    {
        "id": "chunk-001",
        "content": "Unhealthy diets high in refined sugars, fats...",
        "source": "Kenya National Clinical Guidelines",
        "type": "chunk",
        "relevance_score": 0.89,
    },
    {
        "id": "chunk-002",
        "content": "Modifiable risk factors include obesity...",
        "source": "Kenya National Clinical Guidelines",
        "type": "chunk",
        "relevance_score": 0.82,
    },
]

# Step 3: Build the prompt context and pass evaluation_context in your evaluate call
context_text = "\n\n".join(chunk["content"] for chunk in reference_materials)

response = client.evaluate_prompt(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What about potatoes?", "context": context_text},
    evaluation_mode="full",
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
        "retrieval_query": "potato consumption glycemic index diabetes risk",
    },
)

# Same works with streaming
stream = client.evaluate_prompt_stream(
    experiment_key="rag-prompt-experiment",
    user_context={"user_id": "user-123"},
    input_variables={"query": "What about potatoes?", "context": context_text},
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
    },
)
for token in stream:
    print(token, end="", flush=True)
```

**What this enables:**

| Dimension | Description | Requires |
|-----------|-------------|----------|
| `faithfulness` | % of claims grounded in retrieved chunks | `reference_materials` |
| `hallucination_rate` | % of claims with no source in context | `reference_materials` |
| `context_utilization` | % of relevant chunks actually used | `reference_materials` |
| `attribution_accuracy` | Do citations map to correct chunks? | `reference_materials` |
| `conversation_consistency` | No contradictions with prior turns | `workflow_history` |
| `context_retention` | Maintains topic awareness across turns | `workflow_history` |
| `transparency` | Discloses when going beyond source material | `reference_materials` |

**BYOR mode** also supports `evaluation_context` — pass it in `submit_response()`:

```python
result = client.submit_response(
    experiment_key="my-experiment",
    variant_key=variant.variant_key,
    executed_prompt=prompt,
    response=llm_response,
    user_context=user_context,
    input_variables=input_variables,
    provider=variant.provider,
    model=variant.model,
    latency_ms=latency,
    evaluation_context={
        "reference_materials": reference_materials,
        "workflow_history": workflow_history,
    },
)
```

#### evaluation_context Schema

| Field | Type | Description |
|-------|------|-------------|
| `reference_materials` | `list[dict]` | RAG chunks / source documents for groundedness scoring |
| `reference_materials[].id` | `str` | Unique chunk identifier |
| `reference_materials[].content` | `str` | Chunk text content |
| `reference_materials[].source` | `str` (optional) | Source document URL or name |
| `reference_materials[].type` | `str` (optional) | e.g. `"chunk"`, `"document"` |
| `reference_materials[].relevance_score` | `float` (optional) | Retriever similarity score |
| `workflow_history` | `list[dict]` | Conversation turns for coherence scoring |
| `workflow_history[].role` | `str` | `"user"` or `"assistant"` |
| `workflow_history[].content` | `str` | Message content |
| `retrieval_query` | `str` (optional) | The rewritten query sent to the retriever |

See [Context-Aware RAG Evaluation](../../docs/concepts/context-aware-rag-evaluation.md) for the full concept doc with architecture diagrams and integration examples.
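
You can sanity-check an `evaluation_context` dict against the schema above before sending it. A minimal sketch (a hypothetical helper, not part of the SDK):

```python
def validate_evaluation_context(ctx):
    """Return a list of problems; an empty list means the context looks valid."""
    problems = []
    for chunk in ctx.get("reference_materials", []):
        # id and content are the only required chunk fields
        for field in ("id", "content"):
            if not isinstance(chunk.get(field), str):
                problems.append(f"reference_materials entry needs str '{field}': {chunk!r}")
        score = chunk.get("relevance_score")
        if score is not None and not isinstance(score, (int, float)):
            problems.append(f"relevance_score must be a number: {chunk!r}")
    for turn in ctx.get("workflow_history", []):
        if turn.get("role") not in ("user", "assistant"):
            problems.append(f"role must be 'user' or 'assistant': {turn!r}")
        if not isinstance(turn.get("content"), str):
            problems.append(f"workflow_history entry needs str 'content': {turn!r}")
    rq = ctx.get("retrieval_query")
    if rq is not None and not isinstance(rq, str):
        problems.append("retrieval_query must be a string")
    return problems
```

Run it before `evaluate_prompt` or `submit_response` and log (or raise on) any problems it returns.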

#### Integration with LangGraph / FastAPI streaming

```python
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from variably import VariablyClient

app = FastAPI()
client = VariablyClient({"api_key": "your-api-key"})

class ChatRequest(BaseModel):
    message: str
    session_id: str

async def stream_with_variably(query: str, session_id: str):
    """Yield NDJSON events from Variably streaming evaluation."""
    stream = client.evaluate_prompt_stream(
        experiment_key="my-experiment",
        user_context={"user_id": session_id},
        input_variables={"query": query},
    )

    for token in stream:
        yield json.dumps({"type": "token", "content": token}) + "\n"

    # Send final metadata
    if stream.metadata:
        yield json.dumps({
            "type": "stream_end",
            "content": stream.metadata.content,
        }) + "\n"

@app.post("/api/chat")
async def chat(request: ChatRequest):
    return StreamingResponse(
        stream_with_variably(request.message, request.session_id),
        media_type="application/x-ndjson",
    )
```

### Backend API: SSE Streaming Endpoint

The streaming endpoint uses Server-Sent Events (SSE). Here's the raw API:

**Endpoint:** `POST /api/v1/internal/sdk/prompt-experiments/evaluate-stream`

**Headers:**
```
X-API-Key: your-api-key
Content-Type: application/json
```

**Request body** (same as non-streaming evaluate):
```json
{
  "experiment_key": "rag-prompt-experiment",
  "user_context": {
    "userId": "user-123",
    "sessionId": "sess-456"
  },
  "input_variables": {
    "query": "What are the symptoms of Type 2 diabetes?"
  },
  "evaluation_context": {
    "reference_materials": [{"id": "chunk-1", "content": "...", "source": "...", "type": "chunk"}],
    "workflow_history": [{"role": "user", "content": "..."}],
    "retrieval_query": "diabetes symptoms type 2"
  }
}
```

**Response** (SSE stream):
```
event: token
data: {"content": "Type"}

event: token
data: {"content": " 2"}

event: token
data: {"content": " diabetes"}

event: token
data: {"content": " symptoms"}

event: token
data: {"content": " include..."}

event: metadata
data: {"experiment_id": "exp-123", "variant_id": "variant-a", "execution_id": "eval-789", "provider": "anthropic", "model": "claude-3-5-haiku-20241022", "prompt_tokens": 150, "completion_tokens": 85, "total_tokens": 235, "cost_usd": 0.000425, "latency_ms": 1250}

event: done
data: {}
```

**curl example:**
```bash
curl -N -X POST http://localhost:8080/api/v1/internal/sdk/prompt-experiments/evaluate-stream \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "experiment_key": "rag-prompt-experiment",
    "user_context": {"userId": "user-123", "sessionId": "sess-456"},
    "input_variables": {"query": "What are the symptoms of Type 2 diabetes?"}
  }'
```

**Error handling:** If an error occurs during streaming, an error event is sent:
```
event: error
data: {"message": "LLM generation failed: rate limit exceeded"}
```

## Configuration

```python
from variably import VariablyConfig, VariablyClient

config = VariablyConfig(
    api_key="your-api-key",
    base_url="https://api.variably.com",  # default: http://localhost:8080
    environment="production",  # default: development
    timeout=5000,  # timeout in milliseconds, default: 5000
    retry_attempts=3,  # default: 3
    enable_analytics=True,  # default: True
    cache={
        "ttl": 300,  # TTL in seconds, default: 300 (5 minutes)
        "max_size": 1000,  # default: 1000
        "enabled": True  # default: True
    },
    log_level="INFO"  # DEBUG, INFO, WARNING, ERROR
)

client = VariablyClient(config)
```

## Advanced Usage

### Environment Variables

You can create a client using environment variables:

```python
from variably import create_client_from_env

# Uses these environment variables:
# VARIABLY_API_KEY (required)
# VARIABLY_BASE_URL
# VARIABLY_ENVIRONMENT
# VARIABLY_TIMEOUT
# VARIABLY_RETRY_ATTEMPTS
# VARIABLY_ENABLE_ANALYTICS
# VARIABLY_LOG_LEVEL

client = create_client_from_env()
```

### Different Flag Types

```python
# Boolean flags
bool_value = client.evaluate_flag_bool("feature-enabled", False, user_context)

# String flags
string_value = client.evaluate_flag_string("theme", "light", user_context)

# Number flags
number_value = client.evaluate_flag_number("max-items", 10, user_context)

# JSON flags
json_value = client.evaluate_flag_json("config", {"timeout": 5000}, user_context)

# Get full evaluation details
result = client.evaluate_flag("feature-flag", "default", user_context)
print(f"Value: {result.value}, Reason: {result.reason}, Cache Hit: {result.cache_hit}")
```

### Batch Evaluation

```python
flags = client.evaluate_flags([
    "feature-a",
    "feature-b", 
    "feature-c"
], user_context)

print(flags["feature-a"].value)
```

### Event Tracking

```python
from datetime import datetime, timezone

# Single event
client.track({
    "name": "purchase_completed",
    "user_id": "user-123",
    "properties": {
        "amount": 99.99,
        "currency": "USD",
        "items": ["item-1", "item-2"]
    },
    "timestamp": datetime.now(timezone.utc)  # optional, auto-generated if not provided
})

# Batch events
client.track_batch([
    {"name": "page_view", "user_id": "user-123", "properties": {"page": "/home"}},
    {"name": "button_click", "user_id": "user-123", "properties": {"button": "cta"}}
])
```

### Cache Management

```python
# Clear cache
client.clear_cache()

# Get cache stats
stats = client.cache.get_stats()
print(stats)  # {"size": 10, "max_size": 1000, "enabled": True, "ttl": 300}
```

### Metrics

```python
# Get SDK metrics
metrics = client.get_metrics()
print(metrics)
# {
#     "api_calls": 25,
#     "cache_hits": 15,
#     "cache_misses": 10,
#     "errors": 1,
#     "average_latency": 45.2,
#     "cache_hit_rate": 0.6,
#     "error_rate": 0.04,
#     "flags_evaluated": 20,
#     "gates_evaluated": 5,
#     "events_tracked": 12,
#     "start_time": "2023-10-01T12:00:00Z",
#     "uptime_seconds": 3600
# }
```
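
The derived fields are simple ratios of the counters. Assuming these definitions (which reproduce the sample values above):

```python
# Counters from the example metrics payload
api_calls, cache_hits, cache_misses, errors = 25, 15, 10, 1

cache_hit_rate = cache_hits / (cache_hits + cache_misses)  # 15 / 25
error_rate = errors / api_calls                            # 1 / 25

print(cache_hit_rate, error_rate)  # 0.6 0.04
```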

### Context Manager

```python
# Use with context manager for automatic cleanup
with VariablyClient({"api_key": "your-api-key"}) as client:
    result = client.evaluate_flag_bool("feature", False, user_context)
    # client.close() is called automatically
```

### Custom Logger

```python
from variably import VariablyClient, create_logger

# Create custom logger
logger = create_logger(
    name="my-app",
    level="DEBUG",
    structured=True,  # JSON logging
    silent=False
)

# Client will use the custom logger
client = VariablyClient({
    "api_key": "your-api-key",
    "log_level": "DEBUG"
})
```

## Error Handling

```python
from variably import (
    VariablyError,
    NetworkError,
    AuthenticationError,
    ValidationError,
    RateLimitError,
    TimeoutError,
    ConfigurationError
)

try:
    result = client.evaluate_flag("my-flag", False, user_context)
except AuthenticationError:
    print("Invalid API key")
except NetworkError as e:
    print(f"Network error: {e.status_code}")
except ValidationError as e:
    print(f"Validation error in field: {e.field}")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after} seconds")
except TimeoutError:
    print("Request timed out")
except ConfigurationError as e:
    print(f"Configuration error in parameter: {e.parameter}")
except VariablyError as e:
    print(f"Variably SDK error: {e}")
```
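
Transient failures like `RateLimitError` and `NetworkError` are natural candidates for retry. A minimal backoff helper — a sketch, not part of the SDK — that honors a `retry_after` attribute when the exception carries one (as `RateLimitError` does above):

```python
import time

def call_with_retry(fn, retryable=(Exception,), max_attempts=3,
                    base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying retryable errors with exponential backoff.

    If the exception exposes retry_after, that delay is used instead
    of the computed backoff. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable as exc:
            if attempt == max_attempts - 1:
                raise
            delay = getattr(exc, "retry_after", None) or base_delay * (2 ** attempt)
            sleep(delay)
```

Example usage: `call_with_retry(lambda: client.evaluate_flag_bool("feature", False, user_context), retryable=(RateLimitError, NetworkError))`.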

## Type Hints

The SDK includes full type hints for better IDE support:

```python
from typing import Dict, Any
from variably import VariablyClient, UserContext, FlagResult

user_context: UserContext = {
    "user_id": "user-123",
    "email": "user@example.com",
    "attributes": {
        "plan": "premium",
        "signup_date": "2023-01-01"
    }
}

result: FlagResult = client.evaluate_flag("feature", False, user_context)
```

## Async Support

For async applications, you can wrap the synchronous client:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from variably import VariablyClient

class AsyncVariablyClient:
    def __init__(self, config):
        self.client = VariablyClient(config)
        self.executor = ThreadPoolExecutor(max_workers=4)
    
    async def evaluate_flag_bool(self, flag_key, default_value, user_context):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            self.executor,
            self.client.evaluate_flag_bool,
            flag_key, default_value, user_context
        )
    
    async def close(self):
        self.client.close()
        self.executor.shutdown(wait=True)

# Usage
async def main():
    client = AsyncVariablyClient({"api_key": "your-api-key"})
    
    result = await client.evaluate_flag_bool("feature", False, {
        "user_id": "user-123"
    })
    
    await client.close()

asyncio.run(main())
```

## Development

### Setup

```bash
# Install development dependencies
pip install -e ".[dev]"
```

### Testing

```bash
pytest
```

### Code Quality

```bash
# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Lint
flake8 src/ tests/

# Type check
mypy src/
```

## Publishing to PyPI

### Prerequisites

1. Create a PyPI account at https://pypi.org/account/register/
2. Generate an API token at https://pypi.org/manage/account/token/
   - Scope: select "Entire account" for first upload, or project-specific after that
3. Install build tools:
   ```bash
   pip3 install build twine
   ```

> **Note:** `build` and `twine` install to user site-packages and may not be on your PATH.
> Always use `python3 -m build` and `python3 -m twine` instead of bare `build`/`twine`.

### Configure PyPI credentials

Create `~/.pypirc`:

```ini
[distutils]
index-servers = pypi

[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE
```

Secure the file:

```bash
chmod 600 ~/.pypirc
```

### Build and publish

The version in the build output (e.g., `variably_sdk-2.0.0-py3-none-any.whl`) comes directly from `pyproject.toml`'s `version` field. PyPI rejects re-uploads of the same version — you must bump the version to publish again.

```bash
# 1. Clean previous builds
rm -rf dist/ build/ src/*.egg-info

# 2. Build sdist and wheel
python3 -m build

# 3. Verify the package (optional but recommended)
python3 -m twine check dist/*

# 4. Upload to TestPyPI first (optional, for dry-run)
python3 -m twine upload --repository testpypi dist/*

# 5. Upload to PyPI
python3 -m twine upload dist/*
```

### Verify the published package

```bash
pip3 install variably-sdk==2.4.0
python3 -c "from variably import VariablyClient, PromptVariant; print('OK')"
```

### Version bumping checklist

When releasing a new version, update these three files, then clean, build, and publish:

1. `src/variably/version.py` — `__version__`
2. `pyproject.toml` — `version`
3. `src/variably/http_client.py` — `User-Agent` header string

```bash
# Example: bumping from 2.0.0 to 2.0.1
# After updating the 3 files above:
rm -rf dist/ build/ src/*.egg-info
python3 -m build
python3 -m twine upload dist/*
```
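
A small consistency check can catch a missed file before you build. A sketch, assuming the version strings look like `__version__ = "2.4.0"`, `version = "2.4.0"`, and a `User-Agent` containing `variably-python-sdk/2.4.0` (the exact formats in your tree may differ):

```python
import re

# Regex per file in the checklist above (patterns are assumptions)
PATTERNS = {
    "src/variably/version.py": r'__version__\s*=\s*"([^"]+)"',
    "pyproject.toml": r'(?m)^version\s*=\s*"([^"]+)"',
    "src/variably/http_client.py": r'variably-python-sdk/([\d][\w.\-]*)',
}

def extract_version(text, pattern):
    """Return the first captured version string in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None

def check_versions(read=lambda p: open(p).read()):
    """Return {path: version} for the three files in the checklist."""
    return {path: extract_version(read(path), pat) for path, pat in PATTERNS.items()}
```

Before publishing: `versions = check_versions(); assert len(set(versions.values())) == 1, versions`.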

## Requirements

- Python 3.8+
- requests >= 2.25.0

## License

MIT License - see LICENSE file for details.
