Metadata-Version: 2.4
Name: promptvc-sdk
Version: 1.0.0b5
Summary: Automatic prompt version control for LLM applications
License: MIT
Author: Justice
Author-email: justice@promptvc.io
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Provides-Extra: privacy
Requires-Dist: opentelemetry-api (>=1.20)
Requires-Dist: opentelemetry-sdk (>=1.20)
Requires-Dist: requests (>=2.28)
Requires-Dist: urllib3 (>=2.7.0)
Description-Content-Type: text/markdown

# promptvc · Python SDK

Automatic prompt version control for LLM applications.  
Drop in two lines of code and every LLM call is captured, versioned, and observable from the [PromptVC dashboard](https://app.promptvc.io).

---

## Table of contents

- [How it works](#how-it-works)
- [Installation](#installation)
- [Quick start](#quick-start)
- [Configuration](#configuration)
- [Integrations](#integrations)
  - [OpenAI](#openai)
  - [Anthropic](#anthropic)
  - [LiteLLM](#litellm)
  - [LangChain](#langchain)
  - [Google ADK](#google-adk)
- [Context & metadata](#context--metadata)
- [Conversations](#conversations)
- [Named prompt assets](#named-prompt-assets)
- [Custom spans (no instrumentor)](#custom-spans-no-instrumentor)
- [PII redaction](#pii-redaction)
- [Testing](#testing)
- [Environment variables reference](#environment-variables-reference)

---

## How it works

PromptVC is built on [OpenTelemetry](https://opentelemetry.io/) and the [OpenInference](https://github.com/Arize-ai/openinference) semantic conventions.

1. `promptvc.configure_otel()` registers a custom OTel span exporter that forwards LLM traces to the PromptVC ingest API.
2. An **OpenInference instrumentor** (one per framework) monkey-patches your LLM client and emits standardised OTel spans automatically — no manual instrumentation needed.
3. The backend clusters spans into versioned prompt assets, tracks drift, and surfaces the diff view in the dashboard.

---

## Installation

```bash
pip install promptvc-sdk
```

Requires Python ≥ 3.10.

Install the OpenInference instrumentor for the framework(s) you use:

```bash
# OpenAI
pip install openinference-instrumentation-openai

# Anthropic
pip install openinference-instrumentation-anthropic

# LiteLLM (covers 100+ providers)
pip install openinference-instrumentation-litellm

# LangChain / LangGraph
pip install openinference-instrumentation-langchain

# Google ADK
pip install openinference-instrumentation-google-adk
```

**Optional — PII redaction:**

```bash
pip install 'promptvc-sdk[privacy]'
python -m spacy download en_core_web_md   # or your preferred model
```

---

## Quick start

```python
import promptvc
from openinference.instrumentation.openai import OpenAIInstrumentor

# 1. Wire up OTel → PromptVC
promptvc.configure_otel(
    api_key="pvc_live_xxx",   # or set PROMPTVC_API_KEY
    service="my-app",
    env="production",
)

# 2. Instrument your LLM client (call once at startup)
OpenAIInstrumentor().instrument()

# 3. Use your client as normal — calls are captured automatically
import openai

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

That's it. Every subsequent `client.chat.completions.create` call — in any file, any function — is captured without further changes.

---

## Configuration

### `promptvc.configure_otel()`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `api_key` | `str` | `PROMPTVC_API_KEY` env var | Your PromptVC API key |
| `service` | `str` | `"default"` | Logical name for this application |
| `env` | `str` | `"development"` | `"development"` · `"staging"` · `"production"` |
| `backend_url` | `str` | `https://ingest.promptvc.io` | Override ingest endpoint |
| `debug` | `bool` | `False` | Print exporter activity to stderr |

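Call `configure_otel()` **before** calling any instrumentor's `.instrument()`. The PII options (`redact_pii`, `pii_entities`, `pii_score_threshold`, `redact_patterns`, `pii_spacy_model`) are also passed to `configure_otel()` and are described under [PII redaction](#pii-redaction).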

---

## Integrations

### OpenAI

Works with `openai.OpenAI`, `openai.AsyncOpenAI`, and any OpenAI-compatible client (Azure OpenAI, OpenRouter, etc.).

```python
import promptvc
from openinference.instrumentation.openai import OpenAIInstrumentor

promptvc.configure_otel(api_key="pvc_live_xxx", service="my-app")
OpenAIInstrumentor().instrument()

import openai

client = openai.OpenAI()

# Non-streaming
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is a vector database?"},
    ],
)
print(response.choices[0].message.content)

# Streaming — fully supported
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    stream=True,
    messages=[{"role": "user", "content": "Tell me a joke."}],
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Async clients work identically — just use `openai.AsyncOpenAI()` and `await`.

---

### Anthropic

Works with `anthropic.Anthropic` and `anthropic.AsyncAnthropic`.

```python
import promptvc
from openinference.instrumentation.anthropic import AnthropicInstrumentor

promptvc.configure_otel(api_key="pvc_live_xxx", service="my-app")
AnthropicInstrumentor().instrument()

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain transformers briefly."}],
)
print(response.content[0].text)
```

---

### LiteLLM

Instruments every `litellm.completion` / `litellm.acompletion` call, covering 100+ providers through a single integration.

```python
import promptvc
from openinference.instrumentation.litellm import LiteLLMInstrumentor

promptvc.configure_otel(api_key="pvc_live_xxx", service="my-app")
LiteLLMInstrumentor().instrument()

import litellm

response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

---

### LangChain

No callbacks needed — `LangChainInstrumentor` auto-patches every LangChain provider, chain type, and invocation pattern (`invoke`, `stream`, `ainvoke`, `astream`, `batch`).

```python
import promptvc
from openinference.instrumentation.langchain import LangChainInstrumentor

promptvc.configure_otel(api_key="pvc_live_xxx", service="my-app")
LangChainInstrumentor().instrument()

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke([
    SystemMessage(content="You are a concise assistant."),
    HumanMessage(content="What is a binary search tree?"),
])
print(response.content)
```

**LCEL chains** are captured transparently:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = (
    ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}"),
    ])
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

result = chain.invoke({"question": "What is RAG?"})
```

---

### Google ADK

Instruments every model call made by an ADK agent. No callbacks needed.

```python
import promptvc
from openinference.instrumentation.google_adk import GoogleADKInstrumentor

promptvc.configure_otel(api_key="pvc_live_xxx", service="my-app")
GoogleADKInstrumentor().instrument()

from google.adk.agents import Agent

root_agent = Agent(
    name="my-agent",
    model="gemini-2.0-flash",
    instruction="You are a helpful assistant.",
)
```

Works with any ADK-supported model backend including LiteLLM-proxied models (`gpt-4o-mini`, `claude-*`, etc.).

---

## Context & metadata

Attach arbitrary metadata to every LLM call made within a block — useful for user-level analytics, A/B testing, and multi-tenant tracing.

```python
with promptvc.context(user_id="u_123", tier="pro", feature="chat"):
    response = client.chat.completions.create(...)
```

Contexts nest. Keys from the outer and inner blocks are merged, and an inner key overrides an outer key of the same name:

```python
with promptvc.context(user_id="u_123"):
    with promptvc.context(feature="summarizer"):
        response = client.chat.completions.create(...)
        # captured with user_id="u_123", feature="summarizer"
```
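
One way to picture the nesting rule is a dictionary held in a `contextvars.ContextVar` that each block extends and then restores. The sketch below is an illustration only, not the SDK's implementation; `sketch_context` and `current_metadata` are hypothetical names that stand in for `promptvc.context()` and whatever the exporter reads.

```python
import contextvars
from contextlib import contextmanager

# Holds the metadata visible to the code that is currently running.
_metadata = contextvars.ContextVar("metadata", default={})


@contextmanager
def sketch_context(**keys):
    """Hypothetical stand-in for promptvc.context(): merge keys for one block."""
    merged = {**_metadata.get(), **keys}  # inner keys win on a name clash
    token = _metadata.set(merged)
    try:
        yield merged
    finally:
        _metadata.reset(token)  # restore the outer metadata on exit


def current_metadata():
    """Return a copy of the metadata active at the call site."""
    return dict(_metadata.get())


with sketch_context(user_id="u_123", feature="chat"):
    with sketch_context(feature="summarizer"):
        inner = current_metadata()
    outer = current_metadata()

print(inner)  # {'user_id': 'u_123', 'feature': 'summarizer'}
print(outer)  # {'user_id': 'u_123', 'feature': 'chat'}
```

Because the value is reset on exit, the overridden `feature` key applies only inside the inner block.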

---

## Conversations

Group multi-turn calls under a shared `conversation_id` so the full dialogue is linked in the dashboard.

```python
with promptvc.conversation() as conv_id:
    r1 = client.chat.completions.create(...)
    r2 = client.chat.completions.create(...)
    # r1 and r2 share the same conversation_id
```

Pass an explicit ID to resume an existing conversation:

```python
with promptvc.conversation(conversation_id="existing-id"):
    ...
```

---

## Named prompt assets

### Automatic call-site capture

By default, PromptVC walks the call stack on every LLM span and records:

- **File** — the source file that initiated the call
- **Function** — the enclosing Python function name
- **Line** — the exact line number
- **Fingerprint** — a stable hash of file + function + source text used for version tracking

Different callers of the same shared LLM wrapper appear as separate entries in the dashboard automatically, with no decorators required.

```python
def generate_summary(text: str) -> str:
    # Call site captured automatically
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarise: {text}"}],
    )
    return response.choices[0].message.content
```
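
To make the idea concrete, here is a minimal sketch of call-site capture using only the standard library. It is not the SDK's code: `call_site` is a hypothetical helper, and hashing the file name, function name, and function source with SHA-256 is an assumption about how such a fingerprint could be built.

```python
import hashlib
import inspect


def call_site(depth=1):
    """Describe the function `depth` frames above this one."""
    frame = inspect.currentframe()
    for _ in range(depth):
        frame = frame.f_back
    code = frame.f_code
    try:
        source = inspect.getsource(code)
    except (OSError, TypeError):
        source = ""  # source unavailable, e.g. code typed into a REPL
    raw = f"{code.co_filename}\n{code.co_name}\n{source}"
    return {
        "file": code.co_filename,
        "function": code.co_name,
        "line": frame.f_lineno,
        "fingerprint": hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16],
    }


def shared_llm_wrapper():
    # depth=2 skips the wrapper itself and reports whoever called it.
    return call_site(depth=2)


def generate_summary():
    return shared_llm_wrapper()


def translate():
    return shared_llm_wrapper()


a = generate_summary()
b = translate()
print(a["function"], b["function"])  # generate_summary translate
```

The two callers share one wrapper but produce different fingerprints, which is why they can show up as separate entries.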

> **Tip:** Put LLM calls inside named Python functions rather than at module level so the call site shows a meaningful function name in the dashboard.

### `@promptvc.observe` — explicit asset names

Give a prompt a stable, human-readable name in the dashboard:

```python
@promptvc.observe(name="invoice-parser")
def parse_invoice(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": INVOICE_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```

### Version tracking

PromptVC automatically identifies prompt versions using your system prompt as the version signal. Two calls with the same system prompt — even with different user messages — are grouped under the same version. When you change the system prompt, a new version is created and the diff is surfaced in the dashboard.
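
The grouping rule can be sketched locally as follows. This is an illustration under stated assumptions, not the backend's algorithm: `version_key` is a hypothetical helper, and hashing the system messages with SHA-256 is just one simple way to derive a key that ignores user turns.

```python
import hashlib
from collections import defaultdict


def version_key(messages):
    """Hash only the system messages; user turns do not affect the key."""
    system_text = "\n".join(
        m["content"] for m in messages if m["role"] == "system"
    )
    return hashlib.sha256(system_text.encode("utf-8")).hexdigest()[:12]


calls = [
    [{"role": "system", "content": "You are a concise assistant."},
     {"role": "user", "content": "What is RAG?"}],
    [{"role": "system", "content": "You are a concise assistant."},
     {"role": "user", "content": "What is a hash table?"}],
    [{"role": "system", "content": "You are a concise, friendly assistant."},
     {"role": "user", "content": "What is RAG?"}],
]

# Group the user messages by the version key of their system prompt.
versions = defaultdict(list)
for messages in calls:
    versions[version_key(messages)].append(messages[-1]["content"])

print(len(versions))  # 2
```

The first two calls land in the same group despite different user messages; editing the system prompt in the third call produces a new key.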

---

## Custom spans (no instrumentor)

If there is no OpenInference instrumentor for your framework or HTTP client, use `promptvc.generation()` — a clean context manager that handles all OTel span creation for you.

```python
import promptvc

promptvc.configure_otel(api_key="pvc_live_xxx", service="my-app")

SYSTEM_PROMPT = "You are a concise assistant."
USER_MESSAGE  = "What is a hash table?"

with promptvc.generation(
    model="gpt-4o-mini",
    provider="openai",
    system=SYSTEM_PROMPT,
    user=USER_MESSAGE,
) as gen:
    reply, prompt_tokens, completion_tokens = my_llm_client(SYSTEM_PROMPT, USER_MESSAGE)
    gen.set_output(reply, input_tokens=prompt_tokens, output_tokens=completion_tokens)
```

`set_output()` records the response text and optional token counts. Call it before the `with` block exits. If you don't call it, the span is still closed cleanly — just without output attributes.

### Multi-turn conversations

Pass the full message list via `messages` instead of the `system`/`user` shorthand:

```python
with promptvc.generation(
    model="claude-sonnet-4-5",
    provider="anthropic",
    messages=[
        {"role": "system",    "content": SYSTEM_PROMPT},
        {"role": "user",      "content": turn_1},
        {"role": "assistant", "content": reply_1},
        {"role": "user",      "content": turn_2},
    ],
) as gen:
    reply = my_client.call(...)
    gen.set_output(reply)
```

### Parameters

| Parameter | Type | Description |
|---|---|---|
| `model` | `str` | Model identifier, e.g. `"gpt-4o-mini"` |
| `system` | `str` | System prompt shorthand |
| `user` | `str` | User message shorthand |
| `messages` | `list[dict]` | Full message list — overrides `system`/`user` if provided |
| `provider` | `str` | Provider name, e.g. `"openai"` / `"anthropic"` |
| `name` | `str` | OTel span name (default `"promptvc.generation"`) |
| `metadata` | `dict` | Arbitrary key/value pairs attached as `promptvc.*` span attributes |

### `set_output()` parameters

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | The model's plain-text response |
| `input_tokens` | `int` | Prompt token count (optional, for cost tracking) |
| `output_tokens` | `int` | Completion token count (optional, for cost tracking) |

See [examples/custom_span.py](examples/custom_span.py) for a complete runnable example.

---

## PII redaction

PromptVC can strip sensitive data from prompt and response content **before it leaves your process**: detected entities are replaced with placeholders, so they are not sent to the PromptVC backend in plain text. Redaction runs on the OTel export path using [Microsoft Presidio](https://github.com/microsoft/presidio), with [spaCy](https://spacy.io/) as the NLP engine.

### Installation

```bash
pip install 'promptvc-sdk[privacy]'
python -m spacy download en_core_web_md   # or en_core_web_sm for a smaller footprint
```

### Enabling redaction

Pass `redact_pii=True` to `configure_otel()`.  That's all that's required — a sensible set of entity types is included by default.

```python
promptvc.configure_otel(
    api_key="pvc_live_xxx",
    service="my-app",
    redact_pii=True,
)
```

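Each detected value is replaced with an `<ENTITY_TYPE>` placeholder in the span before the span is serialised and POSTed to the ingest API, so detected values do not travel over the network. Text that the detectors miss is sent unchanged, so preview the output before relying on it.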

### Default entity types

| Entity | Example input | Placeholder |
|--------|--------------|-------------|
| `PERSON` | Sarah Johnson | `<PERSON>` |
| `EMAIL_ADDRESS` | user@example.com | `<EMAIL_ADDRESS>` |
| `PHONE_NUMBER` | 800-555-0199 | `<PHONE_NUMBER>` |
| `US_SSN` | 123-45-6789 | `<US_SSN>` |
| `CREDIT_CARD` | 4111 1111 1111 1111 | `<CREDIT_CARD>` |
| `IP_ADDRESS` | 192.168.0.1 | `<IP_ADDRESS>` |
| `LOCATION` | 42 Elm Street | `<LOCATION>` |

### Customising entity types

Replace the default list entirely by passing `pii_entities`:

```python
promptvc.configure_otel(
    api_key="pvc_live_xxx",
    redact_pii=True,
    pii_entities=[
        "CREDIT_CARD",
        "US_SSN",
        "US_BANK_NUMBER",
        "IBAN_CODE",
        "EMAIL_ADDRESS",
        "PHONE_NUMBER",
        "PERSON",
        "LOCATION",
        "IP_ADDRESS",
        "URL",
        "US_PASSPORT",
        "US_DRIVER_LICENSE",
        "MEDICAL_LICENSE",
        "DATE_TIME",
    ],
)
```

Full list of supported entity types: [Presidio supported entities](https://microsoft.github.io/presidio/supported_entities/).

### Confidence threshold

Presidio assigns each detection a confidence score (0–1).  Detections below `pii_score_threshold` are ignored.  Lower values catch more — at the cost of more false positives.

```python
promptvc.configure_otel(
    api_key="pvc_live_xxx",
    redact_pii=True,
    pii_score_threshold=0.4,   # default is 0.5
)
```
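
The effect of the threshold can be sketched without Presidio. In the example below, `Detection` and `apply_detections` are hypothetical names, and the scores are invented for illustration; real scores come from Presidio's recognizers. The sketch assumes detections do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    entity_type: str
    start: int   # index of the first character of the match
    end: int     # index one past the last character
    score: float


def apply_detections(text, detections, threshold=0.5):
    """Replace every detection scoring at or above the threshold."""
    kept = [d for d in detections if d.score >= threshold]
    # Work from the end of the string so earlier offsets stay valid.
    for d in sorted(kept, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + f"<{d.entity_type}>" + text[d.end:]
    return text


text = "Call Sarah on 800-555-0199"
detections = [
    Detection("PERSON", 5, 10, 0.85),
    Detection("PHONE_NUMBER", 14, 26, 0.45),  # illustrative low score
]

print(apply_detections(text, detections, threshold=0.5))
# Call <PERSON> on 800-555-0199
print(apply_detections(text, detections, threshold=0.4))
# Call <PERSON> on <PHONE_NUMBER>
```

At 0.5 the low-scoring phone number slips through; lowering the threshold to 0.4 catches it.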

### Custom regex patterns

Add extra patterns (e.g. internal IDs, account numbers) via `redact_patterns`.  Each entry is a Python regex string; any match is replaced with `<REDACTED>`.

```python
promptvc.configure_otel(
    api_key="pvc_live_xxx",
    redact_pii=True,
    redact_patterns=[r"EMP-\d{6}", r"ACC-[A-Z0-9]{8}"],
)
```
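
It is worth checking patterns against sample text before deploying them. The sketch below applies the same two regexes with the standard `re` module; `apply_patterns` is a hypothetical helper that mirrors the documented behaviour (every match becomes `<REDACTED>`), not the SDK's own function.

```python
import re


def apply_patterns(text, patterns, placeholder="<REDACTED>"):
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        text = re.sub(pattern, placeholder, text)
    return text


patterns = [r"EMP-\d{6}", r"ACC-[A-Z0-9]{8}"]
sample = "Employee EMP-004213 owns account ACC-9F3K2L7Q, not EMP-12."

print(apply_patterns(sample, patterns))
# Employee <REDACTED> owns account <REDACTED>, not EMP-12.
```

Note that `EMP-\d{6}` also matches the first six digits of a longer number; add `\b` anchors if IDs must match exactly.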

### Using a lighter spaCy model

`en_core_web_md` is the default and gives better recall than `en_core_web_sm`.  Swap in `en_core_web_sm` if memory is constrained:

```bash
python -m spacy download en_core_web_sm
```

```python
promptvc.configure_otel(
    api_key="pvc_live_xxx",
    redact_pii=True,
    pii_spacy_model="en_core_web_sm",
)
```

### Previewing redaction locally

Before sending any traffic, you can verify what will be redacted by calling `redact_text` directly:

```python
from promptvc.privacy import redact_text
from promptvc.config import get_config

cfg = get_config()
raw = "Hi, my name is Sarah Johnson. Email: sarah.johnson@example.com"
redacted = redact_text(
    text=raw,
    entities=cfg.pii_entities,
    language=cfg.pii_language,
    threshold=cfg.pii_score_threshold,
    extra_patterns=cfg.redact_patterns,
    spacy_model=cfg.pii_spacy_model,
)
print(redacted)
# Hi, my name is <PERSON>. Email: <EMAIL_ADDRESS>
```

See [examples/pii_redaction.py](examples/pii_redaction.py) for a complete runnable example.

---

## Testing

Disable the SDK entirely in test environments so no spans are exported:

```bash
PROMPTVC_DISABLED=1 pytest
```

Or in code:

```python
import os
os.environ["PROMPTVC_DISABLED"] = "1"
import promptvc  # configure_otel becomes a no-op
```
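
If you use pytest, a `conftest.py` at the root of the test directory is a convenient place for this, because pytest imports it before it collects the test modules. The following is a minimal sketch; `sdk_disabled` is a hypothetical helper for asserting the switch in a test, not part of the SDK.

```python
# conftest.py
import os

# Set the kill switch before any test module imports promptvc.
# setdefault() keeps a value that was already exported in the shell.
os.environ.setdefault("PROMPTVC_DISABLED", "1")


def sdk_disabled():
    """Hypothetical helper: True when the documented kill switch is set."""
    return os.environ.get("PROMPTVC_DISABLED") == "1"
```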

Run the SDK's own tests:

```bash
poetry run pytest                   # unit tests only
poetry run pytest -m integration    # integration tests (requires API keys)
```

---

## Environment variables reference

| Variable | `configure_otel()` param | Description |
|----------|--------------------------|-------------|
| `PROMPTVC_API_KEY` | `api_key` | Your API key |
| `PROMPTVC_SERVICE` | `service` | Service name |
| `PROMPTVC_ENV` | `env` | Deployment environment |
| `PROMPTVC_BACKEND_URL` | `backend_url` | Override ingest endpoint |
| `PROMPTVC_DEBUG` | `debug` | Set to `1` to enable debug logging |
| `PROMPTVC_DISABLED` | — | Set to `1` to disable the SDK entirely |
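
The mapping above can be read as a simple resolution order. The sketch below assumes that an explicit argument wins over the environment variable, which in turn wins over the default; that precedence is an assumption, and `resolve_settings` is a hypothetical helper rather than an SDK function. The defaults are taken from the `configure_otel()` table.

```python
import os

DEFAULTS = {
    "api_key": None,
    "service": "default",
    "env": "development",
    "backend_url": "https://ingest.promptvc.io",
    "debug": False,
}
ENV_VARS = {
    "api_key": "PROMPTVC_API_KEY",
    "service": "PROMPTVC_SERVICE",
    "env": "PROMPTVC_ENV",
    "backend_url": "PROMPTVC_BACKEND_URL",
    "debug": "PROMPTVC_DEBUG",
}


def resolve_settings(environ=None, **explicit):
    """Assumed order: explicit argument, then env var, then default."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        if explicit.get(name) is not None:
            settings[name] = explicit[name]
        elif ENV_VARS[name] in environ:
            raw = environ[ENV_VARS[name]]
            # PROMPTVC_DEBUG is a flag: "1" enables it.
            settings[name] = (raw == "1") if name == "debug" else raw
        else:
            settings[name] = default
    return settings


resolved = resolve_settings(
    environ={
        "PROMPTVC_API_KEY": "pvc_live_xxx",
        "PROMPTVC_ENV": "staging",
        "PROMPTVC_DEBUG": "1",
    },
    service="my-app",
)
print(resolved["service"], resolved["env"], resolved["debug"])
# my-app staging True
```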

---

## License

MIT

