Metadata-Version: 2.4
Name: dobby-collector
Version: 0.4.3
Summary: Telemetry collector for AI agents — stream runs, tools, LLM calls, and chains to the Dobby AI Control Plane for governance
Project-URL: Homepage, https://dobby-ai.com
Project-URL: Documentation, https://docs.dobby-ai.com/sdk/python-collector
Project-URL: Repository, https://github.com/gil-dobby/repo-dobby
Project-URL: Issues, https://github.com/gil-dobby/repo-dobby/issues
Author-email: Dobby AI <dev@dobby-ai.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agents,ai,autogen,compliance,crewai,governance,langchain,observability,telemetry
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Requires-Dist: httpx>=0.25.0
Requires-Dist: uuid-utils>=0.7.0
Provides-Extra: autogen
Requires-Dist: autogen-core>=0.4.0; extra == 'autogen'
Provides-Extra: crewai
Requires-Dist: crewai>=1.0.0; extra == 'crewai'
Provides-Extra: dev
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pytest-httpx>=0.30.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.3.0; extra == 'dev'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.2.0; extra == 'langchain'
Provides-Extra: openai-assistants
Requires-Dist: openai>=1.20.0; extra == 'openai-assistants'
Description-Content-Type: text/markdown

# dobby-collector

> Telemetry collector for AI agents — stream runs, tools, LLM calls, and chains to the **[Dobby AI Control Plane](https://dobby-ai.com)** for governance and compliance.

[![Status](https://img.shields.io/badge/status-alpha-orange.svg)](https://github.com/gil-dobby/repo-dobby)
[![Python](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

`dobby-collector` is the **customer-side** Python package for sending telemetry from any AI agent — LangChain, CrewAI, AutoGen, plain OpenAI/Anthropic SDKs, or custom code — to Dobby's governance plane. Every captured run gets evaluated by the Policy Scanner against your imported Compliance Packs (SOC 2, GDPR, EU AI Act, etc.) and surfaces violations in the Dobby dashboard.

## Install

```bash
pip install dobby-collector
```

## Quickstart

```python
from dobby_collector import init, track, span, start_run, end_run

# 1. Initialize once at agent startup
init(
    api_key="dsdk_...",       # Generate at /dashboard/workloads/connect/python-sdk
    connector_id="wc_...",       # Same wizard hands you the connector ID
    framework="langchain",        # or 'crewai' | 'autogen' | omit to auto-detect
)

# 2. Decorate tool functions (`db` stands in for your own database client)
@track(name="search_database", kind="tool")
def search_db(query: str) -> list:
    return db.execute(query)

# 3. Use spans for fine-grained capture (`retriever` is your own retriever object)
with span("retrieval", kind="tool", inputs={"query": "find AI startups"}):
    docs = retriever.invoke("find AI startups")

# 4. Wrap agent invocations in start/end_run (`my_agent` is your own agent)
run = start_run(name="weekly_report", inputs={"week": "2026-W19"})
try:
    output = my_agent.run("Generate the weekly report")
    end_run(run, outputs={"report": output}, status="success")
except Exception as e:
    end_run(run, error=str(e), status="error")
    raise
```

The SDK runs a background thread that flushes events every 10 seconds by default (`flush_interval_seconds`), or immediately on terminal `run.completed` / `run.failed` events. Telemetry **never blocks** your agent: when the in-memory buffer is full (`max_buffer_events`, 10,000 by default), the oldest events are dropped silently.
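
To make the drop-oldest behaviour concrete, here is a minimal sketch of a bounded buffer built on `collections.deque`. It is an illustration only, not the SDK's internal code; the `RingBuffer` class and its `push` / `drain` methods are hypothetical names.

```python
import threading
from collections import deque
from typing import Any, Deque, Dict, List


class RingBuffer:
    """Bounded FIFO sketch: appending to a full buffer evicts the oldest event."""

    def __init__(self, max_events: int) -> None:
        # deque(maxlen=N) silently discards items from the left once N is reached.
        self._events: Deque[Dict[str, Any]] = deque(maxlen=max_events)
        self._lock = threading.Lock()

    def push(self, event: Dict[str, Any]) -> None:
        # O(1) and never waits on the network, so the caller is not blocked.
        with self._lock:
            self._events.append(event)

    def drain(self) -> List[Dict[str, Any]]:
        # A sender thread would call this on each flush tick.
        with self._lock:
            batch = list(self._events)
            self._events.clear()
        return batch


buf = RingBuffer(max_events=3)
for i in range(5):
    buf.push({"seq": i})

batch = buf.drain()
print([e["seq"] for e in batch])  # prints [2, 3, 4]; events 0 and 1 were dropped
```

The trade-off in a design like this is that a long outage loses the oldest telemetry rather than growing memory without bound or stalling the agent.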

## What gets captured

| What | When | Captured fields |
|---|---|---|
| **Run boundaries** | `start_run` / `end_run` | inputs, outputs, status, duration |
| **Tool calls** | `@track(kind="tool")` or `span(kind="tool")` | tool name, args, output, duration |
| **LLM calls** | LangChain auto-instrument (`DobbyCallbackHandler`), or `@track(kind="llm")` | model, prompt, completion, tokens, latency |
| **Custom spans** | `@track()` / `span()` | name, inputs, outputs, duration |
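
As a rough mental model of what a tracking decorator records for the rows above, the sketch below wraps a function and captures name, inputs, outputs, status, and duration. It is not the SDK's implementation: `track_sketch`, the `captured` list, and the record keys are hypothetical and chosen only for illustration.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Stand-in for the SDK's event buffer (hypothetical).
captured: List[Dict[str, Any]] = []


def track_sketch(name: str, kind: str = "custom") -> Callable:
    """Illustrative decorator: records one event per call, success or failure."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            started = time.perf_counter()
            record: Dict[str, Any] = {
                "name": name,
                "kind": kind,
                "inputs": {"args": args, "kwargs": kwargs},
            }
            try:
                result = fn(*args, **kwargs)
                record["outputs"] = result
                record["status"] = "success"
                return result
            except Exception as exc:
                record["error"] = str(exc)
                record["status"] = "error"
                raise  # never swallow the caller's exception
            finally:
                record["duration_ms"] = (time.perf_counter() - started) * 1000.0
                captured.append(record)

        return wrapper

    return decorator


@track_sketch(name="add_numbers", kind="tool")
def add(a: int, b: int) -> int:
    return a + b


total = add(2, 3)
print(captured[0]["name"], captured[0]["status"])  # prints: add_numbers success
```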

## Config reference

```python
init(
    api_key="dsdk_...",              # REQUIRED — or set DOBBY_API_KEY env var
    connector_id="wc_...",              # REQUIRED — or set DOBBY_CONNECTOR_ID env var
    base_url="https://dobby-ai.com",    # Override for self-hosted; or DOBBY_BASE_URL env
    flush_interval_seconds=10.0,        # How often the sender thread flushes
    max_buffer_events=10_000,           # Ring buffer cap (oldest dropped on overflow)
    framework="auto",                   # Hint or pin: 'langchain' | 'crewai' | 'autogen'
    host_fingerprint=None,              # Optional hostname/container ID for multi-replica
    pii_redact=False,                   # Opt-in: redact emails/SSNs/credit cards
    exclude_fields=[],                  # Fields to scrub from `data` payloads
)
```
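
To illustrate what `pii_redact` and `exclude_fields` are meant to achieve, here is a small sketch that scrubs a payload before it would be sent. The `scrub` function, the regular expressions, and the `[REDACTED_*]` placeholders are assumptions made for this example; the SDK's actual patterns and replacement text may differ, and this sketch assumes excluded fields are dropped entirely.

```python
import re
from typing import Any, Dict, Iterable

# Simplified patterns for illustration; real-world PII detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(
    data: Dict[str, Any],
    exclude_fields: Iterable[str] = (),
    pii_redact: bool = False,
) -> Dict[str, Any]:
    """Return a copy of `data` with excluded keys removed and string values redacted."""
    excluded = set(exclude_fields)
    cleaned: Dict[str, Any] = {}
    for key, value in data.items():
        if key in excluded:
            continue  # assumption: excluded fields are dropped, not masked
        if pii_redact and isinstance(value, str):
            value = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
            value = SSN_RE.sub("[REDACTED_SSN]", value)
        cleaned[key] = value
    return cleaned


payload = {
    "query": "contact jane@example.com, SSN 123-45-6789",
    "session_token": "abc",
    "k": 5,
}
result = scrub(payload, exclude_fields=["session_token"], pii_redact=True)
print(result)
# prints {'query': 'contact [REDACTED_EMAIL], SSN [REDACTED_SSN]', 'k': 5}
```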

## Environment variables

The SDK reads these as fallbacks for `init()` args:

| Env var | Default | Purpose |
|---|---|---|
| `DOBBY_API_KEY` | — | Connector bearer token (`dsdk_*`) — minted by the wizard |
| `DOBBY_CONNECTOR_ID` | — | Workload connector ID (`wc_*`) |
| `DOBBY_BASE_URL` | `https://dobby-ai.com` | Dobby control-plane URL |

Set them in your deployment config and skip the corresponding `init()` args.
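
The fallback order can be pictured as "explicit argument first, then environment variable, then default". The sketch below shows that resolution under this assumption; `resolve_setting` is a hypothetical helper, not part of the SDK's public API.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://dobby-ai.com"


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    default: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """An explicit init() argument wins; otherwise fall back to env, then default."""
    if explicit is not None:
        return explicit
    return environ.get(env_var, default)


# A fake environment keeps the example deterministic.
fake_env = {"DOBBY_API_KEY": "dsdk_example", "DOBBY_CONNECTOR_ID": "wc_example"}

api_key = resolve_setting(None, "DOBBY_API_KEY", environ=fake_env)
connector_id = resolve_setting("wc_override", "DOBBY_CONNECTOR_ID", environ=fake_env)
base_url = resolve_setting(None, "DOBBY_BASE_URL", DEFAULT_BASE_URL, environ=fake_env)

print(api_key, connector_id, base_url)
# prints: dsdk_example wc_override https://dobby-ai.com
```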

## Lifecycle

```python
from dobby_collector import init, shutdown

init(...)
# ... your agent runs ...
shutdown(timeout_seconds=5.0)  # Drains buffer + stops sender thread
```

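`shutdown()` also fires via `atexit` if you forget, but call it explicitly when possible: `atexit` hooks run late in interpreter teardown with little time to drain, and Python skips them entirely when the process is killed by a signal it does not handle (for example, a SIGTERM with no handler installed).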
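
Python does not run `atexit` hooks when the process is terminated by a signal it has no handler for, so one option in containerised deployments is to drain explicitly on SIGTERM. The sketch below shows that pattern; `install_sigterm_drain` is a hypothetical helper, and it assumes that calling `shutdown()` a second time from the `atexit` hook is harmless.

```python
import signal
import sys
from typing import Any, Callable


def install_sigterm_drain(
    shutdown_fn: Callable[..., None],
    timeout_seconds: float = 5.0,
) -> Callable[[int, Any], None]:
    """Register a SIGTERM handler that drains telemetry, then exits normally."""

    def _on_sigterm(signum: int, frame: Any) -> None:
        shutdown_fn(timeout_seconds=timeout_seconds)
        # sys.exit raises SystemExit, so normal interpreter cleanup still runs.
        sys.exit(0)

    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, _on_sigterm)
    return _on_sigterm


# Wiring, after init() has been called:
#   from dobby_collector import shutdown
#   install_sigterm_drain(shutdown, timeout_seconds=5.0)
```

Keep the timeout below your orchestrator's termination grace period so the drain can finish before a hard kill.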

## Status — `v0.4.1` LIVE on PyPI · `v0.4.2` pending publish

**Shipped (Phases 1 → 6):**
- ✅ Manual API: `init` / `track` / `span` / `start_run` / `end_run` / `shutdown`
- ✅ In-memory ring buffer + background sender + HTTP retries
- ✅ SQLite DLQ — events survive network outages + process crashes
- ✅ PII redaction + field exclusion (`pii_redact=True` + `exclude_fields=[...]`)
- ✅ **LangChain auto-instrument** — `from dobby_collector.integrations.langchain import DobbyCallbackHandler`
- ✅ **CrewAI auto-instrument** — `from dobby_collector.integrations.crewai import DobbyCrewAIHandler` (CrewAI ≥ 1.0)
- ✅ **AutoGen auto-instrument** — `from dobby_collector.integrations.autogen import DobbyAutoGenLogHandler` (AutoGen ≥ 0.4)
- ✅ **OpenAI Assistants auto-instrument** — `from dobby_collector.integrations.openai_assistants import DobbyAssistantsHandler`
- ✅ **W3C Trace Context auto-emission** per outbound batch (Option C, v0.4.0)

**Quickstart pages** (customer-facing on dobby-ai.com):
- [`/docs/sdk/python-collector/crewai`](https://dobby-ai.com/docs/sdk/python-collector/crewai) — 5-minute CrewAI OSS instrumentation walkthrough
- [`/docs/sdk/python-collector/tracing`](https://dobby-ai.com/docs/sdk/python-collector/tracing) — how W3C `traceparent` unlocks the Surrounding-mode `Tracing Enabled` governance control
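
For readers unfamiliar with the header itself, a W3C `traceparent` value has four dash-separated hex fields: version, a 16-byte trace ID, an 8-byte parent (span) ID, and trace flags. The sketch below generates and validates one using only the standard library. It illustrates the header format, not how the SDK builds its batches; `new_traceparent` is a hypothetical helper.

```python
import re
import secrets

# Version 00 layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def new_traceparent(sampled: bool = True) -> str:
    """Build a version-00 traceparent header value with random IDs."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 lowercase hex chars
    parent_id = secrets.token_hex(8)   # 8 random bytes  -> 16 lowercase hex chars
    flags = "01" if sampled else "00"  # bit 0 is the "sampled" flag
    return f"00-{trace_id}-{parent_id}-{flags}"


header = new_traceparent()
print(header)
print(bool(TRACEPARENT_RE.match(header)))  # prints True
```

The specification also forbids all-zero trace and parent IDs; with random bytes that case is vanishingly unlikely, but a production implementation would check for it.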

**Coming up:**
- Node.js port (planned in `.claude/plans/parallel-sessions/node-collector-sdk-port-2026-05-16.md`)

See [CHANGELOG.md](./CHANGELOG.md) for the per-version detail.

## License

MIT — see [LICENSE](LICENSE).

## Runnable examples

Five end-to-end working agents you can run with `python` right after `pip install`. All of them make real LLM calls (no mocks); substitute the model and tools your stack uses:

- **[examples/langchain_real_agent.py](examples/langchain_real_agent.py)** — LangChain ReAct agent with DuckDuckGo search, instrumented via `DobbyCallbackHandler`. Install: `pip install dobby-collector[langchain]`.
- **[examples/crewai_real_agent.py](examples/crewai_real_agent.py)** — CrewAI sequential crew (Researcher + Writer) with optional web search tool, instrumented via `DobbyCrewAIHandler`. Requires `crewai>=1.0`. Install: `pip install dobby-collector[crewai]`.
- **[examples/autogen_real_agent.py](examples/autogen_real_agent.py)** — AutoGen AssistantAgent (gpt-4o-mini) with a Python tool function, instrumented via `DobbyAutoGenLogHandler`. Requires `autogen-core>=0.4`. Install: `pip install dobby-collector[autogen]`.
- **[examples/openai_assistants_real_agent.py](examples/openai_assistants_real_agent.py)** — OpenAI Assistants API with a function tool, streamed through `client.beta.threads.runs.stream()` with `DobbyAssistantsHandler`. Install: `pip install dobby-collector[openai-assistants]`.
- **[examples/manual_api_real_agent.py](examples/manual_api_real_agent.py)** — Plain Python + OpenAI SDK + `urllib`, instrumented via manual `@track` / `span` / `start_run` API. Recommended for custom orchestration / OpenAI Responses API / any framework without a Dobby auto-handler yet.

All scripts call `https://dobby-ai.com` and produce real `workload_runs` + `compliance_scans` rows visible in the UI within ~60s.

## Customer walkthrough

Step-by-step onboarding guide (10–15 min, written for non-Dobby engineers):

- [docs/customer-onboarding/python-sdk-walkthrough.md](https://github.com/gil-dobby/repo-dobby/blob/main/docs/customer-onboarding/python-sdk-walkthrough.md)

Covers: wizard navigation → install → run → verify in UI + 7 troubleshooting scenarios (auth errors / DLQ / proxies / serverless / PII redaction).

## Links

- Full spec: [docs/spikes/dobby-collector-sdk-spec.md](https://github.com/gil-dobby/repo-dobby/blob/main/docs/spikes/dobby-collector-sdk-spec.md)
- Dobby AI Platform: [dobby-ai.com](https://dobby-ai.com)
- Documentation: [docs.dobby-ai.com/sdk/python-collector](https://docs.dobby-ai.com/sdk/python-collector)
