Metadata-Version: 2.4
Name: trajectory-sdk
Version: 0.1.6
Summary: SDK for importing and transforming agent traces into Trajectory format
License-Expression: MIT
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: langsmith>=0.1.0
Requires-Dist: requests>=2.28
Requires-Dist: google-cloud-storage>=2.0
Requires-Dist: tqdm>=4.60
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == "parquet"

# trajectory-sdk

Import agent traces from LangSmith (and other providers) into a standardized Trajectory format.

## Install

```bash
pip install trajectory-sdk
```

## Quick Start

### Individual conversation import

```python
import trajectory_sdk as tj

tj.init(provider="langsmith", api_key="lsv2_pt_...", project_id="...")

# List available conversations
conversations = tj.list_conversations()

# Import and save all conversations
trajectories = tj.import_conversations(conversations)
tj.save(trajectories, "./exports")
```

### Bulk export (E2E)

Export all conversations from a LangSmith project, parse them into Trajectories, and upload to GCS + BigQuery in three calls:

```python
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    api_key="lsv2_pt_...",
    project_id="...",
    workspace_id="...",
    destination_id="...",
)
trajectories = tj.import_conversations(bulk=True)
tj.upload(trajectories, dataset="my_dataset")
```

This automatically discovers all trace IDs, triggers a LangSmith bulk export, downloads the resulting Parquet file from GCS, and parses it into Trajectory objects.

## API

### `tj.init(*, provider, api_key, project_id, workspace_id, destination_id, storage_dir, debug)`

Configure the SDK. Call this once before any other function.

```python
tj.init(
    provider="langsmith",        # trace provider (default: "langsmith")
    api_key="lsv2_pt_...",       # provider API key (or set LANGSMITH_API_KEY env var)
    project_id="...",            # provider project/session ID
    workspace_id="...",          # LangSmith workspace/tenant ID (required for bulk export)
    destination_id="...",        # bulk export destination ID (required for bulk export)
    storage_dir="~/.trajectory", # local staging directory (default)
    debug=False,                 # enable debug logging (default: False)
)
```

### `tj.list_conversations(*, limit) -> list[ConversationSummary]`

List available conversations from the configured provider.

```python
conversations = tj.list_conversations(limit=100)
for c in conversations:
    print(c.conversation_id, c.num_turns)
```
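Summaries can also be filtered before import. A minimal sketch, using a stand-in dataclass with only the two fields shown above (the real `ConversationSummary` may carry more):

```python
from dataclasses import dataclass


@dataclass
class ConversationSummary:
    """Stand-in for illustration; mirrors the fields used above."""
    conversation_id: str
    num_turns: int


summaries = [
    ConversationSummary("cc_abc123", 12),
    ConversationSummary("cc_def456", 1),
]

# Keep only multi-turn conversations before importing
multi_turn = [c for c in summaries if c.num_turns > 1]
```

The filtered list can then be passed straight to `tj.import_conversations(multi_turn)`.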

### `tj.import_conversations(conversations=None, *, bulk, source, stage, redactor) -> list[Trajectory]`

Import conversations and return one Trajectory per conversation. Accepts a list of conversation ID strings or `ConversationSummary` objects, or `bulk=True` to import from a bulk export.

```python
# By ID
trajectories = tj.import_conversations(["cc_abc123", "cc_def456"])

# By ConversationSummary (from list_conversations)
conversations = tj.list_conversations()
trajectories = tj.import_conversations(conversations)

# Bulk export from a local parquet file
trajectories = tj.import_conversations(bulk=True, source="export.parquet")

# Live bulk export (triggers export, downloads, parses)
trajectories = tj.import_conversations(bulk=True)

# With optional PII redaction
trajectories = tj.import_conversations(["cc_abc123"], redactor=my_redactor)

# Without local staging
trajectories = tj.import_conversations(["cc_abc123"], stage=False)
```
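The redactor interface is not specified above; a plausible sketch, assuming a redactor is a callable that takes a text field and returns a sanitized copy (the signature is an assumption, not confirmed by the SDK):

```python
import re


def my_redactor(text: str) -> str:
    """Hypothetical redactor: mask email addresses before import.

    The exact contract expected by `import_conversations` is an
    assumption; adapt this to the SDK's actual redactor interface.
    """
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED_EMAIL]", text)
```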

### `tj.upload(trajectories, dataset)`

Upload trajectories to GCS and BigQuery.

```python
tj.upload(trajectories, dataset="my_dataset")
```

### `tj.save(trajectories, output_dir)`

Save trajectories to local JSON files. Each trajectory is written to `{output_dir}/{conversation_id}.json`.

```python
# Save all
tj.save(trajectories, "./exports")

# Save a single trajectory
tj.save(trajectories[0], "./exports")
```
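Because each file is plain JSON named after its conversation ID, saved exports can be read back without the SDK. A minimal sketch of such a loader (`load_trajectories` is a helper written here, not part of the SDK; the trajectory JSON schema itself is not assumed):

```python
import json
from pathlib import Path


def load_trajectories(output_dir: str) -> dict[str, dict]:
    """Read back files saved as {output_dir}/{conversation_id}.json.

    Returns a mapping of conversation_id -> parsed JSON payload.
    """
    result = {}
    for path in Path(output_dir).glob("*.json"):
        with path.open() as f:
            result[path.stem] = json.load(f)
    return result
```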

## Full Example

```python
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    api_key="lsv2_pt_...",
    project_id="...",
    workspace_id="...",
    destination_id="...",
)

# Bulk export everything and upload
trajectories = tj.import_conversations(bulk=True)
tj.upload(trajectories, dataset="production_traces")

print(f"Exported {len(trajectories)} trajectories")
for t in trajectories:
    print(f"  {t.task.conversation_id}: {t.task.num_turns} turns, {len(t.steps)} steps")
```
