Metadata-Version: 2.4
Name: woodwide-cli
Version: 0.1.0
Summary: Woodwide AI CLI -- agent-friendly command-line client to work with the Wood Wide models
Project-URL: Homepage, https://woodwide.ai
Project-URL: Repository, https://github.com/Wood-Wide-AI/wwai-main
Project-URL: Issues, https://github.com/Wood-Wide-AI/wwai-main/issues
Author: Woodwide AI
License-Expression: MIT
Keywords: agent,ai,cli,ml,woodwide
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.12
Requires-Dist: click>=8.1
Requires-Dist: httpx>=0.28
Description-Content-Type: text/markdown

# wwai CLI

Agent-friendly command-line client for the Woodwide AI API.

## Install

Recommended (isolated install — works anywhere, no virtualenv needed):

```bash
uv tool install woodwide-cli
# or
pipx install woodwide-cli
```

Or as a regular package in an existing environment:

```bash
pip install woodwide-cli
```

Once installed, `wwai` is on your `PATH`:

```bash
wwai --help
```

### Local development

If you're hacking on the CLI in this monorepo, install it editable from the
package directory:

```bash
cd wwai-cli
uv pip install -e .
wwai --help
```

## Authentication

Provide your API key using any of these methods. When more than one is set, the first in this order wins:

```bash
# 1. Flag
wwai --api-key sk_live_abc123 datasets list

# 2. Environment variable
export WOODWIDE_API_KEY=sk_live_abc123
wwai datasets list

# 3. Config file
wwai config set --api-key sk_live_abc123
wwai datasets list
```

Config is stored in `~/.wwai/config.json`.
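
The same precedence can be mirrored in a wrapper script with nested parameter expansion. This is a minimal sketch, not the CLI's actual implementation: `resolve_api_key` is a hypothetical helper, and the config-file value is passed in as an argument instead of being read from `~/.wwai/config.json`, because the file's schema is not documented here.

```bash
# Hypothetical helper mirroring the documented order: flag, env var, config file.
# $1 = value passed via --api-key (may be empty)
# $2 = value stored in the config file (may be empty)
resolve_api_key() {
  local flag_value="${1:-}" config_value="${2:-}"
  # Each fallback is used only when everything before it is empty or unset.
  local key="${flag_value:-${WOODWIDE_API_KEY:-$config_value}}"
  if [ -z "$key" ]; then
    echo "no API key found (flag, WOODWIDE_API_KEY, or config)" >&2
    return 1
  fi
  printf '%s\n' "$key"
}
```

The function fails with a non-zero status when all three sources are empty, so a script can stop before making any API call.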

### Pointing to a local API

```bash
wwai config set --api-key sk_test_... --base-url http://localhost:8080
# or
export WOODWIDE_BASE_URL=http://localhost:8080
```
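
To target the local API for a single invocation without touching the saved config, the variable can be scoped to one command. A small sketch: `wwai_local` is a hypothetical wrapper name, and it assumes the environment variable takes precedence over a saved `--base-url`, which this README only states explicitly for the API key.

```bash
# Hypothetical wrapper: run any wwai command against the local API only.
# The variable is set for this one call and does not leak into the shell session.
wwai_local() {
  WOODWIDE_BASE_URL=http://localhost:8080 wwai "$@"
}

# Usage: wwai_local datasets list
```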

## Commands

```
wwai config set       Save API key / base URL
wwai config show      Print current config (key redacted)

wwai datasets list      List datasets (supports --include-archived)
wwai datasets get       Get dataset details
wwai datasets create    Upload a CSV/Parquet file (multipart)
wwai datasets upload    Upload via signed URL (large files)
wwai datasets update    Update name/description
wwai datasets delete    Soft-delete a dataset
wwai datasets preview   Sample rows from the current version
wwai datasets lookup    Fetch rows by id column value
wwai datasets models    List models trained on this dataset
wwai datasets versions  List versions of a dataset
wwai datasets version   Get a specific dataset version

wwai models list        List models
wwai models get         Get model details
wwai models train       Train a new model
wwai models retrain     Retrain (dispatched via POST /models/train)
wwai models delete      Delete a model and all its versions
wwai models versions    List versions of a model
wwai models version     Get a specific model version

wwai infer sync         Synchronous file inference
wwai infer async        Async file inference
wwai infer batch        Async dataset inference (cached by default; --force-rerun to bypass)

wwai jobs list                List jobs
wwai jobs get                 Get job details
wwai jobs cancel              Cancel a job
wwai jobs retry               Retry a failed job
wwai jobs results             Get job results
wwai jobs wait                Poll until completion
wwai jobs dashboard-activity  Per-day counts + window aggregates

wwai org get          Get organization details

wwai api-keys list    List API keys
wwai api-keys get     Get API key details
wwai api-keys create  Create a new key (supports --scopes, --expires-at)
wwai api-keys update  Update name/scopes/expires_at
wwai api-keys revoke  Revoke a key (keeps audit trail)
wwai api-keys delete  Hard-delete a key (no audit trail)
wwai api-keys rotate  Rotate a key
```

Run `wwai <command> --help` for full options and examples on any command.

## Output format

Output is JSON by default (machine-readable). Use `--format table` for human-readable output:

```bash
wwai --format table datasets list
```

## End-to-end lifecycle

Every ML workflow follows the same three-phase pattern:

```
1. Upload data  -->  2. Train a model  -->  3. Run inference to get results
```

Ingestion, training, and async or batch inference run as jobs. Each returns
a `job_id`, which you poll until the job reaches a terminal status such as
`succeeded` or `failed`. The `--wait` flag automates this polling for you.
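
For scripts that need their own polling logic, the loop can be sketched roughly as follows. This is an illustration, not what `--wait` does internally. It assumes `wwai jobs get JOB_ID` prints JSON with a top-level `status` field, which this README does not confirm; `job_status` and `poll_job` are hypothetical names. The return codes mirror the exit codes listed under "Job exit codes".

```bash
# Hypothetical helper: print a job's status.
# Assumption: `wwai jobs get` emits JSON with a top-level "status" field.
job_status() {
  wwai jobs get "$1" | jq -r '.status'
}

# Poll until the job reaches a terminal status or the attempts run out.
# $1 = job id, $2 = max attempts (default 60), $3 = seconds between polls (default 5)
# Returns 0 on success, 1 on failure, 2 if the job is still not finished.
poll_job() {
  local job_id="$1" max_attempts="${2:-60}" interval="${3:-5}"
  local attempt=0 status
  while [ "$attempt" -lt "$max_attempts" ]; do
    status=$(job_status "$job_id")
    case "$status" in
      succeeded) return 0 ;;
      failed)    return 1 ;;
    esac
    # Any other status is treated as "not finished yet".
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 2
}
```

Only `succeeded` and `failed` are handled here; if the API reports other terminal statuses, add them to the `case` so the loop does not spin until the attempt limit.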

### Phase 1: Upload a dataset

Upload a CSV or Parquet file. This creates a dataset and an ingestion job
that processes the file into a queryable format.

```bash
wwai datasets create --file train.csv --name "sales-q4"
```

Output (JSON):

```json
{
  "dataset": { "id": "DATASET_ID", "version_id": "VERSION_ID", "version_number": 1 },
  "job_id": "INGEST_JOB_ID",
  "status": "queued"
}
```

The dataset is ready once the ingest job succeeds. You can wait for it:

```bash
wwai jobs wait INGEST_JOB_ID
```

### Phase 2: Train a model

Train a model on the dataset. You must specify the model type.
`prediction` models require `--label-column`; other types do not.

```bash
wwai models train \
  --dataset DATASET_ID \
  --type prediction \
  --label-column revenue
```

Output (JSON):

```json
{
  "model": { "id": "MODEL_ID", "version_id": "MV_ID", "version_number": 1 },
  "job_id": "TRAIN_JOB_ID",
  "status": "queued"
}
```

Training is async. You **must wait for it to finish** before running
inference. Use `--wait` to block until done:

```bash
wwai models train \
  --dataset DATASET_ID \
  --type prediction \
  --label-column revenue \
  --wait
```

Or poll separately:

```bash
wwai jobs wait TRAIN_JOB_ID --timeout 600
```

The model is ready for inference once the training job status is `succeeded`.

### Phase 3: Run inference to get results

Once a model is trained (`status: ready`), run inference to get
predictions on new data. There are three modes:

#### Sync inference (small files, results returned immediately)

Upload a CSV and get results back in one call. Best for small inputs.

```bash
wwai infer sync MODEL_ID --file new_data.csv --output-type json
```

The response body contains the prediction results directly. No job
polling needed.

#### Async file inference (large files)

Upload a file and get a `job_id` back immediately, then poll for results.

```bash
wwai infer async MODEL_ID --file large_data.csv --wait
```

Without `--wait`, you poll manually:

```bash
JOB_ID=$(wwai infer async MODEL_ID --file large_data.csv | jq -r '.job_id')
wwai jobs wait "$JOB_ID"
wwai jobs results "$JOB_ID"
```

`jobs results` returns a signed download URL for the results file:

```json
{
  "job_type": "infer_batch",
  "inference_results_uri": "https://storage.googleapis.com/...signed-url..."
}
```
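
Fetching the file behind that URL could look like the sketch below. `download_results` is a hypothetical helper; only the `inference_results_uri` field comes from the response shown above. The format of the downloaded file is not specified here, and signed URLs usually expire, so download soon after requesting the URL.

```bash
# Hypothetical helper: download the results file for a finished job.
# $1 = job id, $2 = local output path
download_results() {
  local job_id="$1" out="$2" url
  url=$(wwai jobs results "$job_id" | jq -r '.inference_results_uri')
  # jq -r prints the string "null" when the field is absent.
  if [ -z "$url" ] || [ "$url" = "null" ]; then
    echo "no inference_results_uri for job $job_id" >&2
    return 1
  fi
  # -f fails on HTTP errors, -sS hides progress but shows errors, -L follows redirects.
  curl -fsSL -o "$out" "$url"
}

# Usage: download_results "$JOB_ID" results.out
```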

#### Batch inference (run on an existing dataset)

Reference a dataset already in the system instead of uploading a file.
Best for large inputs that are already stored.

```bash
wwai infer batch MODEL_ID --dataset DATASET_ID --wait
```

The result format is the same as for async file inference: once the job
finishes, run `wwai jobs results` to get the download URL.

### Complete pipeline example

All three phases chained together:

```bash
# 1. Upload dataset and capture IDs
UPLOAD=$(wwai datasets create --file train.csv --name "sales-q4")
DATASET=$(echo "$UPLOAD" | jq -r '.dataset.id')
INGEST_JOB=$(echo "$UPLOAD" | jq -r '.job_id')

# Wait for ingestion to finish
wwai jobs wait "$INGEST_JOB"

# 2. Train model (--wait blocks until training succeeds)
TRAIN=$(wwai models train \
  --dataset "$DATASET" \
  --type prediction \
  --label-column revenue \
  --wait)
MODEL=$(echo "$TRAIN" | head -n 1 | jq -r '.model.id')

# 3. Run inference on new data
wwai infer sync "$MODEL" --file new_data.csv --output-type json
```
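
One pitfall when chaining commands this way: `jq -r` prints the literal string `null` when a field is missing, so a failed step can silently pass `null` on as an ID. A small guard, sketched here with a hypothetical `require_id` helper, stops the pipeline early instead:

```bash
# Hypothetical guard: fail if an extracted ID is empty or the string "null".
# $1 = label used in the error message, $2 = value extracted with jq
require_id() {
  local name="$1" value="$2"
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "missing $name in CLI output" >&2
    return 1
  fi
  printf '%s\n' "$value"
}

# Usage:
#   DATASET=$(require_id dataset.id "$(echo "$UPLOAD" | jq -r '.dataset.id')") || exit 1
```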

### Anomaly detection example

Anomaly models don't need a label column. Inference returns anomalous
row indices by default, or per-row scores with `--anomaly-format per_row`.

```bash
# Train
wwai models train --dataset "$DATASET" --type anomaly --wait

# Infer with per-row scores
wwai infer sync MODEL_ID --file transactions.csv --anomaly-format per_row --output-type json
```

### Retraining a model

Retraining creates a new version of an existing model from fresh data and
reuses the original config (type, label column, etc.):

```bash
wwai models retrain MODEL_ID --dataset NEW_DATASET_ID --wait
```

### Job exit codes

All `--wait` flags and `wwai jobs wait` use these exit codes:

| Code | Meaning |
|---|---|
| `0` | Job succeeded |
| `1` | Job failed, canceled, or rejected |
| `2` | Timeout (job still running) |

Status updates are printed to stderr; final JSON goes to stdout.
This means you can pipe stdout to `jq` while still seeing progress.
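
A script can branch on these codes while keeping the final JSON. The sketch below uses a hypothetical `wait_and_report` wrapper; the `--timeout 600` value is taken from the earlier training example and is only illustrative.

```bash
# Hypothetical wrapper: wait for a job, save its final JSON, and report the outcome.
# $1 = job id, $2 = file that receives the final JSON from stdout
wait_and_report() {
  local job_id="$1" out="$2" rc=0
  # Progress lines go to stderr and stay visible; only stdout is redirected.
  wwai jobs wait "$job_id" --timeout 600 > "$out" || rc=$?
  case "$rc" in
    0) echo "job $job_id succeeded" >&2 ;;
    1) echo "job $job_id failed, was canceled, or was rejected" >&2 ;;
    2) echo "job $job_id timed out while still running" >&2 ;;
    *) echo "job $job_id: unexpected exit code $rc" >&2 ;;
  esac
  return "$rc"
}
```

Capturing the code with `|| rc=$?` keeps the wrapper usable in scripts that run under `set -e`.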

## Model types

| Type | Description | Requires `--label-column` |
|---|---|---|
| `prediction` | Supervised classification/regression | Yes |
| `anomaly` | Anomaly detection | No |
| `embedding` | Vector embeddings | No |
| `clustering` | Unsupervised clustering | No |
| `factors` | Interpretable factor analysis | No |
| `search` | FAISS nearest-neighbor search | No |
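
The label-column rule in this table can be checked client-side before a job is submitted. A minimal sketch, assuming a hypothetical `train_model` wrapper; the CLI may well perform the same validation itself.

```bash
# Hypothetical wrapper: enforce the label-column rule before calling the CLI.
# $1 = dataset id, $2 = model type, $3 = label column (prediction only)
train_model() {
  local dataset="$1" type="$2" label="${3:-}"
  case "$type" in
    prediction)
      if [ -z "$label" ]; then
        echo "prediction models need a label column" >&2
        return 64  # usage error, kept distinct from the job exit codes 0-2
      fi
      wwai models train --dataset "$dataset" --type "$type" --label-column "$label" --wait
      ;;
    anomaly|embedding|clustering|factors|search)
      wwai models train --dataset "$dataset" --type "$type" --wait
      ;;
    *)
      echo "unknown model type: $type" >&2
      return 64
      ;;
  esac
}

# Usage: train_model "$DATASET" prediction revenue
```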

## Environment variables

| Variable | Description |
|---|---|
| `WOODWIDE_API_KEY` | API key (`sk_...`) |
| `WOODWIDE_BASE_URL` | API base URL (default: `https://api.woodwide.ai`) |

## Releasing

The CLI is published to PyPI as [`woodwide-cli`](https://pypi.org/project/woodwide-cli/)
by the `.github/workflows/publish-cli.yml` workflow, which runs on tag pushes
matching `cli-v*` and uploads via PyPI trusted publishing (OIDC, no API tokens).

To cut a release:

1. Bump `[project].version` in `wwai-cli/pyproject.toml` (semver).
2. Commit on `main` (or merge a PR that does).
3. Tag the release commit and push:

   ```bash
   git tag cli-v0.1.0
   git push origin cli-v0.1.0
   ```

The workflow checks that the tag (`cli-v<VERSION>`) matches the pyproject
version, builds sdist + wheel with `uv build`, validates with `twine check`,
and publishes via `pypa/gh-action-pypi-publish`.
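
The tag check can be reproduced locally before pushing. The sketch below is not the workflow's actual step: `check_tag_matches_version` is a hypothetical helper, and it assumes the first `version = "..."` line in `pyproject.toml` is the `[project]` version.

```bash
# Hypothetical check: compare a release tag with the version in pyproject.toml.
# $1 = tag (for example cli-v0.1.0), $2 = path to pyproject.toml
check_tag_matches_version() {
  local tag="$1" pyproject="$2" version
  # Take the first `version = "..."` line; assumed to be the [project] version.
  version=$(sed -n 's/^version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$pyproject" | head -n 1)
  if [ "$tag" != "cli-v$version" ]; then
    echo "tag $tag does not match pyproject version $version" >&2
    return 1
  fi
  echo "tag $tag matches version $version"
}

# Usage: check_tag_matches_version cli-v0.1.0 wwai-cli/pyproject.toml
```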

### One-time PyPI setup

Before the first publish, a maintainer with PyPI access must register the
trusted publisher (no token required):

1. Reserve the project on PyPI by uploading once manually, or pre-register a
   pending publisher at <https://pypi.org/manage/account/publishing/>.
2. Add a GitHub publisher with:
   - **Owner:** `Wood-Wide-AI`
   - **Repository:** `wwai-main`
   - **Workflow:** `publish-cli.yml`
   - **Environment:** `pypi`
3. In this repo, create a GitHub environment named `pypi` (Settings →
   Environments → New environment). No secrets are needed; optionally require
   reviewers for the publish step.
