Metadata-Version: 2.4
Name: vd3
Version: 0.2.0
Summary: Content database library and CLI for VisData 3
Project-URL: Homepage, https://github.com/jmuncaster/vd3
Project-URL: Repository, https://github.com/jmuncaster/vd3
Project-URL: Issues, https://github.com/jmuncaster/vd3/issues
Author-email: Justin Muncaster <justin@muncasterconsulting.com>
License-Expression: MIT
Keywords: computer-vision,dataset,dvc,video
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: duckdb>=1.1.0
Requires-Dist: dvc-azure>=3.0
Requires-Dist: dvc-gdrive>=3.0
Requires-Dist: dvc-gs>=3.0
Requires-Dist: dvc-s3>=3.0
Requires-Dist: dvc>=3.50.0
Requires-Dist: ffmpeg-python>=0.2.0
Requires-Dist: orjson>=3.10.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.12.0
Description-Content-Type: text/markdown

# VD3Storage

Content database library and CLI for VisData 3. Manages video and imageset assets, annotations, and worksets backed by MP4/JSON media and CSV-based metadata, with [DVC](https://dvc.org)-managed remote storage.

## Installation

```bash
uv sync
```

To use vd3 as a dependency in another project:

```toml
# pyproject.toml
[project]
dependencies = ["vd3"]
```

## Quick Start

```bash
# Initialize a content database in the current directory
vd3 init

# ...or in a specific directory
vd3 init /path/to/mydb

# Add a video under a datasource
vd3 add video clip.mp4 -d my-datasource -p /path/to/mydb

# Add multiple videos with a glob (quote to prevent shell expansion)
vd3 add video '*.mp4' -d my-datasource -p /path/to/mydb

# List assets
vd3 list assets -p /path/to/mydb

# Show media availability
vd3 media status -p /path/to/mydb
```

## Core Concepts

- **Datasource** — groups assets by origin (e.g. `dashcam-2024`, `test-data`). Required when importing.
- **Asset** — a single video (MP4 + JSON metadata) or imageset (directory of images).
- **Workset** — a named subset of assets, optionally organized into packages (folders). Independent of storage layout.
- **Annotation layer** — detections or tracks attached to an asset, with a key (e.g. `gt`, `det/yolo-v8`) and a `human` or `machine` source.
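
The relationships between these concepts can be sketched as plain dataclasses. This is purely illustrative: the field and class names below mirror the concepts above, not vd3's actual internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    asset_id: str
    name: str
    asset_type: str   # "video" or "imageset"
    datasource: str   # origin group, e.g. "dashcam-2024"

@dataclass
class AnnotationLayer:
    asset_id: str
    key: str          # e.g. "gt" or "det/yolo-v8"
    source: str       # "human" or "machine"

@dataclass
class Workset:
    name: str
    # package name -> asset IDs; independent of where media lives on disk
    packages: dict[str, list[str]] = field(default_factory=dict)

ws = Workset("my-experiment")
ws.packages.setdefault("batch1", []).append("clip-001")
```

The key point the sketch captures: a workset references assets by ID, so membership and packaging never move media files around.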

## Adding Assets

### Videos

```bash
# Single file
vd3 add video clip.mp4 -d dashcam

# Glob (recursive)
vd3 add video 'rawdata/**/*.mp4' -d dashcam

# Force re-import of a duplicate (matched by SHA-256)
vd3 add video clip.mp4 -d dashcam --force

# Add and assign to a workset/package
vd3 add video clip.mp4 -d dashcam -w my-workset -k batch1
```
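
The `--force` flag exists because imports are deduplicated by content hash. A minimal sketch of the idea (the `try_import` function and `seen` registry are made-up names for illustration; vd3 only documents that duplicates are matched by SHA-256):

```python
import hashlib

seen: dict[str, str] = {}  # content digest -> asset name

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def try_import(name: str, data: bytes, force: bool = False) -> bool:
    """Register an asset; skip when identical bytes were already imported."""
    digest = sha256_of(data)
    if digest in seen and not force:
        return False  # duplicate content: skipped
    seen[digest] = name
    return True
```

Because the digest is computed over file contents, a renamed copy of an already-imported clip is still detected as a duplicate.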

### Imagesets

```bash
# Directory of images
vd3 add imageset /path/to/images -d my-datasource

# Tar archive
vd3 add imageset images.tar -d my-datasource
```

### Annotation results

Import VD3 JSON detections/tracks into an existing asset:

```bash
vd3 add result results.json -a clip -p /path/to/mydb
```

### COCO

Import COCO annotations into an existing imageset:

```bash
vd3 add coco annotations.json -a my-imageset \
    --layer gt --source human --reviewed-all
```

Import a full COCO dataset (creates the imageset and imports annotations):

```bash
vd3 add coco-dataset annotations.json -d my-datasource \
    --image-root /path/to/images --layer gt
```
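
The file passed to `add coco` / `add coco-dataset` follows the standard COCO layout: top-level `images`, `annotations`, and `categories` arrays linked by IDs. A minimal skeleton (values are made up; this shows the shape, not vd3-specific requirements):

```python
import json

# Minimal COCO skeleton: images, annotations, and categories linked by IDs.
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "car"},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels; image_id and category_id
        # reference the entries above
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 120, 50, 40], "area": 2000.0, "iscrowd": 0},
    ],
}

# Round-trip through JSON, as the importer would read it from disk.
reloaded = json.loads(json.dumps(coco))
```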

## Worksets

```bash
# Create
vd3 workset create "My Experiment"

# Add assets by name or ID
vd3 workset add-asset my-experiment clip-001 clip-002

# ...or by media-path glob (run from the database root; files must be on disk)
cd /path/to/mydb
vd3 workset add-asset my-experiment 'db/media/videos/fc/*.mp4'

# Inspect
vd3 workset list
vd3 workset show my-experiment

# Remove an asset / delete the workset
vd3 workset remove-asset my-experiment clip-001
vd3 workset delete my-experiment
```
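
The glob form of `workset add-asset` matches against media paths on disk. The matching semantics can be illustrated with stdlib `fnmatch` (the paths below are invented examples, not a real database layout):

```python
from fnmatch import fnmatch

media_paths = [
    "db/media/videos/fc/clip-001.mp4",
    "db/media/videos/fc/clip-002.mp4",
    "db/media/videos/ab/other.mp4",
]

# Only assets whose media path matches the pattern are selected.
pattern = "db/media/videos/fc/*.mp4"
selected = [p for p in media_paths if fnmatch(p, pattern)]
```

Note that this is why the command must be run from the database root with media present: there is nothing to match against if the files have not been pulled.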

## Remote Storage

Media files are tracked by DVC. A content database has a single configured remote.

```bash
# Set the remote (replaces any existing one)
vd3 media remote set gs://my-bucket/vd3-data
vd3 media remote show

# Sync
vd3 media push
vd3 media pull
vd3 media status
```

Supported backends:

| Backend | URL form | Notes |
|---|---|---|
| Google Cloud Storage | `gs://bucket/path` | `gcloud auth application-default login` |
| Amazon S3 | `s3://bucket/path` | Standard AWS credential chain |
| Azure Blob Storage | `azure://container/path` | |
| Google Drive | `gdrive://folder-id` | via `dvc-gdrive` |
| Local / NAS | `/mnt/nas/vd3-backup` | |
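
Each remote URL splits into a scheme, which selects the backend, plus a bucket/container and path; a plain filesystem path has no scheme and is treated as local storage. A quick sanity check with stdlib `urllib.parse`:

```python
from urllib.parse import urlparse

u = urlparse("gs://my-bucket/vd3-data")
# scheme picks the backend; netloc is the bucket; path is the prefix inside it
assert u.scheme == "gs"
assert u.netloc == "my-bucket"
assert u.path == "/vd3-data"

# A bare filesystem path parses with an empty scheme: local / NAS storage.
assert urlparse("/mnt/nas/vd3-backup").scheme == ""
```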

## Listing & Inspection

```bash
vd3 list assets             # all assets (filterable)
vd3 list datasources        # all datasources
vd3 list layers -a clip     # annotation layers on an asset
vd3 show clip               # asset details
vd3 info                    # database overview
vd3 query "SELECT ..."      # raw DuckDB SQL against the CSV tables
```
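
`vd3 query` hands the SQL to DuckDB, which reads the CSV metadata tables directly. The same idea can be sketched self-contained with stdlib `csv` and `sqlite3` standing in for DuckDB (the column names below are illustrative, not vd3's exact schema):

```python
import csv
import io
import sqlite3

# A toy "assets" table, in the spirit of one CSV file per table.
assets_csv = """asset_id,name,asset_type,frame_count
a1,clip-001,video,300
a2,my-imageset,imageset,42
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (asset_id, name, asset_type, frame_count)")
rows = csv.DictReader(io.StringIO(assets_csv))
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?)",
    [(r["asset_id"], r["name"], r["asset_type"], r["frame_count"]) for r in rows],
)

result = conn.execute(
    "SELECT name, frame_count FROM assets WHERE asset_type = 'video'"
).fetchall()
# -> [('clip-001', '300')]  (values stay as CSV strings in this sketch)
```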

## Exporting

```bash
# Extract frames from a video or images from an imageset
vd3 export frames clip -o ./out
```

## Library API

The CLI is a thin wrapper around the `VD3Storage` class, which can also be used directly from Python.

```python
from vd3storage import VD3Storage

# Open an existing database (or use VD3Storage.init(path) to create one)
storage = VD3Storage("/path/to/mydb")

# Browse assets
for a in storage.list_assets(datasource="dashcam"):
    print(f"{a.name} ({a.asset_type}): {a.frame_count} frames @ {a.nominal_fps} fps")

# Look up by (datasource, name) or by ID
clip = storage.get_asset("dashcam", "clip-001")
clip = storage.get_asset_by_id("3f1a...")

# Import a video
asset = storage.import_video("clip.mp4", datasource="dashcam")

# Resolve where the media file lives on disk
storage.resolve_media_path(clip)

# Annotation layers
storage.list_annotation_layers(clip.asset_id)
storage.read_annotation_layer(clip.asset_id, "gt")

# Worksets
ws = storage.create_workset("My Experiment")
storage.add_asset_to_workset(ws.workset_id, clip.asset_id, package="batch1")
storage.list_workset_assets(ws.workset_id)

# Raw DuckDB SQL against the underlying CSV tables
rows = storage.execute_sql("SELECT name, frame_count FROM assets WHERE asset_type = 'video'")
```

Other useful methods: `import_imageset`, `import_coco`, `import_coco_dataset`, `import_result`, `export_coco`, `open_video`, `open_imageset`, `get_frame_image`, `add_tag`, `is_media_available`, `pull`, `push`. Inspect `help(VD3Storage)` for the full surface.

## CLI Reference

```
vd3 --help              Top-level help
vd3 <command> --help    Help for a specific command
```

| Command | Description |
|---|---|
| `init` | Initialize a content database (defaults to cwd) |
| `info` | Show database overview |
| `show` | Show asset details |
| `query` | Run raw DuckDB SQL against the CSV tables |
| `remove` | Delete an asset |
| `add video` | Import video files |
| `add imageset` | Import an imageset (directory or tar) |
| `add result` | Import VD3 JSON detections/tracks |
| `add coco` | Import COCO annotations into an existing imageset |
| `add coco-dataset` | Import a COCO dataset (imageset + annotations) |
| `list assets` | List assets |
| `list datasources` | List datasources |
| `list layers` | List annotation layers for an asset |
| `workset create` | Create a workset |
| `workset list` | List worksets |
| `workset show` | Show workset details |
| `workset add-asset` | Add assets to a workset |
| `workset remove-asset` | Remove an asset from a workset |
| `workset delete` | Delete a workset (assets are kept) |
| `media status` | Show media availability |
| `media push` | Push media to remote storage |
| `media pull` | Pull media from remote storage |
| `media remote set` | Set the remote storage URL |
| `media remote show` | Show the configured remote |
| `export frames` | Extract frames from a video or imageset |
