Metadata-Version: 2.4
Name: copyspace-guard
Version: 0.2.2
Summary: Deterministic data-movement audit, validation and optimization reports
Author: Dmitry Bortoq
Maintainer: Dmitry Bortoq
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/bortoq/copyspace-guard
Project-URL: Repository, https://github.com/bortoq/copyspace-guard
Project-URL: Issues, https://github.com/bortoq/copyspace-guard/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Benchmark
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: coverage>=7; extra == "dev"
Requires-Dist: hypothesis>=6.100; extra == "dev"
Requires-Dist: jsonschema>=4.22; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: ruff>=0.5; extra == "dev"
Requires-Dist: setuptools>=77; extra == "dev"
Requires-Dist: twine>=5.1; extra == "dev"
Requires-Dist: wheel>=0.43; extra == "dev"
Dynamic: license-file

# Copy-Space Guard

[![CI](https://github.com/bortoq/copyspace-guard/actions/workflows/ci.yml/badge.svg)](https://github.com/bortoq/copyspace-guard/actions/workflows/ci.yml)
[![Coverage](https://img.shields.io/badge/coverage-94%25-brightgreen.svg)](https://github.com/bortoq/copyspace-guard/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](pyproject.toml)


**Copy-Space Guard** is a metadata-only CLI for deterministic data-movement audits and CI regression gates. Status: production-oriented pilot / v0.2.

It takes a transfer demand matrix (`src_slot,dst_slot,bits_total`), validates schedules under a declared resource model, compares a baseline or customer schedule against a deterministic greedy candidate, and produces sales/engineering reports with lower-bound gap, utilization and estimated savings.

This package is intentionally small and pilot-friendly:

- no external Python dependencies;
- no payload data required;
- deterministic output for CI and regression tracking;
- machine-readable JSON plus human-readable Markdown/HTML reports.

![Copy-Space Guard report preview](docs/assets/report-preview.svg)

## Product promise

> Give us one transfer trace or demand matrix. In a few days we show whether your data-movement plan is conflict-free, how far it is from a deterministic lower bound, and what CI gate can prevent future regressions.

## Quickstart

From this directory:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -e .
copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --id ai-staging-ring15 \
  --roi examples/roi.yml \
  --outdir artifacts/demo
```

Open:

- `artifacts/demo/report.html`
- `artifacts/demo/report.md`
- `artifacts/demo/summary.json`

Expected terminal shape:

```text
baseline: status=PASS ticks=768 lb=549 gap=0.398907 util=0.7143
greedy:   status=PASS ticks=549 lb=549 gap=0.000000 util=0.9992
saved_ticks=219 estimated_savings=9.73
```

The exact numbers depend on the input CSV and bandwidth value.
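
The sample figures above are consistent with a gap measured as excess ticks relative to the lower bound. The following is a minimal sketch under that assumption; the function names are illustrative and not part of the package, and `docs/BOUNDS.md` remains the authoritative definition.

```python
def gap_to_lower_bound(ticks: int, lower_bound: int) -> float:
    """Relative excess of a schedule length over the lower bound (assumed definition)."""
    if lower_bound <= 0:
        raise ValueError("lower_bound must be positive")
    return (ticks - lower_bound) / lower_bound


def saved_ticks(baseline_ticks: int, candidate_ticks: int) -> int:
    """Ticks saved by the candidate schedule relative to the baseline."""
    return baseline_ticks - candidate_ticks


if __name__ == "__main__":
    # Tick counts and lower bound taken from the sample terminal output.
    print(f"baseline gap={gap_to_lower_bound(768, 549):.6f}")
    print(f"greedy   gap={gap_to_lower_bound(549, 549):.6f}")
    print(f"saved_ticks={saved_ticks(768, 549)}")
```

With the sample values this gives a baseline gap of 219/549 and a greedy gap of zero, which matches the printed `gap` fields.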

## Input format

CSV with header:

```csv
src_slot,dst_slot,bits_total
0,1,65536
1,2,65536
```

Meaning:

- `src_slot` — source endpoint ID;
- `dst_slot` — destination endpoint ID;
- `bits_total` — transfer volume from source to destination.

Duplicate pairs are automatically merged.
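
As an illustration of what merging can look like, the sketch below sums `bits_total` over repeated `(src_slot, dst_slot)` pairs. Summation is an assumption about the merge rule, and `merge_demands` is a hypothetical helper, not the package's API.

```python
import csv
import io
from collections import defaultdict


def merge_demands(csv_text: str) -> dict[tuple[int, int], int]:
    """Sum bits_total over repeated (src_slot, dst_slot) pairs."""
    merged: dict[tuple[int, int], int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        pair = (int(row["src_slot"]), int(row["dst_slot"]))
        merged[pair] += int(row["bits_total"])
    return dict(merged)


# The pair (0, 1) appears twice and is merged into one demand.
SAMPLE = """src_slot,dst_slot,bits_total
0,1,65536
1,2,65536
0,1,1024
"""

if __name__ == "__main__":
    for (src, dst), bits in sorted(merge_demands(SAMPLE).items()):
        print(src, dst, bits)
```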

## Models v0: STRICT1 and READ1_WRITE1

`STRICT1`: within one tick, each slot can participate in at most one transfer, either as source or destination.

`READ1_WRITE1`: within one tick, each slot may send at most once and receive at most once.

Together, these two models are a useful baseline for:

- endpoint-limited transfer systems;
- shuffle/staging/replication analysis;
- CI regression gates;
- comparing scheduler strategies;
- first customer audits where full topology is not yet modeled.

Neither model is a universal network model. For real pilots, confirm which of the two fits the client's system and whether extensions are needed, such as broadcast, topology-aware bandwidth, asymmetric links or tier-aware storage constraints.
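
The per-tick rules of the two models can be sketched as a small conflict check. This is an illustrative reading of the definitions in this section, not the package's validator; `tick_conflicts` and its message format are hypothetical.

```python
from collections import Counter

Transfer = tuple[int, int]  # (src_slot, dst_slot)


def tick_conflicts(transfers: list[Transfer], model: str) -> list[str]:
    """Return human-readable conflicts for the transfers scheduled in one tick."""
    sends = Counter(src for src, _ in transfers)
    recvs = Counter(dst for _, dst in transfers)
    conflicts: list[str] = []
    if model == "STRICT1":
        # A slot may appear at most once per tick, in either role.
        for slot, count in sorted((sends + recvs).items()):
            if count > 1:
                conflicts.append(f"slot {slot} participates in {count} transfers")
    elif model == "READ1_WRITE1":
        # Sending and receiving are limited independently.
        for slot, count in sorted(sends.items()):
            if count > 1:
                conflicts.append(f"slot {slot} sends {count} times")
        for slot, count in sorted(recvs.items()):
            if count > 1:
                conflicts.append(f"slot {slot} receives {count} times")
    else:
        raise ValueError(f"unknown model: {model}")
    return conflicts


if __name__ == "__main__":
    chain = [(0, 1), (1, 2)]  # slot 1 receives and sends in the same tick
    print(tick_conflicts(chain, "STRICT1"))
    print(tick_conflicts(chain, "READ1_WRITE1"))
```

The chained pair `0→1`, `1→2` shows the difference: slot 1 both receives and sends, which `STRICT1` rejects and `READ1_WRITE1` allows.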

## Commands

### Check local pilot readiness

```bash
copyspace-guard --version
copyspace-guard doctor --root .
copyspace-guard doctor --root . --json
```

### Analyze CSV and generate reports

```bash
copyspace-guard analyze --csv INPUT.csv --bw 256 --outdir artifacts/run
```

Optional:

```bash
--slots N
--id workload-name
--notes "free text"
--cost-per-tick 0.02
--model STRICT1  # or READ1_WRITE1
--bounds-subset-limit 20
--max-errors 100
--max-demands 100000
--max-slots 10000
--max-output-ticks 1000000
```

`--bounds-subset-limit` controls exhaustive STRICT1 subset-density enumeration and is protected by a hard cap to avoid accidental exponential runs.
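
To show why the enumeration needs a cap, here is a sketch of one common form of subset-density bound for a one-transfer-per-slot model: inside any subset `S` of slots, at most `floor(|S| / 2)` transfers can run per tick, so the schedule needs at least `ceil(work(S) / floor(|S| / 2))` ticks. Whether the tool computes exactly this quantity is an assumption (see `docs/BOUNDS.md`), and the ring input is a hypothetical stand-in rather than the contents of `examples/ring15.csv`.

```python
from itertools import combinations


def subset_density_bound(
    demand_ticks: dict[tuple[int, int], int],
    max_subset_size: int,
) -> int:
    """Best ceil(work(S) / floor(|S| / 2)) over slot subsets S up to a size cap.

    demand_ticks maps (src_slot, dst_slot) to the ticks that pair needs,
    i.e. ceil(bits_total / bw). The number of subsets grows exponentially
    with the slot count, hence the cap.
    """
    slots = sorted({slot for pair in demand_ticks for slot in pair})
    best = 0
    for size in range(2, min(max_subset_size, len(slots)) + 1):
        capacity = size // 2  # max simultaneous transfers inside the subset
        for subset in combinations(slots, size):
            members = set(subset)
            work = sum(
                ticks
                for (src, dst), ticks in demand_ticks.items()
                if src in members and dst in members
            )
            best = max(best, (work + capacity - 1) // capacity)
    return best


def ring_demand_ticks(n: int, bits: int, bw: int) -> dict[tuple[int, int], int]:
    """Hypothetical ring workload: slot i sends `bits` to slot (i + 1) % n."""
    return {(i, (i + 1) % n): (bits + bw - 1) // bw for i in range(n)}


if __name__ == "__main__":
    demands = ring_demand_ticks(15, 65536, 256)
    print(subset_density_bound(demands, max_subset_size=15))
```

For this hypothetical 15-slot ring the full-set term dominates: 15 × 256 tick-units over 7 parallel transfers rounds up to 549, the same value as the `lb` field in the Quickstart sample output.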

### Validate a schedule

```bash
copyspace-guard validate artifacts/run/instance.json artifacts/run/schedule_greedy.json --report artifacts/run/validation.json
```

### Regenerate Markdown/HTML reports

```bash
copyspace-guard report artifacts/run/summary.json --outdir artifacts/report
```

### Validate generated artifact contracts

```bash
copyspace-guard validate-artifact --kind summary artifacts/run/summary.json
```

### Run production-oriented checks

```bash
make production-check
```

This runs release checks plus a small synthetic performance suite. The suite can also be run directly:

```bash
copyspace-guard bench-suite --outdir artifacts/bench-suite --max-total-seconds 30
```


## Customer/current schedule input

For a real pilot, the customer may already have an actual schedule. Use CSV:

```csv
tick,src_slot,dst_slot,len_bits
0,0,1,256
0,2,3,256
1,1,2,256
```

Then run:

```bash
copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --current-schedule-csv customer_schedule.csv \
  --outdir artifacts/customer-run
```

You can also convert a schedule CSV to JSON:

```bash
copyspace-guard schedule-csv-to-json --csv customer_schedule.csv --out schedule.json
```

## CI gate command

After `analyze`, fail/pass thresholds can be checked locally or in CI:

```bash
copyspace-guard gate artifacts/demo/summary.json \
  --report greedy \
  --max-gap 0.15 \
  --min-utilization 0.85
```

Exit code `0` means the gate passed; exit code `2` means it failed.
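
The threshold logic can be pictured with the sketch below. It assumes inclusive comparisons and takes the metrics as plain arguments, because the field layout of `summary.json` is not shown in this README; `gate_exit_code` is a hypothetical helper.

```python
EXIT_PASS = 0
EXIT_FAIL = 2


def gate_exit_code(
    gap: float,
    utilization: float,
    max_gap: float,
    min_utilization: float,
) -> int:
    """Return 0 when both thresholds hold, 2 otherwise (inclusive bounds assumed)."""
    ok = gap <= max_gap and utilization >= min_utilization
    return EXIT_PASS if ok else EXIT_FAIL


if __name__ == "__main__":
    # Metrics from the Quickstart sample output, thresholds from the command above.
    print("greedy  ", gate_exit_code(0.000000, 0.9992, 0.15, 0.85))
    print("baseline", gate_exit_code(0.398907, 0.7143, 0.15, 0.85))
```

Under these thresholds the sample greedy report would pass and the sample baseline report would fail on both gap and utilization.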

## Files generated by `analyze`

- `instance.json` — normalized workload contract.
- `schedule_baseline.json` or `schedule_customer_current.json` — current schedule artifact, unless `--summary-only` is used.
- `schedule_greedy.json` — deterministic candidate schedule, unless `--summary-only` is used.
- `schedule_baseline.csv` or `schedule_customer_current.csv` — CSV schedule artifact, unless `--summary-only` is used.
- `schedule_greedy.csv` — deterministic candidate schedule CSV, unless `--summary-only` is used.
- `report_baseline.json` or `report_customer_current.json` — validation metrics for the current schedule.
- `report_greedy.json` — validation metrics for candidate.
- `summary.json` — machine-readable comparison summary.
- `report.md` — human-readable audit report.
- `report.html` — shareable report for demos and sales calls.

## v0.2 boundaries

Included:

- volume-based demand modeling;
- deterministic baseline and greedy schedules;
- STRICT1 and READ1_WRITE1 validators;
- lower-bound gap and utilization metrics;
- ROI estimates via `roi.yml` or a simple `$ per tick` assumption;
- sales-ready report artifacts.

Not included yet:

- production security hardening;
- topology/path selection;
- real transfer execution;
- cloud adapter importers;
- address-level offset validation;
- VCopySpace receipt ledger integration.

Known operational caveats:

- Customer schedule CSVs used in streaming mode must be sorted by non-decreasing `tick`.
- Full artifact mode can produce large schedule JSON/CSV files; use `--summary-only` for large pilots and CI.
- For large STRICT1 slot counts, subset-density lower bounds may be partial; check `bounds_complete` in reports.
- The greedy schedule is deterministic and useful for comparison, but it is not a proof of global optimality.

## How this maps to the larger project set

- `copy-space` → scheduler, validator, lower-bound gap, CI-gate idea.
- `vcopyspace` → future enterprise layer: receipt-based metering, ledger, trace/replay, cost model.
- `DDAS` → long-term deterministic state-transition foundation.

## ROI mode

Turn saved ticks into business impact:

```bash
copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --roi examples/roi.yml \
  --outdir artifacts/demo
```

Example `examples/roi.yml`:

```yaml
roi:
  tick_seconds: 1
  gpu_count_blocked: 64
  gpu_hour_cost_usd: 2.50
  runs_per_day: 12
  days_per_month: 30
```
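
The `estimated_savings=9.73` figure in the Quickstart sample is consistent with pricing the 219 saved ticks as blocked GPU time for a single run. The sketch below works through that arithmetic. The per-run formula is inferred from the sample numbers, and the monthly extrapolation is an additional assumption about how `runs_per_day` and `days_per_month` are used.

```python
def savings_per_run_usd(
    saved_ticks: int,
    tick_seconds: float,
    gpu_count_blocked: int,
    gpu_hour_cost_usd: float,
) -> float:
    """Cost of the GPU time that is no longer blocked during one run."""
    blocked_gpu_hours = saved_ticks * tick_seconds * gpu_count_blocked / 3600
    return blocked_gpu_hours * gpu_hour_cost_usd


def savings_per_month_usd(per_run_usd: float, runs_per_day: int, days_per_month: int) -> float:
    """Scale per-run savings to a month (assumed use of the remaining fields)."""
    return per_run_usd * runs_per_day * days_per_month


if __name__ == "__main__":
    per_run = savings_per_run_usd(219, 1, 64, 2.50)
    print(f"per run:   {per_run:.2f} USD")
    print(f"per month: {savings_per_month_usd(per_run, 12, 30):.2f} USD")
```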

## Gate config file

```bash
copyspace-guard gate artifacts/demo/summary.json \
  --config examples/copyspace_guard.yml
```

Example config:

```yaml
gates:
  report: greedy
  max_gap_to_lower_bound: 0.15
  min_utilization: 0.85
```

## Docker

```bash
docker build -t copyspace-guard .
docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/work" copyspace-guard analyze \
  --csv examples/ring15.csv \
  --bw 256 \
  --roi examples/roi.yml \
  --outdir artifacts/docker-demo
```

## Industry demos

```bash
copyspace-guard analyze --csv examples/ai_checkpoint.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/ai-checkpoint
copyspace-guard analyze --csv examples/db_shuffle.csv --bw 262144 --roi examples/roi.yml --outdir artifacts/db-shuffle
copyspace-guard analyze --csv examples/storage_replication.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/storage-replication
copyspace-guard analyze --csv examples/kv_cache_movement.csv --bw 1048576 --roi examples/roi.yml --outdir artifacts/kv-cache
```

## Client package

See `client-package/` for a minimal package that can be sent to a pilot customer:

- `README_CLIENT.md`
- `sample_demands.csv`
- `sample_schedule.csv`
- `roi.yml`
- `copyspace_guard.yml`
- `run_local.sh`
- `intake.md`

## Anonymize demands or schedules

```bash
copyspace-guard anonymize \
  --kind demands \
  --csv raw_demands.csv \
  --out anonymized_demands.csv \
  --mapping slot_mapping.json

copyspace-guard anonymize \
  --kind schedule \
  --csv raw_schedule.csv \
  --out anonymized_schedule.csv \
  --mapping-in slot_mapping.json \
  --mapping schedule_slot_mapping.json
```

Use `--mapping-in` when anonymizing demands and schedules that must share the same slot-ID mapping. Do not share the mapping files (`slot_mapping.json` and `schedule_slot_mapping.json` in the example above) unless you intend to reveal the original endpoint names.

## Sales-oriented demos

Bad current schedule vs candidate:

```bash
copyspace-guard analyze \
  --csv examples/demo_bad_current_demands.csv \
  --bw 256 \
  --current-schedule-csv examples/demo_bad_current_schedule.csv \
  --roi examples/roi.yml \
  --outdir artifacts/bad-current-demo
```

Conflict detection:

```bash
copyspace-guard analyze \
  --csv examples/demo_conflict_demands.csv \
  --bw 256 \
  --current-schedule-csv examples/demo_conflict_schedule.csv \
  --summary-only \
  --outdir artifacts/conflict-demo
```

Large workloads can use `--summary-only` to avoid writing full schedule JSON/CSV artifacts. In this mode generated baseline/candidate schedules are streamed into the validator instead of materialized in memory. Customer schedule CSVs used in streaming mode must be sorted by non-decreasing `tick`.


## Model and bound details

- Model limitations: `docs/MODEL_LIMITATIONS.md`
- Lower-bound definitions: `docs/BOUNDS.md`
- JSON schemas: `docs/SCHEMAS.md`
- Artifact contracts: `docs/ARTIFACT_CONTRACTS.md`
- Performance notes: `docs/PERFORMANCE.md`
- Pilot readiness: `docs/PILOT_READINESS.md`
- Production readiness: `docs/PRODUCTION_READINESS.md`
- Operations guide: `docs/OPERATIONS.md`
- Release process: `docs/RELEASE_PROCESS.md`
- Threat model: `docs/THREAT_MODEL.md`
- Data handling: `docs/DATA_HANDLING.md`
- Changelog: `CHANGELOG.md`


## Benchmark

```bash
copyspace-guard bench \
  --slots 64 \
  --bits-per-edge 1048576 \
  --bw 1048576 \
  --outdir artifacts/bench
```
