Metadata-Version: 2.4
Name: cyberwave-edge-core
Version: 0.1.4
Summary: The core component of the Cyberwave Edge Node
Project-URL: Homepage, https://cyberwave.com
Project-URL: Documentation, https://docs.cyberwave.com
Project-URL: Repository, https://github.com/cyberwave-os/cyberwave-edge-core
Project-URL: Issues, https://github.com/cyberwave-os/cyberwave-edge-core/issues
Author-email: Cyberwave <info@cyberwave.com>
License: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <4.0,>=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: cyberwave>=0.4.4
Requires-Dist: httpx>=0.25.0
Requires-Dist: rich>=13.0.0
Provides-Extra: build
Requires-Dist: pyinstaller>=6.0.0; extra == 'build'
Provides-Extra: dev
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://cyberwave.com">
    <img src="https://cyberwave.com/cyberwave-logo-black.svg" alt="Cyberwave logo" width="240" />
  </a>
</p>

# Cyberwave Edge Core

This module is part of **Cyberwave: Making the physical world programmable**.

Cyberwave Edge Core acts as the orchestrator of Cyberwave edge drivers.

[![License](https://img.shields.io/badge/License-MIT-orange.svg)](https://github.com/cyberwave-os/cyberwave-edge-core/blob/main/LICENSE)
[![Documentation](https://img.shields.io/badge/Documentation-docs.cyberwave.com-orange)](https://docs.cyberwave.com)
[![Discord](https://badgen.net/badge/icon/discord?icon=discord&label&color=orange)](https://discord.gg/dfGhNrawyF)
[![PyPI version](https://img.shields.io/pypi/v/cyberwave-edge-core.svg)](https://pypi.org/project/cyberwave-edge-core/)
[![PyPI Python versions](https://img.shields.io/pypi/pyversions/cyberwave-edge-core.svg)](https://pypi.org/project/cyberwave-edge-core/)
[![Release to PyPI](https://github.com/cyberwave-os/cyberwave-edge-core/actions/workflows/release-pypi.yml/badge.svg)](https://github.com/cyberwave-os/cyberwave-edge-core/actions/workflows/release-pypi.yml)

## Quickstart

SSH to the edge device where you want to install Edge Core, then install the Cyberwave CLI and run the installer:

```bash
# Install the Cyberwave CLI (one-time setup)
curl -fsSL https://cyberwave.com/install.sh | bash

# Run the edge installer (interactive)
sudo cyberwave edge install
```

The installer will prompt you to log in with your Cyberwave account, select a workspace and environment, and persist configuration under `~/.cyberwave/`. You can override the config directory via the `CYBERWAVE_EDGE_CONFIG_DIR` environment variable. Legacy installs that used `/etc/cyberwave` are automatically migrated.

Permissions on the config directory:

- `credentials.json` is written with mode `0600` (owner-only) because it holds your API token.
- `fingerprint.json` is written with mode `0644` (world-readable) because the device fingerprint is a hardware identifier, not a secret. This lets user shells read it even when Edge Core runs as root via `systemd`.
- On `systemd` deployments where Edge Core runs as root, the service re-chowns files under `~/.cyberwave/` on startup so they stay owned by the user whose home directory holds them.

> Don't have a Cyberwave account? Get one at [cyberwave.com](https://cyberwave.com)

### Config files created

The installer and Edge Core create these files in the config directory:

| File               | Description                                 |
| ------------------ | ------------------------------------------- |
| `credentials.json` | API token and workspace information         |
| `fingerprint.json` | Device fingerprint (generated by Edge Core) |
| `environment.json` | Selected environment and twin UUIDs         |

Edge Core requires `credentials.json` to operate. `fingerprint.json` is produced by Edge Core; `environment.json` is written by the CLI during setup.

## How Edge Core works

On startup (service or direct run), Edge Core performs the following steps:

1. Validate credentials from `credentials.json`.
2. Connect to the backend MQTT broker and verify connectivity.
3. Start a **bootstrap health publisher** that sends periodic edge health messages while drivers are starting up.
4. Register the edge device and record a unique `edge_fingerprint`.
5. Download the selected environment and resolve twins linked to the fingerprint.
6. Start drivers for linked twins. Special handling for attached camera child twins:
   - If a twin is a camera child (has `attach_to_twin_uuid`), Edge Core does not start a separate driver for it.
   - Camera child UUIDs are passed to the parent driver via `CYBERWAVE_CHILD_TWIN_UUIDS`.
7. **Stop the bootstrap health publisher** once drivers are running (drivers publish their own health messages; keeping both would produce duplicate signals in the UI).
8. Pull workflow workers (`wf_*.py`) for the twins listed in `environment.json` and start the worker container (if any worker files exist in `{config_dir}/workers/`).

### Scope of workflow worker sync

Workflow worker sync is scoped strictly to the twin UUIDs the operator selected at install time and persisted to `environment.json` under `twin_uuids`. This is intentionally narrower than the fingerprint-based discovery used for drivers and the bootstrap health publisher: an environment can carry stale `metadata.edge_fingerprint` entries from previous installs, and we don't want those to pull unrelated `wf_*.py` files onto this edge. For backward compatibility, installs that predate the `twin_uuids` field still fall back to fingerprint-based discovery.

During driver startup, Docker image pull progress is mirrored into the edge-core service logs and forwarded through the same MQTT-backed driver log stream used for runtime container logs, so users can follow image download progress remotely.

### Remote restart (Edge REST API)

Request a remote restart of Edge Core via the REST API:

```http
POST /api/v1/edges/{uuid}/restart-core
```
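
For example, a minimal call with `httpx` (a sketch: the bearer-token header and base URL are assumptions — use whatever authentication and host your deployment requires):

```python
import httpx

# Illustrative values only; the auth header scheme and base URL are assumptions.
BASE_URL = "https://api.cyberwave.com"
EDGE_UUID = "<edge-uuid>"
API_TOKEN = "<api-token>"  # e.g. the token stored in ~/.cyberwave/credentials.json

resp = httpx.post(
    f"{BASE_URL}/api/v1/edges/{EDGE_UUID}/restart-core",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10.0,
)
resp.raise_for_status()
```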

The API will publish an MQTT message to the edge's command topic:

Topic: `edges/{edge_uuid}/command`

Example payload:

```json
{ "command": "restart_edge_core" }
```

When Edge Core receives this command it performs a graceful restart consisting of:

1. Stopping the worker container (if running).
2. Removing cached twin JSON files from the edge config directory.
3. Stopping and removing any edge-managed driver containers, then pruning stopped containers.
4. Resolving any active `driver_starting` alerts on the affected twins so leftovers from the previous run do not stay visible after the restart.
5. Re-downloading the selected environment and restarting drivers.
6. Restarting the worker container (if worker files exist).

The restart is intended to preserve durable state where possible. If connectivity is available before shutdown, Edge Core will attempt to sync any twin JSON changes back to the backend.

Each driver startup attempt creates a `driver_starting` twin alert that tracks the in-flight startup (image pull, container launch, post-launch health probe). The alert is automatically resolved once the driver container is observed `running`, and is annotated and resolved with a failure phase if the attempt fails. The alert is therefore guaranteed to clear when the driver has restarted; longer-lived failure conditions are surfaced through separate `driver_start_failure` alerts created by the orchestrator.

## Model Manager (ML model cache)

Edge Core includes a `ModelManager` that resolves ML model weights into a local cache before starting the worker container. It is designed for both online deployments and air-gapped sites: it prefers fresh weights from Cyberwave when the network is available, falls back to upstream public mirrors, and finally to whatever is already on disk.

**Cache location:**

| Platform | Default path |
|---|---|
| All | `~/.cyberwave/models/` |

Override with `CYBERWAVE_EDGE_CONFIG_DIR`.

**Cache layout:**

```
<cache_dir>/
├── manifest.json            # index of all cached models
├── yolov8n/
│   ├── yolov8n.pt           # weight file
│   └── metadata.json        # checksum, runtime, source URL, upstream URL
└── background-subtraction/
    └── ...
```

**Model requirements discovery:** Edge Core scans `*.py` files in `~/.cyberwave/workers/` for `cw.models.load(...)` calls to determine which weights to ensure.
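
For example, a worker file like the following would cause the `yolov8n` weights to be ensured before the worker container starts (a sketch; the `import cyberwave as cw` spelling is an assumption — the scanner only looks for the literal `cw.models.load(...)` call text):

```python
# ~/.cyberwave/workers/detect_people.py (sketch)
import cyberwave as cw  # assumed import alias; Edge Core scans for the call below

model = cw.models.load("yolov8n")  # picked up by the model-requirements scan
```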

`~/.cyberwave/models/` and `~/.cyberwave/workers/` are created eagerly on Edge Core startup (even before any worker runs), with ownership matching the invoking user, so operators can drop pre-staged weights into `~/.cyberwave/models/{model_id}/` from a regular shell.

### Resolution order

For each required model, `ensure_model(model_id)` runs the following steps:

1. **Reconcile disk.** If `cache_dir/{model_id}/` already contains a weight file (with or without a sidecar), it is registered in the manifest. A missing sidecar is generated from the on-disk file (with a freshly computed SHA-256) and tagged `downloaded_from: prestaged`. This is how an operator pre-stages weights from a USB stick on an air-gapped site.
2. **Verify cache integrity.** If the local file's SHA-256 matches the manifest checksum, the cache is *intact*.
3. **Pre-staged short-circuit.** When an intact entry is tagged `downloaded_from: prestaged`, Edge Core returns it without ever contacting the catalog. Pre-staged files are operator-curated truth; to force a re-download, evict the model directory.
4. **Best-effort catalog probe.** For non-prestaged intact entries, Edge Core does a short-timeout `GET /api/v1/mlmodels/...` to compare checksums.
   * Catalog unreachable, no checksum, or matching checksum → return the cached file (no download).
   * Catalog returns a different checksum → fall through to the download path.
5. **Download.** Sources are tried in priority order:
   1. **Cyberwave-hosted signed URL** from `GET /api/v1/mlmodels/{uuid}/weights` — used for checkpoints we have uploaded to our private GCS bucket (e.g. internally trained or mirrored models). Authenticated, served from infrastructure we control.
   2. **Upstream weights URL** from the catalog entry (`download_url` / `metadata.upstream_weights_url`) — used for community checkpoints we did not mirror.

   The first source that yields a checksum-verified file wins. The sidecar records `downloaded_from` (`artifact_url` / `download_url` / `prestaged`), `source_url` (the public URL we fetched, or `null` for artifact downloads — the signed URL expires in minutes and is useless to persist), and `upstream_url` (provenance).
6. **Fail-soft.** If every download attempt fails *and* the cached file is intact, Edge Core returns the cached path with a warning. This keeps workers running across transient network failures and on permanently air-gapped sites. If the cache is empty or corrupt, `RuntimeError` is raised.

**Cache integrity:** SHA-256 checksums are verified on every `ensure_model` call when a checksum is recorded — on cold start, after every download, during disk reconciliation, and on every warm-cache hit. There is no shortcut. For multi-gigabyte checkpoints this is the dominant cost of the call, but `ensure_model` only runs when a worker file changes (rare) or the worker container restarts; in exchange we get unconditional bit-rot detection. A checksum mismatch on a downloaded artifact triggers a re-download attempt; a download whose checksum does not match the catalog is rejected and the partial file removed.

### Pre-staging weights for air-gapped deployments

On a site without internet access, an operator can place weights directly into the cache:

```bash
mkdir -p ~/.cyberwave/models/yolov8n
cp /usb-stick/yolov8n.pt ~/.cyberwave/models/yolov8n/
```

Edge Core picks up the file on the next `ensure_model("yolov8n")` call, computes a SHA-256, and writes a sidecar `metadata.json` so subsequent runs are deterministic. The `runtime` field is inferred from the file extension (`.pt` → `ultralytics`, `.onnx` → `onnxruntime`, `.engine`/`.trt` → `tensorrt`, `.tflite` → `tflite`, `.pth` → `torch`, `.xml` → `opencv`); provide a hand-written `metadata.json` to override.

**Updating in place.** Operators can drop a new build into the same directory and Edge Core will detect the change on the next call:

```bash
cp /usb-stick/yolov8n-v2.pt ~/.cyberwave/models/yolov8n/yolov8n.pt
```

The mismatch between the on-disk SHA-256 and the manifest checksum triggers a re-stamp (not a re-download), provided the sidecar still records `downloaded_from: prestaged`. This keeps offline edges functional across model upgrades. Files that were previously *downloaded* by Edge Core keep the corruption-detection semantics — bit-rot still triggers a re-download attempt rather than being silently accepted.

**Pre-staged files are never auto-overwritten by catalog updates.** Once a file lives under `cache_dir/{model_id}/` with a `downloaded_from: prestaged` sidecar, Edge Core treats it as the source of truth and skips the catalog probe entirely. To force a re-download from the Cyberwave catalog, evict the model:

```bash
rm -rf ~/.cyberwave/models/yolov8n
```

Provide a hand-written `metadata.json` (with `filename`, `checksum_sha256`, and `runtime`) when there are multiple weight files in the directory or when corruption detection should compare against a known-good hash.
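
A minimal sketch of producing such a sidecar by hand; the three fields come from the list above, and the streaming SHA-256 mirrors the integrity check Edge Core performs (whether additional keys are accepted is not specified here):

```python
import hashlib
import json
from pathlib import Path

model_dir = Path.home() / ".cyberwave" / "models" / "yolov8n"
weight = model_dir / "yolov8n.pt"

# Stream the file so multi-gigabyte checkpoints never need to fit in memory.
sha = hashlib.sha256()
with weight.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

sidecar = {
    "filename": weight.name,
    "checksum_sha256": sha.hexdigest(),
    "runtime": "ultralytics",  # .pt maps to ultralytics per the inference table above
}
(model_dir / "metadata.json").write_text(json.dumps(sidecar, indent=2))
```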

## Worker container

Edge Core manages one ML worker container per edge device (container name: `cyberwave-worker-{env_uuid[:8]}`). The worker container runs Python worker scripts from the local workers directory and has access to cached model weights.

### Worker directory layout

Place worker scripts in `{config_dir}/workers/` (default: `~/.cyberwave/workers/`):

```
~/.cyberwave/
├── workers/
│   ├── detect_people.py        # Custom worker
│   └── cyberwave.yml           # Optional: list model requirements
└── models/                     # Auto-managed model cache
    ├── manifest.json
    └── yolov8n/
        └── yolov8n.pt
```

### cyberwave.yml

Optionally declare model requirements so Edge Core can pre-download them before starting the worker container:

```yaml
models:
  - yolov8n
  - background-subtraction
```

Edge Core also auto-detects models by scanning `cw.models.load("...")` calls in worker Python files.

### Worker container environment variables

| Variable | Value |
|---|---|
| `CYBERWAVE_API_KEY` | Injected from credentials |
| `CYBERWAVE_ENVIRONMENT_UUID` | Active environment UUID |
| `CYBERWAVE_TWIN_UUIDS` | Comma-separated twin UUIDs in environment |
| `CYBERWAVE_DATA_BACKEND` | `zenoh` |
| `ZENOH_CONNECT` | Set when a Zenoh router is configured |
| `ZENOH_SHARED_MEMORY` | `false` by default (opt-in; requires `--ipc=host`) |

### File watching and hot-reload

Edge Core monitors `{config_dir}/workers/` every reconcile cycle (~15 seconds). When `.py` files are added, removed, or modified, Edge Core automatically:

1. Re-scans model requirements.
2. Pre-downloads any missing models.
3. Restarts the worker container with the updated files.

A minimum cool-down of 10 seconds between successive automatic restarts prevents rapid churn when files are written incrementally (e.g. by `rsync` or `scp`).

### Workflow-driven worker lifecycle

The worker container is brought up and torn down based on whether any active workflows are currently synced for the connected twins:

- **Startup:** after pulling worker files from the backend (step 8), Edge Core inspects `{config_dir}/workers/`. If at least one `wf_*.py` file is present, the worker container is started; otherwise it is left down — and the `cyberwaveos/edge-ml-worker` image is **not** pulled.
- **Periodic reconcile:** every ~5 minutes (configurable via `CYBERWAVE_WORKER_SYNC_INTERVAL_LOOPS`), Edge Core resyncs worker files from the backend. If a workflow was activated mid-run and new files appeared, the worker container is started. If every workflow was deactivated and the directory is now empty, the container is stopped. Both calls are idempotent.
- **Immediate reconcile on activate:** when a `run_on_edge` workflow is activated (UI, CLI, or API), the backend publishes a `sync_workflows` command on `cyberwave/twin/{twin_uuid}/command` for each twin the workflow references. Edge Core runs `reconcile_worker_sync` right away so the new `wf_*.py` lands within seconds instead of up to one periodic interval. Failures fall back to the periodic reconcile; the MQTT nudge is best-effort, not a correctness guarantee.
- **Sync errors:** if a sync cycle reports any errors, the lifecycle reconcile is skipped to avoid churning a healthy worker on transient API failures. The next successful sync re-evaluates state.

### Worker image refresh policy

`WorkerManager._ensure_image_pulled` decides whether to issue `docker pull` before each worker (re)start:

| Tag basename (with optional `-gpu`/`-cpu`/`-arch` suffix) | Mutability | Pull behaviour |
|---|---|---|
| `latest`, `dev`, `local`, `staging`, `nightly`, `edge`, `main`, `master` | Mutable | Pull every time, even when the image is already present locally. If the registry is unreachable but a local copy exists, fall back to the local copy and warn. |
| Anything else (`v1.2.3`, dated build IDs, `@sha256:…`) | Immutable | Skip the pull when the image is already present locally; only pull when missing. |

This avoids the previous failure mode where a stale `cyberwaveos/edge-ml-worker:dev-gpu` image stayed cached after a developer pushed a new build — operators no longer need to remember to `docker rmi` before restarting the worker. Immutable tags keep the original fast-path so versioned production deployments are not slowed down by an extra round-trip to the registry on every restart.
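
A sketch of the classification rule (the helper name and exact suffix handling are illustrative, not the actual `WorkerManager` code):

```python
MUTABLE_TAGS = {"latest", "dev", "local", "staging", "nightly", "edge", "main", "master"}

def is_mutable_tag(image_ref: str) -> bool:
    """Return True when the tag should be re-pulled on every worker (re)start."""
    if "@sha256:" in image_ref:          # digest-pinned references are immutable
        return False
    tag = image_ref.rsplit(":", 1)[-1] if ":" in image_ref else "latest"
    base = tag.split("-", 1)[0]          # strip an optional -gpu/-cpu/-<arch> suffix
    return base in MUTABLE_TAGS

assert is_mutable_tag("cyberwaveos/edge-ml-worker:dev-gpu")
assert not is_mutable_tag("cyberwaveos/edge-ml-worker:v1.2.3")
```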

### Pinning a custom worker image (`CYBERWAVE_WORKER_IMAGE`)

`resolve_worker_image()` consults `CYBERWAVE_WORKER_IMAGE` before falling back to the `CYBERWAVE_ENVIRONMENT`-derived tag. This is the worker-side counterpart to the `driver_overrides` field in `credentials.json` for camera/asset drivers, and is the recommended way to run a locally-built or hot-patched worker image without re-pushing to the registry.

Two ways to set it (both honoured by `get_runtime_env_var`):

- **Operator config (preferred for dev hosts, no `sudo`)** — add to the `envs` block of `~/.cyberwave/credentials.json`:

  ```json
  {
    "envs": {
      "CYBERWAVE_ENVIRONMENT": "dev",
      "CYBERWAVE_WORKER_IMAGE": "cyberwaveos/edge-ml-worker:local"
    }
  }
  ```

- **Systemd dropin (preferred for managed deployments)** — `sudo systemctl edit cyberwave-edge-core`:

  ```ini
  [Service]
  Environment=CYBERWAVE_WORKER_IMAGE=cyberwaveos/edge-ml-worker:local
  ```

Either way, restart edge-core (`sudo systemctl restart cyberwave-edge-core`) to reload the resolver.

Use the `:local` tag (no `-gpu`/`-cpu` suffix) — same convention as the camera-driver `:local` tag. `_run_container` auto-appends `-gpu` for `cyberwaveos/edge-ml-worker:*` overrides on GPU hosts, so the operator only commits and pins one tag.

Typical hot-fix loop (mirroring the camera-driver `:local` tag pattern):

```bash
# 1. Hot-patch the running container (e.g. swap an SDK runtime file).
WORKER=cyberwave-worker-<env_uuid_prefix>   # container name is cyberwave-worker-{env_uuid[:8]}
docker cp /path/to/patched/file.py "$WORKER:/usr/local/lib/python3.12/dist-packages/cyberwave/.../file.py"
docker exec -u root "$WORKER" rm /usr/local/.../file.cpython-312-x86_64-linux-gnu.so

# 2. Snapshot the patched container as a local-only tag (and an alias on the
#    GPU-suffixed tag, so the same image is used regardless of which path
#    `_run_container` resolves to on this host).
docker commit "$WORKER" cyberwaveos/edge-ml-worker:local
docker tag    cyberwaveos/edge-ml-worker:local cyberwaveos/edge-ml-worker:local-gpu

# 3. Tell edge-core to use it (see the two options above), then restart it.
#    Pull will fail (registry has no :local) and `_ensure_image_pulled`
#    falls back to the locally-present image.
sudo systemctl restart cyberwave-edge-core
```

The patched image survives every `docker rm`/`docker run` cycle from edge-core's reconcile loop. To revert, remove the env var (or `systemctl revert cyberwave-edge-core` if you used the dropin) and `docker rmi cyberwaveos/edge-ml-worker:local cyberwaveos/edge-ml-worker:local-gpu`.

When overriding to a registry outside `cyberwaveos/edge-ml-worker:*` (e.g. an internal mirror), include the `-gpu` suffix yourself if you need GPU access — the auto-suffix path in `_run_container` only triggers for the canonical `cyberwaveos/edge-ml-worker:` prefix.

### Worker health monitoring

Edge Core continuously monitors the worker container for spontaneous exits and crash loops:

- **Restart accounting**: every restart is recorded with a timestamp and reason.
- **Sliding-window rate limiting**: if more than 5 restarts occur within 5 minutes, the circuit-breaker trips and automatic restarts are suppressed. The breaker resets automatically once the window clears.
- **Spontaneous exit detection**: if the container exits without a deliberate restart, a warning is logged so operators can investigate.

Use `cyberwave-edge-core worker health` to inspect the full restart history and circuit-breaker state.

### Resource limits

You can constrain the worker container's CPU and memory usage by setting `CYBERWAVE_WORKER_CPU_QUOTA_PERCENT` and `CYBERWAVE_WORKER_MEMORY_MB` environment variables on the edge host (both optional). When set, Edge Core passes the corresponding `--cpu-quota`, `--cpu-period`, and `--memory` flags to `docker run`.

GPU memory fraction can be limited via `CYBERWAVE_GPU_MEM_FRACTION` (a float between 0 and 1); this is passed as an env var into the worker container.
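
A sketch of how the CPU and memory variables could translate into `docker run` flags; the 100 ms CPU period and the rounding are assumptions about the mapping, not a statement of the exact implementation:

```python
import os

def worker_resource_flags() -> list[str]:
    """Derive --cpu-quota/--cpu-period/--memory flags from the two env vars (sketch)."""
    flags: list[str] = []
    cpu_pct = os.environ.get("CYBERWAVE_WORKER_CPU_QUOTA_PERCENT")
    if cpu_pct:
        period = 100_000                              # 100 ms, Docker's default period
        quota = int(period * float(cpu_pct) / 100)
        flags += ["--cpu-period", str(period), "--cpu-quota", str(quota)]
    mem_mb = os.environ.get("CYBERWAVE_WORKER_MEMORY_MB")
    if mem_mb:
        flags += ["--memory", f"{int(mem_mb)}m"]
    return flags
```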

### GPU support

When an NVIDIA container runtime is detected (`docker info` reports `nvidia` runtime), Edge Core adds `--gpus all` to the **worker** container's `docker run` command.

**Driver GPU passthrough** is opt-in via asset metadata. When a driver's config includes `"prefer_gpu": true` and the host has:

1. NVIDIA container runtime available (`docker info` reports `nvidia`), **and**
2. `nvidia` set as the default runtime in `/etc/docker/daemon.json`

…Edge Core passes `--gpus` to the driver container. The optional `"gpu"` field controls which GPUs are exposed:

| `gpu` value | Docker flag | Use case |
|---|---|---|
| _(not set)_ | `--gpus all` | All available GPUs (default) |
| `1` | `--gpus 1` | Limit to 1 GPU |
| `"device=0,2"` | `--gpus "device=0,2"` | Specific GPU devices |

Example driver metadata:

```json
{
  "drivers": {
    "default": {
      "docker_image": "cyberwaveos/go2-ros2-driver:humble",
      "prefer_gpu": true,
      "gpu": "all"
    }
  }
}
```

If the NVIDIA runtime is available but not configured as the default in `daemon.json`, Edge Core logs an informational message with setup instructions instead of silently skipping GPU passthrough.

### Jetson detection

Edge Core auto-detects NVIDIA Jetson hardware via `/etc/nv_tegra_release`. When running on a Jetson:

- The platform key `linux-aarch64-jetson` is added to the driver resolution order, allowing asset metadata to specify a Jetson-optimised image.
- If no `linux-aarch64-jetson` driver key exists in metadata, Edge Core rewrites the image tag by prepending `jetson-` (e.g. `cyberwaveos/go2-ros2-driver:humble` → `cyberwaveos/go2-ros2-driver:jetson-humble`). If the Jetson-prefixed image is not available, it falls back to the original tag automatically.

Override detection with `CYBERWAVE_PLATFORM_VARIANT=jetson` for testing.
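
The `jetson-` tag rewrite described above can be sketched as follows (handling of untagged references is an assumption):

```python
def jetson_image_tag(image_ref: str) -> str:
    """Rewrite the image tag used when no linux-aarch64-jetson driver key exists (sketch)."""
    if ":" in image_ref.rsplit("/", 1)[-1]:
        repo, tag = image_ref.rsplit(":", 1)
        return f"{repo}:jetson-{tag}"
    return f"{image_ref}:jetson-latest"   # assumption: untagged refs default to latest

assert jetson_image_tag("cyberwaveos/go2-ros2-driver:humble") == \
    "cyberwaveos/go2-ros2-driver:jetson-humble"
```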

## Multi-camera orchestration

When multiple cameras are connected to the same edge device (each represented as a separate digital twin), Edge Core orchestrates them as follows:

1. **One driver per camera:** Each camera twin gets its own `cyberwave-driver-{uuid[:8]}` container. Child camera twins that are attached to a parent twin share the parent's driver instead.
2. **One shared worker:** A single `cyberwave-worker-{env[:8]}` container receives frames from all cameras. The worker container receives `CYBERWAVE_TWIN_UUIDS` as a comma-separated list of all linked twins.
3. **Readiness probes:** Edge Core waits for all driver containers to reach a `running` state before starting the worker. If some drivers fail, the worker starts anyway so healthy cameras can be utilized.
4. **Model pre-download:** Before the worker starts, Edge Core scans worker scripts and pre-downloads all referenced ML models.
5. **Driver health monitoring:** If a driver goes down while the worker is running, Edge Core sends an alert to the affected twin.

Use `cyberwave edge status` to see all driver and worker containers with their twin mappings.

## Multi-container drivers

Some robots require multiple cooperating containers (e.g. a driver, bridge nodes, Nav2, SLAM, and elevation mapping). Edge Core supports this via an optional `services` array in the driver metadata. When present, Edge Core launches one container per service instead of a single driver container.

### Metadata schema

```json
{
  "drivers": {
    "linux-aarch64-jetson": {
      "services": [
        {
          "image": "cyberwaveos/go2-ros2-driver:jetson-humble",
          "name": "driver",
          "command": ["ros2", "launch", "cyberwave_go2_driver", "robot_driver.launch.py"]
        },
        {
          "image": "cyberwaveos/go2-ros2-driver:jetson-humble",
          "name": "bridges",
          "command": ["ros2", "launch", "cyberwave_go2_driver", "robot_bridges.launch.py"]
        },
        {
          "image": "cyberwaveos/ros2-nav2:jetson-humble",
          "name": "nav2"
        },
        {
          "image": "cyberwaveos/ros2-slam:jetson-humble",
          "name": "slam"
        },
        {
          "image": "cyberwaveos/ros2-elevation-mapping:jetson-humble",
          "name": "elevation",
          "prefer_gpu": true
        }
      ],
      "shared_env": {
        "CONFIG_PROFILE": "jetson",
        "ROS_DOMAIN_ID": "0",
        "CYBERWAVE_MAP_DIR": "/data"
      },
      "shared_params": ["--network", "host", "-v", "/data:/data"]
    },
    "default": {
      "docker_image": "cyberwaveos/go2-ros2-driver",
      "prefer_gpu": true
    }
  }
}
```

### How it works

- **`services`** present → multi-container mode (one container per service entry).
- **`docker_image`** present, no `services` → single-container mode (existing behavior, unchanged).
- The `default` fallback key works as before for platforms that don't match a specific key.

### Per-service fields

| Field | Required | Description |
|---|---|---|
| `image` | Yes | Docker image reference |
| `name` | Yes | Service name (used in container naming) |
| `command` | No | Override the container entrypoint command |
| `env` | No | Per-service environment variables |
| `params` | No | Per-service Docker params |
| `prefer_gpu` | No | Enable GPU passthrough for this service |
| `gpu` | No | GPU device selector (default: `all`) |

### Shared configuration

- `shared_env`: Environment variables applied to every service. Per-service `env` overrides shared values.
- `shared_params`: Docker params applied to every service (e.g. `--network host`, volume mounts).

### Environment layering

1. Edge Core base env (`CYBERWAVE_API_KEY`, MQTT, Zenoh, etc.) — existing, unchanged
2. `shared_env` from metadata
3. Per-service `env` from metadata

### Container naming

- Single-container mode (unchanged): `cyberwave-driver-{twin_uuid[:8]}`
- Multi-container mode: `cyberwave-driver-{twin_uuid[:8]}-{service_name}`

### Backward compatibility

The existing single-image contract is fully preserved. Metadata without a `services` key follows the original code path with zero changes. All existing tests continue to pass unmodified.

## Writing compatible drivers

A Cyberwave driver is a Docker image that interacts with device hardware and the Cyberwave backend. When Edge Core starts a driver container, it provides the following environment variables to the container:

- `CYBERWAVE_TWIN_UUID`
- `CYBERWAVE_API_KEY`
- `CYBERWAVE_TWIN_JSON_FILE` (writable file path)
- `CYBERWAVE_CHILD_TWIN_UUIDS` (optional, comma-separated)
- `CYBERWAVE_DATA_BACKEND` — data transport backend (`zenoh` by default)
- `ZENOH_SHARED_MEMORY` — `true`/`false`; enables zero-copy Zenoh SHM transport
- `ZENOH_CONNECT` — (optional) comma-separated Zenoh router endpoint URLs

`CYBERWAVE_CHILD_TWIN_UUIDS` is present when child camera twins are attached to the driver twin; drivers can use this to coordinate cameras without additional prompts.
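
A minimal driver-side sketch of reading these variables at startup:

```python
import json
import os

twin_uuid = os.environ["CYBERWAVE_TWIN_UUID"]

# Attached camera children, if any (comma-separated; absent otherwise).
child_uuids = [
    u for u in os.environ.get("CYBERWAVE_CHILD_TWIN_UUIDS", "").split(",") if u
]

# Full twin instance + catalog data; the file is writable, and Edge Core syncs
# local edits back to the backend when connectivity allows.
with open(os.environ["CYBERWAVE_TWIN_JSON_FILE"]) as f:
    twin = json.load(f)

edge_configs = twin.get("metadata", {}).get("edge_configs", {})
```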

### Zenoh data bus

Edge Core automatically injects Zenoh transport configuration into every driver container so that drivers using `cw.data.publish()` work without any extra configuration. The data-bus variables are:

| Variable | Default | Description |
|---|---|---|
| `CYBERWAVE_DATA_BACKEND` | `zenoh` | Data transport: `zenoh` or `filesystem` |
| `ZENOH_SHARED_MEMORY` | `false` | Opt-in zero-copy shared-memory transport. Requires `--ipc=host` between containers; leave disabled unless your runtime is configured for it. |
| `ZENOH_CONNECT` | (empty) | Router endpoints, e.g. `tcp/10.0.0.1:7447` |

All variables can be overridden per-driver with `-e KEY=VALUE` in driver params.

**Peer-to-peer mode (default):** when `ZENOH_CONNECT` is empty, Zenoh uses multicast discovery. On Linux with `--network host` (the default), all driver containers on the same machine discover each other automatically.

**Router mode (optional):** set `ZENOH_ROUTER_ENABLED=true` to have Edge Core start an `eclipse/zenoh:latest` router container before the driver containers. This is required for MQTT bridge or multi-hop topologies.

Environment variables for Zenoh infrastructure:

| Variable | Default | Description |
|---|---|---|
| `ZENOH_ROUTER_ENABLED` | `false` | Start a Zenoh router container before drivers |
| `ZENOH_ROUTER_IMAGE` | `eclipse/zenoh:latest` | Docker image for the router |
| `ZENOH_ROUTER_PORT` | `7447` | Host port for the router |
| `ZENOH_SHARED_MEMORY` | `false` | Opt-in shared-memory transport. Requires all Cyberwave containers to share an IPC namespace (`--ipc=host`); leave disabled unless validated end-to-end. |

### Driver failure handling

Drivers must exit with a **non-zero** code when they cannot access required hardware (for example, missing `/dev/video*` or disconnected peripherals). This allows Edge Core to detect startup failures and trigger restart logic.
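
A minimal sketch of that contract for a camera driver (the device path and the env-var fallback are illustrative):

```python
import os
import sys

VIDEO_DEVICE = os.environ.get("CYBERWAVE_METADATA_VIDEO_DEVICE", "/dev/video0")

if not os.path.exists(VIDEO_DEVICE):
    # Exit non-zero so Edge Core records the startup failure and can raise
    # driver_start_failure / driver_restart_loop alerts.
    print(f"required device {VIDEO_DEVICE} not found", file=sys.stderr)
    sys.exit(1)
```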

Edge Core alerts and behavior:

- `driver_start_failure`: raised if a driver container cannot reach a stable running state.
- `driver_restart_loop`: raised when a driver restarts more than the configured threshold (default 4 restarts within 60 seconds). The container is stopped and marked as flapping.

Optional environment variables to tune restart behavior:

- `CYBERWAVE_DRIVER_RESTART_LOOP_THRESHOLD` (default: `4`)
- `CYBERWAVE_DRIVER_RESTART_LOOP_WINDOW_SECONDS` (default: `60`)
- `CYBERWAVE_DRIVER_TROUBLESHOOTING_URL` (default: `https://docs.cyberwave.com`)

#### Driver revival and orphan containers

When a managed driver exits cleanly (Docker's `--restart unless-stopped` policy does not auto-revive clean exits), Edge Core's revival reconciler re-runs driver startup so the missing container is recreated. Revival is restricted to driver containers this Edge Core process is currently managing — i.e. whose twin is still linked to this edge's fingerprint.

Stopped `cyberwave-driver-*` containers belonging to twins that have since been unlinked are treated as **orphans** and ignored by revival. They remain on the host harmlessly until the user removes them with `docker rm` or `docker container prune`. Without this guard, an orphan would re-trigger driver startup every revival cycle and force-recreate the currently healthy drivers as a side effect of the idempotent `docker rm -f` step.

### Twin JSON file

`CYBERWAVE_TWIN_JSON_FILE` is an absolute path to a JSON file provided to the driver. The file contains the digital twin instance object (including its `metadata`) and the associated catalog twin data, matching the API schemas `TwinSchema` and `AssetSchema`.

Drivers may modify this file; Edge Core will sync changes back to the backend when connectivity is available.

#### Bidirectional twin sync

`reconcile_twin_json_file_sync()` runs on every reconcile cycle (~15 s) and now operates in **both directions**:

- **Push** (legacy): a local file whose checksum changed since the last cycle is pushed to `PUT /api/v1/twins/{uuid}`. The set of fields the edge is allowed to push is constrained by `_TWIN_UPDATE_ALLOWED_FIELDS` (no `asset_uuid`, `environment_uuid`, etc.).
- **Pull** (new): for every tracked twin file that did *not* change locally this cycle, the latest twin is fetched via `client.twins.get_raw(uuid)` and the fields in `_TWIN_PULL_ALLOWED_FIELDS` (currently just `metadata`) are merged into the local file.

The pull leg closes a gap: previously, UI-driven metadata edits (e.g. toggling the privacy frame filter in the sensor settings dialog) required an `edge-core` restart before they reached the driver container's environment via `entrypoint.sh`. Push wins for the cycle in which the local file changed; the next cycle's pull surfaces any concurrent backend edits.

The pull set is intentionally narrow: any field the edge legitimately writes locally **must not** be added to `_TWIN_PULL_ALLOWED_FIELDS`, otherwise the next cycle would silently clobber the local edit.
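
A sketch of the pull-leg merge described above (the helper and file handling are illustrative; `client.twins.get_raw(uuid)` is the documented call):

```python
import json
from pathlib import Path

_TWIN_PULL_ALLOWED_FIELDS = {"metadata"}

def pull_merge(twin_file: Path, client, uuid: str) -> None:
    """Merge backend-side changes into a twin file that did not change locally this cycle."""
    local = json.loads(twin_file.read_text())
    remote = client.twins.get_raw(uuid)          # latest twin from the backend
    for field in _TWIN_PULL_ALLOWED_FIELDS:
        if field in remote:
            local[field] = remote[field]
    twin_file.write_text(json.dumps(local, indent=2))
```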

### Twin metadata

Use the official Cyberwave SDK to interact with the API and MQTT; it abstracts authentication, retries, and handshake logic.

Register a driver by adding its configuration to a twin's metadata (or the catalog twin's metadata if you control the catalog twin). Use the environment view's **Advanced editing** to edit metadata.

Note: changing a catalog twin's metadata affects all subsequently created digital twins derived from that catalog twin.

Example driver metadata (JSON):

```json
{
  "drivers": {
    "default": {
      "docker_image": "cyberwaveos/so101-driver",
      "version": "0.0.1",
      "params": [
        "--network",
        "local",
        "--add-host",
        "host.docker.internal:host-gateway"
      ]
    }
  }
}
```

### Platform-specific driver selection

Edge Core can select platform-specific driver entries before falling back to
`default`.

Selection order:

1. Child-registry-specific entry (existing behavior)
2. Host platform/machine keys (for example `darwin-arm64`, `darwin`, `macos`, `mac`)
3. `default`

Example:

```json
{
  "drivers": {
    "default": {
      "docker_image": "cyberwaveos/so101-driver"
    },
    "darwin-arm64": {
      "docker_image": "cyberwaveos/so101-driver:macos",
      "params": ["-e", "CYBERWAVE_SERIAL_BRIDGE_URL=tcp://host.docker.internal:22001"]
    }
  }
}
```

### macOS host-device bridge hook

On macOS, Linux `--device` mappings in `params` cannot directly expose host
hardware to Linux containers. Edge Core now supports a pre-run native bridge
hook:

- Set `CYBERWAVE_MACOS_DEVICE_BRIDGE_COMMAND` on the host
- Edge Core executes it once per `--device` mapping before `docker run`
- Template variables available:
  - `{host_device}`
  - `{container_device}`
  - `{twin_uuid}`
  - `{container_name}`
  - `{config_dir}`

Example:

```bash
export CYBERWAVE_MACOS_DEVICE_BRIDGE_COMMAND="cyberwave-edge-hw-bridge --device {host_device} --target {container_device} --twin {twin_uuid}"
```

The command can start native camera/serial forwarding services that expose
bridge endpoints to the container (typically via `host.docker.internal`).

Bridge command stdout can optionally return a resolved source for the mapped
device:

- JSON: `{"resolved_device":"rtsp://host.docker.internal:8554/cam0"}`
- or line format: `resolved_device=rtsp://host.docker.internal:8554/cam0`

When this value differs from `/dev/video*`, Edge Core can transparently:

- inject `CYBERWAVE_METADATA_VIDEO_DEVICE` for the driver
- inject `CYBERWAVE_EDGE_VIDEO_DEVICE_MAP` (JSON map of Linux device to resolved source)
- remove Linux-only `--device /dev/video*` flags before `docker run` on macOS
  (enabled by default via `CYBERWAVE_MACOS_STRIP_VIDEO_DEVICE_PARAMS=true`)

This lets Linux-style drivers keep their normal auto-setup logic while receiving
a macOS-compatible video source without driver code changes.

For camera twins, Edge Core can also provide default bridge candidates on macOS
even when metadata has no explicit `--device` params (default driver config),
so Linux-oriented camera drivers remain compatible with minimal metadata.

To inject environment variables into a driver container, list `-e` flags inside `params`: each `-e` must be a separate element followed by its `KEY=value` string. Example:

```json
{
  "drivers": {
    "default": {
      "docker_image": "cyberwaveos/go2-native-driver",
      "params": ["-e", "MY_VAR=value", "-e", "ANOTHER_VAR=value2"]
    }
  }
}
```

This is equivalent to passing `-e MY_VAR=value -e ANOTHER_VAR=value2` on the `docker run` command line.

This is useful for driver-specific configuration that varies per device, such as IP addresses, credentials, or feature flags that cannot be stored in the twin's `edge_configs` metadata.

### Runtime configuration for drivers (`metadata["edge_configs"]`)

Drivers and edge services should treat `metadata["edge_configs"]` as the source of truth for per-device runtime configuration.
Edge identity should be stored at `metadata["edge_fingerprint"]` (not duplicated inside `edge_configs`).

> **Runtime access**: The core passes the full twin JSON (including `metadata`) to every driver via the `CYBERWAVE_TWIN_JSON_FILE` environment variable. Drivers can read `edge_configs` from that file at startup to obtain per-device settings — for example, selecting the right camera source or IP address for the current machine. This is the recommended way to pass device-specific configuration to a driver without hardcoding values in the image.

- Type: object/dictionary
- Value: binding object (`object`)

Canonical shape:

```json
{
  "edge_fingerprint": "macbook-pro-a1b2c3d4e5f6",
  "edge_configs": {
    "camera_config": {
      "camera_id": "front",
      "source": "rtsp://user:pass@192.168.1.20/stream",
      "fps": 10,
      "resolution": "VGA",
      "camera_type": "cv2"
    }
  }
}
```

Field notes:

- `edge_fingerprint`: fingerprint of the edge serving this twin (recommended).
- `camera_config`: per-device camera/runtime config consumed by drivers.

Avoid storing transient runtime state such as `edge_uuid`, `registered_at`, `last_sync`, `last_ip_address`, or `status_data` inside `edge_configs`.

Backward compatibility:

- Older records may use a legacy map shape (`edge_configs[fingerprint] = {...}`).
- Older records may store camera settings in `cameras[0]` or as top-level fields.
- New writers should prefer `camera_config` under `edge_configs`.
- Do not rely on `PUT /api/v1/edges/{uuid}/twins/{twin_uuid}/camera-config`; it is deprecated. Update twin metadata instead.
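
A sketch of how a driver might resolve `camera_config` while tolerating the legacy shapes above (the exact top-level legacy field names are assumptions):

```python
def resolve_camera_config(metadata: dict, fingerprint: str) -> dict:
    """Prefer the canonical shape, then fall back to legacy layouts (sketch)."""
    edge_configs = metadata.get("edge_configs") or {}

    # Canonical: metadata.edge_configs.camera_config
    if "camera_config" in edge_configs:
        return edge_configs["camera_config"]

    # Legacy map shape: edge_configs[fingerprint] = {...}
    if fingerprint in edge_configs:
        return edge_configs[fingerprint]

    # Legacy: cameras[0] or assumed top-level fields on metadata.
    if metadata.get("cameras"):
        return metadata["cameras"][0]
    return {k: metadata[k] for k in ("source", "fps", "resolution") if k in metadata}
```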

## Advanced usage

### Manual install and troubleshooting

```bash
# Install the Buildkite package signing key
curl -fsSL "https://packages.buildkite.com/cyberwave/cyberwave-edge-core/gpgkey" | gpg --dearmor -o /etc/apt/keyrings/cyberwave_cyberwave-edge-core-archive-keyring.gpg

# Configure the Apt source
echo -e "deb [signed-by=/etc/apt/keyrings/cyberwave_cyberwave-edge-core-archive-keyring.gpg] https://packages.buildkite.com/cyberwave/cyberwave-edge-core/any/ any main\ndeb-src [signed-by=/etc/apt/keyrings/cyberwave_cyberwave-edge-core-archive-keyring.gpg] https://packages.buildkite.com/cyberwave/cyberwave-edge-core/any/ any main" \
  > /etc/apt/sources.list.d/buildkite-cyberwave-cyberwave-edge-core.list

# Install (or upgrade) the package from the configured repo
apt update && apt install -y cyberwave-edge-core

# Run Edge Core (performs startup checks and starts drivers + worker container)
cyberwave-edge-core

# Show status, credentials and MQTT connectivity (read-only)
cyberwave-edge-core status

# Show version
cyberwave-edge-core --version

# Worker container management (also available via `cyberwave worker …`)
cyberwave-edge-core worker start      # Start the worker container
cyberwave-edge-core worker stop       # Stop the worker container
cyberwave-edge-core worker restart    # Restart the worker container
cyberwave-edge-core worker status     # Show container state, workers, cached models, and health
cyberwave-edge-core worker health     # Show detailed restart history and circuit-breaker state
cyberwave-edge-core worker logs       # Stream worker container logs
cyberwave-edge-core worker logs --no-follow  # Print recent logs without following
```

Preview builds from `dev` / `staging` CI are published as **separate Debian packages** in the same apt repo: `cyberwave-edge-core-dev` and `cyberwave-edge-core-staging`. `apt install cyberwave-edge-core` only pulls tagged releases; use one of the channel packages explicitly when you want those binaries (the packages conflict because they ship the same `/usr/bin/cyberwave-edge-core`).

On non-apt platforms, prerelease Python wheels are published to the Buildkite Python registry and consumed automatically by `cyberwave edge install --channel dev|staging`. Stable pip installs continue to use the public PyPI release.

### Environment variables

Run against a different environment/base URL:

```bash
export CYBERWAVE_ENVIRONMENT="yourenv"
export CYBERWAVE_BASE_URL="https://yourbaseurl"
cyberwave-edge-core
```

Control log verbosity (default: `INFO`):

```bash
export CYBERWAVE_EDGE_LOG_LEVEL="DEBUG"
cyberwave-edge-core
```

Or pass env vars to the CLI installer:

```bash
sudo CYBERWAVE_ENVIRONMENT="yourenv" CYBERWAVE_BASE_URL="https://yourbaseurl" CYBERWAVE_MQTT_HOST="yourmqtt" cyberwave edge install
```

## Local development (from this folder)

You can develop both the **Cyberwave CLI** and **Edge Core** from the `cyberwave-edge-core` directory using a single virtual environment that has the monorepo SDK, CLI, and edge-core installed in editable mode.

### One-time setup

From `cyberwave-edge-core/`:

```bash
# Create and activate a venv (e.g. .venv in this folder)
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install SDK, CLI, and Edge Core in editable mode (order matters: SDK first)
pip install -e ../cyberwave-sdks/cyberwave-python
pip install -e ../cyberwave-clis/cyberwave-python-cli/"[build]"
pip install -e ".[build]"
```

**Generate the SDK REST client (required for editable SDK).** The SDK’s `cyberwave.rest` package is generated from the backend OpenAPI spec and is not committed. If you see `ImportError: cannot import name 'DefaultApi' from 'cyberwave.rest'`:

1. Start the backend: `cd ../cyberwave-backend && docker compose -f local.yml up -d` (wait until healthy).
2. From the repo root, generate the REST client:
   ```bash
   cd cyberwave-sdks && ./python-sdk-gen.sh sdk --host localhost:8000
   ```
3. Re-run the `pip install -e` steps above if you already installed; the editable SDK will then include the generated `cyberwave/rest` code.

### Run CLI and Edge Core

After activating the venv, both commands are on your PATH:

```bash
# CLI
cyberwave --help
cyberwave login --email boss@cyberwave.com --password iamnottheboss
cyberwave edge install --help

# Edge Core
cyberwave-edge-core --help
cyberwave-edge-core status
cyberwave-edge-core
```

**Target backend:** If you do not set `CYBERWAVE_BASE_URL`, the CLI and Edge Core use the default **production** API (`https://api.cyberwave.com`). To use your local backend instead:

```bash
export CYBERWAVE_BASE_URL=http://localhost:8000
export CYBERWAVE_MQTT_HOST=localhost
export CYBERWAVE_ENVIRONMENT=local
```

### Paths from this folder

| What       | Path (from `cyberwave-edge-core/`)         |
| ---------- | ------------------------------------------ |
| Repo root  | `..`                                       |
| Python SDK | `../cyberwave-sdks/cyberwave-python`       |
| CLI        | `../cyberwave-clis/cyberwave-python-cli`   |

Edit code in any of those directories; the editable installs pick up changes (no reinstall needed for Python changes).

## Contributing

Contributions are welcome. Please open an issue to discuss bugs or feature requests, and submit a pull request when you are ready.

## Community and Documentation

- Documentation: https://docs.cyberwave.com
- Community (Discord): https://discord.gg/dfGhNrawyF
- Issues: https://github.com/cyberwave-os/cyberwave-edge-core/issues
