Metadata-Version: 2.4
Name: edgefirst_hal
Version: 0.25.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Classifier: Topic :: Software Development :: Libraries
Requires-Dist: numpy
Requires-Dist: pytest ; extra == 'test'
Requires-Dist: psutil ; extra == 'test'
Provides-Extra: test
Summary: Hardware Abstraction Layer for edge AI with zero-copy tensors, image processing, and YOLO decoding
Keywords: edge-ai,computer-vision,machine-learning,yolo,tensor,image-processing,dma,zero-copy,embedded,nxp-imx
Home-Page: https://edgefirst.ai
Author-email: Au-Zone Technologies <support@au-zone.com>
Maintainer-email: Au-Zone Technologies <support@au-zone.com>
License: Apache-2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Changelog, https://github.com/EdgeFirstAI/hal/blob/main/CHANGELOG.md
Project-URL: Documentation, https://github.com/EdgeFirstAI/hal#readme
Project-URL: Homepage, https://edgefirst.ai
Project-URL: Issues, https://github.com/EdgeFirstAI/hal/issues
Project-URL: Repository, https://github.com/EdgeFirstAI/hal.git

# edgefirst-hal

[![PyPI](https://img.shields.io/pypi/v/edgefirst-hal.svg)](https://pypi.org/project/edgefirst-hal/)
[![Python](https://img.shields.io/pypi/pyversions/edgefirst-hal.svg)](https://pypi.org/project/edgefirst-hal/)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

Hardware-accelerated image processing, zero-copy tensors, and YOLO decoding
for edge AI inference pipelines. Built in Rust with Python bindings via PyO3.

## Installation

```bash
pip install edgefirst-hal
```

Pre-built wheels are available for Linux (x86_64, aarch64), macOS, and Windows.
No Rust toolchain required.

> **Python 3.11+** wheels use the improved stable ABI for zero-copy buffer
> protocol support. Python 3.8–3.10 wheels use a compatible fallback.
> Pip selects the best wheel automatically.

## Quick Start

```python
import edgefirst_hal as ef

# Load a source image
src = ef.Tensor.load("photo.jpg", ef.PixelFormat.Rgb)

# Create an image processor (auto-selects best backend: GPU > G2D > CPU)
processor = ef.ImageProcessor()

# Allocate a GPU-optimal output buffer — always use create_image() for
# destinations passed to convert(), so the processor can select the best
# memory type (DMA-buf, PBO, or system memory) for zero-copy GPU paths.
dst = processor.create_image(640, 640, ef.PixelFormat.Rgb)

# Convert with a letterbox resize (preserves aspect ratio, pads with grey).
# Omit `letterbox=` to stretch-to-fill instead.
processor.convert(src, dst, letterbox=[114, 114, 114, 255])

# Access pixel data as a numpy array. Use the context manager + .numpy()
# form — this is the portable pattern that works on both wheel variants.
import numpy as np
with dst.map() as m:
    pixels = np.frombuffer(m.numpy(), dtype=np.uint8).reshape(dst.shape())

# The shorter `np.frombuffer(dst.map(), ...)` form only works on the
# abi3-py311 wheel, where `TensorMap` exposes Python's buffer protocol
# directly. The abi3-py38 compatibility wheel disables `__getbuffer__`,
# so use `.numpy()` if your code needs to run on Python 3.8–3.10.
```
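To make the letterbox behaviour concrete, the sketch below computes where a scaled source image would land inside the destination. It is plain Python and not the HAL's implementation: the name `letterbox_rect`, the centring, and the round-to-nearest-pixel choice are all assumptions made for illustration.

```python
def letterbox_rect(src_w, src_h, dst_w, dst_h):
    """Return (x, y, w, h) of the scaled source inside the destination.

    Assumes the image is centred and sizes are rounded to the nearest pixel.
    """
    # One uniform scale factor keeps the aspect ratio intact.
    scale = min(dst_w / src_w, dst_h / src_h)
    w = round(src_w * scale)
    h = round(src_h * scale)
    # Whatever space is left over becomes padding, split evenly on both sides.
    x = (dst_w - w) // 2
    y = (dst_h - h) // 2
    return x, y, w, h


# A 1920x1080 frame letterboxed into a 640x640 model input
rect = letterbox_rect(1920, 1080, 640, 640)
print(rect)  # (0, 140, 640, 360): 140-pixel bars above and below
```

The area outside this rectangle is what the `letterbox=` colour fills.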

## Role in edgefirst-hal

The `edgefirst-hal` package on PyPI is the Python face of the EdgeFirst
HAL Rust workspace:

- Built from [`crates/python`](https://github.com/EdgeFirstAI/hal/tree/main/crates/python),
  which is a PyO3 binding over the `edgefirst-hal` Rust umbrella crate.
- Does **not** consume the C API ([`edgefirst-hal-capi`](https://github.com/EdgeFirstAI/hal/blob/main/crates/capi/));
  the binding goes directly through Rust.
- Exposes the same `Tensor`, `ImageProcessor`, `Decoder`, and `Tracker`
  surfaces as the Rust crate, with numpy-friendly conversions and the
  buffer protocol for zero-copy interop.
- Wheels are distributed as two stable-ABI variants per platform —
  `abi3-py311` (preferred, supports buffer protocol features added in
  3.11) and `abi3-py38` (compatibility fallback for 3.8–3.10).
  Pip selects the best wheel automatically.
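
If code needs to know which variant it is probably running on, a version check is a reasonable proxy. This is a sketch under the assumption that pip installed the preferred wheel for the interpreter; the helper name `expected_wheel_variant` is hypothetical and not part of the package.

```python
import sys


def expected_wheel_variant(version=None):
    """Guess which stable-ABI wheel pip would pick for a Python version.

    `version` is a (major, minor, ...) tuple; defaults to the running interpreter.
    Versions below 3.8 are unsupported by the package and are not handled here.
    """
    major, minor = (version or sys.version_info)[:2]
    return "abi3-py311" if (major, minor) >= (3, 11) else "abi3-py38"


print(expected_wheel_variant())
```

Code that must run on both variants can skip the check entirely and use the `.numpy()` pattern from the Quick Start.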

## Key Features

- **Zero-copy tensors** — DMA-BUF, POSIX shared memory, and PBO-backed
  buffers with automatic fallback to system memory
- **Hardware-accelerated image processing** — OpenGL, NXP G2D, and
  optimized CPU backends with automatic selection
- **Letterbox resize** — aspect-ratio-preserving resize with configurable
  padding color, rotation, and flip
- **Int8 output** — `create_image(..., dtype="int8")` for direct signed
  int8 tensor output with GPU-accelerated XOR bias
- **YOLO decoding** — YOLOv5, YOLOv8, YOLO11, and YOLO26 detection and
  instance segmentation (including end-to-end models)
- **Object tracking** — ByteTrack multi-object tracker with Kalman filtering
- **Fully typed** — ships with `.pyi` stubs for IDE autocompletion and
  type checking with mypy / pyright

## Image Processing

```python
import edgefirst_hal as ef

processor = ef.ImageProcessor()
src = ef.Tensor.load("frame.jpg", ef.PixelFormat.Rgb)

# Letterbox resize to model input size (omit `letterbox=` to stretch-to-fill)
dst = processor.create_image(640, 640, ef.PixelFormat.Rgb)
processor.convert(src, dst, letterbox=[114, 114, 114, 255])

# With rotation and horizontal flip
processor.convert(src, dst, rotation=ef.Rotation.Rotate90, flip=ef.Flip.Horizontal)

# Crop source region
processor.convert(src, dst, src_crop=ef.Rect(100, 100, 400, 400))

# Int8 output for quantized models
dst_i8 = processor.create_image(640, 640, ef.PixelFormat.Rgb, dtype="int8")
processor.convert(src, dst_i8)
```
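
The int8 output path is described as applying an XOR bias. The sketch below shows the arithmetic this usually refers to, assuming the bias is the common `0x80` bit flip (the README does not state the constant): flipping the top bit of an unsigned byte and reinterpreting it as signed is the same as subtracting 128. The helper name is hypothetical and the code uses only the standard library.

```python
def xor_bias_to_int8(data: bytes) -> list:
    """Flip the top bit of each byte, then reinterpret the bytes as signed."""
    flipped = bytes(b ^ 0x80 for b in data)
    # cast("b") reinterprets the same bytes as signed 8-bit integers.
    return memoryview(flipped).cast("b").tolist()


pixels = bytes([0, 1, 127, 128, 255])
shifted = xor_bias_to_int8(pixels)

# Reference: subtract 128 from each unsigned value.
reference = [p - 128 for p in pixels]

print(shifted)               # [-128, -127, -1, 0, 127]
print(shifted == reference)  # True
```

Because the operation is a single bitwise XOR per byte, it maps cheaply onto a GPU shader or SIMD loop.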

## Zero-Copy External Buffer (Linux)

When integrating with an NPU delegate that owns DMA-BUF buffers, render
directly into the delegate's buffer to eliminate a `memcpy`:

```python
import edgefirst_hal as ef

processor = ef.ImageProcessor()
src = ef.Tensor.load("frame.jpg", ef.PixelFormat.Rgb)

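# `vx_fd` is a placeholder for the DMA-BUF file descriptor exported by your
# NPU delegate. Rendering into it directly avoids any intermediate copy.
dst = processor.import_image(fd=vx_fd, width=640, height=640, format=ef.PixelFormat.Rgb)
processor.convert(src, dst)

# Reverse direction: the HAL allocates and the consumer imports the fd.
# `delegate.register()` is a placeholder for your consumer's import call.
hal_dst = processor.create_image(640, 640, ef.PixelFormat.Rgb)
fd = hal_dst.dmabuf_clone()  # Raises if not DMA-backed
delegate.register(fd)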
```

You can also attach format metadata to any raw tensor created via `from_fd()`:

```python
t = ef.Tensor.from_fd(some_fd, [480, 640, 3])
t.set_format(ef.PixelFormat.Rgb)
processor.convert(src, t)
```

**Performance tip:** When rotating through a pool of DMA-BUFs (e.g. two or
three from an NPU delegate), create the `Tensor` wrappers once at init and
reuse them across frames. This avoids EGL image cache misses, which cost
roughly 100–300 µs each on Vivante GPUs.
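
One way to follow this tip is a small cache keyed by file descriptor. The sketch below is illustrative only: `TensorPool` and `fake_import` are hypothetical names, and a stand-in factory replaces the real import call so the example runs without hardware. It also assumes the delegate keeps its file descriptors open for the lifetime of the pool, so that an fd number always refers to the same buffer.

```python
class TensorPool:
    """Cache one wrapper per file descriptor so each is built only once."""

    def __init__(self, make_wrapper):
        self._make_wrapper = make_wrapper  # called once per previously unseen fd
        self._wrappers = {}

    def get(self, fd):
        wrapper = self._wrappers.get(fd)
        if wrapper is None:
            wrapper = self._make_wrapper(fd)
            self._wrappers[fd] = wrapper
        return wrapper


# Stand-in factory. In a real pipeline this would be something like:
#   lambda fd: processor.import_image(
#       fd=fd, width=640, height=640, format=ef.PixelFormat.Rgb)
created = []


def fake_import(fd):
    created.append(fd)
    return {"fd": fd}


pool = TensorPool(fake_import)
for frame in range(6):
    dst = pool.get(10 + frame % 3)  # rotate through fds 10, 11, 12

print(created)  # [10, 11, 12]: three wrappers for six frames
```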

## CUDA Zero-Copy (TensorRT)

When running inference with TensorRT or CuPy, `Tensor.cuda_map()` exposes a
raw CUDA device pointer to a tensor that has been registered with CUDA (e.g.
via the GL-CUDA interop path). The mapping is scoped by a context manager so
the GPU buffer is released automatically for the next `convert()` call.

Check availability first, then try `cuda_map()` and fall back to `map()` for
CPU paths:

```python
import edgefirst_hal as ef

# One-time check — cached after first call
if not ef.is_cuda_available():
    print("libcudart not found; falling back to CPU tensors")

tensor = ef.ImageProcessor().create_image(640, 640, ef.PixelFormat.Rgb)

cm = tensor.cuda_map()
if cm is not None:
    with cm as m:
        # m.device_ptr is the raw CUDA device pointer (int)
        # m.size is the buffer size in bytes
        trt_context.set_input_tensor_address("input", m.device_ptr)
        trt_context.execute_async_v3(stream)
else:
    # No CUDA handle on this tensor — use the CPU path
    with tensor.map() as host:
        run_cpu_inference(host)
```

`CudaMap` exposes:
- `device_ptr` (`int`): raw CUDA device pointer, suitable for wrapping with
  `cupy.cuda.UnownedMemory`, passing as `gpudata` to
  `pycuda.gpuarray.GPUArray`, or handing to TensorRT
  `set_input_tensor_address`.
- `size` (`int`): buffer size in bytes.
- `release()`: releases the mapping explicitly before the `with` block ends
  (idempotent).

## YOLO Decoding

```python
import edgefirst_hal as ef

# Configure decoder from model metadata
decoder = ef.Decoder(
    {"detection": {"shape": [1, 84, 8400], "dtype": "float32"}},
    score_threshold=0.5,
    iou_threshold=0.45,
)

# `output_tensor` is the raw model output matching the shape declared above.
# Decode model outputs into (boxes, scores, classes).
boxes, scores, classes = decoder.decode([output_tensor])
```
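
The `score_threshold` and `iou_threshold` arguments conventionally control confidence filtering and non-maximum suppression (NMS). The sketch below shows that post-processing step in plain Python so the effect of each threshold is visible. It is a class-agnostic illustration with hypothetical helper names, and it assumes boxes in `(x1, y1, x2, y2)` form; the decoder's internal behaviour may differ.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.45):
    """Return indices of kept boxes, highest score first."""
    # Step 1: drop low-confidence candidates and sort the rest by score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Step 2: keep a box only if it does not overlap a better box too much.
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = nms(boxes, scores)
print(kept)  # [0, 2]: box 1 overlaps box 0, box 3 is below the score threshold
```

Raising `iou_threshold` keeps more overlapping boxes; raising `score_threshold` discards more weak candidates before suppression runs.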

## Object Tracking

`ByteTrack` implements the ByteTrack multi-object tracking algorithm with
Kalman filtering and assigns consistent track IDs across frames.

```python
import edgefirst_hal as ef

tracker = ef.ByteTrack(
    high_conf=0.7,         # High-confidence detection threshold
    iou=0.25,              # IoU threshold for association
    update=0.25,           # Update/low-confidence threshold
    lifespan_ns=500_000_000,  # Track lifespan without detection (nanoseconds)
)

# Decode and track in one call. `decoder` is the ef.Decoder from the previous
# section; `timestamp_ns` is the frame timestamp in nanoseconds.
# Returns (boxes, scores, classes, masks, track_infos).
boxes, scores, classes, masks, tracks = decoder.decode_tracked(
    tracker, timestamp_ns, [output_tensor]
)
# masks is empty for detection-only models

# Query the currently active tracks
active = tracker.get_active_tracks()
```
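
The ByteTrack algorithm associates high-confidence detections with tracks first, then uses the remaining lower-confidence detections to rescue tracks that were left unmatched. The sketch below illustrates only the score split and the lifespan check in plain Python. The function names and the `low_conf` parameter are hypothetical, and how the constructor's `high_conf` and `update` arguments map onto these thresholds inside the HAL is an assumption.

```python
def split_by_confidence(scores, high_conf=0.7, low_conf=0.25):
    """Partition detection indices into first-pass and second-pass groups."""
    first_pass = [i for i, s in enumerate(scores) if s >= high_conf]
    second_pass = [i for i, s in enumerate(scores) if low_conf <= s < high_conf]
    return first_pass, second_pass


def is_expired(last_seen_ns, now_ns, lifespan_ns=500_000_000):
    """A track is dropped once it has gone unmatched for longer than its lifespan."""
    return now_ns - last_seen_ns > lifespan_ns


scores = [0.92, 0.40, 0.10, 0.75]
first_pass, second_pass = split_by_confidence(scores)
print(first_pass)   # [0, 3]: matched against tracks first
print(second_pass)  # [1]: used only for tracks still unmatched
# Index 2 (score 0.10) falls below both thresholds and is ignored.

print(is_expired(last_seen_ns=0, now_ns=600_000_000))  # True
```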

## Segmentation Mask Rendering

### draw_decoded_masks()

Draw pre-decoded masks onto a destination image:

```python
processor.draw_decoded_masks(
    dst,
    bbox,           # numpy array [N, 4]
    scores,         # numpy array [N]
    classes,        # numpy array [N]
    seg=[],         # list of segmentation arrays (optional)
    background=None,  # optional background tensor to blit before drawing
    opacity=1.0,    # mask alpha scale (0.0 – 1.0)
)
```

### draw_masks()

Decode model outputs and draw segmentation masks in a single call. Masks never
leave Rust, eliminating the Python round-trip overhead of `decode()` +
`draw_decoded_masks()`.

Without a tracker, returns `(boxes, scores, classes)`. With a tracker, returns
`(boxes, scores, classes, track_infos)`.

```python
import edgefirst_hal as ef

processor = ef.ImageProcessor()
tracker = ef.ByteTrack()

# Without tracking
boxes, scores, classes = processor.draw_masks(decoder, outputs, dst)

# With overlay parameters
boxes, scores, classes = processor.draw_masks(
    decoder, outputs, dst,
    background=bg_tensor,  # blit bg_tensor into dst before masks
    opacity=0.7,           # semi-transparent masks
)

# With tracking (requires tracker= and timestamp=)
import time
ts = time.monotonic_ns()
boxes, scores, classes, tracks = processor.draw_masks(
    decoder, outputs, dst,
    tracker=tracker,
    timestamp=ts,
)
```
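
The `opacity` parameter is documented as a mask alpha scale. One plausible reading is standard source-over compositing with the mask's alpha multiplied by `opacity`. The single-pixel sketch below shows that arithmetic; it is an assumption about the blend, not a description of the HAL's renderer, and `blend_pixel` is a hypothetical helper.

```python
def blend_pixel(dst_rgb, mask_rgb, mask_alpha, opacity=1.0):
    """Source-over blend of one mask pixel onto one destination pixel.

    mask_alpha and opacity are in [0.0, 1.0]; channels are 0-255 integers.
    """
    a = mask_alpha * opacity  # effective alpha after scaling
    return tuple(round(m * a + d * (1.0 - a)) for d, m in zip(dst_rgb, mask_rgb))


grey = (100, 100, 100)
red_mask = (200, 0, 0)

print(blend_pixel(grey, red_mask, mask_alpha=1.0, opacity=0.5))  # (150, 50, 50)
print(blend_pixel(grey, red_mask, mask_alpha=1.0, opacity=0.0))  # (100, 100, 100)
print(blend_pixel(grey, red_mask, mask_alpha=1.0, opacity=1.0))  # (200, 0, 0)
```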

## Platform Support

| Platform | Acceleration | Memory Types |
|----------|--------------|--------------|
| Linux (NXP i.MX8/i.MX95) | OpenGL + G2D | DMA-BUF, SHM, PBO, Mem |
| Linux (x86_64, other ARM) | OpenGL | SHM, PBO, Mem |
| macOS / Windows | None (CPU only) | Mem |

Hardware acceleration is used automatically when available. All platforms
fall back to CPU.

## Part of the EdgeFirst Ecosystem

`edgefirst-hal` is the runtime inference library in the
[EdgeFirst](https://edgefirst.ai) platform for deploying AI at the edge.

- **[EdgeFirst Studio](https://edgefirst.studio)** — label, train, and
  deploy models for edge devices
- **[Rust crates](https://crates.io/crates/edgefirst-hal)** — use the
  same library directly from Rust or C
- **[GitHub](https://github.com/EdgeFirstAI/hal)** — source code,
  architecture docs, benchmarks, and contribution guide

## Documentation

- Architecture overview: [ARCHITECTURE.md](https://github.com/EdgeFirstAI/hal/blob/main/crates/python/ARCHITECTURE.md)
- Testing guide: [TESTING.md](https://github.com/EdgeFirstAI/hal/blob/main/crates/python/TESTING.md)
- Project README (cross-language overview): [README.md](https://github.com/EdgeFirstAI/hal/blob/main/README.md)
- Optimization guide (cross-language user rules): [README.md#optimization-guide](https://github.com/EdgeFirstAI/hal/blob/main/README.md#optimization-guide)

## License

Apache-2.0 — see [LICENSE](https://github.com/EdgeFirstAI/hal/blob/main/LICENSE).

