Metadata-Version: 2.1
Name: bithuman
Version: 1.18.0
Summary: Portable C++ avatar runtime — Python bindings via pybind11. Powers the bitHuman Essence pipeline cross-platform.
Keywords: bithuman,avatar,essence,lipsync,pybind11
Author-Email: bitHuman <hello@bithuman.ai>
License: Commercial — see LICENSE file
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: MacOS
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: C++
Classifier: Topic :: Multimedia
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Video
Project-URL: Homepage, https://bithuman.ai
Project-URL: Documentation, https://docs.bithuman.ai
Project-URL: Source, https://github.com/bithuman-product/bithuman-sdk
Requires-Python: >=3.9
Requires-Dist: numpy>=1.24
Requires-Dist: h5py~=3.13
Requires-Dist: loguru~=0.7
Requires-Dist: soxr>=0.5
Requires-Dist: soundfile~=0.13
Requires-Dist: pydantic~=2.10
Requires-Dist: pydantic-settings~=2.8
Requires-Dist: networkx<4.0,>=3.1
Requires-Dist: PyYAML~=6.0
Requires-Dist: aiohttp~=3.11
Requires-Dist: onnxruntime>=1.18; python_version >= "3.10"
Requires-Dist: onnxruntime>=1.14; python_version < "3.10"
Requires-Dist: safetensors>=0.4
Requires-Dist: av>=12.0
Requires-Dist: PyJWT>=2.8
Requires-Dist: requests>=2.31
Requires-Dist: lz4>=4.3
Requires-Dist: PyTurboJPEG>=1.7
Requires-Dist: Pillow>=9.0
Requires-Dist: opencv-python-headless>=4.8
Requires-Dist: tqdm>=4.60
Provides-Extra: test
Requires-Dist: pytest>=7; extra == "test"
Requires-Dist: psutil>=5.9; extra == "test"
Provides-Extra: cli
Requires-Dist: soundfile>=0.12; extra == "cli"
Requires-Dist: imageio>=2.34; extra == "cli"
Requires-Dist: imageio-ffmpeg>=0.5; extra == "cli"
Description-Content-Type: text/markdown

# bithuman

This is the **Python flavor of Layer 3**: a platform-specific library for app developers. It wraps the Layer 1 [`libessence` engine](../../README.md). For the CLI tool see [`docs/CLI.md`](../../../docs/CLI.md).

```
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Platform-specific libraries (app developers)       │
│   - Python wheel       pip install bithuman    ◄──── you are here
│   - Swift package      SwiftPM Bithuman                     │
│   - Kotlin AAR         ai.bithuman:sdk                      │
│   - (future) Rust crate, JS/TS, Go, ...                     │
└─────────────────────────────────────────────────────────────┘
                          ▼ embeds
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: bithuman CLI (end-user tool)                       │
│   - one cross-platform binary on macOS / Linux / Windows    │
│   - brew install bithuman · curl-pipe installer             │
└─────────────────────────────────────────────────────────────┘
                          ▼ links
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: libessence engine (cross-platform C++ core)        │
│   - portable C ABI, same source on every target             │
│   - macOS · iOS · Android · Linux · Windows                 │
│   - never imported directly by app developers               │
└─────────────────────────────────────────────────────────────┘
```

Python bindings for the **bitHuman SDK** — the portable C++ avatar engine
(`libessence`) that powers our cross-platform lipsync pipeline. The wheel
ships a native pybind11 module that talks directly to `libessence`,
so you get the same per-frame cost as our Swift and Kotlin clients with
none of the GIL noise.

On an Apple M5 with 24 GB unified memory we measure **~640 FPS sustained
compose** (1.56 ms/frame mean, 2.03 ms p99) for a 1248×704 avatar, with
**~206 MB peak RSS** end-to-end. Cold load is ~14 ms for the fixture and
~400 ms for the first compose tick (lazy ONNX init).

This package is namespace-isolated from the v0 `bithuman` SDK; you can
install both side-by-side.

## Install

```sh
pip install bithuman
```

> **Status — Python wheel lags the rest of the SDK.** The PyPI
> `bithuman` wheel is **v1.12.4** (ABI v4), built from the legacy
> `python/` tree at the root of `bithuman-sdk`. The new
> ABI-v6 streaming surface (`be_runtime_push_audio` /
> `be_runtime_pull_frame` / …) is **C-level only** in this binding tree
> until the Rust PyO3 wheel at `cpp/bindings/rust/crates/bithuman-py`
> ships to PyPI as the canonical replacement. Today's PyPI users keep
> the legacy `AsyncBithuman.push_audio` + `async for ... in runtime.run()`
> shape — see the [legacy quickstart](#quickstart-legacy-asyncbithuman-pypi).

## Compatibility

- **Platforms:** macOS arm64, Linux x86_64, Linux arm64 — all ship as wheels.
  Windows is tracked for a follow-up.
- **Python:** 3.10 – 3.13 (cp310, cp311, cp312, cp313). CPython only.
- **ABI:** the published wheel wraps `libessence` ABI v4. The libessence
  engine itself is on ABI v6 — that surface is currently exposed via the
  Swift / Kotlin / Rust bindings only. The PyO3 wheel migration is in progress.
- **Auth:** ships with live heartbeat against `api.bithuman.ai` baked into
  `libessence`. `Avatar.load(api_secret=...)` is the entry point;
  `BITHUMAN_API_SECRET` env var works too. Set `BITHUMAN_UNMETERED=1`
  for dev / parity-test runs.

## What you get

The package exposes three API tiers (all importable from `bithuman`):

| Tier        | Types                                                            | Use when…                                            |
| ----------- | ---------------------------------------------------------------- | ---------------------------------------------------- |
| Async       | `AsyncAvatar`, `AudioChunk`, `VideoControl`, `VideoFrame`        | Hosting a service / parity with legacy `AsyncBithuman` |
| Sync facade | `Avatar`, `ComposedFrame`, `EP`                                  | Offline / batch / CLI rendering                      |
| Low-level   | `Fixture`, `Runtime`, `EP_CPU`/`EP_AUTO`/`EP_COREML`/`EP_NNAPI`/`EP_QNN` | Direct C ABI access, custom audio pipeline           |

Error types: `BithumanError` (base), `TokenError` /
`TokenExpiredError` / `TokenValidationError` / `TokenRequestError` /
`AccountStatusError` (auth), `ModelError` / `ModelNotFoundError` /
`ModelLoadError` / `ModelSecurityError` / `ExpressionModelNotSupported`
(fixture), `RuntimeNotReadyError`.

Version info: `bithuman.__version__` (Python package),
`bithuman.__core_version__` (linked libessence), `bithuman.__abi_version__`.
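
When filing a bug report or checking compatibility, it can help to log all three values together. The sketch below is one way to do that. `collect_versions` is a hypothetical helper, not part of the package, and it takes the module as an argument so that a missing attribute maps to `None` instead of raising:

```python
from typing import Any, Dict


def collect_versions(module: Any) -> Dict[str, Any]:
    """Read the three version attributes named above from `module`.

    Hypothetical helper: attributes that are absent map to None.
    """
    names = ("__version__", "__core_version__", "__abi_version__")
    return {name: getattr(module, name, None) for name in names}


# Usage (requires the wheel to be installed):
#   import bithuman
#   print(collect_versions(bithuman))
```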

## Quickstart (legacy `AsyncBithuman` — PyPI)

This is the shape of the current published wheel. It ports directly from
the v0 `bithuman` SDK: feed PCM with `push_audio`, drain frames from the
`runtime.run()` async generator.

```python
import asyncio
from bithuman import AsyncBithuman

async def main():
    runtime = await AsyncBithuman(
        model_path="model.imx",
        api_secret="...",  # or BITHUMAN_API_SECRET env var
    ).start()

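    # pcm_16k_mono_int16_bytes is caller-supplied: raw int16 little-endian
    # PCM bytes at 16 kHz mono (see the note below this block).
    await runtime.push_audio(pcm_16k_mono_int16_bytes,
                             sample_rate=16000, last_chunk=True)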

    async for frame in runtime.run():
        # frame.bgr_image is (H, W, 3) uint8 in BGR order
        ...

asyncio.run(main())
```

PCM accepted is int16 little-endian bytes at 16 kHz mono. WAV / MP3 /
FLAC / OGG decoding is the caller's responsibility (use `soundfile`).
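
As a rough illustration of that caller-side step, the sketch below converts decoded float samples into the byte layout described above. `to_int16_pcm_bytes` is a hypothetical helper, not part of `bithuman`. It assumes the samples are floats in [-1.0, 1.0] and already at 16 kHz; resampling is out of scope here.

```python
import numpy as np


def to_int16_pcm_bytes(samples: np.ndarray) -> bytes:
    """Convert float samples in [-1.0, 1.0] to mono int16 little-endian bytes.

    `samples` is shaped (frames,) or (frames, channels), which is the
    layout soundfile.read() returns. No resampling is performed.
    """
    # Downmix multi-channel audio by averaging the channels.
    mono = samples if samples.ndim == 1 else samples.mean(axis=1)
    clipped = np.clip(mono, -1.0, 1.0)
    # Scale to the int16 range and force little-endian byte order.
    return (clipped * 32767.0).round().astype("<i2").tobytes()


# Usage sketch (assumes speech.wav is already sampled at 16 kHz):
#   import soundfile as sf
#   samples, rate = sf.read("speech.wav", dtype="float32")
#   assert rate == 16000
#   pcm_16k_mono_int16_bytes = to_int16_pcm_bytes(samples)
```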

## Quickstart (low-level, C-level streaming surface)

The Rust PyO3 wheel will expose the ABI-v6 streaming pair
(`runtime.push_audio` + `runtime.pull_frame`) with the same shape as the
Swift / Kotlin bindings. Until it ships to PyPI, use the legacy
`Fixture` / `Runtime` types in the published wheel, shown under
[Low-level API](#low-level-api).

## CLI

An `essence-render` console script ships with the wheel:

```sh
pip install 'bithuman[cli]'

essence-render \
  --model ~/.cache/bithuman/models/sample-avatar.imx \
  --audio speech.wav \
  --output out.mp4
```

Pass `--output -` to stream raw BGR24 frames to stdout (handy for piping
into a separate ffmpeg pipeline or a custom encoder). Other flags:

| Flag | Default | Description |
| ---- | ------- | ----------- |
| `--fps` | 25 | Output FPS for the MP4 container. |
| `--quality` | 80 | libx264 quality 1..100 (higher = better). |
| `--ep` | `cpu` | Execution provider hint (`cpu`/`auto`/`coreml`/…). |
| `--threads` | 1 | ORT intra-op thread count. |
| `--no-audio` | – | Skip audio muxing; produce a silent video. |

Example end-to-end run (5 s sine sweep):

```
essence-render 0.1.0: model=sample-avatar.imx audio=sine_sweep_5s.wav ep=cpu threads=1
essence-render: loaded fixture in 14.9 ms — 1248x704 @ 25 fps, 183 clusters, 202 src frames
essence-render: composed 122 frames in 1.83s (14.96 ms/frame, 66.8 fps)
essence-render: wrote /tmp/sine_sweep_5s.mp4
```

(Throughput here is bounded by H.264 encode, not Essence inference. Use
`--output -` if you want to measure raw compose speed.)
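
If you consume the `--output -` stream from Python, something like the following could split it into frames. This is a sketch under assumptions: `iter_bgr24_frames` is a hypothetical helper, and it assumes the stream is tightly packed BGR24 frames with no header or padding, which this README does not spell out. Width and height must match the fixture (1248x704 in the log above).

```python
import io
from typing import BinaryIO, Iterator

import numpy as np


def iter_bgr24_frames(stream: BinaryIO, width: int, height: int) -> Iterator[np.ndarray]:
    """Yield (height, width, 3) uint8 frames from a raw BGR24 byte stream.

    Assumes a buffered binary stream (e.g. sys.stdin.buffer), where
    read(n) returns fewer than n bytes only at end of stream.
    """
    frame_bytes = width * height * 3
    while True:
        chunk = stream.read(frame_bytes)
        if len(chunk) < frame_bytes:
            return  # end of stream; a trailing partial frame is dropped
        yield np.frombuffer(chunk, dtype=np.uint8).reshape(height, width, 3)


# Demo on synthetic data: two all-zero 4x2 frames.
demo = io.BytesIO(bytes(4 * 2 * 3) * 2)
print(sum(1 for _ in iter_bgr24_frames(demo, width=4, height=2)))

# Real usage sketch:
#   essence-render --model ... --audio speech.wav --output - | python read_frames.py
#   for frame in iter_bgr24_frames(sys.stdin.buffer, 1248, 704): ...
```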

## Low-level API

If you need finer control or want to swap in a custom audio pipeline,
the C ABI is exposed directly:

```python
import numpy as np
from bithuman import Fixture, Runtime, EP_CPU

fx = Fixture("model.imx", preferred_ep=EP_CPU, intra_op_threads=1)
rt = Runtime(fx)
pcm = np.fromfile("speech.f32", dtype=np.float32)  # 16 kHz mono float32
cluster_idx, bgr = rt.tick_compose(pcm, frame_idx_hint=-1)
# bgr.shape == (fx.frame_height, fx.frame_width, 3), dtype uint8
```

Pass the entire pcm buffer to each `tick_compose` call; the runtime
maintains an internal cursor and advances one tick per call until the
audio is exhausted.

### Zero-alloc hot path (since 1.12.4)

For tight render loops, pre-allocate the BGR buffer once and pass it
via `out=`. The runtime writes into it in place and returns just the
`cluster_idx`. This drops wrapper overhead to within ~3 % of raw
libessence (vs ~8 % for the alloc-per-tick path):

```python
out = np.empty((fx.frame_height, fx.frame_width, 3), dtype=np.uint8)
for _ in range(num_ticks):
    cluster_idx = rt.tick_compose(pcm, -1, out=out)
    # `out` now holds this tick's frame; read it before the next call.
```

The same `out=` keyword works on `tick_compose_to_size`. See
`docs/ARCHITECTURE.md` §9 for the cross-wrapper perf table.

## Build from source

You need the prebuilt parent C++ archive at
`cpp/build/libessence.a` (run the parent CMake build first), plus
the runtime deps from Homebrew (`onnxruntime`, `webp`, `ffmpeg`,
`hdf5`, `jpeg-turbo`).

```sh
cd cpp/bindings/python
uv pip install -e '.[cli,test]' --no-build-isolation
```

The CMake glue links the prebuilt static archive directly and does NOT
re-run the parent build, so you can iterate on the bindings without
paying the C++ rebuild cost.

## Performance

Measured with `tests/bench.py` against the v1 compose path
(audio → composited BGR frame) on Apple M5 24 GB, libessence 1.16.0:

| Metric                       | Alloc per tick     | `out=` reuse buffer |
| ---------------------------- | ------------------ | ------------------- |
| Steady-state mean            | 1.53 ms / frame    | **1.45 ms / frame** |
| p99                          | 1.66 ms            | 1.53 ms             |
| Sustained throughput         | 655 FPS            | **692 FPS**         |
| Overhead vs raw libessence   | +8.3 %             | **+2.6 %**          |
| Peak RSS (proc)              | 192 MB             | 182 MB              |

Wrapper overhead on the `out=` path is +2.6 % over raw libessence
(+8.3 % when allocating per tick); see `docs/ARCHITECTURE.md` §9 for the
apples-to-apples methodology and the cross-wrapper comparison.
Reproduce with:

```sh
scripts/bench-wrappers.sh
```
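
For a quick local sanity check of the table's metrics (mean, p99, sustained FPS), a timing loop along these lines may be enough. It is a sketch, not the project's `tests/bench.py`: `bench` is a hypothetical helper, and the warmup and iteration counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, Dict


def bench(tick: Callable[[], object], warmup: int = 50, iters: int = 1000) -> Dict[str, float]:
    """Time `tick` and report mean / p99 latency in ms plus sustained FPS."""
    for _ in range(warmup):
        tick()  # discard cold-start ticks such as lazy ONNX init
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        tick()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.fmean(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99_ms = statistics.quantiles(samples_ms, n=100)[98]
    return {"mean_ms": mean_ms, "p99_ms": p99_ms, "fps": 1000.0 / mean_ms}


# Usage sketch with the zero-alloc path shown earlier:
#   stats = bench(lambda: rt.tick_compose(pcm, -1, out=out))
```

Sustained FPS here is simply `1000 / mean_ms`, so it reflects back-to-back compose calls with no encode or I/O in the loop.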

## Linux wheels

Pre-built `manylinux_2_28` wheels ship for x86_64 + aarch64 across cp310
through cp313 — 8 wheels in total, all auditwheel-repaired with the
full dep tree bundled (ORT, FFmpeg, HDF5, libjpeg-turbo, libwebp,
libcurl, OpenSSL).

To rebuild them locally:

```sh
# One-time: build the dep-baked Docker images (~10 min each).
docker build --platform linux/amd64 -t libessence/manylinux-x86_64:0.1 \
    -f scripts/Dockerfile.manylinux-x86_64 scripts/
docker build --platform linux/arm64/v8 -t libessence/manylinux-aarch64:0.1 \
    -f scripts/Dockerfile.manylinux-aarch64 scripts/

# Per wheel build (~2 min):
docker run --rm --platform linux/amd64 -v "$REPO":/src \
    -e PYTAG=cp311 -e ARCH_INSIDE=x86_64 \
    libessence/manylinux-x86_64:0.1 \
    bash /src/cpp/bindings/python/scripts/build-wheel-in-container.sh
```
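
To rebuild the whole 2 x 4 matrix rather than a single wheel, the per-wheel command can be generated for every architecture and Python tag. The sketch below only builds the argument lists. `wheel_build_commands` is a hypothetical helper, and `ARCH_INSIDE=aarch64` is an assumption, since only the x86_64 invocation is shown above.

```python
from itertools import product
from typing import List

ARCHES = {
    # arch: (docker --platform value, image tag), as used in the commands above
    "x86_64": ("linux/amd64", "libessence/manylinux-x86_64:0.1"),
    "aarch64": ("linux/arm64/v8", "libessence/manylinux-aarch64:0.1"),
}
PYTAGS = ("cp310", "cp311", "cp312", "cp313")
SCRIPT = "/src/cpp/bindings/python/scripts/build-wheel-in-container.sh"


def wheel_build_commands(repo: str) -> List[List[str]]:
    """Return one `docker run` argument list per (arch, pytag) pair."""
    cmds = []
    for (arch, (platform, image)), pytag in product(ARCHES.items(), PYTAGS):
        cmds.append([
            "docker", "run", "--rm", "--platform", platform,
            "-v", f"{repo}:/src",
            "-e", f"PYTAG={pytag}",
            # ARCH_INSIDE=aarch64 is assumed; the README only shows x86_64.
            "-e", f"ARCH_INSIDE={arch}",
            image, "bash", SCRIPT,
        ])
    return cmds


# Usage sketch:
#   import subprocess
#   for cmd in wheel_build_commands("/path/to/bithuman-sdk"):
#       subprocess.run(cmd, check=True)
```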

## Limitations

- Windows wheels not yet built — tracked for v0.2.
- The CLI's output framerate is fixed at 25 fps to match the model's
  internal rate. Pass `--output -` and pipe to your own encoder if you
  need temporal resampling.
- `preferred_ep=COREML/NNAPI/QNN` is accepted but currently falls back
  to CPU in the v0.1 build.

## License

Commercial. Contact <hello@bithuman.ai>.

## See also

- [Root `README.md`](../../../README.md) — install matrix
- [`cpp/README.md`](../../README.md) — libessence engine internals + C ABI
- [`docs/CLI.md`](../../../docs/CLI.md) — `bithuman` CLI reference
- [`cpp/bindings/swift/README.md`](../swift/README.md) — Swift binding
- [`cpp/bindings/kotlin/README.md`](../kotlin/README.md) — Kotlin/Android binding
- [`docs/BUILD_AND_RELEASE.md`](../../../docs/BUILD_AND_RELEASE.md) — release flow
