Metadata-Version: 2.1
Name: bithuman
Version: 2.3.3
Summary: bitHuman Python SDK — libessence-backed avatar runtime. `from bithuman import AsyncBithuman`.
Keywords: bithuman,avatar,essence,lipsync,pybind11
Author-Email: bitHuman <hello@bithuman.ai>
License: Commercial — see LICENSE file
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: MacOS
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: C++
Classifier: Topic :: Multimedia
Classifier: Topic :: Multimedia :: Graphics
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Video
Project-URL: Homepage, https://bithuman.ai
Project-URL: Documentation, https://docs.bithuman.ai
Project-URL: Source, https://github.com/bithuman-product/bithuman-sdk
Requires-Python: >=3.10
Requires-Dist: numpy>=1.26.0
Requires-Dist: loguru~=0.7
Requires-Dist: soundfile~=0.13
Requires-Dist: pydantic~=2.10
Requires-Dist: pydantic-settings~=2.8
Requires-Dist: av>=12.0
Requires-Dist: opencv-python-headless>=4.8
Provides-Extra: test
Requires-Dist: pytest>=7; extra == "test"
Requires-Dist: psutil>=5.9; extra == "test"
Description-Content-Type: text/markdown

# bithuman

This is the **Python flavor of Layer 3**: a platform-specific library for app developers. It wraps the Layer 1 [`libessence` engine](../../README.md). For the CLI tool see [`docs/CLI.md`](../../../docs/CLI.md).

```
┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Platform-specific libraries (app developers)       │
│   - Python wheel       pip install bithuman    ◄──── you are here
│   - Swift package      SwiftPM Bithuman                     │
│   - Kotlin AAR         ai.bithuman:sdk                      │
│   - (future) Rust crate, JS/TS, Go, ...                     │
└─────────────────────────────────────────────────────────────┘
                          ▼ embeds
┌─────────────────────────────────────────────────────────────┐
│ Layer 2: bithuman CLI (end-user tool)                       │
│   - one cross-platform binary on macOS / Linux / Windows    │
│   - brew install bithuman · curl-pipe installer             │
└─────────────────────────────────────────────────────────────┘
                          ▼ links
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: libessence engine (cross-platform C++ core)        │
│   - portable C ABI, same source on every target             │
│   - macOS · iOS · Android · Linux · Windows                 │
│   - never imported directly by app developers               │
└─────────────────────────────────────────────────────────────┘
```

## Two surfaces

`pip install bithuman` is **both a CLI and a Python library**. They share
the same libessence native engine but otherwise operate independently —
the CLI is a bundled Rust binary at `<pkg>/_bin/bithuman` and does not
import the Python lib API; lib users never invoke the CLI.

```sh
# CLI — talk to an avatar in your browser
bithuman run model.imx
```

```python
# Library — embed the runtime in your own app
from bithuman import AsyncBithuman
# `await` needs a running event loop; see the quickstart below for a full script.
runtime = await AsyncBithuman(model_path="model.imx", api_secret="...").start()
```

----

Python bindings for the **bitHuman SDK** — the portable C++ avatar engine
(`libessence`) that powers our cross-platform lipsync pipeline. The wheel
ships a native pybind11 module that talks directly to `libessence`,
so you get the same per-frame cost as our Swift and Kotlin clients with
none of the GIL noise.

On an Apple M5 with 24 GB unified memory we measure **~640 FPS sustained
compose** (1.56 ms/frame mean, 2.03 ms p99) for a 1248×704 avatar, with
**~206 MB peak RSS** end-to-end. Cold load is ~14 ms for the fixture and
~400 ms for the first compose tick (lazy ONNX init).

This package is namespace-isolated from the v0 `bithuman` SDK; you can
install both side-by-side.

## Install

```sh
pip install bithuman
```

> **Status.** The PyPI `bithuman` wheel is at **v2.2.6** (2026-05-27)
> shipping the bundled Rust CLI + conversation brain. `bithuman run` is
> the fast-path live-avatar command; `bithuman[local]` adds a fully
> on-device brain (whisper.cpp + llama.cpp + Supertonic). The legacy
> low-level streaming API (`AsyncBithuman.push_audio` + `async for ...
> in runtime.run()`) is still exported for library users — see the
> [legacy quickstart](#quickstart-legacy-asyncbithuman-pypi).

## Brain modes

`bithuman run avatar.imx` needs a conversational brain. There are two paths:

| Mode | Install | Env | When |
| ---- | ------- | --- | ---- |
| **Cloud (default)** | `pip install bithuman` | `OPENAI_API_KEY=sk-...` | Fastest setup, best quality (OpenAI Realtime). |
| **On-device** | `pip install 'bithuman[local]'` | `BITHUMAN_LOCAL=1` | Zero outbound network, no API keys (whisper.cpp + llama.cpp + Supertonic). |

Setting `BITHUMAN_LOCAL=1` takes precedence — the cloud key is ignored
when local mode is active. Run `bithuman doctor` to see which modes are
available on your machine.
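
The precedence rule can be sketched as a small helper. `resolve_brain_mode` is a hypothetical name used for illustration, not part of the `bithuman` API. It assumes only the two environment variables from the table above and treats exactly `BITHUMAN_LOCAL=1` as the local switch; the real CLI may apply further checks.

```python
import os
from typing import Mapping, Optional


def resolve_brain_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "local", "cloud", or "none" for the given environment.

    Hypothetical helper: it mirrors the documented precedence only.
    """
    env = os.environ if env is None else env
    if env.get("BITHUMAN_LOCAL") == "1":
        return "local"  # local mode wins; any cloud key is ignored
    if env.get("OPENAI_API_KEY"):
        return "cloud"
    return "none"  # neither brain is configured


print(resolve_brain_mode({"BITHUMAN_LOCAL": "1", "OPENAI_API_KEY": "sk-example"}))
```

`bithuman doctor` remains the authoritative check; the sketch only makes the ordering explicit.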

## Compatibility

- **Platforms:** macOS arm64, Linux x86_64, Linux arm64 — all ship as wheels.
  Windows is tracked for a follow-up.
- **Python:** 3.10 – 3.13 (cp310, cp311, cp312, cp313). CPython only.
- **ABI:** the published wheel wraps `libessence` ABI v4. The libessence
  engine itself is on ABI v6; that surface is currently exposed via the
  Swift / Kotlin / Rust bindings only. A PyO3-based wheel that exposes
  ABI v6 to Python is in progress.
- **Auth:** ships with live heartbeat against `api.bithuman.ai` baked into
  `libessence`. `Avatar.load(api_secret=...)` is the entry point;
  `BITHUMAN_API_SECRET` env var works too. Set `BITHUMAN_UNMETERED=1`
  for dev / parity-test runs.

## What you get

The package exposes three API tiers (all importable from `bithuman`):

| Tier        | Types                                                            | Use when…                                            |
| ----------- | ---------------------------------------------------------------- | ---------------------------------------------------- |
| Async       | `AsyncAvatar`, `AudioChunk`, `VideoControl`, `VideoFrame`        | Hosting a service / parity with legacy `AsyncBithuman` |
| Sync facade | `Avatar`, `ComposedFrame`, `EP`                                  | Offline / batch / CLI rendering                      |
| Low-level   | `Fixture`, `Runtime`, `EP_CPU`/`EP_AUTO`/`EP_COREML`/`EP_NNAPI`/`EP_QNN` | Direct C ABI access, custom audio pipeline           |

Error types: `BithumanError` (base), `TokenError` /
`TokenExpiredError` / `TokenValidationError` / `TokenRequestError` /
`AccountStatusError` (auth), `ModelError` / `ModelNotFoundError` /
`ModelLoadError` / `ModelSecurityError` / `ExpressionModelNotSupported`
(fixture), `RuntimeNotReadyError`.

Version info: `bithuman.__version__` (Python package),
`bithuman.__core_version__` (linked libessence), `bithuman.__abi_version__`.
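
When filing a bug report it helps to capture all three values at once. The helper below is a minimal sketch: `describe_versions` is a hypothetical name, and the `SimpleNamespace` stand-in (with placeholder values) exists only so the snippet runs without the wheel installed.

```python
from types import SimpleNamespace


def describe_versions(module) -> str:
    """Format the three documented version attributes of a module-like object."""
    fields = ("__version__", "__core_version__", "__abi_version__")
    return ", ".join(f"{name}={getattr(module, name, 'unknown')}" for name in fields)


# Placeholder object; in practice use:
#   import bithuman; print(describe_versions(bithuman))
fake = SimpleNamespace(__version__="0.0.0", __core_version__="0.0.0", __abi_version__=0)
print(describe_versions(fake))
```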

## Quickstart (legacy `AsyncBithuman` — PyPI)

This is the shape of the current published wheel. It ports directly from
the v0 `bithuman` SDK: feed PCM with `push_audio`, drain frames from the
`runtime.run()` async generator.

```python
import asyncio
from bithuman import AsyncBithuman

async def main():
    runtime = await AsyncBithuman(
        model_path="model.imx",
        api_secret="...",  # or BITHUMAN_API_SECRET env var
    ).start()

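    # Placeholder: 1 s of silence (int16 LE, 16 kHz mono); substitute real audio.
    pcm_16k_mono_int16_bytes = b"\x00\x00" * 16000
    await runtime.push_audio(pcm_16k_mono_int16_bytes,
                             sample_rate=16000, last_chunk=True)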

    async for frame in runtime.run():
        # frame.bgr_image is (H, W, 3) uint8 in BGR order
        ...

asyncio.run(main())
```

Audio input must be raw PCM: int16 little-endian bytes, 16 kHz, mono.
Decoding WAV / MP3 / FLAC / OGG is the caller's responsibility (for
example with `soundfile`).
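
One way to get from decoded samples to that byte layout is sketched below. `to_pcm16_mono_16k` is a hypothetical helper, not part of the package. It assumes float samples in `[-1, 1]` and uses linear interpolation as a deliberately crude resampler, so swap in a proper resampling library if quality matters.

```python
import numpy as np

TARGET_RATE = 16_000  # the documented input rate


def to_pcm16_mono_16k(samples: np.ndarray, sample_rate: int) -> bytes:
    """Convert float samples, shape (n,) or (n, channels), to int16 LE bytes at 16 kHz mono."""
    x = np.asarray(samples, dtype=np.float32)
    if x.ndim == 2:
        x = x.mean(axis=1)  # downmix by averaging channels
    if sample_rate != TARGET_RATE and x.size:
        n_out = int(round(x.size * TARGET_RATE / sample_rate))
        src_t = np.arange(x.size) / sample_rate
        dst_t = np.arange(n_out) / TARGET_RATE
        x = np.interp(dst_t, src_t, x)  # linear interpolation (assumption)
    x = np.clip(x, -1.0, 1.0)
    return (x * 32767.0).astype("<i2").tobytes()


# With soundfile installed, the input could come from:
#   samples, rate = soundfile.read("speech.wav", dtype="float32", always_2d=True)
pcm_bytes = to_pcm16_mono_16k(np.zeros((32_000, 2), dtype=np.float32), 32_000)
print(len(pcm_bytes))
```

The resulting bytes are what `push_audio` expects as its first argument, together with `sample_rate=16000`.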

## Quickstart (low-level, C-level streaming surface)

The Rust PyO3 wheel will expose the ABI-v6 streaming pair
(`runtime.push_audio` + `runtime.pull_frame`) with the same shape as the
Swift / Kotlin bindings. Until it ships to PyPI, use the legacy
`Fixture` / `Runtime` types in the published wheel; the snippet is in
[Low-level API](#low-level-api).

## Live avatar — `bithuman run` (browser + voice chat)

The wheel also ships the `bithuman` Rust CLI plus an embedded conversation
brain (a `livekit-agents` worker). `bithuman run avatar.imx` stands up the
whole stack — embedded livekit-server, libessence runtime, brain, browser
player — and prints a URL you open to talk to the avatar.

```sh
pip install bithuman
export BITHUMAN_API_KEY=...     # avatar-runtime auth (https://www.bithuman.ai/#developer)
export OPENAI_API_KEY=sk-...    # the conversational brain (OpenAI Realtime)

bithuman run ~/.cache/bithuman/models/sample-avatar.imx
# → open the printed http://127.0.0.1:8088/<CODE> URL in a browser
```

### Fully on-device — `bithuman[local]`

For zero-cloud operation, install the `[local]` extra and set
`BITHUMAN_LOCAL=1`. No OpenAI key, no outbound network — the brain
swaps to whisper.cpp (STT) + llama.cpp (LLM) + Supertonic (TTS), all
in-process, all auto-downloaded from HuggingFace on first run.

```sh
pip install 'bithuman[local]'

export BITHUMAN_API_KEY=...
BITHUMAN_LOCAL=1 bithuman run ~/.cache/bithuman/models/sample-avatar.imx
```

| Slot | Library (mobile-portable C++ core) | Default model | Disk | RAM |
| ---- | ---------------------------------- | ------------- | ---- | --- |
| STT  | `pywhispercpp` → whisper.cpp        | `tiny.en`     | 77 MB  | ~150 MB |
| LLM  | `llama-cpp-python` → llama.cpp      | Qwen 2.5 0.5B-Instruct Q4_K_M | 400 MB | ~600 MB |
| TTS  | `supertonic` → ONNX Runtime         | Supertonic 3 (voice M1, 31 languages) | 380 MB | ~600 MB |
| VAD  | `livekit-plugins-silero`            | Silero        | 5 MB   | ~50 MB  |

Total ~860 MB on disk, ~1.5 GB RAM, ~717 ms warm load, ~1.4 s warm
end-to-end (STT + LLM + TTS) on Apple Silicon. Cold start adds ~90 s
once for first-run model downloads.

#### Optional knobs (env vars)

| Var | Default | What |
| --- | ------- | ---- |
| `BITHUMAN_LOCAL` | _unset_ | `=1` flips the brain to the local stack. |
| `BITHUMAN_LOCAL_WHISPER` | `tiny.en` | whisper.cpp model size (`tiny.en` / `base.en` / `small` / `large-v3-turbo`). |
| `BITHUMAN_LOCAL_LLM` | `Qwen/Qwen2.5-0.5B-Instruct-GGUF` | HuggingFace repo id of a GGUF LLM. |
| `BITHUMAN_LOCAL_LLM_FILE` | `qwen2.5-0.5b-instruct-q4_k_m.gguf` | GGUF file within the repo. |
| `BITHUMAN_LOCAL_VOICE` | `M1` | Supertonic voice preset (`M1`–`M5` / `F1`–`F5`). |
| `BITHUMAN_LOCAL_LANG` | `en` | Supertonic language (31 supported: `en`, `ko`, `ja`, `es`, `de`, …). |
| `BITHUMAN_INSTRUCTIONS` | _short default_ | Override the system prompt. |
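
The defaults in this table can be mirrored in a few lines when a wrapper script needs to know what the local brain will use. This is a sketch only: `LocalBrainConfig` and `load_local_config` are hypothetical names, not the package's own settings classes, and `BITHUMAN_INSTRUCTIONS` is left out because its default text is not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LocalBrainConfig:
    # Defaults copied from the table above.
    whisper: str = "tiny.en"
    llm_repo: str = "Qwen/Qwen2.5-0.5B-Instruct-GGUF"
    llm_file: str = "qwen2.5-0.5b-instruct-q4_k_m.gguf"
    voice: str = "M1"
    lang: str = "en"


def load_local_config(env: Optional[Mapping[str, str]] = None) -> LocalBrainConfig:
    """Read the BITHUMAN_LOCAL_* knobs, falling back to the documented defaults."""
    env = os.environ if env is None else env
    d = LocalBrainConfig()
    return LocalBrainConfig(
        whisper=env.get("BITHUMAN_LOCAL_WHISPER", d.whisper),
        llm_repo=env.get("BITHUMAN_LOCAL_LLM", d.llm_repo),
        llm_file=env.get("BITHUMAN_LOCAL_LLM_FILE", d.llm_file),
        voice=env.get("BITHUMAN_LOCAL_VOICE", d.voice),
        lang=env.get("BITHUMAN_LOCAL_LANG", d.lang),
    )


print(load_local_config({"BITHUMAN_LOCAL_VOICE": "F2", "BITHUMAN_LOCAL_LANG": "ko"}))
```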

All three local backends have first-party iOS/Android C++ builds, so the
same `.gguf` / `.bin` / `.onnx` model files are reusable when porting to
mobile — see `sdks/python/src/bithuman/local_plugins/`.

## CLI

`pip install bithuman` ships a `bithuman` console script — the Rust CLI
(bundled at `<wheel>/bithuman/_bin/bithuman`) is the supported surface.

```sh
pip install bithuman

bithuman run avatar.imx        # live avatar (browser-to-talk)
bithuman render avatar.imx -a speech.wav -o out.mp4
bithuman info  avatar.imx
# Other subcommands: list, pull, doctor (see `bithuman --help`)
```

See `bithuman --help` for full flags. Same Essence engine the lib uses;
two surfaces, one libessence.

## Low-level API

If you need finer control or want to swap in a custom audio pipeline,
the C ABI is exposed directly:

```python
import numpy as np
from bithuman import Fixture, Runtime, EP_CPU

fx = Fixture("model.imx", preferred_ep=EP_CPU, intra_op_threads=1)
rt = Runtime(fx)
pcm = np.fromfile("speech.f32", dtype=np.float32)  # 16 kHz mono float32
cluster_idx, bgr = rt.tick_compose(pcm, frame_idx_hint=-1)
# bgr.shape == (fx.frame_height, fx.frame_width, 3), dtype uint8
```

Pass the entire pcm buffer to each `tick_compose` call; the runtime
maintains an internal cursor and advances one tick per call until the
audio is exhausted.
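
To decide how many times to call `tick_compose`, a caller needs a tick count for the buffer. The estimate below is a sketch under a stated assumption: one tick produces one frame at the 25 fps rate mentioned under Limitations, so each tick consumes 640 samples of 16 kHz audio. `estimate_num_ticks` is a hypothetical helper, and whether the runtime rounds a partial final tick up or down is not documented here.

```python
import math

SAMPLE_RATE_HZ = 16_000  # documented input rate
FRAME_RATE_FPS = 25      # documented output rate; assumed to equal the tick rate


def estimate_num_ticks(num_samples: int) -> int:
    """Estimate how many ticks cover `num_samples` of 16 kHz mono audio."""
    samples_per_tick = SAMPLE_RATE_HZ // FRAME_RATE_FPS  # 640 under the assumption above
    return math.ceil(num_samples / samples_per_tick)


print(estimate_num_ticks(16_000))  # one second of audio
```

With the snippet above, this would be used as `num_ticks = estimate_num_ticks(len(pcm))`, followed by that many `rt.tick_compose(pcm, frame_idx_hint=-1)` calls.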

### Zero-alloc hot path (since 1.12.4)

For tight render loops, pre-allocate the BGR buffer once and pass it
via `out=`. The runtime writes into it in place and returns just the
`cluster_idx`. This drops wrapper overhead to within ~3 % of raw
libessence (vs ~8 % for the alloc-per-tick path):

```python
# fx, rt, pcm as in the snippet above; num_ticks is the caller's frame count.
out = np.empty((fx.frame_height, fx.frame_width, 3), dtype=np.uint8)
for _ in range(num_ticks):
    cluster_idx = rt.tick_compose(pcm, -1, out=out)
    # `out` now holds this tick's frame; read it before the next call.
```

The same `out=` keyword works on `tick_compose_to_size`. See
`docs/ARCHITECTURE.md` §9 for the cross-wrapper perf table.

## Build from source

You need the prebuilt parent C++ archive at
`engine/essence/build/libessence.a` (run the parent CMake build first), plus
the runtime deps from Homebrew (`onnxruntime`, `webp`, `ffmpeg`,
`hdf5`, `jpeg-turbo`).

```sh
cd sdks/python
uv pip install -e '.[cli,test]' --no-build-isolation
```

The CMake glue links the prebuilt static archive directly — it does NOT
re-run the parent build, so iterate on bindings without paying the C++
rebuild cost.

## Performance

Measured with `tests/bench.py` against the v1 compose path
(audio → composited BGR frame) on Apple M5 24 GB, libessence 1.16.0:

| Metric                       | Alloc per tick     | `out=` reuse buffer |
| ---------------------------- | ------------------ | ------------------- |
| Steady-state mean            | 1.53 ms / frame    | **1.45 ms / frame** |
| p99                          | 1.66 ms            | 1.53 ms             |
| Sustained throughput         | 655 FPS            | **692 FPS**         |
| Overhead vs raw libessence   | +8.3 %             | **+2.6 %**          |
| Peak RSS (proc)              | 192 MB             | 182 MB              |

Wrapper overhead on the `out=` path is 2.6 % over raw libessence (8.3 % with per-tick allocation);
see `docs/ARCHITECTURE.md` §9 for the apples-to-apples methodology and
the cross-wrapper comparison. Reproduce with:

```sh
scripts/bench-wrappers.sh
```
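
To put the table in real-time terms, the arithmetic below relates mean compose time to the 25 fps output rate. It is a back-of-the-envelope sketch, not part of the benchmark: the function names are hypothetical, and `1000 / mean` is close to but not identical to the measured throughput row, which comes from the benchmark itself.

```python
def ms_to_fps(mean_ms_per_frame: float) -> float:
    """Frames per second implied by a mean per-frame cost."""
    return 1000.0 / mean_ms_per_frame


def frame_budget_fraction(mean_ms_per_frame: float, output_fps: float = 25.0) -> float:
    """Fraction of the per-frame time budget consumed by compose."""
    budget_ms = 1000.0 / output_fps  # 40 ms at 25 fps
    return mean_ms_per_frame / budget_ms


for label, mean_ms in (("alloc per tick", 1.53), ("out= reuse", 1.45)):
    print(f"{label}: {ms_to_fps(mean_ms):.0f} FPS implied, "
          f"{frame_budget_fraction(mean_ms):.1%} of the 40 ms budget")
```

On these figures compose uses under 4 % of each 40 ms frame slot, leaving the rest for audio decoding, encoding, and transport.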

## Linux wheels

Pre-built `manylinux_2_28` wheels ship for x86_64 + aarch64 across cp310
through cp313 — 8 wheels in total, all auditwheel-repaired with the
full dep tree bundled (ORT, FFmpeg, HDF5, libjpeg-turbo, libwebp,
libcurl, OpenSSL).
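
The count of eight follows from two architectures times four interpreter tags. The sketch below enumerates that matrix. The tag strings assume the standard `{python}-{abi}-{platform}` wheel naming with a per-version CPython ABI tag; the actual published filenames are not confirmed here.

```python
from itertools import product

ARCHES = ("x86_64", "aarch64")
PYTAGS = ("cp310", "cp311", "cp312", "cp313")


def wheel_tags() -> list:
    """List the assumed compatibility tags for the Linux wheel matrix."""
    return [f"{py}-{py}-manylinux_2_28_{arch}" for arch, py in product(ARCHES, PYTAGS)]


for tag in wheel_tags():
    print(tag)
```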

To rebuild them locally:

```sh
# One-time: build the dep-baked Docker images (~10 min each).
docker build --platform linux/amd64 -t libessence/manylinux-x86_64:0.1 \
    -f scripts/Dockerfile.manylinux-x86_64 scripts/
docker build --platform linux/arm64/v8 -t libessence/manylinux-aarch64:0.1 \
    -f scripts/Dockerfile.manylinux-aarch64 scripts/

# Per wheel build (~2 min):
docker run --rm --platform linux/amd64 -v "$REPO":/src \
    -e PYTAG=cp311 -e ARCH_INSIDE=x86_64 \
    libessence/manylinux-x86_64:0.1 \
    bash /src/sdks/python/scripts/build-wheel-in-container.sh
```

## Limitations

- Windows wheels not yet built — tracked for v0.2.
- The CLI's output framerate is fixed at 25 fps to match the model's
  internal rate. Pass `--output -` and pipe to your own encoder if you
  need temporal resampling.
- `preferred_ep=EP_COREML` / `EP_NNAPI` / `EP_QNN` is accepted but
  currently falls back to CPU in the v0.1 build.

## License

Commercial. Contact <hello@bithuman.ai>.

## See also

- [Root `README.md`](../../../README.md) — install matrix
- [`engine/essence/README.md`](../../../engine/essence/README.md) — libessence engine internals + C ABI
- [`docs/CLI.md`](../../../docs/CLI.md) — `bithuman` CLI reference
- [`sdks/swift/README.md`](../swift/README.md) — Swift binding
- [`sdks/kotlin/README.md`](../kotlin/README.md) — Kotlin/Android binding
- [`docs/BUILD_AND_RELEASE.md`](../../../docs/BUILD_AND_RELEASE.md) — release flow
