Metadata-Version: 2.4
Name: neuromeka_vfm
Version: 0.1.8
Summary: Client utilities for Neuromeka VFM FoundationPose RPC (upload meshes, call server)
Author: Neuromeka
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0,>=1.23
Requires-Dist: pyzmq>=25.0.0
Provides-Extra: segmentation
Requires-Dist: av; extra == "segmentation"
Requires-Dist: opencv-python-headless>=4.5.0; extra == "segmentation"
Provides-Extra: pcd
Requires-Dist: trimesh; extra == "pcd"
Requires-Dist: tqdm; extra == "pcd"
Provides-Extra: ssh
Requires-Dist: paramiko; extra == "ssh"
Provides-Extra: examples
Requires-Dist: Pillow; extra == "examples"
Requires-Dist: opencv-python>=4.5.0; extra == "examples"
Requires-Dist: pyrealsense2; extra == "examples"
Provides-Extra: all
Requires-Dist: av; extra == "all"
Requires-Dist: paramiko; extra == "all"
Requires-Dist: trimesh; extra == "all"
Requires-Dist: tqdm; extra == "all"
Requires-Dist: Pillow; extra == "all"
Requires-Dist: opencv-python>=4.5.0; extra == "all"
Requires-Dist: pyrealsense2; extra == "all"
Dynamic: license-file

# neuromeka_vfm

A lightweight client SDK for communicating with Segmentation servers (SAM2 pipeline and SAM3) and Pose Estimation (NVIDIA FoundationPose) servers over RPC/ZeroMQ. It also provides SSH/SFTP utilities to upload mesh files to the host.

- Website: http://www.neuromeka.com
- PyPI package: https://pypi.org/project/neuromeka_vfm/
- Documentation: https://docs.neuromeka.com

## Installation

```bash
pip install neuromeka_vfm
```

Optional feature extras:
```bash
# Segmentation compression helpers (h264/png/jpeg): av + OpenCV
pip install "neuromeka_vfm[segmentation]"

# Point-cloud utilities: trimesh + tqdm
pip install "neuromeka_vfm[pcd]"

# SSH/SFTP mesh upload: paramiko
pip install "neuromeka_vfm[ssh]"

# Local demo dependencies: Pillow + OpenCV + pyrealsense2
pip install "neuromeka_vfm[examples]"

# All optional dependencies
pip install "neuromeka_vfm[all]"
```

## Python API (usage by example)

- Client PC: the machine running your application with this package installed.
- Host PC: the machine running Segmentation and Pose Estimation Docker servers. If you run Docker locally, use `localhost`.

### Segmentation

Install the extra first: `pip install "neuromeka_vfm[segmentation]"`.

```python
from neuromeka_vfm import Segmentation

seg = Segmentation(
    hostname="192.168.10.63",
    port=5432,
    compression_strategy="png",    # none | png | jpeg | h264
)

# Register using an image prompt
seg.add_image_prompt("drug_box", ref_rgb)
seg.register_first_frame(
    frame=first_rgb,
    prompt="drug_box",     # ID string
    use_image_prompt=True,
)

# Alternatively, register using a text prompt
seg.register_first_frame(
    frame=first_rgb,
    prompt="box .",         # Text prompt (must end with " .")
    use_image_prompt=False,
)

# SAM2 tracking on the registered mask(s)
resp = seg.get_next(next_rgb)
if isinstance(resp, dict) and resp.get("result") == "ERROR":
    print(f"Tracking error: {resp.get('message')}")
    seg.reset()
else:
    masks = resp

# Segmentation settings / model selection (nrmk_realtime_segmentation v0.2+)
caps = seg.get_capabilities()["data"]
current = seg.get_config()["data"]
seg.set_config(
    {
        "grounding_dino": {
            "backbone": "Swin-B",        # Swin-T | Swin-B
            "box_threshold": 0.35,
            "text_threshold": 0.25,
        },
        "dino_detection": {
            "threshold": 0.5,
            "target_multiplier": 25,
            "img_multiplier": 50,
            "background_threshold": -1.0,
            "final_erosion_count": 10,
            "segment_min_size": 20,
        },
        "sam2": {
            "model": "facebook/sam2.1-hiera-large",
            "use_legacy": False,
            "compile": False,
            "offload_state_to_cpu": False,
            "offload_video_to_cpu": False,
        },
    }
)

# Remove an object (v0.2+, only when use_legacy=False)
seg.remove_object("cup_0")

seg.close()
```

Additional Segmentation APIs and behaviors

- `benchmark=True` in the constructor enables timing counters (`call_time`, `call_count`) for `add_image_prompt`, `register_first_frame`, and `get_next`.
- `switch_compression_strategy()` lets you change the compression strategy at runtime.
- `register_first_frame()` returns `True`/`False` and raises `ValueError` if image prompts are missing when `use_image_prompt=True`.
- `register_first_frame()` accepts a list of prompt IDs when `use_image_prompt=True`.
- `get_next()` returns `None` if called before registration; it can also return the server error dict when available.
- `reset()` performs a server-side reset, while `finish()` clears only local state.
- Exposed state: `tracking_object_ids`, `current_frame_masks`, `invisible_object_ids`.
- Backward-compat alias: `NrmkRealtimeSegmentation`.
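
The text prompt in the example above must end with `" ."`. A minimal sketch of a helper that builds such a prompt from a list of labels follows. `build_text_prompt` is a hypothetical name and not part of `neuromeka_vfm`, and joining several labels with `" . "` is an assumption based on the usual GroundingDINO prompt convention.

```python
def build_text_prompt(labels):
    """Join labels into an 'a . b .' style text prompt (hypothetical helper)."""
    # Drop surrounding whitespace and any trailing period the caller already added.
    cleaned = [label.strip().rstrip(".").strip() for label in labels]
    cleaned = [label for label in cleaned if label]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    # Assumption: labels are separated by " . " and the prompt ends with " .".
    return " . ".join(cleaned) + " ."


print(build_text_prompt(["box"]))           # box .
print(build_text_prompt(["human", "cup"]))  # human . cup .
```

The result can be passed as `prompt` to `register_first_frame(..., use_image_prompt=False)`.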

#### Segmentation v0.2 config summary (defaults/choices)
The output of `seg.get_capabilities()` can differ depending on the server configuration. The following summary reflects the v0.2 defaults.

```yaml
grounding_dino:
  backbone:
    choices:
      - Swin-B
      - Swin-T
    default: Swin-T
  box_threshold:
    default: 0.35
    min: 0.0
    max: 1.0
  text_threshold:
    default: 0.25
    min: 0.0
    max: 1.0

dino_detection:
  threshold:
    default: 0.5
  target_multiplier:
    default: 25
  img_multiplier:
    default: 50
  background_threshold:
    default: -1.0
  final_erosion_count:
    default: 10
  segment_min_size:
    default: 20

sam2:
  model:
    choices:
      - facebook/sam2-hiera-base-plus
      - facebook/sam2-hiera-large
      - facebook/sam2-hiera-small
      - facebook/sam2-hiera-tiny
      - facebook/sam2.1-hiera-base-plus
      - facebook/sam2.1-hiera-large
      - facebook/sam2.1-hiera-small
      - facebook/sam2.1-hiera-tiny
    default: facebook/sam2.1-hiera-large
  use_legacy:
    default: false
  compile:
    default: false
  offload_state_to_cpu:
    default: false
  offload_video_to_cpu:
    default: false
```
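
Before calling `seg.set_config(...)`, it can help to check values against the ranges and choices above on the client side. The sketch below assumes the capabilities are available as a nested dict shaped like the YAML summary; the real `get_capabilities()["data"]` payload may be structured differently. `SPEC` and `find_config_errors` are hypothetical names used only for illustration.

```python
# Assumption: a capabilities dict shaped like the YAML summary above.
SPEC = {
    "grounding_dino": {
        "backbone": {"choices": ["Swin-B", "Swin-T"], "default": "Swin-T"},
        "box_threshold": {"default": 0.35, "min": 0.0, "max": 1.0},
        "text_threshold": {"default": 0.25, "min": 0.0, "max": 1.0},
    },
}


def find_config_errors(config, spec):
    """Return human-readable problems found in `config`, checked against `spec`."""
    errors = []
    for section, values in config.items():
        for key, value in values.items():
            rule = spec.get(section, {}).get(key)
            if rule is None:
                errors.append(f"{section}.{key}: not listed in capabilities")
                continue
            if "choices" in rule and value not in rule["choices"]:
                errors.append(f"{section}.{key}: {value!r} not in {rule['choices']}")
            if "min" in rule and value < rule["min"]:
                errors.append(f"{section}.{key}: {value} is below {rule['min']}")
            if "max" in rule and value > rule["max"]:
                errors.append(f"{section}.{key}: {value} is above {rule['max']}")
    return errors


# An unknown backbone and an out-of-range threshold are both reported.
for problem in find_config_errors(
    {"grounding_dino": {"backbone": "Swin-L", "box_threshold": 1.5}}, SPEC
):
    print(problem)
```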

#### Segmentation v0.2 notes and changes

- If SAM2 VRAM estimation fails, `seg.get_next()` may return `{"result":"ERROR"}`. Handle the error and call `reset` before re-registering.
- `compile=True` can slow down first-frame registration and `reset`.
- CPU offloading is most effective when both `offload_state_to_cpu=True` and `offload_video_to_cpu=True` are set (legacy mode does not support `offload_video_to_cpu`).
- `remove_object` is supported only when `use_legacy=False`.
- GroundingDINO now offers the Swin-B backbone, and prompt-token merge issues were fixed.
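
The error-handling advice in the first note can be sketched as a small wrapper. `track_with_recovery` is a hypothetical helper, not part of the package. It only relies on the `get_next`, `reset`, and `register_first_frame` calls shown earlier, and re-registering on the frame that failed is an assumption about what suits your application.

```python
def track_with_recovery(seg, frame, prompt, use_image_prompt=False):
    """Track one frame; on a server error, reset and re-register on that frame.

    Returns the masks from `get_next`, or None when tracking failed and the
    client was reset (whether or not re-registration succeeded).
    """
    resp = seg.get_next(frame)
    if isinstance(resp, dict) and resp.get("result") == "ERROR":
        print(f"Tracking error: {resp.get('message')}")
        # Reset server-side state before registering again.
        seg.reset()
        seg.register_first_frame(
            frame=frame,
            prompt=prompt,
            use_image_prompt=use_image_prompt,
        )
        return None
    return resp
```

A caller would typically skip the frame when `None` comes back and continue with the next one.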

### SAM3 Segmentation

`Sam3Segmentation` is a separate client for the SAM3 Docker server (single-frame prediction API).
Unlike `Segmentation`, it does not provide the `register_first_frame` -> `get_next` tracking flow.

```python
from neuromeka_vfm import Sam3Segmentation

sam3 = Sam3Segmentation(hostname="192.168.4.109", port=5559)

sam3.check()
caps = sam3.get_capabilities()["data"]
config = sam3.get_config()["data"]

sam3.set_config(
    {
        "resolution": 1008,
        "confidence_threshold": 0.5,
        "compile": False,
    }
)

# text prompt
resp = sam3.predict_text(frame=rgb, prompt="bolt")

# box prompt
resp = sam3.predict_box(
    frame=rgb,
    boxes=[[700, 470, 980, 620]],
    box_format="xyxy_abs",
)

# text + box prompt
resp = sam3.predict(
    frame=rgb,
    prompt="bolt",
    boxes=[[700, 470, 980, 620]],
    box_format="xyxy_abs",
)

if resp.get("result") == "SUCCESS":
    print(sam3.last_obj_ids)
    print(sam3.last_scores)
    print(sam3.last_boxes_xyxy)
    masks = sam3.current_frame_masks  # {obj_id: mask(H,W,1)}

sam3.reset(free_vram=True)
sam3.close()
```

`Sam3Segmentation` methods:
- `check()`
- `get_capabilities()`
- `get_config()`
- `set_config(config)`
- `reset(free_vram=True)`
- `predict(frame, prompt=None, boxes=None, labels=None, box_format="xyxy_abs", confidence_threshold=None)`
- `predict_text(frame, prompt, confidence_threshold=None)`
- `predict_box(frame, boxes, labels=None, box_format="xyxy_abs", prompt=None, confidence_threshold=None)`
- `close()`

`Sam3Segmentation` state:
- `last_obj_ids`
- `current_frame_masks` (`{obj_id: mask(H,W,1)}`)
- `last_boxes_xyxy`
- `last_scores`

Capability-based local validation:
- On initialization (`validate_capabilities_on_init=True` by default), the client reads `get_capabilities()` and caches supported `box_formats`/`config_keys`.
- `set_config` and box-prompt calls validate values against that cache when available.
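
To illustrate the idea of capability-based validation, here is a minimal sketch of a cache that checks values locally and skips the check when nothing was cached. `CapabilityCache` is a hypothetical class and does not reflect how `Sam3Segmentation` is implemented internally. The example keys and formats are taken from the snippet above.

```python
class CapabilityCache:
    """Hypothetical local cache of server capabilities (illustration only)."""

    def __init__(self, box_formats=None, config_keys=None):
        # None means "capabilities not available": validation is skipped.
        self.box_formats = None if box_formats is None else set(box_formats)
        self.config_keys = None if config_keys is None else set(config_keys)

    def check_box_format(self, box_format):
        if self.box_formats is not None and box_format not in self.box_formats:
            raise ValueError(f"unsupported box_format: {box_format!r}")

    def check_config(self, config):
        if self.config_keys is None:
            return
        unknown = sorted(set(config) - self.config_keys)
        if unknown:
            raise ValueError(f"unsupported config keys: {unknown}")


cache = CapabilityCache(
    box_formats=["xyxy_abs"],
    config_keys=["resolution", "confidence_threshold", "compile"],
)
cache.check_box_format("xyxy_abs")       # passes silently
cache.check_config({"resolution": 1008})  # passes silently
```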

### Pose Estimation

Optional: generate a simple box STL (client-side utility)

```python
from neuromeka_vfm import MeshGenerator, write_box_stl

# function style
path = write_box_stl(
    filename="box_61x56x99.stl",
    width=0.0617,   # X (m)
    depth=0.0564,   # Y (m)
    height=0.0993,  # Z (m)
    output_dir="./mesh",   # optional, not fixed to /opt/meshes
)

# class style
mesh_gen = MeshGenerator(output_dir="./mesh")
path2 = mesh_gen.write_box_stl("box2.stl", width=0.05, depth=0.05, height=0.05)
```

Path resolution for `filename`:
- Absolute `filename`: the file is written exactly there.
- Relative `filename`: resolved against `output_dir` if given, otherwise `$NRMK_MESH_DIR`, otherwise `/opt/meshes`.
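
The path rule can be sketched as follows. `resolve_mesh_path` is a hypothetical function written only to illustrate the lookup order; the package's own implementation may differ in details. It uses POSIX-style paths to match the `/opt/meshes` default.

```python
import os
from pathlib import PurePosixPath


def resolve_mesh_path(filename, output_dir=None, env=None):
    """Illustrate the lookup order: absolute path, output_dir, env var, default."""
    env = os.environ if env is None else env
    path = PurePosixPath(filename)
    if path.is_absolute():
        # Absolute filenames are used exactly as given.
        return path
    base = output_dir or env.get("NRMK_MESH_DIR") or "/opt/meshes"
    return PurePosixPath(base) / path


print(resolve_mesh_path("/tmp/box.stl", output_dir="out"))  # /tmp/box.stl
print(resolve_mesh_path("box.stl", output_dir="out"))       # out/box.stl
print(resolve_mesh_path("box.stl", env={}))                 # /opt/meshes/box.stl
```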

**Mesh upload**: Upload the mesh file (STL) to `/opt/meshes/` on the host PC. Instead of using `upload_mesh`, you can also copy the file yourself over SSH.
Install the extra first: `pip install "neuromeka_vfm[ssh]"`.

```python
from neuromeka_vfm import upload_mesh

upload_mesh(
    host="192.168.10.63",
    user="user",
    password="pass",
    local="mesh/my_mesh.stl",         # local mesh path
    remote="/opt/meshes/my_mesh.stl", # host mesh path (Docker volume)
)
```

Initialization

```python
from neuromeka_vfm import PoseEstimation

pose = PoseEstimation(host="192.168.10.72", port=5557)

pose.init(
    mesh_path="/app/modules/foundation_pose/mesh/my_mesh.stl",
    apply_scale=1.0,
    track_refine_iter=3,
    min_n_views=40,
    inplane_step=60,
)
```

- mesh_path: path to the mesh file (STL/OBJ). Initialization fails if missing.
- apply_scale: scalar applied after loading the mesh.
  - STL in meters: 1.0 (no scaling)
  - STL in centimeters: 0.01 (1 cm -> 0.01 m)
  - STL in millimeters: 0.001 (1 mm -> 0.001 m)
- force_apply_color: if True, forces a solid color when the mesh lacks color data.
- apply_color: RGB tuple (0-255) used when `force_apply_color=True`.
- est_refine_iter: number of refinement iterations during registration (higher = more accurate, slower).
- track_refine_iter: number of refinement iterations per frame during tracking.
- min_n_views: minimum number of sampled camera views (affects rotation candidates).
- inplane_step: in-plane rotation step in degrees (smaller = more candidates).
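
As a rough aid for choosing these values, the sketch below maps mesh units to `apply_scale` and estimates how many rotation candidates a given `min_n_views`/`inplane_step` pair implies. Both functions are hypothetical helpers. The estimate assumes one candidate per sampled view and in-plane step; the server may sample more views than the minimum or prune candidates, so treat the number as an order-of-magnitude guide only.

```python
UNIT_TO_SCALE = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def scale_for_units(units):
    """Return the apply_scale value that converts mesh units to meters."""
    try:
        return UNIT_TO_SCALE[units]
    except KeyError:
        raise ValueError(f"unknown mesh units: {units!r}") from None


def estimate_rotation_candidates(min_n_views, inplane_step):
    """Rough estimate: sampled views times in-plane rotations per view."""
    if min_n_views <= 0 or inplane_step <= 0:
        raise ValueError("min_n_views and inplane_step must be positive")
    # In-plane angles 0, step, 2*step, ... below 360 degrees.
    inplane_rotations = len(range(0, 360, inplane_step))
    return min_n_views * inplane_rotations


print(scale_for_units("mm"))                 # 0.001
print(estimate_rotation_candidates(40, 60))  # 240
print(estimate_rotation_candidates(40, 30))  # 480
```

Halving `inplane_step` doubles the estimate, which is one reason small steps increase VRAM use.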

Registration and tracking

```python
# Registration: server defaults apply when `iteration` is omitted; check_vram=True pre-checks VRAM
register_resp = pose.register(rgb=rgb0, depth=depth0, mask=mask0, K=cam_K, check_vram=True)

# Tracking (optionally limit search area with bbox_xywh)
track_resp = pose.track(rgb=rgb1, depth=depth1, K=cam_K, bbox_xywh=bbox_xywh)

pose.close()
```

- cam_K: camera intrinsics.
- Large RGB resolution, large `min_n_views`, or small `inplane_step` can cause GPU VRAM errors.
- `check_vram=True` in `register` performs a pre-check to prevent server shutdown due to OOM.
- `iteration` in `register`/`track` can override the server default if provided.
- `reset()` resets the server state; `reset_object()` reuses the cached mesh to rebuild the rotation grid.
- Default host/port can come from `FPOSE_HOST` and `FPOSE_PORT` environment variables.
- Backward-compat alias: `FoundationPoseClient`.
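
The environment-variable defaults can be pictured with the sketch below. `resolve_endpoint` is a hypothetical function, and the `localhost`/`5557` fallbacks are assumptions for illustration (5557 is simply the port used in the example above); check the package source for the actual fallback values.

```python
import os


def resolve_endpoint(host=None, port=None, env=None):
    """Pick host/port from explicit arguments first, then FPOSE_* variables."""
    env = os.environ if env is None else env
    # Assumption: fall back to localhost:5557 when nothing else is set.
    resolved_host = host or env.get("FPOSE_HOST") or "localhost"
    resolved_port = port if port is not None else env.get("FPOSE_PORT", 5557)
    return resolved_host, int(resolved_port)


print(resolve_endpoint(env={"FPOSE_HOST": "192.168.10.72", "FPOSE_PORT": "5557"}))
print(resolve_endpoint(host="10.0.0.5", port=6000, env={}))
```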

<!--
## Benchmark

Measured on local servers. Empty cells are not yet measured.

**RTX 5060**
| Task | Prompt | None (s) | JPEG (s) | PNG (s) | h264 (s) |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO | text (human . cup .) | 0.86 | 0.35 | 0.50 | 0.52 |
| DINOv2 | image prompt | 0.85 | 0.49 | 0.65 | 0.63 |
| SAM2 | - |  |  |  |  |
| FoundationPose registration | - |  |  |  |  |
| FoundationPose track | - |  |  |  |  |

**RTX 5090**
| Task | Prompt | None (s) | JPEG (s) | PNG (s) | h264 (s) |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO | text (human . cup .) |  |  |  |  |
| DINOv2 | image prompt |  |  |  |  |
| SAM2 | - |  |  |  |  |
| FoundationPose registration | - | 0.4 | - |  |  |
| FoundationPose track | - | 0.03 |  |  |  |
-->

## Release notes

- 0.1.2: Improved success detection for Segmentation responses (`result`/`success`/`status`), fixed image prompt registration/usage, added `check_vram` to PoseEstimation `register`.
- 0.1.1: Improved resource cleanup in PoseEstimation/Segmentation, use server defaults when iteration is omitted, added pose demo example.
- 0.1.0: Initial public release. Includes FoundationPose RPC client, real-time segmentation client, SSH-based mesh upload CLI/API.
