Metadata-Version: 2.4
Name: neuromeka_vfm
Version: 0.2.0
Summary: Client utilities for Neuromeka VFM FoundationPose RPC (upload meshes, call server)
Author: Neuromeka
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0,>=1.23
Requires-Dist: pyzmq>=25.0.0
Provides-Extra: segmentation
Requires-Dist: av; extra == "segmentation"
Requires-Dist: opencv-python-headless>=4.5.0; extra == "segmentation"
Provides-Extra: dinov3
Requires-Dist: opencv-python-headless>=4.5.0; extra == "dinov3"
Provides-Extra: pcd
Requires-Dist: trimesh; extra == "pcd"
Requires-Dist: tqdm; extra == "pcd"
Provides-Extra: ssh
Requires-Dist: paramiko; extra == "ssh"
Provides-Extra: examples
Requires-Dist: Pillow; extra == "examples"
Requires-Dist: opencv-python>=4.5.0; extra == "examples"
Requires-Dist: pyrealsense2; extra == "examples"
Provides-Extra: all
Requires-Dist: av; extra == "all"
Requires-Dist: paramiko; extra == "all"
Requires-Dist: trimesh; extra == "all"
Requires-Dist: tqdm; extra == "all"
Requires-Dist: Pillow; extra == "all"
Requires-Dist: opencv-python>=4.5.0; extra == "all"
Requires-Dist: pyrealsense2; extra == "all"
Dynamic: license-file

# neuromeka_vfm

A lightweight client SDK for communicating with Segmentation servers (SAM2 pipeline and SAM3) and Pose Estimation (NVIDIA FoundationPose) servers over RPC/ZeroMQ. It also provides SSH/SFTP utilities to upload mesh files to the host.

- Website: http://www.neuromeka.com
- PyPI package: https://pypi.org/project/neuromeka_vfm/
- Documentation: https://docs.neuromeka.com

## Installation

```bash
pip install neuromeka_vfm
```

Optional feature extras:
```bash
# Segmentation compression helpers (h264/png/jpeg): av + OpenCV
pip install "neuromeka_vfm[segmentation]"

# Point-cloud utilities: trimesh + tqdm
pip install "neuromeka_vfm[pcd]"

# SSH/SFTP mesh upload: paramiko
pip install "neuromeka_vfm[ssh]"

# DINOv3 image-prompt detection helpers
pip install "neuromeka_vfm[dinov3]"

# Local demo dependencies: Pillow + OpenCV + pyrealsense2
pip install "neuromeka_vfm[examples]"

# All optional dependencies
pip install "neuromeka_vfm[all]"
```
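
To see which optional modules are still missing after installing an extra, a check along the following lines may help. This is a minimal sketch: `EXTRA_MODULES` and `missing_modules` are hypothetical names and not part of `neuromeka_vfm`. The mapping uses the usual import names for the distributions listed in the package metadata (`cv2` for OpenCV, `PIL` for Pillow).

```python
import importlib.util

# Import names per extra, derived from the extras listed in the metadata.
# Hypothetical helper table; the package itself does not export it.
EXTRA_MODULES = {
    "segmentation": ["av", "cv2"],
    "dinov3": ["cv2"],
    "pcd": ["trimesh", "tqdm"],
    "ssh": ["paramiko"],
    "examples": ["PIL", "cv2", "pyrealsense2"],
}


def missing_modules(extra, extra_modules=EXTRA_MODULES):
    """Return the import names for `extra` that cannot be found locally."""
    return [
        name
        for name in extra_modules[extra]
        if importlib.util.find_spec(name) is None
    ]
```

For example, `missing_modules("ssh")` returns `["paramiko"]` when paramiko is not installed and an empty list otherwise.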

## Python API (usage by example)

- Client PC: the machine running your application with this package installed.
- Host PC: the machine running Segmentation and Pose Estimation Docker servers. If you run Docker locally, use `localhost`.

### Segmentation

Install the extra first: `pip install "neuromeka_vfm[segmentation]"`.

```python
from neuromeka_vfm import Segmentation

seg = Segmentation(
    hostname="192.168.10.63",
    port=5432,
    compression_strategy="png",    # none | png | jpeg | h264
)

# Register using an image prompt
seg.add_image_prompt("drug_box", ref_rgb)
seg.register_first_frame(
    frame=first_rgb,
    prompt="drug_box",     # ID string
    use_image_prompt=True,
)

# Register using a text prompt
seg.register_first_frame(
    frame=first_rgb,
    prompt="box .",         # Text prompt (must end with " .")
    use_image_prompt=False,
)

# SAM2 tracking on the registered mask(s)
resp = seg.get_next(next_rgb)
if isinstance(resp, dict) and resp.get("result") == "ERROR":
    print(f"Tracking error: {resp.get('message')}")
    seg.reset()
else:
    masks = resp

# Segmentation settings / model selection (nrmk_realtime_segmentation v0.2+)
caps = seg.get_capabilities()["data"]
current = seg.get_config()["data"]
seg.set_config(
    {
        "grounding_dino": {
            "backbone": "Swin-B",        # Swin-T | Swin-B
            "box_threshold": 0.35,
            "text_threshold": 0.25,
        },
        "dino_detection": {
            "threshold": 0.5,
            "target_multiplier": 25,
            "img_multiplier": 50,
            "background_threshold": -1.0,
            "final_erosion_count": 10,
            "segment_min_size": 20,
        },
        "sam2": {
            "model": "facebook/sam2.1-hiera-large",
            "use_legacy": False,
            "compile": False,
            "offload_state_to_cpu": False,
            "offload_video_to_cpu": False,
        },
    }
)

# Remove an object (v0.2+, only when use_legacy=False)
seg.remove_object("cup_0")

seg.close()
```

Additional Segmentation APIs and behaviors

- `benchmark=True` in the constructor enables timing counters (`call_time`, `call_count`) for `add_image_prompt`, `register_first_frame`, and `get_next`.
- `switch_compression_strategy()` lets you change the compression strategy at runtime.
- `register_first_frame()` returns `True`/`False` and raises `ValueError` if image prompts are missing when `use_image_prompt=True`.
- `register_first_frame()` accepts a list of prompt IDs when `use_image_prompt=True`.
- `get_next()` returns `None` if called before registration; it can also return the server error dict when available.
- `reset()` performs a server-side reset, while `finish()` clears only local state.
- Exposed state: `tracking_object_ids`, `current_frame_masks`, `invisible_object_ids`.
- Backward-compat alias: `NrmkRealtimeSegmentation`.
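
The text-prompt example above notes that the prompt must end with `" ."`. A small helper can build such a string from a list of labels. This is a sketch only: `build_text_prompt` is a hypothetical helper, not part of the package, and joining several labels with `" . "` is an assumption based on the common GroundingDINO prompt convention.

```python
def build_text_prompt(labels):
    """Join object labels into a period-separated prompt ending with ' .'."""
    # Drop stray whitespace and any trailing period the caller already added.
    cleaned = [label.strip().rstrip(".").strip() for label in labels]
    cleaned = [label for label in cleaned if label]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    return " . ".join(cleaned) + " ."
```

The result can then be passed as `prompt=` to `register_first_frame(..., use_image_prompt=False)`.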

#### Segmentation v0.2 config summary (defaults/choices)
The output of `seg.get_capabilities()` can differ depending on server configuration. The following reflects the v0.2 defaults.

```yaml
grounding_dino:
  backbone:
    choices:
      - Swin-B
      - Swin-T
    default: Swin-T
  box_threshold:
    default: 0.35
    min: 0.0
    max: 1.0
  text_threshold:
    default: 0.25
    min: 0.0
    max: 1.0

dino_detection:
  threshold:
    default: 0.5
  target_multiplier:
    default: 25
  img_multiplier:
    default: 50
  background_threshold:
    default: -1.0
  final_erosion_count:
    default: 10
  segment_min_size:
    default: 20

sam2:
  model:
    choices:
      - facebook/sam2-hiera-base-plus
      - facebook/sam2-hiera-large
      - facebook/sam2-hiera-small
      - facebook/sam2-hiera-tiny
      - facebook/sam2.1-hiera-base-plus
      - facebook/sam2.1-hiera-large
      - facebook/sam2.1-hiera-small
      - facebook/sam2.1-hiera-tiny
    default: facebook/sam2.1-hiera-large
  use_legacy:
    default: false
  compile:
    default: false
  offload_state_to_cpu:
    default: false
  offload_video_to_cpu:
    default: false
```
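
Values can be checked against these limits on the client before calling `seg.set_config()`. The sketch below covers only the `grounding_dino` section. `GROUNDING_DINO_LIMITS` and `find_problems` are hypothetical names, and the limits are copied from the summary above rather than read from a live server, so prefer `seg.get_capabilities()` where the two differ.

```python
# Limits copied from the v0.2 summary above (grounding_dino section only).
GROUNDING_DINO_LIMITS = {
    "backbone": {"choices": ["Swin-B", "Swin-T"]},
    "box_threshold": {"min": 0.0, "max": 1.0},
    "text_threshold": {"min": 0.0, "max": 1.0},
}


def find_problems(section_config, limits=GROUNDING_DINO_LIMITS):
    """Return readable problems; an empty list means all values fit."""
    problems = []
    for key, value in section_config.items():
        rule = limits.get(key)
        if rule is None:
            continue  # no documented limit for this key
        if "choices" in rule and value not in rule["choices"]:
            problems.append(f"{key}={value!r} is not one of {rule['choices']}")
        if "min" in rule and not rule["min"] <= value <= rule["max"]:
            problems.append(
                f"{key}={value!r} is outside [{rule['min']}, {rule['max']}]"
            )
    return problems
```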

#### Segmentation v0.2 notes and changes

- If SAM2 VRAM estimation fails, `seg.get_next()` may return `{"result":"ERROR"}`. Handle the error and call `seg.reset()` before re-registering.
- `compile=True` can slow down first-frame registration and `reset`.
- CPU offloading is most effective when both `offload_state_to_cpu=True` and `offload_video_to_cpu=True` are set (legacy mode does not support `offload_video_to_cpu`).
- `remove_object` is supported only when `use_legacy=False`.
- GroundingDINO added the Swin-B backbone and fixed prompt-token merge issues.
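
The error-handling advice above can be wrapped in a small helper. This is a sketch under stated assumptions: `is_error_response` and `next_masks_or_reset` are hypothetical names, and `client` is any object with `get_next(frame)` and `reset()` methods, such as a `Segmentation` instance.

```python
def is_error_response(resp):
    """True when the server answered with an error dict."""
    return isinstance(resp, dict) and resp.get("result") == "ERROR"


def next_masks_or_reset(client, frame):
    """Return the tracking result, or None when the caller must re-register.

    None is returned both when nothing is registered yet and after a server
    error; in the error case the client is reset first.
    """
    resp = client.get_next(frame)
    if resp is None:
        return None  # get_next() was called before registration
    if is_error_response(resp):
        client.reset()  # server-side reset before registering again
        return None
    return resp
```

When the helper returns `None`, call `register_first_frame()` again before tracking further frames.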

### SAM3 Segmentation

`Sam3Segmentation` is a separate client for the SAM3 docker server.
It supports both a single-frame prediction API and a streaming tracking API.

```python
from neuromeka_vfm import Sam3Segmentation

sam3 = Sam3Segmentation(hostname="192.168.4.109", port=5559)

sam3.check()
caps = sam3.get_capabilities()["data"]
config = sam3.get_config()["data"]

sam3.set_config(
    {
        "resolution": 1008,
        "confidence_threshold": 0.5,
        "compile": False,
    }
)

# text prompt
resp = sam3.predict_text(frame=rgb, prompt="bolt")

# box prompt
resp = sam3.predict_box(
    frame=rgb,
    boxes=[[700, 470, 980, 620]],
    box_format="xyxy_abs",
)

# text + box prompt
resp = sam3.predict(
    frame=rgb,
    prompt="bolt",
    boxes=[[700, 470, 980, 620]],
    box_format="xyxy_abs",
)

if resp.get("result") == "SUCCESS":
    print(sam3.last_obj_ids)
    print(sam3.last_scores)
    print(sam3.last_boxes_xyxy)
    masks = sam3.current_frame_masks  # {obj_id: mask(H,W,1)}

# tracking flow
reg = sam3.register_first_frame(
    frame=rgb0,
    boxes=[[700, 470, 980, 620]],
    phrases=["bolt"],
)
if reg.get("result") == "SUCCESS":
    nxt = sam3.get_next(frame=rgb1)  # alias of track()
    if nxt.get("result") == "SUCCESS":
        print(sam3.last_obj_ids)
        print(sam3.current_frame_masks.keys())
    sam3.remove_object(obj_id=sam3.last_obj_ids[0], strict=False, need_output=True)
    sam3.stop_tracking(free_vram=True, drop_tracking_predictor=False)

memory_report = sam3.get_memory_report()
sam3.reset(free_vram=True, reset_tracking=True, drop_tracking_predictor=False)
sam3.close()
```

`Sam3Segmentation` methods:
- `check()`
- `get_capabilities()`
- `get_config()`
- `set_config(config)`
- `get_memory_report()`
- `reset(free_vram=True, reset_tracking=True, drop_tracking_predictor=False)`
- `reset_tracking(free_vram=True, drop_tracking_predictor=False)`
- `stop_tracking(free_vram=True, drop_tracking_predictor=False)`
- `predict(frame, prompt=None, boxes=None, labels=None, box_format="cxcywh_norm", confidence_threshold=None)`
- `predict_text(frame, prompt, confidence_threshold=None)`
- `predict_box(frame, boxes, labels=None, box_format="cxcywh_norm", prompt=None, confidence_threshold=None)`
- `register_first_frame(frame, boxes=None, phrases=None, points_data=None)`
- `track(frame)`
- `get_next(frame)`
- `remove_object(obj_id, strict=False, need_output=False)`
- `close()`
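
The `predict` and `predict_box` signatures default to `box_format="cxcywh_norm"`, while the example above passes `"xyxy_abs"`. The sketch below converts between the two. It assumes, going by the names alone, that `xyxy_abs` means `[x_min, y_min, x_max, y_max]` in pixels and `cxcywh_norm` means `[center_x, center_y, width, height]` divided by the image width and height. Both functions are hypothetical helpers, not SDK calls.

```python
def xyxy_abs_to_cxcywh_norm(box, image_width, image_height):
    """Pixel corner box -> normalized center/size box (assumed semantics)."""
    x_min, y_min, x_max, y_max = box
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box must have positive width and height")
    return [
        (x_min + x_max) / 2.0 / image_width,
        (y_min + y_max) / 2.0 / image_height,
        (x_max - x_min) / image_width,
        (y_max - y_min) / image_height,
    ]


def cxcywh_norm_to_xyxy_abs(box, image_width, image_height):
    """Normalized center/size box -> pixel corner box (assumed semantics)."""
    cx, cy, w, h = box
    return [
        (cx - w / 2.0) * image_width,
        (cy - h / 2.0) * image_height,
        (cx + w / 2.0) * image_width,
        (cy + h / 2.0) * image_height,
    ]
```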

`Sam3Segmentation` state:
- `last_obj_ids`
- `current_frame_masks` (`{obj_id: mask(H,W,1)}`)
- `last_boxes_xyxy`
- `last_scores`
- `tracking_object_ids`
- `invisible_object_ids`
- `first_frame_registered`
- `last_frame_idx`
- `tracking_active` (read-only property)
- `tracked_obj_ids` (read-only property)

SAM3 tracking request/response summary:
- `register_first_frame` request (box/phrase):
  - `{"operation":"register_first_frame","frame":frame,"boxes":...,"phrases":...}`
- `register_first_frame` request (points):
  - `{"operation":"register_first_frame","frame":frame,"points_data":{"obj_id":{"input_point":...,"input_label":...}}}`
- `register_first_frame` success response:
  - `{"result":"SUCCESS","data":{"frame_idx":int,"obj_ids":[...],"masks":uint8(N,H,W,1)}}`
- `track`/`get_next` request:
  - `{"operation":"track","frame":frame}` or `{"operation":"get_next","frame":frame}`
- `track`/`get_next` success response:
  - `{"result":"SUCCESS","data":{"obj_ids":[...],"masks":uint8(N,H,W,1)}}`
- `remove_object` request/response:
  - request: `{"operation":"remove_object","obj_id":"...","strict":bool,"need_output":bool}`
  - response: `{"result":"SUCCESS","data":{"obj_ids":[...]}}`
- `reset` supports tracking reset:
  - `{"operation":"reset","free_vram":bool,"reset_tracking":bool,"drop_tracking_predictor":bool}`
- `get_memory_report` response:
  - GPU/model memory tree, runtime tensor usage, allocator stats
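
The success responses above carry masks as one batch of shape `(N,H,W,1)` next to `obj_ids`, while the client state exposes `{obj_id: mask(H,W,1)}`. The sketch below shows one way the two shapes relate. It assumes that `masks[i]` belongs to `obj_ids[i]`; `masks_by_object` is a hypothetical helper and is not claimed to be how the client does it.

```python
def masks_by_object(data):
    """Split a batched mask payload into a per-object mapping.

    `data` is the "data" dict of a success response. `masks` may be a NumPy
    array of shape (N, H, W, 1) or any sequence indexed by its first axis.
    """
    obj_ids = list(data["obj_ids"])
    masks = data["masks"]
    if len(obj_ids) != len(masks):
        raise ValueError("obj_ids and masks must have the same length")
    # Assumption: the i-th mask corresponds to the i-th object ID.
    return {obj_id: masks[i] for i, obj_id in enumerate(obj_ids)}
```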

Reset behavior guidance:
- `stop_tracking()` is the recommended API to stop SAM3 streaming tracking.
- `reset_tracking(...)` is still available as an explicit alias.
- The current merged server keeps the `drop_tracking_predictor` parameter for compatibility, but it ignores the drop request and keeps the predictor resident.

Capability-based local validation:
- On initialization (`validate_capabilities_on_init=True` by default), the client reads `get_capabilities()` and caches supported `box_formats`/`config_keys`.
- `set_config` and box-prompt calls validate values against that cache when available.
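
The local validation described above can be pictured as follows. This is a sketch, not the client's actual code: `check_against_capabilities` is a hypothetical function, and the layout of the cached dict (lists stored under `config_keys` and `box_formats`) is an assumption.

```python
def check_against_capabilities(capabilities, config=None, box_format=None):
    """Return a list of problems; checks are skipped when nothing is cached."""
    errors = []
    allowed_keys = capabilities.get("config_keys")
    if config is not None and allowed_keys is not None:
        for key in config:
            if key not in allowed_keys:
                errors.append(f"unsupported config key: {key}")
    allowed_formats = capabilities.get("box_formats")
    if (
        box_format is not None
        and allowed_formats is not None
        and box_format not in allowed_formats
    ):
        errors.append(f"unsupported box_format: {box_format}")
    return errors
```

With an empty cache the function reports nothing, which matches the "when available" wording above.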

### DINOv3 Image Prompt Detection

`Dinov3Client` is the low-level client for the `nrmk_dinov3` ZeroMQ server.
`Dinov3Detection` adds image-prompt payload construction and heatmap-to-mask postprocessing.
It can optionally refine DINOv3 heatmap points through the SAM3 server by setting `run_sam3=True`.
The package does not start Docker or load DINOv3 weights; run the DINOv3 server separately.

```python
from neuromeka_vfm import Dinov3Client, Dinov3Detection, Dinov3DetectionConfig

# Low-level RPC client
dino = Dinov3Client(hostname="127.0.0.1", port=5568, timeout_ms=180000)
print(dino.get_capabilities())
print(dino.get_config())
dino.close()

# High-level image-prompt detection
detector = Dinov3Detection(hostname="127.0.0.1", port=5568, timeout_ms=180000)
resp = detector.detect_image_prompt(
    scene=scene_rgb,          # np.uint8 RGB, shape (H, W, 3)
    prompt_image=prompt_rgb,  # RGB or RGBA image crop, shape (h, w, 3|4)
    points=None,              # optional; defaults to alpha/object center
    config=Dinov3DetectionConfig(
        scene_patch_multiplier=50,
        prompt_patch_multiplier=25,
        threshold=0.7,
        max_detections=1,
    ),
)

if resp.get("result") == "SUCCESS":
    data = resp["data"]
    masks = data["masks"]       # {"1": mask(H,W,1)}
    bboxes = data["bboxes"]     # [{"label":1,"top":...,"left":...,"bottom":...,"right":...}]
    heatmap = data["scene_heatmap"]

detector.close()
```
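
The `bboxes` entries above use `top`/`left`/`bottom`/`right` keys, whereas `PoseEstimation.track()` takes a `bbox_xywh` argument. A conversion could look like the sketch below. It assumes pixel coordinates, `width = right - left`, `height = bottom - top`, and an `[x, y, w, h]` order for `bbox_xywh`. `bbox_dict_to_xywh` is a hypothetical helper.

```python
def bbox_dict_to_xywh(bbox):
    """Convert {"top", "left", "bottom", "right"} to [x, y, width, height]."""
    left, top = bbox["left"], bbox["top"]
    width = bbox["right"] - left
    height = bbox["bottom"] - top
    if width <= 0 or height <= 0:
        raise ValueError("bbox must have positive width and height")
    return [left, top, width, height]
```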

#### DINOv3 + SAM3 refinement

This SDK-side workflow calls the DINOv3 server first, selects high-score native DINO patch centers from the heatmap, sends those points to the SAM3 server as point prompts, and returns the consensus SAM3 mask.

```python
from neuromeka_vfm import Dinov3Detection, Dinov3DetectionConfig

detector = Dinov3Detection(hostname="127.0.0.1", port=5568, timeout_ms=180000)
resp = detector.detect_image_prompt(
    scene=scene_rgb,
    prompt_image=prompt_rgb,
    points=[{"x": 36, "y": 16}],   # reference-image point on the target object
    config=Dinov3DetectionConfig(
        threshold=0.2,
        run_sam3=True,
        sam3_hostname="127.0.0.1",  # defaults to the DINOv3 hostname
        sam3_port=5559,
        sam3_top_n_points=4,
    ),
)

if resp.get("result") == "SUCCESS":
    data = resp["data"]
    masks = data["masks"]                 # SAM3-refined masks
    dino_mask = data["dino_labeled_mask"] # thresholded DINO component mask
    selected_points = data["sam3"]["top_patch_points"]

detector.close()
```

For a separate workflow object, import `Dinov3Sam3Detection` and `Dinov3Sam3DetectionConfig`.

`Dinov3DetectionConfig` fields:
- `backbone`: optional server backbone override. Leave as `None` to keep the model already loaded by the DINOv3 server.
- `model_dtype`: optional server dtype override (`bfloat16`, `float16`, `float32`).
- `scene_patch_multiplier`: long-side resize multiplier for the scene image.
- `prompt_patch_multiplier`: long-side resize multiplier for prompt images.
- `threshold`: score threshold used to convert the scene heatmap to masks.
- `final_erosion_count`: optional 3x3 erosion iterations after thresholding.
- `segment_min_size`: minimum component area in pixels.
- `max_detections`: optional top-N component limit.
- `run_sam3`: when true, use DINOv3 heatmap-selected point prompts to refine masks with SAM3.
- `point_selection_mode`: `global_top_n` or `top_n_per_component`.
- `sam3_top_n_points`: number of positive DINO patch centers to pass to SAM3.
- `sam3_negative_top_n_points`: optional negative point count from medium/low score regions.
- `sam3_mask_consensus_mode`: `area` or `iou` consensus across SAM3 masks.
- `sam3_hostname`, `sam3_port`: SAM3 server location for refinement.
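
To illustrate what a `global_top_n` selection of patch centers could look like, the sketch below picks the highest-scoring cells of a patch-score grid and maps them to pixel centers. It is not the SDK's implementation. The grid layout (rows of per-patch scores), the `patch_size` parameter, and the function name are all assumptions made for illustration.

```python
import heapq


def top_n_patch_centers(patch_scores, patch_size, top_n):
    """Return the top_n highest-scoring patch centers as pixel points.

    patch_scores: 2D sequence where patch_scores[row][col] is a score.
    patch_size:   side length of one square patch in pixels (assumed).
    """
    scored = [
        (score, row, col)
        for row, scores in enumerate(patch_scores)
        for col, score in enumerate(scores)
    ]
    best = heapq.nlargest(top_n, scored, key=lambda item: item[0])
    return [
        {
            "x": col * patch_size + patch_size // 2,
            "y": row * patch_size + patch_size // 2,
            "score": score,
        }
        for score, row, col in best
    ]
```

The returned dicts reuse the `{"x": ..., "y": ...}` point shape shown in the refinement example.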

### Pose Estimation

Optional: generate a simple box STL (client-side utility)

```python
from neuromeka_vfm import MeshGenerator, write_box_stl

# function style
path = write_box_stl(
    filename="box_61x56x99.stl",
    width=0.0617,   # X (m)
    depth=0.0564,   # Y (m)
    height=0.0993,  # Z (m)
    output_dir="./mesh",   # optional, not fixed to /opt/meshes
)

# class style
mesh_gen = MeshGenerator(output_dir="./mesh")
path2 = mesh_gen.write_box_stl("box2.stl", width=0.05, depth=0.05, height=0.05)
```

Path rule:
- absolute `filename`: the file is written exactly there
- relative `filename`: resolved against `output_dir` if given, else `$NRMK_MESH_DIR`, else `/opt/meshes`
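
The path rule above can be expressed as a short function. This is a sketch of the stated rule only: `resolve_mesh_path` is a hypothetical name, and the real utility may differ in details such as path normalization.

```python
import os


def resolve_mesh_path(filename, output_dir=None, environ=None):
    """Resolve where a mesh file would be written under the rule above."""
    environ = os.environ if environ is None else environ
    if os.path.isabs(filename):
        return filename  # absolute paths are used as-is
    # Relative paths: output_dir, then $NRMK_MESH_DIR, then /opt/meshes.
    base_dir = output_dir or environ.get("NRMK_MESH_DIR") or "/opt/meshes"
    return os.path.join(base_dir, filename)
```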

**Mesh upload**: Upload the mesh file (STL) to `/opt/meshes/` on the host PC. You can also use SSH directly.
Install the extra first: `pip install "neuromeka_vfm[ssh]"`.

```python
from neuromeka_vfm import upload_mesh

upload_mesh(
    host="192.168.10.63",
    user="user",
    password="pass",
    local="mesh/my_mesh.stl",         # local mesh path
    remote="/opt/meshes/my_mesh.stl", # host mesh path (Docker volume)
)
```

Initialization

```python
from neuromeka_vfm import PoseEstimation

pose = PoseEstimation(host="192.168.10.72", port=5557)

pose.init(
    mesh_path="/app/modules/foundation_pose/mesh/my_mesh.stl",
    apply_scale=1.0,
    track_refine_iter=3,
    min_n_views=40,
    inplane_step=60,
)

# Or initialize directly with mesh arrays (without mesh_path)
pose.init(
    mesh_vertices=mesh_vertices,   # (V, 3)
    mesh_faces=mesh_faces,         # (F, 3)
    symmetry_tfs=symmetry_tfs,     # optional: (N,4,4) or (4,4)
)
```

- mesh input for `init`:
  - `mesh_path` (STL/OBJ path), or
  - direct mesh payload `mesh_vertices` + `mesh_faces`
  - if `mesh_path` is given, it takes priority over the mesh arrays
- symmetry_tfs: optional symmetry transforms with shape `(N,4,4)` or `(4,4)`.
- apply_scale: scalar applied after loading the mesh.
  - STL in meters: 1.0 (no scaling)
  - STL in centimeters: 0.01 (1 cm -> 0.01 m)
  - STL in millimeters: 0.001 (1 mm -> 0.001 m)
- force_apply_color: if True, forces a solid color when the mesh lacks color data.
- apply_color: RGB tuple (0-255) used when `force_apply_color=True`.
- est_refine_iter: number of refinement iterations during registration (higher = more accurate, slower).
- track_refine_iter: number of refinement iterations per frame during tracking.
- min_n_views: minimum number of sampled camera views (affects rotation candidates).
- inplane_step: in-plane rotation step in degrees (smaller = more candidates).
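
The `apply_scale` values listed above map mesh units to meters. A lookup such as the following keeps that choice explicit in application code. `UNIT_TO_METERS` and `apply_scale_for` are hypothetical names used only for this sketch.

```python
# Scale factors from the list above: mesh units -> meters.
UNIT_TO_METERS = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def apply_scale_for(units):
    """Return the apply_scale value for a mesh authored in `units`."""
    try:
        return UNIT_TO_METERS[units]
    except KeyError:
        raise ValueError(f"unsupported mesh units: {units!r}") from None
```

For a mesh exported in millimeters this yields `0.001`, which would be passed as `apply_scale=` to `pose.init()`.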

Registration and tracking

```python
# Registration (server defaults when iteration is omitted, check_vram=True pre-checks VRAM)
register_resp = pose.register(rgb=rgb0, depth=depth0, mask=mask0, K=cam_K, check_vram=True)

# Init + register in one call using mesh payload
register_mesh_resp = pose.register_with_mesh(
    mesh_path="/app/modules/foundation_pose/mesh/my_mesh.stl",
    # or mesh_vertices=..., mesh_faces=...
    symmetry_tfs=symmetry_tfs,   # optional: (N,4,4) or (4,4)
    rgb=rgb0,
    depth=depth0,
    mask=mask0,
    K=cam_K,
    check_vram=True,
)

# Tracking (optionally limit search area with bbox_xywh)
track_resp = pose.track(rgb=rgb1, depth=depth1, K=cam_K, bbox_xywh=bbox_xywh)

# Session-scoped tracking for a specific object
pose.register_with_mesh(
    session_id="cup_0",
    object_id="cup_0",
    mesh_path="/app/modules/foundation_pose/mesh/cup.stl",
    rgb=rgb0,
    depth=depth0,
    mask=mask0,
    K=cam_K,
)
track_cup = pose.track(
    session_id="cup_0",
    object_id="cup_0",
    rgb=rgb1,
    depth=depth1,
    K=cam_K,
    mask=mask1,
)

# Recommended reset operation
pose.reset_vram(free_vram=True)

# Backward-compatible alias (deprecated)
pose.reset()

pose.close()
```

- cam_K: camera intrinsics.
- Large RGB resolution, large `min_n_views`, or small `inplane_step` can cause GPU VRAM errors.
- `check_vram=True` in `register` performs a pre-check to prevent server shutdown due to OOM.
- `iteration` in `register`/`track` can override the server default if provided.
- `register_with_mesh()` sends init+register in one operation with mesh payload.
- `session_id` and `object_id` are optional for FoundationPose servers with multi-session support. If omitted, the server uses its default single-object session.
- `track()` can send `mask` or `bbox_xywh`; newer servers use them to recenter object-specific tracking.
- Use `reset_vram()` as the default reset API (`reset()` is kept as a deprecated backward-compatible wrapper).
- `reset_object()` can optionally receive mesh/model overrides:
  - `mesh_path` or `mesh_vertices` + `mesh_faces`
  - `model_pts`, `model_normals`, `symmetry_tfs`, `min_n_views`, `inplane_step`
- Default host/port can come from `FPOSE_HOST` and `FPOSE_PORT` environment variables.
- Backward-compat alias: `FoundationPoseClient`.
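
`cam_K` is described above only as the camera intrinsics. As a reminder of what such a matrix contains, the sketch below builds the standard 3x3 pinhole matrix from focal lengths and principal point, then projects a camera-frame point to pixel coordinates. That the server expects exactly this 3x3 layout is an assumption, and both function names are hypothetical.

```python
def intrinsics_matrix(fx, fy, cx, cy):
    """Standard pinhole intrinsics as a 3x3 nested list (no skew)."""
    return [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]


def project(K, point_xyz):
    """Project a 3D point in the camera frame (meters) to pixel (u, v)."""
    x, y, z = point_xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v
```

If the client requires an array rather than a nested list, `numpy.asarray(K)` converts it.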

Multi-object FoundationPose tracking

```python
from neuromeka_vfm import MultiObjectPoseEstimation

tracker = MultiObjectPoseEstimation(host="192.168.10.72", port=5557)
tracker.add_object(
    object_id="cup_0",
    session_id="cup_0",
    mesh_path="/app/modules/foundation_pose/mesh/cup.stl",
    mask_id="sam3_cup_0",
)
tracker.add_object(
    object_id="cup_1",
    session_id="cup_1",
    mesh_path="/app/modules/foundation_pose/mesh/cup.stl",
    mask_id="sam3_cup_1",
)

tracker.register_many(
    rgb=rgb0,
    depth=depth0,
    K=cam_K,
    masks_by_object_id={
        "cup_0": mask0,
        "cup_1": mask1,
    },
)

track_many_resp = tracker.track_many(
    rgb=rgb1,
    depth=depth1,
    K=cam_K,
    masks_by_object_id={
        "cup_0": mask0_next,
        "cup_1": mask1_next,
    },
)
poses = track_many_resp["data"]["poses"]
tracker.close()
```

Low-level API equivalents are available on `PoseEstimation`: `create_session`,
`list_sessions`, `delete_session`, `reset_session`, `register_many`, and `track_many`.
`track_many` requires a server that reports `track_many_supported=True` in capabilities.

<!--
## Benchmark

Measured on local servers. Empty cells are not yet measured.

**RTX 5060**
| Task | Prompt | None (s) | JPEG (s) | PNG (s) | h264 (s) |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO | text (human . cup .) | 0.86 | 0.35 | 0.50 | 0.52 |
| DINOv2 | image prompt | 0.85 | 0.49 | 0.65 | 0.63 |
| SAM2 | - |  |  |  |  |
| FoundationPose registration | - |  |  |  |  |
| FoundationPose track | - |  |  |  |  |

**RTX 5090**
| Task | Prompt | None (s) | JPEG (s) | PNG (s) | h264 (s) |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO | text (human . cup .) |  |  |  |  |
| DINOv2 | image prompt |  |  |  |  |
| SAM2 | - |  |  |  |  |
| FoundationPose registration | - | 0.4 | - |  |  |
| FoundationPose track | - | 0.03 |  |  |  |
-->

## Release notes

- 0.1.2: Improved success detection for Segmentation responses (`result`/`success`/`status`), fixed image prompt registration/usage, added `check_vram` to PoseEstimation `register`.
- 0.1.1: Improved resource cleanup in PoseEstimation/Segmentation, use server defaults when iteration is omitted, added pose demo example.
- 0.1.0: Initial public release. Includes FoundationPose RPC client, real-time segmentation client, SSH-based mesh upload CLI/API.
