Metadata-Version: 2.4
Name: neuromeka_vfm
Version: 0.1.6
Summary: Client utilities for Neuromeka VFM FoundationPose RPC (upload meshes, call server)
Author: Neuromeka
License: MIT License
        
        Copyright (c) 2025 Neuromeka Co., Ltd.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pyzmq
Requires-Dist: paramiko
Requires-Dist: av
Requires-Dist: trimesh
Requires-Dist: tqdm
Dynamic: license-file

# neuromeka_vfm

A lightweight client SDK for communicating with Segmentation (SAM2, Grounding DINO) and Pose Estimation (NVIDIA FoundationPose) servers over RPC/ZeroMQ. It also provides SSH/SFTP utilities to upload mesh files to the host.

- Website: http://www.neuromeka.com
- PyPI package: https://pypi.org/project/neuromeka_vfm/
- Documents: https://docs.neuromeka.com

## Installation

```bash
pip install neuromeka_vfm
```

## Python API (usage by example)

- Client PC: the machine running your application with this package installed.
- Host PC: the machine running Segmentation and Pose Estimation Docker servers. If you run Docker locally, use `localhost`.

### Segmentation

```python
from neuromeka_vfm import Segmentation

seg = Segmentation(
    hostname="192.168.10.63",
    port=5432,
    compression_strategy="png",    # none | png | jpeg | h264
)

# Register using an image prompt
seg.add_image_prompt("drug_box", ref_rgb)
seg.register_first_frame(
    frame=first_rgb,
    prompt="drug_box",     # ID string
    use_image_prompt=True,
)

# Register using a text prompt
seg.register_first_frame(
    frame=first_rgb,
    prompt="box .",         # Text prompt (must end with " .")
    use_image_prompt=False,
)

# SAM2 tracking on the registered mask(s)
resp = seg.get_next(next_rgb)
if isinstance(resp, dict) and resp.get("result") == "ERROR":
    print(f"Tracking error: {resp.get('message')}")
    seg.reset()
else:
    masks = resp

# Segmentation settings / model selection (nrmk_realtime_segmentation v0.2+)
caps = seg.get_capabilities()["data"]
current = seg.get_config()["data"]
seg.set_config(
    {
        "grounding_dino": {
            "backbone": "Swin-B",        # Swin-T | Swin-B
            "box_threshold": 0.35,
            "text_threshold": 0.25,
        },
        "dino_detection": {
            "threshold": 0.5,
            "target_multiplier": 25,
            "img_multiplier": 50,
            "background_threshold": -1.0,
            "final_erosion_count": 10,
            "segment_min_size": 20,
        },
        "sam2": {
            "model": "facebook/sam2.1-hiera-large",
            "use_legacy": False,
            "compile": False,
            "offload_state_to_cpu": False,
            "offload_video_to_cpu": False,
        },
    }
)

# Remove an object (v0.2+, only when use_legacy=False)
seg.remove_object("cup_0")

seg.close()
```
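
The text prompt in the example above must end with ` .`, which is easy to forget when prompts are assembled at runtime. The following is a minimal sketch of a helper that builds such a string from a list of labels. `build_text_prompt` is a hypothetical name and is not part of `neuromeka_vfm`; joining several labels with ` . ` is an assumption based on the usual Grounding DINO prompt convention.

```python
from typing import Iterable


def build_text_prompt(labels: Iterable[str]) -> str:
    """Join labels into a prompt such as "human . cup ." (hypothetical helper)."""
    # Drop surrounding whitespace and any trailing "." the caller already added.
    cleaned = [label.strip().rstrip(".").strip() for label in labels]
    cleaned = [label for label in cleaned if label]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    # Separate labels with " . " and terminate the prompt with " .".
    return " . ".join(cleaned) + " ."
```

The result can be passed as `prompt=` to `register_first_frame()` with `use_image_prompt=False`.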

Additional Segmentation APIs and behaviors

- `benchmark=True` in the constructor enables timing counters (`call_time`, `call_count`) for `add_image_prompt`, `register_first_frame`, and `get_next`.
- `switch_compression_strategy()` lets you change the compression strategy at runtime.
- `register_first_frame()` returns `True` on success and `False` on failure; it raises `ValueError` if `use_image_prompt=True` and the required image prompts are missing.
- `register_first_frame()` accepts a list of prompt IDs when `use_image_prompt=True`.
- `get_next()` returns `None` if called before registration; if the server reports an error, it may return the server's error dict instead of masks.
- `reset()` performs a server-side reset, while `finish()` clears only local state.
- Exposed state: `tracking_object_ids`, `current_frame_masks`, `invisible_object_ids`.
- Backward-compat alias: `NrmkRealtimeSegmentation`.
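
Because `get_next()` can yield three kinds of result (`None`, an error dict, or masks), it may help to classify the response in one place. The sketch below only inspects the return value and does not call the server; `classify_get_next` and its string labels are hypothetical names chosen for illustration.

```python
from typing import Any


def classify_get_next(resp: Any) -> str:
    """Return "not_registered", "error", or "masks" for a get_next() result."""
    if resp is None:
        # get_next() was called before register_first_frame().
        return "not_registered"
    if isinstance(resp, dict) and resp.get("result") == "ERROR":
        # Server-side failure; the dict may also carry a "message" field.
        return "error"
    # Anything else is treated as the mask payload.
    return "masks"
```

In a tracking loop, an `"error"` result would typically be followed by `seg.reset()` and a fresh `register_first_frame()` call.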

#### Segmentation v0.2 config summary (defaults/choices)

The result of `seg.get_capabilities()` depends on the server configuration. The values below reflect the v0.2 defaults.

```yaml
grounding_dino:
  backbone:
    choices:
      - Swin-B
      - Swin-T
    default: Swin-T
  box_threshold:
    default: 0.35
    min: 0.0
    max: 1.0
  text_threshold:
    default: 0.25
    min: 0.0
    max: 1.0

dino_detection:
  threshold:
    default: 0.5
  target_multiplier:
    default: 25
  img_multiplier:
    default: 50
  background_threshold:
    default: -1.0
  final_erosion_count:
    default: 10
  segment_min_size:
    default: 20

sam2:
  model:
    choices:
      - facebook/sam2-hiera-base-plus
      - facebook/sam2-hiera-large
      - facebook/sam2-hiera-small
      - facebook/sam2-hiera-tiny
      - facebook/sam2.1-hiera-base-plus
      - facebook/sam2.1-hiera-large
      - facebook/sam2.1-hiera-small
      - facebook/sam2.1-hiera-tiny
    default: facebook/sam2.1-hiera-large
  use_legacy:
    default: false
  compile:
    default: false
  offload_state_to_cpu:
    default: false
  offload_video_to_cpu:
    default: false
```
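
The choices and ranges above can be checked on the client before calling `set_config()`. This is a minimal sketch that hard-codes the v0.2 values from the summary; a live server may report different capabilities, and `check_grounding_dino` is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict, List

# Values copied from the v0.2 summary above; a live server may differ.
BACKBONES = ("Swin-T", "Swin-B")
UNIT_RANGE_KEYS = ("box_threshold", "text_threshold")


def check_grounding_dino(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the section looks valid."""
    problems = []
    backbone = cfg.get("backbone")
    if backbone is not None and backbone not in BACKBONES:
        problems.append(f"backbone must be one of {BACKBONES}, got {backbone!r}")
    for key in UNIT_RANGE_KEYS:
        value = cfg.get(key)
        # Both thresholds are documented with min 0.0 and max 1.0.
        if value is not None and not (0.0 <= value <= 1.0):
            problems.append(f"{key} must be within [0.0, 1.0], got {value!r}")
    return problems
```

Keys that are absent are skipped, so the helper also works on partial sections.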

#### Segmentation v0.2 notes and changes

- If SAM2 VRAM estimation fails, `seg.get_next()` may return `{"result":"ERROR"}`. Handle the error and call `seg.reset()` before re-registering.
- `compile=True` can slow down first-frame registration and `reset`.
- CPU offloading is most effective when both `offload_state_to_cpu=True` and `offload_video_to_cpu=True` are set (legacy mode does not support `offload_video_to_cpu`).
- `remove_object` is supported only when `use_legacy=False`.
- Grounding DINO now offers the Swin-B backbone, and prompt-token merge issues are fixed.
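
The offloading note above can be expressed as a small builder for the `sam2` section. This is a sketch under stated assumptions: `sam2_offload_config` is a hypothetical helper, and whether `set_config()` accepts a partial section like this one is not confirmed by this README.

```python
from typing import Dict


def sam2_offload_config(use_legacy: bool) -> Dict[str, bool]:
    """Build a "sam2" section with CPU offloading enabled where supported."""
    cfg = {"use_legacy": use_legacy, "offload_state_to_cpu": True}
    # Legacy mode does not support offload_video_to_cpu, so keep it off there.
    cfg["offload_video_to_cpu"] = not use_legacy
    return cfg
```

For example, `seg.set_config({"sam2": sam2_offload_config(use_legacy=False)})` would request both offloading options.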

### Pose Estimation

**Mesh upload**: Use `upload_mesh` to upload the mesh file (STL) to `/opt/meshes/` on the host PC. You can also copy the file over SSH yourself.

```python
from neuromeka_vfm import upload_mesh

upload_mesh(
    host="192.168.10.63",
    user="user",
    password="pass",
    local="mesh/my_mesh.stl",         # local mesh path
    remote="/opt/meshes/my_mesh.stl", # host mesh path (Docker volume)
)
```

Initialization

```python
from neuromeka_vfm import PoseEstimation

pose = PoseEstimation(host="192.168.10.72", port=5557)

pose.init(
    mesh_path="/app/modules/foundation_pose/mesh/my_mesh.stl",
    apply_scale=1.0,
    track_refine_iter=3,
    min_n_views=40,
    inplane_step=60,
)
```

- `mesh_path`: path to the mesh file (STL/OBJ). Initialization fails if the file is missing.
- `apply_scale`: scalar applied after loading the mesh.
  - STL in meters: 1.0 (no scaling)
  - STL in centimeters: 0.01 (1 cm -> 0.01 m)
  - STL in millimeters: 0.001 (1 mm -> 0.001 m)
- `force_apply_color`: if `True`, forces a solid color when the mesh lacks color data.
- `apply_color`: RGB tuple (0-255) used when `force_apply_color=True`.
- `est_refine_iter`: number of refinement iterations during registration (higher = more accurate, slower).
- `track_refine_iter`: number of refinement iterations per frame during tracking.
- `min_n_views`: minimum number of sampled camera views (affects rotation candidates).
- `inplane_step`: in-plane rotation step in degrees (smaller = more candidates).
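
The unit-to-scale mapping listed above can be kept in a small lookup so that `apply_scale` is never typed by hand. `apply_scale_for` and the unit abbreviations are hypothetical names for illustration; only the three factors come from the list above.

```python
# Factors that convert mesh units to meters, as listed above.
UNIT_TO_APPLY_SCALE = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def apply_scale_for(units: str) -> float:
    """Return the apply_scale value for a mesh authored in the given units."""
    key = units.strip().lower()
    if key not in UNIT_TO_APPLY_SCALE:
        raise ValueError(f"unsupported units {units!r}; expected one of {sorted(UNIT_TO_APPLY_SCALE)}")
    return UNIT_TO_APPLY_SCALE[key]
```

A mesh exported in millimeters would then be initialized with `apply_scale=apply_scale_for("mm")`.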

Registration and tracking

```python
# Registration: the server default is used when `iteration` is omitted; check_vram=True pre-checks VRAM
register_resp = pose.register(rgb=rgb0, depth=depth0, mask=mask0, K=cam_K, check_vram=True)

# Tracking (optionally limit search area with bbox_xywh)
track_resp = pose.track(rgb=rgb1, depth=depth1, K=cam_K, bbox_xywh=bbox_xywh)

pose.close()
```

- `cam_K`: camera intrinsics.
- Large RGB resolution, large `min_n_views`, or small `inplane_step` can cause GPU VRAM errors.
- `check_vram=True` in `register` performs a pre-check to prevent server shutdown due to OOM.
- Passing `iteration` to `register`/`track` overrides the server default.
- `reset()` resets the server state; `reset_object()` reuses the cached mesh to rebuild the rotation grid.
- Default host/port can come from `FPOSE_HOST` and `FPOSE_PORT` environment variables.
- Backward-compat alias: `FoundationPoseClient`.
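
To get a feel for how `min_n_views` and `inplane_step` drive memory use, the number of rotation candidates can be estimated before calling `init`. This is a rough sketch, assuming the server pairs every sampled view with every in-plane rotation and ignoring any de-duplication it may apply. That pairing is an assumption about the server, not documented behavior, and both function names are hypothetical.

```python
import math


def inplane_rotation_count(inplane_step: float) -> int:
    """Number of in-plane rotations needed to cover 360 degrees at this step."""
    if not 0 < inplane_step <= 360:
        raise ValueError("inplane_step must be in (0, 360] degrees")
    return math.ceil(360 / inplane_step)


def min_rotation_candidates(min_n_views: int, inplane_step: float) -> int:
    """Rough lower bound: min_n_views views, each with every in-plane rotation."""
    return min_n_views * inplane_rotation_count(inplane_step)
```

Under this assumption, halving `inplane_step` doubles the estimate, which is why small steps combined with large `min_n_views` are the first settings to relax after a VRAM error.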

<!--
## Benchmark

Measured on local servers. Empty cells are not yet measured.

**RTX 5060**
| Task | Prompt | None (s) | JPEG (s) | PNG (s) | h264 (s) |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO | text (human . cup .) | 0.86 | 0.35 | 0.50 | 0.52 |
| DINOv2 | image prompt | 0.85 | 0.49 | 0.65 | 0.63 |
| SAM2 | - |  |  |  |  |
| FoundationPose registration | - |  |  |  |  |
| FoundationPose track | - |  |  |  |  |

**RTX 5090**
| Task | Prompt | None (s) | JPEG (s) | PNG (s) | h264 (s) |
| --- | --- | --- | --- | --- | --- |
| Grounding DINO | text (human . cup .) |  |  |  |  |
| DINOv2 | image prompt |  |  |  |  |
| SAM2 | - |  |  |  |  |
| FoundationPose registration | - | 0.4 | - |  |  |
| FoundationPose track | - | 0.03 |  |  |  |
-->

## Release notes

- 0.1.2: Improved success detection for Segmentation responses (`result`/`success`/`status`), fixed image prompt registration/usage, added `check_vram` to PoseEstimation `register`.
- 0.1.1: Improved resource cleanup in PoseEstimation/Segmentation, use server defaults when iteration is omitted, added pose demo example.
- 0.1.0: Initial public release. Includes FoundationPose RPC client, real-time segmentation client, SSH-based mesh upload CLI/API.
