Metadata-Version: 2.4
Name: cane-robotics
Version: 0.1.0
Summary: Foundation Model Active Learning for autonomous robot object discovery
Project-URL: Homepage, https://github.com/colingfly/cane-robotics
Project-URL: Repository, https://github.com/colingfly/cane-robotics
Project-URL: Issues, https://github.com/colingfly/cane-robotics/issues
Author: Cane
License-Expression: MIT
Keywords: active-learning,clip,embodied-ai,foundation-models,grounding-dino,object-detection,robotics,sim2real,vlm,yolo
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24.0
Requires-Dist: opencv-python>=4.8.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision>=0.15.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: ultralytics>=8.0.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: sim
Requires-Dist: isaacsim; extra == 'sim'
Description-Content-Type: text/markdown

# cane-robotics

Foundation Model Active Learning (FMAL) for autonomous robot object discovery.

Fuses signals from three vision foundation models -- GroundingDINO, DINO, and CLIP -- into a unified acquisition function for active learning. The system enables robots to efficiently discover and learn novel objects in unstructured environments with minimal human annotation.

## Install

```bash
pip install cane-robotics
```

## Quick Start

```bash
# Run a single active learning experiment
cane-robotics run --images-dir data/images --labels-dir data/labels --classes box laptop chair

# Run all ablation variants across multiple seeds
cane-robotics ablations --images-dir data/images --labels-dir data/labels

# Evaluate sim-to-real transfer
cane-robotics sim2real --synthetic-dir data/synthetic --real-dir data/real

# Launch annotation GUI
cane-robotics annotate novel_detections/

# Plot experiment results
cane-robotics plot results/

# Generate synthetic training data (Isaac Sim)
cane-robotics generate --output-dir data/synthetic --num-scenes 50
```

## How It Works

The active learning pipeline scores candidate object detections using three complementary signals:

1. **GroundingDINO** -- open-vocabulary detection confidence
2. **DINO ViT** -- class-agnostic attention saliency (filters background clutter)
3. **CLIP** -- semantic novelty relative to known object classes

These are combined into a unified acquisition score:

```
score(x) = 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * sim_fg - 0.2 * sim_bg
```
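
Here `conf_gdino`, `attn_dino`, `sim_fg`, and `sim_bg` plausibly correspond to the GroundingDINO confidence, the DINO attention saliency, and the CLIP foreground/background similarities from the signal list above (an inference; the terms are not defined explicitly here). A minimal sketch of the fusion, with weights taken directly from the formula:

```python
# Minimal sketch of the weighted fusion above; the real pipeline
# configures these weights internally.
def acquisition_score(conf_gdino: float, attn_dino: float,
                      sim_fg: float, sim_bg: float) -> float:
    return 0.5 * conf_gdino + 0.3 * attn_dino + 0.2 * (sim_fg - sim_bg)

# Example call with illustrative values
print(acquisition_score(conf_gdino=0.8, attn_dino=0.6, sim_fg=0.4, sim_bg=0.1))
```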

A temporal deduplication module tracks previously queried objects via embedding similarity, reducing redundant annotation queries by ~69%.
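
A minimal sketch of the idea (illustrative only; the actual `TemporalDeduplicator` interface may differ), assuming embeddings are compared by cosine similarity against everything queried so far:

```python
import numpy as np

class SimpleDeduplicator:
    """Illustrative stand-in for TemporalDeduplicator: drops proposals
    whose embedding is too similar to any previously queried object."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.seen: list[np.ndarray] = []  # embeddings of queried objects

    def is_duplicate(self, emb: np.ndarray) -> bool:
        emb = emb / np.linalg.norm(emb)
        return any(float(emb @ s) >= self.threshold for s in self.seen)

    def add(self, emb: np.ndarray) -> None:
        self.seen.append(emb / np.linalg.norm(emb))
```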

Each round, the top-scoring proposals are labeled (by a human or an oracle) and added to the training set, and the YOLOv8 detector is retrained. The loop repeats until convergence.
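
Schematically, one round looks like the sketch below, where `label_fn` and `retrain_fn` are hypothetical stand-ins for the human/oracle labeler and the YOLOv8 retraining step:

```python
# Schematic single round; label_fn and retrain_fn are hypothetical.
def run_round(pipeline, image_paths, label_fn, retrain_fn, budget=20):
    proposals = []
    for path in image_paths:
        result = pipeline.process_image(path)      # documented API call
        proposals.extend(result["novel_objects"])
    # Query labels for the top-scoring proposals only
    top = sorted(proposals, key=lambda p: p["score"], reverse=True)[:budget]
    new_labels = label_fn(top)   # human or oracle annotation
    retrain_fn(new_labels)       # retrain YOLOv8 on the expanded set
    return new_labels
```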

## Package Structure

```
cane_robotics/
  pipeline/        Core active learning pipeline, offline replay, ROS node
  models/          Foundation model wrappers (GDINO, CLIP, DINO, dedup)
  dataset/         Dataset management and augmentation
  config/          Experiment configuration (dataclasses + YAML)
  experiments/     Experiment runners, ablations, sim2real evaluation
  training/        YOLO training and dataset preparation
  sim/             Isaac Sim synthetic data generation
  tools/           Annotation GUI, result plotting
```

## Python API

```python
from cane_robotics import (
    ActiveLearningPipeline,
    create_gdino_pipeline,
    ExperimentConfig,
    DatasetManager,
    TemporalDeduplicator,
)

# Create pipeline with full multi-VLM acquisition
pipeline = create_gdino_pipeline(
    known_classes=["mug", "bowl", "can"],
    acquisition_type="full",
    enable_dedup=True,
)

# Process a single image
result = pipeline.process_image("frame_001.jpg")
for obj in result["novel_objects"]:
    print(f"{obj['label']} (score={obj['score']:.3f})")
```
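
Continuing from the snippet above, the same `pipeline` object can sweep a whole directory of frames using only the documented `process_image` call (the `data/images` path mirrors the CLI examples and is illustrative):

```python
from pathlib import Path

# Rank novel-object candidates across all frames in a directory
candidates = []
for frame in sorted(Path("data/images").glob("*.jpg")):
    result = pipeline.process_image(str(frame))
    candidates.extend(result["novel_objects"])

for obj in sorted(candidates, key=lambda o: o["score"], reverse=True)[:10]:
    print(f"{obj['label']} (score={obj['score']:.3f})")
```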

## Ablation Variants

The experiment framework supports eight acquisition-function variants for systematic comparison (see the sketch after the table for programmatic selection):

| Variant | Description |
|---------|-------------|
| `full` | All three VLM signals combined (default) |
| `random` | Random scoring baseline |
| `gdino_only` | GroundingDINO confidence only |
| `clip_only` | CLIP novelty signal only |
| `dino_only` | DINO attention only |
| `no_fg_bg_gate` | Full formula without foreground/background gating |
| `no_dedup` | Full scoring with deduplication disabled |
| `no_sam` | Full scoring with SAM splitting disabled |
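
For programmatic comparison, these variant names presumably map onto the `acquisition_type` argument of `create_gdino_pipeline` (an assumption; only `"full"` is confirmed by the Python API example above):

```python
from cane_robotics import create_gdino_pipeline

# Assumption: variant names double as acquisition_type values; only
# "full" is confirmed by the API example above.
variants = ["full", "random", "gdino_only", "clip_only", "dino_only"]
pipelines = {
    name: create_gdino_pipeline(
        known_classes=["mug", "bowl", "can"],
        acquisition_type=name,
    )
    for name in variants
}
```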

## Dependencies

Core: `numpy`, `pyyaml`, `torch`, `torchvision`, `ultralytics`, `opencv-python`, `Pillow`, `transformers`

Optional:
- `[sim]` -- Isaac Sim for synthetic data generation
- `[dev]` -- pytest, ruff for development

## License

MIT
