Metadata-Version: 2.4
Name: recallforge
Version: 0.1.2
Summary: RecallForge - Cross-Modal Vision-Language Search Engine
Author: Brian Meyer
License: MIT
Project-URL: Homepage, https://github.com/brianmeyer/recallforge
Project-URL: Repository, https://github.com/brianmeyer/recallforge
Project-URL: Issues, https://github.com/brianmeyer/recallforge/issues
Keywords: search,semantic,embedding,vision-language,cross-modal
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lancedb<1.0,>=0.20
Requires-Dist: pyarrow<20.0,>=18.0
Requires-Dist: pillow<12.0,>=10.0
Requires-Dist: numpy<3.0,>=2.0
Requires-Dist: mcp<2.0,>=1.0
Provides-Extra: torch
Requires-Dist: torch<3.0,>=2.0; extra == "torch"
Requires-Dist: torchvision<1.0,>=0.15; extra == "torch"
Requires-Dist: transformers<5.0,>=4.40; extra == "torch"
Requires-Dist: scipy<2.0,>=1.10; extra == "torch"
Requires-Dist: qwen-vl-utils<1.0,>=0.0.14; extra == "torch"
Provides-Extra: mlx
Requires-Dist: mlx<1.0,>=0.20; extra == "mlx"
Requires-Dist: mlx-vlm<1.0,>=0.1; extra == "mlx"
Requires-Dist: qwen-vl-utils<1.0,>=0.0.14; extra == "mlx"
Requires-Dist: torchvision<1.0,>=0.15; extra == "mlx"
Provides-Extra: docs
Requires-Dist: pypdf<6.0,>=5.0; extra == "docs"
Provides-Extra: cuda
Requires-Dist: torch<3.0,>=2.0; extra == "cuda"
Requires-Dist: torchvision<1.0,>=0.15; extra == "cuda"
Requires-Dist: transformers<5.0,>=4.40; extra == "cuda"
Requires-Dist: scipy<2.0,>=1.10; extra == "cuda"
Requires-Dist: qwen-vl-utils<1.0,>=0.0.14; extra == "cuda"
Provides-Extra: dev
Requires-Dist: pytest<9.0,>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio<1.0,>=0.24; extra == "dev"
Provides-Extra: all
Requires-Dist: recallforge[docs,mlx,torch]; extra == "all"
Dynamic: license-file

# RecallForge

![CI](https://github.com/brianmeyer/recallforge/actions/workflows/ci.yml/badge.svg) [![PyPI](https://img.shields.io/pypi/v/recallforge)](https://pypi.org/project/recallforge/) ![License](https://img.shields.io/badge/License-MIT-green) ![Python](https://img.shields.io/badge/Python-3.12%20%7C%203.13-blue)

**Every modality, one search. Local first.**

![RecallForge — Your Files → One Search](docs/hero-banner.png)

Standard RAG only works on text. Drop a PDF with charts, a photo of a whiteboard, or a video recording — and your AI agent goes blind. RecallForge gives agents **eyes and ears over your local filesystem**. Text, images, documents, and video all live in one unified search space, and nothing ever leaves your machine.

## What this enables

> **You:** "What did the whiteboard look like in our last meeting?"
>
> **Claude:** *(Searches your local `~/Documents`, finds a photo of a whiteboard from an iPhone, reads the handwriting via Qwen3-VL, and surfaces the image with context.)*

> **You:** "Find the architecture diagram from that PDF I downloaded last week."
>
> **Claude:** *(Indexes the PDF, matches your query against extracted text and embedded figures, returns the relevant page.)*

> **You:** *(Drops an image of a circuit board)* "Find my notes related to this."
>
> **Claude:** *(Reverse image-to-text search across your indexed notes. Returns matching documents.)*

One query. Any modality. All local.

## What makes RecallForge different

| Capability | RecallForge | Chroma | Mem0 | Qdrant | Weaviate |
|------------|-------------|--------|------|--------|----------|
| Cross-modal search | ✅ Native | ✅ OpenCLIP | ❌ Text only | ❌ | ✅ CLIP modules |
| Video support [Beta] | ✅ | ❌ | ❌ | ❌ | ❌ |
| Document ingest (PDF/DOCX/PPTX) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Built-in reranking | ✅ Multimodal | ❌ | ❌ | ✅ ColBERT | ✅ Modules |
| Query expansion | ✅ Multimodal | ❌ | ❌ | ❌ | ✅ Generative |
| MCP-native | ✅ 16 tools | ❌ | ❌ | ❌ | ❌ |
| 100% local | ✅ | ✅ | ⚠️ Cloud default | ✅ | ✅ Docker |
| Apple Silicon optimized | ✅ MLX 4-bit | ❌ | ❌ | ❌ | ❌ |
| Cloud option | ❌ | ✅ | ✅ | ✅ | ✅ |
| JS/TS SDK | ❌ | ✅ | ✅ | ✅ | ✅ |

**Use RecallForge when:** You need multimodal memory for AI agents that runs entirely on your machine, especially on Apple Silicon. One search across text, images, documents, and video.

**Use something else when:** You need cloud hosting, massive scale (millions+ vectors), or a JS/TS-first ecosystem.

## Performance

4 modalities (text, images, documents, video) unified in a single MLX-optimized local vector space. Sub-60ms search latency in embed mode. Under 400MB resident memory.

### Pipeline ablation (Mac mini M4 16GB, MLX 4-bit)

Each pipeline stage trades latency for retrieval quality; the ablation below isolates each stage's contribution.

| Stage | R@1 | R@5 | R@10 | MRR | p50 |
|-------|-----|-----|------|-----|-----|
| Vector-only | 68.3% | 68.3% | 70.0% | 68.5% | 17ms |
| BM25-only | 55.0% | 55.0% | 85.0% | 60.0% | 15ms |
| Vector + BM25 (RRF) | 71.7% | 86.7% | 86.7% | 76.4% | 93ms |
| **+ Reranker (hybrid mode)** | **83.3%** | **91.7%** | **95.0%** | **86.9%** | 3.9s |
| + Query expansion (full mode) | 83.3% | 90.0% | 93.3% | 85.7% | 5.7s |

The reranker is the big win: **+15 points of R@1 over raw embeddings** (68.3% → 83.3%, a 22% relative gain), pushing R@10 to 95%. On this benchmark, query expansion adds ~1.8s of latency without improving quality, so `hybrid` is the sweet spot when quality matters; `embed` mode keeps searches at 17ms for speed-sensitive workloads.

*Measured on 200 text documents + 50 images with ground-truth queries. See `benchmarks/` for methodology.*
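
The R@k and MRR columns above are standard retrieval metrics. A minimal sketch of how they are computed per query (function names here are illustrative, not part of the RecallForge API; the table reports these values averaged over all ground-truth queries):

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant document appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def reciprocal_rank(ranked_ids, relevant_id):
    """1/rank of the relevant document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

results = ["doc_b", "doc_a", "doc_c"]        # a ranked result list for one query
print(recall_at_k(results, "doc_a", 1))      # 0.0 -- not in top-1
print(recall_at_k(results, "doc_a", 5))      # 1.0 -- in top-5
print(reciprocal_rank(results, "doc_a"))     # 0.5 -- found at rank 2
```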

### Latency & resource usage

| Metric | MLX 4-bit | PyTorch fp16 |
|--------|-----------|--------------|
| Warm search p50 (embed) | 53ms | 599ms |
| Warm search p95 (embed) | 55ms | — |
| Cold start | 7.6s | ~20s |
| Peak RSS (embed) | 329MB* | ~4GB |
| Text indexing | 5.0 docs/sec | — |

*\*MLX maps model weights lazily via memory-mapped files. RSS reflects resident pages, not full model size (~1.7GB on disk for embed mode). Actual memory pressure is low.*

### COCO 1K retrieval (raw embeddings, no pipeline)

For transparency: raw embedding quality on the standard COCO benchmark (1,000 images, no BM25/reranking/expansion). These numbers reflect the Qwen3-VL-2B embedder alone, not the full pipeline.

| Direction | R@1 | R@5 | R@10 |
|-----------|-----|-----|------|
| Text → Image | 24.5% | 42.3% | 49.9% |
| Image → Text | 34.3% | 42.0% | 44.1% |

*Qwen3-VL is a generative VLM, not a contrastive model like CLIP. The pipeline ablation above shows how BM25 fusion and reranking compensate for this.*

## Installation

```bash
pip install "recallforge[mlx]"       # Apple Silicon (recommended, 4-bit quantization)
pip install "recallforge[cuda]"      # NVIDIA GPU
pip install "recallforge[torch]"     # CPU / other PyTorch targets
pip install "recallforge[docs]"      # add richer PDF extraction (optional)
```

> **Note:** `pip install recallforge` installs the core without a backend.
> You need at least one of `[mlx]`, `[cuda]`, or `[torch]` to run inference.

From source:

```bash
git clone https://github.com/brianmeyer/recallforge.git
cd recallforge
pip install -e ".[mlx]"
```

### Requirements

- Python 3.12 or 3.13 required (3.14 not yet supported, pending pyarrow wheel)
- Disk: ~2-5GB free for model downloads on first run
- RAM (MLX 4-bit): ~1.7GB (`embed`) to ~4.4GB (`full`)
- `ffmpeg` recommended for video indexing/search
- First run downloads models automatically and may take a few minutes

## MCP Server (primary use)

RecallForge is designed as a **Model Context Protocol server for AI agents**. Configure in Claude Desktop (or any MCP-compatible agent host):

```json
{
  "mcpServers": {
    "recallforge": {
      "command": "recallforge",
      "args": ["serve", "--mode", "full"]
    }
  }
}
```

Run manually:

```bash
recallforge serve --mode embed --backend mlx --quantize 4bit
```

Exposes **16 tools** for agents: `ingest`, `search`, `search_fts`, `search_vec`, `index_document`, `index_image`, `memory_add`, `memory_update`, `memory_delete`, `status`, `rebuild_fts`, `list_collections`, `list_namespaces`, `batch`, `get_config`, `set_config`.

See [docs/mcp-tools.md](docs/mcp-tools.md) for the full tool reference.

## Search modes

| Mode | Models loaded | Memory (MLX 4-bit) | Quality | Best for |
|------|--------------|-------------------|---------|----------|
| `embed` | Embedder | ~1.7GB | Good | Memory-constrained, fast searches |
| `hybrid` | + Reranker | ~3.4GB | Better | Balanced quality and memory |
| `full` | + Query Expander | ~4.4GB | Best | Maximum retrieval quality |

> **Video [Beta] note:** Video support requires `ffmpeg`. The torch backend video path has a known upstream issue (see [QwenLM/Qwen3.5#58](https://github.com/QwenLM/Qwen3.5/issues/58)).

## How it works

RecallForge encodes text, images, and video frames into the same 2048-dimensional vector space using Qwen3-VL. This means "find notes about this diagram" works whether the diagram is text, an image, or a frame from a video. A 3-stage pipeline handles the rest:

```mermaid
graph TD
    subgraph Local Filesystem
        Docs[📄 Documents]
        Imgs[🖼️ Images]
        Vids[🎬 Video]
    end

    subgraph RecallForge Ingest
        Docs --> TxtExt[Text Extractor]
        Imgs --> VLM[Qwen3-VL Encoder]
        Vids --> Frame[Frame & Audio Extractor]
        Frame --> VLM
        TxtExt --> VLM
    end

    subgraph LanceDB Storage
        VLM -->|2048-dim Vectors| VecDB[(Vector Space)]
        TxtExt -->|Text/Transcripts| FTS[(Tantivy FTS)]
    end

    subgraph MCP Search Pipeline
        Query[Agent Query] --> BM25[BM25 Text Search]
        Query --> Dense[Vector Similarity Search]
        BM25 --> RRF[RRF Fusion]
        Dense --> RRF
        RRF --> Rerank[Cross-Encoder Reranker]
        Rerank --> Output[Final Context to Agent]
    end
```

**Pipeline:** BM25 probe → Query expansion (full mode) → Parallel BM25 + Vector → RRF fusion → Reranking (hybrid/full) → Score blending
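
The RRF fusion step can be sketched in a few lines. This is illustrative only: `k=60` is the conventional constant from the original Reciprocal Rank Fusion paper, and RecallForge's actual constant, weighting, and tie-breaking may differ:

```python
def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of doc IDs via Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) across the lists it appears in,
    so items ranked highly by multiple retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["a", "b", "c"]   # keyword search ranking
dense_hits = ["b", "d", "a"]  # vector similarity ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # ['b', 'a', 'd', 'c']
```

`b` wins because both retrievers rank it highly, even though neither puts it unambiguously first; this is the property that lets BM25 and dense retrieval cover each other's blind spots before the reranker sees anything.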

## CLI (development & debugging)

```bash
# Index anything
recallforge index ./photos ./docs
recallforge index ~/Movies/demo.mp4
recallforge index ~/Documents/roadmap.pptx

# Search any modality
recallforge search "whiteboard diagram from last meeting"
recallforge search --image ./photos/whiteboard.png
recallforge search --video ~/Movies/demo.mp4

# Watch a folder for changes (auto-index)
recallforge watch start ~/Documents --collection docs
recallforge watch list
recallforge watch stop ~/Documents

# Status
recallforge status
```

RecallForge auto-detects MLX on Apple Silicon, PyTorch elsewhere.

## Python API

```python
from recallforge import get_backend, get_storage
from recallforge.search import HybridSearcher

backend = get_backend()
storage = get_storage()
backend.warm_up()

# Index
storage.index_document(
    path="notes.md",
    text="My notes about AI...",
    collection="my_docs",
    model="Qwen3-VL-Embedding-2B",
    embed_func=backend.embed_text,
)

# Search
searcher = HybridSearcher(backend=backend, storage=storage, limit=10)
results = searcher.search("artificial intelligence")
for r in results:
    print(f"[{r.score:.3f}] {r.title}")
```

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `RECALLFORGE_BACKEND` | `auto` | `auto`, `mlx`, `torch` |
| `RECALLFORGE_MODE` | `full` | `embed`, `hybrid`, `full` |
| `RECALLFORGE_MLX_QUANTIZE` | `4bit` | `4bit`, `bf16` |
| `RECALLFORGE_STORE_PATH` | `~/.recallforge` | Storage directory |
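
These are plain environment variables, so resolution is simply "env value if set, else default." A minimal sketch with the defaults from the table (`load_config` is illustrative, not the library's actual loader):

```python
import os

# Defaults mirror the configuration table above.
DEFAULTS = {
    "RECALLFORGE_BACKEND": "auto",
    "RECALLFORGE_MODE": "full",
    "RECALLFORGE_MLX_QUANTIZE": "4bit",
    "RECALLFORGE_STORE_PATH": "~/.recallforge",
}

def load_config(env=None):
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}

config = load_config({"RECALLFORGE_MODE": "embed"})
print(config["RECALLFORGE_MODE"])     # embed -- overridden
print(config["RECALLFORGE_BACKEND"])  # auto  -- default
```

In practice you set these in your shell profile, or in the `env` block of your agent host's MCP server entry.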

## Project structure

```
src/recallforge/
├── backends/
│   ├── mlx_backend.py    # MLX 4-bit/bf16 (Apple Silicon)
│   └── torch_backend.py  # PyTorch (CUDA/MPS/CPU)
├── storage/
│   └── lancedb_backend.py # LanceDB + Tantivy FTS
├── cache.py              # LRU embedding cache
├── search.py             # Hybrid search pipeline (BM25 + vector + RRF)
├── server.py             # MCP server (16 tools)
├── documents.py          # PDF/DOCX/PPTX extraction
├── video.py              # Frame/transcript extraction
├── watch_folder.py       # Folder monitoring with dedup
└── cli.py                # CLI interface
```

## Development

```bash
pytest tests/ -m "not live"    # Unit tests (no model download needed)
pytest tests/ -m live -v       # Integration tests (requires models)
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for full development guidelines.

## Attribution

RecallForge is inspired by [QMD](https://github.com/tobil/qmd) by Tobi. QMD pioneered the multi-stage retrieval pipeline (embedding, reranking, query expansion). RecallForge extends this pattern to vision-language with cross-modal retrieval and multi-backend support.

## License

MIT License
