Metadata-Version: 2.4
Name: cabinet-hsh
Version: 0.1.4
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Dist: maturin>=1.0 ; extra == 'dev'
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: black>=23.0 ; extra == 'dev'
Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
Requires-Dist: pdfplumber>=0.10 ; extra == 'docs'
Requires-Dist: python-docx>=1.1 ; extra == 'docs'
Requires-Dist: openpyxl>=3.1 ; extra == 'docs'
Requires-Dist: streamlit>=1.30 ; extra == 'gui'
Requires-Dist: pandas>=2.0 ; extra == 'gui'
Requires-Dist: matplotlib>=3.7 ; extra == 'gui'
Requires-Dist: networkx>=3.0 ; extra == 'gui'
Requires-Dist: plotly>=5.18 ; extra == 'gui'
Requires-Dist: graphviz>=0.20 ; extra == 'gui'
Requires-Dist: pillow>=10.0 ; extra == 'gui'
Requires-Dist: numpy>=1.24 ; extra == 'gui'
Requires-Dist: matplotlib>=3.7 ; extra == 'plot'
Requires-Dist: numpy>=1.24 ; extra == 'plot'
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: gui
Provides-Extra: plot
Summary: Python bindings for Cabinet - Hierarchical Semantic Hashing memory retrieval
Keywords: memory,retrieval,semantic,hashing,RAG,AI-agent
Author: Cabinet Contributors
License: MIT OR Apache-2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/Sauomore/Cabinet#readme
Project-URL: Homepage, https://github.com/Sauomore/Cabinet
Project-URL: Issues, https://github.com/Sauomore/Cabinet/issues
Project-URL: Repository, https://github.com/Sauomore/Cabinet

# cabinet

[![PyPI](https://img.shields.io/pypi/v/cabinet-hsh)](https://pypi.org/project/cabinet-hsh/)
[![Python](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org)
[![License](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-green)](LICENSE)

> Python bindings for **Cabinet** — a discrete semantic memory retrieval system for AI agents.
>
> Replace 768-dim dense vectors with **20-bit structured integer codes** and retrieve on pure CPU with O(log n) B-tree prefix matching.

---

## What is Cabinet?

Cabinet is a memory retrieval engine designed for agent scenarios where you need to:

- **Remember** large amounts of text on a laptop or edge device
- **Recall** relevant snippets fast, without GPU
- **Explain** why a snippet was retrieved (four match levels: related, same category, same cluster, exact word)
- **Update** incrementally without rebuilding the whole index

The core idea is **Hierarchical Semantic Hashing (HSH)**: each word is encoded as a 20-bit structured integer:

```
┌──────┬─────────┬─────────┐
│ feat │   sim   │   abs   │
│ 4-bit│  8-bit  │  8-bit  │
└──────┴─────────┴─────────┘
   ↓        ↓         ↓
 POS tag  cluster   bucket
```
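
The layout above can be sketched with plain bit operations. This is a minimal illustration, assuming `feat` occupies the highest bits as drawn; the helper names `pack` and `unpack` are hypothetical and are not part of the `cabinet` API.

```python
from typing import Tuple

# Field widths taken from the diagram: 4 + 8 + 8 = 20 bits.
FEAT_BITS, SIM_BITS, ABS_BITS = 4, 8, 8


def pack(feat: int, sim: int, abs_: int) -> int:
    """Pack the three fields into one 20-bit integer (feat in the top bits)."""
    if not (0 <= feat < (1 << FEAT_BITS)):
        raise ValueError("feat out of range")
    if not (0 <= sim < (1 << SIM_BITS)):
        raise ValueError("sim out of range")
    if not (0 <= abs_ < (1 << ABS_BITS)):
        raise ValueError("abs out of range")
    return (feat << (SIM_BITS + ABS_BITS)) | (sim << ABS_BITS) | abs_


def unpack(code: int) -> Tuple[int, int, int]:
    """Split a 20-bit code back into (feat, sim, abs)."""
    feat = code >> (SIM_BITS + ABS_BITS)
    sim = (code >> ABS_BITS) & ((1 << SIM_BITS) - 1)
    abs_ = code & ((1 << ABS_BITS) - 1)
    return feat, sim, abs_
```

Because `feat` sits in the most significant bits, codes that share a POS tag (and then a cluster) are adjacent when sorted as integers.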

Retrieval becomes integer prefix matching on B-trees, which keeps the index small, lookups fast, and every match fully auditable.
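
To illustrate prefix matching on integers, the sketch below uses a sorted Python list with `bisect` as a stand-in for a B-tree. It assumes the `feat | sim | abs` bit order from the diagram; `prefix_range` is a hypothetical helper, not a function exposed by `cabinet`.

```python
import bisect
from typing import List


def prefix_range(codes: List[int], prefix: int, prefix_bits: int,
                 total_bits: int = 20) -> List[int]:
    """Return every code in the sorted list whose top `prefix_bits` equal `prefix`."""
    shift = total_bits - prefix_bits
    lo = prefix << shift          # smallest code with this prefix
    hi = (prefix + 1) << shift    # first code past this prefix
    start = bisect.bisect_left(codes, lo)   # O(log n) lower bound
    end = bisect.bisect_left(codes, hi)     # O(log n) upper bound
    return codes[start:end]


codes = sorted([0x312AB, 0x312FF, 0x31300, 0x20001, 0x3FF00])

# 12-bit prefix: same feat (0x3) and same sim cluster (0x12).
same_cluster = prefix_range(codes, 0x312, 12)

# 4-bit prefix: same feat only.
same_category = prefix_range(codes, 0x3, 4)
```

Each lookup is two binary searches plus a contiguous slice, and the matched prefix itself says which fields agreed.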

---

## Installation

```bash
# Core package (pre-compiled wheels, no Rust needed)
pip install cabinet-hsh

# With optional GUI visualization
pip install cabinet-hsh[gui]

# With document parsing (PDF, DOCX, XLSX)
pip install cabinet-hsh[docs]

# With plotting utilities
pip install cabinet-hsh[plot]

# Development install from source (requires Rust 1.72+)
git clone https://github.com/Sauomore/Cabinet.git
cd Cabinet/cabinet
maturin develop
```

---

## Quick Start

```python
import cabinet

# Open a memory cabinet (~4MB RAM + single SQLite file)
mem = cabinet.Memory(
    path="./agent_memory.db",
    precision="light",    # light | hybrid | precise
    pos_threshold=50,     # common-word promotion threshold
    max_context=4096,     # working-memory window
)

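# Insert snippets (Chinese sample text; English glosses in the comments)
mem.insert("用户明天下午3点开会，准备PPT。")  # "The user has a meeting at 3 pm tomorrow; prepare the slides."
mem.insert("用户喜欢听管弦乐。")  # "The user likes orchestral music."
mem.insert("5号楼邻居有梯子，平时放在车库。")  # "The neighbor in Building 5 has a ladder, usually kept in the garage."

# Query ("meeting preparation")
results = mem.query("会议准备", top_k=5)
for r in results:
    level = ["related", "same category", "same cluster", "exact"][r.match_level - 1]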
    print(f"[{level}] score={r.score:.3f} doc_id={r.doc_id}")
    if r.match_level >= 3:
        print(f"  → {mem.decode(r)}")

# Snapshot and close
mem.snapshot("./backup/agent_memory_2026-07-03.db")
mem.close()
```

---

## API Overview

### `cabinet.Memory`

```python
Memory(
    path: str,               # SQLite database path
    precision: str,          # "light" | "hybrid" | "precise"
    pos_threshold: int,      # common-word promotion threshold
    max_context: int,        # working-memory capacity in tokens
)
```

Methods:

- `insert(text: str) -> int` — tokenize, encode, and store a document; returns `doc_id`
- `query(text: str, top_k: int = 10) -> list[QueryResult]` — retrieve top-k matches
- `decode(result: QueryResult) -> str | None` — decode the original text of a result
- `snapshot(dst: str) -> None` — copy the database to `dst`
- `close() -> None` — close the database

### `cabinet.QueryResult`

A result object with the following fields:

| Field | Type | Meaning |
|-------|------|---------|
| `doc_id` | `int` | document ID |
| `position` | `int` | word position inside the document |
| `score` | `float` | relevance score |
| `match_level` | `int` | 1=related, 2=same category, 3=same cluster, 4=exact |
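
One plausible reading of levels 2 to 4 is that they reflect how much of the 20-bit code two words share. The sketch below works under that assumption only; the source does not spell out how `match_level` is computed, and level 1 ("related") is left out because it cannot be derived from a shared prefix. `shared_prefix_level` is a hypothetical helper.

```python
from typing import Optional

SIM_ABS_BITS = 16  # sim (8) + abs (8), below the 4-bit feat field
ABS_BITS = 8


def shared_prefix_level(a: int, b: int) -> Optional[int]:
    """Map the shared prefix of two 20-bit codes to a level (assumed mapping)."""
    if a == b:
        return 4  # all 20 bits agree: exact
    if (a >> ABS_BITS) == (b >> ABS_BITS):
        return 3  # feat and sim agree: same cluster
    if (a >> SIM_ABS_BITS) == (b >> SIM_ABS_BITS):
        return 2  # only feat agrees: same category
    return None   # no shared prefix; "related" needs other information
```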

### Context decoding

```python
from cabinet import decode_context

# `mem` is the cabinet.Memory instance from Quick Start
results = mem.query("借梯子", top_k=3)  # "borrow a ladder"
for r in results:
    text = decode_context(mem, r, mode="sentence")
    print(text)
```

Supported `mode` values: `"paragraph"`, `"sentence"`, `"window"`, `"before"`, `"after"`, `"window_sent"`.

---

## Supported Platforms

Pre-compiled wheels are provided for:

- Linux: x86_64, aarch64 (manylinux)
- macOS: universal2 (Intel + Apple Silicon)
- Windows: x64, x86

Requires Python ≥ 3.8 (CPython).

---

## Architecture

```
cabinet (Python API)
  └── PyO3 bindings
      └── cabinet-core (Rust)
          ├── cabinet-hsh     # 20-bit HSH encoding
          ├── cabinet-index   # B-tree prefix index + LSM
          ├── cabinet-store   # SQLite backend
          └── cabinet-router  # relevance routing
```

Three-layer memory model:

1. **Token Store** — raw HSH sequences, append-only WAL buffer
2. **Archive Index** — 16 feature drawers with B-tree (sim, abs) indexes
3. **Working Memory** — LRU hot cache for inference-time hits
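
The LRU behavior of the third layer can be sketched with `collections.OrderedDict`. This is an illustrative stand-in, not Cabinet's Rust implementation; the class name `WorkingMemory` and its `get`/`put` methods are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class WorkingMemory:
    """A tiny least-recently-used cache (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key: Hashable) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


wm = WorkingMemory(capacity=2)
wm.put(1, "meeting at 3 pm")
wm.put(2, "likes orchestral music")
wm.get(1)                       # touch doc 1 so doc 2 becomes the oldest
wm.put(3, "ladder in garage")   # evicts doc 2
```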

---

## When to use Cabinet vs. vector databases

| Scenario | Cabinet | FAISS / Chroma |
|----------|---------|----------------|
| Laptop / edge device | ✅ Tiny CPU model | ❌ Often needs GPU or large RAM |
| Incremental updates | ✅ Append-only | ❌ May require rebuilding clusters |
| Explainable retrieval | ✅ Auditable path | ❌ Black-box similarity |
| Semantic similarity | ⚠️ Discrete approximation | ✅ Dense vectors |

Use Cabinet when you need a small, fast, explainable, and incrementally updatable memory for agents.
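
For a rough sense of the size difference behind the 768-dim versus 20-bit comparison in the introduction, the arithmetic below assumes `float32` vector components and codes stored either bit-packed or in a 32-bit integer. These storage choices are assumptions, not documented Cabinet internals. Note also that a dense vector usually represents a whole chunk while an HSH code represents one word, so totals depend on document length.

```python
DENSE_DIMS = 768         # dimensionality quoted in the introduction
BYTES_PER_FLOAT32 = 4    # assumption: float32 components
CODE_BITS = 20           # HSH code width

dense_bytes = DENSE_DIMS * BYTES_PER_FLOAT32   # bytes per dense vector
code_bytes_packed = CODE_BITS / 8              # bytes per code if bit-packed
code_bytes_u32 = 4                             # bytes per code in a 32-bit integer

# How many 32-bit codes fit in the space of one dense vector.
codes_per_vector = dense_bytes // code_bytes_u32
```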

---

## GUI Visualization

If you installed with `[gui]`:

```bash
cabinet-gui
# or
cd cabinet-gui
streamlit run app.py
```

The GUI includes pages for encoding visualization, memory architecture, retrieval paths, index browser, and an interactive console.

---

## License

MIT OR Apache-2.0

---

> **Cabinet** — let AI remember, and explain why it remembers.

