Metadata-Version: 2.4
Name: hybi
Version: 0.1.1
Summary: Python SDK for HyperBinder - a neurosymbolic database for AI applications
Project-URL: Homepage, https://github.com/hyperbinder/hyperbinder
Project-URL: Documentation, https://docs.hyperbinder.io
Project-URL: Repository, https://github.com/hyperbinder/hyperbinder
Author: HyperBinder Team
License-Expression: MIT
Keywords: ai,embeddings,hyperbinder,llm,neurosymbolic,rag,retrieval,vector-database
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: httpx>=0.24.0
Requires-Dist: numpy<3,>=1.20.0
Requires-Dist: pandas<3,>=1.5.0
Provides-Extra: dev
Requires-Dist: black>=23.0; extra == 'dev'
Requires-Dist: hypothesis>=6.82; extra == 'dev'
Requires-Dist: mutmut>=2.4; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest-randomly>=3.15; extra == 'dev'
Requires-Dist: pytest-rerunfailures>=13.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.2; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Provides-Extra: local
Requires-Dist: hybi-local>=0.1.0; extra == 'local'
Description-Content-Type: text/markdown

# HyperBinder Python SDK

A Python client for HyperBinder — the compositional semantic database
that combines vector search, graph traversal, and SQL-like queries with
per-field encoding strategies.

## Installation

```bash
pip install hybi
```

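This installs the HTTP-only Python SDK, which is enough to talk to a
running HyperBinder server. The package metadata also declares optional
extras: `hybi[local]` (pulls in `hybi-local`), `hybi[dev]`, and
`hybi[docs]`.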

## Quick Start

```python
from hybi import HyperBinder, RelationalTable, Field, Encoding
import pandas as pd

# Connect to a running HyperBinder server
hb = HyperBinder("http://localhost:8000")

# Sample data
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "category": ["AI", "Cloud", "Analytics"],
    "text": [
        "Artificial intelligence and machine learning solutions",
        "Cloud computing and infrastructure services",
        "Data analytics and business intelligence",
    ],
    "revenue": [5000000, 3000000, 2000000],
})

# Define a schema with per-field encoding
schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "category": Field(encoding=Encoding.EXACT),
        "text": Field(encoding=Encoding.SEMANTIC),
        "revenue": Field(encoding=Encoding.NUMERIC),
    },
)

# Ingest
result = hb.ingest(df, collection="companies", schema=schema, dim=384)
print(f"Ingested {result.rows_ingested} rows")

# Semantic search
results = hb.search("AI and machine learning", collection="companies", top_k=3)
for r in results:
    print(f"{r.data['text']}: {r.score:.3f}")

# SQL-like query
filtered = hb.select(
    collection="companies",
    where=[("revenue", ">", 2500000)],
    order_by=[("revenue", True)],  # True = descending
)
for row in filtered.rows:
    print(row)

# Hybrid query (semantic + filters)
results = hb.search(
    "cloud services",
    collection="companies",
    filters=[("revenue", ">", 2000000)],
    top_k=5,
)
```

## Key Features

### 🎯 Per-Field Encoding Strategies

Unlike vector databases that encode entire documents into a single
vector, HyperBinder lets you specify **different encoding strategies
for each field**:

```python
schema = RelationalTable(
    primary_key="product_id",
    columns={
        "product_id": Field(encoding=Encoding.EXACT),     # Hash-based exact match
        "category": Field(encoding=Encoding.EXACT),       # Categorical exact match
        "name": Field(encoding=Encoding.SEMANTIC),        # Embedding-based similarity
        "description": Field(encoding=Encoding.SEMANTIC), # Embedding-based similarity
        "price": Field(encoding=Encoding.NUMERIC),        # Numeric comparison
        "stock": Field(encoding=Encoding.NUMERIC),        # Numeric comparison
    },
)
```

This enables queries that blend matching types in one call:

```python
# Find products semantically similar to "laptop computer"
# WHERE category exactly matches "Electronics" (not similar, exact)
# AND price is between 500 and 1500 (numeric range)
# AND stock > 0 (numeric comparison)
results = hb.search(
    "laptop computer",
    collection="products",
    filters=[
        ("category", "==", "Electronics"),
        ("price", ">=", 500),
        ("price", "<=", 1500),
        ("stock", ">", 0),
    ],
    top_k=10,
)
```
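As a mental model for the query above, the sketch below applies the structured conditions first and then ranks the remaining rows by text similarity. It is a toy under stated assumptions: word-count cosine similarity stands in for real embeddings, the sample `products` rows are made up, and filtering before ranking is one plausible strategy rather than a description of what the HyperBinder server does.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors (a crude stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


# Made-up rows, for illustration only.
products = [
    {"name": "Gaming laptop computer", "category": "Electronics", "price": 1200, "stock": 3},
    {"name": "Laptop sleeve", "category": "Accessories", "price": 25, "stock": 40},
    {"name": "Desktop computer", "category": "Electronics", "price": 900, "stock": 0},
    {"name": "Budget laptop", "category": "Electronics", "price": 650, "stock": 12},
]

query = "laptop computer"

# Structured part: exact category match plus numeric range and stock checks.
candidates = [
    p for p in products
    if p["category"] == "Electronics" and 500 <= p["price"] <= 1500 and p["stock"] > 0
]

# Semantic part: rank the surviving rows by similarity to the query text.
ranked = sorted(candidates, key=lambda p: cosine_similarity(query, p["name"]), reverse=True)
for p in ranked:
    print(p["name"], round(cosine_similarity(query, p["name"]), 3))
```

The point of the sketch is the division of labour: exact and numeric fields decide which rows are eligible, and the semantic field decides their order.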

- Exact match where you need it (IDs, categories)
- Semantic search where you need it (descriptions, text)
- Numeric comparisons where you need it (prices, counts)
- All in one query, one database

### 📊 Hybrid Queries (Semantic + Structured)

Combine semantic search with SQL-like filters:

```python
# Semantic search with structured filters
results = hb.search(
    "machine learning research",
    collection="papers",
    filters=[
        ("year", ">=", "2020"),
        ("citations", ">", 1000),
        ("peer_reviewed", "==", "true"),
    ],
    top_k=10,
)

# Pure SQL-like query
result = hb.select(
    collection="papers",
    where=[
        ("author", "==", "Vaswani"),
        ("year", ">=", "2017"),
    ],
    order_by=[("citations", True)],  # True = descending
    limit=10,
)
```
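Each filter is a `(column, operator, value)` triple, and the comments in the per-field encoding example read multiple filters as AND conditions. The sketch below shows how such triples could be evaluated against a single row in plain Python. It illustrates the semantics only and is not the SDK's or the server's implementation: the helper name `matches` is hypothetical, and treating `=`/`==` and `!=`/`<>` as synonyms is an assumption based on common SQL usage.

```python
import operator

# Operator strings supported by filters, mapped to Python comparison functions.
# Assumption: "=" and "==" are synonyms, as are "!=" and "<>".
OPERATORS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<>": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(row, filters):
    """Return True if the row satisfies every (column, op, value) filter."""
    for column, op, value in filters:
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        if not OPERATORS[op](row[column], value):
            return False
    return True


row = {"category": "Electronics", "price": 899, "stock": 4}
print(matches(row, [("category", "==", "Electronics"), ("price", "<=", 1500)]))
print(matches(row, [("stock", "<>", 4)]))
```

Note that the comparison is only meaningful when the filter value has the same type as the stored value, so it is worth checking whether a field such as `year` holds strings or numbers before writing a range filter on it.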

**Supported operators:** `=`, `==`, `!=`, `<>`, `>`, `>=`, `<`, `<=`

## Data Ingestion

### Basic ingestion with a schema

Always define a schema with encoding types:

```python
from hybi import HyperBinder, RelationalTable, Field, Encoding
import pandas as pd

hb = HyperBinder("http://localhost:8000")

df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "name": ["Product A", "Product B", "Product C"],
    "category": ["Electronics", "Books", "Clothing"],
    "description": ["High-quality electronics", "Bestselling books", "Fashion items"],
    "price": [299.99, 19.99, 49.99],
})

schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "description": Field(encoding=Encoding.SEMANTIC),
        "price": Field(encoding=Encoding.NUMERIC),
    },
)

result = hb.ingest(df, collection="products", schema=schema, dim=384)
print(f"Ingested {result.rows_ingested} rows")
```

### Encoding types

| Encoding    | Use for                    | How it works               | Example fields                     |
| ----------- | -------------------------- | -------------------------- | ---------------------------------- |
| `EXACT`     | IDs, categories, tags      | Hash-based exact match     | `id`, `status`, `category`         |
| `SEMANTIC`  | Text, descriptions, titles | Embedding-based similarity | `title`, `description`, `content`  |
| `NUMERIC`   | Numbers, prices, counts    | Numeric comparison         | `price`, `quantity`, `rating`      |

### Without a schema

If you don't provide a schema, HyperBinder auto-detects an encoding for
each column, but the detected encodings may not match how you intend to
query the data:

```python
# Not recommended — auto-detection may not choose the optimal encoding
result = hb.ingest(df, collection="products", dim=384)
```
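If you would rather start from an explicit schema than rely on auto-detection, a first draft can be generated from sample values and then reviewed by hand. The sketch below is a hypothetical heuristic, not HyperBinder's auto-detection logic: `suggest_encoding` and its 40-character threshold are assumptions made for illustration, and it returns plain strings instead of `Encoding` members so that it runs without the SDK installed.

```python
from numbers import Number


def suggest_encoding(values, text_length_threshold=40):
    """Suggest 'NUMERIC', 'SEMANTIC', or 'EXACT' for a column's sample values.

    Hypothetical heuristic: numbers -> NUMERIC, long free text -> SEMANTIC,
    everything else (IDs, short labels) -> EXACT.
    """
    values = [v for v in values if v is not None]
    if not values:
        return "EXACT"
    # bool is a subclass of int, so exclude it from the numeric check.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "NUMERIC"
    average_length = sum(len(str(v)) for v in values) / len(values)
    if average_length >= text_length_threshold:
        return "SEMANTIC"
    return "EXACT"


# Same sample columns as the Quick Start DataFrame, as plain lists.
columns = {
    "id": ["1", "2", "3"],
    "category": ["AI", "Cloud", "Analytics"],
    "text": [
        "Artificial intelligence and machine learning solutions",
        "Cloud computing and infrastructure services",
        "Data analytics and business intelligence",
    ],
    "revenue": [5000000, 3000000, 2000000],
}
suggested = {name: suggest_encoding(vals) for name, vals in columns.items()}
print(suggested)
```

Short free-text fields such as product names fall under `EXACT` with this rule, so columns you plan to search semantically still need a manual override.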

## Searching

### Semantic search

```python
results = hb.search("laptop computers", collection="products", top_k=5)
for r in results:
    print(f"Score: {r.score:.3f}")
    print(f"Name:  {r.data['name']}")
    print(f"Desc:  {r.data['description']}")
```

### Hybrid: semantic + filters

```python
results = hb.search(
    "artificial intelligence",
    collection="products",
    filters=[
        ("category", "==", "Electronics"),
        ("price", ">=", 100),
        ("price", "<=", 500),
        ("in_stock", "==", "true"),
    ],
    top_k=10,
)
```

### Pure SQL-like

```python
result = hb.select(
    collection="products",
    columns=["name", "price", "category"],
    where=[
        ("category", "==", "Electronics"),
        ("price", ">", 200),
    ],
    order_by=[("price", True)],  # True = descending
    limit=10,
)
for row in result.rows:
    print(row)
```

## Collection management

```python
products = hb.collection("products")
if products.exists():
    print(f"Collection has {products.count()} rows")

stats = products.stats()
print(f"Columns:   {stats.columns}")
print(f"Dimension: {stats.dimension}")

for coll in hb.list_collections():
    print(f"{coll.name}: {coll.rows} rows")

# Delete all rows but keep the collection structure
# FORWARD-LOOKING: the fluent Collection.truncate() form ships with PR
# feat/namespace-row-counts. Until that lands on master, use the
# equivalent hb.truncate(collection="products") instead. Remove
# this comment once feat/namespace-row-counts is merged.
products.truncate()

# Delete the entire collection
products.delete()
```

## Advanced features

### Multi-hop graph traversal

```python
results = hb.multihop(
    collection="knowledge_graph",
    start={"entity": "Albert Einstein"},
    hops=[("discovered", "theory"), ("influenced", "scientist")],
    top_k=10,
)
```

### RAG context assembly

```python
context = hb.get_context(
    "What are the latest AI developments?",
    collection="research_papers",
    top_k=5,
)

prompt = f"""Context: {context.text}

Question: What are the latest AI developments?
Answer:"""
```
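The question appears twice in the snippet above, once in the `get_context` call and once in the prompt. A small helper keeps the two in sync. This is a sketch: `build_prompt` is a hypothetical name, and it takes the context as a plain string (for example `context.text` from the call above) so that it runs without a server.

```python
def build_prompt(context_text: str, question: str) -> str:
    """Assemble a RAG prompt from retrieved context and the user's question."""
    return (
        f"Context: {context_text.strip()}\n"
        "\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


question = "What are the latest AI developments?"
# Stand-in for context.text; in real use this comes from hb.get_context(...).
context_text = "Placeholder passage retrieved from the research_papers collection."
prompt = build_prompt(context_text, question)
print(prompt)
```

Passing the same `question` variable to both `hb.get_context` and `build_prompt` avoids the prompt drifting away from the text that was actually used for retrieval.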

### Aggregations

```python
result = hb.aggregate(
    collection="sales",
    group_by=["region", "product_type"],
    aggregations=[
        ("revenue", "sum", "total_revenue"),
        ("orders", "count", "order_count"),
        ("revenue", "avg", "avg_order"),
    ],
    order_by=["total_revenue"],
)

for group in result.groups:
    print(f"{group['region']}: ${group['total_revenue']:,.2f}")
```
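For readers new to the triple form, the sketch below reproduces a grouped aggregation in plain Python. It assumes each triple reads as `(source_column, function, output_name)`, which is inferred from the example above rather than confirmed by the SDK; `aggregate_rows` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Aggregation functions keyed by the names used in the example above.
FUNCTIONS = {
    "sum": sum,
    "count": len,
    "avg": lambda values: sum(values) / len(values),
}


def aggregate_rows(rows, group_by, aggregations):
    """Group rows by the given columns and apply (column, function, alias) triples."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in group_by)
        groups[key].append(row)

    results = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for column, function, alias in aggregations:
            out[alias] = FUNCTIONS[function]([m[column] for m in members])
        results.append(out)
    return results


# Made-up sample rows, for illustration only.
rows = [
    {"region": "EMEA", "revenue": 100.0},
    {"region": "EMEA", "revenue": 300.0},
    {"region": "APAC", "revenue": 50.0},
]
summary = aggregate_rows(
    rows,
    group_by=["region"],
    aggregations=[("revenue", "sum", "total_revenue"), ("revenue", "avg", "avg_order")],
)
for group in summary:
    print(f"{group['region']}: ${group['total_revenue']:,.2f}")
```

Each output row carries the grouping columns plus one key per alias, which matches how the loop above reads `group['region']` and `group['total_revenue']`.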

## Common issues

### Search returns zero results

- Make sure you ingested with a schema, not just the raw DataFrame.
- Confirm the collection has rows: `hb.collection("products").count()`.

### Duplicate results after re-ingest

Clear the collection before re-ingesting:

<!-- FORWARD-LOOKING: Collection.truncate() fluent form ships with PR
     feat/namespace-row-counts. Until that lands, call
     hb.truncate(collection="products") instead. Remove this comment
     once feat/namespace-row-counts is merged. -->
```python
hb.collection("products").truncate()  # keep schema, drop rows
# or
hb.collection("products").delete()    # drop everything
```

## Quick reference

```python
from hybi import HyperBinder, RelationalTable, Field, Encoding

hb = HyperBinder("http://localhost:8000")

# Schema
schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "text": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC),
    },
)

# Ingest
hb.ingest(df, collection="data", schema=schema, dim=384)

# Search
results = hb.search("query", collection="data", top_k=10)

# Hybrid search
results = hb.search(
    "query",
    collection="data",
    filters=[("category", "==", "value"), ("price", ">", 100)],
    top_k=10,
)

# SQL-like
result = hb.select(collection="data", where=[...], order_by=[...])

# Collection management
hb.collection("data").exists()
hb.collection("data").count()
hb.collection("data").truncate()  # ships with feat/namespace-row-counts
hb.collection("data").delete()
```

<!-- FORWARD-LOOKING: Collection.truncate() fluent form ships with PR
     feat/namespace-row-counts. Until that lands, use
     hb.truncate(collection=...). Remove this comment and the inline
     note above once feat/namespace-row-counts is merged. -->


## Contributing

See the [Contributing Guide](https://github.com/hyperbinder/sdk) for details.

## License

MIT License — see LICENSE for details.
