Metadata-Version: 2.4
Name: latticedb
Version: 0.9.3
Summary: Embedded single-file property-graph database with vector and BM25 full-text search
Author-email: Jeff Hajewski <jeff.hajewski@gmail.com>
Maintainer-email: Jeff Hajewski <jeff.hajewski@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/jeffhajewski/latticedb
Project-URL: Documentation, https://latticedb.org
Project-URL: Repository, https://github.com/jeffhajewski/latticedb
Project-URL: Issues, https://github.com/jeffhajewski/latticedb/issues
Keywords: database,graph,property-graph,vector,full-text,embedded,cypher,hnsw,bm25
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.20.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"

# LatticeDB Python Bindings

Python bindings for [LatticeDB](https://github.com/jeffhajewski/latticedb), an embedded single-file property-graph database with native vector and BM25 full-text search.

## Installation

```bash
pip install latticedb
```

Published wheels are expected to bundle the native shared library on supported platforms.

If you are installing from a source checkout, the package build can either:

- bundle a prebuilt `liblattice` from `LATTICE_BUNDLE_LIB_DIR` / `LATTICE_BUNDLE_LIB_PATH`, or
- build `liblattice` with Zig during the wheel build

For example, to bundle a staged installed library into a locally built wheel:

```bash
export LATTICE_BUNDLE_LIB_DIR=/tmp/lattice-install/lib
pip wheel . -w dist
pip install dist/latticedb-*.whl
```

At runtime, explicit library discovery overrides still work via `LATTICE_LIB_PATH`, `LATTICE_PREFIX`, and `pkg-config`.

Migration note: embedding helpers now live in the dedicated `latticedb.embedding` module. See [../../docs/client_api_migration.md](../../docs/client_api_migration.md) for the preferred API names and deprecated compatibility aliases.

Installed-prefix workflow:

```bash
zig build install --prefix /tmp/lattice-install
export LATTICE_PREFIX=/tmp/lattice-install
```

Alternatively, discovery can use `pkg-config`:

```bash
export PKG_CONFIG_PATH=/tmp/lattice-install/lib/pkgconfig
```
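When discovery fails, it can help to see which of these overrides the Python process actually sees. The helper below is a hypothetical diagnostic, not part of `latticedb`. It assumes the library sits in `<prefix>/lib` (as in the `/tmp/lattice-install` example) under one of the file names listed in Requirements, and it does not try to reproduce the order in which the bindings consult these settings.

```python
import os
from pathlib import Path

# File names taken from the Requirements section; other platforms may differ.
LIB_NAMES = ("liblattice.so", "liblattice.dylib")


def describe_overrides(env=None):
    """Report which library-discovery overrides are set and what they point at.

    Hypothetical helper: it only inspects the environment and the filesystem.
    """
    env = os.environ if env is None else env
    report = {}

    lib_path = env.get("LATTICE_LIB_PATH")
    if lib_path:
        report["LATTICE_LIB_PATH"] = {
            "value": lib_path,
            "exists": Path(lib_path).exists(),
        }

    prefix = env.get("LATTICE_PREFIX")
    if prefix:
        # Assumption: the shared library is installed under <prefix>/lib.
        candidates = [Path(prefix, "lib", name) for name in LIB_NAMES]
        report["LATTICE_PREFIX"] = {
            "value": prefix,
            "libraries": [str(p) for p in candidates if p.exists()],
        }

    pkg_config = env.get("PKG_CONFIG_PATH")
    if pkg_config:
        report["PKG_CONFIG_PATH"] = {"value": pkg_config}

    return report
```

Calling `describe_overrides()` with no arguments inspects `os.environ`; an empty result means none of the overrides is set.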

## Quick Start

```python
import numpy as np
from latticedb import Database

with Database("knowledge.db", create=True, enable_vectors=True, vector_dimensions=4) as db:
    # Create nodes, edges, and index content
    with db.write() as txn:
        alice = txn.create_node(
            labels=["Person"],
            properties={"name": "Alice", "age": 30},
        )
        bob = txn.create_node(
            labels=["Person"],
            properties={"name": "Bob", "age": 25},
        )
        txn.create_edge(alice.id, bob.id, "KNOWS")

        # Index text for full-text search
        txn.fts_index(alice.id, "Alice works on machine learning research")
        txn.fts_index(bob.id, "Bob studies deep learning and neural networks")

        # Store vector embeddings
        txn.set_vector(alice.id, "embedding", np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32))
        txn.set_vector(bob.id, "embedding", np.array([0.0, 1.0, 0.0, 0.0], dtype=np.float32))

        txn.commit()

    # Query with Cypher
    result = db.query("MATCH (n:Person) WHERE n.age > 20 RETURN n.name, n.age")
    for row in result:
        print(row)

    # Vector similarity search
    query_vec = np.array([0.9, 0.1, 0.0, 0.0], dtype=np.float32)
    for r in db.vector_search(query_vec, k=2):
        print(f"Node {r.node_id}: distance={r.distance:.4f}")

    # Full-text search
    for r in db.fts_search("machine learning"):
        print(f"Node {r.node_id}: score={r.score:.4f}")

    # Fuzzy search (typo-tolerant)
    for r in db.fts_search_fuzzy("machin lerning"):
        print(f"Node {r.node_id}: score={r.score:.4f}")
```
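The quick start runs vector search and full-text search separately. If you want a single ranked list, one common option is reciprocal rank fusion (RRF). This is a general ranking technique applied on the client side, not something this README says LatticeDB implements. The sketch assumes each result list is already ordered best first and that each hit exposes `node_id`, as in the loops above; `rrf_merge` and the constant `k=60` are illustrative choices.

```python
from collections import defaultdict


def rrf_merge(*result_lists, k=60):
    """Fuse ranked result lists with reciprocal rank fusion.

    Each hit only needs a `node_id` attribute. Returns node IDs, best first.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, hit in enumerate(results, start=1):
            # A hit contributes more the closer it is to the top of its list.
            scores[hit.node_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by node ID for a stable order.
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))
```

For example, `rrf_merge(db.vector_search(query_vec, k=10), db.fts_search("machine learning"))` would rank nodes that appear near the top of both lists first.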

## API Reference

### Database

```python
Database(
    path: str | Path,
    *,
    create: bool = False,                # Create the file if it doesn't exist
    read_only: bool = False,             # Open in read-only mode
    cache_size_mb: int = 100,            # Page cache size in MB
    enable_vectors: bool | None = None,  # Preferred vector config flag
    enable_vector: bool | None = None,   # Deprecated compatibility alias
    vector_dimensions: int = 128,        # Vector dimensions
)
```

#### Methods

- `open()` / `close()` - Open/close the database (also works as a context manager)
- `read()` - Start a read-only transaction (context manager)
- `write()` - Start a read-write transaction (context manager)
- `query(cypher, parameters=None)` - Execute a Cypher query
- `vector_search(vector, k=10, ef_search=64)` - k-NN vector search
- `fts_search(query, limit=10)` - Full-text search
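- `fts_search_fuzzy(query, limit=10, max_distance=0, min_term_length=0)` - Fuzzy full-text search. The Fuzzy Search example documents the defaults as edit distance 2 and minimum term length 4, which suggests that `0` here means "use the built-in default"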
- `read_stream(stream, after_sequence=0, limit=100, timeout_ms=0)` - Read durable stream records by cursor
- `get_stream_offset(stream, consumer)` - Read a committed consumer offset
- `changes(after_sequence=0, limit=100, timeout_ms=0)` - Read the built-in graph changefeed
- `cache_clear()` - Clear the query cache
- `cache_stats()` - Get cache hit/miss statistics

### Transaction

#### Read Operations

- `get_node(node_id)` - Get a node by ID, returns `Node` or `None`
- `node_exists(node_id)` - Check if a node exists
- `get_property(node_id, key)` - Get a property value
- `get_outgoing_edges(node_id)` - Get outgoing edges from a node
- `get_incoming_edges(node_id)` - Get incoming edges to a node
- `is_read_only` / `is_active` - Transaction state

#### Write Operations

- `create_node(labels=[], properties=None)` - Create a node
- `delete_node(node_id)` - Delete a node
- `set_property(node_id, key, value)` - Set a property on a node
- `set_vector(node_id, key, vector)` - Set a vector embedding
- `batch_insert_vectors(label, vectors)` - Insert vector-bearing nodes in one call
- `batch_insert(label, vectors)` - Deprecated compatibility alias for `batch_insert_vectors`
- `fts_index(node_id, text)` - Index text for full-text search
- `create_edge(source_id, target_id, edge_type, properties=None)` - Create an edge
- `delete_edge(source_id, target_id, edge_type)` - Delete an edge
- `set_edge_property(edge_id, key, value)` - Set an edge property by stable edge ID
- `get_edge_property(edge_id, key)` - Get an edge property by stable edge ID
- `remove_edge_property(edge_id, key)` - Remove an edge property by stable edge ID
- `publish_stream(stream, payload, kind="message")` - Publish a durable stream record
- `set_stream_offset(stream, consumer, sequence)` - Commit a durable consumer offset
- `trim_stream(stream, through_sequence)` - Delete stream records through a sequence
- `commit()` / `rollback()` - Commit or roll back the transaction

### Bulk Vector Insertion

Insert many nodes with vectors in a single efficient call:

```python
import numpy as np
from latticedb import Database

with Database("vectors.db", create=True, enable_vectors=True, vector_dimensions=128) as db:
    with db.write() as txn:
        vectors = np.random.rand(1000, 128).astype(np.float32)
        node_ids = txn.batch_insert_vectors("Document", vectors)
        print(f"Created {len(node_ids)} nodes")
        txn.commit()
```

### Full-Text Search

#### Exact Search

```python
results = db.fts_search("machine learning", limit=10)
for r in results:
    print(f"Node {r.node_id}: score={r.score:.4f}")
```

#### Fuzzy Search (Typo-Tolerant)

```python
# Finds "machine learning" even with typos
results = db.fts_search_fuzzy("machne lerning", limit=10)

# Control fuzzy matching sensitivity
results = db.fts_search_fuzzy(
    "machne",
    limit=10,
    max_distance=2,      # Max edit distance (default: 2)
    min_term_length=4,   # Min term length for fuzzy matching (default: 4)
)
```
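To make `max_distance` and `min_term_length` concrete, here is a small sketch of edit-distance matching. It uses plain Levenshtein distance and treats terms shorter than `min_term_length` as exact-match only. Both are assumptions about how the options behave, not a description of LatticeDB's internal matcher, and `would_fuzzy_match` is a hypothetical helper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def would_fuzzy_match(term, indexed_term, max_distance=2, min_term_length=4):
    """Illustrative rule: short terms must match exactly, longer ones within max_distance."""
    if len(term) < min_term_length:
        return term == indexed_term
    return edit_distance(term, indexed_term) <= max_distance
```

Under this rule, `"machne"` is one edit away from `"machine"`, so it matches with the documented defaults, while a three-letter term such as `"cat"` would only match itself.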

### Embeddings

LatticeDB includes a built-in hash embedding function and an HTTP client for external embedding services. For new code, prefer the dedicated `latticedb.embedding` module. The package root still exposes deprecated compatibility aliases.

#### Hash Embeddings (Built-in)

Deterministic, no external service needed. Useful for testing or simple keyword-based similarity:

```python
from latticedb.embedding import hash_embed

vec = hash_embed("hello world", dimensions=128)
print(vec.shape)  # (128,)
```
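To show why a hash embedding is deterministic and keyword-based, the sketch below implements a generic feature-hashing scheme with the standard library. It is not the algorithm behind `hash_embed`, whose internals this README does not describe; `toy_hash_embed`, the whitespace tokenizer, and the SHA-256 bucketing are all illustrative assumptions.

```python
import hashlib
import math


def toy_hash_embed(text: str, dimensions: int = 128) -> list[float]:
    """Map text to a fixed-size vector by hashing each token into a bucket."""
    vec = [0.0] * dimensions
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first 8 bytes pick the bucket, the next byte picks the sign.
        index = int.from_bytes(digest[:8], "big") % dimensions
        sign = 1.0 if digest[8] % 2 == 0 else -1.0
        vec[index] += sign
    # L2-normalize so that dot products behave like cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```

Because each token always lands in the same bucket, the same text always yields the same vector, and texts that share words share non-zero dimensions. Word order and meaning are ignored, which is why this style of embedding suits tests better than semantic search.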

#### HTTP Embedding Client

Connect to Ollama, OpenAI, or compatible APIs:

```python
from latticedb.embedding import EmbeddingClient, EmbeddingApiFormat

# Ollama (default)
with EmbeddingClient("http://localhost:11434") as client:
    vec = client.embed("hello world")

# OpenAI-compatible API
with EmbeddingClient(
    "https://api.openai.com/v1",
    model="text-embedding-3-small",
    api_format=EmbeddingApiFormat.OPENAI,
    api_key="sk-...",
) as client:
    vec = client.embed("hello world")
```

### Edge Traversal

```python
with db.read() as txn:
    outgoing = txn.get_outgoing_edges(node_id)
    for edge in outgoing:
        print(f"{edge.source_id} --[{edge.edge_type}]--> {edge.target_id}")

    incoming = txn.get_incoming_edges(node_id)
    for edge in incoming:
        print(f"{edge.source_id} --[{edge.edge_type}]--> {edge.target_id}")
```
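Multi-hop traversal can be built on top of `get_outgoing_edges`. The helper below is a sketch of a breadth-first walk; `reachable_nodes` is a hypothetical name, and the code relies only on the `get_outgoing_edges` method and the `target_id` attribute shown above.

```python
from collections import deque


def reachable_nodes(txn, start_id, max_depth=2):
    """Return {node_id: depth} for nodes reachable via outgoing edges.

    `txn` is any object with `get_outgoing_edges(node_id)` whose edges
    expose `target_id`, such as a read transaction.
    """
    depths = {start_id: 0}
    queue = deque([start_id])
    while queue:
        node_id = queue.popleft()
        if depths[node_id] >= max_depth:
            continue  # Do not expand beyond the depth limit.
        for edge in txn.get_outgoing_edges(node_id):
            if edge.target_id not in depths:
                depths[edge.target_id] = depths[node_id] + 1
                queue.append(edge.target_id)
    return depths
```

Used inside `with db.read() as txn:`, `reachable_nodes(txn, alice.id, max_depth=2)` would return Alice, the people she knows, and the people they know, each with its hop count.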

### Cypher Queries

```python
# Pattern matching
result = db.query("MATCH (n:Person) RETURN n.name")

# With parameters
result = db.query(
    "MATCH (n:Person) WHERE n.name = $name RETURN n",
    parameters={"name": "Alice"},
)

# Vector similarity in Cypher
result = db.query(
    "MATCH (n:Document) WHERE n.embedding <=> $vec < 0.5 RETURN n.title",
    parameters={"vec": query_vector},
)

# Full-text search in Cypher
result = db.query(
    'MATCH (n:Document) WHERE n.content @@ "machine learning" RETURN n.title'
)

# Data mutation
db.query("CREATE (n:Person {name: 'Charlie', age: 35})")
db.query("MATCH (n:Person {name: 'Charlie'}) SET n.age = 36")
db.query("MATCH (n:Person {name: 'Charlie'}) DETACH DELETE n")
```

### Query Cache

```python
# Get cache statistics
stats = db.cache_stats()
print(f"Entries: {stats['entries']}, Hits: {stats['hits']}, Misses: {stats['misses']}")

# Clear the cache
db.cache_clear()
```
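A hit ratio is often easier to monitor than raw counters. This small helper is a sketch that assumes only the `hits` and `misses` keys used in the example above; `cache_hit_ratio` is a hypothetical name.

```python
def cache_hit_ratio(stats):
    """Return hits / (hits + misses), or 0.0 when the cache has not been used."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0
```

For example, `cache_hit_ratio(db.cache_stats())` could be logged periodically to see whether repeated queries are being served from the cache.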

### Durable Streams and Changefeeds

Streams are durable named event logs stored inside the database file. Records are
published in write transactions, sequence numbers are per stream, and reads use
an explicit cursor. Reads do not acknowledge records; commit offsets separately
when your consumer has processed a batch.

```python
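from latticedb import Database

with Database("events.db", create=True) as db:
    # Publish a record inside a write transaction
    with db.write() as txn:
        txn.publish_stream("jobs", {"id": 1, "status": "queued"}, kind="job.queued")
        txn.commit()

    # Read from the start of the stream; reading does not acknowledge anything
    records = db.read_stream("jobs", after_sequence=0, limit=100, timeout_ms=0)

    # Commit this consumer's progress, then trim the records before the last one read
    with db.write() as txn:
        txn.set_stream_offset("jobs", "worker-a", records[-1].sequence)
        txn.trim_stream("jobs", records[-1].sequence - 1)
        txn.commit()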
```
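Because reads do not acknowledge records, a consumer usually resumes from its committed offset and commits again after each batch. The loop below is a sketch of that pattern using only the stream methods listed in the API reference. `process_pending` is a hypothetical helper, and it assumes that a consumer without a committed offset yields `None` or `0` and that `read_stream` with `timeout_ms=0` returns an empty, list-like result when nothing is pending.

```python
def process_pending(db, stream, consumer, handle, batch_size=100):
    """Handle every record after the consumer's committed offset.

    Returns the number of records handled.
    """
    # Assumption: no committed offset is reported as None or 0.
    cursor = db.get_stream_offset(stream, consumer) or 0
    handled = 0
    while True:
        records = db.read_stream(
            stream, after_sequence=cursor, limit=batch_size, timeout_ms=0
        )
        if not records:
            return handled
        for record in records:
            handle(record)
        cursor = records[-1].sequence
        # Commit only after the whole batch was handled, so a crash
        # mid-batch replays the batch instead of skipping records.
        with db.write() as txn:
            txn.set_stream_offset(stream, consumer, cursor)
            txn.commit()
        handled += len(records)
```

This gives at-least-once delivery: if `handle` raises partway through a batch, the offset is not advanced and the batch is read again on the next call, so handlers should tolerate duplicates.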

`db.changes()` reads the reserved `__lattice_changes` stream. It emits semantic
graph events such as `node.insert`, `node.property_set`, `edge.delete`, and
`edge.property_remove`, with payloads represented as normal Python values.

## Supported Property Types

- `None` - Null value
- `bool` - Boolean
- `int` - 64-bit integer
- `float` - 64-bit float
- `str` - UTF-8 string
- `bytes` - Binary data
- NumPy `ndarray` (`float32`) - Vector embeddings

Nested `list` and `dict` values are not currently exposed by the public bindings/C API.
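Validating values before a write can turn an unsupported type into an early, readable error. The check below is a sketch based on the list above. `check_property` is a hypothetical helper, the signed 64-bit integer range is an assumption, and the array check is duck-typed on `dtype` and `ndim` so the example does not need to import NumPy.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1  # Assumption: integers are signed 64-bit.
_SCALAR_TYPES = (bool, int, float, str, bytes)


def check_property(value):
    """Raise TypeError or ValueError if `value` is not a listed property type."""
    if value is None:
        return
    if isinstance(value, (list, dict)):
        raise TypeError("nested list and dict values are not supported")
    # Array-like values: accept float32 only.
    if hasattr(value, "dtype") and getattr(value, "ndim", 0) >= 1:
        if str(value.dtype) != "float32":
            raise TypeError(f"vectors must be float32, got {value.dtype}")
        return
    if not isinstance(value, _SCALAR_TYPES):
        raise TypeError(f"unsupported property type: {type(value).__name__}")
    # bool is a subclass of int, so exclude it from the range check.
    if isinstance(value, int) and not isinstance(value, bool):
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError("integer does not fit in 64 bits")
```

A caller could run `check_property(v)` over every value in a `properties` dict before passing it to `create_node` or `set_property`.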

## Error Handling

```python
from latticedb import Database, LatticeError, LatticeNotFoundError, LatticeIOError

try:
    with Database("nonexistent.db") as db:
        pass
except LatticeNotFoundError:
    print("Database not found")
except LatticeIOError:
    print("I/O error")
except LatticeError as e:
    print(f"Error: {e}")
```

## Requirements

- Python 3.9+
- NumPy (for vector operations)
- The native LatticeDB library (`liblattice.dylib` / `liblattice.so`)

## License

MIT
