Metadata-Version: 2.4
Name: lakehouse-memory
Version: 0.1.0b3
Summary: Unity Catalog-native episodic, semantic, and working memory for AI agents on Databricks
Project-URL: Homepage, https://github.com/travis-burmaster/lakehouse-memory
Project-URL: Issues, https://github.com/travis-burmaster/lakehouse-memory/issues
Author-email: Travis Burmaster <travis@burmaster.com>
License: Apache-2.0
License-File: LICENSE
Keywords: ai-agents,databricks,memory,unity-catalog,vector-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: databricks-sdk>=0.20.0
Requires-Dist: databricks-sql-connector>=3.0.0
Requires-Dist: databricks-vectorsearch>=0.40
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: python-dotenv>=1.0; extra == 'dev'
Requires-Dist: ruff>=0.3.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.3.0; extra == 'langchain'
Description-Content-Type: text/markdown

# lakehouse-memory

[![PyPI](https://img.shields.io/pypi/v/lakehouse-memory.svg?label=pypi&include_prereleases)](https://pypi.org/project/lakehouse-memory/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![CI](https://github.com/travis-burmaster/lakehouse-memory/actions/workflows/ci.yml/badge.svg)](https://github.com/travis-burmaster/lakehouse-memory/actions/workflows/ci.yml)

Unity Catalog-native episodic, semantic, and working memory for AI agents on Databricks.

> **Status:** Pre-release (`0.1.0b3`). Public from day one. The core library, LangChain adapters, and DAB starter (M3) are workspace-validated; the docs site (M4) is not yet shipped. See [the spec](https://github.com/travis-burmaster/lakehouse-memory) for design intent.

## The pitch

Memory is the missing Databricks layer. The standard workaround is a sidecar vector DB with its own governance, access control, and lineage — a system you can't ship. Memory belongs in Unity Catalog, where your data already lives.

`lakehouse-memory` gives AI agents on Databricks three first-class memory primitives — episodic, semantic, and working — backed by Unity Catalog tables and Databricks Vector Search.

## Install

```bash
pip install --pre lakehouse-memory
```

The `--pre` flag is required while the package is in pre-release. Once `0.1.0` ships (alongside the M4 docs), `pip install lakehouse-memory` will work without the flag.

## Quickstart with the DAB starter (recommended)

Bootstrap the whole reference architecture — UC tables, Vector Search indexes,
and a working chat agent — in your Databricks workspace:

```bash
databricks bundle init https://github.com/travis-burmaster/lakehouse-memory \
  --template-dir templates/lakehouse-memory-bundle
cd <project-name>
databricks bundle deploy
databricks bundle run setup_job
```

You'll be prompted for your catalog, schema, Vector Search endpoint, SQL
warehouse HTTP path, and LLM serving endpoint. After `setup_job` finishes,
open `notebooks/02_chat_agent.ipynb` and run all cells to get a memory-backed
agent running in under 10 minutes.

## Manual setup (advanced)
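
The manual setup code in this section reads four environment variables (`DATABRICKS_HOST`, `DATABRICKS_HTTP_PATH`, `DATABRICKS_TOKEN`, and `DATABRICKS_VECTOR_SEARCH_ENDPOINT`). A missing one surfaces as a bare `KeyError` partway through. One way to fail fast is a small preflight check, sketched here. The helper names `missing_env_vars` and `require_env` are hypothetical and not part of `lakehouse-memory`.

```python
import os
from typing import Mapping

# Variable names taken from the manual setup code in this section.
REQUIRED_ENV_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_TOKEN",
    "DATABRICKS_VECTOR_SEARCH_ENDPOINT",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank, in declaration order."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


def require_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise one error that lists every missing variable (hypothetical helper)."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling `require_env()` before constructing the client and index reports every missing variable in a single message instead of one at a time.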

```python
import os

from lakehouse_memory import Memory, MemoryConfig, Scope
from lakehouse_memory.client import SqlConnectorClient
from lakehouse_memory.vector_databricks import DatabricksVectorIndex

config = MemoryConfig(catalog="main", schema_name="agent_memory")

client = SqlConnectorClient(
    server_hostname=os.environ["DATABRICKS_HOST"].removeprefix("https://").rstrip("/"),
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
)

index = DatabricksVectorIndex(
    endpoint_name=os.environ["DATABRICKS_VECTOR_SEARCH_ENDPOINT"],
    index_name=f"{config.catalog}.{config.schema_name}.episodic_idx",
    workspace_url=os.environ["DATABRICKS_HOST"],
    access_token=os.environ["DATABRICKS_TOKEN"],
    columns=["event_id", "text", "user_id", "session_id", "agent_id"],
)

mem = Memory(config=config, client=client, index=index, scope=Scope(user_id="u_1"))
mem.provision(
    vector_search_endpoint=os.environ["DATABRICKS_VECTOR_SEARCH_ENDPOINT"],
    workspace_url=os.environ["DATABRICKS_HOST"],
    access_token=os.environ["DATABRICKS_TOKEN"],
)

# Write a fact
mem.semantic.upsert(fact="User prefers SQL over Python.")

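# The Delta Sync indexes are in TRIGGERED mode, so fire the sync explicitly after writes.
# (For production, consider switching to CONTINUOUS pipelines.)
# Note: `_index` is a private attribute and may change between pre-releases.
mem.semantic._index.trigger_sync()

# Wait for the sync to finish. A fixed sleep is only for demonstration;
# production code would poll with exponential backoff.
import time

time.sleep(15)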

facts = mem.semantic.retrieve("language preferences", k=3)
```
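
The fixed `time.sleep(15)` above is a placeholder. A minimal sketch of the exponential-backoff polling it alludes to could look like the following. `wait_until` is a hypothetical helper, not part of `lakehouse-memory`, and the readiness check in the trailing comment assumes `mem.semantic.retrieve` returns a sized collection, which this README does not confirm.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    *,
    timeout: float = 120.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `predicate` with exponential backoff.

    Returns True as soon as the predicate is true, or False once `timeout`
    seconds have elapsed. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Grow the delay geometrically, capped at max_delay.
        delay = min(delay * factor, max_delay)


# Hypothetical usage, assuming retrieve() returns a sized collection:
# synced = wait_until(
#     lambda: len(mem.semantic.retrieve("language preferences", k=3)) > 0
# )
```

With the defaults, the helper waits 1, 2, 4, 8, 16, then 30 seconds between attempts and gives up after two minutes. Tune `timeout` and `max_delay` to how long syncs take in your workspace.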

**LangChain integration:**

```python
chat = mem.as_langchain_chat_history(limit=50)
retriever = mem.as_langchain_retriever(k=5)
```

## Production gaps

A full write-up is coming in M4. Short version: compaction at scale, multi-tenant row-level security (RLS), regression evals, observability, and custom retrieval strategies are deliberately out of scope for the OSS release. If you want help building past those, the [Burmaster Databricks AI Practice](https://burmaster.com) does this for a living.

## License

Apache 2.0. See [LICENSE](LICENSE).
