Metadata-Version: 2.4
Name: smriti-memory-ai
Version: 1.0.3
Summary: Smriti AI: inference-time semantic memory for small language models
Author-email: Alton Lee Wei Bin <creator35lwb@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/Luciferai04/smriti-ai
Project-URL: Repository, https://github.com/Luciferai04/smriti-ai
Keywords: memory,llm,mlops,semantic-retrieval,gemma-4,fastapi,privacy
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy==1.26.4
Requires-Dist: scikit-learn==1.5.2
Requires-Dist: networkx==3.2.1
Requires-Dist: fastapi==0.128.8
Requires-Dist: pydantic==2.13.3
Requires-Dist: uvicorn==0.39.0
Requires-Dist: prometheus-client==0.25.0
Requires-Dist: cryptography==46.0.7
Requires-Dist: pyyaml==6.0.3
Provides-Extra: ml
Requires-Dist: torch==2.8.0; extra == "ml"
Requires-Dist: transformers==5.7.0; extra == "ml"
Requires-Dist: huggingface-hub==1.13.0; extra == "ml"
Requires-Dist: sentence-transformers==5.2.3; extra == "ml"
Requires-Dist: faiss-cpu==1.13.0; extra == "ml"
Requires-Dist: nltk==3.9.4; extra == "ml"
Provides-Extra: bench
Requires-Dist: psutil==7.1.3; extra == "bench"
Requires-Dist: locust==2.43.4; extra == "bench"
Provides-Extra: backends
Requires-Dist: redis==7.1.0; extra == "backends"
Requires-Dist: psycopg2-binary==2.9.11; extra == "backends"
Provides-Extra: demo
Requires-Dist: jinja2==3.1.6; extra == "demo"
Requires-Dist: python-multipart==0.0.27; extra == "demo"
Provides-Extra: integrations
Requires-Dist: langchain-core==1.3.2; extra == "integrations"
Requires-Dist: llama-index-core==0.14.21; extra == "integrations"
Provides-Extra: training
Requires-Dist: torch==2.8.0; extra == "training"
Requires-Dist: transformers==5.7.0; extra == "training"
Requires-Dist: huggingface-hub==1.13.0; extra == "training"
Requires-Dist: accelerate==1.12.0; extra == "training"
Provides-Extra: security
Requires-Dist: cryptography==46.0.7; extra == "security"
Provides-Extra: dev
Requires-Dist: pytest==9.0.3; extra == "dev"
Requires-Dist: pytest-cov==7.1.0; extra == "dev"
Requires-Dist: httpx==0.28.1; extra == "dev"
Requires-Dist: jinja2==3.1.6; extra == "dev"
Requires-Dist: python-multipart==0.0.27; extra == "dev"
Requires-Dist: black==26.3.1; extra == "dev"
Requires-Dist: ruff==0.15.12; extra == "dev"
Requires-Dist: pip-audit==2.10.0; extra == "dev"
Requires-Dist: build==1.3.0; extra == "dev"
Requires-Dist: pytest-benchmark<6.0.0,>=4.0.0; extra == "dev"
Requires-Dist: bandit<2.0.0,>=1.7.0; extra == "dev"
Requires-Dist: safety<4.0.0,>=3.0.0; extra == "dev"
Requires-Dist: detect-secrets<2.0.0,>=1.5.0; extra == "dev"
Provides-Extra: full
Requires-Dist: torch==2.8.0; extra == "full"
Requires-Dist: transformers==5.7.0; extra == "full"
Requires-Dist: huggingface-hub==1.13.0; extra == "full"
Requires-Dist: accelerate==1.12.0; extra == "full"
Requires-Dist: sentence-transformers==5.2.3; extra == "full"
Requires-Dist: faiss-cpu==1.13.0; extra == "full"
Requires-Dist: nltk==3.9.4; extra == "full"
Requires-Dist: gensim==4.4.0; extra == "full"
Requires-Dist: spacy==3.8.11; extra == "full"
Requires-Dist: neo4j==5.28.2; extra == "full"
Requires-Dist: redis==7.1.0; extra == "full"
Requires-Dist: psycopg2-binary==2.9.11; extra == "full"
Requires-Dist: jinja2==3.1.6; extra == "full"
Requires-Dist: python-multipart==0.0.27; extra == "full"
Requires-Dist: langchain-core==1.3.2; extra == "full"
Requires-Dist: llama-index-core==0.14.21; extra == "full"

# Smriti AI

[![CI](https://github.com/Luciferai04/smriti-ai/actions/workflows/ci.yml/badge.svg)](https://github.com/Luciferai04/smriti-ai/actions/workflows/ci.yml)

Smriti AI is a local-first, training-free memory layer for small language models. It wraps a frozen HuggingFace causal language model with persistent memory, semantic retrieval, graph memory, identity governance, API tooling, Docker deployment, benchmarks, and production-readiness checks.

The name comes from *smriti* (IAST: *smṛti*), a Sanskrit term associated with memory and remembrance. Wisdom Library describes Smritis as "that which has to be remembered": [Wisdom Library: Smriti](https://www.wisdomlib.org/definition/smriti).

> Small models are not only limited by parameter count. They are limited by the absence of durable memory.

Smriti AI keeps the base model frozen. It improves long-term recall by storing external memory, retrieving only relevant facts, injecting them into the prompt, and updating the memory after each interaction. The inference-time memory system requires no LoRA adapters, fine-tuning, or model retraining.

## Current Status

Smriti AI is packaged on PyPI as `smriti-memory-ai` with the importable package `smriti`. The GitHub repository and Hugging Face resources remain named `smriti-ai`.

| Area | Status |
|---|---|
| Python package | `pyproject.toml`, `src/` layout, console scripts, build workflow. |
| Core memory | TF-IDF compatibility mode plus semantic session/topic/fact memory. |
| Retrieval | Sentence-transformer embeddings, FAISS/NumPy fallback, cosine similarity with temporal decay. |
| Graph memory | Per-session `networkx` knowledge graph with simple triple extraction and traversal. |
| Identity governance | Embedding-based persona fingerprint with drift checks and refinement hooks. |
| Backends | JSON, SQLite, Redis, and Postgres backend abstractions. |
| Privacy | Optional encrypted memory blobs and `/memory/delete`. |
| Audit controls | List, search, pin, archive, update, and delete individual memory entries. |
| Auth/RBAC | Optional API-key auth with `user` and `admin` roles. |
| API | FastAPI service with CORS, OpenAPI docs, metrics, health checks, API-key/RBAC option. |
| CLI | `smriti`, `smriti-cli`, `smriti-api`, migration commands, and backward-compatible `mempalace`. |
| Docker | CPU, GPU, demo, training Dockerfiles and compose profiles. |
| Monitoring | Prometheus endpoint and Grafana dashboard assets. |
| Benchmarks | Gemma 4 public benchmark policy, cross-model harness, LoCoMo-style runner, identity bench. |
| Provider adapters | Local HF, HF Endpoint, Ollama, vLLM, and OpenAI-compatible generation adapters. |
| Memory standard | Portable memory protocol plus backend conformance runner and schema migrations. |
| Research evidence | Curated historical/current benchmark lineage without shipping noisy raw logs. |
| Tests/CI | Unit/integration tests, package build/install checks, audit report, GitHub Actions. |

## What Smriti AI Is

Smriti AI is an inference-time memory runtime. It sits between the user and the model.

For each turn, it can (a minimal sketch follows this list):

1. Read the current user message.
2. Retrieve relevant memories scoped by `session_id` and `topic_id`.
3. Query related graph triples.
4. Build an augmented prompt.
5. Generate with a frozen base model.
6. Check persona/identity drift.
7. Run a refinement pass if needed.
8. Extract new facts and triples.
9. Persist updated memory.
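
A minimal memory-only sketch of this loop, composed from the `MemPalaceLite` and `KnowledgeGraphMemory` calls shown later in this README; `generate` is a hypothetical stand-in for the frozen-model call, and the identity/refinement steps (6-7) are omitted:

```python
from smriti import KnowledgeGraphMemory, MemPalaceLite

memory = MemPalaceLite(retrieval_mode="semantic", session_id="alex", topic_id="profile")
graph = KnowledgeGraphMemory()
graph.add_triple("alex", "Alex", "works at", "Ocean Lab", topic_id="profile")

def generate(prompt: str) -> str:
    # Hypothetical stand-in for any frozen-model call (local HF, Ollama, vLLM, ...).
    return "You are Alex, and you work at Ocean Lab."

def turn(user_message: str) -> str:
    # Steps 1-4: retrieve scoped memories and graph facts, build the augmented prompt.
    context = memory.get_context(user_message)
    facts = graph.triples_to_text(
        graph.query_graph("alex", "Alex", depth=1, topic_id="profile")
    )
    prompt = f"Known context:\n{context}\n{facts}\n\nUser: {user_message}\nAssistant:"
    # Step 5: generate with the frozen base model.
    reply = generate(prompt)
    # Steps 8-9: persist the new fact so later turns can recall it.
    memory.add_fact(user_message)
    return reply

print(turn("My name is Alex and I work at Ocean Lab."))
```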

It is designed for:

| User Type | Why They Use It |
|---|---|
| Enterprise AI product teams | Add auditable memory to existing AI services without retraining models. |
| Personal assistant developers | Give local assistants user-specific recall across sessions. |
| Research groups | Evaluate memory augmentation, retrieval modes, and continual-learning boundaries. |
| Privacy-sensitive organizations | Keep user memory local or self-hosted with encryption and deletion hooks. |
| Multi-agent builders | Let planner, summarizer, and executor agents share one user's isolated memory. |

## What Smriti AI Is Not

Smriti AI is not a hosted model provider, a replacement foundation model, or a fine-tuning method for inference-time recall. It does not magically improve every task. It improves tasks where persistent user facts, context continuity, retrieval, or persona stability matter.

The separate `src/training/` package exists for replay/EWC research experiments. That training code is intentionally separate from the inference-time Smriti AI memory runtime.

## Key Features

| Feature | Description |
|---|---|
| Semantic retrieval | Uses embeddings to retrieve meaning-similar facts even when wording changes. |
| Hierarchical memory | Stores memory as `sessions -> topics -> facts`, giving multi-user and multi-task isolation. |
| Temporal decay | Scores memories by cosine similarity multiplied by `exp(-lambda * age)`; a small sketch follows this table. |
| TF-IDF mode | Keeps a lightweight lexical retrieval mode for compatibility and low-dependency environments. |
| Knowledge graph | Extracts simple subject-relation-object triples and injects related facts. |
| Identity fingerprint | Averages persona/self-description embeddings and detects drift in model outputs. |
| Memory compression | Summarizes older topic entries and archives originals before eviction. |
| Durable backends | JSON, SQLite, Redis, and Postgres backends through a common interface. |
| Encryption hooks | Optional symmetric encryption for memory blobs with `SMRITI_MEMORY_KEY`. |
| Deletion support | `/memory/delete` and CLI delete commands for user memory removal. |
| Audit dashboard | Authenticated memory table for search, edit, pin, archive, and per-entry deletion. |
| Provider adapters | Swap local Hugging Face for HF Endpoints, Ollama, vLLM, or OpenAI-compatible APIs. |
| FastAPI service | `/chat`, `/memory/load`, `/memory/save`, `/memory/delete`, `/graph/query`, `/metrics`, `/health`. |
| CLI | Local commands for config, chat, save/load/delete, graph query, server start, and benchmarks. |
| Docker | Compose stacks for API, Redis/Postgres, Prometheus/Grafana, demo, CPU/GPU images. |
| Benchmarks | Gemma 4 memory-retention, retrieval-mode comparison, latency, identity, LoCoMo-style long-memory, historical-protocol rerun. |
| CI/CD | GitHub Actions for tests, style checks, package build/install, Docker, release workflows. |
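
The temporal-decay row above can be illustrated with a small NumPy sketch; `lam` and the ages are illustrative values, not library defaults:

```python
import numpy as np

def decayed_scores(query_vec, memory_vecs, ages_seconds, lam=1e-6):
    """Cosine similarity weighted by exp(-lam * age), as described above."""
    q = query_vec / np.linalg.norm(query_vec)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    return (m @ q) * np.exp(-lam * np.asarray(ages_seconds))

rng = np.random.default_rng(0)
query = rng.normal(size=8)
memories = rng.normal(size=(3, 8))
print(decayed_scores(query, memories, ages_seconds=[60, 3600, 86400]))
```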

## Research Lineage And Principles

Smriti AI was built from scratch around a few durable ideas from memory-augmented small-model systems.

| Principle | Smriti AI Interpretation | Current Implementation |
|---|---|---|
| External memory | Memory should live outside model weights in a portable, inspectable store. | `MemPalaceLite`, `SemanticMemory`, durable backends, JSON export/import. |
| Training-free recall | User recall should improve at inference time without changing model weights. | Retrieved memory is injected into prompts on each call. |
| Identity continuity | Assistants should maintain persona and user-specific context across turns. | `IdentityFingerprint` detects embedding drift and can trigger refinement. |
| Small-model augmentation | Small models become more useful when paired with explicit state. | Works with Gemma 4 and other HuggingFace causal LMs. |
| Local-first privacy | Memory should be deployable on a user's own machine or infrastructure. | JSON/SQLite local stores, optional encryption, deletion endpoint. |
| MLOps reproducibility | Memory systems should be benchmarked, tested, packaged, monitored, and deployable. | CI, Docker, benchmark CSVs, reports, model card, monitoring stack. |

Historical numbers from earlier writeups are treated as research lineage. Current claims should use the current Smriti AI benchmark artifacts in this repository.

For deeper lineage and reproducibility:

| Document | Purpose |
|---|---|
| `research/evidence/README.md` | Curated historical/current evidence policy. |
| `research/evidence/benchmark_lineage.csv` | Historical and current result ledger. |
| `research/README.md` | Curated original notebook/log/excerpt manifest. |
| `docs/memory_format.md` | Portable Smriti memory format and backend contract. |
| `docs/memory_spec.md` | Stable memory protocol for JSON, SQLite, Redis, and Postgres entries. |
| `docs/kaggle_colab.md` | Kaggle/Colab reproducibility guide using package imports. |
| `demos/smriti_kaggle.ipynb` / `demos/smriti_colab.ipynb` | Reproducible package-import notebooks. |

## Architecture

```mermaid
flowchart TD
    U["User / Agent Message"] --> A["SmritiAILite.chat"]
    A --> Q["Session + Topic Scope"]
    Q --> R["Memory Retrieval"]
    R --> S["SemanticMemory or TF-IDF"]
    R --> G["KnowledgeGraphMemory"]
    S --> C["Context Builder"]
    G --> C
    C --> P["Augmented Prompt"]
    P --> M["Frozen HuggingFace Causal LM"]
    M --> I["IdentityFingerprint"]
    I -->|"aligned"| O["Final Response"]
    I -->|"drift"| F["Refinement Pass"]
    F --> O
    O --> E["Fact + Triple Extraction"]
    E --> B["Durable Backend"]
    B --> J["JSON / SQLite / Redis / Postgres"]
```

## Core Modules

| Module | File | Responsibility |
|---|---|---|
| `SmritiAILite` | `src/smriti/agent.py` | Main model wrapper for HuggingFace generation plus memory updates. |
| `BaselineGemma` | `src/smriti/agent.py` | Plain model baseline with no memory layer. |
| `MemPalaceLite` | `src/smriti/core.py` | High-level memory facade and backward-compatible API. |
| `SemanticMemory` | `src/smriti/semantic_memory.py` | Hierarchical embedding memory with FAISS/NumPy retrieval, compression, JSON persistence. |
| `KnowledgeGraphMemory` | `src/smriti/knowledge_graph.py` | Triple extraction, graph storage, query traversal, natural-language rendering. |
| `IdentityFingerprint` | `src/smriti/identity_fingerprint.py` | Persona vectors, drift scoring, adaptive thresholds, refinement prompts. |
| `MACPLite` | `src/smriti/macp.py` | Compact reasoning continuity state. |
| Backends | `src/smriti/backends.py` | JSON, SQLite, Redis, Postgres, encryption, deletion. |
| Backend conformance | `src/smriti/backend_conformance.py` | Reusable compatibility checks for backend authors. |
| Backend migrations | `src/smriti/migrations.py`, `src/smriti/sql/` | Versioned SQLite/Postgres schema files. |
| Audit | `src/smriti/audit.py`, `src/smriti/audit_api.py` | Memory inspection, pin/archive/update/delete control plane. |
| Auth | `src/smriti/auth.py` | API-key authentication and user/admin RBAC checks. |
| Provider adapters | `src/smriti/adapters/` | Local HF, HF Endpoint, Ollama, vLLM, OpenAI-compatible generation. |
| Config | `src/smriti/config.py` | `config.yaml` and environment variable loading. |
| API | `src/smriti/api.py` | FastAPI app, observability, optional API-key auth. |
| CLI | `src/smriti/cli.py` | Local commands for config, memory, API, graph, and benchmark workflows. |
| Integrations | `src/smriti/integrations/` | LangChain and LlamaIndex adapters. |
| Training research | `src/training/ewc_replay.py` | Optional replay/EWC experiments, separate from runtime memory. |

## Installation

### Requirements

| Requirement | Notes |
|---|---|
| Python | 3.10 or newer. |
| OS | Tested locally on macOS; CI validates Linux. Windows helpers are included. |
| Model runtime | Optional unless using `SmritiAILite` with a real HuggingFace model. |
| Gemma 4 access | Public benchmark path uses `google/gemma-4-E2B-it`; users may need Hugging Face access/login. |

### Install From GitHub

```bash
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[ml,bench]"
```

For development:

```bash
pip install -e ".[dev,ml,bench]"
```

For all optional integrations:

```bash
pip install -e ".[full]"
```

### Install From PyPI

The public PyPI distribution is `smriti-memory-ai` because the shorter
`smriti-ai` project name is already occupied on PyPI by another owner. The
Python import stays clean and stable:

```python
from smriti import SmritiAILite
```

Recommended install:

```bash
pip install "smriti-memory-ai[ml]==1.0.3"
```

GitHub tag fallback:

```bash
pip install "smriti-memory-ai[ml] @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.3"
```

Then verify:

```bash
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('Smriti import OK')"
```

### One-Shot Installer

Linux/macOS:

```bash
./install_smriti.sh
```

Windows PowerShell:

```powershell
./install_smriti.ps1
```

Windows batch:

```bat
install_smriti.bat
```

The installer creates a local virtual environment, installs package extras, writes `config.yaml`, and can cache Gemma 4.

## Model And API Keys

Smriti AI itself does not require a model-provider API key. It is a memory layer.

| Key | Required? | Purpose |
|---|---:|---|
| `SMRITI_API_KEY` | Optional | Protects Smriti API routes. Clients send `x-api-key`. |
| `SMRITI_MEMORY_KEY` | Optional but recommended | Encrypts memory blobs before writing to disk or backend. |
| `HF_TOKEN` | Sometimes | Needed only if Hugging Face model access requires authentication. |
| Provider keys | Depends | Needed only if users connect Smriti to a hosted provider instead of a local model. |

For Gemma 4 via Hugging Face:

```bash
hf auth login
# or
export HF_TOKEN="your-huggingface-token"
```

For a protected Smriti API:

```bash
export SMRITI_API_KEY="replace-with-service-secret"
curl -H "x-api-key: $SMRITI_API_KEY" http://localhost:8000/health
```

For encrypted memory:

```bash
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
```
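
One way to generate a suitably long secret (assuming any long opaque string is accepted as the key):

```bash
export SMRITI_MEMORY_KEY="$(openssl rand -base64 32)"
```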

Do not commit real keys to Git.

## Quick Start For Customers

### Enterprise AI Teams

Deploy Smriti AI as a memory service behind your API gateway with Redis/Postgres and monitoring enabled:

```bash
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
COMPOSE_PROFILES=redis,monitoring SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
```

Integrate by sending `POST /chat` requests with a stable `user_id`, then scrape `/metrics` with Prometheus and review dashboards in Grafana. Use `admin` keys only for support/audit workflows.

### Indie Developers And Personal Assistants

Install locally and start with JSON or SQLite memory:

```bash
pip install "smriti-memory-ai @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.3"
smriti-cli config wizard --backend json --overwrite
smriti-cli --session-id alex --topic-id profile chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile chat "What do you remember about me?"
```

Use LangChain/LlamaIndex integrations when you want Smriti memory inside an existing agent framework.

### Researchers And Startups

Clone the repo, run benchmarks, and compare retrieval modes:

```bash
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
pip install -e ".[dev,ml,bench]"
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
```

Use `demos/smriti_kaggle.ipynb` and `demos/smriti_colab.ipynb` for package-based notebook demos.

### Privacy-Sensitive Deployments

Keep memory local, encrypted, auditable, and deletable:

```bash
export SMRITI_MEMORY_BACKEND=sqlite
export SMRITI_SQLITE_PATH=data/smriti_memory.sqlite3
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
export AUTH_ENABLED=true
smriti-cli start-server --host 127.0.0.1 --port 8000
```

Expose `/memory/delete` in your user data-deletion flow and run the authenticated audit UI only for trusted operators.

## Quick Start: Python Library

### Memory-Only Usage

This path does not require PyTorch, transformers, or a model download.

```python
from smriti import MemPalaceLite

memory = MemPalaceLite(retrieval_mode="semantic", session_id="alex", topic_id="profile")

memory.add_fact("Alex is a marine biologist in Hawaii.")
context = memory.get_context("What do you remember about Alex?")

print(context)
```

### Full Gemma 4 Usage

This uses a real model and the Smriti AI wrapper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from smriti import SmritiAILite

model_id = "google/gemma-4-E2B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

agent = SmritiAILite(
    model=model,
    tokenizer=tokenizer,
    retrieval_mode="semantic",
    session_id="alex",
    topic_id="profile",
)

agent.chat("My name is Alex and I am a marine biologist.")
reply = agent.chat("What do you remember about me?")
print(reply)
```

### Save And Load Memory

```python
agent.save_memory("smriti_memory.json")
agent.load_memory("smriti_memory.json")
```

### Direct Semantic Memory

```python
from smriti import SemanticMemory

memory = SemanticMemory()
memory.add_entry("user-a", "profile", "Maya is a doctor at a community clinic.")
results = memory.retrieve("user-a", "profile", "physician medical work", k=1)
print(results[0].entry.text)
```

### Direct Knowledge Graph

```python
from smriti import KnowledgeGraphMemory

graph = KnowledgeGraphMemory()
graph.add_triple("science", "Marie Curie", "discovered", "radium", topic_id="chemistry")
graph.add_triple("science", "radium", "is a", "chemical element", topic_id="chemistry")

facts = graph.triples_to_text(graph.query_graph("science", "Marie Curie", depth=2, topic_id="chemistry"))
print(facts)
```

## Quick Start: CLI

Smriti AI installs these commands:

| Command | Purpose |
|---|---|
| `smriti-cli` | Main CLI. |
| `smriti` | Short alias. |
| `smriti-api` | Run the FastAPI service. |
| `mempalace` | Backward-compatible alias. |

Create config:

```bash
smriti-cli init config.yaml
```

Interactive backend wizard:

```bash
smriti-cli config wizard --backend json --overwrite
```

Store and retrieve local memory:

```bash
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "Where do I work?"
```

Save, load, delete:

```bash
smriti-cli --session-id alex memory save smriti_memory.json
smriti-cli --session-id alex memory load smriti_memory.json
smriti-cli --session-id alex memory delete --path smriti_memory.json
```

Run backend compatibility checks:

```bash
smriti-cli --backend json --backend-path data/memory backend conformance
smriti-cli --backend sqlite --backend-path data/smriti_memory.sqlite3 backend conformance
```

Migrate one user's memory between backends:

```bash
smriti-cli --session-id alex migrate-backend \
  --from-backend json --from-path data/memory \
  --to-backend sqlite --to-path data/smriti_memory.sqlite3
```

Query graph memory:

```bash
smriti-cli --session-id alex --topic-id profile graph_query user --depth 1
```

Run benchmarks:

```bash
# Installed package smoke check; internal-only, not public benchmark evidence.
SMRITI_ALLOW_TEST_DOUBLES=1 smriti-cli benchmark --quick

# Full Gemma 4 benchmark; run from a cloned source checkout.
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
```

## Quick Start: FastAPI Service

Start locally:

```bash
smriti-cli start-server --host 0.0.0.0 --port 8000
# or
python -m smriti.api --host 0.0.0.0 --port 8000
```

Health check:

```bash
curl http://localhost:8000/health
```

OpenAPI docs:

```text
http://localhost:8000/docs
```

Memory/chat request:

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alex",
    "topic_id": "profile",
    "message": "My name is Alex and I am a marine biologist.",
    "retrieval_mode": "semantic"
  }'
```

Recall request:

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alex",
    "topic_id": "profile",
    "message": "What do you remember about me?",
    "retrieval_mode": "semantic"
  }'
```

### Important API Runtime Note

The default FastAPI service can operate as a memory service. If no model agent factory is configured, `/chat` returns memory-aware context and updates memory. To generate full model-backed assistant responses through the API, deploy the API process with a model runtime that registers an agent factory using `set_agent_factory`, or wrap Smriti inside your own service with `SmritiAILite`.

This separation keeps the memory service lightweight for enterprise integration while still supporting full local model-backed usage in Python.
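
A minimal sketch of registering an agent factory in the API process; the exact import path and factory signature for `set_agent_factory` are assumptions based on the note above, so check `src/smriti/api.py` before relying on them:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from smriti import SmritiAILite
from smriti.api import set_agent_factory  # import path assumed

def make_agent(user_id: str, topic_id: str) -> SmritiAILite:
    # Build a frozen Gemma 4 wrapper per user/topic; this signature is illustrative.
    model_id = "google/gemma-4-E2B-it"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return SmritiAILite(
        model=model,
        tokenizer=tokenizer,
        retrieval_mode="semantic",
        session_id=user_id,
        topic_id=topic_id,
    )

set_agent_factory(make_agent)
```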

## API Endpoints

| Method | Endpoint | Purpose |
|---|---|---|
| `GET` | `/health` | Liveness and loaded-user count. |
| `GET` | `/metrics` | Prometheus metrics. |
| `POST` | `/chat` | Retrieve context, update memory, optionally call configured model agent. |
| `POST` | `/memory/save` | Save one user's memory. |
| `POST` | `/memory/load` | Load memory from request body, backend, or path. |
| `POST` | `/memory/delete` | Delete one user's memory from RAM/backend/path. |
| `POST` | `/memory/list` | Audit/list one user's memory entries. |
| `POST` | `/memory/update` | Edit an individual memory entry. |
| `POST` | `/memory/pin` | Pin/unpin an important memory entry. |
| `POST` | `/memory/archive` | Archive/unarchive an entry. |
| `POST` | `/memory/entry/delete` | Delete one memory entry. |
| `POST` | `/graph/query` | Query session-scoped graph facts. |
| `GET` | `/docs` | FastAPI Swagger UI. |
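
A minimal Python client sketch for `/chat`; the payload mirrors the curl examples above, and `httpx` ships with the `dev` extra:

```python
import httpx

resp = httpx.post(
    "http://localhost:8000/chat",
    json={
        "user_id": "alex",
        "topic_id": "profile",
        "message": "What do you remember about me?",
        "retrieval_mode": "semantic",
    },
    # headers={"x-api-key": "..."},  # only when API-key auth is enabled
    timeout=30.0,
)
resp.raise_for_status()
print(resp.json())
```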

## Configuration

Smriti AI reads `config.yaml` by default or the file pointed to by `SMRITI_CONFIG_PATH`.

```yaml
memory:
  backend: json
  memory_dir: data/memory
  sqlite_path: data/smriti_memory.sqlite3
  redis_url: redis://localhost:6379/0
  postgres_dsn: ""
  autosave: true

security:
  encryption_key: ""

api:
  host: 0.0.0.0
  port: 8000
  cors_origins:
    - "*"

model:
  adapter: local_hf
  base_model_id: google/gemma-4-E2B-it
  hf_endpoint_url: ""
  ollama_url: http://localhost:11434
  vllm_url: http://localhost:8000
  openai_compatible_url: https://api.openai.com/v1
```

Environment variables override config values:

| Variable | Purpose |
|---|---|
| `SMRITI_CONFIG_PATH` | Path to config file. |
| `SMRITI_MEMORY_BACKEND` | `json`, `sqlite`, `redis`, or `postgres`. |
| `SMRITI_MEMORY_DIR` | JSON memory directory. |
| `SMRITI_SQLITE_PATH` | SQLite database path. |
| `SMRITI_REDIS_URL` | Redis connection URL. |
| `SMRITI_POSTGRES_DSN` | Postgres DSN. |
| `SMRITI_AUTOSAVE` | Save memory after API updates. |
| `SMRITI_MEMORY_KEY` | Encrypt memory blobs. |
| `SMRITI_API_KEY` | Protect API routes. |
| `AUTH_ENABLED` | Enable role-bound API-key auth. |
| `SMRITI_API_KEYS_PATH` | Path to `api_keys.json` with user/admin keys. |
| `SMRITI_CORS_ORIGINS` | Comma-separated CORS allowlist. |
| `SMRITI_HOST` | API host. |
| `SMRITI_PORT` | API port. |
| `SMRITI_MODEL_ADAPTER` | `local_hf`, `hf_endpoint`, `ollama`, `vllm`, or `openai`. |
| `BASE_MODEL_ID` | Base model ID, default `google/gemma-4-E2B-it`. |
| `HF_ENDPOINT_URL` | Hugging Face Inference Endpoint URL when using endpoint mode. |
| `OPENAI_API_KEY` | Provider key only when using OpenAI-compatible adapter. |

## Durable Memory Backends

| Backend | Best For | Notes |
|---|---|---|
| JSON | Local experiments, privacy-first single-machine usage. | Simple files under `data/memory`. |
| SQLite | Local production, desktop apps, edge devices. | Single-file database, no external service. |
| Redis | Low-latency state for deployed agents. | Good for concurrent API use; use persistence in production. |
| Postgres | Enterprise durability and operational tooling. | Good fit for audited multi-user deployments. |

Examples:

```bash
SMRITI_MEMORY_BACKEND=json smriti-cli start-server
SMRITI_MEMORY_BACKEND=sqlite SMRITI_SQLITE_PATH=data/smriti.sqlite3 smriti-cli start-server
SMRITI_MEMORY_BACKEND=redis SMRITI_REDIS_URL=redis://localhost:6379/0 smriti-cli start-server
SMRITI_MEMORY_BACKEND=postgres SMRITI_POSTGRES_DSN=postgresql://user:pass@host:5432/smriti smriti-cli start-server
```

## Docker

### Local CPU API

```bash
docker compose up -d --build api
```

### API With Redis

```bash
COMPOSE_PROFILES=redis SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
```

### API With Postgres

```bash
COMPOSE_PROFILES=postgres SMRITI_MEMORY_BACKEND=postgres docker compose up -d --build
```

### GPU-Capable Image

When Docker has NVIDIA runtime support:

```bash
SMRITI_DOCKERFILE=Dockerfile docker compose up -d --build api
```

### Production Compose

```bash
docker compose -f docker-compose.prod.yml up -d
```

### Monitoring Stack

```bash
COMPOSE_PROFILES=monitoring docker compose up -d --build
```

Then open:

| Service | URL |
|---|---|
| API | `http://localhost:8000` |
| API docs | `http://localhost:8000/docs` |
| Prometheus | `http://localhost:9090` |
| Grafana | `http://localhost:3000` |

Default Grafana credentials in local compose:

```text
admin / smriti
```

## Hugging Face Model-Style Deployment

Smriti AI can also be packaged as a Hugging Face model repository with a custom `handler.py`. This does not make Smriti AI a newly trained foundation model. It packages the memory wrapper, model card, endpoint config, example requests, and upload tooling so Hugging Face Inference Endpoints can serve a memory-augmented base model.

Deployment assets live in:

```text
deploy/huggingface_model/
deploy/huggingface_dataset/
deploy/huggingface_space/
```

Local handler smoke test:

```bash
BASE_MODEL_ID=google/gemma-4-E2B-it \
HF_TOKEN=$HF_TOKEN \
SMRITI_MEMORY_BACKEND=json \
SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
python deploy/huggingface_model/test_handler_local.py
```

Upload to a Hugging Face model repo:

```bash
export HF_TOKEN=...
python deploy/huggingface_model/upload_model_repo.py \
  --repo-id luciferai-devil/smriti-ai \
  --private false
```

Official v1.0 Hugging Face targets:

| Asset | Repo |
|---|---|
| Model wrapper | `luciferai-devil/smriti-ai` |
| Benchmark dataset | `luciferai-devil/smriti-ai-benchmarks` |
| CPU-safe demo Space | `luciferai-devil/smriti-ai-demo` |

Upload sanitized benchmark artifacts:

```bash
python deploy/huggingface_dataset/upload_benchmark_dataset.py \
  --repo-id luciferai-devil/smriti-ai-benchmarks \
  --private false
```

Upload the public demo Space:

```bash
python deploy/huggingface_space/upload_space.py \
  --repo-id luciferai-devil/smriti-ai-demo \
  --private false
```

The Space runs in CPU-safe memory-only mode by default, warns users not to enter PII, and auto-deletes demo memory after inactivity.

Use `BASE_MODEL_ID` for a locally loaded model inside the endpoint, or `HF_ENDPOINT_URL` if the Smriti handler should call another model endpoint. Production endpoints should use external Redis/Postgres memory and must not store private user memory inside the model repository.

See `docs/deploy_as_hf_model.md` for the full deployment guide.

## Monitoring And Observability

The API exports Prometheus metrics from `/metrics`.

| Metric | Meaning |
|---|---|
| `smriti_http_requests_total` | Request count by method, path, status. |
| `smriti_http_errors_total` | Server-side error count. |
| `smriti_http_request_latency_seconds` | End-to-end request latency histogram. |
| `smriti_retrieval_latency_seconds` | Memory retrieval latency histogram. |
| `smriti_tokens_total` | Approximate token count observed by API. |
| `smriti_user_memories` | Number of loaded user memory stores. |
| `smriti_user_memory_bytes` | Approximate serialized memory size by user. |
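
To spot-check one metric family from a running service:

```bash
curl -s http://localhost:8000/metrics | grep smriti_http_requests_total
```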

Observability helper:

```bash
python scripts/metrics_monitor.py --url http://localhost:8000 --output reports/metrics_report.md
```

## Privacy And Security

Smriti AI stores user memory, so privacy is a core operational concern.

| Requirement | Smriti AI Support |
|---|---|
| User isolation | Memory is keyed by `user_id` / `session_id` and `topic_id`. |
| Deletion | `/memory/delete` and `smriti-cli memory delete`. |
| Encryption | Set `SMRITI_MEMORY_KEY` to encrypt backend blobs. |
| API protection | Set `SMRITI_API_KEY` or `AUTH_ENABLED=true` with `api_keys.json`. |
| RBAC | `user` keys can access only their bound `user_id`; `admin` keys can operate across users. |
| Local-first deployment | JSON/SQLite can run fully on-device. |
| Auditability | Memory can be exported, inspected, pinned, archived, edited, and deleted. |

Delete user memory through API:

```bash
curl -X POST http://localhost:8000/memory/delete \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alex"}'
```

Read more in `docs/privacy.md`.

Role-bound API keys:

```bash
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
curl -H "x-api-key: replace-with-user-key" http://localhost:8000/health
```

Read more in `docs/auth.md`.

## Framework Integrations

### LangChain

```python
from smriti.integrations.langchain import SmritiMemory

memory = SmritiMemory(session_id="alex", topic_id="profile")
memory.save_context(
    {"input": "My name is Alex and I work at Ocean Lab."},
    {"output": "Nice to meet you, Alex."},
)
print(memory.load_memory_variables({"input": "Where do I work?"}))
```

### LlamaIndex

```python
from smriti.integrations.llama_index import SmritiStorageContext

storage = SmritiStorageContext(session_id="alex", topic_id="profile")
storage.add_node("Alex is a marine biologist.")
print(storage.query("What does Alex do?"))
```

## Web Demo

The demo app lets users inject facts, ask distractors, view retrieved memories, and delete user memory.

```bash
pip install -e ".[demo]"
uvicorn demo.app:app --port 8080
```

You can also run the packaged module directly:

```bash
python -m demo.app
```

Screenshots from the current dashboard:

![Smriti AI dashboard home](docs/assets/smriti-dashboard-home.png)

![Smriti AI benchmark evidence](docs/assets/smriti-dashboard-benchmarks.png)

See `src/demo/README.md` for details.

### Memory Audit Dashboard

The audit dashboard is a separate authenticated UI for operators and privacy reviews:

```bash
export SMRITI_AUDIT_USER=admin
export SMRITI_AUDIT_PASSWORD="replace-with-a-strong-password"
uvicorn demo.audit_app:app --port 8090
```

Open `http://127.0.0.1:8090` and sign in with the configured credentials. Use it to search, edit, pin, archive, and delete individual memories.

## Model Provider Adapters

Smriti AI can wrap multiple generation providers through a small adapter interface:

```python
from smriti.adapters import build_adapter

adapter = build_adapter("hf_endpoint")
text = adapter.generate("Augmented Smriti prompt", max_new_tokens=128)
```

Supported adapters:

| Adapter | Typical target |
|---|---|
| `local_hf` | Local Transformers Gemma 4. |
| `hf_endpoint` | Hugging Face Inference Endpoint. |
| `ollama` | Local Ollama REST server. |
| `vllm` | vLLM server. |
| `openai` | OpenAI-compatible hosted APIs. |

Read more in `docs/adapters.md`.

## Benchmarks

### Benchmark Policy

Public benchmark claims in this repository use real Gemma 4 only:

```text
google/gemma-4-E2B-it
```

Deterministic test-double paths exist for engineering tests, but they are never presented as public model-quality evidence.

### Current Local Gemma 4 Results

These are current local CPU measurements from the checked-in CSV artifacts.

| Evaluation | Baseline Recall | Best Smriti AI Recall | Absolute Lift | Notes |
|---|---:|---:|---:|---|
| Gemma-style three-fact protocol | 0/3 | 3/3 | +3 facts | Baseline 5.71s, Semantic+Graph+Identity 4.99s avg CPU latency. |
| Five-mode comparison (`max_new_tokens=16`) | 0/3 | 3/3 | +3 facts | Fastest successful memory mode: Semantic+Graph at 2.78s avg CPU latency. |
| Original broader protocol rerun (`max_new_tokens=256`) | 0/3 | 3/3 | +3 facts | Overall average improved from 0.524 to 0.832 (`+58.9%`). |

Five-mode comparison:

| Configuration | Recall | Avg Latency | Context Coherence | Notes |
|---|---:|---:|---:|---|
| Baseline | 0/3 | 4.927s | 0.000 | Frozen Gemma 4, no memory layer. |
| TF-IDF | 3/3 | 3.481s | 0.667 | Lexical memory mode. |
| Semantic | 3/3 | 2.857s | 0.333 | Embedding-based memory mode. |
| Semantic + Graph | 3/3 | 2.781s | 0.667 | Fastest successful memory mode in this CPU run. |
| Semantic + Graph + Identity | 3/3 | 5.164s | 0.000 | Adds persona governance overhead. |

Original broader protocol rerun:

| Metric | Baseline | Smriti AI | Delta |
|---|---:|---:|---:|
| Memory retention | 0.000 | 1.000 | +inf% |
| Response consistency | 0.571 | 0.496 | -13.2% |
| Context coherence | 1.000 | 1.000 | +0.0% |
| Overall average | 0.524 | 0.832 | +58.9% |

The older `+31.2%` overall number from earlier writeups remains historical lineage. The current comparable broader-protocol rerun is `+58.9%` under this local Gemma 4 CPU setup with `max_new_tokens=256`.

### Run Benchmarks

Install benchmark and ML extras:

```bash
pip install -e ".[ml,bench]"
```

Gemma-style memory retention:

```bash
python benchmarks/run_gemma_eval.py
```

Five-configuration comparison:

```bash
python benchmarks/run_benchmarks.py \
  --model-preset gemma4 \
  --configurations tfidf semantic semantic_graph semantic_graph_identity \
  --devices auto \
  --max-new-tokens 16 \
  --output benchmarks/results_comparison.csv
```

Original broader protocol rerun:

```bash
python benchmarks/run_historical_protocol.py --max-new-tokens 256
```

Cross-model harness:

```bash
python benchmarks/run_benchmarks.py \
  --model-preset cross_model \
  --output benchmarks/cross_model_results.csv \
  --summary-output benchmarks/summary.md
```

Long-memory / LoCoMo-style runner:

```bash
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
```

Identity drift benchmark:

```bash
python benchmarks/run_identity_bench.py --output reports/identity_evaluation.csv
```

Aggregate summaries:

```bash
python benchmarks/summarize_results.py
```

### Benchmark Evidence Files

| File | Purpose |
|---|---|
| `benchmarks/results_gemma_eval.csv` | Gemma 4 baseline vs Smriti three-fact evaluation. |
| `benchmarks/results_comparison.csv` | Baseline, TF-IDF, semantic, semantic+graph, semantic+graph+identity. |
| `benchmarks/results_historical_protocol.csv` | Current rerun of the older broader protocol. |
| `benchmarks/results_historical_protocol_responses.json` | Response audit trail for the broader-protocol rerun. |
| `benchmarks/cross_model_results.csv` | Optional cross-model memory-retention comparison. |
| `benchmarks/longmem_results.csv` | Optional LoCoMo-style long-memory output. |
| `benchmarks/latency_gemma4.csv` | Dedicated Gemma 4 latency/token probe. |
| `reports/identity_evaluation.csv` | Persona drift detection benchmark. |
| `results/summary.md` | Human-readable aggregate summary. |
| `benchmarks/README.md` | Generated benchmark table. |
| `model_card_smriti.md` | Model card and result disclaimer. |
| `research/evidence/benchmark_lineage.csv` | Historical/current result ledger and claim-status labels. |

## Testing

Run the full test suite:

```bash
pytest -q
```

Run the production hardening matrices:

```bash
make test              # unit + deterministic test-double integration
make test-security     # prompt injection, redaction, auth/RBAC, delete/encryption
make test-benchmarks   # deterministic benchmark artifacts and budgets
make production-gates  # manifest, regression, privacy, and gate report checks
make end-user-readiness # first-run install/docs/CLI/deployment readiness checks
```

These PR-safe tests use deterministic test-double paths. Gemma 4 and other
real-model benchmarks are reserved for nightly/manual runs so ordinary
contributors do not need to download large gated checkpoints.

Run with coverage:

```bash
pytest --cov=smriti --cov-report=term-missing --cov-report=html:reports/coverage/html
```

Run style checks:

```bash
ruff check benchmarks scripts src tests
```

Build and install the wheel locally:

```bash
python -m build
python -m venv .venv-wheel
source .venv-wheel/bin/activate
pip install dist/smriti_memory_ai-*.whl
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('wheel OK')"
```

## Smoke Tests

Local API smoke test:

```bash
bash scripts/smoke_test.sh
```

Latency probe:

```bash
python scripts/measure_latency.py --retrieval-modes tfidf semantic --output benchmarks/latency_results.csv
```

Load test helper:

```bash
python scripts/load_test_runner.py --users 10 --spawn-rate 10 --run-time 30s --backend json
```

See `docs/load_testing.md` for the 10/100/1000-user matrix and report files.

Fault-tolerance probe:

```bash
python scripts/fault_tolerance_tests.py --url http://localhost:8000
```

## Agentic Harness Evolution

Smriti AI now includes an AHE-inspired loop for improving the inference-time
memory harness while keeping Gemma 4 or any other base model frozen.

| Layer | File | Purpose |
|---|---|---|
| Harness config | `configs/harness_params.yaml` | Editable retrieval, graph, compression, and identity-governance parameters. |
| Evidence collection | `benchmarks/collect_evidence.py` | Runs memory-retention or JSON/JSONL long-memory tasks and writes summary/log evidence. |
| Evolution decision | `evolve_harness.py` | Applies bounded heuristics and appends a predicted-impact manifest entry. |
| Closed loop | `run_evolution.py` | Re-evaluates proposed configs, reverts regressions, and can tag Git iterations. |
| Audit trail | `manifests/evolve_manifest.jsonl` | JSONL history of component changed, previous/new values, reason, prediction, observed effect, and config snapshots; an illustrative entry follows this table. |
| Harness registry | `harnesses/` | Versioned seed/evolved harness artifacts with metadata, results, and production status. |
| Manifest verifier | `src/smriti/manifest_verifier.py` | Validates that each accepted/rejected change has before/after evidence. |
| Production gates | `src/smriti/production_gates.py` | Runs tests, backend/privacy checks, validation, holdout, cross-model, latency, token, and identity gates before promotion. |
| Canary routing | `src/smriti/canary.py` | Sticky per-user canary routing for evolved harnesses with rollback conditions. |
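
An illustrative manifest entry; the field names and values here are hypothetical, inferred from the description above, so consult the real file for the exact schema:

```json
{"component": "retrieval.top_k", "previous": 3, "new": 5, "reason": "recall plateaued on long-memory tasks", "predicted_impact": "higher recall, more prompt tokens", "observed_effect": "recall +0.05", "config_snapshot": "harnesses/evolved-v1/harness_params.yaml"}
```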

Quick local loop:

```bash
python benchmarks/collect_evidence.py \
  --config configs/harness_params.yaml \
  --summary benchmarks/evidence_summary.json

python evolve_harness.py \
  --config configs/harness_params.yaml \
  --evidence benchmarks/evidence_summary.json

python run_evolution.py --iterations 5 --no-commit
```

Validation and release-gate loop:

```bash
python benchmarks/validate_harness_evolution.py \
  --seed-config harnesses/seed/harness_params.yaml \
  --evolved-config harnesses/evolved-v1/harness_params.yaml

python benchmarks/run_holdout_eval.py \
  --config harnesses/evolved-v1/harness_params.yaml

python benchmarks/run_cross_model_harness_eval.py \
  --seed-config harnesses/seed/harness_params.yaml \
  --evolved-config harnesses/evolved-v1/harness_params.yaml

python harness/verify_manifest.py
python harness/production_gates.py evolved-v1 --to candidate
```

Harness registry CLI:

```bash
smriti-cli harness list
smriti-cli harness show evolved-v1
smriti-cli harness compare seed evolved-v1
smriti-cli harness activate evolved-v1
smriti-cli harness rollback seed
smriti-cli harness verify-manifest
smriti-cli harness promote evolved-v1 --to production
smriti-cli harness regression-test
```

API/dashboard support:

| Endpoint | Purpose |
|---|---|
| `GET /harness/current` | Show active harness parameters and registry entries. |
| `GET /harness/history` | Return manifest history. |
| `GET /harness/metrics` | Return validation and canary metrics. |
| `POST /harness/rollback` | Roll back to a registry harness. Admin-only when auth is enabled. |
| `POST /harness/evaluate` | Run seed-vs-evolved validation. Admin-only when auth is enabled. |
| `GET /harness/canary/status` | Show active/canary routing status. |
| `POST /harness/canary/start` | Start sticky canary routing. Admin-only when auth is enabled. |
| `POST /harness/canary/stop` | Stop canary routing. |
| `POST /harness/canary/promote` | Promote canary harness. |
| `POST /harness/canary/rollback` | Roll back canary harness. |
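
For example, to inspect the active harness parameters (send `x-api-key` when auth is enabled):

```bash
curl http://localhost:8000/harness/current
```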

The web dashboard also exposes a harness cockpit with current parameters,
recent manifest entries, manual overrides, rollback controls, quick evaluation,
seed comparison, and report export. See `docs/research_lineage.md` for the
research rationale and AHE mapping.

Generated harness artifacts:

| Artifact | Purpose |
|---|---|
| `results/harness_evolution_validation.md` | Baseline vs seed vs evolved harness validation. |
| `results/evolution_generalization_report.md` | Final holdout evaluation. |
| `results/cross_model_harness_eval.md` | Cross-model deterministic harness validation. |
| `results/manifest_verification.md` | Manifest integrity report. |
| `results/production_gate_report.md` | Promotion gate verdict. |
| `results/canary_report.md` | Canary routing status and metrics. |
| `reports/evolution_report.md` | Stakeholder-readable evolution report. |

## GPU And CPU Behavior

Smriti AI is designed to fall back cleanly to CPU.

| Component | CPU | GPU |
|---|---|---|
| Base model | Works, slower for Gemma 4. | Moves model to CUDA when available. |
| Generation dtype | `float32` on CPU. | `bfloat16` if supported, else `float16`. |
| Embeddings | Sentence-transformers can run on CPU. | Embedding model can move to CUDA. |
| FAISS | Uses CPU by default. | Attempts GPU indices when CUDA FAISS support is available. |
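
The dtype fallback above can be expressed with standard PyTorch capability checks (requires the `ml` extra); a minimal sketch:

```python
import torch

def pick_device_and_dtype() -> tuple[str, torch.dtype]:
    """CPU stays in float32; GPU prefers bfloat16 when supported, else float16."""
    if torch.cuda.is_available():
        dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        return "cuda", dtype
    return "cpu", torch.float32

device, dtype = pick_device_and_dtype()
print(device, dtype)
```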

For practical demos with Gemma 4, GPU is recommended. CPU is acceptable for reproducibility but slower.

## Training Research Package

Runtime Smriti AI is training-free. The training package is separate and optional.

```bash
pip install -e ".[training]"
python -m training.ewc_replay --model google/gemma-4-E2B-it --dataset path/to/data.jsonl --dry-run
```

The training module includes replay/EWC experiment scaffolding and logs metrics under `training/`. It is not imported by `smriti` during inference.

## CI/CD

| Workflow | Trigger | Purpose |
|---|---|---|
| `.github/workflows/ci.yml` | Push / PR | Install, lint, test, compile, build, install wheel, audit, upload artifacts. |
| `.github/workflows/test_agent_hardening.yml` | Push / PR / manual | Unit, test-double integration, OWASP-style security, benchmark smoke, production gates, optional backend jobs. |
| `.github/workflows/nightly_benchmarks.yml` | Nightly/manual | Real-model Gemma-style, retrieval, holdout, and identity benchmarks. |
| `.github/workflows/harness_production_gate.yml` | Harness/API/benchmark changes | Verify manifest and run production gates for evolved harnesses. |
| `.github/workflows/benchmark.yml` | Nightly/manual | Run benchmark suite on a small configured setup. |
| `.github/workflows/load-test.yml` | Push/nightly/manual | Run a 10-user API load smoke test and upload reports. |
| `.github/workflows/docker.yml` | Tags/manual | Build API/demo/training Docker images. |
| `.github/workflows/release.yml` | Tag push | Build package and publish release artifacts. |

The latest push for the harness-production work passed the CI, Harness Production
Gate, and Load Test workflows.

## Repository Layout

```text
smriti-ai/
|-- src/smriti/                  # Runtime memory package
|-- src/training/                # Optional replay/EWC research code
|-- src/demo/                    # Small web demo
|-- demos/                       # Kaggle/Colab package-import notebooks
|-- benchmarks/                  # Gemma 4 evaluations and CSV results
|-- configs/                     # Harness and runtime parameter files
|-- harness/                     # Manifest verification and production-gate wrappers
|-- harnesses/                   # Versioned seed/evolved harness registry
|-- tests/                       # Unit and integration tests
|-- scripts/                     # Setup, smoke, latency, load, fault probes
|-- docs/                        # Privacy and API documentation
|-- manifests/                   # AHE JSONL evolution audit trail
|-- research/artifacts/          # Curated original notebooks, logs, and excerpts
|-- research/evidence/           # Curated benchmark lineage and evidence policy
|-- monitoring/                  # Prometheus/Grafana assets
|-- support/                     # Troubleshooting and sample configs
|-- notebooks/                   # Package-based demo notebook
|-- reports/                     # Readiness, coverage, metrics reports
|-- Dockerfile                   # GPU-capable API image
|-- Dockerfile.cpu               # Lightweight CPU API image
|-- Dockerfile.demo              # Demo image
|-- Dockerfile.training          # Training/research image
|-- docker-compose.yml           # Local API/backends/monitoring stack
|-- docker-compose.prod.yml      # Production-oriented compose stack
|-- pyproject.toml               # Package metadata and extras
|-- config.yaml                  # Local config template
|-- evolve_harness.py            # One-step harness evolution proposal script
|-- run_evolution.py             # Closed-loop evidence/evolve/verify driver
|-- ROADMAP.md                   # Post-v1 roadmap including AHE hardening
|-- model_card_smriti.md         # Model card and benchmark disclosure
```

## Production Readiness Notes

| Area | Recommendation |
|---|---|
| Auth | Set `SMRITI_API_KEY` or place API behind an authenticated gateway. |
| RBAC | Use `AUTH_ENABLED=true` and role-bound keys for production endpoints. |
| Secrets | Use environment variables or a secret manager, not committed config files. |
| Storage | Use SQLite for local apps, Redis/Postgres for server deployments. |
| Encryption | Set `SMRITI_MEMORY_KEY` for sensitive memory. |
| Backups | Back up JSON/SQLite/Postgres memory stores according to your RPO/RTO. |
| Observability | Scrape `/metrics` and use Grafana dashboard panels. |
| Load testing | Run Locust/wrk-style load tests before enterprise rollout. |
| Deletion | Wire `/memory/delete` into user data deletion workflows. |
| Audit | Protect `demo.audit_app` and audit endpoints before exposing memory inspection. |
| Benchmarking | Rerun Gemma 4 benchmarks on your hardware before publishing claims. |

See `reports/production_readiness.md` for the latest QA snapshot.

## Troubleshooting

| Problem | Likely Cause | Fix |
|---|---|---|
| `ModuleNotFoundError: smriti` | Package not installed in current environment. | Run `pip install -e .` or activate the correct venv. |
| `ModuleNotFoundError: transformers` | ML extras not installed. | Run `pip install -e ".[ml]"`. |
| Gemma 4 fails to load | Missing Hugging Face access or incompatible Transformers stack. | Run `hf auth login` and update ML dependencies. |
| API returns memory-only text | No model agent factory is registered. | Use `SmritiAILite` in Python or deploy API with an agent factory. |
| API returns `401` | `SMRITI_API_KEY` or `AUTH_ENABLED` is set, but the request lacks a valid key. | Send `x-api-key: <key>`. |
| Memory not persisted | Autosave disabled or backend not configured. | Set `SMRITI_AUTOSAVE=1` and configure backend. |
| Encrypted memory cannot load | Missing or wrong `SMRITI_MEMORY_KEY`. | Set the same key used when saving. |
| Docker image is large | ML dependencies and model runtimes are heavy. | Use `Dockerfile.cpu` for API-only memory service. |
| Benchmarks are slow | Gemma 4 on CPU is heavy. | Use GPU or reduce `max_new_tokens` for local checks. |

## Roadmap

| Release | Theme | Planned Work |
|---|---|---|
| v1.1 | Memory quality | Hot/cold memory tiers, stronger compression, multilingual embeddings, cross-lingual recall tests, configurable decay/top-K/summarization thresholds. |
| v1.2 | Scalability | Async backend paths, `asyncpg` Postgres option, batched writes, embedding cache, 100/500/1000-user load reports. |
| v1.3 | Research | LongMemEval/MemoryBench tracking, SmritiBench design, temporal/weighted graph memory, cross-agent shared memory with strict isolation. |
| Ongoing | Community | Good-first issues, backend/adapter contribution guides, benchmark reproducibility reports, pilot-user feedback loops. |

See `ROADMAP.md` for the living post-v1 roadmap.

## Contributing

See `CONTRIBUTING.md` for development setup and contribution guidance.

Recommended local loop:

```bash
pip install -e ".[dev,bench]"
ruff check src benchmarks scripts tests
pytest -q
python -m build
```

Release history and stakeholder-facing notes live in `CHANGELOG.md` and `RELEASE_NOTES_v1.0.3.md`. A tutorial draft for the v1 memory protocol, audit UI, and benchmark evidence lives in `docs/blog/smriti-ai-v1-memory-layer.md`.

## License

Apache-2.0. See `pyproject.toml` for package metadata.

<!-- HARNESS_EVOLUTION_RESULTS_START -->
## Harness Evolution Results

The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.

| System | Recall | Precision@K | p95 latency ms | Token overhead | Privacy delete |
|---|---:|---:|---:|---:|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |

Cross-model harness validation:

| Model | Seed recall | Evolved recall | Gate |
|---|---:|---:|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |

Production gate report: `results/production_gate_report.md`

Historical GodelAI-Lite results remain separate lineage and are not conflated with current Smriti AI harness metrics.
Deterministic test doubles are used only for CI stability and never counted as public benchmark evidence.
<!-- HARNESS_EVOLUTION_RESULTS_END -->
