Metadata-Version: 2.4
Name: smriti-memory-ai
Version: 1.0.7
Summary: Smriti AI: inference-time semantic memory for small language models
Author-email: Alton Lee Wei Bin <creator35lwb@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/Luciferai04/smriti-ai
Project-URL: Repository, https://github.com/Luciferai04/smriti-ai
Keywords: memory,llm,mlops,semantic-retrieval,gemma-4,fastapi,privacy
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy==1.26.4
Requires-Dist: scikit-learn==1.5.2
Requires-Dist: networkx==3.2.1
Requires-Dist: fastapi==0.128.8
Requires-Dist: pydantic==2.13.3
Requires-Dist: uvicorn==0.39.0
Requires-Dist: prometheus-client==0.25.0
Requires-Dist: cryptography==46.0.7
Requires-Dist: pyyaml==6.0.3
Provides-Extra: ml
Requires-Dist: torch==2.8.0; extra == "ml"
Requires-Dist: transformers==5.7.0; extra == "ml"
Requires-Dist: huggingface-hub==1.13.0; extra == "ml"
Requires-Dist: sentence-transformers==5.2.3; extra == "ml"
Requires-Dist: faiss-cpu==1.13.0; extra == "ml"
Requires-Dist: nltk==3.9.4; extra == "ml"
Provides-Extra: bench
Requires-Dist: psutil==7.1.3; extra == "bench"
Requires-Dist: locust==2.43.4; extra == "bench"
Provides-Extra: backends
Requires-Dist: redis==7.1.0; extra == "backends"
Requires-Dist: psycopg2-binary==2.9.11; extra == "backends"
Provides-Extra: demo
Requires-Dist: jinja2==3.1.6; extra == "demo"
Requires-Dist: python-multipart==0.0.27; extra == "demo"
Provides-Extra: integrations
Requires-Dist: langchain-core==1.3.2; extra == "integrations"
Requires-Dist: llama-index-core==0.14.21; extra == "integrations"
Provides-Extra: training
Requires-Dist: torch==2.8.0; extra == "training"
Requires-Dist: transformers==5.7.0; extra == "training"
Requires-Dist: huggingface-hub==1.13.0; extra == "training"
Requires-Dist: accelerate==1.12.0; extra == "training"
Provides-Extra: security
Requires-Dist: cryptography==46.0.7; extra == "security"
Provides-Extra: observability
Requires-Dist: opentelemetry-api<2.0.0,>=1.28.0; extra == "observability"
Requires-Dist: opentelemetry-sdk<2.0.0,>=1.28.0; extra == "observability"
Requires-Dist: opentelemetry-instrumentation-fastapi<1.0.0,>=0.49b0; extra == "observability"
Requires-Dist: opentelemetry-exporter-otlp<2.0.0,>=1.28.0; extra == "observability"
Requires-Dist: sentry-sdk<3.0.0,>=2.20.0; extra == "observability"
Provides-Extra: dev
Requires-Dist: pytest==9.0.3; extra == "dev"
Requires-Dist: pytest-cov==7.1.0; extra == "dev"
Requires-Dist: httpx==0.28.1; extra == "dev"
Requires-Dist: jinja2==3.1.6; extra == "dev"
Requires-Dist: python-multipart==0.0.27; extra == "dev"
Requires-Dist: black==26.3.1; extra == "dev"
Requires-Dist: ruff==0.15.12; extra == "dev"
Requires-Dist: pip-audit==2.10.0; extra == "dev"
Requires-Dist: build==1.3.0; extra == "dev"
Requires-Dist: pytest-benchmark<6.0.0,>=4.0.0; extra == "dev"
Requires-Dist: bandit<2.0.0,>=1.7.0; extra == "dev"
Requires-Dist: safety<4.0.0,>=3.0.0; extra == "dev"
Requires-Dist: detect-secrets<2.0.0,>=1.5.0; extra == "dev"
Provides-Extra: full
Requires-Dist: torch==2.8.0; extra == "full"
Requires-Dist: transformers==5.7.0; extra == "full"
Requires-Dist: huggingface-hub==1.13.0; extra == "full"
Requires-Dist: accelerate==1.12.0; extra == "full"
Requires-Dist: sentence-transformers==5.2.3; extra == "full"
Requires-Dist: faiss-cpu==1.13.0; extra == "full"
Requires-Dist: nltk==3.9.4; extra == "full"
Requires-Dist: gensim==4.4.0; extra == "full"
Requires-Dist: spacy==3.8.11; extra == "full"
Requires-Dist: neo4j==5.28.2; extra == "full"
Requires-Dist: redis==7.1.0; extra == "full"
Requires-Dist: psycopg2-binary==2.9.11; extra == "full"
Requires-Dist: jinja2==3.1.6; extra == "full"
Requires-Dist: python-multipart==0.0.27; extra == "full"
Requires-Dist: langchain-core==1.3.2; extra == "full"
Requires-Dist: llama-index-core==0.14.21; extra == "full"

# Smriti AI

[![CI](https://github.com/Luciferai04/smriti-ai/actions/workflows/ci.yml/badge.svg)](https://github.com/Luciferai04/smriti-ai/actions/workflows/ci.yml)
[![Security](https://github.com/Luciferai04/smriti-ai/actions/workflows/security.yml/badge.svg)](https://github.com/Luciferai04/smriti-ai/actions/workflows/security.yml)
[![Agent Hardening](https://github.com/Luciferai04/smriti-ai/actions/workflows/test_agent_hardening.yml/badge.svg)](https://github.com/Luciferai04/smriti-ai/actions/workflows/test_agent_hardening.yml)
[![Production File Audit](https://github.com/Luciferai04/smriti-ai/actions/workflows/production_file_audit.yml/badge.svg)](https://github.com/Luciferai04/smriti-ai/actions/workflows/production_file_audit.yml)
[![Release](https://github.com/Luciferai04/smriti-ai/actions/workflows/release.yml/badge.svg)](https://github.com/Luciferai04/smriti-ai/actions/workflows/release.yml)
[![Docker Images](https://github.com/Luciferai04/smriti-ai/actions/workflows/docker.yml/badge.svg)](https://github.com/Luciferai04/smriti-ai/actions/workflows/docker.yml)
[![PyPI](https://img.shields.io/pypi/v/smriti-memory-ai.svg?label=PyPI)](https://pypi.org/project/smriti-memory-ai/)
[![Python Versions](https://img.shields.io/pypi/pyversions/smriti-memory-ai.svg)](https://pypi.org/project/smriti-memory-ai/)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-blue.svg)](LICENSE)
[![Status](https://img.shields.io/badge/status-production--ready-brightgreen.svg)](#production-readiness-notes)
[![Frozen Base Model](https://img.shields.io/badge/model-frozen--base-blue.svg)](#what-smriti-ai-is-not)
[![Training Free](https://img.shields.io/badge/training-free_memory_layer-success.svg)](#what-smriti-ai-is)
[![Privacy](https://img.shields.io/badge/privacy-encryption_%2B_delete-success.svg)](#privacy-and-security)
[![Auth/RBAC](https://img.shields.io/badge/security-auth%2FRBAC-success.svg)](#privacy-and-security)
[![Monitoring](https://img.shields.io/badge/monitoring-Prometheus%2FGrafana-orange.svg)](#cicd)
[![Benchmark Provenance](https://img.shields.io/badge/benchmarks-provenance_required-informational.svg)](#benchmarks)
[![Docker](https://img.shields.io/badge/deploy-Docker%2FGHCR-blue.svg)](#docker)
[![Hugging Face](https://img.shields.io/badge/deploy-Hugging_Face-yellow.svg)](#hugging-face-model-style-deployment)
[![Local First](https://img.shields.io/badge/deploy-local--first-lightgrey.svg)](#quick-start-for-customers)

Smriti AI is a local-first, training-free memory layer for small language models. It wraps a frozen HuggingFace causal language model with persistent memory, semantic retrieval, graph memory, identity governance, API tooling, Docker deployment, benchmarks, and production-readiness checks.

The name comes from *smriti* (IAST: *smṛti*), a Sanskrit term associated with memory and remembrance. Wisdom Library describes Smritis as "that which has to be remembered": [Wisdom Library: Smriti](https://www.wisdomlib.org/definition/smriti).

> Small models are not only limited by parameter count. They are limited by the absence of durable memory.

Smriti AI keeps the base model frozen. It improves long-term recall by storing external memory, retrieving only relevant facts, injecting them into the prompt, and updating the memory after each interaction. No LoRA, fine-tuning, adapter weights, or model retraining are required for the inference-time memory system.

## Current Status

Smriti AI is packaged on PyPI as `smriti-memory-ai` with the importable package `smriti`. The GitHub repository and Hugging Face resources remain named `smriti-ai`.

| Area | Status |
|---|---|
| Python package | `pyproject.toml`, `src/` layout, console scripts, build workflow. |
| Core memory | TF-IDF compatibility mode plus semantic session/topic/fact memory. |
| Retrieval | Sentence-transformer embeddings, FAISS with a NumPy fallback, cosine similarity with temporal decay. |
| Graph memory | Per-session `networkx` knowledge graph with simple triple extraction and traversal. |
| Identity governance | Embedding-based persona fingerprint with drift checks and refinement hooks. |
| Backends | JSON, SQLite, Redis, and Postgres backend abstractions. |
| Privacy | Optional encrypted memory blobs and `/memory/delete`. |
| Audit controls | List, search, pin, archive, update, and delete individual memory entries. |
| Auth/RBAC | Optional API-key auth with `user` and `admin` roles. |
| API | FastAPI service with CORS, OpenAPI docs, metrics, health checks, API-key/RBAC option. |
| CLI | `smriti`, `smriti-cli`, `smriti-api`, migration commands, and backward-compatible `mempalace`. |
| Docker | CPU, GPU, demo, training Dockerfiles and compose profiles. |
| Monitoring | Prometheus endpoint and Grafana dashboard assets. |
| Benchmarks | Gemma 4 public benchmark policy, cross-model harness, LoCoMo-style runner, identity bench. |
| Provider adapters | Local HF, HF Endpoint, Ollama, vLLM, and OpenAI-compatible generation adapters. |
| Memory standard | Portable memory protocol plus backend conformance runner and schema migrations. |
| Research evidence | Curated historical/current benchmark lineage without shipping noisy raw logs. |
| Tests/CI | Unit/integration tests, package build/install checks, audit report, GitHub Actions. |

## What Smriti AI Is

Smriti AI is an inference-time memory runtime. It sits between the user and the model.

For each turn, it can:

1. Read the current user message.
2. Retrieve relevant memories scoped by `session_id` and `topic_id`.
3. Query related graph triples.
4. Build an augmented prompt.
5. Generate with a frozen base model.
6. Check persona/identity drift.
7. Run a refinement pass if needed.
8. Extract new facts and triples.
9. Persist updated memory.
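
The nine steps above can be sketched in plain Python. This is an illustrative stand-in, not the actual `SmritiAILite` implementation: the memory store, generator, and drift check are all stubbed with simple structures and callables so the control flow is visible.

```python
def chat_turn(message, memory, generate, drift_score, threshold=0.35):
    """Illustrative Smriti-style turn loop with stubbed components."""
    # Steps 1-3: retrieve scoped memories (stubbed as a keyword lookup).
    recalled = [
        fact for fact in memory["facts"]
        if any(word in fact.lower() for word in message.lower().split())
    ]
    # Step 4: build an augmented prompt from recalled facts.
    prompt = "Known facts:\n" + "\n".join(recalled) + f"\nUser: {message}\nAssistant:"
    # Step 5: generate with the frozen base model (stubbed callable).
    reply = generate(prompt)
    # Steps 6-7: persona drift check, with an optional refinement pass.
    if drift_score(reply) > threshold:
        reply = generate(prompt + "\nStay in persona.\nAssistant:")
    # Steps 8-9: extract and persist new facts (stubbed: store the raw message).
    memory["facts"].append(message)
    return reply

# Demo with stub components standing in for the real model and drift scorer.
memory = {"facts": ["Alex works at Ocean Lab."]}
reply = chat_turn(
    "Where does Alex work?",
    memory,
    generate=lambda prompt: "At Ocean Lab.",
    drift_score=lambda text: 0.0,
)
```

The real runtime replaces the keyword lookup with embedding retrieval and graph queries, but the read-retrieve-generate-check-persist shape is the same.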

It is designed for:

| User Type | Why They Use It |
|---|---|
| Enterprise AI product teams | Add auditable memory to existing AI services without retraining models. |
| Personal assistant developers | Give local assistants user-specific recall across sessions. |
| Research groups | Evaluate memory augmentation, retrieval modes, and continual-learning boundaries. |
| Privacy-sensitive organizations | Keep user memory local or self-hosted with encryption and deletion hooks. |
| Multi-agent builders | Let planner, summarizer, and executor agents share one user's isolated memory. |

## What Smriti AI Is Not

Smriti AI is not a hosted model provider, a replacement foundation model, or a fine-tuning method for inference-time recall. It does not magically improve every task. It improves tasks where persistent user facts, context continuity, retrieval, or persona stability matter.

The separate `src/training/` package exists for replay/EWC research experiments. That training code is intentionally separate from the inference-time Smriti AI memory runtime.

## Key Features

| Feature | Description |
|---|---|
| Semantic retrieval | Uses embeddings to retrieve meaning-similar facts even when wording changes. |
| Hierarchical memory | Stores memory as `sessions -> topics -> facts`, giving multi-user and multi-task isolation. |
| Temporal decay | Scores memories by cosine similarity multiplied by `exp(-lambda * age)`. |
| TF-IDF mode | Keeps a lightweight lexical retrieval mode for compatibility and low-dependency environments. |
| Knowledge graph | Extracts simple subject-relation-object triples and injects related facts. |
| Identity fingerprint | Averages persona/self-description embeddings and detects drift in model outputs. |
| Memory compression | Summarizes older topic entries and archives originals before eviction. |
| Durable backends | JSON, SQLite, Redis, and Postgres backends through a common interface. |
| Encryption hooks | Optional symmetric encryption for memory blobs with `SMRITI_MEMORY_KEY`. |
| Deletion support | `/memory/delete` and CLI delete commands for user memory removal. |
| Audit dashboard | Authenticated memory table for search, edit, pin, archive, and per-entry deletion. |
| Provider adapters | Swap local Hugging Face for HF Endpoints, Ollama, vLLM, or OpenAI-compatible APIs. |
| FastAPI service | `/chat`, `/memory/load`, `/memory/save`, `/memory/delete`, `/graph/query`, `/metrics`, `/health`. |
| CLI | Local commands for config, chat, save/load/delete, graph query, server start, and benchmarks. |
| Docker | Compose stacks for API, Redis/Postgres, Prometheus/Grafana, demo, CPU/GPU images. |
| Benchmarks | Gemma 4 memory-retention, retrieval-mode comparison, latency, identity, LoCoMo-style long-memory, historical-protocol rerun. |
| CI/CD | GitHub Actions for tests, style checks, package build/install, Docker, release workflows. |
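
The temporal-decay scoring in the table above is easy to reproduce. A minimal sketch, assuming `age` is measured in the same units that `decay_lambda` is calibrated for (e.g. days); the default lambda here is illustrative, not the library's:

```python
import math

def decayed_score(cosine_similarity, age, decay_lambda=0.05):
    """Rank a memory by similarity discounted by age: cos_sim * exp(-lambda * age)."""
    return cosine_similarity * math.exp(-decay_lambda * age)

# A fresh, moderately similar memory can outrank an old, highly similar one.
fresh = decayed_score(0.70, age=1)    # ~0.666
stale = decayed_score(0.95, age=30)   # ~0.212
```

This is why recent facts win ties against stale ones even when the stale entry matches the query wording more closely.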

## Research Lineage And Principles

Smriti AI was built from scratch around a few durable ideas from memory-augmented small-model systems.

| Principle | Smriti AI Interpretation | Current Implementation |
|---|---|---|
| External memory | Memory should live outside model weights in a portable, inspectable store. | `MemPalaceLite`, `SemanticMemory`, durable backends, JSON export/import. |
| Training-free recall | User recall should improve at inference time without changing model weights. | Retrieved memory is injected into prompts on each call. |
| Identity continuity | Assistants should maintain persona and user-specific context across turns. | `IdentityFingerprint` detects embedding drift and can trigger refinement. |
| Small-model augmentation | Small models become more useful when paired with explicit state. | Works with Gemma 4 and other HuggingFace causal LMs. |
| Local-first privacy | Memory should be deployable on a user's own machine or infrastructure. | JSON/SQLite local stores, optional encryption, deletion endpoint. |
| MLOps reproducibility | Memory systems should be benchmarked, tested, packaged, monitored, and deployable. | CI, Docker, benchmark CSVs, reports, model card, monitoring stack. |

Historical numbers from earlier writeups are treated as research lineage. Current claims should use the current Smriti AI benchmark artifacts in this repository.

For deeper lineage and reproducibility:

| Document | Purpose |
|---|---|
| `research/evidence/README.md` | Curated historical/current evidence policy. |
| `research/evidence/benchmark_lineage.csv` | Historical and current result ledger. |
| `research/README.md` | Curated original notebook/log/excerpt manifest. |
| `docs/memory_format.md` | Portable Smriti memory format and backend contract. |
| `docs/memory_spec.md` | Stable memory protocol for JSON, SQLite, Redis, and Postgres entries. |
| `docs/kaggle_colab.md` | Kaggle/Colab reproducibility guide using package imports. |
| `smriti-ai-kaggle.ipynb` | Current Kaggle kernel notebook, mirrored from `demos/smriti_kaggle.ipynb`. |
| `demos/smriti_kaggle.ipynb` / `demos/smriti_colab.ipynb` | Reproducible package-import notebooks pinned to `smriti-memory-ai==1.0.7`. |

## Architecture

```mermaid
flowchart TD
    U["User / Agent Message"] --> A["SmritiAILite.chat"]
    A --> Q["Session + Topic Scope"]
    Q --> R["Memory Retrieval"]
    R --> S["SemanticMemory or TF-IDF"]
    R --> G["KnowledgeGraphMemory"]
    S --> C["Context Builder"]
    G --> C
    C --> P["Augmented Prompt"]
    P --> M["Frozen HuggingFace Causal LM"]
    M --> I["IdentityFingerprint"]
    I -->|"aligned"| O["Final Response"]
    I -->|"drift"| F["Refinement Pass"]
    F --> O
    O --> E["Fact + Triple Extraction"]
    E --> B["Durable Backend"]
    B --> J["JSON / SQLite / Redis / Postgres"]
```

## Core Modules

| Module | File | Responsibility |
|---|---|---|
| `SmritiAILite` | `src/smriti/agent.py` | Main model wrapper for HuggingFace generation plus memory updates. |
| `BaselineGemma` | `src/smriti/agent.py` | Plain model baseline with no memory layer. |
| `MemPalaceLite` | `src/smriti/core.py` | High-level memory facade and backward-compatible API. |
| `SemanticMemory` | `src/smriti/semantic_memory.py` | Hierarchical embedding memory with FAISS/NumPy retrieval, compression, JSON persistence. |
| `KnowledgeGraphMemory` | `src/smriti/knowledge_graph.py` | Triple extraction, graph storage, query traversal, natural-language rendering. |
| `IdentityFingerprint` | `src/smriti/identity_fingerprint.py` | Persona vectors, drift scoring, adaptive thresholds, refinement prompts. |
| `MACPLite` | `src/smriti/macp.py` | Compact reasoning continuity state. |
| Backends | `src/smriti/backends.py` | JSON, SQLite, Redis, Postgres, encryption, deletion. |
| Backend conformance | `src/smriti/backend_conformance.py` | Reusable compatibility checks for backend authors. |
| Backend migrations | `src/smriti/migrations.py`, `src/smriti/sql/` | Versioned SQLite/Postgres schema files. |
| Audit | `src/smriti/audit.py`, `src/smriti/audit_api.py` | Memory inspection, pin/archive/update/delete control plane. |
| Auth | `src/smriti/auth.py` | API-key authentication and user/admin RBAC checks. |
| Provider adapters | `src/smriti/adapters/` | Local HF, HF Endpoint, Ollama, vLLM, OpenAI-compatible generation. |
| Config | `src/smriti/config.py` | `config.yaml` and environment variable loading. |
| API | `src/smriti/api.py` | FastAPI app, observability, optional API-key auth. |
| CLI | `src/smriti/cli.py` | Local commands for config, memory, API, graph, and benchmark workflows. |
| Integrations | `src/smriti/integrations/` | LangChain and LlamaIndex adapters. |
| Training research | `src/training/ewc_replay.py` | Optional replay/EWC experiments, separate from runtime memory. |

## Installation

### Requirements

| Requirement | Notes |
|---|---|
| Python | 3.10 or newer. |
| OS | Tested locally on macOS; CI validates Linux. Windows installer scripts (`install_smriti.ps1`, `install_smriti.bat`) are included. |
| Model runtime | Optional unless using `SmritiAILite` with a real HuggingFace model. |
| Gemma 4 access | Public benchmark path uses `google/gemma-4-E2B-it`; users may need Hugging Face access/login. |

### Install From GitHub

```bash
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e ".[ml,bench]"
```

For development:

```bash
pip install -e ".[dev,ml,bench]"
```

For all optional integrations:

```bash
pip install -e ".[full]"
```

### Install From PyPI

The public PyPI distribution is `smriti-memory-ai` because the shorter
`smriti-ai` project name is already occupied on PyPI by another owner. The
Python import stays clean and stable:

```python
from smriti import SmritiAILite
```

Recommended install:

```bash
pip install "smriti-memory-ai[ml]==1.0.7"
```

GitHub tag fallback:

```bash
pip install "smriti-memory-ai[ml] @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.7"
```

Then verify:

```bash
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('Smriti import OK')"
```

### One-Shot Installer

Linux/macOS:

```bash
./install_smriti.sh
```

Windows PowerShell:

```powershell
./install_smriti.ps1
```

Windows batch:

```bat
install_smriti.bat
```

The installer creates a local virtual environment, installs package extras, writes `config.yaml`, and can cache Gemma 4.

## Model And API Keys

Smriti AI itself does not require a model-provider API key. It is a memory layer.

| Key | Required? | Purpose |
|---|---:|---|
| `SMRITI_API_KEY` | Optional | Protects Smriti API routes. Clients send `x-api-key`. |
| `SMRITI_MEMORY_KEY` | Optional but recommended | Encrypts memory blobs before writing to disk or backend. |
| `HF_TOKEN` | Sometimes | Needed only if Hugging Face model access requires authentication. |
| Provider keys | Depends | Needed only if users connect Smriti to a hosted provider instead of a local model. |

For Gemma 4 via Hugging Face:

```bash
hf auth login
# or
export HF_TOKEN="your-huggingface-token"
```

For a protected Smriti API:

```bash
export SMRITI_API_KEY="replace-with-service-secret"
curl -H "x-api-key: $SMRITI_API_KEY" http://localhost:8000/health
```

For encrypted memory:

```bash
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
```

Do not commit real keys to Git.

## Quick Start For Customers

### Enterprise AI Teams

Deploy Smriti AI as a memory service behind your API gateway with Redis/Postgres and monitoring enabled:

```bash
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
COMPOSE_PROFILES=redis,monitoring SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
```

Integrate by sending `POST /chat` requests with a stable `user_id`, then scrape `/metrics` with Prometheus and review dashboards in Grafana. Use `admin` keys only for support/audit workflows.

### Indie Developers And Personal Assistants

Install locally and start with JSON or SQLite memory:

```bash
pip install "smriti-memory-ai @ git+https://github.com/Luciferai04/smriti-ai.git@v1.0.7"
smriti-cli config wizard --backend json --overwrite
smriti-cli --session-id alex --topic-id profile chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile chat "What do you remember about me?"
```

Use LangChain/LlamaIndex integrations when you want Smriti memory inside an existing agent framework.

### Researchers And Startups

Clone the repo, run benchmarks, and compare retrieval modes:

```bash
git clone https://github.com/Luciferai04/smriti-ai.git
cd smriti-ai
pip install -e ".[dev,ml,bench]"
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
```

Use `smriti-ai-kaggle.ipynb`, `demos/smriti_kaggle.ipynb`, and `demos/smriti_colab.ipynb` for package-based notebook demos. They demonstrate the frozen Gemma 4 target, semantic/graph/identity memory, SQLite persistence, and delete verification without backing production claims with fake, mocked, or tiny models.

### Privacy-Sensitive Deployments

Keep memory local, encrypted, auditable, and deletable:

```bash
export SMRITI_MEMORY_BACKEND=sqlite
export SMRITI_SQLITE_PATH=data/smriti_memory.sqlite3
export SMRITI_MEMORY_KEY="replace-with-a-long-random-secret"
export AUTH_ENABLED=true
smriti-cli start-server --host 127.0.0.1 --port 8000
```

Expose `/memory/delete` in your user data-deletion flow and run the authenticated audit UI only for trusted operators.

## Quick Start: Python Library

### Memory-Only Usage

This path does not require PyTorch, transformers, or a model download.

```python
from smriti import MemPalaceLite

memory = MemPalaceLite(retrieval_mode="semantic", session_id="alex", topic_id="profile")

memory.add_fact("Alex is a marine biologist in Hawaii.")
context = memory.get_context("What do you remember about Alex?")

print(context)
```

### Full Gemma 4 Usage

This uses a real model and the Smriti AI wrapper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from smriti import SmritiAILite

model_id = "google/gemma-4-E2B-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

agent = SmritiAILite(
    model=model,
    tokenizer=tokenizer,
    retrieval_mode="semantic",
    session_id="alex",
    topic_id="profile",
)

agent.chat("My name is Alex and I am a marine biologist.")
reply = agent.chat("What do you remember about me?")
print(reply)
```

### Save And Load Memory

```python
agent.save_memory("smriti_memory.json")
agent.load_memory("smriti_memory.json")
```

### Direct Semantic Memory

```python
from smriti import SemanticMemory

memory = SemanticMemory()
memory.add_entry("user-a", "profile", "Maya is a doctor at a community clinic.")
results = memory.retrieve("user-a", "profile", "physician medical work", k=1)
print(results[0].entry.text)
```

### Direct Knowledge Graph

```python
from smriti import KnowledgeGraphMemory

graph = KnowledgeGraphMemory()
graph.add_triple("science", "Marie Curie", "discovered", "radium", topic_id="chemistry")
graph.add_triple("science", "radium", "is a", "chemical element", topic_id="chemistry")

facts = graph.triples_to_text(graph.query_graph("science", "Marie Curie", depth=2, topic_id="chemistry"))
print(facts)
```

## Quick Start: CLI

Smriti AI installs these commands:

| Command | Purpose |
|---|---|
| `smriti-cli` | Main CLI. |
| `smriti` | Short alias. |
| `smriti-api` | Run the FastAPI service. |
| `mempalace` | Backward-compatible alias. |

Create config:

```bash
smriti-cli init config.yaml
```

Interactive backend wizard:

```bash
smriti-cli config wizard --backend json --overwrite
```

Store and retrieve local memory:

```bash
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "My name is Alex and I work at Ocean Lab."
smriti-cli --session-id alex --topic-id profile --retrieval-mode tfidf chat "Where do I work?"
```

Save, load, delete:

```bash
smriti-cli --session-id alex memory save smriti_memory.json
smriti-cli --session-id alex memory load smriti_memory.json
smriti-cli --session-id alex memory delete --path smriti_memory.json
```

Run backend compatibility checks:

```bash
smriti-cli --backend json --backend-path data/memory backend conformance
smriti-cli --backend sqlite --backend-path data/smriti_memory.sqlite3 backend conformance
```

Migrate one user's memory between backends:

```bash
smriti-cli --session-id alex migrate-backend \
  --from-backend json --from-path data/memory \
  --to-backend sqlite --to-path data/smriti_memory.sqlite3
```

Query graph memory:

```bash
smriti-cli --session-id alex --topic-id profile graph_query user --depth 1
```

Run benchmarks:

```bash
# Installed package smoke check; internal-only, not public benchmark evidence.
SMRITI_ALLOW_TEST_DOUBLES=1 smriti-cli benchmark --quick

# Full Gemma 4 benchmark; run from a cloned source checkout.
python benchmarks/run_benchmarks.py --model-preset gemma4 --max-new-tokens 16
```

## Quick Start: FastAPI Service

Start locally:

```bash
smriti-cli start-server --host 0.0.0.0 --port 8000
# or
python -m smriti.api --host 0.0.0.0 --port 8000
```

Health check:

```bash
curl http://localhost:8000/health
```

OpenAPI docs:

```text
http://localhost:8000/docs
```

Memory/chat request:

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alex",
    "topic_id": "profile",
    "message": "My name is Alex and I am a marine biologist.",
    "retrieval_mode": "semantic"
  }'
```

Recall request:

```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "alex",
    "topic_id": "profile",
    "message": "What do you remember about me?",
    "retrieval_mode": "semantic"
  }'
```

### Important API Runtime Note

The default FastAPI service can operate as a pure memory service: if no model agent factory is configured, `/chat` returns memory-aware context and updates memory. To generate full model-backed assistant responses through the API, deploy the API process with a model runtime that registers an agent factory via `set_agent_factory`, or wrap Smriti inside your own service with `SmritiAILite`.

This separation keeps the memory service lightweight for enterprise integration while still supporting full local model-backed usage in Python.
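
The agent-factory wiring can be illustrated with local stand-ins. The `set_agent_factory` name mirrors the documented hook, but the function bodies below are hypothetical stubs showing the dependency-injection pattern, not the real `smriti.api` internals:

```python
_agent_factory = None

def set_agent_factory(factory):
    """Register a callable that builds a model-backed agent (mirrors the documented hook)."""
    global _agent_factory
    _agent_factory = factory

def handle_chat(user_id, message, get_context):
    """Return memory-only context until an agent factory has been registered."""
    context = get_context(user_id, message)
    if _agent_factory is None:
        # No model runtime configured: behave as a lightweight memory service.
        return {"mode": "memory-only", "context": context}
    agent = _agent_factory(user_id)
    return {"mode": "model", "reply": agent.chat(message, context)}

# Before registration: /chat answers with memory-aware context only.
get_context = lambda user_id, message: f"facts about {user_id}"
memory_only = handle_chat("alex", "hi", get_context)

# After registration: /chat produces model-backed replies.
class StubAgent:
    def __init__(self, user_id):
        self.user_id = user_id
    def chat(self, message, context):
        return f"reply to {message} using {context}"

set_agent_factory(StubAgent)
model_backed = handle_chat("alex", "hi", get_context)
```

The same process can therefore serve both enterprise memory-only deployments and full local model-backed usage, depending solely on whether a factory is registered at startup.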

## API Endpoints

| Method | Endpoint | Purpose |
|---|---|---|
| `GET` | `/health` | Liveness and loaded-user count. |
| `GET` | `/metrics` | Prometheus metrics. |
| `POST` | `/chat` | Retrieve context, update memory, optionally call configured model agent. |
| `POST` | `/memory/save` | Save one user's memory. |
| `POST` | `/memory/load` | Load memory from request body, backend, or path. |
| `POST` | `/memory/delete` | Delete one user's memory from RAM/backend/path. |
| `POST` | `/memory/list` | Audit/list one user's memory entries. |
| `POST` | `/memory/update` | Edit an individual memory entry. |
| `POST` | `/memory/pin` | Pin/unpin an important memory entry. |
| `POST` | `/memory/archive` | Archive/unarchive an entry. |
| `POST` | `/memory/entry/delete` | Delete one memory entry. |
| `POST` | `/graph/query` | Query session-scoped graph facts. |
| `GET` | `/docs` | FastAPI Swagger UI. |

## Configuration

Smriti AI reads `config.yaml` by default or the file pointed to by `SMRITI_CONFIG_PATH`.

```yaml
memory:
  backend: json
  memory_dir: data/memory
  sqlite_path: data/smriti_memory.sqlite3
  redis_url: redis://localhost:6379/0
  postgres_dsn: ""
  autosave: true

security:
  encryption_key: ""

api:
  host: 0.0.0.0
  port: 8000
  cors_origins:
    - "*"

model:
  adapter: local_hf
  base_model_id: google/gemma-4-E2B-it
  hf_endpoint_url: ""
  ollama_url: http://localhost:11434
  vllm_url: http://localhost:8000
  openai_compatible_url: https://api.openai.com/v1
```

Environment variables override config values:

| Variable | Purpose |
|---|---|
| `SMRITI_CONFIG_PATH` | Path to config file. |
| `SMRITI_MEMORY_BACKEND` | `json`, `sqlite`, `redis`, or `postgres`. |
| `SMRITI_MEMORY_DIR` | JSON memory directory. |
| `SMRITI_SQLITE_PATH` | SQLite database path. |
| `SMRITI_REDIS_URL` | Redis connection URL. |
| `SMRITI_POSTGRES_DSN` | Postgres DSN. |
| `SMRITI_AUTOSAVE` | Save memory after API updates. |
| `SMRITI_MEMORY_KEY` | Encrypt memory blobs. |
| `SMRITI_API_KEY` | Protect API routes. |
| `AUTH_ENABLED` | Enable role-bound API-key auth. |
| `SMRITI_API_KEYS_PATH` | Path to `api_keys.json` with user/admin keys. |
| `SMRITI_CORS_ORIGINS` | Comma-separated CORS allowlist. |
| `SMRITI_HOST` | API host. |
| `SMRITI_PORT` | API port. |
| `SMRITI_MODEL_ADAPTER` | `local_hf`, `hf_endpoint`, `ollama`, `vllm`, or `openai`. |
| `BASE_MODEL_ID` | Base model ID, default `google/gemma-4-E2B-it`. |
| `HF_ENDPOINT_URL` | Hugging Face Inference Endpoint URL when using endpoint mode. |
| `OPENAI_API_KEY` | Provider key only when using OpenAI-compatible adapter. |
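
The override order (environment variable first, then the config file value, then a built-in default) can be sketched as a small resolver. This is an illustrative pattern, not the actual `smriti.config` code:

```python
import os

def resolve_setting(env_var, config, config_key, default=None):
    """Environment variable wins, then the config file value, then the default."""
    value = os.environ.get(env_var)
    if value is not None:
        return value
    return config.get(config_key, default)

config = {"backend": "json"}  # as if parsed from config.yaml

os.environ["SMRITI_MEMORY_BACKEND"] = "sqlite"
backend = resolve_setting("SMRITI_MEMORY_BACKEND", config, "backend", "json")  # env wins

del os.environ["SMRITI_MEMORY_BACKEND"]
fallback = resolve_setting("SMRITI_MEMORY_BACKEND", config, "backend", "json")  # config wins
```

This precedence lets Docker and CI deployments override a checked-in `config.yaml` without editing it.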

## Durable Memory Backends

| Backend | Best For | Notes |
|---|---|---|
| JSON | Local experiments, privacy-first single-machine usage. | Simple files under `data/memory`. |
| SQLite | Local production, desktop apps, edge devices. | Single-file database, no external service. |
| Redis | Low-latency state for deployed agents. | Good for concurrent API use; enable Redis persistence (RDB snapshots or AOF) in production. |
| Postgres | Enterprise durability and operational tooling. | Good fit for audited multi-user deployments. |

Examples:

```bash
SMRITI_MEMORY_BACKEND=json smriti-cli start-server
SMRITI_MEMORY_BACKEND=sqlite SMRITI_SQLITE_PATH=data/smriti.sqlite3 smriti-cli start-server
SMRITI_MEMORY_BACKEND=redis SMRITI_REDIS_URL=redis://localhost:6379/0 smriti-cli start-server
SMRITI_MEMORY_BACKEND=postgres SMRITI_POSTGRES_DSN=postgresql://host:5432/smriti smriti-cli start-server
```

## Docker

### Local CPU API

```bash
docker compose up -d --build api
```

### API With Redis

```bash
COMPOSE_PROFILES=redis SMRITI_MEMORY_BACKEND=redis docker compose up -d --build
```

### API With Postgres

```bash
COMPOSE_PROFILES=postgres SMRITI_MEMORY_BACKEND=postgres docker compose up -d --build
```

### GPU-Capable Image

When Docker has NVIDIA runtime support:

```bash
SMRITI_DOCKERFILE=Dockerfile docker compose up -d --build api
```

### Production Compose

```bash
docker compose -f docker-compose.prod.yml up -d
```

### Monitoring Stack

```bash
COMPOSE_PROFILES=monitoring docker compose up -d --build
```

Then open:

| Service | URL |
|---|---|
| API | `http://localhost:8000` |
| API docs | `http://localhost:8000/docs` |
| Prometheus | `http://localhost:9090` |
| Grafana | `http://localhost:3000` |

Default Grafana credentials in local compose:

```text
admin / smriti
```

## Hugging Face Model-Style Deployment

Smriti AI can also be packaged as a Hugging Face model repository with a custom `handler.py`. This does not make Smriti AI a newly trained foundation model. It packages the memory wrapper, model card, endpoint config, example requests, and upload tooling so Hugging Face Inference Endpoints can serve a memory-augmented base model.

Deployment assets live in:

```text
deploy/huggingface_model/
deploy/huggingface_dataset/
deploy/huggingface_space/
```

Local handler smoke test:

```bash
BASE_MODEL_ID=google/gemma-4-E2B-it \
HF_TOKEN=$HF_TOKEN \
SMRITI_MEMORY_BACKEND=json \
SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
python deploy/huggingface_model/test_handler_local.py
```

Upload to a Hugging Face model repo:

```bash
export HF_TOKEN=...
python deploy/huggingface_model/upload_model_repo.py \
  --repo-id luciferai-devil/smriti-ai \
  --private false
```

Official v1.0 Hugging Face targets:

| Asset | Repo |
|---|---|
| Model wrapper | `luciferai-devil/smriti-ai` |
| Benchmark dataset | `luciferai-devil/smriti-ai-benchmarks` |
| CPU-safe demo Space | `luciferai-devil/smriti-ai-demo` |

Upload sanitized benchmark artifacts:

```bash
python deploy/huggingface_dataset/upload_benchmark_dataset.py \
  --repo-id luciferai-devil/smriti-ai-benchmarks \
  --private false
```

Upload the public demo Space:

```bash
python deploy/huggingface_space/upload_space.py \
  --repo-id luciferai-devil/smriti-ai-demo \
  --private false
```

The Space runs in CPU-safe memory-only mode by default, warns users not to enter PII, and auto-deletes demo memory after inactivity.

Use `BASE_MODEL_ID` for a locally loaded model inside the endpoint, or `HF_ENDPOINT_URL` if the Smriti handler should call another model endpoint. Production endpoints should use external Redis/Postgres memory and must not store private user memory inside the model repository.
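
The choice between the two variables can be sketched as below, assuming the endpoint URL takes precedence when both are set (an illustrative reading of the handler, not its actual code):

```python
import os

def resolve_generation_target() -> str:
    # Assumed precedence: HF_ENDPOINT_URL wins, so the Smriti handler
    # proxies another model endpoint instead of loading a model itself.
    endpoint = os.environ.get("HF_ENDPOINT_URL")
    if endpoint:
        return f"remote:{endpoint}"
    # Otherwise load a base model locally inside the endpoint container.
    return "local:" + os.environ.get("BASE_MODEL_ID", "google/gemma-4-E2B-it")
```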

See `docs/deploy_as_hf_model.md` for the full deployment guide.

## Monitoring And Observability

The API exports Prometheus metrics from `/metrics`.

| Metric | Meaning |
|---|---|
| `smriti_http_requests_total` | Request count by method, path, status. |
| `smriti_http_errors_total` | Server-side error count. |
| `smriti_http_request_latency_seconds` | End-to-end request latency histogram. |
| `smriti_retrieval_latency_seconds` | Memory retrieval latency histogram. |
| `smriti_tokens_total` | Approximate token count observed by the API. |
| `smriti_user_memories` | Number of loaded user memory stores. |
| `smriti_user_memory_bytes` | Approximate serialized memory size by user. |
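
For quick scripting against `/metrics`, the Prometheus text exposition can be filtered for `smriti_*` samples in a few lines. This is a deliberately minimal parser; `prometheus-client`'s parsing utilities are the robust option:

```python
def parse_smriti_metrics(text: str) -> dict:
    # Keep only smriti_* sample lines; "# HELP" / "# TYPE" comments are skipped
    # because they do not start with the metric prefix.
    samples = {}
    for line in text.splitlines():
        if not line.startswith("smriti_"):
            continue
        # Sample lines look like: name{labels} value
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue
    return samples
```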

Observability helper:

```bash
python scripts/metrics_monitor.py --url http://localhost:8000 --output reports/metrics_report.md
```

## Privacy And Security

Smriti AI stores user memory, so privacy is a core operational concern.

| Requirement | Smriti AI Support |
|---|---|
| User isolation | Memory is keyed by `user_id` / `session_id` and `topic_id`. |
| Deletion | `/memory/delete` and `smriti-cli memory delete`. |
| Encryption | Set `SMRITI_MEMORY_KEY` to encrypt backend blobs. |
| API protection | Set `SMRITI_API_KEY` or `AUTH_ENABLED=true` with `api_keys.json`. |
| RBAC | `user` keys can access only their bound `user_id`; `admin` keys can operate across users. |
| Local-first deployment | JSON/SQLite can run fully on-device. |
| Auditability | Memory can be exported, inspected, pinned, archived, edited, and deleted. |

Delete user memory through API:

```bash
curl -X POST http://localhost:8000/memory/delete \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alex"}'
```

Read more in `docs/privacy.md`.

Role-bound API keys:

```bash
cp api_keys.example.json api_keys.json
export AUTH_ENABLED=true
export SMRITI_API_KEYS_PATH=api_keys.json
curl -H "x-api-key: replace-with-user-key" http://localhost:8000/health
```
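
The RBAC rule in the table above (user keys bound to one `user_id`, admin keys unrestricted) can be sketched like this; the key-entry schema here is hypothetical, so check `api_keys.example.json` for the real one:

```python
def authorize(api_keys: dict, key: str, target_user: str) -> bool:
    # Unknown keys are rejected outright.
    entry = api_keys.get(key)
    if entry is None:
        return False
    # Admin keys may operate across users.
    if entry.get("role") == "admin":
        return True
    # User keys only reach their bound user_id (field names are illustrative).
    return entry.get("role") == "user" and entry.get("user_id") == target_user
```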

Read more in `docs/auth.md`.

## Framework Integrations

### LangChain

```python
from smriti.integrations.langchain import SmritiMemory

memory = SmritiMemory(session_id="alex", topic_id="profile")
memory.save_context(
    {"input": "My name is Alex and I work at Ocean Lab."},
    {"output": "Nice to meet you, Alex."},
)
print(memory.load_memory_variables({"input": "Where do I work?"}))
```

### LlamaIndex

```python
from smriti.integrations.llama_index import SmritiStorageContext

storage = SmritiStorageContext(session_id="alex", topic_id="profile")
storage.add_node("Alex is a marine biologist.")
print(storage.query("What does Alex do?"))
```

## Web Demo

The demo app lets users inject facts, ask distractors, view retrieved memories, and delete user memory.

```bash
pip install -e ".[demo]"
uvicorn demo.app:app --port 8080
```

You can also run the packaged module directly:

```bash
python -m demo.app
```

Screenshots from the current dashboard:

![Smriti AI dashboard home](docs/assets/smriti-dashboard-home.png)

![Smriti AI benchmark evidence](docs/assets/smriti-dashboard-benchmarks.png)

See `src/demo/README.md` for details.

### Memory Audit Dashboard

The audit dashboard is a separate authenticated UI for operators and privacy reviews:

```bash
export SMRITI_AUDIT_USER=admin
export SMRITI_AUDIT_PASSWORD="replace-with-a-strong-password"
uvicorn demo.audit_app:app --port 8090
```

Open `http://127.0.0.1:8090` and sign in with the configured credentials. Use it to search, edit, pin, archive, and delete individual memories.

## Model Provider Adapters

Smriti AI can wrap multiple generation providers through a small adapter interface:

```python
from smriti.adapters import build_adapter

adapter = build_adapter("hf_endpoint")
text = adapter.generate("Augmented Smriti prompt", max_new_tokens=128)
```

Supported adapters:

| Adapter | Typical target |
|---|---|
| `local_hf` | Local Transformers Gemma 4. |
| `hf_endpoint` | Hugging Face Inference Endpoint. |
| `ollama` | Local Ollama REST server. |
| `vllm` | vLLM server. |
| `openai` | OpenAI-compatible hosted APIs. |

Read more in `docs/adapters.md`.
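
A custom adapter only needs the small `generate(prompt, max_new_tokens=...)` surface shown above. Here is a toy adapter under that assumption; the package's actual base class may require more:

```python
class UppercaseAdapter:
    """Toy adapter that upper-cases the prompt instead of calling a model."""

    def generate(self, prompt: str, max_new_tokens: int = 128) -> str:
        # A real adapter would call its provider here; we just transform
        # the prompt and apply a rough 4-characters-per-token cap.
        return prompt.upper()[: max_new_tokens * 4]
```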

## Benchmarks

### Benchmark Policy

Public benchmark claims in this repository use real Gemma 4 only:

```text
google/gemma-4-E2B-it
```

Deterministic test-double paths may exist for engineering tests, but they are not public model-quality claims.

### Current Local Gemma 4 Results

These are current local CPU measurements from the checked-in CSV artifacts.
Benchmark-readiness audit status: `benchmark_staging_only`.
These results are sound validation evidence for the memory harness, but they are
not yet industry-grade public production benchmark claims. The current artifacts
use 21 explicit recall checks, 9 holdout query events, one local real-model run
bundle, and deterministic cross-model harness smoke checks. Use the exact
wording below until larger repeated real-model benchmarks are available.

| Evaluation | Baseline Recall | Best Smriti AI Recall | Absolute Lift | Notes |
|---|---:|---:|---:|---|
| Gemma-style three-fact protocol | 0/3 | 3/3 | +3 facts | Baseline 5.71s, Semantic+Graph+Identity 4.99s avg CPU latency. |
| Five-mode comparison (`max_new_tokens=16`) | 0/3 | 3/3 | +3 facts | Fastest successful memory mode: Semantic+Graph at 2.78s avg CPU latency. |
| Original broader protocol rerun (`max_new_tokens=256`) | 0/3 | 3/3 | +3 facts | Overall average improved from 0.524 to 0.832 (`+58.9%`). |

Five-mode comparison:

| Configuration | Recall | Avg Latency | Context Coherence | Notes |
|---|---:|---:|---:|---|
| Baseline | 0/3 | 4.927s | 0.000 | Frozen Gemma 4, no memory layer. |
| TF-IDF | 3/3 | 3.481s | 0.667 | Lexical memory mode. |
| Semantic | 3/3 | 2.857s | 0.333 | Embedding-based memory mode. |
| Semantic + Graph | 3/3 | 2.781s | 0.667 | Fastest successful memory mode in this CPU run. |
| Semantic + Graph + Identity | 3/3 | 5.164s | 0.000 | Adds persona governance overhead. |

Original broader protocol rerun:

| Metric | Baseline | Smriti AI | Delta |
|---|---:|---:|---:|
| Memory retention | 0.000 | 1.000 | +inf% |
| Response consistency | 0.571 | 0.496 | -13.2% |
| Context coherence | 1.000 | 1.000 | +0.0% |
| Overall average | 0.524 | 0.832 | +58.9% |

The older `+31.2%` overall number from earlier writeups is retained as historical lineage. The current comparable broader-protocol rerun yields `+58.9%` under this local Gemma 4 CPU setup with `max_new_tokens=256`.
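
One plausible reading of the `0/3` vs `3/3` recall numbers in the tables above is fact-substring counting. The real scorer lives under `benchmarks/`, so treat this as an illustration only:

```python
def recall_score(expected_facts, response: str) -> str:
    # Count expected facts that appear verbatim (case-insensitive)
    # in the model response, reported in the "hits/total" style above.
    hits = sum(1 for fact in expected_facts if fact.lower() in response.lower())
    return f"{hits}/{len(expected_facts)}"
```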

Safe external wording:

```text
Current local Gemma 4 validation artifacts show 3/3 recall on the checked-in
memory-retention protocol. Larger repeated real-model and holdout benchmarks
are required before claiming industry-standard benchmark superiority.
```

Run the benchmark-readiness audit:

```bash
make benchmark-readiness
```

Plan the enterprise industry benchmark without running real models:

```bash
make benchmark-industry
make benchmark-industry-hf
```

See `docs/industry_benchmarks.md` for the full industry gate, required model
matrix, provenance schema, and safe public-claim rules.

### Run Benchmarks

Install benchmark and ML extras:

```bash
pip install -e ".[ml,bench]"
```

Gemma-style memory retention:

```bash
python benchmarks/run_gemma_eval.py
```

Five-configuration comparison:

```bash
python benchmarks/run_benchmarks.py \
  --model-preset gemma4 \
  --configurations tfidf semantic semantic_graph semantic_graph_identity \
  --devices auto \
  --max-new-tokens 16 \
  --output benchmarks/results_comparison.csv
```

Original broader protocol rerun:

```bash
python benchmarks/run_historical_protocol.py --max-new-tokens 256
```

Cross-model harness:

```bash
python benchmarks/run_benchmarks.py \
  --model-preset cross_model \
  --output benchmarks/cross_model_results.csv \
  --summary-output benchmarks/summary.md
```

Long-memory / LoCoMo-style runner:

```bash
python benchmarks/run_longmem.py --dataset-path path/to/locomo.json --retrieval-mode semantic
```

Identity drift benchmark:

```bash
python benchmarks/run_identity_bench.py --output reports/identity_evaluation.csv
```

Aggregate summaries:

```bash
python benchmarks/summarize_results.py
```

### Benchmark Evidence Files

| File | Purpose |
|---|---|
| `benchmarks/results_gemma_eval.csv` | Gemma 4 baseline vs Smriti three-fact evaluation. |
| `benchmarks/results_comparison.csv` | Baseline, TF-IDF, semantic, semantic+graph, semantic+graph+identity. |
| `benchmarks/results_historical_protocol.csv` | Current rerun of the older broader protocol. |
| `benchmarks/results_historical_protocol_responses.json` | Response audit trail for the broader-protocol rerun. |
| `benchmarks/cross_model_results.csv` | Optional cross-model memory-retention comparison. |
| `benchmarks/longmem_results.csv` | Optional LoCoMo-style long-memory output. |
| `benchmarks/latency_gemma4.csv` | Dedicated Gemma 4 latency/token probe. |
| `reports/identity_evaluation.csv` | Persona drift detection benchmark. |
| `results/summary.md` | Human-readable aggregate summary. |
| `benchmarks/README.md` | Generated benchmark table. |
| `model_card_smriti.md` | Model card and result disclaimer. |
| `research/evidence/benchmark_lineage.csv` | Historical/current result ledger and claim-status labels. |

## Testing

Run the full test suite:

```bash
pytest -q
```

Run the production hardening matrices:

```bash
make test              # unit + deterministic test-double integration
make test-security     # prompt injection, redaction, auth/RBAC, delete/encryption
make test-benchmarks   # deterministic benchmark artifacts and budgets
make production-gates  # manifest, regression, privacy, and gate report checks
make end-user-readiness # first-run install/docs/CLI/deployment readiness checks
```

These PR-safe tests use deterministic test-double paths. Gemma 4 and other
real-model benchmarks are reserved for nightly/manual runs so ordinary
contributors do not need to download large gated checkpoints.

Run with coverage:

```bash
pytest --cov=smriti --cov-report=term-missing --cov-report=html:reports/coverage/html
```

Run style checks:

```bash
ruff check benchmarks scripts src tests
```

Build and install the wheel locally:

```bash
python -m build
python -m venv .venv-wheel
source .venv-wheel/bin/activate
pip install dist/smriti_ai-*.whl
python -c "from smriti import SmritiAILite, SemanticMemory, KnowledgeGraphMemory; print('wheel OK')"
```

## Smoke Tests

Local API smoke test:

```bash
bash scripts/smoke_test.sh
```

Latency probe:

```bash
python scripts/measure_latency.py --retrieval-modes tfidf semantic --output benchmarks/latency_results.csv
```

Load test helper:

```bash
python scripts/load_test_runner.py --users 10 --spawn-rate 10 --run-time 30s --backend json
```

See `docs/load_testing.md` for the 10/100/1000-user matrix and report files.

Fault-tolerance probe:

```bash
python scripts/fault_tolerance_tests.py --url http://localhost:8000
```

## Agentic Harness Evolution

Smriti AI now includes an AHE-inspired loop for improving the inference-time
memory harness while keeping Gemma 4 or any other base model frozen.

| Layer | File | Purpose |
|---|---|---|
| Harness config | `configs/harness_params.yaml` | Editable retrieval, graph, compression, and identity-governance parameters. |
| Evidence collection | `benchmarks/collect_evidence.py` | Runs memory-retention or JSON/JSONL long-memory tasks and writes summary/log evidence. |
| Evolution decision | `evolve_harness.py` | Applies bounded heuristics and appends a predicted-impact manifest entry. |
| Closed loop | `run_evolution.py` | Re-evaluates proposed configs, reverts regressions, and can tag Git iterations. |
| Audit trail | `manifests/evolve_manifest.jsonl` | JSONL history of component changed, previous/new values, reason, prediction, observed effect, and config snapshots. |
| Harness registry | `harnesses/` | Versioned seed/evolved harness artifacts with metadata, results, and production status. |
| Manifest verifier | `src/smriti/manifest_verifier.py` | Validates that each accepted/rejected change has before/after evidence. |
| Production gates | `src/smriti/production_gates.py` | Runs tests, backend/privacy checks, validation, holdout, cross-model, latency, token, and identity gates before promotion. |
| Canary routing | `src/smriti/canary.py` | Sticky per-user canary routing for evolved harnesses with rollback conditions. |
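
Because the manifest is append-only JSONL, auditing it from Python is one `json.loads` per line. The field names used in the usage example are hypothetical; see `manifests/evolve_manifest.jsonl` for the real schema:

```python
import json

def read_manifest(path: str) -> list:
    # Each non-empty line is one JSON object describing a single
    # harness change (component, old/new value, reason, prediction, ...).
    entries = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries
```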

Quick local loop:

```bash
python benchmarks/collect_evidence.py \
  --config configs/harness_params.yaml \
  --summary benchmarks/evidence_summary.json

python evolve_harness.py \
  --config configs/harness_params.yaml \
  --evidence benchmarks/evidence_summary.json

python run_evolution.py --iterations 5 --no-commit
```

Validation and release-gate loop:

```bash
python benchmarks/validate_harness_evolution.py \
  --seed-config harnesses/seed/harness_params.yaml \
  --evolved-config harnesses/evolved-v1/harness_params.yaml

python benchmarks/run_holdout_eval.py \
  --config harnesses/evolved-v1/harness_params.yaml

python benchmarks/run_cross_model_harness_eval.py \
  --seed-config harnesses/seed/harness_params.yaml \
  --evolved-config harnesses/evolved-v1/harness_params.yaml

python harness/verify_manifest.py
python harness/production_gates.py evolved-v1 --to candidate
```

Harness registry CLI:

```bash
smriti-cli harness list
smriti-cli harness show evolved-v1
smriti-cli harness compare seed evolved-v1
smriti-cli harness activate evolved-v1
smriti-cli harness rollback seed
smriti-cli harness verify-manifest
smriti-cli harness promote evolved-v1 --to production
smriti-cli harness regression-test
```

API/dashboard support:

| Endpoint | Purpose |
|---|---|
| `GET /harness/current` | Show active harness parameters and registry entries. |
| `GET /harness/history` | Return manifest history. |
| `GET /harness/metrics` | Return validation and canary metrics. |
| `POST /harness/rollback` | Roll back to a registry harness. Admin-only when auth is enabled. |
| `POST /harness/evaluate` | Run seed-vs-evolved validation. Admin-only when auth is enabled. |
| `GET /harness/canary/status` | Show active/canary routing status. |
| `POST /harness/canary/start` | Start sticky canary routing. Admin-only when auth is enabled. |
| `POST /harness/canary/stop` | Stop canary routing. |
| `POST /harness/canary/promote` | Promote canary harness. |
| `POST /harness/canary/rollback` | Roll back canary harness. |

The web dashboard also exposes a harness cockpit with current parameters,
recent manifest entries, manual overrides, rollback controls, quick evaluation,
seed comparison, and report export. See `docs/research_lineage.md` for the
research rationale and AHE mapping.

Generated harness artifacts:

| Artifact | Purpose |
|---|---|
| `results/harness_evolution_validation.md` | Baseline vs seed vs evolved harness validation. |
| `results/evolution_generalization_report.md` | Final holdout evaluation. |
| `results/cross_model_harness_eval.md` | Cross-model deterministic harness validation. |
| `results/manifest_verification.md` | Manifest integrity report. |
| `results/production_gate_report.md` | Promotion gate verdict. |
| `results/canary_report.md` | Canary routing status and metrics. |
| `reports/evolution_report.md` | Stakeholder-readable evolution report. |

## GPU And CPU Behavior

Smriti AI is designed to fall back cleanly to CPU.

| Component | CPU | GPU |
|---|---|---|
| Base model | Works, slower for Gemma 4. | Moves model to CUDA when available. |
| Generation dtype | `float32` on CPU. | `bfloat16` if supported, else `float16`. |
| Embeddings | Sentence-transformers can run on CPU. | Embedding model can move to CUDA. |
| FAISS | Uses CPU by default. | Attempts GPU indices when CUDA FAISS support is available. |

For practical demos with Gemma 4, GPU is recommended. CPU is acceptable for reproducibility but slower.
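
The dtype fallback in the table reduces to a small decision function. In practice the two flags would come from `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`, but the selection logic itself is framework-free:

```python
def pick_dtype(cuda_available: bool, bf16_supported: bool) -> str:
    # Mirrors the table: float32 on CPU, bfloat16 on capable GPUs,
    # float16 on GPUs without bfloat16 support.
    if not cuda_available:
        return "float32"
    return "bfloat16" if bf16_supported else "float16"
```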

## Training Research Package

Runtime Smriti AI is training-free. The training package is separate and optional.

```bash
pip install -e ".[training]"
python -m training.ewc_replay --model google/gemma-4-E2B-it --dataset path/to/data.jsonl --dry-run
```

The training module includes replay/EWC experiment scaffolding and logs metrics under `training/`. It is not imported by `smriti` during inference.

## CI/CD

| Workflow | Trigger | Purpose |
|---|---|---|
| `.github/workflows/ci.yml` | Push / PR | Install, lint, test, compile, build, install wheel, audit, upload artifacts. |
| `.github/workflows/test_agent_hardening.yml` | Push / PR / manual | Unit, test-double integration, OWASP-style security, benchmark smoke, production gates, optional backend jobs. |
| `.github/workflows/nightly_benchmarks.yml` | Nightly/manual | Real-model Gemma-style, retrieval, holdout, and identity benchmarks. |
| `.github/workflows/harness_production_gate.yml` | Harness/API/benchmark changes | Verify manifest and run production gates for evolved harnesses. |
| `.github/workflows/benchmark.yml` | Nightly/manual | Run benchmark suite on a small configured setup. |
| `.github/workflows/load-test.yml` | Push/nightly/manual | Run a 10-user API load smoke test and upload reports. |
| `.github/workflows/docker.yml` | Tags/manual | Build API/demo/training Docker images. |
| `.github/workflows/release.yml` | Tag push | Build package and publish release artifacts. |

The latest local push for the harness-production work passed the CI, Harness
Production Gate, and Load Test workflows.

## Repository Layout

```text
smriti-ai/
|-- src/smriti/                  # Runtime memory package
|-- src/training/                # Optional replay/EWC research code
|-- src/demo/                    # Small web demo
|-- demos/                       # Kaggle/Colab package-import notebooks
|-- benchmarks/                  # Gemma 4 evaluations and CSV results
|-- configs/                     # Harness and runtime parameter files
|-- harness/                     # Manifest verification and production-gate wrappers
|-- harnesses/                   # Versioned seed/evolved harness registry
|-- tests/                       # Unit and integration tests
|-- scripts/                     # Setup, smoke, latency, load, fault probes
|-- docs/                        # Privacy and API documentation
|-- manifests/                   # AHE JSONL evolution audit trail
|-- research/artifacts/          # Curated original notebooks, logs, and excerpts
|-- research/evidence/           # Curated benchmark lineage and evidence policy
|-- monitoring/                  # Prometheus/Grafana assets
|-- support/                     # Troubleshooting and sample configs
|-- notebooks/                   # Package-based demo notebook
|-- reports/                     # Readiness, coverage, metrics reports
|-- Dockerfile                   # GPU-capable API image
|-- Dockerfile.cpu               # Lightweight CPU API image
|-- Dockerfile.demo              # Demo image
|-- Dockerfile.training          # Training/research image
|-- docker-compose.yml           # Local API/backends/monitoring stack
|-- docker-compose.prod.yml      # Production-oriented compose stack
|-- pyproject.toml               # Package metadata and extras
|-- config.yaml                  # Local config template
|-- evolve_harness.py            # One-step harness evolution proposal script
|-- run_evolution.py             # Closed-loop evidence/evolve/verify driver
|-- ROADMAP.md                   # Post-v1 roadmap including AHE hardening
|-- model_card_smriti.md         # Model card and benchmark disclosure
```

## Production Readiness Notes

| Area | Recommendation |
|---|---|
| Auth | Set `SMRITI_API_KEY` or place API behind an authenticated gateway. |
| RBAC | Use `AUTH_ENABLED=true` and role-bound keys for production endpoints. |
| Secrets | Use environment variables or a secret manager, not committed config files. |
| Storage | Use SQLite for local apps, Redis/Postgres for server deployments. |
| Encryption | Set `SMRITI_MEMORY_KEY` for sensitive memory. |
| Backups | Back up JSON/SQLite/Postgres memory stores according to your RPO/RTO. |
| Observability | Scrape `/metrics` and use Grafana dashboard panels. |
| Load testing | Run Locust/wrk-style load tests before enterprise rollout. |
| Deletion | Wire `/memory/delete` into user data deletion workflows. |
| Audit | Protect `demo.audit_app` and audit endpoints before exposing memory inspection. |
| Benchmarking | Rerun Gemma 4 benchmarks on your hardware before publishing claims. |

See `reports/production_readiness.md` for the latest QA snapshot.

## Troubleshooting

| Problem | Likely Cause | Fix |
|---|---|---|
| `ModuleNotFoundError: smriti` | Package not installed in current environment. | Run `pip install -e .` or activate the correct venv. |
| `ModuleNotFoundError: transformers` | ML extras not installed. | Run `pip install -e ".[ml]"`. |
| Gemma 4 fails to load | Missing Hugging Face access or incompatible Transformers stack. | Run `hf auth login` and update ML dependencies. |
| API returns memory-only text | No model agent factory is registered. | Use `SmritiAILite` in Python or deploy API with an agent factory. |
| API returns `401` | Auth is enabled (`SMRITI_API_KEY` or key-file auth) but no valid key was sent. | Send a valid `x-api-key: <key>` header. |
| Memory not persisted | Autosave disabled or backend not configured. | Set `SMRITI_AUTOSAVE=1` and configure backend. |
| Encrypted memory cannot load | Missing or wrong `SMRITI_MEMORY_KEY`. | Set the same key used when saving. |
| Docker image is large | ML dependencies and model runtimes are heavy. | Use `Dockerfile.cpu` for API-only memory service. |
| Benchmarks are slow | Gemma 4 on CPU is heavy. | Use GPU or reduce `max_new_tokens` for local checks. |

## Roadmap

| Release | Theme | Planned Work |
|---|---|---|
| v1.1 | Memory quality | Hot/cold memory tiers, stronger compression, multilingual embeddings, cross-lingual recall tests, configurable decay/top-K/summarization thresholds. |
| v1.2 | Scalability | Async backend paths, `asyncpg` Postgres option, batched writes, embedding cache, 100/500/1000-user load reports. |
| v1.3 | Research | LongMemEval/MemoryBench tracking, SmritiBench design, temporal/weighted graph memory, cross-agent shared memory with strict isolation. |
| Ongoing | Community | Good-first issues, backend/adapter contribution guides, benchmark reproducibility reports, pilot-user feedback loops. |

See `ROADMAP.md` for the living post-v1 roadmap.

## Contributing

See `CONTRIBUTING.md` for development setup and contribution guidance.

Recommended local loop:

```bash
pip install -e ".[dev,bench]"
ruff check src benchmarks scripts tests
pytest -q
python -m build
```

Release history and stakeholder-facing notes live in `CHANGELOG.md` and `RELEASE_NOTES_v1.0.7.md`. A tutorial draft for the v1 memory protocol, audit UI, and benchmark evidence lives in `docs/blog/smriti-ai-v1-memory-layer.md`.

## License

Apache-2.0. See `pyproject.toml` for package metadata.

<!-- HARNESS_EVOLUTION_RESULTS_START -->
## Harness Evolution Results

The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.

| System | Recall | Precision@K | p95 latency ms | Token overhead | Privacy delete |
|---|---:|---:|---:|---:|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |

Cross-model harness validation:

| Model | Seed recall | Evolved recall | Gate |
|---|---:|---:|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |

Production gate report: `results/production_gate_report.md`
 
Deterministic test doubles are used only for CI stability and never counted as public benchmark evidence.
By - Soumyajit Ghosh
<!-- HARNESS_EVOLUTION_RESULTS_END -->
