Metadata-Version: 2.4
Name: memory-layer-yadu9989
Version: 0.3.1
Summary: Local-first context engineering and style adaptation for LLM-assisted coding
Project-URL: Homepage, https://github.com/yadu9989/memory-layer
Project-URL: Issues, https://github.com/yadu9989/memory-layer/issues
Project-URL: Changelog, https://github.com/yadu9989/memory-layer/blob/main/CHANGELOG.md
License: MIT License
        
        Copyright (c) 2026 Neeraj Yadav
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: coding-assistant,context,llm,mcp,memory,ollama
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: apscheduler>=3.10
Requires-Dist: fastapi>=0.115
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.10
Requires-Dist: python-dotenv>=1.0
Requires-Dist: sqlalchemy>=2.0.40
Requires-Dist: sqlite-vec>=0.1.6
Requires-Dist: tiktoken>=0.7
Requires-Dist: uvicorn[standard]>=0.32
Requires-Dist: watchdog>=4.0
Provides-Extra: cpu-embeddings
Requires-Dist: numpy>=1.26; extra == 'cpu-embeddings'
Requires-Dist: sentence-transformers>=3.0; extra == 'cpu-embeddings'
Provides-Extra: dev
Requires-Dist: mypy>=1.11; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Provides-Extra: train
Requires-Dist: accelerate>=0.30; extra == 'train'
Requires-Dist: peft>=0.12; extra == 'train'
Requires-Dist: torch>=2.3; extra == 'train'
Requires-Dist: transformers>=4.45; extra == 'train'
Provides-Extra: unsloth
Requires-Dist: unsloth; extra == 'unsloth'
Description-Content-Type: text/markdown

# memory-layer

A local-first **context engineering and style adaptation** system for LLM-assisted coding.

`memory-layer` runs alongside your codebase, maintains a compact semantic graph of your code, and feeds that compressed representation to any LLM on every prompt. A weekly LoRA fine-tuning job adapts a local base model to your *style* — your formatting conventions, type annotation patterns, and error-handling idioms.

---

## Install

```bash
pipx install memory-layer-yadu9989
```

Requires Python 3.11+ and [Ollama](https://ollama.com).

> **No Python?** Download a pre-built binary for your platform from the [Releases](https://github.com/yadu9989/memory-layer/releases) page (Linux x86_64, macOS arm64, Windows x86_64).

---

## Quick start

```bash
# Interactive setup: detects Ollama, suggests a model based on your GPU,
# writes ~/.memory-layer/config.toml
memory-layer init

# Start the file watcher + REST API (port 8000)
memory-layer api

# In a separate terminal — start the MCP server for editor integration
memory-layer mcp
```

The interactive `init` wizard:
- Detects whether Ollama is running (and tells you how to start it if not)
- Reads your GPU VRAM via `nvidia-smi` and suggests the best model size
- Writes `~/.memory-layer/config.toml` (no `.env` file required)

Skip all prompts with `--yes` for CI / Docker:

```bash
memory-layer init --yes --watch /path/to/project
```

### Verify

```bash
curl http://localhost:8000/health          # → {"status":"ok"}
curl http://localhost:8000/context         # → {"context":"# Memory Layer Context ..."}
memory-layer status                        # active project, model, token-saving %
```
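
The same checks can be scripted, for example as a smoke test in CI. The following is a minimal sketch using only the standard library. It assumes the response shapes shown in the comments above (`{"status":"ok"}` and a `context` key) and the default port 8000; the helper names are illustrative and not part of `memory-layer`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address of `memory-layer api`


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET base_url + path and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_healthy(payload: dict) -> bool:
    """True when /health returned the documented {"status": "ok"} body."""
    return payload.get("status") == "ok"


def has_context(payload: dict) -> bool:
    """True when /context returned a non-empty "context" string."""
    context = payload.get("context")
    return isinstance(context, str) and bool(context.strip())


def smoke_test(base_url: str = BASE_URL) -> bool:
    """Run both checks against a live server (needs `memory-layer api` running)."""
    return is_healthy(fetch_json("/health", base_url)) and has_context(
        fetch_json("/context", base_url)
    )
```

Call `smoke_test()` while `memory-layer api` is running; it raises `urllib.error.URLError` if the server is not reachable.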

---

## What it does

| Layer | Responsibility |
|-------|---------------|
| **Layer 1** — Grounding Feed | Watches your files, extracts AST surfaces via tree-sitter, summarises each module with a local Ollama model |
| **Layer 2** — Memory Engine | Maintains the project state: entity graph, change history, semantic embeddings, developer profile |
| **Layer 3** — Context Delivery | Assembles a token-budgeted context block from the DB and delivers it over MCP (stdio) or HTTP (FastAPI) |
| **Layer 4** — Style Adapter | Runs a weekly LoRA fine-tune on your accepted/corrected completions to adapt the model to your coding style |
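
To make "token-budgeted" concrete, Layer 3's assembly step can be pictured as greedy packing under a token limit. The sketch below is not the project's implementation: the `Section` type, the priority ordering, and the whitespace-based token counter are all assumptions made for illustration (the package depends on `tiktoken`, which would be the more realistic counter).

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    body: str
    priority: int  # hypothetical ordering: lower number = more important


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def assemble_context(sections: list[Section], budget: int) -> str:
    """Add sections in priority order, skipping any that would exceed the budget."""
    parts: list[str] = []
    used = 0
    for section in sorted(sections, key=lambda s: s.priority):
        block = f"## {section.title}\n{section.body}"
        cost = count_tokens(block)
        if used + cost > budget:
            continue  # does not fit in what is left; a smaller section still might
        parts.append(block)
        used += cost
    return "\n\n".join(parts)
```

Skipping an oversized section instead of stopping lets smaller, lower-priority sections still use the remaining budget.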

### What it does *not* do

- Layer 4 does **not** teach the model facts about your codebase. Facts live in Layer 2 and are retrieved on every prompt by Layer 3.
- The system does not claim the model "remembers your codebase through training." Layer 4 learns your *style*, not your code.
- Cloud sync, team sharing, and VS Code extensions are V4 features in a separate repo.

---

## Multi-project support (V3)

V3 manages multiple projects automatically. Each repo gets its own isolated database under `<repo>/.memory-layer/`. A global registry at `~/.memory-layer/registry.db` tracks which repos are active.

```bash
memory-layer projects                        # list all registered repos
memory-layer register /path/to/other/repo   # manual registration
memory-layer unregister <project-id>        # remove from registry
```

When your editor sends an MCP `notifications/roots/list_changed` notification (Cursor, Claude Desktop, Continue.dev), `memory-layer mcp` switches the active project automatically.

---

## Editor integration (MCP)

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  },
  "systemPromptAppend": "Before answering coding questions, call the memory-layer get_context tool to load the current project state. This is required for every new session."
}
```

Claude Desktop auto-surfaces the `memory://current/context` resource to the model. The `systemPromptAppend` is belt-and-suspenders for sessions where the resource is not picked up automatically.

> Run `memory-layer api` in a separate terminal so the MCP server sees live file changes.

### Cursor

Add to `.cursor/mcp.json` in your project root (or `~/.cursor/mcp.json` for global):

```json
{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  }
}
```
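
If `mcp.json` already lists other servers, the entry can be merged in rather than pasted over the file. A minimal sketch, assuming the file (if present) holds a JSON object; `add_memory_layer` is an illustrative helper, not something shipped with `memory-layer`.

```python
import json
from pathlib import Path

MEMORY_LAYER_ENTRY = {"command": "memory-layer", "args": ["mcp"]}


def add_memory_layer(config_path: Path) -> dict:
    """Add the memory-layer server to an MCP config file, keeping existing servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["memory-layer"] = {
        "command": MEMORY_LAYER_ENTRY["command"],
        "args": list(MEMORY_LAYER_ENTRY["args"]),
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_memory_layer(Path(".cursor/mcp.json"))` run from the project root creates the file if it is missing and otherwise leaves other `mcpServers` entries untouched.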

### Continue.dev

In `~/.continue/config.json`:

```json
{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "memory-layer",
          "args": ["mcp"]
        }
      }
    ]
  }
}
```

### Cline (VS Code extension)

In Cline's MCP settings (gear icon → MCP Servers → Add):

```json
{
  "memory-layer": {
    "command": "memory-layer",
    "args": ["mcp"],
    "disabled": false,
    "autoApprove": ["get_context", "semantic_search"]
  }
}
```

### Gemini Code Assist (Standard / Enterprise)

In your workspace `.gemini/settings.json`:

```json
{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  }
}
```

### Verifying auto-context

To confirm the AI is reading the context block automatically — without you calling `get_context` manually:

1. Start `memory-layer api` (watches files, writes DB) and `memory-layer mcp` (serves MCP).
2. Open your project in Claude Desktop or another resource-aware client.
3. Ask: **"What files are in my project?"** — do not mention `get_context` or `@memory-layer`.
4. The AI should answer with actual file names and summaries from your codebase.

**How it works:** On startup, resource-aware clients call `resources/list`; the MCP server returns `memory://current/context` and `memory://current/developer-profile`. The client injects those into the model's context window automatically. When you switch projects (your editor sends `notifications/roots/list_changed`), the server re-registers the new project and emits `notifications/resources/list_changed` so the client refreshes.
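
On the wire, that exchange is plain JSON-RPC 2.0. The sketch below builds the three messages involved so their shapes are visible. The method names and resource URIs come from the description above; the `name` fields in the result are placeholders, since the server's actual display names are not documented here.

```python
def resources_list_request(request_id: int) -> dict:
    """Client -> server: ask which resources are available."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "resources/list"}


def resources_list_result(request_id: int) -> dict:
    """Server -> client: the two resources memory-layer advertises."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "resources": [
                # "name" values are placeholders, not the server's real labels
                {"uri": "memory://current/context", "name": "context"},
                {"uri": "memory://current/developer-profile", "name": "developer-profile"},
            ]
        },
    }


def resources_list_changed() -> dict:
    """Server -> client: notification sent after a project switch (no "id")."""
    return {"jsonrpc": "2.0", "method": "notifications/resources/list_changed"}
```

Notifications carry no `id` because they expect no reply; on receiving one, the client issues a fresh `resources/list` request.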

**If the AI doesn't know your files:**

| Symptom | Fix |
|---------|-----|
| AI has no project knowledge at all | Check `memory-layer status` — has the project been indexed? Run `memory-layer api` first. |
| AI knows files after calling `get_context` but not before | Your client may not auto-surface MCP resources. Add `systemPromptAppend` (Claude Desktop) or equivalent. |
| AI loses context when switching repos | Ensure `roots/list_changed` notifications are enabled in your client settings. |

---

## Configuration

`memory-layer init` writes `~/.memory-layer/config.toml`. You can edit it directly:

```toml
[ollama]
url              = "http://localhost:11434"
summarizer_model = "qwen2.5-coder:7b"   # verify with: ollama list
embedding_model  = "nomic-embed-text"

[watch]
dirs = ["/path/to/your/project"]

[api]
host = "127.0.0.1"   # use 0.0.0.0 to expose on LAN
port = 8000

# [layer4]
# base_model = ""   # HF model ID or local path; leave blank to skip LoRA
```

Environment variables override `config.toml` values. See `.env.example` for the full list.

---

## Process model

Three separate processes, never combined into one:

```
Process A  memory-layer api    watcher + FastAPI on :8000 + discovery service
Process B  memory-layer train  scheduler + LoRA trainer (no network surface)
Process C  memory-layer mcp    MCP stdio server (reads DB read-only)
```

The MCP server reads from the same SQLite file that Process A writes to. WAL mode lets those reads proceed while a write is in progress. Run `memory-layer api` in a separate terminal so the database stays current.
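
The writer/reader split maps onto two ways of opening the same SQLite file. The following is a sketch using the standard `sqlite3` module, not the project's own database layer (which depends on SQLAlchemy); the function names are illustrative.

```python
import sqlite3
from pathlib import Path


def open_writer(db_path: Path) -> sqlite3.Connection:
    """Read-write connection with write-ahead logging enabled (Process A's role)."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def open_reader(db_path: Path) -> sqlite3.Connection:
    """Read-only connection via a SQLite URI (Process C's role)."""
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Any write attempted through the `mode=ro` connection raises `sqlite3.OperationalError`, so a bug in the reader cannot modify the database.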

---

## CLI reference

| Command | Description |
|---------|-------------|
| `memory-layer init [--yes] [--watch DIR]` | Interactive setup wizard |
| `memory-layer api` | Start watcher + REST API (port 8000) |
| `memory-layer mcp` | Start MCP stdio server |
| `memory-layer status [--language LANG]` | Show active project, model, token savings |
| `memory-layer projects` | List registered projects |
| `memory-layer register PATH` | Manually register a project |
| `memory-layer unregister PROJECT_ID` | Remove project from registry |
| `memory-layer migrate` | Apply pending DB migrations |
| `memory-layer reindex [--project ID]` | Force re-embedding of all entities |
| `memory-layer train [--force]` | Run LoRA training once |
| `memory-layer eval` | Evaluate current vs previous adapter |
| `memory-layer report [--days N]` | Print token-saving impact report |
| `memory-layer collect` | Interactive completion logger |

---

## REST API

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Liveness probe (no auth) |
| `GET` | `/context` | Return the current compressed context block |
| `POST` | `/completion` | Log a completion and update developer profile |
| `POST` | `/telemetry` | Record interaction metrics |
| `GET` | `/report` | Aggregated telemetry stats |
| `GET` | `/eval` | Held-out eval results for current vs previous adapter |

Interactive docs: `http://localhost:8000/docs`

---

## MCP tools

| Tool | Description |
|------|-------------|
| `get_context` | Return the context block — inject as system prompt |
| `search_entities` | Fuzzy search active code entities by path or function name |
| `semantic_search` | Vector search over entity summaries (for concept-level queries) |
| `log_completion` | Log a completion outcome for Style Adapter training |
| `mark_reprompt` | Flag that a prior `get_context` response was insufficient; call with the `session_id` to mark it for telemetry analysis |

---

## Migration guide: V2 → V3

V3 is backward-compatible at the DB level. The migration runner (`memory-layer migrate`) handles schema upgrades automatically on startup.

Key changes:

1. **Config location.** V2 used a per-project `.env`. V3 adds `~/.memory-layer/config.toml` as the user-level config. Your `.env` still works (env vars override `config.toml`), but `memory-layer init` generates `config.toml` for new installs.

2. **Per-repo databases.** V3 stores project data in `<repo>/.memory-layer/memory.db` rather than a single `memory.db`. Run `memory-layer migrate` once; the runner adds `project_id` to all tables.

3. **Developer profile moved to global registry.** Your coding-style profile now lives in `~/.memory-layer/registry.db` and is shared across all projects. Existing data is migrated automatically.

4. **Semantic search requires `nomic-embed-text`.** Pull it with `ollama pull nomic-embed-text`, then run `memory-layer reindex` to build embeddings for existing entities.

5. **MCP tool names unchanged.** No editor config changes needed.

---

## Style Adapter (Layer 4)

The Style Adapter is a LoRA fine-tune of your local base model, trained on completions you have accepted or corrected. It learns **output style** — not codebase facts.

**Training gates** (all required before a run is attempted):

| Gate | Default | Rationale |
|------|---------|-----------|
| Minimum total samples | 500 | Below this, LoRA produces noise or memorisation |
| Minimum new since last run | 100 | Avoid retraining on tiny deltas |
| Frequency | Weekly (Sun 02:00 UTC) | Amortises the cost of a 30B model run |

**Promotion gate**: the new adapter is only promoted if its held-out loss improves on the previous adapter by ≥ 0.02. Failed runs are logged with status `failed_eval`; the previous adapter stays active.
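
The sample-count gates and the promotion gate reduce to two comparisons. A minimal sketch using the defaults from the table; the class and function names are illustrative, and the weekly schedule is left out because it belongs to the scheduler, not to this check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gates:
    min_total_samples: int = 500
    min_new_samples: int = 100
    min_loss_improvement: float = 0.02


def should_train(total_samples: int, new_samples: int, gates: Gates = Gates()) -> bool:
    """Both sample-count gates must pass before a run is attempted."""
    return (
        total_samples >= gates.min_total_samples
        and new_samples >= gates.min_new_samples
    )


def should_promote(previous_loss: float, new_loss: float, gates: Gates = Gates()) -> bool:
    """Promote only if held-out loss dropped by at least the threshold."""
    return (previous_loss - new_loss) >= gates.min_loss_improvement
```

With 620 samples in total and 140 new ones, `should_train(620, 140)` passes. A run that lowers held-out loss from 1.00 to 0.99 is not promoted, because the 0.01 improvement is below the 0.02 threshold.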

---

## Development

```bash
git clone https://github.com/yadu9989/memory-layer
cd memory-layer
pip install -e ".[dev]"
pytest
ruff check .
```

Optional extras:

```bash
pip install -e ".[train]"          # LoRA training (torch, transformers, peft)
pip install -e ".[cpu-embeddings]" # sentence-transformers fallback if Ollama unavailable
pip install -e ".[unsloth]"        # faster training via unsloth
```

---

## Requirements

- Python 3.11+
- [Ollama](https://ollama.com) running locally (for file summaries and embeddings)
- SQLite (bundled with Python)
- *(optional, for training)* `pip install -e ".[train]"` — torch, transformers, peft, accelerate

---

## License

MIT
