Metadata-Version: 2.4
Name: mindvault
Version: 0.5.460
Summary: Local-first second brain RAG system — chat with your own data
Author-email: Caleb <caleb@calebmedia.co>
License: Source-Available — see LICENSE
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: ddgs
Requires-Dist: fastapi
Requires-Dist: httpx
Requires-Dist: prompt-toolkit
Requires-Dist: pypdf
Requires-Dist: python-dotenv
Requires-Dist: qdrant-client
Requires-Dist: rich
Requires-Dist: trafilatura
Requires-Dist: uvicorn
Description-Content-Type: text/markdown

<p align="center">
  <img src="icon-text.png" alt="MindVault" width="320"/>
</p>

# MindVault

A local-first second brain that turns your AI conversation exports, Obsidian notes, and documents into a searchable, conversational memory system.

Everything runs on your machine. No data leaves.

---

## What it does

- **Ingests** Claude, ChatGPT, and other AI conversation exports, Obsidian vaults, PDFs, and plain text files
- **Indexes** content into a multi-layer memory system (raw → compressed → structured → linked)
- **Remembers** entities, decisions, and goals extracted from every chat
- **Retrieves** using hybrid scoring — summaries first, raw text only when needed
- **Chats** interactively with six reasoning modes powered by a council of AI voices
- **Searches the web** automatically when your memory doesn't have a confident answer
- **Saves sessions** — resume any previous conversation exactly where you left off

---

## Quick start

### Prerequisites

- Python 3.11+
- [Ollama](https://ollama.com/download) with two models:

```bash
ollama pull nomic-embed-text   # vector search
ollama pull llama3.2           # chat and summarization
ollama serve                   # start Ollama if not running
```

### Install

```bash
git clone https://github.com/calebthecm/MindVault
cd MindVault
python -m venv .venv && source .venv/bin/activate
pip install -e .               # installs deps + registers the mindvault CLI
```

Or without the CLI shortcut:

```bash
pip install -r requirements.txt
```

### First run

```bash
mindvault setup                # or: python -m mindvault setup
```

### Add your data

Drop your AI export folder into the project directory. PDFs and `.txt`/`.md` files go anywhere — point the ingester at them manually.

| Provider | How to export |
|---|---|
| **Claude** | claude.ai → Settings → Export Data (folder starting with `data-`) |
| **ChatGPT** | chatgpt.com → Settings → Data Controls → Export Data |

### Index and chat

```bash
mindvault ingest               # index everything
mindvault chat                 # start talking to your brain
```

---

## Running MindVault

Three equivalent ways to run it — use whichever you prefer:

```bash
# After pip install -e . (recommended)
mindvault
mindvault chat
mindvault ingest

# As a Python module (no install needed)
python -m mindvault
python -m mindvault chat
python -m mindvault ingest

# Legacy script (still works)
python mindvault.py
python mindvault.py chat
python mindvault.py ingest
```

---

## Commands

```
mindvault                           chat (default)
mindvault chat                      interactive REPL
mindvault chat --resume             resume last session
mindvault chat --resume <id>        resume specific session
mindvault ingest                    auto-discover and index all exports
mindvault ingest ./folder/          index a specific folder
mindvault ingest --force            re-index even if already processed
mindvault notes                     regenerate Obsidian notes
mindvault setup                     first-run configuration wizard
mindvault stats                     show index and session statistics
mindvault sessions                  list resumable sessions
mindvault consolidate               merge near-duplicate memories
```

**During a chat session:**

```
Shift+Tab            cycle reasoning mode
/help                show all commands
/web <query>         search the web (DuckDuckGo, no API key needed)
/search <term>       search memory without LLM — shows scored results
/note <text>         quick-capture a note (indexed on next ingest)
/forget <topic>      suppress matching chunks from future retrieval
/mode [name]         show or switch mode (CHAT, PLAN, DECIDE, DEBATE, REFLECT, EXPLORE)
/sources             show which memories were used in the last answer
/remember <fact>     save a fact to this session
/private             toggle private vault inclusion
/resume              interactive session picker
/clear               clear conversation history
/quit, /exit         end session (compresses and saves automatically)
```

---

## Web search

MindVault searches the web automatically when memory confidence is low, or on demand:

```
/web what is the current price of ETH?
/web latest news on local SEO in 2025
```

Uses DuckDuckGo — no API key, no Docker, no setup. Configure in `config.py`:

```python
WEB_SEARCH_AUTO_THRESHOLD = 0.45   # auto-search when best memory score is below this
WEB_SEARCH_MAX_RESULTS    = 5      # results to include in context
```

Set `WEB_SEARCH_AUTO_THRESHOLD = 0` to disable auto-search.

---

## Reasoning modes

MindVault has six modes, cycled with **Shift+Tab** in the prompt bar.

| Mode | What it does |
|---|---|
| 💬 CHAT | Standard RAG — retrieve memories, synthesize an answer |
| 📋 PLAN | Break the task into structured, actionable steps |
| 🗳 DECIDE | Five-voice council votes; tally + majority verdict shown |
| ⚖ DEBATE | FOR vs AGAINST, then a moderated verdict |
| 🔍 REFLECT | Deep synthesis — what does your brain really know about this? |
| 🕸 EXPLORE | Graph traversal — follows memory links to surface surprises |

The council is five internal voices with distinct personalities:

| Voice | Orientation |
|---|---|
| 📊 The Analyst | Evidence-first, skeptical, quantitative |
| 🚀 The Visionary | Big-picture, creative, optimistic |
| 🔧 The Pragmatist | What's actionable right now |
| 😈 The Devil | Challenges every assumption, finds the flaw |
| 📜 The Historian | Patterns across time; what past memory reveals |

---

## How it works

### Memory layers

| Layer | What | Used for |
|---|---|---|
| Raw | Original text chunks | Fallback when summaries aren't confident enough |
| Compressed | LLM-generated summaries per session/document | Primary retrieval context |
| Structured | Extracted entities (persons, projects, decisions, goals) | Entity-boosted retrieval |
| Linked | Relationships between memories via shared entities + wikilinks | Graph traversal in EXPLORE mode |
| Web | Live DuckDuckGo results | Augments memory for current/unknown topics |

### Retrieval scoring

```
score = 0.5 × embedding_similarity
      + 0.2 × entity_overlap
      + 0.2 × recency
      + 0.1 × importance
```

Compressed summaries are searched first. Raw chunks are only fetched when confidence drops below the threshold. EXPLORE mode additionally walks `memory_links` to pull in related neighbors.

### Session lifecycle

1. **During chat:** turns saved live + entities extracted per exchange (background)
2. **On exit:** LLM compresses the session into a 2–4 sentence summary
3. **Summary embedded** and stored in the compressed memory layer
4. **Resume anytime** with `--resume` or `/resume` during chat

---

## Configuration

All settings in `mindvault/config.py`:

| Variable | Default | What it controls |
|---|---|---|
| `LLM_MODEL` | `llama3.2` | Model for summarization, chat, extraction |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Vector search embeddings |
| `CHAT_TOP_K` | `8` | Chunks retrieved per query |
| `COMPRESSED_SCORE_THRESHOLD` | `0.75` | Below this, also fetch raw chunks |
| `WEB_SEARCH_AUTO_THRESHOLD` | `0.45` | Auto web search below this memory score (0 = off) |
| `SUGGEST_FOLLOWUPS` | `True` | Suggest follow-up questions after each answer |
| `WRITE_SESSIONS_TO_VAULT` | `True` | Write session summary notes to Obsidian on exit |
| `CHAT_INCLUDE_PRIVATE` | `False` | Include private vault by default |

---

## Storage

| Path | What |
|---|---|
| `brain.db` | SQLite: ingestion tracking, entities, links, importance scores |
| `.qdrant/` | Qdrant: vector index (raw + compressed collections) |
| `sessions/` | Compressed chat sessions (`.json.gz`) |
| `notes/` | Quick-captured notes via `/note` (indexed on next ingest) |
| `My Brain/` | Obsidian vault — business, projects, general knowledge |
| `Private Brain/` | Obsidian vault — personal content (separate collection) |
| `data-*/` | Export folders (excluded from git) |

---

## Privacy

- All processing is local by default.
- Web search uses DuckDuckGo's anonymous API — no account, no tracking.
- `My Brain` and `Private Brain` are in **separate Qdrant collections** — private content is never implicitly included in responses.
- `.gitignore` excludes all personal data: vaults, exports, sessions, databases.

---

## Requirements

```
qdrant-client       vector database
httpx               HTTP client (LLM + web requests)
python-dotenv       .env file loading
pypdf               PDF ingestion
prompt_toolkit      TUI and interactive input
rich                markdown rendering in terminal
ddgs                web search (no API key)
trafilatura         web page content extraction
```
