Metadata-Version: 2.4
Name: localthink-mcp
Version: 2.1.1
Summary: Local Ollama-backed MCP server — 45 tools, smart buffer, execution filters, persistent scratchpad, settings GUI
Project-URL: Homepage, https://github.com/H3xabah/localthink-mcp
Project-URL: Repository, https://github.com/H3xabah/localthink-mcp
Project-URL: Issues, https://github.com/H3xabah/localthink-mcp/issues
Project-URL: Changelog, https://github.com/H3xabah/localthink-mcp/blob/main/CHANGELOG.md
License: MIT
License-File: LICENSE
Keywords: claude,claude-code,context-compression,llm,mcp,ollama
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Requires-Dist: mcp>=1.0
Description-Content-Type: text/markdown

# localthink-mcp

**Local LLM context compression for Claude Code.**
Offloads large file queries and document processing to Ollama so they never burn Claude's context window.

> v0.1.0 benchmarked at **~30× token savings** on 16 KB file queries.
> v1.1 adds 13 new tools covering every major token-waste pattern.
> v1.2 adds **pre-injection**: `local_improve_prompt` and `local_preplan` run locally *before* Claude sees the task — sharpening prompts and scaffolding plans so Claude executes rather than guesses.
> v2.1 adds **smart buffer**, **execution filters**, **session scratchpad**, **persistent notes**, **response refinement**, and a **disk-backed result cache** — 14 new tools, 45 total.

---

## Quick start

```bash
# 1. Pull a model (once)
ollama pull qwen2.5:14b-instruct-q4_K_M

# 2. Register with Claude Code
claude mcp add localthink -- uvx localthink-mcp

# 3. Verify
claude mcp list   # localthink → Connected
```

---

## Requirements

- [Ollama](https://ollama.ai) installed and running (`ollama serve`)
- [Claude Code](https://claude.ai/code) CLI
- Python 3.10+

---

## All 45 tools

### v0.1.0 — Core compression

| Tool | When to use |
|------|-------------|
| `local_answer(file_path, question)` | Query a large file without loading it into context |
| `local_summarize(text, focus?)` | Compress a large text blob already in context |
| `local_extract(text, query)` | Pull only the cited passages you need from a document |

### v1.1 — New routes

#### File operations
| Tool | What it does |
|------|--------------|
| `local_shrink_file(file_path, focus?)` | Read a file → return compressed *content* (not an answer). Hold the compressed version in context for repeated reference. |
| `local_batch_answer(file_paths, question)` | Answer one question across many files in a single call. No files enter Claude's context. |
| `local_scan_dir(dir_path, pattern, question?, max_files?)` | Walk a directory, summarize or query every matching file. Glob pattern support (`**/*.ts`, `config/*.yaml`). |
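
The glob matching and file cap behind `local_scan_dir` can be pictured with a few lines of standard-library Python. This is a minimal sketch, not the server's code: `collect_files` is a hypothetical helper name, and the sorted ordering is an assumption. The default cap of 20 mirrors the documented `LOCALTHINK_MAX_SCAN_FILES` default.

```python
from pathlib import Path


def collect_files(dir_path: str, pattern: str, max_files: int = 20) -> list[Path]:
    """Return up to max_files regular files under dir_path that match a glob pattern."""
    root = Path(dir_path)
    # Path.glob understands recursive patterns such as "**/*.ts".
    matches = sorted(p for p in root.glob(pattern) if p.is_file())
    return matches[:max_files]
```

Each returned path would then be summarized or queried individually, so only the per-file results reach Claude.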

#### Composition (fewer round-trips)
| Tool | What it does |
|------|--------------|
| `local_pipeline(text, steps)` | Chain `summarize` → `extract` → `answer` in one call. Up to 5 steps. Eliminates back-and-forth for predictable multi-stage workflows. |
| `local_auto(input, question?)` | Meta-tool: detects file path vs text, picks the right op, handles large docs with auto extract-then-answer. Zero decision overhead. |

#### Stateful document chat
| Tool | What it does |
|------|--------------|
| `local_chat(document, message, history?)` | Multi-turn Q&A. Document is compressed on first call and stays with Ollama. Claude holds only conversation history — the original doc never enters Claude's window. |

#### Semantic & structural
| Tool | What it does |
|------|--------------|
| `local_grep_semantic(file_path, meaning, max_results?)` | Find passages matching a *concept*, not a literal string. "Find where rate limiting is enforced" works even if the word "rate" isn't there. |
| `local_outline(text)` | Structural table of contents with line ranges — no content returned. Use before `local_extract` to find the right section. |
| `local_code_surface(file_path)` | Public API skeleton. **Python: pure AST (no Ollama, instant).** Other languages: fast LLM. Typically 5-10% of original size. |
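
To illustrate why the Python path of `local_code_surface` needs no model call, here is a rough sketch of extracting a public API skeleton with the standard `ast` module. The function name `public_surface` and the rule "names without a leading underscore are public" are assumptions made for illustration; the actual tool may select and format symbols differently.

```python
import ast


def public_surface(source: str) -> list[str]:
    """List signatures of top-level public functions, classes, and public methods."""
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines: list[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, func_types) and not node.name.startswith("_"):
            lines.append(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            lines.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, func_types) and not item.name.startswith("_"):
                    lines.append(f"    def {item.name}({ast.unparse(item.args)})")
    return lines
```

Because this is pure parsing, the result is deterministic and returns in milliseconds even for large modules.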

#### Analysis / meta
| Tool | What it does |
|------|--------------|
| `local_classify(text)` | Classify content type + recommend the best tool. Returns JSON. Use for programmatic routing in hooks/scripts. |
| `local_audit(file_path, checklist)` | Checklist-based file audit: PASS / FAIL / PARTIAL / N/A per item. File never enters Claude's context. |
| `local_models()` | List local Ollama models and show current DEFAULT / FAST model config. |

### v1.2 — Pre-injection (run before Claude thinks)

These tools run a local model pass *before* Claude engages with a task. Claude never sees the raw input — only the pre-processed output. Eliminates waste at the source rather than compressing after the fact.

| Tool | What it does |
|------|--------------|
| `local_improve_prompt(prompt, context?)` | Rewrite a vague or rough prompt into a clear, specific, unambiguous version. Claude receives only the sharpened result. Uses the fast model — minimal overhead. |
| `local_preplan(task, context?, depth?)` | Generate a structured implementation plan (goal / assumptions / ordered steps / risks / open questions) via local model. Claude executes the scaffold rather than planning from scratch. `depth`: `"quick"` (3-5 steps), `"standard"` (default), `"detailed"` (sub-bullets + rationale). |

**`local_improve_prompt` example:**
```
"make the auth faster"
→ local_improve_prompt(prompt, context="Next.js, JWT, DB bottleneck suspected")
→ "Optimise JWT validation latency in src/auth/middleware.ts — profile the verify()
   hot path, remove redundant DB round-trips, target p95 < 5 ms."
→ Feed that to Claude as the actual task
```

**`local_preplan` example:**
```python
plan = local_preplan(
    task="add rate limiting to the API",
    context="Express.js, Redis available, routes in src/routes/",
    depth="standard",
)
# Returns: Goal / Assumptions / Steps with file paths / Risks / Open questions
# Then: "Execute this plan: <plan>"
```

### v1.1 expansion — high-context compression + smart reading

#### High-context compression
| Tool | What it does |
|------|--------------|
| `local_compress_log(file_path, level?, since?)` | Compress a log file to its essential signal. Groups repeated errors with counts, extracts key events, surfaces anomalies. Optional level (`ERROR`/`WARN`) and timestamp-prefix filters. Turns 5 MB logs into ~500-token summaries. |
| `local_compress_stack_trace(text)` | Distil a stack trace (+ source context) to: root cause, failure point, 3-5 key frames, fix hint. Eliminates framework boilerplate that inflates traces to thousands of tokens. |
| `local_compress_data(data, keep_fields?, question?)` | Compress JSON objects, CSV exports, and API responses. Strips nulls, samples large arrays, keeps IDs/status codes. REST responses commonly shrink 20:1. |
| `local_session_compress(file_path)` | **Recursive meta-tool.** Compress a saved Claude conversation transcript to a re-entry briefing: context, decisions, current state, open items, constraints. The transcript never enters Claude's context. |
| `local_prompt_compress(text)` | Compress a long CLAUDE.md or system prompt to its minimal directive set. Preserves every unique rule; removes duplicates and verbose prose. |

#### Smart reading (avoid loading files at all)
| Tool | What it does |
|------|--------------|
| `local_symbols(file_path)` | Full symbol table: every definition with type, line number, and one-line description. Replaces "read file to see what's in it." |
| `local_find_impl(file_path, spec)` | Natural-language code search inside a file. Returns the complete matching logical unit with line numbers. E.g. `spec="where JWT token is verified"`. |
| `local_strip_to_skeleton(file_path)` | All function bodies → `...`, everything else preserved (docstrings, decorators, type annotations, comments). Typically 30-50% of original. |
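
For Python sources, the body-stripping idea behind `local_strip_to_skeleton` can be sketched with `ast.NodeTransformer`. This is an illustration under stated assumptions, not the tool's code: the class and function names are hypothetical, and because `ast.unparse` discards comments and original formatting, this sketch keeps docstrings, decorators, and annotations but not comments, unlike the behaviour described in the table.

```python
import ast


class _BodyStripper(ast.NodeTransformer):
    """Replace every function body with `...`, keeping the docstring if present."""

    def _strip(self, node):
        new_body = []
        if ast.get_docstring(node, clean=False) is not None:
            new_body.append(node.body[0])  # the docstring expression
        new_body.append(ast.Expr(value=ast.Constant(value=Ellipsis)))
        node.body = new_body
        return node

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def strip_to_skeleton(source: str) -> str:
    """Return source with function and method bodies replaced by `...`."""
    tree = _BodyStripper().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Methods inside classes are handled as well, since the transformer visits class bodies by default.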

#### Format transformation
| Tool | What it does |
|------|--------------|
| `local_translate(text, target_format)` | Convert formats without loading source into context: `json↔yaml↔toml`, `csv→markdown_table`, `code→pseudocode`, `sql→english`, `env→json`. |
| `local_schema_infer(data)` | Sample data → compact JSON Schema (draft-07). For API samples, the data is often around 100 times larger than the resulting schema. |
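
To show why a schema is so much smaller than the data it describes, here is a minimal, model-free sketch of schema inference from a single sample. It is illustrative only: `infer_schema` is a hypothetical function, it samples just the first array element, and marking every observed key as `required` is an assumption that a real tool would likely relax.

```python
def infer_schema(value):
    """Infer a minimal JSON Schema fragment from one sample value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        # Assumption: arrays are homogeneous, so the first element is representative.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(val) for key, val in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"unsupported JSON value: {type(value).__name__}")
```

An array of ten thousand records collapses to the schema of one record, which is where the large compression ratio comes from.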

#### Temporal & multi-file diff
| Tool | What it does |
|------|--------------|
| `local_timeline(text)` | Chronological event sequence from logs, changelogs, git log, or incident reports. Deduplicates repeated events. |
| `local_diff_files(path_a, path_b, focus?)` | Diff two files by path — neither file loaded into context. Counterpart to `local_diff` which takes in-context text. |

### v2.1 — Smart buffer, execution filters, scratchpad, notes, cache

#### Smart Buffer (raw output triage)
| Tool | What it does |
|------|--------------|
| `local_gate(raw_output)` | Triage any raw output (test results, build logs, lint dumps) into Pattern + Anomalies + Signal. Always fits in budget. Use before injecting any raw tool output into context. |
| `local_slice(file_path, offset_lines)` | Read a window of lines from a file at an offset. On-demand raw access when `local_gate` identifies a region worth inspecting. |
| `local_diff_semantic(before, after)` | Meaning-level diff — noise (whitespace, formatting, minor rewording) suppressed. Only semantic changes surface. |
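
Reading a window of lines at an offset, as `local_slice` does, does not require loading the whole file. The sketch below assumes a zero-based `offset_lines` and adds a hypothetical `count` parameter for the window size; neither detail is confirmed by the tool's documented signature.

```python
from itertools import islice


def read_window(file_path: str, offset_lines: int, count: int = 50) -> list[str]:
    """Return up to `count` lines starting at zero-based line `offset_lines`."""
    with open(file_path, encoding="utf-8", errors="replace") as fh:
        # islice skips the leading lines lazily instead of reading the full file.
        window = islice(fh, offset_lines, offset_lines + count)
        return [line.rstrip("\n") for line in window]
```

A window that runs past the end of the file simply returns fewer lines.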

#### Execution Filters (project tools → local LLM)
| Tool | What it does |
|------|--------------|
| `local_run_tests()` | Run the project test suite. Returns only `{failed, delta, pointer}`. Nothing else enters context. |
| `local_run_lint()` | Run the linter. Violations grouped by rule; passing rules suppressed. |
| `local_run_build()` | Run the build. Returns root cause + affected symbols only. |

#### Session Scratchpad (stateful decisions)
| Tool | What it does |
|------|--------------|
| `local_memo_write(section, content)` | Write to a named scratchpad section: `decisions`, `assumptions`, `pitfalls`, `open_questions`. Auto-compacts beyond threshold. |
| `local_memo_read()` | Read the full scratchpad as a distilled summary. Restore context mid-session without re-reading files. |
| `local_memo_checkpoint()` | Freeze scratchpad into a `RESUME_PROMPT` string. Paste after `/clear` to continue with full context. |

#### Persistent Notes (cross-session knowledge)
| Tool | What it does |
|------|--------------|
| `local_note_write(category, content)` | Write a permanent note to disk (`architecture`, `gotcha`, `pattern`). Survives `/clear` and new sessions. |
| `local_note_search(query)` | Full-text search across all persisted notes. Run at session start to surface relevant prior knowledge. |

#### Response Quality & Cache
| Tool | What it does |
|------|--------------|
| `local_refine(prompt, draft, instructions?)` | Post-process an LLM draft through a refinement pass. Optional instructions target tone, brevity, or accuracy. |
| `local_cache_stats()` | Show cache hit/miss counts, entry count, and total disk usage. |
| `local_cache_clear()` | Evict all cached results. |
| `local_config()` | Open the settings GUI — configure all 18 settings across Ollama, Timeouts, Limits, Cache, and Memo. Saves to `~/.localthink-mcp/config.json` and hot-reloads the running server. |

---

## Decision guide

| Situation | Tool |
|-----------|------|
| File > 5 KB, one specific question | `local_answer` |
| File > 5 KB, need to reference it multiple times | `local_shrink_file` |
| Text already in context, want to compress it | `local_summarize` |
| "Find me the part about X" | `local_extract` |
| Need to outline a doc before extracting | `local_outline` → `local_extract` |
| Want to know what's in a code file | `local_symbols` |
| Want to understand a code file's structure | `local_code_surface` |
| Want the full file but bodies stripped | `local_strip_to_skeleton` |
| "Find the function that does X" | `local_find_impl` |
| Multi-step process on the same document | `local_pipeline` |
| Unsure which tool to use | `local_auto` |
| Multiple questions about the same large doc | `local_chat` |
| Same question across 5+ files | `local_batch_answer` |
| Understand what's in a directory | `local_scan_dir` |
| "Find where X is handled" (concept search) | `local_grep_semantic` |
| Security or quality checklist | `local_audit` |
| Unsure of content type before processing | `local_classify` |
| Large log file | `local_compress_log` |
| Stack trace + source context | `local_compress_stack_trace` |
| JSON / CSV / API response payload | `local_compress_data` |
| Session too long, need to restart | `local_session_compress` |
| CLAUDE.md grown too large | `local_prompt_compress` |
| Need JSON as YAML (or any format swap) | `local_translate` |
| Need a schema for sample data | `local_schema_infer` |
| Need a timeline from a log or changelog | `local_timeline` |
| Compare two files without loading them | `local_diff_files` |
| Compare two in-context text blobs | `local_diff` |
| Prompt is vague — sharpen before sending to Claude | `local_improve_prompt` |
| Task is large — plan locally before Claude touches it | `local_preplan` |
| Raw test/build/lint output about to enter context | `local_gate` |
| `local_gate` flagged a specific region worth reading | `local_slice` |
| Two text blobs — want only the meaningful diff | `local_diff_semantic` |
| Run tests without dumping output into context | `local_run_tests` |
| Run lint without dumping output into context | `local_run_lint` |
| Run build without dumping output into context | `local_run_build` |
| Want to record a decision or assumption mid-session | `local_memo_write` |
| Resuming work, need to restore session context | `local_memo_read` |
| About to `/clear` — want to resume with full context | `local_memo_checkpoint` |
| Want to save a pattern or gotcha for future sessions | `local_note_write` |
| Starting a session — check for relevant prior notes | `local_note_search` |
| LLM draft needs a quality pass | `local_refine` |
| Check or clear the result cache | `local_cache_stats` / `local_cache_clear` |
| Change any setting via GUI | `local_config` |

---

## local_pipeline examples

```python
# Extract auth sections, then summarize for security review
local_pipeline(text=big_doc, steps=[
    {"op": "extract",   "query": "authentication and authorization"},
    {"op": "summarize", "focus": "security risks and gotchas"},
])

# Answer a question after narrowing to the relevant section
local_pipeline(text=api_docs, steps=[
    {"op": "extract",  "query": "rate limiting"},
    {"op": "answer",   "question": "what headers control retry behaviour?"},
])
```
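
Conceptually, a pipeline feeds each step's output into the next step. The following sketch shows that chaining with stand-in handlers; `run_pipeline` and the `ops` mapping are hypothetical and exist only to make the data flow concrete. The step cap mirrors the documented `LOCALTHINK_MAX_PIPELINE_STEPS` default of 5.

```python
from typing import Callable

MAX_STEPS = 5  # documented default for LOCALTHINK_MAX_PIPELINE_STEPS


def run_pipeline(text: str, steps: list[dict], ops: dict[str, Callable[..., str]]) -> str:
    """Apply each step to the output of the previous one and return the final text."""
    if len(steps) > MAX_STEPS:
        raise ValueError(f"too many steps: {len(steps)} > {MAX_STEPS}")
    for step in steps:
        # Everything except "op" is passed to the handler as keyword arguments.
        params = {key: val for key, val in step.items() if key != "op"}
        text = ops[step["op"]](text, **params)
    return text
```

In the real tool each handler would be a local model call; here any callable that takes the text plus the step's parameters will do.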

## local_chat example

```python
# Turn 1 — document is compressed automatically
r = local_chat(full_doc, "What does this library do?", "")
# r["doc"]     = compressed version (hold this)
# r["history"] = conversation so far (hold this)
# r["answer"]  = the answer

# Turn 2 — pass compressed doc + history back
r = local_chat(r["doc"], "How do I configure auth?", r["history"])

# Turn 3
r = local_chat(r["doc"], "Show me the relevant config keys", r["history"])
```

---

## Configuration

The easiest way to configure LocalThink is to call `local_config` from Claude Code — it opens a GUI that covers every setting below.

Settings are saved to `~/.localthink-mcp/config.json` and applied automatically on the next server start.

### Ollama

| Env var | Default | Recommended |
|---------|---------|-------------|
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Change only if Ollama runs on a remote machine or non-default port |
| `OLLAMA_MODEL` | `qwen2.5:14b-instruct-q4_K_M` | Match your VRAM tier — see SETUP.md for the full table |
| `OLLAMA_FAST_MODEL` | *(same as MODEL)* | One tier smaller than the default (e.g. `qwen2.5:7b` if default is `14b`). Used by classify, outline, translate, schema_infer |
| `OLLAMA_TINY_MODEL` | *(same as FAST)* | `qwen2.5:3b` or smaller. Used by trivial ops on small inputs |
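
The fallback chain in the table (tiny falls back to fast, fast falls back to the default model) can be expressed in a few lines. This is a sketch of the documented behaviour, not the server's code; `resolve_models` and the returned dictionary keys are hypothetical.

```python
import os

DEFAULT_MODEL = "qwen2.5:14b-instruct-q4_K_M"  # documented OLLAMA_MODEL default


def resolve_models(env=None) -> dict[str, str]:
    """Resolve the three model tiers, each falling back to the next larger tier."""
    env = os.environ if env is None else env
    model = env.get("OLLAMA_MODEL") or DEFAULT_MODEL
    fast = env.get("OLLAMA_FAST_MODEL") or model
    tiny = env.get("OLLAMA_TINY_MODEL") or fast
    return {"default": model, "fast": fast, "tiny": tiny}
```

Setting only `OLLAMA_FAST_MODEL` therefore also changes the tiny tier, since tiny inherits from fast.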

### Timeouts

| Env var | Default | Recommended |
|---------|---------|-------------|
| `LOCALTHINK_TIMEOUT` | `360` | `360` for 14b models · `600` for 32b+ · `120` for 7b on fast GPU |
| `LOCALTHINK_FAST_TIMEOUT` | `180` | `60`–`180` — fast model calls should be quick |
| `LOCALTHINK_TINY_TIMEOUT` | `60` | Rarely needs changing |
| `LOCALTHINK_HEALTH_TIMEOUT` | `2` | Leave at `2` — this is just an Ollama ping |
| `LOCALTHINK_CODE_SURFACE_TIMEOUT` | `600` | Increase to `900` for large TS/Go/Rust files on slow hardware |

### Limits

| Env var | Default | Recommended |
|---------|---------|-------------|
| `LOCALTHINK_MAX_FILE_BYTES` | `200000` | `200000` (~200 KB) is right for most codebases · increase to `500000` for monorepos with giant files |
| `LOCALTHINK_MAX_PIPELINE_STEPS` | `5` | Leave at `5` unless you're building complex custom pipelines |
| `LOCALTHINK_MAX_SCAN_FILES` | `20` | Increase to `50`–`100` for large directory scans; watch memory |
| `LOCALTHINK_CLASSIFY_SAMPLE` | `8000` | `8000` chars is enough for most inputs — rarely needs changing |
| `LOCALTHINK_MAX_CONCURRENCY` | `4` | `1`–`2` on low VRAM · `4` default · `6`–`8` if Ollama handles parallel slots well |

### Cache

| Env var | Default | Recommended |
|---------|---------|-------------|
| `LOCALTHINK_CACHE_DIR` | `~/.cache/localthink-mcp` | Change if the default drive is low on space |
| `LOCALTHINK_CACHE_TTL_DAYS` | `30` | `7` if disk space is tight · `90` if you want long-lived results across projects |

### Memo / Notes

| Env var | Default | Recommended |
|---------|---------|-------------|
| `LOCALTHINK_MEMO_DIR` | `~/.localthink-mcp` | Point to a synced folder (Dropbox, OneDrive) to share notes across machines |
| `LOCALTHINK_COMPACT_THRESHOLD` | `3000` | `1500` for faster reads · `5000` to preserve more raw content before auto-compact |

### Example: 3-tier model setup

```json
{
  "mcpServers": {
    "localthink": {
      "env": {
        "OLLAMA_MODEL":      "qwen2.5:14b-instruct-q4_K_M",
        "OLLAMA_FAST_MODEL": "qwen2.5:7b-instruct-q4_K_M",
        "OLLAMA_TINY_MODEL": "qwen2.5:3b"
      }
    }
  }
}
```

---

## Install options

### uvx (recommended — zero setup)

```bash
claude mcp add localthink -- uvx localthink-mcp
```

### pip

```bash
pip install localthink-mcp
claude mcp add localthink -- localthink-mcp
```

### Windows — if `uvx` isn't on Claude's PATH

```bash
claude mcp add --transport stdio localthink -- cmd /c uvx localthink-mcp
```

---

## Security

- **Local only** — runs as a stdio child process, never exposed to the network.
- **`local_answer` / `local_shrink_file` / `local_audit` read any path your shell can access.** Same trust level as Claude's built-in `Read` tool.
- **Ollama has no auth by default.** Don't expose port `11434` to the internet.
- **No data leaves your machine** as long as `OLLAMA_BASE_URL` points at a local Ollama instance (the default). All inference runs in Ollama.

---

## Troubleshooting

**`[localthink] Ollama is not running`**
```bash
ollama serve
curl http://localhost:11434/api/tags
```
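
The same check can be scripted. This sketch queries Ollama's `/api/tags` endpoint with the standard library and returns the installed model names; the function name `ollama_models` is hypothetical, and the 2-second timeout simply mirrors the documented `LOCALTHINK_HEALTH_TIMEOUT` default.

```python
import json
import urllib.request


def ollama_models(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return installed model names, or None if Ollama cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [model["name"] for model in payload.get("models", [])]
```

A result of `None` means Ollama is not reachable at that URL; an empty list means it is running but no model has been pulled yet.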

**Slow responses**
Switch to a smaller model or set a fast model:
```bash
OLLAMA_MODEL=qwen2.5:7b-instruct claude
```

**Windows: `uvx` not found**
Install [uv](https://docs.astral.sh/uv/getting-started/installation/), then retry. Alternatively, use the `cmd /c uvx` fallback shown under Install options.

---

## License

MIT © 2026 [H3xabah](https://github.com/H3xabah)
