Metadata-Version: 2.4
Name: nouz-mcp
Version: 3.2.1
Summary: MCP server for Obsidian — semantic knowledge graph with auto-classification, DAG hierarchy, and cross-domain bridge detection
Author-email: Semiotronika <belkinamariaigorevna@yandex.ru>
License-Expression: MIT
Project-URL: Homepage, https://semiotronika.ru
Project-URL: Repository, https://github.com/Semiotronika/NOUZ-MCP
Project-URL: Documentation, https://github.com/Semiotronika/NOUZ-MCP#readme
Project-URL: Changelog, https://github.com/Semiotronika/NOUZ-MCP/blob/main/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/Semiotronika/NOUZ-MCP/issues
Keywords: mcp,obsidian,knowledge-graph,semantic-classification,rag,llm,notes,pkm,embeddings,dag
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Text Processing :: Markup :: Markdown
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.0.0
Requires-Dist: aiofiles>=23.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: aiosqlite>=0.19.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Dynamic: license-file

# NOUZ — Semantic MCP Server for Your Knowledge Base

> *Structure emerges from content.*

Works with Obsidian, Logseq, and any directory of Markdown files.

[![MIT License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://python.org)
[![MCP](https://img.shields.io/badge/protocol-MCP_stdio-lightgrey.svg)](https://modelcontextprotocol.io)
[![PyPI](https://img.shields.io/badge/pypi-nouz--mcp-orange.svg)](https://pypi.org/project/nouz-mcp/)

🇷🇺 [Русская версия (Russian version)](README.md)

---

## Why NOUZ

NOUZ sits between your note base and your AI agent. It helps turn scattered Markdown files into a graph that can be used through MCP:

1. **Automatic Classification (Semantics)**  
   You define "Cores": the base domains of your knowledge base, such as Systems Analysis, Data & Science, and Engineering. When you add a new note, NOUZ embeds its text, compares the resulting vector with each core's reference vector, and proposes a domain sign or a combination of domains.

2. **Connection Discovery Between Notes**  
   The server builds a directed acyclic graph (DAG) and proposes links that you can review before they are written:
   - *Semantic bridges:* two notes from different domains point to the same idea.
   - *Tag links:* explicit `tag` links can be stored manually in `parents_meta`; `suggest_metadata` also proposes read-only `tag_bridges` from shared canonical YAML tags.

3. **Base Evolution Tracking (Drift)**  
   NOUZ aggregates domain profiles bottom-up, from notes to the modules that contain them. If a module started in one domain while new notes gradually pull it into another, the server reports the divergence (`core_drift`).

Depending on your needs, NOUZ runs in one of three modes, ranging from a plain graph (**LUCA**) to a strict 5-level hierarchy (**SLOI**); the example configuration below uses the third mode, `prizma`.

---

## How It Works

1. You describe domains in `config.yaml` — what each domain covers and which textual signals identify it.
2. The server turns the descriptions into reference vectors, called etalons (locally, via LM Studio or Ollama).
3. Each new note is projected onto these axes. Its sign is determined by the content, or set by you explicitly.
4. L4 gets a domain profile from text classification, while L3/L2 aggregate `core_mix` from child nodes. If a module's `sign` diverges from `core_mix`, the server reports `core_drift`.
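To make steps 3 and 4 concrete, here is a minimal pure-Python sketch of one possible pipeline. It is not NOUZ's implementation: the softmax-with-temperature scoring, the `temperature=0.05` value, reading `pattern_second_sign_threshold` as "also propose the runner-up domain when its share reaches 30%", and every function name (`domain_profile`, `propose_signs`, `aggregate_core_mix`, `core_drift`) are assumptions made for illustration.

```python
from __future__ import annotations

import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def domain_profile(note_vec: list[float], etalons: dict[str, list[float]],
                   temperature: float = 0.05) -> dict[str, float]:
    """Step 3: project a note onto the etalon axes and convert to percentages.

    Softmax with a temperature is an assumed scoring rule, not NOUZ's own.
    """
    scores = {sign: cosine(note_vec, vec) / temperature for sign, vec in etalons.items()}
    top = max(scores.values())
    weights = {sign: math.exp(score - top) for sign, score in scores.items()}  # shift avoids overflow
    total = sum(weights.values())
    return {sign: 100.0 * w / total for sign, w in weights.items()}


def propose_signs(profile: dict[str, float],
                  second_sign_threshold: float = 30.0) -> tuple[str, ...]:
    """Top sign, plus the runner-up if its share reaches the threshold (assumed meaning)."""
    ranked = sorted(profile, key=profile.get, reverse=True)
    if len(ranked) > 1 and profile[ranked[1]] >= second_sign_threshold:
        return tuple(ranked[:2])
    return tuple(ranked[:1])


def aggregate_core_mix(child_profiles: list[dict[str, float]]) -> dict[str, float]:
    """Step 4: sum child profiles and renormalize to percentages (hypothetical rule)."""
    totals: dict[str, float] = defaultdict(float)
    for profile in child_profiles:
        for sign, share in profile.items():
            totals[sign] += share
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {sign: 100.0 * value / grand_total for sign, value in totals.items()}


def core_drift(declared_sign: str, core_mix: dict[str, float]) -> str | None:
    """Return the dominant sign of core_mix if it differs from the module's sign."""
    if not core_mix:
        return None
    dominant = max(core_mix, key=core_mix.get)
    return dominant if dominant != declared_sign else None


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones come from the embedding model.
    etalons = {"S": [1.0, 0.0, 0.0], "D": [0.0, 1.0, 0.0], "E": [0.0, 0.0, 1.0]}
    notes = [[0.2, 0.1, 0.9], [0.3, 0.0, 0.8], [0.9, 0.1, 0.1]]
    profiles = [domain_profile(vec, etalons) for vec in notes]
    print([propose_signs(p) for p in profiles])
    module_mix = aggregate_core_mix(profiles)
    print(core_drift("S", module_mix))  # -> E: two of the three notes lean towards E
```

Because `aggregate_core_mix` accepts any list of profiles, an L2 node could be aggregated the same way from the `core_mix` values of its L3 children.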

**Semantic bridges** find connections between notes from different domains when texts are close in meaning. Tags remain explicit user metadata.
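One way to express the bridge rule in code: a sketch, assuming each note already has a sign and an embedding, that pairs notes from different domains whose cosine similarity reaches `semantic_bridge_threshold` (0.55 in the example config below). The `(sign, vector)` data layout and the function name are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_semantic_bridges(notes: dict[str, tuple[str, list[float]]],
                          threshold: float = 0.55) -> list[tuple[str, str, float]]:
    """Pairs of notes from different domains whose cosine reaches the threshold.

    `notes` maps a note name to (sign, embedding); this layout is hypothetical.
    """
    bridges = []
    for (name_a, (sign_a, vec_a)), (name_b, (sign_b, vec_b)) in combinations(sorted(notes.items()), 2):
        if sign_a == sign_b:
            continue  # same domain: an ordinary link, not a bridge
        score = cosine(vec_a, vec_b)
        if score >= threshold:
            bridges.append((name_a, name_b, score))
    return sorted(bridges, key=lambda bridge: bridge[2], reverse=True)


if __name__ == "__main__":
    notes = {
        "feedback-loops": ("S", [1.0, 0.2, 0.0]),
        "pid-controller": ("E", [0.9, 0.3, 0.1]),
        "dark-matter": ("D", [0.0, 0.1, 1.0]),
    }
    for a, b, score in find_semantic_bridges(notes):
        print(f"{a} <-> {b}: {score:.3f}")
```

Skipping same-sign pairs is what distinguishes a bridge from an ordinary semantic neighbour in this sketch; the pairwise loop is quadratic, so a real index would likely prefilter candidates.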

---

## Quick Start

```bash
pip install nouz-mcp
OBSIDIAN_ROOT=/path/to/vault nouz-mcp
```

Without `config.yaml`, the server starts in **LUCA** mode: a graph without semantics that works immediately.

To enable semantic mode, create a local config from the template:

```bash
cp config.template.yaml config.yaml
```

On Windows PowerShell:

```powershell
Copy-Item config.template.yaml config.yaml
```

Or from source:

```bash
git clone https://github.com/Semiotronika/NOUZ-MCP
cd NOUZ-MCP
pip install -r requirements.txt
cp config.template.yaml config.yaml
OBSIDIAN_ROOT=./vault python server.py
```

Connect to Claude Desktop, Cursor, OpenCode, or any MCP client:

```json
{
  "mcpServers": {
    "nouz": {
      "command": "nouz-mcp",
      "env": {
        "OBSIDIAN_ROOT": "/path/to/vault",
        "NOUZ_CONFIG": "/absolute/path/to/config.yaml",
        "EMBED_API_URL": "http://127.0.0.1:1234/v1"
      }
    }
  }
}
```

---

## MCP Tools

| Tool | Purpose |
|------------|-------|
| `suggest_metadata` | Suggest sign, level, bridges, and drift warnings |
| `write_file` | Write a note with YAML frontmatter |
| `update_metadata` | Update the YAML frontmatter only, preserving the note body |
| `read_file` | Read a note and its metadata |
| `calibrate_cores` | Update core reference vectors |
| `recalc_signs` | Recalculate signs for all notes |
| `recalc_core_mix` | Recalculate the bottom-up `core_mix` aggregation |
| `index_all` | Re-index the entire base; with `with_embeddings=true`, also refresh file and chunk embeddings |
| `embed` | Get an embedding vector for text |
| `chunk_text` | Split Markdown text into deterministic retrieval chunks |
| `chunk_file` | Split one note body into deterministic retrieval chunks |
| `search_chunks` | Search stored chunk embeddings |
| `list_files` | List notes, filtered by level or sign |
| `get_children` | Traverse down the graph |
| `get_parents` | Traverse up the graph |
| `suggest_parents` | Find parents for an orphan note |
| `add_entity` | Create an entity in one step (automatic sign and parents; tags only if given explicitly) |
| `process_orphans` | Auto-fill metadata for files that have none |

Set `NOUZ_READ_ONLY=true` to hide and block mutating tools (`write_file`,
`update_metadata`, `index_all`, the recalculation tools, `process_orphans`, and
`add_entity`). Read-only tools such as `read_file`, `suggest_metadata`, `embed`,
`chunk_text`, `chunk_file`, and `search_chunks` remain available. In read-only
mode, these tools do not refresh the SQLite cache by default, and startup skips
database initialization, indexing, and calibration; set `NOUZ_CACHE_WRITE=true`
if you want cache writes in read-only mode.
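The gating rules above can be sketched as follows, assuming both flags are compared case-insensitively against `true`. The tool lists and helper names are illustrative; in particular, the text does not say whether `calibrate_cores` counts as mutating, so it is left out of the assumed set.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

ALL_TOOLS = [
    "suggest_metadata", "write_file", "update_metadata", "read_file",
    "calibrate_cores", "recalc_signs", "recalc_core_mix", "index_all",
    "embed", "chunk_text", "chunk_file", "search_chunks", "list_files",
    "get_children", "get_parents", "suggest_parents", "add_entity",
    "process_orphans",
]
# Assumed mutating set, following the paragraph above; the real server may differ.
MUTATING_TOOLS = {
    "write_file", "update_metadata", "index_all", "recalc_signs",
    "recalc_core_mix", "process_orphans", "add_entity",
}


def env_flag(name: str, env: Mapping[str, str] | None = None) -> bool:
    """True when the variable is set to 'true' (case-insensitive; an assumption)."""
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() == "true"


def visible_tools(env: Mapping[str, str] | None = None) -> list[str]:
    """Tools an MCP client would see; mutating ones disappear in read-only mode."""
    read_only = env_flag("NOUZ_READ_ONLY", env)
    return [tool for tool in ALL_TOOLS if not (read_only and tool in MUTATING_TOOLS)]


def cache_writes_enabled(env: Mapping[str, str] | None = None) -> bool:
    """Read-only mode turns cache writes off unless NOUZ_CACHE_WRITE=true."""
    if not env_flag("NOUZ_READ_ONLY", env):
        return True
    return env_flag("NOUZ_CACHE_WRITE", env)
```

Passing the environment as a parameter keeps the rules testable without touching `os.environ`.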

`chunk_text` and `chunk_file` return `chunker_version`, a stable `id`, the
character offsets of the chunk text as returned, overlap included
(`start_char`/`end_char`), the offsets of the chunk's own body without overlap
(`body_start_char`/`body_end_char`), and hash fields. `index_all` with
`with_embeddings=true` stores these chunks in the SQLite `chunk_embeddings`
table, and `search_chunks` ranks them by cosine similarity to the query.
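The chunk fields can be illustrated with a small, hypothetical chunker: fixed-size body slices, each extended backwards by an overlap, with an id derived from a hash of the version, coordinates, and text. NOUZ's real chunker is probably more structure-aware (tag evidence includes a heading), and its id and hash scheme is not documented here, so `chunk_body`, `CHUNKER_VERSION`, the `text_hash` field name, and the default sizes are all assumptions. `rank_by_cosine` shows the kind of cosine ranking that `search_chunks` is described as doing.

```python
from __future__ import annotations

import hashlib
import math

CHUNKER_VERSION = "sketch-1"  # hypothetical version label


def chunk_body(body: str, size: int = 400, overlap: int = 80) -> list[dict]:
    """Fixed-size body slices; each chunk's text also carries `overlap` preceding chars."""
    if size <= 0 or overlap < 0:
        raise ValueError("size must be positive and overlap non-negative")
    chunks = []
    for body_start in range(0, len(body), size):
        body_end = min(body_start + size, len(body))
        start = max(0, body_start - overlap)  # actual text begins inside the previous slice
        text = body[start:body_end]
        text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Same version + same coordinates + same text -> same id on every run.
        key = f"{CHUNKER_VERSION}:{body_start}:{body_end}:{text_hash}"
        chunks.append({
            "chunker_version": CHUNKER_VERSION,
            "id": hashlib.sha256(key.encode("utf-8")).hexdigest()[:16],
            "start_char": start,
            "end_char": body_end,
            "body_start_char": body_start,
            "body_end_char": body_end,
            "text": text,
            "text_hash": text_hash,
        })
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vec: list[float], stored: list[tuple[str, list[float]]],
                   top_k: int = 3) -> list[tuple[str, float]]:
    """Rank stored (chunk_id, vector) pairs by cosine similarity to the query."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in stored]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Keeping both coordinate pairs lets a caller show the overlapped text to an embedding model while still mapping each chunk back to a non-overlapping span of the note.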

`parents_meta.link_type` supports manual `hierarchy`, `semantic`, `temporary`,
`tag`, `analogy`, and `error` links. NOUZ does not auto-generate analogy links.
`tag_bridges` in `suggest_metadata` are suggestions from explicit YAML tags and
are not written back to files.
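Link types and tag bridges could be modelled like this. It is a sketch only, assuming tags are already canonical and that a tag bridge simply means "another note shares at least one tag"; the helper names are hypothetical, and nothing here writes to files, in line with `tag_bridges` being suggestions.

```python
from __future__ import annotations

LINK_TYPES = {"hierarchy", "semantic", "temporary", "tag", "analogy", "error"}


def validate_link_type(link_type: str) -> str:
    """Reject anything outside the manual link types listed above."""
    if link_type not in LINK_TYPES:
        raise ValueError(f"unsupported link_type: {link_type!r}")
    return link_type


def tag_bridges(note: str, tags_by_note: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Other notes sharing at least one canonical tag with `note` (suggestions only)."""
    own_tags = tags_by_note.get(note, set())
    bridges = []
    for other, tags in sorted(tags_by_note.items()):
        if other == note:
            continue
        shared = sorted(own_tags & tags)
        if shared:
            bridges.append((other, shared))
    return bridges


if __name__ == "__main__":
    tags = {
        "context-windows": {"agent-context", "llm"},
        "mcp-notes": {"agent-context", "mcp"},
        "cmb": {"cosmology"},
    }
    print(tag_bridges("context-windows", tags))  # -> [('mcp-notes', ['agent-context'])]
```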

YAML tags are explicit metadata: NOUZ normalizes them to canonical slug form
(`agent-context`, optionally `area/topic`) and rejects obvious non-tags such as
hex colors, URLs, numeric-only tokens, empty values, and `none`/`null`.
`suggest_metadata` returns `tag_quality` so an agent can see which tags are
accepted for future `tag_bridges` and which raw values were discarded.

For tag automation, `suggest_metadata` also returns read-only `tag_candidates`,
drawn from the YAML tag vocabulary already accepted in the index and from
explicit inline `#tag` markers in the note body. Candidates are not written to
YAML automatically; once accepted through `update_metadata`, they feed the
normal `tag_bridges`. Before anything is written, the links a candidate would
create are returned separately as `candidate_tag_bridges`. For each candidate,
NOUZ temporarily chunks the current text and returns `evidence` with a
`chunk_id`, heading, coordinates, and a short snippet. This does not require a
prebuilt `chunk_embeddings` table.
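A minimal normalizer in the spirit of the tag rules above. The concrete heuristics are assumptions: a hex color is taken to be `#` plus 3 or 6 hex characters with at least one digit (so a word tag like `#bad` survives), "numeric-only" means nothing but digits and `.-_/` separators, and each `/`-separated segment is slugified on its own. The `accepted`/`discarded` keys only mimic the idea of `tag_quality`; they are not its documented shape.

```python
from __future__ import annotations

import re

URL_PREFIX = re.compile(r"(?:[a-z][a-z0-9+.\-]*://|www\.)")
HEX_COLOR = re.compile(r"#(?:[0-9a-f]{3}|[0-9a-f]{6})")
NUMERIC_ONLY = re.compile(r"[\d.\-_/]+")
EMPTY_MARKERS = {"", "none", "null"}


def _slug(segment: str) -> str:
    segment = re.sub(r"[\s_]+", "-", segment)   # spaces and underscores become hyphens
    segment = re.sub(r"[^\w-]+", "-", segment)  # other punctuation too (\w keeps Unicode letters)
    segment = re.sub(r"-{2,}", "-", segment)    # collapse runs of hyphens
    return segment.strip("-")


def normalize_tag(value: object) -> str | None:
    """Return a canonical slug tag, or None for an obvious non-tag."""
    raw = str(value).strip().lower()  # str(None) -> "none", rejected below
    if URL_PREFIX.match(raw):
        return None
    if HEX_COLOR.fullmatch(raw) and any(ch.isdigit() for ch in raw):
        return None  # e.g. "#ff0000", but not a word tag like "#bad"
    raw = raw.lstrip("#")
    if raw in EMPTY_MARKERS or NUMERIC_ONLY.fullmatch(raw):
        return None
    segments = [_slug(part) for part in raw.split("/")]
    return "/".join(s for s in segments if s) or None


def tag_quality(values: list[object]) -> dict[str, list]:
    """Split raw YAML values into accepted canonical tags and discarded raw values."""
    accepted: list[str] = []
    discarded: list[object] = []
    for value in values:
        tag = normalize_tag(value)
        if tag is None:
            discarded.append(value)
        elif tag not in accepted:
            accepted.append(tag)
    return {"accepted": accepted, "discarded": discarded}
```

For example, `tag_quality(["MCP", "#ff0000", "mcp", "null"])` keeps a single `mcp` tag and reports the color and the `null` marker as discarded.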

---

## Configuration

Minimal `config.yaml`:

```yaml
mode: prizma

etalons:
  - sign: S
    name: Systems Analysis
    text: >
      Methodology for analysing complex objects: feedback loops,
      emergent properties, self-regulation, bifurcation points.
      Cybernetics, synergetics, dissipative structures, catastrophe
      theory, autopoiesis — tools for understanding how the whole
      exceeds the sum of its parts. Not data and not code — a way
      of thinking about how parts form a whole and why systems
      behave non-linearly.
  - sign: D
    name: Data & Science
    text: >
      Physics and cosmology: from subatomic particles to the large-scale
      structure of the Universe. Lagrangians, curvature tensors, scattering
      cross-sections, quarks, bosons, fermions, plasma, vacuum fluctuations,
      cosmic microwave background, cosmological constant, decoherence.
      Pure science about the nature of matter, energy and spacetime.
  - sign: E
    name: Engineering
    text: >
      Software engineering, machine learning and infrastructure: writing
      and debugging code, deployment, containerisation, neural networks,
      inference, tokenisation, data serialisation, microservices, CI/CD,
      automated testing, refactoring, Git, Docker, Kubernetes, APIs.
      The practical discipline of building computational systems from
      architecture to production.

thresholds:
  sign_spread: 0.05
  confident_spread: 60.0
  pattern_second_sign_threshold: 30.0
  semantic_bridge_threshold: 0.55
  parent_link_threshold: 0.55

artifact_signs:
  - sign: n
    name: Note
    text: Short note, observation, fragment.
  - sign: c
    name: Concept
    text: Definition, concept, entity description.
  - sign: r
    name: Reference
    text: External source, documentation, link, citation.
  - sign: l
    name: Log
    text: Session log, chronology, dialogue record.
  - sign: u
    name: Update
    text: Update, release note, changelog entry.
  - sign: h
    name: Hypothesis
    text: Hypothesis, assumption, speculative idea.
  - sign: s
    name: Specification
    text: Technical specification, instruction, requirements.
```

After setup, run `calibrate_cores`; the server creates the reference vectors.
Then check the pairwise cosines: the mean-centered cosines between different
domains should be noticeably lower than the raw ones. If all pairs are roughly
equal, make the domain texts more distinct.
You can also run the standalone etalon check from the installed package:
`nouz-calc-etalons --config config.yaml`.

`etalons` are semantic domains compared through embeddings.
`artifact_signs` describe the material type of L5 artifacts: note, concept, reference, log, update, hypothesis, or specification. This is a heuristic label, not a separate embedding etalon. In the public convention, domains use uppercase signs (`S/D/E`) while material types use lowercase signs (`n/c/r/l/u/h/s`); you can replace them in config as long as signs stay short and do not conflict with domain signs. If needed, add `keywords` to any material type: the server will use your detection words instead of the built-in RU/EN fallback.
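A sketch of the sign rules just described, assuming the config has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`). "Short" is taken to mean one character by default, and the check is case-sensitive, so domain `S` and material type `s` do not clash; both choices are assumptions, and `check_signs` is a hypothetical helper rather than part of NOUZ.

```python
from __future__ import annotations


def check_signs(config: dict, max_len: int = 1) -> list[str]:
    """Report signs that are empty, too long, or reused across etalons and artifact_signs."""
    domain = [entry.get("sign", "") for entry in config.get("etalons", [])]
    material = [entry.get("sign", "") for entry in config.get("artifact_signs", [])]
    problems = []
    seen: set[str] = set()
    for sign in domain + material:
        if not sign or len(sign) > max_len:
            problems.append(f"sign {sign!r} is empty or longer than {max_len} character(s)")
        if sign in seen:
            problems.append(f"sign {sign!r} is used more than once")  # case-sensitive: 'S' != 's'
        seen.add(sign)
    return problems


if __name__ == "__main__":
    # The signs from the minimal config above, as yaml.safe_load would return them.
    config = {
        "etalons": [{"sign": s} for s in "SDE"],
        "artifact_signs": [{"sign": s} for s in "ncrluhs"],
    }
    print(check_signs(config))  # -> []
```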

### Real Calculation Example

Here are actual results for the S/D/E etalons using the `text-embedding-granite-embedding-278m-multilingual` model:

```text
=== Pairwise Cosine (raw) ===
S↔D: 0.5894    S↔E: 0.5862    D↔E: 0.6022

=== Pairwise Cosine (mean-centered) ===
S↔D: -0.5059   S↔E: -0.5117   D↔E: -0.4822
```

Negative mean-centered values are a good result here: after the mean vector is subtracted, the domains are well separated. Self-classification: S→99.4%, D→97.5%, E→96.9%.
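The raw versus mean-centered comparison can be reproduced with a few lines of pure Python; the toy vectors below are made up, not real embeddings, and the helper names are hypothetical. One caveat: with exactly three etalons, the mean-centered vectors sum to zero, so their pairwise dot products sum to a negative number, and when the norms are similar the three cosines average about -0.5 (the example above averages about -0.4999). The differences between pairs are therefore more informative than the negative sign itself.

```python
from __future__ import annotations

import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_center(vectors: dict[str, list[float]]) -> dict[str, list[float]]:
    """Subtract the mean etalon vector from every etalon."""
    count = len(vectors)
    dim = len(next(iter(vectors.values())))
    mean = [sum(vec[i] for vec in vectors.values()) / count for i in range(dim)]
    return {sign: [x - m for x, m in zip(vec, mean)] for sign, vec in vectors.items()}


def pairwise_cosines(vectors: dict[str, list[float]]) -> dict[str, float]:
    """Cosine for every pair of signs, in insertion order (S↔D, S↔E, D↔E)."""
    return {f"{a}↔{b}": cosine(vectors[a], vectors[b]) for a, b in combinations(vectors, 2)}


if __name__ == "__main__":
    etalons = {"S": [0.9, 0.3, 0.2], "D": [0.3, 0.9, 0.2], "E": [0.3, 0.2, 0.9]}  # toy values
    raw = pairwise_cosines(etalons)
    centered = pairwise_cosines(mean_center(etalons))
    for pair in raw:
        print(f"{pair}: raw {raw[pair]:.4f}  mean-centered {centered[pair]:.4f}")
```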

### Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `OBSIDIAN_ROOT` | `./obsidian` | Path to vault |
| `NOUZ_CONFIG` | *(empty)* | Absolute path to `config.yaml`; if omitted, the server looks in the current working directory |
| `NOUZ_DATABASE_NAME` | `obsidian_kb.db` | SQLite cache filename inside `OBSIDIAN_ROOT`; useful for isolated public checks, e.g. `obsidian_kb.public.db` |
| `NOUZ_DATABASE_PATH` | *(empty)* | Full SQLite cache path; takes precedence over `NOUZ_DATABASE_NAME` |
| `EMBED_PROVIDER` | `openai` | `openai`, `lmstudio`, `ollama` |
| `EMBED_API_URL` | `http://127.0.0.1:1234/v1` | Embedding endpoint |
| `EMBED_API_KEY` | *(empty)* | API key, if needed |
| `EMBED_MODEL` | *(empty)* | Model name |
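How the path-related variables might combine, as a sketch of the precedence in the table: `NOUZ_DATABASE_PATH` wins over `NOUZ_DATABASE_NAME`, the name is joined onto `OBSIDIAN_ROOT`, and a missing `NOUZ_CONFIG` falls back to `config.yaml` in the working directory. `resolve_paths` is a hypothetical helper; details such as how relative paths are resolved may differ in NOUZ.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def resolve_paths(env: Mapping[str, str] | None = None,
                  cwd: str | Path | None = None) -> dict[str, Path]:
    """Apply the documented defaults and precedence for vault, config, and cache paths."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    root = Path(env.get("OBSIDIAN_ROOT") or "./obsidian")
    # No NOUZ_CONFIG: look for config.yaml in the current working directory.
    config = Path(env["NOUZ_CONFIG"]) if env.get("NOUZ_CONFIG") else cwd / "config.yaml"
    # A full NOUZ_DATABASE_PATH wins; otherwise the file name lives inside the vault.
    if env.get("NOUZ_DATABASE_PATH"):
        database = Path(env["NOUZ_DATABASE_PATH"])
    else:
        database = root / (env.get("NOUZ_DATABASE_NAME") or "obsidian_kb.db")
    return {"root": root, "config": config, "database": database}


if __name__ == "__main__":
    print(resolve_paths({"OBSIDIAN_ROOT": "/vault", "NOUZ_DATABASE_NAME": "obsidian_kb.public.db"}))
```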

---

## Privacy

| Component | Local? |
|-----------|-----------|
| Embeddings (LM Studio / Ollama) | ✅ Yes |
| Your notes | ✅ Yes |
| NOUZ server | ✅ Yes |
| AI agent context (Claude, ChatGPT) | ❌ Goes to cloud |

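With a local embedding provider, your notes, their embeddings, and the NOUZ index stay on your machine; only the content your AI agent reads through MCP tools reaches its cloud provider.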

---

## Development

```bash
git clone https://github.com/Semiotronika/NOUZ-MCP
cd NOUZ-MCP
pip install -e .
python -m compileall -q nouz_mcp pytest_smoke.py scripts
python -m pytest -q
python test_server.py
```

---

## Links

- 🌐 [semiotronika.ru](https://semiotronika.ru)
- 📦 [PyPI](https://pypi.org/project/nouz-mcp/)
- 🗂️ [Glama Registry](https://glama.ai/mcp/servers/Semiotronika/NOUZ-MCP)
- 🐙 [GitHub](https://github.com/Semiotronika/NOUZ-MCP)

MIT License © 2026 Semiotronika

*Cosines are computed. Syntax changes. Semantics remains.*

<!-- mcp-name: io.github.Semiotronika/NOUZ-MCP -->
