Metadata-Version: 2.4
Name: knowledge-index
Version: 0.4.1
Summary: Search index for agent memory — knowledge graph index for your documents.
Project-URL: Homepage, https://github.com/zach-blumenfeld/knowledge-index
Project-URL: Repository, https://github.com/zach-blumenfeld/knowledge-index
Author: Zach Blumenfeld
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
           1. Definitions.
        
              "License" shall mean the terms and conditions for use, reproduction,
              and distribution as defined by Sections 1 through 9 of this document.
        
              "Licensor" shall mean the copyright owner or entity authorized by
              the copyright owner that is granting the License.
        
              "Legal Entity" shall mean the union of the acting entity and all
              other entities that control, are controlled by, or are under common
              control with that entity. For the purposes of this definition,
              "control" means (i) the power, direct or indirect, to cause the
              direction or management of such entity, whether by contract or
              otherwise, or (ii) ownership of fifty percent (50%) or more of the
              outstanding shares, or (iii) beneficial ownership of such entity.
        
              "You" (or "Your") shall mean an individual or Legal Entity
              exercising permissions granted by this License.
        
              "Source" form shall mean the preferred form for making modifications,
              including but not limited to software source code, documentation
              source, and configuration files.
        
              "Object" form shall mean any form resulting from mechanical
              transformation or translation of a Source form, including but
              not limited to compiled object code, generated documentation,
              and conversions to other media types.
        
              "Work" shall mean the work of authorship, whether in Source or
              Object form, made available under the License, as indicated by a
              copyright notice that is included in or attached to the work
              (an example is provided in the Appendix below).
        
              "Derivative Works" shall mean any work, whether in Source or Object
              form, that is based on (or derived from) the Work and for which the
              editorial revisions, annotations, elaborations, or other modifications
              represent, as a whole, an original work of authorship. For the purposes
              of this License, Derivative Works shall not include works that remain
              separable from, or merely link (or bind by name) to the interfaces of,
              the Work and Derivative Works thereof.
        
              "Contribution" shall mean any work of authorship, including
              the original version of the Work and any modifications or additions
              to that Work or Derivative Works thereof, that is intentionally
              submitted to the Licensor for inclusion in the Work by the copyright owner
              or by an individual or Legal Entity authorized to submit on behalf of
              the copyright owner. For the purposes of this definition, "submitted"
              means any form of electronic, verbal, or written communication sent
              to the Licensor or its representatives, including but not limited to
              communication on electronic mailing lists, source code control systems,
              and issue tracking systems that are managed by, or on behalf of, the
              Licensor for the purpose of discussing and improving the Work, but
              excluding communication that is conspicuously marked or otherwise
              designated in writing by the copyright owner as "Not a Contribution."
        
              "Contributor" shall mean Licensor and any individual or Legal Entity
              on behalf of whom a Contribution has been received by the Licensor and
              subsequently incorporated within the Work.
        
           2. Grant of Copyright License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              copyright license to reproduce, prepare Derivative Works of,
              publicly display, publicly perform, sublicense, and distribute the
              Work and such Derivative Works in Source or Object form.
        
           3. Grant of Patent License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              (except as stated in this section) patent license to make, have made,
              use, offer to sell, sell, import, and otherwise transfer the Work,
              where such license applies only to those patent claims licensable
              by such Contributor that are necessarily infringed by their
              Contribution(s) alone or by combination of their Contribution(s)
              with the Work to which such Contribution(s) was submitted. If You
              institute patent litigation against any entity (including a
              cross-claim or counterclaim in a lawsuit) alleging that the Work
              or a Contribution incorporated within the Work constitutes direct
              or contributory patent infringement, then any patent licenses
              granted to You under this License for that Work shall terminate
              as of the date such litigation is filed.
        
           4. Redistribution. You may reproduce and distribute copies of the
              Work or Derivative Works thereof in any medium, with or without
              modifications, and in Source or Object form, provided that You
              meet the following conditions:
        
              (a) You must give any other recipients of the Work or
                  Derivative Works a copy of this License; and
        
              (b) You must cause any modified files to carry prominent notices
                  stating that You changed the files; and
        
              (c) You must retain, in the Source form of any Derivative Works
                  that You distribute, all copyright, patent, trademark, and
                  attribution notices from the Source form of the Work,
                  excluding those notices that do not pertain to any part of
                  the Derivative Works; and
        
              (d) If the Work includes a "NOTICE" text file as part of its
                  distribution, then any Derivative Works that You distribute must
                  include a readable copy of the attribution notices contained
                  within such NOTICE file, excluding any notices that do not
                  pertain to any part of the Derivative Works, in at least one
                  of the following places: within a NOTICE text file distributed
                  as part of the Derivative Works; within the Source form or
                  documentation, if provided along with the Derivative Works; or,
                  within a display generated by the Derivative Works, if and
                  wherever such third-party notices normally appear. The contents
                  of the NOTICE file are for informational purposes only and
                  do not modify the License. You may add Your own attribution
                  notices within Derivative Works that You distribute, alongside
                  or as an addendum to the NOTICE text from the Work, provided
                  that such additional attribution notices cannot be construed
                  as modifying the License.
        
              You may add Your own copyright statement to Your modifications and
              may provide additional or different license terms and conditions
              for use, reproduction, or distribution of Your modifications, or
              for any such Derivative Works as a whole, provided Your use,
              reproduction, and distribution of the Work otherwise complies with
              the conditions stated in this License.
        
           5. Submission of Contributions. Unless You explicitly state otherwise,
              any Contribution intentionally submitted for inclusion in the Work
              by You to the Licensor shall be under the terms and conditions of
              this License, without any additional terms or conditions.
              Notwithstanding the above, nothing herein shall supersede or modify
              the terms of any separate license agreement you may have executed
              with Licensor regarding such Contributions.
        
           6. Trademarks. This License does not grant permission to use the trade
              names, trademarks, service marks, or product names of the Licensor,
              except as required for reasonable and customary use in describing the
              origin of the Work and reproducing the content of the NOTICE file.
        
           7. Disclaimer of Warranty. Unless required by applicable law or
              agreed to in writing, Licensor provides the Work (and each
              Contributor provides its Contributions) on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
              implied, including, without limitation, any warranties or conditions
              of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
              PARTICULAR PURPOSE. You are solely responsible for determining the
              appropriateness of using or redistributing the Work and assume any
              risks associated with Your exercise of permissions under this License.
        
           8. Limitation of Liability. In no event and under no legal theory,
              whether in tort (including negligence), contract, or otherwise,
              unless required by applicable law (such as deliberate and grossly
              negligent acts) or agreed to in writing, shall any Contributor be
              liable to You for damages, including any direct, indirect, special,
              incidental, or consequential damages of any character arising as a
              result of this License or out of the use or inability to use the
              Work (including but not limited to damages for loss of goodwill,
              work stoppage, computer failure or malfunction, or any and all
              other commercial damages or losses), even if such Contributor
              has been advised of the possibility of such damages.
        
           9. Accepting Warranty or Additional Liability. While redistributing
              the Work or Derivative Works thereof, You may choose to offer,
              and charge a fee for, acceptance of support, warranty, indemnity,
              or other liability obligations and/or rights consistent with this
              License. However, in accepting such obligations, You may act only
              on Your own behalf and on Your sole responsibility, not on behalf
              of any other Contributor, and only if You agree to indemnify,
              defend, and hold each Contributor harmless for any liability
              incurred by, or claims asserted against, such Contributor by reason
              of your accepting any such warranty or additional liability.
        
           END OF TERMS AND CONDITIONS
        
           Copyright 2026 Neo4j Labs
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
License-File: LICENSE
Keywords: agent-memory,knowledge-graph,markdown,neo4j,obsidian,search
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.11
Requires-Dist: aiofiles>=23.2
Requires-Dist: click>=8.1
Requires-Dist: markdown-it-py>=3.0
Requires-Dist: neo4j>=5.20
Requires-Dist: python-frontmatter>=1.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.7
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.5; extra == 'dev'
Description-Content-Type: text/markdown

# <img src="img/ki.png" alt="ki logo" width="120" align="left" />  Knowledge Index (`ki`)
> Scale memory across agents and docs
<br clear="left">

Point `ki` at a folder (or many), sync to graph index, and query from the CLI or any AI agent. Multiple folders and users can share the same index. Source files are never modified, so it's safe on an Obsidian vault, a git repo, or a research folder. Backed by Neo4j.


install `ki`, wire into Claude Code, ask Claude what's in your notes:

```bash
uv tool install knowledge-index
ki configure                       # one-time: pick Local (Podman), Aura, or an Existing Neo4j — see `references/neo4j-podman.md` for the Local path
ki skill install claude-code       # drops the routing rules into ~/.claude/skills/ki/SKILL.md
```

Then in Claude Code:

```sh
❯ index your memory                                                                                                                     

...

⏺ Bash(ki index /Users/zachblumenfeld/.claude/projects/ 2>&1 | tail -40)
  ⎿  ✓ Initialized vault at /Users/zachblumenfeld/.claude/projects (id: 85f4be4d…)
     Indexed: 13 added, 0 updated, 0 unchanged, 3 sections, 10 links.

⏺ Indexed. 13 docs added across your project memory dirs (3 sections, 10 cross-doc links). Try ki search "..." to query — e.g. ki search
   "decision graph" or ki search "naming" --type section.

...

❯ what retrieval stuff have we built lately?                                                                                            

⏺ Bash(ki search "retrieval" --k 10 --json 2>&1)
  ⎿  [
       {
         "document_uri": "ce2cc5c8-2968-44fd-8ea8-caca79cfd747/retrieval-queries.md",
     … +121 lines (ctrl+o to expand)

  Listed 1 directory (ctrl+o to expand)

⏺ Based on the indexed memory, here's what's there on retrieval (current working dir ~/demo/retrieval is empty, so this is from
  notes/docs):

  ki (Knowledge Index) — the main retrieval project
  - retrieval-queries.md — 10 retrieval queries (B.1–B.10) ported from the old Wikipedia-graph queries to the new
  User–Vault–Document–Section schema
  - requirements.md + ingest-cypher.md — schema, constraints, and the content_search fulltext index over Document|Section|Vault on
  displayName + content + aliases + description
  - Key design call: fulltext is the v1 retrieval substrate; vector/embeddings deferred. Wikilink aliases ("JFK" / "John F Kennedy")
  indexed so alternates hit the same doc.
  - Validated approach: what-worked.md notes that section-level retrieval beat whole-document retrieval when dogfooded against the
  repo's own docs/.
  
  Other hits (Clojure / internals-markdown notes mentioning "information retrieval") are unrelated tangents — noise from the broader
  vault.

  Want me to pull the full retrieval-queries doc or a specific section?

```

That's the whole loop. Read on for the same flow with other agents, direct CLI use, and what's not in this release.

## Getting started

### With a coding agent (Claude Code, Cursor, Windsurf, …)

Coding agents can shell out to `ki` directly. `ki skill install` drops the bundled routing rules (the markdown at [`skills/ki/SKILL.md`](skills/ki/SKILL.md)) into each agent's well-known config path so the agent knows *when* to use `ki` — track / remember / build a knowledge base / search my notes / find related material — and *when* to skip.

```bash
# 1. Install ki on PATH.
uv tool install knowledge-index            # if `uv` isn't installed: curl -LsSf https://astral.sh/uv/install.sh | sh
ki --version

# 2. One-time Neo4j connection. Three paths: Local (Podman — see references/neo4j-podman.md),
#    Aura (billable cloud), or Existing (point at a Neo4j you already run).
ki configure

# 3. Install the skill into every detected agent — or pick one explicitly.
ki skill install                           # all detected agents
ki skill install claude-code               # one specific agent
ki skill list                              # what's wired up, what's detected
ki skill remove claude-code                # undo
```

Supported agent catalog

```
claude-code   cursor   windsurf   copilot    gemini-cli
cline         codex    pi         opencode   junie
```

For anything not in the catalog, pass an explicit path:

```bash
ki skill install my-fork --path ~/.my-fork/rules/ki.md
```

Then ask the agent things like:

- *"Can you incorporate this folder of notes into your memory?"*
- *"What did I write about retrieval strategies?"*
- *"Find the doc where I sketched the schema."*

Auto-mode rules (full text in [`skills/ki/SKILL.md`](skills/ki/SKILL.md)): reversible local actions (`ki index`, single-doc `ki rm`) fire without asking; irreversible or billable actions (`ki configure → Aura`, whole-vault `ki rm --vault`) pause for explicit consent. Source files are never modified by either `ki` or the agent.

### From the command line (no agent)

If you'd rather drive `ki` yourself:

```bash
uv tool install knowledge-index                # install
ki configure                                   # one-time Neo4j connection
ki index ~/Documents/my-vault                  # sync the folder into the graph (idempotent)

ki search "retrieval"                          # default: section content (B.2)
ki search "graph" --type document --k 5        # document title  (B.1)
ki search "graphs" --type vault --k 5          # cross-vault routing by description (B.11)
ki search "" --type neighbors --doc-uri <uri>  # 1-hop link neighbourhood (B.3)

ki vault list                                  # show every indexed vault with its description

ki rm ~/Documents/my-vault/notes/old.md        # remove a doc from the index (file untouched)
ki rm ~/Documents/my-vault --vault             # remove a whole vault (typed confirmation)
```

All commands: `ki configure | index | search | vault | rm | init | skill`. Run any with `--help` for flags. `KI_PROFILE=work ki index ./vault` overrides the profile per-invocation. Run `uvx knowledge-index --help` first if you'd rather not install globally.

Per-vault routing is driven by `<vault>/.ki/vault.yaml`. `ki` writes the `uri:` UUID on first ingest; add an optional `description:` to give agents a short routing hint about what this vault is for. Quickest way to set it: `ki index <vault> --description "..."` (or wait for the interactive prompt on the very first `ki index`). The description flows into `Vault.description` on each ingest and powers `ki search --type vault`.

### From a chat app (Claude, ChatGPT, Gemini, Copilot — web / desktop)

Not yet supported. Required MCP server.  On Roadmap

## Roadmap & known limitations

`ki` v0.1 is intentionally scoped. The items below are not bugs — they're explicit deferrals you should know about before betting on it.

### Local Neo4j via Podman — the recommended quick-start

`ki configure` offers three paths: **Local** (Podman), **Aura** (billable cloud), **Existing** (point at a URI you already have).

**Local** runs `neo4j:latest` in a Podman container with the APOC + GenAI plugins enabled, a named volume for persistence, and `--restart unless-stopped`. The full agent-followable runbook — preflight, bring-up, recovery for the three failure modes (container stopped / removed / volume wiped), and teardown — lives in [`references/neo4j-podman.md`](references/neo4j-podman.md). `ki configure → Local` shells out to that path automatically; if you'd rather read what it does first, the runbook is the source of truth.

Prerequisite: `podman` on PATH. On macOS:

```bash
brew install podman
podman machine init
podman machine start
```

On Linux: `apt install podman` / `dnf install podman` / etc. (no `machine` step needed.)

**Aura** — `neo4j-cli aura create` provisions a real billable cloud instance. `ki configure → Aura` walks you through it. See [neo4j-labs/neo4j-cli](https://github.com/neo4j-labs/neo4j-cli).

**Existing** — any Neo4j you can reach over Bolt works. Pick this if you already run Neo4j via Docker, a managed service, or anything else; `ki configure → Existing` just prompts for URI + credentials.

### No vector search yet — fulltext only

`ki search` runs against the `content_search` fulltext index over `Document|Section|Vault.{displayName, content, aliases, description}`. There are no vector indexes or embeddings in the graph yet; hybrid (fulltext + vector) is on the v2 list. The `genai` plugin is already enabled in the Podman setup (see `references/neo4j-podman.md`) for the upgrade path, so when this lands existing vaults won't need to be re-ingested.

Of the ten queries defined in [`docs/retrieval-queries.md`](docs/retrieval-queries.md), three are wired into the CLI today; the rest exist as Cypher but aren't reachable through `ki search` yet:

| Flag                          | Query | What it does                       |
|-------------------------------|-------|------------------------------------|
| `--type section` (default)    | B.2   | Section content fulltext           |
| `--type document`             | B.1   | Document title fulltext            |
| `--type neighbors --doc-uri`  | B.3   | 1-hop `LINKS_TO` neighbourhood     |

The remaining seven retrieval shapes — full-document text, frontmatter + section titles, section get-by-URI, ±N section windowing (full and summary), backlinks, shortest path — are tracked at [#6](https://github.com/zach-blumenfeld/knowledge-index/issues/6).

### Markdown (`.md`) only — convert other formats first

v1 indexes `.md` files only. For PDFs, docx, HTML, or plaintext, convert to markdown first with `pandoc`, `markitdown`, or by reading + transcribing, then run `ki index` on the output folder. See the *PREPARE when* section of [`skills/ki/SKILL.md`](skills/ki/SKILL.md) for the agent-side flow. Native ingest of other formats is on the roadmap.

### No MCP server to work with chat apps — only coding agents work today

Coding agents (Claude Code, Cursor, …) run on your machine and can shell out to `ki` directly. Chat apps (claude.ai, ChatGPT, Gemini, Copilot Web/Desktop) can't — they need an MCP server bridging the chat surface to a local tool. `ki` doesn't ship one yet. Until then, use a coding agent on the same machine, or paste `ki search "..." --json` output into the chat manually.

### Smaller deferrals

These are unlikely to change soon but worth being explicit about:

- **Single-machine ingest, single Neo4j write session.** Concurrent writers would deadlock on shared `MERGE` targets and the throughput at v1 scales doesn't justify the complexity — see [`docs/requirements_v01_mvp.md`](docs/requirements_v01_mvp.md) §Scalability lever 5.
- **No `:Folder` node label.** Hierarchy lives in `Document.uri` and prefix-matches handle subtree queries.
- **Plaintext passwords in `~/.config/ki/config.yaml`** (file mode `0600`). OS keyring integration is the v2 upgrade path.
- **No `--purge`, ever.** `ki` removes data from the index; source files are *always* untouched. See [`AGENTS.md`](AGENTS.md) §Non-negotiable design principles.
- **PyPI is the only supported install.** No Homebrew formula, no `curl | sh`, no standalone binaries.

For everything else on the active roadmap (embeddings, PageRank, MCP server, native non-markdown ingest, …), see the open issues at <https://github.com/zach-blumenfeld/knowledge-index/issues>.

## Development

If you want to hack on `ki` itself (rather than just use it), the loop is `uv sync --extra dev && uv run pytest && uv run ruff check src/ tests/ scripts/`.

### Setup

```bash
git clone https://github.com/zach-blumenfeld/knowledge-index.git
cd knowledge-index
uv sync --extra dev          # installs runtime + pytest + ruff
```

### Run tests

```bash
# Unit tests only — pure Python, no Neo4j needed.
uv run pytest tests/unit -v

# Full suite. Integration tests auto-skip if no Neo4j is reachable.
uv run pytest tests/ -v

# To actually run integration tests, point them at any Neo4j you have
# (Docker, Aura, or a local install):
KI_TEST_NEO4J_URI=bolt://localhost:7687 \
KI_TEST_NEO4J_USER=neo4j \
KI_TEST_NEO4J_PASSWORD=password \
  uv run pytest tests/ -v
```

The integration suite is destructive — it ingests `tests/fixtures/sample_vault/` and `DETACH DELETE`s vaults on teardown. Don't point it at a Neo4j that holds real data.

### Lint

```bash
uv run ruff check src/ tests/ scripts/
```

CI runs the same command on Python 3.11 / 3.12 / 3.13.

### Test fixtures

`tests/fixtures/sample_vault/` is generated by `scripts/gen_test_vault.py`. Don't hand-edit — regenerate:

```bash
rm -rf tests/fixtures/sample_vault
uv run python scripts/gen_test_vault.py --size tiny --seed 42 \
  --output tests/fixtures/sample_vault
```

Same `--seed` → byte-identical output across runs. The generator supports `tiny / small / medium / large` matching the §Scalability envelopes in [`docs/requirements_v01_mvp.md`](docs/requirements_v01_mvp.md).

### Contributing

Before opening a PR:

1. Read [`AGENTS.md`](AGENTS.md) — design principles, project map, and the *Don't* list.
2. Skim [`docs/requirements_v01_mvp.md`](docs/requirements_v01_mvp.md) — anything in there is normative. If your change conflicts with the spec, update the spec in the same PR.
3. If you're changing the schema, update [`docs/data-model.md`](docs/data-model.md) before the code.
4. If you're changing Cypher, update [`docs/ingest-cypher.md`](docs/ingest-cypher.md) or [`docs/retrieval-queries.md`](docs/retrieval-queries.md) before the code — those are the source of truth.
5. If you're changing CLI behavior, keep [`docs/requirements_v01_mvp.md`](docs/requirements_v01_mvp.md), [`skills/ki/SKILL.md`](skills/ki/SKILL.md), and the implementation in lockstep — drift between those is the #1 source of agent-routing bugs.

### Release flow

The release workflow is manual and lives at [`.github/workflows/release.yml`](.github/workflows/release.yml). To cut a release:

1. Bump `version = "..."` in `pyproject.toml`.
2. Add a `## [X.Y.Z] — YYYY-MM-DD` section to [`CHANGELOG.md`](CHANGELOG.md). The heading format is load-bearing — the workflow `awk`-extracts the release notes by matching it exactly.
3. Open a PR, merge to `main`.
4. **Actions** tab → **Release to PyPI** → **Run workflow** → branch `main`.

The workflow refuses to re-release an existing tag (forces a version bump on re-run), builds, publishes via PyPI Trusted Publishing, creates the git tag, and cuts a GitHub Release with body extracted from CHANGELOG.md.

## Learn more

- [`docs/requirements_v01_mvp.md`](docs/requirements_v01_mvp.md) — full design spec (CLI shape, schema, scalability, auto-mode rules)
- [`docs/data-model.md`](docs/data-model.md) — Neo4j schema (nodes, edges, properties)
- [`docs/ingest-cypher.md`](docs/ingest-cypher.md) — what `ki index` writes
- [`docs/retrieval-queries.md`](docs/retrieval-queries.md) — what `ki search` exposes (B.1–B.10)
- [`skills/ki/SKILL.md`](skills/ki/SKILL.md) — agent routing rules (when an agent should invoke `ki`)
- [`AGENTS.md`](AGENTS.md) — for AI agents (or humans) contributing to the codebase
- [`CHANGELOG.md`](CHANGELOG.md) — release history

## License

See [`LICENSE`](LICENSE).
