Metadata-Version: 2.4
Name: contxt-box
Version: 0.1.1
Summary: Local-first MCP-compatible persistent knowledge base and media contextualization system.
Project-URL: Homepage, https://github.com/Oshadha345/contxt-box
Project-URL: Repository, https://github.com/Oshadha345/contxt-box
Project-URL: Issues, https://github.com/Oshadha345/contxt-box/issues
Author: ConTXT BOX contributors
License: MIT
License-File: LICENSE
Keywords: ai,context,knowledge-base,local-first,mcp,media
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Text Processing :: Indexing
Requires-Python: >=3.12
Requires-Dist: mcp>=1.2.0
Requires-Dist: orjson>=3.10.7
Requires-Dist: pydantic>=2.9.0
Requires-Dist: structlog>=24.4.0
Requires-Dist: typer>=0.12.5
Requires-Dist: watchdog>=4.0.2
Provides-Extra: all
Requires-Dist: chromadb>=0.5.5; extra == 'all'
Requires-Dist: docling>=2.0.0; extra == 'all'
Requires-Dist: markitdown[all]>=0.1.0; extra == 'all'
Requires-Dist: sentence-transformers>=3.0.1; extra == 'all'
Provides-Extra: dev
Requires-Dist: black>=24.8.0; extra == 'dev'
Requires-Dist: mypy>=1.11.2; extra == 'dev'
Requires-Dist: pytest>=8.3.2; extra == 'dev'
Requires-Dist: ruff>=0.6.4; extra == 'dev'
Provides-Extra: media
Requires-Dist: docling>=2.0.0; extra == 'media'
Requires-Dist: markitdown[all]>=0.1.0; extra == 'media'
Provides-Extra: vector
Requires-Dist: chromadb>=0.5.5; extra == 'vector'
Requires-Dist: sentence-transformers>=3.0.1; extra == 'vector'
Description-Content-Type: text/markdown

<div align="center">
  <img src="./logo/logo.svg" alt="ConTXT BOX" width="720">

  <h1>ConTXT BOX</h1>

  <p><strong>A local-first external context box for coding agents.</strong></p>

  <p>
    <a href="https://github.com/Oshadha345/contxt-box/actions"><img alt="CI" src="https://img.shields.io/github/actions/workflow/status/Oshadha345/contxt-box/ci.yml?style=flat-square"></a>
    <a href="https://pypi.org/project/contxt-box/"><img alt="Python" src="https://img.shields.io/badge/python-3.12%2B-3776AB?style=flat-square&logo=python&logoColor=white"></a>
    <a href="https://github.com/modelcontextprotocol/python-sdk"><img alt="MCP" src="https://img.shields.io/badge/MCP-ready-111827?style=flat-square"></a>
    <a href="https://github.com/microsoft/markitdown"><img alt="MarkItDown" src="https://img.shields.io/badge/MarkItDown-primary-2563EB?style=flat-square"></a>
    <a href="https://github.com/docling-project/docling"><img alt="Docling" src="https://img.shields.io/badge/Docling-primary-059669?style=flat-square"></a>
    <a href="./LICENSE"><img alt="License" src="https://img.shields.io/badge/license-MIT-black?style=flat-square"></a>
  </p>
</div>

---

## What Is It?

ConTXT BOX is a strict, local-first knowledge layer that sits beside any project or document folder. It gives coding agents such as Claude Code, Codex, Cursor, and other MCP clients a fast external memory: indexed filenames, folders, neighbors, summaries, cached document/image context, and durable chat preservation.

The design is intentionally narrow. Documents and images are the core path because they cover most real user context. Heavy extraction uses exactly one configured engine: [MarkItDown](https://github.com/microsoft/markitdown) or [Docling](https://github.com/docling-project/docling). No multi-tool fallback chain is used in core extraction.

## Features

- Lazy indexing with `rel_path`, filename, folder, mtime, size, type, neighbors, folder summaries, and cheap file summaries.
- On-demand extraction only through MarkItDown or Docling.
- Permanent Markdown sidecars under `.contextbox/history/media/`.
- MCP tools for coding agents.
- Watchdog-based `watch` command for continuous index updates.
- Preview-only smart reorganization.
- Auto preservation into `.contextbox/CONTEXT.md` plus JSONL history.

## Quick Start

```bash
uv sync
uv run contxtbox --help
uv run contxtbox init --root "S:\Papers"
uv run contxtbox config-show --root "S:\Papers"
uv run contxtbox index --root "S:\Papers"
uv run contxtbox health --root "S:\Papers"
uv run contxtbox search "computer vision" --root "S:\Papers"
```

Install the document/image engines:

```bash
uv sync --extra media
```

Extract one file with the configured engine (`markitdown` by default):

```bash
uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers"
```

Use Docling explicitly:

```bash
uv run contxtbox extract-media "Computer Vision\paper.pdf" --root "S:\Papers" --engine docling
```

Watch a folder:

```bash
uv run contxtbox watch --root "S:\Papers"
```

Run production readiness checks:

```bash
uv run contxtbox health --root "S:\Papers" --fail-on-error
```

Show the effective workspace config:

```bash
uv run contxtbox config-show --root "S:\Papers"
```

Production and MCP setup guides:

- [Installation](docs/INSTALL.md)
- [Production readiness](docs/PRODUCTION.md)
- [MCP client setup](docs/MCP_CLIENTS.md)
- [Client verification](docs/CLIENT_VERIFICATION.md)

## How It Works

```text
workspace/
`-- .contextbox/
    |-- index.json
    |-- config.toml
    |-- CONTEXT.md
    |-- preservation.jsonl
    `-- history/
        `-- media/
            `-- sanitized__file__path.context.md
```

### Indexing Rules

`index`, `update_index`, and `watch` always record:

- `rel_path`
- `filename`
- `folder_path`
- `mtime`
- `size`
- `file_type`
- `neighbors`
- `parent_folder_summary`
- `last_indexed`
- `context_summary`

The default summary is cheap and deterministic. It is built from the filename, the folder name, and 5-7 nearby files, and indexing never opens PDFs or images.
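
To make the indexing rules concrete, here is a minimal sketch of how one record with the fields above could be assembled from filesystem metadata alone. It is an illustration, not the project's implementation: the function names `cheap_summary` and `index_entry`, the summary wording, and the choice of sorted sibling files as neighbors are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path


def cheap_summary(path: Path, neighbors: list[str]) -> str:
    """Build a deterministic summary from names only; never reads file content."""
    folder = path.parent.name or "."
    nearby = ", ".join(neighbors) if neighbors else "no neighbors"
    return f"{path.name} in {folder}/ (near: {nearby})"


def index_entry(root: Path, path: Path, max_neighbors: int = 7) -> dict:
    """Return one index record using only stat() data and directory listings."""
    stat = path.stat()
    # Assumption: neighbors are sibling files, sorted so the output is stable.
    siblings = sorted(
        p.name for p in path.parent.iterdir() if p.is_file() and p != path
    )
    neighbors = siblings[:max_neighbors]
    rel = path.relative_to(root)
    return {
        "rel_path": rel.as_posix(),
        "filename": path.name,
        "folder_path": rel.parent.as_posix(),
        "mtime": stat.st_mtime,
        "size": stat.st_size,
        "file_type": path.suffix.lower().lstrip("."),
        "neighbors": neighbors,
        "parent_folder_summary": f"{path.parent.name}/ with {len(siblings) + 1} files",
        "last_indexed": datetime.now(timezone.utc).isoformat(),
        "context_summary": cheap_summary(path, neighbors),
    }
```

Because the record depends only on names, sizes, and timestamps, building it costs about the same for a 200-page PDF as for a one-line text file.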

### Configuration

`init` creates `.contextbox/config.toml`:

```toml
extraction_engine = "markitdown"
max_inline_bytes = 512000
large_file_bytes = 50000000
max_neighbors = 10
debounce_seconds = 2.0

ignored_dirs = [
  ".git",
  ".venv",
  "node_modules",
]

priority_folders = [
  "codebases/",
  "research/",
  "specs/",
  "decisions/",
  "assets/images/",
]
```

Set `extraction_engine = "docling"` to make Docling the single extraction engine.

### Extraction Rules

Heavy extraction runs only when:

- the `extract-media` command is run on a path, or
- an MCP client calls `get_file(path, depth="full")`.

The result is cached as Markdown in `.contextbox/history/media/`, and `index.json` receives:

- `extracted_at`
- `context_ref`
- `extraction_method`
- `extraction_status`
- `extraction_warnings`
- `extraction_duration_seconds`

Each sidecar starts with an audit header carrying the same fields, followed by the extracted content. Status values are conservative: `success`, `partial`, `metadata-only`, or `cached`.
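
The extract-once, then-serve-from-cache behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions: the sidecar naming scheme (path separators flattened to `__`, inferred from the layout diagram), the helper names `sidecar_name` and `get_full_context`, and the `engine` callable standing in for MarkItDown or Docling are all hypothetical, and the audit header and `index.json` update are omitted.

```python
import re
from pathlib import Path
from typing import Callable


def sidecar_name(rel_path: str) -> str:
    """Flatten a relative path into a single filename (assumed naming scheme)."""
    parts = re.split(r"[\\/]+", rel_path.strip("\\/"))
    safe = [re.sub(r"[^A-Za-z0-9._-]", "_", part) for part in parts]
    return "__".join(safe) + ".context.md"


def get_full_context(
    workspace: Path,
    rel_path: str,
    engine: Callable[[Path], str],
    force: bool = False,
) -> tuple[str, str]:
    """Return (status, markdown); the engine runs only on a cache miss or force."""
    media_dir = workspace / ".contextbox" / "history" / "media"
    sidecar = media_dir / sidecar_name(rel_path)
    if sidecar.exists() and not force:
        # Cache hit: no extraction engine is invoked.
        return "cached", sidecar.read_text(encoding="utf-8")
    markdown = engine(workspace / rel_path)
    media_dir.mkdir(parents=True, exist_ok=True)
    sidecar.write_text(markdown, encoding="utf-8")
    return "success", markdown
```

Passing the engine in as a callable keeps the sketch consistent with the single-engine rule: the caller picks exactly one converter, and there is no fallback path inside the function.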

### MCP Tools

- `update_index()`
- `server_info()`
- `health()`
- `search(query, limit=10)`
- `get_file(path, depth="metadata" | "full")`
- `pull_context(task, limit=5)`
- `extract_media(path, force=false)`
- `reorganize(instruction)`
- `auto_preserve_context(summary, metadata=null)`

Start the MCP server:

```bash
uv run contxtbox mcp --root "S:\Papers"
```
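
For a quick check outside an editor integration, the tools can also be called from a script with the MCP Python SDK client. The sketch below assumes the server speaks the stdio transport when launched with the command above and that `uv` is on `PATH`; neither is confirmed here, so see the MCP client setup guide for the supported configuration. The helper names `server_command`, `search_box`, and `run_search` are illustrative only.

```python
import asyncio


def server_command(root: str) -> tuple[str, list[str]]:
    """Command and arguments that launch the server, matching the line above."""
    return "uv", ["run", "contxtbox", "mcp", "--root", root]


async def search_box(root: str, query: str, limit: int = 10):
    """Start the server over stdio (assumed transport) and call the search tool."""
    # Imported lazily so server_command() works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    command, args = server_command(root)
    params = StdioServerParameters(command=command, args=args)
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(
                "search", arguments={"query": query, "limit": limit}
            )


def run_search(root: str, query: str):
    """Blocking convenience wrapper for scripts and REPL use."""
    return asyncio.run(search_box(root, query))
```

Calling `run_search(r"S:\Papers", "computer vision")` returns the SDK's tool result object; its `content` list holds whatever the server sends back for `search`.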

## Attribution

- [Model Context Protocol Python SDK](https://github.com/modelcontextprotocol/python-sdk), MIT.
- [MarkItDown](https://github.com/microsoft/markitdown), MIT.
- [Docling](https://github.com/docling-project/docling), MIT.
- [watchdog](https://github.com/gorakhargosh/watchdog), Apache-2.0.
- [sentence-transformers](https://github.com/huggingface/sentence-transformers), Apache-2.0 library with model-specific licenses.
- [ChromaDB](https://github.com/chroma-core/chroma), Apache-2.0.
- [gstack](https://github.com/garrytan/gstack), MIT, as workflow inspiration.
- [Ponytail](https://github.com/DietrichGebert/ponytail), MIT, as minimal-agent behavior inspiration.

## Roadmap

- Stronger semantic search over sidecars.
- Reorganization scoring based on folder summaries and neighbor cues.
- MCP client recipes for Claude Code, Codex, Cursor, and others.
- Safe apply/undo flow for reorganization.
- Configurable ignore rules and extraction engine policy.

## License

MIT. See [LICENSE](./LICENSE).

## Release

PyPI publishing is configured for Trusted Publishing through GitHub Actions. See
[Production readiness](docs/PRODUCTION.md#public-release).
