Metadata-Version: 2.4
Name: mcp-glovebox
Version: 0.1.2
Summary: MCP server for read-only sensitive data: tools return metadata and aggregates only
Author-email: maker-nathan <nathan@makernet.work>
License-Expression: MIT
Project-URL: Homepage, https://gitlab.com/maker-nathan/glovebox
Project-URL: Documentation, https://gitlab.com/maker-nathan/glovebox/-/tree/main/docs
Project-URL: Source, https://gitlab.com/maker-nathan/glovebox
Project-URL: Bug Tracker, https://gitlab.com/maker-nathan/glovebox/-/issues
Keywords: mcp,model-context-protocol,llm,security,sensitive-data,privacy,data-governance
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Filesystems
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.27.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.24.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0.0; extra == "dev"
Requires-Dist: pyyaml>=6.0.0; extra == "dev"
Requires-Dist: faker>=25.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs<2,>=1.6; extra == "docs"
Requires-Dist: mkdocs-material<10,>=9.5; extra == "docs"
Provides-Extra: eval
Requires-Dist: litellm>=1.55.0; extra == "eval"
Requires-Dist: openai>=1.0.0; extra == "eval"
Requires-Dist: anthropic>=0.40.0; extra == "eval"
Dynamic: license-file

# Glovebox

Glovebox is a [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) server that exposes **read-only** access to sensitive files mounted at a fixed path inside a container. Tools return **metadata and aggregates only** (directory listings, file stats, regex match counts, line numbers, CSV dimensions, line counts)—not raw file contents.

The design follows the “glovebox / cleanroom” idea: data stays in a bounded environment; the model receives structured summaries through tool results, not a dump of secrets.

**Full documentation:** browse the markdown on [GitLab](https://gitlab.com/maker-nathan/glovebox/-/tree/main/docs) ([index](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/index.md)), or build the MkDocs site locally with `pip install -e '.[docs]' && mkdocs serve`.

## Quick start (pip)

```bash
pip install mcp-glovebox
```

Set the root and start the MCP server on stdio:

```bash
GLOVEBOX_ROOT=/path/to/sensitive glovebox
```

Pre-flight check before wiring a client:

```bash
GLOVEBOX_ROOT=/path/to/sensitive glovebox --doctor
glovebox --print-config    # resolved JSON snapshot for automation
```

Configure your MCP client to run `glovebox` with `GLOVEBOX_ROOT` set. Copy-paste JSON snippets for Claude Desktop and Cursor are on the [MCP client examples](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/getting-started/mcp-examples.md) page.
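
As a rough illustration of what such a client entry can look like, the sketch below builds one in Python and prints it as JSON. The `mcpServers` / `command` / `args` / `env` layout is the shape commonly used by Claude Desktop and Cursor; the server label `glovebox` and the helper name `client_config` are assumptions made for this example. The linked examples page remains the authoritative source.

```python
import json


def client_config(root: str, command: str = "glovebox") -> dict:
    """Build a client entry that launches Glovebox over stdio.

    Hypothetical helper: the key layout follows the common `mcpServers`
    convention and is not taken from the Glovebox docs.
    """
    return {
        "mcpServers": {
            "glovebox": {  # arbitrary label shown in the client UI
                "command": command,
                "args": [],
                "env": {"GLOVEBOX_ROOT": root},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(client_config("/path/to/sensitive"), indent=2))
```

Paste the printed object into your client's MCP configuration file and adjust the path.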

## Quick start (Docker)

> **Architecture note:** the published image targets `linux/arm64` (Apple Silicon, AWS Graviton). It runs natively on Mac M-series and arm64 Linux. For x86-64 hosts, build locally (see "Build it yourself" under [Releases and versioning](#releases-and-versioning)).

Pull the published image:

```bash
docker pull touchthesun/glovebox:0.1.1
```

Or build locally:

```bash
docker build -t glovebox:local .
```

Run the MCP server on stdio (required for most MCP clients). Mount your sensitive directory **read-only** at `/glovebox/data`. Use the hardened form for any deployment against real sensitive data:

```bash
docker run --rm -i \
  --read-only \
  --tmpfs /tmp \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  -v /path/to/sensitive:/glovebox/data:ro \
  touchthesun/glovebox:0.1.1
```

`-i` keeps stdin open so the client can speak MCP over stdio. See [Security defaults](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/security/defaults.md) for a full audit of what each flag does and what you get by default without them.

Configure your MCP client to launch this command. Copy-paste JSON templates are in [MCP client examples](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/getting-started/mcp-examples.md); broader integration notes live in [Integration](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/getting-started/integration.md).
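
If you generate client configuration from a script, the hardened command can be assembled as an argument list. This is a minimal sketch: the flags and the image tag are copied from the command above, while the function names and the `mcpServers` wrapper are assumptions for illustration only.

```python
def hardened_docker_args(
    host_dir: str, image: str = "touchthesun/glovebox:0.1.1"
) -> list[str]:
    """Return the arguments passed to `docker` for the hardened stdio run."""
    return [
        "run", "--rm", "-i",                    # -i keeps stdin open for MCP
        "--read-only",                          # read-only container filesystem
        "--tmpfs", "/tmp",                      # writable scratch space only
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "-v", f"{host_dir}:/glovebox/data:ro",  # read-only data mount
        image,
    ]


def docker_client_entry(host_dir: str) -> dict:
    """Wrap the command in a hypothetical `mcpServers` client entry."""
    return {
        "mcpServers": {
            "glovebox": {"command": "docker", "args": hardened_docker_args(host_dir)}
        }
    }
```

Keeping the flags in one function makes it harder to drop `--read-only` or the `:ro` suffix by accident when the command is copied between configs.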

Before attaching a client, sanity-check the host directory you plan to mount. This uses the pip-installed `glovebox` command from the pip quick start:

```bash
GLOVEBOX_ROOT=/path/to/sensitive glovebox --doctor
```

## Environment variables

| Variable | Default | Meaning |
|----------|---------|---------|
| `GLOVEBOX_ROOT` | `/glovebox/data` | Directory all tool paths are relative to |
| `GLOVEBOX_MAX_OUTPUT_BYTES` | `256000` | Upper bound on JSON size for a tool result |
| `GLOVEBOX_MAX_SEARCH_FILE_BYTES` | `1048576` | Files larger than this are rejected for search/aggregate |
| `GLOVEBOX_SEARCH_BUDGET` | `100` | Per-session ceiling on search calls (`<=0` disables); bounds oracle-style reconstruction of file contents through repeated queries |
| `GLOVEBOX_MIN_CELL` | `5` | Small-cell suppression threshold; counts below this are returned as `"<k"` to reduce re-identification risk |
| `GLOVEBOX_MIN_FILE_ROWS` | `0` (off) | Refuse search/aggregate on files below N rows/lines |
| `GLOVEBOX_REDACT_FILENAMES` | `0` (off) | Hash `name` fields in `glovebox_list` responses instead of returning real filenames. Enable when filenames in the mount are themselves sensitive (e.g. `patient_HIV_positive.pdf`). **Trade-off:** directory-listing navigation is disabled; directed-analysis workflows (explicit paths) are unaffected. |
| `GLOVEBOX_AUDIT_LOG` | _(stderr only)_ | Append JSONL audit records to this file path in addition to stderr |

## Built-in tools

- **`glovebox_list`** — List directory entries (name, type, size, mtime). No file contents.
- **`glovebox_stat`** — Metadata for one path. No file contents.
- **`glovebox_search`** — Regex search: `count_matches` or `line_numbers_only`. Never returns matching line text.
- **`glovebox_aggregate`** — `csv`: row and column counts; `text`: line count only. Never returns cell or line contents.
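
The no-contents rule for `glovebox_search` can be pictured with a small stand-in that works on an in-memory string. This is a sketch, not the server's code: the function name, the result keys (`count`, `line_numbers`), 1-based line numbering, and counting every match rather than every matching line are all assumptions.

```python
import re


def search_text(text: str, pattern: str, mode: str = "count_matches") -> dict:
    """Return only a count or line numbers, never the matching text."""
    rx = re.compile(pattern)
    lines = text.splitlines()
    if mode == "count_matches":
        # Count every non-overlapping match on every line.
        total = sum(1 for line in lines for _ in rx.finditer(line))
        return {"mode": mode, "count": total}
    if mode == "line_numbers_only":
        # Report which lines match, using 1-based numbering.
        hits = [i for i, line in enumerate(lines, start=1) if rx.search(line)]
        return {"mode": mode, "line_numbers": hits}
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    sample = "user=alice\ntoken=abc123\nuser=bob user=carol\n"
    print(search_text(sample, r"user="))
    print(search_text(sample, r"user=", "line_numbers_only"))
```

In both modes the result is built from integers only, so no part of a matching line can reach the caller.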

Paths are **relative** to `GLOVEBOX_ROOT`; absolute paths and escapes outside the mount are rejected.
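
One common way to enforce this kind of containment is to resolve the candidate path and check that it still sits under the resolved root. The sketch below shows that technique with `pathlib`; the helper name `resolve_inside` is hypothetical, and Glovebox's own check may differ in detail.

```python
from pathlib import Path


def resolve_inside(root: str, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting absolute paths and escapes."""
    if Path(relative).is_absolute():
        raise ValueError("absolute paths are rejected")
    base = Path(root).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a link
    # that points outside the root is caught by the check below.
    candidate = (base / relative).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the root")
    return candidate
```

Comparing resolved paths rather than raw strings is what catches inputs such as `a/../../x`, which contain no leading `..` but still leave the root.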

## Threat model (summary)

**Primary defence — keep secrets out of filenames.** Tool responses return metadata verbatim (filenames, directory structure, sizes, mtimes). Do not encode sensitive information in file or directory names; Glovebox protects file *contents*, not metadata. If you cannot control filenames, set `GLOVEBOX_REDACT_FILENAMES=1` — see the env vars table above for the trade-off.
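
To show what hashing the `name` field means in practice, the sketch below replaces a filename with a short digest while leaving the other listing fields alone. The hash function (SHA-256), the 16-character truncation, the optional salt, and both helper names are assumptions; this README does not specify the scheme Glovebox uses.

```python
import hashlib


def redact_name(name: str, salt: bytes = b"") -> str:
    """Map a filename to a stable, opaque token (hypothetical scheme)."""
    digest = hashlib.sha256(salt + name.encode("utf-8")).hexdigest()
    return digest[:16]


def redact_entry(entry: dict, salt: bytes = b"") -> dict:
    """Return a copy of a listing entry with only `name` replaced."""
    return {**entry, "name": redact_name(entry["name"], salt)}


if __name__ == "__main__":
    entry = {"name": "patient_HIV_positive.pdf", "type": "file", "size": 48213}
    print(redact_entry(entry))
```

An unsalted hash of a guessable filename can be confirmed by hashing candidate names, so a scheme like this hides names from casual reading rather than from a determined guesser.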

Glovebox reliably answers **count and frequency questions** (how many rows match this pattern? which lines contain this credential?) without field values entering model context. Segmented analysis over known categories is possible with multiple search calls, one per category. Statistical aggregates, value discovery, and open-ended exploration are outside what the current tools can do; they depend on the constrained-computation tier on the roadmap. See the [use-case boundary analysis](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/use-cases/boundary-analysis.md) for a full ✓/≈/✗ breakdown across PII and code-audit scenarios.

Glovebox is **one control** in a larger compliance story: it minimizes what crosses into the model context but does not govern LLM providers, compromised hosts, or malicious MCP clients. See the full [threat model](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/security/threat-model.md) and [harness non-goals](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/harness/non-goals.md).

## Development

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
pytest                                          # CI also runs scripts/export_tool_manifest.py --check
```

Run the MCP server locally (stdio):

```bash
GLOVEBOX_ROOT=/path/to/data python -m glovebox
```

Pre-flight diagnostics:

```bash
GLOVEBOX_ROOT=/path/to/data python -m glovebox --doctor
python -m glovebox --print-config               # resolved JSON snapshot for automation
```

Documentation site locally:

```bash
pip install -e '.[docs]'
mkdocs serve
```

## Releases and versioning

Releases follow [Semantic Versioning](https://semver.org/). See [CHANGELOG.md](https://gitlab.com/maker-nathan/glovebox/-/blob/main/CHANGELOG.md) for the full history.

**Tagging convention:** a git tag `v0.1.0` produces Docker images tagged `0.1.0` and `latest`. Users should pin to the versioned tag, not `latest`.

```bash
docker pull touchthesun/glovebox:0.1.1     # pinned — recommended
docker pull touchthesun/glovebox:latest     # floating — only for local dev
```

> **Architecture:** published images target `linux/arm64` (Apple Silicon, AWS Graviton). x86-64 users should build locally.

**To cut a release:**

1. Update `version` in `pyproject.toml` and `src/glovebox/_version.py` to match.
2. Add a release entry to `CHANGELOG.md`.
3. Commit, then push a semver tag:
   ```bash
   git tag v0.1.0 && git push origin v0.1.0
   ```
4. In CI, manually trigger `docker_push_hub` (and `docker_push` for the GitLab registry). Both jobs require `DOCKERHUB_USER` / `DOCKERHUB_TOKEN` CI variables to be set.

**Build it yourself** (required for x86-64; always available as a fallback):

```bash
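# Build an image for the host architecture
docker build -t glovebox:local .
# Optional: tag and push (requires push rights to the target repository)
docker tag glovebox:local touchthesun/glovebox:0.1.0
docker push touchthesun/glovebox:0.1.0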
```

## Adding tools

Use the [glovebox-tool](https://gitlab.com/maker-nathan/glovebox/-/blob/main/.cursor/skills/glovebox-tool/SKILL.md) Cursor skill and the [templates/tool](https://gitlab.com/maker-nathan/glovebox/-/tree/main/templates/tool) template. New tools must preserve the no-leak contract and add contract tests. Run `python scripts/validate_tools.py` before committing, then `python scripts/export_tool_manifest.py --write` when tool schemas change.

## Contributing and security

See [CONTRIBUTING.md](https://gitlab.com/maker-nathan/glovebox/-/blob/main/CONTRIBUTING.md) for development setup, the no-leak contract, and the PR checklist. To report a security vulnerability, follow the process in [SECURITY.md](https://gitlab.com/maker-nathan/glovebox/-/blob/main/SECURITY.md) — do not open a public issue.

## Evaluation harness

The [harness](https://gitlab.com/maker-nathan/glovebox/-/tree/main/harness) directory runs four layers of scenarios (tool surface, LLM behavior, inference, evidence). See [Harness overview](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/harness/index.md), [harness roadmap](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/harness/roadmap.md), and [CI semantics](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/harness/ci-semantics.md).

Optional Falco-sidecar notes: [Hardening](https://gitlab.com/maker-nathan/glovebox/-/blob/main/docs/hardening/index.md).
