Metadata-Version: 2.4
Name: stronk-gateway
Version: 0.3.14
Summary: Security-first self-hosted privacy masking and routing proxy for LLM traffic.
Author: EYYCHEEV
License: Apache-2.0
Requires-Python: >=3.11
Requires-Dist: fastapi<1.0,>=0.115
Requires-Dist: httpx<1.0,>=0.27
Requires-Dist: presidio-analyzer<3.0,>=2.2
Requires-Dist: pydantic-settings<3.0,>=2.6
Requires-Dist: pydantic<3.0,>=2.9
Requires-Dist: spacy<4.0,>=3.8
Requires-Dist: uvicorn<1.0,>=0.32
Requires-Dist: websockets<16,>=13
Provides-Extra: dev
Requires-Dist: mypy<2.0,>=1.13; extra == 'dev'
Requires-Dist: pytest-cov<7.0,>=6.0; extra == 'dev'
Requires-Dist: pytest<9.0,>=8.3; extra == 'dev'
Requires-Dist: ruff<1.0,>=0.8; extra == 'dev'
Description-Content-Type: text/markdown

# stronk-gateway

Security-first self-hosted masking and rehydration proxy for large language model traffic.

`stronk-gateway` sits between callers and upstream providers. It detects sensitive input, applies per-type policy, forwards only masked payloads, and rehydrates model output before returning it to the caller.

This repo now also includes a separate operator control plane:

- `proxy-api`: the caller-facing masking proxy
- `admin-api`: sanitized monitoring API
- `admin-ui`: modern light/dark monitoring workspace
- `admin-gateway`: authenticated reverse-proxy entrypoint for the control plane

## Implemented Provider Surfaces

- `POST /v1/chat/completions`
- `POST /v1/responses`
- `GET /v1/responses` websocket upgrade for OpenAI Responses-style realtime clients
- `POST /anthropic/v1/messages`
- `GET /health`

Supported behavior:

- non-streaming request masking and response rehydration
- streaming SSE rehydration for the supported provider families
- downstream OpenAI Responses websocket compatibility for `response.create` and `response.append`
- real `previous_response_id` continuations are preserved for upstream `/responses` requests
- local `generate:false` prewarm stays local and uses memory-only turn-state recovery instead of leaking synthetic response IDs upstream
- request-scoped opaque placeholders
- config-driven policy per detector type: `allow`, `mask`, `block`, `route_local`
- deny-by-default behavior when upstream egress or local routing is unsafe
- sanitized audit-event persistence for the optional control plane

## Websocket Deployment Contract

Supported deployment modes:

- single-process deployment
- multi-process deployment with sticky-session affinity for one websocket turn

Supported reconnect semantics:

- any client harness that can speak OpenAI Responses-style JSON text events can use the websocket bridge; the current `x-codex-turn-state` header name is compatibility carry-over, but the value itself is treated as a generic bounded opaque turn key
- `x-codex-turn-state` is validated locally, kept process-local and memory-only, and never forwarded upstream
- the last committed `previous_response_id` is reused only after a terminal event has been delivered locally
- if a socket disconnects before terminal completion, the next reconnect resumes from the last committed state rather than the aborted in-flight turn
- same-turn collisions are rejected with deterministic `409 invalid_websocket_turn`; they are not serialized
- session-cap overflow is rejected; existing sessions are not evicted

Explicitly unsupported in this plan:

- non-sticky multi-worker websocket continuity
- external shared turn-state storage
- generic websocket passthrough

## Safety Guarantees

- Raw sensitive values are never forwarded upstream when a detection is masked or blocked.
- Raw request bodies, response bodies, rehydrated text, headers, provider credentials, and placeholder maps are not persisted by default.
- Upstream egress is fixed to configured provider base URLs; the caller cannot choose arbitrary upstream targets.
- Secret classes default to `block` rather than `mask`.
- If policy requires `route_local` and no local handler exists, the request is rejected instead of falling back upstream.
- The admin plane is separate from proxy routes and is disabled by default.
- The admin plane now uses first-party session-cookie auth in `stronk-gateway-admin`, while raw content traces remain behind the private same-origin bridge path.

## V1 Privacy Boundary

In-scope privacy actions on caller-controlled, upstream-visible surfaces:

- mask `instructions`
- mask `developer` and `system` message content
- reject top-level `tools` when sensitive text appears in descriptions, examples, or schema defaults
- strip `x-codex-turn-metadata`
- do not forward caller-supplied `x-codex-turn-state`
- preserve `authorization`, `x-api-key`, `openai-*`, `anthropic-*`, and `x-responsesapi-include-timing-metrics` as transport/auth headers rather than privacy-scoped text surfaces
- preserve payload `id` and `previous_response_id` as bounded identifier surfaces
- preserve caller `x-request-id` only when it is already a bounded opaque identifier; otherwise replace it with a proxy-issued opaque request id

Still out of scope in v1:

- neutralizing the public `x-codex-turn-state` header name itself
- masking arbitrary provider-defined identifier fields beyond the explicit in-scope surfaces above

## Control Plane

The operator surface is intentionally read-only in this phase.
Provisioning and materialization happen outside the browser through the active deployment profile (`gateway-only` or `gateway+CLIProxyAPI`).

What it shows:

- request counts, mask/block/rehydration totals, and error counts
- detector and policy action mix
- sanitized per-request events with endpoint, model, latency, counts, and touched JSON paths
- safe config posture and recent control-plane access logs

What it does not show:

- raw request bodies
- raw response bodies
- rehydrated plaintext
- upstream `Authorization` or `X-API-Key` headers
- placeholder-to-original mappings

## Detection Coverage

Deterministic detectors are implemented first and augmented with a local Presidio + spaCy layer by default. The detector interface stays pluggable so richer local NER can still be added later without redesigning the pipeline.

Current detector set:

- English and Chinese person names
- Company and organization names
- English and Chinese addresses
- Emails
- US and China phone numbers
- API keys, including `sk-...`, `sk-proj-...`, `sk-ant-...`, and common provider key prefixes
- Bearer tokens
- JWT-like tokens

## Architecture

- `src/stronk_gateway/redaction/` - detection, masking, placeholder vault, and structured payload traversal
- `src/stronk_gateway/policy/` - per-detector policy resolution
- `src/stronk_gateway/providers/` - fixed provider endpoint specs
- `src/stronk_gateway/proxy/` - upstream transport, header controls, SSE rehydration, and audit writes
- `src/stronk_gateway/admin/` - first-party session auth, SQLite-backed sanitized event store, and UI lookup helpers
- `src/stronk_gateway/app.py` - proxy app factory
- `src/stronk_gateway/admin_app.py` - separate admin app factory
- `web/` - React/Vite operator UI with light and dark mode
- `compose/` - local proxy + admin + Caddy stack
- `docs/` - architecture, threat model, and execution plans

## Quick Start

### Local Python + frontend

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -e .[dev]
python -m spacy download en_core_web_sm
python -m spacy download zh_core_web_sm
make ui-install
make ui-build
make test
```

Run the proxy:

```bash
make run
```

Run the admin API/UI separately:

```bash
STRONK_GATEWAY_AUDIT_STORAGE_ENABLED=true \
STRONK_GATEWAY_AUDIT_DB_PATH=./data/stronk-gateway-audit.sqlite3 \
STRONK_GATEWAY_ENABLE_ADMIN_API=true \
STRONK_GATEWAY_ENABLE_ADMIN_UI=true \
STRONK_GATEWAY_ADMIN_BOOTSTRAP_USER=operator \
STRONK_GATEWAY_ADMIN_BOOTSTRAP_PASSWORD_HASH="$(python - <<'PY'
from stronk_gateway.admin import hash_admin_password
print(hash_admin_password('change-me'))
PY
)" \
STRONK_GATEWAY_ADMIN_UI_DIR=./web/dist \
make admin-run
```

The admin app now owns first-party web login with a bootstrap username plus scrypt password hash. Running `make admin-run` is useful for local session/auth/UI work, but content-trace inspection still depends on the same-origin admin gateway path described below because `/api/debug/*` is bridged to the proxy runtime on a separate private listener.

### Local Docker Compose

1. Choose an operator username:

```bash
export STRONK_GATEWAY_ADMIN_USER=operator
```

2. Generate an admin scrypt password hash:

```bash
export STRONK_GATEWAY_ADMIN_PASSWORD_HASH="$(python - <<'PY'
from stronk_gateway.admin import hash_admin_password
print(hash_admin_password('change-me'))
PY
)"
```

3. Start the stack:

```bash
docker compose -f compose/docker-compose.yml up --build
```

Default local endpoints:

- proxy: `http://127.0.0.1:8787`
- admin UI: `http://127.0.0.1:8788`

Default local behavior:

- sign in through the admin web UI with `STRONK_GATEWAY_ADMIN_USER` plus the plaintext password you hashed locally
- public `:8787` stays inference-only
- raw content traces are visible only after authenticated login through the admin gateway on `:8788`
- the trace bridge is internal-only and does not mount raw trace routes on the public proxy listener

## Release Publishing

The preferred release path now uses GitHub Actions for both GHCR image publishing and PyPI Trusted Publishing. The recorded release flow, plus the Bitwarden-backed local PyPI fallback, lives in [docs/release-publishing.md](/Users/eyy/Documents/Work/Dev/repos/stronk-gateway/docs/release-publishing.md).

The short version:

```bash
cd /Users/eyy/Documents/Work/Dev/repos/stronk-gateway
uv build
BWS_ACCESS_TOKEN="$BWS_STRONK_TERMINAL_ACCESS_TOKEN" \
bws run -- uv publish
```

Keep GHCR publishing in GitHub Actions; do not add local container-registry tokens to this flow.

## Configuration

Core proxy environment variables:

- `STRONK_GATEWAY_OPENAI_UPSTREAM_BASE_URL`
- `STRONK_GATEWAY_ANTHROPIC_UPSTREAM_BASE_URL`
- `STRONK_GATEWAY_ALLOW_INSECURE_UPSTREAMS=false`
- `STRONK_GATEWAY_ALLOW_NONSTANDARD_UPSTREAM_HOSTS=false`
- `STRONK_GATEWAY_ENABLE_DEBUG_MASK_ENDPOINT=false`
- `STRONK_GATEWAY_PRESIDIO_ENABLED=true`
- `STRONK_GATEWAY_PRESIDIO_ENGLISH_MODEL=en_core_web_sm`
- `STRONK_GATEWAY_PRESIDIO_CHINESE_MODEL=zh_core_web_sm`
- `STRONK_GATEWAY_*_ACTION`

Admin plane environment variables:

- `STRONK_GATEWAY_AUDIT_STORAGE_ENABLED=false`
- `STRONK_GATEWAY_AUDIT_DB_PATH`
- `STRONK_GATEWAY_AUDIT_MAX_EVENTS=2000`
- `STRONK_GATEWAY_ENABLE_ADMIN_API=false`
- `STRONK_GATEWAY_ENABLE_ADMIN_UI=false`
- `STRONK_GATEWAY_ADMIN_AUTH_MODE=session`
- `STRONK_GATEWAY_ADMIN_BOOTSTRAP_USER`
- `STRONK_GATEWAY_ADMIN_BOOTSTRAP_PASSWORD_HASH`
- `STRONK_GATEWAY_ADMIN_BOOTSTRAP_ROLES=admin,operator`
- `STRONK_GATEWAY_ADMIN_TRACE_ALLOWED_ROLES=admin,operator`
- `STRONK_GATEWAY_ADMIN_SESSION_COOKIE_NAME=stronk_admin_session`
- `STRONK_GATEWAY_ADMIN_SESSION_IDLE_TTL_SECONDS=1800`
- `STRONK_GATEWAY_ADMIN_SESSION_ABSOLUTE_TTL_SECONDS=43200`
- `STRONK_GATEWAY_ADMIN_LOGIN_ATTEMPT_LIMIT=5`
- `STRONK_GATEWAY_ADMIN_LOGIN_ATTEMPT_WINDOW_SECONDS=900`
- `STRONK_GATEWAY_UNSAFE_DEBUG_GATEWAY_SECRET`
- `STRONK_GATEWAY_ADMIN_ALLOWED_ROLES=admin,operator,auditor`
- `STRONK_GATEWAY_ADMIN_ACCESS_LOG_MAX_ENTRIES=500`
- `STRONK_GATEWAY_ADMIN_UI_DIR=./web/dist`

Default policy:

- `email`, `phone`, `person_name`, `organization`, `address` -> `mask`
- `api_key`, `bearer_token`, `jwt` -> `block`

## Tests

The suite includes:

- unit coverage for detector behavior, overlap resolution, placeholder generation, policy parsing, audit summaries, event-store behavior, and SSE rehydration
- integration coverage for all supported provider endpoints
- websocket regression coverage for OpenAI Responses `response.create`, `response.append`, prewarm, invalid events, and incomplete upstream streams
- websocket regression coverage for real `previous_response_id` continuations, fresh-chain resets, reconnect recovery via `x-codex-turn-state`, and binary-frame rejection
- regression checks proving raw values do not leak into forwarded upstream payloads
- admin-plane coverage for `401/403` auth enforcement and sanitized proxy-to-admin event flow
- streaming tests for OpenAI chat, OpenAI responses, and Anthropic messages
- bypass and collision canaries including zero-width-key variants and literal placeholder collisions

## Benchmark Evidence

The local benchmark harness lives at `python3 scripts/bench_proxy.py` and writes row artifacts under `docs/exec-plans/active/privacy-proxy-hardening-and-scalability-v1/artifacts/benchmarks/`.

Measured local rows on `2026-03-28`:

- `W1-http-chat`: `15.27` requests/s, p95 latency `10649 ms`, zero failures, blocked only because no historic pre-change baseline was captured
- `W2-http-responses`: `14.33` requests/s, p95 latency `11030 ms`, zero failures, blocked only because no historic pre-change baseline was captured
- `W3-sse-stream`: `14.47` streams/s, p95 first-byte `4863 ms`, zero leak counters, blocked only because no historic pre-change baseline was captured
- `W4-ws-sequential`: `80` successful turns at `16` sessions x `5` turns, blocked because the black-box reconnect probe still ended with `ConnectionClosedError` instead of deterministically proving rollback semantics
- `W5-ws-abuse`: all `24/24` scripted abusive requests were rejected and all `4/4` blocked-budget sockets closed, but the row remains blocked because black-box evidence cannot prove internal pre-parse and pre-redaction ordering
- `W6-audit-contention`: `256.97` requests/s, p95 latency `1577 ms`, zero `database is locked` failures, blocked on the missing historic baseline and because admin access logging means the read side is not a pure-read workload
- `W7-memory-bound`: `9.80` requests/s, `40` truncation hits observed, blocked because the plan never encoded a numeric RSS envelope even though capture limits and truncation markers were exercised

These are local harness measurements, not an SLA. The repo does not claim a p95 regression comparison against pre-change behavior because that historic baseline was not captured before the hardening work.

## Stronger Than PasteGuard

This repo now claims stronger behavior only where it is implemented and tested:

- Explicit OpenAI `responses` endpoint coverage, not just chat completions.
- OpenAI `responses` websocket compatibility on the same public path used by Codex-style clients.
- Fixed upstream egress targets with request-header allowlists and redirect refusal.
- Official upstream host pinning is on by default; non-standard compatible hosts require explicit opt-in.
- Default secret handling is `block`, not best-effort masking.
- Raw detection values are not reflected back through the debug path.
- Streaming rehydration is covered for all three supported provider surfaces.
- Rehydration is limited to human-readable assistant text paths; tool arguments stay masked by default.
- Regression tests cover placeholder collisions, bypass attempts, and `/responses` as a first-class path.
- The monitoring plane is separate from proxy routes, disabled by default, and stores sanitized telemetry only.

## Threat Model Summary

- Caller credentials for upstream providers are part of the data plane and must not be reused for admin authentication.
- The admin plane now uses first-party session-cookie auth inside `stronk-gateway-admin`, backed by a bootstrap username plus scrypt password hash.
- Raw content traces stay memory-only in the proxy runtime and are reachable only through the loopback/private admin gateway path plus the shared bridge secret.
- The admin gateway should stay loopback-bound or privately networked by default. If you widen it, add real TLS and network controls first.
- The SQLite event store persists sanitized events only. It is not a safe place for raw prompts, raw completions, or placeholder vault state.

## Known Limitations

- Name, organization, and address detection is heuristic. It is materially stronger than regex-only email and key detection, but it is not equivalent to a full local NER model.
- Presidio + spaCy are enabled by default in this repo. Fresh environments must install `en_core_web_sm` and `zh_core_web_sm` or explicitly disable Presidio.
- `route_local` is a clean policy boundary today, but the local-model execution path is still a scaffold and fails closed.
- WebSocket support is intentionally scoped to OpenAI Responses-style JSON text events. `stronk-gateway` is not a generic websocket tunnel and does not currently support binary or audio frames.
- The websocket bridge keeps the upstream side on HTTP plus Server-Sent Events (SSE). It does not yet proxy upstream websocket transports.
- The public `x-codex-turn-state` header name is a compatibility carry-over. Its semantics are harness-neutral, but the header name itself is not yet neutralized.
- Websocket continuity still depends on single-process deployment or sticky-session affinity for one turn. Non-sticky multi-worker continuity and shared external turn-state remain out of scope here.
- The admin plane is read-only in this phase. There is no browser-based policy editor or request replay workflow.
- The admin backend should stay on a private or loopback-bound network. The shared trace-bridge secret is a second trust signal, not a substitute for network boundaries and TLS.
- The frontend build currently uses npm-managed assets and should be built as part of image creation or CI before enabling the admin UI.
