Metadata-Version: 2.4
Name: crprotocol
Version: 2.3.0
Summary: Context Relay Protocol — unbounded context, unbounded generation, amplified reasoning for LLMs
Project-URL: Homepage, https://crprotocol.io
Project-URL: Documentation, https://crprotocol.io
Project-URL: Repository, https://github.com/Constantinos-uni/context-relay-protocol
Project-URL: Issues, https://github.com/Constantinos-uni/context-relay-protocol/issues
Project-URL: Changelog, https://github.com/Constantinos-uni/context-relay-protocol/blob/main/CHANGELOG.md
Author: Constantinos Vidiniotis
License-Expression: LicenseRef-Elastic-2.0
License-File: LICENSE.md
License-File: NOTICE
Keywords: context,crp,llm,protocol,relay
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Provides-Extra: cli
Requires-Dist: click<9,>=8.0; extra == 'cli'
Provides-Extra: dev
Requires-Dist: mypy<2,>=1.5; extra == 'dev'
Requires-Dist: pip-audit<3,>=2.6; extra == 'dev'
Requires-Dist: pytest-asyncio<1,>=0.21; extra == 'dev'
Requires-Dist: pytest-cov<6,>=4.1; extra == 'dev'
Requires-Dist: pytest<9,>=7.4; extra == 'dev'
Requires-Dist: ruff<1,>=0.1; extra == 'dev'
Provides-Extra: full
Requires-Dist: anthropic<2,>=0.7; extra == 'full'
Requires-Dist: blake3<2,>=0.3; extra == 'full'
Requires-Dist: click<9,>=8.0; extra == 'full'
Requires-Dist: cryptography<44,>=41.0; extra == 'full'
Requires-Dist: gliner<2,>=0.2; extra == 'full'
Requires-Dist: hnswlib<1,>=0.7; extra == 'full'
Requires-Dist: httpx<1,>=0.24; extra == 'full'
Requires-Dist: igraph<1,>=0.11; extra == 'full'
Requires-Dist: leidenalg<1,>=0.10; extra == 'full'
Requires-Dist: openai<3,>=1.0; extra == 'full'
Requires-Dist: prometheus-client<1,>=0.17; extra == 'full'
Requires-Dist: sentence-transformers<4,>=2.2; extra == 'full'
Requires-Dist: spacy<4,>=3.5; extra == 'full'
Provides-Extra: nlp
Requires-Dist: gliner<2,>=0.2; extra == 'nlp'
Requires-Dist: hnswlib<1,>=0.7; extra == 'nlp'
Requires-Dist: sentence-transformers<4,>=2.2; extra == 'nlp'
Requires-Dist: spacy<4,>=3.5; extra == 'nlp'
Provides-Extra: security
Requires-Dist: blake3<2,>=0.3; extra == 'security'
Requires-Dist: cryptography<44,>=41.0; extra == 'security'
Description-Content-Type: text/markdown

<!--
  Copyright (c) 2026 Constantinos Vidiniotis. All rights reserved.
  Licensed under the terms described in LICENSE.md in the root of this repository.
-->

<p align="center">
  <img src="media/logo.svg" alt="CRP Logo" width="200" />
</p>

<h1 align="center">Context Relay Protocol (CRP)™</h1>

<p align="center">
  <strong>An open protocol for structured context management across LLM invocations.</strong>
</p>

<p align="center">
  <a href="LICENSE.md"><img src="https://img.shields.io/badge/Spec-CC_BY--SA_4.0-blue.svg" alt="Spec: CC BY-SA 4.0"></a>
  <a href="LICENSE.md"><img src="https://img.shields.io/badge/Code-ELv2-orange.svg" alt="Code: Elastic License 2.0"></a>
  <img src="https://img.shields.io/badge/Spec_Version-2.2.0-brightgreen.svg" alt="Spec Version: 2.2.0">
  <img src="https://img.shields.io/badge/RFC_2119-Conformant-orange.svg" alt="RFC 2119">
  <img src="https://img.shields.io/badge/Language_Neutral-JSON_Schema-yellow.svg" alt="Language Neutral">
  <img src="https://img.shields.io/badge/Status-Specification_Complete-green.svg" alt="Status: Specification Complete">
  <a href="https://github.com/Constantinos-uni/context-relay-protocol/actions"><img src="https://img.shields.io/github/actions/workflow/status/Constantinos-uni/context-relay-protocol/ci.yml?label=CI" alt="CI"></a>
  <img src="https://img.shields.io/badge/python-3.10%2B-blue.svg" alt="Python 3.10+">
  <img src="https://img.shields.io/badge/tests-1%2C537-brightgreen.svg" alt="1,537 tests">
</p>

<p align="center">
  <a href="#quick-start">Quick Start</a> •
  <a href="#the-problem">The Problem</a> •
  <a href="#what-crp-does">Solution</a> •
  <a href="#inter-llm-context-sharing-http-sidecar">Inter-LLM Sharing</a> •
  <a href="BENCHMARKS.md">Benchmarks</a> •
  <a href="#specification-documents">Specification</a> •
  <a href="#sdk-status">SDKs</a> •
  <a href="#community">Community</a>
</p>

---

> MCP gives agents tools. A2A lets agents talk. **CRP gives every agent unbounded context, unbounded generation, and amplified reasoning** — the foundation both protocols assume but neither provides.

---

## Table of Contents

1.  [The Problem](#the-problem)
2.  [What CRP Does](#what-crp-does)
3.  [Key Differentiators](#key-differentiators)
4.  [Quick Start](#quick-start)
5.  [How CRP Works](#how-crp-works)
6.  [Architecture Overview](#architecture-overview)
7.  [Core Capabilities](#core-capabilities)
8.  [Inter-LLM Context Sharing (HTTP Sidecar)](#inter-llm-context-sharing-http-sidecar)
9.  [End-to-End Example: A Penetration Test](#end-to-end-example-a-penetration-test)
10. [CRP in the AI Stack](#crp-in-the-ai-stack)
11. [Extraction Quality](#extraction-quality)
12. [Efficiency and Cost](#efficiency-and-cost)
13. [Observability and Auditing](#observability-and-auditing)
14. [Limitations and Trade-offs](#limitations-and-trade-offs)
15. [Why Large Context Windows Are Not Enough](#why-large-context-windows-are-not-enough)
16. [Specification Documents](#specification-documents)
17. [JSON Schemas](#json-schemas)
18. [API Surface](#api-surface)
19. [SDK Status](#sdk-status)
20. [Comparison with Alternatives](#comparison-with-alternatives)
21. [Hardware Requirements](#hardware-requirements)
22. [Configuration](#configuration)
23. [Use Cases](#use-cases)
24. [Roadmap](#roadmap)
25. [Contributing](#contributing)
26. [Governance](#governance)
27. [Security](#security)
28. [Community](#community)
29. [Built With](#built-with)
30. [Intellectual Property & License](#license)

---

## The Problem

Every agentic AI system forces its LLM to work inside a single, shared context window. Planning, reasoning, tool calling, analysis, memory, and output generation all compete for the same finite token budget. This creates three compounding failures:

| Failure | What Happens | Impact |
|---------|-------------|--------|
| **Context Contamination** | Tool output from step 3 dilutes reasoning for step 12 | The LLM "forgets" early discoveries. Later decisions degrade |
| **Attention Collapse** | At 30K+ tokens, attention spreads thin over irrelevant content | Critical facts in the middle are effectively invisible ([Liu et al., 2023](https://arxiv.org/abs/2307.03172)) |
| **Hard Ceiling** | When the context window fills, the system truncates or stops | Reports are incomplete. Analysis is shallow. Output is arbitrarily cut short |

These aren't edge cases — they happen on **every non-trivial agentic task** and get worse the more capable your agent becomes.

---

## What CRP Does

CRP is a **middleware layer** that wraps your existing LLM calls. It does NOT replace your LLM — it **amplifies** it.

For every LLM call you already make, CRP:

1. **Builds a better prompt** — adds an envelope of relevant historical facts, source passages, and the LLM's own synthesis alongside your system prompt and task input
2. **Calls YOUR LLM** — through your existing provider and infrastructure
3. **Returns the raw output unchanged** — exactly what the LLM generated, not a filtered version
4. **Observes the output** (read-only) — extracts facts into the knowledge fabric so future windows benefit
5. **Carries the LLM's understanding forward** — progressive synthesis evolves across windows
6. **Scaffolds reasoning** — decomposes complex tasks into micro-steps for models that can't chain-of-thought natively

```
   WITHOUT CRP                                WITH CRP

   One shared window,                         N dedicated windows,
   everything competing:                      each pristine:

   +---------------------------+              +----------+   +----------+   +----------+
   | System prompt             |              | System   |   | System   |   | System   |
   | + Tool schemas (10K tok)  |              | Envelope |   | Envelope |   | Envelope |
   | + Tool output #1-#3       |              | Task     |   | Task     |   | Task     |
   | + Reasoning history       |              |          |   |          |   |          |
   | + Prior conversation      |              | Full     |   | Full     |   | Full     |
   | + Current task (buried)   |              | 128K     |   | 128K     |   | 128K     |
   +---------------------------+              +----------+   +----------+   +----------+

    Total capacity: 128K (fixed)              Total capacity: N × 128K (unbounded)
    Quality: degrades with length             Quality: peak per window (tier-reported)
    Input limit: context window               Input limit: unbounded (auto-ingest)
    Output limit: max_output_tokens           Output limit: unbounded (continuation)
```

---

## Key Differentiators

- **Embedded library, not a server** — zero deployment overhead. `pip install crprotocol` and you're running. No Docker, no infrastructure. Optional HTTP sidecar (`crp serve`) for [inter-LLM context sharing](#inter-llm-context-sharing-http-sidecar) — never started automatically
- **Works with any LLM provider** — auto-detected, 3 fields to configure. Built-in adapters for OpenAI, Anthropic, Ollama, and llama.cpp — plus `CustomProvider` to wrap any LLM in 3 lines
- **Structured knowledge extraction** — 6-stage graduated pipeline (regex → statistical NLP → GLiNER NER → UIE relations → RST discourse → LLM-assisted relational). Not just text chunking
- **Contextual Knowledge Fabric (CKF)** — graph-structured knowledge with 4-mode retrieval (graph walk + pattern query + semantic fallback + community summaries), event-sourced history, and cross-session persistence
- **Two-sided provenance** — CRP classifies every model *output* as `CONTEXT_GROUNDED | PARAMETRIC | MIXED | UNCERTAIN` **and** records every *input* fact's upstream source (RAG chunk, vector DB, MCP tool, function call, web search, user turn, file upload, agent memory, or parametric). `ContextManifest` lets you sign a declaration of intended sources; anything observed outside the declaration is flagged as `CONTEXT_ATTESTATION_MISMATCH` in the audit log. Foundational for **ISO/IEC 42001 §4**, **EU AI Act Art. 10**, and **GDPR Art. 30**
- **Unbounded input** — automatically ingests documents larger than any model's context window through structure-aware chunking with protected spans
- **Unbounded output** — automatic continuation with voice profile preservation, document maps, degradation-triggered re-grounding, and content-type-aware stitching
- **Honest quality guarantees** — a degradation model, not magic claims. Quality tiers S through D, reported with every dispatch. Extraction recall percentages published per stage
- **Cross-session knowledge** — sessions build on each other. CKF persists facts, reasoning traces, and graph structure across sessions
- **Reasoning amplification** — meta-learning scaffolds (ORC + ICML + RTL) enable 2B–7B models to perform multi-step reasoning they cannot do natively
- **Zero in-window overhead** — CRP operates entirely outside the LLM's context window. No protocol tokens, no function call schemas, no memory management instructions inside the window
- **Full observability** — per-window metrics, session dashboards, window DAG traceability, telemetry export. Debug "why did it do that?" by tracing decisions through the DAG

---

## Quick Start

### Minimal Integration (3 lines)

```python
import crp

# Auto-detects your LLM from environment (OPENAI_API_KEY, ANTHROPIC_API_KEY, or Ollama)
client = crp.Client()
output, report = client.dispatch(
    system_prompt="You are a helpful assistant.",
    task_input="Summarize this document: ..."
)
# output = raw LLM output, unmodified
# report.quality_tier = "S" | "A" | "B" | "C" | "D"
```

### Explicit Provider

```python
from crp import Client
from crp.providers import OpenAIAdapter

client = Client(provider=OpenAIAdapter(model="gpt-4o"))
output, report = client.dispatch(
    system_prompt="You are a helpful assistant.",
    task_input="Summarize this document: ..."
)
```

### Model Name Shortcut

```python
import crp

# Pass model= for automatic provider detection
client = crp.Client(model="claude-sonnet-4-20250514")   # → AnthropicAdapter
client = crp.Client(model="gpt-4o")              # → OpenAIAdapter
client = crp.Client(model="llama3.1")             # → OllamaAdapter
```

### Local Models (Zero-Config)

```python
from crp import Client
from crp.providers import OllamaAdapter

client = Client(provider=OllamaAdapter())  # Auto-detects localhost:11434
output, report = client.dispatch(
    system_prompt="You are a security analyst.",
    task_input="Analyze these scan results: ..."
)
```

### llama.cpp / vLLM

```python
from crp import Client
from crp.providers import LlamaCppAdapter

client = Client(provider=LlamaCppAdapter(server_url="http://localhost:8080"))
output, report = client.dispatch(system_prompt=system, task_input=user_message)
```

### Any Custom Setup

```python
from crp import Client
from crp.providers import CustomProvider

def my_generate(messages, **kw):
    # Your existing LLM function
    return ("response text", "stop")  # (output, finish_reason)

client = Client(provider=CustomProvider(
    generate_fn=my_generate,
    count_tokens_fn=lambda text: len(text) // 4,
    context_size=128000,
))
output, report = client.dispatch(system_prompt=system, task_input=user_message)
```

### Direct Ingestion (No LLM Window)

```python
client.ingest(nmap_output)      # ~7ms, extraction only — no LLM call
client.ingest(nikto_output)     # Facts go to warm state automatically
client.ingest(api_response)     # Available in next window's envelope
```

### LLM Compatibility

| API Style | Provider | Examples |
|-----------|---------|----------|
| Chat completions | `OpenAIAdapter`, `AnthropicAdapter`, `OllamaAdapter` | OpenAI, Anthropic, Ollama |
| HTTP completions | `LlamaCppAdapter` | llama.cpp, any OpenAI-compatible HTTP endpoint |
| Any custom setup | `CustomProvider` | Any function that takes messages → returns (text, reason) |

### Configuration

```bash
# .env — ALL optional (CRP auto-detects LLM from API keys or local Ollama)
CRP_ENABLED=true                   # Master switch (default: enabled)
CRP_LOG_ENVELOPES=false            # Debug logging (default: false)
CRP_MAX_CONTINUATIONS=50           # Safety limit on continuation windows
```

### Async Support

```python
# Works with FastAPI, asyncio, any async framework
import asyncio
import crp

async def main():
    client = crp.Client()
    text = "..."  # any text you want extracted into the session
    output, report = await client.async_dispatch("You are helpful.", "Explain CRP.")
    facts_count = await client.async_ingest(text, label="docs")
    async for event in client.async_dispatch_stream("You are helpful.", "Explain CRP."):
        if event.event_type == "token":
            print(event.data, end="")
    await client.async_close()

asyncio.run(main())
```

> **More examples**: See [`examples/`](examples/) for runnable scripts — quickstart, multi-turn, ingestion, streaming, async, and provider selection.

---

## How CRP Works

### Four Core Mechanisms

All operate **outside the LLM** — zero protocol tokens inside the model's window.

#### 1. Task Isolation

Every LLM call gets its own dedicated context window containing: system prompt, context envelope, and task input. Nothing else. CRP does NOT add LLM calls — every `crp.dispatch()` maps 1:1 to calls your application already makes. The only "extra" windows are continuations when output hits the physical limit.

#### 2. Context Envelopes + Knowledge Fabric

Between windows, an **envelope** carries forward everything the next window needs. It is built by **extraction**, not summarization: a graduated 6-stage pipeline pulls atomic facts and relationships from the output and stores them in the **Contextual Knowledge Fabric (CKF)**, a fact graph with typed edges, event-sourced history, community detection, and multi-mode retrieval:

- **Graph Walk** — traverse edges from seed facts (2-hop BFS) to reconstruct the subgraph around the task's focal point
- **Pattern Query** — content-addressable structured matching inspired by tuple spaces (Gelernter, 1985)
- **Semantic Fallback** — traditional ANN cosine similarity when graph structure is insufficient
- **Community Summaries** — Leiden community detection produces topic clusters; summaries provide high-level context
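
As a rough illustration of the graph-walk mode, the sketch below runs a 2-hop breadth-first search from seed facts over a small fact graph. The triple format, the `graph_walk` name, the sample fact IDs, and the choice to treat edges as undirected are all assumptions made for this example, not CRP's actual data model.

```python
from collections import deque

def graph_walk(edges, seeds, max_hops=2):
    """Return {fact_id: hop_distance} for facts within max_hops of any seed."""
    # Assumption: edges are (source, relation, target) triples, walked in both directions.
    neighbors = {}
    for source, _relation, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        fact = queue.popleft()
        if distance[fact] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in sorted(neighbors.get(fact, ())):
            if nxt not in distance:
                distance[nxt] = distance[fact] + 1
                queue.append(nxt)
    return distance

# Hypothetical fact IDs, for illustration only.
edges = [
    ("host:192.168.1.50", "runs", "service:http"),
    ("service:http", "version", "apache:2.4"),
    ("apache:2.4", "flagged_as", "finding:outdated"),
]
subgraph = graph_walk(edges, seeds=["host:192.168.1.50"])
```

Here the walk reaches the service and its version but stops before the third-hop finding, which is the behavior a hop limit is meant to give.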

Facts are scored by **multi-aspect semantic similarity** with cross-encoder reranking, and packed greedily with dependency-aware graph packing until the window is full.
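
One way to picture greedy, dependency-aware packing is sketched below. It is a simplification under stated assumptions: the `pack_envelope` name and the `id`, `tokens`, `score`, and `deps` keys are hypothetical, only direct dependencies are considered, and the scores are taken as given rather than computed by similarity or reranking.

```python
def pack_envelope(facts, budget):
    """Greedy packing: highest score first; a fact only goes in together with its dependencies."""
    by_id = {fact["id"]: fact for fact in facts}
    packed, used = [], 0
    for fact in sorted(facts, key=lambda f: f["score"], reverse=True):
        if fact["id"] in packed:
            continue  # already pulled in as a dependency of a higher-scored fact
        # Unpacked dependencies travel with the fact, so they are costed as one group.
        group = [by_id[dep] for dep in fact.get("deps", []) if dep not in packed]
        group.append(fact)
        cost = sum(item["tokens"] for item in group)
        if used + cost <= budget:
            packed.extend(item["id"] for item in group)
            used += cost
    return packed, used

# Hypothetical facts: f1 depends on f3, so f3 is packed even though its own score is low.
facts = [
    {"id": "f1", "tokens": 10, "score": 0.9, "deps": ["f3"]},
    {"id": "f2", "tokens": 20, "score": 0.8},
    {"id": "f3", "tokens": 10, "score": 0.2},
]
packed_ids, tokens_used = pack_envelope(facts, budget=30)
```

With a 30-token budget, `f1` and its dependency `f3` fit, while `f2` is skipped because it would overflow the remaining space.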

#### 3. Multi-Signal Completion Detection

The protocol monitors four signals across windows:

| Signal | What It Measures | Dominates For |
|--------|-----------------|---------------|
| **Fact Flow** | New facts per token | Entity-rich content |
| **Structural Flow** | New headings/paragraphs/list items | Structured documents |
| **Vocabulary Novelty** | New n-grams vs. seen n-grams | Creative/discursive content |
| **Structural Completion** | Conclusion detection | Summaries and conclusions |

Signals are weighted by content type — preventing premature termination of conclusions, summaries, and rhetorical passages that produce few new facts but are genuine content.
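
The effect of content-type weighting can be sketched as follows. The weights, the threshold, and the function names are invented for illustration (this README does not publish the real values), and the sketch covers only the three novelty-style signals, leaving conclusion detection out.

```python
# Hypothetical weights per content type; not CRP's actual values.
SIGNAL_WEIGHTS = {
    "entity_rich": {"fact_flow": 0.7, "structural_flow": 0.2, "vocabulary_novelty": 0.1},
    "discursive": {"fact_flow": 0.1, "structural_flow": 0.2, "vocabulary_novelty": 0.7},
}

def novelty_score(signals, content_type):
    """Weighted sum of the per-window signals, each assumed to lie in [0, 1]."""
    weights = SIGNAL_WEIGHTS[content_type]
    return sum(weight * signals[name] for name, weight in weights.items())

def looks_complete(signals, content_type, threshold=0.2):
    """Treat generation as finished when weighted novelty drops below the threshold."""
    return novelty_score(signals, content_type) < threshold

# A closing passage: almost no new facts, but still plenty of new wording.
closing_passage = {"fact_flow": 0.05, "structural_flow": 0.10, "vocabulary_novelty": 0.60}
stops_if_entity_rich = looks_complete(closing_passage, "entity_rich")
stops_if_discursive = looks_complete(closing_passage, "discursive")
```

Scored with entity-rich weights, the passage looks finished and would be cut off early. Scored with discursive weights, the same passage still counts as live content.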

#### 4. Envelope-Based Continuation

When output hits the physical limit, CRP:

1. Incrementally extracts facts from the new window's output — `O(N)` per window, not `O(N²)` accumulated
2. Identifies what's missing via multi-level gap analysis
3. Builds a continuation envelope with voice profile + document map + structural state for long-chain coherence
4. Dispatches a fresh window — the continuation sees extracted essence, not raw overlap
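
The control flow of these steps can be sketched with a stub model. Everything here is a stand-in: `run_with_continuation` and `make_fake_llm` are hypothetical names, the "envelope" is a one-line placeholder rather than real extraction and gap analysis, and the only thing taken from this README is the `(text, finish_reason)` return shape and the `"length"` stop reason.

```python
def run_with_continuation(generate, system_prompt, task_input, max_continuations=50):
    """Call generate() again in a fresh window while it stops with finish_reason 'length'."""
    parts, envelope = [], ""
    for _ in range(max_continuations + 1):
        user_content = (envelope + "\n\n" + task_input).strip()
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ]
        text, finish_reason = generate(messages)
        parts.append(text)
        if finish_reason != "length":
            break
        # Placeholder for extraction + gap analysis: carry a compact state, not raw output.
        envelope = f"[continuation state] windows completed: {len(parts)}"
    return "".join(parts), len(parts)

def make_fake_llm(sections):
    """Stub model that emits one section per call and reports 'length' until the last one."""
    pending = list(sections)
    def generate(messages, **kw):
        text = pending.pop(0)
        return text, ("length" if pending else "stop")
    return generate

report_text, windows_used = run_with_continuation(
    make_fake_llm(["Intro. ", "Findings. ", "Conclusion."]),
    system_prompt="Write a comprehensive security report.",
    task_input="All findings here...",
)
```

Each pass builds a new message list from scratch, so no window ever contains the raw text of an earlier one.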

### What CRP Sends to Your LLM

```python
# You call:
output, report = client.dispatch(
    system_prompt="You are a security analyst.",
    task_input="Analyze these nmap results: ..."
)

# CRP constructs and sends to YOUR LLM:
messages = [
    {"role": "system", "content": "You are a security analyst."},    # UNCHANGED
    {"role": "user", "content": envelope_text + "\n\n" + task_input} # envelope ADDED
]
```

Your system prompt and task input pass through unchanged. The envelope is additional context — historical facts from prior windows, scored by relevance. The LLM doesn't know CRP exists. **Zero protocol overhead inside the window.**

### Output Guarantee

`dispatch()` returns the complete, unmodified LLM output. Always. Extraction is a read-only side effect — it never modifies, filters, or summarizes the returned string.

---

## Architecture Overview

```
+---------------------------------------------------------------------+
|                        YOUR APPLICATION                              |
|   (any code that calls an LLM — agents, pipelines, reports)         |
+---------------------------------------------------------------------+
                                |
                                |  crp.dispatch(system_prompt, task_input)
                                v
+---------------------------------------------------------------------+
|                        CRP ORCHESTRATOR                              |
|                                                                      |
|   +-----------------+  +-----------------+  +----------------------+ |
|   | Envelope        |  | Warm State      |  | Extraction Pipeline  | |
|   | Builder         |  | Store + Fact    |  | (Blackboard-Reactive)| |
|   | (multi-aspect   |  | Graph + Event   |  | regex → stat → NER → | |
|   |  scoring +      |  | Log             |  | UIE → discourse →    | |
|   |  cross-encoder  |  | (session facts, |  | LLM-relational       | |
|   |  reranking +    |  |  scored +       |  | (graduated, content- | |
|   |  CKF multi-mode |  |  embedded +     |  |  type-adaptive,      | |
|   |  retrieval +    |  |  graph edges +  |  |  self-gating)        | |
|   |  source         |  |  FactEvents)    |  |                      | |
|   |  grounding)     |  |                 |  |                      | |
|   +-----------------+  +-----------------+  +----------------------+ |
|                                                                      |
|   +-----------------+  +-----------------+  +----------------------+ |
|   | Multi-Signal    |  | Continuation    |  | CKF (Knowledge       | |
|   | Completion +    |  | Manager         |  |  Fabric)             | |
|   | Degradation     |  | (auto-ingest,   |  | graph walk +         | |
|   | Monitor         |  |  gap analysis,  |  | pattern query +      | |
|   | (fact flow +    |  |  stitch,        |  | semantic fallback +  | |
|   |  structural +   |  |  voice profile, |  | community summary +  | |
|   |  vocabulary +   |  |  document map,  |  | pub-sub events +     | |
|   |  chain degr.)   |  |  re-grounding)  |  | cross-session graph) | |
|   +-----------------+  +-----------------+  +----------------------+ |
|                                                                      |
|   +-----------------+  +-----------------+  +----------------------+ |
|   | Source          |  | LLM Context     |  | Meta-Learning        | |
|   | Grounding       |  | Curator         |  | Engine               | |
|   | Engine          |  | (periodic       |  | (ORC: orchestrated   | |
|   | (original text  |  |  curation       |  |  reasoning chains,   | |
|   |  passages in    |  |  windows,       |  |  ICML: in-context    | |
|   |  envelopes,     |  |  progressive    |  |  meta-learning,      | |
|   |  dual-layer     |  |  understanding, |  |  RTL: reasoning      | |
|   |  fact+source)   |  |  LLM synthesis) |  |  template library)   | |
|   +-----------------+  +-----------------+  +----------------------+ |
+---------------------------------------------------------------------+
                                |
                                |  Standard LLM API call (unchanged)
                                v
                    +------------------------+
                    |    LLM (any model)     |
                    |    Local or cloud      |
                    +------------------------+
```

---

## Core Capabilities

### Unbounded Context: Input > Model's Window

**Problem**: Your input is 1M tokens but your model has 128K context.

CRP's **auto-ingest** handles this transparently:

1. Detects overflow: `system_prompt + task_input + generation_reserve > context_window`
2. Chunks the input at natural boundaries with protected spans (code blocks, tables, and JSON objects are never split) and a 500-token overlap with boundary reconciliation
3. Extracts facts from each chunk — zero LLM calls for typical content
4. Builds envelope with multi-aspect scoring, cross-encoder reranking, and dependency-aware graph packing
5. Dispatches with a maximally-saturated context window

```python
# Transparent — the user doesn't manage chunking
result = crp.dispatch(
    system_prompt="You are a legal analyst.",
    task_input=million_token_contract  # CRP handles the rest
)
```

This is strictly better than truncation, which would discard about 87% of a 1M-token input on a 128K model. See [02_CORE_PROTOCOL.md §7.6](specification/02_CORE_PROTOCOL.md) for the honest degradation model.
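
The overflow check and the overlap between chunks can be sketched in a few lines. This is a deliberately reduced version: the function names are hypothetical, tokens are modeled as a plain list, and boundary detection and protected spans are left out entirely.

```python
def needs_auto_ingest(system_tokens, input_tokens, generation_reserve, context_window):
    """Overflow test from step 1: does the request exceed the model's window?"""
    return system_tokens + input_tokens + generation_reserve > context_window

def chunk_with_overlap(tokens, chunk_size, overlap=500):
    """Split a token list into fixed-size chunks that share `overlap` tokens with their neighbor."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reaches the end of the input
    return chunks

overflow = needs_auto_ingest(
    system_tokens=500, input_tokens=1_000_000, generation_reserve=4_000, context_window=128_000
)
chunks = chunk_with_overlap(list(range(2_500)), chunk_size=1_000, overlap=500)

# Share of a 1M-token input that plain truncation to 128K would drop.
truncation_loss = 1 - 128_000 / 1_000_000
```

The last 500 tokens of each chunk reappear at the start of the next one, which gives a later reconciliation step something to align on.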

### Unbounded Generation: Output > Model's Limit

**Problem**: Your model outputs 4K tokens per call, but you need 100K.

CRP's **continuation loop** handles this automatically:

1. LLM generates → hits output limit (`finish_reason: "length"`)
2. CRP incrementally extracts facts from the output — `O(N)` not `O(N²)`
3. Runs multi-level gap analysis
4. Builds continuation envelope: facts + structural state + remaining items + voice profile + document map
5. Dispatches fresh window — full context capacity, no attention degradation
6. Stitches outputs with content-type-aware boundary detection, echo detection, and heading hierarchy validation
7. Periodically runs re-grounding windows that re-extract from accumulated output to correct warm state drift
8. Repeats until multi-signal completion detection indicates genuine completion

```python
result = crp.dispatch(
    system_prompt="Write a comprehensive security report.",
    task_input="All findings here...",
    max_continuations=50  # Optional safety limit
)
# result contains the full output, stitched from multiple windows
```
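
Echo detection during stitching could plausibly work like the sketch below, which drops text that a continuation repeats from the end of the previous window. This is one reading of the term, not a description of CRP's stitcher: the `stitch` function, the exact-match rule, and the `min_overlap` cutoff are all assumptions.

```python
def stitch(previous, continuation, min_overlap=8):
    """Join two outputs, removing a prefix of `continuation` that echoes the tail of `previous`."""
    longest = min(len(previous), len(continuation))
    # Try the longest candidate overlap first so the largest echo is removed.
    for size in range(longest, min_overlap - 1, -1):
        if previous.endswith(continuation[:size]):
            return previous + continuation[size:]
    return previous + continuation  # no echo found: plain concatenation

stitched = stitch(
    "The scan found three open ports.",
    "three open ports. Port 22 runs SSH.",
)
```

The minimum overlap keeps short coincidental matches, such as a shared trailing word, from being treated as an echo.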

### Peak Quality Per Window

Every window gets the model's full context capacity. The envelope fills all remaining space with semantically-ranked facts. Fresh KV cache per window eliminates attention degradation. Self-calibrating weights and thresholds require zero configuration.

### Concurrency Model

Each `CRPOrchestrator` instance is **single-threaded by design** — one dispatch at a time per session. Different sessions (separate `CRPOrchestrator` instances) are fully isolated and can run concurrently without interference. Each session has its own `WarmStateStore`, `FactGraph`, `WindowDAG`, and event log. To process multiple tasks concurrently, create one orchestrator per task.
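
The one-session-per-task pattern might look like this with a thread pool. `FakeSession` is a hypothetical stand-in that only mimics per-session private state. In real code each worker would construct its own CRP client or orchestrator instead.

```python
from concurrent.futures import ThreadPoolExecutor

class FakeSession:
    """Hypothetical stand-in for a per-task session: owns its state, shares nothing."""

    def __init__(self, name):
        self.name = name
        self.facts = []  # stand-in for session-private warm state

    def dispatch(self, task_input):
        self.facts.append(task_input)
        return f"{self.name}: {task_input} (facts so far: {len(self.facts)})"

def run_task(task):
    # One session per task, created inside the worker so it is never shared across threads.
    session = FakeSession(task["name"])
    # Dispatches within a session stay strictly sequential.
    return [session.dispatch(step) for step in task["steps"]]

tasks = [
    {"name": "audit-a", "steps": ["plan", "analyze"]},
    {"name": "audit-b", "steps": ["plan"]},
]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_task, tasks))
```

Concurrency comes from running tasks side by side, while each session still sees exactly one dispatch at a time.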

---

## Inter-LLM Context Sharing (HTTP Sidecar)

> **Optional.** The sidecar is never started automatically. You must explicitly run `crp serve` to enable it.

CRP includes an HTTP sidecar that exposes the **full protocol surface** over REST, enabling multiple applications — potentially using different LLMs — to share extracted knowledge without direct LLM-to-LLM communication.

### Why This Matters

Application A (Claude) extracts facts about code architecture. Application B (GPT-4) receives those facts via the `/facts/share` endpoint. Both benefit from the other's knowledge — without API key sharing, without prompt injection, without any LLM talking to another LLM. The knowledge flows through CRP's structured extraction layer.

This is **not** a chat relay. It is structured, scored, ranked knowledge transfer.

### Quick Start

```bash
# Start the sidecar (loopback only, no auth — local development)
crp serve

# Start with authentication (recommended)
crp serve --auth-token "my-secret-token"

# Bind to all interfaces (REQUIRES auth token)
crp serve --bind-all --auth-token "my-secret-token" --port 9470
```

### Example: Two LLMs Sharing Knowledge

```bash
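# Assumes the sidecar was started with --auth-token and that TOKEN holds the same value
TOKEN="my-secret-token"

# 1. Create sessions for two different applications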
SESSION_A=$(curl -s -X POST http://localhost:9470/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-app", "context_window": 128000}' | python -c "import sys,json; print(json.load(sys.stdin)['session_id'])")

SESSION_B=$(curl -s -X POST http://localhost:9470/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt4-app", "context_window": 128000}' | python -c "import sys,json; print(json.load(sys.stdin)['session_id'])")

# 2. Application A ingests data and dispatches
curl -X POST http://localhost:9470/sessions/$SESSION_A/ingest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "The authentication module uses bcrypt with cost factor 12..."}'

curl -X POST http://localhost:9470/sessions/$SESSION_A/dispatch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"system_prompt": "You are a security analyst.", "task_input": "Analyze the auth module."}'

# 3. Share Application A's knowledge → Application B
curl -X POST http://localhost:9470/sessions/$SESSION_A/facts/share \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"target_session_id": "'$SESSION_B'", "min_confidence": 0.5}'

# 4. Application B now has A's extracted facts in its warm state.
#    Its next dispatch will include those facts in the envelope.
curl -X POST http://localhost:9470/sessions/$SESSION_B/dispatch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"system_prompt": "You are a code reviewer.", "task_input": "Review the auth module for best practices."}'
# → GPT-4 now sees Claude's extracted security facts in its context envelope
```
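
For callers who prefer Python over curl, the share request from step 3 can be built with the standard library alone. The endpoint path, payload fields, and bearer header come from the example above. The helper names and the session IDs are hypothetical, and the sketch only constructs the request so it can be inspected without a running sidecar.

```python
import json
import urllib.request

def sidecar_request(base_url, token, path, payload):
    """Build (but do not send) an authenticated JSON POST for the sidecar."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def share_facts_request(base_url, token, source_id, target_id, min_confidence=0.5):
    """Request that copies facts from one session into another via /facts/share."""
    return sidecar_request(
        base_url,
        token,
        f"/sessions/{source_id}/facts/share",
        {"target_session_id": target_id, "min_confidence": min_confidence},
    )

# Placeholder session IDs; real ones come from POST /sessions.
request = share_facts_request("http://localhost:9470", "my-secret-token", "sess-a", "sess-b")
# Against a running sidecar, send it with: urllib.request.urlopen(request)
```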

### Full Endpoint Reference

**Session Lifecycle**

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/sessions` | Create a new CRP session |
| `GET` | `/sessions` | List sessions (owned by caller only) |
| `GET` | `/sessions/:id/status` | Session metrics and health |
| `POST` | `/sessions/:id/close` | Close and clean up session |

**Dispatch (All 6 Variants)**

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/sessions/:id/dispatch` | Basic dispatch |
| `POST` | `/sessions/:id/dispatch/tools` | Tool-mediated dispatch |
| `POST` | `/sessions/:id/dispatch/reflexive` | Reflexive (verify) dispatch |
| `POST` | `/sessions/:id/dispatch/progressive` | Progressive dispatch |
| `POST` | `/sessions/:id/dispatch/stream-augmented` | Stream-augmented dispatch |
| `POST` | `/sessions/:id/dispatch/agentic` | Agentic dispatch |

**Knowledge**

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/sessions/:id/ingest` | Ingest raw text (extraction only, no LLM call) |
| `GET` | `/sessions/:id/facts` | Query extracted facts (with `?limit=` and `?min_confidence=`) |
| `POST` | `/sessions/:id/facts/share` | **Share facts to another session** (core feature) |
| `POST` | `/sessions/:id/facts/feedback` | Boost, penalize, or reject a fact |
| `GET` | `/sessions/:id/envelope` | Preview envelope contents |

**Admin**

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/sessions/:id/providers` | Register a fallback provider |
| `POST` | `/sessions/:id/estimate` | Cost estimation |
| `GET` | `/health` | Health check (session count, auth status, version) |

### Security Model

The sidecar is designed with defense-in-depth. Every layer is enforced on every request.

| Layer | Protection | Detail |
|-------|-----------|--------|
| **Bind address** | Loopback by default | Binds to `127.0.0.1` — only local processes can connect |
| **Authentication** | Bearer token | `--auth-token` enables timing-safe (`secrets.compare_digest`) token verification |
| **Bind-all gate** | `--bind-all` requires auth | Cannot expose to network without `--auth-token` (or explicit `--allow-unauthenticated` override) |
| **Session ownership** | Token-hash binding | Sessions are bound to the SHA-256 hash of the token that created them. Other tokens get `403 Forbidden` |
| **Rate limiting** | Per-IP burst window | Default 120 req/60s per IP. Configurable via `--rate-limit`. Uses a monotonic clock (unaffected by wall-clock adjustments) |
| **Body size limit** | 10 MB cap | Requests exceeding 10 MB receive `413 Payload Too Large`. Prevents memory exhaustion |
| **Session cap** | 64 concurrent sessions | Returns `503 Service Unavailable` when exceeded. Configurable via `--max-sessions` |
| **Security headers** | On every response | `X-Content-Type-Options: nosniff`, `Cache-Control: no-store` |
| **No HTTPS** | By design | Deploy behind a TLS-terminating reverse proxy (nginx, Caddy) for production |
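
The authentication and session-ownership rows can be illustrated with standard-library primitives. `secrets.compare_digest` and SHA-256 are named in the table. The `SessionRegistry` class, its method names, and the `404` for unknown sessions are assumptions added for this sketch.

```python
import hashlib
import secrets

def token_matches(presented, expected):
    """Constant-time comparison, so response timing does not leak how much of the token matched."""
    return secrets.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))

def owner_hash(token):
    """Sessions remember only a SHA-256 hash of the creating token, never the token itself."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()

class SessionRegistry:
    """Hypothetical in-memory map of session ID to owner hash."""

    def __init__(self):
        self._owners = {}

    def create(self, session_id, token):
        self._owners[session_id] = owner_hash(token)

    def status_for(self, session_id, token):
        """HTTP status for an access attempt: 200 for the owner, 403 for any other token."""
        owner = self._owners.get(session_id)
        if owner is None:
            return 404  # assumption: unknown sessions are reported as not found
        return 200 if secrets.compare_digest(owner, owner_hash(token)) else 403

registry = SessionRegistry()
registry.create("sess-a", token="token-for-app-a")
owner_status = registry.status_for("sess-a", "token-for-app-a")
other_status = registry.status_for("sess-a", "token-for-app-b")
```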

### CLI Options

```
crp serve [OPTIONS]

Options:
  --port INTEGER                    Port number (default: 9470)
  --bind-all                        Bind to 0.0.0.0 (requires --auth-token)
  --auth-token TEXT                 Bearer token for authentication
  --allow-unauthenticated           Override auth requirement for --bind-all
  --max-sessions INTEGER            Max concurrent sessions (default: 64)
  --rate-limit INTEGER              Max requests per IP per 60s (default: 120)
```
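
A per-IP limiter matching the `--rate-limit` defaults could be built as below. The sliding-window design and the `RateLimiter` class are assumptions for illustration, since the sidecar's actual windowing strategy is not described here. The use of `time.monotonic` follows the security table.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window limiter: at most `limit` requests per IP in any `window` seconds."""

    def __init__(self, limit=120, window=60.0, clock=time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, ip):
        now = self._clock()
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True

limiter = RateLimiter(limit=2, window=60.0)
first, second, third = (limiter.allow("127.0.0.1") for _ in range(3))
```

Because the clock is monotonic, changing the system time cannot reopen or extend a client's window.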

### Integration with CRP Protocol

The sidecar is a thin HTTP layer over the same `CRPOrchestrator` that the Python SDK uses directly. Every session created via the sidecar is a full CRP session with:

- All 6 extraction stages (regex → statistical → NER → UIE → discourse → LLM-relational)
- Contextual Knowledge Fabric (CKF) with graph walk, pattern query, semantic fallback, community summaries
- Multi-signal completion detection and automatic continuation
- Envelope building with multi-aspect scoring and cross-encoder reranking
- Event emission for all pipeline stages (`fact.shared`, `fact.received`, `dispatch.completed`, etc.)
- RBAC enforcement, budget tracking, and cost estimation

The sidecar adds no protocol modifications. A fact extracted via the sidecar is identical to one extracted via `client.dispatch()`. A session created via HTTP behaves identically to one created via Python.

---

## End-to-End Example: A Penetration Test

Your pentest application already has separate LLM calls for planning, tool selection, analysis, and reporting. With CRP, each `llm.generate()` becomes `crp.dispatch()`. **CRP does not add, remove, or restructure your calls.**

### Step 1: Planning

```python
plan = crp.dispatch(
    system_prompt="You are a penetration testing planner...",
    task_input="Create a pentest plan for target 192.168.1.50. Scope: external, web focus."
)
```

| Phase | What Happens | Time |
|-------|-------------|------|
| Envelope | Empty (first window — cold start) | 0ms |
| LLM generates | Phase 1: Recon. Phase 2: Web vuln. Phase 3: Exploitation. Phase 4: Reporting | ~3s |
| Extraction | regex captures "192.168.1.50"; statistical: "nmap", "nikto" = 8 facts | ~6ms |
| Warm state | 8 facts with embeddings | — |

### Step 2: Tool Selection

```python
tool_choice = crp.dispatch(
    system_prompt="You are a security tool selector...",
    task_input="Select and configure the first tool for recon of 192.168.1.50"
)
```

| Phase | What Happens | Time |
|-------|-------------|------|
| Envelope | 8 facts from Step 1, scored by similarity to "tool selection for recon" | ~3ms |
| LLM generates | "Run: nmap -sV -sC -p- 192.168.1.50" | ~2s |
| Extraction | regex: full nmap command; statistical: "version detection" = 6 new facts | ~5ms |
| Warm state | Now 14 facts (8 + 6) | — |

### Step 3: Tool Execution + Ingestion

```python
nmap_result = run_tool("nmap", "-sV -sC -p- 192.168.1.50")  # Your tool runner
crp.ingest(nmap_result)  # ~7ms extraction; 22 new facts (ports, services, versions)
```

No LLM call: the extraction pipeline processes the raw tool output directly. Warm state: 36 facts (14 + 22).

### Step 4: Analysis

| Phase | What Happens | Time |
|-------|-------------|------|
| Envelope | 36 facts scored for "vulnerability analysis". ~4200 tokens of dense, relevant context | ~4ms |
| LLM generates | "Critical: Apache 2.4.52 — CVE-2024-XXXX. High: OpenSSH 8.2 — known auth bypass..." | ~5s |
| Extraction | 12 new facts: CVEs, severity ratings, affected services, attack vectors | ~8ms |
| Warm state | 48 facts | — |

### Step 5: Report Generation + Continuation

| Phase | What Happens | Time |
|-------|-------------|------|
| Envelope | 48 facts scored for "report writing". All CVEs, findings, recommendations ranked | ~5ms |
| LLM generates | "Executive Summary... Finding 1: Critical..." → hits output limit | ~8s |
| Continuation | Extract from partial report, identify missing sections, build continuation envelope | ~15ms |
| Window 2 | Fresh context, continues report. 6 more findings + recommendations | ~6s |
| Stitch | Window 1 + Window 2 joined. Echo detection removes overlap. Clean 12-page report | ~2ms |
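
The stitch step can be pictured with a simple overlap-removal sketch. This is an illustrative stand-in, not CRP's echo detection: it assumes the echo is an exact repeat of the tail of Window 1 at the start of Window 2, and the `stitch` function and `min_overlap` parameter are hypothetical.

```python
def stitch(first: str, second: str, min_overlap: int = 8) -> str:
    """Join two windows, dropping the longest prefix of `second` that repeats the end of `first`."""
    max_len = min(len(first), len(second))
    # Try the longest candidate overlap first so the largest echo is removed.
    for size in range(max_len, min_overlap - 1, -1):
        if first.endswith(second[:size]):
            return first + second[size:]
    # No echo of at least `min_overlap` characters: plain concatenation.
    return first + second


window_1 = "Finding 1: Critical. Finding 2: High severity in"
window_2 = "High severity in OpenSSH. Finding 3: Medium."
report = stitch(window_1, window_2)
```

A production implementation would likely need fuzzy matching, since a model rarely repeats its previous output character for character.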

### Total CRP Overhead

| Step | CRP Time | LLM Time | Overhead |
|------|----------|----------|----------|
| Planning | ~6ms | ~3,000ms | 0.2% |
| Tool selection | ~8ms | ~2,000ms | 0.4% |
| Ingestion | ~7ms | 0ms | N/A |
| Analysis | ~12ms | ~5,000ms | 0.2% |
| Report + continuation | ~22ms | ~14,000ms | 0.2% |
| **Total** | **~55ms** | **~24,000ms** | **0.2%** |
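
The totals and percentages in this table follow directly from the per-step figures. A short check, using the approximate values from the table:

```python
# (CRP time in ms, LLM time in ms) per step, copied from the table above.
steps = {
    "Planning": (6, 3_000),
    "Tool selection": (8, 2_000),
    "Ingestion": (7, 0),
    "Analysis": (12, 5_000),
    "Report + continuation": (22, 14_000),
}


def overhead_pct(crp_ms: float, llm_ms: float):
    """CRP time as a percentage of LLM time; None when there is no LLM call."""
    return None if llm_ms == 0 else 100.0 * crp_ms / llm_ms


total_crp = sum(crp for crp, _ in steps.values())
total_llm = sum(llm for _, llm in steps.values())
total_overhead = overhead_pct(total_crp, total_llm)
```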

---

## CRP in the AI Stack

### The Three-Layer Architecture

```
+-----------------------------------------------------------+
|  Layer 3:  A2A  —  Agent-to-Agent Communication            |
|  "How agents talk to each other"                           |
+-----------------------------------------------------------+
|  Layer 2:  MCP  —  Model Context Protocol                  |
|  "How agents access tools"                                 |
+-----------------------------------------------------------+
|  Layer 1:  CRP  —  Context Relay Protocol                  |
|  "How each agent manages its own context"                  |
|  THE FOUNDATION LAYER                                      |
+-----------------------------------------------------------+
```

**CRP is complementary to MCP and A2A.** MCP defines how agents access tools. A2A defines how agents communicate. CRP defines how each agent **manages its own context** — the foundation that makes both work at scale.

- Without CRP, every MCP tool call competes for context space
- Without CRP, every A2A message accumulates in a degrading window
- With CRP + MCP: tool results are extracted into facts, not piled into the window
- With CRP + A2A: inter-agent messages are structured knowledge, not raw text

---

## Extraction Quality

| Stage | Method | What It Extracts | Accuracy | When It Runs |
|-------|--------|-----------------|----------|-------------|
| **1** | Regex | IP addresses, CVEs, JSON, version strings | ~99% | Always |
| **2** | Statistical (TextRank) | Key sentences by term frequency | ~85-90% recall | Always |
| **3** | GLiNER NER | Entity spans (software, vulnerabilities) | ~80-90% F1 | When yield is low |
| **4** | UIE Relations | Entity relationships (X vulnerable to Y) | ~70-80% F1 | When yield is low |
| **5** | Discourse Structure | Logical relations (cause→effect, condition→consequence) via RST | ~65-75% F1 | Reasoning-dense content |
| **6** | LLM-Assisted Relational | Implicit logical relationships | ~85-90% F1 | Optional, high-complexity only |

Stages are graduated — 3-6 activate selectively based on content complexity and prior stage yield. Content is auto-classified as `ENTITY_RICH`, `REASONING_DENSE`, or `NARRATIVE` to route through appropriate strategies.
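
As a rough illustration of graduated escalation, the sketch below implements only a regex stage and treats later stages as placeholder callables. The `extract` function, the `min_yield` threshold, and the patterns are assumptions for illustration, not the SDK's pipeline.

```python
import re

# Stage 1 patterns (illustrative): IPv4 addresses and CVE identifiers.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CVE = re.compile(r"\bCVE-\d{4}-\d{4,}\b")


def stage_regex(text: str) -> list[str]:
    return IPV4.findall(text) + CVE.findall(text)


def extract(text: str, later_stages=(), min_yield: int = 2):
    """Run the regex stage, then escalate through `later_stages` while yield stays low."""
    facts = stage_regex(text)
    stages_used = ["regex"]
    for name, stage in later_stages:
        if len(facts) >= min_yield:
            break  # earlier stages produced enough; skip the costlier ones
        facts = facts + stage(text)
        stages_used.append(name)
    return facts, stages_used


# Hypothetical stand-in for a model-backed NER stage.
fake_ner = ("ner", lambda text: ["service: outdated"])
```

With this shape, structured input such as scan output never reaches the expensive stages, while sparse prose escalates.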

| Content Type | Typical Stages | Typical Time |
|-------------|---------------|-------------|
| Structured/factual | 1-2 | ~10-15ms |
| Mixed content | 1-4 | ~50-80ms |
| Reasoning-dense | 1-5 | ~160ms |
| High-complexity | 1-6 | ~500ms+ (Stage 6 uses LLM) |

---

## Efficiency and Cost

### Per-Window Overhead

| Operation | Time | When |
|-----------|------|------|
| Multi-aspect scoring + graph packing | ~5-10ms | Every window |
| Cross-encoder reranking (top-200) | ~400ms | When >50 facts (amortized) |
| Extraction Stages 1-2 | ~6ms | Every window |
| Extraction Stage 3 (GLiNER) | ~50ms | Only when yield is low |
| Extraction Stage 4 (UIE) | ~100ms | Only when yield is low |
| Extraction Stage 5 (Discourse) | ~150ms | Reasoning-dense content |
| **Typical total** | **~15-20ms** | **0.1-1% of LLM time** |

### Token Efficiency: CRP vs MCP

| Cost Factor | MCP | CRP |
|-------------|-----|-----|
| Tool schemas in prompt | ALL repeated every call (10K-50K) | Zero — only for tool-selection windows |
| Accumulated context | All prior results stay, attention degrades | Only relevant extracted facts |
| Redundant content | Same schemas repeated N times | No repetition — envelope carries only what's relevant |

**Example**: 20-step agentic loop, 50 tools:
- **MCP**: 20 × 10K schema tokens = **200K tokens** on tool definitions alone
- **CRP**: Schemas in tool-selection windows only. **~90% fewer protocol tokens**
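
The arithmetic behind this example, as a quick check. The number of tool-selection windows (2 of 20) is an assumption chosen to reproduce the ~90% figure; the real share depends on your loop.

```python
steps = 20
schema_tokens_per_call = 10_000

# MCP: schemas are repeated on every call.
mcp_schema_tokens = steps * schema_tokens_per_call

# CRP: schemas appear only in tool-selection windows (assumed: 2 of the 20 steps).
tool_selection_windows = 2
crp_schema_tokens = tool_selection_windows * schema_tokens_per_call

reduction = 1 - crp_schema_tokens / mcp_schema_tokens
```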

### Cloud API Cost

| Scenario | Without CRP | With CRP | Savings |
|----------|------------|----------|---------|
| 20-step agentic loop (50 tools) | ~400K tokens | ~120K tokens | ~70% |
| Long report (3 continuations) | Truncated at limit | 4 windows, complete | N/A (impossible before) |
| Simple single-turn task | ~2K tokens | ~2K tokens | 0% (no penalty) |

### Real-World: 200-Page Textbook Generation

| Provider | Total Cost | Windows |
|----------|-----------|---------|
| Claude Opus | ~$17 | ~32 |
| Claude Sonnet | ~$3.30 | ~32 |
| GPT-4o | ~$2.50 | ~32 |
| DeepSeek | ~$0.27 | ~32 |
| Local model (Ollama) | **$0** | ~32 |

Naive approach (paste all prior chapters into context): ~800K+ input tokens and worse quality.

### Cost Controls

```python
client = Client(
    llm=adapter,
    max_windows_per_session=50,
    max_total_input_tokens=1_000_000,
    max_total_output_tokens=500_000,
)

# Pre-flight estimation
estimate = client.estimate_session(planned_dispatches=32, avg_output_tokens=4000)
print(f"Estimated cost: ${estimate.estimated_cost_usd:.2f}")

# Live tracking
status = client.session_status()
print(f"Running total: ${status.total_cost:.2f}")
```

Budget caps raise `BudgetExhaustedError` when hit. Rate limits are respected automatically.
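
Conceptually, a budget cap is a check made before each window is charged. The sketch below is a simplified model of that idea, not the SDK's implementation: the `Budget` class is hypothetical, and `BudgetExhaustedError` here is a local stand-in for the SDK error of the same name.

```python
from dataclasses import dataclass


class BudgetExhaustedError(RuntimeError):
    """Local stand-in for the SDK's error of the same name."""


@dataclass
class Budget:
    max_windows: int
    max_input_tokens: int
    max_output_tokens: int
    windows: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one window, refusing it if any cap would be exceeded."""
        if self.windows + 1 > self.max_windows:
            raise BudgetExhaustedError("window limit reached")
        if self.input_tokens + input_tokens > self.max_input_tokens:
            raise BudgetExhaustedError("input token limit reached")
        if self.output_tokens + output_tokens > self.max_output_tokens:
            raise BudgetExhaustedError("output token limit reached")
        self.windows += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
```

Checking before mutating keeps the counters accurate after a refusal, so `session_status()`-style reporting stays trustworthy.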

---

## Observability and Auditing

### Per-Window Metrics (Automatic)

Every `crp.dispatch()` records:

```json
{
  "window_id": "w-a3f2c1",
  "session_id": "pentest-192.168.1.50",
  "parent_windows": ["w-b7e4d2"],
  "envelope_tokens": 4200,
  "saturation": 0.94,
  "extraction_stages_used": ["regex", "statistical"],
  "extraction_time_ms": 7,
  "facts_extracted": 12,
  "information_flow_rate": 0.0018,
  "quality_tier": "S",
  "gap_analysis": {"required": 7, "fulfilled": 7, "missing": 0},
  "continuation_triggered": false
}
```

### Session Dashboard

| Metric | Alert Threshold | What It Means |
|--------|-----------------|---------------|
| Total windows | >>2× your call count | Runaway continuations |
| Continuation rate | >30% | Tasks may be too large for one window |
| Average saturation | <60% | Extraction yield is low |
| Extraction yield | <2 facts/window | Content type may need different strategy |
| Stage escalation rate | >50% | Structured output would help |
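
These thresholds can be evaluated from the per-window metrics shown earlier. The sketch below is one possible reading, under two stated assumptions: "escalation" means any stage beyond `regex` and `statistical` ran, and the `>>2×` rule is simplified to a plain `> 2×` comparison. The `dashboard_alerts` function is hypothetical.

```python
BASE_STAGES = {"regex", "statistical"}


def dashboard_alerts(windows: list[dict], call_count: int) -> list[str]:
    """Return the names of dashboard metrics that cross their alert thresholds."""
    alerts: list[str] = []
    n = len(windows)
    if n == 0:
        return alerts
    if n > 2 * call_count:
        alerts.append("total_windows")
    if sum(w["continuation_triggered"] for w in windows) / n > 0.30:
        alerts.append("continuation_rate")
    if sum(w["saturation"] for w in windows) / n < 0.60:
        alerts.append("average_saturation")
    if sum(w["facts_extracted"] for w in windows) / n < 2:
        alerts.append("extraction_yield")
    escalated = sum(1 for w in windows if set(w["extraction_stages_used"]) - BASE_STAGES)
    if escalated / n > 0.50:
        alerts.append("stage_escalation_rate")
    return alerts


healthy = [{"saturation": 0.94, "facts_extracted": 12,
            "extraction_stages_used": ["regex", "statistical"],
            "continuation_triggered": False}]
```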

### Window DAG Traceability

Every session produces a directed acyclic graph:

```
W1 (plan) → W2 (tool select) → W3 (analysis) → W4 (report) → W5 (report cont.)
```

Each node shows facts produced, facts consumed, information flow, and envelope saturation. This enables "why did it do that?" debugging: you can trace any decision back through the DAG to the windows that produced its inputs.
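
Tracing through the DAG amounts to following `parent_windows` links. A minimal sketch, assuming the per-window metrics have been collected into a mapping of window ID to parent IDs (the `trace_ancestors` helper and the `dag` layout are hypothetical):

```python
from collections import deque


def trace_ancestors(parents: dict[str, list[str]], window_id: str) -> list[str]:
    """Return every ancestor of `window_id`, nearest first (breadth-first over parent links)."""
    seen: list[str] = []
    queue = deque(parents.get(window_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a window reachable by two paths is reported once
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


# The linear session from the diagram above.
dag = {"W1": [], "W2": ["W1"], "W3": ["W2"], "W4": ["W3"], "W5": ["W4"]}
```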

---

## Limitations and Trade-offs

| Limitation | Severity | Mitigation |
|-----------|----------|-----------|
| **Extraction is lossy** | MEDIUM | 6-stage pipeline covers spectrum. ~85-90% recall on structured, ~70-80% on reasoning-dense, ~50-65% on implicit. See [§7.6](specification/02_CORE_PROTOCOL.md) for degradation model |
| **Fact granularity mismatch** | MEDIUM | Graduated pipeline from tight entities (regex) through relationships (UIE, discourse). Fact graph preserves inter-fact relationships |
| **Hallucinations may pass fact gate** | MEDIUM | Three-tier validation: structural, confidence, anomaly detection. Not perfect for structurally-valid hallucinations |
| **Cold start** | LOW | First window: empty envelope. First ~5 windows: calibrating. System bootstraps safely — never prematurely terminates |
| **Not beneficial for single-turn** | N/A | CRP adds zero value (and zero cost) for tasks that fit in one window |

---

## Why Large Context Windows Are Not Enough

"But my model has 1M context!" — Three problems:

1. **Output limits are NOT 1M.** Models with 1M *input* have *output* limits of 8K-32K. You still need continuation
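2. **Attention degrades with length.** The "lost in the middle" effect: models use information placed in the middle of a long context less reliably than information near the start or end ([Liu et al., 2023](https://arxiv.org/abs/2307.03172))
3. **Cost scales quadratically.** If every call re-sends the accumulated context, total input tokens over $N$ calls grow as $O(N^2)$. With fixed-size CRP envelopes, the total grows as $O(N)$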
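
The scaling claim in point 3 can be checked with a small calculation. The 2,000 tokens added per step is an illustrative assumption; the 4,200-token envelope is the figure from the pentest analysis step above.

```python
def growing_context_tokens(calls: int, tokens_per_step: int) -> int:
    """Call k re-sends the output of steps 1..k, so the total is t * N(N+1)/2."""
    return sum(k * tokens_per_step for k in range(1, calls + 1))


def fixed_envelope_tokens(calls: int, envelope_tokens: int) -> int:
    """Each call sends one bounded envelope, so the total is N * envelope size."""
    return calls * envelope_tokens


growing_20 = growing_context_tokens(20, 2_000)   # 2_000 * 210
growing_40 = growing_context_tokens(40, 2_000)   # 2_000 * 820
envelope_20 = fixed_envelope_tokens(20, 4_200)
envelope_40 = fixed_envelope_tokens(40, 4_200)
```

Doubling the number of calls roughly quadruples the growing-context total, while the envelope total only doubles.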

But more fundamentally, **context size is only 1 of CRP's 9 permanent value propositions**:

| # | Value Proposition | Why Native Context Cannot Provide It |
|---|---|---|
| 1 | **Context Quality** | CRP's scored, graph-structured envelopes put the right facts first. Raw text has no ranking |
| 2 | **Task Isolation** | One window per task. No cross-task attention contamination |
| 3 | **Attention Optimization** | Critical facts placed in the attention sink, not buried at position 500K |
| 4 | **Cost Efficiency** | $O(N)$ total tokens vs $O(N^2)$ for growing native context |
| 5 | **Cross-Session Knowledge** | CKF persists facts and reasoning across sessions |
| 6 | **Structured Knowledge** | Typed fact graph with edges, communities, temporal history |
| 7 | **Multi-Agent Coordination** | Envelope = structured state transfer between agents |
| 8 | **Observability** | Full provenance: every fact has source, confidence, lifecycle |
| 9 | **Reasoning Amplification** | Meta-learning scaffolds turn 2B models into reasoning systems |

**Even a model with infinite native context needs CRP** for propositions 1-4, 6-9.

Scientific backing: "Retrieval can significantly improve the performance of LLMs **regardless of their extended context window sizes**" — Xu et al., ICLR 2024.

---

## Specification Documents

The complete CRP v2.0 specification:

| # | Document | Description | Lines |
|---|----------|-------------|-------|
| 1 | [01_RESEARCH_FOUNDATIONS.md](specification/01_RESEARCH_FOUNDATIONS.md) | Academic research backing — 9 research areas, 40+ papers, meta-learning, retrieval augmentation | ~1,200 |
| 2 | [02_CORE_PROTOCOL.md](specification/02_CORE_PROTOCOL.md) | **The core specification** — 29 sections: axioms, state model, CKF, extraction, completion detection, quality tiers, hierarchical processing, meta-learning, security, concurrency, observability, deployment, publication | ~6,800 |
| 3 | [03_CONTEXT_ENVELOPE.md](specification/03_CONTEXT_ENVELOPE.md) | Context envelope — multi-phase scoring, CKF retrieval, source grounding, continuation envelopes | ~1,200 |
| 4 | [04_TOKEN_GENERATION_PROTOCOL.md](specification/04_TOKEN_GENERATION_PROTOCOL.md) | Unbounded output — continuation, stitching, voice profiles, document maps, completion detection | ~950 |
| 5 | [05_SYSTEM_WIDE_INTEGRATION.md](specification/05_SYSTEM_WIDE_INTEGRATION.md) | Integration architecture — 87+ call sites mapped, component inventory, migration strategy | ~1,750 |
| 6 | [06_IMPLEMENTATION_PLAN.md](specification/06_IMPLEMENTATION_PLAN.md) | Implementation plan — phased rollout, 13 modules, ~3,890 lines of code planned | ~2,000 |
| 7 | [07_SECURITY.md](specification/07_SECURITY.md) | Security architecture — threat model, input validation, fact integrity, RBAC, encryption, OWASP, quantum resistance | ~1,300 |
| 8 | [08_MONETIZATION.md](specification/08_MONETIZATION.md) | Business model — PostgreSQL model (full capability free), 5 revenue pillars, competitive positioning | ~2,000 |
| 9 | [09_DEPLOYMENT.md](specification/09_DEPLOYMENT.md) | Deployment — embedded library rationale, resource footprint, Lambda/K8s/MCP comparison, containerization | ~2,000 |

**Total specification**: ~19,200 lines across 9 documents.

---

## JSON Schemas

All API types are defined as [JSON Schema (Draft 2020-12)](https://json-schema.org/draft/2020-12/schema) for language-neutral consumption:

| Schema | Description | Source |
|--------|-------------|--------|
| [task-intent.json](schemas/task-intent.json) | `TaskIntent` — declarative, all-optional dispatch input | §6.10.2 |
| [quality-report.json](schemas/quality-report.json) | `QualityReport` — returned with every dispatch | §6.10.2 |
| [session-status.json](schemas/session-status.json) | `SessionStatus` — session health snapshot | §6.10.2 |
| [cost-estimate.json](schemas/cost-estimate.json) | `CostEstimate` — pre-flight cost estimation | §6.10.2 |
| [envelope-preview.json](schemas/envelope-preview.json) | `EnvelopePreview` — inspect without dispatching | §6.10.2 |
| [session-handle.json](schemas/session-handle.json) | `SessionHandle` — returned by init() | §6.10.8 |
| [stream-event.json](schemas/stream-event.json) | `StreamEvent` — streaming dispatch events | §6.10.5 |
| [crp-error.json](schemas/crp-error.json) | `CRPError` — standard error format | §6.10.4 |
| [persisted-state-header.json](schemas/persisted-state-header.json) | `PersistedStateHeader` — cold state versioning | §6.10.10 |

---

## API Surface

CRP exposes a synchronous + async + streaming API. All operations use **direct function invocation** (not network RPC). SDKs MAY expose JSON-RPC or gRPC transports for cross-process access.

### Core Operations

| Operation | Stability | Description |
|-----------|-----------|-------------|
| `Client(provider=..., app_id=...)` | **Stable** | Create session, init subsystems, restore cold state |
| `dispatch(system_prompt, task_input, ...)` | **Stable** | Execute LLM window with envelope, extract facts |
| `dispatch_stream(...)` | Provisional | Streaming variant — emits token/extraction/continuation/done events |
| `ingest(raw_text, ...)` | **Stable** | Extract facts without LLM invocation (~7ms) |
| `session_status()` | **Stable** | Session health: windows, tokens, facts, budget remaining, cost |
| `estimate_session(...)` | **Stable** | Pre-flight cost estimation with USD pricing |
| `preview_envelope(...)` | **Stable** | Inspect what the envelope would contain |
| `configure(config)` | **Stable** | Update security/cost config (ADMIN) |
| `export_state(...)` | Provisional | Export encrypted session state |
| `close()` | **Stable** | Flush warm → cold, persist CKF, clean up |

### Error Taxonomy

| Code | Error | Comparable To |
|------|-------|---------------|
| 1001 | `BudgetExhaustedError` | gRPC `RESOURCE_EXHAUSTED` |
| 1002 | `RateLimitExceeded` | HTTP 429 |
| 1003 | `SessionExpired` | gRPC `DEADLINE_EXCEEDED` |
| 1005 | `SessionClosed` | gRPC `FAILED_PRECONDITION` |
| 1010 | `ValidationError` | gRPC `INVALID_ARGUMENT` |
| 1011 | `SecurityInvariantError` | gRPC `ABORTED` |
| 1012 | `SignatureInvalidError` | gRPC `UNAUTHENTICATED` |
| 1020 | `ProviderError` | gRPC `INTERNAL` |
| 1021 | `ProviderTimeoutError` | gRPC `DEADLINE_EXCEEDED` |
| 1030 | `StateCorruptedError` | gRPC `DATA_LOSS` |
| 1031 | `ChainVerificationFailedError` | gRPC `DATA_LOSS` |

Full error taxonomy with all codes in [§6.10.4](specification/02_CORE_PROTOCOL.md).

### RBAC Roles

| Role | Permissions |
|------|------------|
| **OBSERVER** | `session_status`, `estimate_session` |
| **OPERATOR** | All OBSERVER + `dispatch`, `ingest`, `preview_envelope` |
| **ADMIN** | All OPERATOR + `configure`, `reset_session`, `export_state` |

> RBAC is fully enforced in the SDK. Every dispatch, ingest, and admin operation checks `RBACEnforcer.check_permission()` and `check_rate_limit()` before proceeding. Default role is OPERATOR (dispatch + ingest). Set via `CRPConfig(default_role="ADMIN")` or `CRPConfig(default_role="OBSERVER")`.
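
The cumulative role hierarchy in the table can be modelled as nested permission sets. This is an illustrative model only: the `ROLE_PERMISSIONS` mapping and `is_allowed` helper are hypothetical and do not reflect the signature of `RBACEnforcer.check_permission()`.

```python
# Each role inherits everything from the role below it, as in the table above.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "OBSERVER": {"session_status", "estimate_session"},
}
ROLE_PERMISSIONS["OPERATOR"] = ROLE_PERMISSIONS["OBSERVER"] | {
    "dispatch", "ingest", "preview_envelope",
}
ROLE_PERMISSIONS["ADMIN"] = ROLE_PERMISSIONS["OPERATOR"] | {
    "configure", "reset_session", "export_state",
}


def is_allowed(role: str, operation: str) -> bool:
    """True if `role` may perform `operation`; unknown roles are denied."""
    return operation in ROLE_PERMISSIONS.get(role, set())
```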

---

## SDK Status

| Language | Status | Package | Repository |
|----------|--------|---------|------------|
| **Python** | ✅ v2.0.0 | `pip install -e ".[dev]"` | This repository |
| **TypeScript** | 📋 Planned | `npm install @crp/sdk` | `crp-typescript` |
| **Rust** | 📋 Planned | `cargo add crp` | `crp-rust` |

### Python SDK — Quick Start

```bash
pip install -e ".[dev]"
```

```python
import crp

# Zero-config: auto-detects LLM from environment
client = crp.Client()

# Or explicit: pass model name or provider
client = crp.Client(model="gpt-4o")
# client = crp.Client(provider=CustomProvider(...))

# Dispatch — CRP builds envelope, calls your LLM, extracts facts, returns raw output
output, report = client.dispatch(
    system_prompt="You are a security analyst.",
    task_input="Analyze the authentication flow in auth.py.",
)

print(output)                    # Unmodified LLM output (Axiom 9)
print(report.quality_tier)       # "S" | "A" | "B" | "C" | "D"
print(report.facts_extracted)    # Facts pulled from output
print(report.continuation_windows)  # How many continuation windows were used
```

**Built-in Providers:**

| Provider | Import | Requirements |
|----------|--------|-------------|
| Auto-detect | `crp.Client()` | Set `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or run Ollama |
| Custom (any LLM) | `crp.providers.CustomProvider` | None |
| OpenAI / Azure | `crp.providers.OpenAIAdapter` | `openai>=1.0`, `tiktoken` |
| Anthropic | `crp.providers.AnthropicAdapter` | `anthropic>=0.25` |
| Ollama | `crp.providers.OllamaAdapter` | Running Ollama instance |
| llama.cpp | `crp.providers.LlamaCppAdapter` | `llama-cpp-python` or HTTP server |

**Key Features:**
- 351 tests passing (integration + benchmarks + unit)
- Zero-config auto-detection (`Client()` or `Client(model="...")`)
- Quality tier classification (S/A/B/C/D) on every dispatch
- Zero-LLM ingestion (`client.ingest()`)
- Streaming dispatch (`client.dispatch_stream()`)
- Continuation with key findings context threading
- Full observability (events, audit log, metrics export)
- Budget enforcement (windows, input/output tokens)
- State export (encrypted AES-256-GCM)

The specification is language-neutral. JSON Schemas in [`/schemas/`](schemas/) enable code generation for any language.

---

## Comparison with Alternatives

| Approach | What It Does | Limitation | In-Window Overhead |
|----------|-------------|------------|-------------------|
| **Naive Prompting** | Everything in one window | Context contamination, attention collapse | None (quality degrades) |
| **RAG** | Retrieves relevant documents (flat vectors) | No output management, no continuation, no graph | Retrieved chunks only |
| **MemGPT / Letta** | Virtual memory via LLM self-management | LLM burns tokens managing its own memory | High (memory function calls) |
| **GraphRAG** | Knowledge graph + community summaries | Static offline indexing, no real-time extraction | Low (query overhead) |
| **Sliding Window** | Truncates old context | Early context permanently lost | Low (but lossy) |
| **MCP** | Standardized tool interface | Manages tool *access*, not tool *output context* | Very High (10K-50K schemas) |
| **A2A** | Inter-agent communication | Manages *messages between agents*, not context *within* | Varies |
| **CRP** | Task isolation + CKF + extraction envelopes + continuation | Extraction is imperfect | **Zero** |

**Key distinction**: MCP and A2A solve *different problems*. MCP connects LLMs to tools. A2A connects agents to each other. CRP manages context within each agent. They are complementary and can be used together.

---

## Hardware Requirements

| Component | Size | Required? |
|-----------|------|-----------|
| Your LLM | Varies | Yes (already running) |
| all-MiniLM-L6-v2 (embeddings) | ~80MB | Yes |
| ms-marco-MiniLM-L6-v2 (reranker) | ~80MB | No (bi-encoder sufficient for <50 facts) |
| GLiNER (NER) | ~200MB | No (lazy-loaded, degrades gracefully) |
| UIE (relations) | ~400MB | No (lazy-loaded, degrades gracefully) |

**Minimum**: Any machine running an LLM can run CRP. **80MB required** + 0-680MB optional.

---

## Configuration

CRP follows a 5-layer configuration hierarchy (see [§25](specification/02_CORE_PROTOCOL.md)):

```
Layer 5: Runtime API  (highest priority)
Layer 4: Environment Variables
Layer 3: Session Config File
Layer 2: User Config File
Layer 1: Built-in Defaults  (lowest priority)
```

All configuration is optional. CRP works with zero configuration if you pass your LLM adapter directly.
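
The precedence order can be sketched with `collections.ChainMap`, which returns the first value found when searching its mappings left to right. The `resolve_config` function and the key names are hypothetical; only the layer ordering comes from the hierarchy above.

```python
from collections import ChainMap


def resolve_config(runtime=None, env=None, session_file=None,
                   user_file=None, defaults=None) -> dict:
    """Merge the five layers; earlier arguments override later ones."""
    layers = [runtime or {}, env or {}, session_file or {}, user_file or {}, defaults or {}]
    return dict(ChainMap(*layers))


config = resolve_config(
    runtime={"log_envelopes": True},                          # Layer 5
    env={"max_windows": 50},                                  # Layer 4
    defaults={"max_windows": 100, "log_envelopes": False},    # Layer 1
)
```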

### Key Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CRP_ENABLED` | `true` | Master switch |
| `CRP_LLM_ENDPOINT` | — | Fallback LLM endpoint if no adapter passed |
| `CRP_LOG_ENVELOPES` | `false` | Debug: log envelope contents |
| `CRP_MAX_WINDOWS` | `100` | Session window limit |
| `CRP_TELEMETRY_FILE` | `crp_telemetry.jsonl` | Telemetry output path |

---

## Use Cases

| Domain | How CRP Helps |
|--------|--------------|
| **Penetration Testing** | Each tool selection, analysis, and report section gets a fresh window with full findings context |
| **Report Generation** | Unbounded-length reports with per-section windows, gap-aware continuation, quality-tiered output |
| **Multi-Step Reasoning** | Each step gets full context; prior conclusions carried as facts. ORC decomposes complex reasoning |
| **Agentic Tool Use** | Tool results extracted into facts immediately; next selection sees ALL discoveries, ranked |
| **Code Generation** | Large codebases across multiple windows; each sees full architecture via envelopes |
| **Research & Analysis** | Long-form analysis exceeding any single window; information flow detects genuine completion |
| **Small Model Amplification** | Meta-learning scaffolds enable 2B–7B models to perform reasoning they cannot do natively |
| **Legal Document Analysis** | Million-token contracts auto-ingested; cross-reference tracking via fact graph |
| **Medical Literature Review** | Cross-session knowledge accumulates across papers; community detection groups related findings |

---

## Roadmap

### Phase 1: Open Specification ← **We are here**
- [x] Publish CRP v2.0 specification (9 documents, ~19,200 lines)
- [x] JSON Schema definitions for all API types
- [ ] Reference SDK: Python (`pip install crprotocol`)
- [ ] Benchmark results: CRP on vs. off across tasks and models
- [ ] arXiv technical report with empirical evaluation

### Phase 2: Ecosystem
- [ ] JSON-RPC server mode — any language can use CRP over HTTP
- [ ] TypeScript/JavaScript reference implementation
- [ ] Integration guides: LangChain, LlamaIndex, AutoGen, CrewAI
- [ ] MCP + CRP integration example
- [ ] A2A + CRP integration example

### Phase 3: Meta-Learning & Advanced Features
- [ ] Source-Grounded Envelope engine
- [ ] LLM-Driven Context Curation with progressive understanding
- [ ] Reasoning Template Library (RTL)
- [ ] Orchestrated Reasoning Chains (ORC)
- [ ] Domain-specialized GLiNER models (cybersecurity, biomedical, legal, financial, regulatory)
- [ ] Benchmark: reasoning amplification on 2B/7B vs. baseline

### Phase 4: Adoption & Standards
- [ ] IETF Internet-Draft submission
- [ ] W3C Community Group: "Context Management for AI"
- [ ] LF AI & Data project hosting
- [ ] Conformance test suite
- [ ] Community benchmark suite for context management quality

---

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for:

- How to submit issues, spec clarifications, and pull requests
- The RFC process for non-trivial specification changes
- Code of conduct
- Contributor License Agreement (CLA)

---

## Governance

CRP follows an open governance model inspired by the Apache Software Foundation. See [GOVERNANCE.md](GOVERNANCE.md) for:

- Roles: Maintainers, Committers, Contributors
- Decision-making process (consensus-seeking, lazy consensus for minor changes, formal vote for breaking changes)
- Specification versioning and deprecation policy

---

## Security

See [SECURITY.md](SECURITY.md) for:

- Responsible disclosure policy
- Security contact information
- What constitutes a security vulnerability in CRP
- How security issues in reference implementations are handled

The protocol's security architecture is documented in [07_SECURITY.md](specification/07_SECURITY.md) — covering threat modeling, input validation, fact integrity, RBAC, encryption at rest, OWASP mapping, and quantum resistance planning.

---

## Community

- **GitHub Discussions**: [Join the conversation](https://github.com/Constantinos-uni/context-relay-protocol/discussions)
- **GitHub Issues**: Bug reports, spec clarifications, feature requests
- **General enquiries**: [info@crprotocol.io](mailto:info@crprotocol.io)
- **Enterprise & licensing**: [contact@crprotocol.io](mailto:contact@crprotocol.io)

---

## Built With

| Component | Technology |
|-----------|-----------|
| **Knowledge Layer** | CKF — graph walk + pattern query + semantic fallback + community summaries, event-sourced history |
| **Extraction** | 6-stage graduated blackboard-reactive pipeline (regex → TextRank → GLiNER → UIE → RST discourse → LLM-relational) |
| **Source Grounding** | Dual-layer envelopes — extracted facts paired with original text passages |
| **Meta-Learning** | ORC + ICML + RTL — structured reasoning scaffolding for small models |
| **Embeddings** | sentence-transformers/all-MiniLM-L6-v2 (~80MB, CPU) |
| **Reranking** | cross-encoder/ms-marco-MiniLM-L6-v2 (~80MB, ~500 pairs/sec on CPU) |
| **Indexing** | HNSW approximate nearest neighbor — O(log N) retrieval |
| **Storage** | Warm state (in-memory fact graph + event log) + CKF cold storage (SQLite WAL + vector DB + graph) |
| **Coherence** | Voice profiles, progressive document maps, degradation-triggered re-grounding |
| **Validation** | Pydantic v2 / JSON Schema Draft 2020-12 |

---

## Positioning Statement

**For developers building LLM-powered applications** who need reliable context management across multiple LLM invocations, **CRP (Context Relay Protocol)** is an open protocol that provides structured knowledge extraction, cross-session persistence, and honest quality guarantees. **Unlike** ad-hoc prompt chaining, proprietary context APIs, or vector-only RAG, CRP offers a **formally specified, LLM-agnostic, embedded-library protocol** with a graduated extraction pipeline, graph-structured knowledge fabric, and transparent degradation model — all deployable with zero infrastructure overhead.

---

## License

Context Relay Protocol (CRP) is the original work of **Constantinos Vidiniotis**, created in 2026.

### Specification

The protocol specification documents are licensed under the **Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)**. You may read, share, and adapt the specification with attribution. Full terms: https://creativecommons.org/licenses/by-sa/4.0/

### Implementation Code

SDK and implementation code is licensed under the **Elastic License 2.0 (ELv2)**. You may use CRP freely in your own applications. You may NOT offer CRP as a hosted/managed service without a commercial license.

### Commercial Licensing

For enterprise licensing, managed-service rights, or OEM inquiries:

**AutoCyber AI Pty Ltd** · ABN 22 697 087 166
Email: [contact@crprotocol.io](mailto:contact@crprotocol.io) · General: [info@crprotocol.io](mailto:info@crprotocol.io) · Web: [crprotocol.io](https://crprotocol.io)

### Trademark

"Context Relay Protocol" is a trademark of Constantinos Vidiniotis (application pending, Class 9 — IP Australia).
Use of the name to refer to this project is welcomed; use implying endorsement or affiliation without authorization is not permitted.

See [LICENSE.md](LICENSE.md) for the full license text.

**Copyright (c) 2026 Constantinos Vidiniotis. All rights reserved.**

---

<p align="center">
  <strong>Context Relay Protocol v2.0</strong><br>
  Zero configuration. Unbounded input. Unbounded output. Amplified reasoning.<br>
  Better context at every scale. Honest degradation. Quality-tiered.<br>
  Peak quality. Every window.
</p>
