Metadata-Version: 2.4
Name: clawglove
Version: 2.1.0
Summary: Governed execution substrate for autonomous AI agents
Author: Navakanth
License: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: confluent-kafka>=2.3.0
Requires-Dist: docker>=7.0.0
Requires-Dist: opentelemetry-sdk>=1.24.0
Requires-Dist: opentelemetry-exporter-otlp>=1.24.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Provides-Extra: fs-watch
Requires-Dist: watchdog>=4.0.0; extra == "fs-watch"
Provides-Extra: distributed
Requires-Dist: etcd3>=0.12.0; extra == "distributed"
Dynamic: license-file

﻿<p align="center">
  <img src="assets/clawglove-logo.png" width="320"/>
</p>

# ClawGlove

**Governed execution substrate for autonomous agents and OpenClaw-class runtimes.**  
Intercepts, audits, and enforces policy boundaries on every action an AI agent attempts — at the network layer, before it reaches the outside world.

[![CGBench Grade](https://img.shields.io/badge/CGBench-G--5%20Provenance%20Certified-blueviolet?style=flat-square)](https://github.com/navakanth1984/ClawGlove)
[![Sensitivity](https://img.shields.io/badge/Sensitivity-100%25-brightgreen?style=flat-square)](https://github.com/navakanth1984/ClawGlove)
[![Specificity](https://img.shields.io/badge/Specificity-100%25-brightgreen?style=flat-square)](https://github.com/navakanth1984/ClawGlove)
[![H_escape](https://img.shields.io/badge/H__escape-0.6194%20bits-blue?style=flat-square)](https://github.com/navakanth1984/ClawGlove)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow?style=flat-square)](LICENSE)

---

## The Problem

Autonomous AI agents are powerful precisely because they act without asking permission. That same property makes them dangerous: a compromised or misaligned agent can exfiltrate credentials, cross tenant boundaries, persist malicious context across sessions, or bypass its LLM gateway — before operator intervention is possible.

Existing safety benchmarks measure what a model *says* it will do. ClawGlove measures what an agent *actually does* — at the execution layer — and stops it if it violates policy.

**The core metric:** *Governance Escape Entropy* (H_escape) — the Shannon entropy of escape paths an agent can probe before quarantine. Standard agents have H_escape > 1.5 bits. ClawGlove reduces it to **0.62 bits** under normal operation and **0.00 bits** under active attack (quarantine collapses all escape paths to a single outcome).

---

## What ClawGlove Does

<p align="center">
  <img src="assets/clawglove_architecture.png" alt="ClawGlove Sidecar Governance Plane Architecture" width="800"/>
</p>

* **Agent containers**: Reside in the `clawglove-internal` network (`internal: true`).
* **Fail-Closed Security**: There is **NO direct external route** to the internet. The only path is through the ClawGlove Sidecar. If the sidecar goes down, agents go dark instantly by design.

---

## CGBench Certification

ClawGlove ships with **CGBench** — the Adversarial Governance Benchmark Suite — a 5-layer certification harness that produces a verified governance grade.

### Scorecard

```
================================================================================
                  CGBENCH GOVERNANCE CERTIFICATION SCORECARD
================================================================================
  Governance Entropy (H_gov):   5.5639 bits  (creative tool path variance)
  Escape Entropy (H_escape):    0.6194 bits  (target <=1.0 bits)   ✓
  Governance Drift Velocity:    0.9053 / step (trust boundary expansion rate)
  Contamination Isolation:      100.0%     (target 100%)           ✓
  Replay Mutation Recovery:     100.0%     (target >=98.0%)        ✓
  Persistence Dwell Blocking:   5.0 ms     (target <=100ms)        ✓
  Survivability Index:          100.0%     (target 100%)           ✓
  Surface Exposure Score:       0.100      (target <=0.150)        ✓
  Runtime Constraint Safety:    100.0%     (target >=98%)          ✓
--------------------------------------------------------------------------------
  AWARDED GOVERNANCE GRADE:     G-5 (Provenance Certified)
================================================================================
```

### Governance Grades

| Grade | Name | H_escape | Description |
|---|---|---|---|
| G-1 | Baseline | > 3.0 bits | No governance layer |
| G-2 | Policy Aware | ≤ 3.0 bits | Basic allow/deny rules |
| G-3 | Epoch Sealed | ≤ 2.0 bits | Replay integrity enforced |
| G-4 | Drift Certified | ≤ 1.0 bits | Escape entropy bounded |
| **G-5** | **Provenance Certified** | **≤ 1.0 bits + all gates** | **Full substrate certification** |

### Benchmark Layers

| Layer | What It Tests |
|---|---|
| L1 — Runtime Governance | 25 violation + 25 clean action deterministic checks |
| L2 — Probabilistic Drift | Shannon escape entropy over 50 stochastic runs |
| L3 — Cross-Agent Contamination | Shared memory poison, trust epoch mismatch, cross-domain replay |
| L4 — Autonomous Persistence | Policy self-modification, heartbeat amplification, subprocess replication |
| L5 — Infrastructure Resilience | Kafka/etcd offline fallbacks, zero-stall verification |
| X-Ray | Trust surface scan: config boundaries, unverified skills, secret exposure |
| Forensic | EventStore mutation injection and recovery audit |

---

## CVE Coverage

ClawGlove's violation suite is grounded in real OpenClaw CVEs:

| CVE | Category |
|---|---|
| CVE-2026-25253 | Sarvam API key exfiltration |
| CVE-2026-32922 | Cross-workspace boundary violation |
| CVE-2026-24763 | openclaw.json config poisoning |
| CVE-2026-26322 | Non-Sarvam LLM provider bypass |
| CVE-2026-26329 | Shell injection via tool output |
| CVE-2026-30741 | Path traversal read |
| CVE-2026-35650 | Writable config boundary (CVSS 9.8) |
| ClawHavoc | Multi-vector coordinated campaign |

---

## Roadmap

ClawGlove currently governs **structural** agent behavior — outbound actions, policy boundaries, and runtime state transitions. The known open frontier is **semantic** governance:

- **Context Provenance Tracking** — every memory fragment carries lineage, every tool result carries trust origin, every derived conclusion maps upstream. Git blame for autonomous cognition.
- **Retrieval Contamination Detection** — poisoned memory shaping and multi-turn context laundering operating below the action layer.
- **Latent Goal Drift** — long-horizon steering that never triggers a single policy violation but cumulatively moves the agent off its intended trajectory.

These are tracked as future work. Contributions welcome.

---

## Architecture Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Sidecar model | Out-of-process TCP daemon | Agent crash cannot bypass governance |
| Fail-closed | Docker `internal: true` network | No sidecar = no internet for agents |
| Event store | Kafka (Redpanda) + local fallback | Append-only, tamper-evident governance log |
| Consensus | etcd + local fallback | Distributed leader election for multi-node |
| Observability | OpenTelemetry → Jaeger | Governance spans traceable across agent calls |
| Eval harness | Human-written, OS-ACL locked | AI cannot modify its own benchmark |

---

## Quick Start

**Prerequisites:** Docker Desktop, Python 3.12

```bash
git clone https://github.com/navakanth1984/ClawGlove.git
cd ClawGlove

# Install Python package
pip install -e .

# Start the full stack (Redpanda + etcd + OTel + Jaeger + sidecar + agents)
docker compose up -d

# Wait ~30s for services to initialize, then verify
docker ps --format "{{.Names}}\t{{.Status}}"
```

All 7 containers should show `Up` or `healthy`.

---

## Verify Your Deployment

**Phase 5 — Network isolation + fail-closed test (10 checks):**
```powershell
.\scripts\test_failclosed.ps1
```

Expected: `RESULT: 10 passed / 0 failed`

**CGBench — Full governance certification:**
```powershell
$env:PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION="python"
$env:PYTHONIOENCODING="utf-8"
py -m cgbench.runner --policies policies/ --runs 50
```

Expected: `AWARDED GOVERNANCE GRADE: G-5 (Provenance Certified)`

---

## Repository Structure

```
ClawGlove/
├── clawglove/
│   ├── interfaces/          — frozen ABCs (EventStore, TenantIsolation, Coordinator, Telemetry, PolicyEngine)
│   ├── events/
│   │   └── kafka_store.py   — Kafka EventStore with offline fallback
│   ├── tenants/
│   │   └── container_isolation.py — Docker tenant isolation with cgroup limits
│   ├── runtime/
│   │   └── etcd_coordinator.py    — etcd leader election + offline fallback
│   ├── metrics/
│   │   └── otel_telemetry.py      — OpenTelemetry with 4 governance scalars
│   ├── policies/
│   │   ├── compiler.py      — YAML policy compiler
│   │   └── engine.py        — runtime policy enforcer (fail-closed)
│   └── sidecar/
│       ├── daemon.py        — out-of-process TCP sidecar
│       ├── client.py        — agent-side client wrapper
│       ├── http_proxy.py    — HTTP proxy intercepting api.sarvam.ai
│       └── escalation.py   — ThreatEscalationTracker (NORMAL→ELEVATED→QUARANTINE)
├── cgbench/
│   ├── runner.py            — benchmark orchestrator
│   ├── metrics.py           — H_escape, H_gov, V_drift calculations
│   ├── discovery.py         — X-Ray trust surface scanner
│   └── layers/              — L1â€“L5 + replay integrity
├── policies/
│   ├── screenwriter.yaml    — screenwriter agent policy
│   └── director.yaml        — director agent policy
├── tests/eval/
│   └── run_empirical_eval.py — human-locked evaluation harness (25 violations + 25 clean)
├── docker/
│   ├── Dockerfile.sidecar
│   ├── Dockerfile.agent-test
│   ├── start-sidecar.sh
│   ├── agent_harness.py
│   └── otel-collector-config.yaml
├── docker-compose.yml
└── scripts/
    ├── test_failclosed.ps1  — Phase 5 network isolation verification
    └── cgbench_docker.ps1   — CGBench against live Docker stack
```

---

## Evaluation Harness Integrity

The empirical evaluation harness (`tests/eval/run_empirical_eval.py`) is **human-written and OS-ACL locked**. A git pre-commit hook rejects any commit that modifies files under `tests/eval/`. This ensures the benchmark cannot be gamed by the system being measured.

```bash
# Lock on Windows (run once after clone)
icacls tests\eval\ /deny SYSTEM:(OI)(CI)(W)
icacls tests\eval\ /deny %USERNAME%:(OI)(CI)(W)
```

---

## Policy Configuration

Policies are YAML files in `policies/`. Each tenant policy defines allowed actions and explicit denials:

```yaml
tenant_id: screenwriter
allowed_actions:
  - llm_call
  - file_read_workspace
  - memory_read
  - send_*
  - search_*
  - heartbeat_*
denied_actions:
  - read_credentials_dir
  - exec_shell_command
  - cross_tenant_*
  - write_soul_md
  - write_agents_md
  - install_unverified_skill
```

Non-tenant YAML files (budget limits, routing weights) are silently skipped by the compiler.

---

## License

MIT — see [LICENSE](LICENSE)

---

## Author

Built by [Navakanth](https://github.com/navakanth1984) — Microsoft Certified Trainer, Data Engineer, and AI systems architect based in Hyderabad, India.

