arbiter-ops · Feature Catalog
Architecture · the 9 planes
Every plane has the same hex-arch shape: domain/{ports,models}.py (pure Python · zero infra imports) → application/ (orchestration · port-consuming) → adapters/ (concrete tech · swappable).
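The Python sketch below illustrates that shape and is for orientation only. The IngestSinkPort name comes from the sensing plane below. The Signal model, the accept() method, InMemoryIngestSink and IngestService are hypothetical stand-ins, not the package's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# domain/models.py: plain data, no infrastructure imports
@dataclass(frozen=True)
class Signal:
    source: str
    payload: dict


# domain/ports.py: the contract the application layer depends on
class IngestSinkPort(ABC):
    @abstractmethod
    def accept(self, signal: Signal) -> None: ...


# adapters/: one concrete, swappable implementation of the port
class InMemoryIngestSink(IngestSinkPort):
    def __init__(self) -> None:
        self.received: list = []

    def accept(self, signal: Signal) -> None:
        self.received.append(signal)


# application/: orchestration that sees only the port, never the adapter class
class IngestService:
    def __init__(self, sink: IngestSinkPort) -> None:
        self._sink = sink

    def ingest(self, source: str, payload: dict) -> None:
        self._sink.accept(Signal(source=source, payload=payload))


sink = InMemoryIngestSink()
IngestService(sink).ingest("prometheus", {"alertname": "HighLatency"})
```

The application object receives the port through its constructor. Swapping the in-memory adapter for a production one is therefore a wiring change, not a code change.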
| # | Plane | Purpose | Ports | Adapters | Recent additions |
|---|---|---|---|---|---|
| 1 | sensing | Ingest from observability sources | 10 | 18 | OCSF normalizer · provenance stamp |
| 2 | context | Entity + topology resolver | 4 | 2 | in-memory defaults · Neo4j scaffold |
| 3 | feature | Feature engineering | — | — | domain-only · pass-through to reasoning |
| 4 | intelligence | LLM + ML reasoners (provider-neutral) | 9 | 8 | Portkey AI Gateway invoker · XGBoost cost predictor · cost-aware router |
| 5 | reasoning | Ensemble + causal classifier | 7 | 6 | Portkey reasoner · SMT verifier (z3) · time-series reasoner |
| 6 | decision | Policy engine + autonomy levels | 5 | 3 | XGBoost triage classifier |
| 7 | action | Invokers (SOAR · ITSM · monitoring · MCP) | 6 | 13 | 6 SOAR vendors + 6 adjacent + MCP invoker |
| 8 | evidence | Facade over governance audit | 4 | 2 | Audit-port adapter |
| 9 | improvement | Offline GA · policy evolution | 6 | 3 | XGBoost surrogate fitness + drift detection |
Plus 12 first-class supporting containers covered below.
Plane 1 · sensing · ingest
Purpose. Receive observability signals from any vendor · normalize · stamp provenance · detect clock skew · route to feature plane.
Ports (10): IngestSinkPort NormalizationPort ProvenancePort ClockSkewPort DeadLetterPort (+ ABCs).
Receivers (11 vendor adapters)
- AppDynamics
- AWS CloudWatch
- CrowdStrike
- Datadog
- Dynatrace
- Grafana
- New Relic
- OpenTelemetry (OTel)
- Prometheus
- Microsoft Sentinel
- Splunk Enterprise
- Generic webhook receiver (FastAPI)
Pipeline adapters
- cloud_events_normalizer.py · CloudEvents 1.0 envelope conformance
- ocsf_normalizer.py · OCSF (Open Cybersecurity Schema Framework) normalization
- provenance_stamp.py · provenance metadata stamping
- clock_skew_detector.py · clock-skew detection across producers
- in_memory_dead_letter.py · in_memory_ingest_sink.py · dev defaults
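The sketch below shows the kind of envelope check cloud_events_normalizer.py is described as performing. It validates the four context attributes CloudEvents 1.0 marks as REQUIRED (id, source, specversion, type) on a structured-mode event held as a dict. The function name is hypothetical and this is not the adapter's actual code.

```python
REQUIRED_ATTRIBUTES = ("id", "source", "specversion", "type")


def envelope_errors(event: dict) -> list:
    """Return the problems found in a structured-mode CloudEvents 1.0 event."""
    # every REQUIRED context attribute must be present and non-empty
    errors = [f"missing required attribute: {name}"
              for name in REQUIRED_ATTRIBUTES if not event.get(name)]
    # only report a version mismatch when a version was actually supplied
    version = event.get("specversion")
    if version and version != "1.0":
        errors.append(f"unsupported specversion: {version!r}")
    return errors
```

An empty list means the envelope passes. The dead-letter adapter is the natural destination for events that fail the check.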
Plane 2 · context · entity + topology
Purpose. Resolve incoming signals against the org's entity graph (services · hosts · users · datacenters · deploys).
Ports (4): TopologyStorePort ChangeCorrelatorPort (+ ABCs).
Adapters (2): in_memory_topology_store.py · in_memory_change_correlator.py · Neo4j adapter scaffold reserved via [neo4j] extra.
Plane 3 · feature · feature engineering
Purpose. Feature engineering pass-through between context and reasoning · domain-models only at v1.0.
Plane 4 · intelligence · LLM + ML reasoners
Purpose. Provider-neutral access to LLMs · per-(model, task) cost prediction · cost-aware routing.
Ports (9):
- ModelRegistryPort · approved model catalog with downgrade ceilings
- RoutingPolicyPort · routing constraints (capabilities · compliance · max_cost_per_call_usd)
- LLMInvokerPort · provider-neutral invocation
- CostTelemetryPort · streaming cost emission
- CostPredictorPort · per-(model, task) cost / latency / success prediction
Adapters (8):
- in_memory_model_registry.py · model registry default
- static_routing_policy.py · policy lookup default
- passthrough_invoker.py · zero-dep dev invoker (canned responses)
- audit_gated_invoker.py · invoker that routes every call through governance.AuditPort.authorize/record
- portkey_invoker.py · production LLMInvokerPort backed by the Portkey AI Gateway · virtual-key per-tenant credentialing · config-driven fallback / load-balancing · trace_id correlation · AIOps metadata auto-injection · [portkey] extra
- in_memory_cost_telemetry.py · cost emission default
- historical_mean_cost_predictor.py · Welford streaming · is_loaded() True at ≥10 samples per (model, task) · zero ML dep
- xgboost_cost_predictor.py · 3-booster stack (cost regressor + success classifier + latency regressor) · [ml-intelligence] extra
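historical_mean_cost_predictor.py is described as Welford streaming, with is_loaded() gated at ≥10 samples per (model, task). The sketch below shows that idea. The names RunningStats, StreamingMeanCostPredictor, observe() and predict_cost() are hypothetical, and this is_loaded() taking the (model, task) key is an assumption. The real adapter's signatures may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean               # deviation from the old mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # old deviation times new deviation

    @property
    def variance(self) -> float:
        # sample variance; reported as 0.0 until two samples exist
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


class StreamingMeanCostPredictor:
    MIN_SAMPLES = 10  # the >=10-samples-per-(model, task) threshold quoted above

    def __init__(self) -> None:
        self._stats = defaultdict(RunningStats)  # keyed by (model_id, task)

    def observe(self, model_id: str, task: str, cost_usd: float) -> None:
        self._stats[(model_id, task)].update(cost_usd)

    def is_loaded(self, model_id: str, task: str) -> bool:
        stats = self._stats.get((model_id, task))
        return stats is not None and stats.n >= self.MIN_SAMPLES

    def predict_cost(self, model_id: str, task: str) -> Optional[float]:
        stats = self._stats.get((model_id, task))  # .get() avoids creating empty keys
        return stats.mean if stats is not None else None
```

Welford's update avoids the catastrophic cancellation of the naive sum-of-squares formula, so it is safe to run indefinitely on a stream of cost samples.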
Application layer: CostAwareRouter · selects the model with the highest expected_value that satisfies the RoutingPolicy constraints · falls back to default_model_id.
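The sketch below is one way to read that selection rule, using hypothetical types. The Candidate fields, the task_value_usd input and the expected_value formula (success probability × task value − predicted cost) are assumptions for illustration. They are not CostAwareRouter's actual scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model_id: str
    capabilities: frozenset
    compliance_tags: frozenset
    predicted_cost_usd: float
    predicted_success: float  # probability in [0, 1]


@dataclass(frozen=True)
class RoutingPolicy:
    required_capabilities: frozenset = frozenset()
    required_compliance: frozenset = frozenset()
    max_cost_per_call_usd: float = float("inf")


def expected_value(c: Candidate, task_value_usd: float) -> float:
    # hypothetical scoring: expected payoff minus predicted spend
    return c.predicted_success * task_value_usd - c.predicted_cost_usd


def route(candidates, policy: RoutingPolicy, task_value_usd: float,
          default_model_id: str) -> str:
    eligible = [
        c for c in candidates
        if policy.required_capabilities <= c.capabilities  # subset checks
        and policy.required_compliance <= c.compliance_tags
        and c.predicted_cost_usd <= policy.max_cost_per_call_usd
    ]
    if not eligible:
        return default_model_id  # nothing satisfies the policy
    return max(eligible, key=lambda c: expected_value(c, task_value_usd)).model_id
```

Constraints act as hard filters and expected value only ranks the survivors. A cheap model can therefore win on low-value tasks and lose on high-value ones.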
Plane 5 · reasoning · ensemble + causal + SMT
Purpose. Turn features into Hypotheses · ground every claim against evidence · classify causal vs correlational · verify numeric/constraint proposals deterministically.
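The sketch below shows the grounding rule. It assumes each claim carries a tuple of feature / evidence ids. The Claim type and find_ungrounded() are hypothetical, not the GroundingCheckerPort API. A claim fails if it cites nothing or if it cites an id that neither store knows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    citations: tuple  # ids of features or evidence items


def find_ungrounded(claims, feature_ids, evidence_ids) -> list:
    """Return claims with no citation or with a citation neither store knows."""
    known = set(feature_ids) | set(evidence_ids)
    return [c for c in claims
            if not c.citations or any(ref not in known for ref in c.citations)]
```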
Ports (7):
- ReasonerPort · async reason(request, features, evidence) → Hypothesis
- GroundingCheckerPort · "every claim MUST cite specific features / evidence"
- CausalClassifierPort · labels each (cause→effect) pair as causation / correlation / unknown
- ConfidenceCalibratorPort · Brier-score / ECE-based calibration
- FeatureLookupPort · adapter to Feature Plane
- EvidenceLookupPort · adapter to Context Plane
- ReasoningVerifierPort (D-10) · symbolic constraint verification
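ConfidenceCalibratorPort is described as Brier-score / ECE-based. Both metrics can be written in plain Python. The variant below, 10 equal-width bins over positive-class probabilities, is an assumption and not necessarily the one the port uses.

```python
def brier_score(probs, outcomes) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(probs, outcomes, n_bins: int = 10) -> float:
    """Bin-size-weighted gap between mean prediction and observed hit rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            hit_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_p - hit_rate)
    return ece
```

For example, predictions of 0.9, 0.9, 0.1 and 0.1 on outcomes 1, 1, 0 and 0 give a Brier score of 0.01 and an ECE of 0.1. The model is right every time but under-confident by 0.1 in each bin.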
Adapters (6):
- classifier_reasoner.py · rule-based reasoner
- timeseries_reasoner.py · time-series-pattern reasoner
- litellm_reasoner.py · LiteLLM-backed cross-provider reasoner · [litellm] extra
- portkey_reasoner.py · Portkey AI Gateway reasoner · adds virtual keys + config-driven fallback + semantic caching + per-request observability metadata + trace_id correlation · [portkey] extra
- in_memory.py · dev default
- smt_verifier.py · z3-solver wrapper · bit-deterministic SAT / UNSAT with unsat_core · [smt] extra
Plane 6 · decision · policy + triage + autonomy
Purpose. Combine reasoning hypothesis · policy engine · autonomy level · per-band invariants · produce a Verdict.
Ports (5):
- PolicyEnginePort · ABC consumed by Cedar / OPA / rule-based adapters
- AutonomyResolverPort · maps tenant + capability to autonomy band
- KillSwitchPort · global + per-tenant + per-capability emergency stops
- DecisionRecorderPort · audit emission (adapter chooses durability)
- TriageClassifierPort · per-incident triage prediction with confidence + feature importances
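The sketch below covers the three KillSwitchPort scopes. It assumes a per-capability stop applies across all tenants and that is_engaged() takes (tenant_id, capability). The in-memory class is hypothetical, not the in_memory.py adapter.

```python
class KillSwitchSketch:
    """A call is blocked if the global, tenant or capability scope is engaged."""

    def __init__(self) -> None:
        self._global = False
        self._tenants: set = set()
        self._capabilities: set = set()

    def engage_global(self) -> None:
        self._global = True

    def engage_tenant(self, tenant_id: str) -> None:
        self._tenants.add(tenant_id)

    def engage_capability(self, capability: str) -> None:
        self._capabilities.add(capability)

    def is_engaged(self, tenant_id: str, capability: str) -> bool:
        # any single scope is enough to stop the call
        return (self._global
                or tenant_id in self._tenants
                or capability in self._capabilities)
```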
Adapters (3):
- in_memory.py · dev defaults for all ports
- heuristic_triage.py · HeuristicTriageClassifier · always-available · zero ML dep · is_loaded()=True
- xgboost_triage.py · XGBoostTriageClassifier · xgb.Booster.load_model() (sidesteps the XGBoost 2.x sklearn-wrapper _estimator_type undefined bug) · [ml-decision] extra · graceful fallback (is_loaded()=False on load failure)
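The graceful-fallback pattern can be sketched without XGBoost. Here a hypothetical loader callable stands in for xgb.Booster.load_model(). A failed load leaves is_loaded() False, so the caller picks the heuristic. The class names, the 5 % error-rate rule and the labels are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TriagePrediction:
    label: str
    confidence: float


class HeuristicTriageSketch:
    def is_loaded(self) -> bool:
        return True  # rules need no model file

    def predict(self, features: dict) -> TriagePrediction:
        # hypothetical rule: page when the error rate reaches 5 %
        if features.get("error_rate", 0.0) >= 0.05:
            return TriagePrediction("page", 0.6)
        return TriagePrediction("ticket", 0.6)


class ModelTriageSketch:
    def __init__(self, loader: Callable[[], Callable[[dict], TriagePrediction]]) -> None:
        try:
            self._model: Optional[Callable] = loader()
        except (OSError, ValueError):
            self._model = None  # missing or corrupt model: report not-loaded, don't crash

    def is_loaded(self) -> bool:
        return self._model is not None

    def predict(self, features: dict) -> TriagePrediction:
        return self._model(features)


def pick_classifier(primary, fallback):
    return primary if primary.is_loaded() else fallback
```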
Plane 7 · action · SOAR + ITSM + monitoring invokers
Purpose. Invoke external systems with per-capability invariants enforced (reversibility=COSTLY · requires_simulation=True · requires_rollback_artifact=True).
Ports (6): ToolRegistryPort ToolInvokerPort SimulatorPort IdempotencyStorePort KillSwitchPort ExecutionEventSinkPort.
Invokers (12 vendors)
| Category | Vendor | Capabilities | Auth |
|---|---|---|---|
| SOAR (6) | Splunk SOAR | splunk_soar.run_playbook · splunk_soar.create_container | token |
| | Cortex XSOAR | xsoar.run_playbook · xsoar.create_incident | API key + ID |
| | Tines | tines.send_to_story (webhook) · tines.create_record | per-story webhook secret |
| | Swimlane (Turbine) | swimlane.run_playbook · swimlane.create_record | Private-Token header |
| | Google Chronicle SOAR | chronicle_soar.run_playbook · chronicle_soar.create_case | OAuth Bearer (static OR callable) |
| | Microsoft Sentinel | sentinel.run_playbook (Logic Apps signed URL) · sentinel.update_incident (ARM PATCH) | azure_ad_token_provider callable |
| ITSM (4) | ServiceNow | servicenow.create_incident | u_aiops_* custom fields |
| | Jira | jira.create_issue | basic + token |
| | PagerDuty | pagerduty.trigger · acknowledge · resolve | routing key |
| | Opsgenie | opsgenie.create_alert · acknowledge | API key |
| Monitoring | Grafana annotations | grafana.create_annotation | API token |
| MCP | MCPToolInvoker | mcp.* (any registered capability · prefix stripped on the wire) | operator wires their own MCP client via call_tool callable |
Shared invoker contract: failed vendor call → RuntimeError("<vendor> <capability> failed (status=...)") · missing required parameter → ValueError(...) · reversibility class declared at capability-registration time.
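The sketch below shows how the registration-time invariants and the error contract could fit together. Capability, Reversibility, invoke() and the send transport are hypothetical. The RuntimeError / ValueError messages follow the shapes quoted above. Raising PermissionError for a skipped simulation or a missing rollback artifact is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    COSTLY = "costly"


@dataclass(frozen=True)
class Capability:
    name: str                     # e.g. "splunk_soar.run_playbook"
    required_params: frozenset
    reversibility: Reversibility = Reversibility.COSTLY
    requires_simulation: bool = True
    requires_rollback_artifact: bool = True


def invoke(cap: Capability, params: dict, *, simulated: bool,
           rollback_artifact: object, send: Callable[[dict], int]) -> int:
    missing = cap.required_params.difference(params)
    if missing:
        raise ValueError(f"{cap.name} missing required parameter(s): {sorted(missing)}")
    if cap.requires_simulation and not simulated:
        raise PermissionError(f"{cap.name} requires a simulation pass first")
    if cap.requires_rollback_artifact and rollback_artifact is None:
        raise PermissionError(f"{cap.name} requires a rollback artifact")
    status = send(params)  # hypothetical transport returning an HTTP status code
    if not 200 <= status < 300:
        vendor, _, action = cap.name.partition(".")
        raise RuntimeError(f"{vendor} {action} failed (status={status})")
    return status
```

All checks run before any network I/O, so a misconfigured call fails locally without touching the vendor.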
Plane 8 · evidence · Compliance Evidence Package facade
Ports (4): AuditReaderPort EvidenceFacadePort (+ ABCs).
Adapters (2):
- audit_facade.py · CEP body builder · maintains aiops_cross_refs correlation fields
- in_memory_audit_reader.py · dev default
Plane 9 · improvement · GA + drift detection
Purpose. Evolve policy configurations against historical replay · detect surrogate drift · emit audit events.
Ports (6):
- FitnessEvaluatorPort · ground-truth replay evaluator
- SurrogateFitnessEvaluatorPort · fast surrogate
- GenomeOperatorsPort · GA mutation + crossover
- ReplayCorpusPort · historical sample corpus
- CampaignStorePort · campaign state persistence
- ControlPlaneHandoffPort · promotion handoff to control plane
Adapters (3):
- in_memory.py · GA campaign defaults
- xgboost_fitness_surrogate.py · XGBoostFitnessSurrogate · loads N regressors · KDTree-based confidence-via-neighbor · [ml-surrogate] extra
- control_plane_handoff.py · production handoff to control plane
Application layer. HybridFitnessEvaluator · surrogate-fast + periodic ground-truth · drift detection emits surrogate.drift_detected audit row at MAE divergence threshold · pins to ground-truth for fallback_window_generations after drift.
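The sketch below traces that control flow. Only fallback_window_generations and the surrogate.drift_detected event kind come from the text. The other parameter names, the every-N-generations spot-check and the MAE formula are assumptions.

```python
from typing import Any, Callable, Dict, List, Sequence


class HybridFitnessSketch:
    """Surrogate scores by default; ground truth every `check_every` generations.

    If the surrogate's MAE against ground truth exceeds `mae_threshold`, emit a
    drift event and score with ground truth only for the next
    `fallback_window_generations` generations.
    """

    def __init__(self, surrogate: Callable[[Any], float],
                 ground_truth: Callable[[Any], float],
                 emit: Callable[[str, Dict[str, Any]], None], *,
                 check_every: int = 5, mae_threshold: float = 0.1,
                 fallback_window_generations: int = 3) -> None:
        self._surrogate = surrogate
        self._truth = ground_truth
        self._emit = emit
        self._check_every = check_every
        self._mae_threshold = mae_threshold
        self._window = fallback_window_generations
        self._pinned_until = -1  # last generation that must use ground truth

    def evaluate(self, generation: int, genomes: Sequence[Any]) -> List[float]:
        if not genomes:
            return []
        if generation <= self._pinned_until:
            return [self._truth(g) for g in genomes]  # inside the fallback window
        predicted = [self._surrogate(g) for g in genomes]
        if generation % self._check_every != 0:
            return predicted  # fast path
        actual = [self._truth(g) for g in genomes]  # periodic spot-check
        mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
        if mae > self._mae_threshold:
            self._emit("surrogate.drift_detected", {"generation": generation, "mae": mae})
            self._pinned_until = generation + self._window
        return actual
```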
Supporting container · governance · audit primitive
Purpose. The foundation every other plane depends on · authorize() before · record() after · the adapter chooses durability (default LocalAuditAdapter writes JSONL).
Surface:
- AuditPort ABC · authorize(request) → Verdict + record(record) → None
- AuditRequest · request envelope · request_id · tenant_id · kind · intent · actor · payload
- AuditRecord · evidence row · record_id · request_id · tenant_id · outcome · occurred_at · ms_taken
- Verdict · outcome · rationale · flags · policy_version
- VerdictOutcome · APPROVE · BLOCK · DOWNGRADE · DEFER_TO_HIL
- LocalAuditAdapter · default · writes JSON-line evidence to $ARBITER_OPS_AUDIT_PATH or stdout
- make_default_audit_port() · factory
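The sketch below shows the authorize-before / record-after contract, using the field names listed above. JsonlAuditSketch and gated_call() are hypothetical stand-ins, not LocalAuditAdapter or audit_gated_invoker.py. The string values of VerdictOutcome are also assumed.

```python
import json
import time
import uuid
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Callable, Optional, TextIO


class VerdictOutcome(str, Enum):
    APPROVE = "APPROVE"
    BLOCK = "BLOCK"
    DOWNGRADE = "DOWNGRADE"
    DEFER_TO_HIL = "DEFER_TO_HIL"


@dataclass(frozen=True)
class AuditRequest:
    request_id: str
    tenant_id: str
    kind: str
    intent: str
    actor: str
    payload: dict


@dataclass(frozen=True)
class Verdict:
    outcome: VerdictOutcome
    rationale: str
    flags: tuple = ()
    policy_version: str = "dev"


@dataclass(frozen=True)
class AuditRecord:
    record_id: str
    request_id: str
    tenant_id: str
    outcome: str
    occurred_at: str
    ms_taken: float


class AuditPort(ABC):
    @abstractmethod
    def authorize(self, request: AuditRequest) -> Verdict: ...

    @abstractmethod
    def record(self, record: AuditRecord) -> None: ...


class JsonlAuditSketch(AuditPort):
    """Allow-all dev adapter that appends one JSON object per line."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def authorize(self, request: AuditRequest) -> Verdict:
        return Verdict(VerdictOutcome.APPROVE, "dev adapter: allow-all")

    def record(self, record: AuditRecord) -> None:
        self._stream.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def gated_call(port: AuditPort, request: AuditRequest, fn: Callable[[], Any]) -> Optional[Any]:
    """authorize() before, record() after, whatever the verdict."""
    start = time.perf_counter()
    verdict = port.authorize(request)
    result = fn() if verdict.outcome is VerdictOutcome.APPROVE else None
    port.record(AuditRecord(
        record_id=str(uuid.uuid4()),
        request_id=request.request_id,
        tenant_id=request.tenant_id,
        outcome=verdict.outcome.value,
        occurred_at=datetime.now(timezone.utc).isoformat(),
        ms_taken=(time.perf_counter() - start) * 1000.0,
    ))
    return result
```

Blocked and deferred calls are recorded as well as approved ones. That is what lets the audit trail show refusals, not just successes.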
Supporting container · control · operator HTTP plane
Purpose. FastAPI operator HTTP plane · 8 routers · 12 ports.
Launchable: arbiter-ops-control --host 0.0.0.0 --port 8001 --hil-gateway in_process
Ports (12): AuthorizationPort TierClassifierPort HilGatewayPort ControlAuditPort TenantOpsPort PolicyDistributionPort KillSwitchOpsPort WorkflowOpsPort RBACPort TelemetryPort ConformanceOpsPort SubstrateSwitchPort.
HTTP routers (8)
| Router | Purpose |
|---|---|
| tenant | Per-tenant config · approach-band overlays |
| policy | Policy CRUD · evolution gates · drift kill rules |
| rbac | Roles · permissions · principals |
| killswitch | Global + per-tenant + per-capability emergency stops |
| workflow | Start / status / cancel workflow runs |
| conformance | Live 9-plane conformance probe |
| substrate | Health + version + readiness |
| telemetry | Metrics surface + decision-event tap |
Dev identity: x-aiops-operator-subject + x-aiops-operator-role + x-aiops-tenant-scope request headers. Production wires JWT / SPIFFE via make_identity_resolver(...). Header names are retained as wire-format contracts.
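The sketch below resolves an operator identity from the three x-aiops-* headers named above. It assumes case-insensitive lookup and a comma-separated tenant scope. OperatorIdentity and identity_from_headers() are hypothetical and unrelated to make_identity_resolver(...).

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OperatorIdentity:
    subject: str
    role: str
    tenant_scope: tuple


def identity_from_headers(headers: Mapping[str, str]) -> OperatorIdentity:
    lowered = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    try:
        subject = lowered["x-aiops-operator-subject"]
        role = lowered["x-aiops-operator-role"]
        scope = lowered["x-aiops-tenant-scope"]
    except KeyError as exc:
        raise PermissionError(f"missing identity header: {exc.args[0]}") from None
    tenants = tuple(t.strip() for t in scope.split(",") if t.strip())
    return OperatorIdentity(subject, role, tenants)
```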
Supporting container · hil · 5-gate HITL framework
Purpose. G-1..G-5 gate types per the 9-plane spec · composable in front of every governed decision.
Launchable: arbiter-ops-hil-worker (Temporal worker) · arbiter-ops-hil-submit (smoke-test CLI).
Ports (14 · including ABCs): GatePort ChannelPort VerifierPort EscalationPolicyPort ApproverDirectoryPort AntiDarkPatternLintPort EventEmitterPort (each + ABC variant).
Gate adapters
- G-1 per-proposal approval · g1_per_proposal.py
- G-3 policy-pre-approved (Meta-HITL) · g3_policy_envelope.py
- G-5 adversarial verifier · g5_adversarial_verifier.py (composes in front of G-1 / G-2 / G-3)
- G-2 batch review + G-4 outcome envelope are domain-modelled · gate adapters scheduled in v0.5
Adapter inventory
| Category | Adapter | Notes |
|---|---|---|
| Verifiers | verifiers/cross_provider_llm.py | LLM from a different (model, framework) tuple per the 9-plane spec §5.9.12 |
| | verifiers/rule_based.py | Deterministic rule verifier |
| Channels | channels/slack.py | Slack SDK · [slack] extra |
| | channels/operator_console.py | Console fallback for dev |
| Orchestrators | orchestrator/temporal.py | Production · durable workflow state machine |
| | orchestrator/in_process.py | Dev / scaffold / no-durability mode |
| | orchestrator/git.py | Design-only scaffold · commits = state transitions · TODOs for git add / commit -S / push |
| Events | events/kafka.py | confluent-kafka · [kafka] extra · production |
| | events/in_memory.py | Dev default |
| Lint | lint/rule_based.py | Anti-dark-pattern UI lint (loud-button enforcement · countdown timer guards) |
Supporting container · triage_room · live Slack/Teams rooms
Purpose. Per-incident triage rooms with bidirectional bot · operator on-call invitation · auto-archive + post-mortem flush.
Launchable: arbiter-ops-triage-rooms --tenant-id <tid>
Ports (7): TriageRoomChatPort TriageRoomPolicyPort TriageRoomStorePort AgentContextPort EventSubscriberPort HilSignalPort ControlPlaneInvokerPort.
Adapters (3):
- slack.py · Slack SDK · [slack] extra
- kafka_subscriber.py · Kafka event-stream subscriber
- in_memory.py · dev default
Supporting container · agent · long-running loop (E201)
Purpose. Per-tenant agent process · subscribes to incidents · runs triage → reasoning → decision → action → evidence loop.
Launchable: arbiter-ops-agent --tenant-id <tid> --autonomy-level AL-2
Ports (5): IncidentSubscriberPort TriagePort ContextResolverPort FeedbackSinkPort AgentSessionStorePort.
truncated_reason.
Supporting container · operator · arbiter-opsctl (E206)
Purpose. Recipe-driven CLI for scaffolding control-plane UIs · generating apps · deploying to k8s · smoke-testing webhooks · bootstrapping tenants.
Launchable: arbiter-opsctl <recipe> [params...]
Built-in recipes (7)
| Recipe | Purpose |
|---|---|
| ui_scaffold | Scaffold a control-plane operator UI |
| autonomous_bootstrap | Bootstrap an autonomous AIOps agent |
| incident_response | Synthetic incident-response drill |
| incident_report | Post-incident report generation |
| k8s_deploy | k8s deployment template generator |
| tenant_bootstrap | New-tenant bootstrap |
| webhook_smoke | Webhook smoke driver |
Ports (4): RecipeRepositoryPort FileWriterPort KubectlPort HttpClientPort.
Adapters (4): filesystem · subprocess-kubectl · httpx · in-memory.
Application layers:
- policy/ · RecipePolicy + ApprovalLedger + HilGateway · runtime policy enforcement
- store/ · OCI-registry + filesystem cache for recipe packs
Supporting container · workflow · agent-workflow boards
Purpose. Declarative workflow YAML cards · board protocol orchestration.
Workflow: aiops-incident (under src/arbiter_ops/workflow/workflows/aiops-incident/)
- workflow.yaml · 10-state state machine
- card.schema.json · JSON Schema for the workflow card
Adapters: aiops_board.py · AIOpsBoard board adapter · load_aiops_incident_workflow() helper.
Application layer: bridge.py · RecipeWorkitemBridge · every arbiter-opsctl recipe run becomes a card.
Supporting container · mcp · Model Context Protocol integration
Purpose. Expose 7 substrate operations as MCP tools (inbound) · invoke external MCP tools from the action plane (outbound). Both surfaces use callable-injection so the mcp SDK is opt-in via the [mcp] extra.
Launchable: arbiter-ops-mcp [--transport stdio|http] [--list-tools]
Exposed MCP tools (7)
| Tool | Plane | Wraps |
|---|---|---|
| arbiter_ops_band_check | decision | ApproachRegistry + validates() · CPL-08 invariant |
| arbiter_ops_audit_record | governance | AuditPort.authorize + record |
| arbiter_ops_triage_predict | decision | TriageClassifierPort · heuristic default · XGBoost via [ml-decision] |
| arbiter_ops_cost_predict | intelligence | CostPredictorPort · Welford default · XGBoost via [ml-intelligence] |
| arbiter_ops_smt_verify | reasoning | SmtVerifier (z3-solver) · requires [smt] |
| arbiter_ops_kill_switch_check | decision | KillSwitchPort.is_engaged |
| arbiter_ops_redteam_scan | redteam | RuleBasedRedteamScanner.scan_text · 12-vector matrix |
Module layout: mcp/registry.py (pure-Python ToolSpec dataclasses · JSON-schema input/output) · mcp/tools.py (substrate dispatchers via SubstrateContainer DI · zero SDK dep · tested without [mcp]) · mcp/server.py (FastMCP wrapper · lazy mcp import). Operators wire production-grade adapters (Postgres audit · XGBoost classifiers · etc.) via build_fastmcp_server(container=...).
Client integrations: Claude Desktop · Cursor · Windsurf · any MCP-compatible client. See docs/mcp_integration.md for the full quick-start (with ~/.claude/desktop_config.json snippet).
Outbound action-plane adapter: MCPToolInvoker (Plane 7 · Action · row added to the invoker table above) · dispatches ProposedAction to any wired MCP client · same invariants (reversibility=COSTLY · simulation · rollback) as the SOAR invokers · AIOps correlation fields auto-injected.
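The sketch below shows the callable-injection shape. It assumes call_tool is an async callable taking (tool_name, arguments), similar in shape to the async call_tool(name, arguments) method on the MCP Python SDK's ClientSession. McpInvokerSketch and the aiops_trace_id correlation key are hypothetical. The stripping of the mcp. prefix follows the invoker table.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

CallTool = Callable[[str, Dict[str, Any]], Awaitable[Any]]


class McpInvokerSketch:
    PREFIX = "mcp."

    def __init__(self, call_tool: CallTool, correlation: Dict[str, Any]) -> None:
        self._call_tool = call_tool      # operator-supplied client; no SDK import here
        self._correlation = correlation  # hypothetical auto-injected correlation fields

    async def invoke(self, capability: str, arguments: Dict[str, Any]) -> Any:
        if not capability.startswith(self.PREFIX):
            raise ValueError(f"not an MCP capability: {capability}")
        tool_name = capability[len(self.PREFIX):]  # prefix stripped on the wire
        payload = {**arguments, **self._correlation}
        return await self._call_tool(tool_name, payload)
```

Because the class only sees a callable, tests can pass a fake coroutine and the mcp package stays an optional extra.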
Supporting container · redteam · standalone scanner
Purpose. OWASP Agentic ASI-01 (Goal Hijacking) + ASI-05 (Insecure Output Handling) coverage.
Launchable: arbiter-redteam scan <dir-or-file> [--json] [--min-grade A/B/C/D/F] [--strict] [--severity-floor low/high/critical] [--include glob,...]
12-vector attack matrix
| # | Vector | Severity |
|---|---|---|
| V01 | direct_instruction_override | high |
| V02 | role_play_jailbreak · DAN / STAN / AIM / sudo | high |
| V03 | sandwich_attack | high |
| V04 | encoding_attack · base64 / rot13 / hex | high |
| V05 | authority_impersonation | low |
| V06 | pii_extraction_probe | critical |
| V07 | tool_invocation_smuggling | critical |
| V08 | prompt_leak_probe | high |
| V09 | goal_hijack_directive | critical |
| V10 | multi_step_chained_injection | high |
| V11 | unicode_homoglyph_smuggling | low |
| V12 | policy_bypass_appeal | low |
Grading: A (0 vectors) · B (1 low) · C (2 vectors OR 1 high) · D (3+ non-critical) · F (any critical).
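The sketch below is one literal reading of the grading rule, with severities copied from the matrix above. Two points are interpretive assumptions: each vector is counted once, and the rules are checked in the order F, A, B, C, D. grade() is hypothetical, not the arbiter-redteam implementation.

```python
SEVERITY = {
    "V01": "high", "V02": "high", "V03": "high", "V04": "high",
    "V05": "low", "V06": "critical", "V07": "critical", "V08": "high",
    "V09": "critical", "V10": "high", "V11": "low", "V12": "low",
}


def grade(fired_vectors) -> str:
    """Map the vectors that fired on a target to a letter grade."""
    severities = [SEVERITY[v] for v in set(fired_vectors)]  # count each vector once
    if "critical" in severities:
        return "F"
    if not severities:
        return "A"
    if len(severities) == 1:
        return "B" if severities[0] == "low" else "C"
    if len(severities) == 2:
        return "C"
    return "D"  # three or more non-critical vectors
```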
Supporting container · knowledge · runbooks + post-mortems
Ports (4): KnowledgeRepositoryPort EmbeddingPort RerankerPort KnowledgeRetrieverPort.
Adapters (1): in_memory.py · dev default.
Models: KnowledgeCitation maps onto arbiter_ops.reasoning.domain.models.EvidenceCitation(kind=DOC, ref=...) for cross-plane evidence flow.
Supporting container · identity · tenancy + auth (E003)
Ports (6): TenantStorePort AgentIdentityPort SecretsPort (+ ABCs).
Adapters (3 · all in-memory · production swaps via DI):
- in_memory_tenant_store.py · per-tenant config
- in_memory_agent_identity.py · SPIFFE-style agent identity (spiffe://aiops.local/tenants/{tid}/agents/{aid} · trust domain retained as wire-format default · operator-configurable)
- in_memory_secrets.py · dev secrets fallback
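The sketch below builds and parses the SPIFFE-style agent ID shown above, with the trust domain as an overridable default. The function names and the single-path-segment validation rule are assumptions for illustration.

```python
from typing import Tuple
from urllib.parse import urlparse

DEFAULT_TRUST_DOMAIN = "aiops.local"


def agent_spiffe_id(tenant_id: str, agent_id: str,
                    trust_domain: str = DEFAULT_TRUST_DOMAIN) -> str:
    for part in (tenant_id, agent_id):
        if not part or "/" in part:  # each id must be exactly one path segment
            raise ValueError(f"invalid path segment: {part!r}")
    return f"spiffe://{trust_domain}/tenants/{tenant_id}/agents/{agent_id}"


def parse_agent_spiffe_id(spiffe_id: str) -> Tuple[str, str, str]:
    """Return (trust_domain, tenant_id, agent_id) or raise ValueError."""
    parsed = urlparse(spiffe_id)
    segments = parsed.path.strip("/").split("/")
    if (parsed.scheme != "spiffe" or len(segments) != 4
            or segments[0] != "tenants" or segments[2] != "agents"):
        raise ValueError(f"not an agent SPIFFE ID: {spiffe_id}")
    return parsed.netloc, segments[1], segments[3]
```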
Production note. Cryptographic actor attestation (signed requested_by claims, mTLS service-mesh identity, etc.) is an operator-supplied adapter. The in-memory stub is suitable for dev; production wires the identity provider that fits the deployment.
Supporting container · conformance · 9-plane probe
Purpose. Generate Level-1 (Observed) self-declaration for the 9-plane spec §12 conformance levels.
Application: SelfDeclarationGenerator.generate(level=ConformanceLevel.OBSERVED) ships; Levels 2-4 deferred (independent audit · public attestation · regulator endorsement).
10 console binaries shipped
Installed by pip install -e packages/arbiter-ops:
| Binary | Module | Purpose |
|---|---|---|
| arbiter-ops | arbiter_ops.cli | Top-level CLI shim (Phase 4 backlog) |
| arbiter-ops-control | arbiter_ops.control.server | FastAPI control plane (8 routers · port 8001) |
| arbiter-ops-agent | arbiter_ops.agent.server | Long-running per-tenant agent loop |
| arbiter-ops-improve | arbiter_ops.improvement.server | GA campaign server |
| arbiter-ops-triage-rooms | arbiter_ops.triage_room.server | Slack / Teams triage rooms service |
| arbiter-ops-hil-worker | arbiter_ops.hil.worker | Temporal HIL worker |
| arbiter-ops-hil-submit | arbiter_ops.hil.cli.submit | Synthetic HitlRequest submitter (smoke) |
| arbiter-opsctl | arbiter_ops.operator.cli.main | Recipe-driven operator CLI |
| arbiter-redteam | arbiter_ops.redteam.cli | 12-vector prompt-injection scanner |
| arbiter-ops-mcp | arbiter_ops.mcp.server | MCP server · 7 substrate tools (stdio / SSE) |
13 optional dependency extras + all (pyproject.toml)
| Extra | Brings | Used by |
|---|---|---|
slack | slack_sdk | triage_room.adapters.slack · hil.adapters.channels.slack |
neo4j | neo4j | context (production topology store · scaffold) |
integrations | requests · httpx | action invokers |
llm | anthropic | intelligence (provider-neutral LLM access · direct SDK) |
litellm | litellm>=1.50 | reasoning · litellm_reasoner.py (cross-provider abstraction) |
portkey | portkey-ai>=1.8 | reasoning · portkey_reasoner.py + intelligence · portkey_invoker.py (AI gateway · virtual keys · semantic caching · trace_id correlation) |
mcp | mcp>=1.0 | arbiter-ops-mcp server transport · MCPToolInvoker uses callable injection so its SDK dep is optional |
kafka | confluent-kafka | hil.adapters.events.kafka · triage_room.kafka_subscriber |
ml-decision | xgboost>=2.0,<3.0 · numpy · scikit-learn | XGBoost triage classifier |
ml-intelligence | xgboost>=2.0,<3.0 · numpy · scikit-learn | XGBoost cost predictor (3-booster stack) |
ml-surrogate | xgboost>=2.0,<3.0 · numpy · scipy | GA surrogate fitness evaluator |
smt | z3-solver>=4.13 | SMT verifier (reasoning · Layer 2a deterministic) |
signing | cryptography | Ed25519 recipe-pack signer (operator/store) |
all | everything above | — |
Architectural invariants (enforced at multiple layers)
| Invariant | Enforced where | Why it's load-bearing |
|---|---|---|
| reversibility=COSTLY + requires_simulation=True + requires_rollback_artifact=True | Every action-plane capability registration | Prevents irreversible side effects without a simulation pass and rollback evidence |
| AIOps correlation fields auto-injected | Every SOAR / ITSM / monitoring invoker | Replay protection + cross-system trace |
| Per-band invariants enforced at 3 layers | Response model · UI · band guard | CPL-08 · a band mismatch is still stopped if any single layer is compromised |
| authorize() → call → record() | Every LLM call via audit_gated_invoker.py | EU AI Act Art. 12 / GDPR Art. 22 evidence chain (adapter chooses durability) |
| CPL-22 anti-runaway caps | agent.application.loop | Three independent caps (turns · tokens · USD) · defense-in-depth |
| Consumer-flippable port + ML opt-in | Every ML adapter (XGBoost triage · cost · surrogate) | Default path is zero-ML-dep · operators install extras when ready · graceful fallback on load failure |
| Layer 2a deterministic verifier load-bearing | SMT verifier · per-band invariants · policy_hash() tamper refusal | The verifier is the only point in the loop a regulator can deterministically replay |
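The CPL-22 row describes three independent caps (turns · tokens · USD). The sketch below assumes each turn reports its token and USD spend and that the loop stops at the first cap reached. RunawayCaps, run_loop() and the reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunawayCaps:
    """Three independent budgets; reaching any one of them stops the loop."""
    max_turns: int
    max_tokens: int
    max_usd: float
    turns: int = 0
    tokens: int = 0
    usd: float = 0.0

    def charge(self, tokens: int, usd: float) -> Optional[str]:
        """Account for one finished turn; return which cap tripped, if any."""
        self.turns += 1
        self.tokens += tokens
        self.usd += usd
        if self.turns >= self.max_turns:
            return "max_turns"
        if self.tokens >= self.max_tokens:
            return "max_tokens"
        if self.usd >= self.max_usd:
            return "max_usd"
        return None


def run_loop(step: Callable[[], Tuple[int, float, bool]], caps: RunawayCaps) -> str:
    """step() runs one triage-to-evidence turn and reports (tokens, usd, done)."""
    while True:
        tokens, usd, done = step()
        stop_reason = caps.charge(tokens, usd)
        if done:
            return "completed"
        if stop_reason is not None:
            return stop_reason
```

The caps are independent, so a cheap but chatty loop trips the turn cap and an expensive but short loop trips the USD cap.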
Performance (published benchmarks)
| Benchmark | p50 | p99 | Throughput |
|---|---|---|---|
| approach_registry.get_record (band lookup) | 0.7 μs | 1.4 μs | 1.15M ops/sec |
| approach_record.validates (per-band invariant SAT) | 0.7 μs | 1.6 μs | 1.10M ops/sec |
| approach_record.validates (UNSAT) | 0.7 μs | 0.9 μs | 1.18M ops/sec |
| policy_engine.evaluate (AllowAll fallback) | 1.1 μs | 1.7 μs | 805K ops/sec |
| cep_hash_chain.link (canonical SHA-256 + chain link) | 7 μs | 15 μs | 137K ops/sec |
| end_to_end.verifier_path (band + invariant + hash) | 4.8 μs | 15 μs | 181K ops/sec |
| smt_verifier.verify_constraints (DTI cap) | 686 μs | 1.7 ms | 1.3K ops/sec |
All numbers above are arbiter-ops' own measurements on its own code, on the host described in docs/BENCHMARKS.md. Comparisons against any other vendor's published numbers are not made here — readers evaluating arbiter-ops against existing tooling should reproduce both products' benchmarks on their own hardware before drawing conclusions. See docs/BENCHMARKS.md for the full methodology and disclaimer.
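The cep_hash_chain.link row benchmarks a canonical SHA-256 chain link. The sketch below assumes each link hashes the previous hash concatenated with a canonical JSON encoding of the record, starting from a hypothetical all-zero genesis value. The canonical form here (sorted keys, compact separators) is simpler than full RFC 8785 JCS.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting hash


def canonical(record: dict) -> bytes:
    # sorted keys + no whitespace, so equal records always hash identically
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def link(prev_hash: str, record: dict) -> str:
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(record)).hexdigest()


def build_chain(records) -> list:
    hashes, prev = [], GENESIS
    for record in records:
        prev = link(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records, hashes) -> bool:
    # editing any record changes its hash and every hash after it
    return len(records) == len(hashes) and build_chain(records) == hashes
```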
OWASP Agentic Top 10 coverage
| Risk | arbiter-ops control | Coverage |
|---|---|---|
| ASI-01 · Goal Hijacking | CPL-22 caps + CPL-08 band guard + authorize() pre-call gate | ✓ full |
| ASI-02 · Excessive Capabilities | 5-band approach registry + ToolCapability registry | ✓ full |
| ASI-03 · Identity & Privilege Abuse | requested_by + tenant scoping + SPIFFE-style stub | ◐ partial |
| ASI-04 · Uncontrolled Code Execution | Action-plane simulation + rollback gate | ✓ full |
| ASI-05 · Insecure Output Handling | Verifier chain (rule + groundedness + RAGAS + cross-LLM + G5 + redteam CLI) | ✓ full |
| ASI-06 · Memory Poisoning | AuditPort.record() append-only by contract | ◐ partial |
| ASI-07 · Unsafe Inter-Agent Comms | gov-everywhere + MCP gating | ◐ partial |
| ASI-08 · Cascading Failures | CPL-22 caps + drift kill-switch + budget tracker | ✓ full |
| ASI-09 · Human-Agent Trust Deficit | AuditPort.authorize/record on every call · LocalAuditAdapter default | ◐ partial |
| ASI-10 · Rogue Agents | ai_kill_switch_t (global / per-tenant / per-capability) + band invariants | ✓ full |
Headline. 6 / 10 full · 4 / 10 partial. The partial items (ASI-03 · ASI-06 · ASI-07 · ASI-09) are all areas where the substrate provides the contract + a dev-grade default · operators supply the production-grade adapter (durable audit backend, mTLS service-mesh identity, etc.).
Tooling + scripts
| Script | Purpose |
|---|---|
| scripts/arbiter_ops_leak_check.py | Whitelabel guard · scans for upstream-product strings (leaks: 0 / 491 files) |
| scripts/benchmarks/bench_verifier_path.py | Reproducible Layer 2a benchmarks · --iterations · --json · --only |
What's deferred (operator follow-up)
| Item | Sprint | Estimate |
|---|---|---|
| A-1 · expand context / knowledge / identity test coverage to ≥0.5 ratio | Sprint 9 | ~3 days |
| A-2 · narrow 45 except Exception: catches + add log.warning(..., exc_info=True) | Sprint 9 | ~1.5 days |
| A-3 · ship hil/adapters/orchestrator/git.py HIL orchestrator OR DEFERRED.md entry | Sprint 9 | ~1.5 days |
| G-2 batch review + G-4 outcome envelope gate adapters | v0.5 | TBD |
| Conformance levels 2-4 (independent audit · public attestation · regulator endorsement) | v0.5 / v1.0 / v2.0 | product milestones |
| End-to-end agent-to-agent encryption (ASI-07 partial → full) | Sprint 10+ | TBD |
| 5-language SDK matrix (.NET / Rust / Go) | customer demand only | — |
| Generic framework middleware (LangChain / CrewAI / AutoGen) | not planned · we sit one layer down | — |