arbiter-ops · Developer Guide

v1.0.0 · 9-plane hexagonal AIOps substrate · ISO 42001 / NIST AI RMF / EU AI Act Art. 12 / DORA evidenced.
Apache-2.0 · whitelabel-clean (zero upstream-product surface · verified by scripts/arbiter_ops_leak_check.py).

30-second mental model. arbiter-ops is the reasoning layer that sits between your observability stack and your runbooks. It ingests signals, classifies risk, predicts cost, picks a routing decision, optionally runs a SOAR playbook, and writes an audit row for every step. Nine planes · 25+ ports · default adapters for everything · opt-in ML where it pays off. You wire your tech stack at the boundary; the core stays pure.

Table of contents

  1. Install
  2. 5-minute smoke test
  3. Architecture · C4 + 9 planes
  4. Plane 0 · Governance (audit)
  5. Plane 6 · Decision (triage classifier)
  6. Plane 4 · Intelligence (cost predictor + router)
  7. Plane 9 · Improvement (GA surrogate fitness)
  8. Plane 7 · Action (SOAR invokers)
  9. Plane W · Workflow (agent-workflow boards)
  10. Control plane · HTTP API
  11. Console entry-points
  12. Optional ML extras
  13. Local development
  14. Troubleshooting

1 · Install

Python 3.11+ required.

Lite build (no ML deps · always works)

pip install -e packages/arbiter-ops

With opt-in extras

# Pick what you need
pip install -e "packages/arbiter-ops[slack,neo4j,integrations,llm,kafka]"

# ML opt-ins (each adds xgboost + numpy + sklearn / scipy)
pip install -e "packages/arbiter-ops[ml-decision]"      # XGBoost triage classifier
pip install -e "packages/arbiter-ops[ml-intelligence]"  # XGBoost cost predictor
pip install -e "packages/arbiter-ops[ml-surrogate]"     # GA surrogate fitness

# Or everything
pip install -e "packages/arbiter-ops[all]"

Verify

python -c "import arbiter_ops; print(arbiter_ops.__version__)"
# 1.0.0

python scripts/arbiter_ops_leak_check.py
# leaks: 0 across 482 files scanned

python -m pytest packages/arbiter-ops/tests/ -q
# 1034 passed, 8 skipped in ~62s

2 · 5-minute smoke test

The fastest path from nothing to a working substrate: write an audit record using the default LocalAuditAdapter.

import asyncio

from arbiter_ops.governance import (
    make_default_audit_port, AuditRequest, AuditRecord,
    Verdict, VerdictOutcome,
)


async def main() -> None:
    audit = make_default_audit_port()  # writes JSON-line evidence to stdout

    # 1. Authorize a candidate action BEFORE side effects
    verdict = await audit.authorize(AuditRequest(
        actor="ops.bot",
        action="restart_service",
        subject="payments-api",
        context={"incident_id": "INC-2031", "blast_radius": "low"},
    ))
    assert verdict.outcome in (VerdictOutcome.ALLOW, VerdictOutcome.DENY)

    # 2. Record what actually happened (post-hoc evidence)
    await audit.record(AuditRecord(
        actor="ops.bot",
        action="restart_service",
        subject="payments-api",
        succeeded=True,
        metadata={"verdict_id": verdict.verdict_id, "duration_ms": 412},
    ))


# authorize() and record() are coroutines, so run them inside an event loop
asyncio.run(main())

By default this prints JSON lines to stdout. Set ARBITER_OPS_AUDIT_PATH=/var/log/arbiter-ops.jsonl to write to a file, or swap to a hosted backend (Postgres, Kafka) by registering your own AuditPort implementation through the application container.

3 · Architecture

Two C4-style diagrams orient the rest of the guide. C4 is Simon Brown's notation for software architecture · four nested levels (System Context · Container · Component · Code). The dev guide ships levels 1 and 2; the per-plane deep-dives below correspond to level 3.

C4 Level 1 · System Context

arbiter-ops as a single system surrounded by the actors and systems it interacts with.

[Diagram · C4 Level 1 · System Context]

  arbiter-ops [System] · 9-plane hexagonal AIOps substrate · 314 modules · 1180 tests · Apache-2.0 · authorize() · call · record()

Actors and external systems (edge label after the arrow):

  Operators [Person] · SRE · platform team · on-call · approvers -> configure · approve · query
  LLM agents [External system] · Claude Desktop · Cursor · Windsurf · MCP clients -> MCP tools
  Observability sources [11 vendors] · Prometheus · Datadog · OTel · Dynatrace · New Relic · Splunk · CloudWatch · Grafana · Sentinel -> events · metrics · logs
  SOAR + ITSM targets [12 invokers] · Splunk SOAR · XSOAR · Tines · Swimlane · Chronicle · Sentinel · ServiceNow · Jira · PagerDuty -> invoke · remediate
  LLM gateways · Anthropic · OpenAI · Bedrock via LiteLLM · Portkey -> reason · invoke
  Storage backends [via adapters] · Postgres · audit backend · Kafka · MinIO · Neo4j · consumer-flippable -> persist · audit
  Regulators · auditors · EU AI Act Art. 12 · DORA · GDPR Art. 22 · audit export (operator-supplied retention) -> audit export (operator-defined)

C4 Level 2 · Container

Internal decomposition · 9 planes flowing left-to-right, governance underpinning everything, supporting containers around the perimeter.

[Diagram · C4 Level 2 · Container view inside the arbiter-ops system boundary]

Foundation [Container]

  governance · AuditPort.authorize() / record() · the foundation everything depends on (adapter chooses durability)

Planes (every plane records to governance)

  1 · sensing · 11 vendor receivers · CloudEvents · OCSF
  2 · context · entity + topology · change correlator
  3 · feature · feature engineering · pure-python
  4 · intelligence · LiteLLM · Portkey · XGBoost · CostAwareRouter
  5 · reasoning · ensemble · causal · SMT verifier (z3)
  6 · decision · 5-band autonomy · triage · kill switch · policy engine
  7 · action · 13 invokers · 6 SOAR + ITSM + MCP · simulate + rollback gate
  8 · evidence · CEP builder · audit facade
  9 · improvement · GA + XGBoost surrogate · drift detection

Supporting containers

  control · FastAPI · 8 routers
  hil · G-1..G-5 · Temporal
  triage_room · Slack · Teams · Kafka
  workflow · agent-workflow boards
  agent · long-running loop
  operator · arbiter-opsctl · 7 recipes
  redteam · 12-vector scanner
  mcp · server + invoker · 7 tools
  knowledge · runbooks · post-mortems
  identity · SPIFFE-style stub
  conformance · 9-plane probe
  evidence (8) · CEP facade

Legend: sensing pipeline (planes 1-4) · reasoning + verification (plane 5) · decision + action (planes 6-7) · improvement + supporting · governance + identity (foundation) · recent additions (MCP · redteam · SMT verifier)
Arrows: solid = data flow · dashed = governance (every plane records to AuditPort)
Hex-arch invariant: every plane has domain/{ports,models}.py · application/ · adapters/; swap by registering a different adapter, never edit engine code.

The 9 planes

| # | Plane | Responsibility | Default adapter | ML opt-in? |
|---|-------|----------------|-----------------|------------|
| 1 | sensing | Ingest from observability sources | in-memory + http | - |
| 2 | context | Entity + topology resolver | in-memory · neo4j extra | - |
| 3 | feature | Feature engineering | pure-python | - |
| 4 | intelligence | LLM + ML reasoners (provider-neutral) | HistoricalMeanCostPredictor (Welford) | ml-intelligence |
| 5 | reasoning | Ensemble + causal classifier | rule-based | - |
| 6 | decision | Policy engine + autonomy levels | HeuristicTriageClassifier | ml-decision |
| 7 | action | Invokers · 6 SOAR vendors + adjacent | per-vendor REST + simulation | - |
| 8 | evidence | Facade over governance audit | LocalAuditAdapter | - |
| 9 | improvement | Offline GA · policy evolution | ReplayFitnessEvaluator | ml-surrogate |

Plus first-class supporting containers: governance (audit primitive · the foundation everything else depends on), control (operator HTTP plane), hil (Temporal worker), workflow (agent-workflow boards), triage_room (Slack/Teams bot), operator (recipe-driven CLI), conformance (architecture probe), identity (tenancy + auth), knowledge (runbooks · post-mortems), agent (long-running loop).

Hex-arch invariant. Every plane has a domain/ directory (pure Python · zero infra imports), an application/ directory (orchestration · uses ports), and an adapters/ directory (concrete tech). You swap a plane by registering a different adapter through the DI container — never by editing engine code.
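The invariant can be pictured with a toy port and container. This is a minimal sketch, not the arbiter-ops container: `CostPort`, `FixedCost`, `Container`, and `estimate_bill` are hypothetical names invented for illustration, and the real DI container may look quite different.

```python
from typing import Protocol


class CostPort(Protocol):
    """Hypothetical port: the only surface the engine code sees."""

    def cost_usd(self, model_id: str) -> float: ...


class FixedCost:
    """Hypothetical adapter: returns one constant price for every model."""

    def __init__(self, price: float) -> None:
        self.price = price

    def cost_usd(self, model_id: str) -> float:
        return self.price


class Container:
    """Toy DI container mapping a port name to its registered adapter."""

    def __init__(self) -> None:
        self._adapters: dict[str, object] = {}

    def register(self, port_name: str, adapter: object) -> None:
        # Re-registering replaces the previous adapter for that port.
        self._adapters[port_name] = adapter

    def resolve(self, port_name: str) -> object:
        return self._adapters[port_name]


def estimate_bill(container: Container, model_id: str, calls: int) -> float:
    """Engine code: depends on the port, never on a concrete adapter."""
    port: CostPort = container.resolve("cost")  # type: ignore[assignment]
    return port.cost_usd(model_id) * calls


container = Container()
container.register("cost", FixedCost(0.5))
cheap = estimate_bill(container, "model-a", calls=4)

# Swap the adapter; estimate_bill itself is untouched.
container.register("cost", FixedCost(2.0))
pricey = estimate_bill(container, "model-a", calls=4)
```

The engine function is identical across both calls; only the registration changed, which is the property the invariant is meant to protect.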

4 · Plane 0 · Governance (audit)

Governance is the foundation every other plane depends on. Each decision is audited in two steps: authorize() before the side effect, record() after it.

Default adapter · LocalAuditAdapter

from arbiter_ops.governance import make_default_audit_port

audit = make_default_audit_port()

Writes JSON-line evidence to stdout, or to a file at $ARBITER_OPS_AUDIT_PATH.

Custom adapter (Postgres / Kafka / your hosted backend)

from arbiter_ops.governance.audit_port import AuditPort, AuditRecord, AuditRequest, Verdict

class PostgresAuditAdapter(AuditPort):
    def __init__(self, conn): self.conn = conn

    async def authorize(self, req: AuditRequest) -> Verdict:
        # consult your policy engine, write the request row
        ...

    async def record(self, rec: AuditRecord) -> None:
        # append to your evidence table
        ...

Then wire it into the application container; the rest of arbiter-ops is unchanged.
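To make the two-step contract concrete, here is a self-contained sketch of an adapter that keeps its evidence in memory. It does not import arbiter_ops: `Request`, `Record`, `SketchVerdict`, and `InMemoryAuditAdapter` are stand-ins that only mirror the field names used in the smoke test, and the deny-list policy is an assumption made up for the example.

```python
import asyncio
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Request:  # stand-in for AuditRequest
    actor: str
    action: str
    subject: str
    context: dict = field(default_factory=dict)


@dataclass
class Record:  # stand-in for AuditRecord
    actor: str
    action: str
    subject: str
    succeeded: bool
    metadata: dict = field(default_factory=dict)


@dataclass
class SketchVerdict:  # stand-in for Verdict
    verdict_id: str
    outcome: str  # "ALLOW" or "DENY"


class InMemoryAuditAdapter:
    """Hypothetical adapter: appends one JSON line per audit step."""

    def __init__(self, denied_actions: set[str]) -> None:
        self.denied_actions = denied_actions
        self.lines: list[str] = []

    async def authorize(self, req: Request) -> SketchVerdict:
        outcome = "DENY" if req.action in self.denied_actions else "ALLOW"
        verdict = SketchVerdict(verdict_id=uuid.uuid4().hex, outcome=outcome)
        self.lines.append(json.dumps(
            {"step": "authorize", **asdict(req), **asdict(verdict)}))
        return verdict

    async def record(self, rec: Record) -> None:
        self.lines.append(json.dumps({"step": "record", **asdict(rec)}))


async def run_action(audit: InMemoryAuditAdapter) -> bool:
    """Authorize first; only a permitted action runs and gets recorded."""
    verdict = await audit.authorize(
        Request("ops.bot", "restart_service", "payments-api"))
    if verdict.outcome != "ALLOW":
        return False
    # ... the side effect would happen here ...
    await audit.record(Record(
        "ops.bot", "restart_service", "payments-api", succeeded=True,
        metadata={"verdict_id": verdict.verdict_id}))
    return True


audit = InMemoryAuditAdapter(denied_actions={"drop_database"})
ran = asyncio.run(run_action(audit))
```

Carrying the verdict_id into the record's metadata is what lets an auditor join the "before" row to the "after" row; a Postgres or Kafka adapter would presumably keep the same linkage and change only where the lines land.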


5 · Plane 6 · Decision (triage classifier)

Every incident gets a triage prediction: auto_approved, route_to_human, route_to_lead, or rejected · plus a confidence score and feature importances. Two interchangeable adapters via TriageClassifierPort.

Default · HeuristicTriageClassifier (no ML dep)

from arbiter_ops.decision.adapters.heuristic_triage import HeuristicTriageClassifier
from arbiter_ops.decision.domain.models import DecisionInput

clf = HeuristicTriageClassifier()
assert clf.is_loaded()  # always True

prediction = clf.predict(DecisionInput(
    incident_id="INC-2031",
    blast_score=0.18,           # fraction of fleet at risk
    asset_score=0.42,           # asset criticality
    reversibility_score=0.85,   # 1.0 = fully reversible
    confidence_score=0.91,      # upstream model confidence
    reasoning_status="succeeded",
))

print(prediction.predicted_outcome)        # DecisionOutcome.AUTO_APPROVED
print(prediction.confidence)               # 0.97
print(prediction.feature_importances)      # {'reasoning_status': 0.30, 'blast_score': 0.20, ...}

ML opt-in · XGBoostTriageClassifier

# Step 1: install the extra
pip install -e "packages/arbiter-ops[ml-decision]"

# Step 2: train (or use the bundled reference model)
python packages/arbiter-ops/scripts/train_triage_classifier.py \
    --corpus packages/arbiter-ops/artifacts/triage_classifier/example_corpus.jsonl \
    --output packages/arbiter-ops/artifacts/triage_classifier/v1/

# Step 3: load + predict (Python)
from arbiter_ops.decision.adapters.heuristic_triage import HeuristicTriageClassifier
from arbiter_ops.decision.adapters.xgboost_triage import XGBoostTriageClassifier

clf = XGBoostTriageClassifier(
    model_path="packages/arbiter-ops/artifacts/triage_classifier/v1/model.json",
    class_map_path="packages/arbiter-ops/artifacts/triage_classifier/v1/class_map.json",
    metadata_path="packages/arbiter-ops/artifacts/triage_classifier/v1/metadata.json",
)

if not clf.is_loaded():
    clf = HeuristicTriageClassifier()  # graceful fallback

Both adapters return a TriagePrediction with the same shape: outcome_probabilities, predicted_outcome, confidence, feature_importances, model_version, prediction_id. Swap the adapter behind TriageClassifierPort; downstream code never changes.
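For intuition about what a rule-based triage step can look like, the sketch below scores the same five inputs with a weighted sum and maps the score onto the four outcomes. It is not the HeuristicTriageClassifier implementation: the weights (beyond the two importances printed above), the "higher is safer" orientation, and every cut-off are assumptions chosen for illustration.

```python
# Hypothetical weights; only the first two echo the importances printed above.
WEIGHTS = {
    "reasoning_status": 0.30,
    "blast_score": 0.20,
    "asset_score": 0.15,
    "reversibility_score": 0.15,
    "confidence_score": 0.20,
}


def safety_score(blast, asset, reversibility, confidence, reasoning_ok):
    """Weighted sum in [0, 1]; higher means safer to automate."""
    parts = {
        "reasoning_status": 1.0 if reasoning_ok else 0.0,
        "blast_score": 1.0 - blast,           # small blast radius is safer
        "asset_score": 1.0 - asset,           # low criticality is safer
        "reversibility_score": reversibility,
        "confidence_score": confidence,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def triage(score):
    """Map the score onto the four outcomes using assumed cut-offs."""
    if score >= 0.80:
        return "auto_approved"
    if score >= 0.60:
        return "route_to_human"
    if score >= 0.40:
        return "route_to_lead"
    return "rejected"


# Same inputs as the DecisionInput example above.
low_risk = triage(safety_score(0.18, 0.42, 0.85, 0.91, reasoning_ok=True))
# A wide-blast, irreversible action with failed reasoning.
high_risk = triage(safety_score(0.90, 0.90, 0.10, 0.20, reasoning_ok=False))
```

A heuristic of this kind has no artifacts to load, which is why the default adapter can report is_loaded() as always true.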

6 · Plane 4 · Intelligence (cost predictor + router)

Predicts per-(model, task) cost · success probability · latency, then picks the model with the highest expected value subject to RoutingPolicy constraints.

Default · HistoricalMeanCostPredictor (Welford streaming · no ML dep)

from arbiter_ops.intelligence.adapters import HistoricalMeanCostPredictor

pred = HistoricalMeanCostPredictor()

# Observe historical executions (live · stream from your billing pipe)
for record in cost_records:
    pred.observe(
        model_id=record["model_id"],
        task_class=record["task_class"],
        cost_usd=record["cost"],
        latency_ms=record["latency_ms"],
        succeeded=record["success"],
    )

# Cold below 10 samples per (model, task) — call falls back to policy default
if pred.is_loaded(model_id="claude-sonnet", task_class="triage"):
    p = pred.predict(model_id="claude-sonnet", task_class="triage")
    print(p.expected_cost_usd, p.cost_confidence_interval, p.expected_value)
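The "Welford streaming" label refers to Welford's online algorithm, which updates a mean and variance one observation at a time without storing the history. Below is a minimal sketch of that idea combined with the 10-sample cold threshold; `RunningStat` and `MeanCostSketch` are hypothetical names, and the normal-approximation interval is an assumption, not necessarily how the shipped predictor derives cost_confidence_interval.

```python
import math

COLD_THRESHOLD = 10  # samples per (model, task) before predictions are trusted


class RunningStat:
    """Welford's online algorithm: mean and variance in a single pass."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 below two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


class MeanCostSketch:
    def __init__(self) -> None:
        self._stats: dict[tuple[str, str], RunningStat] = {}

    def observe(self, model_id: str, task_class: str, cost_usd: float) -> None:
        self._stats.setdefault((model_id, task_class), RunningStat()).push(cost_usd)

    def is_loaded(self, model_id: str, task_class: str) -> bool:
        stat = self._stats.get((model_id, task_class))
        return stat is not None and stat.n >= COLD_THRESHOLD

    def predict(self, model_id: str, task_class: str):
        if not self.is_loaded(model_id, task_class):
            raise LookupError("cold: fewer than 10 samples for this pair")
        stat = self._stats[(model_id, task_class)]
        half = 1.96 * math.sqrt(stat.variance / stat.n)  # normal approximation
        return stat.mean, (stat.mean - half, stat.mean + half)


sketch = MeanCostSketch()
for k in range(1, 10):
    sketch.observe("claude-sonnet", "triage", 0.01 * k)
cold = sketch.is_loaded("claude-sonnet", "triage")   # only 9 samples so far
sketch.observe("claude-sonnet", "triage", 0.10)
warm = sketch.is_loaded("claude-sonnet", "triage")   # 10th sample warms it up
mean, interval = sketch.predict("claude-sonnet", "triage")
```

Because each pair keeps only three numbers (n, mean, m2), memory stays constant no matter how many billing records stream through.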

ML opt-in · XGBoostCostPredictor (3 boosters)

pip install -e "packages/arbiter-ops[ml-intelligence]"

python packages/arbiter-ops/scripts/train_cost_predictor.py \
    --corpus packages/arbiter-ops/artifacts/cost_predictor/example_corpus.jsonl \
    --output packages/arbiter-ops/artifacts/cost_predictor/

# Loads cost_regressor.json + success_classifier.json + latency_regressor.json
from arbiter_ops.intelligence.adapters.xgboost_cost_predictor import XGBoostCostPredictor

pred = XGBoostCostPredictor(model_dir="packages/arbiter-ops/artifacts/cost_predictor/")

Putting it together · CostAwareRouter

from arbiter_ops.intelligence.application.cost_aware_router import CostAwareRouter
from arbiter_ops.intelligence.domain.models import RoutingPolicy

router = CostAwareRouter(predictor=pred)

policy = RoutingPolicy(
    candidates=["claude-haiku", "claude-sonnet", "gpt-4o-mini"],
    capabilities={"reasoning", "json_mode"},
    compliance={"soc2"},
    max_cost_per_call_usd=0.05,
    default_model_id="claude-haiku",   # fallback when no candidate qualifies
)

choice = router.route(task_class="triage", policy=policy)
print(choice.model_id, choice.expected_value)
# claude-sonnet  0.84   # highest expected_value under the cost cap
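The routing step reduces to "filter by constraints, then take the best score, else use the default". The sketch below shows that shape only. The scoring formula (success probability minus cost), the `Prediction` fields, and all numbers are assumptions for illustration; the source does not state how CostAwareRouter computes expected_value, and capability and compliance filtering are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    model_id: str
    expected_cost_usd: float
    success_probability: float


def expected_value(p: Prediction, cost_weight: float = 1.0) -> float:
    """Assumed scoring: success probability minus weighted cost."""
    return p.success_probability - cost_weight * p.expected_cost_usd


def route(predictions, max_cost_per_call_usd, default_model_id):
    """Pick the best-scoring candidate under the cost cap, else the default."""
    eligible = [p for p in predictions
                if p.expected_cost_usd <= max_cost_per_call_usd]
    if not eligible:
        return default_model_id
    return max(eligible, key=expected_value).model_id


# Illustrative numbers only.
predictions = [
    Prediction("claude-haiku", 0.004, 0.78),
    Prediction("claude-sonnet", 0.030, 0.90),
    Prediction("gpt-4o-mini", 0.080, 0.95),   # over the 0.05 cap below
]

choice = route(predictions, max_cost_per_call_usd=0.05,
               default_model_id="claude-haiku")
fallback = route(predictions, max_cost_per_call_usd=0.001,
                 default_model_id="claude-haiku")
```

With the cap at 0.05 the most capable candidate is excluded before scoring, so the cheaper model with the best remaining score wins; with an unreachable cap nothing qualifies and the policy default is returned.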

Wiring the gateway · LiteLLM OR Portkey

The provider-neutral LLMInvokerPort takes a completion callable that matches the OpenAI chat-completions shape. Two production adapters ship — pick by SDK · the rest of the substrate is identical:

# Option A · LiteLLM gateway (the original ADR-0006 default)
pip install -e "packages/arbiter-ops[litellm]"

from litellm import acompletion
from arbiter_ops.intelligence.adapters import GatedLLMInvoker
# (LiteLLMInvoker is the planned production wrapper · for v0.1 you can
#  inject acompletion directly into the audit-gated invoker chain)
invoker = ...

# Option B · Portkey AI Gateway (v0.1+ · this release)
pip install -e "packages/arbiter-ops[portkey]"

import os

from portkey_ai import AsyncPortkey
from arbiter_ops.intelligence.adapters import PortkeyInvoker

pk = AsyncPortkey(api_key=os.environ["PORTKEY_API_KEY"])

invoker = PortkeyInvoker(
    completion=pk.chat.completions.create,
    virtual_key="vk-tenant-acme-anthropic",   # per-tenant credential vault
    portkey_config="cfg-prod-fallback-v3",    # gateway-side fallback chain
    default_metadata={"environment": "prod"},
)

outcome = await invoker.invoke(
    model=model_descriptor,
    request=invocation_request,
)
# outcome.cost_usd · outcome.input_tokens · outcome.output_tokens
# outcome.metadata["portkey_trace_id"] pins back to the AIDecisionRecord

Same surface · different operational features. Portkey adds virtual keys (per-tenant credential vaulting so the substrate never holds raw provider secrets), config-driven fallback + load-balancing (declared in Portkey rather than coded in the FallbackChain), semantic caching (gateway-side), and per-request observability metadata with trace_id correlation back to the substrate's AIDecisionRecord.decision_id. LiteLLM gives a slimmer pure-Python abstraction over the same set of provider SDKs without the gateway value-add.

The same gateway choice applies to the reasoning plane · LiteLLMReasoner and PortkeyReasoner are interchangeable at the ReasonerPort surface · both consume a completion callable and produce a Hypothesis.


7 · Plane 9 · Improvement (GA surrogate fitness)

The GA evolves policy configurations against historical replay. Ground-truth evaluation is expensive (replay ≈ seconds per genome) — the surrogate is fast (XGBoost prediction ≈ μs). HybridFitnessEvaluator blends both: surrogate every generation · ground-truth every N generations · drift detection emits an audit row when the surrogate diverges from reality.

pip install -e "packages/arbiter-ops[ml-surrogate]"

python packages/arbiter-ops/scripts/train_fitness_surrogate.py \
    --corpus packages/arbiter-ops/artifacts/fitness_surrogate/example_corpus.jsonl \
    --output packages/arbiter-ops/artifacts/fitness_surrogate/

from arbiter_ops.improvement.adapters.xgboost_fitness_surrogate import XGBoostFitnessSurrogate
from arbiter_ops.improvement.application.hybrid_fitness import HybridFitnessEvaluator
from arbiter_ops.governance import make_default_audit_port
# (replay-based ground-truth evaluator is your existing FitnessEvaluatorPort)

evaluator = HybridFitnessEvaluator(
    surrogate=XGBoostFitnessSurrogate(model_dir="packages/arbiter-ops/artifacts/fitness_surrogate/"),
    ground_truth=replay_evaluator,
    audit=make_default_audit_port(),
    validation_every_n_generations=5,
    validation_sample_size=8,
    divergence_threshold=0.05,        # MAE on scalar fitness
    fallback_window_generations=5,    # pin to ground-truth on drift
    rng_seed=42,
)

scores = await evaluator.evaluate_population(genomes, generation=27)

# Inspect what happened
print(evaluator.metrics.surrogate_calls,    # cheap calls
      evaluator.metrics.ground_truth_calls, # validation calls
      evaluator.metrics.using_fallback,     # True if drift kicked in
      evaluator.metrics.last_validation_mae)

When drift is detected the evaluator emits a surrogate.drift_detected audit record via the governance port, pins to ground-truth for the next fallback_window_generations, and exposes the event on HybridMetrics.drift_events. Wire this into your alerting; surrogate divergence is a leading indicator of policy-corpus drift.
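The gating logic described above can be sketched without XGBoost or a replay engine. `DriftGate` is a hypothetical class; the surrogate and ground truth are plain callables, and the choice to return ground-truth scores in the generation where drift is detected is an assumption, not confirmed behavior of HybridFitnessEvaluator.

```python
import random


class DriftGate:
    """Hypothetical gate: validate the surrogate every N generations."""

    def __init__(self, every_n, sample_size, threshold, fallback_window, seed):
        self.every_n = every_n
        self.sample_size = sample_size
        self.threshold = threshold
        self.fallback_window = fallback_window
        self.rng = random.Random(seed)
        self.fallback_until = -1   # last generation pinned to ground truth
        self.drift_events = []     # (generation, mae) pairs

    def evaluate(self, genomes, generation, surrogate, ground_truth):
        if generation <= self.fallback_until:
            return [ground_truth(g) for g in genomes]   # pinned after drift
        scores = [surrogate(g) for g in genomes]
        if generation % self.every_n == 0:
            k = min(self.sample_size, len(genomes))
            picked = self.rng.sample(range(len(genomes)), k)
            mae = sum(abs(scores[i] - ground_truth(genomes[i]))
                      for i in picked) / k
            if mae > self.threshold:
                self.drift_events.append((generation, mae))
                self.fallback_until = generation + self.fallback_window
                return [ground_truth(g) for g in genomes]
        return scores


def truth(g):
    return g            # toy ground truth: fitness equals the genome value


def good_surrogate(g):
    return g + 0.01     # small error, under the 0.05 threshold


def bad_surrogate(g):
    return g + 0.20     # large error, triggers drift


genomes = [i / 10 for i in range(10)]
gate = DriftGate(every_n=5, sample_size=8, threshold=0.05,
                 fallback_window=5, seed=42)

ok = gate.evaluate(genomes, 5, good_surrogate, truth)        # validated, passes
drifted = gate.evaluate(genomes, 10, bad_surrogate, truth)   # validated, fails
pinned = gate.evaluate(genomes, 12, bad_surrogate, truth)    # inside the window
```

Sampling a handful of genomes rather than the whole population keeps validation cheap; a larger sample lowers the variance of the MAE estimate at the cost of more replays.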

8 · Plane 7 · Action (SOAR invokers)

6 SOAR vendors shipped. All capabilities ship with reversibility=COSTLY · requires_simulation=True · requires_rollback_artifact=True · the Action plane refuses to invoke without a simulation pass and a recorded rollback artifact.

| Vendor | Capabilities | Auth |
|--------|--------------|------|
| Splunk SOAR (Phantom) | run_playbook · create_container | token |
| Cortex XSOAR | run_playbook · create_incident | API key + ID |
| Tines | send_to_story (webhook) · create_record | per-story webhook secret |
| Swimlane (Turbine) | run_playbook · create_record | Private-Token header |
| Google Chronicle SOAR | run_playbook · create_case | OAuth Bearer (static or callable) |
| Microsoft Sentinel | run_playbook (Logic Apps) · update_incident (ARM) | SAS-signed URL · Azure AD Bearer |

Example · invoke a Swimlane playbook

from arbiter_ops.action.adapters.invokers.swimlane import (
    SwimlaneInvoker, build_swimlane_tool,
)
from arbiter_ops.action.domain.models import ProposedAction

tool = build_swimlane_tool(
    tool_id="swimlane/turbine",
    base_url="https://swimlane.acme.io",
)

invoker = SwimlaneInvoker(
    base_url="https://swimlane.acme.io",
    api_token="",               # fill in your Swimlane token (sent as the Private-Token header)
    http_client=httpx_client,   # injected
)

action = ProposedAction(
    tool_id="swimlane/turbine",
    capability_id="swimlane.run_playbook",
    parameters={"playbook_id": "pb-block-user", "inputs": {"user_id": "u-123"}},
    aiops_request_id="req-abc",
    aiops_decision_id="dec-xyz",
    aiops_tenant_id="tenant-acme",
    idempotency_key="idem-2031-01",
)

result = await invoker.invoke(tool, action)
# {'capability': 'swimlane.run_playbook',
#  'playbook_id': 'pb-block-user',
#  'run_id': 'swim-run-123',
#  'status_code': 200, ...}

Example · Sentinel Logic Apps callback

from arbiter_ops.action.adapters.invokers.sentinel import SentinelInvoker

invoker = SentinelInvoker(
    default_callback_url="https://prod-15.eastus.logic.azure.com/...&sig=...",
    azure_ad_token_provider=lambda: get_azure_ad_token(),  # for ARM update_incident
    http_client=httpx_client,
)

# Logic Apps webhook: signed URL, no auth header.
# `tool` is the Sentinel tool descriptor (construction not shown here).
res = await invoker.invoke(tool, ProposedAction(
    tool_id="sentinel",
    capability_id="sentinel.run_playbook",
    parameters={"trigger_inputs": {"alert_id": "A-77"}},
    # plus aiops_request_id, aiops_decision_id, aiops_tenant_id and
    # idempotency_key, as in the Swimlane example above
))
print(res["workflow_run_id"])   # extracted from x-ms-workflow-run-id response header

AIOps correlation fields are auto-injected into every vendor payload: aiops_request_id, aiops_decision_id, aiops_tenant_id, and idempotency_key. Use these on the SOAR side for replay protection and cross-system tracing.
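As a rough picture of both halves, the sketch below merges the four correlation fields into a vendor payload and shows a receiver that runs each idempotency_key at most once. `with_correlation` and `ReplayGuard` are hypothetical helpers; how each shipped invoker actually places the fields in its vendor payload is not specified here.

```python
CORRELATION_FIELDS = ("aiops_request_id", "aiops_decision_id",
                      "aiops_tenant_id", "idempotency_key")


def with_correlation(parameters: dict, action: dict) -> dict:
    """Copy the vendor parameters and add the four correlation fields."""
    payload = dict(parameters)
    for name in CORRELATION_FIELDS:
        payload[name] = action[name]
    return payload


class ReplayGuard:
    """Receiver-side sketch: run each idempotency_key at most once."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def handle(self, payload: dict) -> tuple[str, bool]:
        key = payload["idempotency_key"]
        if key in self._results:
            return self._results[key], True    # replay: reuse stored run id
        run_id = f"run-{len(self._results) + 1}"
        self._results[key] = run_id
        return run_id, False


action = {
    "aiops_request_id": "req-abc",
    "aiops_decision_id": "dec-xyz",
    "aiops_tenant_id": "tenant-acme",
    "idempotency_key": "idem-2031-01",
}
payload = with_correlation({"playbook_id": "pb-block-user"}, action)

guard = ReplayGuard()
first = guard.handle(payload)    # executes
second = guard.handle(payload)   # same key: treated as a replay
```

A retried request after a network timeout then yields the original run id instead of blocking the user twice.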

9 · Plane W · Workflow (agent-workflow boards)

Workflows ship as YAML cards under src/arbiter_ops/workflow/workflows/<name>/. The default substrate is the agent-workflow open-standard board protocol (Apache-2.0).

# Example: aiops-incident workflow card
src/arbiter_ops/workflow/workflows/aiops-incident/
├── workflow.yaml         # state machine + transitions
└── card.schema.json      # JSON Schema for the workflow card

# Drive a board run
from arbiter_ops.workflow.adapters import LocalBoardClient
client = LocalBoardClient()
run_id = await client.start("aiops-incident", inputs={"incident_id": "INC-2031"})
status = await client.status(run_id)

10 · Control plane · HTTP API

FastAPI app · 8 routers · launchable via console entry-point.

Launch

arbiter-ops-control --host 0.0.0.0 --port 8001 --hil-gateway in_process

# Environment overrides
ARBITER_OPS_CONTROL_HOST=0.0.0.0
ARBITER_OPS_CONTROL_PORT=8001
ARBITER_OPS_CONTROL_HIL_GATEWAY=auto_approve   # auto_approve | auto_reject | in_process
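One way to read these three variables with validation is sketched below. `load_control_config` is a hypothetical helper; the 127.0.0.1 host default and the in_process gateway default are assumptions, and only the 8001 port default comes from this guide. How the real server reconciles flags with environment values is not covered.

```python
import os

HIL_GATEWAYS = {"auto_approve", "auto_reject", "in_process"}


def load_control_config(env=None) -> dict:
    """Read the three ARBITER_OPS_CONTROL_* variables with assumed defaults."""
    env = os.environ if env is None else env
    gateway = env.get("ARBITER_OPS_CONTROL_HIL_GATEWAY", "in_process")
    if gateway not in HIL_GATEWAYS:
        raise ValueError(f"unknown HIL gateway: {gateway!r}")
    return {
        "host": env.get("ARBITER_OPS_CONTROL_HOST", "127.0.0.1"),
        "port": int(env.get("ARBITER_OPS_CONTROL_PORT", "8001")),
        "hil_gateway": gateway,
    }


defaults = load_control_config({})
overridden = load_control_config({
    "ARBITER_OPS_CONTROL_HOST": "0.0.0.0",
    "ARBITER_OPS_CONTROL_PORT": "9000",
    "ARBITER_OPS_CONTROL_HIL_GATEWAY": "auto_approve",
})

try:
    load_control_config({"ARBITER_OPS_CONTROL_HIL_GATEWAY": "yolo"})
    rejected = False
except ValueError:
    rejected = True
```

Rejecting an unknown gateway name at startup is safer than silently falling back, since auto_approve and auto_reject have opposite consequences for pending gates.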

Routers

| Router | Purpose | Sample endpoints |
|--------|---------|------------------|
| tenant | Per-tenant config + approach-band overlays | GET/POST /tenants · PUT /tenants/{id}/overlay |
| policy | Policy CRUD · evolution gates · drift kill rules | GET/POST /policies · POST /policies/{id}/promote |
| rbac | Roles · permissions · principal management | GET /principals · POST /roles/{id}/permissions |
| killswitch | Global + per-tenant + per-capability emergency stops | POST /killswitch/global · POST /killswitch/tenant/{id} |
| workflow | Start / status / cancel workflow runs | POST /workflow/start · GET /workflow/{run_id} |
| conformance | Live 9-plane conformance probe | GET /conformance/probe · GET /conformance/status |
| substrate | Substrate health + version + readiness | GET /healthz · GET /readyz · GET /version |
| telemetry | Metrics surface + decision-event tap | GET /metrics · GET /events?since=... |

Auth (v0.1 alpha · header-stub)

curl -H "x-arbiter-ops-operator-subject: alice@acme.io" \
     -H "x-arbiter-ops-operator-role: PLATFORM_ADMIN" \
     -H "x-arbiter-ops-tenant-scope: tenant-acme,tenant-globex" \
     http://localhost:8001/healthz

Header-stub auth is alpha-only. For production wire JWT / SPIFFE through arbiter_ops.control.application.authz.make_identity_resolver(...).
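To show what a header-stub resolver amounts to, here is a sketch that turns the three headers from the curl example into an identity object. `OperatorIdentity`, `resolve_identity`, and `can_access` are hypothetical names, not the arbiter_ops authz API, and nothing here verifies the caller, which is exactly why the stub is unsuitable for production.

```python
from dataclasses import dataclass

HEADER_PREFIX = "x-arbiter-ops-"


@dataclass(frozen=True)
class OperatorIdentity:
    subject: str
    role: str
    tenant_scope: frozenset


def resolve_identity(headers: dict) -> OperatorIdentity:
    """Build an identity from the stub headers (header names are case-insensitive)."""
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    try:
        subject = lowered[HEADER_PREFIX + "operator-subject"]
        role = lowered[HEADER_PREFIX + "operator-role"]
    except KeyError as missing:
        raise PermissionError(f"missing auth header: {missing}") from None
    raw_scope = lowered.get(HEADER_PREFIX + "tenant-scope", "")
    scope = frozenset(t.strip() for t in raw_scope.split(",") if t.strip())
    return OperatorIdentity(subject, role, scope)


def can_access(identity: OperatorIdentity, tenant_id: str) -> bool:
    """True when the tenant is inside the caller's declared scope."""
    return tenant_id in identity.tenant_scope


identity = resolve_identity({
    "X-Arbiter-Ops-Operator-Subject": "alice@acme.io",
    "x-arbiter-ops-operator-role": "PLATFORM_ADMIN",
    "x-arbiter-ops-tenant-scope": "tenant-acme, tenant-globex",
})

try:
    resolve_identity({})
    anonymous_rejected = False
except PermissionError:
    anonymous_rejected = True
```

Any client can set these headers to any value, so a JWT or SPIFFE resolver would replace resolve_identity with one that checks a signature before trusting the claims.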

11 · Console entry-points

| Command | Module | Use |
|---------|--------|-----|
| arbiter-ops | arbiter_ops.cli | Top-level CLI · serve · migrate · smoke-test |
| arbiter-ops-control | arbiter_ops.control.server | Control plane HTTP server (port 8001 default) |
| arbiter-ops-hil-worker | arbiter_ops.hil.worker | Temporal HIL worker |
| arbiter-ops-hil-submit | arbiter_ops.hil.cli.submit | Submit a HIL gate from the CLI |
| arbiter-ops-agent | arbiter_ops.agent.server | Long-running agent loop (E201) |
| arbiter-ops-improve | arbiter_ops.improvement.server | GA campaign server |
| arbiter-ops-triage-rooms | arbiter_ops.triage_room.server | Slack/Teams triage-room bot |
| arbiter-opsctl | arbiter_ops.operator.cli.main | Recipe-driven incident-response CLI |

12 · Optional ML extras

| Extra | Adds | Default fallback | Used by |
|-------|------|------------------|---------|
| ml-decision | xgboost · numpy · sklearn | HeuristicTriageClassifier | Triage classifier (decision plane) |
| ml-intelligence | xgboost · numpy · sklearn | HistoricalMeanCostPredictor | Cost predictor (intelligence plane) |
| ml-surrogate | xgboost · numpy · scipy | ReplayFitnessEvaluator (ground truth) | GA surrogate fitness (improvement plane) |
| slack | slack_sdk | - | Slack triage-room bot |
| neo4j | neo4j | in-memory context | Context plane (entity + topology) |
| integrations | requests · httpx | - | SOAR invokers (action plane) |
| llm | anthropic | - | Provider-neutral intelligence (direct SDK) |
| litellm | litellm>=1.50 | - | LiteLLM cross-provider reasoner (litellm_reasoner.py) |
| portkey | portkey-ai>=1.8 | - | Portkey AI Gateway · reasoner + invoker (portkey_reasoner.py · portkey_invoker.py) · virtual keys · semantic caching · trace_id |
| kafka | confluent-kafka | in-memory event bus | Event-stream sensing |
| smt | z3-solver>=4.13 | - | SMT verifier (reasoning Layer 2a · D-10) |
| all | everything above | - | - |

Consumer-flippable invariant. Every ML adapter is consumer-flippable behind a port: is_loaded() returns False on load failure, the call returns a graceful default (uniform distribution / cold-fallback / heuristic), and the substrate keeps running. Operators install the extra when they're ready · zero behavior change for users who stick with the default path.
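The invariant boils down to "never raise on a missing artifact". A toy adapter makes this concrete: `FlippableClassifier` and its JSON artifact format (a table of class priors) are hypothetical, chosen only so the example runs without xgboost.

```python
import json
from pathlib import Path

OUTCOMES = ("auto_approved", "route_to_human", "route_to_lead", "rejected")


class FlippableClassifier:
    """Hypothetical ML adapter that degrades instead of raising."""

    def __init__(self, model_path: str) -> None:
        self._priors = None
        try:
            self._priors = json.loads(Path(model_path).read_text())
        except (OSError, ValueError):
            pass  # missing or corrupt artifact: stay unloaded, do not raise

    def is_loaded(self) -> bool:
        return self._priors is not None

    def predict(self, features: dict) -> dict:
        if not self.is_loaded():
            # Graceful default: uniform distribution over the outcomes.
            share = 1.0 / len(OUTCOMES)
            return {outcome: share for outcome in OUTCOMES}
        # Toy model: returns the stored priors regardless of the features.
        return dict(self._priors)


clf = FlippableClassifier("/nonexistent/triage/model.json")
loaded = clf.is_loaded()
probabilities = clf.predict({"blast_score": 0.18})
```

Callers that need a more useful fallback than a uniform distribution can check is_loaded() and inject the heuristic adapter instead.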

13 · Local development

Run the test suite

python -m pytest packages/arbiter-ops/tests/ -q
# 1034 passed, 8 skipped in ~62s

Leak check (every PR · whitelabel guard)

python scripts/arbiter_ops_leak_check.py
# leaks: 0 across 482 files scanned

The regex catches references to upstream product names anywhere in the package tree. The full pattern lives in scripts/arbiter_ops_leak_check.py as PAT. CI fails on any non-zero count.
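For readers who have not opened the script, a leak check of this kind is essentially a regex walk over the tree. The sketch below is not the shipped script: the pattern is a placeholder with made-up product names, and the file suffix filter is an assumption.

```python
import re
import tempfile
from pathlib import Path

# Placeholder pattern; the real PAT lives in scripts/arbiter_ops_leak_check.py.
PAT = re.compile(r"\b(acme[-_ ]?legacy|oldbrand)\b", re.IGNORECASE)


def scan(root: Path, suffixes=(".py", ".md", ".toml")):
    """Return (files_scanned, [(file_name, line_no, matched_text), ...])."""
    leaks, scanned = [], 0
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        scanned += 1
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in PAT.finditer(line):
                leaks.append((path.name, line_no, match.group(0)))
    return scanned, leaks


# Demo on a throwaway tree: one clean file, one file with a leak.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "clean.py").write_text("import arbiter_ops\n", encoding="utf-8")
    (root / "dirty.md").write_text("title\nPowered by OldBrand\n",
                                   encoding="utf-8")
    scanned, leaks = scan(root)

exit_code = 1 if leaks else 0   # CI would fail on any non-zero leak count
```

Reporting the file and line for each hit makes a failing CI run actionable without rerunning the scan locally.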

Add an adapter

  1. Implement the relevant ...Port ABC under arbiter_ops.<plane>.domain.ports.
  2. Land it under arbiter_ops.<plane>.adapters.<your_adapter>.
  3. Add a conformance test under tests/<plane>/test_<your_adapter>.py using the shared port-conformance fixture.
  4. Wire it through the application container if it should become the default · or expose a factory and let consumers register it.
  5. Run pytest + leak check.

Repository layout (top-level)

packages/arbiter-ops/
├── src/arbiter_ops/
│   ├── action/         decision/      improvement/    intelligence/
│   ├── workflow/       sensing/       context/        feature/
│   ├── reasoning/      governance/    evidence/       hil/
│   ├── control/        agent/         triage_room/    operator/
│   ├── conformance/    identity/      knowledge/
│   └── cli.py
├── tests/                # 1034 passing
├── artifacts/            # trained reference models
│   ├── triage_classifier/v1/
│   ├── cost_predictor/
│   └── fitness_surrogate/
├── scripts/              # train_*.py CLIs
├── docs/                 # this guide + per-feature .md
├── pyproject.toml        # name = "arbiter-ops" · ml-* extras
└── README.md

14 · Troubleshooting

XGBoostTriageClassifier.is_loaded() returns False

The model artifacts didn't load — likely missing model.json, class_map.json, or metadata.json in the expected directory, or an XGBoost-version mismatch with the file format. Operations fall back gracefully to a uniform distribution; the substrate keeps running. To force the heuristic path, inject HeuristicTriageClassifier directly.

HistoricalMeanCostPredictor reports is_loaded() == False

Need ≥10 observations per (model_id, task_class) pair before is_loaded() flips True. Below that threshold the predictor is cold and the router falls back to RoutingPolicy.default_model_id. Pre-warm with a replay step at startup.

TypeError: _estimator_type undefined when loading an XGBoost model

XGBoost 2.x sklearn-wrapper bug. Models in this package are saved with model.get_booster().save_model(path) and loaded via xgb.Booster().load_model(path) — which sidesteps the bug. If you trained with the sklearn API directly, re-export the booster.

Surrogate drift fires too often

Raise divergence_threshold (default 0.05 MAE on scalar fitness), increase validation_sample_size, or extend fallback_window_generations. Drift events are also a signal that the corpus has shifted — retraining the surrogate may be more useful than tuning the gate.

Sentinel run_playbook returns 401

Logic Apps callback URLs are SAS-signed and contain the auth in the URL itself — no Authorization header. If the URL has expired (Logic Apps SAS rotates), regenerate it from the Logic App in Azure portal. ARM update_incident uses Azure AD Bearer; check azure_ad_token_provider wiring.

Audit records aren't appearing

Default LocalAuditAdapter writes to stdout. Set ARBITER_OPS_AUDIT_PATH to a writable file path, or register a hosted backend through the application container. Records flush per call — no buffering.

Tests fail with ModuleNotFoundError: agentic_ops

Stale install from before the rename. Run pip uninstall agentic-ops -y followed by pip install -e packages/arbiter-ops to clear it.


The normative spec is the 9-plane hexagonal AIOps architecture, v1.4.
This guide reflects v1.0.0. For per-feature deep-dives see triage_classifier.md · cost_predictor.md · fitness_surrogate.md · portkey_integration.md.