Domain Identity
Define the scope of your ontology. All generated IRIs and documentation will use these values.

Load a starter template

Pre-built domains that fill in entities, events, relationships, and competency questions for you.
  • What's shown: 10 industry templates — five built-in (telecom, healthcare, finance, manufacturing, retail) and five YAML-defined (energy, logistics, government, insurance, pharmaceuticals).
  • Why it matters: a starter saves 30+ minutes of typing and gives you a known-good shape to edit. You can change anything afterwards.
  • How to use: click any tile to load it. The whole session is replaced — fields below, entities, events, relationships, and CQs all update at once.

Import an existing schema

Drop a .json session file (wizard or onboard.py format) or a .json schema. SQL DDL import lands in a follow-up step (#16).

Connect directly to your operational database. Tables, columns, and foreign keys are read into the same review modal as the file path — no upload required.

Pick an existing profile to pre-fill the form, or leave on — New connection — to enter fresh credentials.
Connection details
Select a vendor to enable these actions.
Credential handling. Passwords are stored locally in db/ontologies.db with base64 obfuscation — a defence against casual inspection, not encryption. Profiles never leave your machine. For production credentials, prefer secrets provided via your organisation's key store at runtime.
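To see why base64 is obfuscation and not encryption, here is a minimal sketch. The password value is made up, and nothing here reflects how the wizard actually lays out db/ontologies.db; it only shows that the encoding reverses without any key.

```python
import base64

# Hypothetical password, encoded the way a base64-obfuscated store would hold it.
stored = base64.b64encode("s3cret-pw".encode("utf-8")).decode("ascii")

# Anyone who can read the stored value can reverse it; no key is involved.
recovered = base64.b64decode(stored).decode("utf-8")

print("stored:   ", stored)
print("recovered:", recovered)
```

This is why the note above recommends a key store for production credentials.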

Domain details

The identity of your ontology — used in every generated IRI, SHACL shape, and report.
  • Domain Name: human-readable name shown in the report header (e.g. "Smart Building Operations").
  • Description: one or two sentences. Goes into the auto-generated scope charter and is read by the LLM if you use --llm suggestions.
  • Base IRI: the namespace every OWL class lives under. Pick once and stick with it — changing it later breaks downstream graph stores.
  • Industry: selects sensible defaults for sensitivity tiers and standards alignment (e.g. TM Forum SID for telecom).
The main things in your domain — assets, customers, resources, orders. Each becomes an OWL class.

Entities (0)

The persistent nouns of your domain — the things your data is about.
  • What's shown: one chip per entity, with its sensitivity tier and a delete button. The badge in the heading is the live count.
  • Why it matters: every entity becomes an owl:Class in the generated ontology, a sh:NodeShape in the SHACL file, and a @type in the JSON-LD context.
  • How to read: aim for 5–10 entities. If you have more than 15, you're probably modelling implementation details rather than domain concepts. Click a chip to edit; drag to reorder.
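As a rough illustration of how one entity could turn into an owl:Class and a matching sh:NodeShape under the Base IRI, consider the sketch below. The function names, the "#" separator, the CamelCase local name, and the "Shape" suffix are all assumptions made for illustration; the toolkit's real naming scheme may differ. Prefix declarations are omitted for brevity.

```python
def local_name(label: str) -> str:
    """Turn a human label into a CamelCase local name (assumed convention)."""
    return "".join(word[:1].upper() + word[1:] for word in label.split())


def entity_to_turtle(base_iri: str, label: str) -> str:
    """Return a Turtle snippet declaring one class and one node shape."""
    # Assumption: the namespace ends in '#' or '/'; add '#' when it does not.
    if not base_iri.endswith(("#", "/")):
        base_iri += "#"
    cls = f"<{base_iri}{local_name(label)}>"
    shape = f"<{base_iri}{local_name(label)}Shape>"
    return (
        f"{cls} a owl:Class ;\n"
        f'    rdfs:label "{label}" .\n'
        f"{shape} a sh:NodeShape ;\n"
        f"    sh:targetClass {cls} .\n"
    )


print(entity_to_turtle("https://example.org/building", "air handling unit"))
```

Because every IRI is built from the base, changing the Base IRI later renames every class, which is why downstream graph stores break.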
Log Discovery L4 · Preview
Convert a folder of raw logs into ontology candidates an engineer reviews one at a time. Approved candidates flow into the regular Wizard steps (Entities · Events · Relationships · Rules) and end up as classes, properties, and SPARQL rules in your generated ontology.

Pipeline at a glance

  • 📂 Mine (--phase mine): Drain clusters log lines into templates; typed slots; PMI entity graph; Granger causality gate.
  • 📈 Sequence (--phase sequence): Per-service HMM over trajectories; anomalous sequences surface as event candidates.
  • 👁️ Review (Step 5, this screen): You are here. Approve / reject / edit / merge proposals so they flow into your session.
  • 🛠️ Generate (--phase 2): RCA-shaped ontology with :CausalEvent · :hasCause · :rootCause taxonomy.
  • 🧠 Reason (--phase reason): Materialise derived :hasCause + :triggers with prov:wasDerivedFrom lineage.
  • 💬 Ask (/api/insights/rca): Insights preset "what caused this event?" answers from the materialised graph.

The four candidate kinds

📅 Event (LOG_EVENT)
A recurring message pattern Drain found in your logs. Approving it creates an OWL class.
  • Approval lands in: Step 3 · Events
  • Becomes: rdfs:subClassOf :CausalEvent
  • Auto-emits: :startedAt · :endedAt · :duration

🏷️ Entity (LOG_ENTITY)
A variable slot the template carries, typed as IRI / UUID / ENUM. An identifier worth modelling.
  • Approval lands in: Step 2 · Entities
  • PII heuristic: imsi · msisdn · email auto-flagged
  • Sensitivity: defaults to Confidential when PII

🔗 Relationship (LOG_RELATIONSHIP)
Two entities that co-occur with strong PMI but where the statistical causality test was inconclusive.
  • Approval lands in: Step 4 · Relationships
  • Direction: undirected
  • Signal: PMI ≥ 3.0

⚡ Causal edge (LOG_CAUSAL_EDGE)
A directed edge that passed all three gates: PMI, temporal ordering, and Granger or transfer-entropy.
  • Approval lands in: Step 7 · Rules
  • Becomes: SPARQL CONSTRUCT for :hasCause
  • Materialised in: --phase reason
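The PMI signal used for relationships and causal edges can be sketched from raw co-occurrence counts. This uses the standard pointwise mutual information formula; the base-2 logarithm and the counting unit (log windows) are assumptions, since the text only states the 3.0 threshold.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information: log2( p(x,y) / (p(x) * p(y)) )."""
    if count_xy == 0:
        return float("-inf")  # the pair never co-occurs
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts over 1000 log windows.
strong = pmi(count_xy=8, count_x=10, count_y=10, total=1000)
weak = pmi(count_xy=10, count_x=100, count_y=100, total=1000)

print(f"strong pair: {strong:.2f}, passes gate: {strong >= 3.0}")
print(f"weak pair:   {weak:.2f}, passes gate: {weak >= 3.0}")
```

A pair that co-occurs no more often than chance scores near zero, so it never reaches the review queue as a relationship.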

What each button does

  • Approve: marks the candidate APPROVED and pushes it into the matching session bucket. It is visible on Steps 2 / 3 / 4 / 7 right away.
  • Reject: marks the candidate REJECTED with an optional note. The decision persists across re-mines, so it sticks.
  • Edit: override the auto-suggested label before approving, e.g. turn "Event template #18" into "Heartbeat Timeout".
  • 🔀 Merge: fold the candidate into an existing session entry by name. The sample log line joins the target's evidence_samples.
  • 📊 Sort order: confidence × consequence, a decision-theoretic ranking (PRML Ch. 1). A low-confidence CRITICAL row can outrank a high-confidence INFO row.
  • 🔒 Privacy: slot values matching imsi · msisdn · email · ssn · tax default to Confidential sensitivity. Downgrade at approval time if appropriate.
  • 🔄 Drift loop: --phase drift-templates watches new logs and queues never-seen patterns here with strategy DRIFT_ON_NEW_TEMPLATE, so bootstrap and steady-state share the same review surface.
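The confidence × consequence ordering can be illustrated with a few lines of Python. The severity weights and field names below are hypothetical, chosen only to show how a low-confidence CRITICAL row ends up above a high-confidence INFO row; the wizard's real weights are not documented here.

```python
# Hypothetical consequence weights per severity level.
SEVERITY_WEIGHT = {"INFO": 1.0, "WARNING": 3.0, "CRITICAL": 10.0}


def rank(candidates):
    """Sort candidates by expected impact: confidence times consequence."""
    return sorted(
        candidates,
        key=lambda c: c["confidence"] * SEVERITY_WEIGHT[c["severity"]],
        reverse=True,
    )


queue = rank([
    {"label": "Heartbeat Timeout", "confidence": 0.30, "severity": "CRITICAL"},
    {"label": "Config Reloaded", "confidence": 0.95, "severity": "INFO"},
])

for row in queue:
    print(row["label"], row["severity"], row["confidence"])
```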
Things that happen — incidents, alerts, orders. Events become OWL subclasses of a base Event class.

Events (0)

The verbs — things that happen at a point in time.
  • What's shown: one chip per event, the live count in the heading badge.
  • Why it matters: events become owl:Class subclasses of a base Event class with prov:atTime and prov:wasAttributedTo properties — drift detection, audit logs, and PROV-O lineage all hang off them.
  • How to read: entities answer "what exists?", events answer "what happened?". A good rule: if you'd record it in a log line or audit trail, it's an event. 3–8 event types is typical.
Relationships
Describe how entities connect. Write plain sentences — e.g. "A Customer places one or more Orders."

Relationships (0)

How your entities connect — written as plain English sentences.
  • What's shown: a List tab for editing and a Graph tab for a force-directed view of your domain.
  • Why it matters: each sentence becomes an owl:ObjectProperty with proper rdfs:domain and rdfs:range. The graph view exposes structural problems (orphan entities, missing back-edges, accidental cycles) at a glance — issues you'd miss scanning a flat list.
  • How to read: always source entity → verb phrase → target entity, e.g. "A Customer places one or more Orders." Cardinality words become SHACL constraints automatically. Drag nodes to explore; the layout settles after a second.
Reading the graph.
  • Nodes = your entities and events. Blue = persistent things, purple = events. Orphan nodes (no edges) usually mean you've defined an entity but never connected it.
  • Arrows point from source entity to target entity, labelled with the verb phrase. Bidirectional concepts need two relationships, one each way.
  • Drag a node to rearrange. Scroll to zoom. ⟳ Re-layout kicks the physics simulation; Fit centres everything in view.
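The orphan check described above is easy to reproduce outside the graph view. This is a minimal sketch with a hypothetical data shape (node names plus source/verb/target triples); it is not the wizard's own implementation.

```python
def find_orphans(nodes, edges):
    """Return nodes that appear in no relationship, in their original order."""
    connected = {name for source, _verb, target in edges for name in (source, target)}
    return [name for name in nodes if name not in connected]


# Hypothetical session: three entities, one relationship.
nodes = ["Customer", "Order", "Invoice"]
edges = [("Customer", "places", "Order")]

print(find_orphans(nodes, edges))
```

Here "Invoice" is reported because it was defined but never connected.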
Competency Questions
Questions your ontology must be able to answer. These drive the governance scorecard and SPARQL test suite.

Questions (0)

The questions your ontology must be able to answer.
  • What's shown: one row per CQ, count in the heading badge. New ones go in the input below.
  • Why it matters: each CQ becomes a SPARQL test in the generated test suite. The pass rate is one of the 34 inputs to the governance scorecard, and a CQ that fails in CI blocks the merge.
  • How to read: a good CQ is specific and answerable from the data — "Which assets have been in outage for more than 4 hours?" beats "Are assets healthy?". Aim for 5–8 to start; you can always add more later.
Rules & Reasoning
Author SHACL, SPARQL CONSTRUCT, and OWL-axiom rules. Saved rules are validated, persisted to session.rules, and consumed by --phase reason at materialisation time.

Rules (0)

Rules let the reasoner derive triples your data doesn't state directly.
  • SHACL: sh:rule blocks. "When this shape matches, assert these triples." Use for data-driven assertions.
  • SPARQL: CONSTRUCT { ?x :p ?y } WHERE { ... }. Use for pattern-based inference (transitive impact, joins).
  • OWL: Turtle fragments declaring property characteristics (transitive, equivalent, etc.). Use for taxonomy-level axioms.
  • Validation: every rule is parsed and smoke-executed against an empty graph before save. Broken rules cannot be saved.
Generate Ontology
Review your session and run the toolkit pipeline to generate OWL, SHACL, JSON-LD, SKOS, and the governance report.

Pre-flight check & run

The pipeline reads your wizard session, generates the full ontology bundle (OWL, SHACL, JSON-LD, SKOS), runs the SPARQL competency-question tests, and writes the HTML governance report. Typical run on a small domain: 3–10 seconds.
  • Required: a domain name and at least one entity. Without these the pipeline runs but produces an empty graph.
  • Recommended: a base IRI and at least one competency question. CQs drive the governance scorecard, so a run without them won't score above the minimum gate.
  • Reversible: nothing in your input is changed. Outputs land under output/ (or your wizard project directory) and can be regenerated freely.

    Session Summary

    A read-only roll-up of everything you've defined so far.
    • What's shown: domain identity + counts of entities, events, relationships, and CQs.
    • Why it matters: last sanity check before generation. If a count looks wrong, jump back to that step before clicking run.
    • How to read: rough sizing — under 5 entities is probably too thin; over 20 is probably too detailed. Same shape rule for CQs.

    Pipeline phases

    Which steps of the toolkit pipeline to run. Default = all of them.
    • 1 — Foundation: introspects your schema and writes the annotated class/property inventory.
    • 2 — OWL Modeling: generates enterprise.ttl + events.ttl + provenance.ttl.
    • 3 — SHACL: generates validation shapes (enterprise-shapes.ttl, agent-gate.ttl).
    • 4 — Mapping: writes the mapping workbook + semantic loss report so you can see which DB columns didn't map cleanly.
    • 5 — JSON-LD: emits the JSON-LD context, sample payloads, SKOS vocab, and MCP tool definitions.
    • Test: runs the SPARQL CQ suite and computes the 34-criterion governance scorecard.
    • Report: renders the visual HTML run summary.
    Uncheck individual phases only if you're iterating on a specific output and want a faster loop.

    Pipeline Output

    Live tail of the toolkit pipeline as it runs.
    • What's shown: the last ~50 lines of stdout/stderr from toolkit.py, refreshed every second while running.
    • Why it matters: this is where you see ✓ PASS / ✗ FAIL on each phase, the SPARQL CQ results, and the final governance score.
    • How to read: green ticks = phase succeeded. Red FAIL / ERROR = look at the line above for the cause (commonly a missing entity or a malformed relationship). The badge top-right turns green on success.
    Ontology Library
    All ontologies you've saved — load any one back into the wizard, or open it in the read-only viewer.

    Saved Ontologies

    Settings
    Switch starting points · pick which domain templates appear on Step 1 · configure LLM providers.

    Switch starting point

    Re-open the welcome picker to load a different industry template. The current session (entities, events, relationships, CQs) will be replaced by the template you pick — save to Library first if you want to keep it.

    Domains to show on Step 1

    Tick the domains you want on the Step 1 · Domain Identity template grid. By default nothing is ticked — Step 1 stays clean until you pick. Unticked domains stay loadable any time via the Switch-starting-point picker above and via GET /api/template/<name>. (This does not affect the marketing landing page at /, which is brand chrome only.)

    LLM Providers

    Insights uses one of these providers per call. Keys are read from environment variables (or the provider's config file) — never persisted by the wizard. The residency hint tells you whether a request leaves your tenancy. Materialised triples from Phase B are sent only when the toggle below is on, and only if you confirm after seeing the hint for the active provider.

    Ask Insights

    Question grounded in the asserted ontology. Toggle Include materialised to also send Phase B inferences (transitive impact, sh:rule outputs, etc.). The response is checked for hallucinated IRIs against the ontology vocabulary.
    Ontology Evolution Review
    Workstream 2 — Review ontology-evolution proposals surfaced from production observations. Approve, reject, or defer each candidate. Approved proposals flow through the reasoner + SPARQL CQ gate before auto-bumping the ontology version.

    Detection run

    Trigger the anomaly monitor to scan production observations and surface candidate ontology changes.
    • Minimum evidence: how many independent observations a candidate needs before it becomes a proposal. Higher = fewer, better-supported proposals; lower = more noise.
    • Strategy: which detector to run. SHACL violation accumulation spots constraints that fire repeatedly; cardinality breach flags real-world data exceeding declared min/maxCount; class co-occurrence finds entities that always appear together; NLP candidate promotion promotes terms from log corpora.
    • How to use: run with the default settings first, refresh the proposal list, then tighten min evidence if you're drowning in low-signal candidates.

    Pending proposals

    Every candidate change the monitor surfaced, scored against five dimensions and bucketed by urgency.
    • What's shown: proposal ID, type (new class / new property / cardinality change), composite score (0–1), urgency band, evidence count, and a "View detail" link per row.
    • Band filter: Review now (≥0.80) — high-confidence, look at this week. Weekly batch (0.50–0.80) — defer to a bulk review. Candidate (<0.50) — let the monitor accumulate more evidence first.
    • How to read: sort by score descending. The score weighs evidence volume, schema fit, query impact, novelty, and downstream blast radius — five inputs visible in the detail card below.
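The band thresholds above can be written as a small function. The function name is hypothetical, and the boundary handling is an assumption: the text lists 0.80 under both "Review now" and "Weekly batch", so this sketch sends exactly 0.80 to "Review now" and exactly 0.50 to "Weekly batch".

```python
def urgency_band(score: float) -> str:
    """Map a composite score in [0, 1] to its review band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("composite score must be between 0 and 1")
    if score >= 0.80:
        return "Review now"
    if score >= 0.50:
        return "Weekly batch"
    return "Candidate"


for s in (0.91, 0.80, 0.62, 0.49):
    print(s, urgency_band(s))
```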
    Run the monitor or click refresh to load proposals.
    Regulatory Compliance Dashboard
    Workstream 4 — Assemble signed, machine-verifiable evidence bundles for named regulatory frameworks. Monitor per-regulation coverage, traffic-light status per requirement, and gap analysis.

    Regulation coverage

    Per-regulation traffic-light view of how well your ontology + observations cover each regulatory requirement, scoped to your domain so you don't see frameworks that don't apply.
    • What's shown: one row per regulation that applies to your industry (set on Step 1 — Domain Identity), plus any cross-cutting frameworks (e.g. EU AI Act). Columns: requirement count, evidence count, coverage %, traffic light.
    • How filtering works: a healthcare project sees HIPAA + EU AI Act; a finance project sees Basel IV + EU AI Act; a telecom project sees Ofcom + EU AI Act. Custom regulations declare their industries via applies_to in their JSON.
    • Traffic lights: 🟢 ≥80% covered · 🟡 50–80% partial · 🔴 <50% — this regulation won't pass an audit yet.
    • How to use: click Refresh after every pipeline run. Tick Show all to see every regulation regardless of domain (useful when reviewing a multi-jurisdiction deployment).
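A minimal sketch of the traffic-light mapping follows. It assumes coverage is the share of requirements that have at least one piece of evidence, which the text does not spell out, and it treats exactly 80% as green and exactly 50% as amber. Both function names are hypothetical.

```python
def coverage_percent(requirements_with_evidence: int, total_requirements: int) -> float:
    """Share of requirements backed by evidence, as a percentage (assumed definition)."""
    if total_requirements <= 0:
        raise ValueError("a regulation needs at least one requirement")
    return 100.0 * requirements_with_evidence / total_requirements


def traffic_light(percent: float) -> str:
    """Map coverage to the dashboard colours described above."""
    if percent >= 80:
        return "green"
    if percent >= 50:
        return "amber"
    return "red"


print(traffic_light(coverage_percent(8, 10)))
print(traffic_light(coverage_percent(4, 10)))
```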
    Click refresh to load regulation coverage.

    Assemble new evidence bundle

    Build a signed, machine-verifiable ZIP containing every artifact relevant to the chosen regulation.
    • Regulation: the framework you're building evidence for. Each loaded regulation lists its requirements, controls, and the toolkit artifacts that satisfy them.
    • Decision IRI (optional): point at a specific ObservationRecord to scope the bundle to one decision (e.g. one credit-risk score, one care-plan recommendation). Leave blank for a full coverage bundle.
    • How to read: the bundle includes the OWL ontology, SHACL shapes, evidence records, an Ed25519 signature, and a SHA-256 manifest. Auditors verify with one command — see "Exported bundles" below.

    Exported bundles

    History of every signed bundle you've generated.
    • What's shown: bundle ID, regulation, assembly timestamp, file size, signature status, and a download link.
    • Why it matters: bundles are immutable evidence — auditors verify them with the public key in compliance/keys/ without needing access to your live system.
    • Verify offline: python3 -m compliance.bundle verify path/to/bundle.zip — checks SHA-256 manifest + Ed25519 signature.
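The SHA-256 half of that check can be sketched with the standard library alone. The manifest file name and its layout (a JSON object mapping file names to hex digests) are assumptions made for illustration; the real bundle format may differ, and the Ed25519 signature check is left out because it needs a third-party library.

```python
import hashlib
import json
import zipfile


def verify_manifest(bundle, manifest_name="manifest.json"):
    """Return the names of files that are missing or whose digest differs."""
    problems = []
    with zipfile.ZipFile(bundle) as zf:
        # Assumed layout: {"file name": "sha256 hex digest", ...}
        manifest = json.loads(zf.read(manifest_name))
        for name, expected in manifest.items():
            try:
                actual = hashlib.sha256(zf.read(name)).hexdigest()
            except KeyError:
                problems.append(name)  # listed in the manifest but not in the ZIP
                continue
            if actual != expected:
                problems.append(name)
    return problems
```

An empty list means every listed file matches its recorded digest. `bundle` can be a path or an open binary file object.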
    Run an assembly to generate a bundle.
    Ontology-Bounded Vector Retrieval
    Workstream 5 — The OWL class hierarchy becomes a hard semantic filter on vector similarity search. Inspect embedding indexes, query with class-scoped semantics, and compare retrieval strategies.

    Embedding indexes

    The vector indexes the toolkit has built per flavor. One row per index.
    • What's shown: flavor name, vector store backend (memory / Qdrant / Chroma / Weaviate / pgvector), embedding dimension, record count, and last-built timestamp.
    • Why it matters: a query can only return results from an existing index. If you don't see one for your flavor, run python3 toolkit.py --phase embed --flavor <name> first.
    • How to read: high record counts on indexes you don't query are a cost signal — consider scoping flavors more tightly.
    Click refresh to load indexes.

    Hybrid retrieval — ontology-bounded query

    Run a single query against the index, with the OWL class hierarchy as a hard pre-filter.
    • Flavor: which agent's view to query against (matches a file under runtime/flavors/).
    • Class expression: the OWL class filter — only embeddings tagged with this class (or a subclass) are eligible. Multi-class with | (OR), e.g. tmf:NetworkFunction | tmf:Alarm.
    • k: top-k vector neighbours to return after filtering.
    • Question: the natural-language query that gets embedded and compared.
    • How to read: click Run for one strategy, or Compare all strategies to see UNFILTERED_VECTOR vs ONTOLOGY_BOUNDED vs PURE_SPARQL on the same question side-by-side.
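The idea of a hard class pre-filter followed by top-k similarity can be sketched in plain Python. Everything here is illustrative: the record layout, the single-parent subclass map, the function names, and the class tmf:CriticalAlarm are assumptions, and a real deployment would delegate the vector search to the configured store.

```python
import math


def subclasses(cls, parent_of):
    """Return cls plus all transitive subclasses, given a child -> parent map."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def bounded_search(query_vec, records, class_expr, parent_of, k):
    """Filter by class expression ('A | B', OR semantics), then rank by cosine."""
    allowed = set()
    for cls in (part.strip() for part in class_expr.split("|")):
        if cls:
            allowed |= subclasses(cls, parent_of)
    # An empty expression means the whole flavor is eligible.
    eligible = [r for r in records if not allowed or r["cls"] in allowed]
    eligible.sort(key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return eligible[:k]


parent_of = {"tmf:CriticalAlarm": "tmf:Alarm"}  # hypothetical hierarchy
records = [
    {"id": "a1", "cls": "tmf:Alarm", "vec": [0.9, 0.1]},
    {"id": "a2", "cls": "tmf:CriticalAlarm", "vec": [0.7, 0.3]},
    {"id": "inv", "cls": "ex:Invoice", "vec": [1.0, 0.0]},
]
hits = bounded_search([1.0, 0.0], records, "tmf:Alarm", parent_of, k=2)
print([h["id"] for h in hits])
```

The invoice record is the closest vector to the query, yet it never appears in the result because the class filter runs before ranking.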

    Examples for your domain

    Click any example to prefill the form above.
    Why these examples matter. A "flavor" is a named, scoped view of your ontology — each one defines which OWL classes an agent is allowed to see and which sensitivity tier it can read. The class expression then narrows that flavor further at query time. The combination is what makes hybrid retrieval safe: a billing agent can't accidentally retrieve clinical observations even if the embedding looks similar.
    • Class expression — pipe-separated OWL classes (OR). Sub-classes are included automatically. Empty = whole flavor.
    • Question — the natural-language query that gets embedded and ranked.
    • Why this works — each example is paired with a one-line rationale so you understand what the OWL filter is actually buying you over a raw vector search.

    Retrieval benchmark

    Head-to-head comparison of the three retrieval strategies on a fixed question set.
    • What's shown: precision@k, MRR, and latency p50/p95 for UNFILTERED_VECTOR (raw kNN), ONTOLOGY_BOUNDED (with the OWL filter), and PURE_SPARQL (no embeddings).
    • Why it matters: proves the OWL filter is actually pulling its weight on your data — typical wins are higher precision and similar latency for ONTOLOGY_BOUNDED vs UNFILTERED_VECTOR.
    • How to read: green wins beat the baseline; red losses indicate the class filter is too tight or the index is stale. Run launches a fresh suite (1–3 minutes); Load latest shows the previous run.
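For reference, precision@k and MRR follow their standard definitions, sketched below. The function names and the sample data are illustrative; how the benchmark actually stores its question set and relevance labels is not described in the text.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in retrieved[:k] if item in relevant) / k


def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant item per question (0 if none)."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, item in enumerate(retrieved, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


# Hypothetical results for two questions: (ranked ids, set of relevant ids).
runs = [
    (["x", "a", "b"], {"a"}),  # first relevant hit at rank 2
    (["a", "y", "z"], {"a"}),  # first relevant hit at rank 1
]
print(precision_at_k(["a", "b", "c"], {"a", "c"}, k=3))
print(mean_reciprocal_rank(runs))
```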
    Click run to compare UNFILTERED_VECTOR vs ONTOLOGY_BOUNDED vs PURE_SPARQL.