Domain Identity
ⓘ How it works
Define the scope of your ontology. All generated IRIs and documentation will use these values.
Load a starter template
Pre-built domains that fill in entities, events, relationships, and competency questions for you.
- What's shown: 10 industry templates — five built-in (telecom, healthcare, finance, manufacturing, retail) and five YAML-defined (energy, logistics, government, insurance, pharmaceuticals).
- Why it matters: a starter saves 30+ minutes of typing and gives you a known-good shape to edit. You can change anything afterwards.
- How to use: click any tile to load it. The whole session is replaced — fields below, entities, events, relationships, and CQs all update at once.
Import an existing schema
Drop a .json session file (wizard or onboard.py format) or a .json schema. SQL DDL import lands in a follow-up step (#16).
Connect directly to your operational database. Tables, columns, and foreign keys are read into the same review modal as the file path — no upload required.
Pick an existing profile to pre-fill the form, or leave the selector on "— New connection —" to enter fresh credentials.
Select a vendor to enable these actions.
Credential handling.
Passwords are stored locally in db/ontologies.db with base64 obfuscation: a defence against casual inspection, not encryption. Profiles never leave your machine. For production credentials, prefer secrets provided via your organisation's key store at runtime.
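To see why base64 counts as obfuscation rather than encryption, the sketch below round-trips a password with Python's standard base64 module. The helper names and the sample credential are illustrative assumptions, not the wizard's actual storage code.

```python
import base64

def obfuscate(password: str) -> str:
    """Encode a password as base64 text. No key is involved."""
    return base64.b64encode(password.encode("utf-8")).decode("ascii")

def reveal(stored: str) -> str:
    """Reverse the encoding. Anyone holding the stored value can do this."""
    return base64.b64decode(stored.encode("ascii")).decode("utf-8")

stored = obfuscate("example-db-password")  # made-up credential
recovered = reveal(stored)
print(recovered == "example-db-password")
```

Because decoding needs no secret, anyone who can read the database file can recover the password, which is why a key store is the better home for production credentials.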
Domain details
The identity of your ontology — used in every generated IRI, SHACL shape, and report.
- Domain Name: human-readable name shown in the report header (e.g. "Smart Building Operations").
- Description: one or two sentences. Goes into the auto-generated scope charter and is read by the LLM if you use --llm suggestions.
- Base IRI: the namespace every OWL class lives under. Pick once and stick with it: changing it later breaks downstream graph stores.
- Industry: selects sensible defaults for sensitivity tiers and standards alignment (e.g. TM Forum SID for telecom).
Entities
ⓘ How it works
The main things in your domain — assets, customers, resources, orders. Each becomes an OWL class.
Entities (0)
The persistent nouns of your domain — the things your data is about.
- What's shown: one chip per entity, with its sensitivity tier and a delete button. The badge in the heading is the live count.
- Why it matters: every entity becomes an owl:Class in the generated ontology, a sh:NodeShape in the SHACL file, and a @type in the JSON-LD context.
- How to read: aim for 5–10 entities. If you have more than 15, you're probably modelling implementation details rather than domain concepts. Click a chip to edit; drag to reorder.
Convert a folder of raw logs into ontology candidates an engineer reviews one at a time. Approved candidates flow into the regular Wizard steps (Entities · Events · Relationships · Rules) and end up as classes, properties, and SPARQL rules in your generated ontology.
Pipeline at a glance
📂
Mine
Drain clusters log lines into templates; typed slots; PMI entity graph; Granger causality gate
--phase mine
→
📈
Sequence
Per-service HMM over trajectories; anomalous sequences surface as event candidates
--phase sequence
→
👁️
Review
You are here. Approve / reject / edit / merge proposals so they flow into your session
Step 5 (this screen)
🛠️
Generate
RCA-shaped ontology with :CausalEvent · :hasCause · :rootCause taxonomy
--phase 2
🧠
Reason
Materialise derived :hasCause + :triggers with prov:wasDerivedFrom lineage
--phase reason
💬
Ask
Insights preset "what caused this event?" answers from the materialised graph
/api/insights/rca
The four candidate kinds
📅 Event
LOG_EVENT
A recurring message pattern Drain found in your logs. Approving it creates an OWL class.
Approval lands in: Step 3 — Events
Becomes: rdfs:subClassOf :CausalEvent
Auto-emits: :startedAt · :endedAt · :duration
🏷️ Entity
LOG_ENTITY
A variable slot the template carries — typed as IRI / UUID / ENUM. An identifier worth modelling.
Approval lands in: Step 2 — Entities
PII heuristic: imsi · msisdn · email auto-flagged
Sensitivity: defaults to Confidential when PII
🔗 Relationship
LOG_RELATIONSHIP
Two entities that co-occur with strong PMI but where the statistical causality test was inconclusive.
Approval lands in: Step 4 — Relationships
Direction: undirected
Signal: PMI ≥ 3.0
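As a rough illustration of the PMI ≥ 3.0 signal, the sketch below scores entity pairs by pointwise mutual information over co-occurrence windows. The function name, the window representation (one set of entity names per log line), and the base-2 logarithm are assumptions; the miner's actual windowing and log base are not stated here.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_pairs(windows, threshold=3.0):
    """Return {(a, b): pmi} for pairs whose PMI meets the threshold.

    windows: list of sets of entity names observed together.
    PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), estimated over windows.
    """
    n = len(windows)
    single, joint = Counter(), Counter()
    for window in windows:
        single.update(window)
        joint.update(combinations(sorted(window), 2))
    strong = {}
    for (a, b), count in joint.items():
        pmi = math.log2((count / n) / ((single[a] / n) * (single[b] / n)))
        if pmi >= threshold:
            strong[(a, b)] = pmi
    return strong

# Toy data: cell_id and imsi appear only together; service appears everywhere.
windows = [{"cell_id", "imsi", "service"}] * 2 + [{"service"}] * 14
strong = pmi_pairs(windows)
```

In this toy data only the (cell_id, imsi) pair clears the threshold. A pair involving an entity that appears in every window scores a PMI of zero, because its presence carries no information.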
⚡ Causal edge
LOG_CAUSAL_EDGE
A directed edge that passed all three gates: PMI, temporal ordering, and Granger or transfer-entropy.
Approval lands in: Step 7 — Rules
Becomes: SPARQL CONSTRUCT for :hasCause
Materialised in: --phase reason
What each button does
✓
Approve
Marks APPROVED and pushes into the matching session bucket. Visible on Steps 2 / 3 / 4 / 7 right away.
✕
Reject
Marks REJECTED with an optional note. Persists across re-mines, so your decision sticks.
✎
Edit
Override the auto-suggested label before approving. Turn "Event template #18" into "Heartbeat Timeout".
🔀
Merge
Fold into an existing session entry by name. The sample log line joins the target's evidence_samples.
📊 Sort order
Confidence × consequence. PRML Ch. 1 decision-theoretic ranking — a low-confidence CRITICAL row outranks a high-confidence INFO row.
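A minimal sketch of the confidence × consequence ordering follows. The severity weights and field names are hypothetical, chosen only to show how a low-confidence CRITICAL row can outrank a high-confidence INFO row; the wizard's real scale is not documented here.

```python
# Hypothetical consequence weights; the real scale is an assumption.
CONSEQUENCE_WEIGHT = {"INFO": 1.0, "WARNING": 3.0, "CRITICAL": 10.0}

def priority(candidate):
    """Expected impact: confidence multiplied by the severity weight."""
    return candidate["confidence"] * CONSEQUENCE_WEIGHT[candidate["severity"]]

def rank(candidates):
    """Return candidates ordered from highest to lowest priority."""
    return sorted(candidates, key=priority, reverse=True)

rows = [
    {"label": "Heartbeat Timeout", "severity": "INFO", "confidence": 0.95},
    {"label": "Core Link Down", "severity": "CRITICAL", "confidence": 0.30},
]
ranked = rank(rows)
print([row["label"] for row in ranked])
```

With these weights the CRITICAL row scores 3.0 against 0.95 for the INFO row, so it is reviewed first despite its lower confidence.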
🔒 Privacy
Slot values matching imsi · msisdn · email · ssn · tax default to Confidential sensitivity. Downgrade at approval time if appropriate.
🔄 Drift loop
--phase drift-templates watches new logs and queues never-seen patterns here with strategy DRIFT_ON_NEW_TEMPLATE — same review surface for bootstrap + steady-state.
Events
ⓘ How it works
Things that happen — incidents, alerts, orders. Events become OWL subclasses of a base Event class.
Events (0)
The verbs — things that happen at a point in time.
- What's shown: one chip per event, the live count in the heading badge.
- Why it matters: events become owl:Class subclasses of a base Event class with prov:atTime and prov:wasAttributedTo properties. Drift detection, audit logs, and PROV-O lineage all hang off them.
- How to read: entities answer "what exists?", events answer "what happened?". A good rule: if you'd record it in a log line or audit trail, it's an event. 3–8 event types is typical.
Relationships
ⓘ How it works
Describe how entities connect. Write plain sentences — e.g. "A Customer places one or more Orders."
Relationships (0)
How your entities connect — written as plain English sentences.
- What's shown: a List tab for editing and a Graph tab for a force-directed view of your domain.
- Why it matters: each sentence becomes an owl:ObjectProperty with proper rdfs:domain and rdfs:range. The graph view exposes structural problems (orphan entities, missing back-edges, accidental cycles) at a glance, issues you'd miss scanning a flat list.
- How to read: always source entity → verb phrase → target entity, e.g. "A Customer places one or more Orders." Cardinality words become SHACL constraints automatically. Drag nodes to explore; the layout settles after a second.
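The source → verb phrase → target shape can be sketched as a small parser. This is an illustrative approximation, not the wizard's implementation: the function name, the cardinality phrase table, and the naive plural handling are all assumptions.

```python
import re

# Hypothetical phrase-to-constraint table; the real vocabulary may differ.
CARDINALITY = {
    "one or more": {"minCount": 1},
    "zero or more": {"minCount": 0},
    "exactly one": {"minCount": 1, "maxCount": 1},
    "at most one": {"maxCount": 1},
}

def parse_relationship(sentence, entities):
    """Split 'A <Source> <verb phrase> [cardinality] <Target>s.' into parts.

    Returns None when the sentence does not fit that shape.
    """
    names = "|".join(re.escape(e) for e in sorted(entities, key=len, reverse=True))
    cards = "|".join(CARDINALITY)
    pattern = re.compile(
        rf"^(?:An?|The|Each|Every)\s+(?P<source>{names})\s+(?P<verb>.+?)\s+"
        rf"(?:(?P<card>{cards})\s+)?(?:an?\s+|the\s+)?"
        rf"(?P<target>{names})(?:es|s)?\.?$"
    )
    match = pattern.match(sentence.strip())
    if match is None:
        return None
    return {
        "source": match.group("source"),   # would map to rdfs:domain
        "verb": match.group("verb"),       # would name the object property
        "target": match.group("target"),   # would map to rdfs:range
        "constraints": dict(CARDINALITY.get(match.group("card"), {})),
    }

rel = parse_relationship("A Customer places one or more Orders.", ["Customer", "Order"])
```

Matching against the known entity names, rather than guessing singular forms, keeps the sketch from mangling names that already end in "s".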
Reading the graph.
- Nodes = your entities and events. Blue = persistent things, purple = events. Orphan nodes (no edges) usually mean you've defined an entity but never connected it.
- Arrows point from source entity to target entity, labelled with the verb phrase. Bidirectional concepts need two relationships, one each way.
- Drag a node to rearrange. Scroll to zoom. ⟳ Re-layout kicks the physics simulation; Fit centres everything in view.
Competency Questions
ⓘ How it works
Questions your ontology must be able to answer. These drive the governance scorecard and SPARQL test suite.
Questions (0)
The questions your ontology must be able to answer.
- What's shown: one row per CQ, count in the heading badge. New ones go in the input below.
- Why it matters: each CQ becomes a SPARQL test in the generated test suite. Their pass rate is one of the 34 inputs to the governance scorecard, and a failed CQ on CI blocks merge.
- How to read: a good CQ is specific and answerable from the data — "Which assets have been in outage for more than 4 hours?" beats "Are assets healthy?". Aim for 5–8 to start; you can always add more later.
Rules & Reasoning
ⓘ How it works
Author SHACL, SPARQL CONSTRUCT, and OWL-axiom rules. Saved rules are validated, persisted to session.rules, and consumed by --phase reason at materialisation time.
Rules (0)
Rules let the reasoner derive triples your data doesn't state directly.
- SHACL: sh:rule blocks. "When this shape matches, assert these triples." Use for data-driven assertions.
- SPARQL: CONSTRUCT { ?x :p ?y } WHERE { ... }. Use for pattern-based inference (transitive impact, joins).
- OWL: Turtle fragments declaring property characteristics (transitive, equivalent, etc.). Use for taxonomy-level axioms.
- Validation: every rule is parsed and smoke-executed against an empty graph before save. Broken rules cannot be saved.
Generate Ontology
ⓘ How it works
Review your session and run the toolkit pipeline to generate OWL, SHACL, JSON-LD, SKOS, and the governance report.
Pre-flight check & run
The pipeline reads your wizard session, generates the full ontology bundle (OWL, SHACL, JSON-LD, SKOS), runs the SPARQL competency-question tests, and writes the HTML governance report. Typical run on a small domain: 3–10 seconds.
- Required: a domain name and at least one entity. Without these the pipeline runs but produces an empty graph.
- Recommended: a base IRI and at least one competency question. CQs drive the governance scorecard, so a run without them won't score above the minimum gate.
- Reversible: nothing in your input is changed. Outputs land under output/ (or your wizard project directory) and can be regenerated freely.
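The required and recommended checks above could be expressed as a pre-flight function like the following. The session keys (domain_name, entities, base_iri, competency_questions) are hypothetical names for illustration; the real session file may use different fields.

```python
def preflight(session: dict) -> dict:
    """Split missing inputs into blocking errors and non-blocking warnings."""
    errors, warnings = [], []
    if not (session.get("domain_name") or "").strip():
        errors.append("domain name is required")
    if not session.get("entities"):
        errors.append("at least one entity is required")
    if not (session.get("base_iri") or "").strip():
        warnings.append("base IRI is recommended")
    if not session.get("competency_questions"):
        warnings.append("at least one competency question is recommended")
    return {"ok": not errors, "errors": errors, "warnings": warnings}

result = preflight({
    "domain_name": "Smart Building Operations",
    "entities": ["Sensor"],
    "base_iri": "",
    "competency_questions": [],
})
```

Here the run would proceed (no errors) but with two warnings, matching the distinction between required and recommended inputs.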
Session Summary
A read-only roll-up of everything you've defined so far.
- What's shown: domain identity + counts of entities, events, relationships, and CQs.
- Why it matters: last sanity check before generation. If a count looks wrong, jump back to that step before clicking run.
- How to read: rough sizing — under 5 entities is probably too thin; over 20 is probably too detailed. Same shape rule for CQs.
Pipeline phases
Which steps of the toolkit pipeline to run. Default = all of them.
- 1 — Foundation: introspects your schema and writes the annotated class/property inventory.
- 2 — OWL Modeling: generates
enterprise.ttl+events.ttl+provenance.ttl. - 3 — SHACL: generates validation shapes (
enterprise-shapes.ttl,agent-gate.ttl). - 4 — Mapping: writes the mapping workbook + semantic loss report so you can see which DB columns didn't map cleanly.
- 5 — JSON-LD: emits the JSON-LD context, sample payloads, SKOS vocab, and MCP tool definitions.
- Test: runs the SPARQL CQ suite and computes the 34-criterion governance scorecard.
- Report: renders the visual HTML run summary.
Pipeline Output
Live tail of the toolkit pipeline as it runs.
- What's shown: the last ~50 lines of stdout/stderr from toolkit.py, refreshed every second while running.
- Why it matters: this is where you see ✓ PASS / ✗ FAIL on each phase, the SPARQL CQ results, and the final governance score.
- How to read: green ticks = phase succeeded. Red FAIL / ERROR = look at the line above for the cause (commonly a missing entity or a malformed relationship). The badge top-right turns green on success.
Waiting to run pipeline...
Ontology Library
All ontologies you've saved — load any one back into the wizard, or open it in the read-only viewer.
Saved Ontologies
Settings
Switch starting points · pick which domain templates appear on Step 1 · configure LLM providers.
Switch starting point
Re-open the welcome picker to load a different industry template. The current session (entities, events, relationships, CQs) will be replaced by the template you pick, so save to Library first if you want to keep it.
Domains to show on Step 1
Tick the domains you want on the Step 1 · Domain Identity template grid. By default nothing is ticked, so Step 1 stays clean until you pick. Unticked domains stay loadable any time via the Switch-starting-point picker above and via GET /api/template/<name>. (This does not affect the marketing landing page at /, which is brand chrome only.)
LLM Providers
Insights uses one of these providers per call. Keys are read from environment variables (or the provider's config file) and are never persisted by the wizard. The residency hint tells you whether a request leaves your tenancy. Materialised triples from Phase B are sent only when the toggle below is on, and only if you confirm after seeing the hint for the active provider.
Ask Insights
Ask a question grounded in the asserted ontology. Toggle Include materialised to also send Phase B inferences (transitive impact, sh:rule outputs, etc.). The response is checked for hallucinated IRIs against the ontology vocabulary.
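One simple way to check a response for hallucinated IRIs is to extract every IRI it cites and subtract the set the ontology defines. The sketch below assumes IRIs appear in angle brackets, as in Turtle and SPARQL; the function name and the example.org vocabulary are made up for illustration and do not describe how Insights actually performs the check.

```python
import re

# Assumes IRIs are written in angle brackets, as in Turtle and SPARQL.
IRI_PATTERN = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^<>\s]+)>")

def unknown_iris(response_text, vocabulary):
    """Return IRIs cited in the response that the ontology does not define."""
    cited = set(IRI_PATTERN.findall(response_text))
    return sorted(cited - set(vocabulary))

vocabulary = {
    "http://example.org/onto#Customer",
    "http://example.org/onto#places",
}
answer = (
    "Orders are linked via <http://example.org/onto#places> "
    "from <http://example.org/onto#Customer> "
    "to <http://example.org/onto#Invoice>."
)
suspect = unknown_iris(answer, vocabulary)
```

Any IRI left in suspect is one the model produced without support from the vocabulary, so the answer can be flagged or rejected.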
Ontology Evolution Review
ⓘ How it works
Workstream 2 — Review ontology-evolution proposals surfaced from production observations. Approve, reject, or defer each candidate. Approved proposals flow through the reasoner + SPARQL CQ gate before auto-bumping the ontology version.
Detection run
Trigger the anomaly monitor to scan production observations and surface candidate ontology changes.
- Minimum evidence: how many independent observations a candidate needs before it becomes a proposal. Higher = fewer, better-supported proposals; lower = more noise.
- Strategy: which detector to run. SHACL violation accumulation spots constraints that fire repeatedly; cardinality breach flags real-world data exceeding declared min/maxCount; class co-occurrence finds entities that always appear together; NLP candidate promotion promotes terms from log corpora.
- How to use: run with the default settings first, refresh the proposal list, then tighten min evidence if you're drowning in low-signal candidates.
Pending proposals
Every candidate change the monitor surfaced, scored against five dimensions and bucketed by urgency.
- What's shown: proposal ID, type (new class / new property / cardinality change), composite score (0–1), urgency band, evidence count, and a "View detail" link per row.
- Band filter: Review now (≥0.80) — high-confidence, look at this week. Weekly batch (0.50–0.80) — defer to a bulk review. Candidate (<0.50) — let the monitor accumulate more evidence first.
- How to read: sort by score descending. The score weighs evidence volume, schema fit, query impact, novelty, and downstream blast radius — five inputs visible in the detail card below.
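The band thresholds above translate directly into a small bucketing function. The function name is an assumption; the boundaries follow the text, with 0.80 falling in Review now and 0.50 in Weekly batch.

```python
def urgency_band(score: float) -> str:
    """Map a composite score in [0, 1] to its review band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("composite score must be between 0 and 1")
    if score >= 0.80:
        return "Review now"
    if score >= 0.50:
        return "Weekly batch"
    return "Candidate"

bands = [urgency_band(s) for s in (0.92, 0.80, 0.65, 0.50, 0.10)]
```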
Run the monitor or click refresh to load proposals.
Regulatory Compliance Dashboard
ⓘ How it works
Workstream 4 — Assemble signed, machine-verifiable evidence bundles for named regulatory frameworks. Monitor per-regulation coverage, traffic-light status per requirement, and gap analysis.
Regulation coverage
Per-regulation traffic-light view of how well your ontology + observations cover each regulatory requirement, scoped to your domain so you don't see frameworks that don't apply.
- What's shown: one row per regulation that applies to your industry (set on Step 1 — Domain Identity), plus any cross-cutting frameworks (e.g. EU AI Act). Columns: requirement count, evidence count, coverage %, traffic light.
- How filtering works: a healthcare project sees HIPAA + EU AI Act; a finance project sees Basel IV + EU AI Act; a telecom project sees Ofcom + EU AI Act. Custom regulations declare their industries via applies_to in their JSON.
- Traffic lights: 🟢 ≥80% covered · 🟡 50–80% partial · 🔴 <50%, meaning this regulation won't pass an audit yet.
- How to use: click Refresh after every pipeline run. Tick Show all to see every regulation regardless of domain (useful when reviewing a multi-jurisdiction deployment).
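The filtering and traffic-light rules can be sketched as below. Only the applies_to field name comes from the text; the "*" marker for cross-cutting frameworks, the coverage field, and the coverage figures are illustrative assumptions.

```python
def traffic_light(coverage_pct: float) -> str:
    """Green at 80% and above, amber from 50%, red below that."""
    if coverage_pct >= 80:
        return "green"
    if coverage_pct >= 50:
        return "amber"
    return "red"

def visible_regulations(regulations, industry, show_all=False):
    """Keep regulations for this industry plus cross-cutting ones ('*')."""
    return [
        reg for reg in regulations
        if show_all or industry in reg["applies_to"] or "*" in reg["applies_to"]
    ]

# Coverage percentages are made up for the example.
regulations = [
    {"name": "HIPAA", "applies_to": ["healthcare"], "coverage": 85.0},
    {"name": "Basel IV", "applies_to": ["finance"], "coverage": 40.0},
    {"name": "Ofcom", "applies_to": ["telecom"], "coverage": 62.5},
    {"name": "EU AI Act", "applies_to": ["*"], "coverage": 50.0},
]
rows = [
    (reg["name"], traffic_light(reg["coverage"]))
    for reg in visible_regulations(regulations, "healthcare")
]
```

Passing show_all=True mirrors the Show all tick box and returns every regulation regardless of industry.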
Click refresh to load regulation coverage.
Assemble new evidence bundle
Build a signed, machine-verifiable ZIP containing every artifact relevant to the chosen regulation.
- Regulation: the framework you're building evidence for. Each loaded regulation lists its requirements, controls, and the toolkit artifacts that satisfy them.
- Decision IRI (optional): point at a specific ObservationRecord to scope the bundle to one decision (e.g. one credit-risk score, one care-plan recommendation). Leave blank for a full coverage bundle.
- How to read: the bundle includes the OWL ontology, SHACL shapes, evidence records, an Ed25519 signature, and a SHA-256 manifest. Auditors verify with one command; see "Exported bundles" below.
Exported bundles
History of every signed bundle you've generated.
- What's shown: bundle ID, regulation, assembly timestamp, file size, signature status, and a download link.
- Why it matters: bundles are immutable evidence. Auditors verify them with the public key in compliance/keys/ without needing access to your live system.
- Verify offline: python3 -m compliance.bundle verify path/to/bundle.zip (checks SHA-256 manifest + Ed25519 signature).
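To give a feel for what a SHA-256 manifest check involves, here is a standard-library sketch. The manifest layout (a manifest.json mapping file names to hex digests) and both helper names are assumptions, and the Ed25519 signature step is not covered. For real bundles, use the compliance.bundle verify command.

```python
import hashlib
import io
import json
import zipfile

def build_demo_bundle(files, manifest_source=None):
    """Build an in-memory ZIP whose manifest.json lists SHA-256 digests."""
    digests_from = files if manifest_source is None else manifest_source
    manifest = {n: hashlib.sha256(d).hexdigest() for n, d in digests_from.items()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        for name, data in files.items():
            bundle.writestr(name, data)
        bundle.writestr("manifest.json", json.dumps(manifest))
    return buffer.getvalue()

def verify_manifest(zip_bytes, manifest_name="manifest.json"):
    """Return the names of files whose digest differs from the manifest."""
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bundle:
        manifest = json.loads(bundle.read(manifest_name))
        for name, expected in manifest.items():
            actual = hashlib.sha256(bundle.read(name)).hexdigest()
            if actual != expected:
                mismatches.append(name)
    return mismatches

good = build_demo_bundle({"enterprise.ttl": b"# ontology placeholder"})
tampered = build_demo_bundle(
    {"enterprise.ttl": b"# edited after signing"},
    manifest_source={"enterprise.ttl": b"# ontology placeholder"},
)
```

An empty result from verify_manifest means every listed file still matches its recorded digest. The tampered bundle reports enterprise.ttl as a mismatch.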
Run an assembly to generate a bundle.
Ontology-Bounded Vector Retrieval
ⓘ How it works
Workstream 5 — The OWL class hierarchy becomes a hard semantic filter on vector similarity search. Inspect embedding indexes, query with class-scoped semantics, and compare retrieval strategies.
Embedding indexes
The vector indexes the toolkit has built per flavor. One row per index.
- What's shown: flavor name, vector store backend (memory / Qdrant / Chroma / Weaviate / pgvector), embedding dimension, record count, and last-built timestamp.
- Why it matters: a query can only return results from an existing index. If you don't see one for your flavor, run python3 toolkit.py --phase embed --flavor <name> first.
- How to read: high record counts on indexes you don't query are a cost signal; consider scoping flavors more tightly.
Click refresh to load indexes.
Hybrid retrieval — ontology-bounded query
Run a single query against the index, with the OWL class hierarchy as a hard pre-filter.
- Flavor: which agent's view to query against (matches a file under runtime/flavors/).
- Class expression: the OWL class filter. Only embeddings tagged with this class (or a subclass) are eligible. Multi-class with | (OR), e.g. tmf:NetworkFunction | tmf:Alarm.
- k: top-k vector neighbours to return after filtering.
- Question: the natural-language query that gets embedded and compared.
- How to read: click Run for one strategy, or Compare all strategies to see UNFILTERED_VECTOR vs ONTOLOGY_BOUNDED vs PURE_SPARQL on the same question side-by-side.
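The pre-filter idea can be sketched in plain Python: expand the class expression to include subclasses, drop ineligible records, then rank what remains by cosine similarity. The record layout, the child-to-parent subclass map, and the class names tmf:CriticalAlarm and tmf:Invoice are hypothetical; a real deployment would delegate this to the configured vector store.

```python
import math

def expand_classes(expression, subclass_of):
    """Parse 'A | B' and return those classes plus all their subclasses."""
    wanted = {c.strip() for c in expression.split("|") if c.strip()}
    changed = True
    while changed:
        changed = False
        for child, parent in subclass_of.items():
            if parent in wanted and child not in wanted:
                wanted.add(child)
                changed = True
    return wanted

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def bounded_search(query_vec, records, expression, subclass_of, k=3):
    """Filter by class first, then return the top-k most similar records."""
    allowed = expand_classes(expression, subclass_of) if expression.strip() else None
    eligible = [r for r in records if allowed is None or r["class"] in allowed]
    eligible.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return eligible[:k]

subclass_of = {"tmf:CriticalAlarm": "tmf:Alarm"}  # child -> parent
records = [
    {"id": "r1", "class": "tmf:CriticalAlarm", "vector": [1.0, 0.0]},
    {"id": "r2", "class": "tmf:Invoice", "vector": [1.0, 0.0]},
    {"id": "r3", "class": "tmf:NetworkFunction", "vector": [0.0, 1.0]},
]
query = [1.0, 0.1]
bounded = [r["id"] for r in bounded_search(query, records, "tmf:NetworkFunction | tmf:Alarm", subclass_of, k=2)]
unfiltered = [r["id"] for r in bounded_search(query, records, "", subclass_of, k=2)]
```

In this toy index the unfiltered search returns the invoice record because its embedding is close to the query. The bounded search excludes it before similarity is computed.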
Examples for your domain: click any example to prefill the form above.
Why these examples matter. A "flavor" is a named, scoped view of your ontology — each one defines which OWL classes an agent is allowed to see and which sensitivity tier it can read. The class expression then narrows that flavor further at query time. The combination is what makes hybrid retrieval safe: a billing agent can't accidentally retrieve clinical observations even if the embedding looks similar.
- Class expression — pipe-separated OWL classes (OR). Sub-classes are included automatically. Empty = whole flavor.
- Question — the natural-language query that gets embedded and ranked.
- Why this works — each example is paired with a one-line rationale so you understand what the OWL filter is actually buying you over a raw vector search.
Retrieval benchmark
Head-to-head comparison of the three retrieval strategies on a fixed question set.
- What's shown: precision@k, MRR, and latency p50/p95 for UNFILTERED_VECTOR (raw kNN), ONTOLOGY_BOUNDED (with the OWL filter), and PURE_SPARQL (no embeddings).
- Why it matters: proves the OWL filter is actually pulling its weight on your data — typical wins are higher precision and similar latency for ONTOLOGY_BOUNDED vs UNFILTERED_VECTOR.
- How to read: green wins beat the baseline; red losses indicate the class filter is too tight or the index is stale. Run launches a fresh suite (1–3 minutes); Load latest shows the previous run.
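For reference, precision@k and MRR can be computed as follows. These are the standard definitions; the function names and the toy rankings are illustrative and do not describe the benchmark's own code.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for item in ranked_ids[:k] if item in relevant_ids) / k

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant result, or 0.0 if none appears."""
    for position, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)

p_at_2 = precision_at_k(["a", "b", "c"], {"a", "c"}, k=2)
mrr = mean_reciprocal_rank([(["x", "a"], {"a"}), (["a"], {"a"})])
```

In the toy data, one of the top two results is relevant, giving precision@2 of 0.5. The first relevant hits land at ranks 2 and 1, giving an MRR of 0.75.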
Click run to compare UNFILTERED_VECTOR vs ONTOLOGY_BOUNDED vs PURE_SPARQL.