Log Discovery

Turn a folder of raw operational logs into reviewable ontology candidates — events, entities, relationships, and causal rules — using a chain of statistical and Bayesian methods. No labels. No hand-written regexes. No neural fine-tuning. You stay in the loop on every decision.

Why this exists · the gap

✕ Regexes & dashboards

  • Catch only what someone already thought to look for.
  • Miss every other failure mode.
  • Drift the moment the log format changes.

✕ End-to-end neural

  • Catch more — but produce embeddings engineers can't read.
  • Hard to audit, hard to defend in review.
  • Hard to act on a 768-dimensional vector.

Log Discovery takes the middle path: statistical inference an engineer can read. Every candidate it surfaces has an explicit reason — "templates X and Y share an upstream cause", "this trajectory's per-step log-likelihood is in the bottom 5 %" — that you can challenge, accept, or reject.

Pipeline at a glance · four phases

  • MINE (L1, L1.5): Drain templates, slot typing, PMI entity graph.
  • SEQUENCE (L2, L8, L10, L13): HMM per service, trajectory anomalies, GP rate spikes.
  • CAUSALITY (L3, L11): Granger + TE gate, PC algorithm DAG, shared causes.
  • REVIEW (L4, L9, L12): confidence × value sort, active-learning re-rank, pPCA merge candidates.

Data flows left to right, from raw log files to reviewed ontology candidates. You review at the end.

The pipeline runs as two CLI phases: python toolkit.py --phase mine --log-path … then --phase sequence --log-path …. Both finish in seconds for a typical 10 k–50 k line corpus. Then you open this wizard's Step 5 to review.

What you get · four kinds of proposals

🎯

Events

OWL classes derived from repeating log templates. e.g. :UserLoggedIn.

🔑

Entities

Identifier properties from typed slots. e.g. :hasUserId as UUID.

Relationships

Undirected PMI-weighted edges → owl:ObjectProperty candidates.

Causal edges

Directed edges past the triangulation gate. :X :causes :Y.

The contract: nothing flows into your ontology automatically. Every proposal starts as PENDING. You approve, edit, reject, or merge — and the pipeline ranks by confidence × consequence so the highest-leverage decisions sit at the top.

v1 layers L1 — L4 · shipped

The original five layers (L1, L1.5, L2, L3, L4). Each is a small, independently-testable PRML-grounded step.

L1

Template clustering

Bishop §9

What Cluster every log line into a small set of templates with wildcards <*> for variable parts. "User u-123 logged in" and "User u-456 logged in" collapse to one template.

Algorithm Drain3 tree-based online clustering, refined by a small EM merge pass for near-duplicates.

Why it matters Templates are the atomic unit downstream. Over-merge and distinct events vanish; over-split and review is exhausting.

In Studio Each row in log_templates becomes a candidate :Event_N class with a sample log line attached.

Figure: "User u-123 logged in", "User u-456 logged in", and "User u-789 logged in" map to the template "User <*> logged in" (hits=3).
3 raw lines collapse into 1 template with a wildcard slot.
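
The collapse can be illustrated with a deliberately simplified sketch. It is not Drain3 and has no parse tree or similarity threshold: it only merges lines with the same token count and replaces every position that varies with <*>. The function name merge_into_template is hypothetical.

```python
def merge_into_template(lines):
    """Collapse equal-length lines into one template; varying tokens become <*>."""
    token_rows = [line.split() for line in lines]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("this toy version only merges lines with equal token counts")
    template = []
    for column in zip(*token_rows):
        # A position is constant only if every line has the same token there.
        template.append(column[0] if len(set(column)) == 1 else "<*>")
    return " ".join(template)


lines = ["User u-123 logged in", "User u-456 logged in", "User u-789 logged in"]
template = merge_into_template(lines)
print(template, "| hits =", len(lines))
```

A real clusterer also has to decide which lines belong together in the first place, which is where over-merging and over-splitting come from.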
L1.5

Slot typing & entity graph

Bishop §8

What Classify each <*> position by the values it takes (IRI, UUID, enum, numeric, free text) and flag possible PII. Build an undirected graph over slot values that co-occur within a short time window.

Algorithm Regex ladder + Shannon entropy for typing. Pointwise mutual information (PMI) + temporal-lead ratio for the graph.

Why it matters Entity proposals start here. "slot 2 of template #7 is a UUID with 4 130 distinct values" tells you it's an identifier property worth keeping.

In Studio log_template_slots + log_entity_edges rows; visible as Entity and Relationship proposals.

Figure: entity graph over userId, orderId, and cartId, with edges labelled PMI 6.2, PMI 4.1, and PMI 3.8.
High-PMI edges between slot values become relationship candidates.
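
As a rough illustration of the two statistics named above, the sketch below computes Shannon entropy over a slot's values and PMI over values that share a time window. The window contents and the "user:", "order:", "cart:" prefixes are made up for the example; the regex ladder and the temporal-lead ratio are left out.

```python
import math
from collections import Counter
from itertools import combinations


def shannon_entropy(values):
    """Entropy in bits of the empirical distribution over slot values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pmi_edges(windows):
    """Base-2 PMI for every pair of values that co-occur in at least one window."""
    n = len(windows)
    single = Counter()
    pair = Counter()
    for window in windows:
        present = sorted(set(window))
        single.update(present)
        pair.update(combinations(present, 2))
    return {
        (a, b): math.log2((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in pair.items()
    }


# Hypothetical co-occurrence windows (one set of slot values per time window).
windows = [
    {"user:u-1", "order:o-9"},
    {"user:u-1", "order:o-9"},
    {"user:u-2", "order:o-7"},
    {"user:u-2", "cart:c-3"},
]
edges = pmi_edges(windows)
for (a, b), score in sorted(edges.items()):
    print(f"{a} -- {b}: PMI {score:.2f}")

# A slot whose value never repeats has high entropy, a hint that it is an identifier.
print(shannon_entropy(["u-1", "u-2", "u-3", "u-4"]))
```

Positive PMI means the pair appears together more often than independent occurrence would predict; a constant slot has entropy 0 and is a poor identifier candidate.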
L2

HMM trajectories & anomalies

Bishop §13

What Group log lines by service into time-ordered sequences of template IDs. Fit a per-service Hidden Markov Model. Flag sequences in the bottom 5 % of per-step log-likelihood.

Algorithm CategoricalHMM (hmmlearn). State count by BIC. Combined with cluster-bag novelty for trajectories that miss the service's mode templates.

Why it matters Individual log lines look fine; the sequence tells the story. The HMM learns the normal rhythm of each service so deviations surface.

In Studio LOG_EVENT proposals at the top of the queue, each linked back to the anomalous trajectory.

Figure: per-step log-likelihood over time, with the 5th-percentile threshold marked.
Anomalous trajectory (red) falls below the 5th-percentile likelihood threshold.
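
The scoring step can be sketched in plain Python, assuming the HMM parameters are already known. The start, transition, and emission values below are hand-set for illustration rather than fitted; fitting with CategoricalHMM and choosing the state count by BIC are not shown. The helper names are hypothetical.

```python
import math


def per_step_loglik(seq, start, trans, emit):
    """Scaled forward algorithm; returns log P(seq) divided by len(seq)."""
    n_states = len(start)
    alpha = [start[s] * emit[s][seq[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][seq[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)          # P(o_t | o_1..o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik / len(seq)


def percentile_threshold(scores, pct):
    """Nearest-rank percentile of the scores."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hand-set two-state model over three template IDs; template 2 is rare in both states.
start = [0.9, 0.1]
trans = [[0.8, 0.2], [0.3, 0.7]]
emit = [[0.7, 0.28, 0.02], [0.2, 0.78, 0.02]]

trajectories = [[0, 0, 1, 0, 0, 1]] * 10 + [[0, 1, 1, 0, 1, 1]] * 9 + [[2, 2, 0, 2, 2, 1]]
scores = [per_step_loglik(seq, start, trans, emit) for seq in trajectories]
threshold = percentile_threshold(scores, 5)
flagged = [i for i, s in enumerate(scores) if s <= threshold]
print("flagged trajectories:", flagged)
```

Dividing by sequence length keeps long trajectories from looking anomalous simply because they contain more steps.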
L3

Causality gate

Bishop §11, §14

What For each directed entity edge, test whether the history of src actually informs the future of dst beyond dst's own history.

Algorithm Granger F-test on time-binned rate series; transfer entropy z-score as fallback. Triangulation: PMI + lead + statistical causality must all agree.

Why it matters Temporal-lead is necessary but not sufficient for causality. Without this gate, every "occurs before" pair looks like a cause.

In Studio Directed edges that pass become LOG_CAUSAL_EDGE; those that fail become LOG_RELATIONSHIP instead.

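Figure: rate series A(t) and B(t), with B(t) regressed on A(t−1); p < 0.01, Granger-causal.
B's rises track A's with a small lag. A Granger-causes B: A's history improves the prediction of B beyond B's own history.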
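
A minimal lag-1 version of the Granger test can be written without any numerical library. It compares a restricted regression (dst on its own past) with an unrestricted one (dst on its own past plus src's past) and returns the F-statistic. The lag order, the synthetic series, and all function names are assumptions for the example; the transfer-entropy fallback and the PMI and lead checks of the triangulation gate are not shown.

```python
import random


def solve(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(rows, y):
    """Residual sum of squares of the least-squares fit of y on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))


def granger_f(src, dst):
    """F-statistic for 'lag-1 src improves the prediction of dst'."""
    target = dst[1:]
    restricted = [[1.0, dst[t - 1]] for t in range(1, len(dst))]
    unrestricted = [[1.0, dst[t - 1], src[t - 1]] for t in range(1, len(dst))]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)  # one added regressor, so numerator dof is 1


# Synthetic rate series: B follows A with a one-bin lag plus small noise.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [0.0] + [0.8 * a[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]

f_ab = granger_f(a, b)
f_ba = granger_f(b, a)
print(f"F(A -> B) = {f_ab:.1f}, F(B -> A) = {f_ba:.2f}")
```

A p-value would come from the F(1, n − 3) distribution, for example via scipy.stats.f.sf. A large F in one direction and a small one in the other is the asymmetry the gate looks for.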
L4

Review surface

Bishop §1

What Load every proposal into the wizard's Step 5 queue, sorted by confidence × consequence. Engineer approves, rejects, edits, or merges.

Algorithm Decision-theoretic ordering by expected information value.

Why it matters Your time is the bottleneck. The highest-leverage decisions sit at the top so you can stop reading after the most important N.

In Studio This panel.

Figure: example queue, top to bottom: 95% :NFDeregister, 82% :PaymentRetry, 61% :CacheHit, 23% :Heartbeat.
Top of queue = highest leverage. Confidence × consequence decides.
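
The ordering rule reduces to sorting by a product. The sketch below assumes each proposal carries a confidence and a numeric consequence weight; the Proposal class and the consequence values are invented for illustration, since this page does not say how consequence is scored.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    name: str
    confidence: float   # probability-like score in [0, 1]
    consequence: float  # hypothetical weight for how much the decision matters


def review_order(proposals):
    """Highest confidence × consequence first."""
    return sorted(proposals, key=lambda p: p.confidence * p.consequence, reverse=True)


queue = review_order([
    Proposal(":Heartbeat", 0.23, 0.2),
    Proposal(":NFDeregister", 0.95, 1.0),
    Proposal(":CacheHit", 0.61, 0.3),
    Proposal(":PaymentRetry", 0.82, 0.9),
])
print([p.name for p in queue])
```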

v2 enhancements L8–L13 · shipped

Six additive enhancements landed on top of v1. Each targets a specific shallow spot that became obvious in real use.

L9

Active-learning review-queue ranker

Bishop §1, §4.3

What A logistic-regression model trained on your past approve/reject decisions reorders the queue so the things you tend to approve rise to the top.

Algorithm scikit-learn LogisticRegression on per-proposal features (kind, confidence, evidence volume, slot-PII flag, age, regime tag, …) with median imputation + class balancing.

Why it matters Without it, engineers re-reject the same noise template every session. With it, the queue learns from each decision.

In Studio "↻ Re-rank with feedback" button on Step 5; a small badge under the controls shows "ranker · 47 decisions · AUC 0.83".

Figure: the queue before and after re-ranking; approvals first, noise sinks.
Approved (green) rise; rejected (red) sink. Same data, smarter order.
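
To show the idea without depending on scikit-learn, the sketch below fits a tiny logistic regression by batch gradient descent and re-ranks two pending proposals. It stands in for LogisticRegression and omits the median imputation and class balancing mentioned above. The two features (confidence, slot-PII flag), the decision history, and the template labels are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, k = len(features), len(features[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            for j in range(k):
                grad_w[j] += (p - y) * x[j]
            grad_b += p - y
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def approve_probability(x, weights, bias):
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical history: [confidence, slot_has_pii] and 1 = approved, 0 = rejected.
history = [
    ([0.90, 0], 1), ([0.80, 0], 1), ([0.70, 0], 1), ([0.75, 0], 1),
    ([0.85, 1], 0), ([0.60, 1], 0), ([0.30, 0], 0), ([0.20, 1], 0),
]
weights, bias = fit_logistic([x for x, _ in history], [y for _, y in history])

pending = [("template #12 (PII slot)", [0.8, 1]), ("template #7", [0.8, 0])]
ranked = sorted(pending, key=lambda item: approve_probability(item[1], weights, bias), reverse=True)
print([name for name, _ in ranked])
```

Because every PII-flagged proposal in this history was rejected, the learned PII weight is negative and the PII-free proposal ranks first even though both have the same confidence.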
L8

Switching state-space regimes

Bishop §9, §13.3.3

What Replace v1's single per-service HMM with a mixture of HMMs. A per-trajectory regime indicator selects which HMM applies — separating business hours from weekend maintenance from incident response.

Algorithm VB-EM with Dirichlet prior on regime mixing weights; K selected by ELBO over 3 random restarts; K ≤ 5.

Why it matters v1's single HMM averages across regimes and flags normal-but-rare patterns (e.g. weekend maintenance) as anomalies. The largest single source of false positives.

In Studio Each LOG_EVENT proposal carries an amber chip — "Regime 2 of 3" — so you know which operating mode produced it.

Figure: two distinct regimes, Regime 1 (weekday) and Regime 2 (maintenance).
A single HMM would flag Regime 2 as anomalous. L8's mixture keeps it as “normal for that regime”.
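
Once each regime's HMM has scored a trajectory, picking the regime is a small Bayes-rule step: the posterior over regimes is proportional to mixing weight times likelihood. The sketch assumes the per-regime log-likelihoods and mixing weights are already available; the numbers are invented, and the VB-EM fitting and ELBO-based choice of K are not shown.

```python
import math


def regime_posterior(log_liks, mixing_weights):
    """Posterior over regimes: normalised exp(log weight + log-likelihood)."""
    scores = [math.log(w) + ll for w, ll in zip(mixing_weights, log_liks)]
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical log-likelihoods of one weekend trajectory under three regime HMMs.
log_liks = [-48.0, -12.5, -39.0]
mixing_weights = [0.7, 0.2, 0.1]

posterior = regime_posterior(log_liks, mixing_weights)
regime = max(range(len(posterior)), key=posterior.__getitem__)
print(f"Regime {regime + 1} of {len(posterior)}")
```

The trajectory is then judged against the anomaly threshold of its own regime, so a rare but well-explained maintenance run is no longer compared with weekday traffic.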
L10

Calibrated confidence via VB

Bishop §10

What Replace v1's heuristic max(c, 0.6) floor with a real posterior probability. Confidence becomes "P(per-step logL < anomaly threshold)" under a Variational Bayesian HMM.

Algorithm VB-HMM with Dirichlet priors on π, A, B; mean-field VB-EM via log-space forward-backward; Bayesian model averaging across K. Confidence via Monte-Carlo posterior predictive.

Why it matters An audit-ready confidence. Comparable across proposal kinds, defensible to a reviewer. ECE < 0.05 on synthetic data.

In Studio Confidence scores stop clustering at 0.6 / 0.7 / 0.8 (the v1 floors) — they spread smoothly across [0, 1].

Figure: calibration plot of predicted vs actual confidence against the ideal diagonal, comparing v1 floors with the v2 spread.
v1 confidences clump at heuristic floors; v2 tracks the diagonal, as a true posterior should.
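
Expected Calibration Error, the metric quoted above, can be computed with equal-width confidence bins. The sketch uses ten bins and two invented sets of scores; the bin count and data are assumptions, not the evaluation behind the ECE < 0.05 figure.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, outcomes):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


# Floored scores: everything reported at 0.6, but only 2 of 10 proposals were right.
floored = expected_calibration_error([0.6] * 10, [1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

# Calibrated scores: the 0.2 group is right 20 % of the time, the 0.8 group 80 %.
calibrated = expected_calibration_error(
    [0.2] * 5 + [0.8] * 5,
    [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0],
)
print(f"floored ECE = {floored:.2f}, calibrated ECE = {calibrated:.2f}")
```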
L11

PC algorithm causal DAG

Bishop §8.2, §8.4.5

What Replace pairwise Granger with a structural causal graph. The PC algorithm tests conditional independence under expanding conditioning sets, so shared upstream causes eliminate spurious "B causes C" edges.

Algorithm Partial Pearson correlation + Fisher z-transform CI test. Skeleton phase with conditioning sets ≤ 3. V-structure orientation. Meek rules R1–R4.

Why it matters Pairwise tests can't tell "A causes both B and C" from "B causes C". PC can.

In Studio Causal-edge titles gain a suffix like "— shared upstream cause: template #41" when the PDAG identifies a confounder.

Figure: the v1 Granger graph over A, B, C with a spurious B↔C edge, next to the v2 PC graph where B ⊥ C | A and the edge is dropped.
Same data, two graphs. PC drops the spurious B↔C once conditioned on A.
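
The conditional-independence test at the heart of the skeleton phase can be sketched for a single conditioning variable. The data is constructed so that B and C both depend on A and on nothing else they share; the series, the significance level of 0.01, and the function names are assumptions. Larger conditioning sets, v-structure orientation, and the Meek rules are not shown.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, n_conditioning):
    """Two-sided p-value for H0: the (partial) correlation is zero."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - n_conditioning - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


# A drives both B and C; the two noise patterns are orthogonal to A and to each other.
a = [1, 1, 1, 1, -1, -1, -1, -1] * 25
noise_b = [1, 1, -1, -1, 1, 1, -1, -1] * 25
noise_c = [1, -1, 1, -1, 1, -1, 1, -1] * 25
b = [x + 0.5 * e for x, e in zip(a, noise_b)]
c = [x + 0.5 * e for x, e in zip(a, noise_c)]

alpha = 0.01
p_marginal = fisher_z_pvalue(pearson(b, c), len(a), 0)
p_given_a = fisher_z_pvalue(partial_corr(b, c, a), len(a), 1)
print("B-C edge before conditioning:", p_marginal < alpha)
print("B-C edge after conditioning on A:", p_given_a < alpha)
```

The marginal test finds B and C strongly correlated, so a pairwise method keeps the edge. Conditioning on A removes the correlation, and the skeleton phase deletes it.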
L12

pPCA template similarity

Bishop §12.2

What A 2-D PCA scatter of all active templates, with merge candidates highlighted in amber when their distance in the full-dim embedding falls below a threshold.

Algorithm Bag-of-tokens vectoriser + probabilistic PCA; L2-normalised chord distance for the merge-candidate test.

Why it matters Drain's sim_th=0.4 is a hard knob. Some near-duplicates always slip through. The scatter lets you spot them visually and merge with one click.

In Studio The "Template similarity" card on Step 5; close pairs are connected by faint amber lines.

Figure: templates as 2-D points; amber pairs ≈ likely duplicates.
Templates as 2-D points. Close pairs are merge candidates.
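
The merge-candidate test can be sketched directly on bag-of-tokens vectors. The example skips the probabilistic PCA projection and measures chord distance in the raw token space; the 0.6 cut-off and the sample templates are invented for illustration.

```python
import math
from collections import Counter


def unit_bag(template):
    """L2-normalised bag-of-tokens vector as a token -> weight dict."""
    counts = Counter(template.split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def chord_distance(u, v):
    """Euclidean distance between two unit vectors (0 = identical direction)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


MERGE_THRESHOLD = 0.6  # hypothetical cut-off

login = unit_bag("User <*> logged in")
login_ok = unit_bag("User <*> logged in successfully")
payment = unit_bag("Payment <*> failed")

for name, other in [("login vs login_ok", login_ok), ("login vs payment", payment)]:
    d = chord_distance(login, other)
    print(f"{name}: {d:.3f}", "-> merge candidate" if d < MERGE_THRESHOLD else "")
```

For unit vectors the chord distance equals sqrt(2 − 2·cos θ), so it orders pairs the same way cosine similarity does while behaving like an ordinary distance.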
L13

GP rate-shape anomalies

Bishop §6.4

What Per template, fit a Gaussian Process over hour-of-day rate and flag observations outside the 99 % posterior interval — catching the burst-of-normal-templates pattern v1's HMM misses entirely.

Algorithm Periodic (24 h) + RBF + white-noise kernel; per-hour-median aggregation for robustness; MAD-based residual z-score against the smooth baseline.

Why it matters When a deploy goes bad and a normally-quiet endpoint floods the queue with success logs, v1's HMM sees nothing wrong — the templates are fine, there are just too many of them. The GP sees the rate spike.

In Studio Rows tagged with a green "GP rate" chip and an inline sparkline. "↻ GP rate scan" button on the toolbar.

Figure: template rate by hour of day, with a spike at 3 am.
Green band = 99 % GP posterior. Single red spike at 3 am falls outside → flagged.
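
The robust-residual part of this layer can be sketched without a GP. Below, the per-hour median over previous days stands in for the smooth GP baseline, and a MAD-based z-score flags hours outside a two-sided 99 % normal band (|z| > 2.576). The counts are synthetic and the function name is hypothetical.

```python
import statistics


def mad_zscores(residuals):
    """Robust z-scores: deviation from the median, scaled by 1.4826 × MAD."""
    centre = statistics.median(residuals)
    mad = statistics.median([abs(r - centre) for r in residuals])
    if mad == 0:
        raise ValueError("MAD is zero; residuals are too uniform to scale")
    return [(r - centre) / (1.4826 * mad) for r in residuals]


# Synthetic hourly counts for five previous days (24 values per day).
history = [[10 + hour % 3 + day for hour in range(24)] for day in range(5)]
baseline = [statistics.median(day[hour] for day in history) for hour in range(24)]

# Today's counts wobble around the baseline, except for a burst at 3 am.
today = [baseline[hour] + (hour % 5) - 2 for hour in range(24)]
today[3] = 80

residuals = [obs - base for obs, base in zip(today, baseline)]
z = mad_zscores(residuals)
flagged = [hour for hour, score in enumerate(z) if abs(score) > 2.576]
print("flagged hours:", flagged)
```

Because the median and MAD ignore the extreme value, the spike cannot inflate the scale estimate and mask itself, which a mean and standard deviation would allow.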

Benefits · before vs after

Without Log Discovery

  • Regexes catch only what you thought of.
  • Dashboards fire on guessed thresholds.
  • RCA = chasing temporal-correlation pairs.
  • Same noise rejected every week.
  • “Confidence 0.8.” What does that mean?

With Log Discovery

  • Drain proposes templates; pPCA flags the duplicates.
  • HMM + GP catch sequence and rate anomalies.
  • PC algorithm builds a real causal DAG with shared causes.
  • Ranker learns what you approve; noise sinks.
  • Confidence is a real posterior. ECE < 0.05.

What to expect · 5-step workflow

1

Point at a log folder

python toolkit.py --phase mine --log-path /your/logs/ then --phase sequence. Both finish in seconds for a 10 k–50 k line corpus.

2

Open Step 5

The Log Discovery panel lights up. Four kinds of proposals — events, entities, relationships, causal edges — with sample log lines, confidence scores, and (when applicable) regime tags, GP chips, or shared-cause hints.

3

Review top-down

Events first (they define your vocabulary). Then entities (the identifiers). Then relationships and causal edges (they depend on the classes above).

4

Re-rank periodically

After 20+ decisions, click "↻ Re-rank with feedback". The logistic-regression ranker refits and the queue reorders.

5

Approved proposals flow into your ontology

Click ✓ Approve → the candidate joins session.events / .entities / .relationships / .causal_rules. When you hit Generate, they emerge as OWL classes, properties, and SHACL rules.

You stay in the loop. Nothing is added to your ontology without an explicit approve click. Reject and merge are equally fast. The pipeline is opinionated about what to surface, not what to keep.

Glossary · terms used above

Drain
Online log clustering over a fixed-depth parse tree. Each line is routed by token count and leading tokens, then joined to the most similar template in that leaf.
PMI
Pointwise mutual information — log-ratio of co-occurrence vs marginal product. High PMI ⇒ "more often together than chance".
HMM
Hidden Markov Model — discrete-state sequence model. Used over template-id trajectories.
BIC
Bayesian Information Criterion. Penalises complexity. Used to pick HMM state count.
Granger
F-test on whether past(X) predicts future(Y) beyond past(Y) alone.
VB
Variational Bayes — approximate inference by KL-minimisation over a simpler family. Used in L8, L10.
ELBO
Evidence Lower Bound. VB's objective. Differences in ELBO across K select regime count.
PC algorithm
Peter–Clark causal structure learner. Skeleton phase, v-structures, Meek rules.
ECE
Expected Calibration Error. Average gap between predicted confidence and observed accuracy.
PDAG
Partially-directed acyclic graph. PC's output. Some edges oriented, some undirected.
MAD
Median Absolute Deviation. Robust scale estimator. L13 uses it so a single spike can't hide itself.
pPCA
Probabilistic PCA. PCA with a Gaussian noise model. Bishop §12.2.