Metadata-Version: 2.4
Name: trace-sentinel
Version: 0.2.0
Summary: 潜龙 — a passive, semantic anomaly-detection sentinel for system events and logs.
Project-URL: Homepage, https://github.com/CalBearKen/sentinel
Project-URL: Documentation, https://github.com/CalBearKen/sentinel#readme
Project-URL: Source, https://github.com/CalBearKen/sentinel
Project-URL: Issues, https://github.com/CalBearKen/sentinel/issues
Project-URL: Changelog, https://github.com/CalBearKen/sentinel/blob/main/CHANGELOG.md
Author-email: Ken Li <ken.li.berkeley@gmail.com>
License: MIT
License-File: LICENSE
Keywords: anomaly-detection,logs,monitoring,observability,security,siem,ueba
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: System :: Logging
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.9
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.0; extra == 'docs'
Provides-Extra: embeddings
Requires-Dist: numpy>=1.21; extra == 'embeddings'
Description-Content-Type: text/markdown

# trace-sentinel

Passive anomaly detection for logs and system events.

`trace-sentinel` learns a baseline from your event stream and surfaces only what
genuinely deviates from it — both **statistically** (online z-score / MAD) and
**semantically** (log lines whose *content* is new, not just whose numbers are
large). It's a zero-dependency Python library and CLI, with an optional live
dashboard.

It is a building block, not a platform: embed it in an app, a job runner, or a
pipeline, and route the anomalies wherever you already send alerts.

## Features

- **Streaming and passive** — feed events one at a time; baselines update in O(1)
  with no batch jobs and no stored history.
- **Statistical + semantic** — z-score/EWMA and robust median/MAD for magnitudes,
  plus a semantic-novelty detector for unfamiliar content. Both are expressed in
  the same sigma units, so one threshold governs everything.
- **Calm by default** — a 3σ default, outlier-resistant detectors, and baseline
  *seeding* so it isn't noisy from a cold start.
- **Zero dependencies** — pure standard-library core, Python 3.9+. Optional extras
  only if you bring real vector embeddings.
- **Composable** — write your own extractors, swap detectors, and send anomalies
  to any sink (console, JSON lines, webhook, your SIEM).
- **Batteries included** — a `trace-sentinel` CLI and a `trace-sentinel-hud` live
  dashboard.

## Install

```bash
pip install trace-sentinel

# optional extras
pip install "trace-sentinel[embeddings]"   # numpy, for custom vector embeddings
pip install "trace-sentinel[dev]"          # pytest, ruff, mypy
```

Python 3.9+. The core has no third-party dependencies.

## Quickstart

```python
from trace_sentinel import Sentinel, Event

sentinel = Sentinel(threshold=3.0, warmup=30, enable_semantic=True)

# Stream the log line by line; the context manager closes the file when done.
with open("app.log") as log:
    for line in log:
        for anomaly in sentinel.observe(Event.from_line(line, source="app")):
            print(anomaly.severity.value, anomaly.explanation)
```

`observe()` returns the anomalies for that event (usually none). Nothing fires
during warmup or for values inside the learned baseline.
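
Because `observe()` hands back plain anomaly objects, routing them is ordinary Python. The sketch below writes anomalies as JSON lines. It relies only on the two attributes used in the Quickstart (`severity.value` and `explanation`). `Severity`, `FakeAnomaly`, and `write_jsonl` are hypothetical stand-ins for illustration, not part of the library's API.

```python
import io
import json
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Hypothetical stand-in for the library's severity enum."""
    WARNING = "warning"


@dataclass
class FakeAnomaly:
    """Hypothetical stand-in exposing the two attributes used in the Quickstart."""
    severity: Severity
    explanation: str


def write_jsonl(anomalies, stream):
    """Write one JSON object per anomaly to a text stream and return the count."""
    count = 0
    for anomaly in anomalies:
        record = {
            "severity": anomaly.severity.value,
            "explanation": anomaly.explanation,
        }
        stream.write(json.dumps(record) + "\n")
        count += 1
    return count


# Any writable text stream works here: a file, sys.stdout, or a socket wrapper.
buffer = io.StringIO()
written = write_jsonl(
    [FakeAnomaly(Severity.WARNING, "message.length deviates from baseline")],
    buffer,
)
print(written, buffer.getvalue(), end="")
```

In real use you would pass the result of `sentinel.observe(...)` in place of the stand-in list.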

### CLI

```bash
trace-sentinel watch app.log --semantic                       # plain text log
trace-sentinel watch events.jsonl --jsonl --format json       # JSON lines -> JSON out
trace-sentinel watch /var/log/auth.log --follow --semantic    # tail -f a live file
trace-sentinel watch app.log --semantic --seed 3000           # prime baselines first
```

### Live dashboard

```bash
trace-sentinel-hud        # streams your real system logs to http://localhost:8787
```

A self-contained HUD (threat gauge, σ-timeline, per-metric rings, scrolling
anomaly feed) driven entirely by real engine state — no simulated data.

## How it works

```
                 ┌───────────┐   features    ┌────────────┐   sigma   ┌──────────┐
 Event  ───────▶ │ Extractors│ ────────────▶ │  Detectors │ ────────▶ │ Sentinel │ ──▶ Anomaly
 (log line)      └───────────┘ message.length└────────────┘ z / MAD   │ threshold│      (+ Sink)
                       │       message.tokens                          └──────────┘
                       │            ...                                      ▲
                       └────────── message text ───▶ Semantic novelty ──────┘
```

1. **Extractors** turn each `Event` into named numeric metrics (`message.length`,
   `message.tokens`, numeric fields, inter-arrival time, …).
2. **Detectors** keep one online baseline per metric and return a standardized
   deviation in sigma. A value is scored against history *before* it's folded in,
   so a spike is measured against the past, not against itself.
3. The **Sentinel** emits an `Anomaly` when `|score| ≥ threshold`, after a
   per-metric warmup.
4. The **semantic layer** embeds the message text, tracks a decaying centroid of
   "normal," and scores novelty — standardized to sigma so it thresholds exactly
   like the numeric metrics.
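
Steps 2 and 3 can be illustrated with a small standalone sketch. It uses Welford's running mean and variance, which is one common way to keep an O(1) baseline. Whether the library uses exactly this update is an assumption, and `OnlineZScore` is a hypothetical name, not a class from `trace_sentinel`. The point to notice is the ordering: the value is scored first and folded into the baseline afterwards.

```python
import math


class OnlineZScore:
    """Hypothetical detector: running mean/variance with O(1) memory per metric."""

    def __init__(self, warmup=30):
        self.warmup = warmup
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def score(self, value):
        """Return the deviation in sigma (None during warmup), then update."""
        sigma = None
        # Score against history only, and only once warmup has passed.
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                sigma = (value - self.mean) / std
        # Fold the value in after scoring, so a spike is not measured
        # against a baseline that already contains it.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return sigma


THRESHOLD = 3.0
detector = OnlineZScore(warmup=30)

# Sixty ordinary readings between 100 and 104, then one obvious spike.
stream = [100.0 + (i % 5) for i in range(60)] + [400.0]

flagged = []
for value in stream:
    sigma = detector.score(value)
    if sigma is not None and abs(sigma) >= THRESHOLD:
        flagged.append((value, sigma))

print(flagged)
```

Only the final reading crosses the threshold. The ordinary readings stay well inside 3σ, and nothing is reported while the detector is still warming up.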

Cold baselines over-alert. `Sentinel.prime()` (and the bundled benign-signal
corpus in `trace_sentinel.seed`) can establish a stable "normal" up front,
before any live events are scored. See [docs/concepts.md](docs/concepts.md).
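
To make the semantic layer and priming concrete, here is a minimal standard-library sketch. The hashed bag-of-words `embed` function is a stand-in, not the library's embedding. `NoveltyDetector` and its `prime` method are hypothetical names that only mirror the idea behind `Sentinel.prime()`. The decay rate, dimension, and sample log lines are likewise assumptions chosen for the demo.

```python
import math
import zlib

DIM = 256  # assumed embedding size for this sketch


def embed(text):
    """Hash tokens into a fixed-size, L2-normalized bag-of-words vector."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class NoveltyDetector:
    """Hypothetical detector: decaying centroid plus sigma-standardized distance."""

    def __init__(self, decay=0.05, warmup=30):
        self.decay = decay
        self.warmup = warmup
        self.centroid = None
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def _distance(self, vec):
        """Cosine distance between a vector and the current centroid."""
        dot = sum(a * b for a, b in zip(vec, self.centroid))
        norm = math.sqrt(sum(c * c for c in self.centroid))
        return 1.0 - dot / norm if norm else 1.0

    def score(self, text):
        """Return novelty in sigma (None during warmup), then update state."""
        vec = embed(text)
        if self.centroid is None:
            self.centroid = vec  # the first message defines the initial "normal"
            return None
        dist = self._distance(vec)
        sigma = None
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                sigma = (dist - self.mean) / std
        # Update the distance statistics and the decaying centroid after scoring.
        self.n += 1
        delta = dist - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (dist - self.mean)
        self.centroid = [
            (1 - self.decay) * c + self.decay * v
            for c, v in zip(self.centroid, vec)
        ]
        return sigma

    def prime(self, lines):
        """Feed known-benign lines so the baseline is warm before live traffic."""
        for line in lines:
            self.score(line)


names = ["alice", "bob", "carol"] * 20
benign = [f"user {name} logged in from 10.0.0.{i % 20}" for i, name in enumerate(names)]

detector = NoveltyDetector(warmup=30)
detector.prime(benign)

familiar = detector.score("user alice logged in from 10.0.0.3")
novel = detector.score("kernel panic: unable to mount root filesystem")
print(familiar, novel)
```

After priming, the familiar login line scores close to the baseline, while the unfamiliar message lands many sigma away. Because novelty is expressed in sigma, the same threshold used for numeric metrics applies to it.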

## Roadmap

| Stage | Capability | Status |
|---|---|---|
| Detect | passive statistical + semantic detection (this release) | ✅ 0.1 |
| Correlate | cross-source correlation, alert routing, dashboards | planned |
| Respond | automated response / playbooks | planned |

## Documentation

| Doc | Contents |
|---|---|
| [Quickstart](docs/quickstart.md) | Install and first run |
| [Concepts](docs/concepts.md) | The detection model, sigma, and the 2σ caveat |
| [Detectors](docs/detectors.md) | Z-score, EWMA drift, robust MAD |
| [Semantic layer](docs/semantic.md) | Novelty scoring; plugging in real embeddings / LLMs |
| [Extractors](docs/extractors.md) | Built-ins and writing your own |
| [CLI](docs/cli.md) | `trace-sentinel watch` reference |
| [API reference](docs/api.md) | Public classes and functions |
| [Production](docs/production.md) | Tuning, persistence, scaling, gotchas |
| [FAQ](docs/faq.md) | How it compares to SIEM/UEBA |

## Development

```bash
pip install -e ".[dev]"
pytest            # test suite
ruff check .      # lint
mypy              # type-check (strict)
```

## License

MIT © 2026 Ken Li. See [LICENSE](LICENSE).

<sub>Also published under the name **潜龙** (qiánlóng, “hidden dragon”) — dormant until something genuinely stirs it.</sub>
