Metadata-Version: 2.4
Name: agentic-log-analyser
Version: 0.1.0
Summary: Deterministic log templating on top of Drain3, packaged as an artifact for AI agents.
Author: agentic-log-analyser
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: drain3>=0.9.11
Requires-Dist: mcp>=1.2
Requires-Dist: boto3>=1.34
Provides-Extra: build
Requires-Dist: pyinstaller>=6.0; extra == "build"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Provides-Extra: mcp
Requires-Dist: mcp[cli]>=1.2; extra == "mcp"
Dynamic: license-file

# AgenticAILogAnalyser

Python port of [codag-drain](https://github.com/codag-megalith/codag-drain) that
uses the upstream Python [Drain3](https://github.com/logpai/Drain3) package as
its grouping engine. Same CLI surface, same output shape, same evidence-rich
artifact, packaged as a single binary you can drop into any environment.

The intended consumer is an AI agent that needs to read a large log window
under a fixed token budget. Instead of feeding the agent 1,400 raw lines, you
feed it 8 templates with slot statistics and a few raw examples per group.

## What it does

Takes a stream of log lines on stdin, groups near-duplicates with Drain3, and
emits one templated line per group with:

- the count of collapsed lines,
- a derived `<*>` template,
- per-slot stats (min / max / median for numeric slots, distinct values for
  enums, an auto-detected unit like `ms` or `MB`),
- a few raw sample lines.

## Real-world example

A 1,438-line Kiro IDE log compresses to 8 templates, a roughly 180x reduction in line count:

```
[x1] [WebviewProcessMonitor] Service starting
[x4] update#setState <*> [idle,downloading,downloaded,ready]
[x14] [WebviewProcessMonitor] Tracking webview renderer: pid=<*>, origin=<*>, windowId=<*> [13773..87619 p50=87288.5]
[x1] update#setState checking for updates
[x14] Extension host with pid <*> exited with code: 0, signal: unknown. [13697..89755 p50=73921]
[x1395] No ptyHost heartbeat after 6 seconds
[x8] [WebviewProcessMonitor] Webview renderer process gone: pid=<*>
[x1] Extracting content from 1 URIs
[codag-drain-py] 1438 lines -> 8 templates (179.8x)
```

The dominant signal, a single warning that accounts for 97% of the file, is the
first thing the model sees instead of being buried. Numeric ranges and enum
values are preserved, so outliers and state distributions stay visible.
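
The figures quoted for this example follow directly from the line counts in the output. A small check of the arithmetic, where the function name `compression_summary` is made up for illustration:

```python
def compression_summary(lines_in, templates_out, dominant_count):
    """Return the footer line and the share of lines held by the largest group."""
    ratio = lines_in / templates_out
    share = dominant_count / lines_in
    footer = f"{lines_in} lines -> {templates_out} templates ({ratio:.1f}x)"
    return footer, f"{share:.1%}"


footer, share = compression_summary(1438, 8, 1395)
print(footer)  # 1438 lines -> 8 templates (179.8x)
print(share)   # 97.0%
```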

## Install

From source:

```bash
pip install -e .
```

From source with the build extra (PyInstaller):

```bash
pip install -e ".[build]"
```

## Usage

```bash
echo 'worker latency 20ms
worker latency 20ms
worker latency 20ms
worker latency 8400ms' | codag-drain-py --stats
```

```
[x4] worker latency <*> [20..8400ms p50=20ms]
[codag-drain-py] 4 lines -> 1 templates (4.0x)
```

JSON output:

```bash
echo 'worker ready shard=1' | codag-drain-py --format json
```

Choose a grouper:

```bash
cat logs.txt | codag-drain-py --grouper drain-stock
```

NDJSON input:

```bash
cat events.ndjson | codag-drain-py --json
```
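
As a rough picture of what NDJSON input handling involves, the sketch below pulls one message string out of each JSON record. The candidate keys in `MESSAGE_KEYS` and the fallback behaviour are assumptions for illustration; the heuristics in `input.py` may differ.

```python
import json

# Assumed candidate keys; the real parser may look at other fields.
MESSAGE_KEYS = ("message", "msg", "log")


def messages_from_ndjson(text):
    """Yield one message per non-blank line.

    Lines that are not JSON objects, or that carry none of the assumed
    keys, pass through unchanged as plain text.
    """
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            yield raw
            continue
        if isinstance(record, dict):
            for key in MESSAGE_KEYS:
                if isinstance(record.get(key), str):
                    yield record[key]
                    break
            else:
                yield raw
        else:
            yield raw


sample = '{"message": "worker ready shard=1"}\n\nnot json\n{"msg": "worker ready shard=2"}'
print(list(messages_from_ndjson(sample)))
```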

Available groupers:

| name | description |
|------|-------------|
| `drain` (default) | Drain3 with codag's compact-line tokenizer fallback |
| `drain-stock` | Drain3 with vanilla whitespace tokenization |
| `drain-delimited` | Drain with extra punctuation delimiters folded into whitespace |
| `drain-fullsearch` | Drain similarity over all same-length clusters (no prefix-tree) |
| `statistical` | Non-Drain control: IDF-weighted anchor co-occurrence |
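
To make the grouping idea concrete, here is a simplified sketch of Drain-style similarity matching that compares each line against every cluster of the same token length, in the spirit of the full-search variant described in the table. It is not Drain3's implementation and omits the prefix tree entirely; the function names and the `0.4` threshold are choices made for this example.

```python
WILDCARD = "<*>"


def similarity(template, tokens):
    """Fraction of positions where the template token equals the line token."""
    if len(template) != len(tokens):
        return 0.0
    if not tokens:
        return 1.0
    same = sum(1 for t, tok in zip(template, tokens) if t == tok)
    return same / len(tokens)


def merge(template, tokens):
    """Keep matching tokens and replace differing positions with a wildcard."""
    return [t if t == tok else WILDCARD for t, tok in zip(template, tokens)]


def group(lines, threshold=0.4):
    """Greedy single-pass grouping; returns [(template, count), ...]."""
    clusters = []  # each entry is [template_tokens, count]
    for line in lines:
        tokens = line.split()
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = similarity(cluster[0], tokens)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[0] = merge(best[0], tokens)
            best[1] += 1
        else:
            clusters.append([tokens, 1])
    return [(" ".join(tokens), count) for tokens, count in clusters]


lines = ["worker latency 20ms"] * 3 + ["worker latency 8400ms", "Service starting"]
print(group(lines))
```

On this input the four latency lines collapse into `worker latency <*>` with a count of 4, while `Service starting` stays in its own group because its token count differs.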

## Build a single-file binary

```bash
./scripts/build_binary.sh
./dist/codag-drain-py --help
```

PyInstaller bundles the Python interpreter and `drain3` into one file under
`dist/`. PyInstaller does not cross-compile, so run the build on each OS and
architecture you intend to ship.

## Programmatic API

```python
from codag_drain_py import LogLine, TemplaterConfig, template_logs

result = template_logs(
    [LogLine(message="latency 20ms"), LogLine(message="latency 8400ms")]
)
print(result.render())
print(result.to_json(indent=2))
```

`TemplateIndex` exposes the streaming variant:

```python
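from codag_drain_py import LogLine, TemplateIndex

# Any iterable of message strings works here, e.g. lines tailed from a file.
messages = ["latency 20ms", "latency 8400ms"]

idx = TemplateIndex()
for msg in messages:
    idx.push(LogLine(message=msg))  # feed lines one at a time
print(idx.templates().render())  # render templates for everything pushed so far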
```

## Tests

```bash
pip install -e ".[dev]"
pytest
```

## Credits

- [Drain3](https://github.com/logpai/Drain3) — the underlying log template
  miner from logpai. We use the published PyPI package directly.
- [codag-drain](https://github.com/codag-megalith/codag-drain) — the Rust
  project this Python port is modeled on. The compact-line tokenizer fallback,
  multi-member template derivation, slot profiling, and CLI surface all
  follow that design.
- [Drain paper](http://jiemingzhu.github.io/pub/pjhe_icws2017.pdf) — He et al.,
  "Drain: An Online Log Parsing Approach with Fixed Depth Tree", ICWS 2017.

## License

MIT. See [`LICENSE`](LICENSE).

## Layout

```
src/codag_drain_py/
    __init__.py     public exports
    __main__.py     `python -m codag_drain_py`
    cli.py          argparse + stdin pipeline
    compress.py     templater entry point + rendering
    grouper.py      Drain / DrainStock / DrainDelimited / FullSearch / Statistical
    input.py        heuristic line + NDJSON parsers
    lex.py          character-class tokenizer + lex template derivation
    profile.py      slot capture, numeric stats, distinct-value summaries
    stream.py       TemplateIndex streaming wrapper
    template.py     whitespace template derivation + capture regex
tests/
    test_compress.py
    test_input.py
scripts/
    build_binary.sh PyInstaller --onefile build
```
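
The "capture regex" mentioned for `template.py` can be pictured as follows: a `<*>` template is turned into a regular expression whose groups recover the slot values from a raw line. This is a minimal sketch under that reading, not the module's actual code, and `capture_regex` is a name chosen for the example.

```python
import re


def capture_regex(template):
    """Compile a template into a regex with one capture group per <*> slot."""
    literals = template.split("<*>")
    # Escape the literal text and capture a non-greedy, whitespace-free run per slot.
    pattern = r"(\S+?)".join(re.escape(part) for part in literals)
    return re.compile("^" + pattern + "$")


rx = capture_regex("Tracking webview renderer: pid=<*>, origin=<*>, windowId=<*>")
match = rx.match("Tracking webview renderer: pid=13773, origin=vscode-webview://abc, windowId=1")
print(match.groups())
```

Literal whitespace in the template must match exactly in this version, and a slot cannot contain spaces; a real implementation would likely be more forgiving.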


## MCP server (use as a tool from Kiro / Claude / any MCP client)

The analyser ships with a built-in [Model Context Protocol](https://modelcontextprotocol.io)
server. Once registered with Kiro or Claude Desktop, your assistant can call it
as a tool to compress logs on demand without you piping anything through a
shell.

### What it exposes

Five tools, all served over stdio:

| tool | description |
|------|-------------|
| `analyse_logs` | Compress an inline log body. Returns templated artifact + summary. |
| `analyse_log_file` | Same but reads the body from a local file path. |
| `stream_push` | Append lines to a named streaming session. |
| `stream_project` | Render templates over the accumulated session. |
| `stream_reset` | Clear a session. |

Each tool accepts the full set of analyser options: `grouper`, `sample_cap`,
`template_clip`, `body_format`, `output_format`.
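
For orientation, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds such a request by hand. The `path` argument name and the `grouper` value passed here are assumptions for illustration; check the server's tool schema (via `tools/list`) for the real parameter names.

```python
import json


def tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tool_call(
    1,
    "analyse_log_file",
    {
        "path": "/Users/me/Desktop/logs/cloudtrail_event.txt",  # hypothetical argument name
        "grouper": "drain",  # assumed to accept the grouper names listed earlier
    },
)
print(json.dumps(request, indent=2))
```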

### Build the MCP binary

```bash
./scripts/build_mcp_binary.sh
```

This produces a single self-contained binary at `dist/agentic-log-analyser-mcp`
(~22 MB). It bundles the Python interpreter, the analyser, `drain3`, and the
MCP SDK — no Python install required on the machine that runs it.

### Register with Kiro

Open Kiro's MCP config: either run "Open MCP Config" from the Command Palette,
or edit `.kiro/settings/mcp.json` in your workspace (or
`~/.kiro/settings/mcp.json` for a user-wide setup). Add:

```json
{
  "mcpServers": {
    "agentic-log-analyser": {
      "command": "/absolute/path/to/dist/agentic-log-analyser-mcp",
      "args": [],
      "disabled": false,
      "autoApprove": ["analyse_logs", "analyse_log_file", "stream_project"]
    }
  }
}
```

There's a ready-to-paste example at `examples/mcp_config_kiro.json`. Reload the
MCP config from the MCP Server view in the Kiro feature panel.

### Register with Claude Desktop

Edit `~/Library/Application Support/Claude/claude_desktop_config.json`
(macOS) and merge in:

```json
{
  "mcpServers": {
    "agentic-log-analyser": {
      "command": "/absolute/path/to/dist/agentic-log-analyser-mcp",
      "args": []
    }
  }
}
```

Restart Claude Desktop. The tools will appear in the tools menu.
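
If the config file already lists other servers, the entry has to be merged rather than pasted over the file. One way to script that merge is sketched below; `register_server` is a hypothetical helper, and the demo writes to a temporary directory rather than a real config.

```python
import json
import tempfile
from pathlib import Path


def register_server(config_path, name, command, args=()):
    """Add or replace one entry under "mcpServers", keeping existing servers."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": list(args)}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "claude_desktop_config.json"
    cfg.write_text('{"mcpServers": {"other": {"command": "/bin/true", "args": []}}}')
    merged = register_server(
        cfg,
        "agentic-log-analyser",
        "/absolute/path/to/dist/agentic-log-analyser-mcp",
    )
    print(sorted(merged["mcpServers"]))
```

Back up the real config file before pointing a script like this at it.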

### Use it from a chat

In Kiro or Claude, just ask:

> "Compress this log file and tell me what stands out:
> `/Users/me/Desktop/logs/cloudtrail_event.txt`"

The assistant will select `analyse_log_file`, call it with the path, and
reason over the templated artifact instead of the raw log.

### Debug from the CLI

To run the server manually and tail its output:

```bash
./dist/agentic-log-analyser-mcp
```

It speaks JSON-RPC over stdio. The repo's `scripts/smoke_mcp_binary.py` shows a
real client roundtrip you can use as a reference.
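
For a sense of what such a roundtrip looks like, the sketch below frames an MCP `initialize` request as a single line of JSON and, in `handshake`, sends it to a spawned server. It is not the repo's smoke script; the client name, the protocol version string, and the helper names are assumptions for the example, and `handshake` blocks if the server never answers.

```python
import json
import subprocess


def frame(message):
    """Encode one JSON-RPC message as a single newline-terminated line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(request_id=1):
    """Build the first request an MCP client sends after connecting."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed; use the version your SDK targets
            "capabilities": {},
            "clientInfo": {"name": "smoke-client", "version": "0.0.0"},
        },
    }


def handshake(binary_path):
    """Spawn the server, send `initialize`, and return its first reply as a dict."""
    proc = subprocess.Popen(
        [binary_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    try:
        proc.stdin.write(frame(initialize_request()))
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    finally:
        proc.kill()
        proc.wait()


# Only the framing is exercised here; call handshake("./dist/agentic-log-analyser-mcp")
# once the binary has been built.
print(frame(initialize_request()))
```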
