Metadata-Version: 2.4
Name: jsleekr-logpilot
Version: 1.0.0
Summary: Structured JSON log viewer and analyzer for the terminal
Author-email: JSLEEKR <93jslee@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/JSLEEKR/logpilot
Project-URL: Repository, https://github.com/JSLEEKR/logpilot
Keywords: logging,json,cli,terminal,observability,structured-logs
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: System :: Logging
Classifier: Topic :: Utilities
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

<div align="center">

# 📊 logpilot

### Structured log viewer and analyzer

[![GitHub Stars](https://img.shields.io/github/stars/JSLEEKR/logpilot?style=for-the-badge&logo=github&color=yellow)](https://github.com/JSLEEKR/logpilot/stargazers)
[![License](https://img.shields.io/badge/license-MIT-blue?style=for-the-badge)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.12+-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://python.org)
[![Tests](https://img.shields.io/badge/tests-500%20passing-brightgreen?style=for-the-badge)](#)

<br/>

**Parse, filter, search, and analyze JSON-structured logs without leaving your terminal**

</div>

---

## Why This Exists

Production logs are JSON. Your terminal tools are not. `grep` doesn't know that `"level":"error"` means ERROR. `jq` requires you to know the exact field names. Elasticsearch needs a cluster. Datadog needs a credit card.

`logpilot` is a pure Python, zero-dependency log viewer that understands structured JSON logs out of the box -- auto-detecting field names from 40+ variants, filtering by level and source, evaluating a Boolean query language, and clustering errors by similarity, all from a single command.

- **No infrastructure required** -- works on plain files and stdin, runs anywhere Python 3.12 runs; no servers, no setup, no cost
- **Smart field detection** -- auto-discovers timestamp, level, message, and source fields from 40+ name variants, regardless of framework (Pino, Winston, structlog, Bunyan, Serilog)
- **Powerful query language** -- Boolean expressions such as `level:>=warn AND source:api`
- **Built-in analysis** -- error rate stats, error clustering, anomaly detection, and time histograms without leaving the terminal
- **Multiple export formats** -- JSON, JSONL, CSV, TSV, Markdown

## Requirements

- Python 3.12+
- No external dependencies (pure stdlib)

## How It Works

```
Input          Parse          Filter         Analyze        Format         Output
──────────     ──────────     ──────────     ──────────     ──────────     ──────────
File/stdin  →  Parser      →  Filter      →  Analyzer    →  Formatter   →  Terminal
               • JSON          • level        • stats        • compact      • stdout
               • multiline     • source       • clusters     • pretty       • file
               • auto-         • search       • anomalies    • json         • pipe
                 detect        • query        • patterns     • table
                 fields        • regex
```
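
These stages map one-to-one onto the library API documented under [API Usage](#api-usage) below. A minimal sketch of the same flow in Python, using only the calls that section shows:

```python
from logpilot.parser import parse_file       # Input + Parse
from logpilot.pipeline import Pipeline       # Filter
from logpilot.models import LogLevel
from logpilot.analyzer import compute_stats  # Analyze

entries = parse_file("app.log")              # File -> parsed entries
errors = (
    Pipeline(entries)
    .filter_level(LogLevel.ERROR)            # Filter: minimum level
    .search("timeout")                       # Filter: message search
    .sort_by("timestamp")
    .limit(100)
    .execute()
)
stats = compute_stats(errors)                # Analyze
# Format/Output: plain print here; the CLI's formatters handle this step.
print(f"{len(errors)} matches, error rate {stats.error_rate:.1f}%")
```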

## Installation

```bash
pip install jsleekr-logpilot
```

Or invoke it as a module:

```bash
python -m logpilot app.log
```

## When to Use This

| Scenario | Command |
|---|---|
| View error logs | `logpilot app.log -l error` |
| Monitor live (tail -f) | `logpilot app.log -F` |
| Search for a keyword | `logpilot app.log -s "timeout"` |
| Grep raw log text | `logpilot app.log -g "500"` |
| Filter by source | `logpilot app.log --source api` |
| Advanced query | `logpilot app.log -q "level:>=warn AND source:api"` |
| Show statistics | `logpilot app.log --stats` |
| Detect anomalies | `logpilot app.log --anomalies` |
| Top recurring errors | `logpilot app.log --top-errors 10` |
| Export to CSV | `logpilot app.log --export csv -o report.csv` |
| Pipe from kubectl | `kubectl logs my-pod \| logpilot -l warn` |
| Pipe from docker | `docker logs myapp 2>&1 \| logpilot -s "exception"` |
| Last N entries | `logpilot app.log -t 100` |
| First N entries | `logpilot app.log -n 50` |
| Pretty output | `logpilot app.log --format pretty` |
| No color output | `logpilot app.log --no-color` |

## Quick Start

### View logs with smart formatting

```bash
logpilot app.log
```

### Filter by level

```bash
logpilot app.log --level error          # ERROR and FATAL only
logpilot app.log -l warn                # WARN and above
```

### Search messages

```bash
logpilot app.log --search "timeout"     # Search in message field
logpilot app.log --grep "500"           # Search in raw log text
```

### Query language

```bash
logpilot app.log --query "level:error AND source:api"
logpilot app.log -q "status:>=400"
logpilot app.log -q "level:>=warn AND NOT source:healthcheck"
logpilot app.log -q "method:POST AND path:~users"
```
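
Comparisons such as `level:>=warn` imply an ordering over level names. A standalone sketch of that ordinal-comparison idea -- an illustration only, not logpilot's actual evaluator:

```python
# Hypothetical sketch of how "level:>=warn" can be evaluated:
# map level names to ordinals, then compare the ordinals.
LEVEL_ORDER = {name: rank for rank, name in enumerate(
    ["trace", "debug", "info", "warn", "error", "fatal"])}

def level_at_least(entry_level: str, threshold: str) -> bool:
    """True if entry_level ranks at or above threshold (unknown levels fail)."""
    return LEVEL_ORDER.get(entry_level.lower(), -1) >= LEVEL_ORDER[threshold.lower()]

assert level_at_least("ERROR", "warn")       # error >= warn
assert not level_at_least("info", "warn")    # info < warn
```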

### Pipe from stdin

```bash
cat app.log | logpilot --level error
kubectl logs my-pod | logpilot -l warn --format pretty
docker logs myapp 2>&1 | logpilot --search "exception"
```

### Follow mode (tail -f)

```bash
logpilot app.log --follow --level error
logpilot app.log -F -l warn --search "timeout"
```
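
Follow mode works by watching a real file for newly appended bytes, which is also why it cannot follow stdin (see the FAQ). A generic `tail -f` sketch, independent of logpilot's `watcher` module:

```python
import time

def follow(path: str):
    """Yield new lines appended to path, tail -f style (generic sketch)."""
    with open(path, "r") as f:
        f.seek(0, 2)  # 2 = SEEK_END: start at the current end of file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(0.2)  # no new data yet; poll again shortly

# for line in follow("app.log"): print(line)
```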

### Statistics

```bash
logpilot app.log --stats
```

Output:
```
Log Statistics
========================================
Total entries: 15432
Parse errors: 12
Time span: 4.2h
First entry: 2024-01-01T08:00:00
Last entry: 2024-01-01T12:15:00

By Level:
  INFO : 12000 ( 77.7%) ████████████████░░░░
  WARN :  2500 ( 16.2%) ███░░░░░░░░░░░░░░░░░
  ERROR:   900 (  5.8%) █░░░░░░░░░░░░░░░░░░░
  FATAL:    32 (  0.2%) ░░░░░░░░░░░░░░░░░░░░

Error rate: 6.0%
```

### Anomaly detection

```bash
logpilot app.log --anomalies
```

Detects error rate spikes, recurring error patterns, and unusual time gaps.
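
Spike detection of this kind typically compares per-window error rates against the overall baseline. A simplified sketch of that idea, not logpilot's actual detector:

```python
from collections import Counter

ERROR_LEVELS = {"error", "fatal"}

def error_spikes(samples, factor=3.0):
    """Flag minutes whose error rate exceeds factor x the overall rate.

    samples: iterable of (minute, level) pairs. Sketch only.
    """
    samples = list(samples)
    if not samples:
        return []
    overall = sum(1 for _, lvl in samples if lvl in ERROR_LEVELS) / len(samples)
    totals = Counter(minute for minute, _ in samples)
    errors = Counter(minute for minute, lvl in samples if lvl in ERROR_LEVELS)
    return sorted(m for m in totals if errors[m] / totals[m] > factor * overall)
```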

### Error clustering

```bash
logpilot app.log --top-errors 10
```

Groups similar errors together, normalizing IPs, UUIDs, and numbers.
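
Clustering like this typically replaces the variable parts of a message with placeholders before grouping. A simplified sketch of the technique (logpilot's actual normalizer may differ in its details):

```python
import re
from collections import Counter

# Replace variable tokens so messages that differ only in an IP, UUID,
# or number fall into the same cluster. Order matters: UUIDs and IPs
# must be replaced before bare digit runs.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I)
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
NUM_RE = re.compile(r"\d+")

def normalize(message: str) -> str:
    message = UUID_RE.sub("<uuid>", message)
    message = IP_RE.sub("<ip>", message)
    return NUM_RE.sub("<n>", message)

clusters = Counter(normalize(m) for m in [
    "timeout connecting to 10.0.0.7 after 30s",
    "timeout connecting to 10.0.0.9 after 31s",
])
print(clusters.most_common(1))
# [('timeout connecting to <ip> after <n>s', 2)]
```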

### Export

```bash
logpilot app.log --export json -o filtered.json
logpilot app.log --export csv -o report.csv
logpilot app.log --export jsonl
logpilot app.log --export markdown
```

## CLI Reference

### Positional arguments

| Argument | Description |
|---|---|
| `file` | Log file to read (omit to read from stdin) |

### Filtering

| Flag | Short | Description |
|---|---|---|
| `--level LEVEL` | `-l` | Minimum log level: `trace`, `debug`, `info`, `warn`, `error`, `fatal` |
| `--search TEXT` | `-s` | Search for text in the message field |
| `--grep TEXT` | `-g` | Search for text in the raw log line |
| `--query EXPR` | `-q` | Query expression, e.g. `level:error AND source:api` |
| `--source NAME` | | Filter by source/logger name |
| `--field FIELD VALUE` | | Filter by a specific field value (repeatable) |
| `--has-field FIELD` | | Only show entries that contain a given field |
| `--json-only` | | Only show JSON-parsed entries (skip unparseable lines) |
| `--invert` | `-v` | Invert the filter -- show non-matching entries |

### Display

| Flag | Short | Description |
|---|---|---|
| `--format FORMAT` | `-f` | Output format: `compact`, `pretty`, `json`, `json-pretty`, `raw`, `table` |
| `--no-color` | | Disable colored output |
| `--fields FIELD...` | | Only show these fields in output |
| `--exclude-fields FIELD...` | | Hide these fields from output |

### Limiting

| Flag | Short | Description |
|---|---|---|
| `--head N` | `-n` | Show first N entries |
| `--tail N` | `-t` | Show last N entries |
| `--max-lines N` | | Maximum number of input lines to read |

### Sorting

| Flag | Description |
|---|---|
| `--sort FIELD` | Sort by field (`timestamp`, `level`, `message`, etc.) |
| `--reverse` | Reverse sort order |

### Analysis

| Flag | Description |
|---|---|
| `--stats` | Show log statistics (counts, rates, time span) |
| `--anomalies` | Detect anomalies (error spikes, gaps, recurring errors) |
| `--top-errors [N]` | Show top N error clusters, default 10 |
| `--top-messages [N]` | Show most frequent messages, default 10 |

### Export

| Flag | Short | Description |
|---|---|---|
| `--export FORMAT` | | Export format: `json`, `jsonl`, `csv`, `tsv`, `markdown` |
| `--output FILE` | `-o` | Output file path (default: stdout) |

### Misc

| Flag | Short | Description |
|---|---|---|
| `--follow` | `-F` | Follow file for new entries (tail -f mode) |
| `--version` | `-V` | Print version and exit |

## Output Formats

| Format | Flag | Description |
|--------|------|-------------|
| compact | `--format compact` | Single-line with key=value pairs (default) |
| pretty | `--format pretty` | Multi-line with indented fields |
| json | `--format json` | JSON per line |
| json-pretty | `--format json-pretty` | Pretty-printed JSON |
| raw | `--format raw` | Original log line |
| table | `--format table` | Tabular output |

## Supported Log Formats

logpilot auto-detects fields from JSON logs:

| Field | Detected Names |
|-------|---------------|
| Timestamp | `timestamp`, `time`, `ts`, `@timestamp`, `datetime`, `date` |
| Level | `level`, `severity`, `log_level`, `loglevel`, `lvl` |
| Message | `message`, `msg`, `text`, `body`, `log` |
| Source | `source`, `logger`, `module`, `component`, `service` |

Level aliases: `WARNING`/`CRITICAL`/`VERBOSE`/`NOTICE` are mapped automatically.

Timestamps: ISO 8601, Unix seconds, Unix milliseconds -- all handled.
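
A first-match scan over candidate key names, plus a magnitude heuristic to tell Unix seconds from milliseconds, is enough to illustrate how this can work. A simplified sketch, not logpilot's actual detector:

```python
from datetime import datetime, timezone

# First matching candidate key wins (a subset of the table above).
LEVEL_KEYS = ("level", "severity", "log_level", "loglevel", "lvl")
TIME_KEYS = ("timestamp", "time", "ts", "@timestamp", "datetime", "date")

def detect(record: dict, candidates: tuple):
    return next((record[k] for k in candidates if k in record), None)

def parse_timestamp(value):
    """Accept ISO 8601 strings, Unix seconds, or Unix milliseconds."""
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    # Heuristic: epoch values above ~1e12 must be milliseconds
    # (1e12 seconds would land in the year 33658).
    seconds = value / 1000 if value > 1e12 else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

record = {"ts": 1704096000000, "severity": "warn", "msg": "slow query"}
print(detect(record, LEVEL_KEYS))                  # warn
print(parse_timestamp(detect(record, TIME_KEYS)))  # 2024-01-01 08:00:00+00:00
```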

## Features

| Module | Feature | Description |
|---|---|---|
| `parser` | JSON parsing | Parses JSON logs with multi-format field detection |
| `parser` | Field auto-detect | Discovers timestamp, level, message, source from 40+ variants |
| `parser` | Timestamp normalization | ISO 8601, Unix seconds, Unix milliseconds |
| `filter` | Level filtering | Minimum level threshold (trace → fatal) |
| `filter` | Message search | Case-insensitive substring match in message field |
| `filter` | Raw grep | Substring match against the full raw log line |
| `filter` | Source filter | Filter by logger/service/module name |
| `filter` | Field filter | Match any arbitrary field by value |
| `filter` | Field existence | `--has-field` for entries containing a field |
| `filter` | JSON-only mode | Skip unparseable / plain-text lines |
| `filter` | Invert | Show non-matching entries |
| `filter` | Head / Tail | First or last N entries |
| `query` | Query language | Boolean expressions: `AND`, `OR`, `NOT` |
| `query` | Comparisons | `>=`, `<=`, `>`, `<`, `=` on numeric/level fields |
| `query` | Regex match | `field:~pattern` for substring/regex matching |
| `formatter` | Compact format | Single-line `key=value` with ANSI level colors |
| `formatter` | Pretty format | Multi-line indented field display |
| `formatter` | JSON / JSON-pretty | Structured JSON output per entry |
| `formatter` | Raw format | Original unparsed log line |
| `formatter` | Table format | Aligned tabular output |
| `analyzer` | Statistics | Total counts, error rate, time span, per-level breakdown |
| `analyzer` | Error clustering | Groups similar errors, normalizes IPs, UUIDs, numbers |
| `analyzer` | Anomaly detection | Error-rate spikes, recurring patterns, time gaps |
| `analyzer` | Top messages | Most frequent log messages |
| `pipeline` | Composable pipeline | Chainable filter/sort/limit API |
| `watcher` | Follow mode | Live tail of a file (`tail -f` equivalent) |
| `exporter` | JSON export | Full JSON array export |
| `exporter` | JSONL export | One JSON object per line |
| `exporter` | CSV export | Spreadsheet-ready export |
| `exporter` | TSV export | Tab-separated export (currently reuses the CSV exporter; see FAQ) |
| `exporter` | Markdown export | Table export for docs/wikis |
| `context` | Trace correlation | Group entries by `trace_id` / `request_id` |
| `context` | Slow trace detection | Find traces exceeding a latency threshold |
| `highlighter` | Search highlighting | Highlights matched terms in terminal output |
| `rate` | Throughput tracking | Log lines per second calculation |
| `rate` | Rate spike detection | Detects unusual volume spikes |
| `merge` | Multi-file merge | Load and chronologically merge multiple log files |
| `config` | Config profiles | Named profiles via config file |
| `patterns` | Pattern detection | OOM, timeouts, SSL, auth failures, disk full, DNS, deadlocks |
| `bookmark` | Bookmarks | Annotate and bookmark specific log entries |
| `sampler` | Sampling | Reservoir, stratified, and hash-based sampling |
| `structured` | Structured output | Machine-readable output for programmatic consumption |
| `diff` | Log diff | Compare two log datasets; show new/resolved errors |
| `fields` | Field extraction | Extract, transform, and mask fields |
| `validators` | Log validation | Validate log quality and schema compliance |
| `enrichment` | Enrichment | HTTP status classification, error severity tagging |
| `window` | Windowing | Time-based windowing and session grouping |
| `alerts` | Alerting engine | Rule-based alerting with configurable severity |

## Architecture

```
logpilot/
  models.py       # LogEntry, LogLevel, LogStats, FilterConfig
  parser.py       # JSON log parser with multi-format support
  filter.py       # Filtering engine (level, source, search, regex, field)
  formatter.py    # Output formatters (compact, pretty, json, raw, table)
  analyzer.py     # Statistics, error clustering, anomaly detection
  pipeline.py     # Composable processing pipeline
  query.py        # Query language parser and evaluator
  watcher.py      # File watching (tail -f) support
  exporter.py     # Export to JSON, JSONL, CSV, Markdown
  cli.py          # CLI interface
  context.py      # Request/trace ID correlation
  highlighter.py  # Search match highlighting and ANSI utilities
  rate.py         # Throughput calculation and rate spike detection
  merge.py        # Multi-file log merging
  config.py       # Configuration file support with profiles
  patterns.py     # Built-in pattern detection (OOM, timeout, SSL, etc.)
  bookmark.py     # Bookmark and annotation support
  sampler.py      # Sampling strategies (reservoir, stratified, hash)
  structured.py   # Structured output for programmatic use
  diff.py         # Log dataset comparison
  fields.py       # Field extraction, transformation, masking
  validators.py   # Log quality validation
  enrichment.py   # HTTP status classification, error severity tagging
  window.py       # Time-based windowing and sessionization
  alerts.py       # Alerting rules engine
```

## Advanced Features

### Trace/Request Correlation

```python
from logpilot.context import group_by_trace, find_slow_traces

traces = group_by_trace(entries)
slow = find_slow_traces(entries, threshold_ms=1000)
```

### Pattern Detection

```python
from logpilot.patterns import scan_patterns

matches = scan_patterns(entries)
for match in matches:
    print(f"[{match.count}x] {match.description}: {match.suggestion}")
```

Detects: OOM, connection refused, timeouts, disk full, auth failures, rate limits, DNS failures, SSL errors, null pointers, deadlocks.
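
Conceptually, each built-in pattern pairs a regular expression with a human-readable suggestion. The rules below are hypothetical stand-ins to show the shape of the idea; the real rule set lives in `patterns.py` and may be structured differently:

```python
import re
from dataclasses import dataclass

@dataclass
class PatternRule:
    name: str
    regex: re.Pattern
    suggestion: str

# Hypothetical rules in the spirit of the built-in detectors.
RULES = [
    PatternRule("oom", re.compile(r"out ?of ?memory|OOMKilled", re.I),
                "Check memory limits and look for leaks."),
    PatternRule("timeout", re.compile(r"\btimed? ?out\b", re.I),
                "Inspect upstream latency and retry budgets."),
]

def scan(messages):
    """Count how many messages each rule matches."""
    return {r.name: sum(bool(r.regex.search(m)) for m in messages) for r in RULES}

print(scan(["request timed out", "java.lang.OutOfMemoryError: heap"]))
# {'oom': 1, 'timeout': 1}
```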

### Alerting Rules

```python
from logpilot.alerts import create_default_engine

engine = create_default_engine()
alerts = engine.evaluate(entries)
for alert in alerts:
    print(f"[{alert.severity}] {alert.message}")
```

### Multi-file Merge

```python
from logpilot.merge import load_multiple_files, merge_sorted

entries = load_multiple_files(["app.log", "worker.log", "db.log"])
timeline = merge_sorted(entries)
```

### Log Diff

```python
from logpilot.diff import compare_by_message, compare_error_patterns

diff = compare_by_message(before_entries, after_entries)
errors = compare_error_patterns(before_entries, after_entries)
print(f"New errors: {errors['new_count']}, Resolved: {errors['resolved_count']}")
```

## API Usage

```python
from logpilot.parser import parse_file
from logpilot.pipeline import Pipeline
from logpilot.models import LogLevel
from logpilot.analyzer import compute_stats

# Parse and filter
entries = parse_file("app.log")
errors = (
    Pipeline(entries)
    .filter_level(LogLevel.ERROR)
    .search("timeout")
    .sort_by("timestamp")
    .limit(100)
    .execute()
)

# Analyze
stats = compute_stats(entries)
print(f"Error rate: {stats.error_rate:.1f}%")

# Query language
from logpilot.query import execute_query
result = execute_query(entries, "level:>=warn AND source:api")
print(f"Matched {result.matched} of {result.total_scanned} entries")
```

## Troubleshooting / FAQ

**My logs are not being parsed.**
logpilot expects one JSON object per line. If your logs use multi-line JSON or a non-JSON format, they will be treated as raw lines. Use `--format raw` to view them as-is.

**I see parse errors in `--stats` output.**
Lines that are not valid JSON increment the parse error counter but are not discarded. They appear as entries with `is_json=False`. Use `--json-only` to hide them.

**Colors look wrong in my terminal.**
Run with `--no-color` to disable ANSI color codes.

**`--follow` doesn't work on stdin.**
Follow mode requires a real file path because it watches the file for new bytes. Pipe-based streaming is not supported with `-F`.

**The query syntax returns no results.**
Field names in queries are case-sensitive and must match the actual JSON key. Use `--format json` on a few entries to inspect the exact field names.

**`--export tsv` and `--export csv` produce the same output.**
TSV support reuses the CSV exporter in the current version. A dedicated TSV formatter is on the roadmap.

**How do I suppress INFO logs and only see warnings and above?**
```bash
logpilot app.log -l warn
```

**Can I combine `--query` with `--stats`?**
Yes. Filters (including `--query`) are applied before analysis, so `--stats` reports on the filtered subset.
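
The same ordering is easy to reproduce from Python. A minimal sketch using the documented `Pipeline` and `compute_stats` calls (a level filter stands in for `--query`, since this README only shows `execute_query` returning match counts, not entries):

```python
from logpilot.parser import parse_file
from logpilot.pipeline import Pipeline
from logpilot.models import LogLevel
from logpilot.analyzer import compute_stats

# Filter first, analyze second -- the stats describe only the filtered
# subset, just as --query/--level combine with --stats on the CLI.
entries = parse_file("app.log")
subset = Pipeline(entries).filter_level(LogLevel.WARN).execute()
stats = compute_stats(subset)
print(f"{len(subset)} entries at WARN+, error rate {stats.error_rate:.1f}%")
```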

## License

MIT
