Metadata-Version: 2.4
Name: loggen-lg
Version: 0.1.0
Summary: Synthetic log generator for SIEM and IR exercises
License-Expression: MIT
Keywords: siem,logs,security,testing,ir,blue-team
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Security
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# loggen

Synthetic log generator for SIEM and IR exercises. Output is deterministic for a given seed and scales from megabytes to terabytes. Profiles cover Windows, Linux, network, and EDR-like telemetry.

## Install

```bash
# Editable install; run from the repository root
pip install -e .
```

## Quick start

```python
from loggen import LogGen

gen = LogGen(seed=42)

# Stream 1 000 mixed entries
for entry in gen.stream(["windows", "linux", "edr"], count=1000):
    print(entry.asdict())

# Write 1 GB of mixed logs (gzip)
gen.write("corpus.jsonl.gz", ["windows", "linux", "network", "edr"], size="1GB")

# Run a built-in attack scenario
gen.write("attack.jsonl", [], scenario="lateral_movement", count=50)

# In-memory list
entries = gen.to_list(["network"], count=200)
```
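
Because `write` produces JSON Lines, optionally gzip-compressed, a generated corpus can be read back with the standard library alone. The sketch below is not part of loggen: `read_jsonl` and `count_by` are hypothetical helpers, and the field name you pass to `count_by` depends on the record schema, which this README does not specify.

```python
import gzip
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line; gzip is selected by the .gz suffix."""
    file_path = Path(path)
    opener = gzip.open if file_path.suffix == ".gz" else open
    with opener(file_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def count_by(path: str, key: str) -> Counter:
    """Count records by one top-level field; records lacking it count under None."""
    return Counter(record.get(key) for record in read_jsonl(path))
```

For example, `count_by("corpus.jsonl.gz", "profile")` would tally entries per profile, assuming each record carries a top-level `profile` field.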

## Custom data pools

By default, loggen ships with built-in pools of hostnames, users, and IPs.  
You can replace any of these pools with your own data.

**Inline lists**
```python
from loggen import Session, LogGen

session = Session(
    seed=42,
    hosts=["prod-web-01", "prod-db-01", "prod-cache-01"],
    users=["alice", "bob", "charlie"],
)
gen = LogGen(session=session)
```

**From a file**
```python
from loggen import Session, LogGen, WordList

session = Session(
    seed=42,
    hosts=WordList.from_file("wordlists/hosts.txt"),
    users=WordList.from_file("wordlists/users.json"),
    internal_ips=WordList.from_file("wordlists/ips.csv", column=0),
)
gen = LogGen(session=session)
```

**From a JSON config file**
```python
from loggen import Session, LogGen

session = Session.from_config("config/loggen.json")
gen = LogGen(session=session)
```

See [WIKI.md](WIKI.md) for the config file format and all supported options.

**From an environment variable**
```python
from loggen import WordList

hosts = WordList.from_env("LOGGEN_HOSTS")        # comma-separated
users = WordList.from_env("LOGGEN_USERS", fallback=["admin"])
```
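
To make the comma-separated convention concrete, here is a plain-Python sketch of how such a variable could be parsed. It is not loggen's implementation: `list_from_env` is a hypothetical helper, and returning an empty list when the variable is unset and no fallback is given is an assumption, not documented `WordList.from_env` behaviour.

```python
import os
from typing import Optional, Sequence


def list_from_env(name: str, fallback: Optional[Sequence[str]] = None) -> list[str]:
    """Split a comma-separated environment variable into trimmed, non-empty items."""
    raw = os.environ.get(name, "")
    items = [item.strip() for item in raw.split(",") if item.strip()]
    if items:
        return items
    # Unset or empty variable: use the fallback if one was given (assumed behaviour)
    return list(fallback) if fallback is not None else []
```

With `LOGGEN_HOSTS="web-01, db-01"` exported in the shell, `list_from_env("LOGGEN_HOSTS")` yields the two host names with surrounding whitespace removed.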

## CLI

```bash
# 10 000 mixed entries to stdout (JSON Lines)
python -m loggen generate -p windows -p linux -n 10000

# 500 MB of network logs, gzip, reproducible
python -m loggen generate -p network --size 500MB --seed 42 -o network.jsonl.gz

# Brute-force scenario in CEF format
python -m loggen generate --scenario brute_force -f cef -o brute.cef

# Custom profile weights
python -m loggen generate -p windows -p edr --weight windows:4 --weight edr:1 -n 50000 -o out.jsonl

# List available options
python -m loggen list-profiles
python -m loggen list-scenarios
python -m loggen list-formats
```
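
The `--weight windows:4 --weight edr:1` example suggests a 4:1 mix between the two profiles. The sketch below illustrates what such a weighted, seeded draw looks like using only the standard library. It is an illustration of the idea, not loggen's internals: `parse_weights` and `pick_profiles` are hypothetical names, and whether loggen interprets weights as sampling probabilities is an assumption.

```python
import random
from collections import Counter


def parse_weights(specs: list[str]) -> dict[str, float]:
    """Turn ["windows:4", "edr:1"] into {"windows": 4.0, "edr": 1.0}."""
    weights = {}
    for spec in specs:
        name, _, value = spec.partition(":")
        weights[name] = float(value)  # raises ValueError if the weight is missing
    return weights


def pick_profiles(weights: dict[str, float], count: int, seed: int) -> list[str]:
    """Draw `count` profile names in proportion to their weights."""
    rng = random.Random(seed)  # a private RNG keeps the draw reproducible
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=count)


weights = parse_weights(["windows:4", "edr:1"])
sample = Counter(pick_profiles(weights, count=50_000, seed=42))
print(sample)  # roughly four "windows" entries for every "edr" entry
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions means the same seed always reproduces the same sequence, regardless of other random calls in the process.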

## Profiles

| Profile   | Covers |
|-----------|--------|
| `windows` | Security events 4624/4625/4688/4672/4698/7045/5140/4104 … |
| `linux`   | sshd auth, sudo, cron, systemd, auditd, PAM, kernel |
| `network` | Firewall allow/deny, DNS, HTTP proxy, DHCP |
| `edr`     | Process/file/network/registry/module events with hashes |

## Scenarios

| Scenario           | Description |
|--------------------|-------------|
| `brute_force`      | Repeated logon failures → success |
| `lateral_movement` | Recon → SMB share access → remote execution |
| `priv_esc`         | Service install → SYSTEM shell → 4672 |
| `data_exfil`       | Archive creation → large outbound transfers → DNS tunnelling |
| `persistence`      | Registry run key + scheduled task + binary drop |

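As an example of the kind of detection these scenarios are meant to exercise, the sketch below flags the `brute_force` pattern (repeated logon failures followed by a success) using Windows event IDs 4625 (failed logon) and 4624 (successful logon). The `event_id` and `user` keys are assumed field names, not loggen's documented schema, and `brute_force_users` is a hypothetical helper. Entries are assumed to arrive in time order.

```python
from collections import defaultdict
from typing import Iterable

FAILED_LOGON = 4625   # Windows Security: an account failed to log on
SUCCESS_LOGON = 4624  # Windows Security: an account was successfully logged on


def brute_force_users(entries: Iterable[dict], threshold: int = 5) -> list[str]:
    """Return users who log on successfully after `threshold` or more consecutive failures."""
    streak = defaultdict(int)  # current run of failures per user
    flagged = []
    for entry in entries:
        user = entry.get("user")          # assumed field name
        event_id = entry.get("event_id")  # assumed field name
        if event_id == FAILED_LOGON:
            streak[user] += 1
        elif event_id == SUCCESS_LOGON:
            if streak[user] >= threshold and user not in flagged:
                flagged.append(user)
            streak[user] = 0  # a success ends the failure run
    return flagged
```

A real SIEM rule would normally also bound the failures to a time window; this sketch counts consecutive failures only.
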
## Output formats

| Format   | Description |
|----------|-------------|
| `jsonl`  | JSON Lines / NDJSON (default) |
| `cef`    | ArcSight CEF:0 |
| `syslog` | RFC 5424 syslog |

See [WIKI.md](WIKI.md) for the full API reference.
