Metadata-Version: 2.4
Name: sodatools-core
Version: 0.0.3
Summary: Some CLI tools for text file processing
Author-email: Jonathan Olsten <jonathan.olsten@gmail.com>
License-File: LICENSE
Requires-Python: >=3.11
Requires-Dist: lark>=1.3.1
Requires-Dist: matplotlib>=3.9
Requires-Dist: polars>=1.0
Requires-Dist: typer>=0.24.1
Description-Content-Type: text/markdown

# sodatools

A CLI toolkit for columnar text processing. The tools are pipe-friendly and composable, and they read delimited data (whitespace by default) from stdin.

```bash
pip install sodatools-core
```

Every tool is available both as a subcommand (`soda <tool>`) and as a
direct `sd<tool>` entry point (e.g. `sdalign`, `sdkut`, `sduniq`). The
`sd` prefix keeps the direct names out of the way of coreutils.

## Tools

| Tool | Description |
|------|-------------|
| `align` | Buffer and align columns to fixed widths (auto right-justify numbers) |
| `coltest` | Filter lines by column test expressions (`2gt50`, `1seqOK`, `3m^abc`) |
| `cutw` | Truncate lines to terminal or explicit width |
| `delta` | Compute row-to-row differences for numeric columns |
| `events` | Convert per-row state labels into bounded events with start/stop times |
| `filter` | Filter noisy or invalid rows by column behavior (MAD, run-length, rate, range, clock, monotonicity) |
| `kut` | Select, reorder, and format columns (ranges, exclusions, literals) |
| `nf` | Enforce a specific number of fields per line (drop, pad, or truncate) |
| `plot` | Interactive scatter plots from columnar data (matplotlib) |
| `radix` | Convert integer columns between bases (dec, hex, bin, oct) |
| `sample` | Sample rows (every N, random, percent) or resample by time interval |
| `sample-data` | Generate sample datasets for testing (`weather`, `small`, `noisy`, etc.) |
| `smooth` | Sliding window mean/median smoothing for numeric columns |
| `stats` | Column statistics with formatted table output (mean, std, trend, outliers) |
| `tag` | Assign named states to rows based on column conditions |
| `uniq` | Consecutive deduplication with optional column keys and fuzzy matching |

## Examples

Tag temperature readings into states and convert to events:

```bash
soda sample-data weather | soda tag -H1 hot: 2gt25 cold: 2lt15 normal: 2ge15 2le25 \
  | soda events -H1 --gap 5m --print-header
```

```
state  start                stop                 duration  count
hot    2024-01-01T04:35:00  2024-01-01T10:55:00  6.3h      77
normal 2024-01-01T11:00:00  2024-01-01T18:00:00  7.0h      85
cold   2024-01-01T18:05:00  2024-01-02T04:30:00  10.4h     126
...
```
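
To make the `events` step concrete, here is a minimal Python sketch of how per-row state labels can be collapsed into bounded events. It is not the sodatools implementation: `Event` and `to_events` are hypothetical names, and the assumption that `--gap` splits an event when two consecutive rows are further apart than the given interval is a guess based on the example above.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    state: str
    start: datetime
    stop: datetime
    count: int


def to_events(rows: Iterable[tuple[datetime, str]], gap: timedelta) -> list[Event]:
    """Merge consecutive rows with the same state into one event.

    Assumption: a new event also starts when the time since the previous
    row exceeds `gap`, even if the state label is unchanged.
    """
    events: list[Event] = []
    for ts, state in rows:
        last = events[-1] if events else None
        if last is not None and last.state == state and ts - last.stop <= gap:
            # Same state and no gap: extend the current event.
            events[-1] = last._replace(stop=ts, count=last.count + 1)
        else:
            # State change or time gap: open a new event.
            events.append(Event(state, ts, ts, 1))
    return events
```

Under these assumptions, an event's duration is `stop - start`, which is consistent with the `duration` and `count` columns shown above for 5-minute samples.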

Select columns, compute deltas, and align:

```bash
soda sample-data weather | soda kut 1 2 3 -H1 | soda delta 2 3 -H1 | soda align -H1
```

```
timestamp            temp  humidity
-------------------  ----  --------
2024-01-01T00:05:00  0.03     -0.29
2024-01-01T00:10:00  0.12      0.51
2024-01-01T00:15:00  0.18     -0.87
...
```

Clean noisy sensor data by removing median absolute deviation (MAD) outliers and short mode glitches while respecting time gaps:

```bash
soda sample-data noisy | soda filter 2:mad 3:run --gap 30s --drop | soda align
```
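
For readers unfamiliar with MAD-based outlier detection, the following Python sketch shows the general technique. It is not taken from sodatools: `mad_outliers` is a hypothetical name, and the 3.5 threshold and the 1.4826 scale factor (which makes MAD comparable to a standard deviation for normally distributed data) are common conventions, not documented defaults of `soda filter`.

```python
from statistics import median
from typing import Sequence


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> list[bool]:
    """Flag values whose robust z-score exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All deviations are zero at the median; nothing can be ranked.
        return [False] * len(values)
    scale = 1.4826 * mad
    return [abs(v - med) / scale > threshold for v in values]
```

Because the median and MAD are barely affected by a few extreme readings, a single spike does not inflate the threshold the way it would with a mean and standard deviation.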

Sample every 10th row, compute statistics:

```bash
soda sample-data weather | soda sample every 10 -H1 | soda stats -H1
```

Convert hex register dumps to binary with aligned output:

```bash
echo -e "0xff\n0x0a\n0x80" | soda radix --from hex --to bin -b 8 --align
```

```
11111111
00001010
10000000
```
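
The same conversion can be reproduced in plain Python, which is handy for checking results. This is a sketch rather than the tool's implementation: `convert` is a hypothetical helper, and reading `-b 8` as "zero-pad to 8 digits" is an assumption based on the output above.

```python
def convert(token: str, from_base: int = 16, to_base: int = 2, width: int = 0) -> str:
    """Convert one integer token between bases, zero-padding to `width`."""
    value = int(token, from_base)  # base 16 accepts an optional 0x prefix
    spec = {2: "b", 8: "o", 10: "d", 16: "x"}[to_base]
    return format(value, spec).zfill(width)


for token in ["0xff", "0x0a", "0x80"]:
    print(convert(token, width=8))
# prints 11111111, 00001010, 10000000 (one per line)
```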

Show only the lines where column 2 is greater than 50:

```bash
soda sample-data small | soda coltest 2gt50 -H1
```

Resample time series to 1-hour intervals, then plot temperature and humidity:

```bash
soda sample-data weather | soda sample interval 1h -H1 | soda plot -H1 2 c=red / 3 c=blue
```

Enforce 3 fields per line, pad short rows, smooth, and align:

```bash
soda sample-data small | soda nf 3 -H1 --pad '?' | soda smooth 3 -p 1 | soda align
```

## Common flags

Most tools share these flags:

| Flag | Alias | Description |
|------|-------|-------------|
| `--delimiter` | `-d` | Input field delimiter (default: whitespace) |
| `--header` | `-H` | Treat the first N lines as headers (the examples use `-H1`) |
| `--align` | | Adaptively align output columns |
| `--no-fail` | | Skip problematic rows instead of erroring |
| `--example` | | Show usage examples and exit |

## Plugins

Third-party packages can add new subcommands to `soda` by registering them
under the `sodatools.commands` entry-point group. The core package discovers
plugins at startup and reports any load failures to stderr without crashing the CLI.

```toml
# your plugin's pyproject.toml
[project.entry-points."sodatools.commands"]
mytool = "sodatools_myplugin.cli:mytool"   # exposes `soda mytool`

[project.scripts]
sdmytool = "sodatools_myplugin.entrypoints:sdmytool"  # optional direct entry
```
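
For orientation, the discovery behaviour described at the top of this section could look roughly like the following Python sketch. It uses only the standard-library `importlib.metadata` API; `load_plugins` and its `candidates` parameter are hypothetical and are not the actual sodatools internals.

```python
import sys
from importlib.metadata import EntryPoint, entry_points
from typing import Callable, Iterable, Optional

GROUP = "sodatools.commands"


def load_plugins(
    builtins: set[str],
    candidates: Optional[Iterable[EntryPoint]] = None,
) -> dict[str, Callable]:
    """Load plugin commands, skipping name collisions and broken plugins."""
    if candidates is None:
        candidates = entry_points(group=GROUP)
    commands: dict[str, Callable] = {}
    for ep in candidates:
        if ep.name in builtins:
            print(f"warning: plugin {ep.name!r} collides with a built-in; skipped",
                  file=sys.stderr)
            continue
        try:
            commands[ep.name] = ep.load()
        except Exception as exc:  # a broken plugin must not crash the CLI
            print(f"warning: failed to load plugin {ep.name!r}: {exc}",
                  file=sys.stderr)
    return commands
```

Passing `candidates` explicitly is only a convenience for testing; a real CLI would call `load_plugins` once at startup with the names of its built-in commands.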

The registered object must be a Typer-compatible command function with the
same shape as the built-ins (a plain function with `typer.Option` /
`typer.Argument` defaults). Plugin names that collide with built-ins are
skipped with a warning.
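
As an illustration, a plugin module such as `sodatools_myplugin/cli.py` might contain something like the sketch below. The behaviour (upper-casing one column) and the helper `upper_column` are invented for this example, and the `--header` / `-H` option only mirrors the common flag listed above; the core package may expect additional conventions that are not shown here.

```python
import sys
from typing import Iterable, Iterator

import typer


def upper_column(lines: Iterable[str], column: int, header: int = 0) -> Iterator[str]:
    """Upper-case one 1-based column; header and short lines pass through."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        fields = line.split()
        if i < header or not 0 < column <= len(fields):
            yield line
            continue
        fields[column - 1] = fields[column - 1].upper()
        yield " ".join(fields)


def mytool(
    column: int = typer.Argument(1, help="1-based column to upper-case"),
    header: int = typer.Option(0, "--header", "-H", help="First N lines are headers"),
) -> None:
    """Hypothetical plugin command: read stdin, write transformed lines to stdout."""
    for out in upper_column(sys.stdin, column, header):
        typer.echo(out)
```

Keeping the line-processing logic in a separate pure function makes it testable without a CLI runner. For a quick manual check outside of `soda`, the function can be run directly with `typer.run(mytool)`.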

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for release notes.

## License

MIT
