Metadata-Version: 2.4
Name: nm-tool-forge
Version: 0.2.4
Summary: Analyze MigMan log files and generate aggregated CSV, Markdown, HTML, and optional PDF reports.
Author-email: Stefan Ewald <s.ew@outlook.de>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Jack736-ui/migman_log
Project-URL: Issues, https://github.com/Jack736-ui/migman_log/issues
Keywords: migman,logs,analysis,reporting,csv,markdown,pdf
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: chardet>=5.0
Provides-Extra: pdf
Requires-Dist: weasyprint>=62; extra == "pdf"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: ruff>=0.11; extra == "dev"
Dynamic: license-file

# nm-tool-forge

`nm-tool-forge` analyzes MigMan text log files that contain severity tokens such as `INFO`, `WARNING`, and `ERROR`, and generates aggregated CSV, Markdown, HTML, and optional PDF reports. The package also includes `csvchunking`, a small helper for splitting large CSV files into migration-friendly chunks.

The project uses a package-ready `src` layout. The legacy `log_analysis.py` file remains available as a thin compatibility entry point for older local setups.

## Features

- Parse logical log entries from multi-line text logs
- Normalize recurring error patterns for better aggregation
- Generate aggregated CSV reports
- Generate Markdown summary reports
- Optionally convert reports to HTML and PDF
- Keep a backup copy of analyzed log files
- Split large CSV files into numbered chunks while preserving the header row
- Run built-in self-tests from the CLI
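
As a rough illustration of the first feature, a logical entry can be thought of as a line that carries a severity token plus any continuation lines that follow it. The sketch below is not the package's parser: the regular expression, the `group_logical_entries` name, and the sample log lines are assumptions made for illustration only.

```python
import re
from typing import Iterable, Iterator

# Assumption: a new logical entry starts on any line that contains one of
# these severity tokens. The real parser may use different rules.
SEVERITY_RE = re.compile(r"\b(INFO|WARNING|ERROR)\b")


def group_logical_entries(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (severity, text) pairs, folding continuation lines into the entry."""
    severity = None
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = SEVERITY_RE.search(line)
        if match:
            # A new entry begins: emit the previous one first.
            if severity is not None:
                yield severity, "\n".join(buffer)
            severity, buffer = match.group(1), [line]
        elif severity is not None:
            # No severity token: treat the line as a continuation.
            buffer.append(line)
    if severity is not None:
        yield severity, "\n".join(buffer)


# Invented sample lines; real MigMan logs may look different.
sample = [
    "2024-01-01 10:00:00 INFO Import started",
    "2024-01-01 10:00:01 ERROR Conversion failed",
    "    record 42 could not be read",
    "2024-01-01 10:00:02 WARNING Field truncated",
]
entries = list(group_logical_entries(sample))
for severity, text in entries:
    print(severity, "|", text.replace("\n", " / "))
```

Lines that appear before the first severity token are dropped in this sketch; a real parser would have to decide how to treat them.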

## Installation

Basic installation from a local checkout:

```powershell
python -m pip install .
```

Installation with optional PDF support and developer tools:

```powershell
python -m pip install ".[pdf,dev]"
```

## Command-line usage

After installation, the CLI entry points are available:

```powershell
python -m loganalysis --help
python -m csvchunking --help
loganalysis --help
nm-tool-forge --help
csvchunking --help
```

Typical analysis run:

```powershell
nm-tool-forge --logs-dir logs --out-dir log_analyse_out
```

Analysis with HTML/PDF conversion:

```powershell
nm-tool-forge --logs-dir logs --out-dir log_analyse_out --convert
```

Self-test mode:

```powershell
python -m loganalysis --self-test
```

Legacy compatibility call:

```powershell
python .\log_analysis.py --convert
```

CSV chunking run:

```powershell
csvchunking "data\large_export.csv" --chunk-size 5000
```

The command creates an output directory next to the input file, named after the input file's stem. For example, `data\large_export.csv` is split into files such as `data\large_export\large_export_01.csv`, `data\large_export\large_export_02.csv`, and so on.

CSV chunking with an explicit encoding:

```powershell
python -m csvchunking "data\large_export.csv" --chunk-size 5000 --encoding utf-8-sig
```

Each chunk contains the original header row plus up to `--chunk-size` data rows. The delimiter is detected automatically; if detection fails, the tool falls back to a semicolon (`;`) delimiter.
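
The header handling and delimiter fallback described above can be sketched with the standard library alone. This is an illustrative approximation, not the `csvchunking` implementation: the function name `split_csv_sketch`, the two-digit zero-padded suffix, and the use of `csv.Sniffer` for delimiter detection are assumptions.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def split_csv_sketch(path: Path, chunk_size: int, encoding: str = "utf-8-sig") -> list[Path]:
    """Split a CSV into numbered chunks that each repeat the header row."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be greater than zero")

    out_dir = path.parent / path.stem  # directory named after the CSV stem
    out_dir.mkdir(exist_ok=True)
    written: list[Path] = []

    with path.open(newline="", encoding=encoding) as src:
        sample = src.read(4096)
        src.seek(0)
        try:
            # Assumption: only these candidate delimiters are considered.
            delimiter = csv.Sniffer().sniff(sample, delimiters=";,\t|").delimiter
        except csv.Error:
            delimiter = ";"  # fallback when detection fails

        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return written  # empty input: nothing to split

        while True:
            rows = list(islice(reader, chunk_size))
            if not rows:
                break
            target = out_dir / f"{path.stem}_{len(written) + 1:02d}.csv"
            with target.open("w", newline="", encoding=encoding) as dst:
                writer = csv.writer(dst, delimiter=delimiter)
                writer.writerow(header)  # every chunk starts with the header
                writer.writerows(rows)
            written.append(target)
    return written


# Small demonstration on a temporary file with three data rows.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "large_export.csv"
    source.write_text("id;name\n1;a\n2;b\n3;c\n", encoding="utf-8-sig")
    chunks = split_csv_sketch(source, chunk_size=2)
    chunk_names = [p.name for p in chunks]
    first_chunk = chunks[0].read_text(encoding="utf-8-sig").splitlines()
    last_chunk = chunks[-1].read_text(encoding="utf-8-sig").splitlines()
    print(chunk_names)
```

Reading with `islice` keeps only one chunk in memory at a time, which matters for the large exports this helper targets.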

## Supported CLI options

Log analysis options:

- `--logs-dir`
- `--out-dir`
- `--backup-dir`
- `--top-examples`
- `--convert`
- `--self-test`

CSV chunking options:

- `input_file` - path to the CSV file to split
- `--chunk-size` - required number of data rows per output file; must be greater than zero
- `--encoding` - input and output encoding; defaults to `utf-8-sig`
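
For readers who want to see how the CSV chunking options fit together, the following `argparse` sketch mirrors the three arguments listed above. It is a guess at a reasonable shape, not the contents of `csvchunking/cli.py`; the helper names `positive_int` and `build_parser` are hypothetical.

```python
import argparse


def positive_int(value: str) -> int:
    """Parse an integer and reject zero or negative values."""
    number = int(value)
    if number <= 0:
        raise argparse.ArgumentTypeError("must be greater than zero")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="csvchunking")
    parser.add_argument("input_file", help="path to the CSV file to split")
    parser.add_argument(
        "--chunk-size",
        type=positive_int,
        required=True,
        help="number of data rows per output file",
    )
    parser.add_argument(
        "--encoding",
        default="utf-8-sig",
        help="input and output encoding",
    )
    return parser


args = build_parser().parse_args(["data/large_export.csv", "--chunk-size", "5000"])
print(args.input_file, args.chunk_size, args.encoding)
```

Validating `--chunk-size` in the `type` callback lets `argparse` report the error with the usual usage message instead of failing later during the split.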

## Release process

To publish a new release, upload to TestPyPI first, and upload to PyPI only after the Conda smoke tests pass:

```bash
export TWINE_USERNAME="__token__"
export TWINE_PASSWORD="pypi-..."

bash scripts/release_testpypi.sh --bump patch
bash scripts/release_pypi.sh --yes
```

**Notes:**

- Verify the TestPyPI release before running the PyPI upload.
- PyPI versions cannot be overwritten or reused, so every upload needs a new version number.

## Library usage

```python
from pathlib import Path

from loganalysis import (
    analyze_file,
    convert_report_md_to_html_pdf,
    iter_logical_entries,
    normalize_message,
)
from csvchunking import split_csv

result = analyze_file(Path("logs/app.txt"))
print(result["norm_counts"])

print(normalize_message(
    'Conversion: X =3100110. 138 The record was not found in table "Teile".'
))

for entry in iter_logical_entries(Path("logs/app.txt")):
    print(entry)

convert_report_md_to_html_pdf(
    Path("log_analyse_out/report.md"),
    Path("log_analyse_out/report.html"),
    Path("log_analyse_out/report.pdf"),
)

chunk_result = split_csv(Path("data/large_export.csv"), chunk_size=5000)
print(chunk_result.output_dir)
print(chunk_result.output_files)
```

`split_csv()` returns a `ChunkResult` with the input file, output directory, chunk size, processed data-row count, created file count, and generated output file paths.
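
To make the `normalize_message` call and the `norm_counts` result in the example above more concrete, here is a minimal sketch of pattern normalization followed by counting. It is not the package's algorithm: the placeholder tokens, the two regular expressions, and the second sample message are assumptions, and the real `normalize_message` may produce different output.

```python
import re
from collections import Counter

# Assumption: numbers and double-quoted names are the variable parts of a message.
NUMBER_RE = re.compile(r"\d+")
QUOTED_RE = re.compile(r'"[^"]*"')


def normalize_sketch(message: str) -> str:
    """Replace variable parts with placeholders so similar messages collapse."""
    message = QUOTED_RE.sub('"<NAME>"', message)
    message = NUMBER_RE.sub("<NUM>", message)
    return re.sub(r"\s+", " ", message).strip()


messages = [
    'Conversion: X =3100110. 138 The record was not found in table "Teile".',
    # Invented variant of the message above, used to show aggregation.
    'Conversion: X =3100245. 138 The record was not found in table "Kunden".',
    "Import finished.",
]
counts = Counter(normalize_sketch(m) for m in messages)
for pattern, count in counts.most_common():
    print(count, pattern)
```

Both conversion messages collapse into one pattern here, which is the effect that makes aggregated counts meaningful.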

## Project structure

```text
.
├─ pyproject.toml
├─ src/loganalysis/
├─ src/csvchunking/
├─ tests/
├─ docs/
└─ log_analysis.py
```

Important modules:

- `analysis.py` - file-level and overall aggregation
- `parsing.py` - logical entry detection and parsing
- `normalization.py` - message normalization
- `report_markdown.py` - Markdown report model and rendering
- `report_html.py` - HTML/CSS rendering
- `report_pdf.py` - PDF engine selection and fallback handling
- `converters.py` - Markdown-to-HTML/PDF conversion
- `loganalysis/cli.py` - log analysis command-line entry point
- `csvchunking/chunker.py` - CSV splitting logic and `ChunkResult`
- `csvchunking/cli.py` - CSV chunking command-line entry point

## HTML/PDF conversion

Report conversion is intentionally optional:

- `report.md` remains the primary human-readable output
- `report.html` is generated from the internal report model
- `report.pdf` is created when supported PDF tooling is available

PDF engine preference order:

1. `weasyprint`
2. `wkhtmltopdf`
3. `pandoc` + `xelatex` or `pdflatex`

If no supported PDF engine is available, the analysis still succeeds and generates Markdown and HTML output.
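
The preference order above amounts to a simple first-match probe. The sketch below shows one way to express it and is not the logic in `report_pdf.py`; the function name `pick_pdf_engine`, the returned labels, and the injectable probe callbacks are assumptions.

```python
import importlib.util
import shutil
from typing import Callable, Optional


def pick_pdf_engine(
    has_module: Callable[[str], bool] = lambda name: importlib.util.find_spec(name) is not None,
    has_tool: Callable[[str], bool] = lambda name: shutil.which(name) is not None,
) -> Optional[str]:
    """Return the first available engine in the documented order, or None."""
    # Note: finding the module does not prove that WeasyPrint's native
    # libraries load; a real implementation would need to handle that.
    if has_module("weasyprint"):
        return "weasyprint"
    if has_tool("wkhtmltopdf"):
        return "wkhtmltopdf"
    if has_tool("pandoc"):
        # pandoc needs a LaTeX engine; prefer xelatex over pdflatex.
        for latex in ("xelatex", "pdflatex"):
            if has_tool(latex):
                return f"pandoc+{latex}"
    return None


engine = pick_pdf_engine()
print(engine or "no PDF engine found; Markdown and HTML only")
```

Passing the probes as parameters keeps the selection testable on machines where none of the engines are installed.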

Windows-specific setup notes:

- `docs/install_gtk_weasyprint_windows.md`
- `docs/install_xelatex_windows.md`

## Tests

```powershell
pytest
```

## Local build

```powershell
python -m build
```

Expected artifacts:

- `dist/*.tar.gz`
- `dist/*.whl`

## Notes

The distribution name on PyPI/TestPyPI is `nm-tool-forge`, while the Python import packages remain `loganalysis` and `csvchunking`.

Keeping the existing import names keeps the first public release small and low-risk. A later follow-up release can still rename the import packages if desired.
