Metadata-Version: 2.4
Name: transmog
Version: 2.0.2
Summary: A data transformation library for flattening complex nested structures into tabular formats while preserving hierarchical relationships
License-Expression: MIT
License-File: LICENSE
Keywords: avro,csv,data-pipeline,data-processing,data-transformation,elt,etl,flattening,json,normalization,parquet,pyarrow
Author: Scott Draper
Author-email: admin@scottdraper.io
Requires-Python: >=3.10
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Provides-Extra: dev
Provides-Extra: minimal
Requires-Dist: bandit (>=1.9) ; extra == "dev"
Requires-Dist: cramjam (>=2.7)
Requires-Dist: fastavro (>=1.9)
Requires-Dist: furo (>=2024.8) ; extra == "dev"
Requires-Dist: hjson (>=3.1)
Requires-Dist: hjson (>=3.1) ; extra == "minimal"
Requires-Dist: interrogate (>=1.7) ; extra == "dev"
Requires-Dist: json5 (>=0.13)
Requires-Dist: json5 (>=0.13) ; extra == "minimal"
Requires-Dist: linkify-it-py (>=2) ; extra == "dev"
Requires-Dist: memory-profiler (>=0.60) ; extra == "dev"
Requires-Dist: mypy (>=1.19) ; extra == "dev"
Requires-Dist: myst-parser (>=4) ; extra == "dev"
Requires-Dist: orjson (>=3.11)
Requires-Dist: orjson (>=3.11) ; extra == "minimal"
Requires-Dist: pre-commit (>=4.5) ; extra == "dev"
Requires-Dist: psutil (>=5.8) ; extra == "dev"
Requires-Dist: pyarrow (>=23)
Requires-Dist: pyproject-fmt (>=2.6) ; extra == "dev"
Requires-Dist: pyproject-parser (>=0.7) ; extra == "dev"
Requires-Dist: pytest (>=9) ; extra == "dev"
Requires-Dist: pytest-benchmark (>=4) ; extra == "dev"
Requires-Dist: pytest-cov (>=3) ; extra == "dev"
Requires-Dist: ruff (>=0.14) ; extra == "dev"
Requires-Dist: safety (>=2.3.5) ; extra == "dev"
Requires-Dist: sphinx (>=7,<9) ; extra == "dev"
Requires-Dist: sphinx-autobuild (>=2021.3.14) ; extra == "dev"
Requires-Dist: sphinx-autodoc-typehints (>=1.24) ; extra == "dev"
Requires-Dist: sphinx-copybutton (>=0.5) ; extra == "dev"
Requires-Dist: sphinx-design (>=0.5) ; extra == "dev"
Requires-Dist: sphinx-rtd-theme (>=3) ; extra == "dev"
Requires-Dist: sphinxcontrib-applehelp (>=1) ; extra == "dev"
Requires-Dist: sphinxcontrib-htmlhelp (>=2) ; extra == "dev"
Requires-Dist: sphinxcontrib-jsmath (>=1) ; extra == "dev"
Requires-Dist: sphinxcontrib-mermaid (>=0.8.1) ; extra == "dev"
Requires-Dist: sphinxcontrib-napoleon (>=0.7) ; extra == "dev"
Requires-Dist: sphinxcontrib-qthelp (>=1) ; extra == "dev"
Requires-Dist: sphinxcontrib-serializinghtml (>=1) ; extra == "dev"
Requires-Dist: types-pyyaml ; extra == "dev"
Requires-Dist: types-toml ; extra == "dev"
Requires-Dist: typing-extensions (>=4)
Requires-Dist: typing-extensions (>=4) ; extra == "minimal"
Project-URL: Bug Tracker, https://github.com/scottdraper8/transmog/issues
Project-URL: Documentation, https://scottdraper8.github.io/transmog/
Project-URL: Homepage, https://github.com/scottdraper8/transmog
Description-Content-Type: text/markdown

<div align="center">

# Transmog - Flatten Nested JSON to Tabular Formats

[![Transmog Version](https://img.shields.io/badge/transmog-2.0.2-ff79c6?logo=github&logoColor=white&labelColor=6272a4)](https://github.com/scottdraper8/transmog/releases)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-ffb86c?logo=python&logoColor=white&labelColor=6272a4)](https://www.python.org/downloads/)
[![Poetry](https://img.shields.io/badge/Poetry-1.0+-f1fa8c?logo=poetry&logoColor=282a36&labelColor=6272a4)](https://python-poetry.org/)
[![pre-commit](https://img.shields.io/badge/pre--commit-6.0.0-50fa7b?logo=pre-commit&logoColor=282a36&labelColor=6272a4)](https://github.com/pre-commit/pre-commit)
[![License: MIT](https://img.shields.io/badge/License-MIT-8be9fd?logo=opensourceinitiative&logoColor=white&labelColor=6272a4)](LICENSE)

---

A configurable data flattening tool that transforms nested JSON data into
flat, tabular formats while preserving parent-child relationships.

---

</div>

## Installation

```bash
# Full install (CSV, Parquet, ORC, Avro output)
pip install transmog

# CSV only (no pyarrow, fastavro, or cramjam)
pip install "transmog[minimal]"
```

## Quick Start

```python
import transmog as tm

data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
result = tm.flatten(data, name="users")

result.main                    # Main table
result.tables["users_orders"]  # Child table for the "orders" array
result.save("output.csv")      # Save to file
```
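To make the parent-child relationship concrete, here is a minimal standard-library sketch of the idea behind the Quick Start example. It is not Transmog's implementation: `flatten_sketch` is a hypothetical helper that only handles arrays of objects, and it borrows the `_id` and `_parent_id` field names and the `users_orders` table name from this README.

```python
import uuid


def flatten_sketch(record, name, parent_id=None, tables=None):
    """Illustrative only: split one nested record into named flat tables."""
    tables = {} if tables is None else tables
    row = {"_id": str(uuid.uuid4())}
    if parent_id is not None:
        # Link the child row back to the row it was extracted from.
        row["_parent_id"] = parent_id
    for key, value in record.items():
        is_object_array = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(item, dict) for item in value)
        )
        if is_object_array:
            # Each object in the array becomes a row in a "<parent>_<key>" table.
            for child in value:
                flatten_sketch(child, f"{name}_{key}", row["_id"], tables)
        else:
            # Everything else is copied onto the current row unchanged.
            row[key] = value
    tables.setdefault(name, []).append(row)
    return tables


data = {"user": "Alice", "orders": [{"id": 101}, {"id": 102}]}
tables = flatten_sketch(data, "users")
main_row = tables["users"][0]
order_rows = tables["users_orders"]
```

Every row in `order_rows` carries a `_parent_id` equal to `main_row["_id"]`, which is what lets the flat tables be joined back together later.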

### In-Memory vs Streaming

1. **flatten(data, name, config)** — Flatten data in memory

    ```python
    result = tm.flatten("data.json", name="products")
    result = tm.flatten([{"id": 1}, {"id": 2}])
    result.save("output.parquet")
    ```

2. **flatten_stream(data, output_path, name, output_format)** — Stream directly to disk

    ```python
    tm.flatten_stream("large.jsonl", "output/", name="events", output_format="parquet")
    ```
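The general idea behind streaming is to hold only a bounded number of records in memory at a time. The sketch below shows one way to batch a JSON Lines source using only the standard library. It is an assumption about the technique, not Transmog's internals, and `iter_batches` is a hypothetical helper.

```python
import io
import json
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield lists of parsed records, at most batch_size records per list."""
    # Lazily parse one JSON document per non-blank line.
    records = (json.loads(line) for line in lines if line.strip())
    while batch := list(islice(records, batch_size)):
        yield batch


# Any iterable of lines works; an open file handle behaves the same way.
source = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
batches = list(iter_batches(source, batch_size=2))
```

Because `records` is a generator, each batch is parsed only when the consumer asks for it, so peak memory depends on `batch_size` rather than on the size of the input.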

## Configuration

```python
config = tm.TransmogConfig(
    # Array handling
    array_mode=tm.ArrayMode.SMART,   # SMART (default), SEPARATE, INLINE, SKIP

    # ID generation and metadata fields
    id_generation="random",          # random (default), natural, hash, or ["field1", "field2"]
    id_field="_id",                  # Field name for record IDs
    parent_field="_parent_id",       # Field name for parent references
    time_field="_timestamp",         # Field name for timestamps (None to disable)

    # Data transformation
    include_nulls=False,             # Include null/empty values in output
    stringify_values=False,          # Convert all leaf values to strings

    # Processing controls
    max_depth=100,                   # Maximum recursion depth
    batch_size=1000                  # Records per batch for streaming
)

result = tm.flatten(data, config=config)
```

### Array Modes

| Mode       | Behavior                                                        |
| ---------- | --------------------------------------------------------------- |
| `SMART`    | Preserve simple arrays, extract complex arrays to child tables  |
| `SEPARATE` | Extract all arrays to child tables                              |
| `INLINE`   | Serialize arrays as JSON strings                                |
| `SKIP`     | Omit arrays from output                                         |
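The table can be read as a small decision function. The following sketch mirrors the four behaviors with plain Python. It is only an illustration: `apply_array_mode` is hypothetical, modes are passed as strings rather than `tm.ArrayMode` members, and the `value` column used for scalar items in child rows is an assumption, not a documented Transmog name.

```python
import json


def apply_array_mode(key, values, mode):
    """Return (inline_fields, child_rows) for a single array field."""
    # A "simple" array holds only scalars (no nested objects or arrays).
    is_simple = all(not isinstance(v, (dict, list)) for v in values)
    if mode == "SKIP":
        return {}, []
    if mode == "INLINE":
        # Keep the array on the parent row, serialized as a JSON string.
        return {key: json.dumps(values)}, []
    if mode == "SMART" and is_simple:
        # Simple arrays stay on the parent row as they are.
        return {key: values}, []
    # SEPARATE, or SMART with complex items: one child row per element.
    child_rows = [v if isinstance(v, dict) else {"value": v} for v in values]
    return {}, child_rows


tags = ["a", "b"]
orders = [{"id": 101}, {"id": 102}]
smart_tags = apply_array_mode("tags", tags, "SMART")
smart_orders = apply_array_mode("orders", orders, "SMART")
separate_tags = apply_array_mode("tags", tags, "SEPARATE")
inline_tags = apply_array_mode("tags", tags, "INLINE")
skipped_tags = apply_array_mode("tags", tags, "SKIP")
```

Under `SMART`, `tags` stays on the parent row while `orders` moves to child rows. `SEPARATE` moves both.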

### ID Generation

| Strategy          | Description                                        |
| ----------------- | -------------------------------------------------- |
| `random`          | Generate random UUID (default)                     |
| `natural`         | Use existing ID field from data                    |
| `hash`            | Deterministic hash of entire record                |
| `["field1", ...]` | Deterministic hash of specified fields             |
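The difference between the strategies is easiest to see side by side. The sketch below is a standard-library approximation only. `make_id` is a hypothetical helper, SHA-256 over canonical JSON is an assumed hashing scheme, and the `natural_field` default of `"id"` is an assumption. None of these is taken from Transmog's source.

```python
import hashlib
import json
import uuid


def make_id(record, strategy="random", natural_field="id"):
    """Return a record ID string using one of the four strategies."""
    if strategy == "random":
        return str(uuid.uuid4())
    if strategy == "natural":
        # Reuse an ID that already exists in the data.
        return str(record[natural_field])
    if strategy == "hash":
        payload = record
    else:
        # A list of field names: hash only those fields.
        payload = {field: record.get(field) for field in strategy}
    # Sorting keys makes the hash independent of key order.
    canonical = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"id": 7, "sku": "X-1", "price": 10}
b = {"price": 10, "sku": "X-1", "id": 7}   # same content, different key order
c = {"id": 8, "sku": "X-1", "price": 12}   # same sku, other fields differ

same_record_hash = make_id(a, "hash") == make_id(b, "hash")
same_sku_hash = make_id(a, ["sku"]) == make_id(c, ["sku"])
```

Deterministic strategies produce the same ID when the same input is processed again, which helps with idempotent reprocessing. `random` gives a fresh ID on every run.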

## Documentation

Full documentation: [scottdraper8.github.io/transmog](https://scottdraper8.github.io/transmog)

- [Getting Started](https://scottdraper8.github.io/transmog/getting_started.html)
- [Configuration](https://scottdraper8.github.io/transmog/configuration.html)
- [API Reference](https://scottdraper8.github.io/transmog/api.html)
- [Contributing](https://scottdraper8.github.io/transmog/contributing.html)

## License

MIT License. See the [LICENSE](LICENSE) file for details.

