Metadata-Version: 2.4
Name: thebeast
Version: 0.5.0
Summary: A declarative toolkit for transforming machine-readable data into FollowTheMoney entities
Author: The Beast Contributors
Maintainer: The Beast Contributors
License-Expression: MIT
Keywords: ftm,followthemoney,data-transformation,etl,osint
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: followthemoney~=4.5.0
Requires-Dist: jmespath
Requires-Dist: PyYAML>=6.0
Requires-Dist: smart-open[s3,ssh]>=7.0
Requires-Dist: ijson~=3.4
Requires-Dist: fastjsonschema>=2.15
Requires-Dist: Jinja2>=3.1
Requires-Dist: tqdm>=4.60
Requires-Dist: regex>=2024.0
Requires-Dist: python-dateutil>=2.9
Requires-Dist: names_translator>=1.2
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-subtests; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Dynamic: license-file

# The Beast

A flexible, declarative toolkit for transforming machine-readable data into
[FollowTheMoney](https://followthemoney.tech/) (FTM) entities.

The Beast is currently in **beta**, but it is battle-tested in production on hundreds of data sources.
The mapping format may still evolve to gain flexibility, but changes are introduced cautiously.

## Installation

```bash
pip install thebeast
```

## Quick Start

1. Write a YAML mapping that describes how to read your source data and transform it into FTM entities.
2. Run the mapping:

```bash
beast mapping.yaml
```

3. Alternatively, run the mapping on a small fraction of the input first (here 1%) to check the output:

```bash
beast-sample mapping.yaml --fraction 0.01
```
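For intuition about what a sampling fraction means, here is a minimal sketch of per-record random sampling in plain Python. It is only an illustration of the idea: `sample_records` and its `seed` parameter are hypothetical names, and nothing here is taken from how `beast-sample` selects records internally.

```python
import random


def sample_records(records, fraction, seed=0):
    """Yield each record with probability `fraction` (hypothetical helper).

    This is an assumption-level illustration of fraction sampling,
    not the logic used by beast-sample.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    for record in records:
        if rng.random() < fraction:
            yield record


# Usage: keep roughly 1% of 100,000 synthetic records.
records = ({"id": i} for i in range(100_000))
sampled = list(sample_records(records, fraction=0.01))
print(f"kept {len(sampled)} records")
```

A small sample like this is usually enough to spot a wrong column name or a malformed date before committing to a full run.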

## Features

- **Declarative YAML mappings** - define data transformations without writing code
- **Multiple input formats** - CSV, TSV, JSON, JSONL, with support for compressed and remote files (via `smart_open`)
- **Rich property pipelines** - column extraction, literals, Jinja2 templates, regex operations, transformers, augmentors
- **Nested collections** - handle hierarchical data with JMESPath traversal
- **Statement metadata** - attach provenance at dataset, collection, or property level
- **Multiprocessing** - parallel digest for CPU-bound workloads
- **Built-in transformers** - date parsing, phone/email normalization, transliteration, and more
- **FTM schema validation** - entities are validated against FollowTheMoney schemas
- **Custom FTM ontologies** - extend or replace the standard FTM model with your own schemas

## Mapping Example

```yaml
id: my_dataset

ingest:
  cls: thebeast.ingest.CSVDictReader
  params:
    input_uri: ./people.csv

digest:
  cls: thebeast.digest.SingleProcessDigestor
  meta:
    dataset: { literal: MY_DATASET }
  collections:
    persons:
      path: "[@]"
      entities:
        person:
          schema: Person
          keys:
            - record.id
          properties:
            name:
              template: "{{ record.first }} {{ record.last }}"
            birthDate:
              column: birth
            email:
              column: emails
              regex_split: "[;,]"

dump:
  cls: thebeast.dump.StatementsCSVWriter
  params:
    output_uri: ./output.csv
    error_uri: ./errors.csv
```
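
To make the mapping concrete, the sketch below builds a tiny in-memory CSV and previews the values that the three `person` properties would receive. It is plain Python, not The Beast's implementation. The sample rows and the `preview` helper are hypothetical, and the column names (`id`, `first`, `last`, `birth`, `emails`) are inferred from the mapping. The sketch also assumes that `regex_split` behaves like `re.split` and that surrounding whitespace is trimmed.

```python
import csv
import io
import re

# Hypothetical sample input; column names are inferred from the mapping above.
SAMPLE_CSV = """id,first,last,birth,emails
1,Ada,Lovelace,1815-12-10,ada@example.org;countess@example.org
2,Alan,Turing,1912-06-23,"alan@example.org, turing@example.org"
"""


def preview(row):
    """Approximate the `person` properties for one record (illustrative only)."""
    return {
        # template: "{{ record.first }} {{ record.last }}"
        "name": f"{row['first']} {row['last']}",
        # column: birth
        "birthDate": row["birth"],
        # column: emails + regex_split: "[;,]"
        # (trimming whitespace and dropping empty parts is an assumption)
        "email": [
            part.strip()
            for part in re.split(r"[;,]", row["emails"])
            if part.strip()
        ],
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
previews = [preview(row) for row in rows]
for item in previews:
    print(item)
```

In this sketch both separators in the character class `[;,]` produce separate values, so a single `emails` cell can produce several `email` values for the same entity.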

## Documentation

Full documentation is available in `docs/README.md`, covering:

- Mapping format and all property operations
- Ingestors, digestors, and dumpers
- Statement metadata and provenance
- Nested collections and entity references
- Record and property transformers
- Sampling and testing workflows

## Running Tests

```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "thebeast[dev]"
python -m pytest thebeast/tests/ -v
```

## License

MIT
