Metadata-Version: 2.4
Name: cepheus-benza
Version: 5.3.98
Summary: Airgapped Informatica PowerCenter automated conversion tool with signed audit trails and governance evidence
Author-email: Cepheus Engineering <info@cepheus.in>
License: Proprietary
Project-URL: Homepage, https://cepheus.in/benza
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lxml>=4.9
Requires-Dist: click>=8.0
Requires-Dist: rich>=13.0
Requires-Dist: lark>=1.1
Requires-Dist: sqlglot>=17.0
Requires-Dist: flask>=3.0
Requires-Dist: reportlab>=4.0
Requires-Dist: pypdf>=4.0
Requires-Dist: cryptography>=42.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: jsonschema>=4.17; extra == "dev"
Provides-Extra: signing
Requires-Dist: pyhanko>=0.21; extra == "signing"
Requires-Dist: pyhanko-certvalidator>=0.26; extra == "signing"
Dynamic: license-file

# Cepheus Benza

Airgapped Informatica PowerCenter conversion tool that produces signed audit trails, equivalence reports, risk scores, and pre-conversion analysis — all as tamper-evident PDFs with embedded JSON attachments.

Cepheus Benza reads a PowerCenter repository XML export and writes production-ready code for your target platform — without connecting to any database or network during conversion. Before converting, the analysis report tells you exactly what can be automated, what requires manual effort, and the complexity tier of each workflow, so you know what you're getting into before committing.

---

## Target platforms

| Platform | Output | Orchestration |
|---|---|---|
| **dbt** | SQL models, macros, snapshots, sources.yml | Airflow DAG, Dagster job, Prefect flow, Python script |
| **PySpark / Databricks** | PySpark job scripts, notebook cells | Databricks Workflows |
| **Snowpark / Snowflake** | Snowpark stored procedures | Snowflake Tasks |
| **AWS Glue** | Glue PySpark job scripts | Step Functions, Glue Workflows, Airflow DAG |
| **Azure Data Factory** | Mapping Data Flow JSON, Pipeline JSON | ADF Pipelines |
| **Dataform / BigQuery** | SQLX models (dbt-compatible) | Dataform schedules |

---

## Key capabilities

| Capability | Details |
|---|---|
| Airgapped operation | Core conversion runs with no network access; reconciliation is optional and separate |
| Pre-conversion analysis | Analysis report shows automation rate, complexity tiers (simple/moderate/complex), and manual session count per workflow — before you convert |
| Signed audit trails | Per-node translation records with confidence levels, rule attribution, and reviewer actions — PDF/A-3 with embedded JSON |
| Equivalence reports | Per-mapping behavioural equivalence assessments with step-level detail |
| Risk scores | Risk classification for each mapping, surfaced in the Workbench heatmap |
| Transformation coverage | 17 Informatica types: Source, Target, Expression, Filter, Lookup, Aggregator, Joiner, Router, Union, Sequence Generator, Normalizer, Rank, Sorter, Mapplet, Stored Procedure, Update Strategy, SCD |
| Expression functions | 111 Informatica PC 10.5 functions translated via a formal grammar (Lark + sqlglot) |
| SQL dialect transpilation | Snowflake, BigQuery, Redshift, Databricks, Azure Synapse, generic ANSI SQL |
| SCD support | Type 1, 2, and 3 detection with dbt snapshots for Type 2 |
| Workflow orchestration | Full conversion of Informatica workflows, sessions, worklets, and decision tasks to target orchestration |
| Workbench UI | Local Flask dashboard for reviewing and approving converted mappings |

---

## Requirements

- Python 3.9 or later
- A licensed Cepheus Benza installation (see Licensing below)

---

## Installation

```bash
pip install cepheus-benza
```

Verify the installation:

```bash
cepheus-benza --version
```

---

## Quick start

Every command takes a `spec.toml` project file as its first argument. Copy `docs/spec.toml` as a starting point and fill in the paths for your project. CLI flags override spec values when provided.
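As a minimal sketch of what that file contains: the two keys below are the ones documented in this README (`[global] printable` and `[verify_license] license`); any other keys live in the bundled `docs/spec.toml` template, which is the authoritative schema.

```toml
# Minimal sketch -- only the keys documented in this README are shown.
# Copy docs/spec.toml for the full, authoritative set of options.
[global]
printable = false            # set true for black-on-white printable PDFs

[verify_license]
license = "./license.pdf"    # path to the license issued after analysis
```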

### 1. Analyze your exports

```bash
cepheus-benza analyze spec.toml
```

Scans the export directory configured in `spec.toml` for PowerCenter XML files and produces `analysis_report.pdf` — automation rate, per-mapping confidence levels, complexity tiers, and degradation addendum. The PDF embeds SHA256 checksums as a JSON attachment. Submit it to benza@cepheus.in to obtain a license file.
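The embedded checksum format is the tool's own, but the fingerprints are plain SHA256, so you can cross-check an export file independently with Python's standard library. The helper below is illustrative, not part of the tool:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading in chunks
    so large repository exports don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the result against the fingerprint recorded for the same
# file in the PDF's embedded JSON attachment:
# print(sha256_of("exports/wf_orders.xml"))
```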

### 2. Verify your license

Add the license path to your `spec.toml`:

```toml
[verify_license]
license = "./license.pdf"
```

Then verify:

```bash
cepheus-benza verify-license spec.toml
```

The output shows `VALID`, `MISMATCH`, or `UNLICENSED` for each export file. If a file shows `MISMATCH`, it was modified since analysis — re-run `analyze` and request a new license.

### 3. Convert

```bash
# dbt (default)
cepheus-benza convert spec.toml

# dbt with Snowflake dialect and Airflow orchestration
cepheus-benza convert spec.toml --dialect snowflake --orchestrator airflow

# PySpark / Databricks
cepheus-benza convert spec.toml --target pyspark

# Snowpark / Snowflake
cepheus-benza convert spec.toml --target snowpark

# AWS Glue
cepheus-benza convert spec.toml --target glue

# Azure Data Factory
cepheus-benza convert spec.toml --target adf
```

### 4. Review in the Workbench

```bash
cepheus-benza serve spec.toml
```

Open `http://localhost:8080` in your browser. The Workbench shows a dashboard of all converted mappings with risk scores, SQL previews, audit trails, and an approval workflow.

---

## Output structure (dbt target)

```
dbt_project/
├── dbt_project.yml          # dbt project config; var declarations for $$ parameters
├── README.md                # Conversion summary (mapping count, model count, warnings)
├── models/
│   ├── sources.yml          # Source table definitions
│   ├── <model>.sql          # SELECT with {{ source() }} and {{ ref() }} macros
│   └── <model>.yml          # Column-level tests and documentation
├── macros/
│   └── <mapplet>.sql        # Mapplet logic as reusable dbt macros
├── snapshots/
│   └── <snapshot>.sql       # SCD Type 2 snapshot configs
├── stubs/
│   └── <procedure>.py       # Stored procedure Python stubs for manual implementation
├── orchestration/
│   └── <workflow>.py        # Workflow DAG / script
├── params/
│   ├── dbt_vars.yml         # dbt variable declarations for workflow parameters
│   └── env_template.sh      # Shell env file template
├── validation/
│   └── <model>_validation.sql   # Row-count and aggregate comparison queries
└── reports/
    ├── audit_trail.json         # Per-node translation records with confidence levels
    ├── audit_report.pdf         # Signed audit trail PDF (embedded JSON attachment)
    ├── equivalence_report.json  # Behavioural equivalence assessments
    ├── equivalence_report.pdf   # Signed equivalence PDF (embedded JSON attachment)
    └── risk_scores.json         # Risk scoring for each translated mapping
```

Other targets produce equivalent structures: PySpark job scripts, Snowpark stored procedures, Glue job scripts, or ADF JSON definitions — each with the same `reports/` directory containing signed audit trails and equivalence reports.
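Because `audit_trail.json` is plain JSON, it is easy to post-process in your own tooling. The sketch below tallies translation records by confidence level; the top-level list shape and the `confidence` field name are assumptions — check your own `audit_trail.json` for the actual schema before relying on this:

```python
import json
from collections import Counter

def confidence_summary(audit_path: str) -> Counter:
    """Tally audit-trail records by confidence level.

    Assumes the file is a JSON list of records, each carrying a
    "confidence" field -- verify both against your generated
    audit_trail.json before using this in anger.
    """
    with open(audit_path) as f:
        records = json.load(f)
    return Counter(r.get("confidence", "unknown") for r in records)
```

A summary like this can feed a review checklist: anything tallied under `approximation` or `stub` is a candidate for manual inspection in the Workbench.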

---

## Licensing

Cepheus Benza uses a checksum-based license that ties a specific license file to specific export files.

```
analyze  →  submit report  →  receive license  →  convert
```

The `analyze` command produces a PDF containing SHA256 checksums of your export files embedded as a JSON attachment. Cepheus issues a license that approves exactly those files. The `convert` command validates the checksums before running the pipeline. This prevents accidental conversion of modified or untested exports.

To produce a black-on-white printable version of any PDF output, set `printable = true` in `[global]` or use the `--printable` flag:

```bash
cepheus-benza --printable analyze spec.toml
```

---

## Governance artifacts

Four governance artifacts (three signed PDF reports and one JSON file) are produced across the analyze and convert workflow. None of them can be disabled.

**`analysis_report.pdf`** — produced by `analyze`, before conversion:
- Automation rate, mapping classification, node confidence breakdown
- Mapping and orchestration complexity tiers (simple / moderate / complex)
- Degradation addendum listing untranslatable nodes and manual sessions
- Embedded `analysis_checksums.json` with SHA256 fingerprints for licensing

**`audit_report.pdf`** — produced by `convert`:
- Every translation decision: node name, type, mapping, handling rule
- Confidence level: `exact_equivalent`, `semantic_equivalent`, `behavioural_equivalent`, `approximation`, or `stub`
- Informatica input snapshot and target output snapshot
- Reviewer action required per node

**`equivalence_report.pdf`** — produced by `convert`:
- Per-mapping behavioural equivalence assessment with step-level detail

**`risk_scores.json`** — produced by `convert`:
- Risk classification for each mapping, surfaced in the Workbench risk heatmap

All PDFs are PDF/A-3 with embedded JSON attachments and HMAC integrity signatures.
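The key provisioning and the exact byte scope Benza signs are internal to the tool, but the verification primitive itself is the standard one. As an illustration of constant-time HMAC-SHA256 checking with Python's standard library (the key and payload here are placeholders, not the tool's actual signing inputs):

```python
import hashlib
import hmac

def verify_hmac(payload: bytes, key: bytes, expected_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 tag against an expected
    hex value. Illustrative only: what bytes Benza actually signs, and
    how its keys are provisioned, are internal to the tool."""
    computed = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position via timing
    return hmac.compare_digest(computed, expected_hex)
```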

---

## Reconciliation (optional)

The reconciliation engine requires SQLAlchemy and a database driver for your warehouse. Install both before using the `reconcile` command — run `cepheus-benza howto` for the full guide with driver packages and connection setup.

After loading your converted output into the target warehouse:

```bash
cepheus-benza reconcile spec.toml
```

Results are written to `reports/reconciliation_report.json` and `reports/reconciliation_report.pdf`, and surfaced in the Workbench.

---

## Further reading

The complete How-To Guide is bundled with the package. After installation:

```bash
cepheus-benza howto
```

This opens the full guide in your browser — installation, spec.toml configuration, all six target platforms, reconciliation setup, governance artifacts, custom rules, environment variables, and troubleshooting. Works offline.
