Cepheus Benza
How-To Guide
benza@cepheus.in

Get started with PowerCenter conversion

From export analysis to signed governance artifacts: six target platforms, air-gapped operation, and full audit trails.

Installation
pip install cepheus-benza
cepheus-benza --version
Minimal spec.toml
[analyze]
path   = "./exports/"
output = "./analysis.pdf"

[convert]
path    = "./exports/"
license = "./license.pdf"
output  = "./output/"
target  = "dbt"
dialect = "snowflake"

Every command reads from spec.toml; CLI flags override spec values when provided. Relative paths resolve from the spec file's directory, so with spec.toml in /work, path = "./exports/" resolves to /work/exports/.

1. Analyze Your Exports

Scan PowerCenter XML exports to assess automation potential before committing to a conversion.

Automation Rate
Per-mapping confidence levels across the portfolio
Complexity Tiers
Simple / moderate / complex for mapping and orchestration
Degradation Report
Untranslatable nodes and manual sessions disclosed upfront
SHA256 Checksums
Embedded in the PDF for license issuance
spec.toml
[analyze]
path         = "./exports/"
output       = "./analysis.pdf"
summary_only = false   # true: portfolio summary only, no per-mapping detail
no_recursion = false   # true: do not scan subdirectories of path

Submit analysis_report.pdf to benza@cepheus.in. The PDF embeds cryptographic fingerprints of your exports. Cepheus replies with a license file.

Why checksums? The license is tied to your export files as they existed when analyzed. Modified files won't match and conversion is blocked — preventing accidental conversion of untested exports.
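To see which fingerprints your exports will produce before submitting, you can hash them locally. A minimal sketch in Python; SHA-256 is the algorithm named in the report, but how the PDF formats the digests is an assumption:

import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    # Stream the file in chunks so large exports are not loaded into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

for export in sorted(Path("./exports").glob("*.xml")):
    print(f"{export.name}  sha256: {sha256_of(export)}")

If a later verify_license run reports MISMATCH, re-hashing the file this way shows whether it changed on disk.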

2. Obtain and Verify a License
spec.toml
[verify_license]
path    = "./exports/"
license = "./license.pdf"
VALID      repo_export.xml      (sha256: a1b2c3...)
VALID      archive_export.xml   (sha256: f6e5d4...)

MISMATCH means the file was modified since analysis (re-run analyze); UNLICENSED means the file is not covered by this license.

3. Convert
spec.toml
[convert]
path         = "./exports/"
license      = "./license.pdf"
output       = "./output/"
target       = "dbt"          # dbt | pyspark | snowpark | glue | adf | dataform
dialect      = "snowflake"    # snowflake | bigquery | redshift | databricks | synapse | generic
orchestrator = "airflow"      # python | airflow | dagster | prefect (dbt only)

Target platforms

dbt
SQL models, Jinja macros, snapshots
PySpark
Databricks notebooks, job scripts
Snowpark
Snowflake stored procedures
AWS Glue
Glue PySpark + GlueContext scripts
ADF
Mapping Data Flows, Pipeline JSON
Dataform
BigQuery SQLX models

Platform-specific settings

spec.toml — PySpark
[convert]
target         = "pyspark"
workspace_path = "/Workspace/benza"
spec.toml — Snowpark
[convert]
target    = "snowpark"
warehouse = "COMPUTE_WH"
spec.toml — Glue
[convert]
target        = "glue"
glue_database = "benza_db"

Custom translation rules

spec.toml
[convert]
rules_dir = "./my_custom_rules"

See Custom Rules for how to write rule classes.

Understanding the Output

Structure varies by target. All produce a reports/ directory with signed governance artifacts.

dbt

output/
├── dbt_project.yml
├── models/           <model>.sql + sources.yml + .yml docs
├── macros/           <mapplet>.sql
├── snapshots/        SCD Type 2 snapshot configs
├── orchestration/    Airflow / Dagster / Prefect / Python
├── validation/       Reconciliation queries
└── reports/          audit_trail.json + .pdf | equivalence_report.json + .pdf | risk_scores.json

PySpark

output/
├── jobs/             PySpark job scripts
├── notebooks/        Databricks notebook cells
├── orchestration/    Databricks Workflow JSON
└── reports/

Snowpark

output/
├── procedures/       Snowpark stored procedures
├── orchestration/    Snowflake Task DAG (CREATE TASK)
└── reports/

Glue

output/
├── jobs/             Glue PySpark + GlueContext
├── orchestration/    Step Functions | Glue Workflow | Airflow
└── reports/

ADF

output/
├── dataflows/        Mapping Data Flow JSON
├── pipelines/        ADF Pipeline JSON
├── datasets/ + linkedServices/ + arm_template.json
└── reports/

Dataform

output/
├── definitions/      SQLX models
├── includes/         JavaScript macros
├── workflow_settings.yaml
└── reports/

About the PDFs: Every PDF is PDF/A-3 with the JSON data embedded as an attachment, so a single file serves both humans (the rendered report) and machines (the JSON). HMAC integrity metadata detects tampering.
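Because the JSON rides along as a standard PDF attachment, any PDF/A-3-aware tool can extract it for scripting. A sketch using pikepdf; the attachment name here is an assumption, so list pdf.attachments first to see what a given report actually embeds:

import json
import pikepdf

with pikepdf.open("output/reports/audit_trail.pdf") as pdf:
    # Show every embedded file name first.
    print(list(pdf.attachments.keys()))
    # "audit_trail.json" is a hypothetical attachment name.
    raw = pdf.attachments["audit_trail.json"].get_file().read_bytes()
    data = json.loads(raw)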

Confidence levels

Level                   Meaning
exact_equivalent        Functionally identical
semantic_equivalent     Same result, different syntax
behavioral_equivalent   Same in normal cases; edge cases may differ
approximation           Best-effort. Manual review required
stub                    Not translated. Manual implementation required
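These levels also appear per node in reports/audit_trail.json, so a review queue can be scripted. A sketch assuming the file holds per-node records with a confidence field; the exact schema may differ, so inspect one record first:

import json
from collections import Counter

with open("output/reports/audit_trail.json") as fh:
    trail = json.load(fh)

# Assumed shape: a list of per-node records, each with a "confidence" key.
print(Counter(node["confidence"] for node in trail))

needs_review = [n for n in trail if n["confidence"] in ("approximation", "stub")]
print(f"{len(needs_review)} nodes need manual attention")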
4. Review in the Workbench

Local Flask dashboard for reviewing conversion results.

spec.toml
[serve]
output_dir = "./output/"
port       = 8080
host       = "127.0.0.1"
Risk Heatmap
Mappings colour-coded by risk score
Confidence Breakdown
Distribution across all translated nodes
Mapping Detail
Full output, audit trail, approval controls
5. Approve or Flag

From the Workbench, each mapping can be Approved or Flagged (with notes). Decisions are recorded in reports/approvals.jsonl — append-only, immutable, timestamped. Part of the governance evidence trail.
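Since the log is plain JSON Lines, it folds easily into external governance tooling. A sketch for summarising decisions; the field names (timestamp, mapping, decision) are assumptions, so check one record against your own output:

import json

with open("output/reports/approvals.jsonl") as fh:
    records = [json.loads(line) for line in fh if line.strip()]

# Hypothetical field names; print one raw record to confirm the schema.
for rec in records:
    print(rec.get("timestamp"), rec.get("mapping"), rec.get("decision"))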

6. Reconcile Against Live Data

Optional. Runs validation queries against the source and target databases and compares the results.

spec.toml
[reconcile]
output_dir    = "./output/"
source_conn   = "postgresql://user:pass@source-host:5432/db"
target_conn   = "snowflake://user:pass@account/db/schema?warehouse=wh"
source_schema = "operational"
target_schema = "analytics"
tolerance     = 0.0   # numeric comparison tolerance (0.0 = exact match)
timeout       = 120   # per-query limit

Database drivers

Warehouse    Package
Snowflake    snowflake-sqlalchemy
BigQuery     sqlalchemy-bigquery
Redshift     sqlalchemy-redshift
PostgreSQL   psycopg2-binary
Databricks   databricks-sql-connector

Result statuses

Status   Meaning
green    All checks passed within tolerance
yellow   Passed within tolerance but not exact
red      One or more checks failed
error    Query execution error

Governance Artifacts

Produced on every conversion run. Cannot be disabled.

audit_trail.json
Per-node translation record: rule, confidence, warnings, reviewer action
equivalence_report.json
Per-mapping behavioural equivalence with step-level detail
risk_scores.json
Risk classification for Workbench heatmap
Signed PDFs
PDF/A-3 with embedded JSON. HMAC tamper detection.

Audit trail commands

spec.toml
[audit_report]
output_dir = "./output/"
export     = ""               # leave blank for all, or set to a mapping name
printable  = false
Target Platforms
Platform   Target value   Output                          Orchestration
dbt        dbt            SQL models, macros, snapshots   Airflow, Dagster, Prefect, Python
PySpark    pyspark        Job scripts, notebooks          Databricks Workflows
Snowpark   snowpark       Stored procedures               Snowflake Tasks
AWS Glue   glue           Glue PySpark scripts            Step Functions, Glue Workflows, Airflow
ADF        adf            Data Flows, Pipelines, ARM      ADF Pipelines
Dataform   dataform       SQLX, JS includes               Dataform schedules

All share the same translation engine, expression grammar, and SCD detection. Governance artifacts are identical across targets.

SQL Dialects (dbt only)
Dialect        Value        Highlights
Generic ANSI   generic      Default
Snowflake      snowflake    FLATTEN, TO_TIMESTAMP_NTZ
BigQuery       bigquery     DATE_TRUNC, SAFE_DIVIDE
Redshift       redshift     GETDATE(), DATEADD
Databricks     databricks   DATE_FORMAT, DATEDIFF
Synapse        synapse      CONVERT, DATEPART

Other targets (PySpark, Snowpark, Glue, ADF, Dataform) generate platform-native code directly — no dialect transpilation.

Orchestrators

dbt target

Orchestrator   Value     Output
Python         python    Plain Python script
Airflow        airflow   DAG with PythonOperator
Dagster        dagster   Job with @op definitions
Prefect        prefect   Flow with @task definitions

Other targets — built-in

Target     Orchestration
PySpark    Databricks Workflow JSON
Snowpark   Snowflake Task DAG
Glue       Step Functions + Glue Workflow + Airflow DAG
ADF        ADF Pipeline JSON
Dataform   workflow_settings.yaml
Custom Translation Rules
from cepheus_benza.core.rules.base import TranslationRule

class MyLookupRule(TranslationRule):
    priority = 50  # higher priority wins over built-in rules

    def applies_to(self, node) -> bool:
        # Claim Lookup transformations whose names carry the CUSTOM_ prefix.
        return node.transformation_type == "Lookup" and "CUSTOM_" in node.name

    def translate(self, node, context):
        # Build and return the translated output for this node; the return
        # contract is documented under Custom Rules.
        ...

Place rule files in a directory and set rules_dir in [convert]. Higher priority takes precedence over built-in rules.

Environment Variables

Override spec.toml values. Useful for credentials you don't want in config files.

Variable                         Spec equivalent
CEPHEUS_BENZA_SOURCE_CONN        [reconcile] source_conn
CEPHEUS_BENZA_TARGET_CONN        [reconcile] target_conn
CEPHEUS_BENZA_OUTPUT_DIRECTORY   [convert] output
CEPHEUS_BENZA_LOGGING_LEVEL
CEPHEUS_BENZA_SMTP_PASSWORD      [notification] smtp_password
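One way to use these is to inject credentials at invocation time instead of writing them into spec.toml. A sketch; the reconcile subcommand name is an assumption (it mirrors the spec section), so confirm it with cepheus-benza --help:

import os
import subprocess

env = {
    **os.environ,
    "CEPHEUS_BENZA_SOURCE_CONN": "postgresql://user:pass@source-host:5432/db",
    "CEPHEUS_BENZA_TARGET_CONN": "snowflake://user:pass@account/db/schema?warehouse=wh",
}
# Hypothetical subcommand, assumed to mirror the [reconcile] spec section.
subprocess.run(["cepheus-benza", "reconcile"], env=env, check=True)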
Troubleshooting

"No licensed exports found"

The checksums don't match: a file was modified after analysis, or the wrong license file is in use. Re-run analyze and request an updated license.

"LicenseNotFoundError"

Check the license path in spec.toml. Ensure the file exists and is readable.

Parse errors

The file is not a valid PowerCenter export. Verify that it was exported from Repository Manager, is UTF-8 or Latin-1 encoded, and is not truncated.

Missing output models

Some transformation types produce no output without a connected Target. Check the audit trail for stub confidence or disconnected-node warnings.

Reconciliation: sqlalchemy not installed

pip install sqlalchemy <driver-for-your-warehouse>   # e.g. snowflake-sqlalchemy for Snowflake

Reconciliation: connection errors

  • Verify that the connection string format matches your driver (see the sketch below)
  • Check network access to both databases
  • Increase the per-query limit via timeout in spec.toml
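To separate driver and network problems from the reconcile step itself, test the connection string on its own with SQLAlchemy (the URL below is a placeholder; use the same string as source_conn or target_conn, with the matching driver installed):

from sqlalchemy import create_engine, text

# Placeholder URL; substitute your [reconcile] connection string.
engine = create_engine("postgresql://user:pass@source-host:5432/db")

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())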

Workbench shows no models

Verify output_dir in [serve] matches output in [convert], and reports/audit_trail.json exists.