Metadata-Version: 2.3
Name: prophecy-lineage-extractor
Version: 0.21.1
Summary: 
Author: Ashish Patel
Author-email: ashishpatel0720@gmail.com
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: openpyxl (>=3.1.5,<4.0.0)
Requires-Dist: pandas (>=2.2.3,<3.0.0)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Requires-Dist: sqlglot (>=18.11.6,<19.0.0)
Requires-Dist: websocket-client (>=1.8.0,<2.0.0)
Description-Content-Type: text/markdown

# Prophecy Lineage Extractor Documentation

## Description
The `Prophecy Lineage Extractor` is a tool to extract lineage information from Prophecy projects and pipelines. It allows users to specify a project, pipeline, and branch, and outputs the extracted lineage to a specified directory. Optional features include email notifications.

---

## Usage
```bash
python -m prophecy_lineage_extractor --project-id <PROJECT_ID> --pipeline-id <PIPELINE_ID> --output-dir <OUTPUT_DIRECTORY> [--send-email] [--branch <BRANCH_NAME>] [--run-for-all]
```
* The environment variables **PROPHECY_URL** and **PROPHECY_PAT** must be set.
---
## Arguments
### Required Arguments
* **--project-id**
  * Type: str
  * Description: Prophecy Project ID.
  * Required: Yes

* **--pipeline-id**
  * Type: str
  * Description: Prophecy Pipeline ID.
  * Required: Yes with the `lineage` reader; optional with the `knowledge-graph` reader.
* **--output-dir**
  * Type: str
  * Description: Output directory inside the project where lineage files will be stored.
  * Required: Yes

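The conditional requirement on `--pipeline-id` can be easier to see in code. The following is a minimal sketch, assuming an `argparse`-style parser; `build_parser` and `parse_args` are hypothetical helpers written for illustration and are not taken from the package, whose actual parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the package's own)."""
    parser = argparse.ArgumentParser(prog="prophecy_lineage_extractor")
    parser.add_argument("--project-id", type=str, required=True,
                        help="Prophecy Project ID.")
    # Declared optional here because its requirement depends on --reader.
    parser.add_argument("--pipeline-id", type=str, default=None,
                        help="Prophecy Pipeline ID.")
    parser.add_argument("--output-dir", type=str, required=True,
                        help="Directory where lineage files will be stored.")
    parser.add_argument("--reader", type=str, default=None,
                        help="Reading adapter: 'lineage' or 'knowledge-graph'.")
    return parser


def parse_args(argv):
    """Parse argv and enforce the documented --pipeline-id rule."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # --pipeline-id may be omitted only with the knowledge-graph reader.
    if args.pipeline_id is None and args.reader != "knowledge-graph":
        parser.error("--pipeline-id is required unless --reader is 'knowledge-graph'")
    return args


if __name__ == "__main__":
    parsed = parse_args(["--project-id", "36587", "--output-dir", "out",
                         "--reader", "knowledge-graph"])
    print(parsed.project_id, parsed.pipeline_id, parsed.reader)
```

Checking the rule after parsing, rather than marking `--pipeline-id` as `required=True`, keeps the `knowledge-graph` case valid while still failing early for the `lineage` reader.
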
### Optional Arguments
* **--reader**
  * Type: str
  * Description: Reading adapter to use, one of:
    * Spark Lineage (`lineage`)
      * Note that `--pipeline-id` is currently a mandatory argument for this reader.
    * SQL Knowledge Graph (`knowledge-graph`)
* **--writer**
  * Type: str
  * Description: Data format to write, one of:
    * Excel files (xlsx workbook)
      * A file named `lineage_<project-id>_(<optional_pipeline-ids>.xlsx)` is created in `<output_dir>`.
      * An Excel sheet is created for each pipeline mentioned in the query. If `run-for-all` is used, a sheet is created for every pipeline.
      * If `run-for-all` flag is used, an additional `Overall Project` sheet will be created. NOTE that it 
    * OpenLineage format
      * [Dummy Run Events](https://openlineage.io/apidocs/openapi/) are saved as JSON files in OpenLineage format in the `output-dir/<project-id>` folder.
      * The extractor attempts an API call to an OpenLineage-compatible frontend such as [Marquez](https://marquezproject.ai/) or [DataHub](https://datahub.com/).
      * Column-level lineage and project-level lineage are both supported by this writer.
* **--run-for-all**
  * Type: boolean flag
  * Description: If specified, a project-level lineage Excel file is created, including an `Overall Project` sheet.
* **--send-email**
  * Type: flag
  * Description: If specified, sends an email with the generated lineage report to the address in the environment variable **RECEIVER_EMAIL**.
  * The following environment variables must be set when this option is passed:
    * SMTP_HOST
    * SMTP_PORT
    * SMTP_USERNAME
    * SMTP_PASSWORD
    * RECEIVER_EMAIL

* **--branch**
  * Type: str
  * Description: Branch to run the lineage extractor on.
  * Default: the default branch in Prophecy, generally `main` or `master`.

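To make the OpenLineage output described under `--writer` more concrete, the sketch below builds one run event with a column-level lineage facet as a plain dictionary. It follows the general field layout of the OpenLineage specification as an illustration only: `make_run_event`, the producer URI, the pinned schema versions, and all namespace, dataset, and column names are assumptions, and the files actually written by the extractor may be structured differently.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumptions: the producer URI and the schema versions below are illustrative;
# pin them to the OpenLineage spec version you actually target.
PRODUCER = "https://example.com/lineage-sketch"
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
COLUMN_LINEAGE_SCHEMA = (
    "https://openlineage.io/spec/facets/1-0-1/"
    "ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet"
)


def make_run_event(job_namespace, job_name, inputs, outputs, column_lineage=None):
    """Build a minimal COMPLETE run event as a plain dict.

    inputs / outputs: lists of (namespace, name) dataset tuples.
    column_lineage: {output_dataset: {output_column: [(ns, dataset, column), ...]}}
    """
    column_lineage = column_lineage or {}
    output_datasets = []
    for namespace, name in outputs:
        dataset = {"namespace": namespace, "name": name}
        fields = column_lineage.get(name)
        if fields:
            # Attach the column-level lineage facet to this output dataset.
            dataset["facets"] = {
                "columnLineage": {
                    "_producer": PRODUCER,
                    "_schemaURL": COLUMN_LINEAGE_SCHEMA,
                    "fields": {
                        column: {
                            "inputFields": [
                                {"namespace": ns, "name": ds, "field": col}
                                for ns, ds, col in sources
                            ]
                        }
                        for column, sources in fields.items()
                    },
                }
            }
        output_datasets.append(dataset)
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": RUN_EVENT_SCHEMA,
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": output_datasets,
    }


if __name__ == "__main__":
    # Hypothetical datasets and columns, for illustration only.
    event = make_run_event(
        "example_project", "customer_orders_demo",
        inputs=[("warehouse", "orders")],
        outputs=[("warehouse", "customer_totals")],
        column_lineage={"customer_totals": {"total": [("warehouse", "orders", "amount")]}},
    )
    print(json.dumps(event, indent=2))
```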
---
## Running

* Run the extractor as shown below; it requires the environment variables to be set first.
* The SMTP credentials only need to be set when the `--send-email` argument is passed.

```shell
export PROPHECY_URL=https://app.prophecy.io
# ${{ secrets.* }} is GitHub Actions syntax; in a local shell, substitute the actual values
export PROPHECY_PAT=${{ secrets.PROPHECY_PAT }}

# These are needed only if you use the --send-email option
export SMTP_HOST=smtp.gmail.com
export SMTP_PORT=587
export SMTP_USERNAME=${{ secrets.SMTP_USERNAME }}
export SMTP_PASSWORD=${{ secrets.SMTP_PASSWORD }}
export RECEIVER_EMAIL=ashish@prophecy.io

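# --output-dir is a required argument; ./lineage_output is only an example path
python -m prophecy_lineage_extractor --project-id 36587 --pipeline-id 36587/pipelines/customer_orders_demo --output-dir ./lineage_output --send-email --branch dev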
```
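
When the extractor is launched from a script rather than a shell, the same setup can be checked before the run. The following is a minimal sketch, assuming the environment variable names and flags documented above; `missing_env` and `build_command` are hypothetical helpers and are not part of the package.

```python
import os
import sys

# Variable names as listed in this README.
BASE_ENV = ("PROPHECY_URL", "PROPHECY_PAT")
SMTP_ENV = ("SMTP_HOST", "SMTP_PORT", "SMTP_USERNAME", "SMTP_PASSWORD", "RECEIVER_EMAIL")


def missing_env(send_email=False, env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    # SMTP settings are only required when --send-email is passed.
    required = BASE_ENV + (SMTP_ENV if send_email else ())
    return [name for name in required if not env.get(name)]


def build_command(project_id, output_dir, pipeline_id=None, branch=None, send_email=False):
    """Assemble the argument list for `python -m prophecy_lineage_extractor`."""
    cmd = [sys.executable, "-m", "prophecy_lineage_extractor",
           "--project-id", project_id, "--output-dir", output_dir]
    if pipeline_id:
        cmd += ["--pipeline-id", pipeline_id]
    if branch:
        cmd += ["--branch", branch]
    if send_email:
        cmd.append("--send-email")
    return cmd


if __name__ == "__main__":
    missing = missing_env(send_email=True)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        # "lineage_output" is an example path; pass the list to
        # subprocess.run(cmd, check=True) to start the extractor.
        cmd = build_command("36587", "lineage_output",
                            pipeline_id="36587/pipelines/customer_orders_demo",
                            branch="dev", send_email=True)
        print(" ".join(cmd))
```

Building the command as a list avoids shell quoting problems when it is later handed to `subprocess.run`.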

---
## GitHub Action Guide

* This extractor can be set up in a GitHub Action of a Prophecy project to email the lineage report on every commit to `main`.
* The following is a sample GitHub Action that can be used on the default branch:
[Github Action default branch](https://github.com/pateash/ProphecyHelloWorld/blob/main/.github/workflows/prophecy_lineage_extractor.yml)

* The following is a sample GitHub Action that can be used on a custom branch:
[Github Action custom branch](https://github.com/pateash/ProphecyHelloWorld/blob/main/.github/workflows/prophecy_lineage_extractor_dev.yml)


---
## GitLab Action Guide
* The following is a sample GitLab action that can be used on a branch:
[Gitlab Action guide](https://github.com/pateash/ProphecyHelloWorld/blob/main/.gitlab-ci.yml)
* Note: GitLab CI/CD variables (secrets) must be created before they can be used in the YML file, e.g. SMTP_USER.
* Additionally, an ACCESS_TOKEN must be set up to allow the job to commit, if committing is enabled.

