Metadata-Version: 2.3
Name: py-dbt-cll
Version: 0.1.3
Summary: python package to extract CLL from dbt files
Author: hi@pinkbeton.com
Requires-Python: >=3.9, <3.12
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: sqlglot (>=27.5.1,<28.0.0)
Description-Content-Type: text/markdown

### py-dbt-cll

![PyPI version](https://img.shields.io/pypi/v/py-dbt-cll.svg)&nbsp;![GitHub tag (latest SemVer)](https://img.shields.io/github/v/tag/ngmiduc/py-dbt-cll?label=version)&nbsp;![Publish](https://github.com/ngmiduc/py-dbt-cll/actions/workflows/publish.yml/badge.svg)&nbsp;![Tests](https://github.com/ngmiduc/py-dbt-cll/actions/workflows/test.yml/badge.svg)

A Python package that extracts column-level lineage (CLL) from dbt models, based on their metadata in the manifest file. It requires no database connection and relies only on SQLGlot to extract column-level lineage from a SQL query. Before the query is passed to SQLGlot, it is enriched with additional information from the manifest file so that the column lineage can be determined accurately.

### Installation

You can install the package using pip:

```bash
pip install py-dbt-cll
```

### Usage

Import the class from the module.

```py
from py_dbt_cll.dbt_lineage import DbtCLL
```

Load your manifest from its JSON file and create a class instance with it. You can then call the `extract_cll` method to extract column lineage information from your SQL queries.

```py
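import json

from py_dbt_cll.dbt_lineage import DbtCLL

# Load the dbt manifest and build the lineage extractor from it.
with open("tests/manifest.json", "r", encoding="utf-8") as file:
    manifest_data = json.load(file)
ccl = DbtCLL(manifest_data)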

sql = """
    select *
    from (
        select *
        from ...
    ) as final
"""
columns = ["academic_year_id", "date_id"]
lineage = ccl.extract_cll(sql, columns, debug=False)
```
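
The `sql` and `columns` arguments do not have to be typed by hand; they can be read from the manifest as well. The sketch below is a minimal illustration, not part of py-dbt-cll: `model_sql_and_columns` is a hypothetical helper, the node id `model.my_project.dim_date` and the tiny inline manifest are made up for the example, and it assumes that compiled SQL (the `compiled_code` key written by recent dbt versions, `compiled_sql` in older ones) is a suitable input for `extract_cll`.

```python
def model_sql_and_columns(manifest, unique_id):
    """Return (sql, column_names) for one node of a dbt manifest.

    Hypothetical helper for illustration; not part of py-dbt-cll.
    """
    node = manifest["nodes"][unique_id]
    # Recent dbt versions store compiled SQL under "compiled_code";
    # older ones used "compiled_sql". Fall back to the raw model code.
    sql = (
        node.get("compiled_code")
        or node.get("compiled_sql")
        or node.get("raw_code")
        or ""
    )
    # Documented columns are stored as a dict keyed by column name.
    columns = list(node.get("columns", {}))
    return sql, columns


# Hand-written stand-in for a real manifest (assumption, for illustration only).
example_manifest = {
    "nodes": {
        "model.my_project.dim_date": {
            "compiled_code": "select date_id, academic_year_id from stg_dates",
            "columns": {"date_id": {}, "academic_year_id": {}},
        }
    }
}

sql, columns = model_sql_and_columns(example_manifest, "model.my_project.dim_date")
print(columns)
```

With a real manifest, the resulting `sql` and `columns` could then be passed to `extract_cll` in place of the literals shown above. Only columns documented in the model's YAML appear under `columns`, so undocumented columns would have to be listed manually.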

Parameters:

- `sql` (str): The SQL query from which to extract column lineage.
- `columns` (list): A list of column names to extract lineage for.
- `debug` (bool): Whether to enable debug mode for more verbose output (default is False).
- `dialect` (str): The SQL dialect to use for parsing the SQL query (default is "tsql").

Returns:

- dict: A dictionary mapping column names to their lineage information.

