Metadata-Version: 2.4
Name: sql-dag-flow
Version: 0.1.8
Summary: A sophisticated SQL lineage visualization tool for Medallion Architectures.
Author-email: Flavio Sandoval <dsandovalflavio@gmail.com>
License: MIT
Keywords: sql,lineage,dag,visualization,medallion-architecture,data-engineering
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi
Requires-Dist: uvicorn
Requires-Dist: sqlglot
Requires-Dist: networkx
Requires-Dist: pydantic
Dynamic: license-file

# SQL DAG Flow

> **"Static Data Lineage for Modern Data Engineers. No databases, just code."**

**SQL DAG Flow** is a lightweight, open-source Python library designed to transform your SQL code into visual architecture.

Unlike traditional lineage tools that require active database connections or query log access, **SQL DAG Flow** performs **static analysis (parsing)** of your local `.sql` files. This allows for instant, secure dependency visualization, bottleneck identification, and Data Lineage documentation without leaving your development environment.

Specially optimized for the **Medallion Architecture** (Bronze, Silver, Gold) and modern stacks (DuckDB, BigQuery, Snowflake), it bridges the gap between the code you write and the architecture you design.

## 💡 Philosophy: Why this exists

*   **Local-First & Zero-Config**: You don't need to configure servers, cloud credentials, or Docker containers. If you have SQL files, you have a diagram.
*   **Security by Design**: By relying on static analysis, your code never leaves your machine and no access to sensitive production data is required.
*   **Living Documentation**: The diagram is generated *from* the code. If the code changes, the documentation updates, eliminating obsolete manually-drawn diagrams.

## 🎯 Objectives & Use Cases

*   **1. Legacy Code Audit & Refactoring**:
    *   *The Problem*: You join a new project with 200+ undocumented SQL scripts. Nobody knows what breaks what.
    *   *The Solution*: Run `sql-dag-flow` to instantly map the "spaghetti" dependencies. Identify orphan tables, circular dependencies, and the impact of changing a Silver layer table.
*   **2. Automated Architecture Documentation**:
    *   *The Problem*: Architecture diagrams in Lucidchart or Visio are always outdated.
    *   *The Solution*: Generate interactive pipeline visualizations (ETL/ELT) to include in your Pull Requests, Wikis, or client deliverables.
*   **3. Medallion Architecture Validation**:
    *   *The Problem*: It's hard to verify if the logical separation of layers (Bronze → Silver → Gold) is being respected.
    *   *The Solution*: The tool visually groups your scripts by folder structure, allowing you to validate that data flows correctly between quality layers without improper "jumps".
*   **4. Accelerated Onboarding**:
    *   *The Problem*: Explaining data flow to new engineers takes hours of whiteboard drawing.
    *   *The Solution*: Deliver an interactive map where new team members can explore where data comes from, view associated SQL code, and understand business logic without reading thousands of lines of code.

![SQL DAG Flow Screenshot](/images/sql-architecture-logo.png)

## 🚀 Key Features

*   **Automatic Parsing & Visualization**: Recursively scans your project folders to find `.sql` files and detect dependencies (`FROM`, `JOIN`, `CTE`s) using `sqlglot`.
*   **Medallion Architecture Support**: Automatically categorizes and colors nodes based on folder structure:
    *   🟤 **Bronze**: Raw ingestion layers.
    *   ⚪ **Silver**: Cleaned and conformed data.
    *   🟡 **Gold**: Business-level aggregates.
*   **Smart Folder Selection**:
    *   **Selective Exploration**: Choose specific subfolders to analyze using an interactive tree view.
    *   **Deep Filtering**: Focus only on relevant parts of your pipeline.
*   **Advanced Organization**:
    *   **Selection Toolbar**: Multi-select nodes and align them horizontally/vertically.
    *   **Node Hiding**: Hide specific nodes or entire trees to declutter the view.
    *   **Auto Layout**: Automatically arrange nodes using Dagre layout engine.
*   **Configuration Management**:
    *   **Save & Load**: Persist your layouts, hidden nodes, and viewport settings to JSON.
    *   **Workspaces**: manage multiple project configurations.
*   **Rich Metadata**:
    *   **Details Panel**: View full SQL content and schema details.
    *   **Annotations**: Add sticky notes with **Markdown support**, resize them, and create visual groups.
*   **Visual Cues**:
    *   **Solid Border**: Indicates a Table.
    *   **Dashed Border**: Indicates a View (auto-detected).
*   **Premium UI/UX**:
    *   **Dark/Light Modes**: Themed for your preference.
    *   **Export**: Save as high-resolution **PNG** or vector **SVG**.

*   **Discovery Mode (New 🔍)**:
    *   **Visualize Missing Files**: Detects dependencies referenced in your SQL that don't satisfy the parser (e.g. valid external tables or missing files).
    *   **Ghost Nodes**: These appear as "External" (Orange) nodes.
    *   **Quick Creation**: Right-click any ghost node to instantly create the corresponding SQL file with a pre-filled template.

## 🌍 Supported Dialects

Powered by `sqlglot`, supporting:
*   **BigQuery** (Default)
*   **Snowflake**
*   **PostgreSQL**
*   **Spark / Databricks**
*   **Amazon Redshift**
*   **DuckDB**
*   **MySQL**
*   ...and more.

## 📦 Installation

Install easily via `pip`:

```bash
pip install sql-dag-flow
```

To update to the latest version:

```bash
pip install --upgrade sql-dag-flow
```

## ▶️ Usage

### 1. Command Line Interface (CLI)

You can run the tool directly from your terminal:

```bash
# Analyze the current directory
sql-dag-flow

# Analyze a specific SQL project
sql-dag-flow /path/to/my/dbt_project
```

### 2. Python API

Integrate it into your Python scripts or notebooks:

```python
from sql_dag_flow import start

# Start the server and open the browser
start(directory="./my_sql_project")
```

## 📂 Project Structure Expectations

SQL DAG Flow is opinionated but flexible. It looks for standard Medallion Architecture naming conventions to assign colors:

*   **Bronze Layer**: Any folder named `bronze`, `raw`, `landing`, or `staging`.
*   **Silver Layer**: Any folder named `silver`, `intermediate`, or `conformed`.
*   **Gold Layer**: Any folder named `gold`, `mart`, `serving`, or `presentation`.
*   **Other**: Any other folder is categorized as "Other" (Gray).

## 🛠️ Configuration & Customization

### Settings
Click the **Settings (Gear)** icon in the bottom toolbar to:
*   **Change SQL Dialect**: Ensure your specific SQL syntax is parsed correctly.
*   **Toggle Node Style**: Switch between "Full" (colored body) and "Minimal" (colored border) styles.
*   **Change Palette**: Switch between Standard, Vivid, and Pastel color palettes.

### Saving Layouts
Your graph layout (positions, hidden nodes) is **not** permanent by default. To save your work:
1.  Click **Save** in the top bar.
2.  Choose a filename (e.g., `marketing_flow.json`).
3.  Next time, click **Load** to restore that exact view.

## 🤝 Contributing

Contributions are welcome!
1.  Fork the repository.
2.  Create a feature branch.
3.  Submit a Pull Request.

---
*Created by [Flavio Sandoval](https://github.com/dsandovalflavio)*
