Metadata-Version: 2.4
Name: datalineagepy
Version: 3.0.3
Summary: Enterprise-grade Python data lineage tracking library with automatic pandas integration, memory optimization, and comprehensive visualization capabilities.
Author-email: Arbaz Nazir <arbaznazir4@gmail.com>
Maintainer-email: Arbaz Nazir <arbaznazir4@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Arbaznazir/DataLineagePy
Project-URL: Documentation, https://github.com/Arbaznazir/DataLineagePy/tree/main/docs
Project-URL: Repository, https://github.com/Arbaznazir/DataLineagePy
Project-URL: Bug Tracker, https://github.com/Arbaznazir/DataLineagePy/issues
Project-URL: Feature Requests, https://github.com/Arbaznazir/DataLineagePy/discussions
Project-URL: Release Notes, https://github.com/Arbaznazir/DataLineagePy/blob/main/CHANGELOG.md
Project-URL: Source Code, https://github.com/Arbaznazir/DataLineagePy
Project-URL: Download, https://pypi.org/project/datalineagepy/
Project-URL: User Guide, https://github.com/Arbaznazir/DataLineagePy/tree/main/docs/user-guide
Project-URL: API Reference, https://github.com/Arbaznazir/DataLineagePy/tree/main/docs/api
Project-URL: Quick Start, https://github.com/Arbaznazir/DataLineagePy/blob/main/docs/quickstart.md
Project-URL: Installation Guide, https://github.com/Arbaznazir/DataLineagePy/blob/main/docs/installation.md
Project-URL: Examples, https://github.com/Arbaznazir/DataLineagePy/tree/main/examples
Project-URL: FAQ, https://github.com/Arbaznazir/DataLineagePy/blob/main/docs/faq.md
Keywords: data-lineage,pandas,data-tracking,etl,data-pipeline,data-governance,visualization,enterprise,analytics,ml-ops
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Database
Classifier: Topic :: Documentation
Classifier: Topic :: Office/Business
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: networkx>=2.6.0
Requires-Dist: tqdm>=4.60.0
Requires-Dist: psutil>=5.8.0
Provides-Extra: viz
Requires-Dist: plotly>=5.0.0; extra == "viz"
Requires-Dist: graphviz>=0.16; extra == "viz"
Requires-Dist: seaborn>=0.11.0; extra == "viz"
Requires-Dist: bokeh>=2.3.0; extra == "viz"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=3.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.950; extra == "dev"
Requires-Dist: pre-commit>=2.17.0; extra == "dev"
Requires-Dist: sphinx>=4.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: build>=0.8.0; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=3.0.0; extra == "test"
Requires-Dist: pytest-xdist>=2.5.0; extra == "test"
Requires-Dist: pytest-benchmark>=3.4.0; extra == "test"
Requires-Dist: hypothesis>=6.0.0; extra == "test"
Requires-Dist: factory-boy>=3.2.0; extra == "test"
Provides-Extra: docs
Requires-Dist: sphinx>=4.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=0.17.0; extra == "docs"
Requires-Dist: sphinx-autodoc-typehints>=1.17.0; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5.0; extra == "docs"
Provides-Extra: performance
Requires-Dist: memory-profiler>=0.60.0; extra == "performance"
Requires-Dist: line-profiler>=3.5.0; extra == "performance"
Requires-Dist: py-spy>=0.3.0; extra == "performance"
Requires-Dist: scalene>=1.5.0; extra == "performance"
Provides-Extra: enterprise
Requires-Dist: cryptography>=3.4.0; extra == "enterprise"
Requires-Dist: sqlalchemy>=1.4.0; extra == "enterprise"
Requires-Dist: redis>=4.0.0; extra == "enterprise"
Requires-Dist: celery>=5.2.0; extra == "enterprise"
Requires-Dist: prometheus-client>=0.14.0; extra == "enterprise"
Provides-Extra: cloud
Requires-Dist: boto3>=1.20.0; extra == "cloud"
Requires-Dist: azure-storage-blob>=12.8.0; extra == "cloud"
Requires-Dist: google-cloud-storage>=2.0.0; extra == "cloud"
Requires-Dist: apache-airflow>=2.3.0; extra == "cloud"
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.0.0; extra == "ml"
Requires-Dist: xgboost>=1.5.0; extra == "ml"
Requires-Dist: lightgbm>=3.3.0; extra == "ml"
Requires-Dist: tensorflow>=2.8.0; extra == "ml"
Requires-Dist: torch>=1.11.0; extra == "ml"
Provides-Extra: all
Requires-Dist: datalineagepy[cloud,dev,docs,enterprise,ml,performance,test,viz]; extra == "all"
Dynamic: license-file

# 🚀 DataLineagePy 3.0

**Enterprise-Grade Python Data Lineage Tracking**

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Production Ready](https://img.shields.io/badge/status-production%20ready-green.svg)](https://github.com/Arbaznazir/DataLineagePy)
[![Performance Score](https://img.shields.io/badge/performance-92.1%2F100-brightgreen.svg)](https://github.com/Arbaznazir/DataLineagePy)
[![Enterprise Grade](https://img.shields.io/badge/enterprise-grade%20ready-gold.svg)](https://github.com/Arbaznazir/DataLineagePy)

---

<div align="center">
  <img src="banner.jpg" width="100%" alt="DataLineagePy Banner"/>
  <h2>Beautiful, Powerful, and Effortless Data Lineage for Python</h2>
  <p>Track, visualize, and govern your data pipelines with zero friction.</p>
</div>

---

## 🌟 Why DataLineagePy?

- **Automatic, column-level lineage tracking** for all pandas DataFrames
- **Enterprise performance**: memory-optimized, scalable, and production-ready
- **Stunning visualizations**: interactive dashboards, HTML, PNG, SVG, and more
- **Plug-and-play connectors**: MySQL, PostgreSQL, SQLite, and custom sources
- **Security & compliance**: RBAC, AES-256 encryption, audit trails
- **Real-time collaboration**: WebSocket server/client for team workflows
- **ML/AI pipeline tracking**: Full auditability for machine learning steps
- **Cloud-native deployment**: Docker, Kubernetes, Helm, Terraform

---

## 📋 Table of Contents

- [Quick Start](#quick-start)
- [Installation](#installation)
- [Core Features](#core-features)
- [Usage Guide](#usage-guide)
- [Visualization & Reporting](#visualization--reporting)
- [Database Connectors](#database-connectors)
- [Performance Monitoring](#performance-monitoring)
- [Security & Compliance](#security--compliance)
- [ML/AI Pipeline Tracking](#mlai-pipeline-tracking)
- [Enterprise Deployment](#enterprise-deployment)
- [Use Cases](#use-cases)
- [Documentation](#documentation)
- [Contributing](#contributing)
- [License](#license)

---

## 🚀 Quick Start

```bash
pip install datalineagepy
```

```python
import pandas as pd
from datalineagepy import LineageTracker, LineageDataFrame

# Wrap a plain DataFrame so the tracker can record operations on it
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
tracker = LineageTracker(name="demo")
ldf = LineageDataFrame(df, name="my_df", tracker=tracker)

# Transformations return a new tracked frame; `_df` is the underlying pandas DataFrame
ldf2 = ldf.filter(ldf._df['a'] > 1)
ldf3 = ldf2.assign(c=ldf2._df['a'] + ldf2._df['b'])

tracker.visualize()  # Interactive HTML dashboard
tracker.export_lineage("lineage.json")  # Save the lineage graph as JSON
```

---

## 💾 Installation

- **PyPI**: `pip install datalineagepy`
- **With visualization**: `pip install datalineagepy[viz]`
- **All features**: `pip install datalineagepy[all]`
- **Conda**: `conda install -c conda-forge datalineagepy` _(coming soon)_
- **Docker**: `docker pull datalineagepy/datalineagepy:latest`

See [Installation Guide](docs/installation.md) for advanced and enterprise setup.
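
To see which optional visualization packages are usable in the current environment, one quick check is whether their modules can be imported. The following is a minimal sketch using only the standard library. The module names are the packages listed under the `viz` extra (plotly, graphviz, seaborn, bokeh), and `available_modules` is a hypothetical helper, not part of DataLineagePy.

```python
from importlib.util import find_spec

# Import names of the packages listed under the `viz` extra.
VIZ_MODULES = ["plotly", "graphviz", "seaborn", "bokeh"]


def available_modules(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available_modules(VIZ_MODULES).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

Any package reported as missing can be pulled in with `pip install datalineagepy[viz]`.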

---

## 📚 Core Features

- **Automatic lineage tracking** for pandas DataFrames
- **Data validation**: completeness, uniqueness, range, custom rules
- **Profiling & analytics**: quality scoring, missing data, correlations
- **Visualization**: HTML, PNG, SVG, interactive dashboards
- **Performance monitoring**: execution time, memory, alerts
- **Security**: RBAC, AES-256 encryption, audit trail
- **Custom connectors**: SDK for any data source
- **Versioning**: save, diff, rollback lineage graphs
- **Collaboration**: real-time editing/viewing
- **ML/AI pipeline tracking**: AutoMLTracker for full auditability
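
To make the versioning feature more concrete, the idea of diffing two lineage graphs can be sketched in plain Python. This is only an illustration under simple assumptions: a lineage snapshot is modeled as a list of `(source, target)` edges, and `diff_lineage` is a hypothetical helper, not DataLineagePy's versioning API.

```python
def diff_lineage(old_edges, new_edges):
    """Return the edges added and removed between two lineage snapshots."""
    old, new = set(old_edges), set(new_edges)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Two hypothetical snapshots of the same pipeline.
v1 = [("input.x", "output.z"), ("input.y", "output.z")]
v2 = [("input.x", "output.z"), ("input.x", "output.w")]

changes = diff_lineage(v1, v2)
print(changes["added"])    # [('input.x', 'output.w')]
print(changes["removed"])  # [('input.y', 'output.z')]
```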

---

## 🔧 Usage Guide

### 1. Lineage Tracking

```python
import pandas as pd
from datalineagepy import LineageTracker, LineageDataFrame

tracker = LineageTracker(name="my_pipeline")
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
ldf = LineageDataFrame(df, name="input", tracker=tracker)

# Derive a new column through the tracked frame so the step is recorded
ldf2 = ldf.assign(z=ldf._df['x'] + ldf._df['y'])
print(tracker.export_graph())
```
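
Conceptually, column-level lineage is a graph from derived columns back to the columns they were computed from. The toy model below sketches that idea in plain Python. `ColumnLineage` and its methods are hypothetical names for illustration and do not reflect how DataLineagePy stores lineage internally.

```python
from collections import defaultdict


class ColumnLineage:
    """Toy column-level lineage graph: derived column -> source columns."""

    def __init__(self):
        self.parents = defaultdict(set)

    def record(self, target, sources):
        """Register that `target` was computed from `sources`."""
        self.parents[target].update(sources)

    def upstream(self, column):
        """Return every column that `column` transitively depends on."""
        seen, stack = set(), [column]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


lineage = ColumnLineage()
lineage.record("z", ["x", "y"])  # z = x + y, as in the example above
lineage.record("w", ["z"])       # hypothetical further step: w derived from z
print(sorted(lineage.upstream("w")))  # ['x', 'y', 'z']
```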

### 2. Data Validation

```python
from datalineagepy.core.validation import DataValidator

# Reuses `tracker` and `ldf` from step 1
validator = DataValidator(tracker)
rules = {'completeness': {'threshold': 0.9}, 'uniqueness': {'columns': ['x']}}
results = validator.validate_dataframe(ldf, rules)
print(results)
```
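
For intuition about what such rules measure, here is a dependency-free sketch of one plausible reading of "completeness" (share of non-missing cells) and "uniqueness" (no repeated values in a column). The functions below are hypothetical and may differ from how `DataValidator` defines these checks.

```python
def completeness(rows, columns):
    """Fraction of non-missing (non-None) cells across the given columns."""
    cells = [row.get(col) for row in rows for col in columns]
    return sum(v is not None for v in cells) / len(cells) if cells else 1.0


def is_unique(rows, column):
    """True if no value in `column` appears more than once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


rows = [{"x": 1, "y": 4}, {"x": 2, "y": None}, {"x": 3, "y": 6}]
score = completeness(rows, ["x", "y"])
print(round(score, 3), score >= 0.9)  # 0.833 False
print(is_unique(rows, "x"))           # True
```

With one of six cells missing, the score falls below a 0.9 threshold, so a completeness rule like the one above would flag this data.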

### 3. Profiling & Analytics

```python
from datalineagepy.core.analytics import DataProfiler

# Reuses `tracker` and `ldf` from step 1
profiler = DataProfiler(tracker)
profile = profiler.profile_dataset(ldf, include_correlations=True)
print(profile)
```

### 4. Visualization & Reporting

```python
from datalineagepy.visualization.graph_visualizer import GraphVisualizer

# Reuses `tracker` from step 1; each call writes the graph to the given file
visualizer = GraphVisualizer(tracker)
visualizer.generate_html("lineage.html")
visualizer.generate_png("lineage.png")
```

### 5. Performance Monitoring

```python
from datalineagepy.core.performance import PerformanceMonitor

# Reuses `tracker` and `ldf` from step 1
monitor = PerformanceMonitor(tracker)
monitor.start_monitoring()
_ = ldf._df.sum()  # Workload to measure
monitor.stop_monitoring()
print(monitor.get_performance_summary())
```
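
The start/stop pattern above can be approximated with the standard library alone, which may help when reasoning about what "execution time" and "memory" mean here. This is a rough sketch, not the `PerformanceMonitor` implementation: `measure` is a hypothetical helper, and `tracemalloc` only sees allocations made through Python's allocator.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(stats):
    """Record wall-clock time and peak traced memory for the enclosed block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield stats
    finally:
        stats["seconds"] = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _, stats["peak_bytes"] = tracemalloc.get_traced_memory()
        tracemalloc.stop()


stats = {}
with measure(stats):
    total = sum(range(100_000))  # Workload to measure

print(total, stats)
```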

### 6. Security & Compliance

```python
import os

from datalineagepy.security.rbac import RBACManager
from datalineagepy.security.encryption.data_encryption import EncryptionManager

# Role-based access control: grant permissions to roles, then roles to users
rbac = RBACManager()
rbac.add_role('admin', ['read', 'write'])
rbac.add_user('alice', ['admin'])
print(rbac.check_access('alice', 'write'))

# Demo key only: in real deployments, load the key from a secrets manager
# instead of hard-coding it in source
os.environ['MASTER_ENCRYPTION_KEY'] = 'supersecretkey1234567890123456'
enc_mgr = EncryptionManager()
secret = 'Sensitive Data'
encrypted = enc_mgr.encrypt_sensitive_data(secret)
decrypted = enc_mgr.decrypt_sensitive_data(encrypted)
print(decrypted)
```

### 7. Database Connectors

```python
from datalineagepy.connectors.database.mysql_connector import MySQLConnector
from datalineagepy.core import LineageTracker

# Placeholder credentials: replace with your own and keep real passwords out of source control
db_config = {'host': 'localhost', 'user': 'root', 'password': 'password', 'database': 'test_db'}
tracker = LineageTracker()
conn = MySQLConnector(**db_config, lineage_tracker=tracker)
conn.execute_query('SELECT * FROM test_table')
conn.close()
```

### 8. ML/AI Pipeline Tracking

```python
from datalineagepy import AutoMLTracker

tracker = AutoMLTracker(name='ml_pipeline')
# Log each pipeline step together with the model and its parameters
tracker.log_step('fit', model='LogisticRegression', params={'solver': 'lbfgs'})
tracker.log_step('predict', model='LogisticRegression')
print(tracker.export_ai_ready_format())
```

---

## 📊 Visualization & Reporting

- **Interactive HTML dashboards**: `tracker.visualize()`
- **Export formats**: JSON, DOT, PNG, SVG, Excel, CSV
- **Custom visualizations**: Use `GraphVisualizer` for advanced needs
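
As an illustration of the DOT export format, a lineage graph can be serialized to Graphviz DOT text in a few lines. This sketch assumes the graph is available as `(source, target)` edge pairs. `to_dot` is a hypothetical helper, and the output produced by DataLineagePy itself may be structured differently.

```python
def to_dot(edges, name="lineage"):
    """Serialize (source, target) edges as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([("input.x", "output.z"), ("input.y", "output.z")])
print(dot)
```

The resulting text can be rendered with the Graphviz `dot` command-line tool, for example `dot -Tsvg lineage.dot -o lineage.svg`.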

---

## 🗄️ Database Connectors

- **MySQL, PostgreSQL, SQLite**: Full lineage tracking for every query
- **Custom connectors**: Build your own with the SDK
- See [Database Connectors Guide](docs/user-guide/database-connectors.md)
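
The core idea behind a lineage-aware connector, recording each query as it runs, can be sketched with the standard library's `sqlite3` module. `TrackedConnection` is a hypothetical class for illustration only. It mirrors the `execute_query`/`close` calls shown in the usage guide but is not the connector SDK.

```python
import sqlite3


class TrackedConnection:
    """Toy wrapper that logs every SQL statement before running it."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.query_log = []  # statements in execution order

    def execute_query(self, sql, params=()):
        self.query_log.append(sql)
        return self.conn.execute(sql, params).fetchall()

    def close(self):
        self.conn.close()


db = TrackedConnection()
db.execute_query("CREATE TABLE test_table (id INTEGER, name TEXT)")
db.execute_query("INSERT INTO test_table VALUES (?, ?)", (1, "alice"))
rows = db.execute_query("SELECT * FROM test_table")
print(rows)               # [(1, 'alice')]
print(len(db.query_log))  # 3
db.close()
```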

---

## ⚡ Performance Monitoring

- **Track execution time, memory, and operation stats**
- **Alerting**: Slack, Email, custom hooks
- **Production monitoring**: Integrate with Prometheus, Grafana, etc.

---

## 🔒 Security & Compliance

- **RBAC**: Role-based access control for users and actions
- **AES-256 encryption**: At-rest and in-transit data protection
- **Audit trail**: Full operation history for compliance
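
To show how RBAC and an audit trail fit together, here is a minimal in-memory model. The method names echo the `RBACManager` calls in the usage guide, but `SimpleRBAC` is a hypothetical sketch and makes no claim about the library's actual implementation.

```python
class SimpleRBAC:
    """Toy role-based access control with an audit log of every check."""

    def __init__(self):
        self.roles = {}  # role -> set of permitted actions
        self.users = {}  # user -> set of roles
        self.audit = []  # (user, action, allowed) tuples

    def add_role(self, role, permissions):
        self.roles[role] = set(permissions)

    def add_user(self, user, roles):
        self.users[user] = set(roles)

    def check_access(self, user, action):
        allowed = any(
            action in self.roles.get(role, ())
            for role in self.users.get(user, ())
        )
        self.audit.append((user, action, allowed))
        return allowed


rbac = SimpleRBAC()
rbac.add_role("admin", ["read", "write"])
rbac.add_role("viewer", ["read"])
rbac.add_user("alice", ["admin"])
rbac.add_user("bob", ["viewer"])

print(rbac.check_access("alice", "write"))  # True
print(rbac.check_access("bob", "write"))    # False
print(rbac.audit)
```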

---

## 🤖 ML/AI Pipeline Tracking

- **AutoMLTracker**: Log, audit, and export every ML pipeline step
- **Explainability**: Export pipeline steps for downstream analysis
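
The logging-and-export pattern can be pictured as an ordered list of step records serialized to JSON. The sketch below uses only the standard library. `StepLog` and its JSON layout are assumptions for illustration, not the format returned by `export_ai_ready_format()`.

```python
import json


class StepLog:
    """Toy pipeline log: ordered step records that can be exported as JSON."""

    def __init__(self, name):
        self.name = name
        self.steps = []

    def log_step(self, step, **details):
        # Number steps in the order they are logged
        self.steps.append({"order": len(self.steps) + 1, "step": step, **details})

    def to_json(self):
        return json.dumps({"pipeline": self.name, "steps": self.steps}, indent=2)


log = StepLog("ml_pipeline")
log.log_step("fit", model="LogisticRegression", params={"solver": "lbfgs"})
log.log_step("predict", model="LogisticRegression")
print(log.to_json())
```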

---

## ☁️ Enterprise Deployment

- **Docker, Kubernetes, Helm, Terraform**: Cloud-native ready
- **Production scripts**: See `deploy/` for examples

---

## 💡 Use Cases

- **Data science**: Reproducibility, experiment tracking, Jupyter integration
- **Enterprise ETL**: Production pipelines, data quality, compliance
- **Data governance**: Impact analysis, documentation, audit trails
- **ML/AI**: Pipeline explainability, model audit, feature tracking

---

## 📖 Documentation

- [User Guide](docs/user-guide/)
- [API Reference](docs/api/)
- [Quick Start](docs/quickstart.md)
- [Enterprise Guide](docs/advanced/production.md)
- [FAQ](docs/faq.md)
- [Examples](examples/)

---

## 🤝 Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 📄 License

MIT License. See [LICENSE](LICENSE) for details.

---

<div align="center">
  <b>DataLineagePy 3.0 &mdash; The new standard for Python data lineage</b><br/>
  <i>Beautiful. Powerful. Effortless.</i>
</div>
