Metadata-Version: 2.4
Name: scomp-link
Version: 1.1.2
Summary: The Astromech arm for your Python data projects — end-to-end ML toolkit
Author-email: Saccaggi Giacomo <giacomo.saccaggi@gmail.com>
Maintainer-email: Saccaggi Giacomo <giacomo.saccaggi@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/GiacomoSaccaggi/scomp_link
Project-URL: Documentation, https://github.com/GiacomoSaccaggi/scomp_link/wiki
Project-URL: Repository, https://github.com/GiacomoSaccaggi/scomp_link
Project-URL: Issues, https://github.com/GiacomoSaccaggi/scomp_link/issues
Project-URL: Changelog, https://github.com/GiacomoSaccaggi/scomp_link/blob/main/CHANGELOG.md
Project-URL: Wiki, https://github.com/GiacomoSaccaggi/scomp_link/wiki
Keywords: machine-learning,data-science,automation,pipeline,automl,scikit-learn,regression,classification,clustering,deep-learning,nlp,text-classification,contrastive-learning,anomaly-detection,time-series,ensemble-learning,hyperparameter-tuning,model-selection,cross-validation,feature-selection,bert,transformers,tensorflow,pytorch,computer-vision,image-classification,html-report,plotly,eda,preprocessing
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<1.27.0,>=1.22.0; python_version >= "3.10" and python_version < "3.11"
Requires-Dist: numpy<1.27.0,>=1.24.0; python_version >= "3.11" and python_version < "3.12"
Requires-Dist: numpy<2.0.0,>=1.26.0; python_version >= "3.12" and python_version < "3.13"
Requires-Dist: numpy<3.0.0,>=1.26.0; python_version >= "3.13"
Requires-Dist: pandas<2.2.0,>=1.4.0; python_version >= "3.10" and python_version < "3.11"
Requires-Dist: pandas<2.2.0,>=2.0.0; python_version >= "3.11" and python_version < "3.12"
Requires-Dist: pandas<2.3.0,>=2.1.0; python_version >= "3.12" and python_version < "3.13"
Requires-Dist: pandas<3.0.0,>=2.1.0; python_version >= "3.13"
Requires-Dist: scipy<1.12.0,>=1.8.0; python_version >= "3.10" and python_version < "3.11"
Requires-Dist: scipy<1.12.0,>=1.10.0; python_version >= "3.11" and python_version < "3.12"
Requires-Dist: scipy<1.14.0,>=1.11.0; python_version >= "3.12" and python_version < "3.13"
Requires-Dist: scipy<2.0.0,>=1.11.0; python_version >= "3.13"
Requires-Dist: scikit-learn<1.4.0,>=1.2.0; python_version >= "3.10" and python_version < "3.11"
Requires-Dist: scikit-learn<1.5.0,>=1.3.0; python_version >= "3.11" and python_version < "3.12"
Requires-Dist: scikit-learn<1.6.0,>=1.3.0; python_version >= "3.12" and python_version < "3.13"
Requires-Dist: scikit-learn<2.0.0,>=1.3.0; python_version >= "3.13"
Requires-Dist: matplotlib<3.9.0,>=3.6.0; python_version >= "3.10" and python_version < "3.11"
Requires-Dist: matplotlib<3.9.0,>=3.7.0; python_version >= "3.11" and python_version < "3.12"
Requires-Dist: matplotlib<3.10.0,>=3.8.0; python_version >= "3.12" and python_version < "3.13"
Requires-Dist: matplotlib<4.0.0,>=3.8.0; python_version >= "3.13"
Requires-Dist: plotly<6.0.0,>=5.0.0
Requires-Dist: seaborn<0.14.0,>=0.11.0
Requires-Dist: torch<2.3.0,>=1.13.0; python_version >= "3.10" and python_version < "3.11"
Requires-Dist: torch<2.4.0,>=2.0.0; python_version >= "3.11" and python_version < "3.12"
Requires-Dist: torch<2.5.0,>=2.0.0; python_version >= "3.12" and python_version < "3.13"
Requires-Dist: torch<3.0.0,>=2.0.0; python_version >= "3.13"
Requires-Dist: transformers<5.0.0,>=4.30.0
Requires-Dist: spacy<4.0.0,>=3.5.0
Requires-Dist: faiss-cpu<2.0.0,>=1.7.0
Requires-Dist: sentence-transformers<3.0.0,>=2.2.0
Requires-Dist: tensorflow<2.16.0,>=2.10.0; python_version >= "3.10" and python_version < "3.11"
Requires-Dist: tensorflow<2.16.0,>=2.12.0; python_version >= "3.11" and python_version < "3.12"
Requires-Dist: tensorflow<2.18.0,>=2.13.0; python_version >= "3.12" and python_version < "3.13"
Requires-Dist: tensorflow<3.0.0,>=2.13.0; python_version >= "3.13"
Requires-Dist: pillow<11.0.0,>=9.0.0; python_version >= "3.10" and python_version < "3.12"
Requires-Dist: pillow<11.0.0,>=10.0.0; python_version >= "3.12"
Requires-Dist: pytorch-tabnet<5.0.0,>=4.0.0
Requires-Dist: statsmodels<1.0.0,>=0.13.0
Requires-Dist: tf-keras>=2.16.0; python_version >= "3.12"
Requires-Dist: polars<2.0.0,>=0.20.0
Requires-Dist: pyarrow>=12.0.0
Requires-Dist: shap<1.0.0,>=0.42.0
Requires-Dist: lime<1.0.0,>=0.2.0
Requires-Dist: optuna<5.0.0,>=3.0.0
Requires-Dist: tqdm<5.0.0,>=4.50.0
Requires-Dist: PyJWT<3.0.0,>=2.0.0
Requires-Dist: markdown<4.0.0,>=3.3.0
Requires-Dist: weasyprint<63.0,>=57.0
Requires-Dist: playwright<2.0.0,>=1.40.0
Provides-Extra: dev
Requires-Dist: pytest<9.0.0,>=6.0.0; extra == "dev"
Requires-Dist: pytest-cov<6.0.0,>=2.10.0; extra == "dev"
Dynamic: license-file

# scomp-link: The Astromech Arm for Your Python Projects

## May the code be with you

[![CI](https://github.com/GiacomoSaccaggi/scomp_link/actions/workflows/ci.yml/badge.svg)](https://github.com/GiacomoSaccaggi/scomp_link/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/GiacomoSaccaggi/scomp_link/branch/main/graph/badge.svg)](https://codecov.io/gh/GiacomoSaccaggi/scomp_link)
[![PyPI](https://img.shields.io/pypi/v/scomp-link)](https://pypi.org/project/scomp-link/)
[![Python](https://img.shields.io/pypi/pyversions/scomp-link)](https://pypi.org/project/scomp-link/)
[![Downloads](https://img.shields.io/pypi/dm/scomp-link)](https://pypi.org/project/scomp-link/)
[![License](https://img.shields.io/github/license/GiacomoSaccaggi/scomp_link)](https://github.com/GiacomoSaccaggi/scomp_link/blob/main/LICENSE)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![Typed](https://img.shields.io/badge/typing-typed-blue)](https://peps.python.org/pep-0561/)

---

## Overview

**scomp-link** is an end-to-end machine learning toolkit that automates the complete ML workflow — from data profiling and preprocessing to model selection, training, validation, explainability, monitoring, and deployment.

It includes a **full-featured CLI** for zero-code ML workflows and a Python API for programmatic use.

---

## Installation

```bash
pip install scomp-link
```

Requires Python 3.10+.

---

## Key Features

| Category | Features |
|----------|----------|
| **Pipeline** | Automated model selection, training, validation, HTML reports |
| **CLI** | 13 commands — `run`, `predict`, `explain`, `engineer`, `forecast`, `anomaly`, `drift`, `fairness`, `quality`, `report`, `compare`, `info`, `init` |
| **Preprocessing** | Data cleaning, feature engineering (interactions, log, dates, target encoding, binning), data quality profiling |
| **Models** | Regression, classification, clustering, time series forecasting, anomaly detection, text (BERT contrastive), images (CNN) |
| **Tuning** | Optuna (Bayesian), Halving Grid Search, Early Stopping CV |
| **Validation** | K-Fold, LOOCV, Bootstrap, ensemble (voting/stacking) |
| **Explainability** | SHAP values, LIME explanations |
| **Monitoring** | Data drift detection (PSI + KS test) |
| **Fairness** | Demographic parity, disparate impact (4/5 rule), equalized odds |
| **Persistence** | Custom `.scomp` format (model + preprocessor + config + metrics + sample data) |
| **Reporting** | Interactive HTML reports (Plotly), data quality reports |

---

## CLI Quick Start

```bash
# Scaffold a new project
scomp-link init my_project

# Profile your data
scomp-link quality --data data.csv --output report.html

# Feature engineering
scomp-link engineer --data data.csv --target y --interactions --log-transform --output features.csv

# Train a model
scomp-link run --data features.csv --target y --task regression --save-artifact model.scomp

# Predict
scomp-link predict --artifact model.scomp --data new_data.csv --output predictions.csv

# Explain
scomp-link explain --artifact model.scomp --data test.csv

# Detect drift
scomp-link drift --reference train.csv --current production.csv

# Forecast time series
scomp-link forecast --data series.csv --column value --horizon 30

# Anomaly detection
scomp-link anomaly --data data.csv --methods iforest,lof

# Fairness check
scomp-link fairness --data preds.csv --target y_true --predicted y_pred --sensitive gender

# Compare models
scomp-link compare --artifacts v1.scomp v2.scomp

# Generate EDA report
scomp-link report --data data.csv --output eda_report.html

# Generate model evaluation report
scomp-link report --artifact model.scomp --data test.csv --output model_report.html
```

---

## Python API Quick Start

```python
from scomp_link import ScompLinkPipeline, ScompArtifact, set_verbosity
import pandas as pd

# Control output
set_verbosity("info")  # "silent" | "warning" | "info" | "debug"

# Load training data (expects a 'target' column)
df = pd.read_csv('data.csv')

# Build pipeline
pipe = ScompLinkPipeline("My Project")
pipe.set_objectives(["Minimize RMSE"])
pipe.import_and_clean_data(df)
pipe.select_variables(target_col='target')
pipe.choose_model("numerical_prediction")
results = pipe.run_pipeline(task_type="regression")

# Save as artifact
artifact = ScompArtifact()
artifact.set_model(pipe.model)
artifact.set_config(task_type='regression', target_col='target')
artifact.set_metrics(results['metrics'])
artifact.save('model.scomp')

# Load and predict on new rows with the same feature columns
loaded = ScompArtifact.load('model.scomp')
new_data = pd.read_csv('new_data.csv')
predictions = loaded.predict(new_data)
```

---

## Feature Engineering

```python
from scomp_link import FeatureEngineer

fe = FeatureEngineer(
    interactions=True,      # Polynomial interactions
    log_transform=True,     # Log1p for skewed features
    date_features=True,     # Extract year/month/dow/weekend
    target_encode=True,     # Encode high-cardinality categoricals
    auto_bin=True,          # Quantile binning
)
X_train_eng = fe.fit_transform(X_train, y_train)
X_test_eng = fe.transform(X_test)
```
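The log, date and binning options above boil down to a fit/transform pattern: statistics such as quantile bin edges are learned on the training set and reused on the test set. The sketch below illustrates that pattern with pandas and NumPy. `MiniFeatureEngineer` is a hypothetical name, not the library's `FeatureEngineer`; the skew threshold of 1.0 and the 4 quantile bins are assumptions, and interactions and target encoding are left out.

```python
import numpy as np
import pandas as pd


class MiniFeatureEngineer:
    """Hypothetical stand-in for FeatureEngineer: log1p, quantile bins, date parts."""

    def __init__(self, date_col, skew_threshold=1.0, n_bins=4):
        self.date_col = date_col
        self.skew_threshold = skew_threshold  # assumed cut-off for "skewed"
        self.n_bins = n_bins                  # assumed number of quantile bins

    def fit(self, X, y=None):
        num = X.select_dtypes(include="number")
        # Remember which non-negative columns are right-skewed enough for log1p
        self.log_cols_ = [
            c for c in num.columns
            if num[c].skew() > self.skew_threshold and (num[c] >= 0).all()
        ]
        # Learn quantile bin edges on the training data only
        qs = np.linspace(0, 1, self.n_bins + 1)
        self.bin_edges_ = {c: np.unique(np.quantile(num[c], qs)) for c in num.columns}
        return self

    def transform(self, X):
        out = X.copy()
        for c in self.log_cols_:
            out[f"{c}_log1p"] = np.log1p(out[c].clip(lower=0))
        for c, edges in self.bin_edges_.items():
            # Interior edges only, so values outside the training range land in the end bins
            out[f"{c}_bin"] = np.digitize(out[c], edges[1:-1])
        dates = pd.to_datetime(out[self.date_col])
        out[f"{self.date_col}_year"] = dates.dt.year
        out[f"{self.date_col}_month"] = dates.dt.month
        out[f"{self.date_col}_dow"] = dates.dt.dayofweek  # Monday = 0
        out[f"{self.date_col}_weekend"] = (dates.dt.dayofweek >= 5).astype(int)
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


train = pd.DataFrame({
    "amount": [1, 2, 3, 4, 5, 6, 7, 8, 100, 1000],
    "day": pd.date_range("2024-01-01", periods=10, freq="D"),
})
train_eng = MiniFeatureEngineer(date_col="day").fit_transform(train)
print(train_eng.head())
```

Calling `transform` on a test frame reuses the training bin edges, which is what keeps train and test features comparable.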

---

## Advanced Hyperparameter Tuning

```python
from sklearn.ensemble import GradientBoostingRegressor
from scomp_link.models.advanced_tuning import OptunaOptimizer

def param_space(trial):
    return {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 20),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
    }

optimizer = OptunaOptimizer(GradientBoostingRegressor, param_space, scoring='r2', n_trials=100)
best_model = optimizer.optimize(X_train, y_train)
```
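Halving Grid Search, also listed under Tuning, starts every candidate on a small subset of the data and promotes only the best third (with `factor=3`) to the next round with more samples. The sketch below uses plain scikit-learn's `HalvingGridSearchCV` on synthetic data to show the idea; it is not scomp-link's own wrapper, and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.experimental import enable_halving_search_cv  # noqa: F401  (required opt-in import)
from sklearn.model_selection import HalvingGridSearchCV

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)

param_grid = {"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1, 0.2]}
search = HalvingGridSearchCV(
    GradientBoostingRegressor(n_estimators=100, random_state=0),
    param_grid,
    factor=3,      # keep the top 1/3 of candidates each round, triple the samples
    scoring="r2",
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Compared with an exhaustive grid, most of the 9 candidates here are only ever fitted on roughly 44 samples, which is where the savings come from.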

---

## Explainability

```python
from scomp_link import ShapExplainer, LimeExplainer

# `model` is a fitted estimator; X_train / X_test are pandas DataFrames
# SHAP
shap_exp = ShapExplainer(model, X_train[:100])
shap_exp.explain(X_test)
importance = shap_exp.feature_importance()
fig = shap_exp.plot_importance()

# LIME
lime_exp = LimeExplainer(model, X_train, task='regression')
exp = lime_exp.explain_instance(X_test.iloc[0])
fig = lime_exp.plot_explanation(exp)
```

---

## Data Drift Detection

```python
from scomp_link import DriftDetector

detector = DriftDetector(X_train, psi_threshold=0.2)
report = detector.detect(X_production)
summary = detector.summary(report)
fig = detector.plot_drift_report(report)
```
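Drift is detected with two complementary per-column statistics: the Population Stability Index (PSI), which compares binned distributions, and the two-sample Kolmogorov-Smirnov test. The NumPy/SciPy sketch below shows both for a single numeric column. It is not `DriftDetector`'s implementation: the reference-quantile binning, the `eps` smoothing, and the rule "drift if PSI exceeds the threshold or the KS p-value is below 0.05" are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two 1-D numeric samples."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges from reference quantiles; np.unique drops duplicate edges caused by ties
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)))
    ref_counts, _ = np.histogram(reference, bins=edges)
    # Clip current values into the reference range so every value falls in a bin
    cur_counts, _ = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)
    ref_pct = ref_counts / ref_counts.sum() + eps  # eps avoids log(0)
    cur_pct = cur_counts / cur_counts.sum() + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


def column_drift(reference, current, psi_threshold=0.2, alpha=0.05):
    score = psi(reference, current)
    ks_stat, p_value = ks_2samp(reference, current)
    return {
        "psi": score,
        "ks_stat": float(ks_stat),
        "ks_pvalue": float(p_value),
        "drift": bool(score > psi_threshold or p_value < alpha),  # assumed decision rule
    }


rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 5000)
same_report = column_drift(ref, rng.normal(0, 1, 5000))
shift_report = column_drift(ref, rng.normal(1.0, 1, 5000))
print(same_report)
print(shift_report)
```

A PSI of 0.2, the threshold used above, is a common rule of thumb for a significant shift; the KS test adds a significance check that does not depend on the binning.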

---

## Fairness & Bias Metrics

```python
from scomp_link import FairnessMetrics

fm = FairnessMetrics(y_true, y_pred, sensitive_feature=df['gender'])
report = fm.compute_all()
print(fm.summary(report))
fig = fm.plot_fairness_report(report)
```
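The three fairness metrics can be written down directly from per-group rates. The sketch below is a plain NumPy illustration, not `FairnessMetrics`' code: demographic parity difference is the gap in selection rates, disparate impact is the ratio of the lowest to the highest selection rate (the four-fifths rule requires at least 0.8), and equalized odds difference is taken as the larger of the true-positive-rate and false-positive-rate gaps, which is one common formulation. Binary 0/1 labels and predictions are assumed.

```python
import numpy as np


def fairness_report(y_true, y_pred, sensitive):
    # Assumes binary 0/1 arrays, both classes present in every group,
    # and at least one positive prediction overall
    y_true, y_pred, sensitive = map(np.asarray, (y_true, y_pred, sensitive))
    sel, tpr, fpr = [], [], []
    for g in np.unique(sensitive):
        m = sensitive == g
        sel.append(y_pred[m].mean())                  # P(y_hat=1 | group)
        tpr.append(y_pred[m & (y_true == 1)].mean())  # P(y_hat=1 | y=1, group)
        fpr.append(y_pred[m & (y_true == 0)].mean())  # P(y_hat=1 | y=0, group)
    sel, tpr, fpr = map(np.array, (sel, tpr, fpr))
    disparate_impact = sel.min() / sel.max()
    return {
        "demographic_parity_diff": float(sel.max() - sel.min()),
        "disparate_impact": float(disparate_impact),
        "passes_four_fifths_rule": bool(disparate_impact >= 0.8),
        "equalized_odds_diff": float(max(tpr.max() - tpr.min(), fpr.max() - fpr.min())),
    }


y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
report = fairness_report(y_true, y_pred, group)
print(report)
```

Here group `a` is selected 75% of the time and group `b` 25%, so the disparate impact is 1/3 and the four-fifths rule fails.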

---

## Time Series Forecasting

```python
from scomp_link import TimeSeriesForecaster

fc = TimeSeriesForecaster(method='auto', horizon=30)
fc.fit(series)
forecast = fc.predict_with_ci()
cv_results = fc.walk_forward_cv(series, n_splits=5)
fig = fc.plot_forecast()
```
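Walk-forward cross-validation respects time order: each fold trains on everything up to a cut-off and scores the next `horizon` points, then moves the cut-off forward. The sketch below shows an expanding-window version in NumPy. It uses a naive "repeat the last value" forecast as a stand-in model, which is an assumption for illustration; it does not reproduce whatever `method='auto'` selects.

```python
import numpy as np


def walk_forward_cv(series, n_splits=5, horizon=10):
    """Expanding-window walk-forward RMSE of a naive last-value forecast."""
    y = np.asarray(series, dtype=float)
    min_train = len(y) - n_splits * horizon  # the last fold ends exactly at the series end
    if min_train < 1:
        raise ValueError("series too short for the requested splits")
    scores = []
    for k in range(n_splits):
        end = min_train + k * horizon
        train, test = y[:end], y[end:end + horizon]
        forecast = np.repeat(train[-1], len(test))  # stand-in model: repeat the last value
        scores.append(float(np.sqrt(np.mean((test - forecast) ** 2))))
    return scores


scores = walk_forward_cv(np.arange(100), n_splits=5, horizon=10)
print(scores)
```

On a straight line the naive forecast misses by 1, 2, ..., 10 steps in every fold, so each fold's RMSE is sqrt(38.5); any real model should beat this baseline.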

---

## Data Quality Report

```python
from scomp_link import DataQualityReport

dqr = DataQualityReport(df)
report = dqr.generate()  # missing, cardinality, constants, duplicates, correlations
dqr.save_html('quality_report.html')
```
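Most of the checks named in the `generate()` comment map onto one-line pandas operations. The sketch below is a minimal illustration under assumptions of its own (a 0.9 absolute-correlation cut-off, "constant" meaning at most one distinct non-null value); it is not `DataQualityReport`'s implementation.

```python
import pandas as pd


def quality_summary(df, corr_threshold=0.9):
    summary = pd.DataFrame({
        "missing_pct": df.isna().mean() * 100,
        "n_unique": df.nunique(dropna=True),  # cardinality
    })
    summary["constant"] = summary["n_unique"] <= 1
    # Absolute Pearson correlations between numeric columns (constant columns give NaN)
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    high_corr = [
        (a, b, float(corr.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] >= corr_threshold
    ]
    return {
        "columns": summary,
        "n_duplicate_rows": int(df.duplicated().sum()),
        "high_correlations": high_corr,
    }


df = pd.DataFrame({
    "a": [1, 2, 3, 4, 4],
    "b": [2, 4, 6, 8, 8],
    "c": [7, 7, 7, 7, 7],
    "d": ["x", None, "y", "z", "z"],
})
out = quality_summary(df)
print(out["columns"])
```

In this toy frame the last row duplicates the one before it, `c` is constant, `d` is 20% missing, and `b` is a perfect linear copy of `a`.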

---

## Anomaly Detection

```python
from scomp_link import AnomalyDetector

detector = AnomalyDetector(
    contamination=0.05,
    methods=['iforest', 'lof', 'tabnet', 'transformer'],
    consensus_threshold=2,
)
results = detector.fit_predict(df, features=['col1', 'col2', 'col3'])
```

---

## Project Structure

```
scomp_link/
├── cli.py                    # CLI (13 commands)
├── core.py                   # ScompLinkPipeline orchestrator
├── preprocessing/
│   ├── data_processor.py     # Preprocessor (polars backend)
│   ├── feature_engineer.py   # FeatureEngineer (sklearn-compatible)
│   └── data_quality.py       # DataQualityReport
├── models/
│   ├── model_factory.py      # Decision-tree model selection
│   ├── regressor_optimizer.py
│   ├── classifier_optimizer.py
│   ├── ensemble_optimizer.py
│   ├── advanced_tuning.py    # Optuna, Halving, EarlyStopping
│   ├── forecaster.py         # TimeSeriesForecaster
│   ├── anomaly_detector.py
│   ├── ts_anomaly_detector.py
│   ├── contrastive_text.py   # BERT contrastive learning
│   ├── supervised_text.py
│   └── supervised_img.py
├── validation/
│   ├── model_validator.py    # Metrics + HTML reports
│   ├── advanced_cv.py        # LOOCV, Bootstrap
│   └── fairness.py           # FairnessMetrics
├── explainability/
│   └── explainer.py          # ShapExplainer, LimeExplainer
├── monitoring/
│   └── drift_detector.py     # DriftDetector (PSI + KS)
├── persistence/
│   └── artifact.py           # ScompArtifact (.scomp format)
└── utils/
    ├── logger.py             # Configurable logging
    ├── report_html.py        # HTML report builder
    └── plotly_utils.py       # Chart utilities
```

---

## Testing

```bash
# Run all tests (185 tests)
pytest tests/ -v

# With coverage
pytest tests/ --cov=scomp_link --cov-report=html
```

---

## Documentation

To build and serve the full documentation (API reference and CLI guide) locally:

```bash
pip install mkdocs mkdocs-material "mkdocstrings[python]"
mkdocs serve  # http://localhost:8000
```

---

## Contributing

```bash
git clone https://github.com/GiacomoSaccaggi/scomp_link.git
cd scomp_link
pip install -e ".[dev]"
pytest tests/ -v
```

---

## License

MIT License

---

**May the code be with you.** 🚀

📦 [scomp-link on PyPI](https://pypi.org/project/scomp-link/)
