Metadata-Version: 2.1
Name: skpipeline-forge
Version: 1.0.0
Summary: Composable sklearn-compatible pipeline builder for feature engineering, validation, resampling, and monitoring.
Keywords: machine-learning,sklearn,pipeline,feature-engineering,resampling
Author: Tom
Author-email: tomrfreeman3@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: imbalanced-learn (>=0.14.0,<0.15.0)
Requires-Dist: mlflow (>=2.6.0,<3.0.0)
Requires-Dist: numpy (>=1.25.2,<2.0.0)
Requires-Dist: pandas (>=2.1.0,<3.0.0)
Requires-Dist: prophet (>=1.1.4,<2.0.0)
Requires-Dist: scikit-learn (>=1.3.0,<2.0.0)
Requires-Dist: setuptools (>=75.0.0,<76.0.0)
Description-Content-Type: text/markdown

# skpipeline-forge

`skpipeline-forge` is a composable, sklearn-compatible pipeline builder for tabular ML workflows.

It helps you define reusable transformation stacks as objects, then apply them consistently across:
- training
- scoring
- optional imbalance handling
- monitoring/reporting hooks

## Install

```bash
pip install skpipeline-forge
```

The distribution is installed as `skpipeline-forge`; the Quickstart imports it as `pipeline_forge`.

## Quickstart

```python
from pipeline_forge import FeaturePipelineBuilder

builder = FeaturePipelineBuilder(target_col="target", random_seed=42)

(
    builder
    .validate_schema(required_cols=["city", "hour", "age", "balance"], strict=False)
    .group_rare_categories(categorical_cols=["city"], min_freq=0.02)
    .target_encode(categorical_cols=["city"], drop_original=True)
    .encode_cyclical_time(period_map={"hour": 24})
    .add_interaction_features(numeric_cols=["age", "balance"])
    .scaler("robust")
    .random_oversampling(sampling_strategy=1.0)
)

# Assemble the configured steps into an sklearn/imblearn-compatible pipeline.
builder.build_pipeline()
```
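
For intuition about one of the steps above: `encode_cyclical_time(period_map={"hour": 24})` suggests mapping a periodic value onto a circle, so that hour 23 and hour 0 end up close together. The sketch below shows the standard sine/cosine encoding in plain Python. The helper names `cyclical_encode` and `circle_distance` are hypothetical, and the exact columns that `skpipeline-forge` produces may differ.

```python
import math


def cyclical_encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value to (sin, cos) coordinates on the unit circle."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def circle_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between two values after cyclical encoding."""
    return math.dist(cyclical_encode(a, period), cyclical_encode(b, period))


# Hours 23 and 0 are far apart as raw integers but adjacent on the circle,
# while hours 12 and 0 sit on opposite sides of it.
near = circle_distance(23, 0, period=24)
far = circle_distance(12, 0, period=24)
```

A model that sees the two encoded columns no longer treats midnight as a discontinuity, which is the usual motivation for this kind of encoding.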

## What You Get

- Fluent API for chaining transformations
- sklearn/imblearn-compatible pipelines
- Feature engineering and resampling in one object
- Extra pipeline layers for validation, selection, postprocessing, monitoring, and explainability
- Built-in common transformers for tabular data
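
"sklearn-compatible" in the list above generally means each step follows the fit/transform contract: statistics are learned once on training data and then reused unchanged at scoring time. The sketch below illustrates that contract with a rare-category grouper in plain Python. The class `RareCategoryGrouper`, the `"__rare__"` label, and the handling of unseen values are assumptions for illustration, not the library's implementation. A real scikit-learn transformer would typically also subclass `BaseEstimator` and `TransformerMixin`.

```python
from collections import Counter


class RareCategoryGrouper:
    """Hypothetical fit/transform step that collapses infrequent categories."""

    def __init__(self, min_freq: float = 0.02, rare_label: str = "__rare__"):
        self.min_freq = min_freq
        self.rare_label = rare_label

    def fit(self, values: list[str]) -> "RareCategoryGrouper":
        # Learn which categories are frequent enough, using training data only.
        counts = Counter(values)
        total = len(values)
        self.frequent_ = {c for c, n in counts.items() if n / total >= self.min_freq}
        return self

    def transform(self, values: list[str]) -> list[str]:
        # Reuse the learned set; rare and unseen categories share one label.
        return [v if v in self.frequent_ else self.rare_label for v in values]


# "oslo" appears in 1% of training rows, below the 2% threshold.
train = ["london"] * 60 + ["paris"] * 39 + ["oslo"]
grouper = RareCategoryGrouper(min_freq=0.02).fit(train)

# At scoring time the learned set is applied as-is, including to unseen "rome".
scored = grouper.transform(["london", "oslo", "rome"])
```

Keeping the learned state on the object is what lets the same step behave identically during training and scoring.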

## Documentation

- [Overview](docs/index.md)
- [Pipeline Layers](docs/pipelines.md)
- [Transformation Catalog](docs/transformations.md)
- [Examples](docs/examples.md)
- [Tutorial Notebook](tutorial.ipynb)

## Package Notes

- Python: `>=3.10,<4.0`
- Main runtime dependencies: `scikit-learn`, `imbalanced-learn`, `pandas`, `numpy`, `mlflow`, `prophet`
- Tests: `pytest`

## Development

```bash
poetry install --with dev
poetry run pytest -q
poetry build
```

