Metadata-Version: 2.4
Name: faro-prep
Version: 0.1.0
Summary: Data preprocessing and feature engineering for time-series forecasting
Author-email: Angel Zeledon <angel.zeledon.fernandez@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Angel-Zeledon/faro-lib
Project-URL: Repository, https://github.com/Angel-Zeledon/faro-lib
Keywords: preprocessing,feature-engineering,time-series,pandas,pipeline
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.5
Requires-Dist: numpy>=1.23
Requires-Dist: scikit-learn>=1.1
Requires-Dist: holidays>=0.20
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"

# faro-prep

A data preprocessing and feature engineering library for time-series forecasting. It provides a fluent, chainable API for cleaning, encoding, scaling, and generating time-series features from pandas DataFrames.

## Installation

```bash
pip install faro-prep
```

## Quick Start

```python
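from forecastlib.data import Loader
from forecastlib.pipeline import Pipeline

# Build the preprocessing chain step by step.
ds = (
    Loader.from_csv("sales.csv")
    .select(target="sales", datetime="date", group="store")  # declare column roles
    .clean.fix_datetime()
    .fill.smart()  # impute missing values
    .categorical().encode.auto()  # encode categorical columns
    .numeric().exclude(["sales"]).scale.standard()  # scale numeric columns, leaving out the target
    .target().lags([1, 7, 14])  # lag features of the target
    .target().rolling.mean([7, 30])  # rolling means of the target
    .datetime().features.calendar()  # calendar features from the datetime column
)

# Get the processed data as a pandas DataFrame.
df = ds.to_dataframe()

# Export the preprocessing steps as a pipeline and save it to disk.
pipeline = ds.to_pipeline()
pipeline.save("pipeline.pkl")

# Reload the saved pipeline later.
loaded = Pipeline.load("pipeline.pkl")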
```

## Features

- Chainable fluent API on `Dataset` objects
- Smart missing value imputation (median, forward-fill, interpolation)
- Automatic categorical encoding: label, one-hot, ordinal
- Flexible scaling: standard, minmax, robust, log
- Time-series features: lags, rolling mean/std/min/max, EWM, diffs
- Calendar features with cyclical sin/cos encoding and Colombian holidays
- Train/test splitting with expanding-window cross-validation
- Serializable preprocessing pipelines (save/load as `.pkl`)
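
For readers unfamiliar with the terms in the list above, here is a minimal sketch in plain pandas and NumPy of per-group lags, rolling means, cyclical month encoding, and expanding-window splits. It is not faro-prep's implementation: the helper names (`add_lag_and_rolling`, `add_cyclical_month`, `expanding_window_splits`), the output column names, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def add_lag_and_rolling(df, target, group, lags, windows):
    """Add per-group lag and rolling-mean columns (illustrative helper)."""
    out = df.copy()
    grouped = df.groupby(group)[target]
    for lag in lags:
        # shift() within each group, so values never cross group boundaries
        out[f"{target}_lag_{lag}"] = grouped.shift(lag)
    for window in windows:
        # Shift by one step first so the mean only uses past observations
        out[f"{target}_roll_mean_{window}"] = grouped.transform(
            lambda s, w=window: s.shift(1).rolling(w).mean()
        )
    return out


def add_cyclical_month(df, datetime_col):
    """Encode the month on the unit circle so December sits next to January."""
    out = df.copy()
    month = out[datetime_col].dt.month
    out["month_sin"] = np.sin(2 * np.pi * month / 12)
    out["month_cos"] = np.cos(2 * np.pi * month / 12)
    return out


def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs whose training window grows each fold."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        yield list(range(start)), list(range(start, start + test_size))


# Toy data: two stores, six daily observations each (made up for this sketch)
dates = pd.date_range("2024-01-01", periods=6, freq="D")
toy = pd.DataFrame(
    {
        "date": list(dates) * 2,
        "store": ["A"] * 6 + ["B"] * 6,
        "sales": [10.0, 12.0, 11.0, 13.0, 15.0, 14.0,
                  20.0, 22.0, 21.0, 23.0, 25.0, 24.0],
    }
)

features = add_lag_and_rolling(toy, "sales", "store", lags=[1, 2], windows=[3])
features = add_cyclical_month(features, "date")

# Three folds over eight time steps; each fold trains on everything before its test block
splits = list(expanding_window_splits(n_samples=8, n_splits=3, test_size=2))
```

Shifting before taking the rolling mean keeps the current observation out of its own feature, which is the usual way to avoid target leakage in forecasting. The expanding-window generator never lets a test block precede its training data, unlike shuffled k-fold splitting.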

## License

MIT
