Metadata-Version: 2.4
Name: lecrapaud
Version: 0.32.1
Summary: Framework for machine and deep learning, with regression, classification and time series analysis
License: Apache License
License-File: LICENSE
Author: Pierre H. Gallet
Requires-Python: ==3.12.*
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: alembic (>=1.17.2)
Requires-Dist: bandit (>=1.9.2)
Requires-Dist: black (>=25.12.0)
Requires-Dist: catboost (>=1.2.8)
Requires-Dist: category-encoders (>=2.9.0)
Requires-Dist: codecov (>=2.1.13)
Requires-Dist: coverage (>=7.13.0)
Requires-Dist: flake8 (>=7.3.0)
Requires-Dist: ftfy (>=6.3.1)
Requires-Dist: hyperopt (>=0.2.7)
Requires-Dist: ipykernel (>=7.1.0)
Requires-Dist: ipywidgets (>=8.1.8)
Requires-Dist: joblib (>=1.5.3)
Requires-Dist: keras (>=3.12.0)
Requires-Dist: keras-tcn (>=3.5.6)
Requires-Dist: lightgbm (>=4.6.0)
Requires-Dist: lime (>=0.2.0.1)
Requires-Dist: matplotlib (>=3.10.8)
Requires-Dist: mkdocs (>=1.6.0)
Requires-Dist: mkdocs-gen-files (>=0.5.0)
Requires-Dist: mkdocs-literate-nav (>=0.6.0)
Requires-Dist: mkdocs-material (>=9.5.0)
Requires-Dist: mkdocs-section-index (>=0.3.8)
Requires-Dist: mkdocstrings[python] (>=0.27.0)
Requires-Dist: mlxtend (>=0.23.4)
Requires-Dist: mypy (>=1.19.1)
Requires-Dist: numpy (>=2.1.3)
Requires-Dist: openai (>=2.13.0)
Requires-Dist: pandas (>=2.3.3)
Requires-Dist: pipdeptree (>=2.30.0)
Requires-Dist: poetry (>=2.2.1)
Requires-Dist: psycopg2-binary (>=2.9.9)
Requires-Dist: pydantic (>=2.12.5)
Requires-Dist: pylint (>=4.0.4)
Requires-Dist: pymysql (>=1.1.2)
Requires-Dist: pytest (>=9.0.2)
Requires-Dist: pytest-cov (>=7.0.0)
Requires-Dist: pytest-mock (>=3.15.1)
Requires-Dist: python-dotenv (>=1.2.1)
Requires-Dist: ray[tune] (>=2.52.1)
Requires-Dist: safety (>=3.7.0)
Requires-Dist: scikit-learn (>=1.6.1)
Requires-Dist: scipy (>=1.16.3)
Requires-Dist: seaborn (>=0.13.2)
Requires-Dist: sentry-sdk (>=2.48.0)
Requires-Dist: shap (>=0.50.0)
Requires-Dist: sqlalchemy (>=2.0.45)
Requires-Dist: tabulate (>=0.9.0)
Requires-Dist: tensorboard (<=2.19.0)
Requires-Dist: tensorboardx (>=2.6.4)
Requires-Dist: tensorflow (<=2.19.0)
Requires-Dist: tiktoken (>=0.12.0)
Requires-Dist: tqdm (>=4.67.1)
Requires-Dist: xgboost (>=3.1.2)
Description-Content-Type: text/markdown

<div align="center">

<img src="https://em-content.zobj.net/source/apple/129/frog-face_1f438.png" width=120 alt="crapaud"/>

## 🐸 LeCrapaud

**An all-in-one machine learning framework**

[![PyPI version](https://badge.fury.io/py/lecrapaud.svg)](https://badge.fury.io/py/lecrapaud)
[![Python versions](https://img.shields.io/pypi/pyversions/lecrapaud.svg)](https://pypi.org/project/lecrapaud)
[![Documentation](https://img.shields.io/badge/docs-lecrapaud.pierregallet.com-green)](https://lecrapaud.pierregallet.com)

</div>

---

LeCrapaud is a high-level Python library for end-to-end machine learning on tabular and time series data. It handles feature engineering, model selection, training, and prediction in one command.

### Key Features

- 🔄 **End-to-end ML pipeline** — feature engineering, preprocessing, feature selection, hyperparameter optimization, and training in a single `fit()` call
- 🤖 **11+ models** — from Linear Regression to XGBoost, LightGBM, CatBoost, and deep learning architectures (LSTM, GRU, TCN, Transformer)
- 🎯 **Automated feature selection** — ensemble of 10+ methods (Chi2, ANOVA, Mutual Information, SHAP, RFE, etc.)
- ⚡ **Hyperparameter optimization** — HyperOpt (TPE) and Ray Tune with cross-validation support
- 🔍 **Explainability** — built-in SHAP, LIME, feature importance, and tree visualization
- 🗄️ **Experiment tracking** — every experiment is stored in the database (PostgreSQL or MySQL) with full reproducibility
- 🧩 **Modular** — use the full pipeline or individual components (FeatureEngineer, FeaturePreprocessor, FeatureSelector) in sklearn-compatible pipelines
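
To give a feel for what "ensemble" feature selection with voting means, here is a minimal, library-independent sketch. It is not LeCrapaud's implementation: the `vote_features` helper, the method names, and the feature names are all hypothetical, and the real pipeline may weight or combine methods differently.

```python
from collections import Counter


def vote_features(selections: dict[str, set[str]], min_votes: int) -> list[str]:
    """Keep features chosen by at least `min_votes` selection methods."""
    # Count how many methods selected each feature.
    votes = Counter(feature for chosen in selections.values() for feature in chosen)
    # Sort by vote count (descending), then by name for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, count in ranked if count >= min_votes]


# Hypothetical output of three selection methods on the same dataset.
selections = {
    "chi2": {"age", "income", "city"},
    "mutual_info": {"age", "income", "tenure"},
    "rfe": {"age", "tenure", "city"},
}

selected = vote_features(selections, min_votes=2)
print(selected)  # ['age', 'city', 'income', 'tenure']
```

Raising `min_votes` makes the selection stricter: with `min_votes=3`, only `age` survives in this example.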

## Why LeCrapaud?

Most ML tools solve **one piece** of the puzzle. LeCrapaud handles the **entire workflow** in a single `fit()` call.

| | LeCrapaud | MLflow | scikit-learn | Auto-sklearn / TPOT |
|---|:---:|:---:|:---:|:---:|
| Feature engineering | ✅ Automated (Fourier dates, target encoding, imputation) | ❌ Manual | ❌ Manual | ❌ Generic only |
| Feature selection | ✅ Ensemble of 10+ methods with voting | ❌ Manual | ❌ One method at a time | ⚠️ Implicit |
| Hyperparameter optimization | ✅ HyperOpt + Ray Tune | ❌ Manual | ⚠️ GridSearchCV | ✅ Built-in |
| Multi-target support | ✅ Native (regression + classification) | ❌ | ❌ | ❌ |
| Deep learning models | ✅ LSTM, GRU, TCN, Transformer | ❌ | ⚠️ MLP only | ❌ |
| Time series support | ✅ Fourier features, temporal CV, RNNs | ❌ | ⚠️ Basic | ❌ |
| Explainability | ✅ SHAP + LIME + feature importance | ❌ | ⚠️ Feature importance only | ❌ |
| Experiment tracking | ✅ Full artifacts in PostgreSQL/MySQL | ✅ Tracking server | ❌ | ❌ |
| Reproducibility | ✅ Reload any experiment with `get(id=...)` | ✅ | ❌ | ⚠️ |
| sklearn compatibility | ✅ fit/transform pattern | ❌ | ✅ Native | ✅ |

**In short:**

- **MLflow** tracks experiments but doesn't train models or engineer features — you still write all the ML code yourself
- **scikit-learn** provides building blocks but requires manual pipeline composition, has no experiment tracking, and offers limited model support
- **AutoML tools** (auto-sklearn, TPOT) automate model selection but act as black boxes with no feature engineering transparency, no explainability, and no time series support
- **LeCrapaud** combines automated feature engineering, ensemble feature selection, hyperparameter optimization, multi-target training, explainability, and experiment tracking — all in one `fit()` call, while remaining transparent and customizable
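
The "Fourier dates" entry in the comparison table refers to encoding calendar fields as periodic signals. A common way to do this is a sine/cosine pair per field, sketched below with the standard library only. This is a generic illustration, not LeCrapaud's code: `cyclical_encode`, `encode_date`, and the output column names are hypothetical.

```python
import math
from datetime import date


def cyclical_encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_date(d: date) -> dict[str, float]:
    """Encode month and weekday so that neighbours in time stay close."""
    month_sin, month_cos = cyclical_encode(d.month - 1, 12)  # January -> 0
    dow_sin, dow_cos = cyclical_encode(d.weekday(), 7)       # Monday -> 0
    return {
        "month_sin": month_sin,
        "month_cos": month_cos,
        "dow_sin": dow_sin,
        "dow_cos": dow_cos,
    }


features = encode_date(date(2025, 1, 15))
print(features)
```

With this encoding, December and January end up as close to each other as January and February, which a plain integer month (12 vs. 1) does not capture.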

## Prerequisites

- **Python 3.12** (strictly required)
- **PostgreSQL** or **MySQL** database for experiment storage
- **macOS only:** [libomp](https://formulae.brew.sh/formula/libomp), required by LightGBM/XGBoost:
  ```sh
  brew install libomp
  ```
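
Because the package only installs on Python 3.12, it can help to check the interpreter before creating an environment. The snippet below is a small sketch using the standard library; the `is_supported` helper is hypothetical and not part of LeCrapaud.

```python
import sys

REQUIRED = (3, 12)  # matches the "Python 3.12" prerequisite above


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is some Python 3.12.x release."""
    return tuple(version_info[:2]) == REQUIRED


if not is_supported():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Warning: LeCrapaud requires Python 3.12, found {found}")
```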

## Installation

### 📦 From PyPI (recommended)

Install the latest stable release:

```sh
pip install lecrapaud
```

Or pin a specific version:

```sh
pip install lecrapaud==0.32.1
```

### 🔧 From source

Install the latest development version directly from GitHub:

```sh
pip install git+https://github.com/PierreGallet/lecrapaud.git
```

Or clone the repository and install locally:

```sh
git clone https://github.com/PierreGallet/lecrapaud.git
cd lecrapaud
pip install .
```

## Quick Start

```python
from lecrapaud import LeCrapaud

LeCrapaud.set_uri("mysql+pymysql://user:password@host:port/dbname")

lc = LeCrapaud(
    experiment_name="my_experiment",
    target_numbers=[1],
    target_clf=[1],
    models_idx=["lgb", "xgb"],
)

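# `data` and `new_data` are not defined in this snippet: supply your own
# training and inference datasets (assumed here to be pandas DataFrames).
lc.fit(data)
predictions, scores_reg, scores_clf = lc.predict(new_data)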
```
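
The connection string passed to `set_uri` follows the SQLAlchemy URL format, so passwords containing characters such as `@`, `:` or `/` need to be percent-encoded. Below is a minimal sketch of a helper that builds such a URI; `build_db_uri` and the credentials are hypothetical, and the `postgresql+psycopg2` driver name mentioned afterwards is the standard SQLAlchemy one, assumed to work here because PostgreSQL is listed as supported.

```python
from urllib.parse import quote_plus


def build_db_uri(user, password, host, port, dbname, driver="mysql+pymysql"):
    """Assemble a SQLAlchemy-style URI, escaping special characters in credentials."""
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


# Placeholder credentials for illustration only.
uri = build_db_uri("ml_user", "p@ss:word/1", "localhost", 3306, "lecrapaud")
print(uri)  # mysql+pymysql://ml_user:p%40ss%3Aword%2F1@localhost:3306/lecrapaud
```

The resulting string can then be passed to `LeCrapaud.set_uri(uri)`. For PostgreSQL, pass `driver="postgresql+psycopg2"` and the matching port.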

## Documentation

Full documentation is available at **[lecrapaud.pierregallet.com](https://lecrapaud.pierregallet.com)**.

## Contributing

Contributions are welcome! Here's how to get started.

### Development Setup

```sh
git clone https://github.com/PierreGallet/lecrapaud.git
cd lecrapaud
python3.12 -m venv .venv
source .venv/bin/activate
make install
```

### Workflow

1. **Open an issue** first to discuss the change you'd like to make
2. **Fork the repo** and create a branch from `main`:
   - `feat/your-feature` for new features
   - `fix/your-bugfix` for bug fixes
   - `docs/your-change` for documentation
3. **Write or update tests** when changing behavior
4. **Run the test suite** before submitting:
   ```sh
   make test
   ```
5. **Open a Pull Request** against `main` with a clear description

### Commit Convention

We use [Conventional Commits](https://www.conventionalcommits.org/). Every commit message and PR title must follow this format:

```
type: short description
```

| Type | Usage |
|------|-------|
| `feat:` | New feature |
| `fix:` | Bug fix |
| `docs:` | Documentation only |
| `refactor:` | Code change that neither fixes a bug nor adds a feature |
| `test:` | Adding or updating tests |
| `perf:` | Performance improvement |
| `ci:` | CI/CD changes |
| `chore:` | Maintenance tasks |

Examples:
```
feat: add catboost model support
fix: handle missing target column in predict
docs: update getting started guide
```
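
To check a message locally before pushing, a short script along these lines could be used. It is only a sketch: `is_valid_commit` is a hypothetical helper, not something shipped with the repository, and it checks only the `type: short description` form and the types listed in the table above (not the scopes or `!` markers that the full Conventional Commits specification also allows).

```python
import re

# Types taken from the table above.
COMMIT_TYPES = ("feat", "fix", "docs", "refactor", "test", "perf", "ci", "chore")
COMMIT_PATTERN = re.compile(rf"^({'|'.join(COMMIT_TYPES)}): \S.*$")


def is_valid_commit(message: str) -> bool:
    """Check that the first line of a commit message is `type: short description`."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PATTERN.match(first_line) is not None


print(is_valid_commit("feat: add catboost model support"))  # True
print(is_valid_commit("Add catboost model support"))        # False
```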

### Guidelines

- Keep PRs focused and small — one concern per PR
- Update documentation when APIs change
- Follow the existing code style
- All tests must pass before merging

## License

LeCrapaud is licensed under the [Apache License 2.0](LICENSE). You are free to use, modify, and distribute this software in compliance with the license terms.

---

Pierre Gallet 2025

