Metadata-Version: 2.4
Name: ferroml
Version: 1.0.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Typing :: Typed
Requires-Dist: numpy>=1.21
Requires-Dist: ferroml[polars,pandas,sklearn,cli] ; extra == 'all'
Requires-Dist: typer>=0.9 ; extra == 'cli'
Requires-Dist: rich>=13.0 ; extra == 'cli'
Requires-Dist: polars>=0.19 ; extra == 'cli'
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=4.0 ; extra == 'dev'
Requires-Dist: pytest-xdist>=3.0 ; extra == 'dev'
Requires-Dist: mypy>=1.0 ; extra == 'dev'
Requires-Dist: ruff>=0.1 ; extra == 'dev'
Requires-Dist: pandas>=1.5 ; extra == 'pandas'
Requires-Dist: polars>=0.19 ; extra == 'polars'
Requires-Dist: scikit-learn>=1.0 ; extra == 'sklearn'
Requires-Dist: ferroml[dev,sklearn,pandas] ; extra == 'test'
Provides-Extra: all
Provides-Extra: cli
Provides-Extra: dev
Provides-Extra: pandas
Provides-Extra: polars
Provides-Extra: sklearn
Provides-Extra: test
License-File: LICENSE-MIT
License-File: LICENSE-APACHE
Summary: Statistically rigorous AutoML in Rust with Python bindings
Keywords: machine-learning,automl,statistics,data-science,rust,scikit-learn,gradient-boosting,random-forest,svm,regression,classification,clustering,statistical-diagnostics,preprocessing,cross-validation,bayesian
Author: FerroML Contributors
License-Expression: MIT OR Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Changelog, https://github.com/robertlupo1997/ferroml/blob/master/CHANGELOG.md
Project-URL: Documentation, https://github.com/robertlupo1997/ferroml#readme
Project-URL: Homepage, https://github.com/robertlupo1997/ferroml
Project-URL: Issues, https://github.com/robertlupo1997/ferroml/issues
Project-URL: Repository, https://github.com/robertlupo1997/ferroml

# FerroML

[![PyPI](https://img.shields.io/pypi/v/ferroml.svg)](https://pypi.org/project/ferroml/)
[![CI](https://github.com/robertlupo1997/ferroml/actions/workflows/ci.yml/badge.svg)](https://github.com/robertlupo1997/ferroml/actions/workflows/ci.yml)
[![License](https://img.shields.io/crates/l/ferroml-core.svg)](https://github.com/robertlupo1997/ferroml)

**High-performance ML in Rust with a scikit-learn-compatible Python API.**

FerroML is a machine learning library written in Rust that provides 55+ algorithms with statistical rigor built in: confidence intervals on predictions, hypothesis testing for model comparison, and assumption checks on every model. It is faster than scikit-learn on predict for every benchmarked model (1.2-19x in the table below) and up to 9x faster on fit for tree/ensemble models.

## Installation

```bash
pip install ferroml
```

Requires Python 3.10+. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64).

## Quick Start

```python
from ferroml.linear import LinearRegression
import numpy as np

# Linear regression with full statistical diagnostics
X = np.random.randn(100, 5)
y = X @ np.array([1, 2, 3, 4, 5]) + np.random.randn(100) * 0.1

model = LinearRegression()
model.fit(X, y)
print(model.summary())  # R-style output: coefficients, std errors, p-values, R²
```
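
To see what the summary's statistics mean, the textbook OLS quantities can be computed directly with NumPy. This is a minimal sketch for checking results by hand, not FerroML's implementation; it assumes the model fits an intercept (as scikit-learn's `LinearRegression` does by default), and `ols_summary` is a hypothetical helper. Two-sided p-values would follow from each t-value and a Student's t distribution with `n - p - 1` degrees of freedom.

```python
import numpy as np


def ols_summary(X, y):
    """Textbook OLS with an intercept: coefficients, standard errors, t-values, R²."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])          # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # least-squares coefficients
    resid = y - Xd @ beta
    dof = n - (p + 1)                              # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # unbiased residual variance
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)        # covariance of the estimates
    se = np.sqrt(np.diag(cov))                     # standard errors
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, beta / se, r2


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.array([1, 2, 3, 4, 5]) + rng.standard_normal(100) * 0.1
beta, se, t, r2 = ols_summary(X, y)
print(np.round(beta, 2), round(r2, 4))  # intercept first, then the 5 slopes
```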

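```python
import numpy as np
from ferroml.trees import RandomForestClassifier
from ferroml.preprocessing import StandardScaler
from ferroml.pipeline import Pipeline

# Toy binary classification data with 0.0 / 1.0 labels
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(np.float64)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# scikit-learn-compatible pipeline: scale features, then classify
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=100)),
])
pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)  # accuracy on the held-out rows
```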

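```python
import numpy as np
from ferroml.preprocessing import CountVectorizer, TfidfTransformer
from ferroml.naive_bayes import MultinomialNB

# Text classification on a tiny toy corpus (1.0 = spam, 0.0 = not spam)
documents = ["win free money now", "meeting moved to noon",
             "claim your free prize", "lunch meeting tomorrow"]
y = np.array([1.0, 0.0, 1.0, 0.0])

cv = CountVectorizer()
X_counts = cv.fit_transform(documents)   # document-term count matrix
tfidf = TfidfTransformer()
X_tfidf = tfidf.fit_transform(X_counts)  # reweight counts by inverse document frequency
clf = MultinomialNB()
clf.fit(X_tfidf, y)
```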
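
The `TfidfTransformer` step above reweights raw counts so that terms appearing in many documents count for less. A minimal NumPy sketch of that weighting follows, assuming FerroML mirrors scikit-learn's defaults (raw counts as term frequency, smoothed idf `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row); `tfidf_from_counts` is a hypothetical helper, so check FerroML's documentation before relying on exact values.

```python
import numpy as np


def tfidf_from_counts(counts):
    """scikit-learn-style TF-IDF: smoothed idf, then L2-normalize each row."""
    counts = np.asarray(counts, dtype=np.float64)
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)            # documents containing each term
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0  # smoothed inverse document frequency
    weighted = counts * idf                          # raw term frequency times idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                          # leave all-zero rows unchanged
    return weighted / norms


# Rows are documents, columns are vocabulary terms (toy counts)
counts = np.array([[2, 1, 0],
                   [0, 1, 1]])
print(np.round(tfidf_from_counts(counts), 3))
```

The middle term appears in both documents, so it gets the minimum idf of 1 and is down-weighted relative to terms unique to one document.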

## Performance vs scikit-learn

All benchmarks produce predictions that match scikit-learn's. A speedup above 1x means FerroML is faster.

| Model | Samples (N) | Fit speedup | Predict speedup |
|-------|------------:|------------:|----------------:|
| RandomForest | 1K | **9.2x** | **7.8x** |
| Ridge | 1K | **5.3x** | **19.3x** |
| DecisionTree | 5K | **1.4x** | **16.4x** |
| GradientBoosting | 1K | **1.5x** | **1.2x** |
| LogisticRegression | 10K | **1.5x** | **13.7x** |

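FerroML is faster on **predict for every model above**, since prediction runs in compiled Rust with no Python overhead in the prediction loop, and on **fit for tree/ensemble models**, which are built in parallel with Rayon. scikit-learn can still win on fit for workloads dominated by LAPACK/MKL-backed linear algebra.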
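
Speedups like these are ratios of median wall-clock times. A minimal harness for reproducing such a comparison might look like the sketch below; the helper names are hypothetical and this is not the project's benchmark script. Exact prediction agreement is only expected for deterministic models; for randomized ensembles such as RandomForest, compare held-out scores instead.

```python
import time

import numpy as np


def median_time(fn, repeat=5):
    """Median wall-clock seconds over `repeat` calls of a zero-argument callable."""
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return float(np.median(times))


def speedup(baseline_fn, candidate_fn, repeat=5):
    """Ratio of median times; a value above 1 means the candidate is faster."""
    return median_time(baseline_fn, repeat) / median_time(candidate_fn, repeat)


def predictions_match(pred_a, pred_b, rtol=1e-6, atol=1e-8):
    """True if two prediction arrays agree within floating-point tolerance."""
    return bool(np.allclose(np.asarray(pred_a), np.asarray(pred_b), rtol=rtol, atol=atol))
```

For example, after fitting a scikit-learn model `sk_model` and a FerroML model `fm_model` on the same data, `speedup(lambda: sk_model.predict(X), lambda: fm_model.predict(X))` gives a predict speedup in the sense of the table.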

## Available Models

| Module | Models |
|--------|--------|
| `linear` | LinearRegression, LogisticRegression, Ridge, Lasso, ElasticNet, RidgeCV, LassoCV, ElasticNetCV, RidgeClassifier, QuantileRegression, RobustRegression, Perceptron, IsotonicRegression |
| `trees` | DecisionTreeClassifier, DecisionTreeRegressor, GradientBoostingClassifier/Regressor, HistGradientBoostingClassifier/Regressor |
| `ensemble` | RandomForest, ExtraTrees, AdaBoost, Bagging, Stacking, Voting (classifiers + regressors), SGD, PassiveAggressive |
| `naive_bayes` | GaussianNB, MultinomialNB, BernoulliNB, CategoricalNB |
| `svm` | SVC, SVR, LinearSVC, LinearSVR |
| `neighbors` | KNeighborsClassifier/Regressor, NearestCentroid |
| `neural` | MLPClassifier, MLPRegressor |
| `gaussian_process` | GaussianProcessClassifier, GaussianProcessRegressor |
| `clustering` | KMeans, DBSCAN, HDBSCAN, AgglomerativeClustering, GaussianMixture |
| `anomaly` | IsolationForest, LocalOutlierFactor |
| `decomposition` | PCA, IncrementalPCA, TruncatedSVD, LDA, QDA, FactorAnalysis, TSNE |
| `preprocessing` | 22+ transformers: scalers, encoders, imputers, SMOTE/ADASYN, CountVectorizer, TfidfTransformer |
| `explainability` | TreeSHAP, KernelSHAP, permutation importance, PDP, ICE, H-statistic |
| `multioutput` | MultiOutputClassifier, MultiOutputRegressor |
| `calibration` | TemperatureScaling, Sigmoid (Platt), Isotonic |
| `pipeline` | Pipeline, ColumnTransformer, FeatureUnion |
| `model_selection` | train_test_split, cross_validate, KFold, StratifiedKFold, GroupKFold, TimeSeriesSplit |
| `metrics` | ROC-AUC, F1, MCC, R², RMSE, MAE, roc_curve, precision_recall_curve |
| `automl` | AutoML with statistical model comparison |
| `datasets` | Iris, Diabetes, Wine, California Housing, synthetic generators |
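
The `automl` row mentions statistical model comparison. FerroML's exact procedure is not described here; as a hedged illustration of the general idea, the sketch below computes a plain paired t-statistic over per-fold scores of two models evaluated on identical cross-validation splits, using only the standard library. The function and the fold scores are hypothetical. A plain paired t-test ignores the overlap between CV training sets, so corrected variants (for example Nadeau and Bengio's corrected resampled t-test) are often preferred.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic for per-fold scores of two models on the same CV splits."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("score differences are constant; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1  # statistic and degrees of freedom


# Hypothetical 5-fold accuracies for two models evaluated on identical folds
model_a = [0.91, 0.89, 0.93, 0.90, 0.92]
model_b = [0.88, 0.87, 0.90, 0.89, 0.89]
t, dof = paired_t_statistic(model_a, model_b)
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

A two-sided p-value then follows from Student's t distribution with `dof` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), dof)`.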

## sklearn API Compatibility

FerroML supports the scikit-learn API conventions:

- `fit()` on all models, with `predict()` on estimators and `transform()` on transformers
- `score()` on 56 models (R² for regressors, accuracy for classifiers)
- `partial_fit()` on 10 models for incremental learning
- `decision_function()` on 13 classifiers
- `predict_proba()` on probabilistic classifiers
- Pipeline and ColumnTransformer composition
- NumPy array input/output
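
Because these conventions are shared, code written against the scikit-learn API can probe an estimator's capabilities by duck typing. A minimal sketch with hypothetical helpers, assuming FerroML's `partial_fit` accepts `(X, y, **kwargs)` chunks the way scikit-learn's does (scikit-learn classifiers also need `classes=` on the first call); `_EchoModel` is a stand-in used only to demonstrate the helpers.

```python
METHODS = ("fit", "predict", "transform", "score", "partial_fit",
           "predict_proba", "decision_function")


def api_surface(estimator):
    """Return the scikit-learn-style methods an estimator exposes as callables."""
    return [name for name in METHODS if callable(getattr(estimator, name, None))]


def fit_in_chunks(estimator, X, y, chunk_size, **partial_fit_kwargs):
    """Train incrementally by calling partial_fit on consecutive row chunks."""
    if not callable(getattr(estimator, "partial_fit", None)):
        raise TypeError(f"{type(estimator).__name__} has no partial_fit")
    for start in range(0, len(X), chunk_size):
        stop = start + chunk_size
        estimator.partial_fit(X[start:stop], y[start:stop], **partial_fit_kwargs)
    return estimator


class _EchoModel:
    """Hypothetical stand-in estimator used only to demonstrate the helpers."""

    def __init__(self):
        self.n_updates = 0

    def fit(self, X, y):
        return self

    def predict(self, X):
        return X

    def partial_fit(self, X, y, **kwargs):
        self.n_updates += 1
        return self


toy = fit_in_chunks(_EchoModel(), list(range(10)), list(range(10)), chunk_size=4)
print(api_surface(toy), toy.n_updates)
```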

## Testing

5,650+ tests pass (3,550+ in Rust and 2,100+ in Python), including 200+ cross-library correctness tests that validate FerroML against scikit-learn, scipy, xgboost, lightgbm, and statsmodels.

## License

MIT OR Apache-2.0

