Metadata-Version: 2.4
Name: process-improve
Version: 1.21.4
Summary: Designed Experiments; Latent Variables (PCA, PLS, multivariate methods with missing data); Process Monitoring; Batch data analysis.
Keywords: Designed Experiments,Latent Variables,PCA,PLS,Multivariate Data Analysis,Batch data analysis
Author: Kevin Dunn
Author-email: Kevin Dunn <kgdunn@gmail.com>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: numba>=0.63.1
Requires-Dist: numpy>=2.2.6
Requires-Dist: openpyxl>=3.1.5
Requires-Dist: pandas>=2.3.3
Requires-Dist: patsy>=1.0.2
Requires-Dist: plotly>=6.5.2
Requires-Dist: pydantic>=2.12.5
Requires-Dist: pyyaml>=6.0
Requires-Dist: pydoe3>=1.0
Requires-Dist: ridgeplot>=0.5.0
Requires-Dist: scikit-image>=0.25.2
Requires-Dist: scikit-learn>=1.7.2
Requires-Dist: seaborn>=0.13.2
Requires-Dist: statsmodels>=0.14.6
Requires-Dist: tqdm>=4.67.1
Requires-Dist: coverage>=7.13.1 ; extra == 'dev'
Requires-Dist: hypothesis>=6.100 ; extra == 'dev'
Requires-Dist: matplotlib-stubs>=0.3.11 ; extra == 'dev'
Requires-Dist: mypy>=1.19.1 ; extra == 'dev'
Requires-Dist: pandas-stubs>=2.3.3.260113 ; extra == 'dev'
Requires-Dist: plotly-stubs>=0.0.6 ; extra == 'dev'
Requires-Dist: pre-commit>=4.5.1 ; extra == 'dev'
Requires-Dist: pytest>=9.0.2 ; extra == 'dev'
Requires-Dist: pytest-cov>=7.0.0 ; extra == 'dev'
Requires-Dist: pytest-xdist>=3.8.0 ; extra == 'dev'
Requires-Dist: pydata-sphinx-theme>=0.16.1 ; extra == 'dev'
Requires-Dist: ruff>=0.11.0 ; extra == 'dev'
Requires-Dist: sphinx>=8.1.3 ; extra == 'dev'
Requires-Dist: nbsphinx>=0.9.5 ; extra == 'dev'
Requires-Dist: ipykernel>=6.29 ; extra == 'dev'
Requires-Dist: tqdm-stubs>=0.2.1 ; extra == 'dev'
Requires-Dist: watchdog>=6.0.0 ; extra == 'dev'
Requires-Dist: mcp>=1.0 ; extra == 'mcp'
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/kgdunn/process_improve
Project-URL: Repository, https://github.com/kgdunn/process_improve
Project-URL: Issues, https://github.com/kgdunn/process_improve/issues
Provides-Extra: dev
Provides-Extra: mcp
Description-Content-Type: text/markdown

# process-improve

**Multivariate analysis, designed experiments, and process monitoring for Python.**
Built for chemometrics, manufacturing, and pharma data - the methods that scikit-learn skips.

[![PyPI version](https://img.shields.io/pypi/v/process-improve.svg)](https://pypi.org/project/process-improve/)
[![Python versions](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fkgdunn%2Fprocess-improve%2Fmain%2Fpyproject.toml&label=python)](https://pypi.org/project/process-improve/)
[![Downloads](https://static.pepy.tech/badge/process-improve)](https://pepy.tech/project/process-improve)
[![Downloads per month](https://static.pepy.tech/badge/process-improve/month)](https://pepy.tech/project/process-improve)
[![CI](https://github.com/kgdunn/process-improve/actions/workflows/run-tests.yml/badge.svg?branch=main&event=push)](https://github.com/kgdunn/process-improve/actions/workflows/run-tests.yml?query=branch%3Amain)
[![codecov](https://codecov.io/gh/kgdunn/process-improve/branch/main/graph/badge.svg)](https://codecov.io/gh/kgdunn/process-improve)
[![Docs](https://img.shields.io/badge/docs-kgdunn.github.io-blue.svg)](https://kgdunn.github.io/process-improve/)
[![License](https://img.shields.io/pypi/l/process-improve.svg)](LICENSE)

---

## What it does

`process-improve` provides production-grade implementations of the methods
practitioners actually use on real plant and lab data:

- **PCA** with SVD and NIPALS, plus native missing-value handling via Trimmed
  Score Regression
- **PLS** regression with a fully sklearn-compatible API, VIP scores, and
  cross-validated diagnostics
- **TPLS** - PLS for *T-shaped (multi-block) data structures*
- **Outlier detection** combining Hotelling's T² and SPE with an ESD-based test
- **Designed experiments** - full-factorial, fractional-factorial, and
  response-surface designs, plus a multi-stage DoE strategy recommender
- **Process monitoring** - Shewhart, CUSUM, and Holt-Winters control charts
- **Batch data analysis** - alignment, feature extraction, and multivariate
  batch monitoring (MBPCA / MBPLS)
- **Interactive Plotly diagnostics** bound directly to every fitted model

Outputs are `pandas`-native: scores, loadings, and predictions keep your row
and column labels.
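
To give a feel for how PCA can be fitted when some cells are missing, here is a bare-bones NIPALS loop in NumPy that skips `NaN` entries in each regression step. It is a sketch of the general idea only: the helper name `nipals_pca_missing` and its arguments are hypothetical, it is not the package's implementation, and it does not cover Trimmed Score Regression.

```python
import numpy as np

def nipals_pca_missing(X, n_components=2, max_iter=500, tol=1e-10):
    """NIPALS PCA that ignores NaN cells (illustrative helper, not the package API)."""
    X = np.array(X, dtype=float)
    mask = ~np.isnan(X)                      # True where a value is present
    R = np.where(mask, X, 0.0)               # residuals; missing cells held at zero
    N, K = X.shape
    T = np.zeros((N, n_components))          # scores
    P = np.zeros((K, n_components))          # loadings
    for a in range(n_components):
        # Start from the column with the largest sum of squares
        t = R[:, np.argmax((R**2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Regress each column on t, using only the rows present in that column
            p = (R.T @ t) / (mask.T @ (t**2))
            p /= np.linalg.norm(p)
            # Regress each row on p, using only the columns present in that row
            t_new = (R @ p) / (mask @ (p**2))
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        R = np.where(mask, R - np.outer(t, p), 0.0)   # deflate, keep gaps at zero
    return T, P
```

The product `T @ P.T` then gives a model-based estimate for every cell, including the ones that were missing. The sketch assumes every row and column has at least one observed value and that the data were centered beforehand.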

It is the companion package to the online textbook
[Process Improvement using Data](https://learnche.org/pid), and powers the
statistical engine behind [factori.al](https://factori.al).

## Why not scikit-learn?

scikit-learn answers *"what fits the data?"* - `process-improve` answers
*"is this batch normal, which variable went off, and how confident am I in the
prediction?"* The two libraries are designed to be used together;
`process-improve` follows sklearn conventions (`fit`, `predict`, `score`, the
`_` suffix on fitted attributes) and drops into existing pipelines.

| Capability                                       | scikit-learn | process-improve |
| ------------------------------------------------ | :----------: | :-------------: |
| PCA, PLS with sklearn-style API                  |       ✓      |        ✓        |
| Missing-data fitting (NIPALS / TSR)              |       -      |        ✓        |
| Hotelling's T² + SPE outlier limits              |       -      |        ✓        |
| Variable-level score contributions               |       -      |        ✓        |
| Cross-validated coefficient confidence intervals |       -      |        ✓        |
| Multi-block models (TPLS)                        |       -      |        ✓        |
| Designed experiments (DoE)                       |       -      |        ✓        |
| Control charts (Shewhart / CUSUM / Holt-Winters) |       -      |        ✓        |
| Batch process monitoring (MBPCA / MBPLS)         |       -      |        ✓        |
| Plotly diagnostics built in                      |       -      |        ✓        |
| Labeled `DataFrame` outputs                      |    partial   |        ✓        |
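
As a rough illustration of the two diagnostics named in the table, the sketch below computes Hotelling's T² and SPE for each row from a plain SVD-based PCA in NumPy. The helper name `t2_and_spe` is hypothetical, and the package's own scaling conventions and limit calculations may differ.

```python
import numpy as np

def t2_and_spe(X, n_components):
    """Hotelling's T² and SPE per row from an SVD-based PCA (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # mean-center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                        # loadings, K x A
    T = Xc @ P                                     # scores, N x A
    score_var = T.var(axis=0, ddof=1)              # variance of each score column
    t2 = ((T**2) / score_var).sum(axis=1)          # distance within the model plane
    E = Xc - T @ P.T                               # residuals off the model plane
    spe = (E**2).sum(axis=1)                       # squared prediction error per row
    return t2, spe
```

T² measures how far a row sits from the center within the model plane, while SPE measures how far it sits off that plane, which is why the two are usually read together.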

## Installation

```bash
pip install process-improve
```

Requires Python 3.10 or newer. Built on `numpy`, `pandas`, `scipy`,
`scikit-learn`, `statsmodels`, `plotly`, and `pyDOE3`.
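
A quick way to confirm the install is to query the installed distribution from Python. This sketch uses only the standard library; the helper name `installed_version` is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="process-improve"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print("Python 3.10 or newer:", sys.version_info >= (3, 10))
print("process-improve version:", installed_version())
```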

## Quick start

### PCA - Principal Component Analysis

```python
import pandas as pd
from process_improve.multivariate.methods import PCA, MCUVScaler

X = pd.read_csv("your_data.csv", index_col=0)
X_scaled = MCUVScaler().fit_transform(X)

pca = PCA(n_components=3).fit(X_scaled)
print(pca.r2_cumulative_)         # cumulative R² per component
pca.score_plot()                  # interactive Plotly figure

# Flag outliers using combined T² and SPE limits at 95% confidence
outliers = pca.detect_outliers(conf_level=0.95)

# Which variables drove the first observation off?
contrib = pca.score_contributions(pca.scores_.iloc[0].values)
```
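
The scaling step matters because PCA is sensitive to the units of each column. The sketch below shows mean-centering and unit-variance scaling in plain `pandas`, on the assumption that this is what `MCUVScaler` performs. The helper name `mcuv` and the choice of the sample standard deviation (`ddof=1`) are assumptions for illustration.

```python
import pandas as pd

def mcuv(df):
    """Mean-center each column and scale it to unit variance (illustrative helper)."""
    center = df.mean()
    scale = df.std(ddof=1)          # assumption: sample standard deviation
    return (df - center) / scale, center, scale

X_demo = pd.DataFrame(
    {"temp": [20.0, 22.0, 24.0], "flow": [1.0, 3.0, 5.0]},
    index=["run1", "run2", "run3"],
)
X_demo_scaled, center, scale = mcuv(X_demo)
print(X_demo_scaled)                # row and column labels are carried through
```

Keeping `center` and `scale` from the training data is what allows new observations to be put on the same footing later.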

### PLS - Projection to Latent Structures

```python
from process_improve.multivariate.methods import PLS, MCUVScaler

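# X and Y are training DataFrames with matching rows; X_new holds new
# observations with the same columns as X. Scale X and Y separately.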
scaler_x = MCUVScaler().fit(X)
scaler_y = MCUVScaler().fit(Y)
X_s, Y_s = scaler_x.transform(X), scaler_y.transform(Y)

pls = PLS(n_components=3).fit(X_s, Y_s)
print(pls.beta_coefficients_)     # regression coefficients (K x M)
print(pls.r2_cumulative_)         # cumulative R² for Y
print(pls.vip())                  # VIP scores per X variable

# Predict new observations, with diagnostics on the prediction
result = pls.predict(scaler_x.transform(X_new))
result.y_hat                      # point predictions
result.spe                        # squared prediction error
result.hotellings_t2              # Hotelling's T² for new observations

# Cross-validated component selection
cv_select = PLS.select_n_components(X_s, Y_s, max_components=6)
print(cv_select.n_components)     # recommended number of components
print(cv_select.rmsecv)           # RMSECV per component count

# Cross-validation with beta-coefficient confidence intervals
cv = pls.cross_validate(X_s, Y_s, cv="loo")
print(cv.beta_ci_lower, cv.beta_ci_upper)   # 95% CI for each beta
print(cv.significant)                       # betas significantly != 0
print(cv.q_squared)                         # cross-validated R² (Q²)
```
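
For intuition about what a VIP score summarizes, here is a minimal single-response PLS fitted by NIPALS in NumPy, followed by one common VIP definition: each variable's squared weights, averaged over components in proportion to the Y variance each component explains. The helper name `pls1_vip` is hypothetical, and the value returned by `pls.vip()` may differ in detail.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Single-response NIPALS PLS; returns one VIP value per X column (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)                  # center X (creates a copy)
    y = y - y.mean()                        # center y
    K = X.shape[1]
    W = np.zeros((K, n_components))         # unit-length weight vectors
    ssy = np.zeros(n_components)            # Y sum of squares explained per component
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                           # X scores
        tt = t @ t
        p = X.T @ t / tt                    # X loadings
        q = (y @ t) / tt                    # y loading
        X = X - np.outer(t, p)              # deflate X
        y = y - q * t                       # deflate y
        W[:, a] = w
        ssy[a] = q**2 * tt
    return np.sqrt(K * (W**2 @ ssy) / ssy.sum())
```

Under this definition the squared VIP values average to 1 across the X variables, which is where the common rule of thumb of comparing each VIP against 1 comes from.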

### DoE - multi-stage experimental strategy

```python
from process_improve.experiments.factor import Factor, Response
from process_improve.experiments.strategy import recommend_strategy

factors = [
    Factor(name="Temperature", low=25, high=40, units="degC"),
    Factor(name="pH", low=5.0, high=7.5),
    Factor(name="Glucose", low=10, high=50, units="g/L"),
]
strategy = recommend_strategy(
    factors=factors,
    responses=[Response(name="Yield", goal="maximize", units="g/L")],
    budget=40,
    domain="fermentation",
)
for s in strategy["stages"]:
    print(s["stage_number"], s["design_type"], s["estimated_runs"])
```
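
To make the run counts concrete, the sketch below enumerates a two-level full factorial for the three factors above using only the standard library. It illustrates the basic building block of such designs and is not a description of what `recommend_strategy` returns; the helper name `full_factorial` and the `ranges` dictionary are hypothetical.

```python
from itertools import product

# Factor ranges copied from the example above: name -> (low, high)
ranges = {"Temperature": (25, 40), "pH": (5.0, 7.5), "Glucose": (10, 50)}

def full_factorial(ranges):
    """Return every low/high combination as a list of dicts (illustrative helper)."""
    names = list(ranges)
    runs = []
    for coded in product((-1, +1), repeat=len(names)):
        # Coded level -1 maps to the low setting, +1 to the high setting
        runs.append(
            {n: ranges[n][0] if c < 0 else ranges[n][1] for n, c in zip(names, coded)}
        )
    return runs

runs = full_factorial(ranges)
print(len(runs))        # 2**3 = 8 runs
for run in runs:
    print(run)
```

Eight runs for three factors leaves most of a 40-run budget for replicates, center points, or a follow-up stage, which is the kind of trade-off a multi-stage strategy has to weigh.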

Longer, fully-worked versions of each example live in the
[Quickstart guide](https://kgdunn.github.io/process-improve/quickstart.html)
and the `process_improve/notebooks_examples/` folder.

New to designed experiments? The
[**Applied DoE tutorial**](https://kgdunn.github.io/process-improve/applied_doe/index.html)
is an eight-module worked-solution series.

## API design

PCA and PLS follow scikit-learn conventions: `fit()` returns `self`, fitted
attributes end with a trailing underscore (`scores_`, `loadings_`, `spe_`,
`hotellings_t2_`, `r2_cumulative_`, ...), and `predict()` returns an
`sklearn.utils.Bunch` with named fields (`y_hat`, `spe`, `hotellings_t2`, ...).
Inputs are accepted as `pandas.DataFrame`, and index/column labels are
preserved through `fit` and `transform`.
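
These conventions can be sketched with a toy estimator. The class below is hypothetical and does nothing useful statistically; it only shows `fit()` returning `self`, a trailing-underscore fitted attribute, a result object with named fields, and labels carried through. `types.SimpleNamespace` stands in for `sklearn.utils.Bunch` here to keep the example dependency-light; unlike `Bunch`, it supports attribute access only.

```python
from types import SimpleNamespace

import pandas as pd

class ColumnMeanModel:
    """Toy estimator that predicts the training column means (hypothetical)."""

    def fit(self, X: pd.DataFrame):
        self.means_ = X.mean()              # fitted attribute: trailing underscore
        return self                         # fit() returns self, so calls chain

    def predict(self, X: pd.DataFrame):
        y_hat = pd.DataFrame(
            [self.means_.to_numpy()] * len(X), index=X.index, columns=X.columns
        )
        spe = ((X - y_hat) ** 2).sum(axis=1)    # indexed by the rows of X
        # Stand-in for sklearn.utils.Bunch: named fields on one result object
        return SimpleNamespace(y_hat=y_hat, spe=spe)

X = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, 6.0]}, index=["r1", "r2"])
result = ColumnMeanModel().fit(X).predict(X)
print(result.y_hat)
print(result.spe)
```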

## Documentation & learning resources

- **API reference & user guide:** <https://kgdunn.github.io/process-improve/>
- **Applied DoE tutorial (8 modules):**
  <https://kgdunn.github.io/process-improve/applied_doe/index.html>
- **Companion textbook:** [Process Improvement using Data](https://learnche.org/pid)
- **Hosted experiment-design tool:** [factori.al](https://factori.al)
- **Local docs build:** `cd docs && make html`

## Citing process-improve

If you use this package in academic work, please cite it:

```bibtex
@software{dunn_process_improve,
  author  = {Dunn, Kevin G.},
  title   = {{process-improve: Multivariate Analysis for Process Improvement}},
  year    = {2026},
  version = {v1.21.4},
  url     = {https://github.com/kgdunn/process-improve}
}
```

A `CITATION.cff` file is included, so GitHub renders a *"Cite this
repository"* button in the sidebar.

## Contributing

Bug reports, feature requests, and pull requests are welcome. File bugs and
feature requests on the
[issue tracker](https://github.com/kgdunn/process-improve/issues), and see
[CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and code
style.

## License

MIT - see [LICENSE](LICENSE) for details.
