Metadata-Version: 2.4
Name: alina-mypackage
Version: 0.1.0
Summary: A from-scratch machine-learning library: regression, preprocessing, KNN, neural networks, diagnostics, and more.
Author: Alina
License-Expression: MIT
Project-URL: Homepage, https://github.com/yourusername/mypackage-alina
Project-URL: Issues, https://github.com/yourusername/mypackage-alina/issues
Keywords: machine-learning,regression,preprocessing,knn,pca,from-scratch
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23
Provides-Extra: viz
Requires-Dist: matplotlib>=3.5; extra == "viz"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: license-file
Dynamic: requires-python

# mypackage-alina

A **from-scratch** machine-learning library covering regression, preprocessing,
nearest neighbors, neural networks, decomposition, diagnostics, and more,
implemented in pure NumPy for educational clarity and portfolio value.

Version: `0.1.0`

---

## Installation

```bash
# From PyPI
pip install alina-mypackage

# With optional visualization support
pip install "alina-mypackage[viz]"

# Editable / development install from source
# (replace the URL with your repository if you host it)
git clone https://github.com/yourusername/mypackage-alina.git
cd mypackage-alina
pip install -e ".[dev]"
```

---

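## Importing from `mypackage`

The distribution is installed as `alina-mypackage`, but the importable package is named `mypackage`: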

```python
from mypackage import OLSRegression, StandardScaler, KNNRegressor, PCA
```

---

## Features

| Module | Algorithms / Tools |
|---|---|
| **regression** | OLS, Ridge (L2), Lasso (L1) |
| **neighbors** | KNN Classifier, KNN Regressor |
| **neural_networks** | Single-layer Perceptron (regression) |
| **preprocessing** | StandardScaler, OneHotEncoder, TargetEncoder, MissingValueHandler, OutlierHandler, PolynomialFeatures, TargetTransformer, FeatureSelector |
| **metrics** | MSE, RMSE, MAE, R² |
| **model_selection** | train_test_split, CrossValidation (k-fold) |
| **diagnostics** | VIF, NormalityTest, HeteroscedasticityTest |
| **feature_selection** | ForwardSelection, BackwardElimination |
| **decomposition** | PCA |
| **visualization** | RegressionPlots (actual vs predicted, residuals, histogram) |

---

## Quick Start

```python
import numpy as np
from mypackage import (
    OLSRegression,
    StandardScaler,
    train_test_split,
    mse, rmse, r2_score,
)

# --- Synthetic data ---
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.5, 0.8]) + rng.standard_normal(100) * 0.3

# --- Split ---
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- Preprocess ---
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# --- Fit ---
model = OLSRegression()
model.fit(X_train, y_train)

# --- Evaluate ---
y_pred = model.predict(X_test)
print(f"MSE:  {mse(y_test, y_pred):.4f}")
print(f"RMSE: {rmse(y_test, y_pred):.4f}")
print(f"R²:   {r2_score(y_test, y_pred):.4f}")
```

---

## Algorithms in Detail

### Regression

#### OLS (Ordinary Least Squares)
Solves the normal equations in closed form using the Moore-Penrose pseudo-inverse, which stays well defined even when `XᵀX` is singular (for example, with collinear features).

```python
from mypackage import OLSRegression
model = OLSRegression()
model.fit(X_train, y_train)
print(model.coef_, model.intercept_)
```
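
As a rough illustration of the pseudo-inverse approach described above, the NumPy-only sketch below fits an intercept and coefficients with `np.linalg.pinv`. The function name `ols_pinv` is hypothetical and not part of `mypackage`; the library's internals may differ.

```python
import numpy as np

def ols_pinv(X, y):
    """Least-squares fit via the Moore-Penrose pseudo-inverse (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first weight acts as the intercept.
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    w = np.linalg.pinv(Xb) @ y
    return w[0], w[1:]  # intercept, coefficients

# Noise-free data generated from y = 1 + 2*x0 - 3*x1
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -3.0])
intercept, coef = ols_pinv(X_demo, y_demo)
```

Because the demo data are noise-free, the recovered intercept and coefficients should match the generating values up to floating-point error.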

#### Ridge Regression (L2)
```python
from mypackage import RidgeRegression
model = RidgeRegression(learning_rate=0.001, epochs=10_000, lambda_param=0.01)
model.fit(X_train, y_train)
```
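
The `learning_rate` and `epochs` arguments suggest an iterative fit. Assuming plain batch gradient descent on mean squared error plus an L2 penalty on the weights (an assumption; the library's exact loss scaling is not documented here), one common formulation looks like the sketch below. The name `ridge_gd` is hypothetical.

```python
import numpy as np

def ridge_gd(X, y, learning_rate=0.01, epochs=5000, lambda_param=0.1):
    """Batch gradient descent on (1/n)*||Xw + b - y||^2 + lambda*||w||^2 (assumed loss)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y
        # The L2 penalty adds 2*lambda*w to the weight gradient; the bias is not penalized.
        grad_w = (2.0 / n) * (X.T @ residual) + 2.0 * lambda_param * w
        grad_b = (2.0 / n) * residual.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

rng = np.random.default_rng(1)
X_demo = rng.standard_normal((200, 3))
y_demo = X_demo @ np.array([2.0, -1.5, 0.8]) + 0.5
w_plain, _ = ridge_gd(X_demo, y_demo, lambda_param=0.0)  # no penalty
w_ridge, _ = ridge_gd(X_demo, y_demo, lambda_param=1.0)  # strong penalty
```

With `lambda_param=0.0` the sketch reduces to ordinary least squares; increasing it shrinks the weight vector toward zero.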

#### Lasso Regression (L1)
```python
from mypackage import LassoRegression
model = LassoRegression(learning_rate=0.001, epochs=10_000, lambda_param=0.01)
model.fit(X_train, y_train)
```
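
The L1 penalty is not differentiable at zero, so a gradient-style fit typically uses a subgradient such as `sign(w)`. The sketch below assumes that approach; whether `LassoRegression` uses subgradient descent, coordinate descent, or something else is not stated here, and `lasso_subgradient` is a hypothetical name.

```python
import numpy as np

def lasso_subgradient(X, y, learning_rate=0.01, epochs=3000, lambda_param=0.5):
    """Subgradient descent on (1/n)*||Xw + b - y||^2 + lambda*||w||_1 (assumed loss)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y
        # np.sign(w) is a valid subgradient of ||w||_1 (it returns 0 where w == 0).
        grad_w = (2.0 / n) * (X.T @ residual) + lambda_param * np.sign(w)
        grad_b = (2.0 / n) * residual.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# The middle feature has a true coefficient of zero.
rng = np.random.default_rng(2)
X_demo = rng.standard_normal((500, 3))
y_demo = X_demo @ np.array([3.0, 0.0, -2.0])
w_lasso, b_lasso = lasso_subgradient(X_demo, y_demo)
```

Plain subgradient descent drives irrelevant weights close to zero but rarely to exactly zero; proximal or coordinate-descent variants are the usual choice when exact sparsity matters.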

---

### Preprocessing

```python
from mypackage import (
    StandardScaler,
    MissingValueHandler,
    OutlierHandler,
    OneHotEncoder,
    PolynomialFeatures,
    TargetTransformer,
    FeatureSelector,
)

# Handle missing values (X_with_nans is a placeholder for an array containing NaNs)
handler = MissingValueHandler(strategy="mean")
X_clean = handler.fit_transform(X_with_nans)

# Remove outliers (IQR fence)
oh = OutlierHandler()
X_no_outliers = oh.fit_transform(X)

# One-hot encode categorical columns (X_categorical is a placeholder for your categorical data)
enc = OneHotEncoder()
X_encoded = enc.fit_transform(X_categorical)

# Polynomial features up to degree 2
pf = PolynomialFeatures(degree=2)
X_poly = pf.fit_transform(X)

# Log-transform target (log1p is only defined for y > -1)
tt = TargetTransformer(method="log1p")
y_log = tt.fit_transform(y)
y_original = tt.inverse_transform(y_log)

# Select top-2 features by correlation
fs = FeatureSelector(method="correlation", k=2)
X_selected = fs.fit_transform(X, y)
```
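
The "IQR fence" mentioned in the outlier comment above can be sketched in a few lines of NumPy. The 1.5 multiplier and the choice to drop whole rows (rather than clip values) are assumptions for illustration; `iqr_mask` is a hypothetical helper, not the `OutlierHandler` implementation.

```python
import numpy as np

def iqr_mask(X, factor=1.5):
    """Boolean mask of rows whose every feature lies inside the IQR fence."""
    X = np.asarray(X, dtype=float)
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # A row is kept only if all of its features fall within [lower, upper].
    return np.all((X >= lower) & (X <= upper), axis=1)

X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
mask = iqr_mask(X_demo)   # the fence here is [-1, 7], so only 100.0 is flagged
X_kept = X_demo[mask]
```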

---

### KNN

```python
from mypackage import KNNClassifier, KNNRegressor

# y_train_labels / y_test_labels are placeholders for class labels aligned with X_train / X_test
clf = KNNClassifier(k=5, distance="euclidean")
clf.fit(X_train, y_train_labels)
print(clf.score(X_test, y_test_labels))

reg = KNNRegressor(k=3, distance="manhattan")
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))
```
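
For intuition, a brute-force k-nearest-neighbors regressor fits in a short NumPy function. This is a minimal sketch under the assumption that predictions are the unweighted mean of the k closest training targets; `knn_predict` is a hypothetical name and the library may break ties or weight neighbors differently.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3, distance="euclidean"):
    """Predict each query point as the mean target of its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise differences with shape (n_query, n_train, n_features).
    diff = X_query[:, None, :] - X_train[None, :, :]
    if distance == "euclidean":
        dists = np.sqrt((diff ** 2).sum(axis=2))
    elif distance == "manhattan":
        dists = np.abs(diff).sum(axis=2)
    else:
        raise ValueError(f"unknown distance: {distance}")
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

X_tr = np.array([[0.0], [1.0], [2.0], [10.0]])
y_tr = np.array([0.0, 1.0, 2.0, 10.0])
pred = knn_predict(X_tr, y_tr, np.array([[1.1]]), k=3)  # neighbors: x = 1, 2, 0
```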

---

### Neural Network — Perceptron

```python
from mypackage import Perceptron

p = Perceptron(learning_rate=0.01, epochs=200, verbose=True, random_state=0)
p.fit(X_train, y_train)
preds = p.predict(X_test)
```

---

### Model Selection

```python
from mypackage import CrossValidation, OLSRegression

cv = CrossValidation(OLSRegression(), k=5)
result = cv.evaluate(X, y)
print(f"CV mean R²: {result['mean_score']:.4f}")
```
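
The splitting step behind k-fold cross-validation can be sketched as a small generator. Shuffling before splitting and the name `kfold_indices` are assumptions for illustration; `CrossValidation` may split the data differently.

```python
import numpy as np

def kfold_indices(n_samples, k=5, random_state=None):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    rng = np.random.default_rng(random_state)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)  # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(kfold_indices(10, k=5, random_state=0))
```

A model would be fit on `train_idx` and scored on `test_idx` for each pair, with the k scores averaged at the end.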

---

### Diagnostics

```python
from mypackage import VIF, NormalityTest, HeteroscedasticityTest

# Multicollinearity
vif_scores = VIF().calculate(X)
print("VIF:", vif_scores)

# Normality of residuals (residuals: y_true - y_pred from a fitted model)
nt = NormalityTest()
print(nt.summary(residuals))

# Heteroscedasticity
ht = HeteroscedasticityTest()
print(ht.variance_check(residuals))
```
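
The variance inflation factor for feature j is conventionally defined as `1 / (1 - R_j²)`, where `R_j²` comes from regressing feature j on all the other features. The sketch below follows that textbook definition with `np.linalg.lstsq`; the function name `vif` is hypothetical and the `VIF` class may compute it differently.

```python
import numpy as np

def vif(X):
    """Textbook VIF: regress each column on the others and return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    scores = np.empty(d)
    for j in range(d):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ beta
        ss_res = residual @ residual
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        # Perfectly collinear columns give r2 == 1 and an infinite VIF.
        scores[j] = 1.0 / (1.0 - r2)
    return scores

rng = np.random.default_rng(0)
a = rng.standard_normal(300)
b = rng.standard_normal(300)
c = a + 0.1 * rng.standard_normal(300)  # nearly collinear with a
vif_demo = vif(np.column_stack([a, b, c]))
```

In this demo the first and third columns should show large values, while the independent middle column stays close to 1.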

---

### Decomposition

```python
from mypackage import PCA

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print("Explained variance:", pca.explained_variance_)
```
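
One standard way to compute PCA is an eigen-decomposition of the sample covariance matrix. The sketch below assumes that route (an SVD of the centered data is an equally common alternative); `pca_fit_transform` is a hypothetical name and not the library's `PCA` class.

```python
import numpy as np

def pca_fit_transform(X, n_components=2):
    """Project centered data onto the top eigenvectors of its covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return Xc @ components, eigvals[order]

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 3)) * np.array([5.0, 1.0, 0.1])
X_red, var = pca_fit_transform(X_demo, n_components=2)
```

The returned eigenvalues equal the sample variance of each projected column, which is a handy sanity check.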

---

## Metrics Reference

| Function | Description |
|---|---|
| `mse(y_true, y_pred)` | Mean Squared Error |
| `rmse(y_true, y_pred)` | Root Mean Squared Error |
| `mae(y_true, y_pred)` | Mean Absolute Error |
| `r2_score(y_true, y_pred)` | Coefficient of determination R² |
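
For reference, the conventional definitions behind these four metrics can be written directly in NumPy. The `_demo` suffixes mark these as illustrative helpers rather than the library's functions, which may handle edge cases (such as constant targets) differently.

```python
import numpy as np

def mse_demo(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse_demo(y_true, y_pred):
    return float(np.sqrt(mse_demo(y_true, y_pred)))

def mae_demo(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def r2_demo(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # Undefined when y_true is constant (ss_tot == 0).
    return float(1.0 - ss_res / ss_tot)

y_true_demo = [1.0, 2.0, 3.0, 4.0]
y_pred_demo = [1.0, 2.0, 3.0, 6.0]  # a single error of 2 on the last sample
scores = (
    mse_demo(y_true_demo, y_pred_demo),   # 4 / 4 = 1.0
    rmse_demo(y_true_demo, y_pred_demo),  # sqrt(1.0) = 1.0
    mae_demo(y_true_demo, y_pred_demo),   # 2 / 4 = 0.5
    r2_demo(y_true_demo, y_pred_demo),    # 1 - 4 / 5 = 0.2
)
```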

---

## Running Tests

```bash
# From a source checkout; the dev extra includes pytest
pip install -e ".[dev]"
pytest tests/ -v
```

---

## Building & Publishing

```bash
# Build source distribution and wheel (build and twine are installed by the dev extra)
python -m build

# Check distribution
twine check dist/*

# Upload to TestPyPI first
twine upload --repository testpypi dist/*

# Upload to PyPI
twine upload dist/*
```

---

## Project Structure

```
mypackage-alina/
├── mypackage/
│   ├── __init__.py            ← public API
│   ├── decomposition/
│   │   └── pca.py
│   ├── diagnostics/
│   │   ├── heteroscedasticity.py
│   │   ├── multicollinearity.py
│   │   └── normality.py
│   ├── feature_selection/
│   │   ├── backward_elimination.py
│   │   └── forward_selection.py
│   ├── metrics/
│   │   └── regression_metrics.py
│   ├── model_selection/
│   │   ├── cross_validation.py
│   │   └── train_test_split.py
│   ├── neighbors/
│   │   ├── distances.py
│   │   ├── knn_classifier.py
│   │   └── knn_regressor.py
│   ├── neural_networks/
│   │   └── perceptron.py
│   ├── preprocessing/
│   │   ├── encoder.py
│   │   ├── feature_selection.py
│   │   ├── missing_values.py
│   │   ├── outliers.py
│   │   ├── polynomial.py
│   │   ├── scaler.py
│   │   └── target_transformer.py
│   ├── regression/
│   │   ├── lasso.py
│   │   ├── ols.py
│   │   └── ridge.py
│   └── visualization/
│       └── plots.py
├── datasets/                  ← external datasets (not shipped)
├── tests/
│   └── test_*.py
├── .gitignore
├── LICENSE
├── MANIFEST.in
├── README.md
├── pyproject.toml
└── setup.py
```

---

## Future Improvements

- Logistic regression and multi-class classification
- Decision tree and random forest
- Support Vector Machine (SVM)
- Gradient boosting
- Mini-batch gradient descent
- Pipeline API (fit/transform chaining)
- Cross-validated hyperparameter search
- More formal statistical tests (Shapiro-Wilk, Breusch-Pagan, White)
- Sparse matrix support

---

## License

[MIT](LICENSE) © 2026 Alina
