Metadata-Version: 2.4
Name: statdstools
Version: 0.1.1
Summary: Simple linear model tools
Author: Shouhardyo Sarkar
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# statdstools: A Data Science package for Linear Models and Regularized Regression

`dstools` (distributed as `statdstools`) is a Python package developed progressively over a semester-long statistical learning / regression class in the Department of Statistics & Actuarial Science at the University of Iowa.  
It started with **basic linear regression** and grew into a toolkit that includes:

- Ordinary Least Squares (OLS) via normal equations and QR decomposition  
- Performance improvements using **Cython**  
- **Ridge regression** and cross-validation (`cvridge`)  
- **Adaptive Elastic Net (AENet)** with cross-validation over λ₁ and λ₂ (`cv_aenet`)  
- A vignette-style tutorial and examples on simulated data  

This README serves as both:

- A **user guide** for the `dstools` package  
- A **narrative summary** of how the package evolved over the semester  

---

## 1. Project Overview

The goal of `dstools` is to provide **transparent, educational implementations** of core regression tools:

1. **Basic Linear Models**
   - Implemented from scratch using:
     - Normal equations
     - QR decomposition
   - Focus on understanding the math and numerical stability.

2. **Performance Optimization**
   - Selected parts of the code (e.g., linear model fitting) were reimplemented in **Cython** to:
     - Speed up repeated computations
     - Illustrate how low-level optimization fits into the Python ecosystem.

3. **Ridge Regression**
   - Introduced ℓ₂ regularization to handle multicollinearity and overfitting.
   - Implemented both:
     - Closed-form ridge solution
     - Cross-validation (`cvridge`) to select the penalty parameter λ.

4. **Adaptive Elastic Net (AENet)**
   - Combined ideas from LASSO and ridge with adaptive weights.
   - Implemented via **coordinate descent**.
   - Used ridge regression to compute adaptive weights.
   - Implemented cross-validation over λ₁ and λ₂ (`cv_aenet`).

5. **Cross-Validation and Model Selection**
   - Implemented K-fold cross-validation for:
     - Ridge regression
     - Adaptive elastic net
   - Produced:
     - Mean CV error surfaces
     - Upper and lower bounds (`cvupper`, `cvlower`)
     - Best tuning parameters via `cvmin`.

6. **Documentation and Packaging**
   - Organized as a proper Python package with:
     - `src/dstools/` structure
     - `pyproject.toml` or `setup.py`
     - `README.md`
     - `docs/tutorial.md` (vignette-style tutorial)
   - Designed to be installable via `pip install -e .`.

---

## 2. Package Structure

A typical `dstools` layout:

```text
dstools/
├── pyproject.toml          # or setup.py
├── README.md               # this file
├── LICENSE                 # license file (e.g., MIT)
├── docs/
│   └── tutorial.md         # vignette-style tutorial
└── src/
    └── dstools/
        ├── __init__.py     # package initializer
        ├── mylm_qr.py      # basic linear model via QR
        ├── mylm.py         # basic linear model via normal equations (optional)
        ├── mylm_cython.pyx # Cython-accelerated linear model (optional)
        ├── ridge.py        # ridge regression + cvridge
        ├── aenet.py        # adaptive elastic net implementation
        ├── cv_aenet.py     # cross-validation for AENet (if separate)
        ├── utils.py        # helper functions (standardization, etc.)
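        └── ...
```

---

## 3. Installation

Install the package in editable mode from the repository root:

```bash
pip install -e .
```

Then import the package, or pull in individual functions:

```python
import dstools
from dstools import cvridge, ridge, aenet, cv_aenet
```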




---

## 4. Example Usage

### 1) `mylm`: Basic Linear Regression (Normal Equations)

```python
import numpy as np
from dstools import mylm  # if exposed in __init__.py

# Small toy design matrix (4 observations, 2 predictors) and response
X = np.array([[1, 2],
              [2, 3],
              [3, 4],
              [4, 5]], dtype=float)
y = np.array([2, 3, 4, 5], dtype=float)

fit = mylm(X, y)
print("Coefficients:", fit["beta"])
print("Fitted values:", fit["fitted"])
print("Residuals:", fit["residuals"])
```


### 2) `mylm_qr`: Linear Regression via QR Decomposition

```python
from dstools import mylm_qr

# Reuses X and y from the previous example
fit_qr = mylm_qr(X, y)
print("Coefficients (QR):", fit_qr["beta"])
```
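To make the difference between the two solvers concrete, here is a minimal NumPy sketch that is independent of the package. The function names `ols_normal_equations` and `ols_qr` are hypothetical and are not the package's `mylm` / `mylm_qr`. The sketch also assumes that no intercept column is added, which may differ from what the package does.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve the normal equations (X'X) beta = X'y directly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ols_qr(X, y):
    """Factor X = QR (reduced form), then solve R beta = Q'y."""
    Q, R = np.linalg.qr(X)
    return np.linalg.solve(R, Q.T @ y)

# Same toy data as in the examples above (no intercept column: an assumption)
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y = np.array([2, 3, 4, 5], dtype=float)

beta_ne = ols_normal_equations(X, y)
beta_qr = ols_qr(X, y)
print("Normal equations:", beta_ne)
print("QR:              ", beta_qr)
print("Agree:", np.allclose(beta_ne, beta_qr))
```

Both routes give the same coefficients on well-conditioned data. The QR route avoids forming `X.T @ X`, whose condition number is the square of that of `X`, which is why it is usually the more stable choice.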

### 3) Cython: Speeding Up Linear Models

```cython
# mylm_cython.pyx (conceptual)
cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False)   # skip bounds checks on indexing
@cython.wraparound(False)    # disable negative-index handling
def mylm_cython(double[:, :] X, double[:] y):
    # implement normal equations or QR with typed loops
    # return coefficients, etc.
    ...
```

Once the extension is compiled, it is called like any other function:

```python
from dstools import mylm_cython

fit_fast = mylm_cython(X, y)
```

### 4) Ridge Regression and Cross-Validation

```python
from dstools import ridge

# Fit ridge regression at a fixed penalty
ridge_fit = ridge(X, y, lam=1.0)
beta_ridge = ridge_fit.betas.flatten()
```
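The closed-form ridge solution mentioned in the overview can be sketched in a few lines of NumPy. This is an illustration, not the package's `ridge`: the name `ridge_closed_form` is hypothetical, and the sketch assumes no intercept, no standardization, and the objective `||y - X b||^2 + lam * ||b||^2`. The package may scale the penalty differently.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Return argmin_b ||y - X b||^2 + lam * ||b||^2 (no intercept: an assumption)."""
    p = X.shape[1]
    # Adding lam * I to X'X keeps the system solvable even when X'X is near-singular
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y = np.array([2, 3, 4, 5], dtype=float)

beta_0 = ridge_closed_form(X, y, lam=0.0)   # lam = 0 reduces to OLS
beta_1 = ridge_closed_form(X, y, lam=1.0)   # lam > 0 shrinks the coefficients
print("lam = 0:", beta_0)
print("lam = 1:", beta_1)
print("Shrunk:", np.linalg.norm(beta_1) < np.linalg.norm(beta_0))
```

Increasing `lam` never increases the Euclidean norm of the solution, which is the shrinkage effect that helps with multicollinearity.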

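### 5) `cvridge`: Cross-Validation for Ridge

```python
from dstools import cvridge, ridge
import numpy as np

# The 4-row toy data above is too small for 5-fold CV, so simulate a larger set
rng = np.random.default_rng(123)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=100)

lam_grid = np.logspace(-3, 6, 200)  # candidate penalties from 1e-3 to 1e6
cv = cvridge(X, y, lam_grid, K=5)

# Pick the penalty with the smallest cross-validated MSE
best_idx = np.argmin(cv["cv_mse"])
best_lam = cv["lam"][best_idx]

# Refit on the full data at the selected penalty
ridge_fit = ridge(X, y, best_lam)
beta_ridge = ridge_fit.betas.flatten()
```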
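For readers curious about what K-fold cross-validation over a penalty grid involves, the following is a minimal sketch under stated assumptions. The function `kfold_ridge_mse` is hypothetical and is not the package's `cvridge`. It assumes random fold assignment, no intercept, and the unscaled closed-form ridge solution.

```python
import numpy as np

def kfold_ridge_mse(X, y, lam_grid, K=5, seed=0):
    """Mean held-out squared error for each penalty in lam_grid."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)  # K disjoint index sets
    mse = np.zeros((K, len(lam_grid)))
    for k, test_idx in enumerate(folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False                    # hold out fold k
        Xtr, ytr = X[train], y[train]
        for j, lam in enumerate(lam_grid):
            beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)
            resid = y[test_idx] - X[test_idx] @ beta
            mse[k, j] = np.mean(resid ** 2)
    return mse.mean(axis=0)                        # average over folds

# Simulated data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.5, size=60)

lam_grid = np.logspace(-3, 3, 20)
cv_mse = kfold_ridge_mse(X, y, lam_grid, K=5)
best_lam = lam_grid[np.argmin(cv_mse)]
print("Best lambda:", best_lam)
```

Each observation is held out exactly once, so the averaged error estimates out-of-sample performance at every candidate penalty.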


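### 6) Adaptive Elastic Net (AENet)

```python
import numpy as np
from dstools import aenet, ridge

lam1 = 0.1
lam2 = 0.1

# Adaptive weights from an initial ridge fit. The inverse-absolute-value form
# is an assumption for illustration; it also assumes `betas` holds one
# coefficient per column of X.
beta_ridge = ridge(X, y, lam=1.0).betas.flatten()
weights = 1.0 / (np.abs(beta_ridge) + 1e-8)

fit_aenet = aenet(X, y, lam1=lam1, lam2=lam2, weights=weights)
b0 = fit_aenet["b0"]
beta_hat = fit_aenet["beta"]
```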
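The overview notes that AENet is fitted by coordinate descent with ridge-based adaptive weights. The sketch below shows one way this can work. It is not the package's `aenet`: the names `soft_threshold` and `aenet_cd` are hypothetical, and the objective `(1/2n)||y - b0 - X b||^2 + lam1 * sum(w_j |b_j|) + (lam2/2) ||b||^2` as well as the weight formula `w_j = 1 / |beta_ridge_j|` are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def aenet_cd(X, y, lam1, lam2, weights, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for the assumed adaptive elastic net objective."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering absorbs the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n       # (1/n) * x_j' x_j
    beta = np.zeros(p)
    resid = yc.copy()                        # residual at beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) * x_j' (partial residual that excludes coordinate j)
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam1 * weights[j]) / (col_sq[j] + lam2)
            if new != beta[j]:
                resid -= Xc[:, j] * (new - beta[j])   # keep residual in sync
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return {"b0": y_mean - x_mean @ beta, "beta": beta}

# Simulated data with two active predictors (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 1.0 + X @ np.array([3.0, 0.0, -2.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

# Adaptive weights from an initial ridge fit on centered data (assumed form)
Xc, yc = X - X.mean(axis=0), y - y.mean()
beta_ridge = np.linalg.solve(Xc.T @ Xc + 1.0 * np.eye(5), Xc.T @ yc)
weights = 1.0 / (np.abs(beta_ridge) + 1e-8)

fit = aenet_cd(X, y, lam1=0.1, lam2=0.1, weights=weights)
print("Intercept:", fit["b0"])
print("Coefficients:", fit["beta"])
```

Predictors with small ridge coefficients receive large weights and are penalized more heavily, so they are more likely to be set exactly to zero, while strong predictors are penalized less.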

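### 7) Cross-Validation for AENet: `cv_aenet`

```python
from dstools import cv_aenet
import numpy as np

# Tuning grids: lambda1 runs from 1e3 down to 1e-1, lambda2 over three values
lambda1_seq = np.logspace(3, -1, 30)
lambda2_seq = np.array([0.0, 0.1, 1.0])

cvfit = cv_aenet(X, y, lambda1_seq, lambda2_seq, k=5, random_state=123)

# Tuning parameters that minimize the cross-validated error
best_lam1 = cvfit["best_lambda1"]
best_lam2 = cvfit["best_lambda2"]

# Their positions in lambda1_seq and lambda2_seq
i = cvfit["best_lambda1_index"]
j = cvfit["best_lambda2_index"]

beta_cvmin = cvfit["full_fit"][i]["beta"]
b0_cvmin = cvfit["full_fit"][i]["b0"]

# Indices of the selected (nonzero) coefficients
selected = np.where(np.abs(beta_cvmin) > 1e-8)[0]
```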
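The overview mentions a mean CV error surface with upper and lower bounds and a `cvmin` choice. Here is a minimal sketch of how such summaries can be computed from per-fold errors. The function `cv_summary` is hypothetical, the fold errors are made-up placeholder numbers, and treating the bounds as the mean plus or minus one standard error is an assumption about what `cvupper` and `cvlower` hold.

```python
import numpy as np

def cv_summary(fold_err):
    """Summarize per-fold errors of shape (K, n_lambda1, n_lambda2)."""
    K = fold_err.shape[0]
    cvm = fold_err.mean(axis=0)                        # mean CV error surface
    cvse = fold_err.std(axis=0, ddof=1) / np.sqrt(K)   # standard error across folds
    i, j = np.unravel_index(np.argmin(cvm), cvm.shape) # grid position of the minimum
    return {"cvm": cvm, "cvupper": cvm + cvse, "cvlower": cvm - cvse,
            "best_lambda1_index": int(i), "best_lambda2_index": int(j)}

# Placeholder errors for 3 folds on a 2 x 2 (lambda1, lambda2) grid
fold_err = np.array([[[4.0, 2.0], [3.0, 5.0]],
                     [[5.0, 1.0], [3.0, 6.0]],
                     [[6.0, 3.0], [3.0, 7.0]]])

summary = cv_summary(fold_err)
print("Mean surface:\n", summary["cvm"])
print("Minimum at:", summary["best_lambda1_index"], summary["best_lambda2_index"])
```

The width of the band between the upper and lower bounds shows how much the error estimate varies across folds, which helps when several grid points have nearly the same mean error.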


