Metadata-Version: 2.4
Name: notebookpkg
Version: 2.0.0
Summary: A package manager for Jupyter notebook templates
Author: Priyansu Pattanaik
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: click
Requires-Dist: pandas
Requires-Dist: nbformat
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# notebookpkg

**A notebook template manager for ML students.**  
One command installs a ready-to-run Jupyter notebook — already wired to your dataset, with your column names, your target, and your drop columns injected automatically.

No more writing the same boilerplate for every assignment. Just pick a template, point it to your CSV, and open Jupyter.

---

## Installation

```bash
pip install notebookpkg
```

**Requirements:** Python 3.7+. The CLI itself needs click, pandas, and nbformat; the generated notebooks also import numpy, scikit-learn, matplotlib, and seaborn.

---

## How It Works

1. You run one command with your CSV file
2. The tool reads your dataset and detects all column names and types
3. It injects your dataset path, column names, target column, and drop columns into the template
4. A `.ipynb` file is created in your current folder
5. Open it in Jupyter and run all cells — everything is pre-filled
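
Steps 3 and 4 amount to string substitution on the notebook's JSON. The sketch below is not the package's `injector.py`. It assumes placeholders are written as `{{NAME}}`, like the `{{DATASET_PATH}}` and `{{DROP_CODE}}` tokens visible in the `syntax` output, and it works on the raw `.ipynb` dict using only the standard library.

```python
import json


def inject_tokens(notebook, tokens):
    """Return a copy of an .ipynb dict with every {{NAME}} placeholder replaced."""
    result = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in result.get("cells", []):
        source = cell.get("source", "")
        # The .ipynb format stores a cell's source as one string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for name, value in tokens.items():
            text = text.replace("{{" + name + "}}", value)
        cell["source"] = text
    return result


# Minimal two-cell template (hypothetical content, shaped like the syntax output)
template = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": ["df = pd.read_csv('{{DATASET_PATH}}')\n", "df.head()"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": "{{DROP_CODE}}"},
    ],
}

filled = inject_tokens(template, {
    "DATASET_PATH": "Salary_Data.csv",
    "DROP_CODE": "# No columns dropped",
})
print(filled["cells"][0]["source"])
```

Writing `filled` to disk with `json.dump(filled, f, indent=1)` would then produce the `.ipynb` file from step 4.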

---

## Quick Start

```bash
# Step 1: See all available templates
notebookpkg list

# Step 2: Install a template for your CSV
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary

# Step 3: Open the notebook
jupyter notebook linear-regression_notebook.ipynb
```

---

## Commands

### `notebookpkg list`

Lists all available templates with their descriptions.

```bash
notebookpkg list
```

**Output:**

```
📦 Available Templates:

  decision-tree                       Decision Tree: criterion=entropy, max_depth=5, plot_tree, accuracy, report
  eda-basic                           Basic EDA: head, shape, info, describe, nulls, dtypes, nunique
  eda-full                            Full EDA: visual + outliers, skewness, duplicates, value counts
  eda-visual                          Visual EDA: pairplot, heatmap, distributions
  kmeans-clustering                   KMeans Clustering: StandardScaler, elbow method, silhouette score, cluster plot
  knn-classifier                      KNN Classifier: StandardScaler, fit, accuracy, confusion matrix, report
  lasso-ridge                         Linear + Lasso + Ridge Regression with StandardScaler and coefficient plots
  linear-regression                   Linear Regression: EDA, fit, predict, visualize, MSE, R²
  logistic-regression                 Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
  multi-model-compare                 LR + KNN + Naive Bayes on same dataset with accuracy comparison
  naive-bayes                         Gaussian Naive Bayes: StandardScaler, fit, accuracy, confusion matrix heatmap
  polynomial-regression               Polynomial Regression: PolynomialFeatures, smooth curve plot, MSE, R²
  random-forest-classifier            Random Forest Classifier: model1, accuracy, confusion matrix, feature importance
  random-forest-regressor             Random Forest Regressor: RFR, fit, MSE, R², Actual vs Predicted scatter
  svm-classifier                      SVM: Linear kernel, then RBF kernel with AgeSalary feature engineering
```

---

### `notebookpkg install`

Installs a template wired to your dataset.

```bash
notebookpkg install <template-name> --dataset <path-to-csv> [options]
```

**All options:**

| Option | Required | Default | Description |
|---|---|---|---|
| `--dataset` | Yes | — | Path to your CSV file |
| `--target` | No | Last column | Target/label column name |
| `--drop` | No | None | Columns to drop, comma-separated |
| `--degree` | No | `2` | Polynomial degree — only for `polynomial-regression` |
| `--clusters` | No | `3` | Number of clusters — only for `kmeans-clustering` |
| `--output` | No | `<template>_notebook.ipynb` | Custom output filename |
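
The table maps naturally onto `click` options. Here is a standalone sketch, not the package's actual `cli.py`: the `install` function only echoes the resolved values, and it applies the documented defaults (`--degree 2`, `--clusters 3`, `<template>_notebook.ipynb`). It is exercised with click's `CliRunner`.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument("template")
@click.option("--dataset", required=True, help="Path to your CSV file")
@click.option("--target", default=None, help="Target column (default: last column)")
@click.option("--drop", default=None, help="Columns to drop, comma-separated")
@click.option("--degree", default=2, show_default=True, type=int)
@click.option("--clusters", default=3, show_default=True, type=int)
@click.option("--output", default=None, help="Custom output filename")
def install(template, dataset, target, drop, degree, clusters, output):
    """Echo the resolved options instead of generating a notebook."""
    output = output or f"{template}_notebook.ipynb"
    drop_cols = [c.strip() for c in drop.split(",")] if drop else []
    click.echo(f"template={template} dataset={dataset} target={target}")
    click.echo(f"drop={drop_cols} degree={degree} clusters={clusters} output={output}")


runner = CliRunner()
result = runner.invoke(
    install, ["linear-regression", "--dataset", "Salary_Data.csv", "--target", "Salary"]
)
print(result.output)

# Omitting the required --dataset option is a usage error (exit code 2)
missing = runner.invoke(install, ["eda-basic"])
```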

---



### `notebookpkg syntax`

Prints the complete code of a template — every cell in order — directly in your terminal.
Use this to preview exactly what will be generated before installing.

```bash
notebookpkg syntax <template-name>
```

**Example:**

```bash
notebookpkg syntax logistic-regression
```

**Output:**

```
============================================================
  Template : logistic-regression
  Logistic Regression: StandardScaler, fit, accuracy, confusion matrix, report
  Total cells: 16
============================================================

── Cell 1 ──────────────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

── Cell 2 ──────────────────────────────────────────────────
df = pd.read_csv('{{DATASET_PATH}}')
df.head()

── Cell 3 ──────────────────────────────────────────────────
{{DROP_CODE}}

... (all remaining cells shown in full)

============================================================
  Install this template:
  notebookpkg install logistic-regression --dataset yourdata.csv
============================================================
```
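
Printing a template cell by cell needs only the notebook JSON. This sketch imitates the `── Cell N ──` layout above; it assumes that only code cells are shown, which may differ from how the real `syntax` command counts cells.

```python
def render_cells(notebook, width=60):
    """Return each code cell's source in order, under a numbered rule of fixed width."""
    lines = []
    code_cells = [c for c in notebook.get("cells", []) if c.get("cell_type") == "code"]
    for number, cell in enumerate(code_cells, start=1):
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        header = f"── Cell {number} "
        lines.append(header + "─" * (width - len(header)))  # pad the rule to `width`
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


demo = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": ["import pandas as pd\n", "import numpy as np"]},
        {"cell_type": "code", "source": "df = pd.read_csv('{{DATASET_PATH}}')\ndf.head()"},
    ]
}
rendered = render_cells(demo)
print(rendered)
```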

You can run `syntax` with any template name, for example:

```bash
notebookpkg syntax eda-basic
notebookpkg syntax eda-visual
notebookpkg syntax eda-full
notebookpkg syntax linear-regression
notebookpkg syntax polynomial-regression
notebookpkg syntax logistic-regression
notebookpkg syntax knn-classifier
notebookpkg syntax naive-bayes
notebookpkg syntax lasso-ridge
notebookpkg syntax decision-tree
notebookpkg syntax random-forest-regressor
notebookpkg syntax random-forest-classifier
notebookpkg syntax svm-classifier
notebookpkg syntax kmeans-clustering
notebookpkg syntax multi-model-compare
notebookpkg syntax cross-validation
notebookpkg syntax dbscan-clustering
notebookpkg syntax pca
notebookpkg syntax association-rules
notebookpkg syntax arima-forecasting
notebookpkg syntax text-classification
notebookpkg syntax ensemble-methods
notebookpkg syntax hierarchical-clustering
notebookpkg syntax moving-average
```

---

## Templates

### EDA Templates

#### `eda-basic`
Basic Exploratory Data Analysis. Covers the essential checks every notebook needs.

**Cells generated:**
1. Imports
2. `pd.read_csv()` + `df.head()`
3. Drop columns cell (optional)
4. `df.shape`
5. `df.info()`
6. `df.describe()`
7. `df.isnull().sum()`
8. `df.dtypes`
9. `df.nunique()`

```bash
notebookpkg install eda-basic --dataset data.csv
```
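
The checks above are ordinary pandas calls. A minimal sketch, using a small inline DataFrame as a stand-in for `pd.read_csv("data.csv")`:

```python
import pandas as pd

# Inline stand-in for df = pd.read_csv("data.csv")
df = pd.DataFrame({
    "Age": [19, 35, 26, None, 40],
    "Gender": ["Male", "Female", "Female", "Male", None],
    "Purchased": [0, 1, 0, 1, 1],
})

print(df.shape)           # (rows, columns)
df.info()                 # dtypes and non-null counts per column
print(df.describe())      # summary statistics for numeric columns
missing = df.isnull().sum()
print(missing)            # missing values per column
print(df.dtypes)
print(df.nunique())       # distinct values per column, NaN excluded
```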

---

#### `eda-visual`
EDA with all key visualizations.

**Cells generated:**
Everything in `eda-basic`, plus:
- `sns.pairplot(df)`
- Correlation heatmap (`df.corr()` + `sns.heatmap()`)
- Histogram for each numeric column

```bash
notebookpkg install eda-visual --dataset data.csv
```
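
One caveat for the heatmap step: with pandas 2.x, `df.corr()` raises an error when the frame still contains text columns. Passing `numeric_only=True` (available since pandas 1.5) restricts the matrix to numeric columns. A sketch on inline example data; whether the template already does this is not stated here.

```python
import pandas as pd

df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9],
    "Salary": [40.0, 45.0, 55.0, 60.0, 80.0],   # in thousands, illustrative only
    "City": ["A", "B", "A", "C", "B"],
})

# Text columns such as "City" are skipped instead of causing an error
corr = df.corr(numeric_only=True)
print(corr)
# In a notebook, this matrix is what sns.heatmap(corr, annot=True) would draw.
```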

---

#### `eda-full`
Complete EDA including outlier detection and categorical analysis.

**Cells generated:**
Everything in `eda-visual`, plus:
- `df.duplicated().sum()`
- Boxplot for each numeric column
- Skewness: `df.skew(numeric_only=True)`
- IQR outlier count for each numeric column
- `value_counts()` for each categorical column

```bash
notebookpkg install eda-full --dataset data.csv
```
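
The IQR outlier count flags values outside `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]`. The 1.5 multiplier below is the conventional choice and an assumption about what the template uses. Categorical columns are selected with `exclude="number"` so text columns are caught across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Income": [1, 2, 3, 4, 100],
    "Score": [10, 11, 12, 13, 14],
    "Segment": ["a", "b", "a", "a", "b"],
})


def iqr_outlier_counts(frame, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    counts = {}
    for col in frame.select_dtypes(include="number").columns:
        q1, q3 = frame[col].quantile(0.25), frame[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        counts[col] = int(((frame[col] < lower) | (frame[col] > upper)).sum())
    return counts


dupes = int(df.duplicated().sum())
print(dupes)
print(df.skew(numeric_only=True))
outliers = iqr_outlier_counts(df)
print(outliers)
for col in df.select_dtypes(exclude="number").columns:
    print(df[col].value_counts())
```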

---

### Regression Templates

#### `linear-regression`
Standard Linear Regression pipeline on your CSV.

**Cells generated:**
1. Imports
2. Load dataset + head
3. Drop columns cell
4. shape, info, describe, isnull
5. pairplot
6. Correlation heatmap
7. X / y split (iloc)
8. train_test_split (test_size=0.2, random_state=0)
9. `regressor = LinearRegression()` + fit
10. Predict
11. Visualize training data (scatter + regression line)
12. Visualize testing data
13. Coefficient and intercept
14. MSE
15. R²

```bash
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary
```
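
The core of this pipeline (cells 7 to 15) can be sketched with scikit-learn. The data is an inline stand-in where salary is an exact linear function of experience, so the fitted coefficient and intercept are easy to check. Taking the last column as `y` mirrors the `--target` default.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Inline stand-in for Salary_Data.csv
df = pd.DataFrame({"YearsExperience": np.arange(1, 21, dtype=float)})
df["Salary"] = 9000 * df["YearsExperience"] + 25000

X = df.iloc[:, :-1].values   # every column except the last
y = df.iloc[:, -1].values    # last column is the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

r2 = r2_score(y_test, y_pred)
print("Coefficient:", regressor.coef_, "Intercept:", regressor.intercept_)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R²:", r2)
```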

---

#### `polynomial-regression`
Polynomial Regression with smooth curve visualization.

**Cells generated:**
1. Imports (includes `PolynomialFeatures`)
2. Load dataset + head
3. Drop columns cell
4. info, describe, pairplot, heatmap
5. X / y split
6. `PolynomialFeatures(degree=N)` + transform
7. train_test_split
8. `plr = LinearRegression()` + fit
9. Smooth curve plot using `X_gride`
10. Predict
11. MSE
12. R²

```bash
notebookpkg install polynomial-regression --dataset hw.csv --target Price
notebookpkg install polynomial-regression --dataset hw.csv --target Price --degree 3
```
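
`PolynomialFeatures` expands `x` into `1, x, x², ...`, after which an ordinary `LinearRegression` fits the curve. The dense grid is what makes the plotted curve smooth. In this sketch it is named `X_grid`, and the quadratic data is inline and illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Inline data: y is an exact quadratic of x
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * X[:, 0] ** 2 - 2 * X[:, 0] + 5

poly = PolynomialFeatures(degree=2)   # the value --degree controls
X_poly = poly.fit_transform(X)        # columns: 1, x, x²
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.2, random_state=0)

plr = LinearRegression()
plr.fit(X_train, y_train)

# Dense x grid so the fitted curve plots smoothly rather than as jagged segments
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)
y_grid = plr.predict(poly.transform(X_grid))

r2 = r2_score(y_test, plr.predict(X_test))
print("R²:", r2)
```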

---

#### `lasso-ridge`
Linear Regression + Lasso + Ridge, all on the same dataset with comparison.

**Cells generated:**
1. Imports
2. Load + EDA (info, describe, columns, shape)
3. Drop columns cell
4. X / y split
5. train_test_split
6. StandardScaler
7. Linear Regression (`lm`) + coefficient barh plot
8. Lasso (`alpha=0.1`) + MSE + R² + coefficient barh plot
9. Ridge (`alpha=0.1`) + MSE + R²

```bash
notebookpkg install lasso-ridge --dataset BostonHousing.csv --target medv
```
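
Scaling first matters here because Lasso and Ridge penalise coefficient size, which only makes sense when features share a scale. The sketch runs all three models in one loop, a compacting choice that departs from the template's separate cells. Its synthetic third feature is pure noise, so Lasso (`alpha=0.1`) should push that coefficient to about zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
# Only the first two features matter; the third is noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)   # learn mean/std from training data only
X_test = sc.transform(X_test)

models = {"Linear": LinearRegression(), "Lasso": Lasso(alpha=0.1), "Ridge": Ridge(alpha=0.1)}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (mean_squared_error(y_test, pred), r2_score(y_test, pred))
    print(name, "coefficients:", np.round(model.coef_, 3), "MSE, R²:", scores[name])
```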

---

### Classification Templates

#### `logistic-regression`
Logistic Regression with StandardScaler.

**Cells generated:**
1. Imports
2. Load dataset + head
3. Drop columns cell
4. shape, info, describe, isnull
5. Correlation heatmap
6. X / y split
7. train_test_split (test_size=0.3, random_state=0)
8. `sc = StandardScaler()` + fit_transform / transform
9. `lr = LogisticRegression()` + fit
10. Predict
11. Accuracy score
12. Confusion matrix
13. Classification report

```bash
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
```
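
Cells 7 to 13 follow a standard pattern: split with `test_size=0.3, random_state=0`, fit the scaler on the training set only, then evaluate. In the sketch, synthetic `Age`-like and salary-like features on very different scales stand in for a real CSV.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Two features on very different scales (synthetic, illustrative only)
X = np.column_stack([rng.uniform(18, 60, 200), rng.uniform(15000, 150000, 200)])
y = ((X[:, 0] - 18) / 42 + (X[:, 1] - 15000) / 135000 > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)   # fit on training data only
X_test = sc.transform(X_test)         # reuse the training mean/std

lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(acc)
print(cm)
print(classification_report(y_test, y_pred))
```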

---

#### `knn-classifier`
K-Nearest Neighbors Classifier with StandardScaler.

**Cells generated:**
1. Imports
2. Load dataset + head
3. Drop columns cell
4. shape, info, describe, isnull, duplicated
5. Correlation heatmap + pairplot
6. X / y split
7. train_test_split (test_size=0.2, random_state=42)
8. StandardScaler
9. `knn = KNeighborsClassifier()` + fit
10. Predict
11. Accuracy, confusion matrix, classification report

```bash
notebookpkg install knn-classifier --dataset Day5.csv --target Purchased
```
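
KNN classifies by distance, so an unscaled feature with large values (such as a salary) would dominate the neighbour search. That is the reason for the `StandardScaler` step. A sketch with synthetic data in which the second feature is deliberately inflated by a factor of 1000:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
X[:, 1] *= 1000   # put the second feature on a much larger scale
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

knn = KNeighborsClassifier()   # default n_neighbors=5
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(acc)
print(confusion_matrix(y_test, y_pred))
```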

---

#### `naive-bayes`
Gaussian Naive Bayes with StandardScaler and confusion matrix heatmap.

**Cells generated:**
1. Imports
2. Load + shape, describe, isnull
3. Drop columns cell
4. Correlation heatmap
5. X / y split
6. train_test_split with `stratify=y`
7. StandardScaler (fit only on train)
8. `nb = GaussianNB()` + fit
9. Predict
10. Accuracy
11. Classification report
12. Confusion matrix as `sns.heatmap`

```bash
notebookpkg install naive-bayes --dataset Day5.csv --target Purchased
```
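
`stratify=y` keeps the class ratio identical in the training and test sets, which matters for imbalanced targets. In this sketch the synthetic data is 75% class 0 and 25% class 1. A 20% test split therefore holds exactly 30 and 10 samples.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Imbalanced classes: 150 negatives, 50 positives
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(2.5, 1, (50, 2))])
y = np.array([0] * 150 + [1] * 50)

# stratify=y preserves the 75/25 ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

sc = StandardScaler().fit(X_train)   # fit only on the training data
X_train, X_test = sc.transform(X_train), sc.transform(X_test)

nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)   # the template draws this with sns.heatmap
```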

---

#### `decision-tree`
Decision Tree Classifier with tree visualization.

**Cells generated:**
1. Imports (includes `from sklearn import tree`)
2. Load + EDA
3. Drop columns cell
4. Distribution plot + heatmap + pairplot
5. X / y split
6. train_test_split
7. StandardScaler
8. `DecisionTreeClassifier(criterion='entropy', max_depth=5, random_state=0)`
9. Predict
10. Accuracy score
11. Confusion matrix
12. Classification report
13. `tree.plot_tree()` — full visual tree diagram

```bash
notebookpkg install decision-tree --dataset SNP.csv --target Purchased
```
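
When a plotting window is not available, `sklearn.tree.export_text` prints the same tree that `tree.plot_tree()` draws. The sketch skips scaling, since tree splits are unaffected by feature scaling. Its synthetic label depends only on the first feature, so the root split should use it.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (300, 3))
y = (X[:, 0] > 0.5).astype(int)   # only the first feature decides the label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

# Text rendering of the fitted tree
rules = export_text(clf, feature_names=["f0", "f1", "f2"])
print(rules)
```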

---

#### `svm-classifier`
SVM with both Linear and RBF kernels, plus feature engineering.

**Cells generated:**
1. Imports (includes `SVC`)
2. Load + EDA (info, describe, isnull, value_counts)
3. Drop columns cell
4. Scatter plot of features
5. X / y split
6. train_test_split
7. StandardScaler
8. `model = SVC(kernel='linear')` + fit + predict + accuracy + CM + heatmap
9. Feature engineering: `df['AgeSalary'] = df['Age'] * df['EstimatedSalary']` (this cell expects your dataset to have `Age` and `EstimatedSalary` columns)
10. Re-split with new feature
11. `model1 = SVC(kernel='rbf')` + fit + predict + accuracy + CM + heatmap

```bash
notebookpkg install svm-classifier --dataset SNP.csv --target Purchased
```
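
The two-stage flow (linear kernel, then RBF with the extra feature) can be sketched as follows. The data is synthetic with `Age` and `EstimatedSalary` columns. The guard before the feature-engineering step is an addition for datasets that lack those columns, not something the template is documented to do.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
df = pd.DataFrame({
    "Age": rng.randint(18, 60, 200),
    "EstimatedSalary": rng.randint(15000, 150000, 200),
})
df["Purchased"] = ((df["Age"] > 40) & (df["EstimatedSalary"] > 70000)).astype(int)


def split_and_score(frame, kernel):
    """Split, scale, fit an SVC with the given kernel, and return test accuracy."""
    X = frame.drop(columns=["Purchased"]).values
    y = frame["Purchased"].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    sc = StandardScaler()
    X_train, X_test = sc.fit_transform(X_train), sc.transform(X_test)
    model = SVC(kernel=kernel).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


linear_acc = split_and_score(df, "linear")

# The engineered feature can only be built when both source columns exist
if {"Age", "EstimatedSalary"}.issubset(df.columns):
    df["AgeSalary"] = df["Age"] * df["EstimatedSalary"]
rbf_acc = split_and_score(df, "rbf")
print(linear_acc, rbf_acc)
```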

---

#### `multi-model-compare`
Runs Logistic Regression, KNN, and Naive Bayes on the same dataset and compares accuracy.

**Cells generated:**
1. Imports
2. Load + EDA
3. Drop columns cell
4. X / y split
5. train_test_split
6. `model_lr = LogisticRegression()` → fit → predict → accuracy → report
7. `model_knn = KNeighborsClassifier()` → fit → predict → accuracy → report
8. `model_nb = GaussianNB()` → fit → predict → accuracy → report
9. Comparison dict with all three accuracy scores printed together

```bash
notebookpkg install multi-model-compare --dataset Day5.csv --target Purchased
```
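
The comparison dict in cell 9 can be built by looping over the three models instead of repeating the fit/predict cells. Here is a compact sketch of that variation, using synthetic two-class data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}
comparison = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    comparison[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst accuracy
for name, acc in sorted(comparison.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {acc:.3f}")
```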

---

### Ensemble Templates

#### `random-forest-regressor`
Random Forest Regressor with actual vs predicted scatter plot.

**Cells generated:**
1. Imports
2. Load + isnull, duplicated, info, describe
3. Drop columns cell
4. Correlation heatmap
5. X / y split
6. train_test_split (test_size=0.2, random_state=42)
7. `RFR = RandomForestRegressor(n_estimators=100, random_state=42)` + fit
8. Predict
9. MSE
10. R²
11. Scatter plot: Actual vs Predicted

```bash
notebookpkg install random-forest-regressor --dataset housing.csv --target Price
```
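
A sketch of cells 6 to 10 on synthetic regression data. The final plot compares `y_test` with `y_pred`; the closer the points sit to the diagonal, the better the fit, which is the same information R² summarises.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = rng.uniform(0, 10, (300, 2))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
RFR = RandomForestRegressor(n_estimators=100, random_state=42)
RFR.fit(X_train, y_train)
y_pred = RFR.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MSE:", mse, "R²:", r2)
# The template then plots plt.scatter(y_test, y_pred)
```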

---

#### `random-forest-classifier`
Random Forest Classifier with feature importance bar chart.

**Cells generated:**
1. Imports
2. Load + EDA
3. Drop columns cell
4. X / y split
5. train_test_split
6. StandardScaler
7. `model1 = RandomForestClassifier(n_estimators=100, random_state=42)` + fit
8. Predict
9. Accuracy
10. Classification report
11. Confusion matrix heatmap
12. Feature importance: `model1.feature_importances_`
13. Bar chart of feature importance

```bash
notebookpkg install random-forest-classifier --dataset iris.csv --target species
```
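
`feature_importances_` is a plain array aligned with the input columns. Wrapping it in a labelled, sorted `pandas.Series` makes the bar chart straightforward. In the sketch, the synthetic column `signal` decides the label and the other two are noise, so `signal` should rank first.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = pd.DataFrame(rng.uniform(0, 1, (300, 3)), columns=["signal", "noise_a", "noise_b"])
y = (X["signal"] > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model1 = RandomForestClassifier(n_estimators=100, random_state=42)
model1.fit(X_train, y_train)

# Pair each importance with its column name, sorted for a bar chart
importance = pd.Series(model1.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importance)
# importance.plot.bar() would draw the chart in a notebook
```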

---

### Clustering Templates

#### `kmeans-clustering`
KMeans Clustering with elbow method and silhouette score. No target column needed.

**Cells generated:**
1. Imports (includes `KMeans`, `silhouette_score`)
2. Load + shape, info, describe, isnull, duplicated
3. Drop columns cell
4. pairplot
5. Correlation heatmap
6. StandardScaler on numeric columns
7. Elbow method loop (k=1 to 9) + inertia plot
8. `KMeans(n_clusters=N)` + fit
9. Cluster labels added to df
10. Cluster scatter plot with centroids marked in red
11. Silhouette score

```bash
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5
```
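
The elbow loop records inertia (the within-cluster sum of squares) for k = 1 to 9. The final model then uses the chosen k, and the silhouette score checks how well separated the clusters are. Because scikit-learn's default for `n_init` changed between releases, the sketch passes `n_init=10` explicitly; `make_blobs` supplies synthetic data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: inertia for k = 1..9
inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X_scaled)
    inertias.append(km.inertia_)

n_clusters = 3   # the value --clusters controls
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
score = silhouette_score(X_scaled, labels)   # ranges from -1 to 1; higher is better
print(inertias)
print(score)
```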

---

## The `--drop` Option

Many real datasets have ID columns, name columns, or other columns that should not go into the model.
Use `--drop` to remove them before anything is processed.

**With `--drop`**, the generated notebook gets:
```python
df = df.drop(columns=['User ID', 'Gender'], axis=1)
df.head()
```

**Without `--drop`**, the cell appears as a comment so you can still do it manually:
```python
# No columns dropped
# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)
```

The profiler also respects the drop: `NUMERIC_COLS`, `CAT_COLS`, and `FEATURE_COLS` are all detected after the drop, so the rest of the notebook stays consistent.

```bash
# Drop one column
notebookpkg install knn-classifier --dataset Day5.csv --target Purchased --drop "User ID"

# Drop multiple columns
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"
```
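
The drop handling above can be sketched in three steps: parse the comma-separated value, build the drop cell's source, and profile the columns only after dropping. The helper names are hypothetical, as is the `TARGET` key. Including the target in `NUMERIC_COLS` is also an assumption about how the real profiler groups columns.

```python
from typing import List, Optional

import pandas as pd


def parse_drop(drop: Optional[str]) -> List[str]:
    """Split a --drop value such as "User ID,Gender" into clean column names."""
    if not drop:
        return []
    return [name.strip() for name in drop.split(",") if name.strip()]


def build_drop_code(columns: List[str]) -> str:
    """Return the drop cell's source in either of the two forms shown above."""
    if columns:
        return f"df = df.drop(columns={columns!r}, axis=1)\ndf.head()"
    return ("# No columns dropped\n"
            "# To drop columns use: df = df.drop(columns=['col1','col2'], axis=1)")


def profile(df: pd.DataFrame, target: Optional[str], drop: List[str]) -> dict:
    """Group columns only after the drop, so later cells never see dropped columns."""
    df = df.drop(columns=drop)
    target = target or df.columns[-1]   # documented default: last column
    return {
        "NUMERIC_COLS": df.select_dtypes(include="number").columns.tolist(),
        "CAT_COLS": df.select_dtypes(exclude="number").columns.tolist(),
        "FEATURE_COLS": [c for c in df.columns if c != target],
        "TARGET": target,
    }


df = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Gender": ["Male", "Female", "Male"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 1],
})
drop_cols = parse_drop("User ID, Gender")
print(build_drop_code(drop_cols))
result = profile(df, "Purchased", drop_cols)
print(result)
```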

---

## All Usage Examples

```bash
# ── EDA ──────────────────────────────────────────────────────────────────
notebookpkg install eda-basic   --dataset data.csv
notebookpkg install eda-visual  --dataset data.csv
notebookpkg install eda-full    --dataset data.csv

# ── Regression ───────────────────────────────────────────────────────────
notebookpkg install linear-regression      --dataset Salary_Data.csv --target Salary
notebookpkg install polynomial-regression  --dataset hw.csv --target Price --degree 3
notebookpkg install lasso-ridge            --dataset BostonHousing.csv --target medv

# ── Classification ────────────────────────────────────────────────────────
notebookpkg install logistic-regression    --dataset Day5.csv --target Purchased
notebookpkg install knn-classifier         --dataset Day5.csv --target Purchased
notebookpkg install naive-bayes            --dataset Day5.csv --target Purchased
notebookpkg install decision-tree          --dataset SNP.csv  --target Purchased
notebookpkg install svm-classifier         --dataset SNP.csv  --target Purchased
notebookpkg install multi-model-compare    --dataset Day5.csv --target Purchased

# ── Ensemble ─────────────────────────────────────────────────────────────
notebookpkg install random-forest-regressor   --dataset housing.csv --target Price
notebookpkg install random-forest-classifier  --dataset iris.csv    --target species

# ── Clustering ────────────────────────────────────────────────────────────
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv
notebookpkg install kmeans-clustering --dataset Mall_Customers.csv --clusters 5

# ── With drop ─────────────────────────────────────────────────────────────
notebookpkg install logistic-regression --dataset Day5.csv --target Purchased --drop "User ID,Gender"

# ── Custom output filename ────────────────────────────────────────────────
notebookpkg install linear-regression --dataset Salary_Data.csv --target Salary --output my_analysis.ipynb
```

---

## Project Structure

```
notebookpkg/
├── notebookpkg/
│   ├── cli.py          # CLI commands: install, list, syntax
│   ├── profiler.py     # Reads CSV, detects column types
│   ├── injector.py     # Replaces tokens in notebook cells
│   ├── registry.py     # Finds templates by name
│   └── templates/
│       ├── eda-basic/
│       ├── eda-visual/
│       ├── eda-full/
│       ├── linear-regression/
│       ├── polynomial-regression/
│       ├── logistic-regression/
│       ├── knn-classifier/
│       ├── naive-bayes/
│       ├── lasso-ridge/
│       ├── decision-tree/
│       ├── random-forest-regressor/
│       ├── random-forest-classifier/
│       ├── svm-classifier/
│       ├── kmeans-clustering/
│       └── multi-model-compare/
├── build_templates.py  # Regenerates all .ipynb template files
├── setup.py
├── MANIFEST.in
└── README.md
```

Each template folder contains:
- `template.ipynb` — the notebook with `{{TOKEN}}` placeholders
- `meta.json` — name, description, and whether a target column is needed
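
Given this layout, finding a template by name reduces to a folder lookup. The sketch below is not the package's `registry.py`. The `meta.json` keys it writes (`name`, `description`, `requires_target`) are assumptions, and the demo runs against a throwaway temporary folder.

```python
import json
import tempfile
from pathlib import Path


def find_template(templates_dir, name):
    """Return (notebook_path, meta) for a template folder, or raise LookupError."""
    folder = Path(templates_dir) / name
    notebook_path = folder / "template.ipynb"
    meta_path = folder / "meta.json"
    if not notebook_path.is_file() or not meta_path.is_file():
        available = sorted(p.name for p in Path(templates_dir).iterdir() if p.is_dir())
        raise LookupError(f"Unknown template {name!r}; available: {', '.join(available)}")
    return notebook_path, json.loads(meta_path.read_text(encoding="utf-8"))


# Demo against a temporary templates/ folder with one entry
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "eda-basic"
    folder.mkdir()
    (folder / "template.ipynb").write_text('{"cells": []}', encoding="utf-8")
    (folder / "meta.json").write_text(
        json.dumps({"name": "eda-basic", "description": "Basic EDA", "requires_target": False}),
        encoding="utf-8",
    )
    path, meta = find_template(tmp, "eda-basic")
    found_name = path.parent.name
    try:
        find_template(tmp, "no-such-template")
        missing_error = None
    except LookupError as exc:
        missing_error = str(exc)
```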

---

## Dependencies

```
pandas
numpy
scikit-learn
matplotlib
seaborn
nbformat
click
```

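The package metadata declares only `click`, `pandas`, and `nbformat` as install requirements (`numpy` comes in as a pandas dependency), so `pip install notebookpkg` does not install the rest. Install `scikit-learn`, `matplotlib`, and `seaborn` separately before running the generated notebooks, for example with `pip install scikit-learn matplotlib seaborn`.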

---

## Author

**Priyansu Pattanaik**  
B.Tech — Electronics & Telecommunication  
PG Diploma in AI — CDAC Kharghar  
priyansupattanaikwork@gmail.com

---

## License

MIT License. Free to use, modify, and distribute.
