Metadata-Version: 2.4
Name: quickinsights
Version: 0.5.1
Summary: Lightweight data analysis & ML library. Only requires numpy - includes DataFrame, statistics, ML algorithms, and visualization with zero heavy dependencies.
Author-email: QuickInsights Team <erena6466@gmail.com>
Maintainer-email: QuickInsights Team <erena6466@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/ErenAta16/quickinsight_library
Project-URL: Documentation, https://github.com/ErenAta16/quickinsight_library/blob/main/docs/API_REFERENCE.md
Project-URL: Repository, https://github.com/ErenAta16/quickinsight_library.git
Project-URL: Bug Tracker, https://github.com/ErenAta16/quickinsight_library/issues
Project-URL: Source Code, https://github.com/ErenAta16/quickinsight_library
Keywords: data-analysis,machine-learning,visualization,statistics,numpy,lightweight,dataframe,no-dependencies,streaming,clustering,regression,classification
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Manufacturing
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Office/Business :: Financial :: Investment
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Framework :: AsyncIO
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Provides-Extra: pandas
Requires-Dist: pandas>=1.5.0; extra == "pandas"
Provides-Extra: viz
Requires-Dist: matplotlib>=3.5.0; extra == "viz"
Requires-Dist: seaborn>=0.11.0; extra == "viz"
Requires-Dist: plotly>=5.0.0; extra == "viz"
Provides-Extra: stats
Requires-Dist: scipy>=1.9.0; extra == "stats"
Provides-Extra: sklearn
Requires-Dist: scikit-learn>=1.1.0; extra == "sklearn"
Provides-Extra: config
Requires-Dist: pyyaml>=6.0; extra == "config"
Requires-Dist: toml>=0.10.0; extra == "config"
Provides-Extra: system
Requires-Dist: psutil>=5.9.0; extra == "system"
Provides-Extra: io
Requires-Dist: openpyxl>=3.1.0; extra == "io"
Requires-Dist: pyarrow>=10.0.0; extra == "io"
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.3.0; extra == "dev"
Requires-Dist: pylint>=3.0.0; extra == "dev"
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: security
Requires-Dist: bandit>=1.7.0; extra == "security"
Requires-Dist: safety>=2.3.0; extra == "security"
Provides-Extra: performance
Requires-Dist: numba>=0.57.0; extra == "performance"
Requires-Dist: dask>=2023.0.0; extra == "performance"
Provides-Extra: gpu
Requires-Dist: torch>=1.9.0; extra == "gpu"
Requires-Dist: cupy>=10.0.0; extra == "gpu"
Provides-Extra: cloud
Requires-Dist: boto3>=1.26.0; extra == "cloud"
Requires-Dist: azure-storage-blob>=12.14.0; extra == "cloud"
Requires-Dist: google-cloud-storage>=2.7.0; extra == "cloud"
Provides-Extra: fast
Requires-Dist: numba>=0.57.0; extra == "fast"
Requires-Dist: dask>=2023.0.0; extra == "fast"
Requires-Dist: joblib>=1.2.0; extra == "fast"
Provides-Extra: ml
Requires-Dist: torch>=1.9.0; extra == "ml"
Requires-Dist: xgboost>=1.7.0; extra == "ml"
Requires-Dist: lightgbm>=3.3.0; extra == "ml"
Requires-Dist: catboost>=1.1.0; extra == "ml"
Provides-Extra: web
Requires-Dist: fastapi>=0.100.0; extra == "web"
Requires-Dist: uvicorn>=0.20.0; extra == "web"
Requires-Dist: flask>=2.0.0; extra == "web"
Requires-Dist: pydantic>=2.0.0; extra == "web"
Provides-Extra: full
Requires-Dist: quickinsights[config,io,pandas,sklearn,stats,system,viz]; extra == "full"
Provides-Extra: all
Requires-Dist: quickinsights[cloud,dev,fast,full,gpu,ml,performance,security,web]; extra == "all"
Dynamic: license-file

# QuickInsights

[![PyPI](https://img.shields.io/pypi/v/quickinsights)](https://pypi.org/project/quickinsights/)
[![Python](https://img.shields.io/pypi/pyversions/quickinsights)](https://pypi.org/project/quickinsights/)
[![License](https://img.shields.io/pypi/l/quickinsights)](https://github.com/ErenAta16/quickinsight_library/blob/main/LICENSE)
![Tests](https://img.shields.io/badge/tests-107%20passed-brightgreen)

A data analysis and machine learning toolkit that runs on **numpy alone**.

pandas, scipy, scikit-learn, matplotlib — all optional.
Install them when you need them; QuickInsights works without any of them.

```bash
pip install quickinsights
```

## What is in the box

| Module | What it does | Replaces |
|---|---|---|
| `dataframe` | Read and write CSV, JSON and Parquet; filter, group, describe | pandas |
| `stats` | Descriptive stats, correlation, hypothesis tests, outlier detection | scipy |
| `ml` | 22 algorithms — regression, classification, clustering, reduction | scikit-learn |
| `viz` | Text, HTML and SVG charts; auto-fallback to matplotlib when present | matplotlib |
| `io_module` | Smart loader (CSV &rarr; Parquet caching), streaming, result cache | — |
| `analysis` | One-call `analyze()` + `quick_insight()` executive summary | — |
| `cleaning` | Missing-value handling, duplicate removal | — |
| `plugins` | Runtime plugin registration and execution | — |
| `config_module` | Nested key-value config with JSON / YAML / TOML back-ends | — |

## Getting started

```python
import quickinsights as qi

# analyse a plain dict — no pandas required
result = qi.analyze({
    "price":    [29.99, 49.99, 19.99, 99.99],
    "rating":   [4.5, 3.8, 4.9, 4.1],
    "category": ["books", "electronics", "books", "clothing"],
})

# one-line executive summary
print(qi.quick_insight(result, target="price")["executive_summary"])

# export to HTML, CSV or JSON
qi.export(result, "report", "html")
```

## Working with data

```python
from quickinsights.dataframe import QuickFrame

qf = QuickFrame.read_csv("sales.csv")          # chunked reading supported
qf = qf[qf["revenue"].values > 0]              # filter
print(qf.groupby("region").mean())              # aggregate
print(qf.corr())                                # correlation matrix
qf.to_csv("filtered.csv")
```

QuickFrame supports: `select_dtypes`, `sort_values`, `dropna` / `fillna`,
`describe`, `value_counts`, `concat`, `rename`, `drop`, `duplicated`,
chunked CSV iteration, JSON and Parquet I/O.
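
To give a feel for how a group-and-aggregate step can work with NumPy alone, here is a minimal sketch. It is not QuickFrame's actual implementation; `groupby_mean` is a hypothetical helper written only for illustration.

```python
import numpy as np

def groupby_mean(keys, values):
    """Mean of `values` per distinct key (hypothetical helper, not the QuickFrame API)."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # uniq: sorted distinct keys; inverse: group index of every row
    uniq, inverse = np.unique(keys, return_inverse=True)
    sums = np.bincount(inverse, weights=values, minlength=len(uniq))
    counts = np.bincount(inverse, minlength=len(uniq))
    return dict(zip(uniq.tolist(), (sums / counts).tolist()))

regions = ["north", "south", "north", "east"]
revenue = [10.0, 20.0, 30.0, 40.0]
print(groupby_mean(regions, revenue))
# {'east': 40.0, 'north': 20.0, 'south': 20.0}
```

`np.unique(..., return_inverse=True)` turns arbitrary labels into integer group ids, and `np.bincount` then sums and counts per group without a Python loop.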

## Statistics

```python
from quickinsights.stats import (
    pearson_correlation, ttest_ind, detect_outliers_iqr, jarque_bera
)
import numpy as np

x = np.random.randn(1000)
y = 2 * x + np.random.randn(1000) * 0.5

pearson_correlation(x, y)       # ≈ 0.97
detect_outliers_iqr(x).sum()    # number of outliers
jarque_bera(x)                  # (statistic, p_value)
ttest_ind(x, y)                 # (t, p)
```

Also available: `spearman_correlation`, `covariance`, `skewness`, `kurtosis`,
`chi2_test`, `zscore`, `kde_estimate`, `entropy`, distance metrics.
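
For readers curious what these functions compute, the following is a rough NumPy sketch of the textbook definitions of Pearson correlation and the IQR outlier rule. The names `pearson_sketch` and `iqr_outlier_mask` and the `k=1.5` default are assumptions made for this example; the library's own functions may differ in details such as the quantile method.

```python
import numpy as np

def pearson_sketch(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def iqr_outlier_mask(x, k=1.5):
    """Boolean mask: True where a value lies more than k * IQR outside the quartiles."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

print(pearson_sketch([1, 2, 3], [2, 4, 6]))
print(iqr_outlier_mask([1, 2, 3, 4, 100]).sum())   # counts flagged values
```

Returning a boolean mask is what makes the `.sum()` idiom in the example above count the outliers.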

## Machine learning

22 algorithms, from scratch, in pure NumPy.

```python
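import numpy as np
from quickinsights.ml import (
    train_test_split, StandardScaler,
    RandomForestClassifier, accuracy_score, classification_report,
)

# toy data for illustration: 200 samples, 4 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# fit the scaler on the training split only, then reuse it on the test split
# (assumes StandardScaler exposes a scikit-learn-style transform())
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = RandomForestClassifier(n_estimators=50, max_depth=8)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))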
```

**Full algorithm list**

| Category | Algorithms |
|---|---|
| Linear | LinearRegression, LogisticRegression, RidgeRegression, LassoRegression, ElasticNet |
| Trees & ensembles | DecisionTreeClassifier, RandomForestClassifier/Regressor, GradientBoostingClassifier/Regressor |
| Neighbours | KNeighborsClassifier |
| Bayes | GaussianNB, MultinomialNB |
| Clustering | KMeans, DBSCAN, AgglomerativeClustering |
| Dimensionality reduction | PCA, t-SNE |
| Preprocessing | StandardScaler, MinMaxScaler, LabelEncoder, train_test_split |
| Evaluation | accuracy_score, MSE, MAE, R², confusion_matrix, classification_report, cross_val_score |

## Visualization without matplotlib

```python
from quickinsights.viz import text_histogram, generate_html_report

# works in any terminal
print(text_histogram(data, bins=20, title="Distribution"))

# self-contained HTML report — no browser extensions needed
generate_html_report(analysis_results, output_path="report.html")
```

When matplotlib **is** installed, `smart_histogram` / `smart_bar_chart` /
`smart_heatmap` produce regular matplotlib figures automatically.
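
As an illustration of how a terminal chart can be produced without any plotting library, here is a small sketch built on `np.histogram`. It is not the code behind `text_histogram`; the function name `ascii_histogram` and its `width` parameter are made up for this example.

```python
import numpy as np

def ascii_histogram(data, bins=10, width=40):
    """Render a histogram as text, one line per bin (illustrative helper)."""
    counts, edges = np.histogram(np.asarray(data, dtype=float), bins=bins)
    peak = max(int(counts.max()), 1)           # avoid dividing by zero
    lines = []
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        bar = "#" * int(round(width * count / peak))   # scale bars to the fullest bin
        lines.append(f"{lo:8.2f} to {hi:8.2f} | {bar} {count}")
    return "\n".join(lines)

sample = np.random.default_rng(0).normal(size=500)
print(ascii_histogram(sample, bins=8))
```

Because the output is a plain string, it can be printed, logged or embedded in a report as-is.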

## Streaming large files

```python
from quickinsights.io_module import StreamingAnalyzer

analyzer = StreamingAnalyzer(chunksize=50_000)
result = analyzer.analyze("big_file.csv")   # constant memory usage
```
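
Constant memory usage is possible because many statistics can be updated one chunk at a time. The sketch below merges per-chunk mean and variance with the standard pairwise update formula; it only illustrates the idea and is not taken from `StreamingAnalyzer`. The class name `RunningStats` is hypothetical.

```python
import numpy as np

class RunningStats:
    """Running mean and variance, updated chunk by chunk (illustrative sketch)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, chunk):
        chunk = np.asarray(chunk, dtype=float)
        if chunk.size == 0:
            return
        n_b = chunk.size
        mean_b = chunk.mean()
        m2_b = ((chunk - mean_b) ** 2).sum()
        delta = mean_b - self.mean
        total = self.n + n_b
        # merge the chunk's statistics into the running totals
        self.mean += delta * n_b / total
        self.m2 += m2_b + delta ** 2 * self.n * n_b / total
        self.n = total

    @property
    def variance(self):
        """Sample variance (ddof=1); NaN until two values have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

data = np.random.default_rng(0).normal(size=100_000)
stats = RunningStats()
for start in range(0, data.size, 10_000):      # stands in for reading file chunks
    stats.update(data[start:start + 10_000])
print(stats.mean, stats.variance)
```

Only three numbers are kept between chunks, so memory use does not grow with file size.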

## Benchmarks

Measured on 5 000 samples, 20 features. Native = QuickInsights, sklearn = scikit-learn.

| Algorithm | Native | sklearn | Accuracy difference |
|---|---|---|---|
| GaussianNB | **0.2 ms** | 1.0 ms | identical |
| Ridge | **0.4 ms** | 2.2 ms | identical |
| LinearRegression | **2.2 ms** | 4.0 ms | identical |
| KMeans (k = 5) | **155 ms** | 637 ms | identical |
| RandomForest (20 trees) | 167 ms | **30 ms** | identical |
| GradientBoosting (50 trees) | 963 ms | **93 ms** | identical |

In this benchmark the linear-algebra algorithms are faster than sklearn:
both libraries end up in the same BLAS/LAPACK routines, and QuickInsights
adds less Python overhead on the way there.
Tree-based ensembles are slower (pure Python versus sklearn's compiled code) but
**produce the same predictions**.
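
To make the BLAS/LAPACK point concrete, here is a sketch of closed-form ridge regression in plain NumPy. The heavy lifting happens in the matrix product and in `np.linalg.solve`, which NumPy hands off to compiled routines. This is an assumed textbook formulation, not necessarily how `RidgeRegression` is written; `ridge_fit` is a hypothetical name.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Solve (Xc^T Xc + alpha * I) w = Xc^T yc on centred data (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                 # centring keeps the intercept unpenalised
    yc = y - y_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5000, 20))
y_demo = X_demo @ rng.normal(size=20) + rng.normal(size=5000) * 0.1
coef, intercept = ridge_fit(X_demo, y_demo, alpha=1.0)
print(coef.shape, intercept)
```

The whole fit is a handful of array operations, which is why there is so little Python overhead to pay.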

## Installation options

```bash
pip install quickinsights              # numpy only — everything works
pip install "quickinsights[pandas]"    # adds pandas
pip install "quickinsights[viz]"       # adds matplotlib, seaborn, plotly
pip install "quickinsights[sklearn]"   # adds scikit-learn
pip install "quickinsights[full]"      # pandas, viz, sklearn, stats, io, config, system
```

## Project layout

```
src/quickinsights/
    __init__.py          core.py          error_handling.py
    dataframe/           stats/           ml/
    viz/                 io_module/       analysis/
    cleaning/            plugins/         config_module/
```

## Running the tests

```bash
pip install pytest
pytest tests/ -v          # 107 tests, < 3 s
```

## License

MIT — see [LICENSE](LICENSE).

## Author

Eren Ata — erena6466@gmail.com
