Metadata-Version: 2.4
Name: bizlens
Version: 2.2.11
Summary: Educational descriptive analytics + statistical inference + process mining. Rich tables, diagnostic checks, hypothesis testing, and business process analysis for teaching and analysis.
Home-page: https://github.com/solutiongate-learn/bizlens
Author: Sudhanshu Singh
Author-email: Sudhanshu Singh <cc9n8y8tqc@privaterelay.appleid.com>
License: MIT
Project-URL: Homepage, https://github.com/solutiongate-learn/bizlens
Project-URL: Documentation, https://github.com/solutiongate-learn/bizlens#readme
Project-URL: Repository, https://github.com/solutiongate-learn/bizlens
Project-URL: Bug Tracker, https://github.com/solutiongate-learn/bizlens/issues
Project-URL: PyPI, https://pypi.org/project/bizlens/
Keywords: analytics,statistics,education,business-intelligence,descriptive-analytics,eda,data-visualization,outlier-detection,normality-test,synthetic-data,process-mining,teaching,sample-vs-population,skewness,data-science
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Education
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: polars>=0.18.0
Requires-Dist: narwhals>=1.0.0
Requires-Dist: scipy>=1.9.0
Requires-Dist: statsmodels>=0.13.0
Requires-Dist: matplotlib>=3.6.0
Requires-Dist: seaborn>=0.12.0
Requires-Dist: plotly>=5.0.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: rich>=13.0.0
Provides-Extra: jupyter
Requires-Dist: jupyter>=1.0.0; extra == "jupyter"
Requires-Dist: ipython>=7.30.0; extra == "jupyter"
Requires-Dist: ipykernel>=6.0.0; extra == "jupyter"
Provides-Extra: stats-advanced
Requires-Dist: phik>=0.11.0; extra == "stats-advanced"
Provides-Extra: process-mining
Requires-Dist: pm4py>=2.7.0; extra == "process-mining"
Provides-Extra: full
Requires-Dist: jupyter>=1.0.0; extra == "full"
Requires-Dist: ipython>=7.30.0; extra == "full"
Requires-Dist: ipykernel>=6.0.0; extra == "full"
Requires-Dist: phik>=0.11.0; extra == "full"
Requires-Dist: pm4py>=2.7.0; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Requires-Dist: mypy>=0.990; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# BizLens v2.2.11 📊

**Educational Analytics: Descriptive + Statistical Inference + Process Mining**

BizLens is a Python library for business analysts, data scientists, and educators. It combines three powerful analytics domains:

- **Descriptive Analytics**: Rich statistical tables, summaries, and data exploration
- **Statistical Inference**: Hypothesis testing, confidence intervals, effect sizes
- **Process Mining**: Event log analysis, case metrics, bottleneck detection

---

## Quick Start

### Installation

```bash
pip install bizlens==2.2.11
```

### Minimal Example: Descriptive Analytics

```python
import bizlens as bl
import pandas as pd

data = pd.DataFrame({'revenue': [100, 150, 200, 250], 'region': ['A', 'B', 'A', 'B']})
bl.describe(data)  # Auto-detects data type, creates summary tables
```

### Process Mining (Auto-Detection)

```python
event_log = bl.generate_hr_onboarding_event_log(num_cases=50)
bl.describe(event_log)  # Auto-detects as event log, shows case metrics & variants
```

### Hypothesis Testing

```python
import numpy as np

sample = np.random.normal(100, 15, size=50)
bl.inference.confidence_interval(sample)  # 95% CI for mean
bl.inference.one_sample_ttest(sample, pop_mean=100)  # Test vs population
```

---

## Core Modules

### 1. **describe()** — Smart Descriptive Analytics
- Analyzes DataFrames with automatic statistics
- **Auto-detects event logs**: When columns include case_id/activity/timestamp
- **Sample vs Population**: Educational comparison (ddof=1 vs ddof=0)
- **Rich output**: Professional console tables with rich formatting

```python
bl.describe(dataframe)
bl.describe(event_log)  # Auto-detects and shows process mining metrics
```
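
The sample-vs-population comparison mentioned above comes down to the divisor used in the variance formula. A minimal sketch of that difference using NumPy only; this is not BizLens's internal code, and the revenue values are made up for illustration:

```python
import numpy as np

# Illustrative values only; not taken from any BizLens dataset.
revenue = np.array([100.0, 150.0, 200.0, 250.0])

# Population standard deviation: divide by n (ddof=0, NumPy's default).
pop_std = revenue.std(ddof=0)

# Sample standard deviation: divide by n - 1 (Bessel's correction, ddof=1).
sample_std = revenue.std(ddof=1)

print(f"population std (ddof=0): {pop_std:.2f}")
print(f"sample std     (ddof=1): {sample_std:.2f}")
```

The sample estimate is always the larger of the two, and the gap shrinks as the number of observations grows.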

---

### 2. **tables** — Statistical Tables

Professional tables for data summarization:

```python
bl.tables.frequency_table(series)              # Value counts with %
bl.tables.percentile_table(series)             # Quartile breakdown (0,25,50,75,100)
bl.tables.contingency_table(df, 'row', 'col') # Crosstab with chi-square
bl.tables.summary_statistics(df)               # Count, mean, std, min, quartiles, max
bl.tables.group_comparison(df, 'group')       # ANOVA across groups
bl.tables.distribution_fit(series)             # Fit to distributions (normal, exponential, etc)
bl.tables.descriptive_comparison(df1, df2)    # Sample vs population tables
```
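
To make "value counts with %" concrete, here is a rough pandas-only sketch of what a frequency table contains. It is an assumption about the general shape of such a table, not the actual output format of `bl.tables.frequency_table`:

```python
import pandas as pd

# Illustrative categorical data.
region = pd.Series(["A", "B", "A", "B", "A"], name="region")

# Absolute counts and relative frequencies (as percentages) per category.
counts = region.value_counts()
percent = region.value_counts(normalize=True) * 100

# Combine both into one table; the two Series align on the category index.
freq = pd.DataFrame({"count": counts, "percent": percent.round(1)})
print(freq)
```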

---

### 3. **diagnostic** — Data Quality & Statistical Diagnostics

Check data quality and identify anomalies:

```python
bl.diagnostic.detect_outliers(series, method='iqr')      # IQR, Z-score, or Isolation Forest
bl.diagnostic.normality_test(series)                      # Shapiro-Wilk, Anderson-Darling, KS
bl.diagnostic.correlation_analysis(df)                    # Pearson & Spearman with heatmap
bl.diagnostic.missing_value_analysis(df)                  # Missing data patterns
bl.diagnostic.duplicate_analysis(df)                      # Find exact duplicates
bl.diagnostic.sample_vs_population(sample, pop_mean, pop_std)  # Educational t-test
```
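
For readers new to the IQR method named above, the following sketch shows the standard 1.5 × IQR fence rule with NumPy. It illustrates the technique in general and is not BizLens's implementation; the data values are invented:

```python
import numpy as np

# Illustrative values with one obvious extreme point.
values = np.array([10, 12, 11, 13, 12, 11, 95], dtype=float)

# Quartiles and interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Tukey's fences: points beyond 1.5 * IQR from the quartiles are flagged.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"fences: [{lower}, {upper}]")
print(f"outliers: {outliers}")
```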

---

### 4. **inference** — Hypothesis Testing & Statistical Inference

Test hypotheses and estimate population parameters:

```python
bl.inference.confidence_interval(sample, confidence=0.95)            # 95% CI for mean
bl.inference.one_sample_ttest(sample, pop_mean=100)                 # Test sample vs population
bl.inference.two_sample_ttest(group1, group2)                       # Compare two groups (t-test + Mann-Whitney U)
bl.inference.paired_ttest(before, after)                            # Before/after testing
bl.inference.anova_test({'A': group_a, 'B': group_b, 'C': group_c}) # Multi-group comparison
bl.inference.correlation_test(x, y)                                  # Correlation significance
```
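
As a reference point for what a 95% confidence interval for the mean involves, here is a hand-rolled t-interval using SciPy. This is a sketch of the textbook formula, not a description of how `bl.inference.confidence_interval` works internally; the sample values are invented:

```python
import numpy as np
from scipy import stats

# Illustrative sample.
sample = np.array([98.0, 102.0, 95.0, 110.0, 105.0, 99.0, 101.0, 97.0])

n = sample.size
mean = sample.mean()

# Standard error of the mean, using the sample standard deviation (ddof=1).
sem = sample.std(ddof=1) / np.sqrt(n)

# Two-sided 95% interval: critical value from the t distribution with n - 1 df.
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```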

---

### 5. **process_mining** — Event Log Analysis

Analyze business processes from event logs:

```python
bl.process_mining.case_metrics(event_log)           # Duration, cost, activity count per case
bl.process_mining.activity_metrics(event_log)       # Frequency, duration by activity
bl.process_mining.resource_analysis(event_log)      # Workload distribution
bl.process_mining.variant_discovery(event_log)      # Top activity sequences (paths)
bl.process_mining.bottleneck_analysis(event_log)    # Waiting time identification
bl.process_mining.rework_detection(event_log)       # Repeated activities
bl.process_mining.timeline_visualization(event_log) # Interactive Gantt chart (plotly)
```
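
One common way to look for bottlenecks is to measure the gap between consecutive events within each case and average it per activity. The sketch below does this with pandas alone. It is an assumption about the general idea, not the algorithm used by `bl.process_mining.bottleneck_analysis`, and the event log is invented:

```python
import pandas as pd

# Illustrative event log with two cases.
log = pd.DataFrame({
    "case_id": [1, 1, 1, 2, 2, 2],
    "activity": ["Offer", "Paperwork", "Training"] * 2,
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 11:00", "2024-01-03 11:00",
        "2024-01-02 09:00", "2024-01-02 10:00", "2024-01-05 10:00",
    ]),
}).sort_values(["case_id", "timestamp"])

# Waiting time before each activity: hours since the previous event in the same case.
# The first event of every case has no predecessor, so its wait is NaN.
gap = log.groupby("case_id")["timestamp"].diff()
log["wait_hours"] = gap.dt.total_seconds() / 3600

# Average wait per activity, longest first; the top entry is the likely bottleneck.
mean_wait = log.groupby("activity")["wait_hours"].mean().sort_values(ascending=False)
print(mean_wait)
```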

**Auto-Detection Requirements:**
- `case_id` column: Uniquely identifies a case/instance
- `activity` column: Names the activity/step
- `timestamp` column: When the activity occurred
- Optional: `resource`, `cost`, or other numeric columns
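
The requirements above can be sketched as a simple column check, followed by two basic event-log computations. The helper `looks_like_event_log` is hypothetical and written for illustration; BizLens's actual detection logic may differ:

```python
import pandas as pd

# Column names taken from the requirements list above.
REQUIRED = {"case_id", "activity", "timestamp"}

def looks_like_event_log(df: pd.DataFrame) -> bool:
    """Hypothetical check: True when all three required columns are present."""
    return REQUIRED.issubset(df.columns)

# Illustrative event log.
log = pd.DataFrame({
    "case_id": [1, 1, 1, 2, 2],
    "activity": ["Offer", "Paperwork", "Training", "Offer", "Training"],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-02 09:00", "2024-01-05 09:00",
        "2024-01-03 09:00", "2024-01-04 15:00",
    ]),
})

if looks_like_event_log(log):
    log = log.sort_values(["case_id", "timestamp"])
    # Case duration: last event minus first event within each case.
    span = log.groupby("case_id")["timestamp"].agg(["min", "max"])
    duration = span["max"] - span["min"]
    # Variant: the ordered sequence of activities in each case.
    variants = log.groupby("case_id")["activity"].agg(lambda s: " > ".join(s))
    print(duration)
    print(variants.value_counts())
```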

---

### 6. **quality** — Data Quality Assessment

Evaluate overall data quality:

```python
bl.quality.completeness_report(df)      # Missing data per column
bl.quality.consistency_check(df)        # Type mixing and format violations
bl.quality.uniqueness_analysis(df)      # Cardinality and duplicates
bl.quality.data_profile(df)             # Overall quality score (0-100)
bl.quality.outlier_summary(df)          # Quick outlier identification
```
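
A completeness report boils down to counting missing values per column. The following pandas-only sketch shows one plausible shape for such a report; the column names in `report` are illustrative assumptions, not the output schema of `bl.quality.completeness_report`:

```python
import numpy as np
import pandas as pd

# Illustrative data with a few gaps.
df = pd.DataFrame({
    "revenue": [100.0, np.nan, 200.0, 250.0],
    "region": ["A", "B", None, "B"],
    "units": [1, 2, 3, 4],
})

# Missing count and percentage of non-missing values for each column.
missing = df.isna().sum()
completeness = (1 - df.isna().mean()) * 100

report = pd.DataFrame({"missing": missing, "complete_pct": completeness})
print(report)
```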

---

## Built-in Datasets & Generators

```python
# Load classic datasets
bl.load_dataset('iris')         # Fisher's iris dataset
bl.load_dataset('tips')         # Restaurant tips
bl.load_dataset('diamonds')     # Diamond prices

# Generate synthetic business data
bl.generate_sample_data(n_rows=1000)

# Generate event logs for process mining
bl.generate_hr_onboarding_event_log(num_cases=300)
bl.generate_healthcare_event_log(num_cases=250)
bl.generate_manufacturing_event_log(num_cases=200)
bl.generate_tech_support_event_log(num_cases=400)
```

---

## Examples

### Example 1: Descriptive Analytics with Tables

```python
import bizlens as bl
import pandas as pd
import numpy as np

# Generate data
data = pd.DataFrame({
    'region': np.random.choice(['North', 'South', 'East', 'West'], 500),
    'revenue': np.random.gamma(2, 5000, 500),
    'satisfaction': np.random.normal(7.5, 1.5, 500).clip(1, 10),
})

# Analyze
bl.describe(data)
bl.tables.frequency_table(data['region'])
bl.tables.percentile_table(data['revenue'])
bl.tables.summary_statistics(data)
bl.diagnostic.detect_outliers(data['revenue'])
bl.quality.data_profile(data)
```

**Run this example:**
```bash
python examples/01_descriptive_analytics.py
```

---

### Example 2: Process Mining Event Logs

```python
import bizlens as bl

# Generate HR onboarding event log
event_log = bl.generate_hr_onboarding_event_log(num_cases=100)

# Analyze with auto-detection
bl.describe(event_log)  # Auto-detects event log

# Process mining metrics
bl.process_mining.case_metrics(event_log)
bl.process_mining.variant_discovery(event_log, top_n=5)
bl.process_mining.bottleneck_analysis(event_log)
bl.process_mining.rework_detection(event_log)
```

**Run this example:**
```bash
python examples/02_process_mining_basics.py
```

---

### Example 3: Hypothesis Testing & Inference

```python
import bizlens as bl
import numpy as np
import pandas as pd

# Generate test data
sample = pd.Series(np.random.normal(100, 15, size=50))
control = pd.Series(np.random.normal(100, 15, size=40))
treatment = pd.Series(np.random.normal(110, 18, size=40))

# Confidence intervals
bl.inference.confidence_interval(sample, confidence=0.95)

# t-tests
bl.inference.one_sample_ttest(sample, pop_mean=100)
bl.inference.two_sample_ttest(control, treatment)

# ANOVA
bl.inference.anova_test({
    'Control': control,
    'Treatment': treatment,
})
```

**Run this example:**
```bash
python examples/03_inference_hypothesis_testing.py
```

---

## Educational Features

### Sample vs Population
Learn the difference between sample and population statistics:

```python
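import numpy as np
import pandas as pd

# Example sample; sample statistics use Bessel's correction (ddof=1)
my_sample = pd.Series(np.random.normal(100, 15, size=50))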
bl.diagnostic.sample_vs_population(
    sample_data=my_sample,
    pop_mean=100,
    pop_std=15,
    column_name='Revenue'
)
```

### Hypothesis Testing with Effect Sizes
Understand both statistical AND practical significance:

```python
results = bl.inference.one_sample_ttest(sample, pop_mean=100)
print(f"p-value: {results['p_value']}")      # Statistical significance
print(f"Cohen's d: {results['cohens_d']}")   # Effect size (practical significance)
```
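
To show how the two quantities relate, here is a sketch that computes a one-sample t-test with SciPy and Cohen's d by hand. It uses the textbook definition of d for a one-sample test and is not taken from BizLens's source; the sample values are invented:

```python
import numpy as np
from scipy import stats

# Illustrative sample and hypothesized population mean.
sample = np.array([104.0, 98.0, 110.0, 107.0, 101.0, 99.0, 112.0, 105.0])
pop_mean = 100.0

# Statistical significance: one-sample t-test.
t_stat, p_value = stats.ttest_1samp(sample, popmean=pop_mean)

# Practical significance: Cohen's d = mean difference in units of sample std.
cohens_d = (sample.mean() - pop_mean) / sample.std(ddof=1)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}, d = {cohens_d:.3f}")
```

For a one-sample test, t equals d multiplied by the square root of n, so a large sample can give a tiny p-value even when d is small. That is why the effect size is worth reporting alongside the p-value.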

### Distribution Fitting
```python
table, dist_info = bl.tables.distribution_fit(data['column'])
# Returns: best-fit distribution, parameters, AIC scores
```
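
For context on what comparing AIC scores means, the sketch below fits two candidate distributions with SciPy and picks the one with the lower AIC. The helper `aic` is hypothetical and the candidate set is an assumption; `bl.tables.distribution_fit` may use different candidates or criteria:

```python
import numpy as np
from scipy import stats

# Synthetic, right-skewed data drawn from an exponential distribution.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=500)

def aic(dist, values):
    """Hypothetical helper: AIC = 2k - 2 * log-likelihood at the fitted parameters."""
    params = dist.fit(values)                      # maximum-likelihood estimates
    log_lik = np.sum(dist.logpdf(values, *params))
    return 2 * len(params) - 2 * log_lik

# Lower AIC indicates a better trade-off between fit and parameter count.
scores = {
    "normal": aic(stats.norm, data),
    "exponential": aic(stats.expon, data),
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```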

---

## Installation & Dependencies

### Minimal Install
```bash
pip install bizlens==2.2.11
```

Core dependencies:
- **pandas** ≥1.5.0 — Data manipulation
- **numpy** ≥1.21.0 — Numerical operations
- **scipy** ≥1.9.0 — Statistical tests
- **statsmodels** ≥0.13.0 — Advanced statistics
- **scikit-learn** ≥1.0.0 — Outlier detection
- **matplotlib** ≥3.6.0 — Static plots
- **seaborn** ≥0.12.0 — Statistical plots
- **rich** ≥13.0.0 — Beautiful console output
- **plotly** ≥5.0.0 — Interactive visualizations
- **polars** ≥0.18.0 — Fast DataFrame backend
- **narwhals** ≥1.0.0 — pandas/polars compatibility layer

### Optional Extras
```bash
# For Jupyter/Colab
pip install "bizlens[jupyter]"

# For advanced correlations (phik)
pip install "bizlens[stats-advanced]"

# For process mining with pm4py
pip install "bizlens[process-mining]"

# Everything above
pip install "bizlens[full]"
```

---

## Running in Different Environments

### Google Colab
1. Paste example code into a cell
2. Run! (Auto-install handles dependencies)

### Jupyter Notebook
```python
%run examples/01_descriptive_analytics.py
```

### VSCode / Terminal
```bash
python examples/01_descriptive_analytics.py
```

### Anaconda
```bash
conda install -c conda-forge bizlens==2.2.11
```

---

## Version History

### v2.2.11 (Current)
- ✅ NEW: Statistical tables module (frequency, percentile, contingency, summary)
- ✅ NEW: Diagnostic module (outliers, normality, correlations, quality checks)
- ✅ NEW: Inference module (hypothesis testing, confidence intervals, effect sizes)
- ✅ ENHANCED: Process mining with Gantt charts, bottleneck detection, rework analysis
- ✅ NEW: Quality module (data profiling, completeness, consistency assessment)
- ✅ Fixed: narwhals.selectors fallback for compatibility
- ✅ Enhanced: Rich educational docstrings with learning notes

### v2.2.10
- Initial v2.2 release with descriptive analytics and process mining

### v2.2.1
- Early descriptive analytics foundation

---

## Future Roadmap

| Version | Features | Status |
|---------|----------|--------|
| **v2.2.11** | Tables, Diagnostic, Inference, Process Mining, Quality | ✅ Current |
| **v2.3.0** | Predictive Analytics (regression, classification, decision trees) | 🔄 Planned |
| **v2.4.0** | Time-Series Forecasting (ARIMA, exponential smoothing) | 🔄 Planned |
| **v2.5.0** | Quality & Six Sigma (Cpk, control charts, hypothesis testing) | 🔄 Planned |
| **v3.0.0** | Advanced: Deep Learning pipelines, automated ML | 🔄 Future |

---

## Performance Notes

- **Handles**: DataFrames up to 1M rows (pandas) or larger (polars)
- **Auto-detects**: Event logs automatically (looks for case_id + activity + timestamp)
- **Educational**: Rich output with learning notes (not optimized for speed)
- **Polars Support**: Via narwhals abstraction layer for fast operations

---

## FAQ

**Q: Can I use BizLens with Polars DataFrames?**
A: Yes! Via narwhals compatibility layer. Same API works with both pandas and polars.

**Q: Does BizLens replace scipy/statsmodels?**
A: No! It's a wrapper providing educational, easy-to-use interfaces. For advanced use, use scipy/statsmodels directly.

**Q: Is it for teaching or production?**
A: Both. The clean API and rich output suit teaching, and the library is light enough for exploratory analysis pipelines. Note that the rich output is not optimized for speed (see Performance Notes).

**Q: Can I use the event log generators for my data?**
A: Yes! The templates include realistic HR, healthcare, manufacturing, and support data. Modify them for your use case.

**Q: Does it support missing data?**
A: Yes. Functions drop NaN values automatically. If you would rather impute than drop, apply pandas `.fillna()` or `.interpolate()` before passing the data to BizLens.
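
The choice between dropping and imputing affects the statistics you get back. A small pandas-only sketch of the three options mentioned in the last answer, using invented values:

```python
import numpy as np
import pandas as pd

# Illustrative series with two missing values.
s = pd.Series([10.0, np.nan, 30.0, np.nan, 50.0])

dropped = s.dropna()                 # keep only observed values
filled = s.fillna(s.median())        # replace gaps with the median
interpolated = s.interpolate()       # linear interpolation between neighbors

for name, series in [("dropped", dropped), ("filled", filled), ("interpolated", interpolated)]:
    print(f"{name:>12}: n={len(series)}, mean={series.mean():.1f}, std={series.std():.2f}")
```

Median filling keeps the mean here but shrinks the standard deviation, which is worth keeping in mind before running tests that depend on variance.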

---

## Contributing

Issues, feature requests, and PRs welcome at:
https://github.com/solutiongate-learn/bizlens

---

## License

MIT License — Free for personal, educational, and commercial use.

---

## Citation

If you use BizLens in research or publications:

```bibtex
@software{bizlens2024,
  title = {BizLens: Educational Analytics for Python},
  author = {Singh, Sudhanshu},
  year = {2024},
  url = {https://github.com/solutiongate-learn/bizlens}
}
```

---

**Made with ❤️ for data analysts, teachers, and students**

[GitHub](https://github.com/solutiongate-learn/bizlens) • [PyPI](https://pypi.org/project/bizlens/) • [Issues](https://github.com/solutiongate-learn/bizlens/issues)
