Metadata-Version: 2.4
Name: skiearn-kdd
Version: 1.0.1
Summary: Interactive Knowledge Discovery & Data Mining (KDD) study guide with comprehensive documentation
Home-page: https://github.com/yourusername/skiearn
Author: KDD Study Team
License: MIT
Project-URL: Homepage, https://github.com/yourusername/skiearn
Project-URL: Documentation, https://github.com/yourusername/skiearn
Project-URL: Repository, https://github.com/yourusername/skiearn
Project-URL: Bug Tracker, https://github.com/yourusername/skiearn/issues
Keywords: kdd,data-mining,machine-learning,statistics,study-guide,education,data-science
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Education
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# SKIEARN - Knowledge Discovery & Data Mining Study Guide

![Python Version](https://img.shields.io/badge/python-3.7%2B-blue)
![License](https://img.shields.io/badge/license-MIT-green)
![PyPI version](https://img.shields.io/badge/pypi-v1.0.1-orange)

**SKIEARN** is an interactive Python package that bundles a study guide for Knowledge Discovery in Databases (KDD) and Data Mining as 13 documentation files, browsable from a terminal menu. It is aimed at university students preparing for KDD/Data Mining exams.

## 🚀 Installation

```bash
pip install skiearn-kdd
```
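To confirm which release was installed, the distribution name used in the install command can be looked up through the standard library. This is a minimal sketch, not part of skiearn: `installed_version` is a hypothetical helper, and `importlib.metadata` is only in the standard library from Python 3.8 onward (on 3.7 the `importlib_metadata` backport would be needed instead).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    Hypothetical helper for illustration; it is not provided by skiearn.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "skiearn-kdd" is the distribution name passed to pip, not the import name.
print(installed_version("skiearn-kdd"))
```

The helper returns `None` rather than raising, so it can be called safely before the package is installed.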

## 📖 Usage

Simply import and run:

```python
import skiearn

skiearn.print()
```

This launches an interactive menu for browsing the 13 documentation files, which together cover 74 KDD topics.

## 📚 What's Included

### Main Documentation (13 Files)

1. **Data Preparation & Formalism** - Variable types, IID, splits, data leakage, curse of dimensionality
2. **Statistics & Distributions** - Descriptive statistics, normality tests, CLT
3. **Hypothesis Testing** - Z-tests, t-tests, ANOVA, Chi-square, bootstrap, permutation tests
4. **Causality & Feature Selection** - Causality concepts, Simpson's paradox, feature selection methods
5. **Outliers & Robust Statistics** - Detection methods (Z-score, IQR, MAD, LOF, Isolation Forest)
6. **Supervised Learning** - All major ML algorithms with formal explanations
7. **Model Evaluation & Comparison** - Metrics, cross-validation, statistical model comparison
8. **Imbalanced & Missing Data** - SMOTE, sampling techniques, imputation methods
9. **Explainability & Visualization** - SHAP, LIME, all visualization techniques
10. **Dimensionality & Clustering** - PCA, t-SNE, K-Means, DBSCAN, cluster validation
11. **Advanced Topics** - Time series, association rules, information theory, probability theory
12. **Encoding & Validation** - All encoding techniques, validation strategies
13. **Exam Traps & Pitfalls** ⚠️ - Common mistakes and how to avoid them

### Additional Resources

- **README** - Study guide overview and organization
- **Study Guide** - Recommended week-by-week study path
- **Verification** - Complete section mapping (all 74 topics covered)

## 💡 Features

- ✅ **74 comprehensive topics** covering all KDD/Data Mining concepts
- ✅ **Executable Python code examples** for every concept
- ✅ **Statistical tests** with proper interpretation
- ✅ **Exam traps highlighted** - avoid common mistakes
- ✅ **Interactive menu** - easy navigation
- ✅ **No external dependencies** - pure Python

## 🎯 Perfect For

- University students taking KDD/Data Mining courses
- Exam preparation and quick reference
- Understanding statistical foundations of ML
- Learning proper data preprocessing
- Avoiding common data science pitfalls

## 📋 Example Session

```python
>>> import skiearn
>>> skiearn.print()

================================================================================
                    SKIEARN DOCUMENTATION VIEWER
               Knowledge Discovery & Data Mining Study Guide
================================================================================

📚 MAIN DOCUMENTATION:
  [ 1] Data Preparation & Formalism
      └─ Variable types, IID, splits, leakage, dimensionality, bias-variance

  [ 2] Statistics & Distributions
      └─ Mean/median/variance, skewness/kurtosis, normality tests

  ...

📖 Enter your choice: 3
```

## 📦 Package Contents

```
skiearn/
├── __init__.py          # Package initialization
├── viewer.py            # Interactive documentation viewer
└── docs/                # All documentation files
    ├── 01_data_preparation.txt
    ├── 02_statistics_distributions.txt
    └── ... (13 main files + 3 additional resources)
```
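Because the documentation ships as plain `.txt` files, it can in principle be listed without the viewer. The following is a rough sketch that assumes the installed layout matches the tree above (a `docs/` folder next to `__init__.py`); `list_doc_files` is a hypothetical helper, and the demo runs against a throwaway directory rather than the real package.

```python
import tempfile
from pathlib import Path


def list_doc_files(package_dir):
    """Return the sorted names of the .txt files under <package_dir>/docs.

    Hypothetical helper; assumes the docs/ layout shown in the tree above.
    """
    docs_dir = Path(package_dir) / "docs"
    if not docs_dir.is_dir():
        return []
    return sorted(p.name for p in docs_dir.glob("*.txt"))


# Demo: build a temporary directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    for name in ("02_statistics_distributions.txt", "01_data_preparation.txt"):
        (docs / name).write_text("placeholder", encoding="utf-8")
    demo_listing = list_doc_files(tmp)

print(demo_listing)
```

With the package installed, `list_doc_files(Path(skiearn.__file__).parent)` would be the equivalent call, again assuming the files are installed as shown.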

## 🔧 Requirements

- Python 3.7 or higher
- No external dependencies!

## 📝 License

MIT License - feel free to use for educational purposes.

## 🤝 Contributing

Contributions are welcome! If you find errors or want to add topics:

1. Fork the repository
2. Create a feature branch
3. Submit a pull request

## ⭐ Support

If this package helped you ace your KDD exam, consider:
- Giving it a star on GitHub
- Sharing with classmates
- Reporting issues or suggestions

## 📧 Contact

For questions or suggestions, please open an issue on GitHub.

---

**Good luck with your KDD exam!** 🎓

## Quick Start Example

```python
import skiearn

# Launch interactive viewer
skiearn.print()

# Navigate using:
# - Numbers 1-13: View specific documentation
# - R: README
# - S: Study Guide
# - A: View all files
# - Q: Quit
```
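The navigation keys listed above amount to a small dispatch table. The sketch below shows one way such input could be interpreted; it is an illustration only, not the code in `viewer.py`. The name `parse_choice`, the action labels, and the case-insensitive, whitespace-tolerant matching are all assumptions.

```python
def parse_choice(raw):
    """Map raw menu input to an (action, argument) pair.

    Hypothetical helper: the action names are invented for illustration.
    """
    choice = raw.strip().upper()
    letter_actions = {
        "R": "readme",
        "S": "study_guide",
        "A": "all",
        "Q": "quit",
    }
    if choice in letter_actions:
        return (letter_actions[choice], None)
    # Numbers 1-13 select one of the main documentation files.
    if choice.isdecimal() and 1 <= int(choice) <= 13:
        return ("doc", int(choice))
    return ("invalid", None)


for sample in ("3", "q", "14"):
    print(sample, "->", parse_choice(sample))
```

Returning an explicit `"invalid"` action keeps the menu loop free of exception handling for mistyped input.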

## Topics Covered

- Data preparation and preprocessing
- Statistical hypothesis testing
- Causality and feature selection
- Machine learning algorithms
- Model evaluation metrics
- Handling imbalanced and missing data
- Explainability (SHAP, LIME)
- Clustering and dimensionality reduction
- Time series analysis
- Information theory
- And much more!

**All with executable Python code and detailed explanations!**
