Metadata-Version: 2.4
Name: modelscout-ai
Version: 0.1.0
Summary: Autonomous ML agent that finds the best model for any dataset automatically
Author-email: Iram Fatima <iramfatima749@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Iramfatima12/modelscout
Keywords: machine learning,automl,model selection,autonomous,deep learning
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# ModelScout 🤖

**Intelligent ML Model Recommendation System**

ModelScout is an automated machine learning tool that analyzes your dataset and recommends the best-fitting models. It uses Auto-sklearn to search over candidate models and their hyperparameters within a configurable time budget, then reports the best performer for your data.

## 🎯 Features

- **Automated Problem Detection**: Automatically detects classification, regression, or clustering tasks
- **Smart Model Selection**: Uses Auto-sklearn to find the best models for your data
- **Comprehensive Analysis**: Provides detailed dataset analysis and insights
- **Multiple Formats**: Generates reports in text, JSON, and table formats
- **REST API**: Flask-based REST API for easy integration
- **Multiple Task Types**: Classification, regression, time-series, and more

## 📋 Project Structure

```
ModelScout/
├── agent/                    # Core ML engine
│   ├── data_analyzer.py     # Dataset analysis module
│   ├── model_selector.py    # Model recommendation using Auto-sklearn
│   ├── reporter.py          # Report generation
│   ├── orchestrator.py      # Main pipeline orchestrator
│   └── __init__.py
├── api/                      # REST API
│   ├── main.py              # Flask API endpoints
│   └── __init__.py
├── data/                     # Sample datasets
├── models/                   # Trained models storage
├── outputs/                  # Generated reports
├── requirements.txt         # Python dependencies
├── demo.py                  # Demo script with examples
└── README.md
```

## 🚀 Quick Start

### 1. Installation

```bash
# Clone the repository (URL from the project metadata), then enter it
git clone https://github.com/Iramfatima12/modelscout ModelScout
cd ModelScout

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Basic Usage

```python
from agent.orchestrator import ModelScout

# Initialize
scout = ModelScout(auto_train_time=300)

# Run complete pipeline
result = scout.run_full_pipeline(
    data_path='your_data.csv',
    target='target_column',
    report_path='outputs/report.txt'
)

# Access results
print(result['recommendations']['best_model_name'])
print(result['recommendations']['test_score'])
print(result['report'])
```

### 3. Step-by-Step Usage

```python
from agent.orchestrator import ModelScout

scout = ModelScout()

# Load data
df = scout.load_data('data.csv')

# Analyze data
analysis = scout.analyze_data(df, target='label')
print(f"Problem Type: {analysis['target_analysis']['type']}")

# Get recommendations
recommendations = scout.recommend_models(df, 'label')
print(f"Best Model: {recommendations['best_model_name']}")
print(f"Test Score: {recommendations['test_score']}")

# Generate report
report = scout.generate_report(output_format='text', output_path='report.txt')
```

## 🔧 API Endpoints

### Health Check
```
GET /health
```

### Analyze Dataset
```
POST /api/analyze
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column"
}
```

### Get Recommendations
```
POST /api/recommend
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}
```

### Generate Report
```
POST /api/report
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "format": "text"
}
```

### Full Pipeline
```
POST /api/pipeline
Content-Type: application/json

{
    "file_path": "path/to/data.csv",
    "target": "target_column",
    "time_limit": 300
}
```
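
As a rough illustration of calling these endpoints from Python, the sketch below builds a POST request for `/api/pipeline` using only the standard library. The base URL `http://localhost:5000` (Flask's default development port) and the helper name `build_request` are assumptions; this README does not state where the server listens or what the responses contain.

```python
import json
import urllib.request

# Assumption: the API runs locally on Flask's default development port.
BASE_URL = "http://localhost:5000"


def build_request(endpoint, file_path, target, **extra):
    """Build (but do not send) a JSON POST request for a ModelScout endpoint."""
    payload = {"file_path": file_path, "target": target, **extra}
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "/api/pipeline", "path/to/data.csv", "target_column", time_limit=300
)

# To send it once the server is running (the response schema is not documented here):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The same helper should work for `/api/analyze`, `/api/recommend`, and `/api/report` by changing the endpoint and the extra fields (for example `format="text"`).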

## 🎮 Run Demo

```bash
python demo.py
```

The demo script:
1. Creates sample datasets (Iris, Breast Cancer, Regression)
2. Runs ModelScout on each dataset
3. Generates comparison reports
4. Demonstrates both classification and regression

## 📊 What ModelScout Analyzes

### Data Characteristics
- Dataset size and shape
- Missing values and data quality
- Feature types and counts
- Memory usage

### Target Variable
- Problem type (Classification/Regression)
- Class distribution (for classification)
- Value range (for regression)
- Class imbalance ratio
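
To make the class distribution and imbalance ratio concrete, here is a minimal sketch. It assumes the ratio is the largest class count divided by the smallest, which matches the `1.00:1` shown for a balanced dataset in the example output; the README does not define the ratio, and `class_balance` is a hypothetical helper rather than part of ModelScout.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the majority-to-minority ratio."""
    counts = Counter(labels)
    # Assumption: imbalance ratio = largest class count / smallest class count.
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), ratio


# Balanced three-class target, mirroring the Iris example (50 rows per class).
labels = [0] * 50 + [1] * 50 + [2] * 50
counts, ratio = class_balance(labels)
print(counts)            # {0: 50, 1: 50, 2: 50}
print(f"{ratio:.2f}:1")  # 1.00:1
```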

### Feature Statistics
- Numeric: mean, std, min, max, missing count
- Categorical: unique values, missing count
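
The per-feature statistics listed above can be sketched with the standard library alone. This is an illustration, not ModelScout's own code (which presumably relies on pandas); the function names and the choice of sample standard deviation are assumptions.

```python
import math
import statistics


def summarize_numeric(values):
    """Summarize a numeric column, treating None and NaN as missing."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return {
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
        "missing": len(values) - len(present),
    }


def summarize_categorical(values):
    """Summarize a categorical column: distinct values and missing count."""
    present = [v for v in values if v is not None]
    return {"unique": len(set(present)), "missing": len(values) - len(present)}


print(summarize_numeric([5.1, 4.9, None, 4.7, 5.0]))
print(summarize_categorical(["red", "blue", None, "red"]))
```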

## 🤖 How It Works

1. **Data Loading**: Supports CSV, Excel, and JSON formats
2. **Analysis**: Comprehensive dataset profiling
3. **Problem Detection**: Auto-detects ML task type
4. **Model Search**: Auto-sklearn searches optimal models
5. **Evaluation**: Train/test split and performance metrics
6. **Reporting**: Generates detailed recommendations
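
Step 3 is not spelled out in this README, so the following is only one plausible heuristic for detecting the task type from the target column. The function name `detect_problem_type` and the `max_classes=20` threshold are assumptions for illustration, not ModelScout's actual rule.

```python
def detect_problem_type(target_values, max_classes=20):
    """Guess the task type from the target column (illustrative heuristic only)."""
    values = [v for v in target_values if v is not None]
    unique = set(values)
    # Non-numeric targets (strings, booleans) are treated as class labels.
    if any(isinstance(v, (str, bool)) for v in unique):
        return "classification"
    # Integer-valued targets with few distinct values look like class labels.
    all_integral = all(float(v).is_integer() for v in unique)
    if all_integral and len(unique) <= max_classes:
        return "classification"
    # Everything else is treated as a continuous target.
    return "regression"


print(detect_problem_type([0, 1, 2, 1, 0]))    # classification
print(detect_problem_type([3.2, 7.85, 1.04]))  # regression
print(detect_problem_type(["spam", "ham"]))    # classification
```

A threshold-based rule like this can misclassify count-valued regression targets with few distinct values, so a real pipeline would likely let the user override the detected type.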

## 📦 Dependencies

- **pandas**: Data manipulation
- **scikit-learn**: ML algorithms
- **auto-sklearn**: Automated ML model selection
- **numpy**: Numerical computing
- **matplotlib/seaborn**: Visualization
- **flask**: REST API
- **xgboost, lightgbm, catboost**: Advanced models
- **imbalanced-learn**: Class imbalance handling

## 🔍 Example Output

```
======================================================================
  ___  ___           _      _    ____  ___  _   _ ___
 |  \/  |          | |    | |  / ___ \/ _ \| | | |_  |
 | .  . | ___    __| | ___| | / /   \/ /_\ \ | | | / /
 | |\/| |/ _ \  / _` |/ _ \ | \ \   |  _  | | | |/ /
 | |  | | (_) || (_| |  __/ |  \ \__| | | | |_| / /
 |_|  |_|\___/  \__,_|\___|_|   \___/_| |_|\___/___/

======================================================================

DATA OVERVIEW
======================================================================
Dataset Shape: (150, 5) (rows, columns)
Memory Usage: 0.00 MB
Missing Values: 0 (0.00%)
Numeric Features: 4
Categorical Features: 0

TARGET VARIABLE ANALYSIS
----------------------------------------------------------------------
Problem Type: CLASSIFICATION
Unique Values: 3
Missing Values: 0
Class Imbalance Ratio: 1.00:1
Class Distribution:
  0: 50 (33.3%)
  1: 50 (33.3%)
  2: 50 (33.3%)

MODEL RECOMMENDATIONS
======================================================================
Best Model: RandomForestClassifier
Problem Type: CLASSIFICATION
Train Score: 1.0000
Test Score: 0.9333
Data Shape Used: (150, 4)
Number of Classes: 3

======================================================================
```

## 🛠️ Configuration

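Customize the search by passing parameters to the `ModelScout` constructor. For example, `auto_train_time` sets the model search time budget in seconds: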

```python
scout = ModelScout(
    auto_train_time=600  # Increase for more thorough search (seconds)
)
```

## 📝 License

ModelScout is released under the MIT License, as declared in the package metadata. The project is intended for educational and portfolio purposes.

## 🤝 Contributing

Feel free to extend ModelScout with:
- Additional models
- More data preprocessing options
- Visualization enhancements
- Performance optimizations

## 📞 Support

For usage examples, see the `demo.py` script. For issues or questions, visit the project homepage at https://github.com/Iramfatima12/modelscout.

---

**Happy Model Scouting! 🎯**
