Metadata-Version: 2.4
Name: lungscan
Version: 0.1.6
Summary: CNN and U-Net library for lung cancer classification and segmentation
Author-email: Hosam Hatim Osman <hosam.bosati@gmail.com>
License: Copyright 2026 Hosam Hatim Osman
        
        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: medical,imaging,lung,cancer,segmentation,classification
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: asttokens==3.0.1
Requires-Dist: pydot==4.0.1
Requires-Dist: absl-py==2.4.0
Requires-Dist: colorama==0.4.6
Requires-Dist: comm==0.2.3
Requires-Dist: contourpy==1.3.3
Requires-Dist: cycler==0.12.1
Requires-Dist: debugpy==1.8.20
Requires-Dist: decorator==5.2.1
Requires-Dist: executing==2.2.1
Requires-Dist: filelock==3.20.3
Requires-Dist: fonttools==4.61.1
Requires-Dist: fsspec==2026.2.0
Requires-Dist: glob2==0.7
Requires-Dist: h5py==3.15.1
Requires-Dist: ipykernel==7.2.0
Requires-Dist: ipython==9.9.0
Requires-Dist: ipython_pygments_lexers==1.1.1
Requires-Dist: jedi==0.19.2
Requires-Dist: Jinja2==3.1.6
Requires-Dist: joblib==1.5.3
Requires-Dist: jupyter_client==8.8.0
Requires-Dist: jupyter_core==5.9.1
Requires-Dist: keras==3.13.1
Requires-Dist: kiwisolver==1.4.9
Requires-Dist: markdown-it-py==4.0.0
Requires-Dist: MarkupSafe==3.0.3
Requires-Dist: matplotlib==3.10.8
Requires-Dist: matplotlib-inline==0.2.1
Requires-Dist: mdurl==0.1.2
Requires-Dist: ml_dtypes==0.5.4
Requires-Dist: mpmath==1.3.0
Requires-Dist: namex==0.1.0
Requires-Dist: nest-asyncio==1.6.0
Requires-Dist: networkx==3.6.1
Requires-Dist: numpy==2.2.6
Requires-Dist: opencv-python==4.12.0.88
Requires-Dist: optree==0.18.0
Requires-Dist: packaging==26.0
Requires-Dist: pandas==3.0.0
Requires-Dist: parso==0.8.6
Requires-Dist: pillow==12.1.0
Requires-Dist: platformdirs==4.5.1
Requires-Dist: prompt_toolkit==3.0.52
Requires-Dist: psutil==7.2.2
Requires-Dist: pure_eval==0.2.3
Requires-Dist: Pygments==2.19.2
Requires-Dist: pyparsing==3.3.2
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: pyzmq==27.1.0
Requires-Dist: rich==14.3.2
Requires-Dist: scikit-learn==1.8.0
Requires-Dist: scipy==1.17.0
Requires-Dist: setuptools==82.0.0
Requires-Dist: six==1.17.0
Requires-Dist: stack-data==0.6.3
Requires-Dist: sympy==1.14.0
Requires-Dist: threadpoolctl==3.6.0
Requires-Dist: torch==2.9.1
Requires-Dist: torchvision==0.24.1
Requires-Dist: tornado==6.5.4
Requires-Dist: tqdm==4.67.1
Requires-Dist: traitlets==5.14.3
Requires-Dist: typing_extensions==4.15.0
Requires-Dist: tzdata==2025.3
Requires-Dist: wcwidth==0.6.0
Dynamic: license-file

# LungScan - Advanced Lung Cancer Detection & Segmentation Library

![Python](https://img.shields.io/badge/python-3.13+-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green.svg)
![Status](https://img.shields.io/badge/status-production--ready-brightgreen.svg)
![windows](https://img.shields.io/badge/windows--11-ready-green.svg)

**LungScan** is a comprehensive medical AI library for automated lung cancer detection, classification, and segmentation from CT scans. Built with clinical accuracy and ease of use in mind, LungScan combines state-of-the-art deep learning models with medical imaging best practices to deliver reliable diagnostic support.

---

## 📋 Table of Contents

- [Features](#-features)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Dataset Preparation](#-dataset-preparation)
- [Classification Pipeline](#-classification-pipeline)
- [Segmentation Pipeline](#-segmentation-pipeline)
- [Inference & Prediction](#-inference--prediction)
- [Curriculum Learning](#-curriculum-learning)
- [Evaluation & Metrics](#-evaluation--metrics)
- [API Reference](#-api-reference)
- [Pre-trained Models](#-pre-trained-models)
- [License](#-license)

---

## ✨ Features

### Core Capabilities
- **Multi-Class Lung Cancer Classification**: Classify CT scans into 4 classes (three cancer types plus normal) with confidence scoring
  - Adenocarcinoma
  - Squamous Cell Carcinoma
  - Large Cell Carcinoma
  - Normal (Healthy)

- **Precise Lung Segmentation**: Attention U-Net architecture for accurate lung region extraction

- **Metalung Augmentation**: Advanced medical-specific data augmentation for improved generalization

- **Curriculum Learning**: Progressive training on lesion sizes (Small → Medium → Large → XLarge)

- **Medical Priority Balancing**: Handles class imbalance with clinical-aware weighting

- **GPU Acceleration**: Auto-detects Intel GPU for optimized training

- **Comprehensive Metrics**: Precision, Recall, F1-Score, IoU, Dice Coefficient, ROC-AUC

- **Visual Diagnostics**: Built-in visualization tools for training monitoring and result interpretation

---

## 📦 Installation

### Prerequisites
- Python 3.13 or higher

### Install LungScan

```bash
# Install from PyPI
pip install lungscan

```
---

## 🚀 Quick Start

### 1. Prepare Your Dataset

```python
from lungscan import convert_pkl2images_metalung, LungDatasetSplitter

# Convert training data with augmentation
convert_pkl2images_metalung(
    pickle_path='dataset/source/lung_cancer_train.pkl',
    output_base_dir='dataset/image_data/train',
    num_augments=2
)

# Convert test data
convert_pkl2images_metalung(
    pickle_path='dataset/source/lung_cancer_test.pkl',
    output_base_dir='dataset/image_data/test',
    num_augments=0  # No augmentation for test set
)

# Initialize splitter with pixel range definitions
splitter = LungDatasetSplitter(
    source_dir='dataset/image_data',
    pixel_ranges={
        'xlarge': (150, 301),   # 150-300 pixels
        'large': (50, 151),     # 50-150 pixels
        'medium': (20, 51),     # 20-50 pixels
        'small': (9, 21)        # 9-20 pixels
    }
)

# Analyze lesion distribution
splitter.analyze()

# Create curriculum dataset
splitter.split(output_dir='dataset/image_split')

```

### 2. Train Classification Model

```python
from lungscan import LungClassificationPipeline

# Initialize and train
pipeline = LungClassificationPipeline()
pipeline.load_data('dataset/lung_classes')
pipeline.train(epochs=10, load_pretrained=True)

# Evaluate
metrics = pipeline.evaluate(num_samples=20)
print(metrics)
```

### 3. Make Classification Prediction

```python
# Classification prediction
result = pipeline.predict(
    'path/to/ct_scan.png',
    visualize=True
)
print(f"Diagnosis: {result['class']} ({result['confidence']:.1%})")
```

### 4. Train Segmentation Model

```python
from lungscan import LungSegmentationPipeline

# Initialize segmentation pipeline
pipeline = LungSegmentationPipeline(
    img_size=(256, 256, 1),
    model_type='att_unet'
)

pipeline.load_data('dataset/image_split')
pipeline.train(epochs_per_stage=10)
```

### 5. Make Segmentation Predictions

```python
# Segmentation prediction (pipeline is the LungSegmentationPipeline from step 4)
result = pipeline.predict(
    'path/to/ct_scan.png',
    visualize=True
)
```

---

## 📊 Dataset Preparation

### Input Format
LungScan expects data in `.pkl` format containing:
- CT scan images
- Corresponding masks (for segmentation)
- Class labels (for classification)

The dataset can be downloaded from [Mendeley Data](https://data.mendeley.com/datasets/5rr22hgzwr/1).
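
Before running the conversion, it can help to check what a pickle file actually contains. The helper below is a minimal sketch that uses only the standard library. `describe_pickle` is a hypothetical name, not part of LungScan, and it makes no assumption about the dataset layout: it only reports what it finds at the top level.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Return a short summary of the top-level object stored in a pickle file."""
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)  # only unpickle files from sources you trust

    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        # Report each key together with the type of its value.
        summary["keys"] = {str(k): type(v).__name__ for k, v in obj.items()}
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
        if obj:
            summary["first_item_type"] = type(obj[0]).__name__
    return summary
```

For example, `describe_pickle('dataset/source/lung_cancer_train.pkl')` would show whether the file holds a dict, a list, or some other object before you pass it to `convert_pkl2images_metalung`.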

### Directory Structure
```
dataset/
├── source/
│   ├── lung_cancer_train.pkl
│   └── lung_cancer_test.pkl
├── image_data
│   ├── train
│   │   ├── images
│   │   └── masks
│   └── test
│       ├── images
│       └── masks
├── image_split
│   ├── train
│   │   ├── xlarge
│   │   │    ├── images
│   │   │    └── masks
│   │   ├── large
│   │   │    ├── images
│   │   │    └── masks
│   │   ├── medium
│   │   │    ├── images
│   │   │    └── masks
│   │   └── small
│   │        ├── images
│   │        └── masks
│   └── test
│       ├── xlarge
│       │    └── ...
│       ├── large
│       │    └── ...
│       ├── medium
│       │    └── ...
│       └── small
│            └── ...
└── lung_classes
    ├── train
    │   ├── adenocarcinoma
    │   ├── squamous_cell_carcinoma
    │   ├── large_cell_carcinoma
    │   └── normal
    └── test
        ├── adenocarcinoma
        ├── squamous_cell_carcinoma
        ├── large_cell_carcinoma
        └── normal
```

### Metalung Augmentation
The `convert_pkl2images_metalung` function applies medical-specific augmentations:
- Random rotation and flipping
- Intensity adjustments (simulating different scanner settings)
- Random cancer relocation (generating multiple cancer variations of the same sample)
- Noise injection (simulating acquisition artifacts)
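
To make the idea concrete, here is a minimal NumPy sketch of paired image/mask augmentation. It is not the implementation behind `convert_pkl2images_metalung`: the function name `augment_pair`, the gain range, and the noise level are assumptions chosen for illustration, and cancer relocation is left out. The point it shows is that geometric transforms must be applied identically to the image and its mask, while intensity changes touch the image only.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply one random rotation/flip to image and mask, then perturb only the image."""
    # Geometric transforms: same operation on both arrays so they stay aligned.
    k = int(rng.integers(0, 4))  # number of 90-degree rotations
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)

    # Intensity transforms: image only, so the mask stays binary.
    gain = rng.uniform(0.9, 1.1)                     # assumed intensity range
    noise = rng.normal(0.0, 0.02, size=image.shape)  # assumed noise level
    image = np.clip(image * gain + noise, 0.0, 1.0).astype(np.float32)
    return image, mask.copy()


rng = np.random.default_rng(0)
image = rng.random((64, 64)).astype(np.float32)  # stand-in for a normalized CT slice
mask = np.zeros((64, 64), dtype=np.uint8)
mask[20:30, 25:40] = 1                           # stand-in lesion mask
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Rotations and flips only move pixels around, so the number of foreground pixels in the mask is unchanged after augmentation.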

---

## 🔍 Classification Pipeline

### LungClassificationPipeline

A complete end-to-end pipeline for lung cancer classification.

#### Key Methods

```python
from lungscan import LungClassificationPipeline

pipeline = LungClassificationPipeline(
    img_size=(224, 224, 3),  # Input image dimensions
    verbose=True              # Enable detailed logging
)

# Load balanced dataset
pipeline.load_data('dataset/lung_classes')

# Visualize samples
pipeline.view_sample(data_type='train', is_notebook=True)

# Train model
pipeline.train(
    epochs=10,
    load_pretrained=True,      # Use pre-trained weights
    learning_rate=1e-4
)

# Calculate metrics
metrics = pipeline.calcuate_metrics(data_type='test')

# Predict on new image
result = pipeline.predict(
    'path/to/image.png',
    visualize=True,
    is_notebook=True
)
# Returns: {'class': 'adenocarcinoma', 'confidence': 0.94, 'probabilities': {...}}
```
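
The returned dictionary can be pictured as a softmax over the model outputs, with the top class reported as the diagnosis. The sketch below is only an illustration of that shape: `to_result` is a hypothetical helper, the class order in `CLASS_NAMES` is an assumption, and LungScan may compute its probabilities differently.

```python
import math

# Assumed class order; the real order depends on how the dataset was loaded.
CLASS_NAMES = [
    "adenocarcinoma",
    "large_cell_carcinoma",
    "normal",
    "squamous_cell_carcinoma",
]


def to_result(logits, class_names=CLASS_NAMES):
    """Turn raw model scores into a {'class', 'confidence', 'probabilities'} dict."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probabilities = {name: e / total for name, e in zip(class_names, exps)}
    best = max(probabilities, key=probabilities.get)
    return {
        "class": best,
        "confidence": probabilities[best],
        "probabilities": probabilities,
    }


result = to_result([2.0, 0.5, 0.1, -1.0])
print(f"{result['class']} ({result['confidence']:.1%})")
```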

#### Training Features
- **Transfer Learning**: Leverages pre-trained CNN architectures (EfficientNetB0)
- **Class Balancing**: Automatic handling of imbalanced datasets
- **Early Stopping**: Prevents overfitting with patience monitoring
- **Checkpoint Saving**: Saves best model weights automatically
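
The README does not spell out how class balancing is computed. One common approach, shown here as a hedged sketch rather than LungScan's actual method, is inverse-frequency weighting, where each class gets the weight `n_samples / (n_classes * class_count)`. The function name `inverse_frequency_weights` and the label counts are made up for the example.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up, imbalanced label list: rare classes receive larger weights.
labels = ["normal"] * 6 + ["adenocarcinoma"] * 3 + ["large_cell_carcinoma"] * 1
weights = inverse_frequency_weights(labels)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Weights like these are typically passed to the loss function so that mistakes on under-represented classes cost more during training.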

---

## ✂️ Segmentation Pipeline

### LungSegmentationPipeline

Advanced lung segmentation using Attention U-Net architecture.

#### Key Methods

```python
from lungscan import LungSegmentationPipeline

pipeline = LungSegmentationPipeline(
    img_size=(256, 256, 1),           # Grayscale input
    model_type='att_unet',            # Attention U-Net
    pretrained_path=None,             # Path to pre-trained weights
    verbose=True
)

# Load dataset
pipeline.load_data('dataset/image_split')

# Visualize samples
pipeline.view_sample(sample_type='train', is_notebook=True)

# Train with curriculum learning
pipeline.train(epochs_per_stage=10)

# Predict segmentation
result = pipeline.predict(
    'path/to/ct_scan.png',
    output_path='results/mask.png',
    is_notebook=True
)
# Returns: {'image': array, 'mask': array, 'overlay': array}

# Evaluate performance
pipeline.evaluate(num_samples=10, is_notebook=True)
pipeline.calcuate_metrics(data_type='test')
```
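
The `overlay` entry in the result is an image with the predicted mask drawn on top of the scan. As a rough illustration of how such an overlay can be built (not necessarily how LungScan builds it), the sketch below alpha-blends a colour into the masked pixels of a grayscale image. `overlay_mask`, the red tint, and the `alpha` value are assumptions.

```python
import numpy as np


def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.4):
    """Blend a colour into the masked pixels of a grayscale uint8 image."""
    # Replicate the single channel three times to get an RGB canvas.
    rgb = np.stack([image] * 3, axis=-1).astype(np.float32)
    tint = np.array(color, dtype=np.float32)
    selected = mask.astype(bool)
    # Only masked pixels are blended; all others keep their gray value.
    rgb[selected] = (1.0 - alpha) * rgb[selected] + alpha * tint
    return rgb.round().astype(np.uint8)


demo_image = np.full((4, 4), 100, dtype=np.uint8)  # stand-in grayscale slice
demo_mask = np.zeros((4, 4), dtype=np.uint8)
demo_mask[1, 1] = 1                                # single masked pixel
demo_overlay = overlay_mask(demo_image, demo_mask)
```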

#### Model Architecture
- **Attention Gates**: Focus on relevant regions, suppress noise
- **Skip Connections**: Preserve spatial information
- **Multi-scale Feature Extraction**: Captures details at different resolutions
- **Dice Loss**: Optimized for medical segmentation tasks
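
For readers unfamiliar with Dice loss, here is a small NumPy sketch of the usual "soft" formulation. It illustrates the general technique only: the name `soft_dice_loss` and the smoothing constant are assumptions, and the loss used inside LungScan may differ in its details.

```python
import numpy as np


def soft_dice_loss(y_true, y_pred, smooth=1.0):
    """Return 1 - Dice, computed on probabilities rather than thresholded masks."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    # The smoothing term avoids division by zero when both masks are empty.
    dice = (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
    return float(1.0 - dice)


target = np.array([[1, 1], [0, 0]])
perfect = soft_dice_loss(target, target)                      # prediction equals target
wrong = soft_dice_loss(target, np.array([[0, 0], [1, 1]]))    # no overlap at all
```

Because the loss depends on overlap rather than on per-pixel accuracy, it is less dominated by the large background region than plain cross-entropy, which is why it is popular for small structures.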

---

## 🎓 Curriculum Learning

### LungDatasetSplitter

Progressive training strategy based on lesion size for improved convergence.

```python
from lungscan import LungDatasetSplitter

# Initialize splitter with pixel range definitions
splitter = LungDatasetSplitter(
    source_dir='dataset/image_data',
    pixel_ranges={
        'xlarge': (150, 301),   # 150-300 pixels
        'large': (50, 151),     # 50-150 pixels
        'medium': (20, 51),     # 20-50 pixels
        'small': (9, 21)        # 9-20 pixels
    }
)

# Analyze lesion distribution
splitter.analyze()

# Create curriculum dataset
splitter.split(output_dir='dataset/image_split')
```

#### Benefits
- **Faster Convergence**: Start with the easier, larger (`xlarge`) lesions
- **Better Generalization**: Gradually learn complex patterns
- **Reduced Overfitting**: Progressive complexity prevents memorization
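
The bucketing step can be sketched in a few lines. This is an illustration under stated assumptions, not the code inside `LungDatasetSplitter`: it treats each range as a half-open interval `[low, high)`, takes the lesion size as a single integer (the README does not say whether "pixels" means area or extent), and resolves the shared boundary values (20, 50, 150) by taking the first matching bucket. `size_bucket` is a hypothetical name.

```python
PIXEL_RANGES = {
    "xlarge": (150, 301),
    "large": (50, 151),
    "medium": (20, 51),
    "small": (9, 21),
}


def size_bucket(lesion_pixels, pixel_ranges=PIXEL_RANGES):
    """Return the first bucket whose half-open range [low, high) contains the value."""
    for name, (low, high) in pixel_ranges.items():
        if low <= lesion_pixels < high:
            return name
    return None  # outside every range, e.g. an empty mask


for size in (8, 9, 20, 50, 150, 300, 301):
    print(size, size_bucket(size))
```

With the dictionary order above, a boundary value such as 150 lands in `xlarge` because that bucket is checked first.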

---

## 🔮 Inference & Prediction

### Standalone Prediction Tool

For quick predictions without training:

```python
from lungscan import LungSegmentationPipeline, LungClassificationPipeline

# Initialize models
seg_model = LungSegmentationPipeline(
    img_size=(256, 256, 1),
    model_type='att_unet',
    pretrained_path='checkpoints/best_medium.keras'
)

classi_model = LungClassificationPipeline(
    img_size=(224, 224, 3)
)

# Predict
classification = classi_model.predict('path/to/image.png', visualize=False)
segmentation = seg_model.predict('path/to/image.png', visualize=False)

print(f"Diagnosis: {classification['class']}")
print(f"Confidence: {classification['confidence']:.1%}")
```

### GUI-Based Prediction

Interactive file selection for batch prediction:

```python
from lungscan import select_files

# Open file dialog
image_paths = select_files()

# Process each image
for path in image_paths:
    # Your prediction logic here
    pass
```

---

## 📈 Evaluation & Metrics

### Classification Metrics
- **Accuracy**: Overall correctness
- **Precision**: Fraction of predicted positives that are actually positive
- **Recall**: Fraction of actual positives that are correctly identified
- **F1-Score**: Harmonic mean of precision and recall
- **Sensitivity**: Same as recall (true positive rate)
- **Specificity**: Fraction of actual negatives that are correctly identified (true negative rate)
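
These definitions reduce to simple ratios of confusion-matrix counts. The sketch below computes them for one class treated one-vs-rest; `binary_metrics` is a hypothetical helper and the counts are made up, so it shows the formulas rather than LungScan's own metric code.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute one-vs-rest metrics for a single class from confusion counts."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to sensitivity
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```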

### Segmentation Metrics
- **IoU (Intersection over Union)**: Area of overlap between the predicted and ground-truth masks divided by the area of their union
- **Dice Coefficient**: Similarity measure, equal to 2 * IoU / (IoU + 1) for a single mask pair
- **Pixel Accuracy**: Percentage of correctly classified pixels
- **Precision**: Fraction of predicted foreground pixels that are foreground in the ground truth
- **Recall**: Fraction of ground-truth foreground pixels that are predicted as foreground
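
The relation between IoU and Dice is easy to check on a toy example. The following NumPy sketch computes both from binary masks. `iou_and_dice` is a hypothetical helper, and returning 1.0 for two empty masks is a convention chosen here, not something the README specifies.

```python
import numpy as np


def iou_and_dice(pred_mask, true_mask):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    total = pred.sum() + true.sum()
    # Two empty masks agree perfectly, so both scores are set to 1.0.
    iou = intersection / union if union else 1.0
    dice = 2 * intersection / total if total else 1.0
    return float(iou), float(dice)


pred = np.zeros((4, 4), dtype=np.uint8)
true = np.zeros((4, 4), dtype=np.uint8)
pred[0:2, 0:2] = 1  # 4 predicted pixels
true[0:2, 1:3] = 1  # 4 ground-truth pixels, 2 of them shared
iou, dice = iou_and_dice(pred, true)
```

The identity Dice = 2 * IoU / (IoU + 1) holds for each mask pair; averages taken over a whole test set need not satisfy it exactly.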


### Visualization Tools

```python
from lungscan import disp_image, add_text_to_image

# Display image in notebook or save to file
disp_image(image_array, isNotebook=True, save_path='output.png')

# Add diagnostic text to image
annotated = add_text_to_image(
    image_array,
    text="Diagnosis: Adenocarcinoma",
    position=(5, 5),
    font_size=12,
    color=(255, 0, 0)  # Red text
)
```

---

## 📚 API Reference

### Core Functions

#### `convert_pkl2images_metalung(pickle_path, output_base_dir, num_augments)`
Converts pickle dataset to images with metalung augmentation.

**Parameters:**
- `pickle_path` (str): Path to .pkl file
- `output_base_dir` (str): Output directory for images/masks
- `num_augments` (int): Number of augmented copies per image

**Returns:** None

---

#### `LungClassificationPipeline(img_size, verbose)`
End-to-end classification pipeline.

**Methods:**
- `load_data(base_dir)`: Load dataset from directory
- `view_sample(data_type, is_notebook)`: Visualize sample images
- `train(epochs, load_pretrained, learning_rate)`: Train model
- `predict(image_path, visualize, is_notebook)`: Predict on single image
- `evaluate(num_samples, output_dir, is_notebook)`: Comprehensive evaluation
- `calcuate_metrics(data_type)`: Calculate performance metrics

---

#### `LungSegmentationPipeline(img_size, model_type, pretrained_path, verbose)`
Lung segmentation pipeline with Attention U-Net.

**Methods:**
- `load_data(data_dir)`: Load segmentation dataset
- `view_sample(sample_type, is_notebook)`: Visualize samples
- `train(epochs_per_stage, num_of_sweeps, load_checkpoint)`: Train with curriculum learning
- `predict(image_path, output_path, is_notebook)`: Predict segmentation mask
- `evaluate(num_samples, output_dir, is_notebook)`: Evaluate on test set
- `calcuate_metrics(data_type)`: Calculate segmentation metrics

---

#### `LungDatasetSplitter(source_dir, pixel_ranges)`
Split dataset by lesion size for curriculum learning.

**Methods:**
- `analyze()`: Display lesion size distribution
- `split(output_dir)`: Create curriculum dataset splits

---

#### `select_files()`
Open file dialog for image selection.

**Returns:** List of selected file paths

---

#### `disp_image(image, isNotebook, save_path)`
Display or save image.

**Parameters:**
- `image`: PIL Image or numpy array
- `isNotebook`: Display in Jupyter notebook
- `save_path`: Save path (optional)

---

#### `add_text_to_image(image, text, position, font_size, color)`
Add text annotation to image.

**Parameters:**
- `image`: Input image
- `text`: Text to add
- `position`: (x, y) coordinates
- `font_size`: Font size
- `color`: RGB tuple

**Returns:** Annotated image array

---

#### `fetch_dirs(base_dir, is_semantic)`
Fetch image and mask file lists from directory structure.

**Parameters:**
- `base_dir`: Base directory
- `is_semantic`: Flag for semantic segmentation

**Returns:** Tuple of (image_dict, mask_dict)

---

## 🏆 Pre-trained Models

### Available Checkpoints

| Model | Task | Path | Performance |
|-------|------|------|-------------|
| `best_medium.keras` | Segmentation | `checkpoints/2nd advance/best_medium.keras` | IoU: 0.2726, Dice: 0.4285 |
| `lung_classification.keras` | Classification | `models/lung_classification.keras` | Accuracy: 80.21%, F1: 0.8 |

### Loading Pre-trained Weights

```python
from lungscan import LungClassificationPipeline, LungSegmentationPipeline

# Segmentation
seg_pipeline = LungSegmentationPipeline(
    pretrained_path='checkpoints/2nd advance/best_medium.keras'
)

# Classification
class_pipeline = LungClassificationPipeline()
class_pipeline.train(load_pretrained=True)
```

---

## 📝 Example Workflows

### Complete Training Pipeline

```python
# Step 1: Prepare dataset
from lungscan import convert_pkl2images_metalung
convert_pkl2images_metalung('dataset/source/lung_cancer_train.pkl', 'dataset/image_data/train', 2)

# Step 2: Split for curriculum learning
from lungscan import LungDatasetSplitter
splitter = LungDatasetSplitter('dataset/image_data')
splitter.split('dataset/image_split')

# Step 3: Train segmentation
from lungscan import LungSegmentationPipeline
seg = LungSegmentationPipeline(img_size=(256, 256, 1))
seg.load_data('dataset/image_split')
seg.train(epochs_per_stage=10)

# Step 4: Train classification
from lungscan import LungClassificationPipeline
clf = LungClassificationPipeline()
clf.load_data('dataset/lung_classes')
clf.train(epochs=20, load_pretrained=True)
```

### Batch Evaluation

```python
from lungscan import LungClassificationPipeline, LungSegmentationPipeline, fetch_dirs

# Get test images
img_flst, _ = fetch_dirs('dataset/lung_classes', is_semantic=False)

# Initialize model
seg_line = LungSegmentationPipeline(
    pretrained_path='checkpoints/2nd advance/best_medium.keras',
)

class_line = LungClassificationPipeline()

# Evaluate all test samples
results = []
for class_name in img_flst['test'].keys():
    for img_path in img_flst['test'][class_name]:
        pred_mask = seg_line.predict(img_path, visualize=False)
        pred = class_line.predict(img_path, visualize=False)
        results.append({
            'path': img_path,
            'true_class': class_name,
            'input': pred_mask['image'],
            'mask': pred_mask['overlay'],
            'predicted': pred['class'],
            'confidence': pred['confidence']
        })
```
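
Once `results` is populated, a natural next step is to summarize it. The sketch below computes per-class accuracy from rows that carry the same `true_class` and `predicted` keys as the loop above. `per_class_accuracy` is a hypothetical helper, and the demo rows are made up so the example runs without a trained model.

```python
from collections import defaultdict


def per_class_accuracy(results):
    """Return {class_name: fraction of samples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in results:
        total[row["true_class"]] += 1
        if row["predicted"] == row["true_class"]:
            correct[row["true_class"]] += 1
    return {name: correct[name] / total[name] for name in total}


# Made-up rows with the same keys as the evaluation loop.
demo_results = [
    {"true_class": "normal", "predicted": "normal"},
    {"true_class": "normal", "predicted": "adenocarcinoma"},
    {"true_class": "adenocarcinoma", "predicted": "adenocarcinoma"},
    {"true_class": "adenocarcinoma", "predicted": "adenocarcinoma"},
]
summary = per_class_accuracy(demo_results)
print(summary)
```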

---

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- Medical imaging datasets and annotations
- PyTorch and Keras development teams
- Attention U-Net original authors
- Open-source medical AI community

---

## 📧 Contact

For questions, issues, or collaboration opportunities:

- **Email**: hosam.bosati@gmail.com

---

## 📊 Citation

If you use LungScan in your research, please cite:

```bibtex
@software{lungscan2026,
  author = {Hosam Hatim Osman},
  title = {LungScan: Advanced Lung Cancer Detection and Segmentation Library},
  year = {2026},
  url = {https://pypi.org/project/lungscan/}
}
```

---

**Built with ❤️ for medical AI advancement**
