Metadata-Version: 2.4
Name: mimic_iv_analysis
Version: 1.13.1
Summary: A data science and machine learning framework for nursing research
Author-email: Artin Majdi <msm2024@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/artinmajdi/mimic_iv_analysis
Project-URL: Documentation, https://github.com/artinmajdi/mimic_iv_analysis/docs
Project-URL: Repository, https://github.com/artinmajdi/mimic_iv_analysis.git
Project-URL: Issues, https://github.com/artinmajdi/mimic_iv_analysis/issues
Keywords: nursing research,healthcare,AI,medical analysis
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: dask[distributed]
Requires-Dist: pyarrow
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: plotly
Requires-Dist: streamlit
Requires-Dist: umap-learn
Requires-Dist: jupyter
Requires-Dist: ipywidgets
Requires-Dist: notebook
Requires-Dist: dask-labextension
Requires-Dist: tqdm
Requires-Dist: python-dotenv
Requires-Dist: pyyaml
Requires-Dist: joblib
Requires-Dist: watchdog
Requires-Dist: dotenv
Requires-Dist: humanize
Requires-Dist: pytest
Requires-Dist: sphinx
Requires-Dist: sphinx-rtd-theme
Requires-Dist: pydantic>=2.11.5
Requires-Dist: rich>=14.0.0
Requires-Dist: psutil>=7.0.0
Requires-Dist: pip>=23.0.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: flake8>=3.9; extra == "dev"
Requires-Dist: mypy>=0.9; extra == "dev"
Requires-Dist: sphinx>=4.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=1.0; extra == "dev"
Requires-Dist: twine>=6.0; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=6.0; extra == "test"
Requires-Dist: pytest-cov>=2.0; extra == "test"
Dynamic: license-file

# MIMIC-IV Analysis Toolkit

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)
<img src="https://img.shields.io/github/last-commit/artinmajdi/mimic_iv_analysis?style=flat&logo=git&logoColor=white&color=0080ff" alt="last-commit">
<img src="https://img.shields.io/github/languages/top/artinmajdi/mimic_iv_analysis?style=flat&color=0080ff" alt="repo-top-language">

*Unlock Insights from Healthcare Data Effortlessly*

## Development Architecture and Implementation of the MIMIC-IV Analysis User Interface

The MIMIC-IV Analysis User Interface was developed as a comprehensive web-based dashboard using a modern, modular architecture that prioritizes scalability, maintainability, and user experience. The application is built on **Streamlit**, a Python-based framework specifically designed for data science applications, which enables rapid development of interactive web applications without requiring extensive frontend development expertise. This choice was particularly advantageous for nursing research applications, as it allows researchers to focus on data analysis rather than web development complexities.

The application follows a **component-based architecture** with clear separation of concerns, organized into distinct modules that handle different aspects of the analytical workflow. The main application class (MIMICDashboardApp) serves as the orchestrator, managing session state and coordinating between specialized components including data loading (SideBar), exploration and visualization (ExplorationAndViz), feature engineering (FeatureEngineeringTab), clustering analysis (ClusteringAnalysisTab), and post-analysis visualization (AnalysisVisualizationTab). Each component is encapsulated as a separate class with its own render() method, ensuring modularity and facilitating future enhancements or modifications to specific functionality areas.

**Performance optimization** was a critical consideration in the development process, particularly given the large-scale nature of MIMIC-IV datasets. The application integrates **Dask**, a parallel computing library, to handle distributed data processing and enable efficient analysis of datasets that exceed available memory. The Dask integration includes automatic client initialization with configurable worker processes, memory limits, and dashboard monitoring capabilities. Additionally, the application implements intelligent data caching strategies using Streamlit's built-in caching mechanisms (@st.cache\_resource and @st.cache\_data) to minimize redundant computations and improve user experience during iterative analysis sessions.

The user interface design emphasizes **accessibility and usability** for nursing researchers with varying levels of technical expertise. The dashboard features an intuitive tab-based navigation system with four primary analytical workflows: data exploration and visualization, feature engineering, clustering analysis, and cluster interpretation. Each tab provides contextual guidance and progressive disclosure of advanced options, allowing users to begin with basic exploration and gradually access more sophisticated analytical tools. The interface includes comprehensive data filtering capabilities with predefined inclusion and exclusion criteria specific to nursing research contexts, such as age-based patient selection (18-75 years), specific admission types, and clinical outcome considerations. Real-time feedback mechanisms, including progress indicators and dataset statistics, help users understand the impact of their analytical choices and maintain confidence in the analysis process.

**Data integration and preprocessing** capabilities are seamlessly embedded within the interface, supporting both CSV and Parquet file formats with automatic format detection and conversion. The application includes sophisticated data validation and error handling mechanisms that provide clear feedback when data quality issues are encountered, ensuring that nursing researchers can confidently work with complex clinical datasets. The modular design also facilitates easy integration of new data sources or analytical methods, making the platform adaptable to evolving research needs in nursing informatics and clinical data analysis.


## Table of Contents

- [MIMIC-IV Analysis Toolkit](#mimic-iv-analysis-toolkit)
  - [Development Architecture and Implementation of the MIMIC-IV Analysis User Interface](#development-architecture-and-implementation-of-the-mimic-iv-analysis-user-interface)
  - [Table of Contents](#table-of-contents)
  - [About MIMIC-IV Data](#about-mimic-iv-data)
  - [Features](#features)
  - [Project Structure](#project-structure)
  - [Installation](#installation)
    - [Prerequisites](#prerequisites)
    - [Installation Steps](#installation-steps)
  - [Configuration](#configuration)
  - [Usage](#usage)
    - [Running the Streamlit Dashboard](#running-the-streamlit-dashboard)
    - [Install the package from TestPyPI (Example for version 0.5.8)](#install-the-package-from-testpypi-example-for-version-058)
  - [Core Modules Overview](#core-modules-overview)
  - [Development](#development)
    - [Code Style](#code-style)
    - [Running Tests](#running-tests)
  - [Documentation](#documentation)
  - [Streamlit Cloud Deployment](#streamlit-cloud-deployment)
  - [Contributing](#contributing)
  - [License](#license)
  - [Author](#author)

## About MIMIC-IV Data

This toolkit is designed to analyze data from the [MIMIC-IV (Medical Information Mart for Intensive Care IV)](https://mimic.mit.edu/docs/iv/) clinical database. MIMIC-IV is a large, freely-available database comprising de-identified health-related data associated with patients who stayed in critical care units at the Beth Israel Deaconess Medical Center.

For detailed information on the MIMIC-IV data structure used by this project, please refer to the documentation:
* [MIMIC-IV Data Structure Overview](documentations/mimic_iv_data_structure.md)
* [Detailed Table Structures](documentations/DATA_STRUCTURE.md)

## Features

*   **Comprehensive Data Loader:** Utilities for loading and preparing MIMIC-IV data, simplifying the process of loading and preprocessing MIMIC-IV datasets, addressing common data management challenges. Supports both CSV and Parquet formats, with options for Dask integration for large datasets.
*   **Interactive Visualization:** A Streamlit application for visualizing data, cluster results, and analysis. Utilizes Streamlit for real-time data exploration, enhancing user engagement and understanding of complex datasets.
*   **Feature Engineering Tools:** Tools for creating meaningful features from clinical temporal data, including order frequency matrices, temporal order sequences, and order timing features. Provides utilities for identifying and extracting relevant features, streamlining the data preparation process.
*   **Clustering Analysis Capabilities:** Implementations for K-Means, Hierarchical, DBSCAN clustering, and LDA Topic Modeling to identify patterns in clinical data.
*   **Predictive Modeling Support:** Designed to prepare data for various predictive tasks.
*   **Configuration Management:** Easy-to-use YAML configuration for managing data paths and application settings.
*   **MIMIC-IV Data Focus:** Specifically designed to work with the MIMIC-IV clinical database structure.
*   **Modular Architecture:** Facilitates easy updates and maintenance, promoting a seamless development experience.
*   **Exploratory Data Analysis**
*   **Patient Trajectory Visualization**
*   **Order Pattern Analysis**

## Project Structure

The repository is organized as follows:

```
mimic_iv_analysis/
├── mimic_iv_analysis/ # Main package source code
│   ├── __init__.py # Package initialization
│   ├── configurations/ # Configuration files (e.g., config.yaml)
│   ├── core/ # Core functionalities (data loading, clustering, feature engineering)
│   │   ├── __init__.py
│   │   ├── clustering.py
│   │   ├── data_loader.py
│   │   ├── feature_engineering.py
│   │   └── filtering.py
│   ├── examples/ # Example scripts and notebooks
│   └── visualization/ # Streamlit dashboard application and utilities
│       ├── __init__.py
│       ├── app.py
│       └── app_components/
├── documentations/ # Project documentation
├── scripts/ # Utility and helper scripts (install, run dashboard)
├── setup_config/ # Configuration for setup and testing (e.g., pytest.ini)
├── tests/ # Test suite for the project
├── .streamlit/ # Configuration for Streamlit Cloud deployment
├── README.md # This file
├── requirements.txt # Python package dependencies
└── setup.py # Package setup script
```

(Note: The `src/` directory mentioned in one of the older READMEs is now represented by the top-level `mimic_iv_analysis/` package directory for source code.)


## Installation

### Prerequisites

*   Python 3.12 or higher
*   pip or conda package manager

### Installation Steps

1.  **Clone the repository:**
    ```bash
    git clone https://github.com/artinmajdi/mimic_iv_analysis.git
    cd mimic_iv_analysis
    ```

2.  **Create a virtual environment (recommended):**
    ```bash
    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    ```

3.  **Install dependencies:**
    The `requirements.txt` file lists all necessary Python packages.
    ```bash
    pip install -r requirements.txt
    ```
    To install the package in editable mode along with development dependencies:
    ```bash
    pip install -e ".[dev]"
    ```
    Alternatively, you can use the provided installation script which offers environment choices (venv, conda, docker):
    ```bash
    bash scripts/install.sh
    ```

## Configuration

The main configuration for the application is located in `mimic_iv_analysis/configurations/config.yaml`.

You **must** update the `mimic_data_path` in this file to point to the root directory of your local MIMIC-IV dataset (version 3.1 or compatible).

Example `config.yaml` structure:
```yaml
data:
  mimic_data_path: "/path/to/your/mimic-iv-data" # <-- IMPORTANT: Update this path

app:
  port: 8501
  theme: "light"
  debug: false

# ... other configurations
```

## Usage

### Running the Streamlit Dashboard

1.  Ensure your virtual environment is activated (if you created one).
2.  Make sure you have configured the `mimic_data_path` in `config.yaml`.
3.  Run the application using:
    ```bash
    streamlit run mimic_iv_analysis/visualization/app.py
    ```
    Alternatively, if the package was installed using pip (e.g., via `pip install -e .` or from PyPI), you might be able to use a command like:
    ```bash
    mimic-iv
    ```
The dashboard should open in your web browser, typically at `http://localhost:8501` (or the port specified in `config.yaml`).

### Install the package from TestPyPI (Example for version 0.5.8)

If a version is available on TestPyPI, you can install it using:
```bash
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ mimic_iv_analysis==0.5.8
```
(Replace `0.5.8` with the desired version if applicable.)


## Core Modules Overview

*   **`mimic_iv_analysis.core`**: Contains the fundamental logic for data handling and analysis.
    *   `data_loader.py`: Utilities for loading MIMIC-IV tables efficiently, supporting both CSV and Parquet formats, with options for Dask integration for large datasets.
    *   `feature_engineering.py`: Tools to create meaningful features from raw clinical data, such as order frequencies and temporal sequences.
    *   `clustering.py`: Implements various clustering algorithms (K-Means, Hierarchical, DBSCAN) and LDA topic modeling.
    *   `filtering.py`: Enables applying inclusion and exclusion criteria to the dataset.
*   **`mimic_iv_analysis.visualization`**: Houses the Streamlit application.
    *   `app.py`: The main entry point for the interactive dashboard.
    *   `app_components/`: Contains different tabs and UI elements of the dashboard.
*   **`mimic_iv_analysis.configurations`**: Manages application settings.

## Development

### Code Style

This project uses the following tools to maintain code quality:

*   **Black:** For code formatting.
*   **isort:** For import sorting.
*   **Flake8:** For style guide enforcement (PEP 8).
*   **MyPy:** For static type checking.

To format your code:
```bash
black .
isort .
```

To check your code:
```bash
flake8 .
mypy .
```

### Running Tests

Tests are located in the `tests/` directory. To run the test suite:
```bash
pytest tests/
```

To run tests with coverage:
```bash
pytest --cov=mimic_iv_analysis tests/
```
Test configuration can be found in `setup_config/pytest.ini` (or `pytest.ini` / `pyproject.toml` depending on project setup).

## Documentation

Further documentation can be found in the `documentations/` directory:

*   [`DATA_STRUCTURE.md`](documentations/DATA_STRUCTURE.md): Describes the expected structure of the MIMIC-IV data.
*   [`mimic_iv_data_structure.md`](documentations/mimic_iv_data_structure.md): Provides an overview of MIMIC-IV tables and identifiers.
*   [`.streamlit/README.md`](.streamlit/README.md): Guide for deploying the Streamlit application to Streamlit Cloud.
*   The `documentations/pyhealth/` directory contains documentation for the PyHealth library, which might be a dependency or a related project.

## Streamlit Cloud Deployment

For deploying the dashboard to Streamlit Cloud, refer to the guide in [`.streamlit/README.md`](.streamlit/README.md). This includes steps for repository preparation, secret management, and dependency configuration.

## Contributing

Contributions are welcome! Please follow these general steps:

1.  Fork the repository.
2.  Create a new feature branch (`git checkout -b feature/your-feature-name`).
3.  Make your changes.
4.  Ensure all tests pass (`pytest tests/`).
5.  Format your code (`black .` and `isort .`).
6.  Submit a pull request with a clear description of your changes.

## License

This project is licensed under the MIT License. See the `LICENSE.md` file for details.

## Author

*   Artin Majdi ([msm2024@gmail.com](mailto:msm2024@gmail.com))
