Metadata-Version: 2.4
Name: tinyehr
Version: 0.1.0
Summary: A tiny EHR dataset for learning, prototyping, and building — 100 patients in MIMIC and OMOP formats.
Author: Vidul Ayakulangara Panickan
License-Expression: MIT
Project-URL: Homepage, https://tinyehr.org
Project-URL: Repository, https://github.com/vidulpanickan/TinyEHR
Project-URL: Dataset, https://huggingface.co/datasets/vidulpanickan/TinyEHR
Keywords: ehr,mimic,omop,clinical,healthcare,dataset
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.3.0
Requires-Dist: pyarrow>=10.0.0
Provides-Extra: minimal
Requires-Dist: pandas>=1.3.0; extra == "minimal"
Dynamic: license-file

# TinyEHR: A Tiny Electronic Health Records Dataset for Learning, Prototyping, and Building

TinyEHR is a small, open, reproducible clinical dataset of 100 patients, available in two formats: MIMIC and OMOP. It is derived from the [MIMIC-IV Clinical Database Demo v2.2](https://physionet.org/content/mimic-iv-demo/2.2/), the publicly available subset of MIMIC-IV published by the MIT Laboratory for Computational Physiology.

Open and ready to use: no credentialing and no data use agreements. Install and start exploring clinical data in seconds.

| | |
|---|---|
| Website | [tinyehr.org](https://tinyehr.org) |
| GitHub | [github.com/vidulpanickan/TinyEHR](https://github.com/vidulpanickan/TinyEHR) |
| HuggingFace | [datasets/vidulpanickan/TinyEHR](https://huggingface.co/datasets/vidulpanickan/TinyEHR) |
| PyPI | `pip install tinyehr` |

## Install

```bash
pip install tinyehr
```

## Python API

```python
import tinyehr

# Quick reference of all functions
tinyehr.help()

# Overview of all tables with row counts
tinyehr.info()
tinyehr.info(format="tinyehr_omop_format")

# List table names
tinyehr.list_tables()
tinyehr.list_tables(format="tinyehr_omop_format")

# Column names, types, and sample rows for a table
tinyehr.describe_table("patients")
tinyehr.describe_table("person", format="tinyehr_omop_format")

# Find tables by keyword in table and column names
tinyehr.search_tables("lab")
tinyehr.search_tables("drug")

# Load a table as a pandas DataFrame
patients = tinyehr.load_table("patients")
person = tinyehr.load_table("person", format="tinyehr_omop_format")

# All data for one patient across all tables
data = tinyehr.get_patient(10000032)
data["admissions"]    # DataFrame of this patient's admissions
data["labevents"]     # DataFrame of this patient's labs
data["noteevents"]    # DataFrame of this patient's notes

# Build a local SQLite database (one per format)
mimic_db = tinyehr.build_sqlite(format="tinyehr_mimic_format")
omop_db = tinyehr.build_sqlite(format="tinyehr_omop_format")

# Query the MIMIC-format database (the admissions table is MIMIC-only)
import sqlite3
conn = sqlite3.connect(mimic_db)
rows = conn.execute("SELECT * FROM admissions LIMIT 5").fetchall()
conn.close()
```
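
Once a SQLite file exists, it can help to check which tables it contains before writing queries. The sketch below uses only the standard `sqlite3` module; the helper name `table_row_counts` is hypothetical and not part of the `tinyehr` API, and it assumes the value returned by `build_sqlite` is a path to an ordinary SQLite file.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        counts = {}
        for (name,) in names:
            # Table names come from the catalog itself; quote them defensively.
            counts[name] = conn.execute(
                f'SELECT COUNT(*) FROM "{name}"'
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `table_row_counts(db_path)` with the path from `tinyehr.build_sqlite(...)` should give a view similar to `tinyehr.info()`, but read from the database file itself.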

## Direct from HuggingFace

```python
import pandas as pd

patients = pd.read_parquet(
    "hf://datasets/vidulpanickan/tinyehr/tinyehr_mimic_format/patients.parquet"
)
```

This route needs only `pandas` and `pyarrow`, not the `tinyehr` package. Resolving `hf://` paths also requires `huggingface_hub` to be installed.
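
To read other tables the same way, a small helper can build the path. This is a sketch that assumes every table is stored as `<format>/<table>.parquet`, as the `patients` example suggests; the helper name `hf_parquet_url` is hypothetical.

```python
# Assumed layout: hf://datasets/<repo>/<format>/<table>.parquet
HF_REPO = "vidulpanickan/tinyehr"


def hf_parquet_url(table, fmt="tinyehr_mimic_format"):
    """Build an hf:// path for one table in one format (assumed layout)."""
    return f"hf://datasets/{HF_REPO}/{fmt}/{table}.parquet"


# Usage (requires pandas, pyarrow, and huggingface_hub):
#   import pandas as pd
#   person = pd.read_parquet(hf_parquet_url("person", "tinyehr_omop_format"))
```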

## Trouble downloading?

You can download the raw CSV files directly from GitHub:

1. Go to [github.com/vidulpanickan/TinyEHR](https://github.com/vidulpanickan/TinyEHR)
2. Click the green **Code** button
3. Select **Download ZIP**

Or clone via terminal:

```bash
git clone https://github.com/vidulpanickan/TinyEHR.git
```

## Formats

TinyEHR ships in two formats from the same underlying patient cohort:

**MIMIC format** follows the original MIMIC-IV schema with dates shifted to realistic years, ICD codes reformatted with decimal points, and 4,580 synthetic clinical notes added.

**OMOP format** follows the OHDSI CDM v5.3.1 schema with hashed person IDs, dates shifted to realistic years, ICD codes reformatted with decimal points, and clinical codes mapped to SNOMED, LOINC, and RxNorm via a custom MIMIC-specific concept vocabulary.

For full dataset structure, schema documentation, and table details, visit [tinyehr.org](https://tinyehr.org).
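
Both formats shift dates per patient. The sketch below illustrates one way such a shift could work; it is not TinyEHR's actual procedure. It assumes the offset is a whole number of years taken from the midpoint of the MIMIC-IV `anchor_year_group` range minus the patient's `anchor_year`. The function names and the input values are illustrative.

```python
from datetime import datetime


def year_offset(anchor_year, anchor_year_group):
    """Years to add so anchor_year lands on the midpoint of the group range.

    anchor_year_group is a string such as "2014 - 2016" (assumed format).
    """
    start, end = (int(part) for part in anchor_year_group.split("-"))
    return (start + end) // 2 - anchor_year


def shift_years(ts, years):
    """Shift a timestamp by whole years, keeping month, day, and time."""
    try:
        return ts.replace(year=ts.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap target year; fall back to Feb 28.
        return ts.replace(year=ts.year + years, day=28)


# Illustrative inputs only, not taken from the dataset.
offset = year_offset(2180, "2014 - 2016")                     # -165
shifted = shift_years(datetime(2180, 5, 6, 22, 23), offset)   # 2015-05-06 22:23
```

Using one offset per patient keeps the intervals between that patient's events intact, which is why a shift of this kind leaves lengths of stay and event ordering unchanged.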

## Differences from MIMIC-IV Demo

TinyEHR applies four targeted transformations to the original MIMIC-IV Demo data. Apart from the added notes and note concepts, all clinical values, patient demographics, table structures, referential integrity, and row counts are unchanged.

| Transformation | What changed | Why |
|---------------|-------------|-----|
| **Date shifting** | All dates shifted from synthetic 2100+ range to realistic 2010s-2020s using per-patient offsets derived from `anchor_year_group`. Affects 21 MIMIC tables and 15 OMOP tables. Offsets saved in `metadata/date_offsets.csv`. | Realistic dates for teaching and prototyping. |
| **ICD code formatting** | Decimal points inserted into ICD codes (`E119` → `E11.9`, `V707` → `V70.7`). ICD-10-PCS codes left unchanged. Affects `diagnoses_icd`, `d_icd_diagnoses`, `procedures_icd`, `d_icd_procedures` (MIMIC) and `condition_source_value`, `procedure_source_value` (OMOP). | Matches real-world clinical code formatting. |
| **Synthetic clinical notes** | 4,580 notes across 14 types added (not present in original Demo). Generated using a large language model, grounded in each patient's demographics, diagnoses, and admission data. Added as `noteevents` (MIMIC) and `note` (OMOP) with proper concept mappings. | The original Demo has no clinical notes. |
| **OMOP note concepts** | 19 note-related concepts added to `2b_concept.csv` (10 Note Type, 7 LOINC Document Ontology, 2 utility). Row count: 3,885 → 3,904. | Required for OMOP note table concept references. |
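
The ICD code formatting row can be illustrated with a short function. This sketch follows the usual ICD conventions, which may differ in detail from the rule TinyEHR applies: diagnosis codes take the decimal point after the third character (after the fourth for ICD-9 E-codes), ICD-9 procedure codes after the second, and ICD-10-PCS codes are left as they are. The function name and the `kind` parameter are hypothetical.

```python
def format_icd(code, version, kind="diagnosis"):
    """Insert a decimal point into an undotted ICD code.

    version: 9 or 10 (as in MIMIC's icd_version column)
    kind: "diagnosis" or "procedure"
    """
    code = code.strip()
    if kind == "procedure":
        if version == 10:
            return code          # ICD-10-PCS codes have no decimal point
        cut = 2                  # ICD-9 procedures: NN.NN
    elif version == 9 and code.startswith("E"):
        cut = 4                  # ICD-9 E-codes: ENNN.N
    else:
        cut = 3                  # ICD-9 and ICD-10 diagnoses: XXX.X...
    if len(code) <= cut:
        return code              # nothing after the decimal position
    return code[:cut] + "." + code[cut:]


print(format_icd("E119", 10))                    # E11.9
print(format_icd("V707", 9))                     # V70.7
print(format_icd("0DB64Z3", 10, "procedure"))    # 0DB64Z3
```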

## License

- **Code** (this Python package): [MIT License](LICENSE)
- **Data** (the TinyEHR dataset): [ODbL-1.0](https://opendatacommons.org/licenses/odbl/1-0/)
