Metadata-Version: 2.4
Name: flat-fhir
Version: 0.1.1
Summary: Convert FHIR JSON bundles into flat, LLM-friendly XLSX spreadsheets.
License: MIT
Project-URL: Homepage, https://github.com/Ghost---Shadow/flat-fhir
Project-URL: Repository, https://github.com/Ghost---Shadow/flat-fhir
Project-URL: Bug Tracker, https://github.com/Ghost---Shadow/flat-fhir/issues
Keywords: fhir,hl7,healthcare,xlsx,llm,clinical-data
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: click>=8.0.0
Requires-Dist: requests>=2.28.0
Requires-Dist: tabulate>=0.9.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Dynamic: license-file

# flat-fhir

Convert FHIR JSON bundles into flat, LLM-friendly XLSX spreadsheets.

Machines speak FHIR. Humans and LLMs get spreadsheets.

## Why

FHIR is the right canonical format for medical interoperability. But nested JSON with cross-references is terrible for:

- **Doctors** who want to eyeball trends in a spreadsheet
- **Patients** who want to carry their records on a USB stick
- **LLMs** that reason far better over flat tabular data than deeply nested JSON
- **Rural clinics** that have Excel but not a FHIR parser

flat-fhir bridges that gap. FHIR stays the source of truth. XLSX becomes the universal export.

## Install

```bash
pip install flat-fhir
```

## Quick Start

```python
from flat_fhir import flatten

# From a FHIR Bundle JSON file
flatten("patient_bundle.json", output="patient_record.xlsx")

# From a Python dict
import json
with open("patient_bundle.json") as f:
    bundle = json.load(f)

flatten(bundle, output="patient_record.xlsx")
```

## What It Produces

A multi-sheet XLSX workbook. Every row follows the same structure:

```
primary_key | performer (who) | timestamp (when) | ...all the whats
```

- **primary_key** — deterministic hash, unique per row
- **performer** — the doctor, lab, or pathologist who recorded it
- **timestamp** — when the observation/event happened (ISO 8601)
- **...whats** — the actual clinical data, flattened into columns

If a blood culture and liver ultrasound are ordered on the same day by different doctors, they produce two separate rows. One row per performer per event. Sparse by design — most cells are empty because you only measure a few things per visit.

### Sheets

| Sheet | Description | Example Columns |
|---|---|---|
| `patient` | Demographics (single row) | `family_name`, `given_name`, `birth_date`, `gender`, `phone`, `email`, `address` |
| `vitals` | Physical measurements | `blood_pressure_systolic_mmhg`, `blood_pressure_diastolic_mmhg`, `heart_rate_bpm`, `body_temperature_c`, `oxygen_saturation_pct`, `respiratory_rate_bpm`, `body_weight_kg`, `body_height_cm`, `bmi_kg_m2` |
| `labs` | Laboratory results | `hba1c_pct`, `blood_glucose_fasting_mg_dl`, `total_cholesterol_mg_dl`, `creatinine_mg_dl`, `egfr_ml_min`, `tsh_miu_l`, `hemoglobin_g_dl`, `wbc_k_ul`, `troponin_t_hs_ng_l` |
| `medications` | Prescriptions | `medication_name`, `rxnorm_code`, `dose`, `dose_unit`, `frequency`, `route`, `status`, `prescriber` |
| `conditions` | Diagnoses | `condition_name`, `icd_code`, `snomed_code`, `clinical_status`, `verification_status`, `severity`, `category`, `onset_date`, `abatement_date` |
| `allergies` | Allergy & intolerance records | `substance`, `reaction`, `severity`, `criticality`, `type`, `category`, `clinical_status` |
| `procedures` | Surgical & clinical procedures | `procedure_name`, `snomed_code`, `body_site`, `outcome`, `status` |
| `imaging` | Imaging studies | `modality`, `body_site`, `dicom_uid`, `findings` |
| `immunizations` | Vaccination records | `vaccine_name`, `cvx_code`, `dose`, `dose_unit`, `site`, `lot_number`, `status` |
| `encounters` | Visit metadata | `class`, `type`, `facility`, `reason`, `discharge_disposition`, `status`, `period_start`, `period_end` |
| `diagnostic_reports` | Panel-level lab groupings | `report_name`, `loinc_code`, `category`, `status`, `conclusion`, `result_references` |

### Common Columns

Every sheet (except `patient`) includes these columns in this order:

| Column | Description |
|---|---|
| `primary_key` | Deterministic hash — unique row identifier |
| `performer` | **Who** recorded it (doctor, lab, pathologist) |
| `timestamp` | **When** — ISO 8601 datetime |
| `encounter_id` | Links rows back to a specific visit |
| `source_resource_id` | Original FHIR resource ID for traceability |
| `notes` | Free text annotations |

## Built-in LOINC Registry

flat-fhir ships with 100+ LOINC code mappings covering the standard FHIR R4 profiles:

- **Vital signs** — blood pressure panel (85354-9), heart rate, SpO2, temperature, weight, height, BMI, head circumference
- **CBC** — WBC, RBC, hemoglobin, hematocrit, MCV, MCH, MCHC, RDW, platelets
- **BMP/CMP** — glucose, BUN, creatinine, sodium, potassium, chloride, CO2, calcium, ALT, AST, ALP, bilirubin, albumin, total protein
- **Lipid panel** — total cholesterol, HDL, LDL (direct & calculated), triglycerides
- **Thyroid** — TSH, free T4, free T3
- **HbA1c** — standard and IFCC
- **Renal** — eGFR, creatinine
- **Coagulation** — PT, INR, aPTT, fibrinogen, D-dimer
- **Cardiac markers** — troponin T/I (standard & high-sensitivity), BNP, NT-proBNP
- **Blood gases** — pH, pCO2, pO2, bicarbonate, base excess, lactate
- **Inflammatory** — CRP (standard & high-sensitivity), ESR
- **Vitamins** — vitamin D, B12, folate, iron, ferritin
- **Urinalysis** — specific gravity, pH, protein, glucose, ketones, blood, nitrite, leukocyte esterase

## Naming Convention

All column headers use **pythonic snake_case** with the unit appended:

```
blood_pressure_systolic_mmhg
body_weight_kg
hba1c_pct
blood_glucose_fasting_mg_dl
oxygen_saturation_pct
body_temperature_c
troponin_t_hs_ng_l
```

The mapping from FHIR codes (LOINC, SNOMED, RxNorm, CVX) to human-readable column names is maintained in a built-in registry.

## CLI

```bash
# Basic conversion
flat-fhir convert patient_bundle.json -o patient_record.xlsx

# Only specific sheets
flat-fhir convert patient_bundle.json -o record.xlsx --sheets vitals,labs,medications

# From a FHIR server (auto-appends $everything for Patient URLs)
flat-fhir fetch https://fhir.example.com/Patient/123 -o record.xlsx

# Merge multiple bundles (e.g., from different providers)
flat-fhir merge hospital_a.json hospital_b.json clinic.json -o merged.xlsx

# Reverse: XLSX back to FHIR Bundle JSON
flat-fhir unflatten patient_record.xlsx -o patient_bundle.json
```

## Python API

```python
from flat_fhir import flatten, unflatten, merge, FlatFHIR

# Basic flatten
flatten("bundle.json", output="record.xlsx")

# Fine-grained control
ff = FlatFHIR("bundle.json")

# Access individual sheets as pandas DataFrames
vitals_df = ff.vitals
labs_df = ff.labs

# Filter and query
recent_bp = ff.vitals[ff.vitals["timestamp"] > "2025-01-01"][
    ["timestamp", "blood_pressure_systolic_mmhg", "blood_pressure_diastolic_mmhg"]
]

# Plot directly
ff.labs.plot(x="timestamp", y="hba1c_pct", title="HbA1c Over Time")

# Export specific sheets
ff.to_xlsx("record.xlsx", sheets=["vitals", "labs"])

# Export to other formats
ff.to_csv("vitals.csv", sheet="vitals")
ff.to_parquet("record.parquet")
ff.to_json("record_flat.json")  # flat JSON, not FHIR

# Round-trip back to FHIR
fhir_bundle = ff.to_fhir()
unflatten("record.xlsx", output="bundle.json")

# Merge records from multiple providers
merged = merge(["hospital_a.json", "clinic_b.json"])
merged.to_xlsx("complete_record.xlsx")
```

## Custom Column Registry

The built-in registry maps LOINC/SNOMED codes to snake_case column names. You can extend it:

```python
from flat_fhir import registry

# Add a custom mapping
registry.add("85354-9", "blood_pressure_systolic_mmhg", sheet="vitals")

# Or load a custom registry file
registry.load("my_mappings.json")

# See all registered mappings
print(registry.list())
```

## LLM Integration

The whole point. Flat FHIR spreadsheets are purpose-built for LLM consumption:

```python
from flat_fhir import FlatFHIR

ff = FlatFHIR("bundle.json")

# Generate a compact text summary for an LLM prompt
prompt_context = ff.to_prompt()

# Or get a specific sheet as markdown table
vitals_table = ff.vitals.to_markdown()

# Token-aware truncation — keeps the most recent N rows
prompt_context = ff.to_prompt(max_tokens=2000)
```

Example output of `ff.to_prompt()`:

```
## Patient: Amir Khan, M, DOB 1990-05-15

## Active Conditions
- Essential hypertension (diagnosed 2024-09-12)
- Type 2 diabetes mellitus without complications (diagnosed 2024-01-10)

## Allergies
- Penicillin (causes Skin rash)

## Active Medications
| medication | dose | frequency |
|---|---|---|
| Lisinopril 10 MG Oral Tablet | 10 mg | once daily |
| Metformin 500 MG Oral Tablet | 500 mg | twice daily |

## Vitals (last 2)
| timestamp | bp_sys | bp_dia | temp_c | weight_kg | hr | spo2 |
|---|---|---|---|---|---|---|
| 2024-12-01 | 115 | 74 | 36.5 | 75.2 | 64 | 99 |
| 2024-09-12 | 118 | 76 | — | 75.8 | 66 | — |

## Labs (last 1)
| timestamp | hba1c | glucose_fasting | cholesterol | triglycerides |
|---|---|---|---|---|
| 2024-12-01 | 6.3 | 102 | 180 | 125 |
```

## Round-Trip Fidelity

flat-fhir preserves FHIR resource IDs and code systems so you can go back:

```
FHIR Bundle JSON → flatten → XLSX → unflatten → FHIR Bundle JSON
```

The `source_resource_id` column on every sheet ensures nothing gets lost. LOINC, SNOMED, ICD, RxNorm, and CVX codes are stored alongside human-readable names so the round-trip is lossless.

## Testing

178 tests covering every extractor, output format, CLI command, and round-trip path. Tested against 13 external FHIR samples from HL7, IHE, SMART on FHIR, and Synthea.

```bash
# Run tests
pytest

# Run with coverage
pytest --cov=flat_fhir --cov-report=term-missing

# Current coverage: 99%
```

### Registry Codegen

The LOINC registry is auto-generated from the FHIR R4 spec. To regenerate after a spec update:

```bash
python scripts/sync_fhir_spec.py --synthea
```

This downloads the FHIR R4 definitions, parses vital signs LOINC codes, scans Synthea sample bundles for real-world codes, and regenerates `flat_fhir/registry.py`.

## Roadmap

- [x] Core flatten/unflatten for all major FHIR resource types
- [x] CLI with `convert`, `fetch`, `merge`, `unflatten`
- [x] Built-in LOINC/SNOMED/CVX → snake_case column registry (150+ codes)
- [x] `to_prompt()` for LLM-ready output
- [x] FHIR server direct fetch (`flat-fhir fetch <url>`)
- [x] Merge records from multiple providers
- [x] DiagnosticReport support
- [x] Parquet export
- [x] Auto-codegen from FHIR R4 spec (`sync_fhir_spec.py`)
- [x] 99% test coverage with external FHIR fixtures
- [ ] Google Sheets export
- [ ] Streaming flatten for massive bundles
- [ ] SMART on FHIR auth for direct patient access
- [ ] Deduplication across merged bundles

## Project Philosophy

1. **FHIR is the source of truth.** We don't replace it. We make it accessible.
2. **Snake case everything.** No camelCase, no abbreviation soup. `blood_pressure_systolic_mmhg` not `BPSys`.
3. **Doctor-first rows.** Every row is `primary_key | who | when | ...whats`. One row per performer per event.
4. **Sparse is fine.** A 200-column sheet that's 95% empty is better than nested JSON nobody can read.
5. **Round-trip or bust.** If you can't go back to FHIR, the flattening is lossy and therefore broken.
6. **LLMs are first-class consumers.** Every design decision asks "can an LLM reason over this?"

## License

MIT
