Metadata-Version: 2.4
Name: mosaicx
Version: 1.0.9
Summary: Medical cOmputational Suite for Advanced Intelligent eXtraction
Project-URL: Homepage, https://github.com/LalithShiyam/MOSAICX
Project-URL: Repository, https://github.com/LalithShiyam/MOSAICX
Project-URL: Documentation, https://github.com/LalithShiyam/MOSAICX#readme
Project-URL: Bug Tracker, https://github.com/LalithShiyam/MOSAICX/issues
Author-email: Lalith Kumar Shiyam Sundar <lalith.shiyam@med.uni-muenchen.de>
License: DUAL LICENSING NOTICE
        ====================
        
        MOSAICX is dual-licensed under the terms of both the GNU Affero General Public License v3.0 (AGPL-3.0) and a Commercial License.
        
        OPEN SOURCE LICENSE
        ===================
        
        This software is available under the GNU Affero General Public License v3.0 (AGPL-3.0).
        
        Under this license, you are free to use, modify, and distribute this software, provided that:
        - Any derivative work or application that uses this software must also be open-sourced under AGPL-3.0
        - If you run this software on a server and provide it as a service, you must make the complete source code of your application (including modifications) available to your users
        - You must include this license notice and copyright information in all copies
        
        For the complete AGPL-3.0 license terms, see LICENSE-AGPL-3.0.txt
        
        COMMERCIAL LICENSE
        ==================
        
        If you wish to use this software in a commercial product or service without the open-source requirements of AGPL-3.0, you must obtain a commercial license.
        
        Commercial licenses are available from:
        
            Zenta GmbH
            
            For commercial licensing inquiries, please contact:
            Email: info@zenta.solutions
            Subject: MOSAICX Commercial License Request
        
        Commercial licensing allows you to:
        - Use this software in proprietary applications
        - Distribute applications containing this software without open-source obligations
        - Customize and modify the software without sharing changes
        - Receive commercial support and maintenance
        
        COPYRIGHT AND ATTRIBUTION
        ==========================
        
        Copyright (c) 2024 DIGITX Lab, Department of Radiology, LMU Munich University Hospital
        Developed by Lalith Kumar Shiyam Sundar, PhD
        
        Commercial licensing managed by Zenta GmbH
        
        IMPORTANT NOTICE
        ================
        
        By using this software, you agree to comply with the terms of one of the above licenses.
        If you are unsure which license applies to your use case, please contact Zenta GmbH for clarification.
License-File: LICENSE
Keywords: extraction,llm,medical,nlp,pdf,radiology
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.11
Requires-Dist: click>=8.1.0
Requires-Dist: docling>=2.0.0
Requires-Dist: dspy-ai>=2.4.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: instructor>=1.0.0
Requires-Dist: ollama>=0.3.0
Requires-Dist: openai>=1.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-cfonts>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: reportlab>=4.4.4
Requires-Dist: rich-click>=1.8.0
Requires-Dist: rich>=13.0.0
Requires-Dist: typing-extensions>=4.8.0
Provides-Extra: dev
Requires-Dist: black>=23.7.0; extra == 'dev'
Requires-Dist: isort>=5.12.0; extra == 'dev'
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pre-commit>=3.3.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.0.280; extra == 'dev'
Description-Content-Type: text/markdown

<div align="center">
  <img src="assets/mosaicx_logo.png" alt="MOSAICX Logo" width="800"/>
</div>
<p align="center">
  <a href="https://pypi.org/project/mosaicx/"><img alt="PyPI" src="https://img.shields.io/pypi/v/mosaicx.svg?label=PyPI&style=flat-square&logo=python&logoColor=white&color=bd93f9"></a>
  <a href="https://www.python.org/downloads/"><img alt="Python" src="https://img.shields.io/badge/Python-3.11%2B-50fa7b?style=flat-square&logo=python&logoColor=white"></a>
  <a href="https://www.gnu.org/licenses/agpl-3.0"><img alt="License" src="https://img.shields.io/badge/License-AGPL--3.0-ff79c6?style=flat-square&logo=gnu&logoColor=white"></a>
  <a href="https://pepy.tech/project/mosaicx"><img alt="Downloads" src="https://img.shields.io/pepy/dt/mosaicx?style=flat-square&color=8be9fd&label=Downloads"></a>
  <a href="https://pydantic.dev"><img alt="Pydantic v2" src="https://img.shields.io/badge/Pydantic-v2-ffb86c?style=flat-square&logo=pydantic&logoColor=white"></a>
  <a href="https://ollama.ai"><img alt="Ollama Compatible" src="https://img.shields.io/badge/Ollama-Compatible-6272a4?style=flat-square&logo=ghost&logoColor=white"></a>
  <a href="mailto:lalith@zenta.solutions"><img alt="Commercial License" src="https://img.shields.io/badge/Commercial%20Use-Contact%20Zenta-orange?style=flat-square&logo=data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iOTIiIGhlaWdodD0iOTIiIHZpZXdCb3g9IjAgMCA5MiA5MiIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iNDUuODk2NiIgY3k9IjcuOTEyOTYiIHI9IjYuMjkxNzEiIHRyYW5zZm9ybT0icm90YXRlKDkwIDQ1Ljg5NjYgNy45MTI5NikiIGZpbGw9IiNFNkU0REUiIHN0cm9rZT0iI0U2RTRERSIgc3Ryb2tlLXdpZHRoPSIzLjI0MjUxIi8+CjxjaXJjbGUgY3g9IjgzLjg3ODEiIGN5PSIyNi45MDQyIiByPSI2LjI5MTcxIiB0cmFuc2Zvcm09InJvdGF0ZSg5MCA4My44NzgxIDI2LjkwNDIpIiBmaWxsPSIjRTZFNERFIiBzdHJva2U9IiNFNkU0REUiIHN0cm9rZS13aWR0aD0iMy4yNDI1MSIvPgo8Y2lyY2xlIGN4PSI3LjkxMzIxIiBjeT0iMjYuOTA0MiIgcj0iNi4yOTE3MSIgdHJhbnNmb3JtPSJyb3RhdGUoOTAgNy45MTMyMSAyNi45MDQyKSIgZmlsbD0iI0U2RTRERSIgc3Ryb2tlPSIjRTZFNERFIiBzdHJva2Utd2lkdGg9IjMuMjQyNTEiLz4KPGNpcmNsZSBjeD0iNy45MTMyMSIgY3k9IjQ1Ljg5NjQiIHI9IjYuMjkxNzEiIHRyYW5zZm9ybT0icm90YXRlKDkwIDcuOTEzMjEgNDUuODk2NCkiIGZpbGw9IiNFNkU0REUiIHN0cm9rZT0iI0U2RTRERSIgc3Ryb2tlLXdpZHRoPSIzLjI0MjUxIi8+CjxjaXJjbGUgY3g9IjY0Ljg4NjgiIGN5PSI4My44Nzg4IiByPSI2LjI5MTcxIiB0cmFuc2Zvcm09InJvdGF0ZSg5MCA2NC44ODY4IDgzLjg3ODgpIiBmaWxsPSIjRTZFNERFIiBzdHJva2U9IiNFNkU0REUiIHN0cm9rZS13aWR0aD0iMy4yNDI1MSIvPgo8Y2lyY2xlIGN4PSIyNi45MDQ0IiBjeT0iODMuODc4OCIgcj0iNi4yOTE3MSIgdHJhbnNmb3JtPSJyb3RhdGUoOTAgMjYuOTA0NCA4My44Nzg4KSIgZmlsbD0iI0U2RTRERSIgc3Ryb2tlPSIjRTZFNERFIiBzdHJva2Utd2lkdGg9IjMuMjQyNTEiLz4KPGNpcmNsZSBjeD0iNy45MTMyMSIgY3k9IjY0Ljg4NzYiIHI9IjYuMjkxNzEiIHRyYW5zZm9ybT0icm90YXRlKDkwIDcuOTEzMjEgNjQuODg3NikiIGZpbGw9IiNFNkU0REUiIHN0cm9rZT0iI0U2RTRERSIgc3Ryb2tlLXdpZHRoPSIzLjI0MjUxIi8+CjxyZWN0IHg9IjkwLjE2OTgiIHk9IjM5LjYwNDciIHdpZHRoPSIzMS41NzQ1IiBoZWlnaHQ9IjEyLjU4MzQiIHJ4PSI2LjI5MTcxIiB0cmFuc2Zvcm09InJvdGF0ZSg5MCA5MC4xNjk4IDM5LjYwNDcpIiBmaWxsPSIjRTZFNERFIiBzdHJva2U9IiNFNkU0REUiIHN0cm9rZS13aWR0aD0iMy4yNDI1MSIvPgo8cmVjdCB4PSI3MS4xNzg1IiB5PSIxLjYyMTI2IiB3aWR0aD0iMzEuNTc0NSIgaGVpZ2h0PSIxMi41ODM0IiByeD0iNi4yOTE3MSIgdHJhbnNmb3JtPSJyb3Rhd
GUoOTAgNzEuMTc4NSAxLjYyMTI2KSIgZmlsbD0iI0U2RTRERSIgc3Ryb2tlPSIjRTZFNERFIiBzdHJva2Utd2lkdGg9IjMuMjQyNTEiLz4KPHJlY3QgeD0iNTIuMTg4MyIgeT0iNTguNTk1OSIgd2lkdGg9IjMxLjU3NDUiIGhlaWdodD0iMTIuNTgzNCIgcng9IjYuMjkxNzEiIHRyYW5zZm9ybT0icm90YXRlKDkwIDUyLjE4ODMgNTguNTk1OSkiIGZpbGw9IiNFNkU0REUiIHN0cm9rZT0iI0U2RTRERSIgc3Ryb2tlLXdpZHRoPSIzLjI0MjUxIi8+CjxyZWN0IHg9IjMzLjE5NjEiIHk9IjIyLjE5NTUiIHdpZHRoPSIzMS41NzQ1IiBoZWlnaHQ9IjEyLjU4MzQiIHJ4PSI2LjI5MTcxIiB0cmFuc2Zvcm09InJvdGF0ZSg5MCAzMy4xOTYxIDIyLjE5NTUpIiBmaWxsPSIjRTZFNERFIiBzdHJva2U9IiNFNkU0REUiIHN0cm9rZS13aWR0aD0iMy4yNDI1MSIvPgo8cmVjdCB4PSIzMy4xOTYxIiB5PSIxLjYyMTI2IiB3aWR0aD0iMzEuNTc0NSIgaGVpZ2h0PSIxMi41ODM0IiByeD0iNi4yOTE3MSIgdHJhbnNmb3JtPSJyb3RhdGUoOTAgMzMuMTk2MSAxLjYyMTI2KSIgZmlsbD0iI0U2RTRERSIgc3Ryb2tlPSIjRTZFNERFIiBzdHJva2Utd2lkdGg9IjMuMjQyNTEiLz4KPHJlY3QgeD0iMTQuMjA0OSIgeT0iMzkuNjA0NyIgd2lkdGg9IjMxLjU3NDUiIGhlaWdodD0iMTIuNTgzNCIgcng9IjYuMjkxNzEiIHRyYW5zZm9ybT0icm90YXRlKDkwIDE0LjIwNDkgMzkuNjA0NykiIGZpbGw9IiNFNkU0REUiIHN0cm9rZT0iI0U2RTRERSIgc3Ryb2tlLXdpZHRoPSIzLjI0MjUxIi8+Cjwvc3ZnPgo="></a>
</p>

# MOSAICX: Structure first. Insight follows.

MOSAICX turns unstructured clinical documents into **validated, structured data**—locally, privately, reproducibly. It supports:

- **Schema generation** from natural language (Pydantic v2)  
- **Extraction** from PDFs/text using the generated schema  
- **Summarization** of radiology reports (one or more reports per patient) → **critical timeline + one-paragraph executive summary** as **JSON**

> Local LLMs via **Ollama** (OpenAI-compatible). PDF text via **Docling**. Rich terminal UI.

---

## 🚀 Quick Start

### 1) Requirements
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model that behaves well with JSON
ollama pull llama3.1:8b-instruct         # or: qwen2.5:7b-instruct, gpt-oss:120b
```

### 2) Install MOSAICX
```bash
pip install mosaicx
# or, inside a uv-managed project (faster resolver)
uv add mosaicx
```

### 3) Smoke test
```bash
mosaicx --help
```

---

## ✨ New: Summarize (Timeline + JSON)

**Goal:** give clinicians an at-a-glance patient trajectory from one or more radiology reports (same patient), without reading everything.

- **Input:** one or many reports (`.pdf` or `.txt`) for a single patient  
- **Logic:** radiology-first prompt (modality-adaptive), concise, **no recommendations/differentials**  
- **Output:**  
  - **Terminal**: header + timeline + executive summary (Rich)  
  - **JSON**: standardized object (`patient`, `timeline[]`, `overall`)

**Example**
```bash
mosaicx summarize \
  --report P001_CT_2025-08-01.pdf \
  --report P001_CT_2025-09-10.pdf \
  --patient P001 \
  --model llama3.1:8b-instruct \
  --json-out out/summary_P001.json
```
**or**
```bash
mosaicx summarize \
  --dir ./patient_directory \
  --json-out ./longitudinal_summary.json \
  --model gpt-oss:120b
```

**Summary JSON (shape)**
```json
{
  "patient": {
    "patient_id": "P001",
    "dob": null,
    "sex": null,
    "last_updated": "2025-09-19T12:34:56Z"
  },
  "timeline": [
    { "date": "2025-08-01", "source": "CT 2025-08-01", "note": "Baseline nodal disease; R ext-iliac LN short-axis 12 mm" },
    { "date": "2025-09-10", "source": "CT 2025-09-10", "note": "R ext-iliac LN 12→16 mm — progression; no visceral mets" }
  ],
  "overall": "Nodal-only disease with interval progression of the right external iliac node [CT 2025-09-10]; baseline nodal disease without visceral metastases [CT 2025-08-01]."
}
```
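
A downstream script can read this file with nothing beyond the standard library. The sketch below is one possible consumer, not part of MOSAICX: `TimelineEntry` and `load_timeline` are hypothetical names, and the only assumption about the file is the three top-level keys shown above.

```python
import datetime
import json
from dataclasses import dataclass


@dataclass
class TimelineEntry:
    """One row of the `timeline` array (hypothetical helper type)."""
    date: datetime.date
    source: str
    note: str


def load_timeline(summary_json: str) -> list[TimelineEntry]:
    """Parse a summary document and return its timeline sorted by date."""
    data = json.loads(summary_json)
    # Fail early if one of the three top-level keys is missing.
    missing = {"patient", "timeline", "overall"} - data.keys()
    if missing:
        raise ValueError(f"summary is missing keys: {sorted(missing)}")
    entries = [
        TimelineEntry(datetime.date.fromisoformat(e["date"]), e["source"], e["note"])
        for e in data["timeline"]
    ]
    return sorted(entries, key=lambda e: e.date)


# Shortened example with the same shape as the JSON above, deliberately out of order.
example = """
{
  "patient": {"patient_id": "P001", "dob": null, "sex": null,
              "last_updated": "2025-09-19T12:34:56Z"},
  "timeline": [
    {"date": "2025-09-10", "source": "CT 2025-09-10", "note": "progression"},
    {"date": "2025-08-01", "source": "CT 2025-08-01", "note": "baseline"}
  ],
  "overall": "Nodal-only disease with interval progression."
}
"""

for entry in load_timeline(example):
    print(entry.date.isoformat(), entry.source, entry.note)
```

In a real pipeline, replace `example` with the contents of the file written by `--json-out`.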

**Under the hood (robust fallbacks)**  
1) Instructor JSON → Pydantic → ✅  
2) Raw JSON extraction → Pydantic → ✅  
3) Heuristic timeline/summary → ✅  
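
The tiered idea can be illustrated with a small, standard-library-only sketch. It is not the MOSAICX implementation: the function names are hypothetical, plain `json` parsing stands in for the Instructor and Pydantic steps, and the heuristic tier is reduced to "keep the first non-empty line".

```python
import json
import re


def parse_strict(text: str) -> dict:
    """Tier 1 stand-in: the reply is already a clean JSON object."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


def parse_embedded(text: str) -> dict:
    """Tier 2 stand-in: pull the outermost {...} span out of chatty text."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found")
    return parse_strict(match.group(0))


def parse_heuristic(text: str) -> dict:
    """Tier 3 stand-in: never fails; keeps the first non-empty line as the summary."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {"timeline": [], "overall": lines[0] if lines else ""}


def summarize_with_fallbacks(text: str) -> tuple[str, dict]:
    """Return (tier_name, result) from the first tier that succeeds."""
    tiers = [("strict", parse_strict), ("embedded", parse_embedded)]
    for name, parse in tiers:
        try:
            return name, parse(text)
        except ValueError:
            continue  # json.JSONDecodeError is a subclass of ValueError
    return "heuristic", parse_heuristic(text)


print(summarize_with_fallbacks('{"overall": "stable"}')[0])
print(summarize_with_fallbacks('Sure! {"overall": "stable"} Hope it helps.')[0])
print(summarize_with_fallbacks("No change since prior exam.")[0])
```

Because the last tier cannot raise, the caller always gets a result, and the tier name records how much the model output had to be repaired.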

---

## Core Workflows

### 1) Generate a schema (from plain English)
```bash
mosaicx generate \
  --desc "Echocardiography with patient_id, exam_date, EF %, valve grades (Normal/Mild/Moderate/Severe), impression" \
  --model gpt-oss:120b
```

### 2) Extract structured data with that schema (PDF → JSON)
```bash
mosaicx extract \
  --pdf echo_report.pdf \
  --schema EchocardiographyReport_20250919_143022 \
  --model gpt-oss:120b \
  --save out/echo_001.json
```

### 3) Summarize radiology reports (timeline + JSON)
```bash
# Multiple inputs for the same patient
mosaicx summarize \
  --dir ./reports/P001 \
  --patient P001 \
  --model gpt-oss:120b \
  --json-out out/summary_P001.json
```

**CLI options (summarize)**  
- `--report` … (repeatable)  
- `--dir` (recursively picks `.pdf`, `.txt`)  
- `--patient PSEUDONYM`  
- `--json-out path.json` and `--print-json`  
- `--model`, `--base-url`, `--api-key`, `--temperature`
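
When the command is driven from a script, building the argument list in Python avoids shell-quoting and line-continuation mistakes. The sketch below is an assumption-laden helper, not a MOSAICX API: `build_summarize_command` is a hypothetical name, and it uses only the `--report`, `--patient`, `--model`, and `--json-out` flags listed above.

```python
import shlex
from pathlib import Path


def build_summarize_command(reports, patient, model, json_out):
    """Assemble argv for `mosaicx summarize` using only flags listed above."""
    argv = ["mosaicx", "summarize"]
    # Sort so the command is reproducible regardless of input order.
    for report in sorted(Path(r) for r in reports):
        if report.suffix.lower() not in {".pdf", ".txt"}:
            raise ValueError(f"unsupported report type: {report}")
        argv += ["--report", str(report)]
    argv += ["--patient", patient, "--model", model, "--json-out", str(json_out)]
    return argv


argv = build_summarize_command(
    ["P001_CT_2025-09-10.pdf", "P001_CT_2025-08-01.pdf"],
    patient="P001",
    model="llama3.1:8b-instruct",
    json_out="out/summary_P001.json",
)
print(shlex.join(argv))  # copy-pasteable shell form of the same command
# To execute it: subprocess.run(argv, check=True)
```

Passing the list straight to `subprocess.run` bypasses the shell, so file names with spaces need no extra quoting.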

---

## Tips for Great Results

- **Models:** prefer `llama3.1:8b-instruct` or `qwen2.5:7b-instruct` for clean JSON.  
- **Prompts (summarize):** MOSAICX uses a **conciseness-first** prompt, modality-adaptive, no DDx or recommendations.  
- **PDFs:** If scanned (no text), run OCR before MOSAICX or add an OCR pre-step in your pipeline.

---

## Troubleshooting

- **Connection refused / model not found**: start Ollama, run `ollama list` to see which models are installed, then pull the one you need.  
- **Empty summary**: try a more JSON-obedient model; lower `--temperature` (0.0–0.2).  
- **PDF yields no text**: the PDF likely has no text layer; OCR it first.
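
The first check can also be scripted. The sketch below queries Ollama's `GET /api/tags` endpoint on its default port (11434) and reports whether the server is up and the model is installed. The helper names `installed_models` and `diagnose` are hypothetical, and the comparison assumes you pass the model name exactly as Ollama lists it.

```python
import json
import urllib.request


def installed_models(base_url="http://localhost:11434", timeout=3.0):
    """Return model names known to Ollama, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
        return [m["name"] for m in payload["models"]]
    except (OSError, ValueError, KeyError, TypeError):
        # Connection refused, timeout, or an unexpected response body.
        return None


def diagnose(wanted, models):
    """Map the server state to the matching troubleshooting hint."""
    if models is None:
        return "Ollama is not reachable: start it (for example with `ollama serve`)."
    if wanted not in models:
        return f"Model missing: run `ollama pull {wanted}`."
    return "ok"


print(diagnose("llama3.1:8b-instruct", installed_models()))
```

If you run Ollama on another host or port, pass the matching `base_url`.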

---

## Why MOSAICX (one paragraph)

MOSAICX is **infrastructure** for clinical data: schema-driven, validated, local, and reproducible. Structure reports once, then reuse the same schemas and summarizers across departments and time—enabling longitudinal analysis, cross-modal integration, and downstream intelligence without sending data to the cloud.

---

## License

Dual-licensed: AGPL-3.0, or a commercial license from Zenta GmbH. See `LICENSE`.

### Contact
DIGIT-X Lab · Department of Radiology · LMU Klinikum  
`lalith.shiyam@med.uni-muenchen.de`
