Metadata-Version: 2.4
Name: ohdsi-circepy
Version: 0.1.0
Summary: Python implementation of OHDSI CIRCE-BE for cohort definition and SQL generation
Author-email: Jamie Gilbert <gilbert@ohdsi.org>
Maintainer-email: Jamie Gilbert <gilbert@ohdsi.org>
License: Apache-2.0
Project-URL: Homepage, https://github.com/OHDSI/Circepy
Project-URL: Documentation, https://ohdsi-circepy.readthedocs.io/
Project-URL: Repository, https://github.com/OHDSI/Circepy.git
Project-URL: Issues, https://github.com/OHDSI/Circepy/issues
Project-URL: Changelog, https://github.com/OHDSI/Circepy/blob/main/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/OHDSI/Circepy/issues
Project-URL: Source Code, https://github.com/OHDSI/Circepy
Keywords: ohdsi,circe,cohort,sql,omop,cdm,healthcare,observational-health
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: typing-extensions>=4.0.0
Requires-Dist: jinja2>=3.1.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: sqlglot>=23.0.0; extra == "dev"
Requires-Dist: duckdb>=0.9.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Dynamic: license-file

# CIRCE Python Implementation

[![Python](https://img.shields.io/badge/python-3.8%2B-blue)](https://www.python.org/downloads/)
[![Tests](https://img.shields.io/badge/tests-3400%2B%20passed-brightgreen)](tests/)
[![Coverage](https://img.shields.io/badge/coverage-34%25-orange)](htmlcov/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![PyPI](https://img.shields.io/badge/PyPI-ohdsi--ohdsi-circepy-blue)](https://pypi.org/project/ohdsi-circepy/)

> [!CAUTION]
> **This project is currently under active testing and development.** It is a Python implementation of the OHDSI CIRCE-BE Java library. While we aim for 1:1 parity, this version is an Alpha release and should be used with caution in production environments.

A Python implementation of the OHDSI CIRCE-BE (Cohort Inclusion and Restriction Criteria Engine) for generating SQL queries from cohort definitions in the OMOP Common Data Model.

## Overview

CIRCE Python provides a comprehensive toolkit for working with OMOP CDM cohort definitions:

- **Cohort Definition Modeling**: Create and validate cohort expressions using Pydantic models
- **SQL Generation**: Generate SQL queries from cohort definitions for OMOP CDM v5.x
- **Concept Set Management**: Handle concepts and concept sets from OMOP vocabularies
- **Validation & Checking**: Comprehensive validation with 40+ checker implementations
- **Print-Friendly Output**: Generate human-readable markdown descriptions of cohort definitions
- **CLI Interface**: Command-line tools for validation, SQL generation, and markdown rendering

## Package Status

> [!IMPORTANT]
> This package is currently in **Alpha** status and undergoing rigorous parity testing against the Java implementation.

- **Version**: 0.1.0 (Alpha)
- **Tests**: 3,400+ passing
- **Coverage**: 34% (Core logic focus)
- **Python**: 3.8+
- **License**: Apache 2.0

## Installation

> [!NOTE]
> This package is currently in private development. Install from source using Git.

### From Source (Current Method)

```bash
# Clone the repository
git clone https://github.com/OHDSI/ohdsi-circepy.git
cd Circepy

# Install in development mode with all dependencies
pip install -e ".[dev]"

# Verify installation
circe --help
```

See [INSTALLATION.md](INSTALLATION.md) for detailed installation instructions, troubleshooting, and setup options.

### From PyPI (Coming Soon)

> ```bash
> # Coming in future release
> pip install ohdsi-circepy
> ```

## Quick Start

### Command-Line Interface

The easiest way to use CIRCE is through the command-line interface:

```bash
# Validate a cohort expression JSON file
circe validate cohort.json

# Generate SQL from a cohort expression
circe generate-sql cohort.json --output cohort.sql

# Render a cohort expression to markdown
circe render-markdown cohort.json --output cohort.md

# Process a cohort expression (validate, generate SQL, and render markdown)
circe process cohort.json --validate --sql --markdown
```

See the [CLI Documentation](#command-line-interface) section below for more details.

### Python API

```python
from circe import CohortExpression
from circe.cohortdefinition import PrimaryCriteria, ConditionOccurrence
from circe.cohortdefinition.core import ObservationFilter, ResultLimit
from circe.vocabulary import ConceptSet, ConceptSetExpression, ConceptSetItem, Concept

# Create a cohort expression
cohort = CohortExpression(
    title="Type 2 Diabetes Cohort",
    primary_criteria=PrimaryCriteria(
        criteria_list=[
            ConditionOccurrence(
                codeset_id=1,
                first=True
            )
        ],
        observation_window=ObservationFilter(prior_days=0, post_days=0),
        primary_limit=ResultLimit(type="All")
    ),
    concept_sets=[
        ConceptSet(
            id=1,
            name="Type 2 Diabetes",
            expression=ConceptSetExpression(
                items=[
                    ConceptSetItem(
                        concept=Concept(
                            concept_id=201826,
                            concept_name="Type 2 diabetes mellitus"
                        ),
                        include_descendants=True
                    )
                ]
            )
        )
    ]
)

# Generate SQL using the API
from circe.api import build_cohort_query
from circe.cohortdefinition import BuildExpressionQueryOptions

options = BuildExpressionQueryOptions()
options.cdm_schema = 'cdm'
options.vocabulary_schema = 'cdm'
options.cohort_id = 1
options.target_table = 'scratch.cohort'
sql = build_cohort_query(cohort, options)
print(sql)
```

## What's Included

This package provides a complete Python implementation of CIRCE-BE with:

- **3,400+ passing tests** with focused coverage on core logic
- **18+ SQL builders** for all OMOP CDM domains:
  - Condition Occurrence/Era
  - Drug Exposure/Era
  - Procedure Occurrence
  - Measurement, Observation
  - Visit Occurrence/Detail
  - Device Exposure, Specimen
  - Death, Location Region
  - Observation Period, Payer Plan Period
  - And more...
- **Full cohort expression validation** with comprehensive error checking
- **Markdown rendering** for human-readable cohort descriptions
- **Complete CLI interface** with 4 commands (validate, generate-sql, render-markdown, process)
- **Java interoperability** - supports both camelCase and snake_case field names for seamless Java CIRCE-BE compatibility

## ⚠️ Java Fidelity Requirement

**This project maintains 1:1 compatibility with Java CIRCE-BE.**

- All Python classes replicate Java functionality exactly
- Field names support both Java (camelCase) and Python (snake_case) formats
- SQL generation produces identical results to Java implementation
- All changes are validated against Java schema

See [JAVA_CLASS_MAPPINGS.md](JAVA_CLASS_MAPPINGS.md) for complete class mappings.

## Package Structure

```
circe/
├── cohortdefinition/          # Core cohort definition classes
│   ├── builders/              # SQL query builders (18+ builders)
│   ├── printfriendly/         # Human-readable markdown output
│   └── negativecontrols/      # Negative control generation
├── vocabulary/                # Concept and concept set management
├── check/                     # Validation and checking framework
│   ├── checkers/              # 40+ specific checker implementations
│   ├── operations/            # Check operations
│   ├── utils/                 # Check utilities
│   └── warnings/              # Warning classes
├── helper/                    # Utility helper classes
├── api.py                     # High-level API functions
└── cli.py                     # Command-line interface
```

## Features

### ✅ Implemented

- [x] Complete cohort definition data model with Pydantic validation
- [x] 18+ SQL builders covering all OMOP CDM domains
- [x] Comprehensive CLI interface (validate, generate-sql, render-markdown, process)
- [x] Java interoperability with camelCase/snake_case field support
- [x] Cohort expression validation with 40+ checker implementations
- [x] Markdown rendering for print-friendly descriptions
- [x] Full test suite (3,400+ tests)
- [x] Type hints throughout with py.typed marker
- [x] Concept set expression handling
- [x] Window criteria and correlated criteria support
- [x] Date adjustments and custom era strategies
- [x] Observation period and demographic criteria
- [x] Inclusion rules and censoring criteria


## Command-Line Interface

CIRCE provides a comprehensive command-line interface for validating, generating SQL, and rendering cohort expressions.

### Validate Command

Validate a cohort expression JSON file against the CIRCE standard:

```bash
circe validate cohort.json
```

Options:
- `--verbose, -v`: Display all validation warnings including INFO level
- `--quiet, -q`: Suppress non-error output

Exit codes:
- `0`: Valid (no errors or warnings)
- `1`: Invalid (errors found)
- `2`: Valid but has warnings

### Generate SQL Command

Generate SQL from a cohort expression:

```bash
# Output to stdout
circe generate-sql cohort.json

# Output to file
circe generate-sql cohort.json --output cohort.sql

# With custom schema names
circe generate-sql cohort.json --cdm-schema my_cdm --vocab-schema my_vocab --cohort-id 123
```

Options:
- `--output, -o`: Output SQL file path (default: stdout)
- `--sql-options`: JSON file with BuildExpressionQueryOptions
- `--cdm-schema`: CDM schema name (default: `@cdm_database_schema`)
- `--vocab-schema`: Vocabulary schema name (default: `@vocabulary_database_schema`)
- `--cohort-id`: Cohort ID for SQL generation
- `--validate`: Validate before generating SQL (default: True)
- `--no-validate`: Skip validation before generating SQL
- `--verbose, -v`: Verbose output
- `--quiet, -q`: Suppress non-error output

### Render Markdown Command

Render a cohort expression to human-readable markdown:

```bash
# Output to stdout
circe render-markdown cohort.json

# Output to file
circe render-markdown cohort.json --output cohort.md
```

Options:
- `--output, -o`: Output markdown file path (default: stdout)
- `--validate`: Validate before rendering markdown (default: True)
- `--no-validate`: Skip validation before rendering markdown
- `--verbose, -v`: Verbose output
- `--quiet, -q`: Suppress non-error output

### Process Command

Process a cohort expression with multiple operations:

```bash
# Validate, generate SQL, and render markdown
circe process cohort.json --validate --sql --markdown

# Generate SQL with custom output file
circe process cohort.json --sql output.sql

# Generate SQL and markdown with default file names
circe process cohort.json --sql --markdown
```

Options:
- `--validate`: Validate the cohort expression
- `--sql [FILE]`: Generate SQL (optionally specify output file, default: input file with .sql extension)
- `--markdown [FILE]`: Render markdown (optionally specify output file, default: input file with .md extension)
- `--sql-options`: JSON file with BuildExpressionQueryOptions
- `--cdm-schema`: CDM schema name (default: `@cdm_database_schema`)
- `--vocab-schema`: Vocabulary schema name (default: `@vocabulary_database_schema`)
- `--cohort-id`: Cohort ID for SQL generation
- `--verbose, -v`: Verbose output
- `--quiet, -q`: Suppress non-error output

### CLI Examples

```bash
# Validate a cohort expression
circe validate my_cohort.json

# Generate SQL with custom schema
circe generate-sql my_cohort.json --output my_cohort.sql \
    --cdm-schema my_cdm_schema \
    --vocab-schema my_vocab_schema \
    --cohort-id 1

# Generate SQL and markdown in one command
circe process my_cohort.json --sql --markdown

# Validate, generate SQL, and render markdown
circe process my_cohort.json --validate --sql my_cohort.sql --markdown my_cohort.md
```

## Development

### Setup Development Environment

```bash
# Clone the repository
git clone https://github.com/OHDSI/Circepy.git
cd Circepy

# Install with development dependencies
pip install -e ".[dev]"

# Verify installation
pytest --version
circe --help
```

### Running Tests

```bash
pytest
```

All 3,400+ tests should pass.

### Code Formatting

```bash
black circe/
isort circe/
```

### Type Checking

```bash
mypy circe/
```

## Compatibility Notes

This implementation is designed to be compatible with OHDSI CIRCE-BE Java version. The Python package:

- Accepts JSON cohort definitions from OHDSI Atlas and other tools
- Generates SQL identical to the Java implementation
- Supports all OMOP CDM v5.x versions
- Maintains field name compatibility (camelCase and snake_case)

## Troubleshooting

### Import Errors

If you encounter import errors, ensure the package is properly installed:

```bash
pip install --upgrade ohdsi-circepy
```

### SQL Generation Issues

- Verify your cohort expression JSON is valid using `circe validate`
- Check that all concept IDs reference valid OMOP concepts
- Ensure schema names are correctly specified

### Performance Considerations

For large cohort definitions with many criteria:

- SQL generation typically completes in < 1 second
- Validation runs in < 500ms for most cohorts
- Memory usage scales with the number of criteria (typically < 100MB)

## FAQ

**Q: Is this compatible with OHDSI Atlas?**
A: Yes, this package can process cohort definition JSON files exported from Atlas.

**Q: Can I use this with CDM v5.3?**
A: Yes, the package supports all OMOP CDM v5.x versions.

**Q: How do I convert camelCase JSON to Python?**
A: The package automatically handles both camelCase and snake_case field names.

**Q: Does this replace the Java CIRCE-BE?**
A: No, this is a complementary Python implementation. Both produce identical SQL output.

## Contributing

Contributions are welcome! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.

Key areas for contribution:
- Additional test coverage
- Performance optimizations
- Documentation improvements
- Bug fixes and issue reports

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

This project is based on the Java CIRCE-BE implementation by the OHDSI community. We thank all contributors to the original Java implementation.

Special thanks to:
- The OHDSI community for their continued support
- Contributors to the Java CIRCE-BE project
- The Pydantic team for their excellent validation library

## Support

- **Repository**: https://github.com/OHDSI/ohdsi-circepy
- **Issues**: https://github.com/OHDSI/ohdsi-circepy/issues
- **Installation Guide**: [INSTALLATION.md](INSTALLATION.md)
- **PyPI**: https://pypi.org/project/ohdsi-circepy/ (coming soon)
- **Documentation**: https://ohdsi-circepy.readthedocs.io/ (coming soon)

## Related Projects

- [OHDSI CIRCE-BE (Java)](https://github.com/OHDSI/circe-be) - Original Java implementation
- [OHDSI Common Data Model](https://github.com/OHDSI/CommonDataModel) - OMOP CDM specification
- [OHDSI Atlas](https://github.com/OHDSI/Atlas) - Web-based cohort definition tool
- [OHDSI WebAPI](https://github.com/OHDSI/WebAPI) - RESTful API for OHDSI tools
