Metadata-Version: 2.4
Name: commcp
Version: 1.0.0
Summary: Conformal Model Moderation & Human-in-the-Loop Routing Python Library
Author: Raghottam Nadgoudar
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: openai>=1.0.0
Dynamic: license-file

# commCP 🛡️
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**commCP** (Conformal Model Moderation & Human-in-the-Loop Routing) is a post-training wrapper for binary classification estimators. It combines **Conformal Prediction**, which provides distribution-free coverage guarantees, with **LLM Refereeing** to decide when a prediction can be auto-accepted and when it should be escalated for human review.

Inspired by statistics-focused tools such as MAPIE, commCP pairs conformal coverage guarantees with LLM-based verification.

---

## Features
- **Statistical Coverage Guarantees**: Targets a coverage level of $1 - \alpha$ (an error rate of at most $\alpha$) via conformal calibration.
- **Selective Prediction / HITL**: Automatically routes predictions into `auto_decided` or `escalated` queues.
- **LLM-as-a-Referee**: Resolves ensemble disagreements and conformal "gray-zone" cases where the model is uncertain.
- **Cost-Optimized**: Bypasses the LLM completely for obvious acceptances or low-confidence/high-risk rejections, keeping API costs to a minimum.
- **Seamless scikit-learn Compatibility**: Works with any estimator exposing a `predict_proba` method (e.g., `LogisticRegression`, `RandomForestClassifier`, XGBoost's `XGBClassifier`).
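
The coverage target follows the standard split-conformal recipe. The formulas below sketch that recipe, assuming the common nonconformity score $s(x, y) = 1 - \hat{p}(y \mid x)$ computed from `predict_proba`; the score commCP actually uses is not documented here. With $n$ calibration points, the threshold $\hat{q}$ is the $k$-th smallest calibration score, and a label enters the prediction set $C(x)$ when its score does not exceed $\hat{q}$:

```math
\begin{aligned}
s_i &= 1 - \hat{p}(y_i \mid x_i), \qquad i = 1, \dots, n \\
\hat{q} &= s_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil \\
C(x) &= \{\, y : 1 - \hat{p}(y \mid x) \le \hat{q} \,\} \\
\Pr\big(Y_{n+1} \in C(X_{n+1})\big) &\ge 1 - \alpha
\end{aligned}
```

Here $s_{(k)}$ is the $k$-th smallest calibration score; if $k > n$ (too few calibration points for the chosen $\alpha$), $\hat{q} = \infty$ and every label is admitted. The guarantee is marginal, averaged over calibration and test draws, and assumes calibration and test data are exchangeable.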

---

## Installation

```bash
# Install from source (or PyPI once published)
pip install .
```

---

## Quick Start Guide

### 1. Train Your Classifier
```python
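from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from commcp import CommCP

# Placeholder data (swap in your own): disjoint train / calibration / test splits
X, y = make_classification(n_samples=2000, n_features=13, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
# Train a standard sklearn classifier
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)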
```

### 2. Wrap and Calibrate commCP
Initialize `CommCP` with your trained estimator, a task description, and class labels; the LLM referee needs the description and labels to build informative prompts. Then pass a held-out calibration set, disjoint from the training data, to compute the conformal threshold.

```python
# Initialize commcp wrapper (configured for significance level alpha=0.05 -> 95% coverage)
ccp = CommCP(
    estimator=model,
    task_description="Predict whether a patient has heart disease based on clinical features",
    class_labels={0: "Healthy", 1: "Heart Disease Present"},
    alpha=0.05,
    llm_provider="groq", # Supports "groq" or "openai"
    verify_margin=0.15   # Gray-zone width: predictions within 0.15 of the conformal cutoff go to the LLM referee
)

# Calibrate
ccp.calibrate(X_calib, y_calib)
```
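
Conceptually, `calibrate` turns the held-out calibration set into a single threshold. The function below is a minimal split-conformal sketch of that step, not commCP's implementation: `conformal_threshold` is a hypothetical name, and the score $1 - \hat{p}(y \mid x)$ is an assumption. It expects the output of `predict_proba` and integer labels that index its columns, which holds for labels `{0, 1}`.

```python
import math

import numpy as np


def conformal_threshold(calib_probs, y_calib, alpha=0.05):
    """Hypothetical split-conformal threshold from calibration probabilities.

    calib_probs: shape (n, n_classes), e.g. model.predict_proba(X_calib)
    y_calib: integer class indices that index the columns of calib_probs
    """
    calib_probs = np.asarray(calib_probs, dtype=float)
    y_calib = np.asarray(y_calib, dtype=int)
    n = len(y_calib)
    # Assumed nonconformity score: 1 - probability assigned to the true class
    scores = 1.0 - calib_probs[np.arange(n), y_calib]
    # Finite-sample corrected rank k = ceil((n + 1) * (1 - alpha))
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is admitted
        return float("inf")
    # k-th smallest score (1-indexed), i.e. position k - 1 after sorting
    return float(np.sort(scores)[k - 1])
```

For example, `conformal_threshold(model.predict_proba(X_calib), y_calib, alpha=0.05)` would return the score cutoff for a 95% coverage target. The $(n+1)$ correction in the rank is what lets the guarantee hold for a finite calibration set.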

### 3. Predict & Moderate
Run predictions on test data. CommCP applies conformal gating, queries the LLM referee on borderline cases, and partitions the results into auto-decided and escalated queues.

```python
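# Run predictions
# text_descriptions: your own natural-language descriptions of the test records (placeholder)
results = ccp.predict(
    X_test,
    text_dossiers=text_descriptions,  # Optional dossiers for LLM inspection; omit if you have none
)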

# Get automation and routing results
print(f"Automation rate: {results.automation_rate:.2%}")

# Access lists of auto-decided and escalated records
auto_cases = results.auto_decided  # list of dicts
human_queue = results.escalated    # list of dicts
```
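
The routing step can be pictured as a three-way split. The sketch below is hypothetical: it assumes the score $1 - \hat{p}(y \mid x)$, treats `verify_margin` as an absolute distance in score units, and uses a string label in place of the actual LLM call. A singleton prediction set is auto-decided, an empty set whose best label misses the cutoff by at most `verify_margin` goes to the LLM referee, and everything else is escalated.

```python
import numpy as np


def route_predictions(probs, qhat, verify_margin=0.15):
    """Hypothetical three-way router over binary predict_proba output.

    Returns "auto", "llm", or "escalate" for each row.
    """
    routes = []
    for p in np.asarray(probs, dtype=float):
        scores = 1.0 - p                  # assumed nonconformity score per label
        set_size = int((scores <= qhat).sum())
        if set_size == 1:
            routes.append("auto")         # singleton set: confident enough, skip the LLM
        elif set_size == 0 and scores.min() <= qhat + verify_margin:
            routes.append("llm")          # gray zone: best label just misses the cutoff
        else:
            routes.append("escalate")     # ambiguous (both labels) or clearly low confidence
    return routes


print(route_predictions([[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]], qhat=0.3))
# ['auto', 'llm', 'escalate']
```

Keeping the LLM out of both the "auto" and the clear "escalate" branches is what limits API calls to the narrow gray zone.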

### 4. Evaluate Guarantees
Check whether the target coverage level was met empirically:
```python
empirical_coverage = results.coverage(y_test)
print(f"Empirical Coverage: {empirical_coverage:.2%}") # Expect roughly >= 95%; the guarantee holds on average, so a finite test set can dip slightly below
```
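
Empirical coverage is the fraction of test points whose true label falls inside the conformal prediction set. The helpers below sketch that computation, assuming the score $1 - \hat{p}(y \mid x)$; how `results.coverage` treats escalated records is not specified here, so read this as an illustration rather than the library's definition. Both function names are hypothetical.

```python
import numpy as np


def prediction_sets(probs, qhat):
    """Labels whose assumed score 1 - p(y|x) is within the conformal threshold."""
    return [set(np.flatnonzero(1.0 - p <= qhat).tolist())
            for p in np.asarray(probs, dtype=float)]


def empirical_coverage(pred_sets, y_true):
    """Fraction of rows whose true label is contained in its prediction set."""
    hits = [int(label) in s for s, label in zip(pred_sets, y_true)]
    return float(np.mean(hits))


sets = prediction_sets([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.5, 0.5]], qhat=0.45)
print(sets)                                    # [{0}, {1}, {0}, set()]
print(empirical_coverage(sets, [0, 1, 1, 0]))  # 0.5
```

An empty set never contains the true label, so it always counts as a miss.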

Examine system performance details:
```python
print(results.stats(y_test))
```
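
For a human-in-the-loop system, the metric that usually matters alongside the automation rate is accuracy on the cases that never reach a reviewer. The hypothetical helper below computes both from arrays of predictions, labels, and an "auto-decided" mask; the fields reported by `results.stats` may differ.

```python
import numpy as np


def selective_stats(y_pred, y_true, auto_mask):
    """Hypothetical summary of an auto-decided vs. escalated split."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    auto_mask = np.asarray(auto_mask, dtype=bool)
    n_auto = int(auto_mask.sum())
    return {
        "automation_rate": n_auto / len(y_true),
        "n_escalated": len(y_true) - n_auto,
        # Accuracy restricted to rows no human reviews (NaN if none were automated)
        "auto_accuracy": (float(np.mean(y_pred[auto_mask] == y_true[auto_mask]))
                          if n_auto else float("nan")),
    }


print(selective_stats([1, 0, 1, 1], [1, 0, 0, 1], [True, True, False, True]))
# {'automation_rate': 0.75, 'n_escalated': 1, 'auto_accuracy': 1.0}
```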

---

## Customizing Gating Logic

CommCP dynamically adjusts its gating based on your model architecture:
* **Single Models**: Uses **Gray-Zone Gating**. Calls the LLM referee only when confidence is close to, but below, the conformal cutoff.
* **Ensembles**: Uses **Disagreement Gating**. Automatically inspects ensemble consensus and calls the LLM to referee conflicting model predictions.
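
Disagreement gating needs some measure of how strongly ensemble members agree. One simple choice, sketched below as an assumption rather than commCP's rule, is the share of trees in a fitted scikit-learn `RandomForestClassifier` (available through its `estimators_` attribute) that vote for the majority class; `disagreement_mask` and `min_agreement` are hypothetical names. Rows below the agreement threshold would be the ones sent to the LLM referee.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def disagreement_mask(forest, X, min_agreement=0.8):
    """Flag rows whose majority vote is backed by fewer than min_agreement of the trees.

    Assumes a binary problem with labels {0, 1}; trees inside a fitted forest
    predict encoded class indices, which coincide with these labels.
    """
    X = np.asarray(X)
    # One row of votes per tree: shape (n_trees, n_samples)
    votes = np.stack([tree.predict(X) for tree in forest.estimators_])
    share_class1 = (votes == 1).mean(axis=0)
    # Agreement = share of trees voting for whichever class won the vote
    agreement = np.maximum(share_class1, 1.0 - share_class1)
    return agreement < min_agreement


# Example: fit a small forest on synthetic data and count contested rows
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
contested = disagreement_mask(forest, X[:50])
print(f"{contested.sum()} of 50 rows would go to the LLM referee")
```

For other ensembles, the same idea applies by comparing the predictions of the individual member models.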

---

## License
Licensed under the [MIT License](LICENSE).
