Metadata-Version: 2.4
Name: ai-critic
Version: 3.5.0
Summary: Graph-based evaluation engine for machine learning models
Author: Luiz Filipe Seabra
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy<2.3,>=1.23
Requires-Dist: scikit-learn<2.0,>=1.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Provides-Extra: viz
Requires-Dist: graphviz; extra == "viz"
Provides-Extra: full
Requires-Dist: graphviz; extra == "full"
Requires-Dist: pandas; extra == "full"

# 🚀 AI Critic 3.5.0 (Production Readiness Engine)

```bash
pip install ai-critic
```

**AI Critic** is a **graph-based evaluation engine for machine learning models**, designed to go beyond isolated metrics.

It runs a **structured evaluation pipeline** that analyzes multiple dimensions — performance, robustness, explainability, data quality, and structure — delivering a **unified, interpretable, and actionable report**.

---

# 🔥 WHAT’S NEW IN 3.5.0

### 🧠 Production-First Design

* One-line evaluation: `evaluate()`
* Simplified API for fast adoption
* Built for **real-world deployment decisions**

---

### ⚡ Standard Usage (NEW)

AI Critic is now designed to be used **right after training**:

```python
import ai_critic

report = ai_critic.evaluate(model, X, y)
```

---

### 🚫 Quality Gate (NEW — CRITICAL)

Turn evaluation into a **deployment decision**:

```python
from ai_critic import evaluate
from ai_critic.gate import enforce

report = evaluate(model, X, y)

enforce(report, threshold=75)
```

If the report does not meet the threshold → **deployment is blocked**.
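
What "blocked" looks like in code depends on the gate. The sketch below shows one plausible behavior and is not the actual `ai_critic.gate.enforce` implementation. It assumes the report exposes `risk["global_score"]` on a 0–100 scale where higher is better (in line with the sample report, where 78.5 maps to `usable_with_caution`). `QualityGateError` and `enforce_sketch` are hypothetical names.

```python
class QualityGateError(RuntimeError):
    """Hypothetical exception raised when a report fails the gate."""


def enforce_sketch(report: dict, threshold: float = 75) -> float:
    """Return the global score, or raise if it is below the threshold.

    Assumption: report["risk"]["global_score"] is 0-100, higher is better.
    """
    score = report["risk"]["global_score"]
    if score < threshold:
        raise QualityGateError(
            f"global score {score:.1f} is below threshold {threshold}"
        )
    return score


passing = {"risk": {"global_score": 78.5}}
failing = {"risk": {"global_score": 61.0}}

print(enforce_sketch(passing, threshold=75))  # passes the gate

try:
    enforce_sketch(failing, threshold=75)
except QualityGateError as exc:
    print(f"blocked: {exc}")
```

Raising an exception is a convenient design for pipelines: an unhandled error stops the script, so a deploy step placed after the gate never runs.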

---

### 📦 Standardized Report (JSON-first)

All results follow the same schema:

```python
report = {
    "scores": {},        # technical scores (0–1)
    "details": {},       # raw evaluator outputs
    "risk": {},          # interpretable score (0–100)
    "summary": {},       # human-readable insights
    "suggestions": []    # recommended actions
}
```

👉 This makes AI Critic:

* API-ready
* Easy to log and persist
* Production-ready
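
As a rough illustration of "easy to log and persist", the sketch below serializes a report with the five top-level keys from the schema using only the standard `json` module. The values are made up, and the `default=float` fallback is an assumption that evaluator outputs may contain NumPy scalars; it is not taken from AI Critic's code.

```python
import json

# Sample report with the five top-level keys of the schema; values are made up.
report = {
    "scores": {"performance": 0.91, "robustness": 0.74},
    "details": {"performance": {"accuracy": 0.91}},
    "risk": {"global_score": 78.5, "verdict": "usable_with_caution"},
    "summary": {"executive_summary": {"deploy_recommended": False}},
    "suggestions": ["Check for data leakage"],
}


def to_json(report: dict) -> str:
    # default=float is a fallback for values json cannot encode natively,
    # e.g. NumPy float32 scalars (assumption: such values may appear in details).
    return json.dumps(report, indent=2, sort_keys=True, default=float)


serialized = to_json(report)
restored = json.loads(serialized)  # round-trips back to an equal dict
print(sorted(restored))
```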

---

### ⚡ Improved Graph Engine

* Dependency-aware execution (topological sort)
* Parallel execution support
* Deterministic evaluation order
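
To make the combination of dependency-aware and parallel execution concrete, here is a minimal sketch built on the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not AI Critic's engine: the dependency map, `run_node`, and `run_in_waves` are hypothetical stand-ins. Nodes whose dependencies are all satisfied form a "wave" and can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: node -> set of nodes it depends on.
dependencies = {
    "performance": set(),
    "data_quality": set(),
    "robustness": {"performance"},
    "explainability": {"robustness"},
}


def run_node(name: str) -> str:
    # Stand-in for a real evaluator call.
    return f"{name}:done"


def run_in_waves(deps):
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves, results = [], {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            # Sort the ready nodes so each wave is listed in a stable order.
            ready = sorted(sorter.get_ready())
            # Nodes in one wave have no unmet dependencies: run them concurrently.
            for name, output in zip(ready, pool.map(run_node, ready)):
                results[name] = output
            sorter.done(*ready)
            waves.append(ready)
    return waves, results


waves, results = run_in_waves(dependencies)
print(waves)
```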

---

### 🎯 Multi-layer Scoring System

* **Technical score (0–1)** → aggregation layer
* **Risk score (0–100)** → decision layer

---

### 💡 Integrated Suggestion Engine

* Automatically generates recommendations based on model behavior

---

### 🧩 Plugin System

* Clean evaluator interface
* Dependency-aware plugins
* Easily extensible evaluation pipeline

---

# ⚡ QUICK START

## 🧠 One-liner (recommended)

```python
import ai_critic

report = ai_critic.evaluate(model, X, y)

print(report["risk"])
print(report["summary"])
```

---

## 🔐 Production usage (recommended)

```python
from ai_critic import evaluate
from ai_critic.gate import enforce

report = evaluate(model, X, y)

# 🚫 blocks bad models
enforce(report, threshold=75)
```

---

## 🧪 Full control (advanced)

```python
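from api.client import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load a small example dataset
data = load_iris()
X, y = data.data, data.target

# Train the model to be evaluated
model = RandomForestClassifier().fit(X, y)

# Per-evaluator weights: robustness counts 1.5x as much as performance
critic = AICritic(weights={
    "performance": 1.0,
    "robustness": 1.5
})

# parallel=True enables parallel execution of the evaluation graph
report = critic.evaluate(model, X, y, parallel=True)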
```

---

# 🧩 INTERNAL PIPELINE

```text
evaluate()
   ↓
EvaluationGraph (nodes)
   ↓
raw_results
   ↓
ScoreAggregator (0–1)
   ↓
build_report()
   ↓
scoring.py (risk 0–100)
   ↓
summary.py (human-readable)
   ↓
SuggestionEngine
```

---

# 🧱 CORE COMPONENTS

## 1. Evaluation Graph

A DAG-based execution system:

* Automatically resolves dependencies
* Executes nodes in dependency order
* Enables parallel execution

Example:

```text
performance → robustness → explainability
```
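
For the chain above, a deterministic order can be derived with Kahn's algorithm. The sketch below is a generic illustration that also runs on Python 3.8, not the library's own resolver. `topological_order` is a hypothetical helper, and it assumes every node appears as a key in the dependency map.

```python
import heapq


def topological_order(deps):
    """Kahn's algorithm; ties are broken alphabetically for a stable order.

    deps maps each node to the list of nodes it depends on.
    """
    remaining = {node: len(parents) for node, parents in deps.items()}
    children = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)

    # Start with nodes that have no dependencies.
    ready = [node for node, count in remaining.items() if count == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, child)

    # Nodes left unvisited can only be part of a cycle.
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


deps = {
    "explainability": ["robustness"],
    "robustness": ["performance"],
    "performance": [],
}
print(topological_order(deps))  # ['performance', 'robustness', 'explainability']
```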

---

## 2. Score Aggregator

Combines evaluator outputs into a single technical score (0–1), using per-evaluator weights:

```python
critic = AICritic(weights={
    "performance": 1.0,
    "robustness": 2.0
})
```
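
The aggregation formula itself is not documented here. One common choice is a weighted mean, sketched below under that assumption. `weighted_score` and its `default_weight` parameter are hypothetical and do not come from AI Critic's API.

```python
def weighted_score(scores, weights, default_weight=1.0):
    """Weighted mean of 0-1 scores; missing weights fall back to default_weight."""
    total_weight = sum(weights.get(name, default_weight) for name in scores)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(
        value * weights.get(name, default_weight) for name, value in scores.items()
    )
    return weighted_sum / total_weight


scores = {"performance": 0.90, "robustness": 0.60}
weights = {"performance": 1.0, "robustness": 2.0}

# (0.90 * 1.0 + 0.60 * 2.0) / 3.0
print(round(weighted_score(scores, weights), 3))  # 0.7
```

With equal weights the same scores would average to 0.75, so doubling the robustness weight pulls the result toward the weaker component.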

---

## 3. Evaluator Plugins

Fully extensible via plugins:

```python
from ai_critic.plugins.base import EvaluatorPlugin
from ai_critic.plugins.registry import EvaluatorRegistry

class FairnessEvaluator(EvaluatorPlugin):
    name = "fairness"
    dependencies = ["performance"]
    weight = 1.0

    def evaluate(self, model, dataset, context=None):
        return {
            "score": 0.92,
            "verdict": "stable",
            "message": "Fairness is acceptable"
        }

EvaluatorRegistry.register(FairnessEvaluator())
```

---

## 4. Risk Scoring (0–100)

Transforms technical signals into decision-ready output:

```python
report["risk"] = {
    "global_score": 78.5,
    "verdict": "usable_with_caution",
    "component_scores": {...},
    "penalties": [...]
}
```
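
How the 0–100 score is derived is not spelled out here, so the following is only a sketch of one possible mapping. It scales 0–1 component scores to 0–100, averages them, and subtracts flat penalties. The `(reason, points)` penalty format, the clamping, and the `to_risk_report` name are all assumptions made for illustration.

```python
def to_risk_report(component_scores, penalties=()):
    """Scale 0-1 component scores to 0-100 and subtract flat penalties.

    Hypothetical format: penalties are (reason, points) pairs.
    The global score is clamped to the 0-100 range.
    """
    if not component_scores:
        raise ValueError("at least one component score is required")
    scaled = {name: round(value * 100, 1) for name, value in component_scores.items()}
    base = sum(scaled.values()) / len(scaled)
    deducted = sum(points for _, points in penalties)
    global_score = max(0.0, min(100.0, base - deducted))
    return {
        "global_score": round(global_score, 1),
        "component_scores": scaled,
        "penalties": [reason for reason, _ in penalties],
    }


risk = to_risk_report(
    {"performance": 0.90, "robustness": 0.77},
    penalties=[("possible data leakage", 5.0)],
)
# (90.0 + 77.0) / 2 - 5.0
print(risk["global_score"])  # 78.5
```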

---

## 5. Human Summary

High-level interpretation:

```python
report["summary"] = {
    "executive_summary": {
        "verdict": "⚠️ Risky",
        "deploy_recommended": False
    }
}
```

---

## 6. Suggestion Engine

Actionable insights:

```python
[
    "Check for data leakage",
    "Improve robustness with regularization"
]
```
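
A suggestion list like this could come from simple threshold rules. The sketch below shows that idea only. The rule table, the thresholds, and the pairing of each suggestion with a component are hypothetical, not AI Critic's actual logic.

```python
# Hypothetical rules: (component, minimum acceptable score, suggestion text).
RULES = [
    ("robustness", 0.70, "Improve robustness with regularization"),
    ("data_quality", 0.80, "Check for data leakage"),
]


def suggest(scores, rules=RULES):
    """Return the suggestion of every rule whose component scores below its minimum."""
    return [
        text
        for component, minimum, text in rules
        if component in scores and scores[component] < minimum
    ]


print(suggest({"performance": 0.93, "robustness": 0.55, "data_quality": 0.85}))
# ['Improve robustness with regularization']
```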

---

# 🖥️ CLI

Run directly from the terminal:

```bash
ai-critic --model model.pkl --data dataset.csv --target label
```

### 🔥 CI/CD Mode (recommended)

```bash
ai-critic --model model.pkl --data dataset.csv --target label --fail-on-risk
```

👉 The command fails automatically if model risk is too high.

---

# 🧠 DESIGN PHILOSOPHY

### 1. Single Source of Truth

One unified data format → no inconsistencies

---

### 2. Graph-first Thinking

Evaluation is a dependency-driven pipeline, not isolated functions

---

### 3. JSON-native

Everything is ready for:

* APIs
* dashboards
* logging
* SaaS platforms

---

### 4. Actionable AI

Not just metrics — decisions:

* Should you deploy?
* Where is the risk?
* What should be improved?

---

# 🔥 POSITIONING

AI Critic is not just a metrics library.

It is a:

> 🧠 **Production gatekeeper for machine learning models**

---

# 🚀 ROADMAP

* REST API (`/evaluate`)
* Visual dashboard
* Model monitoring (post-deployment)
* Continuous evaluation (CI/CD)
* Global benchmarking between models

---

# 📄 LICENSE

MIT License
