Metadata-Version: 2.4
Name: ai-critic
Version: 2.1.0
Summary: Fast AI evaluator for scikit-learn models
Author-email: Luiz Seabra <filipedemarco@yahoo.com>
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: scikit-learn

# ai-critic 🧠

## The Quality Gate for Machine Learning Models

**ai-critic** is an **intelligent evaluation and decision system** designed to determine whether a machine learning model is **safe, reliable, and trustworthy enough** to be deployed in real-world environments.

Unlike traditional ML evaluation tools that focus almost exclusively on *performance metrics*, **ai-critic acts as a Quality Gate** — a final checkpoint that actively probes models to uncover **hidden risks** that frequently cause silent failures in production.

> **ai-critic does not ask *“How accurate is this model?”***
> It asks ***“Can this model be trusted in the real world?”***

---

## 🎯 Why ai-critic Exists

Most production ML failures are **not accuracy problems**.

They are caused by:

* Data leakage hidden inside features
* Overfitting disguised as strong validation scores
* Models that collapse under small noise
* Fragile dependency on a single feature
* Structurally unsafe configurations

These failures usually appear **after deployment**, when they are already expensive — or dangerous — to fix.

**ai-critic exists to detect these risks *before* deployment.**

---

## 🚀 Installation

Install directly from PyPI:

```bash
pip install ai-critic
```

Python **3.9+** is required.

---

## ⚡ Quick Start (Fast Verdict)

If you want a **clear, conservative deployment recommendation**, this is all you need.

```python
from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    random_state=42
)

model = RandomForestClassifier(
    max_depth=5,
    random_state=42
)

critic = AICritic(model, X, y)

report = critic.evaluate(view="executive")

print(report)
```

### Example Output

```text
Verdict: ⚠️ Risk Detected
Risk Level: medium
Deploy Recommended: False
Main Reason: Structural or robustness risks detected
```

> **If ai-critic approves deployment, it means that none of its independent checks detected a meaningful risk.**

The system is intentionally **skeptical by design**.

---

## 🧭 What Does the Verdict Mean?

| Field                | Meaning                     |
| -------------------- | --------------------------- |
| `verdict`            | Human-readable summary      |
| `risk_level`         | low / medium / high         |
| `deploy_recommended` | Final quality gate decision |
| `main_reason`        | Primary blocking factor     |

The verdict is deliberately short: four fields, each with a single meaning.

---

## 🧠 How ai-critic Thinks (Core Concept)

**ai-critic is not a metric calculator.**
It is a **decision system**.

Internally, it works in three layers:

1. **Evaluators** → Detect signals and risks
2. **Critic Gate** → Decide if intervention is needed
3. **Deployment Policy** → Decide if deployment is safe
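
The three layers can be pictured with a small, self-contained sketch. Every name in it (`Signal`, `critic_gate`, `deployment_policy`) and every threshold is hypothetical and chosen for illustration only; it is not ai-critic's internal code.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """One finding emitted by an evaluator (hypothetical structure)."""
    pillar: str
    severity: float  # 0.0 = no concern, 1.0 = critical
    message: str


def critic_gate(signals, threshold=0.3):
    """Layer 2: keep only signals strong enough to warrant intervention."""
    return [s for s in signals if s.severity >= threshold]


def deployment_policy(blocking):
    """Layer 3: map the surviving signals to a deployment decision."""
    if not blocking:
        return {"risk_level": "low", "deploy_recommended": True, "main_reason": None}
    worst = max(blocking, key=lambda s: s.severity)
    risk = "high" if worst.severity >= 0.7 else "medium"
    return {
        "risk_level": risk,
        "deploy_recommended": False,
        "main_reason": worst.message,
    }


# Layer 1 (the evaluators) would normally produce these signals.
signals = [
    Signal("robustness", 0.5, "Accuracy drops sharply under input noise"),
    Signal("structure", 0.1, "Tree depth is moderate"),
]

decision = deployment_policy(critic_gate(signals))
print(decision)
```

In this sketch the weak structural signal is filtered out by the gate, and the robustness signal alone is enough to block deployment.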

---

## 🧱 The Four Pillars of the Audit

ai-critic evaluates models across **four independent risk dimensions**:

| Pillar                | Detects                          | Why It Matters      |
| --------------------- | -------------------------------- | ------------------- |
| 📊 Data Integrity     | Leakage, shortcuts, correlations | Inflated metrics    |
| 🧠 Model Structure    | Over-complexity, unsafe configs  | Poor generalization |
| 📈 Performance Sanity | Suspicious CV behavior           | False confidence    |
| 🧪 Robustness         | Noise sensitivity                | Production collapse |

Each pillar emits **signals**, not binary judgments.

Those signals are aggregated by the **Critic Gate**.
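
To make the idea of a graded signal concrete, here is a minimal sketch of one possible Data Integrity check: the absolute correlation between each feature and the target. The helper `leakage_signals` and the deliberately leaky column are assumptions made for this example; ai-critic's own checks may work differently.

```python
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Append a deliberately leaky column: the target plus a little noise.
leaky = y + rng.normal(scale=0.05, size=y.shape)
X_leaky = np.column_stack([X, leaky])


def leakage_signals(X, y):
    """Return |Pearson correlation| between each feature and the target."""
    return np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )


scores = leakage_signals(X_leaky, y)
suspect = int(np.argmax(scores))  # index of the most target-like feature

print("most suspicious feature:", suspect)
print("correlation scores:", scores.round(2))
```

The output is a score per feature rather than a pass/fail flag, which leaves the decision about what counts as "too correlated" to a later aggregation step.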

---

## 🧪 Robustness Testing (Noise Injection)

Production data is never clean.

ai-critic injects controlled noise into inputs and measures degradation:

```python
robustness = report["details"]["robustness"]

print(robustness["performance_drop"])
print(robustness["verdict"])
```

Possible outcomes:

* `stable` → acceptable degradation
* `fragile` → high sensitivity
* `misleading` → likely inflated performance
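
The idea behind noise injection can be sketched with plain NumPy and scikit-learn. The noise level (10% of each feature's standard deviation) and the 0.05 threshold are assumptions for illustration, not ai-critic's actual settings, and the `misleading` outcome is not modeled here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

# Gaussian noise scaled per feature (assumed noise level: 10% of the std).
rng = np.random.default_rng(42)
noise_scale = 0.1 * X_train.std(axis=0)
X_noisy = X_test + rng.normal(size=X_test.shape) * noise_scale

clean_score = model.score(X_test, y_test)
noisy_score = model.score(X_noisy, y_test)
performance_drop = clean_score - noisy_score

# Assumed threshold: tolerate up to 5 accuracy points of degradation.
verdict = "stable" if performance_drop <= 0.05 else "fragile"

print(f"clean={clean_score:.3f} noisy={noisy_score:.3f} drop={performance_drop:.3f}")
print(verdict)
```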

---

## 🔍 Explainability & Feature Sensitivity

ai-critic performs **feature sensitivity analysis** to detect:

* Feature-level leakage
* Over-reliance on a single signal
* Shortcut learning

How it works:

1. A feature is perturbed or permuted
2. The model is re-evaluated
3. Performance drop is measured

Large drops indicate **critical dependency**.

This approach is:

* Model-agnostic
* Lightweight
* Interpretable
* Framework-independent
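
The three steps above can be sketched as a simple permutation loop. This is an illustrative version, with `sensitivity` as a hypothetical helper name; scikit-learn's `sklearn.inspection.permutation_importance` provides a repeated, more robust variant of the same technique.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=600, n_features=8, n_informative=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)


def sensitivity(model, X, y, seed=0):
    """Score drop caused by permuting each feature, one at a time."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = []
    for j in range(X.shape[1]):
        X_perm = X.copy()                              # leave the input untouched
        X_perm[:, j] = rng.permutation(X_perm[:, j])   # 1. permute one feature
        permuted_score = model.score(X_perm, y)        # 2. re-evaluate the model
        drops.append(baseline - permuted_score)        # 3. measure the drop
    return np.array(drops)


drops = sensitivity(model, X_te, y_te)
print("drop per feature:", drops.round(3))
print("most critical feature:", int(np.argmax(drops)))
```

A single feature whose drop dwarfs all others is the pattern worth investigating for leakage or shortcut learning.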

---

## 🧠 Recommendations Engine

ai-critic does not stop at *“deploy or not”*.

It generates **actionable recommendations**, such as:

* Reduce model complexity
* Increase regularization
* Investigate possible data leakage
* Reduce sensitivity to input noise
* Address structural overfitting signals

These recommendations are **rule-based and data-driven**, not LLM hallucinations.

---

## 🚦 Deployment Decision

The final decision is produced via:

```python
decision = critic.deploy_decision()

print(decision)
```

Output includes:

* Deployment approval or rejection
* Risk level
* ML confidence score
* Blocking issues
* Recommendations

---

## 🧠 Critic Gate (New)

The **Critic Gate** decides **whether suggestions should even be made**.

This prevents:

* Over-criticism
* Noise-based warnings
* Fatigue from excessive suggestions

The gate considers:

* Overall score
* Dataset size
* Verdict severity
* Structural risk

This turns ai-critic into a **judgment system**, not a nagging tool.
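
As a rough illustration of how such a gate might weigh those four inputs, consider the sketch below. The function name `should_suggest`, its parameters, and its thresholds are all hypothetical; the real gate's rules are not documented here.

```python
def should_suggest(overall_score, n_samples, severity, structural_risk,
                   min_samples=200, score_floor=0.8):
    """Return True when suggestions are worth showing (hypothetical rules)."""
    # Severe verdicts or structural risk always warrant feedback.
    if severity == "high" or structural_risk:
        return True
    # On very small datasets, weak signals are mostly noise: stay quiet.
    if n_samples < min_samples:
        return False
    # Otherwise speak up only when the overall score falls below the floor.
    return overall_score < score_floor


print(should_suggest(0.95, 1000, "low", False))     # healthy model
print(should_suggest(0.60, 1000, "medium", False))  # weak score, enough data
print(should_suggest(0.60, 50, "medium", False))    # weak score, tiny dataset
```

The ordering of the rules is the design choice that matters: serious problems bypass the noise filters, while minor ones must clear them before anything is reported.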

---

## 🔄 Feedback Loop & Learning Critic

ai-critic can learn from outcomes.

You can optionally provide feedback:

```bash
ai-critic --feedback success
```

This enables:

* Smarter future decisions
* Better thresholds
* Context-aware criticism

The critic improves without exposing your data.

---

## 🖥️ Command Line Interface (CLI)

ai-critic ships with a command-line interface:

```bash
ai-critic \
  --model model.pkl \
  --data dataset.csv \
  --target label
```

CLI output includes:

* Gate decision
* Deployment recommendation
* Risk level
* Suggestions

Use `--json` for automation and pipelines.
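
In a pipeline, the JSON output could be turned into a pass/fail exit code along these lines. The JSON schema is not documented here, so this sketch assumes the payload carries the `deploy_recommended` and `risk_level` fields from the verdict table; `gate_exit_code` is a hypothetical helper.

```python
import json


def gate_exit_code(payload: str) -> int:
    """Return 0 only when the report explicitly recommends deployment."""
    report = json.loads(payload)
    # Assumed field name; a missing field is treated as "do not deploy".
    return 0 if report.get("deploy_recommended") is True else 1


# Stand-in for the text a CI step would capture from the CLI.
sample = '{"risk_level": "medium", "deploy_recommended": false}'
exit_code = gate_exit_code(sample)
print(exit_code)
```

Failing closed on a missing field keeps the pipeline consistent with the tool's skeptical stance.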

---

## 🧩 Multi-Framework Support

Supported via adapters:

* scikit-learn
* PyTorch
* TensorFlow

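The API remains consistent across frameworks. The package declares only NumPy and scikit-learn as dependencies, so install PyTorch or TensorFlow separately if you need those adapters.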

---

## 🛡️ What ai-critic Is NOT

* ❌ A hyperparameter optimizer
* ❌ A leaderboard benchmarking tool
* ❌ A replacement for domain expertise
* ❌ A blind approval system

---

## 🧠 Design Philosophy

ai-critic assumes:

* Metrics can lie
* Data is imperfect
* Models fail silently
* Trust must be earned

That makes it ideal as a **final quality gate**, not a tuning toy.

---

## 🧠 Final Note

> **ai-critic is not here to make models look good.**
> It exists to **prevent unsafe models from looking good enough to deploy**.

A failed audit does **not** mean your model is bad.
It means your model is **not yet safe to trust**.

That distinction is everything.
