Metadata-Version: 2.1
Name: benchmark-reliability
Version: 0.1.4
Summary: Benchmark Reliability Framework (BRF) - dataset-level reliability auditing for predictive benchmarks
Author-email: zhanglizhuo <zhanglizhuo@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/zhanglizhuo/BenchmarkReliability
Project-URL: Repository, https://github.com/zhanglizhuo/BenchmarkReliability
Keywords: benchmark reliability,dataset auditing,educational AI,machine learning
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.21
Requires-Dist: scikit-learn>=1.0
Requires-Dist: matplotlib>=3.5

# benchmark-reliability

A Python package for computing the **Benchmark Reliability Framework (BRF)**: a four-dimension audit protocol that evaluates whether a predictive dataset is structurally reliable before model development.

## Installation

```bash
pip install benchmark-reliability
```

Requires Python 3.8+. The dependencies (numpy>=1.21, scikit-learn>=1.0, matplotlib>=3.5) are installed automatically by pip.

## Quick Start

```python
import numpy as np
from brf import BRFAnalyzer

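# Synthetic placeholder data; replace with your own features (X),
# targets (y), and per-sample group labels (groups)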
X = np.random.randn(200, 10)
y = np.random.randn(200)
groups = np.random.choice(["A", "B", "C"], 200)

# Run the audit
analyzer = BRFAnalyzer(n_splits=30, n_permutations=200).fit(X, y, groups=groups)

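# Results (the values below only illustrate the output format;
# random noise like the placeholder data above will score differently)
print(analyzer.brf_vector)
# {'B': 0.123, 'I': 0.045, 'N': 0.97, 'M': 0.82,
#  'S': 0.925, 'E': 0.943, 'class': 'Reliable'}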
```

## BRF Dimensions

| Dimension | Name | Meaning |
|-----------|------|---------|
| B | Baseline Gain | Model improvement over mean predictor |
| I | Instability | Sensitivity to train/test split choice |
| N | Null Separability | Signal distinguishability from noise |
| M | Metadata Sufficiency | Group structure completeness |
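
To give a feel for what B, I, and N measure, here is a rough sketch built directly on scikit-learn. It is not the package's implementation: the function name `rough_dimension_estimates`, the choice of `Ridge`, the R² metric, and the formulas for each quantity are all assumptions made for illustration. M is left out because the table does not say how group completeness is quantified.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import (
    ShuffleSplit,
    cross_val_score,
    permutation_test_score,
)


def rough_dimension_estimates(X, y, n_splits=30, n_permutations=200, seed=0):
    """Hypothetical stand-ins for B, I, and N (not BRFAnalyzer's formulas)."""
    # Fixed random_state so both estimators see identical train/test splits.
    cv = ShuffleSplit(n_splits=n_splits, test_size=0.25, random_state=seed)
    model_r2 = cross_val_score(Ridge(), X, y, cv=cv, scoring="r2")
    dummy_r2 = cross_val_score(
        DummyRegressor(strategy="mean"), X, y, cv=cv, scoring="r2"
    )

    # B: average R^2 improvement of the model over a mean predictor.
    baseline_gain = float(np.mean(model_r2 - dummy_r2))

    # I: spread of the model's score across random splits.
    instability = float(np.std(model_r2))

    # N: one minus the permutation-test p-value (labels shuffled under the null).
    _, _, p_value = permutation_test_score(
        Ridge(), X, y, cv=5, n_permutations=n_permutations,
        scoring="r2", random_state=seed,
    )
    null_separability = 1.0 - float(p_value)

    return {"B": baseline_gain, "I": instability, "N": null_separability}


# Synthetic data with a strong linear signal, for demonstration only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=120)
print(rough_dimension_estimates(X_demo, y_demo, n_splits=10, n_permutations=50))
```

On data with a clear signal, this sketch should report a large gain, a small spread, and a separability close to 1. The scales need not match those of `BRFAnalyzer`.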

The embedding coordinates S = N - I (Signal Identifiability) and E = B + M (Epistemic Completeness) classify datasets into **Reliable**, **Fragile**, or **Void**.
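
As a worked check of the two formulas, the snippet below recomputes S and E from the illustrative values in the Quick Start output. The helper name `embedding_coordinates` is hypothetical and not part of the package. The classification thresholds are not stated here, so the sketch stops at the coordinates.

```python
# Illustrative values copied from the Quick Start output comment.
brf = {"B": 0.123, "I": 0.045, "N": 0.97, "M": 0.82}


def embedding_coordinates(vec):
    """Return (S, E) with S = N - I and E = B + M."""
    s = vec["N"] - vec["I"]  # Signal Identifiability
    e = vec["B"] + vec["M"]  # Epistemic Completeness
    return s, e


S, E = embedding_coordinates(brf)
print(round(S, 3), round(E, 3))  # 0.925 0.943
```

These match the `'S'` and `'E'` entries shown in the Quick Start output.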

## Visualization

```python
from brf.phase import plot_phase_diagram

plot_phase_diagram(
    [analyzer.S], [analyzer.E],
    labels=[analyzer.class_],
    classes=[analyzer.class_],
)
```

## Export

```python
from brf.report import export_json, export_latex

export_json(analyzer.brf_vector, "results.json")
latex_table = export_latex(analyzer.brf_vector)
```

## Citation

If you use this package, please cite the BehaviorAudit paper:

```
BehaviorAudit: a four-dimension pre-modeling audit protocol
for educational prediction benchmarks. Scientific Reports (under review).
```

## License

MIT

## Links

- GitHub: https://github.com/zhanglizhuo/BenchmarkReliability
- PyPI: https://pypi.org/project/benchmark-reliability/
