Metadata-Version: 2.4
Name: crystal-metrics
Version: 0.1.0
Summary: Transparent multimodal reasoning metrics from the CRYSTAL benchmark (Match F1, Ordered Match F1, accuracy).
Author: Wayner Barrios, SouYoung Jin
License: MIT
Project-URL: Homepage, https://github.com/waybarrios/crystal
Project-URL: Paper, https://arxiv.org/abs/2603.13099
Project-URL: Dataset, https://huggingface.co/datasets/waybarrios/CRYSTAL
Keywords: mllm,vlm,reasoning,evaluation,match-f1,crystal,benchmark
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: torch
Requires-Dist: sentence-transformers
Requires-Dist: tqdm
Provides-Extra: judge
Requires-Dist: openai>=1.0; extra == "judge"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"

# crystal-metrics

Transparent multimodal reasoning metrics from the **CRYSTAL** benchmark:
*Match F1*, *Ordered Match F1*, *Precision*, *Recall*, and multi-format
*Accuracy*.

```bash
pip install crystal-metrics          # core metrics
pip install "crystal-metrics[judge]" # + optional LLM judge (quoted so shells such as zsh do not glob the brackets)
```

```python
from crystal_metrics import MLLMReasoningEvaluator

evaluator = MLLMReasoningEvaluator()  # all-distilroberta-v1, tau=0.35 (paper defaults)
m = evaluator.evaluate_single(
    predicted_steps=["Three objects on a table", "The middle one is smallest", "Answer C"],
    reference_steps=["There are three objects", "Compare their sizes", "Middle is smallest", "Select C"],
    alpha=0.3,  # enable Ordered Match F1
)
print(m.match_f1, m.precision, m.recall, m.ordered_match_f1)
```
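
For intuition about what the numbers above mean, here is a rough, dependency-free sketch of a thresholded step-matching F1. It is not the package's implementation. It assumes that a predicted step counts as matched when its similarity to a still-unmatched reference step is at least `tau`, that precision and recall are the matched fractions of predicted and reference steps, and that F1 is their harmonic mean. The names `toy_similarity` and `toy_match_f1` are hypothetical, and a simple token-overlap score stands in for the sentence-embedding similarity the evaluator uses. See the docs for the actual metric definitions.

```python
# Illustrative sketch only: NOT the crystal-metrics implementation.
# Assumptions (not taken from the package): greedy one-to-one matching,
# a match requires similarity >= tau, F1 is the harmonic mean of P and R.

def toy_similarity(a, b):
    """Jaccard token overlap, a stand-in for embedding cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def toy_match_f1(predicted, reference, tau=0.35, sim=toy_similarity):
    """Return (precision, recall, f1) for two lists of reasoning steps."""
    unmatched = list(range(len(reference)))  # reference indices still free
    matched = 0
    for step in predicted:
        # Pick the most similar reference step that has not been used yet.
        best = max(unmatched, key=lambda j: sim(step, reference[j]), default=None)
        if best is not None and sim(step, reference[best]) >= tau:
            unmatched.remove(best)
            matched += 1
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f1 = toy_match_f1(
        predicted=["a b c", "x y z"],
        reference=["a b c", "d e f", "x y"],
    )
    print(round(p, 3), round(r, 3), round(f1, 3))
```

In this toy case both predicted steps find a reference match but one reference step is never covered, so precision is higher than recall. That asymmetry is the reason to report precision and recall alongside the F1 score.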

See the [docs](https://github.com/waybarrios/crystal/tree/main/docs) for
installation, quickstart, metric definitions, and the CLI.
