Metadata-Version: 2.4
Name: dot-metrics
Version: 0.4.0
Summary: A lightweight metrics and constraint evaluation framework
Project-URL: Homepage, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics
Project-URL: Repository, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics
Project-URL: Issues, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics/-/issues
Author-email: deepika Team <contact@deepika.ai>
License: TODO: TO BE COMPLETED
License-File: LICENSE
Keywords: constraints,deepika,evaluation,metrics,open-toolbox
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.12
Requires-Dist: dash-bootstrap-components>=1.0
Requires-Dist: dash-bootstrap-templates>=1.0
Requires-Dist: dash>=2.0
Requires-Dist: pandas>=2.0
Requires-Dist: plotly>=5.0
Requires-Dist: pydantic>=2.0
Description-Content-Type: text/markdown

# dot-metrics

![Python Version](https://img.shields.io/badge/python-3.12%2B-blue)
![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)

**dot-metrics** is a lightweight metrics and constraint evaluation framework. Define metrics and constraints, run them against your data, and get structured results with debug info. Visualize results with terminal bar charts or an interactive browser-based explorer with scatter, bar, and heatmap charts.

## Install

```bash
pip install dot-metrics
```

## Concept

A `MetricSet` holds metric and constraint definitions. Call `compute(data)` to evaluate them.

```
MetricSet
 ├── metrics:     {"coverage": MetricDefinition}
 └── constraints: {"errors":   ConstraintDefinition}
          │
          ▼
  metric_set.compute(data)
          │
          ▼
      EvalResult
       ├── metrics:     {"coverage": Metric}
       └── constraints: {"errors":   Constraint}
```

- **Metrics** are continuous measurements (e.g. coverage rate, score).
- **Constraints** are pass/fail checks against a threshold (e.g. error count ≤ 0).

## Quick start

```python
from dot_metrics import MetricSet

metric_set = MetricSet()

@metric_set.metric("coverage")
def coverage(data):
    return data["covered"] / data["total"]

@metric_set.constraint("errors", threshold=0)
def errors(data):
    return data["error_count"]

result = metric_set.compute({"covered": 90, "total": 100, "error_count": 0})

result.score("coverage")     # 0.9
result.constraints_ok        # True
```

## Defining metrics and constraints

### Decorator style

```python
metric_set = MetricSet()

@metric_set.metric("latency_ms", unit="ms", higher_is_better=False)
def latency(data):
    return data["total_ms"] / data["requests"]

@metric_set.constraint("error_rate", threshold=0.01, unit="%")
def error_rate(data):
    return data["errors"] / data["requests"]
```

### Imperative style

```python
metric_set = MetricSet()
metric_set.add("coverage", lambda data: data["covered"] / data["total"])
metric_set.add_constraint("errors", lambda data: data["error_count"], threshold=0)
```

Both styles accept the same keyword arguments.
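
For example, the `latency_ms` metric from the decorator example can be registered imperatively with the same keywords:

```python
metric_set.add(
    "latency_ms",
    lambda data: data["total_ms"] / data["requests"],
    unit="ms",
    higher_is_better=False,
)
```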

### Parameters

**Metrics** — `metric_set.add(key, fn, *, ...)` / `@metric_set.metric(key, *, ...)`

| Parameter | Default | Description |
|-----------|---------|-------------|
| `name` | `""` | Human-readable display label |
| `unit` | `""` | Unit of measurement |
| `description` | `""` | Free-text description |
| `higher_is_better` | `True` | Affects terminal chart rendering |
| `normalize` | `None` | `Callable[[float], float]` — maps raw value to [0, 1] for chart display |
| `metadata` | `{}` | Arbitrary data, passed through to results |

**Constraints** — `metric_set.add_constraint(key, fn, *, ...)` / `@metric_set.constraint(key, *, ...)`

| Parameter | Default | Description |
|-----------|---------|-------------|
| `threshold` | *required* | Pass/fail boundary |
| `name` | `""` | Human-readable display label |
| `unit` | `""` | Unit of measurement |
| `description` | `""` | Free-text description |
| `higher_is_better` | `False` | `False`: passes when `value <= threshold`. `True`: passes when `value >= threshold` |
| `metadata` | `{}` | Arbitrary data, passed through to results |
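
A sketch combining several of these parameters (the metric, constraint, and their values are illustrative):

```python
from dot_metrics import MetricSet

metric_set = MetricSet()

metric_set.add(
    "wait_days",
    lambda data: data["total_wait"] / data["patients"],
    name="Average wait",
    unit="days",
    description="Mean waiting time per patient.",
    higher_is_better=False,
    normalize=lambda v: 1 / (v + 1),  # map raw days into [0, 1] for chart display
    metadata={"category": "throughput"},
)

metric_set.add_constraint(
    "overbooked",
    lambda data: data["overbooked_slots"],
    threshold=0,  # higher_is_better=False by default: passes when value <= 0
    name="Overbooked slots",
)
```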

## Computing results

```python
result = metric_set.compute(data)
```

Every metric and constraint function must accept **exactly one argument** — the data object. `data` can be anything: a dict, dataclass, Pydantic model, etc.
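
For instance, a minimal sketch with a dataclass input (the `Run` type is illustrative):

```python
from dataclasses import dataclass

from dot_metrics import MetricSet

@dataclass
class Run:  # hypothetical input type, for illustration only
    covered: int
    total: int

metric_set = MetricSet()
metric_set.add("coverage", lambda run: run.covered / run.total)

result = metric_set.compute(Run(covered=90, total=100))
result.score("coverage")  # 0.9
```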

### Accessing results

```python
result.score("coverage")                  # float — metric value
result.score("errors")                    # also works for constraints
result.metrics["coverage"].value          # same as score("coverage")
result.metrics["coverage"].name           # "" by default
result.metrics["coverage"].unit           # ""
result.metrics["coverage"].debug          # {} by default

result.constraints["errors"].passed       # True/False
result.constraints["errors"].threshold    # 0

result.constraints_ok                     # True if all constraints passed
result.violations                         # list of failed Constraint objects
result.assert_constraints()               # raises ValueError if any failed

# Iteration
len(result)                               # total number of metrics + constraints
result["coverage"]                        # returns Metric or Constraint by key
for key in result:                        # iterates over all keys (metrics first)
    print(key, result[key].value)
```
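
A typical gating pattern built from these accessors (the printed messages are illustrative):

```python
if not result.constraints_ok:
    for violation in result.violations:  # failed Constraint objects
        print(f"violated: value={violation.value}, threshold={violation.threshold}")

try:
    result.assert_constraints()  # raises ValueError on any failure
except ValueError as exc:
    print(f"constraint check failed: {exc}")
```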

## Attaching debug info

Return a `ComputedValue` instead of a plain float to attach structured debug data:

```python
from dot_metrics import MetricSet, ComputedValue

metric_set = MetricSet()

@metric_set.metric("coverage")
def coverage(data):
    missed = [x for x in data if not x["covered"]]
    return ComputedValue(value=1 - len(missed) / len(data), debug={"missed": missed})

result = metric_set.compute(data)
result.metrics["coverage"].debug    # {"missed": [...]}
```

`ComputedValue` works the same way for constraints.
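
For example, a constraint can report the offending rows in its debug payload:

```python
@metric_set.constraint("errors", threshold=0)
def errors(data):
    failed = [row for row in data if row["error"]]
    return ComputedValue(value=len(failed), debug={"failed": failed})
```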

## Batch evaluation

Evaluate a set of inputs in one call:

```python
# dict of inputs
batch = metric_set.compute_batch({"run_1": data1, "run_2": data2})
batch["run_1"].score("coverage")            # 0.9
batch.scores("coverage")                   # {"run_1": 0.9, "run_2": 0.85}

# list of inputs
batch = metric_set.compute_batch([data1, data2, data3])
batch[0].score("coverage")                 # indexed by position
```

`BatchResult` supports iteration, `len()`, and `.items()`.
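
For example:

```python
batch = metric_set.compute_batch({"run_1": data1, "run_2": data2})

len(batch)                          # 2
for key, result in batch.items():   # (key, EvalResult) pairs
    print(key, result.score("coverage"), result.constraints_ok)
```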

## Metric documentation

Add a Google-style docstring to auto-populate a `help` dict on both the definition and the result:

```python
@metric_set.metric("coverage", unit="%")
def coverage(data):
    """Percentage of code paths covered by tests.

    Range: 0-100
    Interpretation:
        - 90-100: Excellent
        - 70-90:  Good
        - <70:    Needs improvement
    Notes:
        - Returns 0 for empty input.
    """
    return sum(1 for x in data if x["covered"]) / len(data)

result = metric_set.compute(data)
result.metrics["coverage"].help
# {"summary": "Percentage of code paths covered by tests.",
#  "range": "0-100",
#  "interpretation": "- 90-100: Excellent\n- 70-90:  Good\n...",
#  "notes": "- Returns 0 for empty input."}
```

Supported sections: `Range:`, `Interpretation:`, `Notes:`. If the function has no docstring, `help` is `{}`.
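
The same `help` dict is also available on the stored definition, before any computation:

```python
metric_set.metrics["coverage"].help["summary"]
# "Percentage of code paths covered by tests."
```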

## Typing

`MetricSet` is generic over the input type `T` and an optional metadata type `M`:

```python
from dataclasses import dataclass
from typing import TypedDict
from dot_metrics import MetricSet

@dataclass
class SchedulingData:
    appointments: list
    solution: list

# Type-checked input
metric_set: MetricSet[SchedulingData] = MetricSet()
metric_set.add("rate", lambda d: len(d.solution) / len(d.appointments))

# Type-checked metadata
class MyMeta(TypedDict):
    category: str
    priority: int

ms: MetricSet[dict, MyMeta] = MetricSet()
ms.add("score", lambda d: 1.0, metadata=MyMeta(category="perf", priority=1))
```

Both type parameters are optional — omitting them is fine and everything still works at runtime.

## Terminal chart

```python
from dot_metrics import draw_terminal_chart

print(draw_terminal_chart(result))
# coverage  ████████████████████  90%
```

`draw_terminal_chart(result, width=40, char="█")` renders a Unicode bar chart from an `EvalResult`.

By default, metric values are assumed to be in [0, 1]. For metrics outside this range, provide a `normalize` function when registering the metric to map the raw value into [0, 1] for display:

```python
metric_set.add("wait_days", compute_wait, normalize=lambda v: 1 / (v + 1))
```

## Interactive explorer

Explore batch results interactively in the browser:

```python
from dot_metrics import serve

batch = metric_set.compute_batch({
    ("gpt4", "en"): data_en,
    ("gpt4", "fr"): data_fr,
    ("llama", "en"): data_en,
    ("llama", "fr"): data_fr,
}, key_names=["model", "language"])

serve(batch)  # opens localhost:8050
```

The app provides:
- **Chart** — scatter, bar, or heatmap with configurable X, Y, color, and size axes
- **Aggregation panel** — group by any categorical column, compute mean/median/min/max
- **Data table** — sortable, with debug cell inspection and CSV export

`key_names` labels the components of tuple keys (the defaults are `key[0]`, `key[1]`, …). You can also pass a single `EvalResult` to `serve`.

`serve(data, *, host="127.0.0.1", port=8050, debug=False)`
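
Serving a single result works the same way:

```python
result = metric_set.compute(data)
serve(result, port=8051)  # single EvalResult on a custom port
```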

## Full example

```python
from dot_metrics import MetricSet, ComputedValue

appointments = [
    {"id": "A1", "patient": "Alice",   "duration": 30},
    {"id": "A2", "patient": "Bob",     "duration": 60},
    {"id": "A3", "patient": "Charlie", "duration": 30},
]

solution = [
    {"appointment_id": "A1", "practitioner": "Dr. Martin", "slot": "09:00", "scheduled": True},
    {"appointment_id": "A2", "practitioner": "Dr. Martin", "slot": "09:00", "scheduled": True},  # conflict!
    {"appointment_id": "A3", "practitioner": "Dr. Martin", "slot": "10:00", "scheduled": True},
]

metric_set = MetricSet()

@metric_set.metric("scheduling_rate")
def scheduling_rate(data):
    scheduled = [e for e in data["solution"] if e["scheduled"]]
    unscheduled = [e["appointment_id"] for e in data["solution"] if not e["scheduled"]]
    return ComputedValue(value=len(scheduled) / len(data["appointments"]), debug={"unscheduled": unscheduled})

@metric_set.constraint("conflicts", threshold=0)
def count_conflicts(data):
    seen = {}
    conflicts = []
    for entry in data["solution"]:
        key = (entry["practitioner"], entry["slot"])
        if key in seen:
            conflicts.append((seen[key], entry["appointment_id"]))
        seen[key] = entry["appointment_id"]
    return ComputedValue(value=len(conflicts), debug={"conflicts": conflicts})

result = metric_set.compute({"appointments": appointments, "solution": solution})

result.score("scheduling_rate")                     # 1.0
result.constraints_ok                               # False
result.constraints["conflicts"].debug               # {"conflicts": [("A1", "A2")]}
```
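
To finish, the same result can be rendered in the terminal (bar widths and output format are illustrative):

```python
from dot_metrics import draw_terminal_chart

print(draw_terminal_chart(result))
# scheduling_rate  ████████████████████  1.0
```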

## Reference

| Import | Description |
|--------|-------------|
| `MetricSet` | Main class — holds definitions, runs computation |
| `EvalResult` | Output of `compute()` — holds `Metric` and `Constraint` dicts |
| `BatchResult` | Output of `compute_batch()` — maps keys to `EvalResult` |
| `ComputedValue` | Wraps a float return value with optional debug data |
| `Metric` | Computed metric result |
| `Constraint` | Computed constraint result with `passed` flag |
| `MetricDefinition` | Stored metric definition (in `metric_set.metrics`) |
| `ConstraintDefinition` | Stored constraint definition (in `metric_set.constraints`) |
| `draw_terminal_chart` | Renders a Unicode bar chart from an `EvalResult` |
| `serve` | Launches an interactive Dash explorer |

## Contributing & Development

See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) and [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md).

## License

See [LICENSE](LICENSE) for details.

## Contact

deepika Team — contact@deepika.ai

Project: [gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics](https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-metrics)
