Metadata-Version: 2.4
Name: decision-graph
Version: 1a0
Summary: Versioned property-graph schema for agentic decision structures (dg/v1) — spec, validator, and generators
Project-URL: Homepage, https://github.com/decision-graph/decision-graph
Project-URL: Repository, https://github.com/decision-graph/decision-graph
Project-URL: Issues, https://github.com/decision-graph/decision-graph/issues
Author-email: Zach Blumenfeld <zach.blumenfeld@neo4j.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: agent,agentic,arrow,decision-graph,graph,neo4j,opentelemetry,otel,parquet,property-graph,schema,spec
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# dg — decision-graph spec

> **DRAFT (pre-1.0).** The schema's shape is settled; the JSON spec format is a first-iteration draft and will likely be refined before the stable release.

`dg` is a versioned property-graph schema for representing **decision structures**: the choices, actions, and communications an AI agent (or human, or hybrid system) makes during a workflow run. It's designed to be the canonical target for mapping any decision-making data, including OpenTelemetry agent traces, into:

1. A context graph (Neo4j or any property-graph database).
2. Analytics-ready tabular data (Arrow / Parquet) for ML training pipelines, RL, and agentic research.

The spec exists as a **machine-readable JSON document** (`spec.json`) plus a **human-readable design rationale** (`design.md`). The JSON spec is intended to drive auto-generators in consumer libraries — Arrow schema generation, Cypher constraint / index / upsert templates, validation tooling, etc.


## What's in this repo

| File | Purpose |
| ---- | ------- |
| `spec.json` | Canonical machine-readable schema. Property types, node definitions (with labels, upsert keys, indexes, properties, inheritance), relationship definitions (with from/to, cardinality, edge properties). |
| `design.md` | Full design rationale — locked decisions and the why behind them, layering, identity rules, namespacing contract, provenance contract, mapping from OpenTelemetry agent-trace formats, considered-and-rejected alternatives, open questions. |
| `arrows.json` | Supplementary Arrows.app-importable JSON for visual editing. Renders the schema as a property-graph diagram. Open at [arrows.app](https://arrows.app) and paste contents. |

## Why three artifacts

- **spec.json** is what tooling consumes. Auto-generators read it; validators check instances against it.
- **design.md** is what humans read to understand *why* the schema looks the way it does. New consumers should start here.
- **arrows.json** is what humans use to *see* the schema as a diagram and to propose edits visually. Re-export from Arrows back into this file when changes are accepted.

The three should stay in sync — if you change one, update the others.

## Layers at a glance

```
                     ┌────────────────────────┐
                     │   Evaluation layer     │  scores, judgments — produced
                     │   (:Evaluation)        │  separately from the workflow
                     └────────────┬───────────┘
                                  │ EVALUATES
                                  ▼
┌─────────────────┐   FOR_TASK   ┌───────────────────────┐
│ Reference layer │◄─────────────│  Execution layer       │
│ (resolve down)  │              │  (UUIDs / OTel ids)    │
│ Task, Actor,    │              │  WorkflowExecution,    │
│ Tool, Resource  │              │  StepExecution         │
│  (incl. :Model) │              │  (:ToolCall :Reasoning │
│ Session         │              │   :Message)            │
└─────────────────┘              └────────────────────────┘
```

- **Reference layer** — natural keys, MERGE on upsert. Same `gpt-4o-mini` everywhere; same `Tool {service_name, name, version}` across all workflows. Resolves down across runs.
- **Execution layer** — IDs derived from OTel `trace_id` / `span_id` when sourced from OTel (so re-ingestion is idempotent); UUIDs otherwise. Per-run state with timestamps, status, errors, provenance. Edges to the reference layer carry per-call data (token counts, model identity, tool args).
- **Evaluation layer** — independent provenance, `EVALUATES` back into the execution layer. Produced by separate eval pipelines (LLM judges, human review batches, automated tests).

See `design.md` for the full reference.

## Consuming the spec

The intended pattern is for downstream libraries to:

1. Read `spec.json` (pin to a tagged version)
2. Generate Arrow schemas from the node + relationship definitions
3. Generate Cypher `CREATE CONSTRAINT` / `CREATE INDEX` / parameterized `MERGE` upsert templates per node label and edge type
4. Validate instance documents against the spec

The reference implementation is [otela](https://github.com/decision-graph/otela), which will produce dg/v1 instance data from OTel agent traces.

## Installing

```bash
pip install --pre decision-graph
# or with uv:
uv add --prerelease=allow decision-graph
```

The `--pre` flag (`--prerelease=allow` for uv) is needed while `dg/v1` is in alpha (`1a0`, `1a1`, ...). Once `v1` ships stable, drop it.

The package ships with the bundled `dg/v1` spec accessible as `decision_graph.SPEC_PATH` (a `pathlib.Path`) and `decision_graph.load_spec()` (a parsed dict). It has no runtime dependencies.

## Validating an instance

A dg-instance JSON document is a single file with `nodes` and `relationships` arrays:

```json
{
  "spec": "dg",
  "spec_version": "v1",
  "nodes": [
    {"id": "n1", "labels": ["Tool"], "properties": {"service_name": "...", "name": "...", "version": "..."}}
  ],
  "relationships": [
    {"type": "CALLS_TOOL", "from_id": "tool-call-1", "to_id": "n1", "properties": {}}
  ]
}
```

The instance-level node `id` is an opaque handle used for relationship endpoint resolution within the file. The spec's `properties.id` (where present) is a separate concept — the natural identity used for upsert into a graph DB.
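
To make the role of the instance-level `id` concrete, here is a minimal sketch of endpoint resolution within a single file. It is an independent illustration, not the package's validator, and `dangling_endpoints` is a hypothetical helper name.

```python
def dangling_endpoints(instance: dict) -> list[str]:
    """List relationship endpoints that do not match any node `id` in the file."""
    known_ids = {node["id"] for node in instance.get("nodes", [])}
    problems = []
    for index, rel in enumerate(instance.get("relationships", [])):
        for end in ("from_id", "to_id"):
            if rel.get(end) not in known_ids:
                problems.append(
                    f"relationships[{index}].{end}: unknown node id {rel.get(end)!r}"
                )
    return problems


# Same shape as the example document above.
instance = {
    "spec": "dg",
    "spec_version": "v1",
    "nodes": [{"id": "n1", "labels": ["Tool"], "properties": {}}],
    "relationships": [
        {"type": "CALLS_TOOL", "from_id": "tool-call-1", "to_id": "n1", "properties": {}}
    ],
}

for problem in dangling_endpoints(instance):
    print(problem)
# relationships[0].from_id: unknown node id 'tool-call-1'
```

In the example document above, `tool-call-1` does not appear in `nodes`, so a complete instance would also need a node carrying that `id`.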

Validate against the bundled spec:

```bash
python -m decision_graph.validator path/to/instance.json
```

Or programmatically:

```python
import decision_graph
from decision_graph.validator import SpecModel, validate

spec = SpecModel(decision_graph.load_spec())
errors = validate(instance_dict, spec)
for err in errors:
    print(err)
```

The validator checks:

- structural shape;
- property types and constraints (enums, `minLength`, `minimum`, `format=json`, nullable);
- edge endpoint type matching, following `extends` chains so an `:Actor:Agent` is accepted where `:Actor` is required;
- dangling endpoints;
- upsert-key uniqueness grouped by upsert root: an Agent and a Human with the same `(service_name, name, version)` will collide under the abstract Actor constraint, matching the Cypher-generation intent.
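
The upsert-root grouping can be illustrated with a small sketch. The `EXTENDS` and `UPSERT_KEY` tables below are hand-written stand-ins for what a consumer would derive from `spec.json`; their layout and the helper names are assumptions, and this is not the package's own implementation.

```python
from collections import defaultdict

# Hypothetical lookup tables: child label -> parent label, and the
# upsert key declared on each root label.
EXTENDS = {"Agent": "Actor", "Human": "Actor"}
UPSERT_KEY = {"Actor": ("service_name", "name", "version")}


def upsert_root(label: str) -> str:
    """Follow the extends chain up to the label that owns the upsert key."""
    while label in EXTENDS:
        label = EXTENDS[label]
    return label


def upsert_collisions(nodes: list[dict]) -> dict:
    """Group nodes by (root label, key values) and keep groups with duplicates."""
    seen = defaultdict(list)
    for node in nodes:
        roots = {upsert_root(label) for label in node["labels"]}
        for root in sorted(roots & UPSERT_KEY.keys()):
            key = tuple(node["properties"].get(p) for p in UPSERT_KEY[root])
            seen[(root, key)].append(node["id"])
    return {group: ids for group, ids in seen.items() if len(ids) > 1}


shared = {"service_name": "support", "name": "triage", "version": "1"}
nodes = [
    {"id": "a1", "labels": ["Actor", "Agent"], "properties": dict(shared)},
    {"id": "h1", "labels": ["Actor", "Human"], "properties": dict(shared)},
]
print(upsert_collisions(nodes))
# {('Actor', ('support', 'triage', '1')): ['a1', 'h1']}
```

Grouping by root rather than by leaf label mirrors a single uniqueness constraint on the abstract label, which is why the Agent and the Human collide.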

To validate against a spec on disk (e.g., a different version):

```bash
python -m decision_graph.validator path/to/instance.json --spec path/to/spec.json
```

## Versions

| Version | Status        | Tag    | Notes |
| ------- | ------------- | ------ | ----- |
| `v1`    | DRAFT (alpha) | `v1a0` | Initial pre-release. Three layers (reference / execution / evaluation), Actor-centric execution, OTel-derived idempotency. |

The package's major version tracks the schema version: alpha releases of `dg/v1` are tagged `v1a0`, `v1a1`, ...; the stable `dg/v1` release is `v1`; the next schema generation goes to `v2a0`, then `v2`. The `main` branch reflects the in-progress next release. For a stable consumable spec, install a tagged release from PyPI or pin to a tag.
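
For tooling that needs to order these tags, a small sketch under the tagging scheme described above (`vN` for stable, `vNaM` for alphas) might look like this. The regular expression and the `tag_sort_key` helper are illustrative assumptions, not part of the package.

```python
import re

# Matches "v1", "v1a0", "v2a3", ... as described in the text above.
TAG_RE = re.compile(r"v(\d+)(?:a(\d+))?")


def tag_sort_key(tag: str) -> tuple[int, int, int]:
    """Sort key that places alphas before the stable release of the same major."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a dg release tag: {tag!r}")
    major = int(match.group(1))
    alpha = match.group(2)
    if alpha is not None:
        return (major, 0, int(alpha))  # pre-release: sorts first
    return (major, 1, 0)  # stable: sorts after every alpha


tags = ["v2", "v1", "v2a0", "v1a1", "v1a0"]
print(sorted(tags, key=tag_sort_key))
# ['v1a0', 'v1a1', 'v1', 'v2a0', 'v2']
```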

## Status and stability

`dg/v1` is a **DRAFT** in alpha. The schema *shape* is settled (see the locked decisions in `design.md`); what's still in flux is the JSON spec format itself. `spec.json` is a first iteration of a JSON-Schema-flavored DDL with custom extensions for graph-DDL concerns (`extends`, `upsert_key`, `indexes`, `$arrow`). The format (`meta.format_version: "0.1"`) is expected to be refined before `v1` is tagged stable.

Once a version is tagged stable, its `spec.json` is **immutable** at that tag. Iteration during alpha happens via pre-release tags (`v1a0`, `v1a1`, ...); subsequent schema generations get a new version (`v2a0`, `v2`, ...).

## Contributing

This is an early project. The way to contribute right now is to **open an issue** describing a real-world OTel trace shape, ML use case, or graph-query pattern that the current schema doesn't support cleanly. Concrete examples beat abstract proposals.

When the schema does change, the "Changes from previous version" section of that version's `design.md` captures the diff with rationale, and the GitHub Release notes summarize it.

## License

[Apache-2.0](LICENSE).
