# GraphReFly — Python package (`graphrefly`)

The reactive harness layer for agent workflows. Reactive graph runtime for human + LLM co-operation — causal tracing, persistent checkpoints, policy enforcement, zero runtime dependencies.

Keywords: harness engineering, agent harness, reactive graph, causal trace, explain_path, agent orchestration, LLM co-operation, human-in-the-loop, reactive middleware, universal reduction layer.

This file is an AI-oriented index for the Python repo. It points to source-of-truth docs and generated API references instead of maintaining a long, drift-prone inline API catalog.

## Canonical behavior and docs authority

- Behavior spec: `~/src/graphrefly/GRAPHREFLY-SPEC.md`
- Composition guidance: `~/src/graphrefly/COMPOSITION-GUIDE.md`
- Docs conventions (cross-language): `~/src/graphrefly-ts/docs/docs-guidance.md`
- Roadmap/state (cross-language): `~/src/graphrefly-ts/docs/roadmap.md`
- Testing standard (cross-language): `~/src/graphrefly-ts/docs/test-guidance.md`

## Active strategy docs (cross-language, live in graphrefly-ts)

- `~/src/graphrefly-ts/archive/docs/SESSION-reactive-collaboration-harness.md` — 7-stage reactive collaboration loop, gate port, `prompt_node`, `valve` rename, strategy model, `harness_loop()` factory
- `~/src/graphrefly-ts/archive/docs/SESSION-harness-engineering-strategy.md` — 8-requirement coverage, three-wave announcement plan, MCP Server priority, eval system design
- `~/src/graphrefly-ts/archive/docs/SESSION-marketing-promotion-strategy.md` — positioning pillars, wave plan, reply-marketing playbooks, competitive intel, prompt optimization algorithms, blog content plan

## Primary links

- Python docs: https://graphrefly.dev/py/
- Python API reference: https://graphrefly.dev/py/api/
- TypeScript docs: https://graphrefly.dev

## Package entry points

- Root: `from graphrefly import ...`
- Subpackages: `graphrefly.core`, `graphrefly.graph`, `graphrefly.extra`, `graphrefly.patterns`, `graphrefly.compat`, `graphrefly.integrations`

## Public API by subpackage

Full export lists live in the generated API pages under `website/src/content/docs/api/`. The descriptions below explain *what lives in each subpackage and why*, so an LLM reading this cold can navigate without consuming the drift-prone catalog. Python uses snake_case (`from_timer`, `switch_map`, `agent_loop`) — the behavior parity with TypeScript holds, the naming does not.

### `graphrefly.core` — `src/graphrefly/core/`

The minimal reactive substrate. One primitive (`node`) plus sugar constructors (`state`, `derived`, `producer`, `effect`, `pipe`, `dynamic_node`). Protocol internals: `MessageType` enum (`DATA`, `DIRTY`, `RESOLVED`, `COMPLETE`, `ERROR`, `TEARDOWN`, `INVALIDATE`, `PAUSE`, `RESUME`), dispatch (`dispatch_messages`, `is_phase2_message`), two-phase batch semantics (`batch`, `is_batching`, `partition_for_batch`, `down_with_batch`), guard / policy (`policy`, `policy_from_rules`, `compose_guards`, `GuardDenied`, `access_hint_for_guard`), actor identity (`Actor`, `system_actor`, `normalize_actor`), meta snapshots (`meta_snapshot`, `describe_node`), versioning (`create_versioning`, `advance_version`, `default_hash`, `is_v1`), and the central clock (`monotonic_ns`, `wall_clock_ns`). **Anything outside this package must not call `time.time_ns()` / `time.monotonic_ns()` directly — use `src/graphrefly/core/clock.py`.** Also holds the `Runner` protocol and default-runner resolution (`get_default_runner`, `set_default_runner`, `resolve_runner`) for asyncio/trio backends. Thread-safety invariants (per-subgraph `RLock`, per-node `_cache_lock`) are enforced here.

### `graphrefly.graph` — `src/graphrefly/graph/`

The `Graph` container: reactive node registry with `register` / `resolve` / `describe` / `observe` / `snapshot` / `diff` / `diagram` / `spy` / `mount` / `unmount` / `teardown`. `reachable(graph, start_path, options=None)` computes the dependency closure from a node path — foundation for `explain_path` (causal walkback) and persistence scope.

### `graphrefly.extra` — `src/graphrefly/extra/`

Everything reactive that isn't the primitive. Organized into focused modules:

- **Tier 1 operators** (`tier1.py`) — transform (`map`, `filter`, `scan`, `reduce`), limiting (`take`, `skip`, `take_while`, `take_until`), selection (`first`, `last`, `find`, `element_at`), utility (`start_with`, `tap`, `distinct_until_changed`, `pairwise`), combination (`combine`, `with_latest_from`), merging (`merge`, `zip`, `concat`, `race`).
- **Tier 2 operators** (`tier2.py`) — higher-order (`switch_map`, `concat_map`, `flat_map`, `exhaust_map`), timing (`debounce`, `throttle`, `delay`), sampling (`sample`, `audit`, `timeout`), buffering (`buffer`, `buffer_count`, `buffer_time`), windowing (`window`, `window_count`, `window_time`), `interval`, `repeat`, `pausable`, `rescue`. **Note:** the existing `gate(control)` here is a boolean control gate and is being renamed to `valve` per §9.0; the new human-approval `gate` lives under `patterns.orchestration`.
- **Sources** (`sources.py`) — synchronous (`of`, `empty`, `never`, `throw_error`, `from_iter`, `from_timer`, `from_cron`), async bridges (`from_awaitable`, `from_async_iter`), network / IO (`from_http`, `from_event_emitter`, `from_fs_watch`, `from_webhook`, `from_websocket`, `from_mcp`, `from_git_hook`), multicast / caching (`share`, `cached`, `replay`), collectors (`for_each`, `to_list`, `to_array`, `first_value_from`), sinks (`to_sse`, `to_websocket`). **All async boundaries must enter through sources** — node fns and operators never contain `async def` or raw awaitables.
- **Data structures** (`data_structures.py` + dedicated files) — `reactive_map`, `reactive_log` + `log_slice`, `reactive_index`, `reactive_list`, `pubsub`.
- **Resilience** (`resilience.py` + `backoff.py`) — `retry`, `circuit_breaker` + `CircuitOpenError`, `token_bucket`, `token_tracker`, `rate_limiter`, `with_breaker`, `with_status`; backoff strategies (`constant`, `linear`, `exponential`, `fibonacci`, `decorrelated_jitter`, `with_max_attempts`, `resolve_backoff_preset`).
- **Cron** (`cron.py`) — `parse_cron`, `matches_cron`.
- **Checkpoint** (`checkpoint.py`) — `Memory` / `Dict` / `File` / `Sqlite` adapters, `save_graph_checkpoint` / `restore_graph_checkpoint`, `checkpoint_node_value`. Auto-checkpoint is gated by `message_tier >= 3`. See `docs/ADAPTER-CONTRACT.md` for the adapter contract.
- **Composite** (`composite.py`) — `verifiable(source, opts)` wraps a node with verification tracking; `distill(raw, extract, opts)` compacts raw data into a budget-bounded summary (the reactive memory primitive that `agent_memory` plugs into).
- **Backpressure** (`backpressure.py`) — watermark controller for PAUSE/RESUME flow control.

### `graphrefly.patterns` — `src/graphrefly/patterns/` (Phase 4+ domain APIs)

Developer-facing factories that compose primitives into recognizable patterns. **Protocol internals never surface here** — sensible defaults, minimal boilerplate, clear errors. Python uses `with batch():` context managers instead of TypeScript's callback form, and `Node.__or__` maps to the TS `pipe()`.

- **Orchestration** (`orchestration.py`) — `pipeline`, `task`, `branch`, the boolean control `gate` *(being renamed to `valve`)*, `approval`, `for_each`, `join`, `loop`, `sub_pipeline`, `sensor`, `wait`, `on_failure`. The §9.0 port of the human-approval gate (`pending` / `count` / `is_open` reactive nodes and `approve` / `reject` / `modify(fn, n=1)` / `open` / `close` methods) also lands here.
- **Messaging** (`messaging.py`) — `topic` (TopicGraph), `subscription` (SubscriptionGraph with cursor / `pull` / `ack`), `job_queue`, `job_flow`, `topic_bridge`. These are the primitives the §9.0 harness loop uses for static-topology sinks and cursor-driven readers.
- **Memory** (`memory.py`) — `collection`, `light_collection`, `vector_index`, `knowledge_graph`, `decay(source, opts)` (OpenViking exponential decay, 7-day default half-life). `decay` is the priority-scoring primitive for the §9.0 QUEUE stage.
- **AI** (`ai.py`) — `from_llm`, `streaming_prompt_node`, `stream_extractor`, `chat_stream`, `tool_registry`, `system_prompt_builder`, `llm_extractor`, `llm_consolidator`, `agent_memory` (default: distill + vectors + KG + tiers), `agent_loop` (ReAct inner loop), `knobs_as_tools`, `gauges_as_context`, `validate_graph_def`. The §9.0 `prompt_node` factory (wraps any LLM call in a derived node with retry / cache / structured output) also lives here and is what plugs into `distill` as `extract_fn` / `consolidate_fn`.
- **CQRS** (`cqrs.py`) — `cqrs(name, opts=None)` creates a commands / events / projections / sagas Graph.
- **Reactive layout** (`reactive_layout/`) — `reactive_layout`, `reactive_block_layout`, `analyze_and_measure`, `compute_line_breaks`, `compute_char_positions`. DOM-free Knuth-Plass line breaking as a `state → derived` graph.

### `graphrefly.compat` — `src/graphrefly/compat/`

Async runner bindings. `AsyncioRunner` wraps a Graph in an asyncio event loop; `TrioRunner` does the same for trio. These are the only places allowed to orchestrate coroutines — node fns and operators stay synchronous. Runners resolve via `graphrefly.core.resolve_runner` so downstream code can inject a specific backend.

### `graphrefly.integrations` — `src/graphrefly/integrations/`

Framework integrations. `fastapi.GraphReflyRouter(graph)` exposes `GET /describe`, `GET /snapshot`, `WS /observe` over a registered `Graph`. Keep these thin — all logic belongs in `core` / `extra` / `patterns`.

### Design docs that back the above

- Phase 4+ factory authors must read `~/src/graphrefly/COMPOSITION-GUIDE.md` before composing primitives — covers lazy activation, subscription ordering, null guards, feedback cycles, `prompt_node` SENTINEL, wiring order.
- Backlog / anti-patterns / deferred follow-ups: `~/src/graphrefly-ts/docs/optimizations.md` (cross-language).
- Historical design decisions and parity notes: `~/src/graphrefly-ts/archive/optimizations/` (cross-language).

## Generated docs workflow

Python API pages are generated from docstrings:

- Generator: `website/scripts/gen_api_docs.py`
- Output: `website/src/content/docs/api/*.md`
- Commands:
  - `pnpm --dir website docs:gen`
  - `pnpm --dir website docs:gen:check`
  - `pnpm --dir website sync-docs`
  - `pnpm --dir website sync-docs:check`

Do not hand-edit generated API pages.

## Core design invariants

- No polling; propagation is reactive.
- No imperative triggers outside graph/message flow.
- No raw async primitives inside reactive layer logic.
- Use `src/graphrefly/core/clock.py` and `message_tier` helpers for timing/tier semantics.
- Keep Phase 4+ surfaces developer-oriented (hide protocol internals from primary API docs).
- Thread safety: per-subgraph `RLock`, per-node `_cache_lock`; no `async def` / `Awaitable` in public APIs.

## Harness engineering coverage

GraphReFly covers the 8 requirements of a production agent harness — context/state control, execution boundary, control flow, verification, observability, policy/safety, human governance, continuous improvement. Mapping lives in `README.md` and `~/src/graphrefly-ts/archive/docs/SESSION-harness-engineering-strategy.md`.

## Reactive collaboration loop (§9.0)

`harness_loop()` wires a static topology: INTAKE → TRIAGE → QUEUE → GATE → EXECUTE → VERIFY → REFLECT. Static topology + flowing data (the Kafka insight). Humans steer via `gate.modify(fn, n)`. Strategy model (`root_cause × intervention → success_rate`) feeds TRIAGE routing. Design source: `~/src/graphrefly-ts/archive/docs/SESSION-reactive-collaboration-harness.md`.
