👁️🧠 Verified Agents Framework

Verel

The agent framework where nothing is “done” until a grader returns a verdict — checked by real senses including eyes — and only verified work compounds into the fleet’s shared memory.

$ pip install verel
315 tests passing · ruff + mypy clean · MIT · Ollama Cloud · OpenAI
A glowing brain with an eye, and verified work passing through a verdict gate
Eval-driven by design

The loop is the whole point

Every agent action is a hypothesis. One verdict bus unifies vision + tests + lint + types into a single pass / warn / fail — with attestation, so a hollow check can’t mint green.

An eval-driven loop of five connected nodes:
✍️ Write: agent acts
👁️ Perceive: senses → percepts
⚖️ Gate: verdict bus
🛠️ Fix: agent patches
Pass: self-computed
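The loop can be pictured with a minimal Python sketch. It is an illustration only, not Verel's API: `Verdict`, `run_loop`, and the four callables are hypothetical names, and the round budget is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Verdict:
    status: str            # "pass", "warn", or "fail"
    findings: List[str]    # what the graders objected to


def run_loop(
    write: Callable[[], str],
    perceive: Callable[[str], List[str]],
    gate: Callable[[List[str]], Verdict],
    fix: Callable[[str, Verdict], str],
    max_rounds: int = 5,
) -> Tuple[bool, List[str]]:
    """Write, perceive, gate, fix; stop only when the gate says pass."""
    artifact = write()
    history: List[str] = []
    for _ in range(max_rounds):
        verdict = gate(perceive(artifact))
        history.append(verdict.status)
        if verdict.status == "pass":
            return True, history      # the grader decided, not the agent
        artifact = fix(artifact, verdict)
    return False, history             # budget exhausted without a pass


# Toy run: the "artifact" is text, and each leftover TODO is a finding.
done, history = run_loop(
    write=lambda: "TODO TODO done",
    perceive=lambda text: [w for w in text.split() if w == "TODO"],
    gate=lambda percepts: Verdict("fail" if percepts else "pass", percepts),
    fix=lambda text, verdict: text.replace("TODO", "", 1),
)
```

The agent never sets `done` itself; it only changes when `gate` returns a pass.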
One framework

Five organs

A brain, eyes, a verdict bus, a fleet, and agents that build their own tools — each gated by the bus.

🧠

Brain

Memory that compounds — trust + provenance, consolidation, a held-out attested promotion gate, and lifecycle controls (pin · volatile-until-confirmed · TTL · correction chains). Only verified work graduates.

verel.memory
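As a rough sketch of the promotion gate and lifecycle controls described above: a memory stays a candidate until an attested pass, and a TTL or pin decides whether it is still live. The `Memory` class, `promote`, and `is_live` are hypothetical names for illustration, not the `verel.memory` API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    claim: str
    state: str = "candidate"            # candidate -> verified
    pinned: bool = False
    expires_at: Optional[float] = None  # TTL as an absolute timestamp


def promote(mem: Memory, verdict_status: str, attested: bool) -> Memory:
    """Graduate a candidate only on a pass that the grader attested."""
    if mem.state == "candidate" and verdict_status == "pass" and attested:
        mem.state = "verified"
    return mem


def is_live(mem: Memory, now: float) -> bool:
    """Pinned memories never expire; others honor their TTL, if any."""
    if mem.pinned:
        return True
    return mem.expires_at is None or now < mem.expires_at
```

An unattested pass leaves the memory in the candidate state, which is the point of "only verified work graduates".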
👁️

Eyes

AgentVision as a perception organ (DOM / contrast / OCR grounded), feeding the verdict bus and the brain.

verel.senses
⚖️

Verdict bus

One schema for every sense — advisory ceiling, grader attestation, scrubbed fingerprints, stuck/progress.

verel.verdict
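One way to read "advisory ceiling" and "grader attestation" is the sketch below. It rests on two assumptions that are not confirmed by the text: an advisory sense can raise the merged verdict no higher than warn, and a report without attestation is treated as fail so that a hollow check cannot produce a pass. `Report`, `effective`, and `merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable

SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


@dataclass(frozen=True)
class Report:
    sense: str              # e.g. "tests", "lint", "types", "vision"
    status: str             # "pass", "warn", or "fail"
    attested: bool = True   # did the grader prove it actually ran?
    advisory: bool = False  # advisory senses cannot block on their own


def effective(report: Report) -> str:
    # Assumption: an unattested report counts as fail, whatever it claims.
    status = report.status if report.attested else "fail"
    # Assumption: the advisory ceiling caps severity at "warn".
    if report.advisory and SEVERITY[status] > SEVERITY["warn"]:
        status = "warn"
    return status


def merge(reports: Iterable[Report]) -> str:
    """Collapse every sense into one pass / warn / fail (worst one wins)."""
    statuses = [effective(r) for r in reports]
    if not statuses:
        return "fail"       # no evidence is not a pass
    return max(statuses, key=SEVERITY.__getitem__)
```

With these rules, a failing advisory vision check next to passing tests merges to warn, while an unattested test pass merges to fail.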
🚁

Fleet

Agents managing agents — an LLM manager fans out; workers run in isolated git worktrees under budget.

verel.fleet
🔧

Tool-smith

Detect → scaffold → test → register → reuse, sandboxed with bwrap, admitted only on a passing attested eval.

verel.toolsmith
♻️

Agent-run CI/CD

Self-healing pipeline with a deterministic rollback engine that never acts on advisory evidence.

verel.ci
In the wild

Six real-world scenarios

Situations a team actually hits — each a runnable script in examples/. The output below is real, not mocked: an agent never decides “done”, a grader does.

01

Your CI went red

Real pytest fails → an agent patches the source (never the tests) → the stage re-gates until the graders go green.

round 1: verdict=fail  patched=['mathx.py','strx.py']
round 2: verdict=pass   patched=[]
healed=True  terminated_on=passed
$ python examples/demo_selfheal.py
02

A bad merge slipped through

Canary grader fails → deterministic git revert to the last good HEAD — and it refuses to act on advisory-only evidence.

canary verdict=fail  rolled_back=True
  reverted c2852b1 → new HEAD 59ce62e
advisory-only failure → executed=False (refused)
$ python examples/demo_canary_rollback.py
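The refusal rule can be sketched as a pure decision function. This is an illustration under assumed names (`Evidence`, `RollbackPlan`, `plan_rollback`), not the code in `examples/demo_canary_rollback.py`, and it only plans the rollback rather than running git.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Evidence:
    grader: str
    status: str      # "pass", "warn", or "fail"
    advisory: bool   # advisory evidence may inform, never trigger


@dataclass(frozen=True)
class RollbackPlan:
    executed: bool
    target: Optional[str]   # commit to return to, if any
    reason: str


def plan_rollback(evidence: List[Evidence], last_good: str) -> RollbackPlan:
    """Roll back only when a non-advisory grader has failed."""
    blocking = [e for e in evidence if e.status == "fail" and not e.advisory]
    if not blocking:
        return RollbackPlan(False, None, "no deterministic failure")
    graders = ", ".join(e.grader for e in blocking)
    return RollbackPlan(True, last_good, "failed: " + graders)
```

Because the decision depends only on the evidence list, the same input always yields the same plan, which is one reasonable reading of "deterministic" here.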
03

Scale one fix across many repos

Concurrent managers fenced by leases (a stale leader’s writes refused); a multi-repo change commits as an atomic saga — nothing half-applied.

8 tasks, 2 managers — ran once each: True
stale leader A fenced off (token 1 < 2)
saga: committed=[] compensated=['commit:api']
  repos left landed: [] → atomic
$ python examples/demo_distributed_fleet.py
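Two ideas in this scenario, fencing tokens and saga compensation, can be sketched in a few lines. The names `FencedStore` and `run_saga` are hypothetical and the in-memory store stands in for whatever the fleet actually writes to; treat this as a sketch of the general techniques, not Verel's implementation.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # name, action, undo


class FencedStore:
    """Rejects writes that carry a token older than the newest one seen."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False            # stale leader: fenced off
        self.highest_token = token
        self.data[key] = value
        return True


def run_saga(steps: List[Step]) -> Tuple[List[str], List[str]]:
    """Run steps in order; on any failure, undo completed steps in reverse.

    Returns (committed, compensated). Either every step is committed,
    or nothing is and the completed ones are listed as compensated.
    """
    done: List[Tuple[str, Callable[[], None]]] = []
    for name, action, undo in steps:
        try:
            action()
        except Exception:
            compensated = []
            for prev_name, prev_undo in reversed(done):
                prev_undo()
                compensated.append(prev_name)
            return [], compensated
        done.append((name, undo))
    return [name for name, _ in done], []
```

If the second of two repo commits fails, the first is compensated and the saga reports nothing committed, the same shape as the `committed=[] compensated=['commit:api']` line in the output above.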
04

A polyglot monorepo

Python + JS + Go tests, lint, types, a perf budget, and a security scan — all on one verdict schema, one gate.

Go inner-loop: test [-] TestLogin failed
JS pre-merge: test [-] submit posts the form
Python: security [-] B602 shell=True
        perf [-] p95_ms 240 > budget 150
$ python examples/demo_polyglot_ci.py
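As an illustration of how a perf budget might land on the same schema as tests and lint, the sketch below computes a nearest-rank p95 and emits a finding. `Finding`, `p95`, and `perf_finding` are hypothetical names, and the nearest-rank method is an assumption; the demo may compute its percentile differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Finding:
    sense: str    # "test", "lint", "security", "perf", ...
    status: str   # "pass" or "fail"
    detail: str


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile, using integer math for the rank."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def perf_finding(samples_ms: Sequence[float], budget_ms: float) -> Finding:
    value = p95(samples_ms)
    if value > budget_ms:
        return Finding("perf", "fail", f"p95_ms {value:g} > budget {budget_ms:g}")
    return Finding("perf", "pass", f"p95_ms {value:g} <= budget {budget_ms:g}")
```

The result has the same three fields a test or lint finding would carry, so one gate can weigh them together.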
05

An agent builds its own tool

detect → scaffold → test → register on a passing eval, then jailed to the syscalls it earned — a socket it never exercised is refused at the kernel.

learned 28 syscalls → enforced 71
tool runs under its jail: 5
socket() under the capability jail:
  REFUSED — Operation not permitted
$ python examples/demo_capability_jail.py
06

A shared team brain

Recall down a self→team→org→global lattice; a peer’s claim re-verifies before it’s trusted; leader-fenced HA with quorum reads that survive the leader being down.

agent-B's belief (my check fails):
  stayed CANDIDATE — trust did not travel
quorum read, leader DOWN → still returns
  'restart the worker pool' (freshest)
$ python examples/demo_shared_brain.py
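The recall lattice and the quorum read can be sketched as follows. Everything here is an assumption made for illustration: the dictionary-backed stores, the `(version, value)` pairs, and the names `recall` and `quorum_read` are not taken from Verel.

```python
from typing import Dict, List, Optional, Tuple

SCOPES = ["self", "team", "org", "global"]   # nearest scope wins

Replica = Optional[Dict[str, Tuple[int, str]]]  # None means unreachable


def recall(key: str, stores: Dict[str, Dict[str, str]]) -> Optional[Tuple[str, str]]:
    """Walk the lattice outward and return (scope, value) for the first hit."""
    for scope in SCOPES:
        value = stores.get(scope, {}).get(key)
        if value is not None:
            return scope, value
    return None


def quorum_read(replicas: List[Replica], key: str) -> Optional[str]:
    """Return the freshest value seen by a majority of replicas.

    Any replica may be down, including the leader, as long as a
    majority still answers.
    """
    reachable = [r for r in replicas if r is not None]
    if len(reachable) < len(replicas) // 2 + 1:
        raise RuntimeError("no quorum")
    answers = [r[key] for r in reachable if key in r]
    if not answers:
        return None
    return max(answers, key=lambda pair: pair[0])[1]   # highest version
```

With three replicas and the first one down, the read still succeeds and returns the value carrying the highest version.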
Eyes & Brain

Two systems, one nervous system

AgentVision is the eyes; Verel is the brain. The eyes perceive a rendered artifact and grade it, including whether it matches what we set out to build, then hand a clean signal up the optic nerve. The brain decides with grader attestation, and only verified work compounds into memory. The eyes can also watch over time: temporal verification of playback, loading, and liveness for streaming UIs, video, and live dashboards. A deterministic video stall gates the bus to FAIL, and verified playback compounds into memory. The two ship and version independently (pip install agentvision, pip install verel) but stay in sync.
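A deterministic stall check of the kind described above might look like this sketch. It assumes the eyes can sample a playback position at known wall-clock times; the sampling format, the threshold, and the name `find_stalls` are all hypothetical and not part of AgentVision's API.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (wall_clock_s, playback_position_s)


def find_stalls(samples: List[Sample], max_stall_s: float) -> List[Tuple[float, float]]:
    """Return wall-clock intervals where playback did not advance
    for longer than max_stall_s. Samples must be in time order."""
    stalls: List[Tuple[float, float]] = []
    stall_start = None
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if p1 <= p0:
            if stall_start is None:
                stall_start = t0          # position froze at t0
        else:
            if stall_start is not None and t0 - stall_start > max_stall_s:
                stalls.append((stall_start, t0))
            stall_start = None
    # A stall that is still ongoing when sampling ends also counts.
    if stall_start is not None and samples[-1][0] - stall_start > max_stall_s:
        stalls.append((stall_start, samples[-1][0]))
    return stalls


def playback_status(samples: List[Sample], max_stall_s: float) -> str:
    return "fail" if find_stalls(samples, max_stall_s) else "pass"
```

The check uses only timestamps and positions, so the same recording always produces the same verdict.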

Eyes and Brain - AgentVision perceives and grades intent; Verel decides and compounds verified work into memory
How it works

Architecture & flow

The five organs and the eval-driven loop — hand-drawn, and verified clean by the eyes Verel ships.

Verel system architecture diagram
Verel eval-driven loop diagram
At a glance

The whole story, one image

Rendered and verified by the eyes Verel ships — Verel gates its own marketing.

Verel architecture infographic — five organs and the eval-driven loop
5 organs, end-to-end
315 tests, dogfooded
1 verdict bus, all senses
MIT open source
Adopt

Drop it into your workflow & agents

Add a CI gate in one step (uses: amitpatole/verel@v0.28.0) that folds tests + lint + types into one verdict, install it as a pre-commit hook, or call verel-ci check in any script. For agents: verel-mcp exposes the verdict bus + memory, with AgentVision as the eyes.
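For the "any script" case, a Python script might wrap the command as in the sketch below. The helper `gate_passes` is hypothetical, and the sketch assumes that the gate command exits with status 0 on a passing verdict and non-zero otherwise; check the verel-ci documentation for its actual exit codes and options.

```python
import subprocess
import sys
from typing import Sequence


def gate_passes(command: Sequence[str]) -> bool:
    """Run a gate command and treat exit status 0 as a passing verdict.

    Assumption: the command signals failure through a non-zero exit code.
    """
    result = subprocess.run(list(command), check=False)
    return result.returncode == 0


# Stand-in commands so the sketch runs without verel installed.
passing = gate_passes([sys.executable, "-c", "raise SystemExit(0)"])
failing = gate_passes([sys.executable, "-c", "raise SystemExit(3)"])
```

In a real script the call would be `gate_passes(["verel-ci", "check"])`, followed by whatever the script should do when the gate refuses.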