Nexus — Layered Architecture

Production-grade agentic AI framework built on Dapr's distributed runtime

Layer 9: CLI & Developer Experience
nexus init, nexus run, nexus eval, nexus deploy, nexus dev, nexus cost, nexus serve, nexus replay, nexus memory, nexus version, nexus redteam, nexus doctor, watch mode
Scaffold, run, evaluate, and deploy agents in seconds. Agent DSL via YAML or Python decorators. Project templates: simple, crew, RAG. Includes nexus doctor (setup health check), nexus replay (time-travel debugging), nexus serve (REST/OpenAI API server), and nexus redteam (adversarial testing).

Layer 8: Evaluation Framework
EvalSuite, assertions, prompt versioning, quality baseline, regression detection, red-teaming (G41), OWASP Agentic Top 10
Built-in eval suites with semantic, regex, and schema assertions. Prompt A/B testing, version diffing, and automated quality regression alerts. Adversarial red-teaming framework (ADR-020): 6 attack strategies covering OWASP Agentic Top 10, attacker/judge architecture, CLI integration with CI exit codes.
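To make the regex and schema assertion types concrete, here is a minimal sketch in plain Python. The names regex_assertion, schema_assertion, run_suite, and exit_code are hypothetical and are not the EvalSuite API; semantic assertions are left out because they need an embedding or judge model.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable

Assertion = Callable[[str], bool]


@dataclass
class AssertionResult:
    name: str
    passed: bool


def regex_assertion(pattern: str) -> Assertion:
    """Pass when the pattern occurs anywhere in the agent output."""
    compiled = re.compile(pattern)
    return lambda output: compiled.search(output) is not None


def schema_assertion(required: dict[str, type]) -> Assertion:
    """Pass when the output is a JSON object with the required typed keys."""
    def check(output: str) -> bool:
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return False
        if not isinstance(data, dict):
            return False
        return all(isinstance(data.get(key), typ) for key, typ in required.items())
    return check


def run_suite(output: str, assertions: dict[str, Assertion]) -> list[AssertionResult]:
    """Evaluate every named assertion against one output."""
    return [AssertionResult(name, fn(output)) for name, fn in assertions.items()]


def exit_code(results: list[AssertionResult]) -> int:
    """CI-style exit code: 0 when everything passed, 1 otherwise."""
    return 0 if all(r.passed for r in results) else 1
```

A non-zero exit code is what lets a CI job fail on a quality regression.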

Layer 7: Observability
OTEL traces, Prometheus metrics, behavior monitor, event log, anomaly detection
Custom OTEL spans per agent action (llm_call, tool_call, memory_read/write, decision). Prometheus metrics endpoint. Immutable audit event log. Behavior drift detection.
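One way to make an audit log tamper-evident is to chain each event to the hash of the previous one. The sketch below assumes that approach; the source does not say how Nexus enforces immutability, and AuditEvent and AuditLog are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder hash preceding the first event


def _event_hash(action: str, detail: str, prev_hash: str) -> str:
    payload = json.dumps([action, detail, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class AuditEvent:
    action: str      # e.g. "llm_call", "tool_call", "memory_write"
    detail: str
    prev_hash: str
    hash: str


class AuditLog:
    """Append-only log; each event commits to the one before it."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def append(self, action: str, detail: str) -> AuditEvent:
        prev = self._events[-1].hash if self._events else GENESIS
        event = AuditEvent(action, detail, prev, _event_hash(action, detail, prev))
        self._events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered event breaks it."""
        prev = GENESIS
        for event in self._events:
            expected = _event_hash(event.action, event.detail, prev)
            if event.prev_hash != prev or event.hash != expected:
                return False
            prev = event.hash
        return True
```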

Layer 6: Safety & Guardrails
injection detection, PII redaction, action boundaries, rate limiting, YAML policies
Runtime middleware wrapping every LLM call, tool call, and memory write. Multi-layer prompt injection detection. PII detection and redaction. Per-agent tool allowlists.
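The wrapping idea can be sketched as a function that checks a per-agent allowlist before a tool runs and redacts PII on the way in and out. This is an illustration only: guarded_tool_call and ToolNotAllowed are hypothetical names, and the single email regex stands in for whatever detectors the framework actually uses.

```python
import re
from typing import Callable

# Deliberately simple email pattern; real PII detection covers far more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ToolNotAllowed(Exception):
    """Raised when an agent calls a tool outside its allowlist."""


def redact_pii(text: str) -> str:
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def guarded_tool_call(
    agent: str,
    tool_name: str,
    allowlists: dict[str, set[str]],
    tool_fn: Callable[[str], str],
    argument: str,
) -> str:
    """Enforce the allowlist, then redact both the input and the output."""
    if tool_name not in allowlists.get(agent, set()):
        raise ToolNotAllowed(f"{agent} may not call {tool_name}")
    return redact_pii(tool_fn(redact_pii(argument)))
```

Redacting before the call keeps PII out of the tool; redacting after keeps it out of whatever the agent stores or logs next.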

Layer 5: Orchestration Engine
graph engine, model router, cost tracker, agent loop, crew (multi-agent), ReAct / Plan-Execute, debate & ensemble, uncertainty QA, long-horizon planning, market allocation, artifact-centric, version router, causal analysis
Checkpointed graph execution with conditional branching. Model routing across fast/balanced/powerful tiers. Budget enforcement. Hierarchical multi-agent crews via Dapr distributed locks. Advanced primitives: debate (ADR-012), uncertainty QA (ADR-013), long-horizon planning (ADR-014), market allocation + artifact-centric collaboration (ADR-015). Version router for A/B prompt testing (ADR-025). Two-tier causal analysis: CausalTracer + CausalWorldModel + SimpleCausalInference (ADR-019).
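Tier routing and budget enforcement can be sketched together. The tier names come from the description above; the prices, the complexity thresholds, and the names pick_tier and CostTracker's methods are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical USD prices per 1,000 tokens for each tier.
TIER_PRICE_PER_1K = {"fast": 0.001, "balanced": 0.01, "powerful": 0.05}


class BudgetExceeded(Exception):
    """Raised when a call would push spend past the budget."""


def pick_tier(complexity: float) -> str:
    """Map a 0..1 complexity estimate to a tier (thresholds are assumed)."""
    if complexity < 0.3:
        return "fast"
    if complexity < 0.7:
        return "balanced"
    return "powerful"


@dataclass
class CostTracker:
    budget_usd: float
    spent_usd: float = 0.0

    def charge(self, tier: str, tokens: int) -> float:
        """Record the cost of a call, refusing it if the budget would be exceeded."""
        cost = TIER_PRICE_PER_1K[tier] * tokens / 1000
        if self.spent_usd + cost > self.budget_usd:
            raise BudgetExceeded(f"{tier} call costs {cost:.4f}, budget exhausted")
        self.spent_usd += cost
        return cost
```

Checking the budget before adding the cost means a rejected call leaves the tracker unchanged.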

Layer 4: Tools & Sandbox
tool registry, MCP client, Docker sandbox, code executor, idempotency, A2A protocol, LangChain adapter, document tools (PDF/Word/Excel), code analysis tools (AST), RAG retrieve tool, plugin lifecycle hooks
Decorator-based tool registration with JSON schema generation. MCP-compatible tool discovery. Docker container sandbox with network/filesystem isolation and container pooling. Pre-built library: web search, HTTP, file I/O, SQL, document parsing (PDF/Word/Excel via [documents]), code analysis (stdlib AST + ruff/mypy subprocess). LangChain tool adapter unlocks 1,000+ community tools. Plugin lifecycle hooks (ADR-024): pre/post LLM, tool, and memory hooks via entry-point discovery.
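Decorator-based registration with schema generation might look roughly like the following. The decorator name tool, the TOOL_REGISTRY dict, and the schema layout are assumptions, not the framework's actual interface; only a handful of primitive types are mapped.

```python
import inspect
from typing import Any, Callable, get_type_hints

TOOL_REGISTRY: dict[str, dict[str, Any]] = {}

# Python type -> JSON Schema type; anything unmapped falls back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn and derive a JSON-schema-style description from its signature."""
    hints = get_type_hints(fn)
    properties: dict[str, dict[str, str]] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
    TOOL_REGISTRY[fn.__name__] = {
        "fn": fn,
        "schema": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
    return fn


@tool
def add(a: int, b: int = 0) -> int:
    """Add two integers."""
    return a + b
```

The decorator returns the function unchanged, so registered tools stay directly callable in tests.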

Layer 3: Memory
working memory, episodic, semantic (facts), procedural, provenance, trust scoring, hybrid retrieval, reflexion (F1), user modeling (F2), graph consolidation (F3), lifecycle tiers (F3), multi-provider embeddings
Four-tier memory (working, episodic, semantic, procedural) with provenance + SHA-256 integrity on all writes. Post-launch additions: ReflexionEngine + SkillLibrary for agent self-improvement (ADR-016); three-tier user modeling with temporal validity and bidirectional construction (ADR-017); graph-structured memory consolidation with hot/warm/cold lifecycle tiers and adaptive retrieval routing (ADR-018). Multi-provider embeddings: OpenAI, Cohere, Ollama — routable per memory type (ADR-023).
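The provenance and integrity rule can be illustrated with a record that stores its source alongside a SHA-256 digest of its contents. MemoryRecord, write_memory, and verify are hypothetical names, and the choice of which fields go into the digest is an assumption.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    tier: str      # "working", "episodic", "semantic", or "procedural"
    content: str
    source: str    # provenance: who or what produced this memory
    sha256: str    # integrity digest over the three fields above


def _digest(tier: str, content: str, source: str) -> str:
    payload = json.dumps(
        {"tier": tier, "content": content, "source": source}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def write_memory(tier: str, content: str, source: str) -> MemoryRecord:
    """Create a record whose hash covers both content and provenance."""
    return MemoryRecord(tier, content, source, _digest(tier, content, source))


def verify(record: MemoryRecord) -> bool:
    """True when the stored hash still matches the record's fields."""
    return record.sha256 == _digest(record.tier, record.content, record.source)
```

Including the source in the digest means provenance cannot be rewritten without invalidating the hash.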

Layer 2: Core Abstractions
NexusConfig, NexusError hierarchy, structlog, Pydantic v2 types, ModelClient ABC, Anthropic / OpenAI, Cohere / Ollama, agent versioning
Foundation layer: config (pydantic-settings, NEXUS_ env prefix), typed error hierarchy, structured JSON logging, all canonical Pydantic types, and async model clients for Anthropic, OpenAI, Cohere, and Ollama. Agent versioning (ADR-025): content-addressed SHA-256 version IDs, VersionStore, ABTestManager, sticky session routing.
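Content-addressed version IDs and sticky routing both reduce to hashing. A minimal sketch follows, assuming an agent definition is a JSON-serializable dict; version_id and sticky_variant are hypothetical helpers, not the VersionStore or ABTestManager API.

```python
import hashlib
import json
from typing import Any


def version_id(agent_definition: dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding: same content, same ID."""
    canonical = json.dumps(agent_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sticky_variant(session_id: str, variants: list[str]) -> str:
    """Deterministically map a session to one variant so it never flips mid-test."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]
```

Sorting keys before hashing keeps the ID independent of dict ordering, so two definitions with the same content always resolve to the same version.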

Layer 1: Dapr Runtime (Infrastructure Backbone)
state (PostgreSQL + pgvector), cache (Redis), pub/sub, workflows, actors, distributed lock, mTLS, OTEL auto-trace
All infrastructure access goes through the Dapr sidecar; agent code never touches databases or brokers directly. Backends are swappable via component YAML with zero code changes. mTLS between services requires no configuration.
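For a sense of what "only via the sidecar" means in practice, the sketch below saves state through Dapr's HTTP state API using only the standard library. The store name "statestore" is an assumed component name, and Nexus may well use the Dapr SDK rather than raw HTTP.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any, Optional

# The sidecar's HTTP port; Dapr exposes it to the app as DAPR_HTTP_PORT.
DAPR_HTTP_PORT = os.environ.get("DAPR_HTTP_PORT", "3500")


def state_url(store: str, key: Optional[str] = None) -> str:
    """URL of a state store, or of a single key inside it."""
    base = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store}"
    return base if key is None else f"{base}/{urllib.parse.quote(key, safe='')}"


def build_save_request(store: str, key: str, value: Any) -> urllib.request.Request:
    """Build the POST that saves one key/value pair via the sidecar."""
    body = json.dumps([{"key": key, "value": value}]).encode("utf-8")
    return urllib.request.Request(
        state_url(store),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_state(store: str, key: str, value: Any) -> int:
    """Send the request; requires a running sidecar. Returns the HTTP status."""
    with urllib.request.urlopen(build_save_request(store, key, value)) as resp:
        return resp.status
```

The code names a store, never a database. Pointing "statestore" at PostgreSQL or Redis is a change to the component YAML, not to this code.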

Layer principles

  • Each layer builds exclusively on the layers below it
  • Dapr is the only I/O path: no direct DB or broker access
  • All public types are Pydantic v2 models (strict mode)
  • Safety wraps Orchestration: every LLM and tool call is intercepted
  • Every memory write carries provenance and a SHA-256 hash
  • Observability and Evaluation are orthogonal: both can start after Core
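The first principle is mechanically checkable. A small sketch, using the layer numbers from this document and a hypothetical layering_violations helper, flags any dependency that points sideways or upward:

```python
# Layer numbers as given in this document (1 = bottom, 9 = top).
LAYER_NUMBER = {
    "dapr": 1, "core": 2, "memory": 3, "tools": 4, "orchestration": 5,
    "safety": 6, "observability": 7, "evaluation": 8, "cli": 9,
}


def layering_violations(dependencies: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where source does not sit strictly above target."""
    return sorted(
        (src, dst)
        for src, targets in dependencies.items()
        for dst in targets
        if LAYER_NUMBER[dst] >= LAYER_NUMBER[src]
    )
```

A check like this could run in CI against an import graph; how that graph is extracted is outside the scope of the sketch.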

Build dependency order

Phase 0–2 (Foundation)

  • Bootstrap → Core → Dapr

Phase 3–5 (Memory)

  • Working & Episodic (3)
  • Semantic & Procedural (4)
  • Memory Security (5)

Phase 6–7 (Execution)

  • Tools & Sandbox (6)
  • Orchestration (7) ← needs 3+6

Phase 8 (Safety)

  • Wraps Orchestration
  • Requires Phase 7

Phase 9–10 (Cross-cutting)

  • Observability — after Phase 2
  • Evaluation — after Phase 5

Phase 11–12 (Ship)

  • CLI — after Phase 7
  • Integration + Release — last
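The phase constraints above form a dependency graph, so a valid build order can be derived with a topological sort. In the sketch below, the edges for phases 7 through 12 come from the list above; the sequential ordering of phases 0 through 5, and the assumption that Tools (6) only needs the foundation, are guesses labeled in the comments.

```python
from graphlib import TopologicalSorter

# phase -> set of phases that must finish first
PHASE_DEPS: dict[int, set[int]] = {
    0: set(),
    1: {0},               # assumed: Bootstrap -> Core -> Dapr maps to phases 0 -> 1 -> 2
    2: {1},
    3: {2},               # assumed: memory phases run in sequence after the foundation
    4: {3},
    5: {4},
    6: {2},               # assumed: Tools needs only the foundation
    7: {3, 6},            # stated: Orchestration needs 3 + 6
    8: {7},               # stated: Safety requires Phase 7
    9: {2},               # stated: Observability after Phase 2
    10: {5},              # stated: Evaluation after Phase 5
    11: {7},              # stated: CLI after Phase 7
    12: set(range(12)),   # stated: Integration + Release comes last
}


def build_order() -> list[int]:
    """One valid ordering of the phases that respects every dependency."""
    return list(TopologicalSorter(PHASE_DEPS).static_order())
```

Several orderings satisfy the graph; the sort returns one of them, which is why Observability and Evaluation can be scheduled flexibly.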

Post-launch A–B (Complete)

  • Token streaming, deep research demo
  • Tool library, LangChain adapter
  • REST server, OpenAI-compat API, Jupyter

Post-launch C–E (Complete)

  • HITL UI, time-travel debug (C)
  • Debate, uncertainty QA, planning (E)
  • Market allocation, artifacts (E)

Post-launch F–H (Complete)

  • Reflexion, user modeling (F)
  • Graph memory, lifecycle tiers (F3)
  • Causal analysis (F4), red-teaming (G1)
  • Doc/code tools, multi-embed, plugins, versioning, RAG (H)

Phase I — Quality & Ops

  • nexus doctor, health endpoints
  • Secrets scan, pip-audit, SBOM
  • Graceful shutdown, rate limiting
  • CSP headers, SECURITY.md
  • Benchmarks, mutation testing