Why now
AI systems behave like distributed systems with a non-determinism budget. We don't have the time machine yet. NovaFabric is that time machine.
Who this is for
Engineers shipping AI agents to production. Researchers running experiments that have to reproduce. Auditors who need the trail. Platform teams whose oncall pager rings when a model returns garbage at 2am.
What's changed
Three years ago, debugging an LLM call meant reading a single prompt and a single response. Today's systems chain six model calls, ten tool invocations, four MCP servers, two retrieval backends, and three asynchronous workers — and any of them can drift between Tuesday and Wednesday. The mental model that worked for one prompt no longer holds.
What we won't do
We will not capture full prompts and responses by default — too noisy, too sensitive, too expensive. We will not run a hosted service in v0.x. We will not invent a third top-level format beyond Run Capsule and Evidence Bundle. We will not block the user workload by default if a NovaFabric component fails. The non-goals doc is the load-bearing constraint.