Changelog
What's new in each version of ARGUS.
v0.4.0BETA2026-05-17
Beta Testing Rollout
- •LLM token/cost tracking — auto-extracts usage from node outputs, shows cost per node and total
- •Redesigned sidebar into Observe / Analyze sections
- •Replay from subdirectories — runs recorded from child folders are now found and replayable
- •Changelog page with full version timeline
- •Report the Dev page for bug reports and feature requests
- •Restored and expanded test suite
v0.3.10MINOR2026-05-06
Replay wired to UI
- •Replay from UI — hover any step, click "replay from here", no CLI needed
- •App factory input persists to .argus/config.json automatically
- •Auto-compare after replay — navigates to diff view showing original vs replay
- •Eval metrics panel in compare page (failure count, severity, success rate)
- •--app flag on argus ui for startup config
v0.3.7MINOR2026-05-04
Auto-login & interrupt stitching
- •Auto-login in argus ui — reads CLI credentials, no second OAuth flow
- •Interrupted runs + resume continuations stitched into a single merged view
- •Resume runs hidden from top-level list to avoid duplicates
v0.3.5MINOR2026-05-04
Cloud storage
- •Google login and Supabase cloud storage
- •Background sync of runs to cloud
v0.3.3MAJOR2026-04-27
Web Dashboard
- •argus ui — web dashboard served by a pure-Python HTTP server, no Node.js required
- •Runs list with status, duration, pass rate stats
- •Per-run detail view with step-by-step inspection
- •Side-by-side run comparison at /compare
- •Zero extra dependencies — just Python
v0.3.2PATCH2026-04-22
Silent failure enhancements
- •Strict mode for inspector
- •Nested error scan in tool outputs
- •Generic list type checking
v0.3.0MAJOR2026-04-19
Parallel Execution
- •Parallel nodes grouped in a parallel panel in argus show
- •Graph topology diagram above the step list
- •Silent failure detection is parallel-aware — only blames nodes for fields they wrote
- •Root cause chain excludes innocent parallel siblings
v0.2.2MINOR2026-04-18
Run Differentiator
- •argus diff — compare any two runs node-by-node
- •Status changes with FIXED / REGRESSION labels
- •Inspection diff, validator result flips, output field deltas
- •Frozen node and duration change tracking
v0.2.1MINOR2026-04-14
Tool Call Failure Detection
- •Detects error_response, rate_limit (HTTP 429), and empty_result from tool calls
- •Tool failures show inline under the node that swallowed them
v0.1.1MINOR2026-04-10
Deterministic Replay
- •Replay plays back exact outputs from the original run — no live LLM calls
- •Reproducible replays with zero configuration
v0.1.0MAJOR2026-04-02
MVP
- •Core monitoring: ArgusWatcher for LangGraph, ArgusSession for any framework
- •Silent failure detection: missing fields, type mismatches, empty outputs
- •Semantic signature registry for placeholder/degraded LLM outputs
- •Root cause analysis chain
- •CLI: argus show, argus replay
- •Local storage in .argus/runs/