
How to Use Argus

A step-by-step walkthrough of every section in the Argus dashboard — from browsing your run history to reading failure details, comparing executions, and replaying from a broken node.

Quick-start — paste this into your LLM (Claude Code, Cursor, etc.) to add ARGUS to your pipeline:

Add ARGUS monitoring to my LangGraph pipeline. In the file where the graph is built, add the following before graph.compile():

from argus import ArgusWatcher

watcher = ArgusWatcher()
watcher.watch(graph)    # must be called BEFORE graph.compile()
app = graph.compile()

For cyclic graphs, also call watcher.finalize() after app.invoke().

1. Runs List

The home page is your pipeline execution history. Every time your pipeline runs with Argus attached, an entry appears here automatically.


The runs list — aggregate stats at the top, evaluation panel, and the full run table below.

Summary cards

Total Runs: Number of pipeline executions recorded in your workspace.
Clean: Runs where every node passed with no failures detected.
Failed: Runs with at least one silent failure, crash, or semantic failure.
Pass Rate: Percentage of clean runs over the total.
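
The four cards are simple aggregates over per-run statuses. The sketch below shows that arithmetic under the assumption that every run carries a single status string; the `summarize` helper and the status spellings are illustrative and are not part of Argus's API.

```python
from collections import Counter

# Hypothetical status label; the dashboard's internal value may differ.
CLEAN = "clean"


def summarize(statuses):
    """Return the four summary-card values for a list of run statuses."""
    counts = Counter(statuses)
    total = len(statuses)
    clean = counts[CLEAN]
    failed = total - clean  # any non-clean status counts as failed
    pass_rate = 100.0 * clean / total if total else 0.0
    return {
        "total_runs": total,
        "clean": clean,
        "failed": failed,
        "pass_rate": pass_rate,
    }


cards = summarize(["clean", "silent_failure", "clean", "crashed"])
print(cards)
```

An empty workspace is reported as a 0% pass rate here to avoid dividing by zero; how the dashboard displays that case is not stated in this guide.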

Run table columns

RUN ID: Unique identifier for the run. Click to open the full detail view.
STATUS: Overall result: clean, silent failure, crashed, or semantic fail.
GRAPH: The node execution path, summarised as a chain.
STEPS: Total number of nodes that executed in this run.
FIRST FAILURE: The first node that produced bad output, which is the likely root cause.
SHAPE: Whether all expected nodes ran (full) or the run was cut short (partial).

Evaluation panel

The Evaluation section lets you filter runs by criteria — set a goal description and add constraints like overall_status == clean to find runs that meet specific conditions. Hit Evaluate to filter the table.
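
To make the constraint idea concrete, here is a minimal sketch of how a filter such as overall_status == clean could be applied to a list of run records. It handles equality only; the `parse_constraint` and `evaluate` helpers, the record layout, and the run IDs are assumptions for illustration, since this guide does not describe the constraint grammar Argus actually accepts.

```python
def parse_constraint(text):
    """Split a 'field == value' constraint into (field, value)."""
    field, sep, value = text.partition("==")
    if not sep:
        raise ValueError(f"unsupported constraint: {text!r}")
    return field.strip(), value.strip()


def evaluate(runs, constraints):
    """Keep only the runs that satisfy every constraint."""
    parsed = [parse_constraint(c) for c in constraints]
    return [
        run
        for run in runs
        if all(str(run.get(field)) == value for field, value in parsed)
    ]


# Hypothetical run records; real records carry more fields.
runs = [
    {"run_id": "run-a", "overall_status": "clean"},
    {"run_id": "run-b", "overall_status": "crashed"},
]
matching = evaluate(runs, ["overall_status == clean"])
print([run["run_id"] for run in matching])
```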

Click any run ID to open its full detail page.

2. Run Detail

The run detail page gives you a complete picture of what happened during a single pipeline execution — metrics, the execution trace, AI analysis, and the initial state.


Top of the run detail page: run ID, status, root cause chain, metrics grid, and the execution timeline.

Header

Shows the run ID, overall status badge, timestamp, total duration, step count, and Argus version. The Compare button lets you immediately diff this run against another.

Root cause chain

When a failure propagates downstream, Argus traces back to find the originating node. The red banner shows the chain — e.g. extract_skills → generate_summary — so you know exactly which node to fix, not which node complained.
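
The trace-back can be pictured as following "blamed upstream" links from the node that complained to the node that first produced bad state. The sketch below assumes each timeline entry names the upstream node responsible for its missing input, if any; the record layout, the `root_cause_chain` helper, and the `parse_resume` node are hypothetical, and Argus's real analysis may use other signals.

```python
def root_cause_chain(timeline, complaining_node):
    """Walk upstream links from a failing node back to the originating node."""
    by_name = {step["node"]: step for step in timeline}
    chain = [complaining_node]
    seen = {complaining_node}
    current = by_name[complaining_node]
    while current.get("upstream") is not None:
        name = current["upstream"]
        if name in seen:  # guard against cyclic blame links
            break
        chain.append(name)
        seen.add(name)
        current = by_name[name]
    chain.reverse()  # put the origin first, as in the banner
    return chain


timeline = [
    {"node": "parse_resume", "status": "pass", "upstream": None},
    {"node": "extract_skills", "status": "silent_failure", "upstream": None},
    {"node": "generate_summary", "status": "degraded_input",
     "upstream": "extract_skills"},
]
chain = root_cause_chain(timeline, "generate_summary")
print(" → ".join(chain))
```

The first element of the chain is the node to fix; every later element only inherited the problem.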

Metrics

Duration: Total wall-clock time for the full pipeline execution.
Success Rate: Percentage of nodes in this run that passed.
Failures: Number of nodes with any failure status.
Severity: Worst severity level seen: ok, warning, or critical.
Completed: Whether the pipeline ran to the final node or was cut short.
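
As a rough illustration of how the per-run metrics relate to the node records, the sketch below derives Failures, Success Rate, and Severity from a list of steps. The field names and the `run_metrics` helper are assumptions; only the three severity levels come from this guide.

```python
# Severity levels from least to most severe, as listed above.
SEVERITY_ORDER = ["ok", "warning", "critical"]


def run_metrics(steps):
    """Derive failures, success rate, and worst severity from node records."""
    total = len(steps)
    failures = sum(1 for step in steps if step["status"] != "pass")
    success_rate = 100.0 * (total - failures) / total if total else 0.0
    worst = max(
        (step["severity"] for step in steps),
        key=SEVERITY_ORDER.index,
        default="ok",
    )
    return {"failures": failures, "success_rate": success_rate, "severity": worst}


# Hypothetical node records for one run.
steps = [
    {"status": "pass", "severity": "ok"},
    {"status": "fail", "severity": "warning"},
    {"status": "fail", "severity": "critical"},
    {"status": "pass", "severity": "ok"},
]
metrics = run_metrics(steps)
print(metrics)
```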

Execution timeline

Each node is listed in order with its name, output type tag, duration, and status. Nodes with failures show an indented root cause annotation — the specific field that was missing and which upstream node failed to produce it. Expand any row with the arrow to see the full input/output JSON.


Lower execution timeline showing degraded_input propagation, followed by the AI Analysis panel.

AI Analysis

When OPENAI_API_KEY is set, Argus automatically investigates non-clean runs. The panel breaks down the failure into three parts:

Root Cause Node

The specific node Argus identified as the origin of the failure — not the node that complained, but the one that first produced the broken state.

Reason

A concise explanation of why that node failed and how the bad state propagated through downstream nodes.

How to Fix It

Numbered action items — each targeting a specific node — telling you exactly what to change to prevent the failure from recurring.

A confidence score is shown in the top-right of the panel. The footer shows how many causal hypotheses were evaluated and how many observations were used.


AI fix steps, the Correlation panel (origin node + confidence), and the Behavior/Initial State sections.

Correlation

Argus runs a correlation analysis to confirm which node is the true origin of the degradation. The panel shows the origin node name, step index, failure signals (e.g. missing_field), and a confidence score.

Behavior & Initial State

The Behavior section shows the raw initial state your pipeline received, that is, the exact input dict at invocation time. This is useful for reproducing the failure locally.

3. Compare

Compare two runs side-by-side to see exactly what changed — useful for verifying a fix worked, catching regressions, or understanding why one run is faster than another.


Compare page: winner verdict at the top, aggregate stats table, then a node-by-node status comparison.

How to compare

Step 1: Open Compare
Click Compare in the sidebar, or use the Compare button on any run detail page (pre-fills Run A).

Step 2: Enter two run IDs
Paste a Run A (typically the older / broken run) and Run B (the newer / fixed run).

Step 3: Read the verdict
The winner banner shows which run performed better and why: fewer failures, faster duration, or higher success rate.

Step 4: Read the node diff
Each node is listed with its status in A and B. Nodes only present in one run are labelled only in A or only in B. Status changes are highlighted.

The aggregate table shows Failures, Duration, and Success Rate side-by-side with a winner indicator (B ✓) for each metric.
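
The node-by-node comparison boils down to a keyed diff of two status maps. Here is a minimal sketch of that logic, assuming each run is reduced to a mapping of node name to status; the `node_diff` helper, the `legacy_step` and `format_output` nodes, and the "changed"/"same" labels are illustrative rather than Argus's own output format.

```python
def node_diff(run_a, run_b):
    """Compare two {node: status} maps and label each node's row."""
    rows = []
    # Keep A's order, then append nodes that appear only in B.
    names = list(run_a) + [name for name in run_b if name not in run_a]
    for name in names:
        if name not in run_b:
            rows.append((name, run_a[name], None, "only in A"))
        elif name not in run_a:
            rows.append((name, None, run_b[name], "only in B"))
        elif run_a[name] != run_b[name]:
            rows.append((name, run_a[name], run_b[name], "changed"))
        else:
            rows.append((name, run_a[name], run_b[name], "same"))
    return rows


# Hypothetical runs: A is the broken run, B is the fixed one.
run_a = {"extract_skills": "fail", "generate_summary": "fail", "legacy_step": "pass"}
run_b = {"extract_skills": "pass", "generate_summary": "pass", "format_output": "pass"}
diff = node_diff(run_a, run_b)
for row in diff:
    print(row)
```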

4. Replay

Replay re-executes your pipeline from a specific node using the frozen input state captured from a previous run. This means you can test a fix without re-running the full pipeline or making new LLM calls for the nodes before the broken one.

How replay works

When Argus records a run, it saves the input state at every node. When you replay from node X, Argus loads the exact input that node X received originally, then re-executes node X and everything downstream with your current code. A new run ID is created for the result.
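
The mechanism can be modelled in a few lines of plain Python. This is a simplified sketch for a linear pipeline, not Argus's implementation: `run_pipeline`, `replay`, and the node functions are all hypothetical, and real graphs may branch or loop.

```python
import copy


def run_pipeline(nodes, state, start=0):
    """Run nodes[start:] in order, saving a frozen copy of each node's input."""
    recorded_inputs = {}
    for name, fn in nodes[start:]:
        recorded_inputs[name] = copy.deepcopy(state)
        state = {**state, **fn(state)}  # merge the node's partial update
    return state, recorded_inputs


def replay(nodes, recorded_inputs, from_node):
    """Re-run from_node and everything downstream using its recorded input."""
    names = [name for name, _ in nodes]
    start = names.index(from_node)
    frozen = copy.deepcopy(recorded_inputs[from_node])
    return run_pipeline(nodes, frozen, start=start)


calls = []  # tracks how often the upstream node runs


def load(state):
    calls.append("load")
    return {"text": state["raw"].strip()}


def extract_buggy(state):
    return {}  # bug: never emits "skills"


def extract_fixed(state):
    return {"skills": state["text"].split(",")}


def generate_summary(state):
    return {"summary": f"{len(state.get('skills', []))} skills"}


original = [("load", load), ("extract_skills", extract_buggy),
            ("generate_summary", generate_summary)]
first_result, recorded = run_pipeline(original, {"raw": " python,sql "})

fixed = [("load", load), ("extract_skills", extract_fixed),
         ("generate_summary", generate_summary)]
replayed, _ = replay(fixed, recorded, "extract_skills")
print(first_result["summary"], "->", replayed["summary"])
```

Note that `load` runs only once: the replay starts from the input that `extract_skills` originally received, so upstream work (and any LLM calls it made) is not repeated.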

Step by step — from the dashboard

Step 1: Open the failing run
Click the run ID on the runs list to open its detail page.

Step 2: Find the root cause node
Check the red root cause banner at the top; it names the node that first produced bad output. That's the node you want to replay from.

Step 3: Click the replay icon
In the execution timeline, each node row has a replay icon (↺) on the right. Click it on the root cause node.

Step 4: Wait for the new run
Argus re-executes from that node forward. When done, you're taken to the new run's detail page with a fresh set of results.

Step 5: Compare to confirm
Use the Compare button to diff the original run against the replay. The broken nodes should now show pass.

Step by step — from the CLI

Replay from a specific node:
argus replay <run-id> <node-name>

If node functions weren't stored in the run:
argus replay <run-id> <node-name> --app my_pipeline:build_graph

The --app flag takes a module:function path to your graph factory function. It is only needed if node function references weren't captured at recording time. After the replay, use argus diff to compare the two runs:

argus diff <original-run-id> <replay-run-id>

Screenshots for the replay UI will be added in a future update.