QA Runbook/admin UI rebuild/2026-05-20/manual / browser

Manual QA runbook.

Step-by-step checks for every feature shipped in P1–P5 plus the security hardening. The automated suite (2297 tests) covers server logic; this runbook closes the gap it can't reach: the browser-side JavaScript and the real end-to-end flows.

Smoke path: ~15 min
Full pass: ~60–90 min
Needs real $: re-run · replay
Browser gap: the JS tests
Legend

smoke: part of the ~15-min critical happy path
js: exercises browser JavaScript (the gap the test suite can't reach; do these carefully)
$: incurs real cost (LLM tokens or proxy bandwidth)
super-admin: requires a super-admin session
perm: a permission/negative test

00 Setup & prerequisites

Before you start.

Get the server running, a super-admin session, a completed scan to inspect, and (optionally) two extra accounts for the permission tests.

SET-01 Run the migrations on your target DB
Do: set -a; source .env; set +a; .venv/bin/python -m alembic upgrade head
Expect: Upgrades run through e9a4b3c1d5f8 (llm_models) and d8e6f4a2b9c1 (validation_replays) with no error. llm_models seeds 9 rows.
SET-02 Start the dev server
Do: set -a; source .env; set +a; .venv/bin/python -m uvicorn browser_recon_server.app:create_app --factory --port 8000 — wait ~10–15s for the DB connection on startup.
Expect: Log shows "Application startup complete". curl localhost:8000/health → 200.
SET-03 Log in as super-admin
Do: Visit /dashboard/login, enter your super-admin email, submit. In dev the magic link is printed to the server log (LoggingEmailProvider) — copy it from stdout and open it. If you don't have a super-admin user yet, set one directly: UPDATE users SET role='super_admin' WHERE email='you@example.com'; (the role-change route itself requires super-admin, so the first one is a DB update).
Expect: You land on the dashboard and the top bar shows your role as super-admin.
SET-04 Find a completed scan
Do: Open /admin/scans. Pick a scan with status complete (ideally one with an LLM synthesis + a validation step). Note its scan id. If you have none, run a real scan first, or pick the richest one available.
Expect: The list shows only terminal-state scans (complete / errored / cancelled) by default — no in-progress rows.
SET-05 (Optional) Create an admin + a plain user for permission tests [perm]
Do: .venv/bin/python scripts/seed_user.py --email admin-test@example.com then UPDATE users SET role='admin' WHERE email='admin-test@example.com';. Repeat for a plain user role. Skip if you only want the happy-path checks.
Expect: Two extra accounts you can log in as for the §Permissions section.
01 Smoke path · ~15 min · if these pass, the core works

The critical happy path.

Run this first. It walks one scan through the whole new surface and touches each major feature once.

SM-01 Step Trace renders [smoke]
Do: Open /debug/scans/{id} for your completed scan.
Expect: A single table lists every pipeline step (capture, detection, analysis, flow_confirm, intent_filter, validation, scrub, synthesis, replay, render) with #, type, started, duration, model, cost, status, S3, and an "explore →" link. Sub-prompts (notes, difficulty_drivers) appear indented under synthesis. Roll-up panels show cost-by-step and duration breakdown.
SM-02 Trace filters work [smoke · js]
Do: Click the "llm" filter pill. Then type "synthesis" in the search box. Then toggle "cost > $0.01".
Expect: Rows hide/show live, the "N of M steps shown" count updates, and the URL hash changes (e.g. #filter=llm&q=synthesis). Reload the page with that hash → the filter state restores.
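When comparing the hash before and after a reload, it can help to decode it into filter state rather than eyeballing the string. A minimal sketch, assuming the hash is a query-string-style `key=value&key=value` fragment as in the example above; the function name and the scan id `123` are hypothetical and not taken from the codebase.

```python
from urllib.parse import parse_qs, urlsplit


def parse_filter_hash(url: str) -> dict[str, str]:
    """Decode a fragment such as '#filter=llm&q=synthesis' into a dict."""
    fragment = urlsplit(url).fragment
    # parse_qs returns a list per key; keep the last value for each one.
    return {key: values[-1] for key, values in parse_qs(fragment).items()}


# Hypothetical URL; paste the one from the address bar instead.
state = parse_filter_hash(
    "http://localhost:8000/debug/scans/123#filter=llm&q=synthesis"
)
print(state)
```

Decoding the URL once before the reload and once after, then comparing the two dicts, confirms that the state restored is the state that was saved.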
SM-03 Open an LLM step + lazy-load the prompt [smoke · js]
Do: Click "explore →" on the synthesis step. Click "fetch prompt + response".
Expect: The step page shows timing/metadata. After the fetch, the system + user prompts and the parsed response render in monospace panels. The "[expand ↧]" toggle reveals truncated content. No console errors.
SM-04 Re-run an LLM step on a different Anthropic model [smoke · js · $ · super-admin]
Do: In the Re-run panel, pick claude-haiku-4-5, leave prompts empty, click "▶ run". Allow the browser notification permission prompt when it appears.
Expect: Page redirects back with ?just_ran=..., a new row appears in the Comparisons table (model, cost, latency, outcome), and a browser notification fires "Re-run complete". The original scan's report is unchanged.
SM-05 Open the validation step + expand an endpoint [smoke]
Do: Back to the trace, "explore →" on the validation step. Click "expand →" on an endpoint.
Expect: Endpoint list shows bucket pills + best library. Expanding reveals the per-request matrix (library × proxy × status × elapsed × block), plus header_reduce, cookie scenario, and rate-limit panels.
SM-06 Fire one validation replay [smoke · js · $ · super-admin]
Do: In the Replay panel, keep a public target URL, supply any scrubbed header value if required, click "▶ fire".
Expect: Redirects with ?just_replayed=..., a row appears in Replay History (status, lib/proxy, elapsed), and a "Replay complete" notification fires.
SM-07 SSRF block is enforced [smoke · super-admin]
Do: In the Replay panel, change the URL to http://169.254.169.254/latest/meta-data/ and fire.
Expect: No network call is made. A history row records status 0 with an error like "blocked target". No metadata/credentials in the body preview.
SM-08 Scrub + Pipeline-Replay explorers render [smoke]
Do: From the trace, "explore →" on the scrub step, then on the replay step.
Expect: Scrub shows the redaction manifest (counts of stripped cookies/headers/etc.). Replay shows per-endpoint outcomes + a confidence figure.
SM-09 Models Registry renders + edit a price [smoke · super-admin]
Do: Open /admin/llm-models. Edit a non-pinned model, change its input price, save. Then trigger any LLM action and check the cost.
Expect: List shows all models with prices + 30-day usage. The edit saves; the next LLM call's cost reflects the new rate with no redeploy. (Revert the price afterward.)
SM-10 Retired routes redirect [smoke]
Do: Visit /admin/sandbox/list, then /admin/evals/{id}.
Expect: The first 302s to /admin/experiments; the second 302s to /debug/scans/{id}. Neither 500s.
SM-11 Tooltips appear [smoke · js]
Do: Hover (and Tab to) the "?" icons on the trace columns, the registry pricing headers, and a replay panel field.
Expect: A readable tooltip appears on hover and on keyboard focus; it doesn't break the row layout.
SM-12 No console errors anywhere on the path [smoke · js]
Do: With DevTools open, re-walk SM-01 → SM-11 watching the Console + Network tabs.
Expect: Zero uncaught JS errors, zero failed asset loads (CSS/JS/fonts), no 500s in Network.
02 P1 · Step Trace (full)

Trace overview regression.

P1-01 Status pills colour-coded correctly
Do: Inspect the status column across steps; find an errored or partial scan via /admin/scans if you have one.
Expect: complete=green, errored=red, partial=amber, not-reached/skipped=muted. An errored scan shows a danger banner "Scan errored at step N" and downstream steps as "not reached".
P1-02 Cost + duration rollups add up
Do: Compare the per-step costs in the table against the "LLM cost by step" panel total.
Expect: The panel total equals the sum of per-step LLM costs; percentages sum to ~100%. Duration breakdown likewise.
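The arithmetic in this check is easy to get wrong by eye when there are many steps. A minimal sketch of the comparison, assuming you copy the per-step costs and the panel figures out of the page by hand; the function name, the tolerances, and the sample numbers are all hypothetical.

```python
import math


def check_rollup(
    step_costs: dict[str, float],
    panel_total: float,
    panel_percentages: dict[str, float],
) -> list[str]:
    """Return a list of problems; an empty list means the rollup adds up."""
    problems = []
    expected_total = math.fsum(step_costs.values())
    # Tolerate display rounding of a fraction of a cent (assumed tolerance).
    if not math.isclose(expected_total, panel_total, abs_tol=0.0001):
        problems.append(
            f"panel total {panel_total} != sum of steps {expected_total:.4f}"
        )
    pct_sum = math.fsum(panel_percentages.values())
    # "~100%" is read here as within one percentage point (assumed).
    if abs(pct_sum - 100.0) > 1.0:
        problems.append(f"percentages sum to {pct_sum:.1f}, not ~100")
    return problems


# Hypothetical figures copied from a trace page.
costs = {"analysis": 0.012, "synthesis": 0.034}
print(check_rollup(costs, 0.046, {"analysis": 26.1, "synthesis": 73.9}))
```

The same function works for the duration breakdown if you pass seconds instead of dollars.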
P1-03 Sub-page links resolve
Do: Click each link in the sub-pages strip (full report, capture timeline, buckets, audit log, raw json).
Expect: Each opens its target (the "raw json" opens the trace.json payload) with no 404/500.
P1-04 Status filter + multi-pill filter [js]
Do: Toggle several step-type pills on/off; change the status dropdown; clear with "all".
Expect: Filtering is additive and the table never goes blank when all type-pills are off (falls back to all). Count stays accurate.
P1-05 /admin/scans in-progress toggle
Do: On /admin/scans, check the "include in-progress" toggle (you may need an in-progress scan to see a difference).
Expect: Default view hides pending/processing scans; the toggle reveals them. Existing user/domain/date filters still work alongside it.
P1-06 trace.json ETag / 304
Do: curl -i the trace.json route (with your session cookie). Repeat with -H 'If-None-Match: <etag>'.
Expect: First call 200 with an ETag header; second call returns 304 with no body.
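For reference, the 200-versus-304 decision this check exercises follows the standard `If-None-Match` rule: the header may carry several tags, comparison ignores the weak `W/` prefix, and `*` matches anything. The sketch below models that rule so you can predict the status for a given header; it is not the server's implementation, and the function names are hypothetical.

```python
from typing import Optional


def _opaque_tag(tag: str) -> str:
    """Strip whitespace and the weak-validator prefix for weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def expected_status(if_none_match: Optional[str], current_etag: str) -> int:
    """Predict the status of a GET given the request header and the current ETag."""
    if if_none_match is None:
        return 200
    if if_none_match.strip() == "*":
        return 304
    # Simplification: splitting on commas assumes no comma inside a tag.
    sent = {_opaque_tag(t) for t in if_none_match.split(",")}
    return 304 if _opaque_tag(current_etag) in sent else 200


print(expected_status(None, '"abc"'))
print(expected_status('"abc"', '"abc"'))
print(expected_status('"stale"', '"abc"'))
```

When repeating the curl call, pass the ETag exactly as returned, including its double quotes; a tag pasted without quotes will not match.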
P1-07 ?legacy=1 is ignored, not broken
Do: Open /debug/scans/{id}?legacy=1.
Expect: Renders the new Step Trace (the old tabbed page is gone). No 500.
03 P2 · LLM Explorer (full)

LLM step + re-run regression.

P2-01 Metadata grid is complete + accurate
Do: On an LLM step, read the timing/metadata grid.
Expect: prompt_name, version, model, provider, started/ended/elapsed, retry_count, all four token counts, cost, S3 paths, and the llm_calls row id are present and plausible.
P2-02 Copy-curl button [js]
Do: After the lazy fetch completes, click "copy curl", then paste into an editor.
Expect: A runnable curl against the Anthropic API with the model + system + user. The button is disabled until the fetch completes.
P2-03 Re-run with edited prompt [js · $ · super-admin]
Do: Edit the system or user textarea, keep the model, fire.
Expect: New comparison row tagged "prompt edited". The diff link opens the step page focused on the experiment.
P2-04 Comparisons diff view [js]
Do: Click "diff →" on a comparison row.
Expect: Lands on the step page with the experiment highlighted (?focus_experiment=); original-vs-experiment is viewable.
P2-05 Cost cap returns 429 [$ · super-admin]
Do: Only if you want to verify the cap — fire re-runs until the daily $5/admin budget is hit (or temporarily lower the cap env var).
Expect: A 429 with a clear "daily cost cap exceeded" message; no experiment row beyond the cap.
04 P3 · Validation Explorer + replay (full)

Validation + replay regression.

P3-01 Per-request matrix data is faithful
Do: Expand an endpoint; compare the matrix against what you'd expect (a blocked endpoint shows red statuses + block pills; a clean one shows 200s).
Expect: Each library × proxy attempt shows status, elapsed, block type, and a body preview. Colours match outcomes.
P3-02 Header / cookie / rate-limit panels
Do: Read all three sub-panels on an expanded endpoint.
Expect: header_reduce lists required vs optional headers; cookie scenarios show cold/warmup/full verdicts; rate-limit shows rounds + estimated safe delay + any rotation caveat.
P3-03 Replay with scrubbed-value substitution [js · $ · super-admin]
Do: In the replay form, find a [scrubbed] header, type a real value, fire. Also try the "edit headers" + "compare with original" toggles.
Expect: The supplied value is used; compare-with-original shows the diff; the headers textarea parses into the request correctly.
P3-04 More SSRF blocks [super-admin]
Do: Try replay URLs http://localhost/, http://10.0.0.1/, and file:///etc/passwd.
Expect: All blocked (status 0, "blocked target"), no network call. A legitimate public URL still works.
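To see why each of these targets should be refused, here is a minimal sketch of a pre-flight check of the kind this test expects, built on the standard library's `ipaddress` module. It is an illustration under stated assumptions, not the server's guard: the function name is hypothetical, and it skips DNS resolution, which a real guard needs so that a public-looking hostname cannot resolve to a private address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_target(url: str) -> bool:
    """Return True if the URL should be refused before any network call."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return True  # e.g. file:// or gopher://
    host = parts.hostname
    if not host:
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname, not an IP literal. This sketch lets it through;
        # a real guard would resolve it and check every address.
        return False
    # is_global is False for loopback, private, and link-local ranges.
    return not ip.is_global


for target in (
    "http://169.254.169.254/latest/meta-data/",
    "http://localhost/",
    "http://10.0.0.1/",
    "file:///etc/passwd",
    "https://example.com/",
):
    print(target, "blocked" if is_blocked_target(target) else "allowed")
```

The four targets from SM-07 and this test land in the blocked branch, while the public hostname passes.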
P3-05 Replay soft-delete + restore [js · super-admin]
Do: Delete a replay row (confirm the prompt). Toggle "show deleted". Restore it.
Expect: Deleted rows hide by default, show struck-through when toggled, and restore cleanly within the 30-day window.
P3-06 Replay cap (50/admin/day) [$ · super-admin]
Do: Optional — fire past 50 replays in a day (or lower the cap to test).
Expect: 429 "daily replay cap exceeded" once the cap is hit.
P3-07 In-progress / cancelled state pages
Do: If you have an in-progress or cancelled scan, open its trace + a step directly by URL.
Expect: In-progress shows a "scan in progress — refresh" placeholder with auto-refresh; cancelled shows the cancellation marker + downstream "not reached".
05 P4 · LLM Models Registry (full)

Registry regression.

P4-01 List filters + usage stats [super-admin]
Do: On /admin/llm-models apply the provider, status, frontier, reasoning filters. Check the calls/cost/tests-30d columns.
Expect: Filters narrow the list; the most-used model's call count is bolded; numbers look right vs your usage.
P4-02 Add a new model [super-admin]
Do: "+ add model", fill the form (a fake test model), save. Delete or archive it afterward.
Expect: 303 back to the list with a "created" banner; the new row appears. Duplicate model_id → 409.
P4-03 Archive a pinned model is blocked [super-admin]
Do: Try to archive the model pinned as your default (e.g. via BROWSER_RECON_LLM_MODEL_* or the global default).
Expect: 409 with a structured list of what pins it. The archive button shows a disabled/tooltip state on the edit form.
P4-04 Deactivate-via-update is also blocked when pinned [super-admin]
Do: On a pinned model's edit form, uncheck "is_active" and save.
Expect: Same 409 pin-block (the update path can't bypass the archive safeguard).
P4-05 Kill switch falls back to hardcoded [super-admin]
Do: Set BROWSER_RECON_USE_HARDCODED_PRICING=1, restart the server, edit a price in the DB/registry, then trigger an LLM call.
Expect: The cost uses the hardcoded rate, ignoring the DB edit — proving the rollback switch works. Unset it afterward.
P4-06 Audit entries for mutations [super-admin]
Do: After a create/update/archive, check the audit log (DB audit_log or any audit view).
Expect: One row per mutation with the actor + action (llm_model_create/update/archive/restore) and field deltas on update.
06 P5 · Cleanup + retirement

Retired surfaces.

P5-01 Experiments cross-scan index
Do: Open /admin/experiments. Apply the model / prompt / admin / scan filters.
Expect: A paginated list of every re-run across scans; each row links back to its source step; filters narrow correctly.
P5-02 Sandbox from-call + experiment redirects
Do: Hit /admin/sandbox/from-call/{call_id} and /admin/sandbox/{experiment_id} for known ids.
Expect: Both 302 to the relevant step page (the experiment one with ?focus_experiment).
P5-03 Evals index explainer
Do: Open /admin/evals.
Expect: A plain-language explainer card describing the fixture-regression purpose, with a link to the per-step quick-compare for single-scan comparison.
P5-04 No dead links from the nav / dashboard
Do: Click through the admin sidebar/topbar (Models, Experiments, Scans, Dashboard, Evals).
Expect: Every nav item resolves; no link points at a deleted sandbox/evals-matrix template; no 404/500.
07 Security hardening

Verify the QA-round fixes.

SEC-01 SSRF coverage (consolidated) [super-admin]
Do: Confirm the SM-07 and P3-04 targets are all blocked: metadata IP, localhost, private range, non-http scheme. Confirm a public URL still works.
Expect: Only public http/https hosts reach the network; everything else is blocked pre-flight.
SEC-02 Proxy creds never shown [super-admin]
Do: Force a proxy error (e.g. fire a replay through a tier whose proxy is misconfigured, or inspect a replay that hit a proxy failure). Read the error_message in history + DB.
Expect: No user:pass@host appears; the proxy netloc is redacted to <proxy>.
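If you want to scan many error messages for leaked credentials, or to see what a redacted message should look like, the sketch below rewrites any `scheme://user:pass@host[:port]` it finds. It is an illustration only: the regex, the function name, and the choice to keep the scheme in front of `<proxy>` are assumptions, and the server's actual redaction may differ in detail.

```python
import re

# Matches scheme://userinfo@host[:port]. Assumes the userinfo contains
# no '/', '@', or whitespace, and that the host is not an IPv6 literal.
_CRED_URL = re.compile(r"(\w+://)[^\s/@]+@[^\s/:@]+(?::\d+)?")


def redact_proxy(message: str) -> str:
    """Replace any credentialed netloc in the message with <proxy>."""
    return _CRED_URL.sub(r"\1<proxy>", message)


def leaks_credentials(message: str) -> bool:
    """True if the message still contains a user:pass@host style URL."""
    return _CRED_URL.search(message) is not None


# Hypothetical error text of the kind a proxy failure might produce.
raw = "Cannot connect to proxy http://user:s3cret@proxy.example.net:8080: timed out"
print(redact_proxy(raw))
```

Running `leaks_credentials` over the `error_message` values you export from history or the DB gives a quick pass/fail for this check.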
SEC-03 Audit fires on the 304 trace path [super-admin]
Do: As admin (not owner), fetch another user's trace.json once (200), then again with If-None-Match (304). Check audit_log.
Expect: The 304 still writes a scan_view_admin row (subject to the per-day dedup) — the ETag can't be used to view-without-logging.
SEC-04 Header / method validation [super-admin]
Do: In a replay, try to set a Host or Proxy-Authorization header, a header value with a newline, or method CONNECT.
Expect: 422 rejection (denylisted header / CRLF / disallowed method); the request is not fired.
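The three rejection reasons named above can be sketched as a single validation pass. This is a hedged illustration, not the server's validator: the function name is hypothetical, the denylist holds only the two headers this runbook mentions, and the allowed-method set is an assumption (the runbook only states that CONNECT is rejected).

```python
# Only the headers named in this runbook; the real denylist may be longer.
DENYLISTED_HEADERS = {"host", "proxy-authorization"}
# Assumed set of permitted methods; CONNECT is deliberately absent.
ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"}


def validate_replay_request(method: str, headers: dict[str, str]) -> list[str]:
    """Return rejection reasons; an empty list means the request may fire."""
    errors = []
    if method.upper() not in ALLOWED_METHODS:
        errors.append(f"disallowed method: {method}")
    for name, value in headers.items():
        if name.strip().lower() in DENYLISTED_HEADERS:
            errors.append(f"denylisted header: {name}")
        # CR or LF in a name or value would allow header injection.
        if any(ch in name or ch in value for ch in ("\r", "\n")):
            errors.append(f"CRLF in header: {name.strip()}")
    return errors


print(validate_replay_request("CONNECT", {"Accept": "*/*"}))
print(validate_replay_request("GET", {"Host": "internal.example"}))
print(validate_replay_request("GET", {"X-Test": "a\r\nX-Injected: 1"}))
```

Any non-empty result corresponds to the 422 path; the request should never reach the network in those cases.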
08 Permission boundaries · needs the extra accounts from SET-05

Who can do what.

Log in as each role and confirm the boundaries. Mutations are super-admin only; views are admin-or-owner.

PERM-01 Plain user can't reach another user's scan [perm]
Do: As a plain user, open /debug/scans/{someone-elses-id}.
Expect: 404 (not 403 — existence isn't confirmed). Their own scan, if any, is viewable.
PERM-02 Admin (not super) can view but not mutate [perm]
Do: As admin, open a scan's trace + an LLM step. Then try to fire a re-run and a replay.
Expect: Views render; the re-run/replay panels are absent or the POST returns 403. Registry edit, sandbox/run, eval run/rerun all 403.
PERM-03 Admin can't delete another admin's replay/experiment [perm]
Do: As admin A, attempt to soft-delete a replay fired by super-admin/admin B.
Expect: 404 (owner-or-super-admin only). Hard-delete is super-admin only and requires a reason.
PERM-04 CSRF protects mutations [perm]
Do: As super-admin, craft a POST to a mutation route (rerun/replay/model) without the CSRF token (e.g. via curl with only the session cookie).
Expect: 403. With the correct token (a real form submit), it succeeds.
09 Known limitations

Expected behaviour — not bugs.

These will look "off" but are deliberate. Don't file them.