Step-by-step checks for every feature shipped in P1–P5 plus the security hardening. The automated suite (2297 tests) covers server logic; this runbook closes the gap it can't reach — the browser-side JavaScript and the real end-to-end flows. Click any test to tick it off; state persists in your browser.
smoke: part of the ~15-min critical happy path · js: exercises browser JavaScript (the gap the test suite can't reach, so do these carefully) · $: incurs real cost (LLM tokens or proxy bandwidth) · super-admin: requires a super-admin session · perm: a permission/negative test.
Get the server running, a super-admin session, a completed scan to inspect, and (optionally) two extra accounts for the permission tests.
set -a; source .env; set +a; .venv/bin/python -m alembic upgrade head
e9a4b3c1d5f8 (llm_models) and d8e6f4a2b9c1 (validation_replays) with no error. llm_models seeds 9 rows.
set -a; source .env; set +a; .venv/bin/python -m uvicorn browser_recon_server.app:create_app --factory --port 8000. Wait ~10–15s for the DB connection on startup.
curl localhost:8000/health → 200.
/dashboard/login, enter your super-admin email, submit. In dev the magic link is printed to the server log (LoggingEmailProvider); copy it from stdout and open it. If you don't have a super-admin user yet, set one directly: UPDATE users SET role='super_admin' WHERE email='you@example.com'; (the role-change route itself requires super-admin, so the first one is a DB update).
/admin/scans. Pick a scan with status complete (ideally one with an LLM synthesis + a validation step). Note its scan id. If you have none, run a real scan first, or pick the richest one available.
.venv/bin/python scripts/seed_user.py --email admin-test@example.com then UPDATE users SET role='admin' WHERE email='admin-test@example.com';. Repeat for a plain user role. Skip if you only want the happy-path checks.

Run this first. It walks one scan through the whole new surface and touches each major feature once.
/debug/scans/{id} for your completed scan.
#filter=llm&q=synthesis). Reload the page with that hash → the filter state restores.
claude-haiku-4-5, leave prompts empty, click "▶ run". Allow the browser notification permission prompt when it appears.
?just_ran=..., a new row appears in the Comparisons table (model, cost, latency, outcome), and a browser notification fires "Re-run complete". The original scan's report is unchanged.
?just_replayed=..., a row appears in Replay History (status, lib/proxy, elapsed), and a "Replay complete" notification fires.
http://169.254.169.254/latest/meta-data/ and fire.
/admin/llm-models. Edit a non-pinned model, change its input price, save. Then trigger any LLM action and check the cost.
/admin/sandbox/list, then /admin/evals/{id}.
/admin/experiments; the second 302s to /debug/scans/{id}. Neither 500s.
/admin/scans if you have one.
trace.json payload) with no 404/500.
/admin/scans, check the "include in-progress" toggle (you may need an in-progress scan to see a difference).
curl -i the trace.json route (with your session cookie). Repeat with -H 'If-None-Match: <etag>'.
ETag header; second call returns 304 with no body.
/debug/scans/{id}?legacy=1.
curl against the Anthropic API with the model + system + user. The button is disabled until the fetch completes.
?focus_experiment=); original-vs-experiment is viewable.
[scrubbed] header, type a real value, fire. Also try the "edit headers" + "compare with original" toggles.
http://localhost/, http://10.0.0.1/, and file:///etc/passwd.
/admin/llm-models apply the provider, status, frontier, reasoning filters. Check the calls/cost/tests-30d columns.
BROWSER_RECON_LLM_MODEL_* or the global default).
BROWSER_RECON_USE_HARDCODED_PRICING=1, restart the server, edit a price in the DB/registry, then trigger an LLM call.
audit_log or any audit view).
/admin/experiments. Apply the model / prompt / admin / scan filters.
/admin/sandbox/from-call/{call_id} and /admin/sandbox/{experiment_id} for known ids.
?focus_experiment).
/admin/evals.
user:pass@host appears; the proxy netloc is redacted to <proxy>.
If-None-Match (304). Check audit_log.
scan_view_admin row (subject to the per-day dedup); the ETag can't be used to view-without-logging.
Host or Proxy-Authorization header, a header value with a newline, or method CONNECT.

Log in as each role and confirm the boundaries. Mutations are super-admin only; views are admin-or-owner.
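As a reference while running the role checks, the boundary stated above (mutations super-admin only, views admin-or-owner) can be written out as a small decision table. This is only a sketch of the expected outcomes, not the server's authorization code: the function names are hypothetical, the role string "user" is assumed for the plain account, and it assumes a super-admin passes every check an admin does.

```python
from enum import Enum


class Role(str, Enum):
    # "admin" and "super_admin" match the UPDATE users SET role=... statements
    # in the setup steps; "user" is an assumed value for the plain account.
    USER = "user"
    ADMIN = "admin"
    SUPER_ADMIN = "super_admin"


def can_view_scan(role: Role, is_owner: bool) -> bool:
    """Views are admin-or-owner (assumption: super-admin counts as admin)."""
    return is_owner or role in (Role.ADMIN, Role.SUPER_ADMIN)


def can_mutate(role: Role) -> bool:
    """Mutations such as firing a re-run or a replay are super-admin only."""
    return role is Role.SUPER_ADMIN


def expected_matrix() -> list[tuple[str, bool, bool, bool]]:
    """Return (role, is_owner, may_view, may_mutate) for every combination."""
    rows = []
    for role in Role:
        for is_owner in (False, True):
            rows.append(
                (role.value, is_owner, can_view_scan(role, is_owner), can_mutate(role))
            )
    return rows


if __name__ == "__main__":
    for role, is_owner, view, mutate in expected_matrix():
        print(f"{role:<12} owner={is_owner!s:<5} view={view!s:<5} mutate={mutate}")
```

Any row where the browser behaves differently from this table is worth a closer look before filing, since the table encodes only the one-sentence rule above.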
user, open /debug/scans/{someone-elses-id}.
admin, open a scan's trace + an LLM step. Then try to fire a re-run and a replay.

These will look "off" but are deliberate. Don't file them.
+ gpt-5 / + grok-3 buttons currently land an experiment row with an error: sandbox.run_experiment is still Anthropic-SDK-only. The audit row + redirect still fire. The fix (route through the registry's resolver) is a known follow-up. Anthropic-model re-runs work fully.
admin, that's now blocked by design (D·07). Revert the gate if you need admin-level eval access.
?legacy=1 no longer renders the old tabbed page; it's deleted. If the new Trace ever misbehaves, the rollback is a git revert, not a query param.
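The replay targets used in the negative tests (http://169.254.169.254/latest/meta-data/, http://localhost/, http://10.0.0.1/, and file:///etc/passwd) are classic SSRF probes: a cloud metadata address, loopback, a private range, and a non-HTTP scheme. Assuming the hardening is meant to refuse all four, the sketch below shows one way such targets can be classified, which may help when deciding whether an unexpected result is a real gap. The function name is hypothetical and this is not the server's actual guard; in particular it does not resolve DNS, which a real check would need to do.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_target(url: str) -> bool:
    """Return True if the URL looks like an internal or non-HTTP target.

    Illustrative only: hostnames are not resolved, so a public name that
    points at a private address is not caught here.
    """
    parts = urlsplit(url)

    # Only plain web schemes are allowed; file://, ftp://, etc. are refused.
    if parts.scheme not in ("http", "https"):
        return True

    host = parts.hostname
    if not host:
        return True

    # Loopback by name.
    lowered = host.lower()
    if lowered == "localhost" or lowered.endswith(".localhost"):
        return True

    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; a real guard would resolve it and re-check.
        return False

    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # includes 169.254.169.254 (cloud metadata)
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


if __name__ == "__main__":
    for target in (
        "http://169.254.169.254/latest/meta-data/",
        "http://localhost/",
        "http://10.0.0.1/",
        "file:///etc/passwd",
        "https://example.com/",
    ):
        print(f"{target:<45} blocked={is_blocked_target(target)}")
```

If a replay to any of the first four targets goes through in the browser, treat it as a hardening regression rather than one of the deliberate behaviors listed above.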