OSPREY E2E Scenarios
Two cross-paradigm scenarios. Tier ≥ 2 required (tier 1 lacks vacuum gauges and most cavity PVs in every paradigm).
A — SR07 vacuum burst
File: tests/e2e/test_vacuum_burst_scenario.py · pure telemetry, no logbook.
Operator prompt
"We lost about 5 mA of beam yesterday around 14:32. Did the vacuum do anything weird around then?"
Required PVs
SR:DIAG:DCCT:01:CURRENT:RB
SR:VAC:GAUGE:SR{01..12}:PRESSURE:RB (12 sectors)
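The `SR{01..12}` notation is shell-style brace expansion. Below is a minimal sketch of expanding it into the twelve concrete gauge PVs. It assumes that zero-padding follows the width of the range's start token. `expand_range` is a hypothetical helper, not an OSPREY function.

```python
import re

# Matches a single shell-style numeric range such as "{01..12}".
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_range(pattern: str) -> list[str]:
    """Expand one {start..end} numeric range into concrete names.

    Zero-padding follows the width of the start token (assumption).
    A pattern without a range is returned unchanged.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start_tok, end_tok = match.group(1), match.group(2)
    width = len(start_tok)
    head, tail = pattern[: match.start()], pattern[match.end():]
    return [f"{head}{i:0{width}d}{tail}" for i in range(int(start_tok), int(end_tok) + 1)]


gauges = expand_range("SR:VAC:GAUGE:SR{01..12}:PRESSURE:RB")
required = ["SR:DIAG:DCCT:01:CURRENT:RB", *gauges]
print(len(gauges), gauges[6])  # 12 SR:VAC:GAUGE:SR07:PRESSURE:RB
```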
Seeded event
| Item | Seeded value |
|---|---|
| When | Wall-clock anchor 14:32:08 on the day before today (re-fires each day inside the window) |
| SR07 gauge | Baseline 5×10⁻⁸ Torr + Gaussian spike, width 0.025, amplitude 1.5×10⁻⁷ Torr |
| SR01–06, SR08–12 | Flat baseline + small noise (max \|r\| vs DCCT ≤ 0.08) |
| DCCT | 500 mA baseline − 5 mA Gaussian dip (width 0.05) at the same instant + 0.3 mA step residual |
| Target r | SR07 vs DCCT ≈ −0.88 over a 10-min window |
Source: src/osprey/connectors/archiver/mock_archiver_connector.py:332–356
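Below is a minimal sketch that reproduces the shape of the seeded event and checks the target correlation. It rests on four assumptions:

- the Gaussian widths are standard deviations in window-fraction units;
- the event sits at the window centre;
- the 0.3 mA residual is a persistent drop after the event;
- the noise levels are hypothetical.

Under these assumptions the noise-free SR07-vs-DCCT Pearson correlation comes out near −0.88. This illustrates the table and is not the mock connector's code.

```python
import math
import random


def gaussian(t: float, centre: float, width: float) -> float:
    """Unit-height Gaussian; `width` is treated as a standard deviation (assumption)."""
    return math.exp(-0.5 * ((t - centre) / width) ** 2)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient with no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def seed_sr07_event(n: int = 601, event: float = 0.5, seed: int = 7):
    """Return (t, sr07_torr, dcct_ma) over a window mapped onto t in [0, 1]."""
    rng = random.Random(seed)
    ts = [i / (n - 1) for i in range(n)]
    sr07 = [
        5e-8                                    # baseline pressure, Torr
        + 1.5e-7 * gaussian(t, event, 0.025)    # vacuum spike
        + rng.gauss(0.0, 1e-9)                  # small gauge noise (hypothetical level)
        for t in ts
    ]
    dcct = [
        500.0                                   # baseline current, mA
        - 5.0 * gaussian(t, event, 0.05)        # beam-loss dip
        - (0.3 if t >= event else 0.0)          # persistent residual (sign assumed)
        + rng.gauss(0.0, 0.02)                  # small DCCT noise (hypothetical level)
        for t in ts
    ]
    return ts, sr07, dcct


ts, sr07, dcct = seed_sr07_event()
r = pearson(sr07, dcct)
print(f"r(SR07, DCCT) = {r:.2f}")
```

The step residual barely affects r. It is symmetric about the event in time, so it adds variance to the DCCT trace without adding covariance with the spike.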
Pass criteria
- Tool-routing: ≥1 mcp__channel-finder__*, ≥1 mcp__controls__archiver_read; archiver inputs reference both dcct/current AND gauge/vac.
- Judge: must name "Sector 7" or "SR07" explicitly. A vague answer ("a vacuum gauge somewhere"), "no anomaly", or a different sector fails.
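The tool-routing criterion is mechanical enough to express directly. Below is a minimal sketch. It assumes each recorded tool call is a dict with `name` and `input` keys, and it reads "dcct/current" and "gauge/vac" as either-or keyword pairs. The call shape and the `mcp__channel-finder__search` tool name in the usage lines are hypothetical. The judge criterion is not covered.

```python
from fnmatch import fnmatchcase


def calls_matching(calls: list[dict], pattern: str) -> list[dict]:
    """Recorded calls whose tool name matches a glob such as 'mcp__channel-finder__*'."""
    return [c for c in calls if fnmatchcase(c["name"], pattern)]


def scenario_a_routing_ok(calls: list[dict]) -> bool:
    """Tool-routing criterion for scenario A; the call shape is hypothetical."""
    finder = calls_matching(calls, "mcp__channel-finder__*")
    archiver = calls_matching(calls, "mcp__controls__archiver_read")
    if not finder or not archiver:
        return False
    # Flatten every archiver input into one lower-cased string for keyword checks.
    text = " ".join(str(c.get("input", "")) for c in archiver).lower()
    beam = "dcct" in text or "current" in text   # "dcct/current" read as either-or
    vacuum = "gauge" in text or "vac" in text    # "gauge/vac" read as either-or
    return beam and vacuum


calls = [
    {"name": "mcp__channel-finder__search", "input": {"query": "storage ring vacuum gauges"}},
    {
        "name": "mcp__controls__archiver_read",
        "input": {"pvs": ["SR:DIAG:DCCT:01:CURRENT:RB", "SR:VAC:GAUGE:SR07:PRESSURE:RB"]},
    },
]
print(scenario_a_routing_ok(calls))      # True
print(scenario_a_routing_ok(calls[:1]))  # False
```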
B — RF cavity C1 thermal excursion
File: tests/e2e/test_rf_cavity_correlation_scenario.py · logbook + telemetry must converge.
Operator prompt
"The beam dumped this morning. Figure out what happened and plot the data."
Required PVs
SR:RF:CAVITY:01:{TEMPERATURE,POWER:FWD,POWER:REV,VOLTAGE,FREQUENCY}:RB
SR:RF:CAVITY:02:* (same fields — needed for C1-vs-C2 contrast)
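C2 needs the same five fields as C1, so the required list is the Cartesian product of cavities and fields. Below is a minimal sketch. `cavity_pvs` is a hypothetical helper, not an OSPREY function.

```python
from itertools import product

# Field suffixes listed for each cavity above.
CAVITY_FIELDS = ("TEMPERATURE", "POWER:FWD", "POWER:REV", "VOLTAGE", "FREQUENCY")


def cavity_pvs(*cavities: str) -> list[str]:
    """Build SR:RF:CAVITY:<nn>:<field>:RB for every cavity/field pair."""
    return [f"SR:RF:CAVITY:{cav}:{field}:RB" for cav, field in product(cavities, CAVITY_FIELDS)]


required = cavity_pvs("01", "02")
print(len(required))  # 10
print(required[0])    # SR:RF:CAVITY:01:TEMPERATURE:RB
```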
Seeded telemetry (window-relative positions)
| Signal | Seeded shape |
|---|---|
| C1 events | (0.20, 1.0), (0.55, 0.7), (0.85, 1.2): three excursions, the last one worst |
| C1 temperature | 27.0 °C baseline + daily oscillation + envelope × 7 → peaks ~34 °C |
| C1 reflected power | 5 kW baseline + envelope × 80 → peaks ~85 kW |
| C1 forward power | 450 kW baseline − envelope × 440 → trips to ~0 |
| C1 voltage | 2.5 MV baseline − envelope × 2.4 |
| C1 frequency | 499.654 MHz − envelope × 0.001 (thermal detuning) |
| C2 (stable ref) | Single minor blip (0.55, 0.25) on all fields, baseline ~26.5 °C |
Source: src/osprey/connectors/archiver/mock_archiver_connector.py:230–233. C1 (primary cavity) detection: the PV name contains c1, k1, or :01:.
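The table reads as one shared excursion envelope driving all five C1 fields. Below is a minimal sketch built on these assumptions:

- each event pair is (position, amplitude);
- each bump is a Gaussian of hypothetical width 0.03;
- the summed envelope is clipped at 1.0, which is what makes the 1.2 event produce the stated peaks of ~34 °C and ~85 kW with forward power near zero;
- the daily temperature oscillation is omitted.

The detection helper lower-cases the PV name and only considers RF PVs. Both choices are assumptions of this sketch. It illustrates the table, not the mock connector's code.

```python
import math

# (window-relative position, amplitude) pairs; reading the second value as an
# amplitude is an assumption.
C1_EVENTS = [(0.20, 1.0), (0.55, 0.7), (0.85, 1.2)]
C2_EVENTS = [(0.55, 0.25)]
EVENT_WIDTH = 0.03  # hypothetical Gaussian width; not stated in the table


def envelope(t: float, events) -> float:
    """Sum of Gaussian bumps, clipped to 1.0 (clipping is an assumption)."""
    raw = sum(a * math.exp(-0.5 * ((t - p) / EVENT_WIDTH) ** 2) for p, a in events)
    return min(1.0, raw)


def c1_fields(t: float) -> dict[str, float]:
    """C1 readbacks at window position t, without the daily temperature term."""
    env = envelope(t, C1_EVENTS)
    return {
        "TEMPERATURE": 27.0 + 7.0 * env,     # °C
        "POWER:REV": 5.0 + 80.0 * env,       # kW
        "POWER:FWD": 450.0 - 440.0 * env,    # kW
        "VOLTAGE": 2.5 - 2.4 * env,          # MV
        "FREQUENCY": 499.654 - 0.001 * env,  # MHz
    }


def is_primary_cavity(pv: str) -> bool:
    """C1 rule from the text (c1, k1 or :01:), case-folded here.

    Restricting the test to :RF: PVs is an extra assumption of this sketch: a bare
    ':01:' substring would also match SR:DIAG:DCCT:01:CURRENT:RB.
    """
    name = pv.lower()
    return ":rf:" in name and any(tag in name for tag in ("c1", "k1", ":01:"))


peak = c1_fields(0.85)
print(round(peak["TEMPERATURE"], 1), round(peak["POWER:REV"], 1), round(peak["POWER:FWD"], 1))
# 34.0 85.0 10.0
```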
Seeded logbook (rebased to "now")
| Entry | When | Content |
|---|---|---|
| DEMO-026 | today−4 · 03:20 | Beam dump; C1 reflected-power trip; names C2 as unaffected |
| DEMO-027 | today−3 · 10:00 | Investigation; three thermal excursions; cooling-manifold blockage |
| DEMO-028 | today−2 · 14:00 | Repair; calcium-carbonate deposits; flow restored (rebase pivot) |
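"Rebase pivot" suggests that every entry is shifted by the single offset that places DEMO-028 at today−2 14:00, which preserves the spacing between entries. Below is a minimal sketch under that reading. The authored timestamps, the fixed "now", and the `rebase` helper are all hypothetical.

```python
from datetime import datetime, timedelta


def rebase(entries: dict[str, datetime], pivot_id: str, pivot_target: datetime) -> dict[str, datetime]:
    """Shift every entry by the offset that moves the pivot entry onto pivot_target."""
    shift = pivot_target - entries[pivot_id]
    return {entry_id: ts + shift for entry_id, ts in entries.items()}


# Hypothetical authored timestamps; only their relative spacing matters.
authored = {
    "DEMO-026": datetime(2024, 1, 10, 3, 20),
    "DEMO-027": datetime(2024, 1, 11, 10, 0),
    "DEMO-028": datetime(2024, 1, 12, 14, 0),
}
now = datetime(2025, 6, 20, 9, 0)  # hypothetical "now"
today = now.replace(hour=0, minute=0, second=0, microsecond=0)
rebased = rebase(authored, "DEMO-028", today - timedelta(days=2) + timedelta(hours=14))
for entry_id, ts in rebased.items():
    print(entry_id, ts.isoformat(timespec="minutes"))
# DEMO-026 2025-06-16T03:20
# DEMO-027 2025-06-17T10:00
# DEMO-028 2025-06-18T14:00
```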
Pass criteria
- Tool-routing: ≥1 mcp__ariel__*, ≥1 mcp__channel-finder__*, ≥1 mcp__controls__archiver_read; archiver inputs reference cavity or rf; ≥1 plot tool (create_static_plot / create_interactive_plot / create_dashboard).
- Judge (all three required):
  - (a) commits to cavity C1 / cavity 01 / first cavity
  - (b) names the mechanism (thermal/cooling → reflected-power rise → forward-power trip)
  - (c) contrasts with C2 (stable)
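Scenario B adds the logbook and plot-tool requirements to the routing check. Below is a minimal sketch. It assumes recorded calls are dicts with `name` and `input` keys, and that plot tools may carry an MCP prefix, so it matches them by name suffix. The `mcp__ariel__search` and `mcp__channel-finder__search` names in the usage lines are hypothetical. The three judge criteria are semantic and are not covered.

```python
from fnmatch import fnmatchcase

PLOT_TOOLS = ("create_static_plot", "create_interactive_plot", "create_dashboard")


def scenario_b_routing_ok(calls: list[dict]) -> bool:
    """Tool-routing criterion for scenario B; the call shape is hypothetical."""
    names = [c["name"] for c in calls]

    def used(pattern: str) -> bool:
        return any(fnmatchcase(n, pattern) for n in names)

    archiver_text = " ".join(
        str(c.get("input", "")) for c in calls if c["name"] == "mcp__controls__archiver_read"
    ).lower()
    # "rf" as a bare substring is permissive; a stricter check could require ":rf:".
    rf_inputs = "cavity" in archiver_text or "rf" in archiver_text
    # Match plot tools by suffix in case they are exposed under a prefix (assumption).
    plotted = any(n.endswith(tool) for n in names for tool in PLOT_TOOLS)
    return (
        used("mcp__ariel__*")
        and used("mcp__channel-finder__*")
        and used("mcp__controls__archiver_read")
        and rf_inputs
        and plotted
    )


calls = [
    {"name": "mcp__ariel__search", "input": {"query": "beam dump"}},
    {"name": "mcp__channel-finder__search", "input": {"query": "RF cavity"}},
    {"name": "mcp__controls__archiver_read", "input": {"pvs": ["SR:RF:CAVITY:01:POWER:REV:RB"]}},
    {"name": "create_static_plot", "input": {}},
]
print(scenario_b_routing_ok(calls))       # True
print(scenario_b_routing_ok(calls[:-1]))  # False: no plot tool
```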
Cross-scenario invariants
- Temporal separation: A's event lives at "yesterday 14:32"; B's logbook arc spans today−4 03:20 to today−2 14:00, so the windows do not overlap.
- Decoys in logbook: DEMO-001 mentions a Sector 4 vacuum spike (same magnitude as A) but rebases to ~17 days back, outside any "yesterday" window. DEMO-002 uses "Sector 7" in a magnet-temperature context, also ~17 days back.
- The C1 ↔ "01" bridge is natural language only: there is no alias table. The channel description says "cavity 01" and the logbook says "C1", so the agent must connect them itself.
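The temporal-separation invariant reduces to an interval-overlap test on rebased timestamps. Below is a sketch with a hypothetical fixed "today". It assumes scenario A's 10-minute window is centred on the event, and it treats the ~17-day decoy offset as exact.

```python
from datetime import datetime, timedelta


def overlaps(a: tuple[datetime, datetime], b: tuple[datetime, datetime]) -> bool:
    """True if two closed intervals share any instant."""
    return a[0] <= b[1] and b[0] <= a[1]


today = datetime(2025, 6, 20)  # hypothetical midnight of "today"
day = timedelta(days=1)

event_a = today - day + timedelta(hours=14, minutes=32, seconds=8)
window_a = (event_a - timedelta(minutes=5), event_a + timedelta(minutes=5))  # 10-min window
logbook_b = (today - 4 * day + timedelta(hours=3, minutes=20), today - 2 * day + timedelta(hours=14))
yesterday = (today - day, today)
decoy = today - 17 * day  # DEMO-001 / DEMO-002, approximate

print(overlaps(window_a, logbook_b))          # False
print(yesterday[0] <= decoy <= yesterday[1])  # False
```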
Tier setting (critical). tests/e2e/sdk_helpers.py:88 defaults tier=1. Tier 1 contains no SR:VAC:GAUGE channels and no C1/C2 temperature/power/frequency channels in any paradigm (in_context, hierarchical, middle_layer) — the generator produces identical PV sets across paradigms at every tier. Both scenarios require tier ≥ 2. Bump the default or override per-test.
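A per-test override can be made explicit with a small guard that raises the tier, or fails fast, instead of letting a tier-1 run fail later on missing channels. Below is a minimal sketch. The scenario keys and both helpers are hypothetical, and they do not reflect how sdk_helpers.py actually receives `tier`.

```python
# Minimum generator tier each scenario needs, per the note above.
SCENARIO_MIN_TIER = {
    "vacuum_burst": 2,
    "rf_cavity_correlation": 2,
}


def effective_tier(scenario: str, requested: int = 1) -> int:
    """Raise a requested tier (default 1, matching the helper default) to the scenario minimum."""
    return max(requested, SCENARIO_MIN_TIER[scenario])


def require_tier(scenario: str, tier: int) -> None:
    """Fail fast with a clear message when a scenario runs below its minimum tier."""
    minimum = SCENARIO_MIN_TIER[scenario]
    if tier < minimum:
        raise ValueError(f"{scenario} needs tier >= {minimum}, got tier {tier}")


print(effective_tier("vacuum_burst"))  # 2
```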