Vesper Ruling: S42 Halt Bugs (B1 + B2)
To: Orion (he/him), Katja (Captain)
From: Vesper (she/her)
CC: Atlas (he/him)
Date: 2026-04-21
Re: Rulings on Orion's S42 investigation: two bugs confirmed, branch scoping, S43 gate
Summary
Orion's investigation is accepted in full. Two bugs confirmed. S43 is gated pending Atlas ruling on B2 recovery scope.
Bug #1: halt.reason taxonomy leak (CONFIRMED, SCOPE NOW)
What it is: _shutdown in run() passes halt_reason=HALT_REASON_ENGINE_REQUESTED, which via the halt_reason or existing_reason precedence in _shutdown overwrites the specific token written by the triggering halt path. Every non-duration halt surfaces as engine_requested_halt in the DB regardless of what actually fired. halt.detail is the only reliable attribution field.
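The precedence problem can be reproduced in a few lines. The sketch below is a simplified model, not the engine code: the `EngineModel` class, the `trigger_halt` helper, and the `HALT_REASON_ANCHOR_SATURATION` token are assumptions made up for illustration. Only `_shutdown`, `halt_reason`, `existing_reason`, and the `engine_requested_halt` value come from the investigation.

```python
from typing import Optional

HALT_REASON_ENGINE_REQUESTED = "engine_requested_halt"
# Hypothetical specific token, for illustration only.
HALT_REASON_ANCHOR_SATURATION = "anchor_saturation_halt"


class EngineModel:
    """Toy stand-in for the engine's halt bookkeeping (not the real class)."""

    def __init__(self) -> None:
        self.halt_reason: Optional[str] = None

    def trigger_halt(self, reason: str) -> None:
        # A halt path records its specific token before shutdown runs.
        self.halt_reason = reason

    def _shutdown(self, halt_reason: Optional[str] = None) -> None:
        existing_reason = self.halt_reason
        # The caller's argument wins whenever it is truthy.
        self.halt_reason = halt_reason or existing_reason


leaky = EngineModel()
leaky.trigger_halt(HALT_REASON_ANCHOR_SATURATION)
# Mirrors the current run() call: the generic token is always passed.
leaky._shutdown(halt_reason=HALT_REASON_ENGINE_REQUESTED)
print(leaky.halt_reason)  # engine_requested_halt
```

Because the generic token is always a non-empty string, the `or` never falls through to the specific token in this model, which matches what the DB shows.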
Ruling: Fix it now. Option (b) from Orion: drop halt_reason=HALT_REASON_ENGINE_REQUESTED from the run() call and rely on the existing-reason path. The comment on line 4294 already documents the intent; the implementation just doesn't match. This is a single-line fix with low risk and high diagnostic value.
Branch: fix/halt-reason-taxonomy-leak
Scope: 1 commit. Change the run() call at line 4298: drop the halt_reason= argument or pass halt_reason=None. Verify that halt.reason reflects the authentic token on all exit paths. Add or extend a test asserting that the authentic halt token survives _shutdown. This is a prerequisite for meaningful halt diagnostics: every future session summary depends on it.
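One possible shape for the regression test, as a sketch only. `FakeEngine`, its `run(triggering_token)` signature, and the three token strings are hypothetical stand-ins; the real test would drive the actual engine and read halt.reason back from the DB.

```python
from typing import Optional

# Hypothetical specific tokens; the real taxonomy lives in the engine.
SPECIFIC_TOKENS = [
    "anchor_saturation_halt",
    "directional_drift_halt",
    "inventory_corridor_halt",
]


class FakeEngine:
    """Toy engine with the fix applied: run() no longer passes halt_reason=."""

    def __init__(self) -> None:
        self.halt_reason: Optional[str] = None

    def _shutdown(self, halt_reason: Optional[str] = None) -> None:
        # Argument first, then whatever the halt path already wrote.
        self.halt_reason = halt_reason or self.halt_reason

    def run(self, triggering_token: str) -> None:
        # Stand-in for a halt path firing mid-session.
        self.halt_reason = triggering_token
        # Fixed call: no halt_reason= argument, so the existing token is kept.
        self._shutdown()


def test_specific_token_survives_shutdown() -> None:
    for token in SPECIFIC_TOKENS:
        engine = FakeEngine()
        engine.run(token)
        assert engine.halt_reason == token
        assert engine.halt_reason != "engine_requested_halt"


# Direct call so the sketch also runs outside a test runner.
test_specific_token_survives_shutdown()
```

The sketch does not cover what halt.reason should be when no halt path wrote a token before _shutdown; that depends on how the engine's existing-reason path defaults and should be checked against the real code.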
This branch does NOT gate S43.
Bug #2: No DEGRADED recovery path for non-truth guards (CONFIRMED, ATLAS GATES S43)
What it is: _exit_degraded_mode is only called by the wallet truth check on ok. Anchor saturation, directional drift, and inventory corridor guards enter DEGRADED but have no exit path. The 300 s wallet-truth timeout is the only escape, and it exits to HALT. DEGRADED is spec'd as recoverable; current implementation makes it a one-way gate to HALT after 5 minutes for all market-regime guards.
This is an architectural gap, not a configuration problem. Raising the timeout buys time but does not make the state recoverable — the engine would sit in paused-quoting limbo for longer before halting.
Ruling: I am not scoping the recovery branch without Atlas input. The recovery hooks per guard (Orion's fix shape) are meaningful work — multiple distinct recovery conditions, each with spec implications that need Atlas to approve thresholds and re-arm logic. Specifically:
- Anchor saturation recovery: what does "clear" look like? Mean error returns inside threshold for N ticks? Does the rolling window reset on exit, or carry forward? Atlas locked the trigger thresholds — he needs to lock the recovery thresholds too.
- Directional drift recovery: opposing fill restores balance — but does condition C recovery require one opposing fill, or enough fills to bring the ratio back inside bounds? Atlas locked the trigger conditions.
- Inventory corridor recovery: XRP pct returns inside corridor for corridor_lookback_ticks consecutive ticks. This is Orion's read and seems right, but Atlas should confirm.
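To make the shape of a per-guard recovery hook concrete, here is a minimal sketch of the inventory corridor condition as Orion reads it: recover only after the XRP percentage has stayed inside the corridor for corridor_lookback_ticks consecutive ticks. The `CorridorRecovery` class, the `lower_pct`/`upper_pct` names, the inclusive bounds, and the example numbers are all assumptions; the actual thresholds and re-arm logic are Atlas's to lock.

```python
from dataclasses import dataclass


@dataclass
class CorridorRecovery:
    """Hypothetical recovery check for the inventory corridor guard."""

    lower_pct: float              # assumed corridor lower bound (inclusive)
    upper_pct: float              # assumed corridor upper bound (inclusive)
    corridor_lookback_ticks: int  # consecutive in-corridor ticks required
    inside_streak: int = 0

    def on_tick(self, xrp_pct: float) -> bool:
        """Return True once the streak of in-corridor ticks is long enough."""
        if self.lower_pct <= xrp_pct <= self.upper_pct:
            self.inside_streak += 1
        else:
            # Any excursion outside the corridor restarts the count.
            self.inside_streak = 0
        return self.inside_streak >= self.corridor_lookback_ticks


# Example with made-up numbers: 40-60% corridor, 3-tick lookback.
recovery = CorridorRecovery(lower_pct=40.0, upper_pct=60.0, corridor_lookback_ticks=3)
ticks = [65.0, 55.0, 52.0, 61.0, 50.0, 51.0, 49.0]
results = [recovery.on_tick(pct) for pct in ticks]
print(results)  # [False, False, False, False, False, False, True]
```

In the engine, a True result is where a call to _exit_degraded_mode would presumably sit. The anchor saturation and directional drift hooks would need their own conditions, which is exactly the open question above.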
S43 gate decision (referred to Atlas): Two options:
A. Proceed to S43 with acceptance. S43 will halt again if market regime is similar. We accept this, label the halt correctly (after B1 fix), and treat it as a guard-confirmed-bad-regime signal. We run S43 and S44 in more favorable conditions and count the clean sessions from those. S43 in a bad regime is not a failure of the guard stack — it's the stack doing its job. Phase 7.4 precondition "2 clean live sessions with guards active" requires clean sessions, not all sessions being clean.
B. Scope and merge DEGRADED recovery before S43. Additional branch work, delay S43 until recovery is live. Phase 7.3 code gates extend by ~1 branch. The benefit is that a stressed session can recover without halting if the regime stabilizes mid-session.
My recommendation to Atlas: Option A with B1 fix landed first. The guards are working correctly. S42 confirmed the regime was genuinely bad (anchor at -10 bps cap, 100% tick prevalence — nothing to recover to in that session). A recovery path would not have changed the S42 outcome. We fix the taxonomy leak, run S43 when conditions look better, and scope the recovery branches as a Phase 7.4 improvement item rather than a gate.
But Atlas locks the decision.
Standing instructions to Orion
- B1 branch: proceed immediately. fix/halt-reason-taxonomy-leak. Single commit, single test extension. Standard delivery format.
- B2: hold until Atlas ruling.
File hygiene note
Three documents landed at the NEO Trading Engine root instead of 07 Agent Coordination/ — the Vesper tasking, the Orion investigation, and my tasking (which was incorrectly named as "Orion Investigation" — should have been "Vesper Tasking"). Katja, please treat the root copies as the working documents. I'll note this for folder cleanup.
— Vesper