Orion Pre-Code Findings — FLAG-041 halt.reason Taxonomy Leak
To: Vesper (she/her)
From: Orion (he/him)
CC: Katja (Captain), Atlas (he/him)
Date: 2026-04-21
Branch: fix/halt-reason-taxonomy-leak (not yet created)
Type: Pre-code investigation — Q1-Q3 + one scope question flagged before code
TL;DR
Q1 and Q2 pass. Q3 has a finding that affects the scope of the fix: the clobber pattern you spec'd for main_loop.py:4282 also exists at run_paper_session.py:415. Every paper session (S1-S42, and S43) goes through that path rather than NEOEngine.run(), so fixing only main_loop.py will not change S43's halt.reason; the paper-session loop will still clobber it. I need your ruling on scope before writing code.
Detail inline. No code written, no branch created. Sitting on main.
Q1 — _shutdown fallback behavior with halt_reason=None
Finding: fallback is safe. Lines 1018–1021 of main_loop.py:
    existing_reason = self._state.get_engine_state("halt.reason") or ""
    _hr = halt_reason or existing_reason or HALT_REASON_UNEXPECTED
    self._state.set_engine_state("halt.reason", _hr)
With halt_reason=None (or argument omitted, since the kwarg defaults to None):
| Prior state of halt.reason | Resulting write |
|---|---|
| specific token present (e.g. inventory_truth_halt) | specific token preserved |
| empty or missing | HALT_REASON_UNEXPECTED ("unexpected_halt") |
HALT_REASON_UNEXPECTED is defined at line 142 as "unexpected_halt".
No path produces None or empty-string writes. The or "" on line 1019 guarantees existing_reason is a string, and the cascade guarantees at least the unexpected_halt literal lands in halt.reason. The outer try/except at line 1022 swallows errors from the StateManager itself (DB disconnected, etc.) — in which case nothing is written and the pre-existing value (if any) stands. That is existing behavior, not affected by the proposed fix.
Q1 verdict: safe to drop halt_reason= from the run() call.
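The cascade can also be exercised in isolation. The sketch below is a minimal stand-in, not the engine's code: `FakeState` and `resolve_halt_reason` are hypothetical names introduced for illustration, and only the two-`or` cascade and the `"unexpected_halt"` literal are taken from the findings above.

```python
HALT_REASON_UNEXPECTED = "unexpected_halt"  # literal quoted in the Q1 table


class FakeState:
    """Hypothetical in-memory stand-in for the StateManager."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def get_engine_state(self, key):
        return self._data.get(key)

    def set_engine_state(self, key, value):
        self._data[key] = value


def resolve_halt_reason(state, halt_reason=None):
    """Mirror the cascade: explicit argument, then existing value, then fallback."""
    existing_reason = state.get_engine_state("halt.reason") or ""
    resolved = halt_reason or existing_reason or HALT_REASON_UNEXPECTED
    state.set_engine_state("halt.reason", resolved)
    return resolved


# The three prior states from the table, all with halt_reason omitted.
preserved = resolve_halt_reason(FakeState({"halt.reason": "inventory_truth_halt"}))
from_missing = resolve_halt_reason(FakeState())
from_empty = resolve_halt_reason(FakeState({"halt.reason": ""}))
```

Because `or` treats both `None` and `""` as falsy, the missing and empty cases collapse to the same fallback, which is why no path can write an empty value.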
Q2 — Non-duration _tick() return-False paths
Finding: every return-False site writes halt.reason before returning. Enumerating _tick() (lines 2247–2853) return sites:
| Line | Trigger | Writer of halt.reason | Token written |
|---|---|---|---|
| 2286 | DEGRADED timeout in _tick itself | _escalate_degraded_to_halt() at line 1453 (writes line 1469) | HALT_REASON_INVENTORY_TRUTH ("inventory_truth_halt") |
| 2293 | MODE_HALT observed after _maybe_run_periodic_truth_check | _apply_truth_check_result() at line 2151 → _escalate_degraded_to_halt() (line 2181) | HALT_REASON_INVENTORY_TRUTH |
| 2365 | RiskStatus.HALT — risk engine escalated | Inline in _tick at lines 2327–2361 (kill-switch, exposure, RPC, ledger, gateway, generic) | HALT_REASON_KILL_SWITCH / HALT_REASON_RISK_XRP_EXPOSURE / HALT_REASON_RISK_RLUSD_EXPOSURE / HALT_REASON_RISK_RPC_FAILURE / HALT_REASON_RISK_STALE_LEDGER / HALT_REASON_RISK_GATEWAY / HALT_REASON_RISK_GENERIC |
| 2379 | ReplayExhausted caught from _market_data.fetch() | Inline at lines 2374–2378 | HALT_REASON_REPLAY_EXHAUSTED |
| 2475 | recon_result.engine_signal == EngineStatus.HALTED | Inline at lines 2458–2466 | HALT_REASON_RECONCILER ("reconciler_halt") |
Return-True sites (informational):
| Line | Outcome |
|---|---|
| 2450 | account_offers fetch failed → continue (no halt) |
| 2847 | tick completed normally → continue |
Each return-False has coverage. The inline writes at 2327–2361, 2374–2378, and 2458–2466 are all wrapped in try/except Exception: pass, so a StateManager write failure during halt classification would be swallowed silently, leaving halt.reason unset. In that edge case, the proposed fix (drop halt_reason= from run()) falls through to HALT_REASON_UNEXPECTED in _shutdown. That is correct defensive behavior: the failure surfaces as unexpected_halt instead of staying silent.
Q2 verdict: no uncovered return-False path. Every triggering site writes a specific token before the clobber.
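The swallowed-write edge case can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: `FlakyState`, `classify_halt`, and `shutdown` are hypothetical stand-ins, and the simulated failure is a plain `RuntimeError` rather than any real StateManager error.

```python
HALT_REASON_UNEXPECTED = "unexpected_halt"
HALT_REASON_RECONCILER = "reconciler_halt"


class FlakyState:
    """Hypothetical store whose first write fails, like a dropped DB connection."""

    def __init__(self):
        self.data = {}
        self.fail_next_write = True

    def get_engine_state(self, key):
        return self.data.get(key)

    def set_engine_state(self, key, value):
        if self.fail_next_write:
            self.fail_next_write = False
            raise RuntimeError("simulated StateManager failure")
        self.data[key] = value


def classify_halt(state):
    """Stand-in for an inline halt write wrapped in try/except Exception: pass."""
    try:
        state.set_engine_state("halt.reason", HALT_REASON_RECONCILER)
    except Exception:
        pass  # swallowed, so halt.reason stays unset
    return False  # the tick still reports "do not continue"


def shutdown(state, halt_reason=None):
    """Same cascade shape as _shutdown, per Q1."""
    existing = state.get_engine_state("halt.reason") or ""
    state.set_engine_state(
        "halt.reason", halt_reason or existing or HALT_REASON_UNEXPECTED
    )


state = FlakyState()
should_continue = classify_halt(state)
shutdown(state)  # no halt_reason passed, as in the proposed fix
final_reason = state.get_engine_state("halt.reason")
```

In this sketch the second write succeeds, so the row ends up with the unexpected-halt token rather than the generic engine-requested one.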
Q3 — Duration-elapsed path
Finding: duration-elapsed is unaffected by the proposed fix — but there is a second clobber site that IS material.
Q3.a — Duration-elapsed path (as asked)
HALT_REASON_DURATION_ELAPSED is written from exactly one site: run_paper_session.py:429 (inside the if not shutdown_called block after the duration-bounded while loop exits). It is NOT reached via _tick() → run() → _shutdown. It is an explicit direct call to engine._shutdown(reason, halt_reason=HALT_REASON_DURATION_ELAPSED) after the loop-exit condition.
    while time.monotonic() < deadline:
        should_continue = engine._tick()
        ...
    if not shutdown_called:
        engine._shutdown(
            "Paper session duration elapsed",
            halt_reason=HALT_REASON_DURATION_ELAPSED,
        )
Because this path explicitly passes the duration_elapsed token and nothing has previously written halt.reason (no halt trigger fired), _shutdown's precedence halt_reason or existing_reason writes duration_elapsed. Correct.
The proposed fix does not touch this path. Dropping halt_reason= from the run() if not should_continue branch does not affect the paper-session if not shutdown_called branch. Duration-elapsed behavior is preserved.
Q3.b — Second clobber site (new finding)
HALT_REASON_ENGINE_REQUESTED is written at two sites, not one. The second is the paper-session loop at run_paper_session.py:410–419:
    while time.monotonic() < deadline:
        should_continue = engine._tick()
        tick_count += 1
        if not should_continue:
            engine._shutdown(
                "Paper session halted by engine",
                halt_reason=HALT_REASON_ENGINE_REQUESTED,
            )
            shutdown_called = True
            _engine_ref.shutdown_called = True
            break
This is structurally identical to the main_loop.py:4277–4284 clobber your tasking memo targets. The comment text above it (lines 4278–4279 of main_loop.py) asserts "halt.reason + halt.detail already written by the triggering halt path". The same invariant holds for the paper-session loop, because both loops call _tick() and both receive the same return-False semantics.
Consequence for S42 / S43: S42 was a paper session. Paper sessions run through run_paper_session.py, not NEOEngine.run(). The engine_requested_halt in S42's DB row was written by line 415 of run_paper_session.py, not line 4282 of main_loop.py. If we fix only main_loop.py per your tasking, S43 will still surface engine_requested_halt in the DB because the paper-session loop will still clobber.
Grep evidence:
neo_engine/main_loop.py:123:HALT_REASON_ENGINE_REQUESTED = "engine_requested_halt"
neo_engine/main_loop.py:4282: halt_reason=HALT_REASON_ENGINE_REQUESTED,
run_paper_session.py:24: HALT_REASON_ENGINE_REQUESTED,
run_paper_session.py:415: halt_reason=HALT_REASON_ENGINE_REQUESTED,
My original investigation memo attributed the clobber to main_loop.py:4298 (line number drift; same block). That was wrong — S42 went through run_paper_session.py:415. Correction noted.
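The clobber and its proposed fix can be reproduced with a small simulation. Everything here is a hedged sketch: `FakeEngine` and `run_session` are hypothetical, the loop only imitates the shape of the paper-session loop, and the `pass_engine_requested` flag exists purely to show both behaviors side by side.

```python
import time

HALT_REASON_ENGINE_REQUESTED = "engine_requested_halt"
HALT_REASON_INVENTORY_TRUTH = "inventory_truth_halt"
HALT_REASON_UNEXPECTED = "unexpected_halt"


class FakeEngine:
    """Hypothetical engine: _tick() writes a specific token, then returns False."""

    def __init__(self):
        self.state = {}

    def _tick(self):
        self.state["halt.reason"] = HALT_REASON_INVENTORY_TRUTH
        return False

    def _shutdown(self, reason, halt_reason=None):
        existing = self.state.get("halt.reason") or ""
        self.state["halt.reason"] = (
            halt_reason or existing or HALT_REASON_UNEXPECTED
        )


def run_session(engine, duration_s, pass_engine_requested):
    """Duration-bounded loop shaped like the paper-session loop."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if not engine._tick():
            if pass_engine_requested:
                # Current shape: the explicit kwarg wins the cascade.
                engine._shutdown(
                    "halted by engine",
                    halt_reason=HALT_REASON_ENGINE_REQUESTED,
                )
            else:
                # Proposed shape: no kwarg, so the existing token survives.
                engine._shutdown("halted by engine")
            break
    return engine.state.get("halt.reason")


clobbered = run_session(FakeEngine(), 1.0, pass_engine_requested=True)
preserved = run_session(FakeEngine(), 1.0, pass_engine_requested=False)
```

With the kwarg passed, the specific token written by the tick is overwritten; without it, the token written by the triggering halt path is what lands in the final state.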
Scope Decision — Need Your Ruling
Two paths exist; both clobber; both are in scope for FLAG-041 ("halt.reason taxonomy leak"). Your tasking memo scopes only main_loop.py. Options:
Option A — Extend this branch to fix both sites. Two file edits, same shape:
- main_loop.py:4280–4283 — drop halt_reason=HALT_REASON_ENGINE_REQUESTED
- run_paper_session.py:413–416 — drop halt_reason=HALT_REASON_ENGINE_REQUESTED
Both tests already specified still apply; the token preservation test becomes a paper-session test (which matches S42's real path). The branch title and FLAG-041 description cover both sites. Scope creep is minimal: the same one-line idea applied in two places.
Option B — This branch fixes only main_loop.py; scope a follow-up for run_paper_session.py. Clean per your tasking. But S43 still surfaces engine_requested_halt in the DB — the dashboard bug isn't actually fixed until the follow-up lands. You'd want the follow-up to ship before S43.
Option C — Reject the finding / disagree with the analysis. Push back if I'm wrong about which path S42 took. I can supply the DB forensic trail or a diff run.
My recommendation is Option A. The fix is the same one-line change in two files; tests can cover both; the outcome is "S43's DB shows the authentic token" which is the actual point of FLAG-041. Scope creep is cosmetic — I wouldn't propose adding an unrelated change, just extending the same change to its second site.
Atlas-locked DEGRADED recovery semantics (Bug #2 from the S42 memo) remain out of scope and deferred, per your tasking.
Imports Needed for Tests
Either option:
- Test 1 (token preservation): if Option A, prefer exercising run_paper_session.py:410–419 in a test context (matches S42's real path). If Option B, exercise NEOEngine.run() in main_loop.py with a mocked _tick.
- Test 2 (fallback safety): hits _shutdown directly; unaffected by branch decision.
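One possible shape for the mocked-_tick assertion is sketched below using only the standard library's `unittest.mock`. The helper `run_until_halt` is hypothetical and stands in for whichever post-fix loop the ruling selects; it is not the real loop, and the real test would import the actual entry point instead.

```python
from unittest.mock import MagicMock


def run_until_halt(engine, max_ticks=10):
    """Hypothetical stand-in for the post-fix loop body: no halt_reason kwarg."""
    for _ in range(max_ticks):
        if not engine._tick():
            engine._shutdown("Paper session halted by engine")
            return True
    return False


engine = MagicMock()
engine._tick.side_effect = [True, True, False]  # halt on the third tick

halted = run_until_halt(engine)

# Inspect how _shutdown was called: the fix is correct only if the
# halt_reason kwarg is absent, so the cascade can preserve the existing token.
args, kwargs = engine._shutdown.call_args
```

Asserting on the absence of the kwarg keeps the test focused on the call shape; the token-preservation outcome itself still needs a state-backed test like Test 1.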
Standing By
No branch pre-created. No code written. main is the current branch. Current working state: clean. Ready to write code once you rule A / B / C.
— Orion