Skip to content

Vesper Review — Branch #1: fix/halt-reason-lifecycle

To: Orion, Katja CC: Atlas From: Vesper Re: Pre-merge review per Atlas checklist


Verdict

APPROVED — merge when ready.

One nit (non-blocking) noted below. All four checklist items pass.


Checklist Results

✅ Taxonomy complete

All 20 halt paths covered. Full inventory:

Constant Token
HALT_REASON_DURATION_ELAPSED duration_elapsed
HALT_REASON_ENGINE_REQUESTED engine_requested_halt
HALT_REASON_RISK_XRP_EXPOSURE risk_xrp_exposure
HALT_REASON_RISK_RLUSD_EXPOSURE risk_rlusd_exposure
HALT_REASON_RISK_RPC_FAILURE risk_rpc_failure
HALT_REASON_RISK_STALE_LEDGER risk_stale_ledger
HALT_REASON_RISK_GATEWAY risk_gateway_unhealthy
HALT_REASON_RISK_GENERIC risk_generic
HALT_REASON_KILL_SWITCH kill_switch
HALT_REASON_RECONCILER reconciler_halt
HALT_REASON_REPLAY_EXHAUSTED replay_exhausted
HALT_REASON_INTERRUPTED_SIGINT interrupted_sigint
HALT_REASON_INTERRUPTED_SIGTERM interrupted_sigterm
HALT_REASON_INTERRUPTED_SIGBREAK interrupted_sigbreak
HALT_REASON_INTERRUPTED_OTHER interrupted_other
HALT_REASON_KEYBOARD_INTERRUPT interrupted_keyboard
HALT_REASON_STARTUP_FAILURE startup_failure
HALT_REASON_UNHANDLED_EXCEPTION unhandled_exception
HALT_REASON_CONFIG_MISMATCH config_mismatch (reserved, Branch #2)
HALT_REASON_UNEXPECTED unexpected_halt

No halt path missed.


_startup() clear-on-fresh block correctly guarded

if parent_session_id is None:
    try:
        self._state.set_engine_state("halt.reason", "")
        self._state.set_engine_state("halt.detail", "")
    except Exception:
        log.debug(...)

Guard is correct. Recovery restarts (parent_session_id set) preserve both keys. Test coverage: test_startup_clears_halt_reason_on_fresh_session and test_startup_preserves_halt_reason_on_recovery_restart.


_shutdown() write is unconditional

Old code:

_hr = halt_reason or existing_reason or "unexpected halt"
if _hr != existing_reason:
    self._state.set_engine_state("halt.reason", _hr)

New code:

_hr = halt_reason or existing_reason or HALT_REASON_UNEXPECTED
self._state.set_engine_state("halt.reason", _hr)

!= gate is gone. Always writes. test_explicit_halt_reason_overrides_stale_ghost directly exercises this invariant with the S39 ghost string.


✅ All 5 run_paper_session.py call sites pass explicit halt_reason

Site halt_reason value
Signal handler halt_token (sigint / sigterm / sigbreak / other)
Engine-requested halt (tick loop) "engine_requested_halt"
Duration elapsed "duration_elapsed"
sqlite3.OperationalError path "unhandled_exception"
Generic Exception path "unhandled_exception"

All 5 covered.


✅ Tests exercise all four Atlas 2E invariants

Test Invariant pinned
test_explicit_halt_reason_overrides_stale_ghost _shutdown always overwrites (S39 ghost scenario)
test_no_explicit_reason_with_no_existing_writes_unexpected unexpected_halt fallback — dashboards never see blank
test_startup_clears_halt_reason_on_fresh_session fresh session clears stale strings
test_startup_preserves_halt_reason_on_recovery_restart recovery restart preserves attribution

Nit (non-blocking)

run_paper_session.py uses string literals where main_loop.py defines constants:

# run_paper_session.py (current):
halt_reason="engine_requested_halt"
halt_reason="duration_elapsed"
halt_reason="unhandled_exception"

# Preferred — import and use the constants:
from neo_engine.main_loop import (
    HALT_REASON_ENGINE_REQUESTED,
    HALT_REASON_DURATION_ELAPSED,
    HALT_REASON_UNHANDLED_EXCEPTION,
)

Not blocking because the strings match exactly, but a future rename of a constant would require updating two files instead of one. Orion's call whether to fix in this branch or add to Branch #2 scope.


Branch #2 Note

Orion's scope warning is correct and appreciated. FLAG-034 already landed. The S39 inventory anomaly (337 RLUSD displayed vs ~197 actual) needs fresh investigation before Branch #2 code is written. No action on this branch.


Approved. Katja can apply and merge.

— Vesper