[C] Orion Audit — Session Closure Root Cause + Fix
To: Vesper (she/her) CC: Katja (Captain) From: Orion (he/him) Date: 2026-04-18
TL;DR
Issue 1 (CRITICAL): found, fixed, tested. Root cause is an attribute-name typo in _shutdown(): it read _end_inv.total_value_rlusd (no _in_) from an InventorySnapshot dataclass whose actual field is total_value_in_rlusd. The AttributeError fired on every shutdown and was silently swallowed by log.debug, so close_session() never executed. As a consequence, every sessions row from S1 through S37 landed with ended_at = NULL. The fix is a one-line code change plus a log-level promotion and 4 regression tests. Patch attached.
Issue 2 (HIGH): confirmed mechanism, hardening recommended, backup restore safe. S38 DB corruption is consistent with CTRL_CLOSE_EVENT on Windows (terminal X-button close). Python's signal module does not deliver this event: our FLAG-027 signal handlers catch SIGINT / SIGTERM / SIGBREAK and nothing else. The pre-S38 backup neo_live_stage1.db.bak.20260418T192119Z was taken via SQLite's atomic backup() API and is a consistent snapshot, so it is safe to restore. Recommended hardening: add a periodic synchronous PRAGMA wal_checkpoint(TRUNCATE) on a 60-second timer to bound the blast radius of any future hard-kill.
Recommendation: Clear to run S39 after applying the Issue 1 patch and restoring the backup. Issue 2 hardening is additive defense-in-depth and can ship on a later branch.
Issue 1: Root Cause
The bug (neo_engine/main_loop.py:668, pre-fix)
    self._state.close_session(
        ending_xrp=_end_inv.xrp_balance if _end_inv else 0.0,
        ending_rlusd=_end_inv.rlusd_balance if _end_inv else 0.0,
        ending_value=_end_inv.total_value_rlusd if _end_inv else 0.0,  # ← WRONG
        halt_reason=_final_reason,
    )
InventorySnapshot (neo_engine/models.py:176) has fields:
xrp_balance, rlusd_balance, xrp_value_in_rlusd, total_value_in_rlusd,
xrp_pct, drift_pct, skew_tier, drift_velocity_pct_per_min
There is no total_value_rlusd attribute. _end_inv.total_value_rlusd raises AttributeError: 'InventorySnapshot' object has no attribute 'total_value_rlusd' every single time.
Verified in isolation:
    $ python -c "from neo_engine.models import InventorySnapshot; \
    s = InventorySnapshot(); s.total_value_rlusd"
    AttributeError: 'InventorySnapshot' object has no attribute 'total_value_rlusd'
The confusion is real: the DB column in valuation_snapshots is named total_value_rlusd (no _in_), while the in-memory dataclass field has _in_. The two are inconsistent. _shutdown() picked the DB spelling on a dataclass access.
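The mismatch can be reproduced without the engine. The sketch below uses a hypothetical stand-in dataclass (`SnapshotStandIn`, not the real InventorySnapshot, with assumed defaults and only three of the listed fields) and a hypothetical helper `has_field`. It shows that the DB-column spelling raises on attribute access while the dataclass spelling is a declared field.

```python
from dataclasses import dataclass, fields


@dataclass
class SnapshotStandIn:
    """Hypothetical stand-in for InventorySnapshot; defaults are assumed."""
    xrp_balance: float = 0.0
    rlusd_balance: float = 0.0
    total_value_in_rlusd: float = 0.0


def has_field(cls, name: str) -> bool:
    """Return True if `name` is a declared dataclass field of `cls`."""
    return name in {f.name for f in fields(cls)}


snap = SnapshotStandIn(xrp_balance=10.0, rlusd_balance=5.0,
                       total_value_in_rlusd=30.0)

try:
    snap.total_value_rlusd  # DB column spelling, not a dataclass field
    raised = False
except AttributeError:
    raised = True
```

A check like `has_field` could also serve as a cheap guard in a unit test whenever a DB column name and a dataclass field name are close but not identical.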
Why it has been silent since S1
The attribute access is inside a try/except block (main_loop.py:655–672) whose handler was:
    except Exception as exc:
        log.debug("Session close failed — continuing shutdown", extra={"error": str(exc)})
log.debug is filtered out at default log levels. The failure is invisible in every file log, every console log, every telemetry export.
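To illustrate why the failure never surfaced, here is a minimal sketch using only the standard `logging` module. The logger name, the INFO threshold, and the in-memory `ListHandler` are assumptions made for the demonstration, not the engine's actual logging configuration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects emitted records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


log = logging.getLogger("shutdown_demo")  # hypothetical logger name
log.setLevel(logging.INFO)                # assumed production-like threshold
log.propagate = False
handler = ListHandler()
log.addHandler(handler)


def swallow(level_method):
    """Mimic the handler: catch the failure and log it via `level_method`."""
    try:
        raise AttributeError("no attribute 'total_value_rlusd'")
    except Exception as exc:
        level_method("Session close failed", extra={"error": str(exc)})


swallow(log.debug)  # dropped: DEBUG is below the INFO threshold
swallow(log.error)  # recorded
levels = [r.levelname for r in handler.records]
```

Only the ERROR record reaches the handler, which matches the behavior described above: the same exception, logged at debug, leaves no trace.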
Answers to the four audit questions
| # | Question | Answer |
|---|---|---|
| 1 | Is close_session raising an exception on every call? | No. close_session is never reached. The AttributeError fires on the argument-evaluation line that computes ending_value, before the call. |
| 2 | Is _current_session_id None at shutdown time? | No. create_session() sets it; nothing else clears it before close_session runs. |
| 3 | Is the DB connection already closed? | No. _state.close() is in a finally block at line 678, strictly after the close_session try/except. Ordering is correct. |
| 4 | Is _transaction() failing in WAL mode? | No. _transaction() is never entered because close_session() is never called. |
The root cause is simpler than any of the hypotheses — it's a plain attribute typo in a silent-failure code path.
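The answer to question 1 rests on Python evaluating call arguments before entering the callee. A small sketch of that ordering, with a hypothetical spy function and a hypothetical `Snap` class standing in for the real objects:

```python
calls = []


def close_session(**kwargs):
    """Hypothetical spy standing in for StateManager.close_session."""
    calls.append(kwargs)


class Snap:
    """Hypothetical snapshot exposing only the correctly spelled field."""
    total_value_in_rlusd = 30.0


snap = Snap()

try:
    # Arguments are evaluated first, so the AttributeError is raised here
    # and close_session is never entered.
    close_session(ending_value=snap.total_value_rlusd if snap else 0.0)
except AttributeError:
    pass

calls_before_fix = len(calls)

# With the corrected attribute name the call goes through.
close_session(ending_value=snap.total_value_in_rlusd if snap else 0.0)
```

This is why none of the hypotheses about close_session internals could apply: the function body never ran.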
Git blame
    $ git log -S "total_value_rlusd" -- neo_engine/main_loop.py
    7eaed37 2026-04-12 Data: add session tracking for funded runs in stage1 DB
Introduced in the commit that first wired up close_session on shutdown. The bug has been present from the first session the table existed.
The Fix
Patch: orion-patches-2026-04-18-session-closure/0001-fix-session-closure-write-ended_at-total_value_in_rl.patch
Branch: fix/session-closure-ended-at — one commit, one production file + one new test file.
neo_engine/main_loop.py (line 668)
    -     ending_value=_end_inv.total_value_rlusd if _end_inv else 0.0,
    +     # Note: the InventorySnapshot field is `total_value_in_rlusd`. The
    +     # prior spelling `total_value_rlusd` (no `_in_`) matches the
    +     # valuation_snapshots DB column, not the in-memory dataclass, and
    +     # raised AttributeError on every shutdown — silently swallowed by
    +     # the debug log below. That caused every session row from S1–S37
    +     # to land with ended_at = NULL. Audit: 2026-04-18.
    +     ending_value=_end_inv.total_value_in_rlusd if _end_inv else 0.0,
      except Exception as exc:
    -     log.debug("Session close failed — continuing shutdown", extra={"error": str(exc)})
    +     # Promoted from log.debug: session close is a source-of-truth write —
    +     # failures must not be silent.
    +     log.error(
    +         "Session close failed — continuing shutdown",
    +         extra={"error": str(exc)},
    +         exc_info=True,
    +     )
tests/test_shutdown_ended_at.py (new, 4 tests)
| # | Test | What it pins |
|---|---|---|
| 1 | test_shutdown_populates_ended_at_on_sessions_row | End-to-end: real StateManager(":memory:"), open session, run _shutdown, assert ended_at IS NOT NULL |
| 2 | test_shutdown_passes_total_value_in_rlusd_as_ending_value | Spies on close_session kwargs to pin the attribute name |
| 3 | test_shutdown_handles_missing_session_gracefully | _current_session_id is None early-return path |
| 4 | test_close_session_failure_logs_at_error_level | Forces close_session to raise; asserts ERROR record emitted |
All 4 pass with fix. 3 of 4 fail without fix (negative control confirmed).
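For readers without the patch open, the shape of tests 2 and 4 can be sketched as follows. This is not the code in tests/test_shutdown_ended_at.py: the `shutdown` function is a reduced, hypothetical stand-in for the close step of the fixed _shutdown(), and the logger name and test names are assumptions.

```python
import logging
from types import SimpleNamespace
from unittest.mock import MagicMock

log = logging.getLogger("shutdown_sketch")  # hypothetical logger name


def shutdown(state, end_inv, reason):
    """Hypothetical, reduced stand-in for the fixed close step."""
    try:
        state.close_session(
            ending_xrp=end_inv.xrp_balance if end_inv else 0.0,
            ending_rlusd=end_inv.rlusd_balance if end_inv else 0.0,
            ending_value=end_inv.total_value_in_rlusd if end_inv else 0.0,
            halt_reason=reason,
        )
    except Exception as exc:
        log.error("Session close failed", extra={"error": str(exc)},
                  exc_info=True)


def test_passes_total_value_in_rlusd():
    # Spy on close_session and pin the attribute feeding ending_value.
    state = MagicMock()
    inv = SimpleNamespace(xrp_balance=10.0, rlusd_balance=5.0,
                          total_value_in_rlusd=30.0)
    shutdown(state, inv, "manual")
    assert state.close_session.call_args.kwargs["ending_value"] == 30.0


def test_failure_logs_at_error_level():
    # Force close_session to raise and capture what the logger emits.
    state = MagicMock()
    state.close_session.side_effect = RuntimeError("boom")
    records = []
    handler = logging.Handler()
    handler.emit = records.append
    log.addHandler(handler)
    try:
        shutdown(state, None, "manual")
    finally:
        log.removeHandler(handler)
    assert [r.levelname for r in records] == ["ERROR"]
```

Because `SimpleNamespace` only carries the attributes it is given, reverting the stand-in to the old spelling would make the first test fail, which is the negative-control property described above.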
Git Commands for Katja (PowerShell, copy-paste)
Block 1: branch off main
Block 2: apply patch
    $patch = "C:\path\to\Claude Homebase Neo\02 Projects\NEO Trading Engine\orion-patches-2026-04-18-session-closure\0001-fix-session-closure-write-ended_at-total_value_in_rl.patch"
    git am --3way "$patch"
Block 3: verify
Expected: 1 commit, 4 passed.
Block 4: push and merge
Merge via GitHub UI.
Issue 2: DB Corruption Mechanism + Hardening
Confirmed mechanism
CTRL_CLOSE_EVENT (terminal X-button) is NOT caught by Python's signal module. Our FLAG-027 signal handlers cover SIGINT / SIGTERM / SIGBREAK only — the docstring explicitly documents this gap. S38 failure sequence: terminal X → OS gives 5s grace → WAL auto-checkpoint mid-flight → process killed → kernel drops file buffers → malformed DB.
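As a rough illustration of the coverage gap (this is not the FLAG-027 code; the function names are hypothetical), a handler-registration helper built on Python's `signal` module can only enumerate signals the module exposes. The three named above are the whole set, and SIGBREAK exists only on Windows.

```python
import signal


def handled_signals():
    """Signals this sketch would register; SIGBREAK exists only on Windows."""
    names = ["SIGINT", "SIGTERM", "SIGBREAK"]
    return [getattr(signal, n) for n in names if hasattr(signal, n)]


def install(handler):
    """Register `handler` for each available signal (main thread only)."""
    for sig in handled_signals():
        signal.signal(sig, handler)
```

Nothing in this list corresponds to a console close event, so a graceful-shutdown path built this way is not triggered by the terminal X button.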
Backup is safe
neo_live_stage1.db.bak.20260418T192119Z was created via SQLite's backup() API — a consistent snapshot at backup time, not a file copy. S38 writes are not in the backup and not recoverable from the malformed DB. Restore from backup is the correct path.
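A minimal sketch of the backup-then-verify pattern, using Python's `sqlite3.Connection.backup` and `PRAGMA quick_check`. The file names, the `sessions` table shape, and the helper names are hypothetical and chosen for the demonstration; the engine's actual backup routine is not shown in this report.

```python
import os
import sqlite3
import tempfile


def backup_db(src_path: str, dst_path: str) -> None:
    """Copy src to dst with SQLite's online backup API (not a file copy)."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def quick_check(path: str) -> str:
    """Return the first row of PRAGMA quick_check ('ok' when healthy)."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA quick_check").fetchone()[0]
    finally:
        conn.close()


workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "live.db")     # hypothetical file names
bak = os.path.join(workdir, "live.db.bak")

conn = sqlite3.connect(live)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, ended_at TEXT)")
conn.execute("INSERT INTO sessions (ended_at) VALUES (NULL)")
conn.commit()
conn.close()

backup_db(live, bak)
status = quick_check(bak)
```

Running the same `quick_check` against the restored file before S39 is the verification step referred to in the summary.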
Hardening recommendation: FLAG-035
Add a periodic PRAGMA wal_checkpoint(TRUNCATE) on a 60-second timer. After each successful checkpoint the WAL file is truncated to zero bytes, so the blast radius of any future hard-kill is bounded to at most 60 seconds of writes. Implementation sketch provided (threading.Event-based loop, daemon thread, stop on shutdown). Ship as FLAG-035 on a separate branch after S39 confirms clean behavior.
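One possible shape for such a loop is sketched here under stated assumptions: the class name `WalCheckpointer`, its `runs` counter, and the use of a dedicated connection are illustrative choices, not the FLAG-035 design. Error handling and logging are omitted, and the demo uses a throwaway database with a very short interval instead of 60 seconds.

```python
import os
import sqlite3
import tempfile
import threading
import time


class WalCheckpointer:
    """Hypothetical helper: runs wal_checkpoint(TRUNCATE) every interval_s."""

    def __init__(self, db_path: str, interval_s: float = 60.0):
        self._db_path = db_path
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.runs = 0  # number of checkpoint statements executed

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Signal the loop to exit and wait for the thread to finish."""
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Dedicated connection: sqlite3 connections are not shared across
        # threads by default.
        conn = sqlite3.connect(self._db_path)
        try:
            # Event.wait returns False on timeout, True once stop() is called.
            while not self._stop.wait(self._interval_s):
                conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
                self.runs += 1
        finally:
            conn.close()


# Demonstration with a throwaway database and a short interval.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")  # hypothetical path
writer = sqlite3.connect(db_path)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")
writer.commit()

cp = WalCheckpointer(db_path, interval_s=0.05)
cp.start()
time.sleep(0.5)
cp.stop()
writer.close()
```

Waiting on the `threading.Event` instead of calling `time.sleep` lets `stop()` interrupt the wait immediately, so shutdown is not delayed by up to a full interval.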
Summary
| Priority | Item | Status |
|---|---|---|
| 1 | Fix ended_at write failure | ✅ Patch ready: branch fix/session-closure-ended-at, 4 tests |
| 2 | log.debug → log.error on shutdown failure | ✅ In same commit |
| 3 | WAL checkpoint cadence hardening | Designed, not built (FLAG-035, separate branch) |
| 4 | Backup restore confirmed safe | ✅ Confirmed: restore and run PRAGMA quick_check |
Clear to run S39 once patch lands on main and PRAGMA quick_check on restored DB returns ok.
What stays unchanged: anchor_max_divergence_bps: 10.0, base_size_rlusd: 15.0, risk_engine.py, fill paths, schema, Phase 7.2 CLOB switch.
— Orion