[C] Orion Audit — Session Closure Root Cause + Fix

To: Vesper (she/her)
CC: Katja (Captain)
From: Orion (he/him)
Date: 2026-04-18

TL;DR

Issue 1 (CRITICAL) — found, fixed, tested. Root cause is a misspelled attribute name in _shutdown(): it read _end_inv.total_value_rlusd (no _in_) from an InventorySnapshot dataclass whose actual field is total_value_in_rlusd. The resulting AttributeError fired on every shutdown and was caught by an except handler that logged it only via log.debug, so close_session() never executed. As a consequence, every sessions row from S1 through S37 landed with ended_at = NULL. The fix is a 1-line code change, a log-level promotion, and 4 regression tests. Patch attached.

Issue 2 (HIGH) — confirmed mechanism, hardening recommended, backup restore safe. S38 DB corruption is consistent with CTRL_CLOSE_EVENT on Windows (terminal X-button close). Python's signal module does not deliver this event — our FLAG-027 signal handlers catch SIGINT / SIGTERM / SIGBREAK but nothing else. The neo_live_stage1.db.bak.20260418T192119Z pre-S38 backup was taken via SQLite's atomic backup() API and is a consistent snapshot — safe to restore. Recommended hardening: add a periodic synchronous PRAGMA wal_checkpoint(TRUNCATE) on a 60-second timer, to bound the blast radius of any future hard-kill.

Recommendation: Clear to run S39 after applying the Issue 1 patch and restoring the backup. Issue 2 hardening is additive defense-in-depth and can ship on a later branch.

Issue 1 — Root Cause

The bug (`neo_engine/main_loop.py:668`, pre-fix)

self._state.close_session(
    ending_xrp=_end_inv.xrp_balance if _end_inv else 0.0,
    ending_rlusd=_end_inv.rlusd_balance if _end_inv else 0.0,
    ending_value=_end_inv.total_value_rlusd if _end_inv else 0.0,  # ← WRONG
    halt_reason=_final_reason,
)

InventorySnapshot (neo_engine/models.py:176) has fields:

xrp_balance, rlusd_balance, xrp_value_in_rlusd, total_value_in_rlusd,
xrp_pct, drift_pct, skew_tier, drift_velocity_pct_per_min

There is no total_value_rlusd attribute. _end_inv.total_value_rlusd raises AttributeError: 'InventorySnapshot' object has no attribute 'total_value_rlusd' every single time.

Verified in isolation:

$ python -c "from neo_engine.models import InventorySnapshot; \
             s = InventorySnapshot(); s.total_value_rlusd"
AttributeError: 'InventorySnapshot' object has no attribute 'total_value_rlusd'

The mix-up is understandable: the DB column in valuation_snapshots is named total_value_rlusd (no _in_), while the in-memory dataclass field is total_value_in_rlusd. The two names are inconsistent, and _shutdown() used the DB-column spelling for a dataclass attribute access.
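The naming mismatch can be reproduced with a reduced stand-in. This is a minimal sketch only: `SnapshotStandIn` and `missing_attribute` are hypothetical names, the stand-in carries just three of the fields listed above, and the default values are assumptions rather than the real `InventorySnapshot` definition.

```python
from dataclasses import dataclass, fields


@dataclass
class SnapshotStandIn:
    """Hypothetical, reduced stand-in for InventorySnapshot (not the real class)."""
    xrp_balance: float = 0.0
    rlusd_balance: float = 0.0
    total_value_in_rlusd: float = 0.0  # dataclass spelling, with `_in_`


def missing_attribute(obj, name: str) -> bool:
    """Return True when `name` is not a declared dataclass field of `obj`."""
    return name not in {f.name for f in fields(obj)}


snap = SnapshotStandIn(xrp_balance=10.0, rlusd_balance=5.0, total_value_in_rlusd=30.0)

# The DB-column spelling is not a field on the dataclass ...
db_spelling_missing = missing_attribute(snap, "total_value_rlusd")
# ... while the dataclass spelling is.
dataclass_spelling_missing = missing_attribute(snap, "total_value_in_rlusd")

try:
    snap.total_value_rlusd  # same access pattern as the pre-fix line
    raised = False
except AttributeError:
    raised = True
```

A check like `missing_attribute` could also be used in a unit test to catch this class of drift between column names and field names, if that is considered worthwhile.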

Why it has been silent since S1

The attribute access is inside a try/except block (main_loop.py:655–672) whose handler was:

except Exception as exc:
    log.debug("Session close failed — continuing shutdown", extra={"error": str(exc)})

log.debug records are filtered out at default log levels, so the failure was invisible in every file log, every console log, and every telemetry export.
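The effect of the log level can be shown in isolation. The sketch below assumes a logger configured at INFO (the engine's actual level is not stated in this audit). `ListHandler`, `failing_step`, and the logger name are hypothetical, introduced only for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects emitted records in memory so the effect of the level is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


log = logging.getLogger("shutdown_demo")  # hypothetical logger name
log.setLevel(logging.INFO)                # assumed level; DEBUG sits below it
log.propagate = False
handler = ListHandler()
log.addHandler(handler)


def failing_step() -> None:
    """Stand-in for the argument evaluation that raised on every shutdown."""
    raise AttributeError("object has no attribute 'total_value_rlusd'")


# Pre-fix shape: the exception is caught and reported at DEBUG, so nothing is recorded.
try:
    failing_step()
except Exception as exc:
    log.debug("Session close failed", extra={"error": str(exc)})
debug_record_count = len(handler.records)

# Post-fix shape: the same failure is reported at ERROR, with the traceback attached.
try:
    failing_step()
except Exception as exc:
    log.error("Session close failed", extra={"error": str(exc)}, exc_info=True)
error_record_count = len(handler.records) - debug_record_count
```

With these assumptions the first handler produces no record at all, while the second produces one ERROR record carrying `exc_info`.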

Answers to the four audit questions

| # | Question | Answer |
|---|----------|--------|
| 1 | Is `close_session` raising an exception on every call? | No. `close_session` is never reached. The `AttributeError` fires on the argument-evaluation line that computes `ending_value`, before the call. |
| 2 | Is `_current_session_id` None at shutdown time? | No. `create_session()` sets it, and nothing clears it before the point where `close_session` would run. |
| 3 | Is the DB connection already closed? | No. `_state.close()` is in a `finally` block at line 678, strictly after the close_session try/except. Ordering is correct. |
| 4 | Is `_transaction()` failing in WAL mode? | No. `_transaction()` is never entered because `close_session()` is never called. |

The root cause is simpler than any of the hypotheses — it's a plain attribute typo in a silent-failure code path.
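Answers 1 and 4 rest on Python evaluating call arguments before entering the callee. A minimal sketch of that ordering follows, using a hypothetical `Snapshot` class and a stand-in `close_session` function (neither is the engine's real code):

```python
class Snapshot:
    """Hypothetical stand-in exposing only the correctly spelled attribute."""
    total_value_in_rlusd = 30.0


calls = []


def close_session(*, ending_value: float) -> None:
    """Stand-in for the real close_session; records that it was entered."""
    calls.append(ending_value)


snap = Snapshot()

try:
    # The argument expression is evaluated first. The misspelled attribute
    # raises here, so close_session is never entered.
    close_session(ending_value=snap.total_value_rlusd if snap else 0.0)
except AttributeError:
    pass
entered_with_typo = len(calls) > 0

# With the corrected spelling the call proceeds normally.
close_session(ending_value=snap.total_value_in_rlusd if snap else 0.0)
entered_with_fix = len(calls) > 0
```

This is why none of the code inside `close_session`, including any transaction handling, could have been the failing component.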

Git blame

$ git log -S "total_value_rlusd" -- neo_engine/main_loop.py
7eaed37  2026-04-12  Data: add session tracking for funded runs in stage1 DB

Introduced in the commit that first wired up close_session on shutdown, so the bug has been present since the first session recorded in the sessions table.

The Fix

Patch: orion-patches-2026-04-18-session-closure/0001-fix-session-closure-write-ended_at-total_value_in_rl.patch

Branch: fix/session-closure-ended-at — one commit, one production file + one new test file.

`neo_engine/main_loop.py` (line 668)

-            ending_value=_end_inv.total_value_rlusd if _end_inv else 0.0,
+            # Note: the InventorySnapshot field is `total_value_in_rlusd`. The
+            # prior spelling `total_value_rlusd` (no `_in_`) matches the
+            # valuation_snapshots DB column, not the in-memory dataclass, and
+            # raised AttributeError on every shutdown — silently swallowed by
+            # the debug log below. That caused every session row from S1–S37
+            # to land with ended_at = NULL. Audit: 2026-04-18.
+            ending_value=_end_inv.total_value_in_rlusd if _end_inv else 0.0,
         except Exception as exc:
-            log.debug("Session close failed — continuing shutdown", extra={"error": str(exc)})
+            # Promoted from log.debug: session close is a source-of-truth write —
+            # failures must not be silent.
+            log.error(
+                "Session close failed — continuing shutdown",
+                extra={"error": str(exc)},
+                exc_info=True,
+            )

`tests/test_shutdown_ended_at.py` (new, 4 tests)

| # | Test | What it pins |
|---|------|--------------|
| 1 | `test_shutdown_populates_ended_at_on_sessions_row` | End-to-end: real `StateManager(":memory:")`, open session, run `_shutdown`, assert `ended_at IS NOT NULL` |
| 2 | `test_shutdown_passes_total_value_in_rlusd_as_ending_value` | Spies on `close_session` kwargs to pin the attribute name |
| 3 | `test_shutdown_handles_missing_session_gracefully` | Early-return path when `_current_session_id` is None |
| 4 | `test_close_session_failure_logs_at_error_level` | Forces `close_session` to raise; asserts ERROR record emitted |

All 4 pass with the fix applied; 3 of 4 fail without it (negative control confirmed).
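For readers without the patch at hand, the shape of tests 2 and 4 can be sketched as follows. This is not the contents of `tests/test_shutdown_ended_at.py`: the `shutdown` function is a hypothetical, reduced stand-in for `_shutdown()`, the logger name is invented, and `unittest` is used here only so the sketch is self-contained (the real suite runs under pytest).

```python
import logging
import unittest
from types import SimpleNamespace
from unittest import mock

log = logging.getLogger("shutdown_sketch")  # hypothetical logger name


def shutdown(state, end_inv) -> None:
    """Hypothetical, reduced stand-in for _shutdown(); not the real method."""
    try:
        state.close_session(
            ending_value=end_inv.total_value_in_rlusd if end_inv else 0.0,
        )
    except Exception as exc:
        log.error("Session close failed", extra={"error": str(exc)}, exc_info=True)


class ShutdownSketchTests(unittest.TestCase):
    def test_passes_total_value_in_rlusd(self) -> None:
        # Spy on close_session kwargs to pin the attribute name.
        state = mock.Mock()
        inv = SimpleNamespace(total_value_in_rlusd=42.5)
        shutdown(state, inv)
        state.close_session.assert_called_once_with(ending_value=42.5)

    def test_failure_logs_at_error_level(self) -> None:
        # Force close_session to raise and assert one ERROR record is emitted.
        state = mock.Mock()
        state.close_session.side_effect = RuntimeError("boom")
        with self.assertLogs("shutdown_sketch", level="ERROR") as captured:
            shutdown(state, SimpleNamespace(total_value_in_rlusd=1.0))
        self.assertEqual(len(captured.records), 1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ShutdownSketchTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Reverting the stand-in to the misspelled attribute makes the first test fail, because `SimpleNamespace` raises `AttributeError` before `close_session` is reached.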

Git Commands for Katja (PowerShell, copy-paste)

Block 1 — branch off main

git checkout main
git pull origin main
git checkout -b fix/session-closure-ended-at

Block 2 — apply patch

$patch = "C:\path\to\Claude Homebase Neo\02 Projects\NEO Trading Engine\orion-patches-2026-04-18-session-closure\0001-fix-session-closure-write-ended_at-total_value_in_rl.patch"
git am --3way "$patch"

Block 3 — verify

git log --oneline main..HEAD
python -m pytest tests/test_shutdown_ended_at.py -v

Expected: 1 commit, 4 passed.

Block 4 — push and merge

git push origin fix/session-closure-ended-at

Merge via GitHub UI.

Issue 2 — DB Corruption Mechanism + Hardening

Confirmed mechanism

CTRL_CLOSE_EVENT (terminal X-button) is NOT caught by Python's signal module. Our FLAG-027 signal handlers cover SIGINT / SIGTERM / SIGBREAK only — the docstring explicitly documents this gap. S38 failure sequence: terminal X → OS gives 5s grace → WAL auto-checkpoint mid-flight → process killed → kernel drops file buffers → malformed DB.

Backup is safe

neo_live_stage1.db.bak.20260418T192119Z was created via SQLite's backup() API — a consistent snapshot at backup time, not a file copy. S38 writes are not in the backup and not recoverable from the malformed DB. Restore from backup is the correct path.
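The difference between an API-level backup and a file copy, and the post-restore check, can be sketched with Python's standard `sqlite3` module. How the engine actually invokes the backup is not shown in this audit, so the use of `Connection.backup()` here is an assumption, and the file names and the reduced `sessions` schema are hypothetical.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "live.db")        # hypothetical file names
backup_path = os.path.join(workdir, "live.db.bak")

# Build a small WAL-mode database to stand in for the live DB.
live = sqlite3.connect(live_path)
live.execute("PRAGMA journal_mode=WAL")
live.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, ended_at TEXT)")
live.execute("INSERT INTO sessions (ended_at) VALUES (NULL)")
live.commit()

# Connection.backup() copies pages through SQLite's online backup API, so the
# target reflects one consistent state of the source rather than a raw file copy.
target = sqlite3.connect(backup_path)
live.backup(target)
target.close()

# Writes made after the backup are not part of the snapshot.
live.execute("INSERT INTO sessions (ended_at) VALUES (NULL)")
live.commit()
live.close()

# Open the backup and run the integrity check before relying on it.
restored = sqlite3.connect(backup_path)
quick_check = restored.execute("PRAGMA quick_check").fetchone()[0]
backup_rows = restored.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
restored.close()
```

In this sketch the backup holds only the row written before `backup()` ran, which mirrors the statement that S38 writes are absent from the pre-S38 backup.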

Hardening recommendation — FLAG-035

Add a periodic PRAGMA wal_checkpoint(TRUNCATE) on a 60-second timer. After each successful checkpoint the WAL file is truncated to zero bytes, so the blast radius of any future hard-kill is bounded to ≤60 seconds of writes. Implementation sketch provided (threading.Event-based loop, daemon thread, stop on shutdown). Ship as FLAG-035 on a separate branch after S39 confirms clean behavior.
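One possible shape for such a loop is sketched below. It is an illustration under stated assumptions, not the FLAG-035 design itself: the `WalCheckpointer` class, its constructor arguments, and the `completed` counter are hypothetical, and the demo uses a 0.05-second interval instead of 60 seconds so the effect is visible quickly.

```python
import os
import sqlite3
import tempfile
import threading
import time


class WalCheckpointer:
    """Runs PRAGMA wal_checkpoint(TRUNCATE) on a fixed interval in a daemon thread.

    Hypothetical helper; names and arguments are assumptions for illustration.
    """

    def __init__(self, db_path: str, interval_s: float = 60.0) -> None:
        self._db_path = db_path
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="wal-checkpoint", daemon=True)
        self.completed = 0  # checkpoints that finished without reporting busy

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Signal the loop to exit and wait for the thread to finish."""
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # sqlite3 connections are bound to their creating thread by default,
        # so the checkpointer opens its own connection inside the thread.
        conn = sqlite3.connect(self._db_path)
        try:
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(self._interval_s):
                busy, _log_frames, _checkpointed = conn.execute(
                    "PRAGMA wal_checkpoint(TRUNCATE)"
                ).fetchone()
                if busy == 0:
                    self.completed += 1
        finally:
            conn.close()


# Demo: write to a WAL-mode database, then let the checkpointer truncate the WAL.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(db_path)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")
writer.commit()

checkpointer = WalCheckpointer(db_path, interval_s=0.05)
checkpointer.start()
deadline = time.monotonic() + 5.0
while checkpointer.completed == 0 and time.monotonic() < deadline:
    time.sleep(0.01)
checkpointer.stop()

wal_size = os.path.getsize(db_path + "-wal")  # measured while writer is still open
writer.close()
```

Using `Event.wait` as the sleep means `stop()` interrupts the wait immediately on shutdown instead of blocking for up to a full interval. A checkpoint that reports busy is skipped and retried on the next tick.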

Summary

| Priority | Item | Status |
|----------|------|--------|
| 1 | Fix `ended_at` write failure | ✅ Patch ready — branch `fix/session-closure-ended-at`, 4 tests |
| 2 | `log.debug` → `log.error` on shutdown failure | ✅ In same commit |
| 3 | WAL checkpoint cadence hardening | Designed, not built — FLAG-035, separate branch |
| 4 | Backup restore confirmed safe | ✅ Confirmed — restore and run `PRAGMA quick_check` |

Clear to run S39 once patch lands on main and PRAGMA quick_check on restored DB returns ok.

What stays unchanged: anchor_max_divergence_bps: 10.0, base_size_rlusd: 15.0, risk_engine.py, fill paths, schema, Phase 7.2 CLOB switch.

— Orion