[C] Orion Audit Request — Session Closure + DB Corruption
To: Orion (he/him) CC: Katja (Captain) From: Vesper (she/her) Date: 2026-04-18
Context¶
S38 stopped without printing the expected shutdown message. Investigation of the DB revealed two distinct problems: (1) session closure has never worked — every session since S1 shows ended_at = NULL, and (2) the live DB is currently malformed. Both require investigation before the next session runs.
Issue 1 — Session Closure Never Writes ended_at (CRITICAL)¶
Scope: All 37 sessions. Not a new regression — present since S1.
Evidence:
session_id | started_at | ended_at | status
-----------|-----------------------------------|----------|-------
1 | 2026-04-13T03:36:18 | None | OPEN
2 | 2026-04-13T18:22:10 | None | OPEN
...
36 | 2026-04-18T02:18:12 | None | OPEN
37 | 2026-04-18T16:35:35 | None | OPEN
Every single session shows ended_at = NULL and ending_xrp / ending_rlusd / ending_value = NULL.
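For reproducibility, the evidence table above can be regenerated with one query. A minimal sketch against a scratch copy of the assumed schema — the table name `sessions` is an assumption and should be confirmed against the actual schema:

```python
import sqlite3

# Scratch copy of the assumed schema; the audit query is the SELECT below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id INTEGER PRIMARY KEY, "
             "started_at TEXT, ended_at TEXT, status TEXT)")
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)",
                 [(1, "2026-04-13T03:36:18", None, "OPEN"),
                  (37, "2026-04-18T16:35:35", None, "OPEN")])

# Every session with no recorded end — against the live DB this
# should return all 37 rows.
open_sessions = conn.execute(
    "SELECT session_id, started_at, ended_at, status "
    "FROM sessions WHERE ended_at IS NULL ORDER BY session_id"
).fetchall()
print(len(open_sessions))  # → 2 (both scratch rows are unclosed)
```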
The write path exists and is called:
main_loop.py line ~665 calls self._state.close_session(...) inside _shutdown(). The call is correct and passes real inventory values. However, it is wrapped in:
try:
    self._state.close_session(
        ending_xrp=...,
        ending_rlusd=...,
        ending_value=...,
        halt_reason=...,
    )
except Exception as exc:
    log.debug("Session close failed — continuing shutdown", extra={"error": str(exc)})
The failure is completely silent. log.debug means it never appears in normal terminal output. Every session has silently failed to close since the beginning, and nobody could see it.
What to investigate:
1. Is close_session raising an exception on every call? If so, what exception?
2. Is _current_session_id None at shutdown time (early return)?
3. Is the DB connection already closed when close_session is called? Check the ordering: close_session() is called, then self._state.close() — but does something upstream close the connection first?
4. Is the _transaction() context manager failing in WAL mode at shutdown?
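One way to separate hypotheses 1–3 is to replay the close write in isolation, outside the shutdown path. A minimal sketch, assuming a `sessions` table and an UPDATE-style `close_session` — the real implementation should be read, not assumed, before drawing conclusions:

```python
import sqlite3

def close_session(conn, session_id, ending_xrp, ending_rlusd, ending_value):
    """Replay the assumed close write path against a scratch DB."""
    with conn:  # implicit transaction, committed on clean exit
        cur = conn.execute(
            "UPDATE sessions SET ended_at = datetime('now'), "
            "ending_xrp = ?, ending_rlusd = ?, ending_value = ?, "
            "status = 'CLOSED' WHERE session_id = ?",
            (ending_xrp, ending_rlusd, ending_value, session_id),
        )
        if cur.rowcount == 0:
            # A silently-missing row would explain 37 no-op closes.
            raise RuntimeError(f"no session row for id {session_id}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id INTEGER PRIMARY KEY, "
             "started_at TEXT, ended_at TEXT, ending_xrp REAL, "
             "ending_rlusd REAL, ending_value REAL, status TEXT)")
conn.execute("INSERT INTO sessions (session_id, started_at, status) "
             "VALUES (37, '2026-04-18T16:35:35', 'OPEN')")
close_session(conn, 37, ending_xrp=100.0, ending_rlusd=50.0, ending_value=275.0)
print(conn.execute("SELECT status, ended_at IS NOT NULL "
                   "FROM sessions WHERE session_id = 37").fetchone())  # → ('CLOSED', 1)
```

If the same replay fails against a copy of the live DB, the exception it raises is the answer to question 1.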
Fix required:
- Change log.debug → log.error at minimum so failures are visible
- Determine and fix root cause of the write failure
- Consider a recovery pass to backfill ended_at for historical sessions from log timestamps
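The first fix bullet is mechanical. A minimal sketch of the hardened handler — `shutdown_close` is a hypothetical stand-in for the `_shutdown()` call site, and `exc_info=True` is standard `logging` for preserving the traceback the original `log.debug` call was discarding:

```python
import logging

log = logging.getLogger("shutdown")

def shutdown_close(state, **closing_values):
    """Close the session, surfacing failures instead of swallowing them."""
    try:
        state.close_session(**closing_values)
    except Exception as exc:
        # log.error is emitted at default log levels, so the failure is
        # visible in normal terminal output; exc_info keeps the traceback.
        log.error("Session close failed — continuing shutdown",
                  exc_info=True, extra={"error": str(exc)})
```

With `log.error`, a broken `close_session` surfaces on the first run instead of after 37 silent sessions.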
Issue 2 — Live DB Malformed After S38¶
Evidence:
import sqlite3

conn = sqlite3.connect('neo_live_stage1.db')
cur = conn.cursor()
cur.execute('PRAGMA integrity_check')
# → sqlite3.DatabaseError: database disk image is malformed
File state after S38 stopped:
neo_live_stage1.db 9,781,248 bytes last modified: Apr 17 20:19 ← stale
neo_live_stage1.db-shm 32,768 bytes last modified: Apr 18 15:55
neo_live_stage1.db-wal 0 bytes last modified: Apr 18 15:55
The WAL is 0 bytes but the SHM is 32KB. The main DB file timestamp is from Apr 17 — before S37 and S38 ran — which means writes during S37 and S38 never got checkpointed into the main DB file. When the WAL was cleared (or lost), those writes were gone.
Most likely cause: Terminal was closed via the X button (CTRL_CLOSE_EVENT) rather than Ctrl+C. The code explicitly notes this in run_paper_session.py:
# CTRL_CLOSE_EVENT (terminal-close) is NOT caught by Python's signal module;
# the pre-run backup is the defense for that exit path.
When the terminal closes this way, the process is killed mid-transaction. The WAL gets truncated or left in an inconsistent state. This corrupts the DB.
Katja confirmed: she did NOT see the [FLAG-027] SIGINT received — shutting down cleanly... message before the engine stopped. This is consistent with a non-signal exit (terminal close or process kill).
What to investigate:
1. Confirm the DB-corruption mechanism — is this specifically the CTRL_CLOSE_EVENT path?
2. The pre-run backup IS the defense for this. Verify the SQLite backup API correctly captures WAL state so no data is lost on restore.
3. Consider whether additional hardening is possible: e.g., periodic WAL checkpoint (PRAGMA wal_checkpoint(TRUNCATE)) during the session to reduce exposure window.
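Both points 2 and 3 can be exercised on scratch files: Python's `sqlite3.Connection.backup` reads the source DB through its WAL, so a backup taken mid-session captures committed-but-uncheckpointed writes, and `PRAGMA wal_checkpoint(TRUNCATE)` flushes the WAL into the main file and truncates it. A minimal sketch (scratch DB, not the live file):

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "live.db")

src = sqlite3.connect(src_path)
src.execute("PRAGMA journal_mode=WAL")
src.execute("CREATE TABLE t (v INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()  # committed, but may still live only in live.db-wal

# 1) The backup API sees WAL contents even before any checkpoint,
#    so a pre-run backup is complete.
dst = sqlite3.connect(os.path.join(tmp, "backup.db"))
src.backup(dst)
print(dst.execute("SELECT v FROM t").fetchone())  # → (42,)

# 2) A periodic checkpoint shrinks the exposure window: after TRUNCATE,
#    everything is in the main file and the WAL is empty.
src.execute("PRAGMA wal_checkpoint(TRUNCATE)")
print(os.path.getsize(src_path + "-wal"))  # → 0
```

Running the checkpoint on a timer (or every N fills) would bound how much data a hard kill can take with it.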
Current DB State¶
Live DB: neo_live_stage1.db — MALFORMED. Do not run the engine against it.
Best backup: neo_live_stage1.db.bak.20260418T192119Z
- Integrity check: OK
- Created: 2026-04-18T19:21 UTC (right before S38 started)
- Contains: S1–S37 complete (fills, orders, capital events)
- Missing: S38 session record only — S38 had 0 fills, so nothing of financial value is lost
- Orders: 2 active orders from S37 era (need cleanup on next startup)
Before Katja can run S39: restore the live DB from this backup. The engine's FLAG-033 startup integrity check will refuse to start against the malformed DB anyway.
Restore command (run in neo-2026 directory):
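The exact command did not make it into this draft. Rather than guess at it, here is a minimal Python sketch of the restore sequence under the assumptions above (plain file-copy restore; stale -wal/-shm sidecars from the crash must not be carried over). The scratch-file demo stands in for the real filenames:

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

def restore(live: str, backup: str) -> tuple:
    """Quarantine the malformed DB, drop stale sidecars, copy the
    backup into place, and return the integrity_check result."""
    live_p = Path(live)
    if live_p.exists():
        live_p.rename(live + ".malformed")  # keep the evidence
    for sidecar in (live + "-wal", live + "-shm"):
        Path(sidecar).unlink(missing_ok=True)
    shutil.copy2(backup, live)
    return sqlite3.connect(live).execute("PRAGMA integrity_check").fetchone()

# Demo on scratch files; in neo-2026 the arguments would be
# "neo_live_stage1.db" and "neo_live_stage1.db.bak.20260418T192119Z".
tmp = tempfile.mkdtemp()
bak = str(Path(tmp) / "good.bak")
seed = sqlite3.connect(bak)
seed.execute("CREATE TABLE t (v INTEGER)")
seed.commit()
seed.close()
print(restore(str(Path(tmp) / "live.db"), bak))  # → ('ok',)
```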
Summary of Audit Scope¶
| Issue | Severity | Since | Root cause confirmed? |
|---|---|---|---|
| ended_at never written — all 37 sessions | CRITICAL | S1 | No — needs investigation |
| log.debug silently hides close_session failures | HIGH | S1 | Yes |
| DB corruption on terminal-close (non-signal exit) | HIGH | S38 | Likely yes |
Suggested priority:
1. Fix ended_at write failure (root cause + fix)
2. Change silent log.debug → log.error for shutdown failures
3. Recommend WAL checkpoint cadence or other hardening
4. Confirm backup restore is safe and complete
No new session should run until Issue 1 root cause is identified. We have been operating with broken session accounting since the beginning.
— Vesper