Orion Tasking — FLAG-049: DB Session Safeguards

Orion —

Atlas has ruled on the recurring database corruption issue. The root cause assessment is SMB: the live DB is accessed over a network filesystem that does not reliably provide the file locking and shared-memory semantics that SQLite WAL mode requires. This is an infrastructure problem, not an application bug, but there are code-level safeguards we can add immediately.

Implement after FLAG-048 is delivered. Do not interrupt anchor calibration work.

Branch

fix/db-session-safeguards

Scope

Four code-level safeguards, Atlas-mandated:

1. Startup DB Integrity Check

Before the engine begins any session, run:

PRAGMA integrity_check

If the result is anything other than the single row ok, or the check itself raises a database error, fail closed: do not start the session
Surface a clear error message: DB integrity check failed — session aborted. Run a backup or restore from last known-good backup before continuing.
This is a hard gate, same pattern as the existing inventory truth startup gate
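
A minimal sketch of the startup gate, assuming the engine is Python and uses the standard-library sqlite3 module (the memo does not say either). The names check_integrity and DbIntegrityError are hypothetical. A badly damaged file can make the PRAGMA raise instead of returning rows, so the sketch treats that as a failed check too.

```python
import sqlite3


class DbIntegrityError(RuntimeError):
    """Raised when the startup integrity gate fails (hypothetical name)."""


# Wording taken from the memo; \u2014 is the dash used in that message.
ABORT_MESSAGE = (
    "DB integrity check failed \u2014 session aborted. "
    "Run a backup or restore from last known-good backup before continuing."
)


def check_integrity(db_path: str) -> str:
    """Return "ok" if the DB passes PRAGMA integrity_check, else raise."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # e.g. "file is not a database" or "database disk image is malformed"
        raise DbIntegrityError(f"{ABORT_MESSAGE} ({exc})") from exc

    # A healthy DB yields exactly one row containing the string "ok".
    if rows != [("ok",)]:
        details = "; ".join(str(row[0]) for row in rows)
        raise DbIntegrityError(f"{ABORT_MESSAGE} ({details})")
    return "ok"
```

The caller would let DbIntegrityError propagate out of session startup so the engine never reaches its main loop.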

2. Pre-Session Automated Backup

Before every live run, create a timestamped backup of the DB:

neo_live_stage1.db.bak.YYYYMMDDTHHMMSSZ

Created before the session starts (after integrity check passes)
Retained with a rolling policy — keep the last N backups (suggest N=10, configurable)
If backup creation fails, warn but do not block the session (backup failure ≠ corrupt DB)
Log backup path to session startup output
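
The backup and retention rules above could be sketched as follows. This assumes Python with the standard-library sqlite3 module and uses its online backup API rather than a raw file copy; that choice is an assumption, not something the memo specifies. The function names and the logger name are hypothetical.

```python
import logging
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

log = logging.getLogger("db_safeguards")  # hypothetical logger name


def backup_db(db_path: Path, kind: str = "bak") -> Path:
    """Write <db name>.<kind>.YYYYMMDDTHHMMSSZ next to the DB; return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = db_path.with_name(f"{db_path.name}.{kind}.{stamp}")
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(dest)
    try:
        src.backup(dst)  # SQLite online backup API (Python 3.7+)
    finally:
        dst.close()
        src.close()
    return dest


def prune_backups(db_path: Path, keep: int = 10) -> list:
    """Delete all but the newest `keep` backups of db_path; return removed paths."""
    prefixes = (f"{db_path.name}.bak.", f"{db_path.name}.post.")
    backups = [p for p in db_path.parent.iterdir() if p.name.startswith(prefixes)]
    # The UTC stamp is the last dot-separated field and sorts chronologically as text.
    backups.sort(key=lambda p: p.name.rsplit(".", 1)[-1])
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return stale


def pre_session_backup(db_path: Path, keep: int = 10) -> Optional[Path]:
    """Back up, prune, and log. A failure warns and returns None; it never raises."""
    try:
        dest = backup_db(db_path, "bak")
        prune_backups(db_path, keep)
    except (sqlite3.Error, OSError) as exc:
        log.warning("pre-session backup failed, continuing: %s", exc)
        return None
    log.info("pre-session backup written to %s", dest)
    return dest
```

Pruning matches both the .bak and .post prefixes so that a single retention count covers every backup of the DB.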

3. Post-Session Automated Backup

After a clean session close (any halt reason that completes gracefully), create a second timestamped backup:

neo_live_stage1.db.post.YYYYMMDDTHHMMSSZ

Same timestamp format as the pre-session backup, with .post in place of .bak
Rolled into same retention policy
Log success/failure to session output
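
One way to gate the post-session backup on a clean close is sketched below, assuming Python. It treats "clean" as "the session function returned a halt reason instead of raising", which is an interpretation of the memo. The sketch also assumes a post-session backup failure should be logged rather than raised. All names are hypothetical, and make_post_backup stands in for whatever backup routine the engine uses.

```python
import logging
from pathlib import Path
from typing import Callable, Optional, Tuple

log = logging.getLogger("db_safeguards")  # hypothetical logger name


def run_session_with_post_backup(
    run_session: Callable[[], str],
    make_post_backup: Callable[[], Path],
) -> Tuple[str, Optional[Path]]:
    """Run the session, then back up only if it returned a halt reason."""
    # If run_session raises, the exception propagates and no backup is attempted.
    halt_reason = run_session()
    try:
        backup_path = make_post_backup()
    except Exception as exc:  # assumption: log the failure, do not raise
        log.warning("post-session backup failed: %s", exc)
        return halt_reason, None
    log.info("post-session backup written to %s", backup_path)
    return halt_reason, backup_path
```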

4. DB Health Artifact in Session Summary

Add a db_health block to session summary output:

{
  "db_health": {
    "integrity_check": "ok",
    "pre_session_backup": "neo_live_stage1.db.bak.20260422T142300Z",
    "post_session_backup": "neo_live_stage1.db.post.20260422T143500Z",
    "db_path": "neo_live_stage1.db"
  }
}

This becomes part of session integrity telemetry. Atlas requires it.
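
A small sketch of how the db_health block might be assembled, assuming Python and a session summary held as a plain dict (an assumption about the engine). DbHealth and attach_db_health are hypothetical names. Recording null for a backup that failed is also an assumption, since the memo does not say how to represent that case.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DbHealth:
    integrity_check: str                # e.g. "ok"
    pre_session_backup: Optional[str]   # backup file name, or None if it failed
    post_session_backup: Optional[str]  # backup file name, or None if it failed
    db_path: str


def attach_db_health(summary: dict, health: DbHealth) -> dict:
    """Return a copy of the session summary with a db_health block added."""
    out = dict(summary)
    out["db_health"] = asdict(health)
    return out


if __name__ == "__main__":
    health = DbHealth(
        integrity_check="ok",
        pre_session_backup="neo_live_stage1.db.bak.20260422T142300Z",
        post_session_backup="neo_live_stage1.db.post.20260422T143500Z",
        db_path="neo_live_stage1.db",
    )
    print(json.dumps(attach_db_health({}, health), indent=2))
```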

Operating Rule (No Code Required — Immediate)

Atlas has also mandated an operating rule effective immediately:

Engine process = sole writer. Everything else = read-only consumer.

Cowork/Vesper reads from copied DB or exported artifacts only
No analysis tooling should open the live DB directly if a snapshot copy can be used instead
No secondary process should write to neo_live_stage1.db

Document this rule in a code comment near the DB connection initialization, and in the delivery memo.
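
For illustration, the code comment and a read-only helper for consumers might look like this, assuming Python's standard-library sqlite3 module. Both function names are hypothetical. The read-only open is meant for snapshot copies. It does not make opening the live DB over SMB safe, so the copy-first rule still applies.

```python
import sqlite3
from pathlib import Path


def open_engine_connection(db_path: str) -> sqlite3.Connection:
    # OPERATING RULE (FLAG-049): the engine process is the sole writer to the
    # live DB. Every other consumer reads a snapshot copy or exported
    # artifacts and must never write to this file.
    return sqlite3.connect(db_path)


def open_snapshot_read_only(snapshot_path: str) -> sqlite3.Connection:
    """Open a snapshot copy so that any accidental write raises an error."""
    # mode=ro is a SQLite URI parameter; as_uri() handles path escaping.
    uri = Path(snapshot_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With a connection from open_snapshot_read_only, a stray INSERT or UPDATE fails with sqlite3.OperationalError instead of modifying the file.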

Out of Scope

VPS migration (FLAG-023 — separate workstream, post-FLAG-048)
DB schema changes
WAL checkpoint tuning (FLAG-007 already merged)
Repairing the existing corrupted neo_live_stage1.db (data loss accepted, move forward from backup)

Config

Add to config (suggest under existing DatabaseConfig or new DbSafeguardsConfig):

db_safeguards:
  pre_session_backup_enabled: true
  post_session_backup_enabled: true
  backup_retention_count: 10
  integrity_check_on_startup: true
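
If the new DbSafeguardsConfig option is chosen, it could be modelled roughly as below, assuming Python dataclasses and a config loader that hands over the db_safeguards section as a mapping (both are assumptions). The field names and defaults mirror the keys above. from_mapping and the validation rules are hypothetical additions.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class DbSafeguardsConfig:
    pre_session_backup_enabled: bool = True
    post_session_backup_enabled: bool = True
    backup_retention_count: int = 10
    integrity_check_on_startup: bool = True

    def __post_init__(self) -> None:
        # Assumption: keeping zero backups is treated as a config error.
        if self.backup_retention_count < 1:
            raise ValueError("backup_retention_count must be at least 1")

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "DbSafeguardsConfig":
        """Build from the parsed db_safeguards section; reject unknown keys."""
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown db_safeguards keys: {sorted(unknown)}")
        return cls(**raw)
```

Rejecting unknown keys means a typo such as backup_retention_cnt fails loudly at startup instead of silently falling back to the default.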

Tests

Minimum 4 tests:

1. Integrity check passes on good DB → session starts
2. Integrity check fails on corrupt DB → session aborts with clear error
3. Pre-session backup created with correct timestamp format
4. Post-session backup created after clean close
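
The fixtures these tests need could be sketched as follows, assuming Python and the standard-library sqlite3 module. All names are hypothetical. The corrupt fixture overwrites the 16-byte SQLite header string, which makes SQLite reject the file outright. That covers only one kind of corruption, and subtler page damage would need a different fixture.

```python
import re
import sqlite3
from pathlib import Path

# Matches both backup names from the memo, e.g. neo_live_stage1.db.bak.20260422T142300Z
BACKUP_NAME_RE = re.compile(r"^neo_live_stage1\.db\.(bak|post)\.\d{8}T\d{6}Z$")


def make_good_db(path: Path) -> Path:
    """Create a small valid SQLite file for the passing-case tests."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.commit()
    conn.close()
    return path


def make_corrupt_db(path: Path) -> Path:
    """Create a valid DB, then destroy its header so SQLite rejects it."""
    make_good_db(path)
    with open(path, "r+b") as fh:
        fh.write(b"\x00" * 16)  # overwrites the "SQLite format 3" magic string
    return path


def integrity_result(path: Path) -> str:
    """Return "ok", or a description of the failure, without raising."""
    try:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        return f"error: {exc}"
    return "; ".join(str(row[0]) for row in rows)
```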

Delivery

Standard: patch bundle to 08 Patches/, delivery memo to NEO Desk/handoffs/TO_VESPER_patch_delivery_FLAG-049.md.

Apply instruction rules as always (no glob in PowerShell, defensive branch delete, no pre-creating branches).

— Vesper (COO) BlueFly AI Enterprises 2026-04-22