[C] Orion Pre-Live Replay Report: FLAG-048
To: Vesper
From: Orion
Branch: feat/anchor-dual-signal-calibration
Date: 2026-04-21
Status: Replay complete with caveats — requesting sign-off to lift the session hold OR guidance on what additional replay you require.
Executive summary
The dual-signal calculator was replayed against real per-tick data from sessions 50/51/52 (CLAUDE.md labels: S47/S48/S49), read from the latest clean DB backup. The entire replay ran through a real StateManager(":memory:") instance (no mocks), exercising the cross-session persistence path end-to-end.
Two independent findings surfaced that need your decision before we call this a clean pre-live:
- Data gap: session 53 (S50) is not recoverable. The live DB at `mnt/neo-2026/neo_live_stage1.db` is malformed on disk. Integrity check fails on both the mounted copy and Katja's direct upload (byte-identical files). This is the exact SMB/WAL pathology Atlas's 2026-04-22 DB-Reliability ruling called out. The latest clean backup is `T165223Z`, which predates session 53; no other snapshot contains it. Replay therefore proceeded on sessions 50/51/52 only.
- Data caveat: pre-FLAG-048 schemas do not persist uncapped structural. The per-tick signal available before this branch is `system_metrics.anchor_error_bps`, which the existing anchor cap already clamps to ±10 bps. For replay purposes this value was fed to `AnchorDualSignalCalculator.observe()` as a stand-in for `structural_basis_bps`. What this replay can validate: EMA convergence, warm-up gating, cross-session persistence, rail-lock behavior, and exit reachability under saturated input. What it cannot validate: residual behavior when uncapped structural moves outside ±10 bps, because no such data exists in the historical record.
If you want replay against truly uncapped structural, the only path is to merge C1's schema migration, run one short live tick loop to capture real structural_basis_bps, then replay. We're in a chicken-and-egg situation until then.
Replay configuration
| Setting | Value |
|---|---|
| Data source | neo_live_stage1.db.bak.20260421T165223Z (integrity_check = ok) |
| Read mode | mode=ro&immutable=1 via URI (no writes, no WAL materialization) |
| Sessions replayed | 50, 51, 52 (S47/S48/S49 per CLAUDE.md) |
| Sessions skipped | 53 (S50) — live DB malformed, not in any readable snapshot |
| Config | default AnchorDualSignalConfig: ema_window=150, warmup=50, hyst=20, stale_h=24.0 |
| StateManager | real StateManager(":memory:") with initialize_database() — no mocks |
| Calculator | AnchorDualSignalCalculator observed against each session in order, baseline dumped at session end via dump_state(), persisted to engine_state table, restored at next session start via seed_baseline() |
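The read mode in the table maps directly onto URI support in Python's standard `sqlite3` module. The sketch below is illustrative and is not the replay script's actual code. `open_snapshot_readonly` is a hypothetical helper name. It opens a file with `mode=ro&immutable=1`, runs `PRAGMA integrity_check`, and shows that a write is refused.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_snapshot_readonly(db_path) -> sqlite3.Connection:
    """Open a SQLite file through a read-only, immutable URI (hypothetical helper).

    mode=ro makes SQLite refuse writes; immutable=1 tells it the file cannot
    change, so it skips file locking and change detection. That is only safe
    on a copy that no other process is writing to.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)


def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "snapshot.db"
        # Build a throwaway database to stand in for the backup file.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE engine_state (key TEXT PRIMARY KEY, value TEXT)")
        setup.commit()
        setup.close()

        conn = open_snapshot_readonly(path)
        try:
            # A healthy file returns the single row ('ok',).
            print(conn.execute("PRAGMA integrity_check").fetchone()[0])
            conn.execute("INSERT INTO engine_state VALUES ('k', 'v')")
        except sqlite3.OperationalError as exc:
            print("write refused:", exc)
        finally:
            conn.close()


if __name__ == "__main__":
    demo()
```

When run directly, the script prints `ok` followed by the write-refusal message. Pointing the same helper at the backup path gives the no-writes, no-locking behavior the table describes.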
Results per session
Session 50 (S47): 32 ticks, degraded_episode_limit_halt
- Started cold. Samples accumulated 0 → 32. The warm-up threshold is 50, so the calculator correctly returned `baseline=None, residual=None` for every tick.
- At session close, `dump_state() → (baseline=-8.3164 bps, samples=32)`, written to `engine_state` under `anchor_ds.basis_baseline_bps`, `anchor_ds.basis_baseline_samples`, and `anchor_ds.basis_baseline_closed_at`.
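The persistence step above (dump at session close, three `engine_state` keys, restore at the next session start) can be sketched with plain `sqlite3`. This sketch bypasses StateManager and assumes a two-column `engine_state(key, value)` schema, which the report does not specify. The function names are hypothetical. The restore side also applies the 24 h staleness cutoff (`stale_h=24.0`) from the replay config and returns `None`, so the calculator starts cold.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Key names come from the report; the key/value TEXT schema is an assumption.
KEY_BPS = "anchor_ds.basis_baseline_bps"
KEY_SAMPLES = "anchor_ds.basis_baseline_samples"
KEY_CLOSED_AT = "anchor_ds.basis_baseline_closed_at"


def persist_baseline(conn: sqlite3.Connection, baseline_bps: float,
                     samples: int, closed_at: datetime) -> None:
    """Write the dumped (baseline, samples) plus a session-close timestamp."""
    rows = [
        (KEY_BPS, repr(baseline_bps)),  # repr round-trips a float exactly
        (KEY_SAMPLES, str(samples)),
        (KEY_CLOSED_AT, closed_at.isoformat()),
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO engine_state (key, value) VALUES (?, ?)", rows)
    conn.commit()


def restore_baseline(conn: sqlite3.Connection, now: datetime,
                     stale_h: float = 24.0) -> Optional[Tuple[float, int]]:
    """Return (baseline_bps, samples) to seed with, or None to start cold."""
    state = dict(conn.execute(
        "SELECT key, value FROM engine_state WHERE key IN (?, ?, ?)",
        (KEY_BPS, KEY_SAMPLES, KEY_CLOSED_AT)).fetchall())
    if len(state) < 3:
        return None  # nothing persisted yet
    closed_at = datetime.fromisoformat(state[KEY_CLOSED_AT])
    if now - closed_at > timedelta(hours=stale_h):
        return None  # stale baseline is discarded
    return float(state[KEY_BPS]), int(state[KEY_SAMPLES])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE engine_state (key TEXT PRIMARY KEY, value TEXT)")
    closed = datetime.now(timezone.utc)
    persist_baseline(conn, -8.3164, 32, closed)                  # Session 50 dump
    print(restore_baseline(conn, closed + timedelta(hours=1)))   # fresh: seeds
    print(restore_baseline(conn, closed + timedelta(hours=25)))  # stale: None
```

The first restore returns `(-8.3164, 32)`, which is the Session 50 dump. The second returns `None` because by then the baseline is more than 24 h old.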
Session 51 (S48): 171 ticks, inventory_truth_halt
- Restored `(baseline=-8.3164, samples=32)` from `engine_state`; `seed_baseline()` accepted it.
- Needed 18 additional ticks to cross warm-up (32 → 50). At tick 18, `is_warm` flipped True and residual/baseline began emitting.
- Structural ranged from -8.07 to +10.00 across the session (18.07 bps span).
- Residual trajectory over the 154 warm ticks (ticks 18 through 171): `mean_abs=6.19 bps`, `max_abs=13.27 bps`. This is real signal, not stuck at the rail.
- Exit-reachability witness (tick 171): structural dropped from +10.00 to +4.58, and the residual on the same tick was +0.08 bps. That is well below the 5-bps residual threshold that would drive the anchor guard to exit. It confirms that the new control signal reacts within a single tick when structural returns to baseline, which is precisely what was not reachable under capped anchor_error.
- At session close, `dump_state() → (baseline=+4.5030 bps, samples=203)`.
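A minimal stand-in for the calculator makes the warm-up arithmetic in Sessions 50 and 51 concrete. It mirrors the method names the report mentions (`observe`, `dump_state`, `seed_baseline`, `is_warm`). However, `MiniDualSignal` is hypothetical, and its internals are assumptions rather than the branch's `AnchorDualSignalCalculator`:

- the EMA weight is taken as 2 / (ema_window + 1);
- the first sample seeds the baseline;
- the residual is read after the EMA absorbs each tick;
- None/NaN input raises `ValueError`.

```python
import math
from typing import Optional, Tuple


class MiniDualSignal:
    """Hypothetical stand-in for the dual-signal calculator (not the branch code).

    Assumptions: alpha = 2 / (ema_window + 1); the first sample seeds the
    baseline; residual = structural - baseline, read after the EMA absorbs the
    current tick; outputs stay None until samples >= warmup.
    """

    def __init__(self, ema_window: int = 150, warmup: int = 50) -> None:
        self.alpha = 2.0 / (ema_window + 1)
        self.warmup = warmup
        self.baseline: Optional[float] = None
        self.samples = 0

    @property
    def is_warm(self) -> bool:
        return self.samples >= self.warmup

    def seed_baseline(self, baseline: float, samples: int) -> None:
        """Restore a persisted (baseline, samples) pair at session start."""
        self.baseline, self.samples = baseline, samples

    def dump_state(self) -> Tuple[Optional[float], int]:
        """Export (baseline, samples) for persistence at session close."""
        return self.baseline, self.samples

    def observe(self, structural_bps: Optional[float]) -> Tuple[Optional[float], Optional[float]]:
        """Feed one tick; return (baseline, residual), or (None, None) while cold."""
        if structural_bps is None or not math.isfinite(structural_bps):
            raise ValueError("structural_basis_bps must be a finite number")
        if self.baseline is None:
            self.baseline = structural_bps  # first sample seeds the EMA
        else:
            self.baseline += self.alpha * (structural_bps - self.baseline)
        self.samples += 1
        if not self.is_warm:
            return None, None  # warm-up gating: nothing emitted yet
        return self.baseline, structural_bps - self.baseline


if __name__ == "__main__":
    calc = MiniDualSignal()
    calc.seed_baseline(-8.3164, 32)  # restored Session 50 dump
    for tick in range(1, 19):
        _, residual = calc.observe(-8.0)  # illustrative input, not replay data
        if residual is not None:
            print(f"warm at tick {tick}, samples={calc.samples}")
            break
```

When seeded at `samples=32`, the stand-in stays gated for 17 ticks and starts emitting on tick 18. This matches the 32 + 18 = 50 crossing in Session 51.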
Session 52 (S49): 38 ticks, degraded_episode_limit_halt, structural locked at +10 for all 38 ticks
- Restored `(baseline=+4.5030, samples=203)`. Already warm from tick 1.
- Structural constant at +10.00 every tick.
- Rail-lock witness: the residual started at +5.42 bps on tick 1 (10 − 4.58, where 4.58 is the baseline after tick 1's EMA update; the restored baseline itself was +4.50). It walked down to +3.31 bps by tick 38 as the EMA pulled the baseline from +4.58 to +6.69.
- Residual `mean_abs=4.28 bps`, `max_abs=5.42 bps` across the 38 ticks. If the cap persisted for another ~150 ticks, the residual would keep decaying geometrically toward 0, because an EMA approaches a constant input asymptotically. That is exactly the rail-lock property Atlas required: structural saturation does not produce a permanent residual signal.
- At session close, `dump_state() → (baseline=+6.6881 bps, samples=241)`.
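The Session 52 rail-lock numbers can be checked with a few lines of EMA arithmetic. This sketch rests on two assumptions that the source does not state:

- the EMA weight is 2 / (ema_window + 1);
- the residual is read after the EMA absorbs each tick.

Under those assumptions, the function below replays a structural input pinned at the +10 bps cap, starting from the restored +4.5030 baseline. It is a hypothetical helper and not part of the replay script.

```python
from typing import List, Tuple


def rail_lock_trace(baseline: float, cap: float, ticks: int,
                    ema_window: int = 150) -> Tuple[float, List[float]]:
    """EMA baseline and residuals for a structural input pinned at `cap`.

    Assumed (not confirmed by the source): alpha = 2 / (ema_window + 1),
    and the residual is read after the EMA absorbs the current tick.
    """
    alpha = 2.0 / (ema_window + 1)
    residuals = []
    for _ in range(ticks):
        baseline += alpha * (cap - baseline)  # EMA pulled up toward the rail
        residuals.append(cap - baseline)      # shrinks by (1 - alpha) per tick
    return baseline, residuals


final_baseline, res = rail_lock_trace(baseline=4.5030, cap=10.0, ticks=38)
mean_abs = sum(abs(r) for r in res) / len(res)
print(f"first={res[0]:.2f} last={res[-1]:.2f} baseline={final_baseline:.4f}")
print(f"mean_abs={mean_abs:.2f} max_abs={max(abs(r) for r in res):.2f}")
```

With these inputs the script prints:

- a first residual of 5.42 and a last residual of 3.31;
- a final baseline of 6.6881;
- `mean_abs` 4.28 and `max_abs` 5.42.

These match the Session 52 figures, which is why the two assumptions look plausible. They also explain the 4.58 figure: it is the baseline after the first tick's update (4.5030 + α × 5.497 ≈ 4.576), not the restored value. Running 150 more ticks (`ticks=188`) leaves a residual of roughly 0.45 bps. Because the decay is geometric, the residual approaches 0 without reaching it.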
Property checks
| Property | Check | Result |
|---|---|---|
| Warm-up gating | Session 50 (32 ticks, never warm) emits residual=None, baseline=None throughout | ✅ PASS |
| Warm-up cross-session continuity | Session 51 was seeded at samples=32, crossed warm at tick 18 (32+18=50) | ✅ PASS |
| Cross-session persistence round-trip | dump_state → set_engine_state → get_engine_state → seed_baseline exercised against real StateManager between 50→51 and 51→52; restored values matched | ✅ PASS |
| Exit reachability (Atlas Q4) | Session 51 tick 171: structural fell from +10 to +4.58 → residual = +0.08 bps | ✅ PASS |
| Rail-lock under saturation (Atlas Q5) | Session 52 (38 ticks @ +10): residual drifted 5.42 → 3.31 (monotone toward 0 via EMA pull-up) | ✅ DIRECTIONAL PASS (residual keeps decaying geometrically toward 0 as the 150-tick EMA absorbs the saturated input; 38 ticks show the right sign and slope) |
| Baseline staleness (Atlas 24 h cutoff) | Not exercised by this replay: all three sessions fall inside the backup's 24 h window, so the cutoff never triggers. Unit test coverage lives in test_flag_048_dual_signal.py :: TestCrossSessionPersistenceDbRoundTrip.test_stale_baseline_discarded | ⚠ Test-only (no replay witness) |
| Rejection of None/NaN input | Not exercised by this replay (data is clean). Unit test coverage: test_flag_048_dual_signal.py :: TestStructuralBasisUncapped | ⚠ Test-only (no replay witness) |
| Uncapped structural residual behavior | Cannot be exercised from pre-FLAG-048 data — column did not exist | ❌ NOT EXERCISED — flagged for post-merge replay |
DB reliability note (Atlas ruling follow-through)
This exercise hit Atlas's exact predicted failure mode:
- `mnt/neo-2026/neo_live_stage1.db` → `disk I/O error` (FUSE/SMB locking)
- `mnt/uploads/neo_live_stage1.db` (direct upload, byte-identical) → `database disk image is malformed`
- Fallback to the latest clean backup `T165223Z` → `integrity_check = ok`
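The fallback chain above can be expressed as a loop over candidate files that keeps the first one whose `PRAGMA integrity_check` returns `ok`. This is a sketch, not the procedure actually run for this report, and `first_clean_snapshot` is a hypothetical name. In Python's `sqlite3`, both observed failures surface as `sqlite3.DatabaseError` or a subclass of it, so one `except` clause covers them:

- `disk I/O error` is raised as `OperationalError`;
- `database disk image is malformed` is raised as `DatabaseError`.

Each file is opened read-only through an immutable URI. SQLite only guarantees correct results with that flag on files that nothing else is writing.

```python
import sqlite3
from pathlib import Path
from typing import Iterable, Optional


def first_clean_snapshot(candidates: Iterable[str]) -> Optional[str]:
    """Return the first path whose integrity_check is 'ok', else None."""
    for path in candidates:
        p = Path(path)
        if not p.is_file():
            continue  # skip paths that do not exist
        uri = p.resolve().as_uri() + "?mode=ro&immutable=1"
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                rows = conn.execute("PRAGMA integrity_check").fetchall()
            finally:
                conn.close()
        except sqlite3.DatabaseError:
            continue  # e.g. disk I/O error, malformed image, not a database
        if rows == [("ok",)]:
            return path
    return None


if __name__ == "__main__":
    # Candidate order from the report: live mount, direct upload, latest backup.
    print(first_clean_snapshot([
        "mnt/neo-2026/neo_live_stage1.db",
        "mnt/uploads/neo_live_stage1.db",
        "mnt/neo-2026/neo_live_stage1.db.bak.20260421T165223Z",
    ]))
```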
FLAG-049 (DB-SESSION-SAFEGUARDS — integrity check + automated backups + write-access enforcement) is exactly right; the backup cadence just saved this replay.
For this session I followed the "analysis reads from copies" rule: the live file was not opened for writes, and the snapshot copy sits in the sandbox with chmod 444. No changes were made to any file on the live mount.
What I need from you
Three options, pick one:
(A) Accept this replay as sufficient for the pre-live gate. Every property the replay exercises passes (rail-lock directionally). The remaining gaps (stale baseline, NaN input, uncapped structural) are covered by unit tests in the C5 suite. Under this option you sign off and Katja lifts the session hold.
(B) Require a post-merge live capture before sign-off. Merge the branch, start one short live tick loop (≤10 minutes) to capture real structural_basis_bps data via C1's new column, then I replay again against that. This is the only path to a replay that exercises uncapped structural.
(C) Recover session 53 first. I have not attempted aggressive recovery of the malformed live DB: the `sqlite3` CLI isn't installed in this sandbox, and `.recover` may or may not extract session 53 cleanly. I can try if you think it's necessary. However, Atlas's ruling said analysis should read from copies, and no clean copy containing session 53 exists, so this may not yield anything.
My recommendation: Option A, with Option B queued as a follow-on replay after the first post-merge session. The replay above demonstrates both Atlas-critical properties (Q4 exit reachability, and Q5 rail-lock directionally) at real per-tick cadence, using a real StateManager round-trip. The remaining capped-vs-uncapped gap is a data gap, not a calculator-correctness gap.
Artifacts
| File | Location |
|---|---|
| Replay script | 03 Branches/feat-anchor-dual-signal-calibration/flag048_prelive_replay.py |
| Raw replay output | sandbox: /sessions/peaceful-admiring-allen/work/flag048-replay/replay_output.txt (mirrors the numbers in the tables above) |
| Backup used | mnt/neo-2026/neo_live_stage1.db.bak.20260421T165223Z (read-only, immutable URI) |
No live-mount writes occurred during this replay.
Standing by.
— Orion