NEO Trading Engine — Open Flags

Flags are observations noted during live sessions that are not immediate action items but should be reviewed before the next decision point.


Active Flags

FLAG-054 — DRIFT-GUARD-C-CALIBRATION ✅ CLOSED

Status: CLOSED ✅ — merged 2026-04-22
Filed: 2026-04-22
Atlas ruling: 2026-04-22 — Option D approved (temporary retirement behind config flag)

Root cause: Condition C (no_opposing_fill_ticks=15) fired in every session after a buy fill in a negative CLOB-AMM regime. 6/6 DEGRADED escalations were condition C; 0 genuine toxic-flow catches across all sessions. Condition C was functioning as a false-positive termination mechanism, not a protection layer.

Fix (Vesper, commit 9d7825d):
- drift_condition_c_enabled: bool = True added to DirectionalDriftGuardConfig
- and cfg.drift_condition_c_enabled gate added to the condition C firing check in main_loop.py
- drift_condition_c_enabled: false set in config_live_stage1.yaml
- load_config wired; 4 new tests (4 pass / 2 skip)
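The shape of the gate can be pictured as follows (illustrative sketch: field names come from the flag, but the dataclass layout and the helper function are assumptions, not the engine's actual code):

```python
from dataclasses import dataclass

@dataclass
class DirectionalDriftGuardConfig:
    # Names from the flag; other real fields omitted.
    no_opposing_fill_ticks: int = 15
    drift_condition_c_enabled: bool = True  # set to false in config_live_stage1.yaml

def condition_c_fires(cfg: DirectionalDriftGuardConfig, ticks_since_opposing_fill: int) -> bool:
    # The fix prepends the `cfg.drift_condition_c_enabled and` gate to the
    # pre-existing tick-count check, so condition C can be retired by config
    # while the machinery stays in place for a regime-aware redesign.
    return cfg.drift_condition_c_enabled and ticks_since_opposing_fill >= cfg.no_opposing_fill_ticks
```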

Post-fix status: Condition C machinery preserved for future regime-aware redesign (Option B). Re-enable requires Atlas approval. S58 validation run in progress.


FLAG-053 — EXIT-EVALUATOR-LOCKOUT ✅ CLOSED

Status: CLOSED ✅ — merged 2026-04-22
Filed: 2026-04-22
Atlas ruling: 2026-04-22 — option (iii) approved (Option A: preview residual window + Option B: structural early-exit + sign-convention standardization)

Root cause (Orion investigation):
- Q1 EMA deadlock hypothesis REFUTED — observe() runs unconditionally; no ANCHOR_IDLE gating.
- Q2 REAL BUG — _select_anchor_guard_window() falls back to the legacy capped window (±10 bps) for the first 70 ticks. In a cap-saturated regime, both exit predicates mathematically fail → 100-tick minimum to exit ANCHOR_IDLE.
- Q3 Dashboard sign mismatch NOT A BUG — three sign conventions describe the same regime from opposite sign perspectives. A legibility gap, not a calculation error.

Fix (Orion, 4 commits):
- Option A: preview residual window (residual_exit_preview_warmup_ticks=20, residual_exit_preview_lookback_ticks=10) unlocks exit ~40 ticks earlier. The for_exit=True kwarg ensures the entry path never sees the preview.
- Option B: structural early-exit path — raw last_structural_basis_bps, recovery_structural_early_exit_enabled=True, 30-tick stability counter.
- Sign standardization: last_anchor_divergence_bps recomputed from the canonical (mid − amm) / amm × 10000. Now matches structural_basis_bps and the tick log.
- Phase-separation invariant: A and B never evaluate the same tick. Source-label branching enforced.

Branch: fix/flag-053-anchor-idle-exit-lockout — 4 commits, 14 files, +1112/−198
Tests: 13 new tests (test_flag_053_anchor_idle_exit_lockout.py) + 43/43 adjacent suite green
Merged: main, 2026-04-22. Session hold lifted.

Post-merge ops:
- anchor_error_bps DB column now stores uncapped signed values (magnitudes may exceed ±10 bps)
- Dashboard _render_anchor reads anchor_divergence.* keys (old anchor.* keys no longer written)
- summarize_paper_run session line: spot-check on first post-merge session


FLAG-052 — Truth check fires before reconciler resolves CANCEL_RACE_UNKNOWN (timing race) ✅ CLOSED

Identified: 2026-04-22 (S54 post-mortem — initial diagnosis corrected after code review)
Resolved: 2026-04-22 — fix/flag-052-cancel-race-timer merged to main (no-ff merge, 2 files, +396 insertions)

Confirmed in production: S54 (session_id=54) — engine entered DEGRADED at ~02:04 UTC. _cancel_all_live_orders("Degraded entry cancel") ran. The buy order had been filled on-chain → XRPL returned tecNO_TARGET → FLAG-047 correctly called mark_cancel_race_unknown → order became CANCEL_RACE_UNKNOWN. BUT: the 60-second truth-check interval had elapsed; _maybe_run_periodic_truth_check fired at the TOP of the very next tick — BEFORE the reconciler (Step 5) could call get_account_tx_for_offer and record the fill. The truth check saw a +13.65 XRP / −19.5 RLUSD delta → inventory_truth_halt. The reconciler never ran that tick.

Initial diagnosis (WRONG): thought the DEGRADED cancel path lacked FLAG-047's tecNO_TARGET detection. Corrected after reading _cancel_all_live_orders — both the DEGRADED and shutdown paths share one function, with FLAG-047 already in place.

Actual root cause: tick ordering. _maybe_run_periodic_truth_check runs at tick START, before reconciliation (Step 5). A 60-second check interval aligning with the tick immediately after CANCEL_RACE_UNKNOWN creation means the truth check fires before the reconciler gets to resolve the race.

Fix (Vesper): in the else: branch of _cancel_all_live_orders (after mark_cancel_race_unknown succeeds), add self._last_truth_check_ts = time.time(). This gives the reconciler one full check_interval_s before the next truth check. On DB write failure the timer is NOT reset (correct — there is no fill to wait for).

Tests: 4 tests — TIMER_RESET_ON_CONFIRMED_RACE, TIMER_NOT_RESET_ON_DB_FAILURE, TIMER_NOT_RESET_ON_NORMAL_CANCEL, GRACE_WINDOW_DEFERS_TRUTH_CHECK (integration). 4/4 green. FLAG-047 adjacency 12/12 green. Full-suite baseline unchanged (357 pre-existing failures on main, same on branch).
Branch: fix/flag-052-cancel-race-timer

Why FLAG-047 tests didn't catch this: the FLAG-047 tests mocked/disabled the truth check. The timing race requires a real 60-second interval aligned to the post-CANCEL_RACE tick. That integration case was not covered — it now is, by GRACE_WINDOW_DEFERS_TRUTH_CHECK.

Status: CLOSED ✅ — merged Apr 22. S55 unblocked.
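The timer interaction reduces to a small invariant, sketched here as a toy model (the class, the zero initial timestamp, and the method names other than the fields cited in the flag are illustrative assumptions):

```python
class TruthCheckTimer:
    """Toy model of FLAG-052: the periodic truth check fires only once
    check_interval_s has elapsed since _last_truth_check_ts."""

    def __init__(self, check_interval_s: float = 60.0):
        self.check_interval_s = check_interval_s
        self._last_truth_check_ts = 0.0

    def truth_check_due(self, now: float) -> bool:
        return now - self._last_truth_check_ts >= self.check_interval_s

    def on_cancel_race_unknown(self, db_write_ok: bool, now: float) -> None:
        # The fix: after mark_cancel_race_unknown succeeds, push the timer
        # forward so the reconciler gets one full interval to resolve the race.
        # On DB write failure the timer is NOT reset (no fill to wait for).
        if db_write_ok:
            self._last_truth_check_ts = now
```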


FLAG-051 — Cross-session EMA staleness causes ANCHOR_IDLE lockout on regime shift

Identified: 2026-04-22 (S53 post-mortem)

Confirmed in production: S53 — engine entered ANCHOR_IDLE at tick ~5 and never exited. The EMA baseline persisted from the S49/S50/S52 sessions (~+10 bps, cap-locked). S53 structural at −5 bps → residual −15 bps → guard fires, can't exit.

Root cause: baseline_staleness_hours (24h) is a time-based check only. It does not detect structural regime shifts within the time window. A 15 bps regime shift within minutes of session end caused immediate ANCHOR_IDLE lockout with no exit path.

Fix: add baseline_regime_drift_threshold_bps (default 10.0 bps) to AnchorDualSignalConfig. On the first observation after a persistence restore, if abs(structural − persisted_baseline) > threshold, discard the baseline and cold-start the warm-up.

Files: neo_engine/config.py, neo_engine/dual_signal_calculator.py, config/config_live_stage1.yaml
Delivery: 03 Branches/fix-flag-051-regime-drift/ — 11 tests, all passing
Status: CLOSED ✅ — merged a5897cc (Apr 22), 13 tests green. S54 unblocked.
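The restore-time check is a one-liner; a minimal sketch (the function name is hypothetical, parameter names follow the flag):

```python
def should_discard_persisted_baseline(
    structural_bps: float,
    persisted_baseline_bps: float,
    baseline_regime_drift_threshold_bps: float = 10.0,
) -> bool:
    """On the first observation after a persistence restore, discard the
    cross-session EMA baseline if the structural basis has shifted further
    than the drift threshold, and cold-start the warm-up instead."""
    return abs(structural_bps - persisted_baseline_bps) > baseline_regime_drift_threshold_bps
```

In the S53 scenario (persisted ~+10 bps, structural −5 bps) the 15 bps gap exceeds the default threshold, so the stale baseline would be discarded instead of locking the engine out.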


FLAG-050 — Drift guard fill history not reset on ANCHOR_IDLE entry

Identified: 2026-04-22 (Orion Q2 answer, containment-fix delivery)

Confirmed in production: S52 (Apr 22) — engine blocked from quoting for the entire session post-sell-fill; the drift C counter carried into and across ANCHOR_IDLE and fired immediately when the engine attempted to resume.

Scope: _drift_ticks_since_opposing_fill is not cleared on ANCHOR_IDLE entry; it only resets on successful drift recovery. Pre-idle fill imbalance persists into the post-idle quoting window, causing an immediate C re-fire on ANCHOR_IDLE exit.

Atlas ruling (Apr 22): reset _drift_ticks_since_opposing_fill and any C-specific quoting-continuity state on ANCHOR_IDLE ENTRY (not exit). Preserve A/B state. Do not reset unrelated system state. Companion change: idle-sourced and active-sourced episodes must be split — idle escalations must not consume active safety budget. (See Atlas ruling.)

Orion tasked: NEO Desk/handoffs/TO_ORION_guard_architecture_fixes_post_atlas_ruling.md
Status: CLOSED ✅ — fix/anchor-idle-guard-semantics merged (Apr 22), commits b270083 + a69b0de + 3fff6b3. 72 tests green. S53 unblocked.


FLAG-049 — DB-SESSION-SAFEGUARDS: startup integrity check, automated backups, write-access enforcement

Identified: 2026-04-22 (Atlas ruling on recurring DB corruption pattern)

Scope: recurring SQLite WAL corruption under the local+SMB setup. Atlas ruled:
(1) add a PRAGMA integrity_check startup gate — fail closed on a suspect DB;
(2) mandatory timestamped pre-session backup before every live run;
(3) mandatory timestamped post-session backup after clean close;
(4) DB health artifact in the session summary (integrity result, backup paths, DB path);
(5) engine-only write-access rule — Cowork/Vesper reads from copies only.

Atlas ruling: 07 Agent Coordination/[C] Atlas Ruling — DB Reliability SMB Risk and VPS Migration Sequencing.md

Root cause assessment (Atlas): the SMB network filesystem does not correctly implement the POSIX file-locking semantics that SQLite WAL mode depends on. An infrastructure problem, not an application bug. FLAG-007 hardening was correct but does not resolve the underlying storage reality.

Orion tasking: NEO Desk/handoffs/TO_ORION_db_reliability_safeguards_FLAG-049.md — branch fix/db-session-safeguards
Priority: High — implement after FLAG-048 delivery. Does not block anchor calibration work.
Status: OPEN — Orion tasked. Implement after FLAG-048.


FLAG-048 — ANCHOR-CALIBRATION: anchor reference price methodology requires correction ✅ CLOSED

Identified: 2026-04-22 (S49/S50 post-session — Katja observation, Atlas ruling)
Resolved: 2026-04-22 — feat/anchor-dual-signal-calibration merged to main (merge commit a8033e5, 13 files, 2166 insertions / 33 deletions, 1 new module, 17 new tests)

Signal model (Atlas-locked, Option 3):
- structural_basis_bps = ((clob_mid - amm_price) / clob_mid) * 10000.0 — diagnostic/context only
- rolling_basis_baseline_bps — EMA of structural basis, configurable window (default 150 ticks)
- residual_distortion_bps = structural_basis_bps - rolling_basis_baseline_bps — the control signal; ANCHOR_IDLE keys off this

What shipped: C1 (schema + config), C2 (dual-signal calculator + wiring), C3 (guard rewire + rename), C4 (cross-session persistence), C5 (17-test suite). 708 passed / 378 pre-existing failures unchanged. Rail-lock proof (T4) and exit-reachability proof (T6) both green.

Pre-live gate still open: replay comparison on S48/S49/S50 showing the exit condition reachable in afternoon ET — synthetic proof in C5; real DB replay pending. Required before lifting the session hold.

Status: CLOSED ✅ — merged Apr 22. Pre-live replay required before next session.
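The three signals compose as follows, sketched minimally (names follow the flag; seeding the EMA with the first observation is an assumption, not the shipped behavior):

```python
class DualSignalCalculator:
    """Minimal sketch of the Atlas-locked Option 3 signal model."""

    def __init__(self, baseline_window_ticks: int = 150):
        # Standard EMA smoothing factor for the configured window.
        self.alpha = 2.0 / (baseline_window_ticks + 1)
        self.rolling_basis_baseline_bps = None

    def observe(self, clob_mid: float, amm_price: float):
        # Diagnostic/context signal.
        structural_basis_bps = (clob_mid - amm_price) / clob_mid * 10000.0
        # EMA of the structural basis (seeded with the first observation here).
        if self.rolling_basis_baseline_bps is None:
            self.rolling_basis_baseline_bps = structural_basis_bps
        else:
            self.rolling_basis_baseline_bps += self.alpha * (
                structural_basis_bps - self.rolling_basis_baseline_bps
            )
        # Control signal: ANCHOR_IDLE keys off this, not the raw basis.
        residual_distortion_bps = structural_basis_bps - self.rolling_basis_baseline_bps
        return structural_basis_bps, residual_distortion_bps
```

The point of the model: a persistent CLOB-AMM offset is absorbed into the baseline, so only distortion relative to the prevailing regime drives ANCHOR_IDLE.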


FLAG-047 — CANCELLED_BY_ENGINE guard masks real fills in cancel-fill race ✅ CLOSED

Identified: 2026-04-21 (S48 post-session, logs confirmed)
Resolved: 2026-04-22 — fix/cancel-fill-race merged to main (fast-forward, 5 commits, +2021/−5, 12 tests)
Branch: fix/cancel-fill-race — C1 schema + gateway, C2 cancel branch, C4 AffectedNodes parser, C3 reconciler branch, C5 tests

Fix: on a tecNO_TARGET cancel response, the order is demoted to CANCEL_RACE_UNKNOWN. The reconciler queries on-chain account_tx via the new get_account_tx_for_offer gateway method. Three-way resolution: FILL (record atomically via mark_filled_after_race), CANCEL (terminal), INCONCLUSIVE (fail closed → DEGRADED). S48 regression fixture confirmed passing. 106 adjacent-FLAG tests green.

Status: CLOSED ✅ — merged Apr 22. S49 unblocked.
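The three-way resolution, as a toy (the outcome strings are illustrative stand-ins for the account_tx transaction metadata the reconciler actually inspects via get_account_tx_for_offer):

```python
from enum import Enum

class RaceResolution(Enum):
    FILL = "fill"
    CANCEL = "cancel"
    INCONCLUSIVE = "inconclusive"

def resolve_cancel_race(account_tx_outcome):
    """Resolve a CANCEL_RACE_UNKNOWN order from what the chain shows."""
    if account_tx_outcome == "offer_consumed":
        return RaceResolution.FILL          # record atomically (mark_filled_after_race)
    if account_tx_outcome == "offer_cancelled":
        return RaceResolution.CANCEL        # terminal
    return RaceResolution.INCONCLUSIVE      # fail closed -> DEGRADED
```

Note the fail-closed default: anything the chain cannot confirm is treated as INCONCLUSIVE rather than guessed.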


FLAG-046 — ANCHOR_IDLE state (FLAG-044 architectural refinement)

Identified: 2026-04-21 (S47 post-session — Katja observation, Atlas ruling)

Scope: anchor saturation currently routes through DEGRADED and consumes episode budget. This is architecturally incorrect — anchor saturation is a market condition (regime-based pause), not a system failure (safety-based pause). Atlas has approved ANCHOR_IDLE as a distinct third state between ACTIVE and DEGRADED.

State model (Atlas-locked 2026-04-21):
- ACTIVE → trade
- ANCHOR_IDLE → wait for market (no episode count, no recovery machinery, no direct HALT path)
- DEGRADED → protect system (drift/corridor/truth only — episode counted, recovery machinery, cap → HALT)
- HALT → stop session

ANCHOR_IDLE entry: anchor saturation condition met → cancel_all, stop quoting, log ANCHOR_IDLE_ENTER. No episode increment.
ANCHOR_IDLE exit: anchor_mean < exit threshold AND prevalence < 30%, sustained N stability ticks → log ANCHOR_IDLE_EXIT, resume quoting.
ANCHOR_IDLE interaction: if drift/corridor/truth fires while idle → escalate to DEGRADED (episode counted there, not at idle entry).

Required tests (Atlas): (1) saturation → ANCHOR_IDLE, no episode; (2) persistent hostile → stays idle, no halt; (3) anchor normalizes → resumes quoting; (4) drift during idle → DEGRADED + episode; (5) no direct ANCHOR_IDLE → HALT path.

Locked principle (Atlas 2026-04-21): "The engine should not consume failure budget for conditions that are expected to persist. Anchor saturation tells the system to pause — not to panic."

Ruling: 07 Agent Coordination/[C] Atlas Ruling — ANCHOR_IDLE State (FLAG-044 Refinement).md
Branch: feat/anchor-idle-state — 4 commits (c3eba41, 502be63, 7e0ba1c, 664212d), +1181/−444, 7 new tests, 65 adjacent-suite green.
Status: CLOSED ✅ — merged Apr 22. S50 unblocked.
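The episode-budget rule can be sketched as a single tick evaluation (function and parameter names are hypothetical; the priority of safety guards over saturation follows the interaction rule above):

```python
from enum import Enum, auto

class EngineState(Enum):
    ACTIVE = auto()
    ANCHOR_IDLE = auto()
    DEGRADED = auto()
    HALT = auto()

def evaluate_tick(anchor_saturated, safety_guard_fired, episode_count, episode_cap=3):
    """One tick of the Atlas-locked priority: safety guards (drift/corridor/
    truth) outrank anchor saturation. Only DEGRADED entries consume episode
    budget; ANCHOR_IDLE never does, and ANCHOR_IDLE itself can never HALT.
    Returns (next_state, episode_count)."""
    if safety_guard_fired:
        episode_count += 1
        if episode_count > episode_cap:
            return EngineState.HALT, episode_count
        return EngineState.DEGRADED, episode_count
    if anchor_saturated:
        return EngineState.ANCHOR_IDLE, episode_count
    return EngineState.ACTIVE, episode_count
```

This directly encodes required tests (1), (2), (4), and (5): saturation alone never touches the episode count, and no path reaches HALT without passing through a counted DEGRADED entry.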


FLAG-044 — Replace recovery_exhausted_halt with cool-down session model

Identified: 2026-04-21 (S45 post-session, Katja + Vesper — escalated to Atlas)

Scope: the FLAG-042 per-episode cap (recovery_exhausted_halt) collapses two distinct cases — true oscillation and a persistent hostile regime — into a single early halt. In S45, the engine correctly entered DEGRADED but halted at 163 seconds with no way to wait out the regime. Atlas ruled to replace the hard cap with a cool-down model: after a failed recovery attempt, suppress the recovery evaluator for recovery_cooldown_ticks (default 120 ticks, ~8 min), then re-enable. A per-source episode cap (max_degraded_episodes_per_source_per_session: 3) replaces the old per-attempt cap as a backstop against pathological churn.

Atlas ruling: 07 Agent Coordination/[C] Atlas Ruling — Cool-Down Session Model (FLAG-044).md
Branch: feat/flag-044-recovery-cooldown — merged to main Apr 21 (c926cb7, 4 commits, 26 new tests)
Locked principle: "A bad regime is not, by itself, a reason to terminate the session."
Operator change: use --duration-seconds 7200 (2-hour sessions). Prove 2-hour behavior before moving to 4 hours.
Status: CLOSED ✅ — merged Apr 21
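A sketch of the cool-down mechanics (class and method names are hypothetical; the defaults come from the ruling):

```python
class RecoveryCooldown:
    """Cool-down model: a failed recovery suppresses the recovery evaluator
    for recovery_cooldown_ticks, then re-enables it; a per-source episode
    cap remains as a backstop against pathological churn."""

    def __init__(self, recovery_cooldown_ticks=120, max_episodes_per_source=3):
        self.cooldown_ticks = recovery_cooldown_ticks
        self.max_episodes = max_episodes_per_source
        self._cooldown_remaining = 0
        self._episodes_by_source = {}

    def on_failed_recovery(self):
        self._cooldown_remaining = self.cooldown_ticks

    def on_degraded_entry(self, source):
        """Returns True if the per-source cap is breached (halt)."""
        n = self._episodes_by_source.get(source, 0) + 1
        self._episodes_by_source[source] = n
        return n > self.max_episodes

    def tick(self):
        """Returns True when the recovery evaluator may run this tick."""
        if self._cooldown_remaining > 0:
            self._cooldown_remaining -= 1
            return False
        return True
```

The difference from the old cap: a failed attempt costs time, not budget, so a persistent hostile regime can be waited out within the session.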


FLAG-043 — Corridor recovery stability window too short (dedicated param needed)

Identified: 2026-04-21 (Vesper review of feat/flag-042-degraded-recovery)

Scope: inventory-corridor recovery currently reuses corridor_lookback_ticks (default 3 ticks, ~12 s at 4 s cadence) for both the entry-side guard and the exit-side recovery window. This is compliant with the FLAG-042 spec but short — inventory can oscillate near the corridor boundary at that frequency, potentially causing rapid DEGRADED entry/exit cycles. The per-episode cap will catch any loop, but if this pattern appears in S45+ data it warrants a dedicated recovery_stability_ticks_corridor parameter so the recovery window can be set independently of the entry window.

Trigger condition: observe in S46+ whether corridor recovery fires and re-enters DEGRADED within the same episode. If degraded_episode_limit_halt fires on a corridor source, this flag becomes active.

Proposed fix: add recovery_stability_ticks_corridor to InventoryCorridorGuardConfig (default 10–15 ticks, configurable). Decouple it from corridor_lookback_ticks — parallel to recovery_stability_ticks_drift already present in the drift guard config.

Status: OPEN — monitor in S46+. Low urgency. Address only if corridor recovery cycling is observed.


FLAG-042 — No DEGRADED recovery path for non-truth guards

Identified: 2026-04-21 (S42 post-session investigation, Orion)

Scope: _exit_degraded_mode is only called by the wallet truth check on ok. The anchor saturation, directional drift, and inventory corridor guards enter DEGRADED but have no exit path — the 300 s wallet-truth timeout is the sole escape, and it exits to HALT. DEGRADED is Atlas-spec'd as recoverable ("cancel all, stop quoting, continue reconciliation, recoverable without restart"). The current implementation makes it a one-way gate to HALT after 300 s for all market-regime guards.

Atlas ruling — UPDATED (Apr 21): FLAG-042 pulled forward from Phase 7.4 into the current phase. S43 (mean +9.28 bps, 100% hostile) confirmed the guards fire correctly. S44 (mean +4.43 bps, range [−3.6, +10.0], 57% prevalence, cycling regime) confirmed that the anchor recovered mid-session but the engine had no exit path — it sat idle ~5 min, then halted. This is the mixed-regime evidence Atlas required.

Recovery spec (Atlas-locked):
- Anchor saturation exit: abs(mean) < 4 bps AND prevalence < 30%, sustained 20–40 consecutive ticks
- Hysteresis required: entry thresholds (6 bps / 40%) ≠ exit thresholds (4 bps / 30%)
- Time stability: N consecutive ticks — no single-tick exits
- State reset on exit: reset rolling windows and guard counters; treat as a fresh regime
- One recovery attempt per episode: a second DEGRADED re-entry → HALT immediately (no loop)
- Anchor recovery first; directional drift + inventory corridor secondary (minimal)

Branch: feat/flag-042-degraded-recovery — merged to main Apr 21 (9639b18, 5 commits, +1391/−45, 16 new tests, 162 passed full regression).
Ruling files: 07 Agent Coordination/[C] Atlas Brief — S43 S44 Results + FLAG-042 Decision.md, 07 Agent Coordination/[C] Atlas Ruling — FLAG-042 Approved + Recovery Spec.md
Status: CLOSED ✅ — merged Apr 21. S45 unblocked.
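The hysteresis + time-stability exit, sketched (class name hypothetical; thresholds from the spec, using the lower bound of the 20–40 tick window as the default):

```python
class AnchorRecoveryEvaluator:
    """Exit-side evaluator for anchor-saturation DEGRADED: exit conditions
    (4 bps / 30%) are deliberately tighter than entry (6 bps / 40%), and
    must hold for N consecutive ticks -- no single-tick exits."""

    def __init__(self, exit_mean_bps=4.0, exit_prevalence=0.30, stability_ticks=20):
        self.exit_mean_bps = exit_mean_bps
        self.exit_prevalence = exit_prevalence
        self.stability_ticks = stability_ticks
        self._consecutive_ok = 0

    def observe(self, anchor_mean_bps, prevalence):
        """Returns True when recovery (exit from DEGRADED) is granted."""
        if abs(anchor_mean_bps) < self.exit_mean_bps and prevalence < self.exit_prevalence:
            self._consecutive_ok += 1
        else:
            self._consecutive_ok = 0  # any bad tick resets the stability window
        return self._consecutive_ok >= self.stability_ticks
```

The gap between entry and exit thresholds is what prevents boundary chatter: a regime hovering at 5 bps neither enters nor exits, so the engine cannot cycle on noise.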


FLAG-036 — On-chain wallet reconciliation at session boundaries (BLOCKER)

Identified: Apr 19 (XRPScan on-chain audit)

Scope: the engine's get_snapshot() reports XRP balance from internal inventory tracking, not from on-chain account_info. As of Apr 19, the DB was over-reporting by 43.87 XRP vs the actual on-chain balance — the engine was making trading decisions on a fictitiously large balance. This let the anchor saturation and fill-size asymmetry do far more damage than the DB showed, undetected.

Root cause under investigation: The inventory tracking baseline was inflated by ~7.4 XRP at S33 start (immediately post-injection), and drifted further. Orion needs to determine why get_snapshot() diverged from on-chain reality from the very first session.

Required fix (design pending Atlas ruling):
- At session start: query on-chain account_info for real XRP and RLUSD balances and compare against internal tracking. Halt if the delta exceeds a threshold.
- At session end: the same reconciliation check. Write any discrepancy to the DB.
- Ideally: surface the gap as a live metric on the dashboard so it is visible before it becomes a problem.
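The session-boundary check reduces to a delta comparison; a minimal sketch (the function name and the 1.0 XRP threshold are illustrative — the real threshold is pending the Atlas ruling):

```python
def wallet_truth_check(onchain_xrp, internal_xrp, halt_threshold_xrp=1.0):
    """Compare the on-chain account_info balance against internal tracking.
    Returns (delta, should_halt): delta > 0 means internal tracking is
    over-reporting relative to the chain."""
    delta = internal_xrp - onchain_xrp
    return delta, abs(delta) > halt_threshold_xrp
```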

Why this is a BLOCKER: Protection layers 1–3 (corridor guard, drift guard, saturation guard) all operate on the internal inventory state. If that state is wrong, the guards are protecting against the wrong numbers. No protection layer is meaningful until the balance baseline is trustworthy.

Atlas ruling (Apr 19): confirmed blocker. Wallet truth reconciliation must become a first-class runtime safeguard — startup check, periodic 60 s check, shutdown check, persisted status + deltas. Halt on mismatch above threshold. Not a nice-to-have.

Orion tasking: [C] Orion Tasking — Wallet Truth Reconciliation + Inventory Baseline Investigation.md — two-phase: (1) root-cause investigation first, (2) feat/wallet-truth-reconciliation branch.
Status: OPEN — BLOCKER. Orion investigating root cause before implementation.


FLAG-035 — WAL checkpoint hardening (approved, deferred)

Identified: Apr 18 (Orion audit post-S38 DB corruption)

Scope: CTRL_CLOSE_EVENT on Windows bypasses the Python signal module — the process is killed mid-WAL-transaction. FLAG-027 backup + FLAG-033 startup check are the current mitigations.

Proposed fix (Atlas-approved): periodic PRAGMA wal_checkpoint(TRUNCATE) on a 60 s timer. Must not interfere with main-loop timing. Clean-shutdown coordination required. Logging around checkpoint execution.

Status: OPEN — approved; separate branch after S39 confirms clean behavior.


FLAG-040 — WAC correction post-phantom-fill fix

Identified: Apr 19 (Orion D1 investigation)

Scope: _rebuild_wac replays from the fills table, which includes phantom fills created by the reconciler's disappeared-order heuristic (FLAG-037). WAC is therefore wrong by the same proportion as the phantom-fill drift. After FLAG-037 (fix/reconciler-disappeared-order-conservative) lands and stops generating phantom fills, a one-time WAC rebuild pass against the corrected fills is required.

Status: OPEN — blocked on FLAG-037. Low urgency (WAC is display-only, not a safety gate).


FLAG-039 — Mid-session capital_events refresh gap

Identified: Apr 19 (Orion D1, Q4 latent gap)

Scope: rebuild() runs only at engine startup. If a capital injection lands while the engine is running, _xrp_capital_overlay is stale until the next restart. Under current ops (engine stopped → injected → restarted) this is not triggered, but it is a silent precondition — one unguarded change to ops procedure would re-introduce it silently.

Proposed fix: either (a) add a runtime refresh_capital_overlay() call after large capital events are detected on-chain, or (b) document the stop-inject-restart requirement as an explicit invariant enforced by the startup sequence.

Status: OPEN — low urgency; current ops pattern is safe. Address after Phase 7.3.


FLAG-038 — apply_fill() zero-quantity silent drop

Identified: Apr 19 (Orion D1, Q3/Q5)

Scope: inventory_manager.py:245–247 raises on fill_quantity_rlusd <= 0. The exception is swallowed at execution_engine.py:956–967. The fill row exists in fills but no inventory_ledger entry is written → 754 vs 752 fill/ledger count mismatch → 0.69 XRP of residual internal drift. These are cancel-raced zero-fill events: an order gets cancelled while a fill notification is in flight, producing a zero-quantity fill.

Proposed fix: instead of raise-and-swallow, log a WARNING and explicitly write a zero-change inventory_ledger entry to maintain the fill/ledger 1:1 invariant. Or filter zero-quantity fills at the source before they reach apply_fill.

Status: OPEN — secondary contributor to inventory drift. Address in a dedicated branch after feat/wallet-truth-reconciliation.


FLAG-037 — Reconciler phantom-fill heuristic violates its own invariant

Identified: Apr 19 (Orion D1, Q3 — primary root cause of inventory drift)

Scope: ledger_reconciler.py:675–687 — _handle_disappeared_active_order() unconditionally calls _apply_full_fill(order, order.quantity, ...) when an active/partial order's offer_sequence disappears from account_offers without a cancel_tx_hash. This directly contradicts the module docstring (lines 17–18): "the reconciler never INVENTS a fill or cancellation." Any off-book cancellation, partial-fill-then-cancel, or transient node-snapshot gap produces a phantom fill credited to the inventory ledger. Over S1–S32 this mechanism credited ~6.71 XRP of phantom fills that never happened on-chain.

Escalation — S46 (Apr 21): FLAG-037 is now SESSION-BLOCKING. The phantom-fill cycle reproduces in every session that enters DEGRADED:
1. DEGRADED fires → cancel_all cancels all open orders on-chain.
2. On the next tick the reconciler sees those order IDs gone from account_offers → no cancel_tx_hash → phantom fills applied.
3. The wallet truth check detects the delta (−6.28 XRP / +9.0 RLUSD) → status: halt.
4. All subsequent ticks are blocked by inventory_truth_gate.
5. The engine cannot quote for the rest of the session.

This has happened in S44, S45, and S46. Pre-session realignment corrects startup delta but cannot prevent within-session re-application. As long as any session enters DEGRADED and cancels orders, the phantom fill cycle re-creates the same divergence and the wallet truth gate blocks all trading.

Required fix — add explicit cancellation tracking: When the engine calls cancel_all (DEGRADED flow), it must write the cancelled order IDs to a "known_cancellations" record (either a new table or a status update on the orders table). The reconciler must check this before applying phantom fills — if the order ID is in known_cancellations, skip the phantom fill logic entirely. This distinguishes engine-initiated cancels from genuine disappeared orders.

Proposed fix (branch: fix/reconciler-disappeared-order-conservative): Original scope still valid. Expand to include: (1) known-cancellation write path in cancel_all DEGRADED flow, (2) reconciler check against known-cancellations before _apply_full_fill. General conservative fix (stop phantom fills, mark DISAPPEARED) is the fallback for orders not in known-cancellations.
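The proposed known-cancellations check, as a toy model (the class and method names are illustrative; known_cancellations and the RECONCILER_SKIP_ENGINE_CANCEL outcome come from the flag):

```python
class DisappearedOrderReconciler:
    """Toy model of the FLAG-037 fix: engine-initiated cancels are recorded
    before they hit the chain, and the reconciler consults that record before
    it may ever treat a disappeared order as filled."""

    def __init__(self):
        self.known_cancellations = set()

    def cancel_all(self, order_ids):
        # DEGRADED flow: write the cancelled IDs to known_cancellations
        # before submitting the on-chain cancels.
        self.known_cancellations.update(order_ids)

    def on_disappeared_order(self, order_id):
        if order_id in self.known_cancellations:
            # Engine-initiated cancel: skip the phantom-fill logic entirely.
            return "RECONCILER_SKIP_ENGINE_CANCEL"
        # Conservative fallback for genuinely unknown disappearances:
        # mark DISAPPEARED, never invent a fill.
        return "DISAPPEARED"
```

Because the record is written before the cancel reaches the chain, the ordering race that produced phantom fills in steps 1–2 of the S46 cycle cannot recur.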

Orion tasking: 03 Branches/fix-reconciler-disappeared-order-conservative/ — Atlas spec clarification required before implementation: confirm scope expansion from original FLAG-037 spec (age-threshold model) to explicit cancellation tracking model.

S47 confirmation (Apr 21): RECONCILER_SKIP_ENGINE_CANCEL fired for all CANCELLED_BY_ENGINE orders. Truth check: delta_xrp=3.97e-05 (negligible), delta_rlusd=0.0 — the engine exited DEGRADED cleanly. No phantom-fill cycle. The FLAG-037 fix is confirmed working across the full DEGRADED → cancel_all → reconcile → truth-check sequence.

Status: CLOSED ✅ — merged Apr 21. Confirmed in S47 (session_id=50). Phantom-fill cycle broken.


FLAG-056 — RPC-PROVIDER: QuikNode credit exhaustion + provider selection required

Identified: 2026-04-22 (S60 post-session — risk_rpc_failure halt + realignment failure)

Root cause: QuikNode API credit cap exhausted — 10,034,640 credits consumed in 11 days (Apr 11–22), ramping from ~30K to ~250K requests/day. TLS-level rejection (not HTTP 429) confirms a hard cap hit. Upgrading to the next tier costs $49/month — rejected by Atlas as economically irrational at the current capital level (~$200 RLUSD equivalent).

Immediate mitigation (done): switched config_live_stage1.yaml to s1.ripple.com:51234 (public Ripple endpoint). Committed 2026-04-22. Realignment and sessions restored.

Atlas ruling (2026-04-22): staged path approved — public bridge → cheap managed if available → self-host if managed pricing is irrational. Do not upgrade QuikNode. Evaluate NOWNodes, GetBlock, and Ankr at the real request profile. If none make economic sense, self-host rippled on a Hetzner CPX31 (€12/month, approved). The CPX31 upgrade is approved if self-hosting is chosen.

Vesper pricing research (search-based, needs verification):
- NOWNodes: ~$3/month for 100K requests/day — most promising. Katja to verify at nownodes.io/pricing.
- GetBlock: XRPL on paid plans only, starting at $49/month — same as QuikNode, eliminated.
- Ankr: complex credit-based PAYG, XRPL support unclear — lower priority.

Decision pending: NOWNodes pricing confirmation. If $3/month holds at our volume → switch. If not → self-host on CPX31.
Brief: 07 Agent Coordination/[C] Vesper → Atlas — RPC Provider Decision + FLAG-056.md
Status: OPEN — public endpoint active as a bridge. Provider decision pending NOWNodes verification.


FLAG-055 — SHUTDOWN-CANCEL: CANCEL_PENDING orders not swept at shutdown ✅ CLOSED

Identified: 2026-04-22 (S59/S60 post-session — recurring STARTUP_GATE_REFUSED pattern)

Confirmed in production: three consecutive sessions (S59, S60, S61 startup) required realignment before startup, each showing the same +7.3 XRP / −10.5 RLUSD delta. Root cause confirmed from the S60 startup log: offer_sequence 103476326 in CANCEL_PENDING, cancel_tx_hash=null, disappeared from the ledger → cancel_race=1 → truth delta → STARTUP_GATE_REFUSED.

Root cause: _cancel_all_live_orders calls get_active_orders(), which returns only ACTIVE + PARTIALLY_FILLED. When _evaluate_cancels marks an order CANCEL_PENDING mid-tick (request_cancel called, gateway submit not completed before session end), the order is invisible to the shutdown sweep. The offer stays live on-chain. The next startup truth check sees a fill delta → halts.

Fix (Vesper, 2026-04-22): three surgical replacements in _cancel_all_live_orders in main_loop.py:
1. Fetch CANCEL_PENDING orders with cancel_tx_hash=None and merge them into the cancellable sweep list.
2. Skip mark_cancelled_by_engine for CANCEL_PENDING orders (already marked).
3. On tesSUCCESS: transition directly to CANCELED. On tecNO_TARGET: → CANCEL_RACE_UNKNOWN (same as the ACTIVE path).

Fix script: fix_flag055_shutdown_cancel_pending.py in Claude Homebase Neo.
Tests: test_flag_055_shutdown_cancel_pending.py — 5 tests: sweep inclusion, success → CANCELED, tecNO_TARGET → CANCEL_RACE_UNKNOWN, tx_hash-set excluded, mixed active+pending sweep. 5/5 green.
Commits: 1d046ce (main_loop.py fix), c007bc8 (test file) — 2026-04-22.
Status: CLOSED ✅ — merged 2026-04-22. 5/5 tests green.
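The sweep-list merge from step 1 can be sketched as follows (plain dicts stand in for the engine's order objects; status and field names follow the flag):

```python
def build_shutdown_sweep(orders):
    """Sketch of the FLAG-055 sweep list: the original ACTIVE and
    PARTIALLY_FILLED set, plus CANCEL_PENDING orders whose cancel submit
    never reached the gateway (cancel_tx_hash is None). CANCEL_PENDING
    orders with a tx_hash already have a cancel in flight and are excluded."""
    sweep = [o for o in orders if o["status"] in ("ACTIVE", "PARTIALLY_FILLED")]
    sweep += [o for o in orders
              if o["status"] == "CANCEL_PENDING" and o["cancel_tx_hash"] is None]
    return sweep
```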


FLAG-045 — CANCELLED_BY_ENGINE orders accumulate in reconcile set indefinitely ✅ CLOSED

Identified: 2026-04-21 (Orion post-review observation)

Confirmed blocking in production: S58 — 18 stale CANCELLED_BY_ENGINE orders from prior sessions were being rescanned every tick, generating ~36 log lines/tick (~3,900 noise lines over 109 ticks). This buried real events and made session interpretation unreliable.

Root cause: _get_orders_for_reconciliation includes CANCELLED_BY_ENGINE in the scan set. After RECONCILER_SKIP_ENGINE_CANCEL fires, the function returns without any state transition — the order stays in the scan bucket indefinitely.

Fix (Vesper, 2026-04-22): after RECONCILER_SKIP_ENGINE_CANCEL logs, call self._state.update_order_status(order.id, OrderStatus.CANCELED). This retires the order to a terminal state that is not in the scan bucket. No inventory change — housekeeping only.

Atlas ruling (2026-04-22): approved as written. Use CANCELED, not a new status — consistent with the CANCEL_RACE_UNKNOWN → cancel-confirmed resolution path.

Tests: test_flag_045_reconciler_noise.py — 4 tests: retire to CANCELED, not rescanned next tick, 18-order batch all retired, no phantom fill.
Commits: fix_flag045_reconciler_noise.py applied, committed to main 2026-04-22.
Status: CLOSED ✅ — merged 2026-04-22. 4/4 tests green.


WORKSPACE-003 — XRP Sentiment Intelligence Layer (Grok / xAI API)

Identified: Apr 21 (Katja idea, Vesper discussion)

Scope: pre-session market-intelligence briefing powered by Grok (xAI). Scans live X/Twitter data for XRP sentiment — hashtags (#XRP, $XRP, #XRPL), high-signal accounts, breaking news, potential catalysts. Delivered as a structured brief before each session so Katja can make an informed go/no-go call with real market context, not just engine-internal signals.

Why Grok specifically: live X data is Grok's native edge — no other model has the same real-time access. Katja has tested it; it is significantly capable for this use case. xAI has a proper REST API (api.x.ai) for programmatic access.

MCP registry check (Apr 21): no Grok, X/Twitter, crypto, or market-sentiment connectors exist in the MCP registry. Integration would use the xAI API directly via a Python script or a scheduled Cowork task.

Proposed v1 workflow: script hits the xAI API → asks Grok to scan the last 6 hours of XRP sentiment on X → returns a structured brief (sentiment distribution, top signal posts, identified catalysts, go/no-go recommendation) → drops it to a file Katja reads before session start. Can be triggered manually ("run sentiment check") or scheduled daily.

Phase 2 (post-Phase 8): sentiment score as a soft signal layer inside the engine — regime context beyond CLOB-AMM divergence. Not before the engine is stable.

Cost note: the xAI API is billed per token, separate from the X Premium+ subscription. Daily pre-session scans would be low cost. Evaluate at signup time.

Gate: the engine must be stable and running clean sessions before building this. Phase 8 is the realistic start point.
Design note: 07 Agent Coordination/[C] Katja Idea — XRP Sentiment Intelligence Layer.md
Status: OPEN — future, post-Phase 8. Log only. Revisit when sessions are running clean.


WORKSPACE-002 — Lean Agent Desk / Handoff System

Identified: Apr 21 (Katja + Atlas design, Vesper review)

Scope: lightweight operating-system infrastructure for agent coordination. Not engine-specific — it spans any project using the NEO team model. Three lanes: handoffs/ (work transfer), reviews/ (processed outputs), escalations/ (operator + Atlas decisions). Named artifacts (TO_<RECIPIENT>_<TOPIC>_<REF>.md), one status file, n8n as the notification layer once the folder structure is stable.

Atlas model: Atlas stays external — no embedding. "TO_ATLAS" routes to Katja, who brings it to Atlas; the ruling returns as ATLAS_RULING_<topic>.md. An Atlas Principles doc (future) would let Vesper/Orion self-correct 80% of cases.

Escalation trigger rule: live-run approvals, guard/threshold changes, flag scope changes, integrity failures, patches with cross-module consequences, or any item Vesper cannot route forward without Katja.

Prerequisite before building: confirm whether Claude Homebase Neo is cloud-synced (OneDrive/GDrive/Dropbox) — this determines whether n8n cloud or a local/VPS install is the right path. Synergy: FLAG-023 (VPS) — the same server could host n8n. Google Drive: confirmed available. Folder not yet synced — quick setup when ready. Enables n8n cloud to watch the handoffs folder without the laptop needing to be on.

Cowork constraint: Vesper/Claude agents require an active Cowork session — they cannot run autonomously without Katja present. Notifications (n8n → email/phone) work without the laptop on; agent reasoning does not.

Future phase (Katja, Apr 21): once the engine is running smoothly on a VPS, explore Claude API / Claude Code on the server for autonomous agent operations — Vesper reviewing session outputs without Katja present, autonomous routing, etc. "Bigger bells and whistles" — do not build until the v1 handoff system is proven and the engine is stable.
Design docs: 07 Agent Coordination/[C] Atlas Design — NEO Lean Handoff System v1.md, [C] Atlas Guidance — How to Keep Atlas in the System.md, [C] Vesper Response — WORKSPACE-002 Agent Desk Review.md, [C] Atlas Alignment — WORKSPACE-002 Routing Model + Implementation Path.md, [C] Atlas Principles — NEO Operating Philosophy v1 DRAFT.md, [C] Vesper Brief — Orion WORKSPACE-002 Team Input Request.md Status: OPEN — future, not urgent against FLAG-042 / S45. Build when NEO engine sessions are stable and bandwidth allows.


WORKSPACE-001 — Skills audit + expansion

Identified: Apr 18 (Katja request)
Scope: Review existing skills in the workspace, identify gaps, build custom NEO-specific skills, explore the Cowork plugin marketplace for useful additions.
Priority skills to build:
- Session debrief skill — auto-query the DB, format for the Experiment Log, draft the Atlas alignment
- Pre-session checklist skill — backup confirmed, config verified, integrity check, flags reviewed
Also: Browse the Cowork plugin marketplace for skills others have built that could be useful.
Status: OPEN — post-S39 when bandwidth allows.


FLAG-029 — Async shutdown warnings + orphan reconciliation

Identified: Apr 17 (Orion observation)
FLAG-029a: RuntimeWarning: coroutine 'submit_and_wait' was never awaited at xrpl_gateway.py:1044 during the SIGINT cancel path. xrpl-py changed submit_and_wait to return a coroutine; the cancel path doesn't await it. Non-fatal, but it will silently swallow cancel errors if API behavior changes.
FLAG-029b: Engine reconciles order c7e14e73 on every launch — a stale record never cleaned from the snapshot. Non-fatal.
Status: OPEN — monitored, non-fatal. Address in the next Orion code pass.


FLAG-016 — Test suite not in clean state

Identified: Apr 14. Dominant failure cluster: config/model signature drift. Partial unblock Apr 17 (FLAG-008 restored max_size_pct_of_portfolio); the main failure cluster (xrpl module absence, 371 errors) is unchanged.
Impact: pytest cannot serve as a pre-run validation gate. Runtime log verification is the actual gate.
Scope: Categorize failures → repair or retire stale tests → define a minimal live-run safety subset → re-establish pytest as the release gate.
Status: OPEN — deferred. Required before broader refactors or production hardening.
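One way to carve out the minimal live-run safety subset is a pytest marker, so the subset can gate sessions while the broader suite stays broken. A sketch only — the marker name and usage are assumptions, not existing project conventions:

```ini
# pytest.ini — register a marker for the minimal live-run safety subset
# ("live_safety" is an illustrative name, not an existing project marker)
[pytest]
markers =
    live_safety: must pass before any live session
```

Tests in the subset get `@pytest.mark.live_safety`, and the pre-run gate becomes `pytest -m live_safety` — unaffected by the 371 xrpl-absence errors elsewhere in the suite.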


FLAG-023 — VPS deployment + permanent dashboard hosting

Identified: Apr 16. Engine runs locally on Katja's machine; dashboard exposed via an ephemeral Cloudflare tunnel.
Scope: Permanent Cloudflare tunnel (system service), VPS deployment (~$6–12/month), private key security (env var injection, no plaintext in config), process monitoring (systemd + journald), optional custom domain.
Atlas reclassification (2026-04-22): Upgraded from "future / low urgency" to near-term infrastructure priority. Recurring DB corruption on SMB makes the local + network-mounted setup operationally untrustworthy; a VPS with a local SSD filesystem is the real fix.
Atlas platform preference: Ubuntu LTS; Hetzner first, DigitalOcean second, Linode third. Local SSD, single node, SQLite local to the box. No SMB, no network-mounted DB. No over-engineering (no API layer, no DB split from the engine).
Atlas migration sequencing: (1) Resolve FLAG-048 anchor calibration → (2) run at least one meaningful validating session → (3) execute the VPS migration → (4) pursue Phase 7.4 clean sessions on the VPS.
Synergy: WORKSPACE-002 (n8n) — the same server can host n8n for handoff notifications.
Status: OPEN — NEAR-TERM INFRASTRUCTURE PRIORITY (reclassified Apr 22). Plan the migration now; execute after FLAG-048 anchor calibration is validated.
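The systemd + journald + env-var-injection scope could look roughly like the unit below. Every path, user, and name here is a placeholder for illustration, not the real deployment layout:

```ini
# /etc/systemd/system/neo-engine.service — illustrative unit; paths, user,
# module name, and env file location are placeholders.
[Unit]
Description=NEO trading engine
After=network-online.target
Wants=network-online.target

[Service]
User=neo
WorkingDirectory=/opt/neo
# Private key injected via env file (chmod 600) — never plaintext in config
EnvironmentFile=/etc/neo/engine.env
ExecStart=/opt/neo/.venv/bin/python -m neo.main_loop
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

With this shape, process monitoring is `systemctl status neo-engine` and logs land in journald (`journalctl -u neo-engine`), with no extra tooling on the box.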


FLAG-022 — Dashboard renders desktop layout on mobile without auto-detect

Identified: Apr 16. Mobile mode toggle exists but doesn't auto-detect viewport width. Status: OPEN — deferred, lower priority.


FLAG-021 — Spread diagnostics: fill count shown but not spread distribution

Identified: Apr 15. Dashboard shows "34 fills" with no bps breakdown for the 0–5 bps range where most fills cluster.
Fix: Add fill count by spread bucket to the terminal summary: < 0 bps: X | 0–2 bps: X | 2–5 bps: X | > 5 bps: X.
Status: OPEN — low priority, dashboard/terminal improvement.
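The proposed terminal line is a one-function fix. A minimal sketch — the bucket edges follow the flag's proposal, but the half-open boundary handling ([0, 2), [2, 5]) is an assumption:

```python
def spread_bucket_summary(fill_spreads_bps: list[float]) -> str:
    """Format fill counts by spread bucket for the terminal summary.

    Bucket edges follow the FLAG-021 proposal; treating the intervals as
    [0, 2) and [2, 5] is an assumption about boundary handling.
    """
    buckets = {"< 0 bps": 0, "0-2 bps": 0, "2-5 bps": 0, "> 5 bps": 0}
    for bps in fill_spreads_bps:
        if bps < 0:
            buckets["< 0 bps"] += 1
        elif bps < 2:
            buckets["0-2 bps"] += 1
        elif bps <= 5:
            buckets["2-5 bps"] += 1
        else:
            buckets["> 5 bps"] += 1
    return " | ".join(f"{k}: {v}" for k, v in buckets.items())
```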


FLAG-019 — Terminal summary missing anchor diagnostics

Identified: Apr 15. Anchor mean/median/bias direction not in terminal output.
Partial resolution: |err| > 5 bps stat added to the session summary (cleanup branch, Apr 18); full anchor mean/median not yet in the terminal.
Status: OPEN — partially addressed. Full anchor summary still pending.


FLAG-018 — Near-touch count may overcount during high-skew early-session state

Identified: Apr 15. High inventory skew at session start pulls bid offset below the 5 bps near-touch threshold — every tick registers as near-touch regardless of market movement. Status: OPEN — monitor. Low priority.


FLAG-017 — VS Session comparison: ending drift always shows "—"

Identified: Apr 15. Session ending drift is not persisted to engine_state at shutdown — the comparison row has no stored value to retrieve.
Fix: Persist ending drift to engine_state at session close, following the same pattern as the existing engine_state writes.
Status: OPEN — low priority, display fix.
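The fix is a single upsert at session close. A sketch under the assumption that engine_state is a key/value table — the real schema and key naming may differ:

```python
import sqlite3

def persist_ending_drift(conn: sqlite3.Connection, session_id: str,
                         drift_bps: float) -> None:
    """Write ending drift to engine_state at session close.

    Assumes a key/value engine_state(key, value) table; the real schema
    and key convention are hypothetical here.
    """
    conn.execute(
        "INSERT INTO engine_state (key, value) VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (f"ending_drift_bps:{session_id}", str(drift_bps)),
    )
    conn.commit()
```

With the value persisted, the session comparison row can read it back instead of rendering "—".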


FLAG-015 — Terminal summary missing regime context fields

Identified: Apr 14. Terminal summary lacks: XRP start/end/Δ%, max skew reached, max drift reached, effective offset distribution.
Status: OPEN — add before the next regime-sensitive experiment.


FLAG-014 — Future: exploit resistance / behavioral randomness

Identified: Apr 14 (elevated Apr 15).
Proposed direction (do not implement yet): mild timing jitter, regime-based dynamic offset, zone-aware patience; exploit resistance over concealment.
Note: The Phase 6 scope section is now outdated — Phase 6 is complete. Relevant ideas carry forward to Phase 8+.
Status: OPEN — future, post-Phase 7 strategy lock.


FLAG-002 — Anchor cap utilization escalating

Identified: Early Stage 1. Cap utilization climbed to 73%+ (anchor hitting the 10 bps cap on ~3 of 4 ticks).
Update (Apr 18): The Phase 7.2 CLOB switch now fires when |anchor_error| > 3 bps — effectively managing cap utilization as a control variable, not just a monitoring metric. Phase 7.3 (offset calibration) is the next step.
Status: OPEN — superseded by the Phase 7 direction. Monitor cap utilization in Phase 7.3 sessions.
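The two quantities in play here can be written down explicitly. A sketch only — measuring utilization as the fraction of ticks where the anchor adjustment saturates the cap is an assumed definition, not the engine's actual metric code:

```python
def cap_utilization(anchor_adjustments_bps: list[float],
                    cap_bps: float = 10.0) -> float:
    """Fraction of ticks where the anchor adjustment saturates its cap.

    Assumed definition for illustration; a 0.73+ reading means the anchor
    hits the 10 bps cap on roughly 3 of every 4 ticks.
    """
    if not anchor_adjustments_bps:
        return 0.0
    capped = sum(1 for a in anchor_adjustments_bps if abs(a) >= cap_bps)
    return capped / len(anchor_adjustments_bps)

def clob_switch_fires(anchor_error_bps: float,
                      threshold_bps: float = 3.0) -> bool:
    """Phase 7.2 rule: switch to the CLOB reference when |anchor_error| > 3 bps."""
    return abs(anchor_error_bps) > threshold_bps
```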


Resolved Flags — Summary Table

Resolved flags archived below. Full detail retained for reference.

| Flag | Summary | Resolved |
| --- | --- | --- |
| FLAG-048 | Anchor dual-signal calibration — structural_basis_bps + residual_distortion_bps, 150-tick EMA, 17 tests, cross-session persistence | Apr 22 ✅ |
| FLAG-047 | Cancel-fill race — CANCEL_RACE_UNKNOWN + account_tx on-chain resolver, mark_filled_after_race atomic method | Apr 22 ✅ |
| FLAG-041 | halt.reason taxonomy leak — both clobber sites fixed (main_loop.py + run_paper_session.py) | Apr 21 ✅ |
| FLAG-027 | Pre-run backup + RPC preflight + signal handlers | Apr 17 ✅ |
| FLAG-033 | Startup DB integrity check (quick_check + refuse corrupt) | Apr 18 ✅ |
| FLAG-034 | Session summary shows fills-only XRP, not total | Apr 18 ✅ |
| FLAG-032 | net_deposits_rlusd_cum source verification (pre-injection gate) | Apr 17 ✅ |
| FLAG-031 | Capital basis model: profit/principal accounting | Apr 17 ✅ |
| FLAG-030 | Capital events not reaching inventory_ledger on rebuild | Apr 17 ✅ |
| FLAG-028 | Dashboard read-only connection + use_container_width deprecation | Apr 17 ✅ |
| FLAG-026 | Segment B isolation tooling | Apr 16 ✅ |
| FLAG-025 | Orphaned fill fix: startup cleanup pass | Apr 16 ✅ |
| FLAG-024 | Dashboard/engine calculation audit | Apr 16 ✅ |
| FLAG-013 | Uniform requote TTL + atomic offer replacement | Apr 16 ✅ |
| FLAG-012 | Stale resting orders not cancelled at session startup | Apr 14 ✅ |
| FLAG-011 | Undocumented config/code changes (VS Code Claude retired) | Apr 14 ✅ |
| FLAG-010 | Session shutdown summary not session-scoped | Apr 13 ✅ |
| FLAG-009 | Skew sign inversion — dashboard artifact, engine was correct | Apr 13 ✅ |
| FLAG-008 | WAC rebuild bug: capital injection + restart path | Apr 17 ✅ |
| FLAG-007 | Order size config disconnect (hardcoded base size) | Apr 13 ✅ |
| FLAG-006 | VW spread mid-price session boundary timing | Apr 13 ✅ |
| FLAG-005 | Participation filter cumulative bleed | Apr 13 ✅ |
| FLAG-004 | Sell fill mechanism diagnostic (superseded by Phase 6+ sessions) | Apr 16 ✅ |
| FLAG-003 | Avg fill age 421s (session_id filter shipped) | Apr 13 ✅ |
| FLAG-001 | Session Min Sell Dist 0.0 bps (18→16 offset locked) | Apr 13 ✅ |

Review Schedule

Claude reviews this file at the start of each session where a tuning or injection decision is being made.