Vesper → Atlas — S58 Results + Migration Hold Request

Date: 2026-04-22
From: Vesper
To: Atlas
Re: FLAG-054 validation (S58), two open issues, migration sequencing


S58 Summary

Session ran 109 ticks / 616s. FLAG-054 validation objective confirmed: condition C did not fire once during the session.

Other metrics:
- Anchor: mean=+18.83 bps, 100% of readings >5 bps — persistent positive structural regime throughout
- Fills: 0
- Working orders at close: 0
- session_min_dist_to_ask=2.9 bps (sell side nearly touchable; engine had no order live at that moment)
- Truth checks: clean throughout (delta_xrp=0.00015, delta_rlusd=0.0)
- Capital: flat (198.47 RLUSD)

Engine behavior: the engine cycled in and out of ANCHOR_IDLE due to a persistent +15-21 bps positive CLOB-AMM divergence. FLAG-053 exit conditions did fire (orders were placed post-ANCHOR_IDLE), but the regime was sustained enough that the engine re-entered ANCHOR_IDLE each time. Total orders across 109 ticks: 8, spread over approximately 4 brief quoting windows.
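The cycling above can be pictured as a two-state machine. This is a hypothetical sketch, not the engine's code: the 15 bps entry threshold is an assumption inferred from the +15-21 bps regime, and `next_state` / `exit_conditions_met` are illustrative names.

```python
# Hypothetical sketch of the ANCHOR_IDLE cycling seen in S58.
# ANCHOR_IDLE_ENTER_BPS is an assumed threshold, not confirmed
# against the engine; S58 divergence ran +15-21 bps.
ANCHOR_IDLE_ENTER_BPS = 15.0


def next_state(current: str, divergence_bps: float, exit_conditions_met: bool) -> str:
    """One step of a simplified quote/idle state machine."""
    if current == "QUOTING" and divergence_bps >= ANCHOR_IDLE_ENTER_BPS:
        # Persistent positive CLOB-AMM divergence: pull quotes.
        return "ANCHOR_IDLE"
    if current == "ANCHOR_IDLE" and exit_conditions_met:
        # FLAG-053 exit conditions fired: resume quoting.
        return "QUOTING"
    return current
```

With sustained +15-21 bps divergence, each FLAG-053 exit is followed almost immediately by re-entry, which is consistent with only ~4 brief quoting windows across 109 ticks.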


Two Open Issues — Migration Hold Requested

Katja is not ready to migrate yet. Two issues need to be resolved first.


Issue 1: Reconciler Log Flood (FLAG-045)

What's happening: 18+ CANCELLED_BY_ENGINE orders from prior sessions are rescanned by the reconciler on every tick. Each tick produces ~36 log lines (2 per order: "Order disappeared" + RECONCILER_SKIP_ENGINE_CANCEL). Across S58's 109 ticks this generated ~3,900 noise lines, burying real events and making session analysis difficult.
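The ~3,900 figure follows directly from the per-tick scan; all inputs below are taken from the session numbers above:

```python
# Log-flood arithmetic for S58, using the figures reported above.
stale_orders = 18      # CANCELLED_BY_ENGINE orders rescanned every tick
lines_per_order = 2    # "Order disappeared" + RECONCILER_SKIP_ENGINE_CANCEL
ticks = 109            # S58 session length

noise_lines = stale_orders * lines_per_order * ticks
print(noise_lines)  # 3924, i.e. the ~3,900 noise lines reported
```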

Root cause: _get_orders_for_reconciliation() explicitly includes OrderStatus.CANCELLED_BY_ENGINE in its scan set. When the reconciler confirms one of these orders is gone from the ledger, it logs RECONCILER_SKIP_ENGINE_CANCEL and returns — but never retires the order from the scan set. It will be rescanned every tick until the engine process restarts and the DB is reloaded.

Fix (ready to apply): In _handle_disappeared_active_order, after RECONCILER_SKIP_ENGINE_CANCEL fires, call self._state.update_order_status(order.id, OrderStatus.CANCELED). This retires the order to a terminal state that is not included in the scan bucket. No inventory change. Patch script and 4 tests delivered.
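A minimal sketch of the proposed fix, assuming the method and enum names referenced above (`_handle_disappeared_active_order`, `update_order_status`, `OrderStatus.CANCELED`); the surrounding handler and class scaffolding are illustrative, not the engine's actual code:

```python
from enum import Enum, auto


class OrderStatus(Enum):
    # Illustrative subset of the engine's status enum.
    ACTIVE = auto()
    CANCELLED_BY_ENGINE = auto()
    CANCELED = auto()  # terminal; excluded from the reconciler scan set


class Reconciler:
    def __init__(self, state, log):
        self._state = state
        self._log = log

    def _handle_disappeared_active_order(self, order):
        if order.status is OrderStatus.CANCELLED_BY_ENGINE:
            self._log.info("RECONCILER_SKIP_ENGINE_CANCEL order_id=%s", order.id)
            # The fix: retire the order to a terminal state so it is
            # never rescanned on subsequent ticks. No inventory change.
            self._state.update_order_status(order.id, OrderStatus.CANCELED)
            return
        # ... existing handling for genuinely disappeared active orders ...
```

Without the `update_order_status` call, the early return leaves the order in CANCELLED_BY_ENGINE, so `_get_orders_for_reconciliation()` picks it up again on the next tick — exactly the flood described above.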

Atlas question: Is this fix approved as written, or do you want a different terminal state (e.g., a new CONFIRMED_CANCELLED_BY_ENGINE status)? Using CANCELED is consistent with how CANCEL_RACE_UNKNOWN → cancel confirmed resolves (also transitions to CANCELED), but it does conflate engine-initiated with externally-confirmed. Awaiting your call.


Issue 2: Dashboard / Working Orders Discrepancy

What's happening: During S58, tick logs showed buy_live_order_exists: true, sell_live_order_exists: true while Katja's dashboard showed no active orders. At session close, the summary reported Working BUY orders: 0 / Working SELL orders: 0.

Root cause (partial): two different definitions of "live" are in play:
- get_live_order_by_side() (used in the tick log) returns orders in any of 5 statuses: PENDING_SUBMISSION, SUBMITTED, ACTIVE, PARTIALLY_FILLED, CANCEL_PENDING
- summarize_paper_run.py's "Working orders" query counts only the ACTIVE status (1 of the 5)

So if an order is in SUBMITTED or CANCEL_PENDING, the tick log reports it as live while the summary counts zero working orders.
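The mismatch can be reproduced in isolation. The five status names come from the source; the two predicates are illustrative stand-ins for the real queries:

```python
from enum import Enum, auto


class OrderStatus(Enum):
    PENDING_SUBMISSION = auto()
    SUBMITTED = auto()
    ACTIVE = auto()
    PARTIALLY_FILLED = auto()
    CANCEL_PENDING = auto()


# Tick-log definition: any of the 5 in-flight statuses counts as "live"
# (mirrors get_live_order_by_side()).
LIVE_STATUSES = frozenset(OrderStatus)

# Summary definition: only ACTIVE counts as "working"
# (mirrors the summarize_paper_run.py query).
WORKING_STATUSES = frozenset({OrderStatus.ACTIVE})


def is_live(status: OrderStatus) -> bool:
    return status in LIVE_STATUSES


def is_working(status: OrderStatus) -> bool:
    return status in WORKING_STATUSES


# An order mid-submission is "live" to the tick log but invisible
# to the summary's working-order count:
print(is_live(OrderStatus.SUBMITTED))     # True
print(is_working(OrderStatus.SUBMITTED))  # False
```

Every status outside WORKING_STATUSES (4 of the 5) produces exactly the `*_live_order_exists: true` / `Working orders: 0` split observed in S58.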

The bigger picture: In S58 the summary was accurate at close (engine was in ANCHOR_IDLE with no orders). But the session had multiple ANCHOR_IDLE cycles, and there were ticks where orders 103476314/315 and 103476316/317 were genuinely active — yet the dashboard may not have reflected this correctly.

Atlas question: Should the session summary "Working orders" count use the same 5-status definition as get_live_order_by_side()? Or is active-only intentional? This affects how the dashboard reflects quoting activity during brief post-ANCHOR_IDLE windows.


Proposed Sequencing (Katja's position)

  1. Apply FLAG-045 fix, run tests, commit
  2. Run one more 10-minute session — evaluate with clean logs
  3. Confirm validation sane → push origin main → migrate

Katja's view: the terminal noise has been a recurring issue this session and the dashboard discrepancy is unresolved. One more short run with clean logs gives us a reliable read before committing to the server.

Per Atlas's prior instruction ("if that run is behaviorally sane, move to the server immediately after"), we are not disagreeing with the migration urgency — we are adding two diagnostic checkpoints before pulling that trigger.

Awaiting your ruling.

— Vesper