Vesper Review — Pre-Code Findings FLAG-047 (cancel-fill race)¶
Verdict: APPROVED — proceed to implementation¶
All five questions ruled below. 5-commit sequence approved. Branch may be cut.
Findings Review¶
Q1 (Cancel call site): Confirmed. _cancel_all_live_orders at main_loop.py:1281 discarding the CancelResponse is the exact hole. The fix shape (capture, branch on xrpl_result_code, transition to CANCEL_RACE_UNKNOWN) is correct and preserves the write-before-submit invariant from FLAG-037.
Q2 (On-chain tx history): account_tx is the right endpoint. The ±15-ledger window is sensible and easy to tune. The OfferResolution enum abstraction is the right design — keeps raw XRPL JSON out of the reconciler. Fail-closed to INCONCLUSIVE on any exception or ambiguity is Atlas-invariant compliant.
Q3 (Reconciler entry point): Correct. Three-point extension: _get_orders_for_reconciliation status filter, _reconcile_order dispatch, and the new branch at _handle_disappeared_active_order:760 BEFORE the CANCELLED_BY_ENGINE guard. Canonical evaluation order (CANCEL_RACE_UNKNOWN first, then CANCELLED_BY_ENGINE) is the right sequencing — a CANCEL_RACE_UNKNOWN order that slips through to the old guard would be mishandled.
Q4 (DB schema): Clean additive-only additions. _ensure_column idempotent migration pattern (matching FLAG-037 C5) is correct. Two new columns (cancel_race_detected_at, cancel_race_pivot_ledger) are the right split — timestamp for audit, ledger index for the account_tx window.
Q5 (FLAG-046 interaction): Confirmed — single-point fix at _cancel_all_live_orders covers DEGRADED, ANCHOR_IDLE (future), and shutdown cancel. Shutdown cancel fix is a welcome side effect. FLAG-046 branch inherits automatically — no additional work needed there.
Decision Rulings¶
D1 — tecNO_TARGET taxonomy correction¶
RULING: Accept tecNO_TARGET. Update tasking doc log-token names accordingly.
Orion's code inspection at xrpl_gateway.py:1220,1234 overrides the tasking doc assumption. CANCEL_RACE_DETECTED is fine as-is — it does not reference the result code by name. All branch code should use tecNO_TARGET throughout.
D2 — 5-commit sequence vs merge C1+C4¶
RULING: 5 commits. C4 as a standalone fill-size extraction commit.
The AffectedNodes TakerPays/TakerGets parser is the riskiest piece of this branch. Isolating it in C4 with its own fixture tests is the correct mitigation. Reviewability is also better — a standalone fill-size helper diff is much easier to audit than one buried inside the reconciler branch.
Approved sequence:
| # | Commit | Scope |
|---|---|---|
| C1 | Schema + gateway | CANCEL_RACE_UNKNOWN enum, cancel_race_detected_at + cancel_race_pivot_ledger columns, mark_cancel_race_unknown state_manager method, get_account_tx_for_offer gateway method, OfferResolution enum |
| C2 | Cancel result branch | _cancel_all_live_orders captures CancelResponse, branches on tecNO_TARGET, writes CANCEL_RACE_UNKNOWN, emits CANCEL_RACE_DETECTED |
| C3 | Reconciler branch | CANCEL_RACE_UNKNOWN branch before CANCELLED_BY_ENGINE guard. Status filter + dispatch extensions. Calls C1 gateway method. |
| C4 | Fill-size helper | AffectedNodes parser → (fill_size_xrp, fill_size_rlusd, fill_side). Isolated in xrpl_gateway.py with fixture tests. |
| C5 | Tests | tests/test_cancel_fill_race.py, 7 cases. |
D3 — mark_filled_after_race vs direct record_full_fill¶
RULING: Dedicated mark_filled_after_race method. Atomic status transition required.
A CANCEL_RACE_UNKNOWN → FILLED transition must be atomic: the status clear and the fill accounting must happen in a single transaction. record_full_fill does not know about CANCEL_RACE_UNKNOWN status and cannot perform the transition atomically. Dedicated method is the right call.
mark_filled_after_race(order_id, *, filled_at, fill_size_xrp, fill_size_rlusd) — single DB transaction: status → FILLED, filled_at populated, cancel_race_detected_at preserved for audit. The accounting side (capital events, inventory update) should route through the same call that record_full_fill uses, but the status lifecycle is owned by mark_filled_after_race exclusively.
D4 — Fill-size sourcing: on-chain-derived vs intended-size¶
RULING: On-chain-derived from AffectedNodes. No compromise on this one.
S48 evidence: delta_xrp=−7.317607, delta_rlusd=+10.5. If the order's stated size differed from what was actually executed, recording the intended size would re-introduce a truth divergence — the exact failure mode FLAG-047 is fixing. On-chain-derived accuracy is non-negotiable here. C4 exists precisely to isolate and test the parser risk. Use it.
D5 — S48 fixture: raw account_tx vs synthesized¶
RULING: Synthesize from S48 session log data.
Katja does not have the raw account_tx response from the S48 window archived. Synthesize a realistic AffectedNodes structure using the known S48 parameters: delta_xrp=−7.317607 XRP, delta_rlusd=+10.5 RLUSD, inferred price ≈1.4355 RLUSD/XRP (from the realignment output). The fixture must produce a valid DeletedNode of type Offer with PreviousFields.TakerPays/TakerGets encoding that matches those deltas. Correct shape + known values is a stronger regression test than a raw fixture we can't verify.
Non-Blocking Observations¶
-
Shutdown cancel race — Orion's note that the fix improves shutdown correctness as a side effect is correct and worth calling out in the C2 commit message. A stranded fill at shutdown is worse than a DEGRADED-race fill (it sits uncredited across session boundary). Include a comment in
_cancel_all_live_ordersnoting the shutdown-cancel race surface. -
cancel_race_pivot_ledgerpopulation — Confirm that C2 populates this field when writingCANCEL_RACE_UNKNOWN. The reconciler's account_tx call in C3 depends on it. If the pivot ledger is unavailable at C2 time (e.g., the CancelResponse doesn't include the validated ledger index), confirm the fallback (e.g.,None→ reconciler uses a wider default window). -
7-case test surface — The 7 cases are Atlas-level thoroughness. Approve all 7. Add a note in C5: test 5 (non-regression on CANCELLED_BY_ENGINE tesSUCCESS path) is a standing invariant — failure here means FLAG-037 was broken.
Post-Merge¶
After Katja applies patches:
1. python -m pytest tests/ -v — full regression
2. Confirm all 7 new test_cancel_fill_race.py tests pass
3. Confirm FLAG-037 C7 tests still pass (non-regression on CANCELLED_BY_ENGINE)
4. Report passing count to Vesper
5. S49 unblocked — run realignment dry-run before launch
— Vesper, COO, BlueFly AI Enterprises