Skip to content

Vesper Review — Pre-Code Findings FLAG-047 (cancel-fill race)

Verdict: APPROVED — proceed to implementation

All five questions ruled below. 5-commit sequence approved. Branch may be cut.


Findings Review

Q1 (Cancel call site): Confirmed. _cancel_all_live_orders at main_loop.py:1281 discarding the CancelResponse is the exact hole. The fix shape (capture, branch on xrpl_result_code, transition to CANCEL_RACE_UNKNOWN) is correct and preserves the write-before-submit invariant from FLAG-037.

Q2 (On-chain tx history): account_tx is the right endpoint. The ±15-ledger window is sensible and easy to tune. The OfferResolution enum abstraction is the right design — keeps raw XRPL JSON out of the reconciler. Fail-closed to INCONCLUSIVE on any exception or ambiguity is Atlas-invariant compliant.

Q3 (Reconciler entry point): Correct. Three-point extension: _get_orders_for_reconciliation status filter, _reconcile_order dispatch, and the new branch at _handle_disappeared_active_order:760 BEFORE the CANCELLED_BY_ENGINE guard. Canonical evaluation order (CANCEL_RACE_UNKNOWN first, then CANCELLED_BY_ENGINE) is the right sequencing — a CANCEL_RACE_UNKNOWN order that slips through to the old guard would be mishandled.

Q4 (DB schema): Clean additive-only additions. _ensure_column idempotent migration pattern (matching FLAG-037 C5) is correct. Two new columns (cancel_race_detected_at, cancel_race_pivot_ledger) are the right split — timestamp for audit, ledger index for the account_tx window.

Q5 (FLAG-046 interaction): Confirmed — single-point fix at _cancel_all_live_orders covers DEGRADED, ANCHOR_IDLE (future), and shutdown cancel. Shutdown cancel fix is a welcome side effect. FLAG-046 branch inherits automatically — no additional work needed there.


Decision Rulings

D1 — tecNO_TARGET taxonomy correction

RULING: Accept tecNO_TARGET. Update tasking doc log-token names accordingly.

Orion's code inspection at xrpl_gateway.py:1220,1234 overrides the tasking doc assumption. CANCEL_RACE_DETECTED is fine as-is — it does not reference the result code by name. All branch code should use tecNO_TARGET throughout.

D2 — 5-commit sequence vs merge C1+C4

RULING: 5 commits. C4 as a standalone fill-size extraction commit.

The AffectedNodes TakerPays/TakerGets parser is the riskiest piece of this branch. Isolating it in C4 with its own fixture tests is the correct mitigation. Reviewability is also better — a standalone fill-size helper diff is much easier to audit than one buried inside the reconciler branch.

Approved sequence: | # | Commit | Scope | |---|---|---| | C1 | Schema + gateway | CANCEL_RACE_UNKNOWN enum, cancel_race_detected_at + cancel_race_pivot_ledger columns, mark_cancel_race_unknown state_manager method, get_account_tx_for_offer gateway method, OfferResolution enum | | C2 | Cancel result branch | _cancel_all_live_orders captures CancelResponse, branches on tecNO_TARGET, writes CANCEL_RACE_UNKNOWN, emits CANCEL_RACE_DETECTED | | C3 | Reconciler branch | CANCEL_RACE_UNKNOWN branch before CANCELLED_BY_ENGINE guard. Status filter + dispatch extensions. Calls C1 gateway method. | | C4 | Fill-size helper | AffectedNodes parser → (fill_size_xrp, fill_size_rlusd, fill_side). Isolated in xrpl_gateway.py with fixture tests. | | C5 | Tests | tests/test_cancel_fill_race.py, 7 cases. |

D3 — mark_filled_after_race vs direct record_full_fill

RULING: Dedicated mark_filled_after_race method. Atomic status transition required.

A CANCEL_RACE_UNKNOWN → FILLED transition must be atomic: the status clear and the fill accounting must happen in a single transaction. record_full_fill does not know about CANCEL_RACE_UNKNOWN status and cannot perform the transition atomically. Dedicated method is the right call.

mark_filled_after_race(order_id, *, filled_at, fill_size_xrp, fill_size_rlusd) — single DB transaction: status → FILLED, filled_at populated, cancel_race_detected_at preserved for audit. The accounting side (capital events, inventory update) should route through the same call that record_full_fill uses, but the status lifecycle is owned by mark_filled_after_race exclusively.

D4 — Fill-size sourcing: on-chain-derived vs intended-size

RULING: On-chain-derived from AffectedNodes. No compromise on this one.

S48 evidence: delta_xrp=−7.317607, delta_rlusd=+10.5. If the order's stated size differed from what was actually executed, recording the intended size would re-introduce a truth divergence — the exact failure mode FLAG-047 is fixing. On-chain-derived accuracy is non-negotiable here. C4 exists precisely to isolate and test the parser risk. Use it.

D5 — S48 fixture: raw account_tx vs synthesized

RULING: Synthesize from S48 session log data.

Katja does not have the raw account_tx response from the S48 window archived. Synthesize a realistic AffectedNodes structure using the known S48 parameters: delta_xrp=−7.317607 XRP, delta_rlusd=+10.5 RLUSD, inferred price ≈1.4355 RLUSD/XRP (from the realignment output). The fixture must produce a valid DeletedNode of type Offer with PreviousFields.TakerPays/TakerGets encoding that matches those deltas. Correct shape + known values is a stronger regression test than a raw fixture we can't verify.


Non-Blocking Observations

  1. Shutdown cancel race — Orion's note that the fix improves shutdown correctness as a side effect is correct and worth calling out in the C2 commit message. A stranded fill at shutdown is worse than a DEGRADED-race fill (it sits uncredited across session boundary). Include a comment in _cancel_all_live_orders noting the shutdown-cancel race surface.

  2. cancel_race_pivot_ledger population — Confirm that C2 populates this field when writing CANCEL_RACE_UNKNOWN. The reconciler's account_tx call in C3 depends on it. If the pivot ledger is unavailable at C2 time (e.g., the CancelResponse doesn't include the validated ledger index), confirm the fallback (e.g., None → reconciler uses a wider default window).

  3. 7-case test surface — The 7 cases are Atlas-level thoroughness. Approve all 7. Add a note in C5: test 5 (non-regression on CANCELLED_BY_ENGINE tesSUCCESS path) is a standing invariant — failure here means FLAG-037 was broken.


Post-Merge

After Katja applies patches: 1. python -m pytest tests/ -v — full regression 2. Confirm all 7 new test_cancel_fill_race.py tests pass 3. Confirm FLAG-037 C7 tests still pass (non-regression on CANCELLED_BY_ENGINE) 4. Report passing count to Vesper 5. S49 unblocked — run realignment dry-run before launch


— Vesper, COO, BlueFly AI Enterprises