Orion Tasking — fix/cancel-fill-race (FLAG-047)

Priority

SESSION-BLOCKING. S49 cannot run until this is fixed. Do this branch before continuing FLAG-046 ANCHOR_IDLE work.

Background

The CANCELLED_BY_ENGINE guard introduced in FLAG-037 was designed to prevent phantom fills from engine-cancelled orders being re-applied as real fills during reconciliation. It works correctly for that case.

However, the guard is ambiguous. The reconciler cannot distinguish between:

  1. Engine-cancelled order, cancel confirmed on-chain (correct skip — no fill occurred)
  2. Engine-cancelled order, but counterparty filled it before cancel arrived (incorrect skip — real fill occurred, never recorded)

In case 2, the sequence is:

  1. Engine enters DEGRADED → cancel_all fires
  2. Engine writes status=CANCELLED_BY_ENGINE + cancelled_at to DB before submitting cancel to gateway
  3. Counterparty fills the order on-chain before the cancel transaction hits the ledger
  4. Cancel transaction gets tecNO_ENTRY (offer already gone, consumed by fill)
  5. Reconciler sees offer gone, checks DB → finds CANCELLED_BY_ENGINE → fires RECONCILER_SKIP_ENGINE_CANCEL → returns with no inventory change
  6. Real fill (XRP bought, RLUSD sold) is never recorded
  7. Internal inventory drifts from on-chain truth
  8. Truth check fires → inventory_truth_halt

Evidence from S48 (07:06:24):

  - on_chain_xrp=79.507004 vs internal=72.189397 → delta_xrp=−7.317607 XRP
  - on_chain_rlusd=83.463491 vs internal=93.963491 → delta_rlusd=+10.5 RLUSD
  - 0 fills recorded by engine. Real fill of ~7.32 XRP / ~10.5 RLUSD was lost.
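The S48 figures can be re-derived as a quick sanity check. The sketch below assumes the sign convention delta = internal − on_chain, which is inferred from the quoted numbers rather than stated anywhere in the engine; the implied price is only an illustration of what the lost fill looked like.

```python
from decimal import Decimal

# Balances quoted in the S48 evidence (07:06:24).
on_chain = {"XRP": Decimal("79.507004"), "RLUSD": Decimal("83.463491")}
internal = {"XRP": Decimal("72.189397"), "RLUSD": Decimal("93.963491")}

# Assumed sign convention (inferred from the quoted deltas): internal - on_chain.
delta = {asset: internal[asset] - on_chain[asset] for asset in on_chain}

# Implied average price of the unrecorded fill, in RLUSD per XRP (roughly 1.43).
implied_price = abs(delta["RLUSD"]) / abs(delta["XRP"])

print(delta["XRP"])    # -7.317607  (engine holds less XRP than the ledger: XRP was bought)
print(delta["RLUSD"])  # 10.500000  (engine holds more RLUSD than the ledger: RLUSD was sold)
```

Both deltas point the same way: one buy of about 7.32 XRP paid for with 10.5 RLUSD, which matches a single missed fill rather than accumulated drift.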


Root Signal

When a cancel transaction returns tecNO_ENTRY, the offer was already gone at submit time — consumed by something other than the cancel. This is the key signal. A successful cancel returns tesSUCCESS with the offer removed; a race loss returns tecNO_ENTRY.


Required Fix — Option A

When a cancel transaction returns tecNO_ENTRY (or equivalent "not found" response), do not treat the order as successfully cancelled. Instead mark it CANCEL_RACE_UNKNOWN and trigger an on-chain tx history check to determine what actually happened to the offer.

Implementation sketch

In the cancel submission path (wherever cancel transactions are submitted to the gateway and their results processed):

  1. If cancel result == tesSUCCESS → current behavior (mark cancelled, no fill to record)
  2. If cancel result == tecNO_ENTRY → mark order status=CANCEL_RACE_UNKNOWN, log CANCEL_RACE_DETECTED with offer_sequence
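A minimal sketch of the two branches above, assuming the gateway hands back the result code as a string. `Order`, `apply_cancel_result` and `NOT_FOUND_RESULTS` are hypothetical names for illustration, not existing engine code. The set of "not found" codes is kept configurable because the fix allows for equivalent responses besides tecNO_ENTRY.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("cancel_path")

# Result codes treated as "offer not found at cancel time".
# Only tecNO_ENTRY is named in this tasking; extend if the gateway reports equivalents.
NOT_FOUND_RESULTS = frozenset({"tecNO_ENTRY"})


@dataclass
class Order:
    """Hypothetical stand-in for the engine's order record."""
    offer_sequence: int
    # Written to the DB before the cancel is submitted, per the Background section.
    status: str = "CANCELLED_BY_ENGINE"


def apply_cancel_result(order: Order, result_code: str) -> str:
    """Update the order status from the cancel transaction result."""
    if result_code == "tesSUCCESS":
        # Current behavior: the cancel landed, there is no fill to record.
        return order.status
    if result_code in NOT_FOUND_RESULTS:
        # The offer was gone before the cancel applied: possible fill race.
        order.status = "CANCEL_RACE_UNKNOWN"
        log.warning(
            "CANCEL_RACE_DETECTED offer_sequence=%s result=%s",
            order.offer_sequence,
            result_code,
        )
        return order.status
    # Any other result code is outside this tasking; status is left unchanged here.
    return order.status
```

The status is only ever downgraded from CANCELLED_BY_ENGINE to CANCEL_RACE_UNKNOWN, so the existing guard keeps working untouched for confirmed cancels.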

In the reconciler (wherever CANCELLED_BY_ENGINE is checked):

  1. If status=CANCEL_RACE_UNKNOWN → do NOT fire RECONCILER_SKIP_ENGINE_CANCEL. Instead, treat the disappeared offer as a normal disappeared order — run the standard on-chain tx history check for the offer_sequence.
  2. If tx history confirms fill → record as fill (normal fill path), log CANCEL_RACE_FILL_CONFIRMED
  3. If tx history confirms cancel (despite tecNO_ENTRY) → treat as cancelled, log CANCEL_RACE_CANCEL_CONFIRMED
  4. If tx history inconclusive → escalate to truth check (fail-closed)
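The reconciler branch could be sketched as follows. `Lookup`, `Action` and `reconcile_disappeared` are hypothetical names, and the on-chain lookup is passed in as a callable because the real method is still to be identified in pre-code. Treating a lookup that raises as inconclusive is an assumption made here to honor the fail-closed rule.

```python
import logging
from enum import Enum
from typing import Callable

log = logging.getLogger("reconciler")


class Lookup(Enum):
    """Possible results of the on-chain tx history check."""
    FILL = "fill"
    CANCEL = "cancel"
    INCONCLUSIVE = "inconclusive"


class Action(Enum):
    """What the reconciler should do with the disappeared offer."""
    SKIP = "skip"
    RECORD_FILL = "record_fill"
    MARK_CANCELLED = "mark_cancelled"
    TRUTH_CHECK = "truth_check"


# Lookup result -> (log token, reconciler action).
_OUTCOMES = {
    Lookup.FILL: ("CANCEL_RACE_FILL_CONFIRMED", Action.RECORD_FILL),
    Lookup.CANCEL: ("CANCEL_RACE_CANCEL_CONFIRMED", Action.MARK_CANCELLED),
    Lookup.INCONCLUSIVE: ("CANCEL_RACE_INCONCLUSIVE", Action.TRUTH_CHECK),
}


def reconcile_disappeared(
    status: str, offer_sequence: int, lookup: Callable[[int], Lookup]
) -> Action:
    """Decide how to handle an offer that is no longer on the ledger."""
    if status == "CANCELLED_BY_ENGINE":
        # Existing FLAG-037 guard: confirmed engine cancel, nothing to record.
        log.info("RECONCILER_SKIP_ENGINE_CANCEL offer_sequence=%s", offer_sequence)
        return Action.SKIP
    if status == "CANCEL_RACE_UNKNOWN":
        try:
            outcome = lookup(offer_sequence)
        except Exception:
            # Assumption: a failed lookup is treated as inconclusive (fail-closed).
            outcome = Lookup.INCONCLUSIVE
        token, action = _OUTCOMES[outcome]
        log.warning("%s offer_sequence=%s", token, offer_sequence)
        return action
    # Other statuses follow the existing reconciler path, not modeled in this sketch.
    raise NotImplementedError(f"status {status!r} is outside this sketch")
```

Keeping the token and the action in one table makes it hard to add a lookup outcome without also deciding what it logs and what it does.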

Log tokens required:

  - CANCEL_RACE_DETECTED: emitted when cancel returns tecNO_ENTRY
  - CANCEL_RACE_FILL_CONFIRMED: emitted when on-chain lookup confirms fill
  - CANCEL_RACE_CANCEL_CONFIRMED: emitted when on-chain lookup confirms cancel despite tecNO_ENTRY
  - CANCEL_RACE_INCONCLUSIVE: emitted when lookup is inconclusive

Fail-closed principle (Atlas invariant)

If the on-chain lookup cannot determine what happened → do not assume cancelled → surface for truth check. A missed fill is always worse than a halt.


Pre-Code Questions

Before writing code, answer these five questions in a pre-code findings handoff to Vesper:

Q1 — Cancel result inspection: Where in the current codebase is the cancel transaction result inspected after gateway submission? Identify the exact file and method. This is where the tecNO_ENTRY branch must be added.

Q2 — On-chain tx history: Is there an existing method for querying on-chain transaction history for a given offer_sequence? If yes, identify it. If no, assess what XRPL API call is needed and whether it's already used elsewhere in the codebase.

Q3 — Reconciler entry point: Confirm the exact location in the reconciler where CANCELLED_BY_ENGINE status is checked and RECONCILER_SKIP_ENGINE_CANCEL is emitted. This is where the CANCEL_RACE_UNKNOWN branch must be added.

Q4 — DB schema: Does the current orders table status column support a new CANCEL_RACE_UNKNOWN value? Any migration concerns? (Key/value pattern should be additive — confirm.)

Q5 — Interaction with FLAG-046: Does ANCHOR_IDLE entry also call cancel_all? If yes, the same race risk applies when ANCHOR_IDLE cancels orders and drift/corridor/truth fires shortly after. Confirm whether the fix applies uniformly to all cancel_all call sites or needs to be scoped.

Flag any architectural concerns before cutting the branch.
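For Q2, if no existing method turns up, one candidate is the XRPL account_tx method, scanning each transaction's metadata for the Offer ledger entry. The sketch below is a starting point for the assessment, not a finding. It works on plain (tx, meta) pairs so it does not depend on any client library, and it rests on assumptions to verify against the XRPL docs and real S48 data: that a consumed offer shows TakerGets/TakerPays under PreviousFields, and that a cancelled offer is deleted by an OfferCancel with no amount change. `classify_offer_history` and the account strings in the usage note are hypothetical.

```python
from typing import Any, Iterable, Iterator, Mapping, Tuple

Json = Mapping[str, Any]


def _offer_nodes(meta: Json, account: str, offer_sequence: int) -> Iterator[Tuple[str, Json]]:
    """Yield (node kind, node) for AffectedNodes entries touching the given offer."""
    for wrapper in meta.get("AffectedNodes", []):
        for kind, node in wrapper.items():  # CreatedNode / ModifiedNode / DeletedNode
            if node.get("LedgerEntryType") != "Offer":
                continue
            fields = node.get("FinalFields") or node.get("NewFields") or {}
            if fields.get("Account") == account and fields.get("Sequence") == offer_sequence:
                yield kind, node


def classify_offer_history(
    history: Iterable[Tuple[Json, Json]], account: str, offer_sequence: int
) -> str:
    """Return "fill", "cancel" or "inconclusive" for one offer."""
    saw_fill = False
    saw_cancel = False
    for tx, meta in history:
        if meta.get("TransactionResult") != "tesSUCCESS":
            continue  # failed transactions do not change offers
        for kind, node in _offer_nodes(meta, account, offer_sequence):
            if kind == "CreatedNode":
                continue  # offer placement, not its removal
            previous = node.get("PreviousFields") or {}
            if "TakerGets" in previous or "TakerPays" in previous:
                # Amounts changed: the offer was partly or fully consumed.
                saw_fill = True
            elif (
                kind == "DeletedNode"
                and tx.get("TransactionType") == "OfferCancel"
                and tx.get("Account") == account
                and tx.get("OfferSequence") == offer_sequence
            ):
                saw_cancel = True
    # Any fill evidence wins: a missed fill is worse than a redundant check.
    if saw_fill:
        return "fill"
    if saw_cancel:
        return "cancel"
    return "inconclusive"
```

An offer removed for other reasons (for example found unfunded, or replaced by another OfferCreate) falls through to "inconclusive" and therefore to the truth check. A real implementation would also need to extract the filled amounts from the metadata so the fill can be recorded through the normal fill path.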


Commit Sequence (proposed — adjust after pre-code)

Commit  Scope
C1      DB status value CANCEL_RACE_UNKNOWN + cancelled_at handling
C2      Cancel result inspection: tecNO_ENTRY branch → mark CANCEL_RACE_UNKNOWN, emit CANCEL_RACE_DETECTED
C3      Reconciler: CANCEL_RACE_UNKNOWN branch → on-chain lookup, CANCEL_RACE_FILL_CONFIRMED / _CANCEL_CONFIRMED / _INCONCLUSIVE
C4      Tests: race detected, fill confirmed, cancel confirmed, inconclusive, normal CANCELLED_BY_ENGINE unaffected

Standing Delivery Rules (enforced)

  1. No pre-creating branches during investigation. Cut the branch only when ready to commit.
  2. No *.patch glob in PowerShell. Use Get-ChildItem ... -Filter "*.patch" | Sort-Object Name | ForEach-Object { git am $_.FullName }.
  3. Always include defensive branch delete: git branch -D fix/cancel-fill-race before git checkout -b fix/cancel-fill-race.

Deliver to

Claude Homebase Neo\02 Projects\NEO Trading Engine\08 Patches\patches\fix-cancel-fill-race\

Standard delivery memo to Vesper when complete.


— Vesper, COO, BlueFly AI Enterprises