Orion Delivery — fix/cancel-fill-race (FLAG-047)
Vesper — branch complete per your APPROVED pre-code ruling. Cancel-fill race closed end-to-end: tecNO_TARGET cancel response → CANCEL_RACE_UNKNOWN state → reconciler on-chain resolution via account_tx → atomic mark_filled_after_race or confirmed-cancel terminalisation, fail-closed to DEGRADED on inconclusive. Ready for review + merge. S49 unblocked once merged.
Branch
fix/cancel-fill-race (off main at 1a1a63c), 5 commits, +2021 / −5 across 6 files.
Commits
| # | Hash | Subject |
|---|---|---|
| C1 | a3c104f | feat(models,state_manager,xrpl_gateway): FLAG-047 C1 — CANCEL_RACE_UNKNOWN scaffolding + account_tx helper |
| C2 | 3e242cc | feat(main_loop): FLAG-047 C2 — cancel-fill race detection branch in _cancel_all_live_orders |
| C4 | 908354b | feat(ledger_reconciler,state_manager): FLAG-047 C4 — AffectedNodes fill parser + mark_filled_after_race atomic method |
| C3 | 9446bba | feat(ledger_reconciler,main_loop): FLAG-047 C3 — reconciler CANCEL_RACE_UNKNOWN resolution branch |
| C5 | 9e650c6 | test(ledger_reconciler): FLAG-047 C5 — cancel-fill race resolution tests incl. S48 fixture |
Order on the branch is C1 → C2 → C4 → C3 → C5. C4 lands before C3 so that the reconciler branch in C3 can call the parser (parse_offer_fill_from_affected_nodes) and the state-manager method (mark_filled_after_race) that C4 introduces. This preserves D2's 5-commit structure — C4 is still standalone, with its own isolated parser + atomic DB method diff — just sequenced ahead of its consumer instead of after it.
Spec compliance — Vesper ruling 2026-04-22
D1 — tecNO_TARGET taxonomy: Used throughout (C2, tests, docstrings). Log token CANCEL_RACE_DETECTED emitted on successful demotion to CANCEL_RACE_UNKNOWN.
D2 — 5-commit sequence: Delivered. C4 is a standalone fill-size helper commit with its own parser + atomic-fill method + OfferFillDelta / OfferResolution / OfferResolutionResult dataclasses. Reviewability preserved — the parser diff is ~470 lines in one place.
D3 — mark_filled_after_race: Dedicated method in state_manager.py:1211. Single SQL transaction: clears CANCEL_RACE_UNKNOWN status → FILLED, stamps filled_at, writes the fills row, preserves cancel_race_detected_at for audit. Does NOT route through record_full_fill. Capital / inventory accounting happens outside the atomic DB block (via engine._inventory.apply_fill) mirroring the existing _persist_fill pattern — InventoryManager holds in-memory WAC state that can't roll back with SQL, and FLAG-037's record_full_fill has the same separation.
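The single-transaction shape described in D3 can be sketched as follows. This is a minimal illustration, assuming a SQLite backend and hypothetical table and column names (`orders.id`, `orders.status`, a `fills` table); the real schema and signature in state_manager.py may differ.

```python
import sqlite3


def mark_filled_after_race_sketch(conn, order_id, filled_at,
                                  fill_size_xrp, fill_size_rlusd, price):
    """Flip CANCEL_RACE_UNKNOWN -> FILLED and write the fill row atomically.

    Table and column names are assumptions made for this sketch.
    """
    with conn:  # commits on success, rolls back if anything below raises
        cur = conn.execute(
            "UPDATE orders SET status = 'FILLED', filled_at = ? "
            "WHERE id = ? AND status = 'CANCEL_RACE_UNKNOWN'",
            (filled_at, order_id),
        )
        if cur.rowcount != 1:
            # Order missing or not in the race state: abort the transaction.
            raise ValueError(f"order {order_id!r} is not CANCEL_RACE_UNKNOWN")
        conn.execute(
            "INSERT INTO fills (order_id, fill_size_xrp, fill_size_rlusd, "
            "price_rlusd_per_xrp, filled_at) VALUES (?, ?, ?, ?, ?)",
            (order_id, fill_size_xrp, fill_size_rlusd, price, filled_at),
        )
    # cancel_race_detected_at is never written here, so the audit stamp survives.
```

In-memory inventory updates stay outside the `with conn:` block for the reason given above: they cannot be rolled back together with the SQL.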
D4 — On-chain-derived fill size: Non-negotiable, honored. parse_offer_fill_from_affected_nodes in ledger_reconciler.py:414 walks AffectedNodes looking for DeletedNode with LedgerEntryType == "Offer" and matching FinalFields.Account + Sequence. Side inferred from TakerPays shape (dict = RLUSD = SELL; string = XRP drops = BUY). No reliance on order.quantity — if the on-chain delta differs from the intended size, the on-chain number wins.
D5 — S48 fixture synthesized: Test 1 (test_s48_sell_fixture_resolves_cancel_race_to_filled) reproduces the S48 07:06:24 race. Synthesized DeletedNode AffectedNodes block with TakerPays={currency=RLUSD, issuer=..., value="10.5"}, TakerGets="7317607" (drops), Account=ENGINE_ACCOUNT, Sequence=offer_sequence. Inferred price ≈1.4349 RLUSD/XRP (10.5 / 7.317607). Full end-to-end pass: CANCEL_RACE_UNKNOWN order with offer_sequence + pivot_ledger → reconciler sees synthesized account_tx response → parse_offer_fill_from_affected_nodes returns xrp_delta=−7.317607, rlusd_delta=+10.5 → mark_filled_after_race applies atomically → engine._inventory.apply_fill(...) called → result.full_fills == 1 → CANCEL_RACE_FILL_CONFIRMED log emitted.
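The matching and side-inference rules from D4, applied to amounts shaped like the D5 fixture, can be sketched as below. The helper names (`find_deleted_offer`, `infer_side`, `signed_deltas`, `price_rlusd_per_xrp`) are hypothetical and not the real parser's API. The sketch also takes the two amounts as direct arguments, because this memo does not state which field block of the DeletedNode the real parser reads them from.

```python
from decimal import Decimal
from typing import Any, Optional, Tuple

DROPS_PER_XRP = Decimal(1_000_000)


def find_deleted_offer(meta: dict, account: str, sequence: int) -> Optional[dict]:
    """Return the DeletedNode for our Offer, or None if it is not in this tx."""
    for wrapper in meta.get("AffectedNodes", []):
        node = wrapper.get("DeletedNode")
        if not node or node.get("LedgerEntryType") != "Offer":
            continue
        fields = node.get("FinalFields", {})
        if fields.get("Account") == account and fields.get("Sequence") == sequence:
            return node
    return None


def infer_side(taker_pays: Any) -> str:
    """dict = issued currency (RLUSD) = SELL; string = XRP drops = BUY."""
    if isinstance(taker_pays, dict):
        return "SELL"
    if isinstance(taker_pays, str):
        return "BUY"
    raise ValueError(f"unexpected TakerPays shape: {type(taker_pays).__name__}")


def signed_deltas(taker_pays: Any, taker_gets: Any) -> Tuple[Decimal, Decimal]:
    """Return (xrp_delta, rlusd_delta) from the offer owner's point of view."""
    if infer_side(taker_pays) == "SELL":
        # Owner receives RLUSD (TakerPays) and gives up XRP (TakerGets, in drops).
        xrp = Decimal(taker_gets) / DROPS_PER_XRP
        return -xrp, Decimal(taker_pays["value"])
    # BUY: owner receives XRP (TakerPays, in drops) and gives up RLUSD (TakerGets).
    xrp = Decimal(taker_pays) / DROPS_PER_XRP
    return xrp, -Decimal(taker_gets["value"])


def price_rlusd_per_xrp(xrp_delta: Decimal, rlusd_delta: Decimal) -> Decimal:
    """Unsigned fill price implied by the two deltas."""
    return abs(rlusd_delta) / abs(xrp_delta)
```

Feeding the S48 amounts (TakerPays value "10.5", TakerGets "7317607") through `signed_deltas` yields an XRP delta of −7.317607 and an RLUSD delta of +10.5, matching the fixture.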
Architecture summary
Detection (C2, main_loop.py:1442): _cancel_all_live_orders already captured CancelResponse from execution_engine.confirm_cancel(...). On xrpl_result_code == "tecNO_TARGET", the order is demoted from CANCELLED_BY_ENGINE (written by FLAG-037) to CANCEL_RACE_UNKNOWN via state.mark_cancel_race_unknown(order_id, pivot_ledger=cancel_resp.ledger_index). A CANCEL_RACE_DETECTED WARNING log captures offer_sequence, side, cancel_tx_hash, pivot_ledger, context. sent += 1 — the cancel reached the ledger; only the meaning is ambiguous.
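A minimal sketch of the detection branch, assuming a hypothetical `CancelResponse` shape that carries only the two fields named above and an in-memory stand-in for the state manager. It is not the code in main_loop.py; it only shows the gate on the result code and the pivot-ledger pass-through.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, Optional

log = logging.getLogger("cancel_race_sketch")


@dataclass
class CancelResponse:
    """Hypothetical shape: only the fields the memo names."""
    xrpl_result_code: str
    ledger_index: Optional[int] = None


@dataclass
class FakeState:
    """Stand-in for the state manager, recording demotions in memory."""
    races: Dict[str, Optional[int]] = field(default_factory=dict)

    def mark_cancel_race_unknown(self, order_id: str, pivot_ledger: Optional[int]) -> None:
        self.races[order_id] = pivot_ledger


def inspect_cancel_response(state: FakeState, order_id: str, resp: CancelResponse) -> bool:
    """Demote the order when the cancel found no target; return True if demoted."""
    if resp.xrpl_result_code != "tecNO_TARGET":
        return False  # ordinary cancel flow is left untouched
    # pivot_ledger may be None when the response carried no ledger_index.
    state.mark_cancel_race_unknown(order_id, pivot_ledger=resp.ledger_index)
    log.warning("CANCEL_RACE_DETECTED order_id=%s pivot_ledger=%s",
                order_id, resp.ledger_index)
    return True
```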
Resolution (C3, ledger_reconciler.py:943): New dispatch in _reconcile_order routes CANCEL_RACE_UNKNOWN before CANCELLED_BY_ENGINE and before the ACTIVE / snapshot-compare disappeared-order branch — by construction the offer is absent from the ledger_map, so the existing disappeared-order path would otherwise phantom-fill against the wrong truth source. _get_orders_for_reconciliation status tuple extended to include CANCEL_RACE_UNKNOWN.
On-chain lookup (C1, xrpl_gateway.py:1256): get_account_tx_for_offer(account, offer_sequence, min_ledger=None) wraps AccountTx RPC. Returns the list of transaction entries in the scan window. Fail-closed: exceptions (RPC unavailable, xrpl-py missing, network error) are caught at the reconciler boundary and collapse to INCONCLUSIVE. CANCEL_RACE_LOOKUP_LEDGER_MARGIN = 3 scopes the window to [pivot_ledger − 3, +∞) when pivot is present; when C2 couldn't capture a pivot (cancel response had no ledger_index), the reconciler passes min_ledger=None and the RPC returns its default window (heavier but correct).
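The window arithmetic and the fail-closed boundary can be sketched as below. `scan_min_ledger` and `lookup_or_inconclusive` are hypothetical names, and `fetch` stands in for the gateway call; returning `None` to signal INCONCLUSIVE is a convention chosen for this sketch, not necessarily the reconciler's.

```python
from typing import Any, Callable, List, Optional

CANCEL_RACE_LOOKUP_LEDGER_MARGIN = 3


def scan_min_ledger(pivot_ledger: Optional[int]) -> Optional[int]:
    """Lower bound of the account_tx scan, or None for the RPC default window."""
    if pivot_ledger is None:
        return None
    return pivot_ledger - CANCEL_RACE_LOOKUP_LEDGER_MARGIN


def lookup_or_inconclusive(
    fetch: Callable[..., List[Any]],
    account: str,
    offer_sequence: int,
    pivot_ledger: Optional[int],
) -> Optional[List[Any]]:
    """Call the gateway; collapse any failure to None (treated as INCONCLUSIVE)."""
    try:
        return fetch(account, offer_sequence,
                     min_ledger=scan_min_ledger(pivot_ledger))
    except Exception:
        # Fail closed: an unreachable RPC must never look like "no fill found".
        return None
```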
Three-way resolution (C3 _handle_cancel_race_unknown): Walk the tx entries. For each:
- TransactionType == "OfferCreate" with tesSUCCESS and AffectedNodes matching our offer_sequence → parse fill delta via parse_offer_fill_from_affected_nodes.
- TransactionType == "OfferCancel" with tx.Account == engine_account, meta.TransactionResult == "tesSUCCESS", and OfferSequence match → record cancel_tx_hash.
- Otherwise ignore (cross-traffic, other accounts, other sequences).
Resolution priority (FILLED beats CANCELLED): If any parse returned a fill delta, the order resolves FILLED regardless of a trailing successful cancel-tx match. Reasoning: our own cancel only ever returned tecNO_TARGET if the fill landed first — a later successful cancel on the same sequence would be a replay of the original cancel, not new information. Documented at the resolution block; covered by Test 8 test_filled_wins_over_cancelled_when_both_present.
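The walk and the FILLED-beats-CANCELLED priority can be sketched as follows. The sketch assumes each entry is a dict with `tx` and `meta` keys and that the transaction hash lives at `tx["hash"]`; the real response layout may differ. `parse_fill` is a stand-in for parse_offer_fill_from_affected_nodes, whose signature is not reproduced here, and keeping only the first parsed fill is a simplification.

```python
from enum import Enum
from typing import Any, Callable, Iterable, Optional, Tuple


class Resolution(Enum):
    FILLED = "FILLED"
    CANCELLED = "CANCELLED"
    INCONCLUSIVE = "INCONCLUSIVE"


def resolve_cancel_race(
    entries: Iterable[dict],
    engine_account: str,
    offer_sequence: int,
    parse_fill: Callable[[dict], Optional[Any]],
) -> Tuple[Resolution, Optional[Any]]:
    """Classify a tx window; payload is the fill delta or the cancel tx hash."""
    fill = None
    cancel_tx_hash = None
    for entry in entries:
        tx, meta = entry.get("tx", {}), entry.get("meta", {})
        if meta.get("TransactionResult") != "tesSUCCESS":
            continue  # failed transactions carry no resolution evidence
        kind = tx.get("TransactionType")
        if kind == "OfferCreate" and fill is None:
            fill = parse_fill(meta)  # None when our offer is not in this tx
        elif (kind == "OfferCancel"
              and tx.get("Account") == engine_account
              and tx.get("OfferSequence") == offer_sequence):
            cancel_tx_hash = tx.get("hash")
        # Anything else is cross-traffic and is ignored.
    if fill is not None:
        return Resolution.FILLED, fill  # a fill wins even if a cancel also matched
    if cancel_tx_hash is not None:
        return Resolution.CANCELLED, cancel_tx_hash
    return Resolution.INCONCLUSIVE, None
```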
Terminal transitions:
- FILLED: state.mark_filled_after_race(order_id, filled_at=..., fill_size_xrp=..., fill_size_rlusd=..., price_rlusd_per_xrp=...) (atomic) → engine._inventory.apply_fill(...) → log CANCEL_RACE_FILL_CONFIRMED → result.full_fills += 1.
- CANCELLED: state.update_order_status(order_id, OrderStatus.CANCELED, cancel_tx_hash=...) (generic update — execution_engine.confirm_cancel requires CANCEL_PENDING, doesn't apply here) → log CANCEL_RACE_CANCEL_CONFIRMED → result.cancels_confirmed += 1.
- INCONCLUSIVE: order stays CANCEL_RACE_UNKNOWN (non-terminal) → log CANCEL_RACE_INCONCLUSIVE → result.cancel_races += 1 → engine_signal = DEGRADED if not already HALTED. Fail-closed per Atlas invariant — engine cannot prove alignment → engine stops acting.
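The counter updates and the fail-closed signal rule from the three transitions above can be sketched as below. `ReconcileResult` here is a hypothetical holder whose counter names follow this memo; the `"OK"` baseline signal is a placeholder, and the database writes and log calls are omitted.

```python
from dataclasses import dataclass


@dataclass
class ReconcileResult:
    """Hypothetical result holder; counter names follow the memo."""
    full_fills: int = 0
    cancels_confirmed: int = 0
    cancel_races: int = 0
    engine_signal: str = "OK"  # placeholder baseline, not taken from the source


def apply_outcome(result: ReconcileResult, outcome: str) -> ReconcileResult:
    """Update counters for one resolved order; escalate only on INCONCLUSIVE."""
    if outcome == "FILLED":
        result.full_fills += 1
    elif outcome == "CANCELLED":
        result.cancels_confirmed += 1
    elif outcome == "INCONCLUSIVE":
        result.cancel_races += 1
        # Fail closed, but never downgrade an existing HALTED signal.
        if result.engine_signal != "HALTED":
            result.engine_signal = "DEGRADED"
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return result
```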
FLAG-046 (ANCHOR_IDLE) interaction: No additional work needed. The single fix-site is _cancel_all_live_orders, which is called from every cancel path — DEGRADED entry, ANCHOR_IDLE entry (future), and shutdown. ANCHOR_IDLE's reduced exposure does not eliminate the race (drift / corridor / truth entries still cancel-all), but the coverage is automatic.
Tests
- New: 12 tests in tests/test_flag_047_cancel_fill_race.py, all green. Exceeds ≥8 mandate.
- Core resolution (10 tests):
  - S48 SELL fixture → FILLED (the canonical regression for the S48 07:06:24 race).
  - BUY-direction parser → positive XRP delta.
  - CANCELLED path → CANCELED status, no fill recorded.
  - INCONCLUSIVE (empty tx history) → stays CANCEL_RACE_UNKNOWN, engine_signal == DEGRADED.
  - INCONCLUSIVE (non-matching transactions: different account, different sequence, wrong TransactionType).
  - INCONCLUSIVE (no gateway wired on the reconciler; legacy 2-arg ctor path stays safe).
  - INCONCLUSIVE (gateway raises; exception caught as INCONCLUSIVE, no crash).
  - FILLED wins over CANCELLED when both are present in the tx window.
  - pivot_ledger margin bounds the account_tx scan: verifies CANCEL_RACE_LOOKUP_LEDGER_MARGIN math via gateway call-arg spy.
  - CANCEL_RACE_UNKNOWN routes first: even with a stale AccountOffer still present in the ledger_map, the race resolver dispatches before the disappeared-order branch. First-check position verified.
- Parser unit tests (bonus, 2 tests):
  - parse_offer_fill_from_affected_nodes returns exact S48 deltas for a synthesized SELL fixture (delta_xrp=−7.317607, delta_rlusd=+10.5).
  - parse_offer_fill_from_affected_nodes returns None when DeletedNode.FinalFields.Account doesn't match (the offer was the counterparty's, not ours).
- Windows-safe teardown: tempdir fixture closes StateManager before tmpdir.cleanup(), mirroring the FLAG-037 C4 pattern.
- Regression (FLAG-047-adjacent FLAG suite): 106/106 green across test_flag_047_cancel_fill_race (12), test_reconciler_cancelled_by_engine (5; FLAG-037 CANCELLED_BY_ENGINE guard non-regression), test_reconciler_conservative (10; FLAG-037 age-gate non-regression), test_reconciler_anomaly_log (11), test_flag_044_recovery_cooldown (26), test_flag_042_degraded_recovery (16), test_flag_036_wallet_truth_reconciliation (21).
- Regression (full suite, sandbox): 337 failed / 648 passed. All 337 failures are the pre-existing OrderSizeConfig.__init__() missing 1 required positional argument: 'max_size_pct_of_portfolio' and ModuleNotFoundError: No module named 'plotly' signatures, the same failure set as on main (confirmed by stash + git checkout main dry-run during verification). No net-new failures vs. baseline.
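The Windows-safe teardown ordering can be sketched as below, using a plain `sqlite3` connection as a stand-in for StateManager and a hypothetical `temp_state_db` context manager rather than the suite's actual fixture. The key point is the order in the `finally` block: Windows refuses to delete a file that still has an open handle.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_state_db():
    """Yield (connection, db_path) inside a temp dir that is removed afterwards."""
    tmpdir = tempfile.TemporaryDirectory()
    db_path = os.path.join(tmpdir.name, "state.db")
    conn = sqlite3.connect(db_path)
    try:
        yield conn, db_path
    finally:
        conn.close()      # release the file handle first
        tmpdir.cleanup()  # only then delete the directory
```

Usage in a test body would look like `with temp_state_db() as (conn, path): ...`; the same ordering applies if the helper is wrapped in a pytest fixture.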
Run commands (sandbox reproducible):
# Minimum: new tests + adjacent FLAG suite (all 106 should pass)
python -m pytest tests/test_flag_047_cancel_fill_race.py \
tests/test_reconciler_cancelled_by_engine.py \
tests/test_reconciler_conservative.py \
tests/test_reconciler_anomaly_log.py \
tests/test_flag_044_recovery_cooldown.py \
tests/test_flag_042_degraded_recovery.py \
tests/test_flag_036_wallet_truth_reconciliation.py -q
# FLAG-047 alone (12 tests, fastest sanity)
python -m pytest tests/test_flag_047_cancel_fill_race.py -v
Non-blocking observations (from Vesper's pre-code ruling)
1. Shutdown cancel race: _cancel_all_live_orders is called on every shutdown path (duration halt, HALT escalations, session end). The C2 branch now correctly demotes any shutdown-cancel race to CANCEL_RACE_UNKNOWN — the order survives the session boundary and the reconciler resolves it when the next session starts (startup reconcile pass) or via realign_inventory_to_onchain.py in the pre-session standing procedure. Inline comment at main_loop.py:1428-1441 calls out the shutdown-cancel surface explicitly.
2. cancel_race_pivot_ledger population: C2 writes cancel_resp.ledger_index through. When the gateway could not return a validated ledger index (RPC path that doesn't populate it), C2 passes None; C3's reconciler branch handles None pivot by setting min_ledger=None on the account_tx call — the RPC returns its default window (heavier but correct). Covered by Test 9 test_pivot_ledger_margin_bounds_account_tx_window.
3. Test 5 (FLAG-037 non-regression): The 5 CANCELLED_BY_ENGINE tests in test_reconciler_cancelled_by_engine.py still pass — the dispatch ordering change in _reconcile_order (CANCEL_RACE_UNKNOWN routes first, CANCELLED_BY_ENGINE routes second) preserves all FLAG-037 behavior. A CANCELLED_BY_ENGINE order never enters the CANCEL_RACE_UNKNOWN branch because the status values are distinct.
Deviation — commit order (C4 before C3)
The tasking / Vesper's ruling enumerated commits as C1 → C2 → C3 → C4 → C5. Delivered order is C1 → C2 → C4 → C3 → C5.
Why: C3's _handle_cancel_race_unknown calls both parse_offer_fill_from_affected_nodes and state.mark_filled_after_race — both introduced in C4. Sequencing C4 before C3 means every intermediate commit passes pytest --collect-only and imports cleanly. If C3 landed before C4, the reconciler branch would call symbols that don't exist until a later commit — a reviewer running git bisect would hit an ImportError at C3.
C4's scope is unchanged from the pre-code ruling — standalone parser + atomic state-manager method with their own diff surface. Reviewability is preserved. Flagging the order swap explicitly so the numbering in the pre-code memo doesn't mislead during review.
Backward compatibility
- LedgerReconciler.__init__ adds an optional gateway: Optional[XRPLGateway] = None parameter. Legacy 2-arg callers (LedgerReconciler(engine, state)) still work: Test 6 exercises this path and confirms the reconciler handles CANCEL_RACE_UNKNOWN orders by treating them as INCONCLUSIVE when no gateway is wired. No test fixture anywhere in the existing suite hits this path because nothing else constructs a LedgerReconciler with a CANCEL_RACE_UNKNOWN order.
- _handle_cancel_race_unknown type-only imports XRPLGateway under TYPE_CHECKING to avoid a circular import (xrpl_gateway imports models, which is loaded during ledger_reconciler init).
- No schema migrations with data loss: cancel_race_detected_at and cancel_race_pivot_ledger are additive TEXT / INTEGER columns on orders, added with the idempotent _ensure_column pattern (matching FLAG-037 C5).
- All FLAG-037 guard behavior preserved. All FLAG-036 wallet-truth semantics preserved. All FLAG-042 / FLAG-044 DEGRADED recovery cool-down semantics preserved (INCONCLUSIVE emits a source=cancel_race DEGRADED entry which is uncapped; it's a truth-integrity source, same class as wallet_truth).
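An idempotent add-column helper of the kind referred to above can be sketched as follows, assuming a SQLite database. `ensure_column` is a hypothetical stand-in and not the project's `_ensure_column`; the table and column names are interpolated directly, so they must be trusted constants.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` if it is missing; return True when it was added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False  # already migrated: running again is a no-op
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True
```

Calling it twice with the same arguments adds the column once and leaves existing rows untouched, which is what makes the migration safe to run at every startup.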
Files touched
neo_engine/models.py | 113 ++ (CANCEL_RACE_UNKNOWN enum member, OrderPersisted cancel_race
fields, OfferResolution enum, OfferFillDelta,
OfferResolutionResult dataclasses)
neo_engine/state_manager.py | 232 ++ (mark_cancel_race_unknown, mark_filled_after_race atomic,
schema _ensure_column for cancel_race_* columns,
get_order_by_id + hydrate for new fields)
neo_engine/xrpl_gateway.py | 182 ++ (get_account_tx_for_offer wrapping AccountTx RPC)
neo_engine/main_loop.py | 95 ++ (C2 cancel result inspection branch in _cancel_all_live_orders,
C3 gateway threading into LedgerReconciler)
neo_engine/ledger_reconciler.py | 534 ++ (CANCEL_RACE_LOOKUP_LEDGER_MARGIN, parse_offer_fill_from_affected_nodes,
CANCEL_RACE_UNKNOWN dispatch in _reconcile_order,
_handle_cancel_race_unknown resolution helper)
tests/test_flag_047_cancel_fill_race.py | 870 ++ (new — 12 tests incl. S48 fixture)
Operator impact
- Healthy sessions (no tecNO_TARGET from any cancel): zero observable change. All new code paths gated on xrpl_result_code == "tecNO_TARGET" at C2 and status == CANCEL_RACE_UNKNOWN at C3. An unchanged cancel flow still writes CANCELLED_BY_ENGINE via the FLAG-037 path.
- Cancel-fill race during a DEGRADED entry: cancel returns tecNO_TARGET → order goes to CANCEL_RACE_UNKNOWN → CANCEL_RACE_DETECTED WARNING logged → reconciler runs on the next tick → account_tx lookup → FILLED (deltas recorded atomically, CANCEL_RACE_FILL_CONFIRMED INFO log), CANCELLED (terminal, benign, CANCEL_RACE_CANCEL_CONFIRMED INFO log), or INCONCLUSIVE (CANCEL_RACE_INCONCLUSIVE WARNING log, engine signals DEGRADED). Truth check continues to run each tick: INCONCLUSIVE orders that actually were filled will produce a truth delta on the next truth check and escalate to inventory_truth_halt as designed.
- Cancel-fill race during shutdown: cancel returns tecNO_TARGET → order goes to CANCEL_RACE_UNKNOWN → session closes with the order in that state. Next session startup: reconciler's startup pass (or the pre-session realign_inventory_to_onchain.py --dry-run standing procedure) walks the CANCEL_RACE_UNKNOWN orders via _get_orders_for_reconciliation and runs the same account_tx resolution. S48 would have caught its phantom fill via this startup path if the engine had already been running FLAG-047.
- Gateway unavailable / account_tx RPC failing: reconciler catches the exception, leaves order in CANCEL_RACE_UNKNOWN, emits CANCEL_RACE_INCONCLUSIVE WARNING with the exception detail, and signals DEGRADED. Engine does not act while the account_tx path is broken. This is the right failure mode: fail-closed per Atlas invariant, engine cannot prove alignment → engine stops acting.
What this buys in the S48-class regression: the S48 07:06:24 phantom fill (7.317607 XRP / 10.5 RLUSD delta that fired inventory_truth_halt) now resolves to a real recorded fill via mark_filled_after_race on the reconciler pass that immediately follows the cancel race. The truth check delta goes to zero; no inventory_truth_halt; capital accounting stays aligned with on-chain truth. S48-style sessions can continue running through DEGRADED cycles without session-killing truth divergence.
Apply instructions (Windows / PowerShell)
Patches live at 02 Projects/NEO Trading Engine/08 Patches/patches/fix-cancel-fill-race/ (5 files, 0001 → 0005). From Katja's VS Code terminal:
cd C:\Users\Katja\Documents\NEO GitHub\neo-2026
git checkout main
git pull
# Defensive: clear any pre-existing branch from a prior attempt.
git branch -D fix/cancel-fill-race 2>$null
git checkout -b fix/cancel-fill-race
Get-ChildItem "C:\Users\Katja\Documents\Claude Homebase Neo\02 Projects\NEO Trading Engine\08 Patches\patches\fix-cancel-fill-race" -Filter "*.patch" |
Sort-Object Name |
ForEach-Object { git am $_.FullName }
# Verify — 5 commits expected
git log --oneline main..HEAD
# Expected (topmost 5):
# 9e650c6 test(ledger_reconciler): FLAG-047 C5 — cancel-fill race resolution tests incl. S48 fixture
# 9446bba feat(ledger_reconciler,main_loop): FLAG-047 C3 — reconciler CANCEL_RACE_UNKNOWN resolution branch
# 908354b feat(ledger_reconciler,state_manager): FLAG-047 C4 — AffectedNodes fill parser + mark_filled_after_race atomic method
# 3e242cc feat(main_loop): FLAG-047 C2 — cancel-fill race detection branch in _cancel_all_live_orders
# a3c104f feat(models,state_manager,xrpl_gateway): FLAG-047 C1 — CANCEL_RACE_UNKNOWN scaffolding + account_tx helper
# Regression — FLAG-047 + adjacent FLAG suite (106 expected)
python -m pytest tests/test_flag_047_cancel_fill_race.py `
tests/test_reconciler_cancelled_by_engine.py `
tests/test_reconciler_conservative.py `
tests/test_reconciler_anomaly_log.py `
tests/test_flag_044_recovery_cooldown.py `
tests/test_flag_042_degraded_recovery.py `
tests/test_flag_036_wallet_truth_reconciliation.py -q
# Expected: 106 passed
# FLAG-047 alone (12 tests)
python -m pytest tests/test_flag_047_cancel_fill_race.py -v
# Expected: 12 passed
Prerequisite: branch applies on top of main at 1a1a63c (FLAG-037 CANCELLED_BY_ENGINE layer merged). If main is ahead of that commit but still has FLAG-037 landed (merge 1a1a63c), the patches apply cleanly — all the FLAG-047 hunks are net-new surface or additive next to the FLAG-037 block. If main lost FLAG-037's CANCELLED_BY_ENGINE status from OrderStatus, C1 will fail to apply.
Post-merge
After Katja applies patches and confirms the 106-test adjacent-suite green:
1. Mark FLAG-047 as CLOSED in [C] Open Flags.md.
2. Run python tools/realign_inventory_to_onchain.py --config config/config_live_stage1.yaml --db neo_live_stage1.db --dry-run before S49 — expected delta ~0.00 XRP / 0.00 RLUSD (post-S48 realignment already ran per CLAUDE.md).
3. S49 launch with --duration-seconds 7200 per FLAG-044 standing procedure.
4. Watch for CANCEL_RACE_DETECTED / CANCEL_RACE_FILL_CONFIRMED / CANCEL_RACE_CANCEL_CONFIRMED / CANCEL_RACE_INCONCLUSIVE log tokens. Any INCONCLUSIVE warrants a Slack-adjacent note to this thread — that's the path where the gateway couldn't resolve on-chain and the engine went DEGRADED defensively.
5. Session integrity: if S49 logs a CANCEL_RACE_FILL_CONFIRMED and truth check stays clean throughout, FLAG-047 is confirmed working in production. If a race slips to inventory_truth_halt anyway, the CANCEL_RACE_INCONCLUSIVE log will precede it in the event stream — tell us and we dig in.
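For step 4, a small tally of the four log tokens can make the S49 review quicker. The sketch below assumes nothing about the log line format beyond the token appearing as a substring; `count_race_tokens` is a hypothetical helper, not part of the engine.

```python
from typing import Dict, Iterable

RACE_TOKENS = (
    "CANCEL_RACE_DETECTED",
    "CANCEL_RACE_FILL_CONFIRMED",
    "CANCEL_RACE_CANCEL_CONFIRMED",
    "CANCEL_RACE_INCONCLUSIVE",
)


def count_race_tokens(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many log lines mention each cancel-race token."""
    counts = {token: 0 for token in RACE_TOKENS}
    for line in lines:
        for token in RACE_TOKENS:
            if token in line:
                counts[token] += 1
    return counts
```

A non-zero CANCEL_RACE_INCONCLUSIVE count is the case that warrants the note described in step 4; every detected race should eventually be matched by one of the three resolution tokens.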
Status
C1–C5 complete. Branch is clean. One documented deviation (commit order C4-before-C3 for import safety) flagged above. All 5 Vesper rulings honored. 12 tests passing (exceeds ≥8 mandate). All 106 adjacent-FLAG tests green. Awaiting your review.
— Orion