Orion Delivery - fix/cancel-fill-race (FLAG-047)

Vesper — branch complete per your APPROVED pre-code ruling. Cancel-fill race closed end-to-end: tecNO_TARGET cancel response → CANCEL_RACE_UNKNOWN state → reconciler on-chain resolution via account_tx → atomic mark_filled_after_race or confirmed-cancel terminalisation, fail-closed to DEGRADED on inconclusive. Ready for review + merge. S49 unblocked once merged.

Branch

fix/cancel-fill-race (off main at 1a1a63c), 5 commits, +2021 / −5 across 6 files.

Commits

#	Hash	Subject
C1	`a3c104f`	feat(models,state_manager,xrpl_gateway): FLAG-047 C1 — CANCEL_RACE_UNKNOWN scaffolding + account_tx helper
C2	`3e242cc`	feat(main_loop): FLAG-047 C2 — cancel-fill race detection branch in _cancel_all_live_orders
C4	`908354b`	feat(ledger_reconciler,state_manager): FLAG-047 C4 — AffectedNodes fill parser + mark_filled_after_race atomic method
C3	`9446bba`	feat(ledger_reconciler,main_loop): FLAG-047 C3 — reconciler CANCEL_RACE_UNKNOWN resolution branch
C5	`9e650c6`	test(ledger_reconciler): FLAG-047 C5 — cancel-fill race resolution tests incl. S48 fixture

Order on the branch is C1 → C2 → C4 → C3 → C5. C4 lands before C3 so that the reconciler branch in C3 can call the parser (parse_offer_fill_from_affected_nodes) and the state-manager method (mark_filled_after_race) that C4 introduces. This preserves D2's 5-commit structure — C4 is still standalone, with its own isolated parser + atomic DB method diff — just sequenced ahead of its consumer instead of after it.

Spec compliance - Vesper ruling 2026-04-22

D1 — tecNO_TARGET taxonomy: Used throughout (C2, tests, docstrings). Log token CANCEL_RACE_DETECTED emitted on successful demotion to CANCEL_RACE_UNKNOWN.

D2 — 5-commit sequence: Delivered. C4 is a standalone fill-size helper commit with its own parser + atomic-fill method + OfferFillDelta / OfferResolution / OfferResolutionResult dataclasses. Reviewability preserved — the parser diff is ~470 lines in one place.

D3 — mark_filled_after_race: Dedicated method in state_manager.py:1211. Single SQL transaction: clears CANCEL_RACE_UNKNOWN status → FILLED, stamps filled_at, writes the fills row, preserves cancel_race_detected_at for audit. Does NOT route through record_full_fill. Capital / inventory accounting happens outside the atomic DB block (via engine._inventory.apply_fill) mirroring the existing _persist_fill pattern — InventoryManager holds in-memory WAC state that can't roll back with SQL, and FLAG-037's record_full_fill has the same separation.

D4 — On-chain-derived fill size: Non-negotiable, honored. parse_offer_fill_from_affected_nodes in ledger_reconciler.py:414 walks AffectedNodes looking for DeletedNode with LedgerEntryType == "Offer" and matching FinalFields.Account + Sequence. Side inferred from TakerPays shape (dict = RLUSD = SELL; string = XRP drops = BUY). No reliance on order.quantity — if the on-chain delta differs from the intended size, the on-chain number wins.

D5 - S48 fixture synthesized: Test 1 (test_s48_sell_fixture_resolves_cancel_race_to_filled) reproduces the S48 07:06:24 race. Synthesized DeletedNode AffectedNodes block with TakerPays={currency=RLUSD, issuer=..., value="10.5"}, TakerGets="7317607" (drops), Account=ENGINE_ACCOUNT, Sequence=offer_sequence. Inferred price ≈1.4349 RLUSD/XRP (10.5 / 7.317607). Full end-to-end pass: CANCEL_RACE_UNKNOWN order with offer_sequence + pivot_ledger → reconciler sees synthesized account_tx response → parse_offer_fill_from_affected_nodes returns xrp_delta=−7.317607, rlusd_delta=+10.5 → mark_filled_after_race applies atomically → engine._inventory.apply_fill(...) called → result.full_fills == 1 → CANCEL_RACE_FILL_CONFIRMED log emitted.
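
A minimal sketch of the D4 parsing rule applied to the D5 fixture values, for reviewers who want to see the shape of the logic without opening the diff. It is not the branch's code: `parse_fill_sketch`, `FillDeltaSketch`, the placeholder account / issuer strings, and the sequence number 101 are all hypothetical. It also assumes the amounts sit in FinalFields, as in the synthesized fixture; on real ledger metadata the pre-fill amounts may sit in PreviousFields, and that handling is omitted here.

```python
from decimal import Decimal
from typing import NamedTuple, Optional

DROPS_PER_XRP = Decimal(1_000_000)


class FillDeltaSketch(NamedTuple):
    """Hypothetical stand-in for OfferFillDelta."""
    side: str
    xrp_delta: Decimal
    rlusd_delta: Decimal


def parse_fill_sketch(affected_nodes, account, offer_sequence) -> Optional[FillDeltaSketch]:
    """Return the fill delta for our deleted Offer node, or None if absent."""
    for node in affected_nodes:
        deleted = node.get("DeletedNode")
        if not deleted or deleted.get("LedgerEntryType") != "Offer":
            continue
        fields = deleted.get("FinalFields", {})
        if fields.get("Account") != account or fields.get("Sequence") != offer_sequence:
            continue  # someone else's offer, or a different sequence
        pays, gets = fields.get("TakerPays"), fields.get("TakerGets")
        if isinstance(pays, dict) and isinstance(gets, str):
            # TakerPays is an issued-currency dict (RLUSD): we sold XRP.
            return FillDeltaSketch("SELL", -Decimal(gets) / DROPS_PER_XRP, Decimal(pays["value"]))
        if isinstance(pays, str) and isinstance(gets, dict):
            # TakerPays is a drops string (XRP): we bought XRP.
            return FillDeltaSketch("BUY", Decimal(pays) / DROPS_PER_XRP, -Decimal(gets["value"]))
    return None


ENGINE_ACCOUNT = "rENGINE_PLACEHOLDER"  # placeholder, not a real address
s48_nodes = [{
    "DeletedNode": {
        "LedgerEntryType": "Offer",
        "FinalFields": {
            "Account": ENGINE_ACCOUNT,
            "Sequence": 101,  # illustrative offer_sequence
            "TakerPays": {"currency": "RLUSD", "issuer": "rISSUER_PLACEHOLDER", "value": "10.5"},
            "TakerGets": "7317607",
        },
    }
}]

delta = parse_fill_sketch(s48_nodes, ENGINE_ACCOUNT, 101)
price = delta.rlusd_delta / -delta.xrp_delta  # RLUSD per XRP
print(delta, price)
```

Using Decimal keeps the drops-to-XRP conversion exact, which matters when the result is compared against on-chain balances.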

Architecture summary

Detection (C2, main_loop.py:1442): _cancel_all_live_orders already captured CancelResponse from execution_engine.confirm_cancel(...). On xrpl_result_code == "tecNO_TARGET", the order is demoted from CANCELLED_BY_ENGINE (written by FLAG-037) to CANCEL_RACE_UNKNOWN via state.mark_cancel_race_unknown(order_id, pivot_ledger=cancel_resp.ledger_index). A CANCEL_RACE_DETECTED WARNING log captures offer_sequence, side, cancel_tx_hash, pivot_ledger, context. sent += 1 — the cancel reached the ledger; only the meaning is ambiguous.
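
The detection step can be sketched as below. This is an illustration under assumptions, not the C2 diff: `CancelResponseSketch`, `StateSketch`, and `demote_on_cancel_race` are hypothetical names, the ledger index 1000 is made up, and only `mark_cancel_race_unknown`, `xrpl_result_code`, `ledger_index`, and the CANCEL_RACE_DETECTED token come from the memo.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Optional

log = logging.getLogger("cancel_race_sketch")


@dataclass
class CancelResponseSketch:
    """Hypothetical stand-in for the CancelResponse the memo refers to."""
    xrpl_result_code: str
    ledger_index: Optional[int] = None


class StateSketch:
    """Toy in-memory state store used only for this illustration."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}
        self.pivot_ledger: Dict[str, Optional[int]] = {}

    def mark_cancel_race_unknown(self, order_id: str, pivot_ledger: Optional[int]) -> None:
        self.status[order_id] = "CANCEL_RACE_UNKNOWN"
        self.pivot_ledger[order_id] = pivot_ledger


def demote_on_cancel_race(state: StateSketch, order_id: str, resp: CancelResponseSketch) -> bool:
    """Demote the order when the cancel found no target; return True if demoted."""
    if resp.xrpl_result_code != "tecNO_TARGET":
        return False  # ordinary cancel outcome: leave the existing status alone
    state.mark_cancel_race_unknown(order_id, pivot_ledger=resp.ledger_index)
    log.warning("CANCEL_RACE_DETECTED order_id=%s pivot_ledger=%s", order_id, resp.ledger_index)
    return True


state = StateSketch()
state.status["o1"] = "CANCELLED_BY_ENGINE"  # written earlier by the FLAG-037 path
demoted = demote_on_cancel_race(state, "o1", CancelResponseSketch("tecNO_TARGET", ledger_index=1000))
print(demoted, state.status["o1"], state.pivot_ledger["o1"])
```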

Resolution (C3, ledger_reconciler.py:943): New dispatch in _reconcile_order routes CANCEL_RACE_UNKNOWN before CANCELLED_BY_ENGINE and before the ACTIVE / snapshot-compare disappeared-order branch — by construction the offer is absent from the ledger_map, so the existing disappeared-order path would otherwise phantom-fill against the wrong truth source. _get_orders_for_reconciliation status tuple extended to include CANCEL_RACE_UNKNOWN.
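
A small sketch of why the dispatch order matters. The function and the branch labels it returns are hypothetical; only the status names and the first-check position of CANCEL_RACE_UNKNOWN are taken from the memo.

```python
def pick_reconcile_branch(status: str, offer_in_ledger_map: bool) -> str:
    """Return a label for the branch that would handle this order."""
    # Checked first: the race resolver must win even if a stale offer is
    # still visible, and must never fall through to the disappeared path.
    if status == "CANCEL_RACE_UNKNOWN":
        return "cancel_race_resolver"
    if status == "CANCELLED_BY_ENGINE":
        return "cancelled_by_engine_guard"
    if not offer_in_ledger_map:
        return "disappeared_order"
    return "snapshot_compare"


# The race status routes to the resolver whether or not the offer is in the map.
branches = [pick_reconcile_branch("CANCEL_RACE_UNKNOWN", present) for present in (False, True)]
print(branches)
```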

On-chain lookup (C1, xrpl_gateway.py:1256): get_account_tx_for_offer(account, offer_sequence, min_ledger=None) wraps AccountTx RPC. Returns the list of transaction entries in the scan window. Fail-closed: exceptions (RPC unavailable, xrpl-py missing, network error) are caught at the reconciler boundary and collapse to INCONCLUSIVE. CANCEL_RACE_LOOKUP_LEDGER_MARGIN = 3 scopes the window to [pivot_ledger − 3, +∞) when pivot is present; when C2 couldn't capture a pivot (cancel response had no ledger_index), the reconciler passes min_ledger=None and the RPC returns its default window (heavier but correct).
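
The window arithmetic is simple enough to show directly. In this sketch `scan_min_ledger` is a hypothetical helper name; the constant and the None fallback are as described above.

```python
from typing import Optional

CANCEL_RACE_LOOKUP_LEDGER_MARGIN = 3


def scan_min_ledger(pivot_ledger: Optional[int]) -> Optional[int]:
    """Lower bound for the account_tx scan, or None for the RPC default window."""
    if pivot_ledger is None:
        return None  # no pivot captured: fall back to the default window
    return pivot_ledger - CANCEL_RACE_LOOKUP_LEDGER_MARGIN


print(scan_min_ledger(1000), scan_min_ledger(None))
```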

Three-way resolution (C3 _handle_cancel_race_unknown): Walk the tx entries. For each:
- TransactionType == "OfferCreate" with tesSUCCESS and AffectedNodes matching our offer_sequence → parse fill delta via parse_offer_fill_from_affected_nodes.
- TransactionType == "OfferCancel" with tx.Account == engine_account, meta.TransactionResult == "tesSUCCESS", and OfferSequence match → record cancel_tx_hash.
- Otherwise ignore (cross-traffic, other accounts, other sequences).

Resolution priority (FILLED beats CANCELLED): If any parse returned a fill delta, the order resolves FILLED regardless of a trailing successful cancel-tx match. Reasoning: our own cancel only ever returned tecNO_TARGET if the fill landed first — a later successful cancel on the same sequence would be a replay of the original cancel, not new information. Documented at the resolution block; covered by Test 8 test_filled_wins_over_cancelled_when_both_present.
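
The walk and the priority rule can be sketched together. This is an illustration, not the C3 code: `classify_sketch` and `_deleted_our_offer` are hypothetical, the entry layout (a "tx" dict and a "meta" dict per entry, with the hash on "tx") is an assumed response shape, and the fill match is reduced to a yes/no check rather than a parsed delta.

```python
from typing import Any, Dict, List, Optional, Tuple


def _deleted_our_offer(meta: Dict[str, Any], account: str, offer_sequence: int) -> bool:
    """True if the metadata deletes an Offer node owned by us with this sequence."""
    for node in meta.get("AffectedNodes", []):
        deleted = node.get("DeletedNode", {})
        fields = deleted.get("FinalFields", {})
        if (deleted.get("LedgerEntryType") == "Offer"
                and fields.get("Account") == account
                and fields.get("Sequence") == offer_sequence):
            return True
    return False


def classify_sketch(entries: List[Dict[str, Any]], account: str,
                    offer_sequence: int) -> Tuple[str, Optional[str]]:
    """Return (resolution, cancel_tx_hash); FILLED outranks CANCELLED."""
    fill_seen = False
    cancel_tx_hash: Optional[str] = None
    for entry in entries:
        tx, meta = entry.get("tx", {}), entry.get("meta", {})
        if meta.get("TransactionResult") != "tesSUCCESS":
            continue
        tx_type = tx.get("TransactionType")
        if tx_type == "OfferCreate" and _deleted_our_offer(meta, account, offer_sequence):
            fill_seen = True
        elif (tx_type == "OfferCancel" and tx.get("Account") == account
                and tx.get("OfferSequence") == offer_sequence):
            cancel_tx_hash = tx.get("hash")
        # anything else is cross-traffic and is ignored
    if fill_seen:
        return "FILLED", None
    if cancel_tx_hash is not None:
        return "CANCELLED", cancel_tx_hash
    return "INCONCLUSIVE", None


ACCOUNT, SEQ = "rENGINE_PLACEHOLDER", 101  # placeholders
fill_entry = {
    "tx": {"TransactionType": "OfferCreate", "Account": "rTAKER_PLACEHOLDER"},
    "meta": {"TransactionResult": "tesSUCCESS", "AffectedNodes": [
        {"DeletedNode": {"LedgerEntryType": "Offer",
                         "FinalFields": {"Account": ACCOUNT, "Sequence": SEQ}}}]},
}
cancel_entry = {
    "tx": {"TransactionType": "OfferCancel", "Account": ACCOUNT,
           "OfferSequence": SEQ, "hash": "CANCEL_HASH_PLACEHOLDER"},
    "meta": {"TransactionResult": "tesSUCCESS", "AffectedNodes": []},
}
print(classify_sketch([fill_entry, cancel_entry], ACCOUNT, SEQ))
```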

Terminal transitions:
- FILLED: state.mark_filled_after_race(order_id, filled_at=..., fill_size_xrp=..., fill_size_rlusd=..., price_rlusd_per_xrp=...) (atomic) → engine._inventory.apply_fill(...) → log CANCEL_RACE_FILL_CONFIRMED → result.full_fills += 1.
- CANCELLED: state.update_order_status(order_id, OrderStatus.CANCELED, cancel_tx_hash=...) (generic update; execution_engine.confirm_cancel requires CANCEL_PENDING, doesn't apply here) → log CANCEL_RACE_CANCEL_CONFIRMED → result.cancels_confirmed += 1.
- INCONCLUSIVE: order stays CANCEL_RACE_UNKNOWN (non-terminal) → log CANCEL_RACE_INCONCLUSIVE → result.cancel_races += 1 → engine_signal = DEGRADED if not already HALTED. Fail-closed per Atlas invariant: engine cannot prove alignment → engine stops acting.
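
What "atomic" means for the FILLED transition can be illustrated with sqlite3. This is a sketch under assumptions: the table layout, column names other than cancel_race_detected_at, the placeholder timestamps, and the function name are hypothetical, and the real method lives in state_manager.py.

```python
import sqlite3


def mark_filled_after_race_sketch(conn, order_id, filled_at, xrp, rlusd, price):
    """Flip the status and write the fills row in one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the two writes land together or not at all.
    with conn:
        cur = conn.execute(
            "UPDATE orders SET status = 'FILLED', filled_at = ? "
            "WHERE id = ? AND status = 'CANCEL_RACE_UNKNOWN'",
            (filled_at, order_id),
        )
        if cur.rowcount != 1:
            raise ValueError(f"order {order_id} is not in CANCEL_RACE_UNKNOWN")
        conn.execute(
            "INSERT INTO fills (order_id, xrp, rlusd, price, filled_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (order_id, xrp, rlusd, price, filled_at),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, "
             "filled_at TEXT, cancel_race_detected_at TEXT)")
conn.execute("CREATE TABLE fills (order_id TEXT, xrp TEXT, rlusd TEXT, "
             "price TEXT NOT NULL, filled_at TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO orders VALUES (?, 'CANCEL_RACE_UNKNOWN', NULL, 'detected-t0')",
        [("o1",), ("o2",)],
    )

# Happy path: both writes commit; cancel_race_detected_at is left untouched.
mark_filled_after_race_sketch(conn, "o1", "filled-t1", "-1.0", "2.0", "2.0")

# Failure path: the INSERT violates NOT NULL, so the UPDATE is rolled back too.
try:
    mark_filled_after_race_sketch(conn, "o2", "filled-t2", "-1.0", "2.0", None)
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall())
```

The in-memory inventory update is deliberately absent from the transaction here, matching the separation described in D3.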

FLAG-046 (ANCHOR_IDLE) interaction: No additional work needed. The single fix-site is _cancel_all_live_orders, which is called from every cancel path — DEGRADED entry, ANCHOR_IDLE entry (future), and shutdown. ANCHOR_IDLE's reduced exposure does not eliminate the race (drift / corridor / truth entries still cancel-all), but the coverage is automatic.

Tests

New: 12 tests in tests/test_flag_047_cancel_fill_race.py, all green. Exceeds ≥8 mandate.
Core resolution (10 tests):
1. S48 SELL fixture → FILLED (the canonical regression for the S48 07:06:24 race).
2. BUY-direction parser → positive XRP delta.
3. CANCELLED path → CANCELED status, no fill recorded.
4. INCONCLUSIVE (empty tx history) → stays CANCEL_RACE_UNKNOWN, engine_signal == DEGRADED.
5. INCONCLUSIVE (non-matching transactions — different account, different sequence, wrong TransactionType).
6. INCONCLUSIVE (no gateway wired on the reconciler — legacy 2-arg ctor path stays safe).
7. INCONCLUSIVE (gateway raises — exception caught as INCONCLUSIVE, no crash).
8. FILLED wins over CANCELLED when both are present in the tx window.
9. pivot_ledger margin bounds the account_tx scan — verifies CANCEL_RACE_LOOKUP_LEDGER_MARGIN math via gateway call-arg spy.
10. CANCEL_RACE_UNKNOWN routes first — even with a stale AccountOffer still present in the ledger_map, the race resolver dispatches before the disappeared-order branch. First-check position verified.
Parser unit tests (bonus, 2 tests):
1. parse_offer_fill_from_affected_nodes returns exact S48 deltas for a synthesized SELL fixture (delta_xrp=−7.317607, delta_rlusd=+10.5).
2. parse_offer_fill_from_affected_nodes returns None when DeletedNode.FinalFields.Account doesn't match — the offer was the counterparty's, not ours.
Windows-safe teardown: tempdir fixture closes StateManager before tmpdir.cleanup(), mirroring the FLAG-037 C4 pattern.
Regression (FLAG-047-adjacent FLAG suite): 106/106 green across test_flag_047_cancel_fill_race (12), test_reconciler_cancelled_by_engine (5 — FLAG-037 CANCELLED_BY_ENGINE guard non-regression), test_reconciler_conservative (10 — FLAG-037 age-gate non-regression), test_reconciler_anomaly_log (11), test_flag_044_recovery_cooldown (26), test_flag_042_degraded_recovery (16), test_flag_036_wallet_truth_reconciliation (21).
Regression (full suite, sandbox): 337 failed / 648 passed. All 337 failures are the pre-existing OrderSizeConfig.__init__() missing 1 required positional argument: 'max_size_pct_of_portfolio' and ModuleNotFoundError: No module named 'plotly' signatures — same failure set as on main (confirmed by stash + git checkout main dry-run during verification). No net-new failures vs. baseline.

Run commands (sandbox reproducible):

# Minimum: new tests + adjacent FLAG suite (all 106 should pass)
python -m pytest tests/test_flag_047_cancel_fill_race.py \
  tests/test_reconciler_cancelled_by_engine.py \
  tests/test_reconciler_conservative.py \
  tests/test_reconciler_anomaly_log.py \
  tests/test_flag_044_recovery_cooldown.py \
  tests/test_flag_042_degraded_recovery.py \
  tests/test_flag_036_wallet_truth_reconciliation.py -q

# FLAG-047 alone (12 tests, fastest sanity)
python -m pytest tests/test_flag_047_cancel_fill_race.py -v

Non-blocking observations (from Vesper's pre-code ruling)

1. Shutdown cancel race: _cancel_all_live_orders is called on every shutdown path (duration halt, HALT escalations, session end). The C2 branch now correctly demotes any shutdown-cancel race to CANCEL_RACE_UNKNOWN — the order survives the session boundary and the reconciler resolves it when the next session starts (startup reconcile pass) or via realign_inventory_to_onchain.py in the pre-session standing procedure. Inline comment at main_loop.py:1428-1441 calls out the shutdown-cancel surface explicitly.

2. cancel_race_pivot_ledger population: C2 writes cancel_resp.ledger_index through. When the gateway could not return a validated ledger index (RPC path that doesn't populate it), C2 passes None; C3's reconciler branch handles None pivot by setting min_ledger=None on the account_tx call — the RPC returns its default window (heavier but correct). Covered by Test 9 test_pivot_ledger_margin_bounds_account_tx_window.

3. Test 5 (FLAG-037 non-regression): The 5 CANCELLED_BY_ENGINE tests in test_reconciler_cancelled_by_engine.py still pass — the dispatch ordering change in _reconcile_order (CANCEL_RACE_UNKNOWN routes first, CANCELLED_BY_ENGINE routes second) preserves all FLAG-037 behavior. A CANCELLED_BY_ENGINE order never enters the CANCEL_RACE_UNKNOWN branch because the status values are distinct.

Deviation - commit order (C4 before C3)

The tasking / Vesper's ruling enumerated commits as C1 → C2 → C3 → C4 → C5. Delivered order is C1 → C2 → C4 → C3 → C5.

Why: C3's _handle_cancel_race_unknown calls both parse_offer_fill_from_affected_nodes and state.mark_filled_after_race — both introduced in C4. Sequencing C4 before C3 means every intermediate commit passes pytest --collect-only and imports cleanly. If C3 landed before C4, the reconciler branch would call symbols that don't exist until a later commit — a reviewer running git bisect would hit an ImportError at C3.

C4's scope is unchanged from the pre-code ruling — standalone parser + atomic state-manager method with their own diff surface. Reviewability is preserved. Flagging the order swap explicitly so the numbering in the pre-code memo doesn't mislead during review.

Backward compatibility

LedgerReconciler.__init__ adds an optional gateway: Optional[XRPLGateway] = None parameter. Legacy 2-arg callers (LedgerReconciler(engine, state)) still work — Test 6 exercises this path and confirms the reconciler handles CANCEL_RACE_UNKNOWN orders by treating them as INCONCLUSIVE when no gateway is wired. No test fixture anywhere in the existing suite hits this path because nothing else constructs a LedgerReconciler with a CANCEL_RACE_UNKNOWN order.
_handle_cancel_race_unknown uses a type-only import of XRPLGateway under TYPE_CHECKING to avoid a circular import (xrpl_gateway imports models, which is loaded during ledger_reconciler init).
No schema migrations with data loss — cancel_race_detected_at and cancel_race_pivot_ledger are additive TEXT / INTEGER columns on orders, idempotent _ensure_column pattern (matching FLAG-037 C5).
All FLAG-037 guard behavior preserved. All FLAG-036 wallet-truth semantics preserved. All FLAG-042 / FLAG-044 DEGRADED recovery cool-down semantics preserved (INCONCLUSIVE emits a source=cancel_race DEGRADED entry which is uncapped — it's a truth-integrity source, same class as wallet_truth).
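
For readers unfamiliar with the idempotent column pattern mentioned above, a minimal sqlite3 sketch follows. `ensure_column` here is a hypothetical free function written for illustration; the branch's `_ensure_column` may differ in signature and detail.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> None:
    """Add the column only if it is missing, so repeated calls are no-ops."""
    # Identifiers are interpolated directly; pass trusted constants only.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")

# Running the migration twice must not raise "duplicate column name".
for _ in range(2):
    ensure_column(conn, "orders", "cancel_race_detected_at", "TEXT")
    ensure_column(conn, "orders", "cancel_race_pivot_ledger", "INTEGER")

columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
print(columns)
```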

Files touched

neo_engine/models.py                         | 113 ++   (CANCEL_RACE_UNKNOWN enum member, OrderPersisted cancel_race
                                                          fields, OfferResolution enum, OfferFillDelta,
                                                          OfferResolutionResult dataclasses)
neo_engine/state_manager.py                  | 232 ++   (mark_cancel_race_unknown, mark_filled_after_race atomic,
                                                          schema _ensure_column for cancel_race_* columns,
                                                          get_order_by_id + hydrate for new fields)
neo_engine/xrpl_gateway.py                   | 182 ++   (get_account_tx_for_offer wrapping AccountTx RPC)
neo_engine/main_loop.py                      |  95 ++   (C2 cancel result inspection branch in _cancel_all_live_orders,
                                                          C3 gateway threading into LedgerReconciler)
neo_engine/ledger_reconciler.py              | 534 ++   (CANCEL_RACE_LOOKUP_LEDGER_MARGIN, parse_offer_fill_from_affected_nodes,
                                                          CANCEL_RACE_UNKNOWN dispatch in _reconcile_order,
                                                          _handle_cancel_race_unknown resolution helper)
tests/test_flag_047_cancel_fill_race.py      | 870 ++   (new — 12 tests incl. S48 fixture)

Operator impact

Healthy sessions (no tecNO_TARGET from any cancel): zero observable change. All new code paths gated on xrpl_result_code == "tecNO_TARGET" at C2 and status == CANCEL_RACE_UNKNOWN at C3. An unchanged cancel flow still writes CANCELLED_BY_ENGINE via the FLAG-037 path.
Cancel-fill race during a DEGRADED entry: cancel returns tecNO_TARGET → order goes to CANCEL_RACE_UNKNOWN → CANCEL_RACE_DETECTED WARNING logged → reconciler runs on the next tick → account_tx lookup → FILLED (deltas recorded atomically, CANCEL_RACE_FILL_CONFIRMED INFO log), CANCELLED (terminal, benign, CANCEL_RACE_CANCEL_CONFIRMED INFO log), or INCONCLUSIVE (CANCEL_RACE_INCONCLUSIVE WARNING log, engine signals DEGRADED). Truth check continues to run each tick — INCONCLUSIVE orders that actually were filled will produce a truth delta on the next truth check and escalate to inventory_truth_halt as designed.
Cancel-fill race during shutdown: cancel returns tecNO_TARGET → order goes to CANCEL_RACE_UNKNOWN → session closes with the order in that state. Next session startup: reconciler's startup pass (or the pre-session realign_inventory_to_onchain.py --dry-run standing procedure) walks the CANCEL_RACE_UNKNOWN orders via _get_orders_for_reconciliation and runs the same account_tx resolution. S48 would have caught its phantom fill via this startup path if the engine had already been running FLAG-047.
Gateway unavailable / account_tx RPC failing: reconciler catches the exception, leaves order in CANCEL_RACE_UNKNOWN, emits CANCEL_RACE_INCONCLUSIVE WARNING with the exception detail, and signals DEGRADED. Engine does not act while the account_tx path is broken. This is the right failure mode — fail-closed per Atlas invariant, engine cannot prove alignment → engine stops acting.
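
The fail-closed behavior in the last scenario can be sketched as follows. The helper names, the `BrokenGateway` test double, and the "RUNNING" signal label are hypothetical; `get_account_tx_for_offer(account, offer_sequence, min_ledger=None)` is the signature given in the architecture summary.

```python
import logging

log = logging.getLogger("cancel_race_sketch")


def fetch_tx_entries_fail_closed(gateway, account, offer_sequence, min_ledger=None):
    """Return tx entries, or None when the lookup cannot be trusted."""
    if gateway is None:
        return None  # legacy construction without a gateway
    try:
        return gateway.get_account_tx_for_offer(account, offer_sequence, min_ledger=min_ledger)
    except Exception as exc:  # any RPC or network failure collapses to inconclusive
        log.warning("CANCEL_RACE_INCONCLUSIVE offer_sequence=%s error=%r", offer_sequence, exc)
        return None


def engine_signal_after(current_signal: str, inconclusive: bool) -> str:
    """Escalate to DEGRADED on an inconclusive lookup unless already HALTED."""
    if inconclusive and current_signal != "HALTED":
        return "DEGRADED"
    return current_signal


class BrokenGateway:
    """Test double whose lookup always fails."""

    def get_account_tx_for_offer(self, account, offer_sequence, min_ledger=None):
        raise ConnectionError("RPC unavailable")


entries = fetch_tx_entries_fail_closed(BrokenGateway(), "rENGINE_PLACEHOLDER", 101, min_ledger=997)
signal = engine_signal_after("RUNNING", inconclusive=entries is None)
print(entries, signal)
```

A caller would also treat an empty or non-matching entry list as inconclusive, as Tests 4 and 5 describe; this sketch covers only the missing-gateway and exception paths.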

What this buys in the S48-class regression: the S48 07:06:24 phantom fill (7.317607 XRP / 10.5 RLUSD delta that fired inventory_truth_halt) now resolves to a real recorded fill via mark_filled_after_race on the reconciler pass that immediately follows the cancel race. The truth check delta goes to zero; no inventory_truth_halt; capital accounting stays aligned with on-chain truth. S48-style sessions can continue running through DEGRADED cycles without session-killing truth divergence.

Apply instructions (Windows / PowerShell)

Patches live at 02 Projects/NEO Trading Engine/08 Patches/patches/fix-cancel-fill-race/ (5 files, 0001 → 0005). From Katja's VS Code terminal:

cd C:\Users\Katja\Documents\NEO GitHub\neo-2026
git checkout main
git pull

# Defensive: clear any pre-existing branch from a prior attempt.
git branch -D fix/cancel-fill-race 2>$null

git checkout -b fix/cancel-fill-race

Get-ChildItem "C:\Users\Katja\Documents\Claude Homebase Neo\02 Projects\NEO Trading Engine\08 Patches\patches\fix-cancel-fill-race" -Filter "*.patch" |
    Sort-Object Name |
    ForEach-Object { git am $_.FullName }

# Verify — 5 commits expected
git log --oneline main..HEAD
# Expected (topmost 5):
#   9e650c6 test(ledger_reconciler): FLAG-047 C5 — cancel-fill race resolution tests incl. S48 fixture
#   9446bba feat(ledger_reconciler,main_loop): FLAG-047 C3 — reconciler CANCEL_RACE_UNKNOWN resolution branch
#   908354b feat(ledger_reconciler,state_manager): FLAG-047 C4 — AffectedNodes fill parser + mark_filled_after_race atomic method
#   3e242cc feat(main_loop): FLAG-047 C2 — cancel-fill race detection branch in _cancel_all_live_orders
#   a3c104f feat(models,state_manager,xrpl_gateway): FLAG-047 C1 — CANCEL_RACE_UNKNOWN scaffolding + account_tx helper

# Regression — FLAG-047 + adjacent FLAG suite (106 expected)
python -m pytest tests/test_flag_047_cancel_fill_race.py `
  tests/test_reconciler_cancelled_by_engine.py `
  tests/test_reconciler_conservative.py `
  tests/test_reconciler_anomaly_log.py `
  tests/test_flag_044_recovery_cooldown.py `
  tests/test_flag_042_degraded_recovery.py `
  tests/test_flag_036_wallet_truth_reconciliation.py -q
# Expected: 106 passed

# FLAG-047 alone (12 tests)
python -m pytest tests/test_flag_047_cancel_fill_race.py -v
# Expected: 12 passed

Prerequisite: branch applies on top of main at 1a1a63c (FLAG-037 CANCELLED_BY_ENGINE layer merged). If main is ahead of that commit but still has FLAG-037 landed (merge 1a1a63c), the patches apply cleanly — all the FLAG-047 hunks are net-new surface or additive next to the FLAG-037 block. If main lost FLAG-037's CANCELLED_BY_ENGINE status from OrderStatus, C1 will fail to apply.

Post-merge

After Katja applies patches and confirms the 106-test adjacent-suite green:
1. Mark FLAG-047 as CLOSED in [C] Open Flags.md.
2. Run python tools/realign_inventory_to_onchain.py --config config/config_live_stage1.yaml --db neo_live_stage1.db --dry-run before S49. Expected delta ~0.00 XRP / 0.00 RLUSD (post-S48 realignment already ran per CLAUDE.md).
3. S49 launch with --duration-seconds 7200 per FLAG-044 standing procedure.
4. Watch for CANCEL_RACE_DETECTED / CANCEL_RACE_FILL_CONFIRMED / CANCEL_RACE_CANCEL_CONFIRMED / CANCEL_RACE_INCONCLUSIVE log tokens. Any INCONCLUSIVE warrants a Slack-adjacent note to this thread: that's the path where the gateway couldn't resolve on-chain and the engine went DEGRADED defensively.
5. Session integrity: if S49 logs a CANCEL_RACE_FILL_CONFIRMED and truth check stays clean throughout, FLAG-047 is confirmed working in production. If a race slips to inventory_truth_halt anyway, the CANCEL_RACE_INCONCLUSIVE log will precede it in the event stream; tell us and we dig in.

Status

C1–C5 complete. Branch is clean. One documented deviation (commit order C4-before-C3 for import safety) flagged above. All 5 Vesper rulings honored. 12 tests passing (exceeds ≥8 mandate). All 106 adjacent-FLAG tests green. Awaiting your review.

— Orion