Orion Delivery — Branch #4 fix/flag-029-async-pin-and-orphan

To: Katja
CC: Vesper, Atlas
From: Orion
Date: 2026-04-18


TL;DR

Two commits, two new test files, two patches ready to apply. Both FLAG-029a (async pin + safety) and FLAG-029b (orphan cleanup) are closed.

| # | Commit | Files | Tests |
|---|--------|-------|-------|
| 1 | 33a1cd7 feat(xrpl-gateway): pin xrpl-py <3.0 and guard submit_and_wait against async migration | pyproject.toml, neo_engine/xrpl_gateway.py, tests/test_flag_029_async_safety.py | 6 new |
| 2 | 83452fe fix(main-loop): force-cancel SUBMITTED orders with no offer_sequence at startup (FLAG-029b) | neo_engine/main_loop.py, tests/test_flag_029b_orphan_cleanup.py | 5 new |

Both patches sit in patches-flag-029/ next to this memo. Total: 11/11 new tests pass in 0.15 s. No regressions in test_halt_reason_lifecycle.py (4/4 pass).

Branch base: df168f0 (Merge branch chore/archive-cleanup into local-katja-main — your Branch #3 merge).


Commit 1 — FLAG-029a: async pin + submit_and_wait safety

Changes

  • pyproject.toml: "xrpl-py>=2.4" → "xrpl-py>=2.4,<3.0" (Q1 ruling).
  • neo_engine/xrpl_gateway.py:
      • import inspect.
      • New module-level helper _submit_and_wait_safe(tx, client, wallet). Imports submit_and_wait internally; raises RuntimeError before invocation if it's a coroutine function; if the call returns a coroutine, calls .close() on it before raising (prevents "coroutine was never awaited" RuntimeWarnings).
      • Three call sites migrated: submit_offer_create, submit_offer_replace_atomic, submit_offer_cancel. Each now does response = _submit_and_wait_safe(tx, client, wallet), with the inline from xrpl.transaction import submit_and_wait import removed.
      • XRPLGateway.__init__ smoke check: probes inspect.iscoroutinefunction(submit_and_wait) after the log.info init line. If True → log.error + raise GatewayError. ImportError is swallowed (paper/dry-run dev envs without xrpl-py installed continue to work; per-call ImportError branches handle it).
  • tests/test_flag_029_async_safety.py (new, 6 tests):
      • _submit_and_wait_safe passes sync responses through.
      • _submit_and_wait_safe raises on async def submit_and_wait (before invocation, so the coroutine is never created).
      • _submit_and_wait_safe raises + closes coroutine when sync function returns an awaitable.
      • XRPLGateway.__init__ succeeds on sync submit_and_wait.
      • XRPLGateway.__init__ raises GatewayError on async def submit_and_wait.
      • XRPLGateway.__init__ tolerates xrpl-py not installed (paper mode).
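
For review context, the two per-call guards described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: the real helper imports submit_and_wait internally, whereas this sketch takes it as a hypothetical submit_fn parameter so it runs without xrpl-py installed. The function name and error messages here are placeholders.

```python
import inspect


def submit_and_wait_safe(submit_fn, tx, client, wallet):
    """Call submit_fn synchronously, refusing anything coroutine-shaped.

    submit_fn is an assumption of this sketch; the committed helper
    imports submit_and_wait itself instead of receiving it as an argument.
    """
    # Guard 1: an `async def` function is rejected before it is called,
    # so no coroutine object is ever created.
    if inspect.iscoroutinefunction(submit_fn):
        raise RuntimeError("submit_and_wait is async; expected the sync 2.x API")

    result = submit_fn(tx, client, wallet)

    # Guard 2: a plain function can still return a coroutine. Close it so
    # Python does not warn that it was never awaited, then fail loudly.
    if inspect.iscoroutine(result):
        result.close()
        raise RuntimeError("submit_and_wait returned a coroutine; refusing to continue")

    return result
```

Passing the function in also makes the guard easy to exercise in isolation, which is roughly what the first three tests do via patching.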

Behaviour invariants

  • On xrpl-py 2.x sync: _submit_and_wait_safe is a pure pass-through. No behaviour change on the happy path.
  • On xrpl-py 3.x async (hypothetical silent bump): engine refuses to start at init, and even if init were somehow bypassed the per-call helper would raise on every submit before the on-chain call runs. No ghost orders possible.
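
The init-time refusal can be pictured with the sketch below. It is an approximation under stated assumptions: GatewayError is a stand-in class, check_sync_contract and its loader parameter are hypothetical names introduced so the check can be run without xrpl-py, and the True/False return value is illustrative only (the real check lives inside XRPLGateway.__init__).

```python
import inspect
import logging

log = logging.getLogger("xrpl_gateway_sketch")


class GatewayError(Exception):
    """Stand-in for the engine's GatewayError (assumed Exception subclass)."""


def load_submit_and_wait():
    # Imported lazily so environments without xrpl-py can still start.
    from xrpl.transaction import submit_and_wait
    return submit_and_wait


def check_sync_contract(loader=load_submit_and_wait):
    """Return True if the sync contract was verified, False if skipped."""
    try:
        submit_fn = loader()
    except ImportError:
        # Paper/dry-run environment without xrpl-py: nothing to probe here.
        return False
    if inspect.iscoroutinefunction(submit_fn):
        log.error("submit_and_wait is a coroutine function; refusing to start")
        raise GatewayError("xrpl-py submit_and_wait is async; pin xrpl-py <3.0")
    return True
```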

Commit 2 — FLAG-029b: SUBMITTED orphan cleanup

Changes

  • neo_engine/main_loop.py:
      • New class constant _STARTUP_SUBMITTED_ORPHAN_CUTOFF_SECONDS = 7 * 24 * 3600 (Q2 ruling: 7 days).
      • _startup_force_cancel_stuck_orders docstring updated to describe three categories (1, 2, 3).
      • Category 3 added at the end: scans get_orders_by_status(OrderStatus.SUBMITTED), skips any row where offer_sequence is not None, skips rows younger than the 7-day floor, force-cancels the rest with failure_reason="startup: force-canceled — SUBMITTED without offer_sequence (submit orphan, >7d old)" and a log.warning.
      • Existing Categories 1 and 2 completely unchanged.
  • tests/test_flag_029b_orphan_cleanup.py (new, 5 tests):
      • Old SUBMITTED + no offer_sequence → canceled (exact shape of c7e14e73*).
      • Fresh SUBMITTED + no offer_sequence → preserved (in-flight protected).
      • Old SUBMITTED + offer_sequence set → preserved (reconciler owns it).
      • ACTIVE with offer_sequence → preserved (live order untouched).
      • Mixed population of 4 rows: only orphan canceled, second pass is idempotent (0 further).

Behaviour invariants

  • Generalized self-heal. Not a one-shot c7e14e73-specific script. Any future row that lands in the same state (submit crashed between "sent to gateway" and "engine promoted to ACTIVE") will be cleaned on the next startup once it crosses the 7-day floor.
  • Three guards protect live orders from accidental cancellation:
      • Status must be SUBMITTED (ACTIVE and PARTIALLY_FILLED are out of scope).
      • offer_sequence must be NULL (reconciler owns any row with a seq).
      • Row age ≥ 7 days (legitimate SUBMITTED→ACTIVE transitions are far below this).
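
The three guards can be read as a single filter. The sketch below is illustrative only: OrderRow, its created_at field, and select_submit_orphans are hypothetical names, and statuses are plain strings here rather than the engine's OrderStatus enum. Only the cutoff arithmetic (7 * 24 * 3600 seconds) is taken from the commit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

ORPHAN_CUTOFF_SECONDS = 7 * 24 * 3600  # same 7-day floor as the commit


@dataclass
class OrderRow:
    # Hypothetical row shape; the real schema may name these differently.
    id: str
    status: str
    offer_sequence: Optional[int]
    created_at: datetime


def select_submit_orphans(rows: List[OrderRow], now: datetime) -> List[OrderRow]:
    """Return only the rows that pass all three guards."""
    cutoff = timedelta(seconds=ORPHAN_CUTOFF_SECONDS)
    orphans = []
    for row in rows:
        if row.status != "SUBMITTED":
            continue  # guard 1: ACTIVE / PARTIALLY_FILLED are out of scope
        if row.offer_sequence is not None:
            continue  # guard 2: the reconciler owns rows with a sequence
        if now - row.created_at < cutoff:
            continue  # guard 3: younger than the floor, may be in flight
        orphans.append(row)
    return orphans
```

Because a canceled row no longer has status SUBMITTED, running the same filter again after cancellation selects nothing, which is the idempotence the mixed-population test checks.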

Commit order (Q3 ruling)

As proposed and approved: async pin first, orphan cleanup second. Each commit is independently revertable — if the async pin causes an unexpected issue on your machine (e.g., an xrpl-py resolution problem), revert 33a1cd7 and keep the orphan fix from 83452fe. Or vice versa.


Apply on your Windows VS Code terminal

# from C:\Users\Katja\Documents\NEO\neo_engine (adjust to your repo root)
git fetch origin
git checkout -b fix/flag-029-async-pin-and-orphan local-katja-main

git am "$env:USERPROFILE\<workspace>\02 Projects\NEO Trading Engine\patches-flag-029\0001-feat-xrpl-gateway-pin-xrpl-py-3.0-and-guard-submit_a.patch"
git am "$env:USERPROFILE\<workspace>\02 Projects\NEO Trading Engine\patches-flag-029\0002-fix-main-loop-force-cancel-SUBMITTED-orders-with-no-.patch"

# verify the two commits landed
git log --oneline local-katja-main..HEAD

# run the two new test files plus the halt-reason regression file
python -m pytest tests/test_flag_029_async_safety.py tests/test_flag_029b_orphan_cleanup.py tests/test_halt_reason_lifecycle.py -v

Expected: 15/15 pass (11 new + 4 in test_halt_reason_lifecycle.py), ~0.15 s.

If git am fails on either patch (unlikely — base is your merged df168f0), abort and paste me the conflict before doing anything else:

git am --abort


Pre-existing test failures — same as prior branches

The same broad failures Vesper and I noted on Branches #1–#3 persist on this base: tests/test_main_loop_old.py, tests/test_run_paper_session.py, and most of tests/test_execution_engine.py / tests/test_task4.py, all tied to OrderSizeConfig.__init__() missing max_size_pct_of_portfolio. Verified identical counts on local-katja-main before and after this branch. Out of scope here. Still queued under FLAG-016.

My new tests do not depend on any of the broken fixtures. The 11 new cases build their configs explicitly and include max_size_pct_of_portfolio=0.15.


Closes / updates

  • FLAG-029a — async warning: closed by Commit 1. xrpl-py pinned <3.0, submit_and_wait has sync-contract guards at both init and every call site.
  • FLAG-029b — orphan c7e14e73*: will self-heal on the next restart. Row has been SUBMITTED without offer_sequence for >7 days, so it matches Category 3 immediately. Verify post-apply:
    # After next engine startup
    sqlite3 neo_paper.db "SELECT id, status, failure_reason FROM orders WHERE id LIKE 'c7e14e73%';"
    # Expected: status='CANCELED', failure_reason starts with 'startup: force-canceled — SUBMITTED without offer_sequence'
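
If the sqlite3 command-line tool is not on your PATH, the same check can be run from Python's standard library. This is a sketch under assumptions: find_orders_by_id_prefix is a hypothetical helper name, and the table and column names are simply those used in the query above.

```python
import sqlite3


def find_orders_by_id_prefix(conn, prefix):
    """Return (id, status, failure_reason) tuples for ids starting with prefix."""
    cursor = conn.execute(
        "SELECT id, status, failure_reason FROM orders "
        "WHERE id LIKE ? ORDER BY id",
        (prefix + "%",),  # parameterized to avoid building SQL by hand
    )
    return cursor.fetchall()
```

Open the database with sqlite3.connect("neo_paper.db") from the directory that holds the file, call find_orders_by_id_prefix(conn, "c7e14e73"), and compare the returned row against the expected status and failure_reason prefix above.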
    

Queue per audit plan: Branch #5 audit/config-wiring-pass (includes clob_switch_threshold_bps promotion) is next. Holding until you give the go after Vesper's review of this branch.

— Orion