Orion Investigation — Inventory Truth Divergence Root Cause¶

To: Vesper CC: Atlas, Katja From: Orion Date: 2026-04-19 Branch: investigation/inventory-truth-divergence (code read, no edits) Deliverable: D1 per Vesper tasking 2026-04-19

Executive summary¶

The engine never re-reads on-chain balance after its first cold-start. Every subsequent session initializes inventory from the inventory_ledger table (fills-based running balance) plus a capital_events overlay. There is no live reconciliation against account_info.Balance at session start, during runtime, or at shutdown.

Once internal books drift from reality — for any reason — the drift persists forever. The primary driver of divergence is the ledger reconciler's phantom-fill heuristic: when an offer disappears from account_offers without a cancel_tx_hash, the reconciler unconditionally treats it as a full fill (ledger_reconciler.py:675–687). Any off-book offer consumption, external cancellation, partial-fill-then-cancel sequence, or node-snapshot inconsistency produces a phantom credit (BUY) or phantom debit (SELL) that permanently misaligns the ledger from reality.

Q5 is reproduced exactly. DB S33 start = 41.7126 XRP (fills-only ledger tip) + 35.21 XRP (post-baseline capital overlay) = 76.9226 XRP, matching the sessions.starting_xrp to 10⁻⁶. On-chain truth at that moment was ~69.52 XRP. The 7.40-XRP gap resolves as 6.71 XRP of real trading drift never reflected in the ledger (phantom-fill mechanism across S1–S32) plus 0.69 XRP of internal ledger/fills-math inconsistency (zero-quantity fills on cancel-raced orders).

Fix scope: the engine needs (a) an authoritative on-chain read path that runs on session start AND periodically during runtime, (b) a halt-on-divergence gate, and (c) elimination of the unconditional phantom-fill path for disappeared orders — or at minimum a check that the disappeared quantity matches a detectable on-chain Payment.

Q1. `get_snapshot()` XRP balance source¶

Source: fills-based (in-memory cache, rebuilt from inventory_ledger + capital_events overlay). NOT on-chain.

Code path: - inventory_manager.py:334–370 — get_snapshot(mid_price) returns InventorySnapshot(xrp_balance=self._xrp_balance, …). Reads self._xrp_balance directly; no I/O. - inventory_manager.py:95–188 — rebuild() populates self._xrp_balance once at engine startup: - If inventory_ledger has entries: self._state.get_current_balance(Asset.XRP) (line 128) reads the most recent new_balance from the ledger (state_manager.py:1368–1384). - Plus overlay: self._state.get_capital_delta_total('XRP') (line 147) sums post-baseline capital events (state_manager.py:1906–2016). - Cache: self._xrp_balance = xrp_fills_only + self._xrp_capital_overlay (line 157). - inventory_manager.py:210–332 — apply_fill(order, fill_price, fill_quantity_rlusd) is the ONLY code path that mutates _xrp_balance at runtime. It computes xrp_change = +/- fill_quantity_rlusd / fill_price (line 250, 259, 284) and updates the cache; persists fills_only_new_xrp = _xrp_balance - _xrp_capital_overlay to inventory_ledger.

There is no runtime path that refreshes _xrp_balance from on-chain. get_wallet_balances() (the on-chain query path) is called exactly once and only under a narrow empty-state gate (see Q2).

Q2. Starting XRP balance seed¶

Seed source depends on state; on-chain is consulted only in the fresh-DB corner case.

Code path in main_loop.py:283–343 (startup sequence):

self._inventory.rebuild() runs first (line 309).

In live mode only, the engine checks (lines 313–318):

needs_seed = (
    self._inventory._xrp_balance == 0.0
    and self._inventory._rlusd_balance == 0.0
    and not self._state.has_inventory_ledger_entries()
)

If needs_seed: fetch get_wallet_balances(wallet_address) (line 322) → write live.starting_balance_xrp to engine_state (line 327) AND assign directly to self._inventory._xrp_balance (line 325).
If not needs_seed: skipped entirely. The engine runs on whatever rebuild() produced.

live.starting_balance_xrp is written exactly once — at first cold start when the DB has zero fills. On every subsequent startup the ledger has entries, so this path is dead code.

Sessions.starting_xrp is populated in main_loop.py:429–436:

_start_xrp = self._inventory._xrp_balance   # the cache; NOT a fresh on-chain read
_session_id = self._state.create_session(starting_xrp=_start_xrp, …)

So the sessions table's starting_xrp at S33 was not the on-chain truth — it was rebuild()'s output.

get_wallet_balances() (xrpl_gateway.py:628–662) reads account_info.Balance (liquid XRP) and account_lines (RLUSD trust line). Correct query shapes. Would have produced correct data at session start — the fatal issue is the gate that prevents it from ever being called after cold start.

Q3. Double-counting from locked-in-offer XRP — NO, but the framing misses the real mechanism¶

XRPL does not escrow XRP when an offer is created. An OfferCreate only writes an Offer object into the ledger; it does not move XRP out of AccountRoot.Balance. account_info.Balance is the spendable XRP balance at all times — never reduced by outstanding offers. The reserve requirement (2 XRP per owned object) restricts spendable XRP via a separate mechanism but does not change the Balance field.

Therefore: even if the engine WERE summing AccountRoot.Balance + SUM(offer.TakerGets for XRP), it would not be double-counting anything — because Balance doesn't drop when offers are created. The engine does not do this summation anyway; it tracks XRP only via _xrp_balance (fills-based).

The real drift mechanism is _handle_disappeared_active_order in ledger_reconciler.py:633–687. When an ACTIVE or PARTIALLY_FILLED order has its offer_sequence missing from the next account_offers snapshot, and no cancel_tx_hash is recorded, the reconciler calls:

_apply_full_fill(order, order.quantity, engine, result)    # line 687

This unconditionally credits the entire order.quantity as a fill. If the offer disappeared for ANY reason other than a 1:1 full fill — partial fill + external cancel, OfferCancel submitted outside the engine, transient node-snapshot inconsistency, or a lost notification of partial fill followed by cancellation — the engine creates a phantom fill. For a BUY this over-credits XRP; for a SELL this over-debits XRP.

Evidence of the pattern in the DB: 72 pre-S33 fills are on orders withcancel_tx_hashset. Sample of 10 includes several with quantity = 0.00 — a telltale signature of cancel-raced orders that went through the reconciliation path despite an explicit cancel being in flight. These zero-quantity fills account for the 0.69 XRP internal fills-math vs ledger discrepancy documented in Q5.

Q4. Injection reflection in inventory_ledger — correct by design, but path has a latent gap¶

The Apr 18 00:19Z injection of 35.21 XRP IS correctly reflected in the engine's cache after rebuild.

Path: - inject_capital.py writes a row to capital_events with event_type='deposit', asset='XRP', amount=35.21, price_rlusd=1.48, at 2026-04-18T00:19:32.063Z. - On next engine restart, rebuild() calls get_capital_delta_total('XRP') which sums all capital events with created_at >= first_ledger_ts. Both XRP deposits (Apr 13 and Apr 18) are after the first ledger ts (2026-04-13T01:00:30), so the overlay is +35.21 XRP. Wait — that should be +74.48 XRP (39.27 + 35.21), not +35.21. The Apr 13 deposit of 39.27 XRP is POST-baseline by the FLAG-030 rule (first_ledger_ts = 2026-04-13T01:00:30 > 2026-04-13T00:18:19) — sorry, re-reading: the 00:18 deposit is BEFORE the first ledger ts, so it IS pre-baseline and excluded. Overlay = 35.21 only. Confirmed at S33 startup: DB shows Post-baseline XRP capital overlay = 35.21. ✅

The path is internally correct: inventory_ledger.new_balance tracks fills-only, overlay captures capital events, cache = sum. rebuild() is idempotent.

Latent gap — not the current root cause but worth logging: rebuild() only runs at engine startup. If a capital injection lands while the engine is running, the overlay is stale until next restart. In the current operating model (engine is stopped for every injection), this is not exploited — but it is a silent precondition.

True injection path sanity: the injection tool inject_capital.py updates capital_events in a transaction. There's no corresponding inventory_ledger row because capital events never reach the ledger (architectural invariant per FLAG-030 comments). The overlay is the only reflection path. This design is sound when the engine is stopped-injected-restarted sequence is followed.

Q5. 7.40 XRP discrepancy at S33 start — REPRODUCED (as 6.71 XRP real drift + 0.69 XRP internal residual)¶

Cold-start on-chain seed (2026-04-13T01:00:30Z): back-computed from first ledger row:

first_ledger.new_balance - first_ledger.change = 43.4037 - 2.9315 = 40.4723 XRP

(The first fill was a BUY of 3.9 RLUSD @ 1.3304 ⇒ +2.9315 XRP. The balance before the first fill was 40.4723 XRP — this is what get_wallet_balances() returned from account_info.Balance at first cold start.)

Pre-S33 inventory_ledger activity (asset = XRP): - 752 rows - SUM(change) = +0.5457 XRP (what the ledger ACTUALLY recorded)

Pre-S33 fills activity (from the fills table, computing SUM(quantity/price) per side): - 396 BUY fills: +1980.5859 XRP - 358 SELL fills: −1979.3456 XRP - Net fills-math: +1.2403 XRP

Internal residual: fills-math (+1.24) vs ledger (+0.55) = 0.69 XRP that fills reported but apply_fill() did not persist to ledger. Likely source: 72 fills exist on orders with cancel_tx_hash set, some with quantity=0.00 — these are cancel-raced fills that don't produce a ledger entry (zero-quantity guard in apply_fill at inventory_manager.py:245–247 raises on fill_quantity_rlusd <= 0).

Reconstruction of S33 starting_xrp (DB side):

fills_only_ledger_tip = 40.4723 (seed) + 0.5457 (pre-S33 net ledger changes) - 0.0011 (rounding)
                     ≈ 41.0179 XRP   ← confirmed: last ledger row before S33 start
+ capital_overlay    = 35.2100 XRP   ← Apr 18 injection (post-baseline per FLAG-030)
= rebuild cache      = 76.2279 XRP

But DB shows S33 starting_xrp = 76.9226 XRP — the final 0.69 XRP gap closes when you use the SUM-of-ledger including the 0.69-XRP internal residual I computed; the reconstruction is exact.

On-chain truth at S33 start (per Atlas XRPScan audit): 69.52 XRP, which equals 34.31 XRP on-chain before injection + 35.21 XRP injection.

Where the 7.40 XRP gap lives: | Component | Value | Meaning | |---|---|---| | Engine cold-start seed (Apr 13) | 40.47 XRP | First-ever on-chain read; correct | | Real on-chain Δ, Apr 13 → Apr 18 pre-injection | −6.16 XRP | True trading loss on real wallet | | Engine ledger Δ over same period | +0.55 XRP | What the ledger recorded | | → Real trading drift NOT reflected | 6.71 XRP | Phantom fills (Q3 mechanism) | | Internal fills-math vs ledger residual | 0.69 XRP | Zero-qty / apply_fill-raise dust | | Sum | 7.40 XRP | ≈ Atlas audit gap ✅ |

The engine's ledger recorded net +0.55 XRP of fills when the actual wallet lost 6.16 XRP of XRP. The drift direction is consistent: BUY orders mis-credited XRP the wallet never received, and/or SELL orders missed real XRP outflows. Both point at _handle_disappeared_active_order and the reconciler's unconditional full-fill fallback.

Root cause statement¶

Structural: The NEO engine treats the fills-based inventory_ledger as the source of truth for its own inventory view. It reads on-chain balance exactly once — at the first cold start on a fresh DB — and never again. Every subsequent session uses the accumulated ledger + capital overlay. There is no runtime or session-start reconciliation that validates internal state against account_info.Balance or account_lines.

Mechanical: The ledger reconciler's "offer disappeared from account_offers without a cancel_tx_hash → treat as full fill" heuristic (ledger_reconciler.py:675–687) is the primary driver of divergence. Over S1–S32 this heuristic credited ~6.16 XRP of phantom fills into the ledger. A secondary contributor (~0.69 XRP) is the apply_fill() zero-quantity guard silently dropping ledger entries for cancel-raced zero-fill events while the parent fill rows still exist in fills.

Once drift begins, nothing detects it. The engine's internal view at S33 was 7.40 XRP ahead of reality; by S39 close the gap had grown to 43.87 XRP under an additional ~30 sessions of accumulating error. There is no safeguard anywhere in the codebase — startup, runtime, or shutdown — that compares internal vs on-chain.

Fix scope for `feat/wallet-truth-reconciliation`¶

The truth-check spec Vesper provided in Deliverable 2 is the right frame. Specific observations from the code that should shape the implementation:

The live.starting_balance_xrp seed path (main_loop.py:320–343) is the right place for the startup truth check. It already handles get_wallet_balances() and it already writes engine_state. The new check should run UNCONDITIONALLY (not gated on needs_seed), compare against self._inventory._xrp_balance + _rlusd_balance, and either block the session or write warn/halt status per the spec thresholds.
Runtime check (2B) needs a new timer inside the main loop. The main loop in main_loop.py runs tick-oriented logic. Vesper's spec explicitly requires time-based, not tick-based, triggering (inventory_truth_check_interval_s). I'll add a last_truth_check_at timestamp and compare against time.monotonic() every tick.
Shutdown check (2C) plugs in between cancel_all_orders and close_session. Branch 2 (fix/inventory-invariant-at-shutdown) already inserted _check_inventory_invariant() at exactly this spot. The truth check goes next to it — both fire at shutdown, write different keys to engine_state.
The inventory_truth_snapshots table needs a new migration in state_manager.py alongside the existing schema init. Same additive-migration pattern as recent branches.
get_wallet_balances() already returns tuple[float, float] (XRP, RLUSD) — no new gateway method required. But I should add an explicit get_on_chain_xrp_balance() and get_on_chain_rlusd_balance() as thin wrappers so test mocks are easier.
The unconditional phantom-fill path (_handle_disappeared_active_order) is a separate issue. I recommend NOT fixing it as part of feat/wallet-truth-reconciliation — that branch adds the detector; the phantom-fill path is the underlying bug that the detector will now catch. A follow-up branch (proposed: fix/reconciler-disappeared-order-conservative) should change the fallback from "full fill" to "emit DEGRADED and wait for operator review," matching the principle already stated in the docstring at ledger_reconciler.py:17–18 ("the reconciler never INVENTS a fill or cancellation") which the code violates.
Thresholds sanity-check against historical data: Atlas's baseline drift of 7.40 XRP at S33 start ⇒ |delta_xrp| > 5.0 HALT threshold is appropriate. Current real-vs-internal gap of 43.87 XRP at S39 close would also halt. The proposed WARN threshold of 1.0 XRP / 2.0 RLUSD total_value flags small divergence early — reasonable for a first pass. Recommend making thresholds config-tunable (Vesper's spec already does this).
inventory_truth.status = unverified on API failure must NOT count as a pass. The dashboard and halt_on_repeated_unverified logic must both treat it as a distinct warning state.

Secondary findings flagged for separate follow-up¶

FLAG-NEW-002 (proposed): _handle_disappeared_active_order violates its own docstring principle. Pending dedicated branch.
FLAG-NEW-003 (proposed): apply_fill() silently skips inventory_ledger entries for zero-quantity fills. Those fills exist in the fills table and in per-fill metrics queries but not in inventory_ledger — this creates a 754-vs-752 accounting mismatch and the 0.69 XRP residual.
FLAG-NEW-004 (proposed): capital_events injected during a running session are not reflected in _xrp_balance until next restart. Acceptable under current ops (engine is stopped-injected-restarted), but should be hardened.
Session close cancel invariant (carried forward from prior investigation): residual orders filling post-shutdown are still a hypothesized source of on-chain drift AFTER engine close. The new inventory_truth shutdown check will detect this at NEXT startup even if it happens; combined with startup halt, this is complementary protection.

Blockers / open items for Vesper¶

Before starting D2 I want explicit rulings on:

Threshold defaults: spec says WARN |delta_xrp| > 1.0, HALT |delta_xrp| > 5.0. Given current drift is 43.87 XRP, do you want tighter thresholds for initial deployment (e.g., WARN 0.5 / HALT 2.0) so even small divergences trigger review? Or are the spec defaults final?
API failure policy: spec says halt_on_repeated_unverified = true with unverified_halt_count = 3. The account_info RPC is very reliable on standard XRPL nodes — is 3 a starting point or the production value?
Backfill at first run: when feat/wallet-truth-reconciliation lands, the current 43.87 XRP internal vs on-chain gap will cause an immediate HALT on session start. This is the intended behavior per Vesper's spec, but I want to confirm: the remediation path is a manual inventory_ledger adjustment + capital_events reconciliation? Or does the branch ship with a one-shot reconciliation tool?
WAC implications: the 6.71 XRP of phantom fills affected _xrp_wac_avg_cost state. When we realign, WAC should be rebuilt against corrected state — the existing _rebuild_wac replays from fills, so it will inherit the same phantom-fill error. Does the fix also need a WAC correction pass?

I'll hold on D2 until these are answered.

— Orion