Orion Investigation: Inventory Truth Divergence Root Cause
To: Vesper
CC: Atlas, Katja
From: Orion
Date: 2026-04-19
Branch: investigation/inventory-truth-divergence (code read, no edits)
Deliverable: D1 per Vesper tasking 2026-04-19
Executive summary
The engine never re-reads on-chain balance after its first cold-start. Every subsequent session initializes inventory from the inventory_ledger table (fills-based running balance) plus a capital_events overlay. There is no live reconciliation against account_info.Balance at session start, during runtime, or at shutdown.
Once internal books drift from reality — for any reason — the drift persists forever. The primary driver of divergence is the ledger reconciler's phantom-fill heuristic: when an offer disappears from account_offers without a cancel_tx_hash, the reconciler unconditionally treats it as a full fill (ledger_reconciler.py:675–687). Any off-book offer consumption, external cancellation, partial-fill-then-cancel sequence, or node-snapshot inconsistency produces a phantom credit (BUY) or phantom debit (SELL) that permanently misaligns the ledger from reality.
Q5 is reproduced exactly. DB S33 start = 41.7126 XRP (fills-only ledger tip) + 35.21 XRP (post-baseline capital overlay) = 76.9226 XRP, matching sessions.starting_xrp to 10⁻⁶. On-chain truth at that moment was ~69.52 XRP. The 7.40-XRP gap resolves as 6.71 XRP of real trading drift never reflected in the ledger (phantom-fill mechanism across S1–S32) plus 0.69 XRP of internal ledger/fills-math inconsistency (provisionally attributed to cancel-raced fills; see Q5).
Fix scope: the engine needs (a) an authoritative on-chain read path that runs at session start AND periodically during runtime, (b) a halt-on-divergence gate, and (c) elimination of the unconditional phantom-fill path for disappeared orders, or at minimum a check that the disappeared quantity matches an on-chain transaction (OfferCreate or Payment) that actually consumed the offer.
Q1. get_snapshot() XRP balance source
Source: fills-based (in-memory cache, rebuilt from inventory_ledger + capital_events overlay). NOT on-chain.
Code path:
- inventory_manager.py:334–370: get_snapshot(mid_price) returns InventorySnapshot(xrp_balance=self._xrp_balance, …). It reads self._xrp_balance directly; no I/O.
- inventory_manager.py:95–188: rebuild() populates self._xrp_balance once, at engine startup:
  - If inventory_ledger has entries, self._state.get_current_balance(Asset.XRP) (line 128) reads the most recent new_balance from the ledger (state_manager.py:1368–1384).
  - Plus the overlay: self._state.get_capital_delta_total('XRP') (line 147) sums post-baseline capital events (state_manager.py:1906–2016).
  - Cache: self._xrp_balance = xrp_fills_only + self._xrp_capital_overlay (line 157).
- inventory_manager.py:210–332: apply_fill(order, fill_price, fill_quantity_rlusd) is the ONLY code path that mutates _xrp_balance at runtime. It computes xrp_change = ±fill_quantity_rlusd / fill_price (lines 250, 259, 284), updates the cache, and persists fills_only_new_xrp = _xrp_balance - _xrp_capital_overlay to inventory_ledger.
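A minimal sketch of the cache model just described, to make the "no on-chain refresh" property explicit. InventoryCacheSketch and its fields are hypothetical stand-ins, not the engine's InventoryManager; it only mirrors the arithmetic listed above (ledger tip + overlay, ±quantity/price per fill, zero-quantity guard).

```python
from dataclasses import dataclass


@dataclass
class InventoryCacheSketch:
    """Hypothetical model of the Q1 cache; not the engine's InventoryManager."""

    xrp_fills_only: float       # stands in for the inventory_ledger tip read by rebuild()
    xrp_capital_overlay: float  # stands in for get_capital_delta_total('XRP')

    @property
    def xrp_balance(self) -> float:
        # What get_snapshot() would report: fills-only tip + overlay, no I/O.
        return self.xrp_fills_only + self.xrp_capital_overlay

    def apply_fill(self, side: str, fill_price: float, fill_quantity_rlusd: float) -> float:
        """Apply one fill and return the fills-only value that would be persisted."""
        if fill_quantity_rlusd <= 0:
            # Mirrors the zero-quantity guard described in Q5.
            raise ValueError("fill_quantity_rlusd must be positive")
        xrp_change = fill_quantity_rlusd / fill_price
        if side == "SELL":
            xrp_change = -xrp_change
        self.xrp_fills_only += xrp_change
        # Nothing in this class ever consults on-chain state.
        return self.xrp_fills_only
```

Every mutation goes through apply_fill(), so once xrp_fills_only is wrong, nothing in this model can bring it back in line with the wallet.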
There is no runtime path that refreshes _xrp_balance from on-chain. get_wallet_balances() (the on-chain query path) is called exactly once and only under a narrow empty-state gate (see Q2).
Q2. Starting XRP balance seed
Seed source depends on state; on-chain is consulted only in the fresh-DB corner case.
Code path in main_loop.py:283–343 (startup sequence):
- self._inventory.rebuild() runs first (line 309).
- In live mode only, the engine then evaluates the needs_seed gate (lines 313–318):
  - If needs_seed: fetch get_wallet_balances(wallet_address) (line 322), write live.starting_balance_xrp to engine_state (line 327), AND assign the result directly to self._inventory._xrp_balance (line 325).
  - If not needs_seed: the on-chain fetch is skipped entirely, and the engine runs on whatever rebuild() produced.
live.starting_balance_xrp is written exactly once — at first cold start when the DB has zero fills. On every subsequent startup the ledger has entries, so this path is dead code.
sessions.starting_xrp is populated in main_loop.py:429–436:
```python
_start_xrp = self._inventory._xrp_balance  # the cache; NOT a fresh on-chain read
_session_id = self._state.create_session(starting_xrp=_start_xrp, …)
```
So the sessions table's starting_xrp at S33 was not the on-chain truth — it was rebuild()'s output.
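The startup flow in Q2 reduces to a single branch. This is a hedged sketch: session_starting_xrp, fill_count and fetch_wallet_balances are hypothetical names standing in for the needs_seed check and get_wallet_balances(wallet_address), not the actual main_loop.py code.

```python
from typing import Callable, Tuple


def session_starting_xrp(
    fill_count: int,
    rebuilt_xrp: float,
    fetch_wallet_balances: Callable[[], Tuple[float, float]],
) -> Tuple[float, bool]:
    """Return (starting_xrp, consulted_chain) following the Q2 description.

    Hypothetical helper: fill_count and fetch_wallet_balances stand in for the
    needs_seed check and get_wallet_balances(wallet_address).
    """
    needs_seed = fill_count == 0  # true only on the first cold start with an empty DB
    if needs_seed:
        xrp, _rlusd = fetch_wallet_balances()  # the only on-chain read
        return xrp, True
    # Every later startup: sessions.starting_xrp is simply the rebuild() cache.
    return rebuilt_xrp, False
```

Once any fill exists, the on-chain callable is never invoked, which is why S33's starting_xrp equals the rebuild() output rather than the wallet balance.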
get_wallet_balances() (xrpl_gateway.py:628–662) reads account_info.Balance (the account's total XRP, reserve included) and account_lines (RLUSD trust line). The query shapes are correct, and the call would have produced correct data at session start; the fatal issue is the gate that prevents it from ever being called after cold start.
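For the proposed get_on_chain_xrp_balance() / get_on_chain_rlusd_balance() wrappers, the parsing step is small. The sketch below is an assumption about how the gateway could turn raw results into (XRP, RLUSD); it was not copied from xrpl_gateway.py. It relies on standard rippled response fields: account_data.Balance is a string of drops, and each account_lines entry carries account (the counterparty, i.e. the issuer for a holder), currency and balance. RLUSD is longer than three characters, so on-ledger it appears as a 40-hex-digit currency code; the issuer is passed in rather than hard-coded.

```python
from typing import Any, Dict, Tuple

DROPS_PER_XRP = 1_000_000  # account_info reports XRP amounts in drops


def parse_wallet_balances(
    account_info_result: Dict[str, Any],
    account_lines_result: Dict[str, Any],
    rlusd_currency: str,
    rlusd_issuer: str,
) -> Tuple[float, float]:
    """Turn account_info / account_lines results into (xrp, rlusd)."""
    # Balance is a string of drops and is the TOTAL balance, reserve included.
    xrp = int(account_info_result["account_data"]["Balance"]) / DROPS_PER_XRP
    rlusd = 0.0
    for line in account_lines_result.get("lines", []):
        # On a holder's trust line, "account" is the counterparty, i.e. the issuer.
        if line["currency"] == rlusd_currency and line["account"] == rlusd_issuer:
            rlusd = float(line["balance"])
            break
    return xrp, rlusd
```

Keeping the parser pure (dicts in, floats out) makes the wrappers trivial to mock in tests, which is the motivation given in the fix-scope list.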
Q3. Double-counting from locked-in-offer XRP: NO, but the framing misses the real mechanism
XRPL does not escrow XRP when an offer is created. Placing an offer that rests on the book only writes an Offer object into the ledger; it does not move XRP out of AccountRoot.Balance. account_info.Balance is the account's total XRP holding and is never reduced by outstanding offers. The reserve requirement (a base reserve plus an owner-reserve increment for each owned ledger object, and each resting Offer is an owned object) limits how much of that Balance is spendable, but it is enforced separately and does not change the Balance field.
Therefore, if the engine WERE summing AccountRoot.Balance + SUM(offer.TakerGets for XRP), that WOULD double-count: XRP committed to open offers is still inside Balance, so adding it again counts it twice. The engine does not do this summation, though; it tracks XRP only via _xrp_balance (fills-based), so double-counting is not the cause of the divergence.
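A small numeric sketch of the Q3 point, with made-up numbers: because resting offers leave Balance unchanged, any formula that adds offered XRP back on top of Balance overstates holdings, while the reserve only caps what is spendable. Both function names are hypothetical, and the reserve amounts are inputs rather than assumed network values (they are set by validator fee voting).

```python
from typing import Iterable


def naive_total_xrp(balance_xrp: float, offered_xrp: Iterable[float]) -> float:
    # Hypothetical WRONG formula: adds XRP committed to open offers on top of Balance,
    # even though that XRP never left Balance.
    return balance_xrp + sum(offered_xrp)


def spendable_xrp(
    balance_xrp: float,
    owner_count: int,
    base_reserve_xrp: float,
    owner_reserve_xrp: float,
) -> float:
    # The reserve limits what can be sent; it is not subtracted from Balance itself.
    # Reserve values are inputs here; the live network values are set by validator vote.
    reserve = base_reserve_xrp + owner_count * owner_reserve_xrp
    return max(0.0, balance_xrp - reserve)
```

With Balance = 50.0 XRP and two resting XRP-selling offers for 10.0 and 5.0 XRP, naive_total_xrp reports 65.0 XRP against a true holding of 50.0 XRP.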
The real drift mechanism is _handle_disappeared_active_order in ledger_reconciler.py:633–687. When an ACTIVE or PARTIALLY_FILLED order's offer_sequence is missing from the next account_offers snapshot and no cancel_tx_hash is recorded, the reconciler falls through to the full-fill branch at ledger_reconciler.py:675–687.
This unconditionally credits the entire order.quantity as a fill. If the offer disappeared for ANY reason other than a 1:1 full fill — partial fill + external cancel, OfferCancel submitted outside the engine, transient node-snapshot inconsistency, or a lost notification of partial fill followed by cancellation — the engine creates a phantom fill. For a BUY this over-credits XRP; for a SELL this over-debits XRP.
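To quantify what one disappeared order costs, here is a hedged sketch of the heuristic as described above, next to the conservative fallback proposed later in this memo. DisappearedOrder, handle_disappeared, phantom_xrp and the "FILLED"/"DEGRADED" labels are hypothetical; ledger_reconciler.py itself is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DisappearedOrder:
    """Hypothetical stand-in for an ACTIVE/PARTIALLY_FILLED order whose offer vanished."""

    side: str                    # "BUY" or "SELL"
    price: float                 # RLUSD per XRP
    quantity_rlusd: float        # order.quantity
    cancel_tx_hash: Optional[str] = None


def xrp_delta(side: str, price: float, qty_rlusd: float) -> float:
    """Signed XRP change for a fill: BUY adds XRP, SELL removes it."""
    xrp = qty_rlusd / price
    return xrp if side == "BUY" else -xrp


def handle_disappeared(order: DisappearedOrder, conservative: bool = False) -> Tuple[float, str]:
    """Return (XRP booked, status) for an offer missing from account_offers."""
    if order.cancel_tx_hash is not None:
        raise ValueError("sketch only models the no-cancel_tx_hash branch")
    if conservative:
        # Proposed follow-up: book nothing and wait for operator review.
        return 0.0, "DEGRADED"
    # Behaviour described in Q3: the whole order.quantity is booked as a fill.
    return xrp_delta(order.side, order.price, order.quantity_rlusd), "FILLED"


def phantom_xrp(order: DisappearedOrder, actually_filled_rlusd: float) -> float:
    """XRP booked by the full-fill fallback minus what really moved on-chain."""
    booked, _status = handle_disappeared(order)
    return booked - xrp_delta(order.side, order.price, actually_filled_rlusd)
```

For a BUY of 4.0 RLUSD @ 1.25 where only 1.0 RLUSD filled before an external cancel, the fallback books +3.2 XRP against a real +0.8 XRP, a phantom of +2.4 XRP; the mirrored SELL yields −2.4 XRP.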
Evidence of the pattern in the DB: 72 pre-S33 fills are on orders with cancel_tx_hash set. A sample of 10 includes several with quantity = 0.00, a telltale signature of cancel-raced orders that went through the reconciliation path despite an explicit cancel being in flight. These cancel-raced fills are the suspected source of the 0.69 XRP internal fills-math vs ledger discrepancy documented in Q5.
Q4. Injection reflection in inventory_ledger: correct by design, but the path has a latent gap
The Apr 18 00:19Z injection of 35.21 XRP IS correctly reflected in the engine's cache after rebuild.
Path:
- inject_capital.py writes a row to capital_events with event_type='deposit', asset='XRP', amount=35.21, price_rlusd=1.48, at 2026-04-18T00:19:32.063Z.
- On the next engine restart, rebuild() calls get_capital_delta_total('XRP'), which sums capital events with created_at >= first_ledger_ts (2026-04-13T01:00:30). The Apr 13 deposit of 39.27 XRP (2026-04-13T00:18:19) precedes first_ledger_ts, so under the FLAG-030 rule it is pre-baseline and excluded; only the Apr 18 deposit of 35.21 XRP is post-baseline. Overlay = +35.21 XRP, confirmed at S33 startup: DB shows Post-baseline XRP capital overlay = 35.21. ✅
The path is internally correct: inventory_ledger.new_balance tracks fills-only, overlay captures capital events, cache = sum. rebuild() is idempotent.
Latent gap (not the current root cause, but worth logging): rebuild() only runs at engine startup. If a capital injection lands while the engine is running, the overlay stays stale until the next restart. Under the current operating model (the engine is stopped for every injection) this gap is never triggered, but it is a silent precondition.
Injection-path sanity check: inject_capital.py updates capital_events in a transaction. There is no corresponding inventory_ledger row, because capital events never reach the ledger (an architectural invariant per the FLAG-030 comments). The overlay is the only reflection path. This design is sound as long as the stop-inject-restart sequence is followed.
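The baseline rule and the latent gap can both be expressed in a few lines. This is a sketch under stated assumptions: the event dict shape, the treatment of withdrawals as negative, and both function names are hypothetical; get_capital_delta_total itself was not reproduced.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable


def capital_overlay(events: Iterable[Dict], first_ledger_ts: datetime, asset: str = "XRP") -> float:
    """Sum post-baseline capital events for one asset (FLAG-030 rule as described in Q4)."""
    total = 0.0
    for e in events:
        if e["asset"] != asset or e["created_at"] < first_ledger_ts:
            continue  # pre-baseline events are already inside the on-chain seed
        # Assumption: withdrawals reduce the overlay; only deposits appear in this memo.
        sign = -1.0 if e["event_type"] == "withdrawal" else 1.0
        total += sign * e["amount"]
    return total


def overlay_is_stale(last_rebuild_at: datetime, events: Iterable[Dict]) -> bool:
    """True if any capital event landed after the last rebuild() (the FLAG-NEW-004 gap)."""
    return any(e["created_at"] > last_rebuild_at for e in events)
```

Fed the two XRP deposits above (39.27 XRP at 2026-04-13T00:18:19, 35.21 XRP at 2026-04-18T00:19:32.063) with first_ledger_ts = 2026-04-13T01:00:30, capital_overlay returns 35.21. A check like overlay_is_stale, run periodically, would turn the silent precondition into a detectable condition.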
Q5. 7.40 XRP discrepancy at S33 start: REPRODUCED (as 6.71 XRP real drift + 0.69 XRP internal residual)
Cold-start on-chain seed (2026-04-13T01:00:30Z), back-computed from the first ledger row: 40.4723 XRP.
The first fill was a BUY of 3.9 RLUSD @ 1.3304 ⇒ +2.9315 XRP, so the balance before that fill, 40.4723 XRP, is what get_wallet_balances() returned from account_info.Balance at first cold start.
Pre-S33 inventory_ledger activity (asset = XRP):
- 752 rows
- SUM(change) = +0.5457 XRP (what the ledger ACTUALLY recorded)
Pre-S33 fills activity (from the fills table, computing SUM(quantity/price) per side):
- 396 BUY fills: +1980.5859 XRP
- 358 SELL fills: −1979.3456 XRP
- Net fills-math: +1.2403 XRP
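Internal residual: fills-math (+1.2403) minus ledger (+0.5457) = 0.6946 ≈ 0.69 XRP that the fills table reports but apply_fill() did not persist to the ledger. Suspected source: 72 fills exist on orders with cancel_tx_hash set, some with quantity = 0.00. These are cancel-raced fills that produce no ledger entry, because the zero-quantity guard in apply_fill (inventory_manager.py:245–247) raises on fill_quantity_rlusd <= 0. Caveat: a row with quantity = 0.00 contributes nothing to SUM(quantity/price), so the zero-quantity rows alone cannot produce the 0.69 XRP; confirming the attribution needs a row-level diff of fills against inventory_ledger.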
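The row-level diff suggested above could start from three queries. This is a sketch against a hypothetical, simplified schema: fills(id, side, quantity, price) and inventory_ledger(fill_id, asset, change). The real tables may link fills to ledger rows differently, so the join key is an assumption. `quantity * 1.0` forces float division in SQLite.

```python
import sqlite3

# Hypothetical schema: fills(id, side, quantity, price) and
# inventory_ledger(fill_id, asset, change). The real tables may differ.
FILLS_MATH_SQL = """
SELECT COALESCE(SUM(CASE WHEN side = 'BUY' THEN quantity * 1.0 / price
                         ELSE -quantity * 1.0 / price END), 0.0)
FROM fills
"""
LEDGER_SUM_SQL = """
SELECT COALESCE(SUM(change), 0.0) FROM inventory_ledger WHERE asset = 'XRP'
"""
UNMATCHED_FILLS_SQL = """
SELECT f.id, f.side, f.quantity, f.price
FROM fills AS f
LEFT JOIN inventory_ledger AS l ON l.fill_id = f.id AND l.asset = 'XRP'
WHERE l.fill_id IS NULL
ORDER BY f.id
"""


def fills_vs_ledger(conn: sqlite3.Connection):
    """Return (fills_math, ledger_sum, residual, fills_without_ledger_row)."""
    fills_math = conn.execute(FILLS_MATH_SQL).fetchone()[0]
    ledger_sum = conn.execute(LEDGER_SUM_SQL).fetchone()[0]
    unmatched = conn.execute(UNMATCHED_FILLS_SQL).fetchall()
    return fills_math, ledger_sum, fills_math - ledger_sum, unmatched


def make_demo_db() -> sqlite3.Connection:
    """Tiny in-memory example: fill 3 is a zero-quantity, cancel-raced fill with no ledger row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE fills (id INTEGER PRIMARY KEY, side TEXT, quantity REAL, price REAL);
        CREATE TABLE inventory_ledger (fill_id INTEGER, asset TEXT, change REAL);
        INSERT INTO fills VALUES (1, 'BUY', 4.0, 1.25), (2, 'SELL', 2.5, 1.25), (3, 'BUY', 0.0, 1.25);
        INSERT INTO inventory_ledger VALUES (1, 'XRP', 3.2), (2, 'XRP', -2.0);
        """
    )
    return conn
```

In the demo database the zero-quantity fill shows up as unmatched but leaves the residual at zero, which is exactly why the Q5 attribution needs the row-level view rather than the aggregate alone.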
Reconstruction of S33 starting_xrp (DB side):
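| Step | XRP | Note |
|---|---|---|
| Cold-start seed (Apr 13) | 40.4723 | First on-chain read |
| + pre-S33 SUM(inventory_ledger.change) | +0.5457 | What the ledger recorded |
| = seed + recorded ledger changes | 41.0180 | |
| + internal residual (fills-math − ledger) | +0.6946 | 1.2403 − 0.5457 |
| = fills-only ledger tip | 41.7126 | = 40.4723 + 1.2403 (fills-math net) |
| + capital_overlay | +35.2100 | Apr 18 injection, post-baseline per FLAG-030 |
| = rebuild() cache | 76.9226 | |

This matches the DB's S33 starting_xrp of 76.9226 XRP exactly. The fills-only ledger tip equals the seed plus the full fills-math net, so it already carries the 0.69 XRP internal residual on top of the +0.5457 XRP recorded in SUM(change).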
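The reconstruction above is plain arithmetic; a small helper makes it repeatable for other sessions. Both function names are hypothetical, and the inputs are the figures quoted in this section.

```python
def seed_from_first_row(first_new_balance: float, first_change: float) -> float:
    # The cold-start seed is the first ledger row's new_balance minus its change.
    return first_new_balance - first_change


def reconstruct_starting_xrp(
    seed: float, ledger_change_sum: float, fills_math_net: float, capital_overlay: float
) -> dict:
    """Rebuild the Q5 engine-side figures from the memo's inputs."""
    internal_residual = fills_math_net - ledger_change_sum
    fills_only_tip = seed + ledger_change_sum + internal_residual  # == seed + fills_math_net
    return {
        "internal_residual": internal_residual,
        "fills_only_tip": fills_only_tip,
        "starting_xrp": fills_only_tip + capital_overlay,
    }
```

With seed 40.4723, SUM(change) +0.5457, fills-math +1.2403 and overlay 35.21, it returns a residual of 0.6946, a fills-only tip of 41.7126 and a starting_xrp of 76.9226 XRP (to floating-point precision).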
On-chain truth at S33 start (per Atlas XRPScan audit): 69.52 XRP, which equals 34.31 XRP on-chain before injection + 35.21 XRP injection.
Where the 7.40 XRP gap lives:

| Component | Value | Meaning |
|---|---|---|
| Engine cold-start seed (Apr 13) | 40.47 XRP | First-ever on-chain read; correct |
| Real on-chain Δ, Apr 13 → Apr 18 pre-injection | −6.16 XRP | True trading loss on real wallet |
| Engine ledger Δ over same period | +0.55 XRP | What the ledger recorded |
| → Real trading drift NOT reflected | 6.71 XRP | Phantom fills (Q3 mechanism) |
| Internal fills-math vs ledger residual | 0.69 XRP | Zero-qty / apply_fill-raise dust |
| Sum (drift + residual) | 7.40 XRP | ≈ Atlas audit gap ✅ |
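The table works because the seed and the capital overlay appear on both sides and cancel: engine = seed + ledger Δ + residual + overlay, chain = seed + real Δ + overlay, so engine − chain = (ledger Δ − real Δ) + residual. A sketch of that identity, with a hypothetical function name and the table's rounded inputs:

```python
def decompose_gap(
    engine_start: float,
    onchain_start: float,
    real_onchain_delta: float,
    ledger_delta: float,
    internal_residual: float,
) -> dict:
    """Split the engine-vs-chain gap into the two Q5 components."""
    unreflected_drift = ledger_delta - real_onchain_delta  # what the ledger got wrong vs the wallet
    explained = unreflected_drift + internal_residual
    observed = engine_start - onchain_start
    return {
        "unreflected_drift": unreflected_drift,
        "explained_gap": explained,
        "observed_gap": observed,
        "unexplained": observed - explained,
    }
```

With 76.9226 vs 69.52 XRP, a real Δ of −6.16, a ledger Δ of +0.55 and a residual of 0.69, the drift term is 6.71 XRP, the explained gap is 7.40 XRP, and the unexplained remainder (about 0.003 XRP) is within the rounding of the inputs.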
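The engine's ledger recorded a net +0.55 XRP from fills while the actual wallet lost 6.16 XRP. The drift direction (ledger above wallet) is consistent with BUY orders crediting XRP the wallet never received and/or SELL orders missing real XRP outflows. The BUY case is exactly what the reconciler's unconditional full-fill fallback in _handle_disappeared_active_order produces; on the SELL side the same fallback over-debits, which pushes the ledger the other way, so SELL-side phantoms can only partly offset the net drift.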
Root cause statement
Structural: The NEO engine treats the fills-based inventory_ledger as the source of truth for its own inventory view. It reads on-chain balance exactly once — at the first cold start on a fresh DB — and never again. Every subsequent session uses the accumulated ledger + capital overlay. There is no runtime or session-start reconciliation that validates internal state against account_info.Balance or account_lines.
Mechanical: The ledger reconciler's "offer disappeared from account_offers without a cancel_tx_hash → treat as full fill" heuristic (ledger_reconciler.py:675–687) is the primary driver of divergence. Over S1–S32 this heuristic put ~6.71 XRP of phantom fills into the ledger. A suspected secondary contributor (~0.69 XRP) is the apply_fill() zero-quantity guard, which rejects cancel-raced zero-fill events so that no ledger entry is written, while the parent fill rows still exist in fills.
Once drift begins, nothing detects it. The engine's internal view at S33 start was 7.40 XRP ahead of reality; by S39 close the gap had grown to 43.87 XRP as error kept accumulating over S33–S39. There is no safeguard anywhere in the codebase (startup, runtime, or shutdown) that compares internal against on-chain state.
Fix scope for feat/wallet-truth-reconciliation
The truth-check spec Vesper provided in Deliverable 2 is the right frame. Specific observations from the code that should shape the implementation:
- The live.starting_balance_xrp seed path (main_loop.py:320–343) is the right place for the startup truth check. It already calls get_wallet_balances() and already writes engine_state. The new check should run UNCONDITIONALLY (not gated on needs_seed), compare on-chain XRP and RLUSD against self._inventory._xrp_balance and _rlusd_balance respectively, and either block the session or write warn/halt status per the spec thresholds.
- Runtime check (2B) needs a new timer inside the main loop. The main loop in main_loop.py runs tick-oriented logic, but Vesper's spec explicitly requires time-based, not tick-based, triggering (inventory_truth_check_interval_s). I'll add a last_truth_check_at timestamp and compare it against time.monotonic() on every tick.
- Shutdown check (2C) plugs in between cancel_all_orders and close_session. Branch 2 (fix/inventory-invariant-at-shutdown) already inserted _check_inventory_invariant() at exactly this spot. The truth check goes next to it; both fire at shutdown and write different keys to engine_state.
- The inventory_truth_snapshots table needs a new migration in state_manager.py alongside the existing schema init, following the same additive-migration pattern as recent branches.
- get_wallet_balances() already returns tuple[float, float] (XRP, RLUSD), so no new gateway method is required. I would still add explicit get_on_chain_xrp_balance() and get_on_chain_rlusd_balance() thin wrappers to make test mocks easier.
- The unconditional phantom-fill path (_handle_disappeared_active_order) is a separate issue. I recommend NOT fixing it as part of feat/wallet-truth-reconciliation: that branch adds the detector, and the phantom-fill path is the underlying bug the detector will now catch. A follow-up branch (proposed: fix/reconciler-disappeared-order-conservative) should change the fallback from "full fill" to "emit DEGRADED and wait for operator review," matching the principle already stated in the docstring at ledger_reconciler.py:17–18 ("the reconciler never INVENTS a fill or cancellation"), which the code violates.
- Threshold sanity check against historical data: Atlas's baseline drift of 7.40 XRP at S33 start ⇒ the |delta_xrp| > 5.0 HALT threshold is appropriate. The current real-vs-internal gap of 43.87 XRP at S39 close would also halt. The proposed WARN threshold of 1.0 XRP / 2.0 RLUSD total_value flags small divergence early, which is reasonable for a first pass. Recommend making thresholds config-tunable (Vesper's spec already does this).
- inventory_truth.status = unverified on API failure must NOT count as a pass. The dashboard and the halt_on_repeated_unverified logic must both treat it as a distinct warning state.
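A hedged sketch of the check logic the bullets above call for: threshold classification with a distinct unverified state, a consecutive-unverified halt rule, and a time-based trigger polled every tick. All names here are hypothetical. The threshold defaults are the spec values quoted in this memo; the total_value definition (XRP delta valued at mid plus RLUSD delta) and the "consecutive" reading of unverified_halt_count are my assumptions, to be confirmed against Vesper's spec.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class TruthThresholds:
    # Defaults taken from the spec values quoted in this memo; all tunable.
    warn_xrp: float = 1.0
    halt_xrp: float = 5.0
    warn_total_value_rlusd: float = 2.0


def classify_truth(
    internal_xrp: float,
    internal_rlusd: float,
    onchain: Optional[Tuple[float, float]],
    mid_price: float,
    th: TruthThresholds = TruthThresholds(),
) -> str:
    """Return "ok", "warn", "halt" or "unverified" for one truth check."""
    if onchain is None:
        return "unverified"  # API failure: a distinct state, never a pass
    onchain_xrp, onchain_rlusd = onchain
    delta_xrp = internal_xrp - onchain_xrp
    # Assumed definition of total_value: both deltas expressed in RLUSD at mid.
    delta_value = delta_xrp * mid_price + (internal_rlusd - onchain_rlusd)
    if abs(delta_xrp) > th.halt_xrp:
        return "halt"
    if abs(delta_xrp) > th.warn_xrp or abs(delta_value) > th.warn_total_value_rlusd:
        return "warn"
    return "ok"


def unverified_streak_halts(history: List[str], unverified_halt_count: int = 3) -> bool:
    """Assumed semantics: halt after N CONSECUTIVE unverified results."""
    streak = 0
    for status in reversed(history):
        if status != "unverified":
            break
        streak += 1
    return streak >= unverified_halt_count


@dataclass
class TruthCheckTimer:
    """Time-based trigger polled once per tick (monotonic clock, not tick count)."""

    interval_s: float
    clock: Callable[[], float] = time.monotonic
    last_truth_check_at: Optional[float] = None

    def due(self) -> bool:
        now = self.clock()
        if self.last_truth_check_at is None or now - self.last_truth_check_at >= self.interval_s:
            self.last_truth_check_at = now
            return True
        return False
```

Fed the S33 figures (internal 76.9226 XRP vs on-chain 69.52 XRP), classify_truth returns "halt". Injecting the clock into TruthCheckTimer keeps the interval logic testable without sleeping.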
Secondary findings flagged for separate follow-up
- FLAG-NEW-002 (proposed): _handle_disappeared_active_order violates its own docstring principle. Pending dedicated branch.
- FLAG-NEW-003 (proposed): apply_fill() writes no inventory_ledger entries for zero-quantity fills. Those fills exist in the fills table and in per-fill metrics queries but not in inventory_ledger; this creates the 754-vs-752 accounting mismatch and is the suspected source of the 0.69 XRP residual.
- FLAG-NEW-004 (proposed): capital_events injected during a running session are not reflected in _xrp_balance until the next restart. Acceptable under current ops (the engine is stopped, injected, then restarted), but should be hardened.
- Session close cancel invariant (carried forward from prior investigation): residual orders filling post-shutdown are still a hypothesized source of on-chain drift AFTER engine close. The new inventory_truth checks cover this: the shutdown snapshot records state at close, and the startup check will flag any post-shutdown drift at the NEXT startup; combined with the startup halt, this is complementary protection.
Blockers / open items for Vesper
Before starting D2 I want explicit rulings on:
- Threshold defaults: the spec says WARN |delta_xrp| > 1.0, HALT |delta_xrp| > 5.0. Given that current drift is 43.87 XRP, do you want tighter thresholds for initial deployment (e.g., WARN 0.5 / HALT 2.0) so even small divergences trigger review, or are the spec defaults final?
- API failure policy: the spec says halt_on_repeated_unverified = true with unverified_halt_count = 3. The account_info RPC is very reliable on standard XRPL nodes; is 3 a starting point or the production value?
- Backfill at first run: when feat/wallet-truth-reconciliation lands, the current 43.87 XRP internal vs on-chain gap will cause an immediate HALT on session start. This is the intended behavior per Vesper's spec, but I want to confirm the remediation path: is it a manual inventory_ledger adjustment + capital_events reconciliation, or does the branch ship with a one-shot reconciliation tool?
- WAC implications: the 6.71 XRP of phantom fills affected _xrp_wac_avg_cost state. When we realign, WAC should be rebuilt against corrected state. The existing _rebuild_wac replays from fills, so it will inherit the same phantom-fill error. Does the fix also need a WAC correction pass?
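To illustrate the WAC question, a generic weighted-average-cost replay shows how a phantom BUY row carries straight through into both position and average cost. _rebuild_wac's actual rules (sell handling, fees, resets) were not reviewed, so replay_wac below is an assumption, not the engine's method.

```python
from typing import Iterable, Tuple

Fill = Tuple[str, float, float]  # (side, xrp_quantity, price in RLUSD per XRP)


def replay_wac(fills: Iterable[Fill]) -> Tuple[float, float]:
    """Generic weighted-average-cost replay; returns (xrp_position, avg_cost)."""
    qty, avg = 0.0, 0.0
    for side, xrp_qty, price in fills:
        if xrp_qty <= 0:
            continue  # zero-quantity rows carry no cost information
        if side == "BUY":
            # Buys blend their price into the average cost.
            avg = (avg * qty + xrp_qty * price) / (qty + xrp_qty)
            qty += xrp_qty
        else:
            # Sells reduce the position at the current average cost.
            qty -= xrp_qty
            if qty <= 0:
                qty, avg = 0.0, 0.0  # simplification: no short positions in this sketch
    return qty, avg
```

Replaying BUY 10 XRP @ 1.50 and BUY 5 XRP @ 1.20 gives 15 XRP at 1.40; appending one phantom BUY of 5 XRP @ 1.20 gives 20 XRP at 1.35. Any replay over fills that still contains phantom rows reproduces the error, so a WAC correction pass would need the corrected fill set as input.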
I'll hold on D2 until these are answered.
— Orion