Vesper Review — D2 feat/wallet-truth-reconciliation
To: Orion (he/him)
From: Vesper (she/her)
CC: Atlas, Katja (Captain)
Date: 2026-04-19
Re: D2 review — HOLD. Two required code changes before merge.
Status: HOLD
Do not merge. Two commits deviate from spec in ways that would cause the engine to freeze trading at the WARN threshold — which is not what Atlas mandated and not what the rulings say. Fix C3 and C5, re-run tests, resubmit. Everything else is approved.
Required Changes (must fix before merge)
Fix 1 — C3: DEGRADED triggers on HALT threshold, not WARN
What you built: RUNNING → DEGRADED on WARN
What the spec says: WARN is a log-and-continue state. DEGRADED is the response to HALT-threshold exceedance (where we cancel orders and stop quoting without going full terminal). The state machine is ok ↔ warn ↔ DEGRADED → HALT, where the transition from WARN to DEGRADED happens when the delta crosses the HALT threshold — not the WARN threshold.
Why it matters: At current thresholds (WARN = 1.0 XRP), the engine would enter DEGRADED and stop quoting after every routine reconciliation wobble. That defeats the purpose of having a WARN level at all. WARN exists so the operator sees the drift before it becomes a problem, not so the engine shuts down trading at the first sign of it.
Required change: The RUNNING → DEGRADED trigger must fire on status = HALT (delta ≥ halt threshold), not status = WARN. No transition into DEGRADED is triggered by status = WARN. The correct flow is:
status = ok → continue
status = warn → log + continue, no state change
status = halt → enter DEGRADED (cancel all, stop quoting)
status = unverified (≥ halt_count) → enter DEGRADED
DEGRADED → recovery if status = ok AND unverified_count = 0
DEGRADED → HALT if degraded_since > degraded_timeout_s
Update the state machine in C3 and ensure the test cases in C7 reflect this. Tests 14 and 15 (DEGRADED recovery and timeout) are fine in concept but may have the wrong trigger — check that the setup puts the engine into DEGRADED via a HALT-threshold event, not a WARN-threshold event.
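The corrected flow can be sketched as a pure transition function. This is a minimal illustration, not the C3 implementation: `EngineState`, `TruthResult`, `next_state`, and the default values for `halt_count` and `degraded_timeout_s` are hypothetical names and placeholders, and checking recovery before the timeout while in DEGRADED is an assumption the spec text above does not settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class EngineState(Enum):
    RUNNING = "running"
    DEGRADED = "degraded"
    HALT = "halt"


@dataclass
class TruthResult:
    status: str                # "ok", "warn", "halt", or "unverified"
    unverified_count: int = 0  # consecutive unverified checks so far


def next_state(
    state: EngineState,
    result: TruthResult,
    now: float,
    degraded_since: Optional[float],
    halt_count: int = 3,                # placeholder value
    degraded_timeout_s: float = 300.0,  # placeholder value
) -> Tuple[EngineState, Optional[float]]:
    """Return (new_state, new_degraded_since) for one truth check."""
    if state is EngineState.HALT:
        return state, degraded_since  # terminal: no way back

    if state is EngineState.RUNNING:
        trips = result.status == "halt" or (
            result.status == "unverified"
            and result.unverified_count >= halt_count
        )
        if trips:
            return EngineState.DEGRADED, now
        # "ok" and "warn" both continue; warn is only logged by the caller.
        return EngineState.RUNNING, None

    # state is DEGRADED. Assumption: recovery is evaluated before timeout.
    if result.status == "ok" and result.unverified_count == 0:
        return EngineState.RUNNING, None
    if degraded_since is not None and now - degraded_since > degraded_timeout_s:
        return EngineState.HALT, degraded_since
    return EngineState.DEGRADED, degraded_since
```

For tests 14 and 15, the setup in this sketch would enter DEGRADED with `TruthResult("halt")`; feeding `TruthResult("warn")` while RUNNING should leave the state unchanged.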
Fix 2 — C5: WARN must not block orders
What you built: only OK permits placement. WARN/DEGRADED/HALT/missing → refuse
What the spec says (Vesper D2 Scope Addendum, Addition 1):
If status = ok or status = warn: proceed normally (warn is logged but does not block orders)
If status = halt or status = degraded: do not submit order, log the block
If status = unverified: proceed with WARNING log
WARN is explicitly a pass-through at the pre-trade gate. It is not a blocking condition.
Why it matters: At WARN threshold (1.0 XRP delta), the engine would refuse to place any orders. Combined with Fix 1, this means a 1.01 XRP delta would simultaneously freeze the state machine in DEGRADED AND refuse all orders from the gate — which is two layers of total shutdown for a threshold we designed to be a soft warning.
Required change: Update submit_intent gate logic to:
if status in ('ok', 'warn'):
    proceed  # warn is logged but not blocking
elif status in ('halt', 'degraded'):
    return TruthGateRefused(...)
elif status == 'unverified':
    log WARNING and proceed
elif status is missing:
    log WARNING and proceed  # treat same as unverified
Update the test cases accordingly. Tests 11–13 (pre-trade gate ok/halt/unverified) should cover this. Confirm test 11 also covers WARN → proceed.
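One way to pin the gate behavior down in tests is a small expectation table that includes the WARN case explicitly. The sketch below is illustrative only: `gate_allows` and `EXPECTED` are hypothetical names, a missing status is modeled as `None`, and raising on an unrecognized status string is an assumption, since the addendum does not say what should happen there.

```python
import logging
from typing import Optional

log = logging.getLogger("truth_gate")


def gate_allows(status: Optional[str]) -> bool:
    """Return True if the pre-trade gate lets the order through."""
    if status in ("ok", "warn"):
        if status == "warn":
            log.warning("truth status is warn; proceeding")
        return True
    if status in ("halt", "degraded"):
        log.error("[TRUTH_GATE_REFUSED] status=%s", status)
        return False
    if status is None or status == "unverified":
        # A missing status is treated the same as unverified.
        log.warning("truth status is %s; proceeding", status or "missing")
        return True
    # Assumption: an unknown status string is a programming error.
    raise ValueError(f"unknown truth status: {status!r}")


# Expected gate outcome for every status named in the addendum.
EXPECTED = {
    "ok": True,
    "warn": True,        # pass-through: logged, not blocking
    "halt": False,
    "degraded": False,
    "unverified": True,
    None: True,          # missing
}

for status, allowed in EXPECTED.items():
    assert gate_allows(status) is allowed, status
```

Keeping the table next to the tests makes it easy to confirm that test 11 exercises both `ok` and `warn`.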
Rulings on Open Questions
Q1 — --force-start rename: Rename to --accept-truth-divergence. The longer name is more explicit and will make operator intent clear in session logs and audit trail. This is a safety-facing escape hatch — it should not be terse.
Q2 — DEGRADED recheck cadence (degraded_recheck_interval_ticks=1): Change the default. Every-tick rechecks during DEGRADED would hammer the XRPL RPC under the conditions where we're most vulnerable. Set the default to match check_interval_ticks (the normal 60s cadence). degraded_recheck_interval_ticks can remain as a tunable for operators who want faster recovery detection, but shipping with 1 as default is not safe for livenet. If this requires a config change only (not a code change), it can be included in the same fix commit as C3/C5.
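If the default cannot be expressed in config alone, one possible shape is to leave the tunable unset and fall back to the normal cadence. This is a sketch under assumptions: `TruthCheckConfig` and `effective_degraded_recheck_ticks` are hypothetical names, and the value 15 for `check_interval_ticks` is a placeholder, not the project's actual setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TruthCheckConfig:
    check_interval_ticks: int = 15  # placeholder; use the real 60s cadence
    # None means "follow check_interval_ticks"; operators may set it lower.
    degraded_recheck_interval_ticks: Optional[int] = None

    def effective_degraded_recheck_ticks(self) -> int:
        """Recheck cadence to use while the engine is DEGRADED."""
        if self.degraded_recheck_interval_ticks is None:
            return self.check_interval_ticks
        return self.degraded_recheck_interval_ticks


default_cfg = TruthCheckConfig()
fast_cfg = TruthCheckConfig(degraded_recheck_interval_ticks=5)
print(default_cfg.effective_degraded_recheck_ticks())  # same as check_interval_ticks
print(fast_cfg.effective_degraded_recheck_ticks())     # operator override
```

With this shape, changing `check_interval_ticks` later moves the DEGRADED cadence with it unless an operator has set an explicit override.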
Q3 — Pre-trade gate log verbosity: Rate-limit. Log the first refusal in full, then every 50th (not 100th). 50 is frequent enough to remain visible without burying the operator in identical lines. Pattern: [TRUTH_GATE_REFUSED] (1st) and [TRUTH_GATE_REFUSED] (suppressed N — showing every 50th). This can go in the same fix commit.
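The rate limit can be sketched as a small counter that decides whether a given refusal should be logged and how many were suppressed since the last logged line. `RefusalLogLimiter` is a hypothetical name, and reading "every 50th" as refusals number 50, 100, 150 and so on is an assumption. Message formatting is left to the caller so it can follow the pattern in the ruling, and whether the counter resets after a successful order is not covered here.

```python
from typing import Tuple


class RefusalLogLimiter:
    """Log the first refusal, then every `every`-th one."""

    def __init__(self, every: int = 50) -> None:
        self.every = every
        self.count = 0        # total refusals seen
        self.suppressed = 0   # refusals skipped since the last logged line

    def record(self) -> Tuple[bool, int]:
        """Register one refusal; return (should_log, suppressed_since_last)."""
        self.count += 1
        if self.count == 1 or self.count % self.every == 0:
            skipped, self.suppressed = self.suppressed, 0
            return True, skipped
        self.suppressed += 1
        return False, 0


limiter = RefusalLogLimiter(every=50)
logged = []
for n in range(1, 101):
    should_log, skipped = limiter.record()
    if should_log:
        logged.append((n, skipped))

print(logged)  # refusal numbers that were logged, with suppressed counts
```

In this sketch the caller would emit the full first-refusal line when `limiter.count == 1` and the suppressed-count line otherwise.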
Q4 — FLAG-040 WAC backfill: Leave realignment rows as perpetual basis-neutral markers. Do not backfill when FLAG-040 lands. When we rebuild WAC under FLAG-040, realignment rows will remain excluded from cost-basis accounting — they represent balance corrections, not capital events with a cost basis to attribute. Document this decision inline in _rebuild_wac so it's visible when FLAG-040 work begins.
What's Approved
Everything else is clean. Specific callouts:
- C1 — schema and config surface looks correct. degraded_recheck_interval_ticks default needs to change per Q2 ruling above, but the field itself is right.
- C2 — truth checker core is solid. Per-asset HALT check before combine is correct. Counter reset on any successful check matches Ruling 2.
- C4 — startup/runtime/shutdown wiring is correct. Rename --force-start per Q1 ruling.
- C6 — realignment tool is correct. Active-session lock, dry-run default, atomic two-row write, basis_delta_rlusd=0.0 lock, _rebuild_wac exclusion — all per spec. FLAG-031 invariants preserved.
- C7 — test structure is correct. After fixes to C3/C5, update affected test cases and confirm all 16 still green.
Resubmit Instructions
Fix C3 and C5 (and the config default from Q2, the rename from Q1, and the log rate-limiting from Q3 — these can all go in a single fixup commit or be folded into C3/C4/C5 via amend, your call on the commit structure). Re-run the full 16-test suite. Resubmit as D2.1 with the fix commit(s) appended to the patch bundle.
No need to re-review C1, C2, C4, C6, or C7 unless the fixes touch those files.
— Vesper