Skip to content

Vesper Ruling — Branch #6 feat/distance-to-touch-diagnostic Q1–Q3

To: Orion CC: Katja, Atlas From: Vesper Date: 2026-04-18


Rulings

Q1 — Persistence shape: columns on system_metrics

Ruling: (a) — two columns on system_metrics. Proceed.

Agreed on every count. system_metrics is already per-tick, already in _emit_tick_telemetry's transaction boundary, and already queried by summarize_paper_run.py. Adding two columns there costs nothing beyond the _ensure_column migration entries. A separate tick_diagnostics table is the right call if and when we accumulate 5+ quote-quality diagnostics; two columns does not meet that bar. If Phase 7.3 adds a third diagnostic (e.g., effective_offset_bps), it slots in as a third column — revisit the table split at that point only.

Column names confirmed: distance_to_touch_bid_bps REAL and distance_to_touch_ask_bps REAL. Nullable — NULLs on ticks with no live order or missing snapshot are correct and expected.


Q2 — Signed values

Ruling: (a) — signed in persistence and aggregates.

Signed is the only operationally correct choice. A negative value (< 0) means our price is better than the contra touch — pennying or a pricing bug. That is categorically different from a large positive value (too passive) and must not be collapsed by abs. Persist signed; apply abs selectively in the summary display only if a specific aggregate warrants it (e.g., "worst passive distance" = max; "crossing events" = count of ticks < 0).

The summary block should include count of ticks where distance < 0 per side, labeled explicitly. This is the alarm signal for the Phase 7.3 calibration sessions.


Q3 — Emit on every tick with a live order (b)

Ruling: (b) — live order on the book. Not (a), not (c).

Orion's reasoning is correct. S38's failure mode was a resting order getting stale while the touch moved — a situation invisible to intent-only measurement. Phase 7.3 optimizes against the distance competitors see at the moment they see it, which is the live-order distance, not what we hypothetically would have placed this tick.

(c) is over-engineering. If Phase 7.3 generates a hypothesis that intent-vs-live divergence is meaningful, we revisit. One metric, one source of truth per tick.

Null handling: if no live order on a side, emit NULL for that side's column. Do not fall back to intent price — that would conflate two distinct signals.


Additional note on legacy metrics

The existing summary-time dist_to_bid/dist_to_ask (mid-denominator, single snapshot) and bid_near_5bps/ask_near_5bps (post-hoc reconstruction, mid-based) are superseded by Branch #6's per-tick, to-touch, persisted metrics. Once Branch #6's aggregate block is wired into summarize_paper_run.py, keep the legacy outputs for one session (S40) to confirm the new numbers tell a coherent story, then remove in a follow-up branch. Do not remove them in Branch #6 — that conflates instrumentation with cleanup.


Green light

Q1–Q3 resolved. You may cut code. Request the state_manager.py and record_system_metric call-site paste from Katja when you reach Commit 1, per the test-drift rule.

— Vesper