
Orion Investigation — Branch #6 feat/distance-to-touch-diagnostic

To: Katja
CC: Vesper, Atlas
From: Orion
Date: 2026-04-18


Acknowledgment

Branch #5 is in. Green-lit for Branch #6 per Katja's ruling:

  • Workspace path (Windows): C:\Users\Katja\Documents\Claude Homebase Neo\
  • Repo path (Windows): C:\Users\Katja\Documents\NEO GitHub\neo-2026\
  • Test-drift rule: before any patch that modifies a pre-existing file, I request current head-of-main content from Katja's repo. New files, no paste needed.

This memo is pre-code. No commits, no patches. Q1–Q3 at the bottom — code waits on rulings.

Scope

Ship the primary Phase 7.3 metric per Atlas Pre-7.3 Review §9 and the pre-7.3 gate: per-tick, per-side distance from our active quote to the same-side touch (best_bid for BUY, best_ask for SELL), expressed in bps.

This is the signal Phase 7.3 offset calibration will optimize against. S38's failure mode — "quotes 11–12 bps from touch in a 4 bps market" — is only visible because someone eyeballed it. Branch #6 makes it a first-class, persisted, summarized metric so offset sweeps can be compared session-over-session, not vibe-checked.

Success criteria

  1. Per-tick distance-to-touch emitted in structured logs (live operational signal).
  2. Per-tick persistence (so Phase 7.3 can compute session stats and compare runs).
  3. Session-aggregate stats in the paper-run summary (min, p50, p95, max per side).
  4. No behavioral change — diagnostic only. No quote placement logic touches.

Explicitly NOT in scope: offset calibration itself (that's Phase 7.3), adaptive offset adjustment (a future phase), toxicity-adjusted distance. This is instrumentation only.

Findings

A. What already exists (partial/related surfaces)

Summary-time dist_to_bid / dist_to_ask (summarize_paper_run.py:900-901). Computes same-side-to-touch distance for whatever active orders are observed at render time:

bid_dist = ((qq.get("best_bid") or 0) - (qq.get("our_bid") or 0)) / mid * 10000 if mid > 0 and qq.get("our_bid") else None
ask_dist = ((qq.get("our_ask") or 0) - (qq.get("best_ask") or 0)) / mid * 10000 if mid > 0 and qq.get("our_ask") else None

This is a single-point snapshot of the last state queried at summary time. It is not a time series, it is not persisted, and the denominator is mid (not the same-side touch). Useful at end-of-session; useless for live operational feedback or cross-session comparison. Proposal: keep it (renamed if needed) as a legacy top-line once Branch #6's aggregated stats land; Vesper can rule on whether to delete it.

Summary-time bid_near_5bps / ask_near_5bps (summarize_paper_run.py:629-749). Post-hoc reconstruction: walks system_metrics tick timestamps, bisect-joins against market_snapshots for mid and against orders history for the active order at each tick, counts ticks where same-side-to-mid distance ≤ 5 bps. Semantics:

  • Denominator is mid, not touch.
  • Count-only: 1 tick within 5 bps == 100 ticks within 5 bps on the output.
  • Reconstruction: relies on orders.created_at/updated_at to infer active windows. Works, but fragile.

Branch #6 replaces these with per-tick, per-side, to-touch measurements persisted inline with tick telemetry — no post-hoc reconstruction needed, no mid/touch ambiguity.

Tick telemetry path: NEOEngine._emit_tick_telemetry (main_loop.py:1620-1666). Already writes market_snapshots (has best_bid, best_ask, mid_price), inventory_snapshots, and system_metrics per tick. This is the natural seam to add distance-to-touch columns: same transaction boundary, same "persist without affecting trading" contract.

B. What's missing (the actual gap)

  1. No per-tick live emission. Our only same-side-to-touch measurement is end-of-session.
  2. No per-tick persistence. system_metrics schema (state_manager.py:366-380) has no distance columns.
  3. No session aggregation. We can't produce "p50 distance-to-touch on BUY side" without new persistence.

C. Where to compute the metric

The computation is stateless and requires only two inputs:

  1. snapshot.best_bid, snapshot.best_ask: already live on MarketSnapshot (market_data.py:47-95).
  2. Our active quote price per side, available two ways:
     • From intent this tick: returned by StrategyEngine.calculate_quote() in OrderIntent.price (strategy_engine.py:489, 534). Available at main_loop.py:1197 via intents.
     • From live orders already in flight: available via state.get_live_order_by_side(OrderSide.BUY/SELL).

The intent path gives "what we would place this tick"; the live-orders path gives "what we actually have resting on the book right now." The two can diverge (e.g., a tick suppresses the new intent but a prior order is still live). Q3 territory.

D. Proposed formula and sign convention

Same-side, to-touch. For each side, denominator is the same-side touch (not mid):

distance_to_touch_bid_bps = (best_bid - our_buy_price) / best_bid * 10000
distance_to_touch_ask_bps = (our_sell_price - best_ask) / best_ask * 10000

Sign convention (signed):

Value  Interpretation
> 0    Passive: we're behind the touch (below best_bid on BUY, above best_ask on SELL). Normal market-making posture.
= 0    At touch: joining the top of the book.
< 0    Improving or crossing: our price is better than the same-side touch (above best_bid on BUY, below best_ask on SELL). Either pennying or a pricing bug.

S38's "quotes 11–12 bps from touch in a 4 bps market" becomes distance_to_touch_bid_bps ≈ +11.5 for BUY — positive, i.e. too passive. This is the signal we want to sweep against.

Aggregates (min, p50, p95, max) are computed over signed values. Abs is a presentation choice applied per metric in the summary if useful.
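
A minimal sketch of the formula and sign convention above, for illustration only. The function name, the string-typed side argument, and the choice to return None on a missing or non-positive touch are assumptions, not the repo's API.

```python
from typing import Optional


def distance_to_touch_bps(side: str, our_price: Optional[float],
                          best_bid: Optional[float],
                          best_ask: Optional[float]) -> Optional[float]:
    """Signed same-side distance to touch in bps; None when it cannot be computed."""
    if side not in ("BUY", "SELL"):
        raise ValueError(f"unknown side: {side!r}")
    touch = best_bid if side == "BUY" else best_ask
    # Guard against missing inputs and zero/negative divisors.
    if our_price is None or touch is None or touch <= 0:
        return None
    if side == "BUY":
        # Positive when our bid sits below best_bid (passive).
        return (touch - our_price) / touch * 10000
    # Positive when our ask sits above best_ask (passive).
    return (our_price - touch) / touch * 10000
```

With best_bid = 100.0 and a BUY quote at 99.885, this returns roughly +11.5, the too-passive posture described for S38.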

E. Proposed commit structure

Pending Katja Q1–Q3.

Commit 1 — schema: add distance-to-touch columns to system_metrics.

  • Extend CREATE TABLE IF NOT EXISTS system_metrics in state_manager.py:366-380 with two new columns:
     • distance_to_touch_bid_bps REAL
     • distance_to_touch_ask_bps REAL
  • Add _ensure_column migration entries alongside the existing ones around state_manager.py:469-475, so live DBs pick up the new columns on next startup (no re-migration).
  • Extend record_system_metric() signature (state_manager.py:1390-1440) with the two new optional float params, slotted into the INSERT.
  • Test: table has new columns post-migration; record_system_metric accepts and persists the new values; legacy rows have NULL (backward-compatible read).
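
A rough sketch of what an idempotent column migration could look like, assuming the store is SQLite accessed through Python's sqlite3 module. The ensure_column helper below is hypothetical and is not the signature of the existing _ensure_column; the two-column stand-in table and its session_id column are placeholders for the real schema.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> bool:
    """Add `column` to `table` if absent. Returns True when a column was added."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    # Identifiers cannot be bound as parameters; callers must pass trusted names.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


conn = sqlite3.connect(":memory:")
# Stand-in for a pre-Branch-6 table; the real schema has more columns.
conn.execute("CREATE TABLE system_metrics (id INTEGER PRIMARY KEY, session_id TEXT)")
conn.execute("INSERT INTO system_metrics (session_id) VALUES ('legacy')")

added = [ensure_column(conn, "system_metrics", c, "REAL")
         for c in ("distance_to_touch_bid_bps", "distance_to_touch_ask_bps")]
# Second call is a no-op, which is what makes the migration safe on every startup.
again = ensure_column(conn, "system_metrics", "distance_to_touch_bid_bps", "REAL")

legacy_row = conn.execute(
    "SELECT distance_to_touch_bid_bps, distance_to_touch_ask_bps "
    "FROM system_metrics WHERE session_id = 'legacy'").fetchone()
```

Rows written before the migration read back as NULL in both new columns, which is the backward-compatible behavior the Commit 1 test is meant to pin.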

Commit 2 — compute + persist + log per tick.

  • In _emit_tick_telemetry (main_loop.py:1620-1666), compute both values from snapshot.best_bid/best_ask and our active quote price per side (source TBD in Q3). Guard against None / zero divisors.
  • Pass both values into record_system_metric().
  • Emit a structured log line per tick at log.info level: dist_to_touch bid_bps=X.X ask_bps=X.X (or whatever format Vesper prefers — easy to tune).
  • Test: happy path (bid+ask both present, both live orders) persists both; missing sides (no live BUY) persists NULL on that side; invalid snapshot (best_bid None) persists NULLs on both; stale intent vs live order path per Q3 ruling.
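
One possible shape for the per-tick log line, sketched under assumptions: the helper name is hypothetical, and rendering a missing side as "NA" is a placeholder until the format is ruled on.

```python
from typing import Optional


def format_dist_to_touch(bid_bps: Optional[float], ask_bps: Optional[float]) -> str:
    """Render the per-tick log line; a side with no value is shown as NA."""
    def fmt(value: Optional[float]) -> str:
        # "NA" for a missing side is an assumed placeholder, not a settled format.
        return "NA" if value is None else f"{value:.1f}"
    return f"dist_to_touch bid_bps={fmt(bid_bps)} ask_bps={fmt(ask_bps)}"
```

The string would be passed to whatever logger _emit_tick_telemetry already uses at info level; the persisted value for a missing side stays NULL regardless of how the log renders it.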

Commit 3 — session aggregates in summarize_paper_run.py.

  • New summary block render_distance_to_touch_summary(qq: dict) -> str keyed on session_id. Aggregates from system_metrics.distance_to_touch_{bid,ask}_bps.
  • Per side: count of non-NULL ticks, min, p50, p95, max (signed), plus count of ticks < 0 (improving/crossing — should be 0 in normal operation).
  • Wire into the existing main() render path.
  • Test: 4 synthetic rows in an in-memory DB produce expected min/p50/p95/max; NULL rows excluded from counts; empty session returns "no data" line without crashing.
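
A sketch of the per-side aggregation described above, operating on values already fetched from system_metrics. The function names are hypothetical, and the nearest-rank percentile method is an assumption; the memo does not specify which percentile definition the summary should use.

```python
import math
from typing import Optional, Sequence


def nearest_rank(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile over an ascending, non-empty sequence."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_side(values: Sequence[Optional[float]]) -> Optional[dict]:
    """Aggregate signed distances for one side; NULL (None) ticks are excluded."""
    present = sorted(v for v in values if v is not None)
    if not present:
        return None  # caller renders a "no data" line instead of crashing
    return {
        "count": len(present),
        "min": present[0],
        "p50": nearest_rank(present, 50),
        "p95": nearest_rank(present, 95),
        "max": present[-1],
        # Improving/crossing ticks; expected to be 0 in normal operation.
        "negative_ticks": sum(1 for v in present if v < 0),
    }
```

Aggregating over signed values keeps a single negative tick visible in both min and negative_ticks; with very few rows, nearest-rank p95 collapses to max, which is worth remembering when writing the 4-row test.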

Test posture estimate

  • New tests: ~8 (2 for schema/migration, 3 for emit path, 3 for aggregate render). All new files, no test-drift risk.
  • Pre-existing tests touched: test_state_manager.py may need a record_system_metric fixture update if it pins the old signature. I'll request head-of-main paste when I get there.

What is explicitly out of scope

  • Offset calibration itself. Phase 7.3 uses this metric; it does not modify quote placement.
  • Adaptive offset adjustment. Future phase.
  • Cross-side distance (e.g., BUY price vs. best_ask). Not asked for and not obviously useful.
  • Mid-based distance. We keep to-touch; the existing mid-based near_5bps count stays as legacy at Vesper's discretion.
  • Historical backfill. Old sessions keep NULL for the new columns; no attempt to reconstruct pre-Branch-6 distance-to-touch from existing logs.

Open questions — Q1, Q2, Q3

Q1 — Persistence shape: columns on system_metrics, separate tick_diagnostics table, or both?

Two clean options:

  • (a) Two columns on system_metrics. Leverages the existing per-tick insert already happening in _emit_tick_telemetry. Same transaction, no new table. Atlas §9 called it "per-tick", which system_metrics already is. If Phase 7.3 wants a third or fourth diagnostic down the line (e.g., effective_offset_bps, anchor_error_bps_persisted), they slot in next to distance-to-touch as new columns. Minimal surface.
  • (b) Separate tick_diagnostics table. Cleaner separation (system_metrics is "system health"; tick_diagnostics is "quote-quality"). But: we double the insert path, double the migration, and double the query surface for summarize_paper_run.py. No operational benefit I can identify unless we expect tick_diagnostics to grow to 20+ columns — which we don't.

My recommendation: (a) — two columns on system_metrics. Flag if you'd rather we carve out a clean tick_diagnostics table now and pay the migration cost once instead of drifting back to it in Phase 8.

Q2 — Signed or abs?

  • (a) Signed. Preserves direction. Crossing/improving (< 0) is operationally distinct from passive (> 0) and needs to surface: a negative value at any tick is a yellow flag (we are pennying the touch) or a red flag (pricing bug).
  • (b) Abs. Simpler aggregates. Hides direction entirely.

My recommendation: (a) signed in persistence and aggregates; apply abs() per stat in the summary render if Vesper wants cleaner display. Costs nothing to keep sign at the column level.

Q3 — Emit on all ticks or only on intent-producing ticks?

Two semantics:

  • (a) Intent-producing ticks only. Metric = distance between the quote we would place this tick and the same-side touch. Data source: intents returned by calculate_quote(). NULL on ticks that produce no intent (invalid market, suppression by participation filter, inventory guards, dedupe vs. live order).
  • (b) Every tick with a live order. Metric = distance between our actually-live order on the book and the same-side touch. Data source: state.get_live_order_by_side(side). NULL on ticks with no live order that side. Captures the "we placed a quote at t=0, it sat for 60 ticks, best_bid moved, our quote got stale" behavior, which is exactly the staleness Phase 7.3 cares about.
  • (c) Both. Two metrics per side: distance_to_touch_{bid,ask}_intent_bps (current-tick intent) and distance_to_touch_{bid,ask}_live_bps (resting order). Four columns total on system_metrics.

S38's problem was stale-quote-vs-moving-touch. That's (b). But (a) is cheaper and reveals pricing-engine posture. (c) is the most information for Phase 7.3 at the cost of schema surface.
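
To make the (a) versus (b) difference concrete, here is a toy calculation with made-up numbers. The 4 bps offset, the touch path, and the assumption that the resting order is never repriced are all hypothetical; none of this is session data.

```python
def bid_distance_bps(best_bid: float, our_price: float) -> float:
    """Signed BUY-side distance to touch in bps (positive = passive)."""
    return (best_bid - our_price) / best_bid * 10000


OFFSET_BPS = 4.0  # hypothetical offset the pricing engine targets each tick
best_bids = [100.00, 100.02, 100.05, 100.08]  # made-up touch path, drifting up

# Resting order: priced once at tick 0 and never repriced.
resting_price = best_bids[0] * (1 - OFFSET_BPS / 10000)

# (b) live-order semantics: distance grows as the touch moves away.
live = [bid_distance_bps(bb, resting_price) for bb in best_bids]
# (a) intent semantics: a fresh quote each tick always sits at the offset.
intent = [bid_distance_bps(bb, bb * (1 - OFFSET_BPS / 10000)) for bb in best_bids]
```

In this toy path the live distance climbs from about 4 bps to about 12 bps while the intent distance stays near 4 bps throughout, so an intent-only metric would report nothing wrong.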

My recommendation: (b) — live order on the book. This is what Phase 7.3 optimizes against: the actual distance competitors see at the actual moment they see it. Intent-only (a) can't surface staleness, and (c) is over-engineering until proven needed.

Flag if you want (c) — trivial to extend if the persistence shape in Q1 is (a) "columns on system_metrics"; we just add two more.


Standing by for Q1–Q3 rulings. No code until the rulings land.

— Orion