Orion Investigation: Branch #6 feat/distance-to-touch-diagnostic
To: Katja CC: Vesper, Atlas From: Orion Date: 2026-04-18
Acknowledgment
Branch #5 is in. Green-lit for Branch #6 per Katja's ruling:
- Workspace path (Windows): C:\Users\Katja\Documents\Claude Homebase Neo\
- Repo path (Windows): C:\Users\Katja\Documents\NEO GitHub\neo-2026\
- Test-drift rule: before any patch that modifies a pre-existing file, I request current head-of-main content from Katja's repo. New files, no paste needed.
This memo is pre-code. No commits, no patches. Q1–Q3 at the bottom — code waits on rulings.
Scope
Ship the primary Phase 7.3 metric per Atlas Pre-7.3 Review §9 and the pre-7.3 gate: per-tick, per-side distance from our active quote to the same-side touch (best_bid for BUY, best_ask for SELL), expressed in bps.
This is the signal Phase 7.3 offset calibration will optimize against. S38's failure mode — "quotes 11–12 bps from touch in a 4 bps market" — is only visible because someone eyeballed it. Branch #6 makes it a first-class, persisted, summarized metric so offset sweeps can be compared session-over-session, not vibe-checked.
Success criteria
- Per-tick distance-to-touch emitted in structured logs (live operational signal).
- Per-tick persistence (so Phase 7.3 can compute session stats and compare runs).
- Session-aggregate stats in the paper-run summary (min, p50, p95, max per side).
- No behavioral change — diagnostic only. No quote placement logic touches.
Explicitly NOT in scope: offset calibration itself (that's Phase 7.3), adaptive offset adjustment (a future phase), toxicity-adjusted distance. This is instrumentation only.
Findings
A. What already exists (partial/related surfaces)
Summary-time dist_to_bid / dist_to_ask — summarize_paper_run.py:900-901 computes same-side-to-touch distance for whatever active orders are observed at render time:
bid_dist = ((qq.get("best_bid") or 0) - (qq.get("our_bid") or 0)) / mid * 10000 if mid > 0 and qq.get("our_bid") else None
ask_dist = ((qq.get("our_ask") or 0) - (qq.get("best_ask") or 0)) / mid * 10000 if mid > 0 and qq.get("our_ask") else None
This is a single-point snapshot of the last state queried at summary time. It is not a time series, it is not persisted, and the denominator is mid (not the same-side touch). Useful at end-of-session; useless for live operational feedback or cross-session comparison. It can be renamed or kept as a legacy top-line once Branch #6's aggregated stats land; Vesper can rule on whether to delete it.
Summary-time bid_near_5bps / ask_near_5bps — summarize_paper_run.py:629-749. Post-hoc reconstruction: walks system_metrics tick timestamps, bisect-joins against market_snapshots for mid and against orders history for the active order at each tick, counts ticks where same-side-to-mid distance ≤ 5 bps. Semantics:
- Denominator is mid, not touch.
- Count-only: 1 tick within 5 bps == 100 ticks within 5 bps on the output.
- Reconstruction: relies on orders.created_at/updated_at to infer active windows. Works, but fragile.
Branch #6 replaces these with per-tick, per-side, to-touch measurements persisted inline with tick telemetry — no post-hoc reconstruction needed, no mid/touch ambiguity.
Tick telemetry path — NEOEngine._emit_tick_telemetry (main_loop.py:1620-1666). Already writes market_snapshots (has best_bid, best_ask, mid_price), inventory_snapshots, and system_metrics per tick. This is the natural seam to add distance-to-touch columns — same transaction boundary, same "persist without affecting trading" contract.
B. What's missing (the actual gap)
- No per-tick live emission. Our only same-side-to-touch measurement is end-of-session.
- No per-tick persistence. system_metrics schema (state_manager.py:366-380) has no distance columns.
- No session aggregation. We can't produce "p50 distance-to-touch on BUY side" without new persistence.
C. Where to compute the metric
The computation is stateless and requires only two inputs:
- snapshot.best_bid, snapshot.best_ask: already live on MarketSnapshot (market_data.py:47-95).
- Our active quote price per side, available two ways:
  - From intent this tick: returned by StrategyEngine.calculate_quote() in OrderIntent.price (strategy_engine.py:489, 534). Available at main_loop.py:1197 via intents.
  - From live orders already in flight: available via state.get_live_order_by_side(OrderSide.BUY/SELL).
The intent path gives "what we would place this tick"; the live-orders path gives "what we actually have resting on the book right now." Those diverge (e.g., tick suppresses new intent but a prior order is still live). Q3 territory.
D. Proposed formula and sign convention
Same-side, to-touch. For each side, denominator is the same-side touch (not mid):
distance_to_touch_bid_bps = (best_bid - our_buy_price) / best_bid * 10000
distance_to_touch_ask_bps = (our_sell_price - best_ask) / best_ask * 10000
Sign convention (signed):
| Value | Interpretation |
|---|---|
| > 0 | Passive: we're behind the touch (below best_bid on BUY, above best_ask on SELL). Normal market-making posture. |
| = 0 | At touch: joining the top of the book. |
| < 0 | Improving or crossing: our price is better than the same-side touch (above best_bid on BUY, below best_ask on SELL). Either pennying or a pricing bug. |
S38's "quotes 11–12 bps from touch in a 4 bps market" becomes distance_to_touch_bid_bps ≈ +11.5 for BUY — positive, i.e. too passive. This is the signal we want to sweep against.
Aggregates (min, p50, p95, max) are computed over signed values. Abs is a presentation choice applied per metric in the summary if useful.
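The formulas and sign convention above can be sketched in Python as follows. This is a minimal illustration, not repo code: the names distance_to_touch_bps and posture are hypothetical, sides are plain strings rather than OrderSide, and the caller is assumed to have already checked that the touch is positive.

```python
def distance_to_touch_bps(side: str, our_price: float, touch: float) -> float:
    """Signed same-side distance to touch in bps (positive = passive).

    Hypothetical helper: `touch` is best_bid for BUY and best_ask for SELL.
    The caller is assumed to guarantee touch > 0.
    """
    if side == "BUY":
        return (touch - our_price) / touch * 10000
    if side == "SELL":
        return (our_price - touch) / touch * 10000
    raise ValueError(f"unknown side: {side!r}")


def posture(distance_bps: float) -> str:
    """Map a signed distance onto the three rows of the sign-convention table."""
    if distance_bps > 0:
        return "passive"
    if distance_bps < 0:
        return "improving_or_crossing"
    return "at_touch"
```

With made-up prices, a BUY resting at 99.885 against a best_bid of 100.00 comes out at roughly +11.5 bps, which is the S38-style "too passive" reading.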
E. Proposed commit structure
Pending Katja Q1–Q3.
Commit 1 — schema: add distance-to-touch columns to system_metrics.
- Extend CREATE TABLE IF NOT EXISTS system_metrics in state_manager.py:366-380 with two new columns:
  - distance_to_touch_bid_bps REAL
  - distance_to_touch_ask_bps REAL
- Add _ensure_column migration entries alongside the existing ones around state_manager.py:469-475, so live DBs pick up the new columns on next startup (no re-migration).
- Extend record_system_metric() signature (state_manager.py:1390-1440) with the two new optional float params, slotted into the INSERT.
- Test: table has new columns post-migration; record_system_metric accepts and persists the new values; legacy rows have NULL (backward-compatible read).
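A rough sketch of the Commit 1 migration behavior, assuming the store is SQLite (suggested by the REAL column type, but not confirmed here). The ensure_column function is a hypothetical stand-in for the repo's _ensure_column, whose real signature is not shown in this memo, and the two-column legacy table is a placeholder for the actual system_metrics schema.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, decl: str) -> None:
    """Add `column` to `table` if it is missing (hypothetical stand-in for _ensure_column)."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


conn = sqlite3.connect(":memory:")
# Placeholder legacy schema; the real system_metrics table has other columns.
conn.execute("CREATE TABLE IF NOT EXISTS system_metrics (session_id TEXT, ts REAL)")
conn.execute("INSERT INTO system_metrics (session_id, ts) VALUES ('legacy', 1.0)")

for col in ("distance_to_touch_bid_bps", "distance_to_touch_ask_bps"):
    ensure_column(conn, "system_metrics", col, "REAL")
    ensure_column(conn, "system_metrics", col, "REAL")  # repeat call is a no-op

# A post-migration insert carrying one measured side and one missing side.
conn.execute(
    "INSERT INTO system_metrics (session_id, ts, distance_to_touch_bid_bps, "
    "distance_to_touch_ask_bps) VALUES (?, ?, ?, ?)",
    ("new", 2.0, 11.5, None),
)
rows = conn.execute(
    "SELECT session_id, distance_to_touch_bid_bps, distance_to_touch_ask_bps "
    "FROM system_metrics ORDER BY ts"
).fetchall()
```

The legacy row reads back with NULL in both new columns, which is the backward-compatible read the test bullet asks for.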
Commit 2 — compute + persist + log per tick.
- In _emit_tick_telemetry (main_loop.py:1620-1666), compute both values from snapshot.best_bid / best_ask and our active quote price per side (source TBD in Q3). Guard against None / zero divisors.
- Pass both values into record_system_metric().
- Emit a structured log line per tick at log.info level: dist_to_touch bid_bps=X.X ask_bps=X.X (or whatever format Vesper prefers; easy to tune).
- Test: happy path (bid+ask both present, both live orders) persists both; missing sides (no live BUY) persists NULL on that side; invalid snapshot (best_bid None) persists NULLs on both; stale intent vs live order path per Q3 ruling.
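The guard and log-line behavior in Commit 2 might look roughly like the sketch below. All function names are hypothetical, the quote prices are passed in as plain optional floats because their source is still open pending Q3, and rendering a missing side as NULL in the log line is an assumption rather than an agreed format.

```python
import logging
from typing import Optional, Tuple

log = logging.getLogger("tick_telemetry")  # hypothetical logger name


def _side_distance(touch: Optional[float], our_price: Optional[float], sign: int) -> Optional[float]:
    """Signed bps distance, or None if an input is missing or the touch is not positive."""
    if touch is None or our_price is None or touch <= 0:
        return None
    return sign * (touch - our_price) / touch * 10000


def tick_distances(
    best_bid: Optional[float],
    best_ask: Optional[float],
    our_buy_price: Optional[float],
    our_sell_price: Optional[float],
) -> Tuple[Optional[float], Optional[float]]:
    bid_bps = _side_distance(best_bid, our_buy_price, +1)   # (best_bid - our_buy) / best_bid
    ask_bps = _side_distance(best_ask, our_sell_price, -1)  # (our_sell - best_ask) / best_ask
    return bid_bps, ask_bps


def format_log_line(bid_bps: Optional[float], ask_bps: Optional[float]) -> str:
    def fmt(value: Optional[float]) -> str:
        return "NULL" if value is None else f"{value:.1f}"  # NULL rendering is an assumption

    return f"dist_to_touch bid_bps={fmt(bid_bps)} ask_bps={fmt(ask_bps)}"


def emit(best_bid, best_ask, our_buy_price, our_sell_price) -> Tuple[Optional[float], Optional[float]]:
    """Compute both sides, log one line, and return the values to persist."""
    bid_bps, ask_bps = tick_distances(best_bid, best_ask, our_buy_price, our_sell_price)
    log.info(format_log_line(bid_bps, ask_bps))
    return bid_bps, ask_bps
```

Returning None per side keeps the three test cases in the bullet above separable: a missing quote nulls only its own side, while a missing or zero touch nulls the side that depends on it.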
Commit 3 — session aggregates in summarize_paper_run.py.
- New summary block render_distance_to_touch_summary(qq: dict) -> str keyed on session_id. Aggregates from system_metrics.distance_to_touch_{bid,ask}_bps.
- Per side: count of non-NULL ticks, min, p50, p95, max (signed), plus count of ticks < 0 (improving/crossing; should be 0 in normal operation).
- Wire into the existing main() render path.
- Test: 4 synthetic rows in an in-memory DB produce expected min/p50/p95/max; NULL rows excluded from counts; empty session returns "no data" line without crashing.
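One way the per-side aggregation in Commit 3 could work is sketched below. It assumes SQLite and a session_id column on system_metrics, uses hypothetical helper names, and picks the nearest-rank percentile definition, which the memo does not specify; a different definition would change p50/p95 on small samples.

```python
import math
import sqlite3
from typing import Dict, List, Optional

ALLOWED = ("distance_to_touch_bid_bps", "distance_to_touch_ask_bps")


def nearest_rank(sorted_vals: List[float], pct: float) -> float:
    """Nearest-rank percentile on a pre-sorted, non-empty list (an assumed definition)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def side_stats(conn: sqlite3.Connection, session_id: str, column: str) -> Optional[Dict[str, float]]:
    """Signed aggregates over non-NULL ticks; None when the session has no data."""
    if column not in ALLOWED:  # column name is interpolated, so whitelist it
        raise ValueError(column)
    vals = sorted(
        row[0]
        for row in conn.execute(
            f"SELECT {column} FROM system_metrics "
            f"WHERE session_id = ? AND {column} IS NOT NULL",
            (session_id,),
        )
    )
    if not vals:
        return None
    return {
        "count": len(vals),
        "min": vals[0],
        "p50": nearest_rank(vals, 50),
        "p95": nearest_rank(vals, 95),
        "max": vals[-1],
        "negative_ticks": sum(1 for v in vals if v < 0),
    }


# Synthetic demo data: four measured ticks and one NULL tick for session "s1".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system_metrics (session_id TEXT, distance_to_touch_bid_bps REAL)")
conn.executemany(
    "INSERT INTO system_metrics VALUES (?, ?)",
    [("s1", 2.0), ("s1", 4.0), ("s1", -1.0), ("s1", 12.0), ("s1", None)],
)
stats = side_stats(conn, "s1", "distance_to_touch_bid_bps")
empty = side_stats(conn, "no-such-session", "distance_to_touch_bid_bps")
```

A render function can turn a None result into the "no data" line, so the empty-session case never reaches the percentile code.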
Test posture estimate
- New tests: ~8 (2 for schema/migration, 3 for emit path, 3 for aggregate render). All new files, no test-drift risk.
- Pre-existing tests touched: test_state_manager.py may need a record_system_metric fixture update if it pins the old signature. I'll request head-of-main paste when I get there.
What is explicitly out of scope
- Offset calibration itself. Phase 7.3 uses this metric; it does not modify quote placement.
- Adaptive offset adjustment. Future phase.
- Cross-side distance (e.g., BUY price vs. best_ask). Not asked for and not obviously useful.
- Mid-based distance. We keep to-touch; the existing mid-based near_5bps count stays as legacy at Vesper's discretion.
- Historical backfill. Old sessions keep NULL for the new columns; no attempt to reconstruct pre-Branch-6 distance-to-touch from existing logs.
Open questions: Q1, Q2, Q3
Q1. Persistence shape: columns on system_metrics, separate tick_diagnostics table, or both?
Two clean options:
- (a) Two columns on system_metrics. Leverages the existing per-tick insert already happening in _emit_tick_telemetry. Same transaction, no new table. Atlas #9 called it "per-tick", which system_metrics already is. If Phase 7.3 wants a third or fourth diagnostic down the line (e.g., effective_offset_bps, anchor_error_bps_persisted) they slot in next to distance-to-touch as new columns. Minimal surface.
- (b) Separate tick_diagnostics table. Cleaner separation (system_metrics is "system health"; tick_diagnostics is "quote-quality"). But: we double the insert path, double the migration, and double the query surface for summarize_paper_run.py. No operational benefit I can identify unless we expect tick_diagnostics to grow to 20+ columns, which we don't.
My recommendation: (a) — two columns on system_metrics. Flag if you'd rather we carve out a clean tick_diagnostics table now and pay the migration cost once instead of drifting back to it in Phase 8.
Q2. Signed or abs?
- (a) Signed. Preserves direction. Crossing/improving (< 0) is operationally distinct from passive (> 0) and needs to surface: a negative value at any tick is a yellow flag (pennying competitor) or red flag (pricing bug).
- (b) Abs. Simpler aggregates. Hides direction entirely.
My recommendation: (a) signed in persistence and aggregates; apply abs() per stat in the summary render if Vesper wants cleaner display. Costs nothing to keep sign at the column level.
Q3. Emit on all ticks or only on intent-producing ticks?
Two semantics:
- (a) Intent-producing ticks only. Metric = distance between the quote we would place this tick and the same-side touch. Data source: intents returned by calculate_quote(). NULL on ticks that produce no intent (invalid market, suppression by participation filter, inventory guards, dedupe vs. live order).
- (b) Every tick with a live order. Metric = distance between our actually-live order on the book and the same-side touch. Data source: state.get_live_order_by_side(side). NULL on ticks with no live order that side. Captures the "we placed a quote at t=0, it sat for 60 ticks, best_bid moved, our quote got stale" behavior, which is exactly the staleness Phase 7.3 cares about.
- (c) Both. Two metrics per side: distance_to_touch_{bid,ask}_intent_bps (current-tick intent) and distance_to_touch_{bid,ask}_live_bps (resting order). Four columns total on system_metrics.
S38's problem was stale-quote-vs-moving-touch. That's (b). But (a) is cheaper and reveals pricing-engine posture. (c) is the most information for Phase 7.3 at the cost of schema surface.
My recommendation: (b) — live order on the book. This is what Phase 7.3 optimizes against: the actual distance competitors see at the actual moment they see it. Intent-only (a) can't surface staleness, and (c) is over-engineering until proven needed.
Flag if you want (c) — trivial to extend if the persistence shape in Q1 is (a) "columns on system_metrics"; we just add two more.
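To make the (a) versus (b) difference concrete, here is a toy BUY-side simulation. Everything in it is made up for illustration: the fixed 4 bps offset, the drifting best_bid series, and the assumption that the resting order is never repriced. It is not a model of the actual strategy engine.

```python
def bid_distance_bps(best_bid: float, our_buy_price: float) -> float:
    """Signed BUY-side distance to touch in bps, per the formula in section D."""
    return (best_bid - our_buy_price) / best_bid * 10000


OFFSET_BPS = 4.0  # hypothetical fixed quoting offset behind best_bid
best_bids = [100.00, 100.02, 100.05, 100.08]  # made-up touch drifting upward

# Order placed at tick 0 and, in this toy scenario, never repriced.
live_price = best_bids[0] * (1 - OFFSET_BPS / 10000)

rows = []
for tick, best_bid in enumerate(best_bids):
    # Option (a): the price the engine would quote on this tick.
    intent_price = best_bid * (1 - OFFSET_BPS / 10000)
    rows.append(
        (
            tick,
            round(bid_distance_bps(best_bid, intent_price), 2),  # intent-based distance
            round(bid_distance_bps(best_bid, live_price), 2),    # live-order distance
        )
    )

for tick, intent_bps, live_bps in rows:
    print(f"tick={tick} intent_bps={intent_bps} live_bps={live_bps}")
```

In this series the intent-based distance stays pinned at the configured offset while the live-order distance widens on every tick the touch moves away, which is the staleness that only (b) or (c) would record.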
Standing by for Q1–Q3 rulings. No code until the rulings land.
— Orion