Orion Investigation: Anchor Error Per-Tick Telemetry (pre-code)
To: Vesper
From: Orion
Date: 2026-04-21
Branch (not yet created): feat/anchor-error-per-tick-telemetry
Re: Pre-code investigation answers to Q1–Q4 from your 2026-04-21 tasking. No branch pre-created; no code written. Awaiting green light.
TL;DR
- Column name is clear (anchor_error_bps, no duplicate exists).
- Migration pattern is clean (_ensure_column already applied three times to system_metrics).
- Persistence shape is clean (Python None → SQL NULL via sqlite3 binding; precedent in distance_to_touch_*).
- One real finding to rule on: on invalid-snapshot ticks, self._strategy.last_anchor_divergence_bps is STALE (not None) because calculate_quote early-returns before the line that updates it. If we just read the field and write it, invalid-snapshot rows will carry the last valid tick's value. The anchor saturation guard's rolling window does NOT have this problem because its per-tick append is gated on snapshot.is_valid() AND div_bps is not None at main_loop.py:2555–2563. Recommended fix: mirror that gate in the telemetry write so invalid ticks persist NULL. See Q1 below.
Q1: system_metrics write location + last_anchor_divergence_bps availability
Write site
NEOEngine._persist_tick_telemetry(snapshot, inventory, tick_latency_ms, recon_result, risk_status) at neo_engine/main_loop.py:3123. This helper calls:
- self._state.record_market_snapshot(...): writes to market_snapshots.
- self._compute_distance_to_touch(snapshot): computes Branch #6 values.
- self._state.record_inventory_snapshot(...): writes to inventory_snapshots.
- self._state.record_system_metric(...): writes to system_metrics. This is the insert we extend.
Called once per tick from NEOEngine._tick() at main_loop.py:2810, AFTER the strategy step (self._strategy.calculate_quote(...) at line 2550) and AFTER Step 8.5 (anchor saturation guard evaluation). Confirmed: the saturation guard reads last_anchor_divergence_bps before we persist it, so the telemetry row carries the same value the guard saw this tick.
Where last_anchor_divergence_bps is set
StrategyEngine.calculate_quote in neo_engine/strategy_engine.py:202–205:
if mid_price > 0:
    self.last_anchor_divergence_bps = ((quote_anchor_price - mid_price) / mid_price) * 10000.0
else:
    self.last_anchor_divergence_bps = None
On a VALID snapshot, calculate_quote ALWAYS reaches this block (it's after anchor selection, before cap/switch logic), so the field is either a fresh float or explicit None (if mid_price <= 0, which can't happen when is_valid() is True per MarketSnapshot.is_valid()).
On an INVALID snapshot, calculate_quote early-returns at line 154 (if not snapshot.is_valid(): return intents). The assignment at line 203/205 is SKIPPED. The attribute retains whatever value the last valid tick wrote.
Implication for the spec
Vesper's spec: "Value is self._strategy.last_anchor_divergence_bps ... When that value is None (no market data this tick), write NULL."
Literal read: only writes NULL if the field itself is None. On invalid-snapshot ticks the field is NOT None — it holds the previous valid tick's value. So a literal-spec implementation would persist stale data on those ticks.
The anchor saturation guard already handles this correctly via main_loop.py:2555–2563:
if snapshot.is_valid():
    intents = self._strategy.calculate_quote(snapshot, inventory)
    ...
    div_bps = self._strategy.last_anchor_divergence_bps
    if div_bps is not None:
        self._anchor_divergence_obs.append(div_bps)
        ...
        self._anchor_error_window.append(div_bps)
    else:
        self._anchor_divergence_skipped += 1
On invalid ticks, the guard's rolling window is NOT appended — the stale value is not observed.
Recommended resolution
To keep telemetry aligned with what the guard actually evaluates, the persistence path should match the same gate. Options:
Option A (recommended). Gate in _persist_tick_telemetry:
if snapshot.is_valid():
    anchor_err = self._strategy.last_anchor_divergence_bps  # float or None
else:
    anchor_err = None
Then pass anchor_err to record_system_metric. Net effect: on valid ticks, the tick's computed value (or None if the strategy set it to None); on invalid ticks, always None. This matches the guard's observation gate and the spec intent ("no market data this tick → NULL").
Option B (literal-spec). Just read self._strategy.last_anchor_divergence_bps unconditionally. Persists stale values on invalid-snapshot ticks. NOT recommended — invalidates the "NULL when mid unavailable" contract.
Option C (strategy-side fix). Make calculate_quote reset last_anchor_divergence_bps to None on early-return. Larger scope (changes guard feed semantics) and unnecessary — the guard's current gate is already correct at the call site. Out of scope for this telemetry-only branch.
I will build Option A unless you overrule. The gate is one if-statement, no new state, no behavioral effect on the guard (which already gates its own append).
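The difference between Options A and B can be illustrated with a minimal sketch. StubSnapshot, StubStrategy, read_gated and read_unconditional are hypothetical stand-ins written for this note, not the real MarketSnapshot or StrategyEngine; they only mimic the early-return behavior described in Q1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubSnapshot:
    """Hypothetical stand-in for MarketSnapshot (assumption, not the real class)."""
    mid_price: float
    anchor_price: float
    valid: bool = True

    def is_valid(self) -> bool:
        return self.valid


class StubStrategy:
    """Hypothetical stand-in that mimics calculate_quote's early return."""

    def __init__(self) -> None:
        self.last_anchor_divergence_bps: Optional[float] = None

    def calculate_quote(self, snapshot: StubSnapshot) -> list:
        intents: list = []
        if not snapshot.is_valid():
            return intents  # early return: the attribute is NOT refreshed
        mid = snapshot.mid_price
        if mid > 0:
            self.last_anchor_divergence_bps = (
                (snapshot.anchor_price - mid) / mid
            ) * 10000.0
        else:
            self.last_anchor_divergence_bps = None
        return intents


def read_unconditional(strategy: StubStrategy) -> Optional[float]:
    """Option B: read the attribute regardless of snapshot validity."""
    return strategy.last_anchor_divergence_bps


def read_gated(strategy: StubStrategy, snapshot: StubSnapshot) -> Optional[float]:
    """Option A: only trust the attribute on valid-snapshot ticks."""
    return strategy.last_anchor_divergence_bps if snapshot.is_valid() else None


strategy = StubStrategy()
ticks = [
    StubSnapshot(mid_price=100.0, anchor_price=100.5),           # valid tick
    StubSnapshot(mid_price=0.0, anchor_price=0.0, valid=False),  # invalid tick
]

option_a: List[Optional[float]] = []
option_b: List[Optional[float]] = []
for snap in ticks:
    strategy.calculate_quote(snap)
    option_a.append(read_gated(strategy, snap))
    option_b.append(read_unconditional(strategy))
```

Under these assumptions, option_b carries the first tick's value into the invalid tick, while option_a records None for it, which is the value that would be persisted as NULL.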
Q2: _ensure_column migration pattern
Confirmed
_ensure_column(conn, table_name, column_name, definition) is defined at neo_engine/state_manager.py:90–97:
def _ensure_column(conn, table_name, column_name, definition) -> None:
    if not _column_exists(conn, table_name, column_name):
        conn.execute(f"ALTER TABLE {table_name} ADD COLUMN {column_name} {definition}")
Helper is idempotent by construction: probes PRAGMA table_info before issuing the ALTER. Safe on every startup, no-op on DBs that already have the column.
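The idempotence claim can be checked against an in-memory SQLite database. This is a sketch under stated assumptions: the body of _column_exists is not quoted in this note, so the version below is a guess based on the PRAGMA table_info description, and the two-column system_metrics table is a trimmed, hypothetical schema.

```python
import sqlite3


def _column_exists(conn, table_name, column_name) -> bool:
    # Assumed implementation: PRAGMA table_info yields one row per column,
    # with the column name at index 1.
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return any(row[1] == column_name for row in rows)


def _ensure_column(conn, table_name, column_name, definition) -> None:
    if not _column_exists(conn, table_name, column_name):
        conn.execute(f"ALTER TABLE {table_name} ADD COLUMN {column_name} {definition}")


conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical schema; the real system_metrics table has more columns.
conn.execute(
    "CREATE TABLE system_metrics (id INTEGER PRIMARY KEY, tick_latency_ms REAL)"
)

_ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL")  # adds the column
_ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL")  # no-op the second time

columns = [row[1] for row in conn.execute("PRAGMA table_info(system_metrics)")]
conn.close()
```

Without the existence probe, the second ALTER TABLE would raise sqlite3.OperationalError for a duplicate column name, which is why the helper can be called on every startup.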
Call site
StateManager.initialize_database() at state_manager.py:720 (enclosing function). Schema CREATE TABLE IF NOT EXISTS blocks run first inside an executescript, then a series of _ensure_column calls run for each additive migration. Lines 739–787 are the full existing migration list. My new call belongs in the system_metrics-specific block at lines 779–780 (alongside Branch #6's distance-to-touch columns):
# Branch #6: distance-to-touch diagnostic — per-tick, per-side
# signed bps distance from our live resting quote ...
_ensure_column(conn, "system_metrics", "distance_to_touch_bid_bps", "REAL")
_ensure_column(conn, "system_metrics", "distance_to_touch_ask_bps", "REAL")
# Phase 7.3 anchor error telemetry — per-tick strategy
# last_anchor_divergence_bps snapshot ...
_ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL") # NEW
Is this the first _ensure_column use on system_metrics?
No. Three prior calls already target this table:
| Line | Column | Added in |
|---|---|---|
| 764 | session_id INTEGER | Session tracking migration |
| 779 | distance_to_touch_bid_bps REAL | Branch #6 |
| 780 | distance_to_touch_ask_bps REAL | Branch #6 |
Pattern is well-established. My column is call #4.
Existing schema setup that needs updating alongside
The CREATE TABLE IF NOT EXISTS system_metrics (...) block at state_manager.py:573–595 is the canonical schema for new DBs. Strictly, no update is required — _ensure_column works on both new and existing DBs (the CREATE TABLE runs first, then the ALTER). But the distance-to-touch columns from Branch #6 were also added to the CREATE TABLE body (lines 587–594) so that grep-hitting the schema file shows them as first-class columns, not just as afterthought migrations.
I'll follow the same precedent: add anchor_error_bps REAL to the CREATE TABLE body AND keep the _ensure_column call for existing DBs. Two-line change; matches Branch #6 exactly.
Q3: NULL vs 0.0 handling
Confirmed via distance_to_touch_* precedent
StateManager.record_system_metric at state_manager.py:1966–2025 has the signature:
def record_system_metric(
    self,
    tick_latency_ms: Optional[float],
    ...
    distance_to_touch_bid_bps: Optional[float] = None,
    distance_to_touch_ask_bps: Optional[float] = None,
) -> int:
The insert at lines 1985–2024 binds these values directly into a parameterized INSERT:
conn.execute(
    "INSERT INTO system_metrics (..., distance_to_touch_bid_bps, distance_to_touch_ask_bps) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (self._now(), tick_latency_ms, ..., distance_to_touch_bid_bps, distance_to_touch_ask_bps),
)
Python None → SQL NULL via sqlite3's default parameter binding. No cast error, no 0.0 substitution. This is the shape we need.
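The None → NULL binding can be confirmed with a short round trip through the standard sqlite3 module. The three-column table below is a trimmed, hypothetical version of system_metrics that already includes the proposed anchor_error_bps column; it is a sketch, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical schema for illustration only.
conn.execute(
    "CREATE TABLE system_metrics ("
    "id INTEGER PRIMARY KEY, tick_latency_ms REAL, anchor_error_bps REAL)"
)

# One tick with a float value, one tick with None (no market data).
conn.executemany(
    "INSERT INTO system_metrics (tick_latency_ms, anchor_error_bps) VALUES (?, ?)",
    [(12.5, 3.2), (11.0, None)],
)

values = [
    row[0]
    for row in conn.execute("SELECT anchor_error_bps FROM system_metrics ORDER BY id")
]
null_count = conn.execute(
    "SELECT COUNT(*) FROM system_metrics WHERE anchor_error_bps IS NULL"
).fetchone()[0]
zero_count = conn.execute(
    "SELECT COUNT(*) FROM system_metrics WHERE anchor_error_bps = 0.0"
).fetchone()[0]
conn.close()
```

The None comes back as None on read, is counted by IS NULL, and is never matched by a comparison against 0.0, so downstream aggregates can distinguish "no data" from "zero error".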
Caller's None-handling
At the call site in _persist_tick_telemetry:
dist_bid_bps, dist_ask_bps = self._compute_distance_to_touch(snapshot)
...
self._state.record_system_metric(
    ...
    distance_to_touch_bid_bps=dist_bid_bps,
    distance_to_touch_ask_bps=dist_ask_bps,
)
_compute_distance_to_touch returns None for either side when no live order exists or the snapshot lacks the touch; the caller does NOT substitute 0.0. The None flows directly through to the bind parameter → SQL NULL. Exact pattern I need.
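My insert change
Add to signature (default None):
anchor_error_bps: Optional[float] = None,
Add to INSERT column list + bind tuple in the same position. Pass through from _persist_tick_telemetry:
self._state.record_system_metric(..., anchor_error_bps=anchor_err)
No 0.0 substitution anywhere on either side.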
Q4: Duplicate-column check
No duplicate exists
Grep for anchor_error_bps across neo-2026/:
| File | Use |
|---|---|
| neo_engine/main_loop.py:221 | Comment: "the strategy's per-tick last_anchor_divergence_bps values (aka anchor_error_bps)"; pure alias reference |
| neo_engine/main_loop.py:3846 | Comment referencing \|anchor_error_bps\| > 5 reliability stat |
| neo_engine/strategy_engine.py:210 | Comment referencing \|anchor_error_bps\| > strategy.clob_switch_threshold_bps |
| config/config.yaml:160 | YAML comment ("Rolling window of per-tick \|anchor_error_bps\| values") |
| tests/test_anchor_error_stat.py:3 | Docstring referencing the reliability stat |
| tests/test_anchor_saturation_guard.py:8 | Docstring alias reference |
| tests/test_phase_7_2_clob_switch.py:5,10,155 | Docstring aliases |
| docs/NEO_Roadmap.md:23,85 | Roadmap commentary |
| AGENT_CHANGE_CONTROL.md:25,30 | Change control log |
Every hit is a comment, docstring, or log/alias reference. There is no column, no dataclass field, no function parameter, and no method named anchor_error_bps in production code.
The CREATE TABLE IF NOT EXISTS system_metrics (...) block at state_manager.py:573–595 is the canonical schema; grep of the schema body confirms no existing column by this name or any similar name that captures the same value.
The existing session-level anchor.pct_error_above_5bps key (written via record_engine_state at session close by _log_anchor_divergence_summary) is a different shape — a session-aggregate stored as a string in engine_state, not a per-tick float in system_metrics. Separate storage, separate semantics. No collision.
Safe to introduce anchor_error_bps as the column name. The name matches the established conceptual alias throughout the codebase — this is arguably the natural name. Any future refactor that also renames last_anchor_divergence_bps → last_anchor_error_bps would then be a purely mechanical alignment.
Implementation plan (proposed)
Per your commit plan, three commits:
- C1: feat(db): add anchor_error_bps column to system_metrics
  - state_manager.py: CREATE TABLE body gains anchor_error_bps REAL at the end.
  - state_manager.py: _ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL") added next to the Branch #6 calls.
  - state_manager.py: record_system_metric signature gains anchor_error_bps: Optional[float] = None; INSERT column list and bind tuple extended in the same position.
  - No behavioral change until C2 wires the writer.
- C2: feat(main_loop): write anchor_error_bps to system_metrics on each tick
  - _persist_tick_telemetry: add the Q1 Option A gate (read self._strategy.last_anchor_divergence_bps only when snapshot.is_valid(); else None).
  - Pass through to record_system_metric(..., anchor_error_bps=anchor_err).
  - 3-line change; no new state, no config, no guard touched.
- C3: test(telemetry): anchor error per-tick telemetry — 6 tests (plus whatever integration tests fall out of the spec)
  - Fresh DB: column present after initialize_database() (via PRAGMA table_info).
  - Existing DB missing column: _ensure_column adds it without error (simulate by creating a DB without the column then re-initializing).
  - Tick with valid last_anchor_divergence_bps: float round-trip through record_system_metric.
  - Tick with last_anchor_divergence_bps = None: NULL round-trip (not 0.0).
  - Invalid-snapshot tick: gate forces NULL (Q1 Option A test).
  - Multi-tick integration: two ticks, values match inputs, session_id populated.
  - Existing columns unaffected: round-trip tick_latency_ms, engine_status, distance_to_touch_bid_bps alongside the new field.
Windows teardown: tempfile.TemporaryDirectory() with LIFO addCleanup-equivalent (sm.close() registered LAST so it runs FIRST at teardown).
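The teardown ordering can be sketched with plain unittest and sqlite3. This is an illustration under assumptions, not the planned test file: a raw sqlite3 connection stands in for StateManager (so conn.close plays the role of sm.close()), the class and test names are hypothetical, and the table is a trimmed schema. unittest runs addCleanup callbacks in LIFO order, so the close registered last runs before the directory is removed.

```python
import os
import sqlite3
import tempfile
import unittest


class AnchorErrorColumnTest(unittest.TestCase):
    """Hypothetical test case; a raw sqlite3 connection stands in for StateManager."""

    def setUp(self) -> None:
        tmp = tempfile.TemporaryDirectory()
        # Registered first, so it runs last: the directory is removed only
        # after the connection below has been closed.
        self.addCleanup(tmp.cleanup)
        self.conn = sqlite3.connect(os.path.join(tmp.name, "test.db"))
        # Registered last, so it runs first at teardown (LIFO).
        self.addCleanup(self.conn.close)
        # Simulate an existing DB created before the new column existed.
        self.conn.execute(
            "CREATE TABLE system_metrics (id INTEGER PRIMARY KEY, tick_latency_ms REAL)"
        )

    def test_migration_then_null_round_trip(self) -> None:
        self.conn.execute(
            "ALTER TABLE system_metrics ADD COLUMN anchor_error_bps REAL"
        )
        self.conn.execute(
            "INSERT INTO system_metrics (tick_latency_ms, anchor_error_bps) VALUES (?, ?)",
            (10.0, None),
        )
        row = self.conn.execute(
            "SELECT anchor_error_bps FROM system_metrics"
        ).fetchone()
        self.assertIsNone(row[0])  # NULL, not 0.0


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AnchorErrorColumnTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Closing the connection before the directory cleanup matters on Windows, where an open file handle prevents the database file from being deleted.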
Deviations from spec
- Q1 Option A (gate on snapshot.is_valid()) rather than literal-spec unconditional read. Flagged above; recommend confirming before I start C2.
- Extra test beyond the minimum-6: a dedicated test for the invalid-snapshot gate (since Option A is a behavior you asked me to ratify). Brings the count to 7.
- I also intend to add the column to the CREATE TABLE body, not just the _ensure_column migration, matching Branch #6 precedent. Flagged here because your spec says "_ensure_column migration only — do not ALTER TABLE or DROP/RECREATE". The CREATE TABLE body does not ALTER or DROP anything; it's the canonical fresh-DB schema. Branch #6 did the same. Confirming this interpretation is OK.
What I will NOT touch
- Strategy behavior (calculate_quote unchanged).
- Anchor saturation guard (_evaluate_anchor_saturation_guard unchanged).
- Any config file.
- summarize_paper_run.py or the dashboard: those can read the new column as a follow-up; not in this branch's scope per your "telemetry only" constraint.
Asks: one ruling
Ruling needed on Q1 resolution: Option A (gate on snapshot.is_valid(), recommended) vs Option B (literal spec — stale values on invalid ticks) vs Option C (strategy-side reset — out of scope).
On green light I'll create the branch off the tip of feat/inventory-corridor-guard (not yet merged — the corridor branch is awaiting your review), or off main if you'd prefer me to stack on main and rebase once corridor merges. Either works; please confirm.
Standing by.
— Orion