
Orion Investigation — Anchor Error Per-Tick Telemetry (pre-code)

To: Vesper
From: Orion
Date: 2026-04-21
Branch (not yet created): feat/anchor-error-per-tick-telemetry
Re: Pre-code investigation answers to Q1–Q4 from your 2026-04-21 tasking. No branch pre-created; no code written. Awaiting green light.


TL;DR

  • Column name is clear (anchor_error_bps, no duplicate exists).
  • Migration pattern is clean (_ensure_column already applied three times to system_metrics).
  • Persistence shape is clean (Python None → SQL NULL via sqlite3 binding; precedent in distance_to_touch_*).
  • One real finding to rule on: on invalid-snapshot ticks, self._strategy.last_anchor_divergence_bps is STALE (not None) because calculate_quote early-returns before the line that updates it. If we simply read the field and write it, invalid-snapshot rows will carry the last valid tick's value. The anchor saturation guard's rolling window does NOT have this problem because its per-tick append is gated on snapshot.is_valid() AND div_bps is not None at main_loop.py:2555–2563. Recommended fix: mirror the is_valid() gate in the telemetry write so invalid ticks persist NULL. See Q1 below.

Q1 — system_metrics write location + last_anchor_divergence_bps availability

Write site

NEOEngine._persist_tick_telemetry(snapshot, inventory, tick_latency_ms, recon_result, risk_status) at neo_engine/main_loop.py:3123. This helper calls:

  1. self._state.record_market_snapshot(...) — writes to market_snapshots.
  2. self._compute_distance_to_touch(snapshot) — computes Branch #6 values.
  3. self._state.record_inventory_snapshot(...) — writes to inventory_snapshots.
  4. self._state.record_system_metric(...) — writes to system_metrics. This is the insert we extend.

Called once per tick from NEOEngine._tick() at main_loop.py:2810, AFTER the strategy step (self._strategy.calculate_quote(...) at line 2550) and AFTER Step 8.5 (anchor saturation guard evaluation). Confirmed: the saturation guard reads last_anchor_divergence_bps before we persist it, so the telemetry row carries the same value the guard saw this tick.

Where last_anchor_divergence_bps is set

StrategyEngine.calculate_quote in neo_engine/strategy_engine.py:202–205:

if mid_price > 0:
    self.last_anchor_divergence_bps = ((quote_anchor_price - mid_price) / mid_price) * 10000.0
else:
    self.last_anchor_divergence_bps = None

On a VALID snapshot, calculate_quote ALWAYS reaches this block (it's after anchor selection, before cap/switch logic), so the field is either a fresh float or explicit None (if mid_price <= 0, which can't happen when is_valid() is True per MarketSnapshot.is_valid()).

On an INVALID snapshot, calculate_quote early-returns at line 154 (if not snapshot.is_valid(): return intents). The assignment at line 203/205 is SKIPPED. The attribute retains whatever value the last valid tick wrote.
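The stale-value behavior can be reproduced in isolation. The sketch below is a hypothetical stand-in: ToySnapshot, ToyStrategy, and the prices are illustrative names and values, not the real MarketSnapshot or StrategyEngine. It mirrors only the early return and the assignment quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySnapshot:
    """Hypothetical stand-in for MarketSnapshot; only what the demo needs."""
    mid_price: float
    valid: bool = True

    def is_valid(self) -> bool:
        return self.valid


class ToyStrategy:
    """Hypothetical stand-in for StrategyEngine, reduced to the two relevant steps."""

    def __init__(self) -> None:
        self.last_anchor_divergence_bps: Optional[float] = None

    def calculate_quote(self, snapshot: ToySnapshot, quote_anchor_price: float) -> list:
        intents: list = []
        if not snapshot.is_valid():
            return intents  # early return: the attribute below is never touched
        mid_price = snapshot.mid_price
        if mid_price > 0:
            self.last_anchor_divergence_bps = (
                (quote_anchor_price - mid_price) / mid_price
            ) * 10000.0
        else:
            self.last_anchor_divergence_bps = None
        return intents


strategy = ToyStrategy()

# Tick 1: valid snapshot, so the attribute is freshly computed (about 5 bps).
strategy.calculate_quote(ToySnapshot(mid_price=100.0), quote_anchor_price=100.05)
fresh = strategy.last_anchor_divergence_bps

# Tick 2: invalid snapshot, so the attribute still holds tick 1's value.
strategy.calculate_quote(ToySnapshot(mid_price=0.0, valid=False), quote_anchor_price=0.0)
stale = strategy.last_anchor_divergence_bps
```

After the second call, stale equals fresh rather than None, which is the value an unconditional read in the telemetry path would persist.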

Implication for the spec

Vesper's spec: "Value is self._strategy.last_anchor_divergence_bps ... When that value is None (no market data this tick), write NULL."

Literal read: only writes NULL if the field itself is None. On invalid-snapshot ticks the field is NOT None — it holds the previous valid tick's value. So a literal-spec implementation would persist stale data on those ticks.

The anchor saturation guard already handles this correctly via main_loop.py:2555–2563:

if snapshot.is_valid():
    intents = self._strategy.calculate_quote(snapshot, inventory)
    ...
    div_bps = self._strategy.last_anchor_divergence_bps
    if div_bps is not None:
        self._anchor_divergence_obs.append(div_bps)
        ...
        self._anchor_error_window.append(div_bps)
    else:
        self._anchor_divergence_skipped += 1

On invalid ticks, the guard's rolling window is NOT appended — the stale value is not observed.

To keep telemetry aligned with what the guard actually evaluates, the persistence path should match the same gate. Options:

Option A (recommended). Gate in _persist_tick_telemetry:

if snapshot.is_valid():
    anchor_err = self._strategy.last_anchor_divergence_bps  # float or None
else:
    anchor_err = None

Then pass anchor_err to record_system_metric. Net effect: on valid ticks, the tick's computed value (or None if the strategy set it to None); on invalid ticks, always None. This matches the guard's observation gate and the spec intent ("no market data this tick → NULL").

Option B (literal-spec). Just read self._strategy.last_anchor_divergence_bps unconditionally. Persists stale values on invalid-snapshot ticks. NOT recommended — invalidates the "NULL when mid unavailable" contract.

Option C (strategy-side fix). Make calculate_quote reset last_anchor_divergence_bps to None on early-return. Larger scope (changes guard feed semantics) and unnecessary — the guard's current gate is already correct at the call site. Out of scope for this telemetry-only branch.

I will build Option A unless you overrule. The gate is one if-statement, no new state, no behavioral effect on the guard (which already gates its own append).
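To make the three cases explicit, here is a minimal sketch of the Option A gate as a pure function. The name select_anchor_error and the sample values are hypothetical; the real change would sit inline in _persist_tick_telemetry.

```python
from typing import Optional


def select_anchor_error(
    snapshot_is_valid: bool,
    last_divergence_bps: Optional[float],
) -> Optional[float]:
    """Return the value to persist for this tick under the Option A gate."""
    if snapshot_is_valid:
        # Fresh float, or None if the strategy explicitly set None this tick.
        return last_divergence_bps
    # Invalid tick: ignore whatever the attribute still holds from earlier.
    return None


# (snapshot valid?, attribute value at persist time) for three consecutive ticks.
# The second tick is invalid, so the attribute still carries 4.2 from the first.
ticks = [(True, 4.2), (False, 4.2), (True, None)]
persisted = [select_anchor_error(valid, value) for valid, value in ticks]
```

Under these assumptions the persisted sequence is 4.2, None, None: the stale 4.2 on the invalid tick is dropped, while an explicit None on a valid tick passes through unchanged.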


Q2 — _ensure_column migration pattern

Confirmed

_ensure_column(conn, table_name, column_name, definition) is defined at neo_engine/state_manager.py:90–97:

def _ensure_column(conn, table_name, column_name, definition) -> None:
    if not _column_exists(conn, table_name, column_name):
        conn.execute(f"ALTER TABLE {table_name} ADD COLUMN {column_name} {definition}")

Helper is idempotent by construction: probes PRAGMA table_info before issuing the ALTER. Safe on every startup, no-op on DBs that already have the column.
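The idempotency claim can be checked against an in-memory database. In the sketch below, _ensure_column is copied from above, but the body of _column_exists is an assumption (the real helper is not quoted here), and the two-column table is a toy stand-in for a pre-migration system_metrics.

```python
import sqlite3


def _column_exists(conn: sqlite3.Connection, table_name: str, column_name: str) -> bool:
    # Assumed implementation: PRAGMA table_info yields one row per column,
    # with the column name at index 1.
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return any(row[1] == column_name for row in rows)


def _ensure_column(conn, table_name, column_name, definition) -> None:
    if not _column_exists(conn, table_name, column_name):
        conn.execute(f"ALTER TABLE {table_name} ADD COLUMN {column_name} {definition}")


conn = sqlite3.connect(":memory:")
# Toy table standing in for a system_metrics that predates the new column.
conn.execute("CREATE TABLE system_metrics (id INTEGER PRIMARY KEY, tick_latency_ms REAL)")

_ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL")  # adds the column
_ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL")  # no-op on repeat

columns = [row[1] for row in conn.execute("PRAGMA table_info(system_metrics)")]
```

Without the existence probe, the second ALTER TABLE would fail in SQLite with a duplicate-column error, so the probe is what makes the call safe on every startup.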

Call site

StateManager.initialize_database() at state_manager.py:720 (enclosing function). Schema CREATE TABLE IF NOT EXISTS blocks run first inside an executescript, then a series of _ensure_column calls run for each additive migration. Lines 739–787 are the full existing migration list. My new call belongs in the system_metrics-specific block at lines 779–780 (alongside Branch #6's distance-to-touch columns):

# Branch #6: distance-to-touch diagnostic — per-tick, per-side
# signed bps distance from our live resting quote ...
_ensure_column(conn, "system_metrics", "distance_to_touch_bid_bps", "REAL")
_ensure_column(conn, "system_metrics", "distance_to_touch_ask_bps", "REAL")

# Phase 7.3 anchor error telemetry — per-tick strategy
# last_anchor_divergence_bps snapshot ...
_ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL")   # NEW

Is this the first _ensure_column use on system_metrics?

No. Three prior calls already target this table:

Line  Column                          Added in
764   session_id INTEGER              Session tracking migration
779   distance_to_touch_bid_bps REAL  Branch #6
780   distance_to_touch_ask_bps REAL  Branch #6

Pattern is well-established. My column is call #4.

Existing schema setup that needs updating alongside

The CREATE TABLE IF NOT EXISTS system_metrics (...) block at state_manager.py:573–595 is the canonical schema for new DBs. Strictly, no update is required: _ensure_column works on both new and existing DBs (the CREATE TABLE runs first, then the ALTER). But the distance-to-touch columns from Branch #6 were also added to the CREATE TABLE body (lines 587–594) so that grep-hitting the schema file shows them as first-class columns, not just as afterthought migrations.

I'll follow the same precedent: add anchor_error_bps REAL to the CREATE TABLE body AND keep the _ensure_column call for existing DBs. Two-line change; matches Branch #6 exactly.


Q3 — NULL vs 0.0 handling

Confirmed via distance_to_touch_* precedent

StateManager.record_system_metric at state_manager.py:1966–2025 has the signature:

def record_system_metric(
    self,
    tick_latency_ms: Optional[float],
    ...
    distance_to_touch_bid_bps: Optional[float] = None,
    distance_to_touch_ask_bps: Optional[float] = None,
) -> int:

The insert at lines 1985–2024 binds these values directly into a parameterized INSERT:

conn.execute(
    "INSERT INTO system_metrics (..., distance_to_touch_bid_bps, distance_to_touch_ask_bps) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (self._now(), tick_latency_ms, ..., distance_to_touch_bid_bps, distance_to_touch_ask_bps),
)

Python None → SQL NULL via sqlite3's default parameter binding. No cast error, no 0.0 substitution. This is the shape we need.
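The None-to-NULL binding is easy to confirm with a throwaway table. The sketch below assumes a cut-down two-column system_metrics and made-up values; it is not the real schema or INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for system_metrics: just enough columns for the demo.
conn.execute("CREATE TABLE system_metrics (tick_latency_ms REAL, anchor_error_bps REAL)")

insert_sql = "INSERT INTO system_metrics (tick_latency_ms, anchor_error_bps) VALUES (?, ?)"
conn.execute(insert_sql, (12.5, 3.75))  # float binds as REAL
conn.execute(insert_sql, (11.0, None))  # None binds as SQL NULL

# Read back both the value and SQLite's own IS NULL verdict for each row.
rows = conn.execute(
    "SELECT anchor_error_bps, anchor_error_bps IS NULL "
    "FROM system_metrics ORDER BY rowid"
).fetchall()
```

The second row comes back as Python None with IS NULL evaluating to 1, so a consumer can distinguish "no value this tick" from a genuine 0.0 reading.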

Caller's None-handling

At the call site in _persist_tick_telemetry:

dist_bid_bps, dist_ask_bps = self._compute_distance_to_touch(snapshot)
...
self._state.record_system_metric(
    ...
    distance_to_touch_bid_bps=dist_bid_bps,
    distance_to_touch_ask_bps=dist_ask_bps,
)

_compute_distance_to_touch returns None for either side when no live order exists or the snapshot lacks the touch; the caller does NOT substitute 0.0. The None flows directly through to the bind parameter → SQL NULL. Exact pattern I need.

My insert change

Add to signature (default None):

anchor_error_bps: Optional[float] = None,

Add to INSERT column list + bind tuple in the same position. Pass through from _persist_tick_telemetry:

self._state.record_system_metric(
    ...
    anchor_error_bps=anchor_err,   # see Q1 Option A
)

No 0.0 substitution anywhere on either side.


Q4 — Duplicate-column check

No duplicate exists

Grep for anchor_error_bps across neo-2026/:

File                                          Use
neo_engine/main_loop.py:221                   Comment: "the strategy's per-tick last_anchor_divergence_bps values (aka anchor_error_bps)" (pure alias reference)
neo_engine/main_loop.py:3846                  Comment referencing |anchor_error_bps| > 5 reliability stat
neo_engine/strategy_engine.py:210             Comment referencing |anchor_error_bps| > strategy.clob_switch_threshold_bps
config/config.yaml:160                        YAML comment ("Rolling window of per-tick |anchor_error_bps| values")
tests/test_anchor_error_stat.py:3             Docstring referencing the reliability stat
tests/test_anchor_saturation_guard.py:8       Docstring alias reference
tests/test_phase_7_2_clob_switch.py:5,10,155  Docstring aliases
docs/NEO_Roadmap.md:23,85                     Roadmap commentary
AGENT_CHANGE_CONTROL.md:25,30                 Change control log

Every hit is a comment, docstring, or log/alias reference. There is no column, no dataclass field, no function parameter, and no method named anchor_error_bps in production code.

The CREATE TABLE IF NOT EXISTS system_metrics (...) block at state_manager.py:573–595 is the canonical schema; grep of the schema body confirms no existing column by this name or any similar name that captures the same value.

The existing session-level anchor.pct_error_above_5bps key (written via record_engine_state at session close by _log_anchor_divergence_summary) is a different shape — a session-aggregate stored as a string in engine_state, not a per-tick float in system_metrics. Separate storage, separate semantics. No collision.

Safe to introduce anchor_error_bps as the column name. The name matches the established conceptual alias throughout the codebase — this is arguably the natural name. Any future refactor that also renames last_anchor_divergence_bps → last_anchor_error_bps would then be a purely mechanical alignment.


Implementation plan (proposed)

Per your commit plan, three commits:

  1. C1 - feat(db): add anchor_error_bps column to system_metrics
     • state_manager.py: CREATE TABLE body gains anchor_error_bps REAL at the end.
     • state_manager.py: _ensure_column(conn, "system_metrics", "anchor_error_bps", "REAL") added next to the Branch #6 calls.
     • state_manager.py: record_system_metric signature gains anchor_error_bps: Optional[float] = None; INSERT column list and bind tuple extended in the same position.
     • No behavioral change until C2 wires the writer.

  2. C2 - feat(main_loop): write anchor_error_bps to system_metrics on each tick
     • _persist_tick_telemetry: add Q1 Option A gate: read self._strategy.last_anchor_divergence_bps only when snapshot.is_valid(); else None.
     • Pass through to record_system_metric(..., anchor_error_bps=anchor_err).
     • 3-line change; no new state, no config, no guard touched.

  3. C3 - test(telemetry): anchor error per-tick telemetry - 6 tests (plus whatever integration tests fall out of the spec)
     • Fresh DB: column present after initialize_database() (via PRAGMA table_info).
     • Existing DB missing column: _ensure_column adds it without error (simulate by creating a DB without the column then re-initializing).
     • Tick with valid last_anchor_divergence_bps: float round-trip through record_system_metric.
     • Tick with last_anchor_divergence_bps = None: NULL round-trip (not 0.0).
     • Invalid-snapshot tick: gate forces NULL (Q1 Option A test).
     • Multi-tick integration: two ticks, values match inputs, session_id populated.
     • Existing columns unaffected: round-trip tick_latency_ms, engine_status, distance_to_touch_bid_bps alongside the new field.

Windows teardown: tempfile.TemporaryDirectory() with LIFO addCleanup-equivalent (sm.close() registered LAST so it runs FIRST at teardown).
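The teardown ordering matters on Windows because a directory cannot be removed while a SQLite file inside it is still open. Below is a minimal sketch of the close-first ordering using contextlib.ExitStack as the addCleanup-equivalent; the choice of ExitStack, the function name, and the file name are assumptions, and a raw sqlite3 connection stands in for the StateManager (sm).

```python
import contextlib
import os
import sqlite3
import tempfile


def run_with_lifo_teardown() -> dict:
    """Open a temp-dir SQLite DB, then tear down close-first, directory-last."""
    observed = {}
    with contextlib.ExitStack() as stack:
        # Registered FIRST, so its cleanup (directory removal) runs LAST.
        tmpdir = stack.enter_context(tempfile.TemporaryDirectory())
        db_path = os.path.join(tmpdir, "telemetry.db")
        conn = sqlite3.connect(db_path)

        def close_conn() -> None:
            # Record that the DB file still exists when the handle is released.
            observed["file_present_at_close"] = os.path.exists(db_path)
            conn.close()

        # Registered LAST, so it runs FIRST at teardown.
        stack.callback(close_conn)

        conn.execute("CREATE TABLE system_metrics (anchor_error_bps REAL)")
        conn.commit()

    observed["dir_exists_after"] = os.path.exists(tmpdir)
    return observed


result = run_with_lifo_teardown()
```

With unittest, the same ordering falls out of calling self.addCleanup for the directory first and for the close last, since cleanups run in reverse registration order.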

Deviations from spec

  • Q1 Option A (gate on snapshot.is_valid()) rather than literal-spec unconditional read. Flagged above; recommend confirming before I start C2.
  • Extra test beyond the minimum-6: a dedicated test for the invalid-snapshot gate (since Option A is a behavior you asked me to ratify). Brings the count to 7.
  • I also intend to add the column to the CREATE TABLE body, not just the _ensure_column migration, matching Branch #6 precedent. Flagged here because your spec says "_ensure_column migration only — do not ALTER TABLE or DROP/RECREATE". The CREATE TABLE body does not ALTER or DROP anything — it's the canonical fresh-DB schema. Branch #6 did the same. Confirming this interpretation is OK.

What I will NOT touch

  • Strategy behavior (calculate_quote unchanged).
  • Anchor saturation guard (_evaluate_anchor_saturation_guard unchanged).
  • Any config file.
  • summarize_paper_run.py or the dashboard — those can read the new column as a follow-up; not in this branch's scope per your "telemetry only" constraint.

Asks — one ruling

Ruling needed on Q1 resolution: Option A (gate on snapshot.is_valid(), recommended) vs Option B (literal spec — stale values on invalid ticks) vs Option C (strategy-side reset — out of scope).

On green light I'll create the branch off the tip of feat/inventory-corridor-guard (not yet merged — the corridor branch is awaiting your review), or off main if you'd prefer me to stack on main and rebase once corridor merges. Either works; please confirm.

Standing by.

— Orion