Orion Investigation — feat/anchor-saturation-guard

To: Vesper
From: Orion
CC: Atlas, Katja (Captain)
Date: 2026-04-19
Re: Pre-code investigation for anchor saturation guard (findings on Q1–Q4)
Branch: feat/anchor-saturation-guard (created, empty)
Status: Investigation complete. Awaiting green light before writing code.


Summary

All four investigation questions have concrete answers with file paths and line numbers, and no ambiguity remains on the integration surface. There is one small schema deviation from the spec (Q4) that I'd like your ruling on before I touch code.


Q1 — Anchor error field + tick cadence

Spec asks: Where does the tick loop compute anchor error, what is the exact field name, and is this the same value as clob_vs_amm_divergence_bps?

Answer. The tick-loop signal to consume is:

self._strategy.last_anchor_divergence_bps: float | None

Defined at neo_engine/strategy_engine.py:99, set inside calculate_quote(...) on every tick:

# strategy_engine.py:203
self.last_anchor_divergence_bps = (
    (quote_anchor_price - mid_price) / mid_price
) * 10000.0

# strategy_engine.py:205 — null path when mid is unavailable
self.last_anchor_divergence_bps = None

Naming note: the tasking memo calls this anchor_error_bps. That name appears only in comments in the codebase — the Python attribute is last_anchor_divergence_bps. Same value, different label. I'll use the Python name throughout and add a short comment pointing out the equivalence so future readers aren't confused.

Is it the same as clob_vs_amm_divergence_bps? No. Those are two different measurements:

  • clob_vs_amm_divergence_bps (market_data.py:188–190) = ((clob_mid - amm_price) / amm_price) * 10000.0. It is a raw market-state measurement: how far the CLOB mid is from the AMM pool price, computed in the snapshot. It feeds the Phase 7.2 binary anchor-mode switch at strategy_engine.py:210, 219–220.
  • last_anchor_divergence_bps = ((quote_anchor_price - mid_price) / mid_price) * 10000.0. It is how far the strategy's chosen anchor sits from mid on this tick — i.e. how much our quote reference is being dragged away from where we'd like to quote. This is what "anchor saturation" means.

The guard must consume last_anchor_divergence_bps. That's what actually caused the S40 drain.

Cadence. calculate_quote(...) runs once per tick in Step 8 of the main loop (main_loop.py:1932), so the value refreshes every tick. When mid is missing the field is None, and that tick must be left out of the rolling window rather than recorded as zero: a zero would pull the window mean toward zero and understate saturation.
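To make the difference between the two measurements concrete, here is a minimal sketch with made-up prices. The helper name bps_offset, the sample prices, and the "reference <= 0" check are illustrative assumptions; the real code only documents a null path "when mid is unavailable".

```python
from __future__ import annotations


def bps_offset(value: float, reference: float | None) -> float | None:
    """Signed offset of `value` from `reference` in basis points.

    Returns None when the reference is unavailable. Treating a non-positive
    reference as unavailable is an assumption of this sketch.
    """
    if reference is None or reference <= 0:
        return None
    return (value - reference) / reference * 10000.0


# Hypothetical prices for a single tick.
clob_mid = 1.0000
amm_price = 0.9990
quote_anchor_price = 1.0008

# Market-state measurement: CLOB mid relative to the AMM pool price.
clob_vs_amm = bps_offset(clob_mid, amm_price)
# Strategy measurement: chosen anchor relative to mid (the guard's input).
anchor_div = bps_offset(quote_anchor_price, clob_mid)

# Rolling-window input: a tick without a mid is dropped, not stored as 0.0.
ticks = [anchor_div, bps_offset(quote_anchor_price, None), 6.5]
window = [x for x in ticks if x is not None]
```

With these inputs the two values differ (roughly 10 bps versus 8 bps), which is why the guard cannot reuse clob_vs_amm_divergence_bps.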


Q2 — Tick loop insertion point

Spec asks: Where in the tick loop should the guard evaluation run — right after anchor compute? Before intent submit?

Answer. The cleanest insertion point is main_loop.py between line 1943 and line 1945, immediately after the Phase 3E anchor diagnostics append:

# main_loop.py:1932
intents = self._strategy.calculate_quote(snapshot, inventory)

# main_loop.py:1937–1943  (Phase 3E diagnostic)
div_bps = self._strategy.last_anchor_divergence_bps
if div_bps is not None:
    self._anchor_divergence_obs.append(div_bps)

# ← GUARD EVALUATION HERE (new block)
#   1. append div_bps to self._anchor_error_window (deque, maxlen=lookback_ticks)
#   2. if window full AND |mean| ≥ bias_threshold_bps AND prevalence ≥ prevalence_pct:
#        persist trigger + emit WARNING + _enter_degraded_mode(...)
#        intents = []   # suppress submit on this tick

# (volatility block continues at 1945+)
# ...
# main_loop.py:2064  Step 9 — intent submit loop
for intent in intents:
    result = self._execution.submit_intent(intent)

Rationale:

  1. The window is populated from the exact same field the diagnostic already uses — one data flow, one source of truth.
  2. The guard gets to clear intents before Step 9 runs, so even on the trigger tick no orders are submitted. (This is belt-and-suspenders: the C5 pre-trade gate in execution_engine.py would refuse them anyway once _enter_degraded_mode sets KEY_MODE=MODE_DEGRADED, but zeroing intents avoids emitting the refused-submit log lines on the transition tick.)
  3. It sits outside any conditional branch — it runs every tick regardless of regime, so a flapping anchor mode can't hide saturation from us.

I considered inserting right before Step 9 at line 2064 instead, but that version would be re-evaluating the guard after unrelated blocks (volatility, inventory, etc.) might have mutated state. Keeping it tight to the anchor compute is simpler.
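The three-condition check in the guard block above could look roughly like the sketch below. It is standalone rather than wired into MainLoop. The class name AnchorSaturationGuard and the observe method are hypothetical, and the definition of prevalence (share of window ticks whose absolute error is at least prevalence_threshold_bps) is an assumption, since the spec excerpt here does not spell it out.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class AnchorSaturationGuard:
    """Hypothetical standalone version of the guard evaluation."""

    lookback_ticks: int = 25
    bias_threshold_bps: float = 7.0
    prevalence_threshold_bps: float = 5.0
    prevalence_pct: float = 40.0
    window: deque = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # maxlen makes the deque drop the oldest tick automatically.
        self.window = deque(maxlen=self.lookback_ticks)

    def observe(self, div_bps: float | None) -> bool:
        """Record one tick; return True when all three conditions hold."""
        if div_bps is None:
            return False  # no mid this tick: skip it, do not record a zero
        self.window.append(div_bps)
        if len(self.window) < self.lookback_ticks:
            return False  # condition 1: window must be full
        mean_err = sum(self.window) / len(self.window)
        # ASSUMPTION: prevalence counts ticks with |error| >= threshold.
        hits = sum(1 for x in self.window if abs(x) >= self.prevalence_threshold_bps)
        prev_pct = 100.0 * hits / len(self.window)
        return (
            abs(mean_err) >= self.bias_threshold_bps  # condition 2: bias
            and prev_pct >= self.prevalence_pct       # condition 3: prevalence
        )


guard = AnchorSaturationGuard(lookback_ticks=4)
fired = [guard.observe(x) for x in [9.0, None, 8.0, 7.5, 8.5]]
```

In this run the guard stays quiet until the fourth non-None observation fills the window, then fires on that tick. Using abs() on the mean gives the positive/negative symmetry the test plan asks for.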


Q3 — DEGRADED transition

Spec asks: When triggered, call self._enter_degraded_mode(...) — confirm signature, confirm it handles order cancellation, confirm idempotency.

Answer. Confirmed on all three points. main_loop.py:1241–1309:

def _enter_degraded_mode(self, reason: str) -> None:
    """
    Transition the engine to MODE_DEGRADED.

    Idempotent: safe to call on every tick where the condition holds.
    First-entry side effect: cancel all live orders.
    """
    current_mode = self._state.get_engine_state(KEY_MODE)
    is_first_entry = current_mode != MODE_DEGRADED

    self._state.set_engine_state(KEY_MODE, MODE_DEGRADED)
    if is_first_entry:
        self._state.set_engine_state(KEY_DEGRADED_SINCE, _now_iso())
    self._state.set_engine_state(KEY_DEGRADED_REASON, reason)

    if is_first_entry and not self._config.dry_run:
        self._cancel_all_live_orders("Degraded entry cancel")
    # ...

Signature: _enter_degraded_mode(self, reason: str) -> None. I'll call it with reason="anchor_saturation_guard_exceeded".

Idempotency: the current-mode check means repeat calls from subsequent ticks don't re-cancel or reset KEY_DEGRADED_SINCE. The reason string is overwritten on each call, which is actually what we want — if two different guards trigger in sequence, the most recent reason wins.

Order cancellation: handled internally via _cancel_all_live_orders("Degraded entry cancel") on first entry only. Not needed from the guard site.

Post-DEGRADED submit blocking: handled by the C5 pre-trade gate at execution_engine.py:1030–1109, which reads KEY_MODE via get_engine_state and refuses any submit_intent(...) when the mode is MODE_DEGRADED or MODE_HALT. The gate is only active when wallet_reconciliation.enabled is true, which it now is per D2.2. Confirmed by grep: the gate runs on every submit attempt, so the guard does not need its own blocker.

Recovery: DEGRADED requires restart per D2.2 contract. The guard does not implement auto-recovery. Once tripped, reconciliation continues, quoting is off, and the operator decides when to restart.
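The idempotency contract described above can be illustrated with a toy in-memory model. This is not the real MainLoop: ToyEngine, the state keys, and the mode strings "RUNNING" and "DEGRADED" are assumptions made for the sketch, the dry_run check is omitted, and the submit gate covers only the degraded mode.

```python
from datetime import datetime, timezone

MODE_DEGRADED = "DEGRADED"  # assumed value; the real constant is not shown


class ToyEngine:
    """In-memory stand-in for the engine state, for illustration only."""

    def __init__(self) -> None:
        self.state = {"mode": "RUNNING"}
        self.cancel_calls = 0

    def enter_degraded_mode(self, reason: str) -> None:
        is_first_entry = self.state.get("mode") != MODE_DEGRADED
        self.state["mode"] = MODE_DEGRADED
        if is_first_entry:
            # First entry only: stamp the time and cancel live orders.
            self.state["degraded_since"] = datetime.now(timezone.utc).isoformat()
            self.cancel_calls += 1  # stands in for _cancel_all_live_orders(...)
        # Overwritten on every call, so the most recent reason wins.
        self.state["degraded_reason"] = reason

    def submit_allowed(self) -> bool:
        """Simplified pre-trade gate: refuse submits while degraded."""
        return self.state["mode"] != MODE_DEGRADED


engine = ToyEngine()
engine.enter_degraded_mode("anchor_saturation_guard_exceeded")
first_since = engine.state["degraded_since"]
engine.enter_degraded_mode("some_other_guard")  # repeat call on a later tick
```

After the second call the cancel count is still one and the timestamp is unchanged; only the reason string has moved to the later caller.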


Q4 — circuit_breaker_events persistence

Spec asks: Does the circuit_breaker_events table exist? What's its schema? Use existing writers or establish first writer?

Answer.

Table exists at state_manager.py:550–558:

CREATE TABLE IF NOT EXISTS circuit_breaker_events (
    id                       INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at               TEXT    NOT NULL,
    breaker                  TEXT    NOT NULL,
    triggered_at             TEXT    NOT NULL,
    cooldown_seconds         INTEGER,
    manual_reset_required    INTEGER NOT NULL DEFAULT 0,
    context_json             TEXT
);

Writers: zero. I grepped the codebase and found no INSERT INTO circuit_breaker_events anywhere. This would be the first writer. The table is a provision from an earlier design round that never got wired up.

Proposed writer. New StateManager.record_circuit_breaker_event(...) method, modelled on record_reconciler_anomaly from the previous branch:

def record_circuit_breaker_event(
    self,
    *,
    breaker: str,
    triggered_at: str,
    cooldown_seconds: int | None = None,
    manual_reset_required: bool = True,  # DEGRADED = manual restart
    context: dict | None = None,
) -> int:
    ...  # signature only; body not written yet

Call site from the guard:

self._state.record_circuit_breaker_event(
    breaker="anchor_saturation_guard",
    triggered_at=_now_iso(),
    manual_reset_required=True,
    context={
        "mean_error_bps": round(mean_err, 3),
        "prevalence_pct": round(prev_pct, 2),
        "lookback_ticks": lookback,
        "bias_threshold_bps": bias_threshold,
        "prevalence_threshold_bps": prev_threshold,
        "window_sample_tail": [round(x, 3) for x in list(window)[-5:]],
    },
)

Along with the console WARNING:

[ANCHOR_SAT] DEGRADED triggered — mean={mean_err:+.2f}bps prevalence={prev_pct:.1f}% over {lookback} ticks
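For a sense of what the writer body might look like, here is a minimal sketch against an in-memory database. It assumes the store is SQLite (the DDL above suggests so), takes the connection as an explicit argument instead of living on StateManager, and assumes the returned int is the new row id. None of this is confirmed by the existing code.

```python
from __future__ import annotations

import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS circuit_breaker_events (
    id                    INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at            TEXT    NOT NULL,
    breaker               TEXT    NOT NULL,
    triggered_at          TEXT    NOT NULL,
    cooldown_seconds      INTEGER,
    manual_reset_required INTEGER NOT NULL DEFAULT 0,
    context_json          TEXT
);
"""


def record_circuit_breaker_event(
    conn: sqlite3.Connection,
    *,
    breaker: str,
    triggered_at: str,
    cooldown_seconds: int | None = None,
    manual_reset_required: bool = True,
    context: dict | None = None,
) -> int:
    """Insert one event row and return its id (assumed meaning of the int)."""
    created_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO circuit_breaker_events "
        "(created_at, breaker, triggered_at, cooldown_seconds, "
        " manual_reset_required, context_json) VALUES (?, ?, ?, ?, ?, ?)",
        (
            created_at,
            breaker,
            triggered_at,
            cooldown_seconds,
            int(manual_reset_required),  # SQLite has no native boolean type
            json.dumps(context, sort_keys=True) if context is not None else None,
        ),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
event_id = record_circuit_breaker_event(
    conn,
    breaker="anchor_saturation_guard",
    triggered_at="2026-04-19T00:00:00+00:00",
    context={"mean_error_bps": 8.25, "lookback_ticks": 25},
)
```

Parameter binding keeps the context payload out of the SQL text, and serializing with sort_keys=True makes the stored JSON stable across runs, which helps when diffing events.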

One deviation I'd like your ruling on

The current circuit_breaker_events schema has no session_id column. Every other operational table we've built in the last two weeks (inventory_truth_snapshots, reconciler_anomaly_log) does. Without it, session-scoped queries against this table require joining on triggered_at ranges vs. the sessions table, which is brittle.

Proposed: add session_id INTEGER REFERENCES sessions(session_id) via _ensure_column migration in initialize_database(), matching the pattern you approved on the reconciler branch. Auto-tag session_id = self._current_session_id on every write.

This is a schema addition not explicitly in the spec. Flagging it for your ruling before I code it. If you'd rather keep the schema frozen, I'll store session_id inside context_json instead — same data, worse query shape.
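For reference, an idempotent add-column migration in SQLite typically looks like the sketch below. The function name ensure_column and its signature are hypothetical; the real _ensure_column helper may differ. The table definitions are trimmed to the columns needed for the demonstration.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Add `column` to `table` if it is missing; return True when added.

    `table`, `column`, and `ddl` are interpolated into SQL, so they must be
    trusted, hard-coded identifiers and never user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (session_id INTEGER PRIMARY KEY);
    CREATE TABLE circuit_breaker_events (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        breaker      TEXT NOT NULL,
        triggered_at TEXT NOT NULL
    );
    """
)
column_ddl = "INTEGER REFERENCES sessions(session_id)"
added_first = ensure_column(conn, "circuit_breaker_events", "session_id", column_ddl)
added_again = ensure_column(conn, "circuit_breaker_events", "session_id", column_ddl)
```

The second call is a no-op, so the migration is safe to run on every startup. Existing rows get NULL in the new column, which SQLite requires for an added column that carries a REFERENCES clause when foreign keys are enforced.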


Branch plan (unchanged from tasking — posting for confirmation)

Five commits, each independently testable:

  1. C1 — Config schema. AnchorSaturationGuardConfig dataclass in config.py; YAML block under strategy: in config.yaml and config_live_stage1.yaml; wiring in load_config(). Fields: enabled=True, lookback_ticks=25, bias_threshold_bps=7.0, prevalence_threshold_bps=5.0, prevalence_pct=40.0.
  2. C2 — Rolling window + schema. MainLoop._anchor_error_window: collections.deque[float] initialized from config; per-tick append of last_anchor_divergence_bps (skipping None). Plus: _ensure_column migration adding session_id to circuit_breaker_events (pending Q4 ruling).
  3. C3 — Guard evaluation + DEGRADED entry. Insert block at main_loop.py:1944. Three-condition check; on trigger: _enter_degraded_mode("anchor_saturation_guard_exceeded"), clear intents, set self._anchor_guard_triggered_this_session=True to prevent re-entry spam.
  4. C4 — Persistence + logging. record_circuit_breaker_event(...) writer; [ANCHOR_SAT] WARNING log at trigger site.
  5. C5 — Tests (8 minimum). (a) window-not-full no-op, (b) below-bias no-op, (c) below-prevalence no-op, (d) positive-bias trigger, (e) negative-bias trigger (symmetry), (f) DEGRADED transition correctness — mode set, orders cancelled, C5 gate engages, (g) no-retrigger once DEGRADED, (h) enabled=false disables. Plus patch bundle and delivery memo.
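As a rough picture of the first commit in the plan, the config dataclass could be shaped like this. The field names and defaults come from the plan above; the validation rules and the guard_config_from_mapping helper are assumptions of the sketch, and the YAML key under strategy: is deliberately not shown because the plan does not name it.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AnchorSaturationGuardConfig:
    enabled: bool = True
    lookback_ticks: int = 25
    bias_threshold_bps: float = 7.0
    prevalence_threshold_bps: float = 5.0
    prevalence_pct: float = 40.0

    def __post_init__(self) -> None:
        # Assumed sanity checks; the spec excerpt does not list any.
        if self.lookback_ticks < 1:
            raise ValueError("lookback_ticks must be >= 1")
        if not 0.0 <= self.prevalence_pct <= 100.0:
            raise ValueError("prevalence_pct must be within [0, 100]")


def guard_config_from_mapping(
    raw: Mapping[str, Any] | None,
) -> AnchorSaturationGuardConfig:
    """Build the config from a parsed YAML mapping, ignoring unknown keys."""
    known = {f.name for f in fields(AnchorSaturationGuardConfig)}
    selected = {k: v for k, v in (raw or {}).items() if k in known}
    return AnchorSaturationGuardConfig(**selected)


defaults = guard_config_from_mapping(None)
override = guard_config_from_mapping({"lookback_ticks": 50, "unknown_key": 1})
```

A frozen dataclass keeps thresholds from being mutated mid-session, and a missing YAML block falls back to the defaults listed in the plan.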

Ready to code once you confirm

  • Q1–Q3: no blockers, spec is clear.
  • Q4: pending your ruling on session_id column addition vs. context_json embedding.

Standing by.

— Orion 2026-04-19