Orion Investigation — Branch #5 audit/config-wiring-pass

To: Katja
CC: Vesper, Atlas
From: Orion
Date: 2026-04-18


Acknowledgment

Branch #4 (fix/flag-029-async-pin-and-orphan) merged — 15/15 passed, Vesper approved. Mirrored locally; Branch #5 cut off df168f0…83452fe tip. Starting investigation now before any code. Three open questions at the end — do not start code until Q1–Q3 are ruled on.

Scope (locked in Atlas Pre-7.3 Review §2B, §2C, §2E)

  1. Audit every critical config parameter from YAML → Config dataclass → runtime consumer → observable metric. Any value that does not manifest in behavior is a wiring failure.
  2. Promote the CLOB-switch threshold from the hardcoded 3.0 at strategy_engine.py:216 into strategy.clob_switch_threshold_bps (default 3.0). Required for Phase 7.3 tuning.
  3. Add config_mismatch to the halt taxonomy (reserved constant HALT_REASON_CONFIG_MISMATCH already exists at main_loop.py:123 but is never emitted). Triggered when runtime config ≠ expected config / invariant failure.

Findings

A. CLOB-switch threshold hardcoded — one site

neo_engine/strategy_engine.py:211
    # use the selected anchor. Atlas-locked threshold is 3 bps (control
    # threshold — distinct from the 5 bps evaluation reliability floor).
    # Binary only — no blending, no weighted average.
    if (
        self.last_anchor_divergence_bps is not None
        and abs(self.last_anchor_divergence_bps) > 3.0     ← hardcoded
    ):
        reference_mid = mid_price
        reference_source = "clob_mid_phase7_switch"
    else:
        reference_mid = quote_anchor_price
        reference_source = quote_anchor_source

Nowhere else in neo_engine/ does 3.0 appear as a switch-threshold constant. The in-line comment even names it "Atlas-locked threshold is 3 bps" — direct evidence the value was meant to be configurable but got wired as a literal. Single surface, clean promotion.

B. bid_offset_bps / ask_offset_bps wiring — clean, verified

Layer | Location | Value
YAML (config.yaml) | - | not present; falls back to default
StrategyConfig dataclass | config.py:145-146 | 10.0 / 16.0
YAML loader | config.py:437-438 (strat_raw.get(...)) | pass-through
Runtime assignment | strategy_engine.py:92-93: self.base_bid_offset_bps, self.base_ask_offset_bps | taken from config.strategy
Tick-time consumer | strategy_engine.py:227-228: final_bid_offset_bps = base_bid_offset_bps + skew_bps | used
Quote placement | strategy_engine.py:271-272: buy_price = reference_mid * (1 - final_bid_offset_bps/10000) | observable
Persistence | main_loop.py:1121-1122, 2332, 2395-2396: intended_bid_offset_bps on orders/fills | observable

Verified. Changing strategy.bid_offset_bps or strategy.ask_offset_bps in YAML deterministically moves buy_price/sell_price and is recorded on every order/fill row. No phantom path. Will be captured in the wiring table without code changes.
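For the wiring table, a minimal sketch of the bps-to-price arithmetic behind the "observable" rows. The buy-side formula is the one quoted from strategy_engine.py:271-272; the sell-side formula (assumed to be the mirror image with a plus sign), the function name quote_prices, and the sample mid of 2.0 are illustrative assumptions, not code from the repo.

```python
from typing import Tuple


def quote_prices(
    reference_mid: float,
    final_bid_offset_bps: float,
    final_ask_offset_bps: float,
) -> Tuple[float, float]:
    """Convert offsets in basis points into buy/sell quote prices."""
    # Buy side: formula as quoted in the audit (1 bps = 1/10000).
    buy_price = reference_mid * (1 - final_bid_offset_bps / 10000)
    # Sell side: ASSUMED mirror of the buy side; not shown in the audit.
    sell_price = reference_mid * (1 + final_ask_offset_bps / 10000)
    return buy_price, sell_price


# Dataclass defaults from the table (10.0 / 16.0) at a hypothetical mid of 2.0.
default_buy, default_sell = quote_prices(2.0, 10.0, 16.0)

# Widening the bid offset moves the buy price further below the mid.
wider_buy, _ = quote_prices(2.0, 20.0, 16.0)
```

With these inputs the buy price sits about 10 bps under the mid and the sell price about 16 bps over it, which is the deterministic movement the verification above relies on.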

C. anchor_max_divergence_bps wiring — clean (Branch #1 fix)

YAML strategy.anchor_max_divergence_bps: 10.0 → StrategyConfig → strategy_engine.py:173 (cap_frac = value / 10000.0), applied to the capped_amm anchor. Clean single consumer. Not needed for the Phase 7.3 switch; that's a separate knob (see A).

D. Risk caps (max_xrp_exposure, max_rlusd_exposure) — clean

Already audited and verified under Audit Item 1 before Branch #4. YAML → RiskConfig → main_loop.py:801, 803 risk gate. The S39 "100 cap" ghost was a stale halt.reason string, fixed in Branch #1 (not a wiring bug).

E. HALT_REASON_CONFIG_MISMATCH — declared, never emitted

neo_engine/main_loop.py:123
    HALT_REASON_CONFIG_MISMATCH = "config_mismatch"  # reserved — emitted by Branch #2 invariant check

The comment refers to a Branch #2 check that never landed. Today there is no call site emitting HALT_REASON_CONFIG_MISMATCH. Branch #5 is the right place to wire it up.

Proposed structure — three commits

Commit 1 — feat(strategy): promote clob_switch_threshold_bps to config (Phase 7.3 tuning)

  • neo_engine/config.py: add clob_switch_threshold_bps: float = 3.0 to StrategyConfig; add loader line clob_switch_threshold_bps=float(strat_raw.get("clob_switch_threshold_bps", 3.0)).
  • neo_engine/strategy_engine.py:216: replace 3.0 with self._config.strategy.clob_switch_threshold_bps.
  • config/config.example.yaml and config/config.yaml: add clob_switch_threshold_bps: 3.0 line next to anchor_max_divergence_bps.
  • tests/test_clob_switch_threshold_config.py (new): (a) default 3.0 → switch fires at |div| > 3.0 and not at 3.0 exactly; (b) override to 5.0 → switch does NOT fire at div=4.0 but DOES at div=5.5. Drives through the strategy's evaluate() result to prove the knob moves reference_source.

Behaviour invariant: the default of 3.0 preserves S36–S39 behavior bit-for-bit. Only runs that override the default diverge.
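A minimal sketch of the shape Commit 1 would take, assuming a trimmed-down StrategyConfig with only the new field. The helper names load_strategy_config and select_reference are hypothetical stand-ins (the real comparison lives inline in strategy_engine.py), and the "amm_anchor" source string and prices are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class StrategyConfig:
    # Subset only; the real dataclass in config.py carries many more fields.
    clob_switch_threshold_bps: float = 3.0


def load_strategy_config(strat_raw: Mapping[str, Any]) -> StrategyConfig:
    """Hypothetical loader using the existing dict.get(key, default) style."""
    return StrategyConfig(
        clob_switch_threshold_bps=float(
            strat_raw.get("clob_switch_threshold_bps", 3.0)
        ),
    )


def select_reference(
    divergence_bps: Optional[float],
    mid_price: float,
    quote_anchor_price: float,
    quote_anchor_source: str,
    threshold_bps: float,
) -> Tuple[float, str]:
    """Binary switch: CLOB mid when |divergence| exceeds the threshold."""
    if divergence_bps is not None and abs(divergence_bps) > threshold_bps:
        return mid_price, "clob_mid_phase7_switch"
    return quote_anchor_price, quote_anchor_source


default_cfg = load_strategy_config({})
override_cfg = load_strategy_config({"clob_switch_threshold_bps": 5.0})

# Default 3.0: strict comparison, so exactly 3.0 stays on the anchor.
at_threshold = select_reference(
    3.0, 1.00, 1.01, "amm_anchor", default_cfg.clob_switch_threshold_bps
)
above_threshold = select_reference(
    3.1, 1.00, 1.01, "amm_anchor", default_cfg.clob_switch_threshold_bps
)

# Override 5.0: a 4.0 bps divergence no longer switches; 5.5 bps does.
override_below = select_reference(
    4.0, 1.00, 1.01, "amm_anchor", override_cfg.clob_switch_threshold_bps
)
override_above = select_reference(
    5.5, 1.00, 1.01, "amm_anchor", override_cfg.clob_switch_threshold_bps
)
```

The four calls at the bottom mirror test cases (a) and (b) above. The real tests would drive the same cases through evaluate() rather than a free function.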

Commit 2 — feat(startup): config invariant check emits config_mismatch on failure

The piece Atlas actually called out (§2E): runtime invariants that halt with config_mismatch on failure. Scope is the question in Q1 below. My proposed minimum set (three cheap checks):

  • risk.max_xrp_exposure and risk.max_rlusd_exposure must be > 0.
  • engine.tick_interval_seconds must equal strategy.requote_interval_seconds (noted as a missing assertion in my original Item 5 audit).
  • strategy.clob_switch_threshold_bps must be > 0 (with the strict > comparison at strategy_engine.py:216, zero would fire the switch on any nonzero divergence; negative would fire it whenever a divergence reading exists).

Each failure: log.error with the specific mismatch, set halt.reason=HALT_REASON_CONFIG_MISMATCH, raise RuntimeError from _startup() so the engine refuses to begin ticking.

Tests (tests/test_config_invariants.py, new): one happy-path, three failure paths (one per invariant), each asserting (a) RuntimeError raised, (b) halt.reason=config_mismatch persisted.
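A minimal sketch of the three-invariant check, assuming the proposed minimum set is accepted under Q1. HaltState, find_config_mismatches, enforce_config_invariants, and the SAMPLE_OK values are hypothetical names and numbers for illustration; the real check would read from the Config dataclasses and persist halt.reason through the existing halt machinery before _startup() raises.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional

log = logging.getLogger("config_invariants")

# Same string as the reserved constant at main_loop.py:123.
HALT_REASON_CONFIG_MISMATCH = "config_mismatch"


@dataclass
class HaltState:
    # Hypothetical stand-in for wherever halt.reason is persisted.
    reason: Optional[str] = None


def find_config_mismatches(
    max_xrp_exposure: float,
    max_rlusd_exposure: float,
    tick_interval_seconds: float,
    requote_interval_seconds: float,
    clob_switch_threshold_bps: float,
) -> List[str]:
    """Return one human-readable message per violated invariant."""
    problems: List[str] = []
    if max_xrp_exposure <= 0:
        problems.append(f"risk.max_xrp_exposure={max_xrp_exposure} must be > 0")
    if max_rlusd_exposure <= 0:
        problems.append(f"risk.max_rlusd_exposure={max_rlusd_exposure} must be > 0")
    if tick_interval_seconds != requote_interval_seconds:
        problems.append(
            f"engine.tick_interval_seconds={tick_interval_seconds} != "
            f"strategy.requote_interval_seconds={requote_interval_seconds}"
        )
    if clob_switch_threshold_bps <= 0:
        problems.append(
            f"strategy.clob_switch_threshold_bps={clob_switch_threshold_bps} "
            "must be > 0"
        )
    return problems


def enforce_config_invariants(halt: HaltState, **values: float) -> None:
    """Log each mismatch, record the halt reason, and refuse to start."""
    problems = find_config_mismatches(**values)
    if not problems:
        return
    for problem in problems:
        log.error("config_mismatch: %s", problem)
    halt.reason = HALT_REASON_CONFIG_MISMATCH
    raise RuntimeError("config invariant check failed: " + "; ".join(problems))


# Illustrative values only; not taken from any real config.
SAMPLE_OK = dict(
    max_xrp_exposure=100.0,
    max_rlusd_exposure=100.0,
    tick_interval_seconds=4.0,
    requote_interval_seconds=4.0,
    clob_switch_threshold_bps=3.0,
)
```

Separating the pure find_config_mismatches from the side-effecting enforce_config_invariants keeps each failure-path test to a single overridden value, for example dict(SAMPLE_OK, clob_switch_threshold_bps=0.0).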

Commit 3 — docs(config): add config wiring reference table

  • New file docs/config_wiring.md (or whatever path you prefer — see Q3). Columns: YAML key | Config field | Loader site | Runtime consumer | Observable metric | Verified. One row per critical knob (≈25 rows). No code changes.

What is explicitly out of scope

  • Not auditing parameters in ParameterConfig (circuit-breaker, spread-regime, skew buckets) unless they are behavior-critical in Phase 7.3. They're fed into circuit-breaker logic that has its own test coverage; widening here dilutes the branch.
  • Not touching StrategyConfig flags that are experimental or behind feature toggles (momentum filter, bid ladder, etc.). They're documented as knobs, and flipping a flag changes behavior visibly.
  • Not rewriting the loader — still dict.get(key, default) pattern per existing code style.

Open questions before I write code

  • Q1 — Invariant scope for config_mismatch. Atlas §2E says "runtime config ≠ expected config, invariant failure detected." I propose the three cheap checks above (positive caps, tick == requote, threshold > 0). Want more (e.g., validate anchor_mode is one of the allowed strings, cross-check bid_ladder_size_weights sums to ~1.0), or keep it to the three? Either is fine; I want the ruling before writing the test set.

  • Q2 — clob_switch_threshold_bps default — 3.0 confirmed? Atlas ruling §2C says 3.0. Current hardcoded is 3.0. I'll default the dataclass AND the loader fallback AND the YAML line to 3.0. Just pinning that nobody wants it changed under this branch.

  • Q3 — Wiring-table doc location. Three options: (a) docs/config_wiring.md (new dir in repo), (b) inside neo_engine/config.py as a long module docstring, or (c) a workspace artifact [C] NEO Config Wiring Reference.md that is not checked into the repo — kept next to operating principles. I'd lean (c) since it's reference material for the team rather than something a code reader needs at the call site. Your call.

Commit 1 is behavior-neutral at the default; Commit 2 is what could actually halt an engine that was previously starting. The three-invariant minimum in Commit 2 is my recommendation, but it's your call.

Standing by for Q1–Q3.

— Orion