Orion Investigation — Branch #5 audit/config-wiring-pass
To: Katja CC: Vesper, Atlas From: Orion Date: 2026-04-18
Acknowledgment
Branch #4 (fix/flag-029-async-pin-and-orphan) merged — 15/15 passed, Vesper approved. Mirrored locally; Branch #5 cut off df168f0…83452fe tip. Starting investigation now before any code. Three open questions at the end — do not start code until Q1–Q3 are ruled on.
Scope (locked in Atlas Pre-7.3 Review §2B, §2C, §2E)
- Audit every critical config parameter from YAML → Config dataclass → runtime consumer → observable metric. Any value that does not manifest in behavior is a wiring failure.
- Promote the CLOB-switch threshold from the hardcoded 3.0 at strategy_engine.py:216 into strategy.clob_switch_threshold_bps (default 3.0). Required for Phase 7.3 tuning.
- Add config_mismatch to the halt taxonomy (reserved constant HALT_REASON_CONFIG_MISMATCH already exists at main_loop.py:123 but is never emitted). Triggered when runtime config ≠ expected config / invariant failure.
Findings
A. CLOB-switch threshold hardcoded — one site
neo_engine/strategy_engine.py:211

    # use the selected anchor. Atlas-locked threshold is 3 bps (control
    # threshold — distinct from the 5 bps evaluation reliability floor).
    # Binary only — no blending, no weighted average.
    if (
        self.last_anchor_divergence_bps is not None
        and abs(self.last_anchor_divergence_bps) > 3.0  # ← hardcoded
    ):
        reference_mid = mid_price
        reference_source = "clob_mid_phase7_switch"
    else:
        reference_mid = quote_anchor_price
        reference_source = quote_anchor_source

Nowhere else in neo_engine/ does 3.0 appear as a switch-threshold constant. The in-line comment even names it "Atlas-locked threshold is 3 bps" — direct evidence the value was meant to be configurable but got wired as a literal. Single surface, clean promotion.
B. bid_offset_bps / ask_offset_bps wiring — clean, verified
| Layer | Location | Value |
|---|---|---|
| YAML (config.yaml) | not present; falls back to default | — |
| StrategyConfig dataclass | config.py:145-146 | 10.0 / 16.0 |
| YAML loader | config.py:437-438 (strat_raw.get(...)) | pass-through |
| Runtime assignment | strategy_engine.py:92-93 → self.base_bid_offset_bps, self.base_ask_offset_bps | taken from config.strategy |
| Tick-time consumer | strategy_engine.py:227-228 → final_bid_offset_bps = base_bid_offset_bps + skew_bps | used |
| Quote placement | strategy_engine.py:271-272 → buy_price = reference_mid * (1 - final_bid_offset_bps/10000) | observable |
| Persistence | main_loop.py:1121-1122, 2332, 2395-2396 → intended_bid_offset_bps on orders/fills | observable |
Verified. Changing strategy.bid_offset_bps or strategy.ask_offset_bps in YAML deterministically moves buy_price/sell_price and is recorded on every order/fill row. No phantom path. Will be captured in the wiring table without code changes.
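
For illustration, the quote-placement step in the table can be reduced to a few lines. This is a minimal sketch, not the engine code: only the buy_price expression is quoted from strategy_engine.py:271-272. The function name quote_prices, the mirrored sell-side formula, and the reference mid of 2.0 are assumptions made for the example.

```python
def quote_prices(reference_mid, final_bid_offset_bps, final_ask_offset_bps):
    """Turn basis-point offsets into quote prices around a reference mid."""
    # Buy side: same expression as the Quote placement row of the table.
    buy_price = reference_mid * (1 - final_bid_offset_bps / 10000)
    # Sell side: ASSUMPTION, mirrored with a plus sign; not quoted in the memo.
    sell_price = reference_mid * (1 + final_ask_offset_bps / 10000)
    return buy_price, sell_price


# Dataclass defaults from the table (10.0 / 16.0), zero skew, illustrative mid of 2.0.
default_buy, default_sell = quote_prices(2.0, 10.0, 16.0)

# Doubling the bid offset moves the buy price only; the sell price is unchanged.
wide_buy, wide_sell = quote_prices(2.0, 20.0, 16.0)

print(default_buy, default_sell, wide_buy, wide_sell)
```

With these inputs the buy price moves from roughly 1.998 to roughly 1.996 while the sell price stays near 2.0032, which is the kind of deterministic shift the wiring check relies on.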
C. anchor_max_divergence_bps wiring — clean (Branch #1 fix)
YAML strategy.anchor_max_divergence_bps: 10.0 → StrategyConfig → strategy_engine.py:173 → cap_frac = value / 10000.0 applied to capped_amm anchor. Clean single consumer. Not needed for Phase 7.3 switch — that's a separate knob (see A).
D. Risk caps (max_xrp_exposure, max_rlusd_exposure) — clean
Already audited and verified under Audit Item 1 before Branch #4. YAML → RiskConfig → main_loop.py:801, 803 risk gate. The S39 "100 cap" ghost was a stale halt.reason string, fixed in Branch #1 (not a wiring bug).
E. HALT_REASON_CONFIG_MISMATCH — declared, never emitted
neo_engine/main_loop.py:123
HALT_REASON_CONFIG_MISMATCH = "config_mismatch" # reserved — emitted by Branch #2 invariant check
The comment refers to a Branch #2 check that never landed. Today there is no call site emitting HALT_REASON_CONFIG_MISMATCH. Branch #5 is the right place to wire it up.
Proposed structure — three commits
Commit 1 — feat(strategy): promote clob_switch_threshold_bps to config (Phase 7.3 tuning)
- neo_engine/config.py: add clob_switch_threshold_bps: float = 3.0 to StrategyConfig; add loader line clob_switch_threshold_bps=float(strat_raw.get("clob_switch_threshold_bps", 3.0)).
- neo_engine/strategy_engine.py:216: replace 3.0 with self._config.strategy.clob_switch_threshold_bps.
- config/config.example.yaml and config/config.yaml: add clob_switch_threshold_bps: 3.0 line next to anchor_max_divergence_bps.
- tests/test_clob_switch_threshold_config.py (new): (a) default 3.0 → switch fires at |div| > 3.0 and not at 3.0 exactly; (b) override to 5.0 → switch does NOT fire at div=4.0 but DOES at div=5.5. Drives through the strategy's evaluate() result to prove the knob moves reference_source.
Behavior invariant: the default of 3.0 preserves S36–S39 behavior bit-for-bit. Behavior diverges only when the YAML overrides the threshold to a non-default value.
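
The promotion described in Commit 1 can be sketched as follows. This is a trimmed illustration under assumptions, not the actual patch: the real StrategyConfig has many more fields, and load_strategy_config and select_reference are hypothetical names standing in for the loader in config.py and the branch at strategy_engine.py:216. The "anchor" source label is a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class StrategyConfig:
    # Trimmed to the one new field; default matches the current hardcoded literal.
    clob_switch_threshold_bps: float = 3.0


def load_strategy_config(strat_raw: Mapping[str, Any]) -> StrategyConfig:
    # Same dict.get(key, default) pattern as the proposed loader line.
    return StrategyConfig(
        clob_switch_threshold_bps=float(strat_raw.get("clob_switch_threshold_bps", 3.0)),
    )


def select_reference(
    cfg: StrategyConfig,
    divergence_bps: Optional[float],
    mid_price: float,
    quote_anchor_price: float,
    quote_anchor_source: str,
) -> Tuple[float, str]:
    # Strict '>' so a divergence exactly at the threshold does not switch.
    if divergence_bps is not None and abs(divergence_bps) > cfg.clob_switch_threshold_bps:
        return mid_price, "clob_mid_phase7_switch"
    return quote_anchor_price, quote_anchor_source


default_cfg = load_strategy_config({})
override_cfg = load_strategy_config({"clob_switch_threshold_bps": 5.0})

# Mirrors the proposed test cases: (a) default, (b) override to 5.0.
at_threshold = select_reference(default_cfg, 3.0, 1.00, 1.01, "anchor")
above_default = select_reference(default_cfg, 3.1, 1.00, 1.01, "anchor")
below_override = select_reference(override_cfg, 4.0, 1.00, 1.01, "anchor")
above_override = select_reference(override_cfg, 5.5, 1.00, 1.01, "anchor")
```

Keeping the comparison strict is what makes case (a) hold: at exactly 3.0 the anchor is retained, matching the current literal's behavior.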
Commit 2 — feat(startup): config invariant check emits config_mismatch on failure
The piece Atlas actually called out (§2E): runtime invariants that on failure halt with config_mismatch. Scope is the question in Q1 below. My proposed minimum set (three cheap checks):
- risk.max_xrp_exposure and risk.max_rlusd_exposure must be > 0.
- engine.tick_interval_seconds must equal strategy.requote_interval_seconds (noted as missing assertion in my original Item 5 audit).
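- strategy.clob_switch_threshold_bps must be > 0. Because the switch condition is abs(divergence) > threshold, a zero threshold would fire the switch on any non-zero divergence, and a negative threshold would fire it on every tick where a divergence is available.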
Each failure: log.error with the specific mismatch, set halt.reason=HALT_REASON_CONFIG_MISMATCH, raise RuntimeError from _startup() so the engine refuses to begin ticking.
Tests (tests/test_config_invariants.py, new): one happy-path, three failure paths (one per invariant), each asserting (a) RuntimeError raised, (b) halt.reason=config_mismatch persisted.
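
A minimal sketch of the three-invariant check, assuming a flattened view of the config and a plain dict standing in for wherever halt.reason is persisted. InvariantInputs, find_config_mismatches, enforce_config_invariants, and the logger name are hypothetical; only the constant and the three rules come from this proposal.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List

log = logging.getLogger("neo_engine.startup")  # hypothetical logger name

HALT_REASON_CONFIG_MISMATCH = "config_mismatch"


@dataclass(frozen=True)
class InvariantInputs:
    # Hypothetical flattened view of the risk/engine/strategy values being checked.
    max_xrp_exposure: float
    max_rlusd_exposure: float
    tick_interval_seconds: float
    requote_interval_seconds: float
    clob_switch_threshold_bps: float


def find_config_mismatches(c: InvariantInputs) -> List[str]:
    """Return one human-readable message per violated invariant."""
    problems = []
    if c.max_xrp_exposure <= 0 or c.max_rlusd_exposure <= 0:
        problems.append(
            f"risk caps must be > 0 (xrp={c.max_xrp_exposure}, rlusd={c.max_rlusd_exposure})"
        )
    if c.tick_interval_seconds != c.requote_interval_seconds:
        problems.append(
            f"engine.tick_interval_seconds={c.tick_interval_seconds} != "
            f"strategy.requote_interval_seconds={c.requote_interval_seconds}"
        )
    if c.clob_switch_threshold_bps <= 0:
        problems.append(
            f"strategy.clob_switch_threshold_bps must be > 0 (got {c.clob_switch_threshold_bps})"
        )
    return problems


def enforce_config_invariants(c: InvariantInputs, halt_state: Dict[str, str]) -> None:
    """Log each mismatch, record the halt reason, and refuse to start."""
    problems = find_config_mismatches(c)
    if not problems:
        return
    for problem in problems:
        log.error("config invariant failed: %s", problem)
    halt_state["reason"] = HALT_REASON_CONFIG_MISMATCH  # stand-in for persistence
    raise RuntimeError("config_mismatch: " + "; ".join(problems))
```

Separating the pure check from the enforcing wrapper keeps each failure-path test small: the test can assert on the returned messages, then on the RuntimeError and the recorded reason.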
Commit 3 — docs(config): add config wiring reference table
- New file docs/config_wiring.md (or whatever path you prefer — see Q3). Columns: YAML key | Config field | Loader site | Runtime consumer | Observable metric | Verified. One row per critical knob (≈25 rows). No code changes.
What is explicitly out of scope
- Not auditing parameters in ParameterConfig (circuit-breaker, spread-regime, skew buckets) unless they are behavior-critical in Phase 7.3. They're fed into circuit-breaker logic that has its own test coverage; widening here dilutes the branch.
- Not touching StrategyConfig flags that are experimental / behind feature toggles (momentum filter, bid ladder, etc.) — they're documented as knobs, flip-flag changes behavior visibly.
- Not rewriting the loader — still dict.get(key, default) pattern per existing code style.
Open questions before I write code
- Q1 — Invariant scope for config_mismatch. Atlas §2E says "runtime config ≠ expected config, invariant failure detected." I propose the three cheap checks above (positive caps, tick == requote, threshold > 0). Want more (e.g., validate anchor_mode is one of the allowed strings, cross-check bid_ladder_size_weights sums to ~1.0), or keep it to the three? Either is fine; I want the ruling before writing the test set.
- Q2 — clob_switch_threshold_bps default — 3.0 confirmed? Atlas ruling §2C says 3.0. Current hardcoded is 3.0. I'll default the dataclass AND the loader fallback AND the YAML line to 3.0. Just pinning that nobody wants it changed under this branch.
- Q3 — Wiring-table doc location. Three options: (a) docs/config_wiring.md (new dir in repo), (b) inside neo_engine/config.py as a long module docstring, or (c) a workspace artifact [C] NEO Config Wiring Reference.md that is not checked into the repo — kept next to operating principles. I'd lean (c) since it's reference material for the team rather than something a code reader needs at the call site. Your call.
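
If Q1 is ruled in favor of the wider set, the two extra checks it mentions could look roughly like this. It is a sketch only: the allowed anchor_mode strings and the tolerance are placeholders (the real set lives in the engine and is not listed here), and extra_mismatches is a hypothetical helper name.

```python
import math
from typing import FrozenSet, List, Sequence

# PLACEHOLDER values: the actual allowed anchor modes are not enumerated in this memo.
EXAMPLE_ALLOWED_ANCHOR_MODES: FrozenSet[str] = frozenset({"capped_amm", "example_other_mode"})


def extra_mismatches(
    anchor_mode: str,
    bid_ladder_size_weights: Sequence[float],
    allowed_modes: FrozenSet[str] = EXAMPLE_ALLOWED_ANCHOR_MODES,
    tolerance: float = 1e-6,  # assumed meaning of "~1.0"
) -> List[str]:
    """Return one message per violated optional invariant."""
    problems = []
    if anchor_mode not in allowed_modes:
        problems.append(f"anchor_mode={anchor_mode!r} not in {sorted(allowed_modes)}")
    total = sum(bid_ladder_size_weights)
    # Compare with an absolute tolerance so float rounding does not trip the check.
    if not math.isclose(total, 1.0, rel_tol=0.0, abs_tol=tolerance):
        problems.append(f"bid_ladder_size_weights sum to {total}, expected ~1.0")
    return problems


ok = extra_mismatches("capped_amm", [0.5, 0.3, 0.2])
bad_mode = extra_mismatches("typo_mode", [0.5, 0.3, 0.2])
bad_weights = extra_mismatches("capped_amm", [0.5, 0.3])
```

Both checks return messages in the same shape as the three-invariant minimum, so they would slot into the same config_mismatch halt path without changing it.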
Commit 1 is behavior-neutral at the default; Commit 2 is what could actually halt an engine that was previously starting. The three-invariant minimum in Commit 2 is my recommendation, but it's your call.
Standing by for Q1–Q3.
— Orion