Orion Delivery: feat/flag-042-degraded-recovery
Vesper — branch complete and all green per Atlas's 2026-04-21 locked recovery spec and your pre-code rulings. Ready for review + merge.
Branch
feat/flag-042-degraded-recovery (off main including fix/startup-mode-reset), 5 commits, +1391 / −45.
Commits
| # | Hash | Subject |
|---|---|---|
| C1 | e209370 | feat(config): add FLAG-042 recovery config schema (anchor + drift + corridor + episode cap) |
| C2 | 00b2898 | feat(main_loop): FLAG-042 infrastructure — recovery counter + cap + halt taxonomy + startup reset |
| C3 | 4bf8105 | feat(main_loop): FLAG-042 C3 — anchor saturation recovery evaluator + Step 8.4 wiring |
| C4 | c38dd41 | feat(main_loop): FLAG-042 C4 — drift + corridor recovery evaluators + Step 8.4 wiring |
| C5 | 2d380a4 | test(recovery): FLAG-042 C5 — 16 tests for DEGRADED recovery evaluators |
Tests
- New: 16 tests in tests/test_flag_042_degraded_recovery.py, all green.
  - Part A (6), anchor recovery no-op gates: enabled=False, recovery_enabled=False, not DEGRADED, wrong-source reason (directional_drift_guard_*), window not full, get_engine_state raises.
  - Part B (4), anchor recovery hysteresis: bias excursion resets counter, prevalence excursion resets counter, both conditions clear advances counter, advance-then-excursion resets to 0.
  - Part C (2), anchor recovery exit: mode transition + flag clear + counter reset; [ANCHOR_SAT] exit WARNING log.
  - Part D (2), drift recovery: exit on stability (deques cleared, ticks-since-opposing zeroed, last-side/fills-seen cleared, one-shot flag cleared, watermark preserved); burst excursion resets counter.
  - Part E (2), corridor recovery: exit on stability (both conditions hold, _corridor_ticks_outside reset, flag cleared); mid_price=0 safe reset.
- Regression: 158/158 green across guard + truth + reason + config + recovery suites.
Run command (sandbox reproducible):
python -m pytest tests/test_anchor_saturation_guard.py tests/test_directional_drift_guard.py \
tests/test_inventory_corridor_guard.py tests/test_reconciler_conservative.py \
tests/test_flag_036_wallet_truth_reconciliation.py tests/test_halt_reason_lifecycle.py \
tests/test_reconciler_anomaly_log.py tests/test_config.py tests/test_config_invariants.py \
tests/test_flag_042_degraded_recovery.py -q
Spec compliance: Atlas 2026-04-21
Anchor recovery (hysteresis):
- exit bias threshold: recovery_exit_bias_threshold_bps=4.0 (entry 7.0)
- exit prevalence threshold: recovery_exit_prevalence_pct=30.0 (entry 40.0)
- stability window: recovery_stability_ticks=30 (consecutive)

anchor_saturation: if abs(mean(window)) < 4 bps AND the share of samples with |x| > 5 bps is below 30%, increment the counter. Any excursion past either threshold resets it to 0. When the counter reaches 30, call _exit_degraded_mode(), reset the counter, and clear _anchor_guard_triggered_this_session so the entry evaluator can re-fire.
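The hysteresis counter can be sketched as below. This sketch assumes the evaluator receives the anchor-bias window as a list of bps samples, and that a sample counts toward prevalence when |x| > 5 bps. The AnchorRecoveryConfig dataclass and the helpers anchor_exit_conditions_hold and step_anchor_recovery are hypothetical. Only the threshold names and values come from the spec. The no-op gates (enabled, recovery_enabled, DEGRADED reason, window full) are assumed to run before this.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Sequence, Tuple


@dataclass(frozen=True)
class AnchorRecoveryConfig:
    # Values from the locked spec; this container itself is illustrative.
    recovery_exit_bias_threshold_bps: float = 4.0
    recovery_exit_prevalence_pct: float = 30.0
    recovery_stability_ticks: int = 30
    prevalence_band_bps: float = 5.0  # assumption: |x| > 5 bps counts toward prevalence


def anchor_exit_conditions_hold(window: Sequence[float], cfg: AnchorRecoveryConfig) -> bool:
    """Both exit conditions must hold: low mean bias AND low prevalence of large samples."""
    if not window:
        return False  # nothing to evaluate; treat as not yet stable
    bias_ok = abs(mean(window)) < cfg.recovery_exit_bias_threshold_bps
    large = sum(1 for x in window if abs(x) > cfg.prevalence_band_bps)
    prevalence_pct = 100.0 * large / len(window)
    prevalence_ok = prevalence_pct < cfg.recovery_exit_prevalence_pct
    return bias_ok and prevalence_ok


def step_anchor_recovery(counter: int, window: Sequence[float],
                         cfg: AnchorRecoveryConfig) -> Tuple[int, bool]:
    """Return (new_counter, should_exit) for one DEGRADED tick."""
    if not anchor_exit_conditions_hold(window, cfg):
        return 0, False  # any excursion past either threshold resets the streak
    counter += 1
    if counter >= cfg.recovery_stability_ticks:
        # The caller would run _exit_degraded_mode() here and clear
        # _anchor_guard_triggered_this_session.
        return 0, True
    return counter, False
```

The gap between entry (7.0 bps / 40%) and exit (4.0 bps / 30%) is the dead band. For example, a window whose mean bias sits near 5 bps neither meets the entry condition nor advances the recovery streak, which keeps the mode from flapping.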
Drift recovery (minimal — no hysteresis):
On a DEGRADED tick whose reason starts with directional_drift_guard, evaluate condition A (burst within the live burst_window_seconds) and condition B (net notional within the live net_notional_window_seconds). Condition C is deliberately excluded. No new fills arrive during DEGRADED, so _drift_ticks_since_opposing_fill grows monotonically, and including C would permanently latch the guard. A and B are time-bounded and correctly decay to "no active flow." On exit:
- the burst deque and net-notional deque are cleared
- ticks-since-opposing is zeroed
- last-side is cleared
- fills-seen is zeroed

The watermark is preserved, because resetting it would replay every session fill on the next tick.
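The A/B evaluation and the exit reset can be sketched as below. This assumes each rolling window is a deque of timestamped entries (in seconds) pruned against the live window length on every tick. The thresholds burst_max_fills and net_notional_max, the DriftState container, and both helper names are hypothetical stand-ins, since the report does not say how the guard parameterizes A and B. The sketch shows why C is excluded: A and B empty out once fills stop, whereas a ticks-since-opposing-fill counter only grows.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class DriftState:
    # Illustrative container; names echo the report but are not the engine's fields.
    burst_fills: Deque[float] = field(default_factory=deque)                 # fill timestamps
    net_notional: Deque[Tuple[float, float]] = field(default_factory=deque)  # (ts, signed notional)
    ticks_since_opposing_fill: int = 0
    last_side: Optional[str] = None
    fills_seen: int = 0
    fill_watermark: int = 0  # last fill id already processed; never reset on exit


def drift_exit_conditions_hold(state: DriftState, now: float,
                               burst_window_seconds: float,
                               net_notional_window_seconds: float,
                               burst_max_fills: int, net_notional_max: float) -> bool:
    # Drop entries that have aged out of each live window.
    while state.burst_fills and now - state.burst_fills[0] > burst_window_seconds:
        state.burst_fills.popleft()
    while state.net_notional and now - state.net_notional[0][0] > net_notional_window_seconds:
        state.net_notional.popleft()
    cond_a_clear = len(state.burst_fills) < burst_max_fills
    cond_b_clear = abs(sum(n for _, n in state.net_notional)) < net_notional_max
    # Condition C (ticks_since_opposing_fill) is deliberately not consulted.
    return cond_a_clear and cond_b_clear


def reset_drift_state_on_exit(state: DriftState) -> None:
    state.burst_fills.clear()
    state.net_notional.clear()
    state.ticks_since_opposing_fill = 0
    state.last_side = None
    state.fills_seen = 0
    # fill_watermark is preserved: resetting it would replay every session fill.
```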
Corridor recovery (no new stability parameter):
On a DEGRADED tick whose reason starts with inventory_corridor_guard, require BOTH rlusd >= min_rlusd_floor AND xrp_pct ∈ [min_xrp_pct, max_xrp_pct]. Safety cannot be confirmed if mid_price is missing or zero, or if the total portfolio is below min_portfolio_rlusd. In either case the counter resets. When the counter reaches corridor_lookback_ticks, exit DEGRADED, reset _corridor_ticks_outside, and clear the one-shot flag.
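The corridor safety check can be sketched as below. This assumes mid_price is quoted in RLUSD per XRP and that xrp_pct is the XRP share of total portfolio value, in percent. The function names and the plain-float balance arguments are hypothetical. The threshold names come from the corridor guard config listed above.

```python
from __future__ import annotations

from typing import Optional, Tuple


def corridor_exit_conditions_hold(rlusd: float, xrp: float, mid_price: Optional[float],
                                  min_rlusd_floor: float, min_xrp_pct: float,
                                  max_xrp_pct: float, min_portfolio_rlusd: float) -> bool:
    if not mid_price or mid_price <= 0:
        return False  # missing or zero price: safety cannot be confirmed
    xrp_value = xrp * mid_price  # assumption: mid_price is RLUSD per XRP
    total = rlusd + xrp_value
    if total < min_portfolio_rlusd or total <= 0:
        return False  # portfolio too small to judge allocation
    xrp_pct = 100.0 * xrp_value / total
    return rlusd >= min_rlusd_floor and min_xrp_pct <= xrp_pct <= max_xrp_pct


def step_corridor_recovery(counter: int, safe: bool,
                           corridor_lookback_ticks: int) -> Tuple[int, bool]:
    """Return (new_counter, should_exit); any unsafe tick resets the streak."""
    if not safe:
        return 0, False
    counter += 1
    if counter >= corridor_lookback_ticks:
        # The caller would exit DEGRADED, reset _corridor_ticks_outside,
        # and clear the one-shot flag.
        return 0, True
    return counter, False
```

Reusing corridor_lookback_ticks as the stability window is what lets corridor recovery avoid a new config parameter.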
Per-episode cap:
- max_recovery_attempts_per_episode=1
- RECOVERY_CAPPED_SOURCES = (anchor, drift, corridor)
- HALT_REASON_RECOVERY_EXHAUSTED = "recovery_exhausted_halt"

Attempts are tracked in engine_state[degraded_recovery.<source>.attempts]. On each DEGRADED entry for a capped source, the counter increments atomically. attempts > 1 (i.e. a second entry from the same source in one session) escalates to HALT with recovery_exhausted_halt. Wallet-truth and reconciler sources are uncapped and keep their existing refusal behavior. DB errors on counter reads are treated as "first attempt" to avoid a spurious HALT.
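The cap logic can be sketched as below. A plain mapping stands in for the engine_state K/V store, so this read-modify-write is not atomic like the real counter. The helper names record_degraded_entry and attempts_key are hypothetical. The source names, key pattern, halt token and detail format are taken from this report.

```python
from __future__ import annotations

from typing import MutableMapping, Optional, Tuple

RECOVERY_CAPPED_SOURCES = ("anchor", "drift", "corridor")
HALT_REASON_RECOVERY_EXHAUSTED = "recovery_exhausted_halt"


def attempts_key(source: str) -> str:
    return f"degraded_recovery.{source}.attempts"


def record_degraded_entry(engine_state: MutableMapping[str, str], source: str,
                          max_recovery_attempts_per_episode: int = 1
                          ) -> Optional[Tuple[str, str]]:
    """Count one DEGRADED entry; return (halt_reason, detail) if the cap is exceeded."""
    if source not in RECOVERY_CAPPED_SOURCES:
        return None  # wallet-truth / reconciler sources are uncapped
    key = attempts_key(source)
    try:
        prior = int(engine_state.get(key) or 0)  # "" after a startup reset reads as 0
    except Exception:
        prior = 0  # read failure: treat as first attempt to avoid a spurious HALT
    attempts = prior + 1
    engine_state[key] = str(attempts)
    if attempts > max_recovery_attempts_per_episode:
        return HALT_REASON_RECOVERY_EXHAUSTED, f"recovery_exhausted:{source}"
    return None
```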
Execution ordering
Step 8.4 (new) runs before the Step 8.5 / 8.5b / 8.5c guards on purpose:
- Step 8.4: DEGRADED recovery evaluators (anchor → drift → corridor)
- Step 8.5: Anchor saturation guard
- Step 8.5b: Directional drift guard
- Step 8.5c: Inventory corridor guard

A tick that exits DEGRADED at 8.4 can be immediately re-evaluated by the guards at 8.5–8.5c in the same tick. If the regime is still hostile, the guard trips again and the per-episode cap escalates to HALT with recovery_exhausted_halt. This is the mechanism for catching the "recovered just to re-enter" pathology in hostile regimes.
Each recovery evaluator is a no-op unless the DEGRADED reason, read from engine_state[KEY_DEGRADED_REASON], matches its source prefix. A single DEGRADED reason carries one source, so only one evaluator can exit DEGRADED on any given tick.
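The per-tick routing and ordering can be sketched as below. This assumes the DEGRADED reason is a plain string and that each recovery evaluator is a zero-argument callable returning True when it exits DEGRADED. The "DEGRADED"/"OK" mode literals, the helper names run_step_8_4 and run_tick, and the anchor prefix anchor_saturation_guard are hypothetical. The report names only the drift and corridor prefixes.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional, Sequence, Tuple

Evaluator = Callable[[], bool]


def run_step_8_4(mode: str, reason: str,
                 evaluators: Sequence[Tuple[str, Evaluator]]) -> Optional[str]:
    """Route to the single evaluator whose source prefix matches the DEGRADED reason."""
    if mode != "DEGRADED":
        return None  # healthy sessions short-circuit here
    for prefix, evaluate in evaluators:  # anchor -> drift -> corridor
        if reason.startswith(prefix):
            return prefix if evaluate() else None  # first match only; others never run
    return None  # uncapped sources (wallet truth, reconciler) have no evaluator


def run_tick(state: Dict[str, str], evaluators: Sequence[Tuple[str, Evaluator]],
             guards: List[Callable[[Dict[str, str]], None]]) -> Optional[str]:
    exited = run_step_8_4(state["mode"], state.get("reason", ""), evaluators)
    if exited:
        state["mode"], state["reason"] = "OK", ""
    for guard in guards:  # Steps 8.5 / 8.5b / 8.5c see the post-recovery mode
        guard(state)
    return exited
```

In the test below, a guard that re-trips in the same tick leaves the mode DEGRADED again. In the engine, that second entry is what the per-episode cap escalates to HALT.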
Startup state reset
The startup reset block added in fix/startup-mode-reset is extended in C2 to clear the three new recovery-attempt keys on fresh session start:
- degraded_recovery.anchor.attempts -> ""
- degraded_recovery.drift.attempts -> ""
- degraded_recovery.corridor.attempts -> ""

These sit alongside the existing inventory_truth.mode / degraded_since / degraded_reason clears. Confirmed with the 3 FLAG-041 follow-up tests in tests/test_halt_reason_lifecycle.py (green on this branch; no behavior change to those tests).
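A minimal sketch of the reset extension is below. A plain mapping stands in for engine_state, and the helper name reset_recovery_attempts_on_startup is hypothetical. The existing inventory_truth clears are left out.

```python
from typing import MutableMapping

RECOVERY_ATTEMPT_KEYS = (
    "degraded_recovery.anchor.attempts",
    "degraded_recovery.drift.attempts",
    "degraded_recovery.corridor.attempts",
)


def reset_recovery_attempts_on_startup(engine_state: MutableMapping[str, str]) -> None:
    """Blank the per-episode counters so a fresh session starts at attempt 0."""
    for key in RECOVERY_ATTEMPT_KEYS:
        engine_state[key] = ""  # empty string, matching the reset values listed above
```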
Backward compatibility
- _escalate_degraded_to_halt gained an optional halt_reason kwarg (default HALT_REASON_INVENTORY_TRUTH). Existing call sites are unchanged.
- recovery_enabled defaults to true on each guard. Setting any of them to false restores the pre-FLAG-042 one-way behavior (guard fires, engine stays DEGRADED until restart), with zero code path change elsewhere.
- No schema migrations: all state uses the existing engine_state K/V.
- Existing guard tests were updated only for the new source=... kwarg added to _enter_degraded_mode in C2. No semantic test changes.
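The two compatibility levers can be sketched with a toy engine. This sketch assumes HALT_REASON_INVENTORY_TRUTH holds "inventory_truth_halt", the token named under Operator impact. ToyEngine, maybe_recover, and the keyword-only detail/halt_reason signature are hypothetical, and the real method does more than record the reason.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple

HALT_REASON_INVENTORY_TRUTH = "inventory_truth_halt"  # assumed value
HALT_REASON_RECOVERY_EXHAUSTED = "recovery_exhausted_halt"


class ToyEngine:
    def __init__(self, recovery_enabled: bool = True) -> None:
        self.recovery_enabled = recovery_enabled  # per-guard flag; defaults to true
        self.mode = "DEGRADED"
        self.halt: Optional[Tuple[str, str]] = None

    def _escalate_degraded_to_halt(self, detail: str, *,
                                   halt_reason: str = HALT_REASON_INVENTORY_TRUTH) -> None:
        # Pre-FLAG-042 callers omit halt_reason and keep the old token.
        self.mode = "HALT"
        self.halt = (halt_reason, detail)

    def maybe_recover(self, exit_conditions_hold: Callable[[], bool]) -> bool:
        if not self.recovery_enabled:
            return False  # one-way DEGRADED, exactly as before FLAG-042
        if exit_conditions_hold():
            self.mode = "OK"
            return True
        return False
```

Defaulting the new kwarg keeps every existing call site byte-for-byte unchanged. Only the new cap path passes halt_reason explicitly.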
Files touched
config/config.example.yaml | 30 ++
config/config.yaml | 30 ++
config/config_live_stage1.yaml | 24 ++
neo_engine/config.py | 184 ++ (+validator + loader fields for all three guards
and DegradedRecoveryConfig)
neo_engine/main_loop.py | 599 ++ (recovery state fields, _escalate_* kwarg, cap
helpers, 3 recovery evaluators, Step 8.4 wiring,
startup reset extension, source kwargs on guard
_enter_degraded_mode calls)
tests/test_anchor_saturation_guard.py | 8 ++ (source="anchor" on 4 assertions)
tests/test_directional_drift_guard.py | 16 ++ (source="drift" on 7 assertions)
tests/test_inventory_corridor_guard.py | 14 ++ (source="corridor" on 7 assertions)
tests/test_reconciler_conservative.py | 8 ++ (source=SOURCE_RECONCILER source-level assertion)
tests/test_flag_042_degraded_recovery.py | 523 ++ (new)
Operator impact
- Healthy sessions (no DEGRADED): zero observable change. Recovery evaluators short-circuit on the mode != DEGRADED check.
- Single DEGRADED episode that recovers (new behavior):
  - one [<GUARD>] DEGRADED WARNING on entry
  - the counter in engine_state[degraded_recovery.<source>.attempts] increments
  - the stability window accumulates (30 ticks anchor / 10 ticks drift / 3 ticks corridor)
  - one "[<GUARD>] recovery conditions stable — exiting DEGRADED" WARNING on exit

  Mode returns to OK and the guard's one-shot flag is cleared, so re-entry is possible.
- Second entry from the same source in one session: immediate HALT with halt_reason=recovery_exhausted_halt and detail recovery_exhausted:<source>. This matches the existing inventory_truth_halt escalation contract.
- recovery_enabled=false on any guard: that source behaves exactly as before FLAG-042 (one-way into DEGRADED until restart). Useful for Phase 7.4 SR-AUDIT comparison runs.
Deviations from tasking
One: condition C is omitted from drift recovery. This is documented in both the C4 commit message and the _evaluate_drift_recovery docstring. During DEGRADED no new fills arrive, so _drift_ticks_since_opposing_fill grows monotonically and condition C would permanently latch the guard. Conditions A (burst) and B (net notional) are the correct recovery signals, because both use time-bounded rolling windows that decay to "no active flow." This matches the spirit of your ruling ("none of the drift conditions A/B/C would re-trigger") without introducing the pathology. Flagging for your review; happy to rework if you want a different interpretation.
No other deviations. Hysteresis thresholds, stability windows, episode cap, source taxonomy, halt token, startup reset, and Step 8.4 ordering all match the locked spec.
Apply instructions (Windows / PowerShell)
Patches live at 02 Projects/NEO Trading Engine/08 Patches/patches-flag-042-degraded-recovery/ (5 files, 0001 → 0005). From Katja's VS Code terminal:
cd C:\Users\Katja\Documents\NEO GitHub\neo-2026
git checkout main
git pull
# Defensive: clear any pre-existing branch from a prior attempt.
git branch -D feat/flag-042-degraded-recovery 2>$null
git checkout -b feat/flag-042-degraded-recovery
Get-ChildItem "C:\Users\Katja\Documents\Claude Homebase Neo\02 Projects\NEO Trading Engine\08 Patches\patches-flag-042-degraded-recovery" -Filter "*.patch" |
Sort-Object Name |
ForEach-Object { git am $_.FullName }
# Verify
git log --oneline main..HEAD
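# Expected (5 commits, newest first). git am records a new committer date,
# so the local hashes will differ from those below; match on the subjects.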
# 2d380a4 test(recovery): FLAG-042 C5 — 16 tests for DEGRADED recovery evaluators
# c38dd41 feat(main_loop): FLAG-042 C4 — drift + corridor recovery evaluators + Step 8.4 wiring
# 4bf8105 feat(main_loop): FLAG-042 C3 — anchor saturation recovery evaluator + Step 8.4 wiring
# 00b2898 feat(main_loop): FLAG-042 infrastructure — recovery counter + cap + halt taxonomy + startup reset
# e209370 feat(config): add FLAG-042 recovery config schema (anchor + drift + corridor + episode cap)
# Regression
python -m pytest tests/test_flag_042_degraded_recovery.py -v
# Expected: 16 passed
python -m pytest tests/test_anchor_saturation_guard.py tests/test_directional_drift_guard.py `
tests/test_inventory_corridor_guard.py tests/test_reconciler_conservative.py `
tests/test_flag_036_wallet_truth_reconciliation.py tests/test_halt_reason_lifecycle.py `
tests/test_reconciler_anomaly_log.py tests/test_config.py tests/test_config_invariants.py `
tests/test_flag_042_degraded_recovery.py -q
# Expected: 158 passed
Prerequisite: fix/startup-mode-reset must already be on main (per CLAUDE.md, it merged Apr 21). If main is behind that merge, apply the startup-mode-reset bundle first or the C2 startup reset block hunk will not apply cleanly.
Status
C1–C5 complete. Branch is clean and ready. Atlas's locked spec followed; one documented drift-condition-C deviation flagged above. Awaiting your review.
— Orion