Orion Delivery — feat/flag-044-recovery-cooldown

Vesper — branch complete per Atlas's 2026-04-21 FLAG-044 ruling and your tasking memo. FLAG-042's per-episode cap is fully replaced by the per-source cool-down + session episode cap. Ready for review + merge.

Branch

feat/flag-044-recovery-cooldown (off main including fix/startup-mode-reset + feat/flag-042-degraded-recovery), 4 commits, +1318 / −194.

Commits

# Hash Subject
C1 4e6d681 feat(config): FLAG-044 C1 — cool-down + session episode cap (Atlas 2026-04-21)
C2 b557f23 feat(engine): FLAG-044 C2 — cool-down taxonomy, state keys, entry-side logic
C3 f0a6d79 feat(main_loop): FLAG-044 C3 — cool-down suppression at Step 8.4
C4 c926cb7 test(recovery): FLAG-044 C4 — 26 tests for cool-down + session cap

This deviates from the 5-unit commit plan in the tasking by folding units (1) "drop attempts + recovery_exhausted_halt" and (3) "add per-source episode counting + degraded_episode_limit_halt" into C2: both touch the same taxonomy block and the same entry-side escalation site in _enter_degraded_mode, and splitting them would have produced a temporarily incoherent intermediate commit. Cool-down tracking (unit 2), suppression (unit 4), and the startup-reset extension (unit 5) remain separable; unit 5 lives in C2 next to the other engine_state key init so that all new keys are registered together.

Tests

  • New: 26 tests in tests/test_flag_044_recovery_cooldown.py, all green. Organized in 5 parts:
  • Part A (3) — fail-safe engine_state I/O: _read_int_engine_state returns 0 on missing key, 0 on parse failure (non-numeric string); _write_int_engine_state round-trips an integer.
  • Part B (5) — counter + cool-down helpers: _bump_episode_count increments and persists; _arm_recovery_cooldown writes the configured tick count; _decrement_recovery_cooldown does the right thing at >0, =0, and already-at-0 (no underflow).
  • Part C (6) — entry-side: fresh-entry increments episode_count but does not arm cool-down; re-entry arms cool-down; exceeding max_degraded_episodes_per_source_per_session escalates to HALT with degraded_episode_limit_halt; cap is >= (Nth entry halts); uncapped sources (wallet_truth, reconciler) never arm cool-down and never hit the cap; already_degraded=True short-circuits (no double-count).
  • Part D (9) — recovery-side suppression: anchor / drift / corridor each tested for the decrement-then-check semantics — armed to N → N−1 real suppressions (a tick that reads remaining=0 proceeds to the reason + stability logic); recovery_enabled=False short-circuits before touching the cool-down; uncapped sources are never suppressed; debug log recovery_suppressed_by_cooldown fires only while remaining > 0.
  • Part E (3) — end-to-end lifecycle (anchor, drift, corridor): enter → stability met → exit → re-enter → cool-down armed to 3 → tick 1 suppressed (3→2) → tick 2 suppressed (2→1) → tick 3 proceeds (1→0) → continued advance on stable conditions → 3rd entry hits cap → _escalate_degraded_to_halt(HALT_REASON_DEGRADED_EPISODE_LIMIT).
  • FLAG-042 docstring + mock refresh (in C4): three stale docstrings updated to point at the FLAG-044 supersession, and _decrement_recovery_cooldown stubbed to MagicMock(return_value=0) on the three helper fixtures (_anchor_engine, _drift_engine, _corridor_engine) so the existing FLAG-042 stability tests aren't tripped by MagicMock(spec=NEOEngine) auto-stubbing the new method as a comparable-unfriendly mock.
  • Regression (targeted guard / recovery / flag suite): 165/165 green across test_anchor_saturation_guard, test_directional_drift_guard, test_inventory_corridor_guard, test_reconciler_conservative, test_reconciler_anomaly_log, test_flag_036_wallet_truth_reconciliation, test_flag_033_startup_integrity, test_halt_reason_lifecycle, test_flag_042_degraded_recovery, and test_flag_044_recovery_cooldown (plus 2 subtests).
  • Regression (full suite, sandbox): no regressions vs. true baseline (stash -u): current 337 failed / 620 passed vs. baseline 348 failed / 583 passed — delta +37 passing / −11 failures fully explained by the 26 new FLAG-044 tests + 11 FLAG-042 tests that the mock-stub refresh in C4 unblocks. Residual failures are the pre-existing test_xrpl_gateway.py environment issue (41 failures on baseline as well); identical to what you saw on feat/flag-042-degraded-recovery.

Run commands (sandbox reproducible):

# Minimum: new tests + the FLAG-042 suite that C4 refreshes
python -m pytest tests/test_flag_044_recovery_cooldown.py tests/test_flag_042_degraded_recovery.py -q

# Targeted guard/recovery/flag suite
python -m pytest tests/test_anchor_saturation_guard.py tests/test_directional_drift_guard.py \
  tests/test_inventory_corridor_guard.py tests/test_reconciler_conservative.py \
  tests/test_reconciler_anomaly_log.py tests/test_flag_036_wallet_truth_reconciliation.py \
  tests/test_flag_033_startup_integrity.py tests/test_halt_reason_lifecycle.py \
  tests/test_flag_042_degraded_recovery.py tests/test_flag_044_recovery_cooldown.py -q

Spec compliance — Atlas 2026-04-21

Replacement semantics (summary): FLAG-042's per-episode cap + recovery_exhausted_halt → per-source cool-down + max_degraded_episodes_per_source_per_session + degraded_episode_limit_halt. DEGRADED behavior, recovery thresholds, hysteresis, drift condition-C exclusion, wallet-truth uncapped path, and session duration limit are all unchanged.

DegradedRecoveryConfig (neo_engine/config.py):

recovery_cooldown_ticks: int = 120                           (~8 min at 4s/tick)
max_degraded_episodes_per_source_per_session: int = 3
The validator rejects either field < 1 with a message steering operators to recovery_enabled=false when they actually want recovery off. max_recovery_attempts_per_episode is dropped entirely.
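A minimal sketch of the config shape described above (field names and defaults from this memo; the plain-dataclass machinery is illustrative, not the project's actual config framework):

```python
from dataclasses import dataclass

@dataclass
class DegradedRecoveryConfig:
    # Defaults per the FLAG-044 spec summary above.
    recovery_cooldown_ticks: int = 120   # ~8 min at 4 s/tick
    max_degraded_episodes_per_source_per_session: int = 3

    def __post_init__(self) -> None:
        # Both fields must be >= 1; operators who want recovery off
        # should set recovery_enabled=false instead of zeroing these.
        for name in ("recovery_cooldown_ticks",
                     "max_degraded_episodes_per_source_per_session"):
            if getattr(self, name) < 1:
                raise ValueError(
                    f"{name} must be >= 1; set recovery_enabled=false "
                    "to disable recovery instead")
```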

Taxonomy (neo_engine/main_loop.py):

HALT_REASON_DEGRADED_EPISODE_LIMIT = "degraded_episode_limit_halt"
RECOVERY_CAPPED_SOURCES = (SOURCE_ANCHOR, SOURCE_DRIFT, SOURCE_CORRIDOR)
HALT_REASON_RECOVERY_EXHAUSTED + every recovery_exhausted_halt emission removed. Wallet-truth / reconciler explicitly excluded from RECOVERY_CAPPED_SOURCES — uncapped, per the tasking directive.

Per-source engine_state keys (one set per source in RECOVERY_CAPPED_SOURCES):

degraded_recovery.<source>.cooldown_ticks_remaining
degraded_recovery.<source>.episode_count

Entry-side logic (_enter_degraded_mode):

  1. Short-circuit if already_degraded — no double-count, no cool-down arming.
  2. For capped sources: new_count = _bump_episode_count(source).
  3. If new_count >= max_degraded_episodes_per_source_per_session → _escalate_degraded_to_halt(HALT_REASON_DEGRADED_EPISODE_LIMIT) (Nth entry halts).
  4. Else if new_count > 1 (i.e. this is a re-entry, not a first entry) → _arm_recovery_cooldown(source, recovery_cooldown_ticks).
  5. A first entry leaves the cool-down at 0 — recovery is allowed to run immediately when conditions clear.
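The entry-side steps can be sketched as a standalone model (a plain dict stands in for engine_state, the halt hook is a stub, and constants replace the config lookups — the real logic lives on the engine as methods):

```python
RECOVERY_CAPPED_SOURCES = ("anchor", "drift", "corridor")
MAX_EPISODES = 3     # max_degraded_episodes_per_source_per_session
COOLDOWN_TICKS = 120  # recovery_cooldown_ticks

def enter_degraded_mode(state, source, already_degraded, on_halt):
    """Illustrative model of the entry-side cap + cool-down arming."""
    if already_degraded:                        # step 1: no double-count
        return
    if source not in RECOVERY_CAPPED_SOURCES:   # wallet_truth / reconciler: uncapped
        return
    key = f"degraded_recovery.{source}.episode_count"
    new_count = int(state.get(key, "0")) + 1    # step 2: bump + persist
    state[key] = str(new_count)
    if new_count >= MAX_EPISODES:               # step 3: Nth entry halts
        on_halt("degraded_episode_limit_halt")
    elif new_count > 1:                         # step 4: re-entry arms cool-down
        state[f"degraded_recovery.{source}.cooldown_ticks_remaining"] = str(COOLDOWN_TICKS)
    # step 5: a first entry leaves the cool-down at 0
```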

Recovery-side suppression (Step 8.4, each evaluator): After the mode != DEGRADED and reason-prefix checks, before the stability / hysteresis window logic:

remaining = self._decrement_recovery_cooldown(source)
if remaining > 0:
    log.debug("[<TAG>] recovery_suppressed_by_cooldown",
              extra={"source": source, "cooldown_ticks_remaining": remaining})
    return
Decrement-then-check semantics: the tick that reads remaining=0 proceeds to the stability check (allowing recovery to resume on the next stable window). Armed-to-N gives N−1 real suppressions — this is intentional and covered in Part D + Part E tests.
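The decrement helper in isolation (a dict-backed sketch under the same assumptions as above; the real helper is an engine method reading engine_state):

```python
def decrement_recovery_cooldown(state, source):
    """Decrement the per-source cool-down, clamped at 0 (no underflow),
    and return the remaining count that the suppression check reads."""
    key = f"degraded_recovery.{source}.cooldown_ticks_remaining"
    remaining = max(int(state.get(key, "0")) - 1, 0)
    state[key] = str(remaining)
    return remaining
```

Armed to 3, successive calls return 2, 1, 0: two suppressed ticks, then the third proceeds — the N−1 semantics described above.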

Startup reset extension (alongside the FLAG-041 / FLAG-042 clears):

degraded_recovery.<source>.cooldown_ticks_remaining  -> "0"
degraded_recovery.<source>.episode_count             -> "0"
For each source in RECOVERY_CAPPED_SOURCES.
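As a sketch, the reset extension is a simple clear loop over the capped sources (dict-backed stand-in for engine_state; in the real engine this runs alongside the existing FLAG-041 / FLAG-042 startup clears):

```python
RECOVERY_CAPPED_SOURCES = ("anchor", "drift", "corridor")

def reset_flag_044_state(state):
    """Zero both FLAG-044 keys for every capped source at session startup."""
    for source in RECOVERY_CAPPED_SOURCES:
        state[f"degraded_recovery.{source}.cooldown_ticks_remaining"] = "0"
        state[f"degraded_recovery.{source}.episode_count"] = "0"
```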

Fail-safe integer I/O: _read_int_engine_state returns 0 on missing key AND on parse failure (non-numeric string, DB error surfaced by get_engine_state). Matches the FLAG-042 "treat unknown counter state as first attempt" posture — a DB blip never spuriously halts the engine.
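A sketch of that fail-safe posture, with a caller-supplied getter standing in for the engine's state accessor (hypothetical signature — the real method lives on the engine):

```python
def read_int_engine_state(get_engine_state, key):
    """Return the stored counter as an int, treating any failure as 0.

    A missing key, a non-numeric value, or a storage error surfaced by the
    getter all read as 0 — the FLAG-042 "unknown counter state is a first
    attempt" posture, so a DB blip can never spuriously halt the engine.
    """
    try:
        raw = get_engine_state(key)
    except Exception:
        return 0  # storage error surfaced by the getter
    if raw is None:
        return 0  # missing key
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 0  # non-numeric string
```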

Deviation — cooldown_ticks_remaining countdown vs. cooldown_until_tick absolute

The tasking specified absolute-tick storage (cooldown_until_tick = current_tick + recovery_cooldown_ticks, compared against current_tick each evaluator call). The delivered implementation stores a remaining-ticks countdown (cooldown_ticks_remaining), decremented in Step 8.4 before the guard checks. Katja green-lit this model mid-implementation (Option B); flagging it explicitly so it gets a look during review.

Why the countdown wins, operationally:

  • Does not depend on an engine-wide monotonic tick counter. The engine_state key space has no authoritative current_tick — using absolute ticks would require persisting that separately and guaranteeing it's monotonic across restarts, which opens a new failure mode (clock roll / DB edit / crash between tick increment and cool-down write).
  • Natural test ergonomics — tests assert countdown values directly instead of synthesizing a moving current_tick mock.
  • Same "fixed-length waiting window, not sliding" semantics as the tasking required. Cool-down is armed once (on re-entry) and only ever decrements to 0; nothing ever extends it.
  • Equivalent observability: the DEBUG log emits cooldown_ticks_remaining — operationally more useful than cooldown_until_tick (operators do not want to mentally subtract tick counters; they want "how much longer").

Test-level coverage of the "not sliding" invariant (tasking test #8): the arming path only writes cool-down when entering DEGRADED (not while stuck in DEGRADED), and Part D includes test_cooldown_not_extended_while_degraded_persists — a repeated call inside Step 8.4 only decrements, never resets.

Execution ordering (unchanged from FLAG-042)

Step 8.4 order is the same as FLAG-042:

Step 8.4  — DEGRADED recovery evaluators (anchor → drift → corridor)
            — each now begins with cool-down decrement + suppression
Step 8.5  — Anchor saturation guard
Step 8.5b — Directional drift guard
Step 8.5c — Inventory corridor guard

Suppression happens before the reason / stability checks — a suppressed tick is a genuine no-op for that evaluator. If a tick exits DEGRADED at 8.4 and 8.5–8.5c immediately re-trigger, that is a fresh DEGRADED entry for the source: episode counter increments, and if new_count > 1 the cool-down is armed. If new_count >= max_degraded_episodes_per_source_per_session, the engine halts via degraded_episode_limit_halt.

Backward compatibility

  • recovery_enabled=false on any guard still restores pre-FLAG-042 one-way behavior (guard fires, engine stays DEGRADED until restart). Cool-down / episode-count paths are skipped via the top-level recovery_enabled gate in each evaluator.
  • No schema migrations — all state is engine_state K/V with string values.
  • FLAG-042 guard tests (test_anchor_saturation_guard, test_directional_drift_guard, test_inventory_corridor_guard, test_reconciler_conservative) unchanged; they keep their source= kwargs added during FLAG-042.
  • Any operator who had max_recovery_attempts_per_episode hand-set in a YAML override will hit a config validator error on load — intentional; the field is dropped. The new defaults (recovery_cooldown_ticks: 120, max_degraded_episodes_per_source_per_session: 3) are written into all three checked-in configs (config.yaml, config.example.yaml, config_live_stage1.yaml).

Files touched

config/config.example.yaml                   |  38 ++   (degraded_recovery block swap)
config/config.yaml                           |  38 ++   (same)
config/config_live_stage1.yaml               |  26 ++   (same — live stage 1)
neo_engine/config.py                         | 146 ++   (DegradedRecoveryConfig + validator + loader)
neo_engine/main_loop.py                      | 411 ++   (taxonomy swap, state keys, _bump_episode_count,
                                                          _arm_recovery_cooldown, _decrement_recovery_cooldown,
                                                          entry-side cap + arm, Step 8.4 suppression in 3 evaluators,
                                                          startup reset extension)
tests/test_flag_042_degraded_recovery.py     |  22 ++   (docstring refresh + mock stub)
tests/test_flag_044_recovery_cooldown.py     | 831 ++   (new — 26 tests)

Operator impact

  • Healthy sessions (no DEGRADED): zero observable change. All new code paths gated on mode == DEGRADED and recovery_enabled=True.
  • First DEGRADED entry from a source: episode_count=1, cool-down NOT armed (cool-down only arms on re-entry). Recovery can proceed immediately when conditions stabilize — matches FLAG-042 first-episode semantics.
  • Re-entry after a recovery (same source, same session): episode_count=2, cool-down armed to recovery_cooldown_ticks (default 120 ≈ 8 min at 4s/tick). Engine stays idle in DEGRADED for the waiting window; no quoting, truth checks continue. DEBUG log recovery_suppressed_by_cooldown each tick with a decrementing count. After cool-down expires, normal recovery evaluator runs — if the regime has cleared, the engine exits DEGRADED; if not, it stays DEGRADED until stability conditions hold.
  • Third entry from the same source in one session (i.e. the second re-entry): episode_count=3 hits max_degraded_episodes_per_source_per_session=3 (cap semantics: >=). The engine halts immediately with halt_reason=degraded_episode_limit_halt and source-tagged detail, matching the inventory_truth_halt escalation contract.
  • Episode cap is per-source: anchor hitting 3 never affects drift or corridor counters — per the tasking directive and confirmed in Part C test test_episode_cap_is_per_source.
  • Per-session reset: all three (cooldown_ticks_remaining, episode_count, and mode/degraded_since/degraded_reason) clear on fresh session startup via the existing reset block.

What this buys in the hostile-regime case that killed S45: the engine can now wait 8 minutes for a hostile regime to clear without terminating. S45-style scenarios (guard fires, recovers, re-fires) go: 1st entry → recover → 2nd entry (cool-down armed, wait 120 ticks) → recover or stay waiting → 3rd entry → HALT. Atlas's principle — "A bad regime is not, by itself, a reason to terminate the session" — is what the new defaults encode.
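That lifecycle can be traced end-to-end with a small self-contained model (dict-backed state, hypothetical short key names, cap of 3, and the cool-down shortened from 120 to 3 ticks so the trace stays readable):

```python
CAP = 3       # max_degraded_episodes_per_source_per_session
COOLDOWN = 3  # recovery_cooldown_ticks (shortened from 120 for the trace)

def enter(state, source):
    """Entry side: bump the episode count, halt at the cap, arm on re-entry."""
    count = int(state.get(f"{source}.episode_count", "0")) + 1
    state[f"{source}.episode_count"] = str(count)
    if count >= CAP:
        return "degraded_episode_limit_halt"
    if count > 1:
        state[f"{source}.cooldown_ticks_remaining"] = str(COOLDOWN)
    return None

def tick_suppressed(state, source):
    """Recovery side: decrement-then-check; True while the tick is suppressed."""
    remaining = max(int(state.get(f"{source}.cooldown_ticks_remaining", "0")) - 1, 0)
    state[f"{source}.cooldown_ticks_remaining"] = str(remaining)
    return remaining > 0

state, trace = {}, []
trace.append(enter(state, "anchor"))   # 1st entry: no cool-down armed
trace.append(enter(state, "anchor"))   # 2nd entry: cool-down armed to 3
trace += [tick_suppressed(state, "anchor") for _ in range(3)]  # 2 suppressed, then proceed
trace.append(enter(state, "anchor"))   # 3rd entry: cap hit, halt
# trace == [None, None, True, True, False, "degraded_episode_limit_halt"]
```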

Apply instructions (Windows / PowerShell)

Patches live at 02 Projects/NEO Trading Engine/08 Patches/patches-flag-044-recovery-cooldown/ (4 files, 0001 → 0004). From Katja's VS Code terminal:

cd C:\Users\Katja\Documents\NEO GitHub\neo-2026
git checkout main
git pull

# Defensive: clear any pre-existing branch from a prior attempt.
git branch -D feat/flag-044-recovery-cooldown 2>$null

git checkout -b feat/flag-044-recovery-cooldown

Get-ChildItem "C:\Users\Katja\Documents\Claude Homebase Neo\02 Projects\NEO Trading Engine\08 Patches\patches-flag-044-recovery-cooldown" -Filter "*.patch" |
    Sort-Object Name |
    ForEach-Object { git am $_.FullName }

# Verify
git log --oneline main..HEAD
# Expected (topmost 4):
#   c926cb7 test(recovery): FLAG-044 C4 — 26 tests for cool-down + session cap
#   f0a6d79 feat(main_loop): FLAG-044 C3 — cool-down suppression at Step 8.4
#   b557f23 feat(engine): FLAG-044 C2 — cool-down taxonomy, state keys, entry-side logic
#   4e6d681 feat(config): FLAG-044 C1 — cool-down + session episode cap (Atlas 2026-04-21)

# Regression
python -m pytest tests/test_flag_044_recovery_cooldown.py tests/test_flag_042_degraded_recovery.py -q
# Expected: 42 passed, 2 subtests passed

python -m pytest tests/test_anchor_saturation_guard.py tests/test_directional_drift_guard.py `
  tests/test_inventory_corridor_guard.py tests/test_reconciler_conservative.py `
  tests/test_reconciler_anomaly_log.py tests/test_flag_036_wallet_truth_reconciliation.py `
  tests/test_flag_033_startup_integrity.py tests/test_halt_reason_lifecycle.py `
  tests/test_flag_042_degraded_recovery.py tests/test_flag_044_recovery_cooldown.py -q
# Expected: 165 passed (+ 2 subtests)

Prerequisite: feat/flag-042-degraded-recovery must already be on main (merged Apr 21 per CLAUDE.md, commit 9639b18). If main is behind that merge, apply the FLAG-042 bundle first — FLAG-044 rewrites the FLAG-042 taxonomy / state / entry-logic block and the C2 hunk will not apply cleanly to pre-FLAG-042 main.

Post-merge operator note

Once this lands, the recommended live session command is --duration-seconds 7200 (2-hour sessions) per CLAUDE.md — long enough for the cool-down mechanism to actually be exercised in a hostile regime without butting up against the duration limit. Pre-session realignment (tools/realign_inventory_to_onchain.py) standing procedure is unchanged.

Status

C1–C4 complete. Branch is clean. One documented deviation (countdown vs. absolute-tick storage, Katja green-lit) flagged above. All tasking test slots covered + 18 additional tests in Parts A/B/D. Awaiting your review.

— Orion