Atlas Ruling — Replace recovery_exhausted_halt with Cool-Down Session Model¶
To: Vesper (she/her), Orion (he/him)
From: Atlas (he/him)
CC: Katja (Captain)
Date: 2026-04-21
Re: Spec question: idle session model (filed from NEO Desk escalations)
Final Ruling¶
Option A approved. Replace recovery_exhausted_halt with a cool-down model.
Diagnosis¶
The current FLAG-042 design solved oscillation, but solved it by converting a recoverable market-state problem into an operator restart problem. That is too rigid.
The one-recovery-attempt cap collapses two distinct cases into one outcome:
1. Borderline oscillation that should not flap endlessly
2. Persistently hostile regime that may normalize later in the same session
Both currently become: recover once → fail once → halt. That is too blunt.
Locked principle: A bad regime is not, by itself, a reason to terminate the session. A bad regime is a reason to stop trading and keep observing.
Termination should happen because:
- Session duration elapsed
- Truth/integrity failed
- A hard safety boundary was crossed
- Repeated recovery attempts indicate a deeper control failure
Not merely because the market stayed bad for a few minutes.
Approved Model¶
Replace:
- Per-episode attempts counter
- recovery_exhausted_halt
With:
- Per-source cool-down timer
- Repeated recovery opportunities after cool-down expires
- Session duration as the outer safety wall
This preserves anti-oscillation protection, ability to idle through hostile conditions, and ability to re-enter when the regime clears.
Configuration Ruling¶
recovery_cooldown_ticks = 120 is approved as the default. At the current cadence (~4 s per tick), that is roughly 8 minutes: long enough to avoid tick-to-tick thrashing, short enough to permit real intra-session recovery in a 2–4 hour run. Make it config-driven and tunable.
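To make the tick-to-time arithmetic concrete, here is a minimal sketch of how the two config parameters named in this ruling could be carried. The `RecoveryConfig` class and the `tick_seconds` field are assumptions for illustration, not the existing config layout. The defaults are the ones stated in the Bottom Line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryConfig:
    # Parameter names and defaults come from this ruling.
    recovery_cooldown_ticks: int = 120
    max_degraded_episodes_per_source_per_session: int = 3
    # Assumption: nominal tick cadence in seconds, used only for reporting.
    tick_seconds: float = 4.0

    def __post_init__(self) -> None:
        # Reject values that would disable the cool-down or the episode cap.
        if self.recovery_cooldown_ticks <= 0:
            raise ValueError("recovery_cooldown_ticks must be positive")
        if self.max_degraded_episodes_per_source_per_session <= 0:
            raise ValueError("episode cap must be positive")

    @property
    def cooldown_seconds(self) -> float:
        # Wall-clock length of one cool-down window at the nominal cadence.
        return self.recovery_cooldown_ticks * self.tick_seconds


cfg = RecoveryConfig()
print(f"cool-down: {cfg.recovery_cooldown_ticks} ticks, about {cfg.cooldown_seconds / 60:.0f} min")
```

Because the cadence is only approximate, the wall-clock figure is best treated as a reporting aid. The tick count is the value the guard logic would actually compare against.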
Required Safety Constraints¶
1. Cool-down is per guard source¶
Track independently for anchor saturation, directional drift, and inventory corridor. A failed anchor recovery should not block a truth-layer recovery path, and vice versa.
2. DEGRADED is the only state during cool-down¶
During cool-down: no quoting, no order placement, continue observation and truth checks, log that recovery is suppressed. This is an idle state, not a soft-active state.
3. Cool-down does not reset on every hostile tick¶
The timer is set when a recovery fails and the guard re-fires. It is NOT extended continuously because the hostile condition persists. Reason: otherwise you create an unbounded sliding lockout. Discrete waiting windows, not a moving prison.
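Constraints 1 and 3 can be sketched together as a small per-source tracker. This is an illustration under assumptions: the `GuardSource` enum, the `CooldownTracker` class and its method names are hypothetical. Only `cooldown_until_tick`, `recovery_cooldown_ticks` and `recovery_suppressed_by_cooldown` are names from this ruling.

```python
from enum import Enum


class GuardSource(Enum):
    # The three guard sources listed in constraint 1.
    ANCHOR_SATURATION = "anchor_saturation"
    DIRECTIONAL_DRIFT = "directional_drift"
    INVENTORY_CORRIDOR = "inventory_corridor"


class CooldownTracker:
    def __init__(self, recovery_cooldown_ticks: int = 120) -> None:
        self.recovery_cooldown_ticks = recovery_cooldown_ticks
        # One independent deadline per guard source.
        self.cooldown_until_tick = {}

    def on_recovery_failed(self, source: GuardSource, tick: int) -> None:
        # Called only when a recovery fails and the guard re-fires:
        # this is the single place where the timer is set.
        self.cooldown_until_tick[source] = tick + self.recovery_cooldown_ticks

    def on_hostile_tick(self, source: GuardSource, tick: int) -> None:
        # Deliberately does not touch the timer. Extending it here would
        # turn discrete waiting windows into a sliding lockout.
        return None

    def recovery_suppressed_by_cooldown(self, source: GuardSource, tick: int) -> bool:
        until = self.cooldown_until_tick.get(source)
        return until is not None and tick < until


tracker = CooldownTracker(recovery_cooldown_ticks=120)
tracker.on_recovery_failed(GuardSource.ANCHOR_SATURATION, tick=100)
for t in range(101, 220):
    tracker.on_hostile_tick(GuardSource.ANCHOR_SATURATION, t)
```

In this sketch the deadline stays at tick 220 no matter how many hostile ticks arrive in between, and the other guard sources are never suppressed by the anchor cool-down.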
4. Recovery still requires hysteresis + stability¶
Cool-down changes when recovery evaluation is allowed, not how recovery is judged. Exit from DEGRADED still requires: exit threshold below entry threshold, sustained stability window, no single-tick exits.
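The exit rule above can be illustrated with a small checker. It assumes the guard exposes a scalar metric where higher means more hostile. The `HysteresisExit` class and the threshold values are hypothetical and are not the thresholds used by the existing guards.

```python
class HysteresisExit:
    def __init__(self, entry_threshold: float, exit_threshold: float,
                 stability_ticks: int) -> None:
        # Hysteresis: the exit threshold must sit strictly below the entry threshold.
        if exit_threshold >= entry_threshold:
            raise ValueError("exit_threshold must be below entry_threshold")
        if stability_ticks < 2:
            raise ValueError("stability_ticks must be >= 2 (no single-tick exits)")
        self.entry_threshold = entry_threshold
        self.exit_threshold = exit_threshold
        self.stability_ticks = stability_ticks
        self._stable_run = 0

    def update(self, metric: float) -> bool:
        """Feed one tick; return True once exit from DEGRADED is allowed."""
        if metric < self.exit_threshold:
            self._stable_run += 1
        else:
            # Any tick at or above the exit threshold restarts the window,
            # even if it is still below the entry threshold.
            self._stable_run = 0
        return self._stable_run >= self.stability_ticks


checker = HysteresisExit(entry_threshold=1.0, exit_threshold=0.7, stability_ticks=3)
decisions = [checker.update(m) for m in [0.6, 0.9, 0.6, 0.6, 0.6]]
```

In this run the reading of 0.9 sits between the two thresholds. It does not re-trigger entry, but it does reset the stability window, so exit is only allowed on the fifth tick. The cool-down would gate whether `update` results are acted on at all. It would not change the rule itself.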
5. No infinite flapping — episode cap as session-level backstop¶
If the same guard source enters DEGRADED more than N times in one session → HALT, where N is max_degraded_episodes_per_source_per_session (default 3).
Model becomes: try → wait → try again → but not forever.
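A minimal sketch of the session-level backstop follows. It reads "more than N times" literally, so with the default of 3 the fourth entry from the same source halts. That reading is an assumption and should be confirmed before implementation. The `EpisodeCap` class and the string return values are hypothetical.

```python
from collections import Counter


class EpisodeCap:
    def __init__(self, max_degraded_episodes_per_source_per_session: int = 3) -> None:
        self.max_episodes = max_degraded_episodes_per_source_per_session
        # Per-source counter, kept for the whole session.
        self.degraded_episode_count = Counter()

    def on_enter_degraded(self, source: str) -> str:
        # Count the new episode, then compare against the cap.
        self.degraded_episode_count[source] += 1
        if self.degraded_episode_count[source] > self.max_episodes:
            return "HALT"
        return "DEGRADED"

    def reset_session(self) -> None:
        # Counters start from zero at session start.
        self.degraded_episode_count.clear()


cap = EpisodeCap(max_degraded_episodes_per_source_per_session=3)
outcomes = [cap.on_enter_degraded("anchor_saturation") for _ in range(4)]
other_source = cap.on_enter_degraded("directional_drift")
```

Here `outcomes` ends in a HALT on the fourth anchor episode, while a first episode from a different source still yields DEGRADED, because the counters are per source.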
6. Taxonomy changes¶
- Remove: recovery_exhausted_halt
- Add (logged state, not a halt): recovery_cooldown_active
- Add to state/logs: degraded_episode_count, cooldown_until_tick, recovery_suppressed_by_cooldown
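As an illustration of how the new fields might appear in a log line, the sketch below builds one JSON record per tick. The record layout, the `build_cooldown_record` helper, and the `tick`, `guard_source` and `state` keys are assumptions. Only the three state/log fields and the `recovery_cooldown_active` label come from this ruling.

```python
import json


def build_cooldown_record(source: str, tick: int, cooldown_until_tick: int,
                          degraded_episode_count: int) -> dict:
    suppressed = tick < cooldown_until_tick
    record = {
        "tick": tick,                    # hypothetical key
        "guard_source": source,          # hypothetical key
        "degraded_episode_count": degraded_episode_count,
        "cooldown_until_tick": cooldown_until_tick,
        "recovery_suppressed_by_cooldown": suppressed,
    }
    if suppressed:
        # Logged state only; this is not a halt reason.
        record["state"] = "recovery_cooldown_active"
    return record


line = json.dumps(
    build_cooldown_record("anchor_saturation", tick=150,
                          cooldown_until_tick=220, degraded_episode_count=1),
    sort_keys=True,
)
print(line)
```

Once the tick reaches `cooldown_until_tick`, the same helper emits the record without the `recovery_cooldown_active` state, which makes the end of suppression visible in the log.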
What Must NOT Change¶
- DEGRADED semantics (cancel all, stop quoting, continue observation)
- Truth-guard recovery behavior
- Thresholds (hysteresis unchanged)
- Session duration limit
- No per-tick recovery evaluation during cool-down (recovery is not evaluated while the timer is active)
The system remains: trade → detect → degrade → wait safely → re-evaluate → resume if safe.
Session Duration Ruling¶
Use 2-hour sessions (--duration-seconds 7200) as the default test length for this model. 4-hour sessions acceptable later if 2-hour runs behave cleanly. Do not jump straight to 4 hours.
Engineering Scope¶
Focused follow-on to FLAG-042. Not a redesign.
Required changes:
- Drop attempt-counter logic
- Add per-source cooldown tracking (cooldown_until_tick)
- Add per-source episode counting (degraded_episode_count)
- Add config params: recovery_cooldown_ticks, max_degraded_episodes_per_source_per_session
- Preserve hysteresis / stability logic unchanged
Required tests:
- Cool-down activation
- Cool-down suppression during active cool-down
- Post-cool-down re-evaluation
- Successful recovery after cool-down
- Repeated DEGRADED episodes → HALT at episode cap
- Cool-down and episode counter reset at session start
Bottom Line¶
Approved: replace recovery_exhausted_halt with cool-down.
Default cooldown: 120 ticks (~8 min).
Add per-source episode cap (max 3) as session-level backstop.
Keep DEGRADED as idle observation state.
Use 2-hour sessions first.
S43–S45 showed the current design is safe but too brittle. The cool-down model preserves safety and adds patience. That is the correct evolution.
— Atlas