Summary¶
Pre-code investigation answers for Atlas's ANCHOR_IDLE ruling and your
tasking memo. Bottom line: the change is structurally clean — the
existing FLAG-042 hysteresis config on AnchorSaturationGuardConfig is
already the right shape for the ANCHOR_IDLE exit evaluator (no new
config needed); anchor saturation currently has exactly one entry call
site; the pre-trade truth gate needs a one-line extension to also
block on ANCHOR_IDLE; and ANCHOR_IDLE bypasses the FLAG-044 episode
machinery entirely (anchor leaves RECOVERY_CAPPED_SOURCES). Five
findings below, then a risk register and a recommended implementation
sequence.
Two non-trivial decisions need your ruling before I cut the branch:
- Retire anchor from FLAG-044 now, or keep the old keys dead for one
  session? Retiring in the same branch is cleaner but touches
  test_recovery* files (anchor-source episode-cap tests become
  unreachable). Keeping the keys dead defers cleanup but ships the
  ruling faster.
- _current_truth_mode() return surface. The current API returns
  MODE_OK / MODE_DEGRADED / MODE_HALT. Adding MODE_ANCHOR_IDLE expands
  it to four. Every call site that treats the return as a binary "safe
  to quote?" check needs auditing. I've found the two that matter (the
  truth gate + _evaluate_anchor_recovery); Q3 covers the scan.
Q1 — Current anchor saturation entry path¶
Call site: _evaluate_anchor_saturation_guard in neo_engine/main_loop.py.
In the local sandbox tree (pre-FLAG-042/044 numbering) this lives at
lines 1518–1607. After FLAG-042 C3 + FLAG-044 C2 it shifts down (C3
added a 122-line _evaluate_anchor_recovery, C2 added ~250 lines of
taxonomy + helpers); canonical main numbering will differ but the
function body is intact.
Trigger construction (all three must hold on the same tick):
1. len(self._anchor_error_window) == self._anchor_error_window.maxlen
# i.e. cfg.lookback_ticks (default 25) real values populated —
# None-valued ticks skipped, not zero-filled
2. abs(mean(window)) >= cfg.bias_threshold_bps # default 7.0
3. prevalence_pct >= cfg.prevalence_pct # default 40.0
# where prevalence_count = #{|x| > cfg.prevalence_threshold_bps (5.0)}
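The three conditions above can be sketched as a stand-alone predicate. This is a hedged illustration of the trigger logic, not the engine code; the function name is hypothetical, and the parameter names and defaults mirror the config fields quoted above:

```python
from collections import deque
from statistics import fmean

def saturation_triggered(window: deque,
                         lookback_ticks: int = 25,
                         bias_threshold_bps: float = 7.0,
                         prevalence_threshold_bps: float = 5.0,
                         prevalence_pct: float = 40.0) -> bool:
    # 1. Window must be full (None-valued ticks were skipped, not zero-filled).
    if len(window) < lookback_ticks:
        return False
    # 2. Mean bias at or beyond the entry threshold.
    if abs(fmean(window)) < bias_threshold_bps:
        return False
    # 3. Prevalence: share of ticks with |error| above the prevalence threshold.
    prevalence = 100.0 * sum(1 for x in window
                             if abs(x) > prevalence_threshold_bps) / len(window)
    return prevalence >= prevalence_pct
```

All three checks must pass on the same tick, which is why the predicate short-circuits in entry order.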
On trigger (current, post-FLAG-042/044), the call does FOUR things today:
- bumps degraded_recovery.anchor.episode_count
- on episode_count >= cap (default 3) → HALT with
HALT_REASON_DEGRADED_EPISODE_LIMIT (degraded_episode_limit_halt)
- on episode_count > 1 arms degraded_recovery.anchor.cooldown_ticks_remaining
- sets KEY_MODE=MODE_DEGRADED, KEY_DEGRADED_SINCE, KEY_DEGRADED_REASON,
cancels all live orders (_cancel_all_live_orders), logs WARN
Data flowing into the decision:
- self._anchor_error_window: deque[float] — filled per tick from
last_anchor_divergence_bps (strategy's anchor-vs-mid deflection).
- cfg = self._config.anchor_saturation_guard — the frozen dataclass.
- Session one-shot: self._anchor_guard_triggered_this_session: bool
(belt-and-braces so the persistence + WARN log block fires exactly
once per session; FLAG-042 C3 already resets this when anchor
recovery exits DEGRADED).
Tick order of operations (Step 8.x in _tick):
- Step 8.4 — _evaluate_anchor_recovery() (FLAG-042 C3) runs FIRST
- Step 8.5 — _evaluate_anchor_saturation_guard() runs SECOND
This ordering means a tick that recovers out of DEGRADED can be re-evaluated by the guard in the same tick (the FLAG-044 immediate-re-entry pathway). For ANCHOR_IDLE the same ordering applies — the ANCHOR_IDLE exit evaluator runs before the guard at Step 8.5, so a tick that normalizes into ACTIVE can still re-enter ANCHOR_IDLE immediately if the exit was spurious.
Only ONE call site for anchor. Verified via grep — the string
anchor_saturation appears in other spots (docstrings, log
tokens, KEY_DEGRADED_REASON reads) but the only place
_enter_degraded_mode(..., source=SOURCE_ANCHOR) is called is inside
_evaluate_anchor_saturation_guard. Nothing outside the guard's own
evaluator triggers anchor DEGRADED today; the conversion to
ANCHOR_IDLE is strictly local.
Q2 — Exit threshold parameters — can ANCHOR_IDLE reuse FLAG-042?¶
Yes, cleanly. AnchorSaturationGuardConfig already carries the
exact hysteresis fields Atlas specified. Current config (post-FLAG-042
C1, per the live-stage-1 YAML):
@dataclass(frozen=True)
class AnchorSaturationGuardConfig:
    enabled: bool = True
    lookback_ticks: int = 25
    bias_threshold_bps: float = 7.0        # entry
    prevalence_threshold_bps: float = 5.0  # entry
    prevalence_pct: float = 40.0           # entry
    recovery_enabled: bool = True          # existing
    recovery_exit_bias_threshold_bps: float = 4.0  # exit — matches Atlas's 4 bps
    recovery_exit_prevalence_pct: float = 30.0     # exit — matches Atlas's 30%
    recovery_stability_ticks: int = 30     # exit — matches Atlas's "N consecutive ticks"
No new config parameters needed. The three recovery_exit_* fields
are already validated (FLAG-042 invariant check — exit strictly tighter
than entry) and already wired through to the live-stage-1 YAML.
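For illustration, the "exit strictly tighter than entry" invariant reduces to two strict comparisons. This is a hypothetical stand-alone validator sketching the check's shape, not the FLAG-042 code itself:

```python
def validate_hysteresis(entry_bias_bps: float, exit_bias_bps: float,
                        entry_prevalence_pct: float, exit_prevalence_pct: float) -> None:
    # Exit thresholds must sit strictly inside the entry thresholds,
    # otherwise the state could flap on the boundary.
    if exit_bias_bps >= entry_bias_bps:
        raise ValueError("exit bias threshold must be strictly below entry")
    if exit_prevalence_pct >= entry_prevalence_pct:
        raise ValueError("exit prevalence pct must be strictly below entry")

validate_hysteresis(7.0, 4.0, 40.0, 30.0)  # the defaults above pass
```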
Semantic rename decision: these fields currently document as
"DEGRADED recovery thresholds" for anchor. Under the new model they
govern ANCHOR_IDLE exit (the regime-pause exit), which is semantically
the same operation on the same data window but through a new state
transition. Recommendation: keep the field names (no YAML churn,
no migration), update docstrings to say "ANCHOR_IDLE exit thresholds"
with a "formerly FLAG-042 DEGRADED recovery for anchor source" note.
The recovery_enabled field stays — gating ANCHOR_IDLE's exit
machinery is useful for regression debugging exactly as it was for the
old anchor recovery.
Alternative considered + rejected: rename to
anchor_idle_exit_*. Tempting for clarity, but it's a YAML-breaking
change against live-stage-1 and the meaning is identical. Not worth
the operator churn.
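A minimal sketch of the exit hysteresis these reused fields drive: the window must sit below BOTH exit thresholds for recovery_stability_ticks consecutive ticks before the exit fires. The class and the reset-on-any-failing-tick behavior are my assumptions about the evaluator's shape, not the engine implementation:

```python
from statistics import fmean

class IdleExitTracker:
    """Hypothetical stand-alone form of the ANCHOR_IDLE exit evaluator."""
    def __init__(self, exit_bias_bps=4.0, exit_prevalence_pct=30.0,
                 prevalence_threshold_bps=5.0, stability_ticks=30):
        self.exit_bias_bps = exit_bias_bps
        self.exit_prevalence_pct = exit_prevalence_pct
        self.prevalence_threshold_bps = prevalence_threshold_bps
        self.stability_ticks = stability_ticks
        self.stable = 0  # consecutive calm ticks

    def tick(self, window) -> bool:
        prevalence = 100.0 * sum(1 for x in window
                                 if abs(x) > self.prevalence_threshold_bps) / len(window)
        calm = (abs(fmean(window)) < self.exit_bias_bps
                and prevalence < self.exit_prevalence_pct)
        # Any failing tick resets the stability counter to zero.
        self.stable = self.stable + 1 if calm else 0
        return self.stable >= self.stability_ticks  # True => exit ANCHOR_IDLE
```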
Q3 — State representation¶
Current DEGRADED tracking (3 keys + 1 process cache):
engine_state keys (SQLite-persisted):
- inventory_truth.mode (KEY_MODE) — values: MODE_OK | MODE_DEGRADED |
  MODE_HALT (imported from state_manager)
- inventory_truth.degraded_since (KEY_DEGRADED_SINCE, ISO timestamp)
- inventory_truth.degraded_reason (KEY_DEGRADED_REASON, free-form string)
Process cache:
self._degraded_since_epoch: Optional[float]
Plus FLAG-044 per-source bookkeeping (not mode-bearing):
degraded_recovery.anchor.episode_count
degraded_recovery.anchor.cooldown_ticks_remaining
(and same for drift / corridor)
Mode is READ through _current_truth_mode() which returns one of
MODE_OK / MODE_DEGRADED / MODE_HALT. MODE constants live in
neo_engine/state_manager.py.
Proposed ANCHOR_IDLE additions (parallel set, not a replacement):
state_manager.py:
MODE_ANCHOR_IDLE = "anchor_idle" # new constant
main_loop.py (engine_state keys):
anchor_idle.since # ISO timestamp
anchor_idle.reason # reserved — always "anchor_saturation" today
main_loop.py (process cache):
self._anchor_idle_since_epoch: Optional[float]
self._anchor_idle_exit_stability_ticks: int # NEW — distinct from
# _anchor_recovery_stability_ticks
# which is FLAG-042 DEGRADED exit
Recommend a new stability counter, not repurposing
_anchor_recovery_stability_ticks. Even though the field retires
for anchor under the new model, a same-branch rename is mechanical
churn and loses the paper-trail of what the old counter was for. New
field, old field deleted in the same commit.
_current_truth_mode() extension. Option A: have it return one of
four values. Option B: add a parallel _current_anchor_idle()
predicate and leave truth_mode as the tri-state. Recommend A —
downstream consumers already key off the string from KEY_MODE, not
from Python enums, so MODE_ANCHOR_IDLE as a new return value is no
worse a surface area than adding a new helper that the same consumers
would still need to call. Audit below (Q4) identifies the two call
sites that matter.
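Option A's whole surface can be sketched in a few lines, assuming the helper just surfaces the persisted KEY_MODE string (a dict stand-in here; the "anchor_idle" value comes from the proposal above, the other constant values are illustrative assumptions):

```python
MODE_OK = "ok"                  # illustrative value
MODE_DEGRADED = "degraded"      # illustrative value
MODE_HALT = "halt"              # illustrative value
MODE_ANCHOR_IDLE = "anchor_idle"  # new constant per the proposal above

def current_truth_mode(engine_state: dict) -> str:
    # The fourth value flows through with no new helper: consumers that
    # key off the KEY_MODE string see "anchor_idle" like any other mode.
    return engine_state.get("inventory_truth.mode", MODE_OK)
```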
No state_manager schema migration needed. engine_state is a key/value table (see D2 C1); new keys add without migration.
Q4 — Interaction with inventory_truth_gate + fix/startup-mode-reset¶
Truth gate today — _check_inventory_truth_gate, main_loop.py
~line 1027–1120. The block condition is:
if mode == MODE_DEGRADED or mode == MODE_HALT:
# block — refusal logged, intent dropped, rate-limited at 1:50
Extension for ANCHOR_IDLE: one-liner.
Refusal log message update: the current format string says
"inventory_truth_gate: mode={mode} ...". With MODE_ANCHOR_IDLE
flowing through, the log token auto-updates — operators see
mode=anchor_idle in the refusal log and the rate-limiter (1:50)
still applies. Recommend adjusting the WARNING-level message to
distinguish "safety block" (DEGRADED/HALT) from "regime pause"
(ANCHOR_IDLE) — a single dict-key branch, not a structural change.
Nice to have, not required for correctness.
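The extended block condition plus the optional "safety block" vs "regime pause" branch can be sketched as follows. Mode strings other than "anchor_idle" are illustrative assumptions, and the helper names are hypothetical, not the gate's actual API:

```python
MODE_DEGRADED, MODE_HALT, MODE_ANCHOR_IDLE = "degraded", "halt", "anchor_idle"

BLOCKING_MODES = (MODE_DEGRADED, MODE_HALT, MODE_ANCHOR_IDLE)
# Optional dict-key branch: distinguish the block kind in the refusal log.
BLOCK_KIND = {
    MODE_DEGRADED: "safety block",
    MODE_HALT: "safety block",
    MODE_ANCHOR_IDLE: "regime pause",
}

def gate_blocks(mode: str) -> bool:
    # The one-line extension: ANCHOR_IDLE joins DEGRADED/HALT.
    return mode in BLOCKING_MODES

def refusal_line(mode: str) -> str:
    return f"inventory_truth_gate: mode={mode} ({BLOCK_KIND[mode]})"
```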
No change to inventory_truth_checker. It doesn't inspect
KEY_MODE; it writes to inventory_truth_snapshots and returns a
status that the main loop routes. ANCHOR_IDLE is orthogonal.
fix/startup-mode-reset extension:
Current fresh-session reset block (parent_session_id is None path in
_startup) clears:
- KEY_MODE → MODE_OK
- KEY_DEGRADED_SINCE → ""
- KEY_DEGRADED_REASON → ""
- FLAG-044 per-source keys: cooldown + episode_count for anchor /
drift / corridor (6 keys)
Add to the fresh-session clear block:
- anchor_idle.since → ""
- anchor_idle.reason → ""
Preserve on recovery-restart (parent_session_id != None) — symmetric with KEY_DEGRADED_* and FLAG-044 counters, so a resuming engine sees its prior ANCHOR_IDLE state if the parent engine was idle at restart.
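The clear-vs-preserve semantics above reduce to a single branch on parent_session_id. A sketch with a dict stand-in for the engine_state store (key names from this memo; the MODE_OK value is an illustrative assumption):

```python
from typing import Optional

ANCHOR_IDLE_CLEAR_KEYS = ("anchor_idle.since", "anchor_idle.reason")

def startup_mode_reset(engine_state: dict, parent_session_id: Optional[str]) -> None:
    if parent_session_id is not None:
        # Recovery-restart: preserve prior ANCHOR_IDLE state, symmetric
        # with KEY_DEGRADED_* and the FLAG-044 counters.
        return
    # Fresh session: clear mode and the new anchor_idle keys.
    engine_state["inventory_truth.mode"] = "ok"
    for key in ANCHOR_IDLE_CLEAR_KEYS:
        engine_state[key] = ""
```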
_current_truth_mode() call-site audit (for the MODE_ANCHOR_IDLE
return-value extension):
| call site | file:line | current behavior | ANCHOR_IDLE behavior |
|---|---|---|---|
| _check_inventory_truth_gate | main_loop.py:1073 | blocks on DEGRADED/HALT | extend — also block on ANCHOR_IDLE |
| _enter_degraded_mode idempotency check | main_loop.py:~1445 | already_degraded = (mode == MODE_DEGRADED) | unchanged — ANCHOR_IDLE is not DEGRADED; entering DEGRADED from ANCHOR_IDLE is a fresh entry (correct per Atlas: episode counted from there) |
| _evaluate_anchor_recovery guard | main_loop.py:~1790 (C3) | if mode != MODE_DEGRADED: return | retire — this evaluator is replaced by _evaluate_anchor_idle_exit |
| _exit_degraded_mode idempotency | main_loop.py:~1435 | if mode != MODE_DEGRADED: no-op | unchanged — ANCHOR_IDLE has its own exit |
| _maybe_escalate_to_halt (truth timeout) | ~1590 | if mode == MODE_DEGRADED and timeout_exceeded(): escalate | unchanged — ANCHOR_IDLE does not escalate on truth timeout; timeout only applies to DEGRADED |
The audit is short because the state machine is already centralized. No scattered "is engine trading?" checks outside the truth gate.
Q5 — Episode counter isolation¶
Confirmation: ANCHOR_IDLE entry does NOT increment episode counts.
The mechanism is precise: entry into ANCHOR_IDLE does NOT call
_enter_degraded_mode. A new method _enter_anchor_idle_mode(reason)
handles entry and bypasses the FLAG-044 bookkeeping entirely.
Current FLAG-044 episode increment path for anchor:
_evaluate_anchor_saturation_guard
→ _enter_degraded_mode(reason="anchor_saturation_guard_exceeded",
source=SOURCE_ANCHOR)
→ if source in RECOVERY_CAPPED_SOURCES and not already_degraded:
→ _bump_episode_count(SOURCE_ANCHOR)
→ writes KEY_EPISODE_COUNT_ANCHOR via _write_int_engine_state
→ returns new_count
→ if new_count >= max_degraded_episodes_per_source_per_session:
→ _escalate_degraded_to_halt(..., halt_reason=HALT_REASON_DEGRADED_EPISODE_LIMIT)
→ if new_count > 1:
→ _arm_recovery_cooldown(SOURCE_ANCHOR)
→ writes KEY_COOLDOWN_TICKS_REMAINING_ANCHOR
Proposed ANCHOR_IDLE entry path (bypasses all of the above):
_evaluate_anchor_saturation_guard
→ _enter_anchor_idle_mode("anchor_saturation")
→ set KEY_MODE = MODE_ANCHOR_IDLE
→ set anchor_idle.since = iso(now)
→ set anchor_idle.reason = "anchor_saturation"
→ if not already_anchor_idle:
→ _cancel_all_live_orders("Anchor idle entry cancel")
→ WARN log: ANCHOR_IDLE_ENTER
else:
→ INFO log: ANCHOR_IDLE re-entered — reason updated
Zero calls to _bump_episode_count, _arm_recovery_cooldown,
_increment_recovery_attempts, or any of the FLAG-042/044 helpers.
No keys in degraded_recovery.anchor.* are ever written from the
ANCHOR_IDLE path.
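The entry path above can be sketched as a stand-alone helper. The engine methods (order cancellation, logging) are represented by placeholder callables, and the dict stands in for engine_state; this illustrates the state writes and the one-shot cancel branch, not the main_loop.py implementation:

```python
from datetime import datetime, timezone

def enter_anchor_idle_mode(state: dict, reason: str, cancel_all, log) -> None:
    already_idle = state.get("inventory_truth.mode") == "anchor_idle"
    state["inventory_truth.mode"] = "anchor_idle"
    state["anchor_idle.since"] = datetime.now(timezone.utc).isoformat()
    state["anchor_idle.reason"] = reason
    if not already_idle:
        # Fresh entry: cancel live orders exactly once per entry.
        cancel_all("Anchor idle entry cancel")
        log("WARN", "ANCHOR_IDLE_ENTER")
    else:
        log("INFO", "ANCHOR_IDLE re-entered - reason updated")
```

Note what is absent: no episode bump, no cooldown arming, no degraded_recovery.anchor.* writes.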
Cleanup scope — anchor retires from RECOVERY_CAPPED_SOURCES:
Removing anchor from this tuple makes three pieces of FLAG-044
machinery unreachable for anchor:
1. _bump_episode_count(SOURCE_ANCHOR) — never called
2. _arm_recovery_cooldown(SOURCE_ANCHOR) — never called
3. _decrement_recovery_cooldown(SOURCE_ANCHOR) — never called
(the cooldown decrement loop iterates RECOVERY_CAPPED_SOURCES)
The three anchor engine_state keys (episode_count,
cooldown_ticks_remaining, plus the FLAG-042 vestigial
degraded_recovery.anchor.attempts if any) become dead but harmless
keys in SQLite — key/value with null reads. Recommend explicit
cleanup in the same branch:
- Delete KEY_EPISODE_COUNT_ANCHOR, KEY_COOLDOWN_TICKS_REMAINING_ANCHOR,
  their entries in the dispatch tables, and any anchor-specific helper
  calls.
- Update the fresh-session startup reset to remove the anchor lines
  from the 6-key FLAG-044 block (down to 4 keys — a drift + corridor
  pair each).
- HALT_REASON_DEGRADED_EPISODE_LIMIT stays (still used by drift and
  corridor).
Escalation from ANCHOR_IDLE → DEGRADED: Atlas's spec says "drift/corridor/truth fires → DEGRADED (episode counted from there)". Mechanically this works without new code:
- Drift guard at Step 8.5b already calls
  _enter_degraded_mode(reason, source=SOURCE_DRIFT).
- Corridor guard at Step 8.5c already calls
  _enter_degraded_mode(reason, source=SOURCE_CORRIDOR).
- Truth timeout path already routes through
  _enter_degraded_mode(reason, source=SOURCE_WALLET_TRUTH).
Each of those _enter_degraded_mode calls currently assumes either
MODE_OK or MODE_DEGRADED as the pre-state. The idempotency check
(already_degraded = mode == MODE_DEGRADED) will correctly evaluate
to False when pre-state is MODE_ANCHOR_IDLE, so a drift fire during
ANCHOR_IDLE is treated as a FRESH entry into DEGRADED — episode_count
bumps for drift, cooldown potentially armed for drift. This is
correct per Atlas (drift DEGRADED episode counted from the transition).
One subtle correctness check: _enter_degraded_mode overwrites
KEY_MODE, KEY_DEGRADED_SINCE, KEY_DEGRADED_REASON. It does NOT clear
anchor_idle.since or anchor_idle.reason. Add explicit clear on
ANCHOR_IDLE → DEGRADED transition — either inside
_enter_degraded_mode (conditional on pre-state), or via a new
_exit_anchor_idle_on_escalation() helper invoked from the drift /
corridor / truth paths. Cleaner separation: new helper, called by
_enter_degraded_mode when mode_before_entry == MODE_ANCHOR_IDLE.
Log token: ANCHOR_IDLE_ESCALATED_TO_DEGRADED per the tasking spec.
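The proposed helper's shape, for illustration (dict stand-in for engine_state, placeholder log callable; invoked by _enter_degraded_mode when the pre-state was ANCHOR_IDLE):

```python
def exit_anchor_idle_on_escalation(state: dict, new_source: str, log) -> None:
    # Clear the anchor_idle.* keys that _enter_degraded_mode does not touch;
    # KEY_MODE / KEY_DEGRADED_* are overwritten by the caller.
    state["anchor_idle.since"] = ""
    state["anchor_idle.reason"] = ""
    log("WARN", f"ANCHOR_IDLE_ESCALATED_TO_DEGRADED source={new_source}")
```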
Risk register + implementation order¶
Risks¶
- Test suite impact (anchor retired from FLAG-044). FLAG-042 C5 and
  FLAG-044 C4 added tests around the anchor-source episode cap +
  cooldown. Best estimate from the patch bundles: ~5–8 anchor-specific
  tests become stale. They must be removed or repurposed to
  drift/corridor. Scope is surgical but non-zero.
- Live-stage-1 YAML drift. No YAML change is needed per Q2, but I
  recommend adding a commented doc block under anchor_saturation_guard:
  explaining that the recovery_exit_* fields now govern ANCHOR_IDLE
  exit (not DEGRADED recovery). Purely for operator clarity; no
  functional change.
- Dashboard / session-summary filters. Any code that filters sessions
  by "halt_reason != degraded_episode_limit_halt" still works (drift
  and corridor can still trigger it). No change needed.
  summarize_paper_run.py reads KEY_MODE transitively via halt.reason —
  ANCHOR_IDLE isn't a halt, so it won't surface there. Session rows for
  S48+ will have halt.reason = duration_elapsed once the engine
  survives a 2-hour hostile regime — which is the whole point of the
  change.
- Truth gate refusal log noise. ANCHOR_IDLE sessions will generate
  refusal logs at the rate-limited 1:50 cadence — the same pattern as
  DEGRADED today. Operator expectation: a healthy ANCHOR_IDLE session
  produces periodic refusal log entries; documented behavior, not an
  alert.
Implementation order¶
Proposing a 4-commit branch. All commits additive or surgical.
C1 — State surface + MODE constant
- state_manager.py: add MODE_ANCHOR_IDLE = "anchor_idle".
- main_loop.py: add KEY_ANCHOR_IDLE_SINCE,
KEY_ANCHOR_IDLE_REASON, process cache fields, import
MODE_ANCHOR_IDLE.
- Fresh-session startup reset extended with the two new keys.
- Zero runtime behavior change yet.
C2 — Entry / exit / escalation methods
- _enter_anchor_idle_mode(reason) — entry helper (cancel_all,
log ANCHOR_IDLE_ENTER, set keys).
- _exit_anchor_idle_mode() — exit helper (clear keys, set
MODE_OK, log ANCHOR_IDLE_EXIT).
- _exit_anchor_idle_on_escalation(new_source) — internal helper
invoked by _enter_degraded_mode when pre-state was
ANCHOR_IDLE; clears the anchor_idle.* keys and logs
ANCHOR_IDLE_ESCALATED_TO_DEGRADED. _enter_degraded_mode gains
a single conditional block for this.
- _evaluate_anchor_idle_exit() — the hysteresis evaluator,
reusing the existing anchor error window + FLAG-042
recovery_exit_* config fields + the new
_anchor_idle_exit_stability_ticks counter.
C3 — Retire anchor from FLAG-044 + rewire guard
- Anchor dropped from RECOVERY_CAPPED_SOURCES.
- KEY_EPISODE_COUNT_ANCHOR, KEY_COOLDOWN_TICKS_REMAINING_ANCHOR,
their dispatch-table entries, and _anchor_recovery_stability_ticks
field removed.
- _evaluate_anchor_recovery (FLAG-042 C3) removed; Step 8.4 now
dispatches _evaluate_anchor_idle_exit instead of
_evaluate_anchor_recovery. Drift + corridor recovery evaluators
unchanged.
- _evaluate_anchor_saturation_guard changes from
_enter_degraded_mode(..., source=SOURCE_ANCHOR) to
_enter_anchor_idle_mode("anchor_saturation").
- _check_inventory_truth_gate block condition extended.
- Fresh-session startup reset trimmed (anchor lines dropped from
the FLAG-044 clear block).
C4 — Tests + stale-test retirement
- New tests/test_anchor_idle_state.py — 5 Atlas-locked tests from
the tasking memo. All using the same Windows-safe teardown
pattern (StateManager.close() before TemporaryDirectory.cleanup()).
- Retire or repurpose the ~5–8 stale anchor-source
episode-cap/cooldown tests in test_recovery*.py.
- Expected regression footprint: the 5 new tests pass, no
anchor-source FLAG-044 tests remain to fail.
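The Windows-safe teardown ordering mentioned for C4 can be sketched as follows. The StateManager here is a minimal SQLite-backed stub (an assumption, not the project's class); the point is the ordering: release the database handle before TemporaryDirectory.cleanup(), since Windows refuses to delete a directory containing an open file:

```python
import os
import sqlite3
import tempfile

class StateManager:
    """Minimal stub: holds an open SQLite handle inside the temp dir."""
    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)

    def close(self) -> None:
        self.conn.close()

tmp = tempfile.TemporaryDirectory()
mgr = StateManager(os.path.join(tmp.name, "engine_state.db"))
mgr.close()    # release the file handle FIRST...
tmp.cleanup()  # ...then remove the directory (fails on Windows otherwise)
```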
Standing-rule compliance for delivery¶
- No pre-created branch during investigation — no branch yet; this memo
  is on throwaway local state.
- No *.patch glob in PowerShell — the delivery apply block will use
  Get-ChildItem ... | Sort-Object Name | ForEach-Object { git am $_.FullName }.
- Defensive git branch -D feat/anchor-idle-state before git checkout -b
  — will be included in the apply block.
Local main drift acknowledgement¶
The sandbox tree I investigated on is PRE-FLAG-042/044 — the local
neo_engine does not have the FLAG-042 C3 _evaluate_anchor_recovery,
the FLAG-044 C2 helpers, or the updated _enter_degraded_mode
signature. I verified the FLAG-042 and FLAG-044 patch bundles in
08 Patches/patches-flag-042-degraded-recovery/ and
08 Patches/patches-flag-044-recovery-cooldown/ and my findings above
reflect the post-FLAG-044 behavior that's live in Katja's main. I will
recut against canonical main if git am fuzz appears at
implementation time.
Asks¶
- Ruling on the retire-anchor-from-FLAG-044 scope (same branch vs keep
  keys dead for one session).
- Ruling on the _current_truth_mode() return-surface expansion
  (MODE_ANCHOR_IDLE as a fourth return value vs a parallel predicate).
- Confirmation of the 4-commit sequence above.
- Any tests you want added beyond the 5 Atlas-locked ones — e.g. an
  explicit ANCHOR_IDLE → DEGRADED via truth timeout test (Atlas lists
  drift/corridor/truth; I have drift in test #4; adding truth as a
  sixth test is cheap if you want the coverage).
Standing by for your review before I start C1.
— orion