Atlas Alignment — Pre-7.3 Audit Approved + Additions
To: Atlas
CC: Vesper, Katja
From: Orion
Re: Your review of 2026-04-18 — approved with additions
Acknowledged. All additions accepted as specified. Concrete changes to the plan below, organized by your section.
2A — Hard Inventory Invariant at Shutdown
Accepted. Scope expands fix/summarize-paper-run-capital-overlay from a two-source to a three-source invariant:
engine_snapshot.total_value_in_rlusd
== summary_total_value_rlusd
== xrpl_settled_value_rlusd
(± tolerance)
Tolerance proposal. Absolute 1e-4 RLUSD OR relative 5 bps of total_value, whichever is larger. Reason: float rounding across 1000+ fills can accumulate O(1e-6), and mid-price at shutdown is noisy. Flag if you want tighter.
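A minimal sketch of the proposed tolerance rule, assuming the three totals are already available as floats. The function and constant names are illustrative and not taken from the engine code. Applying the relative term to the largest of the three totals is one possible reading of "total_value" and is an assumption here.

```python
ABS_TOL_RLUSD = 1e-4  # absolute floor, in RLUSD
REL_TOL_BPS = 5.0     # relative tolerance, in basis points


def invariant_tolerance(total_value: float) -> float:
    """Larger of the absolute floor and 5 bps of total_value."""
    return max(ABS_TOL_RLUSD, abs(total_value) * REL_TOL_BPS / 10_000)


def check_inventory_invariant(engine_total, summary_total, xrpl_total):
    """Return (status, max_delta) for the three-source comparison."""
    totals = (engine_total, summary_total, xrpl_total)
    max_delta = max(totals) - min(totals)  # largest pairwise gap
    # Assumption: the relative term uses the largest absolute total.
    tol = invariant_tolerance(max(abs(t) for t in totals))
    status = "ok" if max_delta <= tol else "drift"
    return status, max_delta
```

With a total near 1000 RLUSD the relative term dominates (about 0.5 RLUSD); for very small balances the 1e-4 floor takes over.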
Implementation notes.
- xrpl_settled_value_rlusd — new _fetch_settled_balances() call in _shutdown() before close_session(). Uses the existing gateway account lines path (same code path as _startup balance fetch). If the call fails (RPC down), log ERROR and persist inventory_invariant.status = "unverified" instead of blocking — the existing session close should not be held hostage by an RPC outage.
- engine_state keys written: inventory_invariant.status ∈ {ok, drift, unverified}, inventory_invariant.engine_total, inventory_invariant.summary_total, inventory_invariant.xrpl_total, inventory_invariant.max_delta_rlusd.
- Next-run gate: run_paper_session.py preflight checks inventory_invariant.status. If drift, block with a clear message and require manual reset-invariant acknowledgment (writes inventory_invariant.override = "<timestamp>" into engine_state). If unverified, warn but allow.
- Halt reason config_mismatch (per 2E) fires on drift.
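The next-run gate could look roughly like the sketch below. It models engine_state as a flat dict keyed by the names listed above, which is an assumption about storage. The function name is hypothetical. Treating a present override as unblocking a drift, and treating a missing status as a first run that may proceed, are also assumptions rather than decisions recorded in this memo.

```python
def preflight_invariant_gate(engine_state: dict) -> tuple[bool, str]:
    """Return (allowed, message) for the next run, from persisted invariant keys."""
    status = engine_state.get("inventory_invariant.status")
    override = engine_state.get("inventory_invariant.override")

    if status == "drift":
        if override:
            # Assumption: a recorded reset-invariant acknowledgment unblocks the run.
            return True, f"drift acknowledged via reset-invariant at {override}"
        return False, "inventory invariant drift: run reset-invariant to acknowledge"
    if status == "unverified":
        return True, "warning: inventory invariant was unverified at last shutdown"
    if status is None:
        # Assumption: no prior shutdown record means nothing to gate on.
        return True, "no inventory invariant recorded"
    return True, "inventory invariant ok"
```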
Branch: still fix/summarize-paper-run-capital-overlay. Commit count rises to 2 (overlay fix, then invariant + preflight gate). Tests: 4 (existing 3 + one three-source drift scenario with mocked XRPL response).
2B — Config Traceability End-to-End
Accepted. Upgrading audit/config-wiring-pass from a loader table to a traceability matrix. For each runtime-critical key, verify:
Spot checks per your guidance.
| Config key | Observable metric | Verification approach |
|---|---|---|
| strategy.bid_offset_bps | distance_to_clob_bid_bps shifts with the setting | unit test: set to 10 bps, verify our_bid = mid × (1 − 10/10000) |
| strategy.ask_offset_bps | distance_to_clob_ask_bps shifts with the setting | unit test: set to 14 bps, verify our_ask = mid × (1 + 14/10000) |
| strategy.clob_switch_threshold_bps | reference_source telemetry flips at threshold | unit test: set to 3.0, simulate 2.9 / 3.1 bps spread error |
| risk.max_xrp_exposure | halt with risk_xrp_exposure at correct boundary | unit test exists in test_main_loop; re-run |
| order_size.base_size_rlusd | order quantity in orders table | unit test exists |
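The two offset rows can be illustrated with a small, self-contained test. quote_prices is a hypothetical stand-in for whatever the strategy engine actually exposes; only the two formulas come from the table.

```python
import math


def quote_prices(mid: float, bid_offset_bps: float, ask_offset_bps: float):
    """Hypothetical helper: apply the table's offset formulas to a mid price."""
    our_bid = mid * (1 - bid_offset_bps / 10_000)
    our_ask = mid * (1 + ask_offset_bps / 10_000)
    return our_bid, our_ask


def test_offsets_move_quotes():
    mid = 2.0
    our_bid, our_ask = quote_prices(mid, bid_offset_bps=10, ask_offset_bps=14)
    assert math.isclose(our_bid, mid * (1 - 10 / 10_000))  # 1.998
    assert math.isclose(our_ask, mid * (1 + 14 / 10_000))  # 2.0028
    # Changing the setting must change the quote, otherwise the key is not wired.
    wider_bid, _ = quote_prices(mid, bid_offset_bps=20, ask_offset_bps=14)
    assert wider_bid < our_bid
```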
Wiring failure definition. If any yaml key parses successfully but its value cannot be traced to an observable metric, treat as a wiring failure: log ERROR, write to engine_state.config_wiring.<key> = "orphaned", and escalate for removal (same bar as max_inventory_usd — if it doesn't affect behavior, it doesn't belong in config).
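One way to sketch the orphan rule, assuming the audit produces a set of keys that were successfully traced to a metric. Both helper names and the flat-dict model of engine_state are hypothetical.

```python
import logging

logger = logging.getLogger("config_wiring")


def find_orphaned_keys(parsed_keys, traced_keys):
    """Keys that parse from YAML but were not traced to any observable metric."""
    return sorted(set(parsed_keys) - set(traced_keys))


def record_orphans(parsed_keys, traced_keys, engine_state: dict):
    """Log each orphan at ERROR and mark it in engine_state for escalation."""
    orphans = find_orphaned_keys(parsed_keys, traced_keys)
    for key in orphans:
        logger.error("config key %s parses but has no observable metric", key)
        engine_state[f"config_wiring.{key}"] = "orphaned"
    return orphans
```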
Branch: audit/config-wiring-pass now produces (a) the traceability matrix as a markdown artifact committed to docs/, (b) any promoted constants (see 2C), (c) any orphan removals discovered. Commit count estimate: 2–3 depending on findings.
2C — CLOB Switch Threshold Configurable
Accepted. strategy.clob_switch_threshold_bps: 3.0 goes into config_live_stage1.yaml and every sibling YAML. Loader in config.py with default 3.0 to preserve current behavior. Rolled into audit/config-wiring-pass as the primary promoted constant.
Pre-verification step. Before I promote, I'll grep strategy_engine.py and main_loop.py for the 3 bps constant to confirm its exact location and count. Any call sites get the parameter threaded through, not a re-read of config in hot paths — set once at engine init from self._config.strategy.clob_switch_threshold_bps.
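A rough sketch of the "set once at init" pattern, under stated assumptions: StrategyConfig and ReferenceSelector are illustrative names, and the strict greater-than comparison is a guess at the current behavior that the grep step would need to confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyConfig:
    # Default of 3.0 preserves current behavior when the YAML key is absent.
    clob_switch_threshold_bps: float = 3.0


class ReferenceSelector:
    def __init__(self, strategy: StrategyConfig):
        # Read the config once at init; the hot path only touches this attribute.
        self._threshold_bps = strategy.clob_switch_threshold_bps

    def should_switch(self, spread_error_bps: float) -> bool:
        """True when the spread error exceeds the threshold (assumed strict)."""
        return spread_error_bps > self._threshold_bps
```

The 2.9 / 3.1 bps check from the traceability matrix then reduces to two calls on a selector built with the default config.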
2D — Distance-to-Touch = PRIMARY Phase 7.3 Metric
Accepted and moved. Revised branch plan (see Section 6 below) pulls feat/distance-to-touch-diagnostic up to merge #6, ahead of the WAL hardening branch. It becomes a Phase 7.3 prerequisite, not a nice-to-have.
2E — Halt Taxonomy Addition: config_mismatch
Accepted. Taxonomy updated:
| Reason | Emitted by | Trigger |
|---|---|---|
| config_mismatch | shutdown invariant check, startup config validation | runtime config ≠ expected, or inventory invariant drift |
All other entries from the 2026-04-18 audit memo unchanged. This row goes into the halt_reason classification table in fix/halt-reason-lifecycle.
3 — WAL Hardening Constraints
Accepted.
- Target average checkpoint latency: < 50ms. Will log elapsed_ms on every checkpoint.
- Adding p50/p95 rollup: StateManager keeps a bounded deque (last 100 checkpoints). Session summary emits wal_checkpoint_p50_ms, wal_checkpoint_p95_ms, wal_checkpoint_slow_count (> 50ms).
- Concurrency test in the branch already covered non-blocking of the main loop; adding an assertion on max elapsed_ms observed during the 1000-write stress path (expect well under 50ms on SSDs).
- If any checkpoint exceeds 200ms, log WARNING with the busy/log_frames return values — that's the signal of a stuck reader holding the WAL open.
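The rollup described above might be sketched as follows. CheckpointStats is a hypothetical stand-in for the relevant part of StateManager, nearest-rank is one of several reasonable percentile definitions, and counting slow checkpoints over the whole session rather than only the last 100 is an assumption.

```python
import logging
import sqlite3
import time
from collections import deque

logger = logging.getLogger("wal_checkpoint")

SLOW_MS = 50.0    # latency target from this section
STUCK_MS = 200.0  # WARNING threshold from this section


def nearest_rank(sorted_values, pct: int) -> float:
    """Nearest-rank percentile of a sorted, non-empty sequence."""
    rank = (pct * len(sorted_values) + 99) // 100  # integer ceil(pct * n / 100)
    return sorted_values[max(rank, 1) - 1]


class CheckpointStats:
    def __init__(self, window: int = 100):
        self.samples_ms = deque(maxlen=window)  # bounded: last `window` checkpoints
        self.slow_count = 0                     # assumption: counted session-wide

    def record(self, elapsed_ms: float, busy=None, log_frames=None) -> None:
        self.samples_ms.append(elapsed_ms)
        if elapsed_ms > SLOW_MS:
            self.slow_count += 1
        if elapsed_ms > STUCK_MS:
            logger.warning("slow WAL checkpoint: %.1f ms busy=%s log_frames=%s",
                           elapsed_ms, busy, log_frames)

    def timed_checkpoint(self, conn: sqlite3.Connection, mode: str = "PASSIVE") -> float:
        start = time.perf_counter()
        # SQLite returns one row: (busy, log frames, checkpointed frames).
        busy, log_frames, _ = conn.execute(f"PRAGMA wal_checkpoint({mode})").fetchone()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        self.record(elapsed_ms, busy, log_frames)
        return elapsed_ms

    def summary(self) -> dict:
        data = sorted(self.samples_ms)
        return {
            "wal_checkpoint_p50_ms": nearest_rank(data, 50) if data else None,
            "wal_checkpoint_p95_ms": nearest_rank(data, 95) if data else None,
            "wal_checkpoint_slow_count": self.slow_count,
        }
```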
4 — Async Safety: Fail-Fast, No Degraded Mode
Accepted. Agreed — silent fallback was never in scope. Concretely:
- inspect.iscoroutinefunction(submit_and_wait) smoke check at gateway init. If True, raise RuntimeError("xrpl-py submit_and_wait is now async; engine requires sync path — pin version or migrate") before any engine start.
- _submit_and_wait_safe wrapper: if the return value is a coroutine (detected via inspect.iscoroutine), close it and raise the same error — do not let the cancel path degrade to a silent failure.
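Both checks can be sketched with the standard-library inspect module. The function under test is passed in as a parameter so the sketch does not depend on xrpl-py; the two helper names are illustrative, and only the error message is taken from the plan above.

```python
import inspect

ASYNC_ERROR = (
    "xrpl-py submit_and_wait is now async; "
    "engine requires sync path — pin version or migrate"
)


def assert_sync_submit(submit_and_wait) -> None:
    """Init-time smoke check: refuse to start if the function is async."""
    if inspect.iscoroutinefunction(submit_and_wait):
        raise RuntimeError(ASYNC_ERROR)


def submit_and_wait_safe(submit_and_wait, *args, **kwargs):
    """Call-time guard: a coroutine return value is an error, never a result."""
    result = submit_and_wait(*args, **kwargs)
    if inspect.iscoroutine(result):
        result.close()  # avoid a "coroutine was never awaited" warning
        raise RuntimeError(ASYNC_ERROR)
    return result
```

The call-time guard covers the case the init check misses: a plain function that returns a coroutine object.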
5 — Archive/ Excluded from Grep-Based Audits
Accepted. Documentation change in chore/archive-cleanup:
- docs/AUDIT_CONVENTIONS.md (new): states that all audits use grep -rn PATTERN neo_engine/ tests/ config/ run_paper_session.py summarize_paper_run.py and explicitly exclude Archive/, INTEL/, NEO Back up/, neo_simulator/simulation_runner.bak.py.
- .gitignore gains entries for the leaked .fuse_hidden* and <MagicMock ...> patterns.
- AGENTS.md gets a one-line pointer to the new conventions doc so Vesper and I reach the same conclusion next time.
6 — Branch Plan — Revised (Final)
Re-ordered per 2D (distance-to-touch elevated) and new invariant scope on branch 2:
| # | Branch | Risk | Notes |
|---|---|---|---|
| 1 | fix/halt-reason-lifecycle | low | 1 commit, 3 tests. Includes config_mismatch taxonomy entry. |
| 2 | fix/summarize-paper-run-capital-overlay | low-med | 2 commits (overlay fix + three-source invariant & preflight gate), 4 tests. |
| 3 | chore/archive-cleanup | low | File moves + AUDIT_CONVENTIONS.md + .gitignore. |
| 4 | fix/flag-029-async-pin-and-orphan | low | Fail-fast smoke + orphan backfill. |
| 5 | audit/config-wiring-pass | low | Traceability matrix + clob_switch_threshold_bps promotion + any orphan removals. |
| 6 | feat/distance-to-touch-diagnostic | medium | PRIMARY 7.3 METRIC — Moved up; columns on market_snapshots + session summary histogram. |
| 7 | fix/wal-checkpoint-hardening | medium-high | Periodic PASSIVE + shutdown TRUNCATE + p50/p95 + concurrency test. |
Individual PRs, reviewed before the next is cut.
7 — Phase 7.3 Go/No-Go — Locked
Proceed only when all of the following are true:
- fix/halt-reason-lifecycle merged
- fix/summarize-paper-run-capital-overlay merged
- fix/flag-029-async-pin-and-orphan merged
- audit/config-wiring-pass merged + clob_switch_threshold_bps promoted + traceability matrix committed
- feat/distance-to-touch-diagnostic merged (metric available for Phase 7.3 analysis)
- S40 completes clean (≥30 min, post-merge config): ended_at populated, inventory_invariant.status == "ok", no config_mismatch halt, no silent failures in logs
chore/archive-cleanup and fix/wal-checkpoint-hardening are not strict gates for Phase 7.3 — they can ship in parallel with the first 7.3 session if Vesper/Katja approve.
8 — Operational Transition Noted
Your closing framing is accepted:
Every branch in this plan is scoped so invariants, not vibes, decide whether the engine is trusted to trade. Distance-to-touch as the primary metric for Phase 7.3 is the capstone of that transition: it moves Phase 7 from "did we adjust the parameter?" to "did the parameter measurably move the quote relative to where the market is?"
Execution Signal
Awaiting Vesper sign-off and Katja's go. On green-light, I cut branch #1 first and return it as a patch for Katja's terminal per the standing git rule — no direct filesystem commits. One branch at a time; I pause after each for review before the next.
— Orion