Atlas Ruling — DB Reliability, SMB Risk, and VPS Migration Sequencing¶
1. Database Corruption — Atlas's Read¶
Treat the current local+SMB SQLite setup as operationally untrustworthy.
Not "a bit fragile." Not "monitor it more closely." Untrustworthy.
Repeated corruption, unrecoverable session loss, and a persistence model that depends on SQLite WAL behavior in an environment the SQLite documentation explicitly warns against: a network filesystem with unreliable locking. That is enough.
This is now an infrastructure problem, not an application bug.
FLAG-007 hardening was still worth doing, but it does not change the underlying storage reality.
Root cause assessment: The SMB hypothesis is the leading explanation and should be treated as the working root cause unless disproven.
- Repeated pattern, not isolated event
- WAL corruption, not just app-level inconsistency
- Unrecoverable database state
- Known incompatibility class between SQLite WAL semantics and network filesystems / file locking edge cases
Stop expecting further software-side "hygiene" to solve this completely.
2. Direct Answers — DB Reliability Questions¶
Q: Add pre-session DB integrity check?
Yes. Approved immediately.
Add a startup integrity gate:
- PRAGMA integrity_check
- Fail closed unless the check returns a clean "ok" result
- Do not start the session on a suspect DB
This does not prevent corruption. It does prevent starting from bad state, wasting a session on a broken file, or discovering corruption too late.
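A minimal sketch of such a gate, using Python's stdlib sqlite3. The function name, DB path, and exit behavior are illustrative, not the engine's actual entry point; the one hard rule it encodes is: anything other than a single "ok" row from PRAGMA integrity_check means do not start.

```python
import sqlite3
import sys

def integrity_gate(db_path: str) -> None:
    """Fail closed unless PRAGMA integrity_check returns exactly 'ok'."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # File is not readable as a SQLite database at all: refuse to start.
        sys.exit(f"DB unreadable, refusing to start session: {db_path}: {exc}")
    if rows != [("ok",)]:
        # Any other output is a list of corruption findings: refuse to start.
        sys.exit(f"DB integrity check failed for {db_path}: {rows}")

# Example: gate before launching a session
# integrity_gate("engine.db")
```

Note that a passing check only proves the file was consistent at startup; it says nothing about what SMB does to it mid-session, which is why the backup rules below still apply.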
Q: Add automated pre-session backups?
Yes. Approved immediately. Now mandatory, not optional.
Minimum:
- Timestamped pre-session backup
- Created before every live run
- Retained with rolling policy
Reason: still on unstable storage. Until migration, point-in-time rollback is required.
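A sketch of what that backup step could look like, assuming Python's stdlib sqlite3. The naming scheme, directory layout, and retention count are placeholders; the substantive point is that SQLite's online backup API produces a consistent snapshot even from a WAL-mode database, which a raw file copy of a live DB does not guarantee.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

def pre_session_backup(db_path: str, backup_dir: str, keep: int = 20) -> Path:
    """Take a timestamped pre-session snapshot and prune old ones."""
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = dest_dir / f"pre_session_{stamp}.db"
    # The online backup API copies a consistent snapshot of the DB,
    # including WAL-mode databases, unlike a plain shutil.copy of the file.
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(dest)
    try:
        src.backup(dst)
    finally:
        src.close()
        dst.close()
    # Rolling retention: keep only the newest `keep` snapshots.
    snaps = sorted(dest_dir.glob("pre_session_*.db"))
    for old in snaps[:-keep]:
        old.unlink()
    return dest

# Example: snapshot the live DB before starting a session
# pre_session_backup("engine.db", "backups/")
```

The same function covers the post-session backup requirement with a different filename prefix.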
Q: Should Cowork treat DB as read-only?
Yes.
Engine process = sole writer. Everything else = read-only consumer.
Do not let Cowork or any secondary process write to the live DB. Do not let analysis tooling touch WAL behavior beyond reading snapshots or copies.
Until migration:
- Engine writes to live DB
- Analysis reads from copied DB or exported artifacts
- No shared live-write access pattern
This is an immediate operating rule.
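The read-only rule can be enforced at the driver level rather than by convention. A minimal sketch, assuming the consumers connect through Python's stdlib sqlite3 (the helper name is illustrative): mode=ro makes SQLite reject any write on that connection.

```python
import sqlite3

def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a DB so that writes are rejected at the SQLite level.

    mode=ro refuses all writes on this connection. For copied snapshots
    that nothing else touches, immutable=1 would additionally skip
    locking and WAL handling entirely, but it is unsafe on a live DB.
    """
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)

# Example: any INSERT/UPDATE through this connection raises
# sqlite3.OperationalError instead of touching the file.
# ro = open_read_only("snapshot.db")
```

This turns "Cowork must not write" from a policy into a guarantee on every connection Cowork opens.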
3. Additional Short-Term Controls¶
Before migration, add these operating safeguards:
A. Per-session backup before run — Mandatory.
B. Post-session backup after clean close — Also mandatory.
C. Read analysis from copies, not live DB — No tooling should inspect the live DB directly if a copied snapshot can be used instead.
D. DB health artifact per session — Each session should record:
- Integrity check result at start
- Backup timestamp used
- DB path used
- Whether post-session backup succeeded
This becomes part of session integrity.
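The four items above fit naturally into a small JSON artifact written once per session. A sketch, assuming Python's stdlib sqlite3 and json; the field names and the idea of passing the backup reference in as a string are illustrative, not an existing schema.

```python
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

def write_health_artifact(db_path: str, backup_used: str,
                          artifact_path: str,
                          post_backup_ok: bool) -> dict:
    """Record the per-session DB health facts as a JSON artifact."""
    conn = sqlite3.connect(db_path)
    try:
        result = conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()
    artifact = {
        "integrity_check": result,            # "ok" on a clean DB
        "backup_used": backup_used,           # timestamped backup reference
        "db_path": db_path,
        "post_session_backup_ok": post_backup_ok,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    Path(artifact_path).write_text(json.dumps(artifact, indent=2))
    return artifact
```

Because the artifact is plain JSON on disk, analysis tooling can consume it without ever opening the live DB.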
4. VPS Migration — Atlas's Read¶
Katja is correct. VPS migration is no longer a "future nice-to-have." It is moving toward near-term necessity.
Not because the engine is fully ready for production — but because the storage substrate is now undermining trust in results. You cannot keep proving system readiness on top of an unreliable persistence layer.
That said: do NOT migrate before anchor calibration is resolved enough to make sessions meaningful. Migration should not become avoidance.
Sequencing:
Fix signal validity enough to make sessions worth running
→ then move to stable infrastructure
→ then continue clean-session proof
5. Direct Answers — VPS Questions¶
Q: Preferred VPS provider / OS baseline?
Ubuntu LTS on a simple, boring VPS.
Provider preference order:
1. Hetzner
2. DigitalOcean
3. Linode
Reason: simple, cost-effective, predictable, plenty good enough for this workload.
Baseline:
- Ubuntu LTS
- Local SSD-backed filesystem
- Single-node deployment
- Engine + SQLite local to the box
- No SMB
- No network-mounted DB
Keep it boring.
Q: Architectural changes before migration?
Do not overbuild this.
Not wanted:
- DB split from engine into a separate service
- API layer first
- Premature distributed architecture
Wanted before/at migration:
- Local filesystem only
- Engine as sole DB writer
- Automated backup scripts
- Health checks
- Clear runtime/service management
- Log rotation
- Session artifact export path
That is enough.
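On Ubuntu, "clear runtime/service management" is most simply a systemd unit. A hedged sketch only; the unit name, user, and paths are placeholders, not the project's actual layout:

```ini
# /etc/systemd/system/engine.service — illustrative sketch
[Unit]
Description=Engine (sole SQLite writer)
After=network.target

[Service]
User=engine
WorkingDirectory=/opt/engine
ExecStart=/opt/engine/venv/bin/python -m engine
Restart=on-failure
# Keep the DB on the local SSD filesystem only; never a network mount.
Environment=ENGINE_DB_PATH=/opt/engine/data/engine.db

[Install]
WantedBy=multi-user.target
```

Log rotation then comes for free via journald, or a standard logrotate stanza if the engine writes its own log files.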
Q: Migration timing — standalone branch after FLAG-048, or wait until after Phase 7.4?
Plan migration as the first major post-FLAG-048 infrastructure task, but do not execute until anchor recalibration path is sufficiently validated.
Recommended sequencing:
1. Resolve FLAG-048 / anchor calibration enough that the engine is no longer idling on a broken signal
2. Run at least one meaningful validating session under corrected anchor logic
3. Execute VPS migration
4. Pursue the 2 clean-session Phase 7.4 requirement on the VPS
6. Reclassification Ruling¶
FLAG-023 should be reclassified from "future / low urgency" to "near-term infrastructure priority."
Not current blocker. But no longer back-burner.
7. Operational Posture From This Point¶
Current local setup = development / interim validation only
VPS = first serious operational environment
Stop treating the current box as something to trust long-term. Use it only to get through the current calibration layer, then move.
8. Final Directives¶
Immediate:
- Add startup integrity check
- Add automated pre-session backup
- Add automated post-session backup
- Enforce engine-only write access
- Read analysis from copies, not live DB
Near-term:
- Reclassify FLAG-023 upward
- Prepare VPS migration plan
- Do not migrate until anchor recalibration is sufficiently validated
Preferred platform:
- Ubuntu LTS
- Hetzner first choice, DigitalOcean second, Linode third
- Local SSD, single-node, SQLite local
The database issue is real, recurring, and infrastructure-rooted. Treat SMB + SQLite WAL as untrustworthy. Mitigate immediately. Migrate soon after anchor calibration is validated. Katja's instinct is right.
— Atlas (CSO) 2026-04-22