
Windows: crash on WAL startup in v1.38.0+ due to missing SEH in ccgo build (v1.37.0 OK)

Summary

  • After upgrading to modernc.org/sqlite v1.38.0 (SQLite 3.50.1), our user's Windows Server 2019 deployment crashes at startup when opening a database in WAL mode. Pinning to v1.37.0 avoids the problem.
  • Root cause: SQLite 3.50.1 relies on Windows structured exception handling (SEH) around the “optimistic” unlocked read of the WAL-index header. The ccgo build is compiled with SQLITE_OMIT_SEH, so Windows I/O exceptions raised by reads from the memory-mapped *-shm file are not caught and crash the process. In upstream C builds, SEH catches these exceptions and SQLite retries or recovers internally, so there is no crash.

Environment

  • OS: Windows Server 2019
  • DB: WAL mode, files on a secondary data volume
  • Filter drivers are likely present on that volume (AV/EDR/backup/snapshot/dedup/compression); these increase the chance of transient in-page errors on memory-mapped I/O. We have asked the user for more details.

Observed Crash

  • Fatal on first prepare/open:
    • signal 0xc0000006 (STATUS_IN_PAGE_ERROR)
    • faulting in memcpy during wal-index header read from *-shm
  • Top of stack (example):
    unexpected fault address 0x20e69690000
    [signal 0xc0000006 code=0x0 addr=0x20e69690000 pc=...]
    
    modernc.org/libc.Xmemcpy(...)
    modernc.org/sqlite/lib._walIndexTryHdr(...)
    modernc.org/sqlite/lib._walIndexReadHdr(...)
    modernc.org/sqlite/lib._walTryBeginRead(...)
    modernc.org/sqlite/lib._walBeginReadTransaction(...)
    modernc.org/sqlite/lib._sqlite3PagerSharedLock(...)
    modernc.org/sqlite/lib.Xsqlite3_prepare_v3(...)

sqlite_win2019_crash.txt

  • The fault address is 64KB-aligned (typical MapViewOfFile granularity), consistent with a page fault in the mapped *-shm view during the “optimistic” unlocked read.
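The alignment claim can be checked directly. This is a minimal sketch, assuming the usual 64 KiB Windows allocation granularity; the helper name isGranularityAligned is made up for illustration and is not part of any library.

```go
package main

import "fmt"

// allocationGranularity is the usual Windows allocation granularity (64 KiB).
// Assumption: mapped views start on a multiple of this value on the affected host.
const allocationGranularity = 0x10000

// isGranularityAligned reports whether addr is a multiple of 64 KiB.
// Hypothetical helper, for illustration only.
func isGranularityAligned(addr uint64) bool {
	return addr%allocationGranularity == 0
}

func main() {
	const faultAddr = 0x20e69690000 // fault address from the crash output above
	fmt.Printf("%#x aligned: %v\n", uint64(faultAddr), isGranularityAligned(faultAddr))
}
```

An address that is not a multiple of 0x10000 (for example 0x20e69690010) would point into the middle of a view rather than at its base.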

Why this is upstream-related but only crashes in the ccgo build

  • In SQLite C (3.50.1–3.50.4, Windows):
    • walIndexReadHdr first does an unlocked read (two memcpy reads of 48 bytes) from the mapped *-shm header, then retries under a lock if needed.
    • The retry path sets writeLock = 2 specifically so walHandleException() will unlock if a SEH exception is thrown.
    • The read is inside __try/__except (SEH). If Windows raises STATUS_IN_PAGE_ERROR (e.g., due to a transient filter-driver/SMB/dedup failure or a file truncation), SQLite catches it and returns WAL_RETRY/SQLITE_IOERR. No crash; the library retries or recovers internally.
  • In the Go-translated ccgo build:
    • Built with SQLITE_OMIT_SEH (as indicated in the generated file banner), so Windows exceptions are not caught. The unlocked memcpy can raise 0xC0000006 and terminate the process before SQLite’s retry logic can run.
  • Version note: v1.38.0 (upgrade to SQLite 3.50.1) explicitly sets writeLock = 2 and expects SEH. v1.37.0 did not exhibit this crash on our host.

Expected Behavior

  • On Windows, transient mmapped I/O faults during WAL header reads should not crash the process; they should be handled internally (converted to WAL_RETRY/SQLITE_IOERR) as in upstream SQLite.

Proposed fixes (SQLite C upstream and ccgo wrapping)

SQLite C upstream

  • Option 1: Compile-time switch to skip the optimistic unlocked header read on Windows when SEH is unavailable.

    • In walIndexReadHdr (wal.c), under #if SQLITE_OS_WIN && defined(SQLITE_OMIT_SEH), set the initial attempt to “failed” (e.g., badHdr = 1;) so the code immediately takes the locked retry path (no unlocked memcpy from *-shm). Note that SQLITE_OS_WIN is always defined (as 0 or 1), so it has to be tested by value rather than with defined().
    • Alternatively, introduce a dedicated macro (e.g., SQLITE_WAL_SKIP_OPTIMISTIC_READ) and enable it by default when SQLITE_OMIT_SEH && SQLITE_OS_WIN.
    • Benefit: preserves upstream’s behavior everywhere else; avoids SEH-only assumptions when SEH is not present.
  • Option 2: Compile-time switch to force heap wal-index when SEH is unavailable on Windows.

    • In early WAL-open/read paths, when SQLITE_OMIT_SEH && SQLITE_OS_WIN, set pWal->bShmUnreliable = 1 and pWal->exclusiveMode = WAL_HEAPMEMORY_MODE so the wal-index is kept in heap memory instead of mapping *-shm.
    • Trade-off: slightly higher per-process memory and reduced multi-process sharing; high reliability for environments with filter drivers/SMB.

This translated repo

  • Option 3: Windows-only short-circuit in generated code to skip the unlocked header read.

    • In lib/sqlite_windows.go’s _walIndexReadHdr, set badHdr = 1 before the first _walIndexTryHdr call (when page0 is mapped), forcing the locked path. Guard behind GOOS=windows to keep performance on other OSes.
    • This can be applied as a small post-generation patch or kept as a repo-local change while waiting for an upstream switch.
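The control flow that Option 3 aims for can be modelled as follows. This is only a sketch: hdrReader, tryUnlocked, tryLocked and readHdr are hypothetical stand-ins and do not correspond to the generated code in lib/sqlite_windows.go. It shows which read path is chosen; it says nothing about whether the locked read itself can still fault.

```go
package main

import (
	"fmt"
	"runtime"
)

// hdrReader models the two ways of reading the wal-index header.
// Both fields are hypothetical stand-ins, not modernc.org/sqlite functions.
type hdrReader struct {
	tryUnlocked func() (badHdr bool) // optimistic read from the mapped *-shm
	tryLocked   func() (badHdr bool) // the same read, repeated under the WAL write lock
}

// readHdr starts with badHdr set. When skipOptimistic is true the unlocked
// read never runs, so the locked path is taken immediately.
func readHdr(r hdrReader, skipOptimistic bool) (path string, badHdr bool) {
	badHdr = true
	if !skipOptimistic {
		badHdr = r.tryUnlocked()
		path = "unlocked"
	}
	if badHdr {
		badHdr = r.tryLocked()
		path = "locked"
	}
	return path, badHdr
}

func main() {
	r := hdrReader{
		tryUnlocked: func() bool { return false }, // pretend the header is valid
		tryLocked:   func() bool { return false },
	}
	// Only Windows gives up the optimistic read; other platforms keep it.
	path, bad := readHdr(r, runtime.GOOS == "windows")
	fmt.Println("path:", path, "badHdr:", bad)
}
```

Deriving the switch from runtime.GOOS keeps the non-Windows behaviour unchanged, which matches the intent of guarding the patch to Windows.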
  • Option 4: Runtime/build knob to force heap wal-index on Windows.

    • Introduce a build tag or env flag (e.g., SQLITE_WIN_HEAP_WALINDEX=1) that, when enabled, marks SHM as unreliable and uses heap for the wal-index. Applies only on Windows; avoids *-shm mmaps entirely.
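A sketch of how the opt-in from Option 4 could be evaluated, assuming the environment variable name suggested above. heapWalIndexEnabled is a hypothetical helper; modernc.org/sqlite does not currently provide such a switch, and the code that would actually mark SHM as unreliable is not shown.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// heapWalIndexEnabled reports whether the hypothetical opt-in applies:
// only on Windows, and only when the flag is set to "1".
func heapWalIndexEnabled(goos, envValue string) bool {
	return goos == "windows" && envValue == "1"
}

func main() {
	// SQLITE_WIN_HEAP_WALINDEX is the name proposed in this issue, not an existing option.
	enabled := heapWalIndexEnabled(runtime.GOOS, os.Getenv("SQLITE_WIN_HEAP_WALINDEX"))
	fmt.Println("heap wal-index:", enabled)
}
```

Passing the OS and the flag value as parameters keeps the decision testable on any platform.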

What we can contribute

  • I can raise this on the SQLite forum.
  • I will try to set up a Windows staging environment to reproduce the issue.

Please advise which of these approaches you would prefer.

Edited by Roman