Windows: crash on WAL startup in v1.38.0+ due to missing SEH in ccgo build (v1.37.0 OK)
Summary
- After upgrading to `modernc.org/sqlite v1.38.0` (SQLite 3.50.1), our user's Windows Server 2019 deployment crashes at startup when opening a database in WAL mode. Pinning to `v1.37.0` avoids the problem.
- Root cause: SQLite 3.50.1 relies on Windows SEH around the “optimistic” unlocked read of the WAL-index header. The ccgo build is compiled with `SQLITE_OMIT_SEH`, so Windows I/O exceptions from memory-mapped `*-shm` reads are not caught and crash the process. In upstream C builds, SEH catches these exceptions and SQLite retries or recovers internally, so there is no crash.
Environment
- OS: Windows Server 2019
- DB: WAL mode, files on a secondary data volume
- Likely filter drivers on that volume (AV/EDR/backup/snapshot/dedup/compression), which increase the chance of transient in-page faults on memory-mapped I/O (we have asked the user for more details)
Observed Crash
- Fatal on first prepare/open:
  - `signal 0xc0000006` (STATUS_IN_PAGE_ERROR)
  - faulting in `memcpy` during wal-index header read from `*-shm`
- Top of stack (example):

      unexpected fault address 0x20e69690000
      [signal 0xc0000006 code=0x0 addr=0x20e69690000 pc=...]
      modernc.org/libc.Xmemcpy(...)
      modernc.org/sqlite/lib._walIndexTryHdr(...)
      modernc.org/sqlite/lib._walIndexReadHdr(...)
      modernc.org/sqlite/lib._walTryBeginRead(...)
      modernc.org/sqlite/lib._walBeginReadTransaction(...)
      modernc.org/sqlite/lib._sqlite3PagerSharedLock(...)
      modernc.org/sqlite/lib.Xsqlite3_prepare_v3(...)

- The fault address is 64KB-aligned (typical `MapViewOfFile` granularity), consistent with a page fault in the mapped `*-shm` view during the “optimistic” unlocked read.
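
The alignment claim can be checked with a few lines of Go. This is only a sketch: the 64 KiB constant is hard-coded as an assumption about the usual Windows allocation granularity, and `isGranularityAligned` is a hypothetical helper name, not part of any library.

```go
package main

import "fmt"

// allocationGranularity is assumed to be 64 KiB, the usual Windows
// allocation granularity for mapped views. It is hard-coded for illustration.
const allocationGranularity = 0x10000

// isGranularityAligned reports whether addr sits on a 64 KiB boundary.
func isGranularityAligned(addr uint64) bool {
	return addr%allocationGranularity == 0
}

func main() {
	// Fault address copied from the stack trace in this report.
	const faultAddr uint64 = 0x20e69690000
	fmt.Printf("%#x aligned to 64 KiB: %v\n", faultAddr, isGranularityAligned(faultAddr))
}
```

An aligned address only suggests that the fault hit the start of a mapped view; it does not prove which view was involved.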
Why this is upstream-related but only crashes in the ccgo build
- In SQLite C (3.50.1–3.50.4, Windows):
  - `walIndexReadHdr` first does an unlocked read (two `memcpy` reads of 48 bytes) from the mapped `*-shm` header, then retries under a lock if needed.
  - The retry path sets `writeLock = 2` specifically so `walHandleException()` will unlock if an SEH exception is thrown.
  - The read is inside `__try/__except` (SEH). If Windows raises `STATUS_IN_PAGE_ERROR` (e.g., due to a transient filter-driver/SMB/dedup fault or a truncate), SQLite catches it and returns `WAL_RETRY`/`SQLITE_IOERR`. No crash; the library retries or recovers internally.
- In the Go-translated ccgo build:
  - Built with `SQLITE_OMIT_SEH` (as indicated in the generated file banner), so Windows exceptions are not caught. The unlocked `memcpy` can raise `0xC0000006` and terminate the process before SQLite’s retry logic can run.
- Version note: v1.38.0 (upgrade to SQLite 3.50.1) explicitly sets `writeLock = 2` and expects SEH. v1.37.0 did not exhibit this crash on our host.
Expected Behavior
- On Windows, transient mmapped I/O faults during WAL header reads should not crash the process; they should be handled internally (converted to `WAL_RETRY`/`SQLITE_IOERR`) as in upstream SQLite.
Proposed fixes (SQLite C upstream and ccgo wrapping)
SQLite C upstream
- Option 1: Compile-time switch to skip the optimistic unlocked header read on Windows when SEH is unavailable.
  - In `walIndexReadHdr` (wal.c), under `#if SQLITE_OS_WIN && defined(SQLITE_OMIT_SEH)` (`SQLITE_OS_WIN` is defined as 0 on other platforms, so it is tested by value rather than with `defined()`), set the initial attempt to “failed” (e.g., `badHdr = 1;`) so the code immediately takes the locked retry path (no unlocked `memcpy` from `*-shm`).
  - Alternatively, introduce a dedicated macro (e.g., `SQLITE_WAL_SKIP_OPTIMISTIC_READ`) and enable it by default when `SQLITE_OMIT_SEH && SQLITE_OS_WIN`.
  - Benefit: preserves upstream’s behavior everywhere else; avoids SEH-only assumptions when SEH is not present.
- Option 2: Compile-time switch to force heap wal-index when SEH is unavailable on Windows.
  - In early WAL-open/read paths, when `SQLITE_OMIT_SEH && SQLITE_OS_WIN`, set `pWal->bShmUnreliable = 1` and `pWal->exclusiveMode = WAL_HEAPMEMORY_MODE` so the wal-index is kept in heap memory instead of mapping `*-shm`.
  - Trade-off: slightly higher per-process memory and reduced multi-process sharing; high reliability for environments with filter drivers/SMB.
This translated repo
- Option 3: Windows-only short-circuit in generated code to skip the unlocked header read.
  - In `lib/sqlite_windows.go`’s `_walIndexReadHdr`, set `badHdr = 1` before the first `_walIndexTryHdr` call (when page0 is mapped), forcing the locked path. Guard behind `GOOS=windows` to keep performance on other OSes.
  - This can be applied as a small post-generation patch or kept as a repo-local change while waiting for an upstream switch.
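
The control flow that Option 3 aims for can be sketched as a small model. It is not the generated code: the `wal` type and its methods are hypothetical stand-ins for the translated `Wal` struct and `_walIndexTryHdr`, and taking the lock is reduced to a comment.

```go
package main

import (
	"fmt"
	"runtime"
)

// wal is a hypothetical stand-in for the translated Wal struct. The counters
// exist only so the chosen path can be observed.
type wal struct {
	unlockedReads int
	lockedReads   int
}

// tryHdrUnlocked models the optimistic read; it returns true for a bad header.
func (w *wal) tryHdrUnlocked() bool { w.unlockedReads++; return false }

// tryHdrLocked models the same read performed while holding the lock.
func (w *wal) tryHdrLocked() bool { w.lockedReads++; return false }

// readHdr models the header-read flow. With skipOptimistic set, the first
// attempt is treated as failed, so only the locked path touches the header.
func (w *wal) readHdr(skipOptimistic bool) (ok bool) {
	badHdr := true
	if !skipOptimistic {
		badHdr = w.tryHdrUnlocked()
	}
	if badHdr {
		// The real code acquires the write lock here before retrying.
		badHdr = w.tryHdrLocked()
	}
	return !badHdr
}

func main() {
	w := &wal{}
	ok := w.readHdr(runtime.GOOS == "windows")
	fmt.Println("ok:", ok, "unlocked:", w.unlockedReads, "locked:", w.lockedReads)
}
```

The model only shows which path is taken. It does not show whether a read under the lock is itself immune to in-page faults, which would need to be confirmed on an affected host.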
- Option 4: Runtime/build knob to force heap wal-index on Windows.
  - Introduce a build tag or env flag (e.g., `SQLITE_WIN_HEAP_WALINDEX=1`) that, when enabled, marks SHM as unreliable and uses heap for the wal-index. Applies only on Windows; avoids `*-shm` mmaps entirely.
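
A minimal sketch of how the env-flag variant of Option 4 might be evaluated follows. The variable name `SQLITE_WIN_HEAP_WALINDEX` is the one proposed above; `forceHeapWalIndex` is a hypothetical helper, and how its result would be wired into the translated library is left open.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// forceHeapWalIndex reports whether the wal-index should be kept on the heap.
// It is true only on Windows and only when the proposed flag is set to "1".
// goos and getenv are parameters so the decision can be exercised on any host.
func forceHeapWalIndex(goos string, getenv func(string) string) bool {
	return goos == "windows" && getenv("SQLITE_WIN_HEAP_WALINDEX") == "1"
}

func main() {
	fmt.Println("force heap wal-index:", forceHeapWalIndex(runtime.GOOS, os.Getenv))
}
```

Passing the OS name and the lookup function as arguments keeps the check independent of the machine it runs on; a build tag would move the same decision to compile time.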
What we can contribute
- I can raise this on the SQLite forum.
- I will try to set up a Windows staging environment to reproduce the issue.
Please advise which approach you think is the better option.
Edited by Roman