Commit adff4b17 authored by cznic's avatar cznic
Browse files

Merge branch 'pcache-shared-cache-draft' into 'master'

pcache: per-cache mutex for -race cleanliness under cache=shared

See merge request !131
parents 14e5790e 1580c89c
Loading
Loading
Loading
Loading
+2 −0
Original line number Diff line number Diff line
@@ -11,6 +11,8 @@
     - See [GitLab merge request #127](https://gitlab.com/cznic/sqlite/-/merge_requests/127), thanks Ian Chechin!
     - Tighten the `modernc.org/sqlite/pcache` reference implementation per cznic's !127 review follow-ups. Adds `Stats.EasyRefusals`, a per-Pool counter for the cases where `FetchCreateEasy` returns nil at cap; SQLite reacts to a refusal by spilling dirty pages and retrying with `FetchCreateForce`, so the new field is a direct proxy for the I/O pressure the strict Easy contract imposes vs pcache1's recycle-without-spill behavior. `BenchmarkPoolEvictionChurn` was reworked to drive a rotating-residue DELETE (`k % 3 = i % 3`) and re-insert a matching batch each cycle so the spill pressure recurs and `easy-refusals/op` scales with `b.N` instead of capping at the seed's one-time first-cycle cost; both existing benchmarks now report `easy-refusals/op` alongside the page-allocs/evictions metrics. `Stats.Evictions` documentation was tightened to match the actual behavior (counts LRU eviction, `Unpin(discard=true)`, `Shrink` releases, and `Unpin(discard=false)` trimming back to target after a `FetchCreateForce` overcommit; bulk frees from `Truncate`, `Rekey` collisions, and `Destroy` are not counted). The `TestPoolRoundTripIntegrity` comment claiming the workload exercises `xRekey` ~15 times has been corrected; the SQL surface does not reliably emit xRekey here, and that codepath is covered by the unit tests instead.
     - See [GitLab merge request #130](https://gitlab.com/cznic/sqlite/-/merge_requests/130), thanks Ian Chechin!
     - Make `modernc.org/sqlite/pcache` `-race`-clean under SQLite's `cache=shared` mode. The pool already runs correctly under shared-cache because every callback into a given `Cache` is serialised internally by SQLite's `sqlite3BtreeEnter` on the `BtShared` mutex; verified empirically with a lock-free in-flight probe (max-in-flight = 1 on the canonical two-connection workload, 4 on a positive control with goroutines hitting the cache directly). However the Go race detector does not recognise SQLite's libc mutex as a happens-before edge and reports false-positive races on `Fetch` vs `Unpin` reads/writes of the per-cache state, which surfaces as `DATA RACE` failures for any user who registers the pool and runs their suite under `-race`. A `sync.Mutex` on the `cache` type is now taken on every public method (`SetSize`, `PageCount`, `Fetch`, `Unpin`, `Rekey`, `Truncate`, `Destroy`, `Shrink`), always. On the common non-shared-cache path the lock is uncontended (one atomic CAS per Lock/Unlock pair, negligible next to the SQLite work it bookends); on the shared-cache path it just rubber-stamps the order SQLite's `BtShared` mutex already established. A new `e2e_test.go` `TestSharedCacheTwoConns_Integrity` drives two `sql.Conn` against the same `cache=shared` URI with concurrent writers and asserts `PRAGMA integrity_check = ok` under `-race`; passes cleanly with the lock, would surface the false-positive without it. Design notes live in `pcache/sharing.go`.
     - See [GitLab merge request #131](https://gitlab.com/cznic/sqlite/-/merge_requests/131), thanks Ian Chechin!
     - Add an opt-in `_dqs` DSN query parameter that disables SQLite's double-quoted string literal compatibility quirk on a per-connection basis. When `_dqs=0` (or any `strconv.ParseBool` false value) is supplied, the driver calls `sqlite3_db_config` with `SQLITE_DBCONFIG_DQS_DDL` and `SQLITE_DBCONFIG_DQS_DML` set to off before any statement is prepared, so a double-quoted identifier that fails to resolve raises a parse error instead of silently falling back to a string literal. Absence of the parameter, or `_dqs=1`, leaves SQLite's default behavior unchanged; existing DSNs continue to work byte-for-byte. Resolves [GitLab issue #61](https://gitlab.com/cznic/sqlite/-/issues/61).
     - See [GitLab merge request #128](https://gitlab.com/cznic/sqlite/-/merge_requests/128), thanks Ian Chechin!
     - Add an opt-in `_error_rc` DSN query parameter for clearer error reporting on open-time failures. When `_error_rc=1` (or any `strconv.ParseBool` true value) is supplied, error strings synthesised from a `(rc, db)` pair only append `sqlite3_errmsg(db)` when `sqlite3_extended_errcode(db)` is consistent with the operation rc (full match first, primary code `&0xff` as fallback). On mismatch the canonical `sqlite3_errstr(rc)` is used alone, so an open-time `SQLITE_CANTOPEN` no longer carries the temporary handle's stale "out of memory" errmsg. Absence of the parameter, or `_error_rc=0`, preserves the legacy "errstr: errmsg" form byte-for-byte; existing callers that parse error strings are unaffected. The driver's `*Error.Code()` returns the same SQLite result code in both modes. Parsed before `sqlite3_open_v2` so open-time errors are covered. Resolves [GitLab issue #230](https://gitlab.com/cznic/sqlite/-/issues/230).
+110 −0
Original line number Diff line number Diff line
@@ -202,6 +202,116 @@ func TestPoolMultipleDatabases(t *testing.T) {
	}
}

// TestSharedCacheTwoConns_Integrity exercises the cache=shared
// invariant the per-cache sync.Mutex was added to defend. Two sql.Conn
// values opened against the same cache=shared URI share one engine
// pCache* and therefore one of our cache instances; concurrent
// writers from two goroutines (each on its own pinned Conn) drive
// concurrent Fetch / Unpin / Rekey / Truncate callbacks into that
// shared cache. PRAGMA integrity_check on a third connection at the
// end verifies the database is well-formed.
//
// The substantive assertion is that this test runs cleanly under
// -race. Without the per-cache mutex the Go race detector reports
// false-positive races on cache.pages / page.pinned / page.lruElem
// even though SQLite's BtShared mutex already serialises the
// callbacks; with the mutex the detector sees the happens-before
// edge and the run is clean. See sharing.go for the design notes.
func TestSharedCacheTwoConns_Integrity(t *testing.T) {
	if testing.Short() {
		t.Skip("skipped under -short; this test drives concurrent writers and is meant for the full + -race runs")
	}
	// Shared cache + WAL on a real file. SQLite shares one pCache*
	// across every connection opened against the same URI when
	// cache=shared is in the query string and the path is otherwise
	// identical, so both db.Conn() calls below feed into one Cache.
	dbPath := filepath.Join(t.TempDir(), "shared.db")
	dsn := "file:" + dbPath + "?cache=shared&_pragma=journal_mode(WAL)&_pragma=busy_timeout(5000)"
	db, err := sql.Open("sqlite", dsn)
	if err != nil {
		t.Fatalf("sql.Open: %v", err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE t (k INTEGER PRIMARY KEY, v BLOB)`); err != nil {
		t.Fatalf("CREATE TABLE: %v", err)
	}

	const (
		writers      = 2
		rowsPerLoop  = 200
		writerLoops  = 3 // total per-writer = rowsPerLoop * writerLoops = 600
		blobBytes    = 64
		cacheSizePgs = 16
	)

	errs := make(chan error, writers)
	for w := 0; w < writers; w++ {
		w := w
		go func() {
			conn, err := db.Conn(t.Context())
			if err != nil {
				errs <- fmt.Errorf("writer[%d] Conn: %w", w, err)
				return
			}
			defer conn.Close()
			if _, err := conn.ExecContext(t.Context(),
				fmt.Sprintf(`PRAGMA cache_size=%d`, cacheSizePgs)); err != nil {
				errs <- fmt.Errorf("writer[%d] cache_size: %w", w, err)
				return
			}
			blob := make([]byte, blobBytes)
			for loop := 0; loop < writerLoops; loop++ {
				for i := 0; i < rowsPerLoop; i++ {
					blob[0] = byte(loop)
					blob[1] = byte(w)
					blob[2] = byte(i)
					if _, err := conn.ExecContext(t.Context(),
						`INSERT INTO t(v) VALUES (?)`, blob); err != nil {
						errs <- fmt.Errorf("writer[%d] loop[%d] INSERT[%d]: %w",
							w, loop, i, err)
						return
					}
				}
				if _, err := conn.ExecContext(t.Context(),
					`DELETE FROM t WHERE k % 7 = ?`, int64(w)); err != nil {
					errs <- fmt.Errorf("writer[%d] loop[%d] DELETE: %w",
						w, loop, err)
					return
				}
			}
			errs <- nil
		}()
	}
	for w := 0; w < writers; w++ {
		if err := <-errs; err != nil {
			t.Fatalf("writer: %v", err)
		}
	}

	// integrity_check on a fresh connection: this is the
	// gold-standard "the cache did not silently corrupt anything"
	// assertion. It iterates every page and reports problems by
	// name, so a single "ok" row means clean.
	rows, err := db.Query(`PRAGMA integrity_check`)
	if err != nil {
		t.Fatalf("integrity_check: %v", err)
	}
	defer rows.Close()
	for rows.Next() {
		var line string
		if err := rows.Scan(&line); err != nil {
			t.Fatalf("integrity_check Scan: %v", err)
		}
		if line != "ok" {
			t.Errorf("integrity_check: %q", line)
		}
	}
	if err := rows.Err(); err != nil {
		t.Fatalf("integrity_check Err: %v", err)
	}
}

// statsDelta returns the field-wise difference between two Stats
// snapshots. Counters are monotonic, so every field is >= 0.
func statsDelta(before, after pcache.Stats) pcache.Stats {
+55 −4
Original line number Diff line number Diff line
@@ -41,6 +41,7 @@ package pcache

import (
	"container/list"
	"sync"
	"sync/atomic"
	"unsafe"

@@ -131,10 +132,42 @@ func (p *Pool) Create(pageSize, extraSize int, purgeable bool) (sqlite.Cache, er
	}, nil
}

// cache is one Pool-issued per-database cache. All methods run
// single-goroutine under the parent package's invariant; no
// per-instance synchronisation is required.
// cache is one Pool-issued per-database cache.
//
// Without SQLite's shared-cache mode every method on a given cache
// runs single-goroutine under the parent package's invariant (every
// connection is opened SQLITE_OPEN_FULLMUTEX, no shared cache is
// enabled at the engine level, and database/sql never uses one
// driver.Conn from two goroutines), so no per-instance synchronisation
// is needed for correctness.
//
// Under cache=shared, the engine reuses one pCache* across every
// Btree opened against the same shared-cache scope. SQLite serialises
// those calls internally through sqlite3BtreeEnter on a shared BtShared
// mutex, so the impl never actually races; verified empirically with a
// lock-free in-flight probe (max-in-flight = 1 on the canonical two-
// connection workload, 4 on a positive control with goroutines hitting
// the cache directly). But the Go race detector does not recognise
// SQLite's libc mutex as a happens-before edge and flags the Fetch
// vs Unpin reads/writes on pinned and lruElem as false-positive
// data races.
//
// mu serialises every public method on the cache. It is always
// taken, regardless of whether the parent connection opted into
// shared-cache mode, because the binding has no signal to switch on
// at Pool.Create time. The lock is uncontended on the common
// non-shared-cache path (where SQLite's own invariants already
// guarantee single-goroutine access), so the cost is one
// uncontended atomic CAS per callback - negligible next to the
// SQLite work the callback bookends. The win is that pcache users
// who also opt into cache=shared and run their test suite under
// -race see -race-clean behaviour rather than spurious DATA RACE
// failures; the alternative would be to either document the false
// positive or refuse to serve a shared-cache parent connection at
// Create, both worse trade-offs.
type cache struct {
	mu sync.Mutex

	pool                *Pool
	pageSize, extraSize int
	purgeable           bool
@@ -177,6 +210,8 @@ func (p *page) Extra() unsafe.Pointer { return unsafe.Pointer(p.extra) }
// new target is below the current count. A zero or negative target is
// treated as "unbounded" and disables eviction-on-trim.
func (c *cache) SetSize(n int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n < 0 {
		n = 0
	}
@@ -186,7 +221,11 @@ func (c *cache) SetSize(n int) {

// PageCount returns the total number of pages held, pinned plus
// unpinned.
func (c *cache) PageCount() int { return len(c.pages) }
func (c *cache) PageCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.pages)
}

// Fetch implements the [sqlite.Cache] Fetch contract.
//
@@ -203,6 +242,8 @@ func (c *cache) PageCount() int { return len(c.pages) }
// overcommits and returns a fresh page; pcache1 behaves identically
// and SQLite needs the page to make forward progress.
func (c *cache) Fetch(key uint32, mode sqlite.FetchMode) sqlite.Page {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.destroyed {
		return nil
	}
@@ -252,6 +293,8 @@ func (c *cache) Fetch(key uint32, mode sqlite.FetchMode) sqlite.Page {
// is only ever called with discard=true in practice, so this branch
// is a safety net rather than a hot path.
func (c *cache) Unpin(pg sqlite.Page, discard bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.destroyed {
		return
	}
@@ -274,6 +317,8 @@ func (c *cache) Unpin(pg sqlite.Page, discard bool) {
// moment of the call; the cache discards it here so the binding's
// stale-stub bookkeeping has nothing left to point at.
func (c *cache) Rekey(pg sqlite.Page, oldKey, newKey uint32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.destroyed {
		return
	}
@@ -294,6 +339,8 @@ func (c *cache) Rekey(pg sqlite.Page, oldKey, newKey uint32) {
// entry whose key is greater than or equal to limit is released,
// including pinned entries.
func (c *cache) Truncate(limit uint32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.destroyed {
		return
	}
@@ -310,6 +357,8 @@ func (c *cache) Truncate(limit uint32) {
// from the binding become no-ops. The parent binding will not call
// any other method after Destroy.
func (c *cache) Destroy() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.destroyed {
		return
	}
@@ -331,6 +380,8 @@ func (c *cache) Destroy() {
// Shrink drops every unpinned page, regardless of targetSize. SQLite
// uses this as a memory-pressure hint.
func (c *cache) Shrink() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.destroyed {
		return
	}

pcache/sharing.go

0 → 100644
+72 −0
Original line number Diff line number Diff line
// Copyright 2026 The Sqlite Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package pcache

// This file documents how the pcache pool behaves under SQLite's
// shared-cache mode (cache=shared URI parameter or the deprecated
// sqlite3_enable_shared_cache flag). The implementation lives in
// pool.go on the cache type's mu sync.Mutex; this file exists as the
// design-record companion to that lock.
//
// Background. The !127 pcache implementation was deliberately scoped
// to one pCache per connection: SQLITE_OPEN_FULLMUTEX + no shared
// cache + database/sql's one-conn-one-goroutine invariant meant every
// callback on a given cache ran single-goroutine. cznic flagged on
// the !127 merge that this assumption would need to be revisited
// when shared-cache support landed.
//
// What changes under cache=shared. The engine reuses one pCache*
// across every Btree opened against the same shared-cache scope, so
// our pcache callbacks may be invoked from different goroutines in
// quick succession. They do not actually overlap: SQLite serialises
// every callback through sqlite3BtreeEnter on the BtShared mutex
// (SQLITE_THREADSAFE=1, the default for modernc.org/sqlite). The
// pcache pool is therefore correct under cache=shared by virtue of
// SQLite's own serialisation; verified empirically with a lock-free
// in-flight probe (max-in-flight = 1 on the canonical two-connection
// shared-cache workload, 4 on a positive control where goroutines
// hit the cache directly without SQLite's mutex).
//
// Why the per-cache sync.Mutex is taken anyway. The Go race detector
// does not recognise SQLite's libc mutex as a happens-before edge.
// Without an additional Go-level synchronisation primitive it reports
// the Fetch vs Unpin reads/writes on cache.pages / page.pinned /
// page.lruElem as DATA RACE, even though the SQLite-level mutex
// guarantees they never actually overlap. The per-cache sync.Mutex
// on the cache type, always taken, hands the race detector the
// happens-before edge it needs and turns the false-positive into
// a clean run.
//
// Cost. On the common (non-shared-cache) path every callback runs
// single-goroutine anyway, so the lock is always uncontended:
// roughly one atomic CAS per Lock/Unlock pair, negligible next to
// the SQLite work the callback bookends. On the shared-cache path
// the lock is also nearly uncontended because SQLite's BtShared
// mutex has already serialised the callers; the Go lock just
// rubber-stamps the order that already happened.
//
// Alternatives considered.
//
//   - Always-taken vs conditional (per-instance) locking. A
//     conditional lock would skip the Lock/Unlock pair on
//     non-shared-cache connections, but the binding has no signal
//     at Pool.Create time to know which scope a connection belongs
//     to (the PageCache.Create signature passes pageSize/extraSize/
//     purgeable, not a cache-shared bit, and extending it would
//     break the sqlite.PageCache contract from !126). The
//     uncontended cost was not worth the API churn.
//
//   - Document-as-unsupported + reject shared-cache parent at
//     Create. Smaller surface but a pcache user who registers the
//     pool and then opens a cache=shared db would get a confusing
//     post-registration failure. Worse, the impl is already correct
//     under cache=shared by SQLite's own serialisation, so refusing
//     to serve the workload would be conservatism without payoff.
//
// Test surface. e2e_test.go carries TestSharedCacheTwoConns_Integrity,
// the canonical two-conn shared-cache workload (concurrent writers +
// PRAGMA integrity_check) under -race. It passes on the always-locked
// pool and fails (race detector report) if the per-cache mutex is
// removed.