Server Push Diaries with Cuckoo Filters
Server Push faces a problem of over-pushing. This can mean:
- unused assets being pushed,
- already cached assets being pushed again, or
- duplicate assets being pushed repeatedly.
These problem would be addressed by the proposed Cache Digests HTTP/2 extension. Using probabilistic data structures like Bloom Filters, Golomb Coded Sets, or Cuckoo Filters, a condensed representation of the browser cache would be shared by the browser and server. Knowing what the browser already has cached, the server could avoid wasting bandwidth on unnecessary Server Push streams.
Sadly real-world adoption has been minimal by browser vendors, despite efforts by myself and others to shim this with HTTP cookies and Service Workers to prove the technique's effectiveness. Since browser-based Cache Digests are not moving forward very much in the past 2 years, another approach is to implement a push diary on the server side.
HTTP/2 sessions are intended to be very long lived. In the case of popular CDNs, sessions may even be re-used across many domains. This allows reasonable tracking of which assets are already pushed to the end user during that session.
Apache httpd web server implemented this some time ago and enables it by default, so it seems to work.
This patch implements a diary for the Commons Host server. Since the size of each diary is only a few bits per entry, hundreds of pushed assets can be stored efficiently. Lookups are in the millions per second using modern hardware. The minimal memory and CPU cost on the server is vastly outweighed by otherwise wasted network round trips.
Best of all, this feature is 100% transparent to both end users and web developers deploying their website with a Server Push Manifest. It just works.
Future plans: Currently Commons Host TLS certificates only list a single domain per site. Once multi-site certificates are implemented, browsers will automatically re-use HTTP/2 sessions and the push diary will be even more effective.
Diary Size Selection
Max size of the diary does not mean it will ever be fully occupied, and lookup/insert performance varies by occupancy rate, so it may be good to over-provision memory.
Memory size of the Cuckoo Filter diary grows in steps. Table shows the maximum number of items per step-increase of the memory size in bytes.
Items | Bytes |
---|---|
3 | 6 |
7 | 12 |
15 | 24 |
30 | 48 |
61 | 96 |
122 | 192 |
245 | 384 |
491 | 768 |
983 | 1536 |
1966 | 3072 |
3932 | 6144 |
7864 | 12288 |
Playground Code
const CuckooFilter = require('cuckoofilter-native')
const { randomBytes } = require('crypto')
// Check the memory cost of different sized sketches.
let lastSize = new CuckooFilter(0).bytes
console.log('| Items | Bytes |')
console.log('|-------|-------|')
for (let i = 1; i < 10000; i++) {
const sketch = new CuckooFilter(i)
if (lastSize === sketch.bytes) continue
console.log(`| ${String(i - 1).padStart(5)} | ${String(lastSize).padStart(5)} |`)
lastSize = sketch.bytes
}
// Check the behaviour of filling up the sketch.
const maxSize = 256
const sketch = new CuckooFilter(maxSize)
let tooFull = 0
let hits = 0
let insertions = 0
let overruns = 0
for (let i = 0; i < maxSize * 20; i++) {
const value = randomBytes(1000).toString('hex')
// const value = String(i).repeat(10000)
if (sketch.size > maxSize) {
console.log(sketch.size, 'size')
overruns++
}
if (sketch.contain(value)) {
hits++
// console.log(`HIT ${i} - Size: ${sketch.size} (${sketch.bytes} bytes)`)
} else {
try {
sketch.add(value)
} catch (error) {
tooFull++
// console.log(error.message)
continue
}
insertions++
// console.log('INSERTED')
}
}
console.log(`${overruns} times overrun`)
console.log(`${hits} times hit`)
console.log(`${insertions} times inserted`)
console.log(`${tooFull} times too full`)
console.log(`Size: ${sketch.size} (${sketch.bytes} bytes)`)
References
- Paper about Cuckoo Filters (CF) http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf
- C++ implementation of CF https://github.com/efficient/cuckoofilter
- Node.js wrapper of the C++ CF library https://github.com/mcollina/cuckoofilter
- Apache httpd implements HTTP/2 Server Push diary to avoid over-push problem https://httpd.apache.org/docs/trunk/mod/mod_http2.html#h2pushdiarysize
- Discussing using Cuckoo Filters for CACHE_DIGEST (CD) HTTP/2 extension frame https://github.com/httpwg/http-extensions/pull/413
- Reference implementation CD with CF in C++ https://github.com/yoavweiss/cache-digests-cuckoo
- Prototype of CD for Node.js https://gitlab.com/commonshost/cache-digest-koel