
Server Push Diaries with Cuckoo Filters

Sebastiaan Deckers requested to merge sebdeckers/server:diaries into master

Server Push faces a problem of over-pushing. This can mean:

  1. unused assets being pushed,
  2. already cached assets being pushed again, or
  3. duplicate assets being pushed repeatedly.

These problems would be addressed by the proposed Cache Digests HTTP/2 extension. Using a probabilistic data structure like a Bloom Filter, Golomb Coded Set, or Cuckoo Filter, a condensed representation of the browser cache would be shared by the browser with the server. Knowing what the browser already has cached, the server could avoid wasting bandwidth on unnecessary Server Push streams.
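To illustrate the idea, here is a minimal sketch of the server-side decision. A plain `Set` stands in for the probabilistic digest, and the `shouldPush` helper is a made-up name for this example, not part of any real API:

```javascript
// A plain Set stands in for the browser's probabilistic cache digest
// (Bloom/Cuckoo filter in the real proposal).
const browserDigest = new Set([
  '/styles/main.css',
  '/scripts/app.js'
])

// Hypothetical helper: the server consults the digest before opening
// a push stream, and only pushes assets the browser has not cached.
function shouldPush (digest, asset) {
  return !digest.has(asset)
}

console.log(shouldPush(browserDigest, '/styles/main.css')) // false: already cached
console.log(shouldPush(browserDigest, '/images/logo.svg')) // true: worth pushing
```

Note that a real probabilistic digest admits false positives, so a cached asset is never pushed twice, but an uncached asset may occasionally be skipped.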

Sadly, real-world adoption by browser vendors has been minimal, despite efforts by myself and others to shim the technique with HTTP cookies and Service Workers to prove its effectiveness. Since browser-based Cache Digests have made little progress over the past two years, another approach is to implement a push diary on the server side.

HTTP/2 sessions are intended to be very long-lived. In the case of popular CDNs, sessions may even be re-used across many domains. This makes it practical to track which assets have already been pushed to the end user during a session.

The Apache httpd web server implemented this some time ago and enables it by default, so the approach is proven in practice.

This patch implements a diary for the Commons Host server. Since the size of each diary is only a few bits per entry, hundreds of pushed assets can be stored efficiently. Lookups are in the millions per second using modern hardware. The minimal memory and CPU cost on the server is vastly outweighed by otherwise wasted network round trips.
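The diary logic can be sketched roughly as follows. This is an illustrative example, not the patch itself: a `Set` stands in for the Cuckoo Filter, `pushOnce` is a hypothetical helper name, and the real server would call the filter's `add`/`contain` methods instead:

```javascript
// Hypothetical sketch of a per-session push diary. One diary per
// HTTP/2 session; a Set stands in for the Cuckoo Filter here.
const diaries = new WeakMap() // session object -> diary

function pushOnce (session, path, push) {
  if (!diaries.has(session)) diaries.set(session, new Set())
  const diary = diaries.get(session)
  if (diary.has(path)) return false // already pushed during this session
  diary.add(path)
  push(path) // open the Server Push stream
  return true
}
```

Because the diary lives only as long as the session, no state needs to be persisted or synchronised, and a false positive merely skips one push.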

Best of all, this feature is 100% transparent to both end users and web developers deploying their website with a Server Push Manifest. It just works.

Future plans: Currently Commons Host TLS certificates only list a single domain per site. Once multi-site certificates are implemented, browsers will automatically re-use HTTP/2 sessions and the push diary will be even more effective.

Diary Size Selection

The maximum size of the diary does not mean it will ever be fully occupied, and since lookup/insert performance varies with the occupancy rate, it may be worth over-provisioning memory.

The memory size of the Cuckoo Filter diary grows in steps. The table below shows the maximum number of items for each step increase of the memory size in bytes.

| Items | Bytes |
|-------|-------|
|     3 |     6 |
|     7 |    12 |
|    15 |    24 |
|    30 |    48 |
|    61 |    96 |
|   122 |   192 |
|   245 |   384 |
|   491 |   768 |
|   983 |  1536 |
|  1966 |  3072 |
|  3932 |  6144 |
|  7864 | 12288 |

Playground Code

const CuckooFilter = require('cuckoofilter-native')
const { randomBytes } = require('crypto')

// Check the memory cost of different sized sketches.

let lastSize = new CuckooFilter(0).bytes
console.log('| Items | Bytes |')
console.log('|-------|-------|')
for (let i = 1; i < 10000; i++) {
  const sketch = new CuckooFilter(i)
  if (lastSize === sketch.bytes) continue
  console.log(`| ${String(i - 1).padStart(5)} | ${String(lastSize).padStart(5)} |`)
  lastSize = sketch.bytes
}

// Check the behaviour of filling up the sketch.

const maxSize = 256
const sketch = new CuckooFilter(maxSize)
let tooFull = 0
let hits = 0
let insertions = 0
let overruns = 0
for (let i = 0; i < maxSize * 20; i++) {
  const value = randomBytes(1000).toString('hex')
  // const value = String(i).repeat(10000)
  if (sketch.size > maxSize) {
    console.log(sketch.size, 'size')
    overruns++
  }
  if (sketch.contain(value)) {
    hits++
    // console.log(`HIT ${i} - Size: ${sketch.size} (${sketch.bytes} bytes)`)
  } else {
    try {
      sketch.add(value)
    } catch (error) {
      tooFull++
      // console.log(error.message)
      continue
    }
    insertions++
    // console.log('INSERTED')
  }
}
console.log(`${overruns} times overrun`)
console.log(`${hits} times hit`)
console.log(`${insertions} times inserted`)
console.log(`${tooFull} times too full`)
console.log(`Size: ${sketch.size} (${sketch.bytes} bytes)`)
