
Server Push Diaries with Cuckoo Filters

Sebastiaan Deckers requested to merge sebdeckers/server:diaries into master

Server Push faces a problem of over-pushing. This can mean:

  1. unused assets being pushed,
  2. already cached assets being pushed again, or
  3. duplicate assets being pushed repeatedly.

These problems would be addressed by the proposed Cache Digests HTTP/2 extension. Using a probabilistic data structure like a Bloom Filter, Golomb Coded Set, or Cuckoo Filter, a condensed representation of the browser cache would be shared by the browser with the server. Knowing what the browser already has cached, the server could avoid wasting bandwidth on unnecessary Server Push streams.
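To illustrate the idea, here is a minimal sketch of the server-side decision. A plain `Set` stands in for the probabilistic digest, and the `shouldPush` helper is a made-up name for this example, not part of any real API:

```javascript
// A plain Set stands in for the browser's probabilistic cache digest
// (Bloom/Cuckoo filter in the real proposal).
const browserDigest = new Set([
  '/styles/main.css',
  '/scripts/app.js'
])

// Hypothetical helper: the server consults the digest before opening
// a push stream, and only pushes assets the browser has not cached.
function shouldPush (digest, asset) {
  return !digest.has(asset)
}

console.log(shouldPush(browserDigest, '/styles/main.css')) // false: already cached
console.log(shouldPush(browserDigest, '/images/logo.svg')) // true: worth pushing
```

Note that a real probabilistic digest admits false positives, so a cached asset is never pushed twice, but an uncached asset may occasionally be skipped.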

Sadly, real-world adoption by browser vendors has been minimal, despite efforts by myself and others to shim the technique with HTTP cookies and Service Workers to prove its effectiveness. Since browser-based Cache Digests have made little progress over the past two years, another approach is to implement a push diary on the server side.

HTTP/2 sessions are intended to be very long-lived. In the case of popular CDNs, sessions may even be re-used across many domains. This makes it practical to track which assets have already been pushed to the end user during a session.

The Apache httpd web server implemented this some time ago and enables it by default, so the approach is proven in practice.

This patch implements a diary for the Commons Host server. Since the size of each diary is only a few bits per entry, hundreds of pushed assets can be stored efficiently. Lookups are in the millions per second using modern hardware. The minimal memory and CPU cost on the server is vastly outweighed by otherwise wasted network round trips.
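The diary logic can be sketched roughly as follows. This is an illustrative example, not the patch itself: a `Set` stands in for the Cuckoo Filter, `pushOnce` is a hypothetical helper name, and the real server would call the filter's `add`/`contain` methods instead:

```javascript
// Hypothetical sketch of a per-session push diary. One diary per
// HTTP/2 session; a Set stands in for the Cuckoo Filter here.
const diaries = new WeakMap() // session object -> diary

function pushOnce (session, path, push) {
  if (!diaries.has(session)) diaries.set(session, new Set())
  const diary = diaries.get(session)
  if (diary.has(path)) return false // already pushed during this session
  diary.add(path)
  push(path) // open the Server Push stream
  return true
}
```

Because the diary lives only as long as the session, no state needs to be persisted or synchronised, and a false positive merely skips one push.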

Best of all, this feature is 100% transparent to both end users and web developers deploying their website with a Server Push Manifest. It just works.

Future plans: Currently Commons Host TLS certificates only list a single domain per site. Once multi-site certificates are implemented, browsers will automatically re-use HTTP/2 sessions and the push diary will be even more effective.

Diary Size Selection

The maximum size of the diary does not mean it will ever be fully occupied, and since lookup/insert performance varies with the occupancy rate, it may be worth over-provisioning memory.

The memory size of the Cuckoo Filter diary grows in steps. The table below shows the maximum number of items for each step increase of the memory size in bytes.

| Items | Bytes |
|-------|-------|
|     3 |     6 |
|     7 |    12 |
|    15 |    24 |
|    30 |    48 |
|    61 |    96 |
|   122 |   192 |
|   245 |   384 |
|   491 |   768 |
|   983 |  1536 |
|  1966 |  3072 |
|  3932 |  6144 |
|  7864 | 12288 |

Playground Code

const CuckooFilter = require('cuckoofilter-native')
const { randomBytes } = require('crypto')

// Check the memory cost of different sized sketches.

let lastSize = new CuckooFilter(0).bytes
console.log('| Items | Bytes |')
console.log('|-------|-------|')
for (let i = 1; i < 10000; i++) {
  const sketch = new CuckooFilter(i)
  if (lastSize === sketch.bytes) continue
  console.log(`| ${String(i - 1).padStart(5)} | ${String(lastSize).padStart(5)} |`)
  lastSize = sketch.bytes
}

// Check the behaviour of filling up the sketch.

const maxSize = 256
const sketch = new CuckooFilter(maxSize)
let tooFull = 0
let hits = 0
let insertions = 0
let overruns = 0
for (let i = 0; i < maxSize * 20; i++) {
  const value = randomBytes(1000).toString('hex')
  // const value = String(i).repeat(10000)
  if (sketch.size > maxSize) {
    console.log(sketch.size, 'size')
    overruns++
  }
  if (sketch.contain(value)) {
    hits++
    // console.log(`HIT ${i} - Size: ${sketch.size} (${sketch.bytes} bytes)`)
  } else {
    try {
      sketch.add(value)
    } catch (error) {
      tooFull++
      // console.log(error.message)
      continue
    }
    insertions++
    // console.log('INSERTED')
  }
}
console.log(`${overruns} times overrun`)
console.log(`${hits} times hit`)
console.log(`${insertions} times inserted`)
console.log(`${tooFull} times too full`)
console.log(`Size: ${sketch.size} (${sketch.bytes} bytes)`)
