
OpenGL coalesce vertex buffer copies

Yon Uriarte requested to merge kicad3053404/kicad:opengl-coalesce-copy into master

When OpenGL vertex buffers grow, a separate OpenGL copy operation is requested for each item. Consecutive copies can be coalesced: items are sorted by start address and each one is merged with the next where possible. On opening a file this nets an N:1 reduction in OpenGL calls, as there are no holes in the src buffer.
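The merging step can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `CopyRange` and `coalesceCopies` are hypothetical names, and it assumes each pending copy is described by a source offset, a destination offset and a size in bytes. Two ranges are merged only when the second continues the first in both buffers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one pending buffer-to-buffer copy.
struct CopyRange
{
    std::size_t src;   // byte offset in the old buffer
    std::size_t dst;   // byte offset in the new buffer
    std::size_t size;  // number of bytes to copy
};

// Sort by source offset, then fold each range into the previous one when it
// continues that range in both the source and the destination buffer.
std::vector<CopyRange> coalesceCopies( std::vector<CopyRange> aRanges )
{
    std::sort( aRanges.begin(), aRanges.end(),
               []( const CopyRange& a, const CopyRange& b ) { return a.src < b.src; } );

    std::vector<CopyRange> merged;

    for( const CopyRange& r : aRanges )
    {
        if( r.size == 0 )
            continue;  // nothing to copy for empty items

        if( !merged.empty()
            && merged.back().src + merged.back().size == r.src
            && merged.back().dst + merged.back().size == r.dst )
        {
            merged.back().size += r.size;  // contiguous: extend the previous copy
        }
        else
        {
            merged.push_back( r );  // hole in src or dst: start a new copy
        }
    }

    return merged;
}
```

Each entry of the returned vector would then correspond to a single copy call (for example one `glCopyBufferSubData` per entry). When the source buffer has no holes, the whole list collapses into one entry, which matches the `# calls 1` lines in the "After" log.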

It'd be great to get timings on different hw/driver combos; I suspect there will be a lot of variation.

TODO:

  • Benchmark on different hw/driver combos.
  • Remove profiling.
  • Remove #ifdef DO_SORT, it's just for an easier benchmark comparison.

Note: this is on Intel UHD drivers on Win10 (8350 mobile processor), which usually don't use this path, opting instead for a direct memcpy of the mmapped buffers. The check was removed so that they use the API calls. This is faster, but per the comments it might crash some driver versions. Maybe not invoking the drivers 120k times helps stability and the check could be removed.

Benchmark: loading a big test PCB that generates a 1 GB vertex buffer. Each run is 5 buffer reallocations. TL;DR: compare the cumulative timings. The change saves around a second on this combo.

Before: 3 runs

Time to copy 1048576 bytes and 19833 items, # calls 19833 #merges 0 : glCopyBuffer took 39.8498ms
glCopyBuffer post Map took 122.12ms
Time to defrag  2097152 vertices, ( cumulative 163.421 ) : CACHED_CONTAINER::reallocate took 163.421ms

Time to copy 2097152 bytes and 26953 items, # calls 26953 #merges 0 : glCopyBuffer took 65.9292ms
glCopyBuffer post Map took 208.45ms
Time to defrag  4194304 vertices, ( cumulative 439.403 ) : CACHED_CONTAINER::reallocate took 275.982ms

Time to copy 4194304 bytes and 41431 items, # calls 41431 #merges 0 : glCopyBuffer took 63.5321ms
glCopyBuffer post Map took 382.573ms
Time to defrag  8388608 vertices, ( cumulative 888.195 ) : CACHED_CONTAINER::reallocate took 448.792ms

Time to copy 8388608 bytes and 71038 items, # calls 71038 #merges 0 : glCopyBuffer took 174.649ms
glCopyBuffer post Map took 726.257ms
Time to defrag  16777216 vertices, ( cumulative 1792.45 ) : CACHED_CONTAINER::reallocate took 904.258ms

Time to copy 16777216 bytes and 128115 items, # calls 128115 #merges 0 : glCopyBuffer took 349.541ms
glCopyBuffer post Map took 1.41187s
Time to defrag  33554432 vertices, ( cumulative 3556.99 ) : CACHED_CONTAINER::reallocate took 1.76454s





Time to copy 1048576 bytes and 19833 items, # calls 19833 #merges 0 : glCopyBuffer took 38.9023ms
glCopyBuffer post Map took 122.088ms
Time to defrag  2097152 vertices, ( cumulative 162.466 ) : CACHED_CONTAINER::reallocate took 162.466ms

Time to copy 2097152 bytes and 26953 items, # calls 26953 #merges 0 : glCopyBuffer took 55.4274ms
glCopyBuffer post Map took 208.019ms
Time to defrag  4194304 vertices, ( cumulative 427.916 ) : CACHED_CONTAINER::reallocate took 265.45ms

Time to copy 4194304 bytes and 41431 items, # calls 41431 #merges 0 : glCopyBuffer took 64.0795ms
glCopyBuffer post Map took 386.007ms
Time to defrag  8388608 vertices, ( cumulative 879.245 ) : CACHED_CONTAINER::reallocate took 451.329ms

Time to copy 8388608 bytes and 71038 items, # calls 71038 #merges 0 : glCopyBuffer took 175.05ms
glCopyBuffer post Map took 729.114ms
Time to defrag  16777216 vertices, ( cumulative 1785.37 ) : CACHED_CONTAINER::reallocate took 906.129ms

Time to copy 16777216 bytes and 128115 items, # calls 128115 #merges 0 : glCopyBuffer took 348.158ms
glCopyBuffer post Map took 1.40416s
Time to defrag  33554432 vertices, ( cumulative 3540.86 ) : CACHED_CONTAINER::reallocate took 1.75549s





Time to copy 1048576 bytes and 19833 items, # calls 19833 #merges 0 : glCopyBuffer took 39.0761ms
glCopyBuffer post Map took 124.195ms
Time to defrag  2097152 vertices, ( cumulative 164.996 ) : CACHED_CONTAINER::reallocate took 164.996ms

Time to copy 2097152 bytes and 26953 items, # calls 26953 #merges 0 : glCopyBuffer took 67.166ms
glCopyBuffer post Map took 215.705ms
Time to defrag  4194304 vertices, ( cumulative 449.654 ) : CACHED_CONTAINER::reallocate took 284.658ms

Time to copy 4194304 bytes and 41431 items, # calls 41431 #merges 0 : glCopyBuffer took 63.1543ms
glCopyBuffer post Map took 398.557ms
Time to defrag  8388608 vertices, ( cumulative 913.554 ) : CACHED_CONTAINER::reallocate took 463.901ms

Time to copy 8388608 bytes and 71038 items, # calls 71038 #merges 0 : glCopyBuffer took 175.581ms
glCopyBuffer post Map took 757.597ms
Time to defrag  16777216 vertices, ( cumulative 1848.99 ) : CACHED_CONTAINER::reallocate took 935.439ms

Time to copy 16777216 bytes and 128115 items, # calls 128115 #merges 0 : glCopyBuffer took 335.899ms
glCopyBuffer post Map took 1.46708s
Time to defrag  33554432 vertices, ( cumulative 3655.11 ) : CACHED_CONTAINER::reallocate took 1.80612s

After: 1 run


glCopyBuffer pre-sort took 2.5836ms
Time to copy 1048576 bytes and 19833 items, # calls 1 #merges 19833 : glCopyBuffer took 17.9823ms
glCopyBuffer post Map took 71.3982ms
Time to defrag  2097152 vertices, ( cumulative 91.0055 ) : CACHED_CONTAINER::reallocate took 91.0055ms

glCopyBuffer pre-sort took 2.9535ms
Time to copy 2097152 bytes and 26953 items, # calls 1 #merges 26953 : glCopyBuffer took 30.8489ms
glCopyBuffer post Map took 142.968ms
Time to defrag  4194304 vertices, ( cumulative 266.731 ) : CACHED_CONTAINER::reallocate took 175.726ms

glCopyBuffer pre-sort took 5.2545ms
Time to copy 4194304 bytes and 41431 items, # calls 1 #merges 41431 : glCopyBuffer took 45.7286ms
glCopyBuffer post Map took 284.233ms
Time to defrag  8388608 vertices, ( cumulative 598.632 ) : CACHED_CONTAINER::reallocate took 331.901ms

glCopyBuffer pre-sort took 9.3934ms
Time to copy 8388608 bytes and 71038 items, # calls 1 #merges 71038 : glCopyBuffer took 89.9987ms
glCopyBuffer post Map took 564.452ms
Time to defrag  16777216 vertices, ( cumulative 1255.59 ) : CACHED_CONTAINER::reallocate took 656.963ms

glCopyBuffer pre-sort took 17.9027ms
Time to copy 16777216 bytes and 128115 items, # calls 1 #merges 128115 : glCopyBuffer took 230.003ms
glCopyBuffer post Map took 1.10567s
Time to defrag  33554432 vertices, ( cumulative 2594.41 ) : CACHED_CONTAINER::reallocate took 1.33881s
