    Optimize pg_popcount() with AVX-512 instructions. · 792752af
    Nathan Bossart authored
    Presently, pg_popcount() processes data in 32-bit or 64-bit chunks
    when possible.  Newer hardware that supports AVX-512 instructions
    can use 512-bit chunks, which provides a nice speedup, especially
    for larger buffers.  This commit introduces the infrastructure
    required to detect compiler and CPU support for the required
    AVX-512 intrinsic functions, and it adds a new pg_popcount()
    implementation that uses these functions.  If CPU support for this
    optimized implementation is detected at runtime, a function pointer
    is updated so that it is used by subsequent calls to pg_popcount().
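
The runtime dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: every name in it (popcount_dispatch, popcount_choose, popcount_slow, popcount_wide, cpu_supports_wide_popcount) is hypothetical, the "wide" routine is a plain 64-bit-chunk stand-in rather than an AVX-512 implementation, and the CPU check is a stub where a real one would query the processor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable fallback: count set bits one byte at a time. */
static uint64_t popcount_slow(const char *buf, size_t bytes)
{
    uint64_t count = 0;

    for (size_t i = 0; i < bytes; i++)
    {
        unsigned char b = (unsigned char) buf[i];

        while (b)
        {
            b &= (unsigned char) (b - 1);   /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}

/*
 * Stand-in for the optimized routine.  It works in 64-bit chunks and
 * hands the remaining tail bytes to the fallback.  An AVX-512 version
 * would use 512-bit chunks here instead (not shown).
 */
static uint64_t popcount_wide(const char *buf, size_t bytes)
{
    uint64_t count = 0;

    while (bytes >= sizeof(uint64_t))
    {
        uint64_t word;

        memcpy(&word, buf, sizeof(word));   /* avoids unaligned access */
        while (word)
        {
            word &= word - 1;
            count++;
        }
        buf += sizeof(word);
        bytes -= sizeof(word);
    }
    return count + popcount_slow(buf, bytes);
}

/* Hypothetical stub: a real check would inspect CPU feature flags. */
static int cpu_supports_wide_popcount(void)
{
    return 1;
}

static uint64_t popcount_choose(const char *buf, size_t bytes);

/* Callers always go through this pointer; it starts at the chooser. */
uint64_t (*popcount_dispatch)(const char *buf, size_t bytes) = popcount_choose;

/* First call: pick an implementation, repoint, then forward the call. */
static uint64_t popcount_choose(const char *buf, size_t bytes)
{
    popcount_dispatch = cpu_supports_wide_popcount() ? popcount_wide
                                                     : popcount_slow;
    return popcount_dispatch(buf, bytes);
}
```

With this shape, the detection cost is paid once, on the first call; later calls through popcount_dispatch go straight to whichever implementation was selected.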
    
    Most of the existing in-tree calls to pg_popcount() should benefit
    from these instructions, and calls with smaller buffers should at
    least not regress compared to v16.  The new infrastructure
    introduced by this commit can also be used to optimize
    visibilitymap_count(), but that is left for a follow-up commit.
    
    Co-authored-by: Paul Amonson, Ants Aasma
    Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley
    Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com