    RDMA/core: Acquire and release mmap_sem on page range · 3994586f
    Parav Pandit authored
    Currently mmap_sem is read locked while pinning the memory.  In a
    multi-threaded process, holding the mmap_sem lock creates contention
    with other threads that might be registering memory, creating QPs or
    simply doing mmap(), as such operations also require holding the
    mmap_sem write lock.
    
    None of these operations can make forward progress until the memory pin
    operation is completed.  It gets worse if the memory is unpinned
    and/or the memory registration is large (in the GB range).
    
    Therefore, instead of holding mmap_sem for the whole region pinning,
    acquire and release the lock for every few pages.  For example, on x86
    with 4K page size, acquire and release mmap_sem for every 2MB memory
    chunk.
    
    This allows competing threads that need mmap_sem only for a shorter
    duration to make progress.
    
    When memory registration latency is measured using [1] for memory sizes
    ranging from 4K to 48GB, a degradation of <= 1% or 0.5% is noticed.  In
    many runs no difference is seen other than run-to-run variance.
    
    In other targeted tests by users with large memory, the desired
    improvements are seen due to reduced contention on mmap_sem.
    
    [1] https://github.com/paravmellanox/rtool
    
    $ rdma_resource_lat -c 1 -s 48G -a -u L -i 500 -A
    
    This registers pinned memory of sizes from 4K to 48GB, with 500
    iterations for each memory size.
    
    $ rdma_resource_lat -c 1 -s 12G -a -u L -i 500 -t 4
    
    Four competing threads pin memory, each of 12GB size, with 500 iterations.

    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
umem.c 9.01 KB