Avoid using mmap if not required and speed up metrics collection

Paweł Chojnacki requested to merge avoid_using_mmap_if_not_required into master

This MR:

Avoid using MMAP when parsing metrics

When parsing many metrics files to generate the Prometheus endpoint response, we don't actually have to use MMAP. We can instead use the much safer option of reading each file with standard File methods. These operations are also a bit faster, because plain file access requires fewer operations than MMAP.
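As a rough illustration of the read-only path, the sketch below loads a whole metrics file with File.binread and parses it from an ordinary String. The record layout (4-byte key length, key bytes, 8-byte double) and the helper names encode_entry, parse_entries and read_metrics_file are assumptions made for this example, not the actual file format or API.

```ruby
require 'tempfile'

# Hypothetical record layout, assumed for illustration only:
#   4-byte native unsigned key length | key bytes | 8-byte native double
def encode_entry(key, value)
  [key.bytesize, key, value].pack('La*d')
end

# Parse every record out of a binary String that was read in one go.
def parse_entries(data)
  entries = {}
  pos = 0
  while pos < data.bytesize
    key_len = data.byteslice(pos, 4).unpack1('L')
    key = data.byteslice(pos + 4, key_len).force_encoding(Encoding::UTF_8)
    value = data.byteslice(pos + 4 + key_len, 8).unpack1('d')
    entries[key] = value
    pos += 4 + key_len + 8
  end
  entries
end

# No mmap involved: the file is copied into process memory and closed
# before parsing starts, so nothing can unmap it underneath the parser.
def read_metrics_file(path)
  parse_entries(File.binread(path))
end

# Usage: write two records to a temporary file and read them back.
Tempfile.create('metrics') do |file|
  file.binmode
  file.write(encode_entry('requests_total', 3.0))
  file.write(encode_entry('errors_total', 1.0))
  file.flush
  p read_metrics_file(file.path)
end
```

Because the data is a private copy, a concurrent writer can at worst make the snapshot stale; it cannot invalidate the memory being parsed.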

Introduce checks for PID change on each metric operation

When a worker is forked from the master process, all of its metrics are copied over, including file handles and locks that are already in use. Previously we relied on Unicorn hooks to reset the metrics when the process is forked, but that is not sufficient to guard against possible corruption.

Because a metric can fire between the time when the process is forked and when metrics are reset, we cannot guarantee that files will be reinitialized in time.
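A minimal sketch of the idea, assuming a hypothetical PidAwareStore class: every operation compares the current PID with the PID that created the state and reinitializes on mismatch. An in-memory Hash stands in for the per-process metric files, and the pid_provider argument exists only so the check can be exercised without actually forking.

```ruby
# Hypothetical store used for illustration; the Hash stands in for
# per-process metric files that would need to be reopened after fork.
class PidAwareStore
  attr_reader :resets

  def initialize(pid_provider: -> { Process.pid })
    @pid_provider = pid_provider
    @resets = 0
    reinitialize!
  end

  def increment(key, by = 1.0)
    reset_if_forked
    @values[key] += by
  end

  def get(key)
    reset_if_forked
    @values[key]
  end

  private

  # Runs on every operation, so no window exists between fork and reset.
  def reset_if_forked
    return if @pid_provider.call == @owner_pid

    reinitialize!
    @resets += 1
  end

  def reinitialize!
    @owner_pid = @pid_provider.call
    @values = Hash.new(0.0)
  end
end

# Usage: simulate a fork by changing the PID the store observes.
pid = 100
store = PidAwareStore.new(pid_provider: -> { pid })
store.increment('requests_total')
pid = 101                         # "forked" worker
puts store.get('requests_total')  # state inherited from the parent is discarded
```

The check costs one PID comparison per operation, which is the price of not depending on a fork hook having run first.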

Native get_double method

This method atomically reads and parses a double from MMAP'ed memory. With the previous implementation, the underlying memory could be freed between reading and parsing.

The previous Ruby implementation used code similar to

slice(0..7).unpack('d')

This is composed of two methods: slice(0..7), which returns a String that internally points to mmapped memory, and unpack, which parses the string into a double.

It can theoretically happen that, right after slice(0..7) is called and before unpack('d') executes, the underlying memory is freed by another thread. This would cause unpack('d') to access invalid memory, potentially leading to many different errors.
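To make the two steps concrete, here is a small pure-Ruby sketch. A plain String stands in for the mmapped region (an assumption, since a regular String cannot be unmapped by another thread), and the helper names are hypothetical. It reads eight bytes, the size of a native double, and shows a Mutex-based way to make the two steps one critical section, which is the kind of extra locking the native method avoids.

```ruby
# Two-step read: between the slice and the unpack, another thread could
# run and (with real mmapped memory) invalidate what `bytes` points to.
def read_double_two_step(buffer, offset)
  bytes = buffer.byteslice(offset, 8) # step 1: take a view of the bytes
  bytes.unpack1('d')                  # step 2: parse them as a native double
end

# Lock-based alternative: both steps happen while holding the same lock
# that writers and unmappers are assumed to take as well.
def read_double_locked(buffer, lock, offset)
  lock.synchronize { buffer.byteslice(offset, 8).unpack1('d') }
end

# Usage with a plain String standing in for the mmapped region.
region = [1.0, 3.25].pack('d2')
region_lock = Mutex.new
puts read_double_two_step(region, 0)
puts read_double_locked(region, region_lock, 8)
```

A native get_double can do the read and the conversion inside one C function call, so no Ruby-level thread switch can occur between them.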

Native add_entry method

Manual testing showed that adding access synchronization with additional locks to the code that instruments methods in GitLab resulted in a significant slowdown, 10x or more, in the execution time of the whole process.

Porting this method to C allowed us to use the GIL to synchronize access and to use the speed of C to optimize code that is executed whenever a new metric is initialized. That code was also a contributor to the initial warm-up time of code that relies heavily on instrumentation with Prometheus metrics.
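For reference, the operation that needs to be atomic is a check-then-append: look the key up, and if it is missing, append a new entry and remember where its value lives. The sketch below is a pure-Ruby stand-in, assuming a hypothetical EntryBuffer class and entry layout; it uses an explicit Mutex, whereas the native method relies on the GIL being held for the whole C call.

```ruby
# Hypothetical in-memory buffer used for illustration only.
class EntryBuffer
  def initialize
    @lock = Mutex.new
    @data = String.new(encoding: Encoding::BINARY)
    @positions = {} # key => byte offset of its 8-byte value
  end

  # Returns the value offset for key, appending a new entry if absent.
  # The lookup and the append must not be interleaved with other threads,
  # otherwise the same key could be appended twice.
  def add_entry(key, value = 0.0)
    @lock.synchronize do
      @positions[key] ||= begin
        @data << [key.bytesize, key].pack('La*')
        offset = @data.bytesize
        @data << [value].pack('d')
        offset
      end
    end
  end

  def get_double(key)
    @lock.synchronize do
      @data.byteslice(@positions.fetch(key), 8).unpack1('d')
    end
  end

  def size
    @lock.synchronize { @positions.size }
  end
end

# Usage: the second add_entry for the same key is a no-op lookup.
buffer = EntryBuffer.new
first = buffer.add_entry('requests_total', 1.5)
second = buffer.add_entry('requests_total', 9.0)
puts first == second
puts buffer.get_double('requests_total')
```

In Ruby every call pays for acquiring the Mutex; inside a C extension function the interpreter does not switch threads unless the code explicitly releases the GIL, so no separate lock is required.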

Additional changes:

  • refactor metric processing and representation into a separate module to aid optimization and rewriting in C

  • implement bin/parse tool

    meant as a helper to diagnose and parse various data files; it can output either undigested JSON representing all data points or Prometheus text
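The two output modes mentioned for bin/parse can be pictured roughly as follows. This is only a sketch: the entry structure (name, labels, value) and the helper names are assumptions made for illustration and do not describe the tool's real interface or options.

```ruby
require 'json'

# Undigested dump: every data point as-is, one JSON array.
def to_json_dump(entries)
  JSON.generate(entries)
end

# Escape label values the way the Prometheus text format expects:
# backslash, double quote and newline are backslash-escaped.
def escape_label_value(value)
  value.to_s.gsub(/[\\"\n]/) { |c| c == "\n" ? '\n' : "\\#{c}" }
end

# Prometheus text exposition: one `name{label="value"} number` line per sample.
def to_prometheus_text(entries)
  lines = entries.map do |entry|
    labels = entry[:labels].map { |k, v| %(#{k}="#{escape_label_value(v)}") }.join(',')
    labels = "{#{labels}}" unless labels.empty?
    "#{entry[:name]}#{labels} #{entry[:value]}"
  end
  lines.join("\n") + "\n"
end

# Usage with two made-up data points.
entries = [
  { name: 'http_requests_total', labels: { method: 'get' }, value: 3.0 },
  { name: 'process_start_time_seconds', labels: {}, value: 12.5 }
]
puts to_json_dump(entries)
puts to_prometheus_text(entries)
```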

Edited by Paweł Chojnacki
