Towards thread-safety: adding the rubocop-thread_safety cop
The GitLab rails codebase already runs in a multithreaded environment: Sidekiq.
Going forward, we also intend to investigate moving our production fleet across to Puma, a multithreaded application server. This is already possible using GDK by setting the EXPERIMENTAL_PUMA=1
flag when executing gdk run
.
This brings the issue of thread-safety to the fore.
There are several common Ruby coding patterns which we use fairly extensively in our codebase which we should be avoiding in a multithreaded environment.
One of the main offenders here is lazy initialization of global state (||=
). This is fairly common in our code-base, but is not thread safe if used on a global variable.
Just to pick one minor example at random from a Rack middleware:
def self.http_request_total
@http_request_total ||= Gitlab::Metrics.counter(:http_requests_total, 'Request count')
end
def self.rack_uncaught_errors_count
@rack_uncaught_errors_count ||= Gitlab::Metrics.counter(:rack_uncaught_errors_total, 'Request handling uncaught errors count')
end
def self.http_request_duration_seconds
@http_request_duration_seconds ||= Gitlab::Metrics.histogram(:http_request_duration_seconds, 'Request handling execution time',
{}, [0.05, 0.1, 0.25, 0.5, 0.7, 1, 2.5, 5, 10, 25])
end
In this example, two or more threads could instantiate these class variables simultaneously. (Note: the GIL will only ensure that the write to the class variable is atomic, not the entire statement). The last contentious thread to write will "win" while the others will use dangling (overwritten) instances.
To fix this, the threads either need to co-ordinate the instantiation, or, much more simply, this initialization should be made to be not lazy and done once at class initialization time/startup.
This article does a good job of explaining all this: https://lucaguidi.com/2014/03/27/thread-safety-with-ruby/
As a first step in this direction, I'm considering adding some Rubocop style checks for thread-safety (via rubocop-thread_safety
) (with exceptions for now) to the codebase, along with some documentation. This would help improve awareness of this issue in the run up to the puma deploy.
What does everyone think of this?