Object storage verification causing spike in memory usage
Summary
Copied over from a thread in issue #393570. The customer in https://gitlab.zendesk.com/agent/tickets/460058 was seeing this behavior.
Summary of the issue
They had memory spikes every 30 minutes causing their nodes to terminate and be replaced.
This occurred after their upgrade (which I assume behaves similarly to "initial replication", because the secondary has to catch up).
They increased the size of their instance type; the nodes then stopped terminating, but the spikes were still visible in the metrics.
We were able to identify Sidekiq as the cause, but it wasn't apparent from the logs alone which job was responsible.
We then found this issue and tried reducing the concurrency, but that had no effect.
Memory usage only returned to normal once we disabled the geo_object_storage_verification feature flag by running Feature.disable(:geo_object_storage_verification) in the Rails console.
The customer has advised that they have a Geo primary but no Geo secondary at the moment, so I'm not sure whether #24081 is related. I'm actually wondering if the issue was triggered by the lack of a secondary, but I have yet to learn how Geo object storage verification works (reading up on it now).
Steps to reproduce
Based on the above description:
- Set up a GitLab instance with object storage
- Enable object storage verification (enabled by default)
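For reference, a minimal sketch of what "a GitLab instance with object storage" can look like in /etc/gitlab/gitlab.rb, using the consolidated object storage form on Omnibus. The provider, region, and bucket names below are placeholders, not defaults:

```ruby
# /etc/gitlab/gitlab.rb -- illustrative consolidated object storage settings.
# Provider, region, and bucket names are placeholders for this reproduction.
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'region' => 'us-east-1',
  'use_iam_profile' => true
}
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'example-artifacts'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'example-lfs'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'example-uploads'
```

Run `sudo gitlab-ctl reconfigure` after editing for the settings to take effect.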
What is the current bug behavior?
High memory consumption when object storage verification is turned on.
What is the expected correct behavior?
We should not see a significant jump in memory usage when object storage verification is running.
Workaround
Disable object storage verification by disabling the feature flag, either by:
- Running the following command in the gitlab-rails console:

  Feature.disable(:geo_object_storage_verification)

- Or running the following command on the command line on one of the Rails nodes on the primary site:

  sudo gitlab-rails runner 'Feature.disable(:geo_object_storage_verification)'