Review urgent-cpu-bound shard's queues to see what causes NFS reads
In &89 (closed), we're moving our Sidekiq shards to Kubernetes.
On Kubernetes, we can't have shared NFS mounts. Currently our VMs have:
- `/var/opt/gitlab/gitlab-rails/shared/artifacts` - used for CI traces
- `/var/opt/gitlab/gitlab-rails/shared/pages` - used for Pages
- `/var/opt/gitlab/gitlab-ci/builds` - used for CI traces
- `/var/opt/gitlab/gitlab-rails/shared/cache` - used for caching archives (project downloads)
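As a starting point, we could confirm which of these paths are actually NFS-backed on a given VM by parsing `/proc/mounts`. A rough sketch (the helper name and parsing are assumptions, not existing code):

```ruby
# Paths taken from the list above.
SHARED_PATHS = %w[
  /var/opt/gitlab/gitlab-rails/shared/artifacts
  /var/opt/gitlab/gitlab-rails/shared/pages
  /var/opt/gitlab/gitlab-ci/builds
  /var/opt/gitlab/gitlab-rails/shared/cache
].freeze

# Hypothetical helper: given the text of /proc/mounts, return the subset of
# SHARED_PATHS that sit on an NFS mount (fstype "nfs" or "nfs4").
def nfs_backed(mounts_text)
  nfs_mountpoints = mounts_text.each_line.filter_map do |line|
    _device, mountpoint, fstype, = line.split
    mountpoint if fstype&.start_with?("nfs")
  end
  SHARED_PATHS.select do |path|
    nfs_mountpoints.any? { |m| path == m || path.start_with?("#{m}/") }
  end
end
```

On a VM this would be called with `nfs_backed(File.read("/proc/mounts"))`.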
In the future we'd like to migrate the urgent-cpu-bound shard. We can see that it performs some NFS reads, but no writes.
This is curious in itself, and we suspect it's due to build traces, but we should investigate. The shard runs 15 queues, so it should be possible to check them manually:
- Code inspection.
- Local testing - set everything to use object storage, and make `shared/` unreadable by the user running the application. If a worker fails to read a file from `shared/`, it's the problem.
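Instead of (or alongside) making the directory unreadable, we could trap the reads directly so the failure names the culprit. A rough sketch, assuming the standard omnibus `shared/` path - this is not existing code, just an illustration using `Module#prepend` on `File.open`:

```ruby
# Hypothetical: the shared directory we want to catch reads from.
SHARED_DIR = "/var/opt/gitlab/gitlab-rails/shared".freeze

# Prepended onto File's singleton class so File.open calls hit it first.
# Note this only intercepts File.open; direct C-level reads (e.g. some
# File.read paths) would need their own trap.
module SharedReadTrap
  def open(path, *args, &block)
    if path.to_s.start_with?(SHARED_DIR)
      raise "NFS read detected: #{path} (caller: #{caller(1..1).first})"
    end
    super
  end
end

File.singleton_class.prepend(SharedReadTrap)
```

Opens of paths outside `shared/` pass through unchanged, and the raised message includes the caller, which should point straight at the offending worker.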
If that doesn't work, we can try some sort of binary search: split the shard into two, see which half (or both) performs reads, and rebalance the queues until we know which ones are responsible. But that would be a separate issue.
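The bisection above can be sketched as follows. Here `reads_nfs` is a stand-in predicate for the real observation (e.g. watching NFS read metrics for a shard running just that subset of queues) - the function name and shape are assumptions, not existing code:

```ruby
# Hypothetical bisection: given a list of queue names and a predicate that
# reports whether a shard running that subset performs NFS reads, recursively
# narrow down to the individual reading queues. Handles multiple culprits by
# recursing into both halves when both read.
def find_reading_queues(queues, reads_nfs)
  return [] unless reads_nfs.call(queues)
  return queues if queues.size == 1

  half = queues.size / 2
  find_reading_queues(queues[0...half], reads_nfs) +
    find_reading_queues(queues[half..], reads_nfs)
end
```

With 15 queues this needs only a handful of shard rebalances per culprit, rather than 15 one-queue shards.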
cc @jarv