Improve cleanup of gpg-homedirs
What does this MR do?
The gpg-agent that could have been spawned here dies when it sees its socket disappear.
However, it seems that we sometimes fail to delete the homedir, causing the gpg-agent to live on forever. We noticed the failed deletion in http://gitlab.com/gitlab-org/gitlab-foss/issues/36998: there was a race condition during the deletion, where gpg-agent would still be modifying files after we had already called FileUtils.remove_entry.
This MR attempts to delete the directory multiple times, at least 0.1 seconds apart. This is a naive way of trying to make sure we clean up the homedir; we count on gpg-agent to notice and make itself go away.
On a web node, we'll try to clean up the directory for at most 0.5 seconds before failing. In a Sidekiq process, we'll attempt the deletion for up to 2 seconds.
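The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the helper name remove_dir_with_retries, the CleanupError class, and the remover: keyword (there only so the failure path can be exercised) are all hypothetical. Only FileUtils.remove_entry and the 0.1-second spacing come from the description.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical error class, raised when the directory survives every attempt.
CleanupError = Class.new(StandardError)

# Hypothetical helper: tries to delete `dir` repeatedly, sleeping `interval`
# seconds between attempts, and gives up once `timeout` seconds would be exceeded.
# Returns the number of attempts that were needed.
def remove_dir_with_retries(dir, timeout:, interval: 0.1, remover: FileUtils.method(:remove_entry))
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  attempts = 0

  loop do
    attempts += 1

    begin
      remover.call(dir) if File.exist?(dir)
    rescue SystemCallError
      # For example Errno::ENOTEMPTY, when another process wrote into the
      # directory while it was being deleted. Fall through and retry.
    end

    return attempts unless File.exist?(dir)

    # Stop if the next attempt would start after the deadline.
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) + interval > deadline
      raise CleanupError, "could not remove #{dir} within #{timeout}s (#{attempts} attempts)"
    end

    sleep(interval)
  end
end
```

Under these assumptions, a web request would call the helper with timeout: 0.5 and a Sidekiq job with timeout: 2. A caller rescuing CleanupError is where the failure could be reported for visibility.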
When the cleanup fails, we will now track that exception in Sentry to gain some visibility.
This also introduces a gauge that counts how many gpg-keychains we currently have, which should be roughly equivalent to the number of gpg-agent processes.
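To illustrate why such a gauge tracks leaked homedirs, here is a small sketch. It assumes a hypothetical in-process KeychainGauge class and a hypothetical with_tmp_keychain helper. The actual MR reports to the application's metrics system, whose API is not shown here.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical in-process gauge; a real deployment would report to a metrics backend.
class KeychainGauge
  def initialize
    @mutex = Mutex.new
    @value = 0
  end

  def increment
    @mutex.synchronize { @value += 1 }
  end

  def decrement
    @mutex.synchronize { @value -= 1 }
  end

  def value
    @mutex.synchronize { @value }
  end
end

KEYCHAIN_GAUGE = KeychainGauge.new

# Hypothetical helper: creates a temporary homedir, yields it, and lowers the
# gauge only once the directory is really gone. A directory that could not be
# deleted keeps the gauge elevated, which is what makes leaks visible.
def with_tmp_keychain(gauge: KEYCHAIN_GAUGE)
  dir = Dir.mktmpdir('gpg')
  gauge.increment
  yield dir
ensure
  if dir
    FileUtils.remove_entry(dir, true) # force: swallow errors, we check below
    gauge.decrement unless File.exist?(dir)
  end
end
```

With this shape, the gauge returns to its previous value after every successful cleanup, so a value that keeps climbing suggests homedirs (and likely their gpg-agent processes) are being left behind.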
This would be a first step in #20918 (closed)
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- [-] Documentation created/updated or follow-up review issue created
- Code review guidelines
- Merge request performance guidelines
- Style guides
- [-] Database guides
- Separation of EE specific content
Availability and Testing
- [-] Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- [-] Tested in all supported browsers