Improve cleanup of gpg-homedirs

What does this MR do?

The gpg-agent that may have been spawned here dies when it sees its socket disappear.

However, we sometimes seem to fail to delete the homedir, which lets the gpg-agent live on forever. We noticed the failed deletion in http://gitlab.com/gitlab-org/gitlab-foss/issues/36998: there was a race condition during the deletion, where gpg-agent was still modifying files after we had already called FileUtils.remove_entry.

This MR retries the deletion several times, waiting at least 0.1 seconds between attempts. It is a naive way of making sure we clean up the homedir; we then count on gpg-agent to notice that and make itself go away.

On a web node we'll attempt for at most 0.5 seconds to clean up the directory before failing. In a sidekiq process we'll attempt the deletion for up to 2 seconds.
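
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the method name `remove_homedir_with_retries`, the `RETRY_DELAY` constant, and the choice to rescue `SystemCallError` are assumptions made for the example. Only `FileUtils.remove_entry` and the 0.1 second, 0.5 second and 2 second figures come from the description.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical constant: minimum pause between deletion attempts (seconds).
RETRY_DELAY = 0.1

# Hypothetical helper: keeps trying to remove `dir` until it succeeds or
# `timeout` seconds have elapsed. Returns the number of attempts made and
# re-raises the last error once the time budget is used up.
def remove_homedir_with_retries(dir, timeout:, delay: RETRY_DELAY)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  attempts = 0

  begin
    attempts += 1
    # gpg-agent may still be writing into the directory, in which case
    # this can raise an Errno::* error (a SystemCallError subclass).
    FileUtils.remove_entry(dir) if File.exist?(dir)
    attempts
  rescue SystemCallError
    # Give up when another sleep would push us past the deadline.
    raise if Process.clock_gettime(Process::CLOCK_MONOTONIC) + delay > deadline

    sleep(delay)
    retry
  end
end

# Usage: a web request would pass the short budget, a background job the long one.
homedir = Dir.mktmpdir('gpg-homedir')
remove_homedir_with_retries(homedir, timeout: 0.5) # web node budget
# remove_homedir_with_retries(homedir, timeout: 2) # sidekiq budget
```

A caller would typically rescue the re-raised error and report it to error tracking rather than let it fail the request, which matches the intent of tracking cleanup failures.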

When the cleanup fails, we will now track that exception in Sentry to gain some visibility.

This also introduces a gauge that counts how many gpg-keychains we currently have, which should be roughly equivalent to the number of gpg-agent processes.

This would be a first step in #20918 (closed)

Edited by Bob Van Landuyt