Check NFS mounts in a separate process

Bob Van Landuyt requested to merge bvl-circuitbreaker-process into master

What does this MR do?

Moving the check out of regular requests ensures that those requests are not slowed down by it.

To keep the process that triggers these checks small, the check itself is still performed inside a unicorn, but it is requested by a separate process running on the same server.

Because the checks are now done outside normal requests, we can have a simpler failure strategy:

The check is now performed in the background every circuitbreaker_check_interval. Failures are logged in Redis and are reset when the check succeeds. Per check, we will try circuitbreaker_access_retries times within circuitbreaker_storage_timeout seconds.

When the number of failures exceeds circuitbreaker_failure_count_threshold, we will block access to the storage.

After failure_reset_time of no checks, we will clear the stored failures. This could happen when the process that performs the checks is not running.
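The failure strategy above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the class name StorageFailureTracker and its methods are hypothetical, and an in-memory hash stands in for the Redis bookkeeping.

```ruby
# Hypothetical sketch: an in-memory stand-in for the failure bookkeeping
# described above. The MR itself stores failures in Redis.
class StorageFailureTracker
  def initialize(failure_count_threshold:, failure_reset_time:)
    @failure_count_threshold = failure_count_threshold
    @failure_reset_time = failure_reset_time # in seconds
    @failures = Hash.new(0) # storage name => failures since last success
    @last_check = {}        # storage name => time of the most recent check
  end

  # Record the outcome of one background check for a storage.
  # A success resets the failure count, a failure increments it.
  def record_check(storage, success:, now: Time.now)
    expire_stale(storage, now)
    @last_check[storage] = now
    if success
      @failures.delete(storage)
    else
      @failures[storage] += 1
    end
  end

  # Access is blocked once the failure count exceeds the threshold.
  def blocked?(storage, now: Time.now)
    expire_stale(storage, now)
    @failures[storage] > @failure_count_threshold
  end

  private

  # Clear stored failures when no check has run for failure_reset_time,
  # for example because the checking process is not running.
  def expire_stale(storage, now)
    last = @last_check[storage]
    return unless last && now - last > @failure_reset_time

    @failures.delete(storage)
    @last_check.delete(storage)
  end
end
```

With a threshold of 2, a third consecutive failure blocks the storage, and the block is lifted either by a successful check or once failure_reset_time passes without any check.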

The background process can be started using bin/storage_check, which takes a socket path or a host. It will periodically make requests to the provided unicorn. HealthController#storage_check will handle the request and report the status of the shards of that particular host.
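As a rough illustration of the polling side, the loop below runs a check at a fixed interval. It is only a sketch under assumptions: PeriodicChecker is a hypothetical name, and the request to the unicorn is represented by an injected callable, so no endpoint path, option name, or response format of bin/storage_check is implied.

```ruby
# Hypothetical sketch of a periodic checker loop. `check` stands in for the
# request made to the unicorn; `sleeper` is injectable so the loop can be
# exercised without actually waiting.
class PeriodicChecker
  def initialize(interval:, check:, sleeper: ->(seconds) { sleep(seconds) })
    @interval = interval
    @check = check
    @sleeper = sleeper
  end

  # Runs `iterations` checks, or loops forever when `iterations` is nil,
  # waiting `interval` seconds after each one. Errors raised by the check
  # are rescued and yielded, so one failed request does not stop the loop.
  # Returns the number of checks performed.
  def run(iterations: nil)
    count = 0
    until iterations && count >= iterations
      result = begin
        @check.call
      rescue StandardError => e
        e
      end
      yield result if block_given?
      count += 1
      @sleeper.call(@interval)
    end
    count
  end
end
```

Keeping this process free of application code is what lets it stay small: it only schedules the requests, while the unicorn does the actual storage check.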

Why was this MR needed?

This simplifies much of the circuitbreaker implementation and moves the check out of requests, so that requests no longer get bogged down by it.

Does this MR meet the acceptance criteria?

TODO:

What are the relevant issue numbers?

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/39847

Edited by Bob Van Landuyt
