Silent mode MVC: Discover the sources of GitLab outbound communications
What is Silent mode?
"GitLab Silent Mode for Backup / Disaster Recovery Testing" is a feature proposal to help GitLab sysadmins perform disaster recovery tests with less risk than an actual failover, and without impacting users or the outside world. This mode would silence most outbound communications from a GitLab instance.
For example, any emails sent by a DR site during a failover test are either duplicates or completely invalid, since the primary site is still up and also sending emails.
Without this mode, sysadmins must block outbound communications during a failover test. But it is difficult for sysadmins to properly implement a robust barrier while allowing any valid communications necessary for failover testing. So a "Silent mode" feature in GitLab would make failover testing easier and more reliable for all deployments.
Purpose of this issue
- A "Silent mode" requires blocking outbound communications.
- GitLab is a large application and there are many sources of outbound communications. One person is unlikely to remember all of them.
- Some outbound communications should be allowed, for example those that help retain normal functions during a failover test and have no impact on users.
- Some outbound communications, if blocked, may break QA tests unless additional changes are made to GitLab or to QA tests.
Therefore, we need to discover the sources of all GitLab outbound communications and characterize each one with respect to a failover test.
Ask
For each known or suspected source of outbound communication, post a comment with this template. It's ok to answer questions with "I don't know" or "probably". At this stage we mostly want to get a sense of scope. If another comment already covers the source, but something is left unsaid, then please add to that discussion.
### Brief description
<!-- Free form description -->
Questions:
- What is the impact of allowing these communications as-is during a failover test?
-
- What is the potential negative impact of blocking this inside of a failover test?
-
- Should it be blocked?
-
- If yes, what is the estimated effort to block it and mitigate any negative impact on failover tests?
<!-- Please use these t-shirt sizes: Small (1-2 weeks), Medium (about 1 month), Large (1-3 months), XL (3 months) -->
-
<!--
If it should be blocked, then describe any initial ideas on blocking this source in a Silent mode MVC. Links to code or APIs are welcome.
MVC considerations:
-->
See other posts as examples.
When estimating, assume there'll be an easy way to check `silent_mode?` in Rails.
Possible application chokepoints
ActionMailer
- We could disable `config.action_mailer.perform_deliveries`, but that would require reload/restart.
- Probably better to use an ActionMailer callback to prevent sending emails.
- Warning: Devise emails do not use `ActionMailer`. We may be able to block at `DeviseMailer`.
- Are there other emails that do not use `ActionMailer`?
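As a rough illustration of blocking at the mailer layer without a restart, here is a minimal sketch. It uses an ActionMailer interceptor rather than a per-mailer callback, which is a swapped-in technique and not something this issue has settled on. `SilentMode` and `SilentModeMailInterceptor` are hypothetical names, and `SilentMode.enabled?` stands in for the `silent_mode?` check assumed above.

```ruby
# Hypothetical stand-in for the `silent_mode?` check assumed in this issue.
module SilentMode
  class << self
    attr_writer :enabled

    def enabled?
      @enabled == true
    end
  end
end

# Hypothetical interceptor. ActionMailer calls `delivering_email` on each
# registered interceptor just before a message is handed to the delivery method.
class SilentModeMailInterceptor
  def self.delivering_email(message)
    # Setting perform_deliveries to false on the message skips the actual send.
    message.perform_deliveries = false if SilentMode.enabled?
  end
end

# Registration would live in a Rails initializer; guarded so that this
# sketch also loads where ActionMailer is not available.
if defined?(ActionMailer::Base)
  ActionMailer::Base.register_interceptor(SilentModeMailInterceptor)
end
```

Because the check runs per message, toggling the mode would take effect without a reload. Whether this also covers the Devise emails mentioned above is an open question that this sketch does not answer.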
HTTP request libraries
Gitlab::HTTP
`Gitlab::HTTP` is recommended by the secure coding guidelines, and we have a RuboCop cop which blocks direct use of HTTParty in favor of `Gitlab::HTTP`.
There are about 95 usages of `Gitlab::HTTP`.
Therefore `Gitlab::HTTP` may be a good chokepoint for many outbound HTTP requests.
The effort to block it is small at face value, but there may be downstream effects which require work to address (when Silent mode is on) such as QA failures or UI errors.
In an attempt to see what happens to QA tests when non-GET requests are blocked, I opened this MR which blocks these requests with a feature flag `silent_mode`. Since QA tests run with a new feature flag both enabled and disabled, this may give us more information about the impact of blocking outbound HTTP requests on our ability to perform QA during a failover test. !96415 (closed)
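To make the chokepoint idea concrete, here is a minimal sketch of a guard that rejects non-GET requests while Silent mode is on. It is not the code from the MR above. `SilentModeHttpGuard`, `SilentModeBlockedError`, `FakeHttpClient`, and the `perform_request` method are all hypothetical names, and the fake client only records requests instead of sending them.

```ruby
# Hypothetical error raised when Silent mode blocks a request.
class SilentModeBlockedError < StandardError; end

# Hypothetical guard, meant to be prepended to an HTTP wrapper so that it
# runs before the wrapper's own request method.
module SilentModeHttpGuard
  def perform_request(http_method, url, **options)
    # Only GET is let through, matching the "block non-GET" experiment.
    if silent_mode? && http_method != :get
      raise SilentModeBlockedError,
            "Silent mode: blocked #{http_method.to_s.upcase} #{url}"
    end

    super
  end
end

# Stand-in for a wrapper such as Gitlab::HTTP. It records requests
# instead of sending them, so the sketch runs without network access.
class FakeHttpClient
  prepend SilentModeHttpGuard

  attr_reader :sent

  def initialize(silent:)
    @silent = silent
    @sent = []
  end

  # Stand-in for the `silent_mode?` check assumed in this issue.
  def silent_mode?
    @silent
  end

  def perform_request(http_method, url, **options)
    @sent << [http_method, url]
    :ok
  end
end
```

Raising makes blocked calls visible, but callers that do not rescue the error would surface it as a UI error or QA failure, which is the downstream cost mentioned above. Returning a stubbed response instead is the other obvious option.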
Faraday
- Is there a reason `Gitlab::HTTP` is not used for these cases?
- A long list of gems and our own code which use Faraday: &8029
- We can write a "middleware" to block it: https://lostisland.github.io/faraday/#/middleware/index
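A blocking middleware could look roughly like the sketch below. It follows the shape Faraday uses for middleware (built with the next handler in the stack, invoked through `call(env)`), but it is written in plain Ruby so it runs without the gem. With Faraday loaded it would subclass `Faraday::Middleware`. `SilentModeMiddleware`, `SilentModeBlocked`, and the `:silent_mode` option are hypothetical names.

```ruby
# Hypothetical error raised when Silent mode blocks a request.
class SilentModeBlocked < StandardError; end

# Hypothetical middleware in the shape Faraday expects.
class SilentModeMiddleware
  def initialize(app, options = {})
    @app = app
    # Hypothetical callable standing in for the `silent_mode?` check.
    @silent_mode = options.fetch(:silent_mode, -> { false })
  end

  def call(env)
    # env[:method] and env[:url] describe the outgoing request.
    if @silent_mode.call && env[:method] != :get
      raise SilentModeBlocked,
            "Silent mode: blocked #{env[:method]} #{env[:url]}"
    end

    @app.call(env) # pass the request down the stack
  end
end
```

A middleware only covers connections that include it, so each of the gems listed in the epic above would still need to be checked for how it builds its Faraday connection.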
Net::HTTP
Other HTTP libraries?
- Are any HTTP request libraries in use other than those above?
- What about in Workhorse?
- What about other services?
Reference
Run in browser console to grab level 3 headings and their comment URLs:
msg="";$('.note-text h3').each(function(e,o) {href=$(o).closest('.timeline-content').find('.note-timestamp').attr('href');msg = msg + `\n- [${$(o).text()}](${href})`;});console.log(msg)
Summary
- Outbound Emails
  - Effort: Small. Block ActionMailer and DeviseMailer.
- Container Registry Webhooks
  - Effort: None, just document as a limitation for now.
- GitLab Project or Group Webhooks
  - Effort: Small. Block Gitlab::HTTP and enqueuing the workers.
- Pull mirroring
  - Effort: Small. Block at or around GitalyClient.
- Push mirroring
  - Effort: Small. Block at or around GitalyClient.
- Bidirectional mirroring
  - Effort: None in addition to push mirroring.
- Server Hooks
  - Effort: None, just document as a limitation.
- Deprecated Kubernetes Connections (not needed for KAS)
  - Effort: Small. Block at Gitlab::Kubernetes::KubeClient.
- GitLab integrations
  - Effort: None if already blocking Gitlab::HTTP.
- GitLab Geo
  - Effort: None.
- Object storage
  - Effort: None, just document as a limitation. Like, "Silent Mode does not block requests to object storage. The site can write and delete objects. You are responsible for clean up." If we were to block writes to object storage, then much QA is not possible. Regardless of Silent mode, we may need to put effort into cleaning up after manual/automated QA.
- Elasticsearch
  - Effort: None, just document as a limitation.
- File Hooks
  - Effort: None, just document as a limitation.
- Snowplow
  - Effort: Small. Block at Gitlab::Tracking and/or Net::HTTP.
- Sentry Error Tracking
  - Effort: None, just document as a limitation.
- Integrated error tracking
  - Effort: None, just document as a limitation.
- Dependency Proxy
  - Effort: None, just document as a limitation.
- System Hooks
  - Effort: None in addition to GitLab Project or Group Webhooks.
- Future features, future gems
  - Effort: Future maintenance. A question: How should we proactively find out about new things that should be blocked before they cause a problem for someone?
Effort: Small x6