2021-04-01 Addition to GitLab.com IP ranges
Production Change
Emergency maintenance
We are fast-tracking the addition of a second IP range to ensure availability for outgoing connections from GitLab.com. This update is a response to recurring scaling needs identified during recent incidents.
The additional range will be added on 2021-04-01 at 02:00 UTC.
Please ensure that the IP ranges listed in https://docs.gitlab.com/ee/user/gitlab_com/#ip-range are added to your allowlists to avoid intermittent connection issues.
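As a quick sanity check on an allowlist, the sketch below tests whether a given source address falls inside a set of CIDR ranges, using Python's standard `ipaddress` module. The ranges shown are placeholder documentation networks (RFC 5737), not the GitLab.com ranges; substitute the values published on the docs page linked above. The name `is_allowlisted` is hypothetical and used only for illustration.

```python
import ipaddress

# ASSUMPTION: placeholder ranges from the RFC 5737 documentation space.
# These are NOT the GitLab.com ranges; copy the real ones from the docs page.
ALLOWLIST = ["192.0.2.0/24", "198.51.100.0/24"]


def is_allowlisted(address, cidrs=ALLOWLIST):
    """Return True if `address` lies inside any of the CIDR ranges in `cidrs`."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    # Print the membership result for a few sample addresses.
    for candidate in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        print(candidate, is_allowlisted(candidate))
```

Running a check like this against both the existing and the additional range before the change window may help catch an allowlist that only covers the old range.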
Change Summary
Related to #3981 (closed) and #4099 (closed), we are going to expand the IP range used by GitLab.com with CloudNAT. The documentation change is in this MR: gitlab-org/gitlab!56770 (merged). The MRs that apply the actual application/CloudNAT changes will be noted below.
Change Details
- Services Impacted - ServiceWeb
- Change Technician - @devin
- Change Criticality - C2
- Change Type - changescheduled
- Change Reviewer - @msmiley
- Due Date - 2021-04-01 02:00 UTC
- Time tracking - 5 min
- Downtime Component - No downtime
Detailed steps for the change
Pre-Change Steps - steps to be completed before execution of the change
- Merge in the documentation MR to update the IP ranges
- Announce a zero-downtime maintenance window and tweet about the new range
- Proceed with the change on 2021-04-01
Change Steps - steps to take to execute the change
Estimated Time to Complete (mins) - 20 minutes
Post-Change Steps - steps to take to verify the change
- Post-Change Step 1
Rollback
Rollback steps - steps to be taken in the event of a need to rollback this change
Estimated Time to Complete (mins) - 20 min
- Revert the CloudNAT extension MR
Monitoring
Key metrics to observe
- Metric: Allocated ports for the Cloud NAT
- Location: Google console
- What changes to this metric should prompt a rollback: An extreme at either end (more than 1 million ports allocated, or fewer than 800k)
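The rollback criterion above can be written as a simple range check. This is a minimal sketch that assumes the allocated-port count is read manually from the Google console and passed in as an integer; the function and constant names are hypothetical, and whether the boundary values themselves count as acceptable is an assumption.

```python
# Thresholds taken from the rollback criterion in this issue.
MIN_ALLOCATED_PORTS = 800_000
MAX_ALLOCATED_PORTS = 1_000_000


def should_rollback(allocated_ports):
    """Return True if the allocated-port count is outside the expected band.

    ASSUMPTION: exactly 800k or exactly 1 million is treated as acceptable.
    """
    return (
        allocated_ports < MIN_ALLOCATED_PORTS
        or allocated_ports > MAX_ALLOCATED_PORTS
    )


if __name__ == "__main__":
    # Sample readings below, inside, and above the expected band.
    for reading in (750_000, 900_000, 1_050_000):
        print(reading, should_rollback(reading))
```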
Summary of infrastructure changes
- Does this change introduce new compute instances?
- Does this change re-size any existing compute instances?
- Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?
Summary of the above
Changes checklist
- This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
- This issue has the change technician as the assignee.
- Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- Necessary approvals have been completed based on the Change Management Workflow.
- Change has been tested in staging and results noted in a comment on this issue.
- A dry-run has been conducted and results noted in a comment on this issue.
- SRE on-call has been informed prior to change being rolled out. (In the #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
- There are currently no active incidents.