A customer encountered an issue with their self-managed GitLab instance, which was unable to synchronize package data from the public License-DB buckets due to a network connectivity error. The customer had difficulty determining the complete set of Google Cloud Platform (GCP) IP addresses that needed to be allowlisted for access.
Zendesk Ticket - internal only - for GitLab team members who can view the ticket
Proposed solution
We can use an External Application Load Balancer to reach the Cloud Storage buckets. This way, anyone can use a static IP to reach the load balancer, and hence the bucket, so customers only need to add a single IP address to their allowlist.
WARNING: Existing GitLab instances (self-managed and gitlab.com) are already synchronizing data with the GCP buckets, so any change we make must remain backward compatible to avoid breaking this feature for those customers.
Given the significance of this update, we recommend that we seek input from a Security Infrastructure expert and a member of the Reliability team before proceeding with deployment to the production environment.
Implementation Plan
Register a domain name for dev through Cloud domains
@fcatteau Thanks for your answer. I am not sure how this will work exactly if we add a load balancer. I hope that we can use the endpoint argument in Google::Cloud::Storage.anonymous in order to connect through the LB address. I need to experiment with this by creating a quick PoC.
In any case, this issue will most probably require changes to the Rails backend as well.
I would like to investigate whether we can reach the public bucket both through the LB and directly (as we do now). If that is possible, it will let us migrate smoothly from the current implementation to the new one.
Disclaimer: we could probably also make this work with a single LB, but I am not sure how.
In this proposal we reserve a static IP per LB. Better yet, we can use a domain name with an A record pointing to the static IP, so we don't need to worry about the IP changing for any reason.
Terraform code
We will create a loadbalancer module and include it in main.tf.
```hcl
# Reserve IP addresses for the load balancers
resource "google_compute_global_address" "licenses_static_ip" {
  name = "public-licenses-lb-ip"
}

resource "google_compute_global_address" "advisories_static_ip" {
  name = "public-advisories-lb-ip"
}

# Create LB backend buckets
resource "google_compute_backend_bucket" "licenses_bucket" {
  name        = "licenses"
  description = "Contains exported licenses"
  bucket_name = var.licenses_export_bucket_name
}

resource "google_compute_backend_bucket" "advisories_bucket" {
  name        = "advisories"
  description = "Contains exported advisories"
  bucket_name = var.advisories_export_bucket_name
}

# Create URL maps
resource "google_compute_url_map" "licenses_url_map" {
  name            = "http-lb-licenses"
  default_service = google_compute_backend_bucket.licenses_bucket.id

  host_rule {
    hosts        = ["*"]
    path_matcher = "path-matcher-2"
  }

  path_matcher {
    name            = "path-matcher-2"
    default_service = google_compute_backend_bucket.licenses_bucket.id
  }
}

resource "google_compute_url_map" "advisories_url_map" {
  name            = "http-lb-advisories"
  default_service = google_compute_backend_bucket.advisories_bucket.id

  host_rule {
    hosts        = ["*"]
    path_matcher = "path-matcher-2"
  }

  path_matcher {
    name            = "path-matcher-2"
    default_service = google_compute_backend_bucket.advisories_bucket.id
  }
}

# Create HTTP target proxies
resource "google_compute_target_http_proxy" "licenses_http_proxy" {
  name    = "http-lb-proxy-licenses"
  url_map = google_compute_url_map.licenses_url_map.id
}

resource "google_compute_target_http_proxy" "advisories_http_proxy" {
  name    = "http-lb-proxy-advisories"
  url_map = google_compute_url_map.advisories_url_map.id
}

# Create forwarding rules
resource "google_compute_global_forwarding_rule" "licenses_forwarding_rule" {
  name                  = "http-lb-forwarding-rule-licenses"
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  port_range            = "80"
  target                = google_compute_target_http_proxy.licenses_http_proxy.id
  ip_address            = google_compute_global_address.licenses_static_ip.id
}

resource "google_compute_global_forwarding_rule" "advisories_forwarding_rule" {
  name                  = "http-lb-forwarding-rule-advisories"
  ip_protocol           = "TCP"
  load_balancing_scheme = "EXTERNAL_MANAGED"
  port_range            = "80"
  target                = google_compute_target_http_proxy.advisories_http_proxy.id
  ip_address            = google_compute_global_address.advisories_static_ip.id
}

# DNS zones and records
resource "google_dns_managed_zone" "licenses_dns_zone" {
  name        = "licenses-bucket-dns-zone"
  dns_name    = join("", [var.licenses_domain_name, "."])
  description = "DNS zone for public license bucket"
  labels      = var.labels

  dnssec_config {
    state = "off"
  }
}

resource "google_dns_record_set" "licenses_rs_a" {
  name         = google_dns_managed_zone.licenses_dns_zone.dns_name
  managed_zone = google_dns_managed_zone.licenses_dns_zone.name
  type         = "A"
  ttl          = 300
  rrdatas      = [google_compute_global_address.licenses_static_ip.address]
}

resource "google_dns_managed_zone" "advisories_dns_zone" {
  name        = "advisories-bucket-dns-zone"
  dns_name    = join("", [var.advisories_domain_name, "."])
  description = "DNS zone for public advisory bucket"
  labels      = var.labels

  dnssec_config {
    state = "off"
  }
}

resource "google_dns_record_set" "advisories_rs_a" {
  name         = google_dns_managed_zone.advisories_dns_zone.dns_name
  managed_zone = google_dns_managed_zone.advisories_dns_zone.name
  type         = "A"
  ttl          = 300
  rrdatas      = [google_compute_global_address.advisories_static_ip.address]
}
```
modules/loadbalancer/variables.tf
```hcl
variable "licenses_export_bucket_name" {
  type        = string
  description = "The name of the licenses export bucket"
}

variable "advisories_export_bucket_name" {
  type        = string
  description = "The name of the advisories export bucket"
}

variable "licenses_domain_name" {
  type        = string
  description = "The domain name to be used for the licenses bucket load balancer"
}

variable "advisories_domain_name" {
  type        = string
  description = "The domain name to be used for the advisories bucket load balancer"
}

variable "labels" {
  type        = map(string)
  description = "Labels to be added to the resources"
}
```
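For completeness, the module could be wired into the root main.tf along these lines; the domain names shown here are placeholders, not actual values:

```hcl
module "loadbalancer" {
  source = "./modules/loadbalancer"

  licenses_export_bucket_name   = var.licenses_export_bucket_name
  advisories_export_bucket_name = var.advisories_export_bucket_name
  licenses_domain_name          = "licenses.example.com"   # placeholder
  advisories_domain_name        = "advisories.example.com" # placeholder
  labels                        = var.labels
}
```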
How to list files
After deploying the Terraform code from the section above, we can use the advisories_static_ip and licenses_static_ip addresses to list the files. You can find the reserved IPs in the IP addresses section of your GCP sandbox project.
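Listing could look like the following Ruby sketch. The IP address is a placeholder for the reserved static IP, and the XML shape assumes the GCS XML API listing format (ListBucketResult/Contents/Key); namespace handling is omitted for brevity:

```ruby
require "net/http"
require "rexml/document"

# Placeholder: replace with the reserved licenses_static_ip or
# advisories_static_ip from your project.
LB_IP = "203.0.113.10"

# A GET on the load balancer root returns the bucket's XML object listing.
# This helper pulls the object keys out of that listing.
def object_keys(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//Contents/Key").map(&:text)
end

# Once the load balancer is reachable, listing boils down to:
# puts object_keys(Net::HTTP.get(URI("http://#{LB_IP}/")))
```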
The response contains all files and all paths to files in the bucket.
How to retrieve a file
In order to retrieve a file, you need to provide its full path. For example, to retrieve the file <advisory_bucket>/v2/something/sheep.txt from the advisory bucket, you can execute:
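A minimal Ruby sketch for retrieval; the IP is again a placeholder, and the object path is the object's full path within the bucket:

```ruby
require "net/http"

# Build the URL the load balancer serves for a given object path.
def object_uri(lb_ip, object_path)
  URI("http://#{lb_ip}/#{object_path}")
end

# Fetch the object body; raise on any non-2xx response.
def fetch_object(lb_ip, object_path)
  res = Net::HTTP.get_response(object_uri(lb_ip, object_path))
  raise "HTTP #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  res.body
end

# Example, once the load balancer is reachable (placeholder IP):
# puts fetch_object("203.0.113.10", "v2/something/sheep.txt")
```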
This request returns the content of the indicated file.
Changes on the Rails backend
Currently the Rails backend uses the google-apis-storage_v1 and google-cloud-storage gems to list and read files from the buckets. This API cannot be used with this solution. Proposal 2 offers an alternative that doesn't require changing the Rails implementation.
For this proposal we need to modify the Rails backend to do the following:
To list all files, we need to parse the XML response returned by the load balancer
To download a file, we need to perform an HTTP GET request
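Put together, the backend change could be sketched as a small client wrapping both operations. This is a hypothetical replacement for the google-cloud-storage based access; the class and method names are illustrative, not actual GitLab code:

```ruby
require "net/http"
require "rexml/document"

# Hypothetical client for a License-DB bucket exposed through the LB.
class LicenseDbClient
  def initialize(base_url)
    @base_url = base_url # e.g. "http://203.0.113.10" or the LB domain name
  end

  # List object keys by parsing the XML listing served at the LB root.
  def list_keys
    parse_keys(get("/"))
  end

  # Download one object by its full path within the bucket.
  def download(path)
    get("/#{path}")
  end

  # Extracted for testability: pull <Key> values out of an XML listing.
  def parse_keys(xml)
    doc = REXML::Document.new(xml)
    REXML::XPath.match(doc, "//Contents/Key").map(&:text)
  end

  private

  def get(path)
    res = Net::HTTP.get_response(URI("#{@base_url}#{path}"))
    raise "HTTP #{res.code} for #{path}" unless res.is_a?(Net::HTTPSuccess)
    res.body
  end
end
```

Because the client only depends on plain HTTP and XML, it works equally well with the static IP or with the DNS name pointing at it, which keeps the door open for the dual-access migration discussed above.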
Pros and cons
+ Straightforward solution when it comes to infrastructure
+ We can access the buckets both through the load balancer and directly via the storage API (current implementation). This can be handy if we want to migrate without any breaking changes.
@fcatteau When it comes to the Rails backend part of this issue, do you think this is doable? Probably we need a separate issue for the Rails implementation.
@nilieskou Yes, we definitely need a separate issue for changes to the GitLab backend. That seems doable; we'll figure it out as part of issue refinement. As said, the Ruby gems provided by Google don't seem to allow us to specify static IPs, so the change doesn't look trivial.
cc @ifrenkel You might have additional feedback since you've implemented the code that fetches files from GCP buckets.
Regarding the solution you suggested (a Ruby client configured with a custom endpoint plus a CNAME entry redirecting to c.storage.googleapis.com): I couldn't make it work. In theory, what is required is a bucket named after a subdomain of the domain used for the LB. For example, if we own example.com, we should name the bucket advisory.example.com and link it with a CNAME record.
Even if this worked, I wonder whether it solves any problem, since the redirect to c.storage.googleapis.com would still be problematic: avoiding that endpoint is what we are trying to do in the first place.
Nick Ilieskou marked the checklist item Create TF module for adding loadbalancer, DNS zones and everything that is required. You can work based on this Draft MR. as completed