Explore whether the environment can be built and updated to the newest Nightly automatically (this may overlap with the OpenShift Operator work, gitlab-org&4986 (closed)).
Prepare test data and the environment.
Create a schedule to run GPT against it, for example weekly (a sketch of what that could look like follows below).
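A minimal sketch of what that weekly run could look like, assuming it is driven by a GitLab CI pipeline schedule and uses the GPT docker image; the image name, `--environment` flag and the `10k_hybrid.json` environment file are assumptions for illustration, not the final setup:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper script invoked by a weekly GitLab CI pipeline schedule.
# The image name, --environment flag and 10k_hybrid.json file are assumptions.
set -euo pipefail

RESULTS_DIR="${RESULTS_DIR:-$PWD/results}"
ENVIRONMENTS_DIR="${ENVIRONMENTS_DIR:-$PWD/environments}"
mkdir -p "$RESULTS_DIR"

# Run GPT against the target environment; test results land in $RESULTS_DIR.
docker run --rm \
  -e ACCESS_TOKEN="$GPT_ACCESS_TOKEN" \
  -v "$ENVIRONMENTS_DIR":/environments \
  -v "$RESULTS_DIR":/results \
  gitlab/gitlab-performance-tool \
  --environment 10k_hybrid.json
```

The same script could be pointed at the Nightly environment once the automatic build/update step above is in place.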
Yeah, this is certainly worth exploring going forward. For it to be valuable though, we'd need it to have the same conditions as our main test pipelines:
The GitLab version is the same across components, in this case the Charts and Omnibus.
Updates can happen seamlessly in CI.
Monitoring works the same as well; we really need full monitoring to be able to investigate failures correctly. With the hybrid environments it's difficult to get Prometheus (Omnibus) to poll the Charts nodes (see the sketch below for the kind of reachability this requires).
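On the monitoring point, a rough sketch of the reachability check this implies, assuming the Charts nodes expose metrics on their internal IPs (e.g. node-exporter on port 9100); the port and the exact scrape targets are assumptions:

```bash
#!/usr/bin/env bash
# Rough check that the Omnibus box running Prometheus can reach the GKE
# (Charts) nodes it needs to poll. The node-exporter port 9100 is an
# assumption; the real scrape targets may differ.
set -euo pipefail

# Internal IPs of the cluster nodes (run from a host with cluster access).
NODE_IPS=$(kubectl get nodes \
  -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')

# Each endpoint needs to be reachable from the Omnibus node for Prometheus
# to scrape it.
for ip in $NODE_IPS; do
  if curl -sf --max-time 5 "http://${ip}:9100/metrics" > /dev/null; then
    echo "${ip}:9100 reachable"
  else
    echo "${ip}:9100 NOT reachable"
  fi
done
```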
While I was testing the 50k hybrid environment, I looked into whether there's a way to save costs with GKE and found that `gcloud container clusters resize --num-nodes 0` can be used to resize a specific node pool to 0. As far as I understand, it cordons the nodes and makes them unschedulable, then eventually drains the pods. In our case we currently need to resize 3 node pools, until gitlab-org/quality/reference-architectures#65 (closed) is closed. Resizing the nodes on 50k took quite a lot of time - about 1 hour or even more. That's probably because, while GKE is draining a node, the controller tries to reschedule the pods but can't, since all the nodes are cordoned. I'm not sure yet how to get around this. Overall though, when I resized the node pools back to their original size, the pods were reinitialised and the environment worked fine.

Another path may be to delete the release and resize the node pools to 0 after each test run, so that next time GET installs the chart from scratch.
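To make that concrete, a sketch of the scale-down / scale-up flow; the cluster, zone, pool names and node counts below are placeholders, not the real 50k configuration:

```bash
#!/usr/bin/env bash
# Sketch: scale the GKE node pools to 0 after a test run and back up before
# the next one. Cluster, zone, pool names and sizes are placeholders.
set -euo pipefail

CLUSTER="gitlab-50k-hybrid"
ZONE="us-east1-c"
POOLS=(webservice sidekiq supporting)   # the 3 pools we currently resize
SIZES=(4 2 2)                           # original node counts per pool

scale_down() {
  for pool in "${POOLS[@]}"; do
    gcloud container clusters resize "$CLUSTER" \
      --zone "$ZONE" --node-pool "$pool" --num-nodes 0 --quiet
  done
}

scale_up() {
  for i in "${!POOLS[@]}"; do
    gcloud container clusters resize "$CLUSTER" \
      --zone "$ZONE" --node-pool "${POOLS[$i]}" --num-nodes "${SIZES[$i]}" --quiet
  done
}

"$@"   # usage: ./resize.sh scale_down | scale_up
```

For the second path, the release could be removed first (e.g. `helm uninstall gitlab`, release name assumed) before scaling down, so that GET installs the chart again on the next run.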