For charts, do we want to add a new CI cluster? Or just do a manual verification for now, and continue to rely on the kind API resource validation to keep our MRs in line?
Adding a full CI cluster for each version would be great, but ideally we could reuse the same cluster that we set up for the operator CI, or use some lighter-weight cluster to keep costs lower.
Separate from manual verification, I'm not sure we are ready to tackle this until after we already have the cluster up and hooked into the operator project. Thoughts @mnielsen ?
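For the manual-verification path, this is roughly what a kind-based check could look like. It is only a sketch: the chart name, values file, and node image tag are assumptions, not our actual CI config.

```shell
# Throwaway kind cluster pinned to the Kubernetes version under test.
# (Image tag is an assumption -- use whichever 1.22 patch release kind publishes.)
kind create cluster --name api-check --image kindest/node:v1.22.0

# Render the chart and ask the 1.22 API server to validate the manifests
# without persisting anything (server-side dry run).
helm template gitlab . --values ci/values.yaml \
  | kubectl apply --dry-run=server -f -

kind delete cluster --name api-check
```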
OpenShift Continuous Integration: The vendor has a CI system process in place to continuously test OpenShift updates as they are released by Red Hat to verify interoperability.
As there are some components of the overall gitlab/gitlab chart that are not deployed by the Operator, I think it makes sense to consider how we would properly test a complete deployment against a 1.22+ API server (possibly even a mock). CI validates the manifests, but it does not catch things like the NGINX Ingress Controller checking for the required API availability at runtime.
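To catch that class of runtime behaviour, we would need an actual install rather than a dry run. A rough sketch, assuming a 1.22 cluster is available and using placeholder release/namespace names, values file, and label selector:

```shell
# Deploy the full chart (not just the Operator-managed subset) into a 1.22
# cluster and let the controllers come up, so runtime API discovery problems
# surface instead of only manifest validation errors.
helm upgrade --install gitlab . \
  --namespace gitlab --create-namespace \
  --values ci/values.yaml \
  --timeout 15m --wait

# Surface anything that failed to start, e.g. the ingress controller pods.
# (The label selector is an assumption about the bundled nginx-ingress subchart.)
kubectl get pods -n gitlab
kubectl logs -n gitlab -l app=nginx-ingress --tail=50
```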
Where and how did you collect the stats above? We will need a method to verify that things are addressed, and a way to reproduce the problematic findings.
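One reproducible option, assuming the numbers came from the API server itself rather than a GKE report, is the kube-apiserver's deprecated-API request metric (available since 1.19). Each series names the deprecated group/version/resource and the release it is removed in, so the output can be diffed between runs to confirm fixes:

```shell
# Requires permission on the /metrics non-resource URL.
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis
```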
Well, in #3355 (closed) we have essentially shown that this is a problem that appears to originate in the Chart itself. The next question is: what else deployed into this cluster could be using these dated APIs?
Curious coincidence: the gitlab-operator cluster shows similar issues ... and the same "last used" dates (the time is different). I wonder if some "cron job" is at the source of it. Talking with @twk3, the possibility has been raised that perhaps it's cert-manager trying to renew certs?
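A couple of quick checks that could narrow this down (the cert-manager namespace and deployment names are guesses based on a default install):

```shell
# Anything running on a schedule that might be hitting the API periodically?
kubectl get cronjobs --all-namespaces

# Which cert-manager version is installed? Knowing the exact release makes it
# easier to check its API usage against the 1.22 removals.
kubectl -n cert-manager get deploy cert-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```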
It turns out we should not have external-dns installed in the cluster for charts:
Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: could not get information about the resource: clusterroles.rbac.authorization.k8s.io "gitlab-external-dns-helm-charts-win" is forbidden: User "system:serviceaccount:helm-charts-win:helm-charts-win-service-account" cannot get resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope
I will have to yank it out of the cluster and re-run the pipeline.
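Roughly what "yanking it out" could look like, assuming external-dns was installed as its own Helm release (the release name and the ClusterRoleBinding name are guesses; only the ClusterRole name comes from the error above):

```shell
# If it was installed as a Helm release, uninstall it so its cluster-scoped
# objects (ClusterRole/ClusterRoleBinding) are cleaned up with it.
helm uninstall gitlab-external-dns -n helm-charts-win

# Otherwise remove the leftover cluster-scoped objects directly; this needs
# cluster-admin, which the CI service account clearly does not have.
kubectl get clusterrolebinding | grep external-dns
kubectl delete clusterrole gitlab-external-dns-helm-charts-win
kubectl delete clusterrolebinding gitlab-external-dns-helm-charts-win
```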
@WarheadsSE it's all in the context of the cloud-native-v122 cluster. Since that comment, @twk3 suggested we instead remove the task installing external-dns from the pipeline.
Removing the install_external_dns call from the pipeline got things moving further, but it looks like the permissions for the SA are incorrect:
Error: rendered manifests contain a resource that already exists. Unable to continue with install: could not get information about the resource: clusterroles.rbac.authorization.k8s.io "k8s122-review-301-ai49tg-nginx-ingress" is forbidden: User "system:serviceaccount:helm-charts-win:helm-charts-win-service-account" cannot get resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope
It's interesting that the first portion of the error reads rendered manifests contain a resource that already exists - I might have to dig there. Perhaps it's not just external-dns that creates the conflict.
Error: UPGRADE FAILED: could not get information about the resource: clusterroles.rbac.authorization.k8s.io "k8s122-review-301-ai49tg-nginx-ingress" is forbidden: User "system:serviceaccount:helm-charts-win:helm-charts-win-service-account" cannot get resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope
It looks like system:serviceaccount:helm-charts-win:helm-charts-win-service-account is being used and not gitlab.
@dmakovey is that the right cluster? Our existing GKE cluster has that SA, and it's attached to the edit cluster role using the namespaced rolebinding to the helm-charts-win project.
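For reference, the existing binding could be confirmed with something like this (the namespace is taken from the SA name above; the RoleBinding name will vary):

```shell
# The service account the jobs run as.
kubectl -n helm-charts-win get serviceaccount helm-charts-win-service-account

# The namespaced RoleBinding that should attach it to the `edit` ClusterRole;
# -o wide shows the role and subjects side by side.
kubectl -n helm-charts-win get rolebinding -o wide
```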
I have also attempted to eliminate the possibility of typos/mix-ups by using the script from #3018 (comment 1048414097). At least on the surface, everything seems to line up.
I'll add a bit more debug output to the pipeline; perhaps it'll shed some light here.
So indeed the deployment is being done via a different SA than gitlab. I'm starting to think this may be an issue of me manually creating helm-charts-win beforehand, when gitlab was perhaps supposed to set it all up automatically, which is why we're missing something. I might try deleting the helm-charts-win-related objects from the cluster and see what happens. We may need to re-submit values for this project's configuration under .../clusters ... will be digging further.
Further testing shows that system:serviceaccount:helm-charts-win:helm-charts-win-service-account has admin privileges (kubectl auth can-i '*' '*' answers yes), yet it fails to get clusterroles.
Job logs snippet:

```
$ /tmp/kubectl-whoami
system:serviceaccount:helm-charts-win:helm-charts-win-service-account
...
$ kubectl auth can-i '*' '*' || true
yes
...
Error: UPGRADE FAILED: could not get information about the resource: clusterroles.rbac.authorization.k8s.io "k8s122-review-301-ai49tg-nginx-ingress" is forbidden: User "system:serviceaccount:helm-charts-win:helm-charts-win-service-account" cannot get resource "clusterroles" in API group "rbac.authorization.k8s.io" at the cluster scope
```
The next step would be to grab the system:serviceaccount:helm-charts-win:helm-charts-win-service-account credentials and re-run the tests using those, trying to replicate the issue in a more controlled environment.
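Before pulling actual credentials, impersonation might be enough to reproduce this from any workstation whose user has the impersonate permission:

```shell
# Ask the API server the exact question the failing Helm call is asking,
# as the CI service account.
kubectl auth can-i get clusterroles \
  --as=system:serviceaccount:helm-charts-win:helm-charts-win-service-account

# And the broad check the job already runs, for comparison.
kubectl auth can-i '*' '*' \
  --as=system:serviceaccount:helm-charts-win:helm-charts-win-service-account
```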
After getting some assistance from @ghickey and @dustinmm80, I got the jobs to deploy into the cluster properly. I need to confirm whether it's related to the Legacy authorization setting.
It looks like it is related, as the job failed immediately after the setting was changed on two subsequent runs. I'm re-confirming that with a 3rd run after switching Legacy authorization back on.
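For the record, the GKE setting in question can be inspected and toggled with gcloud. The cluster name is taken from the comment above and the zone is a placeholder:

```shell
# Is legacy (ABAC) authorization currently enabled on the cluster?
gcloud container clusters describe cloud-native-v122 \
  --zone us-central1-b --format='value(legacyAbac.enabled)'

# Turn it off again once the RBAC bindings are confirmed correct.
gcloud container clusters update cloud-native-v122 \
  --zone us-central1-b --no-enable-legacy-authorization
```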
A kubernetes-provisioning ticket has been filed (gitlab-org/distribution/infrastructure/kubernetes-provisioning#15) with a corresponding MR: gitlab-org/distribution/infrastructure/kubernetes-provisioning!9