Currently, when the "GitLab Runner" application is installed via the Kubernetes integration, there is no way to update the runner to newer versions.
Further details
This makes it impossible to use newer runner features with the Kubernetes integration. As we introduce new syntax (e.g., for reports), old runners will not be able to run the related jobs.
People who use our integration to run CI/CD will have a hard time accessing those new features if we don't provide an easy way to update the existing runners.
Solution
Automatically upgrade the installed runner application to the latest runner version in sync with GitLab
Show when last upgraded, the chart version, and link to the chart.
Application fails to automatically update
When an application upgrade fails, we should retry # times (same as jobs, which I think is 3 times). When the upgrade fails # times, we should notify the group/project owner of the failure via email.
[Retry upgrade] takes the user to the cluster detail view.
(Note: the project line would not appear for group level clusters)
If the upgrade fails # times, we should revert to the previous working version (when possible) and allow the user to manually retry.
If successful, we show the success alert. If unsuccessful, we show the danger alert once again.
What does success look like, and how can we measure that?
A method to upgrade Kubernetes runners.
Usage ping updated to track how many users have clicked the upgrade button
We installed the runner via GitLab to Kubernetes on GKE yesterday and it installed runner v10.3.0 (2017-12-22). This is a really big issue, please fix ASAP. The Kubernetes integration is not very usable given that outdated versions of things get installed.
Thoughts on updating the live one retroactively with the 11.2 version? Easy to do?
@jlenny Since I'll be preparing a version with 11.3.0 tomorrow (as in gitlab-runner#3579 (closed)) I don't think we need another MR that would contain the update to 11.2 (which will also block the merge of update to 11.3 in case of any problems). Let's just start syncing the versions with 11.3.0.
This feature seems to be more and more important, as we are introducing new syntax (e.g., for reports) that will leave old runners unable to run the related jobs.
People who use our integration to run CI/CD will have a hard time accessing those new features if we don't allow them to update the existing runners in an easy way.
Any chance it could be prioritized in the near future? Thanks!
It seems the chart has already been updated and the only thing left would be to use the new chart version in the k8s integration. Is my read here correct, @tkuah @DylanGriffith?
Is it an automatic upgrade or do we require the user to explicitly hit an upgrade button?
Is there the risk of something breaking for the user? We should automatically upgrade, IMO, but I wonder if further consideration for backward compatibility should be made.
Is there a suitable place to show the current Helm chart and application version that is installed?
I picture an "app info" hover that provides this information when the user requires it.
Is there the risk of something breaking for the user? We should automatically upgrade, IMO, but I wonder if further consideration for backward compatibility should be made.
I think the best thing for us to do is be opinionated and keep things up to date with the versions we set and test in GitLab. Doing this automatically seems preferable to me, but we'd just need some way to present the information to a user if the upgrade failed for some reason. Also, a failed upgrade may cause something to be broken in their cluster, so we'd maybe even want to email them if their previously working Ingress just got upgraded and is now broken (i.e., traffic is no longer reaching their app).
I think being opinionated is important because if people want to manage all of this stuff (and disagree with our opinions) then they can always install it themselves outside of GitLab, but if they install through GitLab we should not put a burden on them to keep things up to date.
I imagine we could have a few details here:
Display the installed version in GitLab's UI
Regular schedule to check your installed applications and update them if they are out of date
Display upgrade errors somewhere
Alert users if an upgrade fails (maybe the user that installed the application should get an email)
I think the best thing for us to do is be opinionated and keep things up to date with the versions we set and test in GitLab. Doing this automatically seems preferable to me, but we'd just need some way to present the information to a user if the upgrade failed for some reason.
I agree, let's go for automatic. I think we can alert users to check the logs of the pod used to upgrade, very similar to how we do this for installs.
Also a failed upgrade may cause something to be broken in their cluster so we'd maybe even want to email them
Notification is going to be interesting. Do we proactively notify or simply leave an error message on the cluster page for that application?
If the upgrade fails, will we automatically retry? How many times?
Yes, we should retry. Sometimes there are temporary connection errors which means re-trying automatically will resolve the situation. We can re-try as many times as we prefer.
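As a rough sketch of what an automatic retry could look like at the Helm level (the release name runner, the chart reference gitlab/gitlab-runner, the TARGET_CHART_VERSION variable, and the limit of 3 attempts are illustrative assumptions; GitLab would drive this from a background job rather than a shell loop):
# Hypothetical sketch: retry the chart upgrade a few times before marking it as failed.
for attempt in 1 2 3; do
  helm upgrade runner gitlab/gitlab-runner --version "$TARGET_CHART_VERSION" && break
  echo "Upgrade attempt $attempt failed"
  sleep 30  # brief back-off; transient connection errors often clear on their own
done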
If it continues to fail, what will the user need to do to mitigate the situation?
Once the user has fixed the issue causing the upgrade to fail, can they have a button to try the upgrade again? That seems sensible to me.
If an upgrade fails, how feasible is it to roll back so that we don't break something in their cluster?
Good question. I think it's feasible as long as we know the previous version to roll back to, which we should have in the history. Are you thinking of an automatic rollback in the event of failure?
My guess is that Helm charts mainly use Kubernetes Deployments, so an upgrade will not break things in the sense that the old version will keep running until a new version can successfully start. Of course, that doesn't mean it's foolproof, so we can still roll back with helm rollback.
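For what a rollback could look like with the Helm CLI, a hedged sketch (the release name runner is assumed, and the revision number is illustrative; it would come from the release history):
# List the release history to find the last revision that worked.
helm history runner
# Roll back to that revision, e.g. revision 2, leaving the failed upgrade behind.
helm rollback runner 2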
Daniel Gruesso changed title from Provide a method for upgrading kubernetes runner to Provide a method for upgrading kubernetes runner application via kubernetes integration
Daniel Gruesso changed title from Provide a method for upgrading kubernetes runner application via kubernetes integration to Upgrade kubernetes runner application via kubernetes integration
Can we assume users will have stock-standard configurations?
Are there any users who may have edited their Helm applications after installing them via GitLab managed apps? (Feels increasingly difficult, since mutual auth.)
For those following this issue, feature freeze is not actually for another 6 days; @gitlab-bot appears to have applied the label incorrectly, so that can be disregarded at this point.
Apologies for the spam, but as with yesterday this issue has not actually missed %11.7 at this point (feature freeze is not for several days). Re-removing the missed:11.7 label.
The description is referencing the application version number, which I think makes sense as that is what is being automatically updated.
Do we know which apps do not have the appVersion?
Alternatively, we could show Upgraded x days ago and include the [appVersion] if it's available. That way, the user knows when the app was last updated even if the version number isn't provided.
Regarding the chart number, how easy is it to determine errors caused by chart bugs? Would it make more sense to display an error to the user, rather than always displaying the chart number? I could see some benefit to always linking to the current chart version (if that's possible for all apps?), but that could potentially be separated out of this issue.
Looks like all the apps we currently have do have appVersion, so we could cross that bridge when we encounter an app that does not have one.
Alternatively, we could show Upgraded x days ago and include the [appVersion] if it's available. That way, the user knows when the app was last updated even if the version number isn't provided.
Sounds good to me.
Regarding the chart number, how easy is it to determine errors caused by chart bugs? Would it make more sense to display an error to the user, rather than always displaying the chart number? I could see some benefit to always linking to the current chart version (if that's possible for all apps?), but that could potentially be separated out of this issue.
Having the information about which chart version is installed is always helpful for isolating the problem. In any case, we lock to one chart version on every GitLab release, so we can infer the chart version if we know the GitLab version(s) involved.
Yes, if we can display the errors, we should. I anticipate that, due to security issues, we might not be able to display the full error, but rather display a summary and then tell them where to find the full details (which are accessible by an admin).
I believe it is probably going to be important to show the chart number. Even if the user can tell the app version is out of date, there is no way to actually figure out whether a new chart has been released for that app version anyway. Since we're using the chart version to install, it's the only indicator we have that an upgrade is available. The app version also seems important, though.
My initial reaction is that it makes sense to show the appVersion and link to the chart. I'm not sure the chart number is important, the actual content within the chart seems important.
Thinking further, maybe the appVersion isn't that important to show. If we tell the user when the app was last updated and link to the chart, they have all the information if and when necessary.
Thanks, this is making it more concrete to me now. Wouldn't linking to the Helm chart with the app version raise questions about which Helm chart version is actually being upgraded to? I can imagine a scenario where the chart developer fixes a bug in the chart but the appVersion stays the same.
It is a lot clearer to me to show the chart version plus the link:
Upgraded 8 days ago to 0.1.44 (appVersion: 11.7.0). View Helm chart
We are upgrading the chart, yes :) The commands would be something like:
helm upgrade runner gitlab/gitlab-runner --version <new-chart-version>
So the chart definitely gets upgraded.
The application may or may not get upgraded; I, as the person upgrading the Helm chart, do not control that. The Helm chart developer controls that by updating the chart to use a new version of the application.
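To make the chart version vs. appVersion distinction concrete, a small hedged example against the public runner chart (the repository URL is the GitLab charts repo; the version numbers are the illustrative ones used elsewhere in this thread):
# Add the GitLab chart repository and refresh the local index.
helm repo add gitlab https://charts.gitlab.io
helm repo update
# The chart's Chart.yaml carries both numbers:
#   version:    the chart version we pin to (e.g. 0.1.44)
#   appVersion: the runner version that chart deploys (e.g. 11.7.0)
helm show chart gitlab/gitlab-runner   # "helm inspect chart" on Helm 2
# Upgrading pins the chart version; the appVersion comes along with it.
helm upgrade runner gitlab/gitlab-runner --version 0.1.44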
The appVersion is displayed for each application and links to the chart. When an application is automatically upgraded, we show the user when it was last updated.
Application fails to automatically update
When an application upgrade fails, we should retry # times (same as jobs, which I think is 3 times). When the upgrade fails # times, we should notify the group/project owner of the failure via email.
(Note: the project line would not appear for group level clusters)
If the upgrade fails # times, we should revert to the previous working version (when possible) and allow the user to manually retry.
If successful, we show a toast message. If unsuccessful, we show the danger alert once again.
cc @pedroms @andyvolpe, please review this use case of alerts as it relates to our recent conversation. Let me know if you have thoughts or improvements! I mimicked job failures as closely as possible, including the email, danger alert, and blue retry button.
@tauriedavis I think this is the appropriate usage of alerts and toasts! The alert (non-dismissable) warns the user a config error has occurred and the toast gives the user immediate feedback and confirmation of their successful attempt at upgrading.
@tauriedavis yes, I agree with this approach of using alerts and toasts.
About the error alert in the app:
Should the retry button be inside the alert?
You change the version text to v1.0 Upgrade failed 2 days ago, but this gives the impression that we failed to upgrade to v1.0. I think we should not change this text and should keep it showing the last successful upgrade. Maybe you can improve the error alert text to “Something went wrong when updating GitLab Runner to v1.0 (2 days ago). …” What do you think?
I'm still porting existing functionality from EE and reconciling lots of duplication, and then I can implement what's not there. This Merge Request adds the ability to upgrade an application; it also starts storing when the application was last updated, and the application upgrade status (updating, updated, update_errored).
I think what's not there are:
Triggering the automatic update on each GitLab release (discussion)
user notification when the upgrade fails
showing when app was last updated
showing app or chart version with link to chart
and a button to allow users to trigger upgrade again
I've updated the description. Please reach out if there are any questions or if I can help.
@jerasmus - I updated the description to use a banner alert for the success message. This should use a toast message, but that component has yet to be implemented by the frontend team. I've created a UX Debt issue to account for this: https://gitlab.com/gitlab-org/gitlab-ce/issues/57017. The UX team has struggled to get this component implemented; it would be great if we were able to work towards adding it to GitLab.
Thanks @tauriedavis, I was going to suggest that we scope that off too since I think this would probably become a discussion point amongst the frontend team before we decide on a dependency for the toast messages.