Enable Auto DevOps by default for self-managed instances of GitLab
Problem to solve
Once Auto DevOps is GA and proven, it should be enabled by default for all on-prem (self-hosted) installations.
Enabling Auto DevOps by default means GitLab runners will perform additional work for any number of projects for which Auto DevOps may or may not work. To mitigate these inefficiencies, we want to at least automatically disable Auto DevOps for a project if its first pipeline fails. The following are also considered important to address but not critical:
- Skip Auto DevOps jobs based on license
- Support db migration and initialization for Auto DevOps template
- Don't run Auto DevOps for projects which don't have a shared runner configured
- Skip Auto DevOps stage when necessary components for the stage are not present
- Notification for the first failed pipeline after Auto DevOps is automatically enabled
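To make the "disable after the first failed pipeline" mitigation concrete, here is a minimal sketch of the decision logic. It is not GitLab's implementation: the `Project` class, its fields, and both function names are hypothetical, and it assumes a project with no explicit setting follows the instance default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    # Hypothetical model: None means "no explicit choice", so the project
    # follows the instance-wide default.
    auto_devops_enabled: Optional[bool] = None
    auto_devops_pipelines_finished: int = 0


def effective_auto_devops(project: Project, instance_default: bool) -> bool:
    """Return whether Auto DevOps applies to this project."""
    if project.auto_devops_enabled is not None:
        return project.auto_devops_enabled
    return instance_default


def on_pipeline_finished(project: Project, succeeded: bool) -> bool:
    """Record a finished Auto DevOps pipeline.

    Returns True if Auto DevOps was automatically disabled as a result.
    Only the first pipeline of an implicitly enabled project can trigger this.
    """
    is_first = project.auto_devops_pipelines_finished == 0
    project.auto_devops_pipelines_finished += 1
    implicitly_enabled = project.auto_devops_enabled is None
    if is_first and implicitly_enabled and not succeeded:
        project.auto_devops_enabled = False  # explicit opt-out from now on
        return True
    return False
```

Projects that explicitly opted in are never auto-disabled in this sketch, and a failure after a first success leaves the setting alone. It deliberately ignores the case where one push creates many pipelines at once.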
Set Auto DevOps to "enabled" instance-wide as the default Auto DevOps setting.
Current plan for enabling Auto DevOps by default:
- Aim to merge the following before first RC:
- If the staggered rollout via feature flag has no major issues and makes it to 100%, enable Auto DevOps by default for self-managed in 11.3 on 2018-09-22
- Display Banner To Notify Users If The Project Is Implicitly Opted In To Auto DevOps: https://gitlab.com/gitlab-org/gitlab-ce/issues/50535
- Internal communication with support
- External communication prior to release (Twitter, blog)
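For readers unfamiliar with staggered rollouts, the sketch below shows one common way a percentage-based flag can be evaluated: each project is hashed into a stable bucket, so raising the percentage only ever adds projects. This is a generic illustration, not a description of GitLab's feature flag mechanism; the function name and the flag name `auto_devops_default` are made up for the example.

```python
import hashlib


def in_rollout(project_id: int, percentage: int,
               flag: str = "auto_devops_default") -> bool:
    """Return True if the project falls inside the rollout percentage.

    The flag name is hypothetical. Hashing flag + project ID gives each
    project a stable bucket in 0..99 that does not change between calls.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    digest = hashlib.sha256(f"{flag}:{project_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percentage
```

Because the bucket is fixed per project, a project included at 1% stays included at 10%, 50%, and 100%, which keeps behaviour predictable for users while the rollout widens.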
What does success look like, and how can we measure that?
Auto DevOps jobs are triggered automatically after the instance is upgraded.
Several discussions in various threads raise risks that will need to be approved before we enable this for on-premise customers. These have been separated into known risks (mostly UX issues) and hypothetical risks, which are more complicated and relate to security, reliability, and cost. The risks I've outlined are exactly that: risks. We don't know to what extent our customers may suffer from any of these problems, but they are things we think our customers may end up experiencing and being bothered by. The list is intended to be an objective account of what customers may find frustrating when we enable Auto DevOps for them by default, rather than an exhaustive list of things Auto DevOps could do better. That judgment can occasionally be subjective, so the list may need revision.
- Customers that don't have runners configured at all will have stuck jobs created: https://gitlab.com/gitlab-org/gitlab-ce/issues/49081
- Even though we disable Auto DevOps after the first failure, our implementation does not handle the case where many pipelines are created in one push (e.g. pushing lots of branches and tags), so the UX is not ideal: many pipelines will be created that may all fail.
- Auto DevOps runs stages that are not relevant for some projects, which uses excessive resources (runner time, object storage)
- Current users of Auto DevOps believe it to be quite slow because lots of docker images are downloaded and uploaded, which can waste lots of runner time (the pipeline can take around 30 minutes on a fast internet connection): https://gitlab.com/gitlab-org/gitlab-ce/issues/49562
- Customers making use of external CI (e.g. Jenkins) may experience strange results for CI (e.g. a failed Merge Request that is actually passing on Jenkins). We have not done any testing of how Auto DevOps interacts with external CI like Jenkins.
- Customers may have configured their own CI runners, which will now run the Auto DevOps pipeline, possibly unexpectedly, once we enable this setting for them. Running certain commands on their servers (runners) may cause very strange things to happen (e.g. running rspec when you have a DATABASE_URL set on the server can cause very dangerous things to happen, like truncating a production DB, if you weren't intending to run this command on this host). This risk is higher with shell runners, as they inherit the entire environment and have wider access to the server filesystem, etc.
- Customers may experience some of the same scale problems we've predicted on GitLab.com. Rephrased for our customers, those are:
  - Customers with very large numbers of repositories and high numbers of pushes may experience significant delays to their CI/CD, which will potentially affect developer productivity or delay production deployments as their runners catch up with a very long queue of jobs
  - Customers with very large numbers of repositories and high numbers of pushes may end up with a very large object storage bill, as we store the docker images created in the build stage of Auto DevOps
  - Depending on where the runner and the object storage are hosted, customers with very large numbers of repositories and high numbers of pushes may end up with a very large ingress/egress bill for docker images being pushed/pulled during the deploy phases of the Auto DevOps pipeline
- There is a slim chance that somebody has an existing project configured with a Kubernetes cluster but not using GitLab CI, which will now end up being deployed to a cluster with an internet-facing URL even though it was never intended to be public. This is now incredibly unlikely, since we no longer plan to automatically set a domain name for them (see https://gitlab.com/gitlab-org/gitlab-ce/issues/45560#note_101947623). We did, however, have at least one customer complain about this on gitlab.com, as they were concerned that their application may have been made public online.
- Customers may have docker runners set up that do not support Docker in Docker, so their builds will fail in a way that is not helpful to them and will cause some frustration
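The shell runner risk above comes down to child processes inheriting the host environment. As a generic illustration of the mechanism (not a GitLab Runner feature or configuration), the sketch below runs a command with an allow-listed environment so that a host-level variable such as DATABASE_URL is not visible to it. The allow-list contents and both function names are assumptions made for the example.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Hypothetical allow-list: only these host variables are passed through.
SAFE_VARS = ("PATH", "HOME", "LANG")


def scrubbed_env(extra: Optional[Mapping[str, str]] = None,
                 base: Optional[Mapping[str, str]] = None) -> dict:
    """Build an environment containing only allow-listed variables plus extras."""
    source = os.environ if base is None else base
    env = {name: source[name] for name in SAFE_VARS if name in source}
    env.update(extra or {})
    return env


def run_isolated(cmd: Sequence[str],
                 extra: Optional[Mapping[str, str]] = None
                 ) -> subprocess.CompletedProcess:
    """Run cmd without inheriting the full host environment."""
    return subprocess.run(list(cmd), env=scrubbed_env(extra),
                          capture_output=True, text=True, check=False)
```

A job launched this way would only see a database URL if it were passed explicitly through `extra`, which is roughly the isolation that container-based executors provide by default and that shell executors do not.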
Risks should be accepted before we merge #21157 (closed) (not needed until we've done some 1% testing on GitLab.com):