Enable merge train for gitlab-org/gitlab
This issue is for tracking using merge train for the gitlab-org/gitlab
project.
Known issues / improvements
- Pipelines for Merge requests are a prerequisite.
- Re-Enabling Merge Trains for www-gitlab-com (gitlab-org&1881 - closed)
- Allow parent project's developers to create pip... (gitlab-org/gitlab#217451 - closed)
- gitlab-org/gitlab#35135 (comment 318542454)
- `retry` isn't taken in account for Merge Train ... (gitlab-org/gitlab#244856 - closed)
- https://gitlab.com/gitlab-org/release-tools/-/issues/479
- [Non-blocker] Prevent pushes to a branch once an MR is on the... (gitlab-org/gitlab#244878)
Plan
GitLab Master Pipeline Success Rate has been above 90% for the past few months.
%13.4
Experiment during
~~8~~ 6.5 hour experiment on 2020-09-03
At 0800 UTC:
- Check state of the `ci_merge_train_logging` flag: `/chatops run feature get ci_merge_train_logging`
  State: Enabled Enabled Globally: false Scoped to: ["Project:19608354", "Project:7764"]
- Check state of the `disable_merge_trains` flag: `/chatops run feature get disable_merge_trains`
  State: Enabled Enabled Globally: false Scoped to: ["Project:278964"]
- Enable the `ci_merge_train_logging` flag for `gitlab-org/gitlab`: `/chatops run feature set --project=gitlab-org/gitlab ci_merge_train_logging true`
  State: Enabled Enabled Globally: false Scoped to: ["Project:19608354", "Project:278964", "Project:7764"]
- Disable the `disable_merge_trains` flag for `gitlab-org/gitlab`: `/chatops run feature set --project=gitlab-org/gitlab disable_merge_trains false`
  State: Disable Enabled Globally: false
Notes:
- Because several MRs were set to MWPS (Merge When Pipeline Succeeds), the Merge Train was recreated several times. Next time we should probably remove all the MWPS MRs before enabling the Merge Train. To list them:
  `project = Project.find_by_full_path('gitlab-org/gitlab')`
  `project.merge_requests.with_auto_merge_enabled.select { |mr| mr.auto_merge_strategy == 'merge_when_pipeline_succeeds' }.map(&:iid)`
- `retry` isn't taken into account for Merge Train pipeline jobs? => gitlab-org/gitlab#244856 (closed)
- We cannot manually retry failed jobs in a Merge Train pipeline; this is expected.
- An artifacts upload error caused the first MR in the train to be removed, triggering new pipelines for all the subsequent MRs in the train.
- Automatic Gitaly updates bypass the Merge Train, e.g. gitlab-org/gitlab!41294 (merged). => https://gitlab.com/gitlab-org/release-tools/-/issues/479
- It's possible to push new commits even if the MR is already on the Merge Train, e.g. gitlab-org/gitlab!41139 (comment 406804989). This derails the train and forces new pipelines to be created for all the MRs that come after it. Feature proposal to prevent pushes once an MR is on the Merge Train. => gitlab-org/gitlab#244878
- `package-and-qa` might slow down the train.
- We might want to disable Review Apps for MRs on the Merge Train.
- The pipeline type isn't the sum of all the previous pipelines in the train, e.g. an MR that only touches a frontend test has a "small" pipeline with no Review App even though the previous pipelines in the train do have a Review App.
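The derailment behaviour noted above (an MR leaving the train forces new pipelines for every MR behind it) can be illustrated with a toy model. This is only a sketch: the method name and the MR iids are hypothetical, and it models the train as a plain ordered array rather than using any GitLab internals.

```ruby
# Toy model of a merge train: an ordered list of MR iids, first to merge first.
# Returns the MRs whose pipelines have to be recreated when `dropped`
# leaves the train (dropped after a failure, or derailed by a new push).
def mrs_needing_new_pipelines(train, dropped)
  index = train.index(dropped)
  return [] if index.nil? # the MR was not on the train, nothing to rebuild

  train.drop(index + 1)   # every MR queued behind the dropped one
end

train = [101, 102, 103, 104] # hypothetical MR iids

p mrs_needing_new_pipelines(train, 101) # => [102, 103, 104]
p mrs_needing_new_pipelines(train, 103) # => [104]
p mrs_needing_new_pipelines(train, 104) # => []
```

The earlier in the train the failure happens, the more pipelines are thrown away, which is why a transient failure on the first MR was the most expensive case seen in the experiment.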
At 1430 UTC:
- Enable the `disable_merge_trains` flag for `gitlab-org/gitlab`: `/chatops run feature set --project=gitlab-org/gitlab disable_merge_trains true`
  State: Enabled Enabled Globally: false Scoped to: ["Project:278964"]
- Disable the `ci_merge_train_logging` flag for `gitlab-org/gitlab`: `/chatops run feature set --project=gitlab-org/gitlab ci_merge_train_logging false`
  State: Enabled Enabled Globally: false Scoped to: ["Project:19608354", "Project:7764"]
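For the longer experiments it may help to generate the start and rollback commands from one list, so that the rollback is always the exact inverse of the start. The sketch below only builds the command strings used above. The helper names and the `START_STEPS` constant are hypothetical, and nothing here talks to ChatOps.

```ruby
PROJECT = 'gitlab-org/gitlab'

# Flag values applied at the start of an experiment, in order.
START_STEPS = [
  ['ci_merge_train_logging', true],
  ['disable_merge_trains', false]
].freeze

# Builds the ChatOps command that sets a project-scoped feature flag.
def feature_set_command(flag, value, project: PROJECT)
  "/chatops run feature set --project=#{project} #{flag} #{value}"
end

# Commands to run when the experiment starts.
def start_commands
  START_STEPS.map { |flag, value| feature_set_command(flag, value) }
end

# Commands to run when the experiment ends: reverse order, inverted values.
def rollback_commands
  START_STEPS.reverse.map { |flag, value| feature_set_command(flag, !value) }
end

puts start_commands
puts rollback_commands
```

The rollback list matches the 1430 UTC steps: `disable_merge_trains` is set back to `true` first, then `ci_merge_train_logging` is set back to `false`.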
24 hour experiment on 2020-09-xx based on results of previous experiments
48 hour experiment on 2020-09-xx based on results of previous experiments
~~During the experiment we will not merge any Community contribution MRs unless gitlab-org/gitlab#217451 (closed) is completed before the test.~~ Community contribution MRs can be merged now that gitlab-org/gitlab#217451 (closed) is turned on for GitLab.com
Experiment questions
Engineering Productivity is looking to understand the following items with the experimentation:
- What jobs are in the pipeline based on the `changes:` logic for the 2nd and 3rd pipelines in the train? If the 1st MR in the train is a code change and the 2nd is a docs change, would the 2nd pipeline be a docs + code pipeline graph or just a docs one?
- How do flaky specs and other transient failures impact the merge train?
- Is there any change in throughput during the experiment?
- Is there any change in pipeline duration during the experiment?
Implications
Using Pipelines for Merged Results itself doesn't contribute to the CI cost increase, as the number of created pipelines won't be different. However, this could slightly increase Gitaly load because it passes `recheck` to `MergeabilityCheckService`, so we should monitor the load.
Using Merge Train could increase CI cost, as it creates an additional pipeline for each merged MR, so we can estimate the increase with this simple formula: `daily_increase = per-pipeline-cost * daily_merged_mr_count`. There is an edge case: when a merge request is dropped from a merge train, the pipelines on the train are reconstructed. However, this reconstruction is limited to a maximum of four pipelines per train, which means we wouldn't observe a significant CI cost increase unless pipeline failures on the merge train happen frequently.
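As a worked example of the formula above, here is a minimal sketch. The input numbers are made up for illustration and are not measurements from gitlab-org/gitlab, and the method name is hypothetical. It covers only the base formula, not the reconstruction edge case.

```ruby
# daily_increase = per-pipeline-cost * daily_merged_mr_count
def daily_ci_cost_increase(per_pipeline_cost, daily_merged_mr_count)
  per_pipeline_cost * daily_merged_mr_count
end

# Hypothetical inputs, for illustration only.
per_pipeline_cost = 2.5      # cost of one merge train pipeline, in any currency unit
daily_merged_mr_count = 120  # MRs merged per day

puts daily_ci_cost_increase(per_pipeline_cost, daily_merged_mr_count) # prints 300.0
```

Frequent drops from the train would add reconstruction pipelines on top of this figure, so it is best read as a lower bound for a healthy train.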
Performance-wise, it also slightly increases Gitaly load because it generates a train-ref per merge request on the train. We should also closely watch the activity of the Sidekiq worker `AutoMergeProcessWorker` (https://dashboards.gitlab.net/d/000000124/sidekiq-workers?orgId=1&refresh=5s&var-worker=AutoMergeProcessWorker%23perform&var-database=influxdb-01-inf-gprd), as it's the main Sidekiq job for processing merge train orchestration.
Lastly, we've already started working on dogfooding on CE/EE (see https://gitlab.com/gitlab-org/gitlab-ce/issues/57190, https://gitlab.com/gitlab-org/quality/team-tasks/issues/195 and https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/15761). I'd suggest we start dogfooding on www-gitlab-com to collect some metrics and make sure that it's performant enough for CE/EE, as I'd expect CE/EE to see a bigger impact than that.
If we observe problems or an unacceptable performance or cost increase, we can immediately stop using the above features by simply disabling the checkbox (https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipelines_for_merged_results/#enabling-pipelines-for-merged-results). All merge requests will be dropped from the merge train immediately and go back to the previous behavior.
Debugging
- https://gitlab.com/api/v4/projects/278964/merge_trains?scope=active&sort=asc
- https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-queue=auto_merge:auto_merge_process
- https://log.gprd.gitlab.net/app/kibana#/discover?_g=h@adfdeeb&_a=h@1ab0ea1
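The first debugging link can also be queried from a script. The sketch below builds the same merge trains API URL and fetches it with Ruby's standard library. The method names and the `GITLAB_TOKEN` environment variable are assumptions for illustration, and the response is returned as parsed JSON without assuming any particular fields.

```ruby
require 'json'
require 'net/http'
require 'uri'

GITLAB_API = 'https://gitlab.com/api/v4'

# Builds the merge trains URL for a project, matching the debugging link above.
def merge_trains_url(project_id, scope: 'active', sort: 'asc')
  query = URI.encode_www_form({ 'scope' => scope, 'sort' => sort })
  URI("#{GITLAB_API}/projects/#{project_id}/merge_trains?#{query}")
end

# Fetches the train and returns the parsed JSON response.
# GITLAB_TOKEN is a hypothetical variable name holding a personal access token.
def fetch_merge_train(project_id, token: ENV['GITLAB_TOKEN'])
  uri = merge_trains_url(project_id)
  request = Net::HTTP::Get.new(uri)
  request['PRIVATE-TOKEN'] = token if token
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "GitLab API returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end

puts merge_trains_url(278964)
# prints https://gitlab.com/api/v4/projects/278964/merge_trains?scope=active&sort=asc
```

Calling `fetch_merge_train(278964)` while an experiment is running should return the MRs currently on the train in ascending order, which is useful for checking how often the train gets rebuilt.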