Pipeline duration metrics do not seem correct with child pipelines
It seems our pipelines duration calculation could be wrong in GitLab (and therefore Sisense). See https://gitlab.com/gitlab-org/gitlab/-/pipelines/452082225 which reports taking only 29 minutes, while the child pipeline itself takes 62 minutes: https://gitlab.com/gitlab-org/gitlab/-/pipelines/452086059. I noticed that while looking at https://app.periscopedata.com/app/gitlab/564156/EP---Pipelines-health?widget=11750100&udv=895976 where “code” pipelines take longer than “frontend” ones, which seems unexpected.
-
@rymai - find out from with grouppipeline execution if this is desired or unexpected. We believe this should be unexpected for
start-review-app-pipelinebecause ofstrategy:depend - @rymai - continue finding a few examples
- @kwiebers identify the scope of impacted PIs and dashboards.
Discussed in https://docs.google.com/document/d/1vgEAUOzph1WS4_0olmywYai9HMyIvo63l2BaPIOrm90/edit#bookmark=id.lv9nlob25buz
Example:
-
Parent pipeline queued for 7 seconds (shown in the UI), started at
2022-03-11 10:23:10.063and finished at2022-03-13 11:26:05.387(2 days, 1 hour, 3 minutes), duration reported as 53 minutes (shown in the UI) -
Child pipeline queued for 2 days, 40 minutes (shown in the UI), started at
2022-03-13 11:24:03.062and finished at2022-03-13 11:26:05.249(2 minutes!), duration reported as 32 minutes (shown in the UI)-
review-build-cng-envqueued for 1 second, started at2022-03-11 10:44:45.615and finished at2022-03-11 10:45:36.228(50 seconds) -
allure-report-qa-reliablequeued for 53 minutes, started at2022-03-11 11:37:33.911and finished at2022-03-11 11:38:56.446(1 minute 23 seconds) -
review-stopqueued for 2 days, 40 minutes,2022-03-13 11:24:04.194and finished at2022-03-13 11:26:05.168(2 minutes). This timestamp coincides with theAutoStopCronWorkerwhich runs by default every hour on minute 24: https://gitlab.com/gitlab-org/gitlab/-/blob/3a71a48b5954d8b03f91885913b80c8178e30e00/config/gitlab.yml.example#L517-519
-
Notes:
- Pipeline durations are based on jobs' durations (see https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/pipeline/duration.rb), it's a CI minutes duration, not a wall-clock duration.
-
start-review-app-pipelinestarts approximately 21 minutes 30 seconds (2022-03-11 10:44:44.184) after the parent pipeline started (2022-03-11 10:23:03.435) - Parent pipeline duration (including the child pipeline one) should then be 53 minutes - 21 minutes = 32 minutes, which what's stored (and shown in the UI) in the database
Open questions
- Why does the child pipeline
started_attimestamp (2022-03-13 11:24:03.062) is set to (almost) the samestarted_attimestamp as thereview-stop(2022-03-13 11:24:04.194)?
Conclusion
By looking closer, it seems start-review-app-pipeline is actually not in the critical path anymore: it takes 28 minutes on average these days (https://app.periscopedata.com/app/gitlab/496118/EP---Sandbox?widget=14116510&udv=785399), and they start after 20-21 minutes due to its prerequisite jobs.
That means some of the rspec * jobs are now the critical jobs in both code and frontend pipelines, but code pipelines run the additional rspec:undercoverage job, which explains why code pipelines are slower than frontend pipelines!
Also, it seems rspec system jobs got 5 minutes slower since 2022-02-14 (Apollo was updated from 2 to 3 that day: https://gitlab.slack.com/archives/C0GQHHPGW/p1644837170251409):
-
rspec system: https://app.periscopedata.com/app/gitlab/652085/EP---Jobs-Durations?widget=6914227&udv=1005715 -
rspec-ee system: https://app.periscopedata.com/app/gitlab/652085/EP---Jobs-Durations?widget=6914193&udv=1005715 -
rspec-ee system geo: https://app.periscopedata.com/app/gitlab/652085/EP---Jobs-Durations?widget=6914199&udv=1005715
Lastly, it seems the child pipelines went from 50 to 30 minutes since 2022-02-14/16... (We did merge gitlab-org/gitlab!80121 (merged) on 2022-02-15).
We should investigate that.