Put in place monitoring for our docker image pull tasks (CI Shared runners)
This is one of the possible queries representing the times for this task: https://thanos-query.ops.gitlab.net/graph?g0.range_input=6h&g0.max_source_resolution=0s&g0.expr=sum(gitlab_runner_jobs%7Binstance%3D~%22gitlab-docker-shared-runners-manager-01.gitlab.com.%7Cgitlab-docker-shared-runners-manager-02.gitlab.com.%7Cgitlab-docker-shared-runners-manager-03.gitlab.com.%7Cgitlab-docker-shared-runners-manager-04.gitlab.com.%7Cgitlab-shared-runners-manager-3.gitlab.com.%7Cgitlab-shared-runners-manager-4.gitlab.com.%7Cgitlab-shared-runners-manager-5.gitlab.com.%7Cgitlab-shared-runners-manager-6.gitlab.com.%7Cprivate-runners-manager-3.gitlab.com.%7Cprivate-runners-manager-4.gitlab.com.%7Cshared-runners-manager-3.gitlab.com.%7Cshared-runners-manager-3.staging.gitlab.com.%7Cshared-runners-manager-4.gitlab.com.%7Cshared-runners-manager-4.staging.gitlab.com.%7Cshared-runners-manager-5.gitlab.com.%7Cshared-runners-manager-6.gitlab.com.%7Cwindows-shared-runners-manager-1.gitlab.com.%7Cwindows-shared-runners-manager-2.gitlab.com.%22%7D)%20by%20(executor_stage)&g0.tab=0
We should monitor when the execution time for some of these phases (the critical ones):
- {executor_stage="prepare"}
- {executor_stage="finish"}
- {executor_stage="docker_run"}
- {executor_stage="docker_pulling_image"}
- {executor_stage="docker_machine_release_machine"}
- {executor_stage="docker_creating_user_volumes"}
- {executor_stage="docker_creating_services"}
- {executor_stage="docker_creating_build_volumes"}
- {executor_stage="docker_cleanup"}
- {executor_stage="cleanup"}
exceed a health threshold.
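
As a starting point for discussion, the per-stage checks above could be generated roughly like this. This is only a sketch: it reuses the `gitlab_runner_jobs` metric and `executor_stage` label from the linked query, while the function names, the `instance_regex` shorthand, and all threshold values are hypothetical placeholders that would need to be replaced with numbers taken from observed baselines.

```python
# Stages listed in this issue (values of the executor_stage label).
CRITICAL_STAGES = [
    "prepare",
    "finish",
    "docker_run",
    "docker_pulling_image",
    "docker_machine_release_machine",
    "docker_creating_user_volumes",
    "docker_creating_services",
    "docker_creating_build_volumes",
    "docker_cleanup",
    "cleanup",
]

# Hypothetical placeholder thresholds, NOT real values.
DEFAULT_THRESHOLD = 100
THRESHOLDS = {"docker_pulling_image": 50}


def alert_expr(stage, instance_regex, threshold):
    """Build a PromQL comparison for one stage, mirroring the linked query."""
    return (
        f'sum(gitlab_runner_jobs{{instance=~"{instance_regex}",'
        f'executor_stage="{stage}"}}) by (executor_stage) > {threshold}'
    )


def build_alert_exprs(instance_regex):
    """Return a dict mapping each critical stage to its PromQL expression."""
    return {
        stage: alert_expr(
            stage, instance_regex, THRESHOLDS.get(stage, DEFAULT_THRESHOLD)
        )
        for stage in CRITICAL_STAGES
    }


if __name__ == "__main__":
    # The regex here is an illustrative shorthand, not the full instance list.
    for stage, expr in build_alert_exprs("shared-runners-manager-.*").items():
        print(f"{stage}: {expr}")
```

Each generated expression could then be used as the condition of an alerting rule, one per stage, so that thresholds can be tuned independently.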
(This was marked as an S2, so desired ETA would be a month from now)
cc/ @steveazz , @T4cC0re