
Identify/Reduce job timeout errors

Problem statement

Job timeouts are one of the most frequent causes of failed CI/CD jobs in merge requests.

Goals

  1. Identify the most common causes of job timeouts and reduce them.
  2. Put monitoring in place to track job timeouts, along with tools to understand why they happen.
  3. Develop tooling so engineers can self-serve and understand why their jobs timed out.

Technical thoughts

A job timeout is a symptom, not a cause.

Possible causes include:

  1. Knapsack expected and actual durations differ greatly
  2. Resource saturation makes tests run much longer than expected
  3. Blocking or stuck tests
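
The first cause above can be checked mechanically. The following is a minimal sketch, assuming per-spec-file expected and actual durations are already available as two mappings. The `flag_duration_drift` helper, the file names, the durations, and the 2x threshold are all hypothetical and are not taken from Knapsack's report format.

```python
def flag_duration_drift(expected, actual, ratio_threshold=2.0):
    """Return (spec, expected_s, actual_s) rows where actual >= ratio_threshold * expected.

    Specs with no usable expected duration are flagged as well, because no
    estimate exists for them.
    """
    flagged = []
    for spec, actual_s in actual.items():
        expected_s = expected.get(spec)
        if expected_s is None or expected_s <= 0:
            flagged.append((spec, expected_s, actual_s))
        elif actual_s / expected_s >= ratio_threshold:
            flagged.append((spec, expected_s, actual_s))
    # Slowest offenders first, since they contribute most to a timeout.
    return sorted(flagged, key=lambda row: row[2], reverse=True)


# Hypothetical durations in seconds, for illustration only.
expected = {"spec/a_spec.rb": 10.0, "spec/b_spec.rb": 60.0}
actual = {"spec/a_spec.rb": 45.0, "spec/b_spec.rb": 62.0, "spec/c_spec.rb": 5.0}

for spec, expected_s, actual_s in flag_duration_drift(expected, actual):
    print(spec, expected_s, actual_s)
```

A ratio is used rather than an absolute difference so that short and long spec files are judged on the same scale; an absolute threshold may suit better if only the largest files matter.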

Impact

  • CI Resource Waste: approximately 19,500 minutes of CI time wasted due to job_timeout, along with significant compute resources.
  • Developer Experience: reduced productivity, because job_timeout failures cause failed pipelines.

Current status

| job_timeout failure | Occurrences (last 28 days) | CI minutes wasted (65 mins average) | % of failed jobs due to job_timeout | % of total failed jobs | Issue link | Label / Team handling | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| E2E Tests_JOB_TIMEOUT | 76 | 4940 mins | 24% | 0.5% | gitlab-org/gitlab#558136 | test governance | Child issue to address: "Capture artifacts when E2E test jobs time out" (link) |
| spec_lib_gitlab_ci_templates_katalon_gitlab_ci_yaml_spec_job_timeouts | 36 | 2340 mins | 11% | 0.25% | gitlab-org/gitlab#559984 | Katalon team | |
| cross_database_modification | 47 | 3055 mins | 15% | 0.3% | gitlab-org/gitlab#557248 | | Discussion ongoing in the thread |
| spec_features_work_items_detail_shortcuts_work_item_spec_job_timeouts | 19 | 1235 mins | 6% | 0.13% | gitlab-org/gitlab#558139 | Team Planning | |
| spec_features_groups_runners_owner_manages_runners_spec_rspec_at_80_min | 18 | 1170 mins | 5% | 0.12% | gitlab-org/gitlab#559998 | Runner | |
| ee_spec_lib_ee_gitlab_ci_pipeline_chain_validate_security_orchestration_policy_spec_job_timeouts | 14 | 910 mins | 4% | 0.09% | gitlab-org/gitlab#560094 | Security Policies | |
| spec_models_blob_viewer_gitlab_ci_yml_spec_job_timeouts | 11 | 715 mins | 3% | 0.07% | gitlab-org/gitlab#560095 (closed) | Source Code | |
| spec_lib_gitlab_ci_config_external_mapper_normalizer_spec_job_timeouts | 10 | 650 mins | 3% | 0.069% | gitlab-org/gitlab#560096 (closed) | Pipeline Authoring | |
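
The "CI minutes wasted" figures in the table appear to be the occurrence count multiplied by the 65-minute average, and the job_timeout percentages appear to be each count divided by the 306 total from the full analysis, rounded down. The sketch below reproduces that arithmetic under those assumptions; the function names are illustrative and the rounding rule is inferred from the numbers, not stated in the source.

```python
AVERAGE_TIMEOUT_MINUTES = 65  # average quoted in the table header
TOTAL_JOB_TIMEOUTS = 306      # "Total" from the full job_timeout analysis

# Occurrence counts copied from the status table.
occurrences = {
    "E2E Tests_JOB_TIMEOUT": 76,
    "spec_lib_gitlab_ci_templates_katalon_gitlab_ci_yaml_spec_job_timeouts": 36,
    "cross_database_modification": 47,
    "spec_features_work_items_detail_shortcuts_work_item_spec_job_timeouts": 19,
    "spec_features_groups_runners_owner_manages_runners_spec_rspec_at_80_min": 18,
    "ee_spec_lib_ee_gitlab_ci_pipeline_chain_validate_security_orchestration_policy_spec_job_timeouts": 14,
    "spec_models_blob_viewer_gitlab_ci_yml_spec_job_timeouts": 11,
    "spec_lib_gitlab_ci_config_external_mapper_normalizer_spec_job_timeouts": 10,
}


def wasted_minutes(count, average=AVERAGE_TIMEOUT_MINUTES):
    """Estimated CI minutes lost: occurrences times the average timeout length."""
    return count * average


def share_of_timeouts(count, total=TOTAL_JOB_TIMEOUTS):
    """Whole-number percentage of all job_timeout failures, rounded down (assumed)."""
    return 100 * count // total


for name, count in occurrences.items():
    print(f"{name}: {wasted_minutes(count)} mins, {share_of_timeouts(count)}%")

print("all timeouts:", wasted_minutes(TOTAL_JOB_TIMEOUTS), "mins")
```

Applied to all 306 timeouts, the same estimate gives 19,890 minutes, which is in line with the approximately 19,500 minutes quoted under Impact.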

Full job_timeout analysis

============================================================
FAILURE CATEGORY SUMMARY
============================================================
E2E Tests_JOB_TIMEOUT                                                                                                                      : 76
cross_database_modification                                                                                                                : 47
spec_lib_gitlab_ci_templates_katalon_gitlab_ci_yaml_spec_job_timeouts                                                                      : 36
spec_features_work_items_detail_shortcuts_work_item_spec_job_timeouts                                                                      : 19
rspec_valid_rspec_errors_or_flaky_tests                                                                                                    : 18
spec_features_groups_runners_owner_manages_runners_spec_rspec_at_80_min                                                                    : 18
ee_spec_lib_ee_gitlab_ci_pipeline_chain_validate_security_orchestration_policy_spec_job_timeouts                                           : 14
spec_models_blob_viewer_gitlab_ci_yml_spec_job_timeouts                                                                                    : 11
spec_lib_gitlab_ci_config_external_mapper_normalizer_spec_job_timeouts                                                                     : 10
ruby_generic_failure                                                                                                                       :  8
RestClient::Exceptions::ReadTimeout_JOB_TIMEOUT                                                                                            :  5
spec_features_issues_user_uploads_file_to_note_spec_job_timeouts                                                                           :  4
spec_features_ide_spec_job_timeouts                                                                                                        :  3
spec_lib_gitlab_metrics_exporter_base_exporter_spec_job_timeouts                                                                           :  3
  Compiling frontend assets with webpack, running: yarn webpack > tmp/webpack-output.log 2>&1JOB_TIMEOUT                                   :  2
spec_features_discussion_comments_snippets_spec_job_timeouts                                                                               :  2
spec_lib_gitlab_metrics_exporter_base_exporter_spec_rspec_at_80_min                                                                        :  2
  ==> 'bundle exec rake db:drop db:create db:schema:load db:migrate gitlab:db:lock_writes' succeeded in 5324 seconds.JOB_TIMEOUT:  1
  JOB_TIMEOUT                                                                                                                          :  1
[5/5] Building fresh packages...JOB_TIMEOUT                                                                             :  1
assets_compilation                                                                                                                         :  1
db_cross_schema_access                                                                                                                     :  1
ee_spec_features_issues_issue_sidebar_spec_job_timeouts                                                                                    :  1
ee_spec_features_merge_trains_user_adds_merge_request_to_merge_train_spec_job_timeouts                                                     :  1
ee_spec_graphql_types_requirements_management_test_report_state_enum_spec_job_timeouts                                                     :  1
ee_spec_lib_sbom_occurrence_uuid_spec_job_timeouts                                                                                         :  1
git_issues_network_error                                                                                                                   :  1
logs_too_big_to_analyze                                                                                                                    :  1
spec_features_admin_runners_admin_manages_runners_spec_rspec_at_80_min                                                                     :  1
spec_features_discussion_comments_merge_request_spec_job_timeouts                                                                          :  1
spec_features_ide_user_opens_merge_request_spec_job_timeouts                                                                               :  1
spec_features_incidents_incident_details_spec_job_timeouts                                                                                 :  1
spec_features_merge_request_user_merges_only_if_pipeline_succeeds_spec_job_timeouts                                                        :  1
spec_features_merge_requests_user_sees_note_updates_in_real_time_spec_job_timeouts                                                         :  1
spec_features_projects_files_user_deletes_files_spec_job_timeouts                                                                          :  1
spec_features_projects_files_user_edits_files_spec_job_timeouts                                                                            :  1
spec_features_projects_files_user_replaces_files_spec_job_timeouts                                                                         :  1
spec_features_projects_fork_spec_job_timeouts                                                                                              :  1
spec_features_projects_spec_job_timeouts                                                                                                   :  1
spec_features_snippets_user_edits_snippet_spec_job_timeouts                                                                                :  1
spec_lib_gitlab_ci_config_external_mapper_base_spec_job_timeouts                                                                           :  1
spec_lib_gitlab_ci_pipeline_chain_create_spec_job_timeouts                                                                                 :  1
spec_lib_gitlab_ci_yaml_processor_test_cases_include_spec_job_timeouts                                                                     :  1
spec_serializers_ci_lint_result_entity_spec_job_timeouts                                                                                   :  1
spec_tasks_gettext_rake_spec_job_timeouts                                                                                                  :  1
-----------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                      : 306
============================================================
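
To reuse the summary above in monitoring or further analysis, it can be parsed back into counts. The sketch below assumes the `name : count` layout shown in the block; `parse_summary` is a hypothetical helper, not part of any existing tooling. It splits on the last colon because some category names contain colons themselves (for example `RestClient::Exceptions::ReadTimeout_JOB_TIMEOUT` or the `db:drop db:create` log line).

```python
def parse_summary(text):
    """Parse 'name : count' lines into (counts, reported_total).

    Ruler lines, headings, and blank lines are skipped. The 'Total' line is
    returned separately so it can be checked against the sum of the counts.
    """
    counts = {}
    reported_total = None
    for line in text.splitlines():
        # rpartition splits on the LAST colon, so colons inside names survive.
        name, sep, count = line.rpartition(":")
        name, count = name.strip(), count.strip()
        if not sep or not name or not count.isdigit():
            continue
        if name == "Total":
            reported_total = int(count)
        else:
            counts[name] = int(count)
    return counts, reported_total


# Excerpt of the summary; the Total line is recomputed for this excerpt only.
SAMPLE = """\
============================================================
FAILURE CATEGORY SUMMARY
============================================================
E2E Tests_JOB_TIMEOUT                              : 76
RestClient::Exceptions::ReadTimeout_JOB_TIMEOUT    :  5
  ==> 'bundle exec rake db:drop db:create db:schema:load db:migrate gitlab:db:lock_writes' succeeded in 5324 seconds.JOB_TIMEOUT:  1
------------------------------------------------------------
Total                                              : 82
============================================================
"""

counts, reported_total = parse_summary(SAMPLE)
print(sum(counts.values()) == reported_total)
```

Comparing the summed counts with the reported total is a cheap consistency check before the numbers are fed into a dashboard or report.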
Edited by Pranshu Sharma