Identify/Reduce job timeout errors
Problem statement
Job timeouts are one of the most frequent causes of failed CI/CD jobs in merge requests.
Goals
- Identify the most common causes of timeouts and reduce them.
- Put monitoring in place to track job timeouts, along with tools to understand why they happen.
- Develop tooling that helps engineers self-serve and understand why their jobs timed out.
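As a possible starting point for the monitoring goal, the minimal Python sketch below counts timed-out jobs per job name and totals the minutes they consumed. It assumes job records shaped like GitLab Jobs API responses (name, status, failure_reason, and duration in seconds). It also assumes job_execution_timeout is the failure reason recorded when a job hits its timeout. Check both assumptions against real API output. Fetching the records (authentication, pagination) is left out, and the job names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

# Assumption: the failure_reason value GitLab records when a job exceeds its timeout.
TIMEOUT_REASON = "job_execution_timeout"


def summarize_timeouts(jobs: Iterable[Mapping]) -> tuple[Counter, float]:
    """Count timed-out jobs per job name and total the minutes they ran.

    Each job is assumed to be a dict shaped like a GitLab Jobs API record
    with "name", "status", "failure_reason" and "duration" (seconds) keys.
    """
    per_job_name: Counter = Counter()
    wasted_seconds = 0.0
    for job in jobs:
        if job.get("status") == "failed" and job.get("failure_reason") == TIMEOUT_REASON:
            per_job_name[job["name"]] += 1
            # duration may be null in API responses; count that as 0 seconds.
            wasted_seconds += job.get("duration") or 0.0
    return per_job_name, wasted_seconds / 60


# Hypothetical job records, for illustration only.
sample_jobs = [
    {"name": "rspec unit 1/10", "status": "failed", "failure_reason": "job_execution_timeout", "duration": 5400.0},
    {"name": "rspec unit 1/10", "status": "failed", "failure_reason": "job_execution_timeout", "duration": 3600.0},
    {"name": "e2e example", "status": "failed", "failure_reason": "script_failure", "duration": 1200.0},
    {"name": "e2e example", "status": "success", "failure_reason": None, "duration": 900.0},
]
counts, minutes = summarize_timeouts(sample_jobs)
print(counts.most_common())  # [('rspec unit 1/10', 2)]
print(minutes)  # 150.0
```

Keying the counter on job name keeps parallel shards (for example "1/10" versus "2/10") separate, which helps when only one shard keeps timing out.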
Technical thoughts
A job timeout is a symptom, not a cause.
Possible causes include:
- Knapsack's expected and actual durations differ greatly
- Resource saturation makes tests run much longer than expected
- Blocking or stuck tests
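One way to check the first cause is to compare Knapsack's expected duration for each spec file with the duration actually observed. The sketch below assumes both inputs are JSON objects mapping spec file paths to seconds. This matches the layout Knapsack uses for its report file, but verify it against the files your pipeline produces. The 2× ratio and 60-second floor are arbitrary, illustrative thresholds, and the spec paths and numbers in the example are made up.

```python
import json
from pathlib import Path


def load_durations(path: str) -> dict[str, float]:
    """Load a JSON object mapping spec file paths to durations in seconds."""
    return {spec: float(seconds) for spec, seconds in json.loads(Path(path).read_text()).items()}


def find_duration_drift(
    expected: dict[str, float],
    actual: dict[str, float],
    ratio_threshold: float = 2.0,  # illustrative: flag files running >= 2x their estimate
    min_seconds: float = 60.0,  # illustrative: ignore files that are fast anyway
) -> list[tuple[str, float, float, float]]:
    """Return (spec_file, expected_s, actual_s, ratio) for files that ran far longer than expected, worst first."""
    drifted = []
    for spec_file, actual_s in actual.items():
        expected_s = expected.get(spec_file)
        # Files with no positive baseline cannot be compared by ratio; skip them here.
        if expected_s is None or expected_s <= 0 or actual_s < min_seconds:
            continue
        ratio = actual_s / expected_s
        if ratio >= ratio_threshold:
            drifted.append((spec_file, expected_s, actual_s, ratio))
    return sorted(drifted, key=lambda row: row[3], reverse=True)


# Illustrative numbers only, not measurements from this pipeline.
expected = {"spec/example_a_spec.rb": 120.0, "spec/example_b_spec.rb": 300.0, "spec/example_new_spec.rb": 0.0}
actual = {"spec/example_a_spec.rb": 600.0, "spec/example_b_spec.rb": 330.0, "spec/example_new_spec.rb": 90.0}
for spec_file, expected_s, actual_s, ratio in find_duration_drift(expected, actual):
    print(f"{spec_file}: expected {expected_s:.0f}s, took {actual_s:.0f}s ({ratio:.1f}x)")
# spec/example_a_spec.rb: expected 120s, took 600s (5.0x)
```

Files without a baseline (new specs) are skipped rather than flagged. They may deserve their own report, because a missing estimate can also unbalance a Knapsack split.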
Impact
- CI resource waste: approximately 19,500 minutes of CI time wasted due to job_timeout, along with significant compute resources.
- Developer experience: reduced productivity because job_timeout failures cause failed pipelines.
CURRENT STATUS
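| job_timeout failure | Occurrences (last 28 days) | CI minutes wasted (65 min average) | % of all job_timeout failures | % of total failed jobs | Issue link | Label / team handling | Status |
|---|---|---|---|---|---|---|---|
| E2E Tests_JOB_TIMEOUT | 76 | 4940 mins | 24% | 0.5% | | test governance | Child issue to address: "Capture artifacts when E2E test jobs time out" |
| spec_lib_gitlab_ci_templates_katalon_gitlab_ci_yaml_spec_job_timeouts | 36 | 2340 mins | 11% | 0.25% | | Katalon team | |
| cross_database_modification | 47 | 3055 mins | 15% | 0.3% | | | Discussion ongoing in the thread |
| spec_features_work_items_detail_shortcuts_work_item_spec_job_timeouts | 19 | 1235 mins | 6% | 0.13% | | Team Planning | |
| | 18 | 1170 mins | 5% | 0.12% | | | |
| ee_spec_lib_ee_gitlab_ci_pipeline_chain_validate_security_orchestration_policy_spec_job_timeouts | 14 | 910 mins | 4% | 0.09% | | | |
| spec_models_blob_viewer_gitlab_ci_yml_spec_job_timeouts | 11 | 715 mins | 3% | 0.07% | | | |
| spec_lib_gitlab_ci_config_external_mapper_normalizer_spec_job_timeouts | 10 | 650 mins | 3% | 0.069% | | | |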
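The minutes and first percentage column follow from simple arithmetic. Minutes wasted is occurrences × 65. The share is the occurrence count divided by the 306 job_timeout failures in the category summary, truncated to a whole percent. The snippet below recomputes the rows that way and reproduces every minutes and share value in the table. It also applies the same 65-minute average to all 306 timeouts, which gives 19,890 minutes.

```python
AVG_MINUTES_PER_TIMEOUT = 65  # average stated in the table header
TOTAL_TIMEOUTS = 306  # "Total" line of the failure category summary


def row_figures(occurrences: int) -> tuple[int, int]:
    """Return (CI minutes wasted, whole-percent share of all job_timeout failures)."""
    minutes = occurrences * AVG_MINUTES_PER_TIMEOUT
    share_pct = occurrences * 100 // TOTAL_TIMEOUTS  # truncated, as in the table
    return minutes, share_pct


for n in (76, 36, 47, 19, 18, 14, 11, 10):
    minutes, share_pct = row_figures(n)
    print(f"{n:>3} occurrences -> {minutes} mins, {share_pct}%")
print("all timeouts:", TOTAL_TIMEOUTS * AVG_MINUTES_PER_TIMEOUT, "mins")
```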
Full job_timeout analysis

============================================================
FAILURE CATEGORY SUMMARY
============================================================
E2E Tests_JOB_TIMEOUT : 76
cross_database_modification : 47
spec_lib_gitlab_ci_templates_katalon_gitlab_ci_yaml_spec_job_timeouts : 36
spec_features_work_items_detail_shortcuts_work_item_spec_job_timeouts : 19
rspec_valid_rspec_errors_or_flaky_tests : 18
spec_features_groups_runners_owner_manages_runners_spec_rspec_at_80_min : 18
ee_spec_lib_ee_gitlab_ci_pipeline_chain_validate_security_orchestration_policy_spec_job_timeouts : 14
spec_models_blob_viewer_gitlab_ci_yml_spec_job_timeouts : 11
spec_lib_gitlab_ci_config_external_mapper_normalizer_spec_job_timeouts : 10
ruby_generic_failure : 8
RestClient::Exceptions::ReadTimeout_JOB_TIMEOUT : 5
spec_features_issues_user_uploads_file_to_note_spec_job_timeouts : 4
spec_features_ide_spec_job_timeouts : 3
spec_lib_gitlab_metrics_exporter_base_exporter_spec_job_timeouts : 3
Compiling frontend assets with webpack, running: yarn webpack > tmp/webpack-output.log 2>&1JOB_TIMEOUT : 2
spec_features_discussion_comments_snippets_spec_job_timeouts : 2
spec_lib_gitlab_metrics_exporter_base_exporter_spec_rspec_at_80_min : 2
==> 'bundle exec rake db:drop db:create db:schema:load db:migrate gitlab:db:lock_writes' succeeded in 5324 seconds.JOB_TIMEOUT: 1
JOB_TIMEOUT : 1
[5/5] Building fresh packages...JOB_TIMEOUT : 1
assets_compilation : 1
db_cross_schema_access : 1
ee_spec_features_issues_issue_sidebar_spec_job_timeouts : 1
ee_spec_features_merge_trains_user_adds_merge_request_to_merge_train_spec_job_timeouts : 1
ee_spec_graphql_types_requirements_management_test_report_state_enum_spec_job_timeouts : 1
ee_spec_lib_sbom_occurrence_uuid_spec_job_timeouts : 1
git_issues_network_error : 1
logs_too_big_to_analyze : 1
spec_features_admin_runners_admin_manages_runners_spec_rspec_at_80_min : 1
spec_features_discussion_comments_merge_request_spec_job_timeouts : 1
spec_features_ide_user_opens_merge_request_spec_job_timeouts : 1
spec_features_incidents_incident_details_spec_job_timeouts : 1
spec_features_merge_request_user_merges_only_if_pipeline_succeeds_spec_job_timeouts : 1
spec_features_merge_requests_user_sees_note_updates_in_real_time_spec_job_timeouts : 1
spec_features_projects_files_user_deletes_files_spec_job_timeouts : 1
spec_features_projects_files_user_edits_files_spec_job_timeouts : 1
spec_features_projects_files_user_replaces_files_spec_job_timeouts : 1
spec_features_projects_fork_spec_job_timeouts : 1
spec_features_projects_spec_job_timeouts : 1
spec_features_snippets_user_edits_snippet_spec_job_timeouts : 1
spec_lib_gitlab_ci_config_external_mapper_base_spec_job_timeouts : 1
spec_lib_gitlab_ci_pipeline_chain_create_spec_job_timeouts : 1
spec_lib_gitlab_ci_yaml_processor_test_cases_include_spec_job_timeouts : 1
spec_serializers_ci_lint_result_entity_spec_job_timeouts : 1
spec_tasks_gettext_rake_spec_job_timeouts : 1
-----------------------------------------------------------------------------------------------------------------------------------------------
Total : 306
============================================================
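To support self-serve analysis, the summary output above can be parsed back into counts and ranked. The following is a minimal sketch, and the function name is hypothetical. It splits each line on its last colon because some category keys, such as RestClient::Exceptions::ReadTimeout_JOB_TIMEOUT, contain colons themselves. Separator lines, the section title, and the Total line are skipped.

```python
from collections import Counter


def parse_category_summary(text: str) -> Counter:
    """Parse "<category> : <count>" lines from the FAILURE CATEGORY SUMMARY output."""
    counts: Counter = Counter()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, "====" / "----" separators and the Total line.
        if not line or set(line) <= {"=", "-"} or line.startswith("Total"):
            continue
        # Split on the last colon, since category keys may contain colons.
        category, sep, count = line.rpartition(":")
        if not sep or not count.strip().isdigit():
            continue  # e.g. the "FAILURE CATEGORY SUMMARY" title
        counts[category.strip()] += int(count)
    return counts


# A trimmed excerpt of the output above.
summary = """\
============================================================
FAILURE CATEGORY SUMMARY
============================================================
E2E Tests_JOB_TIMEOUT : 76
cross_database_modification : 47
RestClient::Exceptions::ReadTimeout_JOB_TIMEOUT : 5
------------------------------------------------------------
Total : 306
============================================================
"""
counts = parse_category_summary(summary)
for category, n in counts.most_common():
    print(f"{n:>4}  {category}")
```

Run against the full output above, the parsed counts sum to 306, matching the Total line. Comparing the two is a cheap consistency check for the tool that produces the summary.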