Pipeline status stuck at "running" for certain pipelines that have already finished
There seems to be an issue where the status of some pipelines is not updated even after all their CI jobs have finished. In the examples below, the status still says "running" although every job has finished:
- https://gitlab.com/gitlab-com/gl-infra/gitter-infrastructure/commit/e209c2691004997ebefaeb30b0b7fbeecce23de5/pipelines
- https://gitlab.com/gitlab-org/gitaly/pipelines/7408787
- https://gitlab.com/gitlab-org/gitaly/pipelines/28755159
- https://gitlab.com/gitlab-org/gitaly/pipelines/28736996
@andrewn mentioned that this is likely caused by a Sidekiq job that failed to update the status, and that we may be able to fix it with a periodic cron job that tidies up such pipelines.
This also causes downstream problems in our data warehouse, in the table gitlab_dotcom_ci_pipelines. I have seen the following inconsistencies there; each one is followed by a query that reproduces it:
- Some pipelines that have status='running' also have finished_at and duration populated
SELECT
ci.CI_PIPELINE_ID,
ci.CREATED_AT,
ci.UPDATED_AT,
ci.STARTED_AT,
ci.FINISHED_AT,
ci.REF,
ci.PROJECT_ID,
projects.project_path,
namespaces_child.namespace_path,
ci.STATUS,
ci.USER_ID,
ci.CI_PIPELINE_DURATION/60 as ci_runner_minutes
FROM
ANALYTICS_STAGING.GITLAB_DOTCOM_CI_PIPELINES ci
INNER JOIN ANALYTICS.GITLAB_DOTCOM_PROJECTS_XF projects
ON ci.project_id = projects.project_id
INNER JOIN ANALYTICS.GITLAB_DOTCOM_NAMESPACES_XF namespaces_child
ON projects.namespace_id = namespaces_child.namespace_id
WHERE status = 'running'
AND DATE_TRUNC('MONTH',STARTED_AT) < '2018-12-01 00:00:00'
AND FINISHED_AT is not null
AND started_at is not null
AND ci.CI_PIPELINE_DURATION is not null
ORDER BY ci_runner_minutes DESC
- Some pipelines have status='running' and a finished_at date, but no duration and no started_at date
SELECT
ci.CI_PIPELINE_ID,
ci.CREATED_AT,
ci.UPDATED_AT,
ci.STARTED_AT,
ci.FINISHED_AT,
ci.REF,
ci.PROJECT_ID,
projects.project_path,
namespaces_child.namespace_path,
ci.STATUS,
ci.USER_ID,
ci.CI_PIPELINE_DURATION/60 as ci_runner_minutes
FROM
ANALYTICS_STAGING.GITLAB_DOTCOM_CI_PIPELINES ci
INNER JOIN ANALYTICS.GITLAB_DOTCOM_PROJECTS_XF projects
ON ci.project_id = projects.project_id
INNER JOIN ANALYTICS.GITLAB_DOTCOM_NAMESPACES_XF namespaces_child
ON projects.namespace_id = namespaces_child.namespace_id
WHERE status = 'running'
AND FINISHED_AT is not null
AND started_at is null
AND ci.CI_PIPELINE_DURATION is null
ORDER BY ci.CREATED_AT
- Some pipelines started as far back as 2016 still have status='running' and no finished_at date or duration recorded
SELECT
ci.CI_PIPELINE_ID,
ci.CREATED_AT,
ci.UPDATED_AT,
ci.STARTED_AT,
ci.FINISHED_AT,
ci.REF,
ci.PROJECT_ID,
projects.project_path,
namespaces_child.namespace_path,
ci.STATUS,
ci.USER_ID,
ci.CI_PIPELINE_DURATION/60 as ci_runner_minutes
FROM
ANALYTICS_STAGING.GITLAB_DOTCOM_CI_PIPELINES ci
INNER JOIN ANALYTICS.GITLAB_DOTCOM_PROJECTS_XF projects
ON ci.project_id = projects.project_id
INNER JOIN ANALYTICS.GITLAB_DOTCOM_NAMESPACES_XF namespaces_child
ON projects.namespace_id = namespaces_child.namespace_id
WHERE status = 'running'
AND DATE_TRUNC('MONTH',STARTED_AT) < '2018-12-01 00:00:00'
AND FINISHED_AT is null
AND started_at is not null
AND ci.CI_PIPELINE_DURATION is null
ORDER BY ci.STARTED_AT
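To get a sense of how many pipelines fall into each of the three cases above, the conditions could be folded into a single query. This is only a rough sketch: it assumes the same table and columns as the queries above, reuses the 2018-12-01 cutoff so that pipelines that may genuinely still be running are excluded, and the case labels and the aliases inconsistency_case, pipeline_count and earliest_created_at are made up for illustration.

```sql
-- Sketch only: bucket stuck 'running' pipelines by which fields are populated.
-- Labels and aliases are illustrative and not part of any existing model.
SELECT
  CASE
    WHEN ci.FINISHED_AT IS NOT NULL
         AND ci.STARTED_AT IS NOT NULL
         AND ci.CI_PIPELINE_DURATION IS NOT NULL
      THEN 'finished_at and duration populated'
    WHEN ci.FINISHED_AT IS NOT NULL
         AND ci.STARTED_AT IS NULL
         AND ci.CI_PIPELINE_DURATION IS NULL
      THEN 'finished_at only, no start or duration'
    WHEN ci.FINISHED_AT IS NULL
         AND ci.STARTED_AT IS NOT NULL
         AND ci.CI_PIPELINE_DURATION IS NULL
      THEN 'started but never finished'
    ELSE 'other'
  END AS inconsistency_case,
  COUNT(*) AS pipeline_count,
  MIN(ci.CREATED_AT) AS earliest_created_at
FROM
  ANALYTICS_STAGING.GITLAB_DOTCOM_CI_PIPELINES ci
WHERE ci.STATUS = 'running'
  -- Same cutoff as the queries above; rows with no start date are kept
  -- so that the second case is still counted.
  AND (ci.STARTED_AT IS NULL OR ci.STARTED_AT < '2018-12-01 00:00:00')
GROUP BY 1
ORDER BY pipeline_count DESC
```

The joins to the projects and namespaces tables are left out because the counts only need the pipelines table; they can be added back to break the counts down by project or namespace.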
Note: This may also be related to #33818.
@kathleentam @jjstark FYI, you can look at some of these issues here: https://app.periscopedata.com/app/gitlab/530329/WIP:-Davis-CI-Runner-Minutes