Admin Job's running page fails with 500 error
Summary
Following our discussion on Slack (internal only), I am raising this issue to investigate further and fix error 500 while accessing the admin page jobs https://<giltab-url>/admin/jobs?scope=running
After investigation, we have narrowed it to SQL, which runs when you access the page. We have found out that it takes about two minutes to produce results. Hence, it times out. We have tried to vacuum the table and rebuild indexes and have had no improvement.
Below is the query plan.
gitlabhq_production=# SET statement_timeout = 0;
SET
gitlabhq_production=# EXPLAIN (ANALYZE, BUFFERS) SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('running')) ORDER BY "ci_builds"."id" ASC LIMIT 31;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..22465.95 rows=31 width=8) (actual time=121223.800..121223.801 rows=0 loops=1)
Buffers: shared hit=9022155 read=1851020
-> Index Scan using ci_builds_pkey on ci_builds (cost=0.43..16754927.12 rows=23120 width=8) (actual time=121223.797..121223.798 rows=0 loops=1)
Filter: (((type)::text = 'Ci::Build'::text) AND ((status)::text = 'running'::text))
Rows Removed by Filter: 11743779
Buffers: shared hit=9022155 read=1851020
Planning Time: 1.502 ms
Execution Time: 121224.937 ms
(8 rows)
gitlabhq_production=# EXPLAIN (ANALYZE, BUFFERS) SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" = 'Ci::Build' AND ("ci_builds"."status" IN ('pending')) ORDER BY "ci_builds"."id" ASC LIMIT 31;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1536.52..1536.60 rows=31 width=8) (actual time=0.082..0.083 rows=0 loops=1)
Buffers: shared hit=7
-> Sort (cost=1536.52..1539.01 rows=997 width=8) (actual time=0.081..0.081 rows=0 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=7
-> Index Scan using ci_builds_gitlab_monitor_metrics on ci_builds (cost=0.56..1506.84 rows=997 width=8) (actual time=0.027..0.027 rows=0 loops=1)
Index Cond: ((status)::text = 'pending'::text)
Buffers: shared hit=4
Planning Time: 0.493 ms
Execution Time: 0.116 ms
(11 rows)
Steps to reproduce
Accessing the admin job page via https:///admin/jobs?scope=running gives 500 error.
Zendesk ticket (internal only)
Example Project
What is the current bug behavior?
What is the expected correct behavior?
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true
)(we will only investigate if the tests are passing)