Tag listing is slower than expected
Overview
Our company uses GitLab container (version 4.34.0) registry separately from GitLab itself. Our workload requires handling a high number of image tags per repository, with the most problematic repository currently having ~500,000 tags. Tag listing for this largest repository takes much longer than expected (currently 72 seconds).
Problem
The documentation shows SQL query for tag listing. Even without pagination, it executes much faster than what we're observing:
# EXPLAIN (ANALYZE,BUFFERS,VERBOSE,TIMING) SELECT id, top_level_namespace_id, repository_id, manifest_id, name, created_at, updated_at FROM tags WHERE top_level_namespace_id = '3' AND repository_id = '2' ORDER BY name;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using tags_p_57_top_level_namespace_id_repository_id_name_key on partitions.tags_p_57 tags (cost=0.55..49535.39 rows=491457 width=162) (actual time=0.027..424.259 rows=491841 loops=1)
Output: tags.id, tags.top_level_namespace_id, tags.repository_id, tags.manifest_id, tags.name, tags.created_at, tags.updated_at
Index Cond: ((tags.top_level_namespace_id = '3'::bigint) AND (tags.repository_id = '2'::bigint))
Buffers: shared hit=506706
Query Identifier: 4304366928498447998
Planning:
Buffers: shared hit=99
Planning Time: 0.405 ms
Execution Time: 442.756 ms
(9 rows)
pg_stat_statements reveals that the problem is actually in another query:
# SELECT total_exec_time, mean_exec_time, calls, query FROM pg_stat_statements ORDER BY total_exec_time DESC limit 3;
total_exec_time | mean_exec_time | calls | query
--------------------+--------------------+----------+------------------------------------------------------------------------------------------------------------------
3327369600.2000575 | 98.34944978308418 | 33832112 | SELECT +
| | | $1 +
| | | FROM +
| | | tags +
| | | WHERE +
| | | top_level_namespace_id = $2 +
| | | AND repository_id = $3 +
| | | AND name LIKE $4 AND name > $5
2241813280.313653 | 457.7347755929858 | 4897625 | DELETE FROM manifests WHERE top_level_namespace_id = $1 AND repository_id = $2 AND digest = decode( $3 , $4)
61920865.86521688 | 1.8300905795042925 | 33834864 | SELECT +
| | | id, +
| | | top_level_namespace_id, +
| | | name, +
| | | repository_id, +
| | | manifest_id, +
| | | created_at, +
| | | updated_at +
| | | FROM +
| | | tags +
| | | WHERE +
| | | top_level_namespace_id = $1 +
| | | AND repository_id = $2 +
| | | AND name > $3 +
| | | ORDER BY +
| | | name +
| | | LIMIT $4
(3 rows)
Notice the very similar number of calls between the top query and the third query. However, the first query has no LIMIT, which it probably should. As a result, we see a significantly higher mean_exec_time.
pg_stat_activity shows many such queries running concurrently. In fact, most of the time SELECT query FROM pg_stat_activity WHERE state='active' ORDER BY query_start; returns nothing but queries like this (and those queries do not return anything meaningful, all of them are SELECT 1 FROM tags ...).
Suspected cause
A cursory search of the source code identifies only one location with a similar query: function HasTagsBeforeName. Perhaps a reasonable LIMIT can be added there.