Investigate GroupsFinder usage and possibly prevent some use cases that result in performance issues
A seemingly simple and harmless GroupsFinder call:
GroupsFinder.new(nil, { search: params[:group_id] }).execute.first
in !141358 (merged) resulted in a production issue.
The generated query ran for more than six minutes and caused query timeout errors:
Time: 6.661 min
- planning: 10.830 ms
- execution: 6.660 min
- I/O read: 5.559 min
- I/O write: 344.306 ms
Shared buffers:
- hits: 17080624 (~130.30 GiB) from the buffer pool
- reads: 4156111 (~31.70 GiB) from the OS file cache, including disk I/O
- dirtied: 27703 (~216.40 MiB)
- writes: 18779 (~146.70 MiB)
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/27605/commands/86113 (internal)
The query itself may not necessarily be the problem, because GroupsFinder calls that return fewer results ran fine:
someorg/somegroup:
Time: 7.265 s
- planning: 6.492 ms
- execution: 7.258 s
- I/O read: 6.743 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 15291 (~119.50 MiB) from the buffer pool
- reads: 10669 (~83.40 MiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/27640/commands/86222 (internal)
zzzzzzzzzz/yyyyyyyyyy:
Time: 986.685 ms
- planning: 10.687 ms
- execution: 975.998 ms
- I/O read: 916.884 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 746 (~5.80 MiB) from the buffer pool
- reads: 1023 (~8.00 MiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/27640/commands/86230
We should review the GroupsFinder usages to see whether other clients pass a nil user together with the search param. If not, we could add a validation around this (nil user + only search param = not allowed).
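The proposed validation could look roughly like the sketch below. It is plain Ruby and does not reflect how GroupsFinder is actually structured: the module name GroupsFinderGuard, the error class UnscopedGroupSearchError, and the validate! method are hypothetical names introduced only for illustration.

```ruby
# Hypothetical error raised for the disallowed combination (not an existing GitLab class).
class UnscopedGroupSearchError < ArgumentError; end

# Hypothetical guard; the real check would live wherever GroupsFinder validates its input.
module GroupsFinderGuard
  # Rejects the combination described above: nil user + only the search param.
  def self.validate!(current_user, params)
    # Ignore params that were passed with a nil value.
    present_keys = params.compact.keys.map(&:to_sym)

    return unless current_user.nil? && present_keys == [:search]

    raise UnscopedGroupSearchError,
          'searching groups with a nil user and only the search param is not allowed'
  end
end

# Allowed: a user is present, or another param narrows the query.
GroupsFinderGuard.validate!(Object.new, { search: 'someorg/somegroup' })
GroupsFinderGuard.validate!(nil, { search: 'someorg/somegroup', parent: 1 })

# Rejected: nil user with only a search term.
begin
  GroupsFinderGuard.validate!(nil, { search: 'someorg/somegroup' })
rescue UnscopedGroupSearchError => e
  puts "rejected: #{e.message}"
end
```

Whether to raise, log, or return an empty relation is an open design choice; raising is used here only because it makes the disallowed call sites easy to find during the review.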
For scenarios like !141358 (merged), where we have the full group path, the singular finder GroupFinder.find_by_full_path is the better finder to use.
cc @10io