Renaming a namespace with a large number of projects without container repositories times out
Context
It is a well-known problem that renaming/moving groups with container repositories is not possible; see Allow renaming/moving groups and projects with ... (&9459).
It was recently brought to our attention in this issue (internal) that attempting to rename a namespace from which all container repositories have been removed still fails.
🐰 Digging the rabbit hole
Looking at the log entry with the timeout error: https://log.gprd.gitlab.net/app/discover#/doc/7092c4e2-4eb5-46f2-8305-a7da2edad090/pubsub-rails-inf-gprd-014596?id=8Pg6CYUBjhF1IBb-kXwZ.
Let's look at some values:
- json.duration_s: 59.98642. Obviously, we hit the web timeout (60 secs).
- Let's see where we spent this time:
  - json.external_http_duration_s: 1.806. Out of ~60 secs, we spent less than 2 seconds contacting external backends. The external backend here is the container registry (from the Rails backend point of view). Side note: let's see how many calls to the container registry we made. json.external_http_count: 436.
  - (A) Next culprit in line, the database. json.db_duration_s: 36.248. 😱 We spent ~60% of the total time in the database.
  - (B) Where did we spend the rest of the time? Well: json.cpu_s: 23.122. This means that we spent it in Ruby code.
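As a quick sanity check on the numbers above, here is a small Ruby sketch that turns each logged duration into a share of json.duration_s. The values are copied from the log entry; the variable names are only illustrative. Note that the three fields add up to slightly more than the total (61.176 s vs. 59.986 s), so the shares should be read as a rough breakdown rather than an exact partition.

```ruby
# Durations (in seconds) copied from the log entry quoted above.
timings = {
  "json.db_duration_s" => 36.248,
  "json.cpu_s" => 23.122,
  "json.external_http_duration_s" => 1.806
}
total = 59.98642 # json.duration_s

# Share of the total request time for each field, in percent.
shares = timings.transform_values { |seconds| (seconds / total * 100).round(1) }

shares.each { |field, pct| puts format("%-32s %5.1f%%", field, pct) }
```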
So, we have two problems: (A) and (B).
Let's take them one by one.
💾 (A) Database queries
(A) Let's look at the database logs with the correlation id:
Those 2 UPDATEs combined take 20 seconds. That's a bit high, but it could be expected as this action is about updating the group path. Nevertheless, a deeper investigation could be useful here to see whether we can improve that.
The other aspect here is the number of queries: json.db_count: 15,736. 15,000+ queries for a single action points to an n+1 situation. I didn't push too far here, but I think that all_projects.find(&:has_container_registry_tags?) loads each project and then, for each one, loads its container repositories. That's a small n+1 right there. Tossing a solution here: I think all_projects.includes(:container_repositories).find(&:has_container_registry_tags?) could help (it will load all container repositories of all projects in a single query).
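To illustrate the suspected pattern, here is a plain-Ruby simulation that counts "queries". It is a sketch under assumptions, not GitLab's code: FakeDB, the Project struct and the body of has_container_registry_tags? are hypothetical stand-ins, and 500 projects is an arbitrary number. The point it shows is that find stops at the first match, so when no project has any tags (the case in this issue) the lazy path visits every project and issues one query each, while preloading issues a single query.

```ruby
# Hypothetical stand-in for the database; it only counts queries.
class FakeDB
  attr_reader :query_count

  def initialize(repos_by_project)
    @repos_by_project = repos_by_project
    @query_count = 0
  end

  # Lazy path: one query per project (the n+1 shape).
  def repositories_for(project_id)
    @query_count += 1
    @repos_by_project.fetch(project_id, [])
  end

  # Preloaded path: one query for all projects at once.
  def repositories_for_all(project_ids)
    @query_count += 1
    project_ids.to_h { |id| [id, @repos_by_project.fetch(id, [])] }
  end
end

# Hypothetical project model; the predicate body is an assumption.
Project = Struct.new(:id, :repositories) do
  def has_container_registry_tags?
    repositories.any? { |repo| repo[:tags].any? }
  end
end

project_ids = (1..500).to_a # no project has container repositories

# Lazy: repositories are fetched as each project is visited.
lazy_db = FakeDB.new({})
lazy_hit = project_ids.find do |id|
  Project.new(id, lazy_db.repositories_for(id)).has_container_registry_tags?
end

# Preloaded: repositories are fetched up front, then checked in memory.
eager_db = FakeDB.new({})
preloaded = eager_db.repositories_for_all(project_ids)
eager_hit = project_ids.find do |id|
  Project.new(id, preloaded[id]).has_container_registry_tags?
end

puts "lazy queries:  #{lazy_db.query_count}"
puts "eager queries: #{eager_db.query_count}"
```

Both paths return the same answer (no project with tags); only the number of round trips differs.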
♦ (B) Ruby loops
(B) There are several loops in functions that fundamentally only need to check one thing: whether a given namespace has tags or not. I'm wondering if we could improve things with the Container Registry GitLab API.
This can be prevented by adding a new endpoint to the Container Registry GitLab V1 API. See feat: add namespace with tags endpoint (container-registry#868 - closed).
See Use new registry API endpoint to check if a nam... (#388537 - closed) for the proposed solution.
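The difference between looping and asking once can be sketched as follows. Everything in this block is hypothetical: CountingRegistry and both of its methods are illustrative stand-ins, not the real registry client or the endpoint from the linked issues, and the 436 repository paths are chosen only to echo json.external_http_count (the logged calls may not map one-to-one to repositories).

```ruby
# Hypothetical registry stand-in that counts calls made to it.
class CountingRegistry
  attr_reader :calls

  def initialize(tags_by_repository_path)
    @tags_by_repository_path = tags_by_repository_path
    @calls = 0
  end

  # Per-repository check: the caller has to loop, one call per repository.
  def repository_has_tags?(path)
    @calls += 1
    @tags_by_repository_path.fetch(path, []).any?
  end

  # Namespace-level check: one call, with the scan done on the registry side.
  def namespace_has_tags?(namespace_path)
    @calls += 1
    prefix = "#{namespace_path}/"
    @tags_by_repository_path.any? do |path, tags|
      path.start_with?(prefix) && tags.any?
    end
  end
end

# 436 repositories under one namespace, none of them with tags.
paths = (1..436).map { |i| "my-group/project-#{i}" }
data = paths.to_h { |path| [path, []] }

looping = CountingRegistry.new(data)
loop_result = paths.any? { |path| looping.repository_has_tags?(path) }

single = CountingRegistry.new(data)
single_result = single.namespace_has_tags?("my-group")

puts "loop:   #{looping.calls} calls, has tags: #{loop_result}"
puts "single: #{single.calls} call, has tags: #{single_result}"
```

With no tags anywhere, the loop cannot short-circuit and pays for every repository, whereas the namespace-level check answers the same question in one round trip.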
🔮 Conclusions
I think we have two solid leads here to improve the group transfer action:
- Reduce the SQL queries.
- Reduce/remove the loops done in the code.
Please note that I looked quickly at both points and they probably need a deeper investigation.