recursive group sync is not complete
Originally reported by @jwalzer at https://gitlab.com/profclems/glab/-/issues/281
Description
When using recursive group sync on a (self-hosted) GitLab instance, not all subgroups and their repositories are fetched.
Expected Behavior vs Actual Behavior
I'm using the following command to sync a gitlab group (and all subgroups and repos) recursively:
GITLAB_HOST=gitlab.internal.domain glab repo clone -g "parentgroup" -p -G -S
This fetches some of the subgroups and repositories from all parts of the tree below it, but definitely not all; quite a few groups and repos are missing.
- Specifying a subgroup in the recursive tree fills in some of the gaps, but also does not deliver everything.
- Running exactly the same command again gives exactly the same set of synced repos. Of course, glab complains that all the directories already exist.
Some details about the GitLab instance: this is an internal server with classified content, so I cannot show or document specific repository and group names, and any logs or command lines will have to be anonymized. I'm aware that this can make troubleshooting harder if we need to find "patterns", but I'm sure this can be reproduced on an open instance.
The group and repository structure is quite sophisticated. In my current situation, we have >500 groups nested in hierarchies up to 10 levels deep and >5000 repositories. My current customer uses these for granular access control and permissions.
I've already seen this issue with at least two other customers, so I assume it is not caused by a misconfiguration of this customer's instance.
It seems related to the visibility setting that is specified. When limiting via --visibility, I can fetch additional repositories, but then it seems others are not fetched.
My core assumption is that a simple "repo clone -g" will clone ALL repositories that are visible to me, and that --visibility is only a filter that further shrinks the selection. If this is not the case, it should probably be mentioned in the documentation.
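To make the "filter only shrinks" assumption checkable, here is a minimal Python sketch. The function name and its inputs are hypothetical and not part of glab. Given the project paths from an unfiltered listing and from listings filtered by each visibility value, it reports paths that only show up when filtering. Under the assumption above the result should always be empty; a non-empty result matches the symptom described.

```python
from typing import Dict, Iterable, Set


def find_filter_violations(
    unfiltered: Iterable[str],
    by_visibility: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Return, per visibility value, project paths that appear in the
    filtered listing but are missing from the unfiltered listing.

    If --visibility only shrank the selection, every filtered listing would
    be a subset of the unfiltered one and the result would be empty.
    """
    everything = set(unfiltered)
    violations: Dict[str, Set[str]] = {}
    for visibility, paths in by_visibility.items():
        extra = set(paths) - everything
        if extra:
            violations[visibility] = extra
    return violations


if __name__ == "__main__":
    # Hypothetical, anonymized example data.
    unfiltered = ["parentgroup/a", "parentgroup/sub/b"]
    by_visibility = {
        "internal": ["parentgroup/a", "parentgroup/sub/c"],
        "private": ["parentgroup/sub/b"],
    }
    print(find_filter_violations(unfiltered, by_visibility))
    # prints {'internal': {'parentgroup/sub/c'}}
```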
I assume there is some kind of request limiting in the REST API, possibly paginated responses, that limits the number of repositories returned. That would at least explain this behaviour.
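The pagination hypothesis can be checked against the GitLab REST API directly. The following minimal Python sketch assumes a personal access token sent as the PRIVATE-TOKEN header. It lists every project of a group via GET /api/v4/groups/:id/projects with include_subgroups=true, and it keeps requesting pages until the X-Next-Page response header is empty. The API returns 20 items per page by default (at most 100 via per_page), so a client that reads only the first page would see only a fraction of >5000 repositories. The helper names are hypothetical. This is not glab's implementation.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes a URL and returns (parsed JSON body, lower-cased headers).
Fetcher = Callable[[str], Tuple[list, Dict[str, str]]]


def http_fetcher(token: str) -> Fetcher:
    """Build a fetcher that authenticates with a PRIVATE-TOKEN header."""
    def fetch(url: str) -> Tuple[list, Dict[str, str]]:
        req = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
            headers = {k.lower(): v for k, v in resp.headers.items()}
        return body, headers
    return fetch


def list_group_projects(
    host: str, group: str, fetch: Fetcher, per_page: int = 100
) -> List[str]:
    """Return path_with_namespace for all projects in `group` and its
    subgroups, following offset pagination until the last page."""
    group_id = urllib.parse.quote(group, safe="")  # "a/b" -> "a%2Fb"
    page: Optional[str] = "1"
    paths: List[str] = []
    while page:
        query = urllib.parse.urlencode(
            {"include_subgroups": "true", "per_page": per_page, "page": page}
        )
        url = f"https://{host}/api/v4/groups/{group_id}/projects?{query}"
        body, headers = fetch(url)
        paths.extend(p["path_with_namespace"] for p in body)
        # X-Next-Page is empty (or absent) on the last page.
        page = headers.get("x-next-page", "").strip() or None
    return paths
```

For example, `list_group_projects("gitlab.internal.domain", "parentgroup", http_fetcher(token))` returns a list whose length can be compared with the number of repositories glab actually cloned.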
Possible Fix
- At least, I wish to have documented:
- how the --visibility switch works by default: by default, there is a chance that not all repositories of a group will be fetched.
- how the
- how to debug: this is a bit hard, because I haven't found a way in the documentation to turn on debug logging, so I cannot verify whether this is caused by a wrong request, a wrong reply, or wrong parsing of the response ...
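Without debug logging, one way to tell a wrong reply from a cloning problem is to compare what the API reports with what ended up on disk. The following minimal Python sketch assumes that the clone preserves the namespace (assumed here to be what -p does in the command above), so that each repository is expected at <root>/<path_with_namespace>. Both that directory layout and the function name are assumptions, not documented glab behaviour. The expected paths would come from an API listing such as GET /api/v4/groups/:id/projects.

```python
from pathlib import Path
from typing import Iterable, List


def missing_clones(root: str, expected_paths: Iterable[str]) -> List[str]:
    """Return the expected project paths (e.g. path_with_namespace values
    from the API) that have no .git directory under `root`, sorted."""
    base = Path(root)
    return sorted(p for p in expected_paths if not (base / p / ".git").is_dir())
```

If the API listing itself is short, the problem lies in the request or the reply. If the listing is complete but `missing_clones` is non-empty, the problem lies in how glab processes it.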
Steps to Reproduce
- Create or use a GitLab instance with lots of groups and repositories (see the numbers above)
GITLAB_HOST=gitlab.internal.domain glab repo clone -g "$parentgroup" -p -G -S
- See that not all groups and repositories are cloned
- You may try playing with the --visibility filter to change the number of repos that are pulled
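To reproduce this on an open instance, a tree of roughly the size described above can be generated through the REST API. The following minimal Python sketch assumes a test instance where the token is allowed to create groups. It uses POST /api/v4/groups (with parent_id for subgroups) and POST /api/v4/projects (with namespace_id). The naming scheme, the fan-out, and the helper names are hypothetical choices.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

# post(endpoint, params) -> parsed JSON response (groups must contain "id").
Poster = Callable[[str, Dict[str, object]], Dict[str, object]]


def http_poster(host: str, token: str) -> Poster:
    """Build a poster that sends form-encoded POSTs to the v4 REST API."""
    def post(endpoint: str, params: Dict[str, object]) -> Dict[str, object]:
        data = urllib.parse.urlencode(params).encode()
        req = urllib.request.Request(
            f"https://{host}/api/v4{endpoint}",
            data=data,
            headers={"PRIVATE-TOKEN": token},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return post


def build_tree(
    post: Poster,
    name: str,
    depth: int,
    fanout: int,
    projects_per_group: int,
    parent_id: Optional[object] = None,
) -> int:
    """Create a group, its projects, and `fanout` subgroups per level down to
    `depth` levels. Returns the number of groups created."""
    params: Dict[str, object] = {"name": name, "path": name}
    if parent_id is not None:
        params["parent_id"] = parent_id
    group_id = post("/groups", params)["id"]
    for i in range(projects_per_group):
        post("/projects", {"name": f"{name}-repo{i}", "namespace_id": group_id})
    created = 1
    if depth > 1:
        for j in range(fanout):
            created += build_tree(
                post, f"{name}-{j}", depth - 1, fanout, projects_per_group, group_id
            )
    return created
```

With depth=10, fanout=2 and projects_per_group=5, `build_tree` creates 2^10 - 1 = 1023 groups and 5115 projects, in the range of the numbers above.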
Logs
Your Environment
- Version used (run glab --version):
$ glab version
glab version 1.22.0 (2022-01-10)
- Operating System and version: reproduced on:
- MacOS
- Linux