Geo: Figure out how to deal with/optimize keep-around refs
In GitLab CE, it appears we have over 43,000 keep-around refs. This is affecting performance significantly in a number of ways:
- An initial clone of the CE repository takes over 40 minutes for some reason. An strace shows an endless stream of:
21400 00:42:29.187024 stat("./refs/keep-around/07516f8e21f69f0cf3ae6b6354eeaef5ab877ac6", {st_mode=S_IFREG|0644, st_size=41, ...}) = 0 <0.000022>
21400 00:42:29.187076 lstat("./refs/keep-around/07516f8e21f69f0cf3ae6b6354eeaef5ab877ac6", {st_mode=S_IFREG|0644, st_size=41, ...}) = 0 <0.000014>
21400 00:42:29.187116 open("./refs/keep-around/07516f8e21f69f0cf3ae6b6354eeaef5ab877ac6", O_RDONLY) = 10 <0.000077>
21400 00:42:29.187264 read(10, "07516f8e21f69f0cf3ae6b6354eeaef5"..., 256) = 41 <0.000017>
21400 00:42:29.187317 read(10, "", 215) = 0 <0.000014>
21400 00:42:29.187353 close(10) = 0 <0.000016>
21400 00:42:29.187392 stat("./refs/keep-around/060e14edd9f64ba92b17b3f12b42c8191afb3f25", {st_mode=S_IFREG|0644, st_size=41, ...}) = 0 <0.000023>
21400 00:42:29.187446 lstat("./refs/keep-around/060e14edd9f64ba92b17b3f12b42c8191afb3f25", {st_mode=S_IFREG|0644, st_size=41, ...}) = 0 <0.000016>
21400 00:42:29.187488 open("./refs/keep-around/060e14edd9f64ba92b17b3f12b42c8191afb3f25", O_RDONLY) = 10 <0.000085>
21400 00:42:29.187617 read(10, "060e14edd9f64ba92b17b3f12b42c819"..., 256) = 41 <0.000014>
21400 00:42:29.187661 read(10, "", 215) = 0 <0.000011>
21400 00:42:29.187693 close(10) = 0 <0.000013>
21400 00:42:29.187727 stat("./refs/keep-around/080a359e0cad3f2a08fc0050aef4ef038bd645dc", {st_mode=S_IFREG|0644, st_size=41, ...}) = 0 <0.000018>
21400 00:42:29.187772 lstat("./refs/keep-around/080a359e0cad3f2a08fc0050aef4ef038bd645dc", {st_mode=S_IFREG|0644, st_size=41, ...}) = 0 <0.000029>
21400 00:42:29.187841 open("./refs/keep-around/080a359e0cad3f2a08fc0050aef4ef038bd645dc", O_RDONLY) = 10 <0.000050>
21400 00:42:29.187952 read(10, "080a359e0cad3f2a08fc0050aef4ef03"..., 256) = 41 <0.000049>
21400 00:42:29.188052 read(10, "", 215) = 0 <0.000016>
21400 00:42:29.188127 close(10) = 0 <0.000016>
- After the initial fetch is done, subsequent git fetches take a few minutes to negotiate in the info-refs endpoint. This takes so long that the JWT token that was generated has already expired, preventing the clone from even starting. We could raise the validity time of the token, but that won't improve the speed of the fetches in general. (https://gitlab.com/gitlab-org/gitlab-ee/issues/4881)
Since the number of keep-around references will continue to increase over time, I think we need to do something about this.
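To get a rough idea of how many keep-around refs are stored as loose files (the strace above shows one stat/lstat/open/read/close sequence per loose ref) versus entries in `packed-refs`, something like the following could be run against the bare repository directory. This is only a minimal sketch: the function names are made up for illustration, and it assumes the standard on-disk layout (loose refs as files under `refs/keep-around/`, packed refs as `<sha> <refname>` lines in `packed-refs`). Whether packing these refs (e.g. with `git pack-refs --all`) would actually help here is an assumption, not something verified in this issue.

```python
import os
from pathlib import Path


def count_loose_refs(repo_dir, namespace="refs/keep-around"):
    """Count loose ref files under `namespace` in a bare repository directory.

    Hypothetical helper: each loose ref is one small file on disk, which
    matches the per-file syscalls visible in the strace output.
    """
    root = Path(repo_dir) / namespace
    if not root.is_dir():
        return 0
    return sum(len(files) for _, _, files in os.walk(root))


def count_packed_refs(repo_dir, namespace="refs/keep-around"):
    """Count entries for `namespace` in the repository's packed-refs file.

    Hypothetical helper: assumes the usual "<sha> <refname>" line format.
    """
    packed = Path(repo_dir) / "packed-refs"
    if not packed.is_file():
        return 0
    count = 0
    with packed.open() as fh:
        for line in fh:
            # Skip the header comment ("# pack-refs with: ...") and
            # peeled-tag lines, which start with "^".
            if line.startswith(("#", "^")):
                continue
            parts = line.split()
            if len(parts) == 2 and parts[1].startswith(namespace + "/"):
                count += 1
    return count
```

Calling `count_loose_refs(path)` and `count_packed_refs(path)` with the path to the bare repository would show how the keep-around refs are split between the two storage forms. A large loose count would be consistent with the per-file access pattern in the strace.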
/cc: @nick.thomas, @DouweM
Edited by Katrin Leinweber (GTLB)