Introduce additional connection logic to LDAP so groups don't lose membership unnecessarily
Related to https://gitlab.com/gitlab-org/gitlab-ee/issues/480
The fix for #480 (closed) added some retry logic to LDAP connections, which has helped avoid intermittent connection failures. However, there is another aspect we should consider, which we partially discussed in #480 (closed).
We had a scenario with a customer in https://gitlab.zendesk.com/agent/tickets/96258 where their bind user's account was locked for about 15 minutes. The lockout was temporary, but a group sync ran while the bind user was locked, and the failed bind would have presented as a connection error. As a result, GitLab cleared out all group membership for groups with LDAP group links. This is unfortunate, and some of the additional logic we discussed earlier may address it.
Specifically, we don't want to simply ignore connection failures and let membership remain indefinitely. That could be a security concern, since administrators and group owners have chosen LDAP as the source of truth. To address this, we may need to track the number of consecutive failures or how long failures have persisted. Within that window, membership would remain unchanged. After a specific number of failures or amount of time, we would remove members if we still cannot re-establish communication.
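The grace-window idea above could look roughly like the following. This is a minimal Python sketch, not GitLab's implementation (GitLab's LDAP sync is written in Ruby); `LdapFailureTracker`, `decide`, and the default thresholds `max_failures=3` and `grace_period=1h` are hypothetical names and values chosen only for illustration. State is kept in memory here, whereas a real version would have to persist it between sync runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LdapFailureTracker:
    """Hypothetical tracker for consecutive LDAP connection failures.

    State is kept in memory for simplicity; a real implementation would
    need to persist it between sync runs.
    """
    max_failures: int = 3                          # assumed threshold
    grace_period: timedelta = timedelta(hours=1)   # assumed threshold
    consecutive_failures: int = 0
    first_failure_at: Optional[datetime] = None

    def record_success(self) -> None:
        # Any successful connection closes the failure window.
        self.consecutive_failures = 0
        self.first_failure_at = None

    def record_failure(self, now: datetime) -> None:
        # Remember when the current run of failures started.
        if self.first_failure_at is None:
            self.first_failure_at = now
        self.consecutive_failures += 1

    def grace_exhausted(self, now: datetime) -> bool:
        # Exhausted once EITHER the failure count or the elapsed time
        # reaches its threshold.
        if self.first_failure_at is None:
            return False
        too_many = self.consecutive_failures >= self.max_failures
        too_long = now - self.first_failure_at >= self.grace_period
        return too_many or too_long


def decide(tracker: LdapFailureTracker, connected: bool, now: datetime) -> str:
    """Return the hypothetical action for one group sync run.

    "sync"   - LDAP reachable: apply LDAP membership as usual.
    "keep"   - LDAP unreachable, still inside the grace window: change nothing.
    "remove" - grace window exhausted: fall back to removing LDAP-linked members.
    """
    if connected:
        tracker.record_success()
        return "sync"
    tracker.record_failure(now)
    return "remove" if tracker.grace_exhausted(now) else "keep"


if __name__ == "__main__":
    t0 = datetime(2017, 1, 1, 12, 0)
    tracker = LdapFailureTracker()
    # A short lockout: one failed sync, then a successful one.
    print(decide(tracker, connected=False, now=t0))                         # keep
    print(decide(tracker, connected=True, now=t0 + timedelta(minutes=15)))  # sync
```

Allowing either threshold to end the window means a brief lockout like the 15-minute one above leaves membership untouched, provided the thresholds are set longer than typical transient outages, while a prolonged outage still eventually restores LDAP as the source of truth.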
cc @DouweM, since you were involved in the original discussion.
For what it's worth, I think the recently shipped changes fixing #480 (closed) will alleviate a large portion of the problem, so this may not be urgent, but it would definitely be nice to have. In this customer's case, they used overrides heavily, and now they have to reconstruct all of them because the members were wiped and re-added. That's a large inconvenience.