`Rugged::Diff#find_similar!` takes forever on huge diffs

Zendesk: https://gitlab.zendesk.com/agent/tickets/93078

When creating a merge request for a particular project, the customer often runs into timeouts. Their change sets are pretty large; smaller ones complete in time without any problem. We ran a profile on a particularly large change set and it took 165 seconds. A request that slow can't complete in the UI because of the 60 second timeout.

The profile is too big to attach here, so see the Zendesk ticket; that way we also don't need to mark this issue confidential. The profile shows that the majority of the time was spent in `Rugged::Diff#find_similar!`.

  • Is there anything we can do to improve this call?
  • Can we at least make this async so simply creating a merge request is possible and doesn't require waiting on the diff?
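
For the first question, one option might be to bound the work before calling `find_similar!`. Below is a rough sketch only, not existing GitLab code: the helper name `detect_renames` and the `MAX_DELTAS_FOR_RENAME_DETECTION` cutoff are made up for illustration, and the cutoff value is a guess that would need tuning against real profiles. It assumes `diff` behaves like a `Rugged::Diff`, i.e. responds to `#size` (number of deltas) and `#find_similar!` with the `:renames` and `:rename_limit` options.

```ruby
# Hypothetical helper for illustration; not part of GitLab or Rugged.
# Assumed cutoff: above this many deltas we skip rename detection entirely.
MAX_DELTAS_FOR_RENAME_DETECTION = 1_000

# `diff` is expected to quack like a Rugged::Diff:
# it must respond to #size and #find_similar!(options).
# Returns true if similarity detection ran, false if it was skipped.
def detect_renames(diff, max_deltas: MAX_DELTAS_FOR_RENAME_DETECTION, rename_limit: 200)
  # Very large diffs are where find_similar! gets expensive, so bail out early.
  return false if diff.size > max_deltas

  # Only look for renames, and cap how many candidates are compared.
  diff.find_similar!(renames: true, rename_limit: rename_limit)
  true
end
```

The trade-off is that diffs over the cutoff would show a rename as a delete plus an add, which may be acceptable if the alternative is a timeout.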

@stanhu Any thoughts here?

I also marked this as ~Discussion since it involves an MR, but I suspect it may actually be ~Platform since it's not exclusive to MRs? @smcgivern