GitHub API (Octokit) search query can grow too long and run into an HTTP 414 (URI Too Long)

Update

Proposed solution for this issue: use the GitHub GraphQL API to perform this query instead of a REST GET request. Because GraphQL requests are sent via POST, the search string travels in the request body rather than in the URI. The GET query string length limit that triggers this error is 2KB, while the GraphQL body limit is 8KB, which is unlikely to be reached in practice (see below).

Summary

When mirroring or importing repositories from a GitHub instance, the Octokit client is used. When a search term is entered on the repository listing UI, the search query built by the Octokit client includes every collaborated repository via repo: terms and every discovered organization via org: terms, in addition to the base query.

When these lists of repos or orgs are long, the resulting search API v3 request becomes extremely long, because the whole query is passed as URI parameters.

The search fails with an HTTP 414 (URI Too Long) returned from the GitHub instance.
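
To illustrate how quickly the URI grows, here is a minimal sketch. It is not the actual GitLab or Octokit code: the helper names `build_search_query` and `search_uri`, the host `github.example.com`, and the repo and org names are all hypothetical, and joining the terms with spaces is an assumption about how the query is assembled.

```ruby
require 'uri'

# Hypothetical helper: appends one repo: term per collaborated repository
# and one org: term per organization to the user's search term.
def build_search_query(term, repos:, orgs:)
  qualifiers = repos.map { |r| "repo:#{r}" } + orgs.map { |o| "org:#{o}" }
  ([term] + qualifiers).join(' ')
end

# Hypothetical helper: places the whole query in the URI, as a REST GET does.
def search_uri(base_url, query)
  uri = URI.join(base_url, 'search/repositories')
  uri.query = URI.encode_www_form(q: query)
  uri
end

# Made-up data standing in for a user with many collaborations.
repos = (1..100).map { |i| "example-org-#{i}/example-repo-#{i}" }
orgs  = (1..20).map { |i| "example-org-#{i}" }

query = build_search_query('widget', repos: repos, orgs: orgs)
uri   = search_uri('https://github.example.com/api/v3/', query)

puts "query length: #{query.length}, URI bytes: #{uri.to_s.bytesize}"
```

With only 100 repositories and 20 organizations, the encoded URI in this sketch already exceeds the 2KB limit mentioned in this issue.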

Steps to reproduce

Enable the feature flag to use the new GitHub client implementation (the legacy implementation appeared too slow even at returning listing results):

Feature.enable(:remove_legacy_github_client)

  1. Create a large number of orgs and repos on a GitHub instance
  2. Use the GitHub integration in GitLab via a personal access token
  3. List importable repositories via the GitHub integration on the New Project screen
  4. Observe that the list of repositories is long, and decide to search by name instead of scrolling
  5. Enter a search term in the search bar at the top right of this page and run the search
  6. An error appears: Requesting your github repositories failed

Example Project

What is the current bug behavior?

Searching for repositories by name on the GitHub import page fails with an error.

What is the expected correct behavior?

Searching for repositories by name on the GitHub import page should show results.

Relevant logs and/or screenshots

image

Output of checks

This was originally reported on a self-managed GitLab instance importing from a GitHub Enterprise instance. The GitLab version used was v13.9.0-ee.

Possible solution

There are two directions that would help prevent such cases:

  • Use the GitHub GraphQL API to perform this query instead of a REST GET request. Because GraphQL requests are sent via POST, the search string travels in the request body rather than in the URI. The GET query string length limit that triggers this error is 2KB, while the GraphQL body limit is 8KB, which is unlikely to be reached in practice (this issue).

  • Shorten the search query by adding filters for where to look for repositories: owned, collaborated, or in a specific organization (tracked in a separate issue).
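
The first direction could look roughly like the sketch below. It only builds the POST request and does not send it. The helper name `build_graphql_search_request`, the host `github.example.com`, the token, and the repo names are hypothetical; the `/api/graphql` path assumes a GitHub Enterprise instance. This is not the implementation chosen for GitLab, just an illustration of moving the search string from the URI into the request body.

```ruby
require 'json'
require 'net/http'
require 'uri'

# GraphQL document using the GitHub `search` field with type REPOSITORY.
SEARCH_QUERY = <<~GRAPHQL
  query($q: String!, $first: Int!) {
    search(query: $q, type: REPOSITORY, first: $first) {
      nodes { ... on Repository { nameWithOwner } }
    }
  }
GRAPHQL

# Hypothetical helper: builds (but does not send) the POST request.
# The search string is carried in the JSON body, not in the URI.
def build_graphql_search_request(endpoint, token, search_string, first: 25)
  request = Net::HTTP::Post.new(URI(endpoint))
  request['Authorization'] = "Bearer #{token}"
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(
    query: SEARCH_QUERY,
    variables: { q: search_string, first: first }
  )
  request
end

# Made-up search string with 100 repo: terms, long enough to break a GET.
long_search = (['widget'] +
  (1..100).map { |i| "repo:example-org-#{i}/example-repo-#{i}" }).join(' ')

request = build_graphql_search_request(
  'https://github.example.com/api/graphql', 'hypothetical-token', long_search
)

puts "path: #{request.path}, body bytes: #{request.body.bytesize}"
# Sending would be done with something like:
#   Net::HTTP.start(host, 443, use_ssl: true) { |http| http.request(request) }
```

The request path stays short regardless of how many repo: or org: terms are added; only the body grows.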

More detailed logs, including the very large API query, are available on the customer's ticket https://gitlab.zendesk.com/agent/tickets/213714 (internal, https://about.gitlab.com/handbook/support/internal-support/#viewing-support-tickets)

Edited by Magdalena Frankiewicz