Improve our GraphQL API’s ability to calculate complexity

Argument complexity

Arguments can add complexity to a field when the argument is used.

Example: The :foo field has a complexity of 1. Its complexity becomes 3 when the complex_sort argument is passed.

module Types
  class MyType < BaseType
    field :foo, [MyType], complexity: 1, null: true do
      argument :complex_sort, GraphQL::BOOLEAN_TYPE, complexity: 2, required: false
    end
  end
end

This would replace our current method of applying complexity to arguments through resolvers.
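
For illustration, the calculation could simply add the complexities of whichever arguments were actually supplied to the field's own complexity. A minimal sketch of that calculation (the method and its parameters are hypothetical, not existing code):

# Hypothetical sketch: a field's effective complexity is its base complexity
# plus the complexity of every argument actually supplied in the query.
def effective_field_complexity(base_complexity, argument_complexities, supplied_arguments)
  extra = supplied_arguments.sum { |name| argument_complexities.fetch(name, 0) }
  base_complexity + extra
end

effective_field_complexity(1, { complex_sort: 2 }, [:complex_sort]) # => 3
effective_field_complexity(1, { complex_sort: 2 }, [])              # => 1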

Resolver complexity

Resolver complexity is applied as the complexity of any field that uses the resolver. A sanity check prevents a field from having a different complexity from its resolver.

module Resolvers
  class MyResolver < BaseResolver
    type Types::MyType, null: true
    complexity 10
  end
end

This would replace our current method of applying complexity to resolvers.
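
One way the class-level DSL and the sanity check could look (a rough sketch only; method names are illustrative, not the final implementation):

module Resolvers
  class BaseResolver < GraphQL::Schema::Resolver
    # Class-level complexity, settable via the `complexity 10` DSL above.
    def self.complexity(value = nil)
      @complexity = value if value
      @complexity
    end
  end
end

# Sanity check when a field declares both its own complexity and a resolver:
# the two values must agree.
def check_field_and_resolver_complexity!(field_complexity, resolver_complexity)
  return if field_complexity.nil? || resolver_complexity.nil?
  return if field_complexity == resolver_complexity

  raise ArgumentError, 'field complexity must not differ from its resolver complexity'
end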

Connection complexity

Specifying O(N+1) vs O(1) complexity

Complexity calculation on connections now defaults to considering the complexity to be O(N+1). This can be turned into an O(1) calculation with complexity_type: :'O(1)'. We should use complexity_type: :'O(1)' when we know that the field is resolved with BatchLoader, eager loading, or other means that avoid N+1 issues.

E.g., given this type definition:

module Types
  class MyType < BaseType
    field :foo, MyType, complexity: 1, null: true
    field :bar, MyType, complexity: 1, complexity_type: :'O(1)', null: true
  end
end

The :foo field will have a complexity that takes into consideration the theoretical maximum number of nodes that could be returned (defined as GitlabSchema::DEFAULT_MAX_PAGE_SIZE, currently 100), whereas :bar will never have a complexity other than 1:

{
  myTypes {
    nodes {
      foo # complexity = 100
      bar # complexity = 1
    }
  }
}
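
The calculation behind this could look roughly like the following (a sketch under the assumption that O(N+1) fields are multiplied by the theoretical maximum page size; the method name is illustrative):

DEFAULT_MAX_PAGE_SIZE = 100 # stands in for GitlabSchema::DEFAULT_MAX_PAGE_SIZE

def connection_field_complexity(field_complexity, complexity_type)
  if complexity_type == :'O(1)'
    # Resolved without N+1 queries, so the number of nodes doesn't matter.
    field_complexity
  else
    # O(N+1): assume the theoretical maximum number of nodes is returned.
    field_complexity * DEFAULT_MAX_PAGE_SIZE
  end
end

connection_field_complexity(1, :'O(N+1)') # => 100, as for :foo above
connection_field_complexity(1, :'O(1)')   # => 1, as for :bar above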

N+1 complexity and limiting arguments

N+1 complexity is adjusted if an argument is given that will reduce the maximum number of nodes.

first, last

first and last will limit the complexity to the value given:

{
  myTypes {
    nodes(first: 10) {
      foo # complexity = 10
    }
  }
}

ids, iids

ids and iids will limit the complexity to the length of the array (TODO: can we have argument validation when too many are requested?):

{
  myTypes {
    nodes(iids: ["123", "456", "789"]) {
      foo # complexity = 3
    }
  }
}

This would replace our current method of considering the complexity of these arguments in BaseResolver.

id, iid

id and iid will set the complexity to 1:

{
  myTypes {
    nodes(iid: "123") {
      foo # complexity = 1
    }
  }
}
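
Taken together, the limiting arguments above could feed a single calculation of the maximum node count. A rough sketch (the method name and constant are illustrative only):

DEFAULT_MAX_PAGE_SIZE = 100

def max_nodes_for(arguments)
  # A single record is requested, so at most one node can be returned.
  return 1 if arguments[:id] || arguments[:iid]

  limits = [
    arguments[:first],
    arguments[:last],
    arguments[:ids]&.size,
    arguments[:iids]&.size
  ].compact

  # Fall back to the theoretical maximum page size when nothing limits the result.
  limits.min || DEFAULT_MAX_PAGE_SIZE
end

max_nodes_for(first: 10)             # => 10
max_nodes_for(iids: %w[123 456 789]) # => 3
max_nodes_for(iid: '123')            # => 1
max_nodes_for({})                    # => 100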

Limiting queries by potential node count

It's possible to reduce the complexity score of a query dramatically when N+1 issues are eliminated (and complexity_type: :'O(1)' is added to the fields).

For example, if the query below uses batch loading at every point in the nested query, we can end up making 3 queries, and the complexity is 4.

However, it has the potential to return 1,010,100 Foo, Bar, and Baz records:

{
  foos { # 100 Foos
    nodes {
      bars { # 10_000 Bars
        nodes {
          bazs { # 1_000_000 Bazs
            nodes {
              id
            }
          }
        }
      }
    }
  }
}

This example shows that although a query may not be complex to perform, the resulting data set may cause complications for our services when returning the result, not to mention for the user, who is unlikely to want over 1 million records returned to them.

To be comprehensive in protecting the performance of our GraphQL API, we also need to somehow factor in the number of records that can be returned.

I've approached this by having the complexity analyzer fail a query that has the potential to return too many nodes. A node is considered to be anything that isn't a Scalar in our GraphQL schema (this may result in some false hits, but will be fairly accurate).

The limit of nodes returned is currently 100,000. This is up for debate, and it's possible this is still too high! Note that this is less than GitHub's limit of 500,000 nodes.
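
Not the final implementation, but as a sketch of the mechanism: a graphql-ruby AST analyzer could accumulate the potential node count as it walks the query and fail the query once the limit is exceeded. The sketch below approximates "anything that isn't a Scalar" by counting only connection fields, assumes a fixed page size when no limiting argument is given, and uses illustrative names throughout:

class PotentialNodeCountAnalyzer < GraphQL::Analysis::AST::Analyzer
  MAX_POTENTIAL_NODES = 100_000
  ASSUMED_PAGE_SIZE = 100 # stands in for GitlabSchema::DEFAULT_MAX_PAGE_SIZE

  def initialize(query)
    super
    @potential_node_count = 0
    @multiplier_stack = [1]
  end

  def on_enter_field(node, _parent, visitor)
    return if visitor.skipping?
    return unless connection?(visitor.field_definition.type.unwrap)

    # Each nested connection multiplies the number of records its children
    # can return by its own (possibly limited) page size.
    limit = page_size_for(node)
    @potential_node_count += @multiplier_stack.last * limit
    @multiplier_stack.push(@multiplier_stack.last * limit)
  end

  def on_leave_field(node, _parent, visitor)
    return if visitor.skipping?

    @multiplier_stack.pop if connection?(visitor.field_definition.type.unwrap)
  end

  def result
    return if @potential_node_count <= MAX_POTENTIAL_NODES

    GraphQL::AnalysisError.new('Query has the potential to return too many nodes')
  end

  private

  def connection?(type)
    type.kind.object? && type.graphql_name.end_with?('Connection')
  end

  def page_size_for(node)
    limit_arg = node.arguments.find { |arg| %w[first last].include?(arg.name) }
    limit_arg ? limit_arg.value : ASSUMED_PAGE_SIZE
  end
end

Assuming AST analysis is enabled on the schema, an analyzer like this could be registered with query_analyzer.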

Users who hit the potential node count limit should apply some sensible limiting arguments, and if they want more results they can ask for additional pages using our regular GraphQL connection paging arguments.

Given the exponential nature of nodes in deeply nested connection queries, some simple and sensible limits dramatically reduce the number of potential nodes returned.

E.g., this query's potential node count drops from 1,010,100 down to 3,775 with some limiting arguments:

{
  foos(first: 25) { # 25 Foos
    nodes {
      bars(first: 25) { # 625 Bars
        nodes {
          bazs(first: 5) { # 3,125 Bazs
            nodes {
              id
            }
          }
        }
      }
    }
  }
}
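
For reference, both potential node counts come out of straightforward multiplication of the page size at each level of nesting:

100 + 100 * 100 + 100 * 100 * 100 # => 1_010_100 potential nodes, unlimited
25  + 25 * 25   + 25 * 25 * 5     # => 3_775 potential nodes, with limiting arguments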

Querying for complexity and potential node count score

Users can query for the calculated complexity and the potential node count of any query by asking for the values in metadata:

{
  metadata {
    queryComplexity
    queryPotentialNodeCount
  }
  project(fullPath: "gitlab-org/gitlab") {
    issue(iid: "1") {
      discussions {
        nodes {
          notes {
            nodes {
              body
            }
          }
        }
      }
    }
  }
}
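
A sketch of how these two values could be exposed (assuming the analyzers record their results onto the query context; the field and context key names are illustrative):

module Types
  class MetadataType < BaseObject
    field :query_complexity, GraphQL::Types::Int, null: true,
          description: 'Total complexity score of the current query'
    field :query_potential_node_count, GraphQL::Types::Int, null: true,
          description: 'Number of nodes the current query could potentially return'

    # The analyzers would store their results on the query context.
    def query_complexity
      context[:query_complexity]
    end

    def query_potential_node_count
      context[:query_potential_node_count]
    end
  end
end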

A real-life example

The total complexity score on nested connections has the potential to spiral exponentially.

Given that the GraphQL API returns collections of records at 100 per page by default, this query has the potential to return up to 1,010,102 database records and has a total complexity of 1,010,102:

{
  metadata {
    queryComplexity # = 1010102
    queryPotentialNodeCount # = 1010102
  }
  project(fullPath: "gitlab-org/gitlab") { # complexity = 1 (although it's currently complexity 0)
    issues { # complexity = 1 (although it's currently complexity 2, because it has a default 'sort' argument, however there are indexes, so shrug?)
      nodes {
        discussions { # complexity = 100
          nodes {
            notes { # complexity = 10_000
              nodes {
                body # complexity = 1_000_000
              }
            }
          }
        }
      }
    }
  }
}

But issues has no N+1 problems, as all issues are loaded at once, so we can mark this field with complexity_type: :'O(1)'. notes are loaded at the same time as their parent discussions, so there are no extra queries to load them and they can have a complexity of 0. The body of a note adds no extra complexity, so it can also have a complexity of 0.
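
A sketch of the field definitions those changes imply (abridged; the types and options are illustrative and follow this proposal rather than the current schema):

# On the project type: issues are loaded in one query, so O(1).
field :issues, Types::IssueType.connection_type,
      complexity: 1, complexity_type: :'O(1)', null: true

# On the discussion type: notes are loaded along with their discussions.
field :notes, Types::Notes::NoteType.connection_type,
      complexity: 0, null: true

# On the note type: the body adds no extra complexity.
field :body, GraphQL::Types::String, complexity: 0, null: true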

When the schema is updated with the above changes, the query now has a complexity of 102, which seems reasonable. But the query still has the potential to return 1_010_101 records.

{
  metadata {
    queryComplexity # = 102
    queryPotentialNodeCount # = 1010102
  }
  project(fullPath: "gitlab-org/gitlab") { # complexity = 1
    issues { # complexity = 1
      nodes {
        discussions { # complexity = 100
          nodes {
            notes { # complexity = 0
              nodes {
                body # complexity = 0
              }
            }
          }
        }
      }
    }
  }
}

The complexity analyzer can also fail queries that have the potential to return a number of nodes greater than a maximum. This limit has currently been set at 100_000.

If users wish to make that query, they would need to consider changing it in some way, probably by adding limiting arguments like first and last.

Most users do not actually want to receive 100,000 records anyway, so having them consider using limiting arguments and pagination is probably mutually beneficial!

The query below has a drastically reduced potential node count of 31,902 when given some reasonable limiting arguments:

{
  metadata {
    queryComplexity # = 27
    queryPotentialNodeCount # = 31902
  }
  project(fullPath: "root/design-management") {
    issues(first: 25) {
      nodes {
        discussions(first: 25) {
          nodes {
            notes(first: 50) {
              nodes {
                body
              }
            }
          }
        }
      }
    }
  }
}