Resource exhaustion using GraphQL `vulnerabilitiesCountByDay`
Summary
`VulnerabilitiesCountPerDayResolver` is used to show the count of vulnerabilities over time on the security dashboard. It takes a `start_date` and an `end_date`, then returns the counts for every date in the range. A typical query and response look like this:
Query
```graphql
query projectVulnerabilitiesCount {
  project(fullPath: "gitlab-org/gitlab") {
    id
    vulnerabilitiesCountByDay(startDate: "2022-01-01", endDate: "2022-01-05") {
      nodes {
        date
        critical
        high
        info
        low
        medium
        unknown
      }
    }
  }
}
```
Response
```json
{
  "data": {
    "project": {
      "id": "gid://gitlab/Project/278964",
      "vulnerabilitiesCountByDay": {
        "nodes": [
          {
            "date": "2022-01-01",
            "critical": 0,
            "high": 0,
            "info": 0,
            "low": 0,
            "medium": 0,
            "unknown": 0
          },
          {
            "date": "2022-01-02",
            "critical": 0,
            "high": 0,
            "info": 0,
            "low": 0,
            "medium": 0,
            "unknown": 0
          },
          {
            "date": "2022-01-03",
            "critical": 0,
            "high": 0,
            "info": 0,
            "low": 0,
            "medium": 0,
            "unknown": 0
          },
          {
            "date": "2022-01-04",
            "critical": 0,
            "high": 0,
            "info": 0,
            "low": 0,
            "medium": 0,
            "unknown": 0
          },
          {
            "date": "2022-01-05",
            "critical": 0,
            "high": 0,
            "info": 0,
            "low": 0,
            "medium": 0,
            "unknown": 0
          }
        ]
      }
    }
  }
}
```
It works by querying for `Vulnerability::HistoricalStatistic` records between the given dates. There is one record per day, and each record contains the count of vulnerabilities for that day. Records do not exist for every day, so to present data for dates with no `HistoricalStatistic` record, we iterate through every date between `start_date` and `end_date` and fill in the counts for those dates with zeroes. This happens before pagination is applied. There are no limits on the upper and lower bounds of the date range, and Ruby places effectively no limit on how far out the `Date` calendar can go. These dates are valid, and have over 700 trillion days between them:
```ruby
[3] pry(main)> Date.iso8601('999999999999-01-01')
=> Fri, 01 Jan 999999999999
[4] pry(main)> Date.iso8601('-999999999999-01-01')
=> Wed, 01 Jan -999999999999
```
This means that the seemingly innocuous code `(start_date..end_date).to_a` can result in Ruby trying to build an array of effectively infinite size in memory.
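To see the scale, note that Ruby can compute the distance between two `Date`s in constant time via subtraction. This is a quick standalone demonstration, not part of the resolver code:

```ruby
require 'date'

start_date = Date.iso8601('-999999999999-01-01')
end_date   = Date.iso8601('999999999999-01-01')

# Subtracting two Dates returns the number of days between them as a
# Rational, in O(1), without materializing the range:
days = (end_date - start_date).to_i
puts days # over 700 trillion

# By contrast, (start_date..end_date).to_a would try to allocate one
# Date object per day, which is what exhausts memory.
```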
Steps to reproduce
- Log in to GDK.
- Send this GraphQL query:

```graphql
query projectVulnerabilitiesCount {
  project(fullPath: "gitlab-org/gitlab-test") {
    id
    vulnerabilitiesCountByDay(startDate: "0001-01-01", endDate: "5874897-01-01") {
      nodes {
        date
        critical
        high
        info
        low
        medium
        unknown
      }
    }
  }
}
```
When I tested this locally, the request hung for 5 minutes before timing out, at which point the Ruby process was consuming 7.4 GB of memory. The memory was not freed after the timeout; it actually continued growing (!!!!) and the Ruby process had to be killed.
What is the current bug behavior?
100% of memory and CPU can be consumed with a single request.
What is the expected correct behavior?
Not that. A single request should not be able to exhaust the server's memory and CPU; overly large date ranges should be rejected or bounded.
Possible fixes
- Truncate date ranges that exceed a certain size
- Apply pagination before backfilling counts
- Apply sensible limits on which years a date can contain
Issue: how do we determine the number of dates in a date range without iterating through them?
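One answer to the question above: because subtracting two `Date`s returns the number of days between them, the size of a range can be computed in constant time. The helper below is hypothetical, not existing GitLab code:

```ruby
require 'date'

# Hypothetical helper (not existing GitLab code): the number of dates
# in an inclusive range, computed without iterating through it.
def days_in_range(start_date, end_date)
  (end_date - start_date).to_i + 1
end

days_in_range(Date.new(2022, 1, 1), Date.new(2022, 1, 5)) # => 5
```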
Implementation plan
- Compute `end_date.year - start_date.year` in `ee/app/graphql/resolvers/vulnerabilities_count_per_day_resolver.rb`; if the result is bigger than 1 (year), then error out.
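The plan above could be sketched roughly as follows. The method and constant names are assumptions, not the actual GitLab implementation, and in the real resolver the failure would surface as a `GraphQL::ExecutionError`; `ArgumentError` is used here to keep the example dependency-free:

```ruby
require 'date'

# Sketch of the proposed guard (names are assumptions). The real
# resolver would raise GraphQL::ExecutionError instead of
# ArgumentError.
MAX_RANGE_IN_YEARS = 1

def validate_date_range!(start_date, end_date)
  return if end_date.year - start_date.year <= MAX_RANGE_IN_YEARS

  raise ArgumentError, "maximum date range is #{MAX_RANGE_IN_YEARS} year(s)"
end
```

The check runs before any backfilling, so a hostile range is rejected without ever iterating through its dates.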