When a new CVE comes in that we intend to cover, we want to see it reported to our customers as soon as possible.
By measuring the time from the CVE report to its availability in our product, and setting a target for this KPI, we can provide customers with a good experience.
Steps to develop this KPI (up for discussion):
1. Establish how we will measure (CVE date to merge?)
2. For the past 3-6 months, establish baseline numbers and historical performance.
3. Establish an automated method to capture this data and report it in Periscope.
4. Based on the previous two steps, establish a target KPI and shift resources if we are not within the KPI performance boundaries.
The currently suggested KPI target is less than 7 days.
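A minimal sketch of how this measurement could be computed (the function name and dates are hypothetical; the real pipeline would pull publication dates from the NVD feed and merge dates from the advisory MRs):

```python
from datetime import datetime

def days_to_merge(cve_published: str, advisory_merged: str) -> float:
    """Days between the NVD publication date of a CVE and the merge
    date of the corresponding advisory (ISO-8601 date strings)."""
    published = datetime.fromisoformat(cve_published)
    merged = datetime.fromisoformat(advisory_merged)
    return (merged - published).total_seconds() / 86400

# Hypothetical example: a CVE published on 2019-11-01 whose advisory
# was merged on 2019-11-05 counts as 4 days -- within the suggested
# 7-day target.
print(days_to_merge("2019-11-01", "2019-11-05"))
```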
For line item 2, this might be a bit weird to view since we were doing them manually up until about a month ago. While doing them manually, we were averaging 3-5 CVEs/week. Now that it is automated, we have processed our backlog at about 100/week (we had about 450 in backlog). Given that, a massive spike will show for the end of October - beginning of November. Going forward, the trending should be a bit more consistent.
After thinking a little bit about the proposal, I started wondering whether "last update to CVEs" is actually a good KPI. This KPI would capture the time delta between the NVD vulnerability report publication time and the time at which the corresponding advisory is merged into gemnasium-db. While this information is interesting to gather, I would argue against the premise that a target KPI of <2 days can be used to measure a good customer experience in the context of dependency scanning.
From a customer perspective, if there is a vulnerability in one of the project dependencies, I think it does not really matter whether the vulnerability was reported in a recent CVE or in a CVE that is one year old. Actually, the vulnerability from the old CVE can be considered even more severe, because there may be more exploits around than for a vulnerability that was recently disclosed. To frame it differently: given two CVEs where one is newer than the other, I would argue that the newer one is not necessarily more valuable just because of its age.
At least from this perspective, I would argue that a KPI such as throughput (e.g., #advisories/week) is more valuable (for us and customers): the more advisories we provide per week, the higher the likelihood of uncovering (old and new) vulnerabilities in customer code/dependencies. Increasing throughput will implicitly improve the response time, too. Based on that, I think that #advisories/week would be a better KPI for capturing a good customer experience.
Just to provide some numbers: at the moment, we have 1775 advisories in our database, out of which 1061 are CVE-related. 378 advisories originate from NVD 2019, 75 from NVD 2018, and 118 from NVD 2017. The difference in the number of advisories between the NVD feed from 2019 and the older feeds shows that there are probably many uncovered advisories in the older data-feeds on which we have to catch up.
If we used the <2 days KPI metric, all the advisories we take from the old feeds could make us look very bad when measured against it, because the vulnerability reports are already one or two years old; in fact, adding advisories from old data feeds could worsen the KPI measure, which would be counterproductive, I suppose, as every newly added advisory, irrespective of its age, provides value. At any rate, based on what I have written above, I think that advisories from old data feeds have the same value as advisories generated from the most recently disclosed vulnerability coming from NVD 2019.
As a complementary KPI, we could also measure coverage (#relevant CVEs/#NVD-Feed-Size). When measured over the feeds from 2018 and 2017 this should give us a pretty good idea about how many CVEs from NVD we were able to identify as relevant (i.e., related to a package). Because of the amount of historical data we could use, this would be a good, stable baseline; once we have processed the data from 2018, 2017, ..., we will have a good estimate about how many CVEs we probably would have to extract when looking at a new data feed. Increasing coverage would mean that we are able to extract more relevant data from the NVD data feed which would lead to more uncovered vulnerabilities in the customer code/dependencies (better customer experience).
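For illustration, the two proposed measures could be computed along these lines (a sketch with hypothetical numbers, not the actual reporting code):

```python
def coverage(relevant_cves: int, nvd_feed_size: int) -> float:
    """Coverage: #relevant CVEs / #NVD-Feed-Size."""
    return relevant_cves / nvd_feed_size

def throughput(advisories_merged: int, weeks: float) -> float:
    """Throughput: #advisories per week over a given time frame."""
    return advisories_merged / weeks

# Hypothetical numbers: 3 package-related CVEs found in a feed of 100,
# and 24 advisories merged over a 2-week window.
print(coverage(3, 100))   # 0.03
print(throughput(24, 2))  # 12.0
```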
There are a few points here that should be evaluated...
All CVEs have value. I don't believe it is a matter of lesser vs greater value or newer vs older.
Every software security company I have worked at tries to get as close to zero-day detection as possible. This ensures our customers are always up-to-date w.r.t. patching.
I believe that both KPIs ("CVE to Merge" and "number of advisories per week") are interesting and have value. We should track them both!
All CVEs have value. I don't believe it is a matter of lesser vs greater value or newer vs older.
Yes, the problem I see with the CVE date to merge metric is that it only looks at the response time. So essentially this KPI only rewards the addition of new advisories (i.e., advisories with a small time difference between NVD publication date and the date at which they were merged); at the same time it punishes the addition of old advisories. I am providing a more concrete example below.
Every software security company I have worked at tries to get as close to zero-day detection as possible. This ensures our customers are always up-to-date w.r.t. patching.
I think that it is important to update the advisory database on a regular basis, too, so that we do not miss out on NVD feed updates. We can do that by setting up a policy w.r.t. the time frame within which we check the NVD feed and update the advisory database. This time frame could be two days.
However, my main concern was about using CVE date to merge as a KPI to measure a good customer experience because there is actually no correlation between a good customer experience and CVE date to merge; it cannot be used as a measure to track our progress towards the goal that ... our customers are always up-to-date w.r.t. patching, either. I try to illustrate this claim with an example below.
I think that in the SecOps context, the incident response time (which is comparable to CVE date to merge) would make perfect sense, because your system may already be in flames and you have to take quick action. In SecOps, taking action today vs. taking action tomorrow makes a huge difference w.r.t. the experience of the customers using your system.
In the context of dependency scanning, this is different. From the perspective of a GitLab dependency scanning customer, it makes no difference w.r.t. customer value whether one of her/his dependencies has a vulnerability that was reported recently or one that was reported long ago; if we do not have a corresponding advisory in our database, it is equally bad in both cases. The CVE date to merge KPI has the fundamental drawback of exclusively "rewarding" CVEs that have been added within the given time-frame (<2 days after their publication). But looking at the advisory database as a whole, comprehensiveness is much more important than the time within which an advisory was added after its publication on NVD.
I would like to illustrate that based on an example:
Customer A has a software project with stable (but not the most recent) versions of her/his dependencies. One of her/his project dependencies is vulnerable to CVE-2017-old.
Customer B always uses the latest and greatest dependencies and tries to upgrade to the most recent versions. One of her/his project dependencies is vulnerable to CVE-2019-latest1.
Customer C uses a combination of old and new software packages in her/his software. Her/his project has both the CVE-2017-old and CVE-2019-latest1 vulnerabilities.
Customer D uses rather old project dependencies. One of her/his dependencies, although rather old, contains a vulnerability that has been disclosed recently: CVE-2019-latest2. This is because the date where a CVE is published does not necessarily imply that the disclosed vulnerability is also related to a recently published version of a package/dependency. In other words, it can very well be that a recently published CVE is related to a rather old version of a package/dependency.
Not having CVE-2019-latest1, CVE-2019-latest2 and CVE-2017-old in our advisory database is equally bad. Actually, depending on the vulnerability, not having CVE-2017-old could be the worst case: the software of Customer A and C could be running in the wild for two years with a vulnerability in one of its dependencies.
CVE date to merge <2 days only measures the response time between disclosing a vulnerability and publishing it through our advisory database so, in our example, it is biased towards satisfying Customer B and Customer D.
I try to summarise what happens when using the CVE date to merge KPI as a baseline, and how the corresponding customer experience for Customers A-D would look:
| CVEs present in advisory db | KPI (CVE date to merge) < 2 days | Customer (ABCD) experience/value |
| --- | --- | --- |
| CVE-2019-latest1 | 1 day | |
| CVE-2019-latest2 | 1 day | |
| CVE-2017-old | 700 days | |
| CVE-2017-old, CVE-2019-latest1 | (700 days + 1 day)/2 = 350.5 days | |
| CVE-2019-latest1, CVE-2019-latest2 | 2 days/2 = 1 day | |
| CVE-2017-old, CVE-2019-latest1, CVE-2019-latest2 | (700 days + 2 days)/3 = 234 days | |
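The mean values in the second column can be reproduced with a short sketch (the day counts are the hypothetical ages from the example: 1 day for the 2019 CVEs, 700 days for CVE-2017-old):

```python
from statistics import mean

# Days between NVD publication and merge for each advisory set.
advisory_sets = {
    "latest1": [1],
    "latest1 + latest2": [1, 1],
    "old + latest1": [700, 1],
    "old + latest1 + latest2": [700, 1, 1],
}
# The most comprehensive set scores worst on this KPI,
# e.g. "old + latest1" yields a mean of 350.5 days.
for name, deltas in advisory_sets.items():
    print(f"{name}: mean TTM = {mean(deltas)} days")
```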
As illustrated in the table above, we would have the best KPI measures if we only consider CVE-2019-latest1, CVE-2019-latest2 or both. However, the best customer value would be provided if we have CVE-2017-old, CVE-2019-latest1 and CVE-2019-latest2 in our database. There is essentially no correlation between the second and third column which is the reason why I think that we should avoid using CVE date to merge as a KPI altogether.
To put it in more provocative terms: if our success were measured by the CVE date to merge KPI, we should only consider generating advisories for CVE-2019-latest(1|2) and ignore the rest.
Instead of CVE date to merge, I would suggest using #relevant CVEs/#NVD-Feed-Size. For a feed size of 100, we would get the following values:
| CVEs present in advisory db | KPI (#relevant CVEs/#NVD-Feed-Size) | Customer (ABCD) experience/value |
| --- | --- | --- |
| CVE-2019-latest1 | 0.01 | |
| CVE-2019-latest2 | 0.01 | |
| CVE-2017-old | 0.01 | |
| CVE-2017-old, CVE-2019-latest1 | 0.02 | |
| CVE-2019-latest1, CVE-2019-latest2 | 0.02 | |
| CVE-2017-old, CVE-2019-latest1, CVE-2019-latest2 | 0.03 | |
Compared to CVE date to merge, #relevant CVEs/#NVD-Feed-Size is better for measuring customer value (and for tracking progress), because the higher the measure, the more value we are providing to the customer (there is a strong correlation between columns 2 and 3). Moreover, because we can measure it over a huge data-set, it is a good baseline measure: it provides a good predictor for how many advisories we should be able to extract when looking at a new data-feed.
I would like to make the following alternative proposal:
Use a 2-day policy instead of a CVE date to merge KPI: extract data and generate advisories from the NVD feed every two days. I should mention at this point that our data-extraction approach always processes the whole NVD data-feed. Our approach infers from the NVD metadata whether a certain entry is related to a package/dependency or not; we are constantly improving this inference step, which requires us to re-process whole data-feeds once in a while to extract new advisories (from new and old CVEs).
Use #relevant CVEs/#NVD-Feed-Size (coverage) KPI: the coverage measure tells us how good our extraction mechanism and advisory generation approach work. As illustrated above, this KPI is also a good measure for customer experience/value.
Use #advisories/time-frame (throughput) KPI: this is a useful measure for improving our automation and advisory generation process. If we are able to increase this KPI, it means that we were able to improve efficiency (e.g., through better automation or a quicker review procedure). Moreover, reaching a higher throughput will indirectly improve coverage, too (more advisories in a given time-frame).
@julianthome I want to make sure that I understand your concerns appropriately...
Are you saying that the backlog of advisories that you just processed consisted of current advisories?
I agree that we need to have a comprehensive list of advisories. My assumption is that is what we have. If that is not the case, then I believe that we need to continue to process the older advisories; but I don't think they should be at the cost of the newer advisories. Which leads me to my next question - on average - how many new advisories are there per week? My assumption is that it's a manageable number. If that is true, then I believe we have the capacity to process the new advisories and start working against the older advisories in parallel. Perhaps we can work together to set a target amount per week.
Whether they are old or new, we need to have a rich set of advisories that cover a large percentage (if not all) of our customer base.
Are you saying that the backlog of advisories that you just processed consisted of current advisories?
Yes, that is true; at the moment we are only processing the CVE backlog for NVD-2019 (2019/01/01-today).
I agree that we need to have a comprehensive list of advisories. My assumption is that is what we have. If that is not the case, then I believe that we need to continue to process the older advisories; but I don't think they should be at the cost of the newer advisories.
To address these points, I would like to go a little bit into the details of the extraction procedure that we use for extracting/generating advisories from NVD CVEs.
NVD is a very general vulnerability database and only a very small fraction of CVEs that you find in an NVD feed are actually relevant in our context; a CVE is relevant if it refers to a package/dependency.
I should mention that we identify a CVE from NVD as relevant by checking its metadata against a knowledge-base that we built up. This knowledge base is constantly improving/growing with every CVE we process; processing the NVD2017 feed may yield useful information to improve the advisory extraction for NVD2019 and vice versa.
From the NVD data feed for 2019 (Jan-Nov), out of 11633 entries, ~500 advisories are related to dependencies/packages, so we have a coverage of 4.2%: 4.2% of the vulnerabilities in an NVD data feed are related to packages/dependencies and thus relevant w.r.t. dependency scanning. When looking at the data-feed from 2018 (Jan-Dec), which contains 16K vulnerability entries, we have 118 advisories in our advisory database, which is a coverage of 0.7%.
I consider comprehensiveness a goal we should strive for but can never fully achieve, because we can never be fully certain that our knowledge-base, which we use to identify relevant vulnerability reports in the NVD feed, is complete. Hence, we have to assume that there will always be CVEs that are related to packages/dependencies but that we did not identify as relevant (false negatives).
We can use the coverage measure to check where we stand in terms of comprehensiveness: based on the difference between the coverage of 4.2% for NVD2019, and 0.7% for NVD2018, we can conclude that we most probably missed some relevant CVEs from the NVD2018 feed. Based on that difference, I think it is safe to say that we are not comprehensive yet.
Which leads me to my next question - on average - how many new advisories are there per week?
Usually NVD publishes/modifies around 300 vulnerability reports a week. Based on the coverage we have so far for 2019, I would roughly estimate that it should be 12 package-related advisories/week.
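That estimate can be sanity-checked from the numbers given above (~300 weekly NVD reports and the ~4.2% coverage for 2019; both are rough figures):

```python
weekly_nvd_reports = 300  # approximate weekly NVD publications/modifications
coverage_2019 = 0.042     # approximate package-related fraction for NVD2019

# Expected package-related advisories per week under these assumptions.
estimate = weekly_nvd_reports * coverage_2019
print(round(estimate, 1))  # roughly 12-13 advisories/week
```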
My assumption is that it's a manageable number.
Yes that is true (when considering NVD2019).
If that is true, then I believe we have the capacity to process the new advisories and start working against the older advisories in parallel.
Yes, we definitely do have the capacity to do that.
Based on your suggestion, I would like to change the last part from my previous comment to:
Use a 2-day policy for processing the most recent NVD feed
Use #relevant CVEs/#NVD-Feed-Size (coverage) KPI
Use #advisories/time-frame (throughput) KPI
Perhaps we can work together to set a target amount per week.
Yes, I think that 100 advisories/week is perfectly manageable for a start, but I think we can push the boundaries there.
Just to clarify my previous comment: my concern was only about using CVE date to merge as a means to measure our success (KPI).
With the example above, I tried to illustrate that CVE date to merge cannot be used to measure customer experience/value, and that using it as a KPI would even be counterproductive, because it punishes the addition of advisories originating from older NVD feeds. In practice, every single advisory that is added to our database improves dependency scanning (it can lead to +1 vulnerability detected). With the CVE date to merge KPI, newly added advisories generated from CVEs with a CVE date to merge greater than the current average worsen the overall KPI.
Using CVE date to merge as a means to measure our success would essentially doom us to failure, simply because the amount of old CVEs (>=2 days) is largely disproportionate to the amount of new CVEs (<2 days), even when looking at a single data-feed (15957 vs. 43). As I mentioned above, false negatives (although we try to keep them low by constantly updating and improving our knowledge base) are inevitable; we still have to process old data feeds such as NVD2018, NVD2017 and NVD2016. Every advisory added from there will lead to a worse CVE date to merge measure, although it has a positive impact on dependency scanning and the customer experience.
Again, thanks for the very thoughtful write-up @julianthome. If I were to guess at what the spirit of this KPI is, it is to say that when new advisories come out, we can successfully integrate them within a matter of days. It is a mean time to merge (MTTM) metric to ensure that we are staying up to date on the new advisories that come out. This seems to be a rational approach to me. Especially since - at some point in the near future - we will have processed what we believe to be a comprehensive set of historical advisories.
Especially since - at some point in the near future - we will have processed what we believe to be a comprehensive set of historical advisories.
As our knowledge-base, which we use to identify relevant CVEs, can never be complete, there will be always false negatives so we will never be 100% comprehensive. This is inevitable because the process of identifying relevant advisories on NVD is no exact science. This is also a good motivation to measure coverage because, if used as a KPI, it will indicate whether we are getting to the goal of having a more comprehensive vulnerability database.
If I were to guess at what the spirit of this KPI is, it is to say that when new advisories come out, we can successfully integrate them within a matter of days. It is a mean time to merge (MTTM) metric to ensure that we are staying up to date on the new advisories that come out.
As an example, let us assume we added the last 5 package-related advisories within one day of their publication on NVD in November. Based on our knowledge-base, we did not miss a single vulnerability; we believed our advisory database to be comprehensive up to this point. Shortly afterwards, after discovering and closing a gap in our knowledge-base, we unravel a new advisory based on a CVE that is 31 days old. Whenever we improve our knowledge-base, it can have a chain effect that leads to unravelling advisories in new and old NVD data-feeds. In this example, although we added every advisory within a single day of it becoming known to us, our MTTM would be 36/6 = 6 days. By adding just a single advisory, our KPI value worsened from a 1-day average to a 6-day average, even though we improved both our knowledge-base and the advisory database. We were not aware 31 days ago, when we had a perfect MTTM of 1 day, that we had missed a CVE; 31 days later this falls back on us although we improved our process. We had a situation just like this recently (https://gitlab.slack.com/archives/CHZTNM1TN/p1574450483015900) where we became aware of a gap in our knowledge-base. Going back to the example, we basically obtain a worse MTTM that we cannot act upon; in fact, the MTTM got worse precisely because we improved our process.
Given that our data-set is highly skewed (due to the partition scheme <2 days, >=2 days), using the median instead of the mean would at least help avoid the problems mentioned above. In the example, we would still have a median time to merge of 1 day after adding the advisory based on the 31-day-old CVE. However, for both the mean and the median time to merge, I do not know how to use this KPI as a means to improve our process.
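The skew described above is easy to see with a few lines of Python (the day counts are the hypothetical values from the example: five 1-day merges plus one 31-day-old CVE uncovered by a knowledge-base improvement):

```python
from statistics import mean, median

# Days between NVD publication and merge for each advisory.
deltas = [1, 1, 1, 1, 1, 31]

# The single late outlier drags the mean from 1 day to 6 days,
# while the median stays at 1 day.
print(mean(deltas))
print(median(deltas))
```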
As our extraction process always extracts all the relevant CVEs based on the knowledge-base, which will never be complete and is always subject to improvement, I am wondering whether it would not simply be better to use a 2-day sync policy instead of MTTM.
There are a couple of questions regarding MTTM that are not clear to me. I would appreciate if you could help me to answer them:
1. How does MTTM inform our process? Given that we can only be as good as our knowledge-base, which is the key to automating the advisory generation in the first place, what information do we gain from knowing that we have an MTTM of 6 days in the example above, even though we had all relevant advisories in our database within a single day?
2. Is MTTM an actionable KPI? We only got an MTTM of 6 days because we updated our knowledge-base. So, strangely, after already improving our knowledge-base and process, our MTTM has gotten worse.
3. What would be the threshold after which we consider a CVE as old? In the example, I set the threshold to 31 days. The problems explained above will remain irrespective of the threshold, though.
4. What is the advantage of MTTM over a 2-day sync policy where we process the most recent NVD data-feed every two days? We could enforce/measure this in terms of the time difference between the merge time and the creation time of CVE-related advisory MRs.
These are great questions @julianthome, keep them coming!
First, there is a saying, "don't make the perfect the enemy of the good." Meaning that we aren't going for perfection. We just want to be able to measure something that can be used to show our customers the value we are providing. Like all metrics, it is an indicator/hint. The more metrics we have around this, the better story we can tell.
how does MTTM inform our process? Based on the fact that we can be only as good as our knowledge-base, which is the key for automating the advisory generation in the first place, what information do we gain from knowing that we have an MTTM of 6 days in the example above although we had all relevant advisories in our database within a single day?
Not every metric needs to bolster our process. Some are used to tell a story. That is what this one will do. It helps our customers understand our dedication to updating our DB.
is MTTM an actionable KPI? We only got an MTTM of 6 days because we updated our knowledge-base. So strangely, after already improving our knowledge-base and process, our MTTM has gotten worse.
This is why more metrics are better. Having a single datasource can take people in the wrong direction. With more data, we can show why this value going down could be considered a good thing.
what would be the threshold after which we consider a CVE as old? In the example, I set the threshold to 31 days. The explained problems will remain irrespective of the threshold, though.
My assumption is that if we have a <2-day MTTM, then anything after that would be considered late. However, there is still the issue of needing to build up our historical data, so perhaps there is a way to accomplish this without it affecting the MTTM. As an example, we say within 2 days and no greater than a month as the measurement. That way anything over a month old doesn't influence the metric, as it'd be considered adding old CVEs to our dataset.
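A capped variant of MTTM along these lines could look as follows (a sketch; the function name is hypothetical and the 30-day cutoff is just the "no greater than a month" threshold suggested above):

```python
from statistics import mean

def capped_mttm(deltas_days, cap_days=30):
    """Mean time to merge, ignoring advisories whose source CVE is
    older than `cap_days`: those are treated as backlog items rather
    than late merges, so they do not distort the metric."""
    recent = [d for d in deltas_days if d <= cap_days]
    return mean(recent) if recent else None

# Two 1-day merges plus a 700-day-old backlog advisory: the backlog
# entry is excluded, so the capped MTTM stays at 1 day.
print(capped_mttm([1, 1, 700]))
```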
what is the advantage of MTTM over using a 2 day sync policy where we process the most recent NVD data-feed every two days? We could enforce/measure this in terms of time difference between merge-time and creation-time of CVE related advisory MRs.
The answer I gave for point 3 kind of touches on this.
Thanks a lot @tstadelhofer for the clarifications during our conversation yesterday and for the answers to the questions above. The context in which the MTTM metric is used is much clearer to me now.
I would like to start working on a CI/CD script which automatically computes metrics and generates charts for the data we have in our advisory database. I would like to provide some data by next week. For the start, I would like to gather the following statistics:
MTTM (median time to merge)
Mean-TTM (mean time to merge)
Coverage (#relevant CVEs/#NVD-Feed-Size)
Package-type distribution (#advisories per package type)
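A sketch of what such a stats job could compute (the records and field names are made up for illustration; the real script would read from gemnasium-db):

```python
from collections import Counter
from statistics import mean, median

# Hypothetical advisory records with package type and time-to-merge.
advisories = [
    {"pkg_type": "npm",   "ttm_days": 1},
    {"pkg_type": "npm",   "ttm_days": 2},
    {"pkg_type": "maven", "ttm_days": 30},
]
nvd_feed_size = 100  # hypothetical feed size

ttms = [a["ttm_days"] for a in advisories]
print("median TTM:", median(ttms))                   # MTTM
print("mean TTM:", mean(ttms))                       # Mean-TTM
print("coverage:", len(advisories) / nvd_feed_size)  # Coverage
print("per type:", Counter(a["pkg_type"] for a in advisories))
```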
@julianthome I spoke with @ddesanto and we agreed that we should build up our dataset as follows:
First: Main focus should be the last 3 years (2016 and newer)
Next steps to explore: May want to explore up to 8 years (2012 and newer)
The MR regarding the CI job integration is merged. Real-time stats (covering all measures mentioned in the description of this issue) are now available under https://gitlab-org.gitlab.io/security-products/gemnasium-db/. The Periscope integration is ongoing. Should we close this issue now, or should we wait until the Periscope integration is completed?
This has been a great thread to read! I have a few thoughts on it:
Two separate efforts are being talked about in two different contexts
Efforts:
Catching up on old advisories
Quickly adding newly detected advisories
Contexts:
Demonstrating our commitment to customers to keep our advisory database current
Directly measuring the performance of the VR team & tools
MTTM could still be a good KPI, with a big IF
If we track MTTM from the time that we first detect that the advisory exists, MTTM could still be a good KPI.
This covers the cases that have been discussed in this issue:
| Category | Notes |
| --- | --- |
| New Advisories | We detect it soon after the advisory is released, and add it in <=2 days |
| Backlog/Old | We hadn't started looking at these, so they weren't "detected" by us yet |
| Updated, recent | A recent advisory from NVD is updated that we now recognize as a relevant advisory - this is when we start tracking MTTM, not at the publication date |
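Measuring from first detection rather than from NVD publication could be sketched like this (the function and the dates are hypothetical):

```python
from datetime import datetime

def ttm_from_detection(detected: str, merged: str) -> int:
    """Days between the date we first identified a CVE as relevant and
    the date the advisory was merged -- independent of how old the CVE
    itself is (ISO-8601 date strings)."""
    d = datetime.fromisoformat(detected)
    m = datetime.fromisoformat(merged)
    return (m - d).days

# A backlog CVE published in 2017, but detected on 2019-11-20 and
# merged on 2019-11-21, counts as 1 day instead of ~700 days.
print(ttm_from_detection("2019-11-20", "2019-11-21"))
```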
Thoughts on #advisories/week
If we are completely up-to-date with our backlog of package-related vulnerability advisories that we need to add to our database, I'm not sure how much this will mean to customers or to us. As an interesting datapoint it may still hold value, but maybe not as a KPI.
That being said, until we are completely caught up, I do think it is an excellent metric to track how fast we are catching up on the backlog of advisories and how much we are improving our tools and automation.
Thanks for adding your thoughts @d0c-s4vage . I look at it this way... We are presently not measuring anything as a KPI. Having something to measure - even if it's not perfect - is still better than measuring nothing. From that info, we will gain insights on what we should be measuring and iterate. It's ok if we don't get it right at first. It's about learning and making tweaks until we feel as though we are capturing the right data.
It came to my attention today that CVEs can be released for years other than the current year: https://twitter.com/JGamblin/status/1216718221790851083. 5 CVEs were released today with their IDs being CVE-2014-XXXX. I didn't know this was possible or allowed - it may impact how we measure our performance.
The pubdate data is derived from the publicationDate field (= NVD Published Date in the screenshot below) that is attached as meta information to every CVE in the NVD JSON feed; the publicationDate timestamp refers to the actual date on which an advisory has been published.
@d0c-s4vage and @julianthome this does complicate things a bit. However, I'd go back to my original statement of, most customers will want to see that we have coverage for the new CVEs. Again, we should absolutely have coverage for these and track them, since this gives us comprehensive coverage as well. We will want several metrics to tell the story here.
@clefelhocz1 the current target date for having this in Periscope is Jan 20. There is some work needed from the data team in order to get this information and that is their target date. See https://gitlab.com/gitlab-data/analytics/issues/3209 for more detail.
Starting from this week (2020-05), we shifted to a daily synchronisation model which improved our weekly MTTM to 2.4 days.
We will continue to stick to the daily model and discuss ways in which we can further automate the advisory synchronisation: gitlab-org&2508 (closed).
I take the liberty of closing this issue as we have all the data in Sisense now and KPI tracking is online. Please feel free to re-open in case any issue arises.