
Extend Threat Insights error budget exception

Thiago Figueiró requested to merge error-budget-extension into master

Why is this change being made?

This MR extends the Threat Insights exception until 2023-05-22 (%16.0 release).

In 2022-11, an error budget exception for Threat Insights was introduced. The intention is to remove the exception when both of these Epics are closed:

  1. Extend the use of `security_findings` (gitlab-org&8341 - closed)
  2. Deprecate and remove Vulnerabilities::Feedback (gitlab-org&5629 - closed)

While the first epic has been delivered, its benefits won't be fully realized until the remaining epic is also closed.

Details

Methods in the Projects::VulnerabilityFeedbackController are responsible for about 58% of the Apdex violations in Threat Insights. With the completion of gitlab-org&5629 (closed), we'll be able to retire the under-performing code in this controller. This will significantly improve the error budget, and the exception should no longer be necessary.

The percentages in red below are the error rates (i.e. Apdex violations / request count).

image
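
As a rough illustration of the arithmetic behind those percentages, here is a minimal Python sketch. The function names and sample counts are hypothetical and are not taken from the dashboard; the sketch only assumes the definition above (Apdex violations divided by request count) and shows how one endpoint's share of all violations could be derived.

```python
def error_rate(apdex_violations: int, request_count: int) -> float:
    """Return Apdex violations as a fraction of requests (0.0 when there is no traffic)."""
    if apdex_violations < 0 or request_count < 0:
        raise ValueError("counts must be non-negative")
    if request_count == 0:
        return 0.0
    return apdex_violations / request_count


def violation_share(violations_by_endpoint: dict[str, int]) -> dict[str, float]:
    """Return each endpoint's fraction of the total Apdex violations."""
    total = sum(violations_by_endpoint.values())
    if total == 0:
        return {endpoint: 0.0 for endpoint in violations_by_endpoint}
    return {
        endpoint: count / total
        for endpoint, count in violations_by_endpoint.items()
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"error rate: {error_rate(100, 2000):.1%}")
    for endpoint, share in violation_share({"endpoint_a": 30, "endpoint_b": 10}).items():
        print(f"{endpoint}: {share:.0%} of violations")
```

The zero-traffic guard avoids a division by zero for endpoints that received no requests in the measured window.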

Questions

Why is the extension needed?

gitlab-org&5629 (closed) has been delayed for three main reasons:

  1. Complexity of required database migrations.
    1. One of the migrations had to be reworked. In gitlab-org/gitlab#386323 (closed), we identified an outage caused by a service that was going to be used in the migration. The migration was reworked in gitlab-org/gitlab#387665 (closed), but had to be retried 3 times before it finally succeeded.
    2. A migration was blocked by the failures above. A similar migration in gitlab-org/gitlab#384222 (closed) was likely to face the same issues as the one above. We decided to wait until we had a successful pattern before merging it.
  2. Strong advice against mandatory upgrades.
    1. To avoid data issues in self-managed instances, the team originally proposed to make %15.9 a mandatory upgrade so that we could safely enable the feature flag by default in %15.10.
    2. In the adjusted plan, we'll take advantage of the existing mandatory upgrade in %15.11 to ensure that the required migrations are complete before enabling the feature flag by default in %16.0.
  3. Cross-dependency with the MR widget refactor.
    1. The changes required by [MR Widget] V2 (gitlab-org&8353 - closed) overlap with the endpoints supported by Projects::VulnerabilityFeedbackController. Rather than refactor a soon-to-be-removed controller to support the new widget version, the team decided to account for these changes so that the refactored widget uses new endpoints that don't depend on Vulnerabilities::Feedback.

What needs to be done?

gitlab-org&5629 (closed) needs to be finished. The new scope includes the work required to replace the existing MR widget with the new, better-performing version.

How long is it going to take?

The work described above is already refined and scheduled to finish in %15.10.

We're allowing ourselves until %16.0 to account for the self-managed roll-out, unforeseen problems in development, and also any issues detected during the feature flag roll-out on the SaaS environments.

Who's doing the work?

What else have we done/tried?

  1. https://gitlab.com/gitlab-org/gitlab/-/issues/390434 to confirm why the error budget continued to decline after completion of Extend the use of `security_findings` (gitlab-org&8341 - closed).
    1. We identified that certain GraphQL operations have started to significantly contribute to error budget consumption. This was expected as Threat Insights is working to retire its REST endpoints in favor of GraphQL.
    2. Created and scheduled gitlab-org/gitlab#391419 (closed) in %15.10 to address a particularly slow GraphQL mutation.
  2. gitlab-org/gitlab#388066 (closed) to degrade the UI by disabling the expensive call to Projects::VulnerabilityFeedbackController#index.
    1. gitlab-org/gitlab#388701 (closed) enabled the FF for gitlab-org/gitlab. The UI degradation was deemed unacceptable to enable for other customers.
    2. The top 3 projects are responsible for around 75% of calls to this endpoint. We decided not to engage with these customers to ask about degrading the UI for their projects.
  3. gitlab-org&8353 (closed) is still in progress, and the feedback refactor scope has been extended to avoid code re-work in the short-term.

Author Checklist

  • Provided a concise title for this Merge Request (MR)
  • Added a description to this MR explaining the reasons for the proposed change, per "say why, not just what"
    • Copy/paste the Slack conversation to document it for later, or upload screenshots. Verify that no confidential data is added, and the content is SAFE
  • Assign reviewers for this MR to the correct Directly Responsible Individual/s (DRI)
    • If the DRI for the page/s being updated isn’t immediately clear, then assign it to one of the people listed in the Maintained by section on the page being edited
    • If your manager does not have merge rights, please ask someone to merge it AFTER it has been approved by your manager in #mr-buddies
    • The when to get approval handbook section explains the workflow in more detail
  • If the changes affect team members, or warrant an announcement in another way, please consider posting an update in #whats-happening-at-gitlab linking to this MR
    • If this is a change that directly impacts the majority of global team members, it should be a candidate for #company-fyi. Please work with internal communications and check the handbook for examples.

Edited by Thiago Figueiró
