An error occurred while fetching the assigned milestone of the selected merge_request.
Draft: Add performance tests for sync license
4 unresolved threads
4 unresolved threads
Compare changes
Files
2+ 12
− 0
@@ -37,6 +37,18 @@
This spec captures one of the root causes of the incident mentioned here:
If we follow the suggestion of using a batch of projects, this will reduce the number of invocations of
Security::ProcessScanResultPolicyWorker
.@mc_rocha Once we introduce the suggested ID batching, this test will fail, despite it being a performance improvement. It seems to test this loop. Which sorts of performance issues could it detect that we might introduce?🤔
Great question @bauerdominic!
Maybe it won't detect new performance issues, but I had it for 2 reasons:
I believe this is a spec we were missing. If we had it before, it could have helped us identify the potential performance risk.
If this spec is merged before the ID batching update, it will be explicit that we are no longer enqueuing a job for each project.
@mc_rocha Agreed that the spec is useful! I don't think we disagree here. My concern regarding #433409 (closed) is that is aimed at detecting performance regressions by counting queries, especially regarding the improvements from &9971.
I think the improvements &9971 will bring can be tested by counting queries, since we will be able to express the assertions in relative terms: If only one policy is changed, we should need less queries than if two policies were changed.
But since today's implementation deletes and recreates all
scan_result_policy_read
s and their associations, a change to one or two policies take us the same amount of queries, because we also rebuild all other policies. So we would end up with an already failing test case, and I don't think these are useful (!138401 (comment 1681530094)).I am not sure about this, since the worker behaved as specified in the test case. What happened was that the worker got repeatedly invoked for a group with a large number of contained projects. So we could also have written the test like:
We correctly get 12346 invocations of
ProcessScanResultPolicyWorker
, so this test wouldn't have saved us.WDYT?
Thanks for the detailed response @bauerdominic!
I believe it is a missing spec because the expected behavior was not explicit in the tests. No spec indicated that we are invoking a job for each project in a group.
I thought that making this behavior explicit in a spec could have helped us because this incident and this another incident was caused by a large number of invocations of a worker.
In one incident we opted to inline some of the workers invocations. And for the most recent one we are discussing about batching to reduce the number of workers.
I believe that worker invocation is a point for attention during our reviews, and specs like this one could help anticipate future performance incidents.
I believe this is expected since we would be changing the service's behavior. Fixing this test in the future will demonstrate the performance improvement.