Skip to content
Snippets Groups Projects

Draft: Add performance tests for sync license

Closed Marcos Rocha requested to merge mc_rocha-performance-specs-sync-policy into master
4 unresolved threads
2 files
+ 54
0
Compare changes
  • Side-by-side
  • Inline
Files
2
@@ -37,6 +37,18 @@
subject
end
context 'with multiple projects in the namespace' do
let_it_be(:worker) { Security::ProcessScanResultPolicyWorker }
it 'does trigger SyncScanResultPoliciesProjectService for each project in group' do
create_list(:project, 10, namespace: namespace)
expect(worker).to receive(:perform_async).and_call_original.exactly(11).times
    • Author Maintainer

      This spec captures one of the root causes of the incident mentioned here:

      Huge number of invocation of Security::ProcessScanResultPolicyWorker

      If we follow the suggestion of using a batch of projects, this will reduce the number of invocations of Security::ProcessScanResultPolicyWorker.

      • @mc_rocha Once we introduce the suggested ID batching, this test will fail, despite it being a performance improvement. It seems to test this loop. Which sorts of performance issues could it detect that we might introduce? 🤔

        Edited by Dominic Bauer
      • Author Maintainer

        Which sorts of performance issues could it detect that we might introduce? 🤔

        Great question @bauerdominic!

        Maybe it won't detect new performance issues, but I had it for 2 reasons:

        1. I believe this is a spec we were missing. If we had it before, it could have helped us identify the potential performance risk.

        2. If this spec is merged before the ID batching update, it will be explicit that we are no longer enqueuing a job for each project.

      • @mc_rocha Agreed that the spec is useful! I don't think we disagree here. My concern regarding #433409 (closed) is that is aimed at detecting performance regressions by counting queries, especially regarding the improvements from &9971.

        I think the improvements &9971 will bring can be tested by counting queries, since we will be able to express the assertions in relative terms: If only one policy is changed, we should need less queries than if two policies were changed.

        But since today's implementation deletes and recreates all scan_result_policy_reads and their associations, a change to one or two policies take us the same amount of queries, because we also rebuild all other policies. So we would end up with an already failing test case, and I don't think these are useful (!138401 (comment 1681530094)).

        I believe this is a spec we were missing. If we had it before, it could have helped us identify the potential performance risk.

        I am not sure about this, since the worker behaved as specified in the test case. What happened was that the worker got repeatedly invoked for a group with a large number of contained projects. So we could also have written the test like:

        create_list(:project, 12345, namespace: namespace)
        expect(worker).to receive(:perform_async).and_call_original.exactly(12346).times

        We correctly get 12346 invocations of ProcessScanResultPolicyWorker, so this test wouldn't have saved us.

        WDYT?

      • Author Maintainer

        Thanks for the detailed response @bauerdominic!

        I believe it is a missing spec because the expected behavior was not explicit in the tests. No spec indicated that we are invoking a job for each project in a group.

        I thought that making this behavior explicit in a spec could have helped us because this incident and this another incident was caused by a large number of invocations of a worker.

        In one incident we opted to inline some of the workers invocations. And for the most recent one we are discussing about batching to reduce the number of workers.

        I believe that worker invocation is a point for attention during our reviews, and specs like this one could help anticipate future performance incidents.

        Once we introduce the suggested ID batching, this test will fail, despite it being a performance improvement.

        I believe this is expected since we would be changing the service's behavior. Fixing this test in the future will demonstrate the performance improvement.

      • Please register or sign in to reply
Please register or sign in to reply
subject
end
end
end
end
end
Loading