Skip to content

Scheduling the design repositories sync

What does this MR do?

The part that sync design is already done. When a new design is pushed it creates event that will be handled now. The part that we miss is design repository backfill, this MR fills this gap.

This also changes SQL queries. As designs table has many records for every project, after inner join we need to care of duplicates and it's not very convenient to use distinct everywhere as it has some requirements about the order of calling. I propose to use subqueries instead. When we make the queries in the main database it's better to use subquery in WHERE statement according to my tests. For FDW tables the most performant way is a subquery in the inner join statement.

SELECT COUNT("gitlab_secondary"."projects".id)
FROM "gitlab_secondary"."projects"
INNER JOIN "design_registry" ON "gitlab_secondary"."projects"."id" = "design_registry"."project_id"
INNER JOIN (
    SELECT DISTINCT "gitlab_secondary"."design_management_designs"."project_id" FROM "gitlab_secondary"."design_management_designs"
  ) dd
  ON dd."project_id" = "gitlab_secondary"."projects"."id";
Aggregate  (cost=453.91..453.92 rows=1 width=8) (actual time=189.409..189.409 rows=1 loops=1)
   Buffers: shared hit=1
   ->  Hash Join  (cost=333.26..446.60 rows=2925 width=4) (actual time=47.393..189.405 rows=10 loops=1)
         Hash Cond: (projects.id = design_management_designs.project_id)
         Buffers: shared hit=1
         ->  Hash Join  (cost=121.70..227.19 rows=2925 width=8) (actual time=0.487..142.496 rows=10 loops=1)
               Hash Cond: (projects.id = design_registry.project_id)
               Buffers: shared hit=1
               ->  Foreign Scan on projects  (cost=100.00..197.75 rows=2925 width=4) (actual time=0.459..130.279 rows=151518 loops=1)
               ->  Hash  (cost=15.20..15.20 rows=520 width=4) (actual time=0.016..0.016 rows=10 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     Buffers: shared hit=1
                     ->  Seq Scan on design_registry  (cost=0.00..15.20 rows=520 width=4) (actual time=0.011..0.013 rows=10 loops=1)
                           Buffers: shared hit=1
         ->  Hash  (cost=209.06..209.06 rows=200 width=4) (actual time=46.900..46.900 rows=1518 loops=1)
               Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 70kB
               ->  HashAggregate  (cost=205.06..207.06 rows=200 width=4) (actual time=46.558..46.719 rows=1518 loops=1)
                     Group Key: design_management_designs.project_id
                     ->  Foreign Scan on design_management_designs  (cost=100.00..197.75 rows=2925 width=4) (actual time=0.225..37.591 rows=45204 loops=1)
 Planning time: 0.250 ms
 Execution time: 189.632 ms

Explain for the old one:

--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=357.18..357.19 rows=1 width=8) (actual time=1080.597..1080.597 rows=1 loops=1)
   Buffers: shared hit=1, temp read=5455 written=5432
   ->  Hash Join  (cost=232.74..354.04 rows=1258 width=919) (actual time=1043.533..1078.893 rows=559 loops=1)
         Hash Cond: (design_management_designs.project_id = projects.id)
         Buffers: shared hit=1, temp read=5455 written=5432
         ->  Foreign Scan on design_management_designs  (cost=100.00..197.75 rows=2925 width=4) (actual time=0.249..31.218 rows=45204 loops=1)
         ->  Hash  (cost=131.66..131.66 rows=86 width=927) (actual time=1043.276..1043.276 rows=10 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 11kB
               Buffers: shared hit=1, temp read=5455 written=5432
               ->  Hash Join  (cost=113.66..131.66 rows=86 width=927) (actual time=975.368..1043.256 rows=10 loops=1)
                     Hash Cond: (design_registry.project_id = projects.id)
                     Buffers: shared hit=1, temp read=5455 written=5432
                     ->  Seq Scan on design_registry  (cost=0.00..15.20 rows=520 width=4) (actual time=0.007..0.010 rows=10 loops=1)
                           Buffers: shared hit=1
                     ->  Hash  (cost=112.58..112.58 rows=86 width=923) (actual time=975.241..975.241 rows=151518 loops=1)
                           Buckets: 32768 (originally 1024)  Batches: 16 (originally 1)  Memory Usage: 3841kB
                           Buffers: temp written=3459
                           ->  Foreign Scan on projects  (cost=100.00..112.58 rows=86 width=923) (actual time=1.368..904.125 rows=151518 loops=1)
 Planning time: 0.249 ms
 Execution time: 1080.802 ms
(20 rows)

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #34324 (closed)

Edited by Valery Sizov

Merge request reports