How do we keep the main DB responsive during the first import from License DB?

Topic to Evaluate

On GitLab instances that import data from License DB for the first time, PackageMetadata::SyncWorker inserts millions of rows into the main Postgres database using INSERT queries. This causes significant performance issues:

  • This heavily degrades the initial user experience.
  • It makes QA tests fail due to performance slowdowns in the main DB. #396649 (comment 1319465309)

We need to evaluate options to solve this performance problem.

Tasks to Evaluate

For each option considered in this issue:

  • Check feasibility
  • Check compatibility with all supported deployments
  • Assess cost of implementation
  • Measure performance gain

Then,

  • Compare options, and choose a first step
  • Create implementation issues

Risks and Implementation Considerations

Team

/cc @gonzoyumo @sam.white @brytannia @stanhu

Auto-Summary 🤖

Discoto Usage

Points

Discussion points are declared by headings, list items, and single lines that start with the text (case-insensitive) point:. For example, the following are all valid points:

  • #### POINT: This is a point
  • * point: This is a point
  • + Point: This is a point
  • - pOINT: This is a point
  • point: This is a **point**

Note that any markdown used in the point text will also be propagated into the topic summaries.
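The point syntax above can be approximated with one case-insensitive regular expression. This is a minimal Ruby sketch, not Discoto's actual parser. `POINT_PATTERN` and `extract_point` are hypothetical names, and the sketch checks a single line at a time. It returns the point text with its markdown left intact, which matches the note above, or `nil` when the line is not a point.

```ruby
# Hypothetical sketch of Discoto-style point detection (not Discoto's real code).
# Optional prefix: a markdown heading (one or more #) or a list marker (*, +, -),
# followed by the case-insensitive keyword "point:".
POINT_PATTERN = /\A\s*(?:#+\s+|[*+-]\s+)?point:\s*(?<text>.+?)\s*\z/i

# Returns the point text (markdown preserved) or nil if the line is not a point.
def extract_point(line)
  match = POINT_PATTERN.match(line)
  match && match[:text]
end

[
  "#### POINT: This is a point",
  "* point: This is a point",
  "+ Point: This is a point",
  "- pOINT: This is a point",
  "point: This is a **point**",
  "Just a comment"
].each { |line| puts "#{line.inspect} -> #{extract_point(line).inspect}" }
```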

Topics

Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.

Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with (case-insensitive) topic:. For example, the following are all valid topics:

  • # Topic: Inline discussion topic 1
  • ## TOPIC: **{+A Green, bolded topic+}**
  • ### tOpIc: Another topic
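Unlike points, an inline topic must be a heading and must appear on the first line of the first comment in a thread. A minimal Ruby sketch of that rule, assuming a thread is given as an array of comment bodies (oldest first); `TOPIC_PATTERN` and `inline_topic_title` are hypothetical names, not Discoto's implementation:

```ruby
# Hypothetical sketch of inline-topic detection (not Discoto's real code).
# Only a markdown heading (#, ##, ###, ...) followed by the case-insensitive
# keyword "topic:" counts, and only on the first line of the first comment.
TOPIC_PATTERN = /\A\s*#+\s+topic:\s*(?<title>.+?)\s*\z/i

# Returns the topic title for a thread, or nil if the thread is not a topic.
# `comments` is an array of comment bodies, oldest first.
def inline_topic_title(comments)
  first_comment = comments.first
  return nil if first_comment.nil?

  first_line = first_comment.lines.first
  return nil if first_line.nil?

  match = TOPIC_PATTERN.match(first_line)
  match && match[:title]
end

thread = ["## TOPIC: **{+A Green, bolded topic+}**\nSome discussion...", "A reply"]
puts inline_topic_title(thread).inspect
```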

Quick Actions

Action                          Description
/discuss sub-topic TITLE        Create an issue for a sub-topic. Does not work in epics.
/discuss link ISSUABLE-LINK     Link an issuable as a child of this discussion.

Last updated by this job

  • TOPIC Import using COPY FROM #397670 (comment 1320233823)
  • TOPIC Import to partition, attach #397670 (comment 1320273682)
  • TOPIC Licenses as arrays of IDs #397670 (comment 1320310058)
  • TOPIC Use separate DB #397670 (comment 1320320460)
  • TOPIC Improve bulk upsert #397670 (comment 1320324486)
  • TOPIC Allow duplicates #397670 (comment 1320393315)
  • TOPIC Admins enable the sync of each package type #397670 (comment 1320405899)
  • TOPIC Get package metadata via an API #397670 (comment 1320425477)
  • TOPIC Throttle database requests #397670 (comment 1320815585)
    • Following the normal workflow #397670 (comment 1320838568)
  • TOPIC Testing #397670 (comment 1320819134)
  • TOPIC Allow importing the top N most popular packages of each PURL type as an option #397670 (comment 1321230965)
  • TOPIC Compress data using ranges of versions #397670 (comment 1322371276)
  • TOPIC Compress the export files to save on space and network transfer time #397670 (comment 1324805831)
Discoto Settings
---
summary:
  max_items: -1
  sort_by: created
  sort_direction: ascending

See the settings schema for details.

Implementation Plan

  • Modify PackageMetadata::SyncService to add a simple DB request throttle (a sleep after ingest): https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/services/package_metadata/sync_service.rb#L45
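The sleep-based throttle in the plan can be sketched as follows. This is a minimal Ruby illustration under stated assumptions, not the actual change to PackageMetadata::SyncService: it assumes the data is ingested in slices, one at a time, and `ThrottledSync`, `throttle_seconds`, and the injectable `sleeper` are hypothetical names. Pausing after each ingest spreads the writes over a longer window, trading a slower first import for a more responsive main DB.

```ruby
# Hypothetical sketch: throttle DB writes by sleeping after each ingest.
# Not GitLab's implementation; class and parameter names are illustrative.
class ThrottledSync
  # ingest:           callable that writes one slice of rows to the DB
  # throttle_seconds: pause after each ingest so other queries can run
  # sleeper:          injectable so tests can avoid real sleeping
  def initialize(ingest:, throttle_seconds: 0.1, sleeper: Kernel.method(:sleep))
    @ingest = ingest
    @throttle_seconds = throttle_seconds
    @sleeper = sleeper
  end

  # Ingests each slice in order, pausing after every ingest.
  # Returns the number of slices processed.
  def run(slices)
    slices.each do |slice|
      @ingest.call(slice)
      @sleeper.call(@throttle_seconds)
    end
    slices.size
  end
end

# Example: three slices with a 10 ms pause after each ingest.
written = []
sync = ThrottledSync.new(ingest: ->(slice) { written.concat(slice) }, throttle_seconds: 0.01)
sync.run([[1, 2], [3, 4], [5]])
puts written.inspect
```

Injecting the sleeper keeps the throttle interval testable without slowing down the test suite; in production the default `Kernel.sleep` is used.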
Edited Mar 22, 2023 by Lucas Charles