Skip to content

Create partitioned copy for events (monthly partition)

Problem

Due to the ongoing increased load issues on the primary, we need to take action to reduce the load. One of the solutions we're considering is partitioning tables > 100GB.

Analysis

See #24538 (closed). The summary is as follows:

  1. Most queries order by created_at, or can be easily updated to do so. About 55% of queries involving events will need to be updated.
  2. The data for March 2023 will be around 40GB

Proposal

This issue will be to perform the following :

  1. https://docs.gitlab.com/ee/development/database/table_partitioning.html#step-1-creating-the-partitioned-copy-release-n.
    1. To save space, create the copy table following https://docs.gitlab.com/ee/development/database/ordering_table_columns.html#real-example
    2. To speed up the backfill, consider not creating indexes in the copy until the backfill is complete.
  2. https://docs.gitlab.com/ee/development/database/table_partitioning.html#step-2-backfill-the-partitioned-copy-release-n

At 50,000 rows every two minutes, it will take 83 days to complete the backfill


Previous analysis

Due to the ongoing increased load issues on the primary, we need to take action to reduce the load. One of the solutions we're considering is partitioning tables > 100GB. As a very large table, events has been selected as a candidate. For more information on available strategies, please review our docs

We'd like to answer the following questions:

  1. Does events have a key that would be easy to partition on?

    Yes, #24538 (closed)

  2. What strategy seems like the best bet for partitioning this table?

    Range

  3. Are there other strategies we'd like to consider to reduce the size of events?

  4. Is there any improvement even if we partition ? Or do we have to delete old partitions to see any improvement ?

    If we just partition, it's my understanding from DBRE that we should still see improvements to write performance. While keeping the old data around isn't ideal, having the table partitioned helps because writing to a smaller table is easier, and while each partition would need vacuuming, the vacuuming processes would be much faster.

  5. What is the growth rate of the events table ?

    10-80 per second

    Screenshot_2023-03-20_at_9.36.52_AM

    -- source

Edited by 🤖 GitLab Bot 🤖