Skip to content

Automatically create dynamic database partitions

Andreas Brandl requested to merge ab/auto-create-partitions into master

What does this MR do?

This change implements automatic creation of dynamic partitions.

What are dynamic partitions? Why do we need this change?

Postgres supports a few variants of partitioning:

  1. HASH
  2. RANGE
  3. LIST

For (1), we typically create all the partitions needed ahead of runtime, i.e. in development. All needed partitions are known and we add them to db/structure.sql as usual. We call those static partitions.

Now, with (2, 3) the situation can be different. Consider time-based partitioning: We create one partition per month. However, we cannot create partitions until eternity in a static manner. Instead and as time goes by, we create those partitions dynamically and during runtime. That's why we call those dynamic partitions.

This has a couple implications:

  1. Dynamic partitions are not part of db/structure.sql since they are created during runtime only
  2. We have to have mechanics to make sure dynamic partitions are created at the appropriate time
  3. Not having the right partition created is disastrous: We cannot insert records and transactions fail because of that.

When do we need to create partitions?

  1. On application startup. Consider we install GitLab and shut it down completely for a year or so. After starting it up again, we expect it to fully work - without waiting for a cronjob to run or a manual rake task to create partitions. Therefore, we trigger partition creation in an initializer.
  2. While the application is running. This is the regular case - the application keeps running for a long time and we need to regularly check and create partitions. Therefore, we trigger partition creation through a sidekiq cronjob.

There are a few more helpful touchpoints which are not strictly required but make our lives much easier:

  1. After initializing the database schema. For fresh installations, we initialize the database schema from db/structure.sql. This doesn't contain dynamic partitions, so we trigger partition creation after rake db:structure:load. This has a different angle, too: The table definition in db/structure.sql can move to a different position in the file depending on the presence or absence of its partitions in the database. This is due to topological sorting in the SQL dump (the partition definitions are dependent on the definition of the "parent" table, which resides in a different schema). So this is about making the db:check-schema task happy.
  2. After running database migrations. This isn't strictly required but seems like a good time to trigger that logic, too (since we're already typically creating tables and such anyways at this time).
  3. While we reload the database schema for testing. In the case of CI and running tests, we use rake db:test:load to recreate the database schema from scratch. Similar to db:structure:load, we don't have dynamic partitions here. We cannot rely on initializers since we might not restart the environment between tests.

How does it work?

We only implement monthly partitioning here, though we expect there'll be other strategies soon.

Let's take audit_events (partitioned by created_at by month) as an example. For this strategy, we typically want

  1. A catchall partition from the beginning of time up to the first actual partition. This is used in case we try to insert records with a very old created_at for some reason.
  2. From the month of the youngest record present, we want a gapless sequence of partitions - one for each month going forward.
  3. We create partitions 6 months into the future for safety reasons (this gives us plenty of time to check issues)

This logic is being triggered in various places (see above). Most prominently, this goes into a sidekiq cronjob which executes 4 times a day.

Follow-ups

Note this change does not actually submit a migration to partition a table or add a partitioned model. This has been extracted into !36298 (merged) - which builds on top of this MR. Please refer to that change in order to see how this is going to be used.

We are thinking about adding metrics to the number of missing/created partitions to add monitoring capabilities. #227353 (closed)

Relates to #220321 (closed)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Mayra Cabrera

Merge request reports