Skip to content

Analyze partitioned tables when syncing

Tianwen Chen requested to merge 423135-analyze into master

What does this MR do and why?

This is a MR to run analyze on partitioned table when partition syncing is run on a weekly schedule.

The reason we need to run analyze on partitioned table is that:

See this discussion at !129812 (comment 1525625243)

In short, the AUTOVACUUM doesn't run for partitioned tables. Therefore, the table statistics (produced by analyze) will not be updated for partitioned tables, and because table statistics is used to optimize the db queries, which means the db queries will not be optimized for partitioned tables. And this is the reason why we need to run the analyze on a regular basis.

NOTE: AUTOVACUUM process consists of two separate processes: AUTOVACUUM VACUUM and AUTOVACUUM ANALYZE . AUTOVACUUM VACUUM is like the original vacuum, reclaiming storage space taken by dead tuples. AUTOVACUUM ANALYZE is responsible for updating table statistics to ensure that queries are optimized for performance.

from https://levelup.gitconnected.com/understanding-the-autovacuum-process-in-postgresql-everything-you-need-to-know-ab389af7719d#:~:text=The%20AUTOVACUUM%20process%20consists%20of,queries%20are%20optimized%20for%20performance.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

  1. In rails console enable the feature flag:

    Feature.enable(:database_analyze_on_partitioned_tables)
  2. Setup:

    table_name = :_test_gitlab_main_my_model_example_table
    partition_table_name = :_test_gitlab_main_my_model_example_table_1
    model = Class.new(ApplicationRecord) do
      include PartitionedTable
      self.table_name = table_name
    
      partitioned_by :partition_id,
        strategy: :ci_sliding_list,
        next_partition_if: proc { false },
        detach_partition_if: proc { false },
        analyze_interval: 1.week
    end
    
    model.connection.execute(<<~SQL)
      CREATE TABLE #{table_name}(id serial) PARTITION BY LIST (id);
      CREATE TABLE IF NOT EXISTS #{partition_table_name} PARTITION OF #{table_name} FOR VALUES IN (1);
    SQL
    
    Gitlab::Database::Partitioning::PartitionManager.new(model).sync_partitions; nil
  3. See if it runs the ANALYZE

  4. Run again and it shouldn't run the ANALYZE

    Gitlab::Database::Partitioning::PartitionManager.new(model).sync_partitions; nil
  5. Teardown

    model.connection.execute(<<~SQL)
      DROP TABLE public.#{partition_table_name};
      DROP TABLE public.#{table_name};
    SQL

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #423135 (closed)

Edited by Tianwen Chen

Merge request reports