[C1][GPRD] VACUUM FULL on pg_statistic/pg_attribute

Production Change

Change Summary

On Sunday, August 3rd, 2025, DBO ran [GPRD][Sec Decomp] Truncate gitlab_sec tables in main (and vice-versa). This C1 CR truncated large portions of data in both patroni-main (~2.9TB freed) and patroni-sec (29TB freed).

An unintended side effect of this change was severe fragmentation of the pg_attribute and pg_statistic catalog tables. Normally this would cause only slightly increased planning time. However, bloat_btree.sql (an arguably terrible monitoring query originally pulled from GitHub; it received a few bugfixes up to about 3 years ago, but we still run a version from 5 years ago) reads pg_attribute and pg_statistic directly to calculate index bloat. The fragmentation causes that query's actual runtime to spike from its previous 0.8 - 1.2 seconds to 300+ seconds, so the index bloat query times out and triggers Prometheus scrape failure alerts every time it runs.
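As a rough way to gauge the fragmentation (not part of the CR steps; a sketch using standard catalog views):

```sql
-- On-disk size vs. live/dead tuple counts for the affected catalog tables:
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       s.n_live_tup,
       s.n_dead_tup
FROM pg_class c
JOIN pg_stat_sys_tables s ON s.relid = c.oid
WHERE c.relname IN ('pg_attribute', 'pg_statistic');
```

A total size far out of proportion to the live tuple count is the symptom described above.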

The current issue is that none of the following attempts to address the fragmentation have helped:

  • VACUUM
  • VACUUM ANALYZE
  • REINDEX TABLE
  • manual update of statistics for these tables
  • etc...
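For reference, the attempted remediations were of the form (shown for pg_statistic; the same was done for pg_attribute):

```sql
-- None of these reclaimed the space or restored query performance:
VACUUM pg_statistic;
VACUUM ANALYZE pg_statistic;
REINDEX TABLE pg_statistic;
-- Manual statistics refresh:
ANALYZE pg_statistic;
```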

The only sure-fire fix we've found is to VACUUM FULL both tables (which rewrites each table into a new, compact file and reindexes it). This is particularly problematic in this context because pg_attribute and especially pg_statistic are used by the query planner to generate plans for every query that runs within the cluster. For the duration of the ACCESS EXCLUSIVE locks taken by the VACUUM FULL, PostgreSQL will be unable to plan any read-write or read-only queries. This effectively becomes an outage.
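During the maintenance window, the lock impact can be observed from a second session via pg_locks; a minimal sketch:

```sql
-- Backends holding or waiting on locks against the catalog tables
-- while VACUUM FULL holds its ACCESS EXCLUSIVE lock:
SELECT pid, mode, granted, relation::regclass AS rel
FROM pg_locks
WHERE relation IN ('pg_statistic'::regclass, 'pg_attribute'::regclass)
ORDER BY granted, pid;
```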

To avoid this, we considered rewriting the query in question into a much more modern (and efficient) CTE format, but there is too much technical debt involved to do so in a timely manner: we would have to bring all environments up to the latest version of the query across different subsets of hosts in order to make the change and verify the fix.

Sticking with the original query, testing in db-lab shows the lock window for the VACUUM FULL on pg_attribute is negligible, but the lock for the VACUUM FULL on pg_statistic holds for somewhere between 10 and 20 seconds. As db-lab is not under significant load, this is expected to be closer to 1-2 minutes in production.

This window is too long to mask with pgbouncer pauses (and the change affects the read replicas as well, so the application will see impact regardless), so we need to do the following:

  • Announce downtime to customers following all required procedures
  • Schedule for an appropriate off-peak hour time
  • Run the VACUUM FULLs on main
  • Validate bloat_btree.sql runs within 15 seconds again post-VACUUM

Change Details

  1. Services Impacted - Service::Patroni
  2. Change Technician - @jjsisson or @rhenchen.gitlab
  3. Change Reviewer - @bshah11 @bprescott_ @msmiley @stomlinson
  4. Scheduled Date and Time (UTC in format YYYY-MM-DD HH:MM) - 2025-08-13 01:00
  5. Time tracking - 30m
  6. Downtime Component - 5m

Set Maintenance Mode in GitLab

If your change involves scheduled maintenance, add a step to set and unset maintenance mode per our runbooks. This will make sure SLA calculations adjust for the maintenance period.

Detailed steps for the change

Pre-execution steps

  • Make sure all tasks in Change Technician checklist are done
  • For C1 and C2 change issues, the SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
    • The SRE on-call provided approval with the eoc_approved label on the issue.
  • For C1, C2, or blocks deployments change issues, Release managers have been informed prior to change being rolled out. (In #production channel, mention @release-managers and this issue and await their acknowledgment.)
  • There are currently no active incidents that are severity::1 or severity::2
  • If the change involves doing maintenance on a database host, an appropriate silence targeting the host(s) should be added for the duration of the change.

Change steps - steps to take to execute the change

Estimated Time to Complete (mins) - 30

  • Set label change::in-progress: /label ~change::in-progress
  • ssh patroni-main-v16-102-db-gprd.c.gitlab-production.internal # current primary. Double check!
  • sudo gitlab-psql
  • SET statement_timeout = 0;
  • \timing on
  • VACUUM FULL pg_attribute;
  • VACUUM FULL pg_statistic;
  • Verify the query's runtime is reduced (expect roughly 400 - 10000 ms execution time):
    EXPLAIN ANALYZE SELECT current_database(), nspname AS schemaname, tblname, idxname AS object_name, bs*(relpages)::bigint AS real_size,
    bs*(relpages-est_pages)::bigint AS extra_size,
    100 * (relpages-est_pages)::float / relpages AS extra_ratio,
    fillfactor,
    CASE WHEN relpages > est_pages_ff
    THEN bs*(relpages-est_pages_ff)
    ELSE 0
    END AS bloat_size,
    100 * (relpages-est_pages_ff)::float / relpages AS bloat_ratio,
    is_na
    -- , 100-(pst).avg_leaf_density AS pst_avg_bloat, est_pages, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples, relpages -- (DEBUG INFO)
    FROM (
    SELECT coalesce(1 +
         ceil(reltuples/floor((bs-pageopqdata-pagehdr)/(4+nulldatahdrwidth)::float)), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
      ) AS est_pages,
      coalesce(1 +
         ceil(reltuples/floor((bs-pageopqdata-pagehdr)*fillfactor/(100*(4+nulldatahdrwidth)::float))), 0
      ) AS est_pages_ff,
      bs, nspname, tblname, idxname, relpages, fillfactor, is_na
      -- , pgstatindex(idxoid) AS pst, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples -- (DEBUG INFO)
    FROM (
      SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, idxoid, fillfactor,
            ( index_tuple_hdr_bm +
                maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
                  WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
                  ELSE index_tuple_hdr_bm%maxalign
                END
              + nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
                  WHEN nulldatawidth = 0 THEN 0
                  WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
                  ELSE nulldatawidth::integer%maxalign
                END
            )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
            -- , index_tuple_hdr_bm, nulldatawidth -- (DEBUG INFO)
      FROM (
          SELECT n.nspname, i.tblname, i.idxname, i.reltuples, i.relpages,
              i.idxoid, i.fillfactor, current_setting('block_size')::numeric AS bs,
              CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
                WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
                ELSE 4
              END AS maxalign,
              /* per page header, fixed size: 20 for 7.X, 24 for others */
              24 AS pagehdr,
              /* per page btree opaque data */
              16 AS pageopqdata,
              /* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
              CASE WHEN max(coalesce(s.null_frac,0)) = 0
                  THEN 2 -- IndexTupleData size
                  ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ((max number of fields per index + 8 - 1) / 8)
              END AS index_tuple_hdr_bm,
              /* data length: subtract the space saved by null values, using the null fraction from stats */
              sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
              max( CASE WHEN i.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
          FROM (
              SELECT ct.relname AS tblname, ct.relnamespace, ic.idxname, ic.attpos, ic.indkey, ic.indkey[ic.attpos], ic.reltuples, ic.relpages, ic.tbloid, ic.idxoid, ic.fillfactor,
                  coalesce(a1.attnum, a2.attnum) AS attnum, coalesce(a1.attname, a2.attname) AS attname, coalesce(a1.atttypid, a2.atttypid) AS atttypid,
                  CASE WHEN a1.attnum IS NULL
                  THEN ic.idxname
                  ELSE ct.relname
                  END AS attrelname
              FROM (
                  SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor, indkey,
                      pg_catalog.generate_series(1,indnatts) AS attpos
                  FROM (
                      SELECT ci.relname AS idxname, ci.reltuples, ci.relpages, i.indrelid AS tbloid,
                          i.indexrelid AS idxoid,
                          coalesce(substring(
                              array_to_string(ci.reloptions, ' ')
                              from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor,
                          i.indnatts,
                          pg_catalog.string_to_array(pg_catalog.textin(
                              pg_catalog.int2vectorout(i.indkey)),' ')::int[] AS indkey
                      FROM pg_catalog.pg_index i
                      JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
                      WHERE ci.relam=(SELECT oid FROM pg_am WHERE amname = 'btree')
                      AND ci.relpages > 0
                  ) AS idx_data
              ) AS ic
              JOIN pg_catalog.pg_class ct ON ct.oid = ic.tbloid
              LEFT JOIN pg_catalog.pg_attribute a1 ON
                  ic.indkey[ic.attpos] <> 0
                  AND a1.attrelid = ic.tbloid
                  AND a1.attnum = ic.indkey[ic.attpos]
              LEFT JOIN pg_catalog.pg_attribute a2 ON
                  ic.indkey[ic.attpos] = 0
                  AND a2.attrelid = ic.idxoid
                  AND a2.attnum = ic.attpos
            ) i
            JOIN pg_catalog.pg_namespace n ON n.oid = i.relnamespace
            JOIN pg_catalog.pg_stats s ON s.schemaname = n.nspname
                                      AND s.tablename = i.attrelname
                                      AND s.attname = i.attname
            GROUP BY 1,2,3,4,5,6,7,8,9,10,11
      ) AS rows_data_stats
    ) AS rows_hdr_pdg_stats
    ) AS relation_stats
    ORDER BY nspname, tblname, idxname;
  • Set label change::complete: /label ~change::complete

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - None

There is no rollback for a VACUUM FULL, as it rebuilds the target table in compact form (removing the fragmentation) and performs a REINDEX on the table.

Monitoring

Key metrics to observe

  • Metric: SLAs
    • Location: Dashboard URL
    • What changes to this metric should prompt a rollback: Sustained Apdex drop (especially for rails, patroni, patroni-sec, or sidekiq)
  • Metric: Patroni Service Error Ratio
    • Location: Dashboard URL
    • What changes to this metric should prompt a rollback: Sustained Service Error Ratio

Change Reviewer checklist

C4 C3 C2 C1:

  • Check if the following applies:
    • The scheduled day and time of execution of the change is appropriate.
    • The change plan is technically accurate.
    • The change plan includes estimated timing values based on previous testing.
    • The change plan includes a viable rollback plan.
    • The specified metrics/monitoring dashboards provide sufficient visibility for the change.

C2 C1:

  • Check if the following applies:
    • The complexity of the plan is appropriate for the corresponding risk of the change. (i.e. the plan contains clear details).
    • The change plan includes success measures for all steps/milestones during the execution.
    • The change adequately minimizes risk within the environment/service.
    • The performance implications of executing the change are well-understood and documented.
    • The specified metrics/monitoring dashboards provide sufficient visibility for the change.
      • If not, is it possible (or necessary) to make changes to observability platforms for added visibility?
    • The change has a primary and secondary SRE with knowledge of the details available during the change window.
    • The change window has been agreed with Release Managers in advance of the change. If the change is planned for APAC hours, this issue has an agreed pre-change approval.
    • The labels blocks deployments and/or blocks feature-flags are applied as necessary.

Change Technician checklist

  • The change plan is technically accurate.
  • This Change Issue is linked to the appropriate Issue and/or Epic
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • The change execution window respects the Production Change Lock periods.
  • For C1 and C2 change issues, the change event is added to the GitLab Production calendar.
  • For C1 and C2 change issues, the Infrastructure Manager provided approval with the manager_approved label on the issue. Mention @gitlab-org/saas-platforms/inframanagers in this issue to request approval and provide visibility to all infrastructure managers.
  • For C1, C2, or blocks deployments change issues, confirm with Release managers that the change does not overlap or hinder any release process (In #production channel, mention @release-managers and this issue and await their acknowledgment.)
Edited by Rafael Henchen