Skip to content

Rebuild table statistics after reindexing an expression index

Andreas Brandl requested to merge ab/reindex-functional-analyze into master

What does this MR do?

Expression indexes are based on a expression, not on columns directly. An example is LOWER(name) and creating an index for this. Regular table statistics cannot know about statistics for the derived expression and hence those indexes (often) require additional statistics to be available.

Upon reindexing, we drop the old index after swapping indexes. Unfortunately, this also drops the associated additional statistics object with it. We end up in a bad situation where we have the index but don't have the statistics for it. The planner may choose not to use the index because of that.

This is indeed what happened to us in gitlab-com/gl-infra/production#2885 (closed) which led to a full outage.

Now in this change, we add a step that recreates these additional statistics before we drop the old index. This makes sure that at any time in the process we have both: A working expression index and its associated statistics.

Other references

An interesting reference is postgres/postgres@b17ff07a which was a follow-up to the GitLab.com outage. This has been fixed for REINDEX CONCURRENTLY, which can be used from PG12.

https://www.postgresql.org/message-id/flat/CAFcNs%2BqpFPmiHd1oTXvcPdvAHicJDA9qBUSujgAhUMJyUMb%2BSA%40mail.gmail.com has a broader discussion around this.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Andreas Brandl

Merge request reports