GitLab.org / gitaly / Merge requests / !4090

Materialize valid_primaries view (14.4)

Merged. Sami Hiltunen requested to merge smh-optimize-dataloss-query-14-4 into 14-4-stable on Nov 17, 2021.

The dataloss query is extremely slow on larger datasets. For each row the query returns, Postgres computes the full result of the valid_primaries view, only to filter it down to the one matching record. This results in O(n²) complexity, which kills performance as the dataset grows. It's not clear why the join parameters are not pushed down into the view in the query.

This optimizes the query by materializing the valid_primaries view. This ensures Postgres computes the full view only once and joins with the pre-computed result.
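The cost difference described above can be illustrated with a small Python sketch. It is a toy model, not Gitaly or Postgres code: `CountingView`, `lookup_per_row`, and `lookup_materialized` are hypothetical names, and the view is reduced to a list of `(repository_id, storage)` pairs purely for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (repository_id, storage); an assumed shape for this sketch


class CountingView:
    """Stand-in for a view: every compute() call rebuilds the full result."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.evaluations = 0   # how many times the full view was computed
        self.rows_scanned = 0  # total rows produced across all evaluations

    def compute(self) -> List[Row]:
        self.evaluations += 1
        self.rows_scanned += len(self._rows)
        return list(self._rows)


def lookup_per_row(repo_ids: List[int], view: CountingView) -> Dict[int, List[str]]:
    """Recompute the whole view for each outer row, then filter: O(n * m)."""
    result: Dict[int, List[str]] = {}
    for repo_id in repo_ids:
        result[repo_id] = [s for rid, s in view.compute() if rid == repo_id]
    return result


def lookup_materialized(repo_ids: List[int], view: CountingView) -> Dict[int, List[str]]:
    """Compute the view once, index it, then join against the stored result."""
    materialized: Dict[int, List[str]] = {}
    for rid, storage in view.compute():
        materialized.setdefault(rid, []).append(storage)
    return {repo_id: materialized.get(repo_id, []) for repo_id in repo_ids}
```

Both functions return the same mapping. With n outer rows, the first evaluates the view n times, while the second evaluates it once, which mirrors the effect the materialization is meant to have on the query plan.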

RepositoryStoreCollector gathers metrics on repositories which don't have any valid primary candidates available. This indicates the repository is unavailable, as the current primary is not valid and there are no valid candidates to fail over to. The query is currently extremely inefficient on some versions of Postgres, as it ends up computing the full valid_primaries view for each of the rows it checks. This doesn't seem to occur on all versions of Postgres: 12.6, at least, manages to push the search criteria down into the view. This commit fixes the situation by materializing the valid_primaries view prior to querying it. This ensures the full view isn't computed for each of the rows; instead, Postgres uses the pre-computed result.
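One way a view can be materialized before it is queried is a `WITH ... AS MATERIALIZED` common table expression, which PostgreSQL supports from version 12 onward. The sketch below only shows the general shape of such a query as a Python string. It is not the query from this merge request: the `repositories` table, the `repository_id` and `virtual_storage` columns, and the constant name are assumptions made for illustration.

```python
# Hypothetical query shape; table and column names are assumptions,
# not taken from the merge request.
UNAVAILABLE_REPOSITORIES_QUERY = """
WITH materialized_valid_primaries AS MATERIALIZED (
    -- The view is read exactly once here and its result is stored.
    SELECT * FROM valid_primaries
)
SELECT r.virtual_storage, COUNT(*) AS unavailable_repositories
FROM repositories AS r
WHERE NOT EXISTS (
    -- Each outer row probes the stored result, not the view itself.
    SELECT 1
    FROM materialized_valid_primaries AS p
    WHERE p.repository_id = r.repository_id
)
GROUP BY r.virtual_storage
"""
```

The `MATERIALIZED` keyword tells the planner not to inline the CTE into the outer query, so the per-row subquery reads the stored result instead of re-evaluating the view.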

See #3744 (comment 735797239) for explanation and query plans.

Part of #3744 (closed)

Edited Nov 17, 2021 by Sami Hiltunen
Source branch: smh-optimize-dataloss-query-14-4