
Don't query for primary for read operations

Pavlo Strokov requested to merge ps-dont-query-primary into master

On each read or write operation, Praefect needs to know which Gitaly node is the primary. For mutator operations, the primary determines which node's response is returned to the client. For read operations, the primary is either the node the request is redirected to or, when reads distribution is enabled, the fallback option. The default strategy for determining the primary is 'sql': the primary is tracked in the Postgres database, and Praefect issues a SELECT statement against it each time it needs to determine the current primary. Performance testing showed that this puts a high load on the database when there are many read operations.

To resolve this problem, we change the retrieval of the set of up-to-date storages so that it returns all such storages, including the primary. With this in place, we no longer need to know the current primary and can use any storage that has the latest generation of the repository to serve the request. Because this information is held in the in-memory cache, Praefect no longer puts a high load on the database.
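The effect of serving reads from a cached set can be sketched as follows. This is a minimal illustration, not Praefect's actual code: `lookupFunc`, `cache`, `newCache`, and `pickReadNode` are hypothetical names, the lookup function stands in for the SQL query, and cache invalidation (needed when a repository's generation changes) is left out.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// lookupFunc is a hypothetical stand-in for the SQL query that returns
// the storages holding the latest generation of a repository.
type lookupFunc func(repo string) ([]string, error)

// cache is a simplified read-through cache keyed by repository.
// Invalidation is intentionally omitted from this sketch.
type cache struct {
	mu      sync.Mutex
	entries map[string][]string
	lookup  lookupFunc
}

func newCache(lookup lookupFunc) *cache {
	return &cache{entries: map[string][]string{}, lookup: lookup}
}

// consistentStorages returns the cached set, querying the database
// only on the first request for a given repository.
func (c *cache) consistentStorages(repo string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if storages, ok := c.entries[repo]; ok {
		return storages, nil
	}
	storages, err := c.lookup(repo)
	if err != nil {
		return nil, err
	}
	c.entries[repo] = storages
	return storages, nil
}

// pickReadNode chooses any up-to-date storage; it never asks which
// node is the primary. intn selects an index in [0, n).
func pickReadNode(c *cache, repo string, intn func(int) int) (string, error) {
	storages, err := c.consistentStorages(repo)
	if err != nil {
		return "", err
	}
	if len(storages) == 0 {
		return "", fmt.Errorf("no up-to-date storage for %q", repo)
	}
	return storages[intn(len(storages))], nil
}

func main() {
	queries := 0
	c := newCache(func(repo string) ([]string, error) {
		queries++ // counts simulated database round trips
		return []string{"gitaly-1", "gitaly-2"}, nil
	})
	for i := 0; i < 1000; i++ {
		if _, err := pickReadNode(c, "group/project.git", rand.Intn); err != nil {
			panic(err)
		}
	}
	fmt.Println("database queries:", queries) // prints "database queries: 1"
}
```

In this sketch a thousand reads cost a single simulated database query, whereas a per-request primary lookup would cost one query per read.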

This change also makes the IsLatestGeneration check for the primary node redundant: the primary is not present in the set of consistent storages unless its generation is the latest one.
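Why the separate check becomes redundant can be shown with a small sketch. The names `consistentStorages` and `contains` and the generation numbers are assumptions made for illustration; the real set is derived from data tracked in Postgres.

```go
package main

import (
	"fmt"
	"sort"
)

// consistentStorages returns the storages whose generation equals the
// highest generation known for the repository (hypothetical helper).
func consistentStorages(generations map[string]int) []string {
	latest := -1
	for _, g := range generations {
		if g > latest {
			latest = g
		}
	}
	var out []string
	for storage, g := range generations {
		if g == latest {
			out = append(out, storage)
		}
	}
	sort.Strings(out) // deterministic order for the example
	return out
}

// contains reports whether storage is a member of set.
func contains(set []string, storage string) bool {
	for _, s := range set {
		if s == storage {
			return true
		}
	}
	return false
}

func main() {
	// Assumed state: the primary (gitaly-1) is one generation behind.
	generations := map[string]int{"gitaly-1": 4, "gitaly-2": 5, "gitaly-3": 5}
	primary := "gitaly-1"

	set := consistentStorages(generations)
	// Membership in the set already implies "latest generation", so no
	// additional per-primary generation check is needed.
	fmt.Println(set, contains(set, primary)) // prints "[gitaly-2 gitaly-3] false"
}
```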

Closes: #3337 (closed)

Edited by Pavlo Strokov
