Advanced search appears to keep PostgreSQL waiting while Elastic Search is processing requests
Summary
While investigating database performance issues for a large customer, we noticed that amongst the sessions data gathered from the pg_stat_activity
table/view in PostgreSQL, there were advanced search related queries from sessions that were in idle in transaction
state.
This means that the client (Rails) has started a request, and then left it there while it does something else. This is not good, because queries over the database should complete quickly (ideally in milliseconds) and allow other queries to make use of the limited resources on the database server.
For example: the session is one of these limited resources; when using connection pooling, there will be a limited number of sessions available, but idle in transaction
queries will hold on to the session.
A separate investigation into search timeouts in their environment (500 errors being returned) identified that some queries were running into 60 seconds rack timeouts.
Looking for examplesof rack timeouts, we identified an example in the data on their performance ticket:
- A stack trace from the rack timeout
- A long running query from the database data for the same correlation ID.
Steps to reproduce
- Log
pg_stat_activity
data on PostgreSQL - Run searches in GitLab on an overloaded Elastic cluster so the requests time out.
Example Project
What is the current bug behavior?
Transactions are left idling on the database.
What is the expected correct behavior?
Transactions are completed prior to performing slow operations such as querying Elasticsearch.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Self managed 16.3.7