Investigate buffer_mapping contention issue from 2021-11-22 incident
We had a database incident that caused an severity1 outage in gitlab-com/gl-infra/production#5952 (closed).
This issue attempts to discuss possible theories and track reproduction steps/experiments for that issue.
Participants
Below is a list of team members who will be participating in this investigation and identification of root cause.
| role | name | region | notes |
|---|---|---|---|
| Management lead | @shampton | AMER | DRI of overall effort. Provide updates in R&S standup, etc. |
| Development lead | @grzesiek | EMEA | Investigate technical root cause |
| Development lead | @stanhu | AMER | Investigate technical root cause |
| Development lead | @engwan | APAC | Investigate technical root cause |
| Development lead | @abrandl | EMEA | Investigate technical root cause. Andreas is evaluating the extent of his involvement with his manager. |
| 2021-11-25 R&S standup update | @grzesiek | EMEA | |
| 2021-11-30 R&S standup update | @cdu1 | AMER | |
| 2021-12-02 R&S standup update | @alexives | AMER | |
| after 2021-12-02 R&S standup updates | @shampton | AMER |
Summary
- Many old rows in
ci_job_artifactswere deleted, but since there are over 1.4 billion rows in that table, the autovacuum threshold (7+ million rows) was not reached. As a result, there were a lot of dead tuples with lowjob_idvalues. - These old rows need to be scanned when the query planner calculates
min(job_id). - We now understand why the excessive buffer read only happens to replicas. When planning a
min(job_id)query on primaries, the primary opportunistically marks dead tuples with aLP_DEADhint bit. This significantly reduces future buffer reads, but these hint bits are ignored by replicas at the moment. There is a PostgreSQL patch in review that should fix this, but the earliest it could land is PostgreSQL 15. - Based on the analysis above, an index vacuum would avoid the excessive buffer read.
- Suggested corrective actions -
- Short term: Lower the autovacuum thresholds for
ci_job_artifactstable: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14723 - Short term: Improve our observability with pgwatch2: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13891#note_741922183
- Mid/Long term: Partition big CI tables (related links below)
- Draft: Add the CI/CD data time-decay blueprint - !70052 (merged)
- Reduce tables sizes to < 100 GB per physical table - &6211
- Mid/Long term: Review autovacuum strategy for all types of tables holistically: #346980 (moved)
- Short term: Lower the autovacuum thresholds for
Edited by Alex Ives