2018-05-14 site degradation, increased load on the api

Summary

On 2018-05-14 we saw increased load on the api fleet, resulting in slow pipelines and severe degradation of api operations.

Timeline

  • 10:50 - api limit lowered from 6 to 4 https://gitlab.com/gitlab-com/infrastructure/issues/4195#note_72948654
  • 10:57 - 10.8rc8 deployment finished
  • 15:00 - 6 new api servers added, increasing the fleet size from 14 to 20
  • 15:25 - rolled back the api limit setting so it is now back to 6
  • 16:00 - gdpr enabled on gitlab.com
  • 2018-05-16 11:21 - @yorickpeterse notices, via SHOW POOLS on pgbouncer, that we are running dangerously close to the max connection limit of 300
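
The check in the last timeline entry could be sketched roughly as below. This is only an illustration, not the tooling that was actually used: the column names (sv_active, sv_idle, sv_used, sv_tested, sv_login) are the server-side columns of pgbouncer's SHOW POOLS output, but the function names, the 90% warning threshold, the sample rows, and the assumption that the 300 limit is compared against server-side connections are all hypothetical.

```python
# Rough sketch: estimate how close pgbouncer is to a connection limit,
# given rows already fetched from SHOW POOLS (one dict per pool).
# Assumption: the limit applies to server-side connections, so only the
# sv_* columns are summed. Function names and threshold are hypothetical.

SERVER_COLUMNS = ("sv_active", "sv_idle", "sv_used", "sv_tested", "sv_login")


def server_connections(pool_rows):
    """Return the total number of server-side connections across all pools."""
    return sum(int(row.get(col, 0)) for row in pool_rows for col in SERVER_COLUMNS)


def headroom(pool_rows, max_connections=300):
    """Return how many connections remain before max_connections is reached."""
    return max_connections - server_connections(pool_rows)


def is_near_limit(pool_rows, max_connections=300, threshold=0.9):
    """Return True when usage is at or above threshold * max_connections."""
    return server_connections(pool_rows) >= threshold * max_connections


if __name__ == "__main__":
    # Invented sample data, not taken from the incident.
    sample_rows = [
        {"database": "example_db", "user": "example_user",
         "sv_active": 250, "sv_idle": 30, "sv_used": 0,
         "sv_tested": 0, "sv_login": 0},
        {"database": "example_db", "user": "other_user",
         "sv_active": 5, "sv_idle": 0, "sv_used": 0,
         "sv_tested": 0, "sv_login": 0},
    ]
    print("connections in use:", server_connections(sample_rows))
    print("headroom:", headroom(sample_rows))
    print("near limit:", is_near_limit(sample_rows))
```

Summing only the sv_* columns is a deliberate simplification; if the limit in question is on client connections instead, the cl_active and cl_waiting columns would be the ones to add up.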

So what I'm currently thinking is this:

  1. We reduce the fleet size back to normal
  2. We somewhat increase the number of database connections Unicorn can use, from 100 to e.g. 120
Edited by John Jarvis