2018-08-02 Shared Runners in GCP problems
Today we're observing very strange behavior in half of our Shared Runners fleet in GCP. There seems to be a network-related problem in the us-east1-c zone.
The first type of problem is machine creation: the number of created machines drops regularly. Even when the Runner is able to create machines, the number created is much smaller than in us-east1-d. For comparison, a few graphs:
us-east1-c
us-east1-d
Looking into the logs, I can see that most machine creation failures are caused by various timeouts received by Docker Machine.
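To compare the two zones from the logs, something like the following could tally timeout-related failures per zone. It is only a sketch: the `(zone, message)` record shape, the function name `count_timeouts_by_zone`, and the sample records are assumptions made for illustration, not the real Docker Machine log format or real data from this incident.

```python
from collections import Counter


def count_timeouts_by_zone(records):
    """Count failure records whose message mentions a timeout, per zone.

    `records` is an iterable of (zone, message) pairs. This shape is an
    assumption for illustration, not the real Docker Machine log format.
    """
    counts = Counter()
    for zone, message in records:
        # Case-insensitive substring match on the word "timeout".
        if "timeout" in message.lower():
            counts[zone] += 1
    return counts


# Hypothetical sample records, made up for this sketch.
sample = [
    ("us-east1-c", "Timeout waiting for SSH"),
    ("us-east1-c", "i/o timeout"),
    ("us-east1-d", "quota exceeded"),
    ("us-east1-d", "TLS handshake timeout"),
]

print(count_timeouts_by_zone(sample))
```

Real log lines would first need to be parsed into such pairs; how to do that depends on the actual log format.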
Also, checking jobs that eventually started on Runners based in us-east1-c, I can see that most of them hang and finally fail on various networking operations (in most cases, on pulling the images for the job container and defined services).
It clearly looks like a networking problem in GCP us-east1-c. However, at least for now, the GCP Status Dashboard shows that all services are operating normally.
Live graphs can be previewed at: