Rework BuildGrid's server startup to fork for monitoring before instantiating any gRPC objects
Before raising this MR, consider whether the following are required, and complete if so:
-
Unit tests -
Metrics -
Documentation update(s)
If not required, please explain in brief why not.
Description
Currently, the metrics subprocess gets created after both the gRPC server and the gRPC channels for interacting with remote services.
This isn't safe or supported, and can result in the subprocess crashing unexpectedly. Any forking must take place before any gRPC objects have been instantiated. Moving the fork to before we create the gRPC server is trivial, but the gRPC channels are created in the configuration parsing step.
This MR reworks the configuration parsing to not setup any gRPC objects, and then reworks the server's start
method to handle triggering that setup, as well as handling the fork for monitoring and instantiating the gRPC server too. This makes the startup a bit more well-defined, with gRPC being started up as late as possible.
This MR doesn't make the same change in the WIP RabbitMQ-based BuildGrid, that will be covered in a follow-up MR.
Validation
Start BuildGrid with monitoring enabled, send an execute request, and notice that the metrics likely stop being published from the execution service.
docker-compose -f docker-compose-monitoring.yml up -d
tox -e venv -- bgd execute --remote-cas http://localhost:50052 command buildgrid sleep 10
You could also docker exec
into the execution controller container and see the zombie monitoring process
docker exec -it buildgrid_controller_1 /bin/bash
ps aux
Apply the MR and do the same thing, and you can observe that the subprocess doesn't die and the metrics continue to be published as expected. Note that the original issue isn't 100% reproducible, so it might take a few execute requests before it shows up.