Skip to content

Rework BuildGrid's server startup to fork for monitoring before instantiating any gRPC objects

Adam Coldrick requested to merge sotk/bugs/fork-before-grpc into master

Before raising this MR, consider whether the following are required, and complete if so:

  • Unit tests
  • Metrics
  • Documentation update(s)

If not required, please explain in brief why not.

Description

Currently, the metrics subprocess gets created after both the gRPC server and the gRPC channels for interacting with remote services.

This isn't safe or supported, and can result in the subprocess crashing unexpectedly. Any forking must take place before any gRPC objects have been instantiated. Moving the fork to before we create the gRPC server is trivial, but the gRPC channels are created in the configuration parsing step.

This MR reworks the configuration parsing to not setup any gRPC objects, and then reworks the server's start method to handle triggering that setup, as well as handling the fork for monitoring and instantiating the gRPC server too. This makes the startup a bit more well-defined, with gRPC being started up as late as possible.

This MR doesn't make the same change in the WIP RabbitMQ-based BuildGrid, that will be covered in a follow-up MR.

Validation

Start BuildGrid with monitoring enabled, send an execute request, and notice that the metrics likely stop being published from the execution service.

docker-compose -f docker-compose-monitoring.yml up -d
tox -e venv -- bgd execute --remote-cas http://localhost:50052 command buildgrid sleep 10

You could also docker exec into the execution controller container and see the zombie monitoring process

docker exec -it buildgrid_controller_1 /bin/bash
ps aux

Apply the MR and do the same thing, and you can observe that the subprocess doesn't die and the metrics continue to be published as expected. Note that the original issue isn't 100% reproducible, so it might take a few execute requests before it shows up.

Edited by Adam Coldrick

Merge request reports