Expired bot sessions may remain counted in the metrics as they were before expiring
Currently, the only place where we update the `set`s used to publish metrics about bot sessions is `buildgrid/server/bots/service.py`.
When the `bot_session_reaper` closes `expired_sessions` within `buildgrid/server/bots/instance.py`, that is not reflected in the metrics (expired sessions stick around in terms of metrics), since those are only updated by requests from gRPC clients.
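To make the failure mode concrete, here is a minimal, self-contained sketch. `HypotheticalBotsInstance`, `HypotheticalBotsService` and their methods are hypothetical stand-ins, not BuildGrid's actual classes or API, and time is passed explicitly as `now` so the example is deterministic. The service counts bots only when a request arrives, while the instance's reaper drops expired sessions without telling the service.

```python
class HypotheticalBotsInstance:
    """Hypothetical stand-in for instance.py: owns session expiry."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # bot_id -> time of last request

    def update_session(self, bot_id, now):
        self.last_seen[bot_id] = now

    def reap_expired(self, now):
        # Closes expired sessions, but nothing here touches the service's counts.
        expired = [b for b, t in self.last_seen.items() if now - t > self.timeout]
        for bot_id in expired:
            del self.last_seen[bot_id]
        return expired


class HypotheticalBotsService:
    """Hypothetical stand-in for service.py: metrics only change on requests."""

    def __init__(self, instance):
        self.instance = instance
        self.bots = {}  # bot_id -> time of last request, used only for counting

    def handle_request(self, bot_id, now):
        self.bots[bot_id] = now
        self.instance.update_session(bot_id, now)

    def connected_bots_metric(self):
        return len(self.bots)


service = HypotheticalBotsService(HypotheticalBotsInstance(timeout=30))
service.handle_request("bot-a", now=0)
service.handle_request("bot-b", now=0)
service.instance.reap_expired(now=100)  # both sessions are closed by the reaper...
print(service.connected_bots_metric())   # ...but the metric still reports 2
```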
It also looks like we're keeping track of the last time we received a request from each bot in `service.py::bots[bot_id]` for no reason other than counting them (and we already keep similar data in `instance.py` for expiry and lease-tracking reasons).
One approach to fix this would be to delegate tracking the number of bots in each state to `instance.py` (and have `service.py` ask the instance object it owns for those numbers when publishing), so that the reaper can update the state `set`s too.
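A minimal sketch of that proposal, under stated assumptions: the class and method names are hypothetical, the string labels `"OK"` and `"UNHEALTHY"` stand in for whatever bot status values the real code tracks, and time is passed explicitly as `now`. The instance owns both the expiry data and the per-state sets, so the reaper removes a bot from its state set in the same step that closes its session, and the service only reads counts when it publishes.

```python
from collections import defaultdict


class HypotheticalBotsInstance:
    """Hypothetical stand-in for instance.py: owns expiry AND per-state counts."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}                    # bot_id -> time of last request
        self.bot_state = {}                    # bot_id -> current state label
        self.bots_by_state = defaultdict(set)  # state label -> set of bot_ids

    def _set_state(self, bot_id, state):
        # Move bot_id out of its previous state set; state=None removes it entirely.
        old = self.bot_state.pop(bot_id, None)
        if old is not None:
            self.bots_by_state[old].discard(bot_id)
        if state is not None:
            self.bot_state[bot_id] = state
            self.bots_by_state[state].add(bot_id)

    def update_session(self, bot_id, state, now):
        self.last_seen[bot_id] = now
        self._set_state(bot_id, state)

    def reap_expired(self, now):
        expired = [b for b, t in self.last_seen.items() if now - t > self.timeout]
        for bot_id in expired:
            del self.last_seen[bot_id]
            self._set_state(bot_id, None)  # the reaper keeps the state sets in sync
        return expired

    def count_bots(self, state=None):
        # state=None counts every live bot regardless of its state.
        if state is None:
            return len(self.bot_state)
        return len(self.bots_by_state.get(state, ()))


class HypotheticalBotsService:
    """Hypothetical stand-in for service.py: asks the instance for the numbers."""

    def __init__(self, instance):
        self.instance = instance

    def handle_request(self, bot_id, state, now):
        self.instance.update_session(bot_id, state, now)

    def publish_metrics(self, states=("OK", "UNHEALTHY")):
        metrics = {"total": self.instance.count_bots()}
        for state in states:
            metrics[state] = self.instance.count_bots(state)
        return metrics


instance = HypotheticalBotsInstance(timeout=30)
service = HypotheticalBotsService(instance)
service.handle_request("bot-a", "OK", now=0)
service.handle_request("bot-b", "UNHEALTHY", now=20)
instance.reap_expired(now=40)     # bot-a (idle 40s) expires, bot-b (idle 20s) does not
print(service.publish_metrics())  # {'total': 1, 'OK': 0, 'UNHEALTHY': 1}
```

With a single owner for per-bot state, the service in this sketch no longer needs its own `bots[bot_id]` dictionary just to count connected bots.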