Make BotsService store state in a database
Before raising this MR, consider whether the following are required, and complete if so:
- Unit tests
- Metrics
- Documentation update(s)
If not required, please explain in brief why not.
Description
The aim of this request is to stop the BotsService storing state information in memory and to store it in a database instead. As such, the BotsInterface must be configured with an SqlProvider. The state information is stored in the following table:
TABLE bots (
name = Column(String, nullable=False, index=True, primary_key=True)
bot_id = Column(String, nullable=False, index=True)
last_update_timestamp = Column(DateTime, index=True, nullable=False)
bot_status = Column(String, nullable=False)
lease_id = Column(String, nullable=True)
instance_name = Column(String, nullable=True)
)
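To make the shape of the table concrete, here is a minimal standalone sketch of an equivalent schema using Python's built-in sqlite3 module. The MR itself defines the columns with SQLAlchemy; the DDL, the index names, and the helpers `open_store` and `save_session` below are assumptions made for illustration, not code from this change.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical DDL mirroring the column list above. SQLite is used only so
# the sketch runs standalone; index names are illustrative.
SCHEMA = """
CREATE TABLE bots (
    name TEXT NOT NULL PRIMARY KEY,
    bot_id TEXT NOT NULL,
    last_update_timestamp TEXT NOT NULL,
    bot_status TEXT NOT NULL,
    lease_id TEXT,
    instance_name TEXT
);
CREATE INDEX ix_bots_bot_id ON bots (bot_id);
CREATE INDEX ix_bots_last_update_timestamp ON bots (last_update_timestamp);
"""


def open_store():
    """Create an in-memory database containing the bots table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_session(conn, name, bot_id, bot_status, lease_id=None, instance_name=None):
    """Insert or overwrite the row for one bot session, keyed by session name."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO bots VALUES (?, ?, ?, ?, ?, ?)",
        (name, bot_id, now, bot_status, lease_id, instance_name),
    )
    conn.commit()


conn = open_store()
# A session picks up a lease, then reports it finished (lease_id back to NULL).
save_session(conn, "session-1", "bot-a", "OK", lease_id="lease-42")
save_session(conn, "session-1", "bot-a", "OK", lease_id=None)
row = conn.execute(
    "SELECT bot_id, bot_status, lease_id FROM bots WHERE name = ?", ("session-1",)
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM bots").fetchone()[0]
```

Because `name` is the primary key, each session occupies exactly one row, which is why a single nullable `lease_id` column is enough while only one lease per session is allowed.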
Note: As BuildGrid currently allows only a single lease per bot session, the table stores a single lease_id. I initially used a CSV string to allow for multiple leases. An SQL relationship to the Leases table is not possible, as its population depends on the DataStore in use. If multiple leases are to be supported in the future, a better solution for handling the lease_ids should be found.
Changes proposed in this merge request:
- Add SqlProvider to Bots Interface and Execution Controller
- Create a bots table to store the appropriate state information
- Update BotsInterface logic to store state in the bots table
- Update the BotsService monitoring counts to use the database instead of per-instance counts
- Update the BotsService unit tests to query leases state from database
- Remove part of a permissive mode test that had a deprecated outcome when using the database
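As a rough illustration of deriving the monitoring counts from the database rather than from per-instance counters, the sketch below aggregates the bots table by status. The helper name `count_bots_by_status`, the reduced table, and the status strings are assumptions for the example, not the code in this MR.

```python
import sqlite3


def count_bots_by_status(conn):
    """Return {bot_status: number of sessions}, computed from the shared table."""
    rows = conn.execute(
        "SELECT bot_status, COUNT(*) FROM bots GROUP BY bot_status"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the bots table: only the columns needed here.
conn.execute(
    "CREATE TABLE bots (name TEXT PRIMARY KEY, bot_id TEXT NOT NULL, bot_status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO bots VALUES (?, ?, ?)",
    [
        ("session-1", "bot-a", "OK"),
        ("session-2", "bot-b", "OK"),
        ("session-3", "bot-c", "UNHEALTHY"),
    ],
)
counts = count_bots_by_status(conn)
```

Since every bots-interface instance reads the same table, each one would report the same totals regardless of which instance a given bot is connected to.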
Validation
Using docker-compose.yml, run docker-compose up --build --scale bots-interface=3. Check which bots-interface instance is getting requests, for example buildgrid_bots-interface_3.
Run a build: tox -e venv -- bgd execute command ./buildgrid sleep 180
While the build is running, stop the bots-interface handling the session: docker stop buildgrid_bots-interface_3
The job should be transferred to a different interface and completed on time, without redoing already completed work.
A further test was made using a small program to mock CreateBotSession and UpdateBotSession calls to the bots-interfaces, to ensure that when a bot moves to a different interface, any existing sessions are re-queued then closed, while the bot session on the new interface remains.
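The behaviour checked by that further test can be sketched against a shared table as follows. This is not the mock program used for validation: the function `create_bot_session`, the session names, and the reduced table are hypothetical, and real re-queuing would go through the scheduler rather than returning a list.

```python
import sqlite3


def create_bot_session(conn, name, bot_id):
    """Register a new session for bot_id and close any older ones.

    Returns the lease IDs held by the closed sessions so the caller can
    re-queue them (hypothetical behaviour, for illustration only).
    """
    stale = conn.execute(
        "SELECT lease_id FROM bots WHERE bot_id = ?", (bot_id,)
    ).fetchall()
    to_requeue = [lease_id for (lease_id,) in stale if lease_id is not None]
    # Close the old sessions, then record the new one.
    conn.execute("DELETE FROM bots WHERE bot_id = ?", (bot_id,))
    conn.execute(
        "INSERT INTO bots (name, bot_id, bot_status) VALUES (?, ?, 'OK')",
        (name, bot_id),
    )
    conn.commit()
    return to_requeue


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bots (name TEXT PRIMARY KEY, bot_id TEXT NOT NULL, "
    "bot_status TEXT NOT NULL, lease_id TEXT)"
)

# The bot first connects through one interface and is handed a lease.
first = create_bot_session(conn, "interface-3/session-a", "bot-a")
conn.execute(
    "UPDATE bots SET lease_id = ? WHERE name = ?",
    ("lease-42", "interface-3/session-a"),
)

# The same bot then opens a session through a different interface.
requeued = create_bot_session(conn, "interface-1/session-b", "bot-a")
remaining = [name for (name,) in conn.execute("SELECT name FROM bots")]
```

In this sketch the lease held by the old session is handed back for re-queuing, the old session row is removed, and only the session on the new interface remains.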
Issues addressed
Closes / Resolves / Addresses (delete as appropriate) issue <e.g. repo-name#x>