
Make BotsService store state in a database

Neill Whillans requested to merge neill/bots_scalability_rework into master

Before raising this MR, consider whether the following are required, and complete if so:

  • Unit tests
  • Metrics
  • Documentation update(s)

If not required, please explain in brief why not.

Description

The aim of this request is to stop the BotsService from storing its state information in memory, and to store it in a database instead. As such, the BotsInterface must be configured with an SqlProvider. The state information is stored in the following table:

TABLE bots (
    name = Column(String, nullable=False, index=True, primary_key=True)
    bot_id = Column(String, nullable=False, index=True)
    last_update_timestamp = Column(DateTime, index=True, nullable=False)
    bot_status = Column(String, nullable=False)
    lease_id = Column(String, nullable=True)
    instance_name = Column(String, nullable=True)
)

Note: As BuildGrid currently allows only a single lease per bot session, the table stores only a single lease_id. I initially used a CSV string to allow for multiple leases. An SQL relationship to the Leases table cannot be used, as its population is dependent on the DataStore in use. If multiple leases are to be supported in the future, a better solution for handling the lease_ids should be found.
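To make the table layout concrete, here is a minimal sketch of the same columns using Python's standard-library sqlite3 module rather than the SqlProvider. The helper names (open_db, save_bot_session, get_lease_id), the index names and the use of ISO-8601 text for the timestamp are assumptions made for illustration and are not taken from this merge request.

```python
import sqlite3
from datetime import datetime, timezone

# Columns mirror the bots table described above; index names are hypothetical.
SCHEMA = """
CREATE TABLE bots (
    name TEXT NOT NULL PRIMARY KEY,
    bot_id TEXT NOT NULL,
    last_update_timestamp TEXT NOT NULL,
    bot_status TEXT NOT NULL,
    lease_id TEXT,
    instance_name TEXT
);
CREATE INDEX ix_bots_bot_id ON bots (bot_id);
CREATE INDEX ix_bots_last_update_timestamp ON bots (last_update_timestamp);
"""


def open_db():
    """Create an in-memory database holding an empty bots table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_bot_session(conn, name, bot_id, bot_status, lease_id=None, instance_name=None):
    """Insert or overwrite the row for one bot session (a single lease_id per session)."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO bots VALUES (?, ?, ?, ?, ?, ?)",
            (name, bot_id, now, bot_status, lease_id, instance_name),
        )


def get_lease_id(conn, name):
    """Return the lease_id stored for a bot session, or None if there is none."""
    row = conn.execute("SELECT lease_id FROM bots WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```

Because the state lives in a shared table rather than in process memory, any bots-interface instance pointed at the same database could read the row written by another instance.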

Changes proposed in this merge request:

  • Add SqlProvider to BotsInterface and Execution Controller
  • Create a bots table to store the appropriate state information
  • Update BotsInterface logic to store state in the bots table
  • Update the BotsService monitoring counts to use the database instead of per-instance counts
  • Update the BotsService unit tests to query lease state from the database
  • Remove part of a permissive mode test that had a deprecated outcome when using the database
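As an illustration of the monitoring change, counts taken from the database can be computed with a single grouped query instead of summing per-instance counters. The sketch below uses sqlite3 and a reduced version of the bots table; the function name count_bots_by_status and the status strings are assumptions for the example, not the code in this merge request.

```python
import sqlite3


def count_bots_by_status(conn):
    """Return {bot_status: number of bot sessions} across every interface instance."""
    rows = conn.execute(
        "SELECT bot_status, COUNT(*) FROM bots GROUP BY bot_status"
    ).fetchall()
    return dict(rows)


# Reduced, hypothetical version of the bots table, populated with sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bots (name TEXT PRIMARY KEY, bot_id TEXT NOT NULL, bot_status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO bots VALUES (?, ?, ?)",
    [
        ("sessions/1", "bot-a", "OK"),
        ("sessions/2", "bot-b", "OK"),
        ("sessions/3", "bot-c", "UNHEALTHY"),
    ],
)
counts = count_bots_by_status(conn)
```

Every instance running this query against the shared database would report the same totals, which is the point of moving away from per-instance counts.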

Validation

Using docker-compose.yml, run docker-compose up --build --scale bots-interface=3. Check which bots-interface instance is getting requests, for example buildgrid_bots-interface_3.

Run a build: tox -e venv -- bgd execute command ./buildgrid sleep 180

While the build is running, stop the bots-interface instance handling the session: docker stop buildgrid_bots-interface_3

The job should be transferred to a different interface and completed on time, without redoing already completed work.

A further test was made using a small program to mock CreateBotSession and UpdateBotSession calls to the bots-interfaces, to ensure that when a bot moves to a different interface, any existing sessions are re-queued then closed, while the bot session on the new interface remains.
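The behaviour checked by that mock program can be sketched roughly as follows. This is not the test program or the BotsInterface code: the function create_bot_session, the requeue_lease callback and the reduced table are hypothetical, and "closing" a session is modelled simply as deleting its row.

```python
import sqlite3
from datetime import datetime, timezone


def create_bot_session(conn, name, bot_id, requeue_lease):
    """Register a new session for bot_id, re-queueing and closing any older ones.

    requeue_lease is a hypothetical callback invoked with each stale lease_id.
    Returns the names of the sessions that were closed.
    """
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: re-queue, close old sessions, add the new one
        stale = conn.execute(
            "SELECT name, lease_id FROM bots WHERE bot_id = ? AND name != ?",
            (bot_id, name),
        ).fetchall()
        for old_name, lease_id in stale:
            if lease_id is not None:
                requeue_lease(lease_id)  # re-queue first ...
            conn.execute("DELETE FROM bots WHERE name = ?", (old_name,))  # ... then close
        conn.execute(
            "INSERT INTO bots (name, bot_id, last_update_timestamp, bot_status)"
            " VALUES (?, ?, ?, ?)",
            (name, bot_id, now, "OK"),
        )
    return [old_name for old_name, _ in stale]


# Reduced, hypothetical version of the bots table with one existing session.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bots (name TEXT PRIMARY KEY, bot_id TEXT NOT NULL,"
    " last_update_timestamp TEXT NOT NULL, bot_status TEXT NOT NULL, lease_id TEXT)"
)
conn.execute(
    "INSERT INTO bots VALUES ('sessions/old', 'bot-a', '2020-01-01T00:00:00', 'OK', 'lease-1')"
)

requeued = []
closed = create_bot_session(conn, "sessions/new", "bot-a", requeued.append)
```

After the call, only the new session remains in the table, and the lease held by the old session has been handed to the callback for re-queueing.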

Issues addressed

Closes / Resolves / Addresses (delete as appropriate) issue <e.g. repo-name#x>

Edited by Neill Whillans
