bgd bot: StatusCode.RESOURCE_EXHAUSTED
Context
I am trying to build the Freedesktop SDK with BuildStream + BuildGrid + BuildBox.
In the configuration I am testing, I just have one BuildBox bot.
It builds a few jobs, and from time to time it gets stuck in a loop after completing a job.
Here is the log for one occurrence. I cut each line at 300 characters; in the real log, two lines are much longer: stderr and stdout.
2019-01-08 13:51:57,129:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:51:57,630:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:51:57,748:[ buildgrid._app.bots.buildbox][DEBUG]: BuildBox stderr: [b'+ sh -c -e \'mkdir "bst_build_dir"\ncd "bst_build_dir"\necho slibdir=/usr/lib/x86_64-linux-gnu >configparms\necho gconvdir=/usr/lib/x86_64-linux-gnu/gconv >>configparms\necho rootsbindir=/usr/bin >>configparms
2019-01-08 13:51:57,804:[ buildgrid._app.bots.buildbox][DEBUG]: BuildBox stdout: [b'checking build system type... x86_64-pc-linux-gnu\nchecking host system type... x86_64-unknown-linux-gnu\nchecking for x86_64-unknown-linux-gnu-gcc... x86_64-unknown-linux-gnu-gcc\nchecking for suffix of objec
2019-01-08 13:51:57,834:[ buildgrid._app.bots.buildbox][DEBUG]: BuildBox exit code: [0]
2019-01-08 13:51:57,834:[ buildgrid._app.bots.buildbox][DEBUG]: Output root digest: [hash: "32e5ea081ec1a1da9eae0ef84c1493d696bc6f37ed8abcba18cccfcd01c6bcb7"
size_bytes: 891
]
2019-01-08 13:51:58,132:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:51:58,633:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:51:59,134:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:51:59,758:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:51:59,905:[ buildgrid.bot.tenant][DEBUG]: Work completed: lease_id=[e7914629-86aa-4b51-bec5-5f2a1eaac099]
2019-01-08 13:52:00,260:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:52:00,380:[ buildgrid.bot.interface][ERROR]: StatusCode.RESOURCE_EXHAUSTED
2019-01-08 13:52:00,881:[ buildgrid.bot.session][DEBUG]: Creating bot session
2019-01-08 13:52:00,990:[ buildgrid.bot.interface][ERROR]: StatusCode.RESOURCE_EXHAUSTED
2019-01-08 13:52:01,491:[ buildgrid.bot.session][DEBUG]: Creating bot session
2019-01-08 13:52:01,602:[ buildgrid.bot.interface][ERROR]: StatusCode.RESOURCE_EXHAUSTED
2019-01-08 13:52:02,102:[ buildgrid.bot.session][DEBUG]: Creating bot session
The last two lines repeat every second or so.
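To pin down where the loop starts in a longer log, a small script along these lines may help. It is only a sketch: it assumes the log format shown above (timestamp, logger, level, message), and the names `summarize`, `LINE_RE`, `LEASE_RE` and `SAMPLE` are made up for this example and are not part of BuildGrid.

```python
import re

# Matches log lines shaped like the ones above:
# "<timestamp>:[ <logger>][<LEVEL>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):"
    r"\[\s*(?P<logger>[^\]]+)\]\[(?P<level>\w+)\]: (?P<msg>.*)$"
)
LEASE_RE = re.compile(r"Work completed: lease_id=\[(?P<lease>[0-9a-f-]+)\]")


def summarize(lines):
    """Return (lease completed before the first error, first error timestamp, error count)."""
    last_lease = None
    lease_at_first_error = None
    first_error_ts = None
    errors = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation lines such as "size_bytes: 891" are skipped
        lease = LEASE_RE.search(m["msg"])
        if lease:
            last_lease = lease["lease"]
        if m["level"] == "ERROR" and "RESOURCE_EXHAUSTED" in m["msg"]:
            errors += 1
            if first_error_ts is None:
                first_error_ts = m["ts"]
                lease_at_first_error = last_lease
    return lease_at_first_error, first_error_ts, errors


# A few lines copied from the log above.
SAMPLE = """\
2019-01-08 13:51:59,905:[ buildgrid.bot.tenant][DEBUG]: Work completed: lease_id=[e7914629-86aa-4b51-bec5-5f2a1eaac099]
2019-01-08 13:52:00,260:[ buildgrid.bot.session][DEBUG]: Updating bot session: [.valmont]
2019-01-08 13:52:00,380:[ buildgrid.bot.interface][ERROR]: StatusCode.RESOURCE_EXHAUSTED
2019-01-08 13:52:00,881:[ buildgrid.bot.session][DEBUG]: Creating bot session
2019-01-08 13:52:00,990:[ buildgrid.bot.interface][ERROR]: StatusCode.RESOURCE_EXHAUSTED
2019-01-08 13:52:01,491:[ buildgrid.bot.session][DEBUG]: Creating bot session
2019-01-08 13:52:01,602:[ buildgrid.bot.interface][ERROR]: StatusCode.RESOURCE_EXHAUSTED
"""

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

On the excerpt above this reports the lease that had just completed, the timestamp of the first RESOURCE_EXHAUSTED error, and how many times the error appears.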
If I start a new bgd bot or restart the existing one, the next jobs run.
bgd operation list reports that the job (in this case e7914629-86aa-4b51-bec5-5f2a1eaac099) is still being executed:
Queued: /a3310456-609f-4dae-b85b-98d7b31e11fe: Waiting for execution (stage=2)
Executing: /e7914629-86aa-4b51-bec5-5f2a1eaac099: Currently running (stage=3)
This issue happens very frequently, for most jobs.
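One unconfirmed guess that may be worth checking: gRPC rejects an incoming message larger than its receive limit with RESOURCE_EXHAUSTED, and that limit defaults to 4 MiB unless it is raised through the `grpc.max_receive_message_length` option. If the very long stdout and stderr are sent inline in the bot session update (an assumption; I have not verified how BuildGrid transmits them), the update could exceed that limit. The sketch below only does the size arithmetic; `exceeds_default_grpc_limit` and its `overhead` parameter are hypothetical names for this example.

```python
# gRPC's default maximum inbound message size (4 MiB) when no channel or
# server option overrides it.
DEFAULT_GRPC_MAX_RECEIVE_BYTES = 4 * 1024 * 1024


def exceeds_default_grpc_limit(stdout: bytes, stderr: bytes, overhead: int = 0) -> bool:
    """Return True if the raw outputs plus an assumed overhead exceed 4 MiB.

    `overhead` stands in for the rest of the message (digests, metadata);
    its real size is not known from the log, so it defaults to 0.
    """
    return len(stdout) + len(stderr) + overhead > DEFAULT_GRPC_MAX_RECEIVE_BYTES


if __name__ == "__main__":
    small = b"checking build system type... x86_64-pc-linux-gnu\n"
    large = b"x" * (5 * 1024 * 1024)  # 5 MiB of filler, for illustration only
    print(exceeds_default_grpc_limit(small, b""))
    print(exceeds_default_grpc_limit(large, b""))
```

If the full stdout and stderr for the stuck job come close to or above this size, that would be consistent with the error; if they are far below it, the cause is likely elsewhere.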