Exception thrown when retrying a task

Summary

Retrying a failed task can produce a stack trace, after which BuildStream becomes blocked.

I had more success reproducing the bug when pushing to an artifact server and waiting a few minutes before pressing 'c' to continue.

Steps to reproduce

  • Build a failing artifact
  • At the question prompt, wait a bit
  • Hit continue

What is the current bug behavior?

A stack trace is printed.

What is the expected correct behavior?

There should be no stack trace, and the underlying problem should be handled.
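
To illustrate the failure mode, here is a minimal sketch of a tolerant removal. It assumes `failed_tasks` is a plain Python list of element names, which is what the `ValueError: list.remove(x): x not in list` in the log suggests. The helper name `discard_failed_task` is hypothetical and not part of BuildStream, and this only avoids the crash; it does not explain why the name is missing from the list in the first place.

```python
def discard_failed_task(failed_tasks, full_name):
    """Remove full_name from failed_tasks if it is present.

    Hypothetical helper, not BuildStream code. Returns True when the
    name was removed and False when it was not in the list, so the
    caller can log the inconsistency instead of crashing.
    """
    try:
        failed_tasks.remove(full_name)
    except ValueError:
        # list.remove() raises ValueError when the item is absent.
        return False
    return True


# Example: the second removal of the same name no longer raises.
tasks = ["base-sdk/filtered.bst", "binutils-stage1.bst"]
first = discard_failed_task(tasks, "base-sdk/filtered.bst")
second = discard_failed_task(tasks, "base-sdk/filtered.bst")
```

A guard like this would keep the SIGCHLD handler from raising, but the missing entry is probably a symptom of a deeper state mismatch that still needs investigating.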

Relevant logs and/or screenshots

Push failure on element: stage0/freedesktop-junction.bst:bootstrap/build/base-sdk/filtered.bst

Choose one of the following options:
  (c)ontinue  - Continue queueing jobs as much as possible
  (q)uit      - Exit after all ongoing jobs complete
  (t)erminate - Terminate any ongoing jobs and exit
  (r)etry     - Retry this job
  (l)og       - View the full log file

Pressing ^C will terminate jobs and exit

Choice: [continue]: r

Retrying failed job

Unknown exception in SIGCHLD handler
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/unix_events.py", line 876, in _sig_chld
    self._do_waitpid_all()
  File "/usr/lib/python3.7/asyncio/unix_events.py", line 942, in _do_waitpid_all
    self._do_waitpid(pid)
  File "/usr/lib/python3.7/asyncio/unix_events.py", line 976, in _do_waitpid
    callback(pid, returncode, *args)
  File "/usr/local/lib/python3.7/dist-packages/buildstream/_scheduler/jobs/job.py", line 516, in _parent_child_completed
    self._scheduler.job_completed(self, status)
  File "/usr/local/lib/python3.7/dist-packages/buildstream/_scheduler/scheduler.py", line 251, in job_completed
    self._state.fail_task(job.action_name, job.name, element=element_info)
  File "/usr/local/lib/python3.7/dist-packages/buildstream/_state.py", line 331, in fail_task
    cb(action_name, full_name, element)
  File "/usr/local/lib/python3.7/dist-packages/buildstream/_frontend/app.py", line 581, in _job_failed
    self._handle_failure(element, action_name, failure, full_name)
  File "/usr/local/lib/python3.7/dist-packages/buildstream/_frontend/app.py", line 667, in _handle_failure
    self.stream._failure_retry(action_name, unique_id)
  File "/usr/local/lib/python3.7/dist-packages/buildstream/_stream.py", line 1338, in _failure_retry
    queue._task_group.failed_tasks.remove(element._get_full_name())
ValueError: list.remove(x): x not in list
[00:12:33][11e5e5ba][   fetch:stage0/freedesktop-junction.bst:bootstrap/build/binutils-stage1.bst] BUG     Fetch