Fix waitForRunningContainer

We've found that waitForRunningContainer does not work as expected under certain circumstances. While we haven't observed any of these issues in practice or received reports of them, we should still fix them.

Problems / issues to address:

  • we don't handle the watch channel being closed; this can happen on context cancellation and watch timeouts (see the sketch after this list)
  • rather than returning nil by default once all events have been read, we should return an explicit error (e.g. fmt.Errorf("container %q not running in time", container))
  • in addition to (or as a replacement for) the current tests, we should set up tests against a real cluster, especially covering timeout scenarios; the fake client infrastructure behaves slightly differently (i.e. it does not close the channel on timeout). A test sketch follows at the end of this issue.
  • optional: we might want to handle Error events from the channel explicitly, and return the underlying error instead of "event object is not a pod"
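
To make the intended behavior concrete, here is a minimal sketch of what the fixed wait loop could look like. It assumes a client-go watch.Interface over the pod; the function signature and the isContainerRunning helper are illustrative, not necessarily our actual code:

```go
package podwait // hypothetical package name

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/watch"
)

// waitForRunningContainer blocks until the named container is running,
// the watch channel is closed, or the context is cancelled.
// (Illustrative signature; the real function may differ.)
func waitForRunningContainer(ctx context.Context, w watch.Interface, container string) error {
	defer w.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("container %q not running in time: %w", container, ctx.Err())
		case event, ok := <-w.ResultChan():
			if !ok {
				// The channel was closed, e.g. on context cancellation or a
				// watch timeout; report that instead of silently returning nil.
				return fmt.Errorf("container %q not running in time: watch channel closed", container)
			}
			if event.Type == watch.Error {
				// Surface the underlying error instead of falling through to
				// the type assertion below and reporting "event object is not a pod".
				return fmt.Errorf("watch error while waiting for container %q: %v", container, apierrors.FromObject(event.Object))
			}
			pod, isPod := event.Object.(*corev1.Pod)
			if !isPod {
				return fmt.Errorf("event object is not a pod: %v", event.Object)
			}
			if isContainerRunning(pod, container) {
				return nil
			}
		}
	}
}

// isContainerRunning reports whether the named container of the pod
// is in the Running state.
func isContainerRunning(pod *corev1.Pod, container string) bool {
	for _, s := range pod.Status.ContainerStatuses {
		if s.Name == container && s.State.Running != nil {
			return true
		}
	}
	return false
}
```

With this shape, every way out of the loop returns an error naming the actual cause, and a timeout can no longer masquerade as a type-assertion failure.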

Note:
I don't think these are pressing issues. From what I can tell, the worst that can currently happen is that we surface a somewhat misleading error, "event object is not a pod: nil", when there was actually a timeout waiting for the container. However, to make our users' lives and our own easier, we should surface errors that properly capture the cause.
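
For the real-cluster tests, the timeout scenario could look like the sketch below. newRealClusterClient is a hypothetical helper, and the assertion assumes the "not running in time" error message proposed above:

```go
package podwait

import (
	"context"
	"strings"
	"testing"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// TestWaitForRunningContainerTimeout exercises the timeout path against a
// real cluster, where the API server actually closes the watch channel
// (unlike the fake client infrastructure).
func TestWaitForRunningContainerTimeout(t *testing.T) {
	client := newRealClusterClient(t)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Watch a pod that does not exist, so the container never becomes
	// running and the context deadline ends the watch.
	w, err := client.CoreV1().Pods("default").Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=does-not-exist",
	})
	if err != nil {
		t.Fatalf("starting watch: %v", err)
	}

	err = waitForRunningContainer(ctx, w, "app")
	if err == nil {
		t.Fatal("expected a timeout error, got nil")
	}
	if !strings.Contains(err.Error(), "not running in time") {
		t.Fatalf("expected a \"not running in time\" error, got: %v", err)
	}
}

// newRealClusterClient is a placeholder for however we wire up a client
// against a real cluster (e.g. from KUBECONFIG in an e2e suite).
func newRealClusterClient(t *testing.T) kubernetes.Interface {
	t.Helper()
	t.Skip("requires a real cluster")
	return nil
}
```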