Investigate Calico connectivity issues during deployments

Summary

During a recent deployment, the EOC received the following alert:

Firing 1 - Containers for the git service, main are unable to start. More than 50% of the deployment's maxSurge setting consists of containers unable to start for reasons other than ContainerCreating.
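To make the alert wording concrete, here is a minimal Python sketch of the condition it describes. This is not the actual Prometheus rule, which is not shown in this issue. The function name, the `waiting_reasons` input, and the strict "greater than" comparison are assumptions made for illustration.

```python
from typing import Iterable


def start_failure_alert_fires(
    waiting_reasons: Iterable[str],
    max_surge: int,
    threshold: float = 0.5,
) -> bool:
    """Return True if the alert condition described above would hold.

    waiting_reasons: one waiting reason per container that has not started
        (hypothetical input; the real alert derives this from metrics).
    max_surge: the deployment's maxSurge, already resolved to a pod count.
    threshold: fraction of maxSurge that must be exceeded (50% per the alert text).
    """
    if max_surge <= 0:
        return False
    # Containers still in ContainerCreating are excluded, as in the alert text.
    failing = sum(1 for reason in waiting_reasons if reason != "ContainerCreating")
    return failing > threshold * max_surge
```

For example, with a maxSurge of 3, two containers waiting for a reason other than ContainerCreating exceed the 50% threshold, while three containers all in ContainerCreating do not.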

Prometheus showed the following (query):

[Image: prometheus]

A quick investigation showed the following errors in the affected pods (more context in production#6268 (comment 829420838)):

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0bee02cbc1e95d8ed1095632c0948dd8ee0eaae233793754fde926965a218a0c": failed to find plugin "calico" in path [/home/kubernetes/bin]
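The error says the container runtime looked for a CNI plugin binary named "calico" in the listed directory and did not find it. As a starting point for the investigation, the sketch below extracts the plugin name and search path from a message of this shape, then checks whether an executable with that name exists in any of the directories. The helper names are hypothetical, the regular expression is based only on the message quoted above, and the check is assumed to run on the affected node itself.

```python
import os
import re

# Pattern derived from the error message quoted above (an assumption,
# not an official format specification).
PLUGIN_ERROR = re.compile(
    r'failed to find plugin "(?P<plugin>[^"]+)" in path \[(?P<paths>[^\]]*)\]'
)


def parse_missing_plugin(message):
    """Return (plugin_name, [search_dirs]) or None if the message does not match."""
    match = PLUGIN_ERROR.search(message)
    if match is None:
        return None
    # Multiple directories, if present, are assumed to be space-separated.
    return match.group("plugin"), match.group("paths").split()


def plugin_present(plugin, search_dirs):
    """Return True if an executable file named `plugin` exists in any directory."""
    for directory in search_dirs:
        candidate = os.path.join(directory, plugin)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False
```

Running `parse_missing_plugin` on the message above yields the plugin name `calico` and the single directory `/home/kubernetes/bin`. If `plugin_present` then returns False on a node, the binary was missing or not executable at the time of the check. That alone does not explain why it was missing, for example whether the node was new and the plugin had not been installed yet.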

This issue aims to understand this error and decide whether any action should be taken.

Related Incident(s)

Originating issue(s): production#6268 (closed)

Desired Outcome/Acceptance criteria

  • The error observed is understood.
  • Any action that could prevent this from recurring has been evaluated.

Associated Services

Corrective Action Issue Checklist

  • link the incident(s) this corrective action arose out of
  • give context for what problem this corrective action is trying to prevent from re-occurring
  • assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
  • assign a priority (this will default to 'priority::4')