More robust request routing
✅ Problem 1
Was addressed in !609 (merged).
Currently, when kas looks for another kas to route an incoming request to, it checks Redis and gets a list of kas instances with suitable tunnels. It then tries to connect to them in order. Once it connects to one, it uses any suitable tunnel from that instance. However, if that instance has no suitable tunnels (e.g. they have all been taken by concurrent requests or have disconnected), the kas->kas connection stays open, waiting for a tunnel. If a suitable tunnel appears, it is picked up immediately and used, all good. However, a tunnel may never appear on that kas instance, for example if the agent connects to a different kas (which one depends on the load balancer, or on the agent itself when there is no load balancer). In that case the request waits until it times out.
It works this way because it was much faster to build: it's "iteration 1" and is good enough for that. However, this edge case must be addressed to avoid these spuriously stuck requests.
Proposal
When kas 1 routes to kas 2 and kas 2 doesn't have a suitable tunnel, kas 2 needs to reply immediately to let kas 1 know. The kas 1 to kas 2 request still stays open in case a suitable tunnel becomes available, but kas 1 also starts looking for a suitable tunnel on a different kas instance. This can be repeated as many times as necessary, up to the number of available kas instances.
✅ Problem 2
Was addressed in !559 (merged).
Currently the kas -> kas gRPC pool does not actually pool connections (first-iteration code, quite dumb, but it took less than an hour to build). A connection is established and torn down for each request, which is wasteful.
A thing to remember here is that kas instances can come and go dynamically so gRPC connections cannot be cached per IP forever. There needs to be a TTL or some other mechanism to evict connections that are no longer valid.