Commit 08830296 authored by Eric Blake's avatar Eric Blake
Browse files

states: Use RESYNC to handle more structured reply server bugs



In software, it's always better to be strict in what you produce and
lenient in what you accept.  Continuing on from the previous patch,
there are quite a few situations where we are expecting a particular
structured reply, but can still gracefully recover and keep the
connection alive if the server sends us something unexpected (either a
wrong reply type, or a wrong length to a correct reply type).  There
are only a few situations left where we really do want a structured
reply to move to DEAD: read() errors on the socket itself, if the
server's payload is so large as to be a denial of service, or if
malloc() fails.

While touching the code, note that once we check that cmd->type is
NBD_CMD_BLOCK_STATUS, we do not also need to check whether
cmd->cb.fn.extent is non-null.

The handling for NBD_REPLY_TYPE_ERROR_* is interesting: since the
error value comes first, even if we don't get a full error packet over
the wire, we can prefer the server's errno over EPROTO.  But that
means error messages should not use RESYNC except when we don't get a
full error value.  This patch also changes the code to use EPROTO
instead of EINVAL if the server mistakenly sends NBD_SUCCESS as its
error code.

Again, compliant servers will never trip over to the new state; but an
easy way to demonstrate this in action is with a temporary patch to
nbdkit:

| diff --git i/common/protocol/nbd-protocol.h w/common/protocol/nbd-protocol.h
| index e5d6404b..1e05d825 100644
| --- i/common/protocol/nbd-protocol.h
| +++ w/common/protocol/nbd-protocol.h
| @@ -242,7 +242,7 @@ struct nbd_structured_reply_error {
|
|  /* Structured reply types. */
|  #define NBD_REPLY_TYPE_NONE         0
| -#define NBD_REPLY_TYPE_OFFSET_DATA  1
| +#define NBD_REPLY_TYPE_OFFSET_DATA  11
|  #define NBD_REPLY_TYPE_OFFSET_HOLE  2
|  #define NBD_REPLY_TYPE_BLOCK_STATUS 5
|  #define NBD_REPLY_TYPE_ERROR        NBD_REPLY_TYPE_ERR (1)

With the following command line:

$ ./run /path/to/nbdkit -U - memory 1M --run 'nbdsh -u "$uri" -c "\
try:
  h.pread(1, 0)
except nbd.Error as ex:
  print(ex.string)
h.flush()
print(h.get_size())
"'

pre-patch results show the client hanging up abruptly on the server:

nbd_pread: unknown structured reply type (11)
nbdsh: command line script failed: nbd_flush: invalid state: DEAD: the handle must be connected with the server: Invalid argument
nbdkit: memory.1: error: read request: Connection reset by peer

while post-patch detects the server error, but allows the next command:

nbd_pread: read: command failed: Protocol error
1048576

Reading the LIBNBD_DEBUG=1 log further shows:

libnbd: debug: nbdsh: nbd_pread: unexpected reply type 11 or payload length 9 for cookie 1 and command 0, this is probably a server bug

Message-Id: <20220812020638.856332-5-eblake@redhat.com>
Acked-by: default avatarRichard W.M. Jones <rjones@redhat.com>
parent 9acf53b1
Pipeline #611324275 failed with stage
in 10 minutes and 19 seconds
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment