-
Eric Blake authored
nbdcopy has a nasty bug when performing multi-threaded copies using asynchronous nbd calls - it was blindly treating the completion of an asynchronous command as successful, rather than checking the *error parameter. This can result in the silent creation of a corrupted image in two different ways: when a read fails, we blindly wrote garbage to the destination; when a write fails, we did not flag that the destination was not written. Since nbdcopy already calls exit() on a synchronous read or write failure to a file, doing the same for an asynchronous op to an NBD server is the simplest solution. A nicer solution, but more invasive to code and thus not done here, might be to allow up to N retries of the transaction (in case the read or write failure was transient), or even having a mode where as much data is copied as possible (portions of the copy that failed would be logged on stderr, and nbdcopy would still fail with a non-zero exit status, but this would copy more than just stopping at the first error, as can be done with rsync or ddrescue). Note that since we rely on auto-retiring and do NOT call nbd_aio_command_completed, our completion callbacks must always return 1 (if they do not exit() first), even when acting on *error, so as not leave the command allocated until nbd_close. As such, there is no sane way to return an error to a manual caller of the callback, and therefore we can drop dead code that calls perror() and exit() if the callback "failed". It is also worth documenting the contract on when we must manually call the callback during the asynch_zero callback, so that we do not leak or double-free the command; thankfully, all the existing code paths were correct. The added testsuite script demonstrates several scenarios, some of which fail without the rest of this patch in place, and others which showcase ways in which sparse images can bypass errors. Once backports are complete, a followup patch on the main branch will edit docs/libnbd-security.pod with the mailing list announcement of the stable branch commit ids and release versions that incorporate this fix. Reported-by: Nir Soffer <nsoffer@redhat.com> Fixes: bc896eec ("copy: Implement multi-conn, multiple threads, multiple requests in flight.", v1.5.6) Fixes: https://bugzilla.redhat.com/2046194 Message-Id: <20220203202558.203013-6-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Acked-by: Nir Soffer <nsoffer@redhat.com> [eblake: fix error message per Nir, tweak requires lines in unit test per Rich] Reviewed-by: Laszlo Ersek <lersek@redhat.com>
8d444b41