1. 04 May, 2019 5 commits
    • afs: Fix StoreData op marshalling · fb853a4a
      David Howells authored
      [ Upstream commit 8c7ae38d ]
      
      The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls
      generated by ->setattr() ops for the purpose of expanding a file is
      incorrect due to older documentation incorrectly describing the way the RPC
      'FileLength' parameter is meant to work.
      
      The older documentation says that this is the length the file is meant to
      end up at the end of the operation; however, it was never implemented this
      way in any of the servers, but rather the file is truncated down to this
      before the write operation is effected, and never expanded to it (and,
      indeed, it was renamed to 'TruncPos' in 2014).
      
      Fix this by setting the position parameter to the new file length and doing
      a zero-length write there.
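
The difference between the two marshallings can be sketched with a toy user-space model. Everything here is an assumption made for illustration: `toy_store_data()` is a hypothetical stand-in for the server side of a StoreData call, and the model assumes that a store at `pos`, even of zero bytes, leaves the file at least `pos` bytes long. It is not the AFS client or server code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the server's view of one file; only the length matters. */
struct toy_file {
	uint64_t length;
};

/*
 * Hypothetical server-side behaviour, following the description above:
 * the file is first truncated DOWN to trunc_pos (never expanded to it),
 * then `count` bytes are stored at `pos`.
 */
void toy_store_data(struct toy_file *f, uint64_t pos, uint64_t count,
		    uint64_t trunc_pos)
{
	assert(f);
	if (f->length > trunc_pos)
		f->length = trunc_pos;
	if (f->length < pos + count)
		f->length = pos + count;
}

/* Old marshalling (assumed shape): relies on trunc_pos to grow the file. */
void toy_expand_old(struct toy_file *f, uint64_t new_size)
{
	toy_store_data(f, 0, 0, new_size);
}

/* Fixed marshalling: zero-length write positioned at the new length. */
void toy_expand_fixed(struct toy_file *f, uint64_t new_size)
{
	toy_store_data(f, new_size, 0, new_size);
}
```

In this model, `toy_expand_old()` on an empty file leaves the length at zero, while `toy_expand_fixed()` leaves it at the requested size.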
      
      The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file
      it then mmaps.  This can be tested by giving the following test program a
      filename in an AFS directory:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <fcntl.h>
      	#include <sys/mman.h>
      	int main(int argc, char *argv[])
      	{
      		char *p;
      		int fd;
      		if (argc != 2) {
      			fprintf(stderr,
      				"Format: test-trunc-mmap <file>\n");
      			exit(2);
      		}
      		fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
      		if (fd < 0) {
      			perror(argv[1]);
      			exit(1);
      		}
      		if (ftruncate(fd, 0x140008) == -1) {
      			perror("ftruncate");
      			exit(1);
      		}
      		p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
      			 MAP_SHARED, fd, 0);
      		if (p == MAP_FAILED) {
      			perror("mmap");
      			exit(1);
      		}
      		p[0] = 'a';
      		if (munmap(p, 4096) < 0) {
      			perror("munmap");
      			exit(1);
      		}
      		if (close(fd) < 0) {
      			perror("close");
      			exit(1);
      		}
      		exit(0);
      	}
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Reported-by: Jonathan Billings <jsbillin@umich.edu>
      Tested-by: Jonathan Billings <jsbillin@umich.edu>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
    • ceph: fix use-after-free on symlink traversal · b0921da0
      Al Viro authored
      [ Upstream commit daf5cc27 ]
      
      free the symlink body after the same RCU delay we have for freeing the
      struct inode itself, so that traversal during RCU pathwalk wouldn't step
      into freed memory.

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
    • NFS: Fix a typo in nfs_init_timeout_values() · 8dcf6dce
      Trond Myklebust authored
      [ Upstream commit 5a698243 ]
      
      Specifying the retrans=0 mount parameter on an NFS/TCP mount
      inadvertently causes the NFS client to rewrite any specified
      timeout parameter to the default of 60 seconds.
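
A toy model of how such a typo can produce this symptom is sketched below. The names, the "unspecified" markers and the exact condition are assumptions made for illustration (values are in tenths of a second, as for the timeo= option); this is not the kernel function itself.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_UNSPEC_TIMEO    (-1)  /* hypothetical "option not given" marker */
#define TOY_UNSPEC_RETRANS  (-1)
#define TOY_DEF_TCP_TIMEO   600   /* the 60 second default mentioned above */
#define TOY_DEF_TCP_RETRANS 2     /* assumed default retry count */

struct toy_timeout {
	int initval;  /* initial timeout, in tenths of a second */
	int retries;
};

/* fixed == false models the assumed typo, fixed == true the corrected test. */
struct toy_timeout toy_init_timeout(int timeo, int retrans, bool fixed)
{
	struct toy_timeout to = { .initval = timeo, .retries = retrans };

	assert(timeo == TOY_UNSPEC_TIMEO || timeo >= 0);
	if (retrans == TOY_UNSPEC_RETRANS)
		to.retries = TOY_DEF_TCP_RETRANS;

	/* Assumed typo: the zero check looks at the retry count, not the timeout. */
	int zero_check = fixed ? to.initval : to.retries;

	if (timeo == TOY_UNSPEC_TIMEO || zero_check == 0)
		to.initval = TOY_DEF_TCP_TIMEO;
	return to;
}
```

With `timeo=100` and `retrans=0`, the buggy variant yields an initial timeout of 600 (60 seconds), while the fixed variant keeps the requested 100.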
      
      Fixes: a956beda ("NFS: Allow the mount option retrans=0")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
    • Btrfs: fix file corruption after snapshotting due to mix of buffered/DIO writes · 099a2655
      Filipe Manana authored
      [ Upstream commit 609e804d ]
      
      When we are mixing buffered writes with direct IO writes against the same
      file and snapshotting is happening concurrently, we can end up with
      corrupt file content in the snapshot. Example:
      
      1) Inode/file is empty.

      2) Snapshotting starts.

      3) Buffered write at offset 0 length 256Kb. This updates the i_size of the
         inode to 256Kb, disk_i_size remains zero. This happens after the task
         doing the snapshot flushes all existing delalloc.

      4) DIO write at offset 256Kb length 768Kb. Once the ordered extent
         completes it sets the inode's disk_i_size to 1Mb (256Kb + 768Kb) and
         updates the inode item in the fs tree with a size of 1Mb (which is
         the value of disk_i_size).

      5) The delalloc for the range [0, 256Kb[ did not start yet.

      6) The transaction used in the DIO ordered extent completion, which updated
         the inode item, is committed by the snapshotting task.

      7) Snapshot creation completes.

      8) Delalloc for the range [0, 256Kb[ is flushed.
      
      After that when reading the file from the snapshot we always get zeroes for
      the range [0, 256Kb[, the file has a size of 1Mb and the data written by
      the direct IO write is found. From an application's point of view this is
      a corruption, since in the source subvolume it could never read a version
      of the file that included the data from the direct IO write without the
      data from the buffered write included as well. In the snapshot's tree,
      file extent items are missing for the range [0, 256Kb[.
      
      The issue, obviously, does not happen when using the -o flushoncommit
      mount option.
      
      Fix this by flushing delalloc for all the roots that are about to be
      snapshotted when committing a transaction. This guarantees total ordering
      when updating the disk_i_size of an inode since the flush for delalloc is
      done when a transaction is in the TRANS_STATE_COMMIT_START state and wait
      is done once no more external writers exist. This is similar to what we
      do when using the flushoncommit mount option, but we do it only if the
      transaction has snapshots to create and only for the roots of the
      subvolumes to be snapshotted. The bulk of the delalloc is flushed in the
      snapshot creation ioctl, so the flush work we do inside the transaction
      is minimized.
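
The sequence of events and the effect of the fix can be replayed in a small user-space model. This is only a sketch under heavy simplification: the `toy_*` names are hypothetical, the file is reduced to one buffered range and one DIO range, and nothing here corresponds to actual Btrfs data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_KB 1024ULL

/* Toy inode: one buffered range [0, 256K) and one DIO range [256K, 1M). */
struct toy_inode {
	uint64_t i_size;        /* in-memory size */
	uint64_t disk_i_size;   /* size stored in the inode item */
	bool delalloc_pending;  /* buffered data not yet written back */
	bool buffered_extent;   /* extent item exists for [0, 256K) */
	bool dio_extent;        /* extent item exists for [256K, 1M) */
};

/* What a snapshot captures when the transaction commits. */
struct toy_snapshot {
	uint64_t size;
	bool buffered_extent;
	bool dio_extent;
};

void toy_buffered_write(struct toy_inode *in)
{
	in->i_size = 256 * TOY_KB;        /* disk_i_size stays at zero */
	in->delalloc_pending = true;
}

void toy_dio_write(struct toy_inode *in)
{
	in->dio_extent = true;
	in->i_size = 1024 * TOY_KB;
	in->disk_i_size = 1024 * TOY_KB;  /* ordered extent completion */
}

void toy_flush_delalloc(struct toy_inode *in)
{
	if (in->delalloc_pending) {
		in->buffered_extent = true;
		in->delalloc_pending = false;
	}
}

/* flush_first models the fix: flush delalloc for the root before commit. */
struct toy_snapshot toy_commit_snapshot(struct toy_inode *in, bool flush_first)
{
	assert(in);
	if (flush_first)
		toy_flush_delalloc(in);
	return (struct toy_snapshot){ in->disk_i_size, in->buffered_extent,
				      in->dio_extent };
}

/* Replay the steps; returns true if the snapshot has a hole in [0, 256K). */
bool toy_snapshot_has_hole(bool with_fix)
{
	struct toy_inode in = { 0 };
	struct toy_snapshot snap;

	toy_buffered_write(&in);
	toy_dio_write(&in);
	snap = toy_commit_snapshot(&in, with_fix);
	toy_flush_delalloc(&in);          /* too late for the snapshot */
	return snap.size > 0 && !snap.buffered_extent;
}
```

Without the flush the snapshot records a 1Mb size but no extent for the buffered range; with the flush both extents are present when the snapshot is taken.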
      
      This issue, involving buffered and direct IO writes with snapshotting, is
      often triggered by fstest btrfs/078, and is reported by fsck when not
      using the NO_HOLES feature, for example:
      
        $ cat results/btrfs/078.full
        (...)
        _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
        *** fsck.btrfs output ***
        [1/7] checking root items
        [2/7] checking extents
        [3/7] checking free space cache
        [4/7] checking fs roots
        root 258 inode 264 errors 100, file extent discount
        Found file extent holes:
              start: 524288, len: 65536
        ERROR: errors found in fs roots

      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
    • fs: prevent page refcount overflow in pipe_buf_get · 27f65114
      Matthew Wilcox authored
      commit 15fab63e upstream.
      
      Change pipe_buf_get() to return a bool indicating whether it succeeded
      in raising the refcount of the page (if the thing in the pipe is a page).
      This removes another mechanism for overflowing the page refcount.  All
      callers are converted to handle a failure.
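
The general pattern, a "get" that can fail and a caller that checks the result, might look like the following user-space sketch. The `toy_*` types and the overflow test are assumptions for illustration, not the kernel's page or pipe structures.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct toy_page {
	int refcount;
};

struct toy_pipe_buf {
	struct toy_page *page;
};

/* Take a reference only if the count is sane and cannot overflow. */
bool toy_try_get_page(struct toy_page *page)
{
	assert(page);
	if (page->refcount <= 0 || page->refcount == INT_MAX)
		return false;
	page->refcount++;
	return true;
}

/* Duplicate a pipe buffer into dst; the caller must handle failure. */
bool toy_dup_buf(const struct toy_pipe_buf *src, struct toy_pipe_buf *dst)
{
	if (!toy_try_get_page(src->page))
		return false;   /* refcount could not be raised: do not copy */
	*dst = *src;
	return true;
}
```

The point of returning a bool is that a failed duplication leaves both the refcount and the destination untouched, so the caller can back out cleanly.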

      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 02 May, 2019 18 commits
    • Fix aio_poll() races · b6dd51f0
      Al Viro authored
      commit af5c72b1 upstream.
      
      aio_poll() has to cope with several unpleasant problems:
        * requests that might stay around indefinitely need to be made
          visible for io_cancel(2); that must not be done to a request
          already completed, though.
        * in cases when ->poll() has placed us on a waitqueue, wakeup
          might have happened (and request completed) before ->poll()
          returns.
        * worse, in some early wakeup cases request might end up re-added
          into the queue later - we can't treat "woken up and currently
          not in the queue" as "it's not going to stick around
          indefinitely"
        * ... moreover, ->poll() might have decided not to put it on any
          queues to start with, and that needs to be distinguished from
          the previous case
        * ->poll() might have tried to put us on more than one queue.
          Only the first will succeed for aio poll, so we might end up
          missing wakeups.  OTOH, we might very well notice that only
          after the wakeup hits and request gets completed (all before
          ->poll() gets around to the second poll_wait()).  In that case
          it's too late to decide that we have an error.
      
      req->woken was an attempt to deal with that.  Unfortunately, it was
      broken.  What we need to keep track of is not that wakeup has happened -
      the thing might come back after that.  It's that async reference is
      already gone and won't come back, so we can't (and needn't) put the
      request on the list of cancellables.
      
      The easiest case is "request hadn't been put on any waitqueues"; we
      can tell by seeing NULL apt.head, and in that case there won't be
      anything async.  We should either complete the request ourselves
      (if vfs_poll() reports anything of interest) or return an error.
      
      In all other cases we get exclusion with wakeups by grabbing the
      queue lock.
      
      If request is currently on queue and we have something interesting
      from vfs_poll(), we can steal it and complete the request ourselves.
      
      If it's on queue and vfs_poll() has not reported anything interesting,
      we either put it on the cancellable list, or, if we know that it
      hadn't been put on all queues ->poll() wanted it on, we steal it and
      return an error.
      
      If it's _not_ on queue, it's either been already dealt with (in which
      case we do nothing), or there's aio_poll_complete_work() about to be
      executed.  In that case we either put it on the cancellable list,
      or, if we know it hadn't been put on all queues ->poll() wanted it on,
      simulate what cancel would've done.
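
The case analysis above can be restated as a decision function. This is a user-space paraphrase of the prose only: the struct fields and enum names are hypothetical, and locking, waitqueues and the actual completion paths are left out.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_action {
	TOY_COMPLETE_NOW,     /* complete the request ourselves */
	TOY_RETURN_ERROR,     /* fail the submission */
	TOY_STEAL_COMPLETE,   /* take it off the queue, then complete it */
	TOY_STEAL_ERROR,      /* take it off the queue, then return an error */
	TOY_MAKE_CANCELLABLE, /* put it on the list of cancellables */
	TOY_SIMULATE_CANCEL,  /* do what cancel would have done */
	TOY_NOTHING           /* already dealt with */
};

struct toy_poll_state {
	bool queued_anywhere; /* ->poll() put us on some waitqueue */
	bool on_queue;        /* still queued, as seen under the queue lock */
	bool done;            /* not on queue and already dealt with */
	bool missed_queue;    /* ->poll() wanted more queues than we got onto */
	bool ready;           /* vfs_poll() reported something of interest */
};

enum toy_action toy_aio_poll_decide(const struct toy_poll_state *s)
{
	assert(s);
	/* Never put on any waitqueue: nothing async can happen. */
	if (!s->queued_anywhere)
		return s->ready ? TOY_COMPLETE_NOW : TOY_RETURN_ERROR;

	/* From here on the queue lock is assumed to be held. */
	if (s->on_queue) {
		if (s->ready)
			return TOY_STEAL_COMPLETE;
		return s->missed_queue ? TOY_STEAL_ERROR : TOY_MAKE_CANCELLABLE;
	}

	/* Not on queue: either finished, or completion work is pending. */
	if (s->done)
		return TOY_NOTHING;
	return s->missed_queue ? TOY_SIMULATE_CANCEL : TOY_MAKE_CANCELLABLE;
}
```

Writing it out this way makes the two "missed a queue" branches visible: they are the only places where a request that is still pending gets turned into an error.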
      
      It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
      suck, and unfortunately aio is not an exception...

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • aio: store event at final iocb_put() · f6408361
      Al Viro authored
      commit 2bb874c0 upstream.
      
      Instead of having aio_complete() set ->ki_res.{res,res2}, do that
      explicitly in its callers, drop the reference (as aio_complete()
      used to do) and delay the rest until the final iocb_put().

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • aio: keep io_event in aio_kiocb · a8a538ae
      Al Viro authored
      commit a9339b78 upstream.
      
      We want to separate forming the resulting io_event from putting it
      into the ring buffer.

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • aio: fold lookup_kiocb() into its sole caller · 636fa71e
      Al Viro authored
      commit 833f4154 upstream.

      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pin iocb through aio. · 199f34c1
      Linus Torvalds authored
      commit b53119f1 upstream.
      
      aio_poll() is not the only case that needs file pinned; worse, while
      aio_read()/aio_write() can live without pinning iocb itself, the
      proof is rather brittle and can easily break on later changes.

      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. · becfa96e
      Tetsuo Handa authored
      commit 7c2bd9a3 upstream.
      
      syzbot is reporting an uninitialized value in rpc_sockaddr2uaddr() [1].
      This is because syzbot is setting AF_INET6 in
      "struct sockaddr_in"->sin_family (which is embedded into the user-visible
      "struct nfs_mount_data" structure) even though
      nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
      bytes of AF_INET6 address to rpc_sockaddr2uaddr().
      
      Since "struct nfs_mount_data" structure is user-visible, we can't change
      "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
      assuming that everybody is using AF_INET family when passing address via
      "struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
      
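A user-space sketch of the check described above follows. The function name and the -EINVAL return value are assumptions for illustration; the static assertion only records why an IPv6 address cannot fit in the embedded IPv4 structure.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* An IPv6 socket address does not fit in the space of an IPv4 one. */
_Static_assert(sizeof(struct sockaddr_in) < sizeof(struct sockaddr_in6),
	       "sockaddr_in cannot hold a full sockaddr_in6");

/* Accept the embedded address only if it really claims to be IPv4. */
int toy_validate_mount_addr(const struct sockaddr_in *addr)
{
	assert(addr);
	if (addr->sin_family != AF_INET)
		return -EINVAL;   /* assumed error code for this sketch */
	return 0;
}
```

Rejecting the family up front means later code never reads past the end of the smaller structure while treating it as an IPv6 address.
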
      [1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c

      Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix some error pointer dereferences · acaec7f6
      Dan Carpenter authored
      commit 7159a986 upstream.
      
      We can't pass error pointers to brelse().
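
Why this matters can be shown with a user-space imitation of the error-pointer convention. The `toy_*` helpers below are re-implementations written for this sketch (they mimic, but are not, the kernel's ERR_PTR()/IS_ERR() helpers or brelse()), and the read and release functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value, and detect such a value. */
void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

bool toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_buf {
	int refs;
};

/* Release helper: tolerates NULL, but must never see an error pointer. */
void toy_release(struct toy_buf *b)
{
	if (!b)
		return;
	assert(!toy_is_err(b));   /* dereferencing it would be a wild access */
	b->refs--;
}

/* Read helper that reports failure as an encoded errno. */
struct toy_buf *toy_read(struct toy_buf *backing, bool fail)
{
	if (fail)
		return toy_err_ptr(-EIO);
	backing->refs++;
	return backing;
}

/* Pattern: check for an error pointer and bail out before any release. */
int toy_use_block(struct toy_buf *backing, bool fail)
{
	struct toy_buf *b = toy_read(backing, fail);

	if (toy_is_err(b))
		return (int)(intptr_t)b;  /* propagate errno; nothing to release */
	/* ... use the buffer ... */
	toy_release(b);
	return 0;
}
```

A NULL check alone does not help here, because an encoded error is a non-NULL value that merely looks like a pointer.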
      
      Fixes: fb265c9c ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: wake blocked file lock waiters before sending callback · 6d29f7c7
      Jeff Layton authored
      commit f456458e upstream.
      
      When a blocked NFS lock is "awoken" we send a callback to the server and
      then wake any hosts waiting on it. If a client attempts to get a lock
      and then drops off the net, we could end up waiting for a long time
      until we end up waking locks blocked on that request.
      
      So, wake any other waiting lock requests before sending the callback.
      Do this by calling locks_delete_block in a new "prepare" phase for
      CB_NOTIFY_LOCK callbacks.
      
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363
      Fixes: 16306a61 ("fs/locks: always delete_block after waiting.")
      Reported-by: Slawomir Pryczek <slawek1211@gmail.com>
      Cc: Neil Brown <neilb@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: wake waiters blocked on file_lock before deleting it · 6569ae32
      Jeff Layton authored
      commit 6aaafc43 upstream.
      
      After a blocked nfsd file_lock request is deleted, knfsd will send a
      callback to the client and then free the request. Commit 16306a61
      ("fs/locks: always delete_block after waiting.") changed it such that
      locks_delete_block is always called on a request after it is awoken,
      but that patch missed fixing up blocked nfsd request handling.
      
      Call locks_delete_block on the block to wake up any locks still blocked
      on the nfsd lock request before freeing it. Some of its callers already
      do this however, so just remove those calls.
      
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=203363
      Fixes: 16306a61 ("fs/locks: always delete_block after waiting.")
      Reported-by: Slawomir Pryczek <slawek1211@gmail.com>
      Cc: Neil Brown <neilb@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: Don't release the callback slot unless it was actually held · 5e4a20e6
      Trond Myklebust authored
      commit e6abc8ca upstream.
      
      If there are multiple callbacks queued, waiting for the callback
      slot when the callback gets shut down, then they all currently
      end up acting as if they hold the slot, and call
      nfsd4_cb_sequence_done() resulting in interesting side-effects.
      
      In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
      causes a loop back to nfsd4_cb_prepare() without first freeing the
      slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
      a second time.
      
      This patch therefore adds a boolean to track whether or not the
      callback did pick up the slot, so that it can do the right thing
      in these 2 cases.
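
The idea of tracking ownership with a boolean can be sketched as follows. The `toy_*` types model a session with a single callback slot; they are hypothetical and ignore the waiting, RPC and locking aspects of the real code.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_session {
	bool slot_busy;     /* the single callback slot */
};

struct toy_callback {
	bool holds_slot;    /* did THIS callback pick up the slot? */
};

/* Returns true if the callback holds the slot on return. */
bool toy_get_slot(struct toy_session *s, struct toy_callback *cb)
{
	assert(s && cb);
	if (cb->holds_slot)
		return true;    /* retry path: do not try to take it twice */
	if (s->slot_busy)
		return false;   /* someone else holds it; caller must wait */
	s->slot_busy = true;
	cb->holds_slot = true;
	return true;
}

/* Releases the slot only if this callback actually held it. */
void toy_release_slot(struct toy_session *s, struct toy_callback *cb)
{
	assert(s && cb);
	if (!cb->holds_slot)
		return;         /* never held it: nothing to release */
	cb->holds_slot = false;
	s->slot_busy = false;
}
```

Both problem cases map onto the flag: a callback that never got the slot cannot free it for someone else, and a callback that loops back to "prepare" while still holding the slot does not block on itself.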
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: fix ci->i_head_snapc leak · 87058848
      Yan, Zheng authored
      commit 37659182 upstream.
      
      We missed two places where i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps
      and i_flushing_caps may change. When they are all zero, we should free
      i_head_snapc.
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/38224
      Reported-and-tested-by: Luis Henriques <lhenriques@suse.com>
      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: ensure d_name stability in ceph_dentry_hash() · d9061ef0
      Jeff Layton authored
      commit 76a495d6 upstream.
      
      Take the d_lock here to ensure that d_name doesn't change.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ceph: only use d_name directly when parent is locked · bcd9cbff
      Jeff Layton authored
      commit 1bcb3440 upstream.
      
      Ben reported tripping the BUG_ON in create_request_message during some
      performance testing. Analysis of the vmcore showed that the length of
      the r_dentry->d_name string changed after we allocated the buffer, but
      before we encoded it.
      
      build_dentry_path returns pointers to d_name in the common case of
      non-snapped dentries, but this optimization isn't safe unless the parent
      directory is locked. When it isn't, have the code make a copy of the
      d_name while holding the d_lock.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ben England <bengland@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tracing: Fix buffer_ref pipe ops · 8659a04c
      Jann Horn authored
      commit b9872226 upstream.
      
      This fixes multiple issues in buffer_pipe_buf_ops:
      
       - The ->steal() handler must not return zero unless the pipe buffer has
         the only reference to the page. But generic_pipe_buf_steal() assumes
         that every reference to the pipe is tracked by the page's refcount,
         which isn't true for these buffers - buffer_pipe_buf_get(), which
         duplicates a buffer, doesn't touch the page's refcount.
         Fix it by using generic_pipe_buf_nosteal(), which refuses every
         attempted theft. It should be easy to actually support ->steal, but the
         only current users of pipe_buf_steal() are the virtio console and FUSE,
         and they also only use it as an optimization. So it's probably not worth
         the effort.
       - The ->get() and ->release() handlers can be invoked concurrently on pipe
         buffers backed by the same struct buffer_ref. Make them safe against
         concurrency by using refcount_t.
       - The pointers stored in ->private were only zeroed out when the last
         reference to the buffer_ref was dropped. As far as I know, this
         shouldn't be necessary anyway, but if we do it, let's always do it.
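
The second point, making get and release safe against concurrent callers, can be illustrated in user space with C11 atomics. This is a sketch with hypothetical `toy_*` names; it stands in for the idea behind an atomic reference count and is not the kernel's refcount_t implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_buffer_ref {
	atomic_int refcount;
	bool released;      /* set once the last reference is dropped */
};

void toy_ref_init(struct toy_buffer_ref *ref)
{
	atomic_init(&ref->refcount, 1);
	ref->released = false;
}

/* Get: may run concurrently with other get/put calls on the same ref. */
void toy_ref_get(struct toy_buffer_ref *ref)
{
	int old = atomic_fetch_add(&ref->refcount, 1);

	assert(old > 0);    /* taking a reference on a dead object is a bug */
}

/* Put: only the caller that drops the count to zero does the cleanup. */
bool toy_ref_put(struct toy_buffer_ref *ref)
{
	if (atomic_fetch_sub(&ref->refcount, 1) != 1)
		return false;
	ref->released = true;   /* stand-in for freeing the backing page */
	return true;
}
```

With a plain integer, two concurrent decrements could both observe the same value and either leak the object or free it twice; the atomic read-modify-write guarantees exactly one caller sees the transition to zero.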
      
      Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cifs: do not attempt cifs operation on smb2+ rename error · 90b70b3e
      Frank Sorenson authored
      commit 652727bb upstream.
      
      A path-based rename returning EBUSY will incorrectly try opening
      the file with a cifs (NT Create AndX) operation on an smb2+ mount,
      which causes the server to force a session close.
      
      If the mount is smb2+, skip the fallback.

      Signed-off-by: Frank Sorenson <sorenson@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cifs: fix page reference leak with readv/writev · e6302b84
      Jérôme Glisse authored
      commit 13f5938d upstream.
      
      CIFS can leak page references obtained through GUP (get_user_pages*()
      via iov_iter_get_pages()). This happens if cifs_send_async_read()
      or cifs_write_from_iter() calls fail from within __cifs_readv() and
      __cifs_writev() respectively. This patch moves the page unreference to
      cifs_aio_ctx_release(), which runs on all code paths, so correctness
      is simpler to follow.

      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cifs: fix memory leak in SMB2_read · 62cf691c
      Ronnie Sahlberg authored
      [ Upstream commit 05fd5c2c ]
      
      Commit 088aaf17 introduced a leak where
      if SMB2_read() returned an error we would return without freeing the
      request buffer.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • fs/proc/proc_sysctl.c: Fix a NULL pointer dereference · 41e09d7e
      YueHaibing authored
      [ Upstream commit 89189557 ]
      
      Syzkaller reports this:
      
        sysctl could not get directory: /net//bridge -12
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] SMP KASAN PTI
        CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G         C        5.1.0-rc3+ #8
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
        RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
        RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
        RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
        RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
        Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
        RSP: 0018:ffff8881bb507778 EFLAGS: 00010206
        RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a
        RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568
        RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4
        R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558
        R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        FS:  00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         erase_entry fs/proc/proc_sysctl.c:178 [inline]
         erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
         start_unregistering fs/proc/proc_sysctl.c:331 [inline]
         drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
         get_subdir fs/proc/proc_sysctl.c:1022 [inline]
         __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
         br_netfilter_init+0x68/0x1000 [br_netfilter]
         do_one_initcall+0xbc/0x47d init/main.c:901
         do_init_module+0x1b5/0x547 kernel/module.c:3456
         load_module+0x6405/0x8c10 kernel/module.c:3804
         __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
         do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
         iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
        Dumping ftrace buffer:
           (ftrace buffer empty)
        ---[ end trace 68741688d5fbfe85 ]---
      
      Commit 23da9588 ("fs/proc/proc_sysctl.c: fix NULL pointer
      dereference in put_links") forgot to handle the start_unregistering()
      case: when header->parent is NULL, it calls erase_header() and, as seen
      in the above syzkaller call trace, accessing &header->parent->root
      triggers a NULL pointer dereference.
      
      As that commit explained, there is also no need to call
      start_unregistering() if header->parent is NULL.
      
      Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
      Fixes: 23da9588 ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
      Fixes: 0e47c99d ("sysctl: Replace root_list with links between sysctl_table_sets")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. 27 Apr, 2019 7 commits
    • coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping · 1eb719f0
      Andrea Arcangeli authored
      commit 04f5866e upstream.
      
      The core dumping code has always run without holding the mmap_sem for
      writing, even though that is the only way to ensure that the entire vma
      layout will not change from under it.  Only using some signal
      serialization on the processes belonging to the mm is not nearly enough.
      This was pointed out earlier.  For example in Hugh's post from Jul 2017:
      
        https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
      
        "Not strictly relevant here, but a related note: I was very surprised
         to discover, only quite recently, how handle_mm_fault() may be called
         without down_read(mmap_sem) - when core dumping. That seems a
         misguided optimization to me, which would also be nice to correct"
      
      In particular, because the growsdown and growsup paths can move
      vm_start/vm_end, the various loops the core dump does around the vma
      will not be consistent if page faults can happen concurrently.
      
      Pretty much all users calling mmget_not_zero()/get_task_mm() and then
      taking the mmap_sem had the potential to introduce unexpected side
      effects in the core dumping code.
      
      Adding mmap_sem for writing around the ->core_dump invocation is a
      viable long term fix, but it requires removing all copy-user operations
      and page faults and replacing them with get_dump_page() for all binary
      formats, which is not suitable as a short term fix.
      
      For the time being this solution manually covers the places that can
      confuse the core dump either by altering the vma layout or the vma flags
      while it runs.  Once ->core_dump runs under mmap_sem for writing the
      function mmget_still_valid() can be dropped.
      
      Allowing mmap_sem protected sections to run in parallel with the
      coredump provides some minor parallelism advantage to the swapoff code
      (which seems to be safe enough by never mangling any vma field and can
      keep doing swapins in parallel to the core dumping) and to some other
      corner case.
      
      In order to facilitate the backporting I added "Fixes: 86039bd3"
      however the side effect of this same race condition in /proc/pid/mem
      should be reproducible since before 2.6.12-rc2 so I couldn't add any
      other "Fixes:" because there's no hash beyond the git genesis commit.
      
      Because find_extend_vma() is the only location outside of the process
      context that could modify the "mm" structures under mmap_sem for
      reading, by adding the mmget_still_valid() check to it, all other cases
      that take the mmap_sem for reading don't need the new check after
      mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
      context also doesn't need the new check, because all tasks under core
      dumping are frozen.
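
      The check described above can be sketched in userspace as follows. This
      is a simplified model, not the kernel code: mm_model, the core_state
      field as the "dump in progress" marker, and extend_vma_model are
      assumptions made for illustration, and locking is only indicated in
      comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an mm: core_state is non-NULL while a dump runs. */
struct mm_model {
	void *core_state;
	int vma_end;        /* stands in for a vma boundary that could be moved */
};

/* Models mmget_still_valid(): false once core dumping has started. */
static bool mmget_still_valid_model(const struct mm_model *mm)
{
	return mm->core_state == NULL;
}

/* Models a remote caller that wants to move a vma boundary.
 * Returns 0 on success, -1 if the mm is being dumped. */
static int extend_vma_model(struct mm_model *mm, int new_end)
{
	/* In the kernel this would run with mmap_sem already held. */
	if (!mmget_still_valid_model(mm))
		return -1;
	mm->vma_end = new_end;
	return 0;
}
```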
      
      Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
      Fixes: 86039bd3 ("userfaultfd: add new syscall to provide memory externalization")
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Jann Horn <jannh@google.com>
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Acked-by: Jason Gunthorpe <jgg@mellanox.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1eb719f0
    • Aneesh Kumar K.V's avatar
      fs/dax: Deposit pagetable even when installing zero page · d12bcf87
      Aneesh Kumar K.V authored
      commit 11cf9d86 upstream.
      
      Architectures like ppc64 use the deposited page table to store hardware
      page table slot information. Make sure we deposit a page table when
      using zero page at the pmd level for hash.
      
      Without this we hit
      
      Unable to handle kernel paging request for data at address 0x00000000
      Faulting instruction address: 0xc000000000082a74
      Oops: Kernel access of bad area, sig: 11 [#1]
      ....
      
      NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0
      LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0
      Call Trace:
       hash_page_mm+0x43c/0x740
       do_hash_page+0x2c/0x3c
       copy_from_iter_flushcache+0xa4/0x4a0
       pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
       dax_copy_from_iter+0x40/0x70
       dax_iomap_actor+0x134/0x360
       iomap_apply+0xfc/0x1b0
       dax_iomap_rw+0xac/0x130
       ext4_file_write_iter+0x254/0x460 [ext4]
       __vfs_write+0x120/0x1e0
       vfs_write+0xd8/0x220
       SyS_write+0x6c/0x110
       system_call+0x3c/0x130
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d12bcf87
    • Ronnie Sahlberg's avatar
      cifs: fix handle leak in smb2_query_symlink() · f6846161
      Ronnie Sahlberg authored
      commit e6d0fb7b upstream.
      
      If we enter smb2_query_symlink() for something that is not a symlink
      and where the SMB2_open() would succeed we would never end up
      closing this handle and would thus leak a handle on the server.
      
      Fix this by immediately calling SMB2_close() on successful open.
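
      A small model of the leak and its fix, assuming a counter in place of
      real server handles. model_open, model_close and query_symlink_model are
      hypothetical names, and the -1 return value is a placeholder rather than
      the error code the real function uses.

```c
#include <assert.h>

static int open_handles;    /* handles currently open on the modelled server */

static void model_open(void)
{
	open_handles++;
}

static void model_close(void)
{
	assert(open_handles > 0);
	open_handles--;
}

/* open_succeeds != 0 models a target that is not a symlink, so the open
 * goes through. Returns 0 when symlink data is available, -1 otherwise. */
static int query_symlink_model(int open_succeeds)
{
	if (open_succeeds) {
		model_open();
		model_close();      /* the fix: close right away, nothing leaks */
		return -1;          /* placeholder error: not a symlink */
	}
	/* The open failed; the failure response would carry the symlink data. */
	return 0;
}
```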

      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f6846161
    • ZhangXiaoxu's avatar
      cifs: Fix use-after-free in SMB2_read · 76dbd554
      ZhangXiaoxu authored
      commit 088aaf17 upstream.
      
      There is a KASAN use-after-free:
      BUG: KASAN: use-after-free in SMB2_read+0x1136/0x1190
      Read of size 8 at addr ffff8880b4e45e50 by task ln/1009
      
      The 'req' must not be released at this point because it is still used
      by the tracepoint.
      
      Fixes: eccb4422 ("smb3: Add ftrace tracepoints for improved SMB3 debugging")
      Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org> 4.18+
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76dbd554
    • ZhangXiaoxu's avatar
      cifs: Fix use-after-free in SMB2_write · e8ac406c
      ZhangXiaoxu authored
      commit 6a3eb336 upstream.
      
      There is a KASAN use-after-free:
      BUG: KASAN: use-after-free in SMB2_write+0x1342/0x1580
      Read of size 8 at addr ffff8880b6a8e450 by task ln/4196
      
      The 'req' must not be released at this point because it is still used
      by the tracepoint.
      
      Fixes: eccb4422 ("smb3: Add ftrace tracepoints for improved SMB3 debugging")
      Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org> 4.18+
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8ac406c
    • ZhangXiaoxu's avatar
      cifs: Fix lease buffer length error · 9582ba40
      ZhangXiaoxu authored
      commit b57a55e2 upstream.
      
      There is a KASAN slab-out-of-bounds:
      BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0
      Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539
      
      CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
                  rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
      Call Trace:
       dump_stack+0xdd/0x12a
       print_address_description+0xa7/0x540
       kasan_report+0x1ff/0x550
       check_memory_region+0x2f1/0x310
       memcpy+0x2f/0x80
       _copy_from_iter_full+0x783/0xaa0
       tcp_sendmsg_locked+0x1840/0x4140
       tcp_sendmsg+0x37/0x60
       inet_sendmsg+0x18c/0x490
       sock_sendmsg+0xae/0x130
       smb_send_kvec+0x29c/0x520
       __smb_send_rqst+0x3ef/0xc60
       smb_send_rqst+0x25a/0x2e0
       compound_send_recv+0x9e8/0x2af0
       cifs_send_recv+0x24/0x30
       SMB2_open+0x35e/0x1620
       open_shroot+0x27b/0x490
       smb2_open_op_close+0x4e1/0x590
       smb2_query_path_info+0x2ac/0x650
       cifs_get_inode_info+0x1058/0x28f0
       cifs_root_iget+0x3bb/0xf80
       cifs_smb3_do_mount+0xe00/0x14c0
       cifs_do_mount+0x15/0x20
       mount_fs+0x5e/0x290
       vfs_kern_mount+0x88/0x460
       do_mount+0x398/0x31e0
       ksys_mount+0xc6/0x150
       __x64_sys_mount+0xea/0x190
       do_syscall_64+0x122/0x590
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      It can be reproduced with the following steps:
        1. samba configured with: server max protocol = SMB2_10
        2. mount -o vers=default
      
      When the mount version parameter is parsed, 'ops' and 'vals' are
      set to smb30. If the negotiated result is smb21, only 'ops' is
      updated to smb21, while 'vals' is still smb30.
      When the lease context is added, iov_base is allocated with the smb21
      ops, but iov_len is initialized from the smb30 vals. Because
      iov_len is longer than the buffer at iov_base, sending the message
      copies beyond the end of the buffer.
      
      We need to keep 'ops' and 'vals' consistent.
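
      The mismatch can be illustrated with a small userspace model. All names
      (dialect_ops, dialect_vals, server_model, lease_iov_fits) and the two
      lease sizes are assumptions chosen for illustration; the point is only
      that the allocator comes from one table and the length from the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SMB21_LEASE_LEN 52  /* illustrative sizes only */
#define SMB30_LEASE_LEN 80

struct dialect_vals { size_t lease_len; };                         /* models 'vals' */
struct dialect_ops  { void *(*create_lease_buf)(size_t *alloc); }; /* models 'ops'  */

static void *lease_buf_smb21(size_t *alloc)
{
	*alloc = SMB21_LEASE_LEN;
	return calloc(1, *alloc);
}

static void *lease_buf_smb30(size_t *alloc)
{
	*alloc = SMB30_LEASE_LEN;
	return calloc(1, *alloc);
}

static const struct dialect_ops  smb21_ops  = { lease_buf_smb21 };
static const struct dialect_vals smb21_vals = { SMB21_LEASE_LEN };
static const struct dialect_ops  smb30_ops  = { lease_buf_smb30 };
static const struct dialect_vals smb30_vals = { SMB30_LEASE_LEN };

struct server_model {
	const struct dialect_ops *ops;
	const struct dialect_vals *vals;
};

/* After negotiation, switch both tables together so they stay consistent. */
static void downgrade_to_smb21(struct server_model *s)
{
	s->ops = &smb21_ops;
	s->vals = &smb21_vals;
}

/* Returns 1 if iov_len fits the allocation, 0 if a send would read past
 * the buffer, -1 on allocation failure. */
static int lease_iov_fits(const struct server_model *s)
{
	size_t allocated = 0;
	void *iov_base = s->ops->create_lease_buf(&allocated);
	size_t iov_len = s->vals->lease_len;

	if (iov_base == NULL)
		return -1;
	free(iov_base);
	return iov_len <= allocated;
}
```

      Updating only s->ops reproduces the inconsistent state; calling
      downgrade_to_smb21() restores a length that matches the allocation.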
      
      Fixes: 9764c02f ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
      Fixes: d5c7076b ("smb3: add smb3.1.1 to default dialect list")
      Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9582ba40
    • aaptel's avatar
      CIFS: keep FileInfo handle live during oplock break · ebac4d0a
      aaptel authored
      commit b98749ca upstream.
      
      In the oplock break handler, writing pending changes from pages puts
      the FileInfo handle. If the refcount reaches zero it closes the handle
      and waits for any oplock break handler to return, thus causing a deadlock.
      
      To prevent this situation:
      
      * We add a wait flag to cifsFileInfo_put() to decide whether we should
        wait for running/pending oplock break handlers
      
      * We keep an additional reference of the SMB FileInfo handle so that
        for the rest of the handler putting the handle won't close it.
        - The ref is bumped every time we queue the handler via the
          cifs_queue_oplock_break() helper.
        - The ref is decremented at the end of the handler
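
      The two points above can be sketched as a userspace reference-count
      model. file_model and its helpers are hypothetical; in particular, the
      handler passing "do not wait" on its final put is an assumption that
      follows from the description, since a handler cannot wait for itself.

```c
#include <assert.h>
#include <stdbool.h>

struct file_model {
	int count;                  /* reference count on the handle */
	bool closed;                /* set when the last reference is dropped */
	bool waited_for_oplock;     /* set if the final put would have blocked */
};

static void file_get(struct file_model *f)
{
	assert(!f->closed);
	f->count++;
}

/* Models cifsFileInfo_put() with the new wait flag. */
static void file_put(struct file_model *f, bool wait_oplock_handler)
{
	assert(f->count > 0);
	if (--f->count > 0)
		return;
	if (wait_oplock_handler)
		f->waited_for_oplock = true;    /* the kernel would block here */
	f->closed = true;
}

/* Queueing the handler takes the extra reference. */
static void queue_oplock_break(struct file_model *f)
{
	file_get(f);
}

static void oplock_break_handler(struct file_model *f)
{
	/* Writeback takes and drops its own reference; this put cannot reach
	 * zero because the queue-time reference is still held. */
	file_get(f);
	file_put(f, true);

	/* End of handler: drop the queue-time reference without waiting. */
	file_put(f, false);
}
```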
      
      This bug was triggered by xfstest 464.
      
      This is also an important fix that addresses the various reports of
      an oops in smb2_push_mandatory_locks.

      Signed-off-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebac4d0a
  4. 20 Apr, 2019 10 commits
    • Chao Yu's avatar
      f2fs: fix to add refcount once page is tagged PG_private · 1c108a1b
      Chao Yu authored
      [ Upstream commit 240a5915 ]
      
      As Gao Xiang reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202749
      
      f2fs may skip pageout() due to incorrect page reference count.
      
      The problem here is that MM defines the rule [1] very clearly: once
      a page has the PG_private flag set, we should increment the
      refcount of that page. Main flows like pageout() and migrate_page()
      assume there is one additional page reference count if
      page_has_private() returns true.
      
      But currently, f2fs doesn't add or drop a refcount when changing the
      PG_private flag. f2fs should follow MM's rule so that MM's related
      flows run as expected.
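
      A rough userspace model of the rule, assuming a baseline of two
      references for a page that is otherwise idle. page_model, BASELINE_REFS
      and the helper names are hypothetical and only mirror the accounting
      described above, not the kernel's actual checks.

```c
#include <assert.h>
#include <stdbool.h>

#define BASELINE_REFS 2     /* assumed baseline for this model */

struct page_model {
	int refcount;
	bool has_private;       /* models PG_private */
};

/* Setting the flag takes one extra reference, as the MM rule requires. */
static void set_private_model(struct page_model *p)
{
	if (p->has_private)
		return;
	p->has_private = true;
	p->refcount++;
}

/* Clearing the flag drops that reference again. */
static void clear_private_model(struct page_model *p)
{
	if (!p->has_private)
		return;
	assert(p->refcount > 0);
	p->has_private = false;
	p->refcount--;
}

/* Models a flow that discounts one reference when the flag is set. */
static bool looks_freeable_model(const struct page_model *p)
{
	return p->refcount - (p->has_private ? 1 : 0) == BASELINE_REFS;
}
```

      A page that has the flag set without the matching reference fails the
      check, which is the kind of miscount that lets a flow such as pageout()
      be skipped.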
      
      [1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/

      Reported-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1c108a1b
    • Chao Yu's avatar
      f2fs: fix to use kvfree instead of kzfree · b5f51f7a
      Chao Yu authored
      [ Upstream commit 2a6a7e72 ]
      
      As Jiqun Li reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202747
      
      The system can panic due to a mismatched allocate/free function pair
      in the xattr interface:
      - kvmalloc is used to allocate memory
      - kzfree is used to free memory
      
      Fix this by using kvfree instead of kzfree. It is safe to drop
      kzfree here: no confidential data is stored as xattr, so the memory
      does not need to be zeroed before it is freed.
      
      Fixes: 5222595d ("f2fs: use kvmalloc, if kmalloc is failed")
      Reported-by: Jiqun Li <jiqun.li@unisoc.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b5f51f7a
    • Chao Yu's avatar
      f2fs: fix to dirty inode for i_mode recovery · c55d13d9
      Chao Yu authored
      [ Upstream commit ca597bdd ]
      
      As Seulbae Kim reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202637
      
      The permission field was not recovered correctly after a sudden power
      cut. The reason is that setattr did not add the inode to the global
      dirty list once i_mode was changed, so a later checkpoint triggered by
      fsync would not flush the latest i_mode to disk. Fix it.

      Reported-by: Seulbae Kim <seulbae@gatech.edu>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c55d13d9
    • Ronnie Sahlberg's avatar
      cifs: return -ENODATA when deleting an xattr that does not exist · f0f1c97f
      Ronnie Sahlberg authored
      [ Upstream commit 21094641 ]
      
      BUGZILLA: https://bugzilla.kernel.org/show_bug.cgi?id=202007
      
      When deleting an xattr/EA:
      SMB2/3 servers will return SUCCESS when clients delete non-existing EAs.
      This means that we need to first QUERY the server and check if the EA
      exists or not so that we can return -ENODATA correctly when this happens.
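
      The query-then-delete approach can be modelled in userspace like this.
      ea_store and the server_*/client_* functions are hypothetical; the model
      server reports success for every delete, as described above, so the
      client-side check is what produces -ENODATA.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_EAS 4

/* Hypothetical in-memory stand-in for the EAs stored on a server. */
struct ea_store {
	const char *names[MAX_EAS];
};

/* Returns the slot index of the EA, or -1 if it does not exist. */
static int server_query_ea(const struct ea_store *s, const char *name)
{
	for (int i = 0; i < MAX_EAS; i++)
		if (s->names[i] != NULL && strcmp(s->names[i], name) == 0)
			return i;
	return -1;
}

/* Models the server behaviour: deleting always reports success (0),
 * whether or not the EA existed. */
static int server_delete_ea(struct ea_store *s, const char *name)
{
	int i = server_query_ea(s, name);

	if (i >= 0)
		s->names[i] = NULL;
	return 0;
}

/* Client side: query first so a missing EA can be reported as -ENODATA. */
static int client_remove_xattr(struct ea_store *s, const char *name)
{
	if (server_query_ea(s, name) < 0)
		return -ENODATA;
	return server_delete_ea(s, name);
}
```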

      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      f0f1c97f
    • Jaegeuk Kim's avatar
      f2fs: sync filesystem after roll-forward recovery · 3b2457ce
      Jaegeuk Kim authored
      [ Upstream commit 812a9597 ]
      
      Some works after roll-forward recovery can get an error which will release
      all the data structures. Let's flush them in order to make it clean.
      
      One possible corruption came from:
      
      [   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
      [   90.675349] Call trace:
      [   90.677869]  __list_del_entry_valid+0x94/0xb4
      [   90.682351]  remove_dirty_inode+0xac/0x114
      [   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
      [   90.691302]  f2fs_write_data_pages+0x40/0x4c
      [   90.695695]  do_writepages+0x80/0xf0
      [   90.699372]  __writeback_single_inode+0xdc/0x4ac
      [   90.704113]  writeback_sb_inodes+0x280/0x440
      [   90.708501]  wb_writeback+0x1b8/0x3d0
      [   90.712267]  wb_workfn+0x1a8/0x4d4
      [   90.715765]  process_one_work+0x1c0/0x3d4
      [   90.719883]  worker_thread+0x224/0x344
      [   90.723739]  kthread+0x120/0x130
      [   90.727055]  ret_from_fork+0x10/0x18

      Reported-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3b2457ce
    • Darrick J. Wong's avatar
      ext4: prohibit fstrim in norecovery mode · 8bc6ef89
      Darrick J. Wong authored
      [ Upstream commit 18915b58 ]
      
      The ext4 fstrim implementation uses the block bitmaps to find free space
      that can be discarded.  If we haven't replayed the journal, the bitmaps
      will be stale and we absolutely *cannot* use stale metadata to zap the
      underlying storage.
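
      A minimal sketch of such a guard, assuming a simplified superblock
      model. sb_model and fstrim_model are hypothetical, and returning -EROFS
      is an assumption made for this example rather than something the text
      above states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount state that matters here. */
struct sb_model {
	bool norecovery;    /* mounted without replaying the journal */
	bool has_journal;   /* the filesystem has a journal at all */
	int trims_issued;   /* counts discards the model let through */
};

/* Refuse to trim when the journal was not replayed: the block bitmaps
 * may be stale and must not guide discards. */
static int fstrim_model(struct sb_model *sb)
{
	if (sb->norecovery && sb->has_journal)
		return -EROFS;  /* assumed error code for this sketch */
	sb->trims_issued++;
	return 0;
}
```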

      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      8bc6ef89
    • Kairui Song's avatar
      x86/gart: Exclude GART aperture from kcore · 64253073
      Kairui Song authored
      [ Upstream commit ffc8599a ]
      
      On machines where the GART aperture is mapped over physical RAM,
      /proc/kcore contains the GART aperture range. Accessing the GART range via
      /proc/kcore results in a kernel crash.
      
      vmcore used to have the same issue, until it was fixed with commit
      2a3e83c6 ("x86/gart: Exclude GART aperture from vmcore"), which leverages
      the existing hook infrastructure in vmcore to let /proc/vmcore return
      zeroes when attempting to read the aperture region, so that it won't read
      from the actual memory.
      
      Apply the same workaround for kcore. First implement the same hook
      infrastructure for kcore, then reuse the hook functions introduced in the
      previous vmcore fix, with some minor adjustments: rename some functions
      for more general usage, and simplify the hook infrastructure a bit as there
      is no module usage yet.

      Suggested-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Jiri Bohac <jbohac@suse.cz>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Dave Young <dyoung@redhat.com>
      Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      64253073
    • Paulo Alcantara (SUSE)'s avatar
      cifs: Fix slab-out-of-bounds when tracing SMB tcon · 14bec2dd
      Paulo Alcantara (SUSE) authored
      [ Upstream commit 68ddb496 ]
      
      This patch fixes the following KASAN report:
      
      [  779.044746] BUG: KASAN: slab-out-of-bounds in string+0xab/0x180
      [  779.044750] Read of size 1 at addr ffff88814f327968 by task trace-cmd/2812
      
      [  779.044756] CPU: 1 PID: 2812 Comm: trace-cmd Not tainted 5.1.0-rc1+ #62
      [  779.044760] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
      [  779.044761] Call Trace:
      [  779.044769]  dump_stack+0x5b/0x90
      [  779.044775]  ? string+0xab/0x180
      [  779.044781]  print_address_description+0x6c/0x23c
      [  779.044787]  ? string+0xab/0x180
      [  779.044792]  ? string+0xab/0x180
      [  779.044797]  kasan_report.cold.3+0x1a/0x32
      [  779.044803]  ? string+0xab/0x180
      [  779.044809]  string+0xab/0x180
      [  779.044816]  ? widen_string+0x160/0x160
      [  779.044822]  ? vsnprintf+0x5bf/0x7f0
      [  779.044829]  vsnprintf+0x4e7/0x7f0
      [  779.044836]  ? pointer+0x4a0/0x4a0
      [  779.044841]  ? seq_buf_vprintf+0x79/0xc0
      [  779.044848]  seq_buf_vprintf+0x62/0xc0
      [  779.044855]  trace_seq_printf+0x113/0x210
      [  779.044861]  ? trace_seq_puts+0x110/0x110
      [  779.044867]  ? trace_raw_output_prep+0xd8/0x110
      [  779.044876]  trace_raw_output_smb3_tcon_class+0x9f/0xc0
      [  779.044882]  print_trace_line+0x377/0x890
      [  779.044888]  ? tracing_buffers_read+0x300/0x300
      [  779.044893]  ? ring_buffer_read+0x58/0x70
      [  779.044899]  s_show+0x6e/0x140
      [  779.044906]  seq_read+0x505/0x6a0
      [  779.044913]  vfs_read+0xaf/0x1b0
      [  779.044919]  ksys_read+0xa1/0x130
      [  779.044925]  ? kernel_write+0xa0/0xa0
      [  779.044931]  ? __do_page_fault+0x3d5/0x620
      [  779.044938]  do_syscall_64+0x63/0x150
      [  779.044944]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  779.044949] RIP: 0033:0x7f62c2c2db31
      [ 779.044955] Code: fe ff ff 48 8d 3d 17 9e 09 00 48 83 ec 08 e8 96 02
      02 00 66 0f 1f 44 00 00 8b 05 fa fc 2c 00 48 63 ff 85 c0 75 13 31 c0
      0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48
      89
      [  779.044958] RSP: 002b:00007ffd6e116678 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [  779.044964] RAX: ffffffffffffffda RBX: 0000560a38be9260 RCX: 00007f62c2c2db31
      [  779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003
      [  779.044969] RBP: 00007f62c2ef5420 R08: 0000000000000000 R09: 0000000000000003
      [  779.044972] R10: ffffffffffffffa8 R11: 0000000000000246 R12: 00007ffd6e116710
      [  779.044975] R13: 0000000000002000 R14: 0000000000000d68 R15: 0000000000002000
      
      [  779.044981] Allocated by task 1257:
      [  779.044987]  __kasan_kmalloc.constprop.5+0xc1/0xd0
      [  779.044992]  kmem_cache_alloc+0xad/0x1a0
      [  779.044997]  getname_flags+0x6c/0x2a0
      [  779.045003]  user_path_at_empty+0x1d/0x40
      [  779.045008]  do_faccessat+0x12a/0x330
      [  779.045012]  do_syscall_64+0x63/0x150
      [  779.045017]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  779.045019] Freed by task 1257:
      [  779.045023]  __kasan_slab_free+0x12e/0x180
      [  779.045029]  kmem_cache_free+0x85/0x1b0
      [  779.045034]  filename_lookup.part.70+0x176/0x250
      [  779.045039]  do_faccessat+0x12a/0x330
      [  779.045043]  do_syscall_64+0x63/0x150
      [  779.045048]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  779.045052] The buggy address belongs to the object at ffff88814f326600
      which belongs to the cache names_cache of size 4096
      [  779.045057] The buggy address is located 872 bytes to the right of
      4096-byte region [ffff88814f326600, ffff88814f327600)
      [  779.045058] The buggy address belongs to the page:
      [  779.045062] page:ffffea00053cc800 count:1 mapcount:0 mapping:ffff88815b191b40 index:0x0 compound_mapcount: 0
      [  779.045067] flags: 0x200000000010200(slab|head)
      [  779.045075] raw: 0200000000010200 dead000000000100 dead000000000200 ffff88815b191b40
      [  779.045081] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  779.045083] page dumped because: kasan: bad access detected
      
      [  779.045085] Memory state around the buggy address:
      [  779.045089]  ffff88814f327800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045093]  ffff88814f327880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045097] >ffff88814f327900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045099]                                                           ^
      [  779.045103]  ffff88814f327980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045107]  ffff88814f327a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045109] ==================================================================
      [  779.045110] Disabling lock debugging due to kernel taint
      
      Correctly assign tree name str for smb3_tcon event.

      Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      14bec2dd
    • Steve French's avatar
      fix incorrect error code mapping for OBJECTID_NOT_FOUND · a419571b
      Steve French authored
      [ Upstream commit 85f9987b ]
      
      It was mapped to EIO, which can be confusing when user space
      queries for an object GUID on an object for which the server
      file system doesn't support one (or hasn't saved one).
      
      As Amir Goldstein suggested, this is similar to ENOATTR
      (equivalently ENODATA in Linux errno definitions), so
      change the NT STATUS code mapping for OBJECTID_NOT_FOUND
      to ENODATA.
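
      The kind of table change described here can be sketched as follows. The
      enum values are placeholders and not real NT STATUS codes, and the -EIO
      fallback for unmapped statuses is an assumption of this model;
      model_status, status_map and map_status are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder status identifiers; not real NT STATUS values. */
enum model_status {
	MODEL_STATUS_OK,
	MODEL_STATUS_OBJECTID_NOT_FOUND,
	MODEL_STATUS_OTHER
};

struct status_map {
	enum model_status status;
	int posix_error;
};

/* Mapping table: the object-ID status now maps to -ENODATA. */
static const struct status_map status_table[] = {
	{ MODEL_STATUS_OK, 0 },
	{ MODEL_STATUS_OBJECTID_NOT_FOUND, -ENODATA },
};

/* Look the status up in the table; unmapped statuses fall back to -EIO. */
static int map_status(enum model_status st)
{
	size_t n = sizeof(status_table) / sizeof(status_table[0]);

	for (size_t i = 0; i < n; i++)
		if (status_table[i].status == st)
			return status_table[i].posix_error;
	return -EIO;
}
```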

      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a419571b
    • Xiaoli Feng's avatar
      cifs: fix that return -EINVAL when do dedupe operation · 21edc981
      Xiaoli Feng authored
      [ Upstream commit b073a080 ]
      
      The dedupe_file_range operation has been combined into
      remap_file_range, but dedupe operations are always skipped in
      cifs_remap_file_range.
      
      Example to test:
      Before this patch:
        # dd if=/dev/zero of=cifs/file bs=1M count=1
        # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
        XFS_IOC_FILE_EXTENT_SAME: Invalid argument
      
      After this patch:
        # dd if=/dev/zero of=cifs/file bs=1M count=1
        # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
        XFS_IOC_FILE_EXTENT_SAME: Operation not supported
      
      Influence on xfstests:
      generic/091
      generic/112
      generic/127
      generic/263
      These tests now report the error "do_copy_range:: Invalid
      argument" instead of "FIDEDUPERANGE: Invalid argument",
      because two remaining bugs still cause these tests to fail:
      https://bugzilla.kernel.org/show_bug.cgi?id=202935
      https://bugzilla.kernel.org/show_bug.cgi?id=202785

      Signed-off-by: Xiaoli Feng <fengxiaoli0714@gmail.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      21edc981