1. 12 Apr, 2015 4 commits
  2. 26 Mar, 2015 1 commit
  3. 20 Jan, 2015 2 commits
  4. 14 Jan, 2015 1 commit
  5. 17 Nov, 2014 1 commit
    • fs: add freeze_super/thaw_super fs hooks · 48b6bca6
      Benjamin Marzinski authored
      Currently, freezing a filesystem involves calling freeze_super, which locks
      sb->s_umount and then calls the fs-specific freeze_fs hook. This makes it
      hard for gfs2 (and potentially other cluster filesystems) to use the vfs
      freezing code to do freezes on all the cluster nodes.
      
      In order to communicate that a freeze has been requested, and to make sure
      that only one node is trying to freeze at a time, gfs2 uses a glock
      (sd_freeze_gl). The problem is that there is no hook for gfs2 to acquire
      this lock before calling freeze_super. This means that two nodes can
      attempt to freeze the filesystem by both calling freeze_super, acquiring
      the sb->s_umount lock, and then attempting to grab the cluster glock
      sd_freeze_gl. Only one will succeed, and the other will be stuck in
      freeze_super, making it impossible to finish freezing the node.
      
      To solve this problem, this patch adds the freeze_super and thaw_super
      hooks.  If a filesystem implements these hooks, they are called instead of
      the vfs freeze_super and thaw_super functions. This means that every
      filesystem that implements these hooks must call the vfs freeze_super and
      thaw_super functions itself within the hook function to make use of the vfs
      freezing code.
      Reviewed-by: Jan Kara <[email protected]>
      Signed-off-by: Benjamin Marzinski <[email protected]>
      Signed-off-by: Steven Whitehouse <[email protected]>
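The dispatch described in this commit message can be sketched in userspace C. This is a minimal model, not the kernel code: every `sketch_` type and function is a hypothetical stand-in, and an integer flag stands in for the cluster glock (sd_freeze_gl).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_sb;

struct sketch_sb_ops {
    /* Optional hook; NULL means "use the generic VFS path". */
    int (*freeze_super)(struct sketch_sb *sb);
};

struct sketch_sb {
    const struct sketch_sb_ops *ops;
    int cluster_lock_held;  /* stands in for gfs2's sd_freeze_gl */
    int frozen;
};

/* Stand-in for the generic VFS freeze_super(). */
static int sketch_vfs_freeze_super(struct sketch_sb *sb)
{
    if (sb->frozen)
        return -EBUSY;      /* already frozen */
    sb->frozen = 1;
    return 0;
}

/* Caller side: prefer the filesystem's hook when one is provided. */
static int sketch_freeze(struct sketch_sb *sb)
{
    assert(sb != NULL);
    if (sb->ops && sb->ops->freeze_super)
        return sb->ops->freeze_super(sb);
    return sketch_vfs_freeze_super(sb);
}

/*
 * A cluster filesystem's hook: take the cluster-wide lock first, then
 * call the generic freeze code itself, as the commit message requires.
 */
static int sketch_cluster_freeze_super(struct sketch_sb *sb)
{
    int err;

    sb->cluster_lock_held = 1;
    err = sketch_vfs_freeze_super(sb);
    if (err)
        sb->cluster_lock_held = 0;  /* back out on failure */
    return err;
}
```

The point of the ordering is that the hook acquires the cluster lock before the generic path runs, so a node that loses the race never enters the generic freeze code at all.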
  6. 31 Oct, 2014 1 commit
    • Return short read or 0 at end of a raw device, not EIO · b2de525f
      David Jeffery authored
      Author: David Jeffery <[email protected]>
      Changes to the basic direct I/O code have broken the raw driver when reading
      to the end of a raw device.  Instead of returning a short read for a read that
      extends partially beyond the device's end or 0 when at the end of the device,
      these reads now return EIO.
      
      The raw driver needs the same end of device handling as was added for normal
      block devices.  Using blkdev_read_iter, which has the needed size checks,
      prevents the EIO conditions at the end of the device.
      Signed-off-by: David Jeffery <[email protected]>
      Signed-off-by: Al Viro <[email protected]>
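The end-of-device behaviour this message asks for (full read, short read, or 0) amounts to clamping the request against the device size. A minimal userspace sketch follows; `sketch_clamp_read` is a hypothetical helper for illustration and is not the code of blkdev_read_iter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Clamp a read request against the device size.
 * Returns how many bytes may be read: the full count, a short count when
 * the request straddles the end of the device, or 0 at or past the end.
 */
static size_t sketch_clamp_read(uint64_t dev_size, uint64_t pos, size_t count)
{
    uint64_t remaining;

    if (pos >= dev_size)
        return 0;                   /* at or beyond the end: read nothing */

    remaining = dev_size - pos;
    if ((uint64_t)count > remaining)
        count = (size_t)remaining;  /* partial overlap: short read */

    assert(pos + count <= dev_size);
    return count;
}
```

For a 4096-byte device, a 512-byte read at offset 3840 is clamped to 256 bytes, and the same read at offset 4096 yields 0 rather than an error.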
  7. 10 Oct, 2014 1 commit
    • block_dev: implement readpages() to optimize sequential read · 447f05bb
      Akinobu Mita authored
      A sequential read from a block device is expected to be as fast as, or
      faster than, a read from a file on a filesystem.  In practice it is
      slower, because the address space operations for block devices lack an
      effective readpages().
      
      This implements the readpages() operation for block devices using
      mpage_readpages(), which can build multipage BIOs instead of one BIO per
      page and so reduces system CPU time.
      
      Install 1GB of RAM disk storage:
      
      	# modprobe scsi_debug dev_size_mb=1024 delay=0
      
      Sequential read from file on a filesystem:
      
      	# mkfs.ext4 /dev/$DEV
      	# mount /dev/$DEV /mnt
      	# fio --name=t --size=512m --rw=read --filename=/mnt/file
      	...
      	  read : io=524288KB, bw=2133.4MB/s, iops=546133, runt=   240msec
      
      Sequential read from a block device:
      	# fio --name=t --size=512m --rw=read --filename=/dev/$DEV
      	...
      (Without this commit)
      	  read : io=524288KB, bw=1700.2MB/s, iops=435455, runt=   301msec
      
      (With this commit)
      	  read : io=524288KB, bw=2160.4MB/s, iops=553046, runt=   237msec
      Signed-off-by: Akinobu Mita <[email protected]>
      Cc: Jens Axboe <[email protected]>
      Cc: Alexander Viro <[email protected]>
      Cc: Jeff Moyer <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
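To illustrate why multipage BIOs reduce per-request overhead, the sketch below counts how many requests are needed when runs of contiguous on-disk blocks are merged, compared with one request per page. It is a simplified userspace model: `sketch_count_bios` and its parameters are hypothetical, and the real mpage_readpages() logic handles many more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the requests needed to read `npages` pages whose on-disk block
 * numbers are given in `blocks`, merging consecutive blocks into one
 * request of at most `max_pages_per_bio` pages.
 * With max_pages_per_bio == 1 this degenerates to one request per page.
 */
static size_t sketch_count_bios(const uint64_t *blocks, size_t npages,
                                size_t max_pages_per_bio)
{
    size_t bios = 0;
    size_t in_bio = 0;  /* pages in the request currently being built */
    size_t i;

    assert(max_pages_per_bio > 0);

    for (i = 0; i < npages; i++) {
        int contiguous = i > 0 && blocks[i] == blocks[i - 1] + 1;

        /* Start a new request on a gap or when the current one is full. */
        if (in_bio == 0 || !contiguous || in_bio == max_pages_per_bio) {
            bios++;
            in_bio = 0;
        }
        in_bio++;
    }
    return bios;
}
```

A fully sequential read is the best case for merging, which is consistent with the sequential fio workload used in the measurements above.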
  8. 08 Sep, 2014 2 commits
    • bdi: reimplement bdev_inode_switch_bdi() · 018a17bd
      Tejun Heo authored
      A block_device may be attached to different gendisks and thus
      different bdis over time.  bdev_inode_switch_bdi() is used to switch
      the associated bdi.  The function assumes that the inode could be
      dirty and transfers it between bdis if so.  This is a bit nasty in
      that it reaches into bdi internals.
      
      This patch reimplements the function so that it writes out the inode
      if dirty.  This is a lot simpler and can be implemented without
      exposing bdi internals.
      Signed-off-by: Tejun Heo <[email protected]>
      Cc: Alexander Viro <[email protected]>
      Signed-off-by: Jens Axboe <[email protected]>
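The "write out if dirty, then switch" approach described above can be sketched as follows. This is a single-threaded userspace model with hypothetical `sketch_` types; it omits the locking that the kernel function needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a bdi and a block device inode. */
struct sketch_bdi {
    int writeouts;          /* how many inode write-outs this bdi served */
};

struct sketch_inode {
    struct sketch_bdi *bdi; /* bdi the inode is currently associated with */
    int dirty;
};

/* Stand-in for synchronously writing the inode through its current bdi. */
static void sketch_write_inode_now(struct sketch_inode *inode)
{
    inode->bdi->writeouts++;
    inode->dirty = 0;
}

/*
 * Switch the inode to `dst`. Rather than moving a dirty inode between
 * bdi-internal lists, clean it first so the switch is a plain pointer
 * update. In this single-threaded sketch the loop runs at most once.
 */
static void sketch_switch_bdi(struct sketch_inode *inode,
                              struct sketch_bdi *dst)
{
    assert(inode != NULL && dst != NULL);
    while (inode->dirty)
        sketch_write_inode_now(inode);
    inode->bdi = dst;
}
```

Because the inode is always clean at the moment of the switch, the function needs no knowledge of how a bdi tracks its dirty inodes.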
    • block, bdi: an active gendisk always has a request_queue associated with it · ff9ea323
      Tejun Heo authored
      bdev_get_queue() returns the request_queue associated with the
      specified block_device.  blk_get_backing_dev_info() makes use of
      bdev_get_queue() to determine the associated bdi given a block_device.
      
      All the callers of bdev_get_queue(), including
      blk_get_backing_dev_info(), assume that bdev_get_queue() may return
      NULL and implement NULL handling; however, bdev_get_queue() requires
      that the passed-in block_device be open and attached to its gendisk.
      Because an active gendisk always has a valid request_queue associated
      with it, bdev_get_queue() can never return NULL and neither can
      blk_get_backing_dev_info().
      
      Make it clear that neither of the two functions can return NULL and
      remove NULL handling from all the callers.
      Signed-off-by: Tejun Heo <[email protected]>
      Cc: Chris Mason <[email protected]>
      Cc: Dave Chinner <[email protected]>
      Signed-off-by: Jens Axboe <[email protected]>
  9. 12 Jun, 2014 1 commit
    • ->splice_write() via ->write_iter() · 8d020765
      Al Viro authored
      iter_file_splice_write() - a ->splice_write() instance that gathers the
      pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
      it to ->write_iter().  A bunch of simple cases converted to that...
      
      [AV: fixed the braino spotted by Cyrill]
      Signed-off-by: Al Viro <[email protected]>
  10. 04 Jun, 2014 1 commit
  11. 06 May, 2014 5 commits
  12. 03 Apr, 2014 2 commits
  13. 02 Apr, 2014 1 commit
  14. 04 Sep, 2013 1 commit
  15. 30 Jul, 2013 1 commit
  16. 09 Jul, 2013 1 commit
  17. 03 Jul, 2013 1 commit
  18. 29 Jun, 2013 1 commit
  19. 28 Jun, 2013 1 commit
    • writeback: Fix periodic writeback after fs mount · a5faeaf9
      Jan Kara authored
      Code in block_dev.c moves a device inode to default_backing_dev_info when
      the last reference to the device is put and moves the device inode back
      to its bdi when the first reference is acquired. This includes moving it
      to the wb.b_dirty list if the device inode is dirty. However, the code
      does not set up a timer to wake the corresponding flusher thread, and
      while the wb.b_dirty list is non-empty __mark_inode_dirty() will not set
      it up either. Thus periodic writeback is effectively disabled until a
      sync(2) call, which can lead to unexpected data loss in case of a crash
      or power failure.
      
      Fix the problem by setting up a timer for periodic writeback when we
      add the first dirty inode to the wb.b_dirty list in bdev_inode_switch_bdi().
      Reported-by: Bert De Jonghe <[email protected]>
      CC: [email protected] # >= 3.0
      Signed-off-by: Jan Kara <[email protected]>
      Signed-off-by: Jens Axboe <[email protected]>
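The interaction this message describes, where neither code path arms the timer, can be modelled with a counter and a flag. The sketch below is a hypothetical userspace model: the `sketch_` names are invented for illustration, and the dirty list is reduced to its length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a bdi's writeback state. */
struct sketch_wb {
    size_t nr_dirty;    /* length of the b_dirty list */
    int timer_armed;    /* 1 once periodic writeback has been scheduled */
};

/* Models the normal dirtying path: arm the timer only if the list was empty. */
static void sketch_mark_inode_dirty(struct sketch_wb *wb)
{
    if (wb->nr_dirty == 0)
        wb->timer_armed = 1;
    wb->nr_dirty++;
}

/* Pre-fix switch path: queues a dirty inode without arming the timer. */
static void sketch_switch_in_dirty_old(struct sketch_wb *wb)
{
    wb->nr_dirty++;
}

/* Post-fix switch path: arm the timer when this is the first dirty inode. */
static void sketch_switch_in_dirty_fixed(struct sketch_wb *wb)
{
    assert(wb != NULL);
    if (wb->nr_dirty == 0)
        wb->timer_armed = 1;
    wb->nr_dirty++;
}
```

With the old switch path, the list is already non-empty by the time the normal dirtying path runs, so its "list was empty" check never fires and the timer stays unarmed.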
  20. 08 May, 2013 1 commit
  21. 07 May, 2013 2 commits
  22. 01 May, 2013 1 commit
  23. 29 Apr, 2013 1 commit
  24. 01 Apr, 2013 1 commit
    • loop: prevent bdev freeing while device in use · c1681bf8
      Anatol Pomozov authored
      The lifecycle of struct block_device is defined by its inode (see
      fs/block_dev.c): the block_device is allocated the first time we access
      /dev/loopXX and deallocated in bdev_destroy_inode. When we create the
      device with "losetup /dev/loopXX afile", we want the block_device to stay
      alive until we destroy the loop device with "losetup -d".
      
      But because we do not hold a reference to the /dev/loopXX inode, its
      reference count drops to 0 and the inode/bdev can be destroyed at any
      moment. Usually this happens under memory pressure or when the user drops
      the inode cache (as in the test below). When loop_clr_fd() later uses the
      bdev, we get a use-after-free error with the following stack:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
        bd_set_size+0x10/0xa0
        loop_clr_fd+0x1f8/0x420 [loop]
        lo_ioctl+0x200/0x7e0 [loop]
        lo_compat_ioctl+0x47/0xe0 [loop]
        compat_blkdev_ioctl+0x341/0x1290
        do_filp_open+0x42/0xa0
        compat_sys_ioctl+0xc1/0xf20
        do_sys_open+0x16e/0x1d0
        sysenter_dispatch+0x7/0x1a
      
      To prevent use-after-free we need to grab the device in loop_set_fd()
      and put it later in loop_clr_fd().
      
      The issue is reproducible on current Linus head and v3.3. Here is the test:
      
        dd if=/dev/zero of=loop.file bs=1M count=1
        while [ true ]; do
          losetup /dev/loop0 loop.file
          echo 2 > /proc/sys/vm/drop_caches
          losetup -d /dev/loop0
        done
      
      [ Doing bdgrab/bdput in loop_set_fd/loop_clr_fd is safe, because every
        time we call loop_set_fd() we check that loop_device->lo_state is
        Lo_unbound and set it to Lo_bound.  If somebody tries to set_fd again,
        they will get EBUSY.  And if we try to loop_clr_fd() on an unbound loop
        device we'll get ENXIO.
      
        loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
        loop_device->lo_ctl_mutex. ]
      Signed-off-by: Anatol Pomozov <[email protected]>
      Cc: Al Viro <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
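The grab/put pairing and the state check described in the bracketed note can be sketched as below. This is a userspace model under stated assumptions: the `sketch_` types are hypothetical, a plain integer stands in for the reference taken by bdgrab() and dropped by bdput(), and the lo_ctl_mutex locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum sketch_lo_state { SKETCH_LO_UNBOUND, SKETCH_LO_BOUND };

/* Hypothetical stand-in for a block_device with a reference count. */
struct sketch_bdev {
    int refcount;
};

struct sketch_loop {
    enum sketch_lo_state state;
    struct sketch_bdev *bdev;
};

/* Bind: refuse if already bound, otherwise take a reference and hold it. */
static int sketch_loop_set_fd(struct sketch_loop *lo, struct sketch_bdev *bdev)
{
    assert(lo != NULL && bdev != NULL);
    if (lo->state != SKETCH_LO_UNBOUND)
        return -EBUSY;
    bdev->refcount++;           /* models bdgrab() */
    lo->bdev = bdev;
    lo->state = SKETCH_LO_BOUND;
    return 0;
}

/* Unbind: refuse if not bound, otherwise drop the reference taken above. */
static int sketch_loop_clr_fd(struct sketch_loop *lo)
{
    assert(lo != NULL);
    if (lo->state != SKETCH_LO_BOUND)
        return -ENXIO;
    lo->bdev->refcount--;       /* models bdput() */
    lo->bdev = NULL;
    lo->state = SKETCH_LO_UNBOUND;
    return 0;
}
```

Because the state check rejects a second bind and an unbind of an unbound device, every successful grab is matched by exactly one put.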
  25. 23 Feb, 2013 1 commit
  26. 22 Feb, 2013 3 commits
  27. 18 Dec, 2012 1 commit