1. 27 Jan, 2019 2 commits
  2. 02 Jan, 2019 1 commit
• block: don't use un-ordered __set_current_state(TASK_UNINTERRUPTIBLE) · 1ac5cd49
      Linus Torvalds authored
      This mostly reverts commit 849a3700 ("block: avoid ordered task
      state change for polled IO").  It was wrongly claiming that the ordering
      wasn't necessary.  The memory barrier _is_ necessary.
      
      If something is truly polling and not going to sleep, it's the whole
      state setting that is unnecessary, not the memory barrier.  Whenever you
      set your state to a sleeping state, you absolutely need the memory
      barrier.
      
      Note that sometimes the memory barrier can be elsewhere.  For example,
      the ordering might be provided by an external lock, or by setting the
      process state to sleeping before adding yourself to the wait queue list
      that is used for waking up (where the wait queue lock itself will
      guarantee that any wakeup will correctly see the sleeping state).
      
      But none of those cases were true here.
      
      NOTE! Some of the polling paths may indeed be able to drop the state
      setting entirely, at which point the memory barrier also goes away.
      
      (Also note that this doesn't revert the TASK_RUNNING cases: there is no
      race between a wakeup and setting the process state to TASK_RUNNING,
      since the end result doesn't depend on ordering).
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 28 Dec, 2018 1 commit
  4. 21 Dec, 2018 1 commit
• iomap: don't search past page end in iomap_is_partially_uptodate · 3cc31fa6
      Eric Sandeen authored
iomap_is_partially_uptodate() is intended to check whether blocks within
      the selected range of a not-uptodate page are uptodate; if the range we
      care about is up to date, it's an optimization.
      
      However, the iomap implementation continues to check all blocks up to
      from+count, which is beyond the page, and can even be well beyond the
      iop->uptodate bitmap.
      
      I think the worst that will happen is that we may eventually find a zero
      bit and return "not partially uptodate" when it would have otherwise
      returned true, and skip the optimization.  Still, it's clearly an invalid
      memory access that must be fixed.
      
      So: fix this by limiting the search to within the page as is done in the
      non-iomap variant, block_is_partially_uptodate().
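The clamp described above can be sketched as plain arithmetic. The helper name and signature are illustrative, not the kernel's; the point is only that the last byte checked must never pass the end of the page.

```c
/* Hypothetical sketch of the range clamp the fix applies: the search
 * for not-uptodate blocks must stop at the last block of the page,
 * even when from + count extends past it. block_bits = log2(blocksize). */
unsigned last_block_to_check(unsigned long from, unsigned long count,
                             unsigned page_size, unsigned block_bits)
{
    unsigned long last_byte = from + count - 1;
    unsigned long page_last = page_size - 1;

    /* the bug: searching up to (from + count) can run past the page
     * and past the iop->uptodate bitmap */
    if (last_byte > page_last)
        last_byte = page_last;  /* the fix: limit to within the page */

    return (unsigned)(last_byte >> block_bits);
}
```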
      
Zorro noticed this when KASAN went off for 512 byte blocks on a 64k
      page system:
      
       BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
       Read of size 8 at addr ffff800120c3a318 by task fsstress/22337
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  5. 20 Dec, 2018 1 commit
• iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()" · a837eca2
      Dave Chinner authored
      This reverts commit 61c6de66.
      
      The reverted commit added page reference counting to iomap page
      structures that are used to track block size < page size state. This
      was supposed to align the code with page migration page accounting
      assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
since picking up this commit two days ago has failed with bad page
state errors such as:
      
      # ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
      ....
      SECTION       -- xfs
      FSTYP         -- xfs (debug)
      PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
      MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
      MOUNT_OPTIONS -- /dev/sdc /mnt/scratch
      
      generic/038 454s ...
       run fstests generic/038 at 2018-12-20 18:43:05
       XFS (sdc): Unmounting Filesystem
       XFS (sdc): Mounting V5 Filesystem
       XFS (sdc): Ending clean mount
       BUG: Bad page state in process kswapd0  pfn:3a7fa
       page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
       flags: 0xfffffc0000000()
       raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
       raw: 0000000000000001 0000000000000000 00000000ffffffff
       page dumped because: non-NULL mapping
       CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
       Call Trace:
        dump_stack+0x67/0x90
        bad_page.cold.116+0x8a/0xbd
        free_pcppages_bulk+0x4bf/0x6a0
        free_unref_page_list+0x10f/0x1f0
        shrink_page_list+0x49d/0xf50
        shrink_inactive_list+0x19d/0x3b0
        shrink_node_memcg.constprop.77+0x398/0x690
        ? shrink_slab.constprop.81+0x278/0x3f0
        shrink_node+0x7a/0x2f0
        kswapd+0x34b/0x6d0
        ? node_reclaim+0x240/0x240
        kthread+0x11f/0x140
        ? __kthread_bind_mask+0x60/0x60
        ret_from_fork+0x24/0x30
       Disabling lock debugging due to kernel taint
      ....
      
The failures come from anywhere that frees pages and empties the
per-cpu page magazines, so it is neither a predictable failure nor an
easy one to debug.
      
generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failures on other
machines have occurred at random points in fstests runs, but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable reproducer.
      
      It is too close to the 4.20 release (not to mention holidays) to
      try to diagnose, fix and test the underlying cause of the problem,
      so reverting the commit is the only option we have right now. The
revert has been tested against a current top-of-tree 4.20-rc7+ kernel
across multiple machines running sub-page block size XFS filesystems
and none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 14 Dec, 2018 1 commit
• fs/iomap.c: get/put the page in iomap_page_create/release() · 61c6de66
      Piotr Jaroszynski authored
      migrate_page_move_mapping() expects pages with private data set to have
      a page_count elevated by 1.  This is what used to happen for xfs through
      the buffer_heads code before the switch to iomap in commit 82cb1417
      ("xfs: add support for sub-pagesize writeback without buffer_heads").
      Not having the count elevated causes move_pages() to fail on memory
      mapped files coming from xfs.
      
      Make iomap compatible with the migrate_page_move_mapping() assumption by
      elevating the page count as part of iomap_page_create() and lowering it
      in iomap_page_release().
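The accounting change can be modelled with a toy page structure. This is a sketch of the invariant only (a page with ->private set carries one extra reference); the struct and mock_ function names are illustrative, not the kernel's.

```c
/* Toy model of the refcount invariant migrate_page_move_mapping()
 * expects: attaching private data takes an extra page reference,
 * detaching it drops that reference. */
struct mock_page {
    int count;          /* stand-in for the page refcount */
    void *private;      /* stand-in for page_private() */
};

void mock_iomap_page_create(struct mock_page *page, void *iop)
{
    page->private = iop;
    page->count++;      /* get_page(): extra ref for ->private */
}

void mock_iomap_page_release(struct mock_page *page)
{
    page->private = 0;
    page->count--;      /* put_page(): drop the ->private ref */
}
```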
      
      It causes the move_pages() syscall to misbehave on memory mapped files
from xfs.  It does not move any pages, which I suppose is "just" a
      perf issue, but it also ends up returning a positive number which is out
      of spec for the syscall.  Talking to Michal Hocko, it sounds like
      returning positive numbers might be a necessary update to move_pages()
      anyway though
      (https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz).
      
      I only hit this in tests that verify that move_pages() actually moved
      the pages.  The test also got confused by the positive return from
      move_pages() (it got treated as a success as positive numbers were not
      expected and not handled) making it a bit harder to track down what's
      going on.
      
      Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com
      Fixes: 82cb1417 ("xfs: add support for sub-pagesize writeback without buffer_heads")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Dec, 2018 1 commit
• iomap: partially revert 4721a601 (simulated directio short read on EFAULT) · 8f67b5ad
      Darrick J. Wong authored
      In commit 4721a601, we tried to fix a problem wherein directio reads
      into a splice pipe will bounce EFAULT/EAGAIN all the way out to
      userspace by simulating a zero-byte short read.  This happens because
      some directio read implementations (xfs) will call
      bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
      reads, but as soon as we run out of pipe buffers that _get_pages call
      returns EFAULT, which the splice code translates to EAGAIN and bounces
      out to userspace.
      
      In that commit, the iomap code catches the EFAULT and simulates a
      zero-byte read, but that causes assertion errors on regular splice reads
      because xfs doesn't allow short directio reads.  This causes infinite
splice() loops and assertion failures on generic/095 on overlayfs
because xfs only permits total success or total failure of a directio
operation.  The underlying issue in the pipe splice code has now been
fixed by changing the pipe splice loop to avoid reading more data
than there is space in the pipe.
      
      Therefore, it's no longer necessary to simulate the short directio, so
      remove the hack from iomap.
      
      Fixes: 4721a601 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Ranted-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  8. 26 Nov, 2018 1 commit
• block: make blk_poll() take a parameter on whether to spin or not · 0a1b8b87
      Jens Axboe authored
      blk_poll() has always kept spinning until it found an IO. This is
      fine for SYNC polling, since we need to find one request we have
      pending, but in preparation for ASYNC polling it can be beneficial
      to just check if we have any entries available or not.
      
      Existing callers are converted to pass in 'spin == true', to retain
      the old behavior.
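The new contract can be sketched as a toy poll loop. All names here are illustrative (a real blk_poll() checks need_resched() and hardware queues, not a budget counter); the sketch only shows the behavioral split the parameter introduces.

```c
/* Toy model of the blk_poll() change: spin == true keeps polling for a
 * completion (bounded by a budget here so the model terminates), while
 * spin == false checks for available completions exactly once. */
typedef int (*poll_check_fn)(void *queue);

int mock_blk_poll(poll_check_fn check, void *queue, int spin, int budget)
{
    do {
        if (check(queue))
            return 1;               /* found a completed request */
    } while (spin && --budget > 0); /* SYNC callers pass spin == true */
    return 0;                       /* ASYNC: nothing there right now */
}

/* demo completion source: completes on the nth call */
int calls_until_done;

int demo_check(void *queue)
{
    (void)queue;
    return --calls_until_done <= 0;
}
```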
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      0a1b8b87
  9. 21 Nov, 2018 4 commits
• iomap: readpages doesn't zero page tail beyond EOF · 8c110d43
      Dave Chinner authored
      When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or that
should not contain data, so that mmap does not expose stale data to
      user applications.
      
      However, iomap_adjust_read_range() fails to detect EOF correctly,
      and so fsx on 1k block size filesystems fails very quickly with
      mapreads exposing data beyond EOF. There are two problems here.
      
      Firstly, when calculating the end block of the EOF byte, we have
      to round the size by one to avoid a block aligned EOF from reporting
      a block too large. i.e. a size of 1024 bytes is 1 block, which in
      index terms is block 0. Therefore we have to calculate the end block
      from (isize - 1), not isize.
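The rounding described above is easy to show in isolation. The helper names are illustrative; blkbits is log2(blocksize), as in the kernel.

```c
/* The first fix: the block index holding the last byte of the file
 * must be computed from (isize - 1), not isize, or a block-aligned
 * EOF reports one block too many. */
unsigned long eof_block(unsigned long long isize, unsigned blkbits)
{
    return (unsigned long)((isize - 1) >> blkbits);
}

unsigned long eof_block_buggy(unsigned long long isize, unsigned blkbits)
{
    return (unsigned long)(isize >> blkbits);  /* off by one when aligned */
}
```

With 1024-byte blocks (blkbits = 10), a 1024-byte file occupies exactly block 0, but the buggy form reports block 1.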
      
The second bug is determining if the current page spans EOF, and so
whether we need to split it into two halves, one for the IO and the
other for zeroing. Unfortunately, the code that checks whether
      we should split the block doesn't actually check if we span EOF, it
      just checks if the read spans the /offset in the page/ that EOF
      sits on. So it splits every read into two if EOF is not page
      aligned, regardless of whether we are reading the EOF block or not.
      
      Hence we need to restrict the "does the read span EOF" check to
      just the page that spans EOF, not every page we read.
      
      This patch results in correct EOF detection through readpages:
      
      xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
      xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
      iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
      xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
      iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
      iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
      iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864
      
      As you can see, it now does full page reads until the last one which
      is split correctly at the block aligned EOF, reading 3072 bytes and
      zeroing the last 1024 bytes. The original version of the patch got
      this right, but it got another case wrong.
      
The EOF crossing detection really needs to use the original length,
because plen, while it starts at the end of the block, will be
shortened as up-to-date blocks are found on the page. This means
"orig_pos + plen" no longer points to the end of the page, and so
will not correctly detect EOF crossing. Hence we have to use the
length passed in to detect this partial page case:
      
      xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
      xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
      xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
      iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
      xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
      iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296
      
Here we see a trace where the first block on the EOF page is up to
      date, hence poff = 1024 bytes. The offset into the page of EOF is
      3072, so the range we want to read is 1024 - 3071, and the range we
      want to zero is 3072 - 4095. You can see this is split correctly
      now.
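The difference between the two checks can be sketched with the numbers from that trace. Both helpers are illustrative, not the kernel functions.

```c
/* Whether a page read crosses EOF must be judged from the original
 * request position and length... */
int read_crosses_eof(unsigned long long orig_pos, unsigned long length,
                     unsigned long long isize)
{
    return orig_pos <= isize && orig_pos + length > isize;
}

/* ...because pos/plen shrink as up-to-date blocks are skipped at the
 * front of the page, so "pos + plen" can land exactly on EOF and the
 * crossing goes undetected. */
int read_crosses_eof_buggy(unsigned long long pos, unsigned long plen,
                           unsigned long long isize)
{
    return pos <= isize && pos + plen > isize;
}
```

Using the trace values (orig_pos 180224, length 4096, isize 183296), the correct check fires; the shortened pos 181248 with plen 2048 ends exactly at 183296 and the buggy check misses the crossing.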
      
      This fixes the stale data beyond EOF problem that fsx quickly
      uncovers on 1k block size filesystems.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
• iomap: dio data corruption and spurious errors when pipes fill · 4721a601
      Dave Chinner authored
When doing direct IO to a pipe for do_splice_direct(), the pipe is
      trivial to fill up and overflow as it can only hold 16 pages. At
      this point bio_iov_iter_get_pages() then returns -EFAULT, and we
      abort the IO submission process. Unfortunately, iomap_dio_rw()
      propagates the error back up the stack.
      
      The error is converted from the EFAULT to EAGAIN in
      generic_file_splice_read() to tell the splice layers that the pipe
      is full. do_splice_direct() completely fails to handle EAGAIN errors
      (it aborts on error) and returns EAGAIN to the caller.
      
      copy_file_write() then completely fails to handle EAGAIN as well,
      and so returns EAGAIN to userspace, having failed to copy the data
      it was asked to.
      
      Avoid this whole steaming pile of fail by having iomap_dio_rw()
      silently swallow EFAULT errors and so do short reads.
      
To make matters worse, iomap_dio_actor() has a stale data exposure
bug when bio_iov_iter_get_pages() fails - it does not zero the tail
of the block that may have been left uncovered by a partial IO. Fix
the error handling case to drop to the sub-block zeroing rather than
immediately returning the -EFAULT error.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
• iomap: sub-block dio needs to zeroout beyond EOF · b450672f
      Dave Chinner authored
      If we are doing sub-block dio that extends EOF, we need to zero
the unused tail of the block to initialise the data in it. If we
      do not zero the tail of the block, then an immediate mmap read of
      the EOF block will expose stale data beyond EOF to userspace. Found
      with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations.
      
      Fix this by detecting if the end of the DIO write is beyond EOF
      and zeroing the tail if necessary.
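The detect-and-zero logic reduces to a small range calculation. This is a sketch under the usual assumption that blocksize is a power of two; the helper name is illustrative.

```c
/* If a sub-block DIO write ends beyond EOF and is not block aligned,
 * the rest of the final block must be zeroed.  Returns the number of
 * bytes to zero after the write end, or 0 if no zeroing is needed.
 * blocksize must be a power of two. */
unsigned zero_tail_len(unsigned long long write_end,
                       unsigned long long isize, unsigned blocksize)
{
    unsigned off = (unsigned)(write_end & (blocksize - 1));

    if (write_end <= isize || off == 0)
        return 0;               /* within i_size, or block aligned */
    return blocksize - off;     /* zero out to the end of the block */
}
```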
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
• iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents · 0929d858
      Dave Chinner authored
      When we write into an unwritten extent via direct IO, we dirty
      metadata on IO completion to convert the unwritten extent to
      written. However, when we do the FUA optimisation checks, the inode
      may be clean and so we issue a FUA write into the unwritten extent.
      This means we then bypass the generic_write_sync() call after
unwritten extent conversion has been done and we don't force the
      modified metadata to stable storage.
      
      This violates O_DSYNC semantics. The window of exposure is a single
      IO, as the next DIO write will see the inode has dirty metadata and
      hence will not use the FUA optimisation. Calling
      generic_write_sync() after completion of the second IO will also
sync the first write and its metadata.
      
      Fix this by avoiding the FUA optimisation when writing to unwritten
      extents.
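The gating logic after the fix can be sketched as a predicate. The enum and parameter names are illustrative simplifications (the real check also considers things like device FUA support), but the added condition matches the fix described above.

```c
/* Sketch of the FUA gating after the fix: never issue a FUA write
 * into an unwritten extent, because IO completion will convert the
 * extent and dirty metadata that still needs generic_write_sync(). */
enum mock_extent { EXT_WRITTEN, EXT_UNWRITTEN };

int can_use_fua(int datasync_write, int inode_metadata_dirty,
                enum mock_extent type)
{
    if (!datasync_write || inode_metadata_dirty)
        return 0;               /* FUA optimisation never applied */
    if (type == EXT_UNWRITTEN)
        return 0;               /* the fix: conversion dirties metadata */
    return 1;
}
```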
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  10. 19 Nov, 2018 1 commit
  11. 16 Nov, 2018 1 commit
  12. 07 Nov, 2018 1 commit
  13. 26 Oct, 2018 1 commit
  14. 23 Oct, 2018 1 commit
  15. 18 Oct, 2018 1 commit
  16. 29 Sep, 2018 1 commit
• iomap: set page dirty after partial delalloc on mkwrite · 561295a3
      Brian Foster authored
      The iomap page fault mechanism currently dirties the associated page
      after the full block range of the page has been allocated. This
      leaves the page susceptible to delayed allocations without ever
      being set dirty on sub-page block sized filesystems.
      
      For example, consider a page fault on a page with one preexisting
      real (non-delalloc) block allocated in the middle of the page. The
      first iomap_apply() iteration performs delayed allocation on the
      range up to the preexisting block, the next iteration finds the
      preexisting block, and the last iteration attempts to perform
delayed allocation on the range after the preexisting block to the
      end of the page. If the first allocation succeeds and the final
      allocation fails with -ENOSPC, iomap_apply() returns the error and
      iomap_page_mkwrite() fails to dirty the page having already
      performed partial delayed allocation. This eventually results in the
      page being invalidated without ever converting the delayed
      allocation to real blocks.
      
      This problem is reliably reproduced by generic/083 on XFS on ppc64
      systems (64k page size, 4k block size). It results in leaked
      delalloc blocks on inode reclaim, which triggers an assert failure
      in xfs_fs_destroy_inode() and filesystem accounting inconsistency.
      
      Move the set_page_dirty() call from iomap_page_mkwrite() to the
      actor callback, similar to how the buffer head implementation works.
The actor callback is called iff ->iomap_begin() returns success, which
ensures the page is dirtied as soon as possible after an allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
  17. 14 Aug, 2018 1 commit
  18. 12 Aug, 2018 1 commit
  19. 02 Aug, 2018 1 commit
• fs: fix iomap_bmap position calculation · 79b3dbe4
      Eric Sandeen authored
      The position calculation in iomap_bmap() shifts bno the wrong way,
      so we don't progress properly and end up re-mapping block zero
      over and over, yielding an unchanging physical block range as the
      logical block advances:
      
      # filefrag -Be file
       ext:   logical_offset:     physical_offset: length:   expected: flags:
         0:      0..       0:      21..        21:      1:             merged
         1:      1..       1:      21..        21:      1:         22: merged
      Discontinuity: Block 1 is at 21 (was 22)
         2:      2..       2:      21..        21:      1:         22: merged
      Discontinuity: Block 2 is at 21 (was 22)
         3:      3..       3:      21..        21:      1:         22: merged
      
      This breaks the FIBMAP interface for anyone using it (XFS), which
      in turn breaks LILO, zipl, etc.
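The wrong-way shift is easy to demonstrate. These helpers are illustrative, not the kernel code; blkbits is log2(blocksize).

```c
/* The position must be advanced by shifting the logical block number
 * *up* by the block size bits, converting a block number to a byte
 * offset. */
unsigned long long bmap_pos(unsigned long long bno, unsigned blkbits)
{
    return bno << blkbits;      /* correct: block -> byte offset */
}

/* Shifting the wrong way truncates every small block number to zero,
 * so the lookup keeps re-mapping block zero as the logical block
 * advances. */
unsigned long long bmap_pos_buggy(unsigned long long bno, unsigned blkbits)
{
    return bno >> blkbits;      /* bug: stuck at offset 0 */
}
```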
Bug-actually-spotted-by: Darrick J. Wong <darrick.wong@oracle.com>
      Fixes: 89eb1906 ("iomap: add an iomap-based bmap implementation")
      Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  20. 12 Jul, 2018 1 commit
• iomap: add support for sub-pagesize buffered I/O without buffer heads · 9dc55f13
      Christoph Hellwig authored
      After already supporting a simple implementation of buffered writes for
the blocksize == PAGE_SIZE case in the last commit, this adds full support
even for smaller block sizes. There are three bits of per-block
      information in the buffer_head structure that really matter for the iomap
      read and write path:
      
       - uptodate status (BH_uptodate)
       - marked as currently under read I/O (BH_Async_Read)
       - marked as currently under write I/O (BH_Async_Write)
      
      Instead of having new per-block structures this now adds a per-page
      structure called struct iomap_page to track this information in a slightly
      different form:
      
       - a bitmap for the per-block uptodate status.  For worst case of a 64k
         page size system this bitmap needs to contain 128 bits.  For the
         typical 4k page size case it only needs 8 bits, although we still
         need a full unsigned long due to the way the atomic bitmap API works.
       - two atomic_t counters are used to track the outstanding read and write
         counts
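The bitmap sizing in the first bullet is just the blocks-per-page ratio. The helper name is illustrative.

```c
/* Sizing of the per-page uptodate bitmap described above: one bit per
 * block, so bits = page size / block size. */
unsigned iomap_page_uptodate_bits(unsigned page_size, unsigned block_size)
{
    return page_size / block_size;
}
```

A 64k page with 512-byte blocks needs 128 bits (the stated worst case); a 4k page with 512-byte blocks needs only 8, though the atomic bitmap API still requires a full unsigned long.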
      
      There is quite a bit of boilerplate code as the buffered I/O path uses
various helper methods, but the actual code is very straightforward.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  21. 03 Jul, 2018 3 commits
  22. 20 Jun, 2018 1 commit
  23. 19 Jun, 2018 4 commits
  24. 05 Jun, 2018 1 commit
  25. 02 Jun, 2018 7 commits