1. 17 Dec, 2018 1 commit
    • block: mq-deadline: Fix write completion handling · 7211aef8
      Damien Le Moal authored
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue, waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() is not called. This in turn causes
      the call to blk_mq_sched_restart() from __blk_mq_free_request() to not
      run the queue when the already dispatched write request completes. The
      newly inserted write request then stays stuck in the scheduler queue
      until another request is eventually submitted.
      
      This problem does not affect SCSI disks, as the SCSI stack handles
      queue restarts on request completion. However, the problem can be
      triggered with the nullblk driver with zoned mode enabled.
      
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
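      
      A minimal sketch of the dispatch-side change (condensed; the exact
      zoned-device test and list check are assumptions based on the
      description above):
      
      static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
      {
              struct deadline_data *dd = hctx->queue->elevator->elevator_data;
              struct request *rq;
      
              spin_lock(&dd->lock);
              rq = __dd_dispatch_request(dd);
              /* The fix: if nothing could be dispatched but writes are
               * still queued (blocked on a zone write lock), mark the
               * hctx for restart so the queue is re-run when the
               * in-flight write completes. */
              if (!rq && blk_queue_is_zoned(hctx->queue) &&
                  !list_empty(&dd->fifo_list[WRITE]))
                      blk_mq_sched_mark_restart_hctx(hctx);
              spin_unlock(&dd->lock);
      
              return rq;
      }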
      
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      
      Add missing export of blk_mq_sched_restart()
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7211aef8
  2. 07 Nov, 2018 2 commits
  3. 24 May, 2018 1 commit
  4. 01 Mar, 2018 1 commit
    • mq-deadline: Make sure to always unlock zones · f3bc78d2
      Damien Le Moal authored
      In case of a failed write request (all retries failed) and when using
      libata, the SCSI error handler calls scsi_finish_command(). In the
      case of blk-mq this means that scsi_mq_done() does not get called,
      that blk_mq_complete_request() does not get called and also that the
      mq-deadline .completed_request() method is not called. This results in
      the target zone of the failed write request being left in a locked
      state, preventing any new write requests from being issued to the same
      zone.
      
      Fix this by replacing the .completed_request() method with the
      .finish_request() method as this method is always called whether or
      not a request completes successfully. Since the .finish_request()
      method is only called by the blk-mq core if a .prepare_request()
      method exists, add a dummy .prepare_request() method.
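      
      Sketched as an ops-table change (condensed; the dummy handler's
      exact signature is an assumption):
      
      static void dd_prepare_request(struct request *rq)
      {
              /* Dummy: exists only so that blk-mq calls .finish_request. */
      }
      
      static void dd_finish_request(struct request *rq)
      {
              /* Runs for every request, failed or not, so the zone
               * write lock is always released here. */
              if (blk_queue_is_zoned(rq->q))
                      blk_req_zone_write_unlock(rq);
      }
      
      /* In the elevator ops, .completed_request is replaced by the
       * .prepare_request/.finish_request pair above. */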
      
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      [ bvanassche: edited patch description ]
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f3bc78d2
  5. 06 Jan, 2018 1 commit
  6. 05 Jan, 2018 2 commits
    • mq-deadline: Introduce zone locking support · 5700f691
      Damien Le Moal authored
      Introduce zone write locking to avoid write request reordering with
      zoned block devices. This is achieved using a finer selection of the
      next request to dispatch:
      1) Any non-write request is always allowed to proceed.
      2) Any write to a conventional zone is always allowed to proceed.
      3) For a write to a sequential zone, the zone lock is first checked.
         a) If the zone is not locked, the write is allowed to proceed after
            its target zone is locked.
         b) If the zone is locked, the write request is skipped and the next
            request in the dispatch queue is tested (back to step 1).
      
      For a write request that has locked its target zone, the zone is
      unlocked either when the request completes with a call to the method
      deadline_request_completed() or when the request is requeued using
      dd_insert_request().
      
      Requests targeting a locked zone are always left in the scheduler queue
      to preserve the LBA ordering of write requests. If no write request
      can be dispatched, allow reads to be dispatched even if the write batch
      is not done.
      
      If the device used is not a zoned block device, or if zoned block device
      support is disabled, this patch does not modify mq-deadline behavior.
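      
      A compact model of the selection rules above (helper names here are
      illustrative, not the exact kernel identifiers):
      
      /* Scan candidate requests in sort order; rule numbers refer to
       * the list above. */
      static struct request *deadline_next_dispatchable(struct request *rq)
      {
              while (rq) {
                      /* 1) and 2): non-writes and writes to conventional
                       * zones always proceed. */
                      if (!op_is_write(req_op(rq)) || !blk_rq_zone_is_seq(rq))
                              return rq;
                      /* 3a): lock the target zone and dispatch. */
                      if (blk_req_zone_write_trylock(rq))
                              return rq;
                      /* 3b): zone already locked; skip to the next one. */
                      rq = deadline_latter_request(rq);
              }
              return NULL;
      }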
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5700f691
    • mq-deadline: Introduce dispatch helpers · bf09ce56
      Damien Le Moal authored
      Avoid directly referencing the next_rq and fifo_list arrays by using
      the helper functions deadline_next_request() and deadline_fifo_request().
      This facilitates changes to the dispatch request selection in
      __dd_dispatch_request() for zoned block devices.
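      
      The helpers are small; roughly (condensed and simplified):
      
      /* Oldest request on the FIFO for the given direction, or NULL. */
      static struct request *
      deadline_fifo_request(struct deadline_data *dd, int data_dir)
      {
              if (list_empty(&dd->fifo_list[data_dir]))
                      return NULL;
              return rq_entry_fifo(dd->fifo_list[data_dir].next);
      }
      
      /* Next request in sector sort order for the given direction. */
      static struct request *
      deadline_next_request(struct deadline_data *dd, int data_dir)
      {
              return dd->next_rq[data_dir];
      }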
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bf09ce56
  7. 25 Oct, 2017 1 commit
  8. 29 Aug, 2017 1 commit
  9. 28 Aug, 2017 1 commit
  10. 04 May, 2017 1 commit
  11. 08 Feb, 2017 1 commit
  12. 03 Feb, 2017 1 commit
    • block: free merged request in the caller · e4d750c9
      Jens Axboe authored
      If we end up doing a request-to-request merge when we have completed
      a bio-to-request merge, we free the request from deep down in that
      path. For blk-mq-sched, the merge path has to hold the appropriate
      lock, but we don't need it for freeing the request. And in fact
      holding the lock is problematic, since we are now calling the
      mq sched put_rq_private() hook with the lock held. Other call paths
      do not hold this lock.
      
      Fix this inconsistency by ensuring that the caller frees a merged
      request. Then we can do it outside of the lock, making it both more
      efficient and fixing the blk-mq-sched problem of invoking parts of
      the scheduler with an unknown lock state.
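      
      The resulting pattern in a scheduler's bio-merge path looks roughly
      like this (condensed):
      
      struct request *free = NULL;
      bool ret;
      
      spin_lock(&dd->lock);
      /* A request-to-request merge may leave a now-empty victim request
       * behind; it is handed back through 'free' instead of being freed
       * under the scheduler lock. */
      ret = blk_mq_sched_try_merge(q, bio, &free);
      spin_unlock(&dd->lock);
      
      if (free)
              blk_mq_free_request(free);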
      Reported-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      e4d750c9
  13. 02 Feb, 2017 1 commit
  14. 31 Jan, 2017 1 commit
  15. 27 Jan, 2017 1 commit
  16. 17 Jan, 2017 2 commits
  17. 21 Jul, 2016 1 commit
    • block: do not merge requests without consulting with io scheduler · 72ef799b
      Tahsin Erdogan authored
      Before merging a bio into an existing request, the io scheduler is
      called to get its approval first. However, requests that come from a
      plug flush may get merged by the block layer without consulting the
      io scheduler.
      
      In the case of CFQ, this can cause fairness problems. For instance, if a
      request gets merged into a low weight cgroup's request, the high weight
      cgroup will now depend on the low weight cgroup to get scheduled. If the
      high weight cgroup needs that io request to complete before submitting
      more requests, it will also lose its timeslice.
      
      The following script demonstrates the problem. Group g1 has a low
      weight, g2 and g3 have equal high weights, but g2's requests are
      adjacent to g1's requests, so they are subject to merging. Due to
      these merges, g2 gets poor disk time allocation.
      
      cat > cfq-merge-repro.sh << "EOF"
      #!/bin/bash
      set -e
      
      IO_ROOT=/mnt-cgroup/io
      
      mkdir -p $IO_ROOT
      
      if ! mount | grep -qw $IO_ROOT; then
        mount -t cgroup none -oblkio $IO_ROOT
      fi
      
      cd $IO_ROOT
      
      for i in g1 g2 g3; do
        if [ -d $i ]; then
          rmdir $i
        fi
      done
      
      mkdir g1 && echo 10 > g1/blkio.weight
      mkdir g2 && echo 495 > g2/blkio.weight
      mkdir g3 && echo 495 > g3/blkio.weight
      
      RUNTIME=10
      
      (echo $BASHPID > g1/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=0k &> /dev/null)&
      
      (echo $BASHPID > g2/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=64k &> /dev/null)&
      
      (echo $BASHPID > g3/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=256k &> /dev/null)&
      
      sleep $((RUNTIME+1))
      
      for i in g1 g2 g3; do
        echo ---- $i ----
        cat $i/blkio.time
      done
      
      EOF
      # ./cfq-merge-repro.sh
      ---- g1 ----
      8:16 162
      ---- g2 ----
      8:16 165
      ---- g3 ----
      8:16 686
      
      After applying the patch:
      
      # ./cfq-merge-repro.sh
      ---- g1 ----
      8:16 90
      ---- g2 ----
      8:16 445
      ---- g3 ----
      8:16 471
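      
      Conceptually, the fix makes the request-to-request merge path ask the
      scheduler the same question the bio merge path already asks; as a
      sketch (the wrapper name here is illustrative, not the exact hook):
      
      static bool attempt_merge_allowed(struct request_queue *q,
                                        struct request *rq,
                                        struct request *next)
      {
              /* Contiguity/attribute checks happen elsewhere; the fix
               * adds this consultation so CFQ can veto cross-cgroup
               * merges coming from plug flushes. */
              return elv_allow_rq_merge(q, rq, next);
      }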
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      72ef799b
  18. 28 Jun, 2016 1 commit
    • block: Convert fifo_time from ulong to u64 · 9828c2c6
      Jan Kara authored
      Currently rq->fifo_time is an unsigned long, but CFQ stores a
      nanosecond timestamp in it, which would overflow on 32-bit archs.
      Convert it to u64 to avoid the overflow. Since rq->fifo_time is
      unioned with struct call_single_data, this does not change the size
      of struct request in any way.
      
      We have to slightly fix up block/deadline-iosched.c so that the
      comparison happens with the right types.
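      
      In deadline, fifo_time holds a jiffies value, so after the widening
      the comparison site casts back explicitly; roughly the shape of the
      fixup:
      
      /* fifo_time is now u64, but deadline stores jiffies in it, so the
       * expiry check casts back to unsigned long for time_after_eq(): */
      struct request *rq = rq_entry_fifo(dd->fifo_list[ddir].next);
      
      if (time_after_eq(jiffies, (unsigned long)rq->fifo_time))
              /* the oldest queued request has expired */;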
      
      Fixes: 9a7f38c4
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      9828c2c6
  19. 01 Feb, 2016 1 commit
  20. 24 Feb, 2014 1 commit
  21. 11 Sep, 2013 1 commit
  22. 03 Jul, 2013 1 commit
    • elevator: Fix a race in elevator switching · d50235b7
      majianpeng authored
      There's a race between elevator switching and normal io operation:
      the allocations of struct elevator_queue and struct elevator_data do
      not happen in one atomic operation, so there is a window in which a
      NULL ->elevator_data can be used. For example:
      
          Thread A:                               Thread B
          blk_queue_bio                           elevator_switch
          spin_lock_irq(q->queue_lock)            elevator_alloc
          elv_merge                               elevator_init_fn
      
      Because elevator_alloc() is called without holding queue_lock,
      ->elevator_data is still NULL while, at the same time, thread A
      calls elv_merge() and needs information from elevator_data, so the
      crash happens.
      
      Move elevator_alloc() into elevator_init_fn() so that the two
      operations happen atomically.
      
      The bug is easily reproduced with the following method:
      1: dd if=/dev/sdb of=/dev/null
      2: while true; do echo noop > scheduler; echo deadline > scheduler; done
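      
      Condensed from the noop scheduler after the patch (initialization
      details abbreviated): allocation and publication now happen in one
      place, and q->elevator is only set once elevator_data is valid:
      
      static int noop_init_queue(struct request_queue *q,
                                 struct elevator_type *e)
      {
              struct elevator_queue *eq = elevator_alloc(q, e);
              struct noop_data *nd;
      
              if (!eq)
                      return -ENOMEM;
      
              nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node);
              if (!nd) {
                      kobject_put(&eq->kobj);
                      return -ENOMEM;
              }
              eq->elevator_data = nd;
      
              spin_lock_irq(q->queue_lock);
              q->elevator = eq;       /* published fully initialized */
              spin_unlock_irq(q->queue_lock);
              return 0;
      }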
      Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d50235b7
  23. 23 Mar, 2013 1 commit
    • block: Add bio_end_sector() · f73a1c7d
      Kent Overstreet authored
      Just a little convenience macro. The main reason to add it now is to
      prepare for immutable bio vecs: it'll reduce the size of the patch
      that puts bi_sector/bi_size/bi_idx into a struct bvec_iter.
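      
      The macro itself is a one-liner; at the time it amounted to:
      
      #define bio_end_sector(bio)     ((bio)->bi_sector + bio_sectors((bio)))
      
      In deadline, for example, it lets the front-merge lookup use
      elv_rb_find(&dd->sort_list[data_dir], bio_end_sector(bio)) instead
      of open-coding the sector arithmetic.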
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
      CC: Jiri Kosina <jkosina@suse.cz>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: dm-devel@redhat.com
      CC: Neil Brown <neilb@suse.de>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Heiko Carstens <heiko.carstens@de.ibm.com>
      CC: linux-s390@vger.kernel.org
      CC: Chris Mason <chris.mason@fusionio.com>
      CC: Steven Whitehouse <swhiteho@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      f73a1c7d
  24. 09 Dec, 2012 1 commit
  25. 06 Mar, 2012 1 commit
    • elevator: make elevator_init_fn() return 0/-errno · b2fab5ac
      Tejun Heo authored
      elevator_ops->elevator_init_fn() has a weird return value.  It returns
      a void * which the caller should assign to q->elevator->elevator_data,
      and a %NULL return denotes init failure.
      
      Update such that it returns integer 0/-errno and sets elevator_data
      directly as necessary.
      
      This makes the interface more conventional and eases further cleanup.
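      
      The signature change, side by side (sketch):
      
      /* before */ void *(*elevator_init_fn)(struct request_queue *q);
                   /* caller stores the result in elevator_data;
                    * NULL means failure */
      /* after  */ int (*elevator_init_fn)(struct request_queue *q);
                   /* returns 0 or -errno; the init function assigns
                    * q->elevator->elevator_data itself */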
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b2fab5ac
  26. 13 Dec, 2011 1 commit
    • block, cfq: move icq cache management to block core · 3d3c2379
      Tejun Heo authored
      Let elevators set ->icq_size and ->icq_align in elevator_type, and
      have elv_register() and elv_unregister() respectively create and
      destroy the kmem_cache for icq.
      
      * elv_register() now can return failure.  All callers updated.
      
      * icq caches are automatically named "ELVNAME_io_cq".
      
      * cfq_slab_setup/kill() are collapsed into cfq_init/exit().
      
      * While at it, minor indentation change for iosched_cfq.elevator_name
        for consistency.
      
      This will help move icq management to the block core.  It doesn't
      introduce any functional change.
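      
      A sketch of the registration-side change (field names and cache
      flags abbreviated; treat details as approximate):
      
      int elv_register(struct elevator_type *e)
      {
              /* Elevators that use io_cq set icq_size/icq_align; the
               * core builds a "ELVNAME_io_cq" cache for them. */
              if (e->icq_size) {
                      snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
                               "%s_io_cq", e->elevator_name);
                      e->icq_cache = kmem_cache_create(e->icq_cache_name,
                                                       e->icq_size,
                                                       e->icq_align, 0, NULL);
                      if (!e->icq_cache)
                              return -ENOMEM;
              }
              /* add to elv_list as before */
              return 0;
      }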
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3d3c2379
  27. 02 Jun, 2011 1 commit
    • iosched: prevent aliased requests from starving other I/O · 796d5116
      Jeff Moyer authored
      Hi, Jens,
      
      If you recall, I posted an RFC patch for this back in July of last year:
      http://lkml.org/lkml/2010/7/13/279
      
      The basic problem is that a process can issue a never-ending stream of
      async direct I/Os to the same sector on a device, thus starving out
      other I/O in the system (due to the way the alias handling works in both
      cfq and deadline).  The solution I proposed back then was to start
      dispatching from the fifo after a certain number of aliases had been
      dispatched.  Vivek asked why we had to treat aliases differently at all,
      and I never had a good answer.  So, I put together a simple patch which
      allows aliases to be added to the rb tree (it adds them to the right,
      though that doesn't matter as the order isn't guaranteed anyway).  I
      think this is the preferred solution, as it doesn't break up time slices
      in CFQ or batches in deadline.  I've tested it, and it does solve the
      starvation issue.  Let me know what you think.
      
      Cheers,
      Jeff
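      
      The change boils down to letting elv_rb_add() link equal-sector
      requests to the right instead of rejecting them; condensed:
      
      void elv_rb_add(struct rb_root *root, struct request *rq)
      {
              struct rb_node **p = &root->rb_node;
              struct rb_node *parent = NULL;
              struct request *__rq;
      
              while (*p) {
                      parent = *p;
                      __rq = rb_entry(parent, struct request, rb_node);
      
                      if (blk_rq_pos(rq) < blk_rq_pos(__rq))
                              p = &(*p)->rb_left;
                      else    /* >= : aliases simply go right */
                              p = &(*p)->rb_right;
              }
      
              rb_link_node(&rq->rb_node, parent, p);
              rb_insert_color(&rq->rb_node, root);
      }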
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      796d5116
  28. 10 Mar, 2011 1 commit
  29. 11 May, 2009 1 commit
    • block: convert to pos and nr_sectors accessors · 83096ebf
      Tejun Heo authored
      With recent cleanups, there is no place where a low level driver
      directly manipulates request fields.  This means that the 'hard'
      request fields always equal the !hard fields.  Convert all
      rq->sectors, nr_sectors and current_nr_sectors references to
      accessors.
      
      While at it, drop the superfluous blk_rq_pos() < 0 test in swim.c.
      
      [ Impact: use pos and nr_sectors accessors ]
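      
      The accessors read roughly as follows (simplified; the exact
      backing fields changed over time):
      
      static inline sector_t blk_rq_pos(const struct request *rq)
      {
              return rq->__sector;            /* canonical start position */
      }
      
      static inline unsigned int blk_rq_sectors(const struct request *rq)
      {
              return blk_rq_bytes(rq) >> 9;   /* total size in sectors */
      }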
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Tested-by: Grant Likely <grant.likely@secretlab.ca>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: Mike Miller <mike.miller@hp.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Dario Ballabio <ballabio_dario@emc.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      83096ebf
  30. 29 Dec, 2008 1 commit
  31. 09 Oct, 2008 2 commits
  32. 18 Dec, 2007 1 commit
  33. 02 Nov, 2007 3 commits
  34. 24 Jul, 2007 1 commit