1. 11 Apr, 2018 1 commit
    • blk-mq: Revert "blk-mq: reimplement blk_mq_hw_queue_mapped" · 2434af79
      Ming Lei authored
      This reverts commit 127276c6.
      
      When all CPUs of one hw queue become offline, there may still be IOs
      that have not completed on this hctx. But blk_mq_hw_queue_mapped() is
      called in blk_mq_queue_tag_busy_iter(), which is used to iterate
      requests in the timeout handler, so timeout events will be missed on
      the inactive hctx and requests may never be completed.
      
      Also, the reimplementation of blk_mq_hw_queue_mapped() no longer
      matches the helper's name; it should have been named
      blk_mq_hw_queue_active().
      
      Other callers also need further verification against this
      reimplementation.
      
      So revert this patch now; we can improve the hw queue
      activate/deactivate handling after adequate research and testing.
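      The missed-timeout scenario can be replayed with a toy model in plain
      user-space C (illustrative names only, not the real blk-mq
      structures): an iterator that skips hctxs whose CPUs are all offline
      never visits their in-flight requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hardware queue; not the real struct blk_mq_hw_ctx. */
struct toy_hctx {
    bool cpus_online;  /* what the reverted helper reported as "mapped" */
    int  inflight;     /* requests not yet completed on this hctx */
};

/* Count the in-flight requests a timeout-style iterator would visit. */
int toy_busy_iter(const struct toy_hctx *h, int n, bool skip_inactive)
{
    int visited = 0;
    for (int i = 0; i < n; i++) {
        if (skip_inactive && !h[i].cpus_online)
            continue;  /* the behavior reverted here: inactive hctx skipped */
        visited += h[i].inflight;
    }
    return visited;
}
```

      With one active and one inactive hctx, skipping the inactive one
      leaves its requests unseen, which is exactly why the timeout handler
      must still iterate it.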
      
      Cc: Stefan Haberland <sth@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Fixes: 127276c6 ("blk-mq: reimplement blk_mq_hw_queue_mapped")
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 10 Apr, 2018 10 commits
  3. 02 Apr, 2018 1 commit
  4. 28 Mar, 2018 1 commit
  5. 26 Mar, 2018 2 commits
  6. 22 Mar, 2018 1 commit
  7. 19 Mar, 2018 1 commit
    • block: Change a rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}() · 818e0fa2
      Bart Van Assche authored
      scsi_device_quiesce() uses synchronize_rcu() to guarantee that the
      effect of blk_set_preempt_only() will be visible for percpu_ref_tryget()
      calls that occur after the queue unfreeze by using the approach
      explained in https://lwn.net/Articles/573497/. The rcu read lock and
      unlock calls in blk_queue_enter() form a pair with the synchronize_rcu()
      call in scsi_device_quiesce(). Both scsi_device_quiesce() and
      blk_queue_enter() must either use regular RCU or RCU-sched.
      Since neither the RCU-protected code in blk_queue_enter() nor
      blk_queue_usage_counter_release() sleeps, regular RCU protection
      is sufficient. Note: scsi_device_quiesce() does not have to be
      modified since it already uses synchronize_rcu().
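      The pairing rule the patch relies on can be sketched as a toy model
      in C (this is not the RCU implementation; the counters merely model
      which readers a grace period waits for):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: each RCU flavor tracks its own outstanding readers. */
enum toy_flavor { TOY_RCU, TOY_RCU_SCHED };

static int toy_readers[2];

void toy_read_lock(enum toy_flavor f)   { toy_readers[f]++; }
void toy_read_unlock(enum toy_flavor f) { toy_readers[f]--; }

/* A synchronize call only waits for readers of its own flavor; it may
 * "finish" while readers of the other flavor are still running. */
bool toy_synchronize_may_finish(enum toy_flavor f)
{
    return toy_readers[f] == 0;
}
```

      A rcu_read_lock() section therefore blocks synchronize_rcu() but is
      invisible to a mismatched RCU-sched grace period, which is why the
      read-side pair and the synchronize call must use the same flavor.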
      Reported-by: Tejun Heo <tj@kernel.org>
      Fixes: 3a0a5299 ("block, scsi: Make SCSI quiesce and resume work reliably")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Martin Steigerwald <martin@lichtvoll.de>
      Cc: stable@vger.kernel.org # v4.15
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 17 Mar, 2018 2 commits
  9. 16 Mar, 2018 2 commits
    • blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir() · 4c699480
      Joseph Qi authored
      We've triggered a WARNING in blk_throtl_bio() when throttling writeback
      IO: it complains that blkg->refcnt is already 0 when calling blkg_get(),
      and the kernel then crashes with an invalid page request.
      After investigating this issue, we've found it is caused by a race
      between blkcg_bio_issue_check() and cgroup_rmdir(), which is described
      below:
      
      writeback kworker               cgroup_rmdir
                                        cgroup_destroy_locked
                                          kill_css
                                            css_killed_ref_fn
                                              css_killed_work_fn
                                                offline_css
                                                  blkcg_css_offline
        blkcg_bio_issue_check
          rcu_read_lock
          blkg_lookup
                                                    spin_trylock(q->queue_lock)
                                                    blkg_destroy
                                                    spin_unlock(q->queue_lock)
          blk_throtl_bio
          spin_lock_irq(q->queue_lock)
          ...
          spin_unlock_irq(q->queue_lock)
        rcu_read_unlock
      
      Since RCU can only prevent the blkg from being released while it is in
      use, blkg->refcnt can be decreased to 0 during blkg_destroy(), which
      schedules the blkg release. Trying to blkg_get() in blk_throtl_bio()
      will then trigger the WARNING, and the corresponding blkg_put() will
      schedule the blkg release again, resulting in a double free.
      This race was introduced by commit ae118896 ("blkcg: consolidate blkg
      creation in blkcg_bio_issue_check()"). Before that commit, the code
      would look up first and then look up/create again while holding
      queue_lock. Since reviving this logic would be rather drastic, fix the
      race instead by only offlining the pd during blkcg_css_offline() and
      moving the rest of the destruction (especially blkg_put()) into
      blkcg_css_free(), which should be the right way as discussed.
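      The refcount half of the race can be modeled in a few lines of C (toy
      names only; the real operations are blkg_get()/blkg_put()):

```c
#include <assert.h>

/* Toy blkg: counts how often release gets scheduled and how often a
 * get-on-zero WARN fires. Not the real struct blkcg_gq. */
struct toy_blkg {
    int refcnt;
    int releases_scheduled;
    int warnings;
};

void toy_blkg_get(struct toy_blkg *g)
{
    if (g->refcnt == 0)
        g->warnings++;            /* models the WARN_ON in blkg_get() */
    g->refcnt++;
}

void toy_blkg_put(struct toy_blkg *g)
{
    if (--g->refcnt == 0)
        g->releases_scheduled++;  /* models scheduling the blkg release */
}
```

      blkg_destroy() drops the last reference and schedules the release;
      the blkg_get() in blk_throtl_bio() then warns, and its matching
      blkg_put() schedules the release a second time, i.e. the double free.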
      
      Fixes: ae118896 ("blkcg: consolidate blkg creation in blkcg_bio_issue_check()")
      Reported-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: sed-opal: fix u64 short atom length · 5f990d31
      Jonas Rabenstein authored
      The length must be given as bytes and not as 4 bit tuples.
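      A one-line fix, but the arithmetic is worth spelling out: a u64
      payload is 8 bytes, which is 16 four-bit tuples, and only the byte
      count belongs in the atom's length field. A minimal sketch
      (illustrative helper names, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Correct: the short atom length field carries the payload size in bytes. */
unsigned int toy_u64_short_atom_len(void)
{
    return sizeof(uint64_t);          /* 8 bytes */
}

/* Buggy: counting 4-bit tuples doubles the value. */
unsigned int toy_u64_short_atom_len_buggy(void)
{
    return 2 * sizeof(uint64_t);      /* 16 nibbles -- wrong unit */
}
```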
      Reviewed-by: Scott Bauer <scott.bauer@intel.com>
      Signed-off-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 15 Mar, 2018 1 commit
  11. 13 Mar, 2018 3 commits
  12. 09 Mar, 2018 1 commit
  13. 08 Mar, 2018 5 commits
  14. 07 Mar, 2018 1 commit
  15. 01 Mar, 2018 3 commits
  16. 28 Feb, 2018 5 commits
    • block: Fix a race between request queue removal and the block cgroup controller · a063057d
      Bart Van Assche authored
      Avoid that the following race can occur:
      
      blk_cleanup_queue()               blkcg_print_blkgs()
        spin_lock_irq(lock) (1)           spin_lock_irq(blkg->q->queue_lock) (2,5)
          q->queue_lock = &q->__queue_lock (3)
        spin_unlock_irq(lock) (4)
                                          spin_unlock_irq(blkg->q->queue_lock) (6)
      
      (1) take driver lock;
      (2) busy loop for driver lock;
      (3) override driver lock with internal lock;
      (4) unlock driver lock;
      (5) can take driver lock now;
      (6) but unlock internal lock.
      
      This change is safe because only the SCSI core and the NVMe core keep
      a reference on a request queue after having called blk_cleanup_queue().
      Neither driver accesses any of the removed data structures between its
      blk_cleanup_queue() and blk_put_queue() calls.
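      The numbered steps above can be replayed deterministically with a toy
      lock in user-space C (illustrative names; the real locks are
      spinlocks and the override happens in blk_cleanup_queue()):

```c
#include <assert.h>

/* Toy single-owner lock standing in for a spinlock. */
struct toy_lock { int held; };

int toy_acquire(struct toy_lock *l) { if (l->held) return -1; l->held = 1; return 0; }
int toy_release(struct toy_lock *l) { if (!l->held) return -1; l->held = 0; return 0; }

/* Replay: lock through q->queue_lock, override the pointer with the
 * internal lock, then unlock through the pointer again. */
void toy_cleanup_race(int *unlock_err, int *driver_still_held)
{
    struct toy_lock driver_lock = {0}, internal_lock = {0};
    struct toy_lock *queue_lock = &driver_lock;

    toy_acquire(queue_lock);               /* (1): driver lock taken via pointer */
    queue_lock = &internal_lock;           /* (3): pointer overridden */
    *unlock_err = toy_release(queue_lock); /* (6): wrong lock, never held */
    *driver_still_held = driver_lock.held; /* driver lock never released */
}
```

      The unlock through the overridden pointer hits the internal lock
      (which was never taken) while the driver lock stays held, matching
      the imbalance in steps (3)-(6).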
      Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Jan Kara <jack@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Fix a race between the cgroup code and request queue initialization · 498f6650
      Bart Van Assche authored
      Initialize the request queue lock earlier such that the following
      race can no longer occur:
      
      blk_init_queue_node()             blkcg_print_blkgs()
        blk_alloc_queue_node (1)
          q->queue_lock = &q->__queue_lock (2)
          blkcg_init_queue(q) (3)
                                          spin_lock_irq(blkg->q->queue_lock) (4)
        q->queue_lock = lock (5)
                                          spin_unlock_irq(blkg->q->queue_lock) (6)
      
      (1) allocate an uninitialized queue;
      (2) initialize queue_lock to its default internal lock;
      (3) initialize blkcg part of request queue, which will create blkg and
          then insert it to blkg_list;
      (4) traverse blkg_list and find the created blkg, and then take its
          queue lock, here it is the default *internal lock*;
      (5) *race window*, now queue_lock is overridden with *driver specified
          lock*;
      (6) now unlock *driver specified lock*, not the locked *internal lock*,
          unlock balance breaks.
      
      The changes in this patch are as follows:
      - Move the .queue_lock initialization from blk_init_queue_node() into
        blk_alloc_queue_node().
      - Only override the .queue_lock pointer for legacy queues because it
        is not useful for blk-mq queues to override this pointer.
      - For all block drivers that initialize .queue_lock explicitly,
        change the blk_alloc_queue() call in the driver into a
        blk_alloc_queue_node() call and remove the explicit .queue_lock
        initialization. Additionally, initialize the spin lock that will
        be used as queue lock earlier if necessary.
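      The fix works because an observer that locks and unlocks through
      blkg->q->queue_lock stays balanced only if the pointer cannot change
      in between. A toy-lock sketch (illustrative names) of both orderings:

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-owner lock standing in for a spinlock. */
struct tlock { int held; };

int tlock_acquire(struct tlock *l) { if (l->held) return -1; l->held = 1; return 0; }
int tlock_release(struct tlock *l) { if (!l->held) return -1; l->held = 0; return 0; }

/* Observer (blkcg_print_blkgs-like): lock and unlock via the pointer.
 * If 'late_override' is non-NULL, the pointer is swapped in between,
 * modeling step (5). Returns 0 if balanced, -1 on unlock imbalance. */
int tlock_observer(struct tlock **queue_lock, struct tlock *late_override)
{
    tlock_acquire(*queue_lock);            /* (4) */
    if (late_override)
        *queue_lock = late_override;       /* (5): race window */
    return tlock_release(*queue_lock);     /* (6) */
}
```

      Moving the .queue_lock assignment into blk_alloc_queue_node() means
      the pointer is already final before blkcg_init_queue() publishes the
      blkg, so only the balanced ordering remains possible.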
      Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Add 'lock' as third argument to blk_alloc_queue_node() · 5ee0524b
      Bart Van Assche authored
      This patch does not change any functionality.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: clear ctx pending bit under ctx lock · e9a99a63
      Omar Sandoval authored
      When we insert a request, we set the software queue pending bit while
      holding the software queue lock. However, we clear it outside of the
      lock, so it's possible that a concurrent insert could reset the bit
      after we clear it but before we empty the request list. Afterwards, the
      bit would still be set but the software queue wouldn't have any requests
      in it, leading us to do a spurious run in the future. This is mostly a
      benign/theoretical issue, but it makes the following change easier to
      justify.
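      The interleaving can be replayed deterministically with a toy
      software queue in C (illustrative names only; the bit and list model
      the ctx pending bit and its request list):

```c
#include <assert.h>
#include <stddef.h>

/* Toy software queue: a pending bit plus a request count. */
struct toy_swq { int pending; int nreq; };

/* Insert: done under the ctx lock; queues a request and sets the bit. */
void toy_insert(struct toy_swq *q) { q->nreq++; q->pending = 1; }

/* Buggy dispatch: the bit is cleared outside the lock, so an insert can
 * slip in between the clear and the list splice. */
void toy_dispatch_buggy(struct toy_swq *q, void (*racing_insert)(struct toy_swq *))
{
    q->pending = 0;           /* clear bit, lock not held */
    if (racing_insert)
        racing_insert(q);     /* concurrent insert: sets bit, adds request */
    q->nreq = 0;              /* splice drains the list, new request included */
}

/* Fixed dispatch: bit cleared under the same lock the inserts take. */
void toy_dispatch_fixed(struct toy_swq *q)
{
    q->nreq = 0;
    q->pending = 0;
}
```

      The buggy ordering ends with the bit set on an empty queue, which is
      what triggers the spurious future run; clearing the bit under the ctx
      lock keeps the bit and the list consistent.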
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq-debugfs: Show zone locking information · 18bc4230
      Bart Van Assche authored
      When debugging the ZBC code in the mq-deadline scheduler it is very
      important to know which zones are locked and which zones are not
      locked. Hence this patch that exports the zone locking information
      through debugfs.
      
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>