1. 12 Aug, 2018 1 commit
  2. 11 Aug, 2018 7 commits
    • fix merge conflict · 986cbadb
      Oleksandr Natalenko authored
    • blk-mq: avoid to synchronize rcu inside blk_cleanup_queue() · 4f6e278c
      Ming Lei authored
      SCSI probing may synchronously create and destroy a lot of request_queues
      for non-existent devices. Any synchronize_rcu() in the queue creation or
      destruction path may add long latency during boot; see the detailed
      description in the comment in blk_register_queue().
      
      This patch removes one synchronize_rcu() from blk_cleanup_queue() for
      this case. Commit c2856ae2 ("blk-mq: quiesce queue before freeing queue")
      needs synchronize_rcu() to implement blk_mq_quiesce_queue(), but when
      the queue isn't initialized that is unnecessary: only pass-through
      requests are involved, so the original issue in scsi_execute() cannot
      occur.
      
      Without this patch and the previous one, it may take 20+ seconds for
      virtio-scsi to complete disk probe. With the two patches, the time drops
      to less than 100ms.
      
      Fixes: c2856ae2 ("blk-mq: quiesce queue before freeing queue")
      Reported-by: Andrew Jones <drjones@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Tested-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
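The effect described in this commit message can be sketched in a small user-space model. This is not the kernel code: `model_queue`, `model_synchronize_rcu()`, `model_cleanup_queue()` and `probe_and_destroy()` are hypothetical names, and the grace-period wait is replaced by a counter. The sketch only assumes what the message states, namely that the wait can be skipped when the queue was never initialized.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request_queue; only the fields the model needs. */
struct model_queue {
    bool init_done; /* true once the queue has been fully initialized */
    bool dead;      /* set by cleanup */
};

/* Counts how many times the (expensive) grace-period wait would have run. */
static unsigned int grace_period_waits;

/* Stand-in for synchronize_rcu(): the model only counts calls. */
static void model_synchronize_rcu(void)
{
    grace_period_waits++;
}

/* Cleanup that waits for a grace period only if the queue was initialized. */
static void model_cleanup_queue(struct model_queue *q)
{
    assert(!q->dead);
    if (q->init_done)
        model_synchronize_rcu();
    q->dead = true;
}

/*
 * Create and destroy nr_queues queues, as a probe of non-existent devices
 * might, and return how many grace-period waits that cost.
 */
static unsigned int probe_and_destroy(unsigned int nr_queues, bool init_done)
{
    unsigned int before = grace_period_waits;

    for (unsigned int i = 0; i < nr_queues; i++) {
        struct model_queue q = { .init_done = init_done, .dead = false };
        model_cleanup_queue(&q);
    }
    return grace_period_waits - before;
}
```

In this model, `probe_and_destroy(256, false)` costs no waits at all, while `probe_and_destroy(256, true)` costs one per queue, which is the kind of per-queue latency that adds up during probing.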
    • blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set() · f00ab16a
      Ming Lei authored
      We have to remove synchronize_rcu() from blk_cleanup_queue(),
      otherwise a long delay can occur during LUN probe. To remove it,
      we have to avoid iterating over set->tag_list in the I/O path, e.g.
      in blk_mq_sched_restart().
      
      This patch reverts 5b79413946d (Revert "blk-mq: don't handle
      TAG_SHARED in restart"). Enough I/O hang issues have been fixed by now
      that there is no longer any reason to restart all queues sharing one
      set of tags, for the following reasons:
      
      1) The blk-mq core handles the shared-tags case well via
      blk_mq_get_driver_tag(), which can wake up queues waiting for a driver tag.
      
      2) SCSI is a bit special because it may return BLK_STS_RESOURCE if the queue,
      target or host isn't ready, but SCSI's built-in restart covers all of these
      cases well: see scsi_end_request(), where the queue is rerun after any
      request initiated from this host/target completes.
      
      In my test on scsi_debug (8 LUNs), this patch may improve IOPS by 20% ~ 30%
      when running I/O on these 8 LUNs concurrently.
      
      Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: linux-scsi@vger.kernel.org
      Reported-by: Andrew Jones <drjones@redhat.com>
      Tested-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: reinit q->tag_set_list entry only after grace period · 67f1d9af
      Roman Pen authored
      The q->tag_set_list list entry must not be reinitialized before the RCU
      grace period has completed; otherwise the following soft lockup in
      blk_mq_sched_restart() happens:
      
      [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
      [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
      [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
      [ 1064.256510] Call Trace:
      [ 1064.256664]  <IRQ>
      [ 1064.256824]  blk_mq_free_request+0xea/0x100
      [ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
      [ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
      [ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
      [ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
      [ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
      [ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
      [ 1064.258007]  irq_poll_softirq+0xb7/0xe0
      [ 1064.258165]  __do_softirq+0x106/0x2a2
      [ 1064.258328]  irq_exit+0x92/0xa0
      [ 1064.258509]  do_IRQ+0x4a/0xd0
      [ 1064.258660]  common_interrupt+0x7a/0x7a
      [ 1064.258818]  </IRQ>
      
      Meanwhile, another context frees another queue that uses the same set of
      shared tags:
      
      [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
      [ 1288.201833] bash            D    0  5910   5820 0x00000000
      [ 1288.202016] Call Trace:
      [ 1288.202315]  schedule+0x32/0x80
      [ 1288.202462]  schedule_timeout+0x1e5/0x380
      [ 1288.203838]  wait_for_completion+0xb0/0x120
      [ 1288.204137]  __wait_rcu_gp+0x125/0x160
      [ 1288.204287]  synchronize_sched+0x6e/0x80
      [ 1288.204770]  blk_mq_free_queue+0x74/0xe0
      [ 1288.204922]  blk_cleanup_queue+0xc7/0x110
      [ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
      [ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
      [ 1288.205548]  kernfs_fop_write+0x109/0x180
      [ 1288.206328]  vfs_write+0xb3/0x1a0
      [ 1288.206476]  SyS_write+0x52/0xc0
      [ 1288.206624]  do_syscall_64+0x68/0x1d0
      [ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      What happened is the following:
      
      1. There are several MQ queues with shared tags.
      2. One queue is about to be freed and its task is now in
         blk_mq_del_queue_tag_set().
      3. Another CPU is in blk_mq_sched_restart() and loops over all queues in
         the tag list in order to find an hctx to restart.
      
      Because the linked list entry was modified in blk_mq_del_queue_tag_set()
      without waiting for a grace period, blk_mq_sched_restart() never ends,
      spinning in list_for_each_entry_rcu_rr(), hence the soft lockup.
      
      The fix is simple: reinit the list entry after an RCU grace period has
      elapsed.
      
      Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
      Cc: stable@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-block@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
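Why an early reinit makes the reader spin can be shown with a single-threaded user-space model of a circular list. This is only a sketch: `list_node`, `list_unlink_keep_next()`, `list_unlink_and_reinit()` and `reader_reaches_head()` are hypothetical names, there is no real RCU or concurrency here, and a step limit stands in for the soft lockup. The reader is "parked" on the entry that is about to be removed, which models a CPU that is in the middle of walking the tag list.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n but leave n->next intact, so a reader standing on n can move on. */
static void list_unlink_keep_next(struct list_node *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
}

/* Unlink n and reinitialize it at once: n->next now points back at n. */
static void list_unlink_and_reinit(struct list_node *n)
{
    list_unlink_keep_next(n);
    list_init(n);
}

/*
 * Follow ->next from pos until head is reached. Returns false if head is
 * not reached within max_steps, which stands in for the soft lockup.
 */
static bool reader_reaches_head(const struct list_node *pos,
                                const struct list_node *head, int max_steps)
{
    for (int i = 0; i < max_steps; i++) {
        pos = pos->next;
        if (pos == head)
            return true;
    }
    return false;
}

/* Build head -> a -> b -> c, park a reader on b, remove b, resume the reader. */
static bool reader_survives_removal(bool reinit_immediately)
{
    struct list_node head, a, b, c;

    list_init(&head);
    list_add_tail(&a, &head);
    list_add_tail(&b, &head);
    list_add_tail(&c, &head);

    if (reinit_immediately)
        list_unlink_and_reinit(&b);
    else
        list_unlink_keep_next(&b);

    return reader_reaches_head(&b, &head, 1000);
}
```

With `reinit_immediately` set, the reader keeps landing on the removed entry and never sees the list head again. When the entry keeps its `next` pointer until readers are done, the reader walks on to `c` and then to the head.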
    • blk-mq: introduce new lock for protecting hctx->dispatch_wait · e5682822
      Ming Lei authored
      Currently hctx->lock is acquired only when adding hctx->dispatch_wait to
      a wait queue; it is not held when removing it from the wait queue.
      
      An I/O hang can be observed easily if SCHED RESTART is disabled, which
      means RESTART now exists just to fix the issue in blk_mq_mark_tag_wait().
      
      This patch fixes the issue by introducing hctx->dispatch_wait_lock and
      holding it while removing hctx->dispatch_wait in blk_mq_dispatch_wake().
      A separate lock is needed because we must avoid acquiring hctx->lock in
      IRQ context.
      
      Fixes: eb619fdb ("blk-mq: fix issue with shared tag queue re-running")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Tested-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: don't pass **hctx to blk_mq_mark_tag_wait() · 340d442c
      Ming Lei authored
      'hctx' is never changed, so there is no need to pass
      '**hctx' to blk_mq_mark_tag_wait().
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Tested-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
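The shape of this cleanup can be illustrated with a generic C sketch. The types and functions below (`model_hctx`, `mark_tag_wait_old()`, `mark_tag_wait_new()`) are hypothetical and do not reproduce the real blk_mq_mark_tag_wait() logic; they only show that a double pointer is useful solely when the callee needs to retarget the caller's pointer, and that a plain pointer is enough otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hardware queue context. */
struct model_hctx {
    int nr_waiters;
};

/*
 * Old shape: the double pointer would let the callee point the caller at a
 * different hctx, but this function never does so.
 */
static bool mark_tag_wait_old(struct model_hctx **hctx)
{
    assert(hctx && *hctx);
    (*hctx)->nr_waiters++;
    return true;
}

/* New shape: same effect on the hctx, with a simpler contract. */
static bool mark_tag_wait_new(struct model_hctx *hctx)
{
    assert(hctx);
    hctx->nr_waiters++;
    return true;
}
```

Both variants modify the same object and leave the caller's pointer untouched, so the single-pointer signature states the actual contract more directly.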
    • blk-mq: cleanup blk_mq_get_driver_tag() · 8e15016e
      Ming Lei authored
      We never pass 'wait' as true to blk_mq_get_driver_tag(), and hence
      we never change '**hctx' either. The last use of these went away
      with the flush cleanup, commit 0c2a6fe4.
      
      So clean up the usage and remove the two extra parameters.
      
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Tested-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 09 Aug, 2018 20 commits
  4. 06 Aug, 2018 12 commits