1. 05 Apr, 2019 2 commits
    • Paolo Valente's avatar
      block, bfq: fix queue removal from weights tree · 8b3b22aa
      Paolo Valente authored
      [ Upstream commit 9dee8b3b ]
      
      bfq maintains an ordered list, through a red-black tree, of unique
      weights of active bfq_queues. This list is used to detect whether there
      are active queues with differentiated weights. The weight of a queue is
      removed from the list when both the following two conditions become
      true:
      
      (1) the bfq_queue is flagged as inactive
      (2) the has no in-flight request any longer;
      
      Unfortunately, in the rare cases where condition (2) becomes true before
      condition (1), the removal fails, because the function to remove the
      weight of the queue (bfq_weights_tree_remove) is rightly invoked in the
      path that deactivates the bfq_queue, but mistakenly invoked *before* the
      function that actually performs the deactivation (bfq_deactivate_bfqq).
      
      This commits moves the invocation of bfq_weights_tree_remove for
      condition (1) to after bfq_deactivate_bfqq. As a consequence of this
      move, it is necessary to add a further reference to the queue when the
      weight of a queue is added, because the queue might otherwise be freed
      before bfq_weights_tree_remove is invoked. This commit adds this
      reference and makes all related modifications.
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      8b3b22aa
    • Paolo Valente's avatar
      block, bfq: fix in-service-queue check for queue merging · c581587a
      Paolo Valente authored
      [ Upstream commit 058fdecc ]
      
      When a new I/O request arrives for a bfq_queue, say Q, bfq checks
      whether that request is close to
      (a) the head request of some other queue waiting to be served, or
      (b) the last request dispatched for the in-service queue (in case Q
      itself is not the in-service queue)
      
      If a queue, say Q2, is found for which the above condition holds, then
      bfq merges Q and Q2, to hopefully get a more sequential I/O in the
      resulting merged queue, and thus a possibly higher throughput.
      
      Case (b) is checked by comparing the new request for Q with the last
      request dispatched, assuming that the latter necessarily belonged to the
      in-service queue. Unfortunately, this assumption is no longer always
      correct, since commit d0edc247 ("block, bfq: inject other-queue I/O
      into seeky idle queues on NCQ flash").
      
      When the assumption does not hold, queues that must not be merged may be
      merged, causing unexpected loss of control on per-queue service
      guarantees.
      
      This commit solves this problem by adding an extra field, which stores
      the actual last request dispatched for the in-service queue, and by
      using this new field to correctly check case (b).
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      c581587a
  2. 03 Apr, 2019 1 commit
  3. 12 Feb, 2019 1 commit
    • Jianchao Wang's avatar
      blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue · aef1897c
      Jianchao Wang authored
      When requeue, if RQF_DONTPREP, rq has contained some driver
      specific data, so insert it to hctx dispatch list to avoid any
      merge. Take scsi as example, here is the trace event log (no
      io scheduler, because RQF_STARTED would prevent merging),
      
         kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
      scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
      scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
         kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
      scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
      scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
         kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
         kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
      scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
      
      (32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
      Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
      the sdb only contained the part of (32768 + 8), then only that part
      was completed. The lucky thing was that scsi_io_completion detected
      it and requeued the remaining part. So we didn't get corrupted data.
      However, the requeue of (32776 + 8) is not expected.
      Suggested-by: 's avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: 's avatarJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      aef1897c
  4. 08 Feb, 2019 3 commits
    • Liu Bo's avatar
      blk-mq: remove duplicated definition of blk_mq_freeze_queue · 26984841
      Liu Bo authored
      As the prototype has been defined in "include/linux/blk-mq.h", the one
      in "block/blk-mq.h" can be removed then.
      Signed-off-by: 's avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      26984841
    • Liu Bo's avatar
      Blk-iolatency: warn on negative inflight IO counter · 391f552a
      Liu Bo authored
      This is to catch any unexpected negative value of inflight IO counter.
      Signed-off-by: 's avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      391f552a
    • Liu Bo's avatar
      blk-iolatency: fix IO hang due to negative inflight counter · 8c772a9b
      Liu Bo authored
      Our test reported the following stack, and vmcore showed that
      ->inflight counter is -1.
      
      [ffffc9003fcc38d0] __schedule at ffffffff8173d95d
      [ffffc9003fcc3958] schedule at ffffffff8173de26
      [ffffc9003fcc3970] io_schedule at ffffffff810bb6b6
      [ffffc9003fcc3988] blkcg_iolatency_throttle at ffffffff813911cb
      [ffffc9003fcc3a20] rq_qos_throttle at ffffffff813847f3
      [ffffc9003fcc3a48] blk_mq_make_request at ffffffff8137468a
      [ffffc9003fcc3b08] generic_make_request at ffffffff81368b49
      [ffffc9003fcc3b68] submit_bio at ffffffff81368d7d
      [ffffc9003fcc3bb8] ext4_io_submit at ffffffffa031be00 [ext4]
      [ffffc9003fcc3c00] ext4_writepages at ffffffffa03163de [ext4]
      [ffffc9003fcc3d68] do_writepages at ffffffff811c49ae
      [ffffc9003fcc3d78] __filemap_fdatawrite_range at ffffffff811b6188
      [ffffc9003fcc3e30] filemap_write_and_wait_range at ffffffff811b6301
      [ffffc9003fcc3e60] ext4_sync_file at ffffffffa030cee8 [ext4]
      [ffffc9003fcc3ea8] vfs_fsync_range at ffffffff8128594b
      [ffffc9003fcc3ee8] do_fsync at ffffffff81285abd
      [ffffc9003fcc3f18] sys_fsync at ffffffff81285d50
      [ffffc9003fcc3f28] do_syscall_64 at ffffffff81003c04
      [ffffc9003fcc3f50] entry_SYSCALL_64_after_swapgs at ffffffff81742b8e
      
      The ->inflight counter may be negative (-1) if
      
      1) blk-iolatency was disabled when the IO was issued,
      
      2) blk-iolatency was enabled before this IO reached its endio,
      
      3) the ->inflight counter is decreased from 0 to -1 in endio()
      
      In fact the hang can be easily reproduced by the below script,
      
      H=/sys/fs/cgroup/unified/
      P=/sys/fs/cgroup/unified/test
      
      echo "+io" > $H/cgroup.subtree_control
      mkdir -p $P
      
      echo $$ > $P/cgroup.procs
      
      xfs_io -f -d -c "pwrite 0 4k" /dev/sdg
      
      echo "`cat /sys/block/sdg/dev` target=1000000" > $P/io.latency
      
      xfs_io -f -d -c "pwrite 0 4k" /dev/sdg
      
      This fixes the problem by freezing the queue so that while
      enabling/disabling iolatency, there is no inflight rq running.
      
      Note that quiesce_queue is not needed as this only updating iolatency
      configuration about which dispatching request_queue doesn't care.
      Signed-off-by: 's avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      8c772a9b
  5. 31 Jan, 2019 1 commit
  6. 30 Jan, 2019 2 commits
  7. 27 Jan, 2019 1 commit
  8. 24 Jan, 2019 2 commits
  9. 22 Jan, 2019 1 commit
    • Ming Lei's avatar
      block: cover another queue enter recursion via BIO_QUEUE_ENTERED · 698cef17
      Ming Lei authored
      Except for blk_queue_split(), bio_split() is used for splitting bio too,
      then the remained bio is often resubmit to queue via generic_make_request().
      So the same queue enter recursion exits in this case too. Unfortunatley
      commit cd4a4ae4 doesn't help this case.
      
      This patch covers the above case by setting BIO_QUEUE_ENTERED before calling
      q->make_request_fn.
      
      In theory the per-bio flag is used to simulate one stack variable, it is
      just fine to clear it after q->make_request_fn is returned. Especially
      the same bio can't be submitted from another context.
      
      Fixes: cd4a4ae4 ("block: don't use blocking queue entered for recursive bio submits")
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: NeilBrown <neilb@suse.com>
      Reviewed-by: 's avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      698cef17
  10. 18 Jan, 2019 1 commit
    • Thomas Gleixner's avatar
      block: Cleanup license notice · 38197ca1
      Thomas Gleixner authored
      Remove the imprecise and sloppy:
      
        "This files is licensed under the GPL."
      
      license notice in the top level comment.
      
      1) The file already contains a SPDX license identifier which clearly
         states that the license of the file is GPL V2 only
      
      2) The notice resolves to GPL v1 or later for scanners which is just
         contrary to the intent of SPDX identifiers to provide clear and non
         ambiguous license information. Aside of that the value add of this
         notice is below zero,
      
      Cc: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Fixes: 6a5ac984 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n")
      Reviewed-by: 's avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: 's avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      38197ca1
  11. 16 Jan, 2019 1 commit
  12. 14 Jan, 2019 1 commit
    • Paolo Valente's avatar
      block, bfq: fix comments on __bfq_deactivate_entity · 5bf85908
      Paolo Valente authored
      Comments on function __bfq_deactivate_entity contains two imprecise or
      wrong statements:
      1) The function performs the deactivation of the entity.
      2) The function must be invoked only if the entity is on a service tree.
      
      This commits replaces both statements with the correct ones:
      1) The functions updates sched_data and service trees for the entity,
      so as to represent entity as inactive (which is only part of the steps
      needed for the deactivation of the entity).
      2) The function must be invoked on every entity being deactivated.
      Signed-off-by: 's avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      5bf85908
  13. 09 Jan, 2019 2 commits
  14. 21 Dec, 2018 4 commits
  15. 20 Dec, 2018 1 commit
  16. 19 Dec, 2018 2 commits
    • Ming Lei's avatar
      block: save irq state in blkg_lookup_create() · 3a762de5
      Ming Lei authored
      blkg_lookup_create() may be called from pool_map() in which
      irq state is saved, so we have to do that in blkg_lookup_create().
      
      Otherwise, the following lockdep warning can be triggered:
      
      [  104.258537] ================================
      [  104.259129] WARNING: inconsistent lock state
      [  104.259725] 4.20.0-rc6+ #545 Not tainted
      [  104.260268] --------------------------------
      [  104.260865] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  104.261727] swapper/49/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
      [  104.262444] 00000000db365b5d (&(&pool->lock)->rlock#3){+.?.}, at: thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.263747] {SOFTIRQ-ON-W} state was registered at:
      [  104.264417]   _raw_spin_unlock_irq+0x29/0x4c
      [  104.265014]   blkg_lookup_create+0xdc/0xe6
      [  104.265609]   bio_associate_blkg_from_css+0xd3/0x13f
      [  104.266312]   bio_associate_blkg+0x15a/0x1bb
      [  104.266913]   pool_map+0xe8/0x103 [dm_thin_pool]
      [  104.267572]   __map_bio+0x98/0x29c [dm_mod]
      [  104.268162]   __split_and_process_non_flush+0x29e/0x306 [dm_mod]
      [  104.269003]   __split_and_process_bio+0x16a/0x25b [dm_mod]
      [  104.269971]   __dm_make_request.isra.14+0xdc/0x124 [dm_mod]
      [  104.270973]   generic_make_request+0x3f5/0x68b
      [  104.271676]   process_prepared_mapping+0x166/0x1ef [dm_thin_pool]
      [  104.272531]   schedule_zero+0x239/0x273 [dm_thin_pool]
      [  104.273245]   process_cell+0x60c/0x6f1 [dm_thin_pool]
      [  104.273967]   do_worker+0x60c/0xca8 [dm_thin_pool]
      [  104.274635]   process_one_work+0x4eb/0x834
      [  104.275203]   worker_thread+0x318/0x484
      [  104.275740]   kthread+0x1d1/0x1e1
      [  104.276203]   ret_from_fork+0x3a/0x50
      [  104.276714] irq event stamp: 170003
      [  104.277201] hardirqs last  enabled at (170002): [<ffffffff81bcc33e>] _raw_spin_unlock_irqrestore+0x44/0x6b
      [  104.278535] hardirqs last disabled at (170003): [<ffffffff81bcc1ad>] _raw_spin_lock_irqsave+0x20/0x55
      [  104.280273] softirqs last  enabled at (169978): [<ffffffff810d13d4>] irq_enter+0x4c/0x73
      [  104.281617] softirqs last disabled at (169979): [<ffffffff810d1479>] irq_exit+0x7e/0x11d
      [  104.282744]
      [  104.282744] other info that might help us debug this:
      [  104.283640]  Possible unsafe locking scenario:
      [  104.283640]
      [  104.284452]        CPU0
      [  104.284803]        ----
      [  104.285150]   lock(&(&pool->lock)->rlock#3);
      [  104.285762]   <Interrupt>
      [  104.286130]     lock(&(&pool->lock)->rlock#3);
      [  104.286750]
      [  104.286750]  *** DEADLOCK ***
      [  104.286750]
      [  104.287564] no locks held by swapper/49/0.
      [  104.288129]
      [  104.288129] stack backtrace:
      [  104.288738] CPU: 49 PID: 0 Comm: swapper/49 Not tainted 4.20.0-rc6+ #545
      [  104.289700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
      [  104.290858] Call Trace:
      [  104.291204]  <IRQ>
      [  104.291502]  dump_stack+0x9a/0xe6
      [  104.291968]  mark_lock+0x56c/0x7a6
      [  104.292442]  ? check_usage_backwards+0x209/0x209
      [  104.293086]  __lock_acquire+0x400/0x15bf
      [  104.293662]  ? check_chain_key+0x150/0x1aa
      [  104.294236]  lock_acquire+0x1a6/0x1e3
      [  104.294768]  ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.295444]  ? _raw_spin_unlock_irqrestore+0x44/0x6b
      [  104.296143]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
      [  104.297031]  _raw_spin_lock_irqsave+0x46/0x55
      [  104.297659]  ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.298335]  thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.298997]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
      [  104.299886]  ? check_flags+0x20a/0x20a
      [  104.300408]  ? lock_acquire+0x1a6/0x1e3
      [  104.300954]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
      [  104.301865]  clone_endio+0x1bb/0x22d [dm_mod]
      [  104.302491]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
      [  104.303200]  ? bio_disassociate_blkg+0xc6/0x15f
      [  104.303836]  ? bio_endio+0x2b2/0x2da
      [  104.304349]  clone_endio+0x1f3/0x22d [dm_mod]
      [  104.304978]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
      [  104.305709]  ? bio_disassociate_blkg+0xc6/0x15f
      [  104.306333]  ? bio_endio+0x2b2/0x2da
      [  104.306853]  clone_endio+0x1f3/0x22d [dm_mod]
      [  104.307476]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
      [  104.308185]  ? bio_disassociate_blkg+0xc6/0x15f
      [  104.308817]  ? bio_endio+0x2b2/0x2da
      [  104.309319]  blk_update_request+0x2de/0x4cc
      [  104.309927]  blk_mq_end_request+0x2a/0x183
      [  104.310498]  blk_done_softirq+0x16a/0x1a6
      [  104.311051]  ? blk_softirq_cpu_dead+0xe2/0xe2
      [  104.311653]  ? __lock_is_held+0x2a/0x87
      [  104.312186]  __do_softirq+0x250/0x4e8
      [  104.312705]  irq_exit+0x7e/0x11d
      [  104.313157]  call_function_single_interrupt+0xf/0x20
      [  104.313860]  </IRQ>
      [  104.314163] RIP: 0010:native_safe_halt+0x2/0x3
      [  104.314792] Code: 63 02 df f0 83 44 24 fc 00 48 89 df e8 cc 3f 7a ff 48 8b 03 a8 08 74 0b 65 81 25 9d 31 45 7e ff ff ff 7f 5b 5d 41 5c c3 fb f4 <c3> f4 c3 0f 1f 44 00 00 41 56 41 55 41 54 55 53 e8 a2 0d 5c ff e8
      [  104.317339] RSP: 0018:ffff888106c9fdc0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
      [  104.318390] RAX: 1ffff11020d92100 RBX: 0000000000000000 RCX: ffffffff81159ac7
      [  104.319366] RDX: 1ffffffff05d5e69 RSI: 0000000000000007 RDI: ffff888106c90d1c
      [  104.320339] RBP: 0000000000000000 R08: dffffc0000000000 R09: 0000000000000001
      [  104.321313] R10: ffffed1025d57ba0 R11: ffffed1025d57b9f R12: 1ffff11020d93fbf
      [  104.322328] R13: 0000000000000031 R14: ffff888106c90040 R15: 0000000000000000
      [  104.323307]  ? lockdep_hardirqs_on+0x26b/0x278
      [  104.323927]  default_idle+0xd9/0x1a8
      [  104.324427]  do_idle+0x162/0x2b2
      [  104.324891]  ? arch_cpu_idle_exit+0x28/0x28
      [  104.325467]  ? mark_held_locks+0x28/0x7f
      [  104.326031]  ? _raw_spin_unlock_irqrestore+0x44/0x6b
      [  104.326719]  cpu_startup_entry+0x1d/0x1f
      [  104.327261]  start_secondary+0x2cb/0x308
      [  104.327806]  ? set_cpu_sibling_map+0x8a3/0x8a3
      [  104.328421]  secondary_startup_64+0xa4/0xb0
      
      Fixes: b978962a ("blkcg: update blkg_lookup_create() to do locking")
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      3a762de5
    • Christoph Hellwig's avatar
      scsi: block: remove the cluster flag · 38417468
      Christoph Hellwig authored
      Now that the the SCSI layer replaced the use of the cluster flag with
      segment size limits and the DMA boundary we can remove the cluster flag
      from the block layer.
      Signed-off-by: 's avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: 's avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: Martin K. Petersen's avatarMartin K. Petersen <martin.petersen@oracle.com>
      38417468
  17. 18 Dec, 2018 3 commits
  18. 17 Dec, 2018 9 commits
    • Ming Lei's avatar
      blk-mq: skip zero-queue maps in blk_mq_map_swqueue · e5edd5f2
      Ming Lei authored
      From 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping
      table won't be initialized actually if map->nr_queues is zero, so
      we can't use blk_mq_map_queue_type() to retrieve hctx any more.
      
      This way still may cause broken mapping, fix it by skipping zero-queues
      maps in blk_mq_map_swqueue().
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: 's avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      e5edd5f2
    • Dennis Zhou's avatar
      block: fix blk-iolatency accounting underflow · 13369816
      Dennis Zhou authored
      The blk-iolatency controller measures the time from rq_qos_throttle() to
      rq_qos_done_bio() and attributes this time to the first bio that needs
      to create the request. This means if a bio is plug-mergeable or
      bio-mergeable, it gets to bypass the blk-iolatency controller.
      
      The recent series [1], to tag all bios w/ blkgs undermined how iolatency
      was determining which bios it was charging and should process in
      rq_qos_done_bio(). Because all bios are being tagged, this caused the
      atomic_t for the struct rq_wait inflight count to underflow and result
      in a stall.
      
      This patch adds a new flag BIO_TRACKED to let controllers know that a
      bio is going through the rq_qos path. blk-iolatency now checks if this
      flag is set to see if it should process the bio in rq_qos_done_bio().
      
      Overloading BLK_QUEUE_ENTERED works, but makes the flag rules confusing.
      BIO_THROTTLED was another candidate, but the flag is set for all bios
      that have gone through blk-throttle code. Overloading a flag comes with
      the burden of making sure that when either implementation changes, a
      change in setting rules for one doesn't cause a bug in the other. So
      here, we unfortunately opt for adding a new flag.
      
      [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/
      
      Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
      Signed-off-by: 's avatarDennis Zhou <dennis@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      13369816
    • Ming Lei's avatar
      blk-mq: fix dispatch from sw queue · c16d6b5a
      Ming Lei authored
      When a request is added to rq list of sw queue(ctx), the rq may be from
      a different type of hctx, especially after multi queue mapping is
      introduced.
      
      So when dispach request from sw queue via blk_mq_flush_busy_ctxs() or
      blk_mq_dequeue_from_ctx(), one request belonging to other queue type of
      hctx can be dispatched to current hctx in case that read queue or poll
      queue is enabled.
      
      This patch fixes this issue by introducing per-queue-type list.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      
      Changed by me to not use separately cacheline aligned lists, just
      place them all in the same cacheline where we had just the one list
      and lock before.
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      c16d6b5a
    • Damien Le Moal's avatar
      block: mq-deadline: Fix write completion handling · 7211aef8
      Damien Le Moal authored
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to
      __blk_mq_free_request() call of blk_mq_sched_restart() to not run the
      queue when the already dispatched write request completes. The newly
      dispatched request stays stuck in the scheduler queue until eventually
      another request is submitted.
      
      This problem does not affect SCSI disk as the SCSI stack handles queue
      restart on request completion. However, this problem is can be triggered
      the nullblk driver with zoned mode enabled.
      
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
      
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: 's avatarDamien Le Moal <damien.lemoal@wdc.com>
      
      Add missing export of blk_mq_sched_restart()
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      7211aef8
    • Christoph Hellwig's avatar
      blk-mq: only dispatch to non-defauly queue maps if they have queues · 5aceaeb2
      Christoph Hellwig authored
      We should check if a given queue map actually has queues enabled before
      dispatching to it.  This allows drivers to not initialize optional but
      not used map types, which subsequently will allow fixing problems with
      queue map rebuilds for that case.
      Reviewed-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      5aceaeb2
    • Ming Lei's avatar
      blk-mq: export hctx->type in debugfs instead of sysfs · 346fc108
      Ming Lei authored
      Now we only export hctx->type via sysfs, and there isn't such info
      in hctx entry under debugfs. We often use debugfs only to diagnose
      queue mapping issue, so add the support in debugfs.
      
      Queue mapping becomes a bit more complicated after multiple queue
      mapping is supported, we may write blktest to verify if queue mapping
      is valid based on blk-mq-debugfs.
      
      Given not necessary to export hctx->type twice, so remove the export
      from sysfs.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: 's avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      346fc108
    • Ming Lei's avatar
      blk-mq: fix allocation for queue mapping table · 07b35eb5
      Ming Lei authored
      Type of each element in queue mapping table is 'unsigned int,
      intead of 'struct blk_mq_queue_map)', so fix it.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: 's avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      07b35eb5
    • Ming Lei's avatar
      blk-wbt: export internal state via debugfs · d19afebc
      Ming Lei authored
      This information is helpful to either investigate issues, or understand
      wbt's internal behaviour.
      
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      d19afebc
    • Ming Lei's avatar
      blk-mq-debugfs: support rq_qos · cc56694f
      Ming Lei authored
      blk-mq-debugfs has been proved as very helpful for debug some
      tough issues, such as IO hang.
      
      We have seen blk-wbt related IO hang several times, even inside
      Red Hat BZ, there is such report not sovled yet, so this patch
      adds support debugfs on rq_qos.
      
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: 's avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: 's avatarJens Axboe <axboe@kernel.dk>
      cc56694f
  19. 16 Dec, 2018 2 commits