1. 22 Jan, 2019 1 commit
    • Yufen Yu's avatar
      block: use rcu_work instead of call_rcu to avoid sleep in softirq · f3631a8b
      Yufen Yu authored
      commit 94a2c3a3 upstream.
      We recently got a stack by syzkaller like this:
      BUG: sleeping function called from invalid context at mm/slab.h:361
      in_atomic(): 1, irqs_disabled(): 0, pid: 6644, name: blkid
      INFO: lockdep is turned off.
      CPU: 1 PID: 6644 Comm: blkid Not tainted 4.4.163-514.55.6.9.x86_64+ #76
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
       0000000000000000 5ba6a6b879e50c00 ffff8801f6b07b10 ffffffff81cb2194
       0000000041b58ab3 ffffffff833c7745 ffffffff81cb2080 5ba6a6b879e50c00
       0000000000000000 0000000000000001 0000000000000004 0000000000000000
      Call Trace:
       <IRQ>  [<ffffffff81cb2194>] __dump_stack lib/dump_stack.c:15 [inline]
       <IRQ>  [<ffffffff81cb2194>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
       [<ffffffff8129a981>] ___might_sleep+0x291/0x490 kernel/sched/core.c:7675
       [<ffffffff8129ac33>] __might_sleep+0xb3/0x270 kernel/sched/core.c:7637
       [<ffffffff81794c13>] slab_pre_alloc_hook mm/slab.h:361 [inline]
       [<ffffffff81794c13>] slab_alloc_node mm/slub.c:2610 [inline]
       [<ffffffff81794c13>] slab_alloc mm/slub.c:2692 [inline]
       [<ffffffff81794c13>] kmem_cache_alloc_trace+0x2c3/0x5c0 mm/slub.c:2709
       [<ffffffff81cbe9a7>] kmalloc include/linux/slab.h:479 [inline]
       [<ffffffff81cbe9a7>] kzalloc include/linux/slab.h:623 [inline]
       [<ffffffff81cbe9a7>] kobject_uevent_env+0x2c7/0x1150 lib/kobject_uevent.c:227
       [<ffffffff81cbf84f>] kobject_uevent+0x1f/0x30 lib/kobject_uevent.c:374
       [<ffffffff81cbb5b9>] kobject_cleanup lib/kobject.c:633 [inline]
       [<ffffffff81cbb5b9>] kobject_release+0x229/0x440 lib/kobject.c:675
       [<ffffffff81cbb0a2>] kref_sub include/linux/kref.h:73 [inline]
       [<ffffffff81cbb0a2>] kref_put include/linux/kref.h:98 [inline]
       [<ffffffff81cbb0a2>] kobject_put+0x72/0xd0 lib/kobject.c:692
       [<ffffffff8216f095>] put_device+0x25/0x30 drivers/base/core.c:1237
       [<ffffffff81c4cc34>] delete_partition_rcu_cb+0x1d4/0x2f0 block/partition-generic.c:232
       [<ffffffff813c08bc>] __rcu_reclaim kernel/rcu/rcu.h:118 [inline]
       [<ffffffff813c08bc>] rcu_do_batch kernel/rcu/tree.c:2705 [inline]
       [<ffffffff813c08bc>] invoke_rcu_callbacks kernel/rcu/tree.c:2973 [inline]
       [<ffffffff813c08bc>] __rcu_process_callbacks kernel/rcu/tree.c:2940 [inline]
       [<ffffffff813c08bc>] rcu_process_callbacks+0x59c/0x1c70 kernel/rcu/tree.c:2957
       [<ffffffff8120f509>] __do_softirq+0x299/0xe20 kernel/softirq.c:273
       [<ffffffff81210496>] invoke_softirq kernel/softirq.c:350 [inline]
       [<ffffffff81210496>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
       [<ffffffff82c2cd7b>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
       [<ffffffff82c2cd7b>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
       [<ffffffff82c2bc25>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:746
       <EOI>  [<ffffffff814cbf40>] ? audit_kill_trees+0x180/0x180
       [<ffffffff8187d2f7>] fd_install+0x57/0x80 fs/file.c:626
       [<ffffffff8180989e>] do_sys_open+0x45e/0x550 fs/open.c:1043
       [<ffffffff818099c2>] SYSC_open fs/open.c:1055 [inline]
       [<ffffffff818099c2>] SyS_open+0x32/0x40 fs/open.c:1050
       [<ffffffff82c299e1>] entry_SYSCALL_64_fastpath+0x1e/0x9a
      In softirq context, we call rcu callback function delete_partition_rcu_cb(),
      which may allocate memory by kzalloc with GFP_KERNEL flag. If the
      allocation cannot be satisfied, it may sleep. However, That is not allowed
      in softirq contex.
      Although we found this problem on linux 4.4, the latest kernel version
      seems to have this problem as well. And it is very similar to the
      previous one:
      Fix it by using RCU workqueue, which allows sleep.
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.ibm.com>
      Signed-off-by: default avatarYufen Yu <yuyufen@huawei.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 13 Jan, 2019 2 commits
    • Damien Le Moal's avatar
      block: mq-deadline: Fix write completion handling · d902258a
      Damien Le Moal authored
      commit 7211aef8 upstream.
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to
      __blk_mq_free_request() call of blk_mq_sched_restart() to not run the
      queue when the already dispatched write request completes. The newly
      dispatched request stays stuck in the scheduler queue until eventually
      another request is submitted.
      This problem does not affect SCSI disk as the SCSI stack handles queue
      restart on request completion. However, this problem is can be triggered
      the nullblk driver with zoned mode enabled.
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Add missing export of blk_mq_sched_restart()
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Ming Lei's avatar
      block: deactivate blk_stat timer in wbt_disable_default() · 7571b18b
      Ming Lei authored
      commit 544fbd16 upstream.
      rwb_enabled() can't be changed when there is any inflight IO.
      wbt_disable_default() may set rwb->wb_normal as zero, however the
      blk_stat timer may still be pending, and the timer function will update
      wrb->wb_normal again.
      This patch introduces blk_stat_deactivate() and applies it in
      wbt_disable_default(), then the following IO hang triggered when running
      parted & switching io scheduler can be fixed:
      [  369.937806] INFO: task parted:3645 blocked for more than 120 seconds.
      [  369.938941]       Not tainted 4.20.0-rc6-00284-g906c801e5248 #498
      [  369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  369.940768] parted          D    0  3645   3239 0x00000000
      [  369.941500] Call Trace:
      [  369.941874]  ? __schedule+0x6d9/0x74c
      [  369.942392]  ? wbt_done+0x5e/0x5e
      [  369.942864]  ? wbt_cleanup_cb+0x16/0x16
      [  369.943404]  ? wbt_done+0x5e/0x5e
      [  369.943874]  schedule+0x67/0x78
      [  369.944298]  io_schedule+0x12/0x33
      [  369.944771]  rq_qos_wait+0xb5/0x119
      [  369.945193]  ? karma_partition+0x1c2/0x1c2
      [  369.945691]  ? wbt_cleanup_cb+0x16/0x16
      [  369.946151]  wbt_wait+0x85/0xb6
      [  369.946540]  __rq_qos_throttle+0x23/0x2f
      [  369.947014]  blk_mq_make_request+0xe6/0x40a
      [  369.947518]  generic_make_request+0x192/0x2fe
      [  369.948042]  ? submit_bio+0x103/0x11f
      [  369.948486]  ? __radix_tree_lookup+0x35/0xb5
      [  369.949011]  submit_bio+0x103/0x11f
      [  369.949436]  ? blkg_lookup_slowpath+0x25/0x44
      [  369.949962]  submit_bio_wait+0x53/0x7f
      [  369.950469]  blkdev_issue_flush+0x8a/0xae
      [  369.951032]  blkdev_fsync+0x2f/0x3a
      [  369.951502]  do_fsync+0x2e/0x47
      [  369.951887]  __x64_sys_fsync+0x10/0x13
      [  369.952374]  do_syscall_64+0x89/0x149
      [  369.952819]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  369.953492] RIP: 0033:0x7f95a1e729d4
      [  369.953996] Code: Bad RIP value.
      [  369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
      [  369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4
      [  369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004
      [  369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0
      [  369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380
      [  369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008
      Cc: stable@vger.kernel.org
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 11 Dec, 2018 1 commit
  4. 10 Dec, 2018 1 commit
  5. 07 Dec, 2018 2 commits
    • Jens Axboe's avatar
      blk-mq: punt failed direct issue to dispatch list · c616cbee
      Jens Axboe authored
      After the direct dispatch corruption fix, we permanently disallow direct
      dispatch of non read/write requests. This works fine off the normal IO
      path, as they will be retried like any other failed direct dispatch
      request. But for the blk_insert_cloned_request() that only DM uses to
      bypass the bottom level scheduler, we always first attempt direct
      dispatch. For some types of requests, that's now a permanent failure,
      and no amount of retrying will make that succeed. This results in a
      Instead of making special cases for what we can direct issue, and now
      having to deal with DM solving the livelock while still retaining a BUSY
      condition feedback loop, always just add a request that has been through
      ->queue_rq() to the hardware queue dispatch list. These are safe to use
      as no merging can take place there. Additionally, if requests do have
      prepped data from drivers, we aren't dependent on them not sharing space
      in the request structure to safely add them to the IO scheduler lists.
      This basically reverts ffe81d45 and is based on a patch from Ming,
      but with the list insert case covered as well.
      Fixes: ffe81d45 ("blk-mq: fix corruption with direct issue")
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarMing Lei <ming.lei@redhat.com>
      Reported-by: default avatarBart Van Assche <bvanassche@acm.org>
      Tested-by: default avatarMing Lei <ming.lei@redhat.com>
      Acked-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Paolo Valente's avatar
      block, bfq: fix decrement of num_active_groups · ba7aeae5
      Paolo Valente authored
      Since commit '2d29c9f8 ("block, bfq: improve asymmetric scenarios
      detection")', if there are process groups with I/O requests waiting for
      completion, then BFQ tags the scenario as 'asymmetric'. This detection
      is needed for preserving service guarantees (for details, see comments
      on the computation * of the variable asymmetric_scenario in the
      function bfq_better_to_idle).
      Unfortunately, commit '2d29c9f8 ("block, bfq: improve asymmetric
      scenarios detection")' contains an error exactly in the updating of
      the number of groups with I/O requests waiting for completion: if a
      group has more than one descendant process, then the above number of
      groups, which is renamed from num_active_groups to a more appropriate
      num_groups_with_pending_reqs by this commit, may happen to be wrongly
      decremented multiple times, namely every time one of the descendant
      processes gets all its pending I/O requests completed.
      A correct, complete solution should work as follows. Consider a group
      that is inactive, i.e., that has no descendant process with pending
      I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
      is still accounting for this group, because the group still has some
      descendant process with some I/O request still in
      flight. num_groups_with_pending_reqs should be decremented when the
      in-flight request of the last descendant process is finally completed
      (assuming that nothing else has changed for the group in the meantime,
      in terms of composition of the group and active/inactive state of
      child groups and processes). To accomplish this, an additional
      pending-request counter must be added to entities, and must be
      updated correctly.
      To avoid this additional field and operations, this commit resorts to
      the following tradeoff between simplicity and accuracy: for an
      inactive group that is still counted in num_groups_with_pending_reqs,
      this commit decrements num_groups_with_pending_reqs when the first
      descendant process of the group remains with no request waiting for
      This simplified scheme provides a fix to the unbalanced decrements
      introduced by 2d29c9f8. Since this error was also caused by lack
      of comments on this non-trivial issue, this commit also adds related
      Fixes: 2d29c9f8 ("block, bfq: improve asymmetric scenarios detection")
      Reported-by: Steven Barrett's avatarSteven Barrett <steven@liquorix.net>
      Tested-by: Steven Barrett's avatarSteven Barrett <steven@liquorix.net>
      Tested-by: Piotr Górski's avatarLucjan Lucjanov <lucjan.lucjanov@gmail.com>
      Reviewed-by: Federico Motta's avatarFederico Motta <federico@willer.it>
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
  6. 05 Dec, 2018 1 commit
    • Jens Axboe's avatar
      blk-mq: fix corruption with direct issue · ffe81d45
      Jens Axboe authored
      If we attempt a direct issue to a SCSI device, and it returns BUSY, then
      we queue the request up normally. However, the SCSI layer may have
      already setup SG tables etc for this particular command. If we later
      merge with this request, then the old tables are no longer valid. Once
      we issue the IO, we only read/write the original part of the request,
      not the new state of it.
      This causes data corruption, and is most often noticed with the file
      system complaining about the just read data being invalid:
      [  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
      because most of it is garbage...
      This doesn't happen from the normal issue path, as we will simply defer
      the request to the hardware queue dispatch list if we fail. Once it's on
      the dispatch list, we never merge with it.
      Fix this from the direct issue path by flagging the request as
      REQ_NOMERGE so we don't change the size of it before issue.
      See also:
        https://bugzilla.kernel.org/show_bug.cgi?id=201685Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Fixes: 6ce3dd6e ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
  7. 30 Nov, 2018 1 commit
    • Ming Lei's avatar
      block: fix single range discard merge · 2a5cf35c
      Ming Lei authored
      There are actually two kinds of discard merge:
      - one is the normal discard merge, just like normal read/write request,
      and call it single-range discard
      - another is the multi-range discard, queue_max_discard_segments(rq->q) > 1
      For the former case, queue_max_discard_segments(rq->q) is 1, and we
      should handle this kind of discard merge like the normal read/write
      This patch fixes the following kernel panic issue[1], which is caused by
      not removing the single-range discard request from elevator queue.
      Guangwu has one raid discard test case, in which this issue is a bit
      easier to trigger, and I verified that this patch can fix the kernel
      panic issue in Guangwu's test case.
      [1] kernel panic log from Jens's report
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
       PGD 0 P4D 0.
       Oops: 0000 [#1] SMP PTI
       CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \
      4.20.0-rc3-00649-ge64d9a554a91-dirty #14  Hardware name: Wiwynn \
      Leopard-Orv2/Leopard-DDR BW, BIOS LBM08   03/03/2017       Workqueue: kblockd \
      blk_mq_run_work_fn                                            RIP: \
      0010:blk_mq_get_driver_tag+0x81/0x120                                       Code: 24 \
      10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \
      0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \
      f6 87 b0 00 00 00 02  RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246                     \
        RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
       RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
       RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
       R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
       R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
       FS:  0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        ? elv_rb_del+0x11/0x30
        ? process_one_work+0x350/0x350
        ? kthread_destroy_worker+0x50/0x50
       Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \
      kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \
      cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \
      button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \
      nvme_core fuse sg loop efivarfs autofs4  CR2: 0000000000000148                        \
       ---[ end trace 340a1fb996df1b9b ]---
       RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
       Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \
      00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 \
      20 72 37 f6 87 b0 00 00 00 02
      Fixes: 445251d0 ("blk-mq: fix discard merge with scheduler attached")
      Reported-by: default avatarJens Axboe <axboe@kernel.dk>
      Cc: Guangwu Zhang <guazhang@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
  8. 14 Nov, 2018 2 commits
    • Ming Lei's avatar
      SCSI: fix queue cleanup race before queue initialization is done · 8dc765d4
      Ming Lei authored
      c2856ae2 ("blk-mq: quiesce queue before freeing queue") has
      already fixed this race, however the implied synchronize_rcu()
      in blk_mq_quiesce_queue() can slow down LUN probe a lot, so caused
      performance regression.
      Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      tried to quiesce queue for avoiding unnecessary synchronize_rcu()
      only when queue initialization is done, because it is usual to see
      lots of inexistent LUNs which need to be probed.
      However, turns out it isn't safe to quiesce queue only when queue
      initialization is done. Because when one SCSI command is completed,
      the user of sending command can be waken up immediately, then the
      scsi device may be removed, meantime the run queue in scsi_end_request()
      is still in-progress, so kernel panic can be caused.
      In Red Hat QE lab, there are several reports about this kind of kernel
      panic triggered during kernel booting.
      This patch tries to address the issue by grabing one queue usage
      counter during freeing one request and the following run queue.
      Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: jianchao.wang <jianchao.w.wang@oracle.com>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Dave Chinner's avatar
      block: fix 32 bit overflow in __blkdev_issue_discard() · 4800bf7b
      Dave Chinner authored
      A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to
      fall into an endless loop in the discard code. The test is creating
      a device that is exactly 2^32 sectors in size to test mkfs boundary
      conditions around the 32 bit sector overflow region.
      mkfs issues a discard for the entire device size by default, and
      hence this throws a sector count of 2^32 into
      blkdev_issue_discard(). It takes the number of sectors to discard as
      a sector_t - a 64 bit value.
      The commit ba5d7385 ("block: cleanup __blkdev_issue_discard")
      takes this sector count and casts it to a 32 bit value before
      comapring it against the maximum allowed discard size the device
      has. This truncates away the upper 32 bits, and so if the lower 32
      bits of the sector count is zero, it starts issuing discards of
      length 0. This causes the code to fall into an endless loop, issuing
      a zero length discards over and over again on the same sector.
      Fixes: ba5d7385 ("block: cleanup __blkdev_issue_discard")
      Tested-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Killed pointless WARN_ON().
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
  9. 12 Nov, 2018 1 commit
  10. 09 Nov, 2018 3 commits
  11. 07 Nov, 2018 2 commits
  12. 02 Nov, 2018 1 commit
  13. 31 Oct, 2018 2 commits
  14. 29 Oct, 2018 1 commit
  15. 26 Oct, 2018 1 commit
  16. 25 Oct, 2018 10 commits
  17. 23 Oct, 2018 1 commit
  18. 21 Oct, 2018 1 commit
  19. 20 Oct, 2018 1 commit
  20. 18 Oct, 2018 1 commit
  21. 16 Oct, 2018 1 commit
  22. 14 Oct, 2018 1 commit
  23. 13 Oct, 2018 2 commits
    • Jianchao Wang's avatar
      blk-mq: fallback to previous nr_hw_queues when updating fails · e01ad46d
      Jianchao Wang authored
      When we try to increate the nr_hw_queues, we may fail due to
      shortage of memory or other reason, then blk_mq_realloc_hw_ctxs stops
      and some entries in q->queue_hw_ctx are left with NULL. However,
      because queue map has been updated with new nr_hw_queues, some cpus
      have been mapped to hw queue which just encounters allocation failure,
      thus blk_mq_map_queue could return NULL. This will cause panic in
      following blk_mq_map_swqueue.
      To fix it, when increase nr_hw_queues fails, fallback to previous
      nr_hw_queues and post warning. At the same time, driver's .map_queues
      usually use completion irq affinity to map hw and cpu, fallback
      nr_hw_queues will cause lack of some cpu's map to hw, so use default
      blk_mq_map_queues to do that.
      Reported-by: syzbot+83e8cbe702263932d9d4@syzkaller.appspotmail.com
      Signed-off-by: default avatarJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Jianchao Wang's avatar
      blk-mq: realloc hctx when hw queue is mapped to another node · 34d11ffa
      Jianchao Wang authored
      When the hw queues and mq_map are updated, a hctx could be mapped
      to a different numa node. At this moment, we need to realloc the
      hctx. If fail to do that, go on using previous hctx.
      Signed-off-by: default avatarJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>