  1. 05 Dec, 2018 1 commit
  2. 28 Sep, 2018 1 commit
    • libnvdimm, region: Fail badblocks listing for inactive regions · 5d394eee
      Dan Williams authored
      While experimenting with region driver loading the following backtrace
      was triggered:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       [..]
       Call Trace:
        dump_stack+0x85/0xcb
        register_lock_class+0x571/0x580
        ? __lock_acquire+0x2ba/0x1310
        ? kernfs_seq_start+0x2a/0x80
        __lock_acquire+0xd4/0x1310
        ? dev_attr_show+0x1c/0x50
        ? __lock_acquire+0x2ba/0x1310
        ? kernfs_seq_start+0x2a/0x80
        ? lock_acquire+0x9e/0x1a0
        lock_acquire+0x9e/0x1a0
        ? dev_attr_show+0x1c/0x50
        badblocks_show+0x70/0x190
        ? dev_attr_show+0x1c/0x50
        dev_attr_show+0x1c/0x50
      
      This results from a missing successful call to devm_init_badblocks()
      from nd_region_probe(). Block attempts to show badblocks while the
      region is not enabled.
      
      Fixes: 6a6bef90 ("libnvdimm: add mechanism to publish badblocks...")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
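      The guard described in this commit can be modeled in userspace as
      follows. This is only a sketch: struct region_model, its fields, and the
      -ENXIO return value are assumptions made for illustration, not the
      kernel's actual definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a region device; not the kernel's struct. */
struct region_model {
    bool driver_bound;       /* true once the region driver has probed */
    unsigned long bad_count; /* stands in for the initialized badblocks list */
};

/*
 * Model of a sysfs "show" handler: refuse to touch badblocks state that
 * was never initialized because the region was never enabled.
 */
long region_badblocks_show_model(const struct region_model *r,
                                 char *buf, size_t len)
{
    if (!r->driver_bound)
        return -ENXIO; /* assumed error code for this sketch */
    return snprintf(buf, len, "%lu\n", r->bad_count);
}
```

      With this shape, a read against an inactive region fails early instead
      of taking a lock that was never set up.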
  3. 25 Jul, 2018 2 commits
  4. 06 Jun, 2018 1 commit
    • libnvdimm, pmem: Do not flush power-fail protected CPU caches · 546eb031
      Ross Zwisler authored
      This commit:
      
      5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      
      intended to make sure that deep flush was always available even on
      platforms which support a power-fail protected CPU cache.  An unintended
      side effect of this change was that we also lost the ability to skip
      flushing CPU caches on platforms with a power-fail protected CPU cache.
      
      Fix this by skipping the low level cache flushing in dax_flush() if we have
      CPU caches which are power-fail protected.  The user can still override this
      behavior by manually setting the write_cache state of a namespace.  See
      libndctl's ndctl_namespace_write_cache_is_enabled(),
      ndctl_namespace_enable_write_cache() and
      ndctl_namespace_disable_write_cache() functions.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
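      The behavior described above, skipping the low-level flush when the CPU
      caches are power-fail protected, might look roughly like the following
      userspace model. The struct, its fields, and the helper names are
      hypothetical; only the idea of a per-device write_cache switch comes
      from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dax device; field names are illustrative only. */
struct dax_model {
    bool write_cache;     /* true: CPU caches must be flushed explicitly */
    size_t flushed_bytes; /* counts bytes handed to the low-level flush */
};

/* Stand-in for the architecture's cache write-back routine. */
static void arch_flush_model(struct dax_model *d, size_t size)
{
    d->flushed_bytes += size;
}

/* Skip the expensive flush when the write_cache state is disabled. */
void dax_flush_model(struct dax_model *d, size_t size)
{
    if (!d->write_cache)
        return;
    arch_flush_model(d, size);
}
```

      Toggling write_cache in this model corresponds to the manual override
      the commit message mentions.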
  5. 07 Apr, 2018 1 commit
  6. 03 Apr, 2018 1 commit
  7. 21 Mar, 2018 2 commits
    • libnvdimm, nfit: fix persistence domain reporting · fe9a552e
      Dan Williams authored
      The persistence domain is a point in the platform where once writes
      reach that destination the platform claims it will make them persistent
      relative to power loss. In the ACPI NFIT this is currently communicated
      as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
      comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
      Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
      Durability on Power Loss Capable".
      
      Commit 96c3a239 "libnvdimm: expose platform persistence attr..."
      shows the persistence domain as flags, but it's really an enumerated
      hierarchy.
      
      Fix this newly introduced user ABI to show the closest available
      persistence domain before userspace develops dependencies on seeing, or
      needing to develop code to tolerate, the raw NFIT flags communicated
      through the libnvdimm-generic region attribute.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
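      The difference between reporting flags and reporting an enumerated
      hierarchy can be illustrated with a short sketch. The macro names and
      the returned strings below are assumptions for illustration; the commit
      message only states that bit0 implies bit1 and that the closest
      available domain should be shown.

```c
#include <stdint.h>

/* Illustrative names for the two capability bits described above. */
#define CAP_CPU_CACHE_FLUSH (1u << 0) /* bit0: CPU cache flush on power loss */
#define CAP_MEM_CTRL_FLUSH  (1u << 1) /* bit1: memory controller flush */

/*
 * Report exactly one domain: the closest one to the CPU that the platform
 * claims. Because bit0 implies bit1, bit0 is tested first.
 */
const char *persistence_domain_name(uint32_t caps)
{
    if (caps & CAP_CPU_CACHE_FLUSH)
        return "cpu_cache";
    if (caps & CAP_MEM_CTRL_FLUSH)
        return "memory_controller";
    return "unknown"; /* placeholder string for this sketch */
}
```

      A platform that sets both bits reports only the CPU cache domain here,
      which is the point of treating the bits as a hierarchy.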
    • libnvdimm, region: hide persistence_domain when unknown · 896196dc
      Dan Williams authored
      Similar to other region attributes, do not emit the persistence_domain
      attribute if its contents are empty.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Cc: Dave Jiang <dave.jiang@intel.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  8. 01 Feb, 2018 1 commit
  9. 28 Sep, 2017 1 commit
  10. 05 Aug, 2017 1 commit
  11. 29 Jun, 2017 2 commits
  12. 27 Jun, 2017 3 commits
    • libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams authored
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, pmem: fix persistence warning · c00b396e
      Dan Williams authored
      The pmem driver assumes if platform firmware describes the memory
      devices associated with a persistent memory range and
      CONFIG_ARCH_HAS_PMEM_API=y that it has all the mechanism necessary to
      flush data to a power-fail safe zone. We warn if the firmware does not
      describe memory devices, but we also need to warn if the architecture
      does not claim pmem support.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • x86, libnvdimm, pmem: remove global pmem api · ca6a4657
      Dan Williams authored
      Now that all callers of the pmem api have been converted to dax helpers that
      call back to the pmem driver, we can remove include/linux/pmem.h and
      asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  13. 15 Jun, 2017 1 commit
  14. 09 Jun, 2017 1 commit
    • x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Dan Williams authored
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol; they fall back to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  15. 04 May, 2017 1 commit
  16. 29 Apr, 2017 1 commit
    • libnvdimm: rework region badblocks clearing · 23f49844
      Dan Williams authored
      Toshi noticed that the new support for a region-level badblocks missed
      the case where errors are cleared due to BTT I/O.
      
      An initial attempt to fix this ran into a "sleeping while atomic"
      warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
      satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
      However, that lock is not needed since we are not acting on any data that
      is subject to change under that lock. The badblocks instance has its own
      internal lock to handle mutations of the error list.
      
      So, in order to make it clear that we are just acting on region devices,
      rename __nvdimm_bus_badblocks_clear() to nvdimm_clear_badblocks_regions().
      Eliminate the lock and consolidate all support routines for the new
      nvdimm_account_cleared_poison() in drivers/nvdimm/bus.c. Finally, take the
      opportunity to clean up some unnecessary casts, make the calling
      convention of nvdimm_clear_badblocks_regions() clearer by replacing struct
      resource with the minimal struct clear_badblocks_context, and use the
      DEVICE_ATTR macro.
      
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reported-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  17. 28 Apr, 2017 1 commit
    • libnvdimm, region: sysfs trigger for nvdimm_flush() · ab630891
      Dan Williams authored
      The nvdimm_flush() mechanism helps to reduce the impact of an ADR
      (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
      platform WPQ (write-pending-queue) buffers when power is removed. The
      nvdimm_flush() mechanism performs that same function on-demand.
      
      When a pmem namespace is associated with a block device, an
      nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
      request. These requests are typically associated with filesystem
      metadata updates. However, when a namespace is in device-dax mode,
      userspace (think database metadata) needs another path to perform the
      same flushing. In other words this is not required to make data
      persistent, but in the case of metadata it allows for a smaller failure
      domain in the unlikely event of an ADR failure.
      
      The new 'deep_flush' attribute is visible when the individual DIMMs
      backing a given interleave-set are described by platform firmware. In
      ACPI terms this is "NVDIMM Region Mapping Structures" and associated
      "Flush Hint Address Structures". Reads return "1" if the region supports
      triggering WPQ flushes on all DIMMs. Reads return "0" if the flush
      operation is a platform nop, and in that case the attribute is
      read-only.
      
      Why sysfs and not an ioctl? An ioctl requires establishing a new
      ioctl function number space for device-dax. Given that this would be
      called on a device-dax fd an application could be forgiven for
      accidentally calling this on a filesystem-dax fd. Placing this interface
      in libnvdimm sysfs removes that potential for collision with a
      filesystem ioctl, and it keeps ioctls out of the generic device-dax
      implementation.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
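      From userspace, triggering the flush described above amounts to a
      write to the region's 'deep_flush' attribute. The helper below is a
      minimal sketch: the function name is hypothetical, the caller supplies
      the attribute path, and writing "1" as the trigger value is an
      assumption based on the attribute reading back "1" or "0".

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write "1" to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure.
 */
int trigger_deep_flush(const char *attr_path)
{
    int fd = open(attr_path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, "1\n", 2);
    int err = 0;
    if (n < 0)
        err = -errno;
    else if (n != 2)
        err = -EIO; /* short write: treat as an I/O error in this sketch */

    close(fd);
    return err;
}
```

      A caller would pass something like the deep_flush file under the
      region's sysfs directory; the exact path depends on the system and is
      not given in the commit message. On a region where the attribute is
      read-only, the open is expected to fail.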
  18. 24 Apr, 2017 1 commit
    • libnvdimm, region: fix flush hint detection crash · bc042fdf
      Dan Williams authored
      In the case where a dimm does not have any associated flush hints the
      ndrd->flush_wpq array may be uninitialized leading to crashes with the
      following signature:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
       IP: region_visible+0x10f/0x160 [libnvdimm]
      
       Call Trace:
        internal_create_group+0xbe/0x2f0
        sysfs_create_groups+0x40/0x80
        device_add+0x2d8/0x650
        nd_async_device_register+0x12/0x40 [libnvdimm]
        async_run_entry_fn+0x39/0x170
        process_one_work+0x212/0x6c0
        ? process_one_work+0x197/0x6c0
        worker_thread+0x4e/0x4a0
        kthread+0x10c/0x140
        ? process_one_work+0x6c0/0x6c0
        ? kthread_create_on_node+0x60/0x60
        ret_from_fork+0x31/0x40
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Fixes: f284a4f2 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  19. 13 Apr, 2017 2 commits
  20. 01 Mar, 2017 1 commit
    • nfit, libnvdimm: fix interleave set cookie calculation · 86ef58a4
      Dan Williams authored
      The interleave-set cookie is a checksum that sanity checks that the
      composition of an interleave set has not changed from when the namespace
      was initially
      created.  The checksum is calculated by sorting the DIMMs by their
      location in the interleave-set. The comparison for the sort must be
      64-bit wide, not byte-by-byte as performed by memcmp() in the broken
      case.
      
      Fix the implementation to accept correct cookie values in addition to
      the Linux "memcmp" order cookies, but only allow correct cookies to be
      generated going forward. It does mean that namespaces created by
      third-party-tooling, or created by newer kernels with this fix, will not
      validate on older kernels. However, there are a couple mitigating
      conditions:
      
          1/ platforms with namespace-label capable NVDIMMs are not widely
             available.
      
          2/ interleave-sets with a single-dimm are by definition not affected
             (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.
      
      The cookie stored in the namespace label will be fixed by any write to
      the namespace label; the most straightforward way to achieve this is to
      write to the "alt_name" attribute of a namespace in sysfs.
      
      Cc: <stable@vger.kernel.org>
      Fixes: eaf96153 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
      Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
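      Why a byte-by-byte comparison produces a different sort order than a
      64-bit comparison can be shown with a small sketch. The function names
      are hypothetical, and the little-endian encoding is spelled out
      explicitly so the result does not depend on the host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Encode v as 8 little-endian bytes, the in-memory layout on x86. */
static void put_le64(uint64_t v, unsigned char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Broken ordering: compares the in-memory bytes, lowest byte first. */
int cmp_bytewise(uint64_t a, uint64_t b)
{
    unsigned char ba[8], bb[8];

    put_le64(a, ba);
    put_le64(b, bb);
    int r = memcmp(ba, bb, 8);
    return (r > 0) - (r < 0);
}

/* Intended ordering: compares the full 64-bit values numerically. */
int cmp_u64(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}
```

      For the values 0x0100 and 0x0001 the two comparators disagree, so a
      sort built on each would order the same DIMMs differently and yield
      different cookies.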
  21. 16 Dec, 2016 1 commit
  22. 07 Oct, 2016 2 commits
    • libnvdimm, namespace: allow creation of multiple pmem-namespaces per region · 98a29c39
      Dan Williams authored
      Similar to BLK regions, publish new seed namespace devices to allow
      unused PMEM region capacity to be consumed by additional namespaces.

      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, region: update nd_region_available_dpa() for multi-pmem support · a1f3e4d6
      Dan Williams authored
      The free dpa (dimm-physical-address) space calculation reports how much
      free space is available with consideration for aliased BLK + PMEM
      regions.  Recall that BLK capacity is allocated from high addresses and
      PMEM is allocated from low addresses in their respective regions.
      
      nd_region_available_dpa() accounts for the fact that the largest
      encroachment (lowest starting address) into PMEM capacity by a BLK
      allocation limits the available capacity to that point, regardless if
      there is BLK allocation hole at a higher address.  Similarly, for the
      multi-pmem case we need to track the largest encroachment (highest
       ending address) of a PMEM allocation in BLK capacity regardless of
      whether there is an allocation hole that a BLK allocation could fill at
      a lower address.
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      a1f3e4d6
  23. 01 Oct, 2016 3 commits
  24. 24 Sep, 2016 1 commit
  25. 19 Sep, 2016 1 commit
    • nvdimm: fix PHYS_PFN/PFN_PHYS mixup · 480b6837
      Oliver O'Halloran authored
      nd_activate_region() iomaps any hint addresses required when activating
      a region. To prevent duplicate mappings it checks the PFN of the hint to
      be mapped against the PFNs of the already mapped hints. Unfortunately it
      doesn't convert the PFN back into a physical address before passing it
      to devm_nvdimm_ioremap(). Instead it applies PHYS_PFN a second time
      which ends about as well as you would imagine.

      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
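      The effect of the mixup is easy to see with simplified stand-ins for
      the two macros. The MODEL_ names and the 4 KiB page size are
      assumptions for this sketch; the kernel's own definitions live in its
      headers.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumes 4 KiB pages */
#define MODEL_PHYS_PFN(addr) ((uint64_t)(addr) >> MODEL_PAGE_SHIFT)
#define MODEL_PFN_PHYS(pfn)  ((uint64_t)(pfn) << MODEL_PAGE_SHIFT)

/* Correct: convert the page frame number back into a physical address. */
uint64_t hint_addr_correct(uint64_t phys)
{
    return MODEL_PFN_PHYS(MODEL_PHYS_PFN(phys));
}

/* Buggy: shifts right a second time, producing a meaningless address. */
uint64_t hint_addr_buggy(uint64_t phys)
{
    return MODEL_PHYS_PFN(MODEL_PHYS_PFN(phys));
}
```

      For a page-aligned input such as 0x12345000 the correct version
      returns the same address, while the buggy version returns 0x12.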
  26. 11 Jul, 2016 4 commits
    • libnvdimm: cycle flush hints · 0c27af60
      Dan Williams authored
      When the NFIT provides multiple flush hint addresses per-dimm it is
      expressing that the platform is capable of processing multiple flush
      requests in parallel.  There is some fixed cost per flush request, so let
      the cost be shared in parallel on multiple cpus.
      
      Since there may not be enough flush hint addresses for each cpu to have
      one, keep a per-cpu index of the last used hint, hash it with current
      pid, and assume that access pattern and scheduler randomness will keep
      the flush-hint usage somewhat staggered across cpus.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
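      The hint selection described above can be approximated in userspace
      as follows. Everything here is a model: the fixed-size array stands in
      for per-cpu storage, the multiplicative hash and its constant are
      assumptions, and the number of hints is assumed to be a power of two
      so the index can be masked.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Stand-in for a per-cpu "last used hint" counter. */
static uint32_t flush_idx[MODEL_NR_CPUS];

/* Simple multiplicative hash keeping the top `bits` bits (1..31). */
static uint32_t hash32_model(uint32_t val, unsigned bits)
{
    return (val * 0x61C88647u) >> (32 - bits);
}

/*
 * Advance this cpu's counter by a pid-dependent step and mask it down to
 * one of the (1 << hints_shift) available flush hint slots.
 */
unsigned next_flush_hint(unsigned cpu, uint32_t pid, unsigned hints_shift)
{
    uint32_t idx = flush_idx[cpu % MODEL_NR_CPUS];

    idx += hash32_model(pid + idx, 8);
    flush_idx[cpu % MODEL_NR_CPUS] = idx;
    return idx & ((1u << hints_shift) - 1);
}
```

      The scheme makes no fairness guarantee; it only relies on different
      cpus and pids tending to land on different slots.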
    • libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() · f284a4f2
      Dan Williams authored
      nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
      an optional write flushing mechanism that an nvdimm bus can provide for
      the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
      nvdimm_flush() is implemented as a series of flush-hint-address [1]
      writes to each dimm in the interleave set (region) that backs the
      namespace.
      
      The nvdimm_has_flush() routine relies on platform firmware to describe
      the flushing capabilities of a platform.  It uses the heuristic of
      whether an nvdimm bus provider provides flush address data to return a
      ternary result:
      
            1: flush addresses defined
            0: dimm topology described without flush addresses (assume ADR)
       -errno: no topology information, unable to determine flush mechanism
      
      The pmem driver is expected to take the following actions on this ternary
      result:
      
            1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
            0: do not set WC or FUA on the queue, take no further action
       -errno: warn and then operate as if nvdimm_has_flush() returned '0'
      
      The caveat of this heuristic is that it can not distinguish the "dimm
      does not have flush address" case from the "platform firmware is broken
      and failed to describe a flush address".  Given we are already
      explicitly trusting the NFIT there's not much more we can do beyond
      blacklisting broken firmwares if they are ever encountered.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
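      The ternary result and the actions listed above map naturally onto a
      small decision function. This is a sketch of the policy as this commit
      message states it; struct pmem_flush_policy and its field names are
      hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of what the pmem driver would do. */
struct pmem_flush_policy {
    bool advertise_write_cache; /* set WC / FUA on the request queue */
    bool call_nvdimm_flush;     /* flush on REQ_FUA / REQ_FLUSH and shutdown */
    bool warn;                  /* topology unknown: emit a warning */
};

/* has_flush follows the convention above: 1, 0, or a negative errno. */
struct pmem_flush_policy pmem_policy_from_has_flush(int has_flush)
{
    struct pmem_flush_policy p = { false, false, false };

    if (has_flush < 0) {
        p.warn = true; /* then behave exactly like the '0' case */
    } else if (has_flush > 0) {
        p.advertise_write_cache = true;
        p.call_nvdimm_flush = true;
    }
    return p;
}
```

      Folding the error case into the '0' behavior reflects the heuristic's
      caveat: without topology data the driver cannot do better than assume
      ADR.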
    • libnvdimm, nfit: move flush hint mapping to region-device driver-data · e5ae3b25
      Dan Williams authored
      In preparation for triggering flushes of a DIMM's writes-posted-queue
      (WPQ) via the pmem driver, move mapping of flush hint addresses to the
      region driver.  Since this uses devm_nvdimm_memremap() the flush
      addresses will remain mapped while any region to which the dimm belongs
      is active.
      
      We need to communicate more information to the nvdimm core to facilitate
      this mapping, namely each dimm object now carries an array of flush hint
      address resources.

      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, nfit: remove nfit_spa_map() infrastructure · a8a6d2e0
      Dan Williams authored
      Now that all shared mappings are handled by devm_nvdimm_memremap() we no
      longer need nfit_spa_map() nor do we need to trigger a callback to the
      bus provider at region disable time.

      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  27. 21 May, 2016 1 commit
  28. 09 May, 2016 1 commit
    • libnvdimm, dax: introduce device-dax infrastructure · cd03412a
      Dan Williams authored
      Device DAX is the device-centric analogue of Filesystem DAX
      (CONFIG_FS_DAX).  It allows persistent memory ranges to be allocated and
      mapped without need of an intervening file system.  This initial
      infrastructure arranges for a libnvdimm pfn-device to be represented as
      a different device-type so that it can be attached to a driver other
      than the pmem driver.

      Signed-off-by: Dan Williams <dan.j.williams@intel.com>