powerpc/papr_scm: Implement initial support for injecting smart errors

Steve Best requested to merge sfbest/centos-stream-9:1873827 into main

Bugzilla: http://bugzilla.redhat.com/1873827

Build Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44131487

Tested: Successfully tested by IBM. Sanity boot tested on a POWER9 system (ibm-p9z-20-lp23).

Conflicts: Removed the new blank line at the end of the hunk for the /Documentation/ABI/testing/sysfs-bus-papr-pmem file so the patch would apply cleanly.

[sbest@sbest rhel91873827]$ git am /home/sbest/1873827/0001.diff
Applying: powerpc/papr_scm: Implement initial support for injecting smart errors
.git/rebase-apply/patch:20: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git

commit bbbca72352bb9484bc057c91a408332b35ee8f4c
Author: Vaibhav Jain <vaibhav@linux.ibm.com>
Date: Tue Jan 25 01:52:04 2022 +0530

powerpc/papr_scm: Implement initial support for injecting smart errors

Presently PAPR doesn't support injecting smart errors on an
NVDIMM. This makes testing the NVDIMM health reporting functionality
difficult, as simulating NVDIMM health related events needs a hacked up
qemu version.

To solve this problem, this patch proposes simulating a certain set of
NVDIMM health related events in papr_scm, specifically the 'fatal' health
state and the 'dirty' shutdown state. These errors can be injected via the
user-space 'ndctl-inject-smart(1)' command. With the proposed patch and
the corresponding ndctl patches, the following command flow is expected:

$ sudo ndctl list -DH -d nmem0
...
      "health_state":"ok",
      "shutdown_state":"clean",
...
$ sudo ndctl inject-smart nmem0 -Uf
...
      "health_state":"fatal",
      "shutdown_state":"dirty",
...
$ sudo ndctl inject-smart nmem0 -N
...
      "health_state":"ok",
      "shutdown_state":"clean",
...

The patch adds a new member 'health_bitmap_inject_mask' inside struct
papr_scm_priv which is then bitwise ORed into the health bitmap fetched from the
hypervisor. The value of 'health_bitmap_inject_mask' is accessible from sysfs
at nmemX/papr/health_bitmap_inject.
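
A minimal sketch of how such a mask could be folded into the bitmap
fetched from the hypervisor is shown here. It assumes that an injected
error has to show up as a set bit, so every bit present in the mask is
forced on and all other bits pass through unchanged. 'struct priv_sketch'
and apply_inject_mask() are illustrative names, not the driver's own
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant members of struct papr_scm_priv. */
struct priv_sketch {
	uint64_t health_bitmap;             /* value reported to users */
	uint64_t health_bitmap_inject_mask; /* bits forced on by injection */
};

/*
 * Combine the bitmap fetched from the hypervisor with the injected bits.
 * Bits set in the inject mask are forced to 1; every other bit keeps the
 * value the hypervisor reported.
 */
static uint64_t apply_inject_mask(struct priv_sketch *p, uint64_t hv_bitmap)
{
	uint64_t mask;

	assert(p != NULL);
	mask = p->health_bitmap_inject_mask;
	p->health_bitmap = (hv_bitmap & ~mask) | mask;
	return p->health_bitmap;
}
```

With an empty mask the hypervisor value is reported as-is, so a system
where nothing was injected behaves as before.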

A new PDSM named 'SMART_INJECT' is proposed that accepts the newly
introduced 'struct nd_papr_pdsm_smart_inject' as a payload that is
exchanged between libndctl and papr_scm to indicate the requested
smart-error states.

When processing the PDSM 'SMART_INJECT', papr_pdsm_smart_inject()
constructs a pair of 'inject_mask' and 'clear_mask' bitmaps from the payload
and bit-blts them to the 'health_bitmap_inject_mask'. This ensures that, after
being fetched from the hypervisor, the health_bitmap reflects the requested
smart-error states.
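
The mask construction described above can be sketched roughly as
follows. The payload layout, the flag names and the bit positions are
all assumptions made for illustration; the real
'struct nd_papr_pdsm_smart_inject' and the PAPR health bitmap layout may
differ. The sketch only shows the idea: each requested state lands in
either 'inject_mask' or 'clear_mask', and both are then merged into the
current inject mask in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical health bitmap bits; the real layout is defined by PAPR. */
#define SKETCH_HEALTH_FATAL    (1ULL << 0)
#define SKETCH_SHUTDOWN_DIRTY  (1ULL << 1)

/* Hypothetical flags telling which payload fields the caller filled in. */
#define SKETCH_INJECT_HEALTH_FATAL  (1u << 0)
#define SKETCH_INJECT_BAD_SHUTDOWN  (1u << 1)

/* Simplified, assumed stand-in for struct nd_papr_pdsm_smart_inject. */
struct smart_inject_sketch {
	uint32_t flags;                 /* SKETCH_INJECT_* bits */
	uint8_t fatal_enable;           /* 1 = inject, 0 = clear */
	uint8_t unsafe_shutdown_enable; /* 1 = inject, 0 = clear */
};

/* Return the new inject mask after applying one request to cur_mask. */
static uint64_t update_inject_mask(uint64_t cur_mask,
				   const struct smart_inject_sketch *req)
{
	uint64_t inject_mask = 0, clear_mask = 0;

	assert(req != NULL);

	if (req->flags & SKETCH_INJECT_HEALTH_FATAL) {
		if (req->fatal_enable)
			inject_mask |= SKETCH_HEALTH_FATAL;
		else
			clear_mask |= SKETCH_HEALTH_FATAL;
	}

	if (req->flags & SKETCH_INJECT_BAD_SHUTDOWN) {
		if (req->unsafe_shutdown_enable)
			inject_mask |= SKETCH_SHUTDOWN_DIRTY;
		else
			clear_mask |= SKETCH_SHUTDOWN_DIRTY;
	}

	/* Drop the cleared bits, then set the injected ones. */
	return (cur_mask & ~clear_mask) | inject_mask;
}
```

In terms of the command flow shown earlier, a request with both fields
enabled corresponds to the 'fatal' and 'dirty' states, and a request with
both fields disabled returns the mask to zero. Fields whose flag is not
set leave the corresponding mask bit untouched.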

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124202204.1488346-1-vaibhav@linux.ibm.com

Signed-off-by: Steve Best <sbest@redhat.com>

Edited by Steve Best
