x86/mce: Prevent duplicate error records

JIRA: https://issues.redhat.com/browse/RHEL-24447
Tested: by me

v2: rebased patch, migrated to jira.

commit c3629dd7e67d6ec5705d33b0de0d142c972fe573
Author: Borislav Petkov (AMD) <bp@alien8.de>
Date: Wed Jul 19 14:19:50 2023 +0200

x86/mce: Prevent duplicate error records      
  
A legitimate use case of the MCA infrastructure is to have the firmware      
log all uncorrectable errors and also, have the OS see all correctable      
errors.      
  
The uncorrectable, UCNA errors are usually configured to be reported
through an SMI. CMCI, which is the correctable error reporting
interrupt, uses SMI too, and having both enabled leads to unnecessary
overhead.
  
So what ends up happening is, people disable CMCI in the wild and leave      
on only the UCNA SMI.      
  
When CMCI is disabled, the MCA infrastructure resorts to polling the MCA
banks. If an MCA MSR is shared between the logical threads, one error
ends up getting logged multiple times as the polling runs on every
logical thread.
  
Therefore, introduce locking on the Intel side of the polling routine to      
prevent such duplicate error records from appearing.      
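
The effect of such locking can be illustrated with a small user-space
sketch. This is not the kernel change itself: pthreads stand in for
logical threads, a plain flag stands in for a shared MCA bank status
register, and every name used here (bank_status_valid, poll_lock,
poll_shared_bank, simulate_one_error) is hypothetical. The point is only
that making the read-log-clear sequence atomic yields one record per
error, no matter how many threads poll.

```c
#include <pthread.h>
#include <stdbool.h>

#define NR_THREADS 4

/* Hypothetical stand-in for a bank status register shared by sibling threads. */
static bool bank_status_valid;
/* Number of error records emitted for the current simulated error. */
static int records_logged;

/* Hypothetical lock that serializes the read-log-clear sequence. */
static pthread_mutex_t poll_lock = PTHREAD_MUTEX_INITIALIZER;

/* One polling pass: log the error if it is still pending, then clear it. */
static void poll_shared_bank(void)
{
	pthread_mutex_lock(&poll_lock);
	if (bank_status_valid) {
		records_logged++;          /* "log" the error record */
		bank_status_valid = false; /* clear the bank so others skip it */
	}
	pthread_mutex_unlock(&poll_lock);
}

static void *poller(void *arg)
{
	(void)arg;
	poll_shared_bank();
	return NULL;
}

/*
 * Inject one error, let every simulated logical thread poll once, and
 * return how many records were logged (-1 if a thread could not start).
 */
int simulate_one_error(void)
{
	pthread_t tids[NR_THREADS];
	int started = 0;
	int i;

	bank_status_valid = true;
	records_logged = 0;

	for (i = 0; i < NR_THREADS; i++) {
		if (pthread_create(&tids[i], NULL, poller, NULL) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return started == NR_THREADS ? records_logged : -1;
}
```

Built with -pthread, simulate_one_error() returns 1: whichever thread
takes the lock first logs and clears the error, and the remaining threads
find the bank empty. Without the lock, two threads could both observe the
pending error before either clears it and each emit a record, which is
the duplication described above.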
  
Based on a patch by Aristeu Rozanski <aris@ruivo.org>.      
  
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>      
Tested-by: Tony Luck <tony.luck@intel.com>      
Acked-by: Aristeu Rozanski <aris@ruivo.org>      
Link: https://lore.kernel.org/r/20230515143225.GC4090740@cathedrallabs.org      

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>

