Skip to content
Snippets Groups Projects
  1. Jun 26, 2024
    • George Dunlap's avatar
    • George Dunlap's avatar
      x86/svm: Add a trace for VMEXIT_VMRUN · 971a3a2d
      George Dunlap authored
      
      Note that this trace is SVM-specific. Most HVM handler traces are
      shared between VMX and SVM because the underlying instruction set is
      largely the equivalent; but in this case, the instructions are
      different enough that there's no sensible way to share HVM handler
      traces between them.
      
      Keeping the target VMCB address should allow future analysis of which
      L2 vcpu within an L1 is running.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      971a3a2d
    • George Dunlap's avatar
      xenalyze: Basic processing for XSETBV exits and handlers · 271e8849
      George Dunlap authored
      
      Basically this means adding VMEXIT strings for XSETBV exit, and adding
      the handlers and strings for them.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      271e8849
    • George Dunlap's avatar
      x86/trace: Add trace to xsetbv svm/vmx handler path · 73179a65
      George Dunlap authored
      
      There are already "HVM handler" trace records for writing to XCRs in
      the context of an HVM guest.  This trace is currently taken in
      hvmemul_write_xcr.
      
      However, both VMX and SVM vmexits call hvm_handle_xsetbv as a result
      of an XSETBV vmexit, and hvm_handle_xsetbv calls x86emul_write_xcr
      directly, bypassing the trace, resulting in no "HVM handler" trace
      record for that VMEXIT.
      
      For maximal DRY-ness, we would want hvm_handle_xsetbv to call
      hvmemul_write_xcr; but since the intent seems to be for hvmemul_* to
      be only accesible via hvm_emulate(), just duplicate the trace.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      ---
      CC: Andrew Cooper <andrew.cooper@cloud.com>
      CC: Jan Beulich <jbeulich@suse.com>
      CC: Roger Pau Monne <roger.pau@cloud.com>
      73179a65
    • George Dunlap's avatar
      xenalyze: Quiet warnings about VMEXIT_IOIO · 984db0d2
      George Dunlap authored
      
      There's a general issue with both PIO and MMIO reads (as detailed in
      the comment); do a work-around for now.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      984db0d2
    • George Dunlap's avatar
      x86/emulate: Don't trace cr reads during emulation · d5fce446
      George Dunlap authored
      
      hvmemul_read_cr ends up being called fairly freqently during
      emulation; these are generally not necessary for understanding a
      trace, and make processing more complicated (because they show up as
      extra trace records within a vmexit/entry "arc" that must be
      classified).
      
      Leave the trace in write_cr, as any hypothetical writes to CRs *will*
      be necessary to understand traces.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      d5fce446
    • George Dunlap's avatar
      svm: Do NPF trace before calling hvm_hap_nested_page_fault · be7f8889
      George Dunlap authored
      
      Unfortunately I've forgotten exactly why I made this change.  I
      suspect that there were other traces (like MMIO traces) which were
      being put before the NPF trace; but to understand the trace record
      you'd want to have the NPF information first.
      
      Signed-off-by: default avatarGeorge Dunlap <georg.dunlap@cloud.com>
      be7f8889
    • George Dunlap's avatar
      xen/hvm: Don't skip MSR_READ trace record · 1a761e7b
      George Dunlap authored
      
      Commit 37f074a3 ("x86/msr: introduce guest_rdmsr()") introduced a
      function to combine the MSR_READ handling between PV and HVM.
      Unfortunately, by returning directly, it skipped the trace generation,
      leading to gaps in the trace record, as well as xenalyze errors like
      this:
      
      hvm_generic_postprocess: d2v0 Strange, exit 7c(VMEXIT_MSR) missing a handler
      
      Replace the `return` with `goto out`.
      
      Fixes: 37f074a3 ("x86/msr: introduce guest_rdmsr()")
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      1a761e7b
    • George Dunlap's avatar
      xen/svm: Remove redundant HVM_HANDLER trace for EXCEPTION_AC · 2cdf9030
      George Dunlap authored
      
      Adding an HVM_TRAP trace record is redundant for EXCEPTION_AC: it adds
      trace volume without adding any information, and xenalyze already
      knows not to expect it.  Remove it.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      2cdf9030
    • George Dunlap's avatar
      xenalyze: Ignore vmexits where an HVM_HANDLER traces would be redundant · 3c15dee2
      George Dunlap authored
      
      VMX combines all exceptions into a single VMEXIT exit reason, and so
      needs a separate HVM_HANDLER trace record (HVM_TRAP) to say which
      exception happened; but for a number of exceptions, no further
      information is put into the trace log.
      
      SVM, by contrast, has a separate VMEXIT exit reason for each exception
      vector, so HVM_TRAP would be redundant.
      
      NB that VMEXIT_EXCEPTION_DB doesn't have an HVM_HANDLER for SVM yet;
      but for VMX, there's a specific HVM_HANDLER trace record which
      includes more information; so SVM really should record information as
      well.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      3c15dee2
    • George Dunlap's avatar
      xenalyze: Track generic event information when not in summary mode · abb43c22
      George Dunlap authored
      
      Generally speaking, a VMEXIT/VMENTRY cycle should have at least three
      trace records: the VMEXIT trace (which contains the processor-specific
      exit code), a more generic Xen-based Xen event (an HVM_HANDLER trace
      record), and a VMEXIT trace; and any given VMEXIT exit reason should
      only have a single HVM_HANDLER trace type.  Having duplicate or
      missing HVM_HANDLER traces is generally indicative of a problem that's
      crept in in the hypervisor tracing scheme.
      
      This is property is checked in hvm_generic_postprocess(), and
      violations are flagged with a warning.  In order to do this, when an
      HVM trace record that doesn't have a specific post-processor happens,
      information about the HVM trace record is stored in
      hvm_data->inflight.generic.
      
      Unfortunately, while the check was being done in all "modes", the
      relevant information was only being copied into inflight.generic in
      summary mode.  This resulted in spurious warnings about missing
      HVM_HANDLER traces when running in dump mode.
      
      Since running in dump mode is often critical to understanding how the
      warnings came about, just collect the information always as well.
      
      That said, the data from the trace doesn't appear to be used by
      anyone; so to save some time, don't bother copying it.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      abb43c22
    • George Dunlap's avatar
      xenalyze: Basic nested virt processing · 8904d367
      George Dunlap authored
      
      On SVM, VMEXIT that occur in when an L1 is in non-root mode ("nested
      exits") are tagged with the TRC_HVM_NESTEDFLAG flag.  Expand xenalyze
      to do basic handling of these records:
      
      - Refactor hvm_process to filter out both TRC_HVM_NESTEDFLAG and
        TRC_64_FLAG when deciding how to process
      
      - Create separate "trace volume" sub-categories for them (HVM_VOL_*),
        tweaking layout so things continue to align
      
      - Visually distinquish nested from non-nested vmexits and vmentries in
        dump mode
      
      At the moment, nested VMEXITs which are passed through to the L1
      hypervisor won't have any HVM handlers; note in hvm_data when a vmexit
      was nested, and don't warn when no handlers occur.
      
      While here:
       - Expand one of the error messages with a bit more information
       - Replace repeated `ri->event & TRC_64_FLAG` instances with `flag64`
       - Remove a stray whitespace at the end of a line
       - Add a missing newline to a warning statement
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      8904d367
    • George Dunlap's avatar
      x86/cpu-policy: HACK Disable PCID when nested virt is enabled · 3871627f
      George Dunlap authored
      
      The non-nested HVM code knows how to provide PCID functionality
      (non-zero values in the lower 12 bits of CR3 when running in 64-bit
      mode), but the nested code doesn't.  If the L2 decides to use the PCID
      functionality, the L0 will fail the next L1 VMENTRY.
      
      Long term we definitely want to enable this feature, but for now, just
      hide it from guests when nested HVM is enabled.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      3871627f
    • George Dunlap's avatar
      x86/cpuid-policy: Add AMD SVM CPUID leaf to featureset · 9007ab5a
      George Dunlap authored
      
      NOTE: This patch is be partially superseded by Andrew Cooper's "x86:
      AMD CPUID handling improvements" series.
      
      Currently, the CPUID leaf for SVM features (extd 0xa.edx) is manually
      twiddled:
      
       - hvm_max_policy takes host_policy and clamps it to supported
         features (with some features unilaterally enabled because they're
         always emulated
      
       - hvm_default_policy is copied from there
      
       - When recalculate_policy() is called for a guest, if SVM is clear,
         then the entire leaf is zeroed out.
      
      Move to a mode where the extended features are off by default, and
      enabled when nested_virt is enabled.
      
      In cpufeatureset.h, define a new featureset word for the AMD SVM
      features, and declare all of the bits defined in
      x86/include/asm/hvm/svm/svm.h.  Mark the ones we currently pass
      through to the "max policy" as HAP-only and optional.
      
      In cpu-policy.h, define FEATURESET_ead, and convert the un-named space
      in struct_cpu_policy into the appropriate union.  FIXME: Do this in a
      prerequisite patch, and change all references to p->extd.raw[0xa].
      
      Update x86_cpu_X_to_Y and Y_to_X to copy this into and out of the
      appropriate leaf.
      
      Populate this during boot in generic_identify().
      
      Add the new featureset definition into libxl_cpuid.c.
      
      Update the code in calculate_hvm_max_policy() to do nothing with the
      "normal" CPUID bits, and use the feature bit to unconditionally enable
      VMCBCLEAN. FIXME Move this to a follow-up patch.
      
      In recalculate_cpuid_policy(), enable max_fs when nested_hvm() is
      true.
      
      Signed-off-by: default avatarGeorge Dunlap <george.dunlap@cloud.com>
      ---
      CC: Andrew Cooper <andrew.cooper3@citrix.com>
      CC: Jan Beulich <jbeulich@suse.com>
      CC: Roger Pau Monne <roger.pau@cloud.com>
      9007ab5a
    • Jan Beulich's avatar
      Config.mk: update MiniOS commit · 4712e3b3
      Jan Beulich authored
      
      Pull in the gcc14 build fix there.
      
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Release-Acked-by: default avatarOleksii Kurochko <oleksii.kurochko@gmail.com>
      4712e3b3
  2. Jun 25, 2024
  3. Jun 24, 2024
  4. Jun 21, 2024
  5. Jun 20, 2024
    • Jan Beulich's avatar
      livepatch: use appropriate type for buffer offset variables · 62071a1c
      Jan Beulich authored
      As was made noticeable by the last of the commits referenced below,
      using a fixed-size type for such purposes is not only against
      ./CODING_STYLE, but can lead to actual issues. Switch to using size_t
      instead, thus also allowing calculations to be lighter-weight in 32-bit
      builds.
      
      No functional change for 64-bit builds.
      
      Link: https://gitlab.com/xen-project/xen/-/jobs/7136417308
      
      
      Fixes: b145b4a3 ("livepatch: Handle arbitrary size names with the list operation")
      Fixes: 5083e0ff ("livepatch: Add metadata runtime retrieval mechanism")
      Fixes: 43d5c5d5 ("xen: avoid UB in guest handle arithmetic")
      Reported-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Reviewed-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Release-Acked-by: default avatarOleksii Kurochko <oleksii.kurochko@gmail.com>
      62071a1c
    • Roger Pau Monné's avatar
      x86/irq: forward pending interrupts to new destination in fixup_irqs() · e2bb28d6
      Roger Pau Monné authored and Jan Beulich's avatar Jan Beulich committed
      
      fixup_irqs() is used to evacuate interrupts from to be offlined CPUs.  Given
      the CPU is to become offline, the normal migration logic used by Xen where the
      vector in the previous target(s) is left configured until the interrupt is
      received on the new destination is not suitable.
      
      Instead attempt to do as much as possible in order to prevent loosing
      interrupts.  If fixup_irqs() is called from the CPU to be offlined (as is
      currently the case for CPU hot unplug) attempt to forward pending vectors when
      interrupts that target the current CPU are migrated to a different destination.
      
      Additionally, for interrupts that have already been moved from the current CPU
      prior to the call to fixup_irqs() but that haven't been delivered to the new
      destination (iow: interrupts with move_in_progress set and the current CPU set
      in ->arch.old_cpu_mask) also check whether the previous vector is pending and
      forward it to the new destination.
      
      This allows us to remove the window with interrupts enabled at the bottom of
      fixup_irqs().  Such window wasn't safe anyway: references to the CPU to become
      offline are removed from interrupts masks, but the per-CPU vector_irq[] array
      is not updated to reflect those changes (as the CPU is going offline anyway).
      
      Signed-off-by: default avatarRoger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: default avatarJan Beulich <jbeulich@suse.com>
      Release-Acked-by: default avatarOleksii Kurochko <oleksii.kurochko@gmail.com>
      e2bb28d6
    • Leigh's avatar
      tools/libs/light: Fix nic->vlan memory allocation · 74970165
      Leigh authored and Jan Beulich's avatar Jan Beulich committed
      
      After the following commit:
      3bc14e4f ("tools/libs/light: Add vlan field to libxl_device_nic")
      xl list -l aborts with a double free error if a domain has at least
      one vif defined:
      
        $ sudo xl list -l
        free(): double free detected in tcache 2
        Aborted
      
      Orginally, the vlan field was called vid and was defined as an integer.
      It was appropriate to call libxl__xs_read_checked() with gc passed as
      the string data was copied to a different variable.  However, the final
      version uses a string data type and the call should have been changed
      to use NOGC instead of gc to allow that data to live past the gc
      controlled lifetime, in line with the other string fields.
      
      This patch makes the change to pass NOGC instead of gc and moves the
      new code to be next to the other string fields (fixing a couple of
      errant tabs along the way), as recommended by Jason.
      
      Fixes: 3bc14e4f ("tools/libs/light: Add vlan field to libxl_device_nic")
      Signed-off-by: default avatarLeigh Brown <leigh@solinno.co.uk>
      Reviewed-by: default avatarJason Andryuk <jason.andryuk@amd.com>
      Acked-by: default avatarAnthony PERARD <anthony.perard@vates.tech>
      Release-acked-by: default avatarOleksii Kurochko <oleksii.kurochko@gmail.com>
      74970165
Loading