- Jun 24, 2024
-
-
Andrew Cooper authored
`xl devd` has been observed leaking /var/log/xldevd.log into children. Note this is specifically safe; dup2() leaves O_CLOEXEC disabled on newfd, so after setting up stdout/stderr, it's only the logfile fd which will close on exec(). Link: https://github.com/QubesOS/qubes-issues/issues/8292
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
- Jun 21, 2024
-
-
When the left shift amount is up to 31, the shifted quantity wants to be of unsigned int (or wider) type. While there also adjust types: get doesn't alter the array and returns a boolean, while both don't really accept negative "nr". Drop a stray blank each as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
There exist bit shifts in the IOAPIC code where signed integers are shifted to the left by up to 31 bits, which is undefined behaviour. This patch fixes this by changing the integers from signed to unsigned.
Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
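Both commits above fix the same class of issue. A hedged sketch of the pattern (generic bitmap helpers, not the actual Xen or IOAPIC code): with a signed operand, `1 << 31` overflows `int` and is undefined behaviour; an unsigned operand makes the shift well defined, and an unsigned "nr" matches the fact that negative bit numbers are meaningless.

```c
#include <assert.h>
#include <stdint.h>

/* Building a mask with an unsigned operand is defined for nr up to 31;
 * "1 << 31" with a signed int would be UB (result not representable). */
static uint32_t bit_mask(unsigned int nr)
{
    return 1u << nr;
}

/* get: doesn't alter the array and returns a boolean. */
static int test_bit(const uint32_t *addr, unsigned int nr)
{
    return (addr[nr / 32] & bit_mask(nr % 32)) != 0;
}

static void set_bit(uint32_t *addr, unsigned int nr)
{
    addr[nr / 32] |= bit_mask(nr % 32);
}
```

Note the highest bit (nr == 31) is exactly the case the signed version gets wrong.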
-
- Jun 20, 2024
-
-
Jan Beulich authored
As was made noticeable by the last of the commits referenced below, using a fixed-size type for such purposes is not only against ./CODING_STYLE, but can lead to actual issues. Switch to using size_t instead, thus also allowing calculations to be lighter-weight in 32-bit builds. No functional change for 64-bit builds. Link: https://gitlab.com/xen-project/xen/-/jobs/7136417308
Fixes: b145b4a3 ("livepatch: Handle arbitrary size names with the list operation")
Fixes: 5083e0ff ("livepatch: Add metadata runtime retrieval mechanism")
Fixes: 43d5c5d5 ("xen: avoid UB in guest handle arithmetic")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
fixup_irqs() is used to evacuate interrupts from CPUs that are to be offlined. Given the CPU is to become offline, the normal migration logic used by Xen, where the vector on the previous target(s) is left configured until the interrupt is received on the new destination, is not suitable. Instead attempt to do as much as possible in order to prevent losing interrupts. If fixup_irqs() is called from the CPU to be offlined (as is currently the case for CPU hot unplug) attempt to forward pending vectors when interrupts that target the current CPU are migrated to a different destination. Additionally, for interrupts that have already been moved from the current CPU prior to the call to fixup_irqs() but that haven't been delivered to the new destination (iow: interrupts with move_in_progress set and the current CPU set in ->arch.old_cpu_mask) also check whether the previous vector is pending and forward it to the new destination. This allows us to remove the window with interrupts enabled at the bottom of fixup_irqs(). Such a window wasn't safe anyway: references to the CPU to become offline are removed from interrupt masks, but the per-CPU vector_irq[] array is not updated to reflect those changes (as the CPU is going offline anyway).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
After the following commit: 3bc14e4f ("tools/libs/light: Add vlan field to libxl_device_nic") xl list -l aborts with a double free error if a domain has at least one vif defined:

$ sudo xl list -l
free(): double free detected in tcache 2
Aborted

Originally, the vlan field was called vid and was defined as an integer. It was appropriate to call libxl__xs_read_checked() with gc passed, as the string data was copied to a different variable. However, the final version uses a string data type and the call should have been changed to use NOGC instead of gc to allow that data to live past the gc-controlled lifetime, in line with the other string fields. This patch makes the change to pass NOGC instead of gc and moves the new code to be next to the other string fields (fixing a couple of errant tabs along the way), as recommended by Jason.
Fixes: 3bc14e4f ("tools/libs/light: Add vlan field to libxl_device_nic")
Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
backendtype=phy using the blktap kernel module needs to use write_dev, but tapback can't support that. tapback should perform better, but make the script compatible with the old kernel module again.
Fixes: 76a48419 ("hotplug: Update block-tap")
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
- Jun 19, 2024
-
-
Jan Beulich authored
At least XENMEM_memory_exchange can have huge values passed in the nr_extents and nr_exchanged fields. Adding such values to pointers can overflow, resulting in UB. Cast respective pointers to "unsigned long" while at the same time making the necessary multiplication explicit. Remaining arithmetic is, despite there possibly being mathematical overflow, okay as per the C99 spec: "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." The overflow that we need to guard against is checked for in array_access_ok(). Note that in / down from array_access_ok() the address value is only ever cast to "unsigned long" anyway, which is why in the invocation from guest_handle_subrange_okay() the value doesn't need casting back to pointer type. In compat grant table code change two guest_handle_add_offset() to avoid passing in negative offsets. Since {,__}clear_guest_offset() need touching anyway, also deal with another (latent) issue there: They were losing the handle type, i.e. the size of the individual objects accessed. Luckily the few users we presently have all pass char or uint8 handles.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
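The core technique can be shown in a few lines. This is a simplified sketch (the helper name is made up, not Xen's guest handle macros): pointer arithmetic that overflows is UB, but the same computation on `unsigned long` is defined to wrap modulo 2^N, exactly as the quoted C99 wording says, and the real range check is then done separately.

```c
#include <assert.h>
#include <stddef.h>

/* Compute base + nr * elem_size entirely in unsigned integer arithmetic.
 * Unlike "ptr + nr" with huge nr, this cannot invoke UB: unsigned
 * arithmetic wraps modulo 2^N by definition (C99 6.2.5).  A separate
 * bounds check (array_access_ok() in Xen) guards against the wrap. */
static unsigned long handle_offset(unsigned long base, unsigned long nr,
                                   size_t elem_size)
{
    return base + nr * elem_size;
}
```

The multiplication being explicit (rather than implied by pointer arithmetic) is what keeps every step in the defined-to-wrap unsigned domain.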
-
Andrew Cooper authored
With the exception of one case in read_bndcfgu() which can use ilog2(), the *_POS defines are unused. Drop them. X86_XCR0_X87 is the name used by both the SDM and APM, rather than X86_XCR0_FP. No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
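The substitution works because a bit position can always be recovered from the mask itself. A hedged sketch of the ilog2() idea (modelled on the common GCC-builtin shape, not copied from Xen's helper):

```c
#include <assert.h>

/* Index of the highest set bit, i.e. floor(log2(x)).  For a single-bit
 * mask this recovers the bit position, making a separate *_POS define
 * redundant.  x must be non-zero (__builtin_clz(0) is undefined). */
static unsigned int ilog2(unsigned int x)
{
    return 31 - (unsigned int)__builtin_clz(x);
}
```

So `ilog2(X86_XCR0_BNDCSR)` yields the component index directly from the mask, which is why read_bndcfgu() no longer needs the positional constant.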
-
Andrew Cooper authored
First, if XSAVE is available in hardware but not visible to the guest, the dynamic leaves shouldn't be filled in. Second, the comment concerning XSS state is wrong. VT-x doesn't manage host/guest state automatically, but there is provision for "host only" bits to be set, so the implications are still accurate. Introduce xstate_compressed_size() to mirror the uncompressed one. Cross check it at boot.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Andrew Cooper authored
Make use of the xstate_uncompressed_size() helper rather than maintaining the running calculation while accumulating feature components. The rest of the CPUID data can come direct from the raw cpu policy. All per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}* instructions. Use for_each_set_bit() rather than open-coding a slightly awkward version of it. Mask the attributes in ecx down based on the visible features. This isn't actually necessary for any components or attributes defined at the time of writing (up to AMX), but is added out of an abundance of caution.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Andrew Cooper authored
We're soon going to need a compressed helper of the same form. The size of the uncompressed image depends on the single element with the largest offset + size. Sadly this isn't always the element with the largest index. Name the per-xstate-component cpu_policy structure, for legibility of the logic in xstate_uncompressed_size(). Cross-check with hardware during boot, and remove hw_uncompressed_size(). This means that the migration paths don't need to mess with XCR0 just to sanity check the buffer size. It also means we can drop the "fastpath" check against xfeature_mask (there to skip some XCR0 writes); this path is going to be dead logic the moment Xen starts using supervisor states itself. The users of hw_uncompressed_size() in xstate_init() can (and indeed need to) be replaced with CPUID instructions. They run with feature_mask in XCR0, and prior to setup_xstate_features() on the BSP. No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Andrew Cooper authored
This is a tangle, but it's a small step in the right direction. In the following change, xstate_init() is going to start using the Raw policy. calculate_raw_cpu_policy() is sufficiently separate from the other policies to safely move like this. No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Andrew Cooper authored
Right now, xstate_ctxt_size() performs a cross-check of size with CPUID on every call. This is expensive, being used for domain create/migrate, as well as to service certain guest CPUID instructions. Instead, arrange to check the sizes once at boot. See the code comments for details. Right now, it just checks hardware against the algorithm expectations. Later patches will cross-check Xen's XSTATE calculations too. Introduce more X86_XCR0_* and X86_XSS_* constants for CPUID bits. This is to maximise coverage in the sanity check, even if we don't expect to use/virtualise some of these features any time soon. Leave HDC and HWP alone for now; we don't have CPUID bits for them stored nicely. Only perform the cross-checks when SELF_TESTS are active. It's only developers or new hardware liable to trip these checks, and Xen at least tracks "maximum value ever seen in xcr0" for the lifetime of the VM, which we don't want to be tickling in the general case.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Andrew Cooper authored
The clobbering of this_cpu(xcr0) and this_cpu(xss) to architecturally invalid values is to force the subsequent set_xcr0() and set_msr_xss() to reload the hardware register. While XCR0 is reloaded in xstate_init(), MSR_XSS isn't. This causes get_msr_xss() to return the invalid value, and logic of the form:

    old = get_msr_xss();
    set_msr_xss(new);
    ...
    set_msr_xss(old);

to try and restore said invalid value. The architecturally invalid value must be purged from the cache, meaning the hardware register must be written at least once. This in turn highlights that the invalid value must only be used in the case that the hardware register is available.
Fixes: f7f4a523 ("x86/xstate: reset cached register values on resume")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
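The cached-register pattern and the failure mode can be sketched in userspace C. Names and the sentinel value are illustrative stand-ins, not Xen's actual variables; a plain global stands in for the MSR.

```c
#include <assert.h>
#include <stdint.h>

/* The cache is primed with an architecturally invalid value so that the
 * first real write never short-circuits.  The bug described above was
 * that the getter could return this sentinel before any write had purged
 * it, and a save/modify/restore sequence would then write it back. */
#define XSS_INVALID (~0ULL)

static uint64_t cached_xss = XSS_INVALID;  /* state as after S3 resume */
static uint64_t hw_xss;                    /* stands in for MSR_XSS */

static void set_msr_xss(uint64_t val)
{
    if (cached_xss != val) {   /* sentinel never matches: forces a write */
        hw_xss = val;
        cached_xss = val;
    }
}

static uint64_t get_msr_xss(void)
{
    return cached_xss;   /* only meaningful once written at least once */
}
```

Writing the register once on the resume path (as the fix does for XCR0) purges the sentinel, after which the save/restore idiom is safe.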
-
Andrew Cooper authored
These living in cache.h is inherited from Linux, but cache.h is not a terribly appropriate location for them to live. __read_mostly is an optimisation related to data placement in order to avoid having shared data in cachelines that are likely to be written to, but it really is just a section of the linked image separating data by usage patterns; it has nothing to do with cache sizes or flushing logic. Worse, __ro_after_init was only in xen/cache.h because __read_mostly was in arch/cache.h, and has literally nothing whatsoever to do with caches. Move the definitions into xen/sections.h, which in particular means that RISC-V doesn't need to repeat the problematic pattern. Take the opportunity to provide a short description of what these are used for. For now, leave TODO comments next to the other identical definitions. It turns out that unpicking cache.h is more complicated than it appears because a number of files use it for transitive dependencies.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
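The point that these are pure section-placement attributes can be seen from their shape. A simplified sketch in the style of the Xen/Linux pattern (assumption: the exact section names and macro spellings here are illustrative):

```c
#include <assert.h>

/* These macros do nothing but pick a section in the linked image; there is
 * no cache-size or flushing logic involved, which is the argument for
 * hosting them in a sections header rather than cache.h. */
#define __section(s)     __attribute__((__section__(s)))
#define __read_mostly    __section(".data.read_mostly")
#define __ro_after_init  __section(".data.ro_after_init")

/* Written rarely (e.g. at boot), read on hot paths: grouping such data
 * keeps it out of cachelines that are frequently dirtied. */
static unsigned int opt_example __read_mostly = 42;
```

The attribute only influences where the linker places the object; the variable behaves identically otherwise.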
-
Andrew Cooper authored
When centralising irq_ack_none(), different architectures had different names for the parameter. As its type is struct irq_desc *, it should be named desc. Make this consistent. No functional change.
Fixes: 8aeda4a2 ("arch/irq: Make irq_ack_none() mandatory")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Julien Grall authored
Michal reported that the gitlab CI is failing because of this series [1]. This reverts commit 6f9d90ea.
Signed-off-by: Julien Grall <jgrall@amazon.com>
-
Julien Grall authored
Michal reported that the gitlab CI is failing because of this series [1]. This reverts commit 53c5c99e. [1] https://gitlab.com/xen-project/xen/-/pipelines/1338067978
Signed-off-by: Julien Grall <jgrall@amazon.com>
-
Michal Orzel authored
Building Xen with CONFIG_STATIC_SHM=y results in a build failure:

arch/arm/static-shmem.c: In function 'process_shm':
arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized [-Werror=maybe-uninitialized]
  327 |     if ( is_domain_direct_mapped(d) && (pbase != gbase) )
arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
  305 |     paddr_t gbase, pbase, psize;

This is because the commit cb1ddafd adds a check referencing gbase/pbase variables which were not yet assigned a value. Fix it.
Fixes: cb1ddafd ("xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
-
Henry Wang authored
With the new allocation strategy of the Dom0less DomUs XenStore page, update the doc of the late XenStore init protocol accordingly.
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-
Henry Wang authored
There are use cases (for example using the PV driver) in Dom0less setup that require Dom0less DomUs to start immediately with Dom0, but initialize XenStore later after Dom0's successful boot and call to the init-dom0less application. An error message can be seen from the init-dom0less application on 1:1 direct-mapped domains:

```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a terminology used in the toolstack for reserved pages the VM uses to access virtual platform capabilities. Currently the magic pages for Dom0less DomUs are populated by the init-dom0less app through populate_physmap(), and populate_physmap() automatically assumes gfn == mfn for 1:1 direct-mapped domains. This cannot be true for the magic pages that are allocated later from the init-dom0less application executed in Dom0. For domains using statically allocated memory but not 1:1 direct-mapped, a similar error "failed to retrieve a reserved page" can be seen, as the reserved memory list is empty at that time. Since for init-dom0less the magic page region is only for XenStore, to solve the above issue this commit allocates the XenStore page for Dom0less DomUs at domain construction time. The PFN will be noted and communicated to the init-dom0less application executed from Dom0. To keep the XenStore late init protocol, set the connection status to XENSTORE_RECONNECT. Currently the GUEST_MAGIC_BASE in the init-dom0less application is hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less DomUs. Since the guest magic region allocation from init-dom0less is for XenStore, and the XenStore page is now allocated from the hypervisor, instead of hardcoding the guest magic pages region, use xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page() to get_xs_page() to reflect the changes. With this change, some existing code is not needed anymore, including: (1) the definition of the XenStore page offset; (2) calls to xc_domain_setmaxmem() and xc_clear_domain_page(), as we don't need to set the max mem and clear the page anymore; (3) foreign mapping of the XenStore page, setting of XenStore interface status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set by the hypervisor. Take the opportunity to do some coding style improvements when possible.
Reported-by: Alec Kwapis <alec.kwapis@medtronic.com>
Suggested-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
-
Henry Wang authored
Currently, users are allowed to map static shared memory in a non-direct-mapped way for direct-mapped domains. This can lead to clashing of guest memory spaces. Also, the current extended region finding logic only removes the host physical addresses of the static shared memory areas for direct-mapped domains, which may be inconsistent with the guest memory map if users map the static shared memory in a non-direct-mapped way. This will lead to incorrect extended region calculation results. To make things easier, add the restriction that static shared memory should also be direct-mapped for direct-mapped domains. Check that the host physical address matches the guest physical address when parsing the device tree. Document this restriction in the doc.
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
-
- Jun 18, 2024
-
-
Andrew Cooper authored
struct type_descriptor is arranged with a NUL-terminated string following the kind/info fields. The only reason this doesn't trip UBSAN detection itself (on more modern compilers at least) is because struct type_descriptor is only referenced in suppressed regions. Switch the declaration to be a real flexible member. No functional change.
Fixes: 00fcf4dd ("xen/ubsan: Import ubsan implementation from Linux 4.13")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
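The shape of the fix can be shown with a simplified version of the structure (field names follow the commit's description; the allocation helper is illustrative). A flexible array member tells the compiler the trailing string's length is determined by the allocation, so indexing past the struct's declared size is no longer out of bounds:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified sketch: the trailing NUL-terminated name is declared as a
 * real flexible array member, rather than e.g. a fixed char name[1]
 * that is deliberately over-indexed (which UBSAN would flag). */
struct type_descriptor {
    uint16_t type_kind;
    uint16_t type_info;
    char type_name[];   /* flexible array member, NUL-terminated */
};

static struct type_descriptor *make_desc(const char *name)
{
    /* sizeof(*d) excludes the flexible member; add room for the string. */
    struct type_descriptor *d = malloc(sizeof(*d) + strlen(name) + 1);
    if (d) {
        d->type_kind = 0;
        d->type_info = 0;
        strcpy(d->type_name, name);
    }
    return d;
}
```

Behaviour is unchanged; only the declaration now matches what the memory layout always was.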
-
Currently there's logic in fixup_irqs() that attempts to prevent _assign_irq_vector() from failing, as fixup_irqs() is required to evacuate all interrupts from the CPUs not present in the input mask. The current logic in fixup_irqs() is incomplete, as it doesn't deal with interrupts that have move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field. Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so that _assign_irq_vector() cannot fail, introduce logic in _assign_irq_vector() to deal with interrupts that have either move_{in_progress,cleanup_count} set and no remaining online CPUs in ->arch.cpu_mask. If _assign_irq_vector() is requested to move an interrupt in the state described above, first attempt to see if ->arch.old_cpu_mask contains any valid CPUs that could be used as fallback, and if that's the case do move the interrupt back to the previous destination. Note this is easier because the vector hasn't been released yet, so there's no need to allocate and setup a new vector on the destination. Due to the logic in fixup_irqs() that clears offline CPUs from ->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) it shouldn't be possible to get into _assign_irq_vector() with ->arch.move_{in_progress,cleanup_count} set but no online CPUs in ->arch.old_cpu_mask. However if ->arch.move_{in_progress,cleanup_count} is set and the interrupt has also changed affinity, it's possible the members of ->arch.old_cpu_mask are no longer part of the affinity set; in that case move the interrupt to a different CPU that is part of the provided mask and keep the current ->arch.old_{cpu_mask,vector} for the pending interrupt movement to be completed.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
-
Given the current logic it's possible for ->arch.old_cpu_mask to get out of sync: if a CPU set in old_cpu_mask is offlined and then onlined again without old_cpu_mask having been updated, the data in the mask will no longer be accurate, as when brought back online the CPU will no longer have old_vector configured to handle the old interrupt source. If there's an interrupt movement in progress, and the to be offlined CPU (which is the call context) is in the old_cpu_mask, clear it and update the mask, so it doesn't contain stale data. Note that when the system is going down fixup_irqs() will be called by smp_send_stop() from CPU 0 with a mask with only CPU 0 on it, effectively asking to move all interrupts to the current caller (CPU 0) which is the only CPU to remain online. In that case we don't care to migrate interrupts that are in the process of being moved, as it's likely we won't be able to move all interrupts to CPU 0 due to vector shortage anyway.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
-
Jan Beulich authored
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation does not work when data from any leaf greater than two is needed; early_cpu_init() in particular wants to collect leaf 7 data. Cure this by unlocking CPUID right before evaluating anything which depends on the maximum CPUID leaf being greater than two. Inspired by (and description cloned from) Linux commit 0c2f6d04619e ("x86/topology/intel: Unlock CPUID before evaluating anything").
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Nicola Vetrini authored
Rules 20.12 and 14.4 are now clean on ARM and x86, so they are added to the list of clean guidelines. Some guidelines listed in the additional clean section for ARM are also clean on x86, so they can be removed from there. No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
[stefano: remove 20.9 from commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
-
Nicola Vetrini authored
The DEFINE macro in asm-offsets.c (for all architectures) still generates violations despite the file(s) being excluded from compliance, due to the fact that in its expansion it sometimes refers to entities in non-excluded files. These corner cases are deviated by the configuration. No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
-
- Jun 14, 2024
-
-
Penny authored
This commit describes the new scenario where the host address is not provided in the "xen,shared-mem" property, and a new example is added to the page to explain it in detail. Take the occasion to fix some typos in the page.
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-
Luca Fancellu authored
This commit implements the logic to have the static shared memory banks allocated from the Xen heap instead of having the host physical address passed by the user. When the host physical address is not supplied, the physical memory is taken from the Xen heap using allocate_domheap_memory; the allocation needs to occur at the first handled DT node and the allocated banks need to be saved somewhere. Introduce 'shm_heap_banks' for that reason, a struct that will hold the banks allocated from the heap; its field bank[].shmem_extra will be used to point to the bootinfo shared memory banks' .shmem_extra space, so that there is no further allocation of memory and every bank in shm_heap_banks can be safely identified by the shm_id to reconstruct its traceability and whether it was allocated or not. A search into 'shm_heap_banks' will reveal if the banks were allocated or not, in case the host address is not passed, and the callback given to allocate_domheap_memory will store the banks in the structure and map them to the current domain. To do that, some changes to acquire_shared_memory_bank are made to let it differentiate if the bank is from the heap; if it is, then assign_pages is called for every bank. When the bank is already allocated, for every bank allocated with the corresponding shm_id, handle_shared_mem_bank is called and the mappings are done.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-
Luca Fancellu authored
The function allocate_bank_memory allocates pages from the heap and maps them to the guest using guest_physmap_add_page. As preparation work to support static shared memory banks when the host physical address is not provided, Xen needs to allocate memory from the heap, so rework allocate_bank_memory, moving the page allocation out into a new function called allocate_domheap_memory. The function allocate_domheap_memory takes a callback function and a pointer to some extra information passed to the callback, and this function will be called for every region, until a defined size is reached. In order to keep allocate_bank_memory functionality, the callback passed to allocate_domheap_memory is a wrapper for guest_physmap_add_page. Let allocate_domheap_memory be externally visible, in order to use it in the future from the static shared memory module. Take the opportunity to change the signature of allocate_bank_memory and remove the 'struct domain' parameter, which can be retrieved from 'struct kernel_info'. No functional change is intended.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-
Luca Fancellu authored
Handle the parsing of the 'xen,shared-mem' property when the host physical address is not provided. This commit introduces the logic to parse it, but the functionality is still not implemented and will be part of future commits. Rework the logic inside process_shm_node to check the shm_id before doing the other checks, because it eases the logic itself; add more comments on the logic. Now when the host physical address is not provided, the value INVALID_PADDR is chosen to signal this condition and it is stored as the start of the bank. Due to that change, early_print_info_shmem and init_sharedmem_pages are also changed, to not handle banks with start equal to INVALID_PADDR. Another change is done inside meminfo_overlap_check, to skip banks whose start address is INVALID_PADDR; that function is used to check banks from reserved memory, shared memory and ACPI, and since the comment above the function states that wrapping around is not handled, it's unlikely for these banks to have INVALID_PADDR as start address. The same change is done inside the consider_modules, find_unallocated_memory and dt_unreserved_regions functions, in order to skip banks that start with INVALID_PADDR from any computation. The changes above hold because of this consideration.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-
Penny authored
We are doing foreign memory mapping for static shared memory, and there is a great possibility that it could be super mapped. But today, p2m_put_l3_page could not handle superpages. This commit implements a new function p2m_put_l2_superpage to handle level 2 superpages, specifically for helping put extra references for foreign superpages. Modify relinquish_p2m_mapping as well to take into account preemption when we have a level-2 foreign mapping. Currently level 1 superpages are not handled because Xen is not preemptible and therefore some work is needed to handle such superpages, for which at some point Xen might end up freeing memory and therefore for such a big mapping it could end up in a very long operation.
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
-
Luca Fancellu authored
Wrap the code and logic that is calling assign_shared_memory and map_regions_p2mt into a new function 'handle_shared_mem_bank'; it will become useful later when the code will allow the user not to pass the host physical address.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-
Luca Fancellu authored
The current static shared memory code is using bootinfo banks when it needs to find the number of borrowers, so every time assign_shared_memory is called, the bank is searched in the bootinfo.shmem structure. There is nothing wrong with it, however the bank can also be used to retrieve the start address and size, and to pass fewer arguments to assign_shared_memory. When retrieving the information from the bootinfo bank, it's also possible to move the checks on alignment to process_shm_node in the early stages. So create a new function find_shm_bank_by_id() which takes a 'struct shared_meminfo' structure and the shared memory ID, to look for a bank with a matching ID, take the physical host address and size from the bank, pass the bank to assign_shared_memory() removing the now unnecessary arguments, and finally remove the acquire_nr_borrower_domain() function since the information can now be extracted from the passed bank. Move the "xen,shm-id" parsing early in process_shm to bail out quickly in case of errors (unlikely); as said above, move the checks on alignment to process_shm_node. A drawback of this change is that the bootinfo is now used also when the bank doesn't need to be allocated; however it will be convenient later to use it as an argument for assign_shared_memory when dealing with the use case where the host physical address is not supplied by the user.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-
- Jun 13, 2024
-
-
Jan Beulich authored
mfn_valid() is RAM-focused; it will often return false for MMIO. Yet access to actual MMIO space should not generally be restricted to UC only; especially video frame buffer accesses are unduly affected by such a restriction. Since, as of 777c71d3 ("x86/EPT: avoid marking non-present entries for re-configuring"), the function won't be called with INVALID_MFN or, worse, truncated forms thereof anymore, we can fully drop that check.
Fixes: 81fd0d3c ("x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Jan Beulich authored
For non-present entries EMT, like most other fields, is meaningless to hardware. Make the logic in ept_set_entry() setting the field (and iPAT) conditional upon dealing with a present entry, leaving the value at 0 otherwise. This has two effects for epte_get_entry_emt() which we'll want to leverage subsequently:
1) The call moved here now won't be issued with INVALID_MFN anymore (a respective BUG_ON() is being added).
2) Neither of the other two calls could now be issued with a truncated form of INVALID_MFN anymore (as long as there's no bug anywhere marking an entry present when that was populated using INVALID_MFN).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Jan Beulich authored
mfn_valid() granularity is (currently) 256Mb. Therefore the start of a 1Gb page passing the test doesn't necessarily mean all parts of such a range would also pass. Yet using the result of mfn_to_page() on an MFN which doesn't pass mfn_valid() checking is liable to result in a crash (the invocation of mfn_to_page() alone is presumably "just" UB in such a case).
Fixes: ca24b2ff ("x86/hvm: set 'ipat' in EPT for special pages")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-
Jens Wiklander authored
Add support for FF-A notifications, currently limited to an SP (Secure Partition) sending an asynchronous notification to a guest. Guests and Xen itself are made aware of pending notifications with an interrupt. The interrupt handler triggers a tasklet to retrieve the notifications using the FF-A ABI and deliver them to their destinations. Update ffa_partinfo_domain_init() to return an error code like ffa_notif_domain_init().
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-