Finding an appropriate way of loading the Nvidia drivers
Introduction
Nvidia driver support in the Nonguix channel currently seems rather, well, incomplete. It was decided in issue #198 to rewrite it and address most of the weaknesses found to date.
The first component that indisputably needs to be rethought is the way of loading relevant drivers. Hence, this thread is intended to address this problem and facilitate installation and deployment of the Nvidia drivers, providing:
- sound GLX, EGL (also Wayland), and Vulkan loading architecture,
- ability to load custom drivers (e.g. custom versions of nvidia),
- multi-gpu support using Prime and Optimus,
- legacy drivers for older devices.
The solution so far --- package input rewriting with grafting
At the time of writing the first version of the nvidia drivers package (see #31 (closed)), it didn't matter too much how they were loaded as long as they were loaded at all. It was decided then to go for the surest and most reliable way. After all, what if it were enough to transparently replace the mesa libraries with the nvidia ones?
In other words, as of now, we provide the mesa/fake package, which is actually a union of mesa and nvidia-libs with common libraries overwritten by the latter. Then, for each package the user wants to compile, package-input-rewriting recursively replaces all occurrences of mesa with mesa/fake in the inputs.
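To make the mechanism concrete, here is a minimal toy model of it in Python. It is only a sketch: the Package class, union_files, rewrite_inputs, and the file names are hypothetical stand-ins for illustration, not the actual Guix procedures or the real contents of the packages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Toy stand-in for a package: a name plus its direct inputs."""
    name: str
    inputs: tuple = ()


def union_files(base, overlay):
    """Merge two {file name: provider} maps; on a clash the overlay wins."""
    merged = dict(base)
    merged.update(overlay)
    return merged


def rewrite_inputs(package, old_name, new_package):
    """Return a copy of `package` in which every input named `old_name`,
    at any depth of the dependency graph, is replaced by `new_package`."""
    new_inputs = tuple(
        new_package if dep.name == old_name
        else rewrite_inputs(dep, old_name, new_package)
        for dep in package.inputs
    )
    return Package(package.name, new_inputs)


# Illustrative file names only: the nvidia libraries shadow the mesa ones.
fake_files = union_files(
    {"libGL.so": "mesa", "libgbm.so": "mesa"},
    {"libGL.so": "nvidia-libs", "libcuda.so": "nvidia-libs"},
)

mesa = Package("mesa")
mesa_fake = Package("mesa/fake")
sdl = Package("sdl2", (mesa,))            # hypothetical dependency chain
game = Package("some-game", (sdl, mesa))  # hypothetical user package

rewritten = rewrite_inputs(game, "mesa", mesa_fake)
```

Note that the replacement is fixed once the rewritten package is built: nothing in the result can switch back to mesa at run time.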
This works extremely well, albeit with one drawback: it is static. If the user compiles a given package with replace-mesa, it is statically tied to nvidia and there is no room for dynamic features like Optimus. It also appears that this method of input rewriting combined with grafting breaks for packages like Steam (see #197 (closed)).
So are there any alternatives? Yes, there are!
Graphics Layer Vendor-Neutral Dispatch library (libglvnd)
Another approach, instead of statically linking given packages against mesa or nvidia, is to provide some sort of man-in-the-middle that automatically passes API calls to the relevant drivers. Such a dispatch library usually builds a list of possible candidates (here: driver implementations) and chooses an appropriate one based on the information it acquires (e.g. the vendor name of the X screen).
The section name gives it away: GLVND. GLVND is an official dispatch library proposed and recommended by Nvidia to allow multiple drivers from different vendors to coexist on the same filesystem. This solution works incredibly well and has been adopted by all major distributions.
Additionally, although I haven't found this stated in the Nvidia documentation, it seems that GLVND is the only way to make Optimus work.
Forcing GLVND GLX library
However, one thing does not work well on Guix and other distributions based on the isolation concept: building the list of candidates. GLVND assumes that all GLX implementations can be found on the library path of the dynamic linker (LD_LIBRARY_PATH). More precisely, GLVND looks for all currently available libGLX_vendor.so libraries, where vendor is the graphics driver vendor name chosen based on the X screen or set explicitly with __GLX_VENDOR_LIBRARY_NAME=vendor.
Here is the problem. Suppose we want to use neither mesa nor nvidia but custom-vendor. How would GLVND, installed as an input of some package, be expected to find the libGLX_custom-vendor.so file in LD_LIBRARY_PATH if custom-vendor was not included in the inputs of that package?
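The lookup and its failure mode can be sketched with a simplified model in Python. This is an assumption-laden illustration: pick_vendor and find_glx_library are hypothetical helpers, and the real library leaves the search to the dynamic linker rather than scanning directories itself.

```python
import os
import tempfile


def pick_vendor(env, screen_vendor):
    """An explicit __GLX_VENDOR_LIBRARY_NAME wins over the X screen's vendor."""
    return env.get("__GLX_VENDOR_LIBRARY_NAME") or screen_vendor


def find_glx_library(vendor, library_path):
    """Return the first libGLX_<vendor>.so found on the search path, else None."""
    for directory in library_path.split(os.pathsep):
        candidate = os.path.join(directory, f"libGLX_{vendor}.so")
        if directory and os.path.isfile(candidate):
            return candidate
    return None


# Simulate a package whose only graphics input is mesa.
with tempfile.TemporaryDirectory() as root:
    mesa_dir = os.path.join(root, "mesa", "lib")
    os.makedirs(mesa_dir)
    open(os.path.join(mesa_dir, "libGLX_mesa.so"), "w").close()

    search_path = mesa_dir  # stands in for LD_LIBRARY_PATH
    found_mesa = find_glx_library("mesa", search_path)
    # custom-vendor was never an input, so there is nothing to find.
    found_custom = find_glx_library("custom-vendor", search_path)
```

In this model found_custom is None no matter what the vendor variable says, because the directory holding the custom library is simply not on the path.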
@squarerectangle's proposal is to modify GLVND so that it accepts an absolute path (via the GUIX_GLVND_GLXLIB environment variable) instead of a vendor name. After applying this approach to both the guix and nonguix channels directly (without any grafting or input rewriting), I came to have the following doubts:
- Consider an isolated container. If GUIX_GLVND_GLXLIB points to some derivation with custom drivers, how does the container know to expose this derivation?
- If GUIX_GLVND_GLXLIB forces some GLX library, what should GLVND fall back to in case this library does not exist?
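To illustrate the second doubt, here is one possible fallback behaviour sketched in Python. It is purely an assumption for discussion, not what the proposed patch does: resolve_glx_library, the vendor_lookup callback, and the example path are all hypothetical.

```python
import os


def resolve_glx_library(env, vendor_lookup):
    """Prefer the path forced via GUIX_GLVND_GLXLIB; if it is unset or the
    file is missing, defer to the regular vendor-based lookup."""
    forced = env.get("GUIX_GLVND_GLXLIB")
    if forced and os.path.isfile(forced):
        return forced, "forced"
    return vendor_lookup(), "vendor lookup"


# Hypothetical path that is not visible here, e.g. because the container
# did not expose the derivation it belongs to.
env = {"GUIX_GLVND_GLXLIB": "/nonexistent/store/item/lib/libGLX_custom.so"}
result = resolve_glx_library(env, lambda: "libGLX_mesa.so")
```

Even with such a fallback, the forced path bypasses the vendor selection whenever it does exist, which is the concern raised here.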
The point I would like to make is that by forcing some GLX library with GUIX_GLVND_GLXLIB we take away GLVND's ability to prioritise and manage drivers in its own wise way.
Keeping GLVND extensible
In my opinion, a much better solution would be not to force how GLVND thinks, but to prepare its environment so that GLVND thinks as if it were on any other distribution. In other words, we want to provide some space with all the graphical drivers in one place that is accessible whether or not GLVND is currently running inside a container. Well, what if that space were GLVND itself?
The way I think about this solution is as a kind of meta package, such as those found on Debian. Suppose we have a glvnd-metagl package (the name does not matter) which is a complete union of the graphical packages one would like to use. As an example, for Nvidia Optimus setups glvnd-metagl could be a union of the glvnd, mesa (with glvnd support), and nvidia-libs packages. glvnd-metagl is meant to be a drop-in replacement for mesa, so if users want to compile some package with their own set of graphical drivers, they perform package input rewriting of mesa with glvnd-metagl.
glvnd-metagl
should be a package template (actually a function
from a technical point of view). With no arguments provided,
glvnd-metagl
should be a build union of glvnd
and mesa
(with
glvnd
support). However, if there was an argument given (Gexp
expression pointing to a package), it would create a new derivation
with that package added. It would allow the user to install any custom
GLX drivers (providing they support GLVND) or specified version of
Nvidia drivers (e.g. for legacy devices).
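The shape of such a template can be sketched in Python. This toy version only computes which packages would go into the union; in Guix it would be a Scheme procedure returning a package. The function body and the legacy package name are hypothetical.

```python
def glvnd_metagl(*extra_drivers):
    """Return the ordered members of the union that forms the meta package.
    With no arguments this is just glvnd plus a glvnd-enabled mesa."""
    members = ["glvnd", "mesa"]
    for driver in extra_drivers:
        if driver not in members:  # avoid listing a package twice
            members.append(driver)
    return tuple(members)


default_set = glvnd_metagl()
optimus_set = glvnd_metagl("nvidia-libs")
legacy_set = glvnd_metagl("nvidia-libs-legacy")  # hypothetical package name
```

Because glvnd and mesa are always present, every variant keeps a working default and the extra drivers only widen the set of candidates GLVND can choose from.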
As it would be common to use glvnd-metagl with nvidia-libs, we can provide another package, let's say glvnd-metagl-nvidia, and a function similar to replace-mesa that simplifies the input rewriting. We can even build derivations with mesa replaced on the substitute server to make the build process faster on the user's side.
Summary
I'm creating this issue mainly to start a discussion and gather more ideas. I believe solving the loading problem will help the rewrite of the Nvidia drivers. As of now, the solutions I see are:
- static package input rewriting
- with the use of GLVND:
  - forcing GLVND to use specific libraries
  - exposing all relevant libraries to GLVND
I hope that all our efforts will translate into better support of Nvidia graphics cards in the Guix system.
@podiki @qzdl @phodina @lukolszewski @jonsger @squarerectangle @aerique @ric342 @MorganJamesSmith