QEMU user mode emulation for arm64 is broken
At Kali Linux, we use GitLab's CI/CD to build our Kali Linux container images for several architectures (amd64, arm64 and armhf at the moment). For that to work, we rely on QEMU user mode emulation and the binfmt_misc kernel module.
We build our container images once a week, using GitLab's shared runners.
Starting on the 1st of August, our pipeline began to fail consistently. This is very likely due to the migration of GitLab's shared runners from CoreOS to Google COS.
I spent a lot of time investigating this issue, and the details are available at:
- kalilinux/build-scripts/kali-docker#41
- https://gitlab.com/kalilinux/build-scripts/kali-docker/-/jobs/1517756613
I also set up a minimal test pipeline to demonstrate the issue:
- https://gitlab.com/arnaudr/test-binfmt-misc/-/blob/master/.gitlab-ci.yml
- https://gitlab.com/arnaudr/test-binfmt-misc/-/pipelines/356065926/builds
Let me sum up what I learnt in a few words:
It seems to me that the QEMU emulation is properly set up: we can see that the binfmt_misc module is built into the kernel of the host OS (zgrep BINFMT_MISC /proc/config.gz), and the setup command update-binfmts --enable succeeds. Moreover, we can successfully build a Kali container image for the armhf architecture. For the arm64 architecture, we can successfully run the arch-test tool, which demonstrates that we can run an arm64 binary. So once again: it seems that the QEMU emulation is set up and functional, even for arm64.
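For anyone who wants to repeat these checks on a runner, here is a minimal sketch of them as a single script. It is not taken from our pipeline. It assumes that the arm64 handler is registered under the name qemu-aarch64 (the name Debian's qemu-user-static uses, as far as I know) and that arch-test is installed; the two helper functions are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: report the state of QEMU user mode emulation for arm64.
# Assumption: the handler is registered as "qemu-aarch64"; override ENTRY
# if the runner registers it under another name.
ENTRY="${ENTRY:-/proc/sys/fs/binfmt_misc/qemu-aarch64}"

# Succeeds if the given binfmt_misc entry file is readable and its first
# line reads "enabled" (the kernel prints "enabled" or "disabled" there).
binfmt_entry_enabled() {
    [ -r "$1" ] && [ "$(head -n 1 "$1")" = "enabled" ]
}

# Prints the interpreter path recorded in a binfmt_misc entry file.
binfmt_entry_interpreter() {
    sed -n 's/^interpreter //p' "$1"
}

# 1. Is binfmt_misc part of the host kernel configuration?
if [ -r /proc/config.gz ]; then
    zgrep BINFMT_MISC /proc/config.gz || echo "BINFMT_MISC not found in kernel config"
else
    echo "/proc/config.gz not readable, cannot inspect the kernel config"
fi

# 2. Is the arm64 handler registered and enabled?
if binfmt_entry_enabled "$ENTRY"; then
    echo "arm64 handler enabled, interpreter: $(binfmt_entry_interpreter "$ENTRY")"
else
    echo "arm64 handler missing or disabled ($ENTRY)"
fi

# 3. Can an arm64 binary actually run? arch-test exits 0 when it can.
if command -v arch-test >/dev/null 2>&1; then
    if arch-test arm64 >/dev/null 2>&1; then
        echo "arch-test: arm64 binaries can be executed"
    else
        echo "arch-test: arm64 binaries cannot be executed"
    fi
else
    echo "arch-test not installed, skipping"
fi
```

The script only reports what it finds and never aborts, so it can be dropped at the top of a CI job without changing the job's outcome.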
However, calling ldconfig segfaults for the arm64 architecture. Investigation suggests that we're hitting a bug in the combination of this particular version of ldconfig and the kernel provided by the host OS. I tried bootstrapping an old version of Debian with an old version of ldconfig, and it works.
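To make the failure easy to spot in a job log, the ldconfig call can be wrapped so that its exit status is spelled out. This is only a sketch under a few assumptions: ROOTFS is a hypothetical path to an already bootstrapped arm64 root filesystem, the job runs as root, and the shell is one (such as bash or dash) that reports a process killed by signal N as status 128 + N. Under those assumptions a segfault would be expected to show up as signal 11.

```shell
#!/bin/sh
# Sketch: run ldconfig inside an arm64 chroot and explain how it exited.
# ROOTFS is a hypothetical location, not a path used by our pipeline.
ROOTFS="${ROOTFS:-./rootfs-arm64}"

# Translates an exit status into a short description. Shells such as bash
# and dash report "killed by signal N" as status 128 + N, so a status of
# 139 corresponds to signal 11 (SIGSEGV).
describe_status() {
    if [ "$1" -eq 0 ]; then
        echo "success"
    elif [ "$1" -gt 128 ]; then
        echo "killed by signal $(($1 - 128))"
    else
        echo "failed with exit code $1"
    fi
}

# chroot needs root and an existing rootfs; otherwise just say so.
if [ -x "$ROOTFS/sbin/ldconfig" ] && [ "$(id -u)" -eq 0 ]; then
    status=0
    chroot "$ROOTFS" /sbin/ldconfig || status=$?
    echo "ldconfig: $(describe_status "$status")"
else
    echo "skipping: need root and an arm64 rootfs at $ROOTFS"
fi
```

Running the same wrapper against an armhf rootfs, or against an older Debian release, gives a quick side-by-side comparison of which combinations fail.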
One thing I'd like to try is to run this CI job on a shared runner with a more recent Linux kernel, either the latest version of the 5.4 series or the 5.10 series. If it's easy for you to try that yourself, or to grant us access to such a shared runner, then I'd suggest starting there.
Note that we don't have a workaround for this issue at the moment.
Thanks a lot for your help!
/cc @rhertzog