22.06 SP1 updates appear to have bricked osk-sdl to unusable state for me on the original allwinner PinePhone.
What's the expected behaviour?
osk-sdl works like it did before updates
What's the current behaviour?
When I boot with no external usb keyboard attached, I get 1. postmarketOS logo, 2. split second text cursor, 3. OSK SDL shows with the keyboard moving in with its animation but then no reaction to touch whatsoever. I tried entering the password blindly but it didn't work, so it seems like it's not just a redraw issue. When I plug in a usb keyboard later, pressing any keys won't work.
When I boot with external usb keyboard attached, I get 1. postmarketOS logo, 2. not sure if the split second cursor showed, 3. black screen but with backlight on. Blindly logging in on usb keyboard doesn't work either.
So I'm stuck at the FDE prompt, the install seems to be effectively bricked.
How to reproduce your issue?
Install 21.12 (original-no-SP) on non-pro PinePhone, enable FDE via osk-sdl prompt
Upgrade to 22.06 no-SP via upgrade script
Upgrade to latest packages (SP1) which should also regenerate initramfs
???
Reboot
Edit/note: I did not try these cleanly, this just sort of describes the life cycle of this install. It worked on 22.06 no-SP (no service pack) fine, still
What device are you using?
PinePhone 1.2 CE 3GB Allwinner (non-pro)
On what postmarketOS version did you encounter the issue?
edge (master branch)
v22.06
v21.12 (supported until 2022-07-12)
I confirm that the issue still is present after running sudo apk upgrade -a
I don't have much time in the morning, but just tried to reproduce it with a fresh install via pmbootstrap install --fde --sdcard=... on v22.06 on a PinePhone 1.2 and PinePhone 1.2a (see revisions) and I was not able to reproduce it. I can type on the virtual keyboard and it boots up after typing in the password.
Nevertheless I've reverted the patch that upgraded osk-sdl in d25544ac, so at least users who did not upgrade yet should not be affected by this. I'll also edit the post to point to this issue.
Off the top of my head, here is how it should be possible to recover from this situation. It's untested though:
if installed to SD: insert your SD card into a computer
if installed to eMMC: flash JumpDrive to an SD card, connect your phone via USB to your PC and boot the SD
in both cases you should now be able to see the boot and root partitions of postmarketOS on your PC with lsblk
mount the boot partition (the first, smaller one) to e.g. /mnt/boot. Something like: sudo mkdir /mnt/boot and sudo mount /dev/mmcblk0p1 /mnt/boot
make a copy of the broken initramfs and initramfs-extra files
replace them with the files from the boot partition of e.g. this image that was created before the SP was released, by mounting it with losetup, using partprobe to get the partitions inside it, and mounting the boot partition.
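The last two steps (keep a copy of the broken files, then overwrite them with known-good ones) can be sketched roughly as below. This is only an illustration, not a tested recovery tool: the function name, the three directory arguments and the assumption that exactly the files initramfs and initramfs-extra are involved are made up for the example, and it assumes both boot partitions are already mounted.

```python
import shutil
from pathlib import Path

# Assumption for this sketch: these are the only two files that need replacing.
INITRAMFS_FILES = ("initramfs", "initramfs-extra")


def backup_and_replace(boot_dir, known_good_dir, backup_dir):
    """Save the current initramfs files to backup_dir, then overwrite them
    with the copies found in known_good_dir. Returns the replaced names."""
    boot = Path(boot_dir)
    good = Path(known_good_dir)
    backup = Path(backup_dir)

    # Refuse to touch anything unless every replacement file is present.
    for name in INITRAMFS_FILES:
        if not (good / name).is_file():
            raise FileNotFoundError(f"known-good file missing: {good / name}")

    backup.mkdir(parents=True, exist_ok=True)
    replaced = []
    for name in INITRAMFS_FILES:
        target = boot / name
        if target.exists():
            # Keep the possibly broken file so it can be uploaded for analysis.
            shutil.copy2(target, backup / name)
        shutil.copy2(good / name, target)
        replaced.append(name)
    return replaced
```

With the phone's boot partition mounted at /mnt/boot and the older image's boot partition mounted at some other (hypothetical) path, this would be called as backup_and_replace("/mnt/boot", "/mnt/old_boot", "broken-files").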
(I also tried uploading files directly but my internet is acting up >_< and as mentioned, I need to go...)
I hope this makes it work again for everybody affected, please let us know if you need further help recovering! You might possibly get faster answers in the chat, but make sure to put the important information in the issue here.
To figure out what went wrong, it would be useful:
if you could upload the broken files from your phones too.
if you could get serial output of the state where it is not working, maybe there's something useful in there.
Sadly I ran into the following combined points: 1. I had to make a call and I use my PinePhone as daily driver with no backup phone, 2. I have an emmc install with /home separate on sd card, 3. there is no way to copy files off the emmc first via the installer image since it has no live desktop to do other things in, 4. I only had one spare sd card, 5. I have usable user data backups. The result of that was that unless I was going to reimage an sd card multiple times with first jumpdrive and lots of more usb juggling, the easiest way was just to bulldoze a new install on top without preserving the broken install. So I'm afraid I currently can't provide a lot more other than that the reinstall fixed it, so it seems like it indeed wasn't a hardware issue but something about the touch input was broken by the update and restored by the reinstall. Sorry that I can't be more helpful!
I tried this fix and I was still stuck at no response on the keyboard. I noticed that u-boot had been updated too, so I put an older image on an sdcard and ran the u-boot fix from here and was then able to type in my password and complete the boot.
edit: by "this fix" I meant to replace the /boot files with those from an older image as Ollie suggested. Also, should point out that I'm running edge with sxmo-de-sway ... edit 2: on Braveheart pinephone
there were two recent commits that updated the u-boot package in Alpine (which is used by the pinephone):
commit 621208776c1e26175485b7342f229692ae1106ac
Author: Bart Ribbers <bribbers@disroot.org>
Date: Fri Jul 15 09:09:06 2022 +0200
main/u-boot: rebuild against atf 2.7.0
commit 6da96fbb190920f074a02de7ccf6a6d0a2d4ee95
Author: Milan P. Stanić <mps@arvanta.net>
Date: Tue Jul 12 14:38:09 2022 +0000
main/u-boot: upgrade to 2022.07
But if u-boot was the problem, then replacing files in /boot shouldn't have "fixed" it in your case, since u-boot is embedded outside of any filesystem
@z3ntu were you successful in reproducing this earlier?
sorry, maybe I wasn't clear. I meant that I tried the fix you suggested of replacing the files in /boot from an older image and that didn't work. Then I tried the u-boot fix and was able to type in my password.
@craftyguy edge, as I mentioned. I know this issue was reported for 22.06, but I just got the same issue after I updated this morning (my first updates in about 7-10 days) so I thought it might be the same thing.
edit: oh, just noticed I used a 22.06 image to fix it. That's probably why you asked. Sorry, I've had a long day. I'll stop posting now, but hopefully this can help somebody figure it out.
@bolbishvili thanks. ya I was a little unclear if you had started with edge, or used some component from edge along with the v22.06 image, or what. Thanks for clarifying!
Thanks for reporting @bolbishvili! If you saved the broken files from the boot partition, it would be very useful if you could upload them. Having you report that you experienced this on edge makes me wonder if this may randomly happen depending on the order of the files in the initramfs or something... or a config change. It would be great to debug this and get to the root cause of it. (or alternatively, we'll soon switch to unl0kr (#1411 (closed)) where hopefully we don't have this issue at all...)
We extensively discussed this issue in the last team meeting, and so far we only have one report of that happening on v22.06 (and another one on edge now). We can't reproduce it at all and Martijn also tested osk-sdl before releasing the service pack.
@bolbishvili, @ell1e: do you happen to remember if you customized your installation in any way that might have impacted this? I know that @ell1e upgraded from v21.12 to v22.06, so given that the installation is a bit older already, maybe something in the filesystem has been changed that might be related to this, adjusting the osk-sdl config or something?
There's also apk audit, it shows what files have been modified compared to the state of the files in the package. Sam, since you have your installation still, it would be useful if you could check if there's any config file changes that might be causing it.
Also @bolbishvili, I wonder if you can reproduce this by generating the initramfs again (should be sudo mkinitfs).
Getting serial output would also be useful to trying to resolve this.
Yesterday I upgraded my encrypted pinephone install that's running v22.06 to the latest (which didn't contain the osk-sdl upgrade which was already reverted), all fine. Today I built the new osk-sdl 0.67 and upgraded, and it still works fine. So unfortunately I cannot reproduce this on my device.
I say I think because I tried the suggested fix of replacing initramfs* (or all the) files in /boot with the files from the image linked by Ollie above. That didn't change anything. And after a sudo update-u-boot -d /dev/mmcblk2 the phone stays dead completely, it doesn't do the initial LED/vibrate anymore. (The touchscreen works fine when booting from an SD card.) [Edit:] After replacing initramfs* in /boot and u-boot with dd, it boots and I can enter my password. Seems like it is indeed the same problem. (I did have to downgrade the kernel after booting, because of the lack of compatible kernel modules. sudo apk add --allow-untrusted /var/cache/apk/linux-postmarketos-allwinner-5.17.5_git20220429-r0.9e8ca604.apk did that for me.)
anyone affected: do you remember if you quickly rebooted after the upgrade?
my theory is that either:
we are missing a sync for some reason, so the initramfs-extra file does get written right, but it's not synced to disk before shutting off (for this it would be interesting to know if you rebooted right after running the update)
or the initramfs-extra generation code actually has a bug (though we took a lot of care to make sure that it's not the case, afaik we even extract it right after generating it to make sure the files are not broken...). maybe it does extract fine in postmarketos-mkinitfs, but not with busybox for some reason?
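To make the first theory concrete: a write that "succeeds" may still only exist in the page cache until it is flushed. The sketch below shows what an explicit sync of one file looks like in general. It is not taken from postmarketos-mkinitfs, and the function name is made up for illustration.

```python
import os


def write_and_sync(path, data):
    """Write data to path and ask the kernel to flush it to the storage
    device before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file to the device

    # Also sync the containing directory, so the directory entry for a newly
    # created file is flushed as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If a step like this were missing, a power-off shortly after the upgrade could leave an incomplete file on the boot partition, which is why the time between upgrade and reboot matters for this theory.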
this is the end of the log above where the error happens:
Starting unudhcpd
Using interface usb0
Starting the DHCP daemon
Trying to start server with parameters: Server IP addr: 172.16.42.1:67, client IP addr: 172.16.42.2, interface: usb0
Trying to bind to interface: usb0
Server started!
Mount boot partition (/dev/mmcblk2p1) to /boot (read-only)
Detected ext filesystem
Extract /boot/initramfs-extra
cpio: lib/libdevmapper.so.1.02 not created: newer or same age file exists
cpio: lib/ld-musl-aarch64.so.1 not created: newer or same age file exists
cpio: lib/libc.musl-aarch64.so.1 not created: newer or same age file exists
292703 blocks
next up would be comparing the initramfs-extra file and its contents with a proper version and figuring out exactly what went wrong (can't spend more time on it right now though, but will take another look as soon as I find time; if somebody wants to analyze it more in the meantime it would be appreciated).
anyone affected: do you remember if you quickly rebooted after the upgrade?
I rebooted many hours later after seeing the upgrade & initramfs regeneration in the terminal. Probably around 12 hours minimum, might actually have been days.
Oliver Smith changed title from 22.06 SP1 updates appear to have bricked osk-sdl to unusable state [Original Allwinner PinePhone] to 22.06 SP1 updates appear to have bricked osk-sdl to unusable state [Original Allwinner PinePhone] / probably broken initramfs-extra
My main suspicion is now that maybe the battery was (almost) empty and that led to the weird osk-sdl behavior (not accepting keyboard input, or not showing up at all).
So what I've tried is, I looked at @formalferret's initramfs files. I compared them against 20220727-1832-postmarketOS-edge-phosh-18-pine64-pinephone.img.xz (which is from around the same timeframe):
I made sure I can boot that .img.xz from SD on a pinephone without issues
extracted the initramfs and initramfs-extra files and compared their contents with diffoscope
(when running diffoscope on the non-extracted files, it finds lots of file ordering differences that are unrelated)
the result is:
all files in initramfs-extra are identical
files in initramfs are identical, except for the following files which only have ordering differences of the lines (so also shouldn't matter):
/lib/modules/5.18.7/modules.alias
/lib/modules/5.18.7/modules.dep
/lib/modules/5.18.7/modules.symbols
the uploaded archive also had an initramfs-postmarketos-allwinner file, this is an old one that doesn't get used anymore; we used to have these "flavors" at the end of the initramfs files but now we don't anymore
I copied the "broken" initramfs files over the pre-built image, and was able to boot it fine. So there seems to be nothing wrong with these files.
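For anyone who wants to repeat the "only ordering differences" check on files like modules.alias or modules.dep without diffoscope, a minimal sketch is below. The function name is hypothetical; it treats each file as a multiset of lines, so duplicated lines still have to match in count.

```python
from collections import Counter


def same_lines_ignoring_order(path_a, path_b):
    """Return True if both files contain the same lines the same number of
    times, regardless of the order in which the lines appear."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        lines_a = a.read().splitlines()
        lines_b = b.read().splitlines()
    return Counter(lines_a) == Counter(lines_b)
```

A True result for two files that differ byte-for-byte means the difference is purely in line order, which is what the comparison above found for the three module files.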
I also read all the reports above again, and found that some refer to actually replacing u-boot leading to a fixed installation. This would mean, there's a bug with the u-boot script... but this would surprise me very much since we not only copy the new u-boot via dd, but actually verify that the checksum of it matches just to be sure (see validate_checksum). So if that failed, users should have noticed during the upgrade as apk would throw an error. This leads to an error message at the end of apk upgrade on cli, and also to an error if doing the upgrade via GNOME software.
So it seems more likely to me that the battery was just empty when the bug occurred, and after trying various things such as replacing initramfs files or u-boot, the battery was charged up enough that it worked again.
This would also explain why the cause seemed to be an initramfs upgrade in one case, and u-boot upgrades in other cases, leading to the same symptom. Because it wouldn't be the actual cause, rather the empty battery would be the real cause.
In my testing I also had an almost empty battery once, and that led to not being able to boot from SD at all, the pinephone got stuck in a boot loop with only vibrating, led turning on, repeat. That's how I came up with the theory that osk-sdl may start up if installed to eMMC but not behave correctly.
So... really curious if that sounds plausible, please let me know.
Doesn't really sound plausible to me, because I retried again over the course of 24 hours and it didn't start working again until I nuked the install. So battery doesn't really fit to me. (Edit: the battery could have been on the lower side though when it initially hit, or not at all, I honestly don't remember. But charging fixed nothing.)
I reran the update yesterday, hit sync a bunch of times, and made sure the battery charge stayed high the entire time. Shut down and turned back on today (94% bat / 4.0V), no problems. Though I also think that battery is a rather unlikely cause. (And if it were, it'd also be troublesome. I might sometimes want to reboot my phone with low-ish battery…)
I'll try to reboot later again, when the battery has discharged. [Edit:] No problems there either…
actually verify that the checksum of it matches just to be sure (see validate_checksum)
Unlikely that it's connected to this issue, but out of curiosity: is there anything that ensures that this read comes from storage, and not from block cache?
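To illustrate what the question is getting at: a checksum read right after a write can be served from the page cache rather than from the device. The sketch below is not the actual validate_checksum code; the function name and parameters are made up. It asks the kernel to drop cached pages for a region before hashing it. Note that posix_fadvise is only an advisory hint and does not drop pages that are still dirty, so the writer would need to fsync first, and a strict guarantee would need something like O_DIRECT.

```python
import hashlib
import os


def sha256_of_region(path, offset, length):
    """Hash length bytes starting at offset, after hinting to the kernel that
    cached pages for that region should be dropped (advisory only)."""
    digest = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Hint only: the kernel may ignore it, and dirty pages stay cached.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        os.lseek(fd, offset, os.SEEK_SET)
        remaining = length
        while remaining > 0:
            chunk = os.read(fd, min(remaining, 1 << 20))
            if not chunk:
                break  # region extends past the end of the file or device
            digest.update(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```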
Thanks for the feedback! Then it's probably not the battery.
Last week I hit this bug (or some variant of it) on edge. I upgraded the pinephone on eMMC with FDE and on rebooting, it showed the "initramfs-extra not found" splash screen. I was very busy analyzing something else, so I only quickly booted from SD, made a copy of all files in /boot (that are probably broken? for later analysis) and copied known-working files from the SD card to /boot. That made it boot again, as expected.
This is the terminal output of postmarketos-mkinitfs I still had open. It looks normal, the "does not have enough free space to copy this file atomically" message is also expected since the initramfs-extra includes mesa and is very big right now, this will change once we use unl0kr. The message only means that instead of copying the file next to the original file, and then renaming both, and deleting the old one, postmarketos-mkinitfs would replace the file directly. Actually I'm not sure if this does relate to the bug somehow, but at least having this message is not unexpected.
2022/08/30 21:33:42 - Including FDE support
2022/08/30 21:33:49 - Writing and verifying initramfs-extra archive
2022/08/30 21:34:07 == Using boot-deploy to finalize/install files ==
==> kernel: device-tree blob operations
==> u-boot: creating FIT images
==> Checking free space at /boot
Destination filesystem does not have enough free space to copy this file atomically: /tmp/mkinitfs1956926698/initramfs-extra
Need 61632 kilobytes, have 26544 kilobytes... OK!
==> Installing: /boot/initramfs
==> Installing: /boot/initramfs-extra
*NOT* copying file atomically (not enough free space at target): /tmp/mkinitfs1956926698/initramfs-extra
==> Installing: /boot/sun50i-a64-pinephone-1.1.dtb
==> Installing: /boot/sun50i-a64-pinephone-1.2.dtb
2022/08/30 21:34:29 mkinitfs completed in: 47.868254753s
Executing postmarketos-base-21-r1.trigger
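The difference between the atomic and the non-atomic path mentioned in that output can be sketched as follows. This is a simplified illustration and not the real postmarketos-mkinitfs or boot-deploy logic: it uses a single rename instead of renaming both files, and the function name, the ".new" suffix and the free_bytes override are assumptions made for the example.

```python
import os
import shutil


def install_file(src, dest, free_bytes=None):
    """Install src as dest. Returns True if the atomic path was used, False
    if dest was overwritten in place. free_bytes overrides the detected free
    space (hypothetical parameter, handy for testing)."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    needed = os.path.getsize(src)
    if free_bytes is None:
        free_bytes = shutil.disk_usage(dest_dir).free

    if free_bytes >= needed:
        # Atomic path: write a complete copy next to the target, flush it,
        # then swap it in with a single rename.
        tmp = dest + ".new"
        shutil.copyfile(src, tmp)
        with open(tmp, "r+b") as f:
            os.fsync(f.fileno())
        os.replace(tmp, dest)
        return True

    # Non-atomic path: overwrite the target directly. If this is interrupted,
    # dest may be left truncated or partially written.
    shutil.copyfile(src, dest)
    return False
```

The design point is that only the first branch guarantees the old file stays intact until the new one is complete, so an interruption during the second branch is one conceivable way to end up with a damaged initramfs-extra.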
So my latest theory was... since the order of the files inside the cpio archives for initramfs and initramfs-extra is currently not deterministic: maybe busybox cpio fails to extract the initramfs files if they happen to be in a specific order. Like there was a bug in busybox unzip some years ago. Yesterday I tried extracting initramfs and initramfs-extra with busybox cpio and GNU cpio and ran diffoscope on the results, but they are the same. So this isn't it either.
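A rough stand-in for that comparison, for anyone without diffoscope at hand: extract the same archive with both tools into two directories and compare the trees by content hash. The function names are made up for this sketch, and it only looks at regular files and symlinks to files, not at permissions or timestamps.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to a SHA-256 of its content,
    or to its link target for symlinks."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if os.path.islink(full):
                digests[rel] = "symlink:" + os.readlink(full)
            else:
                with open(full, "rb") as f:
                    digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def diff_trees(root_a, root_b):
    """Return the sorted relative paths that are missing on one side or whose
    content differs between the two trees."""
    a, b = tree_digests(root_a), tree_digests(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from diff_trees corresponds to the "they are the same" result described above.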
Another data point is, a friend told me that he saw this happening on an android device running postmarketOS too.
Today I installed pmOS stable with FDE on one pinephone's eMMC and did "sudo mkinitfs" and "sudo reboot" 10 times in a row, hoping to be able to provoke the error. But it worked every time.
I guess I would need to analyze the (broken?) cpio files some more to figure out exactly why this happens. A good explanation why this only happens rarely would be the order of files (currently probably the same order as they were written to the eMMC on upgrade; which would also explain why in my test I always got the same result).
In the meantime, I think it's a good idea to change postmarketos-mkinitfs to put the files in the cpio archives in a deterministic order (postmarketos-mkinitfs#10 (closed)), so when we test generating the initramfs e.g. for a pmOS stable service pack, and it works there, it should also work for everybody else who runs postmarketos-mkinitfs while installing the upgrade. And if it fails in the test, we should be able to consistently reproduce the failure and finally get to the bottom of this.
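The core of a deterministic order is simply to sort the member list before archiving instead of relying on whatever order the filesystem returns. A minimal sketch of that idea follows; it is not the postmarketos-mkinitfs implementation and the function name is hypothetical.

```python
import os


def sorted_archive_members(root):
    """Return all directories and files under root as relative paths in
    lexicographic order, independent of filesystem iteration order."""
    members = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            members.append(os.path.relpath(os.path.join(dirpath, name), root))
    # A directory path is a strict prefix of the paths below it, so sorting
    # also places every directory before its contents.
    return sorted(members)
```

Feeding a list like this to the archiver would make two runs over the same files produce the same member order, so a failure seen in testing should then be reproducible elsewhere.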
Even though it seemed unlikely at first, by now I'm pretty sure that this is caused by upgrading u-boot. But only rarely, not every time. I saw this in the pmOS chat scrolling by:
Apparently installing uboot leads to no input o.O
ah for the sake of it I noticed that you can also replace u-boot. Replaced u-boot with the one from v22.06 and that did the trick
and it matches the stories above.
I think we should disable the code that does the automatic u-boot upgrade for now (v22.12 is coming up, so at least we shouldn't run into that bug anymore then). The plan is to move to tow-boot for the pinephone soon, which depends on the big installer rewrite (ondev2). Due to all the other tasks it's progressing rather slowly, but I think in a few months we should be able to do the switch. Once we have tow-boot, we can do the u-boot upgrades with fwupd in a consistent way across all distros, so that should be rock solid.
Oliver Smith changed title from 22.06 SP1 updates appear to have bricked osk-sdl to unusable state [Original Allwinner PinePhone] / probably broken initramfs-extra to 22.06 SP1 updates appear to have bricked osk-sdl to unusable state [Original Allwinner PinePhone] / probably broken u-boot autoupgrade