Regression report: Disk subsystem I/O failures/issues surfacing in DOS/early Windows [two separate issues: one bisected, one root-caused]

This bug has been copied automatically from: https://bugs.launchpad.net/qemu/+bug/1745312
[Headsup: This report is long-ish due to the amount of detail I've
stumbled on along the way that I think is relevant to include. I can't
speak as to the complexity of the actual bugs, but the size of this report
should not suggest that the reproduction process is particularly headache-
inducing.]

Hi!

I recently needed to fire up some ancient software for research purposes
and got very distracted discovering and playing with old versions of
Windows :). In the process I've discovered some glitches with disk I/O.

I believe I've stumbled on two completely separate issues that
coincidentally surfaced at the same time. It's possible that components of
this report will be re-filed as more specific new bugs, but I'm not an
authority on QEMU internals or how to narrow down/categorize what I've
found.

- The first bug only surfaces when the "isapc" machine type is used. It
intermittently produces "General failure {read,writ}ing drive _" under MS-
DOS 6.22, and also somehow interferes with early bootstrap of Windows NT 4
(in NTLDR). Enabling or disabling KVM (I'm on Linux) appears to make no
difference whatsoever, which may help with debugging.

- The second issue involves
  - a WinNT4 disk image
  - created by running through a bog-standard NT4 install inside QEMU
2.9.0
  - which will now fail to boot in any version of QEMU - even version 1.0
    - but which VirtualBox will boot fine
      - but only if I point VirtualBox at QEMU's raw disk image via a
        hacked-together VMDK file
      - if the raw image is converted to VHD(X), VirtualBox will also fail
        to boot the image with exactly the same error as QEMU
      - this state of affairs is not affected by image sparseness (which
makes
        sense)

I'm confident I've bisected the first issue.

I wasn't able to bisect the second issue (as all tested versions of QEMU
behaved identically), but I've figured out a working repro testcase and I
believe I've managed to pin down a solid root cause.



== #1: Intermittent I/O issues when `-M isapc` is used =====

These symptoms sometimes take a small amount of time and fiddling to
trigger, but I AM able to consistently surface them on my machine after a
short while. (I am very very interested to hear if others cannot reproduce
them.)

So, first of all:

https://github.com/qemu/qemu/commit/306ec6c3cece7004429c79c1ac93d49919f1f1cc
  (Jul 30 2013): the last version that works

https://github.com/qemu/qemu/commit/e689f7c668cbd9d08f330e17c3dd3a059c9553d3
  (Oct 30 2013): the first version that intermittently fails

Maybe lift out and build these branches while reading. *shrug*
(How to do this can be found at the end of this report - along with a
time-saving ./configure line, FWIW)

Here are the changelists between these two revisions:

https://github.com/qemu/qemu/compare/306ec6c...e689f7c
(Compare direction: OLD to NEW) (Commits: 166  Files changed: 192)

https://github.com/qemu/qemu/compare/e689f7c...306ec6c
(Compare direction: NEW to OLD) (Commits: 30   Files changed: 22)

(Someone else more familiar with Git might know why GitHub returns results
for both compare directions, and/or if the 2nd link is useful information.
The first link returns a lot more results than the 2nd one, at least. Does
comparing new>old return deletions?)

---

Now on to the symptoms. In a moment I'll describe reproduction.

# MS-DOS 6.22

The first symptom I discovered was that trivial read and write operations
under MS-DOS would sometimes fail:

  C:\>echo test > hi

  General failure writing drive C
  Abort, Retry, Fail?

Anything else that exercises the disk behaves similarly:

  C:\>dir /s > nul

  General failure reading drive C
  Abort, Retry, Fail?

(Note that the above demonstrates both write and read failures)

(Also, FWIW, `dir /s` == `ls -R`)

The behavior of the I/O errors is not possible to characterise as it
fluctuates so much. For example something as simple as DIR can produce
wildly differing results: in one run, poking around with DIR ended with
DOS deciding C:\ was empty at one point; at another point in a different
run C:\ mysteriously dropped 50% of its contents only to magically gain it
all back moments later after some poking around in one of the
subdirectories that was still visible.

The time it takes to trigger these errors is also highly variable. QEMU
may fall over as early as hanging forever at "Starting MS-DOS...", or I
might get all the way into Windows 3.1 before it triggers (in which case
Win3.1 reports vague memory errors - of all things).

Very occasionally I've seen _SeaBIOS itself_ report "Booting from Hard
Disk..." "Boot failed: could not read the boot disk" ... "No bootable
device.", and on one occasion I even got "Non-System disk or disk error"
"Replace and strike any key when ready"!


# WinNT 4 Terminal Server

Most of the time, NTLDR will fire up normally. But every so often...

  SeaBIOS (version
rel-1.7.3-117-g31b8b4e-20131206_080705-nilsson.home.kraxel.org)

  Booting from Hard Disk...
  A disk read error occurred.
  Insert a system diskette and restart
  the system.

(NB. You're seeing the old SeaBIOS version included with e689f7c, which
was the first buggy commit.)

If NT gets past this point without erroring out (ie, it makes it to the
boot menu), the rest of the system is 100% fine and there are no other
disk I/O issues whatsoever. For example, on QEMU 2.9.0 I was able to
enable disk compression, answer "Yes" to "Compress entire disk now?" and
have the process fully complete. No hitches.

This makes me vaguely recall/wonder that perhaps this could be somehow
related to LBA and/or Int 13h, or something floating around near that
bunch of functionality. (I'm woefully ignorant about such low-level
details.) Perhaps DOS/Win3.1 are stuck using a disk mode that QEMU has a
buggy implementation of, while NT 4 (once NTOSKRNL is up and running) is
able to use a different disk mode or access mechanism.

I'm really interested to get some understanding of what the root issue is
here, when this is fixed. (I wonder if it's a timing thing?)

I've observed some unusual behavior with repeated restarts. In one case, I
attempted to start NT4 multiple times, and QEMU consistently failed with
"No bootable device" each time. So, I removed `-M isapc`, promptly got a
boot menu, hit ^C, readded `-M isapc` - and continued to get a boot menu.
Yep. I'll accept "really really big coincidence" but I do very much wonder
if something else is going on here. I've observed many similar incidents.
It makes me wonder whether the contents of memory or some other system
state is an influence. Very probably not, but still...



-- Reproduction --------------------------------------------

First of all, there was unfortunately no way for me to avoid having to
post entire disk images, but I've managed to compress everything down to
174MB total download size.

FWIW, WinWorld and many other sites seem to have no operational issues
providing clear pointers to CD keys; I consider my distribution of my
installed HDD images an extension of the apparent status quo.

That being said, I've put everything on Google Drive so nobody has to
headscratch about Launchpad/Canonical/etc's stance on hosting this data.

So, this folder contains the disk images:
https://drive.google.com/drive/folders/1WdZVBh5Trs9HLC186-nHyKeqaSxfyM2c
("Download all" at the top-right will create a ZIP file, but FWIW
downloading the individual files simultaneously would implement a rough
form of download acceleration)

File meta info:

Compressed
|
|      Apparent
|      |    Actual
|      |    |
38M -> 200M (103M)  win31.img.xz
82M -> 1G   (289M)  wnt4ts-broken.img.xz
55M -> 350M (146M)  wnt4ts-intermittent.img.xz

SHA-256s:

win31:
8179b8180a2ab40bd472e8a2f3fb89fc331651e56923f94ceb9e52a78ee220d2
broken:
a2af5f0bc49a063b75f534b6ffe5b82e32ecc706a64a425b6626feccf6e3fdfa
intermittent:
77ae8c458829ebcdd64c71042012f45d5a2788e6ebd22db9d53de9ef1a574784

(Wanted to keep the checksum lines within 80 columns)

And, since I can't figure out where else in this report to put this,
wnt4ts-broken.img's password is "admin" but something seems to have
happened to the disk and NT doesn't actually boot properly :(, and wnt4ts-
intermittent.img's password is "1234". (These were set up as test images.
Now I'm _really_ glad I used simple passwords! :) )

---


I have two testcases: DOS 6.22 (+ Windows 3.1), and Windows NT 4.


# MS-DOS

DOS is the simplest. It basically consists of

$ qemu-system-i386 -drive file=win31.img,format=raw -M isapc -enable-kvm

And then literally just playing around. Things to try include creating
files (`echo blah > file`), repeatedly seeking across the entire FAT (`dir
/s > nul` or `dir /s`), and launching Windows (`win`).

win31.img is not special (as far as I can tell) and merely consists of the
result of installing DOS 6.22 and Windows 3.1 from WinWorldPC. I've
basically just included the image for convenience.

Generally no single "run" is immune to starting Win3.1 and then launching
File Manager; if that doesn't generate an error, something is definitely
up.

The second best trigger is creating new files. That very very frequently
produces "General Failure ...", but not always.


# WinNT 4

Windows NT 4 is a bit more complicated. Because this error only occurs at
presumably a single small point very early in boot, the window of
opportunity for the glitch to surface within is much much narrower and
thus often requires a larger number of tries.

Anecdotally I've had QEMU hit the boot error at the first try/run, and
after as many as 63 "successful" boots.

I made a small test harness that automates the launch process. It consists
of two shell scripts and requires tmux (and netcat). (*Potential epilepsy
warning*: if you use a light-colored terminal background, the terminal
QEMU is repeatedly invoked from will continuously flash rapidly from white
to black.)

One of the scripts is run inside a tmux session in one terminal, while the
other script is run in its own terminal (without any tmux).


I named this one `run-qemu-loop`:

--8<--------------------------------------------------------

#!/bin/bash

# ---

qemu=/path/to/qemu-system-i386
#or, alternatively: (I used the following line myself so I
#could tab-complete my way to different qemu executables)
#qemu="$1"

disk=/path/to/wnt4ts-intermittent.img

# ---

port=4444

rm -f STOP itercount

itercount=0

while :; do

        [ -f STOP ] && break

        ((itercount++))
        echo $itercount > itercount

        $qemu \
                -enable-kvm -vga cirrus -curses -M isapc \
                -drive file="$disk",format=raw \
                -chardev
socket,id=mon0,host=localhost,port=$port,server,nowait \
                -mon chardev=mon0,mode=readline

        #point to an otherwise-unused terminal if you like (see also:
`tty`)
        #echo "$itercount run(s)" > /dev/pts/__

done

------------------------------------------------------------

Not much logic above; this just repeatedly runs QEMU for as long as
the file `STOP` does not exist in the current directory.

The key "magic" bit is that QEMU is launched in -curses mode.

The other key bit is that the above script is run inside tmux.


Here's `tmux-ctl-loop`:

--8<--------------------------------------------------------

#!/bin/bash

port=4444

tmux=./tmux

printf -v l '%0.0s-' {0..25}
h1="$l/ buffer dump begin \\$l"
h2="$l-\\ buffer dump end /-$l"

while :; do

        while :; do
                echo | nc localhost $port -q0 -w1 > /dev/null && break
                echo 'Start qemu!'
        done

        buffer="$(tmux -S $tmux capture-pane; tmux -S $tmux save-buffer
-)"

        echo "$h1"
        [[ "$buffer" ]] && echo "$buffer" || echo '( * Screen buffer is
empty * )'
        echo "$h2"

        if echo "$buffer" | grep -q 'A disk read error occurred.'; then

                s="<Crashed after $(< itercount) runs.>"
                echo "$s"
                echo "$s" >> stats

                touch STOP

                #echo q | nc localhost $port -q0 > /dev/null

                exit

        elif echo "$buffer" | grep -q 'OS Loader V4.00'; then

                echo '<Booted successfully, trying again>'

                echo q | nc localhost $port -q0 > /dev/null

        else

                echo '<Waiting for boot>'

        fi

done

------------------------------------------------------------

Nothing particularly amazing going on here either.

While `qemu-run-loop` is running inside tmux in the first terminal, this
is running in the 2nd one.

The small infinite loop at the top only breaks when it can successfully
ping QEMU and it knows it's running.

Then, a screendump of the contents of the terminal QEMU is in is fetched
from tmux, and the buffer's content is analyzed.

- If NTLDR fails, the script creates `STOP` to halt qemu-run-loop,
  sends `q` to QEMU through netcat, and then the script exits.

- If NTLDR loads successfully, the script sends `q` to QEMU and continues
  looping. (qemu-run-loop will not find the `STOP` file, and so restart
qemu.)

The scripts run very quickly, with 2-3 iterations per second on my i3 box.



# Usage

Save the two scripts above to the same directory as wnt4ts-
intermittent.img,
then:

- (If port 4444 doesn't work, the value needs to be changed in both
scripts.)

- In the first terminal, run `tmux -S <file>`, where <file> names the
socket
  tmux will use. This needs to match "tmux=" at the top of `tmux-ctl-
loop`.
  (with `tmux=./tmux`, the command would be `tmux -S tmux`)

- Still in the first terminal (and now also inside tmux), enter
  `./qemu-run-loop`, passing the path to qemu if you're using that
approach
  (refer to the first few lines of the script). Don't hit enter yet.

- Now, in the 2nd terminal, type `./tmux-ctl-loop`

- Hit enter in both terminals.


Rationale for timing of Enter key:

- Running qemu-run-loop first will start QEMU, and if NTLDR starts
  successfully it will immediately begin counting down from 30. If NT
actually
  starts to boot and is then hard-shut-down this /may/ affect the disk
image

- tmux-ctl-loop will annoyingly spam a continuous stream of 'Start qemu!'
until
  qemu-run-loop is running.

- Starting both scripts at "more or less" the same time (no rush) works
out
  well.


Hopefully potential script modifications are obvious; for example

- changing tmux-ctl-loop to not send 'q' to qemu so you can connect to the
HMP
  yourself
  (NB, if `STOP` is not created, when qemu finally exits it will of course
  promptly be relaunched)

- pointing run-qemu-loop to a modified qemu binary



== #2: QEMU-vs-VirtualBox image issue ======================

I was initially completely stumped by this issue, perhaps unsurprisingly
so. :)

wnt4ts-broken.img is a perfectly ordinary NT 4 installation that was
created in QEMU 2.9.0. I created a 1GB disk with `truncate`, picked NTFS
and installed everything (which took a while).

NT setup reboots a number of times during the boot process, and IIRC those
all went just fine. However, at some point, the image began to
consistently bomb out with "A disk read error occurred. ...", and
stubbornly refused to boot, regardless of the number of boot attempts I
tried.

QEMU 2.0.0, 1.5.0, and 1.0 (the earliest version I was able to build on my
system) all consistently hit "disk read error occurred".

I tried compiling QEMU 1.0 using clang so I could build for 32-bit on my
64-bit system (GCC 7 died with "Frame pointer required, but reserved").
The resulting qemu completely crashed if I didn't enable KVM (ie, TCG was
(understandably) broken); with KVM enabled qemu didn't crash, but NTLDR
halted with the same error as on 64-bit qemu. (TL;DR, no difference
whatsoever.)

My initial reaction at this point was to try the image on another
virtualization platform. My first pick was VirtualBox.

So, I followed the official instructions for pointing VirtualBox to
physical disk images, except I substituted a /dev/loopN device I'd pointed
to the image file via losetup.

And... VirtualBox picked the image up fine and Just Worked(TM). Yay! - but
not yay. What gives?!

Confused, I then tried to convert the disk image to VHD format.
Unfortunately, for some reason, if I try `qemu-image convert ... -O vhdx
...`, VirtualBox chokes on the result:

-----

VD: error VERR_NOT_SUPPORTED opening image file
'/.../wnt4ts-broken-qemuconv.vhd' (VERR_NOT_SUPPORTED).

Result Code: NS_ERROR_FAILURE (0x80004005)
Component: MediumWrap
Interface: IMedium {4afe423b-43e0-e9d0-82e8-ceb307940dda}
Callee: IVirtualBox {0169423f-46b4-cde9-91af-1e9d5b6cd945}
Callee RC: VBOX_E_OBJECT_NOT_FOUND (0x80BB0001)

-----

Welp.

Well, a bit more digging later, and I found I could do

$ VBoxManage convertfromraw wnt4ts-broken.img wnt4ts-broken.vhd

but... as soon as I pointed VirtualBox to this, it too began to choke with
"A disk read error occurred".

And yet, the VMDK->raw image setup worked just fine.

I found I could even replace the loop device with the path of the .img
file itself and that worked just fine too.

At my wits' end, I followed some online instructions to learn about manual
CHS configuration so I could try and get the image working in Bochs. "A
disk read error occurred". I wasn't surprised.

It was at this point I began to give up, but I decided to try One Last
Thing(TM) before properly throwing in the towel.

:)

I decided to learn a bit more about how `VBoxManage internalcommands
createrawvmdk` worked, and try one thing in particular: I can edit the
.vmdk file, but can I point `createrawvmdk` at the .img file directly too?

Turns out, yes you can.

It also turns out that this promptly caused VirtualBox to bomb out.

Interesting.

For reference, here's the VMDK file I initially created (by pointing
`createrawvmdk` at /dev/loopN) and then later edited to point straight to
the .img file, with both approaches resulting in successful boot.

--8<--------------------------------------------------------

# Disk DescriptorFile
version=1
CID=e35b9a45
parentCID=ffffffff
createType="fullDevice"

# Extent description
RW 1536000 FLAT "/absolute/full/path/to/wnt4ts-broken.img" 0

# The disk Data Base
#DDB

ddb.virtualHWVersion = "4"
ddb.adapterType="ide"
ddb.geometry.cylinders="1523"
ddb.geometry.heads="16"
ddb.geometry.sectors="63"
ddb.uuid.image="871a6044-c8ca-48ed-b7aa-e6fc49da3db4"
ddb.uuid.parent="00000000-0000-0000-0000-000000000000"
ddb.uuid.modification="3661715c-3906-4e4a-ab65-486d140e03b8"
ddb.uuid.parentmodification="00000000-0000-0000-0000-000000000000"
ddb.geometry.biosCylinders="761"
ddb.geometry.biosHeads="32"
ddb.geometry.biosSectors="63"

------------------------------------------------------------


Here's the _diff_ of what happens if I point `createrawvmdk` at wnt4ts-
broken.img directly:

--8<--------------------------------------------------------

ddb.geometry.cylinders="2080"
ddb.geometry.heads="16"
ddb.geometry.sectors="63"

------------------------------------------------------------

:D

Naturally,

$ qemu-system-i386 -drive file=wnt4ts-
broken.img,format=raw,cyls=1523,heads=16,secs=63 -M isapc -sdl

will boot happily on 2.9.0 (notwithstanding the occasional "disk read
error occurred" documented above).

It will also boot in 1.6.0.

(POTENTIAL BUG HEADSUP: 1.0 and 1.5.0 both lock up with a blank 640x480
window and use 0% CPU if I specify `-M isapc`.)

And, of course, using these CHS values in Bochs also results in successful
boot as well (after setting the CPU type to pentium).

Unfortunately, I have no idea what sequence of events caused the creation
of the VMDK file above. No invocation of `createrawvmdk` is producing a
VMDK file with the CHS settings above.

I've only just begun to learn about the intricacies of CHS. Am I to
understand that these values are stored amongst the first 512 bytes of the
disk? If this is the case, then I wonder what changed the data, and why. I
was initially only using QEMU 2.9.0, and didn't move the image to
different VMs or QEMU versions. Perhaps Windows NT got confused about the
disk CHS and rewrote it?


== Sporadic BIOS-level boot failure ========================

I have multiple screenshots of SeaBIOS in QEMU 2.9.0 halting with "No
bootable device" (et al), even with the above manually-applied CHS
settings.

Commit e689f7c also presents such errors.

Commit 306ec6c does not suffer from intermittent breakage of any kind:

- No SeaBIOS flake-outs
- No "Non-system disk or disk error"
- No "A disk error has occurred"
- No "General failure ..."

While most of my confidence in commit 306ec6c is based on anecdotal
evidence, I modified `tmux-ctl-loop` a little to soak-test BIOS-level I/O
stability and left this modified version running for a few minutes.

--8<--------------------------------------------------------

#!/bin/bash

port=4444

tmux=./tmux

printf -v l '%0.0s-' {0..25}
h1="$l/ buffer dump begin \\$l"
h2="$l-\\ buffer dump end /-$l"

while :; do

        while :; do
                echo | nc localhost $port -q0 -w1 > /dev/null && break
                echo 'Start qemu!'
        done

        buffer="$(tmux -S $tmux capture-pane; tmux -S $tmux save-buffer
-)"

        echo "$h1"
        [[ "$buffer" ]] && echo "$buffer" || echo '( * Screen buffer is
empty * )'
        echo "$h2"

        if echo "$buffer" | grep -q 'Non-system disk' || echo "$buffer" |
\
                grep -q 'No bootable device'
        then

                s="<Hit error after $(< itercount) runs.>"
                echo "$s"
                echo "$s" >> stats

                touch STOP

                #echo q | nc localhost $port -q0 > /dev/null

                exit

        elif echo "$buffer" | grep -q 'OS Loader V4.00' || echo "$buffer"
| \
                grep -q 'A disk read error'
        then

                echo '<Boot did not hang at BIOS, trying again>'

                echo q | nc localhost $port -q0 > /dev/null

        else

                echo '<Waiting for boot>'

        fi

done

------------------------------------------------------------

For the above to work, the top of run-qemu-loop must also be modified to
read something along the lines of

disk=/path/to/wnt4ts-broken.img,format=raw,cyls=1523,heads=16,secs=63

(Suggestion: modify copies of both scripts)

One small terminal-flicker-headache (and a 57°C CPU) later, I was able to
carefully observe just over 350 successful runs in which QEMU commit
306ec6c only ever produced a boot menu. No other hitches.

** Important: **

However, commit 306ec6c will fail to boot, ever, if the cylinders and
geometry are not set to the values VirtualBox "discovered". (Of note is
the fact that QEMU (2.9.0) was what initially created this image. I must
admit that I don't remember what sequence of QEMU versions I fed the image
to - and I maybe, possibly, didn't think to back the file up (sorry), so
maybe something mangled something somewhere. But VirtualBox figured it out
nonetheless!)

Furthermore, feeding /dev/loopN to any QEMU version will NOT result in
correct CHS discovery (and successful boot).

This is what leads me to conclude that I've discovered two separate
issues.



== Appendix: How to build the branches =====================

It's very simple.

First, `git clone https://github.com/qemu/qemu` somewhere if you don't
already have a local copy. If you have an old git checkout that's from
2014 or later, you can use that old checkout instead. (If you want to test
an old checkout you have, the commands below will either work perfectly or
completely bomb out with no side effects.)

A full checkout is a ~183MB download. Sorry.

Next, create two new directories somewhere. Name them what you like, eg
`qemu-working` and `qemu-broken`.

Now, cd into the checkout directory, and run:

$ git archive 306ec6c3cece7004429c79c1ac93d49919f1f1cc | tar xC
/path/to/qemu-working/

$ git archive e689f7c668cbd9d08f330e17c3dd3a059c9553d3 | tar xC
/path/to/qemu-broken/

The paths can be relative.

Now, run this in both of the new directories:

$ ./configure --python=python2.7 --disable-libssh2 --disable-seccomp
--disable-usb-redir --disable-guest-agent --disable-libiscsi --disable-
spice --disable-smartcard-nss --disable-vhost-net --disable-docs
--disable-attr --disable-cap-ng --disable-vde --disable-user --disable-
bluez --disable-vnc-ws --disable-xen --disable-brlapi --enable-debug
--target-list=i386-softmmu --disable-fdt

$ make -j64

You can open two terminals and configure and build both simultaneously if
you like.

On my decent but very basic (2-core+HT) i3 box, -j64 actually works out -
make doesn't actually launch too many gcc processes. You *will* see your
system load spike to ~20 though :)
(NB. Do. not. use. -j64. with. the. linux. kernel.)

On my system, a single build with -j64 takes only about 35 seconds. C FTW.
(Although this has increased to 1min20sec for more recent builds.)

Most of the configure arguments remove functionality I'll never use (in
this situation) and which will only slow down the build.

Once QEMU is built, run qemu-system-i386 directly from where it has been
built.

$ /path/to/qemu-working/i386-softmmu/qemu-system-i386 ...
$ /path/to/qemu-broken/i386-softmmu/qemu-system-i386 ...

Again, the paths can be relative.