Changes
Page history
Document compression stream format
authored
Aug 22, 2019
by
Nico Bendlin
Show whitespace changes
Inline
Side-by-side
compression.md
View page @
7d2d3f9f
...
...
@@ -10,6 +10,103 @@ struct am2lob {
};
```
TODO: document decompression algorithm and in-place decompression issues.
Stream format
=============
The LOB compression method 6 is a variant of the
[
Lempel–Ziv–Storer–Szymanski (LZSS)
](
https://en.wikipedia.org/wiki/LZSS
)
compression algorithm.
Flags octet
-----------
A stream sequence starts with an octet that contains the flags for the codes
that follow. The flags are tested from the most significant bit to the least
significant bit:
| bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
|:------|:------|:------|:------|:------|:------|:------|:------|
| flg 0 | flg 1 | flg 2 | flg 3 | flg 4 | flg 5 | flg 6 | flg 7 |
If a flag is set the next code is a literal, else a dictionary reference.
The next sequence begins after all eight codes have been procecessed
(the next flags octet is read).
Literal code
------------
The code is a single octet with raw data.
On decoding the octet is written as is to the current output position.
Dictionary code
---------------
The offset/length pair is encoded with two octets:
| bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
|:-------|:-------|:-------|:-------|:-------|:-------|:-------|:-------|
| off 11 | off 10 | off 9 | off 8 | len 3 | len 2 | len 1 | len 0 |
| bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
|:-------|:-------|:-------|:-------|:-------|:-------|:-------|:-------|
| off 7 | off 6 | off 5 | off 4 | off 3 | off 2 | off 1 | off 0 |
The minimum length is 3 (
`length = (code[0] & 0x0F) + 3`
),
therefore the maximum number of octets to copy is 18.
The offset base is the current output position
(
`offset = ((code[0] << 4) & 0x0F00) | code[1]`
),
therefore the maximum distance is 4095 octets.
On decoding length octets are copied to the output position,
starting from initial output position minus offset.
Please note that the source and destinations might overlap
(copy single octets, neither use memcpy, nor
[
memmove
](
https://en.cppreference.com/w/c/string/byte/memmove#Notes
)
).
Please note that the offset value could be 0. This would result in
undefined behavior during decompression, and such a stream should
be considered invalid.
End Of Stream
-------------
Since the interpretation of the compressed stream is completely octet-based,
there is no need for an End Of Stream (EOS) sequence. The decoding immediately
stops when the size of the expected raw output is reached (known from header).
For literal codes, this is detected reliably. But the original decoder only
tests for the end of the output after writing the full length dictionary copy.
Therefore it would be possible to craft a compressed data stream that results
in a buffer overflow in the original game. Custom implementations should test
for this special case and handle it properly.
Notes
=====
It is strongly recommended that the size of compressed data in the LOB header
exactly matches file/item size minus the LOB header, since it is used in the
original decoder to move the data to the end of the buffer before starting
the in-place decompression. The game has an internal list of properties for
every game file/archive, which includes the displacement for the LOB routine.
Note that some files/archives have a displacement of 0
(which, besides some very special cases, effectively prohibits compression).
Since the displacement for the in-place decompression is fixed, it is
theoretically possible, that the stream contains self-modifying data.
Even if this is very unlikely, a custom packer would have to care about this.
The game and the original decoder have no additional checks for the header.
Only the compression method is verified, but this will not result in a failure,
and just stops decompression after moving the header and the data to the end
of the buffer (even more worse, the compression method is used to look up an
additional offset, which might result in a buffer overflow at run-time).
Trivia: LOB =
[
Lothar Becks
](
https://www.mobygames.com/developer/sheet/view/developerId,177857/
)
\ No newline at end of file