Geo: Format filesize string to use as checksum for verification of remote files
What does this MR do and why?
Related issue: #424875 (closed)
This MR fixes a bug introduced when Geo added verification of remote files: if a file size, converted to a string, had an odd number of digits, it was stored incorrectly in the `verification_checksum` column, which has the `bytea` data type.
Format filesize string to use as checksum
For blob data types, we use the `bytea` column `verification_checksum` for two kinds of data: a hash in hex format, and an integer stored as a string (the file size).
The column expects a value in hex format and stores it as binary data (a sequence of octets). The `bytea` "hex" format that we use encodes binary data as 2 hexadecimal digits per byte, most significant nibble first.
Therefore, when the data has an odd number of hex digits, a least significant nibble of value 0* is added to complete the last byte. This happens whenever a file size, converted to a string, has an odd number of digits. As a result, the file size string is stored with a trailing 0, which makes the derived integer 10 times larger than it should be (for example, 12345 is read back as 123450).
*In PostgreSQL, binary strings allow storing octets of value zero.
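To make the failure concrete, the following is a minimal Python model of the packing step. It assumes the conversion behaves as described above: hex digits are packed two per byte, and an odd trailing digit is completed with a zero nibble. The names `pack_hex` and `unpack_hex` are hypothetical and do not come from GitLab.

```python
import hashlib


def pack_hex(digits: str) -> bytes:
    """Pack hex digits two per byte, most significant nibble first.

    Models the behavior described above: if the input has an odd number of
    digits, the last byte is completed with a least significant nibble of 0.
    """
    if len(digits) % 2 == 1:
        digits += "0"  # complete the final byte with a zero nibble
    return bytes.fromhex(digits)


def unpack_hex(data: bytes) -> str:
    """Read stored bytes back as a lowercase hex string."""
    return data.hex()


# A SHA-256 hex digest always has 64 digits, so it round-trips unchanged.
sha = hashlib.sha256(b"example").hexdigest()
print(unpack_hex(pack_hex(sha)) == sha)  # True

# A file size whose decimal string has an odd length gains a trailing 0.
file_size = 12345
read_back = int(unpack_hex(pack_hex(str(file_size))))
print(read_back)  # 123450, i.e. 10x the real size
```

Decimal digits are valid hex digits, so the file size string packs without error. It simply does not round-trip when its length is odd.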
Use the `verification_checksum` column for both kinds of checksum (hash and file size), but format the file size string to an even length by prefixing a zero digit when needed (a most significant nibble of value zero), so that the derived integer value stays the same.
The file size is formatted this way both when storing the file size string and when comparing it with the file size on the secondary.
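The following is a minimal Python sketch of this approach. It assumes the stored value is the file size's decimal digits packed two per byte, as in the hex format described above. The names `format_file_size_checksum`, `pack_hex`, and `checksum_matches` are hypothetical illustrations, not GitLab's actual methods.

```python
def pack_hex(digits: str) -> bytes:
    """Pack hex digits two per byte (assumes an even number of digits)."""
    return bytes.fromhex(digits)


def format_file_size_checksum(file_size: int) -> str:
    """Hypothetical helper: file size as an even-length digit string.

    Prepending "0" to an odd-length string makes it even without
    changing its integer value.
    """
    digits = str(file_size)
    return digits if len(digits) % 2 == 0 else "0" + digits


def checksum_matches(stored: bytes, secondary_file_size: int) -> bool:
    """Format the secondary's file size the same way before comparing."""
    return stored.hex() == format_file_size_checksum(secondary_file_size)


stored = pack_hex(format_file_size_checksum(12345))
print(stored.hex())                     # 012345
print(int(stored.hex()))                # 12345
print(checksum_matches(stored, 12345))  # True
print(checksum_matches(stored, 12346))  # False
```

Padding at the front rather than the end is what keeps the value constant. A leading zero digit contributes nothing to the integer, whereas a trailing one multiplies it by 10.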
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.