Payload Hash Function
In order to identify or compare parts of an OFMX XML document, we use a well-defined payload hash function to convert the payload of an element into a 128-bit UUID version 3.
The actual implementation is pretty straightforward. However, you have to make sure to respect the following rules:
- The XML must be schema valid.
- All @mid and @source attributes must be ignored.
- Attributes have to be ordered alphabetically while everything else remains in the order of appearance.
- Name extensions of named associations have to be ignored. For instance, AseUidSameExtent has to be treated as if it were just AseUid.
- Whitespace (e.g. <txtName> STRASBOURG APP</txtName> or version=" 1") has to be treated as is and must be neither collapsed nor stripped.
- Empty elements (e.g. <txtName></txtName>) or attributes (e.g. version="") have to be treated as is and must not be ignored.
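The rules above can be sketched as a small recursive function. This is a minimal illustration and not the reference implementation. It assumes REXML as the XML parser, assumes that a name extension is whatever follows "Uid" in an element name, and assumes that only elements without child elements carry a payload. The helper names base_name, payload_array and payload_hash are made up for this sketch.

```ruby
require "digest"
require "rexml/document"

# Attributes which never contribute to the payload hash
IGNORED_ATTRIBUTES = %w[mid source].freeze

# Assumption: a name extension is everything that follows "Uid",
# e.g. "AseUidSameExtent" => "AseUid" and "OrgUidAssoc" => "OrgUid"
def base_name(name)
  name.sub(/(?<=Uid)\w+\z/, "")
end

# Flatten an element into names, ordered attributes and payloads
def payload_array(element, array = [])
  array << base_name(element.name)
  attributes = []
  element.attributes.each { |name, value| attributes << [name, value] }
  attributes
    .reject { |name, _| IGNORED_ATTRIBUTES.include?(name) }
    .sort_by(&:first)   # plain String ordering of the attribute names
    .each { |name, value| array << name << value }
  if element.has_elements?
    # Child elements stay in their order of appearance
    element.each_element { |child| payload_array(child, array) }
  else
    # Assumption: a leaf always pushes its text, even if it is empty
    array << element.text.to_s
  end
  array
end

# Join with pipes, build the MD5 and format it like a UUID
def payload_hash(element)
  digest = Digest::MD5.hexdigest(payload_array(element).join("|"))
  digest.unpack("a8a4a4a4a12").join("-")
end

xml = <<~XML
  <SerUid mid="to-be-ignored">
    <UniUid region="LF">
      <txtName>STRASBOURG APP</txtName>
    </UniUid>
    <codeType version="1" subversion="1.2">APP</codeType>
    <noSeq>1</noSeq>
  </SerUid>
XML

ser_uid = REXML::Document.new(xml).root
puts payload_array(ser_uid).join("|")
# => SerUid|UniUid|region|LF|txtName|STRASBOURG APP|codeType|subversion|1.2|version|1|APP|noSeq|1
puts payload_hash(ser_uid)
```

The sketch does not validate the document against the schema, so the first rule has to be checked separately.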
For the sake of readability, the implementation is illustrated with the help of this invalid sample OFMX document which contains fictitious attributes, properties and edge cases:
<?xml version="1.0" encoding="utf-8"?>
<OFMX-Snapshot>
  <Ser source="LF|AD|AD-2|2019-10-10|2047" active="true" type="essential">
    <SerUid>
      <UniUid region="LF">
        <txtName>STRASBOURG APP</txtName>
      </UniUid>
      <codeType subversion="1.2" version="1">APP</codeType>
      <noSeq>1</noSeq>
    </SerUid>
    <OrgUidAssoc mid="fd2b4e07-5a80-d3f6-63f2-660d07265922">
      <txtName></txtName>
    </OrgUidAssoc>
    <Stt priority="1">
      <codeWorkHr>H24</codeWorkHr>
    </Stt>
    <Stt priority="2" authority="false">
      <codeWorkHr author="">HX</codeWorkHr>
    </Stt>
    <txtRmk> aka STRASBOURG approche</txtRmk>
  </Ser>
</OFMX-Snapshot>
Now let's perform the calculation for two different elements of this example:
*Uid Element Example
The payload hash of a *Uid element can be used to add a fingerprint @mid attribute. When the *Uid element changes, so does the calculated hash and therefore the @mid attribute, which makes it easy to identify, find and relate features.
In order to calculate the payload hash of the SerUid element, visualize that part of the document to be flattened and ordered as follows:
SerUid
  UniUid (region="LF")
    txtName "STRASBOURG APP"
  codeType (subversion="1.2" version="1") "APP" # order attributes alphabetically
  noSeq "1"
Here's the actual implementation in valid and easy to read Ruby code:
# Create an empty array
array = []
# Starting with the element for which you calculate the payload hash:
# Push the element name, then its ordered attributes (if any) and then its payload (if any)
array << "SerUid"
# => ["SerUid"]
# Jump to the next element and do the same
array << "UniUid"
array << "region"
array << "LF"
# => ["SerUid", "UniUid", "region", "LF"]
# Jump to the next element and do the same
array << "txtName"
array << "STRASBOURG APP"
# => ["SerUid", "UniUid", "region", "LF", "txtName", "STRASBOURG APP"]
# Jump to the next element and do the same
array << "codeType"
array << "subversion"
array << "1.2"
array << "version"
array << "1"
array << "APP"
# => ["SerUid", "UniUid", "region", "LF", "txtName", "STRASBOURG APP", "codeType", "subversion", "1.2", "version", "1", "APP"]
# Continue down the list and do the same until you reach the end of the element for which you calculate the payload hash
array << "noSeq"
array << "1"
# => ["SerUid", "UniUid", "region", "LF", "txtName", "STRASBOURG APP", "codeType", "subversion", "1.2", "version", "1", "APP", "noSeq", "1"]
# Convert the array to a string by joining its members with the pipe symbol:
string = array.join("|")
# => "SerUid|UniUid|region|LF|txtName|STRASBOURG APP|codeType|subversion|1.2|version|1|APP|noSeq|1"
# Build the 128-bit length MD5 represented in HEX
require "digest"
digest = Digest::MD5.hexdigest(string)
# => "6201128fcdc159f41858f30bdfc7f0d3"
# Add the dashes to format the digest as a UUID version 3
uuid = digest.unpack("a8a4a4a4a12").join("-")
# => "6201128f-cdc1-59f4-1858-f30bdfc7f0d3"
# Alternatively, you could add the dashes with a regular expression
uuid = digest.sub(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, '\1-\2-\3-\4-\5')
# => "6201128f-cdc1-59f4-1858-f30bdfc7f0d3"
You could now use this payload hash to add the @mid attribute to SerUid:
<?xml version="1.0" encoding="utf-8"?>
<OFMX-Snapshot>
  <Ser source="LF|AD|AD-2|2019-10-10|2047" active="true" type="essential">
    <SerUid mid="6201128f-cdc1-59f4-1858-f30bdfc7f0d3">
      (...)
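Writing the fingerprint into the document could look like the following sketch. It assumes REXML as the parser and takes the joined payload string of SerUid from the walkthrough above as given. The helper add_mid is a made-up name for illustration only.

```ruby
require "digest"
require "rexml/document"

# Hypothetical helper: set the mid attribute of an element based on
# the pipe-joined payload string calculated for this element
def add_mid(element, payload_string)
  digest = Digest::MD5.hexdigest(payload_string)
  element.add_attribute("mid", digest.unpack("a8a4a4a4a12").join("-"))
  element
end

doc = REXML::Document.new(<<~XML)
  <Ser>
    <SerUid>
      <noSeq>1</noSeq>
    </SerUid>
  </Ser>
XML

# Payload string of the complete SerUid element as joined in the walkthrough
payload_string = "SerUid|UniUid|region|LF|txtName|STRASBOURG APP|codeType|subversion|1.2|version|1|APP|noSeq|1"

ser_uid = doc.root.elements["SerUid"]
add_mid(ser_uid, payload_string)
puts ser_uid.attributes["mid"]
```

Since @mid attributes are ignored by the payload hash function, adding or updating them does not change the hash of any element.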
Any Element Example
You're not limited to *Uid elements. In fact, you can calculate the payload hash of any element in the same way and then use these hash values to detect changes between different document versions.
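Change detection between two document versions could be sketched as follows. The sketch assumes that you have already collected one map per version which relates a feature key (for instance the @mid of the *Uid element) to the payload hash of the whole feature. The helper diff_payload_hashes and all keys and values below are placeholders made up for illustration.

```ruby
# Hypothetical helper: compare two maps of feature key => payload hash
def diff_payload_hashes(old_hashes, new_hashes)
  common = old_hashes.keys & new_hashes.keys
  {
    added:   new_hashes.keys - old_hashes.keys,
    removed: old_hashes.keys - new_hashes.keys,
    changed: common.reject { |key| old_hashes[key] == new_hashes[key] }
  }
end

# Placeholder keys and hashes, not calculated from real OFMX data
old_version = { "uid-1" => "hash-a", "uid-2" => "hash-b", "uid-3" => "hash-c" }
new_version = { "uid-2" => "hash-b", "uid-3" => "hash-x", "uid-4" => "hash-d" }

p diff_payload_hashes(old_version, new_version)
# => {:added=>["uid-4"], :removed=>["uid-1"], :changed=>["uid-3"]}
# (newer Ruby versions print the same hash as {added: ["uid-4"], ...})
```

Features whose key and payload hash are identical in both versions don't show up in the result and can be skipped when syncing.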
In order to calculate the payload hash of the Ser element, visualize that part of the document to be flattened and ordered as follows:
Ser (active="true" type="essential") # ignore source attribute
  SerUid
    UniUid (region="LF")
      txtName "STRASBOURG APP"
    codeType (subversion="1.2" version="1") "APP"
    noSeq "1"
  OrgUid # ignore name extension "Assoc" and mid attribute
    txtName "" # don't ignore empty element
  Stt (priority="1")
    codeWorkHr "H24"
  Stt (authority="false" priority="2") # order attributes alphabetically
    codeWorkHr (author="") "HX" # don't ignore empty attribute
  txtRmk " aka STRASBOURG approche" # don't collapse/strip whitespace
Here's the actual implementation in valid and easy to read Ruby code:
# Create an empty array
array = []
# Starting with the element for which you calculate the payload hash:
# Push the element name, then its ordered attributes (if any) and then its payload (if any)
array << "Ser"
array << "active"
array << "true"
array << "type"
array << "essential"
# => ["Ser", "active", "true", "type", "essential"]
# Seek to the next element and do the same
array << "SerUid"
# => ["Ser", "active", "true", "type", "essential", "SerUid"]
# Seek to the next element and do the same
array << "UniUid"
array << "region"
array << "LF"
# => ["Ser", "active", "true", "type", "essential", "SerUid", "UniUid", "region", "LF"]
# Seek to the next element and do the same
array << "txtName"
array << "STRASBOURG APP"
# => ["Ser", "active", "true", "type", "essential", "SerUid", "UniUid", "region", "LF", "txtName", "STRASBOURG APP"]
# Seek to the next element and do the same
array << "codeType"
array << "subversion"
array << "1.2"
array << "version"
array << "1"
array << "APP"
# => ["Ser", "active", "true", "type", "essential", "SerUid", "UniUid", "region", "LF", "txtName", "STRASBOURG APP", "codeType", "subversion", "1.2", "version", "1", "APP"]
# Continue down the list and do the same until you reach the end of the element for which you calculate the payload hash
array << "noSeq"
array << "1"
array << "OrgUid"
array << "txtName"
array << ""
array << "Stt"
array << "priority"
array << "1"
array << "codeWorkHr"
array << "H24"
array << "Stt"
array << "authority"
array << "false"
array << "priority"
array << "2"
array << "codeWorkHr"
array << "author"
array << ""
array << "HX"
array << "txtRmk"
array << " aka STRASBOURG approche"
# => ["Ser", "active", "true", "type", "essential", "SerUid", "UniUid", "region", "LF", "txtName", "STRASBOURG APP", "codeType", "subversion", "1.2", "version", "1", "APP", "noSeq", "1", "OrgUid", "txtName", "", "Stt", "priority", "1", "codeWorkHr", "H24", "Stt", "authority", "false", "priority", "2", "codeWorkHr", "author", "", "HX", "txtRmk", " aka STRASBOURG approche"]
# Convert the array to a string by joining its members with the pipe symbol:
string = array.join("|")
# => "Ser|active|true|type|essential|SerUid|UniUid|region|LF|txtName|STRASBOURG APP|codeType|subversion|1.2|version|1|APP|noSeq|1|OrgUid|txtName||Stt|priority|1|codeWorkHr|H24|Stt|authority|false|priority|2|codeWorkHr|author||HX|txtRmk| aka STRASBOURG approche"
# Build the 128-bit length MD5 represented in HEX
require "digest"
digest = Digest::MD5.hexdigest(string)
# => "6d4f1c380f0423a728ccc3a1bbfa21ce"
# Add the dashes to format the digest as a UUID version 3
uuid = digest.unpack("a8a4a4a4a12").join("-")
# => "6d4f1c38-0f04-23a7-28cc-c3a1bbfa21ce"
# Alternatively, you could add the dashes with a regular expression
uuid = digest.sub(/(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})/, '\1-\2-\3-\4-\5')
# => "6d4f1c38-0f04-23a7-28cc-c3a1bbfa21ce"
Reference Implementation
The AIXM gem for Ruby includes the reference implementation for the payload hash function.
Furthermore, it features two executables, mkmid and ckmid, which come in handy when inserting mid attributes into existing OFMX files or checking them.