Strong's Hebrew numbers: A frequency comparison of the KJV and the OSHB modules
The Open Scriptures Hebrew Bible (OSHB) project is setting a new standard for the semantic markup of the OT.
I therefore thought it would be a useful and interesting exercise to compare how often each Strong's Hebrew number occurs in these two modules:
- KJV version 3.1 (currently in beta)
- OSHB version 2.1.3 dated 2022-01-29
The first task was to get a measure of the magnitude and scope of the differences.
I first normalized the H-numbers in both modules to a length of 5 digits (H#####), then used the BabelPad Tools menu to generate a word frequency count for the extracted numbers.
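The same normalize-and-count step could also be scripted. The following is a minimal sketch, not the BabelPad procedure itself. It assumes the tags have already been extracted as plain text in the form "H" followed by 1 to 5 digits; the names `H_NUMBER`, `normalize` and `count_h_numbers` are made up for illustration, and real module markup may need a different pattern.

```python
import re
from collections import Counter

# Assumption: each tag looks like "H" + 1 to 5 digits (e.g. "H430", "H0430").
# Tags with a letter suffix (e.g. "H1254a") are skipped by this pattern.
H_NUMBER = re.compile(r"\bH(\d{1,5})\b")


def normalize(digits: str) -> str:
    """Zero-pad the numeric part to 5 digits: '430' becomes 'H00430'."""
    return "H" + digits.zfill(5)


def count_h_numbers(text: str) -> Counter:
    """Count how often each normalized H-number occurs in the text."""
    return Counter(normalize(m.group(1)) for m in H_NUMBER.finditer(text))


if __name__ == "__main__":
    # Toy input, not taken from either module.
    sample = "H430 H0430 H00430 H7225 H1254"
    for tag, n in sorted(count_h_numbers(sample).items()):
        print(tag, n)
```

Padding before counting matters because "H430" and "H0430" would otherwise be tallied as two different numbers.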
I then pasted the two tables into an Excel® worksheet and added one column with ingenious formulae to align the rows by H-number, and another to calculate the difference in counts for each H-number.
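For readers without a spreadsheet to hand, the alignment and difference columns can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the worksheet formulae used here: each frequency table is assumed to be a mapping from a normalized H-number to its count, and the function name `align_counts` and the toy counts are hypothetical.

```python
from typing import Dict, List, Tuple


def align_counts(
    oshb: Dict[str, int], kjv: Dict[str, int]
) -> List[Tuple[str, int, int, int]]:
    """Return one row per H-number: (tag, OSHB count, KJV count, OSHB-KJV).

    An H-number missing from one table is treated as having a count of 0,
    so every number found in either module gets exactly one row.
    """
    rows = []
    for tag in sorted(set(oshb) | set(kjv)):
        o = oshb.get(tag, 0)
        k = kjv.get(tag, 0)
        rows.append((tag, o, k, o - k))
    return rows


if __name__ == "__main__":
    # Toy counts for illustration only, not real module data.
    oshb_counts = {"H00001": 5, "H00002": 3}
    kjv_counts = {"H00001": 5, "H00002": 1, "H00003": 2}
    for row in align_counts(oshb_counts, kjv_counts):
        print(*row, sep="\t")
```

Taking the union of the keys is what keeps the rows aligned: a number present in only one module still appears, with a zero on the other side.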
Here is my initial summary:
- The OSHB had 299556 Strong's H-number tags, whereas the KJV had only 229267 H-number tags!
- The total difference (OSHB-KJV) was thus 70289.
- Of the 8674 H-numbers, 1234 had different counts, so only 7440 (85.77%) had the same count in both modules.
- The difference in counts for an individual H-number (OSHB-KJV) ranged from -522 to 10956.
- There are 42 H-numbers not used (so far) in the OSHB module.
The unused H-numbers in the OSHB (normalized to 4 digits here) are:
H0025, H0058, H0381, H0382, H0415, H0416, H0704, H0798, H1019, H1037, H1040, H1045, H1181, H1531, H1686, H1723, H1751, H1857, H1928, H2089, H2316, H2402, H2618, H3070, H3072, H3073, H3074, H3128, H3197, H3264, H3614, H3661, H4000, H4078, H5770, H5886, H5899, H5993, H7207, H7741, H8281, H8623
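The summary figures above follow from simple arithmetic on the two frequency tables. The sketch below shows one way they could be derived. It assumes each table is a non-empty mapping from normalized H-number to count and that the set of H-numbers is the union of both tables; the name `summarize` and the toy data are hypothetical. The last lines only re-check the arithmetic of the reported totals.

```python
from typing import Dict


def summarize(oshb: Dict[str, int], kjv: Dict[str, int]) -> dict:
    """Summarize the differences between two H-number frequency tables."""
    tags = sorted(set(oshb) | set(kjv))  # assumed to be non-empty
    diffs = {t: oshb.get(t, 0) - kjv.get(t, 0) for t in tags}
    differing = sum(1 for d in diffs.values() if d != 0)
    return {
        "oshb_total": sum(oshb.values()),
        "kjv_total": sum(kjv.values()),
        "total_difference": sum(diffs.values()),
        "distinct": len(tags),
        "differing": differing,
        "percent_same": round(100 * (len(tags) - differing) / len(tags), 2),
        "min_difference": min(diffs.values()),
        "max_difference": max(diffs.values()),
        "unused_in_oshb": [t for t in tags if oshb.get(t, 0) == 0],
    }


if __name__ == "__main__":
    # Toy counts for illustration only, not real module data.
    toy_oshb = {"H00001": 5, "H00002": 3, "H00003": 0}
    toy_kjv = {"H00001": 5, "H00002": 1, "H00003": 2, "H00004": 1}
    for key, value in summarize(toy_oshb, toy_kjv).items():
        print(key, value)

    # Arithmetic check of the figures reported in the summary.
    assert 299556 - 229267 == 70289
    assert 8674 - 1234 == 7440
    assert round(100 * 7440 / 8674, 2) == 85.77
```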