Commit 812b171a authored by Sergio Losilla's avatar Sergio Losilla

Replaced spaces with underscores in image file names.

parent 61d4a021
@@ -60,7 +60,7 @@ As I have no idea how to mathematically prove the properties of the hashing algo
 The most common occurrence is that tests will have quite varied test case names. For this, I selected
 `"Some name"`, `"Another test"`, `"Feature"` and `"Foobar"`.
-![](Histograms different.jpg)
+![](Histograms_different.jpg)
 The graph shows quite clearly that for most intents and purposes, this article is a trivial (although admittedly fun) endeavor. All algorithms, including `fnv1a_mod`, are doing Just Fine™. Except for `fnv1a_new`: the final scrambling with the seed does very little to differentiate the hashes, and most of the permutations are never observed. Adding the XOR folding (`fnv1a_new_xor_fold`) helps a lot, but there are still some noticeable biases compared to the rest.
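For context, this is roughly what the 32-bit FNV-1a core and a 16-bit XOR folding step look like in Python. The offset basis and prime are the standard FNV-1a constants; how `fnv1a_new` actually mixes the seed in is not shown in this excerpt, so the `seed` handling below is only an assumption for illustration.

```python
# Sketch of 32-bit FNV-1a with a 16-bit XOR fold. The seed mixing is an
# assumption; the article's fnv1a_new_* variants may combine it differently.
FNV_OFFSET_BASIS = 2166136261
FNV_PRIME = 16777619

def fnv1a_32(data: bytes) -> int:
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                    # mix in the next byte
        h = (h * FNV_PRIME) % 2**32  # multiply by the FNV prime, mod 2^32
    return h

def xor_fold_16(h: int) -> int:
    # Fold the upper 16 bits onto the lower 16 so high-bit entropy
    # survives truncation to a small range.
    return (h >> 16) ^ (h & 0xFFFF)

def hash_test_name(name: str, seed: int) -> int:
    # Assumed: XOR the seed in after hashing the name, then fold.
    return xor_fold_16(fnv1a_32(name.encode()) ^ seed)
```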
@@ -70,13 +70,13 @@ Problems start to show when we have test cases that differ only in the last char
 In case you are wondering if longer test names make a difference, the answer is no: The only thing that matters is that all characters but the last are the same.
-![](Histograms two similar.jpg)
+![](Histograms_two_similar.jpg)
 ### Two pairs
 What happens when we have two pairs of similar-looking test names, such as `"a1"`, `"a2"`, `"b1"` and `"b2"`? (Yes, sorry, I was running out of ideas).
-![](Histograms two similar pairs.jpg)
+![](Histograms_two_similar_pairs.jpg)
 Well, exactly what we expect: The algorithms that failed for a single pair show only 8 possible combinations: those in which 0 is adjacent to 1 and 2 is adjacent to 3. `fnv1a_mod_tail` is starting to show some serious bias, but the pattern is more difficult to recognize. Both algorithms with the multiplication fold are still providing a nice, homogeneous distribution.
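Gluing each similar pair into a block explains the count: 2! orders for the two blocks times 2 internal orders per block gives 2 · 2 · 2 = 8. A few lines of Python (a verification sketch, not code from the article) confirm it:

```python
from itertools import permutations

def adjacent(p, a, b):
    """True if values a and b sit in neighboring positions of p."""
    return abs(p.index(a) - p.index(b)) == 1

# Count permutations of (0, 1, 2, 3) in which 0 is next to 1 and 2 is next to 3.
count = sum(1 for p in permutations(range(4))
            if adjacent(p, 0, 1) and adjacent(p, 2, 3))
print(count)  # -> 8
```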
@@ -84,13 +84,13 @@ Well, exactly what we expect: The algorithms that failed for a single pair show
 This is the histogram for shuffling `"a1"`, `"a2"`, `"a3"` and `"a4"`:
-![](Histograms four similar.jpg)
+![](Histograms_four_similar.jpg)
 In the unlucky event that we have labeled our tests `"Test 1"`, `"Test 2"`, `"Test A"`, etc., most of the tested algorithms do not work very well, and some permutations will be observed very seldom.
 `fnv1a_mod_mult_fold` and in particular `fnv1a_mod_tail` perform quite acceptably, although there are visible biases. Quite surprisingly (since I found it by sheer dumb luck), `fnv1a_new_mult_fold` seems indistinguishable from `random`, even when we zoom in:
-![](Histograms four similar_zoom.jpg)
+![](Histograms_four_similar_zoom.jpg)
 (Yes, I know that this will not score high on [r/dataisbeautiful](https://reddit.com/r/dataisbeautiful), but I hope that it's enough to get the message across.)
@@ -26,10 +26,12 @@ for name, result in {k:v for k,v in results.items() if "Histogram" in k}.items()
     plt.subplots_adjust(left=0.15, right=0.85, bottom=0.15, top=0.85)
-    plt.savefig(name+".jpg")
+    filename_root = name.replace(" ", "_")
+    plt.savefig(filename_root+".jpg")
     plt.ylim(bottom=1/24*(1-1.e-1), top=1/24*(1+1.e-1))
-    plt.savefig(name+"_zoom.jpg")
+    plt.savefig(filename_root+"_zoom.jpg")
     plt.cla()
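The change itself is a plain `str.replace`, applied once so that both the base image and its zoomed variant share the same root. (The `ylim` band is the ±10% zoom window around 1/24, the uniform probability over the 4! = 24 permutations.) As a quick illustration, with plot names assumed from the image links above:

```python
# Illustrative only: plot names assumed from the image links above.
for name in ["Histograms different", "Histograms four similar"]:
    filename_root = name.replace(" ", "_")
    print(filename_root + ".jpg", "/", filename_root + "_zoom.jpg")
# Histograms_different.jpg / Histograms_different_zoom.jpg
# Histograms_four_similar.jpg / Histograms_four_similar_zoom.jpg
```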