Commit 812b171a by Sergio Losilla

### Replaced spaces with underscores in image file names.

parent 61d4a021
@@ -60,7 +60,7 @@ As I have no idea how to mathematically prove the properties of the hashing algo
 The most common occurrence is that tests will have quite varied test case names. For this, I selected `"Some name"`, `"Another test"`, `"Feature"` and `"Foobar"`
 
-![](Histograms different.jpg)
+![](Histograms_different.jpg)
 
 The graph shows quite clearly that for most intents and purposes, this article is a trivial (although admittedly fun) endeavor. All algorithms, including `fnv1a_mod`, are doing Just Fine™. Except for `fnv1a_new`: the final scrambling with the seed does very little to differentiate the hashes, and most of the permutations are never observed. Adding the XOR folding (`fnv1a_new_xor_fold`) helps a lot, but there are still some noticeable biases compared to the rest.
@@ -70,13 +70,13 @@ Problems start to show when we have test cases that differ only in the last char
 In case you are wondering if longer test names make a difference, the answer is no: the only thing that matters is that all characters but the last are the same.
 
-![](Histograms two similar.jpg)
+![](Histograms_two_similar.jpg)
 
 ### Two pairs
 
 What happens when we have two pairs of similar-looking test names, such as `"a1"`, `"a2"`, `"b1"` and `"b2"`? (Yes, sorry, I was running out of ideas.)
 
-![](Histograms two similar pairs.jpg)
+![](Histograms_two_similar_pairs.jpg)
 
 Well, exactly what we expect: the algorithms that failed for a single pair show only 8 possible combinations, where 0 is adjacent to 1 and 2 is adjacent to 3. `fnv1a_mod_tail` is starting to show some serious bias, but the pattern is more difficult to recognize. Both algorithms with the multiplication fold are still providing a nice, homogeneous distribution.
@@ -84,13 +84,13 @@ Well, exactly what we expect: The algorithms that failed for a single pair show
 This is the histogram for shuffling `"a1"`, `"a2"`, `"a3"` and `"a4"`:
 
-![](Histograms four similar.jpg)
+![](Histograms_four_similar.jpg)
 
 In the unlucky event that we have labeled our tests `"Test 1"`, `"Test 2"`, `"Test A"`, etc., most of the tested algorithms do not work very well, and some permutations will be observed very seldom. `fnv1a_mod_mult_fold`, `fnv1a_mod_mult_fold` and in particular `fnv1a_mod_tail` perform quite acceptably, although there are visible biases. Quite surprisingly (since I found it by sheer dumb luck), `fnv1a_new_mult_fold` seems indistinguishable from `random`, even when we zoom in:
 
-![](Histograms four similar_zoom.jpg)
+![](Histograms_four_similar_zoom.jpg)
 
 (Yes, I know that this will not score high on [r/dataisbeautiful](https://reddit.com/r/dataisbeautiful), but I hope that it's enough to get the message across.)
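The diffed article compares several FNV-1a variants (`fnv1a_mod`, `fnv1a_new_xor_fold`, `fnv1a_new_mult_fold`, ...) whose exact definitions are not part of this commit. For orientation only, here is the textbook 32-bit FNV-1a baseline plus a plain XOR fold; the author's modified variants may differ from this sketch.

```python
# Reference sketch: standard 32-bit FNV-1a and a 16-bit XOR fold.
# The article's own variants (fnv1a_mod, fnv1a_new_mult_fold, ...) are
# NOT defined in this diff; this only shows the well-known baseline.

FNV_OFFSET_BASIS = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # standard 32-bit FNV prime

def fnv1a_32(data: bytes, seed: int = FNV_OFFSET_BASIS) -> int:
    """Plain 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = seed
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h

def xor_fold_16(h: int) -> int:
    """Fold a 32-bit hash to 16 bits by XORing the high and low halves."""
    return (h >> 16) ^ (h & 0xFFFF)
```

Folding mixes the upper bits into the lower ones, which is why the article sees `fnv1a_new_xor_fold` behave noticeably better than `fnv1a_new` when only the low bits of the hash end up being used.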
@@ -26,10 +26,12 @@ for name, result in {k:v for k,v in results.items() if "Histogram" in k}.items()
     plt.subplots_adjust(left=0.15, right=0.85, bottom=0.15, top=0.85)
-    plt.savefig(name+".jpg")
+    filename_root = name.replace(" ", "_")
+    plt.savefig(filename_root+".jpg")
     plt.ylim(bottom=1/24*(1-1.e-1), top=1/24*(1+1.e-1))
-    plt.savefig(name+"_zoom.jpg")
+    plt.savefig(filename_root+"_zoom.jpg")
     plt.cla()