StringHash/Analysis

User Defined Functions &raquo; StringHash &raquo; Analysis

This is an analysis of bit transmutation and distribution patterns of the StringHash function, mainly in comparison to the "Java hash", a variant of Dan Bernstein's old "33" hash posted long ago in the comp.lang.c newsgroup.

= Reference code =

Java hash
for i = 1, len do     counter = (counter*31) + string.byte(text, i)     counter = mod(counter, 4294967296);	-- 2^32 end; return mod(counter, 4294967296); -- 2^32

This is the function implemented in the Java String.hashCode library call.

The multiplication by 31 basically translates to "new = old&lt;&lt;5 - old", i.e. you never completely lose data from earlier bytes as you would by only shifting data up. The relation to Bernstein's original "33" function is pretty clear; it simply uses 33 instead of 31.

The mod is necessary in Lua, otherwise the value turns floating-point and you lose precision after ~10 characters; Lua, at least under x86, uses 64-bit ("double-precision") IEEE 764 floats, which have a mantissa of 52 bits.

This function would be slightly better with a prime modulus, but 2^32 is chosen as a modulus to emulate 32-bit integer math as used in the Java SDK.

StringHash
for i = 1, len, 3 do     counter = mod(counter*8161, 4294967279) +  -- 2^32 - 17: Prime! (string.byte(text,i)*16776193) + ((string.byte(text,i+1) or (len-i+256))*8372226) + ((string.byte(text,i+2) or (len-i+256))*3932164); end; return mod(counter, 4294967291); -- 2^32 - 5: Prime (and different from the prime in the loop)

This string hash is ~30% faster than the Java hash due to working with 3 characters at a time; the modulus call is fairly expensive compared to the other operations.

The multiplications are all chosen to affect 3 areas of the accumulator at a time: 8161 =    8192 -     32 + 1  : 2&lt;&lt;13,  2&lt;&lt; 5,  2&lt;&lt;0 16776193 = 16777216 -  1024 + 1  : 2&lt;&lt;24,  2&lt;&lt;10,  2&lt;&lt;0 8372226 = 8388608 -  16384 + 2  : 2&lt;&lt;23,  2&lt;&lt;14,  2&lt;&lt;5 3932164 = 4194304 - 262144 + 4  : 2&lt;&lt;22,  2&lt;&lt;18,  2&lt;&lt;10

It is tempting to make the first multiplication much larger, but beware that the result of (4294967279+(256*29080583))*8161 must fit in a 52-bit mantissa without losing precision. The result is currently a 48-bit number so meets that contraint.

= Tests performed and result analysis =

Below is an array of tests that both algorithms have been submitted to. For a hash analysis, these tests are fairly basic and the result analysis is somewhat unscientific, but they will do to illustrate the difference between the Java hash and StringHash, which is all this document set out to do.


 * Performance test - raw measurement of relative hash speed
 * Bit distribution test - how often is each bit of the 32-bit output used?
 * Collision frequency test - compare observed collision statistics to expected numbers
 * Bit transmutation test - determine how many bits are changed as a result of 1 input bit changing
 * Collision finding test - find and count collisions for exhaustive input sets

Performance test
This test simply hashes the string "abcdefghijklmnopqrstuvwxyz" in a loop 20 000 times and clocks the start and stop time. The loop is included in the resulting time.


 * Java hash execution time: 1.719s
 * StringHash execution time: 1.313s

Result analysis
StringHash is slightly faster than the Java hash; likely not enough to matter, but certainly enough that the perceived "increased size" of StringHash can be discounted as an argument against it - it's actually faster!

The reason is of course that the math.mod call is relatively expensive compared to extracting characters from a string and performing multiplications. StringHash does 3 extract + multiply per mod call. The Java hash in its standard implementation has to do one mod call per character.

On the testing computer, a P4 at ~3Ghz, StringHash hashes in the neighborhood of half a megabyte per second; certainly not bad for a scripted language using only standard math library operations!

Bit distribution test
This test measures how many times each bit is 1 for a given set of inputs. The target value is for each bit to be 1 50% of the time, and 0 50% of the time, though there will obviously be deviations from this, which are expected and accepted, up to a certain point.

In this test, we feed each hash the same set of inputs: 10000 random strings consisting of "aaaaaaaaa" followed by 15 random bytes.

The results show the absolute difference between 50% and the result, i.e. 49.4 and 50.6 will both be reported as "0.6". Only differences of 0.5% or greater are shown (0.5% is in no way a ceiling of the expected difference; it just cuts down on the amount of output).

Result analysis
Both algorithms perform roughly as expected. With these samples, StringHash performs a little bit better, but that is to be expected as it mixes more bits; either way, the differences are not really meaningful. The test only attempts to evaluate if all possible result bits are used efficiently without unexpected "holes" in the output.

A more thorough test would attempt to find biases in different sets of bits, but that is really more meaningful for PRNG testing and secure hashes than it is for this type of light string hash.

Collision frequency test
This test takes the result of 20000 random strings consisting of ASCII characters 0-255, and selects 15 bits for testing. The resulting value becomes 0--32767, which, for anything other than a perfect hash, will result in a fair amount of collisions. We expect the frequency of collisions to match a Poisson distribution and compare the test results to it.

Long random string 32-127 bits 1-15
Java  SH     Expected 1: 10940 10981  10863   2:  6552   6554   6630   3:  2043   1998   2023   4:   388    396    411   5:    65     65     62   6:    12      6      7

Long random string 0-255 bits 1-15
Java  SH     Expected 1: 10717 10728  10863   2:  6796   6724   6630   3:  1995   2073   2023   4:   376    420    411   5:   110     55     62   6:     6             7

Long random string 0-255 bits 6-20
Java  SH     Expected 1: 10788 10875  10863   2:  6768   6628   6630   3:  1989   1971   2023   4:   400    460    411   5:    55     60     62   6:            6      7

Long random string 0-255 bits 12-26
Java  SH     Expected 1: 10917 10791  10863   2:  6630   6612   6630   3:  1881   2049   2023   4:   492    452    411   5:    80     90     62   6:            6      7

Long random string 0-255 bits 18-32
Java  SH     Expected 1: 10828 10795  10863   2:  6590   6860   6630   3:  2127   1932   2023   4:   380    360    411   5:    75     35     62   6:           18      7

Result analysis
Both hashes perform roughly as expected without showing too great variations over different bits; there are one or two aberrations in the higher collision counts, but that may just be the result of a too-small sample size. I will not investigate further as this test is only a rough indication of performance.

A more complete analysis would include testing smaller modulii, and certainly ones that are not a power of 2. A modulus constructed around series involving 31 and 32 will most likely produce "interesting" results for the Java hash, but that is an expected side effect of its construction.

Bit transmutation test
This test measures how many bits of the hash changes as a result of a single input bit changing, i.e. how close to an avalanche effect we're getting. The result is described as the number of times a given number of bits have been seen to change for the entire run.

The input starts out as a string of 0s (ASCII NULs), flipping bits on and off one at a time. Then a number of runs are executed starting out with random values.

All the below tests work with 13-byte strings, and execute for 50 runs, i.e. one run based on ASCII NULs, and 49 runs with fresh random numbers each time.

A good generic hash algorithm will tend toward flipping 50% of the bits for each input bit changed, though that is obviously unattainable. We do however expect a fair hump around 13-17 bits changed, and very low counts toward the extremes. 0 bits flipping may indicate a broken hash algorithm (the hash result didn't change at all as a result of changing the input).

Java, flip all bits in bytes 1 through 13
0:    0     1:   173     2:   208     3:   194   4:   188     5:   159     6:   160     7:   150   8:   200     9:   231    10:   228    11:   292  12:   343    13:   476    14:   503    15:   469  16:   418    17:   318    18:   210    19:   143  20:    69    21:    45    22:    11    23:     9  24:     2    25:     1    26:     0    27:     0  28:     0    29:     0    30:     0    31:     0  32:     0

This test does not show 0 bits changing, but the changes we do see are spread quite flat over the spectrum. There is a hump around 13-17 but not as marked as it should be. We get quite a few single bit changes, which is quite expected given the algorithm. The last few bytes in the string do not end up affecting very much of the end result.

StringHash, flip all bits in bytes 1 through 13
0:    0     1:     0     2:     0     3:    45   4:    70     5:    77     6:    71     7:    56   8:    45     9:    53    10:   115    11:   203  12:   327    13:   405    14:   533    15:   588  16:   620    17:   593    18:   499    19:   371  20:   259    21:   146    22:    76    23:    32  24:    12    25:     3    26:     1    27:     0  28:     0    29:     0    30:     0    31:     0  32:     0

StringHash has a much more pronounced hump around the 16 bits changed mark. There are no single bit changes, and no two-bit changes in the tests. This is expected as each bit changed is guaranteed to affect at least three bits of the hash, even for the last few bytes of the input.

Java, flip all bits in bytes 12 through 13
0:    0     1:   173     2:   208     3:   154   4:   108     5:    75     6:    53     7:    17   8:     6     9:     1    10:     3    11:     2  12:     0     (rest 0)

Here is where the weakness of the hash shows the most: changes in the last few bytes affect relatively few bits, thereby increasing chances for collisions.

StringHash, flip all bits in bytes 12 through 13
0:    0     1:     0     2:     0     3:    45   4:    70     5:    77     6:    71     7:    56   8:    36     9:    26    10:    36    11:    40  12:    54    13:    51    14:    57    15:    49  16:    42    17:    39    18:    23    19:    15  20:     9    21:     3    22:     1    23:     0  24:     0    (rest 0)

While not precisely shining compared to strong hashes, StringHash stirs quite a few more bits over a much larger spectrum.

Result analysis
StringHash outperforms the Java hash by distributing bit changes over more result bits, thus reducing the collision chance.

Collision finding test
This test exhaustively computes all possible values of a given input set and tests for collisions. The size of the input sets is unfortunately constrained by RAM and CPU, but it is believed to be a fair indication of hash strength.

ASCII codes 64-89 (upper case A-Z), 3 bytes
JavaHash 17576 non-colliding values StringHash 17576 non-colliding values

ASCII codes 64-89 (upper case A-Z), 4 bytes
JavaHash 456976 non-colliding values StringHash 456976 non-colliding values

ASCII codes 32-127 (all printable), 2 bytes
JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 2638 instances of 3 inputs producing the same hash (7914 strings) 279 instances of 4 inputs producing the same hash (1116 strings) StringHash 9216 non-colliding values

Already here, with only searching all possible combinations of 2 letters of printable ASCII, the Java hash is showing poor results.

ASCII codes 32-127 (all printable), 3 bytes
This is about as exhaustive we can go in Lua without resulting to data compression techniques for the result sets: all possible combinations of 3 letters of printable ASCII for a total of 884736 outputs.

JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 1630 instances of 3 inputs producing the same hash (4890 strings) 224 instances of 4 inputs producing the same hash (896 strings) 62 instances of 5 inputs producing the same hash (310 strings) 1630 instances of 6 inputs producing the same hash (9780 strings) 62 instances of 7 inputs producing the same hash (434 strings) 224 instances of 8 inputs producing the same hash (1792 strings) 68606 instances of 9 inputs producing the same hash (617454 strings) 5214 instances of 10 inputs producing the same hash (52140 strings) 5214 instances of 11 inputs producing the same hash (57354 strings) 9672 instances of 12 inputs producing the same hash (116064 strings) 558 instances of 13 inputs producing the same hash (7254 strings) 558 instances of 14 inputs producing the same hash (7812 strings) 558 instances of 15 inputs producing the same hash (8370 strings) StringHash 884736 non-colliding values

ASCII codes 32-127 (all printable), 3 bytes, with prefix
The same input values as the 32-127 x 3 test, except all prefixed with the string "01234567890123456789". This is to test for loss of accuracy.

JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 1630 instances of 3 inputs producing the same hash (4890 strings) 224 instances of 4 inputs producing the same hash (896 strings) 62 instances of 5 inputs producing the same hash (310 strings) 1630 instances of 6 inputs producing the same hash (9780 strings) 62 instances of 7 inputs producing the same hash (434 strings) 224 instances of 8 inputs producing the same hash (1792 strings) 68606 instances of 9 inputs producing the same hash (617454 strings) 5214 instances of 10 inputs producing the same hash (52140 strings) 5214 instances of 11 inputs producing the same hash (57354 strings) 9672 instances of 12 inputs producing the same hash (116064 strings) 558 instances of 13 inputs producing the same hash (7254 strings) 558 instances of 14 inputs producing the same hash (7812 strings) 558 instances of 15 inputs producing the same hash (8370 strings) StringHash 884736 non-colliding values

ASCII codes 32-127 (all printable), 3 bytes, with suffix
The same input values as the 32-127 x 3 test, except all suffixed with the string "01234567890123456789". This is to test for loss of "old" information (data from earlier on in the string) in the mixing function.

JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 1630 instances of 3 inputs producing the same hash (4890 strings) 224 instances of 4 inputs producing the same hash (896 strings) 62 instances of 5 inputs producing the same hash (310 strings) 1630 instances of 6 inputs producing the same hash (9780 strings) 62 instances of 7 inputs producing the same hash (434 strings) 224 instances of 8 inputs producing the same hash (1792 strings) 68606 instances of 9 inputs producing the same hash (617454 strings) 5214 instances of 10 inputs producing the same hash (52140 strings) 5214 instances of 11 inputs producing the same hash (57354 strings) 9672 instances of 12 inputs producing the same hash (116064 strings) 558 instances of 13 inputs producing the same hash (7254 strings) 558 instances of 14 inputs producing the same hash (7812 strings) 558 instances of 15 inputs producing the same hash (8370 strings) StringHash 884736 non-colliding values

ASCII codes 64-82 (A-S), 5 bytes
The Lua memory churn becomes somewhat ugly here, consuming several hundred megabytes. Even attempting A-Z with the same length brings a machine with 1GB RAM to its knees.

JavaHash 2476099 non-colliding values StringHash 2476099 non-colliding values

Result analysis
The Java hash is shown to produce many more collisions even for rather simple inputs. The only cases where it behaves well are the A-Z cases; as long as the input subset is constricted to 32 unique values, it is collision free for 6 bytes (6 * 5 bits = 30 bits), in other words, it is a perfect hash for e.g. short strings consisting of only upper case ASCII letters A-Z. This is an expected property of the algorithm. Though taken with the rather abysmal results for more generic strings above (or, indeed, any sort of string longer than 6 bytes), the Java hash demonstrably makes for a poor choice for general input.

= Result summary =

In the two simplest cases: collision frequency and bit distribution, the two hashes seem comparable. But on comparing bit transmutation patterns, StringHash tends toward much better spreads of input bits to output bits.

The theory is confirmed in the exhaustive collision search tests, where StringHash remains collision free for the small data sets used, while the Java hash produces more collisions (double, tripe, quadruple, etc) than it does collision-free results.

Added to this, the Lua implementation of StringHash is slightly faster than the Lua implementation of the Java hash (at least in its most straight-forward implementation).

= Applicability to other languages =

StringHash was designed to be reasonably collision-resistant and fast in a Lua standard library environment. Other interpreted languages capable of 50-bit integer math (i.e. through using 64-bit floats) should also be able to make a fair use of it.

Languages with access to lower level primitives (XOR, ANDs, rotations) would do better to look at e.g. Bob Jenkins' lookup3 hash, or at least read his page on hash analysis.

= Appendix: Lua hash analysis code =

mod=math.mod; -- This is the "Java hash", a variant of Dan Bernstein's old "33" hash posted long ago in the comp.lang.c newsgroup. -- The *31 basically translates to "new = old&lt;&lt;5 - old", i.e. you never completely lose data from earlier bytes as you would by only shifting -- The mod is necessary in Lua; otherwise the value turns floating-point and you lose precision after ~10 characters -- This function would be better with a prime modulus, but 2^32 is chosen as a mod to emulate the 32-bit math used in the original function JavaHash(text) local counter = 0; local len = string.len(text); for i = 1, len do     counter = (counter*31) + string.byte(text, i)     counter = mod(counter, 4294967296);	-- 2^32 end; return mod(counter, 4294967296); -- 2^32 end -- This string hash is ~30% faster due to working with 3 characters at a time; the mod expression is fairly expensive -- The multiplications are all chosen to affect 3 areas of the accumulator at a time: --     8161 =     8192 -     32 + 1  : 2&lt;&lt;13,  2&lt;&lt; 5,  2&lt;&lt;0 -- 16776193 = 16777216 -   1024 + 1  : 2&lt;&lt;24,  2&lt;&lt;10,  2&lt;&lt;0 --  8372226 =  8388608 -  16384 + 2  : 2&lt;&lt;23,  2&lt;&lt;14,  2&lt;&lt;5 --  3932164 =  4194304 - 262144 + 4  : 2&lt;&lt;22,  2&lt;&lt;18,  2&lt;&lt;10 function StringHash(text) local counter = 1; local len = string.len(text); for i = 1, len, 3 do     counter = mod(counter*8161, 4294967279) +  -- 2^32 - 17: Prime! (counter*8161 is at most a 48-bit number, which easily fits in the 52-bit mantissa of a double precision float) (string.byte(text,i)*16776193) + ((string.byte(text,i+1) or (len-i+256))*8372226) + ((string.byte(text,i+2) or (len-i+256))*3932164); end; return mod(counter, 4294967291); -- 2^32 - 5: Prime (and different from the prime in the loop) end -- Utility function: return number of bits changed between a and b function bitsdiff(a,b) local n=0; for i=1,32 do    if(mod(a,2) ~= mod(b,2)) then n=n+1; end a=math.floor(a/2); b=math.floor(b/2); end return n; end function tcopy(to,from) for k,v in pairs(from) do  	to[k]=v; end end -- Bit transmutation test: Count how many bits change as a result of a single bit of input flipping function transtest(func,num, len, first, last) local bitsdiffhits = {} local origstrbytes = { }; for c=1,len do  	table.insert(origstrbytes, 0); end math.randomseed(2); for rep=1,num do  	local strbytes = {}; for c=first,last do  		tcopy(strbytes, origstrbytes); local bitval=1; for b=1,8 do  			local before = func(string.char(unpack(strbytes))); strbytes[c] = origstrbytes[c]; strbytes[c] = math.mod(origstrbytes[c] + bitval, 256); local after = func(string.char(unpack(strbytes))); local n = bitsdiff(before,after); bitsdiffhits[n] = (bitsdiffhits[n] or 0) + 1; bitval=bitval*2; end end for c=1,table.getn(origstrbytes) do  		origstrbytes[c]=math.random(0,255); end end local score=0; for i=0,32 do  	score = score - (bitsdiffhits[i] or 0)*(math.abs(16-i)/16);	-- 16 = 50% of the bits, which is how many bits we want to change in a perfect distribution print(string.format(" %2u: %5u", i, bitsdiffhits[i] or 0)); end print(" SCORE: "..score);	-- 0 is unimaginatively good. negative is bad. end -- Performance test: Just measure the time it takes to execute a hash on a fixed string lots of times (including the cost of the loop, obviously) function perftest(func, num) local x=os.clock; for i=1,num do    func("abcdefghijklmnopqrstuvwxyz"); end return os.clock-x; end -- Count the number of single, double, triple etc collisions given a certain manipulation of the resultant hash: --  div by X, take modulus Y -- The modulus is there to create an artificial range clamping; testing would take too long otherwise. -- Should match expected Poisson distribution closely for an unbiased hash. function collisiontest(func, num, div,modulus, randfunc) local x = os.clock; local outputs = {} local collisions = {} math.randomseed(1); for i=1,num do  	str = randfunc; local n = math.mod(math.floor(func(str)/div), modulus); outputs[n] = (outputs[n] or 0)+1; end for _,n in pairs(outputs) do  	collisions[n] = (collisions[n] or 0) + n;   end local y = os.clock; local sortcol = {} for k,v in pairs(collisions) do  	table.insert(sortcol, k); end table.sort(sortcol); -- Compute expected Poisson distribution expect = {} lambda = num/modulus; fly = lambda * math.exp(-lambda); for c=1,10 do  	expect[c] = fly * modulus; fly = fly * lambda / c;  end -- Output results and expected results for _,k in ipairs(sortcol) do  	print(string.format("  %2u: %5u  (expected %5u)", k, collisions[k], expect[k] or -1)); end return y-x; end -- Bit distribution test: Measure number of times each bit is set, should average out to 50% chance in the end -- Will be presented as the absolute difference from 50% function bitdistribtest(func,num,lo,hi,dispmin) math.randomseed(1); local bitsset={}; for i=1,num do  	str = string.char(math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),   		math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),   		math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi)); local n = func(str); for k=1,32 do  		if(mod(n,2)==1) then bitsset[k] = (bitsset[k] or 0)+1; end n=math.floor(n/2); end end for k=1,32 do  	local diff = math.abs(50-(bitsset[k] or 0)/num*100); if(diff &gt; dispmin) then print(string.format(" %2u: %.1f%%", k, diff)); end end end -- Collision finder test: Loop all possible values in given set and count collisions (yes, this is slow, and uses quite a bit of ram) function collisionfinder(func, lo, hi,  bytes,  prefix, suffix) local str; local coll={}; prefix = prefix or ""; suffix = suffix or ""; if(bytes==2) then for b=lo,hi do  		for a=lo,hi do str = prefix .. string.char(a,b) .. suffix; local n = func(str); coll[n] = (coll[n] or 0) + 1; end end elseif(bytes==3) then for c=lo,hi do  		for b=lo,hi do   			for a=lo,hi do str = prefix .. string.char(a,b,c) .. suffix; local n = func(str); coll[n] = (coll[n] or 0) + 1; end end end elseif(bytes==4) then for d=lo,hi do  		for c=lo,hi do   			for b=lo,hi do   				for a=lo,hi do str = prefix .. string.char(a,b,c,d) .. suffix; local n = func(str); coll[n] = (coll[n] or 0) + 1; end end end end elseif(bytes==5) then for e=lo,hi do  		for d=lo,hi do   			for c=lo,hi do   				for b=lo,hi do   					for a=lo,hi do str = prefix .. string.char(a,b,c,d,e) .. suffix; local n = func(str); coll[n] = (coll[n] or 0) + 1; end end end end end end local collcnt = {}; for k,n in coll do  	collcnt[n] = (collcnt[n] or 0) + 1; end for colls, count in pairs(collcnt) do  	if(colls==1) then print(string.format(" %5u non-colliding values", count)); else print(string.format(" %5u instances of %u inputs producing the same hash (%u strings)", count, colls, count*colls)); end end end --- -- Run the tests if(true) then local colls; local function longrand(lo,hi) return function local ret = ""; while colls[ret] do  			ret = "aaaaaaaaa"..string.char(math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),   				math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),   				math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi),math.random(lo,hi)); end colls[ret] = true; return ret; end end print ""; print ""; print "Collision test - long random string 32-127 (20000 values into 32768 buckets)"; print "--" print("JavaHash:"); colls = {}; print(collisiontest(JavaHash, 20000, 1,32768, longrand(32,127))); print("StringHash:"); colls = {}; print(collisiontest(StringHash, 20000, 1,32768, longrand(32,127))); print ""; print "Collision test - long random string 0-255 (20000 values into 32768 buckets, bits 1-15)"; print "--" print("JavaHash:"); colls = {}; print(collisiontest(JavaHash, 20000, 1,32768, longrand(0,255))); print("StringHash:"); colls = {}; print(collisiontest(StringHash, 20000, 1,32768, longrand(0,255))); print ""; print "Collision test - long random string 0-255 (20000 values into 32768 buckets, bits 6-20)"; print "--" print("JavaHash:"); colls = {}; print(collisiontest(JavaHash, 20000, math.pow(2,5),32768, longrand(0,255))); print("StringHash:"); colls = {}; print(collisiontest(StringHash, 20000, math.pow(2,5),32768, longrand(0,255))); print ""; print "Collision test - long random string 0-255 (20000 values into 32768 buckets, bits 12-26)"; print "--" print("JavaHash:"); colls = {}; print(collisiontest(JavaHash, 20000, math.pow(2,11),32768, longrand(0,255))); print("StringHash:"); colls = {}; print(collisiontest(StringHash, 20000, math.pow(2,11),32768, longrand(0,255))); print ""; print "Collision test - long random string 0-255 (20000 values into 32768 buckets, bits 18-32)"; print "--" print("JavaHash:"); colls = {}; print(collisiontest(JavaHash, 20000, math.pow(2,17),32768, longrand(0,255))); print("StringHash:"); colls = {}; print(collisiontest(StringHash, 20000, math.pow(2,17),32768, longrand(0,255))); end if(true) then print ""; print ""; print "Performance test"; print "-" print("JavaHash speed: ", perftest(JavaHash, 20000)); print("StringHash speed: ", perftest(StringHash, 20000)); end if(true) then print ""; print ""; print "Bit transmutation test (how many bits change if a single bit changes?)"; print "--"; print(""); print("JavaHash 13 bytes total, change 1-13:"); transtest(JavaHash,50, 13, 1,13); print(""); print("StringHash 13 bytes total, change 1-13:"); transtest(StringHash,50, 13, 1,13); print(""); print("JavaHash 13 bytes total, change 1-2:"); transtest(JavaHash,50, 13, 1,2); print(""); print("StringHash 13 bytes total, change 1-2:"); transtest(StringHash,50, 13, 1,2); print(""); print("JavaHash 13 bytes total, change 12-13:"); transtest(JavaHash,50, 13, 12,13); print(""); print("StringHash 13 bytes total, change 12-13:"); transtest(StringHash,50, 13, 12,13); end if(true) then print ""; print ""; print "Bit distribution test (what's the chance of each bit being set? expressed as abs diff from 50%)"; print "-"; print(""); print("JavaHash 0-255, 10000 loops: (showing diffs&gt;0.4%)"); bitdistribtest(JavaHash,10000,0,255, 0.4); print(""); print("StringHash 0-255, 10000 loops: (showing diffs&gt;0.4%)"); bitdistribtest(StringHash,10000,0,255, 0.4); print(""); print("JavaHash 32-127, 10000 loops: (showing diffs&gt;0.4%)"); bitdistribtest(JavaHash,10000,32,127, 0.4); print(""); print("StringHash 32-127, 10000 loops: (showing diffs&gt;0.4%)"); bitdistribtest(StringHash,10000,32,127, 0.4); end if(true) then print ""; print ""; print "Collision finder (run through all possible values of a set and count collisions)"; print ""; print(""); print("JavaHash, 64-89 x 3 bytes"); collisionfinder(JavaHash, 64, 89, 3); print(""); print("StringHash, 64-89 x 3 bytes"); collisionfinder(StringHash, 64, 89, 3); print(""); print(""); print("JavaHash, 64-89 x 4 bytes"); collisionfinder(JavaHash, 64, 89, 4); print(""); print("StringHash, 64-89 x 4 bytes"); collisionfinder(StringHash, 64, 89, 4); print(""); print(""); print("JavaHash, 32-127 x 2 bytes"); collisionfinder(JavaHash, 32, 127, 2); print(""); print("StringHash, 32-127 x 2 bytes"); collisionfinder(StringHash, 32, 127, 2); print(""); print(""); print("JavaHash, 32-127 x 3 bytes"); collisionfinder(JavaHash, 32, 127, 3); print(""); print("StringHash, 32-127 x 3 bytes"); collisionfinder(StringHash, 32, 127, 3); print(""); print(""); print("JavaHash, 32-127 x 3 bytes, prefix '01234567890123456789'"); collisionfinder(JavaHash, 32, 127, 3, "01234567890123456789"); print(""); print("StringHash, 32-127 x 3 bytes, prefix '01234567890123456789'"); collisionfinder(StringHash, 32, 127, 3, "01234567890123456789"); print(""); print(""); print("JavaHash, 32-127 x 3 bytes, suffix '01234567890123456789'"); collisionfinder(JavaHash, 32, 127, 3, "", "01234567890123456789"); print(""); print("StringHash, 32-127 x 3 bytes, suffix '01234567890123456789'"); collisionfinder(StringHash, 32, 127, 3, "", "01234567890123456789"); print("");		-- This test takes a looooong time if(true) then print(""); print("JavaHash, 64-82 x 5 bytes"); collisionfinder(JavaHash, 64, 82, 5); print(""); print("StringHash, 64-82 x 5 bytes"); collisionfinder(StringHash, 64, 82, 5); end end