2016-07-31

A fuzzer is a piece of code that randomly composes attempts from a set of inputs. It then checks this "attempt" against some target test in hopes of passing, or failing, the test. A hash is a way to look up values by arbitrary keys; a hash function takes a number of inputs and tries to compute a good hash for each possible input. A genetic algorithm (GA) is a way of searching that tries to find the result with the best score according to a certain fitness ("how well does it do?") function. The Gasher is a combination of these, because I wanted one.

You can jump to the demo now, or read the rest first ;)


## Bits

I recently needed to go a little deeper into bitwise hacks. A library I'm working on was representing domains as an array of number pairs. Many, many of these arrays are constructed, and that was causing significant GC perf problems. One way I reduced this was to represent domains with only numbers smaller than 31 as bit fields, a single regular 32bit int, instead of arrays. Of course this meant doing many operations in a bitwise context. For example, the binary number `0010011101100111101110100001110`, represented by the integer `330554638`, represents a domain that contains the numbers `1,2,3,8,10,11,12,14,15,16,17,20,21,23,24,25,28`, because those bits are set. To count them, we need to do a so-called "popcount" or "population count", or simply count the number of set bits in the number. To get the lowest number in the domain we need to do a "clz" (count leading zeroes) or some variety of it (strictly speaking its mirror, a "ctz" or count trailing zeroes). To add the number `5` to the domain we can do `domain | (1 << 5)`. To remove it we do `(domain | (1 << 5)) ^ (1 << 5)`, or more directly `domain & ~(1 << 5)`.
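A quick sketch of this domain representation, with the bit tricks from above wrapped in helpers (the function names are mine, not the library's):

```javascript
// A domain is a single 32bit int; bit i set means the number i is in the domain.
// Helper names are made up for illustration.
function domainAdd(domain, n) {
  return (domain | (1 << n)) >>> 0;
}
function domainRemove(domain, n) {
  return (domain & ~(1 << n)) >>> 0;
}
function domainHas(domain, n) {
  return (domain & (1 << n)) !== 0;
}
function domainToNumbers(domain) {
  const nums = [];
  for (let i = 0; i < 32; ++i) if (domainHas(domain, i)) nums.push(i);
  return nums;
}

// The example from the text: 330554638 contains these 17 numbers.
console.log(domainToNumbers(330554638).join(","));
// → 1,2,3,8,10,11,12,14,15,16,17,20,21,23,24,25,28
```

Note the `>>> 0` after each mutation, which keeps the value unsigned even when bit 31 is involved.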

I could go on but let's save that for another post.

While I was reading up on this arcane matter, reading "Hacker's Delight" for example, I got intrigued by the algorithms for "popcount" (population count, the number of set bits in a number) and "clz" (count leading zeroes). Short of using a cpu op that does the operation in a single cycle, these algorithms may be rather cryptic but can be very interesting. Here are the proposed algorithms for popcount and clz:

```js
// popcount: count the set bits of num
num = (num - ((num >>> 1) & 0x55555555)) | 0;
num = ((num & 0x33333333) + ((num >>> 2) & 0x33333333)) | 0;
num = ((((num + (num >>> 4)) & 0x0F0F0F0F) * 0x01010101) | 0) >>> 24;

// clz: isolate the lowest set bit, then perfect-hash it into a lookup table
let table = [32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4, 7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5, 20, 8, 19, 18];
let size = table[((-num & num) >>> 0) % 37];
```

(The `>>> 0` in the lookup is needed in JS: without it, an input with the top bit set makes `-num & num` negative and `% 37` yields a negative index.)

The second algorithm is a lookup with a "perfect hash", meaning the output of the hash function can be looked up in a table and there is no case where one output may mean multiple things. It's the next best thing to a "minimal perfect hash", where the hash table is exactly as large as the number of input elements to hash and each index maps to a unique result. Obviously that's not the case for a table of 37 elements when there are just 32 bits (plus the no-bits case) to map.
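We can verify the perfect-hash property directly: every single-bit flag (and zero) lands in its own slot of the 37-entry table. A small check in plain JS, with a `>>> 0` added to keep the top bit unsigned:

```javascript
// The 33 inputs: 0 plus the 32 single-bit flags.
const inputs = [0, ...Array.from({ length: 32 }, (_, i) => (1 << i) >>> 0)];

// -num & num isolates the lowest set bit; % 37 is the perfect hash.
const hashes = inputs.map((num) => ((-num & num) >>> 0) % 37);

// All 33 hashes are distinct, even though the table has 37 slots,
// so the hash is perfect but not *minimal* perfect.
console.log(new Set(hashes).size); // → 33
```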

## A better hash

That got me interested and I wanted to see whether I couldn't generate random pieces of code that produce a better hash. Not that it really matters in the grand scheme of things. The operation has a cpu op on most chips, and otherwise the 37 element table and even the mod (traditionally relatively expensive) are negligible these days. But I was curious whether I couldn't find an algorithm that would give me a minimal perfect hash, and that's when I started building the Gasher.

It started as a fuzzer. Get a list of "values" (a variable for the actual input value and some numbers) and a list of operators and randomly string them together. Run the code and check the number of unique outputs for all the inputs. Repeat until the target is found (number of unique outputs equals the number of inputs).
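That loop can be sketched in a few lines (the particle set and all names here are mine; the real Gasher is more elaborate):

```javascript
// Hypothetical mini-fuzzer: glue random values and operators into a JS
// expression, then count how many unique outputs it yields for the inputs.
const values = ["x", "3", "5", "7", "9"]; // "x" stands for the actual input value
const infixes = ["^", "*", "+", ">>>", "%"];
const pick = (arr) => arr[Math.floor(Math.random() * arr.length)];

function randomCode(maxParticles) {
  let code = pick(values);
  const n = 1 + Math.floor(Math.random() * maxParticles);
  for (let i = 0; i < n; ++i) code = `(${code} ${pick(infixes)} ${pick(values)})`;
  return code;
}

// Score = number of unique outputs over all inputs.
function score(code, inputs) {
  const f = Function("x", `return ${code};`);
  return new Set(inputs.map(f)).size;
}

const inputs = Array.from({ length: 32 }, (_, i) => (1 << i) >>> 0);
const code = randomCode(8);
console.log(code, score(code, inputs));
```

A perfect score here would equal the number of inputs; the sketch just reports whatever it rolled.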

My test case was doing a clz. That means the inputs were the list of numbers where only one bit was set (also called "flags"), in es6 depicted by `new Array(32).fill(0).map((_,i) => 1 << i >>> 0)`. Note that the highest number for `i` here will be `31`. We need the `>>> 0` part because `1 << 31` would otherwise be a negative number. I actually started with only 31 bits because that's what I needed for my case. Then 31 bits and no bits (when the number is `0` and no bits are set), and later the same for 32 bits.

The fuzzer would automatically force the outputs to be within range of `0` and the max number of buckets. Otherwise you may end up with 32 unique outputs that still couldn't form a table. This initial test seemed to perform quite well. The scores I got (remember: the score in this context means the number of unique outcomes) often ended in the high 20's. As I discovered later, that really didn't mean much. You can get all the 30's you want; if you're looking for 31 they are still useless.

## Gasher

After messing around with the fuzzer/hasher a bit and not getting the results I wanted, I decided to take a slightly more directed approach and went with applying a "genetic algorithm". Before this point the fuzzer was combining codes with a fixed weight per code particle. A higher weight means a higher chance to be picked as the next part for the current code. For example, I would perhaps set the weight of the `^` to `5` and the weight of `/` to `1` because I'd want the `^` to appear more often than the `/`, expecting it to yield a better result.

Anyways, the GA approach means these weights become dynamic and the search will concern itself with optimizing them. It will take a set of weights, generate a batch of codes using these weights, score them, and take the average; that average is the score for this set of weights. We'll call the set of weights "DNA", in line with GA lingo. Apart from these weights, the fuzzer is still unguided. It will construct a piece of code using random bits picked based on their weights as per the current DNA. The number of bits to construct is random but capped at an arbitrary max. The fuzzer does not look back when picking the next element; the weights are all that count.
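The weighted pick at the heart of this can be sketched as a roulette-wheel selection (the particles and weights below are made-up examples, not the Gasher's defaults):

```javascript
// A "DNA" here is just a map from code particle to weight; a higher weight
// means a higher chance to be picked as the next part of the code.
const dna = { "^": 5, "*": 3, "+": 2, "/": 1 };

function weightedPick(weights, rand = Math.random) {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let roll = rand() * total;
  for (const [particle, w] of entries) {
    roll -= w;
    if (roll < 0) return particle;
  }
  return entries[entries.length - 1][0]; // guard against float rounding
}

// With weight 5 out of 11 total, "^" should be picked roughly 45% of the time.
let hits = 0;
for (let i = 0; i < 10000; ++i) if (weightedPick(dna) === "^") ++hits;
console.log(hits / 10000); // roughly 0.45
```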

Even with such an unguided fuzzer, the results of the Gasher looked quite promising quite quickly. Remember that the inputs were numbers between `1` (just the lowest bit set) and `2147483648` (just the highest bit set), so it's not like you could divide them by `100` and call it a night. After running it for a while you could see the average go up. The highest I've seen is around a 20 average. That means the ~1000 codes generated would produce 20 unique outputs on average, given the flags as input. That's pretty amazing. And super useless... But that's for another blog post.

The Gasher itself works fine. Maybe not perfect and not super fast, but it works. The more complex the input, the longer it takes for a target hash to appear, of course. I say "of course" because you need to see this in terms of a search space. The space is bigger the more inputs and outputs there are. So if you want to get a perfect minimal hash for 32 bit flags there are many, many outcomes but only a handful will actually be perfect. For example, this is the recording of an arbitrary search:

```
 bucket-count 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 batch codes tests
(9.523 avg / 9.523 best) [ 9% 9% 4% 3% 3% 6% 53x 47x 55x 80x 12% 18% 3% 3% 96x 2% 4% 79x 92x 50x 37x 30x 13x 8x 3x - - - - - - - - ] 3640 : 7280 : 36400
(9.487 avg / 9.487 best) [ 9% 8% 4% 3% 4% 8% 62x 37x 53x 81x 11% 17% 4% 3% 2% 2% 3% 85x 84x 54x 57x 20x 14x 4x 1x 2x - - - - - - - ] 3640 : 7280 : 36400
(9.247 avg / 9.502 best) [ 11% 8% 3% 2% 3% 8% 50x 41x 59x 81x 11% 18% 3% 3% 96x 94x 4% 78x 63x 41x 41x 29x 12x 6x 1x 1x 1x - - - - - - ] 3640 : 7280 : 36400
(9.157 avg / 9.485 best) [ 12% 8% 4% 2% 3% 7% 61x 46x 71x 3% 12% 17% 3% 3% 3% 86x 4% 67x 74x 44x 38x 31x 13x 7x 1x 1x - - - - - - - ] 3640 : 7280 : 36400
(9.186 avg / 9.485 best) [ 11% 9% 4% 3% 2% 8% 50x 51x 68x 83x 13% 15% 3% 2% 3% 93x 3% 84x 85x 65x 31x 31x 9x 5x 4x 2x - - - - - - - ] 3640 : 7280 : 36400
(9.485 avg / 9.485 best) [ 10% 9% 4% 2% 3% 7% 62x 47x 56x 76x 12% 17% 4% 3% 2% 85x 3% 87x 71x 55x 54x 40x 15x 5x 2x - - - - - - - - ] 3640 : 7280 : 36400
(9.187 avg / 9.437 best) [ 12% 8% 4% 3% 2% 8% 69x 49x 64x 75x 12% 16% 3% 3% 95x 88x 4% 76x 77x 46x 33x 35x 16x 7x 4x 1x - - - - - - - ] 3640 : 7280 : 36400
(9.310 avg / 9.409 best) [ 11% 8% 3% 3% 3% 8% 58x 49x 53x 75x 12% 17% 3% 3% 96x 81x 4% 75x 81x 55x 52x 29x 13x 6x 1x 1x - - - - - - - ] 3640 : 7280 : 36400
(9.216 avg / 9.216 best) [ 12% 8% 4% 3% 3% 8% 65x 51x 46x 82x 11% 16% 3% 2% 2% 92x 4% 89x 78x 55x 40x 22x 17x 1x 1x 2x - - - - - - - ] 3640 : 7280 : 36400
(8.997 avg / 8.997 best) [ 12% 10% 4% 2% 3% 8% 61x 44x 51x 95x 12% 15% 3% 3% 93x 2% 3% 58x 70x 56x 44x 26x 8x 6x 3x - - - - - - - - ] 3640 : 7280 : 36400
(8.718 avg / 8.981 best) [ 15% 8% 4% 4% 3% 6% 59x 54x 58x 90x 12% 14% 3% 92x 80x 82x 3% 81x 70x 62x 36x 20x 15x 8x 9x 1x - - - - - - - ] 3640 : 7280 : 36400
(8.940 avg / 8.981 best) [ 13% 9% 3% 3% 3% 7% 70x 42x 59x 79x 11% 15% 4% 2% 94x 86x 4% 78x 61x 45x 48x 23x 14x 5x 3x - 1x - - - - - - ] 3640 : 7280 : 36400
(8.704 avg / 8.981 best) [ 15% 9% 4% 2% 3% 6% 64x 52x 58x 76x 11% 15% 3% 2% 90x 80x 3% 64x 76x 42x 55x 29x 9x 5x 4x 1x - - - - - - - ] 3640 : 7280 : 36400
(8.857 avg / 8.857 best) [ 14% 10% 4% 3% 3% 7% 57x 54x 65x 2% 10% 15% 3% 2% 98x 82x 4% 76x 84x 58x 37x 28x 10x 8x 3x 2x - - - - - - - ] 3640 : 7280 : 36400
(8.612 avg / 8.974 best) [ 13% 10% 4% 4% 3% 8% 90x 45x 62x 85x 10% 14% 3% 3% 3% 98x 3% 66x 68x 32x 33x 27x 12x 3x 3x 1x - - - - - - - ] 3640 : 7280 : 36400
(8.556 avg / 8.974 best) [ 14% 10% 4% 3% 3% 9% 47x 49x 48x 94x 10% 14% 3% 88x 3% 81x 4% 71x 72x 45x 38x 25x 7x 3x 2x 1x - - - - - - - ] 3640 : 7280 : 36400
(8.739 avg / 8.974 best) [ 15% 9% 4% 3% 2% 8% 52x 55x 51x 75x 11% 15% 3% 3% 89x 72x 3% 83x 68x 41x 51x 24x 12x 9x 4x - - - - - - - - ] 3640 : 7280 : 36400
(8.813 avg / 8.813 best) [ 14% 9% 4% 3% 3% 7% 60x 48x 50x 97x 12% 15% 3% 2% 93x 90x 4% 67x 71x 59x 35x 17x 16x 9x 2x - - - - - - - - ] 3640 : 7280 : 36400
(7.866 avg / 8.340 best) [ 18% 12% 5% 3% 3% 7% 55x 40x 51x 91x 11% 12% 2% 2% 74x 83x 3% 54x 46x 37x 29x 23x 14x 4x 4x 1x - - - - - - - ] 3640 : 7280 : 36400
(8.106 avg / 8.340 best) [ 18% 12% 4% 3% 2% 6% 49x 44x 45x 69x 11% 14% 3% 90x 77x 67x 4% 58x 70x 37x 29x 27x 9x 4x 7x 1x - - - - - - - ] 3640 : 7280 : 36400
(7.627 avg / 8.340 best) [ 20% 14% 5% 2% 3% 5% 47x 27x 71x 78x 10% 12% 2% 85x 77x 69x 3% 51x 65x 34x 37x 37x 9x 2x 2x - - - - - - - - ] 3640 : 7280 : 36400
(8.223 avg / 8.340 best) [ 16% 12% 4% 3% 3% 6% 58x 42x 63x 82x 10% 13% 3% 2% 95x 84x 3% 59x 66x 42x 48x 26x 14x 3x 1x - 1x - 1x - - - - ] 3640 : 7280 : 36400
(8.120 avg / 8.308 best) [ 16% 13% 4% 3% 2% 7% 57x 44x 48x 85x 10% 14% 2% 2% 2% 69x 3% 68x 52x 37x 35x 20x 7x 10x 1x 2x 1x - - - - - - ] 3640 : 7280 : 36400
(7.997 avg / 8.308 best) [ 18% 13% 4% 3% 98x 7% 50x 45x 45x 59x 10% 14% 3% 86x 79x 84x 3% 60x 63x 34x 39x 25x 4x 2x 1x 1x 1x - - - - - - ] 3640 : 7280 : 36400
(8.036 avg / 8.308 best) [ 18% 12% 4% 2% 3% 7% 52x 41x 47x 92x 10% 13% 3% 98x 87x 74x 3% 60x 69x 31x 40x 20x 10x 9x 5x - - - - - - - - ] 3640 : 7280 : 36400
```

In this, each line represents the result for one DNA. The avg is the actual score for this DNA, at least for this batch. The best is the currently best recorded DNA score, so we know how well this DNA did compared to its "parent". The numbers that follow are the number of times that many unique outcomes were recorded: if the count is below 100 it's shown with an x, otherwise as a % of the total number of trials, which is represented by the last numbers in the row (batch : codes : tests). The Gasher runs in a web worker that takes no input once it's started and tries to make each batch run in about 1 second (so one DNA per second). Each batch generates offspring (a code) from the best DNA and a new DNA. Each offspring is tested once or five times, depending on the clamp setting.

The perf is about the biggest problem right now. I didn't exactly design this thing, it just came together while playing around. There are some ideas about how to crank up the perf, because 36,000 trials per second is ridiculously slow, even considering what it's actually doing (the bucket counting is relatively expensive). I mean, I work in an environment where functions are called millions of times per second, so 36,000 trials per second looks a little off to me. Anyways, I don't care much about that right now. The tool served its purpose and I need to move on :)

In the above graph you basically want to get the lowest count (or none) on the left and the highest count on the right (because in this particular test, 33 unique outcomes is what you want: hashing all the inputs for 32 flags plus no flags). If you look at it from a distance you can see the pattern where the percentages are near the left side and in the 11-14 range, like a little wave. As the Gasher continues running, the numbers on the right get more hits.

## Bias

As with any search there is the danger of bias. In my test, for example, the GA tended to favor the `*` because the input numbers were small and it needed to reduce super large numbers to something below 32. So the search would often lean towards a high weight for multiplication, because the value particles were only `1 through 10`. It needed multiplication to get large numbers into the formulas, because the smaller numbers simply yielded bad scores. Or well, that's one way to interpret those results anyways ;)

Point is, the problem here, under this interpretation at least, was that the search preferred multiplication not because it tended to yield better scores but because the small numbers simply didn't yield good scores. It still didn't yield optimal scores (spoiler: I only found one solution so far and it isn't pretty). The GA would prefer multiplication so much that it would zero out all the other weights, so the weights would end up only allowing multiplication with a high weight and one other op with a low weight (because the system forced that). I resolved that by adding a mode which capped the number of zero weights per class to three (an arbitrary number).
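That cap can be sketched as a small repair step on a weight set; the three-per-class limit is the arbitrary number from above, while the function name and revive weight are mine:

```javascript
// If a weight set zeroes out too many particles, revive some zero entries
// with a minimal weight so the search can't collapse onto a single op.
function capZeroWeights(weights, maxZeros = 3, reviveWeight = 1) {
  const out = weights.slice();
  const zeroIndexes = [];
  for (let i = 0; i < out.length; ++i) if (out[i] === 0) zeroIndexes.push(i);
  // Revive zero entries (from the end, arbitrarily) until at most maxZeros remain.
  while (zeroIndexes.length > maxZeros) out[zeroIndexes.pop()] = reviveWeight;
  return out;
}

console.log(capZeroWeights([9, 0, 0, 0, 0, 0])); // at most three zeros remain
```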

Bias is a dangerous thing because it can steer your search in the wrong direction.

The whole thing is a bit mesmerizing to watch. For me it is, anyways. You're seeing the results improve and just waiting for that perfect hash to ping. The screen will turn green when that happens, just to make clear that you've reached the goal ;) Here is the same run from above a few hours later:

```
 bucket-count 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 batch codes tests
(13.00 avg / 14.56 best) [ 69x 3% 53x 57x 35x 34x 39x 47x 46x 57x 13% 23% 6% 5% 4% 3% 8% 5% 4% 92x 85x 52x 28x 15x 3x - 1x - - - - - - ] 3360 : 6720 : 33600
(12.30 avg / 14.56 best) [ 3% 10% 74x 21x 10x 5x 15x 15x 25x 44x 11% 22% 5% 4% 3% 3% 9% 4% 4% 3% 76x 54x 21x 20x 7x 1x - - - - - - - ] 3360 : 6720 : 33600
(13.40 avg / 14.56 best) [ 45x 40x 51x 55x 32x 21x 49x 37x 52x 76x 14% 24% 6% 5% 4% 4% 8% 3% 5% 83x 3% 66x 35x 15x 2x 3x - - - - - - - ] 3360 : 6720 : 33600
(13.95 avg / 14.56 best) [ 15x 24x 8x 13x 29x 76x 22x 17x 19x 44x 14% 25% 7% 4% 4% 4% 11% 4% 5% 3% 3% 54x 25x 10x 2x 5x - - - - - - - ] 3360 : 6720 : 33600
(14.19 avg / 14.56 best) [ 27x 30x 4x 8x 3x 8x 21x 22x 28x 45x 14% 27% 6% 5% 4% 5% 9% 4% 5% 3% 88x 72x 30x 27x 11x 2x 1x - 1x - - - - ] 3360 : 6720 : 33600
(13.84 avg / 14.56 best) [ 20x 43x 13x 16x 33x 3% 16x 13x 39x 29x 13% 25% 6% 4% 4% 4% 9% 5% 5% 3% 2% 54x 42x 16x 5x 1x 2x - - - - - - ] 3360 : 6720 : 33600
(14.25 avg / 14.56 best) [ 16x 12x 4x 11x 5x 5x 17x 16x 36x 51x 14% 27% 7% 4% 4% 4% 9% 5% 5% 3% 3% 60x 33x 17x 6x 2x - - - - - - - ] 3360 : 6720 : 33600
(14.34 avg / 14.56 best) [ 19x 23x 3x 6x 5x 9x 19x 17x 39x 49x 14% 26% 6% 4% 5% 5% 8% 5% 6% 3% 3% 74x 49x 8x 5x 2x 2x - 1x - - - - ] 3360 : 6720 : 33600
(14.23 avg / 14.56 best) [ 23x 35x 2x 5x 7x 15x 14x 24x 38x 49x 13% 26% 7% 5% 4% 4% 10% 5% 5% 98x 3% 67x 44x 20x 9x 2x 1x - - - - - - ] 3360 : 6720 : 33600
(13.37 avg / 14.56 best) [ 43x 39x 70x 46x 24x 39x 50x 49x 55x 51x 15% 22% 7% 4% 4% 4% 9% 4% 4% 94x 88x 63x 41x 15x 3x 3x 3x - - - - - - ] 3360 : 6720 : 33600
(12.25 avg / 14.56 best) [ 3% 10% 91x 31x 4x 15x 19x 22x 38x 29x 12% 22% 6% 4% 3% 3% 8% 5% 4% 96x 72x 53x 33x 13x 7x 1x - - - - - - - ] 3360 : 6720 : 33600
(13.99 avg / 14.56 best) [ 21x 23x 10x 24x 23x 91x 20x 16x 32x 35x 15% 24% 6% 4% 5% 4% 9% 5% 5% 3% 3% 71x 30x 14x 6x 5x - - - - - - - ] 3360 : 6720 : 33600
(13.69 avg / 14.56 best) [ 26x 53x 2x 9x 16x 3% 22x 13x 28x 50x 14% 26% 6% 4% 4% 3% 10% 5% 4% 97x 96x 55x 24x 13x 10x 1x 2x - - - - - - ] 3360 : 6720 : 33600
(14.37 avg / 14.56 best) [ 14x 23x 6x 6x 6x 9x 16x 15x 33x 37x 13% 26% 7% 5% 5% 5% 10% 5% 5% 3% 3% 70x 32x 15x 11x 4x 2x 1x - - - - - ] 3360 : 6720 : 33600
(14.01 avg / 14.56 best) [ 38x 19x 13x 13x 13x 3% 26x 23x 52x 66x 11% 24% 6% 4% 5% 4% 10% 5% 5% 3% 82x 82x 36x 23x 9x 5x 2x - - - - - - ] 3360 : 6720 : 33600
(12.30 avg / 14.56 best) [ 88x 11% 86x 20x 15x 14x 18x 20x 32x 42x 12% 22% 6% 4% 3% 3% 8% 4% 4% 3% 3% 59x 26x 19x 7x - 1x - - - - - - ] 3360 : 6720 : 33600
(12.32 avg / 14.56 best) [ 3% 11% 82x 27x 8x 9x 19x 8x 45x 32x 11% 22% 5% 3% 4% 3% 8% 4% 5% 94x 89x 54x 34x 18x 3x - 2x - - - - - - ] 3360 : 6720 : 33600
(12.35 avg / 14.56 best) [ 94x 11% 68x 30x 10x 12x 20x 16x 27x 33x 11% 22% 6% 4% 4% 4% 8% 4% 4% 98x 76x 67x 24x 14x 11x 3x 1x - - - - - - ] 3360 : 6720 : 33600
(13.18 avg / 14.56 best) [ 37x 51x 52x 66x 33x 32x 46x 49x 49x 72x 14% 25% 6% 4% 3% 4% 9% 4% 4% 96x 67x 54x 28x 10x 8x 2x 1x - - - - - - ] 3360 : 6720 : 33600
(13.98 avg / 14.56 best) [ 17x 37x 12x 18x 24x 84x 15x 19x 26x 30x 14% 25% 7% 4% 4% 4% 10% 4% 5% 3% 84x 75x 26x 18x 13x 6x 3x 1x - - - - - ] 3360 : 6720 : 33600
(11.69 avg / 14.56 best) [ 4% 10% 3% 66x 29x 26x 35x 40x 40x 53x 11% 20% 5% 3% 3% 3% 8% 4% 4% 72x 82x 55x 26x 21x 1x 3x - - - - - - - ] 3360 : 6720 : 33600
(13.47 avg / 14.56 best) [ 39x 46x 51x 54x 44x 22x 37x 46x 54x 70x 13% 23% 6% 4% 5% 4% 8% 5% 5% 97x 93x 59x 33x 18x 6x 3x 2x - - - - - - ] 3360 : 6720 : 33600
(14.15 avg / 14.56 best) [ 40x 47x 4x 6x 1x 6x 19x 15x 41x 48x 13% 26% 8% 4% 5% 3% 10% 5% 4% 3% 3% 65x 29x 32x 7x 5x - - - - - - - ] 3360 : 6720 : 33600
(14.47 avg / 14.56 best) [ 20x 16x 4x 6x 5x 5x 9x 19x 26x 27x 14% 26% 7% 5% 4% 4% 10% 5% 6% 3% 3% 77x 44x 13x 9x 2x 1x 1x - - - - - ] 3360 : 6720 : 33600
```

You can see it's clearly doing a lot better on average. Yet it had found no solution in all that time (only about five hits on "32").

## Interface

Let's talk about the UI a little. Don't look at the source, I just threw it together and it's ugly as hell.

At the top you see a spinner. It's not super relevant anymore: the gasher used to run on the main thread and the spinner would show you when that thread was busy. Now everything runs in a web worker, so the spinner should pretty much always spin.

The top input is for entering some expression that returns the list of input values. These are the values to hash. You can enter a regular array or some JS that returns an array. This code is evaluated as is. By default the number of elements of this array is the target number of unique outcomes to find. You can override that below.

The next section holds values, prefixes, infixes, and their weights. The fuzzer puts these together as regular JS expressions. So it starts with a value. It may prepend a prefix string to that, like turning `5` into `~5`. Then this composition gets an infix (an operator with two operands) and another value is generated. From there on out that infix-and-value step is repeated so many times. Somewhere along the line at least one `x` is enforced (because that's the point) and the result will be something like `(((((((((7*9) |0)|x)^9) |0)*7) |0)) >>> 0) % 33`. Super ugly, of course, and I've added an uglifyjs step to try and reduce the formula statically (but only for displaying); in this last example it would end up as `((7*(9^(63|x)|0)|0)>>>0)%33`. When we actually look at it we can reduce that further, of course, because we know the inputs, but it doesn't hurt to have it anyways. The uglify step only runs on the UI end, so not in the actual Gasher part of the code.

Next are the weight constraints; can they all, some, or none be zero? This tries to combat bias as described above.

With bounds you can make the fuzzer wrap each step in code that boxes the result to 32bit signed or unsigned (or none at all). In some environments you may not have or want signed numbers.
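Those two bounding modes correspond to JS's two 32bit coercions; a hypothetical sketch of the wrapping (the helper names are mine):

```javascript
// Box an expression's result to 32bit, signed or unsigned.
const boxSigned = (expr) => `(${expr} | 0)`;     // int32:  -2147483648..2147483647
const boxUnsigned = (expr) => `(${expr} >>> 0)`; // uint32: 0..4294967295

console.log(eval(boxSigned("1 << 31")));   // → -2147483648
console.log(eval(boxUnsigned("1 << 31"))); // → 2147483648
```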

There are a few ways to clamp the outputs to the bucket range of the result. For example, if the target has 33 unique values, we'll be looking for a table that maps the numbers `0 through 32` to the original input values. That means the output of the generated codes should be a number below `33`. There are various ways to clamp that and you can choose how to, or add your own method. You can also have Gasher try all of them, which means there'll be fewer trials per DNA because they'll use all variations of clamping.

Lastly you can pick the cap of how many particles are added to the generated code. This is a cap; it will actually use a uniformly distributed random number of particles up to this number. You can override the number of buckets to target. Let's say for my example I wanted to try and find a table of `34` elements; then I'd have to put that here. And the "append" setting controls the output field. By default it will replace the output when it finds an equal or better score, and only once it reaches the goal will it append results, because you may be interested in more than one result. The "append" field allows you to make the appending behavior kick in sooner; by default it's one below the target.

The buttons: "restart from last" will stop the Gasher and start a new one using the "current" weights. "restart from best" will also restart but use the "best" weights (the ones you can't edit). "pause ui" stops the ui from updating but the web worker still does its thing. "randomize" will fill the "current" inputs with random weight values so you can (re)start with random values. "stop" (and "start") should be quite self evident ;)

The Gasher tracks the overall best score, the current best score, and the last score. Overall means the best DNA ever seen, which can differ from the current best score because the Gasher allows its "best" DNA to regress a little to try and move away from local plateaus. The top two lines in the output represent the code score (the integers) and the DNA score (the floats). The list shows you only the last checked DNA score, even though it continuously checks the best DNA alongside a new DNA.

## Conclusion

It's not that easy to find an alternative to popcount. Oh, you mean the Gasher? Well, it's a cool tool to generate hashes for your inputs. I'm going to use it for the next (es6) iteration of zeparser, if I ever get to it.

You can find the Gasher at https://github.com/qfox/gasher and a live demo at https://qfox.github.io/gasher/web.html

It was a fun experiment and now it's time to move on :)