A bit less complexity

2016-07-04

Currently working on optimizing a finite domain constraint solver. This program intends to receive a bunch of variables with possible values, some constraints, and return you a valuation that satisfies all constraints. Well or flat out reject if it could not find any, of course. Like having two variables, A and B, with each being possible zero or one, and asking the system what A and B would be knowing that A must be lesser than B. It's an academic level kind of thing (it gets complex very quickly) and I won't bore you too much with the details of solving in this post. I was trying to optimize a particular case and wanted to outline how I solved it.

TLDR; reduced a fairly complex piece of code to a simple one liner using bitwise operators.

The scene


First a little background on the model. So there are variables and each variable has a set of possible values we call "domains". Internally these values are represented as a set of ranges in an array. If you have a var whose valid values are 0, 1, 5, 9 tm 20 we would display this as [[0, 1], [5, 5], [9, 20]]. Initially the library I'm working on actually represented a domain this way internally, but I quickly optimized that to a flat array of numbers. So that becomes [0, 1, 5, 5, 9, 20], which will aleviate the GC tremendously (it cut runtime in half when I applied that).

Additionally, the solver often works with small domains. That is, domains with numbers smaller than 30. In fact, most of the values are below 5. So to use arrays on that is still a waste. To that end I changed the code and introduced "small domains as bitwise flags" along side the flat array model. The small domains consists of a single 32bit number where each bit represents whether or not the domain contains the value of its bit-index, 0 through 30 inclusive. Not 31 because I did not want to have to mess with the sign and overflow and stuff. And really, 30 or 31 won't make a big difference. I just use 30 as an upper limit because I'll have the space in JS numbers, regardless.

These bitwise domains actually opened up a whole new vector of optimizations, compared to the original library anyways. Unification and containsValue become much cheaper, of course some other operations become more expensive. And I love that level of hackery. I love fiddling with these domains as a bitwise field. It's like code golfing, but different, but relevant! :)

The code


Okay, now that I've set the scene, in this particular case I'm trying to optimize some code that adds all numbers in a given range to an existing bitwise domain. In other words, I have a domain [[0, 10], [20, 30]] and I want to add the range [15, 18] to it to end up with [[0, 10], [15, 18], [20, 30]]. This function is used by a function that applies mathematical addition to two domains. And it is used a lot in the way that my client (The Grid) is using the library. In a baseline perf test case it is called some 500.000 times. Not the most critical hot code, mind you, but still quite relevant.

The before version of the code, which I did write myself, is a very long series of case statements in a switch which fall through to inject ranges. This is what it basically looks like:

Code:
/**
* Add all numbers within given range [lo, hi) to given (small) domain
*
* @param {$domain_num} domain
* @param {number} from
* @param {number} to
* @returns {$domain_num}
*/
function domain_addRangeNum(domain, from, to) {
// this switch is designed to case at lo and break at hi.
// it prevents a very hot loop.
switch (from) {
case 0:
domain |= ZERO;
// fall-through
case 1:
if (to < 1) break;
domain |= ONE;
// fall-through
case 2:
if (to < 2) break;
domain |= TWO;
// fall-through
case 3:
if (to < 3) break;
domain |= THREE;
// fall-through
case 4:
if (to < 4) break;
domain |= FOUR;
// fall-through
case 5:
if (to < 5) break;
domain |= FIVE;
// fall-through
case 6:
... etc
case 29:
if (to < 29) break;
domain |= TWENTYNINE;
// fall-through
case 30:
if (to < 30) break;
domain |= THIRTY;
}

return domain;
}

Before this was a loop and the inner parts of that loop was extremely hot, like a few million calls on the perf case which was unacceptable. I figured a "smart" switch would be better. I didn't like it but it beat the code before and it performed decently. Before it was this;

Code:
function domain_fromFlags(domain) {
ASSERT(typeof domain === 'number', 'ONLY_USED_WITH_NUMBERS');
if (domain === EMPTY) return [];

let arr = [];
let lo = -1;
let hi = -1;
if (ZERO & domain) {
lo = 0;
hi = 0;
}
if (ONE & domain) {
if (lo !== 0) { // lo is either 0 or nothing
lo = 1;
}
hi = 1; // there cannot be a gap yet
}
if (TWO & domain) {
if (hi === 0) {
arr.push(0, 0);
lo = 2;
} else if (hi !== 1) {
// if hi isnt 0 and hi isnt 1 then hi isnt set and so lo isnt set
lo = 2;
}
hi = 2;
}
if (THREE & domain) {
// etc for 30 flags

Which is basically an unrolled loop. Before that it lazily converted the flags to an array domain, injected the range, and then converted that array back to flags. A really inefficient lazy method just to get the ball rolling.

Now that I'm looking at it again I don't quite understand how I missed the simplicity of the short solution here. I suppose the outline above could make the insight more obvious but I somehow missed it. I'm currently reading "Hacker's Delight", which may have helped to nudge my frame of mind on the right path :) It's a great book, would recommend it to ... hm, well that's a little hard to describe. To "people who enjoy code golfing"? :)

Actually another reason of insight was visualizing these numbers and looking for patterns. I did this for all 8bit numbers but it scales to 32bit easily.

Code:
let flags = [EMPTY, ZERO, ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN, TWELVE, THIRTEEN, FOURTEEN, FIFTEEN, SIXTEEN, SEVENTEEN, EIGHTEEN, NINETEEN, TWENTY, TWENTYONE, TWENTYTWO, TWENTYTHREE, TWENTYFOUR, TWENTYFIVE, TWENTYSIX, TWENTYSEVEN, TWENTYEIGHT, TWENTYNINE, THIRTY];

for (let i = 0; i < 8; ++i) {
let flag = flags[i ];
console.log(' '.repeat(18)+'[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]');
for (let j = 0; j < 8; ++j) {
let s = flag.toString(2).padStart(8, '0') + ' ['+j+',k] => ';
for (let k = 0; k < 8; ++k) {
if (k < j) s += ' '.repeat(9);
else s += domain_addRangeNum(flag, j, k).toString(2).padStart(8, '0') + ' ';
}
console.log(s);
}
console.log('');
}

This results in...

Code:
[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
00000000 [0,k] => 00000001 00000011 00000111 00001111 00011111 00111111 01111111 11111111
00000000 [1,k] => 00000010 00000110 00001110 00011110 00111110 01111110 11111110
00000000 [2,k] => 00000100 00001100 00011100 00111100 01111100 11111100
00000000 [3,k] => 00001000 00011000 00111000 01111000 11111000
00000000 [4,k] => 00010000 00110000 01110000 11110000
00000000 [5,k] => 00100000 01100000 11100000
00000000 [6,k] => 01000000 11000000
00000000 [7,k] => 10000000

[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
00000001 [0,k] => 00000001 00000011 00000111 00001111 00011111 00111111 01111111 11111111
00000001 [1,k] => 00000011 00000111 00001111 00011111 00111111 01111111 11111111
00000001 [2,k] => 00000101 00001101 00011101 00111101 01111101 11111101
00000001 [3,k] => 00001001 00011001 00111001 01111001 11111001
00000001 [4,k] => 00010001 00110001 01110001 11110001
00000001 [5,k] => 00100001 01100001 11100001
00000001 [6,k] => 01000001 11000001
00000001 [7,k] => 10000001

[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
00000010 [0,k] => 00000011 00000011 00000111 00001111 00011111 00111111 01111111 11111111
00000010 [1,k] => 00000010 00000110 00001110 00011110 00111110 01111110 11111110
00000010 [2,k] => 00000110 00001110 00011110 00111110 01111110 11111110
00000010 [3,k] => 00001010 00011010 00111010 01111010 11111010
00000010 [4,k] => 00010010 00110010 01110010 11110010
00000010 [5,k] => 00100010 01100010 11100010
00000010 [6,k] => 01000010 11000010
00000010 [7,k] => 10000010

[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
00000100 [0,k] => 00000101 00000111 00000111 00001111 00011111 00111111 01111111 11111111
00000100 [1,k] => 00000110 00000110 00001110 00011110 00111110 01111110 11111110
00000100 [2,k] => 00000100 00001100 00011100 00111100 01111100 11111100
00000100 [3,k] => 00001100 00011100 00111100 01111100 11111100
00000100 [4,k] => 00010100 00110100 01110100 11110100
00000100 [5,k] => 00100100 01100100 11100100
00000100 [6,k] => 01000100 11000100
00000100 [7,k] => 10000100

[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
00001000 [0,k] => 00001001 00001011 00001111 00001111 00011111 00111111 01111111 11111111
00001000 [1,k] => 00001010 00001110 00001110 00011110 00111110 01111110 11111110
00001000 [2,k] => 00001100 00001100 00011100 00111100 01111100 11111100
00001000 [3,k] => 00001000 00011000 00111000 01111000 11111000
00001000 [4,k] => 00011000 00111000 01111000 11111000
00001000 [5,k] => 00101000 01101000 11101000
00001000 [6,k] => 01001000 11001000
00001000 [7,k] => 10001000

[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
00010000 [0,k] => 00010001 00010011 00010111 00011111 00011111 00111111 01111111 11111111
00010000 [1,k] => 00010010 00010110 00011110 00011110 00111110 01111110 11111110
00010000 [2,k] => 00010100 00011100 00011100 00111100 01111100 11111100
00010000 [3,k] => 00011000 00011000 00111000 01111000 11111000
00010000 [4,k] => 00010000 00110000 01110000 11110000
00010000 [5,k] => 00110000 01110000 11110000
00010000 [6,k] => 01010000 11010000
00010000 [7,k] => 10010000

[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
00100000 [0,k] => 00100001 00100011 00100111 00101111 00111111 00111111 01111111 11111111
00100000 [1,k] => 00100010 00100110 00101110 00111110 00111110 01111110 11111110
00100000 [2,k] => 00100100 00101100 00111100 00111100 01111100 11111100
00100000 [3,k] => 00101000 00111000 00111000 01111000 11111000
00100000 [4,k] => 00110000 00110000 01110000 11110000
00100000 [5,k] => 00100000 01100000 11100000
00100000 [6,k] => 01100000 11100000
00100000 [7,k] => 10100000

[j,0] [j,1] [j,2] [j,3] [j,4] [j,5] [j,6] [j,7]
01000000 [0,k] => 01000001 01000011 01000111 01001111 01011111 01111111 01111111 11111111
01000000 [1,k] => 01000010 01000110 01001110 01011110 01111110 01111110 11111110
01000000 [2,k] => 01000100 01001100 01011100 01111100 01111100 11111100
01000000 [3,k] => 01001000 01011000 01111000 01111000 11111000
01000000 [4,k] => 01010000 01110000 01110000 11110000
01000000 [5,k] => 01100000 01100000 11100000
01000000 [6,k] => 01000000 11000000
01000000 [7,k] => 11000000

Pretty, no? :D Do you see the patterns?

The solution


So we have a field of bit flags. We want to "set" a range of bits in this field. Instead of the, in hindsight, attrocious switch that does the job, let's get a .. bit lower level. To this end it helps to look at bitwise patterns and I'm afraid it's going to be a bit hard to explain. I am however going to skip the primer on binary. There's plenty of primers out there. Don't worry, I'll wait.

In a nutshell what we'll do is create a range of ones and shift those to the left and merge them with the input. The examples below will be in 8bit (for brievety) which in actuallity we'll scale up to 32bit.

First, create a number that in binary has all zeroes except for a 1. One of the binary operators that are available to us allow us to "move" a 1 some places to the left (and another operator can move them to the right). This is called a "shift". It will do this for all bits but if you only have one 1 then obvious that's the only thing that moves. Note that the decimal number 1 represents the 8bit binary value 00000001. By applying "left shift", or 1 << 3 we would move the 1 exactly 3 places to the left, ending up with 00001000, which in decimal is 8.

In JS you can do num.toString(2) to get a binary representation of your number, but note that this will trunc all zeroes on the left side. This may look a little confusing at first.

If we take that and subtract 1 from it, all the zeroes to the right "of the right-most 1" will become 1 and the original 1 becomes a 0: 8 - 1 = 7 or 00001000 - 00000001 = 00000111. So now we have a number which in binary has 3 ones and all zeroes otherwise. If this were a bitwise domain then it would represent the numbers [0, 2] because the first, second, and third flag are "set" (1).

As I said before, the bitwise shift actually shifts all ones to the left so we can use that now. 7 << 2 = 28, or in binary 00000111 << 00000010 = 00011100. Look, it makes so much more sense in binary representation than in decimal or hex. Now we can see that 00011100 represents a domain with [2, 4] because the third, fourth, and fifth bit are set. Not so obvious from the number 28, even though that's what it is.

To add this new domain to the input domain we can simply "or" them, with the | bitwise operator. This operation basically takes two numbers and compares each bit at the same index. If either of the two bits is set it will set the result bit and otherwise it will unset the result bit. result = domain | newDomain.

Actual solution


Okay, back to our 32bit reality and our example. It works the same way only with more bits to manipulate. We had an input domain [[0, 10], [20, 30]] and wanted to add the range [15, 18] to it. In binary these domains are 1111111111100000000011111111111 and 0000000000001111000000000000000, although we'd be getting the latter as actual ranges (for reasons), so we'd get 15, 18.

Code:
// input: 1111111111100000000011111111111 = [[0, 10], [20, 30]]
// generate the hi-lo ones
let newDomain = 1 << (1 + 18 - 15); // binary: 0000000000000000000000000010000
newDomain -= 1; // binary: 0000000000000000000000000001111
newDomain <<= 15; // binary: 0000000000001111000000000000000
// note:
// input: 1111111111100000000011111111111
// range: 0000000000001111000000000000000
// merge them! :)
domain |= newDomain;
// -> 1111111111101111000011111111111

And that's how you do that. So this super long switch, or super hot loop, can be turned into a fairly simple line of code:
return input | (((1 << (1 + hi - lo)) - 1) << lo);

Funny how stuff works when you think a little outside the box. The binary stuff on its own has nothing to do with domains or what's semantically going on. Yet it does the job far more efficient than any semantic code could do.

Ah well. Hope you enjoyed it :) Sorry if it bored you.