Creating a 1k brainfuck demo

2010-09-04

Even though I can't compete myself, I wanted to submit a demo anyways. But I'm not a graphics whiz, those old skool canvas demo's are simply out of my league. Being the organizer of the contest gave me a good excuse not to compete. But at some point @mathias on IRC mentioned a Brainfuck interpreter. I told him he'd better make one or I'd do so. He didn't so I did :)

Brainfuck is a little Turing language with just eight different instructions. It has a code pointer, a data pointer and a stack. The stack has 30.000 elements that wrap around 255 (one byte per element). The stack pointer wraps too (so moving to the next pointer from 30.000 gives you 0). Whenever I say "the current element" I really mean the element the data pointer currently points to.

> moves your data pointer to the next element
< moves your data pointer to the previous element
+ increments the value of the current element
- decrements the value of the current element
, reads one byte of input (usually stdin) and stores it in the current element (replacing whatever value was there)
. outputs the value of the current element as an ascii character of the same value
[ moves the code pointer to the next _corresponding_ closing square bracket (]) if the value of the current element is zero, otherwise does nothing
] moves the code pointer to the previous _corresponding_ opening square bracket ([) if the value of the current element is NOT zero, otherwise does nothing

I want to make clear that the [] actually nest. This wasn't apparent to me at first and caused about two days of wondering why online demo's kept b0rking. At some point I did figure out that simply getting the first left or right bracket wasn't enough and things went great.

Note that this explanation is actually about the first submission (here). My second version includes a much better GUI, some bug fixes and is a meta quine: running the source as a program gives almost the same result :) The extra's were possible thanks to jscrush, by @aivopaas.

Getting down to the code, this is the nitty gritty of the interpreter:

Code: (JS..ish)

function branch(I,J){
while(x|m[c]!=I&c>J|m[c+J])
x+=(G=m[J?c--:c++])==(J?C:B)||G==I&&-1
}
function step(){
!(b=m[c])?stop():
g[Z]=m[L='substr'](0,c)+'<u>'+b+'</u>'+m[L](c+1),

R=s[d]|=x=0,
b=='>'&&++d,
b=='<'&&--d,
b=='+'&&++s[d],
b=='-'&&--s[d],
b=='.'&&(j[Z]+=S(R)+(++k%80?'':'\n')),
//b==','&&(n?((s[d]=n.charCodeAt(0)),n=n[L](1)):s[d]=0),
b==','&&(s[d]=n.charCodeAt(0)|0,v[11][Z]=n=n[L](1))
b==(B='[')&!R&&branch(C,0,++c),
b==(C=']')&&R&&branch(B,1,--c),

v[5][Q]=m[v[4][Z]=c++], // code (increase after last usage)
v[7][Q]=S(s[v[6][Z]=d&=32767]&=255),
v[8][Z]=++o, // steps
v[9][Q]=s.join()
}

What does this do? As you can see for the most part, each instruction has a simple expression. d is the data pointer, while c is the code pointer and s is the stack (where the data pointer points to).

The second function is the step function. Every call executes one step of the interpreter. The branch function extracts the while statement (required for [ and ]) and allows me to create a single expression from the entire step function. All variables are global for this exercise, except for those used as parameters.

So first the step function takes the next byte from the source input (c points to the correct byte of the source, m contains our source). The result is cached (one byte per reference, instead of four). If there is no such input, the stop function is called and the interpreter stops. Otherwise the interpreter core runs.

Note that I go to great lengths to avoid statements in this function. In fact, it is one big statement. This prevents me from having to return anything. The ternary operator either quits the interpreter or it executes one entire step.

First the GUI source field is updated with a visual indicator of where the code pointer is currently at. This takes up painfully many bytes, but I feel they are worth it. Note that I cache 'substr' here because I know for sure this is the first place it is used in the program. After that it's available as a global as L.

Next the current element of the stack is casted. Arrays in JavaScript are simply objects, every position in the array is undefined, instead of a number. But doing math on undefined results in NaN for most operations. So instead of casting the value at every use, I make sure it's a number here by calling s[d]|=0. This causes the current element to be binary ORed with 0. In JavaScript, ORing any value will result in at least a number. If NaN would be produced, zero is returned instead. We make use of that feature here. If s[d] was in fact already a number, it is unchanged. The result is also cached in R, again because that's one byte versus four.

The next lines should be obvious. I mentioned previously that if then else can be replaced by the ternary operator (?:) or && and ||. I make use of that here, explicitly trying to avoid a statement. The comma operator is stronger than the && operator so I can safely use this pattern. Since b will only match one character per step call I don't have to worry about using else. It's just slightly less efficient this way.

The first two either increment or decrement the data pointer. Nothing fancy. The third and fourth increment or decrement the data value. Again, nothing fancy here.

The next line, checking for b=='.', needs to output a byte. This uses the static String.fromCharCode function to create convert the number into a character, which can be stored in one variable (S) because it is static. j is the output element in the GUI and Z is 'innerHTML'. The character is appended, which is considered to be really slow, but we don't care about that here. Since Brainfuck normally outputs to stdout and 80 character column mode is still considered normal I decided to wrap output if it exceeds 80 bytes. Note that you won't see that right now because that element does not have whitespace set to pre right now, but I'll probably fix that later. k tracks the output length (shorter than polling the actual length) and inserts a return every 80 chars. Since we use ++k the return is not added on the first character.

Moving on to input, the interpreter accepts input through one of the input elements from the GUI. It caches this value (so you won't have to re-enter it after a reset) in the variabel n. If no more characters exist, the value zero is returned, marking the end of the input. Since s[d] is the current element, the result of n.charCodeAt(0)|0 is stored there. If n is an empty string (because we've eaten it all), charCodeAt will return undefined, so we OR the value with zero. Come to think about it, since we cast elements to zero at the beginning of each step, this seems to be a redundant step right now (except for showing undefined in the GUI after such a step).

The temporary input element of the GUI is updated with the remaining input string, which is a substring of n from the second character onwards. The whole sub-expression needs to be embraced by brackets because otherwise they would not be considered a single sub-expression after the &&. There's no way around this.

Now we get to the [ and ]. These are actually the biggest problem for the interpreter size because they require an iteration. You can't just take the .indexOf('[',c) because that doesn't take care of nesting. You need to check bytes and remember state, only stopping when you know the current closing bracket is the one you need. Now this was a minor problem for me because all the iteration tools in JavaScript are in fact a statement. I needed this part to be an expression. So I wrote it into a function. At first I wrote two functions, one for each side, but later I merged them up to the point you see now in the branch.

This function takes two parameters. The first is either the '[' or ']' value. This is a cached value and prevents us from having to repeat the ternary expression (J?B:C). The second parameter is a boolean (1 or 0) indicates which instruction we is executed right now. This affects the direction of searching. The while has a condition that starts with x|m[c]. This checks whether the current character is the one we're looking for and whether we need to skip it. The x is our counter and basicaly counts scopes. Note that x is reset from the step function, to save one byte. The next part, I&c>J|m[c+J] reads like (I&(c>J))|(m[c+J]), so:

Code: (JSish)
if ((direction=='left' && c > 1) || source[codePointer+1]) ..

Basically, when going left, don't go beyond the start of the input. This interpreter doesn't throw errors if that happens. It just happily ignores that fact. When going right, the end of the input should not be exceeded. So the left part of the while condition checks whether we have the character we are scanning for, the right part checks whether the current position is still within the source. Loop as long as left is not the case and while right is the case. Note that the inverse is impossible, because if we find our target we can never be out of bounds.

The body of the whole is a complex expression, x+=(G=m[J?c--:c++])==(J?C:B)||G==I&&-1. What actually happens here is tracking nesting depth. Since the while header checked whether the current character is our target (and it wasn't), we assume this as a fact. So if we encounter a square bracket, we increment or decrement the counter (x) depending on the direction of searching (J). Obviously, the result is added to x through x+=. The following expression (G=m[J?c--:c++])==(J?C:B) checks whether the current character, cached in G is the same as the initial character C or B, again depending on direction J. Note that at the same time we increment or decrement the code pointer, to save space. If the condition is true, it is returned due to the ||. Booleans are coerced to zero or one, so x will get incremented. Otherwise we check whether it is the opposite bracket, G==I. We sneakily pass on this char as a parameter to save space (was the most efficient way). If this is true -1 will be returned, otherwise 0, and added to x. And that's how we track nesting depth for both ways. Quite an expensive instruction though.

Note that the c>=Y|m[c+Y] is actually rewritten from m[c+(Y?-1:1)] (which is equivalent to Y?c>=0:m[c+1]), saving two bytes.

While working on the second version and testing the quine-ish-ness it appeared that this code is not functioning entirely correct. The condition for the while is now x||m[c]!=I&&c>=0&&m[c+J].

Back to the step function, the remaining lines updates the GUI. v refers to document.body.children. We use numbered indices, which can be tricky, to be short. The first line sets the code pointer and the code value. The second line does the same trick for data pointer and data value, but this line also binds the limits of the data pointer AND data value. I simply use a binary AND to make sure the pointer doesn't exceed 32K values (okay, this is about 2k more than specced, but implementations differ in this regard anyways) and the value doesn't exceed 255. I used 32K because I use the AND operator and I need a number with all ones. 30k is not all ones, so I use 32k. Note that this operation also clears out any negative numbers. The reason I do this here is because I would otherwise need to do it twice above. Since this is the first moment after the instructions to use this value, after they MIGHT have changed, it's perfect to prevent duplicate usage.

The last two lines should be obvious, setting the number of steps and printing the contents of the array. Note that yes, this will get ugly if you wrap around the stack and set the last number... Big deal.

Okay, I won't cover the rest of the program in the same detail. Instead I'll just highlight whatever I deem interesting.

The HTML literal has been drastically stripped, starting out as 400 bytes, including styles and whatever, I dumbed it down to this. I make use of the <u>, <q>, <b> and <input> tags and one <textarea> for the inputting the source. The default styles for these tags are taken for granted (a larger source input or stack input is useful though). Note that I use tabs instead of spaces here, the original (and hopefully future) version set document.body.style.whiteSpace to pre. Same reason the \n are still in.

I obviously try to embed variable initialization into the first usage of that value. I do this to the extreme for the functions. The whole demo uses five functions. Only one of them isn't immediately assigned or even invoked (the while loop function). The step function is bound to the step element in the GUI at the same time as when it is invoked in the run function. Now that I write about it, this does make the step button useless without first pressing run. Damnit.

Anyways, The start and stop functions will update the display style for several elements and either start a timer for step or stop the last timer. The stop function in fact calls the start function, while having the global Y set to 1, causing it to stop the timer rather than start it. The stop function is immediately invoked upon initialization, straightening up the GUI, because q and input elements don't have display:block by default. The assignment and execution of the start and stop functions is in the call brackets for running the reset function, after assigning/attaching it.

In the reset function, s=[Y=o=c=d=k=0]; initializes certain variables while putting the result in the array. The array doesn't need initialization but doesn't suffer from it either (due to it getting zeroed out at the start of each step). This move saves me one byte (the semi-colon).

The timer variable l is zeroed out when setting the onclick handle for R in the GUI, because that's the only zero used in the initialization phase. And I can't set it in the reset function before I use it, because the first usage is to clear the existing timer.

The entire demo (note, this is version 2), at the end of my manual minification tuning process is this (including some rewritten snippets, commented out):

Code: (JS)
//function A(){x=0;while(((x|=0)||m[c]!=']')&&m[c+1])x+=(G=m[c++])=='['||G==']'&&-1}function B(){x=0;while(((x|=0)||m[c]!='[')&&m[c-1])x+=(G=m[c--])==']'||G=='['&&-1}
//function A(_,I){x=0;while(((x|=0)||m[c]!=(I?'[':']'))&&m[c+(I?-1:1)])x+=(G=m[(I?c--:c++)])==(I?']':'[')||G==(I?'[':']')&&-1}function B(){A(0,1)}
',[.';
function A(I,J){//J=0:jump back, J=1:jump forward,I=(J?B:C)
while(x||m[c]!=I&&c>=0&&m[c+J])
// g[Z]=m[L='substr'](0,c)+'<u style="background:#ccc;">'+b+'</u>'+m[L](c+1), // update position in while
// v[4][Q] = x+" "+c+" "+I+" "+J+" "+(c>=J|m[c+J])+" ", // debug in stack output
x+=(G=m[J?c--:c++])==(J?C:B)||G==I&&-1
}

/*
m[c+(Y?-1:1)]
Y?c>=0:m[c+1]
c>=Y|m[c+Y]
*/
//function E(){H(f,M);H(g,N);H(a,M);H(e,O);l=setInterval(D,5)}function F(){H(f,N);H(g,M);H(a,O);H(e,M);T(l)}
//function E(_,I){H(f,I?N:M);H(g,I?M:N);H(a,I?O:M);H(e,I?M:O);I?T(l):l=setInterval(D,5)}function F(){E(0,1)}
//function E(){H(f,Y?N:M);H(g,Y?M:N);H(a,Y?O:M);H(v[11],Y?M:O);Y?T(l):l=setInterval(D,5)}function F(){Y=E(Y=1)}

z=document.body,T=clearInterval,S=String.fromCharCode;
z[Z='innerHTML']='<pre><u>Reset</u> <u>Step</u> <u>Start</u> <u>Stop</u>\nStack: <input> Steps: <b></b>\nCode[<b></b>]=<b></b> Data[<b></b>]=<b></b><input><b></b><textarea></textarea><q></q><q></q></pre>';
v=z[v='children'][0][v];

a=v[10]; // input input
a[Q='value']='Hi JS1K!';

f=v[12]; // source
f[Q]='+><,[.[-],]';
//f[Q]='++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.[,.]';

g=v[13]; // input locator

E=v[1][z='onclick']=
// step
function(){
!(b=m[c])?F():
// g[Z]=m,
g[Z]=m[L='substr'](0,c)+'<u style=background:red>'+b+'</u>'+m[L](c+1),

g[K][X]=N, // show source position when stepping
R=s[d]|=x=0,
b=='>'&&++d,
b=='<'&&--d,
b=='+'&&++s[d],
b=='-'&&--s[d],
//b=='-'&&(s[d]=0),
b=='.'&&(j[Z]+=S(R)+(++k%80?'':'\n')), // output
//b==','&&(n?((s[d]=n.charCodeAt(0)),n=n[L](1)):s[d]=0),
b==','&&(s[d]=n.charCodeAt(0)|0,v[11][Z]=(n=n[L](1))||' ') // input
b==(B='[')&!R&&A(C,0,++c),
b==(C=']')&&R&&A(B,1,--c),

++c,
v[7][Z]=b, // code (increase after last usage)
v[9][Z]=S(s[v[8][Z]=d&=32767]&=255), // data
v[4][Q]=s.join(), // stack
v[5][Z]=++o // steps
};

// reset function (also an event when editing input/source and pressing the R link)
(f[$='onkeyup']=a[$]=v[l=0][z]=function(){
T(l);
s=[Y=o=c=d=k=0];
n=a[Q];
m=f[Q];
F((j=v[14])[Z]='')
//i=-1;while((i+=80)<m.length)m=m[L](0,i)+'\n'+m[L](i);
})( // reset is immediately invoked
// start and stop (E) func defs and event hooking (as the call arguments for invoking reset, note that they execute first, then the reset, which is no problem)
v[3][z]=F=function(){Y=(v[2][z]=function(){
//function H(I,J){I[K].display=J}H(f,Y?N:M);H(g,Y?M:N);H(a,Y?O:M);H(v[11],Y?M:O);
f[K='style'][X='display']=a[K][X]=Y?N='block':M;
g[K][X]=v[11][K][X]=Y?M='none':N;

Y?T(l):l=setInterval(E,1)
})(Y=1)}
)

The last interesting bit I wanted to point out is that the source code position snippet simply replaces innerHTML, inserting a new tag at the proper position. The tag, <u>, gets a background color by using the css background shorthand, the shortest color (red) and no quotes because they are optional in HTML. Hurray, for once.

The entire original code before I started minification is like this (barring errors I fixed later):

Code: (JS)
document.body.innerHTML =
'<pre>'+
'<a href="javascript:reset();">reset</a> '+
'<a href="javascript:step();">step</a> '+
'<a href="javascript:start();">start</a> '+
'<a href="javascript:stop();">stop</a>\n'+
'Code[<span id="cp"></span>] = <input id="code" style="width:30px; border:0;"> '+
'Data[<span id="dp"></span>] = <input id="data" style="width:30px; border:0;">\n'+
'Stack <input style="width: 692px; border:0;" id="stack"></textarea>\n'+
'Input <input id="input" style="width: 692px;" value="terriblewalruswalruspneumaticfluffle"/>'+
'<span id="inputspan" style="font-family:monospace; display:none;"></span>\n'+
'</pre>'+
'<div style="float:left; width: 645px; font-size:11px;">Source</div><div>Output</div>'+
'<pre id="inputlocator" style="float:left; margin: 0; width: 640px; font-size:11px;"></pre>'+
'<textarea style="float:left; height: 200px; width: 640px;" id="source" onkeyup="reset()">[,.]</textarea>'+
'<pre style="min-height: 200px; width: 640px; border: 1px solid black; border-bottom: 0; border-right: 0; float: left; white-space: pre; font-family: monospace; margin-top: 0; padding: 2px; font-size:11px;" id="output"></pre>'+
'';

var ia = document.getElementById('code');
var ic = document.getElementById('cp');
var id = document.getElementById('data');
var is = document.getElementById('dp');
var ii = document.getElementById('input');
var si = document.getElementById('inputspan');
var ts = document.getElementById('stack');
var ti = document.getElementById('source');
var to = document.getElementById('output');
var il = document.getElementById('inputlocator');

var c = 0; // code pointer
var d = 0; // data pointer
var s = []; // stack
var tn = 0; // output len
var tel = 0;
var timer;
var pleaseStop = false;
var startsource;
var startinput;
function reset(){
c=d=tn=tel=0;
s=[];
clearInterval(timer);
startsource = ti.value;
startinput = ii.value;
startsource = startsource.split('').filter(function(c){return c!=' '&&['<','>','+','-',',','.','[',']'].indexOf(c)>=-1}).join('');
to.innerHTML = '';
var i=-1;
while ((i+=80) < startsource.length) startsource = startsource.substring(0,i)+'\n'+startsource.substring(i);
}
function step(){
il.innerHTML = startsource.substring(0,c)+'<u style="background-color:#ccc;">'+startsource[c]+'</u>'+startsource.substring(c+1);
if (c>=startsource.length || pleaseStop) {
pleaseStop = false;
return stop();
}
s[d] |= 0;
b = startsource[c];
if (b == '>') {
if (d == 30000) d = 0;
else ++d;
}
if (b == '<') {
if (!d) d = 30000;
else --d;
}
if (b == '+') {
if (s[d] == 255) s[d] = 0;
else ++s[d];
}
if (b == '-') {
if (!s[d]) s[d] = 255;
else --s[d];
}
if (b == '.') {
++tn;
to.textContent += String.fromCharCode(s[d]) + (tn % 80 ? '' : '\n');
}
if (b == ',') {
if (startinput) {
s[d] = String.charCodeAt(startinput[0]);
si.innerHTML = startinput = startinput.substring(1);
} else {
s[d] = 0;
}
}
if (b == '[' && !s[d]) {
var x=0;
++c;
while ((x || startsource[c] != ']') && c<startsource.length) {
if (startsource[c] == '[') ++x;
else if (startsource[c] == ']') --x;
++c;
}
}
if (b == ']' && s[d]) {
var x=0;
--c;
while ((x || startsource[c] != '[') && c>=0) {
if (startsource[c] == ']') ++x;
else if (startsource[c] == '[') --x;
--c;
}
}
ic.innerHTML = c;
ia.value = startsource[c];
id.value = String.fromCharCode(s[d]|0);
is.innerHTML = d;

ts.value = s.join(',');
/*
len = ts.value+b+"("+c+","+d+") "+s.join(',')+'\n';
len = len.substring(len.length-5000);
ts.value = len;
ts.scrollTop = ts.scrollHeight;
//*/
++c;
};
function start(){
ti.style.display = 'none';
il.style.display = 'block';
ii.style.display = 'none';
si.style.display = 'inline';
timer = setInterval(step,5);
}
function stop(){
ti.style.display = 'block';
il.style.display = 'none';
ii.style.display = 'inline';
si.style.display = 'none';
clearTimeout(timer);
}

reset();

As you can see, a few minor features were stripped for size. Most were kept though :) I think the most notable the source characters are now not filtered and the source is not force wrapped at 80 chars.

While I don't know any other important things to notice here, I think I've said quite enough already :) I hope you enjoyed this read through and hope it helped you. Good luck in the js1k contest!