asm.js primer

2016-07-23

Let me start by stating the obvious: asm.js is NOT designed to be written manually. It was designed to be a "compilation target", the output of translating from one language to another. The syntax of asm.js is a subset of ES5 and contains a lot of mandatory overhead for typing. Additionally, it is very restrictive in typing and memory, so you'll have to do many memory related things yourself. The tradeoff is that this allows browsers to infer types much better and compile the code in a highly efficient manner. Additionally, to some degree, you control the GC: everything lives in one preallocated buffer, so there is nothing for it to collect inside the module.

Do not proceed if you're learning to code, JS or otherwise. You have been warned. There are only a handful of real world applications where you'd want to write asmjs and "if you have to ask", yours probably isn't one of them.

Okay, let's get going. I'm going to try and explain the latest asm.js spec (from 2014, at the time of writing) in a more human-readable form. I hope :)

Module wrapper


All asm.js code is wrapped in a "CommonJS" kind of way, pre-dating ES6 modules, where a function acts as your scope. Everything happens inside that scope. This function receives a global object (your regular window, for example) with all its built-ins, and you can access some of them; when you do, asm.js will know their types and compile accordingly. We'll get back to this.

Code:
// from the asmjs.org website
function MyAsmModule() {
  "use asm";

  //...

  return {
    // functions to expose
  };
}

Nothing spectacular, except that the "use asm"; directive prologue asks the browser to interpret the entire body of this function as an "asm.js module". You don't need to repeat the directive for every function inside the module; that is inferred. The good part, by design, is that the code will run with regular JS semantics if the browser does not support asm.js; in some cases it just won't be as efficient. And even if asm.js is supported but compilation fails, the module should still be available to you, albeit probably a bit slower.
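To see that fallback in practice, here is a minimal sketch of a complete module. The names MinimalModule and add are made up for illustration. Whether or not the engine validates it as asm.js, calling it gives the same results, because the annotations are ordinary JS operators.

```javascript
// A minimal asm.js module; MinimalModule and add are illustrative names.
function MinimalModule() {
  "use asm";

  function add(a, b) {
    a = a | 0; // parameter annotation: a is an int
    b = b | 0; // parameter annotation: b is an int
    return (a + b) | 0; // int + int is intish, so coerce back to signed
  }

  return { add: add };
}

// No stdlib, foreign, or heap is needed, so all three params are omitted.
var minimal = MinimalModule();
minimal.add(2, 3);          // 5, with or without asm.js validation
minimal.add(0x7fffffff, 1); // -2147483648: |0 wraps to 32 bits, as in plain JS
```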

The asm.js module is defined as accepting up to three parameters: stdlib, foreign, and heap. You can use different names, or omit params right-to-left if you want. Inside the module we can interpret the heap as being of various types, but at its core it's just an ArrayBuffer (a fixed-length block of raw bytes). The length has to be a power of 2, so 2^16 works but 300 doesn't. On top of that, the 2014 spec sets a minimum of 2^12 (4096) bytes and, beyond 2^24 (16MB), asks for a multiple of 2^24 instead of a power of 2. So yes, you are responsible for any memory management inside asm.js modules. Creepy.

Code:
function MyAsmModule(stdLib, foreign, heap) {
  "use asm";

  //...

  return {
    // functions to expose
  };
}

Keep in mind that you can't use eval (or arguments) as an identifier anywhere. Also, any ASI rules from regular JS apply.

The browser does not call this module to initialize it; you do that yourself, like let mymod = MyAsmModule(window, funcs, new ArrayBuffer(0x10000));.
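A sketch of that linking step using all three parameters. It passes globalThis as stdlib so it also runs outside a browser (an assumption: any object exposing the real Math and typed array constructors will do), and LinkedModule, report, and squareInto are hypothetical names. The 0x10000 (64KB) heap is a power of 2.

```javascript
// Hypothetical module using stdlib, foreign, and heap.
function LinkedModule(stdlib, foreign, heap) {
  "use asm";

  var imul = stdlib.Math.imul;              // stdlib import
  var report = foreign.report;              // foreign function import
  var HEAP32 = new stdlib.Int32Array(heap); // heap view

  function squareInto(p, x) {
    p = p | 0;
    x = x | 0;
    // store x*x at byte offset p; Int32Array indices are byte offsets >> 2
    HEAP32[p >> 2] = imul(x, x) | 0;
    report(x | 0); // call out to plain JS; signed is a valid argument type
  }

  return { squareInto: squareInto };
}

var reported = [];
var heap = new ArrayBuffer(0x10000);
var linked = LinkedModule(globalThis, {
  report: function (x) { reported.push(x); }
}, heap);

linked.squareInto(8, 12);
new Int32Array(heap)[2]; // 144, written at byte offset 8
```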

Heap


Your asm.js code works on one "heap": one large, fixed, contiguous area of memory for you to do with as you please. The memory is essentially an ArrayBuffer on which you can put certain "views" (TypedArrays) that interpret the memory in formats like int or float.

The heap is like a scratch pad insofar as the buffer persists between calls. You can write input data into it and read output from it by keeping a reference to the buffer outside the module.
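A minimal sketch of that scratch pad pattern, with hypothetical names: the caller keeps its own Int32Array view on the buffer, writes the input there, calls into the module, and reads the result back through the same view.

```javascript
// Hypothetical module: doubles the first n int32 values of the heap in place.
function ScratchModule(stdlib, foreign, heap) {
  "use asm";

  var HEAP32 = new stdlib.Int32Array(heap);

  function doubleAll(n) {
    n = n | 0;
    var p = 0;   // byte offset of the current element
    var end = 0; // byte offset just past the last element
    end = n << 2; // 4 bytes per int32
    while ((p | 0) < (end | 0)) {
      HEAP32[p >> 2] = HEAP32[p >> 2] << 1;
      p = (p + 4) | 0;
    }
  }

  return { doubleAll: doubleAll };
}

var buffer = new ArrayBuffer(0x10000);
var view = new Int32Array(buffer); // external reference to the same memory
var scratch = ScratchModule(globalThis, {}, buffer);

view.set([1, 2, 3, -4]); // write input
scratch.doubleAll(4);
Array.prototype.slice.call(view, 0, 4); // [2, 4, 6, -8]
```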

The size is fixed, and while there was a proposal to allow growing the buffer, it was knifed because heuristics in v8 caused problems. Thanks. The only way to grow now is to create a new buffer, copy the old buffer into it (using .set() on a typed array view, since ArrayBuffer itself has no such method) and call the module function again with that buffer. Note that wasm does allow growing memory.
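A sketch of that manual growth step. growHeap and ReaderModule are hypothetical; the copy goes through Uint8Array views because ArrayBuffer has no set() method. The old module instance keeps working on the old buffer, so the caller has to switch to the newly linked one.

```javascript
// Hypothetical helper: copy an old heap into a bigger one.
function growHeap(oldBuffer, newByteLength) {
  var newBuffer = new ArrayBuffer(newByteLength);
  // ArrayBuffer has no .set(), so copy through byte views
  new Uint8Array(newBuffer).set(new Uint8Array(oldBuffer));
  return newBuffer;
}

// Hypothetical module: read an int32 at a byte offset.
function ReaderModule(stdlib, foreign, heap) {
  "use asm";
  var HEAP32 = new stdlib.Int32Array(heap);
  function read(p) {
    p = p | 0;
    return HEAP32[p >> 2] | 0;
  }
  return { read: read };
}

var small = new ArrayBuffer(0x10000);
new Int32Array(small)[10] = 42; // byte offset 40
var reader = ReaderModule(globalThis, {}, small);

// grow to 128KB and link a fresh module instance on the new buffer
var big = growHeap(small, 0x20000);
var bigReader = ReaderModule(globalThis, {}, big);
bigReader.read(40);      // 42, copied over from the old buffer
bigReader.read(0x10000); // 0, memory that did not exist before
```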

Types


Now for annotating. Pretty much everything must be annotated and there are a few types we can take into account for this:

- void (undefined)
- double (regular js numbers)
- signed (32bit two's complement integers, where the most significant bit signals the sign)
- unsigned (32bit integers without a sign bit)
- int (either signed or unsigned, but unspecified otherwise)
- fixnum (any int that has the same value whether read as signed or unsigned, so any non-negative value that does not use the most significant bit: 0 up to 2^31 - 1)
- intish (the result of an int operation, which must be explicitly cast back to signed or unsigned)
- double? (for operations that produce either a double or undefined, like reading a Float64Array view; these must be cast back to double with +)
- float (32bit floating point number)
- float? (for operations that produce either a float or undefined, like reading a Float32Array view; these must be cast back with fround, or with + for a double)
- floatish (the result of a float operation, which must be explicitly cast back to float with Math.fround)
- extern (any value exposed outside of asmjs scope)

From these, you'll only actively use signed, unsigned, and double. And a lot of that. There are no strings, no arrays, no objects, no higher level types. You're on your own here.

You'll need to explicitly annotate any function arguments, return types, and any use of variables. Annotations take the form of (otherwise redundant) source code from which the environment can determine one of the above types with 100% certainty. In particular:

- signed is forced by foo | 0
- unsigned is forced by foo >>> 0
- double is forced by +foo
- float is forced by fround(foo), where fround is the imported Math.fround
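Since the annotations are plain JS operators, their effect is easy to check. The module below is a sketch with hypothetical names, one function per coercion. An unsigned value can't be returned as is, so toUnsigned hands it back as a double with +.

```javascript
// Hypothetical module: one function per annotation style.
function CoerceModule(stdlib) {
  "use asm";

  var fround = stdlib.Math.fround;

  // double in, signed out: ~~ truncates toward zero, |0 annotates the return
  function toSigned(x) {
    x = +x;
    return ~~x | 0;
  }

  // signed in; >>> 0 reinterprets it as unsigned, + returns it as a double
  function toUnsigned(x) {
    x = x | 0;
    return +(x >>> 0);
  }

  // double in, float out: rounds to the nearest 32bit float
  function toFloat(x) {
    x = +x;
    return fround(x);
  }

  return { toSigned: toSigned, toUnsigned: toUnsigned, toFloat: toFloat };
}

var coerce = CoerceModule(globalThis);
coerce.toSigned(-2.7); // -2
coerce.toUnsigned(-1); // 4294967295
coerce.toFloat(0.1);   // 0.10000000149011612
```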

Number literals


Number literals become a double by having a dot in them (2.0), a fixnum if they are below 2^31, and unsigned if they are between 2^31 and 2^32 - 1, because there's no other way to interpret a positive 32bit number with the most significant bit set. You explicitly get a float with fround(2.0), where fround must be the built-in Math.fround function. Any integer literal beyond 32 bits (signed or unsigned) is deemed syntactically invalid.

You can negate a number literal by prefixing - to it. Without a dot (.), the result is a signed int as long as it fits in 32 bits; with a dot, it's a double; otherwise it's invalid. Note that this only goes for literals: negating an int variable (-x) yields intish, which you still have to cast back with |0.

Variables


Module globals, vars declared in the main scope of the module, can only use a fixed set of initializers:

- as an int: var x = 5;
- as a double: var x = 5.0;
- as a float: var x = fround(5.0);, in this case fround must be the module's import of stdlib.Math.fround and the number must contain a dot
- "library import": var x = stdlib.foo, note that stdlib must be the same name as the first param of the module and foo can only be one of a handful of names
- math import, similar to library import but with Math added: var x = stdlib.Math.floor;
- external imports
-- var x = foreign.foo imports a value as a read-only function
-- var x = foreign.foo|0 imports a value as a read-write int
-- var x = +foreign.foo imports a value as a read-write double
- heap views: var x = new stdlib.Uint16Array(heap), where stdlib and heap are matching module param names and the result is one of a known set of ArrayBufferView views on the module buffer
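Several of these initializers side by side, as a sketch with hypothetical names (GlobalsModule, calls, scale, base): a mutable int global, a double constant, a Math import, a foreign value imported as a mutable int, and a byte view on the heap.

```javascript
// Hypothetical module showing the allowed global initializers.
function GlobalsModule(stdlib, foreign, heap) {
  "use asm";

  var calls = 0;                            // int global, mutable
  var scale = 2.5;                          // double global
  var floor = stdlib.Math.floor;            // Math import
  var base = foreign.base | 0;              // foreign import as read-write int
  var HEAPU8 = new stdlib.Uint8Array(heap); // heap view

  function tick() {
    calls = (calls + 1) | 0;
    HEAPU8[calls & 0xff] = base; // a Uint8Array store keeps only the low 8 bits
    return calls | 0;
  }

  function scaled(x) {
    x = +x;
    return +floor(x * scale);
  }

  return { tick: tick, scaled: scaled };
}

var globalsHeap = new ArrayBuffer(0x10000);
var globals = GlobalsModule(globalThis, { base: 300 }, globalsHeap);
globals.tick();                  // 1
globals.tick();                  // 2
globals.scaled(3.0);             // 7, the floor of 7.5
new Uint8Array(globalsHeap)[2];  // 44, since 300 & 0xff is 44
```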

Variables in a function can only be defined in three different ways:

- as an int: var x = 5;
- as a double: var x = 5.0;
- as a float: var x = fround(5.0);, in this case fround must be the module's import of stdlib.Math.fround and the number must contain a dot

You can't initialize a var with another var or any other kind of expression. I think this restriction is a little silly, but I suppose it reduces the number of combinatorial edge cases.

Functions


Functions are the only place for logic. They can have arguments, each of which must be explicitly cast on the first lines of the function. They can have a return value, whose type can't be left implicit (unlike JS).

The spec doesn't mention var statements as a valid statement type but they are defined above.

All var statements in a function must be grouped at the top of the function, right after the argument annotations. The spec defines this a little obscurely; you can see it in the function type annotations. You can use multiple var statements as long as they are grouped. Note that you'll have to initialize the vars with a literal for typing (see above). You can assign the result of an expression to them after that.
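A sketch of that layout, using a hypothetical sumTo function: parameter annotations first, then the grouped var statements with literal initializers, and only then the body, which may assign anything to those vars.

```javascript
// Hypothetical module illustrating the required order inside a function.
function LayoutModule() {
  "use asm";

  function sumTo(n) {
    // 1. parameter annotations
    n = n | 0;

    // 2. all var statements, grouped, each initialized with a literal
    var i = 0;
    var total = 0.0;

    // 3. the body: vars can now receive any expression, including other vars
    i = n;
    while ((i | 0) > 0) {
      total = total + +(i | 0);
      i = (i - 1) | 0;
    }
    return +total;
  }

  return { sumTo: sumTo };
}

var layout = LayoutModule();
layout.sumTo(100); // 5050
```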

Arguments


Function arguments must be typed explicitly on the first lines of a function.

Code:
//... asmjs header

function foo(a, b) {
  a = a | 0;
  b = fround(b);

  // ...
}

Note that the casting of the parameters is actually part of the syntax for declaring an asm.js function. It can't happen elsewhere in the function and is not otherwise inferred even if it could be.

Returns


Actually, the last example is incomplete: unless a function returns nothing (void), its return type must also be declared, through the form of the return statement. It can take one of five forms:

- return +x; (returns a double)
- return x|0; (returns a signed)
- return 5; and return -5; (integer literals, return a signed)
- return func(x); (only when func is one of a set of standard functions; calls to other functions must be cast)
- return; (void, the function result can not be assigned)

Note that if you return a variable or the result of a defined (non-standard) function, the type is not inferred from that value; you must always cast it explicitly.
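A sketch covering those return forms, with hypothetical names. Note how quarter annotates the call to half with + at the call site; the return type of half is not carried over on its own.

```javascript
// Hypothetical module with one function per return form.
function ReturnsModule() {
  "use asm";

  var last = 0; // int global written by the void function

  // returns a double
  function half(x) {
    x = +x;
    return +(x / 2.0);
  }

  // returns a signed, through integer literals
  function sign(x) {
    x = x | 0;
    if ((x | 0) < 0) return -1;
    if ((x | 0) > 0) return 1;
    return 0;
  }

  // returns nothing (void)
  function remember(x) {
    x = x | 0;
    last = x;
    return;
  }

  // returns a signed, through |0
  function getLast() {
    return last | 0;
  }

  // calls to module functions are annotated at the call site with + or |0
  function quarter(x) {
    x = +x;
    return +(+half(x) / 2.0);
  }

  return { half: half, sign: sign, remember: remember, getLast: getLast, quarter: quarter };
}

var rets = ReturnsModule();
rets.half(5);     // 2.5
rets.sign(-7);    // -1
rets.remember(3); // undefined
rets.getLast();   // 3
rets.quarter(10); // 2.5
```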

Logic


In an asm.js module, regular JS logic like if and blocks can only appear inside functions and is actually illegal in the module global space. From what I can see, most of the regular ES3 syntax is valid, with a touch of ES5. This includes if, if-else, blocks, return, while, do, for(;;) (but NOT for-in!), break, continue, labels, and switch with case and an optional default. Expression statements, basically anything that yields a value, are also allowed.
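A sketch of for, while, break, and continue working on the heap (LogicModule and its functions are hypothetical). Comparisons wrap both operands in |0 because local vars are typed int while < and == want signed operands, and asm.js offers == and != but no ===.

```javascript
// Hypothetical module: simple searches over int32 values on the heap.
function LogicModule(stdlib, foreign, heap) {
  "use asm";

  var HEAP32 = new stdlib.Int32Array(heap);

  // index of the first element equal to needle among the first n, or -1
  function indexOf(needle, n) {
    needle = needle | 0;
    n = n | 0;
    var i = 0;
    var found = -1;
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      if ((HEAP32[(i << 2) >> 2] | 0) == (needle | 0)) {
        found = i;
        break;
      }
    }
    return found | 0;
  }

  // number of strictly positive elements among the first n
  function countPositive(n) {
    n = n | 0;
    var i = 0;
    var count = 0;
    while ((i | 0) < (n | 0)) {
      i = (i + 1) | 0;
      // i already points past the element being checked
      if ((HEAP32[((i - 1) << 2) >> 2] | 0) <= 0) continue;
      count = (count + 1) | 0;
    }
    return count | 0;
  }

  return { indexOf: indexOf, countPositive: countPositive };
}

var logicHeap = new ArrayBuffer(0x10000);
new Int32Array(logicHeap).set([5, -3, 8, 0, 8]);
var logic = LogicModule(globalThis, {}, logicHeap);
logic.indexOf(8, 5);    // 2
logic.indexOf(7, 5);    // -1
logic.countPositive(5); // 3
```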

Switches


Cases in a switch can only have signed integer literals as values. The condition must also yield a signed int.

They are meant to be compiled to a jump table, so keep that in mind. If you use case values that are too big, you'll get an error about this, presumably because the table has to span the whole range of case values. This actually confused me at first because Firefox, at the time, threw "asm.js type error: all switch statements generate tables; this table would be too big". At first I thought that meant there were too many cases. A quick twitter convo revealed that the real cause was case values that were simply way too big; I had misread the error message.

I'm not sure whether default can appear anywhere other than last in a switch, like it can in JS, but who ever does that anyway. There's a good chance you didn't even know about this edge case before reading it now ;) The asm.js spec doesn't rule this out either way, and since JS explicitly allows it but jump tables don't work that way... well, you're on your own here.
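A sketch of a switch that stays on the safe side, with a made-up cost function: the condition is coerced with |0, the case labels are small signed literals close together, and default comes last.

```javascript
// Hypothetical module: map a small opcode to a cost.
function SwitchModule() {
  "use asm";

  function cost(op) {
    op = op | 0;
    var result = 0;
    // the condition is signed, and so is every case label
    switch (op | 0) {
      case 0:
        result = 1;
        break;
      case 1:
      case 2:
        result = 3;
        break;
      case 7:
        result = 10;
        break;
      default:
        result = -1;
    }
    return result | 0;
  }

  return { cost: cost };
}

var switcher = SwitchModule();
switcher.cost(2); // 3
switcher.cost(7); // 10
switcher.cost(5); // -1, handled by default
```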

Expressions


A few particulars;

- Since objects don't exist as a concept, the only property access you can do is array-style indexing on the heap views. The return type of such an access depends on the type of heap view being accessed.
- You can't assign to variables not explicitly defined in the module or the current function
- ~~x is explicitly understood to convert x to signed if it is a double or float? (but not a double?)
- +foo() will be a double if the function returns a double
- There is an explicit list of input-output types for most binary operators. If the listed types are "supertypes", they can be substituted by a valid subtype and yield the same (listed) result type.
- addition and subtraction of ints result in intish, and you can chain up to 2^20 of them before casting back with |0 (on doubles they simply yield a double). Multiplying ints with * is only allowed when one side is a literal below 2^20; otherwise use Math.imul
- x ? y : z is valid and works as in JS, returning the value of either operand, but unlike JS both operands must have the same type
- compound assignments don't work, that's stuff like |= ^= etc., and neither do ++ and --. You have to write them out explicitly.
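A sketch of a few of these rules, with hypothetical names: compound assignments written out in full, an int multiplication through an imported Math.imul, and a ternary whose branches share the int type.

```javascript
// Hypothetical module illustrating a few expression rules.
function ExprModule(stdlib) {
  "use asm";

  var imul = stdlib.Math.imul;

  function mix(a, b) {
    a = a | 0;
    b = b | 0;
    a = a ^ b;           // instead of a ^= b
    a = imul(a, 31) | 0; // int * int goes through Math.imul
    a = a | (b << 4);    // instead of a |= b << 4
    return a | 0;
  }

  // both branches of the ternary are int, so the result is int
  function max(a, b) {
    a = a | 0;
    b = b | 0;
    return ((a | 0) > (b | 0) ? a : b) | 0;
  }

  return { mix: mix, max: max };
}

var expr = ExprModule(globalThis);
expr.mix(3, 5);  // 250, same as ((3 ^ 5) * 31) | (5 << 4) in plain JS
expr.max(3, -7); // 3
```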

Imports


In the header you can have "standard library imports". This means you're assigning a property from the global object you pass in to a variable in the global module space, like so:

Code:
function module(stdlib) {
  "use asm";
  var fround = stdlib.Math.fround;
  // ...
}

The spec explicitly only allows certain properties from the global or global.Math to be imported this way. (Spoiler: Infinity, NaN, and most of Math.)

You can also import things from the foreign argument (the second argument of the module wrapper) in two forms.

- you can import it as is and in that case it becomes an immutable function
- you can import it cast, as foreign.foo|0 (int) or +foreign.foo (double), and then the var will be mutable.

Code:
function module(stdlib, foreign) {
  "use asm";
  var func = foreign.foo; // immutable function
  var a = foreign.a | 0; // mutable int
  var b = +foreign.b; // mutable double
  // ...
}
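A sketch of both kinds of foreign imports at work (ForeignModule, log, count, and ratio are hypothetical names). The |0 and + imports copy the value once, at link time, so changes inside the module don't show up on the foreign object.

```javascript
// Hypothetical module using a foreign function and two mutable foreign values.
function ForeignModule(stdlib, foreign) {
  "use asm";

  var log = foreign.log;         // immutable: can only be called
  var count = foreign.count | 0; // mutable int, copied from foreign.count
  var ratio = +foreign.ratio;    // mutable double, copied from foreign.ratio

  function step() {
    count = (count + 1) | 0;
    ratio = ratio * 0.5;
    log(count | 0, +ratio); // signed and double are valid arguments to JS
    return count | 0;
  }

  return { step: step };
}

var logged = [];
var imports = {
  log: function (c, r) { logged.push([c, r]); },
  count: 10,
  ratio: 1.0
};
var foreignMod = ForeignModule(globalThis, imports);
foreignMod.step(); // 11
foreignMod.step(); // 12
imports.count;     // still 10: the module works on its own copy
```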

Heap


You can declare various ways of accessing the heap, the third parameter of the module wrapper, by wrapping it in one of the valid ArrayBufferView types: Int8Array, Uint8Array, Int16Array, Uint16Array, Int32Array, Uint32Array, Float32Array, or Float64Array.
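One rule that comes with these views: for elements wider than one byte, the index is written as a byte offset shifted right by log2 of the element size (>> 2 for 32bit views, >> 3 for Float64Array), while byte views take the offset as is. A sketch with two views on one buffer, names hypothetical:

```javascript
// Hypothetical module with a byte view and a double view on the same heap.
function ViewsModule(stdlib, foreign, heap) {
  "use asm";

  var HEAPU8 = new stdlib.Uint8Array(heap);
  var HEAPF64 = new stdlib.Float64Array(heap);

  // store a double at byte offset p (expected to be a multiple of 8)
  function storeDouble(p, x) {
    p = p | 0;
    x = +x;
    HEAPF64[p >> 3] = x;
  }

  // read a single byte at byte offset p; byte views need no shift
  function loadByte(p) {
    p = p | 0;
    return HEAPU8[p] | 0;
  }

  // Float64Array loads are double? and need a + coercion
  function loadDouble(p) {
    p = p | 0;
    return +HEAPF64[p >> 3];
  }

  return { storeDouble: storeDouble, loadByte: loadByte, loadDouble: loadDouble };
}

var viewsHeap = new ArrayBuffer(0x10000);
var views = ViewsModule(globalThis, {}, viewsHeap);
views.storeDouble(16, 1.5);
views.loadDouble(16);   // 1.5
views.loadByte(16 + 7); // 63 (0x3f), the top byte of 1.5 on a little-endian machine
```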

Jump tables


The spec allows you to define a function table (a jump table) as a module global. This is kind of an array of functions. Every function in it must have the same signature: the same parameter types and the same return type.

Code:
function mymod() {
  "use asm";

  function a(){ return 5; }
  function b(){ return 20; }

  function go(n) {
    n = n | 0;
    return jmp[ n & 1 ]() | 0;
  }

  // !Important! The jump table goes AFTER functions and BEFORE the export...

  var jmp = [ a, b ];

  return {go:go};
}

I don't think the spec is very clear on these tables, so let me point out some caveats:

- the table can only be declared in the global module space
- the declaration goes after the functions and before the export
- indexing into the table requires masking the index with & and an int literal equal to the table length minus one (& 1 for two functions); | is not allowed
- the tables must be a power of 2, so len=2^n for some n

Note that array literals don't exist as such. The type of jmp is actually an "immutable table".

I think the way to access them is a little confusing. The & 1 way of accessing a function may not look like the |0 you use everywhere else, but keep in mind that & is a bitwise and, not a "logical and". You can do x & 0xff (for a table of 256 functions, 0xff being the length minus one), which coerces x to an int just like |0 but also makes sure the number can't spill over the table length. Since the table length is known at compile time and the right operand must be a literal, the compiler can prove that the table access never exceeds the length.

Code:
jmp[index & 1](); // for a jump table with 2 functions
jmp[index & 1023](); // for a jump table with 1024 functions
jmp[index & 31](); // for a jump table with 32 functions
// etc...

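I would have expected it to be jmp[ n >>> 0 & 0xff ] to make sure the index can't end up negative. But the & already ensures this: the mask is a non-negative literal, so its sign bit is 0 and the result always falls between 0 and the mask, even when n is negative.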
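A sketch of a four-function table (names hypothetical) showing that claim: the mask is 3 (length minus one), and even negative or oversized indices land inside the table.

```javascript
// Hypothetical module with a four-entry function table.
function DispatchModule() {
  "use asm";

  function zero() { return 0; }
  function one() { return 1; }
  function two() { return 2; }
  function three() { return 3; }

  function dispatch(n) {
    n = n | 0;
    // the mask is the table length minus one (4 - 1 = 3)
    return table[n & 3]() | 0;
  }

  // function table: after the functions, before the export
  var table = [zero, one, two, three];

  return { dispatch: dispatch };
}

var dispatcher = DispatchModule();
dispatcher.dispatch(2);  // 2
dispatcher.dispatch(6);  // 2, because 6 & 3 is 2
dispatcher.dispatch(-1); // 3, because -1 & 3 is 3: the mask clears the sign bit
```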

Exports


An asm.js module exposes its functions by returning them in an object literal or by returning a single function, much like most CommonJS modules.

Code:

// either export a single function...
function mymod() {
  "use asm";
  function a(){ return 20; }
  return a;
}

// ...or several, on an object literal
function mymod() {
  "use asm";
  function a(){ return 20; }
  function b(){ return 30; }
  return {foo:a, bar:b};
}

Syntax


Those are the basics of reading and writing a valid asm.js program. Keep in mind that the syntax only accepts a subset of actual JS. In fact, JS means ES5 here, as asm.js predates ES6, so there is no let, const, etc. Everything is var and function.

Validation


There's a validator that's part of the asmjs repo: https://github.com/dherman/asm.js/blob/master/lib/asm.js

Firefox actually gives some sensible feedback when running asmjs code (line numbers are helpful), though some messages are a bit cryptic at times.

And here is a nice validation suite where you can paste your code and it'll tell you what's up: http://turtlescript.github.cscott.net/asmjs.html

Conclusion


The fact that the code works with and without explicit asm.js support is an important selling point. It means you can have asm.js conforming code that'll still work everywhere and, as Chrome is demonstrating, mostly without significant slowdowns.

I think the asm.js syntax does force you to think a little closer about your code. That should never really hurt. Except perhaps your brain.