Functional Programming in SpiderMonkey Redux
Spenser Bauman
In his blog post Two Reasons Functional Programming Is Slow in SpiderMonkey, Shu-yu Guo breaks down some of the issues SpiderMonkey has when optimizing functional style programs. Since I am working on this very issue as a Mozilla intern, I am going to try to give an updated account of what SpiderMonkey does wrong and some of the initial fixes I am working on.
As a running example, I’ll make use of the same benchmark Shu used:
function benchmark(n, iters, f) {
    var outer = [];
    for (var i = 0; i < n; i++) {
        var inner = [];
        for (var j = 0; j < n; j++)
            inner.push(Math.random() * 100);
        outer.push(inner);
    }
    var start = Date.now();
    for (var i = 0; i < iters; i++)
        f(outer);
    return Date.now() - start;
}

Array.prototype.myForEach = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

function doForEach(outer) {
    var max = -Infinity;
    outer.myForEach(function (inner) {
        inner.myForEach(function (v) {
            if (v > max)
                max = v;
        });
    });
}

console.log(benchmark(50, 50000, doForEach));
This post will attempt to explain why the above code is nearly 3 times slower than the imperative equivalent below, and how to modify the JIT to recover most of that lost performance.
function doForEach(outer) {
    var max = -Infinity;
    for (var i = 0; i < outer.length; i++) {
        var inner = outer[i];
        for (var j = 0; j < inner.length; j++) {
            var v = inner[j];
            if (v > max)
                max = v;
        }
    }
}
Unsurprisingly, SpiderMonkey does quite a good job optimizing the second version, producing code where the inner loop operates over contiguous arrays of Doubles without intervening type tests.
This is the kind of low-level code that compilers want us to write for them.
The code generated for the first version is markedly worse due to the
interactions of two shortcomings of SpiderMonkey:
1. Bad inlining heuristics for recursive functions
2. Poor type specialization of certain operations
Problem 1 is a result of the conservative nature of SpiderMonkey, which must
balance code quality with compile time.
SpiderMonkey’s current heuristic for preventing unrolling of recursive functions
will only inline a function once, even if the function is not recursive.
This means that the outer call to myForEach (and its kernel) is inlined into doForEach, and a function call is emitted for the inner call to myForEach. Unfortunately, SpiderMonkey cannot generate optimized code for myForEach without inlining it into the caller, which brings us to problem 2.
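To see why the heuristic fires here even though nothing is recursive, it helps to picture what doForEach looks like after the outer call is inlined. The sketch below is conceptual (the JIT works on its IR, not on source, and doForEach_inlined is a name invented for illustration): the inliner, seeing myForEach already on the inline stack, conservatively treats the inner call as recursion and leaves it as a real call.

```javascript
// Assumed for the sketch: the same myForEach as in the benchmark.
Array.prototype.myForEach = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

// After inlining the outer call, doForEach conceptually becomes:
function doForEach_inlined(outer) {
    var max = -Infinity;
    // Body of the outer myForEach call, inlined by the JIT:
    for (var i = 0; i < outer.length; i++) {
        // The kernel is inlined too, but this inner call to myForEach
        // stays a real call: the inliner sees myForEach already on the
        // inline stack and conservatively treats it as recursion.
        outer[i].myForEach(function (v) {
            if (v > max)
                max = v;
        });
    }
    return max;
}
```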
SpiderMonkey tracks the types of function arguments and the result types of
JavaScript operations and uses that information to generate code specialized to
those types.
This benchmark invokes myForEach on arrays of doubles and on arrays of arrays of doubles, requiring SpiderMonkey to generate code general enough to handle both possibilities.
Annoyingly, a simple change to the code fixes both of these issues: simply duplicate myForEach and use a different version at each call site. Now, SpiderMonkey inlines both calls and has type information specific to the call site.
Array.prototype.myForEachArrayOfArrayOfDouble = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

Array.prototype.myForEachArrayOfDouble = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

function doForEach(outer) {
    var max = -Infinity;
    outer.myForEachArrayOfArrayOfDouble(function (inner) {
        inner.myForEachArrayOfDouble(function (v) {
            if (v > max)
                max = v;
        });
    });
}
This version is only 20% slower than using manual for-loops. While this works, it is unsatisfying and does not generalize well: we expect to use myForEach at many locations with many different types, requiring many duplicates.
C++ solves this problem with templates, which duplicate function bodies and perform type specialization with guidance from the programmer; the ideal solution for JavaScript would solve the problem transparently.
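The duplication can at least be mechanized from user code. One hedged sketch: re-parse a function’s source so each call site gets a distinct function object, which the engine then associates with its own type information. The helper name cloneForCallSite is invented here, the trick only works for functions that close over no outer variables, and whether it actually helps depends on the engine treating the clones as unrelated scripts.

```javascript
// Hypothetical workaround: mechanically clone a function's source so
// each call site gets a distinct function object, and (one hopes) its
// own type sets in the engine. Only valid for closure-free functions.
function cloneForCallSite(f) {
    return new Function("return (" + f.toString() + ");")();
}

var myForEach = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

// Two behaviorally identical copies, distinct to the type inferencer:
Array.prototype.myForEachOuter = cloneForCallSite(myForEach);
Array.prototype.myForEachInner = cloneForCallSite(myForEach);
```

Each clone behaves identically, but neither accumulates the mixed type information of the other, which is exactly what the hand-duplicated version above achieves.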
Can we work around the inliner?
As a simple test, turn off the inlining restriction on recursive functions.
Somewhat surprisingly, this does not have the desired effect; though both calls to myForEach are inlined, the benchmark actually slows down. Upon inspection, the generated code for each use of myForEach is not specialized to the type information available from doForEach. In particular, the results of array accesses are not specialized, requiring unbox and type-check operations on every array access. Problem 2 rears its ugly head again.
Walking through the SpiderMonkey code that generates array access instructions, we see that array accesses are compiled based only on the observed results of element accesses: the type of the array itself is never consulted when deciding the result type. By inlining both calls to myForEach into doForEach, sufficient type information should be available to specialize the expected result type of this[i] in myForEach.
How can this refinement be accomplished in SpiderMonkey? SpiderMonkey tracks the set of types which may result from each operation. These sets are generated by observing the results during execution and may be invalidated later. Fortunately, SpiderMonkey tracks, as part of an object’s type, the set of types produced by indexing said object by an integer. For an object type \( o \), this set is denoted as \( \mathrm{index}(o) \). Given the type set for the receiver of an indexing operation, the expected result type can be approximated as
\[ T_{ a[i] } = \bigcup_{o ~ \in ~ T_a} \mathrm{index}(o), \]
where \( T_e \) is the set of result types for some expression \( e \) (this notation is taken from the paper describing SpiderMonkey’s type inference algorithm). Simply, this equation says that the possible result type of indexing an object is the union of its possible index type sets.
Care must be taken when using this refinement: it is not always the case that this computation yields more precise types than the result types recorded for the subscript operation. For that reason, SpiderMonkey only performs the refinement when the computed type set is a subset of the recorded result types (i.e. we take the more precise of the two).
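The union-then-subset rule can be modeled with plain JavaScript sets. This is an illustrative model only, not SpiderMonkey’s actual TypeSet code: refineElementType, the string type names, and the indexOf callback standing in for \( \mathrm{index}(o) \) are all invented for the sketch.

```javascript
// Illustrative model of the refinement, with type names as strings.
// indexOf maps an object type o to its set index(o).
function refineElementType(receiverTypes, recordedResultTypes, indexOf) {
    // T_{a[i]} = union of index(o) over all o in T_a
    var computed = new Set();
    receiverTypes.forEach(function (o) {
        indexOf(o).forEach(function (t) { computed.add(t); });
    });
    // Only adopt the computed set when it refines (is a subset of)
    // what was actually observed for the subscript operation.
    var refines = true;
    computed.forEach(function (t) {
        if (!recordedResultTypes.has(t)) refines = false;
    });
    return refines ? computed : recordedResultTypes;
}
```

For the benchmark’s inner loop, the receiver type set is conceptually { Array&lt;Double&gt; } with index(Array&lt;Double&gt;) = { Double }, so the computed singleton { Double } can replace a recorded result set that is no more precise.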
When this reasoning is incorporated into SpiderMonkey, the forEach example performs as well as the version with duplicated code. This version is still ~20% slower than manual use of for-loops, but that is still a sight better than 300% slower.
So, a combination of judicious inlining and propagation of type information
largely solves the problem.
Sadly, the restrictions on inlining are there for a good reason – turning the
restriction off completely causes significant performance regressions on
benchmarks making use of recursive functions.
Areas to improve upon
This leaves a couple of places where SpiderMonkey can be improved. A low-hanging piece of fruit is to devise a better inlining heuristic that can distinguish between truly recursive functions and functions whose usage merely makes them appear recursive.
A more ambitious target is to improve SpiderMonkey’s code generation in cases where
inlining fails.
The offending benchmark presented here is relatively small, so a deep inlining
is not a problem, but we cannot always rely on getting the ideal output from the
inliner.
Further, this example relies on the outer myForEach call getting inlined into its caller to provide sufficient type information, which will only occur when the caller is also hot. In many cases, a function f is hot and relies on type information from its caller g to produce efficient, type-specialized code.
/*
 * Expensive data processing function, with multiple call sites (other than |g|)
 * which pollute the type information associated with |f|.
 */
function f(data) {
    /* expensive data processing */
    return result;
}

function g() {
    var x = /* some big data set */;
    f(x);
}

g();
Currently, the only way to get the type information from g to f is to compile g and inline f into its call site; this requires the JIT to consider g hot as well, a condition not implied by f being hot. Indeed, g may only be executed once, but its call to f may perform a large amount of work, necessitating compilation. So, a more difficult fix is to improve type information based on the call site without inlining, or to determine when compiling the caller of a hot function is advantageous.
On a final note, it may seem reasonable to apply the optimization described above to general property accesses. The element types are treated as a special property of each object type, so applying this logic to property reads is simple. Alas, this has no observable effect on performance due to an eccentricity of how arrays are special-cased by the type inferencer: arrays are typed based on the types of their elements, while other objects are simply identified by their class.
In SpiderMonkey’s type inferencer, arrays really have types like Array&lt;Double&gt; or Array&lt;String&gt;, whereas all instances of my special array type MyArray will have type MyArray irrespective of the elements therein. This means the type inferencer cannot determine whether some expression always produces a MyArray containing integers. This coarseness ensures that the number of types in a program stays small, but we lose the ability to make important distinctions about the structure of a type in the type system.
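To make the distinction concrete, here is an illustrative sketch; the Array&lt;...&gt; names in the comments are the inferencer’s conceptual labels, not JavaScript syntax, and MyArray is the hypothetical user-defined type from above.

```javascript
var doubles = [1.5, 2.5];     // conceptually Array<Double>
var strings = ["a", "b"];     // conceptually Array<String>

function MyArray() { this.length = 0; }

var custom = new MyArray();   // conceptually just MyArray
custom[0] = 1.5;              // still MyArray: the element types are
custom.length = 1;            // not part of the inferred type
```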