Functional Programming in SpiderMonkey Redux
Spenser Bauman
In his blog post Two Reasons Functional Programming Is Slow in SpiderMonkey, Shu-yu Guo breaks down some of the issues SpiderMonkey has when optimizing functional style programs. Since I am working on this very issue as a Mozilla intern, I am going to try to give an updated account of what SpiderMonkey does wrong and some of the initial fixes I am working on.
As a running example, I’ll make use of the same benchmark Shu used:
function benchmark(n, iters, f) {
    var outer = [];
    for (var i = 0; i < n; i++) {
        var inner = [];
        for (var j = 0; j < n; j++)
            inner.push(Math.random() * 100);
        outer.push(inner);
    }
    var start = Date.now();
    for (var i = 0; i < iters; i++)
        f(outer);
    return Date.now() - start;
}

Array.prototype.myForEach = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

function doForEach(outer) {
    var max = -Infinity;
    outer.myForEach(function (inner) {
        inner.myForEach(function (v) {
            if (v > max)
                max = v;
        });
    });
}

console.log(benchmark(50, 50000, doForEach));
This post will attempt to explain why the above code is nearly 3 times slower than the imperative equivalent below, and how to modify the JIT to recover most of that lost performance.
function doForEach(outer) {
    var max = -Infinity;
    for (var i = 0; i < outer.length; i++) {
        var inner = outer[i];
        for (var j = 0; j < inner.length; j++) {
            var v = inner[j];
            if (v > max)
                max = v;
        }
    }
}
Unsurprisingly, SpiderMonkey does quite a good job optimizing the second version, producing code where the inner loop operates over contiguous arrays of Doubles without intervening type tests.
This is the kind of low-level code that compilers want us to write for them.
The code generated for the first version is markedly worse due to the
interactions of two shortcomings of SpiderMonkey:
1. Bad inlining heuristics for recursive functions
2. Poor type specialization of certain operations
Problem 1 is a result of the conservative nature of SpiderMonkey, which must
balance code quality with compile time.
SpiderMonkey’s current heuristic for preventing unrolling of recursive functions
will only inline a function once, even if the function is not recursive.
This means that the outer call to myForEach (and its kernel) is inlined into doForEach, and a function call is emitted for the inner call to myForEach. Unfortunately, SpiderMonkey cannot generate optimized code for myForEach without inlining it into the caller, which brings us to problem 2.
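To see why the heuristic fires here even though nothing is recursive, it helps to picture what doForEach looks like after the outer call is inlined. The sketch below is conceptual (the JIT works on its IR, not on source, and doForEach_inlined is a name invented for illustration): the inliner, seeing myForEach already on the inline stack, conservatively treats the inner call as recursion and leaves it as a real call.

```javascript
// Assumed for the sketch: the same myForEach as in the benchmark.
Array.prototype.myForEach = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

// After inlining the outer call, doForEach conceptually becomes:
function doForEach_inlined(outer) {
    var max = -Infinity;
    // Body of the outer myForEach call, inlined by the JIT:
    for (var i = 0; i < outer.length; i++) {
        // The kernel is inlined too, but this inner call to myForEach
        // stays a real call: the inliner sees myForEach already on the
        // inline stack and conservatively treats it as recursion.
        outer[i].myForEach(function (v) {
            if (v > max)
                max = v;
        });
    }
    return max;
}
```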
SpiderMonkey tracks the types of function arguments and the result types of
JavaScript operations and uses that information to generate code specialized to
those types.
This benchmark invokes myForEach on arrays of doubles and on arrays of arrays of doubles, requiring SpiderMonkey to generate code general enough to handle both possibilities.
Annoyingly, a simple change to the code fixes both of these issues: simply duplicate myForEach and use a different version at each call site. Now, SpiderMonkey inlines both calls and has type information specific to the call site.
Array.prototype.myForEachArrayOfArrayOfDouble = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

Array.prototype.myForEachArrayOfDouble = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

function doForEach(outer) {
    var max = -Infinity;
    outer.myForEachArrayOfArrayOfDouble(function (inner) {
        inner.myForEachArrayOfDouble(function (v) {
            if (v > max)
                max = v;
        });
    });
}
This version is only 20% slower than using manual for-loops. While this works, it is unsatisfying and does not generalize well: we expect to use myForEach at many locations with many different types, requiring many duplicates.
C++ solves this problem with templates, which duplicate function bodies and perform type specialization with guidance from the programmer; the ideal solution for JavaScript would solve the problem transparently.
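The duplication can at least be mechanized from user code. One hedged sketch: re-parse a function’s source so each call site gets a distinct function object, which the engine then associates with its own type information. The helper name cloneForCallSite is invented here, the trick only works for functions that close over no outer variables, and whether it actually helps depends on the engine treating the clones as unrelated scripts.

```javascript
// Hypothetical workaround: mechanically clone a function's source so
// each call site gets a distinct function object, and (one hopes) its
// own type sets in the engine. Only valid for closure-free functions.
function cloneForCallSite(f) {
    return new Function("return (" + f.toString() + ");")();
}

var myForEach = function (f) {
    for (var i = 0; i < this.length; i++) {
        f(this[i]);
    }
};

// Two behaviorally identical copies, distinct to the type inferencer:
Array.prototype.myForEachOuter = cloneForCallSite(myForEach);
Array.prototype.myForEachInner = cloneForCallSite(myForEach);
```

Each clone behaves identically, but neither accumulates the mixed type information of the other, which is exactly what the hand-duplicated version above achieves.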
Can we work around the inliner?
As a simple test, turn off the inlining restriction on recursive functions.
Somewhat surprisingly, this does not have the desired effect; though both calls to myForEach are inlined, the benchmark actually slows down. Upon inspection, the generated code for each use of myForEach is not specialized to the type information available from doForEach. In particular, the results of array accesses are not specialized, requiring unbox and type-check operations on every array access. Problem 2 rears its ugly head again.
Walking through the SpiderMonkey code that generates array access instructions, we see that array accesses are compiled based only on the observed results of element accesses: the type of the array itself is never consulted when deciding the result type. By inlining both calls to myForEach into doForEach, sufficient type information should be available to specialize the expected result type of this[i] in myForEach.
How can this refinement be accomplished in SpiderMonkey? SpiderMonkey tracks the set of types which may result from each operation. These sets are generated by observing the results during execution and may be invalidated later. Fortunately, SpiderMonkey tracks, as part of an object’s type, the set of types produced by indexing said object by an integer. For an object type \( o \), this set is denoted as \( \mathrm{index}(o) \). Given the type set for the receiver of an indexing operation, the expected result type can be approximated as
\[ T_{ a[i] } = \bigcup_{o ~ \in ~ T_a} \mathrm{index}(o), \]
where \( T_e \) is the set of result types for some expression \( e \) (this notation is taken from the paper describing SpiderMonkey’s type inference algorithm). Simply, this equation says that the possible result type of indexing an object is the union of its possible index type sets.
Care must be taken when using this refinement: it is not always the case that this computation yields more precise types than the result types recorded for the subscript operation. For that reason, SpiderMonkey only performs the refinement when the computed type set is a subset of the recorded result types (i.e. we take the more precise of the two).
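The union-then-subset rule can be modeled with plain JavaScript sets. This is an illustrative model only, not SpiderMonkey’s actual TypeSet code: refineElementType, the string type names, and the indexOf callback standing in for \( \mathrm{index}(o) \) are all invented for the sketch.

```javascript
// Illustrative model of the refinement, with type names as strings.
// indexOf maps an object type o to its set index(o).
function refineElementType(receiverTypes, recordedResultTypes, indexOf) {
    // T_{a[i]} = union of index(o) over all o in T_a
    var computed = new Set();
    receiverTypes.forEach(function (o) {
        indexOf(o).forEach(function (t) { computed.add(t); });
    });
    // Only adopt the computed set when it refines (is a subset of)
    // what was actually observed for the subscript operation.
    var refines = true;
    computed.forEach(function (t) {
        if (!recordedResultTypes.has(t)) refines = false;
    });
    return refines ? computed : recordedResultTypes;
}
```

For the benchmark’s inner loop, the receiver type set is conceptually { Array&lt;Double&gt; } with index(Array&lt;Double&gt;) = { Double }, so the computed singleton { Double } can replace a recorded result set that is no more precise.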
When this reasoning is incorporated into SpiderMonkey, the forEach example performs as well as the version with duplicated code. This version is still ~20% slower than manual use of for-loops, but that is still a sight better than 300% slower.
So, a combination of judicious inlining and propagation of type information
largely solves the problem.
Sadly, the restrictions on inlining are there for a good reason – turning the
restriction off completely causes significant performance regressions on
benchmarks making use of recursive functions.
Areas to improve upon
This leaves a couple of places where SpiderMonkey can be improved. A low-hanging piece of fruit is to devise a better inlining heuristic that can distinguish between truly recursive functions and functions whose usage merely makes them appear recursive.
A more ambitious target is to improve SpiderMonkey’s code generation in cases where
inlining fails.
The offending benchmark presented here is relatively small, so a deep inlining
is not a problem, but we cannot always rely on getting the ideal output from the
inliner.
Further, this example relies on the outer myForEach call getting inlined into its caller to provide sufficient type information, which will only occur when the caller is also hot. In many cases, a function f is hot and relies on type information from its caller g to produce efficient, type-specialized code.
/*
 * Expensive data processing function, with multiple call sites (other than |g|)
 * which pollute the type information associated with |f|.
 */
function f(data) {
    /* expensive data processing */
    return result;
}

function g() {
    var x = /* some big data set */;
    f(x);
}

g();
Currently, the only way to get the type information from g to f is to compile g and inline f into its call site; this requires the JIT to consider g hot as well, a condition not implied by f being hot. Indeed, g may only be executed once, but its call to f may perform a large amount of work, necessitating compilation. So, a more difficult fix is to improve type information based on the call site without inlining, or to determine when compiling the caller of a hot function is advantageous.
On a final note, it may seem reasonable to apply the optimization described above to general property accesses. The element types are treated as a special property of each object type, so applying this logic to property reads is simple. Alas, this has no observable effect on performance due to an eccentricity of how arrays are special-cased by the type inferencer: arrays are typed based on the types of their elements, while other objects are simply identified by their class.
In SpiderMonkey’s type inferencer, arrays really have types like Array&lt;Double&gt; or Array&lt;String&gt;, whereas all instances of my special array type MyArray will have type MyArray irrespective of the elements therein. This means the type inferencer cannot determine whether some expression always produces a MyArray containing integers. This coarseness ensures that the number of types in a program stays small, but we lose the ability to make important distinctions about the structure of a type in the type system.
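To make the distinction concrete, here is an illustrative sketch; the Array&lt;...&gt; names in the comments are the inferencer’s conceptual labels, not JavaScript syntax, and MyArray is the hypothetical user-defined type from above.

```javascript
var doubles = [1.5, 2.5];     // conceptually Array<Double>
var strings = ["a", "b"];     // conceptually Array<String>

function MyArray() { this.length = 0; }

var custom = new MyArray();   // conceptually just MyArray
custom[0] = 1.5;              // still MyArray: the element types are
custom.length = 1;            // not part of the inferred type
```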