JaegerMonkey – Fast JavaScript, Always!

February 26th, 2010 by dvander

Mozilla’s JavaScript optimizer, TraceMonkey, is pretty powerful. It carefully observes loops and converts them to super-fast assembly. We call this “tracing”.

That’s great and all, but there’s a problem: sometimes tracing doesn’t work. Loops can throw curveballs that cause tracing to stop. With recursion or deep nesting in particular, it can be very difficult to build good traces for complex code.

Other JavaScript engines, such as Nitro (present in WebKit/Safari), take a simpler approach. Instead of compiling loops to assembly, they compile entire methods (functions) to assembly. The generated code is much more generic than what tracing produces, so while it is not as fast, it can handle any curveball.

What we’ve found is that when tracing works, we’re faster than the generic approach. But when tracing fails, we have to fall back to our old-school interpreter. At that point your JavaScript runs about as fast as it would in 2007-2008 (i.e. before Firefox 3.5, Safari 4, Chrome, etc).

That’s not acceptable, and we need to fix that. Trace compilation is still an active area of research (one which we’ll continue to work on) – but in the interim, we need to make sure our “slow path” is at least as good as the competition.

The question we’ve been asked, and have been asking ourselves, is: Why couldn’t we trace and keep going SUPER AWESOME FAST, and when tracing fails, fall back to STILL REALLY FAST?

Enter JaegerMonkey.

Our new project, JaegerMonkey (or JägerMonkey), has exactly this in mind. We’re taking the tried-and-true approach of other vendors, and bolting trace compilation on top. Once the two are interacting seamlessly, you’ll have a much more consistent – and fast – JavaScript performance experience.

Dave Mandelin, Luke Wagner, Julian Seward and I have been sprinting the past few weeks to get something basic working. To emit actual machine code, we’re using some very pretty classes (“macro assembler”) from Nitro. That’s been a real treat; it’s well-abstracted and C++ish, and allowed us to get to work on the actual compiler very quickly.

Our compiler is simple so far. Before a method runs for the first time, we translate each of its bytecodes into some pretty generic assembly. For example, an “ADD” opcode will emit assembly that can handle both fast cases (adding two numbers) and slow cases (adding, say, an object and a string).
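
As a rough illustration of that fast-case/slow-case split, here is a minimal sketch in Python standing in for the emitted assembly. The names generic_add and slow_add are hypothetical and do not come from SpiderMonkey, and the slow path is a deliberately crude simplification of JavaScript's real coercion rules.

```python
# Hypothetical sketch of what the code emitted for an "ADD" opcode must do.
# Python stands in for generated assembly; no name here is a SpiderMonkey API.

def is_number(value):
    # Simplified type check standing in for a tag test on a JavaScript value.
    return isinstance(value, (int, float))


def slow_add(lhs, rhs):
    # Stand-in for the slow path: a call out to a generic runtime helper.
    # Concatenating string forms is only a rough approximation of what
    # JavaScript does when an operand is not a number.
    return str(lhs) + str(rhs)


def generic_add(lhs, rhs):
    # Fast path: both operands are numbers, so add them directly.
    if is_number(lhs) and is_number(rhs):
        return lhs + rhs
    # Slow path: every other combination goes to the generic helper.
    return slow_add(lhs, rhs)
```

With this sketch, generic_add(1, 2) takes the fast path and returns 3, while generic_add("a", 1) falls through to the helper and returns "a1". The point is only that one compiled ADD has to carry both branches.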

Contrast this with tracing, where the types are known and pinned statically, so the trace does not need to handle any extra cases that might come up. In the whole-method compiler, the generated code must handle all unexpected variations in control or type flow.
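
For comparison, a type-pinned trace might look like the sketch below. It assumes the trace was recorded with two integer operands and protects that assumption with a guard that leaves the trace when it fails. TraceExit and traced_add_ints are hypothetical names used for illustration only.

```python
# Hypothetical sketch of a trace specialized for integer operands.
# The guard and the exit are modeled with a Python exception.

class TraceExit(Exception):
    """Raised when a guard fails and the trace has to be abandoned."""


def traced_add_ints(lhs, rhs):
    # Guard: the trace is only valid for the types seen while recording.
    if type(lhs) is not int or type(rhs) is not int:
        raise TraceExit("operand types differ from the recorded ones")
    # No slow path: with the types pinned, only the integer add remains.
    return lhs + rhs
```

The specialized version does less work per call than a generic add, but it has nothing to fall back on by itself: when the guard fails, control has to go somewhere else, such as the interpreter or method-compiled code.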

After the function is compiled we execute it right away – the interpreter is skipped entirely.

Early Progress.

We’ve barely started and the results are already really promising. Running SunSpider on my machine, the whole-method JIT is 30% faster than the interpreter on x86, and 45% faster on x64. This is with barely any optimization work! When we integrate tracing next week, we’ll already start to see the benefits of both working together.

For a more in-depth study, Dave Mandelin has blogged about our early performance gains, what’s done, up-and-coming, etc.

As we move forward, the two compilers will be tightly integrated. The method compiler will be able to identify loops and invoke the trace compiler. If the trace compiler finds a method too complex to inline, it may invoke the method compiler instead.
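
One way to picture the first half of that integration is a hit counter on each loop: method-compiled code runs the loop body until the loop looks hot, then asks the tracer for a faster version. The sketch below is only an illustration under that assumption. run_loop, try_trace and the threshold value are hypothetical, not the actual design.

```python
# Hypothetical sketch: method-compiled code counting loop iterations and
# handing a hot loop to the tracer. All names and numbers are illustrative.

HOT_LOOP_THRESHOLD = 2  # assumed value, chosen only to keep the example short


def run_loop(body, iterations, try_trace, threshold=HOT_LOOP_THRESHOLD):
    """Run body(i) for each iteration and log which path executed it."""
    log = []
    hits = 0
    compiled_trace = None
    for i in range(iterations):
        if compiled_trace is not None:
            # A trace exists: call it directly, with no interpreter involved.
            log.append("trace")
            compiled_trace(i)
            continue
        hits += 1
        if hits == threshold:
            # The loop just became hot: ask the tracer for a compiled trace.
            # try_trace may return None if recording fails.
            compiled_trace = try_trace(body)
        # Until a trace is available, stay on the method-compiled path.
        log.append("method")
        body(i)
    return log
```

If try_trace returns None, the loop simply keeps running on the method-compiled path, which is the "still really fast" fallback the post is after.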

The future of SpiderMonkey is bright and shiny, and we’ll be talking more about the project as it reaches major milestones.

In the meantime, if you are interested in learning more, I invite you to look at JaegerMonkey on the Mozilla wiki, and our makeshift source code repository. We also hang out in #jsapi on irc.mozilla.org.

23 comments

  1. Norman says:

    wow thanks guys, this sounds very promising. I hope it finds its way into Firefox 4. Really looking forward to it :)

  2. greg says:

    Are you going to support x86_64 from the start this time? I hope so.

  3. Paul says:

    Really good article.
    Maybe this is the right way.

  4. Michael says:

    Ve hunt!!

  5. dvander says:

    greg: definitely yes. it was even working before x86 :)

    (… Did I just see a Jägerkin reference?)

  6. Confiscative says:

    Is it a Moose reference?
    Why is this not CrabMonkey? :(

  7. PM says:

    Sounds excellent… Out of interest: Do you actually cache any generated code? Or would that not be a significant performance win?

  8. Dan says:

    Is it known who uses what approach in terms of all the slow and fast engines? Is this the approach that Opera uses now for their speedy results?

  9. dvander says:

    PM: We cache bytecode for the browser shell, but not webpages. It would definitely make sense to cache native code if we’re doing either, I think, but I don’t know if anyone’s tried it enough to measure.

    Dan: Most JavaScript engines use whole-method JIT compilation of some sort. Mozilla is the only one to use trace compilation at all.

  10. PM says:

    Dan: as far as I understand, Opera JITs whole functions, though they seem to be doing some clever stuff at compile time like static type analysis and “speculative specialization”.

    ( more info: http://my.opera.com/core/blog/2009/02/04/carakan and http://my.opera.com/core/blog/2009/12/22/carakan-revisited )

  11. Frikky says:

    If you have a lousy slow rendering engine and you happen to boost its speed by 10% of its current speed, it still remains a lousy slow rendering engine.

    Mozilla, stop spewing garbage, you’ll lose terrain, the only thing that keeps Firefox in the race are its add-ons, the thing that other people created, Mozilla has a lousy bunch of worthless programmers.

    (I hope someday you realize how awful it is to speak like this, to people you don’t even know. -dvander)

  12. mark says:

    If you have a high quality rendering engine and you can boost its speed by the amount that you are expecting, then it will remain an excellent product.

    Mozilla, thanks for keeping us informed, you are continuing to push the performance of Firefox forward. One of the things that keeps Firefox in the race is its add-ons, the other is the dedicated team behind the browser itself. Mozilla has a great bunch of talented programmers.

  13. PM says:

    How do dumb people find your blog!? You’d think at least the captcha would stop them. Amazing

  14. joe says:

    You spend paragraphs talking about speed improvements, but BARELY mention that this is because of webkit
    give them more credit

  15. dvander says:

    joe – I certainly don’t want to downplay that we’re using the macro assembler from Nitro, however this is not the underlying reason for speed improvements. It’s a small, lightweight library that abstracts emitting low-level assembly. For sure, we were able to get the compiler written so quickly because the Nitro developers made such a nice backend tool.

    However you still have to write a compiler around it, and the compiler is what I’d prefer talking about, since we’ll have our own unique approach there and that’s what actually performs optimizations.

  16. RyanVM says:

    Kudos to you and the entire team (seems to be growing larger every day!) for the great work you’re doing.

  17. Erik Harrison says:

    I’m curious what the interaction between inline JIT and trace JIT will be like. Can I have a method that is inlined but with a hot loop inside that is traced? Can a traced loop call out to an inline JITed method? I’d hate to live in a world where a single traceable loop consigned all of my code to run on the interpreter, or a fully inline JITed script “only” runs at V8/Nitro speeds because the compiler can’t “dial up” to tracing when possible.

  18. dvander says:

    Erik: Yeah! Definitely, we want to spend as little time in the interpreter as possible. The current plan is in bug 549522. Method JIT’d code will have very thin instrumentation on loops. If a loop gets hot, it’ll jump into the interpreter to record a trace.

    But that’s it. Once the recorder has finished – no matter if it was successful or not – we’ll go back to the method JIT. Basically we should never be in the interpreter unless we are specifically recording a trace, which is always at most one iteration of a loop. (Though we may re-try after 30 iterations or so.)

    From there the method JIT can invoke the trace-JIT’d code directly, without any interpreter transitions. If all goes according to plan, we’ll get the best of both worlds.

    Eventually it would be awesome to have the tracing JIT be able to invoke the method JIT. Right now it inlines every function call. But inlining isn’t always good; some methods don’t inline well. To come up with good heuristics for that, we (or someone) will have to sit down and study what makes a method a good candidate for inlining or not.

    In general there’s tons of future work ahead for letting the tracer relax on optimizations that are potentially too optimistic. With the method JIT in place we’ll be better positioned to inform the tracer of what it should/shouldn’t optimize.

  19. It’s really great to know. Right now, when working on 64-bit Linux, at times I can feel there is a lot of difference in performance between Google Chrome and Mozilla Firefox.

    Hope to see that difference become negligible.

  20. cic says:

    This is a very interesting approach. I would like to know if it would be easy to apply to other languages’ interpreters (like Ruby or Python), or whether those languages are too complicated to do it easily.
    Also, how is the starting time compared to the pure interpreter?

  21. dvander says:

    cic: I think these approaches would definitely apply to Python and Ruby. There are a few projects for Python already that do whole-method JITing, and I think Psyco does type specialization. I don’t think anyone’s tried trace-based type specialization for any dynamic languages other than JavaScript though. It would be really interesting to see that.

    What do you mean by “starting time”? If you mean the overhead of compilation, so far it seems pretty cheap. Not exactly 0, but well under 1ms for most methods.

  22. James Gray says:

    This sounds like such a great approach! I just read about this on Ars Technica, and thought you would have written something interesting about it, and I was right.

    The potential here is absolutely incredible.

    -”sslice”

  23. Erik Harrison says:

    dvander, I posted a comment (awaiting moderation) over on hacks.moz, but I figured I’d mention it here: I built Firefox from the jaegermonkey repo, and overall I’m seeing a 50% perf improvement in SunSpider with Methodjit+Tracing enabled, and no regressions. Great success!
