JägerMonkey has Crossed the Streams

On July 12th, JägerMonkey officially crossed TraceMonkey on the v8 suite of benchmarks. Yay! It’s not by a lot, but this gap will continue to widen, and it’s an exciting milestone.

A lot’s happened over the past two months. You’ll have to excuse our blogging silence – we actually sprinted and rewrote JägerMonkey from scratch. Sounds crazy, huh? The progress has been great:

AWFY feed, v8-richards

The black line is the new method JIT, and the orange line is the tracing JIT. The original iteration of JägerMonkey (not pictured) was slightly faster than the pink line. We’ve recovered our original performance and more in significantly less time.

What Happened…

In early May, Dave Mandelin blogged about our half-way point. Around the same time, Luke Wagner finished the brunt of a massive overhaul of our value representation. The new scheme, “fat values”, uses a 64-bit encoding on all platforms.
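To give a feel for the idea (this is not SpiderMonkey's actual code, and the names are made up), a 64-bit "fat value" packs a type tag and payload into the unused NaN space of an IEEE-754 double, so doubles are stored untouched while other types carry a tag in the high bits:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit value encoding in the spirit of "fat values":
// real doubles are stored as-is; tagged values live in the NaN space,
// with the tag in the high 16 bits and a 32-bit payload in the low bits.
static const uint64_t TAG_MASK  = 0xFFFF000000000000ULL;
static const uint64_t TAG_INT32 = 0xFFF1000000000000ULL;

uint64_t boxDouble(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

uint64_t boxInt32(int32_t i) {
    return TAG_INT32 | (uint64_t)(uint32_t)i;
}

bool isInt32(uint64_t v) {
    return (v & TAG_MASK) == TAG_INT32;
}

int32_t unboxInt32(uint64_t v) {
    return (int32_t)(uint32_t)(v & 0xFFFFFFFFULL);
}
```

The payoff is that a type check is a single mask-and-compare, and unboxing is a truncation, on every platform.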

We realized that retooling JägerMonkey would be a ton of work. Armed with the knowledge we’d learned, we brought up a whole new compiler over the next few weeks. By June we were ready to start optimizing again. “Prepare to throw one away”, indeed.

JägerMonkey has gotten a ton of new performance improvements and features since the reboot that were not present in the original compiler:

  • Local variables can now stay in registers (inside basic blocks).
  • Constants and type information propagate much better. We also do primitive type inference.
  • References to global variables and closures are now much faster, using more polymorphic inline caches.
  • There are many more fast-paths for common use patterns.
  • Intern Sean Stangl has made math much faster when floating-point numbers are involved – using the benefits of fat values.
  • Intern Andrew Drake has made our JIT’d code work with debuggers.
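The inline-cache trick mentioned above can be sketched roughly like this (a toy model, not SpiderMonkey's implementation): each property-access site remembers the last object layout it saw, so repeated accesses on same-shaped objects skip the hash lookup entirely.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy object model: objects with the same layout share a shape id.
struct Object {
    int shapeId = 0;
    std::unordered_map<std::string, int> slots;  // property name -> slot index
    int values[8] = {};
};

// A monomorphic inline cache for one property-access site. A real PIC
// chains several (shape, slot) entries; one entry keeps the idea clear.
struct InlineCache {
    int cachedShape = -1;
    int cachedSlot = -1;
    int hits = 0, misses = 0;
};

int getProp(Object& obj, const std::string& name, InlineCache& ic) {
    if (obj.shapeId == ic.cachedShape) {   // fast path: shape matches
        ic.hits++;
        return obj.values[ic.cachedSlot];
    }
    ic.misses++;                           // slow path: full lookup, then cache
    int slot = obj.slots.at(name);
    ic.cachedShape = obj.shapeId;
    ic.cachedSlot = slot;
    return obj.values[slot];
}
```

In the JIT the fast path is emitted directly into the machine code at the access site, so a cache hit costs a compare and a load.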

What about Tracer Integration?

This is a tough one to answer, and people are really curious! The bad news is we’re pretty curious too – we just don’t know what will happen yet. One thing is sure: unless it is carefully and properly tuned, the tracer will drag the method JIT’s performance down.

The goal of JägerMonkey is to be as fast or faster than the competition, whether or not tracing is enabled. We have to integrate the two in a way that gives us a competitive edge. We didn’t do this in the first iteration, and it showed on the graphs.

This week I am going to do the simplest possible integration. From there we’ll tune heuristics as we go. Since this tuning can happen at any time, our focus will still be on method JIT performance. Similarly, it will be a while before an integrated line appears on Are We Fast Yet, to avoid distraction from the end goal.

The good news is, the two JITs win on different benchmarks. There will be a good intersection.
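The "simplest possible integration" could look something like the following sketch (hypothetical; the real heuristics and thresholds are exactly what we still have to tune): run everything in the method JIT by default, and hand a loop to the tracer only once it is hot and hasn't repeatedly aborted tracing.

```cpp
#include <cassert>

// Hypothetical per-loop bookkeeping for the method-JIT/tracer hand-off.
struct LoopProfile {
    int iterations = 0;
    int traceAborts = 0;
};

enum class Engine { MethodJIT, Tracer };

// Called at each loop header; the thresholds here are placeholders,
// not real SpiderMonkey tuning values.
Engine chooseEngine(LoopProfile& p, int hotThreshold = 16, int abortLimit = 3) {
    p.iterations++;
    if (p.iterations >= hotThreshold && p.traceAborts < abortLimit)
        return Engine::Tracer;
    return Engine::MethodJIT;  // not yet hot, or blacklisted after aborts
}
```

The abort counter is the part that keeps the tracer from dominating negatively: loops the tracer handles badly fall back to the method JIT for good.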

What’s Next?

The schedule is tight. Over the next six weeks, we’ll be polishing JägerMonkey in order to land by September 1st. That means the following things need to be done:

  • Tinderboxes must be green.
  • Everything in the test suite must JIT, sans oft-maligned features like E4X.
  • x64 and ARM must have ports.
  • All large-scale, invasive perf wins must be in place.
  • Integration with the tracing JIT must work, without degrading method JIT performance.

For more information, and who’s assigned to what, see our Path to Firefox 4 page.

Performance Wins Left

We’re generating pretty good machine code at this point, so our remaining performance wins fall into two categories. The first is driving down the inefficiencies in the SpiderMonkey runtime. The second is identifying places we can eliminate use of the runtime, by generating specialized JIT code.

Perhaps the most important is making function calls fast. Right now we’re seeing JM’s function calls being upwards of 10X slower than the competition. Its problems fall into both categories, and it’s a large project that will take multiple people over the next three months. Luke Wagner and Chris Leary are on the case already.

Lots of people on the JS team are now tackling other areas of runtime performance. Chris Leary has ported WebKit’s regular expression compiler. Brian Hackett and Paul Biggar are measuring and tackling what they find – so far lots of object allocation inefficiencies. Jason Orendorff, Andreas Gal, Gregor Wagner, and Blake Kaplan are working on Compartments (GC performance). Brendan is giving us awesome new changes to object layouts. Intern Alan Pierce is finding and fixing string inefficiencies.

During this home stretch, the JM folks are going to try and blog about progress and milestones much more frequently.

Are We Fast Yet Improvements

Sort of old news, but Michael Clackler got us fancy new hovering perf deltas on arewefastyet.com. wx24 gave us the XHTML-compliant layout that looks way better (though I’ve probably ruined compliance by now).

We’ve also got a makeshift page for individual test breakdowns now. It’s nice to see that JM is beating everyone on at least *one* benchmark (nsieve-bits).

Summit Slides

They’re here. Special thanks to Dave Mandelin for coaching me through this.

Conclusion

Phew! We’ve made a ton of progress, and a ton more is coming in the pipeline. I hope you’ll stay tuned.

25 thoughts on “JägerMonkey has Crossed the Streams”

  1. jmdesp

    IMO the best solution for tracing would be to *not* instrument the code, but instead to record what you are executing every time there’s an interruption and rely on probabilities. It’s very fast to see which part of your code eats a lot of CPU: it’s the one that’s constantly running when you’re interrupted.

    In other words, go oprofile ( oprofile.sourceforge.net ) not gnu gprof.

    Reply
  2. Barak

    Great stuff, Kudos :)
    Do you know if there’s a plan to merge the fatval branch back to tracemonkey? and if so when? :)

    Reply
  3. Ed

    Great news! There is A LOT to be done, especially since Mozilla is SO behind in terms of JS performance. But it is great that Mozilla has nearly a dozen people working on JS performance alone.

    Reply
  4. dvander Post author

    jmdesp: The way tracing works is it must precisely observe behavior, in order to know what code to generate. It’s tough to do that at a rough granularity while having features like method inlining.

    What we can try (and want to experiment with in the future) is something like this though. By capturing higher-level data across more iterations of a loop, we can decide how to trace it better. Brian Hackett is working on an awesome type inference framework which could really pave the way for more parametrized compilation granularity.

    Barak: Yeah, for sure! That’s supposed to happen this week.

    Reply
  5. Jan

    Thanks for your post, I follow JM and TM development quite closely but it’s still interesting to see a summary of what happened over the last months.

    One question though. What does this tell us about TM, if JM can already outperform it on many benchmarks? Is TM unable to JIT some important constructs? Does it not trace enough code? Or is the interpreter too slow? It would be interesting to get your opinion on that.

    Anyway, great work so far! I’m looking forward to the next posts and performance improvements…

    Reply
  6. dvander Post author

    Jan: TM can only JIT loops, and it’s best at loops that don’t have too much control flow or whacky things like eval() going on. The interpreter is indeed very slow – JM will (for the most part) replace it, giving us a better baseline.

    TM and JM beat each other on different benchmarks, which is a good sign. We can use JM for generally good performance, and drop into TM where we know it’ll win even more.

    Reply
  7. Mark

    Awesome work guys!

    Was confused by this sentence though:
    “All large-scale, invasive perf wins must in place.”
    Is there a missing word? I’m not understanding it.

    Reply
  8. Ciprian Mustiata

    Ed: Keep in mind that FF4 is not just about having the best JIT in its class. FF3 worked (more or less) like Java HotSpot and did a great job; even if it is between 2x and 5x slower than other JITs, it is not slow for that reason. FF4 will improve not only JS times but also drawing, using Direct2D on Windows (and similar frameworks on other platforms), plus better layout performance, so it will feel faster. Having a great architecture for improvement like JägerMonkey means that even if FF4.0 is not amazingly fast, it offers a good baseline to build on. The architecture change also means exactly what JM promises: no “worst cases” where FF4 is 5x slower because it is mostly stuck in an interpreter. To me it is just amazing that the Mozilla team keeps proving it can innovate and impress with great-quality products that everyone tries to copy (extensions, theme support, very fast JS, accelerated frameworks – starting with Cairo in FF3.0 – low memory usage, and great standards support: Acid3 at 94/100 is a great score!). So go on, Mozilla, go and prove that you are great!

    Reply
  9. Rafa Colunga

    Hi, great job!! Throwing away the last version and rewriting JägerMonkey from scratch was the best decision… If only more development teams around the world were this brave…

    Is there anything that we Firefox 4 beta testers can do to help you with this?

    Reply