Inheritance Without Types

At AlliedModders we dabble in practical language design. Our two major scripting projects, AMX Mod X and SourceMod, iterated on our own fork of a scrappy language called Pawn. Our next iteration is untyped, and we’re adding object support.

What does object support mean? “Well, obviously,” I naively thought, “it means Java-style implementation inheritance!” Whoops. WRONG. It turns out there’s a ton of complexity with inheritance once you take away types. This post is a brief survey of some popular untyped languages and what we ended up deciding.

The Goal
We want our language to have as few pitfalls as possible. I’m not sure if I stole this from Graydon Hoare, but our motto is “No surprises!” We want people to be able to write code that works the way they expect, minimizing random or unforeseen run-time failures.

A great aspect of C++’s inheritance is that information hiding is super easy. A base class’s behavior isn’t trampled by derived classes. For example:

Select All Code:
class Base {
    int x;
  public:
    Base() : x(5) { }
    void print() {
        printf("Base: %d\n", x);
    }
};
 
class Derived : public Base {
  public:
    int x;
    Derived() : x(20) { }
};
 
int main() {
    Derived().print();
}

As you’d expect, this prints ’5′. The fact that Derived also declares x does not trample on Base‘s own behavior. Now let’s take an untyped language, Python 3:

Select All Code:
class Base:
    def __init__(self):
        self.x = 5
    def print(self):
        print(self.x)
 
class Derived(Base):
    def __init__(self):
        super().__init__()
        self.x = 20
 
Derived().print()

This prints ’20′. In Python, “self” is really one object, with one set of properties and methods, whereas in the C++ model “this” is conceptually two objects, and “this” is statically typed to be the object of the containing class. I consider the Python model unappealing for two reasons:

  1. When adding a member to a derived class, you have to know that you’re not colliding with a member of your base class.
  2. When adding a member to a base class, you might be breaking any number of unknown consumers.

JavaScript’s prototype-based inheritance discourages this style, but JS is super flexible, and if you don’t mind a very, very inefficient prototype chain, you can do this:

Select All Code:
function Base() {
    var x = 5;
    this.print = function () {
        alert(x);
    }
}
 
function Derived() {
    return Object.create(new Base(), { x: { value: 20 } });
}
 
Derived().print()

This prints “5″.

PHP’s Answer
For whatever reason, PHP takes Java’s model wholesale. Here’s an example:

Select All Code:
class Base {
    private $x;
    public function __construct() {
        $this->x = 5;
    }
    public function printMe() {
        print($this->x . "\n");
    }
}
 
class Derived extends Base {
    public $x;
    public function __construct() {
        parent::__construct();
        $this->x = 20;
    }
}
 
$obj = new Derived();
$obj->printMe();

This prints ’5′, which is what we want – and makes sense. But is $this statically typed to the Base class, like in C++? No:

Select All Code:
class Base {
    public function printMe() {
        print($this->stuff . "\n");
    }
}
 
class Derived extends Base {
    public $stuff;
    public function __construct() {
        $this->stuff = "hello";
    }
}
 
$obj = new Derived();
$obj->printMe();

This prints "hello", so PHP’s $this retains some dynamicism. Let’s up the ante. What should this do?

Select All Code:
class Base {
    private $x;
    public function __construct() {
        $this->x = 5;
    }
    public function compareTo($obj) {
        return $this->x == $obj->x;
    }
}
 
class Derived extends Base {
    public $x;
    public function __construct() {
        parent::__construct();
        $this->x = 20;
    }
}
 
$b = new Base();
$d = new Derived();
print($b->compareTo($d) . ", " . $d->compareTo($b) . "\n");

This prints true in both cases! For any property access in a PHP class function, if the object has that class on its inheritance chain, it uses the property on that class. Otherwise, it searches from the derived-most class back to the base class like normal. This implicitly hides variables on the derived object. Nonetheless it’s the right choice given the model, especially considering that it’s usually bad practice for a base class’s behavior to explicitly cast to one of its derived class.

Note that the fact that the inner x is private is actually irrelevant. Even if it were public, the base class should retain its behavior. Banning redeclaration works, though then you run the risk of potentially preventing an unknown consumer from compiling (albeit, better than silently being wrong). Similar issues occur with introducing new virtual functions or overloads.

So, what did we do?
After letting this all sink in, I decided to scrap inheritance as a feature of our next language iteration. Having objects alone will be a huge step forward, and we can evaluate use cases from there. I’m satisfied with this for a few reasons.

Inheritance is really complicated with types, and even more complicated without. And in an untyped language, it’s not even clear if implementation inheritance is useful. PHP’s interfaces and abstract classes seem verbose and heavyweight in comparison to JavaScript or Python, in return for a very small amount of static guarantees.

So far Pawn and our derivatives have mainly been embedded in games, and the majority of those games are based on the Half-Life engines (including Source). We want our object model to be compatible with the game’s internal object model, however, I’m not convinced that copying it would be ideal. The Source engine has the “Big-Ass Base Entity Class from which Everything Derives,” a painful and complex architecture continuing Source’s theme of not being scalable. Do we really want to encourage that?

I suspect we’ll end up with something like traits and/or composition, but for now, I’m content with keeping it simple and continuing to evaluate the strengths of other languages’ models.

(Addendum: I don’t much about Perl or Ruby. I tried quickly evaluating them for this post but the syntax was a little intimidating. I’d really appreciate insight into other languages.)

Addendum 2: SaberUK contributes this Ruby example which prints 20 – exhibiting similar behavior to Python.

Select All Code:
class Base
	def initialize
		@x = 5
	end
 
	def print_me
		puts @x
	end
end
class Derived < Base
	attr_accessor :x
	def initialize
		super
		@x = 20
	end
end
obj = Derived.new
obj.print_me

On Indonesia and Being Justin Bieber

Two weeks ago, a bunch of Mozillans went on a whirlwind tour of Indonesia, visiting Firefox 4 release parties around the country. We had a great time, and I found the trip really enlightening. Luke Wagner and Christian Legnitto handled the first four days of the trip, and then Dave Mandelin, Josh Aas and I did the rest.

Dave and I started off in San Francisco, making a 22-hour journey to Jakarta, with a stop in Taipei. We brought along extra suitcases full of Mozilla swag to give away: by my count about eight-hundred billion elastic headbands. It must have looked like a ton of cocaine when we put our luggage through the X-ray machine at customs. The security guards opened the suitcases up and looked perplexed, but let us go after I donned a headband for demonstration.

After immigration we met community members Viking Karwur (@vikingkarwur), Yofie Setiawan (@yofiesetiawan), and Arry T, the owner of a local private school. Arry took us to a restaurant where you order by walking around rows and rows of open coolers, filled with random frozen fish, live crabs, mussels, et cetera. We paced around with an empty cooler and grabbed what seemed like one of everything. (Mr. Burns: “I’ll just have a glass of milk… from THAT cow.”) Lo and behold about thirty minutes later it all arrived at our table fully cooked, and tasted great.

The next day we got up early and flew to Surabaya with Yofie (he was excited, it was his first time flying). The Surabaya venue took place in a shopping mall, with maybe 100-200 people showing up. There was a talk about web design (spoken entirely in Indonesian, but the slides were in good English), another from Yofie about, I think, Mozilla Indonesia, and then Dave, Josh, and I each presented a little bit about Firefox 4, concluding with JSNES and Flight of the Navigator demos. Then there was a kind of ribbon-cutting ceremony which involved a giant mound of rice. Each of us had to take a spoonful off the top without dropping it, which was surprisingly tricky.

What really surprised me was how much everyone liked Firefox – it has a huge market share in Indonesia, something like 77% – and how people were really excited that we came to visit. I think we spent a full thirty minutes posing for photographs at each event. As a C++ developer you don’t get treated like a rockstar too often, so that was pretty awesome (albeit exhausting). Afterward we got some local food, but I was so tired I don’t remember what it was.


The next day we drove three hours to Malang, which was actually a good chance to get a look at the country. The venue was at the polytechnic university, where we were treated to a plaque and boxes of local snacks. The students there had great questions. Recurring ones were: why does Firefox use so much memory? (we’re working on it), and when will it get a Blackberry port? (likely, never, but it’s a popular phone there). After Q&A we each talked a bit about why we like working for Mozilla, and Dave cut a ribbon which released a big balloon. A few students were interested in how to start a career at Mozilla, I wasn’t quite sure what to say but the internship program and just community involvement seem like great vectors. Unfortunately, Dave got pretty sick on the ride back to the hotel, so he was down for the next day.

At the Makassar event we temporarily lost Yofie as our babysitter, instead being placed in the capable hands of Rara (@rara79) and Mamie (@alwaysmamie). The event was in a café, and a structured a little more rigorously than the others – most questions were in Indonesian and translated by the MC. The audience seemed to be lots of bloggers and web developers in the community. Questions seemed to fall into four main categories: Memory use, support for Java phones, UI changes and support issues, and wondering where the money comes from. All good questions! Afterward, we went to Pizza Hut (of all places – it was actually pretty fancy), and got cheese-stuffed crust with cornflakes.


Finally, on the last day, we flew to Bali and met up with Viking and Yofie again. Josh and I relaxed at the resort pool which was occupied mostly by Australians. In the evening we went to a local café for a smaller venue, maybe a dozen people, with a local radio host as MC. At our table sat an American expatriate, Ken McClellan, CTO of Mitrais who talked about what it’s like living and running a company in Indonesia.

Indonesia is very clearly a developing country. Foreigners are advised not to drink the tap water, and to be careful about uncooked food, as they won’t have the same resistances as locals. In cities you can see a striking mix of poorer and middle-class areas, and the pollution near congested roads can be very noticeable, at least if you’re asthmatic like me. Traffic is amusingly chaotic – turn signals, stop lights (there are very few), and speed limits are all optional. When you’re in a car, honking is nearly constant as an advisory measure, since there are no clearly defined lanes, and the majority of vehicles on the road are motor scooters. Jakarta even had motorized rickshaws, most were very old, adding to the anachronistic mix of old and new technology. It seems like despite this, traffic was actually pretty reasonable, though as a Californian used to big open roads with camera-enforced stop lights every ten feet, I’d be too nervous to try driving.

The national airline, Garuda, was much higher quality than any domestic airline in the States. It kind of reminded me of what flying was like before 9/11, or when American airlines weren’t terrible. I guess what sums this paragraph up is that the flight seat pockets had pamphlets advertising “Investing in Indonesia.” I’m really curious as to what the country will look like in 20-30 years, because it seems like it’s growing rapidly.

All in all, it was a great trip, and I’d definitely go back. It helped having a local guide; Yofie, Viking, and others’ help was really appreciated. The food was great (Nasi Goreng anything, Oxtail Soup were my favorites), and cheap (coming from USD), and everyone we met was friendly – even complete strangers walking around. It was awesome seeing everyone so excited about Firefox, and whatever we’re doing that makes Firefox so popular in Indonesia, we should figure it out and do it more!

I am not Justin Bieber.

I would be remiss if I didn’t conclude by mentioning how people thought I was Justin Bieber. It was so persistent that Josh started calling me Justin. Near the end of the trip, swimming in Bali, he said that when my hair was slicked back it was “way more manly,” so I tried that for a day, but decided I couldn’t afford the daily metric ton of hairgel. Incidents of mistaken Bieber identity:

  • A few times in person at each event, including this tweet.
  • People were whispering it at the airport in Makassar.
  • Walking around Makassar, people rolled down the windows of their cars and called out at me and Josh.
  • Running into a group of loitering teens, they asked “who I was” and burst out giggling when I shrugged.
  • An airport employee in Makassar waved to me from behind a booth and yelled “Justin Bieber”! – I had to wave back.

Well, now that my pop star career is over, I’ve decided to go into dynamic language optimization.

Debugging 500 Errors

For the past two years, the AlliedModders site has been randomly dishing out “HTTP 500 – Internal Server Error” responses. It’s been pretty frustrating.

I actually tried debugging the errors two years ago. I got far enough to attach GDB to PHP, and saw it blocked on mysql_query(), but didn’t know how to proceed. We knew that during these incidents, the server load would spike to obscene levels – like 100-150. Lots of PHP processes were getting spawned. “iowait” time was through the roof, and CPU usage was normal. Something couldn’t read the disk, or the disk was getting hammered. I put MySQL on a separate hard drive, but the problems persisted and I gave up. Keep in mind: I didn’t know anything about running servers, and I still don’t – I just have a vague notion of how to debug things.

Fast forward to a few weeks ago, we had a serious hard drive crash and suffered some painful downtime. We lost a lot of data. I clung to a hope that our years of errors had been caused by a failing disk. Predictably, we weren’t so lucky, and the website problems continued. Someone joked, “Hey, I’m glad to see you got the server back up, even the 500 errors!” Yeah… it was time to fix this. Somehow.

Step One: Apache Logs

There were only two notable Apache errors. They’d appear over and over, but only in large clumps:

[Tue Jan 25 04:15:19 2011] [warn] [client xx.xxx.xxx.xx] (110)Connection timed out: mod_fcgid: ap_pass_brigade failed in handle_request function
[Tue Jan 25 04:15:25 2011] [warn] [client xxx.xxx.xx.xx] mod_fcgid: read data timeout in 60 seconds

It was pretty clear that somehow, communication was breaking down between mod_fcgid (Apache) and PHP processes. I read, I think, every single webpage on the Internet involving these two errors. Twice. They were all dead ends, both for us and both for the person reporting their own problems. A little sad. I wanted to blame mod_fcgid, since historically it has been really buggy for us, but I wanted to pin absolute blame.

So the next step was to break out a debugger.

Step Two: Debugging PHP, Apache

AM developer Fyren edited the mod_fcgid source code, such that instead of killing PHP, it would attach gdb in a separate tty. It usually took around three hours to get a debugging session going. The call stacks looked like:

(gdb) bt
...
#4 0x00007f1f7cad804f in net_write_command () from /usr/lib/libmysqlclient.so.15
#5 0x00007f1f7cad4c51 in cli_advanced_command () from /usr/lib/libmysqlclient.so.15
#6 0x00007f1f7cad1751 in mysql_send_query () from /usr/lib/libmysqlclient.so.15
#7 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15
...
(gdb) fin
Run till exit from #0 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15

At this point, the session would hang, indicating that MySQL was not responding. I really had no idea what to do, but we got lucky. We noticed during these incidents, Apache’s server-status was stuck at “200 requests being processed,” all in the “W” (“Sending reply”) state. Our MaxClients setting was exactly 200. Now we had a hypothesis: MySQL was intermittently not responding for short bursts of time. This blocked PHP, which in turn made mod_fcgid block and timeout, causing Apache to spawn a bunch of workers that would lock up as well.

Unfortunately, the intermittent failures only happened a few times a day, so debugging went slow. We managed to catch one, and ran iostat:

Device:          r/s       w/s   rsec/s    wsec/s    avgrq-sz  avgqu-sz   await  svctm  %util
sdc              320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00
sdc1             320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00

Someone was reading tons of data off the disk, and iotop confirmed MySQL was the culprit.

Diagnosing MySQL

I managed to find this enlightening presentation by Percona. Case study #3 looked identical to ours, except in their case, MySQL was writing. But their conclusion about bad queries seemed plausible. I made a script that would peek at server-status every second, log a ton of data, and e-mail me if it went above 100 concurrent requests. The data included oprofiling, iotop spew, SHOW FULL PROCESSLIST, SHOW OPEN TABLES, SHOW GLOBAL STATUS, etc.

Oprofile’s report wasn’t too interesting, but the MySQL status showed around 150 queries waiting for a table lock. The active queries, in all cases, were SELECTs. This explained the reads, but not the table locks. This didn’t make sense, so I generated a few core files of MySQL from within GDB, then Fyren and I did some post-mortem analysis.

The analysis revealed that MySQL had about 200 threads, most of which were waiting on locks (pthread_cond_timedwait) with a timeout of one year. The locks were almost all read locks for SELECTs, and the actually succeeding queries were exclusively SELECTs. Fyren tried to track down the thread owning the locks, but it ended up being apparently circular (thread X’s lock was held by thread Y, and thread Y was waiting on a lock held by thread Y).

That was all pretty mysterious (and still is), but rather than try to understand exactly what locks are in MyISAM tables, I took a look at the non-deadlocked queries. They were all one of: the AMX Mod X plugin search, the SourceMod plugin search, or the vBulletin member list search.

It turns out all three of these queries were extremely expensive – containing ORs, joins, full table scans of 800MB tables, etc. On average they each took about 500ms to complete. That’s without MySQL freaking out. I turned off vBulletin member searching (most large boards do), and replaced the plugin searches with faster queries off a much a smaller table. This brought those page load times down to 20ms. Twelve hours later, no 500 errors had occurred, and there were no mod_fcgid failures in Apache’s logs.

Too Early to Declare Victory

While waiting to see if any more 500 errors would show up, suddenly the server load spiked, and lo and behold – mod_fcgid became overloaded. Our monitoring script kicked in, and this time, MySQL was nowhere in the profile! It turned out some forum users were effectively DoS’ing the site by holding down F5 on their profile pages to bump up the “view” count. OProfile suggested that 20% of CPU time, during this incident, went to PHP parsing and lexing source code. I installed eAccelerator and now that time appears to be gone. (And for good measure, I removed profile view counts.)

Growth and Traffic

It’s unclear what our load and traffic really is. I think we’re pretty good though. CPU utilization is usually under 20%, disk I/O is under 10%, and we rarely see more than 30 concurrent requests. We don’t need to invest in a separate MySQL server yet, and we definitely aren’t growing fast enough to expect to need one anytime soon.

Community Involvement

I would like to thank Fyren for his help in looking into these problems, and asherkin for helping beef up our non-existent monitoring. Also, thanks to AzuiSleet, Drunken_Fool, MatthiasVance, and devicenull for sticking around in IRC and answering random questions.

Diagnose, then Solve

Many people, well-intentioned, said “switch to lighttpd, nginx, IIS, use my favorite host, use a VPS, buy more servers,” etc. I got caught up in this too, and fueled it by initially blaming mod_fcgid. It’s really important to make sure a solution has targeted a specific problem, and I had to keep telling myself to keep a cool head. Sometimes I had to slap myself or wait for Fyren to slap me. They all might be good ideas in their own right, but without diagnosing the actual cause of the errors, there was no guarantee of a fix. By careful debugging (albeit in painful, hour-per-day sprees), we got pretty close to the root cause, whereas throwing our infrastructure into a salad bowl would have delayed the inevitable.

Part of the reason our original drive is still at the recovery lab is because someone didn’t diagnose before “solving.” When it failed, a technician (not one of us) ran fsck on the disk instead of doing a hardware diagnostic first. I have no doubt this caused untold extra damage.

Conclusion

tl;dr, the forums should be much faster now, and it shouldn’t error anymore. I’ve got the system set up to e-mail me and dump diagnostics when our load goes way above normal, so I’ll continue monitoring for potential problems.

I would like to sincerely thank the whole AlliedModders community for the successful donation drive. I won’t lie, the past month has been pretty sucky. Despite our failings, the fact that everyone cares enough to see the whole thing keep going is really awesome. Our seven year anniversary was just a few weeks ago. Here’s to an AlliedModders for 2011!

JavaScript Checkers

I’ve been itching to write some JavaScript, so a few days ago I threw together a Checkers game that uses only HTML5 and JS. It’s got a simple AI that terminates after three seconds. The faster your browser can run JavaScript, the smarter the AI will be.

For example, here is Firefox 4 Beta (blue) versus Firefox 3 (red) – it happens that Firefox 4 is roughly 10X faster at this program, and soundly defeated its predecessor:

Firefox 4 Beats Firefox 3 at Checkers

The algorithm for the AI is UCT, a form of Monte-Carlo Tree Search. The idea is to estimate the likelihood of winning from a position by simulating hundreds or thousands of random games. UCT helps prune the search space by quickly eliminating bad positions.

While writing the source for this, I tried to turn off the part of my brain that said, “Hey! I know Firefox version x.y.z might be slow or fast at feature X.” That turned out to be okay, but it was harder to avoid general knowledge about JavaScript engines. I ended up with three variations:

  • Fast Checkers, which has manual inlining, manual strength reduction, hardcoded constants, and other gross stuff. Pieces and positions are represented via packed integers.
  • Slow Checkers, which removes manual inlining, strength reduction, and baked-in constants. Here, the additional overhead is extra memory accesses and function calls.
  • OO Checkers, which is the same as “slow”, except represents pieces as objects instead of packed integers. Here, an additional overhead is object allocation.

Performance seems uniform across most browsers. Below I’ve recorded the number of games each browser could simulate per second. Higher is better. Note – this chart is totally unscientific, and random simulations are affected by the behavior of Math.random().

Fast Checkers Slow Checkers OO Checkers
Firefox 4 Beta 14035 9018 9100
IE9 PP6 14579 8234 8067
Opera 11 Alpha 13165 8178 8749
Safari 5 12442 8045 8700
Chrome 9 Canary 4160 2060 2343

And – because why not – I ran them on my Droid 2 as well.

Fast Checkers Slow Checkers OO Checkers
Fennec 2b3pre 338 170 220
Android Browser 185 93 114
Opera Mobile 166 112 126

Since I’m pretty bad at web development, and don’t write JavaScript (sans test-cases) nearly as much as I should, this was an amusing experience. I kept making some really dumb mistakes, one repeatedly:

Select All Code:
Game.prototype.player = function () {
    return this.board.player;
}
...
var player = game.player;
if (player == x) { ...

And wondering why “player” showed as a function in the developer console. I probably should have used ES5 getters. A few other language features would have made the experience a little nicer – but nothing so much as real modules. I tried to emulate good encapsulation with closures, but it’s just not the same. And it doesn’t seem like any engine is smart enough yet to propagate constants through closures (which is one difference between the “fast” and “slow” variants).

Using developer tools for the first time was also an interesting experience. Firefox 4 and Chrome can debug code with a reasonable slow-down, but IE9 became over 100X slower; presumably it cannot debug with the JIT yet. I used Firebug until I needed single-stepping (we didn’t have that working in JägerMonkey for Beta 7), and then bounced over to Chrome – both proved really invaluable. I think my days of calling alert() are over.

Anyway, it was fun writing this, and I’m glad that I can write something computationally expensive and have it be fast everywhere. If and when time permits I may try a more stimulating game like Go.

Land Ho, Fast JavaScript!

Firefox just got a whole lot faster.

I’m excited to announce that Mozilla’s new JavaScript engine, JägerMonkey, is now available for testing!

What is JägerMonkey?

JägerMonkey is our new optimizing JIT compiler for JavaScript. It sits underneath our existing JIT, TraceMonkey, which appeared in Firefox 3.5. If you recall from previous posts, TraceMonkey’s job is to optimize loops to very fast machine code. However, not all code has loops, and not all loops can be trace compiled.

JägerMonkey is a general-purpose compiler which converts entire methods to machine code. The goal is to get great baseline performance. When it detects a loop that can be traced, it automatically engages the trace compiler, which makes it even faster. Yes, that’s right: there’s a turbo button inside.

This hybrid approach is designed to use well-established optimization techniques that work everywhere, and combine them with our existing hyper-optimizing engine that handles smaller subsets of code.

Results.

If you’ve been obsessing over Are We Fast Yet? like me, you’ve seen the numbers dive. Want to try it out? Click here to get preview builds of Firefox 4 with our new JavaScript engine. You can play demos like JSNES at a full, glorious 60FPS.

Disclaimer: It’s a preview – we’re still ironing out the rare kinks. Please report bugs or tell us if something’s wrong (or slow!)

Benchmarks.

We’ve been using the SunSpider 0.9.1 and V8-v5 benchmarks to gauge our general progress. SunSpider is a full 2X faster over Firefox 3.6!

Our improvement on the V8-v5 benchmark is even more dramatic – 4X!

Ongoing Work.

The rockin’ doesn’t stop here. Right now we’re polishing off the final pieces to get into the next Firefox 4.0 Beta. At the same time, here are some of the immediate performance works-in-progress:

  1. Function Calls. As discussed previously, this is one of our last big areas of optimization. The first of four major pieces, caching call sequences, was completed this week. The second big chunk, which Luke Wagner has slated for this week, will make arguments and stack frames faster. Brian Hackett, Chris Leary, and Bill McCloskey have more stack frame optimizations as part of the third wave.
  2. Tracer Integration. Deciding when to use the turbo button is pretty hard, but Bill and Dave have been researching it thoroughly. Right now we’re just scratching the surface, and we will have much better heuristics by the end of the month.
  3. Web Optimizations. Community member Jan de Mooij is continually finding demos and real-world tools and improving performance “gotchas” in our engine, like making common arithmetic patterns faster.

Conclusions.

Firefox 4 is seeing dramatic wins over 3.6 and the web is feeling faster. You can try it out now using a JS Engine Preview, or wait for Firefox 4 Beta 6.

Please stay tuned as we approach JägerMonkey end-game for Firefox 4. Dave Mandelin and I will be blogging, and for smaller things, tweeting (his here) progress & technical updates.

JägerMonkey has Crossed the Streams

On July 12th, JägerMonkey officially crossed TraceMonkey on the v8 suite of benchmarks. Yay! It’s not by a lot, but this gap will continue to widen, and it’s an exciting milestone.

A lot’s happened over the past two months. You’ll have to excuse our blogging silence – we actually sprinted and rewrote JägerMonkey from scratch. Sounds crazy, huh? The progress has been great:

AWFY feed, v8-richards

The black line is the new method JIT, and the orange line is the tracing JIT. The original iteration of JägerMonkey (not pictured) was slightly faster than the pink line. We’ve recovered our original performance and more in significantly less time.

What Happened…

In early May, Dave Mandelin blogged about our half-way point. Around the same time, Luke Wagner finished the brunt of a massive overhaul of our value representation. The new scheme, “fat values”, uses a 64-bit encoding on all platforms.

We realized that retooling JägerMonkey would be a ton of work. Armed with the knowledge we’d learned, we brought up a whole new compiler over the next few weeks. By June we were ready to start optimizing again. “Prepare to throw one away”, indeed.

JägerMonkey has gotten a ton of new performance improvements and features since the reboot that were not present in the original compiler:

  • Local variables can now stay in registers (inside basic blocks).
  • Constants and type information propagate much better. We also do primitive type inference.
  • References to global variables and closures are now much faster, using more polymorphic inline caches.
  • There are many more fast-paths for common use patterns.
  • Intern Sean Stangl has made math much faster when floating-point numbers are involved – using the benefits of fat values.
  • Intern Andrew Drake has made our JIT’d code work with debuggers.

What about Tracer Integration?

This is a tough one to answer, and people are really curious! The bad news is we’re pretty curious too – we just don’t know what will happen yet. One thing is sure: if not carefully and properly tuned, the tracer will negatively dominate the method JIT’s performance.

The goal of JägerMonkey is to be as fast or faster than the competition, whether or not tracing is enabled. We have to integrate the two in a way that gives us a competitive edge. We didn’t do this in the first iteration, and it showed on the graphs.

This week I am going to do the simplest possible integration. From there we’ll tune heuristics as we go. Since this tuning can happen at any time, our focus will still be on method JIT performance. Similarly, it will be a while before an integrated line appears on Are We Fast Yet, to avoid distraction from the end goal.

The good news is, the two JITs win on different benchmarks. There will be a good intersection.

What’s Next?

The schedule is tight. Over the next six weeks, we’ll be polishing JägerMonkey in order to land by September 1st. That means the following things need to be done:

  • Tinderboxes must be green.
  • Everything in the test suite must JIT, sans oft-maligned features like E4X.
  • x64 and ARM must have ports.
  • All large-scale, invasive perf wins must be in place.
  • Integration with the tracing JIT must work, without degrading method JIT performance.

For more information, and who’s assigned to what, see our Path to Firefox 4 page.

Performance Wins Left

We’re generating pretty good machine code at this point, so our remaining performance wins fall into two categories. The first is driving down the inefficiencies in the SpiderMonkey runtime. The second is identifying places we can eliminate use of the runtime, by generating specialized JIT code.

Perhaps the most important is making function calls fast. Right now we’re seeing JM’s function calls being upwards of 10X slower than the competition. Its problems fall into both categories, and it’s a large project that will take multiple people over the next three months. Luke Wagner and Chris Leary are on the case already.

Lots of people on the JS team are now tackling other areas of runtime performance. Chris Leary has ported WebKit’s regular expression compiler. Brian Hackett and Paul Biggar are measuring and tackling what they find – so far lots of object allocation inefficiencies. Jason Orendorff, Andreas Gal, Gregor Wagner, and Blake Kaplan are working on Compartments (GC performance). Brendan is giving us awesome new changes to object layouts. Intern Alan Pierce is finding and fixing string inefficiencies.

During this home stretch, the JM folks are going to try and blog about progress and milestones much more frequently.

Are We Fast Yet Improvements

Sort of old news, but Michael Clackler got us fancy new hovering perf deltas on arewefastyet.com. wx24 gave us the XHTML compliant layout that looks way better (though, I’ve probably ruined compliance by now).

We’ve also got a makeshift page for individual test breakdowns now. It’s nice to see that JM is beating everyone on at least *one* benchmark (nsieve-bits).

Summit Slides

They’re here. Special thanks to Dave Mandelin for coaching me through this.

Conclusion

Phew! We’ve made a ton of progress, and a ton more is coming in the pipeline. I hope you’ll stay tuned.

Are We Fast Yet?

Seems like Are We Fast Yet has spread around a bit, so I’d like to talk about it.

It started as a joke, suggested by Dave Mandelin, in the JavaScript “pit” at Mozilla. I made a site that displayed a giant “NO”, mounted a monitor where everyone could see it, and put the site on full-screen display. After a few hours everyone thought it was kind of depressing, so I made the “NO” small and added graphs.

The intent was for us to track our progress and notice regressions quickly. The UI is super minimalist and the terminology is internal. So after it started spreading, I’ve been getting tons of e-mails about how it could be improved.

What don’t people like? The data points don’t tell you any information, and the x-axis isn’t labeled. I fixed the data points last night, and the labels will come next.

People have also asked where Opera and IE are. I’ve put up a FAQ page to answer that. In short, respectively 1) it tests shell/console JS, not in-browser, and 2) everything runs off a Mac. I want to fix both of these eventually, but don’t have spare cycles right now. Improving perf is higher priority.

What do we look for on AWFY? Our goal is for the gray line (Jäger Only) to be as fast or faster than the competition. This puts Jäger+Tracing (purple line) at a greater advantage. The gray line should be moving fastest as that’s where the biggest gains are right now.

Anyway, I hope you like it!

JaegerMonkey – Fast JavaScript, Always!

Mozilla’s JavaScript optimizer, TraceMonkey, is pretty powerful. It carefully observes loops and converts them to super-fast assembly. We call this “tracing”.

That’s great and all, but there’s a problem: sometimes tracing doesn’t work. Loops can throw curveballs that cause tracing to stop. Especially with recursion, or lots of nesting, it can be very difficult to build good traces on complex code.

Other JavaScript engines, such as Nitro (present in WebKit/Safari), take a simpler approach. Instead of compiling loops to assembly, they compile entire methods (functions) to assembly. The generated code is much more generic than tracing, so while it is not as fast, it can handle any curveball.

What we’ve found is that when tracing works, we’re faster than the generic approach. But when tracing fails, we have to fall back to our old-school interpreter. At that point your JavaScript runs about as fast as it would in 2007-2008 (i.e. before Firefox 3.5, Safari 4, Chrome, etc).

That’s not acceptable, and we need to fix that. Trace compilation is still an active area of research (one which we’ll continue to work on) – but in the interim, we need to make sure our “slow path” is at least as good as the competition.

The question we’ve been asked, and we’ve been asking of ourselves, is: Why couldn’t we trace and keep going SUPER AWESOME FAST, and when tracing fails, fall back to STILL REALLY FAST?

Enter JaegerMonkey.

Our new project, JaegerMonkey (or JägerMonkey), has exactly this in mind. We’re taking the tried-and-true approach of other vendors, and bolting trace compilation on top. Once the two are interacting seamlessly, you’ll have a much more consistent – and fast – JavaScript performance experience.

Dave Mandelin, Luke Wagner, Julian Seward and I have been sprinting the past few weeks to get something basic working. To emit actual machine code, we’re using some very pretty classes (“macro assembler”) from Nitro. That’s been a real treat; it’s well-abstracted and C++ish, and allowed us to get to work on the actual compiler very quickly.

Our compiler is simple so far. Before interpreting a method, we translate each bytecode into some pretty generic assembly. For example, an “ADD” opcode will emit assembly that can handle both fast cases (adding two numbers) and slow cases (adding, say, an object and a string).

Contrast this to tracing, where the types are known, and pinned, statically – it does not need to handle any extra cases that might come up. In the whole-method compiler, the generated code must handle all unexpected variations in control or type flow.

After the function is compiled we execute it right away – the interpreter is skipped entirely.

Early Progress.

We’ve barely started and the results are already really promising. Running SunSpider on my machine, the whole-method JIT is 30% faster than the interpreter on x86, and 45% faster on x64. This is with barely any optimization work! When we integrate tracing next week, we’ll already start to see the benefits of both working together.

For a more in-depth study, Dave Mandelin has blogged about our early performance gains, what’s done, up-and-coming, etc.

As we move forward, the two compilers will be tightly integrated. The method compiler will be able to identify loops and invoke the trace compiler. The trace compiler, if it decides a method is too complex to inline, may decide to invoke the method compiler.

The future of SpiderMonkey is bright and shiny, and we’ll be talking more about the project as it reaches major milestones.

In the meantime, if you are interested in learning more, I invite you to look at JaegerMonkey on the Mozilla wiki, and our makeshift source code repository. We also hang out in #jsapi on irc.mozilla.org.

Debugging the Impossible

A few days we released SourceMod 1.3 and suddenly got a few reports of crashing when clients disconnect. We couldn’t reproduce it, but a few users claimed they could. We had about ten minidumps to work with (they include a few KB of stack memory and that’s it).

All of the stack traces had a weird code address at the top and nothing else. Here’s two examples:

>	1673a14a()	

EAX = 0ED884B0 EBX = 00000000 ECX = 0EDAC948 EDX = 0EDA6A48 
ESI = 0DFFA95E EDI = 100F0101 EIP = 1673A14A ESP = 0012DA09 
EBP = 00000001 EFL = 00010282 
>	0e531992()	

EAX = 0ED884B0 EBX = 00000000 ECX = 0EDAC948 EDX = 0EDA6A4
ESI = 0DFFA95E EDI = 100F0039 EIP = 0E531992 ESP = 0012DA09
EBP = 00000001 EFL = 00010282 

These code addresses are weird because the debugger couldn’t map them to any loaded module. So they’re either dynamically generated/exploited, or garbage. What can we do with such little information? Surprisingly, a lot!

The first thing of interest was that ESP, the stack pointer, was not aligned correctly (it should be divisible by 4). This wasn’t the case in all minidumps, but it was a good hint, given that it was in the usual range for Windows main-thread stack addresses. The CPU was probably jumping to garbage memory, and decoded it to something that incremented ESP. So after stripping that extra theoretical increment off ESP, I could inspect its memory in Visual Studio:

ESP dump

Whoa! A string on the stack suggests there was indeed something going on with client disconnects. Also, the value on the top of the stack – 0x0E531992 is very close to 0x0E531990. This was enough to form a wild guess: maybe the stack was misaligned, and the CPU picked the wrong return address to jump to. We can do better though.

The two most interesting binaries in the HL2 process are the engine and server. Let’s look at their address ranges in Visual Studio:

engine.dll   0BAF0000-0C10A000
server.dll   0DEB0000-0E3A4000

Two of the addresses close to the top of ESP fall into these ranges:
0x0012DA08 0e531990 0e531990 015c15f0 1320092c ..S...S.ð.\.,. .
0x0012DA18 0dfd4cda 100f0038 0bc40689 100f0038 ÚLý.8.....Ä.8...
0x0012DA28 0012de60 1320092c 63736944 656e6e6f `Þ..,. .Disconne
0x0012DA38 62207463 73752079 002e7265 00000000 ct by user......

Let’s take the outermost one first, since it may correspond to the stack frame that created that cool looking string. To go further, I like to use IDA – it’s a great disassembler and it has non-commercial and trial versions. With engine.dll loaded in IDA, all the addresses are wrong – luckily IDA can rebase the disassembly (Edit -> Segments -> Rebase Program):

Rebasing in IDA

For some reason you have to add 0x1000 to get the right addresses – I don’t know why. Now I go to Jump -> Jump to Address, and type in 0bc40689, and find this:

.text:0BC40683    lea     edx, [edi-4]
.text:0BC40686    push    edx
.text:0BC40687    call    eax
.text:0BC40689    lea     ecx, [esp+8]

Well, that’s kind of troublesome. IDA can’t tell where an indirect call will go. So I scroll around the function and look for hints, and get lucky. This is right above:

.text:0BC405F9    push    0
.text:0BC405FB    push    offset aPlayer_disconn ; "player_disconnect"
.text:0BC40600    call    edx
.text:0BC40602    mov     esi, eax

Again, this has something to do with player disconnects! Now I really want to know what function this is. Unfortunately Windows DLLS have symbols separated, but Linux binaries have them built-in. So I load the Linux version of the HL2 engine in IDA, and go to Jump -> Jump to Name, and type in aPlayer_disconn. By doing this I’m hoping the Linux binary has the same string, since IDA encodes string constants as unique names.

IDA name search

Here’s where IDA is extremely helpful – I click on aPlayer_disconn, and go to Jump -> Jump to cross reference:

IDA xref menu

And now I get this:

IDA xref menu 2

So now our mystery call stack has CGameClient::Disconnect() in it. Progress! Now I want to know what that virtual call usually goes to. First I find the same call in the Linux binary. The Windows one had this:

.text:0BC4067D    mov     eax, [eax+0A8h]
.text:0BC40683    lea     edx, [edi-4]
.text:0BC40686    push    edx
.text:0BC40687    call    eax
.text:0BC40689    lea     ecx, [esp+8]
.text:0BC4068D    push    ecx             ; char
.text:0BC4068E    push    offset aS_12    ; "%s"
.text:0BC40693    push    edi             ; int
.text:0BC40694    call    sub_BBA2750

The Linux one is much harder to read because of GCC’s (asinine) PIC code, which IDA has trouble figuring out. But the basic pattern I’m trying to mentally match is:

  1. A virtual call that flows from a large virtual offset.
  2. The string “%s”.
  3. A non-virtual call.

In the Linux version, this looks really close:

.text:00102BE4    lea     edi, (aS_3+2 - 343D64h)[ebx]
.text:00102BEA    mov     edx, [eax]
.text:00102BEC    mov     [esp+43Ch+var_438], ebp
.text:00102BF0    mov     [esp], eax
.text:00102BF3    call    dword ptr [edx+0ACh]
.text:00102BF9    mov     esi, [esp+43Ch+var_428]
.text:00102BFD    mov     [esp+43Ch+var_438], edi
.text:00102C01    mov     [esp+43Ch+var_43C], ebp
.text:00102C04    mov     [esp+43Ch+var_434], esi
.text:00102C08    call    __ZN11CBaseClient10DisconnectEPKcz

Unfortunately IDA had trouble decoding the string reference – aS_3 points to “, %s” – but note it adds two bytes, making “%s”. Now I start a Linux server in GDB, and compute the address I want to break on. This is easy by taking the IDA difference and adding it to the function in the process:

(gdb) p _ZN11CGameClient10DisconnectEPKcz
$1 = {} 0xb75f4a70
(gdb) p/x 0x00102BF3 - 0x00102A70
$2 = 0x183
(gdb) p $1 + $2
$3 = ( *) 0xb75f4bf3
(gdb) b *0xb75f4bf3
Breakpoint 2 at 0xb75f4bf3

Now I add and kick a bot (bold red text is my input):

(gdb) c
Continuing.
sv_cheats 1
bot

Cannot verify load for invalid steam ID [A:1:0:1]
kickid 2
Breakpoint 2, 0xb75f4bf3 in CGameClient::Disconnect ()
from /home/builds/common/srcds/orangebox/bin/engine_i486.so
(gdb)

Yay, it hit! Now I step into the assembly to see where this goes:

(gdb) display/i $pc
2: x/i $pc 0xb75f6bf3 <_ZN11CGameClient10DisconnectEPKcz+387>: call DWORD PTR [edx+0xac]
(gdb) si
0xb75edf00 in CGameServer::RemoveClientFromGame ()
from /home/builds/common/srcds/orangebox/bin/engine_i486.so

Now back in IDA, I go to Jump -> Jump to function and search for RemoveClientFromGame. Of interest in this function is the following code:

.text:000F9F6A     mov     ecx, ds:(serverGameEnts_ptr - 343D64h)[ebx]
...
.text:000F9F81     call    dword ptr [ecx+10h]

Since we have the symbol info, here I don’t have to do extensive guess work – this is almost definitely an IServerGameEnts pointer, a singleton interface the engine uses to feed information back to the actual game. Here it’s calling the offset for IServerGameEnts::FreeContainingEntity. Could this be the other address we saw on the stack, 0DFD4CDA? It seems likely, given that there is one pushed value in between, and the disassembly has one push as well. Let’s compare the target functions in source and binary. From loading server.dll in IDA:

.text:0DFD4CD0    mov         eax,dword ptr [esp+4] 
.text:0DFD4CD4    push        eax  
.text:0DFD4CD5    call        sub_0DFFA920 
.text:0DFD4CDA    pop         ecx  
.text:0DFD4CDB    retn         4    

The same function from hl2sdk/game/server/gameinterface.cpp:

Select All Code:
void CServerGameEnts::FreeContainingEntity( edict_t *e )
{
        ::FreeContainingEntity(e);
}

These seem really similar. So it seems like we might be crashing in ::FreeContainingEntity(), at a point where the stack depth is around 4 values. But looking at this function, it’s not really clear what could be happening:

Select All Code:
void FreeContainingEntity( edict_t *ed )
{
...
        CBaseEntity *ent = GetContainingEntity( ed );
        if ( ent )
        {
                ed->SetEdict( NULL, false );
                CBaseEntity::PhysicsRemoveTouchedList( ent );
                CBaseEntity::PhysicsRemoveGroundList( ent );
                UTIL_RemoveImmediate( ent );
        }
...
}

In HL2 parlance, UTIL_RemoveImmediate() means “immediately destroy and deallocate this entity”. The point at which the stack depth becomes 4 is around this call – but not exactly. At this point I gave up for the night and asked the user if he had any more information. But from this, I had the following guesses:

  1. Call stack that looked something like:
    (?+0) server.dll!FreeContainingEntity()
    (?+1) server.dll!CServerGameEnts::FreeContainingEntity()
    (?+2) engine.dll!CGameServer::RemoveClientFromGame()
    (?+3) engine.dll!CGameClient::Disconnect()
    
  2. The CPU jumped to an invalid address somewhere.
  3. The invalid address may be from a return instruction and a misaligned stack.

Amazingly, he posted a working test case that involved adding and kicking a bot on a specific game. I tried this game, and managed to reproduce it! I narrowed the problem down to a minimal set of “extra junk running”, and from there SM developer Fyren noticed the problem pretty quickly.

SourceMod overwrites virtual table slots to “detour” function calls in between the engine and whatever game is running (in this case, Day of Defeat:Source). The slot number can differ between games, so it is specified in a configuration file. One of these configuration entries was missing, and SourceMod was accidentally overwriting the zeroth vtable entiry – the destructor! Now the sequence of events is clear:

  1. Engine tells the game that a client is disconnecting.
  2. Game destroys the client object.
  3. C++ calls the virtual destructor, but the vtable entry points to a SourceMod function.
  4. The Windows ABI uses a callee-pop convention on member functions. The virtual destructor should be popping no parameters, but the SourceMod function pops two, since the conventions don’t match.
  5. Once SourceMod’s replacement function returns, the CPU tries to load the return address, and gets garbage.

Phew.

Given that the patch that added this code baked for months, it’s amazing it slipped into a release. And for the record, it had two peer reviews. But there you have it. Sometimes it’s possible to draw blood from a minidump stone – but without a test case it’s unlikely we could have gotten this as quickly.

Login Changes for AlliedModders

SourceMod developer and occasional ITer Scott Ehlert (“DS”) has overhauled the way we do source code access at AlliedModders. If you have any kind of Mercurial, SVN, or CVS access with us, this pertains to you. If not, you probably want to save yourself the boredom and read something else.

Domain Names
All source code services are now on a separate server. You need to use hg.alliedmods.net, svn.alliedmods.net, cvs.alliedmods.net, etc. Using just alliedmods.net will not work.

Secure Accounts
If you used SSH access (“ssh://” or “svn+ssh://”) to access a repository, your account is no longer valid. Our new account system uses e-mail addresses for user names. We’ve made an easy website for converting your old account to an e-mail address based one:

https://www.alliedmods.net/internal/oldaccount

If you had source code access and you want to keep it, you MUST provide a PUBLIC key. You CANNOT push to source code repositories with a password anymore. Keys are required, and of course you should be careful to protect the private half of your key.

When accessing source code repositories, you need to use your full account name. For example, “[email protected]@hg.alliedmods.net”. If you have SSH access to our webserver, you can continue to use your old account name.

Repository URLs
For Mercurial users, you no longer have to type “//hg/” before every path. For example, “//hg/sourcemod-central” is now “/sourcemod-central“.

For Subversion (SVN) users, the same applies for “/svnroot“. If you had “/svnroot/users/dvander” it is now “/users/dvander“.

User Repositories
We now have a fairly easy way for developers to create their own Mercurial repositories without filing bugs and waiting. You can read more here:

http://wiki.alliedmods.net/Mercurial_User_Repositories

This replaces our old, vastly insecure Subversion management tool. We don’t really want to give out new Subversion repositories unless there are extenuating circumstances. If you would like changes to your SVN repo, or a migration to Mercurial, please just file a bug.

Credits
I think we’re in a much better position now to quickly give community developers the resources they need to do secure, versioned, shared development. There were quite a few trials and tribulations getting this thing working. DS worked on and off on this for a while, only taking a “break” to help out with the L4D2 porting effort. Hopefully it’ll all have been worth it – and a huge thanks to him regardless.