Category Archives: Articles

“Official” articles that were pre-written for posting.

Inheritance Without Types

At AlliedModders we dabble in practical language design. Our two major scripting projects, AMX Mod X and SourceMod, iterated on our own fork of a scrappy language called Pawn. Our next iteration is untyped, and we’re adding object support.

What does object support mean? “Well, obviously,” I naively thought, “it means Java-style implementation inheritance!” Whoops. WRONG. It turns out there’s a ton of complexity with inheritance once you take away types. This post is a brief survey of some popular untyped languages and what we ended up deciding.

The Goal
We want our language to have as few pitfalls as possible. I’m not sure if I stole this from Graydon Hoare, but our motto is “No surprises!” We want people to be able to write code that works the way they expect, minimizing random or unforeseen run-time failures.

A great aspect of C++’s inheritance is that information hiding is super easy. A base class’s behavior isn’t trampled by derived classes. For example:

class Base {
    int x;
  public:
    Base() : x(5) { }
    void print() {
        printf("Base: %d\n", x);
    }
};
 
class Derived : public Base {
  public:
    int x;
    Derived() : x(20) { }
};
 
int main() {
    Derived().print();
}

As you’d expect, this prints ‘5’. The fact that Derived also declares x does not trample on Base’s own behavior. Now let’s take an untyped language, Python 3:

class Base:
    def __init__(self):
        self.x = 5
    def print(self):
        print(self.x)
 
class Derived(Base):
    def __init__(self):
        super().__init__()
        self.x = 20
 
Derived().print()

This prints ‘20’. In Python, “self” is really one object, with one set of properties and methods, whereas in the C++ model “this” is conceptually two objects, and “this” is statically typed to be the object of the containing class. I consider the Python model unappealing for two reasons:

  1. When adding a member to a derived class, you have to know that you’re not colliding with a member of your base class.
  2. When adding a member to a base class, you might be breaking any number of unknown consumers.
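For what it’s worth, Python does have a partial workaround that the example above doesn’t use: attribute names starting with two underscores are name-mangled per class. Here is a minimal sketch of that, not something from our language design. The method name `show` is my own, picked to avoid shadowing the built-in `print`.

```python
class Base:
    def __init__(self):
        self.__x = 5          # stored on the instance as _Base__x

    def show(self):
        return self.__x       # inside Base, this always means _Base__x


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__x = 20         # stored as _Derived__x, a separate attribute
        self.x = 30           # an ordinary public attribute, also separate


obj = Derived()
print(obj.show())             # Base's view of its own field
print(sorted(vars(obj)))      # all three attributes live on the one object
```

Here `show()` returns 5 even though Derived assigns its own `__x`. It is still one object with one namespace, though: the mangling is a naming convention rather than real information hiding, and two classes that share a name in the same hierarchy would still collide.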

JavaScript’s prototype-based inheritance discourages this style, but JS is super flexible, and if you don’t mind a very, very inefficient prototype chain, you can do this:

function Base() {
    var x = 5;
    this.print = function () {
        alert(x);
    }
}
 
function Derived() {
    return Object.create(new Base(), { x: { value: 20 } });
}
 
Derived().print()

This prints “5”.

PHP’s Answer
For whatever reason, PHP takes Java’s model wholesale. Here’s an example:

class Base {
    private $x;
    public function __construct() {
        $this->x = 5;
    }
    public function printMe() {
        print($this->x . "\n");
    }
}
 
class Derived extends Base {
    public $x;
    public function __construct() {
        parent::__construct();
        $this->x = 20;
    }
}
 
$obj = new Derived();
$obj->printMe();

This prints ‘5’, which is what we want – and makes sense. But is $this statically typed to the Base class, like in C++? No:

class Base {
    public function printMe() {
        print($this->stuff . "\n");
    }
}
 
class Derived extends Base {
    public $stuff;
    public function __construct() {
        $this->stuff = "hello";
    }
}
 
$obj = new Derived();
$obj->printMe();

This prints "hello", so PHP’s $this retains some dynamism. Let’s up the ante. What should this do?

class Base {
    private $x;
    public function __construct() {
        $this->x = 5;
    }
    public function compareTo($obj) {
        return $this->x == $obj->x;
    }
}
 
class Derived extends Base {
    public $x;
    public function __construct() {
        parent::__construct();
        $this->x = 20;
    }
}
 
$b = new Base();
$d = new Derived();
print($b->compareTo($d) . ", " . $d->compareTo($b) . "\n");

This prints true in both cases! (PHP renders true as 1, so the literal output is “1, 1”.) For any property access in a PHP class function, if the object has that class on its inheritance chain, it uses the property on that class. Otherwise, it searches from the derived-most class back to the base class like normal. This implicitly hides variables on the derived object. Nonetheless it’s the right choice given the model, especially considering that it’s usually bad practice for a base class’s behavior to explicitly cast to one of its derived classes.

Note that the fact that the inner x is private is actually irrelevant. Even if it were public, the base class should retain its behavior. Banning redeclaration works, though then you run the risk of potentially preventing an unknown consumer from compiling (albeit, better than silently being wrong). Similar issues occur with introducing new virtual functions or overloads.
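To make the lookup rule above concrete, here is a toy model of it in Python. This is only a sketch of the rule as I described it, not PHP’s actual implementation: `Klass`, `Obj`, and `lookup_slot` are made-up names, and visibility checks are ignored entirely.

```python
class Klass:
    """A toy class record: a name, an optional parent, and declared property names."""
    def __init__(self, name, parent=None, props=()):
        self.name, self.parent, self.props = name, parent, set(props)

    def chain(self):
        """Yield this class, then its ancestors (derived-most first)."""
        k = self
        while k is not None:
            yield k
            k = k.parent


class Obj:
    """A toy instance: one slot per (declaring class, property name) pair."""
    def __init__(self, klass):
        self.klass = klass
        self.slots = {(k.name, p): None for k in klass.chain() for p in k.props}


def lookup_slot(obj, prop, scope):
    """Pick the slot that code declared in class `scope` sees for obj.prop."""
    chain = list(obj.klass.chain())
    # Rule 1: the calling class is on the object's chain and declares the property.
    if scope in chain and prop in scope.props:
        return (scope.name, prop)
    # Rule 2: otherwise, search from the derived-most class back to the base.
    for k in chain:
        if prop in k.props:
            return (k.name, prop)
    raise AttributeError(prop)


base = Klass("Base", props=["x"])
derived = Klass("Derived", parent=base, props=["x"])

d = Obj(derived)
d.slots[("Base", "x")] = 5       # what Base's constructor wrote
d.slots[("Derived", "x")] = 20   # what Derived's constructor wrote

print(d.slots[lookup_slot(d, "x", scope=base)])      # as seen from Base's methods
print(d.slots[lookup_slot(d, "x", scope=derived)])   # as seen from Derived's methods
```

Code running in Base’s scope sees 5 and code in Derived’s scope sees 20, which is why compareTo() ends up comparing Base’s x on both objects.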

So, what did we do?
After letting this all sink in, I decided to scrap inheritance as a feature of our next language iteration. Having objects alone will be a huge step forward, and we can evaluate use cases from there. I’m satisfied with this for a few reasons.

Inheritance is really complicated with types, and even more complicated without. And in an untyped language, it’s not even clear if implementation inheritance is useful. PHP’s interfaces and abstract classes seem verbose and heavyweight in comparison to JavaScript or Python, in return for a very small amount of static guarantees.

So far Pawn and our derivatives have mainly been embedded in games, and the majority of those games are based on the Half-Life engines (including Source). We want our object model to be compatible with the game’s internal object model; however, I’m not convinced that copying it would be ideal. The Source engine has the “Big-Ass Base Entity Class from which Everything Derives,” a painful and complex architecture continuing Source’s theme of not being scalable. Do we really want to encourage that?

I suspect we’ll end up with something like traits and/or composition, but for now, I’m content with keeping it simple and continuing to evaluate the strengths of other languages’ models.
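As a rough illustration of why composition sidesteps the collision problem, here is a small Python sketch. It is not a preview of what we’ll ship; `Counter` and `Widget` are hypothetical names invented for the example.

```python
class Counter:
    """A self-contained component with its own state."""
    def __init__(self):
        self.x = 5

    def show(self):
        return self.x


class Widget:
    """Uses a Counter by holding one, instead of inheriting from it."""
    def __init__(self):
        self.counter = Counter()    # the component keeps its own namespace
        self.x = 20                 # no collision with Counter's x

    def show(self):
        return self.counter.show()  # explicit forwarding to the component


w = Widget()
print(w.show(), w.x)
```

Each object has its own set of properties, so neither side needs to know the other’s member names. The cost is the explicit forwarding, which is the part traits or delegation syntax would try to remove.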

(Addendum: I don’t know much about Perl or Ruby. I tried quickly evaluating them for this post but the syntax was a little intimidating. I’d really appreciate insight into other languages.)

Addendum 2: SaberUK contributes this Ruby example which prints 20 – exhibiting similar behavior to Python.

class Base
	def initialize
		@x = 5
	end
 
	def print_me
		puts @x
	end
end
class Derived < Base
	attr_accessor :x
	def initialize
		super
		@x = 20
	end
end
obj = Derived.new
obj.print_me

On Indonesia and Being Justin Bieber

Two weeks ago, a bunch of Mozillians went on a whirlwind tour of Indonesia, visiting Firefox 4 release parties around the country. We had a great time, and I found the trip really enlightening. Luke Wagner and Christian Legnitto handled the first four days of the trip, and then Dave Mandelin, Josh Aas and I did the rest.

Dave and I started off in San Francisco, making a 22-hour journey to Jakarta, with a stop in Taipei. We brought along extra suitcases full of Mozilla swag to give away: by my count about eight-hundred billion elastic headbands. It must have looked like a ton of cocaine when we put our luggage through the X-ray machine at customs. The security guards opened the suitcases up and looked perplexed, but let us go after I donned a headband for demonstration.

After immigration we met community members Viking Karwur (@vikingkarwur), Yofie Setiawan (@yofiesetiawan), and Arry T, the owner of a local private school. Arry took us to a restaurant where you order by walking around rows and rows of open coolers, filled with random frozen fish, live crabs, mussels, et cetera. We paced around with an empty cooler and grabbed what seemed like one of everything. (Mr. Burns: “I’ll just have a glass of milk… from THAT cow.”) Lo and behold about thirty minutes later it all arrived at our table fully cooked, and tasted great.

The next day we got up early and flew to Surabaya with Yofie (he was excited, it was his first time flying). The Surabaya venue took place in a shopping mall, with maybe 100-200 people showing up. There was a talk about web design (spoken entirely in Indonesian, but the slides were in good English), another from Yofie about, I think, Mozilla Indonesia, and then Dave, Josh, and I each presented a little bit about Firefox 4, concluding with JSNES and Flight of the Navigator demos. Then there was a kind of ribbon-cutting ceremony which involved a giant mound of rice. Each of us had to take a spoonful off the top without dropping it, which was surprisingly tricky.

What really surprised me was how much everyone liked Firefox – it has a huge market share in Indonesia, something like 77% – and how people were really excited that we came to visit. I think we spent a full thirty minutes posing for photographs at each event. As a C++ developer you don’t get treated like a rockstar too often, so that was pretty awesome (albeit exhausting). Afterward we got some local food, but I was so tired I don’t remember what it was.


The next day we drove three hours to Malang, which was actually a good chance to get a look at the country. The venue was at the polytechnic university, where we were treated to a plaque and boxes of local snacks. The students there had great questions. Recurring ones were: why does Firefox use so much memory? (we’re working on it), and when will it get a Blackberry port? (likely, never, but it’s a popular phone there). After Q&A we each talked a bit about why we like working for Mozilla, and Dave cut a ribbon which released a big balloon. A few students were interested in how to start a career at Mozilla. I wasn’t quite sure what to say, but the internship program and just community involvement seem like great vectors. Unfortunately, Dave got pretty sick on the ride back to the hotel, so he was down for the next day.

At the Makassar event we temporarily lost Yofie as our babysitter, instead being placed in the capable hands of Rara (@rara79) and Mamie (@alwaysmamie). The event was in a café, and was structured a little more rigorously than the others – most questions were in Indonesian and translated by the MC. The audience seemed to be lots of bloggers and web developers in the community. Questions seemed to fall into four main categories: memory use, support for Java phones, UI changes and support issues, and wondering where the money comes from. All good questions! Afterward, we went to Pizza Hut (of all places – it was actually pretty fancy), and got cheese-stuffed crust with cornflakes.


Finally, on the last day, we flew to Bali and met up with Viking and Yofie again. Josh and I relaxed at the resort pool, which was occupied mostly by Australians. In the evening we went to a local café for a smaller venue, maybe a dozen people, with a local radio host as MC. At our table sat an American expatriate, Ken McClellan, CTO of Mitrais, who talked about what it’s like living and running a company in Indonesia.

Indonesia is very clearly a developing country. Foreigners are advised not to drink the tap water, and to be careful about uncooked food, as they won’t have the same resistances as locals. In cities you can see a striking mix of poorer and middle-class areas, and the pollution near congested roads can be very noticeable, at least if you’re asthmatic like me. Traffic is amusingly chaotic – turn signals, stop lights (there are very few), and speed limits are all optional. When you’re in a car, honking is nearly constant as an advisory measure, since there are no clearly defined lanes, and the majority of vehicles on the road are motor scooters. Jakarta even had motorized rickshaws; most were very old, adding to the anachronistic mix of old and new technology. Despite all this, traffic actually seemed pretty reasonable, though as a Californian used to big open roads with camera-enforced stop lights every ten feet, I’d be too nervous to try driving.

The national airline, Garuda, was much higher quality than any domestic airline in the States. It kind of reminded me of what flying was like before 9/11, or when American airlines weren’t terrible. I guess what sums this paragraph up is that the flight seat pockets had pamphlets advertising “Investing in Indonesia.” I’m really curious as to what the country will look like in 20-30 years, because it seems like it’s growing rapidly.

All in all, it was a great trip, and I’d definitely go back. It helped having a local guide; Yofie, Viking, and others’ help was really appreciated. The food was great (Nasi Goreng anything, Oxtail Soup were my favorites), and cheap (coming from USD), and everyone we met was friendly – even complete strangers walking around. It was awesome seeing everyone so excited about Firefox, and whatever we’re doing that makes Firefox so popular in Indonesia, we should figure it out and do it more!

I am not Justin Bieber.

I would be remiss if I didn’t conclude by mentioning how people thought I was Justin Bieber. It was so persistent that Josh started calling me Justin. Near the end of the trip, swimming in Bali, he said that when my hair was slicked back it was “way more manly,” so I tried that for a day, but decided I couldn’t afford the daily metric ton of hairgel. Incidents of mistaken Bieber identity:

  • A few times in person at each event, including this tweet.
  • People were whispering it at the airport in Makassar.
  • Walking around Makassar, people rolled down the windows of their cars and called out at me and Josh.
  • Running into a group of loitering teens, they asked “who I was” and burst out giggling when I shrugged.
  • An airport employee in Makassar waved to me from behind a booth and yelled “Justin Bieber”! – I had to wave back.

Well, now that my pop star career is over, I’ve decided to go into dynamic language optimization.

Debugging 500 Errors

For the past two years, the AlliedModders site has been randomly dishing out “HTTP 500 – Internal Server Error” responses. It’s been pretty frustrating.

I actually tried debugging the errors two years ago. I got far enough to attach GDB to PHP, and saw it blocked on mysql_query(), but didn’t know how to proceed. We knew that during these incidents, the server load would spike to obscene levels – like 100-150. Lots of PHP processes were getting spawned. “iowait” time was through the roof, and CPU usage was normal. Something couldn’t read the disk, or the disk was getting hammered. I put MySQL on a separate hard drive, but the problems persisted and I gave up. Keep in mind: I didn’t know anything about running servers, and I still don’t – I just have a vague notion of how to debug things.

Fast forward to a few weeks ago, we had a serious hard drive crash and suffered some painful downtime. We lost a lot of data. I clung to a hope that our years of errors had been caused by a failing disk. Predictably, we weren’t so lucky, and the website problems continued. Someone joked, “Hey, I’m glad to see you got the server back up, even the 500 errors!” Yeah… it was time to fix this. Somehow.

Step One: Apache Logs

There were only two notable Apache errors. They’d appear over and over, but only in large clumps:

[Tue Jan 25 04:15:19 2011] [warn] [client xx.xxx.xxx.xx] (110)Connection timed out: mod_fcgid: ap_pass_brigade failed in handle_request function
[Tue Jan 25 04:15:25 2011] [warn] [client xxx.xxx.xx.xx] mod_fcgid: read data timeout in 60 seconds

It was pretty clear that somehow, communication was breaking down between mod_fcgid (Apache) and PHP processes. I read, I think, every single webpage on the Internet involving these two errors. Twice. They were all dead ends, both for us and for the people reporting their own problems. A little sad. I wanted to blame mod_fcgid, since historically it has been really buggy for us, but I wanted to be certain before pinning the blame on it.

So the next step was to break out a debugger.

Step Two: Debugging PHP, Apache

AM developer Fyren edited the mod_fcgid source code, such that instead of killing PHP, it would attach gdb in a separate tty. It usually took around three hours to get a debugging session going. The call stacks looked like:

(gdb) bt
...
#4 0x00007f1f7cad804f in net_write_command () from /usr/lib/libmysqlclient.so.15
#5 0x00007f1f7cad4c51 in cli_advanced_command () from /usr/lib/libmysqlclient.so.15
#6 0x00007f1f7cad1751 in mysql_send_query () from /usr/lib/libmysqlclient.so.15
#7 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15
...
(gdb) fin
Run till exit from #0 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15

At this point, the session would hang, indicating that MySQL was not responding. I really had no idea what to do, but we got lucky. We noticed during these incidents, Apache’s server-status was stuck at “200 requests being processed,” all in the “W” (“Sending reply”) state. Our MaxClients setting was exactly 200. Now we had a hypothesis: MySQL was intermittently not responding for short bursts of time. This blocked PHP, which in turn made mod_fcgid block and time out, causing Apache to spawn a bunch of workers that would lock up as well.

Unfortunately, the intermittent failures only happened a few times a day, so debugging went slow. We managed to catch one, and ran iostat:

Device:      r/s    w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm   %util
sdc       320.80   4.40  3540.80   54.40     11.06      4.32  13.01   3.08  100.00
sdc1      320.80   4.40  3540.80   54.40     11.06      4.32  13.01   3.08  100.00

Someone was reading tons of data off the disk, and iotop confirmed MySQL was the culprit.

Diagnosing MySQL

I managed to find this enlightening presentation by Percona. Case study #3 looked identical to ours, except in their case, MySQL was writing. But their conclusion about bad queries seemed plausible. I made a script that would peek at server-status every second, log a ton of data, and e-mail me if it went above 100 concurrent requests. The data included oprofiling, iotop spew, SHOW FULL PROCESSLIST, SHOW OPEN TABLES, SHOW GLOBAL STATUS, etc.
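I didn’t keep the original script in a postable state, but the threshold check at its core looks roughly like the sketch below. It assumes Apache’s mod_status machine-readable output (the `?auto` form of server-status), which reports a `BusyWorkers:` line. The function names and the sample text are made up for illustration.

```python
import re

ALERT_THRESHOLD = 100  # concurrent requests, the figure mentioned above


def busy_workers(status_text):
    """Extract the BusyWorkers count from mod_status '?auto' style output."""
    match = re.search(r"^BusyWorkers:\s*(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no BusyWorkers line found")
    return int(match.group(1))


def should_alert(status_text, threshold=ALERT_THRESHOLD):
    """True when the number of in-flight requests is above the threshold."""
    return busy_workers(status_text) > threshold


# Hand-written sample in the shape of '?auto' output, not a real capture.
sample = "Total Accesses: 1234\nBusyWorkers: 150\nIdleWorkers: 50\n"
print(busy_workers(sample), should_alert(sample))
```

A real monitor would fetch the status page once a second and, when `should_alert` fires, capture the profiler, iotop, and MySQL status output before sending the e-mail. That part is left out here because it depends entirely on the local setup.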

Oprofile’s report wasn’t too interesting, but the MySQL status showed around 150 queries waiting for a table lock. The active queries, in all cases, were SELECTs. This explained the reads, but not the table locks. This didn’t make sense, so I generated a few core files of MySQL from within GDB, then Fyren and I did some post-mortem analysis.

The analysis revealed that MySQL had about 200 threads, most of which were waiting on locks (pthread_cond_timedwait) with a timeout of one year. The locks were almost all read locks for SELECTs, and the actually succeeding queries were exclusively SELECTs. Fyren tried to track down the thread owning the locks, but it ended up being apparently circular (thread X’s lock was held by thread Y, and thread Y was waiting on a lock held by thread X).

That was all pretty mysterious (and still is), but rather than try to understand exactly what locks are in MyISAM tables, I took a look at the non-deadlocked queries. They were all one of: the AMX Mod X plugin search, the SourceMod plugin search, or the vBulletin member list search.

It turns out all three of these queries were extremely expensive – containing ORs, joins, full table scans of 800MB tables, etc. On average they each took about 500ms to complete. That’s without MySQL freaking out. I turned off vBulletin member searching (most large boards do), and replaced the plugin searches with faster queries off a much smaller table. This brought those page load times down to 20ms. Twelve hours later, no 500 errors had occurred, and there were no mod_fcgid failures in Apache’s logs.

Too Early to Declare Victory

While waiting to see if any more 500 errors would show up, suddenly the server load spiked, and lo and behold – mod_fcgid became overloaded. Our monitoring script kicked in, and this time, MySQL was nowhere in the profile! It turned out some forum users were effectively DoS’ing the site by holding down F5 on their profile pages to bump up the “view” count. OProfile suggested that 20% of CPU time, during this incident, went to PHP parsing and lexing source code. I installed eAccelerator and now that time appears to be gone. (And for good measure, I removed profile view counts.)

Growth and Traffic

It’s unclear what our load and traffic really is. I think we’re pretty good though. CPU utilization is usually under 20%, disk I/O is under 10%, and we rarely see more than 30 concurrent requests. We don’t need to invest in a separate MySQL server yet, and we definitely aren’t growing fast enough to expect to need one anytime soon.

Community Involvement

I would like to thank Fyren for his help in looking into these problems, and asherkin for helping beef up our non-existent monitoring. Also, thanks to AzuiSleet, Drunken_Fool, MatthiasVance, and devicenull for sticking around in IRC and answering random questions.

Diagnose, then Solve

Many people, well-intentioned, said “switch to lighttpd, nginx, IIS, use my favorite host, use a VPS, buy more servers,” etc. I got caught up in this too, and fueled it by initially blaming mod_fcgid. It’s really important to make sure a solution has targeted a specific problem, and I had to keep telling myself to keep a cool head. Sometimes I had to slap myself or wait for Fyren to slap me. They all might be good ideas in their own right, but without diagnosing the actual cause of the errors, there was no guarantee of a fix. By careful debugging (albeit in painful, hour-per-day sprees), we got pretty close to the root cause, whereas throwing our infrastructure into a salad bowl would have delayed the inevitable.

Part of the reason our original drive is still at the recovery lab is because someone didn’t diagnose before “solving.” When it failed, a technician (not one of us) ran fsck on the disk instead of doing a hardware diagnostic first. I have no doubt this caused untold extra damage.

Conclusion

tl;dr, the forums should be much faster now, and they shouldn’t error anymore. I’ve got the system set up to e-mail me and dump diagnostics when our load goes way above normal, so I’ll continue monitoring for potential problems.

I would like to sincerely thank the whole AlliedModders community for the successful donation drive. I won’t lie, the past month has been pretty sucky. Despite our failings, the fact that everyone cares enough to see the whole thing keep going is really awesome. Our seven year anniversary was just a few weeks ago. Here’s to an AlliedModders for 2011!

JavaScript Checkers

I’ve been itching to write some JavaScript, so a few days ago I threw together a Checkers game that uses only HTML5 and JS. It’s got a simple AI that terminates after three seconds. The faster your browser can run JavaScript, the smarter the AI will be.

For example, here is Firefox 4 Beta (blue) versus Firefox 3 (red) – it happens that Firefox 4 is roughly 10X faster at this program, and soundly defeated its predecessor:

[Image: Firefox 4 Beats Firefox 3 at Checkers]

The algorithm for the AI is UCT, a form of Monte-Carlo Tree Search. The idea is to estimate the likelihood of winning from a position by simulating hundreds or thousands of random games. UCT helps prune the search space by quickly eliminating bad positions.
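The selection step is where the pruning happens. Below is a minimal Python sketch of the standard UCB1 rule that UCT uses to decide which move to simulate next. It is not the game’s actual source (that is JavaScript), and the exploration constant of sqrt(2) is the textbook default rather than a value I tuned.

```python
import math


def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average win rate plus an exploration bonus."""
    if visits == 0:
        return math.inf            # always try unvisited moves first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (wins, visits) pairs, one per legal move.
    """
    parent_visits = sum(v for _, v in children)
    scores = [ucb1(w, v, parent_visits) for w, v in children]
    return scores.index(max(scores))


# Three candidate moves with (wins, visits) from earlier random playouts.
stats = [(60, 100), (3, 10), (0, 0)]
print(select_child(stats))                   # the unvisited move goes first
print(select_child([(60, 100), (3, 10)]))    # then the under-explored one
```

Moves that keep losing see their score fall as their visit count grows, so the simulation budget drifts toward promising positions. That is also why a faster engine plays better: more playouts fit into the three seconds.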

While writing the source for this, I tried to turn off the part of my brain that said, “Hey! I know Firefox version x.y.z might be slow or fast at feature X.” That turned out to be okay, but it was harder to avoid general knowledge about JavaScript engines. I ended up with three variations:

  • Fast Checkers, which has manual inlining, manual strength reduction, hardcoded constants, and other gross stuff. Pieces and positions are represented via packed integers.
  • Slow Checkers, which removes manual inlining, strength reduction, and baked-in constants. Here, the additional overhead is extra memory accesses and function calls.
  • OO Checkers, which is the same as “slow”, except represents pieces as objects instead of packed integers. Here, an additional overhead is object allocation.
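To show what “packed integers” means here, this is a small Python sketch of the idea. The bit layout is hypothetical (five bits for one of the 32 playable squares, one bit for the player, one for king status) and is not necessarily the layout the game uses.

```python
# Hypothetical layout: bits 0-4 = square (0..31), bit 5 = player, bit 6 = king.
SQUARE_MASK = 0x1F
PLAYER_BIT = 1 << 5
KING_BIT = 1 << 6


def pack(square, player, king):
    """Pack one piece into a small integer."""
    return (square & SQUARE_MASK) | (PLAYER_BIT if player else 0) | (KING_BIT if king else 0)


def square_of(piece):
    """Square index stored in the low five bits."""
    return piece & SQUARE_MASK


def player_of(piece):
    """Owning player, 0 or 1."""
    return (piece >> 5) & 1


def is_king(piece):
    """Whether the king bit is set."""
    return bool(piece & KING_BIT)


piece = pack(square=18, player=1, king=False)
print(piece, square_of(piece), player_of(piece), is_king(piece))
```

The “OO” variant would instead allocate an object with square, player, and king fields for each piece, which is easier to read but costs an allocation every time a piece is created.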

Performance seems uniform across most browsers. Below I’ve recorded the number of games each browser could simulate per second. Higher is better. Note – this chart is totally unscientific, and random simulations are affected by the behavior of Math.random().

Browser           Fast Checkers   Slow Checkers   OO Checkers
Firefox 4 Beta            14035            9018          9100
IE9 PP6                   14579            8234          8067
Opera 11 Alpha            13165            8178          8749
Safari 5                  12442            8045          8700
Chrome 9 Canary            4160            2060          2343

And – because why not – I ran them on my Droid 2 as well.

Browser           Fast Checkers   Slow Checkers   OO Checkers
Fennec 2b3pre               338             170           220
Android Browser             185              93           114
Opera Mobile                166             112           126

Since I’m pretty bad at web development, and don’t write JavaScript (sans test-cases) nearly as much as I should, this was an amusing experience. I kept making some really dumb mistakes, one repeatedly:

Game.prototype.player = function () {
    return this.board.player;
}
...
var player = game.player;
if (player == x) { ...

And wondering why “player” showed as a function in the developer console. I probably should have used ES5 getters. A few other language features would have made the experience a little nicer – but nothing so much as real modules. I tried to emulate good encapsulation with closures, but it’s just not the same. And it doesn’t seem like any engine is smart enough yet to propagate constants through closures (which is one difference between the “fast” and “slow” variants).

Using developer tools for the first time was also an interesting experience. Firefox 4 and Chrome can debug code with a reasonable slow-down, but IE9 became over 100X slower; presumably it cannot debug with the JIT yet. I used Firebug until I needed single-stepping (we didn’t have that working in JägerMonkey for Beta 7), and then bounced over to Chrome – both proved really invaluable. I think my days of calling alert() are over.

Anyway, it was fun writing this, and I’m glad that I can write something computationally expensive and have it be fast everywhere. If and when time permits I may try a more stimulating game like Go.

Land Ho, Fast JavaScript!

Firefox just got a whole lot faster.

I’m excited to announce that Mozilla’s new JavaScript engine, JägerMonkey, is now available for testing!

What is JägerMonkey?

JägerMonkey is our new optimizing JIT compiler for JavaScript. It sits underneath our existing JIT, TraceMonkey, which appeared in Firefox 3.5. If you recall from previous posts, TraceMonkey’s job is to optimize loops to very fast machine code. However, not all code has loops, and not all loops can be trace compiled.

JägerMonkey is a general-purpose compiler which converts entire methods to machine code. The goal is to get great baseline performance. When it detects a loop that can be traced, it automatically engages the trace compiler, which makes it even faster. Yes, that’s right: there’s a turbo button inside.

This hybrid approach is designed to use well-established optimization techniques that work everywhere, and combine them with our existing hyper-optimizing engine that handles smaller subsets of code.

Results.

If you’ve been obsessing over Are We Fast Yet? like me, you’ve seen the numbers dive. Want to try it out? Click here to get preview builds of Firefox 4 with our new JavaScript engine. You can play demos like JSNES at a full, glorious 60FPS.

Disclaimer: It’s a preview – we’re still ironing out the rare kinks. Please report bugs or tell us if something’s wrong (or slow!)

Benchmarks.

We’ve been using the SunSpider 0.9.1 and V8-v5 benchmarks to gauge our general progress. On SunSpider, we’re a full 2X faster than Firefox 3.6!

Our improvement on the V8-v5 benchmark is even more dramatic – 4X!

Ongoing Work.

The rockin’ doesn’t stop here. Right now we’re polishing off the final pieces to get into the next Firefox 4.0 Beta. At the same time, here are some of the immediate performance works-in-progress:

  1. Function Calls. As discussed previously, this is one of our last big areas of optimization. The first of four major pieces, caching call sequences, was completed this week. The second big chunk, which Luke Wagner has slated for this week, will make arguments and stack frames faster. Brian Hackett, Chris Leary, and Bill McCloskey have more stack frame optimizations as part of the third wave.
  2. Tracer Integration. Deciding when to use the turbo button is pretty hard, but Bill and Dave have been researching it thoroughly. Right now we’re just scratching the surface, and we will have much better heuristics by the end of the month.
  3. Web Optimizations. Community member Jan de Mooij is continually finding demos and real-world tools and improving performance “gotchas” in our engine, like making common arithmetic patterns faster.

Conclusions.

Firefox 4 is seeing dramatic wins over 3.6 and the web is feeling faster. You can try it out now using a JS Engine Preview, or wait for Firefox 4 Beta 6.

Please stay tuned as we approach JägerMonkey end-game for Firefox 4. Dave Mandelin and I will be blogging, and for smaller things, tweeting (his here) progress & technical updates.

JägerMonkey has Crossed the Streams

On July 12th, JägerMonkey officially crossed TraceMonkey on the v8 suite of benchmarks. Yay! It’s not by a lot, but this gap will continue to widen, and it’s an exciting milestone.

A lot’s happened over the past two months. You’ll have to excuse our blogging silence – we actually sprinted and rewrote JägerMonkey from scratch. Sounds crazy, huh? The progress has been great:

[Image: AWFY feed, v8-richards]

The black line is the new method JIT, and the orange line is the tracing JIT. The original iteration of JägerMonkey (not pictured) was slightly faster than the pink line. We’ve recovered our original performance and more in significantly less time.

What Happened…

In early May, Dave Mandelin blogged about our half-way point. Around the same time, Luke Wagner finished the brunt of a massive overhaul of our value representation. The new scheme, “fat values”, uses a 64-bit encoding on all platforms.

We realized that retooling JägerMonkey would be a ton of work. Armed with the knowledge we’d learned, we brought up a whole new compiler over the next few weeks. By June we were ready to start optimizing again. “Prepare to throw one away”, indeed.

JägerMonkey has gotten a ton of new performance improvements and features since the reboot that were not present in the original compiler:

  • Local variables can now stay in registers (inside basic blocks).
  • Constants and type information propagate much better. We also do primitive type inference.
  • References to global variables and closures are now much faster, using more polymorphic inline caches.
  • There are many more fast-paths for common use patterns.
  • Intern Sean Stangl has made math much faster when floating-point numbers are involved – using the benefits of fat values.
  • Intern Andrew Drake has made our JIT’d code work with debuggers.

What about Tracer Integration?

This is a tough one to answer, and people are really curious! The bad news is we’re pretty curious too – we just don’t know what will happen yet. One thing is sure: if not carefully and properly tuned, the tracer will negatively dominate the method JIT’s performance.

The goal of JägerMonkey is to be as fast as or faster than the competition, whether or not tracing is enabled. We have to integrate the two in a way that gives us a competitive edge. We didn’t do this in the first iteration, and it showed on the graphs.

This week I am going to do the simplest possible integration. From there we’ll tune heuristics as we go. Since this tuning can happen at any time, our focus will still be on method JIT performance. Similarly, it will be a while before an integrated line appears on Are We Fast Yet, to avoid distraction from the end goal.

The good news is, the two JITs win on different benchmarks. There will be a good intersection.

What’s Next?

The schedule is tight. Over the next six weeks, we’ll be polishing JägerMonkey in order to land by September 1st. That means the following things need to be done:

  • Tinderboxes must be green.
  • Everything in the test suite must JIT, sans oft-maligned features like E4X.
  • x64 and ARM must have ports.
  • All large-scale, invasive perf wins must be in place.
  • Integration with the tracing JIT must work, without degrading method JIT performance.

For more information, and who’s assigned to what, see our Path to Firefox 4 page.

Performance Wins Left

We’re generating pretty good machine code at this point, so our remaining performance wins fall into two categories. The first is driving down the inefficiencies in the SpiderMonkey runtime. The second is identifying places we can eliminate use of the runtime, by generating specialized JIT code.

Perhaps the most important is making function calls fast. Right now we’re seeing JM’s function calls being upwards of 10X slower than the competition. Its problems fall into both categories, and it’s a large project that will take multiple people over the next three months. Luke Wagner and Chris Leary are on the case already.

Lots of people on the JS team are now tackling other areas of runtime performance. Chris Leary has ported WebKit’s regular expression compiler. Brian Hackett and Paul Biggar are measuring and tackling what they find – so far lots of object allocation inefficiencies. Jason Orendorff, Andreas Gal, Gregor Wagner, and Blake Kaplan are working on Compartments (GC performance). Brendan is giving us awesome new changes to object layouts. Intern Alan Pierce is finding and fixing string inefficiencies.

During this home stretch, the JM folks are going to try and blog about progress and milestones much more frequently.

Are We Fast Yet Improvements

Sort of old news, but Michael Clackler got us fancy new hovering perf deltas on arewefastyet.com. wx24 gave us the XHTML compliant layout that looks way better (though, I’ve probably ruined compliance by now).

We’ve also got a makeshift page for individual test breakdowns now. It’s nice to see that JM is beating everyone on at least *one* benchmark (nsieve-bits).

Summit Slides

They’re here. Special thanks to Dave Mandelin for coaching me through this.

Conclusion

Phew! We’ve made a ton of progress, and a ton more is coming in the pipeline. I hope you’ll stay tuned.

Debugging the Impossible

A few days ago we released SourceMod 1.3 and suddenly got a few reports of crashing when clients disconnect. We couldn’t reproduce it, but a few users claimed they could. We had about ten minidumps to work with (they include a few KB of stack memory and that’s it).

All of the stack traces had a weird code address at the top and nothing else. Here are two examples:

>	1673a14a()	

EAX = 0ED884B0 EBX = 00000000 ECX = 0EDAC948 EDX = 0EDA6A48 
ESI = 0DFFA95E EDI = 100F0101 EIP = 1673A14A ESP = 0012DA09 
EBP = 00000001 EFL = 00010282 
>	0e531992()	

EAX = 0ED884B0 EBX = 00000000 ECX = 0EDAC948 EDX = 0EDA6A4
ESI = 0DFFA95E EDI = 100F0039 EIP = 0E531992 ESP = 0012DA09
EBP = 00000001 EFL = 00010282 

These code addresses are weird because the debugger couldn’t map them to any loaded module. So they’re either dynamically generated/exploited, or garbage. What can we do with so little information? Surprisingly, a lot!

The first thing of interest was that ESP, the stack pointer, was not aligned correctly (it should be divisible by 4). This wasn’t the case in all minidumps, but it was a good hint, given that it was in the usual range for Windows main-thread stack addresses. The CPU was probably jumping to garbage memory, and decoded it to something that incremented ESP. So after stripping that extra theoretical increment off ESP, I could inspect its memory in Visual Studio:

ESP dump

Whoa! A string on the stack suggests there was indeed something going on with client disconnects. Also, the value on the top of the stack, 0x0E531990, is very close to the crashing EIP in the second dump, 0x0E531992. This was enough to form a wild guess: maybe the stack was misaligned, and the CPU picked the wrong return address to jump to. We can do better though.
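
The alignment check and the near-miss between the two addresses are plain arithmetic. Here is a minimal Python sketch using the register values from the second minidump and the first word of the stack dump shown further down. The variable names are mine, and the comments only restate the guess above; they don't prove it.

```python
# Values copied from the second minidump; the names are illustrative.
esp = 0x0012DA09        # stack pointer at the time of the crash
eip = 0x0E531992        # instruction pointer at the time of the crash
stack_top = 0x0E531990  # 32-bit word stored at the aligned top of the stack

# A 32-bit x86 stack pointer is normally a multiple of 4.
misalignment = esp % 4
aligned_esp = esp - misalignment

# How far past the stack-top value is the crashing instruction pointer?
distance = eip - stack_top

print(f"ESP is off by {misalignment} byte(s); aligned ESP = {aligned_esp:#010x}")
print(f"EIP is {distance} byte(s) past the value on top of the stack")
```

An EIP that sits just past a value found on the stack is what you would expect if the CPU "returned" to that value and then executed a byte or two of whatever happened to be there.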

The two most interesting binaries in the HL2 process are the engine and server. Let’s look at their address ranges in Visual Studio:

engine.dll   0BAF0000-0C10A000
server.dll   0DEB0000-0E3A4000

Two of the values near the top of the stack fall into these ranges:
0x0012DA08 0e531990 0e531990 015c15f0 1320092c ..S...S.ð.\.,. .
0x0012DA18 0dfd4cda 100f0038 0bc40689 100f0038 ÚLý.8.....Ä.8...
0x0012DA28 0012de60 1320092c 63736944 656e6e6f `Þ..,. .Disconne
0x0012DA38 62207463 73752079 002e7265 00000000 ct by user......
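
Scanning a dump like this by eye works, but it is easy to automate. The sketch below is a hedged illustration in Python: it takes the sixteen words from the dump above, reports which ones fall inside the engine.dll or server.dll ranges, and decodes the little-endian tail of the dump as text. The helper name and the choice to treat each range as half-open are my assumptions.

```python
import struct

# Module ranges as listed by Visual Studio (treated here as [start, end)).
MODULES = {
    "engine.dll": (0x0BAF0000, 0x0C10A000),
    "server.dll": (0x0DEB0000, 0x0E3A4000),
}

STACK_BASE = 0x0012DA08
# 32-bit words from the dump, lowest address first.
STACK_WORDS = [
    0x0E531990, 0x0E531990, 0x015C15F0, 0x1320092C,
    0x0DFD4CDA, 0x100F0038, 0x0BC40689, 0x100F0038,
    0x0012DE60, 0x1320092C, 0x63736944, 0x656E6E6F,
    0x62207463, 0x73752079, 0x002E7265, 0x00000000,
]

def owning_module(addr):
    """Return the name of the module whose range contains addr, or None."""
    for name, (start, end) in MODULES.items():
        if start <= addr < end:
            return name
    return None

# Collect every stack slot whose value points into a known module.
hits = []
for i, word in enumerate(STACK_WORDS):
    module = owning_module(word)
    if module is not None:
        hits.append((STACK_BASE + 4 * i, word, module))

for slot, word, module in hits:
    print(f"{slot:08X}: {word:08X} -> {module}")

# x86 is little-endian, so repack the words as bytes and read the string
# that starts at the eleventh word (byte offset 40).
raw = struct.pack("<16I", *STACK_WORDS)
message = raw[40:].split(b"\0", 1)[0].decode("ascii")
print(message)
```

Only two slots match: one value inside server.dll and, slightly deeper in the stack, one inside engine.dll.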

Let’s take the outermost one first, since it may correspond to the stack frame that created that cool looking string. To go further, I like to use IDA – it’s a great disassembler and it has non-commercial and trial versions. With engine.dll loaded in IDA, all the addresses are wrong – luckily IDA can rebase the disassembly (Edit -> Segments -> Rebase Program):

Rebasing in IDA

For some reason you have to add 0x1000 to get the right addresses – I don’t know why. Now I go to Jump -> Jump to Address, and type in 0bc40689, and find this:

.text:0BC40683    lea     edx, [edi-4]
.text:0BC40686    push    edx
.text:0BC40687    call    eax
.text:0BC40689    lea     ecx, [esp+8]

Well, that’s kind of troublesome. IDA can’t tell where an indirect call will go. So I scroll around the function and look for hints, and get lucky. This is right above:

.text:0BC405F9    push    0
.text:0BC405FB    push    offset aPlayer_disconn ; "player_disconnect"
.text:0BC40600    call    edx
.text:0BC40602    mov     esi, eax

Again, this has something to do with player disconnects! Now I really want to know what function this is. Unfortunately Windows DLLs have their symbols stored separately, but Linux binaries have them built in. So I load the Linux version of the HL2 engine in IDA, and go to Jump -> Jump to Name, and type in aPlayer_disconn. By doing this I’m hoping the Linux binary has the same string, since IDA encodes string constants as unique names.

IDA name search

Here’s where IDA is extremely helpful – I click on aPlayer_disconn, and go to Jump -> Jump to cross reference:

IDA xref menu

And now I get this:

IDA xref menu 2

So now our mystery call stack has CGameClient::Disconnect() in it. Progress! Now I want to know what that virtual call usually goes to. First I find the same call in the Linux binary. The Windows one had this:

.text:0BC4067D    mov     eax, [eax+0A8h]
.text:0BC40683    lea     edx, [edi-4]
.text:0BC40686    push    edx
.text:0BC40687    call    eax
.text:0BC40689    lea     ecx, [esp+8]
.text:0BC4068D    push    ecx             ; char
.text:0BC4068E    push    offset aS_12    ; "%s"
.text:0BC40693    push    edi             ; int
.text:0BC40694    call    sub_BBA2750

The Linux one is much harder to read because of GCC’s (asinine) PIC code, which IDA has trouble figuring out. But the basic pattern I’m trying to mentally match is:

  1. A virtual call that flows from a large virtual offset.
  2. The string “%s”.
  3. A non-virtual call.

In the Linux version, this looks really close:

.text:00102BE4    lea     edi, (aS_3+2 - 343D64h)[ebx]
.text:00102BEA    mov     edx, [eax]
.text:00102BEC    mov     [esp+43Ch+var_438], ebp
.text:00102BF0    mov     [esp], eax
.text:00102BF3    call    dword ptr [edx+0ACh]
.text:00102BF9    mov     esi, [esp+43Ch+var_428]
.text:00102BFD    mov     [esp+43Ch+var_438], edi
.text:00102C01    mov     [esp+43Ch+var_43C], ebp
.text:00102C04    mov     [esp+43Ch+var_434], esi
.text:00102C08    call    __ZN11CBaseClient10DisconnectEPKcz

Unfortunately IDA had trouble decoding the string reference – aS_3 points to “, %s” – but note it adds two bytes, making “%s”. Now I start a Linux server in GDB, and compute the address I want to break on. This is easy: take the offset of the call instruction from the start of the function in IDA, and add it to the function’s address in the running process:

(gdb) p _ZN11CGameClient10DisconnectEPKcz
$1 = {} 0xb75f4a70
(gdb) p/x 0x00102BF3 - 0x00102A70
$2 = 0x183
(gdb) p $1 + $2
$3 = ( *) 0xb75f4bf3
(gdb) b *0xb75f4bf3
Breakpoint 2 at 0xb75f4bf3
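
The same computation, written out as a small Python sketch. It assumes 0x00102A70 is the address IDA shows for the start of CGameClient::Disconnect, which the subtraction in the GDB session implies. The variable names are illustrative. The first lines also check the string detail: skipping two bytes of ", %s" leaves "%s".

```python
# aS_3 points at ", %s"; the code adds 2 to that address.
a_s_3 = b", %s"
format_string = a_s_3[2:]

# Addresses in the on-disk Linux binary, as shown by IDA.
ida_function_start = 0x00102A70   # assumed start of CGameClient::Disconnect
ida_call_site = 0x00102BF3        # the `call dword ptr [edx+0ACh]` instruction

# Address of the same function in the running process, as printed by GDB.
live_function_start = 0xB75F4A70

# The call sits at the same offset from the function start in both views.
offset = ida_call_site - ida_function_start
breakpoint_address = live_function_start + offset

print(format_string)
print(f"offset = {offset:#x} ({offset} bytes)")
print(f"break at {breakpoint_address:#x}")
```

The offset works out to 0x183, or 387 in decimal, and the resulting address matches the one passed to `b` above.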

Now I add and kick a bot (the lines c, sv_cheats 1, bot, and kickid 2 are my input):

(gdb) c
Continuing.
sv_cheats 1
bot

Cannot verify load for invalid steam ID [A:1:0:1]
kickid 2
Breakpoint 2, 0xb75f4bf3 in CGameClient::Disconnect ()
from /home/builds/common/srcds/orangebox/bin/engine_i486.so
(gdb)

Yay, it hit! Now I step into the assembly to see where this goes:

(gdb) display/i $pc
2: x/i $pc 0xb75f4bf3 <_ZN11CGameClient10DisconnectEPKcz+387>: call DWORD PTR [edx+0xac]
(gdb) si
0xb75edf00 in CGameServer::RemoveClientFromGame ()
from /home/builds/common/srcds/orangebox/bin/engine_i486.so

Now back in IDA, I go to Jump -> Jump to function and search for RemoveClientFromGame. Of interest in this function is the following code:

.text:000F9F6A     mov     ecx, ds:(serverGameEnts_ptr - 343D64h)[ebx]
...
.text:000F9F81     call    dword ptr [ecx+10h]

Since we have the symbol info, here I don’t have to do extensive guesswork – this is almost definitely an IServerGameEnts pointer, a singleton interface the engine uses to feed information back to the actual game. Here it’s calling the offset for IServerGameEnts::FreeContainingEntity. Could this be the other address we saw on the stack, 0DFD4CDA? It seems likely, given that there is one pushed value in between, and the disassembly has one push as well. Let’s compare the target functions in source and binary. From loading server.dll in IDA:

.text:0DFD4CD0    mov         eax,dword ptr [esp+4] 
.text:0DFD4CD4    push        eax  
.text:0DFD4CD5    call        sub_0DFFA920 
.text:0DFD4CDA    pop         ecx  
.text:0DFD4CDB    retn         4    

The same function from hl2sdk/game/server/gameinterface.cpp:

void CServerGameEnts::FreeContainingEntity( edict_t *e )
{
        ::FreeContainingEntity(e);
}

These seem really similar. So it seems like we might be crashing in ::FreeContainingEntity(), at a point where the stack depth is around 4 values. But looking at this function, it’s not really clear what could be happening:

void FreeContainingEntity( edict_t *ed )
{
...
        CBaseEntity *ent = GetContainingEntity( ed );
        if ( ent )
        {
                ed->SetEdict( NULL, false );
                CBaseEntity::PhysicsRemoveTouchedList( ent );
                CBaseEntity::PhysicsRemoveGroundList( ent );
                UTIL_RemoveImmediate( ent );
        }
...
}

In HL2 parlance, UTIL_RemoveImmediate() means “immediately destroy and deallocate this entity”. The point at which the stack depth becomes 4 is around this call – but not exactly. At this point I gave up for the night and asked the user if he had any more information. But from this, I had the following guesses:

  1. Call stack that looked something like:
    (?+0) server.dll!FreeContainingEntity()
    (?+1) server.dll!CServerGameEnts::FreeContainingEntity()
    (?+2) engine.dll!CGameServer::RemoveClientFromGame()
    (?+3) engine.dll!CGameClient::Disconnect()
    
  2. The CPU jumped to an invalid address somewhere.
  3. The invalid address may be from a return instruction and a misaligned stack.

Amazingly, he posted a working test case that involved adding and kicking a bot on a specific game. I tried this game, and managed to reproduce it! I narrowed the problem down to a minimal set of “extra junk running”, and from there SM developer Fyren noticed the problem pretty quickly.

SourceMod overwrites virtual table slots to “detour” function calls in between the engine and whatever game is running (in this case, Day of Defeat: Source). The slot number can differ between games, so it is specified in a configuration file. One of these configuration entries was missing, and SourceMod was accidentally overwriting the zeroth vtable entry – the destructor! Now the sequence of events is clear:

  1. Engine tells the game that a client is disconnecting.
  2. Game destroys the client object.
  3. C++ calls the virtual destructor, but the vtable entry points to a SourceMod function.
  4. The Windows ABI uses a callee-pop convention on member functions. The virtual destructor should be popping no parameters, but the SourceMod function pops two, since the conventions don’t match.
  5. Once SourceMod’s replacement function returns, the CPU tries to load the return address, and gets garbage.
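
The sequence above can be sketched as a toy model in Python. Everything here is hypothetical: the function names, the vtable layout, the stack contents, and especially the assumption that a missing configuration entry silently becomes slot 0. It is not SourceMod's code, just an illustration of how a wrong slot plus a callee-pop mismatch leaves the caller returning to garbage.

```python
# Toy model only: names, slot layout, and stack contents are hypothetical.

def lookup_slot(config, key):
    # Assumption: a missing config entry silently yields slot 0.
    return config.get(key, 0)

def hook(vtable, slot, replacement):
    """Return a copy of the vtable with one slot overwritten."""
    patched = list(vtable)
    patched[slot] = replacement
    return patched

def caller_returns_to(args_pushed, args_popped):
    """Return the value the caller's own `ret` ends up jumping to."""
    # The end of the list is the top of the stack.
    stack = ["string data", "saved register", "return into engine", "caller local"]
    stack.extend(f"arg{i}" for i in range(args_pushed))  # caller pushes arguments
    stack.append("resume in caller")  # `call` pushes the return address
    stack.pop()                       # callee's `ret N` pops the return address...
    for _ in range(args_popped):      # ...then discards N/4 argument slots
        stack.pop()
    stack.pop()                       # caller's epilogue pops its one local
    return stack.pop()                # caller's `ret` pops its return address

vtable = ["destructor", "Spawn", "Think", "ClientDisconnect"]
patched = hook(vtable, lookup_slot({}, "ClientDisconnect"), "SourceMod hook")
print(patched)

print(caller_returns_to(args_pushed=0, args_popped=0))  # conventions agree
print(caller_returns_to(args_pushed=0, args_popped=2))  # callee pops two extra slots
```

In this model the damage shows up one return later, when the caller returns with its stack pointer two slots off. Where a real process actually crashes depends on the exact frame layout.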

Phew.

Given that the patch that added this code baked for months, it’s amazing it slipped into a release. And for the record, it had two peer reviews. But there you have it. Sometimes it’s possible to draw blood from a minidump stone – but without a test case it’s unlikely we could have gotten this as quickly.

OS X and Windows Perspective, Updated

A year ago I started using OS X, and gave a bunch of thoughts on the experience. A year later, I’m using it exclusively. I’m sure you could pull up any number of UI mistakes in OS X – there are many. Perhaps I’ve adjusted or gotten used to them. When I need Windows, I usually fire up VMware and run XP.

A few days ago I installed Windows 7 on a spare computer. The experience was horrifying. It was like putting my head in a clamp that has complimentary electroshock therapy. What surprises me most is that I used to be a die-hard Windows fanboy. I’d have nasty arguments with Mac users. I’d love touting MSDN and the WinAPI. I can’t even try to muster up this enthusiasm anymore. The Windows 7 (and Vista) UI drains my soul.

The Windows 7 installer kept freezing at two points in the process – before copying or disk formatting or anything! I waited about ten minutes before rebooting. Rinse, lather, repeat. Finally someone told me I had to just wait it out, despite there being no I/O activity or visual feedback. Sure enough, after about 15-20 minutes of waiting it proceeded.

After starting up, the Aero UI was… ugly. Everything looked about a dozen pixels too thick. The glass effect looked like my UI had beer goggles, made worse by the way-too-bright default blue. So I turned border padding really low, turned off transparency, and made the border colors as solid as they could get.

Delays in responsiveness make the UI seem slow or broken, or like my computer isn’t fast enough. What baffles me is why Windows has arbitrary delays in the UI. Two examples.

The Start Menu has this right-hand pane that lists things like “Documents”, “Downloads”, “Computer”, etc. As you hover over these, a respective icon fades into view. But there’s a very noticeable delay… it feels like just under a full second until the fade-in begins. So your mouse can sit on the “Control Panel” text but still have the icon for “Run”.

Minimizing and maximizing windows also has noticeable delay. There’s some kind of animation, but it’s much slower than OS X. It’s distracting and slows down workflow. I went to “Advanced Performance Settings” which requires administrative access. I knew that changing the animations is a setting that reverts after reboot, but I gave it a shot. I unchecked the feature, clicked OK – and suddenly all my UI settings reverted. I was back to hideous blue, transparent Aero. Ugh. So I fixed my UI prefs again.

Next, I installed a secondary drive to copy old files. The drive showed up as “E:”, so I navigated to “E:\Users\dvander”. It asked for Administrative credentials. I typed in my password, and hit okay… but nothing happened. No response, but the drive started churning furiously. I tried again, but nothing happened. Now I was getting worried – I knew the drive had bad sectors. Was it finding more? Was it creating more? What was all this activity? Why couldn’t I access my folder?

I launched Explorer as an Administrator, but no luck. It still asked me for admin credentials, and still did nothing. I launched cmd.exe, and navigated to “E:\Users\dvander”. Surprise! I could read it fine from there. So what was going on with Explorer?

Then I noticed the path bar in Explorer was slowly filling up with green stuff. Was it Indexing? I Googled “disable indexing” and found this site. I still don’t understand the UI here. Try it out. There’s some sort of two-pane list of mutually exclusive terms, but a selection carries across both. You can select something and hit “Modify” but it seems like the “Modify” box has nothing to do with what you selected. I’m sure this is way more flexible than Apple’s Spotlight, if I could only understand it. It looks something like this (image from article):

I took a guess that “Users” was indexing everything in E:\Users, so I removed that and hit OK. Instantly, all that churning stopped, and Explorer brought up my folder!

Explorer was blocking the entire user interface on an automatic, massive, recursive indexing operation over all my files. The only feedback was an authentication dialog for something I had access to, that silently failed anyway.

With that solved, I started actually using things. The taskbar was too big, and older Windows apps have icons that don’t scale properly. Hell, most actual Microsoft apps don’t. So I made it smaller, only to find that the circular “Start” button clips into the working area of a window. This did not happen on Vista. So I made the taskbar auto-hide (which still leaves a tiny sliver about 1-2 pixels thick).

Okay… I’ve been ranting a while now. Two more things. What’s with Windows 7 file dialogs?

When this thing pops up, the first thing I do is mouse wheel to get to the file I want. Oh, sorry, I meant I used to do that. In Windows 7, when this dialog appears, the mouse wheel does absolutely nothing. First you have to click inside the file list. Great, extra movement. This might even be a regression but I forget what XP did. Of all the Windows 7 idiosyncrasies, this one hits me the most frequently, and it’s really bothersome.

Finally, what’s with “Program Files (x86)” versus “Program Files”? As if spaces weren’t bad enough in such an important path (thank God “Documents and Settings” is gone), now we have parentheses? Why is there even a difference in the first place? It seems like some x64 stuff goes into the x86 folder, and vice versa, so what’s the point? Why isn’t this called “Programs”?

Phew. If only OS X could run Valve games.

Six Years Later, Steam is Still Annoying

It’s been a year or so since I fired up Steam to actually try and play a game. Given that I help maintain multiple mods for Valve games, that might seem surprising.

I love the concept of Steam, I’ve had it installed since the early betas, and I hated the days of scouring for files like “10161100.exe” to update Half-Life 1. But the actual software is so mind-bogglingly annoying. I cringe when I hear Steam fanboys tout it as the shining jewel in content distribution.

To demonstrate, I recently got a desktop and decided to try playing a game I paid for on Steam. I downloaded Steam, installed it, and instantly got bombarded with stuff like this:

Steam Notifications

These three things would slide down, then three more would pop up. Rinse and repeat every few seconds. Really annoying. I have Steam Friends completely pref’d off on my laptop, but I guess Steam doesn’t sync account settings across computers. I’d rather not disable Steam Friends completely, but of the three options to disable notifications, none seem related to the ones up above. So I pref’d Friends off. The little messages kept coming though, until the whole list of notifications was exhausted.

Later I came back and tried to install a game. It said “Your Steam ticket has expired” and asked for my password. I typed it in, and nothing happened. The dialog disappeared but the game wasn’t downloading. I double-clicked the game to try again. It asked for my password again, but no-go.

I tried this a few more times, then restarted Steam. When it started back up, all of the icons were missing:

Steam - No Icons

Okay, that’s weird. I double-clicked the game again, and I still got the expired ticket dialog box. I typed in my password again, but this time selected “Remember my password”. The game didn’t start installing, but the icons appeared.

I tried installing again, and now I got a new dialog box: “The Steam servers are currently too busy to handle your request.” Huh? The next try got me back to the password entry dialog box because my “Steam ticket” had expired again.

I searched Google and found a Steam forum thread describing my problem. Another thread linked from comment #11 said to try deleting “ClientRegistry.blob”, and if that doesn’t work, reinstall Steam.

So I exited Steam, deleted “C:\Program Files (x86)\Steam\ClientRegistry.blob”, and restarted. When I tried installing the game, I actually got a progress bar. By the time it had finished downloading I’d moved on to other things, but at least next time I’m in the mood to play a game on Steam, I know to delete random internal files.

This product… needs polishing.

64-bit TraceMonkey!

As of an hour ago, Mozilla’s JavaScript engine now has 64-bit JIT support turned on by default in the TraceMonkey tree only. We currently don’t ship a 64-bit browser, but we will eventually, and this is an important step in making sure we’re staying optimized on all major platforms.

If you are bleeding-edge and build your own 64-bit Firefox, you can now easily try out the JIT and report any problems.

A nice discovery was that the 64-bit engine as a whole outperforms 32-bit builds. SunSpider ran about 20% faster, comparing both 32-bit and 64-bit builds with and without the JIT.

For more information see this newsgroup post on mozilla.dev.tech.js-engine.