The Evolution of Building at AlliedModders

AlliedModders recently passed its ten year anniversary. That’s literally crazy. At some point I’d like to do a longer post on its history and the people who made it happen – but today I wanted to talk about the evolution of our build process. It’s quite complicated for your average gritty open-source project – complicated enough that we wrote multiple custom build systems. Right now we’re pretty close to state-of-the-art: AMBuild 2 is a general-purpose, optimal-by-design DAG solver. But we started off in the dregs, and it took 10 years of iteration to get where we are now.

Without further ado…

The Beginning

The first project that would become part of AlliedModders was AMX Mod X. Felix Geyer (SniperBeamer) recruited me to the team, and I provided hosting. It was January of 2004, and there were two compilers you could use: GCC 2.95, and Visual Studio 6.0. The project had a single binary built from about 20 .cpp files.

On Linux it used a single Makefile. On Windows you had to run the Visual Studio IDE. For about six months our release process was entirely hand-driven. Even as we added more binary components, I’d build each one by hand, sort them into packages by hand, drop them into SourceForge FTP, then one by one add each package to a release.

As components became more complicated, I eventually got frustrated by Make’s lack of simple programmatic control. If you’re unfamiliar with GNU Make, it has something resembling the worst scripting language ever made. I couldn’t bring myself to touch it, and in August 2004 created two Perl scripts:

  • Makefile.pl – A straight translation of the Makefile to Perl, but with more programmatic control over how and what gets built.
  • package.pl (which I no longer have) – after building binaries by hand, this script would build packages and upload them to SourceForge.

Over the next year, AMX Mod X grew rapidly. The development team and community were very active, and the Half-Life 1 modding scene was at its peak. By July 2005, AMX Mod X had over 20 binary components and supported three platforms: Windows, Linux i386, and Linux AMD64. Releasing by hand took hours, and maintaining 20+ random copies of a Perl script was really difficult. Something had to change.

Fixed Pipeline Era

The first step was ditching Makefile.pl. I replaced it with a much cleaner Makefile. Most of the complexity was stripped away and reduced to a few variables and filters.

But I was still pretty new to software engineering, and made a major mistake – I didn’t separate the guts of the Makefile into something reusable. As a result the whole thing got copied and pasted to every component, and to components in other projects. To this day about 50 variations of that Makefile are scattered around, though it’s unclear whether anyone still uses them. Up until 2009 we were still syncing changes across them by hand.

The next step was automating all the component builds and packaging. AMX Mod X had weird packages. There was a “base” package, and then optional addon packages to supplement it. The steps for each package were similar, but they all built different sets of components. The result: I ended up writing something we affectionately called “the C# tool”: AMXXRelease. It had three pieces:

  • A general pipeline for constructing an AMX Mod X package.
  • An abstraction layer that decided whether to build via GNU Make or MSBuild.
  • Classes for each package, usually just supplying filenames but sometimes adding custom steps to the pipeline.

The C# tool survived many years. Though it was a hard-coded tool written in a verbose, rigid framework, it got the job done. AMX Mod X continued to use it until 2014, when we finally ditched it as part of a transition to GitHub.

Interlude: SourceMod

In 2006, Borja Ferrer (faluco) and I secretly begun work on the next major AlliedModders project: SourceMod. SourceMod was a rewrite of AMX Mod X for the Half-Life 2 Source engine. By 2007 it was nearing ready to go public as beta, so I ported the C# tool. Everything was peachy. Until Half-Life Episode 1 came out.

At the time, it wasn’t conceivable to us that Valve would maintain multiple forks of the Source engine. Half-Life 1 was all we knew. Valve had never changed Half-Life 1 in a way that couldn’t be fudged with some detection and binary hacks. But Source was different – it was vast and complex compared to its predecessor. Valve began changing it frequently – first with Episode 1 – and then again with the Orange Box and Team Fortress. Now there were three versions of the Source engine, all totally binary incompatible. The same code didn’t even build across their SDKs, and linking was too complex for something like a “universal Source binary”.

Suddenly our Makefile and Visual Studio project files were getting ridiculously hairy. Not only did SourceMod have a dozen binary components, but it had to build each one multiple times with different parameters, across two different platforms. By mid-2009 we realized this was crazy and started looking at other options.

AMBuild 1: Alpha-Gen Building

I evaluated two build systems for SourceMod: CMake and SCons. CMake was a contender for “worst scripting language ever”. After a lot of futzing I concluded it was impossible to replicate SourceMod’s build, and even if I could, I didn’t want to see what it would look like. SCons came much closer, but it didn’t have enough flexibility. Source binaries required a precise linking order, and at the time SCons had too much automagic behavior in the way.

We needed something where we could template a single component and repeat its build over and over with small changes. We needed reasonably accurate minimal builds. We needed platform-independent build scripts, but also tight control over how the compiler was invoked. So in August 2009, I dove into Python and wrote our own general-purpose build tool, dubbed “AMBuild 1”.

I would describe AMBuild 1 as “rough”. It was definitely extremely flexible, but it was buggy and slow. Each unit of work was encapsulated as a “job”, but jobs did not form dependencies. It had some hardcoded, special logic for C++ to support minimal rebuilds, but since it was special logic, it broke a lot, and because it was recursive in nature (like Make), it was very slow. Since jobs didn’t form dependencies, they couldn’t be parallelized either. Everything ran sequentially. The API was also pretty grody. The root build file for SourceMod was somewhat of a nightmare.

But… it worked! Suddenly, supporting a new Source engine was just a matter of adding a new line in its central build file. It worked so well we began migrating all of our projects to AMBuild.

Over time, as always, things began to break down. By 2013 there were twenty versions of Source, and AMBuild 1 was nigh unusable. Minimal rebuilding was inaccurate; sometimes it rebuilt the entire project for no reason. Sometimes it produced a corrupt build. Builds were also EXTREMELY slow. A complete Windows build could took two hours, and multiple cores did nothing. There were so many headers in the Source SDK, and so many file scans for dependency checks, that merely computing the job set for an update could take 15 seconds.

AMBuild 2: Modern Era

Sometime in 2013 I found Tup and was immediately inspired. Tup “gets it”. Tup knows what’s going on. Tup knows what it is. Tup is what build systems should aspire to be. In short, it has two philosophies:

  • Builds should be 100% accurate. Not 99% accurate, or 37% accurate, or “accurate if you clean or clobber”. They should be 100% accurate all the time.
  • Builds should be fast, because waiting for builds is a waste of time.

The design of Tup is simple on the surface. Rather than recursively walking the dependency graph and comparing timestamps, it finds a root set of changed files, and then propagates a dirty bit throughout the graph. This new “partial graph” is what has to be built. In practice, this is much faster. In addition, Tup caches the graph in between builds, so if your build files change, it can compute which parts of the graph changed.

Well, that’s awesome. But… I didn’t want to take on Tup as a dependency for AlliedModders. I have a few reasons for this, and I’m still not sure whether they’re valid:

  • Tup requires file system hooks (like FUSE) and I was worried about its portability.
  • Tup is a C project, and not yet in Linux distros, so we’d have to tell our users to install or build it.
  • AMBuild’s API, and of course Python, together make for a much better front-end than Lua.
  • Tup didn’t seem to include external resources (for example, system headers/libraries or external SDKs) in its graph.

So I sat down and began rewriting AMBuild with Tup’s principles in mind. The result was AMBuild 2. With a real dependency graph in place, the core of the build algorithm became much simpler, and parellelizing tasks was easy. The API got a massive overhaul and cleanup. It took about two months to work all the kinks out, but in the end, my SourceMod builds took about a minute locally instead of 15. And with some smarter #include’ing, our automated builds went from 45 minutes to 2 minutes. The build scripts don’t look too bad, either.

What’s next?

AMBuild 2 isn’t perfect. Its biggest problems are scalability and internal complexity. I’m a huge fan of Python, and it’s nearly everywhere now. But it’s slow. Dog slow. AMBuild 2 would not scale to a codebase the size of Mozilla or Chromium. In addition, Python’s concurrency support is very bad, even taking into account the multiprocessing module. Most of AMBuild’s complexity is in its IPC implementation.

Early on in its development a common suggestion I got was to separate AMBuild into a front-end (the build API) and back-end (the build algorithm). I heard this theory float around for Mozilla’s build system too. CMake is structured this way. Separating the components out means they are replaceable: AMBuild 2’s frontend could generate Makefiles instead of an AMB2 DAG, or it could read Makefiles and parse them into a DAG.

I eventually concluded that didn’t make much sense. The goal of AMBuild 2 is not to provide a general backend or general frontend: it’s to produce correct builds as quickly as possible. In fact, it considers most other build systems to be broken and inferior. Taking an AMBuild project and generating Makefiles defeats its entire purpose.

However, there is something very useful that comes out of this separation, which is the ability to generate IDE project files for casual use or local development. Keeping this in mind, AMBuild 2 did separate its front-end from its back-end, but the API makes too many assumptions that don’t have clean mappings to IDE files. As we experiment with Visual Studio and XCode generators, we’ll probably have to iterate on the API.

Lastly, it’s worth nothing that part of what AMBuild solves is papering over an early design flaw in SourceMod: with proper abstractions, it wouldn’t need 20 different builds. In fact, over time we have moved a large amount of code into single-build components. Unfortunately, due to the complexity of what’s left and the Source SDK in general, it would be nearly impossible to completely eliminate the multiple-build steps.

Ok

There are very few projects I’m truly proud of, where I look back and have very few regrets. (One is IonMonkey, but that’s a story for another day.) AMBuild is one of them. It’s not particularly stellar or original, but we needed to solve a difficult problem, and it does the job very well. I wouldn’t be surprised if it takes another 10 years of iteration to make it perfect – but that seems to be how things work.

Making a Card Game Prototype

Recently I wanted to try quickly prototyping a few card game ideas, but I couldn’t find much information on how to actually do it. After a few iterations I figured I’d jot down what I ended up doing.

My first attempt was pretty laborious. The process went something like this:

  • Create a sheet of card frames in PaintBrush.
  • Paste in clipart and text.
  • Print about 5-10 copies of the sheet.
  • Cut along the outer borders.
  • Shove the paper into card sleeves.
  • Throw Magic: the Gathering cards in the sleeves for support.
Iteration 1, Card Sheet
Iteration 1, Card Samples

Well, this worked, but it had a lot of problems. Any change in the card frame meant redoing all the cards. The text was really hard to align. I used actual scissors instead of a cutting board, so the edges were sloppy and my hands got super tired. Sleeving everything took forever.

But! After years and years of drafting Magic: the Gathering, I’ve got boxes and boxes of cards sitting in the closet. The vast majority are either jank (too weak to use even in casual decks), or duplicates of very common cards. I’d feel bad throwing them out, but all they do is take up space.

Enter Avery Adhesive Name Badges. It turns out these are 2.33×3.38in – Magic cards are 2.5×3.43in. These stickers fit cleanly within the black border of modern Magic cards! Now it’s just a matter of killing two birds with one stone: printing the prototype cards and using up worthless Magic cards in the process.

The first thing I did was ditch PaintBrush for Adobe InDesign. I don’t know much about graphic design, but it’s not too hard to use. I made a simple card frame template for each card type (sample below). This one has five blank areas: Title in the upper-left, Cost in the upper-right, Image at the top, Rules Text at the bottom, and Points at the bottom-left.
Iteration 2, Card Layout

The next step is generating a CSV file with the card designs on them. InDesign will use this file to generate a finished card for each row. For example:

Title,Cost,Points,@Image,Text
"Hamburger",5,+3,"C:\cards\hamburger.png","Hamburger is delicious."
"Yams",3,+2,"C:\cards\yams.jpg","If you currently smell yams, you get an extra 2 points."

Next, InDesign’s “Data Merge” tool comes in. This lets you associate each CSV column with a container in the document. After doing that, and clicking the Merge icon, we get finished cards:

Iteration 2, Merging Cards
Iteration 2, Merged Cards

Now that we’ve got cards, the next step is actually getting them to the printer. So far I’ve toyed with a few ways of doing this, but the fastest approach I’ve got is to use InDesign’s Data Merge feature again. Avery doesn’t provide an InDesign template for their sticker sheets, so I got out a ruler and measured out my own. It’s not perfect but it’s accurate enough for prototyping. I then export the card images, and make a new CSV file. This time each column represents a slot on the sticker sheet. Since I’ve only got two cards, the sheet will have duplicates:

@Slot1,@Slot2,@Slot3,@Slot4,@Slot5,@Slot6,@Slot7,@Slot8
"C:\cards\card-1.png","C:\cards\card-1.png","C:\cards\card-1.png","C:\cards\card-1.png","C:\cards\card-2.png","C:\cards\card-2.png","C:\cards\card-2.png","C:\cards\card-2.png"

Before the data merge, and after:

Iteration 2, Merging Card Sheet
Iteration 2, Merging Card Sheet

Now the document can be directly printed onto the sticker sheet. Since the sheets aren’t very cheap, I do a sample run first and make sure they line up. If it looks good… print and apply! Magic jank card pictured below, and the final product. The cards are only slightly thicker than Magic cards. The stickers withstand shuffling, though they have more friction than the Magic card surfaces.

Stickering cards up.
Final product.

This process works very well for proxying actual Magic cards too, or making your own Magic cards. I’m reluctant to show actual images of this – despite it being obviously not a counterfeiting attempt (you can tell there’s a huge sticker applied), I don’t want Wizards of the Coast suing me. But with some coding, my friend and I were able to build a draft set for Limited Edition Beta. Two sheets gets you 16 cards, so we stuck 1 rare, 3 uncommons, and 9 uncommons, randomly chosen, onto two sheets at a time. Apply the stickers onto other Magic cards, and you’ve got a booster pack of proxies for drafting cards that for all intents and purposes are unobtainable. We haven’t done the actual draft yet – but I’m guessing it’ll be pretty rough compared to modern sets.

How did the actual prototyped game turn out? Tragically, it didn’t involve hamburgers or yams. More importantly, I forgot to include a win condition. Eventually we realized there was no way to end the game so we stopped playing. But hey, that’s what playtesting is for.

Inheritance Without Types

At AlliedModders we dabble in practical language design. Our two major scripting projects, AMX Mod X and SourceMod, iterated on our own fork of a scrappy language called Pawn. Our next iteration is untyped, and we’re adding object support.

What does object support mean? “Well, obviously,” I naively thought, “it means Java-style implementation inheritance!” Whoops. WRONG. It turns out there’s a ton of complexity with inheritance once you take away types. This post is a brief survey of some popular untyped languages and what we ended up deciding.

The Goal
We want our language to have as few pitfalls as possible. I’m not sure if I stole this from Graydon Hoare, but our motto is “No surprises!” We want people to be able to write code that works the way they expect, minimizing random or unforeseen run-time failures.

A great aspect of C++’s inheritance is that information hiding is super easy. A base class’s behavior isn’t trampled by derived classes. For example:

Select All Code:
 

As you’d expect, this prints ‘5’. The fact that Derived also declares x does not trample on Base‘s own behavior. Now let’s take an untyped language, Python 3:

Select All Code:
 

This prints ’20’. In Python, “self” is really one object, with one set of properties and methods, whereas in the C++ model “this” is conceptually two objects, and “this” is statically typed to be the object of the containing class. I consider the Python model unappealing for two reasons:

  1. When adding a member to a derived class, you have to know that you’re not colliding with a member of your base class.
  2. When adding a member to a base class, you might be breaking any number of unknown consumers.

JavaScript’s prototype-based inheritance discourages this style, but JS is super flexible, and if you don’t mind a very, very inefficient prototype chain, you can do this:

Select All Code:
 

This prints “5”.

PHP’s Answer
For whatever reason, PHP takes Java’s model wholesale. Here’s an example:

Select All Code:
 

This prints ‘5’, which is what we want – and makes sense. But is $this statically typed to the Base class, like in C++? No:

Select All Code:
 

This prints "hello", so PHP’s $this retains some dynamicism. Let’s up the ante. What should this do?

Select All Code:
 

This prints true in both cases! For any property access in a PHP class function, if the object has that class on its inheritance chain, it uses the property on that class. Otherwise, it searches from the derived-most class back to the base class like normal. This implicitly hides variables on the derived object. Nonetheless it’s the right choice given the model, especially considering that it’s usually bad practice for a base class’s behavior to explicitly cast to one of its derived class.

Note that the fact that the inner x is private is actually irrelevant. Even if it were public, the base class should retain its behavior. Banning redeclaration works, though then you run the risk of potentially preventing an unknown consumer from compiling (albeit, better than silently being wrong). Similar issues occur with introducing new virtual functions or overloads.

So, what did we do?
After letting this all sink in, I decided to scrap inheritance as a feature of our next language iteration. Having objects alone will be a huge step forward, and we can evaluate use cases from there. I’m satisfied with this for a few reasons.

Inheritance is really complicated with types, and even more complicated without. And in an untyped language, it’s not even clear if implementation inheritance is useful. PHP’s interfaces and abstract classes seem verbose and heavyweight in comparison to JavaScript or Python, in return for a very small amount of static guarantees.

So far Pawn and our derivatives have mainly been embedded in games, and the majority of those games are based on the Half-Life engines (including Source). We want our object model to be compatible with the game’s internal object model, however, I’m not convinced that copying it would be ideal. The Source engine has the “Big-Ass Base Entity Class from which Everything Derives,” a painful and complex architecture continuing Source’s theme of not being scalable. Do we really want to encourage that?

I suspect we’ll end up with something like traits and/or composition, but for now, I’m content with keeping it simple and continuing to evaluate the strengths of other languages’ models.

(Addendum: I don’t much about Perl or Ruby. I tried quickly evaluating them for this post but the syntax was a little intimidating. I’d really appreciate insight into other languages.)

Addendum 2: SaberUK contributes this Ruby example which prints 20 – exhibiting similar behavior to Python.

Select All Code:
 

On Indonesia and Being Justin Bieber

Two weeks ago, a bunch of Mozillans went on a whirlwind tour of Indonesia, visiting Firefox 4 release parties around the country. We had a great time, and I found the trip really enlightening. Luke Wagner and Christian Legnitto handled the first four days of the trip, and then Dave Mandelin, Josh Aas and I did the rest.

Dave and I started off in San Francisco, making a 22-hour journey to Jakarta, with a stop in Taipei. We brought along extra suitcases full of Mozilla swag to give away: by my count about eight-hundred billion elastic headbands. It must have looked like a ton of cocaine when we put our luggage through the X-ray machine at customs. The security guards opened the suitcases up and looked perplexed, but let us go after I donned a headband for demonstration.

After immigration we met community members Viking Karwur (@vikingkarwur), Yofie Setiawan (@yofiesetiawan), and Arry T, the owner of a local private school. Arry took us to a restaurant where you order by walking around rows and rows of open coolers, filled with random frozen fish, live crabs, mussels, et cetera. We paced around with an empty cooler and grabbed what seemed like one of everything. (Mr. Burns: “I’ll just have a glass of milk… from THAT cow.”) Lo and behold about thirty minutes later it all arrived at our table fully cooked, and tasted great.

The next day we got up early and flew to Surabaya with Yofie (he was excited, it was his first time flying). The Surabaya venue took place in a shopping mall, with maybe 100-200 people showing up. There was a talk about web design (spoken entirely in Indonesian, but the slides were in good English), another from Yofie about, I think, Mozilla Indonesia, and then Dave, Josh, and I each presented a little bit about Firefox 4, concluding with JSNES and Flight of the Navigator demos. Then there was a kind of ribbon-cutting ceremony which involved a giant mound of rice. Each of us had to take a spoonful off the top without dropping it, which was surprisingly tricky.

What really surprised me was how much everyone liked Firefox – it has a huge market share in Indonesia, something like 77% – and how people were really excited that we came to visit. I think we spent a full thirty minutes posing for photographs at each event. As a C++ developer you don’t get treated like a rockstar too often, so that was pretty awesome (albeit exhausting). Afterward we got some local food, but I was so tired I don’t remember what it was.


The next day we drove three hours to Malang, which was actually a good chance to get a look at the country. The venue was at the polytechnic university, where we were treated to a plaque and boxes of local snacks. The students there had great questions. Recurring ones were: why does Firefox use so much memory? (we’re working on it), and when will it get a Blackberry port? (likely, never, but it’s a popular phone there). After Q&A we each talked a bit about why we like working for Mozilla, and Dave cut a ribbon which released a big balloon. A few students were interested in how to start a career at Mozilla, I wasn’t quite sure what to say but the internship program and just community involvement seem like great vectors. Unfortunately, Dave got pretty sick on the ride back to the hotel, so he was down for the next day.

At the Makassar event we temporarily lost Yofie as our babysitter, instead being placed in the capable hands of Rara (@rara79) and Mamie (@alwaysmamie). The event was in a café, and a structured a little more rigorously than the others – most questions were in Indonesian and translated by the MC. The audience seemed to be lots of bloggers and web developers in the community. Questions seemed to fall into four main categories: Memory use, support for Java phones, UI changes and support issues, and wondering where the money comes from. All good questions! Afterward, we went to Pizza Hut (of all places – it was actually pretty fancy), and got cheese-stuffed crust with cornflakes.


Finally, on the last day, we flew to Bali and met up with Viking and Yofie again. Josh and I relaxed at the resort pool which was occupied mostly by Australians. In the evening we went to a local café for a smaller venue, maybe a dozen people, with a local radio host as MC. At our table sat an American expatriate, Ken McClellan, CTO of Mitrais who talked about what it’s like living and running a company in Indonesia.

Indonesia is very clearly a developing country. Foreigners are advised not to drink the tap water, and to be careful about uncooked food, as they won’t have the same resistances as locals. In cities you can see a striking mix of poorer and middle-class areas, and the pollution near congested roads can be very noticeable, at least if you’re asthmatic like me. Traffic is amusingly chaotic – turn signals, stop lights (there are very few), and speed limits are all optional. When you’re in a car, honking is nearly constant as an advisory measure, since there are no clearly defined lanes, and the majority of vehicles on the road are motor scooters. Jakarta even had motorized rickshaws, most were very old, adding to the anachronistic mix of old and new technology. It seems like despite this, traffic was actually pretty reasonable, though as a Californian used to big open roads with camera-enforced stop lights every ten feet, I’d be too nervous to try driving.

The national airline, Garuda, was much higher quality than any domestic airline in the States. It kind of reminded me of what flying was like before 9/11, or when American airlines weren’t terrible. I guess what sums this paragraph up is that the flight seat pockets had pamphlets advertising “Investing in Indonesia.” I’m really curious as to what the country will look like in 20-30 years, because it seems like it’s growing rapidly.

All in all, it was a great trip, and I’d definitely go back. It helped having a local guide; Yofie, Viking, and others’ help was really appreciated. The food was great (Nasi Goreng anything, Oxtail Soup were my favorites), and cheap (coming from USD), and everyone we met was friendly – even complete strangers walking around. It was awesome seeing everyone so excited about Firefox, and whatever we’re doing that makes Firefox so popular in Indonesia, we should figure it out and do it more!

I am not Justin Bieber.

I would be remiss if I didn’t conclude by mentioning how people thought I was Justin Bieber. It was so persistent that Josh started calling me Justin. Near the end of the trip, swimming in Bali, he said that when my hair was slicked back it was “way more manly,” so I tried that for a day, but decided I couldn’t afford the daily metric ton of hairgel. Incidents of mistaken Bieber identity:

  • A few times in person at each event, including this tweet.
  • People were whispering it at the airport in Makassar.
  • Walking around Makassar, people rolled down the windows of their cars and called out at me and Josh.
  • Running into a group of loitering teens, they asked “who I was” and burst out giggling when I shrugged.
  • An airport employee in Makassar waved to me from behind a booth and yelled “Justin Bieber”! – I had to wave back.

Well, now that my pop star career is over, I’ve decided to go into dynamic language optimization.

Debugging 500 Errors

For the past two years, the AlliedModders site has been randomly dishing out “HTTP 500 – Internal Server Error” responses. It’s been pretty frustrating.

I actually tried debugging the errors two years ago. I got far enough to attach GDB to PHP, and saw it blocked on mysql_query(), but didn’t know how to proceed. We knew that during these incidents, the server load would spike to obscene levels – like 100-150. Lots of PHP processes were getting spawned. “iowait” time was through the roof, and CPU usage was normal. Something couldn’t read the disk, or the disk was getting hammered. I put MySQL on a separate hard drive, but the problems persisted and I gave up. Keep in mind: I didn’t know anything about running servers, and I still don’t – I just have a vague notion of how to debug things.

Fast forward to a few weeks ago, we had a serious hard drive crash and suffered some painful downtime. We lost a lot of data. I clung to a hope that our years of errors had been caused by a failing disk. Predictably, we weren’t so lucky, and the website problems continued. Someone joked, “Hey, I’m glad to see you got the server back up, even the 500 errors!” Yeah… it was time to fix this. Somehow.

Step One: Apache Logs

There were only two notable Apache errors. They’d appear over and over, but only in large clumps:

[Tue Jan 25 04:15:19 2011] [warn] [client xx.xxx.xxx.xx] (110)Connection timed out: mod_fcgid: ap_pass_brigade failed in handle_request function
[Tue Jan 25 04:15:25 2011] [warn] [client xxx.xxx.xx.xx] mod_fcgid: read data timeout in 60 seconds

It was pretty clear that somehow, communication was breaking down between mod_fcgid (Apache) and PHP processes. I read, I think, every single webpage on the Internet involving these two errors. Twice. They were all dead ends, both for us and both for the person reporting their own problems. A little sad. I wanted to blame mod_fcgid, since historically it has been really buggy for us, but I wanted to pin absolute blame.

So the next step was to break out a debugger.

Step Two: Debugging PHP, Apache

AM developer Fyren edited the mod_fcgid source code, such that instead of killing PHP, it would attach gdb in a separate tty. It usually took around three hours to get a debugging session going. The call stacks looked like:

(gdb) bt
...
#4 0x00007f1f7cad804f in net_write_command () from /usr/lib/libmysqlclient.so.15
#5 0x00007f1f7cad4c51 in cli_advanced_command () from /usr/lib/libmysqlclient.so.15
#6 0x00007f1f7cad1751 in mysql_send_query () from /usr/lib/libmysqlclient.so.15
#7 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15
...
(gdb) fin
Run till exit from #0 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15

At this point, the session would hang, indicating that MySQL was not responding. I really had no idea what to do, but we got lucky. We noticed during these incidents, Apache’s server-status was stuck at “200 requests being processed,” all in the “W” (“Sending reply”) state. Our MaxClients setting was exactly 200. Now we had a hypothesis: MySQL was intermittently not responding for short bursts of time. This blocked PHP, which in turn made mod_fcgid block and timeout, causing Apache to spawn a bunch of workers that would lock up as well.

Unfortunately, the intermittent failures only happened a few times a day, so debugging went slow. We managed to catch one, and ran iostat:

Device:          r/s       w/s   rsec/s    wsec/s    avgrq-sz  avgqu-sz   await  svctm  %util
sdc              320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00
sdc1             320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00

Someone was reading tons of data off the disk, and iotop confirmed MySQL was the culprit.

Diagnosing MySQL

I managed to find this enlightening presentation by Percona. Case study #3 looked identical to ours, except in their case, MySQL was writing. But their conclusion about bad queries seemed plausible. I made a script that would peek at server-status every second, log a ton of data, and e-mail me if it went above 100 concurrent requests. The data included oprofiling, iotop spew, SHOW FULL PROCESSLIST, SHOW OPEN TABLES, SHOW GLOBAL STATUS, etc.

Oprofile’s report wasn’t too interesting, but the MySQL status showed around 150 queries waiting for a table lock. The active queries, in all cases, were SELECTs. This explained the reads, but not the table locks. This didn’t make sense, so I generated a few core files of MySQL from within GDB, then Fyren and I did some post-mortem analysis.

The analysis revealed that MySQL had about 200 threads, most of which were waiting on locks (pthread_cond_timedwait) with a timeout of one year. The locks were almost all read locks for SELECTs, and the actually succeeding queries were exclusively SELECTs. Fyren tried to track down the thread owning the locks, but it ended up being apparently circular (thread X’s lock was held by thread Y, and thread Y was waiting on a lock held by thread Y).

That was all pretty mysterious (and still is), but rather than try to understand exactly what locks are in MyISAM tables, I took a look at the non-deadlocked queries. They were all one of: the AMX Mod X plugin search, the SourceMod plugin search, or the vBulletin member list search.

It turns out all three of these queries were extremely expensive – containing ORs, joins, full table scans of 800MB tables, etc. On average they each took about 500ms to complete. That’s without MySQL freaking out. I turned off vBulletin member searching (most large boards do), and replaced the plugin searches with faster queries off a much a smaller table. This brought those page load times down to 20ms. Twelve hours later, no 500 errors had occurred, and there were no mod_fcgid failures in Apache’s logs.

Too Early to Declare Victory

While waiting to see if any more 500 errors would show up, suddenly the server load spiked, and lo and behold – mod_fcgid became overloaded. Our monitoring script kicked in, and this time, MySQL was nowhere in the profile! It turned out some forum users were effectively DoS’ing the site by holding down F5 on their profile pages to bump up the “view” count. OProfile suggested that 20% of CPU time, during this incident, went to PHP parsing and lexing source code. I installed eAccelerator and now that time appears to be gone. (And for good measure, I removed profile view counts.)

Growth and Traffic

It’s unclear what our load and traffic really is. I think we’re pretty good though. CPU utilization is usually under 20%, disk I/O is under 10%, and we rarely see more than 30 concurrent requests. We don’t need to invest in a separate MySQL server yet, and we definitely aren’t growing fast enough to expect to need one anytime soon.

Community Involvement

I would like to thank Fyren for his help in looking into these problems, and asherkin for helping beef up our non-existent monitoring. Also, thanks to AzuiSleet, Drunken_Fool, MatthiasVance, and devicenull for sticking around in IRC and answering random questions.

Diagnose, then Solve

Many people, well-intentioned, said “switch to lighttpd, nginx, IIS, use my favorite host, use a VPS, buy more servers,” etc. I got caught up in this too, and fueled it by initially blaming mod_fcgid. It’s really important to make sure a solution has targeted a specific problem, and I had to keep telling myself to keep a cool head. Sometimes I had to slap myself or wait for Fyren to slap me. They all might be good ideas in their own right, but without diagnosing the actual cause of the errors, there was no guarantee of a fix. By careful debugging (albeit in painful, hour-per-day sprees), we got pretty close to the root cause, whereas throwing our infrastructure into a salad bowl would have delayed the inevitable.

Part of the reason our original drive is still at the recovery lab is because someone didn’t diagnose before “solving.” When it failed, a technician (not one of us) ran fsck on the disk instead of doing a hardware diagnostic first. I have no doubt this caused untold extra damage.

Conclusion

tl;dr, the forums should be much faster now, and it shouldn’t error anymore. I’ve got the system set up to e-mail me and dump diagnostics when our load goes way above normal, so I’ll continue monitoring for potential problems.

I would like to sincerely thank the whole AlliedModders community for the successful donation drive. I won’t lie, the past month has been pretty sucky. Despite our failings, the fact that everyone cares enough to see the whole thing keep going is really awesome. Our seven year anniversary was just a few weeks ago. Here’s to an AlliedModders for 2011!

JavaScript Checkers

I’ve been itching to write some JavaScript, so a few days ago I threw together a Checkers game that uses only HTML5 and JS. It’s got a simple AI that terminates after three seconds. The faster your browser can run JavaScript, the smarter the AI will be.

For example, here is Firefox 4 Beta (blue) versus Firefox 3 (red) – it happens that Firefox 4 is roughly 10X faster at this program, and soundly defeated its predecessor:

Firefox 4 Beats Firefox 3 at Checkers

The algorithm for the AI is UCT, a form of Monte-Carlo Tree Search. The idea is to estimate the likelihood of winning from a position by simulating hundreds or thousands of random games. UCT helps prune the search space by quickly eliminating bad positions.

While writing the source for this, I tried to turn off the part of my brain that said, “Hey! I know Firefox version x.y.z might be slow or fast at feature X.” That turned out to be okay, but it was harder to avoid general knowledge about JavaScript engines. I ended up with three variations:

  • Fast Checkers, which has manual inlining, manual strength reduction, hardcoded constants, and other gross stuff. Pieces and positions are represented via packed integers.
  • Slow Checkers, which removes manual inlining, strength reduction, and baked-in constants. Here, the additional overhead is extra memory accesses and function calls.
  • OO Checkers, which is the same as “slow”, except represents pieces as objects instead of packed integers. Here, an additional overhead is object allocation.

Performance seems uniform across most browsers. Below I’ve recorded the number of games each browser could simulate per second. Higher is better. Note – this chart is totally unscientific, and random simulations are affected by the behavior of Math.random().

Fast Checkers Slow Checkers OO Checkers
Firefox 4 Beta 14035 9018 9100
IE9 PP6 14579 8234 8067
Opera 11 Alpha 13165 8178 8749
Safari 5 12442 8045 8700
Chrome 9 Canary 4160 2060 2343

And – because why not – I ran them on my Droid 2 as well.

Fast Checkers Slow Checkers OO Checkers
Fennec 2b3pre 338 170 220
Android Browser 185 93 114
Opera Mobile 166 112 126

Since I’m pretty bad at web development, and don’t write JavaScript (sans test-cases) nearly as much as I should, this was an amusing experience. I kept making some really dumb mistakes, one repeatedly:

Select All Code:
Game.prototype.player = function () {
    return this.board.player;
}
...
var player = game.player;
if (player == x) { ...

And wondering why “player” showed as a function in the developer console. I probably should have used ES5 getters. A few other language features would have made the experience a little nicer – but nothing so much as real modules. I tried to emulate good encapsulation with closures, but it’s just not the same. And it doesn’t seem like any engine is smart enough yet to propagate constants through closures (which is one difference between the “fast” and “slow” variants).

Using developer tools for the first time was also an interesting experience. Firefox 4 and Chrome can debug code with a reasonable slow-down, but IE9 became over 100X slower; presumably it cannot debug with the JIT yet. I used Firebug until I needed single-stepping (we didn’t have that working in JägerMonkey for Beta 7), and then bounced over to Chrome – both proved really invaluable. I think my days of calling alert() are over.

Anyway, it was fun writing this, and I’m glad that I can write something computationally expensive and have it be fast everywhere. If and when time permits I may try a more stimulating game like Go.

Land Ho, Fast JavaScript!

Firefox just got a whole lot faster.

I’m excited to announce that Mozilla’s new JavaScript engine, JägerMonkey, is now available for testing!

What is JägerMonkey?

JägerMonkey is our new optimizing JIT compiler for JavaScript. It sits underneath our existing JIT, TraceMonkey, which appeared in Firefox 3.5. If you recall from previous posts, TraceMonkey’s job is to optimize loops to very fast machine code. However, not all code has loops, and not all loops can be trace compiled.

JägerMonkey is a general-purpose compiler which converts entire methods to machine code. The goal is to get great baseline performance. When it detects a loop that can be traced, it automatically engages the trace compiler, which makes it even faster. Yes, that’s right: there’s a turbo button inside.

This hybrid approach is designed to use well-established optimization techniques that work everywhere, and combine them with our existing hyper-optimizing engine that handles smaller subsets of code.

Results.

If you’ve been obsessing over Are We Fast Yet? like me, you’ve seen the numbers dive. Want to try it out? Click here to get preview builds of Firefox 4 with our new JavaScript engine. You can play demos like JSNES at a full, glorious 60FPS.

Disclaimer: It’s a preview – we’re still ironing out the rare kinks. Please report bugs or tell us if something’s wrong (or slow!)

Benchmarks.

We’ve been using the SunSpider 0.9.1 and V8-v5 benchmarks to gauge our general progress. SunSpider is a full 2X faster over Firefox 3.6!

Our improvement on the V8-v5 benchmark is even more dramatic – 4X!

Ongoing Work.

The rockin’ doesn’t stop here. Right now we’re polishing off the final pieces to get into the next Firefox 4.0 Beta. At the same time, here are some of the immediate performance works-in-progress:

  1. Function Calls. As discussed previously, this is one of our last big areas of optimization. The first of four major pieces, caching call sequences, was completed this week. The second big chunk, which Luke Wagner has slated for this week, will make arguments and stack frames faster. Brian Hackett, Chris Leary, and Bill McCloskey have more stack frame optimizations as part of the third wave.
  2. Tracer Integration. Deciding when to use the turbo button is pretty hard, but Bill and Dave have been researching it thoroughly. Right now we’re just scratching the surface, and we will have much better heuristics by the end of the month.
  3. Web Optimizations. Community member Jan de Mooij is continually finding demos and real-world tools and improving performance “gotchas” in our engine, like making common arithmetic patterns faster.

Conclusions.

Firefox 4 is seeing dramatic wins over 3.6 and the web is feeling faster. You can try it out now using a JS Engine Preview, or wait for Firefox 4 Beta 6.

Please stay tuned as we approach JägerMonkey end-game for Firefox 4. Dave Mandelin and I will be blogging, and for smaller things, tweeting (his here) progress & technical updates.

JägerMonkey has Crossed the Streams

On July 12th, JägerMonkey officially crossed TraceMonkey on the v8 suite of benchmarks. Yay! It’s not by a lot, but this gap will continue to widen, and it’s an exciting milestone.

A lot’s happened over the past two months. You’ll have to excuse our blogging silence – we actually sprinted and rewrote JägerMonkey from scratch. Sounds crazy, huh? The progress has been great:

AWFY feed, v8-richards

The black line is the new method JIT, and the orange line is the tracing JIT. The original iteration of JägerMonkey (not pictured) was slightly faster than the pink line. We’ve recovered our original performance and more in significantly less time.

What Happened…

In early May, Dave Mandelin blogged about our half-way point. Around the same time, Luke Wagner finished the brunt of a massive overhaul of our value representation. The new scheme, “fat values”, uses a 64-bit encoding on all platforms.

We realized that retooling JägerMonkey would be a ton of work. Armed with the knowledge we’d learned, we brought up a whole new compiler over the next few weeks. By June we were ready to start optimizing again. “Prepare to throw one away”, indeed.

JägerMonkey has gotten a ton of new performance improvements and features since the reboot that were not present in the original compiler:

  • Local variables can now stay in registers (inside basic blocks).
  • Constants and type information propagate much better. We also do primitive type inference.
  • References to global variables and closures are now much faster, using more polymorphic inline caches.
  • There are many more fast-paths for common use patterns.
  • Intern Sean Stangl has made math much faster when floating-point numbers are involved – using the benefits of fat values.
  • Intern Andrew Drake has made our JIT’d code work with debuggers.

What about Tracer Integration?

This is a tough one to answer, and people are really curious! The bad news is we’re pretty curious too – we just don’t know what will happen yet. One thing is sure: if not carefully and properly tuned, the tracer will negatively dominate the method JIT’s performance.

The goal of JägerMonkey is to be as fast or faster than the competition, whether or not tracing is enabled. We have to integrate the two in a way that gives us a competitive edge. We didn’t do this in the first iteration, and it showed on the graphs.

This week I am going to do the simplest possible integration. From there we’ll tune heuristics as we go. Since this tuning can happen at any time, our focus will still be on method JIT performance. Similarly, it will be a while before an integrated line appears on Are We Fast Yet, to avoid distraction from the end goal.

The good news is, the two JITs win on different benchmarks. There will be a good intersection.

What’s Next?

The schedule is tight. Over the next six weeks, we’ll be polishing JägerMonkey in order to land by September 1st. That means the following things need to be done:

  • Tinderboxes must be green.
  • Everything in the test suite must JIT, sans oft-maligned features like E4X.
  • x64 and ARM must have ports.
  • All large-scale, invasive perf wins must be in place.
  • Integration with the tracing JIT must work, without degrading method JIT performance.

For more information, and who’s assigned to what, see our Path to Firefox 4 page.

Performance Wins Left

We’re generating pretty good machine code at this point, so our remaining performance wins fall into two categories. The first is driving down the inefficiencies in the SpiderMonkey runtime. The second is identifying places we can eliminate use of the runtime, by generating specialized JIT code.

Perhaps the most important is making function calls fast. Right now we’re seeing JM’s function calls being upwards of 10X slower than the competition. Its problems fall into both categories, and it’s a large project that will take multiple people over the next three months. Luke Wagner and Chris Leary are on the case already.

Lots of people on the JS team are now tackling other areas of runtime performance. Chris Leary has ported WebKit’s regular expression compiler. Brian Hackett and Paul Biggar are measuring and tackling what they find – so far lots of object allocation inefficiencies. Jason Orendorff, Andreas Gal, Gregor Wagner, and Blake Kaplan are working on Compartments (GC performance). Brendan is giving us awesome new changes to object layouts. Intern Alan Pierce is finding and fixing string inefficiencies.

During this home stretch, the JM folks are going to try and blog about progress and milestones much more frequently.

Are We Fast Yet Improvements

Sort of old news, but Michael Clackler got us fancy new hovering perf deltas on arewefastyet.com. wx24 gave us the XHTML compliant layout that looks way better (though, I’ve probably ruined compliance by now).

We’ve also got a makeshift page for individual test breakdowns now. It’s nice to see that JM is beating everyone on at least *one* benchmark (nsieve-bits).

Summit Slides

They’re here. Special thanks to Dave Mandelin for coaching me through this.

Conclusion

Phew! We’ve made a ton of progress, and a ton more is coming in the pipeline. I hope you’ll stay tuned.

Are We Fast Yet?

Seems like Are We Fast Yet has spread around a bit, so I’d like to talk about it.

It started as a joke, suggested by Dave Mandelin, in the JavaScript “pit” at Mozilla. I made a site that displayed a giant “NO”, mounted a monitor where everyone could see it, and put the site on full-screen display. After a few hours everyone thought it was kind of depressing, so I made the “NO” small and added graphs.

The intent was for us to track our progress and notice regressions quickly. The UI is super minimalist and the terminology is internal. So after it started spreading, I’ve been getting tons of e-mails about how it could be improved.

What don’t people like? The data points don’t tell you any information, and the x-axis isn’t labeled. I fixed the data points last night, and the labels will come next.

People have also asked where Opera and IE are. I’ve put up a FAQ page to answer that. In short, respectively 1) it tests shell/console JS, not in-browser, and 2) everything runs off a Mac. I want to fix both of these eventually, but don’t have spare cycles right now. Improving perf is higher priority.

What do we look for on AWFY? Our goal is for the gray line (Jäger Only) to be as fast or faster than the competition. This puts Jäger+Tracing (purple line) at a greater advantage. The gray line should be moving fastest as that’s where the biggest gains are right now.

Anyway, I hope you like it!

JaegerMonkey – Fast JavaScript, Always!

Mozilla’s JavaScript optimizer, TraceMonkey, is pretty powerful. It carefully observes loops and converts them to super-fast assembly. We call this “tracing”.

That’s great and all, but there’s a problem: sometimes tracing doesn’t work. Loops can throw curveballs that cause tracing to stop. Especially with recursion, or lots of nesting, it can be very difficult to build good traces on complex code.

Other JavaScript engines, such as Nitro (present in WebKit/Safari), take a simpler approach. Instead of compiling loops to assembly, they compile entire methods (functions) to assembly. The generated code is much more generic than tracing, so while it is not as fast, it can handle any curveball.

What we’ve found is that when tracing works, we’re faster than the generic approach. But when tracing fails, we have to fall back to our old-school interpreter. At that point your JavaScript runs about as fast as it would in 2007-2008 (i.e. before Firefox 3.5, Safari 4, Chrome, etc).

That’s not acceptable, and we need to fix that. Trace compilation is still an active area of research (one which we’ll continue to work on) – but in the interim, we need to make sure our “slow path” is at least as good as the competition.

The question we’ve been asked, and we’ve been asking of ourselves, is: Why couldn’t we trace and keep going SUPER AWESOME FAST, and when tracing fails, fall back to STILL REALLY FAST?

Enter JaegerMonkey.

Our new project, JaegerMonkey (or JägerMonkey), has exactly this in mind. We’re taking the tried-and-true approach of other vendors, and bolting trace compilation on top. Once the two are interacting seamlessly, you’ll have a much more consistent – and fast – JavaScript performance experience.

Dave Mandelin, Luke Wagner, Julian Seward and I have been sprinting the past few weeks to get something basic working. To emit actual machine code, we’re using some very pretty classes (“macro assembler”) from Nitro. That’s been a real treat; it’s well-abstracted and C++ish, and allowed us to get to work on the actual compiler very quickly.

Our compiler is simple so far. Before interpreting a method, we translate each bytecode into some pretty generic assembly. For example, an “ADD” opcode will emit assembly that can handle both fast cases (adding two numbers) and slow cases (adding, say, an object and a string).

Contrast this to tracing, where the types are known, and pinned, statically – it does not need to handle any extra cases that might come up. In the whole-method compiler, the generated code must handle all unexpected variations in control or type flow.

After the function is compiled we execute it right away – the interpreter is skipped entirely.

Early Progress.

We’ve barely started and the results are already really promising. Running SunSpider on my machine, the whole-method JIT is 30% faster than the interpreter on x86, and 45% faster on x64. This is with barely any optimization work! When we integrate tracing next week, we’ll already start to see the benefits of both working together.

For a more in-depth study, Dave Mandelin has blogged about our early performance gains, what’s done, up-and-coming, etc.

As we move forward, the two compilers will be tightly integrated. The method compiler will be able to identify loops and invoke the trace compiler. The trace compiler, if it decides a method is too complex to inline, may decide to invoke the method compiler.

The future of SpiderMonkey is bright and shiny, and we’ll be talking more about the project as it reaches major milestones.

In the meantime, if you are interested in learning more, I invite you to look at JaegerMonkey on the Mozilla wiki, and our makeshift source code repository. We also hang out in #jsapi on irc.mozilla.org.