Category Archives: Articles

“Official” articles that were pre-written for posting.

The Evolution of Building at AlliedModders

AlliedModders recently passed its ten year anniversary. That’s literally crazy. At some point I’d like to do a longer post on its history and the people who made it happen – but today I wanted to talk about the evolution of our build process. It’s quite complicated for your average gritty open-source project – complicated enough that we wrote multiple custom build systems. Right now we’re pretty close to state-of-the-art: AMBuild 2 is a general-purpose, optimal-by-design DAG solver. But we started off in the dregs, and it took 10 years of iteration to get where we are now.

Without further ado…

The Beginning

The first project that would become part of AlliedModders was AMX Mod X. Felix Geyer (SniperBeamer) recruited me to the team, and I provided hosting. It was January of 2004, and there were two compilers you could use: GCC 2.95, and Visual Studio 6.0. The project had a single binary built from about 20 .cpp files.

On Linux it used a single Makefile. On Windows you had to run the Visual Studio IDE. For about six months our release process was entirely hand-driven. Even as we added more binary components, I’d build each one by hand, sort them into packages by hand, drop them into SourceForge FTP, then one by one add each package to a release.

As components became more complicated, I eventually got frustrated by Make’s lack of simple programmatic control. If you’re unfamiliar with GNU Make, it has something resembling the worst scripting language ever made. I couldn’t bring myself to touch it, and in August 2004 created two Perl scripts:

Makefile.pl – A straight translation of the Makefile to Perl, but with more programmatic control over how and what gets built.
package.pl (which I no longer have) – after building binaries by hand, this script would build packages and upload them to SourceForge.

Over the next year, AMX Mod X grew rapidly. The development team and community were very active, and the Half-Life 1 modding scene was at its peak. By July 2005, AMX Mod X had over 20 binary components and supported three platforms: Windows, Linux i386, and Linux AMD64. Releasing by hand took hours, and maintaining 20+ random copies of a Perl script was really difficult. Something had to change.

Fixed Pipeline Era

The first step was ditching Makefile.pl. I replaced it with a much cleaner Makefile. Most of the complexity was stripped away and reduced to a few variables and filters.

But I was still pretty new to software engineering, and made a major mistake – I didn’t separate the guts of the Makefile into something reusable. As a result the whole thing got copied and pasted to every component, and to components in other projects. To this day about 50 variations of that Makefile are scattered around, though it’s unclear whether anyone still uses them. Up until 2009 we were still syncing changes across them by hand.

The next step was automating all the component builds and packaging. AMX Mod X had weird packages. There was a “base” package, and then optional addon packages to supplement it. The steps for each package were similar, but they all built different sets of components. The result: I ended up writing something we affectionately called “the C# tool”: AMXXRelease. It had three pieces:

A general pipeline for constructing an AMX Mod X package.
An abstraction layer that decided whether to build via GNU Make or MSBuild.
Classes for each package, usually just supplying filenames but sometimes adding custom steps to the pipeline.

The C# tool survived many years. Though it was a hard-coded tool written in a verbose, rigid framework, it got the job done. AMX Mod X continued to use it until 2014, when we finally ditched it as part of a transition to GitHub.

Interlude: SourceMod

In 2006, Borja Ferrer (faluco) and I secretly begun work on the next major AlliedModders project: SourceMod. SourceMod was a rewrite of AMX Mod X for the Half-Life 2 Source engine. By 2007 it was nearing ready to go public as beta, so I ported the C# tool. Everything was peachy. Until Half-Life Episode 1 came out.

At the time, it wasn’t conceivable to us that Valve would maintain multiple forks of the Source engine. Half-Life 1 was all we knew. Valve had never changed Half-Life 1 in a way that couldn’t be fudged with some detection and binary hacks. But Source was different – it was vast and complex compared to its predecessor. Valve began changing it frequently – first with Episode 1 – and then again with the Orange Box and Team Fortress. Now there were three versions of the Source engine, all totally binary incompatible. The same code didn’t even build across their SDKs, and linking was too complex for something like a “universal Source binary”.

Suddenly our Makefile and Visual Studio project files were getting ridiculously hairy. Not only did SourceMod have a dozen binary components, but it had to build each one multiple times with different parameters, across two different platforms. By mid-2009 we realized this was crazy and started looking at other options.

AMBuild 1: Alpha-Gen Building

I evaluated two build systems for SourceMod: CMake and SCons. CMake was a contender for “worst scripting language ever”. After a lot of futzing I concluded it was impossible to replicate SourceMod’s build, and even if I could, I didn’t want to see what it would look like. SCons came much closer, but it didn’t have enough flexibility. Source binaries required a precise linking order, and at the time SCons had too much automagic behavior in the way.

We needed something where we could template a single component and repeat its build over and over with small changes. We needed reasonably accurate minimal builds. We needed platform-independent build scripts, but also tight control over how the compiler was invoked. So in August 2009, I dove into Python and wrote our own general-purpose build tool, dubbed “AMBuild 1”.

I would describe AMBuild 1 as “rough”. It was definitely extremely flexible, but it was buggy and slow. Each unit of work was encapsulated as a “job”, but jobs did not form dependencies. It had some hardcoded, special logic for C++ to support minimal rebuilds, but since it was special logic, it broke a lot, and because it was recursive in nature (like Make), it was very slow. Since jobs didn’t form dependencies, they couldn’t be parallelized either. Everything ran sequentially. The API was also pretty grody. The root build file for SourceMod was somewhat of a nightmare.

But… it worked! Suddenly, supporting a new Source engine was just a matter of adding a new line in its central build file. It worked so well we began migrating all of our projects to AMBuild.

Over time, as always, things began to break down. By 2013 there were twenty versions of Source, and AMBuild 1 was nigh unusable. Minimal rebuilding was inaccurate; sometimes it rebuilt the entire project for no reason. Sometimes it produced a corrupt build. Builds were also EXTREMELY slow. A complete Windows build could took two hours, and multiple cores did nothing. There were so many headers in the Source SDK, and so many file scans for dependency checks, that merely computing the job set for an update could take 15 seconds.

AMBuild 2: Modern Era

Sometime in 2013 I found Tup and was immediately inspired. Tup “gets it”. Tup knows what’s going on. Tup knows what it is. Tup is what build systems should aspire to be. In short, it has two philosophies:

Builds should be 100% accurate. Not 99% accurate, or 37% accurate, or “accurate if you clean or clobber”. They should be 100% accurate all the time.
Builds should be fast, because waiting for builds is a waste of time.

The design of Tup is simple on the surface. Rather than recursively walking the dependency graph and comparing timestamps, it finds a root set of changed files, and then propagates a dirty bit throughout the graph. This new “partial graph” is what has to be built. In practice, this is much faster. In addition, Tup caches the graph in between builds, so if your build files change, it can compute which parts of the graph changed.

Well, that’s awesome. But… I didn’t want to take on Tup as a dependency for AlliedModders. I have a few reasons for this, and I’m still not sure whether they’re valid:

Tup requires file system hooks (like FUSE) and I was worried about its portability.
Tup is a C project, and not yet in Linux distros, so we’d have to tell our users to install or build it.
AMBuild’s API, and of course Python, together make for a much better front-end than Lua.
Tup didn’t seem to include external resources (for example, system headers/libraries or external SDKs) in its graph.

So I sat down and began rewriting AMBuild with Tup’s principles in mind. The result was AMBuild 2. With a real dependency graph in place, the core of the build algorithm became much simpler, and parellelizing tasks was easy. The API got a massive overhaul and cleanup. It took about two months to work all the kinks out, but in the end, my SourceMod builds took about a minute locally instead of 15. And with some smarter #include’ing, our automated builds went from 45 minutes to 2 minutes. The build scripts don’t look too bad, either.

What’s next?

AMBuild 2 isn’t perfect. Its biggest problems are scalability and internal complexity. I’m a huge fan of Python, and it’s nearly everywhere now. But it’s slow. Dog slow. AMBuild 2 would not scale to a codebase the size of Mozilla or Chromium. In addition, Python’s concurrency support is very bad, even taking into account the multiprocessing module. Most of AMBuild’s complexity is in its IPC implementation.

Early on in its development a common suggestion I got was to separate AMBuild into a front-end (the build API) and back-end (the build algorithm). I heard this theory float around for Mozilla’s build system too. CMake is structured this way. Separating the components out means they are replaceable: AMBuild 2’s frontend could generate Makefiles instead of an AMB2 DAG, or it could read Makefiles and parse them into a DAG.

I eventually concluded that didn’t make much sense. The goal of AMBuild 2 is not to provide a general backend or general frontend: it’s to produce correct builds as quickly as possible. In fact, it considers most other build systems to be broken and inferior. Taking an AMBuild project and generating Makefiles defeats its entire purpose.

However, there is something very useful that comes out of this separation, which is the ability to generate IDE project files for casual use or local development. Keeping this in mind, AMBuild 2 did separate its front-end from its back-end, but the API makes too many assumptions that don’t have clean mappings to IDE files. As we experiment with Visual Studio and XCode generators, we’ll probably have to iterate on the API.

Lastly, it’s worth nothing that part of what AMBuild solves is papering over an early design flaw in SourceMod: with proper abstractions, it wouldn’t need 20 different builds. In fact, over time we have moved a large amount of code into single-build components. Unfortunately, due to the complexity of what’s left and the Source SDK in general, it would be nearly impossible to completely eliminate the multiple-build steps.

There are very few projects I’m truly proud of, where I look back and have very few regrets. (One is IonMonkey, but that’s a story for another day.) AMBuild is one of them. It’s not particularly stellar or original, but we needed to solve a difficult problem, and it does the job very well. I wouldn’t be surprised if it takes another 10 years of iteration to make it perfect – but that seems to be how things work.

Making a Card Game Prototype

Recently I wanted to try quickly prototyping a few card game ideas, but I couldn’t find much information on how to actually do it. After a few iterations I figured I’d jot down what I ended up doing.

My first attempt was pretty laborious. The process went something like this:

Create a sheet of card frames in PaintBrush.
Paste in clipart and text.
Print about 5-10 copies of the sheet.
Cut along the outer borders.
Shove the paper into card sleeves.
Throw Magic: the Gathering cards in the sleeves for support.

Well, this worked, but it had a lot of problems. Any change in the card frame meant redoing all the cards. The text was really hard to align. I used actual scissors instead of a cutting board, so the edges were sloppy and my hands got super tired. Sleeving everything took forever.

But! After years and years of drafting Magic: the Gathering, I’ve got boxes and boxes of cards sitting in the closet. The vast majority are either jank (too weak to use even in casual decks), or duplicates of very common cards. I’d feel bad throwing them out, but all they do is take up space.

Enter Avery Adhesive Name Badges. It turns out these are 2.33×3.38in – Magic cards are 2.5×3.43in. These stickers fit cleanly within the black border of modern Magic cards! Now it’s just a matter of killing two birds with one stone: printing the prototype cards and using up worthless Magic cards in the process.

The first thing I did was ditch PaintBrush for Adobe InDesign. I don’t know much about graphic design, but it’s not too hard to use. I made a simple card frame template for each card type (sample below). This one has five blank areas: Title in the upper-left, Cost in the upper-right, Image at the top, Rules Text at the bottom, and Points at the bottom-left.

The next step is generating a CSV file with the card designs on them. InDesign will use this file to generate a finished card for each row. For example:

Title,Cost,Points,@Image,Text
"Hamburger",5,+3,"C:\cards\hamburger.png","Hamburger is delicious."
"Yams",3,+2,"C:\cards\yams.jpg","If you currently smell yams, you get an extra 2 points."

Next, InDesign’s “Data Merge” tool comes in. This lets you associate each CSV column with a container in the document. After doing that, and clicking the Merge icon, we get finished cards:

Now that we’ve got cards, the next step is actually getting them to the printer. So far I’ve toyed with a few ways of doing this, but the fastest approach I’ve got is to use InDesign’s Data Merge feature again. Avery doesn’t provide an InDesign template for their sticker sheets, so I got out a ruler and measured out my own. It’s not perfect but it’s accurate enough for prototyping. I then export the card images, and make a new CSV file. This time each column represents a slot on the sticker sheet. Since I’ve only got two cards, the sheet will have duplicates:

@Slot1,@Slot2,@Slot3,@Slot4,@Slot5,@Slot6,@Slot7,@Slot8
"C:\cards\card-1.png","C:\cards\card-1.png","C:\cards\card-1.png","C:\cards\card-1.png","C:\cards\card-2.png","C:\cards\card-2.png","C:\cards\card-2.png","C:\cards\card-2.png"

Before the data merge, and after:

Now the document can be directly printed onto the sticker sheet. Since the sheets aren’t very cheap, I do a sample run first and make sure they line up. If it looks good… print and apply! Magic jank card pictured below, and the final product. The cards are only slightly thicker than Magic cards. The stickers withstand shuffling, though they have more friction than the Magic card surfaces.

This process works very well for proxying actual Magic cards too, or making your own Magic cards. I’m reluctant to show actual images of this – despite it being obviously not a counterfeiting attempt (you can tell there’s a huge sticker applied), I don’t want Wizards of the Coast suing me. But with some coding, my friend and I were able to build a draft set for Limited Edition Beta. Two sheets gets you 16 cards, so we stuck 1 rare, 3 uncommons, and 9 uncommons, randomly chosen, onto two sheets at a time. Apply the stickers onto other Magic cards, and you’ve got a booster pack of proxies for drafting cards that for all intents and purposes are unobtainable. We haven’t done the actual draft yet – but I’m guessing it’ll be pretty rough compared to modern sets.

How did the actual prototyped game turn out? Tragically, it didn’t involve hamburgers or yams. More importantly, I forgot to include a win condition. Eventually we realized there was no way to end the game so we stopped playing. But hey, that’s what playtesting is for.

Inheritance Without Types

At AlliedModders we dabble in practical language design. Our two major scripting projects, AMX Mod X and SourceMod, iterated on our own fork of a scrappy language called Pawn. Our next iteration is untyped, and we’re adding object support.

What does object support mean? “Well, obviously,” I naively thought, “it means Java-style implementation inheritance!” Whoops. WRONG. It turns out there’s a ton of complexity with inheritance once you take away types. This post is a brief survey of some popular untyped languages and what we ended up deciding.

The Goal
We want our language to have as few pitfalls as possible. I’m not sure if I stole this from Graydon Hoare, but our motto is “No surprises!” We want people to be able to write code that works the way they expect, minimizing random or unforeseen run-time failures.

A great aspect of C++’s inheritance is that information hiding is super easy. A base class’s behavior isn’t trampled by derived classes. For example:

Select All Code:

As you’d expect, this prints ‘5’. The fact that Derived also declares x does not trample on Base‘s own behavior. Now let’s take an untyped language, Python 3:

Select All Code:

This prints ’20’. In Python, “self” is really one object, with one set of properties and methods, whereas in the C++ model “this” is conceptually two objects, and “this” is statically typed to be the object of the containing class. I consider the Python model unappealing for two reasons:

When adding a member to a derived class, you have to know that you’re not colliding with a member of your base class.
When adding a member to a base class, you might be breaking any number of unknown consumers.

JavaScript’s prototype-based inheritance discourages this style, but JS is super flexible, and if you don’t mind a very, very inefficient prototype chain, you can do this:

Select All Code:

This prints “5”.

PHP’s Answer
For whatever reason, PHP takes Java’s model wholesale. Here’s an example:

Select All Code:

This prints ‘5’, which is what we want – and makes sense. But is $this statically typed to the Base class, like in C++? No:

Select All Code:

This prints "hello", so PHP’s $this retains some dynamicism. Let’s up the ante. What should this do?

Select All Code:

This prints true in both cases! For any property access in a PHP class function, if the object has that class on its inheritance chain, it uses the property on that class. Otherwise, it searches from the derived-most class back to the base class like normal. This implicitly hides variables on the derived object. Nonetheless it’s the right choice given the model, especially considering that it’s usually bad practice for a base class’s behavior to explicitly cast to one of its derived class.

Note that the fact that the inner x is private is actually irrelevant. Even if it were public, the base class should retain its behavior. Banning redeclaration works, though then you run the risk of potentially preventing an unknown consumer from compiling (albeit, better than silently being wrong). Similar issues occur with introducing new virtual functions or overloads.

So, what did we do?
After letting this all sink in, I decided to scrap inheritance as a feature of our next language iteration. Having objects alone will be a huge step forward, and we can evaluate use cases from there. I’m satisfied with this for a few reasons.

Inheritance is really complicated with types, and even more complicated without. And in an untyped language, it’s not even clear if implementation inheritance is useful. PHP’s interfaces and abstract classes seem verbose and heavyweight in comparison to JavaScript or Python, in return for a very small amount of static guarantees.

So far Pawn and our derivatives have mainly been embedded in games, and the majority of those games are based on the Half-Life engines (including Source). We want our object model to be compatible with the game’s internal object model, however, I’m not convinced that copying it would be ideal. The Source engine has the “Big-Ass Base Entity Class from which Everything Derives,” a painful and complex architecture continuing Source’s theme of not being scalable. Do we really want to encourage that?

I suspect we’ll end up with something like traits and/or composition, but for now, I’m content with keeping it simple and continuing to evaluate the strengths of other languages’ models.

(Addendum: I don’t much about Perl or Ruby. I tried quickly evaluating them for this post but the syntax was a little intimidating. I’d really appreciate insight into other languages.)

Addendum 2: SaberUK contributes this Ruby example which prints 20 – exhibiting similar behavior to Python.

Select All Code:

On Indonesia and Being Justin Bieber

Two weeks ago, a bunch of Mozillans went on a whirlwind tour of Indonesia, visiting Firefox 4 release parties around the country. We had a great time, and I found the trip really enlightening. Luke Wagner and Christian Legnitto handled the first four days of the trip, and then Dave Mandelin, Josh Aas and I did the rest.

Dave and I started off in San Francisco, making a 22-hour journey to Jakarta, with a stop in Taipei. We brought along extra suitcases full of Mozilla swag to give away: by my count about eight-hundred billion elastic headbands. It must have looked like a ton of cocaine when we put our luggage through the X-ray machine at customs. The security guards opened the suitcases up and looked perplexed, but let us go after I donned a headband for demonstration.

After immigration we met community members Viking Karwur (@vikingkarwur), Yofie Setiawan (@yofiesetiawan), and Arry T, the owner of a local private school. Arry took us to a restaurant where you order by walking around rows and rows of open coolers, filled with random frozen fish, live crabs, mussels, et cetera. We paced around with an empty cooler and grabbed what seemed like one of everything. (Mr. Burns: “I’ll just have a glass of milk… from THAT cow.”) Lo and behold about thirty minutes later it all arrived at our table fully cooked, and tasted great.

The next day we got up early and flew to Surabaya with Yofie (he was excited, it was his first time flying). The Surabaya venue took place in a shopping mall, with maybe 100-200 people showing up. There was a talk about web design (spoken entirely in Indonesian, but the slides were in good English), another from Yofie about, I think, Mozilla Indonesia, and then Dave, Josh, and I each presented a little bit about Firefox 4, concluding with JSNES and Flight of the Navigator demos. Then there was a kind of ribbon-cutting ceremony which involved a giant mound of rice. Each of us had to take a spoonful off the top without dropping it, which was surprisingly tricky.

What really surprised me was how much everyone liked Firefox – it has a huge market share in Indonesia, something like 77% – and how people were really excited that we came to visit. I think we spent a full thirty minutes posing for photographs at each event. As a C++ developer you don’t get treated like a rockstar too often, so that was pretty awesome (albeit exhausting). Afterward we got some local food, but I was so tired I don’t remember what it was.

The next day we drove three hours to Malang, which was actually a good chance to get a look at the country. The venue was at the polytechnic university, where we were treated to a plaque and boxes of local snacks. The students there had great questions. Recurring ones were: why does Firefox use so much memory? (we’re working on it), and when will it get a Blackberry port? (likely, never, but it’s a popular phone there). After Q&A we each talked a bit about why we like working for Mozilla, and Dave cut a ribbon which released a big balloon. A few students were interested in how to start a career at Mozilla, I wasn’t quite sure what to say but the internship program and just community involvement seem like great vectors. Unfortunately, Dave got pretty sick on the ride back to the hotel, so he was down for the next day.

At the Makassar event we temporarily lost Yofie as our babysitter, instead being placed in the capable hands of Rara (@rara79) and Mamie (@alwaysmamie). The event was in a café, and a structured a little more rigorously than the others – most questions were in Indonesian and translated by the MC. The audience seemed to be lots of bloggers and web developers in the community. Questions seemed to fall into four main categories: Memory use, support for Java phones, UI changes and support issues, and wondering where the money comes from. All good questions! Afterward, we went to Pizza Hut (of all places – it was actually pretty fancy), and got cheese-stuffed crust with cornflakes.

Finally, on the last day, we flew to Bali and met up with Viking and Yofie again. Josh and I relaxed at the resort pool which was occupied mostly by Australians. In the evening we went to a local café for a smaller venue, maybe a dozen people, with a local radio host as MC. At our table sat an American expatriate, Ken McClellan, CTO of Mitrais who talked about what it’s like living and running a company in Indonesia.

Indonesia is very clearly a developing country. Foreigners are advised not to drink the tap water, and to be careful about uncooked food, as they won’t have the same resistances as locals. In cities you can see a striking mix of poorer and middle-class areas, and the pollution near congested roads can be very noticeable, at least if you’re asthmatic like me. Traffic is amusingly chaotic – turn signals, stop lights (there are very few), and speed limits are all optional. When you’re in a car, honking is nearly constant as an advisory measure, since there are no clearly defined lanes, and the majority of vehicles on the road are motor scooters. Jakarta even had motorized rickshaws, most were very old, adding to the anachronistic mix of old and new technology. It seems like despite this, traffic was actually pretty reasonable, though as a Californian used to big open roads with camera-enforced stop lights every ten feet, I’d be too nervous to try driving.

The national airline, Garuda, was much higher quality than any domestic airline in the States. It kind of reminded me of what flying was like before 9/11, or when American airlines weren’t terrible. I guess what sums this paragraph up is that the flight seat pockets had pamphlets advertising “Investing in Indonesia.” I’m really curious as to what the country will look like in 20-30 years, because it seems like it’s growing rapidly.

All in all, it was a great trip, and I’d definitely go back. It helped having a local guide; Yofie, Viking, and others’ help was really appreciated. The food was great (Nasi Goreng anything, Oxtail Soup were my favorites), and cheap (coming from USD), and everyone we met was friendly – even complete strangers walking around. It was awesome seeing everyone so excited about Firefox, and whatever we’re doing that makes Firefox so popular in Indonesia, we should figure it out and do it more!

I am not Justin Bieber.

I would be remiss if I didn’t conclude by mentioning how people thought I was Justin Bieber. It was so persistent that Josh started calling me Justin. Near the end of the trip, swimming in Bali, he said that when my hair was slicked back it was “way more manly,” so I tried that for a day, but decided I couldn’t afford the daily metric ton of hairgel. Incidents of mistaken Bieber identity:

A few times in person at each event, including this tweet.
People were whispering it at the airport in Makassar.
Walking around Makassar, people rolled down the windows of their cars and called out at me and Josh.
Running into a group of loitering teens, they asked “who I was” and burst out giggling when I shrugged.
An airport employee in Makassar waved to me from behind a booth and yelled “Justin Bieber”! – I had to wave back.

Well, now that my pop star career is over, I’ve decided to go into dynamic language optimization.

Debugging 500 Errors

For the past two years, the AlliedModders site has been randomly dishing out “HTTP 500 – Internal Server Error” responses. It’s been pretty frustrating.

I actually tried debugging the errors two years ago. I got far enough to attach GDB to PHP, and saw it blocked on mysql_query(), but didn’t know how to proceed. We knew that during these incidents, the server load would spike to obscene levels – like 100-150. Lots of PHP processes were getting spawned. “iowait” time was through the roof, and CPU usage was normal. Something couldn’t read the disk, or the disk was getting hammered. I put MySQL on a separate hard drive, but the problems persisted and I gave up. Keep in mind: I didn’t know anything about running servers, and I still don’t – I just have a vague notion of how to debug things.

Fast forward to a few weeks ago, we had a serious hard drive crash and suffered some painful downtime. We lost a lot of data. I clung to a hope that our years of errors had been caused by a failing disk. Predictably, we weren’t so lucky, and the website problems continued. Someone joked, “Hey, I’m glad to see you got the server back up, even the 500 errors!” Yeah… it was time to fix this. Somehow.

Step One: Apache Logs

There were only two notable Apache errors. They’d appear over and over, but only in large clumps:

[Tue Jan 25 04:15:19 2011] [warn] [client xx.xxx.xxx.xx] (110)Connection timed out: mod_fcgid: ap_pass_brigade failed in handle_request function [Tue Jan 25 04:15:25 2011] [warn] [client xxx.xxx.xx.xx] mod_fcgid: read data timeout in 60 seconds

It was pretty clear that somehow, communication was breaking down between mod_fcgid (Apache) and PHP processes. I read, I think, every single webpage on the Internet involving these two errors. Twice. They were all dead ends, both for us and both for the person reporting their own problems. A little sad. I wanted to blame mod_fcgid, since historically it has been really buggy for us, but I wanted to pin absolute blame.

So the next step was to break out a debugger.

Step Two: Debugging PHP, Apache

AM developer Fyren edited the mod_fcgid source code, such that instead of killing PHP, it would attach gdb in a separate tty. It usually took around three hours to get a debugging session going. The call stacks looked like:

(gdb) bt ... #4 0x00007f1f7cad804f in net_write_command () from /usr/lib/libmysqlclient.so.15 #5 0x00007f1f7cad4c51 in cli_advanced_command () from /usr/lib/libmysqlclient.so.15 #6 0x00007f1f7cad1751 in mysql_send_query () from /usr/lib/libmysqlclient.so.15 #7 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15 ... (gdb) fin Run till exit from #0 0x00007f1f7cad17b9 in mysql_real_query () from /usr/lib/libmysqlclient.so.15

At this point, the session would hang, indicating that MySQL was not responding. I really had no idea what to do, but we got lucky. We noticed during these incidents, Apache’s server-status was stuck at “200 requests being processed,” all in the “W” (“Sending reply”) state. Our MaxClients setting was exactly 200. Now we had a hypothesis: MySQL was intermittently not responding for short bursts of time. This blocked PHP, which in turn made mod_fcgid block and timeout, causing Apache to spawn a bunch of workers that would lock up as well.

Unfortunately, the intermittent failures only happened a few times a day, so debugging went slow. We managed to catch one, and ran iostat:

Device:          r/s       w/s   rsec/s    wsec/s    avgrq-sz  avgqu-sz   await  svctm  %util
sdc              320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00
sdc1             320.80    4.40  3540.80    54.40    11.06     4.32   13.01   3.08 100.00

Someone was reading tons of data off the disk, and iotop confirmed MySQL was the culprit.

Diagnosing MySQL

I managed to find this enlightening presentation by Percona. Case study #3 looked identical to ours, except in their case, MySQL was writing. But their conclusion about bad queries seemed plausible. I made a script that would peek at server-status every second, log a ton of data, and e-mail me if it went above 100 concurrent requests. The data included oprofiling, iotop spew, SHOW FULL PROCESSLIST, SHOW OPEN TABLES, SHOW GLOBAL STATUS, etc.

Oprofile’s report wasn’t too interesting, but the MySQL status showed around 150 queries waiting for a table lock. The active queries, in all cases, were SELECTs. This explained the reads, but not the table locks. This didn’t make sense, so I generated a few core files of MySQL from within GDB, then Fyren and I did some post-mortem analysis.

The analysis revealed that MySQL had about 200 threads, most of which were waiting on locks (pthread_cond_timedwait) with a timeout of one year. The locks were almost all read locks for SELECTs, and the actually succeeding queries were exclusively SELECTs. Fyren tried to track down the thread owning the locks, but it ended up being apparently circular (thread X’s lock was held by thread Y, and thread Y was waiting on a lock held by thread Y).

That was all pretty mysterious (and still is), but rather than try to understand exactly what locks are in MyISAM tables, I took a look at the non-deadlocked queries. They were all one of: the AMX Mod X plugin search, the SourceMod plugin search, or the vBulletin member list search.

It turns out all three of these queries were extremely expensive – containing ORs, joins, full table scans of 800MB tables, etc. On average they each took about 500ms to complete. That’s without MySQL freaking out. I turned off vBulletin member searching (most large boards do), and replaced the plugin searches with faster queries off a much a smaller table. This brought those page load times down to 20ms. Twelve hours later, no 500 errors had occurred, and there were no mod_fcgid failures in Apache’s logs.

Too Early to Declare Victory

While waiting to see if any more 500 errors would show up, suddenly the server load spiked, and lo and behold – mod_fcgid became overloaded. Our monitoring script kicked in, and this time, MySQL was nowhere in the profile! It turned out some forum users were effectively DoS’ing the site by holding down F5 on their profile pages to bump up the “view” count. OProfile suggested that 20% of CPU time, during this incident, went to PHP parsing and lexing source code. I installed eAccelerator and now that time appears to be gone. (And for good measure, I removed profile view counts.)

Growth and Traffic

It’s unclear what our load and traffic really is. I think we’re pretty good though. CPU utilization is usually under 20%, disk I/O is under 10%, and we rarely see more than 30 concurrent requests. We don’t need to invest in a separate MySQL server yet, and we definitely aren’t growing fast enough to expect to need one anytime soon.

Community Involvement

I would like to thank Fyren for his help in looking into these problems, and asherkin for helping beef up our non-existent monitoring. Also, thanks to AzuiSleet, Drunken_Fool, MatthiasVance, and devicenull for sticking around in IRC and answering random questions.

Diagnose, then Solve

Many people, well-intentioned, said “switch to lighttpd, nginx, IIS, use my favorite host, use a VPS, buy more servers,” etc. I got caught up in this too, and fueled it by initially blaming mod_fcgid. It’s really important to make sure a solution has targeted a specific problem, and I had to keep telling myself to keep a cool head. Sometimes I had to slap myself or wait for Fyren to slap me. They all might be good ideas in their own right, but without diagnosing the actual cause of the errors, there was no guarantee of a fix. By careful debugging (albeit in painful, hour-per-day sprees), we got pretty close to the root cause, whereas throwing our infrastructure into a salad bowl would have delayed the inevitable.

Part of the reason our original drive is still at the recovery lab is because someone didn’t diagnose before “solving.” When it failed, a technician (not one of us) ran fsck on the disk instead of doing a hardware diagnostic first. I have no doubt this caused untold extra damage.

Conclusion

tl;dr, the forums should be much faster now, and it shouldn’t error anymore. I’ve got the system set up to e-mail me and dump diagnostics when our load goes way above normal, so I’ll continue monitoring for potential problems.

I would like to sincerely thank the whole AlliedModders community for the successful donation drive. I won’t lie, the past month has been pretty sucky. Despite our failings, the fact that everyone cares enough to see the whole thing keep going is really awesome. Our seven year anniversary was just a few weeks ago. Here’s to an AlliedModders for 2011!

JavaScript Checkers

I’ve been itching to write some JavaScript, so a few days ago I threw together a Checkers game that uses only HTML5 and JS. It’s got a simple AI that terminates after three seconds. The faster your browser can run JavaScript, the smarter the AI will be.

For example, here is Firefox 4 Beta (blue) versus Firefox 3 (red) – it happens that Firefox 4 is roughly 10X faster at this program, and soundly defeated its predecessor:

The algorithm for the AI is UCT, a form of Monte-Carlo Tree Search. The idea is to estimate the likelihood of winning from a position by simulating hundreds or thousands of random games. UCT helps prune the search space by quickly eliminating bad positions.

While writing the source for this, I tried to turn off the part of my brain that said, “Hey! I know Firefox version x.y.z might be slow or fast at feature X.” That turned out to be okay, but it was harder to avoid general knowledge about JavaScript engines. I ended up with three variations:

Fast Checkers, which has manual inlining, manual strength reduction, hardcoded constants, and other gross stuff. Pieces and positions are represented via packed integers.
Slow Checkers, which removes manual inlining, strength reduction, and baked-in constants. Here, the additional overhead is extra memory accesses and function calls.
OO Checkers, which is the same as “slow”, except represents pieces as objects instead of packed integers. Here, an additional overhead is object allocation.

Performance seems uniform across most browsers. Below I’ve recorded the number of games each browser could simulate per second. Higher is better. Note – this chart is totally unscientific, and random simulations are affected by the behavior of Math.random().

	Fast Checkers	Slow Checkers	OO Checkers
Firefox 4 Beta	14035	9018	9100
IE9 PP6	14579	8234	8067
Opera 11 Alpha	13165	8178	8749
Safari 5	12442	8045	8700
Chrome 9 Canary	4160	2060	2343

And – because why not – I ran them on my Droid 2 as well.

	Fast Checkers	Slow Checkers	OO Checkers
Fennec 2b3pre	338	170	220
Android Browser	185	93	114
Opera Mobile	166	112	126

Since I’m pretty bad at web development, and don’t write JavaScript (sans test-cases) nearly as much as I should, this was an amusing experience. I kept making some really dumb mistakes, one repeatedly:

Select All Code:

Game.prototype.player = function () {
    return this.board.player;
}
...
var player = game.player;
if (player == x) { ...

And wondering why “player” showed as a function in the developer console. I probably should have used ES5 getters. A few other language features would have made the experience a little nicer – but nothing so much as real modules. I tried to emulate good encapsulation with closures, but it’s just not the same. And it doesn’t seem like any engine is smart enough yet to propagate constants through closures (which is one difference between the “fast” and “slow” variants).

Using developer tools for the first time was also an interesting experience. Firefox 4 and Chrome can debug code with a reasonable slow-down, but IE9 became over 100X slower; presumably it cannot debug with the JIT yet. I used Firebug until I needed single-stepping (we didn’t have that working in JägerMonkey for Beta 7), and then bounced over to Chrome – both proved really invaluable. I think my days of calling alert() are over.

Anyway, it was fun writing this, and I’m glad that I can write something computationally expensive and have it be fast everywhere. If and when time permits I may try a more stimulating game like Go.

Land Ho, Fast JavaScript!

Firefox just got a whole lot faster.

I’m excited to announce that Mozilla’s new JavaScript engine, JägerMonkey, is now available for testing!

What is JägerMonkey?

JägerMonkey is our new optimizing JIT compiler for JavaScript. It sits underneath our existing JIT, TraceMonkey, which appeared in Firefox 3.5. If you recall from previous posts, TraceMonkey’s job is to optimize loops to very fast machine code. However, not all code has loops, and not all loops can be trace compiled.

JägerMonkey is a general-purpose compiler which converts entire methods to machine code. The goal is to get great baseline performance. When it detects a loop that can be traced, it automatically engages the trace compiler, which makes it even faster. Yes, that’s right: there’s a turbo button inside.

This hybrid approach is designed to use well-established optimization techniques that work everywhere, and combine them with our existing hyper-optimizing engine that handles smaller subsets of code.

Results.

If you’ve been obsessing over Are We Fast Yet? like me, you’ve seen the numbers dive. Want to try it out? Click here to get preview builds of Firefox 4 with our new JavaScript engine. You can play demos like JSNES at a full, glorious 60FPS.

Disclaimer: It’s a preview – we’re still ironing out the rare kinks. Please report bugs or tell us if something’s wrong (or slow!)

Benchmarks.

We’ve been using the SunSpider 0.9.1 and V8-v5 benchmarks to gauge our general progress. SunSpider is a full 2X faster over Firefox 3.6!

Our improvement on the V8-v5 benchmark is even more dramatic – 4X!

Ongoing Work.

The rockin’ doesn’t stop here. Right now we’re polishing off the final pieces to get into the next Firefox 4.0 Beta. At the same time, here are some of the immediate performance works-in-progress:

Function Calls. As discussed previously, this is one of our last big areas of optimization. The first of four major pieces, caching call sequences, was completed this week. The second big chunk, which Luke Wagner has slated for this week, will make arguments and stack frames faster. Brian Hackett, Chris Leary, and Bill McCloskey have more stack frame optimizations as part of the third wave.
Tracer Integration. Deciding when to use the turbo button is pretty hard, but Bill and Dave have been researching it thoroughly. Right now we’re just scratching the surface, and we will have much better heuristics by the end of the month.
Web Optimizations. Community member Jan de Mooij is continually finding demos and real-world tools and improving performance “gotchas” in our engine, like making common arithmetic patterns faster.

Conclusions.

Firefox 4 is seeing dramatic wins over 3.6 and the web is feeling faster. You can try it out now using a JS Engine Preview, or wait for Firefox 4 Beta 6.

Please stay tuned as we approach JägerMonkey end-game for Firefox 4. Dave Mandelin and I will be blogging, and for smaller things, tweeting (his here) progress & technical updates.

JägerMonkey has Crossed the Streams

On July 12th, JägerMonkey officially crossed TraceMonkey on the v8 suite of benchmarks. Yay! It’s not by a lot, but this gap will continue to widen, and it’s an exciting milestone.

A lot’s happened over the past two months. You’ll have to excuse our blogging silence – we actually sprinted and rewrote JägerMonkey from scratch. Sounds crazy, huh? The progress has been great:

AWFY feed, v8-richards

The black line is the new method JIT, and the orange line is the tracing JIT. The original iteration of JägerMonkey (not pictured) was slightly faster than the pink line. We’ve recovered our original performance and more in significantly less time.

What Happened…

In early May, Dave Mandelin blogged about our half-way point. Around the same time, Luke Wagner finished the brunt of a massive overhaul of our value representation. The new scheme, “fat values”, uses a 64-bit encoding on all platforms.

We realized that retooling JägerMonkey would be a ton of work. Armed with the knowledge we’d learned, we brought up a whole new compiler over the next few weeks. By June we were ready to start optimizing again. “Prepare to throw one away”, indeed.

JägerMonkey has gotten a ton of new performance improvements and features since the reboot that were not present in the original compiler:

Local variables can now stay in registers (inside basic blocks).
Constants and type information propagate much better. We also do primitive type inference.
References to global variables and closures are now much faster, using more polymorphic inline caches.
There are many more fast-paths for common use patterns.
Intern Sean Stangl has made math much faster when floating-point numbers are involved – using the benefits of fat values.
Intern Andrew Drake has made our JIT’d code work with debuggers.

What about Tracer Integration?

This is a tough one to answer, and people are really curious! The bad news is we’re pretty curious too – we just don’t know what will happen yet. One thing is sure: if not carefully and properly tuned, the tracer will negatively dominate the method JIT’s performance.

The goal of JägerMonkey is to be as fast or faster than the competition, whether or not tracing is enabled. We have to integrate the two in a way that gives us a competitive edge. We didn’t do this in the first iteration, and it showed on the graphs.

This week I am going to do the simplest possible integration. From there we’ll tune heuristics as we go. Since this tuning can happen at any time, our focus will still be on method JIT performance. Similarly, it will be a while before an integrated line appears on Are We Fast Yet, to avoid distraction from the end goal.

The good news is, the two JITs win on different benchmarks. There will be a good intersection.

What’s Next?

The schedule is tight. Over the next six weeks, we’ll be polishing JägerMonkey in order to land by September 1st. That means the following things need to be done:

Tinderboxes must be green.
Everything in the test suite must JIT, sans oft-maligned features like E4X.
x64 and ARM must have ports.
All large-scale, invasive perf wins must be in place.
Integration with the tracing JIT must work, without degrading method JIT performance.

For more information, and who’s assigned to what, see our Path to Firefox 4 page.

Performance Wins Left

We’re generating pretty good machine code at this point, so our remaining performance wins fall into two categories. The first is driving down the inefficiencies in the SpiderMonkey runtime. The second is identifying places we can eliminate use of the runtime, by generating specialized JIT code.

Perhaps the most important is making function calls fast. Right now we’re seeing JM’s function calls being upwards of 10X slower than the competition. Its problems fall into both categories, and it’s a large project that will take multiple people over the next three months. Luke Wagner and Chris Leary are on the case already.

Lots of people on the JS team are now tackling other areas of runtime performance. Chris Leary has ported WebKit’s regular expression compiler. Brian Hackett and Paul Biggar are measuring and tackling what they find – so far lots of object allocation inefficiencies. Jason Orendorff, Andreas Gal, Gregor Wagner, and Blake Kaplan are working on Compartments (GC performance). Brendan is giving us awesome new changes to object layouts. Intern Alan Pierce is finding and fixing string inefficiencies.

During this home stretch, the JM folks are going to try and blog about progress and milestones much more frequently.

Are We Fast Yet Improvements

Sort of old news, but Michael Clackler got us fancy new hovering perf deltas on arewefastyet.com. wx24 gave us the XHTML compliant layout that looks way better (though, I’ve probably ruined compliance by now).

We’ve also got a makeshift page for individual test breakdowns now. It’s nice to see that JM is beating everyone on at least *one* benchmark (nsieve-bits).

Summit Slides

They’re here. Special thanks to Dave Mandelin for coaching me through this.

Conclusion

Phew! We’ve made a ton of progress, and a ton more is coming in the pipeline. I hope you’ll stay tuned.

Debugging the Impossible

A few days we released SourceMod 1.3 and suddenly got a few reports of crashing when clients disconnect. We couldn’t reproduce it, but a few users claimed they could. We had about ten minidumps to work with (they include a few KB of stack memory and that’s it).

All of the stack traces had a weird code address at the top and nothing else. Here’s two examples:

>	1673a14a()	

EAX = 0ED884B0 EBX = 00000000 ECX = 0EDAC948 EDX = 0EDA6A48 
ESI = 0DFFA95E EDI = 100F0101 EIP = 1673A14A ESP = 0012DA09 
EBP = 00000001 EFL = 00010282

>	0e531992()	

EAX = 0ED884B0 EBX = 00000000 ECX = 0EDAC948 EDX = 0EDA6A4
ESI = 0DFFA95E EDI = 100F0039 EIP = 0E531992 ESP = 0012DA09
EBP = 00000001 EFL = 00010282

These code addresses are weird because the debugger couldn’t map them to any loaded module. So they’re either dynamically generated/exploited, or garbage. What can we do with such little information? Surprisingly, a lot!

The first thing of interest was that ESP, the stack pointer, was not aligned correctly (it should be divisible by 4). This wasn’t the case in all minidumps, but it was a good hint, given that it was in the usual range for Windows main-thread stack addresses. The CPU was probably jumping to garbage memory, and decoded it to something that incremented ESP. So after stripping that extra theoretical increment off ESP, I could inspect its memory in Visual Studio:

ESP dump

Whoa! A string on the stack suggests there was indeed something going on with client disconnects. Also, the value on the top of the stack – 0x0E531992 is very close to 0x0E531990. This was enough to form a wild guess: maybe the stack was misaligned, and the CPU picked the wrong return address to jump to. We can do better though.

The two most interesting binaries in the HL2 process are the engine and server. Let’s look at their address ranges in Visual Studio:

engine.dll   0BAF0000-0C10A000
server.dll   0DEB0000-0E3A4000

Two of the addresses close to the top of ESP fall into these ranges:
0x0012DA08 0e531990 0e531990 015c15f0 1320092c ..S...S.ð.\.,. . 0x0012DA18 0dfd4cda 100f0038 0bc40689 100f0038 ÚLý.8.....Ä.8... 0x0012DA28 0012de60 1320092c 63736944 656e6e6f `Þ..,. .Disconne 0x0012DA38 62207463 73752079 002e7265 00000000 ct by user......

Let’s take the outermost one first, since it may correspond to the stack frame that created that cool looking string. To go further, I like to use IDA – it’s a great disassembler and it has non-commercial and trial versions. With engine.dll loaded in IDA, all the addresses are wrong – luckily IDA can rebase the disassembly (Edit -> Segments -> Rebase Program):

Rebasing in IDA

For some reason you have to add 0x1000 to get the right addresses – I don’t know why. Now I go to Jump -> Jump to Address, and type in 0bc40689, and find this:

.text:0BC40683    lea     edx, [edi-4]
.text:0BC40686    push    edx
.text:0BC40687    call    eax
.text:0BC40689    lea     ecx, [esp+8]

Well, that’s kind of troublesome. IDA can’t tell where an indirect call will go. So I scroll around the function and look for hints, and get lucky. This is right above:

.text:0BC405F9    push    0
.text:0BC405FB    push    offset aPlayer_disconn ; "player_disconnect"
.text:0BC40600    call    edx
.text:0BC40602    mov     esi, eax

Again, this has something to do with player disconnects! Now I really want to know what function this is. Unfortunately Windows DLLS have symbols separated, but Linux binaries have them built-in. So I load the Linux version of the HL2 engine in IDA, and go to Jump -> Jump to Name, and type in aPlayer_disconn. By doing this I’m hoping the Linux binary has the same string, since IDA encodes string constants as unique names.

IDA name search

Here’s where IDA is extremely helpful – I click on aPlayer_disconn, and go to Jump -> Jump to cross reference:

IDA xref menu

And now I get this:

IDA xref menu 2

So now our mystery call stack has CGameClient::Disconnect() in it. Progress! Now I want to know what that virtual call usually goes to. First I find the same call in the Linux binary. The Windows one had this:

.text:0BC4067D    mov     eax, [eax+0A8h]
.text:0BC40683    lea     edx, [edi-4]
.text:0BC40686    push    edx
.text:0BC40687    call    eax
.text:0BC40689    lea     ecx, [esp+8]
.text:0BC4068D    push    ecx             ; char
.text:0BC4068E    push    offset aS_12    ; "%s"
.text:0BC40693    push    edi             ; int
.text:0BC40694    call    sub_BBA2750

The Linux one is much harder to read because of GCC’s (asinine) PIC code, which IDA has trouble figuring out. But the basic pattern I’m trying to mentally match is:

A virtual call that flows from a large virtual offset.
The string “%s”.
A non-virtual call.

In the Linux version, this looks really close:

.text:00102BE4    lea     edi, (aS_3+2 - 343D64h)[ebx]
.text:00102BEA    mov     edx, [eax]
.text:00102BEC    mov     [esp+43Ch+var_438], ebp
.text:00102BF0    mov     [esp], eax
.text:00102BF3    call    dword ptr [edx+0ACh]
.text:00102BF9    mov     esi, [esp+43Ch+var_428]
.text:00102BFD    mov     [esp+43Ch+var_438], edi
.text:00102C01    mov     [esp+43Ch+var_43C], ebp
.text:00102C04    mov     [esp+43Ch+var_434], esi
.text:00102C08    call    __ZN11CBaseClient10DisconnectEPKcz

Unfortunately IDA had trouble decoding the string reference – aS_3 points to “, %s” – but note it adds two bytes, making “%s”. Now I start a Linux server in GDB, and compute the address I want to break on. This is easy by taking the IDA difference and adding it to the function in the process:

(gdb) p _ZN11CGameClient10DisconnectEPKcz $1 = {} 0xb75f4a70 (gdb) p/x 0x00102BF3 - 0x00102A70 $2 = 0x183 (gdb) p $1 + $2 $3 = ( *) 0xb75f4bf3 (gdb) b *0xb75f4bf3 Breakpoint 2 at 0xb75f4bf3

Now I add and kick a bot (bold red text is my input):

(gdb) c Continuing. sv_cheats 1 bot Cannot verify load for invalid steam ID [A:1:0:1] kickid 2 Breakpoint 2, 0xb75f4bf3 in CGameClient::Disconnect () from /home/builds/common/srcds/orangebox/bin/engine_i486.so (gdb)

Yay, it hit! Now I step into the assembly to see where this goes:

(gdb) display/i $pc 2: x/i $pc 0xb75f6bf3 <_ZN11CGameClient10DisconnectEPKcz+387>: call DWORD PTR [edx+0xac] (gdb) si 0xb75edf00 in CGameServer::RemoveClientFromGame () from /home/builds/common/srcds/orangebox/bin/engine_i486.so

Now back in IDA, I go to Jump -> Jump to function and search for RemoveClientFromGame. Of interest in this function is the following code:

.text:000F9F6A     mov     ecx, ds:(serverGameEnts_ptr - 343D64h)[ebx]
...
.text:000F9F81     call    dword ptr [ecx+10h]

Since we have the symbol info, here I don’t have to do extensive guess work – this is almost definitely an IServerGameEnts pointer, a singleton interface the engine uses to feed information back to the actual game. Here it’s calling the offset for IServerGameEnts::FreeContainingEntity. Could this be the other address we saw on the stack, 0DFD4CDA? It seems likely, given that there is one pushed value in between, and the disassembly has one push as well. Let’s compare the target functions in source and binary. From loading server.dll in IDA:

.text:0DFD4CD0    mov         eax,dword ptr [esp+4] 
.text:0DFD4CD4    push        eax  
.text:0DFD4CD5    call        sub_0DFFA920 
.text:0DFD4CDA    pop         ecx  
.text:0DFD4CDB    retn         4

The same function from hl2sdk/game/server/gameinterface.cpp:

Select All Code:

void CServerGameEnts::FreeContainingEntity( edict_t *e )
{
        ::FreeContainingEntity(e);
}

These seem really similar. So it seems like we might be crashing in ::FreeContainingEntity(), at a point where the stack depth is around 4 values. But looking at this function, it’s not really clear what could be happening:

Select All Code:

void FreeContainingEntity( edict_t *ed )
{
...
        CBaseEntity *ent = GetContainingEntity( ed );
        if ( ent )
        {
                ed->SetEdict( NULL, false );
                CBaseEntity::PhysicsRemoveTouchedList( ent );
                CBaseEntity::PhysicsRemoveGroundList( ent );
                UTIL_RemoveImmediate( ent );
        }
...
}

In HL2 parlance, UTIL_RemoveImmediate() means “immediately destroy and deallocate this entity”. The point at which the stack depth becomes 4 is around this call – but not exactly. At this point I gave up for the night and asked the user if he had any more information. But from this, I had the following guesses:

Call stack that looked something like:

(?+0) server.dll!FreeContainingEntity()
(?+1) server.dll!CServerGameEnts::FreeContainingEntity()
(?+2) engine.dll!CGameServer::RemoveClientFromGame()
(?+3) engine.dll!CGameClient::Disconnect()

The CPU jumped to an invalid address somewhere.
The invalid address may be from a return instruction and a misaligned stack.

Amazingly, he posted a working test case that involved adding and kicking a bot on a specific game. I tried this game, and managed to reproduce it! I narrowed the problem down to a minimal set of “extra junk running”, and from there SM developer Fyren noticed the problem pretty quickly.

SourceMod overwrites virtual table slots to “detour” function calls in between the engine and whatever game is running (in this case, Day of Defeat:Source). The slot number can differ between games, so it is specified in a configuration file. One of these configuration entries was missing, and SourceMod was accidentally overwriting the zeroth vtable entiry – the destructor! Now the sequence of events is clear:

Engine tells the game that a client is disconnecting.
Game destroys the client object.
C++ calls the virtual destructor, but the vtable entry points to a SourceMod function.
The Windows ABI uses a callee-pop convention on member functions. The virtual destructor should be popping no parameters, but the SourceMod function pops two, since the conventions don’t match.
Once SourceMod’s replacement function returns, the CPU tries to load the return address, and gets garbage.

Phew.

Given that the patch that added this code baked for months, it’s amazing it slipped into a release. And for the record, it had two peer reviews. But there you have it. Sometimes it’s possible to draw blood from a minidump stone – but without a test case it’s unlikely we could have gotten this as quickly.

OS X and Windows Perspective, Updated

A year ago I started using OS X, and gave a bunch of thoughts on the experience. A year later, I’m using it exclusively. I’m sure you could pull up any number of UI mistakes in OS X – there are many. Perhaps I’ve adjusted or gotten used to them. When I need Windows, I usually fire up VMware and run XP.

A few days ago I installed Windows 7 on a spare computer. The experience was horrifying. It was like putting my head in a clamp that has complimentary electroshock therapy. What surprises me most is that I used to be a die-hard Windows fanboy. I’d have nasty arguments with mac users. I’d love touting MSDN and the WinAPI. I can’t even try to muster up this enthusiasm anymore. The Windows 7 (and Vista) UI drains my soul.

The Windows 7 installer kept freezing at two points in the process – before copying or disk formatting or anything! I waited about ten minutes before rebooting. Rinse, lather repeat. Finally someone told me I had to just wait it out, despite there being no I/O activity or visual feedback. Sure enough, after about 15-20 minutes of waiting it proceeded.

After starting up, the Aero UI was… ugly. Everything looked about a dozen pixels too thick. The glass effect looked like my UI had beer goggles, made worse by the way-too-bright default blue. So I turned border padding really low, turned off transparency, and made the border colors as solid as they could get.

Delays in responsiveness make the UI seem slow or broken, or like my computer isn’t fast enough. What baffles me is why Windows has arbitrary delays in the UI. Two examples.

The Start Menu has this right-hand pane that lists things like “Documents”, “Downloads”, “Computer”, etc. As you hover over these, a respective icon fades into view. But there’s a very noticeable delay… it feels around just under a full a second, until the fade-in begins. So your mouse can sit on the “Control Panel” text but still have the icon for “Run”.

Minimizing and maximizing windows also has noticeable delay. There’s some kind of animation, but it’s much slower than OS X. It’s distracting and slows down workflow. I went to “Advanced Performance Settings” which requires administrative access. I knew that changing the animations is a setting that reverts after reboot, but I gave it a shot. I unchecked the feature, clicked OK – and suddenly all my UI settings reverted. I was back to hideous blue, transparent Aero. Ugh. So I fixed my UI prefs again.

Next, I installed a secondary drive to copy old files. The drive showed up as “E:”, so I navigated to “E:\Users\dvander”. It asked for Administrative credentials. I typed in my password, and hit okay… but nothing happened. No response, but the drive started churning furiously. I tried again, but nothing happened. Now I was getting worried – I knew the drive had bad sectors. Was it finding more? Was it creating more? What was all this activity? Why couldn’t I access my folder?

I launched Explorer as an Administrator, but no luck. It still asked me for admin credentials, and still did nothing. I launched cmd.exe, and navigated to “C:\Users\dvander”. Surprise! I could read it fine from there. So what was going on with Explorer?

Then I noticed the path bar in Explorer was slowly filling up with green stuff. Was it Indexing? I Googled “disable indexing” and found this site. I still don’t understand the UI here. Try it out. There’s some sort of two-pane list of mutually exclusive terms, but a selection carries across both. You can select something and hit “Modify” but it seems like the “Modify” box has nothing to do with what you selected. I’m sure this is way more flexible than Apple’s Spotlight, if I could only understand it. It looks something like this (image from article):

I took a guess that “Users” was indexing everything in E:\Users, so I removed that and hit OK. Instantly, all that churning stopped, and Explorer brought up my folder!

Explorer was blocking the entire user interface on an automatic, massive, recursive indexing operation over all my files. The only feedback was an authentication dialog for something I had access to, that silently failed anyway.

With that solved, I started actually using things. The taskbar was too big, older Windows apps have icons that don’t scale properly. Hell, most actual Microsoft apps don’t. So I made it smaller, only to find that the circular “Start” button clips into the working area of a window. This did not happen on Vista. So I made the taskbar auto-hide (which still leaves a tiny sliver about 1-2 pixels thick).

Okay… I’ve been ranting a while now. Two more things. What’s with Windows 7 file dialogs?

When this thing pops up, the first thing I do is mouse wheel to get to the file I want. Oh, sorry, I meant I used to do that. In Windows 7, when this dialog appears, the mouse wheel does absolutely nothing. First you have to click inside the file list. Great, extra movement. This might even be a regression but I forget what XP did. Of all the Windows 7 idiosyncrasies, this one hits me the most frequently, and it’s really bothersome.

Finally, what’s with “Program Files (x86)” versus “Program Files”? As if spaces were bad enough in such an important path (thank God “Documents and Settings” is gone), now we have parenthesis? Why is there even a difference in the first place? It seems like some x64 stuff goes into the x86 folder, and vice versa, so what’s the point? Why isn’t this called “Programs”?

Phew. If only OS X could run Valve games.

Mystery Bail Theater

Sometimes worth reading

Category Archives: Articles

The Evolution of Building at AlliedModders

Making a Card Game Prototype

Inheritance Without Types

On Indonesia and Being Justin Bieber

Debugging 500 Errors

JavaScript Checkers

Land Ho, Fast JavaScript!

What is JägerMonkey?

Results.

Benchmarks.

Ongoing Work.

Conclusions.

JägerMonkey has Crossed the Streams

Debugging the Impossible

OS X and Windows Perspective, Updated