SourceMod JIT Changes

A few months ago I made some major changes to the SourceMod JIT, the first major changes since the thing was originally written two years ago.

Pawn suffers from many problems. One of my (least) favorites is the lack of separation between a runtime and a context. Some language implementations don’t have this separation either: Python, notably, has a global runtime and, as far as I can tell, no notion of a context. SpiderMonkey abstracts both. Lua seems to have both but combines them into one structure.

Pawn historically shoved both a runtime (VM code+heap) and a context (thread stack+locals) into the same structure, and SourcePawn originally did the same. The problem was that once you created this structure, it was immutable: you couldn’t add code or data. Sharing between scripts meant jumping between runtimes, which requires expensive marshaling. Giving each plugin its own runtime and context is a waste in every sense of the word.

Unfortunately the design of the VM doesn’t really let us fix that. References to memory are not typed, and references to global memory can’t be easily deduced at verification time. This makes it impossible to put multiple scripts into one structure with a common heap. For example:

const.pri 50    ; PRI = 50
load.i          ; PRI = [DAT + PRI]

This is a load from global data at address BASE+50. SourcePawn is a straight-line JIT, and some form of data flow analysis would be needed to catch and relocate these references. Although we could change the compiler, we have to be backwards compatible with old bytecode.

So, stuck with this crummy design, I went on a spree of changes to try and clean things up. The first was separating the ideas of a context and a runtime – although runtimes are still per-plugin, the invocation model is now centered around Functions and Contexts. That is, the API for calling a function used to require a runtime and a code address; it now requires a function object and a thread context. The API looks a whole lot cleaner, but it’s just lipstick on a pig – internally, contexts and runtimes are still tied together because of Pawn’s limitations.
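
To make the shape of that change concrete, here is a rough C++ sketch of the new invocation model. The interface and method names are illustrative rather than copied from the SourceMod headers; the point is just that callers now deal with a function object and an explicit context instead of a runtime and a raw code address.

// Sketch only: illustrative names, not the real SourcePawn interfaces.
#include <cstdint>

typedef int32_t cell_t;

class IPluginContext;        // per-invocation state: stack, heap, error info

class IPluginFunction
{
public:
    virtual ~IPluginFunction() {}
    virtual void PushCell(cell_t value) = 0;
    virtual int Execute(IPluginContext *ctx, cell_t *result) = 0;
};

// Old style:  runtime->CallAt(code_address, params, num_params);
// New style:
//   IPluginFunction *fn = runtime->GetFunctionByName("OnGameFrame");
//   cell_t rval;
//   fn->PushCell(42);
//   fn->Execute(ctx, &rval);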

As part of those changes, the implementations for contexts and runtimes moved into the JIT library. Previously they were split: the JIT lived half in its own library and half in SourceMod. SourcePawn is officially standalone now. For testing I was even able to hack up a shell app that loads plugins using just the DLL and headers from SourceMod snapshots.

The third big change was to move to whole-method incremental compilation. Previously the entire code blob of a plugin would get JIT’d on load. Now we only JIT functions once they’ve been run for the first time. This speeds up load time noticeably; compilation time becomes spread out, and the hot paths (such as OnGameFrame) are hit quickly, so gameplay is unlikely to be interrupted. It’s faster to compile a smaller chunk of code anyway, especially since SourcePawn does little to no JIT-time analysis.

Functions that call other functions (via the CALL opcode) are special. If the target function is already compiled, a direct call is generated. If the function is not already compiled then a special thunk is generated instead. If the thunk is ever executed it checks to see if the function has been compiled yet. If it hasn’t, it compiles it. The thunk completes by replacing itself with a direct call to the target function.
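
A tiny C++ sketch of the idea behind those thunks, using a lookup table of call slots instead of patched machine code (all names here are invented; the real JIT rewrites the x86 call site itself):

// Illustrative sketch of lazy compilation through self-patching call slots.
#include <cstdio>
#include <cstdint>
#include <unordered_map>

typedef void (*CompiledFn)();

static void compiled_body() { std::puts("running native code"); }

// Stand-in for the real compiler: translate the function at this p-code
// offset and return a pointer to the generated native code.
static CompiledFn CompileFunction(uint32_t pcode_offset)
{
    std::printf("compiling function at p-code offset %u\n", pcode_offset);
    return compiled_body;
}

// Call sites go through this slot table. The first call lands in the thunk
// path, compiles the target, and overwrites the slot so subsequent calls go
// directly to native code.
static std::unordered_map<uint32_t, CompiledFn> g_slots;

static void CallLazily(uint32_t pcode_offset)
{
    CompiledFn &slot = g_slots[pcode_offset];
    if (!slot)
        slot = CompileFunction(pcode_offset);  // first call: JIT the target
    slot();                                    // afterwards: a plain direct call
}

int main()
{
    CallLazily(0x40);  // compiles, then runs
    CallLazily(0x40);  // runs immediately
}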

The last big change was a rewrite of our debugging implementation. The previous concept (lifted from AMX Mod X) was pretty bad. Pawn emits BREAK opcodes throughout methods to indicate line changes. The JIT would change these BREAK opcodes into a function call called the “debug hook.” The debug hook would look for changes to the frame pointer to deduce whether the call stack was entering or leaving a function. This was really expensive so you had to explicitly enable debug mode to get error diagnostics.

The new implementation emits a single instruction for each BREAK opcode:

mov [save_pc], pc

For the CALL opcode, the pc (program counter) is pushed onto a separate stack (the call stack). When a CALL finishes, the top value is popped. If a function throws an error, the complete stack trace can be recovered from a combination of save_pc and the call stack. The pc can be used to find the line, file, and function in the debug info. This is fast enough that a plugin no longer has to be re-JIT’d in a special debug mode to get error diagnostics.
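
A rough sketch of how an error handler can walk that state to print a backtrace (the structure and function names are invented for illustration):

// Sketch of stack-trace recovery from save_pc plus the pc call stack.
#include <cstdint>
#include <cstdio>
#include <vector>

struct DebugInfo
{
    // Maps a p-code address back to file/function/line via the debug tables.
    void Describe(uint32_t pc, const char **file, const char **func, int *line) const
    {
        // Placeholder only: the real lookup reads the plugin's debug info.
        *file = "plugin.sp"; *func = "unknown"; *line = (int)pc;
    }
};

struct TracedContext
{
    uint32_t save_pc;                 // pc of the instruction currently executing
    std::vector<uint32_t> call_stack; // pc of each CALL still on the stack
};

void PrintBacktrace(const TracedContext &ctx, const DebugInfo &dbg)
{
    const char *file, *func;
    int line;

    // The innermost frame comes from save_pc...
    dbg.Describe(ctx.save_pc, &file, &func, &line);
    std::printf("  [0] %s::%s line %d\n", file, func, line);

    // ...and each outer frame from the saved CALL pcs, newest first.
    for (size_t i = ctx.call_stack.size(); i > 0; i--)
    {
        dbg.Describe(ctx.call_stack[i - 1], &file, &func, &line);
        std::printf("  [%zu] %s::%s line %d\n",
                    ctx.call_stack.size() - i + 1, file, func, line);
    }
}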

So SourcePawn is a little more mature now. It’s got an isolated runtime, it incrementally JITs, and the design (on the outside) is a lot cleaner. I wouldn’t recommend using it in other projects yet, but it’s at a point where if we wanted to take it further we could do so without more massive rewrites.

Dude, that’s Gangsta – Driving across the USA

I apologize for the lack of updates lately. Work is busy! Please enjoy (or don’t) this boring, non-coding writeup. A big article about recent work with trace compilation is coming up soon, I promise!


Back in May I drove across the country – from Rhode Island to California – with my friend Dan. I’m not sure how he managed (or why he even agreed) to put up with me through this, especially when our only entertainment was two Killers’ CDs, a Lewis Black CD, and some Family Guy. Our destination was San Jose where twisty lives (a friend from WPI).

The plan was simple: We’d drive out, stopping at random hotels when we got tired, and hit CA within four days. Dan would fly back after we relaxed for a bit, and I’d stick around for my new job.

As we drove across the country I took mental notes about the various states we went through. For all of these states we pretty much stayed entirely on I-90 W. It was a strange road. It can’t quite decide what its name is. Near the start of the mid-west it becomes “I-90/I-80” and then just “I-80.”

Along this highway, up until Illinois or so, “rest stops” were big plazas with restaurants and tourist information. This fanciness stopped fairly quickly; as we pushed further west, rest stops became so far apart and so dilapidated (if not entirely closed) that we gave up on using them.

All of our experiences were based on that highway; in other words, these comments reflect a rather small and fleeting vignette of each state.

Day 0
New York: We completely avoided any cities and mostly only saw trees and hilly regions. Our stop for the night (we got a late start) was Weedsport, NY at a Days Inn.

Distance: 333.7 miles (333.7 miles total).

Day 1
Pennsylvania: We cut through PA pretty quickly. If I recall correctly it was overcast and maybe raining. It seemed to be mostly farmland in the area. We drove straight through Erie and didn’t even visit sawce.

Ohio: This is our least favorite state, as it’s the only one where we got a speeding ticket (for 87mph; I think our record to that point was 97). It was the last state with a speed limit of 65mph. Going through Cleveland was a pain as the traffic was bad, and I also don’t recall any other part of the highway going through such an annoying area.

After Ohio we used cruise control the entire way.

Indiana: I can’t recall anything particularly interesting about this state. The speed limit got raised to 70mph or so.

Illinois: The last state with a decent rest stop, sadly at the very beginning. If I recall correctly the speed limit was down to 65mph again. We stopped in Morris, IL overnight at a Holiday Inn.

Distance: 687.7 miles (1021.4 miles total).

Day 2
Iowa: Long and boring drive with endless farmland. No crops were actually growing yet so there was quite literally nothing to see except for farm structures/equipment and cows. The speed limit was bumped back up to 70mph here.

Nebraska: The first state that was really, really long. Luckily the speed limit was bumped to 75mph (effectively 80, though we played it safe). At this point and until Nevada, most other vehicles on the road were trucks. Amazingly enough, a blizzard approached and I-80 was shut down. We narrowly missed it, though a good deal of the night driving was through heavy rain and winds and pretty unsafe. Surprising for early May.

Nebraska, like Iowa, was very boring. An endless day sky combined with increasingly little to see was tiring, though there were still frequent towns and stop points. It was a very weird experience to drive hundreds of miles and not feel like you’ve moved. Everything was literally so flat from there on that we had no frame of reference for where we were. Talk about feeling like a speck.

Nebraska was the end of Dunkin’ Donuts for Dan. Surprisingly, NorCal is pretty lacking in equivalent drive-thru breakfast/coffee shops.

The area was mostly farms. Gas was cheapest in NE, with the higher quality gas being less expensive for some reason (at around $3.35, compared to RI’s $3.70 and CA’s upwards of $4 — this was in May, again). We stopped in Kearney, NE overnight at a Days Inn.

Distance: 635 miles (1656 miles total)

Day 3
Wyoming: This state was endless. Unfortunately the real interests (Yellowstone National Park and whatnot) are up north, and I-80 is down south. I have to say Wyoming was the weirdest state of all and Dan would probably agree.

The whole state felt off-kilter from the start, like we had entered the twilight zone. I can’t really explain it. There was tumbleweed on the highway and the scenery felt dated. After we got through Cheyenne things got weirder. Highways turned into very sketchy roads clinging to rocky structures full of construction.

We stopped at a Pizza Hut (hey, I love their cheese-crust pizza!) which was a mistake. It was full of loud kids. The table next to us was occupied by a strange family. Four really, really, morbidly obese children with an equally corpulent father. The mother seemed really thin. One of the kids, a girl, went up to the salad bar and came back with a plate full of cookies. The dad kept yelling at her, “stop eating that plate of cookies!” She didn’t. Interesting parenting.

After escaping that we went to get gas, and Dan freaked out because the gas was all 85 octane. My Toyota Corolla takes 87. He scoffed at the vendors “ripping people off,” but later we read on Wikipedia that high elevation (such as in Wyoming) means you can use 85-octane gas without the “knocking” effect. Thank you, Wikipedia.

The rest of Wyoming was very, very, sparse. The lack of civilization was amazing. Every 40 miles or so we saw a trailer camp and maybe a small industrial rig that looked like something out of Half-Life 2. There were occasional billboards. They were always for gambling, porn, or fireworks. Even gas stations and random convenience stores had slot machines. Weird.

The most amusing spectacle was an old school bus dumped along the side of the highway. It had a giant banner strapped to it for “ADULT VIDEOS.” Someone had decided that an abandoned school bus made a good porn billboard. We henceforth dubbed this the “porn bus.”

All that aside, it had some very nice views. Mountains were viewable from a good portion of the highway (Rockies, I think?) and there was a high-elevation rest stop that made for some decent pictures.

Utah: Even having started during the day we didn’t get into Utah until night, since Nebraska and Wyoming are endless. There’s not much to see at night and we tried to speed through most of Utah, missing any sights such as the Great Salt Lake. Unfortunately it was around 1:30AM by the time we approached the Utah border. I was already dead tired and hopped up on Monster (which after two cans makes you feel hung over). The nearest city was another 80-100 miles away so we couldn’t claim conquering Utah in its entirety — we stopped at a Days Inn in Wendover, Utah overnight. It was a mere two miles from entering Nevada.

Distance: About 870 miles (2526 miles total)

Day 4
Nevada: Border to border dust and desert.

The desert was this weird amalgam of completely foreign sand compositions. For example, bright white patches of sand (or sometimes even seas of the stuff). Sometimes it was hard to tell whether it was giant pools of liquid. I can’t really explain it. Not terribly interesting, just strange.

Along the roadside a common phenomenon was small black rocks being arranged into letters/words. This occurred randomly throughout the entire stretch of Nevada’s I-80, often sprinkled throughout the sparse vegetation. I regret not stopping for pictures because I can’t recall any of the phrases now. I just have to wonder:

Who’s been writing things with rocks on the roadside for hundreds of miles? We could go thirty minutes and not see a single car or sign of life, but rock graffiti would be there. Judging by a google query, I’m not crazy. Perhaps we should have stopped and added our own — AlliedModders!

It was very surreal. The worst was exiting Nevada (near Reno). We hit some combination of a dust and rain storm, which meant my car got CAKED in this thick battering of mud. All the cars driving out of the storm with us had this layer of crap. I didn’t get my car washed for a few months, mostly to keep this “battle scar” as a souvenir.

California
As we approached CA we were stopped by some sort of customs booth asking if we had any foreign fruits or something. The elevation started getting pretty high after that (7k feet or so). We tiredly cheered seeing the “Welcome to California” sign. The driving instantly changed at this point. CA drivers are crazy. Apparently we were driving too slow from the start, and a guy on a motorcycle wearing a skull mask gave us the middle finger as he sped by.

Crawling down the high elevation was interesting as the degree of slope on some roads was high. There were long stretches of gravel on the sides of roads, which Dan told me were for trucks — if their brakes failed and they plummeted down the roads, they could aim for these gravel stretches and sink in. And then probably explode.

We stopped at a random point at Lake Tahoe to stretch for a bit. It was evening though and the views were getting cloudy. We finally got off I-80 (a big milestone!) and landed on I-5, and then a few hours later I-680. We reached our destination in San Jose at 10:45PM.

Distance: About 700 miles. The grand total ended up being around 3,300 or so.

Conclusion
Would I do it again? You betcha. I don’t get out much so this was an adventure for me. When I do it again I will change some things:

  • Leave more time. Rushing makes me testy.
  • Go through areas where things are worth seeing. Who wants to drive through a thousand miles of corn? The CA coastline is amazing (my youngest brother drove it with my dad) and I would like to see it.
  • Bring along more entertainment. I listened to those Killers CDs about a hundred times each.
  • Stop for pictures when I see a porn bus.
  • Better pacing so the boring driving gets done at night.
  • Stock up on energy drinks beforehand. In general, keep a cooler with food — fast food gets old and it’s the only thing in many areas.

When twisty’s neighbor saw my car parked outside, he said “Rhode Island? How did that get here?” “Oh, I drove.” “You drove the entire way?” “Yep!” “Dude, that’s gangsta.”

SourceMod Project Changes

Over the past three weeks we’ve made some big changes to SourceMod in terms of managing the project.

Buildbot
I have pretty much nothing but praise for this tool. It’s easy to set up and configure. In about a day I had replaced SourceMod’s terrible and messy Perl scripts. And it looks very pretty.

We have a separate build server with slaves for Linux and Windows (Windows runs under VMware). The master lives on the web server. The version control system (was Subversion, is now Mercurial) notifies the master of source code changes, and the slaves build SourceMod and push snapshots back to the web server. Buildbot also manages the symbol server.

Our scripts for buildbot live in tools/buildbot under the source tree for all who are curious.

Mercurial
I’ve come to love Mercurial, despite its idiosyncrasies, from using it at work constantly. The distributed nature makes everything much easier. I can keep many copies of the repository around on different computers and share changesets in between them and other people. I don’t need internet access to push changes and I can juggle many patches at once.

We started to notice a problem in SourceMod. We had two branches: 1.0, and 1.1. When we pushed bug fixes to 1.0, there was no way to share these back to 1.1 while maintaining change history. Mercurial inherently solves this. Subversion 1.5 supposedly has merge tracking, but by the time it was out our minds were already made up. Mercurial’s distributed nature is awesome.

I haven’t learned how to use patch queues (mq) yet, but I’m being encouraged to at work. Our Mercurial repositories are here and I’ve written a tutorial (akin to our CVS and Subversion tutorials) here.

Bugzilla
I originally chose Flyspray over Bugzilla because it looked nicer and I could understand the permissions system. Eventually I came to realize that Flyspray was easy to outgrow, and I came to like Bugzilla more and more from using it at work.

Flyspray is good, and probably suffices for small projects. But it’s buggy. Our install was breaking for some reason: we couldn’t search for users anymore (every user would return “not found”). Marking a project as inactive would still let normal users edit bugs. There seemed to be no way to have group-specific permissions, which meant we couldn’t have private or internal projects. The UI didn’t work on mobile devices (probably the fault of a poor AJAX implementation). And you couldn’t get back to a bug after posting a comment, because the confirmation page had no link back to it.

Bugzilla has a much more complex UI but there are a few things that sealed the deal. Aside from addressing the problems Flyspray had, its authentication system was a lot more flexible internally. I was able to write a simple authentication plugin for vBulletin. The ability to watch users makes automated CC’ing much more flexible. Now anyone can listen for all SourceMod bugs, whereas previously only the project owner could.

The attachment system is a huge win. Attachments can be obsoleted and they all live on the same section of the screen, near the top, rather than sprinkled throughout comments. It can show diffs for patches which is really nice. You can request peer-reviews on patches which is something Mozilla does extensively.

If anyone’s looking to convert from Flyspray to Bugzilla, you can have my hacked-up Perl script. It uses an intermediate table called “mvtab” (type varchar, old_id int, new_id int) as a temporary work table. You need to create this yourself. You also need to manually import products and components (components must have the same names as categories). It’s really a mess but it got the job done, so play with it at your own risk.

flyspray_to_bugzilla.pl
bz_import_mail_settings.pl (tack-on, since the original import didn’t get e-mail settings right)

And here’s the Bugzilla vBulletin binding: vBulletin.pm (goes in Bugzilla/Auth/Verify). You may need to change config files somewhere to use it.

Can you do this to your VP?

Last week, us Mozilla interns held our annual intern BBQ. I guess one of the traditions is to throw someone in the pool.

This year they chose Mike Schroepfer (also known as “Schrep”). Schrep is the VP of Engineering, and about five seconds with him is enough to know he means business. He talks fast and asks deep technical questions about the projects he oversees. It’s an interesting combination: he’s very sharp on a technical level while also being a high-level manager.

So when five+ interns tried to gang up on Schrep (who, as always, was dressed as neat as a pin), people on the sidelines seemed a bit apprehensive about how that’d play out. Lo and behold, not well.

In the process of trying to grab him, they accidentally tore his shirt. Then as they got close to the pool, Asa Dotzler warned them that Schrep’s shoes were too nice to get wet, so the interns had to pull them off. After that they gave up and dropped him to the ground by the pool.

Schrep didn’t look terribly pleased but he took it all in stride, even making a joke about it at the all-hands company meeting on Monday: “For the record,” he said, “Despite it being five-on-one, you guys still couldn’t take me down!”

I have a feeling that if I end up at another company, throwing the VP into pools won’t be an employee perk. Though, the only people I have been known to throw into pools are my brothers (who, as a trait of being younger than I, eternally deserve it).

Valve’s Changes to the Iterative Model

Today Mike Beltzner (Firefox Director) gave a very interesting presentation on software development models and how they apply and evolve at Mozilla. When he put up a slide about the iterative model, it included a picture from Wikipedia that looked like:

“Plan -> Design -> Implement -> Deploy -> Test”

Instead of:

“Plan -> Design -> Implement -> Test -> Deploy”

This elicited a few chuckles, since obviously no one tests after deployment. I almost shouted out, “Well, Valve does that.”

But I didn’t, since that’d be just wrong. Valve doesn’t test at all.

No, I do not consider that comment flamebait. In the past few weeks patches have had to go out twice because of things like servers crashing on load or massive memory leaks.

Recent Replacements

At AlliedModders we’ve been working on slowly replacing our aging “infrastructure” (big word for our little operation). In mid-2006 we got a Pentium 4 Xeon box at ThePlanet and the only operating system available was RHEL3.

Anyone who knows me knows I despise RHEL; yum is a huge pain. It doesn’t handle dependencies or core package upgrades well. It’s slow and its search is cumbersome. In order to not risk breaking the system we built everything from source and tried to completely avoid RPM packages. This meant a lot of cruft built up.

ThePlanet used to be good, but it seems their support quality has waned as of late (probably the result of the EV1 takeover). On two occasions our server simply rebooted with no explanation. They couldn’t provide answers as to why. In one of those cases, they simply closed the ticket saying “If you want to know, file another ticket.” Well, obviously, I wanted to know in the ticket already created for the incident.

Their control panel was decent but some information was just left blank — like serial access, which other big companies (like 1and1) provide. There was an incident where, after 430 days of uptime, I got a call at midnight CST from what sounded like a not-all-there ThePlanet technician. The conversation went like this:

TP: “I just got a ping notice that your server is down.”
Me: “Uh, well that’s strange. I didn’t turn it off. Why is it down?”
TP: “I dunno.”
(The delivery of this stupid statement dumbfounded me, so it took me a few seconds to recover.)
Me: “Well… can you find out?”
TP: “I guess… I can reboot it.”
Me: “That would be nice.”

After fifteen minutes it was still down, so I called the number they had called me from. This time they refused to talk to me since I couldn’t authorize myself (my customer information was on the server itself, a stupid mistake I won’t make again). I found this odd because they had called me fifteen minutes ago and taken a reboot order without even verifying my identity.

Eventually it got rebooted, but a few weeks later there was an explosion at their datacenter which took 9,000 servers offline (google “theplanet explosion” for coverage of the incident). We weren’t affected but it was the nail in the coffin – a few weeks later we completely moved to SoftLayer. I’d heard good recommendations for them on IRC, and they seem to include a few former ThePlanet employees.

(As an aside – I’m not the only one complaining about ThePlanet. My former employer had/has a large number of machines hosted at ThePlanet and frequently complained about support difficulties.)

Alongside that, we made a few big immediate changes. The first was ditching RedHat for Debian (my favorite distro because of its stability commitment, and the general convenience of aptitude). We also migrated to 64-bit.

The next was ditching the "Cyrus" IMAP server. It seems no one can write an e-mail server that takes less than two days of work to configure properly. Cyrus was belligerent in refusing to consistently accept authentication methods, and its archaic tools had poor documentation. Its configuration files were confusing and using its Sieve implementation required a weird FTP-like client. I'm sure that's nice for a big setup, but for <10 accounts it's a huge pain. So, Cyrus got dumped for Dovecot.

So far I have plenty of praise for Dovecot. The documentation was pretty well written and had extensive use-cases with working examples. The authentication was easy to set up and its Sieve usage was trivial to get working. It even had an example of working with my favorite MTA, Postfix. I was able to convert over my 1.5GB maildir with only one annoyance (it didn't preserve whether I had read a message or not).

The next big steps are moving to Mercurial over Subversion, and perhaps switching to Bugzilla over Flyspray. Someday.

Tamarin Tracing, Intro to Tracing JITs

I’ve been porting Tamarin-Tracing's code generator to AMD64. The idea behind Tamarin-Tracing is that you have an interpreter for some dynamically typed language, or an environment where a fully optimizing JIT would be too expensive.

As the interpreter runs, it will decide to start tracing sequences of code that could be “hot paths” for performance. For example, loops are good candidates. Once tracing begins, the interpreter captures the state of every instruction and emits it as a “trace.” A trace represents the exact sequence that was taken through a particular control path.

A good example could be a method call. For example, “obj.crab()” in languages such as JavaScript or Python requires a dynamic lookup, since the type of “obj” is only known at run-time. You can emit a trace like this:

LET x = TYPEOF(obj)
TRACE:
   IF TYPEOF(obj) != x
      EXIT
   ELSE
      CALL x::crab ON obj

This trace says, “Assume object is type X. If it’s not, we’ll have to recompute it, but if it is, we can optimize this to a direct method call.” Later, the trace will be analyzed, optimized, and then compiled to assembly. Next time the interpreter needs to call the method, it will see that a compiled trace exists. The compiled trace will first check that the type of “obj” matches the path it originally took. If not, it will exit back to the interpreter. This check is called a “guard” and the exit is called a “side exit.” Otherwise, the optimized method call is invoked. A side exit can later turn into another trace; these branches form a “trace tree.”
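
Written as C++ rather than generated machine code, the guard and side exit amount to something like this (the types and layout are invented for illustration):

// Illustration of a guard and side exit in a compiled trace.
#include <cstdio>

enum TypeId { TYPE_CRAB_OBJECT, TYPE_OTHER };

struct Object
{
    TypeId type;
    void (*crab)(Object *self);   // resolved method for the traced type
};

enum TraceResult { TRACE_COMPLETED, SIDE_EXIT };

// The trace was recorded while obj had type TYPE_CRAB_OBJECT, so the compiled
// code guards on that assumption and bails back to the interpreter otherwise.
TraceResult compiled_trace(Object *obj)
{
    if (obj->type != TYPE_CRAB_OBJECT)
        return SIDE_EXIT;         // guard failed: resume in the interpreter

    obj->crab(obj);               // guard held: direct, devirtualized call
    return TRACE_COMPLETED;
}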

Anything can be traced. For example, one side of an “if” branch can be traced. The resulting compiled code would be for the “hot” side of the branch. If the “cold” side is detected, it would jump back to the interpreter and a new trace might branch off. Another case is addition for weakly typed languages. For example, “5 + '3'” is valid in JavaScript, and a trace might optimize numeric conversion paths.

One of the most surprising features is that the compilation granularity is much finer. SourceMod compiles entire scripts to native code (known as “ahead of time” compilation). Sun and Microsoft’s JITs compile methods at a time, and thus compilation is deferred until a method is needed. A tracing JIT, however, is capable of compiling only pieces of code that are deemed as important. It can trace through anything it wants, including method calls, and return control back to the interpreter when the hard work is done.

This is a pretty new concept that seems to be the future of optimizing dynamic languages. JITs for these languages typically hit performance boundaries because the native code must either perform run-time conversion itself or exit back to interpreter functions for resolving types. There is supposedly a LuaJIT in the works for possibly my least favorite language (Lua), and someone on IRC mentioned a tracing JIT in the PyPy project (though I have not looked into it).

Unfortunately, benchmarks have continually shown that Tamarin Tracing is just plain slow. Some people are wondering why we’re even bothering with it. What’s important to recognize is that the speed of the tracer is bound to the speed of the interpreter. Tamarin Tracing is highly experimental, and the interpreter Adobe packaged with it is not optimized. The tracing engine, however, can be easily detached from the interpreter. Since Mozilla already has a reasonably fast interpreter (SpiderMonkey), our new project is to couple the two together, as “TraceMonkey,” so the interpreter can emit traces.

A trace is a set of low-level, abstracted instructions, internally called “LIR.” Each “word” of LIR is 32-bits — an 8-bit opcode and three optional 8-bit operands. For example, encoding a 32-bit immediate value requires an empty LIR_imm32 word and another word for the value. These LIR opcodes are designed to be fairly platform agnostic, but more importantly, they are intrinsically in SSA form. This makes liveness analysis, common sub-expression elimination, and other optimizations much easier.
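
As a rough picture of that layout (the field names are mine, not the ones used in the Tamarin-Tracing source):

// Rough picture of the 32-bit LIR word layout described above.
#include <cstdint>

struct LirWord
{
    uint8_t opcode;    // one of the LIR_* opcodes
    uint8_t operand1;  // operands usually refer back to earlier instructions
    uint8_t operand2;
    uint8_t operand3;
};

static_assert(sizeof(LirWord) == 4, "each LIR word is exactly 32 bits");

// A 32-bit immediate does not fit in the operand bytes, so LIR_imm32 is an
// "empty" word followed by a second word holding the raw value:
//
//   word N:   [ LIR_imm32 | 0 | 0 | 0 ]
//   word N+1: [        0x00000004     ]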

Since TT’s interpreter is not 64-bit safe, I’ve been testing my port by writing programs directly in trace LIR. For my first function I implemented XKCD’s getRandomInt() { return 4; } function. Each label is the word/instruction number.

1: imm64 4       ; immediate value of 4
4: imm64 &addr   ; immediate value: an address passed in from C code
7: sti 1, #0(4)  ; store the value from instruction 1 at offset 0 from the pointer in instruction 4
8: x             ; exit the trace

These traces are compiled backwards. In this example, the epilogue is generated first, and the last instruction to be generated is the first instruction of the prologue. Generation is a single pass and thus code size is not computed beforehand. Executable pages are allocated one at a time. If there is not enough space to encode a full instruction in the current page, a new one is allocated and a JMP is generated to bridge the code.

That’s it for my rough overview. If you’re interested in details of the tracer, including LIR and everything after, I highly recommend Dave Mandelin’s Tamarin Tracing Internals articles (parts 3, 4, and 5). He provides some excellent insight into the optimizations it makes, and I’ve been using the articles as a guide while reading the code.

SourceMod’s JIT Opened, Performance Ideas

Originally, we had a few motivations for keeping SourceMod’s JIT closed source. They are neither here nor there, but we’ve decided to open it. I’d like to talk a bit about what it is and what it isn’t. If you’d like to see the source, it’s in sourcepawn/jit/x86.

A JIT is a technology that, “when it’s needed,” compiles code from one form to another for optimization. For example, the Java JIT compiles Java bytecode to native assembly. There are three levels of granularity in a JIT:

  • Whole Program. The JIT compiles the entire program in one go (actually an “ahead of time” compiler). No granularity.
  • Methods. The JIT compiles functions one at a time as they are needed. Coarse granularity.
  • Tracing. Can compile any path of code anywhere in the program. Fine granularity.

Tracing JITs are new and still seem to be experimental. Traditional JITs, like Microsoft’s and Sun’s, compile functions one at a time. SourceMod’s JIT is a whole program compiler. We chose this route for a few reasons:

  • It’s easy.
  • We’re not concerned about loading/compilation time since Half-Life servers take forever to start anyway.
  • Even so, the cost of computing optimizations on a SourceMod plugin is extremely cheap.
  • There is no performance gain from tracing because Half-Life runs on good processors and all types are statically known.

Why is optimizing Pawn code so easy? It has a simple, two-register design internally. This has a number of really nice implications:

  • The generated patterns are simple. Peephole optimization in the source compiler does a good job at reducing expressions.
  • The two registers that represent machine state can be easily mapped to processor registers, since nearly every processor has two scratch registers.
  • Types are static and optimal code can be emitted.

Therefore SourceMod’s JIT is “dumb” for the most part. It performs a massive peephole “search and replace” of every Pawn bytecode to native machine code. Where it wins is that the assembly is highly handcrafted to the x86 ISA, rather than piped through a processor abstraction layer. faluco put a lot of work into optimizing moves and stores to reduce things like pipeline stalls. A pipeline stall is when one instruction depends on the result of the instruction before it. For example, if x86 sees an address coming up, it tries to pre-compute the final address on the address pipeline. This is why LEA is called a “free” instruction on x86 (the computed result is “already there”). If the address computation depends on a prior instruction, the pipeline stalls.
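
To make “massive peephole search and replace” concrete, the core of such a translator is conceptually one big switch over opcodes that appends a canned x86 sequence for each. This is a heavily simplified sketch of the idea: the opcode names, values, and surrounding structure are invented, and only the x86 byte encodings are real.

// Sketch: one Pawn opcode in, one canned x86 sequence out.
#include <cstdint>
#include <initializer_list>
#include <vector>

enum PawnOp : uint32_t { OP_ZERO_PRI, OP_MOVE_PRI, OP_ADD };

static void emit(std::vector<uint8_t> &code, std::initializer_list<uint8_t> bytes)
{
    code.insert(code.end(), bytes);
}

// PRI maps to EAX and ALT maps to EDX, as described below.
void translate_one(PawnOp op, std::vector<uint8_t> &code)
{
    switch (op)
    {
    case OP_ZERO_PRI:               // PRI = 0
        emit(code, {0x31, 0xC0});   // xor eax, eax
        break;
    case OP_MOVE_PRI:               // PRI = ALT
        emit(code, {0x89, 0xD0});   // mov eax, edx
        break;
    case OP_ADD:                    // PRI = PRI + ALT
        emit(code, {0x01, 0xD0});   // add eax, edx
        break;
    }
}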

SourceMod performs some other more general optimizations. It optimizes for space by emitting short instructions when possible. Certain function intrinsics are implemented in inlined hand-crafted assembly (like the GENARRAY and FILL opcodes). Native dispatch is entirely inlined down to assembly.

Special attention is given to switch cases. If all case values are consecutive, they are optimized down to a jump table rather than a series of if statements. The sequence can start at any number and end at any number, as long as no number is skipped.
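
Expressed in C++ rather than in the emitted assembly, the win from a jump table looks roughly like this (the case handlers are made up for illustration):

// What the jump-table optimization buys: consecutive case values become a
// bounds check plus one indexed, indirect jump instead of a compare chain.
#include <cstdint>

typedef int32_t cell_t;

static cell_t case_10() { return 0; }
static cell_t case_11() { return 1; }
static cell_t case_12() { return 2; }
static cell_t case_13() { return 3; }
static cell_t default_case() { return -1; }

static cell_t dispatch(cell_t value)
{
    // Cases 10..13 are consecutive, so they collapse into a table.
    static cell_t (*const table[4])() = { case_10, case_11, case_12, case_13 };

    if (value < 10 || value > 13)   // one range check instead of a compare chain
        return default_case();
    return table[value - 10]();     // single indirect jump through the table
}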

Since Pawn doesn’t have types, the JIT performs some peephole optimizations on certain natives. For example, if it sees the ‘FloatAdd’ native, it will optimize the code down to FPU instructions. This is a huge bonus because native dispatch is expensive (the VM’s internal state must be cached and then restored). This specialized peephole optimization occurs mainly on float natives.

The JIT maps Pawn’s two registers to EAX and EDX. This ends up being really nice, as they’re the two scratch registers used for lots of important things on x86. For example, a MUL instruction uses both, and many of the ALU instructions have shortened forms for EAX. The unfortunate side effect is that stack spilling can be common, but Sethi-Ullman numbering shows that two registers suffice for simple binary expressions; deeper expression trees just spill to the stack.
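
As a quick aside on that last point, here is a worked Sethi-Ullman example (my own illustration; the JIT does not compute this): a chain like a + (b*c) fits in two registers, while a balanced tree like (a+b)*(c+d) needs three, which is where the spills come from.

// Sethi-Ullman numbering: a leaf needs 1 register; an interior node needs
// max(l, r), or l + 1 when both subtrees need the same amount.
#include <algorithm>
#include <cstdio>

struct Expr
{
    const Expr *left;   // null for a leaf (variable or constant)
    const Expr *right;
};

int registers_needed(const Expr *e)
{
    if (!e->left)                       // leaf
        return 1;
    int l = registers_needed(e->left);
    int r = registers_needed(e->right);
    return (l == r) ? l + 1 : std::max(l, r);
}

int main()
{
    Expr a{}, b{}, c{}, d{};
    Expr bc{&b, &c}, a_plus_bc{&a, &bc};          // a + (b * c)
    Expr ab{&a, &b}, cd{&c, &d}, prod{&ab, &cd};  // (a + b) * (c + d)

    std::printf("a + (b*c) needs %d registers\n", registers_needed(&a_plus_bc));  // 2
    std::printf("(a+b)*(c+d) needs %d registers\n", registers_needed(&prod));     // 3 -> spill
}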

The SourceMod JIT gets the job done. In terms of design considerations, it’s not perfect. Its optimizations are primitive, and it assumes the source compiler will do most of the hard work. But in the end, it fits the bill well. Pawn doesn’t have much complexity, and the scripts are fairly small.

I do have some future plans for the JIT. Some easy optimizations are that more float natives can be optimized away in the peephole pipeline. Similarly, if SSE/SSE2 is detected, faster instructions could be emitted instead.

With a little help from the compiler, we could completely eliminate a large class of useless performance losses in SourceMod. Pawn scripts use “local addresses” that are relative to a base pointer; the JIT could eliminate local addresses completely. This would greatly improve generated code performance and turn LocalTo* calls (which extension writers should hate by now) into nops. There are other implications too – calls from plugin to plugin would become direct calls instead of being slowly marshaled through PushCell() and GetNativeCell() and whatnot.

Lastly, natives are expensive. It takes 28 instructions to jump into native mode. Unfortunately, calls like IsClientConnected() and IsValidEntity() can’t be implemented without native code. A possible solution to this is to introduce a few very specialized “native” opcodes into the JIT. For example, a check for IsClientConnected might look like this in an extended Pawn bytecode:

  proc
  push.s  0xC     ;push the first parameter (the entity index)
  getuseraddr 1   ;offset of &g_Players address in context::user[] array
  vcall 5         ;call function at virtual index 5, GetGamePlayer()
  vcall 2         ;call function at virtual index 2, IsConnected()
  ret

The JIT would then compile this directly into the plugin, as if it were a stock. This would effectively let anyone write very fast natives in an abstracted assembly language, without incurring any native overhead. It is of course a lazy solution based on Pawn’s limitations, but definitely a decent attempt at gaining performance for very little work.

For now, I will likely back away from the other things Pawn lacks (like CSE, liveness analysis, optimizing register usage, etc). The principal reason is that the other proposed optimizations are likely to get more bang for the buck at this point. The fact that the JIT is hardcoded to assume EAX/EDX as the two storage registers means such extensive optimization would warrant a complete redesign for a dynamic register allocator. We wouldn’t be able to justify that until we freed up some of the registers required by Pawn (like the DAT and FRM registers for the local addressing mode, see earlier idea).

Driving in CA is Dangerous

After a few weeks of driving in CA, I’ve seen more accidents than I see in months of driving on the east coast.

I’m not sure what the reason is. There is certainly a lot more traffic, but the roads are bigger to compensate. It seems people are just more reckless. Examples of weird traffic incidents:

  • (Morning) A car is completely stopped in the middle of a freeway, causing a big traffic mess as people try to route around it.
  • (Morning) A car has skidded off the freeway into a steep ditch, becoming engulfed in bushes. Supposedly this car had a person in it and it took police days to notice it, so there were a few police cars and traffic was bad.
  • (Morning) A truck’s doors flew open and the contents scattered everywhere on the freeway. Unbelievably, the owner of the truck was on the side of the road, making mad dashes into the freeway to collect his dropped stuff.
  • (Afternoon) A truck crashed badly on a long, curvy exit ramp. Two lanes of traffic had to be shut down and it took almost half an hour to get off the ramp.
  • (Afternoon) A “generic” accident involving three or four cars held up traffic as we left a movie theater.
  • (Afternoon) On the same trip as above, we saw an accident in progress – a car swerved from the rightmost lane of the freeway, and headed left horizontally into (I’m assuming) the concrete barrier. I didn’t see the whole thing as we were driving in the opposite direction.
  • (Afternoon) On “bike to work day,” a biker was pedaling alongside the freeway. He was on the breakdown lane, but when an exit ramp came up, he simply pedaled into the freeway. This was extremely stupid, and I almost hit the guy trying to merge. It’s not easy to merge when there’s someone on a bicycle in front of you traveling a fraction of your speed.

Then again, maybe it’s just me. The family I’m staying with contends they haven’t seen stuff like this very often, and since a few of the incidents happened with us traveling together, I’ve become a bad luck charm for cars.

I wouldn’t care, except that the frequency of these incidents means I have to build an extra thirty minutes into my driving time. Traffic in San Jose isn’t as bad as Los Angeles, but it’s pretty slow during rush hours.

A Drive Through the Country

As I alluded to in February, I’m going to try and shy away from the gaming scene while I experiment with other fields. That doesn’t mean “leave completely,” but my activity will be hampered. That is partially the reason SourceMod moved to a “rolling release” concept, so we can push out incremental feature updates more easily.

As part of this endeavor, I’ve accepted an internship at Mozilla. I don’t know what I’ll be working on yet, but my primary interest is in language development, so hopefully it will be ECMAScript 4/JavaScript 2/Tamarin. It requires relocation to Mountain View, California.

I’m really looking forward to this, but the next two weeks I am not. I managed to schedule myself into a rather nasty hole. Specifically, finals last until April 29th. My apartment lease ends on April 30th. I said I’d be in California by May 5th. That means I need to finish school, move all of my belongings to storage, and move — in a time span of a week.

What makes this feat particularly amusing is that, against better judgment, I have decided to drive to my destination. This is not a short road trip. Clocking in at 3,100 miles, I can expect to spend at least a few days on the road. I’ve gotten a friend to tag along to make sure I don’t accidentally drive in a direction other than “West.”

While I’m resigned to the fact that I will probably miss my deadline by a few days, it’s both a long journey and a stressful two weeks ahead. As of May 1st, I will probably not be on IRC for a week or so. You probably won’t see any commits from me for at least two weeks though I will try to read the forums and bug reports as I have time. Blog articles will resume on May 5th (I have a rather scathing, two-part overview of the JVM spooled up).

And so…
Mozilla or Bust

Sayonara, bickering with Valve!