Category Archives: Articles

“Official” articles that were pre-written for posting.

A Divine TraceMonkey Bug

I spent the better part of the last week or so on a rather annoying TraceMonkey bug. A few weeks ago I mentioned that we extended multiple type-specialized trees to include specialization on global types. Suddenly a rare but dangerous class of bugs appeared.

The Bug

When a trace exits back to the interpreter, it does so through a guard. These guards contain an “exit typemap,” which tells the interpreter how to take types from the native CPU and box them back into generic containers for the interpreter. The bug was that some global variables were being boxed back as the wrong type. For example, the 32 bits of an integer would be stored in a slot interpreted as a 64-bit double, or an object would be tagged as a string. When the interpreter went to unbox these again, it got garbage. That’s bad. The only way this can happen is if the exit typemap contains wrong type information.

Lazy Specialization

Global variables are tracked lazily. The overarching Trace Monitor keeps track of which global variables have been seen on trace. For example, say we have a type-stable tree, Tree 1. It has no global variables, and thus has empty exit and entry typemaps (for globals). Later, the global variable X is encountered. It’s now being tracked, but Tree 1 doesn’t have a type entry for it.

If Tree 1 wants to execute again, it will lazily update its entry typemap. The exit typemaps on the other hand are fixed and cannot be modified. So now Tree 1 looks like this:

Tree 1 Entry: X = Object
Tree 1 Exit: 

When exiting the tree, we merge the exit and outermost entry types together, giving a complete typemap. More on this later. When entering Tree 1, X is unboxed and boxed as an Object, even if it never gets used. This is because Tree 1 could call into a nested tree that does use X.
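As a sketch of that merge (the object-as-typemap representation and function names here are my own modeling, not TraceMonkey’s internals), exit types take precedence and the entry typemap fills in globals the exit never saw:

```javascript
// Toy model: a typemap maps each tracked global's name to a type tag.
// On exit, the exit typemap is merged over the outermost entry typemap.
function mergeTypemaps(exitTypemap, outermostEntryTypemap) {
  var merged = {};
  // Start from the outermost entry's view of the globals...
  for (var name in outermostEntryTypemap)
    merged[name] = outermostEntryTypemap[name];
  // ...then let the exit's more recent types take precedence.
  for (var name in exitTypemap)
    merged[name] = exitTypemap[name];
  return merged;
}

// Tree 1's exit typemap is empty, so X's type must come from the entry.
var full = mergeTypemaps({}, { X: "Object" });
// full.X is "Object"
```

Note that if the outermost entry typemap carries a wrong type for X, the merged typemap inherits it; that is the heart of the bug.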

Problem #1

Let’s say later we build Tree 2. It’s the same loop as Tree 1, but it’s type-unstable. The typemaps look like this:

Tree 2 Entry: X = String
Tree 2 Exit:  X = Object

TraceMonkey’s multitrees kicks in, and now Tree 2’s exit will go directly to Tree 1’s entry, since their types link together. When Tree 1 exits, we combine the outermost typemap with the exit typemap. In this case, Tree 2’s entry is the outermost typemap, but it contains the wrong type! The type of X is an Object, but now it’s being boxed as a String. Crash.

Note: This is different from normal type instability, because type unstable traces represent a path where a type is actually changed. In this scenario, the type is being mistaken, which is deadly.

Failed Solutions

Solving this was complicated by nested trees. Trees can be deeply nested, and each tree along the way could be missing global types, so it seemed like we needed to recover missing global types incrementally up the call stack. That is:

  1. Start with global types from the innermost exit.
  2. Add missing globals from each tree on the call stack, starting at the deepest, and ending at the outermost tree.

Since the outermost tree is the original tree we started with, it is guaranteed to have the types of every tracked global, so it was the last resort. Getting there took three different implementations as the patch progressed, but the idea was: as we enter trees, push them onto a stack; as we exit trees, pop the stack. If we hit a guard (which exits), immediately reconstruct the full global typemap using that stack of trees. By the time we exit back to the interpreter, a global typemap will have been prepared already.
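In sketch form (a model I’m reconstructing from the description, not the actual patch; trees are plain objects with an `entryTypemap` property), the incremental recovery looked like this:

```javascript
// Toy model of the failed approach: walk the stack of active trees from
// innermost to outermost, filling in globals the exit typemap is missing.
function recoverGlobalTypes(exitTypemap, treeStack) {
  var full = {};
  for (var name in exitTypemap)
    full[name] = exitTypemap[name];
  // treeStack[0] is the innermost tree; the last entry is the outermost
  // tree, which is guaranteed to know about every tracked global.
  for (var i = 0; i < treeStack.length; i++) {
    var entry = treeStack[i].entryTypemap;
    for (var g in entry) {
      if (!(g in full))
        full[g] = entry[g];   // only fill gaps; never override exit types
    }
  }
  return full;
}
```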

Alas, this wasn’t enough, because…

Problem #2

Say we have two trees on the same loop, Tree 1 and Tree 2. Their typemaps are:

Tree 1 Entry: X = Object
Tree 1 Exit:  X = String
Tree 2 Entry: X = String
Tree 2 Exit:  X = String

In this situation, Tree 1’s exit is linked to Tree 2’s entry. Later, a new global appears, Y. We execute Tree 2 with X = String and Y = Object. Then we execute Tree 1 with X = Object and Y = String. Because of lazy specialization, the typemaps now look like this:

Tree 1 Entry: X = Object, Y = String
Tree 1 Exit:  X = String
Tree 2 Entry: X = String, Y = Object
Tree 2 Exit:  X = String

This is problematic, because Tree 1 is still connected to Tree 2, but their typemaps are incompatible! If we run Tree 1, Y will be unboxed as a String and reboxed as an Object, without the underlying type actually changing. The solution to Problem #1 doesn’t fix this.

Solution

Linked trees should never have incompatible typemaps. What counts as a linked tree? Any two trees that are connected via a nested call (nested loops), or any two trees that are connected via type instability (multitrees), are “friendly.”

In pseudocode:

LazilySpecialize(Tree):
  AddMissingGlobalTypes(Tree)
  FOREACH Friend IN FriendlyTrees(Tree)
    IF MissingGlobals(Friend)
      LazilySpecialize(Friend)

Now when a tree exits, it suffices to use only the exit typemap and the typemap of the innermost entry (that is, the tree the exit immediately came from). This neatly solves Problem #1. If any one tree gets a new global, the entire graph of connected trees is updated, solving Problem #2.
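A runnable sketch of this idea (my own modeling; trees are plain objects and “friendly” links an array) shows why the recursion terminates even though friendship is mutual, and how a new global propagates across the whole graph:

```javascript
// Toy model of eager specialization across the graph of friendly trees.
// globalTypes is the trace monitor's current set of tracked globals.
function lazilySpecialize(tree, globalTypes) {
  for (var g in globalTypes) {
    if (!(g in tree.entryTypemap))
      tree.entryTypemap[g] = globalTypes[g];  // never override existing types
  }
  for (var i = 0; i < tree.friends.length; i++) {
    var friend = tree.friends[i];
    // The missing-globals check is what stops the mutual recursion.
    if (missingGlobals(friend, globalTypes))
      lazilySpecialize(friend, globalTypes);
  }
}

function missingGlobals(tree, globalTypes) {
  for (var g in globalTypes) {
    if (!(g in tree.entryTypemap))
      return true;
  }
  return false;
}
```

In the real system the types come from the actual values of the globals at specialization time; the point is that every connected tree is updated in the same pass, so no two friendly trees can disagree about which globals exist.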

This is probably one of the more difficult TraceMonkey bugs I’ve had to reason through. Nested trees always seem to complicate how you reason about tracing, and in this case the nesting ended up not mattering. And though the problems are conceptually disjoint, they also feed into each other: fixing only #1 led to #2 breaking.

What’s next for AlliedModders?

It’s the question no one asked!

SourceMod Motivation

As I’ve too often said, the original SourceMod failed because we tried to do too much without understanding project management, the scope, or even how to code most of what we wanted to do. When we began talking about restarting the project, we did so from a research point of view. We asked the community: should we spend time writing a new language, or just get something to market on Pawn?

The community voted the latter, so we approached the project with the following questions: What would SourcePawn look like if it were modernized a bit? What would AMX Mod X look like if it had a much cleaner and more consistent API?

We wrote a new VM and JIT for Pawn. We added fast strings, basic function pointers, and dynamic arrays to the language. We boxed it into a fairly simple plugin with a small API and watched it take off, adding more as it got requested. Commands, events, menus, FFI, et cetera. It was fun designing all this.

What’s left to do that’s interesting? There are a lot of open bugs for features people want. Perhaps the only really big-ticket item left for SourceMod is script-level detouring of virtual functions, which has been in the works for a long time (and is about 50-60% done).

Most of what needs to be done though is hacking away at the indescribably infuriating Source engine, and all of the closed source games that run on it. This is boring. Sure, we’ll continue to do it. But it’s not interesting. It’s rote reverse engineering against the same old stuff. It’s monotonous and tedious, not to mention fragile. If we were developing a full product (game), we’d have access to everything. But we don’t, and we’re stuck in what’s ultimately a script kiddy’s role.

SourcePawn is Limited

SourceMod has stretched Pawn to its limits. It’s a crufty language. There is no heap. Memory management is manual. Strings are byte buffers. There are no types. There is no data integrity or safety. There are no structures or classes, just C-style functions and opaque integer values.

Basically, writing SourceMod plugins feels vaguely reminiscent of WinAPI except you don’t have any of the advantages of C.

Compiling plugins is asinine. It causes so many problems – arbitrary compatibility issues, secret backdoors, license violations, site and forum maintenance difficulty, et cetera.

Pawn is very fast because it’s so limited, but it’s also hard to squeeze more optimizations out. Everything has to go through natives, even small trivial functions. The bytecode is very low level, and though Pawn as a language has immense JIT potential, our JIT is very basic because the bytecode is too hard to analyze.

So… where do we go from here?

I want to start looking at more involved questions. The type of questions SourceMod might have asked if we had chosen the former path when restarting the project. I’ve boiled this down to two issues:

  • What would a language look like if it were specifically designed with trace compilation, game scripting, and embedding in mind?
  • What would Pawn, a language designed to be simple, look like as a truly modern scripting language, and not something from 1980?

Relatedly, there are two interesting questions for SourceMod:

  • What would SourceMod (as an “AMX Mod X done right”) look like with a more modern, object-oriented language available?
  • Can SourceMod be self-hosted?

I have mentioned this before. Most people say “Just use Python, or JavaScript, or Mono, or X!” where X is some random language that I don’t want to use. That’s not something I’m interested in. If you want to see what your favorite language looks like in Source, go ahead and embed it. This isn’t a case of NIH. It’s something I very much want to pursue to see where it goes.

The Next Project

The next project has been started. It’s a new programming language. I’ve chatted a bit about it in IRC, mostly to get some bikeshedding out of the way. As the project is not official yet, there’s no official name.

I have no idea how far this will go, or if it will go anywhere. But it’s been started. Goals, or rather, properties of the language will be:

  • Dynamically typed.
  • Strongly typed.
  • Object oriented.
  • Garbage collected.
  • Static lexical scoping.
  • Nested block scoping.
  • Explicit variable declaration.

I’ve started work on a garbage collected, register-based bytecode interpreter. I won’t start the actual compiler until I’m pleased enough with backend progress, and I won’t start the JIT until I’m pleased enough with overall maturity.

Everything will be incremental. Parity with Pawn is enough to let people start playing with things, and that would be a huge, huge step.

If anything ever comes of this, it will be interesting to see how it can fit into SourceMod. While we can never get rid of Pawn because of compatibility, there’s lots of room for potential. IRC goer theY4Kman embedded Python in a SourceMod extension. We could try something similar as a starting point, or we could do something like EventScripts, where Python and the “old language” from the older, darker times live alongside each other.

Who is working on this?

For the most part, just me (and, if they so desire, other SourceMod developers). This is a pet project and I don’t want to be too distracted. I also don’t want to tie SourceMod to its success, so it has no relation to SourceMod’s roadmap in any way. But like SourceMod, it’s something I will attempt to work on incrementally in my free time.

If progress on this is something people are interested in, I can keep talking about it. If you’re really interested in it, feel free to find me on IRC (irc.gamesurge.net, #sourcemod). More than likely I’ll be occasionally throwing random design questions around out loud.

AAA Doesn’t Like Doing Work

Since I’ve returned to the wonderful and freezing east coast in December, I’ve had two encounters with AAA while staying with my parents.

Encounter 1
Around December 20th or so our car couldn’t get up an unplowed hill in New Hampshire. It was a two-wheel drive Mazda (I forget the model), and for whatever reason we couldn’t gun it enough to get to our house.

We called AAA to see if the car could be towed up the hill. No go, they said: if we couldn’t get it up, then they couldn’t get it up either. Furthermore, that didn’t count as something that would be covered, since the problem was with the road, not the car.

We said, “well eventually we’ll be stuck long enough out here that we’ll run out of gas.” The reply? “Well, in that case we’d come bring you more gas.” Uh.

I can believe that towing the car up the hill was implausible. But they could have offered something, like towing the car somewhere safe and then bringing us up the hill.

Instead, we had to walk. In 10°F (-12°C) snowy weather. Up a steep hill, in the dark, for what we later clocked to be 0.7 miles (1 km). While carrying all of our luggage. It felt like the desert scene from Spaceballs, but we didn’t have any food or supplies at the house, so there were more necessities than usual.

I have severe chronic asthma, which, though it has gotten much better over the years, still needs work. It must have taken me around fifteen minutes to breathe normally again after deeply inhaling freezing cold air for so long.

Later the next day, a family acquaintance with a plow came and plowed the street, then managed to gun the car straight up the hill with no problem. Gee, I wonder if AAA could have done that.

The incident caused my parents to, a mere ten days later, get a used 2006 Toyota Highlander Hybrid for its four wheel drive.

Encounter 2
Cars are supposed to turn off their internal lights once they’ve been closed and locked. For one reason or another that didn’t happen in the Toyota Hybrid, and the battery drained. We didn’t have jumper cables for that car yet, so we called AAA.

AAA refused to jump the car because it was a hybrid. They hadn’t developed a procedure for hybrids yet. Huh? Hybrids have been out for a decade. What has AAA been doing all this time?

But that’s irrelevant, because it’s a normal battery like any other car. The owner’s manual even tells you how to jump it, and it’s exactly the same as any other vehicle.

So AAA tows the car back to our house, but the guy AAA contracted wants to try jumping it anyway out of curiosity. He connects positive to positive, then negative to negative – wait what?

You’re not supposed to connect the dead negative terminal – it’s supposed to go to ground. So AAA apparently doesn’t know how to jump a normal car. The guy was nice though, was glad to know the correct procedure, and what do you know? The hybrid jumped fine.

Conclusion
I’m not really sure I have one, other than that AAA clearly isn’t going to go out of their way to help you. They have some tome of procedures and will obey it to the letter. Maybe they’re afraid hybrids will detonate into some sort of mushroom cloud explosion. Perhaps in 2050 they’ll know how to jump them (or, any car).

More Type Instability and Specialization in JavaScript

A while back I talked about type instability in JavaScript, and how TraceMonkey optimizes it by connecting type-specialized traces together. This stitching mechanism was only half complete.

TraceMonkey has a global cache called the “trace monitor.” As the original version did not type-specialize any loop (tree) more than once, there was a design decision to only monitor one global object at a time. This is significant because there can be many global objects in JavaScript. Each window in the browser has its own global object as a security measure (windows can’t poison other windows).

Let’s say there’s a window A with global object of shape X. The monitor notices this and compiles a bunch of traces specialized to that global object. Then window B with global object of shape Y starts running an intensive loop. The trace monitor kicks in, notices that the global object is different, and flushes its entire cache. A cache flush invalidates every trace, meaning the JIT’d code is deleted.

The trace monitor also locked itself to the types of every property in the global object it was tracking. If any variable in the global object changed, the cache would flush, even if most of the traces never used this variable. Any type of global instability, either from changed global variables or different global objects, would throw out all JIT’d code.

This would be fine:

function f() { 
   for (var i in ["1", "1", "1", 1, 1, 1.5])
      ;
}

This would continually flush the cache, as i is global and type unstable:

for (var i in ["1", "1", "1", 1, 1, 1.5])
 ;

Luckily we’re now working on fixing this, and the first major step landed last week. The trace monitor no longer keeps track of global types. This information is tracked in each tree and each guard, and the old principles of “multitrees” apply. If a global variable is not type-stable across a loop, the loop is left unclosed and the dangling edge becomes an “unstable exit.” If there ever exists a tree whose entry types match an unstable exit, the edges are joined together.

See the original post for diagrams.

The only additional complication with global variables is that they are lazily tracked by the trace monitor. This is an optimization. If there are 50,000 globals in play and only 5 ever get used, there’s no need to iterate through every single one. As traces get recorded, every tracked global variable is included in a tree’s specialization. Old trees that don’t include newly tracked globals will be updated automatically. This removes the need for complex dependency tracking for branches and nested trees.

So now that global types are specialized per-tree, what’s the next step? The plan is to include the actual identity of the global object in per-tree specializations. The easiest way to do this is probably to divide up the trace monitor so that it can maintain separate caches for each global object. This way, if a global object’s shape changes, only its cache will be flushed.

Fine-grained specialization continues to be the most interesting and promising aspect of trace compilation for dynamic languages.

Thoughts on OS X

After using OS X for the past six months I’ve kept mentally jotting down random thoughts.


1. Apple, update your dev tools. Your GDB and GCC are ancient and buggy (almost four years old)!

2. Off-topic: Why do some Apple users get so whiny about Firefox missing unnoticeable UI aspects? Safari looks completely out of place on Windows.

3. OS X usually just works. The UI is infinitely better than Linux (which doesn’t say a lot) – it’s great to have a Unixy OS with a pleasing UI.

4. Functionality for Home/End is horrible and broken. Fix it.

5. Spaces rocks. On Windows I need multiple monitors; on Mac I only need one.

6. Macbook keyboards are crap, crap, crap. Half the keys I use frequently are missing. I realize other people might not use these, and I realize there are key combinations you can use instead. This creates yet another disparity I have to learn, which is odd as the disparity is between Mac products.

7. Macbook mouse button is crap too. Why do I need two hands and three fingers to right click, when the button is clearly big enough to at least give the user the option of biasing one side toward right clicking?

8. Macbook crappiness: why is there no pointing stick? Fitts’s Law applies: repositioning your hands to move between clicking and typing is a pain.

9. Macbook continued: are there nuclear reactions going on in the left back corner? It burns.

10. Dock is far more useful than the Windows taskbar, though I miss an overall applications listing. Someone should combine the best of both worlds.

11. Thank god for MacPorts.

12. There are places where the one-menu-bar doesn’t play out nicely. Closing the system prefs window, for example, leaves the application open – with no easy way (if there is one at all) to bring it back up. But re-launching the app doesn’t bring up the window, so you are stuck with a useless menu bar until you Cmd+Q.

13. Why the hell do I get constantly bugged to update iTunes and QuickTime and to reboot the whole machine from these updates which I don’t care about?

14. OS X needs to learn how to multitask. One app thrashing one core will grind my system to a halt. This doesn’t happen on Windows.

15. OS X takes forever to let an app crash. Even a simple shell program will sit and pause for 5-10 seconds. Is there a way to disable the crash reporter entirely? And forget about Firefox: it can take 5 minutes for a debug build to die.

16. Speaking of which, it’s easy to get programs into an unkillable state. It’s like you need a “kill -s 666” to uber-kill processes that are zombied. Processes being analyzed by the crash reporter seem to fall into this category. Very annoying.

17. Where is mspaint for OS X?

18. I really like stickies. Though, why can’t I move them from one space to another?

19. iChat is pretty nice.

20. Terminal is surprisingly awesome. The Gnome and KDE terminals feel really bad in comparison.

21. The two-finger-scroll on Macbook touchpads is amazing. I keep trying to use it on my Thinkpad and quickly realize how much I miss it.

22. I have yet to find a use for the dashboard. Maybe if there’s a way to put arbitrary apps on it I’d use it.

23. What’s with Mail.app? Every time I click “Erase all deleted messages” or “Erase all junk,” absolutely nothing happens.

24. Why does installing OS X make you sit through a 15-second “introduction” that’s just the word “Welcome” flying around in a bunch of random languages?

25. The security model is a lot more usable than Vista’s UAC (which, by the way, I like for the most part). Where is Vista’s “keychain?” Maybe Microsoft thinks it’d be exploited too easily.


Overall I really enjoy using OS X. If only Valve bothered porting their server software! I don’t think I’ll buy a Macbook though. As a touch typist (and perhaps related, a developer) the input mechanisms are too awkward.

TraceMonkey and Type Instability in JavaScript

Work on TraceMonkey continues! TraceMonkey is Mozilla’s JavaScript JIT based on Franz-Gal trace compilation. The JIT ships with Firefox 3.1 and can be enabled in the beta via about:config. This article is about how TraceMonkey was recently changed to handle type changes in JavaScript data flow.

Introduction

The greatest advantage of a trace-compiling JIT over a method-compiling JIT, for dynamic languages, is type specialization. Observe the following loop:

function f(c) {
  for (var i = 0; i < 500; i++)
    c += 2;
  return c;
}

In a traditional JIT this method would need to be compiled generically enough to handle all possible incoming types. The function could be invoked with a string, an object, a number, or anything else, and the compiled code must account for that. Optimizing further requires static analysis.

Type Specialization

TraceMonkey’s approach is remarkably different. If a hot path (loop) is encountered, the runtime behavior is observed, recorded, and compiled as a “trace tree.” If that loop is run using integers, the trace tree becomes a specialized chunk of assembly using straight integer math. Because trace compilation only compiles instructions that have been hit (for example, it may compile only one side of an if branch), it’s much easier and faster to type specialize than to perform aggressive static analysis.

This approach has worked very well so far. Except for type instability.

If the loop above were run a second time with c being a string instead, the original tree could not be used, as the types are not compatible. This resulted in a “tree mismatch,” and excessive tree mismatches meant destroying the tree to make way for another. What if you could have multiple type-specialized trees for a given loop? Enter multi-trees.

Over the past week or so I have been working on (and now landed) my second major TraceMonkey patch — multi-trees.

Multiple Type Specializations per Path

TraceMonkey stores a cache of all trees. The bytecode location of each hot loop is mapped to a linked list of trees; the first in each list is called the “root peer.” Each tree records typemaps, which are simple arrays describing the data type of each slot in the stack. A tree contains one entry typemap, which is the set of types the stack must have in order to execute the compiled tree.

The simplest case of multiple type specializations is the example above, where a simple loop’s typemap can alternate on entry. This is the easiest case to handle. The linked list is grown for each new combination of types. f(3) will create the root peer with an integer specialization. f('a') will link in a new tree with a string specialization.

To invoke a tree, the root tree is fetched and the list is traversed until a matching entry typemap is found. The tree is then executed. If no matching typemap exists, the list is appended and a new tree using the new typemap is recorded. This means there are multiple trees recorded for the same loop, and we pick one matching the types on the stack.
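In sketch form (names and the recording callback are mine, not TraceMonkey’s), the peer-list lookup might look like this:

```javascript
// Toy model of the per-loop tree cache: each hot loop maps to a linked
// list of trees, each specialized to one entry typemap.
function findOrRecordTree(rootPeer, stackTypes, record) {
  for (var tree = rootPeer; tree; tree = tree.next) {
    if (typemapsMatch(tree.entryTypemap, stackTypes))
      return tree;                     // execute this specialization
  }
  // No match: record a new tree and append it to the peer list.
  var fresh = { entryTypemap: stackTypes.slice(), next: null };
  record(fresh);
  var last = rootPeer;
  while (last.next) last = last.next;
  last.next = fresh;
  return fresh;
}

function typemapsMatch(a, b) {
  if (a.length !== b.length) return false;
  for (var i = 0; i < a.length; i++)
    if (a[i] !== b[i]) return false;
  return true;
}
```

So f(3) finds or creates the integer-specialized root peer, and f('a') appends a string-specialized peer to the same list.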

Type Instability Inside Loops

The hard cases involve type instability within the same loop. TraceMonkey relies on being able to close loops, meaning that the loop edge (tail of the loop) can jump directly back to the top. This can only happen if the entry typemap is equivalent to the typemap at the loop edge. For example, a loop where a number turns into a string can’t be closed, because the compiled code at the top of the loop expects a double, and we never recorded a path to bridge the conversion back.

There are many reasons this can happen. Take the following dumb example:

function g() {
  var q;
  for (var i = 0; i < 100; i++)
    q = 2.0;
  return q;
}

The first time we enter this loop, the entry typemap contains Boolean for q since it is undefined. When the loop reaches its first edge, q has changed to a Number. Recording starts on the second iteration, so now both the entry and exit typemaps will contain Number.

Now if g() is run again we can’t run the tree we just recorded, because it expects q to be a Number when it really starts out as Boolean. To solve this we immediately start recording a new tree. At the end we encounter a problem: in our tree, q started out as Boolean and now it’s a Number. This means the loop cannot be closed because the exit and entry points are not type compatible. Joining the ends would be incorrect.

To solve this we search for a tree X (on the same loop) whose entry types are identical to our ending types. If X exists we compile a “loop” that runs once, then immediately jumps into tree X. This resolves the type conflict by bridging the type flow to another loop. Visualization:

Figure 1

Delayed Bridging

What happens if no matching tree exists? Take this example:

function f(c) {
   var q;
   for (var i = 0; i < c; i++)
     q = 2.0;
   return q;
}
f(1);
f(1);
f(20);

The first call to f installs a counter that says “record a trace for me next time I’m reached.” The second call to f records a trace with a type instability: q flows from undefined to Number. Unlike the previous example, though, there is no stable tree to complete the bridge.

To solve this, we have the top of each tree (where the entry typemap is stored) contain a list of unstable loop edge exit points for that specific tree. During the third call to f a stable loop will be compiled. We then use the following algorithm:

1. For all other peers that have compiled code,
2. For all unstable exit points in that peer,
3. If any unstable exit point has a typemap matching our entry typemap, bridge the points together and remove the unstable exit from its peer’s list.

This algorithm is invoked every time a main trace is compiled (that is, a trace that’s not trying to extend a branch off a tree). Once it runs in the above example, we get a diagram very similar to Figure 1.
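A small model of that algorithm (my own sketch; exits and trees are plain objects) makes the bookkeeping concrete:

```javascript
// Toy model of delayed bridging: when a stable tree is compiled, scan
// peers' unstable exits for any whose typemap matches our entry typemap.
function bridgeUnstableExits(newTree, peers) {
  for (var i = 0; i < peers.length; i++) {
    var peer = peers[i];
    var kept = [];
    for (var j = 0; j < peer.unstableExits.length; j++) {
      var exit = peer.unstableExits[j];
      if (sameTypes(exit.typemap, newTree.entryTypemap))
        exit.target = newTree;        // bridge the edge to the new tree
      else
        kept.push(exit);              // still dangling
    }
    peer.unstableExits = kept;        // bridged exits leave the list
  }
}

function sameTypes(a, b) {
  return a.length === b.length &&
         a.every(function (t, i) { return t === b[i]; });
}
```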

The important result of this is that we can bridge many types of instability together. For example, imagine a scenario where two traces are mutually unstable. This is easily bridged:

Figure 2

Other crazy situations are handled as well. For example, stable loops with unstable branch exits, or chaining multiple unstable traces together (multiple levels of mutual instability). One case I ran into was something like this:

Figure 3

Nested Type Instability

The situation gets hairy with nested trees. Consider a loop like:

for (var i = 0; i < 100; i++) {
  var q;
  for (var j = 0; j < 100; j++) {
    q = 2.0;
  }
}

In this example there’s an inner tree that will quickly type stabilize. When the outer loop starts recording, the inner loop’s incoming types will not match the existing root tree. In this case the outer loop’s recording is temporarily aborted and the inner loop immediately starts recording again under the assumption that it will record a type stabilizing trace. In this loop it does, and the two trees are connected together. The outer loop can then record and directly call into the tree that starts out with unstable types.

This solves all sorts of crazy data flow stability problems. Previously SunSpider was traced as very type unstable, and loop hoisted variables (such as the q in that example) served as tracing poison. They’re now handled fine. Our least favorite test, access-fannkuch, used to have 100,000 and even 200,000 side exits (exits from the JIT to the interpreter). It now only has 200 and it runs 2.3X faster than the interpreter (over an old 1.3X). Other cases like crypto-md5 are almost completely covered by tracing.

Thin Loops

Multi-trees go hand in hand with “thin loops,” a concept Andreas landed recently. The idea is that if a loop doesn’t close because it’s too short, we close it anyway and assume the best. That works really nicely for very thin loops, especially thin nested loops, which would otherwise prevent the outer loop from recording. Unfortunately it doesn’t give thin loops time to type stabilize, so many thin loops are thrown out. For example:

function h(c, k) {
  var q = new String('');
  for (var i = 0; i < c; i++)
    q = k + q;
  return q;
}
h(1, 'a');
h(1, 'a');
h(5, 'a');

Multi-trees solves this. The first call to h tells the JIT to start recording the next time h is called. When h gets called the second time a thin loop is recorded. But the types are not stable, and the loop cannot be closed – we get another dangling trace that runs once and exits. When h gets called a third time it type stabilizes. A new loop is compiled and the two traces are bridged together.

Note: h is not immediately stable because new String returns a JSVAL_OBJECT in SpiderMonkey, whereas the addition returns a JSVAL_STRING.

Type Speculation

TraceMonkey has a very cool optimization called type speculation. Floating point values like 30.0 or -12.0 fit into an integer. When TraceMonkey starts recording a loop it speculates that such “promotable” numbers are integers for the entire loop. When the loop closes, if the number in that slot changed to a double, the loop can’t be closed and the recording is thrown out. How do you tell it to not perform integer promotion next time that loop is recorded?
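A hypothetical loop of my own making illustrates the failure mode: a slot holds an integer-valued number when recording starts, but later becomes fractional.

```javascript
// x starts at an integer-valued number, so the recorder would speculate
// that its slot is an integer; division soon makes it fractional.
function shrink(n) {
  var x = 1024;          // promotable: 1024 fits in an integer
  for (var i = 0; i < n; i++)
    x = x / 2;           // after 11 halvings, x is 0.5: int->double conflict
  return x;
}
// Once x becomes fractional, the loop's types no longer match the
// integer-speculated entry, and the recording is thrown out.
```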

The original solution was an oracle, a simple lossy hash table. If an integer->double conflict arose the oracle would be told “please blacklist stack slot N for bytecode location X, so we don’t try to promote it to an integer.” Since the hash table is lossy this led to false positives as traces filled the JIT cache.
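A toy model of such an oracle (names, the hash function, and the table size are my own assumptions) shows where the false positives come from:

```javascript
// Toy model of the oracle: a fixed-size, lossy table of "don't promote"
// bits keyed by (bytecode location, stack slot).
function Oracle(size) {
  this.bits = [];
  this.size = size;
}
Oracle.prototype.hash = function (pc, slot) {
  return (pc * 31 + slot) % this.size;   // distinct pairs can collide
};
Oracle.prototype.blacklist = function (pc, slot) {
  this.bits[this.hash(pc, slot)] = true;
};
Oracle.prototype.isBlacklisted = function (pc, slot) {
  return !!this.bits[this.hash(pc, slot)];
};
// Because unrelated (pc, slot) pairs can hash to the same bit, a slot
// that was never blacklisted can still read back as blacklisted.
```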

Multitrees eliminates most uses of the oracle. If a loop can’t be closed because of integer->double conflicts, a new trace is immediately recorded with the same conflicts demoted to doubles on entry. This is better than compiling bridged traces (int->double and double->double), because the extra time spent compiling has very little gain.

There is one case where the oracle is still important (global variables aside), in that the oracle can help memoize type instability for nested loops. If the JIT cache is flushed the oracle will still remember which stack slots can’t be promoted, and outer loops can be compiled faster. Thus the oracle only memoizes stack slots when recording inner tree calls.

There is also a case where we build similar traces that are impossible to connect. For example, two variables going from int->int in one tree, and double->double in a peer tree. If a branch from the first goes from int->double for one of the variables but not the other, the traces cannot be connected without some sort of intermediate conversion logic. Perhaps with very bushy trees it would make more sense to build such intermediate bridges. For now we simply throw out the tree that is causing the problems (the original integer tree). This is safe because integers can always be boxed into doubles, so the second trace is all we need.

Problems

It’s not perfect. There’s potential for code/trace explosion if types are too unstable. The heuristics for deeply nested trees can get messy too. If there are many inner trees that are type unstable it can take a long time to build a stable outer tree.

Compared to the original “single-tree specialization” TraceMonkey, there are cases that get a little slower (although still faster than with no JIT). This is because the original algorithm aggressively pegged incompatible values into mismatching trees. For example, undefined (Boolean) was passed as 0 for integer slots and NaN for double slots. This was wrong: ((undefined == undefined) == true), but ((NaN == NaN) == false) and ((undefined == 0) == false), so either substitution changes behavior. Other operators are hurt too; NaN is a poison value that causes comparisons involving it to return false.
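The NaN half of the problem is easy to demonstrate in any IEEE-754 language; this C++ sketch shows why substituting NaN for undefined breaks equality:

```cpp
#include <limits>

// In JavaScript, (undefined == undefined) is true. Pegging undefined
// to NaN for double slots turns that comparison into NaN == NaN, which
// IEEE-754 defines as false. Pegging it to 0 for integer slots turns
// (undefined == 0), which should be false, into 0 == 0, which is true.
// Either substitution silently changes program behavior.
bool nanEqualsItself() {
    double nan = std::numeric_limits<double>::quiet_NaN();
    return nan == nan;  // always false
}
```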

So there were cases where the compiled code was simply wrong and produced incorrect results (but boy was it fast!). By the time this fault was discovered it was too late. Removing this aggressive coercion corrected a small percentage of code at the price of greatly reducing performance. Trees would continually get trashed because they mismatched. Multitrees solved this, but in some cases it uncovers new paths that don’t yet trace well.

Conclusion

As TraceMonkey improves, type specialization will bring JIT’d code closer and closer to C equivalents. Multiple type specialized trees furthers us toward this goal. It solves a huge class of correctness bugs and improves performance in many scenarios where nested trees could not be compiled.

SourceMod JIT Changes

A few months ago I made some major changes to the SourceMod JIT, the first major changes since the thing was originally written two years ago.

Pawn suffers from many problems. One of my (least) favorites is the lack of separation between a runtime and a context. Some language implementations don’t have this separation either. Python, notably, has a global runtime and no notion of a context as far as I can tell. SpiderMonkey abstracts both. Lua seems to have both and combines them into one structure.

Pawn historically shoved both a runtime (VM code+heap) and a context (thread stack+locals) into the same structure, and SourcePawn originally did the same. The problem was that once you created this structure, it was immutable: you couldn’t add code or data. Sharing between scripts meant jumping runtimes, which is very expensive to marshal across. Each plugin having its own runtime and context is a waste in every sense of the word.

Unfortunately the design of the VM doesn’t really let us fix that. References to memory are not typed, and references to global memory can’t be easily deduced at verification time. This makes it impossible to put multiple scripts into one structure with a common heap. For example:

const.pri 50
load.pri

This is a load from global data at address BASE+50. SourcePawn is a straight-line JIT, and some form of data flow analysis would be needed to catch and relocate these references. Although we could change the compiler, we have to be backwards compatible with old bytecode.

So, stuck with this crummy design, I went on a spree of changes to try and clean things up. The first was separating the ideas of a context and a runtime – although runtimes are still per-plugin, the invocation model is now centered around Functions and Contexts. That is, the API for calling a function used to require a runtime and a code address; it now requires a function object and a thread context. The API looks a whole lot cleaner, but it’s just lipstick on a pig – internally, contexts and runtimes are still tied together because of Pawn’s limitations.

Though as part of those changes, the implementations for contexts and runtimes were moved into the JIT library. Previously they were split: the JIT lived half in its own library and half in SourceMod. SourcePawn is officially standalone now. For testing, I was even able to hack up a shell app that loads plugins using the DLL and headers from SourceMod snapshots.

The third big change was to move to whole-method incremental compilation. Previously the entire code blob of a plugin would get JIT’d on load. Now we only JIT functions once they’ve been run for the first time. This speeds up load time noticeably; compilation time becomes spread out, and the hot paths (such as OnGameFrame) are hit quickly, so gameplay is unlikely to be interrupted. It’s faster to compile a smaller chunk of code anyway, especially since SourcePawn does little to no JIT-time analysis.

Functions that call other functions (via the CALL opcode) are special. If the target function is already compiled, a direct call is generated. If the function is not already compiled then a special thunk is generated instead. If the thunk is ever executed it checks to see if the function has been compiled yet. If it hasn’t, it compiles it. The thunk completes by replacing itself with a direct call to the target function.
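The thunk mechanism can be sketched roughly like this. The structure and names are illustrative; the real JIT emits machine code and patches the call site itself, while here a compiled function is just a cached closure.

```cpp
#include <functional>
#include <unordered_map>

// Sketch of lazy, incremental compilation with self-replacing thunks
// (illustrative only, not SourcePawn's actual code).
using Compiled = std::function<int()>;

struct Runtime {
    std::unordered_map<int, Compiled> compiled;  // keyed by code address
    int compileCount = 0;                        // how often the JIT ran

    Compiled compile(int addr) {
        ++compileCount;
        return [addr] { return addr * 2; };  // stand-in for real codegen
    }

    // The CALL path: the first call through an address takes the thunk
    // route (compile, then cache); later calls find the compiled code
    // in the slot and go straight through.
    int call(int addr) {
        Compiled& slot = compiled[addr];
        if (!slot)                // thunk: target not compiled yet
            slot = compile(addr);
        return slot();
    }
};
```

The key property is that the compile step runs at most once per function, no matter how many call sites reach it.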

The last big change was a rewrite of our debugging implementation. The previous concept (lifted from AMX Mod X) was pretty bad. Pawn emits BREAK opcodes throughout methods to indicate line changes. The JIT would turn each BREAK opcode into a call to a function called the “debug hook.” The debug hook would look for changes to the frame pointer to deduce whether the call stack was entering or leaving a function. This was really expensive, so you had to explicitly enable debug mode to get error diagnostics.

The new implementation emits a single instruction for each BREAK opcode:

mov [save_pc], pc

For the CALL opcode, the pc (program counter) is pushed onto a separate stack (the call stack). When a CALL finishes, the top value is popped. If a function throws an error, the complete stack trace can be recovered from a combination of save_pc and the call stack. The pc can be used to find the line, file, and function in the debug info. This is fast enough to remove re-JITing to debug a plugin.
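The bookkeeping amounts to something like the sketch below. The name save_pc comes from the description above; everything else is illustrative, not SourcePawn’s actual code.

```cpp
#include <vector>

// Sketch of the cheap stack-trace scheme: each BREAK stores the
// current pc (the "mov [save_pc], pc" instruction), CALL pushes the
// caller's pc, and a completed call pops it. On an error, save_pc
// identifies the innermost frame and the call stack identifies every
// caller, with no per-line debug hook needed.
struct DebugState {
    unsigned save_pc = 0;
    std::vector<unsigned> callStack;

    void onBreak(unsigned pc) { save_pc = pc; }
    void onCall(unsigned pc)  { callStack.push_back(pc); }
    void onReturn()           { callStack.pop_back(); }

    // Recover the full trace, innermost frame first.
    std::vector<unsigned> stackTrace() const {
        std::vector<unsigned> trace{save_pc};
        trace.insert(trace.end(), callStack.rbegin(), callStack.rend());
        return trace;
    }
};
```

Each recovered pc can then be mapped through the debug info to a line, file, and function.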

So SourcePawn is a little more mature now. It’s got an isolated runtime, it incrementally JITs, and the design (on the outside) is a lot cleaner. I wouldn’t recommend using it in other projects yet, but it’s at a point where if we wanted to take it further we could do so without more massive rewrites.

Dude, that’s Gangsta – Driving across the USA

I apologize for the lack of updates lately. Work is busy! Please enjoy (or don’t) this boring, non-coding writeup. A big article about recent work with trace compilation is coming up soon, I promise!


Back in May I drove across the country – from Rhode Island to California – with my friend Dan. I’m not sure how he managed (or why he even agreed) to put up with me through this, especially when our only entertainment was two Killers’ CDs, a Lewis Black CD, and some Family Guy. Our destination was San Jose where twisty lives (a friend from WPI).

The plan was simple: We’d drive out, stopping at random hotels when we got tired, and hit CA within four days. Dan would fly back after we relaxed for a bit, and I’d stick around for my new job.

As we drove across the country I took mental notes about the various states we went through. For all of these states we pretty much stayed entirely on I-90 W. It was a strange road that can’t quite decide what its name is: near the start of the Midwest it becomes “I-90/I-80” and then just “I-80.”

Along this highway, up until Illinois or so, “rest stops” were big plazas with restaurants and tourist information. This fanciness stopped fairly quickly, and as we delved further west, rest stops became so far apart and so dilapidated (if not entirely closed) that we gave up on using them.

All of our experiences were based on that highway; in other words, these comments apply to a rather small, fleeting vignette of each state.

Day 0
New York: We completely avoided any cities and mostly only saw trees and hilly regions. Our stop for the night (we got a late start) was Weedsport, NY at a Days Inn.

Distance: 333.7 miles (333.7 miles total).

Day 1
Pennsylvania: We cut through PA pretty quickly. If I recall correctly it was overcast and maybe raining. It seemed to be mostly farmland in the area. We drove straight through Erie and didn’t even visit sawce.

Ohio: This is our least favorite state, as it’s the only one where we got a speeding ticket, for 87mph (I think our record to that point was 97). It was the last state with a speed limit of 65mph. Going through Cleveland was a pain as the traffic was bad, and I don’t recall any other part of the highway going through such an annoying area.

After Ohio we used cruise control the entire way.

Indiana: I can’t recall anything particularly interesting about this state. The speed limit got raised to 70mph or so.

Illinois: The last state with a decent rest stop, sadly at the very beginning. If I recall correctly the speed limit was down to 65mph again. We stopped in Morris, IL overnight at a Holiday Inn.

Distance: 687.7 miles (1021.4 miles total).

Day 2
Iowa: Long and boring drive with endless farmland. No crops were actually growing yet so there was quite literally nothing to see except for farm structures/equipment and cows. The speed limit was bumped back up to 70mph here.

Nebraska: The first state that was really, really long. Luckily the speed limit was bumped to 75mph (effectively 80, though we played it safe). At this point and until Nevada, most other vehicles on the road were trucks. Amazingly enough, a blizzard approached and I-80 was shut down. We narrowly missed it, though a good deal of the night driving was through heavy rain and winds and pretty unsafe. Surprising for early May.

Nebraska, like Iowa, was very boring. An endless day sky combined with increasingly little to see was tiring, though there were still frequent towns and stop points. It was a very weird experience to drive hundreds of miles and not feel like you’ve moved. Everything was literally so flat from there on that we had no frame of reference for where we were. Talk about feeling like a speck.

Nebraska was the end of Dunkin’ Donuts for Dan. Surprisingly, NorCal is pretty lacking in equivalent drive-thru breakfast/coffee shops.

The area was mostly farms. Gas was cheapest in NE, with the higher quality gas being less expensive for some reason (at around $3.35, compared to RI’s $3.70 and CA’s upwards of $4 — this was in May, again). We stopped in Kearney, NE overnight at a Days Inn.

Distance: 635 miles (1656 miles total)

Day 3
Wyoming: This state was endless. Unfortunately the real interests (Yellowstone National Park and whatnot) are up north, and I-80 is down south. I have to say Wyoming was the weirdest state of all and Dan would probably agree.

The whole state felt off-kilter from the start, like we had entered the twilight zone. I can’t really explain it. There was tumbleweed on the highway and the scenery felt dated. After we got through Cheyenne, things got weirder: highways turned into very sketchy roads clinging to rocky structures full of construction.

We stopped at a Pizza Hut (hey, I love their cheese-crust pizza!) which was a mistake. It was full of loud kids. The table next to us was occupied by a strange family. Four really, really, morbidly obese children with an equally corpulent father. The mother seemed really thin. One of the kids, a girl, went up to the salad bar and came back with a plate full of cookies. The dad kept yelling at her, “stop eating that plate of cookies!” She didn’t. Interesting parenting.

After escaping that we went to get gas, and Dan freaked out because the gas was all 85 octane. My Toyota Corolla takes 87. He scoffed at the vendors “ripping people off,” but later we read on Wikipedia that high elevation (such as in Wyoming) means you can use 85-octane gas without the “knocking” effect. Thank you, Wikipedia.

The rest of Wyoming was very, very, sparse. The lack of civilization was amazing. Every 40 miles or so we saw a trailer camp and maybe a small industrial rig that looked like something out of Half-Life 2. There were occasional billboards. They were always for gambling, porn, or fireworks. Even gas stations and random convenience stores had slot machines. Weird.

The most amusing spectacle was an old school bus dumped along the side of the highway. It had a giant banner strapped to it for “ADULT VIDEOS.” Someone had decided that an abandoned school bus made a good porn billboard. We henceforth dubbed this the “porn bus.”

All that aside, it had some very nice views. Mountains were viewable from a good portion of the highway (Rockies, I think?) and there was a high-elevation rest stop that made for some decent pictures.

Utah: Even having started during the day, we didn’t get into Utah until night, since Nebraska and Wyoming are endless. There’s not much to see at night, and we tried to speed through most of Utah, missing sights such as the Great Salt Lake. Unfortunately it was around 1:30AM by the time we approached the Utah border. I was already dead tired and hopped up on Monster (which after two cans makes you feel hung over). The nearest city was another 80-100 miles away so we couldn’t claim conquering Utah in its entirety — we stopped at a Days Inn in Wendover, Utah overnight. It was a mere two miles from the Nevada border.

Distance: About 870 miles (2526 miles total)

Day 4
Nevada: Border to border dust and desert.

The desert was this weird amalgam of completely foreign sand compositions. For example, bright white patches of sand (or sometimes even seas of the stuff). Sometimes it was hard to tell whether they were giant pools of liquid. I can’t really explain it. Not terribly interesting, just strange.

Along the roadside a common phenomenon was small black rocks being arranged into letters/words. This occurred randomly throughout the entire stretch of Nevada’s I-80, often sprinkled throughout the sparse vegetation. I regret not stopping for pictures because I can’t recall any of the phrases now. I just have to wonder:

Who’s been writing things with rocks on the roadside for hundreds of miles? We could go thirty minutes and not see a single car or sign of life, but rock graffiti would be there. Judging by a Google query, I’m not crazy. Perhaps we should have stopped and added our own — AlliedModders!

It was very surreal. The worst was exiting Nevada (near Reno). We hit some combination of a dust and rain storm, which meant my car got CAKED in this thick battering of mud. All the cars driving out of the storm with us had this layer of crap. I didn’t get my car washed for a few months, mostly to keep this “battle scar” as a souvenir.

California: As we approached CA we were stopped by some sort of customs booth asking if we had any foreign fruits or something. The elevation started getting pretty high afterwards (7k feet or so). We tiredly cheered seeing the “Welcome to California” sign. The driving instantly changed at this point. CA drivers are crazy. Apparently we were driving too slow from the start, and a guy on a motorcycle wearing a skull mask gave us the middle finger as he sped by.

Crawling down the high elevation was interesting as the degree of slope on some roads was high. There were long stretches of gravel on the sides of roads, which Dan told me were for trucks — if their brakes failed and they plummeted down the roads, they could aim for these gravel stretches and sink in. And then probably explode.

We stopped at a random point at Lake Tahoe to stretch for a bit. It was evening though and the views were getting cloudy. We finally got off I-80 (a big milestone!) and landed on I-5, and then a few hours later I-680. We reached our destination in San Jose at 10:45PM.

Distance: About 700 miles. The grand total ended up being around 3,300 or so.

Conclusion
Would I do it again? You betcha. I don’t get out much so this was an adventure for me. When I do it again I will change some things:

  • Leave more time. Rushing makes me testy.
  • Go through areas where things are worth seeing. Who wants to drive through a thousand miles of corn? The CA coastline is amazing (my youngest brother drove it with my dad) and I would like to see it.
  • Bring along more entertainment. I listened to those Killers CDs about a hundred times each.
  • Stop for pictures when I see a porn bus.
  • Better pacing so the boring driving gets done at night.
  • Stock up on energy drinks beforehand. In general, keep a cooler with food — fast food gets old and it’s the only thing in many areas.

When twisty’s neighbor saw my car parked outside, he said “Rhode Island? How did that get here?” “Oh, I drove.” “You drove the entire way?” “Yep!” “Dude, that’s gangsta.”

SourceMod Project Changes

Over the past three weeks we’ve made some big changes to SourceMod in terms of managing the project.

Buildbot
I have pretty much nothing but praise for this tool. It’s easy to set up and configure. In about a day I had replaced SourceMod’s terrible and messy Perl scripts. And it looks very pretty.

We have a separate build server with slaves for Linux and Windows (Windows runs under VMware). The master lives on the web server. The version control system (formerly Subversion, now Mercurial) notifies the master of source code changes, and the slaves build SourceMod and push snapshots back to the web server. Buildbot also manages the symbol server.

Our scripts for buildbot live in tools/buildbot under the source tree for all who are curious.

Mercurial
I’ve come to love Mercurial, despite its idiosyncrasies, from using it at work constantly. The distributed nature makes everything much easier. I can keep many copies of the repository around on different computers and share changesets in between them and other people. I don’t need internet access to push changes and I can juggle many patches at once.

We started to notice a problem in SourceMod. We had two branches: 1.0, and 1.1. When we pushed bug fixes to 1.0, there was no way to share these back to 1.1 while maintaining change history. Mercurial inherently solves this. Subversion 1.5 supposedly has merge tracking, but by the time it was out our minds were already made up. Mercurial’s distributed nature is awesome.

I haven’t learned how to use patch queues (the mq extension) yet, but I’m being encouraged to at work. Our Mercurial repositories are here, and I’ve written a tutorial (akin to our CVS and Subversion tutorials) here.

Bugzilla
I originally chose Flyspray over Bugzilla because it looked nicer and I could understand the permissions system. Eventually I came to realize that Flyspray was easy to outgrow, and I came to like Bugzilla more and more from using it at work.

Flyspray is good, and probably suffices for small projects. But it’s buggy. Our install was breaking for some reason: we couldn’t search for users anymore (every user would return “not found”). Marking a project as inactive would still let normal users edit bugs. There seemed to be no way to have group-specific permissions, which meant we couldn’t have private or internal projects. The UI didn’t work on mobile devices (probably the fault of a poor AJAX implementation). And you couldn’t refresh a bug after posting a comment, because there was no link back to the bug.

Bugzilla has a much more complex UI but there are a few things that sealed the deal. Aside from addressing the problems Flyspray had, its authentication system was a lot more flexible internally. I was able to write a simple authentication plugin for vBulletin. The ability to watch users makes automated CC’ing much more flexible. Now anyone can listen for all SourceMod bugs, whereas previously only the project owner could.

The attachment system is a huge win. Attachments can be obsoleted and they all live on the same section of the screen, near the top, rather than sprinkled throughout comments. It can show diffs for patches which is really nice. You can request peer-reviews on patches which is something Mozilla does extensively.

If anyone’s looking to convert from Flyspray to Bugzilla, you can have my hacked-up Perl script. It uses an intermediate table called “mvtab” (type varchar, old_id int, new_id int) as a temporary work table. You need to create this yourself. You also need to manually import products and components (components must have the same names as categories). It’s really a mess but it got the job done, so play with it at your own risk.

flyspray_to_bugzilla.pl
bz_import_mail_settings.pl (tack-on, since the original import didn’t get e-mail settings right)

And here’s the Bugzilla vBulletin binding: vBulletin.pm (goes in Bugzilla/Auth/Verify). You may need to change config files somewhere to use it.

Can you do this to your VP?

Last week, us Mozilla interns held our annual intern BBQ. I guess one of the traditions is to throw someone in the pool.

This year they chose Mike Schroepfer (also known as “Schrep”). Schrep is the VP of Engineering, and about five seconds with him is enough to know he means business. He talks fast and asks deep technical questions about the projects he oversees. It’s an interesting combination: he’s very bright on a technical level while also being a high-level manager.

So when five+ interns tried to gang up on Schrep (who, as always, was dressed as neat as a pin), people on the sidelines seemed a bit apprehensive about how that’d play out. Lo and behold, not well.

In the process of trying to grab him, they accidentally tore his shirt. Then, as they got close to the pool, Asa Dotzler warned them that Schrep’s shoes were too nice to get wet, so the interns had to pull them off. After that they gave up and dropped him to the ground by the pool.

Schrep didn’t look terribly pleased but he took it all in stride, even making a joke about it at the all-hands company meeting on Monday: “For the record,” he said, “Despite it being five-on-one, you guys still couldn’t take me down!”

I have a feeling that if I end up at another company, throwing the VP into pools won’t be an employee perk. Though, the only people I have been known to throw into pools are my brothers (who, as a trait of being younger than I, eternally deserve it).