Monthly Archives: June 2009

Quick String Interning Benchmark

While working on a project, I was wondering how to implement string interning most efficiently. String “interning” guarantees that any two strings that compare equal share the same reference. This is quite nice, as you can test whether two strings are equal by just comparing their pointers.
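To make the idea concrete, here is a minimal sketch of an interner. It assumes a C++11 compiler and uses the standard std::unordered_set rather than any of the containers benchmarked in this post; the StringInterner class and its intern() method are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical interner: owns one copy of each distinct string and hands
// back a stable pointer to that copy.
class StringInterner {
public:
    // Returns the canonical copy of `s`, inserting it on first sight.
    const std::string* intern(const std::string& s) {
        // insert() does nothing when an equal string is already stored;
        // either way it returns an iterator to the stored element.
        return &*pool_.insert(s).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    // Node-based container: element addresses survive rehashing, so the
    // pointers handed out stay valid while the interner is alive.
    std::unordered_set<std::string> pool_;
};
```

Once every string has gone through intern(), an equality test is a single pointer comparison: intern("client") == intern("client") holds no matter where the two character sequences originally lived.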

It seems everyone uses hash tables for this, and indeed when I asked someone with more VM experience, he suggested using hash tables as well. I was still curious though, especially since SourceMod has a fancy-pants compressed “double-array” trie implementation. We knew that insertion time was poor, but I secretly hoped that this cost would be amortized over the much faster retrieval case.

I took a dump of ~5MB of plugin text from the forums and extracted all syntactically valid identifier tokens (including ones found in comments and strings). In total I had 553,640 strings, of which 19,852 were unique. This seems pretty reasonable as people tend to use similar names for identifiers. I set up four benchmarks:

  • std::map<std::string, int> — Typical STL container (red-black tree I think)
  • SourceMod::KTrie<int> — SourceMod’s Trie container
  • SourceHook::THash<const char*, int> – SourceHook’s “TinyHash,” chained hash table
  • SourceHook::THash<std::string, int> – Same, with more allocation involved

As a note, “TinyHash” is one of the first ADTs I tried to write, and it shows. Its stock find() function iterates over the whole table instead of computing a hash, which makes lookups really slow. I filed a bug on this and hacked up my local copy to benchmark.

Each test gets run twice. The first “batch” run goes over all identifiers and inserts them if they are not already in the container. The second “retrieval” run does the same thing, except now there will be no string insertions since the table is populated.
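The two runs can be sketched roughly as follows, using std::map as the container. This is not the original benchmark harness: run_pass and PassResult are hypothetical names, and std::chrono (C++11) is an assumed choice of timer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Result of one pass over the token stream (hypothetical helper type).
struct PassResult {
    std::size_t inserted;  // tokens that were not yet in the container
    double millis;         // wall-clock time for the whole pass
};

// Looks up every token and inserts it with a fresh id if it is missing.
PassResult run_pass(std::map<std::string, int>& table,
                    const std::vector<std::string>& tokens) {
    auto start = std::chrono::steady_clock::now();
    std::size_t inserted = 0;
    for (const std::string& tok : tokens) {
        if (table.find(tok) == table.end()) {
            table.emplace(tok, static_cast<int>(table.size()));
            ++inserted;
        }
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {inserted, elapsed.count()};
}
```

Calling run_pass twice on the same table and token list gives the “batch” run followed by the “retrieval” run; the second call should report zero insertions.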

Here are the results, in milliseconds. I used the Intel C++ Compiler, version 10, on my MacBook Pro (2.4GHz Core 2 Duo, 4GB RAM).

Whoa! Everyone was right. Well, I knew that insertion on our Trie was pants, but I was surprised to see that the cost was in no way amortized. In fact, it wasn’t even faster on retrieval! Why? I don’t know (yet), but if I had to guess, it’s because our Trie has to peek at memory a lot. David Gregg explained to me that the advantage of tries is sorting. He also theorized it might be faster on long strings, which is in line with my original benchmarks ages ago, where I used very long randomized strings. (Tries are also good for compression.)

Well, I has sad. Looks like we’ll be ditching KTrie for a chained hash table. At least for string interning. Someone is welcome to request a hash table ADT for SourceMod scripting as well.

I also benchmarked non-PGO ICC and non-PGO g++-4.2 if you would like to see other compilers.

Note 1: Go article returns next week.
Note 2: Firefox 3.5 comes out tomorrow (June 30th!)

Computer Go, Part 2

Last week I gave an introduction to Computer Go, and left off mentioning that the current hot topic is Monte-Carlo methods. These methods use repeated random gameplay to sample a position’s value.

How do programs use random gameplay to reach a good conclusion? While you can sample specific positions, you only have finite time. How do you balance simulating one interesting position over another?

A few years ago a breakthrough algorithm was published, called UCT (Upper Confidence Bounds applied to Trees). The algorithm attempts to minimize the amount of sampling needed to converge to an acceptable solution. It does this by treating each step in the game as a multi-armed bandit problem.

Bandit Problem

A bandit is a slot machine. Say you have a slot machine with some probability of giving you a reward. Let’s treat this reward as a random variable X. You don’t know the true expected value of X, but by playing the slot machine repeatedly, you can observe an empirical reward, which is just an average (\mu).

Now consider having k slot machines, each expressed as a random variable X_1 \ldots X_k. Once again, you don’t know the true expected value of any X_i. How can you choose the right slot machine such that you minimize your losses? That is, you want to minimize the loss you incur from not always playing the optimal machine.

Well, if you knew the true values of each machine, you could just play on the one with the greatest reward! Unfortunately you don’t know the true values, so instead you conduct trials on each machine, giving you averages \mu_i. You can then use this information to play on the observed best machine, but that might not end up being the optimal strategy. You really need to balance exploration (discovering \mu_i) with exploitation (taking advantage of the best \mu_i).

A paper on this problem published an algorithm called UCB1, or Upper Confidence Bounds, which attempts to minimize regret in such multi-armed bandit problems. It computes an upper confidence index for each machine, and the optimal strategy is to pick the machine with the highest such index. For more information, see the paper.
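As a rough illustration of picking the machine with the highest index, here is a small sketch. It assumes the index has the same form as the confidence index given for UCT further down this page (the average plus a square-root bonus); the Arm struct and pick_arm function are hypothetical names, and treating unplayed machines as having an infinite index is a common convention rather than something stated in this post.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Per-machine statistics (hypothetical layout).
struct Arm {
    double total_reward;  // sum of observed rewards
    int plays;            // number of times this machine was played
};

// Returns the position of the machine with the highest upper confidence
// index. Unplayed machines get an infinite index so each is tried once.
std::size_t pick_arm(const std::vector<Arm>& arms) {
    int total_plays = 0;
    for (const Arm& a : arms) total_plays += a.plays;

    std::size_t best = 0;
    double best_index = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < arms.size(); ++i) {
        double index = std::numeric_limits<double>::infinity();
        if (arms[i].plays > 0) {
            double mu = arms[i].total_reward / arms[i].plays;  // empirical average
            double bonus = std::sqrt(
                2.0 * std::log(static_cast<double>(total_plays)) / arms[i].plays);
            index = mu + bonus;
        }
        if (index > best_index) {
            best_index = index;
            best = i;
        }
    }
    return best;
}
```

The bonus term shrinks as a machine is played more often, so a rarely played machine with a mediocre average can still outrank a well-explored one. That is the exploration half of the trade-off.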


Levente Kocsis and Csaba Szepesvári’s breakthrough idea was to treat the “exploration versus exploitation” dilemma as a multi-armed bandit problem. In this case, “exploration” is experimenting with different game positions, and “exploitation” is performing Monte-Carlo simulations on a given position to approximate its \mu_i. UCT forms a game tree of these positions. Each node in the UCT tree stores \mu_i and a “visit” counter. The confidence index for each node is computed as:

\mu_i + \sqrt{\frac{2\ln(\text{parent.visits})}{\text{child.visits}}}

Each run of the algorithm traverses down the tree until it finds a suitable node on which to run a Monte-Carlo simulation. Once a node has received enough simulations, it becomes “mature,” and UCT will start exploring deeper through that position. Once enough overall simulations have been performed, any number of methods can be used to pick the best action from the top of the UCT tree.
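The descent described above might look something like the sketch below. It is a simplification under several assumptions, not the implementation from my project: the Node layout, the maturity threshold parameter, and the function names are all hypothetical, unvisited children are given an infinite index, and the sketch ignores the fact that a real two-player version has to score results from alternating players' points of view.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// One position in the UCT tree (hypothetical layout).
struct Node {
    double mu;                   // average result of simulations through here
    int visits;                  // number of simulations through here
    std::vector<Node> children;  // positions reachable in one move
};

// Confidence index of `child` as seen from `parent`.
double uct_index(const Node& parent, const Node& child) {
    if (child.visits == 0) return std::numeric_limits<double>::infinity();
    double bonus = std::sqrt(
        2.0 * std::log(static_cast<double>(parent.visits)) / child.visits);
    return child.mu + bonus;
}

// Walks down from `root`, following the highest index, and stops at the
// first node that is not yet mature (or has no children to descend into).
// `path` records every node visited so its statistics can be updated.
Node* descend(Node& root, int maturity, std::vector<Node*>& path) {
    Node* node = &root;
    path.push_back(node);
    while (node->visits >= maturity && !node->children.empty()) {
        Node* best = &node->children[0];
        for (Node& child : node->children) {
            if (uct_index(*node, child) > uct_index(*node, *best)) best = &child;
        }
        node = best;
        path.push_back(node);
    }
    return node;
}

// Folds one simulation result (e.g. 1 = win, 0 = loss) into the running
// average of every node on the path.
void update(const std::vector<Node*>& path, double result) {
    for (Node* n : path) {
        n->visits += 1;
        n->mu += (result - n->mu) / n->visits;
    }
}
```

One iteration is then: call descend() to find a node, run a Monte-Carlo simulation from that position, and pass the outcome to update().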

How this all worked confused me greatly at first, so I made what is hopefully an easy flow-chart diagram. In my implementations, UCT starts with an empty tree (save for the initial moves the player can make).

[Flow chart: UCT Algorithm]

UCT converges to an acceptable solution very quickly, can be easily modified to run in parallel, and can be easily interrupted. It has seen massive success; all competitive Go programs now use it in some form.

Levente’s paper: click here

Next week: Monte-Carlo simulations and my project.

Computer Go, Part 1

As a degree requirement, my university requires students to complete a “major qualifying project” (MQP) in their field of study. Usually these projects are done in teams of two to three people, and often they take place abroad. MQPs last seven weeks and they culminate with the group writing a thesis-length paper (80+ pages) and giving a twenty-minute presentation.

One of the projects available last quarter was about teaching computers to play Go. It took place in Budapest, Hungary, because one of the foremost researchers in the field (Levente Kocsis) works at the renowned Hungarian Academy of Sciences.

I really enjoy Go. It is an amazingly deep game where complex play is derived from very simple rules. Computers have a very difficult time playing it. Amateurs can defeat the best programs even when they run on supercomputers. Playing Go is very much an open problem in artificial intelligence.

The project was researching Go with Levente Kocsis. I was really eager to get on board! Because of my (bad) academic standing I had problems with the bilious idiots in the registrar and administration, but eventually I got in. In the end, no one else joined. I had to tackle the whole project myself. That’s implementation, research, testing, and writing that paper — all in seven weeks. Well, that’s a lot for one student with absolutely no AI experience.

If I had known this ahead of time, and I had also known that I’d be going into real compiler research, I might not have done this Go project. On the other hand, I loved Budapest. It’s a great city, and I’d go back in a heartbeat. I also learned a lot. So in another one or two blog posts, I will try to share a bit about the project, before it completely slips my mind.


Go is a two-player board game. The board is a 19×19 grid. Players take turns placing black and white stones on the intersections. The object of the game is to surround as many empty intersections as possible. The game ends when both players agree there are no more moves, at which point the player with more “territory” is the winner.

So why is Go really difficult for computers, while a desktop can beat grandmasters at Chess?

The standard Chess algorithm is α-β pruning over minimax trees. Unfortunately Go’s search space is too large for this to work efficiently (see chart). The average number of moves per position in Chess is around 35, whereas in Go it is 250. Even if you play Go on a small board (such as 9×9, for beginners), there are around 40-60 moves per position.
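To put those branching factors in perspective: before any pruning, a search that looks d moves ahead has to consider roughly b^d positions, where b is the average number of moves per position. Taking the averages quoted above and a lookahead of just four moves (d = 4 is an arbitrary choice for illustration):

35^4 = 1{,}500{,}625 \qquad 250^4 = 3{,}906{,}250{,}000

That is about 2,600 times as many positions for Go, and the gap widens with every extra move of depth.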

There are additional complications. In Chess there are opening books and end-game solvers. In Go there are opening styles and patterns, but they evolve rapidly over time and are not rote as in Chess. It is also possible to quickly evaluate the value of a Chess position, whereas in Go the game must be played until the end to get an accurate scoring.

So, that’s what doesn’t work. What does work?

Monte-Carlo Introduction

If there’s one thing a computer can do quickly, it’s picking moves at random until there is a position that can be scored. With enough random sampling, the true value of a position can be approximated within a certain accuracy. This is the basis of all Monte-Carlo algorithms, and it has become the foremost area of research for Go.
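The sampling idea can be sketched without a Go engine. In the snippet below, the playout callback stands in for “pick random moves until the position can be scored”; estimate_value and toy_playout are hypothetical names, and the toy position with a fixed 30% win rate is made up purely for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Averages the results of `samples` random playouts. `playout` should
// return 1.0 for a win and 0.0 for a loss.
double estimate_value(const std::function<double(std::mt19937&)>& playout,
                      int samples, std::mt19937& rng) {
    double total = 0.0;
    for (int i = 0; i < samples; ++i) total += playout(rng);
    return total / samples;
}

// Toy stand-in for a real position: random play wins 30% of the time.
double toy_playout(std::mt19937& rng) {
    std::bernoulli_distribution win(0.3);
    return win(rng) ? 1.0 : 0.0;
}
```

With more samples the estimate settles closer to the underlying win rate; the error shrinks roughly with the square root of the number of playouts, which is why cheap, fast playouts matter so much.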

The most common application is Monte-Carlo Tree Search. The idea is to play random games from various positions. The results can then be used to explore the most interesting paths in the game tree. The idea seems ridiculous both in its simplicity and stochasticity, but it works.

Next week: Bandit problems.

Trace Compilation at PLDI 2009

To mirror Dave Mandelin’s blog post, we’ve gotten a paper on trace compilation accepted to PLDI, one of the top programming language research conferences.

Andreas Gal is presenting the paper on June 18th in Dublin. You can read it by clicking here (requires PDF reader).

Trace compilation is the new optimization technology appearing in Firefox 3.5. If you want to see how it feels, give 3.5 beta a go (it’s stable and nearing release).

I’ll have a bigger and better post about this and sundry things soon. The short of it is, I’m back into trace/language research after a six-month reprieve. It is good to be back!