Portability: Variadic Arguments

Last week I mentioned a portability problem with variadic functions. Today’s topic is similar.

In late 2005 I transitioned ESEA from AMX Mod to AMX Mod X. We were only using it for a CSDM server. The server ran in 64-bit mode, so I installed 64-bit builds of AMX Mod X and CSDM, verified that they were running, and considered the job done.

Soon reports came in from users that the server wasn’t working – the gun menus were simply dying out instead of giving weapons. This bit of code in CSDM was failing (simplified):

Select All Code:
public OnMenuSelected(client, item)
{
   if (item == -1)
   {
      /* Do something */
   }
}

After hours of debugging, the problem became known (I believe it was PM who discovered it). To explain the problem, let’s take a look at what’s involved. AMX Mod X plugins use a data type for integers called a “cell.” Cells have a small catch over normal integers:

Select All Code:
#if defined __x86_64__
typedef int64_t cell;
#else
typedef int32_t cell;
#endif

It is 32-bit on 32-bit systems, and 64-bit on 64-bit systems. That’s unusual because on AMD64, an integer is 32-bit by default. The cell’s weird behaviour was a necessary but awkward idiosyncrasy resulting from some legacy code restrictions.

AMX Mod X relied on a single function for running stuff in plugins. This function’s job was to eat up parameters as cells, using va_arg, and to pass them to a plugin. For demonstration purposes, it looked like:

Select All Code:
int RunPluginFunction(const char *name, ...);

CSDM’s failing function was getting invoked like this:

Select All Code:
RunPluginFunction("OnMenuSelected", client, -1);

Now, let’s construct a sample program which demonstrates how this idea can break:

Select All Code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <stdio.h>
#include <stdint.h>
#include <stdarg.h>
 
#if defined __x86_64__
typedef int64_t cell;
#else
typedef int32_t cell;
#endif
 
void print_cells(int dummy, ...)
{
    cell val;
    va_list ap;
 
    va_start(ap, dummy);
    val = va_arg(ap, cell);
    printf("Test: %016Lx\n", val);
    va_end(ap);
}
 
int main()
{
    cell val = -1;
    print_cells(1, 1);
    print_cells(1, val);
    print_cells(1, -1);
    return 0;
}

This program has a small variadic routine which reads in a number as a cell and prints it. Our tests print 1, -1, and -1. Here’s what it outputs on AMD64:

Test: 0000000000000001
Test: ffffffffffffffff
Test: 00000000ffffffff

The first case looks good, but what’s up with the other two? We passed -1 in both times, but it came out differently! The reason is simple and I alluded to it earlier: AMD64 treats numbers as 32-bit by default, and thus that hardcoded -1 was 32-bit. The higher bits didn’t get used, but they’re there anyway because internally everything is stored in 64-bit chunks (registers are 64-bit and thus items on the stack tend to be 64-bit just to make things easy).

If you were to take that raw 64-bit data and interpret it as a 32-bit integer, it would read as -1. But as a 64-bit integer (or a cell), because of two’s complements, it’s not even negative! Of course, va_arg doesn’t know that we passed 32-bit data. It simply reads what it sees off the stack/register.

So what happened is that the plugin got a “chopped” value, and the comparison of 0xffffffffffffffff (64-bit -1) to 0x00000000ffffffff (32-bit -1 with some garbage) failed. As a fix, we went through every single instance of such a call that could have negative numbers, and manually casted each afflicted parameter to a 64-bit type.

The lesson? Avoid variadic functions as API calls unless you’re doing formatting routines. Otherwise you’ll find yourself documenting all of the resulting oddities on various platforms.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">