Portability: Variadic Arguments

Last week I mentioned a portability problem with variadic functions. Today’s topic is similar.

In late 2005 I transitioned ESEA from AMX Mod to AMX Mod X. We were only using it for a CSDM server. The server ran in 64-bit mode, so I installed 64-bit builds of AMX Mod X and CSDM, verified that they were running, and considered the job done.

Soon reports came in from users that the server wasn’t working – the gun menus were simply dying out instead of giving weapons. This bit of code in CSDM was failing (simplified):

Select All Code:
public OnMenuSelected(client, item)
{
   if (item == -1)
   {
      /* Do something */
   }
}

After hours of debugging, the problem became known (I believe it was PM who discovered it). To explain the problem, let’s take a look at what’s involved. AMX Mod X plugins use a data type for integers called a “cell.” Cells have a small catch over normal integers:

Select All Code:
#if defined __x86_64__
typedef int64_t cell;
#else
typedef int32_t cell;
#endif

It is 32-bit on 32-bit systems, and 64-bit on 64-bit systems. That’s unusual because on AMD64, an integer is 32-bit by default. The cell’s weird behaviour was a necessary but awkward idiosyncrasy resulting from some legacy code restrictions.

AMX Mod X relied on a single function for running stuff in plugins. This function’s job was to eat up parameters as cells, using va_arg, and to pass them to a plugin. For demonstration purposes, it looked like:

Select All Code:
int RunPluginFunction(const char *name, ...);

CSDM’s failing function was getting invoked like this:

Select All Code:
RunPluginFunction("OnMenuSelected", client, -1);

Now, let’s construct a sample program which demonstrates how this idea can break:

Select All Code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <stdio.h>
#include <stdint.h>
#include <stdarg.h>
 
#if defined __x86_64__
typedef int64_t cell;
#else
typedef int32_t cell;
#endif
 
void print_cells(int dummy, ...)
{
    cell val;
    va_list ap;
 
    va_start(ap, dummy);
    val = va_arg(ap, cell);
    printf("Test: %016Lx\n", val);
    va_end(ap);
}
 
int main()
{
    cell val = -1;
    print_cells(1, 1);
    print_cells(1, val);
    print_cells(1, -1);
    return 0;
}

This program has a small variadic routine which reads in a number as a cell and prints it. Our tests print 1, -1, and -1. Here’s what it outputs on AMD64:

Test: 0000000000000001
Test: ffffffffffffffff
Test: 00000000ffffffff

The first case looks good, but what’s up with the other two? We passed -1 in both times, but it came out differently! The reason is simple and I alluded to it earlier: AMD64 treats numbers as 32-bit by default, and thus that hardcoded -1 was 32-bit. The higher bits didn’t get used, but they’re there anyway because internally everything is stored in 64-bit chunks (registers are 64-bit and thus items on the stack tend to be 64-bit just to make things easy).

If you were to take that raw 64-bit data and interpret it as a 32-bit integer, it would read as -1. But as a 64-bit integer (or a cell), because of two’s complements, it’s not even negative! Of course, va_arg doesn’t know that we passed 32-bit data. It simply reads what it sees off the stack/register.

So what happened is that the plugin got a “chopped” value, and the comparison of 0xffffffffffffffff (64-bit -1) to 0x00000000ffffffff (32-bit -1 with some garbage) failed. As a fix, we went through every single instance of such a call that could have negative numbers, and manually casted each afflicted parameter to a 64-bit type.

The lesson? Avoid variadic functions as API calls unless you’re doing formatting routines. Otherwise you’ll find yourself documenting all of the resulting oddities on various platforms.

One thought on “Portability: Variadic Arguments

  1. Jasper

    I’m not sure this problem has anything to do with variadic functions…

    I believe this could have just been fixed by adding “L” (long) to the “-1″ like so:

    print_cells(1, -1L);

    I believe by default constants are “int” which on an LP64 machine would be 32-bit hence why you are only seeing the bottom half of -1 in that last print. Adding “L” tells the compiler to use a long constant (64-bit on LP64 machine)

    See “Numeric Constants” here: http://www.ibm.com/developerworks/library/l-port64/index.html (“Porting Linux applications to 64-bit systems”)

    Reply

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">