Monthly Archives: March 2008

Vista Problems (Surprise?)

I recently migrated to Vista (a story and a review for another day), but one thing has been bothering me. The “Visual Effects” settings (from My Computer Properties -> Advanced System Properties -> Performance Settings) do not stick.

If I customize the settings, they get reset on logoff or reboot. This is true no matter which account I use, Administrator or limited user. I tried to monitor the dialog box with procmon to see what registry keys were involved, but there were quite a few and they looked annoying to research.

The strange thing is that the dialog box is an administrator-only feature, which would imply that the settings are system-wide. Yet monitoring the dialog box shows all sorts of per-user settings go by.

I used the classic Windows theme for XP (I hated Luna). Aero is tolerable so I decided to give it a shot, but I don’t like all of the frilly, useless animations. For example, windows “slurping” into the task bar and menus fading in and out feel kitschy to me, and only seem to serve as some sort of visual distraction or delay. So I disabled all of these animations in the Visual Effects dialog, and soon discovered that as soon as I logged out and logged back in, I had to reapply all of the settings.

I couldn’t find any other instances of this problem on Google. Damaged Soul managed to find one but it contained a red herring and no solution. I gave up and solved it programmatically, with a small, very insecure (I was lazy) C program sitting in my Startup folder.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <tchar.h>
 
int main()
{
	HANDLE hToken;
	ANIMATIONINFO info;
 
	if (!LogonUser(_T("Administrator"),
		_T("KNIGHT"),
		_T("blahblahblah"),
		LOGON32_LOGON_BATCH,
		LOGON32_PROVIDER_DEFAULT,
		&hToken))
	{
		exit(1);
	}
 
	if (!ImpersonateLoggedOnUser(hToken))
	{
		exit(1);
	}
 
	SystemParametersInfo(SPI_SETDISABLEOVERLAPPEDCONTENT,
		0,
		(PVOID)TRUE,
		SPIF_SENDCHANGE);
 
	SystemParametersInfo(SPI_SETCOMBOBOXANIMATION,
		0,
		(PVOID)FALSE,
		SPIF_SENDCHANGE);
 
	/* SPI_SETDRAGFULLWINDOWS takes its on/off flag in uiParam, not pvParam. */
	SystemParametersInfo(SPI_SETDRAGFULLWINDOWS,
		FALSE,
		NULL,
		SPIF_SENDCHANGE);
 
	SystemParametersInfo(SPI_SETSELECTIONFADE,
		0,
		(PVOID)FALSE,
		SPIF_SENDCHANGE);
 
	SystemParametersInfo(SPI_SETCLIENTAREAANIMATION,
		0,
		(PVOID)FALSE,
		SPIF_SENDCHANGE);
 
	SystemParametersInfo(SPI_SETMENUANIMATION,
		0,
		(PVOID)FALSE,
		SPIF_SENDCHANGE);
 
	info.cbSize = sizeof(ANIMATIONINFO);
	info.iMinAnimate = 0;
	SystemParametersInfo(SPI_SETANIMATION, 
		sizeof(ANIMATIONINFO),
		&info, 
		SPIF_SENDCHANGE);
}

Two notes from playing with this API:

SPI_SETDISABLEOVERLAPPEDCONTENT appears to do nothing? I thought it would be related to Transparent Glass, but… Transparent Glass is a user-mode (per-user?) setting. You can change it in your display preferences, and oddly enough, that will cause it to flip the switch in the Administrator-only settings! I have no idea what’s going on there. Also, transparent glass is the only “effect” setting not to be reset on logging off.
SPIF_UPDATEINIFILE fails with ERROR_MOD_NOT_FOUND on Vista. Maybe it does that on previous versions too, I have no idea. Maybe I’m forgetting to link to something or maybe I’ve missed a security policy thing somewhere that fixes all of my problems.

Now that I have my hacky fix, I don’t feel like investigating the problem any further. But it’d be nice to know what’s going on here, and why those settings can’t be per-user in the first place.

Why I’m not an ECE Major

I’ve always felt that Computer Science is somewhat of a liberal arts major compared to the hardy breed that is electrical engineers. So I’m taking an introductory ECE course to satisfy my own curiosity.

Today we had a simple lab demonstrating digital logic. I prepared part of the circuit on our breadboard and my lab partner finished the rest.

Unfortunately, nothing worked. I spent about fifteen minutes rewiring and pulling apart the circuit until nothing was left but the breadboard and a single logic gate. Even then I still wasn’t getting the output I expected.

I said, perplexed, “Well, I have no idea what’s wrong.” My lab partner stared at the board for a few seconds and then remarked, “Does it matter that you have the power source plugged into the GND line instead of the power line?”

I had pulled apart our entire circuit because I had forgotten to plug in the power, once again reminding me that I should keep a safe distance from anything resembling hardware.

Answer to va_list on Microsoft’s x64

A while back I wrote about bad va_list assumptions. Recap: AMD64 passes some arguments through registers, and GNU’s va_list structure changes to accommodate that. Such changes mean you need to use va_copy instead of relying on x86 assumptions.

Microsoft does not have va_copy, so I was unsure how their x64 compiler solved the problem. I had three guesses: 1) va_list could be copied through assignment, 2) all variadic functions required every parameter to be on the stack, or 3) something else.

It turned out to be something else. Microsoft takes a rather strange approach. The caller reserves space on the stack for the arguments that are passed in registers. Then it moves the data into the respective registers but doesn’t touch that stack space. The variadic callee then stores these register values into the reserved space above its frame, so they sit contiguously with any other variadic parameters already on the stack.

For example, here is how logmessage() gets called:

000000013F7F1060  sub         rsp,28h 
    logmessage("%s %s %s\n", "a", "b", "c");
000000013F7F1064  lea         r9,[string "c" (13F7F21B0h)] 
000000013F7F106B  lea         r8,[string "b" (13F7F21B4h)] 
000000013F7F1072  lea         rdx,[string "a" (13F7F21B8h)] 
000000013F7F1079  lea         rcx,[string "%s %s %s\n" (13F7F21C0h)] 
000000013F7F1080  call        logmessage (13F7F1000h)

And here is logmessage()’s prologue, which immediately saves its four arguments in the stack space above its frame.

void logmessage(const char *fmt, ...)
{
000000013F7F1000  mov         qword ptr [rsp+8],rcx 
000000013F7F1005  mov         qword ptr [rsp+10h],rdx 
000000013F7F100A  mov         qword ptr [rsp+18h],r8 
000000013F7F100F  mov         qword ptr [rsp+20h],r9

After doing that, the register complication of AMD64 is removed, because everything just sits on the stack. Thus the va_list variable can be re-used because it’s just a by-value pointer to the stack:

    va_start(ap, fmt);
000000013F7F1019  lea         rbx,[rsp+38h] 
    vfprintf(stdout, fmt, ap);
000000013F7F101E  call        qword ptr [__imp___iob_func (13F7F2138h)]

And indeed, it appears to work fine:

a b c
a b c
Press any key to continue . . .

This implementation is interesting to me and I’d love to know the reasoning behind it. I have one big guess: it preserves the calling convention. The other option is to say, “all variadic functions must pass everything on the stack.” Perhaps that additional bit of complexity was undesired, or perhaps there are optimization cases where you’d want variadic functions that don’t immediately use the stack or va_list, but still need CRT compatibility.

Whatever the case, it’s not a big deal.

And, if you were wondering: You can indeed assign va_list pointers on Microsoft’s x64 compiler. GNU forbids that so I’m unsure if that’s intended or an accident on Microsoft’s part.
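
For code that has to build on both toolchains, the portable route is still va_copy wherever it exists (it is part of C99). Here is a minimal sketch of that pattern, assuming a C99 compiler. format_twice is a hypothetical helper made up for illustration, not the logmessage() shown above:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: formats the same argument list into two buffers.
 * The first vsnprintf call consumes `ap`, so a copy is taken up front
 * with va_copy instead of reusing `ap` directly. */
void format_twice(char *first, char *second, size_t size,
                  const char *fmt, ...)
{
    va_list ap, ap2;

    va_start(ap, fmt);
    va_copy(ap2, ap);                  /* independent cursor over the same args */

    vsnprintf(first, size, fmt, ap);   /* consumes ap */
    vsnprintf(second, size, fmt, ap2); /* consumes the copy */

    va_end(ap2);
    va_end(ap);
}
```

Calling format_twice(a, b, sizeof a, "%s %s %s", "a", "b", "c") should leave the same text in both buffers, whichever way the platform represents va_list.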

IA32/x86 and GCC’s fPIC

Lately Valve has started using GCC’s fPIC option to compile their Linux binaries, and I remain unconvinced that this is a good idea.

The purpose of fPIC is to generate position-independent code, that is, code that references data positions without the need for code relocation. Instead of referencing data sections by their actual address, you reference them by an offset from the program counter. In and of itself, it’s not a bad idea.

My observation on fPIC is that its usefulness varies depending on the platform. AMD64 has a built-in mechanism for referencing memory as an offset from the program counter. This makes generating PIC code nearly trivial, and can reduce generated code size because you don’t need full 64-bit address references. On the other hand, it can actually complicate relocation. Since the references are 32-bit, the data cannot be relocated more than 2GB away from the code. That’s a minor problem for loaders, but certainly a nastier problem for people implementing detours and the like.
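
To make the 2GB limit concrete, here is a small sketch of the range check a detour or patching tool might run before emitting a RIP-relative reference. fits_rel32 is a made-up helper name. It assumes the displacement is a signed 32-bit value measured from the address of the next instruction:

```c
#include <stdint.h>

/* Returns 1 if `target` is reachable with a signed 32-bit displacement
 * measured from `next_ip` (the address of the following instruction),
 * and 0 otherwise. */
int fits_rel32(uint64_t next_ip, uint64_t target)
{
    if (target >= next_ip) {
        /* Forward reference: at most 2GB - 1 bytes ahead. */
        return (target - next_ip) <= (uint64_t)INT32_MAX;
    }
    /* Backward reference: at most 2GB behind. */
    return (next_ip - target) <= (uint64_t)INT32_MAX + 1u;
}
```

If the check fails, the reference has to go through a full 64-bit address held in a register or a nearby table, which is where the extra work for detours comes from.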

So, what about x86? It has no PC-relative addressing for data. In fact, it doesn’t even have an instruction to get the program counter (EIP)! Let’s take a simple C++ code snippet, and look at the disassembly portion for modifying g_something:

int g_something = 0;
 
int do_something(int x)
{
    g_something = x;
    return ++g_something;
}

With GCC flags “-O3” I get this assembly routine:

0x080483d7 <_Z12do_somethingi+7>:       mov    ds:0x804960c,eax

With GCC flags “-fPIC -O3” I get this:

0x0804849a <__i686.get_pc_thunk.cx+0>:  mov    ecx, [esp]
0x0804849d <__i686.get_pc_thunk.cx+3>:  ret
 
0x08048441 <_Z12do_somethingi+1>:       call   0x8048496 <__i686.get_pc_thunk.cx>
0x08048446 <_Z12do_somethingi+6>:       add    ecx,0x12b6
0x08048451 <_Z12do_somethingi+17>:      mov    edx,DWORD PTR [ecx-0x8]
0x08048458 <_Z12do_somethingi+24>:      mov    DWORD PTR [edx],eax

The non-PIC version is one instruction. The PIC version is six instructions, counting the thunk. As if that weren’t bad enough, there’s an entire branch added into the fray! Let’s look at what it’s doing:

The call instruction calls a routine which simply returns the value at [esp]. The value at [esp] is the return address. This is a fairly inefficient way to get the program counter, but (as far as I know) the only way on x86 while avoiding relocation.
A constant offset is added to the EIP value. The new address points to the global offset table, or GOT. The GOT is a big table of addresses, each entry being the address of an item in the data section. The entries in this table require relocation patching by the loader (and the code, consequently, does not).
The actual address to the data is computed by looking up the GOT entry.
Finally, the value can be stored in the data’s memory.
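
The four steps above amount to an extra pointer load before every access to a global. Here is a rough C model of that indirection. It is only an analogy: a real GOT is laid out by the linker and filled in by the dynamic loader, and every name ending in _model is made up for illustration:

```c
/* Model only: stands in for the g_something global from the example. */
int g_something_model = 0;

/* One-entry stand-in for the global offset table. In a real binary the
 * loader patches these slots; the code that reads them stays untouched. */
int *const got_model[] = { &g_something_model };

int do_something_pic_model(int x)
{
    int *slot = got_model[0];   /* look up the data address in the table */
    *slot = x;                  /* store through that address */
    return ++*slot;
}
```

The non-PIC version skips the table lookup entirely because the address of the global is baked into the instruction.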

Meanwhile, let’s look at the AMD64 versions. I apologize for the ugly AT&T syntax; GDB won’t show RIP-addressing on Intel mode.

PIC version:

0x0000000000400560 <_Z12do_somethingi+0>:       mov    1049513(%rip),%rdx        # 0x500910 <_DYNAMIC+448>
0x000000000040056a <_Z12do_somethingi+10>:      mov    %eax,(%rdx)

Non-PIC version:

0x0000000000400513 <_Z12do_somethingi+3>:       mov    %eax,1049587(%rip)        # 0x50090c <g_something>

Although there’s still one extra instruction, that’s a lot more reasonable. So, why would anyone generate fPIC code on x86?

Supposedly without any relocations, the operating system can keep one central, unmodified copy of a library’s code in memory. To me, this seems like a pretty meaningless advantage. Unless you’ve got 4MB of memory, chances are you have plenty of it (especially if you’re running Half-Life 1/2 servers). Also, the cost of relocation should be considered a negligible one-time expense. If it weren’t, it’d mean you were probably doing something silly like loading a shared library quickly and repeatedly.

My thoughts on this matter are shared by the rest of the AlliedModders developers: don’t use GCC’s fPIC. On x86 the generated code is a lot uglier and slower because the processor doesn’t facilitate such addressing. On AMD64 the difference is small, but even so — as far as I know, Microsoft’s compiler doesn’t ever use such a crazy scheme. Microsoft uses absolute addressing on x86 and RIP-relative addressing on AMD64, and at least on x86 (I’m willing to bet on AMD64 as well), they’ve never used a global offset table for data.

Conclusion: Save yourself the run-time expense. Don’t use GCC’s fPIC on x86. If you’ve got a reason explaining otherwise, I’d love to hear it. This issue has been eating at me for a long time.

(Note: Yes, we told Valve about this. Of course, it was ignored, but that no longer bothers me.)

Turtles and Random Bits

I was brushing up on sorting algorithms and came across this one whose goal is to kill turtles. How uncouth!

Speaking of turtles, I’ve received two comments on dealing with TortoiseSVN’s terrible caching program. sawce replaced its executable with a dummy that exits on startup. Damaged Soul says simply removing the executable does the job as well.

Not speaking of turtles, I think I may have to start a new segment on “disappointing software.” My Motorola Q decided to format itself this week, and I had to reinstall everything. The first thing that I always install after wiping my smartphone is a program to play Go. Go is extremely difficult for computers to play, so I don’t expect much from an embedded processor. The best program I’ve found so far (that works) has been Pocket GNU Go.

I always forget its name though, so while searching for it I found a (relatively new?) program called Go Mobile. It bragged about having good graphics so I tried it out. By the third move or so it seemed to crap out. I don’t have a screenshot but the move looked like this (the triangled white piece):

To make a Chess comparison, that move is like skipping your turn and then giving a random piece to your opponent. If you’re going to make a Go program but you don’t want to bother writing playable AI for it, why not embed one of the existing solutions?

The lack of good pocket programs for Go is disappointing. Even the GNU Go port I use is very old; it’s an entire major release out of date with the PC version. By contrast, about seven years ago I had a Palm 3 (black and white, 2MB of RAM) that had an amazingly complete chess program.

Don’t call me pedantic!

A few weeks ago in a class, the professor mentioned that C does not support nested functions. He wrote an example on the board that he claimed would not compile.

A student quickly typed up the program on his laptop and said, “GCC compiled it just fine.” The professor replied, “Well, I’ve never heard of that. It must be a compiler feature unless they changed the standard.”

The student then said, “Well, I added -ansi and GCC still compiles it.” The professor caved in — “I guess they changed the standard recently.”

I was sure they most certainly didn’t. This incident bothered me since I knew nested functions couldn’t possibly be in C99. When I got home I reproduced the scenario, and sure enough, the student was right. GCC was compiling completely invalid syntax even with the “-ansi” setting. After digging through the documentation, I was able to find two things:

GCC has its own custom extensions to the C language, which includes nested functions.
This custom extension is enabled by default even if you choose a language specification where it does not exist.

As if that weren’t strange enough, GCC’s option to complain about its custom additions appears to be called “-pedantic” (and as far as I can tell it only warns; “-pedantic-errors” is what turns those warnings into errors). Uh, I don’t think it’s pedantic if I want the language to conform to the actual standard. As much as I like GCC and its custom extensions (many of which are pretty cool), they should be opt-in, not opt-out, in compliance modes.

I frequently see people say “Microsoft’s stupid compiler doesn’t conform to ANSI.” Well, after this, I’m not so convinced GCC is innocent either.

That said, GCC’s nested function feature and its implementation are both very cool. Taking a look at the disassembly, it does run-time code generation on the stack. Simple example:

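#include <stdio.h>

/* Calls g through a plain function pointer. */
int blah(int (*g)())
{
    return g();
}

int main()
{
    int i = 0;

    /* GCC extension: a nested function that modifies main's local i. */
    int g()
    {
        return ++i;
    }

    g();
    blah(g);

    printf("%d\n", i);
    return 0;
}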
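
For comparison, standard C gets the same effect by passing the enclosing state explicitly, which is roughly the job the stack trampoline does behind the scenes. Here is a sketch of that portable equivalent. All of the names (counter_ctx, g_portable, blah_portable, run_portable) are hypothetical:

```c
/* Explicit stand-in for the enclosing frame that GCC's nested g()
 * reaches implicitly. */
struct counter_ctx {
    int i;
};

/* The "nested" function, hoisted to file scope and handed its context. */
int g_portable(void *ctx)
{
    struct counter_ctx *c = ctx;
    return ++c->i;
}

/* Counterpart of blah(): the callback and its context travel together. */
int blah_portable(int (*fn)(void *), void *ctx)
{
    return fn(ctx);
}

/* Counterpart of main() above: one direct call, one indirect call. */
int run_portable(void)
{
    struct counter_ctx c = { 0 };

    g_portable(&c);
    blah_portable(g_portable, &c);

    return c.i;
}
```

The cost is that every callback signature has to carry the extra context argument, which is exactly what the GCC extension hides.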

Mystery Bail Theater

Sometimes worth reading
