Advanced programming tips, tricks and hacks for Mac development in C/Objective-C and Cocoa.

Type punning isn't funny: Using pointers to recast in C is bad.

A very common C technique for reinterpreting data types has the potential to cause nasty bugs. Apple knows this, which is why the implementation of NSRectToCGRect (correctly) doesn't do what the documentation claims. I'll show you a technique to perform reinterpret casts safely in your own code.

Apple's documentation for the function NSRectToCGRect claims that it is implemented as follows:

CGRect NSRectToCGRect(NSRect nsrect) {
   return (*(CGRect *)&(nsrect));
}

If you have seen a lot of C code, chances are that you've seen this approach before. You can't cast one struct to another to reinterpret — even if they have the same fields — so it is common to see reinterpreting by making a pointer and casting the pointer.

The implication is that NSRectToCGRect reinterprets an NSRect as a CGRect without altering the contained data.

While the implied functionality is accurate, the displayed implementation is not. In actuality, the function looks like this:

NS_INLINE CGRect NSRectToCGRect(NSRect nsrect) {
    union _ {NSRect ns; CGRect cg;};
    return ((union _ *)&nsrect)->cg;
}

Why the difference? Why bother creating a union? Why shouldn't you simply cast through a pointer?

Type punning

As common as casting through a pointer is, it is actually bad practice and potentially risky code. Casting through a pointer has the potential to create bugs because of type punning.

Type punning
A form of pointer aliasing where two pointers refer to the same location in memory but represent that location as different types. The compiler will treat both "puns" as unrelated pointers. Type punning has the potential to cause dependency problems for any data accessed through both pointers.

Most of the time, type punning won't cause any problems. It is considered undefined behavior by the C standard but will usually do the work you expect.

That is, unless you're trying to squeeze more performance out of your code through optimizations. Specifically, if you ever turn on "Enforce Strict Aliasing" in Xcode (a.k.a. -fstrict-aliasing in GCC) you run the risk of unpredictable and errant behavior. With strict aliasing, the compiler may start doing things in the wrong order or leaving instructions out entirely.

To be clear, these bugs can only occur if you dereference both pointers (or otherwise access their shared data) within a single scope or function. Just creating a pointer should be safe.
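
As a minimal sketch of what the compiler is allowed to assume, consider the function below. The names write_then_read and check_distinct_objects are made up for illustration and don't come from any Apple or GCC source. Because int and float are different types, a compiler enforcing strict aliasing may assume the two pointers never refer to the same memory, so it is free to return the constant 1 without reloading *i after the store through f.

```c
#include <assert.h>

/* Hypothetical example: both pointers are dereferenced in one function.
   Under strict aliasing the compiler may assume i and f never overlap,
   so it can return 1 directly instead of re-reading *i. */
int write_then_read(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;   /* assumed not to modify *i */
    return *i;
}

/* Well-defined use: the pointers refer to two distinct objects. */
void check_distinct_objects(void)
{
    int i = 0;
    float f = 1.0f;
    assert(write_then_read(&i, &f) == 1);
    assert(f == 0.0f);
}
```

Calling write_then_read(&x, (float *)&x) on a single int x is the punned case: the behavior is undefined, and the value returned may differ between optimization levels.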

An example of a punning bug

Before the NSRectToCGRect function existed, I had some code which did the following:

NSRect ellipseBounds;
ellipseBounds.origin.x = 0;
ellipseBounds.origin.y = 0;
ellipseBounds.size.width = WIDGET_SIZE - 1.0;
ellipseBounds.size.height = WIDGET_SIZE - 1.0;
ellipseBounds = NSInsetRect(ellipseBounds, 4, 4);

CGContextAddEllipseInRect(context, *(CGRect *)&ellipseBounds);
CGContextFillPath(context);

This code creates and sets up an NSRect and then reinterprets it as a CGRect before using it.

In this case, with -fstrict-aliasing enabled, GCC chose to order the NSInsetRect after the call to CGContextAddEllipseInRect because the dependency between the two was broken by type punning when the pointer to ellipseBounds was dereferenced as a different type.

Union solves the problem

The traditional solution to this problem, which keeps the code correct with -fstrict-aliasing enabled, is to use a union. As shown in the NSRectToCGRect code, the union should contain the source and destination types, and you simply set or cast to the source type before reading from the destination type.
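
Here is a rough sketch of that pattern. So it compiles without the Cocoa headers, it uses two stand-in struct types, DemoNSRect and DemoCGRect, which are my own assumptions and not Apple's definitions. The function name demo_ns_to_cg is likewise hypothetical. The value is written to the source member and read back from the destination member, so every access goes through the union type.

```c
#include <assert.h>

/* Stand-in types (assumptions for illustration only); in real code
   these would be NSRect and CGRect from the system headers. */
typedef struct { double x, y, width, height; } DemoNSRect;
typedef struct { double x, y, width, height; } DemoCGRect;

/* Reinterpret a DemoNSRect as a DemoCGRect through a union. */
DemoCGRect demo_ns_to_cg(DemoNSRect ns)
{
    union { DemoNSRect ns; DemoCGRect cg; } pun;

    /* The technique only makes sense when both layouts match. */
    assert(sizeof(DemoNSRect) == sizeof(DemoCGRect));

    pun.ns = ns;      /* write the source member */
    return pun.cg;    /* read the destination member */
}
```

With this in place, a call like the one in my ellipse example would pass demo_ns_to_cg(bounds) instead of dereferencing a recast pointer.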

According to the C standard, anything involving type punning is implementation specific. So in a "standard" sense, using a union doesn't necessarily solve the problem. According to the standard, if you set data in a union on one field, you are required to read back from the same field.

Fortunately, GCC explicitly gives permission to do otherwise. From the GCC documentation:

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.

Excellent.

A macro to reinterpret your own data safely

Really simple:

#define UNION_CAST(x, destType) \
	(((union {__typeof__(x) a; destType b;})(x)).b)

This example now incorporates the "__typeof__" suggestion made by Daniel Néri in the comments.

So you could reinterpret the bits of a float variable named myFloat as an int as follows:

int myInt = UNION_CAST(myFloat, int);
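
For a slightly fuller sketch, the block below wraps the macro in a small helper. The helper name float_bits is hypothetical. I've parenthesized x defensively, and I use int32_t rather than int so the sizes match explicitly. Note the assumptions: casting to a union type is a GCC extension (Clang accepts it too) that works in C but not C++, and the bit pattern mentioned afterwards assumes 32-bit IEEE 754 floats.

```c
#include <assert.h>
#include <stdint.h>

/* The UNION_CAST macro from this article, with x parenthesized.
   Relies on GCC extensions: __typeof__ and cast-to-union. */
#define UNION_CAST(x, destType) \
    (((union {__typeof__(x) a; destType b;})(x)).b)

/* Hypothetical helper: return the raw bits of a float as an int32_t. */
int32_t float_bits(float value)
{
    /* Reinterpreting only makes sense when both types are the same size. */
    assert(sizeof(float) == sizeof(int32_t));
    return UNION_CAST(value, int32_t);
}
```

On a platform with IEEE 754 floats, float_bits(1.0f) gives 0x3F800000, which is the stored bit pattern and not the converted value 1.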

You might notice that I don't bother with an inline function, I don't give the union a name, and I don't make a pointer to the value before casting. The Apple NSRectToCGRect function did these things but they are unnecessary. That said, the compiler should optimize away the extra work, so the function, the extra pointer and the dereference in Apple's code shouldn't matter.

Conclusion

Creating a pointer to a value and recasting the pointer to a new type is the most common way to reinterpret data in C that I've seen. Despite its prevalence, you shouldn't do it. Always do your reinterpret casts through a union. It could save you a lot of trouble if you're ever trying to squeeze out more performance through compiler options.
