Advanced programming tips, tricks and hacks for Mac development in C/Objective-C and Cocoa.

Memory and thread-safe custom property methods

Objective-C 2.0 property methods are a nice convenience but if you need to override a property implementation — particularly an atomic, retained or copied object setter — there are some potential bugs you can create if you don't follow the rules carefully. I'll show you the pitfalls and the correct way to implement a property accessor. I'll also show a way to directly invoke hidden runtime functions to let Objective-C perform atomic getting and setting safely for you.

Custom getter and setter methods for implicitly atomic types

For implicitly atomic types or for types where memory management doesn't apply, custom getter and setter methods in Objective-C are easy. These "easy" situations include:

  • Basic value types (char, short, int, float, long, double, etc).
  • Objective-C objects in a garbage collected environment
  • Assigned (non-retained) pointers

For these types, it is pretty hard to get a custom getter or setter method wrong. For the following property declaration:

@property SomeAtomicType somePropertyVariable;

the custom getter and setter simply look like this:

- (SomeAtomicType)somePropertyVariable
{
    return somePropertyVariable;
}
- (void)setSomePropertyVariable:(SomeAtomicType)aValue
{
    somePropertyVariable = aValue;
}

Common mistakes in accessor methods for non-atomic types

Non-atomic types require greater care. These types include:

  • Objective-C objects in a manually managed memory environment
  • structs and other compound types

Given how simple custom getter and setter methods are for atomic types, it is easy to be complacent about implementing methods for these types. However, following the wrong approach can lead to memory crash bugs and lack of proper thread safety.

To illustrate how simple it can be to introduce bugs while implementing a custom setter method, consider the following declared property:

@property (copy) NSString *someString;

A hasty implementation of the setter might be:

- (void)setSomeString:(NSString *)aString
{
    [someString release];
    someString = [aString copy];
}

This implementation actually contains two bugs:

  1. This method is not atomic.
    The someString instance variable changes twice: once when the old value is released and again when it is assigned the copied object's address. In between, someString points to a released object, so a concurrent reader could fetch an invalid value. This violates the declaration, which omits the nonatomic keyword and therefore requires atomicity.
  2. The assignment contains a potential memory deallocation bug.
    If someString is ever assigned its own value, the setter will release it before copying it, so the copy message is sent to a released object. The code self.someString = someString; is an example of this potential issue.

Don't feel too bad if you've ever made these mistakes. I spent some time looking at clang's synthesized method implementations when I was researching this post and I noticed that they've forgotten to handle struct accessor methods in an atomic manner when required.

Safe implementations of custom accessor methods for non-atomic types

To address this second issue, Apple's Declared Properties documentation suggests that your setter methods should look like this:

- (void)setSomeString:(NSString *)aString
{
    if (someString != aString)
    {
        [someString release];
        someString = [aString copy];
    }
}

This only fixes the memory issue; it doesn't fix the atomicity issue. To handle that, the only simple solution is to use a @synchronized section:

- (void)setSomeString:(NSString *)aString
{
    @synchronized(self)
    {
        if (someString != aString)
        {
            [someString release];
            someString = [aString copy];
        }
    }
}

This approach also works for retain properties (simply replace the copy message with a retain).

To maintain atomicity, you also need a retain/autorelease pattern and lock on any getter methods too:

- (NSString *)someString
{
    NSString *result;
    @synchronized(self)
    {
        result = [someString retain];
    }
    return [result autorelease];
}

The @synchronized section is only required around the retain since that will prevent a setter releasing the value before we can return it (the autorelease is then safely done outside the section).

For struct and other compound data types, we don't need to retain or copy, so only the @synchronized section is required:

- (NSRect)someRect
{
    @synchronized(self)
    {
        return someRect;
    }
}
- (void)setSomeRect:(NSRect)aRect
{
    @synchronized(self)
    {
        someRect = aRect;
    }
}

A faster, shorter way to implement custom accessors

There are two negative points to the custom accessor methods listed above:

  • They need to be coded exactly to avoid bugs.
  • The @synchronized section on self is coarse-grained and slow.

There is another way to implement these methods that doesn't require as much careful coding and uses much more efficient locking: use the same functions that the synthesized methods use.

The following functions are implemented in the Objective-C runtime:

id objc_getProperty(id self, SEL _cmd, ptrdiff_t offset, BOOL atomic);
void objc_setProperty(id self, SEL _cmd, ptrdiff_t offset, id newValue, BOOL atomic,
    BOOL shouldCopy);
void objc_copyStruct(void *dest, const void *src, ptrdiff_t size, BOOL atomic,
    BOOL hasStrong);

While these functions are implemented in the runtime, they are not declared in any public header, so if you want to use them, you must declare them yourself (the linker will then resolve your declarations against the implementations in the runtime).

These functions are much faster than using a @synchronized section on the whole object because (as shown in Apple's open source implementation) they use a fine-grained spin lock keyed to the instance variable's address for concurrent access (although the copy struct function uses two locks following an interface design mixup).
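The flavor of that locking strategy can be sketched in plain C. The following is purely illustrative (hypothetical names, C11 atomics rather than the runtime's actual lock implementation): a small table of spin locks indexed by hashing the instance variable's address, so accessors for unrelated variables rarely contend and the whole object is never locked.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative sketch only — not Apple's code. */
#define LOCK_COUNT 16

/* Static storage zero-initializes these flags to the clear (unlocked)
   state on common implementations; strictly, ATOMIC_FLAG_INIT applies. */
static atomic_flag property_locks[LOCK_COUNT];

static atomic_flag *lock_for_address(const void *ivar_address)
{
    /* Shift off alignment bits, then hash into the small lock table. */
    uintptr_t hash = ((uintptr_t)ivar_address >> 4) % LOCK_COUNT;
    return &property_locks[hash];
}

typedef struct { const char *someString; } Example;  /* hypothetical */

void example_set_string(Example *self, const char *newValue)
{
    atomic_flag *lock = lock_for_address(&self->someString);
    while (atomic_flag_test_and_set(lock)) { /* spin */ }
    self->someString = newValue;   /* only this ivar is guarded */
    atomic_flag_clear(lock);
}

const char *example_get_string(Example *self)
{
    atomic_flag *lock = lock_for_address(&self->someString);
    while (atomic_flag_test_and_set(lock)) { /* spin */ }
    const char *result = self->someString;
    atomic_flag_clear(lock);
    return result;
}
```

Because the lock is derived from the ivar's address rather than taken on self, two threads setting different properties of the same object don't block each other.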

When you declare these functions, you can also declare the following convenience macros:

#define AtomicRetainedSetToFrom(dest, source) \
    objc_setProperty(self, _cmd, (ptrdiff_t)(&dest) - (ptrdiff_t)(self), source, YES, NO)
#define AtomicCopiedSetToFrom(dest, source) \
    objc_setProperty(self, _cmd, (ptrdiff_t)(&dest) - (ptrdiff_t)(self), source, YES, YES)
#define AtomicAutoreleasedGet(source) \
    objc_getProperty(self, _cmd, (ptrdiff_t)(&source) - (ptrdiff_t)(self), YES)
#define AtomicStructToFrom(dest, source) \
    objc_copyStruct(&dest, &source, sizeof(__typeof__(source)), YES, NO)

I like to include the "To/From" words so I can remember the ordering of the source and destination parameters. You can remove them if they bother you.

With these macros, the someString "copy" getter and setter methods above would become:

- (NSString *)someString
{
    return AtomicAutoreleasedGet(someString);
}
- (void)setSomeString:(NSString *)aString
{
    AtomicCopiedSetToFrom(someString, aString);
}

and the someRect accessor methods shown above would become:

- (NSRect)someRect
{
    NSRect result;
    AtomicStructToFrom(result, someRect);
    return result;
}
- (void)setSomeRect:(NSRect)aRect
{
    AtomicStructToFrom(someRect, aRect);
}

Conclusion

Most of the accessor methods I've shown here are atomic but in reality, most Objective-C object accessors are declared nonatomic.

Even if your properties are declared nonatomic, the memory management rules still apply. These rules are important to follow since they can lead to some very obscure and hard to track down memory bugs.

The macros I've provided are all for atomic properties. For non-atomic properties the boilerplate assignment code is probably simple enough to remember. If not, you could also use a macro:

#define NonatomicRetainedSetToFrom(a, b) do{if(a!=b){[a release];a=[b retain];}}while(0)
#define NonatomicCopySetToFrom(a, b) do{if(a!=b){[a release];a=[b copy];}}while(0)

Update: following comments below, I realize I omitted to qualify the situations in which these accessors are thread-safe. Specifically:

  1. These setter methods are only thread-safe if the parameters passed to them are immutable. For mutable parameters, you may need to ensure thread safety between mutations on the parameter and the assignment of the property.
  2. Atomic accessors only provide thread safety to an instance variable if they are the sole way you access the instance variable. If non-property access is required, you must ensure shared thread safety between property accessor methods and the non-property access.
  3. Atomic assignment for the "implicitly atomic" types I listed does not mean that all CPUs/cores see the same thing (since each CPU/core could have its own cache of the value) — it only ensures that value is wholly set without possibility of interruption. If you require all CPUs/core to be synchronized and see the same value at a given moment, then even the "implicitly atomic" types may require volatile qualifiers or a @synchronized section around the assignment to flush caches.

How blocks are implemented (and the consequences)

This post is a look at how clang implements blocks and how this implementation leads to a number of strange behaviors including local variables that end up global, Objective-C objects allocated on the stack instead of the heap, C variables that behave like C++ references, Objective-C objects in non-Objective-C languages, copy methods that don't copy and retain methods that don't retain.

What blocks are to the compiler

Blocks are addressable sections of code implemented inline (inside other functions). The inline-edness can be convenient but the real reason why blocks are different to regular functions and function pointers is that they can reference local variables from the scope of the function surrounding their implementation without the invoker of the block needing to know of the surrounding scope variables' existence.

A block is implemented internally using two pieces:

  1. compiled code in the .text segment of the executable
  2. a data structure that predominantly contains the values of the variables that the block uses from its surrounding scope

The compiled code lives in its own separate location and does not actually reside inside the code of its surrounding scope. In implementation, the code is a function like any other. If you run:

otool -tV MyCompiledExecutable

then you'll see your blocks appearing immediately after their surrounding functions with names like ___surroundingFunction_block_invoke_21.

So it is not the code which makes blocks special, it is the separate data structure. It is this data structure that I will focus on for the remainder of this post.

The block data structure

Clang's basic documentation on block implementations indicates that the data structure describing the block looks something like this:

struct Block_literal {
    void *isa;

    int flags;
    int reserved; // is actually the retain count of heap allocated blocks

    void (*invoke)(void *, ...); // a pointer to the block's compiled code

    struct Block_descriptor {
        unsigned long int reserved; // always nil
        unsigned long int size; // size of the entire Block_literal
        
        // functions used to copy and dispose of the block (if needed)
        void (*copy_helper)(void *dst, void *src);
        void (*dispose_helper)(void *src); 
    } *descriptor;

    // Here the struct contains one entry for every surrounding scope variable.
    // For non-pointers, these entries are the actual const values of the variables.
    // For pointers, there are a range of possibilities (__block pointer,
    // object pointer, weak pointer, ordinary pointer)
};

Of course, the reality is that this structure is never explicitly declared like this in clang. Clang is a compiler — a code generator — and the format of this structure is generated programmatically from the CodeGenFunction::BuildBlockLiteralTmp method.
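Stripped of the Objective-C machinery, the lowering can be mimicked in plain C. In this hypothetical sketch (the names are illustrative, not clang's generated symbols), the block body becomes an ordinary function that takes the literal as its first argument, and the captured variable is a by-value copy inside the struct:

```c
#include <stddef.h>

/* Hypothetical hand-lowering of:
       int multiplier = 7;
       int (^b)(int) = ^(int x){ return x * multiplier; };
*/
struct my_block_literal {
    void *isa;            /* class pointer; NULL in this sketch */
    int flags;
    int reserved;
    int (*invoke)(struct my_block_literal *, int);  /* the compiled code */
    int multiplier;       /* captured variable, copied by value */
};

/* The block body becomes an ordinary function taking the literal first. */
static int my_block_invoke(struct my_block_literal *literal, int x)
{
    return x * literal->multiplier;  /* reads the captured const copy */
}

int run_block_demo(void)
{
    int multiplier = 7;

    /* The block literal is built on the stack, like a plain struct. */
    struct my_block_literal literal = { NULL, 0, 0, my_block_invoke, multiplier };

    /* Invoking the block is a call through the invoke pointer. */
    return literal.invoke(&literal, 6);
}
```

Invoking literal.invoke(&literal, 6) returns 42, exactly as invoking the real block would.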

Stack blocks and global blocks

Since the biggest difference between a function pointer and a block is the ability to use variables from the surrounding scope, it is interesting to look at what happens when a block does not reference anything in the surrounding scope.

Normally, the Block_literal data appears on the stack (like a regular struct would in its surrounding function). With no references to the surrounding scope, clang configures the Block_literal as a global block instead. This causes the block to appear in a fixed global location instead of on the stack (the flags value has the BLOCK_IS_GLOBAL flag set to indicate this at runtime but it's not immediately clear to me if this is ever used).

The implication of this is that global blocks are never actually copied or disposed, even if you invoke the functions to do so. This optimisation is possible because without any references to the surrounding scope, no part of the block (neither its code nor its Block_literal) will ever change — it becomes a shared constant value.

Blocks are always objects

If you're familiar with how Objective-C objects are declared, the isa field in the Block_literal above should be familiar — blocks are Objective-C objects. This may not seem strange in Objective-C but the reality is that even in pure C or C++, blocks are still Objective-C objects and the runtime support for blocks handles the retain/release/copy behaviors for the block in an Objective-C messaging manner.

Clang uses the class names _NSConcreteStackBlock and _NSConcreteGlobalBlock to refer to the classes for block literals but in CoreFoundation projects, this will map onto NSStackBlock and NSGlobalBlock. If you copy an NSStackBlock, it will return an NSMallocBlock (indicating its changed allocation location).

Blocks are slightly weird objects

The interesting point to note about NSStackBlock is that it is a stack allocated Objective-C object. If you have ever tried to allocate an Objective-C object on the stack (not as a pointer but statically allocated) you'll know that the compiler normally forbids this.

The reason why blocks are placed on the stack by default is speed. In the common case where the lifetime of the block is less than that of the stack function that contains it, this is a very good optimisation.

The implication of stack blocks being allocated on the stack is that a stack block cannot simply be retained — it will become invalid once the function that contains it is popped from the stack. If you invoke retain on a stack block, it will have no effect (the retain count of the block will remain at 1).

For this reason, if you need to return a block from a function or method, you must [[block copy] autorelease] it, not simply [[block retain] autorelease] it.

__block values can move magically

Scope variables used in a block are normally passed to the block by const value (the compiler won't let you change the value but even if it did, the change wouldn't affect the value of the variable outside the block).

To alter this behavior, the __block storage qualifier was added. Any variable declared __block is passed by reference into the block (the value on the outside will be changed after the block is invoked).

In the implementation, __block variables are initially allocated on the stack but if any block which references them is copied, they are moved onto the heap (malloced). This leads to the following strange situation...

int (^counterBlock())()
{
    __block int x = 0;
    
    int (^block)() = ^{
        x += 1;
        return x;
    };
    
    NSLog(@"x's location is on the stack: %p", &x);
    block = [[block copy] autorelease];
    NSLog(@"x's location is now on the heap: %p", &x);
    
    return block;
}

In this example, x's address changes when the copy is invoked. This is because when we declare a __block variable, a pointer to the real variable is created and any attempt to use the variable dereferences it. When copy is invoked, the location pointed to by the pointer changes to the new heap location, so any use of x causes a dereference to this new location.

This makes __block similar to a reference parameter in C++ since C++ references are also transparently dereferenced pointers.
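That forwarding behavior can be mimicked in C. In this illustrative sketch (the byref layout and names are simplified assumptions, not clang's exact structure), every access goes through a forwarding pointer that is repointed at a heap copy when the block is "copied":

```c
#include <stdlib.h>

/* Sketch of the __block "byref" mechanism: every access goes through a
   forwarding pointer, so the variable can move from the stack to the heap
   without breaking anyone who refers to it. */
struct byref_int {
    struct byref_int *forwarding;  /* where the value really lives now */
    int value;
};

/* Simulate Block_copy moving a __block variable to the heap. */
static struct byref_int *simulate_block_copy(struct byref_int *stack_var)
{
    struct byref_int *heap_var = malloc(sizeof *heap_var);
    *heap_var = *stack_var->forwarding;
    heap_var->forwarding = heap_var;   /* heap copy points at itself */
    stack_var->forwarding = heap_var;  /* stack copy now forwards */
    return heap_var;
}

int run_byref_demo(void)
{
    struct byref_int x = { &x, 0 };

    x.forwarding->value += 1;          /* like "x += 1" inside the block */
    struct byref_int *heap = simulate_block_copy(&x);
    x.forwarding->value += 1;          /* still the one shared variable */

    int result = heap->value;          /* both increments landed here */
    free(heap);
    return result;
}
```

Both increments hit the same logical variable even though its storage moved, which is exactly why &x changes in the NSLog example above.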

NSMallocBlock never actually copies

Copying a block doesn't really give you a copy of the block — if the block is already an NSMallocBlock, a copy simply increases the retain count of the block (this retain count is an internal reserved field — the retainCount returned from the object will remain at 1). This is perfectly appropriate since the scope of the block cannot change after it is created (therefore the block is constant) but it does mean that invoking copy on a block is not the same thing as recreating it.

Assume the following code is in the same program as the previous example.

int (^someBlock)() = counterBlock();
int (^someBlockCopy)() = [[someBlock copy] autorelease];
int (^anotherBlock)() = counterBlock();

The block returned from counterBlock() counts the number of times that it is invoked by saving the count in the __block variable x.

In this example though, someBlock and someBlockCopy share the same x variable — they are not actually separate copies. However, anotherBlock does have its own separate x value.

If you need a genuinely separate copy, recreate the block, don't copy it.

Blocks retain their NSObject scope variables

Blocks will retain any NSObject that they use from their enclosing scope when they are copied.

The biggest implication of this is that you must remember to avoid retain cycles if the block will be held beyond a simple stack lifetime.

As pointed out elsewhere, you can suppress this retain of NSObjects by assigning the object to a __block variable outside the block and only ever using the __block variable inside the block.

You can also do the reverse of this and force a pointer that isn't an NSObject derived class to be retained when copied. Do this by declaring the pointer with __attribute__((NSObject)). Of course, the situations where you'd want to do this are exceedingly rare.

Conclusion

Blocks are very simple to use in the case where you declare one inline and immediately pass it into another function, but once you need to copy or hold onto a block for a while, there are a number of quirks, some of which I've covered in this post.

Sadly at this time, Apple's documentation on blocks is fairly basic and lacking in detail. This is what led me to start looking at clang's source code.

Of course, you don't need to stare at someone else's C++ code to learn about blocks. There are other sources of lighter, more approachable documentation on the topic. In addition to sources that I've already linked, there's also:


Objective-C's niche: why it survives in a world of alternatives

Objective-C remains an impediment for many programmers coming to the Mac or iPhone platforms — few programmers have ever experienced it before learning Cocoa, forcing two learning curves at once for new Cocoa developers. How did Apple end up with such a weird language? And for a company known to replace CPU architectures and their entire operating system, why does Apple persist with Objective-C? The answer lies in the methods.

Virtual methods

Most compiled, object-oriented languages (like C++, Java and C♯) adhere closely to the object-oriented approaches first introduced in Simula 67 — in particular the concept of virtual methods and how they enable methods to be overridden.

Origins in Algol: In much the same way that Objective-C is often called a "pure superset" of C, Simula 67 was a "pure superset" of Algol 60. While Fortran is sometimes remembered as the first high-level language to gain popularity (and disdain), Algol 60 was the first programming language to actually resemble a modern language as it contained the for, if/else, while (of sorts), and other procedural constructs that are expected now in programming languages. While Algol was rarely used past the 1970s, Pascal and its descendants closely resemble Algol in syntax.

In a compiled language, a regular function (non-overrideable) ends up as a basic memory address. When the function is invoked, the CPU jumps to the memory address.

Simula 67 introduced virtual method tables to adapt this for object-orientation. Instead of basic memory addresses, methods are compiled to row numbers in a table. To get the memory address, the method table is retrieved from the class of the object and the CPU jumps to the address at the specified row.

Since different objects have different classes, they will have different addresses in their method tables, thereby allowing subclasses to provide different implementations of methods (method overrides) than their base classes.
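A miniature C sketch may make this concrete. This is illustrative only (no real compiler lays classes out in exactly this way): the class is a table of function pointers, and an override is simply a different pointer in the subclass's table.

```c
/* A miniature virtual method table: a method call compiles to an
   indexed load from the class's table, followed by an indirect call. */
struct Shape;
struct ShapeVTable {
    int (*area)(const struct Shape *);   /* "row 0" of the method table */
};
struct Shape {
    const struct ShapeVTable *vtable;    /* installed by the object's class */
    int width, height;
};

static int rect_area(const struct Shape *s) { return s->width * s->height; }
static int tri_area(const struct Shape *s)  { return s->width * s->height / 2; }

static const struct ShapeVTable rect_vtable = { rect_area };
static const struct ShapeVTable tri_vtable  = { tri_area };  /* the override */

/* The caller compiles to a fixed row lookup; the class supplies the code. */
int shape_area(const struct Shape *s)
{
    return s->vtable->area(s);
}

int demo_rect_area(void) { struct Shape r = { &rect_vtable, 4, 6 }; return shape_area(&r); }
int demo_tri_area(void)  { struct Shape t = { &tri_vtable,  4, 6 }; return shape_area(&t); }
```

The call site in shape_area is identical for both shapes; only the table installed in the object differs.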

Message passing

While virtual method tables do introduce a level of indirection that allows method behavior to change from object to object, the offsets into the table and hence the tables themselves all need to be created at compile-time.

History of message passing: As with object-orientation itself, message passing was inspired by Simula 67 but Simula's message passing (called "Simulation") wasn't for method invocations — it was instead used for discrete event simulation (mostly queueing and list processing). Smalltalk expanded upon this idea by using message passing for method invocation. Smalltalk subsequently inspired the Actor Model (used in distributed processing) and remote procedure calls (RPC). Originally, Smalltalk messages were conceived to have a large amount of metadata (more like the full headers on an email) but eventually, this was simplified down to an approach syntactically similar to Objective-C's current implementation (minus square brackets).

Message passing presents an alternative way of solving the method dispatch problem. Instead of the virtual method's compile-time offsets and tables which don't consult the object (except for its type), message passing sends a unique message identifier to the object itself and the object determines at runtime what action to take.

Message passing approaches may still have a virtual method table ("vtable") in the class' representation but this structure is not known at compile time — it is handled entirely at runtime — and instances of the class have the opportunity to take different actions in response to the message that are unrelated to the content of the table.

There are two important differences here:

  • Runtime resolution — so the connection between message identifier and action can be changed at runtime.
  • Involvement of the object itself, not just its class.

On a technical level, the difference between a virtual method table and passing a message identifier is relatively minor (since both are really table lookups and both are actually performed at runtime). The difference ends up being conceptual:

  • Virtual method table languages generally make it hard or impossible to change the virtual method table contents or pointers at runtime.
  • Type safety is essential in a virtual method table language since the compiler may alter table lookups based on type, particularly in cases of multiple inheritance. In message passing systems, type safety is irrelevant to method invocation.
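The contrast can be sketched with a toy message dispatcher in C (all names hypothetical): the selector is resolved against the receiver's class at runtime, and because the mapping is plain mutable data, an implementation can be replaced while the program runs — the essence of swizzling.

```c
#include <string.h>

/* A toy message dispatcher — not the real Objective-C runtime. */
typedef int (*IMP)(void *self);

struct MethodEntry { const char *selector; IMP imp; };
struct ToyClass    { struct MethodEntry methods[8]; int count; };
struct ToyObject   { struct ToyClass *isa; int value; };

static IMP lookup(struct ToyClass *cls, const char *selector)
{
    for (int i = 0; i < cls->count; i++)
        if (strcmp(cls->methods[i].selector, selector) == 0)
            return cls->methods[i].imp;
    return NULL;
}

int toy_msgSend(struct ToyObject *obj, const char *selector)
{
    IMP imp = lookup(obj->isa, selector);
    return imp ? imp(obj) : -1;   /* a real runtime would forward or raise */
}

static int get_value(void *self)   { return ((struct ToyObject *)self)->value; }
static int get_doubled(void *self) { return ((struct ToyObject *)self)->value * 2; }

int run_dispatch_demo(void)
{
    struct ToyClass cls = { { { "value", get_value } }, 1 };
    struct ToyObject obj = { &cls, 21 };

    int before = toy_msgSend(&obj, "value");   /* resolved at runtime: 21 */

    /* "Swizzle": repoint the selector at a different implementation. */
    cls.methods[0].imp = get_doubled;
    int after = toy_msgSend(&obj, "value");    /* same selector, new code: 42 */

    return before + after;
}
```

No caller of toy_msgSend needed recompiling for the behavior to change — which is the property the vtable approach deliberately gives up for speed.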

Why this matters

The short answer is that this dynamic message handling in Objective-C makes it much easier to work within a large framework that you didn't create because you can examine, patch and modify elements of that framework on the fly. The most common situation where this is likely to occur is when dealing with an application framework.

The biggest reason for this is that you can add or change methods on existing objects, without needing to subclass them, while they are running. Approaches for this include categories, method swizzling and isa-swizzling.

This makes the following situations possible:

  • You want to add a convenience method to someone else's object (a quick search of my own posts reveals that about a dozen of my posts involve adding convenience methods to Cocoa classes, e.g. Safely fetching an NSManagedObject by URI).
  • You want to change the behavior of a class you didn't (and can't) allocate because it is created by someone else (this is how Key-Value Observing is implemented in Cocoa).
  • You want to treat objects generically and handle potential differences with runtime introspection.
  • You want to substitute an object of a completely different class to the expected class (this is used in Cocoa by NSProxy to turn a regular object into a distributed object).

These points may seem somewhat mild but they are central to maximizing code reuse when working within someone else's framework: if you need existing code to work differently, you don't need to reimplement the whole class and you don't need to change how it is allocated.

Languages using virtual method tables can adopt some of these ideas (like the boost::any class or C♯ 4.0's dynamic member lookup) but these features have additional restrictions and don't apply to all objects, meaning that they can't be used on purely arbitrary objects (such as those you don't control or didn't create) and so don't help when interacting with someone else's framework.

Simply put: dynamic message passing instead of virtual method invocations makes Objective-C a much better language for working with a large library or framework that someone has written.

The tradeoff

The downside to dynamic message invocation is that it is only as fast as virtual method invocation when the message lookup is cached; otherwise it is invariably slower.

Also, in keeping with the philosophy of a purely dynamic messaging system, Objective-C does not use templates or template metaprogramming and does not have non-dynamic (i.e. non-virtual) methods. This means that Objective-C methods miss out on the compiler optimizations possible when employing these techniques. And since modern programming in C++ is substantially focussed on these features, it can be difficult to adapt programs using these ideas to Objective-C.

Theoretically, Objective-C could implement these features but they are in opposition to the underlying concepts of flexibility and dynamic behavior in Objective-C — and, if used, they would shut down all of the advantages from the previous section.

Conclusion

It's not a coincidence that I write an Objective-C/Cocoa blog, I'm obviously an advocate of Objective-C and Cocoa. In my opinion, Objective-C is the best language for programming situations where you must make extensive use of a framework written by someone else (particularly an application framework). The success of Objective-C in this situation is due to the combination of:

  • speed and precision (from its compiled C roots)
  • dynamic flexibility (due to using message passing for method invocations)

To frame this conclusion, I'll state that I've written major projects using C/WIN32, C++/PowerPlant, C++/MFC, and Java/Swing/AWT. I've also dabbled in smaller projects using C♯/.Net. In all of these cases I have found the application frameworks to be less flexible and less reusable because they lack the dynamic modifiability of Objective-C.

As I've stated, I do view Objective-C's strength in the area of application frameworks as a niche (albeit a very large niche). If I were writing a compiler, OS kernel, or a low-level/high-performance library, then I would use C++ (I wouldn't use pure C because I'd miss my abstractions) — but these are situations where metaprogramming, greater inlining and faster method invocations would trump flexibility concerns. Of course, if you have a project that needs to satisfy all these criteria: then there's always Objective-C++.


The ugly side of blocks: explicit declarations and casting.

Blocks are a welcome addition to C/Objective-C/C++/Objective-C++ with Snow Leopard but they carry with them the worst aspect of Standard C: function pointer declaration and casting syntax. In this post, I'll show you how to understand declarations and casting syntax for blocks and function pointers, even in the worst of scenarios.

Simple block declarations and casting

Used as intended (simple inline code implementations) blocks are fairly elegant. This is due to one advantage they offer: in simple cases, you do not need to specify the return type — it can be inferred from the return statement in the block itself.

So declaring a block that returns an int can be as simple as:

int (^alwaysReturnIntZero)() = ^{ return 0; };

In this case though, an unqualified integral value is correctly assumed to be an int. If we want the block to return an NSInteger, we need to either cast the return type or not rely on type inference and declare the return fully:

NSInteger (^alwaysReturnNSIntegerZero)() = ^ NSInteger (){ return 0; };

Notice that the block literal (righthand side) does not follow the same structure as the block declaration (lefthand side). The block literal uses a straightforward "caret, return type, parameter list" order but the block declaration uses the C function pointer declaration syntax, which can grow more complex (as I'll show later). At this point though, the two are of similar complexity.

Casting a block looks much like declaring a block, minus the name of the variable from the declaration.

long long (^alwaysReturnLongLongZero)() = (long long (^)())alwaysReturnNSIntegerZero;

Looking at what is done here: all that is needed to create a cast to a variable's type is to copy the variable's declaration, put parentheses around the copied declaration and remove the variable name.

Function pointers

Blocks borrow their syntax from standard C function pointers. In almost all cases, the only difference between a block declaration or cast and a function pointer declaration or cast is the "^" character is used for the block and the "*" character is used for the function pointer. e.g.:

long long (*fnAlwaysReturnLongLongZero)() = (long long (*)())fnAlwaysReturnNSIntegerZero;

Of course, functions cannot be declared inline, so you cannot have function literals in the same way as you can have block literals. However, all other syntactic traits remain the same.
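For a self-contained illustration, here is the same declare-then-cast pattern with hypothetical function names (unlike the snippet above, this one defines the functions it points at):

```c
/* Hypothetical functions to take the address of. */
static int returnIntZero(void) { return 0; }
static int returnIntOne(void)  { return 1; }

int run_fn_pointer_demo(void)
{
    /* Declared exactly like a block, with * in place of ^ */
    int (*fnReturnsInt)(void) = returnIntZero;

    /* Cast: copy the declaration, parenthesize it, drop the variable name.
       (Here the cast is to the pointer's own type, so the call is safe.) */
    fnReturnsInt = (int (*)(void))returnIntOne;

    return fnReturnsInt();
}
```

Note that calling through a function pointer cast to an incompatible type (as when reinterpreting the return type) is undefined behavior in standard C, so casts like the long long example above should only bridge genuinely compatible types.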

Reading declarations correctly

Unfortunately, blocks follow the typical C declaration rules which become outright confusing when you try to return something. Before it all gets complicated, I'm going to explain something simple about C declarations.

Consider how a pointer is declared:

int *myVariable;

If you're reading this blog at all, you should know that this statement creates a pointer named myVariable which points to an int.

But the operator used here is a "dereference"; it is not the "make a pointer" (address-of) operator. The correct way to read this line is:

  1. Declare a variable:
    myVariable
  2. It can be dereferenced (and by implication is therefore a pointer)
    *myVariable
  3. If it is dereferenced, then the value yielded from the dereference should be treated as an int:
    int *myVariable;.

Let's look at the alwaysReturnIntZero declaration from above again and we'll apply this same reading to it.

int (^alwaysReturnIntZero)() = ^{ return 0; };
  1. Declare a variable:
    alwaysReturnIntZero
  2. It can be dereferenced to yield block information (and by implication is therefore a block pointer):
    ^alwaysReturnIntZero
  3. Its block implementation takes no parameters and returns an int:
    int (^alwaysReturnIntZero)()

This approach to reading a declaration is quite simple but you'll need it to follow the next section.

Declaring a block that returns a block

Imagine you wanted to use a block to compare a double to an int and return true if the double is greater than the int or false if the double is equal or smaller. In the simple case, that might look like this:

bool (^compareDoubleToInt)(int i, double j) = ^{ return j > i; };

Easy enough but imagine now that you want to break this into two pieces:

  1. A first block which takes only the int and returns a second block, pre-configured to use this int.
  2. The second block then takes the double, compares it to its pre-configured int and returns the result.

The first block is then a factory block which creates instances of the second block that operate like the compareDoubleToInt shown above for a single, pre-configured value of i.

The complete implementation of this would be:

bool (^(^newDoubleToIntComparison)(int))(double) =
    ^(int i)
    {
        return Block_copy(^bool (double j)
        {
            return j > i;
        });
    };
Pay careful attention to the "new" in the name: it signals that you must call Block_release on any block created in this fashion when you're done.

If everything about the syntax on that first line (the declaration) makes immediate sense to you, then you may consider yourself skilled at syntactic recursion.

The reason most people find this hard to read is that verbally, we would describe this scenario in a very different order:

  1. Declare a variable:
    newDoubleToIntComparison
  2. It can be dereferenced to yield block information (and by implication is therefore a block pointer):
    ^newDoubleToIntComparison
  3. The block takes an int parameter:
    (^newDoubleToIntComparison)(int)
  4. Its return value can be dereferenced to yield block information (and by implication the return value is therefore a block pointer):
    (^(^newDoubleToIntComparison)(int))
  5. This returned block takes a double parameter
    (^(^newDoubleToIntComparison)(int))(double)
  6. And the returned block returns a bool
    bool (^(^newDoubleToIntComparison)(int))(double);

If C declarations read left-to-right, they would be far less confusing. Instead, we have a situation where blocks that return blocks are recursively nested inside each other.

Of course, most people mitigate this by typedef'ing absolutely every block and function pointer type they ever use. Doing this for the previous block declaration changes it to:

typedef bool (^IsDoubleBiggerBlock)(double);
IsDoubleBiggerBlock (^newDoubleToIntComparison)(int);

Functions or methods that return blocks

It may also be helpful to see the subtle difference between declaring a block that returns a block and defining a function that returns a block.

Replacing the factory block with a factory function in the previous example would lead to:

bool (^NewDoubleToIntComparisonFunction(int i))(double)
{
    return (bool (^)(double))Block_copy(^ (double j)
    {
        return j > i;
    });
}

This function takes a single int as its parameter and yet the last component on the function prototype line is (double). The int parameter that the function actually takes and the name of the function are nested inside of the return type (the return type comprises the double parameter to the right, the caret character and the bool return value to the left).

Also notice that you need to cast the output of Block_copy to have it recognized as the correct return type.

As with the variable declarations, this nested behaviour is normally considered too annoying, so typedefs are employed to simplify:

typedef bool (^IsDoubleBiggerBlock)(double);
IsDoubleBiggerBlock NewDoubleToIntComparisonFunction(int i)
{
    return (IsDoubleBiggerBlock)Block_copy(^ (double j)
    {
        return j > i;
    });
}

This has the huge advantage that it puts the function's parameter back where it belongs — as the last component on the function prototype line.

An Objective-C method that returns a block is a much simpler situation since the method does not become nested within the return type in the same way. Instead, the return type looks identical to the cast of the returned copied block and the rest of the method remains distinct.

- (bool (^)(double))newDoubleToIntComparison:(int)i
{
    return (bool (^)(double))Block_copy(^ (double j)
    {
        return j > i;
    });
}

Conclusion

The declaration of C function pointers is widely regarded as the worst syntax in the language. There is a good reason for this: the information in a function pointer's declaration flows from the most significant components which are nestled on the inside of the declaration to the least significant components which encircle the outside. They could flow left-to-right like a sentence but instead they flow outwards from an identifier somewhere in the middle.

Sadly, blocks follow in this tradition. All you can do to mitigate the torment is use typedef'd declarations judiciously and try to keep your blocks simple. They're not really intended for large numbers of parameters and complex return values, anyway.
