Classes for fetching and parsing XML or JSON via HTTP

Please note: this article is part of the older "Objective-C era" on Cocoa with Love. I don't keep these articles up-to-date; please be wary of broken code or potentially out-of-date information. Read "A new era for Cocoa with Love" for more.

In this post I show two reusable classes for fetching data via HTTP: one that parses the result as XML and another that parses as JSON. These are relatively simple tasks but due to the number of required steps, they can become tiresome if you don't have robust, reusable code for the task. These classes will work on iOS or on the Mac but the optional error alerts and password dialogs are only implemented for iOS.

Introduction

In my experience, "fetching data via HTTP" is probably the second most common task that iOS applications perform after "displaying a list of things in a table". Since I wrote a recent post showing how I handle display in tables, showing my reusable classes for fetching via HTTP seemed like a reasonable follow up.

As with the post on UITableView management, this post is all about trying to make the HTTP fetching, handling and processing as simple and reusable as possible.

What I hope to demonstrate is that even though the Cocoa API makes it look like you need to bolt NSURLConnection delegate methods onto your own classes every time you need a network connection, you don't actually need to repeat all this work each time. For the most common tasks like this, you should develop your own reusable approaches that you like, that serve your needs and that make new code easier.

There are lots of alternative approaches around that demonstrate similar ideas. My implementation is simple compared to full frameworks (for a more thorough implementation along similar lines, you may want to look at RestKit). I hope you'll still be able to see the contrast with ad hoc solutions though, especially if you've ever jammed HTTP communication into your projects without thinking about keeping the interface clean and simple.

You can download the four classes discussed in this project: HTTPXMLJSONFetchers.zip (16kB)

HTTP connections in Cocoa

BSD sockets and CFHTTPStream are generally too low level to use regularly. Unless your program requires meticulous control of the network layer, you probably want to use NSURLConnection for handling HTTP fetching.

Technically, NSURLConnection can perform a network connection in a single method call: +[NSURLConnection sendSynchronousRequest:returningResponse:error:]. Synchronous connections should be avoided in all but a few rare worker-thread situations because the call blocks the calling thread until it completes (freezing your program's user interface if that is the main thread) and it doesn't allow careful error handling.

This means that when fetching via HTTP, you should be using NSURLConnection's delegate methods. The main delegate methods are:

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error
- (void)connection:(NSURLConnection *)aConnection didReceiveAuthenticationChallenge:(NSURLAuthenticationChallenge *)aChallenge
- (void)connectionDidFinishLoading:(NSURLConnection *)connection

and a thorough implementation means implementing all 5 of these methods.

A common approach is to add the NSURLConnection delegate methods to your UITableViewController and make that view controller manage the connection.

While this might seem like a good idea (the view controller can track the status of the connection and provide visual updates and also present its own errors) the reality is that fully handling the connection takes a lot of code. How much code? The code I use is 530 lines long (including comments and spacing).

But there's also a more serious problem: bolting NSURLConnection to your UITableViewController limits code reuse. If your network code is tied closely to the view controller, there's more work involved in adding network behaviors to other view controllers or parts of your program.

Why do NSURLConnection delegates take so much code to implement? In the simplest case, they don't (you could probably manage a connection in 20 lines or so) but you'd be overlooking a lot of subtler behaviors. Errors, password authentication, cancelling the connection cleanly and offering simple construction versus meticulous construction are the kinds of behaviors that get left out if you're rewriting the code every time or operating under serious time constraints.
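For contrast, the "20 lines or so" version might look something like the following. This is a minimal sketch, not the code from the download: the MinimalFetcher class and its method names are hypothetical, and it uses manual retain/release like the rest of this post. Note everything it leaves out: no authentication handling, no status code checks, no user-visible errors and no callback to an owner.

```objc
#import <Foundation/Foundation.h>

// Hypothetical bare-bones delegate: collects the response body and logs the
// outcome. Authentication, status codes and error presentation are omitted.
@interface MinimalFetcher : NSObject
{
    NSURLConnection *connection;
    NSMutableData *receivedData;
}
- (void)startWithURLString:(NSString *)aURLString;
@end

@implementation MinimalFetcher

- (void)startWithURLString:(NSString *)aURLString
{
    // Discard any earlier connection so that calling this twice doesn't leak.
    [connection cancel];
    [connection release];
    [receivedData release];

    NSURLRequest *request =
        [NSURLRequest requestWithURL:[NSURL URLWithString:aURLString]];
    receivedData = [[NSMutableData alloc] init];
    connection =
        [[NSURLConnection alloc] initWithRequest:request delegate:self];
}

- (void)connection:(NSURLConnection *)aConnection
    didReceiveResponse:(NSURLResponse *)aResponse
{
    // A redirect can deliver more than one response; drop any stale data.
    [receivedData setLength:0];
}

- (void)connection:(NSURLConnection *)aConnection
    didReceiveData:(NSData *)data
{
    [receivedData appendData:data];
}

- (void)connection:(NSURLConnection *)aConnection
    didFailWithError:(NSError *)error
{
    NSLog(@"Connection failed: %@", [error localizedDescription]);
}

- (void)connectionDidFinishLoading:(NSURLConnection *)aConnection
{
    NSLog(@"Received %lu bytes", (unsigned long)[receivedData length]);
}

- (void)dealloc
{
    [connection cancel];
    [connection release];
    [receivedData release];
    [super dealloc];
}

@end
```

Every one of the omissions listed above is something the calling code would have to reinvent each time it copied this class into a new project.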

HTTPFetcher

The idea behind my HTTPFetcher class is really simple: it's a reusable NSURLConnection delegate. It handles all the NSURLConnection delegate work and calls back when it has the results. It provides default error handling and password authentication, and while it has a very simple default constructor, it still provides enough hooks that you can customize its behavior.

The interface to the class is really just construction methods, a start, a cancel and some properties. The assign properties are for configuring the connection before you start it. The readonly properties are for gathering information once the connection is complete.

@interface HTTPFetcher : NSObject <UITextFieldDelegate>

@property (nonatomic, readonly) NSData *data;
@property (nonatomic, readonly) NSURLRequest *urlRequest;
@property (nonatomic, readonly) NSDictionary *responseHeaderFields;
@property (nonatomic, readonly) NSInteger failureCode;
@property (nonatomic, assign) BOOL showAlerts;
@property (nonatomic, assign) BOOL showAuthentication;
@property (nonatomic, assign) void *context;

- (id)initWithURLRequest:(NSURLRequest *)aURLRequest
    receiver:(id)aReceiver
    action:(SEL)receiverAction;
- (id)initWithURLString:(NSString *)aURLString
    receiver:(id)aReceiver
    action:(SEL)receiverAction;
- (id)initWithURLString:(NSString *)aURLString
    timeout:(NSTimeInterval)aTimeoutInterval
    cachePolicy:(NSURLCacheStoragePolicy)aCachePolicy
    receiver:(id)aReceiver
    action:(SEL)receiverAction;
- (void)start;
- (void)cancel;

@end

You initialize the class in whatever way you choose (the middle init method shown here is the simplest) and optionally configure it. The most common configuration is to set the context pointer so that when the connection completes, you can remember where to put the data. You then start the connection and, when the connection completes, it will invoke the receiverAction on your receiver object (the receiver action takes one parameter: the HTTPFetcher itself).

// Example fetcher creation
fetcher = [[HTTPFetcher alloc]
    initWithURLString:@"http://some-domain.com/some/path"
    receiver:self
    action:@selector(receiveResponse:)];
[fetcher start];

// Example fetcher response handling
- (void)receiveResponse:(HTTPFetcher *)aFetcher
{
    NSAssert(aFetcher == fetcher,
        @"In this example, aFetcher is always the same as the fetcher ivar we set above");
    if ([fetcher.data length] > 0)
    {
        [self doSomethingWithTheData:fetcher.data];
    }
    [fetcher release];
    fetcher = nil;
}

Ordinarily, your program will want to customize the code that presents the errors and make the presentation consistent with your application. You can do this with the HTTPFetcher class by either subclassing or editing the class itself, or you can disable the alerts and authentication functionality and perform the work outside the class. However, if you don't have time to do this customization, there is default behavior in the class that will suffice.

HTTPFetcher memory management: the HTTPFetcher does not retain itself while running and does not retain the receiver. This is because the expected behavior is that the receiver retains the HTTPFetcher and we don't want a retain cycle. If you create the HTTPFetcher and nothing holds a retain on it, it will immediately auto-cancel itself and dealloc.
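The ownership rule above suggests a pattern along these lines for the receiver. This is only a sketch under assumptions: the FeedLoader class is hypothetical, and the header name HTTPFetcher.h is a guess at what the download contains. Only the initializer, start, cancel and data members come from the interface shown earlier.

```objc
#import <Foundation/Foundation.h>
#import "HTTPFetcher.h" // assumed header name

// Hypothetical receiver that owns its fetcher for the life of the request.
@interface FeedLoader : NSObject
{
    HTTPFetcher *fetcher;
}
- (void)load;
@end

@implementation FeedLoader

- (void)load
{
    // Drop any request already in flight before starting another.
    [fetcher cancel];
    [fetcher release];

    fetcher = [[HTTPFetcher alloc]
        initWithURLString:@"http://some-domain.com/some/path"
        receiver:self
        action:@selector(receiveResponse:)];
    [fetcher start];
}

- (void)receiveResponse:(HTTPFetcher *)aFetcher
{
    NSLog(@"Received %lu bytes", (unsigned long)[aFetcher.data length]);

    // The request is finished, so give up ownership of the fetcher.
    [fetcher release];
    fetcher = nil;
}

- (void)dealloc
{
    // The fetcher does not retain the receiver, so it must not outlive it.
    // The explicit cancel is defensive: releasing the last reference is
    // described above as cancelling the connection anyway.
    [fetcher cancel];
    [fetcher release];
    [super dealloc];
}

@end
```

Because the receiver is the only owner, tearing down the receiver tears down the connection with it, and the action is never sent to a deallocated object.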

XMLFetcher

The HTTPFetcher is fine if you simply want the data from an HTTP connection. For my own purposes though, I've never used the HTTPFetcher on its own. I've always used it as the base class for classes which post-process the HTTP data before invoking the receiver's callback method.

The XMLFetcher class is for turning an XML response into something more useful. Instead of needing to look at the data property of the HTTPFetcher, you can use the results property which is the array of nodes matching a given XPath query on the XML result.

@interface XMLFetcher : HTTPFetcher

@property (nonatomic, copy, readonly) NSString *xPathQuery;
@property (nonatomic, retain, readonly) NSArray *results;

- (id)initWithURLString:(NSString *)aURLString
    xPathQuery:(NSString *)query
    receiver:(id)aReceiver
    action:(SEL)receiverAction;

@end

I've previously spoken about how I'm not a fan of the event-driven model (sometimes called a SAX parser) promoted by Apple in the iOS API. It is certainly memory efficient and faster for large files but it requires you to perform your own structured handling, which is tiresome, prone to mistakes and not really reusable. I personally prefer a document-based model like the NSXML API that exists in Mac OS X but not in iOS.

The XMLFetcher class blends my earlier libxml2-based XPath parsing and querying code with the HTTPFetcher.

However, I've addressed a number of the shortcomings of my previous libxml2-based parsing. The biggest problem with that earlier code was that it simply packaged the XML into NSDictionary objects (which is inelegant at best). Instead, the results are now instances of a dedicated XPathResultNode class which can cleanly represent attributes, childNodes and contentStrings. There's also better handling of content strings on either side of subnodes and concatenation of text data spread over subnodes.

@interface XPathResultNode : NSObject

@property (nonatomic, retain, readonly) NSString *name;
@property (nonatomic, retain, readonly) NSMutableDictionary *attributes;
@property (nonatomic, retain, readonly) NSMutableArray *content;

+ (NSArray *)nodesForXPathQuery:(NSString *)query onHTML:(NSData *)htmlData;
+ (NSArray *)nodesForXPathQuery:(NSString *)query onXML:(NSData *)xmlData;

- (NSArray *)childNodes;
- (NSString *)contentString;
- (NSString *)contentStringByUnifyingSubnodes;

@end
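Putting the two interfaces together, fetching and walking a result might look roughly like this. It is a sketch under several assumptions: the HeadlineLoader class, the URL, the XPath query and the shape of the XML (item elements with an id attribute and a title child) are all hypothetical, the header names are guesses, and it assumes that the attributes dictionary is keyed by attribute name and that childNodes returns XPathResultNode objects.

```objc
#import <Foundation/Foundation.h>
#import "XMLFetcher.h"      // assumed header name
#import "XPathResultNode.h" // assumed header name

// Hypothetical receiver that logs the title of each <item> in a feed.
@interface HeadlineLoader : NSObject
{
    XMLFetcher *fetcher;
}
- (void)load;
@end

@implementation HeadlineLoader

- (void)load
{
    [fetcher cancel];
    [fetcher release];

    fetcher = [[XMLFetcher alloc]
        initWithURLString:@"http://some-domain.com/some/feed.xml"
        xPathQuery:@"//item"
        receiver:self
        action:@selector(receiveResponse:)];
    [fetcher start];
}

- (void)receiveResponse:(XMLFetcher *)aFetcher
{
    // Each result is one node that matched the XPath query.
    for (XPathResultNode *item in aFetcher.results)
    {
        // Assumption: attributes are keyed by attribute name.
        NSString *identifier = [item.attributes objectForKey:@"id"];

        for (XPathResultNode *child in [item childNodes])
        {
            if ([child.name isEqualToString:@"title"])
            {
                NSLog(@"%@: %@", identifier, [child contentString]);
            }
        }
    }

    [fetcher release];
    fetcher = nil;
}

- (void)dealloc
{
    [fetcher cancel];
    [fetcher release];
    [super dealloc];
}

@end
```

A narrower query such as //item/title would return the title nodes directly and remove the inner loop, at the cost of losing easy access to the parent's attributes.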

XPath query note: XPath queries can be a little difficult to get used to — if you're not accustomed to XPath, it can be hard to extract the exact nodes you want. Like regular expressions though, they're a highly specialized language for extracting data and once you understand the different functions available, they are the quickest way of getting specific nodes out of XML.

Compiler note: the XPathResultNode.m file contains a comment at the top which explains the Xcode compiler settings required to make it work. Basically, you need to include libxml in the include path and link your project with libxml2.dylib.

JSONFetcher

The JSONFetcher is really just the same idea as the XMLFetcher — parse the result from HTTPFetcher once complete, this time as JSON data.

The class I've written relies on SBJSON, Stig Brautaset's BSD-licensed JSON parsing library. You will need to download these files separately and include them in your project (it's 3 .m files and 4 .h files).

SBJSON isn't your only option for JSON handling in iOS or Mac OS X. There are a few other JSON libraries for iOS and Mac discussed on Stack Overflow if you'd prefer other options. Obviously though, you'd need to make minor adjustments to integrate a different parser.

With a JSON response, there's not the same expectation of needing to find a subnode within a larger result (as is the common case for XML), so the JSON parser simply parses the whole JSON structure and returns it all.

@interface JSONFetcher : HTTPFetcher

@property (nonatomic, readonly) id result;

@end
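Since result is typed as id, the receiver needs to check what it actually got before using it. The following is a sketch only: the ProfileLoader class, the URL and the "name" key are hypothetical, the header name is a guess, and it assumes that JSONFetcher is created with the initializers inherited from HTTPFetcher and that result is nil when the fetch or the parse fails.

```objc
#import <Foundation/Foundation.h>
#import "JSONFetcher.h" // assumed header name

// Hypothetical receiver that inspects the top-level JSON value.
@interface ProfileLoader : NSObject
{
    JSONFetcher *fetcher;
}
- (void)load;
@end

@implementation ProfileLoader

- (void)load
{
    [fetcher cancel];
    [fetcher release];

    // Assumption: the initializers declared on HTTPFetcher are reused as-is.
    fetcher = [[JSONFetcher alloc]
        initWithURLString:@"http://some-domain.com/some/profile.json"
        receiver:self
        action:@selector(receiveResponse:)];
    [fetcher start];
}

- (void)receiveResponse:(JSONFetcher *)aFetcher
{
    id result = aFetcher.result;

    if ([result isKindOfClass:[NSDictionary class]])
    {
        // Top-level JSON object.
        NSLog(@"name = %@", [(NSDictionary *)result objectForKey:@"name"]);
    }
    else if ([result isKindOfClass:[NSArray class]])
    {
        // Top-level JSON array.
        NSLog(@"%lu entries", (unsigned long)[(NSArray *)result count]);
    }
    else
    {
        // Assumption: nothing usable was parsed.
        NSLog(@"No JSON result (failureCode %ld)", (long)aFetcher.failureCode);
    }

    [fetcher release];
    fetcher = nil;
}

- (void)dealloc
{
    [fetcher cancel];
    [fetcher release];
    [super dealloc];
}

@end
```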

Conclusion

You can download the four classes discussed in this project: HTTPXMLJSONFetchers.zip (16kB)

I've presented my classes for handling these tasks. I don't expect that everyone has the same data and network requirements as I do, so there's every chance that you would need very different classes to suit your own exact needs.

The point is really to consider reuse in your own code: how can you evolve your classes so that when you start a new project you need to rewrite as little as possible? Ideally, you can simply bring in your own class for handling network data, pass different parameters into its constructor and your network connection is done.

Until I had composed these classes for my own purposes, new projects involved hundreds of lines of code that went through a copy, paste, refactor process from existing projects I'd written. While copy, paste, refactor will work, it is slower, more prone to errors and harder to keep up-to-date than properly reusable classes. In most cases, you should view copy and paste as a failure of your own processes. That's a hard rule to adhere to, since copy, paste, refactor is faster than designing a reusable class — or at least it is initially (compared to an up-front design effort). You need to have the discipline to recognize the common behaviors between classes or projects and refactor into shared classes if required.

A final thought: I realize I haven't really shown these classes at work in an example program. If you can't work out how to use them in a real program, please wait a week or two: I plan to share a real-world project that uses them to handle all its network communication.