Reader beware: this post is part of the older "Objective-C era" on Cocoa with Love. I don't keep these articles up to date, so the code may be broken or superseded by newer APIs. There's some good information, but there are also some opinions I no longer endorse; keep a skeptical mind. Read "A new era for Cocoa with Love" for more.
NSXMLDocument is the normal tree-based XML parser in Cocoa. But if you're writing for the iPhone, this class isn't available. Even on the Mac, sometimes you want tree-based parsing without the full overhead of NSXMLDocument. Here's how to use libxml2 to perform tree-based parsing in a Cocoa-friendly way.
NSXMLDocument is an excellent XML parser and XML document generator. Sadly, Apple have chosen not to include it in the current iPhone SDK. Apple instead recommend NSXMLParser on the iPhone.
Personally, I don't like the "event-driven" parsing of NSXMLParser. For the types of project I find myself writing, it is time-consuming and fiddly. I also like throwing HTML at my XML parsers, and NSXMLParser (which is a strict, non-correcting parser) requires that HTML be cleaned up first (using libtidy or similar), which eliminates much of the performance gain from this type of parsing anyway.
Fortunately, libxml2 exists on the iPhone and we can use it to perform much of the same parsing that NSXMLDocument performs for us on the Mac.
Other programmers have noted that libxml2 can be faster and more memory efficient than NSXMLDocument on the Mac, so there may be reasons to use libxml2 directly, even when NSXMLDocument is available.
Downside to libxml2
libxml2 itself is a fairly simple, clean library, but the official documentation is famously confusing. The documentation is really just a slightly commented version of the header files, which is not a great way to learn. Because libxml2 is pure C, the structure and style of its declarations and the datatypes it uses are also a long way from what is normally expected in Cocoa.
If you want to use libxml2 in Cocoa, you'll want a wrapper around it.
Reflecting the manner in which I use XML, my solution will have two functions declared as follows:
For an entire XML document, contained in the NSData object "document", this function executes the XPath query in "query" and returns an array of NSDictionary node objects for the nodes that match the query.
The only difference between the two functions is that the first expects proper XML data and the second expects HTML data.
Each result in the array of nodes returned will be an NSDictionary with the following structure:
- nodeName: an NSString containing the name of the node
- nodeContent: an NSString containing the textual content of the node
- nodeAttributeArray: an NSArray of NSDictionary objects, where each dictionary has two keys: attributeName (NSString) and nodeContent (NSString)
- nodeChildArray: an NSArray of child nodes (same structure as this node)
Any of these fields may be absent if not found in the libxml2 result.
If you don't know how or why to use an XPath query on an XML document, please look at my previous post titled "A Cocoa application driven by HTTP data", which shows how XPath queries can be used to extract specific sections of data from an HTML document.
Download the full solution here: XPathQuery.m and XPathQuery.h as a 2kb zip file.
The implementation is very straightforward. The entry point looks like this:
The only real difference in the PerformHTMLXPathQuery version is that it calls htmlReadMemory instead of
The query itself is then performed in an internal function common to both entry functions:
The work done here is simple: create the working space for the XPath query on the document, evaluate the XPath query, get all the nodes from the result, use the DictionaryForNode function to parse them into our NSDictionary objects, and clean up when done.
The implementation of the DictionaryForNode function is the only one I haven't shown. If you download the full solution, you can see how it's done. It's a bit bigger than I want to dump into my blog's text, but it really just traverses the libxml2 xmlNodePtr structures, getting the fields it needs and converting them to NSDictionary as appropriate.
Setting up your project file
You need to add libxml2.dylib to your project (don't put it in the Frameworks section). On the Mac, you'll find it at /usr/lib/libxml2.dylib, and for the iPhone, you'll want the
Since libxml2 is a .dylib (not a nice, friendly .framework), we still have one more thing to do. Go to the Project build settings (Project->Edit Project Settings->Build) and find the "Search Paths". In "Header Search Paths" add the following path:
This solution will let you get the results of an XPath query on the iPhone in nice Cocoa friendly objects.
I've only tested this on textual data — I don't know how it will behave on XML CDATA.
If you don't want an XPath query (for example, if you need the whole document), you can either run the query "/" to get the root node or drop the PerformXPathQuery function and instead run DictionaryForNode on the children of the xmlDocPtr, or even cast the xmlDocPtr to an xmlNodePtr and run it directly on that (in either case, pass NULL in as the