2016-05-31: The /* Programming Comments */ documents have moved, and will no longer be updated or maintained at this location! Please update your bookmarks: http://www.ccoderun.ca/programming/.


The first time I attempted to use libxml2 from within a .c or .cpp file, I just about bailed, convinced it would be easier to write a short shell script to get the job done. The libxml2 calls required to find, update, and save a value within a .xml file are not simple, even though at first glance this looks like a relatively trivial and common task.

What I wanted -- and could not find at the time! -- was a very simple example that showed which of the numerous libxml2 APIs I had to call. This article will hopefully fill that gap. Here is what I found I had to do to read, update, and save a .xml file:

How to build

My example source code does not come with a Makefile. These are the two parameters required to get my example source to build with g++:

g++ -I/usr/include/libxml2 2010-01-03_LibXml2.cpp -lxml2

  1. -I/usr/include/libxml2 to let the compiler know where it needs to look to find the necessary headers.
  2. -lxml2 to let the linker know what library to link against. It is listed after the source file because many linkers resolve libraries in command-line order.


Startup consists of initializing libxml2 and opening the example .xml file. The relevant source lines are:

#include <libxml/xpath.h>
...
xmlInitParser();
LIBXML_TEST_VERSION
xmlDoc *doc = xmlParseFile( "example.xml" );

The doc pointer you get back will be used until you are finished with the .xml file, so store it somewhere as you'll need to reference it several times.

Directly accessing xml nodes

If you won't be using XPath, you can use the doc pointer to walk the tree of XML nodes. Note how doc has a children pointer -- start there with your favorite debugger, and keep dereferencing the parent, children, name, content and next pointers to find what you need. Nodes are represented using xmlNode.

If you're looking for an easy method to read, update, and write .xml files, this is likely not what you want to do. Instead, take a look at XPath.

Finding a node using XPath

Think of XPath as a way to access an XML node using something that looks very much like a directory path. XPath expressions can be quite fancy, but at a minimum, the <Pet> node from this example .xml file:

<Example>
  <Objects display="true">
    <Kiwi colour="green" count="8"/>
    <Pet type="dog" count="1" size="big"/>
  </Objects>
</Example>

...can be accessed using the following xpath code:

xmlXPathContext *xpathCtx = xmlXPathNewContext( doc );
xmlXPathObject *xpathObj = xmlXPathEvalExpression( (xmlChar*)"/Example/Objects/Pet", xpathCtx );
xmlNode *node = xpathObj->nodesetval->nodeTab[0];
This excludes important things like error checking and cleanup; see 2010-01-03_LibXml2.cpp for the full listing.

Accessing attributes

Once we have a node, accessing attributes (aka "properties" in xmlNode) is relatively simple. Each node has a single pointer to an attribute, and that attribute has a pointer to the next one. Everything is stored as a string in libxml2, so you'll have to convert attributes back to numeric, boolean, etc., as necessary. This loop will display the name and value of each attribute in a node. Using the example .xml file above, if the node is Pet, then the attributes displayed would be "type", "count", and "size":

xmlAttr *attr = node->properties;
while ( attr )
{
    std::cout << "Attribute name: " << attr->name << " value: " << attr->children->content << std::endl;
    attr = attr->next;
}
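Since every attribute value comes back as a string, the conversion step mentioned above has to be written by hand. The sketch below shows one possible approach using only the standard library. The helper names attributeToLong() and attributeToBool() are hypothetical, and the accepted boolean spellings ("true", "false", "1", "0") are an assumption about the input. The functions take const unsigned char * because that is what libxml2's xmlChar is a typedef for.

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: convert attribute text such as "8" to a long.
// Returns false (leaving "value" untouched) if the text is not a number.
bool attributeToLong( const unsigned char *text, long &value )
{
    if ( text == nullptr || *text == '\0' )
    {
        return false;
    }
    const char *begin = reinterpret_cast<const char *>( text );
    char *end = nullptr;
    errno = 0;
    const long parsed = std::strtol( begin, &end, 10 );
    if ( errno != 0 || end == begin || *end != '\0' )
    {
        return false;   // out of range, or trailing non-digit characters
    }
    value = parsed;
    return true;
}

// Hypothetical helper: convert attribute text to a bool.
// Accepting only "true"/"1" and "false"/"0" is an assumption.
bool attributeToBool( const unsigned char *text, bool &value )
{
    if ( text == nullptr )
    {
        return false;
    }
    const char *s = reinterpret_cast<const char *>( text );
    if ( std::strcmp( s, "true" ) == 0 || std::strcmp( s, "1" ) == 0 )
    {
        value = true;
        return true;
    }
    if ( std::strcmp( s, "false" ) == 0 || std::strcmp( s, "0" ) == 0 )
    {
        value = false;
        return true;
    }
    return false;
}
```

With the example .xml file, attr->children->content for the "count" attribute of Kiwi could be passed to attributeToLong(), and the "display" attribute of Objects to attributeToBool().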

Adding a new attribute

Adding a new attribute to an existing node is very simple:

xmlSetProp( node, (xmlChar*)"age", (xmlChar*)"3" );

Remember that all attributes are text strings (or, "xmlChar") to libxml2.

Saving the .xml file

There are several different ways to save a file using libxml2. Here is the one I use:

xmlSaveFormatFileEnc( "output.xml", doc, "utf-8", 1 );

The "1" as the last parameter indicates that nodes need to be auto-indented. However, it seems this only takes effect if certain other APIs were called before opening the .xml file. Although not included in the example code, my own code that calls into libxml2 normally includes these lines just prior to calling xmlParseFile():

xmlLineNumbersDefault(1);
xmlThrDefIndentTreeOutput(1);
xmlKeepBlanksDefault(0);
xmlThrDefTreeIndentString(" ");

Shutting down

Cleaning up libxml2 is simple. The tree of nodes and attributes is freed recursively in a single call when the doc pointer is freed, but all xpath contexts and xpath objects need to be explicitly released:

xmlXPathFreeObject( xpathObj );
xmlXPathFreeContext( xpathCtx );
xmlFreeDoc( doc );
xmlCleanupParser();

Additional APIs and information I find useful

Last modified: 2014-09-28
Stéphane Charette, stephanecharette@gmail.com