Here is a real-size example, where the actual content of the application
data is not kept in the DOM tree but in internal structures. It is based
on a proposal to keep a database of jobs related to Gnome, with an XML-based
storage structure. Here is an XML-encoded jobs base:

<?xml version="1.0"?>
<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
  <gjob:Jobs>
    <gjob:Job>
      <gjob:Project ID="3"/>
      <gjob:Application>GBackup</gjob:Application>
      <gjob:Category>Development</gjob:Category>
      <gjob:Update>
        <gjob:Status>Open</gjob:Status>
        <gjob:Modified>Mon, 07 Jun 1999 20:27:45 -0400 MET DST</gjob:Modified>
        <gjob:Salary>USD 0.00</gjob:Salary>
      </gjob:Update>
      <gjob:Developers>
        <gjob:Developer>
        </gjob:Developer>
      </gjob:Developers>
      <gjob:Contact>
        <gjob:Person>Nathan Clemons</gjob:Person>
        <gjob:Email>nathan@windsofstorm.net</gjob:Email>
        <gjob:Company>
        </gjob:Company>
        <gjob:Organisation>
        </gjob:Organisation>
        <gjob:Webpage>
        </gjob:Webpage>
        <gjob:Snailmail>
        </gjob:Snailmail>
        <gjob:Phone>
        </gjob:Phone>
      </gjob:Contact>
      <gjob:Requirements>
        The program should be released as free software, under the GPL.
      </gjob:Requirements>
      <gjob:Skills>
      </gjob:Skills>
      <gjob:Details>
        A GNOME based system that will allow a superuser to configure
        compressed and uncompressed files and/or file systems to be backed
        up with a supported media in the system. This should be able to
        perform via find commands generating a list of files that are passed
        to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
        or via operations performed on the filesystem itself. Email
        notification and GUI status display very important.
      </gjob:Details>
    </gjob:Job>
  </gjob:Jobs>
</gjob:Helping>

While loading the XML file into an internal DOM tree is a matter of calling
only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder and more error-prone.

The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant;
the XML specification is clear about it. It's also usually a good idea not
to depend on the order of the children of a given node, unless doing so
really makes things harder. Here is some code to parse the information for
a person:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* DEBUG is not defined in this excerpt; a simple trace macro is assumed */
#define DEBUG(x) printf(x)

/*
 * A person record
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG("parsePerson\n");
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr, "out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        /* node names are xmlChar strings, so compare them with xmlStrcmp */
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Person")) &&
            (cur->ns == ns))
            ret->name = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Email")) &&
            (cur->ns == ns))
            ret->email = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        cur = cur->next;
    }

    /* the strings stored in ret should later be released with xmlFree */
    return(ret);
}

Here are a couple of things to notice:

- Usually a recursive parsing style is the more convenient one: XML data is
  by nature subject to repetitive constructs and usually exhibits highly
  structured patterns.
- The two arguments of type xmlDocPtr and xmlNsPtr are the pointer to the
  global XML document and the namespace reserved to the application.
  Document-wide information is needed, for example, to decode entities, and
  it's good coding practice to define a namespace for your application's
  set of data and to test that the elements and attributes you're analyzing
  actually pertain to your application space. This is done by a simple
  equality test (cur->ns == ns).
- To retrieve text and attribute values, you can use the function
  xmlNodeListGetString to gather all the text and entity reference nodes
  generated by the DOM output and produce a single text string.
Here is another piece of code used to parse another level of the structure:

#include <libxml/tree.h>
/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr, "out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        /* node and attribute names are xmlChar strings, hence the casts */
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Project")) &&
            (cur->ns == ns)) {
            ret->projectID = xmlGetProp(cur, (const xmlChar *) "ID");
            if (ret->projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Application")) &&
            (cur->ns == ns))
            ret->application = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Category")) &&
            (cur->ns == ns))
            ret->category = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "Contact")) &&
            (cur->ns == ns))
            ret->contact = parsePerson(doc, ns, cur);
        cur = cur->next;
    }

    return(ret);
}

Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it could be possible to write stubbers taking either C
data structure definitions, a set of XML examples or an XML DTD and
producing the code needed to import and export the content between C data
and XML storage. This is left as an exercise to the reader :-)

Feel free to use the code for the full C parsing example as a template; it
is also available with a Makefile in the Gnome CVS base under
gnome-xml/example

Daniel Veillard