The XML C parser and toolkit of Gnome

Entities or no entities

Entities in principle are similar to simple C macros. An entity defines an abbreviation for a given string that you can reuse many times throughout the content of your document. Entities are especially useful when a given string may occur frequently within a document, or to confine the change needed to a document to a restricted area in the internal subset of the document (at the beginning). Example:

1 <?xml version="1.0"?>
2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 <!ENTITY xml "Extensible Markup Language">
4 ]>
5 <EXAMPLE>
6    &xml;
7 </EXAMPLE>

Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing its name with '&' and following it by ';' without any spaces added. There are 5 predefined entities in libxml2 allowing you to escape characters with predefined meaning in some parts of the XML document content: &lt; for the character '<', &gt; for the character '>', &apos; for the character ''', &quot; for the character '"', and &amp; for the character '&'.
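As a rough illustration of what escaping with the five predefined entities involves, here is a minimal, standalone C sketch. It does not use libxml2; the helper names predefined_entity and escape_predefined are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the predefined entity reference for c, or NULL if c can be
 * copied as-is. */
static const char *predefined_entity(char c)
{
    switch (c) {
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    case '&':  return "&amp;";
    default:   return NULL;
    }
}

/* Hypothetical helper: return a newly allocated copy of src in which the
 * five special characters are replaced by entity references. The caller
 * frees the result. Returns NULL if the allocation fails. */
char *escape_predefined(const char *src)
{
    assert(src != NULL);

    /* The longest replacement ("&apos;" or "&quot;") is 6 bytes, so
     * 6 * length + 1 is always enough room. A production version
     * would also guard this multiplication against overflow. */
    char *out = malloc(strlen(src) * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (; *src != '\0'; src++) {
        const char *ent = predefined_entity(*src);
        if (ent != NULL) {
            size_t n = strlen(ent);
            memcpy(p, ent, n);
            p += n;
        } else {
            *p++ = *src;
        }
    }
    *p = '\0';
    return out;
}
```

For instance, escape_predefined("a < b") returns the string "a &lt; b", which the caller must release with free().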

One of the problems related to entities is that you may want the parser to substitute an entity's content so that you can see the replacement text in your application. Or you may prefer to keep entity references as such in the content to be able to save the document back without losing this usually precious information (if the user went through the pain of explicitly defining entities, they may have a rather negative attitude if you blindly substitute them at save time). The xmlSubstituteEntitiesDefault() function allows you to check and change the behaviour, which is to not substitute entities by default.

Here is the DOM tree built by libxml2 for the previous document in the default case:

/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=

And here is the result when substituting entities:

/gnome/src/gnome-xml -> ./xmllint --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=     Extensible Markup Language

So, entities or no entities? Basically, it depends on your use case. I suggest that you keep the non-substituting default behaviour and avoid using entities in your XML document or data if you are not willing to handle the entity reference elements in the DOM tree.

Note that at save time libxml2 enforces the conversion of the predefined entities where necessary to prevent well-formedness problems, and when parsing it will also transparently replace them with the corresponding characters (i.e. it will not generate entity reference elements in the DOM tree or call the reference() SAX callback when finding them in the input).

WARNING: handling entities on top of the libxml2 SAX interface is difficult! If you plan to use non-predefined entities in your documents, then the learning curve to handle them using the SAX API may be long. If you plan to use complex documents, I strongly suggest you consider using the DOM interface instead and let libxml2 deal with the complexity rather than trying to do it yourself.

Daniel Veillard