From 2ee13d9e464a1f5daccaff58f5d09d36b7c4f667 Mon Sep 17 00:00:00 2001
From: Aron Xu
If you don't understand why a string is meaningless without knowing what encoding it uses, then, as Joel Spolsky said, please do not write another line of code until you finish reading that article. It is a prerequisite to understanding this page and to avoiding a lot of problems with libxml2, XML, or text processing in general.
Table of Contents:
<?xml version="1.0" encoding="ISO-8859-1"?>
<très>là</très>
Having internationalization support in libxml2 means the following:
<!DOCTYPE HTML PUBLIC "-
<p>W3C crée des standards pour le Web.</body>
</html>
One of the core decisions was to force all documents to be converted to a default internal encoding, and for that encoding to be UTF-8. Here are the rationales for those choices:
What does this mean in practice for the libxml2 user:
Let's describe how all this works within libxml. Basically, the I18N (internationalization) support gets triggered only during I/O operations, i.e. when reading a document or saving one. Let's look first at the reading sequence:
otherwise everything is written in the internal form, i.e. UTF-8
@@ -175,7 +183,8 @@ so a couple of functions htmlGetMetaEncoding() and htmlSetMetaEncoding() have been provided.
The parser also attempts to switch encoding on the fly when detecting such a tag on input. Except for that, the processing is the same (and again reuses the same code).
libxml2 has a set of default converters for the following encodings (located in encoding.c):