summaryrefslogtreecommitdiff
path: root/doc/mmutf8fix.html
diff options
context:
space:
mode:
Diffstat (limited to 'doc/mmutf8fix.html')
-rw-r--r--doc/mmutf8fix.html105
1 files changed, 0 insertions, 105 deletions
diff --git a/doc/mmutf8fix.html b/doc/mmutf8fix.html
deleted file mode 100644
index 6275c17..0000000
--- a/doc/mmutf8fix.html
+++ /dev/null
@@ -1,105 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-<html><head>
-<meta http-equiv="Content-Language" content="en">
-<title>Fix invalid UTF-8 Sequences (mmutf8fix)</title></head>
-
-<body>
-<a href="rsyslog_conf_modules.html">back</a>
-
-<h1>Fix invalid UTF-8 Sequences (mmutf8fix)</h1>
-<p><b>Module Name:&nbsp;&nbsp;&nbsp; mmutf8fix</b></p>
-<p><b>Author: </b>Rainer Gerhards &lt;rgerhards@adiscon.com&gt;</p>
-<p><b>Available since</b>: 7.5.4</p>
-<p><b>Description</b>:</p>
-<p>The mmutf8fix module permits to fix invalid UTF-8 sequences.
-Most often, such invalid sequences result from syslog sources sending
-in non-UTF character sets, e.g. ISO 8859. As syslog does not have a way
-to convey the character set information, these sequences are not properly
-handled. While they are typically uncritical with plain text files, they can
-cause big headache with database sources as well as systems like ElasticSearch.
-<p>The module supports different "fixing" modes and fixes. The current
-implementation will always replace invalid bytes with a single US ASCII
-character. Additional replacement modes will probably be added in the future,
-depending on user demand. In the longer term
-it could also be evolved into an any-charset-to-UTF8 converter. But
-first let's see if it really gets into widespread enough use.
-
-<p><b>Proper Usage</b>:</p>
-<p>Some notes are due for proper use of this module. This is a message modification
-module utilizing the action interface, which means you call it like an action.
-This gives great flexibility on the question on when and how to call this module.
-Note that once it has been called, it actually modifies the message. The original
-messsage is then no longer available. However, this does <b>not</b> change any
-properties set, used or extracted before the modification is done.
-<p>One potential use case is to normalize all messages. This is done by simply calling
-mmutf8fix right in front of all other actions.
-<p>If only a specific source (or set of sources) is known to cause problems,
-mmutf8fix can be conditionally called only on messages from them. This also offers
-performance benefits. If such multiple sources exists, it probably is a good idea
-to define different listeners for their incoming traffic, bind them to specific
-<a href="multi_ruleset.html">ruleset</a> and call mmutf8fix as first action in this
-ruleset.
-
-<p><b>Module Configuration Parameters</b>:</p>
-<p>Currently none.
-<p>&nbsp;</p>
-<p><b>Action Confguration Parameters</b>:</p>
-<ul>
-<li><b>mode</b> - <b>utf-8</b>/controlcharacters<br>
-This sets the basic detection mode.
-<br>In <b>utf-8</b> mode (the default), proper
-UTF-8 encoding is checked and bytes which are not proper UTF-8 sequences
-are acted on. If a proper multi-byte start sequence byte is detected but
-any of the following bytes is invalid, the whole sequence is replaced by
-the replacement method. This mode is most useful with non-US-ASCII character
-sets, which validly includes multibyte sequences. Note that in this mode
-control characters are NOT being replaced, because they are valid UTF-8.
-<br>In <b>controlcharacters</b> mode, all bytes which do not represent a
-printable US-ASCII character (codes 32 to 126) are replaced. Note that this
-also mangles valid UTF-8 multi-byte sequences, as these are (deliberately) outside
-of that character range. This mode is most useful if it is known that no
-characters outside of the US-ASCII alphabet need to be processed.
-<li><b>replacementChar</b> - default " " (space), a single character<br>
-This is the character that invalid sequences are replaced by. Currently, it
-MUST be a <b>printable</b> US-ASCII character.
-</ul>
-
-<p><b>Caveats/Known Bugs:</b>
-<ul>
-<li>overlong UTF-8 encodings are currently not detected in utf-8 mode.
-</ul>
-
-<p><b>Samples:</b></p>
-<p>In this snippet, we write one file without fixing UTF-8 and another one
-with the message fixed. Note that once mmutf8fix has run, access to the
-original message is no longer possible.
-<p><textarea rows="5" cols="60">module(load="mmutf8fix")
-action(type="omfile" file="/path/to/non-fixed.log")
-action(type="mmutf8fix")
-action(type="omfile" file="/path/to/fixed.log")
-</textarea>
-
-<p>In this sample, we fix only message originating from host 10.0.0.1.
-<p><textarea rows="5" cols="60">module(load="mmutf8fix")
-if $fromhost-ip == "10.0.0.1" then
- action(type="mmutf8fix")
-# all other actions here...
-</textarea>
-
-<p>This is mostly the same as the previous sample, but uses "controlcharacters"
-processing mode.
-<p><textarea rows="5" cols="60">module(load="mmutf8fix")
-if $fromhost-ip == "10.0.0.1" then
- action(type="mmutf8fix" mode="controlcharacters")
-# all other actions here...
-</textarea>
-
-<p>[<a href="rsyslog_conf.html">rsyslog.conf overview</a>] [<a href="manual.html">manual
-index</a>] [<a href="http://www.rsyslog.com/">rsyslog site</a>]</p>
-<p><font size="2">This documentation is part of the
-<a href="http://www.rsyslog.com/">rsyslog</a> project.<br>
-Copyright &copy; 2013 by <a href="http://www.gerhards.net/rainer">Rainer Gerhards</a> and
-<a href="http://www.adiscon.com/">Adiscon</a>. Released under the GNU GPL
-version 3 or higher.</font></p>
-
-</body></html>