summaryrefslogtreecommitdiff
path: root/doc/lookup_tables.html
blob: d72810f18c57bbcec560c57a5bdc2ef404f2351a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<title>Lookup Tables</title>
</head>

<body>
<h1>Lookup Tables</h1>

<p><b><font color="red">NOTE: this is</font> proposed functionality, which is
<font color="red">NOT YET IMPLEMENTED</font>!</b>

<p><b>Lookup tables</a> are a powerful construct
to obtain "class" information based on message content (e.g. to build
log file names for different server types, departments or remote 
offices).</b>
<p>The base idea is to use a message variable as an index into a table which then
returns another value. For example, $fromhost-ip could be used as an index, with
the table value representing the type of server or the department or remote office
it is located in. A main point with lookup tables is that the lookup is very fast.
So while lookup tables can be emulated with if-elseif constructs, they are generally
much faster. Also, it is possible to reload lookup tables during rsyslog runtime without
the need for a full restart.
<p>The lookup tables itself exists in a separate configuration file (one per table). This
file is loaded on rsyslog startup and when a reload is requested.
<p>There are different types of lookup tables:
<ul>
<li><b>string</b> - the value to be looked up is an arbitrary string. Only exact
some strings match.
<li><b>array</b> - the value to be looked up is an integer number from a consequtive set.
The set does not need to start at zero or one, but there must be no number missing. So, for example
5,6,7,8,9 would be a valid set of index values, while 1,2,4,5 would not be (due to missing
2).
A match happens if the requested number is present.
<li><b>sparseArray</b> - the value to be looked up is an integer value, but there may
be gaps inside the set of values (usually there are large gaps). A typical use case would
be the matching of IPv4 address information. A match happens on the first value that is
less than or equal to the requested value.
</ul>
<p>Note that index integer numbers are represented by unsigned 32 bits.
<p>Lookup tables can be access via the lookup() built-in function. The core idea is to
set a local variable to the lookup result and later on use that local variable in templates.
<p>More details on usage now follow.
<h2>Lookup Table File Format</h2>
<p>Lookup table files contain a single JSON object. This object contains of a header and a
table part.
<h3>Header</h3>
<p>The header is the top-level json. It has paramters "version", "nomatch", and "type".
The version parameter
must be given and must always be one for this version of rsyslog. The nomatch
parameter is optional. If specified, it contains the value to be used if lookup()
is provided an index value for which no entry exists. The default for
"nomatch" is the empty string.  Type specifies the type of lookup to be done.
<h3>Table</h3>
This must be an array of elements, even if only a single value exists (for obvious
reasons, we do not expect this to occur often). Each array element must contain two
fields "index" and "value".
<h3>Example</h3>
<p>This is a sample of how an ip-to-office mapping may look like:
<pre>
{ "version":1, "nomatch":"unk", "type":"string",
  "table":[ {"index":"10.0.1.1", "value":"A" },
          {"index":"10.0.1.2", "value":"A" },
          {"index":"10.0.1.3", "value":"A" },
          {"index":"10.0.2.1", "value":"B" },
          {"index":"10.0.2.2", "value":"B" },
          {"index":"10.0.2.3", "value":"B" }
        ]
}
</pre>
Note: if a different IP comes in, the value "unk"
is returend thanks to the nomatch parameter in
the first line.
<p>
<h2>RainerScript Statements</h2>
<h3>lookup_table() Object</h3>
<p>This statement defines and intially loads a lookup table. Its format is
as follows:
<pre>
lookup_table(name="name" file="/path/to/file" reloadOnHUP="on|off")
</pre>
<h4>Parameters</h4>
<ul>
	<li><b>name</b> (mandatory)<br>
	Defines the name of lookup table for further reference
	inside the configuration. Names must be unique. Note that
	it is possible, though not advisible, to have different
	names for the same file.
	<li><b>file</b> (mandatory)<br>
	Specifies the full path for the lookup table file. This file
	must be readable for the user rsyslog is run under (important
	when dropping privileges). It must point to a valid lookup
	table file as described above.
	<li><b>reloadOnHUP</b> (optional, default "on")<br>
	Specifies if the table shall automatically be reloaded
	as part of HUP processing. For static tables, the
	default is "off" and specifying "on" triggers an
	error message. Note that the default of "on" may be
	somewhat suboptimal performance-wise, but probably
	is what the user intuitively expects. Turn it off
	if you know that you do not need the automatic
	reload capability.
</ul>

<h3>lookup() Function</h3>
<p>This function is used to actually do the table lookup. Format:
<pre>
lookup_table("name", indexvalue)
</pre>
<h4>Parameters</h4>
<ul>
	<li><b>return value</b><br>
	The function returns the string that is associated with the
	given indexvalue. If the indexvalue is not present inside the
	lookup table, the "nomatch" string is returned (or an empty string
	if it is not defined).
	<li><b>name</b> (constant string)<br>
	The lookup table to be used. Note that this must be specificed as a
	constant. In theory, variable table names could be made possible, but
	their runtime behaviour is not as good as for static names, and we do
	not (yet) see good use cases where dynamic table names could be useful.
	<li><b>indexvalue</b> (expression)<br>
	The value to be looked up. While this is an arbitrary RainerScript expression,
	it's final value is always converted to a string in order to conduct
	the lookup. For example, "lookup(table, 3+4)" would be exactly the same
	as "lookup(table, "7")". In most cases, indexvalue will probably be
	a single variable, but it could also be the result of all RainerScript-supported
	expression types (like string concatenation or substring extraction).
	Valid samples are "lookup(name, $fromhost-ip &amp; $hostname)" or
	"lookup(name, substr($fromhost-ip, 0, 5))" as well as of course the
	usual "lookup(table, $fromhost-ip)".
</ul>


<h3>load_lookup_table Statement</h3>

<p><b>Note: in the final implementation, this MAY be implemented as an action.
This is a low-level decesion that must be made during the detail development
process. Parameters and semantics will remain the same of this happens.</b>

<p>This statement is used to reload a lookup table. It will fail if
the table is static. While this statement is executed, lookups to this table
are temporarily blocked. So for large tables, there may be a slight performance
hit during the load phase. It is assume that always a triggering condition
is used to load the table.
<pre>
load_lookup_table(name="name" errOnFail="on|off" valueOnFail="value")
</pre>
<h4>Parameters</h4>
<ul>
	<li><b>name</b> (string)<br>
	The lookup table to be used.
	<li><b>errOnFail</b> (boolean, default "on")<br>
	Specifies whether or not an error message is to be emitted if
	there are any problems reloading the lookup table.
	<li><b>valueOnFail</b> (optional, string)<br>
	This parameter affects processing if the lookup table cannot
	be loaded for some reason: If the parameter is not present,
	the previous table will be kept in use. If the parameter is
	given, the previous table will no longer be used, and instead
	an empty table be with nomath=valueOnFail be generated. In short,
	that means when the parameter is set and the reload fails,
	all matches will always return what is specified in valueOnFail.
</ul>

<h3>Usage example</h3>
<p>For clarity, we show only those parts of rsyslog.conf that affect
lookup tables. We use the remote office example that an example lookup
table file is given above for.
<pre>
lookup_table(name="ip2office" file="/path/to/ipoffice.lu"
             reloadOnHUP="off")


template(name="depfile" type="string"
         string="/var/log/%$usr.dep%/messages")

set $usr.dep = lookup("ip2office", $fromhost-ip);
action(type="omfile" dynfile="depfile")

# support for reload "commands"
if $fromhost-ip == "10.0.1.123"
   and $msg contains "reload office lookup table"
   then
   load_lookup_table(name="ip2office" errOnFail="on")
</pre>

<p>Note: for performance reasons, it makes sense to put the reload command into
a dedicated ruleset, bound to a specific listener - which than should also
be sufficiently secured, e.g. via TLS mutual auth.

<h2>Implementation Details</h2>
<p>The lookup table functionality is implemented via highly efficient algorithms.
The string lookup is based on a parse tree and has O(1) time complexity. The array
lookup is also O(1). In case of sparseArray, we have O(log n).
<p>To preserve space and, more important, increase cache hit performance, equal
data values are only stored once, no matter how often a lookup index points to them.
<p>[<a href="rsyslog_conf.html">rsyslog.conf overview</a>]
[<a href="manual.html">manual index</a>] [<a href="http://www.rsyslog.com/">rsyslog site</a>]</p>
<p><font size="2">This documentation is part of the
<a href="http://www.rsyslog.com/">rsyslog</a> project.<br>
Copyright &copy; 2013 by <a href="http://www.gerhards.net/rainer">Rainer Gerhards</a> and
<a href="http://www.adiscon.com/">Adiscon</a>.
Released under the GNU GPL version 3 or higher.</font></p>
</body>
</html>