diff options
Diffstat (limited to 'docs/reference/modules/scripting.asciidoc')
-rw-r--r-- | docs/reference/modules/scripting.asciidoc | 316 |
1 files changed, 316 insertions, 0 deletions
diff --git a/docs/reference/modules/scripting.asciidoc b/docs/reference/modules/scripting.asciidoc new file mode 100644 index 0000000..166a030 --- /dev/null +++ b/docs/reference/modules/scripting.asciidoc @@ -0,0 +1,316 @@ +[[modules-scripting]] +== Scripting + +The scripting module allows to use scripts in order to evaluate custom +expressions. For example, scripts can be used to return "script fields" +as part of a search request, or can be used to evaluate a custom score +for a query and so on. + +The scripting module uses by default http://mvel.codehaus.org/[mvel] as +the scripting language with some extensions. mvel is used since it is +extremely fast and very simple to use, and in most cases, simple +expressions are needed (for example, mathematical equations). + +Additional `lang` plugins are provided to allow to execute scripts in +different languages. Currently supported plugins are `lang-javascript` +for JavaScript, `lang-groovy` for Groovy, and `lang-python` for Python. +All places where a `script` parameter can be used, a `lang` parameter +(on the same level) can be provided to define the language of the +script. The `lang` options are `mvel`, `js`, `groovy`, `python`, and +`native`. + +[float] +=== Default Scripting Language + +The default scripting language (assuming no `lang` parameter is +provided) is `mvel`. In order to change it set the `script.default_lang` +to the appropriate language. + +[float] +=== Preloaded Scripts + +Scripts can always be provided as part of the relevant API, but they can +also be preloaded by placing them under `config/scripts` and then +referencing them by the script name (instead of providing the full +script). This helps reduce the amount of data passed between the client +and the nodes. + +The name of the script is derived from the hierarchy of directories it +exists under, and the file name without the lang extension. For example, +a script placed under `config/scripts/group1/group2/test.py` will be +named `group1_group2_test`. + +[float] +=== Disabling dynamic scripts + +We recommend running Elasticsearch behind an application or proxy, +which protects Elasticsearch from the outside world. If users are +allowed to run dynamic scripts (even in a search request), then they +have the same access to your box as the user that Elasticsearch is +running as. + +First, you should not run Elasticsearch as the `root` user, as this +would allow a script to access or do *anything* on your server, without +limitations. Second, you should not expose Elasticsearch directly to +users, but instead have a proxy application inbetween. If you *do* +intend to expose Elasticsearch directly to your users, then you have +to decide whether you trust them enough to run scripts on your box or +not. If not, then even if you have a proxy which only allows `GET` +requests, you should disable dynamic scripting by adding the following +setting to the `config/elasticsearch.yml` file on every node: + +[source,yaml] +----------------------------------- +script.disable_dynamic: true +----------------------------------- + +This will still allow execution of named scripts provided in the config, or +_native_ Java scripts registered through plugins, however it will prevent +users from running arbitrary scripts via the API. + +[float] +=== Automatic Script Reloading + +The `config/scripts` directory is scanned periodically for changes. +New and changed scripts are reloaded and deleted script are removed +from preloaded scripts cache. The reload frequency can be specified +using `watcher.interval` setting, which defaults to `60s`. +To disable script reloading completely set `script.auto_reload_enabled` +to `false`. + +[float] +=== Native (Java) Scripts + +Even though `mvel` is pretty fast, allow to register native Java based +scripts for faster execution. + +In order to allow for scripts, the `NativeScriptFactory` needs to be +implemented that constructs the script that will be executed. There are +two main types, one that extends `AbstractExecutableScript` and one that +extends `AbstractSearchScript` (probably the one most users will extend, +with additional helper classes in `AbstractLongSearchScript`, +`AbstractDoubleSearchScript`, and `AbstractFloatSearchScript`). + +Registering them can either be done by settings, for example: +`script.native.my.type` set to `sample.MyNativeScriptFactory` will +register a script named `my`. Another option is in a plugin, access +`ScriptModule` and call `registerScript` on it. + +Executing the script is done by specifying the `lang` as `native`, and +the name of the script as the `script`. + +Note, the scripts need to be in the classpath of elasticsearch. One +simple way to do it is to create a directory under plugins (choose a +descriptive name), and place the jar / classes files there, they will be +automatically loaded. + +[float] +=== Score + +In all scripts that can be used in facets, allow to access the current +doc score using `doc.score`. + +[float] +=== Computing scores based on terms in scripts + +see <<modules-advanced-scripting, advanced scripting documentation>> + +[float] +=== Document Fields + +Most scripting revolve around the use of specific document fields data. +The `doc['field_name']` can be used to access specific field data within +a document (the document in question is usually derived by the context +the script is used). Document fields are very fast to access since they +end up being loaded into memory (all the relevant field values/tokens +are loaded to memory). + +The following data can be extracted from a field: + +[cols="<,<",options="header",] +|======================================================================= +|Expression |Description +|`doc['field_name'].value` |The native value of the field. For example, +if its a short type, it will be short. + +|`doc['field_name'].values` |The native array values of the field. For +example, if its a short type, it will be short[]. Remember, a field can +have several values within a single doc. Returns an empty array if the +field has no values. + +|`doc['field_name'].empty` |A boolean indicating if the field has no +values within the doc. + +|`doc['field_name'].multiValued` |A boolean indicating that the field +has several values within the corpus. + +|`doc['field_name'].lat` |The latitude of a geo point type. + +|`doc['field_name'].lon` |The longitude of a geo point type. + +|`doc['field_name'].lats` |The latitudes of a geo point type. + +|`doc['field_name'].lons` |The longitudes of a geo point type. + +|`doc['field_name'].distance(lat, lon)` |The `plane` distance (in meters) +of this geo point field from the provided lat/lon. + +|`doc['field_name'].distanceWithDefault(lat, lon, default)` |The `plane` distance (in meters) +of this geo point field from the provided lat/lon with a default value. + +|`doc['field_name'].distanceInMiles(lat, lon)` |The `plane` distance (in +miles) of this geo point field from the provided lat/lon. + +|`doc['field_name'].distanceInMilesWithDefault(lat, lon, default)` |The `plane` distance (in +miles) of this geo point field from the provided lat/lon with a default value. + +|`doc['field_name'].distanceInKm(lat, lon)` |The `plane` distance (in +km) of this geo point field from the provided lat/lon. + +|`doc['field_name'].distanceInKmWithDefault(lat, lon, default)` |The `plane` distance (in +km) of this geo point field from the provided lat/lon with a default value. + +|`doc['field_name'].arcDistance(lat, lon)` |The `arc` distance (in +meters) of this geo point field from the provided lat/lon. + +|`doc['field_name'].arcDistanceWithDefault(lat, lon, default)` |The `arc` distance (in +meters) of this geo point field from the provided lat/lon with a default value. + +|`doc['field_name'].arcDistanceInMiles(lat, lon)` |The `arc` distance (in +miles) of this geo point field from the provided lat/lon. + +|`doc['field_name'].arcDistanceInMilesWithDefault(lat, lon, default)` |The `arc` distance (in +miles) of this geo point field from the provided lat/lon with a default value. + +|`doc['field_name'].arcDistanceInKm(lat, lon)` |The `arc` distance (in +km) of this geo point field from the provided lat/lon. + +|`doc['field_name'].arcDistanceInKmWithDefault(lat, lon, default)` |The `arc` distance (in +km) of this geo point field from the provided lat/lon with a default value. + +|`doc['field_name'].factorDistance(lat, lon)` |The distance factor of this geo point field from the provided lat/lon. + +|`doc['field_name'].factorDistance(lat, lon, default)` |The distance factor of this geo point field from the provided lat/lon with a default value. + + +|======================================================================= + +[float] +=== Stored Fields + +Stored fields can also be accessed when executed a script. Note, they +are much slower to access compared with document fields, but are not +loaded into memory. They can be simply accessed using +`_fields['my_field_name'].value` or `_fields['my_field_name'].values`. + +[float] +=== Source Field + +The source field can also be accessed when executing a script. The +source field is loaded per doc, parsed, and then provided to the script +for evaluation. The `_source` forms the context under which the source +field can be accessed, for example `_source.obj2.obj1.field3`. + +Accessing `_source` is much slower compared to using `_doc` +but the data is not loaded into memory. For a single field access `_fields` may be +faster than using `_source` due to the extra overhead of potentially parsing large documents. +However, `_source` may be faster if you access multiple fields or if the source has already been +loaded for other purposes. + + +[float] +=== mvel Built In Functions + +There are several built in functions that can be used within scripts. +They include: + +[cols="<,<",options="header",] +|======================================================================= +|Function |Description +|`time()` |The current time in milliseconds. + +|`sin(a)` |Returns the trigonometric sine of an angle. + +|`cos(a)` |Returns the trigonometric cosine of an angle. + +|`tan(a)` |Returns the trigonometric tangent of an angle. + +|`asin(a)` |Returns the arc sine of a value. + +|`acos(a)` |Returns the arc cosine of a value. + +|`atan(a)` |Returns the arc tangent of a value. + +|`toRadians(angdeg)` |Converts an angle measured in degrees to an +approximately equivalent angle measured in radians + +|`toDegrees(angrad)` |Converts an angle measured in radians to an +approximately equivalent angle measured in degrees. + +|`exp(a)` |Returns Euler's number _e_ raised to the power of value. + +|`log(a)` |Returns the natural logarithm (base _e_) of a value. + +|`log10(a)` |Returns the base 10 logarithm of a value. + +|`sqrt(a)` |Returns the correctly rounded positive square root of a +value. + +|`cbrt(a)` |Returns the cube root of a double value. + +|`IEEEremainder(f1, f2)` |Computes the remainder operation on two +arguments as prescribed by the IEEE 754 standard. + +|`ceil(a)` |Returns the smallest (closest to negative infinity) value +that is greater than or equal to the argument and is equal to a +mathematical integer. + +|`floor(a)` |Returns the largest (closest to positive infinity) value +that is less than or equal to the argument and is equal to a +mathematical integer. + +|`rint(a)` |Returns the value that is closest in value to the argument +and is equal to a mathematical integer. + +|`atan2(y, x)` |Returns the angle _theta_ from the conversion of +rectangular coordinates (_x_, _y_) to polar coordinates (r,_theta_). + +|`pow(a, b)` |Returns the value of the first argument raised to the +power of the second argument. + +|`round(a)` |Returns the closest _int_ to the argument. + +|`random()` |Returns a random _double_ value. + +|`abs(a)` |Returns the absolute value of a value. + +|`max(a, b)` |Returns the greater of two values. + +|`min(a, b)` |Returns the smaller of two values. + +|`ulp(d)` |Returns the size of an ulp of the argument. + +|`signum(d)` |Returns the signum function of the argument. + +|`sinh(x)` |Returns the hyperbolic sine of a value. + +|`cosh(x)` |Returns the hyperbolic cosine of a value. + +|`tanh(x)` |Returns the hyperbolic tangent of a value. + +|`hypot(x, y)` |Returns sqrt(_x2_ + _y2_) without intermediate overflow +or underflow. +|======================================================================= + +[float] +=== Arithmetic precision in MVEL + +When dividing two numbers using MVEL based scripts, the engine tries to +be smart and adheres to the default behaviour of java. This means if you +divide two integers (you might have configured the fields as integer in +the mapping), the result will also be an integer. This means, if a +calculation like `1/num` is happening in your scripts and `num` is an +integer with the value of `8`, the result is `0` even though you were +expecting it to be `0.125`. You may need to enforce precision by +explicitly using a double like `1.0/num` in order to get the expected +result. |