author | Hilko Bengen <bengen@debian.org> | 2014-06-07 12:02:12 +0200 |
---|---|---|
committer | Hilko Bengen <bengen@debian.org> | 2014-06-07 12:02:12 +0200 |
commit | d5ed89b946297270ec28abf44bef2371a06f1f4f (patch) | |
tree | ce2d945e4dde69af90bd9905a70d8d27f4936776 /docs/reference/mapping/fields | |
download | elasticsearch-d5ed89b946297270ec28abf44bef2371a06f1f4f.tar.gz |
Imported Upstream version 1.0.3 (upstream/1.0.3)
Diffstat (limited to 'docs/reference/mapping/fields')
-rw-r--r-- | docs/reference/mapping/fields/all-field.asciidoc | 78 |
-rw-r--r-- | docs/reference/mapping/fields/analyzer-field.asciidoc | 41 |
-rw-r--r-- | docs/reference/mapping/fields/boost-field.asciidoc | 72 |
-rw-r--r-- | docs/reference/mapping/fields/id-field.asciidoc | 52 |
-rw-r--r-- | docs/reference/mapping/fields/index-field.asciidoc | 15 |
-rw-r--r-- | docs/reference/mapping/fields/parent-field.asciidoc | 21 |
-rw-r--r-- | docs/reference/mapping/fields/routing-field.asciidoc | 69 |
-rw-r--r-- | docs/reference/mapping/fields/size-field.asciidoc | 26 |
-rw-r--r-- | docs/reference/mapping/fields/source-field.asciidoc | 41 |
-rw-r--r-- | docs/reference/mapping/fields/timestamp-field.asciidoc | 82 |
-rw-r--r-- | docs/reference/mapping/fields/ttl-field.asciidoc | 70 |
-rw-r--r-- | docs/reference/mapping/fields/type-field.asciidoc | 31 |
-rw-r--r-- | docs/reference/mapping/fields/uid-field.asciidoc | 11 |
13 files changed, 609 insertions, 0 deletions
diff --git a/docs/reference/mapping/fields/all-field.asciidoc b/docs/reference/mapping/fields/all-field.asciidoc
new file mode 100644
index 0000000..65453ef
--- /dev/null
+++ b/docs/reference/mapping/fields/all-field.asciidoc
@@ -0,0 +1,78 @@
+[[mapping-all-field]]
+=== `_all`
+
+The idea of the `_all` field is that it includes the text of one or more
+other fields within the document indexed. It can come in very handy,
+especially for search requests, where we want to execute a search query
+against the content of a document, without knowing which fields to
+search on. This comes at the expense of CPU cycles and index size.
+
+The `_all` field can be completely disabled. Explicit field mapping and
+object mapping can be excluded / included in the `_all` field. By
+default, it is enabled and all fields are included in it for ease of
+use.
+
+When disabling the `_all` field, it is a good practice to set
+`index.query.default_field` to a different value (for example, if you
+have a main "message" field in your data, set it to `message`).
+
+One of the nice features of the `_all` field is that it takes into
+account field-specific boost levels. This means that if a title field is
+boosted more than content, the title (part) in the `_all` field will
+mean more than the content (part) in the `_all` field.
+
+Here is a sample mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "person" : {
+        "_all" : {"enabled" : true},
+        "properties" : {
+            "name" : {
+                "type" : "object",
+                "dynamic" : false,
+                "properties" : {
+                    "first" : {"type" : "string", "store" : true , "include_in_all" : false},
+                    "last" : {"type" : "string", "index" : "not_analyzed"}
+                }
+            },
+            "address" : {
+                "type" : "object",
+                "include_in_all" : false,
+                "properties" : {
+                    "first" : {
+                        "properties" : {
+                            "location" : {"type" : "string", "store" : true, "index_name" : "firstLocation"}
+                        }
+                    },
+                    "last" : {
+                        "properties" : {
+                            "location" : {"type" : "string"}
+                        }
+                    }
+                }
+            },
+            "simple1" : {"type" : "long", "include_in_all" : true},
+            "simple2" : {"type" : "long", "include_in_all" : false}
+        }
+    }
+}
+--------------------------------------------------
+
+The `_all` field allows for `store`, `term_vector` and `analyzer` (with
+specific `index_analyzer` and `search_analyzer`) to be set.
+
+[float]
+[[highlighting]]
+==== Highlighting
+
+For any field to allow
+<<search-request-highlighting,highlighting>> it has
+to be either stored or part of the `_source` field. By default the `_all`
+field does not qualify for either, so highlighting for it does not yield
+any data.
+
+Although it is possible to `store` the `_all` field, it is basically an
+aggregation of all fields, which means more data will be stored, and
+highlighting it might produce strange results.
diff --git a/docs/reference/mapping/fields/analyzer-field.asciidoc b/docs/reference/mapping/fields/analyzer-field.asciidoc
new file mode 100644
index 0000000..30bb072
--- /dev/null
+++ b/docs/reference/mapping/fields/analyzer-field.asciidoc
@@ -0,0 +1,41 @@
+[[mapping-analyzer-field]]
+=== `_analyzer`
+
+The `_analyzer` mapping allows using a document field property as the
+name of the analyzer that will be used to index the document. The
+analyzer will be used for any field that does not explicitly define an
+`analyzer` or `index_analyzer` when indexing.
+
+Here is a simple mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "type1" : {
+        "_analyzer" : {
+            "path" : "my_field"
+        }
+    }
+}
+--------------------------------------------------
+
+The above will use the value of `my_field` to look up an analyzer
+registered under it. For example, indexing the following doc:
+
+[source,js]
+--------------------------------------------------
+{
+    "my_field" : "whitespace"
+}
+--------------------------------------------------
+
+Will cause the `whitespace` analyzer to be used as the index analyzer
+for all fields without an explicit analyzer setting.
+
+The default path value is `_analyzer`, so the analyzer can be driven for
+a specific document by setting the `_analyzer` field in it. If a custom
+JSON field name is needed, an explicit mapping with a different path
+should be set.
+
+By default, the `_analyzer` field is indexed. It can be disabled by
+setting `index` to `no` in the mapping.
diff --git a/docs/reference/mapping/fields/boost-field.asciidoc b/docs/reference/mapping/fields/boost-field.asciidoc
new file mode 100644
index 0000000..1d00845
--- /dev/null
+++ b/docs/reference/mapping/fields/boost-field.asciidoc
@@ -0,0 +1,72 @@
+[[mapping-boost-field]]
+=== `_boost`
+
+deprecated[1.0.0.RC1,See <<function-score-instead-of-boost>>]
+
+Boosting is the process of enhancing the relevancy of a document or
+field. Field level mapping allows defining an explicit boost level on a
+specific field. The boost field mapping (applied on the
+<<mapping-root-object-type,root object>>) allows
+defining a boost field mapping where *its content will control the
+boost level of the document*. For example, consider the following
+mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_boost" : {"name" : "my_boost", "null_value" : 1.0}
+    }
+}
+--------------------------------------------------
+
+The above mapping defines a mapping for a field named `my_boost`. If the
+`my_boost` field exists within the JSON document indexed, its value will
+control the boost level of the document indexed. For example, the
+following JSON document will be indexed with a boost value of `2.2`:
+
+[source,js]
+--------------------------------------------------
+{
+    "my_boost" : 2.2,
+    "message" : "This is a tweet!"
+}
+--------------------------------------------------
+
+[[function-score-instead-of-boost]]
+==== Function score instead of boost
+
+Support for document boosting via the `_boost` field has been removed
+from Lucene and is deprecated in Elasticsearch as of v1.0.0.RC1. The
+implementation in Lucene resulted in unpredictable results when
+used with multiple fields or multi-value fields.
+
+Instead, the <<query-dsl-function-score-query>> can be used to achieve
+the desired functionality by boosting each document by the value in
+any field of the document:
+
+[source,js]
+--------------------------------------------------
+{
+    "query": {
+        "function_score": {
+            "query": { <1>
+                "match": {
+                    "title": "your main query"
+                }
+            },
+            "functions": [{
+                "script_score": { <2>
+                    "script": "doc['my_boost_field'].value"
+                }
+            }],
+            "score_mode": "multiply"
+        }
+    }
+}
+--------------------------------------------------
+<1> The original query, now wrapped in a `function_score` query.
+<2> This script returns the value in `my_boost_field`, which is then
+    multiplied by the query `_score` for each document.
+
+
diff --git a/docs/reference/mapping/fields/id-field.asciidoc b/docs/reference/mapping/fields/id-field.asciidoc
new file mode 100644
index 0000000..1adab49
--- /dev/null
+++ b/docs/reference/mapping/fields/id-field.asciidoc
@@ -0,0 +1,52 @@
+[[mapping-id-field]]
+=== `_id`
+
+Each document indexed is associated with an id and a type. The `_id`
+field can be used to index just the id, and possibly also store it. By
+default it is not indexed and not stored (thus, not created).
+
+Note, even though the `_id` is not indexed, all the APIs still work
+(since they work with the `_uid` field), as well as fetching by ids
+using `term`, `terms` or `prefix` queries/filters (including the
+specific `ids` query/filter).
+
+The `_id` field can be enabled to be indexed, and possibly stored,
+using:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_id" : {"index": "not_analyzed", "store" : false }
+    }
+}
+--------------------------------------------------
+
+The `_id` mapping can also be associated with a `path` that will be used
+to extract the id from a different location in the source document. For
+example, having the following mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_id" : {
+            "path" : "post_id"
+        }
+    }
+}
+--------------------------------------------------
+
+Will cause `1` to be used as the id for:
+
+[source,js]
+--------------------------------------------------
+{
+    "message" : "You know, for Search",
+    "post_id" : "1"
+}
+--------------------------------------------------
+
+This does require an additional lightweight parsing step while indexing,
+in order to extract the id to decide which shard the index operation
+will be executed on.
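The `path` lookup described in the `_id` mapping above can be illustrated with a minimal Python sketch. It is not part of the imported patch and is not how Elasticsearch implements the lookup: `extract_path` is a hypothetical helper, and treating a dot in the path as a step into a nested object is an assumption made for the illustration.

```python
from typing import Any, Optional


def extract_path(doc: dict, path: str) -> Optional[Any]:
    """Return the value found at a dotted path, or None if any step is missing.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    current: Any = doc
    for key in path.split("."):
        # Stop as soon as the path leaves the object structure.
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# The sample document from the `_id` section, with "path" set to "post_id".
doc = {"message": "You know, for Search", "post_id": "1"}
print(extract_path(doc, "post_id"))  # prints 1
```

The sketch returns `None` for a missing path instead of raising, so a caller can decide whether to fall back to an externally supplied value.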
diff --git a/docs/reference/mapping/fields/index-field.asciidoc b/docs/reference/mapping/fields/index-field.asciidoc
new file mode 100644
index 0000000..96a320b
--- /dev/null
+++ b/docs/reference/mapping/fields/index-field.asciidoc
@@ -0,0 +1,15 @@
+[[mapping-index-field]]
+=== `_index`
+
+The `_index` field stores, in each document, the index it belongs to. By
+default it is disabled. To enable it, the following mapping should be
+defined:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_index" : { "enabled" : true }
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/parent-field.asciidoc b/docs/reference/mapping/fields/parent-field.asciidoc
new file mode 100644
index 0000000..3225b53
--- /dev/null
+++ b/docs/reference/mapping/fields/parent-field.asciidoc
@@ -0,0 +1,21 @@
+[[mapping-parent-field]]
+=== `_parent`
+
+The parent field mapping is defined on a child mapping, and points to
+the parent type this child relates to. For example, in case of a `blog`
+type and a `blog_tag` type child document, the mapping for `blog_tag`
+should be:
+
+[source,js]
+--------------------------------------------------
+{
+    "blog_tag" : {
+        "_parent" : {
+            "type" : "blog"
+        }
+    }
+}
+--------------------------------------------------
+
+The mapping is automatically stored and indexed (meaning it can be
+searched on using the `_parent` field notation).
diff --git a/docs/reference/mapping/fields/routing-field.asciidoc b/docs/reference/mapping/fields/routing-field.asciidoc
new file mode 100644
index 0000000..8ca2286
--- /dev/null
+++ b/docs/reference/mapping/fields/routing-field.asciidoc
@@ -0,0 +1,69 @@
+[[mapping-routing-field]]
+=== `_routing`
+
+The routing field allows controlling the `_routing` aspect when indexing
+data for which explicit routing control is required.
+
+[float]
+==== store / index
+
+The first thing the `_routing` mapping does is to store the routing
+value provided (`store` set to `false`) and index it (`index` set to
+`not_analyzed`). The reason why the routing is stored by default is so
+reindexing data will be possible if the routing value is completely
+external and not part of the docs.
+
+[float]
+==== required
+
+Another aspect of the `_routing` mapping is the ability to define it as
+required by setting `required` to `true`. This is very important to set
+when using routing features, as it allows different APIs to make use of
+it. For example, an index operation will be rejected if no routing value
+has been provided (or derived from the doc). A delete operation will be
+broadcast to all shards if no routing value is provided and `_routing`
+is required.
+
+[float]
+==== path
+
+The routing value can be provided as an external value when indexing
+(and still stored as part of the document, in much the same way
+`_source` is stored). But, it can also be automatically extracted from
+the index doc based on a `path`. For example, having the following
+mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "comment" : {
+        "_routing" : {
+            "required" : true,
+            "path" : "blog.post_id"
+        }
+    }
+}
+--------------------------------------------------
+
+Will cause the following doc to be routed based on the `111222` value:
+
+[source,js]
+--------------------------------------------------
+{
+    "text" : "the comment text",
+    "blog" : {
+        "post_id" : "111222"
+    }
+}
+--------------------------------------------------
+
+Note, using `path` without an explicit routing value requires an
+additional (though quite fast) parsing phase.
+
+[float]
+==== id uniqueness
+
+When indexing documents specifying a custom `_routing`, the uniqueness
+of the `_id` is not guaranteed throughout all the shards that the index
+is composed of. In fact, documents with the same `_id` might end up in
+different shards if indexed with different `_routing` values.
diff --git a/docs/reference/mapping/fields/size-field.asciidoc b/docs/reference/mapping/fields/size-field.asciidoc
new file mode 100644
index 0000000..7abfd40
--- /dev/null
+++ b/docs/reference/mapping/fields/size-field.asciidoc
@@ -0,0 +1,26 @@
+[[mapping-size-field]]
+=== `_size`
+
+The `_size` field allows automatically indexing the size of the original
+`_source` indexed. By default, it's disabled. In order to enable it, set
+the mapping to:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_size" : {"enabled" : true}
+    }
+}
+--------------------------------------------------
+
+In order to also store it, use:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_size" : {"enabled" : true, "store" : true }
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/source-field.asciidoc b/docs/reference/mapping/fields/source-field.asciidoc
new file mode 100644
index 0000000..22bb963
--- /dev/null
+++ b/docs/reference/mapping/fields/source-field.asciidoc
@@ -0,0 +1,41 @@
+[[mapping-source-field]]
+=== `_source`
+
+The `_source` field is an automatically generated field that stores the
+actual JSON that was used as the indexed document. It is not indexed
+(searchable), just stored. When executing "fetch" requests, like
+<<docs-get,get>> or
+<<search-search,search>>, the `_source` field is
+returned by default.
+
+Though very handy to have around, the source field does incur storage
+overhead within the index. For this reason, it can be disabled. For
+example:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_source" : {"enabled" : false}
+    }
+}
+--------------------------------------------------
+
+[float]
+[[include-exclude]]
+==== Includes / Excludes
+
+Allows specifying paths in the source that will be included / excluded
+when it's stored, supporting `*` as a wildcard. For example:
+
+[source,js]
+--------------------------------------------------
+{
+    "my_type" : {
+        "_source" : {
+            "includes" : ["path1.*", "path2.*"],
+            "excludes" : ["path3.*"]
+        }
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/timestamp-field.asciidoc b/docs/reference/mapping/fields/timestamp-field.asciidoc
new file mode 100644
index 0000000..97bca8d
--- /dev/null
+++ b/docs/reference/mapping/fields/timestamp-field.asciidoc
@@ -0,0 +1,82 @@
+[[mapping-timestamp-field]]
+=== `_timestamp`
+
+The `_timestamp` field allows automatically indexing the timestamp of a
+document. It can be provided externally via the index request or in the
+`_source`. If it is not provided externally it will be automatically set
+to the date the document was processed by the indexing chain.
+
+[float]
+==== enabled
+
+By default it is disabled. To enable it, the following mapping
+should be defined:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_timestamp" : { "enabled" : true }
+    }
+}
+--------------------------------------------------
+
+[float]
+==== store / index
+
+By default the `_timestamp` field has `store` set to `false` and `index`
+set to `not_analyzed`. It can be queried as a standard date field.
+
+[float]
+==== path
+
+The `_timestamp` value can be provided as an external value when
+indexing. But, it can also be automatically extracted from the document
+to index based on a `path`. For example, having the following mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_timestamp" : {
+            "enabled" : true,
+            "path" : "post_date"
+        }
+    }
+}
+--------------------------------------------------
+
+Will cause `2009-11-15T14:12:12` to be used as the timestamp value for:
+
+[source,js]
+--------------------------------------------------
+{
+    "message" : "You know, for Search",
+    "post_date" : "2009-11-15T14:12:12"
+}
+--------------------------------------------------
+
+Note, using `path` without an explicit timestamp value requires an
+additional (though quite fast) parsing phase.
+
+[float]
+==== format
+
+You can define the <<mapping-date-format,date
+format>> used to parse the provided timestamp value. For example:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_timestamp" : {
+            "enabled" : true,
+            "path" : "post_date",
+            "format" : "YYYY-MM-dd"
+        }
+    }
+}
+--------------------------------------------------
+
+Note, the default format is `dateOptionalTime`. The timestamp value will
+first be parsed as a number and if it fails the format will be tried.
diff --git a/docs/reference/mapping/fields/ttl-field.asciidoc b/docs/reference/mapping/fields/ttl-field.asciidoc
new file mode 100644
index 0000000..d47aaca
--- /dev/null
+++ b/docs/reference/mapping/fields/ttl-field.asciidoc
@@ -0,0 +1,70 @@
+[[mapping-ttl-field]]
+=== `_ttl`
+
+A lot of documents naturally come with an expiration date. Documents can
+therefore have a `_ttl` (time to live), which will cause the expired
+documents to be deleted automatically.
+
+[float]
+==== enabled
+
+By default it is disabled. To enable it, the following mapping
+should be defined:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_ttl" : { "enabled" : true }
+    }
+}
+--------------------------------------------------
+
+[float]
+==== store / index
+
+By default the `_ttl` field has `store` set to `true` and `index` set to
+`not_analyzed`. Note that the `index` property has to be set to
+`not_analyzed` in order for the purge process to work.
+
+[float]
+==== default
+
+You can provide a per index/type default `_ttl` value as follows:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_ttl" : { "enabled" : true, "default" : "1d" }
+    }
+}
+--------------------------------------------------
+
+In this case, if you don't provide a `_ttl` value in your query or in
+the `_source`, all tweets will have a `_ttl` of one day.
+
+In case you do not specify a time unit like `d` (days), `m` (minutes),
+`h` (hours), `ms` (milliseconds) or `w` (weeks), milliseconds is used as
+the default unit.
+
+If no `default` is set and no `_ttl` value is given then the document
+has an infinite `_ttl` and will not expire.
+
+You can dynamically update the `default` value using the put mapping
+API. It won't change the `_ttl` of already indexed documents but will be
+used for future documents.
+
+[float]
+==== Note on documents expiration
+
+Expired documents will be automatically deleted regularly. You can
+dynamically set the `indices.ttl.interval` to fit your needs. The
+default value is `60s`.
+
+The deletion orders are processed in bulk. You can set
+`indices.ttl.bulk_size` to fit your needs. The default value is `10000`.
+
+Note that the expiration procedure handles versioning properly, so if a
+document is updated between the collection of documents to expire and
+the delete order, the document won't be deleted.
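The unit rule in the `_ttl` text above (a bare number means milliseconds) can be sketched in Python as follows. This is an illustration only, not part of the imported patch and not Elasticsearch's own parser: `parse_ttl` and `UNIT_MS` are hypothetical names, and the sketch covers only the units the text lists.

```python
import re

# Milliseconds per unit, limited to the suffixes named in the `_ttl` text.
UNIT_MS = {
    "ms": 1,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
}


def parse_ttl(value) -> int:
    """Convert a TTL such as "1d" or 5000 to milliseconds.

    Hypothetical helper for illustration; a missing unit means milliseconds.
    """
    match = re.fullmatch(r"(\d+)(ms|m|h|d|w)?", str(value).strip())
    if match is None:
        raise ValueError(f"unrecognised ttl value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_MS[unit or "ms"]


print(parse_ttl("1d"))  # prints 86400000
```

Listing `ms` before `m` in the pattern is not strictly required with `fullmatch`, but it keeps the intent obvious when reading the expression.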
diff --git a/docs/reference/mapping/fields/type-field.asciidoc b/docs/reference/mapping/fields/type-field.asciidoc
new file mode 100644
index 0000000..bac7457
--- /dev/null
+++ b/docs/reference/mapping/fields/type-field.asciidoc
@@ -0,0 +1,31 @@
+[[mapping-type-field]]
+=== Type Field
+
+Each document indexed is associated with an id and a type. The type,
+when indexing, is automatically indexed into a `_type` field. By
+default, the `_type` field is indexed (but *not* analyzed) and not
+stored. This means that the `_type` field can be queried.
+
+The `_type` field can be stored as well, for example:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_type" : {"store" : true}
+    }
+}
+--------------------------------------------------
+
+The `_type` field can also not be indexed, and all the APIs will still
+work except for specific queries (term queries / filters) or faceting
+done on the `_type` field.
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_type" : {"index" : "no"}
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/uid-field.asciidoc b/docs/reference/mapping/fields/uid-field.asciidoc
new file mode 100644
index 0000000..f9ce245
--- /dev/null
+++ b/docs/reference/mapping/fields/uid-field.asciidoc
@@ -0,0 +1,11 @@
+[[mapping-uid-field]]
+=== `_uid`
+
+Each document indexed is associated with an id and a type. The internal
+`_uid` field is the unique identifier of a document within an index and
+is composed of the type and the id (meaning that different types can
+have the same id and still maintain uniqueness).
+
+The `_uid` field is automatically used when `_type` is not indexed to
+perform type based filtering, and does not require the `_id` to be
+indexed.
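As a small illustration of why combining type and id keeps identifiers unique, here is a Python sketch. It is not part of the imported patch: `make_uid` is a hypothetical helper, and the `#` separator is an assumption about the internal representation, which the `_uid` text above does not specify.

```python
def make_uid(doc_type: str, doc_id: str) -> str:
    """Combine a type and an id into one identifier.

    Hypothetical helper; the '#' separator is an assumption for illustration.
    """
    return f"{doc_type}#{doc_id}"


# Two documents with the same id but different types get distinct uids.
print(make_uid("tweet", "1") != make_uid("user", "1"))  # prints True
```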