Imported Upstream version 1.0.3 (upstream/1.0.3)

author: Hilko Bengen <bengen@debian.org> 2014-06-07 12:02:12 +0200
committer: Hilko Bengen <bengen@debian.org> 2014-06-07 12:02:12 +0200
commit: d5ed89b946297270ec28abf44bef2371a06f1f4f (patch)
tree: ce2d945e4dde69af90bd9905a70d8d27f4936776 /docs/reference/mapping/fields
download: elasticsearch-d5ed89b946297270ec28abf44bef2371a06f1f4f.tar.gz
13 files changed, 609 insertions, 0 deletions
diff --git a/docs/reference/mapping/fields/all-field.asciidoc b/docs/reference/mapping/fields/all-field.asciidoc
new file mode 100644
index 0000000..65453ef
--- /dev/null
+++ b/docs/reference/mapping/fields/all-field.asciidoc
@@ -0,0 +1,78 @@
+[[mapping-all-field]]
+=== `_all`
+
+The `_all` field includes the text of one or more other fields of the
+indexed document. It comes in very handy for search requests, where we
+want to execute a search query against the content of a document
+without knowing which fields to search on. This comes at the expense of
+CPU cycles and index size.
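+
+As a minimal sketch of such a search, assuming an index whose documents
+contain the word `smith` in some field that has not been excluded from
+`_all` (the search term is purely illustrative), a `match` query can
+target the `_all` field directly:
+
+[source,js]
+--------------------------------------------------
+{
+    "query" : {
+        "match" : {
+            "_all" : "smith"
+        }
+    }
+}
+--------------------------------------------------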
+
+The `_all` field can be completely disabled. Explicit field mappings
+and object mappings can be excluded from or included in the `_all`
+field. By default, it is enabled and all fields are included in it for
+ease of use.
+
+When disabling the `_all` field, it is a good practice to set
+`index.query.default_field` to a different value (for example, if you
+have a main "message" field in your data, set it to `message`).
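+
+As a minimal sketch, assuming the same `person` type used in the sample
+mapping in this section, the `_all` field could be disabled with a
+mapping along these lines:
+
+[source,js]
+--------------------------------------------------
+{
+    "person" : {
+        "_all" : {"enabled" : false}
+    }
+}
+--------------------------------------------------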
+
+One of the nice features of the `_all` field is that it takes into
+account the boost levels of specific fields. This means that if a title
+field is boosted more than content, the title part of the `_all` field
+will carry more weight than the content part.
+
+Here is a sample mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "person" : {
+        "_all" : {"enabled" : true},
+        "properties" : {
+            "name" : {
+                "type" : "object",
+                "dynamic" : false,
+                "properties" : {
+                    "first" : {"type" : "string", "store" : true , "include_in_all" : false},
+                    "last" : {"type" : "string", "index" : "not_analyzed"}
+                }
+            },
+            "address" : {
+                "type" : "object",
+                "include_in_all" : false,
+                "properties" : {
+                    "first" : {
+                        "properties" : {
+                            "location" : {"type" : "string", "store" : true, "index_name" : "firstLocation"}
+                        }
+                    },
+                    "last" : {
+                        "properties" : {
+                            "location" : {"type" : "string"}
+                        }
+                    }
+                }
+            },
+            "simple1" : {"type" : "long", "include_in_all" : true},
+            "simple2" : {"type" : "long", "include_in_all" : false}
+        }
+    }
+}
+--------------------------------------------------
+
+The `_all` field allows `store`, `term_vector` and `analyzer` (with
+specific `index_analyzer` and `search_analyzer`) to be set.
+
+[float]
+[[highlighting]]
+==== Highlighting
+
+For any field to allow
+<<search-request-highlighting,highlighting>> it has
+to be either stored or part of the `_source` field. By default, the
+`_all` field does not qualify for either, so highlighting for it does
+not yield any data.
+
+Although it is possible to `store` the `_all` field, it is basically an
+aggregation of all fields, which means more data will be stored, and
+highlighting it might produce strange results.
diff --git a/docs/reference/mapping/fields/analyzer-field.asciidoc b/docs/reference/mapping/fields/analyzer-field.asciidoc
new file mode 100644
index 0000000..30bb072
--- /dev/null
+++ b/docs/reference/mapping/fields/analyzer-field.asciidoc
@@ -0,0 +1,41 @@
+[[mapping-analyzer-field]]
+=== `_analyzer`
+
+The `_analyzer` mapping allows a document field's value to be used as
+the name of the analyzer that will be used to index the document. The
+analyzer will be used for any field that does not explicitly define an
+`analyzer` or `index_analyzer` when indexing.
+
+Here is a simple mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "type1" : {
+        "_analyzer" : {
+            "path" : "my_field"
+        }
+    }
+}
+--------------------------------------------------
+
+The above will use the value of `my_field` to look up an analyzer
+registered under that name. For example, indexing the following doc:
+
+[source,js]
+--------------------------------------------------
+{
+    "my_field" : "whitespace"
+}
+--------------------------------------------------
+
+Will cause the `whitespace` analyzer to be used as the index analyzer
+for all fields without an explicit analyzer setting.
+
+The default path value is `_analyzer`, so the analyzer can be driven for
+a specific document by setting the `_analyzer` field in it. If a custom
+JSON field name is needed, an explicit mapping with a different path
+should be set.
+
+By default, the `_analyzer` field is indexed; this can be disabled by
+setting `index` to `no` in the mapping.
diff --git a/docs/reference/mapping/fields/boost-field.asciidoc b/docs/reference/mapping/fields/boost-field.asciidoc
new file mode 100644
index 0000000..1d00845
--- /dev/null
+++ b/docs/reference/mapping/fields/boost-field.asciidoc
@@ -0,0 +1,72 @@
+[[mapping-boost-field]]
+=== `_boost`
+
+deprecated[1.0.0.RC1,See <<function-score-instead-of-boost>>]
+
+Boosting is the process of enhancing the relevancy of a document or
+field. Field level mapping allows an explicit boost level to be defined
+on a specific field. The boost field mapping (applied on the
+<<mapping-root-object-type,root object>>) allows
+a boost field to be defined, where *its content will control the
+boost level of the document*. For example, consider the following
+mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_boost" : {"name" : "my_boost", "null_value" : 1.0}
+    }
+}
+--------------------------------------------------
+
+The above mapping defines a mapping for a field named `my_boost`. If the
+`my_boost` field exists within the JSON document indexed, its value will
+control the boost level of the document indexed. For example, the
+following JSON document will be indexed with a boost value of `2.2`:
+
+[source,js]
+--------------------------------------------------
+{
+    "my_boost" : 2.2,
+    "message" : "This is a tweet!"
+}
+--------------------------------------------------
+
+[[function-score-instead-of-boost]]
+==== Function score instead of boost
+
+Support for document boosting via the `_boost` field has been removed
+from Lucene and is deprecated in Elasticsearch as of v1.0.0.RC1. The
+implementation in Lucene resulted in unpredictable results when
+used with multiple fields or multi-value fields.
+
+Instead, the <<query-dsl-function-score-query>> can be used to achieve
+the desired functionality by boosting each document by the value in
+any field of the document:
+
+[source,js]
+--------------------------------------------------
+{
+    "query": {
+        "function_score": {
+            "query": {  <1>
+                "match": {
+                    "title": "your main query"
+                }
+            },
+            "functions": [{
+                "script_score": { <2>
+                    "script": "doc['my_boost_field'].value"
+                }
+            }],
+            "score_mode": "multiply"
+        }
+    }
+}
+--------------------------------------------------
+<1> The original query, now wrapped in a `function_score` query.
+<2> This script returns the value in `my_boost_field`, which is then
+    multiplied by the query `_score` for each document.
+
+
diff --git a/docs/reference/mapping/fields/id-field.asciidoc b/docs/reference/mapping/fields/id-field.asciidoc
new file mode 100644
index 0000000..1adab49
--- /dev/null
+++ b/docs/reference/mapping/fields/id-field.asciidoc
@@ -0,0 +1,52 @@
+[[mapping-id-field]]
+=== `_id`
+
+Each document indexed is associated with an id and a type. The `_id`
+field can be used to index just the id, and possibly also store it. By
+default it is not indexed and not stored (thus, not created).
+
+Note, even though the `_id` is not indexed, all the APIs still work
+(since they work with the `_uid` field), as well as fetching by ids
+using `term`, `terms` or `prefix` queries/filters (including the
+specific `ids` query/filter).
+
+The `_id` field can be enabled to be indexed, and possibly stored,
+using:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_id" : {"index": "not_analyzed", "store" : false }
+    }
+}
+--------------------------------------------------
+
+The `_id` mapping can also be associated with a `path` that will be used
+to extract the id from a different location in the source document. For
+example, having the following mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_id" : {
+            "path" : "post_id"
+        }
+    }
+}
+--------------------------------------------------
+
+Will cause `1` to be used as the id for:
+
+[source,js]
+--------------------------------------------------
+{
+    "message" : "You know, for Search",
+    "post_id" : "1"
+}
+--------------------------------------------------
+
+This does require an additional lightweight parsing step while indexing,
+in order to extract the id to decide which shard the index operation
+will be executed on.
diff --git a/docs/reference/mapping/fields/index-field.asciidoc b/docs/reference/mapping/fields/index-field.asciidoc
new file mode 100644
index 0000000..96a320b
--- /dev/null
+++ b/docs/reference/mapping/fields/index-field.asciidoc
@@ -0,0 +1,15 @@
+[[mapping-index-field]]
+=== `_index`
+
+The `_index` field stores in a document the index it belongs to. By
+default it is disabled. To enable it, the following mapping should be
+defined:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_index" : { "enabled" : true }
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/parent-field.asciidoc b/docs/reference/mapping/fields/parent-field.asciidoc
new file mode 100644
index 0000000..3225b53
--- /dev/null
+++ b/docs/reference/mapping/fields/parent-field.asciidoc
@@ -0,0 +1,21 @@
+[[mapping-parent-field]]
+=== `_parent`
+
+The parent field mapping is defined on a child mapping, and points to
+the parent type this child relates to. For example, in case of a `blog`
+type and a `blog_tag` type child document, the mapping for `blog_tag`
+should be:
+
+[source,js]
+--------------------------------------------------
+{
+    "blog_tag" : {
+        "_parent" : {
+            "type" : "blog"
+        }
+    }
+}
+--------------------------------------------------
+
+The mapping is automatically stored and indexed (meaning it can be
+searched on using the `_parent` field notation).
diff --git a/docs/reference/mapping/fields/routing-field.asciidoc b/docs/reference/mapping/fields/routing-field.asciidoc
new file mode 100644
index 0000000..8ca2286
--- /dev/null
+++ b/docs/reference/mapping/fields/routing-field.asciidoc
@@ -0,0 +1,69 @@
+[[mapping-routing-field]]
+=== `_routing`
+
+The routing field allows control over the `_routing` aspect when
+indexing data in cases where explicit routing control is required.
+
+[float]
+==== store / index
+
+The first thing the `_routing` mapping does is to store the routing
+value provided (`store` set to `true`) and index it (`index` set to
+`not_analyzed`). The routing is stored by default so that
+reindexing data is possible when the routing value is completely
+external and not part of the docs.
+
+[float]
+==== required
+
+Another aspect of the `_routing` mapping is the ability to define it as
+required by setting `required` to `true`. This is very important to set
+when using routing features, as it allows different APIs to make use of
+it. For example, an index operation will be rejected if no routing value
+has been provided (or derived from the doc). A delete operation will be
+broadcast to all shards if no routing value is provided and `_routing`
+is required.
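+
+As a minimal sketch, assuming a hypothetical `comment` type, routing
+could be marked as required with a mapping along these lines:
+
+[source,js]
+--------------------------------------------------
+{
+    "comment" : {
+        "_routing" : {
+            "required" : true
+        }
+    }
+}
+--------------------------------------------------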
+
+[float]
+==== path
+
+The routing value can be provided as an external value when indexing
+(and still stored as part of the document, in much the same way
+`_source` is stored). But, it can also be automatically extracted from
+the index doc based on a `path`. For example, having the following
+mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "comment" : {
+        "_routing" : {
+            "required" : true,
+            "path" : "blog.post_id"
+        }
+    }
+}
+--------------------------------------------------
+
+Will cause the following doc to be routed based on the `111222` value:
+
+[source,js]
+--------------------------------------------------
+{
+    "text" : "the comment text",
+    "blog" : {
+        "post_id" : "111222"
+    }
+}
+--------------------------------------------------
+
+Note that using `path` without an explicit routing value requires an
+additional (though quite fast) parsing phase.
+
+[float]
+==== id uniqueness
+
+When indexing documents specifying a custom `_routing`, the uniqueness
+of the `_id` is not guaranteed throughout all the shards that the index
+is composed of. In fact, documents with the same `_id` might end up in
+different shards if indexed with different `_routing` values.
diff --git a/docs/reference/mapping/fields/size-field.asciidoc b/docs/reference/mapping/fields/size-field.asciidoc
new file mode 100644
index 0000000..7abfd40
--- /dev/null
+++ b/docs/reference/mapping/fields/size-field.asciidoc
@@ -0,0 +1,26 @@
+[[mapping-size-field]]
+=== `_size`
+
+The `_size` field automatically indexes the size of the original
+`_source` indexed. By default, it's disabled. In order to enable it, set
+the mapping to:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_size" : {"enabled" : true}
+    }
+}
+--------------------------------------------------
+
+In order to also store it, use:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_size" : {"enabled" : true, "store" : true }
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/source-field.asciidoc b/docs/reference/mapping/fields/source-field.asciidoc
new file mode 100644
index 0000000..22bb963
--- /dev/null
+++ b/docs/reference/mapping/fields/source-field.asciidoc
@@ -0,0 +1,41 @@
+[[mapping-source-field]]
+=== `_source`
+
+The `_source` field is an automatically generated field that stores the
+actual JSON that was used as the indexed document. It is not indexed
+(searchable), just stored. When executing "fetch" requests, like
+<<docs-get,get>> or
+<<search-search,search>>, the `_source` field is
+returned by default.
+
+Though very handy to have around, the source field does incur storage
+overhead within the index. For this reason, it can be disabled. For
+example:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_source" : {"enabled" : false}
+    }
+}
+--------------------------------------------------
+
+[float]
+[[include-exclude]]
+==== Includes / Excludes
+
+Allows you to specify paths in the source that will be included or
+excluded when it's stored, supporting `*` as a wildcard. For example:
+
+[source,js]
+--------------------------------------------------
+{
+    "my_type" : {
+        "_source" : {
+            "includes" : ["path1.*", "path2.*"],
+            "excludes" : ["path3.*"]
+        }
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/timestamp-field.asciidoc b/docs/reference/mapping/fields/timestamp-field.asciidoc
new file mode 100644
index 0000000..97bca8d
--- /dev/null
+++ b/docs/reference/mapping/fields/timestamp-field.asciidoc
@@ -0,0 +1,82 @@
+[[mapping-timestamp-field]]
+=== `_timestamp`
+
+The `_timestamp` field automatically indexes the timestamp of a
+document. It can be provided externally via the index request or in the
+`_source`. If it is not provided externally it will be automatically set
+to the date the document was processed by the indexing chain.
+
+[float]
+==== enabled
+
+By default it is disabled. To enable it, the following mapping
+should be defined:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_timestamp" : { "enabled" : true }
+    }
+}
+--------------------------------------------------
+
+[float]
+==== store / index
+
+By default the `_timestamp` field has `store` set to `false` and `index`
+set to `not_analyzed`. It can be queried as a standard date field.
+
+[float]
+==== path
+
+The `_timestamp` value can be provided as an external value when
+indexing. But, it can also be automatically extracted from the document
+to index based on a `path`. For example, having the following mapping:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_timestamp" : {
+            "enabled" : true,
+            "path" : "post_date"
+        }
+    }
+}
+--------------------------------------------------
+
+Will cause `2009-11-15T14:12:12` to be used as the timestamp value for:
+
+[source,js]
+--------------------------------------------------
+{
+    "message" : "You know, for Search",
+    "post_date" : "2009-11-15T14:12:12"
+}
+--------------------------------------------------
+
+Note that using `path` without an explicit timestamp value requires an
+additional (though quite fast) parsing phase.
+
+[float]
+==== format
+
+You can define the <<mapping-date-format,date
+format>> used to parse the provided timestamp value. For example:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_timestamp" : {
+            "enabled" : true,
+            "path" : "post_date",
+            "format" : "YYYY-MM-dd"
+        }
+    }
+}
+--------------------------------------------------
+
+Note, the default format is `dateOptionalTime`. The timestamp value will
+first be parsed as a number and if it fails the format will be tried.
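+
+As a sketch, with the `format` mapping above a document such as the
+following (the field values are made up for illustration) would be
+expected to take its timestamp from `post_date`, parsed with the
+`YYYY-MM-dd` format:
+
+[source,js]
+--------------------------------------------------
+{
+    "message" : "You know, for Search",
+    "post_date" : "2009-11-15"
+}
+--------------------------------------------------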
diff --git a/docs/reference/mapping/fields/ttl-field.asciidoc b/docs/reference/mapping/fields/ttl-field.asciidoc
new file mode 100644
index 0000000..d47aaca
--- /dev/null
+++ b/docs/reference/mapping/fields/ttl-field.asciidoc
@@ -0,0 +1,70 @@
+[[mapping-ttl-field]]
+=== `_ttl`
+
+A lot of documents naturally come with an expiration date. Documents can
+therefore have a `_ttl` (time to live), which will cause the expired
+documents to be deleted automatically.
+
+[float]
+==== enabled
+
+By default it is disabled. To enable it, the following mapping
+should be defined:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_ttl" : { "enabled" : true }
+    }
+}
+--------------------------------------------------
+
+[float]
+==== store / index
+
+By default the `_ttl` field has `store` set to `true` and `index` set to
+`not_analyzed`. Note that the `index` property has to be set to
+`not_analyzed` in order for the purge process to work.
+
+[float]
+==== default
+
+You can provide a per index/type default `_ttl` value as follows:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_ttl" : { "enabled" : true, "default" : "1d" }
+    }
+}
+--------------------------------------------------
+
+In this case, if you don't provide a `_ttl` value in your query or in
+the `_source`, all tweets will have a `_ttl` of one day.
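+
+As a sketch of the other case, a document that carries its own `_ttl`
+in the `_source` might look like the following (the `2h` value and the
+message text are made up for illustration):
+
+[source,js]
+--------------------------------------------------
+{
+    "_ttl" : "2h",
+    "message" : "This is a tweet!"
+}
+--------------------------------------------------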
+
+If you do not specify a time unit like `d` (days), `m` (minutes),
+`h` (hours), `ms` (milliseconds) or `w` (weeks), milliseconds is used as
+the default unit.
+
+If no `default` is set and no `_ttl` value is given then the document
+has an infinite `_ttl` and will not expire.
+
+You can dynamically update the `default` value using the put mapping
+API. It won't change the `_ttl` of already indexed documents but will be
+used for future documents.
+
+[float]
+==== Note on document expiration
+
+Expired documents will be automatically deleted regularly. You can
+dynamically set the `indices.ttl.interval` to fit your needs. The
+default value is `60s`.
+
+Deletion orders are processed in bulk. You can set
+`indices.ttl.bulk_size` to fit your needs. The default value is `10000`.
+
+Note that the expiration procedure handles versioning properly, so if a
+document is updated between the collection of documents to expire and
+the delete order, the document won't be deleted.
diff --git a/docs/reference/mapping/fields/type-field.asciidoc b/docs/reference/mapping/fields/type-field.asciidoc
new file mode 100644
index 0000000..bac7457
--- /dev/null
+++ b/docs/reference/mapping/fields/type-field.asciidoc
@@ -0,0 +1,31 @@
+[[mapping-type-field]]
+=== `_type`
+
+Each document indexed is associated with an id and a type. The type,
+when indexing, is automatically indexed into a `_type` field. By
+default, the `_type` field is indexed (but *not* analyzed) and not
+stored. This means that the `_type` field can be queried.
+
+The `_type` field can be stored as well, for example:
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_type" : {"store" : true}
+    }
+}
+--------------------------------------------------
+
+The `_type` field can also not be indexed, and all the APIs will still
+work except for specific queries (term queries / filters) or faceting
+done on the `_type` field.
+
+[source,js]
+--------------------------------------------------
+{
+    "tweet" : {
+        "_type" : {"index" : "no"}
+    }
+}
+--------------------------------------------------
diff --git a/docs/reference/mapping/fields/uid-field.asciidoc b/docs/reference/mapping/fields/uid-field.asciidoc
new file mode 100644
index 0000000..f9ce245
--- /dev/null
+++ b/docs/reference/mapping/fields/uid-field.asciidoc
@@ -0,0 +1,11 @@
+[[mapping-uid-field]]
+=== `_uid`
+
+Each document indexed is associated with an id and a type. The internal
+`_uid` field is the unique identifier of a document within an index and
+is composed of the type and the id (meaning that different types can
+have the same id and still maintain uniqueness).
+
+The `_uid` field is automatically used when `_type` is not indexed to
+perform type based filtering, and does not require the `_id` to be
+indexed.