Diffstat (limited to 'docs/reference/glossary.asciidoc')
-rw-r--r-- | docs/reference/glossary.asciidoc | 190 |
1 files changed, 190 insertions, 0 deletions
diff --git a/docs/reference/glossary.asciidoc b/docs/reference/glossary.asciidoc
new file mode 100644
index 0000000..6f7061f
--- /dev/null
+++ b/docs/reference/glossary.asciidoc
@@ -0,0 +1,190 @@
+[glossary]
+[[glossary]]
+= Glossary of terms
+
+[glossary]
+[[glossary-analysis]] analysis ::
+
+ Analysis is the process of converting <<glossary-text,full text>> to
+ <<glossary-term,terms>>. Depending on which analyzer is used, these phrases:
+ `FOO BAR`, `Foo-Bar`, `foo,bar` will probably all result in the
+ terms `foo` and `bar`. These terms are what is actually stored in
+ the index.
+ +
+ A full text query (not a <<glossary-term,term>> query) for `FoO:bAR` will
+ also be analyzed to the terms `foo`,`bar` and will thus match the
+ terms stored in the index.
+ +
+ It is this process of analysis (both at index time and at search time)
+ that allows elasticsearch to perform full text queries.
+ +
+ Also see <<glossary-text,text>> and <<glossary-term,term>>.
+
+[[glossary-cluster]] cluster ::
+
+ A cluster consists of one or more <<glossary-node,nodes>> which share the
+ same cluster name. Each cluster has a single master node which is
+ chosen automatically by the cluster and which can be replaced if the
+ current master node fails.
+
+[[glossary-document]] document ::
+
+ A document is a JSON document which is stored in elasticsearch. It is
+ like a row in a table in a relational database. Each document is
+ stored in an <<glossary-index,index>> and has a <<glossary-type,type>> and an
+ <<glossary-id,id>>.
+ +
+ A document is a JSON object (also known in other languages as a hash /
+ hashmap / associative array) which contains zero or more
+ <<glossary-field,fields>>, or key-value pairs.
+ +
+ The original JSON document that is indexed will be stored in the
+ <<glossary-source_field,`_source` field>>, which is returned by default when
+ getting or searching for a document.
+
+[[glossary-id]] id ::
+
+ The ID of a <<glossary-document,document>> identifies a document.
+ The `index/type/id` of a document must be unique. If no ID is provided,
+ then it will be auto-generated. (also see <<glossary-routing,routing>>)
+
+[[glossary-field]] field ::
+
+ A <<glossary-document,document>> contains a list of fields, or key-value
+ pairs. The value can be a simple (scalar) value (eg a string, integer,
+ date), or a nested structure like an array or an object. A field is
+ similar to a column in a table in a relational database.
+ +
+ The <<glossary-mapping,mapping>> for each field has a field _type_ (not to
+ be confused with document <<glossary-type,type>>) which indicates the type
+ of data that can be stored in that field, eg `integer`, `string`,
+ `object`. The mapping also allows you to define (amongst other things)
+ how the value for a field should be analyzed.
+
+[[glossary-index]] index ::
+
+ An index is like a _database_ in a relational database. It has a
+ <<glossary-mapping,mapping>> which defines multiple <<glossary-type,types>>.
+ +
+ An index is a logical namespace which maps to one or more
+ <<glossary-primary-shard,primary shards>> and can have zero or more
+ <<glossary-replica-shard,replica shards>>.
+
+[[glossary-mapping]] mapping ::
+
+ A mapping is like a _schema definition_ in a relational database. Each
+ <<glossary-index,index>> has a mapping, which defines each <<glossary-type,type>>
+ within the index, plus a number of index-wide settings.
+ +
+ A mapping can either be defined explicitly, or it will be generated
+ automatically when a document is indexed.
+
+[[glossary-node]] node ::
+
+ A node is a running instance of elasticsearch which belongs to a
+ <<glossary-cluster,cluster>>. Multiple nodes can be started on a single
+ server for testing purposes, but usually you should have one node per
+ server.
+ +
+ At startup, a node will use unicast (or multicast, if specified) to
+ discover an existing cluster with the same cluster name and will try
+ to join that cluster.
+
+ [[glossary-primary-shard]] primary shard ::
+
+ Each document is stored in a single primary <<glossary-shard,shard>>. When
+ you index a document, it is indexed first on the primary shard, then
+ on all <<glossary-replica-shard,replicas>> of the primary shard.
+ +
+ By default, an <<glossary-index,index>> has 5 primary shards. You can
+ specify fewer or more primary shards to scale the number of
+ <<glossary-document,documents>> that your index can handle.
+ +
+ You cannot change the number of primary shards in an index, once the
+ index is created.
+ +
+ See also <<glossary-routing,routing>>
+
+ [[glossary-replica-shard]] replica shard ::
+
+ Each <<glossary-primary-shard,primary shard>> can have zero or more
+ replicas. A replica is a copy of the primary shard, and has two
+ purposes:
+ +
+ 1. increase failover: a replica shard can be promoted to a primary
+ shard if the primary fails
+ 2. increase performance: get and search requests can be handled by
+ primary or replica shards.
+ +
+ By default, each primary shard has one replica, but the number of
+ replicas can be changed dynamically on an existing index. A replica
+ shard will never be started on the same node as its primary shard.
+
+[[glossary-routing]] routing ::
+
+ When you index a document, it is stored on a single
+ <<glossary-primary-shard,primary shard>>. That shard is chosen by hashing
+ the `routing` value. By default, the `routing` value is derived from
+ the ID of the document or, if the document has a specified parent
+ document, from the ID of the parent document (to ensure that child and
+ parent documents are stored on the same shard).
+ +
+ This value can be overridden by specifying a `routing` value at index
+ time, or a <<mapping-routing-field,routing
+ field>> in the <<glossary-mapping,mapping>>.
+
+[[glossary-shard]] shard ::
+
+ A shard is a single Lucene instance. It is a low-level “worker” unit
+ which is managed automatically by elasticsearch.
+ An index is a logical namespace which points to <<glossary-primary-shard,primary>> and
+ <<glossary-replica-shard,replica>> shards.
+ +
+ Other than defining the number of primary and replica shards that an
+ index should have, you never need to refer to shards directly.
+ Instead, your code should deal only with an index.
+ +
+ Elasticsearch distributes shards amongst all <<glossary-node,nodes>> in the
+ <<glossary-cluster,cluster>>, and can move shards automatically from one
+ node to another in the case of node failure, or the addition of new
+ nodes.
+
+ [[glossary-source_field]] source field ::
+
+ By default, the JSON document that you index will be stored in the
+ `_source` field and will be returned by all get and search requests.
+ This allows you access to the original object directly from search
+ results, rather than requiring a second step to retrieve the object
+ from an ID.
+ +
+ Note: the exact JSON string that you indexed will be returned to you,
+ even if it contains invalid JSON. The contents of this field do not
+ indicate anything about how the data in the object has been indexed.
+
+[[glossary-term]] term ::
+
+ A term is an exact value that is indexed in elasticsearch. The terms
+ `foo`, `Foo`, `FOO` are NOT equivalent. Terms (i.e. exact values) can
+ be searched for using _term_ queries.
+
+ See also <<glossary-text,text>> and <<glossary-analysis,analysis>>.
+
+[[glossary-text]] text ::
+
+ Text (or full text) is ordinary unstructured text, such as this
+ paragraph. By default, text will be <<glossary-analysis,analyzed>> into
+ <<glossary-term,terms>>, which is what is actually stored in the index.
+ +
+ Text <<glossary-field,fields>> need to be analyzed at index time in order to
+ be searchable as full text, and keywords in full text queries must be
+ analyzed at search time to produce (and search for) the same terms
+ that were generated at index time.
+ +
+ See also <<glossary-term,term>> and <<glossary-analysis,analysis>>.
+
+[[glossary-type]] type ::
+
+ A type is like a _table_ in a relational database. Each type has a
+ list of <<glossary-field,fields>> that can be specified for
+ <<glossary-document,documents>> of that type. The <<glossary-mapping,mapping>>
+ defines how each field in the document is analyzed.
+
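
The analysis and routing entries in the glossary above can be illustrated with a small Python sketch. It is not Elasticsearch code: `toy_analyze` and `pick_shard` are hypothetical helpers, the lowercase-and-split rule only stands in for a real analyzer, and the CRC32-modulo placement is an assumed scheme (the glossary says only that the shard is chosen by hashing the `routing` value).

```python
import re
import zlib


def toy_analyze(text):
    """Toy analyzer (assumption): lowercase, then split on non-alphanumerics."""
    lowered = text.lower()
    # Drop the empty strings that re.split yields for leading/trailing separators.
    return [token for token in re.split(r"[^0-9a-z]+", lowered) if token]


def pick_shard(routing, num_primary_shards):
    """Map a routing value to a shard number with a stable hash and a modulo.

    zlib.crc32 is a stand-in hash chosen for illustration only; the hash
    function and the modulo step are assumptions, not the real implementation.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # The phrases from the glossary all reduce to the same two terms.
    for phrase in ["FOO BAR", "Foo-Bar", "foo,bar", "FoO:bAR"]:
        print(phrase, "->", toy_analyze(phrase))

    # The same routing value always maps to the same shard for a fixed count.
    print("doc-1 ->", pick_shard("doc-1", 5))
```

Because the shard number in this sketch depends on the shard count, the same routing value can land on a different shard when the count changes, which is one way to see why the glossary treats the number of primary shards as fixed once the index is created.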