[[query-dsl-match-query]]
=== Match Query

A family of `match` queries that accept text, numerics, and dates, analyze
the input, and construct a query from it. For example:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : "this is a test"
    }
}
--------------------------------------------------

Note that `message` is the name of a field; you can substitute the name of
any field (including `_all`).

[float]
==== Types of Match Queries

[float]
===== boolean

The default `match` query is of type `boolean`. This means that the
provided text is analyzed and the analysis process constructs a boolean
query from it. The `operator` flag can be set to `or` or `and` to control
the boolean clauses (defaults to `or`). The minimum number of optional
(`should`) clauses that must match can be set using the
<<query-dsl-minimum-should-match,`minimum_should_match`>>
parameter.

The `analyzer` can be set to control which analyzer performs the
analysis of the text. It defaults to the analyzer explicitly defined in
the field mapping, or to the default search analyzer.

`fuzziness` allows _fuzzy matching_ based on the type of field being queried.
See <<fuzziness>> for allowed settings.

The `prefix_length` and `max_expansions` parameters can be set in this case
to control the fuzzy process. If the fuzzy option is set, the query uses
`constant_score_rewrite` as its <<query-dsl-multi-term-rewrite,rewrite
method>>; the `rewrite` parameter allows you to control how the query is
rewritten.
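The boolean-type behavior described above can be sketched as a small JavaScript model. This is a simplified illustration, not Elasticsearch's implementation. `analyze`, `editDistance`, `termMatches`, and `booleanMatch` are hypothetical names. The toy analyzer only lowercases and splits on non-word characters, and `minimum_should_match` is modelled only as an integer count. Fuzziness uses plain Levenshtein distance as a stand-in: Lucene's fuzzy matching is automaton-based, may count a transposition as a single edit, and also applies `max_expansions`, which is not modelled here.

```javascript
// Toy analyzer (assumption): lowercase and split on non-word characters.
// Elasticsearch would use the field's mapped analyzer or the default search analyzer.
function analyze(text) {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Plain Levenshtein distance, a stand-in for Lucene's fuzzy matching.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// A query term matches a document term exactly, or, when fuzziness > 0,
// within `fuzziness` edits provided the first `prefixLength` characters agree.
function termMatches(queryTerm, docTerm, fuzziness, prefixLength) {
  if (queryTerm === docTerm) return true;
  if (fuzziness === 0) return false;
  if (queryTerm.slice(0, prefixLength) !== docTerm.slice(0, prefixLength)) return false;
  return editDistance(queryTerm, docTerm) <= fuzziness;
}

// Each distinct analyzed query term becomes one clause. "and" requires every
// clause; "or" requires at least `minimumShouldMatch` clauses (integer only).
function booleanMatch(queryText, docText, options = {}) {
  const { operator = "or", minimumShouldMatch = 1, fuzziness = 0, prefixLength = 0 } = options;
  const queryTerms = [...new Set(analyze(queryText))];
  const docTerms = analyze(docText);
  // No clauses left after analysis: match nothing (the zero_terms_query default).
  if (queryTerms.length === 0) return false;
  const matched = queryTerms.filter((q) =>
    docTerms.some((d) => termMatches(q, d, fuzziness, prefixLength))
  ).length;
  return operator === "and" ? matched === queryTerms.length : matched >= minimumShouldMatch;
}

console.log(booleanMatch("this is a test", "This is only a test", { operator: "and" })); // true
console.log(booleanMatch("quick brown fox", "the brown fox", { operator: "and" })); // false
console.log(booleanMatch("quick brown fox", "the brown fox", { minimumShouldMatch: 2 })); // true
console.log(booleanMatch("quikc test", "quick test", { operator: "and", fuzziness: 2, prefixLength: 2 })); // true
```

With the default `or` operator, a single matching term is enough. Switching to `and` or raising `minimumShouldMatch` tightens the match without changing the analysis step.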

Here is an example with additional parameters (note the slight change in
structure; `message` is still the field name):

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
--------------------------------------------------

.zero_terms_query
If the analyzer removes all tokens from a query, as a `stop` filter
does, the default behavior is to match no documents at all. To change
that, use the `zero_terms_query` option, which accepts `none` (default)
and `all`, which corresponds to a `match_all` query.

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "to be or not to be",
            "operator" : "and",
            "zero_terms_query": "all"
        }
    }
}
--------------------------------------------------

.cutoff_frequency
The match query supports a `cutoff_frequency` parameter that specifies an
absolute or relative document frequency. Terms more frequent than the
cutoff are moved into an optional subquery and are only scored if one of
the low-frequency terms (below the cutoff) matches, in the case of the
`or` operator, or if all of the low-frequency terms match, in the case of
the `and` operator.

This query allows handling of stopwords dynamically at runtime, is domain
independent, and doesn't require a stopword file. It avoids scoring and
iterating over high-frequency terms and only takes them into account if a
more significant (lower-frequency) term matches a document. If all of the
query terms are above the given `cutoff_frequency`, the query is
automatically transformed into a pure conjunction (`and`) query to
ensure fast execution.

The `cutoff_frequency` is relative to the number of documents in the
index if it is in the range `[0..1)`, or absolute if it is greater than
or equal to `1.0`.
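A minimal sketch of how `cutoff_frequency` partitions query terms and decides whether a document matches. It assumes made-up document frequencies for a hypothetical 1,000-document index. `splitByCutoff` and `cutoffMatches` are illustrative names, not Elasticsearch or Lucene APIs. Treating a term whose frequency equals the cutoff as low-frequency is an assumption, and scoring is not modelled.

```javascript
// Hypothetical document frequencies in a 1,000-document index (made-up numbers).
const numDocs = 1000;
const docFreq = { to: 900, be: 850, or: 700, not: 600, the: 950, quick: 1, fox: 1 };

// Split terms into low- and high-frequency groups. A cutoff in [0..1) is
// relative to numDocs; a cutoff >= 1.0 is an absolute document count.
// A term exactly at the cutoff counts as low-frequency (assumption).
function splitByCutoff(terms, cutoffFrequency) {
  const maxDocFreq = cutoffFrequency < 1 ? cutoffFrequency * numDocs : cutoffFrequency;
  const low = [];
  const high = [];
  for (const term of terms) {
    ((docFreq[term] ?? 0) > maxDocFreq ? high : low).push(term);
  }
  return { low, high };
}

// Does a document, given as a Set of its terms, match?
function cutoffMatches(terms, docTerms, cutoffFrequency, operator = "or") {
  if (terms.length === 0) return false;
  const { low, high } = splitByCutoff(terms, cutoffFrequency);
  if (low.length === 0) {
    // Every term is above the cutoff: fall back to a pure conjunction.
    return high.every((t) => docTerms.has(t));
  }
  // Only low-frequency terms decide the match; high-frequency terms would
  // only add to the score of documents that already match.
  return operator === "and"
    ? low.every((t) => docTerms.has(t))
    : low.some((t) => docTerms.has(t));
}

console.log(JSON.stringify(splitByCutoff(["the", "quick", "fox"], 0.001))); // {"low":["quick","fox"],"high":["the"]}
console.log(cutoffMatches(["the", "quick", "fox"], new Set(["the", "lazy", "dog"]), 0.001)); // false
console.log(cutoffMatches(["to", "be", "or", "not"], new Set(["to", "be", "or", "not"]), 0.001)); // true
```

The second call shows the point of the cutoff. A document containing only the very common `the` does not match, because `the` can no longer qualify a document on its own.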

Note: If `cutoff_frequency` is used and the operator is `and`,
_stacked tokens_ (tokens at the same position, such as those a `synonym`
filter emits) are not handled as gracefully as they are in a pure `and`
query. For instance, the query `fast fox` is analyzed into three terms,
`[fast, quick, fox]`, where `quick` is a synonym for `fast` at the same
token position. With the `and` operator, the query might then require
both `fast` and `quick` to match.

Here is an example showing a query composed exclusively of stopwords:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "to be or not to be",
            "cutoff_frequency" : 0.001
        }
    }
}
--------------------------------------------------

[float]
===== phrase

The `match_phrase` query analyzes the text and creates a `phrase` query
out of the analyzed text. For example:

[source,js]
--------------------------------------------------
{
    "match_phrase" : {
        "message" : "this is a test"
    }
}
--------------------------------------------------

Since `match_phrase` is only a `type` of a `match` query, it can also be
used in the following manner:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase"
        }
    }
}
--------------------------------------------------

A phrase query matches terms up to a configurable `slop`
(which defaults to 0) in any order. Transposed terms have a slop of 2.

The `analyzer` can be set to control which analyzer performs the
analysis of the text.
It defaults to the analyzer explicitly defined in the field mapping, or to
the default search analyzer. For example:

[source,js]
--------------------------------------------------
{
    "match_phrase" : {
        "message" : {
            "query" : "this is a test",
            "analyzer" : "my_analyzer"
        }
    }
}
--------------------------------------------------

[float]
===== match_phrase_prefix

The `match_phrase_prefix` query is the same as `match_phrase`, except that
it allows prefix matches on the last term in the text. For example:

[source,js]
--------------------------------------------------
{
    "match_phrase_prefix" : {
        "message" : "this is a test"
    }
}
--------------------------------------------------

Or:

[source,js]
--------------------------------------------------
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase_prefix"
        }
    }
}
--------------------------------------------------

It accepts the same parameters as the phrase type. In addition, it
accepts a `max_expansions` parameter that controls how many terms the
last term's prefix can be expanded to. It is highly recommended to set
it to an acceptable value to control the execution time of the query.
For example:

[source,js]
--------------------------------------------------
{
    "match_phrase_prefix" : {
        "message" : {
            "query" : "this is a test",
            "max_expansions" : 10
        }
    }
}
--------------------------------------------------

[float]
==== Comparison to query_string / field

The match family of queries does not go through a "query parsing"
process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, the chances of it failing
are very small or nonexistent, and it is well suited to simply analyzing
text and running it as a query (which is usually what a text search box
does).
Also, the `phrase_prefix` type can provide a great "as you type" behavior
that automatically loads search results.

[float]
==== Other options

* `lenient` - If set to `true`, format-based failures (like providing
text to a numeric field) are ignored. Defaults to `false`.
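For a text search box, a hypothetical helper `searchBoxQuery` could assemble the request body using only the `match` forms documented above. It uses the `phrase_prefix` type while the user is still typing, sets `max_expansions` explicitly, and adds `lenient` when the target field may be numeric. The default of 10 expansions is simply the value from the earlier `match_phrase_prefix` example, not a documented default.

```javascript
// Hypothetical helper: build a `match` query body for a search box.
// asYouType     -> use the "phrase_prefix" type so the last term acts as a prefix
// maxExpansions -> caps how many terms that prefix may expand to (10 is an assumption)
// lenient       -> ignore format-based failures, e.g. text sent to a numeric field
function searchBoxQuery(field, input, { asYouType = false, maxExpansions = 10, lenient = false } = {}) {
  const body = { query: input };
  if (asYouType) {
    body.type = "phrase_prefix";
    body.max_expansions = maxExpansions;
  }
  if (lenient) {
    body.lenient = true;
  }
  return { match: { [field]: body } };
}

console.log(JSON.stringify(searchBoxQuery("message", "this is a te", { asYouType: true })));
// {"match":{"message":{"query":"this is a te","type":"phrase_prefix","max_expansions":10}}}
```

Once the user submits the full text, the same helper called without `asYouType` falls back to the default `boolean` type.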