summaryrefslogtreecommitdiff
path: root/docs/reference/query-dsl/queries/match-query.asciidoc
diff options
context:
space:
mode:
Diffstat (limited to 'docs/reference/query-dsl/queries/match-query.asciidoc')
-rw-r--r--docs/reference/query-dsl/queries/match-query.asciidoc234
1 files changed, 234 insertions, 0 deletions
diff --git a/docs/reference/query-dsl/queries/match-query.asciidoc b/docs/reference/query-dsl/queries/match-query.asciidoc
new file mode 100644
index 0000000..d514768
--- /dev/null
+++ b/docs/reference/query-dsl/queries/match-query.asciidoc
@@ -0,0 +1,234 @@
+[[query-dsl-match-query]]
+=== Match Query
+
+A family of `match` queries that accept text/numerics/dates, analyzes
+it, and constructs a query out of it. For example:
+
+[source,js]
+--------------------------------------------------
+{
+ "match" : {
+ "message" : "this is a test"
+ }
+}
+--------------------------------------------------
+
+Note, `message` is the name of a field, you can substitute the name of
+any field (including `_all`) instead.
+
+[float]
+==== Types of Match Queries
+
+[float]
+===== boolean
+
+The default `match` query is of type `boolean`. It means that the text
+provided is analyzed and the analysis process constructs a boolean query
+from the provided text. The `operator` flag can be set to `or` or `and`
+to control the boolean clauses (defaults to `or`). The minimum number of
+should clauses to match can be set using the
+<<query-dsl-minimum-should-match,`minimum_should_match`>>
+parameter.
+
+The `analyzer` can be set to control which analyzer will perform the
+analysis process on the text. It default to the field explicit mapping
+definition, or the default search analyzer.
+
+`fuzziness` allows _fuzzy matching_ based on the type of field being queried.
+See <<fuzziness>> for allowed settings.
+
+The `prefix_length` and
+`max_expansions` can be set in this case to control the fuzzy process.
+If the fuzzy option is set the query will use `constant_score_rewrite`
+as its <<query-dsl-multi-term-rewrite,rewrite
+method>> the `rewrite` parameter allows to control how the query will get
+rewritten.
+
+Here is an example when providing additional parameters (note the slight
+change in structure, `message` is the field name):
+
+[source,js]
+--------------------------------------------------
+{
+ "match" : {
+ "message" : {
+ "query" : "this is a test",
+ "operator" : "and"
+ }
+ }
+}
+--------------------------------------------------
+
+.zero_terms_query
+If the analyzer used removes all tokens in a query like a `stop` filter
+does, the default behavior is to match no documents at all. In order to
+change that the `zero_terms_query` option can be used, which accepts
+`none` (default) and `all` which corresponds to a `match_all` query.
+
+[source,js]
+--------------------------------------------------
+{
+ "match" : {
+ "message" : {
+ "query" : "to be or not to be",
+ "operator" : "and",
+ "zero_terms_query": "all"
+ }
+ }
+}
+--------------------------------------------------
+
+.cutoff_frequency
+The match query supports a `cutoff_frequency` that allows
+specifying an absolute or relative document frequency where high
+frequent terms are moved into an optional subquery and are only scored
+if one of the low frequent (below the cutoff) terms in the case of an
+`or` operator or all of the low frequent terms in the case of an `and`
+operator match.
+
+This query allows handling `stopwords` dynamically at runtime, is domain
+independent and doesn't require on a stopword file. It prevent scoring /
+iterating high frequent terms and only takes the terms into account if a
+more significant / lower frequent terms match a document. Yet, if all of
+the query terms are above the given `cutoff_frequency` the query is
+automatically transformed into a pure conjunction (`and`) query to
+ensure fast execution.
+
+The `cutoff_frequency` can either be relative to the number of documents
+in the index if in the range `[0..1)` or absolute if greater or equal to
+`1.0`.
+
+Note: If the `cutoff_frequency` is used and the operator is `and`
+_stacked tokens_ (tokens that are on the same position like `synonym` filter emits)
+are not handled gracefully as they are in a pure `and` query. For instance the query
+`fast fox` is analyzed into 3 terms `[fast, quick, fox]` where `quick` is a synonym
+for `fast` on the same token positions the query might require `fast` and `quick` to
+match if the operator is `and`.
+
+Here is an example showing a query composed of stopwords exclusivly:
+
+[source,js]
+--------------------------------------------------
+{
+ "match" : {
+ "message" : {
+ "query" : "to be or not to be",
+ "cutoff_frequency" : 0.001
+ }
+ }
+}
+--------------------------------------------------
+
+[float]
+===== phrase
+
+The `match_phrase` query analyzes the text and creates a `phrase` query
+out of the analyzed text. For example:
+
+[source,js]
+--------------------------------------------------
+{
+ "match_phrase" : {
+ "message" : "this is a test"
+ }
+}
+--------------------------------------------------
+
+Since `match_phrase` is only a `type` of a `match` query, it can also be
+used in the following manner:
+
+[source,js]
+--------------------------------------------------
+{
+ "match" : {
+ "message" : {
+ "query" : "this is a test",
+ "type" : "phrase"
+ }
+ }
+}
+--------------------------------------------------
+
+A phrase query matches terms up to a configurable `slop`
+(which defaults to 0) in any order. Transposed terms have a slop of 2.
+
+The `analyzer` can be set to control which analyzer will perform the
+analysis process on the text. It default to the field explicit mapping
+definition, or the default search analyzer, for example:
+
+[source,js]
+--------------------------------------------------
+{
+ "match_phrase" : {
+ "message" : {
+ "query" : "this is a test",
+ "analyzer" : "my_analyzer"
+ }
+ }
+}
+--------------------------------------------------
+
+[float]
+===== match_phrase_prefix
+
+The `match_phrase_prefix` is the same as `match_phrase`, except that it
+allows for prefix matches on the last term in the text. For example:
+
+[source,js]
+--------------------------------------------------
+{
+ "match_phrase_prefix" : {
+ "message" : "this is a test"
+ }
+}
+--------------------------------------------------
+
+Or:
+
+[source,js]
+--------------------------------------------------
+{
+ "match" : {
+ "message" : {
+ "query" : "this is a test",
+ "type" : "phrase_prefix"
+ }
+ }
+}
+--------------------------------------------------
+
+It accepts the same parameters as the phrase type. In addition, it also
+accepts a `max_expansions` parameter that can control to how many
+prefixes the last term will be expanded. It is highly recommended to set
+it to an acceptable value to control the execution time of the query.
+For example:
+
+[source,js]
+--------------------------------------------------
+{
+ "match_phrase_prefix" : {
+ "message" : {
+ "query" : "this is a test",
+ "max_expansions" : 10
+ }
+ }
+}
+--------------------------------------------------
+
+[float]
+==== Comparison to query_string / field
+
+The match family of queries does not go through a "query parsing"
+process. It does not support field name prefixes, wildcard characters,
+or other "advance" features. For this reason, chances of it failing are
+very small / non existent, and it provides an excellent behavior when it
+comes to just analyze and run that text as a query behavior (which is
+usually what a text search box does). Also, the `phrase_prefix` type can
+provide a great "as you type" behavior to automatically load search
+results.
+
+[float]
+==== Other options
+
+* `lenient` - If set to true will cause format based failures (like
+providing text to a numeric field) to be ignored. Defaults to false.