path: root/docs/reference/search/suggesters/completion-suggest.asciidoc
[[search-suggesters-completion]]
=== Completion Suggester

NOTE: In order to understand the format of suggestions, please
read the <<search-suggesters>> page first.

The `completion` suggester is a so-called prefix suggester. It does not
do spell correction like the `term` or `phrase` suggesters but provides
basic `auto-complete` functionality.

==== Why another suggester? Why not prefix queries?

The first question that comes to mind when reading about a prefix
suggester is why you should use it at all, if you have prefix queries
already. The answer is simple: prefix suggestions are fast.

The data structures are internally backed by Lucene's
`AnalyzingSuggester`, which uses FSTs to execute suggestions. Usually
these data structures are costly to create, are stored in memory, and need to
be rebuilt every now and then to reflect changes in your indexed
documents. The `completion` suggester circumvents this by storing the
FST as part of your index at index time. This allows for really fast
loads and executions.

[[completion-suggester-mapping]]
==== Mapping

In order to use this feature, you have to specify a special mapping for
the field, which enables the special storage required for suggestions.

[source,js]
--------------------------------------------------
curl -X PUT localhost:9200/music
curl -X PUT localhost:9200/music/song/_mapping -d '{
  "song" : {
        "properties" : {
            "name" : { "type" : "string" },
            "suggest" : { "type" : "completion",
                          "index_analyzer" : "simple",
                          "search_analyzer" : "simple",
                          "payloads" : true
            }
        }
    }
}'
--------------------------------------------------

Mapping supports the following parameters:

`index_analyzer`::
    The index analyzer to use, defaults to `simple`.

`search_analyzer`::
    The search analyzer to use, defaults to `simple`.
    In case you are wondering why we did not opt for the `standard`
    analyzer: We try to have easy-to-understand behaviour here, and if you
    index the field content `At the Drive-in`, you will not get any
    suggestions for `a`, nor for `d` (the first non-stopword).


`payloads`::
    Enables the storing of payloads, defaults to `false`.

`preserve_separators`::
    Preserves the separators, defaults to `true`.
    If disabled, you could find a field starting with `Foo Fighters`, if you
    suggest for `foof`.

`preserve_position_increments`::
    Enables position increments, defaults
    to `true`. If disabled and using a stopword-filtering analyzer, you could get a
    field starting with `The Beatles`, if you suggest for `b`. *Note*: You
    could also achieve this by indexing the two inputs `Beatles` and
    `The Beatles`; there is no need to change a simple analyzer if you are able to
    enrich your data.

`max_input_length`::
    Limits the length of a single input, defaults to `50` UTF-16 code points.
    This limit is only used at index time to reduce the total number of
    characters per input string in order to prevent massive inputs from
    bloating the underlying data structure. Most use cases won't be affected
    by the default value, since prefix completions seldom grow beyond
    a handful of characters. (The old name `max_input_len` is deprecated.)
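
For example, the separator-related options above could be combined into a
mapping like the following sketch (it assumes the `music` index from the
earlier example; disabling `preserve_separators` is what would let a
suggestion for `foof` match an input like `Foo Fighters`):

[source,js]
--------------------------------------------------
curl -X PUT localhost:9200/music/song/_mapping -d '{
  "song" : {
        "properties" : {
            "suggest" : { "type" : "completion",
                          "index_analyzer" : "simple",
                          "search_analyzer" : "simple",
                          "preserve_separators" : false,
                          "max_input_length" : 25
            }
        }
    }
}'
--------------------------------------------------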

[[indexing]]
==== Indexing

[source,js]
--------------------------------------------------
curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
    "name" : "Nevermind",
    "suggest" : {
        "input": [ "Nevermind", "Nirvana" ],
        "output": "Nirvana - Nevermind",
        "payload" : { "artistId" : 2321 },
        "weight" : 34
    }
}'
--------------------------------------------------

The following parameters are supported:

`input`::
    The input to store. This can be an array of strings or just
    a string. This field is mandatory.

`output`::
    The string to return, if a suggestion matches. This is very
    useful for normalizing outputs (i.e. always having them in the format
    `artist - songname`). The result is de-duplicated if several documents
    have the same output, i.e. only one is returned as part of the
    suggest result. This field is optional.

`payload`::
    An arbitrary JSON object, which is simply returned in the
    suggest option. You could store data like the id of a document in order
    to load it from Elasticsearch without executing another search (which
    might not yield any results, if `input` and `output` differ strongly).

`weight`::
    A positive integer, which defines a weight and allows you to
    rank your suggestions. This field is optional.

NOTE: Even though you lose most of the features of the
completion suggester, you can opt for the shortest form, which even
allows usage inside of multi-fields. But keep in mind that you will
not be able to use several inputs, an output, payloads or weights.

[source,js]
--------------------------------------------------
{
  "suggest" : "Nirvana"
}
--------------------------------------------------
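
Indexing the short form works just like the full form shown earlier, for
example (using a hypothetical document id `2` in the same `music` index):

[source,js]
--------------------------------------------------
curl -X PUT 'localhost:9200/music/song/2?refresh=true' -d '{
    "name" : "Nevermind",
    "suggest" : "Nirvana"
}'
--------------------------------------------------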

NOTE: The suggest data structure might not reflect deletes on
documents immediately. You may need to do an <<indices-optimize>> for that.
You can call optimize with `only_expunge_deletes=true` to only cater for deletes,
or alternatively call a <<index-modules-merge>> operation.
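
For example, an expunge-only optimize of the `music` index from above could
look like this:

[source,js]
--------------------------------------------------
curl -X POST 'localhost:9200/music/_optimize?only_expunge_deletes=true'
--------------------------------------------------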

[[querying]]
==== Querying

Suggesting works as usual, except that you have to specify the suggest
type as `completion`.

[source,js]
--------------------------------------------------
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest"
        }
    }
}'

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "n",
    "offset" : 0,
    "length" : 1,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 34.0, "payload" : {"artistId":2321}
    } ]
  } ]
}
--------------------------------------------------

As you can see, the payload is included in the response, if configured
appropriately. If you configured a weight for a suggestion, this weight
is used as the `score`. Also, the `text` field uses the `output` of your
indexed suggestion, if configured, and otherwise the matched part of the
`input` field.


[[fuzzy]]
==== Fuzzy queries

The completion suggester also supports fuzzy queries: this means
you can actually have a typo in your search and still get results back.

[source,js]
--------------------------------------------------
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest",
            "fuzzy" : {
                "fuzziness" : 2
            }
        }
    }
}'
--------------------------------------------------

The fuzzy query can take specific fuzzy parameters.
The following parameters are supported:

[horizontal]
`fuzziness`::
    The fuzziness factor, defaults to `AUTO`.
    See  <<fuzziness>> for allowed settings.

`transpositions`::
    Sets if transpositions should be counted
    as one or two changes, defaults to `true`.

`min_length`::
    Minimum length of the input before fuzzy
    suggestions are returned, defaults to `3`.

`prefix_length`::
    Minimum length of the input which is not
    checked for fuzzy alternatives, defaults to `1`.

`unicode_aware`::
    If `true`, all measurements (like edit distance,
    transpositions and lengths) are measured in Unicode code points
    (actual letters) instead of bytes.

NOTE: If you want to stick with the default values, but
      still use fuzzy, you can either use `fuzzy: {}`
      or `fuzzy: true`.
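
For example, a fuzzy suggestion using only the default parameters via the
boolean shorthand could look like this:

[source,js]
--------------------------------------------------
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "nirw",
        "completion" : {
            "field" : "suggest",
            "fuzzy" : true
        }
    }
}'
--------------------------------------------------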