summaryrefslogtreecommitdiff
path: root/docs/reference/search/facets.asciidoc
blob: a6529a4421085e59b324d3b4e5642beaecbd604f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
[[search-facets]]
== Facets

The usual purpose of a full-text search engine is to return a small
number of documents matching your query.

_Facets_ provide aggregated data based on a search query. In the
simplest case, a
<<search-facets-terms-facet,terms facet>>
can return _facet counts_ for various _facet values_ for a specific
_field_. Elasticsearch supports more facet implementations, such as
<<search-facets-statistical-facet,statistical>>
or
<<search-facets-date-histogram-facet,date
histogram>> facets.

The field used for facet calculations _must_ be of type numeric,
date/time or be analyzed as a single token — see the
<<mapping,_Mapping_>> guide for details on the
analysis process.

You can give the facet a custom _name_ and return multiple facets in one
request.

Let's try it out with a simple example. Suppose we have a number of
articles with a field called `tags`, preferably analyzed with the
<<analysis-keyword-analyzer,keyword>>
analyzer. The facet aggregation will return counts for the most popular
tags across the documents matching your query — or across all documents
in the index.

We will store some example data first:

[source,js]
--------------------------------------------------
curl -X DELETE "http://localhost:9200/articles"
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One",   "tags" : ["foo"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two",   "tags" : ["foo", "bar"]}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : ["foo", "bar", "baz"]}'
--------------------------------------------------

Now, let's query the index for articles beginning with letter `T`
and retrieve a
<<search-facets-terms-facet,_terms facet_>>
for the `tags` field. We will name the facet simply: _tags_.

[source,js]
--------------------------------------------------
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
  {
    "query" : { "query_string" : {"query" : "T*"} },
    "facets" : {
      "tags" : { "terms" : {"field" : "tags"} }
    }
  }
'
--------------------------------------------------

This request will return articles `Two` and `Three` (because
they match our query), as well as the `tags` facet:

[source,js]
--------------------------------------------------
"facets" : {
  "tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total": 5,
    "other": 0,
    "terms" : [ {
      "term" : "foo",
      "count" : 2
    }, {
      "term" : "bar",
      "count" : 2
    }, {
      "term" : "baz",
      "count" : 1
    } ]
  }
}
--------------------------------------------------

In the `terms` array, relevant _terms_ and _counts_ are returned. You'll
probably want to display these to your users. The facet returns several
important counts:

* `missing` : The number of documents which have no value for the
faceted field +
 * `total` : The total number of terms in the facet +
 * `other` : The number of terms not included in the returned facet
(effectively `other` = `total` - `terms` )

Notice, that the counts are scoped to the current query: _foo_ is
counted only twice (not three times), _bar_ is counted twice and _baz_
once. Also note that terms are counted once per document, even if the
occur more frequently in that document.

That's because the primary purpose of facets is to enable
http://en.wikipedia.org/wiki/Faceted_search[_faceted navigation_],
allowing the user to refine her query based on the insight from the
facet, i.e. restrict the search to a specific category, price or date
range. Facets can be used, however, for other purposes: computing
histograms, statistical aggregations, and more. See the blog about
link:/blog/data-visualization-with-elasticsearch-and-protovis/[data visualization].for inspiration.



[float]
=== Scope

As we have already mentioned, facet computation is restricted to the
scope of the current query, called `main`, by default. Facets can be
computed within the `global` scope as well, in which case it will return
values computed across all documents in the index:

[source,js]
--------------------------------------------------
{
    "facets" : {
        "my_facets" : {
            "terms" : { ... },
            "global" : true <1>
        }
    }
}
--------------------------------------------------
<1> The `global` keyword can be used with any facet type.

There's one *important distinction* to keep in mind. While search
_queries_ restrict both the returned documents and facet counts, search
_filters_ restrict only returned documents — but _not_ facet counts.

If you need to restrict both the documents and facets, and you're not
willing or able to use a query, you may use a _facet filter_.

[float]
=== Facet Filter

All facets can be configured with an additional filter (explained in the
<<query-dsl,Query DSL>> section), which _will_ reduce
the documents they use for computing results. An example with a _term_
filter:

[source,js]
--------------------------------------------------
{
    "facets" : {
        "<FACET NAME>" : {
            "<FACET TYPE>" : {
                ...
            },
            "facet_filter" : {
                "term" : { "user" : "kimchy"}
            }
        }
    }
}
--------------------------------------------------

Note that this is different from a facet of the
<<search-facets-filter-facet,filter>> type.

[float]
=== Facets with the _nested_ types

<<mapping-nested-type,Nested>> mapping allows
for better support for "inner" documents faceting, especially when it
comes to multi valued key and value facets (like histograms, or term
stats).

What is it good for? First of all, this is the only way to use facets on
nested documents once they are used (possibly for other reasons). But,
there is also facet specific reason why nested documents can be used,
and that's the fact that facets working on different key and value field
(like term_stats, or histogram) can now support cases where both are
multi valued properly.

For example, let's use the following mapping:

[source,js]
--------------------------------------------------
{
    "type1" : {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}
--------------------------------------------------

And, here is a sample data:

[source,js]
--------------------------------------------------
{
    "obj1" : [
        {
            "name" : "blue",
            "count" : 4
        },
        {
            "name" : "green",
            "count" : 6
        }
    ]
}
--------------------------------------------------


[float]
==== All Nested Matching Root Documents

Another option is to run the facet on all the nested documents matching
the root objects that the main query will end up producing. For example:

[source,js]
--------------------------------------------------
{
    "query": {
        "match_all": {}
    },
    "facets": {
        "facet1": {
            "terms_stats": {
                "key_field" : "name",
                "value_field": "count"
            },
            "nested": "obj1"
        }
    }
}
--------------------------------------------------

The `nested` element provides the path to the nested document (can be a
multi level nested docs) that will be used.

Facet filter allows you to filter your facet on the nested object level.
It is important that these filters match on the nested object level and
not on the root document level. In the following example the
`terms_stats` only applies on nested objects with the name 'blue'.

[source,js]
--------------------------------------------------
{
    "query": {
        "match_all": {}
    },
    "facets": {
        "facet1": {
            "terms_stats": {
                "key_field" : "name",
                "value_field": "count"
            },
            "nested": "obj1",
            "facet_filter" : {
                "term" : {"name" : "blue"}
            }
        }
    }
}
--------------------------------------------------

include::facets/terms-facet.asciidoc[]

include::facets/range-facet.asciidoc[]

include::facets/histogram-facet.asciidoc[]

include::facets/date-histogram-facet.asciidoc[]

include::facets/filter-facet.asciidoc[]

include::facets/query-facet.asciidoc[]

include::facets/statistical-facet.asciidoc[]

include::facets/terms-stats-facet.asciidoc[]

include::facets/geo-distance-facet.asciidoc[]