[[mapping-geo-shape-type]]
=== Geo Shape Type

The `geo_shape` mapping type facilitates indexing and searching
arbitrary geo shapes such as rectangles and polygons. Use it when either
the data being indexed or the queries being executed contain shapes
other than points.

You can query documents of this type using the
<<query-dsl-geo-shape-filter,geo_shape Filter>>
or the <<query-dsl-geo-shape-query,geo_shape
Query>>.

Note, the `geo_shape` type uses
https://github.com/spatial4j/spatial4j[Spatial4J] and
http://www.vividsolutions.com/jts/jtshome.htm[JTS], both of which are
optional dependencies. Consequently you must add Spatial4J v0.3 and JTS
v1.12 to your classpath in order to use this type.

[float]
==== Mapping Options

The `geo_shape` mapping maps GeoJSON geometry objects to the `geo_shape`
type. To enable it, users must explicitly map fields to the `geo_shape`
type.

[cols="<,<",options="header",]
|=======================================================================
|Option |Description

|`tree` |Name of the PrefixTree implementation to be used: `geohash` for
GeohashPrefixTree and `quadtree` for QuadPrefixTree. Defaults to
`geohash`.

|`precision` |This parameter may be used instead of `tree_levels` to set
an appropriate value for the `tree_levels` parameter. The value
specifies the desired precision and Elasticsearch will calculate the
best tree_levels value to honor this precision. The value should be a
number followed by an optional distance unit. Valid distance units
include: `in`, `inch`, `yd`, `yard`, `mi`, `miles`, `km`, `kilometers`,
`m`, `meters` (default), `cm`, `centimeters`, `mm`, `millimeters`.

|`tree_levels` |Maximum number of layers to be used by the PrefixTree.
This can be used to control the precision of shape representations and
therefore how many terms are indexed. Defaults to the default value of
the chosen PrefixTree implementation. Since this parameter requires a
certain level of understanding of the underlying implementation, users
may use the `precision` parameter instead. However, Elasticsearch only
uses the tree_levels parameter internally and this is what is returned
via the mapping API even if you use the precision parameter.

|`distance_error_pct` |Used as a hint to the PrefixTree about how
precise it should be. Defaults to 0.025 (2.5%) with 0.5 as the maximum
supported value.
|=======================================================================
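
As a rough illustration of the `precision` format described in the table
(a number followed by an optional unit, with meters as the default), the
following Python sketch converts such a value to meters. The function
name `precision_to_meters` and the parsing rules are assumptions made
for this example; Elasticsearch's own parser may accept or reject
different inputs.

[source,python]
--------------------------------------------------
import re

# Conversion factors for the units listed in the table above.
UNIT_TO_METERS = {
    "in": 0.0254, "inch": 0.0254,
    "yd": 0.9144, "yard": 0.9144,
    "mi": 1609.344, "miles": 1609.344,
    "km": 1000.0, "kilometers": 1000.0,
    "m": 1.0, "meters": 1.0,
    "cm": 0.01, "centimeters": 0.01,
    "mm": 0.001, "millimeters": 0.001,
}

def precision_to_meters(value):
    """Parse a string such as '1m' or '50' into a distance in meters."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-z]*)\s*", value.lower())
    if not match:
        raise ValueError("not a valid precision: %r" % value)
    number, unit = match.groups()
    unit = unit or "m"  # no unit given: fall back to meters
    if unit not in UNIT_TO_METERS:
        raise ValueError("unknown distance unit: %r" % unit)
    return float(number) * UNIT_TO_METERS[unit]
--------------------------------------------------

With this sketch, `precision_to_meters("2km")` and
`precision_to_meters("2000")` both yield `2000.0`.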

[float]
==== Prefix trees

To efficiently represent shapes in the index, shapes are converted into
a series of hashes representing grid squares using implementations of a
PrefixTree. The tree notion comes from the fact that the PrefixTree uses
multiple grid layers, each with an increasing level of precision, to
represent the Earth.

Multiple PrefixTree implementations are provided:

* GeohashPrefixTree - Uses
http://en.wikipedia.org/wiki/Geohash[geohashes] for grid squares.
Geohashes are base32-encoded strings of the interleaved bits of the
latitude and longitude, so the longer the hash, the more precise it is.
Each character added to the geohash represents another tree level and
adds 5 bits of precision to the geohash. A geohash represents a
rectangular area and has 32 sub-rectangles. The maximum number of levels
in Elasticsearch is 24.
* QuadPrefixTree - Uses a
http://en.wikipedia.org/wiki/Quadtree[quadtree] for grid squares.
Similar to geohashes, quadtrees interleave the bits of the latitude and
longitude; the resulting hash is a bit set. A tree level in a quadtree
represents 2 bits in this bit set, one for each coordinate. The maximum
number of levels for quadtrees in Elasticsearch is 50.
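
The bit interleaving behind geohashes can be sketched in Python as
follows. This is a minimal illustration of standard geohash encoding,
not the code used by Elasticsearch or Spatial4J; the function name
`geohash_encode` and its argument order are assumptions made for this
example.

[source,python]
--------------------------------------------------
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet

def geohash_encode(lon, lat, levels):
    """Encode a point as a geohash with `levels` characters (5 bits each)."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < levels:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the current range
            rng[0] = mid
        else:
            bits = bits << 1        # lower half of the current range
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits make one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
--------------------------------------------------

For example, `geohash_encode(10.40744, 57.64911, 3)` returns `"u4p"`,
and asking for more levels only appends characters to that prefix, which
is what makes each additional character a finer tree level.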

[float]
===== Accuracy

The `geo_shape` type does not provide 100% accuracy and, depending on
how it is configured, it may return some false positives or false
negatives for certain queries. To mitigate this, it is important to
select an appropriate value for the `tree_levels` parameter and to
adjust expectations accordingly. For example, a point may be near the
border of a particular grid cell and may therefore not match a query
shape that only matches the neighboring cell, even though the shape is
very close to the point.

[float]
===== Example

[source,js]
--------------------------------------------------
{
    "properties": {
        "location": {
            "type": "geo_shape",
            "tree": "quadtree",
            "precision": "1m"
        }
    }
}
--------------------------------------------------

This mapping maps the `location` field to the `geo_shape` type using the
`quadtree` implementation and a precision of 1m. Elasticsearch
translates this into a `tree_levels` setting of 26.
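
One way to see where a value of 26 can come from is the following
Python sketch. It assumes that a quadtree cell at level 0 spans the full
equator and that every level halves the cell width; the function name
and the WGS84 radius constant are assumptions made for this example, and
the calculation Elasticsearch performs internally may differ in detail.

[source,python]
--------------------------------------------------
import math

# Assumed Earth model: WGS84 equatorial radius in meters.
EARTH_EQUATOR_M = 2 * math.pi * 6378137.0  # roughly 40,075 km

def quadtree_levels_for_precision(meters):
    """Smallest number of levels whose cell width at the equator is <= meters."""
    if meters <= 0:
        raise ValueError("precision must be positive")
    cells_needed = EARTH_EQUATOR_M / meters
    # Each level doubles the number of cells along the equator.
    return max(1, math.ceil(math.log2(cells_needed)))
--------------------------------------------------

Under these assumptions, `quadtree_levels_for_precision(1.0)` yields 26:
25 levels give cells about 1.19m wide, while 26 levels give cells about
0.60m wide.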

[float]
===== Performance considerations

Elasticsearch uses the paths in the prefix tree as terms in the index
and in queries. The higher the number of levels (and thus the
precision), the more terms are generated. Calculating the terms, keeping
them in memory, and storing them all come at a cost. Especially with
higher tree levels, indices can become extremely large even with a
modest amount of data. The size of the features also matters: big,
complex polygons can take up a lot of space at higher tree levels.
Which setting is right depends on the use case. Generally, one trades
off accuracy against index size and query performance.

The defaults in Elasticsearch for both implementations are a compromise
between index size and a reasonable level of precision of 50m at the
equator. This allows for indexing tens of millions of shapes without
bloating the resulting index too much relative to the input size.

[float]
==== Input Structure

The http://www.geojson.org[GeoJSON] format is used to represent shapes
as input as follows:

[source,js]
--------------------------------------------------
{
    "location" : {
        "type" : "point",
        "coordinates" : [45.0, -45.0]
    }
}
--------------------------------------------------

Note, both the `type` and `coordinates` fields are required.

The supported `types` are `point`, `linestring`, `polygon`, `multipoint`
and `multipolygon`.

Note, in GeoJSON the correct order of coordinate arrays is longitude,
latitude. This differs from some APIs, such as Google Maps, that
generally use latitude, longitude.
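
Because swapped coordinates are an easy mistake to make, a small helper
can do the reordering in one place. The following Python sketch is only
an illustration; the function name `point_from_lat_lon` is an assumption
made for this example and is not part of any Elasticsearch client.

[source,python]
--------------------------------------------------
def point_from_lat_lon(lat, lon):
    """Build a `point` shape from a latitude/longitude pair."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % lon)
    # GeoJSON order is [longitude, latitude].
    return {"type": "point", "coordinates": [lon, lat]}
--------------------------------------------------

Calling `point_from_lat_lon(-45.0, 45.0)` produces the same `location`
value as the point example above.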

[float]
===== Envelope

Elasticsearch supports an `envelope` type which consists of coordinates
for upper left and lower right points of the shape:

[source,js]
--------------------------------------------------
{
    "location" : {
        "type" : "envelope",
        "coordinates" : [[-45.0, 45.0], [45.0, -45.0]]
    }
}
--------------------------------------------------
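
Bounding boxes are often available as minimum and maximum longitude and
latitude rather than as corner points. The following Python sketch shows
one way to turn such bounds into the upper left and lower right corners
that the `envelope` type expects; the function name
`envelope_from_bounds` and its parameter order are assumptions made for
this example.

[source,python]
--------------------------------------------------
def envelope_from_bounds(min_lon, min_lat, max_lon, max_lat):
    """Build an `envelope` shape from bounding-box limits."""
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("minimum bounds must not exceed maximum bounds")
    upper_left = [min_lon, max_lat]   # [longitude, latitude]
    lower_right = [max_lon, min_lat]  # [longitude, latitude]
    return {"type": "envelope", "coordinates": [upper_left, lower_right]}
--------------------------------------------------

Calling `envelope_from_bounds(-45.0, -45.0, 45.0, 45.0)` produces the
same `location` value as the envelope example above.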

[float]
===== http://www.geojson.org/geojson-spec.html#id4[Polygon]

A polygon is defined by a list of lists of points. The first and last
points in each list must be the same (the polygon must be closed).

[source,js]
--------------------------------------------------
{
    "location" : {
        "type" : "polygon",
        "coordinates" : [
            [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ]
        ]
    }
}
--------------------------------------------------

The first array represents the outer boundary of the polygon; the other
arrays represent the interior shapes ("holes"):

[source,js]
--------------------------------------------------
{
    "location" : {
        "type" : "polygon",
        "coordinates" : [
            [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ],
            [ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]
        ]
    }
}
--------------------------------------------------
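
The closing requirement is easy to overlook when rings are generated
programmatically. The following Python sketch builds a `polygon` shape
from an outer ring and optional holes, appending the first point to any
ring that is not yet closed. The function names `close_ring` and
`polygon` are assumptions made for this example.

[source,python]
--------------------------------------------------
def close_ring(points):
    """Return a copy of the ring whose last point equals its first."""
    ring = [list(p) for p in points]
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    if len(ring) < 4:
        raise ValueError("a closed ring needs at least four points")
    return ring

def polygon(outer, holes=()):
    """Build a `polygon` shape: outer boundary first, then any holes."""
    rings = [close_ring(outer)] + [close_ring(hole) for hole in holes]
    return {"type": "polygon", "coordinates": rings}
--------------------------------------------------

For instance, passing the four corners `[100.0, 0.0]`, `[101.0, 0.0]`,
`[101.0, 1.0]`, `[100.0, 1.0]` as `outer` yields the five-point closed
ring used in the examples above.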

[float]
===== http://www.geojson.org/geojson-spec.html#id7[MultiPolygon]

A list of GeoJSON polygons.

[source,js]
--------------------------------------------------
{
    "location" : {
        "type" : "multipolygon",
        "coordinates" : [
            [[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]]],
            [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
            [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]]]
        ]
    }
}
--------------------------------------------------

[float]
==== Sorting and Retrieving index Shapes

Due to the complex input structure and index representation of shapes,
it is not currently possible to sort shapes or retrieve their fields
directly. The geo_shape value is only retrievable through the `_source`
field.