summaryrefslogtreecommitdiff
path: root/docs/reference/index-modules/merge.asciidoc
diff options
context:
space:
mode:
Diffstat (limited to 'docs/reference/index-modules/merge.asciidoc')
-rw-r--r--docs/reference/index-modules/merge.asciidoc215
1 files changed, 215 insertions, 0 deletions
diff --git a/docs/reference/index-modules/merge.asciidoc b/docs/reference/index-modules/merge.asciidoc
new file mode 100644
index 0000000..84d2675
--- /dev/null
+++ b/docs/reference/index-modules/merge.asciidoc
@@ -0,0 +1,215 @@
+[[index-modules-merge]]
+== Merge
+
+A shard in elasticsearch is a Lucene index, and a Lucene index is broken
+down into segments. Segments are internal storage elements in the index
+where the index data is stored, and are immutable up to delete markers.
+Segments are, periodically, merged into larger segments to keep the
+index size at bay and expunge deletes.
+
+The more segments one has in the Lucene index means slower searches and
+more memory used. Segment merging is used to reduce the number of segments,
+however merges can be expensive to perform, especially on low IO environments.
+Merges can be throttled using <<store-throttling,store level throttling>>.
+
+
+[float]
+[[policy]]
+=== Policy
+
+The index merge policy module allows one to control which segments of a
+shard index are to be merged. There are several types of policies with
+the default set to `tiered`.
+
+[float]
+[[tiered]]
+==== tiered
+
+Merges segments of approximately equal size, subject to an allowed
+number of segments per tier. This is similar to `log_bytes_size` merge
+policy, except this merge policy is able to merge non-adjacent segment,
+and separates how many segments are merged at once from how many
+segments are allowed per tier. This merge policy also does not
+over-merge (i.e., cascade merges).
+
+This policy has the following settings:
+
+`index.merge.policy.expunge_deletes_allowed`::
+
+ When expungeDeletes is called, we only merge away a segment if its delete
+ percentage is over this threshold. Default is `10`.
+
+`index.merge.policy.floor_segment`::
+
+ Segments smaller than this are "rounded up" to this size, i.e. treated as
+ equal (floor) size for merge selection. This is to prevent frequent
+ flushing of tiny segments from allowing a long tail in the index. Default
+ is `2mb`.
+
+`index.merge.policy.max_merge_at_once`::
+
+ Maximum number of segments to be merged at a time during "normal" merging.
+ Default is `10`.
+
+`index.merge.policy.max_merge_at_once_explicit`::
+
+ Maximum number of segments to be merged at a time, during optimize or
+ expungeDeletes. Default is `30`.
+
+`index.merge.policy.max_merged_segment`::
+
+ Maximum sized segment to produce during normal merging (not explicit
+ optimize). This setting is approximate: the estimate of the merged segment
+ size is made by summing sizes of to-be-merged segments (compensating for
+ percent deleted docs). Default is `5gb`.
+
+`index.merge.policy.segments_per_tier`::
+
+ Sets the allowed number of segments per tier. Smaller values mean more
+ merging but fewer segments. Default is `10`. Note, this value needs to be
+ >= then the `max_merge_at_once` otherwise you'll force too many merges to
+ occur.
+
+`index.reclaim_deletes_weight`::
+
+ Controls how aggressively merges that reclaim more deletions are favored.
+ Higher values favor selecting merges that reclaim deletions. A value of
+ `0.0` means deletions don't impact merge selection. Defaults to `2.0`.
+
+`index.compound_format`::
+
+ Should the index be stored in compound format or not. Defaults to `false`.
+ See <<index-compound-format,`index.compound_format`>> in
+ <<index-modules-settings>>.
+
+For normal merging, this policy first computes a "budget" of how many
+segments are allowed by be in the index. If the index is over-budget,
+then the policy sorts segments by decreasing size (pro-rating by percent
+deletes), and then finds the least-cost merge. Merge cost is measured by
+a combination of the "skew" of the merge (size of largest seg divided by
+smallest seg), total merge size and pct deletes reclaimed, so that
+merges with lower skew, smaller size and those reclaiming more deletes,
+are favored.
+
+If a merge will produce a segment that's larger than
+`max_merged_segment` then the policy will merge fewer segments (down to
+1 at once, if that one has deletions) to keep the segment size under
+budget.
+
+Note, this can mean that for large shards that holds many gigabytes of
+data, the default of `max_merged_segment` (`5gb`) can cause for many
+segments to be in an index, and causing searches to be slower. Use the
+indices segments API to see the segments that an index have, and
+possibly either increase the `max_merged_segment` or issue an optimize
+call for the index (try and aim to issue it on a low traffic time).
+
+[float]
+[[log-byte-size]]
+==== log_byte_size
+
+A merge policy that merges segments into levels of exponentially
+increasing *byte size*, where each level has fewer segments than the
+value of the merge factor. Whenever extra segments (beyond the merge
+factor upper bound) are encountered, all segments within the level are
+merged.
+
+This policy has the following settings:
+
+[cols="<,<",options="header",]
+|=======================================================================
+|Setting |Description
+|index.merge.policy.merge_factor |Determines how often segment indices
+are merged by index operation. With smaller values, less RAM is used
+while indexing, and searches on unoptimized indices are faster, but
+indexing speed is slower. With larger values, more RAM is used during
+indexing, and while searches on unoptimized indices are slower, indexing
+is faster. Thus larger values (greater than 10) are best for batch index
+creation, and smaller values (lower than 10) for indices that are
+interactively maintained. Defaults to `10`.
+
+|index.merge.policy.min_merge_size |A size setting type which sets the
+minimum size for the lowest level segments. Any segments below this size
+are considered to be on the same level (even if they vary drastically in
+size) and will be merged whenever there are mergeFactor of them. This
+effectively truncates the "long tail" of small segments that would
+otherwise be created into a single level. If you set this too large, it
+could greatly increase the merging cost during indexing (if you flush
+many small segments). Defaults to `1.6mb`
+
+|index.merge.policy.max_merge_size |A size setting type which sets the
+largest segment (measured by total byte size of the segment's files)
+that may be merged with other segments. Defaults to unbounded.
+
+|index.merge.policy.max_merge_docs |Determines the largest segment
+(measured by document count) that may be merged with other segments.
+Defaults to unbounded.
+|=======================================================================
+
+[float]
+[[log-doc]]
+==== log_doc
+
+A merge policy that tries to merge segments into levels of exponentially
+increasing *document count*, where each level has fewer segments than
+the value of the merge factor. Whenever extra segments (beyond the merge
+factor upper bound) are encountered, all segments within the level are
+merged.
+
+[cols="<,<",options="header",]
+|=======================================================================
+|Setting |Description
+|index.merge.policy.merge_factor |Determines how often segment indices
+are merged by index operation. With smaller values, less RAM is used
+while indexing, and searches on unoptimized indices are faster, but
+indexing speed is slower. With larger values, more RAM is used during
+indexing, and while searches on unoptimized indices are slower, indexing
+is faster. Thus larger values (greater than 10) are best for batch index
+creation, and smaller values (lower than 10) for indices that are
+interactively maintained. Defaults to `10`.
+
+|index.merge.policy.min_merge_docs |Sets the minimum size for the lowest
+level segments. Any segments below this size are considered to be on the
+same level (even if they vary drastically in size) and will be merged
+whenever there are mergeFactor of them. This effectively truncates the
+"long tail" of small segments that would otherwise be created into a
+single level. If you set this too large, it could greatly increase the
+merging cost during indexing (if you flush many small segments).
+Defaults to `1000`.
+
+|index.merge.policy.max_merge_docs |Determines the largest segment
+(measured by document count) that may be merged with other segments.
+Defaults to unbounded.
+|=======================================================================
+
+[float]
+[[scheduling]]
+=== Scheduling
+
+The merge schedule controls the execution of merge operations once they
+are needed (according to the merge policy). The following types are
+supported, with the default being the `ConcurrentMergeScheduler`.
+
+[float]
+==== ConcurrentMergeScheduler
+
+A merge scheduler that runs merges using a separated thread, until the
+maximum number of threads at which when a merge is needed, the thread(s)
+that are updating the index will pause until one or more merges
+completes.
+
+The scheduler supports the following settings:
+
+[cols="<,<",options="header",]
+|=======================================================================
+|Setting |Description
+|index.merge.scheduler.max_thread_count |The maximum number of threads
+to perform the merge operation. Defaults to
+`Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors() / 2))`.
+|=======================================================================
+
+[float]
+==== SerialMergeScheduler
+
+A merge scheduler that simply does each merge sequentially using the
+calling thread (blocking the operations that triggered the merge, the
+index operation).