author: Hilko Bengen <bengen@debian.org> 2014-06-07 12:02:12 +0200
committer: Hilko Bengen <bengen@debian.org> 2014-06-07 12:02:12 +0200
commit: d5ed89b946297270ec28abf44bef2371a06f1f4f
tree: ce2d945e4dde69af90bd9905a70d8d27f4936776 /docs/reference/query-dsl/queries/regexp-syntax.asciidoc
Imported Upstream version 1.0.3 (upstream/1.0.3)
Diffstat (limited to 'docs/reference/query-dsl/queries/regexp-syntax.asciidoc')
-rw-r--r-- docs/reference/query-dsl/queries/regexp-syntax.asciidoc | 280
1 files changed, 280 insertions, 0 deletions
diff --git a/docs/reference/query-dsl/queries/regexp-syntax.asciidoc b/docs/reference/query-dsl/queries/regexp-syntax.asciidoc
new file mode 100644
index 0000000..5d2c061
--- /dev/null
+++ b/docs/reference/query-dsl/queries/regexp-syntax.asciidoc
@@ -0,0 +1,280 @@
+[[regexp-syntax]]
+==== Regular expression syntax
+
+Regular expression queries are supported by the `regexp` and the `query_string`
+queries. The Lucene regular expression engine
+is not Perl-compatible and supports a smaller range of operators.
+
+[NOTE]
+====
+This section does not attempt to explain regular expressions in general;
+it only describes the supported operators.
+====
+
+===== Standard operators
+
+Anchoring::
++
+--
+
+Most regular expression engines allow you to match any part of a string.
+If you want the regexp pattern to start at the beginning of the string or
+finish at the end of the string, then you have to _anchor_ it specifically,
+using `^` to indicate the beginning or `$` to indicate the end.
+
+Lucene's patterns are always anchored. The pattern provided must match
+the entire string. For string `"abcde"`:
+
+ ab.* # match
+ abcd # no match
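+
+The difference between anchored and unanchored matching can be sketched in
+Python. This is only an illustration: Python's `re` module is a different
+engine from Lucene's, and `re.fullmatch` is used here merely to imitate the
+implicit anchoring described above. The helper name `matches_whole_string`
+is hypothetical.
+
```python
import re


def matches_whole_string(pattern: str, text: str) -> bool:
    """Imitate Lucene's implicit anchoring: the pattern must cover all of text."""
    return re.fullmatch(pattern, text) is not None


text = "abcde"

# Anchored, as Lucene behaves: "abcd" leaves the trailing "e" unmatched.
anchored = {p: matches_whole_string(p, text) for p in ("ab.*", "abcd")}

# Unanchored search, as many other engines behave by default.
unanchored = re.search("abcd", text) is not None

print(anchored)
print(unanchored)
```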
+
+--
+
+Allowed characters::
++
+--
+
+Any Unicode characters may be used in the pattern, but certain characters
+are reserved and must be escaped. The standard reserved characters are:
+
+....
+. ? + * | { } [ ] ( ) " \
+....
+
+If you enable optional features (see below) then these characters may
+also be reserved:
+
+ # @ & < > ~
+
+Any reserved character can be escaped with a backslash, for example `"\*"`,
+including a literal backslash character: `"\\"`.
+
+Additionally, any characters (except double quotes) are interpreted literally
+when surrounded by double quotes:
+
+ john"@smith.com"
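+
+When a pattern is built from user input, the reserved characters listed
+above have to be escaped first. The following Python helper is a minimal
+sketch of that step; `escape_lucene_regexp` is a hypothetical name, not part
+of any client library, and it assumes that escaping a character which is not
+currently reserved is harmless.
+
```python
# Reserved characters as listed above: the standard set, then the optional set.
STANDARD_RESERVED = set('.?+*|{}[]()"\\')
OPTIONAL_RESERVED = set('#@&<>~')


def escape_lucene_regexp(literal: str, optional: bool = True) -> str:
    """Backslash-escape every reserved character so the text matches literally.

    Assumption: escaping an optional operator that is not enabled does no harm.
    """
    reserved = STANDARD_RESERVED | OPTIONAL_RESERVED if optional else STANDARD_RESERVED
    return "".join("\\" + ch if ch in reserved else ch for ch in literal)


escaped = escape_lucene_regexp("john@smith.com")
print(escaped)
```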
+
+
+--
+
+Match any character::
++
+--
+
+The period `"."` can be used to represent any character. For string `"abcde"`:
+
+ ab... # match
+ a.c.e # match
+
+--
+
+One-or-more::
++
+--
+
+The plus sign `"+"` can be used to repeat the preceding shortest pattern
+one or more times. For string `"aaabbb"`:
+
+ a+b+ # match
+ aa+bb+ # match
+ a+.+ # match
+ aa+bbb+ # match
+
+--
+
+Zero-or-more::
++
+--
+
+The asterisk `"*"` can be used to match the preceding shortest pattern
+zero or more times. For string `"aaabbb"`:
+
+ a*b* # match
+ a*b*c* # match
+ .*bbb.* # match
+ aaa*bbb* # match
+
+--
+
+Zero-or-one::
++
+--
+
+The question mark `"?"` makes the preceding shortest pattern optional. It
+matches zero or one times. For string `"aaabbb"`:
+
+ aaa?bbb? # match
+ aaaa?bbbb? # match
+ .....?.? # match
+ aa?bb? # no match
+
+--
+
+Min-to-max::
++
+--
+
+Curly brackets `"{}"` can be used to specify a minimum and (optionally)
+a maximum number of times the preceding shortest pattern can repeat. The
+allowed forms are:
+
+ {5} # repeat exactly 5 times
+ {2,5} # repeat at least twice and at most 5 times
+ {2,} # repeat at least twice
+
+For string `"aaabbb"`:
+
+ a{3}b{3} # match
+ a{2,4}b{2,4} # match
+ a{2,}b{2,} # match
+ .{3}.{3} # match
+ a{4}b{4} # no match
+ a{4,6}b{4,6} # no match
+ a{4,}b{4,} # no match
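+
+The brace forms can be checked with a short Python script. Python's `re`
+module is not the Lucene engine, so treat this as a sketch: it relies on the
+assumption that the brace quantifiers behave the same way in both, and it
+uses `re.fullmatch` to imitate Lucene's implicit anchoring. It also shows
+that, in Python at least, `+`, `*` and `?` are shorthand for `{1,}`, `{0,}`
+and `{0,1}`.
+
```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


text = "aaabbb"
patterns = ["a{3}b{3}", "a{2,4}b{2,4}", "a{2,}b{2,}", ".{3}.{3}",
            "a{4}b{4}", "a{4,6}b{4,6}", "a{4,}b{4,}"]
results = {p: whole_match(p, text) for p in patterns}

for pattern, ok in results.items():
    print(f"{pattern:<14} {'match' if ok else 'no match'}")

# The shorthand quantifiers give the same answers as their brace equivalents.
shorthand_pairs = [("a+b+", "a{1,}b{1,}"),
                   ("a*b*", "a{0,}b{0,}"),
                   ("a?b?", "a{0,1}b{0,1}")]
samples = ["", "a", "b", "ab", "aab", "aaabbb", "ba"]
agree = all(whole_match(short, s) == whole_match(braces, s)
            for short, braces in shorthand_pairs
            for s in samples)
print(agree)
```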
+
+--
+
+Grouping::
++
+--
+
+Parentheses `"()"` can be used to form sub-patterns. The quantity operators
+listed above operate on the shortest previous pattern, which can be a group.
+For string `"ababab"`:
+
+ (ab)+ # match
+ ab(ab)+ # match
+ (..)+ # match
+ (...)+ # match
+ (ab)* # match
+ abab(ab)? # match
+ ab(ab)? # no match
+ (ab){3} # match
+ (ab){1,2} # no match
+
+--
+
+Alternation::
++
+--
+
+The pipe symbol `"|"` acts as an OR operator. The match will succeed if
+the pattern on either the left-hand side OR the right-hand side matches.
+The alternation applies to the _longest pattern_, not the shortest.
+For string `"aabb"`:
+
+ aabb|bbaa # match
+ aacc|bb # no match
+ aa(cc|bb) # match
+ a+|b+ # no match
+ a+b+|b+a+ # match
+ a+(b|c)+ # match
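+
+The effect of grouping on alternation can be reproduced in Python, again
+using `re.fullmatch` to stand in for Lucene's anchoring. This is a sketch
+under the assumption that alternation and grouping have the same precedence
+in both engines; it is not a test against Lucene itself.
+
```python
import re

text = "aabb"
patterns = ["aabb|bbaa", "aacc|bb", "aa(cc|bb)", "a+|b+", "a+b+|b+a+", "a+(b|c)+"]

# Without parentheses the pipe splits the whole pattern, so "aacc|bb" means
# "aacc" OR "bb"; with parentheses only the group is alternated.
results = {p: re.fullmatch(p, text) is not None for p in patterns}

for pattern, ok in results.items():
    print(f"{pattern:<10} {'match' if ok else 'no match'}")
```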
+
+--
+
+Character classes::
++
+--
+
+Ranges of potential characters may be represented as character classes
+by enclosing them in square brackets `"[]"`. A leading `^`
+negates the character class. The allowed forms are:
+
+ [abc] # 'a' or 'b' or 'c'
+ [a-c] # 'a' or 'b' or 'c'
+ [-abc] # '-' or 'a' or 'b' or 'c'
+ [abc\-] # '-' or 'a' or 'b' or 'c'
+ [^a-c] # any character except 'a' or 'b' or 'c'
+
+Note that the dash `"-"` indicates a range of characters, unless it is
+the first character or it is escaped with a backslash.
+
+For string `"abcd"`:
+
+ ab[cd]+ # match
+ [a-d]+ # match
+ [^a-d]+ # no match
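+
+The handling of the dash can be illustrated in Python, whose character
+classes follow the same rules for the forms shown here. As before, this is
+only a sketch with a different engine, and `whole_match` is a hypothetical
+helper that imitates Lucene's anchoring.
+
```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True only if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None


text = "abcd"
class_results = {p: whole_match(p, text) for p in ("ab[cd]+", "[a-d]+", "[^a-d]+")}

# A dash is literal when it comes first or is escaped; otherwise it forms a range.
dash_results = {
    "[-abc]": whole_match("[-abc]", "-"),
    r"[abc\-]": whole_match(r"[abc\-]", "-"),
    "[a-c]": whole_match("[a-c]", "-"),
}

print(class_results)
print(dash_results)
```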
+
+--
+
+===== Optional operators
+
+These operators are only available when they are explicitly enabled, by
+passing `flags` to the query.
+
+Multiple flags can be enabled either using the `ALL` flag, or by
+concatenating flags with a pipe `"|"`:
+
+    {
+        "regexp": {
+            "username": {
+                "value": "john~athon<1-5>",
+                "flags": "COMPLEMENT|INTERVAL"
+            }
+        }
+    }
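+
+A request body of this shape can be assembled programmatically. The Python
+sketch below only builds the JSON document; `regexp_query` is a hypothetical
+helper, and sending the body to a cluster is left out.
+
```python
import json


def regexp_query(field: str, value: str, flags=()) -> dict:
    """Build a `regexp` query body, joining any flags with a pipe."""
    body = {"value": value}
    if flags:
        body["flags"] = "|".join(flags)
    return {"regexp": {field: body}}


query = regexp_query("username", "john~athon<1-5>", ["COMPLEMENT", "INTERVAL"])
print(json.dumps(query, indent=4))
```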
+
+Complement::
++
+--
+
+The complement is probably the most useful option. The shortest pattern that
+follows a tilde `"~"` is negated: the negated part matches any string other
+than one the pattern itself matches. For the string `"abcdef"`:
+
+ ab~df # match
+ ab~(cde)f # no match
+ a~(cd)f # match
+ a~(bcde)f # no match
+
+Enabled with the `COMPLEMENT` or `ALL` flags.
+
+--
+
+Interval::
++
+--
+
+The interval option enables the use of numeric ranges, enclosed by angle
+brackets `"<>"`. For string `"foo80"`:
+
+ foo<1-100> # match
+ foo<01-100> # match
+ foo<001-100> # no match
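+
+The three results above are consistent with the following rule, which is an
+assumption inferred from the examples rather than something this page
+states: when both bounds are written with the same number of digits, the
+number must have exactly that many digits; otherwise leading zeros are
+optional. A minimal Python emulation, with the hypothetical helper
+`interval_matches`:
+
```python
import re


def interval_matches(prefix: str, low: str, high: str, text: str) -> bool:
    """Emulate `prefix<low-high>` under the assumed rule described above."""
    if not text.startswith(prefix):
        return False
    digits = text[len(prefix):]
    if not re.fullmatch(r"[0-9]+", digits):
        return False
    # Assumption: bounds of equal width demand exactly that many digits.
    if len(low) == len(high) and len(digits) != len(low):
        return False
    return int(low) <= int(digits) <= int(high)


results = {
    "foo<1-100>": interval_matches("foo", "1", "100", "foo80"),
    "foo<01-100>": interval_matches("foo", "01", "100", "foo80"),
    "foo<001-100>": interval_matches("foo", "001", "100", "foo80"),
}
print(results)
```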
+
+Enabled with the `INTERVAL` or `ALL` flags.
+
+
+--
+
+Intersection::
++
+--
+
+The ampersand `"&"` joins two patterns in a way that both of them have to
+match. For string `"aaabbb"`:
+
+ aaa.+&.+bbb # match
+ aaa&bbb # no match
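+
+Python's `re` module has no intersection operator, but the behaviour can be
+emulated by requiring every operand to match the whole string. This is a
+sketch only: `matches_intersection` is a hypothetical helper, and it assumes
+the expression contains no escaped or quoted `&`.
+
```python
import re


def matches_intersection(expression: str, text: str) -> bool:
    """Emulate `left&right`: every operand must match the entire string.

    Assumption: the expression contains no escaped or quoted "&".
    """
    operands = expression.split("&")
    return all(re.fullmatch(op, text) is not None for op in operands)


both = matches_intersection("aaa.+&.+bbb", "aaabbb")
neither = matches_intersection("aaa&bbb", "aaabbb")
print(both, neither)
```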
+
+Using this feature usually means that you should rewrite your regular
+expression.
+
+Enabled with the `INTERSECTION` or `ALL` flags.
+
+--
+
+Any string::
++
+--
+
+The at sign `"@"` matches any string in its entirety. This could be combined
+with the intersection and complement above to express ``everything except''.
+For instance:
+
+ @&~(foo.+) # anything except "foo" followed by one or more characters
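+
+The same "everything except" idea can be emulated in Python by accepting a
+string exactly when the excluded pattern fails to match all of it. The
+helper `anything_except` is hypothetical, and Python's `re` engine stands in
+for Lucene here.
+
```python
import re


def anything_except(excluded: str, text: str) -> bool:
    """Emulate `@&~(excluded)`: accept any string the excluded pattern rejects."""
    return re.fullmatch(excluded, text) is None


results = {s: anything_except("foo.+", s) for s in ("foobar", "food", "foo", "bar", "")}
print(results)
```
+
+Note that `"foo"` itself is accepted, because `foo.+` requires at least one
+character after `foo`.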
+
+Enabled with the `ANYSTRING` or `ALL` flags.
+--