summaryrefslogtreecommitdiff
path: root/usr/src/man/man7/regex.7
diff options
context:
space:
mode:
Diffstat (limited to 'usr/src/man/man7/regex.7')
-rw-r--r--usr/src/man/man7/regex.71040
1 files changed, 1040 insertions, 0 deletions
diff --git a/usr/src/man/man7/regex.7 b/usr/src/man/man7/regex.7
new file mode 100644
index 0000000000..99dd14afad
--- /dev/null
+++ b/usr/src/man/man7/regex.7
@@ -0,0 +1,1040 @@
+.\"
+.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for
+.\" permission to reproduce portions of its copyrighted documentation.
+.\" Original documentation from The Open Group can be obtained online at
+.\" http://www.opengroup.org/bookstore/.
+.\"
+.\" The Institute of Electrical and Electronics Engineers and The Open
+.\" Group, have given us permission to reprint portions of their
+.\" documentation.
+.\"
+.\" In the following statement, the phrase ``this text'' refers to portions
+.\" of the system documentation.
+.\"
+.\" Portions of this text are reprinted and reproduced in electronic form
+.\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition,
+.\" Standard for Information Technology -- Portable Operating System
+.\" Interface (POSIX), The Open Group Base Specifications Issue 6,
+.\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics
+.\" Engineers, Inc and The Open Group. In the event of any discrepancy
+.\" between these versions and the original IEEE and The Open Group
+.\" Standard, the original IEEE and The Open Group Standard is the referee
+.\" document. The original Standard can be obtained online at
+.\" http://www.opengroup.org/unix/online.html.
+.\"
+.\" This notice shall appear on any product containing this material.
+.\"
+.\" The contents of this file are subject to the terms of the
+.\" Common Development and Distribution License (the "License").
+.\" You may not use this file except in compliance with the License.
+.\"
+.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
+.\" or http://www.opensolaris.org/os/licensing.
+.\" See the License for the specific language governing permissions
+.\" and limitations under the License.
+.\"
+.\" When distributing Covered Code, include this CDDL HEADER in each
+.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
+.\" If applicable, add the following below this CDDL HEADER, with the
+.\" fields enclosed by brackets "[]" replaced with your own identifying
+.\" information: Portions Copyright [yyyy] [name of copyright owner]
+.\"
+.\"
+.\" Copyright (c) 1992, X/Open Company Limited All Rights Reserved
+.\" Portions Copyright (c) 1999, Sun Microsystems, Inc. All Rights Reserved
+.\" Copyright 2017 Nexenta Systems, Inc.
+.\"
+.Dd August 14, 2020
+.Dt REGEX 7
+.Os
+.Sh NAME
+.Nm regex
+.Nd internationalized basic and extended regular expression matching
+.Sh DESCRIPTION
+Regular Expressions
+.Pq REs
+provide a mechanism to select specific strings from a set of character strings.
+The Internationalized Regular Expressions described below differ from the Simple
+Regular Expressions described on the
+.Xr regexp 7
+manual page in the following ways:
+.Bl -bullet
+.It
+both Basic and Extended Regular Expressions are supported
+.It
+the Internationalization features -- character class, equivalence class, and
+multi-character collation -- are supported.
+.El
+.Pp
+The Basic Regular Expression
+.Pq BRE
+notation and construction rules described in the
+.Sx BASIC REGULAR EXPRESSIONS
+section apply to most utilities supporting regular expressions.
+Some utilities, instead, support the Extended Regular Expressions
+.Pq ERE
+described in the
+.Sx EXTENDED REGULAR EXPRESSIONS
+section; any exceptions for both cases are noted in the descriptions of the
+specific utilities using regular expressions.
+Both BREs and EREs are supported by the Regular Expression Matching interfaces
+.Xr regcomp 3C
+and
+.Xr regexec 3C .
+.Sh BASIC REGULAR EXPRESSIONS
+.Ss BREs Matching a Single Character
+A BRE ordinary character, a special character preceded by a backslash, or a
+period matches a single character.
+A bracket expression matches a single character or a single collating element.
+See
+.Sx RE Bracket Expression ,
+below.
+.Ss BRE Ordinary Characters
+An ordinary character is a BRE that matches itself: any character in the
+supported character set, except for the BRE special characters listed in
+.Sx BRE Special Characters ,
+below.
+.Pp
+The interpretation of an ordinary character preceded by a backslash
+.Pq Qq \e
+is undefined, except for:
+.Bl -enum
+.It
+the characters
+.Qq \&) ,
+.Qq \&( ,
+.Qq { ,
+and
+.Qq }
+.It
+the digits 1 to 9 inclusive
+.Po see
+.Sx BREs Matching Multiple Characters ,
+below
+.Pc
+.It
+a character inside a bracket expression.
+.El
+.Ss BRE Special Characters
+A BRE special character has special properties in certain contexts.
+Outside those contexts, or when preceded by a backslash, such a character will
+be a BRE that matches the special character itself.
+The BRE special characters and the contexts in which they have their special
+meaning are:
+.Bl -tag -width Ds
+.It Sy \&. \&[ \&\e
+The period, left-bracket, and backslash are special except when used in a
+bracket expression
+.Po see
+.Sx RE Bracket Expression ,
+below
+.Pc .
+An expression containing a
+.Qq \&[
+that is not preceded by a backslash and is not part of a bracket expression
+produces undefined results.
+.It Sy *
+The asterisk is special except when used:
+.Bl -bullet
+.It
+in a bracket expression
+.It
+as the first character of an entire BRE
+.Po after an initial
+.Qq ^ ,
+if any
+.Pc
+.It
+as the first character of a subexpression
+.Po after an initial
+.Qq ^ ,
+if any; see
+.Sx BREs Matching Multiple Characters ,
+below
+.Pc .
+.El
+.It Sy ^
+The circumflex is special when used:
+.Bl -bullet
+.It
+as an anchor
+.Po see
+.Sx BRE Expression Anchoring ,
+below
+.Pc .
+.It
+as the first character of a bracket expression
+.Po see
+.Sx RE Bracket Expression ,
+below
+.Pc .
+.El
+.It Sy $
+The dollar sign is special when used as an anchor.
+.El
+.Ss Periods in BREs
+A period
+.Pq Qq \&. ,
+when used outside a bracket expression, is a BRE that matches any character in
+the supported character set except NUL.
+.Ss RE Bracket Expression
+A bracket expression
+.Po an expression enclosed in square brackets,
+.Qq []
+.Pc
+is an RE that matches a single collating element contained in the non-empty set
+of collating elements represented by the bracket expression.
+.Pp
+The following rules and definitions apply to bracket expressions:
+.Bl -enum
+.It
+A
+.Em bracket expression
+is either a matching list expression or a non-matching list expression.
+It consists of one or more expressions: collating elements, collating symbols,
+equivalence classes, character classes, or range expressions
+.Pq see rule 7 below .
+Portable applications must not use range expressions, even though all
+implementations support them.
+The right-bracket
+.Pq Qq \&]
+loses its special meaning and represents itself in a bracket expression if it
+occurs first in the list
+.Po after an initial circumflex
+.Pq Qq ^ ,
+if any
+.Pc .
+Otherwise, it terminates the bracket expression, unless it appears in a
+collating symbol
+.Po such as
+.Qq [.].]
+.Pc
+or is the ending right-bracket for a collating symbol, equivalence class, or
+character class.
+.Pp
+The special characters
+.Qq \&. ,
+.Qq * ,
+.Qq \&[ ,
+.Qq \&\e
+.Pq period, asterisk, left-bracket and backslash, respectively
+lose their special meaning within a bracket expression.
+.Pp
+The character sequences
+.Qq [. ,
+.Qq [= ,
+.Qq [:
+.Pq left-bracket followed by a period, equals-sign, or colon
+are special inside a bracket expression and are used to delimit collating
+symbols, equivalence class expressions, and character class expressions.
+These symbols must be followed by a valid expression and the matching
+terminating sequence
+.Qq .] ,
+.Qq =]
+or
+.Qq :] ,
+as described in the following items.
+.It
+A
+.Em matching list expression
+specifies a list that matches any one of the expressions represented in the
+list.
+The first character in the list must not be the circumflex.
+For example,
+.Qq [abc]
+is an RE that matches any of the characters
+.Qq a ,
+.Qq b
+or
+.Qq c .
+.It
+A
+.Em non-matching list expression
+begins with a circumflex
+.Pq Qq ^ ,
+and specifies a list that matches any character or collating element except for
+the expressions represented in the list after the leading circumflex.
+For example,
+.Qq [^abc]
+is an RE that matches any character or collating element except the characters
+.Qq a ,
+.Qq b ,
+or
+.Qq c .
+The circumflex will have this special meaning only when it occurs first in the
+list, immediately following the left-bracket.
+.It
+A
+.Em collating symbol
+is a collating element enclosed within bracket-period
+.Pq Qq [..]
+delimiters.
+Multi-character collating elements must be represented as collating symbols when
+it is necessary to distinguish them from a list of the individual characters
+that make up the multi-character collating element.
+For example, if the string
+.Qq ch
+is a collating element in the current collation sequence with the associated
+collating symbol
+.Qq Aq ch ,
+the expression
+.Qq [[.ch.]]
+will be treated as an RE matching the character sequence
+.Qq ch ,
+while
+.Qq [ch]
+will be treated as an RE matching
+.Qq c
+or
+.Qq h .
+Collating symbols will be recognized only inside bracket expressions.
+This implies that the RE
+.Qq [[.ch.]]*c
+matches the first to fifth character in the string
+.Qq chchch.
+If the string is not a collating element in the current collating sequence
+definition, or if the collating element has no characters associated with it,
+the symbol will be treated as an invalid expression.
+.It
+An
+.Em equivalence class expression
+represents the set of collating elements belonging to an equivalence class.
+Only primary equivalence classes will be recognised.
+The class is expressed by enclosing any one of the collating elements in the
+equivalence class within bracket-equal
+.Pq Qq [==]
+delimiters.
+For example, if
+.Qq a
+and
+.Qq b
+belong to the same equivalence class, then
+.Qq [[=a=]b] ,
+.Qq [[==]a]
+and
+.Qq [[==]b]
+will each be equivalent to
+.Qq [ab] .
+If the collating element does not belong to an equivalence class, the
+equivalence class expression will be treated as a
+.Em collating symbol .
+.It
+A
+.Em character class expression
+represents the set of characters belonging to a character class, as defined in
+the
+.Ev LC_CTYPE
+category in the current locale.
+All character classes specified in the current locale will be recognized.
+A character class expression is expressed as a character class name enclosed
+within bracket-colon
+.Pq Qq [::]
+delimiters.
+.Pp
+The following character class expressions are supported in all locales:
+.Bl -column "[:alnum:]" "[:cntrl:]" "[:lower:]" "[:xdigit:]"
+.It [:alnum:] Ta [:cntrl:] Ta [:lower:] Ta [:space:]
+.It [:alpha:] Ta [:digit:] Ta [:print:] Ta [:upper:]
+.It [:blank:] Ta [:graph:] Ta [:punct:] Ta [:xdigit:]
+.El
+.Pp
+In addition, character class expressions of the form
+.Qq [:name:]
+are recognized in those locales where the
+.Em name
+keyword has been given a
+.Em charclass
+definition in the
+.Ev LC_CTYPE
+category.
+.It
+A
+.Em range expression
+represents the set of collating elements that fall between two elements in the
+current collation sequence, inclusively.
+It is expressed as the starting point and the ending point separated by a hyphen
+.Pq Qq - .
+.Pp
+Range expressions must not be used in portable applications because their
+behavior is dependent on the collating sequence.
+Ranges will be treated according to the current collating sequence, and include
+such characters that fall within the range based on that collating sequence,
+regardless of character values.
+This, however, means that the interpretation will differ depending on collating
+sequence.
+If, for instance, one collating sequence defines as a variant of
+.Qq a ,
+while another defines it as a letter following
+.Qq z ,
+then the expression
+.Qq [-z]
+is valid in the first language and invalid in the second.
+.sp
+In the following, all examples assume the collation sequence specified for the
+POSIX locale, unless another collation sequence is specifically defined.
+.Pp
+The starting range point and the ending range point must be a collating element
+or collating symbol.
+An equivalence class expression used as a starting or ending point of a range
+expression produces unspecified results.
+An equivalence class can be used portably within a bracket expression, but only
+outside the range.
+For example, the unspecified expression
+.Qq [[=e=]-f]
+should be given as
+.Qq [[=e=]e-f] .
+The ending range point must collate equal to or higher than the starting range
+point; otherwise, the expression will be treated as invalid.
+The order used is the order in which the collating elements are specified in the
+current collation definition.
+One-to-many mappings
+.Po see
+.Xr locale 7
+.Pc
+will not be performed.
+For example, assuming that the character
+.Qq eszet
+is placed in the collation sequence after
+.Qq r
+and
+.Qq s ,
+but before
+.Qq t ,
+and that it maps to the sequence
+.Qq ss
+for collation purposes, then the expression
+.Qq [r-s]
+matches only
+.Qq r
+and
+.Qq s ,
+but the expression
+.Qq [s-t]
+matches
+.Qq s ,
+.Qq beta ,
+or
+.Qq t .
+.Pp
+The interpretation of range expressions where the ending range point is also
+the starting range point of a subsequent range expression
+.Po for instance
+.Qq [a-m-o]
+.Pc
+is undefined.
+.Pp
+The hyphen character will be treated as itself if it occurs first
+.Po after an initial
+.Qq ^ ,
+if any
+.Pc
+or last in the list, or as an ending range point in a range expression.
+As examples, the expressions
+.Qq [-ac]
+and
+.Qq [ac-]
+are equivalent and match any of the characters
+.Qq a ,
+.Qq c ,
+or
+.Qq -;
+.Qq [^-ac]
+and
+.Qq [^ac-]
+are equivalent and match any characters except
+.Qq a ,
+.Qq c ,
+or
+.Qq -;
+the expression
+.Qq [%--]
+matches any of the characters between
+.Qq %
+and
+.Qq -
+inclusive; the expression
+.Qq [--@]
+matches any of the characters between
+.Qq -
+and
+.Qq @
+inclusive; and the expression
+.Qq [a--@]
+is invalid, because the letter
+.Qq a
+follows the symbol
+.Qq -
+in the POSIX locale.
+To use a hyphen as the starting range point, it must either come first in the
+bracket expression or be specified as a collating symbol, for example:
+.Qq [][.-.]-0] ,
+which matches either a right bracket or any character or collating element that
+collates between hyphen and 0, inclusive.
+.Pp
+If a bracket expression must specify both
+.Qq -
+and
+.Qq \&] ,
+the
+.Qq \&]
+must be placed first
+.Po after the
+.Qq ^ ,
+if any
+.Pc
+and the
+.Qq -
+last within the bracket expression.
+.El
+.Pp
+Note: Latin-1 characters such as
+.Qq \(ga
+or
+.Qq ^
+are not printable in some locales, for example, the
+.Em ja
+locale.
+.Ss BREs Matching Multiple Characters
+The following rules can be used to construct BREs matching multiple characters
+from BREs matching a single character:
+.Bl -enum
+.It
+The concatenation of BREs matches the concatenation of the strings matched
+by each component of the BRE.
+.It
+A
+.Em subexpression
+can be defined within a BRE by enclosing it between the character pairs
+.Qq \e(
+and
+.Qq \e) .
+Such a subexpression matches whatever it would have matched without the
+.Qq \e(
+and
+.Qq \e) ,
+except that anchoring within subexpressions is optional behavior; see
+.Sx BRE Expression Anchoring ,
+below.
+Subexpressions can be arbitrarily nested.
+.It
+The
+.Em back-reference
+expression
+.Qq \e Ns Em n
+matches the same
+.Pq possibly empty
+string of characters as was matched by a subexpression enclosed between
+.Qq \e(
+and
+.Qq \e)
+preceding the
+.Qq \e Ns Em n .
+The character
+.Qq Em n
+must be a digit from 1 to 9 inclusive,
+.Em n Ns th
+subexpression
+.Po the one that begins with the
+.Em n Ns th
+.Qq \e(
+and ends with the corresponding paired
+.Qq \e)
+.Pc .
+The expression is invalid if less than
+.Em n
+subexpressions precede the
+.Qq \e Ns Em n .
+For example, the expression
+.Qq ^\e(.*\e)\e1$
+matches a line consisting of two adjacent appearances of the same string, and
+the expression
+.Qq \e(a\e)*\e1
+fails to match
+.Qq a .
+The limit of nine back-references to subexpressions in the RE is based on the
+use of a single digit identifier.
+This does not imply that only nine subexpressions are allowed in REs.
+.It
+When a BRE matching a single character, a subexpression or a back-reference is
+followed by the special character asterisk
+.Pq Qq * ,
+together with that asterisk it matches what zero or more consecutive occurrences
+of the BRE would match.
+For example,
+.Qq [ab]*
+and
+.Qq [ab][ab]
+are equivalent when matching the string
+.Qq ab .
+.It
+When a BRE matching a single character, a subexpression, or a back-reference
+is followed by an
+.Em interval expression
+of the format
+.Qq \e{ Ns Em m Ns \e} ,
+.Qq \e{ Ns Em m Ns ,\e}
+or
+.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e} ,
+together with that interval expression it matches what repeated consecutive
+occurrences of the BRE would match.
+The values of
+.Em m
+and
+.Em n
+will be decimal integers in the range 0 <=
+.Em m
+<=
+.Em n
+<=
+.Dv BRE_DUP_MAX ,
+where
+.Em m
+specifies the exact or minimum number of occurrences and
+.Em n
+specifies the maximum number of occurrences.
+The expression
+.Qq \e{ Ns Em m Ns \e}
+matches exactly
+.Em m
+occurrences of the preceding BRE,
+.Qq \e{ Ns Em m Ns ,\e}
+matches at least
+.Em m
+occurrences and
+.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e}
+matches any number of occurrences between
+.Em m
+and
+.Em n ,
+inclusive.
+.Pp
+For example, in the string
+.Qq abababccccccd ,
+the BRE
+.Qq c\e{3\e}
+is matched by characters seven to nine, the BRE
+.Qq \e(ab\e)\e{4,\e}
+is not matched at all and the BRE
+.Qq c\e{1,3\e}d
+is matched by characters ten to thirteen.
+.El
+.Pp
+The behavior of multiple adjacent duplication symbols
+.Po Qq *
+and intervals
+.Pc
+produces undefined results.
+.Ss BRE Precedence
+The order of precedence is as shown in the following table:
+.Bl -column "BRE Precedence (from high to low)" ""
+.It Sy BRE Precedence (from high to low) Ta
+.It collation-related bracket symbols Ta [= =] [: :] [. .]
+.It escaped characters Ta \e< Ns Em special character Ns >
+.It bracket expression Ta [ ]
+.It subexpressions/back-references Ta \e( \e) \e Ns Em n
+.It single-character-BRE duplication Ta * \e{ Ns Em m Ns \&, Ns Em n Ns \e}
+.It concatenation Ta
+.It anchoring Ta ^ $
+.El
+.Ss BRE Expression Anchoring
+A BRE can be limited to matching strings that begin or end a line; this is
+called
+.Em anchoring .
+The circumflex and dollar sign special characters will be considered BRE anchors
+in the following contexts:
+.Bl -enum
+.It
+A circumflex
+.Pq Qq ^
+is an anchor when used as the first character of an entire BRE.
+The implementation may treat circumflex as an anchor when used as the first
+character of a subexpression.
+The circumflex will anchor the expression to the beginning of a string;
+only sequences starting at the first character of a string will be matched by
+the BRE.
+For example, the BRE
+.Qq ^ab
+matches
+.Qq ab
+in the string
+.Qq abcdef ,
+but fails to match in the string
+.Qq cdefab .
+A portable BRE must escape a leading circumflex in a subexpression to match a
+literal circumflex.
+.It
+A dollar sign
+.Pq Qq $
+is an anchor when used as the last character of an entire BRE.
+The implementation may treat a dollar sign as an anchor when used as the last
+character of a subexpression.
+The dollar sign will anchor the expression to the end of the string being
+matched; the dollar sign can be said to match the end-of-string following the
+last character.
+.It
+A BRE anchored by both
+.Qq ^
+and
+.Qq $
+matches only an entire string.
+For example, the BRE
+^abcdef$
+matches strings consisting only of
+.Qq abcdef .
+.It
+.Qq ^
+and
+.Qq $
+are not special in subexpressions.
+.El
+.Pp
+Note: The Solaris implementation does not support anchoring in BRE
+subexpressions.
+.Sh EXTENDED REGULAR EXPRESSIONS
+The rules specified for BREs apply to Extended Regular Expressions
+.Pq EREs
+with the following exceptions:
+.Bl -bullet
+.It
+The characters
+.Qq | ,
+.Qq + ,
+and
+.Qq \&?
+have special meaning, as defined below.
+.It
+The
+.Qq {
+and
+.Qq }
+characters, when used as the duplication operator, are not preceded by
+backslashes.
+The constructs
+.Qq \e{
+and
+.Qq \e}
+simply match the characters
+.Qq {
+and
+.Qq }, respectively.
+.It
+The back reference operator is not supported.
+.It
+Anchoring
+.Pq Qq ^$
+is supported in subexpressions.
+.El
+.Ss EREs Matching a Single Character
+An ERE ordinary character, a special character preceded by a backslash, or a
+period matches a single character.
+A bracket expression matches a single character or a single collating element.
+An
+.Em ERE matching a single character
+enclosed in parentheses matches the same as the ERE without parentheses would
+have matched.
+.Ss ERE Ordinary Characters
+An
+.Em ordinary character
+is an ERE that matches itself.
+An ordinary character is any character in the supported character set, except
+for the ERE special characters listed in
+.Sx ERE Special Characters
+below.
+The interpretation of an ordinary character preceded by a backslash
+.Pq Qq \&\e
+is undefined.
+.Ss ERE Special Characters
+An
+.Em ERE special character
+has special properties in certain contexts.
+Outside those contexts, or when preceded by a backslash, such a character is an
+ERE that matches the special character itself.
+The extended regular expression special characters and the contexts in which
+they have their special meaning are:
+.Bl -tag -width Ds
+.It Sy \&. \&[ \&\e \&(
+The period, left-bracket, backslash, and left-parenthesis are special except
+when used in a bracket expression
+.Po see
+.Sx RE Bracket Expression ,
+above
+.Pc .
+Outside a bracket expression, a left-parenthesis immediately followed by a
+right-parenthesis produces undefined results.
+.It Sy \&)
+The right-parenthesis is special when matched with a preceding
+left-parenthesis, both outside a bracket expression.
+.It Sy * + \&? {
+The asterisk, plus-sign, question-mark, and left-brace are special except when
+used in a bracket expression
+.Po see
+.Sx RE Bracket Expression ,
+above
+.Pc .
+Any of the following uses produce undefined results:
+.Bl -bullet
+.It
+if these characters appear first in an ERE, or immediately following a
+vertical-line, circumflex or left-parenthesis
+.It
+if a left-brace is not part of a valid interval expression.
+.El
+.It Sy \&|
+The vertical-line is special except when used in a bracket expression
+.Po see
+.Sx RE Bracket Expression ,
+above
+.Pc .
+A vertical-line appearing first or last in an ERE, or immediately following a
+vertical-line or a left-parenthesis, or immediately preceding a
+right-parenthesis, produces undefined results.
+.It Sy ^
+The circumflex is special when used:
+.Bl -bullet
+.It
+as an anchor
+.Po see
+.Sx ERE Expression Anchoring ,
+below
+.Pc .
+.It
+as the first character of a bracket expression
+.Po see
+.Sx RE Bracket Expression ,
+above
+.Pc .
+.El
+.It Sy $
+The dollar sign is special when used as an anchor.
+.El
+.Ss Periods in EREs
+A period
+.Pq Qq \&. ,
+when used outside a bracket expression, is an ERE that matches any character in
+the supported character set except NUL.
+.Ss ERE Bracket Expression
+The rules for ERE Bracket Expressions are the same as for Basic Regular
+Expressions; see
+.Sx RE Bracket Expression ,
+above.
+.Ss EREs Matching Multiple Characters
+The following rules will be used to construct EREs matching multiple characters
+from EREs matching a single character:
+.Bl -enum
+.It
+A
+.Em concatenation of EREs
+matches the concatenation of the character sequences matched by each component
+of the ERE.
+A concatenation of EREs enclosed in parentheses matches whatever the
+concatenation without the parentheses matches.
+For example, both the ERE
+.Qq cd
+and the ERE
+.Qq (cd)
+are matched by the third and fourth character of the string
+.Qq abcdefabcdef .
+.It
+When an ERE matching a single character or an ERE enclosed in parentheses is
+followed by the special character plus-sign
+.Pq Qq + ,
+together with that plus-sign it matches what one or more consecutive occurrences
+of the ERE would match.
+For example, the ERE
+.Qq b+(bc)
+matches the fourth to seventh characters in the string
+.Qq acabbbcde ;
+.Qq [ab]+
+and
+.Qq [ab][ab]*
+are equivalent.
+.It
+When an ERE matching a single character or an ERE enclosed in parentheses is
+followed by the special character asterisk
+.Pq Qq * ,
+together with that asterisk it matches what zero or more consecutive occurrences
+of the ERE would match.
+For example, the ERE
+.Qq b*c
+matches the first character in the string
+.Qq cabbbcde ,
+and the ERE
+.Qq b*cd
+matches the third to seventh characters in the string
+.Qq cabbbcdebbbbbbcdbc .
+And,
+.Qq [ab]*
+and
+.Qq [ab][ab]
+are equivalent when matching the string
+.Qq ab .
+.It
+When an ERE matching a single character or an ERE enclosed in parentheses is
+followed by the special character question-mark
+.Pq Qq \&? ,
+together with that question-mark it matches what zero or one consecutive
+occurrences of the ERE would match.
+For example, the ERE
+.Qq b?c
+matches the second character in the string
+.Qq acabbbcde .
+.It
+When an ERE matching a single character or an ERE enclosed in parentheses is
+followed by an
+.Em interval expression
+of the format
+.Qq { Ns Em m Ns } ,
+.Qq { Ns Em m Ns ,}
+or
+.Qq { Ns Em m Ns \&, Ns Em n Ns } ,
+together with that interval expression it matches what repeated consecutive
+occurrences of the ERE would match.
+The values of
+.Em m
+and
+.Em n
+will be decimal integers in the range 0 <=
+.Em m
+<=
+.Em n
+<=
+.Dv RE_DUP_MAX ,
+where
+.Em m
+specifies the exact or minimum number of occurrences and
+.Em n
+specifies the maximum number of occurrences.
+The expression
+.Qq { Ns Em m Ns }
+matches exactly
+.Em m
+occurrences of the preceding ERE,
+.Qq { Ns Em m Ns ,}
+matches at least
+.Em m
+occurrences and
+.Qq { Ns m Ns \&, Ns Em n Ns }
+matches any number of occurrences between
+.Em m
+and
+.Em n ,
+inclusive.
+.El
+.Pp
+For example, in the string
+.Qq abababccccccd
+the ERE
+.Qq c{3}
+is matched by characters seven to nine and the ERE
+.Qq (ab){2,}
+is matched by characters one to six.
+.Pp
+The behavior of multiple adjacent duplication symbols
+.Po
+.Qq + ,
+.Qq * ,
+.Qq \&?
+and intervals
+.Pc
+produces undefined results.
+.Ss ERE Alternation
+Two EREs separated by the special character vertical-line
+.Pq Qq |
+match a string that is matched by either.
+For example, the ERE
+.Qq a((bc)|d)
+matches the string
+.Qq abc
+and the string
+.Qq ad .
+Single characters, or expressions matching single characters, separated by the
+vertical bar and enclosed in parentheses, will be treated as an ERE matching a
+single character.
+.Ss ERE Precedence
+The order of precedence will be as shown in the following table:
+.Bl -column "ERE Precedence (from high to low)" ""
+.It Sy ERE Precedence (from high to low) Ta
+.It collation-related bracket symbols Ta [= =] [: :] [. .]
+.It escaped characters Ta \e< Ns Em special character Ns >
+.It bracket expression Ta \&[ \&]
+.It grouping Ta \&( \&)
+.It single-character-ERE duplication Ta * + \&? { Ns Em m Ns \&, Ns Em n Ns}
+.It concatenation Ta
+.It anchoring Ta ^ $
+.It alternation Ta |
+.El
+.Pp
+For example, the ERE
+.Qq abba|cde
+matches either the string
+.Qq abba
+or the string
+.Qq cde
+.Po rather than the string
+.Qq abbade
+or
+.Qq abbcde ,
+because concatenation has a higher order of precedence than alternation
+.Pc .
+.Ss ERE Expression Anchoring
+An ERE can be limited to matching strings that begin or end a line; this is
+called
+.Em anchoring .
+The circumflex and dollar sign special characters are considered ERE anchors
+when used anywhere outside a bracket expression.
+This has the following effects:
+.Bl -enum
+.It
+A circumflex
+.Pq Qq ^
+outside a bracket expression anchors the expression or subexpression it begins
+to the beginning of a string; such an expression or subexpression can match only
+a sequence starting at the first character of a string.
+For example, the EREs
+.Qq ^ab
+and
+.Qq (^ab)
+match
+.Qq ab
+in the string
+.Qq abcdef ,
+but fail to match in the string
+.Qq cdefab ,
+and the ERE
+.Qq a^b
+is valid, but can never match because the
+.Qq a
+prevents the expression
+.Qq ^b
+from matching starting at the first character.
+.It
+A dollar sign
+.Pq Qq $
+outside a bracket expression anchors the expression or subexpression it ends to
+the end of a string; such an expression or subexpression can match only a
+sequence ending at the last character of a string.
+For example, the EREs
+.Qq ef$
+and
+.Qq (ef$)
+match
+.Qq ef
+in the string
+.Qq abcdef ,
+but fail to match in the string
+.Qq cdefab ,
+and the ERE
+.Qq e$f
+is valid, but can never match because the
+.Qq f
+prevents the expression
+.Qq e$
+from matching ending at the last character.
+.El
+.Sh SEE ALSO
+.Xr localedef 1 ,
+.Xr regcomp 3C ,
+.Xr attributes 7 ,
+.Xr environ 7 ,
+.Xr locale 7 ,
+.Xr regexp 7