Diffstat (limited to 'usr/src/man/man7/regex.7')
-rw-r--r-- | usr/src/man/man7/regex.7 | 1040 |
1 files changed, 1040 insertions, 0 deletions
diff --git a/usr/src/man/man7/regex.7 b/usr/src/man/man7/regex.7 new file mode 100644 index 0000000000..99dd14afad --- /dev/null +++ b/usr/src/man/man7/regex.7 @@ -0,0 +1,1040 @@ +.\" +.\" Sun Microsystems, Inc. gratefully acknowledges The Open Group for +.\" permission to reproduce portions of its copyrighted documentation. +.\" Original documentation from The Open Group can be obtained online at +.\" http://www.opengroup.org/bookstore/. +.\" +.\" The Institute of Electrical and Electronics Engineers and The Open +.\" Group, have given us permission to reprint portions of their +.\" documentation. +.\" +.\" In the following statement, the phrase ``this text'' refers to portions +.\" of the system documentation. +.\" +.\" Portions of this text are reprinted and reproduced in electronic form +.\" in the SunOS Reference Manual, from IEEE Std 1003.1, 2004 Edition, +.\" Standard for Information Technology -- Portable Operating System +.\" Interface (POSIX), The Open Group Base Specifications Issue 6, +.\" Copyright (C) 2001-2004 by the Institute of Electrical and Electronics +.\" Engineers, Inc and The Open Group. In the event of any discrepancy +.\" between these versions and the original IEEE and The Open Group +.\" Standard, the original IEEE and The Open Group Standard is the referee +.\" document. The original Standard can be obtained online at +.\" http://www.opengroup.org/unix/online.html. +.\" +.\" This notice shall appear on any product containing this material. +.\" +.\" The contents of this file are subject to the terms of the +.\" Common Development and Distribution License (the "License"). +.\" You may not use this file except in compliance with the License. +.\" +.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE +.\" or http://www.opensolaris.org/os/licensing. +.\" See the License for the specific language governing permissions +.\" and limitations under the License. 
+.\" +.\" When distributing Covered Code, include this CDDL HEADER in each +.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE. +.\" If applicable, add the following below this CDDL HEADER, with the +.\" fields enclosed by brackets "[]" replaced with your own identifying +.\" information: Portions Copyright [yyyy] [name of copyright owner] +.\" +.\" +.\" Copyright (c) 1992, X/Open Company Limited All Rights Reserved +.\" Portions Copyright (c) 1999, Sun Microsystems, Inc. All Rights Reserved +.\" Copyright 2017 Nexenta Systems, Inc. +.\" +.Dd August 14, 2020 +.Dt REGEX 7 +.Os +.Sh NAME +.Nm regex +.Nd internationalized basic and extended regular expression matching +.Sh DESCRIPTION +Regular Expressions +.Pq REs +provide a mechanism to select specific strings from a set of character strings. +The Internationalized Regular Expressions described below differ from the Simple +Regular Expressions described on the +.Xr regexp 7 +manual page in the following ways: +.Bl -bullet +.It +both Basic and Extended Regular Expressions are supported +.It +the Internationalization features -- character class, equivalence class, and +multi-character collation -- are supported. +.El +.Pp +The Basic Regular Expression +.Pq BRE +notation and construction rules described in the +.Sx BASIC REGULAR EXPRESSIONS +section apply to most utilities supporting regular expressions. +Some utilities, instead, support the Extended Regular Expressions +.Pq ERE +described in the +.Sx EXTENDED REGULAR EXPRESSIONS +section; any exceptions for both cases are noted in the descriptions of the +specific utilities using regular expressions. +Both BREs and EREs are supported by the Regular Expression Matching interfaces +.Xr regcomp 3C +and +.Xr regexec 3C . +.Sh BASIC REGULAR EXPRESSIONS +.Ss BREs Matching a Single Character +A BRE ordinary character, a special character preceded by a backslash, or a +period matches a single character. 
+A bracket expression matches a single character or a single collating element. +See +.Sx RE Bracket Expression , +below. +.Ss BRE Ordinary Characters +An ordinary character is a BRE that matches itself: any character in the +supported character set, except for the BRE special characters listed in +.Sx BRE Special Characters , +below. +.Pp +The interpretation of an ordinary character preceded by a backslash +.Pq Qq \e +is undefined, except for: +.Bl -enum +.It +the characters +.Qq \&) , +.Qq \&( , +.Qq { , +and +.Qq } +.It +the digits 1 to 9 inclusive +.Po see +.Sx BREs Matching Multiple Characters , +below +.Pc +.It +a character inside a bracket expression. +.El +.Ss BRE Special Characters +A BRE special character has special properties in certain contexts. +Outside those contexts, or when preceded by a backslash, such a character will +be a BRE that matches the special character itself. +The BRE special characters and the contexts in which they have their special +meaning are: +.Bl -tag -width Ds +.It Sy \&. \&[ \&\e +The period, left-bracket, and backslash are special except when used in a +bracket expression +.Po see +.Sx RE Bracket Expression , +below +.Pc . +An expression containing a +.Qq \&[ +that is not preceded by a backslash and is not part of a bracket expression +produces undefined results. +.It Sy * +The asterisk is special except when used: +.Bl -bullet +.It +in a bracket expression +.It +as the first character of an entire BRE +.Po after an initial +.Qq ^ , +if any +.Pc +.It +as the first character of a subexpression +.Po after an initial +.Qq ^ , +if any; see +.Sx BREs Matching Multiple Characters , +below +.Pc . +.El +.It Sy ^ +The circumflex is special when used: +.Bl -bullet +.It +as an anchor +.Po see +.Sx BRE Expression Anchoring , +below +.Pc . +.It +as the first character of a bracket expression +.Po see +.Sx RE Bracket Expression , +below +.Pc . +.El +.It Sy $ +The dollar sign is special when used as an anchor. 
+.El +.Ss Periods in BREs +A period +.Pq Qq \&. , +when used outside a bracket expression, is a BRE that matches any character in +the supported character set except NUL. +.Ss RE Bracket Expression +A bracket expression +.Po an expression enclosed in square brackets, +.Qq [] +.Pc +is an RE that matches a single collating element contained in the non-empty set +of collating elements represented by the bracket expression. +.Pp +The following rules and definitions apply to bracket expressions: +.Bl -enum +.It +A +.Em bracket expression +is either a matching list expression or a non-matching list expression. +It consists of one or more expressions: collating elements, collating symbols, +equivalence classes, character classes, or range expressions +.Pq see rule 7 below . +Portable applications must not use range expressions, even though all +implementations support them. +The right-bracket +.Pq Qq \&] +loses its special meaning and represents itself in a bracket expression if it +occurs first in the list +.Po after an initial circumflex +.Pq Qq ^ , +if any +.Pc . +Otherwise, it terminates the bracket expression, unless it appears in a +collating symbol +.Po such as +.Qq [.].] +.Pc +or is the ending right-bracket for a collating symbol, equivalence class, or +character class. +.Pp +The special characters +.Qq \&. , +.Qq * , +.Qq \&[ , +.Qq \&\e +.Pq period, asterisk, left-bracket and backslash, respectively +lose their special meaning within a bracket expression. +.Pp +The character sequences +.Qq [. , +.Qq [= , +.Qq [: +.Pq left-bracket followed by a period, equals-sign, or colon +are special inside a bracket expression and are used to delimit collating +symbols, equivalence class expressions, and character class expressions. +These symbols must be followed by a valid expression and the matching +terminating sequence +.Qq .] , +.Qq =] +or +.Qq :] , +as described in the following items. 
+.It +A +.Em matching list expression +specifies a list that matches any one of the expressions represented in the +list. +The first character in the list must not be the circumflex. +For example, +.Qq [abc] +is an RE that matches any of the characters +.Qq a , +.Qq b +or +.Qq c . +.It +A +.Em non-matching list expression +begins with a circumflex +.Pq Qq ^ , +and specifies a list that matches any character or collating element except for +the expressions represented in the list after the leading circumflex. +For example, +.Qq [^abc] +is an RE that matches any character or collating element except the characters +.Qq a , +.Qq b , +or +.Qq c . +The circumflex will have this special meaning only when it occurs first in the +list, immediately following the left-bracket. +.It +A +.Em collating symbol +is a collating element enclosed within bracket-period +.Pq Qq [..] +delimiters. +Multi-character collating elements must be represented as collating symbols when +it is necessary to distinguish them from a list of the individual characters +that make up the multi-character collating element. +For example, if the string +.Qq ch +is a collating element in the current collation sequence with the associated +collating symbol +.Qq Aq ch , +the expression +.Qq [[.ch.]] +will be treated as an RE matching the character sequence +.Qq ch , +while +.Qq [ch] +will be treated as an RE matching +.Qq c +or +.Qq h . +Collating symbols will be recognized only inside bracket expressions. +This implies that the RE +.Qq [[.ch.]]*c +matches the first to fifth character in the string +.Qq chchch. +If the string is not a collating element in the current collating sequence +definition, or if the collating element has no characters associated with it, +the symbol will be treated as an invalid expression. +.It +An +.Em equivalence class expression +represents the set of collating elements belonging to an equivalence class. +Only primary equivalence classes will be recognised. 
+The class is expressed by enclosing any one of the collating elements in the +equivalence class within bracket-equal +.Pq Qq [==] +delimiters. +For example, if +.Qq a +and +.Qq b +belong to the same equivalence class, then +.Qq [[=a=]b] , +.Qq [[=b=]a] +and +.Qq [[=b=]b] +will each be equivalent to +.Qq [ab] . +If the collating element does not belong to an equivalence class, the +equivalence class expression will be treated as a +.Em collating symbol . +.It +A +.Em character class expression +represents the set of characters belonging to a character class, as defined in +the +.Ev LC_CTYPE +category in the current locale. +All character classes specified in the current locale will be recognized. +A character class expression is expressed as a character class name enclosed +within bracket-colon +.Pq Qq [::] +delimiters. +.Pp +The following character class expressions are supported in all locales: +.Bl -column "[:alnum:]" "[:cntrl:]" "[:lower:]" "[:xdigit:]" +.It [:alnum:] Ta [:cntrl:] Ta [:lower:] Ta [:space:] +.It [:alpha:] Ta [:digit:] Ta [:print:] Ta [:upper:] +.It [:blank:] Ta [:graph:] Ta [:punct:] Ta [:xdigit:] +.El +.Pp +In addition, character class expressions of the form +.Qq [:name:] +are recognized in those locales where the +.Em name +keyword has been given a +.Em charclass +definition in the +.Ev LC_CTYPE +category. +.It +A +.Em range expression +represents the set of collating elements that fall between two elements in the +current collation sequence, inclusively. +It is expressed as the starting point and the ending point separated by a hyphen +.Pq Qq - . +.Pp +Range expressions must not be used in portable applications because their +behavior is dependent on the collating sequence. +Ranges will be treated according to the current collating sequence, and include +such characters that fall within the range based on that collating sequence, +regardless of character values.
+This, however, means that the interpretation will differ depending on collating +sequence. +If, for instance, one collating sequence defines +.Qq \(:a +as a variant of +.Qq a , +while another defines it as a letter following +.Qq z , +then the expression +.Qq [\(:a-z] +is valid in the first language and invalid in the second. +.sp +In the following, all examples assume the collation sequence specified for the +POSIX locale, unless another collation sequence is specifically defined. +.Pp +The starting range point and the ending range point must be a collating element +or collating symbol. +An equivalence class expression used as a starting or ending point of a range +expression produces unspecified results. +An equivalence class can be used portably within a bracket expression, but only +outside the range. +For example, the unspecified expression +.Qq [[=e=]-f] +should be given as +.Qq [[=e=]e-f] . +The ending range point must collate equal to or higher than the starting range +point; otherwise, the expression will be treated as invalid. +The order used is the order in which the collating elements are specified in the +current collation definition. +One-to-many mappings +.Po see +.Xr locale 7 +.Pc +will not be performed. +For example, assuming that the character +.Qq eszet +is placed in the collation sequence after +.Qq r +and +.Qq s , +but before +.Qq t , +and that it maps to the sequence +.Qq ss +for collation purposes, then the expression +.Qq [r-s] +matches only +.Qq r +and +.Qq s , +but the expression +.Qq [s-t] +matches +.Qq s , +.Qq eszet , +or +.Qq t . +.Pp +The interpretation of range expressions where the ending range point is also +the starting range point of a subsequent range expression +.Po for instance +.Qq [a-m-o] +.Pc +is undefined. +.Pp +The hyphen character will be treated as itself if it occurs first +.Po after an initial +.Qq ^ , +if any +.Pc +or last in the list, or as an ending range point in a range expression.
+As examples, the expressions +.Qq [-ac] +and +.Qq [ac-] +are equivalent and match any of the characters +.Qq a , +.Qq c , +or +.Qq -; +.Qq [^-ac] +and +.Qq [^ac-] +are equivalent and match any characters except +.Qq a , +.Qq c , +or +.Qq -; +the expression +.Qq [%--] +matches any of the characters between +.Qq % +and +.Qq - +inclusive; the expression +.Qq [--@] +matches any of the characters between +.Qq - +and +.Qq @ +inclusive; and the expression +.Qq [a--@] +is invalid, because the letter +.Qq a +follows the symbol +.Qq - +in the POSIX locale. +To use a hyphen as the starting range point, it must either come first in the +bracket expression or be specified as a collating symbol, for example: +.Qq [][.-.]-0] , +which matches either a right bracket or any character or collating element that +collates between hyphen and 0, inclusive. +.Pp +If a bracket expression must specify both +.Qq - +and +.Qq \&] , +the +.Qq \&] +must be placed first +.Po after the +.Qq ^ , +if any +.Pc +and the +.Qq - +last within the bracket expression. +.El +.Pp +Note: Latin-1 characters such as +.Qq \(ga +or +.Qq ^ +are not printable in some locales, for example, the +.Em ja +locale. +.Ss BREs Matching Multiple Characters +The following rules can be used to construct BREs matching multiple characters +from BREs matching a single character: +.Bl -enum +.It +The concatenation of BREs matches the concatenation of the strings matched +by each component of the BRE. +.It +A +.Em subexpression +can be defined within a BRE by enclosing it between the character pairs +.Qq \e( +and +.Qq \e) . +Such a subexpression matches whatever it would have matched without the +.Qq \e( +and +.Qq \e) , +except that anchoring within subexpressions is optional behavior; see +.Sx BRE Expression Anchoring , +below. +Subexpressions can be arbitrarily nested. 
+.It +The +.Em back-reference +expression +.Qq \e Ns Em n +matches the same +.Pq possibly empty +string of characters as was matched by a subexpression enclosed between +.Qq \e( +and +.Qq \e) +preceding the +.Qq \e Ns Em n . +The character +.Qq Em n +must be a digit from 1 to 9 inclusive, specifying the +.Em n Ns th +subexpression +.Po the one that begins with the +.Em n Ns th +.Qq \e( +and ends with the corresponding paired +.Qq \e) +.Pc . +The expression is invalid if fewer than +.Em n +subexpressions precede the +.Qq \e Ns Em n . +For example, the expression +.Qq ^\e(.*\e)\e1$ +matches a line consisting of two adjacent appearances of the same string, and +the expression +.Qq \e(a\e)*\e1 +fails to match +.Qq a . +The limit of nine back-references to subexpressions in the RE is based on the +use of a single digit identifier. +This does not imply that only nine subexpressions are allowed in REs. +.It +When a BRE matching a single character, a subexpression or a back-reference is +followed by the special character asterisk +.Pq Qq * , +together with that asterisk it matches what zero or more consecutive occurrences +of the BRE would match. +For example, +.Qq [ab]* +and +.Qq [ab][ab] +are equivalent when matching the string +.Qq ab . +.It +When a BRE matching a single character, a subexpression, or a back-reference +is followed by an +.Em interval expression +of the format +.Qq \e{ Ns Em m Ns \e} , +.Qq \e{ Ns Em m Ns ,\e} +or +.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e} , +together with that interval expression it matches what repeated consecutive +occurrences of the BRE would match. +The values of +.Em m +and +.Em n +will be decimal integers in the range 0 <= +.Em m +<= +.Em n +<= +.Dv RE_DUP_MAX , +where +.Em m +specifies the exact or minimum number of occurrences and +.Em n +specifies the maximum number of occurrences.
+The expression +.Qq \e{ Ns Em m Ns \e} +matches exactly +.Em m +occurrences of the preceding BRE, +.Qq \e{ Ns Em m Ns ,\e} +matches at least +.Em m +occurrences and +.Qq \e{ Ns Em m Ns \&, Ns Em n Ns \e} +matches any number of occurrences between +.Em m +and +.Em n , +inclusive. +.Pp +For example, in the string +.Qq abababccccccd , +the BRE +.Qq c\e{3\e} +is matched by characters seven to nine, the BRE +.Qq \e(ab\e)\e{4,\e} +is not matched at all and the BRE +.Qq c\e{1,3\e}d +is matched by characters ten to thirteen. +.El +.Pp +The behavior of multiple adjacent duplication symbols +.Po Qq * +and intervals +.Pc +produces undefined results. +.Ss BRE Precedence +The order of precedence is as shown in the following table: +.Bl -column "BRE Precedence (from high to low)" "" +.It Sy BRE Precedence (from high to low) Ta +.It collation-related bracket symbols Ta [= =] [: :] [. .] +.It escaped characters Ta \e< Ns Em special character Ns > +.It bracket expression Ta [ ] +.It subexpressions/back-references Ta \e( \e) \e Ns Em n +.It single-character-BRE duplication Ta * \e{ Ns Em m Ns \&, Ns Em n Ns \e} +.It concatenation Ta +.It anchoring Ta ^ $ +.El +.Ss BRE Expression Anchoring +A BRE can be limited to matching strings that begin or end a line; this is +called +.Em anchoring . +The circumflex and dollar sign special characters will be considered BRE anchors +in the following contexts: +.Bl -enum +.It +A circumflex +.Pq Qq ^ +is an anchor when used as the first character of an entire BRE. +The implementation may treat circumflex as an anchor when used as the first +character of a subexpression. +The circumflex will anchor the expression to the beginning of a string; +only sequences starting at the first character of a string will be matched by +the BRE. +For example, the BRE +.Qq ^ab +matches +.Qq ab +in the string +.Qq abcdef , +but fails to match in the string +.Qq cdefab . 
+A portable BRE must escape a leading circumflex in a subexpression to match a +literal circumflex. +.It +A dollar sign +.Pq Qq $ +is an anchor when used as the last character of an entire BRE. +The implementation may treat a dollar sign as an anchor when used as the last +character of a subexpression. +The dollar sign will anchor the expression to the end of the string being +matched; the dollar sign can be said to match the end-of-string following the +last character. +.It +A BRE anchored by both +.Qq ^ +and +.Qq $ +matches only an entire string. +For example, the BRE +.Qq ^abcdef$ +matches strings consisting only of +.Qq abcdef . +.It +.Qq ^ +and +.Qq $ +are not special in subexpressions. +.El +.Pp +Note: The Solaris implementation does not support anchoring in BRE +subexpressions. +.Sh EXTENDED REGULAR EXPRESSIONS +The rules specified for BREs apply to Extended Regular Expressions +.Pq EREs +with the following exceptions: +.Bl -bullet +.It +The characters +.Qq | , +.Qq + , +and +.Qq \&? +have special meaning, as defined below. +.It +The +.Qq { +and +.Qq } +characters, when used as the duplication operator, are not preceded by +backslashes. +The constructs +.Qq \e{ +and +.Qq \e} +simply match the characters +.Qq { +and +.Qq } , +respectively. +.It +The back reference operator is not supported. +.It +Anchoring +.Pq Qq ^$ +is supported in subexpressions. +.El +.Ss EREs Matching a Single Character +An ERE ordinary character, a special character preceded by a backslash, or a +period matches a single character. +A bracket expression matches a single character or a single collating element. +An +.Em ERE matching a single character +enclosed in parentheses matches the same as the ERE without parentheses would +have matched. +.Ss ERE Ordinary Characters +An +.Em ordinary character +is an ERE that matches itself. +An ordinary character is any character in the supported character set, except +for the ERE special characters listed in +.Sx ERE Special Characters , +below.
+The interpretation of an ordinary character preceded by a backslash +.Pq Qq \&\e +is undefined. +.Ss ERE Special Characters +An +.Em ERE special character +has special properties in certain contexts. +Outside those contexts, or when preceded by a backslash, such a character is an +ERE that matches the special character itself. +The extended regular expression special characters and the contexts in which +they have their special meaning are: +.Bl -tag -width Ds +.It Sy \&. \&[ \&\e \&( +The period, left-bracket, backslash, and left-parenthesis are special except +when used in a bracket expression +.Po see +.Sx RE Bracket Expression , +above +.Pc . +Outside a bracket expression, a left-parenthesis immediately followed by a +right-parenthesis produces undefined results. +.It Sy \&) +The right-parenthesis is special when matched with a preceding +left-parenthesis, both outside a bracket expression. +.It Sy * + \&? { +The asterisk, plus-sign, question-mark, and left-brace are special except when +used in a bracket expression +.Po see +.Sx RE Bracket Expression , +above +.Pc . +Any of the following uses produce undefined results: +.Bl -bullet +.It +if these characters appear first in an ERE, or immediately following a +vertical-line, circumflex or left-parenthesis +.It +if a left-brace is not part of a valid interval expression. +.El +.It Sy \&| +The vertical-line is special except when used in a bracket expression +.Po see +.Sx RE Bracket Expression , +above +.Pc . +A vertical-line appearing first or last in an ERE, or immediately following a +vertical-line or a left-parenthesis, or immediately preceding a +right-parenthesis, produces undefined results. +.It Sy ^ +The circumflex is special when used: +.Bl -bullet +.It +as an anchor +.Po see +.Sx ERE Expression Anchoring , +below +.Pc . +.It +as the first character of a bracket expression +.Po see +.Sx RE Bracket Expression , +above +.Pc . +.El +.It Sy $ +The dollar sign is special when used as an anchor. 
+.El +.Ss Periods in EREs +A period +.Pq Qq \&. , +when used outside a bracket expression, is an ERE that matches any character in +the supported character set except NUL. +.Ss ERE Bracket Expression +The rules for ERE Bracket Expressions are the same as for Basic Regular +Expressions; see +.Sx RE Bracket Expression , +above. +.Ss EREs Matching Multiple Characters +The following rules will be used to construct EREs matching multiple characters +from EREs matching a single character: +.Bl -enum +.It +A +.Em concatenation of EREs +matches the concatenation of the character sequences matched by each component +of the ERE. +A concatenation of EREs enclosed in parentheses matches whatever the +concatenation without the parentheses matches. +For example, both the ERE +.Qq cd +and the ERE +.Qq (cd) +are matched by the third and fourth character of the string +.Qq abcdefabcdef . +.It +When an ERE matching a single character or an ERE enclosed in parentheses is +followed by the special character plus-sign +.Pq Qq + , +together with that plus-sign it matches what one or more consecutive occurrences +of the ERE would match. +For example, the ERE +.Qq b+(bc) +matches the fourth to seventh characters in the string +.Qq acabbbcde ; +.Qq [ab]+ +and +.Qq [ab][ab]* +are equivalent. +.It +When an ERE matching a single character or an ERE enclosed in parentheses is +followed by the special character asterisk +.Pq Qq * , +together with that asterisk it matches what zero or more consecutive occurrences +of the ERE would match. +For example, the ERE +.Qq b*c +matches the first character in the string +.Qq cabbbcde , +and the ERE +.Qq b*cd +matches the third to seventh characters in the string +.Qq cabbbcdebbbbbbcdbc . +And, +.Qq [ab]* +and +.Qq [ab][ab] +are equivalent when matching the string +.Qq ab . +.It +When an ERE matching a single character or an ERE enclosed in parentheses is +followed by the special character question-mark +.Pq Qq \&? 
, +together with that question-mark it matches what zero or one consecutive +occurrences of the ERE would match. +For example, the ERE +.Qq b?c +matches the second character in the string +.Qq acabbbcde . +.It +When an ERE matching a single character or an ERE enclosed in parentheses is +followed by an +.Em interval expression +of the format +.Qq { Ns Em m Ns } , +.Qq { Ns Em m Ns ,} +or +.Qq { Ns Em m Ns \&, Ns Em n Ns } , +together with that interval expression it matches what repeated consecutive +occurrences of the ERE would match. +The values of +.Em m +and +.Em n +will be decimal integers in the range 0 <= +.Em m +<= +.Em n +<= +.Dv RE_DUP_MAX , +where +.Em m +specifies the exact or minimum number of occurrences and +.Em n +specifies the maximum number of occurrences. +The expression +.Qq { Ns Em m Ns } +matches exactly +.Em m +occurrences of the preceding ERE, +.Qq { Ns Em m Ns ,} +matches at least +.Em m +occurrences and +.Qq { Ns Em m Ns \&, Ns Em n Ns } +matches any number of occurrences between +.Em m +and +.Em n , +inclusive. +.El +.Pp +For example, in the string +.Qq abababccccccd , +the ERE +.Qq c{3} +is matched by characters seven to nine and the ERE +.Qq (ab){2,} +is matched by characters one to six. +.Pp +The behavior of multiple adjacent duplication symbols +.Po +.Qq + , +.Qq * , +.Qq \&? +and intervals +.Pc +produces undefined results. +.Ss ERE Alternation +Two EREs separated by the special character vertical-line +.Pq Qq | +match a string that is matched by either. +For example, the ERE +.Qq a((bc)|d) +matches the string +.Qq abc +and the string +.Qq ad . +Single characters, or expressions matching single characters, separated by the +vertical bar and enclosed in parentheses, will be treated as an ERE matching a +single character.
+.Ss ERE Precedence +The order of precedence will be as shown in the following table: +.Bl -column "ERE Precedence (from high to low)" "" +.It Sy ERE Precedence (from high to low) Ta +.It collation-related bracket symbols Ta [= =] [: :] [. .] +.It escaped characters Ta \e< Ns Em special character Ns > +.It bracket expression Ta \&[ \&] +.It grouping Ta \&( \&) +.It single-character-ERE duplication Ta * + \&? { Ns Em m Ns \&, Ns Em n Ns } +.It concatenation Ta +.It anchoring Ta ^ $ +.It alternation Ta | +.El +.Pp +For example, the ERE +.Qq abba|cde +matches either the string +.Qq abba +or the string +.Qq cde +.Po rather than the string +.Qq abbade +or +.Qq abbcde , +because concatenation has a higher order of precedence than alternation +.Pc . +.Ss ERE Expression Anchoring +An ERE can be limited to matching strings that begin or end a line; this is +called +.Em anchoring . +The circumflex and dollar sign special characters are considered ERE anchors +when used anywhere outside a bracket expression. +This has the following effects: +.Bl -enum +.It +A circumflex +.Pq Qq ^ +outside a bracket expression anchors the expression or subexpression it begins +to the beginning of a string; such an expression or subexpression can match only +a sequence starting at the first character of a string. +For example, the EREs +.Qq ^ab +and +.Qq (^ab) +match +.Qq ab +in the string +.Qq abcdef , +but fail to match in the string +.Qq cdefab , +and the ERE +.Qq a^b +is valid, but can never match because the +.Qq a +prevents the expression +.Qq ^b +from matching starting at the first character. +.It +A dollar sign +.Pq Qq $ +outside a bracket expression anchors the expression or subexpression it ends to +the end of a string; such an expression or subexpression can match only a +sequence ending at the last character of a string.
+For example, the EREs +.Qq ef$ +and +.Qq (ef$) +match +.Qq ef +in the string +.Qq abcdef , +but fail to match in the string +.Qq cdefab , +and the ERE +.Qq e$f +is valid, but can never match because the +.Qq f +prevents the expression +.Qq e$ +from matching ending at the last character. +.El +.Sh SEE ALSO +.Xr localedef 1 , +.Xr regcomp 3C , +.Xr attributes 7 , +.Xr environ 7 , +.Xr locale 7 , +.Xr regexp 7
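The character positions quoted in the worked examples of the manual page can be spot-checked with a short script. The following is only a sketch in Python, not the regcomp(3C)/regexec(3C) interface that the page documents. Python's re module is a backtracking engine with its own syntax; it does not implement POSIX leftmost-longest matching, collating symbols, or equivalence classes, and its "$" also matches before a trailing newline. For the simple patterns below the results happen to coincide with the page. The helpers bre_to_python and match_span are hypothetical names introduced here for illustration.

```python
import re


def bre_to_python(bre):
    # Hypothetical helper: rewrite a simple BRE in Python `re` syntax.
    # Backslash-escaped ( ) { } become bare operators, and bare
    # ( ) { } + ? | (ordinary characters in a BRE) become escaped literals.
    # Bracket expressions and the leading-asterisk rule are NOT handled.
    out = []
    i = 0
    while i < len(bre):
        ch = bre[i]
        if ch == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            out.append(nxt if nxt in "(){}" else ch + nxt)
            i += 2
        else:
            out.append("\\" + ch if ch in "(){}+?|" else ch)
            i += 1
    return "".join(out)


def match_span(pattern, text):
    # Return 1-based (first, last) positions of the leftmost match, or None.
    m = re.search(pattern, text)
    return None if m is None else (m.start() + 1, m.end())


if __name__ == "__main__":
    s = "abababccccccd"

    # ERE duplication examples from the page.
    print(match_span(r"b+(bc)", "acabbbcde"))         # (4, 7)
    print(match_span(r"b*cd", "cabbbcdebbbbbbcdbc"))  # (3, 7)
    print(match_span(r"b?c", "acabbbcde"))            # (2, 2)
    print(match_span(r"(ab){2,}", s))                 # (1, 6)

    # BRE interval and back-reference examples, translated first.
    print(match_span(bre_to_python(r"c\{3\}"), s))         # (7, 9)
    print(match_span(bre_to_python(r"c\{1,3\}d"), s))      # (10, 13)
    print(match_span(bre_to_python(r"\(ab\)\{4,\}"), s))   # None
    print(match_span(bre_to_python(r"\(a\)*\1"), "a"))     # None

    # Precedence: concatenation binds tighter than alternation.
    print(re.fullmatch(r"abba|cde", "abbade"))  # None

    # Anchors in the middle of an ERE are valid but can never match.
    print(match_span(r"a^b", "ab"))  # None
    print(match_span(r"e$f", "ef"))  # None
```

The translation step is deliberately narrow: it only swaps the escaping of grouping and interval delimiters, which is the main syntactic difference between the BRE and ERE examples quoted above. A faithful check of locale-dependent features such as "[[.ch.]]" or "[[=a=]]" would need the C interfaces themselves.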