From 04b08da9af0c450d645ab7389d1467308cfc2db8 Mon Sep 17 00:00:00 2001
From: Michael Stapelberg
Each code point is distinct; for instance, upper and lower case letters
@@ -99,6 +99,11 @@ are different characters.
Implementation restriction: For compatibility with other tools, a
compiler may disallow the NUL character (U+0000) in the source text.
+
+Implementation restriction: For compatibility with other tools, a
+compiler may ignore a UTF-8-encoded byte order mark
+(U+FEFF) if it is the first Unicode code point in the source text.
+
-In The Unicode Standard 6.0,
+In The Unicode Standard 6.2,
Section 4.5 "General Category" defines a set of character categories.
Go treats those characters in category Lu, Ll, Lt, Lm, or Lo as Unicode letters,
@@ -198,7 +203,7 @@ token is
integer, floating-point, imaginary,
- character, or
+ rune, or
string literal
@@ -360,13 +365,15 @@ imaginary_lit = (decimals | float_lit) "i" .
-
-A character literal represents a character constant,
-typically a Unicode code point, as one or more characters enclosed in single
-quotes. Within the quotes, any character may appear except single
-quote and newline. A single quoted character represents itself,
+A rune literal represents a rune constant,
+an integer value identifying a Unicode code point.
+A rune literal is expressed as one or more characters enclosed in single quotes.
+Within the quotes, any character may appear except single
+quote and newline. A single quoted character represents the Unicode value
+of the character itself,
while multi-character sequences beginning with a backslash encode
values in various formats.
@@ -380,8 +387,8 @@ a literal a, Unicode U+0061, value 0x61, while
a literal a-dieresis, U+00E4, value 0xe4.
-Several backslash escapes allow arbitrary values to be represented
-as ASCII text. There are four ways to represent the integer value
+Several backslash escapes allow arbitrary values to be encoded as
+ASCII text. There are four ways to represent the integer value
as a numeric constant: \x
followed by exactly two hexadecimal
digits; \u
followed by exactly four hexadecimal digits;
\U
followed by exactly eight hexadecimal digits, and a
@@ -409,14 +416,14 @@ After a backslash, certain single-character escapes represent special values:
\t U+0009 horizontal tab
\v U+000b vertical tab
\\ U+005c backslash
-\' U+0027 single quote (valid escape only within character literals)
+\' U+0027 single quote (valid escape only within rune literals)
\" U+0022 double quote (valid escape only within string literals)
-All other sequences starting with a backslash are illegal inside character literals.
+All other sequences starting with a backslash are illegal inside rune literals.
-char_lit = "'" ( unicode_value | byte_value ) "'" .
+rune_lit = "'" ( unicode_value | byte_value ) "'" .
unicode_value = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
@@ -439,6 +446,11 @@ escaped_char = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `
'\xff'
'\u12e4'
'\U00101234'
+'aa' // illegal: too many characters
+'\xa' // illegal: too few hexadecimal digits
+'\0' // illegal: too few octal digits
+'\uDFFF' // illegal: surrogate half
+'\U00110000' // illegal: invalid Unicode code point
@@ -453,7 +465,8 @@ raw string literals and interpreted string literals.
Raw string literals are character sequences between back quotes
``
. Within the quotes, any character is legal except
back quote. The value of a raw string literal is the
-string composed of the uninterpreted characters between the quotes;
+string composed of the uninterpreted (implicitly UTF-8-encoded) characters
+between the quotes;
in particular, backslashes have no special meaning and the string may
contain newlines.
Carriage returns inside raw string literals
@@ -464,8 +477,9 @@ Interpreted string literals are character sequences between double
quotes ""
. The text between the quotes,
which may not contain newlines, forms the
value of the literal, with backslash escapes interpreted as they
-are in character literals (except that \'
is illegal and
-\"
is legal). The three-digit octal (\
nnn)
+are in rune literals (except that \'
is illegal and
+\"
is legal), with the same restrictions.
+The three-digit octal (\
nnn)
and two-digit hexadecimal (\x
nn) escapes represent individual
bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
@@ -492,6 +506,8 @@ interpreted_string_lit = `"` { unicode_value | byte_value } `"` .
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
+"\uD800" // illegal: surrogate half
+"\U00110000" // illegal: invalid Unicode code point
@@ -501,15 +517,15 @@ These examples all represent the same string:
"日本語" // UTF-8 input text
`日本語` // UTF-8 input text as a raw literal
-"\u65e5\u672c\u8a9e" // The explicit Unicode code points
-"\U000065e5\U0000672c\U00008a9e" // The explicit Unicode code points
-"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // The explicit UTF-8 bytes
+"\u65e5\u672c\u8a9e" // the explicit Unicode code points
+"\U000065e5\U0000672c\U00008a9e" // the explicit Unicode code points
+"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // the explicit UTF-8 bytes
If the source code represents a character as two code points, such as
a combining form involving an accent and a letter, the result will be
-an error if placed in a character literal (it is not a single code
+an error if placed in a rune literal (it is not a single code
point), and will appear as two code points if placed in a string
literal.
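As a quick check of the string examples in this hunk, the following sketch compares the different spellings at run time. The helper names sameString and codePoints are made up for illustration and are not part of the patch.

```go
package main

import "fmt"

// sameString reports whether the spellings from the spec's example
// all denote the same string value.
func sameString() bool {
	utf8Text := "日本語"
	raw := `日本語`
	escaped := "\u65e5\u672c\u8a9e"
	rawBytes := "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"
	return utf8Text == raw && raw == escaped && escaped == rawBytes
}

// codePoints counts the Unicode code points in s.
func codePoints(s string) int {
	return len([]rune(s))
}

func main() {
	fmt.Println(sameString())          // true
	fmt.Println(codePoints("e\u0301")) // 2: a letter plus a combining acute accent
	fmt.Println(len("\xff\u00FF"))     // 3: one raw byte plus a two-byte UTF-8 encoding
}
```

The combining-accent case is why such a sequence cannot be written as a single rune literal: it is two code points, not one.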
@@ -518,7 +534,7 @@ literal.
There are boolean constants,
-character constants,
+rune constants,
integer constants,
floating-point constants, complex constants,
and string constants. Character, integer, floating-point,
@@ -528,7 +544,7 @@ collectively called numeric constants.
A constant value is represented by a
-character,
+rune,
integer,
floating-point,
imaginary,
@@ -622,14 +638,15 @@ expressions.
A type determines the set of values and operations specific to values of that
-type. A type may be specified by a (possibly qualified) type name
-(§Qualified identifier, §Type declarations) or a type literal,
+type. A type may be specified by a
+(possibly qualified) type name
+(§Type declarations) or a type literal,
which composes a new type from previously declared types.
Type = TypeName | TypeLit | "(" Type ")" .
-TypeName = QualifiedIdent .
+TypeName = identifier | QualifiedIdent .
TypeLit = ArrayType | StructType | PointerType | FunctionType | InterfaceType |
SliceType | MapType | ChannelType .
@@ -646,7 +663,7 @@ type literals.
The static type (or just type) of a variable is the
type defined by its declaration. Variables of interface type
also have a distinct dynamic type, which
-is the actual type of the value stored in the variable at run-time.
+is the actual type of the value stored in the variable at run time.
The dynamic type may vary
during execution but is always assignable
to the static type of the interface variable. For non-interface
@@ -764,19 +781,21 @@ particular architecture.
A string type represents the set of string values.
-Strings behave like slices of bytes but are immutable: once created,
+A string value is a (possibly empty) sequence of bytes.
+Strings are immutable: once created,
it is impossible to change the contents of a string.
The predeclared string type is string
.
+
-The elements of strings have type byte
and may be
-accessed using the usual indexing operations. It is
-illegal to take the address of such an element; if
-s[i]
is the ith byte of a
-string, &s[i]
is invalid. The length of string
-s
can be discovered using the built-in function
-len
. The length is a compile-time constant if s
-is a string literal.
+The length of a string s
(its size in bytes) can be discovered using
+the built-in function len
.
+The length is a compile-time constant if the string is a constant.
+A string's bytes can be accessed by integer indices
+0 through len(s)-1
.
+It is illegal to take the address of such an element; if
+s[i]
is the i
'th byte of a
+string, &s[i]
is invalid.
-The length is part of the array's type and must be a
-constant expression that evaluates to a non-negative
-integer value. The length of array a
can be discovered
-using the built-in function len(a)
.
-The elements can be indexed by integer
-indices 0 through len(a)-1
(§Indexes).
+The length is part of the array's type; it must evaluate to a non-
+negative constant representable by a value
+of type int
.
+The length of array a
can be discovered
+using the built-in function len
.
+The elements can be addressed by integer indices
+0 through len(a)-1
.
Array types are always one-dimensional but may be composed to form
multi-dimensional types.
Like arrays, slices are indexable and have a length. The length of a
slice s
can be discovered by the built-in function
-len(s)
; unlike with arrays it may change during
-execution. The elements can be addressed by integer indices 0
-through len(s)-1
(§Indexes). The slice index of a
+len
; unlike with arrays it may change during
+execution. The elements can be addressed by integer indices
+0 through len(s)-1
. The slice index of a
given element may be less than the index of the same element in the
underlying array.
T
. The method set of *S
also
includes promoted methods with receiver *T
.
-
+
S
contains an anonymous field *T
,
the method sets of S
and *S
both
@@ -994,7 +1014,7 @@ promoted methods are included in the method set of the struct as follows:
A field declaration may be followed by an optional string literal tag,
which becomes an attribute for all the fields in the corresponding
field declaration. The tags are made
-visible through a reflection interface
+visible through a reflection interface
but are otherwise ignored.
@@ -1046,8 +1066,11 @@ ParameterDecl = [ IdentifierList ] [ "..." ] Type .
Within a list of parameters or results, the names (IdentifierList)
must either all be present or all be absent. If present, each name
-stands for one item (parameter or result) of the specified type; if absent, each
-type stands for one item of that type. Parameter and result
+stands for one item (parameter or result) of the specified type and
+all non-blank names in the signature
+must be unique.
+If absent, each type stands for one item of that type.
+Parameter and result
lists are always parenthesized except that if there is exactly
one unnamed result it may be written as an unparenthesized type.
@@ -1232,10 +1255,10 @@ map[string]interface{}
The number of map elements is called its length.
For a map m
, it can be discovered using the
-built-in function len(m)
+built-in function len
and may change during execution. Elements may be added during execution
using assignments and retrieved with
-index expressions; they may be removed with the
+index expressions; they may be removed with the
delete
built-in function.
@@ -1510,11 +1533,11 @@ Go is lexically scoped using blocks:
or function (but not method) declared at top level (outside any
function) is the package block.
Unlike regular variable declarations, a short variable declaration may redeclare variables provided they
-were originally declared in the same block with the same type, and at
+were originally declared earlier in the same block with the same type, and at
least one of the non-blank variables is new. As a consequence, redeclaration
can only appear in a multi-variable short declaration.
Redeclaration does not introduce a new
@@ -1907,6 +1930,7 @@ variable; it just assigns a new value to the original.
field1, offset := nextField(str, 0)
field2, offset := nextField(str, offset) // redeclares offset
+a, a := 1, 2 // illegal: double declaration of a or no new variable if a was declared elsewhere
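A runnable version of the redeclaration example might look like the sketch below. The spec does not define nextField; the comma-splitting body given here is an assumption made only so the example compiles.

```go
package main

import "fmt"

// nextField is a hypothetical helper: it returns the field that starts at
// offset and the offset just past the comma that ends it.
func nextField(s string, offset int) (string, int) {
	for i := offset; i < len(s); i++ {
		if s[i] == ',' {
			return s[offset:i], i + 1
		}
	}
	return s[offset:], len(s)
}

// fields shows a legal redeclaration: offset is reused, field2 is new.
func fields() (string, string, int) {
	str := "a,b,c"
	field1, offset := nextField(str, 0)
	field2, offset := nextField(str, offset) // redeclares offset
	return field1, field2, offset
}

func main() {
	fmt.Println(fields()) // a b 4
}
```

The second short declaration is accepted only because field2 is new; offset keeps its original variable and simply receives a new value.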
@@ -1969,8 +1993,15 @@ is visible only within selectors for that type.
-For a base type, the non-blank names of
-methods bound to it must be unique.
+A non-blank receiver identifier must be
+unique in the method signature.
+If the receiver's value is not referenced inside the body of the method,
+its identifier may be omitted in the declaration. The same applies in
+general to parameters of functions and methods.
+
+
+
+For a base type, the non-blank names of methods bound to it must be unique.
If the base type is a struct type,
the non-blank method and field names must be distinct.
@@ -1996,12 +2027,6 @@ with receiver type *Point,
to the base type Point.
-
-If the receiver's value is not referenced inside the body of the method,
-its identifier may be omitted in the declaration. The same applies in
-general to parameters of functions and methods.
-
-
The type of a method is the type of a function with the receiver as first
argument. For instance, the method Scale
has type
@@ -2026,25 +2051,33 @@ operators and functions to operands.
-Operands denote the elementary values in an expression.
+Operands denote the elementary values in an expression. An operand may be a
+literal, a (possibly qualified) identifier
+denoting a
+constant,
+variable, or
+function,
+a method expression yielding a function,
+or a parenthesized expression.
-Operand = Literal | QualifiedIdent | MethodExpr | "(" Expression ")" .
+Operand = Literal | OperandName | MethodExpr | "(" Expression ")" .
Literal = BasicLit | CompositeLit | FunctionLit .
-BasicLit = int_lit | float_lit | imaginary_lit | char_lit | string_lit .
+BasicLit = int_lit | float_lit | imaginary_lit | rune_lit | string_lit .
+OperandName = identifier | QualifiedIdent.
-
-A qualified identifier is a non-blank identifier
-qualified by a package name prefix.
+A qualified identifier is an identifier qualified with a package name prefix.
+Both the package name and the identifier must not be
+blank.
-QualifiedIdent = [ PackageName "." ] identifier .
+QualifiedIdent = PackageName "." identifier .
@@ -2089,7 +2122,7 @@ The types of the expressions must be assignable
to the respective field, element, and key types of the LiteralType;
there is no additional conversion.
The key is interpreted as a field name for struct literals,
-an index expression for array and slice literals, and a key for map literals.
+an index for array and slice literals, and a key for map literals.
For map literals, all elements must have a key. It is an error
to specify multiple elements with the same field name or
constant key value.
@@ -2101,18 +2134,18 @@ For struct literals the following rules apply:
T
,
elements that are themselves composite literals may elide the respective
literal type if it is identical to the element type of T
.
Similarly, elements that are addresses of composite literals may elide
-the &T
when the the element type is *T
.
+the &T
when the element type is *T
.
@@ -2315,7 +2348,6 @@ Point{1, 2}
m["foo"]
s[i : j + 1]
obj.color
-math.Sin
f.p[i].x()
@@ -2323,7 +2355,9 @@ f.p[i].x()
-A primary expression of the form
+For a primary expression x
+that is not a package name, the
+selector expression
@@ -2331,17 +2365,20 @@ x.f
-denotes the field or method f
of the value denoted by x
-(or sometimes *x
; see below). The identifier f
-is called the (field or method)
-selector; it must not be the blank identifier.
-The type of the expression is the type of f
.
+denotes the field or method f
of the value x
+(or sometimes *x
; see below).
+The identifier f
is called the (field or method) selector;
+it must not be the blank identifier.
+The type of the selector expression is the type of f
.
+If x
is a package name, see the section on
+qualified identifiers.
A selector f
may denote a field or method f
of
a type T
, or it may refer
-to a field or method f
of a nested anonymous field of
-T
.
+to a field or method f
of a nested
+anonymous field of T
.
The number of anonymous fields traversed
to reach f
is called its depth in T
.
The depth of a field or method f
@@ -2350,9 +2387,11 @@ The depth of a field or method f
declared in
an anonymous field A
in T
is the
depth of f
in A
plus one.
The following rules apply to selectors:
+x
of type T
or *T
@@ -2364,18 +2403,26 @@ If there is not exactly one f
with shallowest depth, the selector expression is illegal.
x
of type I
-where I
is an interface type,
-x.f
denotes the actual method with name f
of the value assigned
-to x
if there is such a method.
-If no value or nil
was assigned to x
, x.f
is illegal.
+For a variable x
of type I
where I
+is an interface type, x.f
denotes the actual method with name
+f
of the value assigned to x
.
+If there is no method with name f
in the
+method set of I
, the selector
+expression is illegal.
x.f
is illegal.
x
is of pointer or interface type and has the value
+nil
, assigning to, evaluating, or calling x.f
+causes a run-time panic.
+
-Selectors automatically dereference pointers to structs.
+Selectors automatically dereference
+pointers to structs.
If x
is a pointer to a struct, x.y
is shorthand for (*x).y
; if the field y
is also a pointer to a struct, x.y.z
is shorthand
@@ -2384,6 +2431,7 @@ If x
contains an anonymous field of type *A
,
where A
is also a struct type,
x.f
is a shortcut for (*x.A).f
.
For example, given the declarations:
@@ -2421,9 +2469,9 @@ p.z // (*p).z p.y // ((*p).T1).y p.x // (*(*p).T0).x -p.M2 // (*p).M2 -p.M1 // ((*p).T1).M1 -p.M0 // ((*p).T0).M0 +p.M2() // (*p).M2() +p.M1() // ((*p).T1).M1() +p.M0() // ((*p).T0).M0() @@ -2434,7 +2482,7 @@ TODO: Specify what happens to receivers. --> -
A primary expression of the form
@@ -2451,17 +2499,36 @@ The value x
is called the
rules apply:
+If a
is not a map:
+
x
must be an integer value; it is in range if 0 <= x < len(a)
,
+ otherwise it is out of range
int
+
For a
of type A
or *A
-where A
is an array type,
-or for a
of type S
where S
is a slice type:
+where A
is an array type:
x
must be an integer value and 0 <= x < len(a)
a
is nil
or if x
is out of range at run time,
+ a run-time panic occurs
a[x]
is the array element at index x
and the type of
- a[x]
is the element type of A
a
is nil
or if the index x
is out of range,
- a run-time panic occurs
a[x]
is the element type of A
+For a
of type S
where S
is a slice type:
+
nil
or if x
is out of range at run time,
+ a run-time panic occurs
a[x]
is the slice element at index x
and the type of
+ a[x]
is the element type of S
@@ -2469,12 +2536,13 @@ For a
of type T
where T
is a string type:
x
must be an integer value and 0 <= x < len(a)
a
is also constant
x
is out of range at run time,
+ a run-time panic occurs
a[x]
is the byte at index x
and the type of
- a[x]
is byte
a[x]
is byte
a[x]
may not be assigned to
x
is out of range,
- a run-time panic occurs
@@ -2483,14 +2551,14 @@ where M
is a map type:
x
's type must be
- assignable
- to the key type of M
M
x
,
- a[x]
is the map value with key x
- and the type of a[x]
is the value type of M
a[x]
is the map value with key x
+ and the type of a[x]
is the value type of M
nil
or does not contain such an entry,
- a[x]
is the zero value
- for the value type of M
a[x]
is the zero value
+ for the value type of M
@@ -2533,9 +2601,9 @@ a[low : high]
-constructs a substring or slice. The index expressions low
and
+constructs a substring or slice. The indices low
and
high
select which elements appear in the result. The result has
-indexes starting at 0 and length equal to
+indices starting at 0 and length equal to
high
- low
.
After slicing the array a
-For convenience, any of the index expressions may be omitted. A missing low
+For convenience, any of the indices may be omitted. A missing low
index defaults to zero; a missing high
index defaults to the length of the
sliced operand:
-For arrays or strings, the indexes low
and high
must
-satisfy 0 <= low
<= high
<= length; for
-slices, the upper bound is the capacity rather than the length.
+For arrays or strings, the indices low
and high
are
+in range if 0
<= low
<= high
<= len(a)
,
+otherwise they are out of range.
+For slices, the upper index bound is the slice capacity cap(a)
rather than the length.
+A constant index must be non-negative and representable by a value of type
+int
.
+If both indices
+are constant, they must satisfy low <= high
. If a
is nil
+or if the indices are out of range at run time, a run-time panic occurs.
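The run-time side of the new slice rules can be observed with recover. The sketch below is not part of the patch, and sliceOrRecover is a made-up helper name. It shows that the upper bound for a slice is its capacity and that out-of-range indices panic at run time.

```go
package main

import "fmt"

// sliceOrRecover evaluates a[low:high] and reports whether doing so
// caused a run-time panic.
func sliceOrRecover(a []int, low, high int) (result []int, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return a[low:high], false
}

func main() {
	a := make([]int, 3, 5) // len(a) == 3, cap(a) == 5

	s, p := sliceOrRecover(a, 1, 5) // high may reach cap(a), not only len(a)
	fmt.Println(len(s), p)          // 4 false

	_, p = sliceOrRecover(a, 1, 6) // beyond the capacity
	fmt.Println(p)                 // true

	_, p = sliceOrRecover(a, 2, 1) // low > high, detected at run time
	fmt.Println(p)                 // true
}
```

Because low and high are variables here, the compiler cannot reject the bad cases; with constant indices that violate low <= high, the same expressions would fail to compile instead.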
@@ -2601,19 +2675,33 @@ The notation x.(T)
is called a type assertion.
More precisely, if T
is not an interface type, x.(T)
asserts
that the dynamic type of x
is identical
to the type T
.
+In this case, T
must implement the (interface) type of x
;
+otherwise the type assertion is invalid since it is not possible for x
+to store a value of type T
.
If T
is an interface type, x.(T)
asserts that the dynamic type
-of x
implements the interface T
(§Interface types).
+of x
implements the interface T
.
If the type assertion holds, the value of the expression is the value
stored in x
and its type is T
. If the type assertion is false,
a run-time panic occurs.
In other words, even though the dynamic type of x
-is known only at run-time, the type of x.(T)
is
+is known only at run time, the type of x.(T)
is
known to be T
in a correct program.
+var x interface{} = 7 // x has dynamic type int and value 7
+i := x.(int) // i has type int and value 7
+
+type I interface { m() }
+var y I
+s := y.(string) // illegal: string does not implement I (missing method m)
+r := y.(io.Reader) // r has type io.Reader and y must implement both I and io.Reader
+
+
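To complement the example added above, here is a small sketch contrasting the single-result form, which panics when the assertion is false, with the two-result form, which does not. The helper names asInt and assertStringPanics are illustrative only.

```go
package main

import "fmt"

// asInt uses the two-result form, which reports failure instead of panicking.
func asInt(x interface{}) (int, bool) {
	i, ok := x.(int)
	return i, ok
}

// assertStringPanics uses the single-result form and reports whether the
// assertion caused a run-time panic.
func assertStringPanics(x interface{}) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = x.(string)
	return false
}

func main() {
	var x interface{} = 7
	fmt.Println(asInt(x))              // 7 true
	fmt.Println(asInt("seven"))        // 0 false
	fmt.Println(assertStringPanics(x)) // true
}
```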
-If a type assertion is used in an assignment or initialization of the form
+If a type assertion is used in an assignment or initialization of the form
@@ -2629,7 +2717,7 @@ otherwise, the expression returns (Z, false) where Z
is the zero value for type T. No run-time panic occurs in this case.
The type assertion in this construct thus acts like a function call
-returning a value and a boolean indicating success. (§Assignments)
+returning a value and a boolean indicating success.
@@ -2677,13 +2765,14 @@ causes a run-time panic.
-As a special case, if the return parameters of a function or method
+As a special case, if the return values of a function or method
g are equal in number and individually
assignable to the parameters of another function or method
f, then the call f(g(parameters_of_g))
will invoke f after binding the return values of
g to the parameters of f in order. The call
-of f must contain no parameters other than the call of g.
+of f must contain no parameters other than the call of g,
+and g must have at least one return value.
If f has a final ... parameter, it is
assigned the return values of g that remain after
assignment of regular parameters.
@@ -2834,8 +2923,8 @@ As a consequence, statement *p++ is the same as (*p)++
There are five precedence levels for binary operators.
Multiplication operators bind strongest, followed by addition
-operators, comparison operators, && (logical and),
-and finally || (logical or):
+operators, comparison operators, && (logical AND),
+and finally || (logical OR):
@@ -2878,10 +2967,10 @@ to strings. All other arithmetic operators apply to integers only.
/    quotient               integers, floats, complex values
%    remainder              integers
-&    bitwise and            integers
-|    bitwise or             integers
-^    bitwise xor            integers
-&^   bit clear (and not)    integers
+&    bitwise AND            integers
+|    bitwise OR             integers
+^    bitwise XOR            integers
+&^   bit clear (AND NOT)    integers
<<   left shift             integer << unsigned integer
>>   right shift            integer >> unsigned integer
@@ -2938,10 +3027,11 @@ int64 -9223372036854775808
-If the divisor is zero, a run-time panic occurs.
-If the dividend is positive and the divisor is a constant power of 2,
+If the divisor is a constant, it must not be zero.
+If the divisor is zero at run time, a run-time panic occurs.
+If the dividend is non-negative and the divisor is a constant power of 2,
the division may be replaced by a right shift, and computing the remainder may
-be replaced by a bitwise "and" operation:
+be replaced by a bitwise AND operation:
@@ -2976,10 +3066,10 @@ follows:
-For floating-point numbers,
+For floating-point and complex numbers,
 +x is the same as x,
 while -x is the negation of x.
-The result of a floating-point division by zero is not specified beyond the
+The result of a floating-point or complex division by zero is not specified beyond the
IEEE-754 standard; whether a run-time panic occurs is implementation-specific.
@@ -3142,9 +3232,9 @@ The right operand is evaluated conditionally.
-&&   conditional and   p && q is "if p then q else false"
-||   conditional or    p || q is "if p then true else q"
-!    not               !p is "not p"
+&&   conditional AND   p && q is "if p then q else false"
+||   conditional OR    p || q is "if p then true else q"
+!    NOT               !p is "not p"
@@ -3158,6 +3248,7 @@ that is, either a variable, pointer indirection, or slice indexing
operation; or a field selector of an addressable struct operand;
or an array indexing operation of an addressable array.
As an exception to the addressability requirement, x may also be a
+(possibly parenthesized)
composite literal.
@@ -3171,6 +3262,7 @@ will cause a run-time panic.
&x
&a[f(2)]
+&Point{2, 3}
*p
*pf(x)
@@ -3181,9 +3273,13 @@ will cause a run-time panic.
For an operand ch of channel type,
the value of the receive operation <-ch is the value received
-from the channel ch. The type of the value is the element type of
-the channel. The expression blocks until a value is available.
+from the channel ch. The channel direction must permit receive operations,
+and the type of the receive operation is the element type of the channel.
+The expression blocks until a value is available.
Receiving from a nil channel blocks forever.
+Receiving from a closed channel always succeeds,
+immediately returning the element type's zero
+value.
@@ -3204,11 +3300,11 @@ var x, ok = <-ch
-yields an additional result.
-The boolean variable ok indicates whether
-the received value was sent on the channel (true)
-or is a zero value returned
-because the channel is closed and empty (false).
+yields an additional result of type bool reporting whether the
+communication succeeded. The value of ok is true
+if the value received was delivered by a successful send operation to the
+channel, or false if it is a zero value generated because the
+channel is closed and empty.
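The difference between a zero value that was sent and a zero value produced by a closed channel can be seen in a short sketch. It is not part of the patch, and the helper drain is a made-up name.

```go
package main

import "fmt"

// drain receives from ch until it is closed and empty, returning only the
// values that were delivered by a send operation.
func drain(ch <-chan int) []int {
	var got []int
	for {
		v, ok := <-ch
		if !ok {
			return got // channel closed and empty; v is the zero value
		}
		got = append(got, v)
	}
}

func main() {
	ch := make(chan int, 2)
	ch <- 0 // a genuinely sent zero
	ch <- 7
	close(ch)

	fmt.Println(drain(ch)) // [0 7]

	v, ok := <-ch      // receiving from a closed channel succeeds immediately
	fmt.Println(v, ok) // 0 false
}
```

Without the second result, the sent 0 and the zero value generated by the closed channel would be indistinguishable.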