From 04b08da9af0c450d645ab7389d1467308cfc2db8 Mon Sep 17 00:00:00 2001
From: Michael Stapelberg
Date: Mon, 4 Mar 2013 21:27:36 +0100
Subject: Imported Upstream version 1.1~hg20130304

---
 doc/go_spec.html | 751 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 478 insertions(+), 273 deletions(-)

diff --git a/doc/go_spec.html b/doc/go_spec.html
index 90acc1704..0cb9f54b1 100644
--- a/doc/go_spec.html
+++ b/doc/go_spec.html
@@ -1,6 +1,6 @@
@@ -15,7 +15,6 @@ TODO
 [ ] need explicit language about the result type of operations
 [ ] should probably write something about evaluation order of statements even though obvious
-[ ] review language on implicit dereferencing
 -->
@@ -89,7 +88,8 @@ Source code is Unicode text encoded in
 canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two code points. For simplicity, this document
-will use the term character to refer to a Unicode code point.
+will use the unqualified term character to refer to a Unicode code point
+in the source text.

 Each code point is distinct; for instance, upper and lower case letters
@@ -99,6 +99,11 @@ are different characters.
 Implementation restriction: For compatibility with other tools, a compiler may disallow the NUL character (U+0000) in the source text.

+
+Implementation restriction: For compatibility with other tools, a
+compiler may ignore a UTF-8-encoded byte order mark
+(U+FEFF) if it is the first Unicode code point in the source text.
+

Characters

@@ -113,7 +118,7 @@ unicode_digit = /* a Unicode code point classified as "Decimal Digit" */ .

-In The Unicode Standard 6.0,
+In The Unicode Standard 6.2,
 Section 4.5 "General Category" defines a set of character categories. Go treats those characters in category Lu, Ll, Lt, Lm, or Lo as Unicode letters,
@@ -198,7 +203,7 @@ token is
 integer, floating-point, imaginary,
- character, or
+ rune, or
 string literal
@@ -360,13 +365,15 @@ imaginary_lit = (decimals | float_lit) "i" .
-Character literals
+Rune literals

-A character literal represents a character constant,
-typically a Unicode code point, as one or more characters enclosed in single
-quotes. Within the quotes, any character may appear except single
-quote and newline. A single quoted character represents itself,
+A rune literal represents a rune constant,
+an integer value identifying a Unicode code point.
+A rune literal is expressed as one or more characters enclosed in single quotes.
+Within the quotes, any character may appear except single
+quote and newline. A single quoted character represents the Unicode value
+of the character itself,
 while multi-character sequences beginning with a backslash encode values in various formats.

@@ -380,8 +387,8 @@ a literal a, Unicode U+0061, value 0x61, while a literal a-dieresis, U+00E4, value 0xe4.

-Several backslash escapes allow arbitrary values to be represented
-as ASCII text. There are four ways to represent the integer value
+Several backslash escapes allow arbitrary values to be encoded as
+ASCII text. There are four ways to represent the integer value
 as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a
@@ -409,14 +416,14 @@ After a backslash, certain single-character escapes represent special values:
 \t   U+0009 horizontal tab
 \v   U+000b vertical tab
 \\   U+005c backslash
-\'   U+0027 single quote  (valid escape only within character literals)
+\'   U+0027 single quote  (valid escape only within rune literals)
 \"   U+0022 double quote  (valid escape only within string literals)

-All other sequences starting with a backslash are illegal inside character literals.
+All other sequences starting with a backslash are illegal inside rune literals.

-char_lit         = "'" ( unicode_value | byte_value ) "'" .
+rune_lit         = "'" ( unicode_value | byte_value ) "'" .
 unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
 byte_value       = octal_byte_value | hex_byte_value .
 octal_byte_value = `\` octal_digit octal_digit octal_digit .
@@ -439,6 +446,11 @@ escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `
 '\xff'
 '\u12e4'
 '\U00101234'
+'aa'         // illegal: too many characters
+'\xa'        // illegal: too few hexadecimal digits
+'\0'         // illegal: too few octal digits
+'\uDFFF'     // illegal: surrogate half
+'\U00110000' // illegal: invalid Unicode code point
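
As an illustration of the legal forms above (not part of the patch), the following minimal sketch spells the code point U+00E4 five ways and checks that each literal denotes the same integer value. The function name `sameRune` is a hypothetical helper introduced only for this example.

```go
package main

import "fmt"

// sameRune returns U+00E4 spelled five ways. The function name is an
// illustrative assumption; only the literal forms come from the spec text.
func sameRune() [5]rune {
	return [5]rune{
		'ä',          // a single character stands for its own Unicode value
		'\344',       // backslash followed by exactly three octal digits
		'\xe4',       // \x followed by exactly two hexadecimal digits
		'\u00e4',     // \u followed by exactly four hexadecimal digits
		'\U000000e4', // \U followed by exactly eight hexadecimal digits
	}
}

func main() {
	for _, r := range sameRune() {
		fmt.Printf("%U = %d\n", r, r)
	}
}
```

The illegal forms listed above ('aa', '\xa', '\0', '\uDFFF', '\U00110000') are rejected at compile time, so they cannot appear in a runnable example.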
 
@@ -453,7 +465,8 @@ raw string literals and interpreted string literals.
 Raw string literals are character sequences between back quotes ``. Within the quotes, any character is legal except back quote. The value of a raw string literal is the
-string composed of the uninterpreted characters between the quotes;
+string composed of the uninterpreted (implicitly UTF-8-encoded) characters
+between the quotes;
 in particular, backslashes have no special meaning and the string may contain newlines. Carriage returns inside raw string literals
@@ -464,8 +477,9 @@ Interpreted string literals are character sequences between double
 quotes "". The text between the quotes, which may not contain newlines, forms the value of the literal, with backslash escapes interpreted as they
-are in character literals (except that \' is illegal and
-\" is legal). The three-digit octal (\nnn)
+are in rune literals (except that \' is illegal and
+\" is legal), with the same restrictions.
+The three-digit octal (\nnn)
 and two-digit hexadecimal (\xnn) escapes represent individual bytes of the resulting string; all other escapes represent the (possibly multi-byte) UTF-8 encoding of individual characters.
@@ -492,6 +506,8 @@ interpreted_string_lit = `"` { unicode_value | byte_value } `"` .
 "日本語"
 "\u65e5本\U00008a9e"
 "\xff\u00FF"
+"\uD800"             // illegal: surrogate half
+"\U00110000"         // illegal: invalid Unicode code point

@@ -501,15 +517,15 @@ These examples all represent the same string:

 "日本語"                                 // UTF-8 input text
 `日本語`                                 // UTF-8 input text as a raw literal
-"\u65e5\u672c\u8a9e"                    // The explicit Unicode code points
-"\U000065e5\U0000672c\U00008a9e"        // The explicit Unicode code points
-"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // The explicit UTF-8 bytes
+"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
+"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
+"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // the explicit UTF-8 bytes
 

 If the source code represents a character as two code points, such as a combining form involving an accent and a letter, the result will be
-an error if placed in a character literal (it is not a single code
+an error if placed in a rune literal (it is not a single code
 point), and will appear as two code points if placed in a string literal.
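
A minimal sketch (not part of the patch) of the two points above: the five spellings in the table compare equal, and a letter followed by a combining accent occupies two code points in a string. The helper names `allEqual` and `combiningCodePoints` are hypothetical.

```go
package main

import "fmt"

// allEqual reports whether the five spellings from the table above
// denote the same string value.
func allEqual() bool {
	forms := []string{
		"日本語",
		`日本語`,
		"\u65e5\u672c\u8a9e",
		"\U000065e5\U0000672c\U00008a9e",
		"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e",
	}
	for _, s := range forms[1:] {
		if s != forms[0] {
			return false
		}
	}
	return true
}

// combiningCodePoints counts the code points in "e" followed by
// U+0301 (combining acute accent), written with an explicit escape.
func combiningCodePoints() int {
	return len([]rune("e\u0301"))
}

func main() {
	fmt.Println(allEqual(), combiningCodePoints())
}
```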

@@ -518,7 +534,7 @@ literal.

Constants

 There are boolean constants,
-character constants,
+rune constants,
 integer constants, floating-point constants, complex constants, and string constants. Character, integer, floating-point,
@@ -528,7 +544,7 @@ collectively called numeric constants.

 A constant value is represented by a
-character,
+rune,
 integer, floating-point, imaginary,
@@ -622,14 +638,15 @@ expressions.

 A type determines the set of values and operations specific to values of that
-type. A type may be specified by a (possibly qualified) type name
-(§Qualified identifier, §Type declarations) or a type literal,
+type. A type may be specified by a
+(possibly qualified) type name
+(§Type declarations) or a type literal,
 which composes a new type from previously declared types.

 Type      = TypeName | TypeLit | "(" Type ")" .
-TypeName  = QualifiedIdent .
+TypeName  = identifier | QualifiedIdent .
 TypeLit   = ArrayType | StructType | PointerType | FunctionType | InterfaceType |
 	    SliceType | MapType | ChannelType .
 
@@ -646,7 +663,7 @@ type literals.
 The static type (or just type) of a variable is the type defined by its declaration. Variables of interface type also have a distinct dynamic type, which
-is the actual type of the value stored in the variable at run-time.
+is the actual type of the value stored in the variable at run time.
 The dynamic type may vary during execution but is always assignable to the static type of the interface variable. For non-interface
@@ -764,19 +781,21 @@ particular architecture.

 A string type represents the set of string values.
-Strings behave like slices of bytes but are immutable: once created,
+A string value is a (possibly empty) sequence of bytes.
+Strings are immutable: once created,
 it is impossible to change the contents of a string. The predeclared string type is string.
+

-The elements of strings have type byte and may be
-accessed using the usual indexing operations. It is
-illegal to take the address of such an element; if
-s[i] is the ith byte of a
-string, &s[i] is invalid. The length of string
-s can be discovered using the built-in function
-len. The length is a compile-time constant if s
-is a string literal.
+The length of a string s (its size in bytes) can be discovered using
+the built-in function len.
+The length is a compile-time constant if the string is a constant.
+A string's bytes can be accessed by integer indices
+0 through len(s)-1.
+It is illegal to take the address of such an element; if
+s[i] is the i'th byte of a
+string, &s[i] is invalid.
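
The revised wording can be illustrated with a short sketch (not part of the patch). It assumes a hypothetical constant `greeting` and helper `byteAt`; the point is that `len` of a constant string is itself a constant, and that indexing yields bytes rather than characters.

```go
package main

import "fmt"

const greeting = "héllo" // 'é' occupies two bytes in UTF-8

// buf's array length is legal only because len(greeting) is a constant.
var buf [len(greeting)]byte

// byteAt returns the i'th byte of s (not the i'th character).
func byteAt(s string, i int) byte {
	return s[i]
}

func main() {
	fmt.Println(len(greeting), len(buf))
	fmt.Printf("%#x\n", byteAt(greeting, 1))
	// p := &greeting[1] // does not compile: the address of a string byte cannot be taken
}
```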

@@ -796,12 +815,13 @@ ElementType = Type .

-The length is part of the array's type and must be a
-constant expression that evaluates to a non-negative
-integer value. The length of array a can be discovered
-using the built-in function len(a).
-The elements can be indexed by integer
-indices 0 through len(a)-1 (§Indexes).
+The length is part of the array's type; it must evaluate to a non-
+negative constant representable by a value
+of type int.
+The length of array a can be discovered
+using the built-in function len.
+The elements can be addressed by integer indices
+0 through len(a)-1.
 Array types are always one-dimensional but may be composed to form multi-dimensional types.

@@ -830,9 +850,9 @@ SliceType = "[" "]" ElementType .

 Like arrays, slices are indexable and have a length. The length of a slice s can be discovered by the built-in function
-len(s); unlike with arrays it may change during
-execution. The elements can be addressed by integer indices 0
-through len(s)-1 (§Indexes). The slice index of a
+len; unlike with arrays it may change during
+execution. The elements can be addressed by integer indices
+0 through len(s)-1. The slice index of a
 given element may be less than the index of the same element in the underlying array.
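
A small sketch (not part of the patch) of the slice properties described above, using a hypothetical helper `middle`: the slice index of an element is smaller than its array index, and the slice's length can change during execution while the array's cannot.

```go
package main

import "fmt"

// middle returns a slice covering a[2] and a[3]. The name is hypothetical.
func middle(a *[5]int) []int {
	return a[2:4]
}

func main() {
	a := [5]int{10, 20, 30, 40, 50}
	s := middle(&a)
	fmt.Println(len(s), s[0] == a[2]) // slice index 0 is array index 2
	s = append(s, 60)                 // the slice's length changes at run time
	fmt.Println(len(s), len(a))       // the array's length stays fixed
}
```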

@@ -981,7 +1001,7 @@ promoted methods are included in the method set of the struct as follows:
 T. The method set of *S also includes promoted methods with receiver *T.
-
+
  • If S contains an anonymous field *T, the method sets of S and *S both
@@ -994,7 +1014,7 @@ promoted methods are included in the method set of the struct as follows:
 A field declaration may be followed by an optional string literal tag, which becomes an attribute for all the fields in the corresponding field declaration. The tags are made
-visible through a reflection interface
+visible through a reflection interface
 but are otherwise ignored.
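
To illustrate the tag paragraph (this sketch is not part of the patch), the following assumes a hypothetical type `point` and helper `tagOf`, and reads the tag back through the standard `reflect` package. One tag written after `X, Y int` becomes an attribute of both fields.

```go
package main

import (
	"fmt"
	"reflect"
)

// point is a hypothetical type; the tag applies to both X and Y because
// they share one field declaration.
type point struct {
	X, Y int `unit:"mm"`
}

// tagOf returns the raw tag string of the named field, or "" if absent.
func tagOf(name string) string {
	f, ok := reflect.TypeOf(point{}).FieldByName(name)
	if !ok {
		return ""
	}
	return string(f.Tag)
}

func main() {
	fmt.Println(tagOf("X"), tagOf("Y"))
}
```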

@@ -1046,8 +1066,11 @@ ParameterDecl = [ IdentifierList ] [ "..." ] Type .

 Within a list of parameters or results, the names (IdentifierList) must either all be present or all be absent. If present, each name
-stands for one item (parameter or result) of the specified type; if absent, each
-type stands for one item of that type. Parameter and result
+stands for one item (parameter or result) of the specified type and
+all non-blank names in the signature
+must be unique.
+If absent, each type stands for one item of that type.
+Parameter and result
 lists are always parenthesized except that if there is exactly one unnamed result it may be written as an unparenthesized type.

@@ -1232,10 +1255,10 @@ map[string]interface{}

 The number of map elements is called its length. For a map m, it can be discovered using the
-built-in function len(m)
+built-in function len
 and may change during execution. Elements may be added during execution using assignments and retrieved with
-index expressions; they may be removed with the
+index expressions; they may be removed with the
 delete built-in function.
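
A minimal sketch (not part of the patch) of the map operations named above. The helper `lengths` is hypothetical; it adds elements by assignment, removes one with `delete`, and observes the length with `len`.

```go
package main

import "fmt"

// lengths adds two elements, removes one, and reports the length before
// and after the removal plus whether the removed key is still present.
func lengths() (before, after int, present bool) {
	m := map[string]int{}
	m["a"] = 1 // elements are added with assignments
	m["b"] = 2
	before = len(m)
	delete(m, "a") // and removed with the delete built-in
	after = len(m)
	_, present = m["a"] // retrieved with an index expression
	return before, after, present
}

func main() {
	fmt.Println(lengths())
}
```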

@@ -1510,11 +1533,11 @@ Go is lexically scoped using blocks:
 or function (but not method) declared at top level (outside any function) is the package block.

-  • The scope of an imported package identifier is the file block
+  • The scope of the package name of an imported package is the file block
    of the file containing the import declaration.

-  • The scope of an identifier denoting a function parameter or
-    result variable is the function body.
+  • The scope of an identifier denoting a method receiver, function parameter,
+    or result variable is the function body.

   • The scope of a constant or variable identifier declared inside a function begins at the end of the ConstSpec or VarSpec
@@ -1897,7 +1920,7 @@ _, y, _ := coord(p) // coord() returns three values; only interested in y coord

 Unlike regular variable declarations, a short variable declaration may redeclare variables provided they
-were originally declared in the same block with the same type, and at
+were originally declared earlier in the same block with the same type, and at
 least one of the non-blank variables is new. As a consequence, redeclaration can only appear in a multi-variable short declaration. Redeclaration does not introduce a new
@@ -1907,6 +1930,7 @@ variable; it just assigns a new value to the original.

     field1, offset := nextField(str, 0)
     field2, offset := nextField(str, offset)  // redeclares offset
    +a, a := 1, 2                              // illegal: double declaration of a or no new variable if a was declared elsewhere
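
The legal redeclaration above can be made runnable with a small sketch (not part of the patch). Here `nextField` is a hypothetical stand-in for the function in the spec example: it is assumed to return the comma-separated field starting at `offset` together with the offset just past that field. `firstTwo` is likewise an illustrative name.

```go
package main

import (
	"fmt"
	"strings"
)

// nextField is a hypothetical stand-in for the function used in the spec
// example: it returns the comma-separated field starting at offset and
// the offset just past it.
func nextField(s string, offset int) (string, int) {
	rest := s[offset:]
	i := strings.IndexByte(rest, ',')
	if i < 0 {
		return rest, len(s)
	}
	return rest[:i], offset + i + 1
}

// firstTwo mirrors the spec example: the second := introduces field2
// and merely assigns a new value to the existing offset.
func firstTwo(str string) (string, string, int) {
	field1, offset := nextField(str, 0)
	field2, offset := nextField(str, offset) // redeclares offset
	return field1, field2, offset
}

func main() {
	fmt.Println(firstTwo("ab,cd,ef"))
}
```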
     

@@ -1969,8 +1993,15 @@ is visible only within selectors for that type.

-For a base type, the non-blank names of
-methods bound to it must be unique.
+A non-blank receiver identifier must be
+unique in the method signature.
+If the receiver's value is not referenced inside the body of the method,
+its identifier may be omitted in the declaration. The same applies in
+general to parameters of functions and methods.
+
+
+
+For a base type, the non-blank names of methods bound to it must be unique.
 If the base type is a struct type, the non-blank method and field names must be distinct.
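
A short sketch (not part of the patch) of the receiver rules added above. `Point` and `Scale` follow the names used elsewhere in the spec; `Dimensions` and `ignoreArgs` are hypothetical and exist only to show an omitted receiver identifier and omitted parameter names.

```go
package main

import "fmt"

type Point struct{ X, Y float64 }

// Scale uses its receiver, so the receiver is named.
func (p *Point) Scale(factor float64) {
	p.X *= factor
	p.Y *= factor
}

// Dimensions never references its receiver, so the identifier is omitted.
func (Point) Dimensions() int { return 2 }

// ignoreArgs is a hypothetical function whose parameter names are all absent.
func ignoreArgs(int, string) bool { return true }

func main() {
	p := Point{1, 2}
	p.Scale(3)
	fmt.Println(p, p.Dimensions(), ignoreArgs(0, ""))
}
```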

@@ -1996,12 +2027,6 @@ with receiver type *Point, to the base type Point.

-
-If the receiver's value is not referenced inside the body of the method,
-its identifier may be omitted in the declaration. The same applies in
-general to parameters of functions and methods.
-
-
 The type of a method is the type of a function with the receiver as first argument. For instance, the method Scale has type
@@ -2026,25 +2051,33 @@ operators and functions to operands.

 Operands

-Operands denote the elementary values in an expression.
+Operands denote the elementary values in an expression. An operand may be a
+literal, a (possibly qualified) identifier
+denoting a
+constant,
+variable, or
+function,
+a method expression yielding a function,
+or a parenthesized expression.

    -Operand    = Literal | QualifiedIdent | MethodExpr | "(" Expression ")" .
    +Operand    = Literal | OperandName | MethodExpr | "(" Expression ")" .
     Literal    = BasicLit | CompositeLit | FunctionLit .
    -BasicLit   = int_lit | float_lit | imaginary_lit | char_lit | string_lit .
    +BasicLit   = int_lit | float_lit | imaginary_lit | rune_lit | string_lit .
    +OperandName = identifier | QualifiedIdent.
     
-

 Qualified identifiers

-A qualified identifier is a non-blank identifier
-qualified by a package name prefix.
+A qualified identifier is an identifier qualified with a package name prefix.
+Both the package name and the identifier must not be
+blank.

    -QualifiedIdent = [ PackageName "." ] identifier .
    +QualifiedIdent = PackageName "." identifier .
     

@@ -2089,7 +2122,7 @@ The types of the expressions must be assignable
 to the respective field, element, and key types of the LiteralType; there is no additional conversion. The key is interpreted as a field name for struct literals,
-an index expression for array and slice literals, and a key for map literals.
+an index for array and slice literals, and a key for map literals.
 For map literals, all elements must have a key. It is an error to specify multiple elements with the same field name or constant key value.
@@ -2101,18 +2134,18 @@ For struct literals the following rules apply: