diff options
Diffstat (limited to 'doc/articles/gos_declaration_syntax.html')
-rw-r--r-- | doc/articles/gos_declaration_syntax.html | 348 |
1 files changed, 348 insertions, 0 deletions
diff --git a/doc/articles/gos_declaration_syntax.html b/doc/articles/gos_declaration_syntax.html new file mode 100644 index 000000000..455cced1d --- /dev/null +++ b/doc/articles/gos_declaration_syntax.html @@ -0,0 +1,348 @@ +<!--{ +"Title": "Go's Declaration Syntax" +}--> + +<p> +Newcomers to Go wonder why the declaration syntax is different from the +tradition established in the C family. In this post we'll compare the +two approaches and explain why Go's declarations look as they do. +</p> + +<p> +<b>C syntax</b> +</p> + +<p> +First, let's talk about C syntax. C took an unusual and clever approach +to declaration syntax. Instead of describing the types with special +syntax, one writes an expression involving the item being declared, and +states what type that expression will have. Thus +</p> + +<pre> +int x; +</pre> + +<p> +declares x to be an int: the expression 'x' will have type int. In +general, to figure out how to write the type of a new variable, write an +expression involving that variable that evaluates to a basic type, then +put the basic type on the left and the expression on the right. +</p> + +<p> +Thus, the declarations +</p> + +<pre> +int *p; +int a[3]; +</pre> + +<p> +state that p is a pointer to int because '*p' has type int, and that a +is an array of ints because a[3] (ignoring the particular index value, +which is punned to be the size of the array) has type int. +</p> + +<p> +What about functions? Originally, C's function declarations wrote the +types of the arguments outside the parens, like this: +</p> + +<pre> +int main(argc, argv) + int argc; + char *argv[]; +{ /* ... */ } +</pre> + +<p> +Again, we see that main is a function because the expression main(argc, +argv) returns an int. In modern notation we'd write +</p> + +<pre> +int main(int argc, char *argv[]) { /* ... */ } +</pre> + +<p> +but the basic structure is the same. +</p> + +<p> +This is a clever syntactic idea that works well for simple types but can +get confusing fast. The famous example is declaring a function pointer. +Follow the rules and you get this: +</p> + +<pre> +int (*fp)(int a, int b); +</pre> + +<p> +Here, fp is a pointer to a function because if you write the expression +(*fp)(a, b) you'll call a function that returns int. What if one of fp's +arguments is itself a function? +</p> + +<pre> +int (*fp)(int (*ff)(int x, int y), int b) +</pre> + +<p> +That's starting to get hard to read. +</p> + +<p> +Of course, we can leave out the name of the parameters when we declare a +function, so main can be declared +</p> + +<pre> +int main(int, char *[]) +</pre> + +<p> +Recall that argv is declared like this, +</p> + +<pre> +char *argv[] +</pre> + +<p> +so you drop the name from the <em>middle</em> of its declaration to construct +its type. It's not obvious, though, that you declare something of type +char *[] by putting its name in the middle. +</p> + +<p> +And look what happens to fp's declaration if you don't name the +parameters: +</p> + +<pre> +int (*fp)(int (*)(int, int), int) +</pre> + +<p> +Not only is it not obvious where to put the name inside +</p> + +<pre> +int (*)(int, int) +</pre> + +<p> +it's not exactly clear that it's a function pointer declaration at all. +And what if the return type is a function pointer? +</p> + +<pre> +int (*(*fp)(int (*)(int, int), int))(int, int) +</pre> + +<p> +It's hard even to see that this declaration is about fp. +</p> + +<p> +You can construct more elaborate examples but these should illustrate +some of the difficulties that C's declaration syntax can introduce. +</p> + +<p> +There's one more point that needs to be made, though. Because type and +declaration syntax are the same, it can be difficult to parse +expressions with types in the middle. This is why, for instance, C casts +always parenthesize the type, as in +</p> + +<pre> +(int)M_PI +</pre> + +<p> +<b>Go syntax</b> +</p> + +<p> +Languages outside the C family usually use a distinct type syntax in +declarations. Although it's a separate point, the name usually comes +first, often followed by a colon. Thus our examples above become +something like (in a fictional but illustrative language) +</p> + +<pre> +x: int +p: pointer to int +a: array[3] of int +</pre> + +<p> +These declarations are clear, if verbose - you just read them left to +right. Go takes its cue from here, but in the interests of brevity it +drops the colon and removes some of the keywords: +</p> + +<pre> +x int +p *int +a [3]int +</pre> + +<p> +There is no direct correspondence between the look of [3]int and how to +use a in an expression. (We'll come back to pointers in the next +section.) You gain clarity at the cost of a separate syntax. +</p> + +<p> +Now consider functions. Let's transcribe the declaration for main, even +though the main function in Go takes no arguments: +</p> + +<pre> +func main(argc int, argv *[]byte) int +</pre> + +<p> +Superficially that's not much different from C, but it reads well from +left to right: +</p> + +<p> +<em>function main takes an int and a pointer to a slice of bytes and returns an int.</em> +</p> + +<p> +Drop the parameter names and it's just as clear - they're always first +so there's no confusion. +</p> + +<pre> +func main(int, *[]byte) int +</pre> + +<p> +One value of this left-to-right style is how well it works as the types +become more complex. Here's a declaration of a function variable +(analogous to a function pointer in C): +</p> + +<pre> +f func(func(int,int) int, int) int +</pre> + +<p> +Or if f returns a function: +</p> + +<pre> +f func(func(int,int) int, int) func(int, int) int +</pre> + +<p> +It still reads clearly, from left to right, and it's always obvious +which name is being declared - the name comes first. +</p> + +<p> +The distinction between type and expression syntax makes it easy to +write and invoke closures in Go: +</p> + +<pre> +sum := func(a, b int) int { return a+b } (3, 4) +</pre> + +<p> +<b>Pointers</b> +</p> + +<p> +Pointers are the exception that proves the rule. Notice that in arrays +and slices, for instance, Go's type syntax puts the brackets on the left +of the type but the expression syntax puts them on the right of the +expression: +</p> + +<pre> +var a []int +x = a[1] +</pre> + +<p> +For familiarity, Go's pointers use the * notation from C, but we could +not bring ourselves to make a similar reversal for pointer types. Thus +pointers work like this +</p> + +<pre> +var p *int +x = *p +</pre> + +<p> +We couldn't say +</p> + +<pre> +var p *int +x = p* +</pre> + +<p> +because that postfix * would conflate with multiplication. We could have +used the Pascal ^, for example: +</p> + +<pre> +var p ^int +x = p^ +</pre> + +<p> +and perhaps we should have (and chosen another operator for xor), +because the prefix asterisk on both types and expressions complicates +things in a number of ways. For instance, although one can write +</p> + +<pre> +[]int("hi") +</pre> + +<p> +as a conversion, one must parenthesize the type if it starts with a *: +</p> + +<pre> +(*int)(nil) +</pre> + +<p> +Had we been willing to give up * as pointer syntax, those parentheses +would be unnecessary. +</p> + +<p> +So Go's pointer syntax is tied to the familiar C form, but those ties +mean that we cannot break completely from using parentheses to +disambiguate types and expressions in the grammar. +</p> + +<p> +Overall, though, we believe Go's type syntax is easier to understand +than C's, especially when things get complicated. +</p> + +<p> +<b>Notes</b> +</p> + +<p> +Go's declarations read left to right. It's been pointed out that C's +read in a spiral! See <a href="http://c-faq.com/decl/spiral.anderson.html"> +The "Clockwise/Spiral Rule"</a> by David Anderson. +</p> |