1 files changed, 124 insertions, 33 deletions
diff --git a/doc/asm.html b/doc/asm.html
index d44cb799d..771c493cc 100644
--- a/doc/asm.html
+++ b/doc/asm.html
@@ -117,6 +117,9 @@ All user-defined symbols other than jump labels are written as offsets to these
 <p>
 The <code>SB</code> pseudo-register can be thought of as the origin of memory, so the symbol <code>foo(SB)</code>
 is the name <code>foo</code> as an address in memory.
+This form is used to name global functions and data.
+Adding <code>&lt;&gt;</code> to the name, as in <code>foo&lt;&gt;(SB)</code>, makes the name
+visible only in the current source file, like a top-level <code>static</code> declaration in a C file.
 </p>
 
 <p>
@@ -128,8 +131,11 @@ Thus <code>0(FP)</code> is the first argument to the function,
 When referring to a function argument this way, it is conventional to place the name
 at the beginning, as in <code>first_arg+0(FP)</code> and <code>second_arg+8(FP)</code>.
 Some of the assemblers enforce this convention, rejecting plain <code>0(FP)</code> and <code>8(FP)</code>.
-For assembly functions with Go prototypes, <code>go vet</code> will check that the argument names
+For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the argument names
 and offsets match.
+On 32-bit systems, the low and high 32 bits of a 64-bit value are distinguished by adding
+a <code>_lo</code> or <code>_hi</code> suffix to the name, as in <code>arg_lo+0(FP)</code> or <code>arg_hi+4(FP)</code>.
+If a Go prototype does not name its result, the expected assembly name is <code>ret</code>.
 </p>
 
 <p>
@@ -149,7 +155,7 @@ hardware's <code>SP</code> register.
 <p>
 Instructions, registers, and assembler directives are always in UPPER CASE to remind you
 that assembly programming is a fraught endeavor.
-(Exceptions: the <code>m</code> and <code>g</code> register renamings on ARM.)
+(Exception: the <code>g</code> register renaming on ARM.)
 </p>
 
 <p>
@@ -206,6 +212,8 @@ The frame size <code>$24-8</code> states that the function has a 24-byte frame
 and is called with 8 bytes of argument, which live on the caller's frame.
 If <code>NOSPLIT</code> is not specified for the <code>TEXT</code>,
 the argument size must be provided.
+For assembly functions with Go prototypes, <code>go</code> <code>vet</code> will check that the
+argument size is correct.
 </p>
 
 <p>
@@ -216,19 +224,20 @@ simple name <code>profileloop</code>.
 </p>
 
 <p>
-For <code>DATA</code> directives, the symbol is followed by a slash and the number
-of bytes the memory associated with the symbol occupies.
-The arguments are optional flags and the data itself.
-For instance,
-</p>
+Global data symbols are defined by a sequence of initializing
+<code>DATA</code> directives followed by a <code>GLOBL</code> directive.
+Each <code>DATA</code> directive initializes a section of the
+corresponding memory.
+The memory not explicitly initialized is zeroed.
+The general form of the <code>DATA</code> directive is
 
 <pre>
-DATA  runtime·isplan9(SB)/4, $1
+DATA	symbol+offset(SB)/width, value
 </pre>
 
 <p>
-declares the local symbol <code>runtime·isplan9</code> of size 4 and value 1.
-Again the symbol has the middle dot and is offset from <code>SB</code>.
+which initializes the symbol memory at the given offset and width with the given value.
+The <code>DATA</code> directives for a given symbol must be written with increasing offsets.
 </p>
 
 <p>
@@ -237,15 +246,26 @@ The arguments are optional flags and the size of the data being declared as a gl
 which will have initial value all zeros unless a <code>DATA</code> directive
 has initialized it.
 The <code>GLOBL</code> directive must follow any corresponding <code>DATA</code> directives.
-This example
+</p>
+
+<p>
+For example,
 </p>
 
 <pre>
-GLOBL runtime·tlsoffset(SB),$4
+DATA divtab&lt;&gt;+0x00(SB)/4, $0xf4f8fcff
+DATA divtab&lt;&gt;+0x04(SB)/4, $0xe6eaedf0
+...
+DATA divtab&lt;&gt;+0x3c(SB)/4, $0x81828384
+GLOBL divtab&lt;&gt;(SB), RODATA, $64
+
+GLOBL runtime·tlsoffset(SB), NOPTR, $4
 </pre>
 
 <p>
-declares <code>runtime·tlsoffset</code> to have size 4.
+declares and initializes <code>divtab&lt;&gt;</code>, a read-only 64-byte table of 4-byte integer values,
+and declares <code>runtime·tlsoffset</code>, a 4-byte, implicitly zeroed variable that
+contains no pointers.
 </p>
 
 <p>
@@ -253,7 +273,7 @@ There may be one or two arguments to the directives.
 If there are two, the first is a bit mask of flags,
 which can be written as numeric expressions, added or or-ed together,
 or can be set symbolically for easier absorption by a human.
-Their values, defined in the file <code>src/cmd/ld/textflag.h</code>, are:
+Their values, defined in the standard <code>#include</code>  file <code>textflag.h</code>, are:
 </p>
 
 <ul>
@@ -299,6 +319,80 @@ This is a wrapper function and should not count as disabling <code>recover</code
 </li>
 </ul>
 
+<h3 id="runtime">Runtime Coordination</h3>
+
+<p>
+For garbage collection to run correctly, the runtime must know the
+location of pointers in all global data and in most stack frames.
+The Go compiler emits this information when compiling Go source files,
+but assembly programs must define it explicitly.
+</p>
+
+<p>
+A data symbol marked with the <code>NOPTR</code> flag (see above)
+is treated as containing no pointers to runtime-allocated data.
+A data symbol with the <code>RODATA</code> flag
+is allocated in read-only memory and is therefore treated
+as implicitly marked <code>NOPTR</code>.
+A data symbol with a total size smaller than a pointer
+is also treated as implicitly marked <code>NOPTR</code>.
+It is not possible to define a symbol containing pointers in an assembly source file;
+such a symbol must be defined in a Go source file instead.
+Assembly source can still refer to the symbol by name
+even without <code>DATA</code> and <code>GLOBL</code> directives.
+A good general rule of thumb is to define all non-<code>RODATA</code>
+symbols in Go instead of in assembly.
+</p>
+
+<p>
+Each function also needs annotations giving the location of
+live pointers in its arguments, results, and local stack frame.
+For an assembly function with no pointer results and
+either no local stack frame or no function calls,
+the only requirement is to define a Go prototype for the function
+in a Go source file in the same package.
+For more complex situations, explicit annotation is needed.
+These annotations use pseudo-instructions defined in the standard
+<code>#include</code> file <code>funcdata.h</code>.
+</p>
+
+<p>
+If a function has no arguments and no results,
+the pointer information can be omitted.
+This is indicated by an argument size annotation of <code>$<i>n</i>-0</code>
+on the <code>TEXT</code> instruction.
+Otherwise, pointer information must be provided by
+a Go prototype for the function in a Go source file,
+even for assembly functions not called directly from Go.
+(The prototype will also let <code>go</code> <code>vet</code> check the argument references.)
+At the start of the function, the arguments are assumed
+to be initialized but the results are assumed uninitialized.
+If the results will hold live pointers during a call instruction,
+the function should start by zeroing the results and then 
+executing the pseudo-instruction <code>GO_RESULTS_INITIALIZED</code>.
+This instruction records that the results are now initialized
+and should be scanned during stack movement and garbage collection.
+It is typically easier to arrange that assembly functions do not
+return pointers or do not contain call instructions;
+no assembly functions in the standard library use
+<code>GO_RESULTS_INITIALIZED</code>.
+</p>
+
+<p>
+If a function has no local stack frame,
+the pointer information can be omitted.
+This is indicated by a local frame size annotation of <code>$0-<i>n</i></code>
+on the <code>TEXT</code> instruction.
+The pointer information can also be omitted if the
+function contains no call instructions.
+Otherwise, the local stack frame must not contain pointers,
+and the assembly must confirm this fact by executing the 
+pseudo-instruction <code>NO_LOCAL_POINTERS</code>.
+Because stack resizing is implemented by moving the stack,
+the stack pointer may change during any function call:
+even pointers to stack data must not be kept in local variables.
+</p>
+
 <h2 id="architectures">Architecture-specific details</h2>
 
 <p>
@@ -344,7 +438,7 @@ Here follows some descriptions of key Go-specific details for the supported arch
 <h3 id="x86">32-bit Intel 386</h3>
 
 <p>
-The runtime pointers to the <code>m</code> and <code>g</code> structures are maintained
+The runtime pointer to the <code>g</code> structure is maintained
 through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
 A OS-dependent macro <code>get_tls</code> is defined for the assembler if the source includes
 an architecture-dependent header file, like this:
@@ -356,14 +450,15 @@ an architecture-dependent header file, like this:
 
 <p>
 Within the runtime, the <code>get_tls</code> macro loads its argument register
-with a pointer to a pair of words representing the <code>g</code> and <code>m</code> pointers.
+with a pointer to the <code>g</code> pointer, and the <code>g</code> struct
+contains the <code>m</code> pointer.
 The sequence to load <code>g</code> and <code>m</code> using <code>CX</code> looks like this:
 </p>
 
 <pre>
 get_tls(CX)
-MOVL	g(CX), AX	// Move g into AX.
-MOVL	m(CX), BX	// Move m into BX.
+MOVL	g(CX), AX     // Move g into AX.
+MOVL	g_m(AX), BX   // Move g->m into BX.
 </pre>
 
 <h3 id="amd64">64-bit Intel 386 (a.k.a. amd64)</h3>
@@ -376,22 +471,21 @@ pointers is the same as on the 386, except it uses <code>MOVQ</code> rather than
 
 <pre>
 get_tls(CX)
-MOVQ	g(CX), AX	// Move g into AX.
-MOVQ	m(CX), BX	// Move m into BX.
+MOVQ	g(CX), AX     // Move g into AX.
+MOVQ	g_m(AX), BX   // Move g->m into BX.
 </pre>
 
 <h3 id="arm">ARM</h3>
 
 <p>
-The registers <code>R9</code>, <code>R10</code>, and <code>R11</code>
+The registers <code>R10</code> and <code>R11</code>
 are reserved by the compiler and linker.
 </p>
 
 <p>
-<code>R9</code> and <code>R10</code> point to the <code>m</code> (machine) and <code>g</code>
-(goroutine) structures, respectively.
-Within assembler source code, these pointers must be referred to as <code>m</code> and <code>g</code>;
-the names <code>R9</code> and <code>R10</code> are not recognized.
+<code>R10</code> points to the <code>g</code> (goroutine) structure.
+Within assembler source code, this pointer must be referred to as <code>g</code>;
+the name <code>R10</code> is not recognized.
 </p>
 
 <p>
@@ -434,13 +528,10 @@ Here's how the 386 runtime defines the 64-bit atomic load function.
 // so actually
 // void atomicload64(uint64 *res, uint64 volatile *addr);
 TEXT runtime·atomicload64(SB), NOSPLIT, $0-8
-	MOVL	4(SP), BX
-	MOVL	8(SP), AX
-	// MOVQ (%EAX), %MM0
-	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
-	// MOVQ %MM0, 0(%EBX)
-	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
-	// EMMS
-	BYTE $0x0F; BYTE $0x77
+	MOVL	ptr+0(FP), AX
+	LEAL	ret_lo+4(FP), BX
+	BYTE $0x0f; BYTE $0x6f; BYTE $0x00	// MOVQ (%EAX), %MM0
+	BYTE $0x0f; BYTE $0x7f; BYTE $0x03	// MOVQ %MM0, 0(%EBX)
+	BYTE $0x0F; BYTE $0x77			// EMMS
 	RET
 </pre>