1 files changed, 315 insertions, 0 deletions
diff --git a/doc/articles/gobs_of_data.html b/doc/articles/gobs_of_data.html
new file mode 100644
index 000000000..6b836b2c3
--- /dev/null
+++ b/doc/articles/gobs_of_data.html
@@ -0,0 +1,315 @@
+<!--{
+"Title": "Gobs of data",
+"Template": true
+}-->
+
+<p>
+To transmit a data structure across a network or to store it in a file, it must
+be encoded and then decoded again. There are many encodings available, of
+course: <a href="http://www.json.org/">JSON</a>,
+<a href="http://www.w3.org/XML/">XML</a>, Google's
+<a href="http://code.google.com/p/protobuf">protocol buffers</a>, and more.
+And now there's another, provided by Go's <a href="/pkg/encoding/gob/">gob</a>
+package.
+</p>
+
+<p>
+Why define a new encoding? It's a lot of work and redundant at that. Why not
+just use one of the existing formats? Well, for one thing, we do! Go has
+<a href="/pkg/">packages</a> supporting all the encodings just mentioned (the
+<a href="http://code.google.com/p/goprotobuf">protocol buffer package</a> is in
+a separate repository but it's one of the most frequently downloaded). And for
+many purposes, including communicating with tools and systems written in other
+languages, they're the right choice.
+</p>
+
+<p>
+But for a Go-specific environment, such as communicating between two servers
+written in Go, there's an opportunity to build something much easier to use and
+possibly more efficient.
+</p>
+
+<p>
+Gobs work with the language in a way that an externally-defined,
+language-independent encoding cannot. At the same time, there are lessons to be
+learned from the existing systems.
+</p>
+
+<p>
+<b>Goals</b>
+</p>
+
+<p>
+The gob package was designed with a number of goals in mind.
+</p>
+
+<p>
+First, and most obvious, it had to be very easy to use. First, because Go has
+reflection, there is no need for a separate interface definition language or
+"protocol compiler". The data structure itself is all the package should need
+to figure out how to encode and decode it. On the other hand, this approach
+means that gobs will never work as well with other languages, but that's OK:
+gobs are unashamedly Go-centric.
+</p>
+
+<p>
+Efficiency is also important. Textual representations, exemplified by XML and
+JSON, are too slow to put at the center of an efficient communications network.
+A binary encoding is necessary.
+</p>
+
+<p>
+Gob streams must be self-describing. Each gob stream, read from the beginning,
+contains sufficient information that the entire stream can be parsed by an
+agent that knows nothing a priori about its contents. This property means that
+you will always be able to decode a gob stream stored in a file, even long
+after you've forgotten what data it represents.
+</p>
+
+<p>
+There were also some things to learn from our experiences with Google protocol
+buffers.
+</p>
+
+<p>
+<b>Protocol buffer misfeatures</b>
+</p>
+
+<p>
+Protocol buffers had a major effect on the design of gobs, but have three
+features that were deliberately avoided. (Leaving aside the property that
+protocol buffers aren't self-describing: if you don't know the data definition
+used to encode a protocol buffer, you might not be able to parse it.)
+</p>
+
+<p>
+First, protocol buffers only work on the data type we call a struct in Go. You
+can't encode an integer or array at the top level, only a struct with fields
+inside it. That seems a pointless restriction, at least in Go. If all you want
+to send is an array of integers, why should you have to put it into a
+struct first?
+</p>
+
+<p>
+Next, a protocol buffer definition may specify that fields <code>T.x</code> and
+<code>T.y</code> are required to be present whenever a value of type
+<code>T</code> is encoded or decoded.  Although such required fields may seem
+like a good idea, they are costly to implement because the codec must maintain a
+separate data structure while encoding and decoding, to be able to report when
+required fields are missing.  They're also a maintenance problem. Over time, one
+may want to modify the data definition to remove a required field, but that may
+cause existing clients of the data to crash. It's better not to have them in the
+encoding at all.  (Protocol buffers also have optional fields. But if we don't
+have required fields, all fields are optional and that's that. There will be
+more to say about optional fields a little later.)
+</p>
+
+<p>
+The third protocol buffer misfeature is default values. If a protocol buffer
+omits the value for a "defaulted" field, then the decoded structure behaves as
+if the field were set to that value. This idea works nicely when you have
+getter and setter methods to control access to the field, but is harder to
+handle cleanly when the container is just a plain idiomatic struct. Required
+fields are also tricky to implement: where does one define the default values,
+what types do they have (is text UTF-8? uninterpreted bytes? how many bits in a
+float?) and despite the apparent simplicity, there were a number of
+complications in their design and implementation for protocol buffers. We
+decided to leave them out of gobs and fall back to Go's trivial but effective
+defaulting rule: unless you set something otherwise, it has the "zero value"
+for that type - and it doesn't need to be transmitted.
+</p>
+
+<p>
+So gobs end up looking like a sort of generalized, simplified protocol buffer.
+How do they work?
+</p>
+
+<p>
+<b>Values</b>
+</p>
+
+<p>
+The encoded gob data isn't about <code>int8</code>s and <code>uint16</code>s.
+Instead, somewhat analogous to constants in Go, its integer values are abstract,
+sizeless numbers, either signed or unsigned. When you encode an
+<code>int8</code>, its value is transmitted as an unsized, variable-length
+integer. When you encode an <code>int64</code>, its value is also transmitted as
+an unsized, variable-length integer. (Signed and unsigned are treated
+distinctly, but the same unsized-ness applies to unsigned values too.) If both
+have the value 7, the bits sent on the wire will be identical. When the receiver
+decodes that value, it puts it into the receiver's variable, which may be of
+arbitrary integer type. Thus an encoder may send a 7 that came from an
+<code>int8</code>, but the receiver may store it in an <code>int64</code>. This
+is fine: the value is an integer and as a long as it fits, everything works. (If
+it doesn't fit, an error results.) This decoupling from the size of the variable
+gives some flexibility to the encoding: we can expand the type of the integer
+variable as the software evolves, but still be able to decode old data.
+</p>
+
+<p>
+This flexibility also applies to pointers. Before transmission, all pointers are
+flattened. Values of type <code>int8</code>, <code>*int8</code>,
+<code>**int8</code>, <code>****int8</code>, etc. are all transmitted as an
+integer value, which may then be stored in <code>int</code> of any size, or
+<code>*int</code>, or <code>******int</code>, etc. Again, this allows for
+flexibility.
+</p>
+
+<p>
+Flexibility also happens because, when decoding a struct, only those fields
+that are sent by the encoder are stored in the destination. Given the value
+</p>
+
+{{code "/doc/progs/gobs1.go" `/type T/` `/STOP/`}}
+
+<p>
+the encoding of <code>t</code> sends only the 7 and 8. Because it's zero, the
+value of <code>Y</code> isn't even sent; there's no need to send a zero value.
+</p>
+
+<p>
+The receiver could instead decode the value into this structure:
+</p>
+
+{{code "/doc/progs/gobs1.go" `/type U/` `/STOP/`}}
+
+<p>
+and acquire a value of <code>u</code> with only <code>X</code> set (to the
+address of an <code>int8</code> variable set to 7); the <code>Z</code> field is
+ignored - where would you put it? When decoding structs, fields are matched by
+name and compatible type, and only fields that exist in both are affected. This
+simple approach finesses the "optional field" problem: as the type
+<code>T</code> evolves by adding fields, out of date receivers will still
+function with the part of the type they recognize. Thus gobs provide the
+important result of optional fields - extensibility - without any additional
+mechanism or notation.
+</p>
+
+<p>
+From integers we can build all the other types: bytes, strings, arrays, slices,
+maps, even floats. Floating-point values are represented by their IEEE 754
+floating-point bit pattern, stored as an integer, which works fine as long as
+you know their type, which we always do. By the way, that integer is sent in
+byte-reversed order because common values of floating-point numbers, such as
+small integers, have a lot of zeros at the low end that we can avoid
+transmitting.
+</p>
+
+<p>
+One nice feature of gobs that Go makes possible is that they allow you to define
+your own encoding by having your type satisfy the
+<a href="/pkg/encoding/gob/#GobEncoder">GobEncoder</a> and
+<a href="/pkg/encoding/gob/#GobDecoder">GobDecoder</a> interfaces, in a manner
+analogous to the <a href="/pkg/encoding/json/">JSON</a> package's
+<a href="/pkg/encoding/json/#Marshaler">Marshaler</a> and
+<a href="/pkg/encoding/json/#Unmarshaler">Unmarshaler</a> and also to the
+<a href="/pkg/fmt/#Stringer">Stringer</a> interface from
+<a href="/pkg/fmt/">package fmt</a>. This facility makes it possible to
+represent special features, enforce constraints, or hide secrets when you
+transmit data. See the <a href="/pkg/encoding/gob/">documentation</a> for
+details.
+</p>
+
+<p>
+<b>Types on the wire</b>
+</p>
+
+<p>
+The first time you send a given type, the gob package includes in the data
+stream a description of that type. In fact, what happens is that the encoder is
+used to encode, in the standard gob encoding format, an internal struct that
+describes the type and gives it a unique number. (Basic types, plus the layout
+of the type description structure, are predefined by the software for
+bootstrapping.) After the type is described, it can be referenced by its type
+number.
+</p>
+
+<p>
+Thus when we send our first type <code>T</code>, the gob encoder sends a
+description of <code>T</code> and tags it with a type number, say 127. All
+values, including the first, are then prefixed by that number, so a stream of
+<code>T</code> values looks like:
+</p>
+
+<pre>
+("define type id" 127, definition of type T)(127, T value)(127, T value), ...
+</pre>
+
+<p>
+These type numbers make it possible to describe recursive types and send values
+of those types. Thus gobs can encode types such as trees:
+</p>
+
+{{code "/doc/progs/gobs1.go" `/type Node/` `/STOP/`}}
+
+<p>
+(It's an exercise for the reader to discover how the zero-defaulting rule makes
+this work, even though gobs don't represent pointers.)
+</p>
+
+<p>
+With the type information, a gob stream is fully self-describing except for the
+set of bootstrap types, which is a well-defined starting point.
+</p>
+
+<p>
+<b>Compiling a machine</b>
+</p>
+
+<p>
+The first time you encode a value of a given type, the gob package builds a
+little interpreted machine specific to that data type. It uses reflection on
+the type to construct that machine, but once the machine is built it does not
+depend on reflection. The machine uses package unsafe and some trickery to
+convert the data into the encoded bytes at high speed. It could use reflection
+and avoid unsafe, but would be significantly slower. (A similar high-speed
+approach is taken by the protocol buffer support for Go, whose design was
+influenced by the implementation of gobs.) Subsequent values of the same type
+use the already-compiled machine, so they can be encoded right away.
+</p>
+
+<p>
+Decoding is similar but harder. When you decode a value, the gob package holds
+a byte slice representing a value of a given encoder-defined type to decode,
+plus a Go value into which to decode it. The gob package builds a machine for
+that pair: the gob type sent on the wire crossed with the Go type provided for
+decoding. Once that decoding machine is built, though, it's again a
+reflectionless engine that uses unsafe methods to get maximum speed.
+</p>
+
+<p>
+<b>Use</b>
+</p>
+
+<p>
+There's a lot going on under the hood, but the result is an efficient,
+easy-to-use encoding system for transmitting data. Here's a complete example
+showing differing encoded and decoded types. Note how easy it is to send and
+receive values; all you need to do is present values and variables to the
+<a href="/pkg/encoding/gob/">gob package</a> and it does all the work.
+</p>
+
+{{code "/doc/progs/gobs2.go" `/package main/` `$`}}
+
+<p>
+You can compile and run this example code in the
+<a href="http://play.golang.org/p/_-OJV-rwMq">Go Playground</a>.
+</p>
+
+<p>
+The <a href="/pkg/net/rpc/">rpc package</a> builds on gobs to turn this
+encode/decode automation into transport for method calls across the network.
+That's a subject for another article.
+</p>
+
+<p>
+<b>Details</b>
+</p>
+
+<p>
+The <a href="/pkg/encoding/gob/">gob package documentation</a>, especially the
+file <a href="/src/pkg/encoding/gob/doc.go">doc.go</a>, expands on many of the
+details described here and includes a full worked example showing how the
+encoding represents data. If you are interested in the innards of the gob
+implementation, that's a good place to start.
+</p>