Diffstat (limited to 'usr/src/man/man5/byteorder.5')
-rw-r--r-- usr/src/man/man5/byteorder.5 264
1 files changed, 264 insertions, 0 deletions
diff --git a/usr/src/man/man5/byteorder.5 b/usr/src/man/man5/byteorder.5
new file mode 100644
index 0000000000..4172008e97
--- /dev/null
+++ b/usr/src/man/man5/byteorder.5
@@ -0,0 +1,264 @@
+.\"
+.\" This file and its contents are supplied under the terms of the
+.\" Common Development and Distribution License ("CDDL"), version 1.0.
+.\" You may only use this file in accordance with the terms of version
+.\" 1.0 of the CDDL.
+.\"
+.\" A full copy of the text of the CDDL should have accompanied this
+.\" source. A copy of the CDDL is also available via the Internet at
+.\" http://www.illumos.org/license/CDDL.
+.\"
+.\"
+.\" Copyright 2016 Joyent, Inc.
+.\"
+.Dd January 31, 2016
+.Dt BYTEORDER 5
+.Os
+.Sh NAME
+.Nm byteorder ,
+.Nm endian
+.Nd byte order and endianness
+.Sh DESCRIPTION
+Integer values which occupy more than 1 byte in memory can be laid out
+in different ways on different platforms. In particular, there is a
+major split between those which place the least significant byte of an
+integer at the lowest address, and those which place the most
+significant byte there instead. As this difference relates to which
+end of the integer is found in memory first, the term
+.Em endian
+is used to refer to a particular byte order.
+.Pp
+A platform is referred to as using a
+.Em big-endian
+byte order when it places the most significant byte at the lowest
+address, and
+.Em little-endian
+when it places the least significant byte first. Some platforms may also
+switch between big- and little-endian mode and run code compiled for
+either.
+.Pp
+Historically, there have also been some systems that utilized
+.Em middle-endian
+byte orders for integers larger than 2 bytes. Such orderings are not in
+common use today.
+.Pp
+Endianness is also of particular importance when dealing with values
+that are being read into memory from an external source. For example,
+network protocols such as IP conventionally define the fields in a
+packet as being always stored in big-endian byte order. This means that
+a little-endian machine will have to perform transformations on these
+fields in order to process them.
+.Ss Examples
+To illustrate endianness in memory, let us consider the decimal integer
+2864434397 (0xAABBCCDD in hexadecimal). This number fits in 32 bits of
+storage (4 bytes).
+.Pp
+On a big-endian system, this integer would be written into memory as
+the bytes 0xAA, 0xBB, 0xCC, 0xDD, in order from lowest memory address to
+highest.
+.Pp
+On a little-endian system, it would be written instead as the bytes
+0xDD, 0xCC, 0xBB, 0xAA, in that order.
+.Pp
+If both the big- and little-endian systems were asked to store this
+integer at address 0x100, we would see the following in their
+respective memories:
+.Bd -literal
+                   Big-Endian
+
+       ++------++------++------++------++
+       || 0xAA || 0xBB || 0xCC || 0xDD ||
+       ++------++------++------++------++
+           ^^      ^^      ^^      ^^
+         0x100   0x101   0x102   0x103
+           vv      vv      vv      vv
+       ++------++------++------++------++
+       || 0xDD || 0xCC || 0xBB || 0xAA ||
+       ++------++------++------++------++
+
+                  Little-Endian
+.Ed
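+.Pp
+The layouts in the figure can also be produced in C without relying on
+the byte order of the machine running the code, by extracting each byte
+with shifts and masks.
+The following is a minimal sketch; the functions
+.Fn store_be32
+and
+.Fn store_le32
+are hypothetical and not part of any system library.

```c
#include <stdint.h>

/*
 * Hypothetical helpers: write v into out[0..3] in a fixed byte order,
 * regardless of the byte order of the machine running the code.
 */
static void
store_be32(uint32_t v, unsigned char out[4])
{
	out[0] = (unsigned char)(v >> 24);	/* most significant byte first */
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;		/* least significant byte last */
}

static void
store_le32(uint32_t v, unsigned char out[4])
{
	out[0] = (unsigned char)v;		/* least significant byte first */
	out[1] = (unsigned char)(v >> 8);
	out[2] = (unsigned char)(v >> 16);
	out[3] = (unsigned char)(v >> 24);	/* most significant byte last */
}
```

+.Pp
+On a big- or little-endian host, copying a
+.Vt uint32_t
+into a byte array with
+.Xr memcpy 3C
+yields the same bytes as the matching one of these two functions.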
+.Pp
+It is particularly important to note that even though the byte order is
+different between these two machines, the bit ordering within each byte,
+by convention, is still the same.
+.Pp
+For example, take the decimal integer 4660 (0x1234), which occupies 16
+bits (2 bytes).
+.Pp
+On a big-endian system, this would be written into memory as 0x12, then
+0x34.
+.Pp
+On a little-endian system, it would be written as 0x34, then 0x12. Note
+that this is not at all the same as seeing 0x43, then 0x21, in memory:
+only the bytes are reordered, not any bits (or nybbles) within them.
+.Pp
+As before, storing this at address 0x100:
+.Bd -literal
+           Big-Endian
+
+       ++------++------++
+       || 0x12 || 0x34 ||
+       ++------++------++
+           ^^      ^^
+         0x100   0x101
+           vv      vv
+       ++------++------++
+       || 0x34 || 0x12 ||
+       ++------++------++
+
+          Little-Endian
+.Ed
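+.Pp
+Converting a 16-bit value between the two orders therefore moves whole
+bytes and leaves the bits within each byte alone.
+A minimal C sketch, using the hypothetical function name
+.Fn swap16 :

```c
#include <stdint.h>

/* Hypothetical helper: exchange the two bytes of a 16-bit value. */
static uint16_t
swap16(uint16_t v)
{
	/* Whole bytes move; the nybbles inside each byte stay in place. */
	return ((uint16_t)((v << 8) | (v >> 8)));
}
```

+.Pp
+Applied to 0x1234, it yields 0x3412, not 0x4321.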
+.Pp
+This example shows how an eight-byte number, 0xBADCAFFEDEADBEEF, is
+stored in both big- and little-endian byte order:
+.Bd -literal
+                           Big-Endian
+
+    +------+------+------+------+------+------+------+------+
+    | 0xBA | 0xDC | 0xAF | 0xFE | 0xDE | 0xAD | 0xBE | 0xEF |
+    +------+------+------+------+------+------+------+------+
+       ^^     ^^     ^^     ^^     ^^     ^^     ^^     ^^
+     0x100  0x101  0x102  0x103  0x104  0x105  0x106  0x107
+       vv     vv     vv     vv     vv     vv     vv     vv
+    +------+------+------+------+------+------+------+------+
+    | 0xEF | 0xBE | 0xAD | 0xDE | 0xFE | 0xAF | 0xDC | 0xBA |
+    +------+------+------+------+------+------+------+------+
+
+                          Little-Endian
+.Ed
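+.Pp
+Reversing all eight bytes converts between the two layouts.
+The hypothetical
+.Fn swap64
+below does this one byte at a time, favoring clarity over speed:

```c
#include <stdint.h>

/* Hypothetical helper: reverse the order of the eight bytes of v. */
static uint64_t
swap64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);	/* append v's lowest byte */
		v >>= 8;			/* and move on to the next one */
	}
	return (r);
}
```

+.Pp
+Applied to 0xBADCAFFEDEADBEEF, it returns 0xEFBEADDEFEAFDCBA.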
+.Pp
+The treatment of different endian values would not be complete without
+discussing
+.Em PDP-endian ,
+which is also known as
+.Em middle-endian .
+While the PDP-11 was a 16-bit little-endian system, it laid out 32-bit
+values differently from current little-endian systems. It would divide
+a 32-bit number into two 16-bit words. Each 16-bit word was stored in
+little-endian byte order; however, the more significant word appeared
+first in memory, followed by the less significant one.
+.Pp
+The following diagram compares PDP-endian with little-endian layout.
+Here, we start with the value 0xAABBCCDD and show how its four bytes
+are laid out, starting at address 0x100.
+.Bd -literal
+                   PDP-Endian
+
+       ++------++------++------++------++
+       || 0xBB || 0xAA || 0xDD || 0xCC ||
+       ++------++------++------++------++
+           ^^      ^^      ^^      ^^
+         0x100   0x101   0x102   0x103
+           vv      vv      vv      vv
+       ++------++------++------++------++
+       || 0xDD || 0xCC || 0xBB || 0xAA ||
+       ++------++------++------++------++
+
+                  Little-Endian
+.Ed
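+.Pp
+The PDP-endian layout can be sketched in C by splitting the value into
+its two 16-bit words and storing each word little-endian, more
+significant word first.
+The function name
+.Fn store_pdp32
+is hypothetical:

```c
#include <stdint.h>

/*
 * Hypothetical helper: lay out v as a PDP-11 would. The more significant
 * 16-bit word comes first, and each word is stored little-endian.
 * The byte comments show the result for v = 0xAABBCCDD.
 */
static void
store_pdp32(uint32_t v, unsigned char out[4])
{
	uint16_t hi = (uint16_t)(v >> 16);	/* 0xAABB */
	uint16_t lo = (uint16_t)(v & 0xffff);	/* 0xCCDD */

	out[0] = (unsigned char)(hi & 0xff);	/* 0xBB */
	out[1] = (unsigned char)(hi >> 8);	/* 0xAA */
	out[2] = (unsigned char)(lo & 0xff);	/* 0xDD */
	out[3] = (unsigned char)(lo >> 8);	/* 0xCC */
}
```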
+.Ss Network Byte Order
+The term 'network byte order' refers to big-endian ordering, and
+originates from the IEEE. Early disagreements over which byte ordering
+to use for network traffic prompted RFC 1700 to define that all
+IETF-specified network protocols use big-endian ordering unless
+explicitly noted otherwise. The Internet protocol family (IP, and thus
+TCP, UDP, etc.) adheres to this convention.
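+.Pp
+A common way to read a big-endian field out of a received packet, on a
+host of either byte order, is to assemble it from individual bytes.
+The following sketch uses the hypothetical names
+.Fn load_be16
+and
+.Fn load_be32 .
+Because it reads one byte at a time, it also does not require the field
+to be aligned in memory.

```c
#include <stdint.h>

/*
 * Hypothetical helpers: read a big-endian (network byte order) field
 * from a byte buffer. The result is correct on any host.
 */
static uint16_t
load_be16(const unsigned char *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

static uint32_t
load_be32(const unsigned char *p)
{
	/* Widen before shifting so that p[0] << 24 cannot overflow an int. */
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}
```

+.Pp
+For example, bytes 2 and 3 of an IPv4 header hold the total length; for
+a header beginning 0x45, 0x00, 0x00, 0x54,
+.Fn load_be16 "hdr + 2"
+returns 84.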
+.Ss Determining the System's Byte Order
+The operating system supports both big-endian and little-endian CPUs. To
+make it easier for programs to determine the endianness of the
+platform they are being compiled for, functions and macro constants are
+provided in the system header files.
+.Pp
+The endianness of the system can be obtained by including the header
+.In sys/types.h
+and using the pre-processor macros
+.Sy _LITTLE_ENDIAN
+and
+.Sy _BIG_ENDIAN .
+See
+.Xr types.h 3HEAD
+for more information.
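+.Pp
+A minimal sketch of such a check follows.
+On illumos exactly one of the two macros is defined. Some other systems
+may define neither, or define both, so the sketch falls back to
+inspecting memory at run time in those cases.
+The function name
+.Fn host_is_little_endian
+is hypothetical.

```c
#include <sys/types.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 1 on a little-endian host, 0 on a big-endian one. */
static int
host_is_little_endian(void)
{
#if defined(_LITTLE_ENDIAN) && !defined(_BIG_ENDIAN)
	return (1);		/* illumos convention: only one macro is set */
#elif defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
	return (0);
#else
	/* Neither or both defined: see which byte of 1 is stored first. */
	uint16_t probe = 1;
	unsigned char first;

	(void) memcpy(&first, &probe, 1);
	return (first == 1);
#endif
}
```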
+.Pp
+Additionally, the header
+.In endian.h
+defines an alternative means for determining the endianness of the
+current system. See
+.Xr endian.h 3HEAD
+for more information.
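+.Pp
+The sketch below assumes that
+.In endian.h
+defines
+.Dv BYTE_ORDER ,
+.Dv LITTLE_ENDIAN ,
+and
+.Dv BIG_ENDIAN ,
+as it does on the BSDs and GNU/Linux; confirm the exact set of macros
+in
+.Xr endian.h 3HEAD .
+The macro
+.Dv HOST_ORDER_NAME
+and the function
+.Fn host_order_name
+are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define	_DEFAULT_SOURCE	/* glibc hides BYTE_ORDER in strict modes */
#endif
#include <endian.h>

/* Decide at compile time; refuse to build if the macros are missing. */
#if !defined(BYTE_ORDER) || !defined(LITTLE_ENDIAN) || !defined(BIG_ENDIAN)
#error "<endian.h> did not define BYTE_ORDER, LITTLE_ENDIAN and BIG_ENDIAN"
#elif BYTE_ORDER == LITTLE_ENDIAN
#define	HOST_ORDER_NAME	"little-endian"
#elif BYTE_ORDER == BIG_ENDIAN
#define	HOST_ORDER_NAME	"big-endian"
#else
#error "unsupported byte order"
#endif

static const char *
host_order_name(void)
{
	return (HOST_ORDER_NAME);
}
```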
+.Pp
+illumos runs on both big- and little-endian systems. When writing
+software for which the endianness is important, one must always check
+the byte order and convert it appropriately.
+.Ss Converting Between Byte Orders
+The system provides two different sets of functions to convert values
+between big-endian and little-endian. They are defined in
+.Xr byteorder 3SOCKET
+and
+.Xr endian 3C .
+.Pp
+The
+.Xr byteorder 3SOCKET
+family of functions converts data between the host's native byte order
+and network byte order, which is big-endian.
+The functions operate on 16-bit, 32-bit, or 64-bit values.
+Functions that convert from network byte order to the host's byte order
+start with the string
+.Sy ntoh ,
+while functions that convert from the host's byte order to network byte
+order begin with
+.Sy hton .
+For example, to convert a 32-bit value (historically a long, hence the
+trailing l) from network byte order to the host's, one would use the
+function
+.Xr ntohl 3SOCKET .
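+.Pp
+A typical pattern is to apply
+.Xr htonl 3SOCKET
+before copying a value into a packet buffer and
+.Xr ntohl 3SOCKET
+after copying it back out.
+The sketch below includes
+.In arpa/inet.h ,
+where POSIX declares these functions; the helper
+.Fn roundtrip_be32
+is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store v into out[0..3] in network byte order
 * using htonl(), then read it back with ntohl().
 */
static uint32_t
roundtrip_be32(uint32_t v, unsigned char out[4])
{
	uint32_t net = htonl(v);	/* host order -> network order */
	uint32_t back;

	(void) memcpy(out, &net, sizeof (net));	/* bytes as sent on the wire */
	(void) memcpy(&back, out, sizeof (back));
	return (ntohl(back));			/* network order -> host order */
}
```

+.Pp
+For 0xAABBCCDD, the buffer receives 0xAA, 0xBB, 0xCC, 0xDD on any host,
+and the original value is returned.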
+.Pp
+The 16-bit and 32-bit functions are standardized by POSIX. However, the
+64-bit variants,
+.Xr ntohll 3SOCKET
+and
+.Xr htonll 3SOCKET ,
+are not standardized and may not be found on other systems. For more
+information on these functions, see
+.Xr byteorder 3SOCKET .
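+.Pp
+Where
+.Fn htonll
+is unavailable, an equivalent can be built from the standard 32-bit
+function.
+A sketch, using the hypothetical name
+.Fn portable_htonll :

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Hypothetical fallback for systems without htonll(): build a 64-bit
 * host-to-network conversion out of the standard 32-bit htonl().
 */
static uint64_t
portable_htonll(uint64_t v)
{
	if (htonl(1) == 1)		/* big-endian host: nothing to do */
		return (v);
	/* Swap the two 32-bit halves and the bytes within each half. */
	return (((uint64_t)htonl((uint32_t)(v & 0xffffffffU)) << 32) |
	    htonl((uint32_t)(v >> 32)));
}
```

+.Pp
+Because the transformation is its own inverse, the same function can
+also serve as a network-to-host conversion.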
+.Pp
+The second family of functions,
+.Xr endian 3C ,
+provides a means to convert between the host's byte order and
+big-endian or little-endian byte order specifically. While these
+functions are similar to those in
+.Xr byteorder 3SOCKET ,
+they name both the source and destination byte orders explicitly and
+also cover conversions to and from little-endian. Like them, these
+functions operate on 16-bit, 32-bit, or 64-bit values. When converting
+from big-endian to the host's endianness, the functions begin with
+.Sy betoh .
+When converting from the host's native endianness to big-endian, they
+begin with
+.Sy htobe .
+When working with little-endian data, the prefixes
+.Sy letoh
+and
+.Sy htole
+convert little-endian data to the host's endianness and from the
+host's to little-endian, respectively.
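+.Pp
+A sketch of encoding and decoding a little-endian 32-bit field with
+these functions follows; the helpers
+.Fn put_le32
+and
+.Fn get_le32
+are hypothetical.
+It uses the
+.Fn le32toh
+spelling instead of
+.Fn letoh32
+because glibc provides only the former; that illumos also accepts
+.Fn le32toh
+is an assumption to check against
+.Xr endian 3C .

```c
#ifndef _DEFAULT_SOURCE
#define	_DEFAULT_SOURCE	/* glibc: expose htole32() and friends */
#endif
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v as four little-endian bytes. */
static void
put_le32(unsigned char out[4], uint32_t v)
{
	uint32_t le = htole32(v);	/* host order -> little-endian */

	(void) memcpy(out, &le, sizeof (le));
}

/* Hypothetical helper: read four little-endian bytes back into host order. */
static uint32_t
get_le32(const unsigned char in[4])
{
	uint32_t le;

	(void) memcpy(&le, in, sizeof (le));
	return (le32toh(le));		/* little-endian -> host order */
}
```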
+.Pp
+These functions are not standardized, and the header they appear in
+varies between the BSDs and GNU/Linux. Applications that wish to be
+portable should instead use the
+.Xr byteorder 3SOCKET
+functions.
+.Pp
+All of the functions in both families return their input unchanged
+when the host's native byte order is already the desired order. For
+example, when
+.Xr htonl 3SOCKET
+is called on a big-endian system, the original data is returned with
+no conversion or modification.
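+.Pp
+This behavior can be observed directly. For a value whose bytes do not
+read the same in both directions, exactly one of
+.Fn htobe32
+and
+.Fn htole32
+returns it unchanged on a big- or little-endian host.
+A sketch with the hypothetical name
+.Fn exactly_one_is_identity :

```c
#ifndef _DEFAULT_SOURCE
#define	_DEFAULT_SOURCE	/* glibc: expose htobe32() and htole32() */
#endif
#include <endian.h>
#include <stdint.h>

/*
 * Hypothetical check: returns 1 when exactly one of htobe32() and
 * htole32() leaves v unchanged, namely the one matching the host.
 * A byte palindrome such as 0x11222211 is unchanged by both.
 */
static int
exactly_one_is_identity(uint32_t v)
{
	int be_same = (htobe32(v) == v);
	int le_same = (htole32(v) == v);

	return (be_same != le_same);
}
```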
+.Sh SEE ALSO
+.Xr endian 3C ,
+.Xr endian.h 3HEAD ,
+.Xr inet 3HEAD ,
+.Xr byteorder 3SOCKET