Diffstat (limited to 'usr/src/man/man5/byteorder.5')
-rw-r--r-- | usr/src/man/man5/byteorder.5 | 264 |
1 files changed, 264 insertions, 0 deletions
diff --git a/usr/src/man/man5/byteorder.5 b/usr/src/man/man5/byteorder.5
new file mode 100644
index 0000000000..4172008e97
--- /dev/null
+++ b/usr/src/man/man5/byteorder.5
@@ -0,0 +1,264 @@
+.\"
+.\" This file and its contents are supplied under the terms of the
+.\" Common Development and Distribution License ("CDDL"), version 1.0.
+.\" You may only use this file in accordance with the terms of version
+.\" 1.0 of the CDDL.
+.\"
+.\" A full copy of the text of the CDDL should have accompanied this
+.\" source. A copy of the CDDL is also available via the Internet at
+.\" http://www.illumos.org/license/CDDL.
+.\"
+.\"
+.\" Copyright 2016 Joyent, Inc.
+.\"
+.Dd January 31, 2016
+.Dt BYTEORDER 5
+.Os
+.Sh NAME
+.Nm byteorder ,
+.Nm endian
+.Nd byte order and endianness
+.Sh DESCRIPTION
+Integer values which occupy more than 1 byte in memory can be laid out
+in different ways on different platforms. In particular, there is a
+major split between those which place the least significant byte of an
+integer at the lowest address, and those which place the most
+significant byte there instead. As this difference relates to which
+end of the integer is found in memory first, the term
+.Em endian
+is used to refer to a particular byte order.
+.Pp
+A platform is referred to as using a
+.Em big-endian
+byte order when it places the most significant byte at the lowest
+address, and
+.Em little-endian
+when it places the least significant byte first. Some platforms may also
+switch between big- and little-endian mode and run code compiled for
+either.
+.Pp
+Historically, there have also been some systems that utilized
+.Em middle-endian
+byte orders for integers larger than 2 bytes. Such orderings are not in
+common use today.
+.Pp
+Endianness is also of particular importance when dealing with values
+that are being read into memory from an external source.
For example,
+network protocols such as IP conventionally define the fields in a
+packet as being always stored in big-endian byte order. This means that
+a little-endian machine will have to perform transformations on these
+fields in order to process them.
+.Ss Examples
+To illustrate endianness in memory, let us consider the decimal integer
+2864434397 (0xAABBCCDD). This number fits in 32 bits of storage (4 bytes).
+.Pp
+On a big-endian system, this integer would be written into memory as
+the bytes 0xAA, 0xBB, 0xCC, 0xDD, in order from lowest memory address to
+highest.
+.Pp
+On a little-endian system, it would be written instead as the bytes
+0xDD, 0xCC, 0xBB, 0xAA, in that order.
+.Pp
+If both the big- and little-endian systems were asked to store this
+integer at address 0x100, we would see the following in each system's
+memory:
+.Bd -literal
+
+                    Big-Endian
+
+        ++------++------++------++------++
+        || 0xAA || 0xBB || 0xCC || 0xDD ||
+        ++------++------++------++------++
+            ^^      ^^      ^^      ^^
+           0x100   0x101   0x102   0x103
+            vv      vv      vv      vv
+        ++------++------++------++------++
+        || 0xDD || 0xCC || 0xBB || 0xAA ||
+        ++------++------++------++------++
+
+                   Little-Endian
+.Ed
+.Pp
+It is particularly important to note that even though the byte order is
+different between these two machines, the bit ordering within each byte,
+by convention, is still the same.
+.Pp
+For example, take the decimal integer 4660 (0x1234), which occupies 16
+bits (2 bytes).
+.Pp
+On a big-endian system, this would be written into memory as 0x12, then
+0x34.
+.Pp
+On a little-endian system, it would be written as 0x34, then 0x12. Note
+that this is not at all the same as seeing 0x43 then 0x21 in memory --
+only the bytes are re-ordered, not any bits (or nybbles) within them.
+.Pp
+As before, storing this at address 0x100:
+.Bd -literal
+            Big-Endian
+
+        ++------++------++
+        || 0x12 || 0x34 ||
+        ++------++------++
+            ^^      ^^
+           0x100   0x101
+            vv      vv
+        ++------++------++
+        || 0x34 || 0x12 ||
+        ++------++------++
+
+           Little-Endian
+.Ed
+.Pp
+This example shows how an eight-byte number, 0xBADCAFFEDEADBEEF, is
+stored in both big- and little-endian:
+.Bd -literal
+                            Big-Endian
+
+    +------+------+------+------+------+------+------+------+
+    | 0xBA | 0xDC | 0xAF | 0xFE | 0xDE | 0xAD | 0xBE | 0xEF |
+    +------+------+------+------+------+------+------+------+
+       ^^     ^^     ^^     ^^     ^^     ^^     ^^     ^^
+     0x100  0x101  0x102  0x103  0x104  0x105  0x106  0x107
+       vv     vv     vv     vv     vv     vv     vv     vv
+    +------+------+------+------+------+------+------+------+
+    | 0xEF | 0xBE | 0xAD | 0xDE | 0xFE | 0xAF | 0xDC | 0xBA |
+    +------+------+------+------+------+------+------+------+
+
+                          Little-Endian
+
+.Ed
+.Pp
+The treatment of different endian values would not be complete without
+discussing
+.Em PDP-endian ,
+which is also known as
+.Em middle-endian .
+While the PDP-11 was a 16-bit little-endian system, it laid out 32-bit
+values in a different way from current little-endian systems. First, it
+would divide a 32-bit number into two 16-bit numbers. Each 16-bit number
+would be stored in little-endian; however, the two 16-bit words would be
+stored with the more significant 16-bit word appearing first in memory,
+followed by the less significant one.
+.Pp
+The following image illustrates PDP-endian and compares it against
+little-endian values. Here, we'll start with the value 0xAABBCCDD and
+show how the four bytes for it will be laid out, starting at 0x100.
+.Bd -literal
+                    PDP-Endian
+
+        ++------++------++------++------++
+        || 0xBB || 0xAA || 0xDD || 0xCC ||
+        ++------++------++------++------++
+            ^^      ^^      ^^      ^^
+           0x100   0x101   0x102   0x103
+            vv      vv      vv      vv
+        ++------++------++------++------++
+        || 0xDD || 0xCC || 0xBB || 0xAA ||
+        ++------++------++------++------++
+
+                   Little-Endian
+
+.Ed
+.Ss Network Byte Order
+The term 'network byte order' refers to big-endian ordering, and
+originates from the IEEE. Early disagreements over which byte ordering
+to use for network traffic prompted RFC 1700 to define that all
+IETF-specified network protocols use big-endian ordering unless noted
+explicitly otherwise. The Internet protocol family (IP, and thus TCP,
+UDP, etc.) adheres to this convention in particular.
+.Ss Determining the System's Byte Order
+The operating system supports both big-endian and little-endian CPUs. To
+make it easier for programs to determine the endianness of the
+platform they are being compiled for, functions and macro constants are
+provided in the system header files.
+.Pp
+The endianness of the system can be obtained by including the header
+.In sys/types.h
+and using the pre-processor macros
+.Sy _LITTLE_ENDIAN
+and
+.Sy _BIG_ENDIAN .
+See
+.Xr types.h 3HEAD
+for more information.
+.Pp
+Additionally, the header
+.In endian.h
+defines an alternative means for determining the endianness of the
+current system. See
+.Xr endian.h 3HEAD
+for more information.
+.Pp
+illumos runs on both big- and little-endian systems. When writing
+software for which the endianness is important, one must always check
+the byte order and convert it appropriately.
+.Ss Converting Between Byte Orders
+The system provides two different sets of functions to convert values
+between big-endian and little-endian. They are defined in
+.Xr byteorder 3SOCKET
+and
+.Xr endian 3C .
+.Pp
+The
+.Xr byteorder 3SOCKET
+family of functions converts data between the host's native byte order
+and network byte order (big-endian).
+The functions operate on either 16-bit, 32-bit, or 64-bit values.
+Functions that convert from network byte order to the host's byte order
+start with the string
+.Sy ntoh ,
+while functions which convert from the host's byte order to network byte
+order begin with
+.Sy hton .
+For example, to convert a 32-bit value, a long, from network byte order
+to the host's, one would use the function
+.Xr ntohl 3SOCKET .
+.Pp
+These functions have been standardized by POSIX. However, the 64-bit variants,
+.Xr ntohll 3SOCKET
+and
+.Xr htonll 3SOCKET ,
+are not standardized and may not be found on other systems. For more
+information on these functions, see
+.Xr byteorder 3SOCKET .
+.Pp
+The second family of functions,
+.Xr endian 3C ,
+provides a means to convert between the host's byte order
+and big-endian and little-endian specifically. While these functions are
+similar to those in
+.Xr byteorder 3SOCKET ,
+they more explicitly cover different data conversions. Like them, these
+functions operate on either 16-bit, 32-bit, or 64-bit values. When
+converting from big-endian to the host's endianness, the functions
+begin with
+.Sy betoh .
+If instead one is converting data from the host's native endianness to
+big-endian, the functions begin with
+.Sy htobe .
+When working with little-endian data, the prefixes
+.Sy letoh
+and
+.Sy htole
+convert little-endian data to the host's endianness and from the host's
+to little-endian respectively.
+.Pp
+These functions
+are not standardized and the header they appear in varies between the
+BSDs and GNU/Linux. Applications that wish to be portable should
+instead use the
+.Xr byteorder 3SOCKET
+functions.
+.Pp
+All of these functions in both families simply return their input when
+the host's native byte order is the same as the desired order. For
+example, when calling
+.Xr htonl 3SOCKET
+on a big-endian system the original data is returned with no conversion
+or modification.
+.Sh SEE ALSO
+.Xr endian 3C ,
+.Xr endian.h 3HEAD ,
+.Xr inet 3HEAD ,
+.Xr byteorder 3SOCKET