diff options
Diffstat (limited to 'usr/src/man/man7/byteorder.7')
-rw-r--r-- | usr/src/man/man7/byteorder.7 | 270 |
1 files changed, 270 insertions, 0 deletions
diff --git a/usr/src/man/man7/byteorder.7 b/usr/src/man/man7/byteorder.7 new file mode 100644 index 0000000000..33ab61636c --- /dev/null +++ b/usr/src/man/man7/byteorder.7 @@ -0,0 +1,270 @@ +.\" +.\" This file and its contents are supplied under the terms of the +.\" Common Development and Distribution License ("CDDL"), version 1.0. +.\" You may only use this file in accordance with the terms of version +.\" 1.0 of the CDDL. +.\" +.\" A full copy of the text of the CDDL should have accompanied this +.\" source. A copy of the CDDL is also available via the Internet at +.\" http://www.illumos.org/license/CDDL. +.\" +.\" +.\" Copyright 2016 Joyent, Inc. +.\" +.Dd August 2, 2018 +.Dt BYTEORDER 7 +.Os +.Sh NAME +.Nm byteorder , +.Nm endian +.Nd byte order and endianness +.Sh DESCRIPTION +Integer values which occupy more than 1 byte in memory can be laid out +in different ways on different platforms. +In particular, there is a major split between those which place the least +significant byte of an integer at the lowest address, and those which place the +most significant byte there instead. +As this difference relates to which end of the integer is found in memory first, +the term +.Em endian +is used to refer to a particular byte order. +.Pp +A platform is referred to as using a +.Em big-endian +byte order when it places the most significant byte at the lowest +address, and +.Em little-endian +when it places the least significant byte first. +Some platforms may also switch between big- and little-endian mode and run code +compiled for either. +.Pp +Historically, there have also been some systems that utilized +.Em middle-endian +byte orders for integers larger than 2 bytes. +Such orderings are not in common use today. +.Pp +Endianness is also of particular importance when dealing with values +that are being read into memory from an external source. +For example, network protocols such as IP conventionally define the fields in a +packet as being always stored in big-endian byte order. +This means that a little-endian machine will have to perform transformations on +these fields in order to process them. +.Ss Examples +To illustrate endianness in memory, let us consider the decimal integer +2864434397. +This number fits in 32 bits of storage (4 bytes). +.Pp +On a big-endian system, this integer would be written into memory as +the bytes 0xAA, 0xBB, 0xCC, 0xDD, in order from lowest memory address to +highest. +.Pp +On a little-endian system, it would be written instead as the bytes +0xDD, 0xCC, 0xBB, 0xAA, in that order. +.Pp +If both the big- and little-endian systems were asked to store this +integer at address 0x100, we would see the following in each of their +memory: +.Bd -literal + + Big-Endian + + ++------++------++------++------++ + || 0xAA || 0xBB || 0xCC || 0xDD || + ++------++------++------++------++ + ^^ ^^ ^^ ^^ + 0x100 0x101 0x102 0x103 + vv vv vv vv + ++------++------++------++------++ + || 0xDD || 0xCC || 0xBB || 0xAA || + ++------++------++------++------++ + + Little-Endian +.Ed +.Pp +It is particularly important to note that even though the byte order is +different between these two machines, the bit ordering within each byte, +by convention, is still the same. +.Pp +For example, take the decimal integer 4660, which occupies in 16 bits (2 +bytes). +.Pp +On a big-endian system, this would be written into memory as 0x12, then +0x34. +.Pp +On a little-endian system, it would be written as 0x34, then 0x12. +Note that this is not at all the same as seeing 0x43 then 0x21 in memory -- +only the bytes are re-ordered, not any bits (or nybbles) within them. +.Pp +As before, storing this at address 0x100: +.Bd -literal + Big-Endian + + ++------++------++ + || 0x12 || 0x34 || + ++------++------++ + ^^ ^^ + 0x100 0x101 + vv vv + ++------++------++ + || 0x34 || 0x12 || + ++------++------++ + + Little-Endian +.Ed +.Pp +This example shows how an eight byte number, 0xBADCAFEDEADBEEF is stored +in both big and little-endian: +.Bd -literal + Big-Endian + + +------+------+------+------+------+------+------+------+ + | 0xBA | 0xDC | 0xAF | 0xFE | 0xDE | 0xAD | 0xBE | 0xEF | + +------+------+------+------+------+------+------+------+ + ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ + 0x100 0x101 0x102 0x103 0x104 0x105 0x106 0x107 + vv vv vv vv vv vv vv vv + +------+------+------+------+------+------+------+------+ + | 0xEF | 0xBE | 0xAD | 0xDE | 0xFE | 0xAF | 0xDC | 0xBA | + +------+------+------+------+------+------+------+------+ + + Little-Endian + +.Ed +.Pp +The treatment of different endian values would not be complete without +discussing +.Em PDP-endian , +which is also known as +.Em middle-endian . +While the PDP-11 was a 16-bit little-endian system, it laid out 32-bit +values in a different way from current little-endian systems. +First, it would divide a 32-bit number into two 16-bit numbers. +Each 16-bit number would be stored in little-endian; however, the two 16-bit +words would be stored with the larger 16-bit word appearing first in memory, +followed by the latter. +.Pp +The following image illustrates PDP-endian and compares it against +little-endian values. +Here, we'll start with the value 0xAABBCCDD and show how the four bytes for it +will be laid out, starting at 0x100. +.Bd -literal + PDP-Endian + + ++------++------++------++------++ + || 0xBB || 0xAA || 0xDD || 0xCC || + ++------++------++------++------++ + ^^ ^^ ^^ ^^ + 0x100 0x101 0x102 0x103 + vv vv vv vv + ++------++------++------++------++ + || 0xDD || 0xCC || 0xBB || 0xAA || + ++------++------++------++------++ + + Little-Endian + +.Ed +.Ss Network Byte Order +The term 'network byte order' refers to big-endian ordering, and +originates from the IEEE. +Early disagreements over which byte ordering to use for network traffic prompted +RFC1700 to define that all IETF-specified network protocols use big-endian +ordering unless noted explicitly otherwise. +The Internet protocol family (IP, and thus TCP and UDP etc) particularly adhere +to this convention. +.Ss Determining the System's Byte Order +The operating system supports both big-endian and little-endian CPUs. +To make it easier for programs to determine the endianness of the platform they +are being compiled for, functions and macro constants are provided in the system +header files. +.Pp +The endianness of the system can be obtained by including the header +.In sys/types.h +and using the pre-processor macros +.Sy _LITTLE_ENDIAN +and +.Sy _BIG_ENDIAN . +See +.Xr types.h 3HEAD +for more information. +.Pp +Additionally, the header +.In endian.h +defines an alternative means for determining the endianness of the +current system. +See +.Xr endian.h 3HEAD +for more information. +.Pp +illumos runs on both big- and little-endian systems. +When writing software for which the endianness is important, one must always +check the byte order and convert it appropriately. +.Ss Converting Between Byte Orders +The system provides two different sets of functions to convert values +between big-endian and little-endian. +They are defined in +.Xr byteorder 3C +and +.Xr endian 3C . +.Pp +The +.Xr byteorder 3C +family of functions convert data between the host's native byte order +and big- or little-endian. +The functions operate on either 16-bit, 32-bit, or 64-bit values. +Functions that convert from network byte order to the host's byte order +start with the string +.Sy ntoh , +while functions which convert from the host's byte order to network byte +order, begin with +.Sy hton . +For example, to convert a 32-bit value, a long, from network byte order +to the host's, one would use the function +.Xr ntohl 3C . +.Pp +These functions have been standardized by POSIX. +However, the 64-bit variants, +.Xr ntohll 3C +and +.Xr htonll 3C +are not standardized and may not be found on other systems. +For more information on these functions, see +.Xr byteorder 3C . +.Pp +The second family of functions, +.Xr endian 3C , +provide a means to convert between the host's byte order +and big-endian and little-endian specifically. +While these functions are similar to those in +.Xr byteorder 3C , +they more explicitly cover different data conversions. +Like them, these functions operate on either 16-bit, 32-bit, or 64-bit values. +When converting from big-endian, to the host's endianness, the functions +begin with +.Sy betoh . +If instead, one is converting data from the host's native endianness to +another, then it starts with +.Sy htobe . +When working with little-endian data, the prefixes +.Sy letoh +and +.Sy htole +convert little-endian data to the host's endianness and from the host's +to little-endian respectively. +.Pp +These functions are not standardized and the header they appear in varies +between the BSDs and GNU/Linux. +Applications that wish to be portable, should instead use the +.Xr byteorder 3C +functions. +.Pp +All of these functions in both families simply return their input when +the host's native byte order is the same as the desired order. +For example, when calling +.Xr htonl 3C +on a big-endian system the original data is returned with no conversion +or modification. +.Sh SEE ALSO +.Xr byteorder 3C , +.Xr endian 3C , +.Xr endian.h 3HEAD , +.Xr inet 3HEAD |