diff options
Diffstat (limited to 'usr/src/man/man4/ctf.4')
-rw-r--r-- | usr/src/man/man4/ctf.4 | 1215 |
1 files changed, 0 insertions, 1215 deletions
diff --git a/usr/src/man/man4/ctf.4 b/usr/src/man/man4/ctf.4 deleted file mode 100644 index 7e00f8d99a..0000000000 --- a/usr/src/man/man4/ctf.4 +++ /dev/null @@ -1,1215 +0,0 @@ -.\" -.\" This file and its contents are supplied under the terms of the -.\" Common Development and Distribution License ("CDDL"), version 1.0. -.\" You may only use this file in accordance with the terms of version -.\" 1.0 of the CDDL. -.\" -.\" A full copy of the text of the CDDL should have accompanied this -.\" source. A copy of the CDDL is also available via the Internet at -.\" http://www.illumos.org/license/CDDL. -.\" -.\" -.\" Copyright (c) 2014 Joyent, Inc. -.\" -.Dd Sep 26, 2014 -.Dt CTF 4 -.Os -.Sh NAME -.Nm ctf -.Nd Compact C Type Format -.Sh SYNOPSIS -.In sys/ctf.h -.Sh DESCRIPTION -.Nm -is designed to be a compact representation of the C programming -language's type information focused on serving the needs of dynamic -tracing, debuggers, and other in-situ and post-mortem introspection -tools. -.Nm -data is generally included in -.Sy ELF -objects and is tagged as -.Sy SHT_PROGBITS -to ensure that the data is accessible in a running process and in subsequent -core dumps, if generated. -.Lp -The -.Nm -data contained in each file has information about the layout and -sizes of C types, including intrinsic types, enumerations, structures, -typedefs, and unions, that are used by the corresponding -.Sy ELF -object. -The -.Nm -data may also include information about the types of global objects and -the return type and arguments of functions in the symbol table. -.Lp -Because a -.Nm -file is often embedded inside a file, rather than being a standalone -file itself, it may also be referred to as a -.Nm -.Sy container . -.Lp -On illumos systems, -.Nm -data is consumed by multiple programs. -It can be used by the modular debugger, -.Xr mdb 1 , -as well as by -.Xr dtrace 1M . -Programmatic access to -.Nm -data can be obtained through -.Xr libctf 3LIB . -.Lp -The -.Nm -file format is broken down into seven different sections. -The first section is the -.Sy preamble -and -.Sy header , -which describes the version of the -.Nm -file, links it has to other -.Nm -files, and the sizes of the other sections. -The next section is the -.Sy label -section, -which provides a way of identifying similar groups of -.Nm -data across multiple files. -This is followed by the -.Sy object -information section, which describes the type of global -symbols. -The subsequent section is the -.Sy function -information section, which describes the return -types and arguments of functions. -The next section is the -.Sy type -information section, which describes -the format and layout of the C types themselves, and finally the last -section is the -.Sy string -section, which contains the names of types, enumerations, members, and -labels. -.Lp -While strictly speaking, only the -.Sy preamble -and -.Sy header -are required, to be actually useful, both the type and string -sections are necessary. -.Lp -A -.Nm -file may contain all of the type information that it requires, or it -may optionally refer to another -.Nm -file which holds the remaining types. -When a -.Nm -file refers to another file, it is called the -.Sy child -and the file it refers to is called the -.Sy parent . -A given file may only refer to one parent. -This process is called -.Em uniquification -because it ensures each child only has type information that is -unique to it. -A common example of this is that most kernel modules in illumos are uniquified -against the kernel module -.Sy genunix -and the type information that comes from the -.Sy IP -module. -This means that a module only has types that are unique to itself and the most -common types in the kernel are not duplicated. -.Sh FILE FORMAT -This documents version -.Em two -of the -.Nm -file format. -All applications and tools currently produce and operate on this version. -.Lp -The file format can be summarized with the following image, the -following sections will cover this in more detail. -.Bd -literal - - +-------------+ 0t0 -+--------| Preamble | -| +-------------+ 0t4 -|+-------| Header | -|| +-------------+ 0t36 + cth_lbloff -||+------| Labels | -||| +-------------+ 0t36 + cth_objtoff -|||+-----| Objects | -|||| +-------------+ 0t36 + cth_funcoff -||||+----| Functions | -||||| +-------------+ 0t36 + cth_typeoff -|||||+---| Types | -|||||| +-------------+ 0t36 + cth_stroff -||||||+--| Strings | -||||||| +-------------+ 0t36 + cth_stroff + cth_strlen -||||||| -||||||| -||||||| -||||||| +-- magic - vers flags -||||||| | | | | -||||||| +------+------+------+------+ -+---------| 0xcf | 0xf1 | 0x02 | 0x00 | - |||||| +------+------+------+------+ - |||||| 0 1 2 3 4 - |||||| - |||||| + parent label + objects - |||||| | + parent name | + functions + strings - |||||| | | + label | | + types | + strlen - |||||| | | | | | | | | - |||||| +------+------+------+------+------+-------+-------+-------+ - +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 | - ||||| +------+------+------+------+------+-------+-------+-------+ - ||||| 0x04 0x08 0x0c 0x10 0x14 0x18 0x1c 0x20 0x24 - ||||| - ||||| + Label name - ||||| | + Label type - ||||| | | + Next label - ||||| | | | - ||||| +-------+------+-----+ - +-----------| 0x01 | 0x42 | ... | - |||| +-------+------+-----+ - |||| cth_lbloff +0x4 +0x8 cth_objtoff - |||| - |||| - |||| Symidx 0t15 0t43 0t44 - |||| +------+------+------+-----+ - +----------| 0x00 | 0x42 | 0x36 | ... | - ||| +------+------+------+-----+ - ||| cth_objtoff +0x2 +0x4 +0x6 cth_funcoff - ||| - ||| + CTF_TYPE_INFO + CTF_TYPE_INFO - ||| | + Return type | - ||| | | + arg0 | - ||| +--------+------+------+-----+ - +---------| 0x2c10 | 0x08 | 0x0c | ... | - || +--------+------+------+-----+ - || cth_funcff +0x2 +0x4 +0x6 cth_typeoff - || - || + ctf_stype_t for type 1 - || | integer + integer encoding - || | | + ctf_stype_t for type 2 - || | | | - || +--------------------+-----------+-----+ - +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... | - | +--------------------+-----------+-----+ - | cth_typeoff +0x08 +0x0c cth_stroff - | - | +--- str 0 - | | +--- str 1 + str 2 - | | | | - | v v v - | +----+---+---+---+----+---+---+---+---+---+----+ - +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 | - +----+---+---+---+----+---+---+---+---+---+----+ - 0 1 2 3 4 5 6 7 8 9 10 11 -.Ed -.Lp -Every -.Nm -file begins with a -.Sy preamble , -followed by a -.Sy header . -The -.Sy preamble -is defined as follows: -.Bd -literal -typedef struct ctf_preamble { - ushort_t ctp_magic; /* magic number (CTF_MAGIC) */ - uchar_t ctp_version; /* data format version number (CTF_VERSION) */ - uchar_t ctp_flags; /* flags (see below) */ -} ctf_preamble_t; -.Ed -.Pp -The -.Sy preamble -is four bytes long and must be four byte aligned. -This -.Sy preamble -defines the version of the -.Nm -file which defines the format of the rest of the header. -While the header may change in subsequent versions, the preamble will not change -across versions, though the interpretation of its flags may change from -version to version. -The -.Em ctp_magic -member defines the magic number for the -.Nm -file format. -This must always be -.Li 0xcff1 . -If another value is encountered, then the file should not be treated as -a -.Nm -file. -The -.Em ctp_version -member defines the version of the -.Nm -file. -The current version is -.Li 2 . -It is possible to encounter an unsupported version. -In that case, software should not try to parse the format, as it may have -changed. -Finally, the -.Em ctp_flags -member describes aspects of the file which modify its interpretation. -The following flags are currently defined: -.Bd -literal -#define CTF_F_COMPRESS 0x01 -.Ed -.Pp -The flag -.Sy CTF_F_COMPRESS -indicates that the body of the -.Nm -file, all the data following the -.Sy header , -has been compressed through the -.Sy zlib -library and its -.Sy deflate -algorithm. -If this flag is not present, then the body has not been compressed and no -special action is needed to interpret it. -All offsets into the data as described by -.Sy header , -always refer to the -.Sy uncompressed -data. -.Lp -In version two of the -.Nm -file format, the -.Sy header -denotes whether whether or not this -.Nm -file is the child of another -.Nm -file and also indicates the size of the remaining sections. -The structure for the -.Sy header , -logically contains a copy of the -.Sy preamble -and the two have a combined size of 36 bytes. -.Bd -literal -typedef struct ctf_header { - ctf_preamble_t cth_preamble; - uint_t cth_parlabel; /* ref to name of parent lbl uniq'd against */ - uint_t cth_parname; /* ref to basename of parent */ - uint_t cth_lbloff; /* offset of label section */ - uint_t cth_objtoff; /* offset of object section */ - uint_t cth_funcoff; /* offset of function section */ - uint_t cth_typeoff; /* offset of type section */ - uint_t cth_stroff; /* offset of string section */ - uint_t cth_strlen; /* length of string section in bytes */ -} ctf_header_t; -.Ed -.Pp -After the -.Sy preamble , -the next two members -.Em cth_parlablel -and -.Em cth_parname , -are used to identify the parent. -The value of both members are offsets into the -.Sy string -section which point to the start of a null-terminated string. -For more information on the encoding of strings, see the subsection on -.Sx String Identifiers . -If the value of either is zero, then there is no entry for that -member. -If the member -.Em cth_parlabel -is set, then the -.Em ctf_parname -member must be set, otherwise it will not be possible to find the -parent. -If -.Em ctf_parname -is set, it is not necessary to define -.Em cth_parlabel , -as the parent may not have a label. -For more information on labels and their interpretation, see -.Sx The Label Section . -.Lp -The remaining members (excepting -.Em cth_strlen ) -describe the beginning of the corresponding sections. -These offsets are relative to the end of the -.Sy header . -Therefore, something with an offset of 0 is at an offset of thirty-six -bytes relative to the start of the -.Nm -file. -The difference between members indicates the size of the section itself. -Different offsets have different alignment requirements. -The start of the -.Em cth_objotoff -and -.Em cth_funcoff -must be two byte aligned, while the sections -.Em cth_lbloff -and -.Em cth_typeoff -must be four-byte aligned. -The section -.Em cth_stroff -has no alignment requirements. -To calculate the size of a given section, excepting the -.Sy string -section, one should subtract the offset of the section from the following one. -For example, the size of the -.Sy types -section can be calculated by subtracting -.Em cth_stroff -from -.Em cth_typeoff . -.Lp -Finally, the member -.Em cth_strlen -describes the length of the string section itself. -From it, you can also calculate the size of the entire -.Nm -file by adding together the size of the -.Sy ctf_header_t , -the offset of the string section in -.Em cth_stroff , -and the size of the string section in -.Em cth_srlen . -.Ss Type Identifiers -Through the -.Nm ctf -data, types are referred to by identifiers. -A given -.Nm -file supports up to 32767 (0x7fff) types. -The first valid type identifier is 0x1. -When a given -.Nm -file is a child, indicated by a non-zero entry for the -.Sy header Ns 's -.Em cth_parname , -then the first valid type identifier is 0x8000 and the last is 0xffff. -In this case, type identifiers 0x1 through 0x7fff are references to the -parent. -.Lp -The type identifier zero is a sentinel value used to indicate that there -is no type information available or it is an unknown type. -.Lp -Throughout the file format, the identifier is stored in different sized -values; however, the minimum size to represent a given identifier is a -.Sy uint16_t . -Other consumers of -.Nm -information may use larger or opaque identifiers. -.Ss String Identifiers -String identifiers are always encoded as four byte unsigned integers -which are an offset into a string table. -The -.Nm -format supports two different string tables which have an identifier of -zero or one. -This identifier is stored in the high-order bit of the unsigned four byte -offset. -Therefore, the maximum supported offset into one of these tables is 0x7ffffffff. -.Lp -Table identifier zero, always refers to the -.Sy string -section in the CTF file itself. -String table identifier one refers to an external string table which is the ELF -string table for the ELF symbol table associated with the -.Nm -container. -.Ss Type Encoding -Every -.Nm -type begins with metadata encoded into a -.Sy uint16_t . -This encoded information tells us three different pieces of information: -.Bl -bullet -offset indent -compact -.It -The kind of the type -.It -Whether this type is a root type or not -.It -The length of the variable data -.El -.Lp -The 16 bits that make up the encoding are broken down such that you have -five bits for the kind, one bit for indicating whether or not it is a -root type, and 10 bits for the variable length. -This is laid out as follows: -.Bd -literal -offset indent -+--------------------+ -| kind | root | vlen | -+--------------------+ -15 11 10 9 0 -.Ed -.Lp -The current version of the file format defines 14 different kinds. -The interpretation of these different kinds will be discussed in the section -.Sx The Type Section . -If a kind is encountered that is not listed below, then it is not a valid -.Nm -file. -The kinds are defined as follows: -.Bd -literal -offset indent -#define CTF_K_UNKNOWN 0 -#define CTF_K_INTEGER 1 -#define CTF_K_FLOAT 2 -#define CTF_K_POINTER 3 -#define CTF_K_ARRAY 4 -#define CTF_K_FUNCTION 5 -#define CTF_K_STRUCT 6 -#define CTF_K_UNION 7 -#define CTF_K_ENUM 8 -#define CTF_K_FORWARD 9 -#define CTF_K_TYPEDEF 10 -#define CTF_K_VOLATILE 11 -#define CTF_K_CONST 12 -#define CTF_K_RESTRICT 13 -.Ed -.Lp -Programs directly reference many types; however, other types are referenced -indirectly because they are part of some other structure. -These types that are referenced directly and used are called -.Sy root -types. -Other types may be used indirectly, for example, a program may reference -a structure directly, but not one of its members which has a type. -That type is not considered a -.Sy root -type. -If a type is a -.Sy root -type, then it will have bit 10 set. -.Lp -The variable length section is specific to each kind and is discussed in the -section -.Sx The Type Section . -.Lp -The following macros are useful for constructing and deconstructing the encoded -type information: -.Bd -literal -offset indent - -#define CTF_MAX_VLEN 0x3ff -#define CTF_INFO_KIND(info) (((info) & 0xf800) >> 11) -#define CTF_INFO_ISROOT(info) (((info) & 0x0400) >> 10) -#define CTF_INFO_VLEN(info) (((info) & CTF_MAX_VLEN)) - -#define CTF_TYPE_INFO(kind, isroot, vlen) \\ - (((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN)) -.Ed -.Ss The Label Section -When consuming -.Nm -data, it is often useful to know whether two different -.Nm -containers come from the same source base and version. -For example, when building illumos, there are many kernel modules that are built -against a single collection of source code. -A label is encoded into the -.Nm -files that corresponds with the particular build. -This ensures that if files on the system were to become mixed up from multiple -releases, that they are not used together by tools, particularly when a child -needs to refer to a type in the parent. -Because they are linked used the type identifiers, if the wrong parent is used -then the wrong type will be encountered. -.Lp -Each label is encoded in the file format using the following eight byte -structure: -.Bd -literal -typedef struct ctf_lblent { - uint_t ctl_label; /* ref to name of label */ - uint_t ctl_typeidx; /* last type associated with this label */ -} ctf_lblent_t; -.Ed -.Lp -Each label has two different components, a name and a type identifier. -The name is encoded in the -.Em ctl_label -member which is in the format defined in the section -.Sx String Identifiers . -Generally, the names of all labels are found in the internal string -section. -.Lp -The type identifier encoded in the member -.Em ctl_typeidx -refers to the last type identifier that a label refers to in the current -file. -Labels only refer to types in the current file, if the -.Nm -file is a child, then it will have the same label as its parent; -however, its label will only refer to its types, not its parents. -.Lp -It is also possible, though rather uncommon, for a -.Nm -file to have multiple labels. -Labels are placed one after another, every eight bytes. -When multiple labels are present, types may only belong to a single label. -.Ss The Object Section -The object section provides a mapping from ELF symbols of type -.Sy STT_OBJECT -in the symbol table to a type identifier. -Every entry in this section is a -.Sy uint16_t -which contains a type identifier as described in the section -.Sx Type Identifiers . -If there is no information for an object, then the type identifier 0x0 -is stored for that entry. -.Lp -To walk the object section, you need to have a corresponding -.Sy symbol table -in the ELF object that contains the -.Nm -data. -Not every object is included in this section. -Specifically, when walking the symbol table. -An entry is skipped if it matches any of the following conditions: -.Lp -.Bl -bullet -offset indent -compact -.It -The symbol type is not -.Sy STT_OBJECT -.It -The symbol's section index is -.Sy SHN_UNDEF -.It -The symbol's name offset is zero -.It -The symbol's section index is -.Sy SHN_ABS -and the value of the symbol is zero. -.It -The symbol's name is -.Li _START_ -or -.Li _END_ . -These are skipped because they are used for scoping local symbols in -ELF. -.El -.Lp -The following sample code shows an example of iterating the object -section and skipping the correct symbols: -.Bd -literal -#include <gelf.h> -#include <stdio.h> - -/* - * Given the start of the object section in the CTF file, the number of symbols, - * and the ELF Data sections for the symbol table and the string table, this - * prints the type identifiers that correspond to objects. Note, a more robust - * implementation should ensure that they don't walk beyond the end of the CTF - * object section. - */ -static int -walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata, - long nsyms) -{ - long i; - uintptr_t strbase = strdata->d_buf; - - for (i = 1; i < nsyms; i++, objftoff++) { - const char *name; - GElf_Sym sym; - - if (gelf_getsym(symdata, i, &sym) == NULL) - return (1); - - if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT) - continue; - if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0) - continue; - if (sym.st_shndx == SHN_ABS && sym.st_value == 0) - continue; - name = (const char *)(strbase + sym.st_name); - if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0) - continue; - - (void) printf("Symbol %d has type %d\n", i, *objtoff); - } - - return (0); -} -.Ed -.Ss The Function Section -The function section of the -.Nm -file encodes the types of both the function's arguments and the function's -return type. -Similar to -.Sx The Object Section , -the function section encodes information for all symbols of type -.Sy STT_FUNCTION , -excepting those that fit specific criteria. -Unlike with objects, because functions have a variable number of arguments, they -start with a type encoding as defined in -.Sx Type Encoding , -which is the size of a -.Sy uint16_t . -For functions which have no type information available, they are encoded as -.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) . -Functions with arguments are encoded differently. -Here, the variable length is turned into the number of arguments in the -function. -If a function is a -.Sy varargs -type function, then the number of arguments is increased by one. -Functions with type information are encoded as: -.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) . -.Lp -For functions that have no type information, nothing else is encoded, and the -next function is encoded. -For functions with type information, the next -.Sy uint16_t -is encoded with the type identifier of the return type of the function. -It is followed by each of the type identifiers of the arguments, if any exist, -in the order that they appear in the function. -Therefore, argument 0 is the first type identifier and so on. -When a function has a final varargs argument, that is encoded with the type -identifier of zero. -.Lp -Like -.Sx The Object Section , -the function section is encoded in the order of the symbol table. -It has similar, but slightly different considerations from objects. -While iterating the symbol table, if any of the following conditions are true, -then the entry is skipped and no corresponding entry is written: -.Lp -.Bl -bullet -offset indent -compact -.It -The symbol type is not -.Sy STT_FUNCTION -.It -The symbol's section index is -.Sy SHN_UNDEF -.It -The symbol's name offset is zero -.It -The symbol's name is -.Li _START_ -or -.Li _END_ . -These are skipped because they are used for scoping local symbols in -ELF. -.El -.Ss The Type Section -The type section is the heart of the -.Nm -data. -It encodes all of the information about the types themselves. -The base of the type information comes in two forms, a short form and a long -form, each of which may be followed by a variable number of arguments. -The following definitions describe the short and long forms: -.Bd -literal -#define CTF_MAX_SIZE 0xfffe /* max size of a type in bytes */ -#define CTF_LSIZE_SENT 0xffff /* sentinel for ctt_size */ -#define CTF_MAX_LSIZE UINT64_MAX - -typedef struct ctf_stype { - uint_t ctt_name; /* reference to name in string table */ - ushort_t ctt_info; /* encoded kind, variant length */ - union { - ushort_t _size; /* size of entire type in bytes */ - ushort_t _type; /* reference to another type */ - } _u; -} ctf_stype_t; - -typedef struct ctf_type { - uint_t ctt_name; /* reference to name in string table */ - ushort_t ctt_info; /* encoded kind, variant length */ - union { - ushort_t _size; /* always CTF_LSIZE_SENT */ - ushort_t _type; /* do not use */ - } _u; - uint_t ctt_lsizehi; /* high 32 bits of type size in bytes */ - uint_t ctt_lsizelo; /* low 32 bits of type size in bytes */ -} ctf_type_t; - -#define ctt_size _u._size /* for fundamental types that have a size */ -#define ctt_type _u._type /* for types that reference another type */ -.Ed -.Pp -Type sizes are stored in -.Sy bytes . -The basic small form uses a -.Sy ushort_t -to store the number of bytes. -If the number of bytes in a structure would exceed 0xfffe, then the alternate -form, the -.Sy ctf_type_t , -is used instead. -To indicate that the larger form is being used, the member -.Em ctt_size -is set to value of -.Sy CTF_LSIZE_SENT -(0xffff). -In general, when going through the type section, consumers use the -.Sy ctf_type_t -structure, but pay attention to the value of the member -.Em ctt_size -to determine whether they should increment their scan by the size of the -.Sy ctf_stype_t -or -.Sy ctf_type_t . -Not all kinds of types use -.Sy ctt_size . -Those which do not, will always use the -.Sy ctf_stype_t -structure. -The individual sections for each kind have more information. -.Lp -Types are written out in order. -Therefore the first entry encountered has a type id of 0x1, or 0x8000 if a -child. -The member -.Em ctt_name -is encoded as described in the section -.Sx String Identifiers . -The string that it points to is the name of the type. -If the identifier points to an empty string (one that consists solely of a null -terminator) then the type does not have a name, this is common with anonymous -structures and unions that only have a typedef to name them, as well as, -pointers and qualifiers. -.Lp -The next member, the -.Em ctt_info , -is encoded as described in the section -.Sx Type Encoding . -The types kind tells us how to interpret the remaining data in the -.Sy ctf_type_t -and any variable length data that may exist. -The rest of this section will be broken down into the interpretation of the -various kinds. -.Ss Encoding of Integers -Integers, which are of type -.Sy CTF_K_INTEGER , -have no variable length arguments. -Instead, they are followed by a four byte -.Sy uint_t -which describes their encoding. -All integers must be encoded with a variable length of zero. -The -.Em ctt_size -member describes the length of the integer in bytes. -In general, integer sizes will be rounded up to the closest power of two. -.Lp -The integer encoding contains three different pieces of information: -.Bl -bullet -offset indent -compact -.It -The encoding of the integer -.It -The offset in -.Sy bits -of the type -.It -The size in -.Sy bits -of the type -.El -.Pp -This encoding can be expressed through the following macros: -.Bd -literal -offset indent -#define CTF_INT_ENCODING(data) (((data) & 0xff000000) >> 24) -#define CTF_INT_OFFSET(data) (((data) & 0x00ff0000) >> 16) -#define CTF_INT_BITS(data) (((data) & 0x0000ffff)) - -#define CTF_INT_DATA(encoding, offset, bits) \\ - (((encoding) << 24) | ((offset) << 16) | (bits)) -.Ed -.Pp -The following flags are defined for the encoding at this time: -.Bd -literal -offset indent -#define CTF_INT_SIGNED 0x01 -#define CTF_INT_CHAR 0x02 -#define CTF_INT_BOOL 0x04 -#define CTF_INT_VARARGS 0x08 -.Ed -.Lp -By default, an integer is considered to be unsigned, unless it has the -.Sy CTF_INT_SIGNED -flag set. -If the flag -.Sy CTF_INT_CHAR -is set, that indicates that the integer is of a type that stores character -data, for example the intrinsic C type -.Sy char -would have the -.Sy CTF_INT_CHAR -flag set. -If the flag -.Sy CTF_INT_BOOL -is set, that indicates that the integer represents a boolean type. -For example, the intrinsic C type -.Sy _Bool -would have the -.Sy CTF_INT_BOOL -flag set. -Finally, the flag -.Sy CTF_INT_VARARGS -indicates that the integer is used as part of a variable number of arguments. -This encoding is rather uncommon. -.Ss Encoding of Floats -Floats, which are of type -.Sy CTF_K_FLOAT , -are similar to their integer counterparts. -They have no variable length arguments and are followed by a four byte encoding -which describes the kind of float that exists. -The -.Em ctt_size -member is the size, in bytes, of the float. -The float encoding has three different pieces of information inside of it: -.Lp -.Bl -bullet -offset indent -compact -.It -The specific kind of float that exists -.It -The offset in -.Sy bits -of the float -.It -The size in -.Sy bits -of the float -.El -.Lp -This encoding can be expressed through the following macros: -.Bd -literal -offset indent -#define CTF_FP_ENCODING(data) (((data) & 0xff000000) >> 24) -#define CTF_FP_OFFSET(data) (((data) & 0x00ff0000) >> 16) -#define CTF_FP_BITS(data) (((data) & 0x0000ffff)) - -#define CTF_FP_DATA(encoding, offset, bits) \\ - (((encoding) << 24) | ((offset) << 16) | (bits)) -.Ed -.Lp -Where as the encoding for integers was a series of flags, the encoding for -floats maps to a specific kind of float. -It is not a flag-based value. -The kinds of floats correspond to both their size, and the encoding. -This covers all of the basic C intrinsic floating point types. -The following are the different kinds of floats represented in the encoding: -.Bd -literal -offset indent -#define CTF_FP_SINGLE 1 /* IEEE 32-bit float encoding */ -#define CTF_FP_DOUBLE 2 /* IEEE 64-bit float encoding */ -#define CTF_FP_CPLX 3 /* Complex encoding */ -#define CTF_FP_DCPLX 4 /* Double complex encoding */ -#define CTF_FP_LDCPLX 5 /* Long double complex encoding */ -#define CTF_FP_LDOUBLE 6 /* Long double encoding */ -#define CTF_FP_INTRVL 7 /* Interval (2x32-bit) encoding */ -#define CTF_FP_DINTRVL 8 /* Double interval (2x64-bit) encoding */ -#define CTF_FP_LDINTRVL 9 /* Long double interval (2x128-bit) encoding */ -#define CTF_FP_IMAGRY 10 /* Imaginary (32-bit) encoding */ -#define CTF_FP_DIMAGRY 11 /* Long imaginary (64-bit) encoding */ -#define CTF_FP_LDIMAGRY 12 /* Long double imaginary (128-bit) encoding */ -.Ed -.Ss Encoding of Arrays -Arrays, which are of type -.Sy CTF_K_ARRAY , -have no variable length arguments. -They are followed by a structure which describes the number of elements in the -array, the type identifier of the elements in the array, and the type identifier -of the index of the array. -With arrays, the -.Em ctt_size -member is set to zero. -The structure that follows an array is defined as: -.Bd -literal -typedef struct ctf_array { - ushort_t cta_contents; /* reference to type of array contents */ - ushort_t cta_index; /* reference to type of array index */ - uint_t cta_nelems; /* number of elements */ -} ctf_array_t; -.Ed -.Lp -The -.Em cta_contents -and -.Em cta_index -members of the -.Sy ctf_array_t -are type identifiers which are encoded as per the section -.Sx Type Identifiers . -The member -.Em cta_nelems -is a simple four byte unsigned count of the number of elements. -This count may be zero when encountering C99's flexible array members. -.Ss Encoding of Functions -Function types, which are of type -.Sy CTF_K_FUNCTION , -use the variable length list to be the number of arguments in the function. -When the function has a final member which is a varargs, then the argument count -is incremented by one to account for the variable argument. -Here, the -.Em ctt_type -member is encoded with the type identifier of the return type of the function. -Note that the -.Em ctt_size -member is not used here. -.Lp -The variable argument list contains the type identifiers for the arguments of -the function, if any. -Each one is represented by a -.Sy uint16_t -and encoded according to the -.Sx Type Identifiers -section. -If the function's last argument is of type varargs, then it is also written out, -but the type identifier is zero. -This is included in the count of the function's arguments. -.Ss Encoding of Structures and Unions -Structures and Unions, which are encoded with -.Sy CTF_K_STRUCT -and -.Sy CTF_K_UNION -respectively, are very similar constructs in C. -The main difference between them is the fact that every member of a structure -follows one another, where as in a union, all members share the same memory. -They are also very similar in terms of their encoding in -.Nm . -The variable length argument for structures and unions represents the number of -members that they have. -The value of the member -.Em ctt_size -is the size of the structure and union. -There are two different structures which are used to encode members in the -variable list. -When the size of a structure or union is greater than or equal to the large -member threshold, 8192, then a different structure is used to encode the member, -all members are encoded using the same structure. -The structure for members is as follows: -.Bd -literal -typedef struct ctf_member { - uint_t ctm_name; /* reference to name in string table */ - ushort_t ctm_type; /* reference to type of member */ - ushort_t ctm_offset; /* offset of this member in bits */ -} ctf_member_t; - -typedef struct ctf_lmember { - uint_t ctlm_name; /* reference to name in string table */ - ushort_t ctlm_type; /* reference to type of member */ - ushort_t ctlm_pad; /* padding */ - uint_t ctlm_offsethi; /* high 32 bits of member offset in bits */ - uint_t ctlm_offsetlo; /* low 32 bits of member offset in bits */ -} ctf_lmember_t; -.Ed -.Lp -Both the -.Em ctm_name -and -.Em ctlm_name -refer to the name of the member. -The name is encoded as an offset into the string table as described by the -section -.Sx String Identifiers . -The members -.Sy ctm_type -and -.Sy ctlm_type -both refer to the type of the member. -They are encoded as per the section -.Sx Type Identifiers . -.Lp -The last piece of information that is present is the offset which describes the -offset in memory that the member begins at. -For unions, this value will always be zero because the start of unions in memory -is always zero. -For structures, this is the offset in -.Sy bits -that the member begins at. -Note that a compiler may lay out a type with padding. -This means that the difference in offset between two consecutive members may be -larger than the size of the member. -When the size of the overall structure is strictly less than 8192 bytes, the -normal structure, -.Sy ctf_member_t , -is used and the offset in bits is stored in the member -.Em ctm_offset . -However, when the size of the structure is greater than or equal to 8192 bytes, -then the number of bits is split into two 32-bit quantities. -One member, -.Em ctlm_offsethi , -represents the upper 32 bits of the offset, while the other member, -.Em ctlm_offsetlo , -represents the lower 32 bits of the offset. -These can be joined together to get a 64-bit sized offset in bits by shifting -the member -.Em ctlm_offsethi -to the left by thirty two and then doing a binary or of -.Em ctlm_offsetlo . -.Ss Encoding of Enumerations -Enumerations, noted by the type -.Sy CTF_K_ENUM , -are similar to structures. -Enumerations use the variable list to note the number of values that the -enumeration contains, which we'll term enumerators. -In C, an enumeration is always equivalent to the intrinsic type -.Sy int , -thus the value of the member -.Em ctt_size -is always the size of an integer which is determined based on the current model. -For illumos systems, this will always be 4, as an integer is always defined to -be 4 bytes large in both -.Sy ILP32 -and -.Sy LP64 , -regardless of the architecture. -.Lp -The enumerators encoded in an enumeration have the following structure in the -variable list: -.Bd -literal -typedef struct ctf_enum { - uint_t cte_name; /* reference to name in string table */ - int cte_value; /* value associated with this name */ -} ctf_enum_t; -.Ed -.Pp -The member -.Em cte_name -refers to the name of the enumerator's value, it is encoded according to the -rules in the section -.Sx String Identifiers . -The member -.Em cte_value -contains the integer value of this enumerator. -.Ss Encoding of Forward References -Forward references, types of kind -.Sy CTF_K_FORWARD , -in a -.Nm -file refer to types which may not have a definition at all, only a name. -If the -.Nm -file is a child, then it may be that the forward is resolved to an -actual type in the parent, otherwise the definition may be in another -.Nm -container or may not be known at all. -The only member of the -.Sy ctf_type_t -that matters for a forward declaration is the -.Em ctt_name -which points to the name of the forward reference in the string table as -described earlier. -There is no other information recorded for forward references. -.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict -Pointers, typedefs, volatile, const, and restrict are all similar in -.Nm . -They all refer to another type. -In the case of typedefs, they provide an alternate name, while volatile, const, -and restrict change how the type is interpreted in the C programming language. -This covers the -.Nm -kinds -.Sy CTF_K_POINTER , -.Sy CTF_K_TYPEDEF , -.Sy CTF_K_VOLATILE , -.Sy CTF_K_RESTRICT , -and -.Sy CTF_K_CONST . -.Lp -These types have no variable list entries and use the member -.Em ctt_type -to refer to the base type that they modify. -.Ss Encoding of Unknown Types -Types with the kind -.Sy CTF_K_UNKNOWN -are used to indicate gaps in the type identifier space. -These entries consume an identifier, but do not define anything. -Nothing should refer to these gap identifiers. -.Ss Dependencies Between Types -C types can be imagined as a directed, cyclic, graph. -Structures and unions may refer to each other in a way that creates a cyclic -dependency. -In cases such as these, the entire type section must be read in and processed. -Consumers must not assume that every type can be laid out in dependency order; -they cannot. -.Ss The String Section -The last section of the -.Nm -file is the -.Sy string -section. -This section encodes all of the strings that appear throughout the other -sections. -It is laid out as a series of characters followed by a null terminator. -Generally, all names are written out in ASCII, as most C compilers do not allow -and characters to appear in identifiers outside of a subset of ASCII. -However, any extended characters sets should be written out as a series of UTF-8 -bytes. -.Lp -The first entry in the section, at offset zero, is a single null -terminator to reference the empty string. -Following that, each C string should be written out, including the null -terminator. -Offsets that refer to something in this section should refer to the first byte -which begins a string. -Beyond the first byte in the section being the null terminator, the order of -strings is unimportant. -.Sh Data Encoding and ELF Considerations -.Nm -data is generally included in ELF objects which specify information to -identify the architecture and endianness of the file. -A -.Nm -container inside such an object must match the endianness of the ELF object. -Aside from the question of the endian encoding of data, there should be no other -differences between architectures. -While many of the types in this document refer to non-fixed size C integral -types, they are equivalent in the models -.Sy ILP32 -and -.Sy LP64 . -If any other model is being used with -.Nm -data that has different sizes, then it must not use the model's sizes for -those integral types and instead use the fixed size equivalents based on an -.Sy ILP32 -environment. -.Lp -When placing a -.Nm -container inside of an ELF object, there are certain conventions that are -expected for the purposes of tooling being able to find the -.Nm -data. -In particular, a given ELF object should only contain a single -.Nm -section. -Multiple containers should be merged together into a single one. -.Lp -The -.Nm -file should be included in its own ELF section. -The section's name must be -.Ql .SUNW_ctf . -The type of the section must be -.Sy SHT_PROGBITS . -The section should have a link set to the symbol table and its address -alignment must be 4. -.Sh SEE ALSO -.Xr mdb 1 , -.Xr dtrace 1M , -.Xr gelf 3ELF , -.Xr libelf 3LIB , -.Xr a.out 4 |