diff options
Diffstat (limited to 'archivers/libarchive/files/doc/text/tar.5')
-rw-r--r-- | archivers/libarchive/files/doc/text/tar.5 | 534 |
1 files changed, 534 insertions, 0 deletions
diff --git a/archivers/libarchive/files/doc/text/tar.5 b/archivers/libarchive/files/doc/text/tar.5 new file mode 100644 index 00000000000..24b77e15984 --- /dev/null +++ b/archivers/libarchive/files/doc/text/tar.5 @@ -0,0 +1,534 @@ +TAR(5) FreeBSD File Formats Manual TAR(5) + +NAME + tar -- format of tape archive files + +DESCRIPTION + The tar archive format collects any number of files, directories, and + other file system objects (symbolic links, device nodes, etc.) into a + single stream of bytes. The format was originally designed to be used + with tape drives that operate with fixed-size blocks, but is widely used + as a general packaging mechanism. + + General Format + A tar archive consists of a series of 512-byte records. Each file system + object requires a header record which stores basic metadata (pathname, + owner, permissions, etc.) and zero or more records containing any file + data. The end of the archive is indicated by two records consisting + entirely of zero bytes. + + For compatibility with tape drives that use fixed block sizes, programs + that read or write tar files always read or write a fixed number of + records with each I/O operation. These ``blocks'' are always a multiple + of the record size. The most common block size--and the maximum sup- + ported by historic implementations--is 10240 bytes or 20 records. (Note: + the terms ``block'' and ``record'' here are not entirely standard; this + document follows the convention established by John Gilmore in document- + ing pdtar.) + + Old-Style Archive Format + The original tar archive format has been extended many times to include + additional information that various implementors found necessary. This + section describes the variant implemented by the tar command included in + Version 7 AT&T UNIX, which is one of the earliest widely-used versions of + the tar program. + + The header record for an old-style tar archive consists of the following: + + struct header_old_tar { + char name[100]; + char mode[8]; + char uid[8]; + char gid[8]; + char size[12]; + char mtime[12]; + char checksum[8]; + char linkflag[1]; + char linkname[100]; + char pad[255]; + }; + All unused bytes in the header record are filled with nulls. + + name Pathname, stored as a null-terminated string. Early tar imple- + mentations only stored regular files (including hardlinks to + those files). One common early convention used a trailing "/" + character to indicate a directory name, allowing directory per- + missions and owner information to be archived and restored. + + mode File mode, stored as an octal number in ASCII. + + uid, gid + User id and group id of owner, as octal numbers in ASCII. + + size Size of file, as octal number in ASCII. For regular files only, + this indicates the amount of data that follows the header. In + particular, this field was ignored by early tar implementations + when extracting hardlinks. Modern writers should always store a + zero length for hardlink entries. + + mtime Modification time of file, as an octal number in ASCII. This + indicates the number of seconds since the start of the epoch, + 00:00:00 UTC January 1, 1970. Note that negative values should + be avoided here, as they are handled inconsistently. + + checksum + Header checksum, stored as an octal number in ASCII. To compute + the checksum, set the checksum field to all spaces, then sum all + bytes in the header using unsigned arithmetic. This field should + be stored as six octal digits followed by a null and a space + character. Note that many early implementations of tar used + signed arithmetic for the checksum field, which can cause inter- + operability problems when transferring archives between systems. + Modern robust readers compute the checksum both ways and accept + the header if either computation matches. + + linkflag, linkname + In order to preserve hardlinks and conserve tape, a file with + multiple links is only written to the archive the first time it + is encountered. The next time it is encountered, the linkflag is + set to an ASCII `1' and the linkname field holds the first name + under which this file appears. (Note that regular files have a + null value in the linkflag field.) + + Early tar implementations varied in how they terminated these fields. + The tar command in Version 7 AT&T UNIX used the following conventions + (this is also documented in early BSD manpages): the pathname must be + null-terminated; the mode, uid, and gid fields must end in a space and a + null byte; the size and mtime fields must end in a space; the checksum is + terminated by a null and a space. Early implementations filled the + numeric fields with leading spaces. This seems to have been common prac- + tice until the IEEE Std 1003.1-1988 (``POSIX.1'') standard was released. + For best portability, modern implementations should fill the numeric + fields with leading zeros. + + Pre-POSIX Archives + An early draft of IEEE Std 1003.1-1988 (``POSIX.1'') served as the basis + for John Gilmore's pdtar program and many system implementations from the + late 1980s and early 1990s. These archives generally follow the POSIX + ustar format described below with the following variations: + o The magic value is ``ustar '' (note the following space). The + version field contains a space character followed by a null. + o The numeric fields are generally filled with leading spaces (not + leading zeros as recommended in the final standard). + o The prefix field is often not used, limiting pathnames to the 100 + characters of old-style archives. + + POSIX ustar Archives + IEEE Std 1003.1-1988 (``POSIX.1'') defined a standard tar file format to + be read and written by compliant implementations of tar(1). This format + is often called the ``ustar'' format, after the magic value used in the + header. (The name is an acronym for ``Unix Standard TAR''.) It extends + the historic format with new fields: + + struct header_posix_ustar { + char name[100]; + char mode[8]; + char uid[8]; + char gid[8]; + char size[12]; + char mtime[12]; + char checksum[8]; + char typeflag[1]; + char linkname[100]; + char magic[6]; + char version[2]; + char uname[32]; + char gname[32]; + char devmajor[8]; + char devminor[8]; + char prefix[155]; + char pad[12]; + }; + + typeflag + Type of entry. POSIX extended the earlier linkflag field with + several new type values: + ``0'' Regular file. NULL should be treated as a synonym, for + compatibility purposes. + ``1'' Hard link. + ``2'' Symbolic link. + ``3'' Character device node. + ``4'' Block device node. + ``5'' Directory. + ``6'' FIFO node. + ``7'' Reserved. + Other A POSIX-compliant implementation must treat any unrecog- + nized typeflag value as a regular file. In particular, + writers should ensure that all entries have a valid file- + name so that they can be restored by readers that do not + support the corresponding extension. Uppercase letters + "A" through "Z" are reserved for custom extensions. Note + that sockets and whiteout entries are not archivable. + It is worth noting that the size field, in particular, has dif- + ferent meanings depending on the type. For regular files, of + course, it indicates the amount of data following the header. + For directories, it may be used to indicate the total size of all + files in the directory, for use by operating systems that pre- + allocate directory space. For all other types, it should be set + to zero by writers and ignored by readers. + + magic Contains the magic value ``ustar'' followed by a NULL byte to + indicate that this is a POSIX standard archive. Full compliance + requires the uname and gname fields be properly set. + + version + Version. This should be ``00'' (two copies of the ASCII digit + zero) for POSIX standard archives. + + uname, gname + User and group names, as null-terminated ASCII strings. These + should be used in preference to the uid/gid values when they are + set and the corresponding names exist on the system. + + devmajor, devminor + Major and minor numbers for character device or block device + entry. + + prefix First part of pathname. If the pathname is too long to fit in + the 100 bytes provided by the standard format, it can be split at + any / character with the first portion going here. If the prefix + field is not empty, the reader will prepend the prefix value and + a / character to the regular name field to obtain the full path- + name. + + Note that all unused bytes must be set to NULL. + + Field termination is specified slightly differently by POSIX than by pre- + vious implementations. The magic, uname, and gname fields must have a + trailing NULL. The pathname, linkname, and prefix fields must have a + trailing NULL unless they fill the entire field. (In particular, it is + possible to store a 256-character pathname if it happens to have a / as + the 156th character.) POSIX requires numeric fields to be zero-padded in + the front, and allows them to be terminated with either space or NULL + characters. + + Currently, most tar implementations comply with the ustar format, occa- + sionally extending it by adding new fields to the blank area at the end + of the header record. + + Pax Interchange Format + There are many attributes that cannot be portably stored in a POSIX ustar + archive. IEEE Std 1003.1-2001 (``POSIX.1'') defined a ``pax interchange + format'' that uses two new types of entries to hold text-formatted meta- + data that applies to following entries. Note that a pax interchange for- + mat archive is a ustar archive in every respect. The new data is stored + in ustar-compatible archive entries that use the ``x'' or ``g'' typeflag. + In particular, older implementations that do not fully support these + extensions will extract the metadata into regular files, where the meta- + data can be examined as necessary. + + An entry in a pax interchange format archive consists of one or two stan- + dard ustar entries, each with its own header and data. The first + optional entry stores the extended attributes for the following entry. + This optional first entry has an "x" typeflag and a size field that indi- + cates the total size of the extended attributes. The extended attributes + themselves are stored as a series of text-format lines encoded in the + portable UTF-8 encoding. Each line consists of a decimal number, a + space, a key string, an equals sign, a value string, and a new line. The + decimal number indicates the length of the entire line, including the + initial length field and the trailing newline. An example of such a + field is: + 25 ctime=1084839148.1212\n + Keys in all lowercase are standard keys. Vendors can add their own keys + by prefixing them with an all uppercase vendor name and a period. Note + that, unlike the historic header, numeric values are stored using deci- + mal, not octal. A description of some common keys follows: + + atime, ctime, mtime + File access, inode change, and modification times. These fields + can be negative or include a decimal point and a fractional + value. + + uname, uid, gname, gid + User name, group name, and numeric UID and GID values. The user + name and group name stored here are encoded in UTF8 and can thus + include non-ASCII characters. The UID and GID fields can be of + arbitrary length. + + linkpath + The full path of the linked-to file. Note that this is encoded + in UTF8 and can thus include non-ASCII characters. + + path The full pathname of the entry. Note that this is encoded in + UTF8 and can thus include non-ASCII characters. + + realtime.*, security.* + These keys are reserved and may be used for future standardiza- + tion. + + size The size of the file. Note that there is no length limit on this + field, allowing conforming archives to store files much larger + than the historic 8GB limit. + + SCHILY.* + Vendor-specific attributes used by Joerg Schilling's star imple- + mentation. + + SCHILY.acl.access, SCHILY.acl.default + Stores the access and default ACLs as textual strings in a format + that is an extension of the format specified by POSIX.1e draft + 17. In particular, each user or group access specification can + include a fourth colon-separated field with the numeric UID or + GID. This allows ACLs to be restored on systems that may not + have complete user or group information available (such as when + NIS/YP or LDAP services are temporarily unavailable). + + SCHILY.devminor, SCHILY.devmajor + The full minor and major numbers for device nodes. + + SCHILY.dev, SCHILY.ino, SCHILY.nlinks + The device number, inode number, and link count for the entry. + In particular, note that a pax interchange format archive using + Joerg Schilling's SCHILY.* extensions can store all of the data + from struct stat. + + LIBARCHIVE.xattr.namespace.key + Libarchive stores POSIX.1e-style extended attributes using keys + of this form. The key value is URL-encoded: All non-ASCII char- + acters and the two special characters ``='' and ``%'' are encoded + as ``%'' followed by two uppercase hexadecimal digits. The value + of this key is the extended attribute value encoded in base 64. + XXX Detail the base-64 format here XXX + + VENDOR.* + XXX document other vendor-specific extensions XXX + + Any values stored in an extended attribute override the corresponding + values in the regular tar header. Note that compliant readers should + ignore the regular fields when they are overridden. This is important, + as existing archivers are known to store non-compliant values in the + standard header fields in this situation. There are no limits on length + for any of these fields. In particular, numeric fields can be arbitrar- + ily large. All text fields are encoded in UTF8. Compliant writers + should store only portable 7-bit ASCII characters in the standard ustar + header and use extended attributes whenever a text value contains non- + ASCII characters. + + In addition to the x entry described above, the pax interchange format + also supports a g entry. The g entry is identical in format, but speci- + fies attributes that serve as defaults for all subsequent archive + entries. The g entry is not widely used. + + Besides the new x and g entries, the pax interchange format has a few + other minor variations from the earlier ustar format. The most troubling + one is that hardlinks are permitted to have data following them. This + allows readers to restore any hardlink to a file without having to rewind + the archive to find an earlier entry. However, it creates complications + for robust readers, as it is no longer clear whether or not they should + ignore the size field for hardlink entries. + + GNU Tar Archives + The GNU tar program started with a pre-POSIX format similar to that + described earlier and has extended it using several different mechanisms: + It added new fields to the empty space in the header (some of which was + later used by POSIX for conflicting purposes); it allowed the header to + be continued over multiple records; and it defined new entries that mod- + ify following entries (similar in principle to the x entry described + above, but each GNU special entry is single-purpose, unlike the general- + purpose x entry). As a result, GNU tar archives are not POSIX compati- + ble, although more lenient POSIX-compliant readers can successfully + extract most GNU tar archives. + + struct header_gnu_tar { + char name[100]; + char mode[8]; + char uid[8]; + char gid[8]; + char size[12]; + char mtime[12]; + char checksum[8]; + char typeflag[1]; + char linkname[100]; + char magic[6]; + char version[2]; + char uname[32]; + char gname[32]; + char devmajor[8]; + char devminor[8]; + char atime[12]; + char ctime[12]; + char offset[12]; + char longnames[4]; + char unused[1]; + struct { + char offset[12]; + char numbytes[12]; + } sparse[4]; + char isextended[1]; + char realsize[12]; + char pad[17]; + }; + + typeflag + GNU tar uses the following special entry types, in addition to + those defined by POSIX: + + 7 GNU tar treats type "7" records identically to type "0" + records, except on one obscure RTOS where they are used + to indicate the pre-allocation of a contiguous file on + disk. + + D This indicates a directory entry. Unlike the POSIX-stan- + dard "5" typeflag, the header is followed by data records + listing the names of files in this directory. Each name + is preceded by an ASCII "Y" if the file is stored in this + archive or "N" if the file is not stored in this archive. + Each name is terminated with a null, and an extra null + marks the end of the name list. The purpose of this + entry is to support incremental backups; a program + restoring from such an archive may wish to delete files + on disk that did not exist in the directory when the ar- + chive was made. + + Note that the "D" typeflag specifically violates POSIX, + which requires that unrecognized typeflags be restored as + normal files. In this case, restoring the "D" entry as a + file could interfere with subsequent creation of the + like-named directory. + + K The data for this entry is a long linkname for the fol- + lowing regular entry. + + L The data for this entry is a long pathname for the fol- + lowing regular entry. + + M This is a continuation of the last file on the previous + volume. GNU multi-volume archives guarantee that each + volume begins with a valid entry header. To ensure this, + a file may be split, with part stored at the end of one + volume, and part stored at the beginning of the next vol- + ume. The "M" typeflag indicates that this entry contin- + ues an existing file. Such entries can only occur as the + first or second entry in an archive (the latter only if + the first entry is a volume label). The size field spec- + ifies the size of this entry. The offset field at bytes + 369-380 specifies the offset where this file fragment + begins. The realsize field specifies the total size of + the file (which must equal size plus offset). When + extracting, GNU tar checks that the header file name is + the one it is expecting, that the header offset is in the + correct sequence, and that the sum of offset and size is + equal to realsize. FreeBSD's version of GNU tar does not + handle the corner case of an archive's being continued in + the middle of a long name or other extension header. + + N Type "N" records are no longer generated by GNU tar. + They contained a list of files to be renamed or symlinked + after extraction; this was originally used to support + long names. The contents of this record are a text + description of the operations to be done, in the form + ``Rename %s to %s\n'' or ``Symlink %s to %s\n''; in + either case, both filenames are escaped using K&R C syn- + tax. + + S This is a ``sparse'' regular file. Sparse files are + stored as a series of fragments. The header contains a + list of fragment offset/length pairs. If more than four + such entries are required, the header is extended as nec- + essary with ``extra'' header extensions (an older format + that is no longer used), or ``sparse'' extensions. + + V The name field should be interpreted as a tape/volume + header name. This entry should generally be ignored on + extraction. + + magic The magic field holds the five characters ``ustar'' followed by a + space. Note that POSIX ustar archives have a trailing null. + + version + The version field holds a space character followed by a null. + Note that POSIX ustar archives use two copies of the ASCII digit + ``0''. + + atime, ctime + The time the file was last accessed and the time of last change + of file information, stored in octal as with mtime. + + longnames + This field is apparently no longer used. + + Sparse offset / numbytes + Each such structure specifies a single fragment of a sparse file. + The two fields store values as octal numbers. The fragments are + each padded to a multiple of 512 bytes in the archive. On + extraction, the list of fragments is collected from the header + (including any extension headers), and the data is then read and + written to the file at appropriate offsets. + + isextended + If this is set to non-zero, the header will be followed by addi- + tional ``sparse header'' records. Each such record contains + information about as many as 21 additional sparse blocks as shown + here: + + struct gnu_sparse_header { + struct { + char offset[12]; + char numbytes[12]; + } sparse[21]; + char isextended[1]; + char padding[7]; + }; + + realsize + A binary representation of the file's complete size, with a much + larger range than the POSIX file size. In particular, with M + type files, the current entry is only a portion of the file. In + that case, the POSIX size field will indicate the size of this + entry; the realsize field will indicate the total size of the + file. + + Solaris Tar + XXX More Details Needed XXX + + Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an + ``extended'' format that is fundamentally similar to pax interchange for- + mat, with the following differences: + o Extended attributes are stored in an entry whose type is X, not + x, as used by pax interchange format. The detailed format of + this entry appears to be the same as detailed above for the x + entry. + o An additional A entry is used to store an ACL for the following + regular entry. The body of this entry contains a seven-digit + octal number (whose value is 01000000 plus the number of ACL + entries) followed by a zero byte, followed by the textual ACL + description. + + Other Extensions + One common extension, utilized by GNU tar, star, and other newer tar + implementations, permits binary numbers in the standard numeric fields. + This is flagged by setting the high bit of the first character. This + permits 95-bit values for the length and time fields and 63-bit values + for the uid, gid, and device numbers. GNU tar supports this extension + for the length, mtime, ctime, and atime fields. Joerg Schilling's star + program supports this extension for all numeric fields. Note that this + extension is largely obsoleted by the extended attribute record provided + by the pax interchange format. + + Another early GNU extension allowed base-64 values rather than octal. + This extension was short-lived and such archives are almost never seen. + However, there is still code in GNU tar to support them; this code is + responsible for a very cryptic warning message that is sometimes seen + when GNU tar encounters a damaged archive. + +SEE ALSO + ar(1), pax(1), tar(1) + +STANDARDS + The tar utility is no longer a part of POSIX or the Single Unix Standard. + It last appeared in Version 2 of the Single UNIX Specification + (``SUSv2''). It has been supplanted in subsequent standards by pax(1). + The ustar format is currently part of the specification for the pax(1) + utility. The pax interchange file format is new with IEEE Std + 1003.1-2001 (``POSIX.1''). + +HISTORY + A tar command appeared in Seventh Edition Unix, which was released in + January, 1979. It replaced the tp program from Fourth Edition Unix which + in turn replaced the tap program from First Edition Unix. John Gilmore's + pdtar public-domain implementation (circa 1987) was highly influential + and formed the basis of GNU tar. Joerg Shilling's star archiver is + another open-source (GPL) archiver (originally developed circa 1985) + which features complete support for pax interchange format. + +FreeBSD 6.0 May 20, 2004 FreeBSD 6.0 |