Diffstat (limited to 'archivers/libarchive/files/doc/text/tar.5.txt')
-rw-r--r-- archivers/libarchive/files/doc/text/tar.5.txt | 534
1 files changed, 534 insertions, 0 deletions
diff --git a/archivers/libarchive/files/doc/text/tar.5.txt b/archivers/libarchive/files/doc/text/tar.5.txt
new file mode 100644
index 00000000000..24b77e15984
--- /dev/null
+++ b/archivers/libarchive/files/doc/text/tar.5.txt
@@ -0,0 +1,534 @@
+TAR(5) FreeBSD File Formats Manual TAR(5)
+
+NAME
+ tar -- format of tape archive files
+
+DESCRIPTION
+ The tar archive format collects any number of files, directories, and
+ other file system objects (symbolic links, device nodes, etc.) into a
+ single stream of bytes. The format was originally designed to be used
+ with tape drives that operate with fixed-size blocks, but is widely used
+ as a general packaging mechanism.
+
+ General Format
+ A tar archive consists of a series of 512-byte records. Each file system
+ object requires a header record which stores basic metadata (pathname,
+ owner, permissions, etc.) and zero or more records containing any file
+ data. The end of the archive is indicated by two records consisting
+ entirely of zero bytes.
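+
+ As a minimal sketch of the layout described above, a reader might
+ compute how many data records follow a header and recognize the
+ all-zero records that mark the end of the archive. The names
+ TAR_RECORD_SIZE, tar_data_records, and tar_record_is_zero are
+ illustrative assumptions and do not come from any library:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512

/* Number of 512-byte data records that follow a header whose body is
 * `size` bytes long; the last record is padded to a full record. */
uint64_t tar_data_records(uint64_t size)
{
    return size / TAR_RECORD_SIZE + (size % TAR_RECORD_SIZE != 0);
}

/* Returns 1 if the record consists entirely of zero bytes, else 0.
 * Two such records in a row indicate the end of the archive. */
int tar_record_is_zero(const unsigned char rec[TAR_RECORD_SIZE])
{
    size_t i;

    assert(rec != NULL);
    for (i = 0; i < TAR_RECORD_SIZE; i++)
        if (rec[i] != 0)
            return 0;
    return 1;
}
```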
+
+ For compatibility with tape drives that use fixed block sizes, programs
+ that read or write tar files always read or write a fixed number of
+ records with each I/O operation. These ``blocks'' are always a multiple
+ of the record size. The most common block size--and the maximum sup-
+ ported by historic implementations--is 10240 bytes or 20 records. (Note:
+ the terms ``block'' and ``record'' here are not entirely standard; this
+ document follows the convention established by John Gilmore in document-
+ ing pdtar.)
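+
+ The blocking arithmetic can be sketched as follows. This assumes that
+ a writer fills out the final block with extra zero records, which
+ follows from always writing whole blocks; tar_padded_size is a
+ hypothetical helper, not an existing interface:
+

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes written for an archive that occupies `records` 512-byte
 * records (headers, data, and the two-record end marker) when output
 * is issued in whole blocks of `blocking_factor` records. */
uint64_t tar_padded_size(uint64_t records, unsigned blocking_factor)
{
    uint64_t block, bytes, rem;

    assert(blocking_factor > 0);
    block = (uint64_t)blocking_factor * 512;
    bytes = records * 512;
    rem = bytes % block;
    return rem == 0 ? bytes : bytes + (block - rem);
}
```
+
+ With the common blocking factor of 20, an archive that needs only four
+ records still occupies one full 10240-byte block.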
+
+ Old-Style Archive Format
+ The original tar archive format has been extended many times to include
+ additional information that various implementors found necessary. This
+ section describes the variant implemented by the tar command included in
+ Version 7 AT&T UNIX, which is one of the earliest widely-used versions of
+ the tar program.
+
+ The header record for an old-style tar archive consists of the following:
+
+ struct header_old_tar {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char linkflag[1];
+ char linkname[100];
+ char pad[255];
+ };
+ All unused bytes in the header record are filled with nulls.
+
+ name Pathname, stored as a null-terminated string. Early tar imple-
+ mentations only stored regular files (including hardlinks to
+ those files). One common early convention used a trailing "/"
+ character to indicate a directory name, allowing directory per-
+ missions and owner information to be archived and restored.
+
+ mode File mode, stored as an octal number in ASCII.
+
+ uid, gid
+ User id and group id of owner, as octal numbers in ASCII.
+
+ size Size of file, as octal number in ASCII. For regular files only,
+ this indicates the amount of data that follows the header. In
+ particular, this field was ignored by early tar implementations
+ when extracting hardlinks. Modern writers should always store a
+ zero length for hardlink entries.
+
+ mtime Modification time of file, as an octal number in ASCII. This
+ indicates the number of seconds since the start of the epoch,
+ 00:00:00 UTC January 1, 1970. Note that negative values should
+ be avoided here, as they are handled inconsistently.
+
+ checksum
+ Header checksum, stored as an octal number in ASCII. To compute
+ the checksum, set the checksum field to all spaces, then sum all
+ bytes in the header using unsigned arithmetic. This field should
+ be stored as six octal digits followed by a null and a space
+ character. Note that many early implementations of tar used
+ signed arithmetic for the checksum field, which can cause inter-
+ operability problems when transferring archives between systems.
+ Modern robust readers compute the checksum both ways and accept
+ the header if either computation matches.
+
+ linkflag, linkname
+ In order to preserve hardlinks and conserve tape, a file with
+ multiple links is only written to the archive the first time it
+ is encountered. The next time it is encountered, the linkflag is
+ set to an ASCII `1' and the linkname field holds the first name
+ under which this file appears. (Note that regular files have a
+ null value in the linkflag field.)
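+
+ The checksum rule described above can be sketched as follows. The
+ offset 148 is derived from the field sizes in struct header_old_tar;
+ the function names are illustrative assumptions. The signed variant
+ relies on the usual two's complement conversion to signed char:
+

```c
#include <assert.h>
#include <stddef.h>

#define TAR_CHKSUM_OFF 148  /* name+mode+uid+gid+size+mtime = 148 bytes */
#define TAR_CHKSUM_LEN 8

/* Sum all 512 header bytes, counting the checksum field as spaces.
 * If use_signed is non-zero, bytes are summed as signed char, as some
 * early implementations did. */
long tar_header_sum(const unsigned char hdr[512], int use_signed)
{
    long sum = 0;
    size_t i;

    assert(hdr != NULL);
    for (i = 0; i < 512; i++) {
        if (i >= TAR_CHKSUM_OFF && i < TAR_CHKSUM_OFF + TAR_CHKSUM_LEN)
            sum += ' ';
        else if (use_signed)
            sum += (signed char)hdr[i];
        else
            sum += hdr[i];
    }
    return sum;
}

/* Accept the header if the stored value matches either computation. */
int tar_checksum_ok(const unsigned char hdr[512], long stored)
{
    return stored == tar_header_sum(hdr, 0) ||
           stored == tar_header_sum(hdr, 1);
}
```
+
+ The two sums differ only when the header contains bytes with the high
+ bit set, for example in a pathname with non-ASCII characters.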
+
+ Early tar implementations varied in how they terminated these fields.
+ The tar command in Version 7 AT&T UNIX used the following conventions
+ (this is also documented in early BSD manpages): the pathname must be
+ null-terminated; the mode, uid, and gid fields must end in a space and a
+ null byte; the size and mtime fields must end in a space; the checksum is
+ terminated by a null and a space. Early implementations filled the
+ numeric fields with leading spaces. This seems to have been common prac-
+ tice until the IEEE Std 1003.1-1988 (``POSIX.1'') standard was released.
+ For best portability, modern implementations should fill the numeric
+ fields with leading zeros.
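+
+ A tolerant reader and a conservative writer for these numeric fields
+ might look like the sketch below. The function names are hypothetical,
+ and the writer terminates every field with a single null, which is
+ only one of the conventions listed above:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse an ASCII octal field of `len` bytes. Leading spaces are
 * skipped and parsing stops at the first byte that is not an octal
 * digit (typically a space or null). Returns -1 if no digit is found.
 * Intended for the 8- and 12-byte header fields. */
int64_t tar_parse_octal(const char *field, size_t len)
{
    size_t i = 0;
    int64_t value = 0;
    int digits = 0;

    assert(field != NULL);
    while (i < len && field[i] == ' ')
        i++;
    while (i < len && field[i] >= '0' && field[i] <= '7') {
        value = value * 8 + (field[i] - '0');
        digits++;
        i++;
    }
    return digits > 0 ? value : -1;
}

/* Write `value` as zero-padded octal into a field of `len` bytes:
 * len-1 digits followed by a null. Returns 0 on success, or -1 if the
 * value does not fit. */
int tar_format_octal(char *field, size_t len, uint64_t value)
{
    size_t i;

    assert(field != NULL);
    if (len < 2)
        return -1;
    field[len - 1] = '\0';
    for (i = len - 1; i-- > 0; ) {
        field[i] = (char)('0' + (value & 7));
        value >>= 3;
    }
    return value == 0 ? 0 : -1;
}
```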
+
+ Pre-POSIX Archives
+ An early draft of IEEE Std 1003.1-1988 (``POSIX.1'') served as the basis
+ for John Gilmore's pdtar program and many system implementations from the
+ late 1980s and early 1990s. These archives generally follow the POSIX
+ ustar format described below with the following variations:
+ o The magic value is ``ustar '' (note the following space). The
+ version field contains a space character followed by a null.
+ o The numeric fields are generally filled with leading spaces (not
+ leading zeros as recommended in the final standard).
+ o The prefix field is often not used, limiting pathnames to the 100
+ characters of old-style archives.
+
+ POSIX ustar Archives
+ IEEE Std 1003.1-1988 (``POSIX.1'') defined a standard tar file format to
+ be read and written by compliant implementations of tar(1). This format
+ is often called the ``ustar'' format, after the magic value used in the
+ header. (The name is an acronym for ``Unix Standard TAR''.) It extends
+ the historic format with new fields:
+
+ struct header_posix_ustar {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char prefix[155];
+ char pad[12];
+ };
+
+ typeflag
+ Type of entry. POSIX extended the earlier linkflag field with
+ several new type values:
+ ``0'' Regular file. A null byte should be treated as a synonym,
+ for compatibility with old-style archives.
+ ``1'' Hard link.
+ ``2'' Symbolic link.
+ ``3'' Character device node.
+ ``4'' Block device node.
+ ``5'' Directory.
+ ``6'' FIFO node.
+ ``7'' Reserved.
+ Other A POSIX-compliant implementation must treat any unrecog-
+ nized typeflag value as a regular file. In particular,
+ writers should ensure that all entries have a valid file-
+ name so that they can be restored by readers that do not
+ support the corresponding extension. Uppercase letters
+ "A" through "Z" are reserved for custom extensions. Note
+ that sockets and whiteout entries are not archivable.
+ It is worth noting that the size field, in particular, has dif-
+ ferent meanings depending on the type. For regular files, of
+ course, it indicates the amount of data following the header.
+ For directories, it may be used to indicate the total size of all
+ files in the directory, for use by operating systems that pre-
+ allocate directory space. For all other types, it should be set
+ to zero by writers and ignored by readers.
+
+ magic Contains the magic value ``ustar'' followed by a NULL byte to
+ indicate that this is a POSIX standard archive. Full compliance
+ requires the uname and gname fields be properly set.
+
+ version
+ Version. This should be ``00'' (two copies of the ASCII digit
+ zero) for POSIX standard archives.
+
+ uname, gname
+ User and group names, as null-terminated ASCII strings. These
+ should be used in preference to the uid/gid values when they are
+ set and the corresponding names exist on the system.
+
+ devmajor, devminor
+ Major and minor numbers for character device or block device
+ entry.
+
+ prefix First part of pathname. If the pathname is too long to fit in
+ the 100 bytes provided by the standard format, it can be split at
+ any / character with the first portion going here. If the prefix
+ field is not empty, the reader will prepend the prefix value and
+ a / character to the regular name field to obtain the full path-
+ name.
+
+ Note that all unused bytes must be set to NULL.
+
+ Field termination is specified slightly differently by POSIX than by pre-
+ vious implementations. The magic, uname, and gname fields must have a
+ trailing NULL. The pathname, linkname, and prefix fields must have a
+ trailing NULL unless they fill the entire field. (In particular, it is
+ possible to store a 256-character pathname if it happens to have a / as
+ the 156th character.) POSIX requires numeric fields to be zero-padded in
+ the front, and allows them to be terminated with either space or NULL
+ characters.
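+
+ Because the name and prefix fields may lack a trailing null when
+ full, a reader cannot treat them as ordinary C strings. The following
+ sketch reassembles the full pathname; tar_field_len and tar_full_path
+ are illustrative names, and the caller is assumed to supply a buffer
+ of at least 257 bytes:
+

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a header string field that may omit the trailing null
 * when it fills the whole field. */
size_t tar_field_len(const char *field, size_t max)
{
    const char *end = memchr(field, '\0', max);
    return end ? (size_t)(end - field) : max;
}

/* Build the full pathname from the prefix (155 bytes) and name (100
 * bytes) fields. `out` must hold at least 155 + 1 + 100 + 1 bytes.
 * Returns the length of the resulting string. */
size_t tar_full_path(const char prefix[155], const char name[100],
                     char *out)
{
    size_t plen, nlen, n = 0;

    assert(prefix != NULL && name != NULL && out != NULL);
    plen = tar_field_len(prefix, 155);
    nlen = tar_field_len(name, 100);
    if (plen > 0) {
        memcpy(out, prefix, plen);
        n = plen;
        out[n++] = '/';        /* separator is implied, not stored */
    }
    memcpy(out + n, name, nlen);
    n += nlen;
    out[n] = '\0';
    return n;
}
```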
+
+ Currently, most tar implementations comply with the ustar format, occa-
+ sionally extending it by adding new fields to the blank area at the end
+ of the header record.
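+
+ The magic and version values distinguish the header variants
+ described so far. As a rough sketch (offsets 257 and 263 follow from
+ the field sizes in struct header_posix_ustar; the enum and function
+ names are assumptions), a reader might classify a header like this:
+

```c
#include <assert.h>
#include <string.h>

enum tar_flavor { TAR_OLD_STYLE, TAR_PRE_POSIX, TAR_POSIX_USTAR };

/* Classify a 512-byte header by its magic and version fields.
 * Anything without a recognized pair is treated as old-style. */
enum tar_flavor tar_detect_flavor(const unsigned char hdr[512])
{
    const unsigned char *magic, *version;

    assert(hdr != NULL);
    magic = hdr + 257;      /* char magic[6]   */
    version = hdr + 263;    /* char version[2] */

    /* POSIX: "ustar" plus a null, then the two digits "00". */
    if (memcmp(magic, "ustar\0", 6) == 0 &&
        memcmp(version, "00", 2) == 0)
        return TAR_POSIX_USTAR;
    /* Pre-POSIX: "ustar" plus a space, then a space and a null. */
    if (memcmp(magic, "ustar ", 6) == 0 &&
        memcmp(version, " \0", 2) == 0)
        return TAR_PRE_POSIX;
    return TAR_OLD_STYLE;
}
```
+
+ A robust reader would also verify the checksum before trusting the
+ result, since an old-style header may hold arbitrary bytes here.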
+
+ Pax Interchange Format
+ There are many attributes that cannot be portably stored in a POSIX ustar
+ archive. IEEE Std 1003.1-2001 (``POSIX.1'') defined a ``pax interchange
+ format'' that uses two new types of entries to hold text-formatted meta-
+ data that applies to following entries. Note that a pax interchange for-
+ mat archive is a ustar archive in every respect. The new data is stored
+ in ustar-compatible archive entries that use the ``x'' or ``g'' typeflag.
+ In particular, older implementations that do not fully support these
+ extensions will extract the metadata into regular files, where the meta-
+ data can be examined as necessary.
+
+ An entry in a pax interchange format archive consists of one or two stan-
+ dard ustar entries, each with its own header and data. The first
+ optional entry stores the extended attributes for the following entry.
+ This optional first entry has an "x" typeflag and a size field that indi-
+ cates the total size of the extended attributes. The extended attributes
+ themselves are stored as a series of text-format lines encoded in the
+ portable UTF-8 encoding. Each line consists of a decimal number, a
+ space, a key string, an equals sign, a value string, and a new line. The
+ decimal number indicates the length of the entire line, including the
+ initial length field and the trailing newline. An example of such a
+ field is:
+ 25 ctime=1084839148.1212\n
+ Keys in all lowercase are standard keys. Vendors can add their own keys
+ by prefixing them with an all uppercase vendor name and a period. Note
+ that, unlike the historic header, numeric values are stored using deci-
+ mal, not octal. A description of some common keys follows:
+
+ atime, ctime, mtime
+ File access, inode change, and modification times. These fields
+ can be negative or include a decimal point and a fractional
+ value.
+
+ uname, uid, gname, gid
+ User name, group name, and numeric UID and GID values. The user
+ name and group name stored here are encoded in UTF-8 and can thus
+ include non-ASCII characters. The UID and GID fields can be of
+ arbitrary length.
+
+ linkpath
+ The full path of the linked-to file. Note that this is encoded
+ in UTF-8 and can thus include non-ASCII characters.
+
+ path The full pathname of the entry. Note that this is encoded in
+ UTF-8 and can thus include non-ASCII characters.
+
+ realtime.*, security.*
+ These keys are reserved and may be used for future standardiza-
+ tion.
+
+ size The size of the file. Note that there is no length limit on this
+ field, allowing conforming archives to store files much larger
+ than the historic 8GB limit.
+
+ SCHILY.*
+ Vendor-specific attributes used by Joerg Schilling's star imple-
+ mentation.
+
+ SCHILY.acl.access, SCHILY.acl.default
+ Stores the access and default ACLs as textual strings in a format
+ that is an extension of the format specified by POSIX.1e draft
+ 17. In particular, each user or group access specification can
+ include a fourth colon-separated field with the numeric UID or
+ GID. This allows ACLs to be restored on systems that may not
+ have complete user or group information available (such as when
+ NIS/YP or LDAP services are temporarily unavailable).
+
+ SCHILY.devminor, SCHILY.devmajor
+ The full minor and major numbers for device nodes.
+
+ SCHILY.dev, SCHILY.ino, SCHILY.nlinks
+ The device number, inode number, and link count for the entry.
+ In particular, note that a pax interchange format archive using
+ Joerg Schilling's SCHILY.* extensions can store all of the data
+ from struct stat.
+
+ LIBARCHIVE.xattr.namespace.key
+ Libarchive stores POSIX.1e-style extended attributes using keys
+ of this form. The key value is URL-encoded: All non-ASCII char-
+ acters and the two special characters ``='' and ``%'' are encoded
+ as ``%'' followed by two uppercase hexadecimal digits. The value
+ of this key is the extended attribute value encoded in base 64.
+ XXX Detail the base-64 format here XXX
+
+ VENDOR.*
+ XXX document other vendor-specific extensions XXX
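+
+ The URL-style encoding described for LIBARCHIVE.xattr keys can be
+ sketched as below. The function xattr_key_encode is a hypothetical
+ helper written for this illustration; it is not libarchive's own
+ code:
+

```c
#include <assert.h>
#include <stddef.h>

/* Encode `key` so that non-ASCII bytes, '=' and '%' become '%'
 * followed by two uppercase hexadecimal digits. `cap` is the size of
 * `out` including the terminating null. Returns the encoded length,
 * or (size_t)-1 if `out` is too small. */
size_t xattr_key_encode(const char *key, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    assert(key != NULL && out != NULL);
    for (; *key != '\0'; key++) {
        unsigned char c = (unsigned char)*key;

        if (c >= 0x80 || c == '=' || c == '%') {
            if (n + 3 >= cap)
                return (size_t)-1;
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 15];
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            out[n++] = (char)c;
        }
    }
    if (n >= cap)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```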
+
+ Any values stored in an extended attribute override the corresponding
+ values in the regular tar header. Note that compliant readers should
+ ignore the regular fields when they are overridden. This is important,
+ as existing archivers are known to store non-compliant values in the
+ standard header fields in this situation. There are no limits on length
+ for any of these fields. In particular, numeric fields can be arbitrar-
+ ily large. All text fields are encoded in UTF-8. Compliant writers
+ should store only portable 7-bit ASCII characters in the standard ustar
+ header and use extended attributes whenever a text value contains non-
+ ASCII characters.
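+
+ Writing an extended attribute line is slightly awkward because the
+ leading decimal length counts its own digits. A minimal sketch,
+ assuming a C99 snprintf and using the hypothetical name
+ pax_format_record, iterates until the length is self-consistent:
+

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one record "LEN key=value\n" into `out`, where LEN is the
 * length of the whole line including LEN itself and the newline.
 * Returns the record length, or -1 if `out` (capacity `cap`) is too
 * small. */
int pax_format_record(const char *key, const char *value,
                      char *out, size_t cap)
{
    size_t body, total, digits;
    int n;

    assert(key != NULL && value != NULL && out != NULL);
    /* Everything except the length digits:
     * space, key, '=', value, newline. */
    body = 1 + strlen(key) + 1 + strlen(value) + 1;
    total = body + 1;               /* first guess: one digit */
    for (;;) {
        size_t t = total;

        digits = 0;
        do { digits++; t /= 10; } while (t != 0);
        if (body + digits == total)
            break;
        total = body + digits;      /* grow and try again */
    }
    if (total + 1 > cap)
        return -1;
    n = snprintf(out, cap, "%zu %s=%s\n", total, key, value);
    return n == (int)total ? n : -1;
}
```
+
+ For the key ctime and the value 1084839148.1212 this yields the
+ 25-byte line used as the example in this section.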
+
+ In addition to the x entry described above, the pax interchange format
+ also supports a g entry. The g entry is identical in format, but speci-
+ fies attributes that serve as defaults for all subsequent archive
+ entries. The g entry is not widely used.
+
+ Besides the new x and g entries, the pax interchange format has a few
+ other minor variations from the earlier ustar format. The most troubling
+ one is that hardlinks are permitted to have data following them. This
+ allows readers to restore any hardlink to a file without having to rewind
+ the archive to find an earlier entry. However, it creates complications
+ for robust readers, as it is no longer clear whether or not they should
+ ignore the size field for hardlink entries.
+
+ GNU Tar Archives
+ The GNU tar program started with a pre-POSIX format similar to that
+ described earlier and has extended it using several different mechanisms:
+ It added new fields to the empty space in the header (some of which was
+ later used by POSIX for conflicting purposes); it allowed the header to
+ be continued over multiple records; and it defined new entries that mod-
+ ify following entries (similar in principle to the x entry described
+ above, but each GNU special entry is single-purpose, unlike the general-
+ purpose x entry). As a result, GNU tar archives are not POSIX compati-
+ ble, although more lenient POSIX-compliant readers can successfully
+ extract most GNU tar archives.
+
+ struct header_gnu_tar {
+ char name[100];
+ char mode[8];
+ char uid[8];
+ char gid[8];
+ char size[12];
+ char mtime[12];
+ char checksum[8];
+ char typeflag[1];
+ char linkname[100];
+ char magic[6];
+ char version[2];
+ char uname[32];
+ char gname[32];
+ char devmajor[8];
+ char devminor[8];
+ char atime[12];
+ char ctime[12];
+ char offset[12];
+ char longnames[4];
+ char unused[1];
+ struct {
+ char offset[12];
+ char numbytes[12];
+ } sparse[4];
+ char isextended[1];
+ char realsize[12];
+ char pad[17];
+ };
+
+ typeflag
+ GNU tar uses the following special entry types, in addition to
+ those defined by POSIX:
+
+ 7 GNU tar treats type "7" records identically to type "0"
+ records, except on one obscure RTOS where they are used
+ to indicate the pre-allocation of a contiguous file on
+ disk.
+
+ D This indicates a directory entry. Unlike the POSIX-stan-
+ dard "5" typeflag, the header is followed by data records
+ listing the names of files in this directory. Each name
+ is preceded by an ASCII "Y" if the file is stored in this
+ archive or "N" if the file is not stored in this archive.
+ Each name is terminated with a null, and an extra null
+ marks the end of the name list. The purpose of this
+ entry is to support incremental backups; a program
+ restoring from such an archive may wish to delete files
+ on disk that did not exist in the directory when the ar-
+ chive was made.
+
+ Note that the "D" typeflag specifically violates POSIX,
+ which requires that unrecognized typeflags be restored as
+ normal files. In this case, restoring the "D" entry as a
+ file could interfere with subsequent creation of the
+ like-named directory.
+
+ K The data for this entry is a long linkname for the fol-
+ lowing regular entry.
+
+ L The data for this entry is a long pathname for the fol-
+ lowing regular entry.
+
+ M This is a continuation of the last file on the previous
+ volume. GNU multi-volume archives guarantee that each
+ volume begins with a valid entry header. To ensure this,
+ a file may be split, with part stored at the end of one
+ volume, and part stored at the beginning of the next vol-
+ ume. The "M" typeflag indicates that this entry contin-
+ ues an existing file. Such entries can only occur as the
+ first or second entry in an archive (the latter only if
+ the first entry is a volume label). The size field spec-
+ ifies the size of this entry. The offset field at bytes
+ 369-380 specifies the offset where this file fragment
+ begins. The realsize field specifies the total size of
+ the file (which must equal size plus offset). When
+ extracting, GNU tar checks that the header file name is
+ the one it is expecting, that the header offset is in the
+ correct sequence, and that the sum of offset and size is
+ equal to realsize. FreeBSD's version of GNU tar does not
+ handle the corner case of an archive's being continued in
+ the middle of a long name or other extension header.
+
+ N Type "N" records are no longer generated by GNU tar.
+ They contained a list of files to be renamed or symlinked
+ after extraction; this was originally used to support
+ long names. The contents of this record are a text
+ description of the operations to be done, in the form
+ ``Rename %s to %s\n'' or ``Symlink %s to %s\n''; in
+ either case, both filenames are escaped using K&R C syn-
+ tax.
+
+ S This is a ``sparse'' regular file. Sparse files are
+ stored as a series of fragments. The header contains a
+ list of fragment offset/length pairs. If more than four
+ such entries are required, the header is extended as nec-
+ essary with ``extra'' header extensions (an older format
+ that is no longer used), or ``sparse'' extensions.
+
+ V The name field should be interpreted as a tape/volume
+ header name. This entry should generally be ignored on
+ extraction.
+
+ magic The magic field holds the five characters ``ustar'' followed by a
+ space. Note that POSIX ustar archives have a trailing null.
+
+ version
+ The version field holds a space character followed by a null.
+ Note that POSIX ustar archives use two copies of the ASCII digit
+ ``0''.
+
+ atime, ctime
+ The time the file was last accessed and the time of last change
+ of file information, stored in octal as with mtime.
+
+ longnames
+ This field is apparently no longer used.
+
+ Sparse offset / numbytes
+ Each such structure specifies a single fragment of a sparse file.
+ The two fields store values as octal numbers. The fragments are
+ each padded to a multiple of 512 bytes in the archive. On
+ extraction, the list of fragments is collected from the header
+ (including any extension headers), and the data is then read and
+ written to the file at appropriate offsets.
+
+ isextended
+ If this is set to non-zero, the header will be followed by addi-
+ tional ``sparse header'' records. Each such record contains
+ information about as many as 21 additional sparse blocks as shown
+ here:
+
+ struct gnu_sparse_header {
+ struct {
+ char offset[12];
+ char numbytes[12];
+ } sparse[21];
+ char isextended[1];
+ char padding[7];
+ };
+
+ realsize
+ A binary representation of the file's complete size, with a much
+ larger range than the POSIX file size. In particular, with M
+ type files, the current entry is only a portion of the file. In
+ that case, the POSIX size field will indicate the size of this
+ entry; the realsize field will indicate the total size of the
+ file.
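+
+ The consistency checks described for type M entries can be sketched
+ as follows. The structure and function names are assumptions made for
+ this example, and the three values are assumed to have been decoded
+ from the size, offset, and realsize header fields already:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded values from a type "M" continuation header. */
struct gnu_multivol {
    uint64_t offset;    /* where this fragment begins in the file */
    uint64_t size;      /* bytes stored in this entry             */
    uint64_t realsize;  /* total size of the complete file        */
};

/* Returns 1 if the fragment starts at `expected_offset` (the number
 * of bytes already restored from earlier volumes) and offset + size
 * equals realsize, else 0. */
int gnu_multivol_ok(const struct gnu_multivol *m,
                    uint64_t expected_offset)
{
    assert(m != NULL);
    if (m->offset != expected_offset)
        return 0;
    if (m->offset > m->realsize)    /* guards the subtraction below */
        return 0;
    return m->size == m->realsize - m->offset;
}
```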
+
+ Solaris Tar
+ XXX More Details Needed XXX
+
+ Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
+ ``extended'' format that is fundamentally similar to pax interchange for-
+ mat, with the following differences:
+ o Extended attributes are stored in an entry whose type is X, not
+ x, as used by pax interchange format. The detailed format of
+ this entry appears to be the same as detailed above for the x
+ entry.
+ o An additional A entry is used to store an ACL for the following
+ regular entry. The body of this entry contains a seven-digit
+ octal number (whose value is 01000000 plus the number of ACL
+ entries) followed by a zero byte, followed by the textual ACL
+ description.
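+
+ Going only by the description above, which is itself incomplete, the
+ ACL entry count could be recovered from the body of an A entry as in
+ this tentative sketch. The name solaris_acl_count is hypothetical and
+ the layout has not been checked against Solaris sources:
+

```c
#include <assert.h>
#include <stdlib.h>

/* Parse the seven-digit octal prefix of an "A" entry body. Returns
 * the number of ACL entries, or -1 if the prefix is malformed. The
 * textual ACL description begins at body + 8. */
long solaris_acl_count(const char *body)
{
    char *end;
    long v;

    assert(body != NULL);
    v = strtol(body, &end, 8);
    if (end - body != 7 || *end != '\0' || v < 01000000L)
        return -1;
    return v - 01000000L;
}
```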
+
+ Other Extensions
+ One common extension, utilized by GNU tar, star, and other newer tar
+ implementations, permits binary numbers in the standard numeric fields.
+ This is flagged by setting the high bit of the first character. This
+ permits 95-bit values for the length and time fields and 63-bit values
+ for the uid, gid, and device numbers. GNU tar supports this extension
+ for the length, mtime, ctime, and atime fields. Joerg Schilling's star
+ program supports this extension for all numeric fields. Note that this
+ extension is largely obsoleted by the extended attribute record provided
+ by the pax interchange format.
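+
+ A reader that accepts both encodings might decode a numeric field as
+ sketched below. This assumes the binary form is an unsigned
+ big-endian number with the flag bit cleared; negative binary values
+ are not handled, results are limited to 63 bits, and the name
+ tar_parse_number is illustrative:
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a numeric header field that is either ASCII octal or, when
 * the high bit of the first byte is set, a big-endian binary number.
 * Returns 0 on success, or -1 if the field is malformed or the value
 * does not fit in 63 bits. */
int tar_parse_number(const unsigned char *field, size_t len,
                     int64_t *out)
{
    uint64_t v = 0;
    size_t i = 0;
    int digits = 0;

    assert(field != NULL && out != NULL);
    if (len == 0)
        return -1;
    if (field[0] & 0x80) {
        /* Binary: drop the flag bit, then append the other bytes. */
        v = field[0] & 0x7f;
        for (i = 1; i < len; i++) {
            if (v >> 55)
                return -1;          /* would exceed 63 bits */
            v = (v << 8) | field[i];
        }
    } else {
        /* Octal: skip leading spaces, stop at the first non-digit. */
        while (i < len && field[i] == ' ')
            i++;
        for (; i < len && field[i] >= '0' && field[i] <= '7'; i++) {
            if (v >> 60)
                return -1;          /* would exceed 63 bits */
            v = (v << 3) | (uint64_t)(field[i] - '0');
            digits++;
        }
        if (digits == 0)
            return -1;
    }
    *out = (int64_t)v;
    return 0;
}
```
+
+ With this scheme a 12-byte size field can describe files beyond the
+ limit of eleven octal digits, at the cost of strict ustar conformance.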
+
+ Another early GNU extension allowed base-64 values rather than octal.
+ This extension was short-lived and such archives are almost never seen.
+ However, there is still code in GNU tar to support them; this code is
+ responsible for a very cryptic warning message that is sometimes seen
+ when GNU tar encounters a damaged archive.
+
+SEE ALSO
+ ar(1), pax(1), tar(1)
+
+STANDARDS
+ The tar utility is no longer a part of POSIX or the Single UNIX Specification.
+ It last appeared in Version 2 of the Single UNIX Specification
+ (``SUSv2''). It has been supplanted in subsequent standards by pax(1).
+ The ustar format is currently part of the specification for the pax(1)
+ utility. The pax interchange file format is new with IEEE Std
+ 1003.1-2001 (``POSIX.1'').
+
+HISTORY
+ A tar command appeared in Seventh Edition Unix, which was released in
+ January, 1979. It replaced the tp program from Fourth Edition Unix which
+ in turn replaced the tap program from First Edition Unix. John Gilmore's
+ pdtar public-domain implementation (circa 1987) was highly influential
+ and formed the basis of GNU tar. Joerg Schilling's star archiver is
+ another open-source (GPL) archiver (originally developed circa 1985)
+ which features complete support for pax interchange format.
+
+FreeBSD 6.0 May 20, 2004 FreeBSD 6.0