author | Internet Software Consortium, Inc <@isc.org> | 2007-09-07 14:08:16 -0600 |
---|---|---|
committer | LaMont Jones <lamont@debian.org> | 2007-09-07 14:08:16 -0600 |
commit | ab47e90612dcdb02c4b134cfb1be0697007c0dac (patch) | |
tree | 88602431bff8a63ef57fb16f35e30d93f7455d41 /doc/rfc | |
9.0.0b1
Diffstat (limited to 'doc/rfc')
43 files changed, 49741 insertions, 0 deletions
diff --git a/doc/rfc/rfc1032.txt b/doc/rfc/rfc1032.txt
new file mode 100644
index 00000000..0e82721c
--- /dev/null
+++ b/doc/rfc/rfc1032.txt
@@ -0,0 +1,781 @@
+Network Working Group                                           M. Stahl
+Request for Comments: 1032                             SRI International
+                                                           November 1987
+
+
+                      DOMAIN ADMINISTRATORS GUIDE
+
+
+STATUS OF THIS MEMO
+
+   This memo describes procedures for registering a domain with the
+   Network Information Center (NIC) of Defense Data Network (DDN), and
+   offers guidelines on the establishment and administration of a domain
+   in accordance with the requirements specified in RFC-920. It is
+   intended for use by domain administrators. This memo should be used
+   in conjunction with RFC-920, which is an official policy statement of
+   the Internet Activities Board (IAB) and the Defense Advanced Research
+   Projects Agency (DARPA). Distribution of this memo is unlimited.
+
+BACKGROUND
+
+   Domains are administrative entities that provide decentralized
+   management of host naming and addressing. The domain-naming system
+   is distributed and hierarchical.
+
+   The NIC is designated by the Defense Communications Agency (DCA) to
+   provide registry services for the domain-naming system on the DDN and
+   DARPA portions of the Internet.
+
+   As registrar of top-level and second-level domains, as well as
+   administrator of the root domain name servers on behalf of DARPA and
+   DDN, the NIC is responsible for maintaining the root server zone
+   files and their binary equivalents. In addition, the NIC is
+   responsible for administering the top-level domains of "ARPA," "COM,"
+   "EDU," "ORG," "GOV," and "MIL" on behalf of DCA and DARPA until it
+   becomes feasible for other appropriate organizations to assume those
+   responsibilities.
+
+   It is recommended that the guidelines described in this document be
+   used by domain administrators in the establishment and control of
+   second-level domains.
+
+THE DOMAIN ADMINISTRATOR
+
+   The role of the domain administrator (DA) is that of coordinator,
+   manager, and technician. If his domain is established at the second
+   level or lower in the tree, the DA must register by interacting with
+   the management of the domain directly above his, making certain that
+
+
+
+Stahl                                                           [Page 1]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+   his domain satisfies all the requirements of the administration under
+   which his domain would be situated. To find out who has authority
+   over the name space he wishes to join, the DA can ask the NIC
+   Hostmaster. Information on contacts for the top-level and second-
+   level domains can also be found on line in the file NETINFO:DOMAIN-
+   CONTACTS.TXT, which is available from the NIC via anonymous FTP.
+
+   The DA should be technically competent; he should understand the
+   concepts and procedures for operating a domain server, as described
+   in RFC-1034, and make sure that the service provided is reliable and
+   uninterrupted. It is his responsibility or that of his delegate to
+   ensure that the data will be current at all times. As a manager, the
+   DA must be able to handle complaints about service provided by his
+   domain name server. He must be aware of the behavior of the hosts in
+   his domain, and take prompt action on reports of problems, such as
+   protocol violations or other serious misbehavior. The administrator
+   of a domain must be a responsible person who has the authority to
+   either enforce these actions himself or delegate them to someone
+   else.
+
+   Name assignments within a domain are controlled by the DA, who should
+   verify that names are unique within his domain and that they conform
+   to standard naming conventions. He furnishes access to names and
+   name-related information to users both inside and outside his domain.
+   He should work closely with the personnel he has designated as the
+   "technical and zone" contacts for his domain, for many administrative
+   decisions will be made on the basis of input from these people.
+
+THE DOMAIN TECHNICAL AND ZONE CONTACT
+
+   A zone consists of those contiguous parts of the domain tree for
+   which a domain server has complete information and over which it has
+   authority. A domain server may be authoritative for more than one
+   zone. The domain technical/zone contact is the person who tends to
+   the technical aspects of maintaining the domain's name server and
+   resolver software, and database files. He keeps the name server
+   running, and interacts with technical people in other domains and
+   zones to solve problems that affect his zone.
+
+POLICIES
+
+   Domain or host name choices and the allocation of domain name space
+   are considered to be local matters. In the event of conflicts, it is
+   the policy of the NIC not to get involved in local disputes or in the
+   local decision-making process. The NIC will not act as referee in
+   disputes over such matters as who has the "right" to register a
+   particular top-level or second-level domain for an organization. The
+   NIC considers this a private local matter that must be settled among
+
+
+
+Stahl                                                           [Page 2]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+   the parties involved prior to their commencing the registration
+   process with the NIC. Therefore, it is assumed that the responsible
+   person for a domain will have resolved any local conflicts among the
+   members of his domain before registering that domain with the NIC.
+   The NIC will give guidance, if requested, by answering specific
+   technical questions, but will not provide arbitration in disputes at
+   the local level. This policy is also in keeping with the distributed
+   hierarchical nature of the domain-naming system in that it helps to
+   distribute the tasks of solving problems and handling questions.
+
+   Naming conventions for hosts should follow the rules specified in
+   RFC-952. From a technical standpoint, domain names can be very long.
+   Each segment of a domain name may contain up to 64 characters, but
+   the NIC strongly advises DAs to choose names that are 12 characters
+   or fewer, because behind every domain system there is a human being
+   who must keep track of the names, addresses, contacts, and other data
+   in a database. The longer the name, the more likely the data
+   maintainer is to make a mistake. Users also will appreciate shorter
+   names. Most people agree that short names are easier to remember and
+   type; most domain names registered so far are 12 characters or fewer.
+
+   Domain name assignments are made on a first-come-first-served basis.
+   The NIC has chosen not to register individual hosts directly under
+   the top-level domains it administers. One advantage of the domain
+   naming system is that administration and data maintenance can be
+   delegated down a hierarchical tree. Registration of hosts at the
+   same level in the tree as a second-level domain would dilute the
+   usefulness of this feature. In addition, the administrator of a
+   domain is responsible for the actions of hosts within his domain. We
+   would not want to find ourselves in the awkward position of policing
+   the actions of individual hosts. Rather, the subdomains registered
+   under these top-level domains retain the responsibility for this
+   function.
+
+   Countries that wish to be registered as top-level domains are
+   required to name themselves after the two-letter country code listed
+   in the international standard ISO-3166. In some cases, however, the
+   two-letter ISO country code is identical to a state code used by the
+   U.S. Postal Service. Requests made by countries to use the three-
+   letter form of country code specified in the ISO-3166 standard will
+   be considered in such cases so as to prevent possible conflicts and
+   confusion.
+
+
+
+
+
+
+
+
+
+Stahl                                                           [Page 3]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+HOW TO REGISTER
+
+   Obtain a domain questionnaire from the NIC hostmaster, or FTP the
+   file NETINFO:DOMAIN-TEMPLATE.TXT from host SRI-NIC.ARPA.
+
+   Fill out the questionnaire completely. Return it via electronic mail
+   to HOSTMASTER@SRI-NIC.ARPA.
+
+   The APPENDIX to this memo contains the application form for
+   registering a top-level or second-level domain with the NIC. It
+   supersedes the version of the questionnaire found in RFC-920. The
+   application should be submitted by the person administratively
+   responsible for the domain, and must be filled out completely before
+   the NIC will authorize establishment of a top-level or second-level
+   domain. The DA is responsible for keeping his domain's data current
+   with the NIC or with the registration agent with which his domain is
+   registered. For example, the CSNET and UUCP managements act as
+   domain filters, processing domain applications for their own
+   organizations. They pass pertinent information along periodically to
+   the NIC for incorporation into the domain database and root server
+   files. The online file NETINFO:ALTERNATE-DOMAIN-PROCEDURE.TXT
+   outlines this procedure. It is highly recommended that the DA review
+   this information periodically and provide any corrections or
+   additions. Corrections should be submitted via electronic mail.
+
+WHICH DOMAIN NAME?
+
+   The designers of the domain-naming system initiated several general
+   categories of names as top-level domain names, so that each could
+   accommodate a variety of organizations. The current top-level
+   domains registered with the DDN Network Information Center are ARPA,
+   COM, EDU, GOV, MIL, NET, and ORG, plus a number of top-level country
+   domains. To join one of these, a DA needs to be aware of the purpose
+   for which it was intended.
+
+      "ARPA" is a temporary domain. It is by default appended to the
+      names of hosts that have not yet joined a domain. When the system
+      was begun in 1984, the names of all hosts in the Official DoD
+      Internet Host Table maintained by the NIC were changed by adding
+      of the label ".ARPA" in order to accelerate a transition to the
+      domain-naming system. Another reason for the blanket name changes
+      was to force hosts to become accustomed to using the new style
+      names and to modify their network software, if necessary. This
+      was done on a network-wide basis and was directed by DCA in DDN
+      Management Bulletin No. 22. Hosts that fall into this domain will
+      eventually move to other branches of the domain tree.
+
+
+
+
+
+Stahl                                                           [Page 4]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+      "COM" is meant to incorporate subdomains of companies and
+      businesses.
+
+      "EDU" was initiated to accommodate subdomains set up by
+      universities and other educational institutions.
+
+      "GOV" exists to act as parent domain for subdomains set up by
+      government agencies.
+
+      "MIL" was initiated to act as parent to subdomains that are
+      developed by military organizations.
+
+      "NET" was introduced as a parent domain for various network-type
+      organizations. Organizations that belong within this top-level
+      domain are generic or network-specific, such as network service
+      centers and consortia. "NET" also encompasses network
+      management-related organizations, such as information centers and
+      operations centers.
+
+      "ORG" exists as a parent to subdomains that do not clearly fall
+      within the other top-level domains. This may include technical-
+      support groups, professional societies, or similar organizations.
+
+   One of the guidelines in effect in the domain-naming system is that a
+   host should have only one name regardless of what networks it is
+   connected to. This implies, that, in general, domain names should
+   not include routing information or addresses. For example, a host
+   that has one network connection to the Internet and another to BITNET
+   should use the same name when talking to either network. For a
+   description of the syntax of domain names, please refer to Section 3
+   of RFC-1034.
+
+VERIFICATION OF DATA
+
+   The verification process can be accomplished in several ways. One of
+   these is through the NIC WHOIS server. If he has access to WHOIS,
+   the DA can type the command "whois domain <domain name><return>".
+   The reply from WHOIS will supply the following: the name and address
+   of the organization "owning" the domain; the name of the domain; its
+   administrative, technical, and zone contacts; the host names and
+   network addresses of sites providing name service for the domain.
+
+
+
+
+
+
+
+
+
+
+Stahl                                                           [Page 5]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+   Example:
+
+      @whois domain rice.edu<Return>
+
+      Rice University (RICE-DOM)
+         Advanced Studies and Research
+         Houston, TX 77001
+
+         Domain Name: RICE.EDU
+
+         Administrative Contact:
+            Kennedy, Ken (KK28) Kennedy@LLL-CRG.ARPA (713) 527-4834
+         Technical Contact, Zone Contact:
+            Riffle, Vicky R. (VRR) rif@RICE.EDU
+            (713) 527-8101 ext 3844
+
+         Domain servers:
+
+         RICE.EDU                  128.42.5.1
+         PENDRAGON.CS.PURDUE.EDU   128.10.2.5
+
+
+   Alternatively, the DA can send an electronic mail message to
+   SERVICE@SRI-NIC.ARPA. In the subject line of the message header, the
+   DA should type "whois domain <domain name>". The requested
+   information will be returned via electronic mail. This method is
+   convenient for sites that do not have access to the NIC WHOIS
+   service.
+
+   The initial application for domain authorization should be submitted
+   via electronic mail, if possible, to HOSTMASTER@SRI-NIC.ARPA. The
+   questionnaire described in the appendix may be used or a separate
+   application can be FTPed from host SRI-NIC.ARPA. The information
+   provided by the administrator will be reviewed by hostmaster
+   personnel for completeness. There will most likely be a few
+   exchanges of correspondence via electronic mail, the preferred method
+   of communication, prior to authorization of the domain.
+
+HOW TO GET MORE INFORMATION
+
+   An informational table of the top-level domains and their root
+   servers is contained in the file NETINFO:DOMAINS.TXT online at SRI-
+   NIC.ARPA. This table can be obtained by FTPing the file.
+   Alternatively, the information can be acquired by opening a TCP or
+   UDP connection to the NIC Host Name Server, port 101 on SRI-NIC.ARPA,
+   and invoking the command "ALL-DOM".
+
+
+
+
+
+Stahl                                                           [Page 6]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+   The following online files, all available by FTP from SRI-NIC.ARPA,
+   contain pertinent domain information:
+
+      - NETINFO:DOMAINS.TXT, a table of all top-level domains and the
+        network addresses of the machines providing domain name
+        service for them. It is updated each time a new top-level
+        domain is approved.
+
+      - NETINFO:DOMAIN-INFO.TXT contains a concise list of all
+        top-level and second-level domain names registered with the
+        NIC and is updated monthly.
+
+      - NETINFO:DOMAIN-CONTACTS.TXT also contains a list of all the
+        top level and second-level domains, but includes the
+        administrative, technical and zone contacts for each as well.
+
+      - NETINFO:DOMAIN-TEMPLATE.TXT contains the questionnaire to be
+        completed before registering a top-level or second-level
+        domain.
+
+   For either general or specific information on the domain system, do
+   one or more of the following:
+
+      1. Send electronic mail to HOSTMASTER@SRI-NIC.ARPA
+
+      2. Call the toll-free NIC hotline at (800) 235-3155
+
+      3. Use FTP to get background RFCs and other files maintained
+         online at the NIC. Some pertinent RFCs are listed below in
+         the REFERENCES section of this memo.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Stahl                                                           [Page 7]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+REFERENCES
+
+   The references listed here provide important background information
+   on the domain-naming system. Path names of the online files
+   available via anonymous FTP from the SRI-NIC.ARPA host are noted in
+   brackets.
+
+   1. Defense Communications Agency DDN Defense Communications
+      System, DDN Management Bulletin No. 22, Domain Names
+      Transition, March 1984.
+      [ DDN-NEWS:DDN-MGT-BULLETIN-22.TXT ]
+
+   2. Defense Communications Agency DDN Defense Communications
+      System, DDN Management Bulletin No. 32, Phase I of the Domain
+      Name Implementation, January 1987.
+      [ DDN-NEWS:DDN-MGT-BULLETIN-32.TXT ]
+
+   3. Harrenstien, K., M. Stahl, and E. Feinler, "Hostname
+      Server", RFC-953, DDN Network Information Center, SRI
+      International, October 1985. [ RFC:RFC953.TXT ]
+
+   4. Harrenstien, K., M. Stahl, and E. Feinler, "Official DoD
+      Internet Host Table Specification", RFC-952, DDN Network
+      Information Center, SRI International, October 1985.
+      [ RFC:RFC952.TXT ]
+
+   5. ISO, "Codes for the Representation of Names of Countries",
+      ISO-3166, International Standards Organization, May 1981.
+      [ Not online ]
+
+   6. Lazear, W.D., "MILNET Name Domain Transition", RFC-1031,
+      Mitre Corporation, October 1987. [ RFC:RFC1031.TXT ]
+
+   7. Lottor, M.K., "Domain Administrators Operations Guide",
+      RFC-1033, DDN Network Information Center, SRI International,
+      July 1987. [ RFC:RFC1033.TXT ]
+
+   8. Mockapetris, P., "Domain Names - Concepts and Facilities",
+      RFC-1034, USC Information Sciences Institute, October 1987.
+      [ RFC:RFC1034.TXT ]
+
+   9. Mockapetris, P., "Domain Names - Implementation and
+      Specification", RFC-1035, USC Information Sciences Institute,
+      October 1987. [ RFC:RFC1035.TXT ]
+
+   10. Mockapetris, P., "The Domain Name System", Proceedings of the
+       IFIP 6.5 Working Conference on Computer Message Services,
+       Nottingham, England, May 1984. Also as ISI/RS-84-133, June
+
+
+
+Stahl                                                           [Page 8]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+       1984. [ Not online ]
+
+   11. Mockapetris, P., J. Postel, and P. Kirton, "Name Server
+       Design for Distributed Systems", Proceedings of the Seventh
+       International Conference on Computer Communication, October
+       30 to November 3 1984, Sidney, Australia. Also as
+       ISI/RS-84-132, June 1984. [ Not online ]
+
+   12. Partridge, C., "Mail Routing and the Domain System", RFC-974,
+       CSNET-CIC, BBN Laboratories, January 1986.
+       [ RFC:RFC974.TXT ]
+
+   13. Postel, J., "The Domain Names Plan and Schedule", RFC-881,
+       USC Information Sciences Institute, November 1983.
+       [ RFC:RFC881.TXT ]
+
+   14. Reynolds, J., and Postel, J., "Assigned Numbers", RFC-1010
+       USC Information Sciences Institute, May 1986.
+       [ RFC:RFC1010.TXT ]
+
+   15. Romano, S., and Stahl, M., "Internet Numbers", RFC-1020,
+       SRI, November 1987.
+       [ RFC:RFC1020.TXT ]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Stahl                                                           [Page 9]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+APPENDIX
+
+   The following questionnaire may be FTPed from SRI-NIC.ARPA as
+   NETINFO:DOMAIN-TEMPLATE.TXT.
+
+   ---------------------------------------------------------------------
+
+   To establish a domain, the following information must be sent to the
+   NIC Domain Registrar (HOSTMASTER@SRI-NIC.ARPA):
+
+   NOTE: The key people must have electronic mailboxes and NIC
+   "handles," unique NIC database identifiers. If you have access to
+   "WHOIS", please check to see if you are registered and if so, make
+   sure the information is current. Include only your handle and any
+   changes (if any) that need to be made in your entry. If you do not
+   have access to "WHOIS", please provide all the information indicated
+   and a NIC handle will be assigned.
+
+   (1) The name of the top-level domain to join.
+
+      For example: COM
+
+   (2) The NIC handle of the administrative head of the organization.
+   Alternately, the person's name, title, mailing address, phone number,
+   organization, and network mailbox. This is the contact point for
+   administrative and policy questions about the domain. In the case of
+   a research project, this should be the principal investigator.
+
+      For example:
+
+         Administrator
+
+            Organization  The NetWorthy Corporation
+            Name          Penelope Q. Sassafrass
+            Title         President
+            Mail Address  The NetWorthy Corporation
+                          4676 Andrews Way, Suite 100
+                          Santa Clara, CA 94302-1212
+            Phone Number  (415) 123-4567
+            Net Mailbox   Sassafrass@ECHO.TNC.COM
+            NIC Handle    PQS
+
+   (3) The NIC handle of the technical contact for the domain.
+   Alternately, the person's name, title, mailing address, phone number,
+   organization, and network mailbox. This is the contact point for
+   problems concerning the domain or zone, as well as for updating
+   information about the domain or zone.
+
+
+
+
+Stahl                                                          [Page 10]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+      For example:
+
+         Technical and Zone Contact
+
+            Organization  The NetWorthy Corporation
+            Name          Ansel A. Aardvark
+            Title         Executive Director
+            Mail Address  The NetWorthy Corporation
+                          4676 Andrews Way, Suite 100
+                          Santa Clara, CA. 94302-1212
+            Phone Number  (415) 123-6789
+            Net Mailbox   Aardvark@ECHO.TNC.COM
+            NIC Handle    AAA2
+
+   (4) The name of the domain (up to 12 characters). This is the name
+   that will be used in tables and lists associating the domain with the
+   domain server addresses. [While, from a technical standpoint, domain
+   names can be quite long (programmers beware), shorter names are
+   easier for people to cope with.]
+
+      For example: TNC
+
+   (5) A description of the servers that provide the domain service for
+   translating names to addresses for hosts in this domain, and the date
+   they will be operational.
+
+      A good way to answer this question is to say "Our server is
+      supplied by person or company X and does whatever their standard
+      issue server does."
+
+         For example: Our server is a copy of the one operated by
+         the NIC; it will be installed and made operational on
+         1 November 1987.
+
+   (6) Domains must provide at least two independent servers for the
+   domain. Establishing the servers in physically separate locations
+   and on different PSNs is strongly recommended. A description of the
+   server machine and its backup, including
+
+
+
+
+
+
+
+
+
+
+
+
+
+Stahl                                                          [Page 11]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+      (a) Hardware and software (using keywords from the Assigned
+      Numbers RFC).
+
+      (b) Host domain name and network addresses (which host on which
+      network for each connected network).
+
+      (c) Any domain-style nicknames (please limit your domain-style
+      nickname request to one)
+
+      For example:
+
+         - Hardware and software
+
+            VAX-11/750 and UNIX, or
+            IBM-PC and MS-DOS, or
+            DEC-1090 and TOPS-20
+
+         - Host domain names and network addresses
+
+            BAR.FOO.COM 10.9.0.193 on ARPANET
+
+         - Domain-style nickname
+
+            BR.FOO.COM (same as BAR.FOO.COM 10.9.0.13 on ARPANET)
+
+   (7) Planned mapping of names of any other network hosts, other than
+   the server machines, into the new domain's naming space.
+
+      For example:
+
+         BAR-FOO2.ARPA (10.8.0.193) -> FOO2.BAR.COM
+         BAR-FOO3.ARPA (10.7.0.193) -> FOO3.BAR.COM
+         BAR-FOO4.ARPA (10.6.0.193) -> FOO4.BAR.COM
+
+
+   (8) An estimate of the number of hosts that will be in the domain.
+
+      (a) Initially
+      (b) Within one year
+      (c) Two years
+      (d) Five years.
+
+      For example:
+
+         (a) Initially  = 50
+         (b) One year   = 100
+         (c) Two years  = 200
+         (d) Five years = 500
+
+
+
+Stahl                                                          [Page 12]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+   (9) The date you expect the fully qualified domain name to become
+   the official host name in HOSTS.TXT.
+
+      Please note: If changing to a fully qualified domain name (e.g.,
+      FOO.BAR.COM) causes a change in the official host name of an
+      ARPANET or MILNET host, DCA approval must be obtained beforehand.
+      Allow 10 working days for your requested changes to be processed.
+
+      ARPANET sites should contact ARPANETMGR@DDN1.ARPA. MILNET sites
+      should contact HOSTMASTER@SRI-NIC.ARPA, 800-235-3155, for
+      further instructions.
+
+   (10) Please describe your organization briefly.
+
+      For example: The NetWorthy Corporation is a consulting
+      organization of people working with UNIX and the C language in an
+      electronic networking environment. It sponsors two technical
+      conferences annually and distributes a bimonthly newsletter.
+
+   ---------------------------------------------------------------------
+
+   This example of a completed application corresponds to the examples
+   found in the companion document RFC-1033, "Domain Administrators
+   Operations Guide."
+
+   (1) The name of the top-level domain to join.
+
+      COM
+
+   (2) The NIC handle of the administrative contact person.
+
+      NIC Handle JAKE
+
+   (3) The NIC handle of the domain's technical and zone
+   contact person.
+
+      NIC Handle DLE6
+
+   (4) The name of the domain.
+
+      SRI
+
+   (5) A description of the servers.
+
+      Our server is the TOPS20 server JEEVES supplied by ISI; it
+      will be installed and made operational on 1 July 1987.
+
+
+
+
+
+Stahl                                                          [Page 13]
+
+RFC 1032              DOMAIN ADMINISTRATORS GUIDE          November 1987
+
+
+   (6) A description of the server machine and its backup:
+
+      (a) Hardware and software
+
+         DEC-1090T and TOPS20
+         DEC-2065 and TOPS20
+
+      (b) Host domain name and network address
+
+         KL.SRI.COM 10.1.0.2 on ARPANET, 128.18.10.6 on SRINET
+         STRIPE.SRI.COM 10.4.0.2 on ARPANET, 128.18.10.4 on SRINET
+
+      (c) Domain-style nickname
+
+         None
+
+   (7) Planned mapping of names of any other network hosts, other than
+   the server machines, into the new domain's naming space.
+
+      SRI-Blackjack.ARPA (128.18.2.1) -> Blackjack.SRI.COM
+      SRI-CSL.ARPA (192.12.33.2) -> CSL.SRI.COM
+
+   (8) An estimate of the number of hosts that will be directly within
+   this domain.
+
+      (a) Initially  = 50
+      (b) One year   = 100
+      (c) Two years  = 200
+      (d) Five years = 500
+
+   (9) A date when you expect the fully qualified domain name to become
+   the official host name in HOSTS.TXT.
+
+      31 September 1987
+
+   (10) Brief description of organization.
+
+      SRI International is an independent, nonprofit, scientific
+      research organization. It performs basic and applied research
+      for government and commercial clients, and contributes to
+      worldwide economic, scientific, industrial, and social progress
+      through research and related services.
+
+
+
+
+
+
+
+
+
+Stahl                                                          [Page 14]
+
diff --git a/doc/rfc/rfc1033.txt b/doc/rfc/rfc1033.txt
new file mode 100644
index 00000000..37029fd9
--- /dev/null
+++ b/doc/rfc/rfc1033.txt
@@ -0,0 +1,1229 @@
+Network Working Group                                          M. Lottor
+Request For Comments: 1033                             SRI International
+                                                           November 1987
+
+
+                 DOMAIN ADMINISTRATORS OPERATIONS GUIDE
+
+
+
+STATUS OF THIS MEMO
+
+   This RFC provides guidelines for domain administrators in operating a
+   domain server and maintaining their portion of the hierarchical
+   database. Familiarity with the domain system is assumed.
+   Distribution of this memo is unlimited.
+
+ACKNOWLEDGMENTS
+
+   This memo is a formatted collection of notes and excerpts from the
+   references listed at the end of this document. Of particular mention
+   are Paul Mockapetris and Kevin Dunlap.
+
+INTRODUCTION
+
+   A domain server requires a few files to get started. It will
+   normally have some number of boot/startup files (also known as the
+   "safety belt" files). One section will contain a list of possible
+   root servers that the server will use to find the up-to-date list of
+   root servers. Another section will list the zone files to be loaded
+   into the server for your local domain information. A zone file
+   typically contains all the data for a particular domain. This guide
+   describes the data formats that can be used in zone files and
+   suggested parameters to use for certain fields. If you are
+   attempting to do anything advanced or tricky, consult the appropriate
+   domain RFC's for more details.
+
+   Note: Each implementation of domain software may require different
+   files. Zone files are standardized but some servers may require
+   other startup files. See the appropriate documentation that comes
+   with your software. See the appendix for some specific examples.
+
+ZONES
+
+   A zone defines the contents of a contiguous section of the domain
+   space, usually bounded by administrative boundaries. There will
+   typically be a separate data file for each zone. The data contained
+   in a zone file is composed of entries called Resource Records (RRs).
+
+
+
+
+Lottor                                                          [Page 1]
+
+RFC 1033                DOMAIN OPERATIONS GUIDE            November 1987
+
+
+   You may only put data in your domain server that you are
+   authoritative for. You must not add entries for domains other than
+   your own (except for the special case of "glue records").
+
+   A domain server will probably read a file on start-up that lists the
+   zones it should load into its database. The format of this file is
+   not standardized and is different for most domain server
+   implementations. For each zone it will normally contain the domain
+   name of the zone and the file name that contains the data to load for
+   the zone.
+
+ROOT SERVERS
+
+   A resolver will need to find the root servers when it first starts.
+   When the resolver boots, it will typically read a list of possible
+   root servers from a file.
+
+   The resolver will cycle through the list trying to contact each one.
+   When it finds a root server, it will ask it for the current list of
+   root servers. It will then discard the list of root servers it read
+   from the data file and replace it with the current list it received.
+
+   Root servers will not change very often. You can get the names of
+   current root servers from the NIC.
+
+   FTP the file NETINFO:ROOT-SERVERS.TXT or send a mail request to
+   NIC@SRI-NIC.ARPA.
+
+   As of this date (June 1987) they are:
+
+      SRI-NIC.ARPA    10.0.0.51    26.0.0.73
+      C.ISI.EDU       10.0.0.52
+      BRL-AOS.ARPA    192.5.25.82  192.5.22.82  128.20.1.2
+      A.ISI.EDU       26.3.0.103
+
+RESOURCE RECORDS
+
+   Records in the zone data files are called resource records (RRs).
+   They are specified in RFC-883 and RFC-973. An RR has a standard
+   format as shown:
+
+      <name> [<ttl>] [<class>] <type> <data>
+
+   The record is divided into fields which are separated by white space.
+
+      <name>
+
+         The name field defines what domain name applies to the given
+
+
+
+Lottor                                                          [Page 2]
+
+RFC 1033                DOMAIN OPERATIONS GUIDE            November 1987
+
+
+         RR. In some cases the name field can be left blank and it will
+         default to the name field of the previous RR.
+
+      <ttl>
+
+         TTL stands for Time To Live. It specifies how long a domain
+         resolver should cache the RR before it throws it out and asks a
+         domain server again. See the section on TTL's. If you leave
+         the TTL field blank it will default to the minimum time
+         specified in the SOA record (described later).
+
+      <class>
+
+         The class field specifies the protocol group. If left blank it
+         will default to the last class specified.
+
+      <type>
+
+         The type field specifies what type of data is in the RR. See
+         the section on types.
+
+      <data>
+
+         The data field is defined differently for each type and class
+         of data. Popular RR data formats are described later.
+
+   The domain system does not guarantee to preserve the order of
+   resource records. Listing RRs (such as multiple address records) in
+   a certain order does not guarantee they will be used in that order.
+
+   Case is preserved in names and data fields when loaded into the name
+   server. All comparisons and lookups in the name server are case
+   insensitive.
+
+   Parenthesis ("(",")") are used to group data that crosses a line
+   boundary.
+
+   A semicolon (";") starts a comment; the remainder of the line is
+   ignored.
+
+   The asterisk ("*") is used for wildcarding.
+
+   The at-sign ("@") denotes the current default domain name.
+
+
+
+
+
+
+
+
+Lottor                                                          [Page 3]
+
+RFC 1033                DOMAIN OPERATIONS GUIDE            November 1987
+
+
+NAMES
+
+   A domain name is a sequence of labels separated by dots.
+
+   Domain names in the zone files can be one of two types, either
+   absolute or relative. An absolute name is the fully qualified domain
+   name and is terminated with a period. A relative name does not
+   terminate with a period, and the current default domain is appended
+   to it. The default domain is usually the name of the domain that was
+   specified in the boot file that loads each zone.
+
+   The domain system allows a label to contain any 8-bit character.
+   Although the domain system has no restrictions, other protocols such
+   as SMTP do have name restrictions. Because of other protocol
+   restrictions, only the following characters are recommended for use
+   in a host name (besides the dot separator):
+
+      "A-Z", "a-z", "0-9", dash and underscore
+
+TTL's (Time To Live)
+
+   It is important that TTLs are set to appropriate values. The TTL is
+   the time (in seconds) that a resolver will use the data it got from
+   your server before it asks your server again. If you set the value
+   too low, your server will get loaded down with lots of repeat
+   requests. If you set it too high, then information you change will
+   not get distributed in a reasonable amount of time. If you leave the
+   TTL field blank, it will default to what is specified in the SOA
+   record for the zone.
+
+   Most host information does not change much over long time periods. A
+   good way to set up your TTLs would be to set them at a high value,
+   and then lower the value if you know a change will be coming soon.
+   You might set most TTLs to anywhere between a day (86400) and a week
+   (604800). Then, if you know some data will be changing in the near
+   future, set the TTL for that RR down to a lower value (an hour to a
+   day) until the change takes place, and then put it back up to its
+   previous value.
+
+   Also, all RRs with the same name, class, and type should have the
+   same TTL value.
+
+CLASSES
+
+   The domain system was designed to be protocol independent. The class
+   field is used to identify the protocol group that each RR is in.
+
+   The class of interest to people using TCP/IP software is the class
+
+
+
+Lottor                                                          [Page 4]
+
+RFC 1033                DOMAIN OPERATIONS GUIDE            November 1987
+
+
+   "Internet". Its standard designation is "IN".
+
+   A zone file should only contain RRs of the same class.
+
+TYPES
+
+   There are many defined RR types. For a complete list, see the domain
+   specification RFCs. Here is a list of current commonly used types.
+   The data for each type is described in the data section.
+
+      Designation                Description
+      ==========================================
+      SOA                        Start Of Authority
+      NS                         Name Server
+
+      A                          Internet Address
+      CNAME                      Canonical Name (nickname pointer)
+      HINFO                      Host Information
+      WKS                        Well Known Services
+
+      MX                         Mail Exchanger
+
+      PTR                        Pointer
+
+SOA (Start Of Authority)
+
+   <name> [<ttl>] [<class>] SOA <origin> <person> (
+                      <serial>
+                      <refresh>
+                      <retry>
+                      <expire>
+                      <minimum> )
+
+   The Start Of Authority record designates the start of a zone. The
+   zone ends at the next SOA record.
+
+   <name> is the name of the zone.
+
+   <origin> is the name of the host on which the master zone file
+   resides.
+
+   <person> is a mailbox for the person responsible for the zone. It is
+   formatted like a mailing address but the at-sign that normally
+   separates the user from the host name is replaced with a dot.
+
+   <serial> is the version number of the zone file. It should be
+   incremented anytime a change is made to data in the zone.
+ + + + +Lottor [Page 5] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + <refresh> is how long, in seconds, a secondary name server is to + check with the primary name server to see if an update is needed. A + good value here would be one hour (3600). + + <retry> is how long, in seconds, a secondary name server is to retry + after a failure to check for a refresh. A good value here would be + 10 minutes (600). + + <expire> is the upper limit, in seconds, that a secondary name server + is to use the data before it expires for lack of getting a refresh. + You want this to be rather large, and a nice value is 3600000, about + 42 days. + + <minimum> is the minimum number of seconds to be used for TTL values + in RRs. A minimum of at least a day is a good value here (86400). + + There should only be one SOA record per zone. A sample SOA record + would look something like: + + @ IN SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. ( + 45 ;serial + 3600 ;refresh + 600 ;retry + 3600000 ;expire + 86400 ) ;minimum + + +NS (Name Server) + + <domain> [<ttl>] [<class>] NS <server> + + The NS record lists the name of a machine that provides domain + service for a particular domain. The name associated with the RR is + the domain name and the data portion is the name of a host that + provides the service. If machines SRI-NIC.ARPA and C.ISI.EDU provide + name lookup service for the domain COM then the following entries + would be used: + + COM. NS SRI-NIC.ARPA. + NS C.ISI.EDU. + + Note that the machines providing name service do not have to live in + the named domain. There should be one NS record for each server for + a domain. Also note that the name "COM" defaults for the second NS + record. + + NS records for a domain exist in both the zone that delegates the + domain, and in the domain itself. 
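The name defaulting used by the second NS record above can be sketched in a few lines of Python. This is an illustration only, not part of any name server: the function name fill_default_names is hypothetical, the input is assumed to be zone file lines with any diff markers already removed, and quoted strings that contain a semicolon are not handled.

```python
def fill_default_names(lines):
    """Yield (name, fields) pairs for zone file lines.

    A line that starts with white space has a blank name field, so it
    inherits the name of the previous RR, as described for the NS example.
    Assumption: comments start at the first ";" and quoting is ignored.
    """
    previous = None
    for line in lines:
        text = line.split(";", 1)[0].rstrip()  # drop comment and trailing space
        if not text.strip():
            continue  # blank or comment-only line
        fields = text.split()
        if text[0].isspace():
            if previous is None:
                raise ValueError("first record has no name to default to")
            name = previous
        else:
            name = fields[0]
            fields = fields[1:]
        previous = name
        yield name, fields


records = list(fill_default_names([
    "COM.        NS      SRI-NIC.ARPA.",
    "            NS      C.ISI.EDU.",
]))
```

Both entries in records carry the name COM., which is the behavior the text describes for the second NS record.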
+ + + +Lottor [Page 6] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + +GLUE RECORDS + + If the name server host for a particular domain is itself inside the + domain, then a 'glue' record will be needed. A glue record is an A + (address) RR that specifies the address of the server. Glue records + are only needed in the server delegating the domain, not in the + domain itself. If, for example, the name server for domain SRI.COM + were KL.SRI.COM, then the delegating zone would need both the NS + record and the A record shown here: + + SRI.COM. NS KL.SRI.COM. + KL.SRI.COM. A 10.1.0.2 + + +A (Address) + + <host> [<ttl>] [<class>] A <address> + + The data for an A record is an internet address in dotted decimal + form. A sample A record might look like: + + SRI-NIC.ARPA. A 10.0.0.51 + + There should be one A record for each address of a host. + +CNAME (Canonical Name) + + <nickname> [<ttl>] [<class>] CNAME <host> + + The CNAME record is used for nicknames. The name associated with the + RR is the nickname. The data portion is the official name. For + example, a machine named SRI-NIC.ARPA may want to have the nickname + NIC.ARPA. In that case, the following RR would be used: + + NIC.ARPA. CNAME SRI-NIC.ARPA. + + There must not be any other RRs associated with a nickname of the + same class. + + Nicknames are also useful when a host changes its name. In that + case, it is usually a good idea to have a CNAME pointer so that + people still using the old name will get to the right place. + + + + + + + + + +Lottor [Page 7] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + +HINFO (Host Info) + + <host> [<ttl>] [<class>] HINFO <hardware> <software> + + The HINFO record gives information about a particular host. The data + is two strings separated by whitespace. The first string is a + hardware description and the second is software. The hardware is + usually a manufacturer name followed by a dash and model designation. 
+ The software string is usually the name of the operating system. + + Official HINFO types can be found in the latest Assigned Numbers RFC, + the latest of which is RFC-1010. The Hardware type is called the + Machine name and the Software type is called the System name. + + Some sample HINFO records: + + SRI-NIC.ARPA. HINFO DEC-2060 TOPS20 + UCBARPA.Berkeley.EDU. HINFO VAX-11/780 UNIX + + +WKS (Well Known Services) + + <host> [<ttl>] [<class>] WKS <address> <protocol> <services> + + The WKS record is used to list Well Known Services a host provides. + WKS's are defined to be services on port numbers below 256. The WKS + record lists what services are available at a certain address using a + certain protocol. The common protocols are TCP and UDP. Official + protocol names can be found in the latest Assigned Numbers RFC, the + latest of which is RFC-1010. + + A sample WKS record for a host offering the same services on all + addresses would look like: + + SRI-NIC.ARPA. WKS 10.0.0.51 TCP TELNET FTP SMTP + WKS 10.0.0.51 UDP TIME + WKS 26.0.0.73 TCP TELNET FTP SMTP + WKS 26.0.0.73 UDP TIME + +MX (Mail Exchanger) (See RFC-974 for more details.) + + <name> [<ttl>] [<class>] MX <preference> <host> + + MX records specify where mail for a domain name should be delivered. + There may be multiple MX records for a particular name. The + preference value specifies the order a mailer should try multiple MX + records when delivering mail. Zero is the highest preference, so + lower values are tried first. + Multiple records for the same name may have the same preference. + + + +Lottor [Page 8] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + A host BAR.FOO.COM may want its mail to be delivered to the host + PO.FOO.COM and would then use the MX record: + + BAR.FOO.COM. MX 10 PO.FOO.COM. + + A host BAZ.FOO.COM may want its mail to be delivered to one of three + different machines, in the following order: + + BAZ.FOO.COM. MX 10 PO1.FOO.COM. + MX 20 PO2.FOO.COM. + MX 30 PO3.FOO.COM. 
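The ordering implied by the preference values in the BAZ.FOO.COM example can be sketched as follows. This is a minimal Python illustration, not a mailer implementation: the function name mx_delivery_order is hypothetical, and keeping records of equal preference in their listed order is just one possible choice made for this sketch.

```python
def mx_delivery_order(records):
    """Return MX hosts in the order a mailer would try them.

    records is a list of (preference, host) pairs. Lower preference
    values are tried first; zero is the highest preference.
    Assumption: ties keep their listed order (sorted() is stable).
    """
    return [host for _, host in sorted(records, key=lambda rr: rr[0])]


baz_order = mx_delivery_order([
    (30, "PO3.FOO.COM."),
    (10, "PO1.FOO.COM."),
    (20, "PO2.FOO.COM."),
])
```

For the three records above, the hosts come back as PO1, PO2, PO3 regardless of the order in which they were listed in the zone file.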
+ + An entire domain of hosts not connected to the Internet may want + their mail to go through a mail gateway that knows how to deliver + mail to them. If they would like mail addressed to any host in the + domain FOO.COM to go through the mail gateway they might use: + + FOO.COM. MX 10 RELAY.CS.NET. + *.FOO.COM. MX 20 RELAY.CS.NET. + + Note that you can specify a wildcard in the MX record to match on + anything in FOO.COM, but that it won't match a plain FOO.COM. + +IN-ADDR.ARPA + + The structure of names in the domain system is set up in a + hierarchical way such that the address of a name can be found by + tracing down the domain tree contacting a server for each label of + the name. Because of this 'indexing' based on name, there is no easy + way to translate a host address back into its host name. + + In order to do the reverse translation easily, a domain was created + that uses hosts' addresses as part of a name that then points to the + data for that host. In this way, there is now an 'index' to hosts' + RRs based on their address. This address mapping domain is called + IN-ADDR.ARPA. Within that domain are subdomains for each network, + based on network number. Also, for consistency and natural + groupings, the 4 octets of a host number are reversed. + + For example, the ARPANET is net 10. That means there is a domain + called 10.IN-ADDR.ARPA. Within this domain there is a PTR RR at + 51.0.0.10.IN-ADDR.ARPA that points to the RRs for the host + SRI-NIC.ARPA (whose address is 10.0.0.51). Since the NIC is also on + the MILNET (Net 26, address 26.0.0.73), there is also a PTR RR at + 73.0.0.26.IN-ADDR.ARPA that points to the same RRs for SRI-NIC.ARPA. + The format of these special pointers is defined below along with the + examples for the NIC. 
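The octet reversal described above is mechanical, and can be sketched in Python. This is an illustration under stated assumptions: the function name in_addr_name is hypothetical, and only four-octet dotted decimal host addresses are accepted.

```python
def in_addr_name(address):
    """Map a dotted decimal host address to its IN-ADDR.ARPA name.

    The four octets are reversed and the suffix IN-ADDR.ARPA. is
    appended, giving the absolute name at which the PTR RR lives.
    """
    octets = address.split(".")
    if len(octets) != 4 or not all(
        o.isdigit() and 0 <= int(o) <= 255 for o in octets
    ):
        raise ValueError("not a dotted decimal host address: %r" % address)
    return ".".join(reversed(octets)) + ".IN-ADDR.ARPA."


arpanet_name = in_addr_name("10.0.0.51")
milnet_name = in_addr_name("26.0.0.73")
```

The two calls produce the names used for the NIC in the text: 51.0.0.10.IN-ADDR.ARPA. and 73.0.0.26.IN-ADDR.ARPA.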
+ + + + +Lottor [Page 9] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + +PTR + + <special-name> [<ttl>] [<class>] PTR <name> + + The PTR record is used to let special names point to some other + location in the domain tree. They are mainly used in the + IN-ADDR.ARPA records for translation of addresses to names. PTR's + should use official names and not aliases. + + For example, host SRI-NIC.ARPA with addresses 10.0.0.51 and 26.0.0.73 + would have the following records in the respective zone files for net + 10 and net 26: + + 51.0.0.10.IN-ADDR.ARPA. PTR SRI-NIC.ARPA. + 73.0.0.26.IN-ADDR.ARPA. PTR SRI-NIC.ARPA. + +GATEWAY PTR's + + The IN-ADDR tree is also used to locate gateways on a particular + network. Gateways have the same kind of PTR RRs as hosts (as above) + but in addition they have other PTRs used to locate them by network + number alone. These records have only 1, 2, or 3 octets as part of + the name depending on whether they are class A, B, or C networks, + respectively. + + Let's take the SRI-CSL gateway for example. It connects 3 different + networks, one class A, one class B and one class C. It will have the + standard RR's for a host in the CSL.SRI.COM zone: + + GW.CSL.SRI.COM. A 10.2.0.2 + A 128.18.1.1 + A 192.12.33.1 + + Also, in 3 different zones (one for each network), it will have one + of the following number to name translation pointers: + + 2.0.2.10.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + 1.1.18.128.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + 1.33.12.192.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + + In addition, in each of the same 3 zones will be one of the following + gateway location pointers: + + 10.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + 18.128.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + 33.12.192.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + + + + + +Lottor [Page 10] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + +INSTRUCTIONS + + Adding a subdomain. + + To add a new subdomain to your domain: + + Set up the other domain server and/or the new zone file. 
+ + Add an NS record for each server of the new domain to the zone + file of the parent domain. + + Add any necessary glue RRs. + + Adding a host. + + To add a new host to your zone files: + + Edit the appropriate zone file for the domain the host is in. + + Add an entry for each address of the host. + + Optionally add CNAME, HINFO, WKS, and MX records. + + Add the reverse IN-ADDR entry for each host address in the + appropriate zone files for each network the host is on. + + Deleting a host. + + To delete a host from the zone files: + + Remove all the host's resource records from the zone file of + the domain the host is in. + + Remove all the host's PTR records from the IN-ADDR zone files + for each network the host was on. + + Adding gateways. + + Follow instructions for adding a host. + + Add the gateway location PTR records for each network the + gateway is on. + + Deleting gateways. + + Follow instructions for deleting a host. + + Also delete the gateway location PTR records for each network + + + +Lottor [Page 11] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + the gateway was on. + +COMPLAINTS + + These are the suggested steps you should take if you are having + problems that you believe are caused by someone else's name server: + + + 1. Complain privately to the responsible person for the domain. You + can find their mailing address in the SOA record for the domain. + + 2. Complain publicly to the responsible person for the domain. + + 3. Ask the NIC for the administrative person responsible for the + domain. Complain. You can also find domain contacts on the NIC in + the file NETINFO:DOMAIN-CONTACTS.TXT + + 4. Complain to the parent domain authorities. + + 5. Ask the parent authorities to excommunicate the domain. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Lottor [Page 12] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + +EXAMPLE DOMAIN SERVER DATABASE FILES + + The following examples show how zone files are set up for a typical + organization. SRI will be used as the example organization. SRI has + decided to divided their domain SRI.COM into a few subdomains, one + for each group that wants one. The subdomains are CSL and ISTC. + + Note the following interesting items: + + There are both hosts and domains under SRI.COM. + + CSL.SRI.COM is both a domain name and a host name. + + All the domains are serviced by the same pair of domain servers. + + All hosts at SRI are on net 128.18 except hosts in the CSL domain + which are on net 192.12.33. Note that a domain does not have to + correspond to a physical network. + + The examples do not necessarily correspond to actual data in use + by the SRI domain. + + SRI Domain Organization + + +-------+ + | COM | + +-------+ + | + +-------+ + | SRI | + +-------+ + | + +----------++-----------+ + | | | + +-------+ +------+ +-------+ + | CSL | | ISTC | | Hosts | + +-------+ +------+ +-------+ + | | + +-------+ +-------+ + | Hosts | | Hosts | + +-------+ +-------+ + + + + + + + + + + +Lottor [Page 13] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + [File "CONFIG.CMD". Since bootstrap files are not standardized, this + file is presented using a pseudo configuration file syntax.] + + load root server list from file ROOT.SERVERS + load zone SRI.COM. from file SRI.ZONE + load zone CSL.SRI.COM. from file CSL.ZONE + load zone ISTC.SRI.COM. from file ISTC.ZONE + load zone 18.128.IN-ADDR.ARPA. from file SRINET.ZONE + load zone 33.12.192.IN-ADDR.ARPA. from file SRI-CSL-NET.ZONE + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Lottor [Page 14] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + [File "ROOT.SERVERS". Again, the format of this file is not + standardized.] 
+ + ;list of possible root servers + SRI-NIC.ARPA 10.0.0.51 26.0.0.73 + C.ISI.EDU 10.0.0.52 + BRL-AOS.ARPA 192.5.25.82 192.5.22.82 128.20.1.2 + A.ISI.EDU 26.3.0.103 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Lottor [Page 15] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + [File "SRI.ZONE"] + + SRI.COM. IN SOA KL.SRI.COM. DLE.STRIPE.SRI.COM. ( + 870407 ;serial + 1800 ;refresh every 30 minutes + 600 ;retry every 10 minutes + 604800 ;expire after a week + 86400 ;default of a day + ) + + SRI.COM. NS KL.SRI.COM. + NS STRIPE.SRI.COM. + MX 10 KL.SRI.COM. + + ;SRI.COM hosts + + KL A 10.1.0.2 + A 128.18.10.6 + MX 10 KL.SRI.COM. + + STRIPE A 10.4.0.2 + STRIPE A 128.18.10.4 + MX 10 STRIPE.SRI.COM. + + NIC CNAME SRI-NIC.ARPA. + + Blackjack A 128.18.2.1 + HINFO VAX-11/780 UNIX + WKS 128.18.2.1 TCP TELNET FTP + + CSL A 192.12.33.2 + HINFO FOONLY-F4 TOPS20 + WKS 192.12.33.2 TCP TELNET FTP SMTP FINGER + MX 10 CSL.SRI.COM. + + + + + + + + + + + + + + + + + +Lottor [Page 16] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + [File "CSL.ZONE"] + + CSL.SRI.COM. IN SOA KL.SRI.COM. DLE.STRIPE.SRI.COM. ( + 870330 ;serial + 1800 ;refresh every 30 minutes + 600 ;retry every 10 minutes + 604800 ;expire after a week + 86400 ;default of a day + ) + + CSL.SRI.COM. NS KL.SRI.COM. + NS STRIPE.SRI.COM. + A 192.12.33.2 + + ;CSL.SRI.COM hosts + + A CNAME CSL.SRI.COM. + B A 192.12.33.3 + HINFO FOONLY-F4 TOPS20 + WKS 192.12.33.3 TCP TELNET FTP SMTP + GW A 10.2.0.2 + A 192.12.33.1 + A 128.18.1.1 + HINFO PDP-11/23 MOS + SMELLY A 192.12.33.4 + HINFO IMAGEN IMAGEN + SQUIRREL A 192.12.33.5 + HINFO XEROX-1100 INTERLISP + VENUS A 192.12.33.7 + HINFO SYMBOLICS-3600 LISPM + HELIUM A 192.12.33.30 + HINFO SUN-3/160 UNIX + ARGON A 192.12.33.31 + HINFO SUN-3/75 UNIX + RADON A 192.12.33.32 + HINFO SUN-3/75 UNIX + + + + + + + + + + + + + + + +Lottor [Page 17] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + [File "ISTC.ZONE"] + + ISTC.SRI.COM. 
IN SOA KL.SRI.COM. roemers.JOYCE.ISTC.SRI.COM. ( + 870406 ;serial + 1800 ;refresh every 30 minutes + 600 ;retry every 10 minutes + 604800 ;expire after a week + 86400 ;default of a day + ) + + ISTC.SRI.COM. NS KL.SRI.COM. + NS STRIPE.SRI.COM. + MX 10 SPAM.ISTC.SRI.COM. + + ; ISTC hosts + + joyce A 128.18.4.2 + HINFO VAX-11/750 UNIX + bozo A 128.18.0.6 + HINFO SUN UNIX + sundae A 128.18.0.11 + HINFO SUN UNIX + tsca A 128.18.0.201 + A 10.3.0.2 + HINFO VAX-11/750 UNIX + MX 10 TSCA.ISTC.SRI.COM. + tsc CNAME tsca + prmh A 128.18.0.203 + A 10.2.0.51 + HINFO PDP-11/44 UNIX + spam A 128.18.4.3 + A 10.2.0.107 + HINFO VAX-11/780 UNIX + MX 10 SPAM.ISTC.SRI.COM. + + + + + + + + + + + + + + + + + +Lottor [Page 18] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + [File "SRINET.ZONE"] + + 18.128.IN-ADDR.ARPA. IN SOA KL.SRI.COM. DLE.STRIPE.SRI.COM. ( + 870406 ;serial + 1800 ;refresh every 30 minutes + 600 ;retry every 10 minutes + 604800 ;expire after a week + 86400 ;default of a day + ) + + 18.128.IN-ADDR.ARPA. NS KL.SRI.COM. + NS STRIPE.SRI.COM. + PTR GW.CSL.SRI.COM. + + ; SRINET [128.18.0.0] Address Translations + + ; SRI.COM Hosts + 1.2.18.128.IN-ADDR.ARPA. PTR Blackjack.SRI.COM. + + ; ISTC.SRI.COM Hosts + 2.4.18.128.IN-ADDR.ARPA. PTR joyce.ISTC.SRI.COM. + 6.0.18.128.IN-ADDR.ARPA. PTR bozo.ISTC.SRI.COM. + 11.0.18.128.IN-ADDR.ARPA. PTR sundae.ISTC.SRI.COM. + 201.0.18.128.IN-ADDR.ARPA. PTR tsca.ISTC.SRI.COM. + 203.0.18.128.IN-ADDR.ARPA. PTR prmh.ISTC.SRI.COM. + 3.4.18.128.IN-ADDR.ARPA. PTR spam.ISTC.SRI.COM. + + ; CSL.SRI.COM Hosts + 1.1.18.128.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + + + + + + + + + + + + + + + + + + + + + + +Lottor [Page 19] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + + [File "SRI-CSL-NET.ZONE"] + + 33.12.192.IN-ADDR.ARPA. IN SOA KL.SRI.COM. DLE.STRIPE.SRI.COM. ( + 870404 ;serial + 1800 ;refresh every 30 minutes + 600 ;retry every 10 minutes + 604800 ;expire after a week + 86400 ;default of a day + ) + + 33.12.192.IN-ADDR.ARPA. NS KL.SRI.COM. 
+ NS STRIPE.SRI.COM. + PTR GW.CSL.SRI.COM. + + ; SRI-CSL-NET [192.12.33.0] Address Translations + + ; SRI.COM Hosts + 2.33.12.192.IN-ADDR.ARPA. PTR CSL.SRI.COM. + + ; CSL.SRI.COM Hosts + 1.33.12.192.IN-ADDR.ARPA. PTR GW.CSL.SRI.COM. + 3.33.12.192.IN-ADDR.ARPA. PTR B.CSL.SRI.COM. + 4.33.12.192.IN-ADDR.ARPA. PTR SMELLY.CSL.SRI.COM. + 5.33.12.192.IN-ADDR.ARPA. PTR SQUIRREL.CSL.SRI.COM. + 7.33.12.192.IN-ADDR.ARPA. PTR VENUS.CSL.SRI.COM. + 30.33.12.192.IN-ADDR.ARPA. PTR HELIUM.CSL.SRI.COM. + 31.33.12.192.IN-ADDR.ARPA. PTR ARGON.CSL.SRI.COM. + 32.33.12.192.IN-ADDR.ARPA. PTR RADON.CSL.SRI.COM. + + + + + + + + + + + + + + + + + + + + + + + +Lottor [Page 20] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + +APPENDIX + + BIND (Berkeley Internet Name Domain server) distributed with 4.3 BSD + UNIX + + This section describes two BIND implementation specific files: the + boot file and the cache file. BIND has other options, files, and + specifications that are not described here. See the Name Server + Operations Guide for BIND for details. + + The boot file for BIND is usually called "named.boot". This + corresponds to file "CONFIG.CMD" in the example section. + + -------------------------------------------------------- + cache . named.ca + primary SRI.COM SRI.ZONE + primary CSL.SRI.COM CSL.ZONE + primary ISTC.SRI.COM ISTC.ZONE + primary 18.128.IN-ADDR.ARPA SRINET.ZONE + primary 33.12.192.IN-ADDR.ARPA SRI-CSL-NET.ZONE + -------------------------------------------------------- + + The cache file for BIND is usually called "named.ca". This + corresponds to file "ROOT.SERVERS" in the example section. + + ------------------------------------------------- + ;list of possible root servers + . 1 IN NS SRI-NIC.ARPA. + NS C.ISI.EDU. + NS BRL-AOS.ARPA. + NS A.ISI.EDU. + ;and their addresses + SRI-NIC.ARPA. A 10.0.0.51 + A 26.0.0.73 + C.ISI.EDU. A 10.0.0.52 + BRL-AOS.ARPA. A 192.5.25.82 + A 192.5.22.82 + A 128.20.1.2 + A.ISI.EDU. 
A 26.3.0.103 + ------------------------------------------------- + + + + + + + + + + + +Lottor [Page 21] + +RFC 1033 DOMAIN OPERATIONS GUIDE November 1987 + + +REFERENCES + + [1] Dunlap, K., "Name Server Operations Guide for BIND", CSRG, + Department of Electrical Engineering and Computer Sciences, + University of California, Berkeley, California. + + [2] Partridge, C., "Mail Routing and the Domain System", RFC-974, + CSNET CIC BBN Laboratories, January 1986. + + [3] Mockapetris, P., "Domain Names - Concepts and Facilities", + RFC-1034, USC/Information Sciences Institute, November 1987. + + [4] Mockapetris, P., "Domain Names - Implementation and + Specification", RFC-1035, USC/Information Sciences Institute, + November 1987. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Lottor [Page 22] + diff --git a/doc/rfc/rfc1034.txt b/doc/rfc/rfc1034.txt new file mode 100644 index 00000000..55cdb21f --- /dev/null +++ b/doc/rfc/rfc1034.txt @@ -0,0 +1,3077 @@ +Network Working Group P. Mockapetris +Request for Comments: 1034 ISI +Obsoletes: RFCs 882, 883, 973 November 1987 + + + DOMAIN NAMES - CONCEPTS AND FACILITIES + + + +1. STATUS OF THIS MEMO + +This RFC is an introduction to the Domain Name System (DNS), and omits +many details which can be found in a companion RFC, "Domain Names - +Implementation and Specification" [RFC-1035]. That RFC assumes that the +reader is familiar with the concepts discussed in this memo. + +A subset of DNS functions and data types constitute an official +protocol. The official protocol includes standard queries and their +responses and most of the Internet class data formats (e.g., host +addresses). + +However, the domain system is intentionally extensible. Researchers are +continuously proposing, implementing and experimenting with new data +types, query types, classes, functions, etc. 
Thus while the components +of the official protocol are expected to stay essentially unchanged and +operate as a production service, experimental behavior should always be +expected in extensions beyond the official protocol. Experimental or +obsolete features are clearly marked in these RFCs, and such information +should be used with caution. + +The reader is especially cautioned not to depend on the values which +appear in examples to be current or complete, since their purpose is +primarily pedagogical. Distribution of this memo is unlimited. + +2. INTRODUCTION + +This RFC introduces domain style names, their use for Internet mail and +host address support, and the protocols and servers used to implement +domain name facilities. + +2.1. The history of domain names + +The impetus for the development of the domain system was growth in the +Internet: + + - Host name to address mappings were maintained by the Network + Information Center (NIC) in a single file (HOSTS.TXT) which + was FTPed by all hosts [RFC-952, RFC-953]. The total network + + + +Mockapetris [Page 1] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + bandwidth consumed in distributing a new version by this + scheme is proportional to the square of the number of hosts in + the network, and even when multiple levels of FTP are used, + the outgoing FTP load on the NIC host is considerable. + Explosive growth in the number of hosts didn't bode well for + the future. + + - The network population was also changing in character. The + timeshared hosts that made up the original ARPANET were being + replaced with local networks of workstations. Local + organizations were administering their own names and + addresses, but had to wait for the NIC to change HOSTS.TXT to + make changes visible to the Internet at large. Organizations + also wanted some local structure on the name space. 
+ + - The applications on the Internet were getting more + sophisticated and creating a need for general purpose name + service. + + +The result was several ideas about name spaces and their management +[IEN-116, RFC-799, RFC-819, RFC-830]. The proposals varied, but a +common thread was the idea of a hierarchical name space, with the +hierarchy roughly corresponding to organizational structure, and names +using "." as the character to mark the boundary between hierarchy +levels. A design using a distributed database and generalized resources +was described in [RFC-882, RFC-883]. Based on experience with several +implementations, the system evolved into the scheme described in this +memo. + +The terms "domain" or "domain name" are used in many contexts beyond the +DNS described here. Very often, the term domain name is used to refer +to a name with structure indicated by dots, but no relation to the DNS. +This is particularly true in mail addressing [Quarterman 86]. + +2.2. DNS design goals + +The design goals of the DNS influence its structure. They are: + + - The primary goal is a consistent name space which will be used + for referring to resources. In order to avoid the problems + caused by ad hoc encodings, names should not be required to + contain network identifiers, addresses, routes, or similar + information as part of the name. + + - The sheer size of the database and frequency of updates + suggest that it must be maintained in a distributed manner, + with local caching to improve performance. Approaches that + + + +Mockapetris [Page 2] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + attempt to collect a consistent copy of the entire database + will become more and more expensive and difficult, and hence + should be avoided. The same principle holds for the structure + of the name space, and in particular mechanisms for creating + and deleting names; these should also be distributed. 
+ + - Where there are tradeoffs between the cost of acquiring data, the + speed of updates, and the accuracy of caches, the source of + the data should control the tradeoff. + + - The costs of implementing such a facility dictate that it be + generally useful, and not restricted to a single application. + We should be able to use names to retrieve host addresses, + mailbox data, and other as yet undetermined information. All + data associated with a name is tagged with a type, and queries + can be limited to a single type. + + - Because we want the name space to be useful in dissimilar + networks and applications, we provide the ability to use the + same name space with different protocol families or + management. For example, host address formats differ between + protocols, though all protocols have the notion of address. + The DNS tags all data with a class as well as the type, so + that we can allow parallel use of different formats for data + of type address. + + - We want name server transactions to be independent of the + communications system that carries them. Some systems may + wish to use datagrams for queries and responses, and only + establish virtual circuits for transactions that need the + reliability (e.g., database updates, long transactions); other + systems will use virtual circuits exclusively. + + - The system should be useful across a wide spectrum of host + capabilities. Both personal computers and large timeshared + hosts should be able to use the system, though perhaps in + different ways. + +2.3. Assumptions about usage + +The organization of the domain system derives from some assumptions +about the needs and usage patterns of its user community and is designed +to avoid many of the complicated problems found in general purpose +database systems. 
+ +The assumptions are: + + - The size of the total database will initially be proportional + + + +Mockapetris [Page 3] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + to the number of hosts using the system, but will eventually + grow to be proportional to the number of users on those hosts + as mailboxes and other information are added to the domain + system. + + - Most of the data in the system will change very slowly (e.g., + mailbox bindings, host addresses), but that the system should + be able to deal with subsets that change more rapidly (on the + order of seconds or minutes). + + - The administrative boundaries used to distribute + responsibility for the database will usually correspond to + organizations that have one or more hosts. Each organization + that has responsibility for a particular set of domains will + provide redundant name servers, either on the organization's + own hosts or other hosts that the organization arranges to + use. + + - Clients of the domain system should be able to identify + trusted name servers they prefer to use before accepting + referrals to name servers outside of this "trusted" set. + + - Access to information is more critical than instantaneous + updates or guarantees of consistency. Hence the update + process allows updates to percolate out through the users of + the domain system rather than guaranteeing that all copies are + simultaneously updated. When updates are unavailable due to + network or host failure, the usual course is to believe old + information while continuing efforts to update it. The + general model is that copies are distributed with timeouts for + refreshing. The distributor sets the timeout value and the + recipient of the distribution is responsible for performing + the refresh. In special situations, very short intervals can + be specified, or the owner can prohibit copies. 
+ + - In any system that has a distributed database, a particular + name server may be presented with a query that can only be + answered by some other server. The two general approaches to + dealing with this problem are "recursive", in which the first + server pursues the query for the client at another server, and + "iterative", in which the server refers the client to another + server and lets the client pursue the query. Both approaches + have advantages and disadvantages, but the iterative approach + is preferred for the datagram style of access. The domain + system requires implementation of the iterative approach, but + allows the recursive approach as an option. + + + + + +Mockapetris [Page 4] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +The domain system assumes that all data originates in master files +scattered through the hosts that use the domain system. These master +files are updated by local system administrators. Master files are text +files that are read by a local name server, and hence become available +through the name servers to users of the domain system. The user +programs access name servers through standard programs called resolvers. + +The standard format of master files allows them to be exchanged between +hosts (via FTP, mail, or some other mechanism); this facility is useful +when an organization wants a domain, but doesn't want to support a name +server. The organization can maintain the master files locally using a +text editor, transfer them to a foreign host which runs a name server, +and then arrange with the system administrator of the name server to get +the files loaded. + +Each host's name servers and resolvers are configured by a local system +administrator [RFC-1033]. For a name server, this configuration data +includes the identity of local master files and instructions on which +non-local master files are to be loaded from foreign servers. 
The name +server uses the master files or copies to load its zones. For +resolvers, the configuration data identifies the name servers which +should be the primary sources of information. + +The domain system defines procedures for accessing the data and for +referrals to other name servers. The domain system also defines +procedures for caching retrieved data and for periodic refreshing of +data defined by the system administrator. + +The system administrators provide: + + - The definition of zone boundaries. + + - Master files of data. + + - Updates to master files. + + - Statements of the refresh policies desired. + +The domain system provides: + + - Standard formats for resource data. + + - Standard methods for querying the database. + + - Standard methods for name servers to refresh local data from + foreign name servers. + + + + + +Mockapetris [Page 5] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +2.4. Elements of the DNS + +The DNS has three major components: + + - The DOMAIN NAME SPACE and RESOURCE RECORDS, which are + specifications for a tree structured name space and data + associated with the names. Conceptually, each node and leaf + of the domain name space tree names a set of information, and + query operations are attempts to extract specific types of + information from a particular set. A query names the domain + name of interest and describes the type of resource + information that is desired. For example, the Internet + uses some of its domain names to identify hosts; queries for + address resources return Internet host addresses. + + - NAME SERVERS are server programs which hold information about + the domain tree's structure and set information. 
A name + server may cache structure or set information about any part + of the domain tree, but in general a particular name server + has complete information about a subset of the domain space, + and pointers to other name servers that can be used to lead to + information from any part of the domain tree. Name servers + know the parts of the domain tree for which they have complete + information; a name server is said to be an AUTHORITY for + these parts of the name space. Authoritative information is + organized into units called ZONEs, and these zones can be + automatically distributed to the name servers which provide + redundant service for the data in a zone. + + - RESOLVERS are programs that extract information from name + servers in response to client requests. Resolvers must be + able to access at least one name server and use that name + server's information to answer a query directly, or pursue the + query using referrals to other name servers. A resolver will + typically be a system routine that is directly accessible to + user programs; hence no protocol is necessary between the + resolver and the user program. + +These three components roughly correspond to the three layers or views +of the domain system: + + - From the user's point of view, the domain system is accessed + through a simple procedure or OS call to a local resolver. + The domain space consists of a single tree and the user can + request information from any section of the tree. + + - From the resolver's point of view, the domain system is + composed of an unknown number of name servers. Each name + + + +Mockapetris [Page 6] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + server has one or more pieces of the whole domain tree's data, + but the resolver views each of these databases as essentially + static. + + - From a name server's point of view, the domain system consists + of separate sets of local information called zones. 
The name + server has local copies of some of the zones. The name server + must periodically refresh its zones from master copies in + local files or foreign name servers. The name server must + concurrently process queries that arrive from resolvers. + +In the interests of performance, implementations may couple these +functions. For example, a resolver on the same machine as a name server +might share a database consisting of the zones managed by the name +server and the cache managed by the resolver. + +3. DOMAIN NAME SPACE and RESOURCE RECORDS + +3.1. Name space specifications and terminology + +The domain name space is a tree structure. Each node and leaf on the +tree corresponds to a resource set (which may be empty). The domain +system makes no distinctions between the uses of the interior nodes and +leaves, and this memo uses the term "node" to refer to both. + +Each node has a label, which is zero to 63 octets in length. Brother +nodes may not have the same label, although the same label can be used +for nodes which are not brothers. One label is reserved, and that is +the null (i.e., zero length) label used for the root. + +The domain name of a node is the list of the labels on the path from the +node to the root of the tree. By convention, the labels that compose a +domain name are printed or read left to right, from the most specific +(lowest, farthest from the root) to the least specific (highest, closest +to the root). + +Internally, programs that manipulate domain names should represent them +as sequences of labels, where each label is a length octet followed by +an octet string. Because all domain names end at the root, which has a +null string for a label, these internal representations can use a length +byte of zero to terminate a domain name.
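The internal representation described above (a length octet before each label, with a zero length octet for the root) can be sketched as follows. This is a minimal illustration, not code from any name server; the function names encode_name and decode_name are assumptions made for the example, and the sketch ignores the message compression scheme defined in RFC-1035.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted domain name as length-prefixed labels.

    Hypothetical helper for illustration; a trailing dot is optional here.
    """
    stripped = name[:-1] if name.endswith(".") else name
    labels = stripped.split(".") if stripped else []
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        # Non-root labels are 1 to 63 octets long.
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        out.append(len(raw))  # length octet
        out += raw            # octet string
    out.append(0)             # null root label terminates the name
    return bytes(out)


def decode_name(data: bytes) -> str:
    """Decode length-prefixed labels back into an absolute dotted name."""
    labels = []
    pos = 0
    while True:
        length = data[pos]
        pos += 1
        if length == 0:       # reached the root label
            break
        labels.append(data[pos:pos + length].decode("ascii"))
        pos += length
    return ".".join(labels) + "."
```

Note that decode_name always returns the absolute form with a trailing dot, and that the case of each label is preserved in both directions.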
+ +By convention, domain names can be stored with arbitrary case, but +domain name comparisons for all present domain functions are done in a +case-insensitive manner, assuming an ASCII character set, and a high +order zero bit. This means that you are free to create a node with +label "A" or a node with label "a", but not both as brothers; you could +refer to either using "a" or "A". When you receive a domain name or + + + +Mockapetris [Page 7] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +label, you should preserve its case. The rationale for this choice is +that we may someday need to add full binary domain names for new +services; existing services would not be changed. + +When a user needs to type a domain name, the length of each label is +omitted and the labels are separated by dots ("."). Since a complete +domain name ends with the root label, this leads to a printed form which +ends in a dot. We use this property to distinguish between: + + - a character string which represents a complete domain name + (often called "absolute"). For example, "poneria.ISI.EDU." + + - a character string that represents the starting labels of a + domain name which is incomplete, and should be completed by + local software using knowledge of the local domain (often + called "relative"). For example, "poneria" used in the + ISI.EDU domain. + +Relative names are either taken relative to a well known origin, or to a +list of domains used as a search list. Relative names appear mostly at +the user interface, where their interpretation varies from +implementation to implementation, and in master files, where they are +relative to a single origin domain name. The most common interpretation +uses the root "." as either the single origin or as one of the members +of the search list, so a multi-label relative name is often one where +the trailing dot has been omitted to save typing. 
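The comparison and completion rules above can be sketched as follows. This is an illustrative sketch only: the names same_name, is_absolute, and complete are hypothetical, and the search list is assumed to hold absolute origins with "." standing for the root. Real resolvers differ in how they order and apply search lists.

```python
from typing import List


def same_name(a: str, b: str) -> bool:
    """Compare two names case-insensitively, assuming ASCII."""
    # bytes.lower() folds only ASCII letters, matching the ASCII assumption.
    return a.encode("ascii").lower() == b.encode("ascii").lower()


def is_absolute(name: str) -> bool:
    """A printed name that ends in a dot is complete (absolute)."""
    return name.endswith(".")


def complete(name: str, search_list: List[str]) -> List[str]:
    """Return the absolute candidates for `name`, in the order to try them."""
    if is_absolute(name):
        return [name]
    candidates = []
    for origin in search_list:
        if origin == ".":
            candidates.append(name + ".")           # relative to the root
        else:
            candidates.append(name + "." + origin)  # relative to an origin
    return candidates
```

For example, complete("poneria", ["ISI.EDU.", "."]) yields the candidates "poneria.ISI.EDU." and "poneria." in that order, while an absolute name is returned unchanged.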
+ +To simplify implementations, the total number of octets that represent a +domain name (i.e., the sum of all label octets and label lengths) is +limited to 255. + +A domain is identified by a domain name, and consists of that part of +the domain name space that is at or below the domain name which +specifies the domain. A domain is a subdomain of another domain if it +is contained within that domain. This relationship can be tested by +seeing if the subdomain's name ends with the containing domain's name. +For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " ". + +3.2. Administrative guidelines on use + +As a matter of policy, the DNS technical specifications do not mandate a +particular tree structure or rules for selecting labels; its goal is to +be as general as possible, so that it can be used to build arbitrary +applications. In particular, the system was designed so that the name +space did not have to be organized along the lines of network +boundaries, name servers, etc. The rationale for this is not that the +name space should have no implied semantics, but rather that the choice +of implied semantics should be left open to be used for the problem at + + + +Mockapetris [Page 8] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +hand, and that different parts of the tree can have different implied +semantics. For example, the IN-ADDR.ARPA domain is organized and +distributed by network and host address because its role is to translate +from network or host numbers to names; NetBIOS domains [RFC-1001, RFC- +1002] are flat because that is appropriate for that application. + +However, there are some guidelines that apply to the "normal" parts of +the name space used for hosts, mailboxes, etc., that will make the name +space more uniform, provide for growth, and minimize problems as +software is converted from the older host table. The political +decisions about the top levels of the tree originated in RFC-920. 
+Current policy for the top levels is discussed in [RFC-1032]. MILNET +conversion issues are covered in [RFC-1031]. + +Lower domains which will eventually be broken into multiple zones should +provide branching at the top of the domain so that the eventual +decomposition can be done without renaming. Node labels which use +special characters, leading digits, etc., are likely to break older +software which depends on more restrictive choices. + +3.3. Technical guidelines on use + +Before the DNS can be used to hold naming information for some kind of +object, two needs must be met: + + - A convention for mapping between object names and domain + names. This describes how information about an object is + accessed. + + - RR types and data formats for describing the object. + +These rules can be quite simple or fairly complex. Very often, the +designer must take into account existing formats and plan for upward +compatibility for existing usage. Multiple mappings or levels of +mapping may be required. + +For hosts, the mapping depends on the existing syntax for host names +which is a subset of the usual text representation for domain names, +together with RR formats for describing host addresses, etc. Because we +need a reliable inverse mapping from address to host name, a special +mapping for addresses into the IN-ADDR.ARPA domain is also defined. + +For mailboxes, the mapping is slightly more complex. The usual mail +address <local-part>@<mail-domain> is mapped into a domain name by +converting <local-part> into a single label (regardless of dots it +contains), converting <mail-domain> into a domain name using the usual +text format for domain names (dots denote label breaks), and +concatenating the two to form a single domain name. Thus the mailbox + + + +Mockapetris [Page 9] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +HOSTMASTER@SRI-NIC.ARPA is represented as a domain name by +HOSTMASTER.SRI-NIC.ARPA.
An appreciation for the reasons behind this +design also must take into account the scheme for mail exchanges [RFC- +974]. + +The typical user is not concerned with defining these rules, but should +understand that they usually are the result of numerous compromises +between desires for upward compatibility with old usage, interactions +between different object definitions, and the inevitable urge to add new +features when defining the rules. The way the DNS is used to support +some object is often more crucial than the restrictions inherent in the +DNS. + +3.4. Example name space + +The following figure shows a part of the current domain name space, and +is used in many examples in this RFC. Note that the tree is a very +small subset of the actual name space. + + | + | + +---------------------+------------------+ + | | | + MIL EDU ARPA + | | | + | | | + +-----+-----+ | +------+-----+-----+ + | | | | | | | + BRL NOSC DARPA | IN-ADDR SRI-NIC ACC + | + +--------+------------------+---------------+--------+ + | | | | | + UCI MIT | UDEL YALE + | ISI + | | + +---+---+ | + | | | + LCS ACHILLES +--+-----+-----+--------+ + | | | | | | + XX A C VAXA VENERA Mockapetris + +In this example, the root domain has three immediate subdomains: MIL, +EDU, and ARPA. The LCS.MIT.EDU domain has one immediate subdomain named +XX.LCS.MIT.EDU. All of the leaves are also domains. + +3.5. Preferred name syntax + +The DNS specifications attempt to be as general as possible in the rules + + + +Mockapetris [Page 10] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +for constructing domain names. The idea is that the name of any +existing object can be expressed as a domain name with minimal changes. +However, when assigning a domain name for an object, the prudent user +will select a name which satisfies both the rules of the domain system +and any existing rules for the object, whether these rules are published +or implied by existing programs. 
+ +For example, when naming a mail domain, the user should satisfy both the +rules of this memo and those in RFC-822. When creating a new host name, +the old rules for HOSTS.TXT should be followed. This avoids problems +when old software is converted to use domain names. + +The following syntax will result in fewer problems with many +applications that use domain names (e.g., mail, TELNET). + +<domain> ::= <subdomain> | " " + +<subdomain> ::= <label> | <subdomain> "." <label> + +<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ] + +<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str> + +<let-dig-hyp> ::= <let-dig> | "-" + +<let-dig> ::= <letter> | <digit> + +<letter> ::= any one of the 52 alphabetic characters A through Z in +upper case and a through z in lower case + +<digit> ::= any one of the ten digits 0 through 9 + +Note that while upper and lower case letters are allowed in domain +names, no significance is attached to the case. That is, two names with +the same spelling but different case are to be treated as if identical. + +The labels must follow the rules for ARPANET host names. They must +start with a letter, end with a letter or digit, and have as interior +characters only letters, digits, and hyphen. There are also some +restrictions on the length. Labels must be 63 characters or less. + +For example, the following strings identify hosts in the Internet: + +A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA + +3.6. Resource Records + +A domain name identifies a node. Each node has a set of resource + + + +Mockapetris [Page 11] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +information, which may be empty. The set of resource information +associated with a particular name is composed of separate resource +records (RRs). The order of RRs in a set is not significant, and need +not be preserved by name servers, resolvers, or other parts of the DNS. 
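The preferred name syntax given in section 3.5 can be checked mechanically. The following is a sketch under stated assumptions: the function names are hypothetical, the regular expression is one possible transcription of the <label> production, and the root name is deliberately left out of scope.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_preferred_label(label: str) -> bool:
    """True if the label follows the preferred syntax and the 63 limit."""
    return len(label) <= 63 and _LABEL.fullmatch(label) is not None


def is_preferred_name(name: str) -> bool:
    """True if every label of a non-root name follows the preferred syntax."""
    stripped = name[:-1] if name.endswith(".") else name
    if not stripped:
        return False  # the root name is not handled by this sketch
    return all(is_preferred_label(label) for label in stripped.split("."))
```

A name that fails this check is still a legal domain name; the check only flags names that are likely to cause trouble with older software.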
+ +When we talk about a specific RR, we assume it has the following: + +owner which is the domain name where the RR is found. + +type which is an encoded 16 bit value that specifies the type + of the resource in this resource record. Types refer to + abstract resources. + + This memo uses the following types: + + A a host address + + CNAME identifies the canonical name of an + alias + + HINFO identifies the CPU and OS used by a host + + MX identifies a mail exchange for the + domain. See [RFC-974] for details. + + NS the authoritative name server for the domain + + PTR a pointer to another part of the domain name space + + SOA identifies the start of a zone of authority + +class which is an encoded 16 bit value which identifies a + protocol family or instance of a protocol. + + This memo uses the following classes: + + IN the Internet system + + CH the Chaos system + +TTL which is the time to live of the RR. This field is a 32 + bit integer in units of seconds, and is primarily used by + resolvers when they cache RRs. The TTL describes how + long a RR can be cached before it should be discarded. + + + + +Mockapetris [Page 12] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +RDATA which is the type and sometimes class dependent data + which describes the resource: + + A For the IN class, a 32 bit IP address + + For the CH class, a domain name followed + by a 16 bit octal Chaos address. + + CNAME a domain name. + + MX a 16 bit preference value (lower is + better) followed by a host name willing + to act as a mail exchange for the owner + domain. + + NS a host name. + + PTR a domain name. + + SOA several fields. + +The owner name is often implicit, rather than forming an integral part +of the RR. For example, many name servers internally form tree or hash +structures for the name space, and chain RRs off nodes.
The remaining +RR parts are the fixed header (type, class, TTL) which is consistent for +all RRs, and a variable part (RDATA) that fits the needs of the resource +being described. + +The meaning of the TTL field is a time limit on how long an RR can be +kept in a cache. This limit does not apply to authoritative data in +zones; it is also timed out, but by the refreshing policies for the +zone. The TTL is assigned by the administrator for the zone where the +data originates. While short TTLs can be used to minimize caching, and +a zero TTL prohibits caching, the realities of Internet performance +suggest that these times should be on the order of days for the typical +host. If a change can be anticipated, the TTL can be reduced prior to +the change to minimize inconsistency during the change, and then +increased back to its former value following the change. + +The data in the RDATA section of RRs is carried as a combination of +binary strings and domain names. The domain names are frequently used +as "pointers" to other data in the DNS. + +3.6.1. Textual expression of RRs + +RRs are represented in binary form in the packets of the DNS protocol, +and are usually represented in highly encoded form when stored in a name +server or resolver. In this memo, we adopt a style similar to that used + + + +Mockapetris [Page 13] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +in master files in order to show the contents of RRs. In this format, +most RRs are shown on a single line, although continuation lines are +possible using parentheses. + +The start of the line gives the owner of the RR. If a line begins with +a blank, then the owner is assumed to be the same as that of the +previous RR. Blank lines are often included for readability. + +Following the owner, we list the TTL, type, and class of the RR. Class +and type use the mnemonics defined above, and TTL is an integer before +the type field. 
In order to avoid ambiguity in parsing, type and class +mnemonics are disjoint, TTLs are integers, and the type mnemonic is +always last. The IN class and TTL values are often omitted from examples +in the interests of clarity. + +The resource data or RDATA section of the RR is given using knowledge +of the typical representation for the data. + +For example, we might show the RRs carried in a message as: + + ISI.EDU. MX 10 VENERA.ISI.EDU. + MX 10 VAXA.ISI.EDU. + VENERA.ISI.EDU. A 128.9.0.32 + A 10.1.0.52 + VAXA.ISI.EDU. A 10.2.0.27 + A 128.9.0.33 + +The MX RRs have an RDATA section which consists of a 16 bit number +followed by a domain name. The address RRs use a standard IP address +format to contain a 32 bit internet address. + +This example shows six RRs, with two RRs at each of three domain names. + +Similarly we might see: + + XX.LCS.MIT.EDU. IN A 10.0.0.44 + CH A MIT.EDU. 2420 + +This example shows two addresses for XX.LCS.MIT.EDU, each of a different +class. + +3.6.2. Aliases and canonical names + +In existing systems, hosts and other resources often have several names +that identify the same resource. For example, the names C.ISI.EDU and +USC-ISIC.ARPA both identify the same host. Similarly, in the case of +mailboxes, many organizations provide many names that actually go to the +same mailbox; for example Mockapetris@C.ISI.EDU, Mockapetris@B.ISI.EDU, + + + +Mockapetris [Page 14] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +and PVM@ISI.EDU all go to the same mailbox (although the mechanism +behind this is somewhat complicated). + +Most of these systems have a notion that one of the equivalent set of +names is the canonical or primary name and all others are aliases. + +The domain system provides such a feature using the canonical name +(CNAME) RR. A CNAME RR identifies its owner name as an alias, and +specifies the corresponding canonical name in the RDATA section of the +RR.
If a CNAME RR is present at a node, no other data should be +present; this ensures that the data for a canonical name and its aliases +cannot be different. This rule also ensures that a cached CNAME can be +used without checking with an authoritative server for other RR types. + +CNAME RRs cause special action in DNS software. When a name server +fails to find a desired RR in the resource set associated with the +domain name, it checks to see if the resource set consists of a CNAME +record with a matching class. If so, the name server includes the CNAME +record in the response and restarts the query at the domain name +specified in the data field of the CNAME record. The one exception to +this rule is that queries which match the CNAME type are not restarted. + +For example, suppose a name server was processing a query for USC- +ISIC.ARPA, asking for type A information, and had the following resource +records: + + USC-ISIC.ARPA IN CNAME C.ISI.EDU + + C.ISI.EDU IN A 10.0.0.52 + +Both of these RRs would be returned in the response to the type A query, +while a type CNAME or * query should return just the CNAME. + +Domain names in RRs which point at another name should always point at +the primary name and not the alias. This avoids extra indirections in +accessing information. For example, the address to name RR for the +above host should be: + + 52.0.0.10.IN-ADDR.ARPA IN PTR C.ISI.EDU + +rather than pointing at USC-ISIC.ARPA. Of course, by the robustness +principle, domain software should not fail when presented with CNAME +chains or loops; CNAME chains should be followed and CNAME loops +signalled as an error. + +3.7. Queries + +Queries are messages which may be sent to a name server to provoke a + + + +Mockapetris [Page 15] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +response. In the Internet, queries are carried in UDP datagrams or over +TCP connections.
The response by the name server either answers the +question posed in the query, refers the requester to another set of name +servers, or signals some error condition. + +In general, the user does not generate queries directly, but instead +makes a request to a resolver which in turn sends one or more queries to +name servers and deals with the error conditions and referrals that may +result. Of course, the possible questions which can be asked in a query +do shape the kind of service a resolver can provide. + +DNS queries and responses are carried in a standard message format. The +message format has a header containing a number of fixed fields which +are always present, and four sections which carry query parameters and +RRs. + +The most important field in the header is a four bit field called an +opcode which separates different queries. Of the possible 16 values, +one (standard query) is part of the official protocol, two (inverse +query and status query) are options, one (completion) is obsolete, and +the rest are unassigned. + +The four sections are: + +Question Carries the query name and other query parameters. + +Answer Carries RRs which directly answer the query. + +Authority Carries RRs which describe other authoritative servers. + May optionally carry the SOA RR for the authoritative + data in the answer section. + +Additional Carries RRs which may be helpful in using the RRs in the + other sections. + +Note that the content, but not the format, of these sections varies with +header opcode. + +3.7.1. Standard queries + +A standard query specifies a target domain name (QNAME), query type +(QTYPE), and query class (QCLASS) and asks for RRs which match. This +type of query makes up such a vast majority of DNS queries that we use +the term "query" to mean standard query unless otherwise specified. The +QTYPE and QCLASS fields are each 16 bits long, and are a superset of +defined types and classes.
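The message layout described in section 3.7 (a header with a four bit opcode, followed by four sections) can be modelled in memory as follows. This is only a sketch: the class and field names are hypothetical, types and classes are kept as mnemonic strings rather than 16 bit codes, and the numeric opcode values are those assigned in RFC-1035 rather than in this memo.

```python
from dataclasses import dataclass, field
from typing import List

# Opcode values as assigned in RFC-1035 (assumption: not stated in this memo).
OPCODE_QUERY = 0   # standard query
OPCODE_IQUERY = 1  # inverse query (optional)
OPCODE_STATUS = 2  # status query (optional)


@dataclass
class Question:
    qname: str   # target domain name
    qtype: str   # query type mnemonic, e.g. "MX"
    qclass: str  # query class mnemonic, e.g. "IN"


@dataclass
class RR:
    owner: str
    rtype: str
    rclass: str
    ttl: int
    rdata: str


@dataclass
class Message:
    opcode: int
    question: List[Question] = field(default_factory=list)
    answer: List[RR] = field(default_factory=list)
    authority: List[RR] = field(default_factory=list)
    additional: List[RR] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The opcode is a four bit field, so only 16 values are possible.
        if not 0 <= self.opcode <= 15:
            raise ValueError("opcode must fit in four bits")


def standard_query(qname: str, qtype: str, qclass: str = "IN") -> Message:
    """Build a standard query carrying a single question."""
    return Message(opcode=OPCODE_QUERY,
                   question=[Question(qname, qtype, qclass)])
```

For instance, standard_query("ISI.EDU.", "MX") produces a message whose question section holds one entry and whose answer, authority, and additional sections are empty, ready to be filled in by a responding name server.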
+ + + + + +Mockapetris [Page 16] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +The QTYPE field may contain: + +<any type> matches just that type. (e.g., A, PTR). + +AXFR special zone transfer QTYPE. + +MAILB matches all mail box related RRs (e.g. MB and MG). + +* matches all RR types. + +The QCLASS field may contain: + +<any class> matches just that class (e.g., IN, CH). + +* matches all RR classes. + +Using the query domain name, QTYPE, and QCLASS, the name server looks +for matching RRs. In addition to relevant records, the name server may +return RRs that point toward a name server that has the desired +information or RRs that are expected to be useful in interpreting the +relevant RRs. For example, a name server that doesn't have the +requested information may know a name server that does; a name server +that returns a domain name in a relevant RR may also return the RR that +binds that domain name to an address. + +For example, a mailer trying to send mail to Mockapetris@ISI.EDU might +ask the resolver for mail information about ISI.EDU, resulting in a +query for QNAME=ISI.EDU, QTYPE=MX, QCLASS=IN. The response's answer +section would be: + + ISI.EDU. MX 10 VENERA.ISI.EDU. + MX 10 VAXA.ISI.EDU. + +while the additional section might be: + + VAXA.ISI.EDU. A 10.2.0.27 + A 128.9.0.33 + VENERA.ISI.EDU. A 10.1.0.52 + A 128.9.0.32 + +Because the server assumes that if the requester wants mail exchange +information, it will probably want the addresses of the mail exchanges +soon afterward. + +Note that the QCLASS=* construct requires special interpretation +regarding authority. Since a particular name server may not know all of +the classes available in the domain system, it can never know if it is +authoritative for all classes. Hence responses to QCLASS=* queries can + + + +Mockapetris [Page 17] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +never be authoritative. + +3.7.2.
Inverse queries (Optional) + +Name servers may also support inverse queries that map a particular +resource to a domain name or domain names that have that resource. For +example, while a standard query might map a domain name to a SOA RR, the +corresponding inverse query might map the SOA RR back to the domain +name. + +Implementation of this service is optional in a name server, but all +name servers must at least be able to understand an inverse query +message and return a not-implemented error response. + +The domain system cannot guarantee the completeness or uniqueness of +inverse queries because the domain system is organized by domain name +rather than by host address or any other resource type. Inverse queries +are primarily useful for debugging and database maintenance activities. + +Inverse queries may not return the proper TTL, and do not indicate cases +where the identified RR is one of a set (for example, one address for a +host having multiple addresses). Therefore, the RRs returned in inverse +queries should never be cached. + +Inverse queries are NOT an acceptable method for mapping host addresses +to host names; use the IN-ADDR.ARPA domain instead. + +A detailed discussion of inverse queries is contained in [RFC-1035]. + +3.8. Status queries (Experimental) + +To be defined. + +3.9. Completion queries (Obsolete) + +The optional completion services described in RFCs 882 and 883 have been +deleted. Redesigned services may become available in the future, or the +opcodes may be reclaimed for other use. + +4. NAME SERVERS + +4.1. Introduction + +Name servers are the repositories of information that make up the domain +database. The database is divided up into sections called zones, which +are distributed among the name servers. While name servers can have +several optional functions and sources of data, the essential task of a +name server is to answer queries using data in its zones. 
By design, + + + +Mockapetris [Page 18] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +name servers can answer queries in a simple manner; the response can +always be generated using only local data, and either contains the +answer to the question or a referral to other name servers "closer" to +the desired information. + +A given zone will be available from several name servers to insure its +availability in spite of host or communication link failure. By +administrative fiat, we require every zone to be available on at least +two servers, and many zones have more redundancy than that. + +A given name server will typically support one or more zones, but this +gives it authoritative information about only a small section of the +domain tree. It may also have some cached non-authoritative data about +other parts of the tree. The name server marks its responses to queries +so that the requester can tell whether the response comes from +authoritative data or not. + +4.2. How the database is divided into zones + +The domain database is partitioned in two ways: by class, and by "cuts" +made in the name space between nodes. + +The class partition is simple. The database for any class is organized, +delegated, and maintained separately from all other classes. Since, by +convention, the name spaces are the same for all classes, the separate +classes can be thought of as an array of parallel namespace trees. Note +that the data attached to nodes will be different for these different +parallel classes. The most common reasons for creating a new class are +the necessity for a new data format for existing types or a desire for a +separately managed version of the existing name space. + +Within a class, "cuts" in the name space can be made between any two +adjacent nodes. After all cuts are made, each group of connected name +space is a separate zone. The zone is said to be authoritative for all +names in the connected region. 
Note that the "cuts" in the name space +may be in different places for different classes, the name servers may +be different, etc. + +These rules mean that every zone has at least one node, and hence domain +name, for which it is authoritative, and all of the nodes in a +particular zone are connected. Given the tree structure, every zone +has a highest node which is closer to the root than any other node in +the zone. The name of this node is often used to identify the zone. + +It would be possible, though not particularly useful, to partition the +name space so that each domain name was in a separate zone or so that +all nodes were in a single zone. Instead, the database is partitioned +at points where a particular organization wants to take over control of + + + +Mockapetris [Page 19] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +a subtree. Once an organization controls its own zone it can +unilaterally change the data in the zone, grow new tree sections +connected to the zone, delete existing nodes, or delegate new subzones +under its zone. + +If the organization has substructure, it may want to make further +internal partitions to achieve nested delegations of name space control. +In some cases, such divisions are made purely to make database +maintenance more convenient. + +4.2.1. Technical considerations + +The data that describes a zone has four major parts: + + - Authoritative data for all nodes within the zone. + + - Data that defines the top node of the zone (can be thought of + as part of the authoritative data). + + - Data that describes delegated subzones, i.e., cuts around the + bottom of the zone. + + - Data that allows access to name servers for subzones + (sometimes called "glue" data). + +All of this data is expressed in the form of RRs, so a zone can be +completely described in terms of a set of RRs.
Whole zones can be +transferred between name servers by transferring the RRs, either carried +in a series of messages or by FTPing a master file which is a textual +representation. + +The authoritative data for a zone is simply all of the RRs attached to +all of the nodes from the top node of the zone down to leaf nodes or +nodes above cuts around the bottom edge of the zone. + +Though logically part of the authoritative data, the RRs that describe +the top node of the zone are especially important to the zone's +management. These RRs are of two types: name server RRs that list, one +per RR, all of the servers for the zone, and a single SOA RR that +describes zone management parameters. + +The RRs that describe cuts around the bottom of the zone are NS RRs that +name the servers for the subzones. Since the cuts are between nodes, +these RRs are NOT part of the authoritative data of the zone, and should +be exactly the same as the corresponding RRs in the top node of the +subzone. Since name servers are always associated with zone boundaries, +NS RRs are only found at nodes which are the top node of some zone. In +the data that makes up a zone, NS RRs are found at the top node of the + + + +Mockapetris [Page 20] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +zone (and are authoritative) and at cuts around the bottom of the zone +(where they are not authoritative), but never in between. + +One of the goals of the zone structure is that any zone have all the +data required to set up communications with the name servers for any +subzones. That is, parent zones have all the information needed to +access servers for their children zones. The NS RRs that name the +servers for subzones are often not enough for this task since they name +the servers, but do not give their addresses. 
In particular, if the +name of the name server is itself in the subzone, we could be faced with +the situation where the NS RRs tell us that in order to learn a name +server's address, we should contact the server using the address we wish +to learn. To fix this problem, a zone contains "glue" RRs which are not +part of the authoritative data, and are address RRs for the servers. +These RRs are only necessary if the name server's name is "below" the +cut, and are only used as part of a referral response. + +4.2.2. Administrative considerations + +When some organization wants to control its own domain, the first step +is to identify the proper parent zone, and get the parent zone's owners +to agree to the delegation of control. While there are no particular +technical constraints dealing with where in the tree this can be done, +there are some administrative groupings discussed in [RFC-1032] which +deal with top level organization, and middle level zones are free to +create their own rules. For example, one university might choose to use +a single zone, while another might choose to organize by subzones +dedicated to individual departments or schools. [RFC-1033] catalogs +available DNS software and discusses administration procedures. + +Once the proper name for the new subzone is selected, the new owners +should be required to demonstrate redundant name server support. Note +that there is no requirement that the servers for a zone reside in a +host which has a name in that domain. In many cases, a zone will be +more accessible to the internet at large if its servers are widely +distributed rather than being within the physical facilities controlled +by the same organization that manages the zone. For example, in the +current DNS, one of the name servers for the United Kingdom, or UK +domain, is found in the US. This allows US hosts to get UK data without +using limited transatlantic bandwidth.
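
The glue rule from section 4.2.1 (address RRs are needed in the parent
zone only when a server's name lies "below" the cut) can be illustrated
with a small sketch. This is not part of the specification: the helper
names labels, is_at_or_below, and needs_glue are hypothetical, and the
sketch treats a server named exactly at the top of the subzone as also
needing glue, which is an assumption made here.

```python
def labels(name):
    """Split a domain name into lower-cased labels, ignoring a trailing dot."""
    return [label.lower() for label in name.rstrip(".").split(".") if label]


def is_at_or_below(name, ancestor):
    """True if `name` equals `ancestor` or is a descendant of it.

    The comparison is done label by label, so "BADISI.EDU" is not
    mistaken for a descendant of "ISI.EDU".
    """
    n, a = labels(name), labels(ancestor)
    return len(n) >= len(a) and n[len(n) - len(a):] == a


def needs_glue(ns_name, subzone):
    """Hypothetical check: does the parent need address RRs for this server?

    Glue is needed when the server's own name sits inside the delegated
    subzone, because its address could otherwise only be learned by
    asking the very server we are trying to reach.
    """
    return is_at_or_below(ns_name, subzone)
```

For example, under these assumptions needs_glue("NS.ISI.EDU", "ISI.EDU")
is true, while a server for ISI.EDU whose name lies in some other zone
needs no glue in the parent.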
+ +As the last installation step, the delegation NS RRs and glue RRs +necessary to make the delegation effective should be added to the parent +zone. The administrators of both zones should insure that the NS and +glue RRs which mark both sides of the cut are consistent and remain so. + +4.3. Name server internals + + + + +Mockapetris [Page 21] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +4.3.1. Queries and responses + +The principal activity of name servers is to answer standard queries. +Both the query and its response are carried in a standard message format +which is described in [RFC-1035]. The query contains a QTYPE, QCLASS, +and QNAME, which describe the types and classes of desired information +and the name of interest. + +The way that the name server answers the query depends upon whether it +is operating in recursive mode or not: + + - The simplest mode for the server is non-recursive, since it + can answer queries using only local information: the response + contains an error, the answer, or a referral to some other + server "closer" to the answer. All name servers must + implement non-recursive queries. + + - The simplest mode for the client is recursive, since in this + mode the name server acts in the role of a resolver and + returns either an error or the answer, but never referrals. + This service is optional in a name server, and the name server + may also choose to restrict the clients which can use + recursive mode. + +Recursive service is helpful in several situations: + + - a relatively simple requester that lacks the ability to use + anything other than a direct answer to the question. + + - a request that needs to cross protocol or other boundaries and + can be sent to a server which can act as intermediary. + + - a network where we want to concentrate the cache rather than + having a separate cache for each client. 
+ +Non-recursive service is appropriate if the requester is capable of +pursuing referrals and interested in information which will aid future +requests. + +The use of recursive mode is limited to cases where both the client and +the name server agree to its use. The agreement is negotiated through +the use of two bits in query and response messages: + + - The recursion available, or RA bit, is set or cleared by a + name server in all responses. The bit is true if the name + server is willing to provide recursive service for the client, + regardless of whether the client requested recursive service. + That is, RA signals availability rather than use. + + + +Mockapetris [Page 22] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + - Queries contain a bit called recursion desired or RD. This + bit specifies whether the requester wants recursive + service for this query. Clients may request recursive service + from any name server, though they should depend upon receiving + it only from servers which have previously sent an RA, or + servers which have agreed to provide service through private + agreement or some other means outside of the DNS protocol. + +The recursive mode occurs when a query with RD set arrives at a server +which is willing to provide recursive service; the client can verify +that recursive mode was used by checking that both RA and RD are set in +the reply. Note that the name server should never perform recursive +service unless asked via RD, since this interferes with troubleshooting +of name servers and their databases. + +If recursive service is requested and available, the recursive response +to a query will be one of the following: + + - The answer to the query, possibly prefaced by one or more CNAME + RRs that specify aliases encountered on the way to an answer. + + - A name error indicating that the name does not exist.
This + may include CNAME RRs that indicate that the original query + name was an alias for a name which does not exist. + + - A temporary error indication. + +If recursive service is not requested or is not available, the non- +recursive response will be one of the following: + + - An authoritative name error indicating that the name does not + exist. + + - A temporary error indication. + + - Some combination of: + + RRs that answer the question, together with an indication + whether the data comes from a zone or is cached. + + A referral to name servers which have zones which are closer + ancestors to the name than the server sending the reply. + + - RRs that the name server thinks will prove useful to the + requester. + + + + + + +Mockapetris [Page 23] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +4.3.2. Algorithm + +The actual algorithm used by the name server will depend on the local OS +and data structures used to store RRs. The following algorithm assumes +that the RRs are organized in several tree structures, one for each +zone, and another for the cache: + + 1. Set or clear the value of recursion available in the response + depending on whether the name server is willing to provide + recursive service. If recursive service is available and + requested via the RD bit in the query, go to step 5, + otherwise step 2. + + 2. Search the available zones for the zone which is the nearest + ancestor to QNAME. If such a zone is found, go to step 3, + otherwise step 4. + + 3. Start matching down, label by label, in the zone. The + matching process can terminate several ways: + + a. If the whole of QNAME is matched, we have found the + node. + + If the data at the node is a CNAME, and QTYPE doesn't + match CNAME, copy the CNAME RR into the answer section + of the response, change QNAME to the canonical name in + the CNAME RR, and go back to step 1. + + Otherwise, copy all RRs which match QTYPE into the + answer section and go to step 6. + + b. 
If a match would take us out of the authoritative data, + we have a referral. This happens when we encounter a + node with NS RRs marking cuts along the bottom of a + zone. + + Copy the NS RRs for the subzone into the authority + section of the reply. Put whatever addresses are + available into the additional section, using glue RRs + if the addresses are not available from authoritative + data or the cache. Go to step 4. + + c. If at some label, a match is impossible (i.e., the + corresponding label does not exist), look to see if + the "*" label exists. + + If the "*" label does not exist, check whether the name + we are looking for is the original QNAME in the query + + + +Mockapetris [Page 24] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + or a name we have followed due to a CNAME. If the name + is original, set an authoritative name error in the + response and exit. Otherwise just exit. + + If the "*" label does exist, match RRs at that node + against QTYPE. If any match, copy them into the answer + section, but set the owner of the RR to be QNAME, and + not the node with the "*" label. Go to step 6. + + 4. Start matching down in the cache. If QNAME is found in the + cache, copy all RRs attached to it that match QTYPE into the + answer section. If there was no delegation from + authoritative data, look for the best one from the cache, and + put it in the authority section. Go to step 6. + + 5. Use the local resolver or a copy of its algorithm (see + resolver section of this memo) to answer the query. Store + the results, including any intermediate CNAMEs, in the answer + section of the response. + + 6. Using local data only, attempt to add other RRs which may be + useful to the additional section of the query. Exit. + +4.3.3. Wildcards + +In the previous algorithm, special treatment was given to RRs with owner +names starting with the label "*". Such RRs are called wildcards.
+Wildcard RRs can be thought of as instructions for synthesizing RRs. +When the appropriate conditions are met, the name server creates RRs +with an owner name equal to the query name and contents taken from the +wildcard RRs. + +This facility is most often used to create a zone which will be used to +forward mail from the Internet to some other mail system. The general +idea is that any name in that zone which is presented to the server in a +query will be assumed to exist, with certain properties, unless explicit +evidence exists to the contrary. Note that the use of the term zone +here, instead of domain, is intentional; such defaults do not propagate +across zone boundaries, although a subzone may choose to achieve that +appearance by setting up similar defaults. + +The contents of the wildcard RRs follow the usual rules and formats for +RRs. The wildcards in the zone have an owner name that controls the +query names they will match. The owner name of the wildcard RRs is of +the form "*.<anydomain>", where <anydomain> is any domain name. +<anydomain> should not contain other * labels, and should be in the +authoritative data of the zone. The wildcards potentially apply to +descendants of <anydomain>, but not to <anydomain> itself. Another way + + + +Mockapetris [Page 25] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +to look at this is that the "*" label always matches at least one whole +label and sometimes more, but always whole labels. + +Wildcard RRs do not apply: + + - When the query is in another zone. That is, delegation cancels + the wildcard defaults. + + - When the query name or a name between the wildcard domain and + the query name is known to exist. For example, if a wildcard + RR has an owner name of "*.X", and the zone also contains RRs + attached to B.X, the wildcards would apply to queries for name + Z.X (presuming there is no explicit information for Z.X), but + not to B.X, A.B.X, or X.
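
The rules above can be made concrete with a short sketch in Python. It
is only an illustration under stated assumptions: the function name
wildcard_applies and the idea of passing the zone's explicit names as a
plain collection are hypothetical, and the sketch does not model
delegations (the first exclusion above), only the "known to exist" rule
and the rule that the wildcard never covers <anydomain> itself.

```python
def labels(name):
    """Split a domain name into a tuple of lower-cased labels."""
    return tuple(label.lower() for label in name.rstrip(".").split(".") if label)


def wildcard_applies(wildcard_owner, qname, existing_names):
    """Decide whether the wildcard at `wildcard_owner` covers `qname`.

    `existing_names` is assumed to hold every name in the zone that has
    explicit RRs attached (excluding the wildcard owner itself).
    """
    owner = labels(wildcard_owner)
    if not owner or owner[0] != "*":
        raise ValueError("owner name must start with the '*' label")
    base = owner[1:]          # this is <anydomain>
    q = labels(qname)

    # The query name must be a proper descendant of <anydomain>.
    if len(q) <= len(base) or q[len(q) - len(base):] != base:
        return False

    existing = {labels(n) for n in existing_names}
    # Check the query name and every name between it and <anydomain>.
    for i in range(len(q) - len(base)):
        if q[i:] in existing:
            return False
    return True
```

With the example from the text, a wildcard at "*.X" and explicit data at
B.X, the function reports that the wildcard applies to Z.X but not to
B.X, A.B.X, or X.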
+ +A * label appearing in a query name has no special effect, but can be +used to test for wildcards in an authoritative zone; such a query is the +only way to get a response containing RRs with an owner name with * in +it. The result of such a query should not be cached. + +Note that the contents of the wildcard RRs are not modified when used to +synthesize RRs. + +To illustrate the use of wildcard RRs, suppose a large company with a +large, non-IP/TCP, network wanted to create a mail gateway. If the +company was called X.COM, and IP/TCP capable gateway machine was called +A.X.COM, the following RRs might be entered into the COM zone: + + X.COM MX 10 A.X.COM + + *.X.COM MX 10 A.X.COM + + A.X.COM A 1.2.3.4 + A.X.COM MX 10 A.X.COM + + *.A.X.COM MX 10 A.X.COM + +This would cause any MX query for any domain name ending in X.COM to +return an MX RR pointing at A.X.COM. Two wildcard RRs are required +since the effect of the wildcard at *.X.COM is inhibited in the A.X.COM +subtree by the explicit data for A.X.COM. Note also that the explicit +MX data at X.COM and A.X.COM is required, and that none of the RRs above +would match a query name of XX.COM. + +4.3.4. Negative response caching (Optional) + +The DNS provides an optional service which allows name servers to +distribute, and resolvers to cache, negative results with TTLs. For + + + +Mockapetris [Page 26] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +example, a name server can distribute a TTL along with a name error +indication, and a resolver receiving such information is allowed to +assume that the name does not exist during the TTL period without +consulting authoritative data. Similarly, a resolver can make a query +with a QTYPE which matches multiple types, and cache the fact that some +of the types are not present. 
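
As an illustration of the caching behavior just described, the following
Python sketch keeps negative results until their TTL runs out. It is a
minimal sketch, not a prescribed design: the class NegativeCache, its
method names, and the injectable clock are all assumptions made here,
and QCLASS is left out for brevity.

```python
import time

NAME_ERROR = "name error"
NO_DATA = "data not found"


class NegativeCache:
    """Hypothetical store for negative results, each with its own TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the cache can be tested
        self._entries = {}       # (name, qtype or None) -> expiry time

    def _put(self, key, ttl):
        self._entries[key] = self._clock() + ttl

    def _fresh(self, key):
        expiry = self._entries.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._entries[key]   # TTL has run out; forget the entry
            return False
        return True

    def store_name_error(self, name, ttl):
        """Remember that `name` does not exist at all."""
        self._put((name.lower(), None), ttl)

    def store_no_data(self, name, qtype, ttl):
        """Remember that `name` exists but has no RRs of type `qtype`."""
        self._put((name.lower(), qtype.upper()), ttl)

    def lookup(self, name, qtype):
        """Return NAME_ERROR, NO_DATA, or None if nothing usable is cached."""
        name = name.lower()
        if self._fresh((name, None)):
            return NAME_ERROR
        if self._fresh((name, qtype.upper())):
            return NO_DATA
        return None
```

A resolver built this way would consult lookup() before sending a query
and fall back to the name servers only when it returns None.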
+ +This feature can be particularly important in a system which implements +naming shorthands that use search lists because a popular shorthand, +which happens to require a suffix toward the end of the search list, +will generate multiple name errors whenever it is used. + +The method is that a name server may add an SOA RR to the additional +section of a response when that response is authoritative. The SOA must +be that of the zone which was the source of the authoritative data in +the answer section, or name error if applicable. The MINIMUM field of +the SOA controls the length of time that the negative result may be +cached. + +Note that in some circumstances, the answer section may contain multiple +owner names. In this case, the SOA mechanism should only be used for +the data which matches QNAME, which is the only authoritative data in +this section. + +Name servers and resolvers should never attempt to add SOAs to the +additional section of a non-authoritative response, or attempt to infer +results which are not directly stated in an authoritative response. +There are several reasons for this, including: cached information isn't +usually enough to match up RRs and their zone names, SOA RRs may be +cached due to direct SOA queries, and name servers are not required to +output the SOAs in the authority section. + +This feature is optional, although a refined version is expected to +become part of the standard protocol in the future. Name servers are +not required to add the SOA RRs in all authoritative responses, nor are +resolvers required to cache negative results. Both are recommended. +All resolvers and recursive name servers are required to at least be +able to ignore the SOA RR when it is present in a response. + +Some experiments have also been proposed which will use this feature.
+The idea is that if cached data is known to come from a particular zone, +and if an authoritative copy of the zone's SOA is obtained, and if the +zone's SERIAL has not changed since the data was cached, then the TTL of +the cached data can be reset to the zone MINIMUM value if it is smaller. +This usage is mentioned for planning purposes only, and is not +recommended as yet. + + + + + +Mockapetris [Page 27] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +4.3.5. Zone maintenance and transfers + +Part of the job of a zone administrator is to maintain the zones at all +of the name servers which are authoritative for the zone. When the +inevitable changes are made, they must be distributed to all of the name +servers. While this distribution can be accomplished using FTP or some +other ad hoc procedure, the preferred method is the zone transfer part +of the DNS protocol. + +The general model of automatic zone transfer or refreshing is that one +of the name servers is the master or primary for the zone. Changes are +coordinated at the primary, typically by editing a master file for the +zone. After editing, the administrator signals the master server to +load the new zone. The other non-master or secondary servers for the +zone periodically check for changes (at a selectable interval) and +obtain new zone copies when changes have been made. + +To detect changes, secondaries just check the SERIAL field of the SOA +for the zone. In addition to whatever other changes are made, the +SERIAL field in the SOA of the zone is always advanced whenever any +change is made to the zone. The advancing can be a simple increment, or +could be based on the write date and time of the master file, etc. The +purpose is to make it possible to determine which of two copies of a +zone is more recent by comparing serial numbers. 
Serial number advances +and comparisons use sequence space arithmetic, so there is a theoretic +limit on how fast a zone can be updated, basically that old copies must +die out before the serial number covers half of its 32 bit range. In +practice, the only concern is that the compare operation deals properly +with comparisons around the boundary between the most positive and most +negative 32 bit numbers. + +The periodic polling of the secondary servers is controlled by +parameters in the SOA RR for the zone, which set the minimum acceptable +polling intervals. The parameters are called REFRESH, RETRY, and +EXPIRE. Whenever a new zone is loaded in a secondary, the secondary +waits REFRESH seconds before checking with the primary for a new serial. +If this check cannot be completed, new checks are started every RETRY +seconds. The check is a simple query to the primary for the SOA RR of +the zone. If the serial field in the secondary's zone copy is equal to +the serial returned by the primary, then no changes have occurred, and +the REFRESH interval wait is restarted. If the secondary finds it +impossible to perform a serial check for the EXPIRE interval, it must +assume that its copy of the zone is obsolete and discard it. + +When the poll shows that the zone has changed, then the secondary server +must request a zone transfer via an AXFR request for the zone. The AXFR +may cause an error, such as refused, but normally is answered by a +sequence of response messages. The first and last messages must contain + + + +Mockapetris [Page 28] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +the data for the top authoritative node of the zone. Intermediate +messages carry all of the other RRs from the zone, including both +authoritative and non-authoritative RRs. The stream of messages allows +the secondary to construct a copy of the zone. Because accuracy is +essential, TCP or some other reliable protocol must be used for AXFR +requests.
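
The serial comparison described earlier in this section, where serials
wrap around a 32 bit range, can be sketched as follows. This memo does
not spell out the arithmetic; the formulation below is one common
reading (the same idea was later written down in RFC 1982), and the
function names serial_is_newer and secondary_needs_transfer are
hypothetical. When two serials differ by exactly half the range the
ordering is undefined; this sketch simply reports "not newer".

```python
MASK = 0xFFFFFFFF   # serials are 32 bit unsigned values
HALF = 1 << 31      # half of the 32 bit range


def serial_is_newer(a, b):
    """True if serial `a` is more recent than serial `b`.

    The difference is taken modulo 2**32, so a serial that has wrapped
    past the maximum value still compares as newer than the old one.
    """
    diff = (a - b) & MASK
    return 0 < diff < HALF


def secondary_needs_transfer(local_serial, primary_serial):
    """Hypothetical poll decision: transfer only if the primary is ahead."""
    return serial_is_newer(primary_serial, local_serial)
```

Under this reading, a secondary holding serial 4294967295 that sees the
primary answer with serial 0 treats the zone as changed, while equal
serials leave the REFRESH wait to restart as described above.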
+ +Each secondary server is required to perform the following operations +against the master, but may also optionally perform these operations +against other secondary servers. This strategy can improve the transfer +process when the primary is unavailable due to host downtime or network +problems, or when a secondary server has better network access to an +"intermediate" secondary than to the primary. + +5. RESOLVERS + +5.1. Introduction + +Resolvers are programs that interface user programs to domain name +servers. In the simplest case, a resolver receives a request from a +user program (e.g., mail programs, TELNET, FTP) in the form of a +subroutine call, system call etc., and returns the desired information +in a form compatible with the local host's data formats. + +The resolver is located on the same machine as the program that requests +the resolver's services, but it may need to consult name servers on +other hosts. Because a resolver may need to consult several name +servers, or may have the requested information in a local cache, the +amount of time that a resolver will take to complete can vary quite a +bit, from milliseconds to several seconds. + +A very important goal of the resolver is to eliminate network delay and +name server load from most requests by answering them from its cache of +prior results. It follows that caches which are shared by multiple +processes, users, machines, etc., are more efficient than non-shared +caches. + +5.2. Client-resolver interface + +5.2.1. Typical functions + +The client interface to the resolver is influenced by the local host's +conventions, but the typical resolver-client interface has three +functions: + + 1. Host name to host address translation. + + This function is often defined to mimic a previous HOSTS.TXT + + + +Mockapetris [Page 29] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + based function. Given a character string, the caller wants + one or more 32 bit IP addresses. 
Under the DNS, it + translates into a request for type A RRs. Since the DNS does + not preserve the order of RRs, this function may choose to + sort the returned addresses or select the "best" address if + the service returns only one choice to the client. Note that + a multiple address return is recommended, but a single + address may be the only way to emulate prior HOSTS.TXT + services. + + 2. Host address to host name translation + + This function will often follow the form of previous + functions. Given a 32 bit IP address, the caller wants a + character string. The octets of the IP address are reversed, + used as name components, and suffixed with "IN-ADDR.ARPA". A + type PTR query is used to get the RR with the primary name of + the host. For example, a request for the host name + corresponding to IP address 1.2.3.4 looks for PTR RRs for + domain name "4.3.2.1.IN-ADDR.ARPA". + + 3. General lookup function + + This function retrieves arbitrary information from the DNS, + and has no counterpart in previous systems. The caller + supplies a QNAME, QTYPE, and QCLASS, and wants all of the + matching RRs. This function will often use the DNS format + for all RR data instead of the local host's, and returns all + RR content (e.g., TTL) instead of a processed form with local + quoting conventions. + +When the resolver performs the indicated function, it usually has one of +the following results to pass back to the client: + + - One or more RRs giving the requested data. + + In this case the resolver returns the answer in the + appropriate format. + + - A name error (NE). + + This happens when the referenced name does not exist. For + example, a user may have mistyped a host name. + + - A data not found error. + + This happens when the referenced name exists, but data of the + appropriate type does not. 
For example, a host address + + + +Mockapetris [Page 30] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + function applied to a mailbox name would return this error + since the name exists, but no address RR is present. + +It is important to note that the functions for translating between host +names and addresses may combine the "name error" and "data not found" +error conditions into a single type of error return, but the general +function should not. One reason for this is that applications may ask +first for one type of information about a name followed by a second +request to the same name for some other type of information; if the two +errors are combined, then useless queries may slow the application. + +5.2.2. Aliases + +While attempting to resolve a particular request, the resolver may find +that the name in question is an alias. For example, the resolver might +find that the name given for host name to address translation is an +alias when it finds the CNAME RR. If possible, the alias condition +should be signalled back from the resolver to the client. + +In most cases a resolver simply restarts the query at the new name when +it encounters a CNAME. However, when performing the general function, +the resolver should not pursue aliases when the CNAME RR matches the +query type. This allows queries which ask whether an alias is present. +For example, if the query type is CNAME, the user is interested in the +CNAME RR itself, and not the RRs at the name it points to. + +Several special conditions can occur with aliases. Multiple levels of +aliases should be avoided due to their lack of efficiency, but should +not be signalled as an error. Alias loops and aliases which point to +non-existent names should be caught and an error condition passed back +to the client. + +5.2.3. Temporary failures + +In a less than perfect world, all resolvers will occasionally be unable +to resolve a particular request. 
This condition can be caused by a +resolver which becomes separated from the rest of the network due to a +link failure or gateway problem, or less often by coincident failure or +unavailability of all servers for a particular domain. + +It is essential that this sort of condition should not be signalled as a +name or data not present error to applications. This sort of behavior +is annoying to humans, and can wreak havoc when mail systems use the +DNS. + +While in some cases it is possible to deal with such a temporary problem +by blocking the request indefinitely, this is usually not a good choice, +particularly when the client is a server process that could move on to + + + +Mockapetris [Page 31] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +other tasks. The recommended solution is to always have temporary +failure as one of the possible results of a resolver function, even +though this may make emulation of existing HOSTS.TXT functions more +difficult. + +5.3. Resolver internals + +Every resolver implementation uses slightly different algorithms, and +typically spends much more logic dealing with errors of various sorts +than typical occurrences. This section outlines a recommended basic +strategy for resolver operation, but leaves details to [RFC-1035]. + +5.3.1. Stub resolvers + +One option for implementing a resolver is to move the resolution +function out of the local machine and into a name server which supports +recursive queries. This can provide an easy method of providing domain +service in a PC which lacks the resources to perform the resolver +function, or can centralize the cache for a whole local network or +organization. + +All that the remaining stub needs is a list of name server addresses +that will perform the recursive requests. This type of resolver +presumably needs the information in a configuration file, since it +probably lacks the sophistication to locate it in the domain database.
+The user also needs to verify that the listed servers will perform the +recursive service; a name server is free to refuse to perform recursive +services for any or all clients. The user should consult the local +system administrator to find name servers willing to perform the +service. + +This type of service suffers from some drawbacks. Since the recursive +requests may take an arbitrary amount of time to perform, the stub may +have difficulty optimizing retransmission intervals to deal with both +lost UDP packets and dead servers; the name server can be easily +overloaded by too zealous a stub if it interprets retransmissions as new +requests. Use of TCP may be an answer, but TCP may well place burdens +on the host's capabilities which are similar to those of a real +resolver. + +5.3.2. Resources + +In addition to its own resources, the resolver may also have shared +access to zones maintained by a local name server. This gives the +resolver the advantage of more rapid access, but the resolver must be +careful to never let cached information override zone data. In this +discussion the term "local information" is meant to mean the union of +the cache and such shared zones, with the understanding that + + + +Mockapetris [Page 32] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +authoritative data is always used in preference to cached data when both +are present. + +The following resolver algorithm assumes that all functions have been +converted to a general lookup function, and uses the following data +structures to represent the state of a request in progress in the +resolver: + +SNAME the domain name we are searching for. + +STYPE the QTYPE of the search request. + +SCLASS the QCLASS of the search request. + +SLIST a structure which describes the name servers and the + zone which the resolver is currently trying to query. 
+ This structure keeps track of the resolver's current + best guess about which name servers hold the desired + information; it is updated when arriving information + changes the guess. This structure includes the + equivalent of a zone name, the known name servers for + the zone, the known addresses for the name servers, and + history information which can be used to suggest which + server is likely to be the best one to try next. The + zone name equivalent is a match count of the number of + labels from the root down which SNAME has in common with + the zone being queried; this is used as a measure of how + "close" the resolver is to SNAME. + +SBELT a "safety belt" structure of the same form as SLIST, + which is initialized from a configuration file, and + lists servers which should be used when the resolver + doesn't have any local information to guide name server + selection. The match count will be -1 to indicate that + no labels are known to match. + +CACHE A structure which stores the results from previous + responses. Since resolvers are responsible for + discarding old RRs whose TTL has expired, most + implementations convert the interval specified in + arriving RRs to some sort of absolute time when the RR + is stored in the cache. Instead of counting the TTLs + down individually, the resolver just ignores or discards + old RRs when it runs across them in the course of a + search, or discards them during periodic sweeps to + reclaim the memory consumed by old RRs. + + + + + +Mockapetris [Page 33] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +5.3.3. Algorithm + +The top level algorithm has four steps: + + 1. See if the answer is in local information, and if so return + it to the client. + + 2. Find the best servers to ask. + + 3. Send them queries until one returns a response. + + 4. Analyze the response, either: + + a. 
if the response answers the question or contains a name + error, cache the data and return it to + the client. + + b. if the response contains a better delegation to other + servers, cache the delegation information, and go to + step 2. + + c. if the response shows a CNAME and that is not the + answer itself, cache the CNAME, change the SNAME to the + canonical name in the CNAME RR and go to step 1. + + d. if the response shows a server failure or other + bizarre contents, delete the server from the SLIST and + go back to step 3. + +Step 1 searches the cache for the desired data. If the data is in the +cache, it is assumed to be good enough for normal use. Some resolvers +have an option at the user interface which will force the resolver to +ignore the cached data and consult with an authoritative server. This +is not recommended as the default. If the resolver has direct access to +a name server's zones, it should check to see if the desired data is +present in authoritative form, and if so, use the authoritative data in +preference to cached data. + +Step 2 looks for a name server to ask for the required data. The +general strategy is to look for locally-available name server RRs, +starting at SNAME, then the parent domain name of SNAME, the +grandparent, and so on toward the root. Thus if SNAME were +Mockapetris.ISI.EDU, this step would look for NS RRs for +Mockapetris.ISI.EDU, then ISI.EDU, then EDU, and then . (the root). +These NS RRs list the names of hosts for a zone at or above SNAME. Copy +the names into SLIST. Set up their addresses using local data. It may +be the case that the addresses are not available. The resolver has many +choices here; the best is to start parallel resolver processes looking +for the addresses while continuing onward with the addresses which are +available.
Obviously, the design choices and options are complicated +and a function of the local host's capabilities. The recommended +priorities for the resolver designer are: + + 1. Bound the amount of work (packets sent, parallel processes + started) so that a request can't get into an infinite loop or + start off a chain reaction of requests or queries with other + implementations EVEN IF SOMEONE HAS INCORRECTLY CONFIGURED + SOME DATA. + + 2. Get back an answer if at all possible. + + 3. Avoid unnecessary transmissions. + + 4. Get the answer as quickly as possible. + +If the search for NS RRs fails, then the resolver initializes SLIST from +the safety belt SBELT. The basic idea is that when the resolver has no +idea what servers to ask, it should use information from a configuration +file that lists several servers which are expected to be helpful. +Although there are special situations, the usual choice is two of the +root servers and two of the servers for the host's domain. The reason +for two of each is for redundancy. The root servers will provide +eventual access to all of the domain space. The two local servers will +allow the resolver to continue to resolve local names if the local +network becomes isolated from the internet due to gateway or link +failure. + +In addition to the names and addresses of the servers, the SLIST data +structure can be sorted to use the best servers first, and to insure +that all addresses of all servers are used in a round-robin manner. The +sorting can be a simple function of preferring addresses on the local +network over others, or may involve statistics from past events, such as +previous response times and batting averages. + +Step 3 sends out queries until a response is received. The strategy is +to cycle around all of the addresses for all of the servers with a +timeout between each transmission. 
In practice it is important to use +all addresses of a multihomed host, and too aggressive a retransmission +policy actually slows response when used by multiple resolvers +contending for the same name server and even occasionally for a single +resolver. SLIST typically contains data values to control the timeouts +and keep track of previous transmissions. + +Step 4 involves analyzing responses. The resolver should be highly +paranoid in its parsing of responses. It should also check that the +response matches the query it sent using the ID field in the response. + + + +Mockapetris [Page 35] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +The ideal answer is one from a server authoritative for the query which +either gives the required data or a name error. The data is passed back +to the user and entered in the cache for future use if its TTL is +greater than zero. + +If the response shows a delegation, the resolver should check to see +that the delegation is "closer" to the answer than the servers in SLIST +are. This can be done by comparing the match count in SLIST with that +computed from SNAME and the NS RRs in the delegation. If not, the reply +is bogus and should be ignored. If the delegation is valid the NS +delegation RRs and any address RRs for the servers should be cached. +The name servers are entered in the SLIST, and the search is restarted. + +If the response contains a CNAME, the search is restarted at the CNAME +unless the response has the data for the canonical name or if the CNAME +is the answer itself. + +Details and implementation hints can be found in [RFC-1035]. + +6. A SCENARIO + +In our sample domain space, suppose we wanted separate administrative +control for the root, MIL, EDU, MIT.EDU and ISI.EDU zones. 
We might +allocate name servers as follows: + + + |(C.ISI.EDU,SRI-NIC.ARPA + | A.ISI.EDU) + +---------------------+------------------+ + | | | + MIL EDU ARPA + |(SRI-NIC.ARPA, |(SRI-NIC.ARPA, | + | A.ISI.EDU | C.ISI.EDU) | + +-----+-----+ | +------+-----+-----+ + | | | | | | | + BRL NOSC DARPA | IN-ADDR SRI-NIC ACC + | + +--------+------------------+---------------+--------+ + | | | | | + UCI MIT | UDEL YALE + |(XX.LCS.MIT.EDU, ISI + |ACHILLES.MIT.EDU) |(VAXA.ISI.EDU,VENERA.ISI.EDU, + +---+---+ | A.ISI.EDU) + | | | + LCS ACHILLES +--+-----+-----+--------+ + | | | | | | + XX A C VAXA VENERA Mockapetris + +In this example, the authoritative name server is shown in parentheses +at the point in the domain tree at which it assumes control. + +Thus the root name servers are on C.ISI.EDU, SRI-NIC.ARPA, and +A.ISI.EDU. The MIL domain is served by SRI-NIC.ARPA and A.ISI.EDU. The +EDU domain is served by SRI-NIC.ARPA and C.ISI.EDU. Note that servers +may have zones which are contiguous or disjoint. In this scenario, +C.ISI.EDU has contiguous zones at the root and EDU domains. A.ISI.EDU +has contiguous zones at the root and MIL domains, but also has a non- +contiguous zone at ISI.EDU. + +6.1. C.ISI.EDU name server + +C.ISI.EDU is a name server for the root and EDU domains of the IN +class, and would have zones for these domains. The zone data for the +root domain might be: + + . IN SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. ( + 870611 ;serial + 1800 ;refresh every 30 min + 300 ;retry every 5 min + 604800 ;expire after a week + 86400) ;minimum of a day + NS A.ISI.EDU. + NS C.ISI.EDU. + NS SRI-NIC.ARPA. + + MIL. 86400 NS SRI-NIC.ARPA. + 86400 NS A.ISI.EDU. + + EDU. 86400 NS SRI-NIC.ARPA. + 86400 NS C.ISI.EDU. + + SRI-NIC.ARPA. A 26.0.0.73 + A 10.0.0.51 + MX 0 SRI-NIC.ARPA. + HINFO DEC-2060 TOPS20 + + ACC.ARPA. A 26.6.0.65 + HINFO PDP-11/70 UNIX + MX 10 ACC.ARPA. + + USC-ISIC.ARPA.
CNAME C.ISI.EDU. + + 73.0.0.26.IN-ADDR.ARPA. PTR SRI-NIC.ARPA. + 65.0.6.26.IN-ADDR.ARPA. PTR ACC.ARPA. + 51.0.0.10.IN-ADDR.ARPA. PTR SRI-NIC.ARPA. + 52.0.0.10.IN-ADDR.ARPA. PTR C.ISI.EDU. + + + +Mockapetris [Page 37] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + 103.0.3.26.IN-ADDR.ARPA. PTR A.ISI.EDU. + + A.ISI.EDU. 86400 A 26.3.0.103 + C.ISI.EDU. 86400 A 10.0.0.52 + +This data is represented as it would be in a master file. Most RRs are +single line entries; the sole exception here is the SOA RR, which uses +"(" to start a multi-line RR and ")" to show the end of a multi-line RR. +Since the class of all RRs in a zone must be the same, only the first RR +in a zone need specify the class. When a name server loads a zone, it +forces the TTL of all authoritative RRs to be at least the MINIMUM field +of the SOA, here 86400 seconds, or one day. The NS RRs marking +delegation of the MIL and EDU domains, together with the glue RRs for +the servers host addresses, are not part of the authoritative data in +the zone, and hence have explicit TTLs. + +Four RRs are attached to the root node: the SOA which describes the root +zone and the 3 NS RRs which list the name servers for the root. The +data in the SOA RR describes the management of the zone. The zone data +is maintained on host SRI-NIC.ARPA, and the responsible party for the +zone is HOSTMASTER@SRI-NIC.ARPA. A key item in the SOA is the 86400 +second minimum TTL, which means that all authoritative data in the zone +has at least that TTL, although higher values may be explicitly +specified. + +The NS RRs for the MIL and EDU domains mark the boundary between the +root zone and the MIL and EDU zones. Note that in this example, the +lower zones happen to be supported by name servers which also support +the root zone. + +The master file for the EDU zone might be stated relative to the origin +EDU. The zone data for the EDU domain might be: + + EDU. IN SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. 
( + 870729 ;serial + 1800 ;refresh every 30 minutes + 300 ;retry every 5 minutes + 604800 ;expire after a week + 86400 ;minimum of a day + ) + NS SRI-NIC.ARPA. + NS C.ISI.EDU. + + UCI 172800 NS ICS.UCI + 172800 NS ROME.UCI + ICS.UCI 172800 A 192.5.19.1 + ROME.UCI 172800 A 192.5.19.31 + + ISI 172800 NS VAXA.ISI + 172800 NS A.ISI + 172800 NS VENERA.ISI.EDU. + VAXA.ISI 172800 A 10.2.0.27 + 172800 A 128.9.0.33 + VENERA.ISI.EDU. 172800 A 10.1.0.52 + 172800 A 128.9.0.32 + A.ISI 172800 A 26.3.0.103 + + UDEL.EDU. 172800 NS LOUIE.UDEL.EDU. + 172800 NS UMN-REI-UC.ARPA. + LOUIE.UDEL.EDU. 172800 A 10.0.0.96 + 172800 A 192.5.39.3 + + YALE.EDU. 172800 NS YALE.ARPA. + YALE.EDU. 172800 NS YALE-BULLDOG.ARPA. + + MIT.EDU. 43200 NS XX.LCS.MIT.EDU. + 43200 NS ACHILLES.MIT.EDU. + XX.LCS.MIT.EDU. 43200 A 10.0.0.44 + ACHILLES.MIT.EDU. 43200 A 18.72.0.8 + +Note the use of relative names here. The owner name for ISI.EDU. is +stated using a relative name, as are two of the name server RR contents. +Relative and absolute domain names may be freely intermixed in a master +file. + +6.2. Example standard queries + +The following queries and responses illustrate name server behavior. +Unless otherwise noted, the queries do not have recursion desired (RD) +in the header. Note that the answers to non-recursive queries do depend +on the server being asked, but do not depend on the identity of the +requester. + +6.2.1.
QNAME=SRI-NIC.ARPA, QTYPE=A + +The query would look like: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A | + +---------------------------------------------------+ + Answer | <empty> | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +The response from C.ISI.EDU would be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A | + +---------------------------------------------------+ + Answer | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 | + | 86400 IN A 10.0.0.51 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +The header of the response looks like the header of the query, except +that the RESPONSE bit is set, indicating that this message is a +response, not a query, and the Authoritative Answer (AA) bit is set +indicating that the address RRs in the answer section are from +authoritative data. The question section of the response matches the +question section of the query. 
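The RESPONSE and AA bits described above can be tested mechanically.
The following Python sketch is illustrative only. It assumes the 16-bit
flag word layout given in [RFC-1035] (the response bit on top, then a
four-bit OPCODE, then AA, TC and RD, with RCODE in the low four bits).
The names describe_header, QR_MASK, AA_MASK and RD_MASK are invented
for this example and are not part of any specification.

```python
import struct

# Assumed bit positions within the 16-bit flags word (per RFC-1035).
QR_MASK = 0x8000  # set in responses, clear in queries
AA_MASK = 0x0400  # Authoritative Answer
RD_MASK = 0x0100  # Recursion Desired


def describe_header(header: bytes) -> dict:
    """Decode the ID and flag word from the first four header octets."""
    ident, flags = struct.unpack("!HH", header[:4])
    return {
        "id": ident,
        "response": bool(flags & QR_MASK),
        "opcode": (flags >> 11) & 0xF,  # 0 is a standard query (SQUERY)
        "authoritative": bool(flags & AA_MASK),
        "recursion_desired": bool(flags & RD_MASK),
        "rcode": flags & 0xF,
    }


# A query header, and the header of an authoritative response to it.
# The four trailing counts are QDCOUNT, ANCOUNT, NSCOUNT and ARCOUNT.
query_header = struct.pack("!HHHHHH", 0x1234, 0x0000, 1, 0, 0, 0)
response_header = struct.pack("!HHHHHH", 0x1234, QR_MASK | AA_MASK, 1, 2, 0, 0)
```

A resolver might apply a check of this kind, together with a comparison
of the ID field against the query it sent, before treating the answer
section as authoritative data.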
+ + + + + + + + + + + + + + +Mockapetris [Page 40] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +If the same query was sent to some other server which was not +authoritative for SRI-NIC.ARPA, the response might be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY,RESPONSE | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=A | + +---------------------------------------------------+ + Answer | SRI-NIC.ARPA. 1777 IN A 10.0.0.51 | + | 1777 IN A 26.0.0.73 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +This response is different from the previous one in two ways: the header +does not have AA set, and the TTLs are different. The inference is that +the data did not come from a zone, but from a cache. The difference +between the authoritative TTL and the TTL here is due to aging of the +data in a cache. The difference in ordering of the RRs in the answer +section is not significant. + +6.2.2. QNAME=SRI-NIC.ARPA, QTYPE=* + +A query similar to the previous one, but using a QTYPE of *, would +receive the following response from C.ISI.EDU: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* | + +---------------------------------------------------+ + Answer | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 | + | A 10.0.0.51 | + | MX 0 SRI-NIC.ARPA. 
| + | HINFO DEC-2060 TOPS20 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +If a similar query was directed to two name servers which are not +authoritative for SRI-NIC.ARPA, the responses might be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* | + +---------------------------------------------------+ + Answer | SRI-NIC.ARPA. 12345 IN A 26.0.0.73 | + | A 10.0.0.51 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +and + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=* | + +---------------------------------------------------+ + Answer | SRI-NIC.ARPA. 1290 IN HINFO DEC-2060 TOPS20 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +Neither of these answers has AA set, so neither response comes from +authoritative data. The different contents and different TTLs suggest +that the two servers cached data at different times, and that the first +server cached the response to a QTYPE=A query and the second cached the +response to a HINFO query.
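The aging of cached TTLs seen in these responses can be sketched as
follows. This Python fragment is a minimal illustration, not a
description of any particular implementation: the class RRCache and its
methods are hypothetical, and the clock is passed in explicitly as a
count of seconds so that the arithmetic is easy to follow. It assumes
the approach in which the TTL of an arriving RR is converted to an
absolute expiry time when the RR is stored.

```python
class RRCache:
    """Toy cache that stores each RR with an absolute expiry time."""

    def __init__(self):
        # (owner name, RR type) -> list of (rdata, expires_at)
        self._entries = {}

    def store(self, name, rrtype, rdata, ttl, now):
        # RRs with a TTL of zero are used but never cached.
        if ttl <= 0:
            return
        key = (name.upper(), rrtype)
        self._entries.setdefault(key, []).append((rdata, now + ttl))

    def lookup(self, name, rrtype, now):
        """Return (rdata, remaining TTL) pairs, discarding expired RRs."""
        key = (name.upper(), rrtype)
        live = [(r, exp) for r, exp in self._entries.get(key, []) if exp > now]
        if live:
            self._entries[key] = live
        else:
            self._entries.pop(key, None)
        return [(r, exp - now) for r, exp in live]


cache = RRCache()
cache.store("SRI-NIC.ARPA.", "A", "10.0.0.51", ttl=86400, now=0)
cache.store("SRI-NIC.ARPA.", "A", "26.0.0.73", ttl=86400, now=0)

# 84623 seconds later, 1777 seconds of the original day remain.
aged = cache.lookup("sri-nic.arpa.", "A", now=84623)
```

Two caches that stored the same RR at different times would report
different remaining TTLs for it, which is the effect visible in the
responses above.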
+ +6.2.3. QNAME=SRI-NIC.ARPA, QTYPE=MX + +This type of query might result from a mailer trying to look up +routing information for the mail destination HOSTMASTER@SRI-NIC.ARPA. +The response from C.ISI.EDU would be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=MX | + +---------------------------------------------------+ + Answer | SRI-NIC.ARPA. 86400 IN MX 0 SRI-NIC.ARPA.| + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | SRI-NIC.ARPA. 86400 IN A 26.0.0.73 | + | A 10.0.0.51 | + +---------------------------------------------------+ + +This response contains the MX RR in the answer section of the response. +The additional section contains the address RRs because the name server +at C.ISI.EDU guesses that the requester will need the addresses in order +to properly use the information carried by the MX. + +6.2.4. QNAME=SRI-NIC.ARPA, QTYPE=NS + +C.ISI.EDU would reply to this query with: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=SRI-NIC.ARPA., QCLASS=IN, QTYPE=NS | + +---------------------------------------------------+ + Answer | <empty> | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +The only difference between the response and the query is the AA and +RESPONSE bits in the header.
The interpretation of this response is +that the server is authoritative for the name, and the name exists, but +no RRs of type NS are present there. + +6.2.5. QNAME=SIR-NIC.ARPA, QTYPE=A + +If a user mistyped a host name, we might see this type of query. + + + +Mockapetris [Page 43] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +C.ISI.EDU would answer it with: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA, RCODE=NE | + +---------------------------------------------------+ + Question | QNAME=SIR-NIC.ARPA., QCLASS=IN, QTYPE=A | + +---------------------------------------------------+ + Answer | <empty> | + +---------------------------------------------------+ + Authority | . SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. | + | 870611 1800 300 604800 86400 | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +This response states that the name does not exist. This condition is +signalled in the response code (RCODE) section of the header. + +The SOA RR in the authority section is the optional negative caching +information which allows the resolver using this response to assume that +the name will not exist for the SOA MINIMUM (86400) seconds. + +6.2.6. QNAME=BRL.MIL, QTYPE=A + +If this query is sent to C.ISI.EDU, the reply would be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=BRL.MIL, QCLASS=IN, QTYPE=A | + +---------------------------------------------------+ + Answer | <empty> | + +---------------------------------------------------+ + Authority | MIL. 86400 IN NS SRI-NIC.ARPA. | + | 86400 NS A.ISI.EDU. | + +---------------------------------------------------+ + Additional | A.ISI.EDU. A 26.3.0.103 | + | SRI-NIC.ARPA. 
A 26.0.0.73 | + | A 10.0.0.51 | + +---------------------------------------------------+ + +This response has an empty answer section, but is not authoritative, so +it is a referral. The name server on C.ISI.EDU, realizing that it is +not authoritative for the MIL domain, has referred the requester to +servers on A.ISI.EDU and SRI-NIC.ARPA, which it knows are authoritative +for the MIL domain. + + + + + +Mockapetris [Page 44] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +6.2.7. QNAME=USC-ISIC.ARPA, QTYPE=A + +The response to this query from A.ISI.EDU would be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=A | + +---------------------------------------------------+ + Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU. | + | C.ISI.EDU. 86400 IN A 10.0.0.52 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +Note that the AA bit in the header guarantees that the data matching +QNAME is authoritative, but does not say anything about whether the data +for C.ISI.EDU is authoritative. This complete reply is possible because +A.ISI.EDU happens to be authoritative for both the ARPA domain where +USC-ISIC.ARPA is found and the ISI.EDU domain where C.ISI.EDU data is +found. 
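A resolver receiving a reply of this kind has to follow the alias
within the answer section. The Python sketch below shows one way this
might be done. It is an assumption-laden illustration: the function
follow_cnames, the tuple representation of RRs, and the max_hops bound
are all invented for this example.

```python
def follow_cnames(qname, qtype, answer, max_hops=8):
    """Follow CNAME RRs in an answer section, starting at qname.

    answer is a list of (owner, type, rdata) tuples. Returns the last
    name reached and the RDATA of any RRs of qtype found there. An
    empty list means the search must be restarted at the returned name.
    """
    name = qname.upper()
    for _ in range(max_hops + 1):
        data = [rdata for owner, rtype, rdata in answer
                if owner.upper() == name and rtype == qtype]
        # When QTYPE is CNAME, the CNAME RR is itself the answer.
        if data or qtype == "CNAME":
            return name, data
        targets = [rdata for owner, rtype, rdata in answer
                   if owner.upper() == name and rtype == "CNAME"]
        if not targets:
            return name, []
        name = targets[0].upper()
    # Bound the work done so that a CNAME loop cannot run forever.
    raise ValueError("CNAME chain too long")


# The complete reply from A.ISI.EDU shown above.
full_answer = [
    ("USC-ISIC.ARPA.", "CNAME", "C.ISI.EDU."),
    ("C.ISI.EDU.", "A", "10.0.0.52"),
]
resolved = follow_cnames("USC-ISIC.ARPA.", "A", full_answer)

# A reply carrying only the alias: the search restarts at C.ISI.EDU.
partial = follow_cnames("USC-ISIC.ARPA.", "A", full_answer[:1])
```

The hop limit reflects the priority of bounding the amount of work, so
that misconfigured aliases cannot send the resolver into a loop.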
+ +If the same query was sent to C.ISI.EDU, its response might be the same +as shown above if it had its own address in its cache, but might also +be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=A | + +---------------------------------------------------+ + Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU. | + +---------------------------------------------------+ + Authority | ISI.EDU. 172800 IN NS VAXA.ISI.EDU. | + | NS A.ISI.EDU. | + | NS VENERA.ISI.EDU. | + +---------------------------------------------------+ + Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 | + | 172800 A 128.9.0.33 | + | VENERA.ISI.EDU. 172800 A 10.1.0.52 | + | 172800 A 128.9.0.32 | + | A.ISI.EDU. 172800 A 26.3.0.103 | + +---------------------------------------------------+ + +This reply contains an authoritative reply for the alias USC-ISIC.ARPA, +plus a referral to the name servers for ISI.EDU. This sort of reply +isn't very likely given that the query is for the host name of the name +server being asked, but would be common for other aliases. + +6.2.8. QNAME=USC-ISIC.ARPA, QTYPE=CNAME + +If this query is sent to either A.ISI.EDU or C.ISI.EDU, the reply would +be: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=USC-ISIC.ARPA., QCLASS=IN, QTYPE=CNAME | + +---------------------------------------------------+ + Answer | USC-ISIC.ARPA. 86400 IN CNAME C.ISI.EDU.
| + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +Because QTYPE=CNAME, the CNAME RR itself answers the query, and the name +server doesn't attempt to look up anything for C.ISI.EDU. (Except +possibly for the additional section.) + +6.3. Example resolution + +The following examples illustrate the operations a resolver must perform +for its client. We assume that the resolver is starting without a + + + +Mockapetris [Page 46] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +cache, as might be the case after system boot. We further assume that +the system is not one of the hosts in the data and that the host is +located somewhere on net 26, and that its safety belt (SBELT) data +structure has the following information: + + Match count = -1 + SRI-NIC.ARPA. 26.0.0.73 10.0.0.51 + A.ISI.EDU. 26.3.0.103 + +This information specifies servers to try, their addresses, and a match +count of -1, which says that the servers aren't very close to the +target. Note that the -1 isn't supposed to be an accurate closeness +measure, just a value so that later stages of the algorithm will work. + +The following examples illustrate the use of a cache, so each example +assumes that previous requests have completed. + +6.3.1. Resolve MX for ISI.EDU. + +Suppose the first request to the resolver comes from the local mailer, +which has mail for PVM@ISI.EDU. The mailer might then ask for type MX +RRs for the domain name ISI.EDU. + +The resolver would look in its cache for MX RRs at ISI.EDU, but the +empty cache wouldn't be helpful. The resolver would recognize that it +needed to query foreign servers and try to determine the best servers to +query. This search would look for NS RRs for the domains ISI.EDU, EDU, +and the root. These searches of the cache would also fail. 
As a last +resort, the resolver would use the information from the SBELT, copying +it into its SLIST structure. + +At this point the resolver would need to pick one of the three available +addresses to try. Given that the resolver is on net 26, it should +choose either 26.0.0.73 or 26.3.0.103 as its first choice. It would +then send off a query of the form: + + + + + + + + + + + + + + + + +Mockapetris [Page 47] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + +---------------------------------------------------+ + Header | OPCODE=SQUERY | + +---------------------------------------------------+ + Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX | + +---------------------------------------------------+ + Answer | <empty> | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +The resolver would then wait for a response to its query or a timeout. +If the timeout occurs, it would try different servers, then different +addresses of the same servers, lastly retrying addresses already tried. +It might eventually receive a reply from SRI-NIC.ARPA: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX | + +---------------------------------------------------+ + Answer | <empty> | + +---------------------------------------------------+ + Authority | ISI.EDU. 172800 IN NS VAXA.ISI.EDU. | + | NS A.ISI.EDU. | + | NS VENERA.ISI.EDU.| + +---------------------------------------------------+ + Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 | + | 172800 A 128.9.0.33 | + | VENERA.ISI.EDU. 172800 A 10.1.0.52 | + | 172800 A 128.9.0.32 | + | A.ISI.EDU. 
172800 A 26.3.0.103 | + +---------------------------------------------------+ + +The resolver would notice that the information in the response gave a +closer delegation to ISI.EDU than its existing SLIST (since it matches +three labels). The resolver would then cache the information in this +response and use it to set up a new SLIST: + + Match count = 3 + A.ISI.EDU. 26.3.0.103 + VAXA.ISI.EDU. 10.2.0.27 128.9.0.33 + VENERA.ISI.EDU. 10.1.0.52 128.9.0.32 + +A.ISI.EDU appears on this list as well as the previous one, but that is +purely coincidental. The resolver would again start transmitting and +waiting for responses. Eventually it would get an answer: + + + +Mockapetris [Page 48] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=ISI.EDU., QCLASS=IN, QTYPE=MX | + +---------------------------------------------------+ + Answer | ISI.EDU. MX 10 VENERA.ISI.EDU. | + | MX 20 VAXA.ISI.EDU. | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | VAXA.ISI.EDU. 172800 A 10.2.0.27 | + | 172800 A 128.9.0.33 | + | VENERA.ISI.EDU. 172800 A 10.1.0.52 | + | 172800 A 128.9.0.32 | + +---------------------------------------------------+ + +The resolver would add this information to its cache, and return the MX +RRs to its client. + +6.3.2. Get the host name for address 26.6.0.65 + +The resolver would translate this into a request for PTR RRs for +65.0.6.26.IN-ADDR.ARPA. This information is not in the cache, so the +resolver would look for foreign servers to ask. No servers would match, +so it would use SBELT again. (Note that the servers for the ISI.EDU +domain are in the cache, but ISI.EDU is not an ancestor of +65.0.6.26.IN-ADDR.ARPA, so the SBELT is used.) 
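The translation of an address into an IN-ADDR.ARPA name, and the
ancestor test that sends the resolver back to SBELT, can be sketched in
Python as below. The function names reverse_name, labels and
match_count are hypothetical. Returning -1 when the zone is not at or
above SNAME is an assumption of this sketch, chosen to mirror the match
count used for SBELT; the root is counted as one label, so that ISI.EDU
yields a count of 3.

```python
def reverse_name(address: str) -> str:
    """Map a dotted-quad address to its IN-ADDR.ARPA domain name."""
    octets = address.split(".")
    if len(octets) != 4 or not all(
        o.isdigit() and 0 <= int(o) <= 255 for o in octets
    ):
        raise ValueError(f"not a dotted-quad address: {address!r}")
    return ".".join(reversed(octets)) + ".IN-ADDR.ARPA."


def labels(name: str) -> list:
    """Split a domain name into labels, ending with the empty root label."""
    stripped = name.rstrip(".").upper()
    return (stripped.split(".") if stripped else []) + [""]


def match_count(sname: str, zone: str) -> int:
    """Count labels, root included, if zone is at or above sname.

    Returns -1 when zone is not an ancestor of (or equal to) sname.
    """
    s, z = labels(sname), labels(zone)
    if len(z) <= len(s) and s[-len(z):] == z:
        return len(z)
    return -1


ptr_name = reverse_name("26.6.0.65")

# ISI.EDU is cached but is not an ancestor of the PTR name, so the
# cached NS RRs are of no use and SBELT is consulted instead.
usable = match_count(ptr_name, "ISI.EDU.")
```

With these definitions, a later search for poneria.ISI.EDU would find
the cached ISI.EDU servers usable, with a match count of 3.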
+ +Since this request is within the authoritative data of both servers in +SBELT, eventually one would return: + + + + + + + + + + + + + + + + + + + + + +Mockapetris [Page 49] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE, AA | + +---------------------------------------------------+ + Question | QNAME=65.0.6.26.IN-ADDR.ARPA.,QCLASS=IN,QTYPE=PTR | + +---------------------------------------------------+ + Answer | 65.0.6.26.IN-ADDR.ARPA. PTR ACC.ARPA. | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +6.3.3. Get the host address of poneria.ISI.EDU + +This request would translate into a type A request for poneria.ISI.EDU. +The resolver would not find any cached data for this name, but would +find the NS RRs in the cache for ISI.EDU when it looks for foreign +servers to ask. Using this data, it would construct a SLIST of the +form: + + Match count = 3 + + A.ISI.EDU. 26.3.0.103 + VAXA.ISI.EDU. 10.2.0.27 128.9.0.33 + VENERA.ISI.EDU. 10.1.0.52 + +A.ISI.EDU is listed first on the assumption that the resolver orders its +choices by preference, and A.ISI.EDU is on the same network. + +One of these servers would answer the query. + +7. REFERENCES and BIBLIOGRAPHY + +[Dyer 87] Dyer, S., and F. Hsu, "Hesiod", Project Athena + Technical Plan - Name Service, April 1987, version 1.9. + + Describes the fundamentals of the Hesiod name service. + +[IEN-116] J. Postel, "Internet Name Server", IEN-116, + USC/Information Sciences Institute, August 1979. + + A name service obsoleted by the Domain Name System, but + still in use. + + + + + + + + +Mockapetris [Page 50] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +[Quarterman 86] Quarterman, J., and J. 
Hoskins, "Notable Computer + Networks", Communications of the ACM, October 1986, + volume 29, number 10. + +[RFC-742] K. Harrenstien, "NAME/FINGER", RFC-742, Network + Information Center, SRI International, December 1977. + +[RFC-768] J. Postel, "User Datagram Protocol", RFC-768, + USC/Information Sciences Institute, August 1980. + +[RFC-793] J. Postel, "Transmission Control Protocol", RFC-793, + USC/Information Sciences Institute, September 1981. + +[RFC-799] D. Mills, "Internet Name Domains", RFC-799, COMSAT, + September 1981. + + Suggests introduction of a hierarchy in place of a flat + name space for the Internet. + +[RFC-805] J. Postel, "Computer Mail Meeting Notes", RFC-805, + USC/Information Sciences Institute, February 1982. + +[RFC-810] E. Feinler, K. Harrenstien, Z. Su, and V. White, "DOD + Internet Host Table Specification", RFC-810, Network + Information Center, SRI International, March 1982. + + Obsolete. See RFC-952. + +[RFC-811] K. Harrenstien, V. White, and E. Feinler, "Hostnames + Server", RFC-811, Network Information Center, SRI + International, March 1982. + + Obsolete. See RFC-953. + +[RFC-812] K. Harrenstien, and V. White, "NICNAME/WHOIS", RFC-812, + Network Information Center, SRI International, March + 1982. + +[RFC-819] Z. Su, and J. Postel, "The Domain Naming Convention for + Internet User Applications", RFC-819, Network + Information Center, SRI International, August 1982. + + Early thoughts on the design of the domain system. + Current implementation is completely different. + +[RFC-821] J. Postel, "Simple Mail Transfer Protocol", RFC-821, + USC/Information Sciences Institute, August 1982. + +[RFC-830] Z. Su, "A Distributed System for Internet Name Service", + RFC-830, Network Information Center, SRI International, + October 1982. + + Early thoughts on the design of the domain system. + Current implementation is completely different. + +[RFC-882] P.
Mockapetris, "Domain names - Concepts and + Facilities," RFC-882, USC/Information Sciences + Institute, November 1983. + + Superseded by this memo. + +[RFC-883] P. Mockapetris, "Domain names - Implementation and + Specification," RFC-883, USC/Information Sciences + Institute, November 1983. + + Superseded by this memo. + +[RFC-920] J. Postel and J. Reynolds, "Domain Requirements", + RFC-920, USC/Information Sciences Institute, + October 1984. + + Explains the naming scheme for top level domains. + +[RFC-952] K. Harrenstien, M. Stahl, E. Feinler, "DoD Internet Host + Table Specification", RFC-952, SRI, October 1985. + + Specifies the format of HOSTS.TXT, the host/address + table replaced by the DNS. + +[RFC-953] K. Harrenstien, M. Stahl, E. Feinler, "HOSTNAME Server", + RFC-953, SRI, October 1985. + + This RFC contains the official specification of the + hostname server protocol, which is obsoleted by the DNS. + This TCP based protocol accesses information stored in + the RFC-952 format, and is used to obtain copies of the + host table. + +[RFC-973] P. Mockapetris, "Domain System Changes and + Observations", RFC-973, USC/Information Sciences + Institute, January 1986. + + Describes changes to RFC-882 and RFC-883 and reasons for + them. Now obsolete. + +[RFC-974] C. Partridge, "Mail routing and the domain system", + RFC-974, CSNET CIC BBN Labs, January 1986. + + Describes the transition from HOSTS.TXT based mail + addressing to the more powerful MX system used with the + domain system. + +[RFC-1001] NetBIOS Working Group, "Protocol standard for a NetBIOS + service on a TCP/UDP transport: Concepts and Methods", + RFC-1001, March 1987. + + This RFC and RFC-1002 are a preliminary design for + NETBIOS on top of TCP/IP which proposes to base NetBIOS + name service on top of the DNS. 
+ +[RFC-1002] NetBIOS Working Group, "Protocol standard for a NetBIOS + service on a TCP/UDP transport: Detailed + Specifications", RFC-1002, March 1987. + +[RFC-1010] J. Reynolds and J. Postel, "Assigned Numbers", RFC-1010, + USC/Information Sciences Institute, May 1987 + + Contains socket numbers and mnemonics for host names, + operating systems, etc. + +[RFC-1031] W. Lazear, "MILNET Name Domain Transition", RFC-1031, + November 1987. + + Describes a plan for converting the MILNET to the DNS. + +[RFC-1032] M. K. Stahl, "Establishing a Domain - Guidelines for + Administrators", RFC-1032, November 1987. + + Describes the registration policies used by the NIC to + administer the top level domains and delegate subzones. + +[RFC-1033] M. K. Lottor, "Domain Administrators Operations Guide", + RFC-1033, November 1987. + + A cookbook for domain administrators. + +[Solomon 82] M. Solomon, L. Landweber, and D. Neuhengen, "The CSNET + Name Server", Computer Networks, vol 6, nr 3, July 1982. + + Describes a name service for CSNET which is independent + from the DNS and DNS use in the CSNET. 
+ + + + + +Mockapetris [Page 53] + +RFC 1034 Domain Concepts and Facilities November 1987 + + +Index + + A 12 + Absolute names 8 + Aliases 14, 31 + Authority 6 + AXFR 17 + + Case of characters 7 + CH 12 + CNAME 12, 13, 31 + Completion queries 18 + + Domain name 6, 7 + + Glue RRs 20 + + HINFO 12 + + IN 12 + Inverse queries 16 + Iterative 4 + + Label 7 + + Mailbox names 9 + MX 12 + + Name error 27, 36 + Name servers 5, 17 + NE 30 + Negative caching 44 + NS 12 + + Opcode 16 + + PTR 12 + + QCLASS 16 + QTYPE 16 + + RDATA 13 + Recursive 4 + Recursive service 22 + Relative names 7 + Resolvers 6 + RR 12 + + + + +Mockapetris [Page 54] + +RFC 1034 Domain Concepts and Facilities November 1987 + + + Safety belt 33 + Sections 16 + SOA 12 + Standard queries 22 + + Status queries 18 + Stub resolvers 32 + + TTL 12, 13 + + Wildcards 25 + + Zone transfers 28 + Zones 19 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Mockapetris [Page 55] + diff --git a/doc/rfc/rfc1035.txt b/doc/rfc/rfc1035.txt new file mode 100644 index 00000000..b1a9bf5a --- /dev/null +++ b/doc/rfc/rfc1035.txt @@ -0,0 +1,3077 @@ +Network Working Group P. Mockapetris +Request for Comments: 1035 ISI + November 1987 +Obsoletes: RFCs 882, 883, 973 + + DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION + + +1. STATUS OF THIS MEMO + +This RFC describes the details of the domain system and protocol, and +assumes that the reader is familiar with the concepts discussed in a +companion RFC, "Domain Names - Concepts and Facilities" [RFC-1034]. + +The domain system is a mixture of functions and data types which are an +official protocol and functions and data types which are still +experimental. Since the domain system is intentionally extensible, new +data types and experimental behavior should always be expected in parts +of the system beyond the official protocol. The official protocol parts +include standard queries, responses and the Internet class RR data +formats (e.g., host addresses). 
Since the previous RFC set, several +definitions have changed, so some previous definitions are obsolete. + +Experimental or obsolete features are clearly marked in these RFCs, and +such information should be used with caution. + +The reader is especially cautioned not to depend on the values which +appear in examples to be current or complete, since their purpose is +primarily pedagogical. Distribution of this memo is unlimited. + + Table of Contents + + 1. STATUS OF THIS MEMO 1 + 2. INTRODUCTION 3 + 2.1. Overview 3 + 2.2. Common configurations 4 + 2.3. Conventions 7 + 2.3.1. Preferred name syntax 7 + 2.3.2. Data Transmission Order 8 + 2.3.3. Character Case 9 + 2.3.4. Size limits 10 + 3. DOMAIN NAME SPACE AND RR DEFINITIONS 10 + 3.1. Name space definitions 10 + 3.2. RR definitions 11 + 3.2.1. Format 11 + 3.2.2. TYPE values 12 + 3.2.3. QTYPE values 12 + 3.2.4. CLASS values 13 + + + +Mockapetris [Page 1] + +RFC 1035 Domain Implementation and Specification November 1987 + + + 3.2.5. QCLASS values 13 + 3.3. Standard RRs 13 + 3.3.1. CNAME RDATA format 14 + 3.3.2. HINFO RDATA format 14 + 3.3.3. MB RDATA format (EXPERIMENTAL) 14 + 3.3.4. MD RDATA format (Obsolete) 15 + 3.3.5. MF RDATA format (Obsolete) 15 + 3.3.6. MG RDATA format (EXPERIMENTAL) 16 + 3.3.7. MINFO RDATA format (EXPERIMENTAL) 16 + 3.3.8. MR RDATA format (EXPERIMENTAL) 17 + 3.3.9. MX RDATA format 17 + 3.3.10. NULL RDATA format (EXPERIMENTAL) 17 + 3.3.11. NS RDATA format 18 + 3.3.12. PTR RDATA format 18 + 3.3.13. SOA RDATA format 19 + 3.3.14. TXT RDATA format 20 + 3.4. ARPA Internet specific RRs 20 + 3.4.1. A RDATA format 20 + 3.4.2. WKS RDATA format 21 + 3.5. IN-ADDR.ARPA domain 22 + 3.6. Defining new types, classes, and special namespaces 24 + 4. MESSAGES 25 + 4.1. Format 25 + 4.1.1. Header section format 26 + 4.1.2. Question section format 28 + 4.1.3. Resource record format 29 + 4.1.4. Message compression 30 + 4.2. Transport 32 + 4.2.1. UDP usage 32 + 4.2.2. TCP usage 32 + 5. MASTER FILES 33 + 5.1. 
Format 33 + 5.2. Use of master files to define zones 35 + 5.3. Master file example 36 + 6. NAME SERVER IMPLEMENTATION 37 + 6.1. Architecture 37 + 6.1.1. Control 37 + 6.1.2. Database 37 + 6.1.3. Time 39 + 6.2. Standard query processing 39 + 6.3. Zone refresh and reload processing 39 + 6.4. Inverse queries (Optional) 40 + 6.4.1. The contents of inverse queries and responses 40 + 6.4.2. Inverse query and response example 41 + 6.4.3. Inverse query processing 42 + + + + + + +Mockapetris [Page 2] + +RFC 1035 Domain Implementation and Specification November 1987 + + + 6.5. Completion queries and responses 42 + 7. RESOLVER IMPLEMENTATION 43 + 7.1. Transforming a user request into a query 43 + 7.2. Sending the queries 44 + 7.3. Processing responses 46 + 7.4. Using the cache 47 + 8. MAIL SUPPORT 47 + 8.1. Mail exchange binding 48 + 8.2. Mailbox binding (Experimental) 48 + 9. REFERENCES and BIBLIOGRAPHY 50 + Index 54 + +2. INTRODUCTION + +2.1. Overview + +The goal of domain names is to provide a mechanism for naming resources +in such a way that the names are usable in different hosts, networks, +protocol families, internets, and administrative organizations. + +From the user's point of view, domain names are useful as arguments to a +local agent, called a resolver, which retrieves information associated +with the domain name. Thus a user might ask for the host address or +mail information associated with a particular domain name. To enable +the user to request a particular type of information, an appropriate +query type is passed to the resolver with the domain name. To the user, +the domain tree is a single information space; the resolver is +responsible for hiding the distribution of data among name servers from +the user. + +From the resolver's point of view, the database that makes up the domain +space is distributed among various name servers. 
Different parts of the +domain space are stored in different name servers, although a particular +data item will be stored redundantly in two or more name servers. The +resolver starts with knowledge of at least one name server. When the +resolver processes a user query it asks a known name server for the +information; in return, the resolver receives either the desired +information or a referral to another name server. Using these +referrals, resolvers learn the identities and contents of other name +servers. Resolvers are responsible for dealing with the distribution of +the domain space and dealing with the effects of name server failure by +consulting redundant databases in other servers. + +Name servers manage two kinds of data. The first kind of data is held in +sets called zones; each zone is the complete database for a particular +"pruned" subtree of the domain space. This data is called +authoritative. A name server periodically checks to make sure that its +zones are up to date, and if not, obtains a new copy of updated zones +from master files stored locally or in another name server. The second +kind of data is cached data which was acquired by a local resolver. +This data may be incomplete, but improves the performance of the +retrieval process when non-local data is repeatedly accessed. Cached +data is eventually discarded by a timeout mechanism. + +This functional structure isolates the problems of user interface, +failure recovery, and distribution in the resolvers and isolates the +database update and refresh problems in the name servers. + +2.2. Common configurations + +A host can participate in the domain name system in a number of ways, +depending on whether the host runs programs that retrieve information +from the domain system, name servers that answer queries from other +hosts, or various combinations of both functions. 
The simplest, and +perhaps most typical, configuration is shown below: + + Local Host | Foreign + | + +---------+ +----------+ | +--------+ + | | user queries | |queries | | | + | User |-------------->| |---------|->|Foreign | + | Program | | Resolver | | | Name | + | |<--------------| |<--------|--| Server | + | | user responses| |responses| | | + +---------+ +----------+ | +--------+ + | A | + cache additions | | references | + V | | + +----------+ | + | cache | | + +----------+ | + +User programs interact with the domain name space through resolvers; the +format of user queries and user responses is specific to the host and +its operating system. User queries will typically be operating system +calls, and the resolver and its cache will be part of the host operating +system. Less capable hosts may choose to implement the resolver as a +subroutine to be linked in with every program that needs its services. +Resolvers answer user queries with information they acquire via queries +to foreign name servers and the local cache. + +Note that the resolver may have to make several queries to several +different foreign name servers to answer a particular user query, and +hence the resolution of a user query may involve several network +accesses and an arbitrary amount of time. The queries to foreign name +servers and the corresponding responses have a standard format described + + + +Mockapetris [Page 4] + +RFC 1035 Domain Implementation and Specification November 1987 + + +in this memo, and may be datagrams. + +Depending on its capabilities, a name server could be a stand alone +program on a dedicated machine or a process or processes on a large +timeshared host. 
A simple configuration might be: + + Local Host | Foreign + | + +---------+ | + / /| | + +---------+ | +----------+ | +--------+ + | | | | |responses| | | + | | | | Name |---------|->|Foreign | + | Master |-------------->| Server | | |Resolver| + | files | | | |<--------|--| | + | |/ | | queries | +--------+ + +---------+ +----------+ | + +Here a primary name server acquires information about one or more zones +by reading master files from its local file system, and answers queries +about those zones that arrive from foreign resolvers. + +The DNS requires that all zones be redundantly supported by more than +one name server. Designated secondary servers can acquire zones and +check for updates from the primary server using the zone transfer +protocol of the DNS. This configuration is shown below: + + Local Host | Foreign + | + +---------+ | + / /| | + +---------+ | +----------+ | +--------+ + | | | | |responses| | | + | | | | Name |---------|->|Foreign | + | Master |-------------->| Server | | |Resolver| + | files | | | |<--------|--| | + | |/ | | queries | +--------+ + +---------+ +----------+ | + A |maintenance | +--------+ + | +------------|->| | + | queries | |Foreign | + | | | Name | + +------------------|--| Server | + maintenance responses | +--------+ + +In this configuration, the name server periodically establishes a +virtual circuit to a foreign name server to acquire a copy of a zone or +to check that an existing copy has not changed. The messages sent for + + + +Mockapetris [Page 5] + +RFC 1035 Domain Implementation and Specification November 1987 + + +these maintenance activities follow the same form as queries and +responses, but the message sequences are somewhat different. 
+ +The information flow in a host that supports all aspects of the domain +name system is shown below: + + Local Host | Foreign + | + +---------+ +----------+ | +--------+ + | | user queries | |queries | | | + | User |-------------->| |---------|->|Foreign | + | Program | | Resolver | | | Name | + | |<--------------| |<--------|--| Server | + | | user responses| |responses| | | + +---------+ +----------+ | +--------+ + | A | + cache additions | | references | + V | | + +----------+ | + | Shared | | + | database | | + +----------+ | + A | | + +---------+ refreshes | | references | + / /| | V | + +---------+ | +----------+ | +--------+ + | | | | |responses| | | + | | | | Name |---------|->|Foreign | + | Master |-------------->| Server | | |Resolver| + | files | | | |<--------|--| | + | |/ | | queries | +--------+ + +---------+ +----------+ | + A |maintenance | +--------+ + | +------------|->| | + | queries | |Foreign | + | | | Name | + +------------------|--| Server | + maintenance responses | +--------+ + +The shared database holds domain space data for the local name server +and resolver. The contents of the shared database will typically be a +mixture of authoritative data maintained by the periodic refresh +operations of the name server and cached data from previous resolver +requests. The structure of the domain data and the necessity for +synchronization between name servers and resolvers imply the general +characteristics of this database, but the actual format is up to the +local implementor. + + + + +Mockapetris [Page 6] + +RFC 1035 Domain Implementation and Specification November 1987 + + +Information flow can also be tailored so that a group of hosts act +together to optimize activities. Sometimes this is done to offload less +capable hosts so that they do not have to implement a full resolver. +This can be appropriate for PCs or hosts which want to minimize the +amount of new network code which is required. 
This scheme can also +allow a group of hosts to share a small number of caches rather than +maintaining a large number of separate caches, on the premise that the +centralized caches will have a higher hit ratio. In either case, +resolvers are replaced with stub resolvers which act as front ends to +resolvers located in a recursive server in one or more name servers +known to perform that service: + + Local Hosts | Foreign + | + +---------+ | + | | responses | + | Stub |<--------------------+ | + | Resolver| | | + | |----------------+ | | + +---------+ recursive | | | + queries | | | + V | | + +---------+ recursive +----------+ | +--------+ + | | queries | |queries | | | + | Stub |-------------->| Recursive|---------|->|Foreign | + | Resolver| | Server | | | Name | + | |<--------------| |<--------|--| Server | + +---------+ responses | |responses| | | + +----------+ | +--------+ + | Central | | + | cache | | + +----------+ | + +In any case, note that domain components are always replicated for +reliability whenever possible. + +2.3. Conventions + +The domain system has several conventions dealing with low-level, but +fundamental, issues. While the implementor is free to violate these +conventions WITHIN HIS OWN SYSTEM, he must observe these conventions in +ALL behavior observed from other hosts. + +2.3.1. Preferred name syntax + +The DNS specifications attempt to be as general as possible in the rules +for constructing domain names. The idea is that the name of any +existing object can be expressed as a domain name with minimal changes. + +However, when assigning a domain name for an object, the prudent user +will select a name which satisfies both the rules of the domain system +and any existing rules for the object, whether these rules are published +or implied by existing programs. 
+ +For example, when naming a mail domain, the user should satisfy both the +rules of this memo and those in RFC-822. When creating a new host name, +the old rules for HOSTS.TXT should be followed. This avoids problems +when old software is converted to use domain names. + +The following syntax will result in fewer problems with many + +applications that use domain names (e.g., mail, TELNET). + +<domain> ::= <subdomain> | " " + +<subdomain> ::= <label> | <subdomain> "." <label> + +<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ] + +<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str> + +<let-dig-hyp> ::= <let-dig> | "-" + +<let-dig> ::= <letter> | <digit> + +<letter> ::= any one of the 52 alphabetic characters A through Z in +upper case and a through z in lower case + +<digit> ::= any one of the ten digits 0 through 9 + +Note that while upper and lower case letters are allowed in domain +names, no significance is attached to the case. That is, two names with +the same spelling but different case are to be treated as if identical. + +The labels must follow the rules for ARPANET host names. They must +start with a letter, end with a letter or digit, and have as interior +characters only letters, digits, and hyphen. There are also some +restrictions on the length. Labels must be 63 characters or less. + +For example, the following strings identify hosts in the Internet: + +A.ISI.EDU XX.LCS.MIT.EDU SRI-NIC.ARPA + +2.3.2. Data Transmission Order + +The order of transmission of the header and data described in this +document is resolved to the octet level. Whenever a diagram shows a + + + +Mockapetris [Page 8] + +RFC 1035 Domain Implementation and Specification November 1987 + + +group of octets, the order of transmission of those octets is the normal +order in which they are read in English. For example, in the following +diagram, the octets are transmitted in the order they are numbered. 
+ + 0 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 1 | 2 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 3 | 4 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 5 | 6 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +Whenever an octet represents a numeric quantity, the left most bit in +the diagram is the high order or most significant bit. That is, the bit +labeled 0 is the most significant bit. For example, the following +diagram represents the value 170 (decimal). + + 0 1 2 3 4 5 6 7 + +-+-+-+-+-+-+-+-+ + |1 0 1 0 1 0 1 0| + +-+-+-+-+-+-+-+-+ + +Similarly, whenever a multi-octet field represents a numeric quantity +the left most bit of the whole field is the most significant bit. When +a multi-octet quantity is transmitted the most significant octet is +transmitted first. + +2.3.3. Character Case + +For all parts of the DNS that are part of the official protocol, all +comparisons between character strings (e.g., labels, domain names, etc.) +are done in a case-insensitive manner. At present, this rule is in +force throughout the domain system without exception. However, future +additions beyond current usage may need to use the full binary octet +capabilities in names, so attempts to store domain names in 7-bit ASCII +or use of special bytes to terminate labels, etc., should be avoided. + +When data enters the domain system, its original case should be +preserved whenever possible. In certain circumstances this cannot be +done. For example, if two RRs are stored in a database, one at x.y and +one at X.Y, they are actually stored at the same place in the database, +and hence only one casing would be preserved. The basic rule is that +case can be discarded only when data is used to define structure in a +database, and two names are identical when compared in a case +insensitive manner. + + + + +Mockapetris [Page 9] + +RFC 1035 Domain Implementation and Specification November 1987 + + +Loss of case sensitive data must be minimized. 
Thus while data for x.y +and X.Y may both be stored under a single location x.y or X.Y, data for +a.x and B.X would never be stored under A.x, A.X, b.x, or b.X. In +general, this preserves the case of the first label of a domain name, +but forces standardization of interior node labels. + +Systems administrators who enter data into the domain database should +take care to represent the data they supply to the domain system in a +case-consistent manner if their system is case-sensitive. The data +distribution system in the domain system will ensure that consistent +representations are preserved. + +2.3.4. Size limits + +Various objects and parameters in the DNS have size limits. They are +listed below. Some could be easily changed, others are more +fundamental. + +labels 63 octets or less + +names 255 octets or less + +TTL positive values of a signed 32 bit number. + +UDP messages 512 octets or less + +3. DOMAIN NAME SPACE AND RR DEFINITIONS + +3.1. Name space definitions + +Domain names in messages are expressed in terms of a sequence of labels. +Each label is represented as a one octet length field followed by that +number of octets. Since every domain name ends with the null label of +the root, a domain name is terminated by a length byte of zero. The +high order two bits of every length octet must be zero, and the +remaining six bits of the length field limit the label to 63 octets or +less. + +To simplify implementations, the total length of a domain name (i.e., +label octets and label length octets) is restricted to 255 octets or +less. + +Although labels can contain any 8 bit values in octets that make up a +label, it is strongly recommended that labels follow the preferred +syntax described elsewhere in this memo, which is compatible with +existing host naming conventions. Name servers and resolvers must +compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII +with zero parity. Non-alphabetic codes must match exactly. 
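As an illustration of the encoding and comparison rules of section 3.1, the following Python sketch is offered. It is not part of the specification. The names encode_name, fold_case, and labels_equal are hypothetical helpers chosen for this example. The sketch assumes that names are supplied as ASCII text with dots separating labels, and that the 255 octet limit counts the terminating zero octet.

```python
MAX_LABEL = 63   # six usable bits in the length octet
MAX_NAME = 255   # assumption: limit includes the root's zero length octet


def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels ending in a zero octet."""
    wire = bytearray()
    stripped = name.rstrip(".")
    for label in (stripped.split(".") if stripped else []):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= MAX_LABEL:
            raise ValueError(f"label length {len(raw)} is outside 1..{MAX_LABEL}")
        wire.append(len(raw))  # high order two bits are zero because len <= 63
        wire += raw
    wire.append(0)  # null label of the root terminates the name
    if len(wire) > MAX_NAME:
        raise ValueError(f"encoded name is {len(wire)} octets, limit is {MAX_NAME}")
    return bytes(wire)


def fold_case(label: bytes) -> bytes:
    """Map ASCII A-Z to a-z; every other octet is left unchanged."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in label)


def labels_equal(a: bytes, b: bytes) -> bool:
    """Compare labels so that A=a, while non-alphabetic codes must match exactly."""
    return fold_case(a) == fold_case(b)
```

Under these assumptions, encode_name("A.ISI.EDU") yields the octets 1 "A" 3 "ISI" 3 "EDU" 0, and labels_equal(b"ISI", b"isi") is true. Folding only the 26 ASCII letters, rather than calling a general lower-casing routine, keeps octets outside A-Z untouched, which matters because labels may carry arbitrary 8 bit values.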
+ + + +Mockapetris [Page 10] + +RFC 1035 Domain Implementation and Specification November 1987 + + +3.2. RR definitions + +3.2.1. Format + +All RRs have the same top level format shown below: + + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | | + / / + / NAME / + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | TYPE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | CLASS | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | TTL | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | RDLENGTH | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--| + / RDATA / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + +where: + +NAME an owner name, i.e., the name of the node to which this + resource record pertains. + +TYPE two octets containing one of the RR TYPE codes. + +CLASS two octets containing one of the RR CLASS codes. + +TTL a 32 bit signed integer that specifies the time interval + that the resource record may be cached before the source + of the information should again be consulted. Zero + values are interpreted to mean that the RR can only be + used for the transaction in progress, and should not be + cached. For example, SOA records are always distributed + with a zero TTL to prohibit caching. Zero values can + also be used for extremely volatile data. + +RDLENGTH an unsigned 16 bit integer that specifies the length in + octets of the RDATA field. + + + +Mockapetris [Page 11] + +RFC 1035 Domain Implementation and Specification November 1987 + + +RDATA a variable length string of octets that describes the + resource. The format of this information varies + according to the TYPE and CLASS of the resource record. + +3.2.2. TYPE values + +TYPE fields are used in resource records. Note that these types are a +subset of QTYPEs. 
+ +TYPE value and meaning + +A 1 a host address + +NS 2 an authoritative name server + +MD 3 a mail destination (Obsolete - use MX) + +MF 4 a mail forwarder (Obsolete - use MX) + +CNAME 5 the canonical name for an alias + +SOA 6 marks the start of a zone of authority + +MB 7 a mailbox domain name (EXPERIMENTAL) + +MG 8 a mail group member (EXPERIMENTAL) + +MR 9 a mail rename domain name (EXPERIMENTAL) + +NULL 10 a null RR (EXPERIMENTAL) + +WKS 11 a well known service description + +PTR 12 a domain name pointer + +HINFO 13 host information + +MINFO 14 mailbox or mail list information + +MX 15 mail exchange + +TXT 16 text strings + +3.2.3. QTYPE values + +QTYPE fields appear in the question part of a query. QTYPES are a +superset of TYPEs, hence all TYPEs are valid QTYPEs. In addition, the +following QTYPEs are defined: + + + +Mockapetris [Page 12] + +RFC 1035 Domain Implementation and Specification November 1987 + + +AXFR 252 A request for a transfer of an entire zone + +MAILB 253 A request for mailbox-related records (MB, MG or MR) + +MAILA 254 A request for mail agent RRs (Obsolete - see MX) + +* 255 A request for all records + +3.2.4. CLASS values + +CLASS fields appear in resource records. The following CLASS mnemonics +and values are defined: + +IN 1 the Internet + +CS 2 the CSNET class (Obsolete - used only for examples in + some obsolete RFCs) + +CH 3 the CHAOS class + +HS 4 Hesiod [Dyer 87] + +3.2.5. QCLASS values + +QCLASS fields appear in the question section of a query. QCLASS values +are a superset of CLASS values; every CLASS is a valid QCLASS. In +addition to CLASS values, the following QCLASSes are defined: + +* 255 any class + +3.3. Standard RRs + +The following RR definitions are expected to occur, at least +potentially, in all classes. In particular, NS, SOA, CNAME, and PTR +will be used in all classes, and have the same format in all classes. 
+Because their RDATA format is known, all domain names in the RDATA +section of these RRs may be compressed. + +<domain-name> is a domain name represented as a series of labels, and +terminated by a label with zero length. <character-string> is a single +length octet followed by that number of characters. <character-string> +is treated as binary information, and can be up to 256 characters in +length (including the length octet). + + + + + + + + +Mockapetris [Page 13] + +RFC 1035 Domain Implementation and Specification November 1987 + + +3.3.1. CNAME RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / CNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +CNAME A <domain-name> which specifies the canonical or primary + name for the owner. The owner name is an alias. + +CNAME RRs cause no additional section processing, but name servers may +choose to restart the query at the canonical name in certain cases. See +the description of name server logic in [RFC-1034] for details. + +3.3.2. HINFO RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / CPU / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / OS / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +CPU A <character-string> which specifies the CPU type. + +OS A <character-string> which specifies the operating + system type. + +Standard values for CPU and OS can be found in [RFC-1010]. + +HINFO records are used to acquire general information about a host. The +main use is for protocols such as FTP that can use special procedures +when talking between machines or operating systems of the same type. + +3.3.3. MB RDATA format (EXPERIMENTAL) + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / MADNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +MADNAME A <domain-name> which specifies a host which has the + specified mailbox. 
+ +MB records cause additional section processing which looks up an A type +RR corresponding to MADNAME. + +3.3.4. MD RDATA format (Obsolete) + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / MADNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +MADNAME A <domain-name> which specifies a host which has a mail + agent for the domain which should be able to deliver + mail for the domain. + +MD records cause additional section processing which looks up an A type +record corresponding to MADNAME. + +MD is obsolete. See the definition of MX and [RFC-974] for details of +the new scheme. The recommended policy for dealing with MD RRs found in +a master file is to reject them, or to convert them to MX RRs with a +preference of 0. + +3.3.5. MF RDATA format (Obsolete) + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / MADNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +MADNAME A <domain-name> which specifies a host which has a mail + agent for the domain which will accept mail for + forwarding to the domain. + +MF records cause additional section processing which looks up an A type +record corresponding to MADNAME. + +MF is obsolete. See the definition of MX and [RFC-974] for details of +the new scheme. The recommended policy for dealing with MF RRs found in +a master file is to reject them, or to convert them to MX RRs with a +preference of 10. + +3.3.6. MG RDATA format (EXPERIMENTAL) + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / MGMNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +MGMNAME A <domain-name> which specifies a mailbox which is a + member of the mail group specified by the domain name. + +MG records cause no additional section processing. + +3.3.7. 
MINFO RDATA format (EXPERIMENTAL) + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / RMAILBX / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / EMAILBX / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +RMAILBX A <domain-name> which specifies a mailbox which is + responsible for the mailing list or mailbox. If this + domain name names the root, the owner of the MINFO RR is + responsible for itself. Note that many existing mailing + lists use a mailbox X-request for the RMAILBX field of + mailing list X, e.g., Msgroup-request for Msgroup. This + field provides a more general mechanism. + + +EMAILBX A <domain-name> which specifies a mailbox which is to + receive error messages related to the mailing list or + mailbox specified by the owner of the MINFO RR (similar + to the ERRORS-TO: field which has been proposed). If + this domain name names the root, errors should be + returned to the sender of the message. + +MINFO records cause no additional section processing. Although these +records can be associated with a simple mailbox, they are usually used +with a mailing list. + + + + + + + + +Mockapetris [Page 16] + +RFC 1035 Domain Implementation and Specification November 1987 + + +3.3.8. MR RDATA format (EXPERIMENTAL) + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / NEWNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +NEWNAME A <domain-name> which specifies a mailbox which is the + proper rename of the specified mailbox. + +MR records cause no additional section processing. The main use for MR +is as a forwarding entry for a user who has moved to a different +mailbox. + +3.3.9. 
MX RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | PREFERENCE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / EXCHANGE / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +PREFERENCE A 16 bit integer which specifies the preference given to + this RR among others at the same owner. Lower values + are preferred. + +EXCHANGE A <domain-name> which specifies a host willing to act as + a mail exchange for the owner name. + +MX records cause type A additional section processing for the host +specified by EXCHANGE. The use of MX RRs is explained in detail in +[RFC-974]. + +3.3.10. NULL RDATA format (EXPERIMENTAL) + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / <anything> / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +Anything at all may be in the RDATA field so long as it is 65535 octets +or less. + + + + +Mockapetris [Page 17] + +RFC 1035 Domain Implementation and Specification November 1987 + + +NULL records cause no additional section processing. NULL RRs are not +allowed in master files. NULLs are used as placeholders in some +experimental extensions of the DNS. + +3.3.11. NS RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / NSDNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +NSDNAME A <domain-name> which specifies a host which should be + authoritative for the specified class and domain. + +NS records cause both the usual additional section processing to locate +a type A record, and, when used in a referral, a special search of the +zone in which they reside for glue information. + +The NS RR states that the named host should be expected to have a zone +starting at owner name of the specified class. Note that the class may +not indicate the protocol family which should be used to communicate +with the host, although it is typically a strong hint. 
For example, +hosts which are name servers for either Internet (IN) or Hesiod (HS) +class information are normally queried using IN class protocols. + +3.3.12. PTR RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / PTRDNAME / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +PTRDNAME A <domain-name> which points to some location in the + domain name space. + +PTR records cause no additional section processing. These RRs are used +in special domains to point to some other location in the domain space. +These records are simple data, and don't imply any special processing +similar to that performed by CNAME, which identifies aliases. See the +description of the IN-ADDR.ARPA domain for an example. + + + + + + + + +Mockapetris [Page 18] + +RFC 1035 Domain Implementation and Specification November 1987 + + +3.3.13. SOA RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / MNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / RNAME / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | SERIAL | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | REFRESH | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | RETRY | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | EXPIRE | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | MINIMUM | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +MNAME The <domain-name> of the name server that was the + original or primary source of data for this zone. + +RNAME A <domain-name> which specifies the mailbox of the + person responsible for this zone. + +SERIAL The unsigned 32 bit version number of the original copy + of the zone. Zone transfers preserve this value. This + value wraps and should be compared using sequence space + arithmetic. + +REFRESH A 32 bit time interval before the zone should be + refreshed. + +RETRY A 32 bit time interval that should elapse before a + failed refresh should be retried. 
+ +EXPIRE A 32 bit time value that specifies the upper limit on + the time interval that can elapse before the zone is no + longer authoritative. + + + + + +Mockapetris [Page 19] + +RFC 1035 Domain Implementation and Specification November 1987 + + +MINIMUM The unsigned 32 bit minimum TTL field that should be + exported with any RR from this zone. + +SOA records cause no additional section processing. + +All times are in units of seconds. + +Most of these fields are pertinent only for name server maintenance +operations. However, MINIMUM is used in all query operations that +retrieve RRs from a zone. Whenever a RR is sent in a response to a +query, the TTL field is set to the maximum of the TTL field from the RR +and the MINIMUM field in the appropriate SOA. Thus MINIMUM is a lower +bound on the TTL field for all RRs in a zone. Note that this use of +MINIMUM should occur when the RRs are copied into the response and not +when the zone is loaded from a master file or via a zone transfer. The +reason for this provison is to allow future dynamic update facilities to +change the SOA RR with known semantics. + + +3.3.14. TXT RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / TXT-DATA / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +TXT-DATA One or more <character-string>s. + +TXT RRs are used to hold descriptive text. The semantics of the text +depends on the domain where it is found. + +3.4. Internet specific RRs + +3.4.1. A RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ADDRESS | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +ADDRESS A 32 bit Internet address. + +Hosts that have multiple Internet addresses will have multiple A +records. + + + + + +Mockapetris [Page 20] + +RFC 1035 Domain Implementation and Specification November 1987 + + +A records cause no additional section processing. 
The RDATA section of +an A line in a master file is an Internet address expressed as four +decimal numbers separated by dots without any imbedded spaces (e.g., +"10.2.0.52" or "192.0.5.6"). + +3.4.2. WKS RDATA format + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ADDRESS | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | PROTOCOL | | + +--+--+--+--+--+--+--+--+ | + | | + / <BIT MAP> / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +ADDRESS An 32 bit Internet address + +PROTOCOL An 8 bit IP protocol number + +<BIT MAP> A variable length bit map. The bit map must be a + multiple of 8 bits long. + +The WKS record is used to describe the well known services supported by +a particular protocol on a particular internet address. The PROTOCOL +field specifies an IP protocol number, and the bit map has one bit per +port of the specified protocol. The first bit corresponds to port 0, +the second to port 1, etc. If the bit map does not include a bit for a +protocol of interest, that bit is assumed zero. The appropriate values +and mnemonics for ports and protocols are specified in [RFC-1010]. + +For example, if PROTOCOL=TCP (6), the 26th bit corresponds to TCP port +25 (SMTP). If this bit is set, a SMTP server should be listening on TCP +port 25; if zero, SMTP service is not supported on the specified +address. + +The purpose of WKS RRs is to provide availability information for +servers for TCP and UDP. If a server supports both TCP and UDP, or has +multiple Internet addresses, then multiple WKS RRs are used. + +WKS RRs cause no additional section processing. + +In master files, both ports and protocols are expressed using mnemonics +or decimal numbers. + + + + +Mockapetris [Page 21] + +RFC 1035 Domain Implementation and Specification November 1987 + + +3.5. IN-ADDR.ARPA domain + +The Internet uses a special domain to support gateway location and +Internet address to host mapping. 
Other classes may employ a similar +strategy in other domains. The intent of this domain is to provide a +guaranteed method to perform host address to host name mapping, and to +facilitate queries to locate all gateways on a particular network in the +Internet. + +Note that both of these services are similar to functions that could be +performed by inverse queries; the difference is that this part of the +domain name space is structured according to address, and hence can +guarantee that the appropriate data can be located without an exhaustive +search of the domain space. + +The domain begins at IN-ADDR.ARPA and has a substructure which follows +the Internet addressing structure. + +Domain names in the IN-ADDR.ARPA domain are defined to have up to four +labels in addition to the IN-ADDR.ARPA suffix. Each label represents +one octet of an Internet address, and is expressed as a character string +for a decimal value in the range 0-255 (with leading zeros omitted +except in the case of a zero octet which is represented by a single +zero). + +Host addresses are represented by domain names that have all four labels +specified. Thus data for Internet address 10.2.0.52 is located at +domain name 52.0.2.10.IN-ADDR.ARPA. The reversal, though awkward to +read, allows zones to be delegated which are exactly one network of +address space. For example, 10.IN-ADDR.ARPA can be a zone containing +data for the ARPANET, while 26.IN-ADDR.ARPA can be a separate zone for +MILNET. Address nodes are used to hold pointers to primary host names +in the normal domain space. + +Network numbers correspond to some non-terminal nodes at various depths +in the IN-ADDR.ARPA domain, since Internet network numbers are either 1, +2, or 3 octets. Network nodes are used to hold pointers to the primary +host names of gateways attached to that network. Since a gateway is, by +definition, on more than one network, it will typically have two or more +network nodes which point at it. 
Gateways will also have host level +pointers at their fully qualified addresses. + +Both the gateway pointers at network nodes and the normal host pointers +at full address nodes use the PTR RR to point back to the primary domain +names of the corresponding hosts. + +For example, the IN-ADDR.ARPA domain will contain information about the +ISI gateway between net 10 and 26, an MIT gateway from net 10 to MIT's + + + +Mockapetris [Page 22] + +RFC 1035 Domain Implementation and Specification November 1987 + + +net 18, and hosts A.ISI.EDU and MULTICS.MIT.EDU. Assuming that ISI +gateway has addresses 10.2.0.22 and 26.0.0.103, and a name MILNET- +GW.ISI.EDU, and the MIT gateway has addresses 10.0.0.77 and 18.10.0.4 +and a name GW.LCS.MIT.EDU, the domain database would contain: + + 10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. + 10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. + 18.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. + 26.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. + 22.0.2.10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. + 103.0.0.26.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. + 77.0.0.10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. + 4.0.10.18.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. + 103.0.3.26.IN-ADDR.ARPA. PTR A.ISI.EDU. + 6.0.0.10.IN-ADDR.ARPA. PTR MULTICS.MIT.EDU. + +Thus a program which wanted to locate gateways on net 10 would originate +a query of the form QTYPE=PTR, QCLASS=IN, QNAME=10.IN-ADDR.ARPA. It +would receive two RRs in response: + + 10.IN-ADDR.ARPA. PTR MILNET-GW.ISI.EDU. + 10.IN-ADDR.ARPA. PTR GW.LCS.MIT.EDU. + +The program could then originate QTYPE=A, QCLASS=IN queries for MILNET- +GW.ISI.EDU. and GW.LCS.MIT.EDU. to discover the Internet addresses of +these gateways. + +A resolver which wanted to find the host name corresponding to Internet +host address 10.0.0.6 would pursue a query of the form QTYPE=PTR, +QCLASS=IN, QNAME=6.0.0.10.IN-ADDR.ARPA, and would receive: + + 6.0.0.10.IN-ADDR.ARPA. PTR MULTICS.MIT.EDU. 
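The mapping from an address to its IN-ADDR.ARPA name, as used in the examples above, can be sketched as follows. This is a minimal illustration rather than part of the memo: the function name `reverse_name` is hypothetical, and the sketch assumes a dotted-decimal input of one to four octets (a network number or a full host address).

```python
def reverse_name(address: str) -> str:
    """Build the IN-ADDR.ARPA domain name for a dotted-decimal address.

    Accepts one to four octets, so it covers both network nodes
    (e.g. "10") and full host addresses (e.g. "10.2.0.52").
    The function name is hypothetical, chosen for this sketch.
    """
    octets = address.split(".")
    if not 1 <= len(octets) <= 4:
        raise ValueError("expected one to four octets")
    labels = []
    for octet in octets:
        if not (octet.isascii() and octet.isdigit()):
            raise ValueError(f"not a decimal octet: {octet!r}")
        value = int(octet)
        if value > 255:
            raise ValueError(f"octet out of range: {octet!r}")
        # str(int) omits leading zeros and renders a zero octet as "0".
        labels.append(str(value))
    # Labels appear in reverse order, followed by the fixed suffix.
    return ".".join(reversed(labels)) + ".IN-ADDR.ARPA."


if __name__ == "__main__":
    print(reverse_name("10.2.0.52"))
    print(reverse_name("10"))
```

A resolver would then issue a QTYPE=PTR, QCLASS=IN query for the resulting name.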
+ +Several cautions apply to the use of these services: + - Since the IN-ADDR.ARPA special domain and the normal domain + for a particular host or gateway will be in different zones, + the possibility exists that the data may be inconsistent. + + - Gateways will often have two names in separate domains, only + one of which can be primary. + + - Systems that use the domain database to initialize their + routing tables must start with enough gateway information to + guarantee that they can access the appropriate name server. + + - The gateway data only reflects the existence of a gateway in a + manner equivalent to the current HOSTS.TXT file. It doesn't + replace the dynamic availability information from GGP or EGP. + + + +Mockapetris [Page 23] + +RFC 1035 Domain Implementation and Specification November 1987 + + +3.6. Defining new types, classes, and special namespaces + +The previously defined types and classes are the ones in use as of the +date of this memo. New definitions should be expected. This section +makes some recommendations to designers considering additions to the +existing facilities. The mailing list NAMEDROPPERS@SRI-NIC.ARPA is the +forum where general discussion of design issues takes place. + +In general, a new type is appropriate when new information is to be +added to the database about an existing object, or we need new data +formats for some totally new object. Designers should attempt to define +types and their RDATA formats that are generally applicable to all +classes, and which avoid duplication of information. New classes are +appropriate when the DNS is to be used for a new protocol, etc which +requires new class-specific data formats, or when a copy of the existing +name space is desired, but a separate management domain is necessary. + +New types and classes need mnemonics for master files; the format of the +master files requires that the mnemonics for type and class be disjoint. 
+ +TYPE and CLASS values must be a proper subset of QTYPEs and QCLASSes +respectively. + +The present system uses multiple RRs to represent multiple values of a +type rather than storing multiple values in the RDATA section of a +single RR. This is less efficient for most applications, but does keep +RRs shorter. The multiple RRs assumption is incorporated in some +experimental work on dynamic update methods. + +The present system attempts to minimize the duplication of data in the +database in order to insure consistency. Thus, in order to find the +address of the host for a mail exchange, you map the mail domain name to +a host name, then the host name to addresses, rather than a direct +mapping to host address. This approach is preferred because it avoids +the opportunity for inconsistency. + +In defining a new type of data, multiple RR types should not be used to +create an ordering between entries or express different formats for +equivalent bindings, instead this information should be carried in the +body of the RR and a single type used. This policy avoids problems with +caching multiple types and defining QTYPEs to match multiple types. + +For example, the original form of mail exchange binding used two RR +types one to represent a "closer" exchange (MD) and one to represent a +"less close" exchange (MF). The difficulty is that the presence of one +RR type in a cache doesn't convey any information about the other +because the query which acquired the cached information might have used +a QTYPE of MF, MD, or MAILA (which matched both). The redesigned + + + +Mockapetris [Page 24] + +RFC 1035 Domain Implementation and Specification November 1987 + + +service used a single type (MX) with a "preference" value in the RDATA +section which can order different RRs. However, if any MX RRs are found +in the cache, then all should be there. + +4. MESSAGES + +4.1. 
Format + +All communications inside of the domain protocol are carried in a single +format called a message. The top level format of message is divided +into 5 sections (some of which are empty in certain cases) shown below: + + +---------------------+ + | Header | + +---------------------+ + | Question | the question for the name server + +---------------------+ + | Answer | RRs answering the question + +---------------------+ + | Authority | RRs pointing toward an authority + +---------------------+ + | Additional | RRs holding additional information + +---------------------+ + +The header section is always present. The header includes fields that +specify which of the remaining sections are present, and also specify +whether the message is a query or a response, a standard query or some +other opcode, etc. + +The names of the sections after the header are derived from their use in +standard queries. The question section contains fields that describe a +question to a name server. These fields are a query type (QTYPE), a +query class (QCLASS), and a query domain name (QNAME). The last three +sections have the same format: a possibly empty list of concatenated +resource records (RRs). The answer section contains RRs that answer the +question; the authority section contains RRs that point toward an +authoritative name server; the additional records section contains RRs +which relate to the query, but are not strictly answers for the +question. + + + + + + + + + + + + +Mockapetris [Page 25] + +RFC 1035 Domain Implementation and Specification November 1987 + + +4.1.1. 
Header section format + +The header contains the following fields: + + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ID | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + |QR| Opcode |AA|TC|RD|RA| Z | RCODE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | QDCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ANCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | NSCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ARCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +ID A 16 bit identifier assigned by the program that + generates any kind of query. This identifier is copied + the corresponding reply and can be used by the requester + to match up replies to outstanding queries. + +QR A one bit field that specifies whether this message is a + query (0), or a response (1). + +OPCODE A four bit field that specifies kind of query in this + message. This value is set by the originator of a query + and copied into the response. The values are: + + 0 a standard query (QUERY) + + 1 an inverse query (IQUERY) + + 2 a server status request (STATUS) + + 3-15 reserved for future use + +AA Authoritative Answer - this bit is valid in responses, + and specifies that the responding name server is an + authority for the domain name in question section. + + Note that the contents of the answer section may have + multiple owner names because of aliases. The AA bit + + + +Mockapetris [Page 26] + +RFC 1035 Domain Implementation and Specification November 1987 + + + corresponds to the name which matches the query name, or + the first owner name in the answer section. + +TC TrunCation - specifies that this message was truncated + due to length greater than that permitted on the + transmission channel. + +RD Recursion Desired - this bit may be set in a query and + is copied into the response. 
If RD is set, it directs + the name server to pursue the query recursively. + Recursive query support is optional. + +RA Recursion Available - this bit is set or cleared in a + response, and denotes whether recursive query support is + available in the name server. + +Z Reserved for future use. Must be zero in all queries + and responses. + +RCODE Response code - this 4 bit field is set as part of + responses. The values have the following + interpretation: + + 0 No error condition + + 1 Format error - The name server was + unable to interpret the query. + + 2 Server failure - The name server was + unable to process this query due to a + problem with the name server. + + 3 Name Error - Meaningful only for + responses from an authoritative name + server, this code signifies that the + domain name referenced in the query does + not exist. + + 4 Not Implemented - The name server does + not support the requested kind of query. + + 5 Refused - The name server refuses to + perform the specified operation for + policy reasons. For example, a name + server may not wish to provide the + information to the particular requester, + or a name server may not wish to perform + a particular operation (e.g., zone + + + +Mockapetris [Page 27] + +RFC 1035 Domain Implementation and Specification November 1987 + + + transfer) for particular data. + + 6-15 Reserved for future use. + +QDCOUNT an unsigned 16 bit integer specifying the number of + entries in the question section. + +ANCOUNT an unsigned 16 bit integer specifying the number of + resource records in the answer section. + +NSCOUNT an unsigned 16 bit integer specifying the number of name + server resource records in the authority records + section. + +ARCOUNT an unsigned 16 bit integer specifying the number of + resource records in the additional records section. + +4.1.2. Question section format + +The question section is used to carry the "question" in most queries, +i.e., the parameters that define what is being asked. 
The section +contains QDCOUNT (usually 1) entries, each of the following format: + + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | | + / QNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | QTYPE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | QCLASS | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +QNAME a domain name represented as a sequence of labels, where + each label consists of a length octet followed by that + number of octets. The domain name terminates with the + zero length octet for the null label of the root. Note + that this field may be an odd number of octets; no + padding is used. + +QTYPE a two octet code which specifies the type of the query. + The values for this field include all codes valid for a + TYPE field, together with some more general codes which + can match more than one type of RR. + + + +Mockapetris [Page 28] + +RFC 1035 Domain Implementation and Specification November 1987 + + +QCLASS a two octet code that specifies the class of the query. + For example, the QCLASS field is IN for the Internet. + +4.1.3. Resource record format + +The answer, authority, and additional sections all share the same +format: a variable number of resource records, where the number of +records is specified in the corresponding count field in the header. 
+Each resource record has the following format: + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | | + / / + / NAME / + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | TYPE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | CLASS | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | TTL | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | RDLENGTH | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--| + / RDATA / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +where: + +NAME a domain name to which this resource record pertains. + +TYPE two octets containing one of the RR type codes. This + field specifies the meaning of the data in the RDATA + field. + +CLASS two octets which specify the class of the data in the + RDATA field. + +TTL a 32 bit unsigned integer that specifies the time + interval (in seconds) that the resource record may be + cached before it should be discarded. Zero values are + interpreted to mean that the RR can only be used for the + transaction in progress, and should not be cached. + + + + + +Mockapetris [Page 29] + +RFC 1035 Domain Implementation and Specification November 1987 + + +RDLENGTH an unsigned 16 bit integer that specifies the length in + octets of the RDATA field. + +RDATA a variable length string of octets that describes the + resource. The format of this information varies + according to the TYPE and CLASS of the resource record. + For example, the if the TYPE is A and the CLASS is IN, + the RDATA field is a 4 octet ARPA Internet address. + +4.1.4. Message compression + +In order to reduce the size of messages, the domain system utilizes a +compression scheme which eliminates the repetition of domain names in a +message. In this scheme, an entire domain name or a list of labels at +the end of a domain name is replaced with a pointer to a prior occurance +of the same name. 
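As a rough sketch of what a reader of compressed messages has to do, the following follows the pointer encoding that the rest of this section defines: a two octet sequence whose first two bits are ones, with the remaining 14 bits giving an offset from the start of the message. The function name `read_name` and the loop guard are assumptions of this sketch, not requirements stated in the memo; labels are assumed to be ASCII text.

```python
def read_name(message: bytes, offset: int):
    """Expand a possibly compressed domain name starting at `offset`.

    Returns (name, next_offset), where next_offset is the position just
    past the name as it is stored at the original location. The function
    name and the pointer-loop guard are assumptions of this sketch.
    """
    labels = []
    next_offset = None      # set when the first pointer or the root is reached
    visited = set()         # pointer targets already followed
    while True:
        if offset >= len(message):
            raise ValueError("name runs past end of message")
        length = message[offset]
        if length & 0xC0 == 0xC0:
            # Pointer: 14 bit offset from the start of the message.
            if offset + 1 >= len(message):
                raise ValueError("truncated pointer")
            target = ((length & 0x3F) << 8) | message[offset + 1]
            if next_offset is None:
                next_offset = offset + 2
            if target in visited:
                raise ValueError("pointer loop")
            visited.add(target)
            offset = target
        elif length & 0xC0:
            # The 10 and 01 combinations are reserved.
            raise ValueError("reserved label type")
        elif length == 0:
            # Zero length octet: the null label of the root ends the name.
            if next_offset is None:
                next_offset = offset + 1
            break
        else:
            start, end = offset + 1, offset + 1 + length
            if end > len(message):
                raise ValueError("label runs past end of message")
            labels.append(message[start:end].decode("ascii"))
            offset = end
    return ".".join(labels) + ".", next_offset
```

Returning the offset just past the stored form (rather than past the expanded name) matches the rule that length calculations use the compressed length.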
+ +The pointer takes the form of a two octet sequence: + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | 1 1| OFFSET | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +The first two bits are ones. This allows a pointer to be distinguished +from a label, since the label must begin with two zero bits because +labels are restricted to 63 octets or less. (The 10 and 01 combinations +are reserved for future use.) The OFFSET field specifies an offset from +the start of the message (i.e., the first octet of the ID field in the +domain header). A zero offset specifies the first byte of the ID field, +etc. + +The compression scheme allows a domain name in a message to be +represented as either: + + - a sequence of labels ending in a zero octet + + - a pointer + + - a sequence of labels ending with a pointer + +Pointers can only be used for occurances of a domain name where the +format is not class specific. If this were not the case, a name server +or resolver would be required to know the format of all RRs it handled. +As yet, there are no such cases, but they may occur in future RDATA +formats. + +If a domain name is contained in a part of the message subject to a +length field (such as the RDATA section of an RR), and compression is + + + +Mockapetris [Page 30] + +RFC 1035 Domain Implementation and Specification November 1987 + + +used, the length of the compressed name is used in the length +calculation, rather than the length of the expanded name. + +Programs are free to avoid using pointers in messages they generate, +although this will reduce datagram capacity, and may cause truncation. +However all programs are required to understand arriving messages that +contain pointers. + +For example, a datagram might need to use the domain names F.ISI.ARPA, +FOO.F.ISI.ARPA, ARPA, and the root. 
Ignoring the other fields of the +message, these domain names might be represented as: + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 20 | 1 | F | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 22 | 3 | I | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 24 | S | I | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 26 | 4 | A | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 28 | R | P | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 30 | A | 0 | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 40 | 3 | F | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 42 | O | O | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 44 | 1 1| 20 | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 64 | 1 1| 26 | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 92 | 0 | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + +The domain name for F.ISI.ARPA is shown at offset 20. The domain name +FOO.F.ISI.ARPA is shown at offset 40; this definition uses a pointer to +concatenate a label for FOO to the previously defined F.ISI.ARPA. The +domain name ARPA is defined at offset 64 using a pointer to the ARPA +component of the name F.ISI.ARPA at 20; note that this pointer relies on +ARPA being the last label in the string at 20. The root domain name is + + + +Mockapetris [Page 31] + +RFC 1035 Domain Implementation and Specification November 1987 + + +defined by a single octet of zeros at 92; the root domain name has no +labels. + +4.2. Transport + +The DNS assumes that messages will be transmitted as datagrams or in a +byte stream carried by a virtual circuit. While virtual circuits can be +used for any DNS activity, datagrams are preferred for queries due to +their lower overhead and better performance. 
Zone refresh activities +must use virtual circuits because of the need for reliable transfer. + +The Internet supports name server access using TCP [RFC-793] on server +port 53 (decimal) as well as datagram access using UDP [RFC-768] on UDP +port 53 (decimal). + +4.2.1. UDP usage + +Messages sent using UDP use server port 53 (decimal). + +Messages carried by UDP are restricted to 512 bytes (not counting the IP +or UDP headers). Longer messages are truncated and the TC bit is set in +the header. + +UDP is not acceptable for zone transfers, but is the recommended method +for standard queries in the Internet. Queries sent using UDP may be +lost, and hence a retransmission strategy is required. Queries or their +responses may be reordered by the network, or by processing in name +servers, so resolvers should not depend on them being returned in order. + +The optimal UDP retransmission policy will vary with performance of the +Internet and the needs of the client, but the following are recommended: + + - The client should try other servers and server addresses + before repeating a query to a specific address of a server. + + - The retransmission interval should be based on prior + statistics if possible. Too aggressive retransmission can + easily slow responses for the community at large. Depending + on how well connected the client is to its expected servers, + the minimum retransmission interval should be 2-5 seconds. + +More suggestions on server selection and retransmission policy can be +found in the resolver section of this memo. + +4.2.2. TCP usage + +Messages sent over TCP connections use server port 53 (decimal). The +message is prefixed with a two byte length field which gives the message + + + +Mockapetris [Page 32] + +RFC 1035 Domain Implementation and Specification November 1987 + + +length, excluding the two byte length field. This length field allows +the low-level processing to assemble a complete message before beginning +to parse it. 
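The two byte length prefix can be illustrated with a short sketch. The helper names `frame_message` and `split_messages` are hypothetical, and the sketch assumes the length is sent most significant octet first, in line with the byte order used for other multi-octet fields in the domain protocol.

```python
import struct


def frame_message(message: bytes) -> bytes:
    """Prefix a DNS message with its two byte length for TCP transport."""
    if len(message) > 0xFFFF:
        raise ValueError("message too long for a two byte length field")
    # "!H" is an unsigned 16 bit integer in network (big-endian) order.
    return struct.pack("!H", len(message)) + message


def split_messages(buffer: bytes):
    """Split received stream data into complete messages.

    Returns (messages, remainder); the remainder holds any partial
    message that must wait for more data from the connection.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # the message body has not fully arrived yet
        start = offset + 2
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]
```

Keeping the unconsumed remainder lets the caller append newly received data and call `split_messages` again, so a complete message is assembled before any parsing begins.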
+ +Several connection management policies are recommended: + + - The server should not block other activities waiting for TCP + data. + + - The server should support multiple connections. + + - The server should assume that the client will initiate + connection closing, and should delay closing its end of the + connection until all outstanding client requests have been + satisfied. + + - If the server needs to close a dormant connection to reclaim + resources, it should wait until the connection has been idle + for a period on the order of two minutes. In particular, the + server should allow the SOA and AXFR request sequence (which + begins a refresh operation) to be made on a single connection. + Since the server would be unable to answer queries anyway, a + unilateral close or reset may be used instead of a graceful + close. + +5. MASTER FILES + +Master files are text files that contain RRs in text form. Since the +contents of a zone can be expressed in the form of a list of RRs a +master file is most often used to define a zone, though it can be used +to list a cache's contents. Hence, this section first discusses the +format of RRs in a master file, and then the special considerations when +a master file is used to create a zone in some name server. + +5.1. Format + +The format of these files is a sequence of entries. Entries are +predominantly line-oriented, though parentheses can be used to continue +a list of items across a line boundary, and text literals can contain +CRLF within the text. Any combination of tabs and spaces act as a +delimiter between the separate items that make up an entry. The end of +any line in the master file can end with a comment. The comment starts +with a ";" (semicolon). 
+ +The following entries are defined: + + <blank>[<comment>] + + + + +Mockapetris [Page 33] + +RFC 1035 Domain Implementation and Specification November 1987 + + + $ORIGIN <domain-name> [<comment>] + + $INCLUDE <file-name> [<domain-name>] [<comment>] + + <domain-name><rr> [<comment>] + + <blank><rr> [<comment>] + +Blank lines, with or without comments, are allowed anywhere in the file. + +Two control entries are defined: $ORIGIN and $INCLUDE. $ORIGIN is +followed by a domain name, and resets the current origin for relative +domain names to the stated name. $INCLUDE inserts the named file into +the current file, and may optionally specify a domain name that sets the +relative domain name origin for the included file. $INCLUDE may also +have a comment. Note that a $INCLUDE entry never changes the relative +origin of the parent file, regardless of changes to the relative origin +made within the included file. + +The last two forms represent RRs. If an entry for an RR begins with a +blank, then the RR is assumed to be owned by the last stated owner. If +an RR entry begins with a <domain-name>, then the owner name is reset. + +<rr> contents take one of the following forms: + + [<TTL>] [<class>] <type> <RDATA> + + [<class>] [<TTL>] <type> <RDATA> + +The RR begins with optional TTL and class fields, followed by a type and +RDATA field appropriate to the type and class. Class and type use the +standard mnemonics, TTL is a decimal integer. Omitted class and TTL +values are default to the last explicitly stated values. Since type and +class mnemonics are disjoint, the parse is unique. (Note that this +order is different from the order used in examples and the order used in +the actual RRs; the given order allows easier parsing and defaulting.) + +<domain-name>s make up a large share of the data in the master file. +The labels in the domain name are expressed as character strings and +separated by dots. 
Quoting conventions allow arbitrary characters to be +stored in domain names. Domain names that end in a dot are called +absolute, and are taken as complete. Domain names which do not end in a +dot are called relative; the actual domain name is the concatenation of +the relative part with an origin specified in a $ORIGIN, $INCLUDE, or as +an argument to the master file loading routine. A relative name is an +error when no origin is available. + +<character-string> is expressed in one of two ways: as a contiguous set +of characters without interior spaces, or as a string beginning with a " +and ending with a ". Inside a " delimited string any character can +occur, except for a " itself, which must be quoted using \ (back slash). + +Because these files are text files, several special encodings are +necessary to allow arbitrary data to be loaded. In particular: + + of the root. + +@ A free standing @ is used to denote the current origin. + +\X where X is any character other than a digit (0-9), is + used to quote that character so that its special meaning + does not apply. For example, "\." can be used to place + a dot character in a label. + +\DDD where each D is a digit, is the octet corresponding to + the decimal number described by DDD. The resulting + octet is assumed to be text and is not checked for + special meaning. + +( ) Parentheses are used to group data that crosses a line + boundary. In effect, line terminations are not + recognized within parentheses. + +; Semicolon is used to start a comment; the remainder of + the line is ignored. + +5.2. Use of master files to define zones + +When a master file is used to load a zone, the operation should be +suppressed if any errors are encountered in the master file. The +rationale for this is that a single error can have widespread +consequences. 
For example, suppose that the RRs defining a delegation +have syntax errors; then the server will return authoritative name +errors for all names in the subzone (except in the case where the +subzone is also present on the server). + +Several other validity checks should be performed in addition to +ensuring that the file is syntactically correct: + + 1. All RRs in the file should have the same class. + + 2. Exactly one SOA RR should be present at the top of the zone. + + 3. If delegations are present and glue information is required, + it should be present. + + 4. Information present outside of the authoritative nodes in the + zone should be glue information, rather than the result of an + origin or similar error. + +5.3. Master file example + +The following is an example file which might be used to define the +ISI.EDU zone and is loaded with an origin of ISI.EDU: + +@ IN SOA VENERA Action\.domains ( + 20 ; SERIAL + 7200 ; REFRESH + 600 ; RETRY + 3600000; EXPIRE + 60) ; MINIMUM + + NS A.ISI.EDU. + NS VENERA + NS VAXA + MX 10 VENERA + MX 20 VAXA + +A A 26.3.0.103 + +VENERA A 10.1.0.52 + A 128.9.0.32 + +VAXA A 10.2.0.27 + A 128.9.0.33 + + +$INCLUDE <SUBSYS>ISI-MAILBOXES.TXT + +Where the file <SUBSYS>ISI-MAILBOXES.TXT is: + + MOE MB A.ISI.EDU. + LARRY MB A.ISI.EDU. + CURLEY MB A.ISI.EDU. + STOOGES MG MOE + MG LARRY + MG CURLEY + +Note the use of the \ character in the SOA RR to specify the responsible +person mailbox "Action.domains@ISI.EDU". + +6. NAME SERVER IMPLEMENTATION + +6.1. Architecture + +The optimal structure for the name server will depend on the host +operating system and whether the name server is integrated with resolver +operations, either by supporting recursive service, or by sharing its +database with a resolver. 
This section discusses implementation +considerations for a name server which shares a database with a +resolver, but most of these concerns are present in any name server. + +6.1.1. Control + +A name server must employ multiple concurrent activities, whether they +are implemented as separate tasks in the host's OS or multiplexed +inside a single name server program. It is simply not acceptable for a +name server to block the service of UDP requests while it waits for TCP +data for refreshing or query activities. Similarly, a name server +should not attempt to provide recursive service without processing such +requests in parallel, though it may choose to serialize requests from a +single client, or to regard identical requests from the same client as +duplicates. A name server should not substantially delay requests while +it reloads a zone from master files or while it incorporates a newly +refreshed zone into its database. + +6.1.2. Database + +While name server implementations are free to use any internal data +structures they choose, the suggested structure consists of three major +parts: + + - A "catalog" data structure which lists the zones available to + this server, and a "pointer" to the zone data structure. The + main purpose of this structure is to find the nearest ancestor + zone, if any, for arriving standard queries. + + - Separate data structures for each of the zones held by the + name server. + + - A data structure for cached data (or perhaps separate caches + for different classes). + +All of these data structures can be implemented in an identical tree +structure format, with different data chained off the nodes in different +parts: in the catalog the data is pointers to zones, while in the zone +and cache data structures, the data will be RRs. 
In designing the tree +framework the designer should recognize that query processing will need +to traverse the tree using case-insensitive label comparisons; and that +in real data, a few nodes have a very high branching factor (100-1000 or +more), but the vast majority have a very low branching factor (0-1). + +One way to solve the case problem is to store the labels for each node +in two pieces: a standardized-case representation of the label where all +ASCII characters are in a single case, together with a bit mask that +denotes which characters are actually of a different case. The +branching factor diversity can be handled using a simple linked list for +a node until the branching factor exceeds some threshold, and +transitioning to a hash structure after the threshold is exceeded. In +any case, hash structures used to store tree sections must ensure that +hash functions and procedures preserve the casing conventions of the +DNS. + +The use of separate structures for the different parts of the database +is motivated by several factors: + + - The catalog structure can be an almost static structure that + need change only when the system administrator changes the + zones supported by the server. This structure can also be + used to store parameters used to control refreshing + activities. + + - The individual data structures for zones allow a zone to be + replaced simply by changing a pointer in the catalog. Zone + refresh operations can build a new structure and, when + complete, splice it into the database via a simple pointer + replacement. It is very important that when a zone is + refreshed, queries should not use old and new data + simultaneously. + + - With the proper search procedures, authoritative data in zones + will always "hide", and hence take precedence over, cached + data. 
+ + - Errors in zone definitions that cause overlapping zones, etc., + may cause erroneous responses to queries, but problem + determination is simplified, and the contents of one "bad" + zone can't corrupt another. + + - Since the cache is most frequently updated, it is most + vulnerable to corruption during system restarts. It can also + become full of expired RR data. In either case, it can easily + be discarded without disturbing zone data. + +A major aspect of database design is selecting a structure which allows +the name server to deal with crashes of the name server's host. State +information which a name server should save across system crashes +includes the catalog structure (including the state of refreshing for +each zone) and the zone data itself. + +6.1.3. Time + +Both the TTL data for RRs and the timing data for refreshing activities +depend on 32 bit timers in units of seconds. Inside the database, +refresh timers and TTLs for cached data conceptually "count down", while +data in the zone stays with constant TTLs. + +A recommended implementation strategy is to store time in two ways: as +a relative increment and as an absolute time. One way to do this is to +use positive 32 bit numbers for one type and negative numbers for the +other. The RRs in zones use relative times; the refresh timers and +cache data use absolute times. Absolute numbers are taken with respect +to some known origin and converted to relative values when placed in the +response to a query. When an absolute TTL is negative after conversion +to relative, then the data is expired and should be ignored. + +6.2. Standard query processing + +The major algorithm for standard query processing is presented in +[RFC-1034]. 
+ +When processing queries with QCLASS=*, or some other QCLASS which +matches multiple classes, the response should never be authoritative +unless the server can guarantee that the response covers all classes. + +When composing a response, RRs which are to be inserted in the +additional section, but duplicate RRs in the answer or authority +sections, may be omitted from the additional section. + +When a response is so long that truncation is required, the truncation +should start at the end of the response and work forward in the +datagram. Thus if there is any data for the authority section, the +answer section is guaranteed to be unique. + +The MINIMUM value in the SOA should be used to set a floor on the TTL of +data distributed from a zone. This floor function should be done when +the data is copied into a response. This will allow future dynamic +update protocols to change the SOA MINIMUM field without ambiguous +semantics. + +6.3. Zone refresh and reload processing + +In spite of a server's best efforts, it may be unable to load zone data +from a master file due to syntax errors, etc., or be unable to refresh a +zone within its expiration parameter. In this case, the name server +should answer queries as if it were not supposed to possess the zone. + +If a master is sending a zone out via AXFR, and a new version is created +during the transfer, the master should continue to send the old version +if possible. In any case, it should never send part of one version and +part of another. If completion is not possible, the master should reset +the connection on which the zone transfer is taking place. + +6.4. Inverse queries (Optional) + +Inverse queries are an optional part of the DNS. Name servers are not +required to support any form of inverse queries. 
If a name server +receives an inverse query that it does not support, it returns an error +response with the "Not Implemented" error set in the header. While +inverse query support is optional, all name servers must be at least +able to return the error response. + +6.4.1. The contents of inverse queries and responses + +Inverse queries reverse the mappings performed by standard query operations; +while a standard query maps a domain name to a resource, an inverse +query maps a resource to a domain name. For example, a standard query +might bind a domain name to a host address; the corresponding inverse +query binds the host address to a domain name. + +Inverse queries take the form of a single RR in the answer section of +the message, with an empty question section. The owner name of the +query RR and its TTL are not significant. The response carries +questions in the question section which identify all names possessing +the query RR WHICH THE NAME SERVER KNOWS. Since no name server knows +about all of the domain name space, the response can never be assumed to +be complete. Thus inverse queries are primarily useful for database +management and debugging activities. Inverse queries are NOT an +acceptable method of mapping host addresses to host names; use the +IN-ADDR.ARPA domain instead. + +Where possible, name servers should provide case-insensitive comparisons +for inverse queries. Thus an inverse query asking for an MX RR of +"Venera.isi.edu" should get the same response as a query for +"VENERA.ISI.EDU"; an inverse query for HINFO RR "IBM-PC UNIX" should +produce the same result as an inverse query for "IBM-pc unix". However, +this cannot be guaranteed because name servers may possess RRs that +contain character strings but the name server does not know that the +data is character data. + +When a name server processes an inverse query, it either returns: + + 1. 
zero, one, or multiple domain names for the specified + resource as QNAMEs in the question section + + 2. an error code indicating that the name server doesn't support + inverse mapping of the specified resource type. + +When the response to an inverse query contains one or more QNAMEs, the +owner name and TTL of the RR in the answer section which defines the +inverse query are modified to exactly match an RR found at the first +QNAME. + +RRs returned in the inverse queries cannot be cached using the same +mechanism as is used for the replies to standard queries. One reason +for this is that a name might have multiple RRs of the same type, and +only one would appear. For example, an inverse query for a single +address of a multiply homed host might create the impression that only +one address existed. + +6.4.2. Inverse query and response example + +The overall structure +of an inverse query for retrieving the domain name that corresponds to +Internet address 10.1.0.52 is shown below: + + +-----------------------------------------+ + Header | OPCODE=IQUERY, ID=997 | + +-----------------------------------------+ + Question | <empty> | + +-----------------------------------------+ + Answer | <anyname> A IN 10.1.0.52 | + +-----------------------------------------+ + Authority | <empty> | + +-----------------------------------------+ + Additional | <empty> | + +-----------------------------------------+ + +This query asks for a question whose answer is the Internet style +address 10.1.0.52. Since the owner name is not known, any domain name +can be used as a placeholder (and is ignored). A single octet of zero, +signifying the root, is usually used because it minimizes the length of +the message. The TTL of the RR is not significant. 
The response to +this query might be: + + +-----------------------------------------+ + Header | OPCODE=RESPONSE, ID=997 | + +-----------------------------------------+ + Question |QTYPE=A, QCLASS=IN, QNAME=VENERA.ISI.EDU | + +-----------------------------------------+ + Answer | VENERA.ISI.EDU A IN 10.1.0.52 | + +-----------------------------------------+ + Authority | <empty> | + +-----------------------------------------+ + Additional | <empty> | + +-----------------------------------------+ + +Note that the QTYPE in a response to an inverse query is the same as the +TYPE field in the answer section of the inverse query. Responses to +inverse queries may contain multiple questions when the inverse is not +unique. If the question section in the response is not empty, then the +RR in the answer section is modified to be an exact copy +of an RR at the first QNAME. + +6.4.3. Inverse query processing + +Name servers that support inverse queries can support these operations +through exhaustive searches of their databases, but this becomes +impractical as the size of the database increases. An alternative +approach is to invert the database according to the search key. + +For name servers that support multiple zones and a large amount of data, +the recommended approach is separate inversions for each zone. When a +particular zone is changed during a refresh, only its inversions need to +be redone. + +Support for transfer of this type of inversion may be included in future +versions of the domain system, but is not supported in this version. + +6.5. Completion queries and responses + +The optional completion services described in RFC-882 and RFC-883 have +been deleted. Redesigned services may become available in the future. 
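Looking back at section 6.4.3, the recommendation of separate inversions for each zone might be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed design: the class name ZoneInversion, its methods, and the (owner, type, RDATA) tuple shape are all hypothetical, and RDATA is compared as an exact string rather than case-insensitively.

```python
from collections import defaultdict


class ZoneInversion:
    """Per-zone inverted indexes from (type, RDATA) to owner names."""

    def __init__(self):
        self._by_zone = {}   # zone name -> {(rr_type, rdata): [owner, ...]}

    def rebuild(self, zone, rrs):
        """Redo the inversion for one zone only, e.g. after a refresh."""
        index = defaultdict(list)
        for owner, rr_type, rdata in rrs:
            index[(rr_type, rdata)].append(owner)
        # Replacing the whole index at once avoids mixing old and new data.
        self._by_zone[zone] = index

    def lookup(self, rr_type, rdata):
        """Return every known owner name holding the given RR."""
        names = []
        for index in self._by_zone.values():
            names.extend(index.get((rr_type, rdata), []))
        return names
```

Because each zone has its own index, refreshing one zone leaves the inversions of every other zone untouched.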
+ +7. RESOLVER IMPLEMENTATION + +The top levels of the recommended resolver algorithm are discussed in +[RFC-1034]. This section discusses implementation details assuming the +database structure suggested in the name server implementation section +of this memo. + +7.1. Transforming a user request into a query + +The first step a resolver takes is to transform the client's request, +stated in a format suitable to the local OS, into a search specification +for RRs at a specific name which match a specific QTYPE and QCLASS. +Where possible, the QTYPE and QCLASS should correspond to a single type +and a single class, because this makes the use of cached data much +simpler. The reason for this is that the presence of data of one type +in a cache doesn't confirm the existence or non-existence of data of +other types, hence the only way to be sure is to consult an +authoritative source. If QCLASS=* is used, then authoritative answers +won't be available. + +Since a resolver must be able to multiplex multiple requests if it is to +perform its function efficiently, each pending request is usually +represented in some block of state information. This state block will +typically contain: + + - A timestamp indicating the time the request began. + The timestamp is used to decide whether RRs in the database + can be used or are out of date. This timestamp uses the + absolute time format previously discussed for RR storage in + zones and caches. Note that when an RR's TTL indicates a + relative time, the RR must be timely, since it is part of a + zone. When the RR has an absolute time, it is part of a + cache, and the TTL of the RR is compared against the timestamp + for the start of the request. 
+ + Note that using the timestamp is superior to using a current + time, since it allows RRs with TTLs of zero to be entered in + the cache in the usual manner, but still used by the current + request, even after intervals of many seconds due to system + load, query retransmission timeouts, etc. + + - Some sort of parameters to limit the amount of work which will + be performed for this request. + + The amount of work which a resolver will do in response to a + client request must be limited to guard against errors in the + database, such as circular CNAME references, and operational + problems, such as network partition which prevents the + resolver from accessing the name servers it needs. While + local limits on the number of times a resolver will retransmit + a particular query to a particular name server address are + essential, the resolver should have a global per-request + counter to limit work on a single request. The counter should + be set to some initial value and decremented whenever the + resolver performs any action (retransmission timeout, + retransmission, etc.). If the counter passes zero, the request + is terminated with a temporary error. + + Note that if the resolver structure allows one request to + start others in parallel, such as when the need to access a + name server for one request causes a parallel resolve for the + name server's addresses, the spawned request should be started + with a lower counter. This prevents circular references in + the database from starting a chain reaction of resolver + activity. + + - The SLIST data structure discussed in [RFC-1034]. + + This structure keeps track of the state of a request if it + must wait for answers from foreign name servers. + +7.2. 
Sending the queries + +As described in [RFC-1034], the basic task of the resolver is to +formulate a query which will answer the client's request and direct that +query to name servers which can provide the information. The resolver +will usually only have very strong hints about which servers to ask, in +the form of NS RRs, and may have to revise the query, in response to +CNAMEs, or revise the set of name servers the resolver is asking, in +response to delegation responses which point the resolver to name +servers closer to the desired information. In addition to the +information requested by the client, the resolver may have to call upon +its own services to determine the address of name servers it wishes to +contact. + +In any case, the model used in this memo assumes that the resolver is +multiplexing attention between multiple requests, some from the client, +and some internally generated. Each request is represented by some +state information, and the desired behavior is that the resolver +transmit queries to name servers in a way that maximizes the probability +that the request is answered, minimizes the time that the request takes, +and avoids excessive transmissions. The key algorithm uses the state +information of the request to select the next name server address to +query, and also computes a timeout which will cause the next action +should a response not arrive. The next action will usually be a +transmission to some other server, but may be a temporary error to the +client. + +The resolver always starts with a list of server names to query (SLIST). +This list will be all NS RRs which correspond to the nearest ancestor +zone that the resolver knows about. To avoid startup problems, the +resolver should have a set of default servers which it will ask should +it have no current NS RRs which are appropriate. 
The resolver then adds +to SLIST all of the known addresses for the name servers, and may start +parallel requests to acquire the addresses of the servers when the +resolver has the name, but no addresses, for the name servers. + +To complete initialization of SLIST, the resolver attaches whatever +history information it has to each address in SLIST. This will +usually consist of some sort of weighted averages for the response time +of the address, and the batting average of the address (i.e., how often +the address responded at all to the request). Note that this +information should be kept on a per address basis, rather than on a per +name server basis, because the response time and batting average of a +particular server may vary considerably from address to address. Note +also that this information is actually specific to a resolver address / +server address pair, so a resolver with multiple addresses may wish to +keep separate histories for each of its addresses. Part of this step +must deal with addresses which have no such history; in this case an +expected round trip time of 5-10 seconds should be the worst case, with +lower estimates for the same local network, etc. + +Note that whenever a delegation is followed, the resolver algorithm +reinitializes SLIST. + +The information establishes a partial ranking of the available name +server addresses. Each time an address is chosen, the state should +be altered to prevent its selection again until all other addresses have +been tried. The timeout for each transmission should be 50-100% greater +than the average predicted value to allow for variance in response. + +Some fine points: + + - The resolver may encounter a situation where no addresses are + available for any of the name servers named in SLIST, and + where the servers in the list are precisely those which would + normally be used to look up their own addresses. 
This + situation typically occurs when the glue address RRs have a + smaller TTL than the NS RRs marking delegation, or when the + resolver caches the result of a NS search. The resolver + should detect this condition and restart the search at the + next ancestor zone, or alternatively at the root. + + - If a resolver gets a server error or other bizarre response + from a name server, it should remove it from SLIST, and may + wish to schedule an immediate transmission to the next + candidate server address. + +7.3. Processing responses + +The first step in processing arriving response datagrams is to parse the +response. This procedure should include: + + - Check the header for reasonableness. Discard datagrams which + are queries when responses are expected. + + - Parse the sections of the message, and ensure that all RRs are + correctly formatted. + + - As an optional step, check the TTLs of arriving data looking + for RRs with excessively long TTLs. If a RR has an + excessively long TTL, say greater than 1 week, either discard + the whole response, or limit all TTLs in the response to 1 + week. + +The next step is to match the response to a current resolver request. +The recommended strategy is to do a preliminary matching using the ID +field in the domain header, and then to verify that the question section +corresponds to the information currently desired. This requires that +the transmission algorithm devote several bits of the domain ID field to +a request identifier of some sort. This step has several fine points: + + - Some name servers send their responses from different + addresses than the one used to receive the query. That is, a + resolver cannot rely on a response coming from the same + address to which it sent the corresponding query. This name + server bug is typically encountered in UNIX systems. 
+ + - If the resolver retransmits a particular request to a name + server it should be able to use a response from any of the + transmissions. However, if it is using the response to sample + the round trip time to access the name server, it must be able + to determine which transmission matches the response (and keep + transmission times for each outgoing message), or only + calculate round trip times based on initial transmissions. + + - A name server will occasionally not have a current copy of a + zone which it should have according to some NS RRs. The + resolver should simply remove the name server from the current + SLIST, and continue. + +7.4. Using the cache + +In general, we expect a resolver to cache all data which it receives in +responses since it may be useful in answering future client requests. +However, there are several types of data which should not be cached: + + - When several RRs of the same type are available for a + particular owner name, the resolver should either cache them + all or none at all. When a response is truncated, and a + resolver doesn't know whether it has a complete set, it should + not cache a possibly partial set of RRs. + + - Cached data should never be used in preference to + authoritative data, so if caching would cause this to happen + the data should not be cached. + + - The results of an inverse query should not be cached. + + - The results of standard queries where the QNAME contains "*" + labels if the data might be used to construct wildcards. The + reason is that the cache does not necessarily contain existing + RRs or zone boundary information which is necessary to + restrict the application of the wildcard RRs. + + - RR data in responses of dubious reliability. When a resolver + receives unsolicited responses or RR data other than that + requested, it should discard it without caching it. 
The basic + implication is that all sanity checks on a packet should be + performed before any of it is cached. + +In a similar vein, when a resolver has a set of RRs for some name in a +response, and wants to cache the RRs, it should check its cache for +already existing RRs. Depending on the circumstances, either the data +in the response or the cache is preferred, but the two should never be +combined. If the data in the response is from authoritative data in the +answer section, it is always preferred. + +8. MAIL SUPPORT + +The domain system defines a standard for mapping mailboxes into domain +names, and two methods for using the mailbox information to derive mail +routing information. The first method is called mail exchange binding +and the other method is mailbox binding. The mailbox encoding standard +and mail exchange binding are part of the DNS official protocol, and are +the recommended method for mail routing in the Internet. Mailbox +binding is an experimental feature which is still under development and +subject to change. + +The mailbox encoding standard assumes a mailbox name of the form +"<local-part>@<mail-domain>". While the syntax allowed in each of these +sections varies substantially between the various mail internets, the +preferred syntax for the ARPA Internet is given in [RFC-822]. + +The DNS encodes the <local-part> as a single label, and encodes the +<mail-domain> as a domain name. The single label from the <local-part> +is prefaced to the domain name from <mail-domain> to form the domain +name corresponding to the mailbox. Thus the mailbox +HOSTMASTER@SRI-NIC.ARPA is mapped into the domain name HOSTMASTER.SRI-NIC.ARPA. If the +<local-part> contains dots or other special characters, its +representation in a master file will require the use of backslash +quoting to ensure that the domain name is properly encoded. 
For +example, the mailbox Action.domains@ISI.EDU would be represented as +Action\.domains.ISI.EDU. + +8.1. Mail exchange binding + +Mail exchange binding uses the <mail-domain> part of a mailbox +specification to determine where mail should be sent. The <local-part> +is not even consulted. [RFC-974] specifies this method in detail, and +should be consulted before attempting to use mail exchange support. + +One of the advantages of this method is that it decouples mail +destination naming from the hosts used to support mail service, at the +cost of another layer of indirection in the lookup function. However, +the additional layer should eliminate the need for complicated "%", "!", +etc. encodings in <local-part>. + +The essence of the method is that the <mail-domain> is used as a domain +name to locate type MX RRs which list hosts willing to accept mail for +<mail-domain>, together with preference values which rank the hosts +according to an order specified by the administrators for <mail-domain>. + +In this memo, the <mail-domain> ISI.EDU is used in examples, together +with the hosts VENERA.ISI.EDU and VAXA.ISI.EDU as mail exchanges for +ISI.EDU. If a mailer had a message for Mockapetris@ISI.EDU, it would +route it by looking up MX RRs for ISI.EDU. The MX RRs at ISI.EDU name +VENERA.ISI.EDU and VAXA.ISI.EDU, and type A queries can find the host +addresses. + +8.2. Mailbox binding (Experimental) + +In mailbox binding, the mailer uses the entire mail destination +specification to construct a domain name. The encoded domain name for +the mailbox is used as the QNAME field in a QTYPE=MAILB query. + +Several outcomes are possible for this query: + + 1. The query can return a name error indicating that the mailbox + does not exist as a domain name. + + In the long term, this would indicate that the specified + mailbox doesn't exist. 
However, until the use of mailbox + binding is universal, this error condition should be + interpreted to mean that the organization identified by the + global part does not support mailbox binding. The + appropriate procedure is to revert to exchange binding at + this point. + + 2. The query can return a Mail Rename (MR) RR. + + The MR RR carries a new mailbox specification in its RDATA + field. The mailer should replace the old mailbox with the + new one and retry the operation. + + 3. The query can return a MB RR. + + The MB RR carries a domain name for a host in its RDATA + field. The mailer should deliver the message to that host + via whatever protocol is applicable, e.g., SMTP. + + 4. The query can return one or more Mail Group (MG) RRs. + + This condition means that the mailbox was actually a mailing + list or mail group, rather than a single mailbox. Each MG RR + has a RDATA field that identifies a mailbox that is a member + of the group. The mailer should deliver a copy of the + message to each member. + + 5. The query can return a MB RR as well as one or more MG RRs. + + This condition means that the mailbox was actually a mailing + list. The mailer can either deliver the message to the host + specified by the MB RR, which will in turn do the delivery to + all members, or the mailer can use the MG RRs to do the + expansion itself. + +In any of these cases, the response may include a Mail Information +(MINFO) RR. This RR is usually associated with a mail group, but is +legal with a MB. The MINFO RR identifies two mailboxes. One of these +identifies a responsible person for the original mailbox name. This +mailbox should be used for requests to be added to a mail group, etc. +The second mailbox name in the MINFO RR identifies a mailbox that should +receive error messages for mail failures. This is particularly +appropriate for mailing lists when errors in member names should be +reported to a person other than the one who sends a message to the list.
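The mailbox encoding rule from the start of section 8 (the <local-part> becomes a single label prefixed to the <mail-domain>, with backslash quoting in master files) can be illustrated with a minimal sketch. The function name mailbox_to_domain_name is hypothetical, and the sketch assumes that only "." and "\" in the local part need quoting; other special characters mentioned in the text are not handled here.

```python
def mailbox_to_domain_name(mailbox: str) -> str:
    """Map "<local-part>@<mail-domain>" to its master-file domain name.

    Hypothetical helper for illustration; not part of any DNS library.
    """
    # Split on the last "@" so the mail domain is taken from the right.
    local_part, sep, mail_domain = mailbox.rpartition("@")
    if not sep or not local_part or not mail_domain:
        raise ValueError(f"not of the form local@domain: {mailbox!r}")
    # The local part must stay a single label, so a literal "." (and the
    # quoting character itself) is backslash-quoted in master-file text.
    label = local_part.replace("\\", "\\\\").replace(".", "\\.")
    return f"{label}.{mail_domain}"


if __name__ == "__main__":
    print(mailbox_to_domain_name("HOSTMASTER@SRI-NIC.ARPA"))
    print(mailbox_to_domain_name("Action.domains@ISI.EDU"))
```

Under these assumptions the two calls reproduce the memo's own examples, HOSTMASTER.SRI-NIC.ARPA and Action\.domains.ISI.EDU. The resulting name is what a mailer would use as QNAME in the QTYPE=MAILB query of section 8.2.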
+ + + +Mockapetris [Page 49] + +RFC 1035 Domain Implementation and Specification November 1987 + + +New fields may be added to this RR in the future. + + +9. REFERENCES and BIBLIOGRAPHY + +[Dyer 87] S. Dyer, F. Hsu, "Hesiod", Project Athena + Technical Plan - Name Service, April 1987, version 1.9. + + Describes the fundamentals of the Hesiod name service. + +[IEN-116] J. Postel, "Internet Name Server", IEN-116, + USC/Information Sciences Institute, August 1979. + + A name service obsoleted by the Domain Name System, but + still in use. + +[Quarterman 86] J. Quarterman, and J. Hoskins, "Notable Computer Networks", + Communications of the ACM, October 1986, volume 29, number + 10. + +[RFC-742] K. Harrenstien, "NAME/FINGER", RFC-742, Network + Information Center, SRI International, December 1977. + +[RFC-768] J. Postel, "User Datagram Protocol", RFC-768, + USC/Information Sciences Institute, August 1980. + +[RFC-793] J. Postel, "Transmission Control Protocol", RFC-793, + USC/Information Sciences Institute, September 1981. + +[RFC-799] D. Mills, "Internet Name Domains", RFC-799, COMSAT, + September 1981. + + Suggests introduction of a hierarchy in place of a flat + name space for the Internet. + +[RFC-805] J. Postel, "Computer Mail Meeting Notes", RFC-805, + USC/Information Sciences Institute, February 1982. + +[RFC-810] E. Feinler, K. Harrenstien, Z. Su, and V. White, "DOD + Internet Host Table Specification", RFC-810, Network + Information Center, SRI International, March 1982. + + Obsolete. See RFC-952. + +[RFC-811] K. Harrenstien, V. White, and E. Feinler, "Hostnames + Server", RFC-811, Network Information Center, SRI + International, March 1982. + + + + +Mockapetris [Page 50] + +RFC 1035 Domain Implementation and Specification November 1987 + + + Obsolete. See RFC-953. + +[RFC-812] K. Harrenstien, and V. White, "NICNAME/WHOIS", RFC-812, + Network Information Center, SRI International, March + 1982. + +[RFC-819] Z. Su, and J. 
Postel, "The Domain Naming Convention for + Internet User Applications", RFC-819, Network + Information Center, SRI International, August 1982. + + Early thoughts on the design of the domain system. + Current implementation is completely different. + +[RFC-821] J. Postel, "Simple Mail Transfer Protocol", RFC-821, + USC/Information Sciences Institute, August 1982. + +[RFC-830] Z. Su, "A Distributed System for Internet Name Service", + RFC-830, Network Information Center, SRI International, + October 1982. + + Early thoughts on the design of the domain system. + Current implementation is completely different. + +[RFC-882] P. Mockapetris, "Domain names - Concepts and + Facilities," RFC-882, USC/Information Sciences + Institute, November 1983. + + Superseded by this memo. + +[RFC-883] P. Mockapetris, "Domain names - Implementation and + Specification," RFC-883, USC/Information Sciences + Institute, November 1983. + + Superseded by this memo. + +[RFC-920] J. Postel and J. Reynolds, "Domain Requirements", + RFC-920, USC/Information Sciences Institute, + October 1984. + + Explains the naming scheme for top level domains. + +[RFC-952] K. Harrenstien, M. Stahl, E. Feinler, "DoD Internet Host + Table Specification", RFC-952, SRI, October 1985. + + Specifies the format of HOSTS.TXT, the host/address + table replaced by the DNS. + + + + + +Mockapetris [Page 51] + +RFC 1035 Domain Implementation and Specification November 1987 + + +[RFC-953] K. Harrenstien, M. Stahl, E. Feinler, "HOSTNAME Server", + RFC-953, SRI, October 1985. + + This RFC contains the official specification of the + hostname server protocol, which is obsoleted by the DNS. + This TCP based protocol accesses information stored in + the RFC-952 format, and is used to obtain copies of the + host table. + +[RFC-973] P. Mockapetris, "Domain System Changes and + Observations", RFC-973, USC/Information Sciences + Institute, January 1986. + + Describes changes to RFC-882 and RFC-883 and reasons for + them.
+ +[RFC-974] C. Partridge, "Mail routing and the domain system", + RFC-974, CSNET CIC BBN Labs, January 1986. + + Describes the transition from HOSTS.TXT based mail + addressing to the more powerful MX system used with the + domain system. + +[RFC-1001] NetBIOS Working Group, "Protocol standard for a NetBIOS + service on a TCP/UDP transport: Concepts and Methods", + RFC-1001, March 1987. + + This RFC and RFC-1002 are a preliminary design for + NETBIOS on top of TCP/IP which proposes to base NetBIOS + name service on top of the DNS. + +[RFC-1002] NetBIOS Working Group, "Protocol standard for a NetBIOS + service on a TCP/UDP transport: Detailed + Specifications", RFC-1002, March 1987. + +[RFC-1010] J. Reynolds, and J. Postel, "Assigned Numbers", RFC-1010, + USC/Information Sciences Institute, May 1987. + + Contains socket numbers and mnemonics for host names, + operating systems, etc. + +[RFC-1031] W. Lazear, "MILNET Name Domain Transition", RFC-1031, + November 1987. + + Describes a plan for converting the MILNET to the DNS. + +[RFC-1032] M. Stahl, "Establishing a Domain - Guidelines for + Administrators", RFC-1032, November 1987. + + + +Mockapetris [Page 52] + +RFC 1035 Domain Implementation and Specification November 1987 + + + Describes the registration policies used by the NIC to + administer the top level domains and delegate subzones. + +[RFC-1033] M. Lottor, "Domain Administrators Operations Guide", + RFC-1033, November 1987. + + A cookbook for domain administrators. + +[Solomon 82] M. Solomon, L. Landweber, and D. Neuhengen, "The CSNET + Name Server", Computer Networks, vol 6, nr 3, July 1982. + + Describes a name service for CSNET which is independent + from the DNS and DNS use in the CSNET. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Mockapetris [Page 53] + +RFC 1035 Domain Implementation and Specification November 1987 + + +Index + + * 13 + + ; 33, 35 + + <character-string> 35 + <domain-name> 34 + + @ 35 + + \ 35 + + A 12 + + Byte order 8 + + CH 13 + Character case 9 + CLASS 11 + CNAME 12 + Completion 42 + CS 13 + + Hesiod 13 + HINFO 12 + HS 13 + + IN 13 + IN-ADDR.ARPA domain 22 + Inverse queries 40 + + Mailbox names 47 + MB 12 + MD 12 + MF 12 + MG 12 + MINFO 12 + MINIMUM 20 + MR 12 + MX 12 + + NS 12 + NULL 12 + + Port numbers 32 + Primary server 5 + PTR 12, 18 + + + +Mockapetris [Page 54] + +RFC 1035 Domain Implementation and Specification November 1987 + + + QCLASS 13 + QTYPE 12 + + RDATA 12 + RDLENGTH 11 + + Secondary server 5 + SOA 12 + Stub resolvers 7 + + TCP 32 + TXT 12 + TYPE 11 + + UDP 32 + + WKS 12 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Mockapetris [Page 55] + diff --git a/doc/rfc/rfc1101.txt b/doc/rfc/rfc1101.txt new file mode 100644 index 00000000..66c9d8b8 --- /dev/null +++ b/doc/rfc/rfc1101.txt @@ -0,0 +1,787 @@ + + + + + + +Network Working Group P. Mockapetris +Request for Comments: 1101 ISI +Updates: RFCs 1034, 1035 April 1989 + + + DNS Encoding of Network Names and Other Types + + +1. STATUS OF THIS MEMO + + This RFC proposes two extensions to the Domain Name System: + + - A specific method for entering and retrieving RRs which map + between network names and numbers. + + - Ideas for a general method for describing mappings between + arbitrary identifiers and numbers. + + The method for mapping between network names and addresses is a + proposed standard, the ideas for a general method are experimental. + + This RFC assumes that the reader is familiar with the DNS [RFC 1034, + RFC 1035] and its use. The data shown is for pedagogical use and + does not necessarily reflect the real Internet. + + Distribution of this memo is unlimited. + +2. 
INTRODUCTION + + The DNS is extensible and can be used for a virtually unlimited + number of data types, name spaces, etc. New type definitions are + occasionally necessary as are revisions or deletions of old types + (e.g., MX replacement of MD and MF [RFC 974]), and changes described + in [RFC 973]. This RFC describes changes due to the general need to + map between identifiers and values, and a specific need for network + name support. + + Users wish to be able to use the DNS to map between network names and + numbers. This need is the only capability found in HOSTS.TXT which + is not available from the DNS. In designing a method to do this, + there were two major areas of concern: + + - Several tradeoffs involving control of network names, the + syntax of network names, backward compatibility, etc. + + - A desire to create a method which would be sufficiently + general to set a good precedent for future mappings, + for example, between TCP-port names and numbers, + + + +Mockapetris [Page 1] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + autonomous system names and numbers, X.500 Relative + Distinguished Names (RDNs) and their servers, or whatever. + + It was impossible to reconcile these two areas of concern for network + names because of the desire to unify network number support within + existing IP address to host name support. The existing support is + the IN-ADDR.ARPA section of the DNS name space. As a result this RFC + describes one structure for network names which builds on the + existing support for host names, and another family of structures for + future yellow pages (YP) functions such as conversions between TCP- + port numbers and mnemonics. + + Both structures are described in following sections. Each structure + has a discussion of design issues and specific structure + recommendations. 
+ + We wish to avoid defining structures and methods which can work but + do not because of indifference or errors on the part of system + administrators when maintaining the database. The WKS RR is an + example. Thus, while we favor distribution as a general method, we + also recognize that centrally maintained tables (such as HOSTS.TXT) + are usually more consistent though less maintainable and timely. + Hence we recommend both specific methods for mapping network names, + addresses, and subnets, as well as an instance of the general method + for mapping between allocated network numbers and network names. + (Allocation is centrally performed by the SRI Network Information + Center, aka the NIC). + +3. NETWORK NAME ISSUES AND DISCUSSION + + The issues involved in the design were the definition of network name + syntax, the mappings to be provided, and possible support for similar + functions at the subnet level. + +3.1. Network name syntax + + The current syntax for network names, as defined by [RFC 952] is an + alphanumeric string of up to 24 characters, which begins with an + alpha, and may include "." and "-" except as first and last + characters. This is the format which was also used for host names + before the DNS. Upward compatibility with existing names might be a + goal of any new scheme. + + However, the present syntax has been used to define a flat name + space, and hence would prohibit the same distributed name allocation + method used for host names. There is some sentiment for allowing the + NIC to continue to allocate and regulate network names, much as it + allocates numbers, but the majority opinion favors local control of + + + +Mockapetris [Page 2] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + network names. 
Although it would be possible to provide a flat space + or a name space in which, for example, the last label of a domain + name captured the old-style network name, any such approach would add + complexity to the method and create different rules for network names + and host names. + + For these reasons, we assume that the syntax of network names will be + the same as the expanded syntax for host names permitted in [HR]. + The new syntax expands the set of names to allow leading digits, so + long as the resulting representations do not conflict with IP + addresses in decimal octet form. For example, 3Com.COM and 3M.COM + are now legal, although 26.0.0.73.COM is not. See [HR] for details. + + The price is that network names will get as complicated as host + names. An administrator will be able to create network names in any + domain under his control, and also create network number to name + entries in IN-ADDR.ARPA domains under his control. Thus, the name + for the ARPANET might become NET.ARPA, ARPANET.ARPA or Arpa- + network.MIL., depending on the preferences of the owner. + +3.2. Mappings + + The desired mappings, ranked by priority with most important first, + are: + + - Mapping an IP address or network number to a network name. + + This mapping is for use in debugging tools and status displays + of various sorts. The conversion from IP address to network + number is well known for class A, B, and C IP addresses, and + involves a simple mask operation. The needs of other classes + are not yet defined and are ignored for the rest of this RFC. + + - Mapping a network name to a network address. + + This facility is of less obvious application, but a + symmetrical mapping seems desirable. + + - Mapping an organization to its network names and numbers. + + This facility is useful because it may not always be possible + to guess the local choice for network names, but the + organization name is often well known. + + - Similar mappings for subnets, even when nested.
+ + The primary application is to be able to identify all of the + subnets involved in a particular IP address. A secondary + + + +Mockapetris [Page 3] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + requirement is to retrieve address mask information. + +3.3. Network address section of the name space + + The network name syntax discussed above can provide domain names + which will contain mappings from network names to various quantities, + but we also need a section of the name space, organized by network + and subnet number to hold the inverse mappings. + + The choices include: + + - The same network number slots already assigned and delegated + in the IN-ADDR.ARPA section of the name space. + + For example, 10.IN-ADDR.ARPA for class A net 10, + 2.128.IN-ADDR.ARPA for class B net 128.2, etc. + + - Host-zero addresses in the IN-ADDR.ARPA tree. (A host field + of all zero in an IP address is prohibited because of + confusion related to broadcast addresses, et al.) + + For example, 0.0.0.10.IN-ADDR.ARPA for class A net 10, + 0.0.2.128.IN-ADDR.arpa for class B net 128.2, etc. Like the + first scheme, it uses in-place name space delegations to + distribute control. + + The main advantage of this scheme over the first is that it + allows convenient names for subnets as well as networks. A + secondary advantage is that it uses names which are not in use + already, and hence it is possible to test whether an + organization has entered this information in its domain + database. + + - Some new section of the name space. + + While this option provides the most opportunities, it creates + a need to delegate a whole new name space. Since the IP + address space is so closely related to the network number + space, most believe that the overhead of creating such a new + space is overwhelming and would lead to the WKS syndrome. 
(As + of February, 1989, approximately 400 sections of the + IN-ADDR.ARPA tree are already delegated, usually at network + boundaries.) + + + + + + + + +Mockapetris [Page 4] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + +4. SPECIFICS FOR NETWORK NAME MAPPINGS + + The proposed solution uses information stored at: + + - Names in the IN-ADDR.ARPA tree that correspond to host-zero IP + addresses. The same method is used for subnets in a nested + fashion. For example, 0.0.0.10.IN-ADDR.ARPA. for net 10. + + Two types of information are stored here: PTR RRs which point + to the network name in their data sections, and A RRs, which + are present if the network (or subnet) is subnetted further. + If a type A RR is present, then it has the address mask as its + data. The general form is: + + <reversed-host-zero-number>.IN-ADDR.ARPA. PTR <network-name> + <reversed-host-zero-number>.IN-ADDR.ARPA. A <subnet-mask> + + For example: + + 0.0.0.10.IN-ADDR.ARPA. PTR ARPANET.ARPA. + + or + + 0.0.2.128.IN-ADDR.ARPA. PTR cmu-net.cmu.edu. + A 255.255.255.0 + + In general, this information will be added to an existing + master file for some IN-ADDR.ARPA domain for each network + involved. Similar RRs can be used at host-zero subnet + entries. + + - Names which are network names. + + The data stored here is PTR RRs pointing at the host-zero + entries. The general form is: + + <network-name> ptr <reversed-host-zero-number>.IN-ADDR.ARPA + + For example: + + ARPANET.ARPA. PTR 0.0.0.10.IN-ADDR.ARPA. + + or + + isi-net.isi.edu. PTR 0.0.9.128.IN-ADDR.ARPA. + + In general, this information will be inserted in the master + file for the domain name of the organization; this is a + + + +Mockapetris [Page 5] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + different file from that which holds the information below + IN-ADDR.ARPA. Similar PTR RRs can be used at subnet names. + + - Names corresponding to organizations. 
+ + The data here is one or more PTR RRs pointing at the + IN-ADDR.ARPA names corresponding to host-zero entries for + networks. + + For example: + + ISI.EDU. PTR 0.0.9.128.IN-ADDR.ARPA. + + MCC.COM. PTR 0.167.5.192.IN-ADDR.ARPA. + PTR 0.168.5.192.IN-ADDR.ARPA. + PTR 0.169.5.192.IN-ADDR.ARPA. + PTR 0.0.62.128.IN-ADDR.ARPA. + +4.1. A simple example + + The ARPANET is a Class A network without subnets. The RRs which + would be added, assuming the ARPANET.ARPA was selected as a network + name, would be: + + ARPA. PTR 0.0.0.10.IN-ADDR.ARPA. + + ARPANET.ARPA. PTR 0.0.0.10.IN-ADDR.ARPA. + + 0.0.0.10.IN-ADDR.ARPA. PTR ARPANET.ARPA. + + The first RR states that the organization named ARPA owns net 10 (It + might also own more network numbers, and these would be represented + with an additional RR per net.) The second states that the network + name ARPANET.ARPA. maps to net 10. The last states that net 10 is + named ARPANET.ARPA. + + Note that all of the usual host and corresponding IN-ADDR.ARPA + entries would still be required. + +4.2. A complicated, subnetted example + + The ISI network is 128.9, a class B number. Suppose the ISI network + was organized into two levels of subnet, with the first level using + an additional 8 bits of address, and the second level using 4 bits, + for address masks of x'FFFFFF00' and X'FFFFFFF0'. + + Then the following RRs would be entered in ISI's master file for the + ISI.EDU zone: + + + +Mockapetris [Page 6] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + ; Define network entry + isi-net.isi.edu. PTR 0.0.9.128.IN-ADDR.ARPA. + + ; Define first level subnets + div1-subnet.isi.edu. PTR 0.1.9.128.IN-ADDR.ARPA. + div2-subnet.isi.edu. PTR 0.2.9.128.IN-ADDR.ARPA. + + ; Define second level subnets + inc-subsubnet.isi.edu. PTR 16.2.9.128.IN-ADDR.ARPA. + + in the 9.128.IN-ADDR.ARPA zone: + + ; Define network number and address mask + 0.0.9.128.IN-ADDR.ARPA. PTR isi-net.isi.edu. 
+ A 255.255.255.0 ;aka X'FFFFFF00' + + ; Define one of the first level subnet numbers and masks + 0.1.9.128.IN-ADDR.ARPA. PTR div1-subnet.isi.edu. + A 255.255.255.240 ;aka X'FFFFFFF0' + + ; Define another first level subnet number and mask + 0.2.9.128.IN-ADDR.ARPA. PTR div2-subnet.isi.edu. + A 255.255.255.240 ;aka X'FFFFFFF0' + + ; Define second level subnet number + 16.2.9.128.IN-ADDR.ARPA. PTR inc-subsubnet.isi.edu. + + This assumes that the ISI network is named isi-net.isi.edu., first + level subnets are named div1-subnet.isi.edu. and div2- + subnet.isi.edu., and a second level subnet is called inc- + subsubnet.isi.edu. (In a real system as complicated as this there + would be more first and second level subnets defined, but we have + shown enough to illustrate the ideas.) + +4.3. Procedure for using an IP address to get network name + + Depending on whether the IP address is class A, B, or C, mask off the + high one, two, or three bytes, respectively. Reverse the octets, + suffix IN-ADDR.ARPA, and do a PTR query. + + For example, suppose the IP address is 10.0.0.51. + + 1. Since this is a class A address, use a mask x'FF000000' and + get 10.0.0.0. + + 2. Construct the name 0.0.0.10.IN-ADDR.ARPA. + + 3. Do a PTR query. Get back + + + +Mockapetris [Page 7] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + 0.0.0.10.IN-ADDR.ARPA. PTR ARPANET.ARPA. + + 4. Conclude that the network name is "ARPANET.ARPA." + + Suppose that the IP address is 128.9.2.17. + + 1. Since this is a class B address, use a mask of x'FFFF0000' + and get 128.9.0.0. + + 2. Construct the name 0.0.9.128.IN-ADDR.ARPA. + + 3. Do a PTR query. Get back + + 0.0.9.128.IN-ADDR.ARPA. PTR isi-net.isi.edu + + 4. Conclude that the network name is "isi-net.isi.edu." + +4.4. Procedure for finding all subnets involved with an IP address + + This is a simple extension of the IP address to network name method. + When the network entry is located, do a lookup for a possible A RR. 
If the A RR is found, look up the next level of subnet using the + original IP address and the mask in the A RR. Repeat this procedure + until no A RR is found. + + For example, repeating the use of 128.9.2.17. + + 1. As before, construct a query for 0.0.9.128.IN-ADDR.ARPA. + Retrieve: + + 0.0.9.128.IN-ADDR.ARPA. PTR isi-net.isi.edu. + A 255.255.255.0 + + 2. Since an A RR was found, repeat using mask from RR + (255.255.255.0), constructing a query for + 0.2.9.128.IN-ADDR.ARPA. Retrieve: + + 0.2.9.128.IN-ADDR.ARPA. PTR div2-subnet.isi.edu. + A 255.255.255.240 + + 3. Since another A RR was found, repeat using mask + 255.255.255.240 (x'FFFFFFF0'), constructing a query for + 16.2.9.128.IN-ADDR.ARPA. Retrieve: + + 16.2.9.128.IN-ADDR.ARPA. PTR inc-subsubnet.isi.edu. + + 4. Since no A RR is present at 16.2.9.128.IN-ADDR.ARPA., there + are no more subnet levels. + + + +Mockapetris [Page 8] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + +5. YP ISSUES AND DISCUSSION + + The term "Yellow Pages" is used in almost as many ways as the term + "domain", so it is useful to define what is meant herein by YP. The + general problem to be solved is to create a method for creating + mappings from one kind of identifier to another, often with an + inverse capability. The traditional methods are to search or use a + precomputed index of some kind. + + Searching is impractical when the search is too large, and + precomputed indexes are possible only when it is possible to specify + search criteria in advance, and pay for the resources necessary to + build the index. For example, it is impractical to search the entire + domain tree to find a particular address RR, so we build the IN- + ADDR.ARPA YP. Similarly, we could never build an Internet-wide index + of "hosts with a load average of less than 2" in less time than it + would take for the data to change, so indexes are a useless approach + for that problem.
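As a concrete view of how keys in the IN-ADDR.ARPA index are formed, the procedures of sections 4.3 and 4.4 reduce to masking the address, reversing the octets, and repeating while an A RR supplies a further mask. The following is a minimal sketch, not a resolver: the function names are hypothetical, and lookup_mask stands in for a real DNS query that returns the mask from an A RR at the queried name, or None when no A RR exists.

```python
import ipaddress
from typing import Callable, List, Optional


def classful_mask(ip: str) -> str:
    """Return the class A, B, or C network mask for a dotted-quad address."""
    first_octet = int(ip.split(".")[0])
    if first_octet < 128:
        return "255.0.0.0"        # class A
    if first_octet < 192:
        return "255.255.0.0"      # class B
    if first_octet < 224:
        return "255.255.255.0"    # class C
    raise ValueError("other address classes are not covered by the memo")


def host_zero_name(ip: str, mask: str) -> str:
    """Mask the address, reverse its octets, and suffix IN-ADDR.ARPA."""
    masked = int(ipaddress.IPv4Address(ip)) & int(ipaddress.IPv4Address(mask))
    octets = str(ipaddress.IPv4Address(masked)).split(".")
    return ".".join(reversed(octets)) + ".IN-ADDR.ARPA."


def subnet_chain(ip: str,
                 lookup_mask: Callable[[str], Optional[str]]) -> List[str]:
    """List the host-zero names for the network and each nested subnet."""
    names: List[str] = []
    mask: Optional[str] = classful_mask(ip)
    while mask is not None:
        name = host_zero_name(ip, mask)
        if name in names:         # guard against a mask that adds no bits
            break
        names.append(name)
        mask = lookup_mask(name)  # mask from the A RR, if one is present
    return names


if __name__ == "__main__":
    # Stand-in for the A RRs shown in the section 4.4 example.
    masks = {
        "0.0.9.128.IN-ADDR.ARPA.": "255.255.255.0",
        "0.2.9.128.IN-ADDR.ARPA.": "255.255.255.240",
    }
    for name in subnet_chain("128.9.2.17", masks.get):
        print(name)
```

With the dictionary standing in for the DNS, the sketch visits the same three names as the worked example for 128.9.2.17. A PTR query at each name would then yield the network or subnet name.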
Such a precomputed index is what we mean by YP, and we regard the + IN-ADDR.ARPA domain as the first instance of a YP in the DNS. + Although a single, centrally-managed YP for well-known values such as + TCP-port is desirable, we regard organization-specific YPs for, say, + locally defined TCP ports as a natural extension, as are combinations + of YPs using search lists to merge the two. + + In examining Internet Numbers [RFC 997] and Assigned Numbers [RFC + 1010], it is clear that there are several mappings which might be of + value. For example: + + <assigned-network-name> <==> <IP-address> + <autonomous-system-id> <==> <number> + <protocol-id> <==> <number> + <port-id> <==> <number> + <ethernet-type> <==> <number> + <public-data-net> <==> <IP-address> + + Following the IN-ADDR example, the YP takes the form of a domain tree + organized to optimize retrieval by search key and distribution via + normal DNS rules. The name used as a key must include: + + 1. A well known origin. For example, IN-ADDR.ARPA is the + current IP-address to host name YP. + + 2. A "from" data type. This identifies the input type of the + mapping. This is necessary because we may be mapping + something as anonymous as a number to any number of + mnemonics, etc. + + + +Mockapetris [Page 9] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + 3. A "to" data type. Since we assume several symmetrical + mnemonic <==> number mappings, this is also necessary. + + This ordering reflects the natural scoping of control, and hence the + order of the components in a domain name. Thus domain names would be + of the form: + + <from-value>.<to-data-type>.<from-data-type>.<YP-origin> + + To make this work, we need to define well-known strings for each of + these metavariables, as well as encoding rules for converting a + <from-value> into a domain name.
We might define: + + <YP-origin> :=YP + <from-data-type>:=TCP-port | IN-ADDR | Number | + Assigned-network-number | Name + <to-data-type> :=<from-data-type> + + Note that "YP" is NOT a valid country code under [ISO 3166] (although + we may want to worry about the future), and the existence of a + syntactically valid <to-data-type>.<from-data-type> pair does not + imply that a meaningful mapping exists, or is even possible. + + The encoding rules might be: + + TCP-port Six character alphanumeric + + IN-ADDR Reversed 4-octet decimal string + + Number decimal integer + + Assigned-network-number + Reversed 4-octet decimal string + + Name Domain name + +6. SPECIFICS FOR YP MAPPINGS + +6.1. TCP-PORT + + $origin Number.TCP-port.YP. + + 23 PTR TELNET.TCP-port.Number.YP. + 25 PTR SMTP.TCP-port.Number.YP. + + $origin TCP-port.Number.YP. + + TELNET PTR 23.Number.TCP-port.YP. + + + +Mockapetris [Page 10] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + SMTP PTR 25.Number.TCP-port.YP. + + Thus the mapping between 23 and TELNET is represented by a pair of + PTR RRs, one for each direction of the mapping. + +6.2. Assigned networks + + Network numbers are assigned by the NIC and reported in "Internet + Numbers" RFCs. To create a YP, the NIC would set up two domains: + + Name.Assigned-network-number.YP and Assigned-network-number.YP + + The first would contain entries of the form: + + $origin Name.Assigned-network-number.YP. + + 0.0.0.4 PTR SATNET.Assigned-network-number.Name.YP. + 0.0.0.10 PTR ARPANET.Assigned-network-number.Name.YP. + + The second would contain entries of the form: + + $origin Assigned-network-number.Name.YP. + + SATNET. PTR 0.0.0.4.Name.Assigned-network-number.YP. + ARPANET. PTR 0.0.0.10.Name.Assigned-network-number.YP. 
+ + These YPs are not in conflict with the network name support described + in the first half of this RFC since they map between ASSIGNED network + names and numbers, not those allocated by the organizations + themselves. That is, they document the NIC's decisions about + allocating network numbers but do not automatically track any + renaming performed by the new owners. + + As a practical matter, we might want to create both of these domains + to enable users on the Internet to experiment with centrally + maintained support as well as the distributed version, or might want + to implement only the allocated number to name mapping and request + organizations to convert their allocated network names to the network + names described in the distributed model. + +6.3. Operational improvements + + We could imagine that all conversion routines using these YPs might + be instructed to use "YP.<local-domain>" followed by "YP." as a + search list. Thus, if the organization ISI.EDU wished to define + locally meaningful TCP-PORT, it would define the domains: + + <TCP-port.Number.YP.ISI.EDU> and <Number.TCP-port.YP.ISI.EDU>. + + + +Mockapetris [Page 11] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + We could add another level of indirection in the YP lookup, defining + the <to-data-type>.<from-data-type>.<YP-origin> nodes to point to the + YP tree, rather than being the YP tree directly. This would enable + entries of the form: + + IN-ADDR.Netname.YP. PTR IN-ADDR.ARPA. + + to splice in YPs from other origins or existing spaces. + + Another possibility would be to shorten the RDATA section of the RRs + which map back and forth by deleting the origin. This could be done + either by allowing the domain name in the RDATA portion to not + identify a real domain name, or by defining a new RR which used a + simple text string rather than a domain name. + + Thus, we might replace + + $origin Assigned-network-number.Name.YP. + + SATNET. 
PTR 0.0.0.4.Name.Assigned-network-number.YP. + ARPANET. PTR 0.0.0.10.Name.Assigned-network-number.YP. + + with + + $origin Assigned-network-number.Name.YP. + + SATNET. PTR 0.0.0.4. + ARPANET. PTR 0.0.0.10. + + or + + $origin Assigned-network-number.Name.YP. + + SATNET. PTT "0.0.0.4" + ARPANET. PTT "0.0.0.10" + + where PTT is a new type whose RDATA section is a text string. + +7. ACKNOWLEDGMENTS + + Drew Perkins, Mark Lottor, and Rob Austein contributed several of the + ideas in this RFC. Numerous contributions, criticisms, and + compromises were produced in the IETF Domain working group and the + NAMEDROPPERS mailing list. + + + + + + + +Mockapetris [Page 12] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + +8. REFERENCES + + [HR] Braden, B., editor, "Requirements for Internet Hosts", + RFC in preparation. + + [ISO 3166] ISO, "Codes for the Representation of Names of + Countries", 1981. + + [RFC 882] Mockapetris, P., "Domain names - Concepts and + Facilities", RFC 882, USC/Information Sciences Institute, + November 1983. + + Superseded by RFC 1034. + + [RFC 883] Mockapetris, P., "Domain names - Implementation and + Specification", RFC 883, USC/Information Sciences + Institute, November 1983. + + Superseded by RFC 1035. + + [RFC 920] Postel, J. and J. Reynolds, "Domain Requirements", RFC + 920, October 1984. + + Explains the naming scheme for top level domains. + + [RFC 952] Harrenstien, K., M. Stahl, and E. Feinler, "DoD Internet + Host Table Specification", RFC 952, SRI, October 1985. + + Specifies the format of HOSTS.TXT, the host/address table + replaced by the DNS. + + [RFC 973] Mockapetris, P., "Domain System Changes and + Observations", RFC 973, USC/Information Sciences + Institute, January 1986. + + Describes changes to RFCs 882 and 883 and reasons for + them. + + [RFC 974] Partridge, C., "Mail routing and the domain system", RFC + 974, CSNET CIC BBN Labs, January 1986.
+ + Describes the transition from HOSTS.TXT based mail + addressing to the more powerful MX system used with the + domain system. + + + + + + + +Mockapetris [Page 13] + +RFC 1101 DNS Encoding of Network Names and Other Types April 1989 + + + [RFC 997] Reynolds, J., and J. Postel, "Internet Numbers", RFC 997, + USC/Information Sciences Institute, March 1987. + + Contains network numbers, autonomous system numbers, etc. + + [RFC 1010] Reynolds, J., and J. Postel, "Assigned Numbers", RFC + 1010, USC/Information Sciences Institute, May 1987. + + Contains socket numbers and mnemonics for host names, + operating systems, etc. + + + [RFC 1034] Mockapetris, P., "Domain names - Concepts and + Facilities", RFC 1034, USC/Information Sciences + Institute, November 1987. + + Introduction/overview of the DNS. + + [RFC 1035] Mockapetris, P., "Domain names - Implementation and + Specification", RFC 1035, USC/Information Sciences + Institute, November 1987. + + DNS implementation instructions. + +Author's Address: + + Paul Mockapetris + USC/Information Sciences Institute + 4676 Admiralty Way + Marina del Rey, CA 90292 + + Phone: (213) 822-1511 + + Email: PVM@ISI.EDU + + + + + + + + + + + + + + + + + +Mockapetris [Page 14] +
\ No newline at end of file diff --git a/doc/rfc/rfc1122.txt b/doc/rfc/rfc1122.txt new file mode 100644 index 00000000..c14f2e50 --- /dev/null +++ b/doc/rfc/rfc1122.txt @@ -0,0 +1,6844 @@ + + + + + + +Network Working Group Internet Engineering Task Force +Request for Comments: 1122 R. Braden, Editor + October 1989 + + + Requirements for Internet Hosts -- Communication Layers + + +Status of This Memo + + This RFC is an official specification for the Internet community. It + incorporates by reference, amends, corrects, and supplements the + primary protocol standards documents relating to hosts. Distribution + of this document is unlimited. + +Summary + + This is one RFC of a pair that defines and discusses the requirements + for Internet host software. This RFC covers the communications + protocol layers: link layer, IP layer, and transport layer; its + companion RFC-1123 covers the application and support protocols. + + + + Table of Contents + + + + + 1. INTRODUCTION ............................................... 5 + 1.1 The Internet Architecture .............................. 6 + 1.1.1 Internet Hosts .................................... 6 + 1.1.2 Architectural Assumptions ......................... 7 + 1.1.3 Internet Protocol Suite ........................... 8 + 1.1.4 Embedded Gateway Code ............................. 10 + 1.2 General Considerations ................................. 12 + 1.2.1 Continuing Internet Evolution ..................... 12 + 1.2.2 Robustness Principle .............................. 12 + 1.2.3 Error Logging ..................................... 13 + 1.2.4 Configuration ..................................... 14 + 1.3 Reading this Document .................................. 15 + 1.3.1 Organization ...................................... 15 + 1.3.2 Requirements ...................................... 16 + 1.3.3 Terminology ....................................... 17 + 1.4 Acknowledgments ........................................ 20 + + 2. 
LINK LAYER .................................................. 21 + 2.1 INTRODUCTION ........................................... 21 + + + +Internet Engineering Task Force [Page 1] + + + + +RFC1122 INTRODUCTION October 1989 + + + 2.2 PROTOCOL WALK-THROUGH .................................. 21 + 2.3 SPECIFIC ISSUES ........................................ 21 + 2.3.1 Trailer Protocol Negotiation ...................... 21 + 2.3.2 Address Resolution Protocol -- ARP ................ 22 + 2.3.2.1 ARP Cache Validation ......................... 22 + 2.3.2.2 ARP Packet Queue ............................. 24 + 2.3.3 Ethernet and IEEE 802 Encapsulation ............... 24 + 2.4 LINK/INTERNET LAYER INTERFACE .......................... 25 + 2.5 LINK LAYER REQUIREMENTS SUMMARY ........................ 26 + + 3. INTERNET LAYER PROTOCOLS .................................... 27 + 3.1 INTRODUCTION ............................................ 27 + 3.2 PROTOCOL WALK-THROUGH .................................. 29 + 3.2.1 Internet Protocol -- IP ............................ 29 + 3.2.1.1 Version Number ............................... 29 + 3.2.1.2 Checksum ..................................... 29 + 3.2.1.3 Addressing ................................... 29 + 3.2.1.4 Fragmentation and Reassembly ................. 32 + 3.2.1.5 Identification ............................... 32 + 3.2.1.6 Type-of-Service .............................. 33 + 3.2.1.7 Time-to-Live ................................. 34 + 3.2.1.8 Options ...................................... 35 + 3.2.2 Internet Control Message Protocol -- ICMP .......... 38 + 3.2.2.1 Destination Unreachable ...................... 39 + 3.2.2.2 Redirect ..................................... 40 + 3.2.2.3 Source Quench ................................ 41 + 3.2.2.4 Time Exceeded ................................ 41 + 3.2.2.5 Parameter Problem ............................ 42 + 3.2.2.6 Echo Request/Reply ........................... 
42 + 3.2.2.7 Information Request/Reply .................... 43 + 3.2.2.8 Timestamp and Timestamp Reply ................ 43 + 3.2.2.9 Address Mask Request/Reply ................... 45 + 3.2.3 Internet Group Management Protocol IGMP ........... 47 + 3.3 SPECIFIC ISSUES ........................................ 47 + 3.3.1 Routing Outbound Datagrams ........................ 47 + 3.3.1.1 Local/Remote Decision ........................ 47 + 3.3.1.2 Gateway Selection ............................ 48 + 3.3.1.3 Route Cache .................................. 49 + 3.3.1.4 Dead Gateway Detection ....................... 51 + 3.3.1.5 New Gateway Selection ........................ 55 + 3.3.1.6 Initialization ............................... 56 + 3.3.2 Reassembly ........................................ 56 + 3.3.3 Fragmentation ..................................... 58 + 3.3.4 Local Multihoming ................................. 60 + 3.3.4.1 Introduction ................................. 60 + 3.3.4.2 Multihoming Requirements ..................... 61 + 3.3.4.3 Choosing a Source Address .................... 64 + 3.3.5 Source Route Forwarding ........................... 65 + + + +Internet Engineering Task Force [Page 2] + + + + +RFC1122 INTRODUCTION October 1989 + + + 3.3.6 Broadcasts ........................................ 66 + 3.3.7 IP Multicasting ................................... 67 + 3.3.8 Error Reporting ................................... 69 + 3.4 INTERNET/TRANSPORT LAYER INTERFACE ..................... 69 + 3.5 INTERNET LAYER REQUIREMENTS SUMMARY .................... 72 + + 4. TRANSPORT PROTOCOLS ......................................... 77 + 4.1 USER DATAGRAM PROTOCOL -- UDP .......................... 77 + 4.1.1 INTRODUCTION ...................................... 77 + 4.1.2 PROTOCOL WALK-THROUGH ............................. 77 + 4.1.3 SPECIFIC ISSUES ................................... 77 + 4.1.3.1 Ports ........................................ 
77 + 4.1.3.2 IP Options ................................... 77 + 4.1.3.3 ICMP Messages ................................ 78 + 4.1.3.4 UDP Checksums ................................ 78 + 4.1.3.5 UDP Multihoming .............................. 79 + 4.1.3.6 Invalid Addresses ............................ 79 + 4.1.4 UDP/APPLICATION LAYER INTERFACE ................... 79 + 4.1.5 UDP REQUIREMENTS SUMMARY .......................... 80 + 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP ................... 82 + 4.2.1 INTRODUCTION ...................................... 82 + 4.2.2 PROTOCOL WALK-THROUGH ............................. 82 + 4.2.2.1 Well-Known Ports ............................. 82 + 4.2.2.2 Use of Push .................................. 82 + 4.2.2.3 Window Size .................................. 83 + 4.2.2.4 Urgent Pointer ............................... 84 + 4.2.2.5 TCP Options .................................. 85 + 4.2.2.6 Maximum Segment Size Option .................. 85 + 4.2.2.7 TCP Checksum ................................. 86 + 4.2.2.8 TCP Connection State Diagram ................. 86 + 4.2.2.9 Initial Sequence Number Selection ............ 87 + 4.2.2.10 Simultaneous Open Attempts .................. 87 + 4.2.2.11 Recovery from Old Duplicate SYN ............. 87 + 4.2.2.12 RST Segment ................................. 87 + 4.2.2.13 Closing a Connection ........................ 87 + 4.2.2.14 Data Communication .......................... 89 + 4.2.2.15 Retransmission Timeout ...................... 90 + 4.2.2.16 Managing the Window ......................... 91 + 4.2.2.17 Probing Zero Windows ........................ 92 + 4.2.2.18 Passive OPEN Calls .......................... 92 + 4.2.2.19 Time to Live ................................ 93 + 4.2.2.20 Event Processing ............................ 93 + 4.2.2.21 Acknowledging Queued Segments ............... 94 + 4.2.3 SPECIFIC ISSUES ................................... 
95 + 4.2.3.1 Retransmission Timeout Calculation ........... 95 + 4.2.3.2 When to Send an ACK Segment .................. 96 + 4.2.3.3 When to Send a Window Update ................. 97 + 4.2.3.4 When to Send Data ............................ 98 + + + +Internet Engineering Task Force [Page 3] + + + + +RFC1122 INTRODUCTION October 1989 + + + 4.2.3.5 TCP Connection Failures ...................... 100 + 4.2.3.6 TCP Keep-Alives .............................. 101 + 4.2.3.7 TCP Multihoming .............................. 103 + 4.2.3.8 IP Options ................................... 103 + 4.2.3.9 ICMP Messages ................................ 103 + 4.2.3.10 Remote Address Validation ................... 104 + 4.2.3.11 TCP Traffic Patterns ........................ 104 + 4.2.3.12 Efficiency .................................. 105 + 4.2.4 TCP/APPLICATION LAYER INTERFACE ................... 106 + 4.2.4.1 Asynchronous Reports ......................... 106 + 4.2.4.2 Type-of-Service .............................. 107 + 4.2.4.3 Flush Call ................................... 107 + 4.2.4.4 Multihoming .................................. 108 + 4.2.5 TCP REQUIREMENT SUMMARY ........................... 108 + + 5. REFERENCES ................................................. 112 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 4] + + + + +RFC1122 INTRODUCTION October 1989 + + +1. INTRODUCTION + + This document is one of a pair that defines and discusses the + requirements for host system implementations of the Internet protocol + suite. This RFC covers the communication protocol layers: link + layer, IP layer, and transport layer. Its companion RFC, + "Requirements for Internet Hosts -- Application and Support" + [INTRO:1], covers the application layer protocols. This document + should also be read in conjunction with "Requirements for Internet + Gateways" [INTRO:2]. 
+ + These documents are intended to provide guidance for vendors, + implementors, and users of Internet communication software. They + represent the consensus of a large body of technical experience and + wisdom, contributed by the members of the Internet research and + vendor communities. + + This RFC enumerates standard protocols that a host connected to the + Internet must use, and it incorporates by reference the RFCs and + other documents describing the current specifications for these + protocols. It corrects errors in the referenced documents and adds + additional discussion and guidance for an implementor. + + For each protocol, this document also contains an explicit set of + requirements, recommendations, and options. The reader must + understand that the list of requirements in this document is + incomplete by itself; the complete set of requirements for an + Internet host is primarily defined in the standard protocol + specification documents, with the corrections, amendments, and + supplements contained in this RFC. + + A good-faith implementation of the protocols that was produced after + careful reading of the RFC's and with some interaction with the + Internet technical community, and that followed good communications + software engineering practices, should differ from the requirements + of this document in only minor ways. Thus, in many cases, the + "requirements" in this RFC are already stated or implied in the + standard protocol documents, so that their inclusion here is, in a + sense, redundant. However, they were included because some past + implementation has made the wrong choice, causing problems of + interoperability, performance, and/or robustness. + + This document includes discussion and explanation of many of the + requirements and recommendations. A simple list of requirements + would be dangerous, because: + + o Some required features are more important than others, and some + features are optional. 
+ + + +Internet Engineering Task Force [Page 5] + + + + +RFC1122 INTRODUCTION October 1989 + + + o There may be valid reasons why particular vendor products that + are designed for restricted contexts might choose to use + different specifications. + + However, the specifications of this document must be followed to meet + the general goal of arbitrary host interoperation across the + diversity and complexity of the Internet system. Although most + current implementations fail to meet these requirements in various + ways, some minor and some major, this specification is the ideal + towards which we need to move. + + These requirements are based on the current level of Internet + architecture. This document will be updated as required to provide + additional clarifications or to include additional information in + those areas in which specifications are still evolving. + + This introductory section begins with a brief overview of the + Internet architecture as it relates to hosts, and then gives some + general advice to host software vendors. Finally, there is some + guidance on reading the rest of the document and some terminology. + + 1.1 The Internet Architecture + + General background and discussion on the Internet architecture and + supporting protocol suite can be found in the DDN Protocol + Handbook [INTRO:3]; for background see for example [INTRO:9], + [INTRO:10], and [INTRO:11]. Reference [INTRO:5] describes the + procedure for obtaining Internet protocol documents, while + [INTRO:6] contains a list of the numbers assigned within Internet + protocols. + + 1.1.1 Internet Hosts + + A host computer, or simply "host," is the ultimate consumer of + communication services. A host generally executes application + programs on behalf of user(s), employing network and/or + Internet communication services in support of this function. + An Internet host corresponds to the concept of an "End-System" + used in the OSI protocol suite [INTRO:13]. 
+ + An Internet communication system consists of interconnected + packet networks supporting communication among host computers + using the Internet protocols. The networks are interconnected + using packet-switching computers called "gateways" or "IP + routers" by the Internet community, and "Intermediate Systems" + by the OSI world [INTRO:13]. The RFC "Requirements for + Internet Gateways" [INTRO:2] contains the official + specifications for Internet gateways. That RFC together with + + + +Internet Engineering Task Force [Page 6] + + + + +RFC1122 INTRODUCTION October 1989 + + + the present document and its companion [INTRO:1] define the + rules for the current realization of the Internet architecture. + + Internet hosts span a wide range of size, speed, and function. + They range in size from small microprocessors through + workstations to mainframes and supercomputers. In function, + they range from single-purpose hosts (such as terminal servers) + to full-service hosts that support a variety of online network + services, typically including remote login, file transfer, and + electronic mail. + + A host is generally said to be multihomed if it has more than + one interface to the same or to different networks. See + Section 1.3.3 on "Terminology". + + 1.1.2 Architectural Assumptions + + The current Internet architecture is based on a set of + assumptions about the communication system. The assumptions + most relevant to hosts are as follows: + + (a) The Internet is a network of networks. + + Each host is directly connected to some particular + network(s); its connection to the Internet is only + conceptual. Two hosts on the same network communicate + with each other using the same set of protocols that they + would use to communicate with hosts on distant networks. + + (b) Gateways don't keep connection state information. 
+ + To improve robustness of the communication system, + gateways are designed to be stateless, forwarding each IP + datagram independently of other datagrams. As a result, + redundant paths can be exploited to provide robust service + in spite of failures of intervening gateways and networks. + + All state information required for end-to-end flow control + and reliability is implemented in the hosts, in the + transport layer or in application programs. All + connection control information is thus co-located with the + end points of the communication, so it will be lost only + if an end point fails. + + (c) Routing complexity should be in the gateways. + + Routing is a complex and difficult problem, and ought to + be performed by the gateways, not the hosts. An important + + + +Internet Engineering Task Force [Page 7] + + + + +RFC1122 INTRODUCTION October 1989 + + + objective is to insulate host software from changes caused + by the inevitable evolution of the Internet routing + architecture. + + (d) The System must tolerate wide network variation. + + A basic objective of the Internet design is to tolerate a + wide range of network characteristics -- e.g., bandwidth, + delay, packet loss, packet reordering, and maximum packet + size. Another objective is robustness against failure of + individual networks, gateways, and hosts, using whatever + bandwidth is still available. Finally, the goal is full + "open system interconnection": an Internet host must be + able to interoperate robustly and effectively with any + other Internet host, across diverse Internet paths. + + Sometimes host implementors have designed for less + ambitious goals. For example, the LAN environment is + typically much more benign than the Internet as a whole; + LANs have low packet loss and delay and do not reorder + packets. Some vendors have fielded host implementations + that are adequate for a simple LAN environment, but work + badly for general interoperation. 
The vendor justifies + such a product as being economical within the restricted + LAN market. However, isolated LANs seldom stay isolated + for long; they are soon gatewayed to each other, to + organization-wide internets, and eventually to the global + Internet system. In the end, neither the customer nor the + vendor is served by incomplete or substandard Internet + host software. + + The requirements spelled out in this document are designed + for a full-function Internet host, capable of full + interoperation over an arbitrary Internet path. + + + 1.1.3 Internet Protocol Suite + + To communicate using the Internet system, a host must implement + the layered set of protocols comprising the Internet protocol + suite. A host typically must implement at least one protocol + from each layer. + + The protocol layers used in the Internet architecture are as + follows [INTRO:4]: + + + o Application Layer + + + +Internet Engineering Task Force [Page 8] + + + + +RFC1122 INTRODUCTION October 1989 + + + The application layer is the top layer of the Internet + protocol suite. The Internet suite does not further + subdivide the application layer, although some of the + Internet application layer protocols do contain some + internal sub-layering. The application layer of the + Internet suite essentially combines the functions of the + top two layers -- Presentation and Application -- of the + OSI reference model. + + We distinguish two categories of application layer + protocols: user protocols that provide service directly + to users, and support protocols that provide common system + functions. Requirements for user and support protocols + will be found in the companion RFC [INTRO:1]. + + The most common Internet user protocols are: + + o Telnet (remote login) + o FTP (file transfer) + o SMTP (electronic mail delivery) + + There are a number of other standardized user protocols + [INTRO:4] and many private user protocols. 
+ + Support protocols, used for host name mapping, booting, + and management, include SNMP, BOOTP, RARP, and the Domain + Name System (DNS) protocols. + + + o Transport Layer + + The transport layer provides end-to-end communication + services for applications. There are two primary + transport layer protocols at present: + + o Transmission Control Protocol (TCP) + o User Datagram Protocol (UDP) + + TCP is a reliable connection-oriented transport service + that provides end-to-end reliability, resequencing, and + flow control. UDP is a connectionless ("datagram") + transport service. + + Other transport protocols have been developed by the + research community, and the set of official Internet + transport protocols may be expanded in the future. + + Transport layer protocols are discussed in Chapter 4. + + + +Internet Engineering Task Force [Page 9] + + + + +RFC1122 INTRODUCTION October 1989 + + + o Internet Layer + + All Internet transport protocols use the Internet Protocol + (IP) to carry data from source host to destination host. + IP is a connectionless or datagram internetwork service, + providing no end-to-end delivery guarantees. Thus, IP + datagrams may arrive at the destination host damaged, + duplicated, out of order, or not at all. The layers above + IP are responsible for reliable delivery service when it + is required. The IP protocol includes provision for + addressing, type-of-service specification, fragmentation + and reassembly, and security information. + + The datagram or connectionless nature of the IP protocol + is a fundamental and characteristic feature of the + Internet architecture. Internet IP was the model for the + OSI Connectionless Network Protocol [INTRO:12]. + + ICMP is a control protocol that is considered to be an + integral part of IP, although it is architecturally + layered upon IP, i.e., it uses IP to carry its data end- + to-end just as a transport protocol like TCP or UDP does. 
+ ICMP provides error reporting, congestion reporting, and + first-hop gateway redirection. + + IGMP is an Internet layer protocol used for establishing + dynamic host groups for IP multicasting. + + The Internet layer protocols IP, ICMP, and IGMP are + discussed in Chapter 3. + + + o Link Layer + + To communicate on its directly-connected network, a host + must implement the communication protocol used to + interface to that network. We call this a link layer or + media-access layer protocol. + + There is a wide variety of link layer protocols, + corresponding to the many different types of networks. + See Chapter 2. + + + 1.1.4 Embedded Gateway Code + + Some Internet host software includes embedded gateway + functionality, so that these hosts can forward packets as a + + + +Internet Engineering Task Force [Page 10] + + + + +RFC1122 INTRODUCTION October 1989 + + + gateway would, while still performing the application layer + functions of a host. + + Such dual-purpose systems must follow the Gateway Requirements + RFC [INTRO:2] with respect to their gateway functions, and + must follow the present document with respect to their host + functions. In all overlapping cases, the two specifications + should be in agreement. + + There are varying opinions in the Internet community about + embedded gateway functionality. The main arguments are as + follows: + + o Pro: in a local network environment where networking is + informal, or in isolated internets, it may be convenient + and economical to use existing host systems as gateways. + + There is also an architectural argument for embedded + gateway functionality: multihoming is much more common + than originally foreseen, and multihoming forces a host to + make routing decisions as if it were a gateway. If the + multihomed host contains an embedded gateway, it will + have full routing knowledge and as a result will be able + to make more optimal routing decisions. 
+ + o Con: Gateway algorithms and protocols are still changing, + and they will continue to change as the Internet system + grows larger. Attempting to include a general gateway + function within the host IP layer will force host system + maintainers to track these (more frequent) changes. Also, + a larger pool of gateway implementations will make + coordinating the changes more difficult. Finally, the + complexity of a gateway IP layer is somewhat greater than + that of a host, making the implementation and operation + tasks more complex. + + In addition, the style of operation of some hosts is not + appropriate for providing stable and robust gateway + service. + + There is considerable merit in both of these viewpoints. One + conclusion can be drawn: a host administrator must have + conscious control over whether or not a given host acts as a + gateway. See Section 3.1 for the detailed requirements. + + + + + + + +Internet Engineering Task Force [Page 11] + + + + +RFC1122 INTRODUCTION October 1989 + + + 1.2 General Considerations + + There are two important lessons that vendors of Internet host + software have learned and which a new vendor should consider + seriously. + + 1.2.1 Continuing Internet Evolution + + The enormous growth of the Internet has revealed problems of + management and scaling in a large datagram-based packet + communication system. These problems are being addressed, and + as a result there will be continuing evolution of the + specifications described in this document. These changes will + be carefully planned and controlled, since there is extensive + participation in this planning by the vendors and by the + organizations responsible for operations of the networks. + + Development, evolution, and revision are characteristic of + computer network protocols today, and this situation will + persist for some years. A vendor who develops computer + communication software for the Internet protocol suite (or any + other protocol suite!) 
and then fails to maintain and update + that software for changing specifications is going to leave a + trail of unhappy customers. The Internet is a large + communication network, and the users are in constant contact + through it. Experience has shown that knowledge of + deficiencies in vendor software propagates quickly through the + Internet technical community. + + 1.2.2 Robustness Principle + + At every layer of the protocols, there is a general rule whose + application can lead to enormous benefits in robustness and + interoperability [IP:1]: + + "Be liberal in what you accept, and + conservative in what you send" + + Software should be written to deal with every conceivable + error, no matter how unlikely; sooner or later a packet will + come in with that particular combination of errors and + attributes, and unless the software is prepared, chaos can + ensue. In general, it is best to assume that the network is + filled with malevolent entities that will send in packets + designed to have the worst possible effect. This assumption + will lead to suitable protective design, although the most + serious problems in the Internet have been caused by + unenvisaged mechanisms triggered by low-probability events; + + + +Internet Engineering Task Force [Page 12] + + + + +RFC1122 INTRODUCTION October 1989 + + + mere human malice would never have taken so devious a course! + + Adaptability to change must be designed into all levels of + Internet host software. As a simple example, consider a + protocol specification that contains an enumeration of values + for a particular header field -- e.g., a type field, a port + number, or an error code; this enumeration must be assumed to + be incomplete. Thus, if a protocol specification defines four + possible error codes, the software must not break when a fifth + code shows up. An undefined code might be logged (see below), + but it must not cause a failure. 
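The rule about incomplete enumerations can be illustrated with a minimal sketch. The error codes, their names, and the helper `describe_error_code` below are hypothetical and are not taken from any protocol specification; the sketch only assumes that an undefined code should be counted and logged rather than treated as fatal.

```python
import logging
from collections import Counter

# Hypothetical error codes for an illustrative protocol; not from any RFC.
KNOWN_ERROR_CODES = {
    0: "no error",
    1: "format error",
    2: "server failure",
    3: "name error",
}

# Counts of undefined codes, kept so a management tool could read them.
unknown_code_counts = Counter()

log = logging.getLogger("proto")


def describe_error_code(code: int) -> str:
    """Return a description for `code` without failing on undefined values."""
    name = KNOWN_ERROR_CODES.get(code)
    if name is None:
        # The enumeration is assumed incomplete: count and log, never raise.
        unknown_code_counts[code] += 1
        log.warning("undefined error code %d received", code)
        return f"unknown error ({code})"
    return name
```

Here a fifth code is handled on the same path as the four defined ones, so the caller never sees an exception caused by an unexpected value.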
+ + The second part of the principle is almost as important: + software on other hosts may contain deficiencies that make it + unwise to exploit legal but obscure protocol features. It is + unwise to stray far from the obvious and simple, lest untoward + effects result elsewhere. A corollary of this is "watch out + for misbehaving hosts"; host software should be prepared, not + just to survive other misbehaving hosts, but also to cooperate + to limit the amount of disruption such hosts can cause to the + shared communication facility. + + 1.2.3 Error Logging + + The Internet includes a great variety of host and gateway + systems, each implementing many protocols and protocol layers, + and some of these contain bugs and mis-features in their + Internet protocol software. As a result of complexity, + diversity, and distribution of function, the diagnosis of + Internet problems is often very difficult. + + Problem diagnosis will be aided if host implementations include + a carefully designed facility for logging erroneous or + "strange" protocol events. It is important to include as much + diagnostic information as possible when an error is logged. In + particular, it is often useful to record the header(s) of a + packet that caused an error. However, care must be taken to + ensure that error logging does not consume prohibitive amounts + of resources or otherwise interfere with the operation of the + host. + + There is a tendency for abnormal but harmless protocol events + to overflow error logging files; this can be avoided by using a + "circular" log, or by enabling logging only while diagnosing a + known failure. It may be useful to filter and count duplicate + successive messages. 
One strategy that seems to work well is: + (1) always count abnormalities and make such counts accessible + through the management protocol (see [INTRO:1]); and (2) allow + + + +Internet Engineering Task Force [Page 13] + + + + +RFC1122 INTRODUCTION October 1989 + + + the logging of a great variety of events to be selectively + enabled. For example, it might be useful to be able to "log + everything" or to "log everything for host X". + + Note that different managements may have differing policies + about the amount of error logging that they want normally + enabled in a host. Some will say, "if it doesn't hurt me, I + don't want to know about it", while others will want to take a + more watchful and aggressive attitude about detecting and + removing protocol abnormalities. + + 1.2.4 Configuration + + It would be ideal if a host implementation of the Internet + protocol suite could be entirely self-configuring. This would + allow the whole suite to be implemented in ROM or cast into + silicon, it would simplify diskless workstations, and it would + be an immense boon to harried LAN administrators as well as + system vendors. We have not reached this ideal; in fact, we + are not even close. + + At many points in this document, you will find a requirement + that a parameter be a configurable option. There are several + different reasons behind such requirements. In a few cases, + there is current uncertainty or disagreement about the best + value, and it may be necessary to update the recommended value + in the future. In other cases, the value really depends on + external factors -- e.g., the size of the host and the + distribution of its communication load, or the speeds and + topology of nearby networks -- and self-tuning algorithms are + unavailable and may be insufficient. In some cases, + configurability is needed because of administrative + requirements. 
+ + Finally, some configuration options are required to communicate + with obsolete or incorrect implementations of the protocols, + distributed without sources, that unfortunately persist in many + parts of the Internet. To make correct systems coexist with + these faulty systems, administrators often have to "mis- + configure" the correct systems. This problem will correct + itself gradually as the faulty systems are retired, but it + cannot be ignored by vendors. + + When we say that a parameter must be configurable, we do not + intend to require that its value be explicitly read from a + configuration file at every boot time. We recommend that + implementors set up a default for each parameter, so a + configuration file is only necessary to override those defaults + + + +Internet Engineering Task Force [Page 14] + + + + +RFC1122 INTRODUCTION October 1989 + + + that are inappropriate in a particular installation. Thus, the + configurability requirement is an assurance that it will be + POSSIBLE to override the default when necessary, even in a + binary-only or ROM-based product. + + This document requires a particular value for such defaults in + some cases. The choice of default is a sensitive issue when + the configuration item controls the accommodation to existing + faulty systems. If the Internet is to converge successfully to + complete interoperability, the default values built into + implementations must implement the official protocol, not + "mis-configurations" to accommodate faulty implementations. + Although marketing considerations have led some vendors to + choose mis-configuration defaults, we urge vendors to choose + defaults that will conform to the standard. + + Finally, we note that a vendor needs to provide adequate + documentation on all configuration parameters, their limits and + effects. 
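As an illustration of the default-plus-override approach recommended above, the following is a minimal Python sketch and not part of the RFC text. The parameter names and the 60-second timeout are hypothetical; the defaults for trailers, encapsulation, and gateway mode reflect requirements stated in Sections 2.3.1, 2.3.3, and 3.1 of this document.

```python
# Hypothetical parameter names. Built-in defaults implement the official
# protocol behaviour, not accommodations for faulty implementations.
DEFAULTS = {
    "arp_cache_timeout_s": 60,       # assumed value, "on the order of a minute"
    "send_encapsulation": "rfc894",  # Section 2.3.3: switch MUST default to RFC-894
    "trailers_enabled": False,       # Section 2.3.1: default MUST disable trailers
    "gateway_mode": False,           # Section 3.1: MUST default to non-gateway mode
}


def effective_config(overrides=None):
    """Return the built-in defaults with site-specific overrides applied.

    An override source (file, ROM setting, management protocol) is only
    needed for parameters whose default is inappropriate at this site.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    config = dict(DEFAULTS)
    config.update(overrides)
    return config
```

With this shape, a host boots correctly with no configuration at all, and a site that needs a longer ARP timeout supplies only that one value, for example effective_config({"arp_cache_timeout_s": 600}).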
+ + + 1.3 Reading this Document + + 1.3.1 Organization + + Protocol layering, which is generally used as an organizing + principle in implementing network software, has also been used + to organize this document. In describing the rules, we assume + that an implementation does strictly mirror the layering of the + protocols. Thus, the following three major sections specify + the requirements for the link layer, the internet layer, and + the transport layer, respectively. A companion RFC [INTRO:1] + covers application level software. This layerist organization + was chosen for simplicity and clarity. + + However, strict layering is an imperfect model, both for the + protocol suite and for recommended implementation approaches. + Protocols in different layers interact in complex and sometimes + subtle ways, and particular functions often involve multiple + layers. There are many design choices in an implementation, + many of which involve creative "breaking" of strict layering. + Every implementor is urged to read references [INTRO:7] and + [INTRO:8]. + + This document describes the conceptual service interface + between layers using a functional ("procedure call") notation, + like that used in the TCP specification [TCP:1]. A host + implementation must support the logical information flow + + + +Internet Engineering Task Force [Page 15] + + + + +RFC1122 INTRODUCTION October 1989 + + + implied by these calls, but need not literally implement the + calls themselves. For example, many implementations reflect + the coupling between the transport layer and the IP layer by + giving them shared access to common data structures. These + data structures, rather than explicit procedure calls, are then + the agency for passing much of the information that is + required. 
+ + In general, each major section of this document is organized + into the following subsections: + + (1) Introduction + + (2) Protocol Walk-Through -- considers the protocol + specification documents section-by-section, correcting + errors, stating requirements that may be ambiguous or + ill-defined, and providing further clarification or + explanation. + + (3) Specific Issues -- discusses protocol design and + implementation issues that were not included in the walk- + through. + + (4) Interfaces -- discusses the service interface to the next + higher layer. + + (5) Summary -- contains a summary of the requirements of the + section. + + + Under many of the individual topics in this document, there is + parenthetical material labeled "DISCUSSION" or + "IMPLEMENTATION". This material is intended to give + clarification and explanation of the preceding requirements + text. It also includes some suggestions on possible future + directions or developments. The implementation material + contains suggested approaches that an implementor may want to + consider. + + The summary sections are intended to be guides and indexes to + the text, but are necessarily cryptic and incomplete. The + summaries should never be used or referenced separately from + the complete RFC. + + 1.3.2 Requirements + + In this document, the words that are used to define the + significance of each particular requirement are capitalized. + + + +Internet Engineering Task Force [Page 16] + + + + +RFC1122 INTRODUCTION October 1989 + + + These words are: + + * "MUST" + + This word or the adjective "REQUIRED" means that the item + is an absolute requirement of the specification. + + * "SHOULD" + + This word or the adjective "RECOMMENDED" means that there + may exist valid reasons in particular circumstances to + ignore this item, but the full implications should be + understood and the case carefully weighed before choosing + a different course. 
+ + * "MAY" + + This word or the adjective "OPTIONAL" means that this item + is truly optional. One vendor may choose to include the + item because a particular marketplace requires it or + because it enhances the product, for example; another + vendor may omit the same item. + + + An implementation is not compliant if it fails to satisfy one + or more of the MUST requirements for the protocols it + implements. An implementation that satisfies all the MUST and + all the SHOULD requirements for its protocols is said to be + "unconditionally compliant"; one that satisfies all the MUST + requirements but not all the SHOULD requirements for its + protocols is said to be "conditionally compliant". + + 1.3.3 Terminology + + This document uses the following technical terms: + + Segment + A segment is the unit of end-to-end transmission in the + TCP protocol. A segment consists of a TCP header followed + by application data. A segment is transmitted by + encapsulation inside an IP datagram. + + Message + In this description of the lower-layer protocols, a + message is the unit of transmission in a transport layer + protocol. In particular, a TCP segment is a message. A + message consists of a transport protocol header followed + by application protocol data. To be transmitted end-to- + + + +Internet Engineering Task Force [Page 17] + + + + +RFC1122 INTRODUCTION October 1989 + + + end through the Internet, a message must be encapsulated + inside a datagram. + + IP Datagram + An IP datagram is the unit of end-to-end transmission in + the IP protocol. An IP datagram consists of an IP header + followed by transport layer data, i.e., of an IP header + followed by a message. + + In the description of the internet layer (Section 3), the + unqualified term "datagram" should be understood to refer + to an IP datagram. + + Packet + A packet is the unit of data passed across the interface + between the internet layer and the link layer. It + includes an IP header and data. 
A packet may be a + complete IP datagram or a fragment of an IP datagram. + + Frame + A frame is the unit of transmission in a link layer + protocol, and consists of a link-layer header followed by + a packet. + + Connected Network + A network to which a host is interfaced is often known as + the "local network" or the "subnetwork" relative to that + host. However, these terms can cause confusion, and + therefore we use the term "connected network" in this + document. + + Multihomed + A host is said to be multihomed if it has multiple IP + addresses. For a discussion of multihoming, see Section + 3.3.4 below. + + Physical network interface + This is a physical interface to a connected network and + has a (possibly unique) link-layer address. Multiple + physical network interfaces on a single host may share the + same link-layer address, but the address must be unique + for different hosts on the same physical network. + + Logical [network] interface + We define a logical [network] interface to be a logical + path, distinguished by a unique IP address, to a connected + network. See Section 3.3.4. + + + + +Internet Engineering Task Force [Page 18] + + + + +RFC1122 INTRODUCTION October 1989 + + + Specific-destination address + This is the effective destination address of a datagram, + even if it is broadcast or multicast; see Section 3.2.1.3. + + Path + At a given moment, all the IP datagrams from a particular + source host to a particular destination host will + typically traverse the same sequence of gateways. We use + the term "path" for this sequence. Note that a path is + uni-directional; it is not unusual to have different paths + in the two directions between a given host pair. + + MTU + The maximum transmission unit, i.e., the size of the + largest packet that can be transmitted. + + + The terms frame, packet, datagram, message, and segment are + illustrated by the following schematic diagrams: + + A. 
Transmission on connected network: + _______________________________________________ + | LL hdr | IP hdr | (data) | + |________|________|_____________________________| + + <---------- Frame -----------------------------> + <----------Packet --------------------> + + + B. Before IP fragmentation or after IP reassembly: + ______________________________________ + | IP hdr | transport| Application Data | + |________|____hdr___|__________________| + + <-------- Datagram ------------------> + <-------- Message -----------> + or, for TCP: + ______________________________________ + | IP hdr | TCP hdr | Application Data | + |________|__________|__________________| + + <-------- Datagram ------------------> + <-------- Segment -----------> + + + + + + + + +Internet Engineering Task Force [Page 19] + + + + +RFC1122 INTRODUCTION October 1989 + + + 1.4 Acknowledgments + + This document incorporates contributions and comments from a large + group of Internet protocol experts, including representatives of + university and research labs, vendors, and government agencies. + It was assembled primarily by the Host Requirements Working Group + of the Internet Engineering Task Force (IETF). + + The Editor would especially like to acknowledge the tireless + dedication of the following people, who attended many long + meetings and generated 3 million bytes of electronic mail over the + past 18 months in pursuit of this document: Philip Almquist, Dave + Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve + Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore), + John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG), + Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge + (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software). 
+ + In addition, the following people made major contributions to the + effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia + (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA), + Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL), + John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill + Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul + (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue + Technology), and Mike StJohns (DCA). The following also made + significant contributions to particular areas: Eric Allman + (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic + (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn + (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann + (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun + Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen + (Toronto). + + We are grateful to all, including any contributors who may have + been inadvertently omitted from this list. + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 20] + + + + +RFC1122 LINK LAYER October 1989 + + +2. LINK LAYER + + 2.1 INTRODUCTION + + All Internet systems, both hosts and gateways, have the same + requirements for link layer protocols. These requirements are + given in Chapter 3 of "Requirements for Internet Gateways" + [INTRO:2], augmented with the material in this section. + + 2.2 PROTOCOL WALK-THROUGH + + None. + + 2.3 SPECIFIC ISSUES + + 2.3.1 Trailer Protocol Negotiation + + The trailer protocol [LINK:1] for link-layer encapsulation MAY + be used, but only when it has been verified that both systems + (host or gateway) involved in the link-layer communication + implement trailers. If the system does not dynamically + negotiate use of the trailer protocol on a per-destination + basis, the default configuration MUST disable the protocol. 
+ + DISCUSSION: + The trailer protocol is a link-layer encapsulation + technique that rearranges the data contents of packets + sent on the physical network. In some cases, trailers + improve the throughput of higher layer protocols by + reducing the amount of data copying within the operating + system. Higher layer protocols are unaware of trailer + use, but both the sending and receiving host MUST + understand the protocol if it is used. + + Improper use of trailers can result in very confusing + symptoms. Only packets with specific size attributes are + encapsulated using trailers, and typically only a small + fraction of the packets being exchanged have these + attributes. Thus, if a system using trailers exchanges + packets with a system that does not, some packets + disappear into a black hole while others are delivered + successfully. + + IMPLEMENTATION: + On an Ethernet, packets encapsulated with trailers use a + distinct Ethernet type [LINK:1], and trailer negotiation + is performed at the time that ARP is used to discover the + link-layer address of a destination system. + + + +Internet Engineering Task Force [Page 21] + + + + +RFC1122 LINK LAYER October 1989 + + + Specifically, the ARP exchange is completed in the usual + manner using the normal IP protocol type, but a host that + wants to speak trailers will send an additional "trailer + ARP reply" packet, i.e., an ARP reply that specifies the + trailer encapsulation protocol type but otherwise has the + format of a normal ARP reply. If a host configured to use + trailers receives a trailer ARP reply message from a + remote machine, it can add that machine to the list of + machines that understand trailers, e.g., by marking the + corresponding entry in the ARP cache. + + Hosts wishing to receive trailer encapsulations send + trailer ARP replies whenever they complete exchanges of + normal ARP messages for IP. 
Thus, a host that received an + ARP request for its IP protocol address would send a + trailer ARP reply in addition to the normal IP ARP reply; + a host that sent the IP ARP request would send a trailer + ARP reply when it received the corresponding IP ARP reply. + In this way, either the requesting or responding host in + an IP ARP exchange may request that it receive trailer + encapsulations. + + This scheme, using extra trailer ARP reply packets rather + than sending an ARP request for the trailer protocol type, + was designed to avoid a continuous exchange of ARP packets + with a misbehaving host that, contrary to any + specification or common sense, responded to an ARP reply + for trailers with another ARP reply for IP. This problem + is avoided by sending a trailer ARP reply in response to + an IP ARP reply only when the IP ARP reply answers an + outstanding request; this is true when the hardware + address for the host is still unknown when the IP ARP + reply is received. A trailer ARP reply may always be sent + along with an IP ARP reply responding to an IP ARP + request. + + 2.3.2 Address Resolution Protocol -- ARP + + 2.3.2.1 ARP Cache Validation + + An implementation of the Address Resolution Protocol (ARP) + [LINK:2] MUST provide a mechanism to flush out-of-date cache + entries. If this mechanism involves a timeout, it SHOULD be + possible to configure the timeout value. + + A mechanism to prevent ARP flooding (repeatedly sending an + ARP Request for the same IP address, at a high rate) MUST be + included. The recommended maximum rate is 1 per second per + + + +Internet Engineering Task Force [Page 22] + + + + +RFC1122 LINK LAYER October 1989 + + + destination. + + DISCUSSION: + The ARP specification [LINK:2] suggests but does not + require a timeout mechanism to invalidate cache entries + when hosts change their Ethernet addresses. 
The + prevalence of proxy ARP (see Section 2.4 of [INTRO:2]) + has significantly increased the likelihood that cache + entries in hosts will become invalid, and therefore + some ARP-cache invalidation mechanism is now required + for hosts. Even in the absence of proxy ARP, a long- + period cache timeout is useful in order to + automatically correct any bad ARP data that might have + been cached. + + IMPLEMENTATION: + Four mechanisms have been used, sometimes in + combination, to flush out-of-date cache entries. + + (1) Timeout -- Periodically time out cache entries, + even if they are in use. Note that this timeout + should be restarted when the cache entry is + "refreshed" (by observing the source fields, + regardless of target address, of an ARP broadcast + from the system in question). For proxy ARP + situations, the timeout needs to be on the order + of a minute. + + (2) Unicast Poll -- Actively poll the remote host by + periodically sending a point-to-point ARP Request + to it, and delete the entry if no ARP Reply is + received from N successive polls. Again, the + timeout should be on the order of a minute, and + typically N is 2. + + (3) Link-Layer Advice -- If the link-layer driver + detects a delivery problem, flush the + corresponding ARP cache entry. + + (4) Higher-layer Advice -- Provide a call from the + Internet layer to the link layer to indicate a + delivery problem. The effect of this call would + be to invalidate the corresponding cache entry. + This call would be analogous to the + "ADVISE_DELIVPROB()" call from the transport layer + to the Internet layer (see Section 3.4), and in + fact the ADVISE_DELIVPROB routine might in turn + call the link-layer advice routine to invalidate + + + +Internet Engineering Task Force [Page 23] + + + + +RFC1122 LINK LAYER October 1989 + + + the ARP cache entry. + + Approaches (1) and (2) involve ARP cache timeouts on + the order of a minute or less. 
In the absence of proxy + ARP, a timeout this short could create noticeable + overhead traffic on a very large Ethernet. Therefore, + it may be necessary to configure a host to lengthen the + ARP cache timeout. + + 2.3.2.2 ARP Packet Queue + + The link layer SHOULD save (rather than discard) at least + one (the latest) packet of each set of packets destined to + the same unresolved IP address, and transmit the saved + packet when the address has been resolved. + + DISCUSSION: + Failure to follow this recommendation causes the first + packet of every exchange to be lost. Although higher- + layer protocols can generally cope with packet loss by + retransmission, packet loss does impact performance. + For example, loss of a TCP open request causes the + initial round-trip time estimate to be inflated. UDP- + based applications such as the Domain Name System are + more seriously affected. + + 2.3.3 Ethernet and IEEE 802 Encapsulation + + The IP encapsulation for Ethernets is described in RFC-894 + [LINK:3], while RFC-1042 [LINK:4] describes the IP + encapsulation for IEEE 802 networks. RFC-1042 elaborates and + replaces the discussion in Section 3.4 of [INTRO:2]. + + Every Internet host connected to a 10Mbps Ethernet cable: + + o MUST be able to send and receive packets using RFC-894 + encapsulation; + + o SHOULD be able to receive RFC-1042 packets, intermixed + with RFC-894 packets; and + + o MAY be able to send packets using RFC-1042 encapsulation. + + + An Internet host that implements sending both the RFC-894 and + the RFC-1042 encapsulations MUST provide a configuration switch + to select which is sent, and this switch MUST default to RFC- + 894. 
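The ARP requirements of Sections 2.3.2.1 and 2.3.2.2 (flush out-of-date entries, send at most about one ARP Request per second per destination, and keep the latest packet while resolution is pending) could be combined roughly as in the following Python sketch. All class, method, and parameter names are hypothetical, actions are returned as tuples rather than transmitted, and timeout-based flushing is only one of the four mechanisms listed in Section 2.3.2.1.

```python
import time


class ArpResolver:
    """Toy ARP state for one interface (illustrative only)."""

    def __init__(self, timeout_s=60.0, min_request_interval_s=1.0,
                 clock=time.monotonic):
        self.timeout_s = timeout_s                  # configurable cache timeout
        self.min_request_interval_s = min_request_interval_s
        self.clock = clock
        self.cache = {}         # ip -> (link_addr, time_learned)
        self.pending = {}       # ip -> latest packet awaiting resolution
        self.last_request = {}  # ip -> time the last ARP Request was sent

    def lookup(self, ip):
        """Return the cached link-layer address, flushing a stale entry."""
        entry = self.cache.get(ip)
        if entry is None:
            return None
        link_addr, learned = entry
        if self.clock() - learned > self.timeout_s:
            del self.cache[ip]
            return None
        return link_addr

    def send(self, ip, packet):
        """Return the actions to take for an outgoing packet."""
        link_addr = self.lookup(ip)
        if link_addr is not None:
            return [("transmit", link_addr, packet)]
        self.pending[ip] = packet   # save (rather than discard) the latest packet
        now = self.clock()
        last = self.last_request.get(ip)
        if last is not None and now - last < self.min_request_interval_s:
            return []               # rate limit: do not flood ARP Requests
        self.last_request[ip] = now
        return [("arp_request", ip)]

    def learn(self, ip, link_addr):
        """Record a resolved address and release any saved packet."""
        self.cache[ip] = (link_addr, self.clock())
        packet = self.pending.pop(ip, None)
        if packet is None:
            return []
        return [("transmit", link_addr, packet)]
```

Injecting the clock keeps the sketch testable; a real implementation would more likely hang this state off the interface driver and refresh entries from observed ARP broadcasts.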
+ + + +Internet Engineering Task Force [Page 24] + + + + +RFC1122 LINK LAYER October 1989 + + + Note that the standard IP encapsulation in RFC-1042 does not + use the protocol id value (K1=6) that IEEE reserved for IP; + instead, it uses a value (K1=170) that implies an extension + (the "SNAP") which can be used to hold the Ether-Type field. + An Internet system MUST NOT send 802 packets using K1=6. + + Address translation from Internet addresses to link-layer + addresses on Ethernet and IEEE 802 networks MUST be managed by + the Address Resolution Protocol (ARP). + + The MTU for an Ethernet is 1500 and for 802.3 is 1492. + + DISCUSSION: + The IEEE 802.3 specification provides for operation over a + 10Mbps Ethernet cable, in which case Ethernet and IEEE + 802.3 frames can be physically intermixed. A receiver can + distinguish Ethernet and 802.3 frames by the value of the + 802.3 Length field; this two-octet field coincides in the + header with the Ether-Type field of an Ethernet frame. In + particular, the 802.3 Length field must be less than or + equal to 1500, while all valid Ether-Type values are + greater than 1500. + + Another compatibility problem arises with link-layer + broadcasts. A broadcast sent with one framing will not be + seen by hosts that can receive only the other framing. + + The provisions of this section were designed to provide + direct interoperation between 894-capable and 1042-capable + systems on the same cable, to the maximum extent possible. + It is intended to support the present situation where + 894-only systems predominate, while providing an easy + transition to a possible future in which 1042-capable + systems become common. + + Note that 894-only systems cannot interoperate directly + with 1042-only systems. If the two system types are set + up as two different logical networks on the same cable, + they can communicate only through an IP gateway. 
+ Furthermore, it is not useful or even possible for a + dual-format host to discover automatically which format to + send, because of the problem of link-layer broadcasts. + + 2.4 LINK/INTERNET LAYER INTERFACE + + The packet receive interface between the IP layer and the link + layer MUST include a flag to indicate whether the incoming packet + was addressed to a link-layer broadcast address. + + + +Internet Engineering Task Force [Page 25] + + + + +RFC1122 LINK LAYER October 1989 + + + DISCUSSION + Although the IP layer does not generally know link layer + addresses (since every different network medium typically has + a different address format), the broadcast address on a + broadcast-capable medium is an important special case. See + Section 3.2.2, especially the DISCUSSION concerning broadcast + storms. + + The packet send interface between the IP and link layers MUST + include the 5-bit TOS field (see Section 3.2.1.6). + + The link layer MUST NOT report a Destination Unreachable error to + IP solely because there is no ARP cache entry for a destination. + + 2.5 LINK LAYER REQUIREMENTS SUMMARY
+
+                                                  |       | | | |S| |
+                                                  |       | | | |H| |F
+                                                  |       | | | |O|M|o
+                                                  |       | |S| |U|U|o
+                                                  |       | |H| |L|S|t
+                                                  |       |M|O| |D|T|n
+                                                  |       |U|U|M| | |o
+                                                  |       |S|L|A|N|N|t
+                                                  |       |T|D|Y|O|O|t
+FEATURE                                           |SECTION| | | |T|T|e
+--------------------------------------------------|-------|-|-|-|-|-|--
+                                                  |       | | | | | |
+Trailer encapsulation                             |2.3.1  | | |x| | |
+Send Trailers by default without negotiation      |2.3.1  | | | | |x|
+ARP                                               |2.3.2  | | | | | |
+  Flush out-of-date ARP cache entries             |2.3.2.1|x| | | | |
+  Prevent ARP floods                              |2.3.2.1|x| | | | |
+  Cache timeout configurable                      |2.3.2.1| |x| | | |
+  Save at least one (latest) unresolved pkt       |2.3.2.2| |x| | | |
+Ethernet and IEEE 802 Encapsulation               |2.3.3  | | | | | |
+  Host able to:                                   |2.3.3  | | | | | |
+    Send & receive RFC-894 encapsulation          |2.3.3  |x| | | | |
+    Receive RFC-1042 encapsulation                |2.3.3  | |x| | | |
+    Send RFC-1042 encapsulation                   |2.3.3  | | |x| | |
+      Then config. sw. to select, RFC-894 dflt    |2.3.3  |x| | | | |
+    Send K1=6 encapsulation                       |2.3.3  | | | | |x|
+  Use ARP on Ethernet and IEEE 802 nets           |2.3.3  |x| | | | |
+Link layer report b'casts to IP layer             |2.4    |x| | | | |
+IP layer pass TOS to link layer                   |2.4    |x| | | | |
+No ARP cache entry treated as Dest. Unreach.      |2.4    | | | | |x|
+ + + + + +Internet Engineering Task Force [Page 26] + + + + +RFC1122 INTERNET LAYER October 1989 + + +3. INTERNET LAYER PROTOCOLS + + 3.1 INTRODUCTION + + The Robustness Principle: "Be liberal in what you accept, and + conservative in what you send" is particularly important in the + Internet layer, where one misbehaving host can deny Internet + service to many other hosts. + + The protocol standards used in the Internet layer are: + + o RFC-791 [IP:1] defines the IP protocol and gives an + introduction to the architecture of the Internet. + + o RFC-792 [IP:2] defines ICMP, which provides routing, + diagnostic and error functionality for IP. Although ICMP + messages are encapsulated within IP datagrams, ICMP + processing is considered to be (and is typically implemented + as) part of the IP layer. See Section 3.2.2. + + o RFC-950 [IP:3] defines the mandatory subnet extension to the + addressing architecture. + + o RFC-1112 [IP:4] defines the Internet Group Management + Protocol IGMP, as part of a recommended extension to hosts + and to the host-gateway interface to support Internet-wide + multicasting at the IP level. See Section 3.2.3. + + The target of an IP multicast may be an arbitrary group of + Internet hosts. IP multicasting is designed as a natural + extension of the link-layer multicasting facilities of some + networks, and it provides a standard means for local access + to such link-layer multicasting facilities. + + Other important references are listed in Section 5 of this + document. + + The Internet layer of host software MUST implement both IP and + ICMP. See Section 3.3.7 for the requirements on support of IGMP.
+ + The host IP layer has two basic functions: (1) choose the "next + hop" gateway or host for outgoing IP datagrams and (2) reassemble + incoming IP datagrams. The IP layer may also (3) implement + intentional fragmentation of outgoing datagrams. Finally, the IP + layer must (4) provide diagnostic and error functionality. We + expect that IP layer functions may increase somewhat in the + future, as further Internet control and management facilities are + developed. + + + +Internet Engineering Task Force [Page 27] + + + + +RFC1122 INTERNET LAYER October 1989 + + + For normal datagrams, the processing is straightforward. For + incoming datagrams, the IP layer: + + (1) verifies that the datagram is correctly formatted; + + (2) verifies that it is destined to the local host; + + (3) processes options; + + (4) reassembles the datagram if necessary; and + + (5) passes the encapsulated message to the appropriate + transport-layer protocol module. + + For outgoing datagrams, the IP layer: + + (1) sets any fields not set by the transport layer; + + (2) selects the correct first hop on the connected network (a + process called "routing"); + + (3) fragments the datagram if necessary and if intentional + fragmentation is implemented (see Section 3.3.3); and + + (4) passes the packet(s) to the appropriate link-layer driver. + + + A host is said to be multihomed if it has multiple IP addresses. + Multihoming introduces considerable confusion and complexity into + the protocol suite, and it is an area in which the Internet + architecture falls seriously short of solving all problems. There + are two distinct problem areas in multihoming: + + (1) Local multihoming -- the host itself is multihomed; or + + (2) Remote multihoming -- the local host needs to communicate + with a remote multihomed host. + + At present, remote multihoming MUST be handled at the application + layer, as discussed in the companion RFC [INTRO:1]. 
A host MAY + support local multihoming, which is discussed in this document, + and in particular in Section 3.3.4. + + Any host that forwards datagrams generated by another host is + acting as a gateway and MUST also meet the specifications laid out + in the gateway requirements RFC [INTRO:2]. An Internet host that + includes embedded gateway code MUST have a configuration switch to + disable the gateway function, and this switch MUST default to the + + + +Internet Engineering Task Force [Page 28] + + + + +RFC1122 INTERNET LAYER October 1989 + + + non-gateway mode. In this mode, a datagram arriving through one + interface will not be forwarded to another host or gateway (unless + it is source-routed), regardless of whether the host is single- + homed or multihomed. The host software MUST NOT automatically + move into gateway mode if the host has more than one interface, as + the operator of the machine may neither want to provide that + service nor be competent to do so. + + In the following, the action specified in certain cases is to + "silently discard" a received datagram. This means that the + datagram will be discarded without further processing and that the + host will not send any ICMP error message (see Section 3.2.2) as a + result. However, for diagnosis of problems a host SHOULD provide + the capability of logging the error (see Section 1.2.3), including + the contents of the silently-discarded datagram, and SHOULD record + the event in a statistics counter. + + DISCUSSION: + Silent discard of erroneous datagrams is generally intended + to prevent "broadcast storms". + + 3.2 PROTOCOL WALK-THROUGH + + 3.2.1 Internet Protocol -- IP + + 3.2.1.1 Version Number: RFC-791 Section 3.1 + + A datagram whose version number is not 4 MUST be silently + discarded. + + 3.2.1.2 Checksum: RFC-791 Section 3.1 + + A host MUST verify the IP header checksum on every received + datagram and silently discard every datagram that has a bad + checksum. 
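A minimal Python sketch of the two header checks in Sections 3.2.1.1 and 3.2.1.2, assuming the standard one's-complement header checksum defined in RFC-791. The function names are illustrative, and a return value of False stands for "silently discard".

```python
import struct


def ip_header_checksum(header: bytes) -> int:
    """One's-complement of the one's-complement sum of the 16-bit words."""
    if len(header) % 2:
        raise ValueError("IP header length must be a multiple of 2 octets")
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def accept_header(datagram: bytes) -> bool:
    """Return False if the datagram should be silently discarded."""
    if len(datagram) < 20:                  # shorter than a minimal IP header
        return False
    version = datagram[0] >> 4
    header_len = (datagram[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    if version != 4 or header_len < 20 or len(datagram) < header_len:
        return False
    # Summing a valid header, checksum field included, yields zero.
    return ip_header_checksum(datagram[:header_len]) == 0
```

The verification relies on the property that recomputing the checksum over a header that already contains a correct checksum field gives zero, so no field needs to be blanked out on receive.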
+ + 3.2.1.3 Addressing: RFC-791 Section 3.2 + + There are now five classes of IP addresses: Class A through + Class E. Class D addresses are used for IP multicasting + [IP:4], while Class E addresses are reserved for + experimental use. + + A multicast (Class D) address is a 28-bit logical address + that stands for a group of hosts, and may be either + permanent or transient. Permanent multicast addresses are + allocated by the Internet Assigned Number Authority + [INTRO:6], while transient addresses may be allocated + + + +Internet Engineering Task Force [Page 29] + + + + +RFC1122 INTERNET LAYER October 1989 + + + dynamically to transient groups. Group membership is + determined dynamically using IGMP [IP:4]. + + We now summarize the important special cases for Class A, B, + and C IP addresses, using the following notation for an IP + address: + + { <Network-number>, <Host-number> } + + or + { <Network-number>, <Subnet-number>, <Host-number> } + + and the notation "-1" for a field that contains all 1 bits. + This notation is not intended to imply that the 1-bits in an + address mask need be contiguous. + + (a) { 0, 0 } + + This host on this network. MUST NOT be sent, except as + a source address as part of an initialization procedure + by which the host learns its own IP address. + + See also Section 3.3.6 for a non-standard use of {0,0}. + + (b) { 0, <Host-number> } + + Specified host on this network. It MUST NOT be sent, + except as a source address as part of an initialization + procedure by which the host learns its full IP address. + + (c) { -1, -1 } + + Limited broadcast. It MUST NOT be used as a source + address. + + A datagram with this destination address will be + received by every host on the connected physical + network but will not be forwarded outside that network. + + (d) { <Network-number>, -1 } + + Directed broadcast to the specified network. It MUST + NOT be used as a source address. 
+ + (e) { <Network-number>, <Subnet-number>, -1 } + + Directed broadcast to the specified subnet. It MUST + NOT be used as a source address. + + (f) { <Network-number>, -1, -1 } + + Directed broadcast to all subnets of the specified + subnetted network. It MUST NOT be used as a source + address. + + (g) { 127, <any> } + + Internal host loopback address. Addresses of this form + MUST NOT appear outside a host. + + The <Network-number> is administratively assigned so that + its value will be unique in the entire world. + + IP addresses are not permitted to have the value 0 or -1 for + any of the <Host-number>, <Network-number>, or <Subnet- + number> fields (except in the special cases listed above). + This implies that each of these fields will be at least two + bits long. + + For further discussion of broadcast addresses, see Section + 3.3.6. + + A host MUST support the subnet extensions to IP [IP:3]. As + a result, there will be an address mask of the form: + {-1, -1, 0} associated with each of the host's local IP + addresses; see Sections 3.2.2.9 and 3.3.1.1. + + When a host sends any datagram, the IP source address MUST + be one of its own IP addresses (but not a broadcast or + multicast address). + + A host MUST silently discard an incoming datagram that is + not destined for the host. An incoming datagram is destined + for the host if the datagram's destination address field is: + + (1) (one of) the host's IP address(es); or + + (2) an IP broadcast address valid for the connected + network; or + + (3) the address for a multicast group of which the host is + a member on the incoming physical interface.
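The three acceptance cases (1) through (3) above can be sketched as a single predicate. This is an illustration only: the function name is_destined_for_host and its parameters are hypothetical, and the sketch assumes the caller already maintains the host's own addresses, the broadcast addresses valid for the connected networks, and the multicast group memberships keyed by incoming interface name.

```python
import ipaddress


def is_destined_for_host(dst, iface, local_addrs, broadcast_addrs, groups_by_iface):
    """Return True when an incoming datagram is destined for this host."""
    dst = ipaddress.IPv4Address(dst)
    # (1) one of the host's own IP addresses
    if dst in local_addrs:
        return True
    # (2) an IP broadcast address valid for a connected network
    if dst in broadcast_addrs:
        return True
    # (3) a multicast group joined on the incoming physical interface
    return dst in groups_by_iface.get(iface, set())
```

A datagram for which the predicate is False would be silently discarded. Note that membership is checked per interface, so a group joined on one interface does not cause acceptance on another.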
+ + For most purposes, a datagram addressed to a broadcast or + multicast destination is processed as if it had been + addressed to one of the host's IP addresses; we use the term + "specific-destination address" for the equivalent local IP + address of the host. The specific-destination address is + defined to be the destination address in the IP header + unless the header contains a broadcast or multicast address, + in which case the specific-destination address is an IP address + assigned to the physical interface on which the datagram + arrived. + + A host MUST silently discard an incoming datagram containing + an IP source address that is invalid by the rules of this + section. This validation could be done in either the IP + layer or by each protocol in the transport layer. + + DISCUSSION: + A mis-addressed datagram might be caused by a link- + layer broadcast of a unicast datagram or by a gateway + or host that is confused or mis-configured. + + An architectural goal for Internet hosts was to allow + IP addresses to be featureless 32-bit numbers, avoiding + algorithms that required a knowledge of the IP address + format. Otherwise, any future change in the format or + interpretation of IP addresses will require host + software changes. However, validation of broadcast and + multicast addresses violates this goal; a few other + violations are described elsewhere in this document. + + Implementers should be aware that applications + depending upon the all-subnets directed broadcast + address (f) may be unusable on some networks. All- + subnets broadcast is not widely implemented in vendor + gateways at present, and even when it is implemented, a + particular network administration may disable it in the + gateway configuration. + + 3.2.1.4 Fragmentation and Reassembly: RFC-791 Section 3.2 + + The Internet model requires that every host support + reassembly.
See Sections 3.3.2 and 3.3.3 for the + requirements on fragmentation and reassembly. + + 3.2.1.5 Identification: RFC-791 Section 3.2 + + When sending an identical copy of an earlier datagram, a + host MAY optionally retain the same Identification field in + the copy. + + DISCUSSION: + Some Internet protocol experts have maintained that + when a host sends an identical copy of an earlier + datagram, the new copy should contain the same + Identification value as the original. There are two + suggested advantages: (1) if the datagrams are + fragmented and some of the fragments are lost, the + receiver may be able to reconstruct a complete datagram + from fragments of the original and the copies; (2) a + congested gateway might use the IP Identification field + (and Fragment Offset) to discard duplicate datagrams + from the queue. + + However, the observed patterns of datagram loss in the + Internet do not favor the probability of retransmitted + fragments filling reassembly gaps, while other + mechanisms (e.g., TCP repacketizing upon + retransmission) tend to prevent retransmission of an + identical datagram [IP:9]. Therefore, we believe that + retransmitting the same Identification field is not + useful. Also, a connectionless transport protocol like + UDP would require the cooperation of the application + programs to retain the same Identification value in + identical datagrams. + + 3.2.1.6 Type-of-Service: RFC-791 Section 3.2 + + The "Type-of-Service" byte in the IP header is divided into + two sections: the Precedence field (high-order 3 bits), and + a field that is customarily called "Type-of-Service" or + "TOS" (low-order 5 bits). In this document, all references + to "TOS" or the "TOS field" refer to the low-order 5 bits + only. + + The Precedence field is intended for Department of Defense + applications of the Internet protocols.
The use of non-zero + values in this field is outside the scope of this document + and the IP standard specification. Vendors should consult + the Defense Communications Agency (DCA) for guidance on the + IP Precedence field and its implications for other protocol + layers. However, vendors should note that the use of + precedence will most likely require that its value be passed + between protocol layers in just the same way as the TOS + field is passed. + + The IP layer MUST provide a means for the transport layer to + set the TOS field of every datagram that is sent; the + default is all zero bits. The IP layer SHOULD pass received + TOS values up to the transport layer. + + The particular link-layer mappings of TOS contained in RFC- + 795 SHOULD NOT be implemented. + + DISCUSSION: + While the TOS field has been little used in the past, + it is expected to play an increasing role in the near + future. The TOS field is expected to be used to + control two aspects of gateway operations: routing and + queueing algorithms. See Section 2 of [INTRO:1] for + the requirements on application programs to specify TOS + values. + + The TOS field may also be mapped into link-layer + service selectors. This has been applied to provide + effective sharing of serial lines by different classes + of TCP traffic, for example. However, the mappings + suggested in RFC-795 for networks that were included in + the Internet as of 1981 are now obsolete. + + 3.2.1.7 Time-to-Live: RFC-791 Section 3.2 + + A host MUST NOT send a datagram with a Time-to-Live (TTL) + value of zero. + + A host MUST NOT discard a datagram just because it was + received with TTL less than 2. + + The IP layer MUST provide a means for the transport layer to + set the TTL field of every datagram that is sent. When a + fixed TTL value is used, it MUST be configurable.
The + current suggested value will be published in the "Assigned + Numbers" RFC. + + DISCUSSION: + The TTL field has two functions: limit the lifetime of + TCP segments (see RFC-793 [TCP:1], p. 28), and + terminate Internet routing loops. Although TTL is a + time in seconds, it also has some attributes of a hop- + count, since each gateway is required to reduce the TTL + field by at least one. + + The intent is that TTL expiration will cause a datagram + to be discarded by a gateway but not by the destination + host; however, hosts that act as gateways by forwarding + datagrams must follow the gateway rules for TTL. + + A higher-layer protocol may want to set the TTL in + order to implement an "expanding scope" search for some + Internet resource. This is used by some diagnostic + tools, and is expected to be useful for locating the + "nearest" server of a given class using IP + multicasting, for example. A particular transport + protocol may also want to specify its own TTL bound on + maximum datagram lifetime. + + A fixed value must be at least big enough for the + Internet "diameter," i.e., the longest possible path. + A reasonable value is about twice the diameter, to + allow for continued Internet growth. + + 3.2.1.8 Options: RFC-791 Section 3.2 + + There MUST be a means for the transport layer to specify IP + options to be included in transmitted IP datagrams (see + Section 3.4). + + All IP options (except NOP or END-OF-LIST) received in + datagrams MUST be passed to the transport layer (or to ICMP + processing when the datagram is an ICMP message). The IP + and transport layer MUST each interpret those IP options + that they understand and silently ignore the others. + + Later sections of this document discuss specific IP option + support required by each of ICMP, TCP, and UDP.
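As an illustration of how a receiver might collect the options to be passed upward, the following sketch walks the option area of an IP header. It is not taken from the RFC; the function name parse_ip_options is hypothetical, and it assumes the caller supplies only the option octets that follow the fixed 20-octet header. NOP (type 1) and END-OF-LIST (type 0) are single octets; every other option carries a length octet that counts the type and length octets themselves, so the sketch rejects lengths below 2 or beyond the end of the buffer rather than looping or reading out of bounds.

```python
END_OF_LIST = 0
NOP = 1


def parse_ip_options(raw: bytes):
    """Return a list of (option_type, option_data) pairs.

    Raises ValueError when an option length is outside the possible range.
    """
    options = []
    i = 0
    while i < len(raw):
        opt_type = raw[i]
        if opt_type == END_OF_LIST:
            break
        if opt_type == NOP:
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("option truncated before its length octet")
        length = raw[i + 1]
        # A length below 2 would never advance; one past the end would overrun.
        if length < 2 or i + length > len(raw):
            raise ValueError("option length out of range")
        options.append((opt_type, raw[i + 2:i + length]))
        i += length
    return options
```

Each layer would then pick the option types it understands from the returned list and ignore the rest. How a malformed option area is reported (for example, with a Parameter Problem message) is left to the caller in this sketch.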
+ + DISCUSSION: + Passing all received IP options to the transport layer + is a deliberate "violation of strict layering" that is + designed to ease the introduction of new transport- + relevant IP options in the future. Each layer must + pick out any options that are relevant to its own + processing and ignore the rest. For this purpose, + every IP option except NOP and END-OF-LIST will include + a specification of its own length. + + This document does not define the order in which a + receiver must process multiple options in the same IP + header. Hosts sending multiple options must be aware + that this introduces an ambiguity in the meaning of + certain options when combined with a source-route + option. + + IMPLEMENTATION: + The IP layer must not crash as the result of an option + length that is outside the possible range. For + example, erroneous option lengths have been observed to + put some IP implementations into infinite loops. + + Here are the requirements for specific IP options: + + + (a) Security Option + + Some environments require the Security option in every + datagram; such a requirement is outside the scope of + this document and the IP standard specification. Note, + however, that the security options described in RFC-791 + and RFC-1038 are obsolete. For DoD applications, + vendors should consult [IP:8] for guidance. + + + (b) Stream Identifier Option + + This option is obsolete; it SHOULD NOT be sent, and it + MUST be silently ignored if received. + + + (c) Source Route Options + + A host MUST support originating a source route and MUST + be able to act as the final destination of a source + route.
+ + If a host receives a datagram containing a completed + source route (i.e., the pointer points beyond the last + field), the datagram has reached its final destination; + the option as received (the recorded route) MUST be + passed up to the transport layer (or to ICMP message + processing). This recorded route will be reversed and + used to form a return source route for reply datagrams + (see discussion of IP Options in Section 4). When a + return source route is built, it MUST be correctly + formed even if the recorded route included the source + host (see case (B) in the discussion below). + + An IP header containing more than one Source Route + option MUST NOT be sent; the effect on routing of + multiple Source Route options is implementation- + specific. + + Section 3.3.5 presents the rules for a host acting as + an intermediate hop in a source route, i.e., forwarding + a source-routed datagram. + + DISCUSSION: + If a source-routed datagram is fragmented, each + fragment will contain a copy of the source route. + Since the processing of IP options (including a + source route) must precede reassembly, the + original datagram will not be reassembled until + the final destination is reached. + + Suppose a source routed datagram is to be routed + from host S to host D via gateways G1, G2, ... Gn. + There was an ambiguity in the specification over + whether the source route option in a datagram sent + out by S should be (A) or (B): + + (A): {>>G2, G3, ... Gn, D} <--- CORRECT + + (B): {S, >>G2, G3, ... Gn, D} <---- WRONG + + (where >> represents the pointer). If (A) is + sent, the datagram received at D will contain the + option: {G1, G2, ... Gn >>}, with S and D as the + IP source and destination addresses.
If (B) were + sent, the datagram received at D would again + contain S and D as the same IP source and + destination addresses, but the option would be: + {S, G1, ...Gn >>}; i.e., the originating host + would be the first hop in the route. + + + (d) Record Route Option + + Implementation of originating and processing the Record + Route option is OPTIONAL. + + + (e) Timestamp Option + + Implementation of originating and processing the + Timestamp option is OPTIONAL. If it is implemented, + the following rules apply: + + o The originating host MUST record a timestamp in a + Timestamp option whose Internet address fields are + not pre-specified or whose first pre-specified + address is the host's interface address. + + o The destination host MUST (if possible) add the + current timestamp to a Timestamp option before + passing the option to the transport layer or to + ICMP for processing. + + o A timestamp value MUST follow the rules given in + Section 3.2.2.8 for the ICMP Timestamp message. + + + 3.2.2 Internet Control Message Protocol -- ICMP + + ICMP messages are grouped into two classes. + + * + ICMP error messages: + + Destination Unreachable (see Section 3.2.2.1) + Redirect (see Section 3.2.2.2) + Source Quench (see Section 3.2.2.3) + Time Exceeded (see Section 3.2.2.4) + Parameter Problem (see Section 3.2.2.5) + + + * + ICMP query messages: + + Echo (see Section 3.2.2.6) + Information (see Section 3.2.2.7) + Timestamp (see Section 3.2.2.8) + Address Mask (see Section 3.2.2.9) + + + If an ICMP message of unknown type is received, it MUST be + silently discarded. + + Every ICMP error message includes the Internet header and at + least the first 8 data octets of the datagram that triggered + the error; more than 8 octets MAY be sent; this header and data + MUST be unchanged from the received datagram.
+ + In those cases where the Internet layer is required to pass an + ICMP error message to the transport layer, the IP protocol + number MUST be extracted from the original header and used to + select the appropriate transport protocol entity to handle the + error. + + An ICMP error message SHOULD be sent with normal (i.e., zero) + TOS bits. + + An ICMP error message MUST NOT be sent as the result of + receiving: + + * an ICMP error message, or + + * a datagram destined to an IP broadcast or IP multicast + address, or + + * a datagram sent as a link-layer broadcast, or + + * a non-initial fragment, or + + * a datagram whose source address does not define a single + host -- e.g., a zero address, a loopback address, a + broadcast address, a multicast address, or a Class E + address. + + NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT + ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES. + + DISCUSSION: + These rules will prevent the "broadcast storms" that have + resulted from hosts returning ICMP error messages in + response to broadcast datagrams. For example, a broadcast + UDP segment to a non-existent port could trigger a flood + of ICMP Destination Unreachable datagrams from all + machines that do not have a client for that destination + port. On a large Ethernet, the resulting collisions can + render the network useless for a second or more. + + Every datagram that is broadcast on the connected network + should have a valid IP broadcast address as its IP + destination (see Section 3.3.6). However, some hosts + violate this rule. To be certain to detect broadcast + datagrams, therefore, hosts are required to check for a + link-layer broadcast as well as an IP-layer broadcast + address. + + IMPLEMENTATION: + This requires that the link layer inform the IP layer when + a link-layer broadcast datagram has been received; see + Section 2.4.
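The restrictions on sending ICMP error messages listed above can be collected into one guard function. The following is a sketch, not an implementation from the RFC: the Received record and the may_send_icmp_error function are hypothetical names, and the sketch assumes the caller has already determined whether the datagram arrived as a link-layer broadcast and whether its source or destination is a directed or limited IP broadcast for a connected network (that determination needs the address masks, which are outside this example).

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Received:
    src: str
    dst: str
    fragment_offset: int = 0
    is_icmp_error: bool = False
    link_layer_broadcast: bool = False   # reported by the link layer
    dst_is_ip_broadcast: bool = False    # decided by the caller from its masks
    src_is_ip_broadcast: bool = False    # decided by the caller from its masks


def may_send_icmp_error(d: Received) -> bool:
    """Return False when an ICMP error MUST NOT be sent for this datagram."""
    src = ipaddress.IPv4Address(d.src)
    dst = ipaddress.IPv4Address(d.dst)
    if d.is_icmp_error:
        return False
    if d.dst_is_ip_broadcast or dst.is_multicast:
        return False
    if d.link_layer_broadcast:
        return False
    if d.fragment_offset != 0:           # non-initial fragment
        return False
    # Source must define a single host: reject network number 0, loopback,
    # broadcast, multicast (Class D), and Class E (240.0.0.0 and above).
    src_int = int(src)
    if (src_int >> 24) == 0 or src.is_loopback or d.src_is_ip_broadcast:
        return False
    if src.is_multicast or src_int >= 0xF0000000:
        return False
    return True
```

Because these restrictions take precedence over any other requirement to send an error, a natural design is to call such a guard at the single point where ICMP errors are emitted, rather than at each place that detects an error.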
+ + 3.2.2.1 Destination Unreachable: RFC-792 + + The following additional codes are hereby defined: + + 6 = destination network unknown + + 7 = destination host unknown + + 8 = source host isolated + + 9 = communication with destination network + administratively prohibited + + 10 = communication with destination host + administratively prohibited + + 11 = network unreachable for type of service + + 12 = host unreachable for type of service + + A host SHOULD generate Destination Unreachable messages with + code: + + 2 (Protocol Unreachable), when the designated transport + protocol is not supported; or + + 3 (Port Unreachable), when the designated transport + protocol (e.g., UDP) is unable to demultiplex the + datagram but has no protocol mechanism to inform the + sender. + + A Destination Unreachable message that is received MUST be + reported to the transport layer. The transport layer SHOULD + use the information appropriately; for example, see Sections + 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol + that has its own mechanism for notifying the sender that a + port is unreachable (e.g., TCP, which sends RST segments) + MUST nevertheless accept an ICMP Port Unreachable for the + same purpose. + + A Destination Unreachable message that is received with code + 0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a + routing transient and MUST therefore be interpreted as only + a hint, not proof, that the specified destination is + unreachable [IP:11]. For example, it MUST NOT be used as + proof of a dead gateway (see Section 3.3.1). + + 3.2.2.2 Redirect: RFC-792 + + A host SHOULD NOT send an ICMP Redirect message; Redirects + are to be sent only by gateways. + + A host receiving a Redirect message MUST update its routing + information accordingly.
Every host MUST be prepared to + accept both Host and Network Redirects and to process them + as described in Section 3.3.1.2 below. + + A Redirect message SHOULD be silently discarded if the new + gateway address it specifies is not on the same connected + (sub-) net through which the Redirect arrived [INTRO:2, + Appendix A], or if the source of the Redirect is not the + current first-hop gateway for the specified destination (see + Section 3.3.1). + + 3.2.2.3 Source Quench: RFC-792 + + A host MAY send a Source Quench message if it is + approaching, or has reached, the point at which it is forced + to discard incoming datagrams due to a shortage of + reassembly buffers or other resources. See Section 2.2.3 of + [INTRO:2] for suggestions on when to send Source Quench. + + If a Source Quench message is received, the IP layer MUST + report it to the transport layer (or ICMP processing). In + general, the transport or application layer SHOULD implement + a mechanism to respond to Source Quench for any protocol + that can send a sequence of datagrams to the same + destination and which can reasonably be expected to maintain + enough state information to make this feasible. See Section + 4 for the handling of Source Quench by TCP and UDP. + + DISCUSSION: + A Source Quench may be generated by the target host or + by some gateway in the path of a datagram. The host + receiving a Source Quench should throttle itself back + for a period of time, then gradually increase the + transmission rate again. The mechanism to respond to + Source Quench may be in the transport layer (for + connection-oriented protocols like TCP) or in the + application layer (for protocols that are built on top + of UDP).
+ + A mechanism has been proposed [IP:14] to make the IP + layer respond directly to Source Quench by controlling + the rate at which datagrams are sent; however, this + proposal is experimental and is not currently + recommended. + + 3.2.2.4 Time Exceeded: RFC-792 + + An incoming Time Exceeded message MUST be passed to the + transport layer. + + DISCUSSION: + A gateway will send a Time Exceeded Code 0 (In Transit) + message when it discards a datagram due to an expired + TTL field. This indicates either a gateway routing + loop or too small an initial TTL value. + + A host may receive a Time Exceeded Code 1 (Reassembly + Timeout) message from a destination host that has timed + out and discarded an incomplete datagram; see Section + 3.3.2 below. In the future, receipt of this message + might be part of some "MTU discovery" procedure, to + discover the maximum datagram size that can be sent on + the path without fragmentation. + + 3.2.2.5 Parameter Problem: RFC-792 + + A host SHOULD generate Parameter Problem messages. An + incoming Parameter Problem message MUST be passed to the + transport layer, and it MAY be reported to the user. + + DISCUSSION: + The ICMP Parameter Problem message is sent to the + source host for any problem not specifically covered by + another ICMP message. Receipt of a Parameter Problem + message generally indicates some local or remote + implementation error. + + A new variant on the Parameter Problem message is hereby + defined: + Code 1 = required option is missing. + + DISCUSSION: + This variant is currently in use in the military + community for a missing security option. + + 3.2.2.6 Echo Request/Reply: RFC-792 + + Every host MUST implement an ICMP Echo server function that + receives Echo Requests and sends corresponding Echo Replies.
+ A host SHOULD also implement an application-layer interface + for sending an Echo Request and receiving an Echo Reply, for + diagnostic purposes. + + An ICMP Echo Request destined to an IP broadcast or IP + multicast address MAY be silently discarded. + + DISCUSSION: + This neutral provision results from a passionate debate + between those who feel that ICMP Echo to a broadcast + address provides a valuable diagnostic capability and + those who feel that misuse of this feature can too + easily create packet storms. + + The IP source address in an ICMP Echo Reply MUST be the same + as the specific-destination address (defined in Section + 3.2.1.3) of the corresponding ICMP Echo Request message. + + Data received in an ICMP Echo Request MUST be entirely + included in the resulting Echo Reply. However, if sending + the Echo Reply requires intentional fragmentation that is + not implemented, the datagram MUST be truncated to maximum + transmission size (see Section 3.3.3) and sent. + + Echo Reply messages MUST be passed to the ICMP user + interface, unless the corresponding Echo Request originated + in the IP layer. + + If a Record Route and/or Timestamp option is received in an + ICMP Echo Request, this option (these options) SHOULD be + updated to include the current host and included in the IP + header of the Echo Reply message, without "truncation". + Thus, the recorded route will be for the entire round trip. + + If a Source Route option is received in an ICMP Echo + Request, the return route MUST be reversed and used as a + Source Route option for the Echo Reply message. + + 3.2.2.7 Information Request/Reply: RFC-792 + + A host SHOULD NOT implement these messages.
+ + DISCUSSION: + The Information Request/Reply pair was intended to + support self-configuring systems such as diskless + workstations, to allow them to discover their IP + network numbers at boot time. However, the RARP and + BOOTP protocols provide better mechanisms for a host to + discover its own IP address. + + 3.2.2.8 Timestamp and Timestamp Reply: RFC-792 + + A host MAY implement Timestamp and Timestamp Reply. If they + are implemented, the following rules MUST be followed. + + o The ICMP Timestamp server function returns a Timestamp + Reply to every Timestamp message that is received. If + this function is implemented, it SHOULD be designed for + minimum variability in delay (e.g., implemented in the + kernel to avoid delay in scheduling a user process). + + The following cases for Timestamp are to be handled + according to the corresponding rules for ICMP Echo: + + o An ICMP Timestamp Request message to an IP broadcast or + IP multicast address MAY be silently discarded. + + o The IP source address in an ICMP Timestamp Reply MUST + be the same as the specific-destination address of the + corresponding Timestamp Request message. + + o If a Source Route option is received in a Timestamp + Request, the return route MUST be reversed and used as + a Source Route option for the Timestamp Reply message. + + o If a Record Route and/or Timestamp option is received + in a Timestamp Request, this (these) option(s) SHOULD + be updated to include the current host and included in + the IP header of the Timestamp Reply message. + + o Incoming Timestamp Reply messages MUST be passed up to + the ICMP user interface. + + The preferred form for a timestamp value (the "standard + value") is in units of milliseconds since midnight Universal + Time. However, it may be difficult to provide this value + with millisecond resolution.
For example, many systems use + clocks that update only at line frequency, 50 or 60 times + per second. Therefore, some latitude is allowed in a + "standard value": + + (a) A "standard value" MUST be updated at least 15 times + per second (i.e., at most the six low-order bits of the + value may be undefined). + + (b) The accuracy of a "standard value" MUST approximate + that of operator-set CPU clocks, i.e., correct within a + few minutes. + + 3.2.2.9 Address Mask Request/Reply: RFC-950 + + A host MUST support the first, and MAY implement all three, + of the following methods for determining the address mask(s) + corresponding to its IP address(es): + + (1) static configuration information; + + (2) obtaining the address mask(s) dynamically as a side- + effect of the system initialization process (see + [INTRO:1]); and + + (3) sending ICMP Address Mask Request(s) and receiving ICMP + Address Mask Reply(s). + + The choice of method to be used in a particular host MUST be + configurable. + + When method (3), the use of Address Mask messages, is + enabled, then: + + (a) When it initializes, the host MUST broadcast an Address + Mask Request message on the connected network + corresponding to the IP address. It MUST retransmit + this message a small number of times if it does not + receive an immediate Address Mask Reply. + + (b) Until it has received an Address Mask Reply, the host + SHOULD assume a mask appropriate for the address class + of the IP address, i.e., assume that the connected + network is not subnetted. + + (c) The first Address Mask Reply message received MUST be + used to set the address mask corresponding to the + particular local IP address. This is true even if the + first Address Mask Reply message is "unsolicited", in + which case it will have been broadcast and may arrive + after the host has ceased to retransmit Address Mask + Requests.
Once the mask has been set by an Address + Mask Reply, later Address Mask Reply messages MUST be + (silently) ignored. + + Conversely, if Address Mask messages are disabled, then no + ICMP Address Mask Requests will be sent, and any ICMP + Address Mask Replies received for that local IP address MUST + be (silently) ignored. + + A host SHOULD make some reasonableness check on any address + mask it installs; see IMPLEMENTATION section below. + + A system MUST NOT send an Address Mask Reply unless it is an + authoritative agent for address masks. An authoritative + agent may be a host or a gateway, but it MUST be explicitly + configured as an address mask agent. Receiving an address + mask via an Address Mask Reply does not give the receiver + authority and MUST NOT be used as the basis for issuing + Address Mask Replies. + + With a statically configured address mask, there SHOULD be + an additional configuration flag that determines whether the + host is to act as an authoritative agent for this mask, + i.e., whether it will answer Address Mask Request messages + using this mask. + + If it is configured as an agent, the host MUST broadcast an + Address Mask Reply for the mask on the appropriate interface + when it initializes. + + See "System Initialization" in [INTRO:1] for more + information about the use of Address Mask Request/Reply + messages. + + DISCUSSION: + Hosts that casually send Address Mask Replies with + invalid address masks have often been a serious + nuisance. To prevent this, Address Mask Replies ought + to be sent only by authoritative agents that have been + selected by explicit administrative action. + + When an authoritative agent receives an Address Mask + Request message, it will send a unicast Address Mask + Reply to the source IP address. If the network part of + this address is zero (see (a) and (b) in 3.2.1.3), the + Reply will be broadcast.
+ + Getting no reply to its Address Mask Request messages, + a host will assume there is no agent and use an + unsubnetted mask, but the agent may be only temporarily + unreachable. An agent will broadcast an unsolicited + Address Mask Reply whenever it initializes, in order to + update the masks of all hosts that have initialized in + the meantime. + + IMPLEMENTATION: + The following reasonableness check on an address mask + is suggested: the mask is not all 1 bits, and it is + either zero or else the 8 highest-order bits are on. + + 3.2.3 Internet Group Management Protocol -- IGMP + + IGMP [IP:4] is a protocol used between hosts and gateways on a + single network to establish hosts' membership in particular + multicast groups. The gateways use this information, in + conjunction with a multicast routing protocol, to support IP + multicasting across the Internet. + + At this time, implementation of IGMP is OPTIONAL; see Section + 3.3.7 for more information. Without IGMP, a host can still + participate in multicasting local to its connected networks. + + 3.3 SPECIFIC ISSUES + + 3.3.1 Routing Outbound Datagrams + + The IP layer chooses the correct next hop for each datagram it + sends. If the destination is on a connected network, the + datagram is sent directly to the destination host; otherwise, + it has to be routed to a gateway on a connected network. + + 3.3.1.1 Local/Remote Decision + + To decide if the destination is on a connected network, the + following algorithm MUST be used [see IP:3]: + + (a) The address mask (particular to a local IP address for + a multihomed host) is a 32-bit mask that selects the + network number and subnet number fields of the + corresponding IP address.
+ + (b) If the IP destination address bits extracted by the + address mask match the IP source address bits extracted + by the same mask, then the destination is on the + corresponding connected network, and the datagram is to + be transmitted directly to the destination host. + + (c) If not, then the destination is accessible only through + a gateway. Selection of a gateway is described below + (3.3.1.2). + + A special-case destination address is handled as follows: + + * For a limited broadcast or a multicast address, simply + pass the datagram to the link layer for the appropriate + interface. + + + +Internet Engineering Task Force [Page 47] + + + + +RFC1122 INTERNET LAYER October 1989 + + + * For a (network or subnet) directed broadcast, the + datagram can use the standard routing algorithms. + + The host IP layer MUST operate correctly in a minimal + network environment, and in particular, when there are no + gateways. For example, if the IP layer of a host insists on + finding at least one gateway to initialize, the host will be + unable to operate on a single isolated broadcast net. + + 3.3.1.2 Gateway Selection + + To efficiently route a series of datagrams to the same + destination, the source host MUST keep a "route cache" of + mappings to next-hop gateways. A host uses the following + basic algorithm on this cache to route a datagram; this + algorithm is designed to put the primary routing burden on + the gateways [IP:11]. + + (a) If the route cache contains no information for a + particular destination, the host chooses a "default" + gateway and sends the datagram to it. It also builds a + corresponding Route Cache entry. + + (b) If that gateway is not the best next hop to the + destination, the gateway will forward the datagram to + the best next-hop gateway and return an ICMP Redirect + message to the source host. 
+ + (c) When it receives a Redirect, the host updates the + next-hop gateway in the appropriate route cache entry, + so later datagrams to the same destination will go + directly to the best gateway. + + Since the subnet mask appropriate to the destination address + is generally not known, a Network Redirect message SHOULD be + treated identically to a Host Redirect message; i.e., the + cache entry for the destination host (only) would be updated + (or created, if an entry for that host did not exist) for + the new gateway. + + DISCUSSION: + This recommendation is to protect against gateways that + erroneously send Network Redirects for a subnetted + network, in violation of the gateway requirements + [INTRO:2]. + + When there is no route cache entry for the destination host + address (and the destination is not on the connected + + + +Internet Engineering Task Force [Page 48] + + + + +RFC1122 INTERNET LAYER October 1989 + + + network), the IP layer MUST pick a gateway from its list of + "default" gateways. The IP layer MUST support multiple + default gateways. + + As an extra feature, a host IP layer MAY implement a table + of "static routes". Each such static route MAY include a + flag specifying whether it may be overridden by ICMP + Redirects. + + DISCUSSION: + A host generally needs to know at least one default + gateway to get started. This information can be + obtained from a configuration file or else from the + host startup sequence, e.g., the BOOTP protocol (see + [INTRO:1]). + + It has been suggested that a host can augment its list + of default gateways by recording any new gateways it + learns about. For example, it can record every gateway + to which it is ever redirected. Such a feature, while + possibly useful in some circumstances, may cause + problems in other cases (e.g., gateways are not all + equal), and it is not recommended. 
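ILLUSTRATION:
The local/remote decision of Section 3.3.1.1 and the cache-and-Redirect steps of Section 3.3.1.2 might be sketched as follows. This is a minimal sketch, not a definitive implementation: the names is_local, RouteCache, next_hop and on_redirect are hypothetical, the cache is keyed on destination host addresses only, the first default gateway is always chosen, and Type-of-Service is ignored.

```python
import ipaddress


def is_local(src: str, dst: str, mask: str) -> bool:
    """Local/remote decision: compare the source and destination
    address bits selected by the same address mask."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    m = int(ipaddress.IPv4Address(mask))
    return (s & m) == (d & m)


class RouteCache:
    """Hypothetical host route cache keyed on destination host address."""

    def __init__(self, src, mask, default_gateways):
        self.src = src
        self.mask = mask
        self.defaults = list(default_gateways)  # may be empty
        self.cache = {}  # destination host -> next-hop gateway

    def next_hop(self, dst):
        # Destination on the connected network: send directly.
        if is_local(self.src, dst, self.mask):
            return dst
        # No cache entry: pick a default gateway and build an entry.
        if dst not in self.cache:
            if not self.defaults:
                raise LookupError("no gateway known for " + dst)
            self.cache[dst] = self.defaults[0]
        return self.cache[dst]

    def on_redirect(self, dst, new_gateway):
        # Network Redirects are treated like Host Redirects, so only
        # the entry for this destination host is updated or created.
        self.cache[dst] = new_gateway
```

Note that a host with an empty default gateway list still resolves destinations on its connected network in this sketch, which matches the requirement to operate when there are no gateways.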
+ + A static route is typically a particular preset mapping + from destination host or network into a particular + next-hop gateway; it might also depend on the Type-of- + Service (see next section). Static routes would be set + up by system administrators to override the normal + automatic routing mechanism, to handle exceptional + situations. However, any static routing information is + a potential source of failure as configurations change + or equipment fails. + + 3.3.1.3 Route Cache + + Each route cache entry needs to include the following + fields: + + (1) Local IP address (for a multihomed host) + + (2) Destination IP address + + (3) Type(s)-of-Service + + (4) Next-hop gateway IP address + + Field (2) MAY be the full IP address of the destination + + + +Internet Engineering Task Force [Page 49] + + + + +RFC1122 INTERNET LAYER October 1989 + + + host, or only the destination network number. Field (3), + the TOS, SHOULD be included. + + See Section 3.3.4.2 for a discussion of the implications of + multihoming for the lookup procedure in this cache. + + DISCUSSION: + Including the Type-of-Service field in the route cache + and considering it in the host route algorithm will + provide the necessary mechanism for the future when + Type-of-Service routing is commonly used in the + Internet. See Section 3.2.1.6. + + Each route cache entry defines the endpoints of an + Internet path. Although the connecting path may change + dynamically in an arbitrary way, the transmission + characteristics of the path tend to remain + approximately constant over a time period longer than a + single typical host-host transport connection. + Therefore, a route cache entry is a natural place to + cache data on the properties of the path. Examples of + such properties might be the maximum unfragmented + datagram size (see Section 3.3.3), or the average + round-trip delay measured by a transport protocol. 
+ This data will generally be both gathered and used by a + higher layer protocol, e.g., by TCP, or by an + application using UDP. Experiments are currently in + progress on caching path properties in this manner. + + There is no consensus on whether the route cache should + be keyed on destination host addresses alone, or allow + both host and network addresses. Those who favor the + use of only host addresses argue that: + + (1) As required in Section 3.3.1.2, Redirect messages + will generally result in entries keyed on + destination host addresses; the simplest and most + general scheme would be to use host addresses + always. + + (2) The IP layer may not always know the address mask + for a network address in a complex subnetted + environment. + + (3) The use of only host addresses allows the + destination address to be used as a pure 32-bit + number, which may allow the Internet architecture + to be more easily extended in the future without + + + +Internet Engineering Task Force [Page 50] + + + + +RFC1122 INTERNET LAYER October 1989 + + + any change to the hosts. + + The opposing view is that allowing a mixture of + destination hosts and networks in the route cache: + + (1) Saves memory space. + + (2) Leads to a simpler data structure, easily + combining the cache with the tables of default and + static routes (see below). + + (3) Provides a more useful place to cache path + properties, as discussed earlier. + + + IMPLEMENTATION: + The cache needs to be large enough to include entries + for the maximum number of destination hosts that may be + in use at one time. + + A route cache entry may also include control + information used to choose an entry for replacement. + This might take the form of a "recently used" bit, a + use count, or a last-used timestamp, for example. It + is recommended that it include the time of last + modification of the entry, for diagnostic purposes. 
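ILLUSTRATION:
A route cache entry carrying the four fields of Section 3.3.1.3, together with the replacement and diagnostic information suggested above, might look like the sketch below. All names (RouteEntry, BoundedRouteCache, put, get) are hypothetical. The sketch assumes least-recently-used replacement and uses a logical counter in place of wall-clock time so that its behaviour is deterministic; a real implementation would likely record real timestamps.

```python
import itertools
from dataclasses import dataclass


@dataclass
class RouteEntry:
    local_addr: str     # (1) local IP address (multihomed hosts)
    dest: str           # (2) destination IP address
    tos: int            # (3) Type-of-Service
    gateway: str        # (4) next-hop gateway IP address
    last_used: int      # control information for replacement
    last_modified: int  # kept for diagnostic purposes


class BoundedRouteCache:
    """Hypothetical fixed-capacity cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}
        self._clock = itertools.count(1)  # logical clock (assumption)

    def put(self, local_addr, dest, tos, gateway):
        key = (local_addr, dest, tos)
        now = next(self._clock)
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Evict the entry that was used least recently.
            victim = min(self.entries,
                         key=lambda k: self.entries[k].last_used)
            del self.entries[victim]
        self.entries[key] = RouteEntry(local_addr, dest, tos, gateway,
                                       last_used=now, last_modified=now)

    def get(self, local_addr, dest, tos):
        entry = self.entries.get((local_addr, dest, tos))
        if entry is not None:
            entry.last_used = next(self._clock)
        return entry
```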
+ + An implementation may wish to reduce the overhead of + scanning the route cache for every datagram to be + transmitted. This may be accomplished with a hash + table to speed the lookup, or by giving a connection- + oriented transport protocol a "hint" or temporary + handle on the appropriate cache entry, to be passed to + the IP layer with each subsequent datagram. + + Although we have described the route cache, the lists + of default gateways, and a table of static routes as + conceptually distinct, in practice they may be combined + into a single "routing table" data structure. + + 3.3.1.4 Dead Gateway Detection + + The IP layer MUST be able to detect the failure of a "next- + hop" gateway that is listed in its route cache and to choose + an alternate gateway (see Section 3.3.1.5). + + Dead gateway detection is covered in some detail in RFC-816 + [IP:11]. Experience to date has not produced a complete + + + +Internet Engineering Task Force [Page 51] + + + + +RFC1122 INTERNET LAYER October 1989 + + + algorithm which is totally satisfactory, though it has + identified several forbidden paths and promising techniques. + + * A particular gateway SHOULD NOT be used indefinitely in + the absence of positive indications that it is + functioning. + + * Active probes such as "pinging" (i.e., using an ICMP + Echo Request/Reply exchange) are expensive and scale + poorly. In particular, hosts MUST NOT actively check + the status of a first-hop gateway by simply pinging the + gateway continuously. + + * Even when it is the only effective way to verify a + gateway's status, pinging MUST be used only when + traffic is being sent to the gateway and when there is + no other positive indication to suggest that the + gateway is functioning. + + * To avoid pinging, the layers above and/or below the + Internet layer SHOULD be able to give "advice" on the + status of route cache entries when either positive + (gateway OK) or negative (gateway dead) information is + available. 
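ILLUSTRATION:
One way the rules above might fit together is a per-gateway timer that positive advice refreshes, that negative advice zeroes, and that permits a ping only when a datagram is actually being sent. The sketch below is an assumption-laden illustration, not a complete algorithm: GatewayMonitor, advise, on_send, tr and max_pings are hypothetical names, times are plain numbers supplied by the caller, and a ping reply is expected to be reported back as positive advice.

```python
class GatewayMonitor:
    """Hypothetical advice-driven dead gateway detector."""

    def __init__(self, tr, max_pings):
        self.tr = tr                # timer value after positive advice
        self.max_pings = max_pings  # unanswered pings before giving up
        self.expires_at = {}        # gateway -> time the timer expires
        self.pings = {}             # gateway -> unanswered pings so far

    def add(self, gw, now):
        self.expires_at[gw] = now + self.tr
        self.pings[gw] = 0

    def advise(self, gw, positive, now):
        """positive is True, False, or None for neutral advice."""
        if positive is True:
            self.expires_at[gw] = now + self.tr
            self.pings[gw] = 0
        elif positive is False:
            self.expires_at[gw] = now  # zero the timer
        # Neutral advice (None) leaves the state unchanged.

    def on_send(self, gw, now):
        """Called only when a datagram is routed through gw.
        Returns "ok", "ping" (send a ping with the datagram) or
        "dead" (choose another gateway)."""
        if now < self.expires_at[gw]:
            return "ok"
        if self.pings[gw] >= self.max_pings:
            return "dead"
        self.pings[gw] += 1
        return "ping"
```

Because on_send is the only place a ping is requested, an idle gateway is never probed, and a gateway that keeps receiving positive advice is never probed either.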
+ + + DISCUSSION: + If an implementation does not include an adequate + mechanism for detecting a dead gateway and re-routing, + a gateway failure may cause datagrams to apparently + vanish into a "black hole". This failure can be + extremely confusing for users and difficult for network + personnel to debug. + + The dead-gateway detection mechanism must not cause + unacceptable load on the host, on connected networks, + or on first-hop gateway(s). The exact constraints on + the timeliness of dead gateway detection and on + acceptable load may vary somewhat depending on the + nature of the host's mission, but a host generally + needs to detect a failed first-hop gateway quickly + enough that transport-layer connections will not break + before an alternate gateway can be selected. + + Passing advice from other layers of the protocol stack + complicates the interfaces between the layers, but it + is the preferred approach to dead gateway detection. + Advice can come from almost any part of the IP/TCP + architecture, but it is expected to come primarily from + the transport and link layers. Here are some possible + sources for gateway advice: + + o TCP or any connection-oriented transport protocol + should be able to give negative advice, e.g., + triggered by excessive retransmissions. + + o TCP may give positive advice when (new) data is + acknowledged. Even though the route may be + asymmetric, an ACK for new data proves that the + acknowledged data must have been transmitted + successfully. + + o An ICMP Redirect message from a particular gateway + should be used as positive advice about that + gateway. + + o Link-layer information that reliably detects and + reports host failures (e.g., ARPANET Destination + Dead messages) should be used as negative advice.
+ + o Failure to ARP or to re-validate ARP mappings may + be used as negative advice for the corresponding + IP address. + + o Packets arriving from a particular link-layer + address are evidence that the system at this + address is alive. However, turning this + information into advice about gateways requires + mapping the link-layer address into an IP address, + and then checking that IP address against the + gateways pointed to by the route cache. This is + probably prohibitively inefficient. + + Note that positive advice that is given for every + datagram received may cause unacceptable overhead in + the implementation. + + While advice might be passed using required arguments + in all interfaces to the IP layer, some transport and + application layer protocols cannot deduce the correct + advice. These interfaces must therefore allow a + neutral value for advice, since either always-positive + or always-negative advice leads to incorrect behavior. + + There is another technique for dead gateway detection + that has been commonly used but is not recommended. + + + +Internet Engineering Task Force [Page 53] + + + + +RFC1122 INTERNET LAYER October 1989 + + + This technique depends upon the host passively + receiving ("wiretapping") the Interior Gateway Protocol + (IGP) datagrams that the gateways are broadcasting to + each other. This approach has the drawback that a host + needs to recognize all the interior gateway protocols + that gateways may use (see [INTRO:2]). In addition, it + only works on a broadcast network. + + At present, pinging (i.e., using ICMP Echo messages) is + the mechanism for gateway probing when absolutely + required. A successful ping guarantees that the + addressed interface and its associated machine are up, + but it does not guarantee that the machine is a gateway + as opposed to a host. 
The normal inference is that if + a Redirect or other evidence indicates that a machine + was a gateway, successful pings will indicate that the + machine is still up and hence still a gateway. + However, since a host silently discards packets that a + gateway would forward or redirect, this assumption + could sometimes fail. To avoid this problem, a new + ICMP message under development will ask "are you a + gateway?" + + IMPLEMENTATION: + The following specific algorithm has been suggested: + + o Associate a "reroute timer" with each gateway + pointed to by the route cache. Initialize the + timer to a value Tr, which must be small enough to + allow detection of a dead gateway before transport + connections time out. + + o Positive advice would reset the reroute timer to + Tr. Negative advice would reduce or zero the + reroute timer. + + o Whenever the IP layer used a particular gateway to + route a datagram, it would check the corresponding + reroute timer. If the timer had expired (reached + zero), the IP layer would send a ping to the + gateway, followed immediately by the datagram. + + o The ping (ICMP Echo) would be sent again if + necessary, up to N times. If no ping reply was + received in N tries, the gateway would be assumed + to have failed, and a new first-hop gateway would + be chosen for all cache entries pointing to the + failed gateway. + + + +Internet Engineering Task Force [Page 54] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Note that the size of Tr is inversely related to the + amount of advice available. 
Tr should be large enough + to ensure that: + + * Any pinging will be at a low level (e.g., <10%) of + all packets sent to a gateway from the host, AND + + * pinging is infrequent (e.g., every 3 minutes) + + Since the recommended algorithm is concerned with the + gateways pointed to by route cache entries, rather than + the cache entries themselves, a two-level data + structure (perhaps coordinated with ARP or similar + caches) may be desirable for implementing a route + cache. + + 3.3.1.5 New Gateway Selection + + If the failed gateway is not the current default, the IP + layer can immediately switch to a default gateway. If it is + the current default that failed, the IP layer MUST select a + different default gateway (assuming more than one default is + known) for the failed route and for establishing new routes. + + DISCUSSION: + When a gateway does fail, the other gateways on the + connected network will learn of the failure through + some inter-gateway routing protocol. However, this + will not happen instantaneously, since gateway routing + protocols typically have a settling time of 30-60 + seconds. If the host switches to an alternative + gateway before the gateways have agreed on the failure, + the new target gateway will probably forward the + datagram to the failed gateway and send a Redirect back + to the host pointing to the failed gateway (!). The + result is likely to be a rapid oscillation in the + contents of the host's route cache during the gateway + settling period. It has been proposed that the dead- + gateway logic should include some hysteresis mechanism + to prevent such oscillations. However, experience has + not shown any harm from such oscillations, since + service cannot be restored to the host until the + gateways' routing information does settle down. + + IMPLEMENTATION: + One implementation technique for choosing a new default + gateway is to simply round-robin among the default + gateways in the host's list.
Another is to rank the + + + +Internet Engineering Task Force [Page 55] + + + + +RFC1122 INTERNET LAYER October 1989 + + + gateways in priority order, and when the current + default gateway is not the highest priority one, to + "ping" the higher-priority gateways slowly to detect + when they return to service. This pinging can be at a + very low rate, e.g., 0.005 per second. + + 3.3.1.6 Initialization + + The following information MUST be configurable: + + (1) IP address(es). + + (2) Address mask(s). + + (3) A list of default gateways, with a preference level. + + A manual method of entering this configuration data MUST be + provided. In addition, a variety of methods can be used to + determine this information dynamically; see the section on + "Host Initialization" in [INTRO:1]. + + DISCUSSION: + Some host implementations use "wiretapping" of gateway + protocols on a broadcast network to learn what gateways + exist. A standard method for default gateway discovery + is under development. + + 3.3.2 Reassembly + + The IP layer MUST implement reassembly of IP datagrams. + + We designate the largest datagram size that can be reassembled + by EMTU_R ("Effective MTU to receive"); this is sometimes + called the "reassembly buffer size". EMTU_R MUST be greater + than or equal to 576, SHOULD be either configurable or + indefinite, and SHOULD be greater than or equal to the MTU of + the connected network(s). + + DISCUSSION: + A fixed EMTU_R limit should not be built into the code + because some application layer protocols require EMTU_R + values larger than 576. + + IMPLEMENTATION: + An implementation may use a contiguous reassembly buffer + for each datagram, or it may use a more complex data + structure that places no definite limit on the reassembled + datagram size; in the latter case, EMTU_R is said to be + + + +Internet Engineering Task Force [Page 56] + + + + +RFC1122 INTERNET LAYER October 1989 + + + "indefinite". 
+ + Logically, reassembly is performed by simply copying each + fragment into the packet buffer at the proper offset. + Note that fragments may overlap if successive + retransmissions use different packetizing but the same + reassembly Id. + + The tricky part of reassembly is the bookkeeping to + determine when all bytes of the datagram have been + reassembled. We recommend Clark's algorithm [IP:10] that + requires no additional data space for the bookkeeping. + However, note that, contrary to [IP:10], the first + fragment header needs to be saved for inclusion in a + possible ICMP Time Exceeded (Reassembly Timeout) message. + + There MUST be a mechanism by which the transport layer can + learn MMS_R, the maximum message size that can be received and + reassembled in an IP datagram (see GET_MAXSIZES calls in + Section 3.4). If EMTU_R is not indefinite, then the value of + MMS_R is given by: + + MMS_R = EMTU_R - 20 + + since 20 is the minimum size of an IP header. + + There MUST be a reassembly timeout. The reassembly timeout + value SHOULD be a fixed value, not set from the remaining TTL. + It is recommended that the value lie between 60 seconds and 120 + seconds. If this timeout expires, the partially-reassembled + datagram MUST be discarded and an ICMP Time Exceeded message + sent to the source host (if fragment zero has been received). + + DISCUSSION: + The IP specification says that the reassembly timeout + should be the remaining TTL from the IP header, but this + does not work well because gateways generally treat TTL as + a simple hop count rather than an elapsed time. If the + reassembly timeout is too small, datagrams will be + discarded unnecessarily, and communication may fail. The + timeout needs to be at least as large as the typical + maximum delay across the Internet. A realistic minimum + reassembly timeout would be 60 seconds. 
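ILLUSTRATION:
The copy-at-offset step, the completeness bookkeeping, the fixed timeout and the MMS_R relation described above might be sketched as follows. This is deliberately not Clark's algorithm [IP:10]: it spends one extra flag byte per data byte to keep the bookkeeping obvious. The names Reassembly, add, complete, expired and mms_r are hypothetical, offsets are given in bytes rather than the 8-octet units of the IP header, and the 60 second timeout is simply the lower end of the recommended range.

```python
IP_MIN_HEADER = 20  # minimum size of an IP header, in bytes


def mms_r(emtu_r):
    """MMS_R = EMTU_R - 20, valid when EMTU_R is not indefinite."""
    return emtu_r - IP_MIN_HEADER


class Reassembly:
    """Hypothetical reassembly state for one datagram."""

    def __init__(self, now, timeout=60):
        self.deadline = now + timeout  # fixed, not taken from TTL
        self.data = bytearray()        # grows as needed
        self.have = bytearray()        # 1 where a byte has arrived
        self.total = None              # known once last fragment arrives
        self.first_header = None       # kept for ICMP Time Exceeded

    def add(self, offset, payload, more_fragments, header=None):
        end = offset + len(payload)
        if end > len(self.data):
            grow = end - len(self.data)
            self.data.extend(bytes(grow))
            self.have.extend(bytes(grow))
        # Overlapping fragments simply overwrite earlier bytes.
        self.data[offset:end] = payload
        self.have[offset:end] = b"\x01" * len(payload)
        if offset == 0:
            self.first_header = header
        if not more_fragments:
            self.total = end

    def complete(self):
        return self.total is not None and all(self.have[:self.total])

    def expired(self, now):
        return now >= self.deadline and not self.complete()
```

When expired() becomes true, the caller would discard the partial datagram and, if first_header is set, send an ICMP Time Exceeded message built from it.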
+ + It has been suggested that a cache might be kept of + round-trip times measured by transport protocols for + various destinations, and that these values might be used + to dynamically determine a reasonable reassembly timeout + + + +Internet Engineering Task Force [Page 57] + + + + +RFC1122 INTERNET LAYER October 1989 + + + value. Further investigation of this approach is + required. + + If the reassembly timeout is set too high, buffer + resources in the receiving host will be tied up too long, + and the MSL (Maximum Segment Lifetime) [TCP:1] will be + larger than necessary. The MSL controls the maximum rate + at which fragmented datagrams can be sent using distinct + values of the 16-bit Ident field; a larger MSL lowers the + maximum rate. The TCP specification [TCP:1] arbitrarily + assumes a value of 2 minutes for MSL. This sets an upper + limit on a reasonable reassembly timeout value. + + 3.3.3 Fragmentation + + Optionally, the IP layer MAY implement a mechanism to fragment + outgoing datagrams intentionally. + + We designate by EMTU_S ("Effective MTU for sending") the + maximum IP datagram size that may be sent, for a particular + combination of IP source and destination addresses and perhaps + TOS. + + A host MUST implement a mechanism to allow the transport layer + to learn MMS_S, the maximum transport-layer message size that + may be sent for a given {source, destination, TOS} triplet (see + GET_MAXSIZES call in Section 3.4). If no local fragmentation + is performed, the value of MMS_S will be: + + MMS_S = EMTU_S - <IP header size> + + and EMTU_S must be less than or equal to the MTU of the network + interface corresponding to the source address of the datagram. + Note that <IP header size> in this equation will be 20, unless + the IP reserves space to insert IP options for its own purposes + in addition to any options inserted by the transport layer. 
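ILLUSTRATION:
As a worked instance of the MMS_S equation above, the short sketch below computes MMS_S for a few cases. The function name mms_s and its parameters are hypothetical; ip_option_bytes stands for any space the IP layer reserves for its own options, and the MTU values are examples only.

```python
IP_MIN_HEADER = 20  # bytes, IP header without options


def mms_s(emtu_s, mtu, ip_option_bytes=0):
    """MMS_S = EMTU_S - <IP header size>, with no local fragmentation.

    EMTU_S must not exceed the MTU of the interface that corresponds
    to the source address of the datagram.
    """
    if emtu_s > mtu:
        raise ValueError("EMTU_S exceeds the interface MTU")
    return emtu_s - (IP_MIN_HEADER + ip_option_bytes)


# Interface MTU of 1500, EMTU_S equal to the MTU: 1500 - 20 = 1480.
full_mtu = mms_s(1500, 1500)
# EMTU_S limited to 576: 576 - 20 = 556.
limited = mms_s(576, 1500)
# Same, with 8 bytes reserved for IP options: 576 - 28 = 548.
with_options = mms_s(576, 1500, ip_option_bytes=8)
```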
+ + A host that does not implement local fragmentation MUST ensure + that the transport layer (for TCP) or the application layer + (for UDP) obtains MMS_S from the IP layer and does not send a + datagram exceeding MMS_S in size. + + It is generally desirable to avoid local fragmentation and to + choose EMTU_S low enough to avoid fragmentation in any gateway + along the path. In the absence of actual knowledge of the + minimum MTU along the path, the IP layer SHOULD use + EMTU_S <= 576 whenever the destination address is not on a + connected network, and otherwise use the connected network's + + + +Internet Engineering Task Force [Page 58] + + + + +RFC1122 INTERNET LAYER October 1989 + + + MTU. + + The MTU of each physical interface MUST be configurable. + + A host IP layer implementation MAY have a configuration flag + "All-Subnets-MTU", indicating that the MTU of the connected + network is to be used for destinations on different subnets + within the same network, but not for other networks. Thus, + this flag causes the network class mask, rather than the subnet + address mask, to be used to choose an EMTU_S. For a multihomed + host, an "All-Subnets-MTU" flag is needed for each network + interface. + + DISCUSSION: + Picking the correct datagram size to use when sending data + is a complex topic [IP:9]. + + (a) In general, no host is required to accept an IP + datagram larger than 576 bytes (including header and + data), so a host must not send a larger datagram + without explicit knowledge or prior arrangement with + the destination host. Thus, MMS_S is only an upper + bound on the datagram size that a transport protocol + may send; even when MMS_S exceeds 556, the transport + layer must limit its messages to 556 bytes in the + absence of other knowledge about the destination + host. + + (b) Some transport protocols (e.g., TCP) provide a way to + explicitly inform the sender about the largest + datagram the other end can receive and reassemble + [IP:7]. 
There is no corresponding mechanism in the + IP layer. + + A transport protocol that assumes an EMTU_R larger + than 576 (see Section 3.3.2), can send a datagram of + this larger size to another host that implements the + same protocol. + + (c) Hosts should ideally limit their EMTU_S for a given + destination to the minimum MTU of all the networks + along the path, to avoid any fragmentation. IP + fragmentation, while formally correct, can create a + serious transport protocol performance problem, + because loss of a single fragment means all the + fragments in the segment must be retransmitted + [IP:9]. + + + + +Internet Engineering Task Force [Page 59] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Since nearly all networks in the Internet currently + support an MTU of 576 or greater, we strongly recommend + the use of 576 for datagrams sent to non-local networks. + + It has been suggested that a host could determine the MTU + over a given path by sending a zero-offset datagram + fragment and waiting for the receiver to time out the + reassembly (which cannot complete!) and return an ICMP + Time Exceeded message. This message would include the + largest remaining fragment header in its body. More + direct mechanisms are being experimented with, but have + not yet been adopted (see e.g., RFC-1063). + + 3.3.4 Local Multihoming + + 3.3.4.1 Introduction + + A multihomed host has multiple IP addresses, which we may + think of as "logical interfaces". These logical interfaces + may be associated with one or more physical interfaces, and + these physical interfaces may be connected to the same or + different networks. + + Here are some important cases of multihoming: + + (a) Multiple Logical Networks + + The Internet architects envisioned that each physical + network would have a single unique IP network (or + subnet) number. 
However, LAN administrators have + sometimes found it useful to violate this assumption, + operating a LAN with multiple logical networks per + physical connected network. + + If a host connected to such a physical network is + configured to handle traffic for each of N different + logical networks, then the host will have N logical + interfaces. These could share a single physical + interface, or might use N physical interfaces to the + same network. + + (b) Multiple Logical Hosts + + When a host has multiple IP addresses that all have the + same <Network-number> part (and the same <Subnet- + number> part, if any), the logical interfaces are known + as "logical hosts". These logical interfaces might + share a single physical interface or might use separate + + + +Internet Engineering Task Force [Page 60] + + + + +RFC1122 INTERNET LAYER October 1989 + + + physical interfaces to the same physical network. + + (c) Simple Multihoming + + In this case, each logical interface is mapped into a + separate physical interface and each physical interface + is connected to a different physical network. The term + "multihoming" was originally applied only to this case, + but it is now applied more generally. + + A host with embedded gateway functionality will + typically fall into the simple multihoming case. Note, + however, that a host may be simply multihomed without + containing an embedded gateway, i.e., without + forwarding datagrams from one connected network to + another. + + This case presents the most difficult routing problems. + The choice of interface (i.e., the choice of first-hop + network) may significantly affect performance or even + reachability of remote parts of the Internet. + + + Finally, we note another possibility that is NOT + multihoming: one logical interface may be bound to multiple + physical interfaces, in order to increase the reliability or + throughput between directly connected machines by providing + alternative physical paths between them. 
For instance, two + systems might be connected by multiple point-to-point links. + We call this "link-layer multiplexing". With link-layer + multiplexing, the protocols above the link layer are unaware + that multiple physical interfaces are present; the link- + layer device driver is responsible for multiplexing and + routing packets across the physical interfaces. + + In the Internet protocol architecture, a transport protocol + instance ("entity") has no address of its own, but instead + uses a single Internet Protocol (IP) address. This has + implications for the IP, transport, and application layers, + and for the interfaces between them. In particular, the + application software may have to be aware of the multiple IP + addresses of a multihomed host; in other cases, the choice + can be made within the network software. + + 3.3.4.2 Multihoming Requirements + + The following general rules apply to the selection of an IP + source address for sending a datagram from a multihomed + + + +Internet Engineering Task Force [Page 61] + + + + +RFC1122 INTERNET LAYER October 1989 + + + host. + + (1) If the datagram is sent in response to a received + datagram, the source address for the response SHOULD be + the specific-destination address of the request. See + Sections 4.1.3.5 and 4.2.3.7 and the "General Issues" + section of [INTRO:1] for more specific requirements on + higher layers. + + Otherwise, a source address must be selected. + + (2) An application MUST be able to explicitly specify the + source address for initiating a connection or a + request. + + (3) In the absence of such a specification, the networking + software MUST choose a source address. Rules for this + choice are described below. + + + There are two key requirement issues related to multihoming: + + (A) A host MAY silently discard an incoming datagram whose + destination address does not correspond to the physical + interface through which it is received. 
+ + (B) A host MAY restrict itself to sending (non-source- + routed) IP datagrams only through the physical + interface that corresponds to the IP source address of + the datagrams. + + + DISCUSSION: + Internet host implementors have used two different + conceptual models for multihoming, briefly summarized + in the following discussion. This document takes no + stand on which model is preferred; each seems to have a + place. This ambivalence is reflected in the issues (A) + and (B) being optional. + + o Strong ES Model + + The Strong ES (End System, i.e., host) model + emphasizes the host/gateway (ES/IS) distinction, + and would therefore substitute MUST for MAY in + issues (A) and (B) above. It tends to model a + multihomed host as a set of logical hosts within + the same physical host. + + + +Internet Engineering Task Force [Page 62] + + + + +RFC1122 INTERNET LAYER October 1989 + + + With respect to (A), proponents of the Strong ES + model note that automatic Internet routing + mechanisms could not route a datagram to a + physical interface that did not correspond to the + destination address. + + Under the Strong ES model, the route computation + for an outgoing datagram is the mapping: + + route(src IP addr, dest IP addr, TOS) + -> gateway + + Here the source address is included as a parameter + in order to select a gateway that is directly + reachable on the corresponding physical interface. + Note that this model logically requires that in + general there be at least one default gateway, and + preferably multiple defaults, for each IP source + address. + + o Weak ES Model + + This view de-emphasizes the ES/IS distinction, and + would therefore substitute MUST NOT for MAY in + issues (A) and (B). This model may be the more + natural one for hosts that wiretap gateway routing + protocols, and is necessary for hosts that have + embedded gateway functionality. + + The Weak ES Model may cause the Redirect mechanism + to fail. 
If a datagram is sent out a physical + interface that does not correspond to the + destination address, the first-hop gateway will + not realize when it needs to send a Redirect. On + the other hand, if the host has embedded gateway + functionality, then it has routing information + without listening to Redirects. + + In the Weak ES model, the route computation for an + outgoing datagram is the mapping: + + route(dest IP addr, TOS) -> gateway, interface + + + + + + + + + +Internet Engineering Task Force [Page 63] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 3.3.4.3 Choosing a Source Address + + DISCUSSION: + When it sends an initial connection request (e.g., a + TCP "SYN" segment) or a datagram service request (e.g., + a UDP-based query), the transport layer on a multihomed + host needs to know which source address to use. If the + application does not specify it, the transport layer + must ask the IP layer to perform the conceptual + mapping: + + GET_SRCADDR(remote IP addr, TOS) + -> local IP address + + Here TOS is the Type-of-Service value (see Section + 3.2.1.6), and the result is the desired source address. + The following rules are suggested for implementing this + mapping: + + (a) If the remote Internet address lies on one of the + (sub-) nets to which the host is directly + connected, a corresponding source address may be + chosen, unless the corresponding interface is + known to be down. + + (b) The route cache may be consulted, to see if there + is an active route to the specified destination + network through any network interface; if so, a + local IP address corresponding to that interface + may be chosen. + + (c) The table of static routes, if any (see Section + 3.3.1.2) may be similarly consulted. + + (d) The default gateways may be consulted. If these + gateways are assigned to different interfaces, the + interface corresponding to the gateway with the + highest preference may be chosen. 
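Rules (a) through (d) above can be read as an ordered fallback chain. The following is a minimal sketch of that chain, not a definitive implementation: the `Interface` record, the dictionary-based route cache and static-route table, and the `(preference, interface)` default-gateway list are all hypothetical structures invented for illustration, and the TOS argument of GET_SRCADDR is omitted for brevity.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Interface:
    """Hypothetical record describing one local interface."""
    name: str
    address: str      # local IP address bound to this interface
    network: str      # directly connected (sub)net, e.g. "10.0.0.0/24"
    up: bool = True   # False if the interface is known to be down


def get_srcaddr(remote, interfaces, route_cache=None, static_routes=None,
                default_gateways=None):
    """Return a local IP address to use toward `remote`, or None.

    route_cache and static_routes map a destination network string to an
    interface name; default_gateways is a list of (preference, interface
    name) pairs. All three layouts are assumptions made for this sketch.
    """
    remote_ip = ipaddress.ip_address(remote)
    by_name = {iface.name: iface for iface in interfaces}

    # Rule (a): remote lies on a directly connected (sub)net whose
    # interface is not known to be down.
    for iface in interfaces:
        if iface.up and remote_ip in ipaddress.ip_network(iface.network):
            return iface.address

    # Rules (b) and (c): consult the route cache, then the static routes.
    for table in (route_cache or {}, static_routes or {}):
        for net, name in table.items():
            if remote_ip in ipaddress.ip_network(net) and name in by_name:
                return by_name[name].address

    # Rule (d): fall back to the default gateway with the highest preference.
    usable = [(pref, name) for pref, name in (default_gateways or [])
              if name in by_name]
    if usable:
        _, name = max(usable)
        return by_name[name].address
    return None


example_interfaces = [
    Interface("eth0", "10.0.0.5", "10.0.0.0/24"),
    Interface("eth1", "192.168.1.5", "192.168.1.0/24"),
]
print(get_srcaddr("192.168.1.77", example_interfaces))  # 192.168.1.5
```

Because the lookup order mirrors ordinary next-hop selection, a host could plausibly share these tables with its datagram routing code.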
+ + In the future, there may be a defined way for a + multihomed host to ask the gateways on all connected + networks for advice about the best network to use for a + given destination. + + IMPLEMENTATION: + It will be noted that this process is essentially the + same as datagram routing (see Section 3.3.1), and + therefore hosts may be able to combine the + + + +Internet Engineering Task Force [Page 64] + + + + +RFC1122 INTERNET LAYER October 1989 + + + implementation of the two functions. + + 3.3.5 Source Route Forwarding + + Subject to restrictions given below, a host MAY be able to act + as an intermediate hop in a source route, forwarding a source- + routed datagram to the next specified hop. + + However, in performing this gateway-like function, the host + MUST obey all the relevant rules for a gateway forwarding + source-routed datagrams [INTRO:2]. This includes the following + specific provisions, which override the corresponding host + provisions given earlier in this document: + + (A) TTL (ref. Section 3.2.1.7) + + The TTL field MUST be decremented and the datagram perhaps + discarded as specified for a gateway in [INTRO:2]. + + (B) ICMP Destination Unreachable (ref. Section 3.2.2.1) + + A host MUST be able to generate Destination Unreachable + messages with the following codes: + + 4 (Fragmentation Required but DF Set) when a source- + routed datagram cannot be fragmented to fit into the + target network; + + 5 (Source Route Failed) when a source-routed datagram + cannot be forwarded, e.g., because of a routing + problem or because the next hop of a strict source + route is not on a connected network. + + (C) IP Source Address (ref. Section 3.2.1.3) + + A source-routed datagram being forwarded MAY (and normally + will) have a source address that is not one of the IP + addresses of the forwarding host. + + (D) Record Route Option (ref. 
Section 3.2.1.8d) + + A host that is forwarding a source-routed datagram + containing a Record Route option MUST update that option, + if it has room. + + (E) Timestamp Option (ref. Section 3.2.1.8e) + + A host that is forwarding a source-routed datagram + + + +Internet Engineering Task Force [Page 65] + + + + +RFC1122 INTERNET LAYER October 1989 + + + containing a Timestamp Option MUST add the current + timestamp to that option, according to the rules for this + option. + + To define the rules restricting host forwarding of source- + routed datagrams, we use the term "local source-routing" if the + next hop will be through the same physical interface through + which the datagram arrived; otherwise, it is "non-local + source-routing". + + o A host is permitted to perform local source-routing + without restriction. + + o A host that supports non-local source-routing MUST have a + configurable switch to disable forwarding, and this switch + MUST default to disabled. + + o The host MUST satisfy all gateway requirements for + configurable policy filters [INTRO:2] restricting non- + local forwarding. + + If a host receives a datagram with an incomplete source route + but does not forward it for some reason, the host SHOULD return + an ICMP Destination Unreachable (code 5, Source Route Failed) + message, unless the datagram was itself an ICMP error message. + + 3.3.6 Broadcasts + + Section 3.2.1.3 defined the four standard IP broadcast address + forms: + + Limited Broadcast: {-1, -1} + + Directed Broadcast: {<Network-number>,-1} + + Subnet Directed Broadcast: + {<Network-number>,<Subnet-number>,-1} + + All-Subnets Directed Broadcast: {<Network-number>,-1,-1} + + A host MUST recognize any of these forms in the destination + address of an incoming datagram. + + There is a class of hosts* that use non-standard broadcast + address forms, substituting 0 for -1. All hosts SHOULD +_________________________ +*4.2BSD Unix and its derivatives, but not 4.3BSD. 
+ + + + +Internet Engineering Task Force [Page 66] + + + + +RFC1122 INTERNET LAYER October 1989 + + + recognize and accept any of these non-standard broadcast + addresses as the destination address of an incoming datagram. + A host MAY optionally have a configuration option to choose the + 0 or the -1 form of broadcast address, for each physical + interface, but this option SHOULD default to the standard (-1) + form. + + When a host sends a datagram to a link-layer broadcast address, + the IP destination address MUST be a legal IP broadcast or IP + multicast address. + + A host SHOULD silently discard a datagram that is received via + a link-layer broadcast (see Section 2.4) but does not specify + an IP multicast or broadcast destination address. + + Hosts SHOULD use the Limited Broadcast address to broadcast to + a connected network. + + + DISCUSSION: + Using the Limited Broadcast address instead of a Directed + Broadcast address may improve system robustness. Problems + are often caused by machines that do not understand the + plethora of broadcast addresses (see Section 3.2.1.3), or + that may have different ideas about which broadcast + addresses are in use. The prime example of the latter is + machines that do not understand subnetting but are + attached to a subnetted net. Sending a Subnet Broadcast + for the connected network will confuse those machines, + which will see it as a message to some other host. + + There has been discussion on whether a datagram addressed + to the Limited Broadcast address ought to be sent from all + the interfaces of a multihomed host. This specification + takes no stand on the issue. + + 3.3.7 IP Multicasting + + A host SHOULD support local IP multicasting on all connected + networks for which a mapping from Class D IP addresses to + link-layer addresses has been specified (see below). 
Support + for local IP multicasting includes sending multicast datagrams, + joining multicast groups and receiving multicast datagrams, and + leaving multicast groups. This implies support for all of + [IP:4] except the IGMP protocol itself, which is OPTIONAL. + + + + + + +Internet Engineering Task Force [Page 67] + + + + +RFC1122 INTERNET LAYER October 1989 + + + DISCUSSION: + IGMP provides gateways that are capable of multicast + routing with the information required to support IP + multicasting across multiple networks. At this time, + multicast-routing gateways are in the experimental stage + and are not widely available. For hosts that are not + connected to networks with multicast-routing gateways or + that do not need to receive multicast datagrams + originating on other networks, IGMP serves no purpose and + is therefore optional for now. However, the rest of + [IP:4] is currently recommended for the purpose of + providing IP-layer access to local network multicast + addressing, as a preferable alternative to local broadcast + addressing. It is expected that IGMP will become + recommended at some future date, when multicast-routing + gateways have become more widely available. + + If IGMP is not implemented, a host SHOULD still join the "all- + hosts" group (224.0.0.1) when the IP layer is initialized and + remain a member for as long as the IP layer is active. + + DISCUSSION: + Joining the "all-hosts" group will support strictly local + uses of multicasting, e.g., a gateway discovery protocol, + even if IGMP is not implemented. + + The mapping of IP Class D addresses to local addresses is + currently specified for the following types of networks: + + o Ethernet/IEEE 802.3, as defined in [IP:4]. + + o Any network that supports broadcast but not multicast, + addressing: all IP Class D addresses map to the local + broadcast address. + + o Any type of point-to-point link (e.g., SLIP or HDLC + links): no mapping required. 
All IP multicast datagrams + are sent as-is, inside the local framing. + + Mappings for other types of networks will be specified in the + future. + + A host SHOULD provide a way for higher-layer protocols or + applications to determine which of the host's connected + network(s) support IP multicast addressing. + + + + + + +Internet Engineering Task Force [Page 68] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 3.3.8 Error Reporting + + Wherever practical, hosts MUST return ICMP error datagrams on + detection of an error, except in those cases where returning an + ICMP error message is specifically prohibited. + + DISCUSSION: + A common phenomenon in datagram networks is the "black + hole disease": datagrams are sent out, but nothing comes + back. Without any error datagrams, it is difficult for + the user to figure out what the problem is. + + 3.4 INTERNET/TRANSPORT LAYER INTERFACE + + The interface between the IP layer and the transport layer MUST + provide full access to all the mechanisms of the IP layer, + including options, Type-of-Service, and Time-to-Live. The + transport layer MUST either have mechanisms to set these interface + parameters, or provide a path to pass them through from an + application, or both. + + DISCUSSION: + Applications are urged to make use of these mechanisms where + applicable, even when the mechanisms are not currently + effective in the Internet (e.g., TOS). This will allow these + mechanisms to be immediately useful when they do become + effective, without a large amount of retrofitting of host + software. + + We now describe a conceptual interface between the transport layer + and the IP layer, as a set of procedure calls. This is an + extension of the information in Section 3.3 of RFC-791 [IP:1]. + + + * Send Datagram + + SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt + => result ) + + where the parameters are defined in RFC-791. Passing an Id + parameter is optional; see Section 3.2.1.5. 
+ + + * Receive Datagram + + RECV(BufPTR, prot + => result, src, dst, SpecDest, TOS, len, opt) + + + + +Internet Engineering Task Force [Page 69] + + + + +RFC1122 INTERNET LAYER October 1989 + + + All the parameters are defined in RFC-791, except for: + + SpecDest = specific-destination address of datagram + (defined in Section 3.2.1.3) + + The result parameter dst contains the datagram's destination + address. Since this may be a broadcast or multicast address, + the SpecDest parameter (not shown in RFC-791) MUST be passed. + The parameter opt contains all the IP options received in the + datagram; these MUST also be passed to the transport layer. + + + * Select Source Address + + GET_SRCADDR(remote, TOS) -> local + + remote = remote IP address + TOS = Type-of-Service + local = local IP address + + See Section 3.3.4.3. + + + * Find Maximum Datagram Sizes + + GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S + + MMS_R = maximum receive transport-message size. + MMS_S = maximum send transport-message size. + (local, remote, TOS defined above) + + See Sections 3.3.2 and 3.3.3. + + + * Advice on Delivery Success + + ADVISE_DELIVPROB(sense, local, remote, TOS) + + Here the parameter sense is a 1-bit flag indicating whether + positive or negative advice is being given; see the + discussion in Section 3.3.1.4. The other parameters were + defined earlier. + + + * Send ICMP Message + + SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt) + -> result + + + +Internet Engineering Task Force [Page 70] + + + + +RFC1122 INTERNET LAYER October 1989 + + + (Parameters defined in RFC-791). + + Passing an Id parameter is optional; see Section 3.2.1.5. + The transport layer MUST be able to send certain ICMP + messages: Port Unreachable or any of the query-type + messages. This function could be considered to be a special + case of the SEND() call, of course; we describe it separately + for clarity. 
+ + + * Receive ICMP Message + + RECV_ICMP(BufPTR ) -> result, src, dst, len, opt + + (Parameters defined in RFC-791). + + The IP layer MUST pass certain ICMP messages up to the + appropriate transport-layer routine. This function could be + considered to be a special case of the RECV() call, of + course; we describe it separately for clarity. + + For an ICMP error message, the data that is passed up MUST + include the original Internet header plus all the octets of + the original message that are included in the ICMP message. + This data will be used by the transport layer to locate the + connection state information, if any. + + In particular, the following ICMP messages are to be passed + up: + + o Destination Unreachable + + o Source Quench + + o Echo Reply (to ICMP user interface, unless the Echo + Request originated in the IP layer) + + o Timestamp Reply (to ICMP user interface) + + o Time Exceeded + + + DISCUSSION: + In the future, there may be additions to this interface to + pass path data (see Section 3.3.1.3) between the IP and + transport layers. 
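The requirement that an ICMP error be passed up with the original Internet header plus the included octets of the original message exists so that the transport layer can find the matching connection state. The sketch below shows one way that lookup key might be extracted; the function name and the returned tuple layout are hypothetical, and the port extraction assumes the original datagram carried TCP or UDP, whose first four octets are the source and destination ports.

```python
import socket
import struct


def connection_key_from_icmp_payload(payload: bytes):
    """Extract (protocol, src, sport, dst, dport) from the data carried in
    an ICMP error message: the original IP header followed by the leading
    octets of the original datagram.

    Assumption: the original datagram was TCP or UDP, so the four octets
    after the IP header are the source and destination ports.
    """
    if len(payload) < 20:
        raise ValueError("truncated IP header")
    version = payload[0] >> 4
    header_len = (payload[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    if version != 4 or header_len < 20 or len(payload) < header_len + 4:
        raise ValueError("not a usable IPv4 header")
    protocol = payload[9]
    src = socket.inet_ntoa(payload[12:16])
    dst = socket.inet_ntoa(payload[16:20])
    sport, dport = struct.unpack("!HH", payload[header_len:header_len + 4])
    return protocol, src, sport, dst, dport
```

Using the header length field rather than a fixed offset of 20 matters here, because the original datagram may have carried IP options.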
+ + + + + +Internet Engineering Task Force [Page 71] + + + + +RFC1122 INTERNET LAYER October 1989 + + + 3.5 INTERNET LAYER REQUIREMENTS SUMMARY + + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-------------------------------------------------|--------|-|-|-|-|-|-- + | | | | | | | +Implement IP and ICMP |3.1 |x| | | | | +Handle remote multihoming in application layer |3.1 |x| | | | | +Support local multihoming |3.1 | | |x| | | +Meet gateway specs if forward datagrams |3.1 |x| | | | | +Configuration switch for embedded gateway |3.1 |x| | | | |1 + Config switch default to non-gateway |3.1 |x| | | | |1 + Auto-config based on number of interfaces |3.1 | | | | |x|1 +Able to log discarded datagrams |3.1 | |x| | | | + Record in counter |3.1 | |x| | | | + | | | | | | | +Silently discard Version != 4 |3.2.1.1 |x| | | | | +Verify IP checksum, silently discard bad dgram |3.2.1.2 |x| | | | | +Addressing: | | | | | | | + Subnet addressing (RFC-950) |3.2.1.3 |x| | | | | + Src address must be host's own IP address |3.2.1.3 |x| | | | | + Silently discard datagram with bad dest addr |3.2.1.3 |x| | | | | + Silently discard datagram with bad src addr |3.2.1.3 |x| | | | | +Support reassembly |3.2.1.4 |x| | | | | +Retain same Id field in identical datagram |3.2.1.5 | | |x| | | + | | | | | | | +TOS: | | | | | | | + Allow transport layer to set TOS |3.2.1.6 |x| | | | | + Pass received TOS up to transport layer |3.2.1.6 | |x| | | | + Use RFC-795 link-layer mappings for TOS |3.2.1.6 | | | |x| | +TTL: | | | | | | | + Send packet with TTL of 0 |3.2.1.7 | | | | |x| + Discard received packets with TTL < 2 |3.2.1.7 | | | | |x| + Allow transport layer to set TTL |3.2.1.7 |x| | | | | + Fixed TTL is configurable |3.2.1.7 |x| | | | | + | | | | | | | +IP Options: | | | | | | | + Allow transport layer to send IP options |3.2.1.8 |x| | | | | + Pass all IP options 
rcvd to higher layer |3.2.1.8 |x| | | | | + + + +Internet Engineering Task Force [Page 72] + + + + +RFC1122 INTERNET LAYER October 1989 + + + IP layer silently ignore unknown options |3.2.1.8 |x| | | | | + Security option |3.2.1.8a| | |x| | | + Send Stream Identifier option |3.2.1.8b| | | |x| | + Silently ignore Stream Identifer option |3.2.1.8b|x| | | | | + Record Route option |3.2.1.8d| | |x| | | + Timestamp option |3.2.1.8e| | |x| | | +Source Route Option: | | | | | | | + Originate & terminate Source Route options |3.2.1.8c|x| | | | | + Datagram with completed SR passed up to TL |3.2.1.8c|x| | | | | + Build correct (non-redundant) return route |3.2.1.8c|x| | | | | + Send multiple SR options in one header |3.2.1.8c| | | | |x| + | | | | | | | +ICMP: | | | | | | | + Silently discard ICMP msg with unknown type |3.2.2 |x| | | | | + Include more than 8 octets of orig datagram |3.2.2 | | |x| | | + Included octets same as received |3.2.2 |x| | | | | + Demux ICMP Error to transport protocol |3.2.2 |x| | | | | + Send ICMP error message with TOS=0 |3.2.2 | |x| | | | + Send ICMP error message for: | | | | | | | + - ICMP error msg |3.2.2 | | | | |x| + - IP b'cast or IP m'cast |3.2.2 | | | | |x| + - Link-layer b'cast |3.2.2 | | | | |x| + - Non-initial fragment |3.2.2 | | | | |x| + - Datagram with non-unique src address |3.2.2 | | | | |x| + Return ICMP error msgs (when not prohibited) |3.3.8 |x| | | | | + | | | | | | | + Dest Unreachable: | | | | | | | + Generate Dest Unreachable (code 2/3) |3.2.2.1 | |x| | | | + Pass ICMP Dest Unreachable to higher layer |3.2.2.1 |x| | | | | + Higher layer act on Dest Unreach |3.2.2.1 | |x| | | | + Interpret Dest Unreach as only hint |3.2.2.1 |x| | | | | + Redirect: | | | | | | | + Host send Redirect |3.2.2.2 | | | |x| | + Update route cache when recv Redirect |3.2.2.2 |x| | | | | + Handle both Host and Net Redirects |3.2.2.2 |x| | | | | + Discard illegal Redirect |3.2.2.2 | |x| | | | + Source Quench: | | | | | | | + Send Source Quench if 
buffering exceeded |3.2.2.3 | | |x| | | + Pass Source Quench to higher layer |3.2.2.3 |x| | | | | + Higher layer act on Source Quench |3.2.2.3 | |x| | | | + Time Exceeded: pass to higher layer |3.2.2.4 |x| | | | | + Parameter Problem: | | | | | | | + Send Parameter Problem messages |3.2.2.5 | |x| | | | + Pass Parameter Problem to higher layer |3.2.2.5 |x| | | | | + Report Parameter Problem to user |3.2.2.5 | | |x| | | + | | | | | | | + ICMP Echo Request or Reply: | | | | | | | + Echo server and Echo client |3.2.2.6 |x| | | | | + + + +Internet Engineering Task Force [Page 73] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Echo client |3.2.2.6 | |x| | | | + Discard Echo Request to broadcast address |3.2.2.6 | | |x| | | + Discard Echo Request to multicast address |3.2.2.6 | | |x| | | + Use specific-dest addr as Echo Reply src |3.2.2.6 |x| | | | | + Send same data in Echo Reply |3.2.2.6 |x| | | | | + Pass Echo Reply to higher layer |3.2.2.6 |x| | | | | + Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | | + Reverse and reflect Source Route option |3.2.2.6 |x| | | | | + | | | | | | | + ICMP Information Request or Reply: |3.2.2.7 | | | |x| | + ICMP Timestamp and Timestamp Reply: |3.2.2.8 | | |x| | | + Minimize delay variability |3.2.2.8 | |x| | | |1 + Silently discard b'cast Timestamp |3.2.2.8 | | |x| | |1 + Silently discard m'cast Timestamp |3.2.2.8 | | |x| | |1 + Use specific-dest addr as TS Reply src |3.2.2.8 |x| | | | |1 + Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | |1 + Reverse and reflect Source Route option |3.2.2.8 |x| | | | |1 + Pass Timestamp Reply to higher layer |3.2.2.8 |x| | | | |1 + Obey rules for "standard value" |3.2.2.8 |x| | | | |1 + | | | | | | | + ICMP Address Mask Request and Reply: | | | | | | | + Addr Mask source configurable |3.2.2.9 |x| | | | | + Support static configuration of addr mask |3.2.2.9 |x| | | | | + Get addr mask dynamically during booting |3.2.2.9 | | |x| | | + Get addr via ICMP Addr Mask Request/Reply 
|3.2.2.9 | | |x| | | + Retransmit Addr Mask Req if no Reply |3.2.2.9 |x| | | | |3 + Assume default mask if no Reply |3.2.2.9 | |x| | | |3 + Update address mask from first Reply only |3.2.2.9 |x| | | | |3 + Reasonableness check on Addr Mask |3.2.2.9 | |x| | | | + Send unauthorized Addr Mask Reply msgs |3.2.2.9 | | | | |x| + Explicitly configured to be agent |3.2.2.9 |x| | | | | + Static config=> Addr-Mask-Authoritative flag |3.2.2.9 | |x| | | | + Broadcast Addr Mask Reply when init. |3.2.2.9 |x| | | | |3 + | | | | | | | +ROUTING OUTBOUND DATAGRAMS: | | | | | | | + Use address mask in local/remote decision |3.3.1.1 |x| | | | | + Operate with no gateways on conn network |3.3.1.1 |x| | | | | + Maintain "route cache" of next-hop gateways |3.3.1.2 |x| | | | | + Treat Host and Net Redirect the same |3.3.1.2 | |x| | | | + If no cache entry, use default gateway |3.3.1.2 |x| | | | | + Support multiple default gateways |3.3.1.2 |x| | | | | + Provide table of static routes |3.3.1.2 | | |x| | | + Flag: route overridable by Redirects |3.3.1.2 | | |x| | | + Key route cache on host, not net address |3.3.1.3 | | |x| | | + Include TOS in route cache |3.3.1.3 | |x| | | | + | | | | | | | + Able to detect failure of next-hop gateway |3.3.1.4 |x| | | | | + Assume route is good forever |3.3.1.4 | | | |x| | + + + +Internet Engineering Task Force [Page 74] + + + + +RFC1122 INTERNET LAYER October 1989 + + + Ping gateways continuously |3.3.1.4 | | | | |x| + Ping only when traffic being sent |3.3.1.4 |x| | | | | + Ping only when no positive indication |3.3.1.4 |x| | | | | + Higher and lower layers give advice |3.3.1.4 | |x| | | | + Switch from failed default g'way to another |3.3.1.5 |x| | | | | + Manual method of entering config info |3.3.1.6 |x| | | | | + | | | | | | | +REASSEMBLY and FRAGMENTATION: | | | | | | | + Able to reassemble incoming datagrams |3.3.2 |x| | | | | + At least 576 byte datagrams |3.3.2 |x| | | | | + EMTU_R configurable or indefinite |3.3.2 | |x| | | | + Transport layer 
able to learn MMS_R |3.3.2 |x| | | | | + Send ICMP Time Exceeded on reassembly timeout |3.3.2 |x| | | | | + Fixed reassembly timeout value |3.3.2 | |x| | | | + | | | | | | | + Pass MMS_S to higher layers |3.3.3 |x| | | | | + Local fragmentation of outgoing packets |3.3.3 | | |x| | | + Else don't send bigger than MMS_S |3.3.3 |x| | | | | + Send max 576 to off-net destination |3.3.3 | |x| | | | + All-Subnets-MTU configuration flag |3.3.3 | | |x| | | + | | | | | | | +MULTIHOMING: | | | | | | | + Reply with same addr as spec-dest addr |3.3.4.2 | |x| | | | + Allow application to choose local IP addr |3.3.4.2 |x| | | | | + Silently discard d'gram in "wrong" interface |3.3.4.2 | | |x| | | + Only send d'gram through "right" interface |3.3.4.2 | | |x| | |4 + | | | | | | | +SOURCE-ROUTE FORWARDING: | | | | | | | + Forward datagram with Source Route option |3.3.5 | | |x| | |1 + Obey corresponding gateway rules |3.3.5 |x| | | | |1 + Update TTL by gateway rules |3.3.5 |x| | | | |1 + Able to generate ICMP err code 4, 5 |3.3.5 |x| | | | |1 + IP src addr not local host |3.3.5 | | |x| | |1 + Update Timestamp, Record Route options |3.3.5 |x| | | | |1 + Configurable switch for non-local SRing |3.3.5 |x| | | | |1 + Defaults to OFF |3.3.5 |x| | | | |1 + Satisfy gwy access rules for non-local SRing |3.3.5 |x| | | | |1 + If not forward, send Dest Unreach (cd 5) |3.3.5 | |x| | | |2 + | | | | | | | +BROADCAST: | | | | | | | + Broadcast addr as IP source addr |3.2.1.3 | | | | |x| + Receive 0 or -1 broadcast formats OK |3.3.6 | |x| | | | + Config'ble option to send 0 or -1 b'cast |3.3.6 | | |x| | | + Default to -1 broadcast |3.3.6 | |x| | | | + Recognize all broadcast address formats |3.3.6 |x| | | | | + Use IP b'cast/m'cast addr in link-layer b'cast |3.3.6 |x| | | | | + Silently discard link-layer-only b'cast dg's |3.3.6 | |x| | | | + Use Limited Broadcast addr for connected net |3.3.6 | |x| | | | + + + +Internet Engineering Task Force [Page 75] + + + + +RFC1122 INTERNET LAYER October 1989 
+ + + | | | | | | | +MULTICAST: | | | | | | | + Support local IP multicasting (RFC-1112) |3.3.7 | |x| | | | + Support IGMP (RFC-1112) |3.3.7 | | |x| | | + Join all-hosts group at startup |3.3.7 | |x| | | | + Higher layers learn i'face m'cast capability |3.3.7 | |x| | | | + | | | | | | | +INTERFACE: | | | | | | | + Allow transport layer to use all IP mechanisms |3.4 |x| | | | | + Pass interface ident up to transport layer |3.4 |x| | | | | + Pass all IP options up to transport layer |3.4 |x| | | | | + Transport layer can send certain ICMP messages |3.4 |x| | | | | + Pass spec'd ICMP messages up to transp. layer |3.4 |x| | | | | + Include IP hdr+8 octets or more from orig. |3.4 |x| | | | | + Able to leap tall buildings at a single bound |3.5 | |x| | | | + +Footnotes: + +(1) Only if feature is implemented. + +(2) This requirement is overruled if datagram is an ICMP error message. + +(3) Only if feature is implemented and is configured "on". + +(4) Unless has embedded gateway functionality or is source routed. + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 76] + + + + +RFC1122 TRANSPORT LAYER -- UDP October 1989 + + +4. TRANSPORT PROTOCOLS + + 4.1 USER DATAGRAM PROTOCOL -- UDP + + 4.1.1 INTRODUCTION + + The User Datagram Protocol UDP [UDP:1] offers only a minimal + transport service -- non-guaranteed datagram delivery -- and + gives applications direct access to the datagram service of the + IP layer. UDP is used by applications that do not require the + level of service of TCP or that wish to use communications + services (e.g., multicast or broadcast delivery) not available + from TCP. + + UDP is almost a null protocol; the only services it provides + over IP are checksumming of data and multiplexing by port + number. 
Therefore, an application program running over UDP + must deal directly with end-to-end communication problems that + a connection-oriented protocol would have handled -- e.g., + retransmission for reliable delivery, packetization and + reassembly, flow control, congestion avoidance, etc., when + these are required. The fairly complex coupling between IP and + TCP will be mirrored in the coupling between UDP and many + applications using UDP. + + 4.1.2 PROTOCOL WALK-THROUGH + + There are no known errors in the specification of UDP. + + 4.1.3 SPECIFIC ISSUES + + 4.1.3.1 Ports + + UDP well-known ports follow the same rules as TCP well-known + ports; see Section 4.2.2.1 below. + + If a datagram arrives addressed to a UDP port for which + there is no pending LISTEN call, UDP SHOULD send an ICMP + Port Unreachable message. + + 4.1.3.2 IP Options + + UDP MUST pass any IP option that it receives from the IP + layer transparently to the application layer. + + An application MUST be able to specify IP options to be sent + in its UDP datagrams, and UDP MUST pass these options to the + IP layer. + + + +Internet Engineering Task Force [Page 77] + + + + +RFC1122 TRANSPORT LAYER -- UDP October 1989 + + + DISCUSSION: + At present, the only options that need be passed + through UDP are Source Route, Record Route, and Time + Stamp. However, new options may be defined in the + future, and UDP need not and should not make any + assumptions about the format or content of options it + passes to or from the application; an exception to this + might be an IP-layer security option. + + An application based on UDP will need to obtain a + source route from a request datagram and supply a + reversed route for sending the corresponding reply. + + 4.1.3.3 ICMP Messages + + UDP MUST pass to the application layer all ICMP error + messages that it receives from the IP layer. Conceptually + at least, this may be accomplished with an upcall to the + ERROR_REPORT routine (see Section 4.2.4.1). 
+ + DISCUSSION: + Note that ICMP error messages resulting from sending a + UDP datagram are received asynchronously. A UDP-based + application that wants to receive ICMP error messages + is responsible for maintaining the state necessary to + demultiplex these messages when they arrive; for + example, the application may keep a pending receive + operation for this purpose. The application is also + responsible to avoid confusion from a delayed ICMP + error message resulting from an earlier use of the same + port(s). + + 4.1.3.4 UDP Checksums + + A host MUST implement the facility to generate and validate + UDP checksums. An application MAY optionally be able to + control whether a UDP checksum will be generated, but it + MUST default to checksumming on. + + If a UDP datagram is received with a checksum that is non- + zero and invalid, UDP MUST silently discard the datagram. + An application MAY optionally be able to control whether UDP + datagrams without checksums should be discarded or passed to + the application. + + DISCUSSION: + Some applications that normally run only across local + area networks have chosen to turn off UDP checksums for + + + +Internet Engineering Task Force [Page 78] + + + + +RFC1122 TRANSPORT LAYER -- UDP October 1989 + + + efficiency. As a result, numerous cases of undetected + errors have been reported. The advisability of ever + turning off UDP checksumming is very controversial. + + IMPLEMENTATION: + There is a common implementation error in UDP + checksums. Unlike the TCP checksum, the UDP checksum + is optional; the value zero is transmitted in the + checksum field of a UDP header to indicate the absence + of a checksum. If the transmitter really calculates a + UDP checksum of zero, it must transmit the checksum as + all 1's (65535). No special action is required at the + receiver, since zero and 65535 are equivalent in 1's + complement arithmetic. 
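The zero-versus-65535 rule described in the IMPLEMENTATION note can be illustrated with a short sender-side sketch. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, and the pseudo-header layout (source address, destination address, a zero octet, protocol 17, UDP length) is taken from the UDP specification rather than from the text above.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit 1's complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    """Checksum value a sender would place in the UDP header."""
    length = 8 + len(payload)                # UDP header is 8 octets
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    # A computed value of zero would be read as "no checksum", so it is
    # transmitted as all 1's instead.
    return checksum or 0xFFFF
```

A receiver needs no matching special case: summing the pseudo-header and the datagram with the transmitted checksum in place yields all 1's whether the sender computed zero or not.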
+ + 4.1.3.5 UDP Multihoming + + When a UDP datagram is received, its specific-destination + address MUST be passed up to the application layer. + + An application program MUST be able to specify the IP source + address to be used for sending a UDP datagram or to leave it + unspecified (in which case the networking software will + choose an appropriate source address). There SHOULD be a + way to communicate the chosen source address up to the + application layer (e.g., so that the application can later + receive a reply datagram only from the corresponding + interface). + + DISCUSSION: + A request/response application that uses UDP should use + a source address for the response that is the same as + the specific destination address of the request. See + the "General Issues" section of [INTRO:1]. + + 4.1.3.6 Invalid Addresses + + A UDP datagram received with an invalid IP source address + (e.g., a broadcast or multicast address) must be discarded + by UDP or by the IP layer (see Section 3.2.1.3). + + When a host sends a UDP datagram, the source address MUST be + (one of) the IP address(es) of the host. + + 4.1.4 UDP/APPLICATION LAYER INTERFACE + + The application interface to UDP MUST provide the full services + of the IP/transport interface described in Section 3.4 of this + + + +Internet Engineering Task Force [Page 79] + + + + +RFC1122 TRANSPORT LAYER -- UDP October 1989 + + + document. Thus, an application using UDP needs the functions + of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and + RECV_ICMP() calls described in Section 3.4. For example, + GET_MAXSIZES() can be used to learn the effective maximum UDP + datagram size for a particular {interface,remote + host,TOS} triplet. + + An application-layer program MUST be able to set the TTL and + TOS values as well as IP options for sending a UDP datagram, + and these values must be passed transparently to the IP layer. + UDP MAY pass the received TOS up to the application layer.
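As one concrete illustration of an application setting TTL and TOS for its UDP datagrams, the sketch below uses the Berkeley sockets interface as exposed by Python's `socket` module. The sockets API is only one possible realization of the interface described here and is not itself part of this specification; the helper name is hypothetical, and the sketch assumes a platform that defines `IP_TTL` and `IP_TOS`. Setting IP options is not shown.

```python
import socket


def open_udp_socket(ttl: int, tos: int) -> socket.socket:
    """Create a UDP socket whose outgoing datagrams use the given
    Time-to-Live and Type-of-Service values.

    Assumption: the platform exposes IP_TTL and IP_TOS at the IPPROTO_IP
    level, as common Unix-like systems do.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock
```

The values are handed to the IP layer unchanged by UDP, which is the transparency the paragraph above asks for; they can be read back with `getsockopt` using the same option names.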
+
+      4.1.5  UDP REQUIREMENTS SUMMARY
+
+
+                                                 |        | | | |S| |
+                                                 |        | | | |H| |F
+                                                 |        | | | |O|M|o
+                                                 |        | |S| |U|U|o
+                                                 |        | |H| |L|S|t
+                                                 |        |M|O| |D|T|n
+                                                 |        |U|U|M| | |o
+                                                 |        |S|L|A|N|N|t
+                                                 |        |T|D|Y|O|O|t
+FEATURE                                          |SECTION | | | |T|T|e
+-------------------------------------------------|--------|-|-|-|-|-|--
+                                                 |        | | | | | |
+    UDP                                          |        | | | | | |
+-------------------------------------------------|--------|-|-|-|-|-|--
+                                                 |        | | | | | |
+UDP send Port Unreachable                        |4.1.3.1 | |x| | | |
+                                                 |        | | | | | |
+IP Options in UDP                                |        | | | | | |
+  - Pass rcv'd IP options to applic layer        |4.1.3.2 |x| | | | |
+  - Applic layer can specify IP options in Send  |4.1.3.2 |x| | | | |
+  - UDP passes IP options down to IP layer       |4.1.3.2 |x| | | | |
+                                                 |        | | | | | |
+Pass ICMP msgs up to applic layer                |4.1.3.3 |x| | | | |
+                                                 |        | | | | | |
+UDP checksums:                                   |        | | | | | |
+  - Able to generate/check checksum              |4.1.3.4 |x| | | | |
+  - Silently discard bad checksum                |4.1.3.4 |x| | | | |
+  - Sender Option to not generate checksum       |4.1.3.4 | | |x| | |
+  - Default is to checksum                       |4.1.3.4 |x| | | | |
+  - Receiver Option to require checksum          |4.1.3.4 | | |x| | |
+                                                 |        | | | | | |
+UDP Multihoming                                  |        | | | | | |
+  - Pass spec-dest addr to application           |4.1.3.5 |x| | | | |
+  - Applic layer can specify Local IP addr       |4.1.3.5 |x| | | | |
+  - Applic layer specify wild Local IP addr      |4.1.3.5 |x| | | | |
+  - Applic layer notified of Local IP addr used  |4.1.3.5 | |x| | | |
+                                                 |        | | | | | |
+Bad IP src addr silently discarded by UDP/IP     |4.1.3.6 |x| | | | |
+Only send valid IP source address                |4.1.3.6 |x| | | | |
+UDP Application Interface Services               |        | | | | | |
+Full IP interface of 3.4 for application         |4.1.4   |x| | | | |
+  - Able to spec TTL, TOS, IP opts when send dg  |4.1.4   |x| | | | |
+  - Pass received TOS up to applic layer         |4.1.4   | | |x| | |
+
+
+
+Internet
Engineering Task Force [Page 81] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP + + 4.2.1 INTRODUCTION + + The Transmission Control Protocol TCP [TCP:1] is the primary + virtual-circuit transport protocol for the Internet suite. TCP + provides reliable, in-sequence delivery of a full-duplex stream + of octets (8-bit bytes). TCP is used by those applications + needing reliable, connection-oriented transport service, e.g., + mail (SMTP), file transfer (FTP), and virtual terminal service + (Telnet); requirements for these application-layer protocols + are described in [INTRO:1]. + + 4.2.2 PROTOCOL WALK-THROUGH + + 4.2.2.1 Well-Known Ports: RFC-793 Section 2.7 + + DISCUSSION: + TCP reserves port numbers in the range 0-255 for + "well-known" ports, used to access services that are + standardized across the Internet. The remainder of the + port space can be freely allocated to application + processes. Current well-known port definitions are + listed in the RFC entitled "Assigned Numbers" + [INTRO:6]. A prerequisite for defining a new well- + known port is an RFC documenting the proposed service + in enough detail to allow new implementations. + + Some systems extend this notion by adding a third + subdivision of the TCP port space: reserved ports, + which are generally used for operating-system-specific + services. For example, reserved ports might fall + between 256 and some system-dependent upper limit. + Some systems further choose to protect well-known and + reserved ports by permitting only privileged users to + open TCP connections with those port values. This is + perfectly reasonable as long as the host does not + assume that all hosts protect their low-numbered ports + in this manner. + + 4.2.2.2 Use of Push: RFC-793 Section 2.8 + + When an application issues a series of SEND calls without + setting the PUSH flag, the TCP MAY aggregate the data + internally without sending it. 
Similarly, when a series of + segments is received without the PSH bit, a TCP MAY queue + the data internally without passing it to the receiving + application. + + + +Internet Engineering Task Force [Page 82] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + The PSH bit is not a record marker and is independent of + segment boundaries. The transmitter SHOULD collapse + successive PSH bits when it packetizes data, to send the + largest possible segment. + + A TCP MAY implement PUSH flags on SEND calls. If PUSH flags + are not implemented, then the sending TCP: (1) must not + buffer data indefinitely, and (2) MUST set the PSH bit in + the last buffered segment (i.e., when there is no more + queued data to be sent). + + The discussion in RFC-793 on pages 48, 50, and 74 + erroneously implies that a received PSH flag must be passed + to the application layer. Passing a received PSH flag to + the application layer is now OPTIONAL. + + An application program is logically required to set the PUSH + flag in a SEND call whenever it needs to force delivery of + the data to avoid a communication deadlock. However, a TCP + SHOULD send a maximum-sized segment whenever possible, to + improve performance (see Section 4.2.3.4). + + DISCUSSION: + When the PUSH flag is not implemented on SEND calls, + i.e., when the application/TCP interface uses a pure + streaming model, responsibility for aggregating any + tiny data fragments to form reasonable sized segments + is partially borne by the application layer. + + Generally, an interactive application protocol must set + the PUSH flag at least in the last SEND call in each + command or response sequence. A bulk transfer protocol + like FTP should set the PUSH flag on the last segment + of a file or when necessary to prevent buffer deadlock. + + At the receiver, the PSH bit forces buffered data to be + delivered to the application (even if less than a full + buffer has been received). 
Conversely, the lack of a + PSH bit can be used to avoid unnecessary wakeup calls + to the application process; this can be an important + performance optimization for large timesharing hosts. + Passing the PSH bit to the receiving application allows + an analogous optimization within the application. + + 4.2.2.3 Window Size: RFC-793 Section 3.1 + + The window size MUST be treated as an unsigned number, or + else large window sizes will appear like negative windows + + + +Internet Engineering Task Force [Page 83] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + and TCP will not work. It is RECOMMENDED that + implementations reserve 32-bit fields for the send and + receive window sizes in the connection record and do all + window computations with 32 bits. + + DISCUSSION: + It is known that the window field in the TCP header is + too small for high-speed, long-delay paths. + Experimental TCP options have been defined to extend + the window size; see for example [TCP:11]. In + anticipation of the adoption of such an extension, TCP + implementors should treat windows as 32 bits. + + 4.2.2.4 Urgent Pointer: RFC-793 Section 3.1 + + The second sentence is in error: the urgent pointer points + to the sequence number of the LAST octet (not LAST+1) in a + sequence of urgent data. The description on page 56 (last + sentence) is correct. + + A TCP MUST support a sequence of urgent data of any length. + + A TCP MUST inform the application layer asynchronously + whenever it receives an Urgent pointer and there was + previously no pending urgent data, or whenever the Urgent + pointer advances in the data stream. There MUST be a way + for the application to learn how much urgent data remains to + be read from the connection, or at least to determine + whether or not more urgent data remains to be read. 
+ + DISCUSSION: + Although the Urgent mechanism may be used for any + application, it is normally used to send "interrupt"- + type commands to a Telnet program (see "Using Telnet + Synch Sequence" section in [INTRO:1]). + + The asynchronous or "out-of-band" notification will + allow the application to go into "urgent mode", reading + data from the TCP connection. This allows control + commands to be sent to an application whose normal + input buffers are full of unprocessed data. + + IMPLEMENTATION: + The generic ERROR-REPORT() upcall described in Section + 4.2.4.1 is a possible mechanism for informing the + application of the arrival of urgent data. + + + + + +Internet Engineering Task Force [Page 84] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.2.5 TCP Options: RFC-793 Section 3.1 + + A TCP MUST be able to receive a TCP option in any segment. + A TCP MUST ignore without error any TCP option it does not + implement, assuming that the option has a length field (all + TCP options defined in the future will have length fields). + TCP MUST be prepared to handle an illegal option length + (e.g., zero) without crashing; a suggested procedure is to + reset the connection and log the reason. + + 4.2.2.6 Maximum Segment Size Option: RFC-793 Section 3.1 + + TCP MUST implement both sending and receiving the Maximum + Segment Size option [TCP:4]. + + TCP SHOULD send an MSS (Maximum Segment Size) option in + every SYN segment when its receive MSS differs from the + default 536, and MAY send it always. + + If an MSS option is not received at connection setup, TCP + MUST assume a default send MSS of 536 (576-40) [TCP:4]. 
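
A minimal sketch of the option-handling rules in 4.2.2.5 and the MSS default stated just above, assuming the option kinds defined in RFC-793 (0 = End of Option List, 1 = No-Operation, 2 = Maximum Segment Size). The function and exception names are hypothetical; raising an exception stands in for the suggested "reset the connection and log the reason".

```python
DEFAULT_SEND_MSS = 536  # 576-octet default datagram minus 40 octets of headers

OPT_END, OPT_NOP, OPT_MSS = 0, 1, 2


class BadOptionLength(Exception):
    """Illegal option length; the caller may reset the connection and log it."""


def parse_send_mss(options: bytes) -> int:
    """Return the send MSS announced in a SYN's option octets, or the default."""
    send_mss = DEFAULT_SEND_MSS
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == OPT_END:
            break
        if kind == OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            raise BadOptionLength("option truncated before its length octet")
        length = options[i + 1]
        # The length covers the kind and length octets, so it is at least 2.
        if length < 2 or i + length > len(options):
            raise BadOptionLength(f"illegal length {length} for option kind {kind}")
        if kind == OPT_MSS:
            if length != 4:
                raise BadOptionLength(f"MSS option with length {length}")
            send_mss = int.from_bytes(options[i + 2:i + 4], "big")
        # Any other kind is not implemented here and is skipped without error.
        i += length
    return send_mss
```

Skipping unknown kinds by their length octet is what lets a TCP ignore options it does not implement, and the explicit length checks keep a zero or oversized length from looping forever or reading past the header.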
+ + The maximum size of a segment that TCP really sends, the + "effective send MSS," MUST be the smaller of the send MSS + (which reflects the available reassembly buffer size at the + remote host) and the largest size permitted by the IP layer: + + Eff.snd.MSS = + + min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize + + where: + + * SendMSS is the MSS value received from the remote host, + or the default 536 if no MSS option is received. + + * MMS_S is the maximum size for a transport-layer message + that TCP may send. + + * TCPhdrsize is the size of the TCP header; this is + normally 20, but may be larger if TCP options are to be + sent. + + * IPoptionsize is the size of any IP options that TCP + will pass to the IP layer with the current message. + + + The MSS value to be sent in an MSS option must be less than + + + +Internet Engineering Task Force [Page 85] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + or equal to: + + MMS_R - 20 + + where MMS_R is the maximum size for a transport-layer + message that can be received (and reassembled). TCP obtains + MMS_R and MMS_S from the IP layer; see the generic call + GET_MAXSIZES in Section 3.4. + + DISCUSSION: + The choice of TCP segment size has a strong effect on + performance. Larger segments increase throughput by + amortizing header size and per-datagram processing + overhead over more data bytes; however, if the packet + is so large that it causes IP fragmentation, efficiency + drops sharply if any fragments are lost [IP:9]. + + Some TCP implementations send an MSS option only if the + destination host is on a non-connected network. + However, in general the TCP layer may not have the + appropriate information to make this decision, so it is + preferable to leave to the IP layer the task of + determining a suitable MTU for the Internet path. We + therefore recommend that TCP always send the option (if + not 536) and that the IP layer determine MMS_R as + specified in 3.3.3 and 3.4. 
A proposed IP-layer + mechanism to measure the MTU would then modify the IP + layer without changing TCP. + + 4.2.2.7 TCP Checksum: RFC-793 Section 3.1 + + Unlike the UDP checksum (see Section 4.1.3.4), the TCP + checksum is never optional. The sender MUST generate it and + the receiver MUST check it. + + 4.2.2.8 TCP Connection State Diagram: RFC-793 Section 3.2, + page 23 + + There are several problems with this diagram: + + (a) The arrow from SYN-SENT to SYN-RCVD should be labeled + with "snd SYN,ACK", to agree with the text on page 68 + and with Figure 8. + + (b) There could be an arrow from SYN-RCVD state to LISTEN + state, conditioned on receiving a RST after a passive + open (see text page 70). + + + + +Internet Engineering Task Force [Page 86] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + (c) It is possible to go directly from FIN-WAIT-1 to the + TIME-WAIT state (see page 75 of the spec). + + + 4.2.2.9 Initial Sequence Number Selection: RFC-793 Section + 3.3, page 27 + + A TCP MUST use the specified clock-driven selection of + initial sequence numbers. + + 4.2.2.10 Simultaneous Open Attempts: RFC-793 Section 3.4, page + 32 + + There is an error in Figure 8: the packet on line 7 should + be identical to the packet on line 5. + + A TCP MUST support simultaneous open attempts. + + DISCUSSION: + It sometimes surprises implementors that if two + applications attempt to simultaneously connect to each + other, only one connection is generated instead of two. + This was an intentional design decision; don't try to + "fix" it. + + 4.2.2.11 Recovery from Old Duplicate SYN: RFC-793 Section 3.4, + page 33 + + Note that a TCP implementation MUST keep track of whether a + connection has reached SYN_RCVD state as the result of a + passive OPEN or an active OPEN. + + 4.2.2.12 RST Segment: RFC-793 Section 3.4 + + A TCP SHOULD allow a received RST segment to include data. 
+ + DISCUSSION + It has been suggested that a RST segment could contain + ASCII text that encoded and explained the cause of the + RST. No standard has yet been established for such + data. + + 4.2.2.13 Closing a Connection: RFC-793 Section 3.5 + + A TCP connection may terminate in two ways: (1) the normal + TCP close sequence using a FIN handshake, and (2) an "abort" + in which one or more RST segments are sent and the + connection state is immediately discarded. If a TCP + + + +Internet Engineering Task Force [Page 87] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + connection is closed by the remote site, the local + application MUST be informed whether it closed normally or + was aborted. + + The normal TCP close sequence delivers buffered data + reliably in both directions. Since the two directions of a + TCP connection are closed independently, it is possible for + a connection to be "half closed," i.e., closed in only one + direction, and a host is permitted to continue sending data + in the open direction on a half-closed connection. + + A host MAY implement a "half-duplex" TCP close sequence, so + that an application that has called CLOSE cannot continue to + read data from the connection. If such a host issues a + CLOSE call while received data is still pending in TCP, or + if new data is received after CLOSE is called, its TCP + SHOULD send a RST to show that data was lost. + + When a connection is closed actively, it MUST linger in + TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime). + However, it MAY accept a new SYN from the remote TCP to + reopen the connection directly from TIME-WAIT state, if it: + + (1) assigns its initial sequence number for the new + connection to be larger than the largest sequence + number it used on the previous connection incarnation, + and + + (2) returns to TIME-WAIT state if the SYN turns out to be + an old duplicate. 
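
Condition (1) above can be sketched as a choice of initial sequence number in modulo-2^32 sequence space. This is an illustration under stated assumptions: `clock_isn` stands for the value the clock-driven generator would produce, `prev_max_seq` for the largest sequence number used by the previous incarnation, and both helper names are hypothetical. Condition (2), falling back to TIME-WAIT for an old duplicate SYN, is not modeled.

```python
SEQ_MOD = 1 << 32


def seq_gt(a: int, b: int) -> bool:
    """True if sequence number a is later than b in modulo-2^32 space."""
    return a != b and (a - b) % SEQ_MOD < (1 << 31)


def isn_for_time_wait_reopen(clock_isn: int, prev_max_seq: int) -> int:
    """Pick an ISN that exceeds every sequence number of the old incarnation."""
    if seq_gt(clock_isn, prev_max_seq):
        return clock_isn
    # The clock has not yet passed the old connection; start just beyond it.
    return (prev_max_seq + 1) % SEQ_MOD
```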
+ + + DISCUSSION: + TCP's full-duplex data-preserving close is a feature + that is not included in the analogous ISO transport + protocol TP4. + + Some systems have not implemented half-closed + connections, presumably because they do not fit into + the I/O model of their particular operating system. On + these systems, once an application has called CLOSE, it + can no longer read input data from the connection; this + is referred to as a "half-duplex" TCP close sequence. + + The graceful close algorithm of TCP requires that the + connection state remain defined on (at least) one end + of the connection, for a timeout period of 2xMSL, i.e., + 4 minutes. During this period, the (remote socket, + + + +Internet Engineering Task Force [Page 88] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + local socket) pair that defines the connection is busy + and cannot be reused. To shorten the time that a given + port pair is tied up, some TCPs allow a new SYN to be + accepted in TIME-WAIT state. + + 4.2.2.14 Data Communication: RFC-793 Section 3.7, page 40 + + Since RFC-793 was written, there has been extensive work on + TCP algorithms to achieve efficient data communication. + Later sections of the present document describe required and + recommended TCP algorithms to determine when to send data + (Section 4.2.3.4), when to send an acknowledgment (Section + 4.2.3.2), and when to update the window (Section 4.2.3.3). + + DISCUSSION: + One important performance issue is "Silly Window + Syndrome" or "SWS" [TCP:5], a stable pattern of small + incremental window movements resulting in extremely + poor TCP performance. Algorithms to avoid SWS are + described below for both the sending side (Section + 4.2.3.4) and the receiving side (Section 4.2.3.3). 
+ + In brief, SWS is caused by the receiver advancing the + right window edge whenever it has any new buffer space + available to receive data and by the sender using any + incremental window, no matter how small, to send more + data [TCP:5]. The result can be a stable pattern of + sending tiny data segments, even though both sender and + receiver have a large total buffer space for the + connection. SWS can only occur during the transmission + of a large amount of data; if the connection goes + quiescent, the problem will disappear. It is caused by + typical straightforward implementation of window + management, but the sender and receiver algorithms + given below will avoid it. + + Another important TCP performance issue is that some + applications, especially remote login to character-at- + a-time hosts, tend to send streams of one-octet data + segments. To avoid deadlocks, every TCP SEND call from + such applications must be "pushed", either explicitly + by the application or else implicitly by TCP. The + result may be a stream of TCP segments that contain one + data octet each, which makes very inefficient use of + the Internet and contributes to Internet congestion. + The Nagle Algorithm described in Section 4.2.3.4 + provides a simple and effective solution to this + problem. It does have the effect of clumping + + + +Internet Engineering Task Force [Page 89] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + characters over Telnet connections; this may initially + surprise users accustomed to single-character echo, but + user acceptance has not been a problem. + + Note that the Nagle algorithm and the send SWS + avoidance algorithm play complementary roles in + improving performance. The Nagle algorithm discourages + sending tiny segments when the data to be sent + increases in small increments, while the SWS avoidance + algorithm discourages small segments resulting from the + right window edge advancing in small increments. 
+ + A careless implementation can send two or more + acknowledgment segments per data segment received. For + example, suppose the receiver acknowledges every data + segment immediately. When the application program + subsequently consumes the data and increases the + available receive buffer space again, the receiver may + send a second acknowledgment segment to update the + window at the sender. The extreme case occurs with + single-character segments on TCP connections using the + Telnet protocol for remote login service. Some + implementations have been observed in which each + incoming 1-character segment generates three return + segments: (1) the acknowledgment, (2) a one byte + increase in the window, and (3) the echoed character, + respectively. + + 4.2.2.15 Retransmission Timeout: RFC-793 Section 3.7, page 41 + + The algorithm suggested in RFC-793 for calculating the + retransmission timeout is now known to be inadequate; see + Section 4.2.3.1 below. + + Recent work by Jacobson [TCP:7] on Internet congestion and + TCP retransmission stability has produced a transmission + algorithm combining "slow start" with "congestion + avoidance". A TCP MUST implement this algorithm. + + If a retransmitted packet is identical to the original + packet (which implies not only that the data boundaries have + not changed, but also that the window and acknowledgment + fields of the header have not changed), then the same IP + Identification field MAY be used (see Section 3.2.1.5). + + IMPLEMENTATION: + Some TCP implementors have chosen to "packetize" the + data stream, i.e., to pick segment boundaries when + + + +Internet Engineering Task Force [Page 90] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + segments are originally sent and to queue these + segments in a "retransmission queue" until they are + acknowledged. 
Another design (which may be simpler) is + to defer packetizing until each time data is + transmitted or retransmitted, so there will be no + segment retransmission queue. + + In an implementation with a segment retransmission + queue, TCP performance may be enhanced by repacketizing + the segments awaiting acknowledgment when the first + retransmission timeout occurs. That is, the + outstanding segments that fitted would be combined into + one maximum-sized segment, with a new IP Identification + value. The TCP would then retain this combined segment + in the retransmit queue until it was acknowledged. + However, if the first two segments in the + retransmission queue totalled more than one maximum- + sized segment, the TCP would retransmit only the first + segment using the original IP Identification field. + + 4.2.2.16 Managing the Window: RFC-793 Section 3.7, page 41 + + A TCP receiver SHOULD NOT shrink the window, i.e., move the + right window edge to the left. However, a sending TCP MUST + be robust against window shrinking, which may cause the + "useable window" (see Section 4.2.3.4) to become negative. + + If this happens, the sender SHOULD NOT send new data, but + SHOULD retransmit normally the old unacknowledged data + between SND.UNA and SND.UNA+SND.WND. The sender MAY also + retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT + time out the connection if data beyond the right window edge + is not acknowledged. If the window shrinks to zero, the TCP + MUST probe it in the standard way (see next Section). + + DISCUSSION: + Many TCP implementations become confused if the window + shrinks from the right after data has been sent into a + larger window. Note that TCP has a heuristic to select + the latest window update despite possible datagram + reordering; as a result, it may ignore a window update + with a smaller window than previously offered if + neither the sequence number nor the acknowledgment + number is increased. 
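
The sender-side rules for a shrunken window can be sketched as below. The sketch assumes the usable window is computed as SND.UNA + SND.WND - SND.NXT, as defined in Section 4.2.3.4; the function names and the returned dictionary are hypothetical and only summarize what the text permits.

```python
SEQ_MOD = 1 << 32


def seq_diff(a: int, b: int) -> int:
    """Signed distance a - b in modulo-2^32 sequence space."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= (1 << 31) else d


def send_plan(snd_una: int, snd_nxt: int, snd_wnd: int) -> dict:
    """Summarize what a sender may do for the given send variables."""
    right_edge = (snd_una + snd_wnd) % SEQ_MOD
    usable = seq_diff(right_edge, snd_nxt)  # negative after a shrink
    in_flight = seq_diff(snd_nxt, snd_una)
    return {
        "usable_window": usable,
        # No new data while the usable window is zero or negative.
        "may_send_new_data": usable > 0,
        # Old data between SND.UNA and SND.UNA+SND.WND is retransmitted normally.
        "retransmit_octets": min(in_flight, snd_wnd),
        # A window that shrinks to zero is probed in the standard way.
        "probe_zero_window": snd_wnd == 0,
    }
```

For example, with SND.UNA = 1000, SND.NXT = 1500 and a window that has shrunk to 200, the usable window is -300: no new data is sent, and only the first 200 unacknowledged octets remain eligible for normal retransmission.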
+ + + + + + + +Internet Engineering Task Force [Page 91] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.2.17 Probing Zero Windows: RFC-793 Section 3.7, page 42 + + Probing of zero (offered) windows MUST be supported. + + A TCP MAY keep its offered receive window closed + indefinitely. As long as the receiving TCP continues to + send acknowledgments in response to the probe segments, the + sending TCP MUST allow the connection to stay open. + + DISCUSSION: + It is extremely important to remember that ACK + (acknowledgment) segments that contain no data are not + reliably transmitted by TCP. If zero window probing is + not supported, a connection may hang forever when an + ACK segment that re-opens the window is lost. + + The delay in opening a zero window generally occurs + when the receiving application stops taking data from + its TCP. For example, consider a printer daemon + application, stopped because the printer ran out of + paper. + + The transmitting host SHOULD send the first zero-window + probe when a zero window has existed for the retransmission + timeout period (see Section 4.2.2.15), and SHOULD increase + exponentially the interval between successive probes. + + DISCUSSION: + This procedure minimizes delay if the zero-window + condition is due to a lost ACK segment containing a + window-opening update. Exponential backoff is + recommended, possibly with some maximum interval not + specified here. This procedure is similar to that of + the retransmission algorithm, and it may be possible to + combine the two procedures in the implementation. + + 4.2.2.18 Passive OPEN Calls: RFC-793 Section 3.8 + + Every passive OPEN call either creates a new connection + record in LISTEN state, or it returns an error; it MUST NOT + affect any previously created connection record. 
+ + A TCP that supports multiple concurrent users MUST provide + an OPEN call that will functionally allow an application to + LISTEN on a port while a connection block with the same + local port is in SYN-SENT or SYN-RECEIVED state. + + DISCUSSION: + + + +Internet Engineering Task Force [Page 92] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + Some applications (e.g., SMTP servers) may need to + handle multiple connection attempts at about the same + time. The probability of a connection attempt failing + is reduced by giving the application some means of + listening for a new connection at the same time that an + earlier connection attempt is going through the three- + way handshake. + + IMPLEMENTATION: + Acceptable implementations of concurrent opens may + permit multiple passive OPEN calls, or they may allow + "cloning" of LISTEN-state connections from a single + passive OPEN call. + + 4.2.2.19 Time to Live: RFC-793 Section 3.9, page 52 + + RFC-793 specified that TCP was to request the IP layer to + send TCP segments with TTL = 60. This is obsolete; the TTL + value used to send TCP segments MUST be configurable. See + Section 3.2.1.7 for discussion. + + 4.2.2.20 Event Processing: RFC-793 Section 3.9 + + While it is not strictly required, a TCP SHOULD be capable + of queueing out-of-order TCP segments. Change the "may" in + the last sentence of the first paragraph on page 70 to + "should". + + DISCUSSION: + Some small-host implementations have omitted segment + queueing because of limited buffer space. This + omission may be expected to adversely affect TCP + throughput, since loss of a single segment causes all + later segments to appear to be "out of sequence". + + In general, the processing of received segments MUST be + implemented to aggregate ACK segments whenever possible. + For example, if the TCP is processing a series of queued + segments, it MUST process them all before sending any ACK + segments. 
+ + Here are some detailed error corrections and notes on the + Event Processing section of RFC-793. + + (a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK + state, not CLOSING. + + (b) LISTEN state, check for SYN (pp. 65, 66): With a SYN + + + +Internet Engineering Task Force [Page 93] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + bit, if the security/compartment or the precedence is + wrong for the segment, a reset is sent. The wrong form + of reset is shown in the text; it should be: + + <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK> + + + (c) SYN-SENT state, Check for SYN, p. 68: When the + connection enters ESTABLISHED state, the following + variables must be set: + SND.WND <- SEG.WND + SND.WL1 <- SEG.SEQ + SND.WL2 <- SEG.ACK + + + (d) Check security and precedence, p. 71: The first heading + "ESTABLISHED STATE" should really be a list of all + states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT- + 1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and + TIME-WAIT. + + (e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if + the connection was initiated with a passive OPEN, then + return this connection to the LISTEN state and return. + Otherwise...". + + (f) Check ACK field, SYN-RECEIVED state, p. 72: When the + connection enters ESTABLISHED state, the variables + listed in (c) must be set. + + (g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a + duplicate if SEG.ACK =< SND.UNA (the = was omitted). + Similarly, the window should be updated if: SND.UNA =< + SEG.ACK =< SND.NXT. + + (h) USER TIMEOUT, p. 77: + + It would be better to notify the application of the + timeout rather than letting TCP force the connection + closed. However, see also Section 4.2.3.5. + + + 4.2.2.21 Acknowledging Queued Segments: RFC-793 Section 3.9 + + A TCP MAY send an ACK segment acknowledging RCV.NXT when a + valid segment arrives that is in the window but not at the + left window edge. 
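
A minimal sketch of the test implied by the paragraph above: the segment starts inside the receive window but not at its left edge, so the receiver MAY answer with an ACK carrying RCV.NXT. The helper names are hypothetical, and the check looks only at the first octet of the segment, which simplifies the full acceptability test of RFC-793.

```python
SEQ_MOD = 1 << 32


def in_receive_window(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seg_seq lies in [RCV.NXT, RCV.NXT+RCV.WND) modulo 2^32."""
    return (seg_seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def may_ack_out_of_order(seg_seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True for a segment that is in the window but not at the left edge."""
    return seg_seq != rcv_nxt and in_receive_window(seg_seq, rcv_nxt, rcv_wnd)
```

The modulo subtraction keeps the comparison correct when the window straddles the wraparound from 2^32 - 1 to 0.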
+ + + + +Internet Engineering Task Force [Page 94] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + DISCUSSION: + RFC-793 (see page 74) was ambiguous about whether or + not an ACK segment should be sent when an out-of-order + segment was received, i.e., when SEG.SEQ was unequal to + RCV.NXT. + + One reason for ACKing out-of-order segments might be to + support an experimental algorithm known as "fast + retransmit". With this algorithm, the sender uses the + "redundant" ACK's to deduce that a segment has been + lost before the retransmission timer has expired. It + counts the number of times an ACK has been received + with the same value of SEG.ACK and with the same right + window edge. If more than a threshold number of such + ACK's is received, then the segment containing the + octets starting at SEG.ACK is assumed to have been lost + and is retransmitted, without awaiting a timeout. The + threshold is chosen to compensate for the maximum + likely segment reordering in the Internet. There is + not yet enough experience with the fast retransmit + algorithm to determine how useful it is. + + 4.2.3 SPECIFIC ISSUES + + 4.2.3.1 Retransmission Timeout Calculation + + A host TCP MUST implement Karn's algorithm and Jacobson's + algorithm for computing the retransmission timeout ("RTO"). + + o Jacobson's algorithm for computing the smoothed round- + trip ("RTT") time incorporates a simple measure of the + variance [TCP:7]. + + o Karn's algorithm for selecting RTT measurements ensures + that ambiguous round-trip times will not corrupt the + calculation of the smoothed round-trip time [TCP:6]. + + This implementation also MUST include "exponential backoff" + for successive RTO values for the same segment. + Retransmission of SYN segments SHOULD use the same algorithm + as data segments. + + DISCUSSION: + There were two known problems with the RTO calculations + specified in RFC-793. 
First, the accurate measurement + of RTTs is difficult when there are retransmissions. + Second, the algorithm to compute the smoothed round- + trip time is inadequate [TCP:7], because it incorrectly + + + +Internet Engineering Task Force [Page 95] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + assumed that the variance in RTT values would be small + and constant. These problems were solved by Karn's and + Jacobson's algorithm, respectively. + + The performance increase resulting from the use of + these improvements varies from noticeable to dramatic. + Jacobson's algorithm for incorporating the measured RTT + variance is especially important on a low-speed link, + where the natural variation of packet sizes causes a + large variation in RTT. One vendor found link + utilization on a 9.6kb line went from 10% to 90% as a + result of implementing Jacobson's variance algorithm in + TCP. + + The following values SHOULD be used to initialize the + estimation parameters for a new connection: + + (a) RTT = 0 seconds. + + (b) RTO = 3 seconds. (The smoothed variance is to be + initialized to the value that will result in this RTO). + + The recommended upper and lower bounds on the RTO are known + to be inadequate on large internets. The lower bound SHOULD + be measured in fractions of a second (to accommodate high + speed LANs) and the upper bound should be 2*MSL, i.e., 240 + seconds. + + DISCUSSION: + Experience has shown that these initialization values + are reasonable, and that in any case the Karn and + Jacobson algorithms make TCP behavior reasonably + insensitive to the initial parameter choices. + + 4.2.3.2 When to Send an ACK Segment + + A host that is receiving a stream of TCP data segments can + increase efficiency in both the Internet and the hosts by + sending fewer than one ACK (acknowledgment) segment per data + segment received; this is known as a "delayed ACK" [TCP:5]. 
+ + A TCP SHOULD implement a delayed ACK, but an ACK should not + be excessively delayed; in particular, the delay MUST be + less than 0.5 seconds, and in a stream of full-sized + segments there SHOULD be an ACK for at least every second + segment. + + DISCUSSION: + + + +Internet Engineering Task Force [Page 96] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + A delayed ACK gives the application an opportunity to + update the window and perhaps to send an immediate + response. In particular, in the case of character-mode + remote login, a delayed ACK can reduce the number of + segments sent by the server by a factor of 3 (ACK, + window update, and echo character all combined in one + segment). + + In addition, on some large multi-user hosts, a delayed + ACK can substantially reduce protocol processing + overhead by reducing the total number of packets to be + processed [TCP:5]. However, excessive delays on ACK's + can disturb the round-trip timing and packet "clocking" + algorithms [TCP:7]. + + 4.2.3.3 When to Send a Window Update + + A TCP MUST include a SWS avoidance algorithm in the receiver + [TCP:5]. + + IMPLEMENTATION: + The receiver's SWS avoidance algorithm determines when + the right window edge may be advanced; this is + customarily known as "updating the window". This + algorithm combines with the delayed ACK algorithm (see + Section 4.2.3.2) to determine when an ACK segment + containing the current window will really be sent to + the receiver. We use the notation of RFC-793; see + Figures 4 and 5 in that document. + + The solution to receiver SWS is to avoid advancing the + right window edge RCV.NXT+RCV.WND in small increments, + even if data is received from the network in small + segments. + + Suppose the total receive buffer space is RCV.BUFF. At + any given moment, RCV.USER octets of this total may be + tied up with data that has been received and + acknowledged but which the user process has not yet + consumed. 
When the connection is quiescent, RCV.WND = + RCV.BUFF and RCV.USER = 0. + + Keeping the right window edge fixed as data arrives and + is acknowledged requires that the receiver offer less + than its full buffer space, i.e., the receiver must + specify a RCV.WND that keeps RCV.NXT+RCV.WND constant + as RCV.NXT increases. Thus, the total buffer space + RCV.BUFF is generally divided into three parts: + + + +Internet Engineering Task Force [Page 97] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + + |<------- RCV.BUFF ---------------->| + 1 2 3 + ----|---------|------------------|------|---- + RCV.NXT ^ + (Fixed) + + 1 - RCV.USER = data received but not yet consumed; + 2 - RCV.WND = space advertised to sender; + 3 - Reduction = space available but not yet + advertised. + + + The suggested SWS avoidance algorithm for the receiver + is to keep RCV.NXT+RCV.WND fixed until the reduction + satisfies: + + RCV.BUFF - RCV.USER - RCV.WND >= + + min( Fr * RCV.BUFF, Eff.snd.MSS ) + + where Fr is a fraction whose recommended value is 1/2, + and Eff.snd.MSS is the effective send MSS for the + connection (see Section 4.2.2.6). When the inequality + is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER. + + Note that the general effect of this algorithm is to + advance RCV.WND in increments of Eff.snd.MSS (for + realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2). + Note also that the receiver must use its own + Eff.snd.MSS, assuming it is the same as the sender's. + + 4.2.3.4 When to Send Data + + A TCP MUST include a SWS avoidance algorithm in the sender. + + A TCP SHOULD implement the Nagle Algorithm [TCP:9] to + coalesce short segments. However, there MUST be a way for + an application to disable the Nagle algorithm on an + individual connection. In all cases, sending data is also + subject to the limitation imposed by the Slow Start + algorithm (Section 4.2.2.15). 
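
The receiver-side SWS avoidance rule of Section 4.2.3.3 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and argument names are hypothetical, and the caller is assumed to have already shrunk `rcv_wnd` by the number of octets received, so that RCV.NXT+RCV.WND has stayed fixed.

```python
def receiver_window(rcv_buff, rcv_user, rcv_wnd, eff_snd_mss, fr=0.5):
    """Return the RCV.WND value to advertise next.

    rcv_buff    -- total receive buffer space (RCV.BUFF)
    rcv_user    -- octets received but not yet consumed (RCV.USER)
    rcv_wnd     -- window currently advertised (RCV.WND)
    eff_snd_mss -- effective send MSS for the connection (Eff.snd.MSS)
    fr          -- the fraction Fr; 1/2 is the recommended value
    """
    reduction = rcv_buff - rcv_user - rcv_wnd   # space not yet advertised
    if reduction >= min(fr * rcv_buff, eff_snd_mss):
        # The inequality is satisfied: open the window fully.
        return rcv_buff - rcv_user
    # Otherwise keep the right window edge where it is.
    return rcv_wnd
```

With an 8192-octet buffer and a 1460-octet MSS, a reduction of 1000 octets leaves the advertised window unchanged, while a reduction of 2500 octets reopens it to the full free space.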
+ + DISCUSSION: + The Nagle algorithm is generally as follows: + + If there is unacknowledged data (i.e., SND.NXT > + SND.UNA), then the sending TCP buffers all user + + + +Internet Engineering Task Force [Page 98] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + data (regardless of the PSH bit), until the + outstanding data has been acknowledged or until + the TCP can send a full-sized segment (Eff.snd.MSS + bytes; see Section 4.2.2.6). + + Some applications (e.g., real-time display window + updates) require that the Nagle algorithm be turned + off, so small data segments can be streamed out at the + maximum rate. + + IMPLEMENTATION: + The sender's SWS avoidance algorithm is more difficult + than the receiver's, because the sender does not know + (directly) the receiver's total buffer space RCV.BUFF. + An approach which has been found to work well is for + the sender to calculate Max(SND.WND), the maximum send + window it has seen so far on the connection, and to use + this value as an estimate of RCV.BUFF. Unfortunately, + this can only be an estimate; the receiver may at any + time reduce the size of RCV.BUFF. To avoid a resulting + deadlock, it is necessary to have a timeout to force + transmission of data, overriding the SWS avoidance + algorithm. In practice, this timeout should seldom + occur. + + The "useable window" [TCP:5] is: + + U = SND.UNA + SND.WND - SND.NXT + + i.e., the offered window less the amount of data sent + but not acknowledged. If D is the amount of data + queued in the sending TCP but not yet sent, then the + following set of rules is recommended.
+ + Send data: + + (1) if a maximum-sized segment can be sent, i.e., if: + + min(D,U) >= Eff.snd.MSS; + + + (2) or if the data is pushed and all queued data can + be sent now, i.e., if: + + [SND.NXT = SND.UNA and] PUSHED and D <= U + + (the bracketed condition is imposed by the Nagle + algorithm); + + + +Internet Engineering Task Force [Page 99] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + (3) or if at least a fraction Fs of the maximum window + can be sent, i.e., if: + + [SND.NXT = SND.UNA and] + + min(D,U) >= Fs * Max(SND.WND); + + + (4) or if data is PUSHed and the override timeout + occurs. + + Here Fs is a fraction whose recommended value is 1/2. + The override timeout should be in the range 0.1 - 1.0 + seconds. It may be convenient to combine this timer + with the timer used to probe zero windows (Section + 4.2.2.17). + + Finally, note that the SWS avoidance algorithm just + specified is to be used instead of the sender-side + algorithm contained in [TCP:5]. + + 4.2.3.5 TCP Connection Failures + + Excessive retransmission of the same segment by TCP + indicates some failure of the remote host or the Internet + path. This failure may be of short or long duration. The + following procedure MUST be used to handle excessive + retransmissions of data segments [IP:11]: + + (a) There are two thresholds R1 and R2 measuring the amount + of retransmission that has occurred for the same + segment. R1 and R2 might be measured in time units or + as a count of retransmissions. + + (b) When the number of transmissions of the same segment + reaches or exceeds threshold R1, pass negative advice + (see Section 3.3.1.4) to the IP layer, to trigger + dead-gateway diagnosis. + + (c) When the number of transmissions of the same segment + reaches a threshold R2 greater than R1, close the + connection. + + (d) An application MUST be able to set the value for R2 for + a particular connection.
For example, an interactive + application might set R2 to "infinity," giving the user + control over when to disconnect. + + + + +Internet Engineering Task Force [Page 100] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + (e) TCP SHOULD inform the application of the delivery + problem (unless such information has been disabled by + the application; see Section 4.2.4.1), when R1 is + reached and before R2. This will allow a remote login + (User Telnet) application program to inform the user, + for example. + + The value of R1 SHOULD correspond to at least 3 + retransmissions, at the current RTO. The value of R2 SHOULD + correspond to at least 100 seconds. + + An attempt to open a TCP connection could fail with + excessive retransmissions of the SYN segment or by receipt + of a RST segment or an ICMP Port Unreachable. SYN + retransmissions MUST be handled in the general way just + described for data retransmissions, including notification + of the application layer. + + However, the values of R1 and R2 may be different for SYN + and data segments. In particular, R2 for a SYN segment MUST + be set large enough to provide retransmission of the segment + for at least 3 minutes. The application can close the + connection (i.e., give up on the open attempt) sooner, of + course. + + DISCUSSION: + Some Internet paths have significant setup times, and + the number of such paths is likely to increase in the + future. + + 4.2.3.6 TCP Keep-Alives + + Implementors MAY include "keep-alives" in their TCP + implementations, although this practice is not universally + accepted. If keep-alives are included, the application MUST + be able to turn them on or off for each TCP connection, and + they MUST default to off. + + Keep-alive packets MUST only be sent when no data or + acknowledgement packets have been received for the + connection within an interval. This interval MUST be + configurable and MUST default to no less than two hours.
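
The keep-alive rules stated so far (off by default, per-connection control, a configurable idle interval of at least two hours by default) can be captured in a small Python sketch. The class name, field names, and the use of seconds as the time unit are hypothetical choices made for illustration; the sketch only decides whether a probe may be sent and says nothing about how a probe is built.

```python
from dataclasses import dataclass

TWO_HOURS = 2 * 60 * 60  # seconds


@dataclass
class KeepAlive:
    """Hypothetical per-connection keep-alive settings."""

    enabled: bool = False              # keep-alives default to off
    idle_interval: float = TWO_HOURS   # configurable; default is two hours

    def should_probe(self, now, last_received):
        """Return True if a probe may be sent at time `now`.

        `last_received` is the time at which data or an acknowledgment
        was last received on the connection.
        """
        idle_for = now - last_received
        return self.enabled and idle_for >= self.idle_interval
```

An application that wants keep-alives on one connection would construct `KeepAlive(enabled=True)` for that connection alone, leaving every other connection at the default.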
+ + It is extremely important to remember that ACK segments that + contain no data are not reliably transmitted by TCP. + Consequently, if a keep-alive mechanism is implemented it + MUST NOT interpret failure to respond to any specific probe + as a dead connection. + + + +Internet Engineering Task Force [Page 101] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + An implementation SHOULD send a keep-alive segment with no + data; however, it MAY be configurable to send a keep-alive + segment containing one garbage octet, for compatibility with + erroneous TCP implementations. + + DISCUSSION: + A "keep-alive" mechanism periodically probes the other + end of a connection when the connection is otherwise + idle, even when there is no data to be sent. The TCP + specification does not include a keep-alive mechanism + because it could: (1) cause perfectly good connections + to break during transient Internet failures; (2) + consume unnecessary bandwidth ("if no one is using the + connection, who cares if it is still good?"); and (3) + cost money for an Internet path that charges for + packets. + + Some TCP implementations, however, have included a + keep-alive mechanism. To confirm that an idle + connection is still active, these implementations send + a probe segment designed to elicit a response from the + peer TCP. Such a segment generally contains SEG.SEQ = + SND.NXT-1 and may or may not contain one garbage octet + of data. Note that on a quiet connection SND.NXT = + RCV.NXT, so that this SEG.SEQ will be outside the + window. Therefore, the probe causes the receiver to + return an acknowledgment segment, confirming that the + connection is still live. If the peer has dropped the + connection due to a network partition or a crash, it + will respond with a RST instead of an acknowledgment + segment. + + Unfortunately, some misbehaved TCP implementations fail + to respond to a segment with SEG.SEQ = SND.NXT-1 unless + the segment contains data. 
Alternatively, an + implementation could determine whether a peer responded + correctly to keep-alive packets with no garbage data + octet. + + A TCP keep-alive mechanism should only be invoked in + server applications that might otherwise hang + indefinitely and consume resources unnecessarily if a + client crashes or aborts a connection during a network + failure. + + + + + + + +Internet Engineering Task Force [Page 102] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.3.7 TCP Multihoming + + If an application on a multihomed host does not specify the + local IP address when actively opening a TCP connection, + then the TCP MUST ask the IP layer to select a local IP + address before sending the (first) SYN. See the function + GET_SRCADDR() in Section 3.4. + + At all other times, a previous segment has either been sent + or received on this connection, and TCP MUST use the same + local address that was used in those previous + segments. + + 4.2.3.8 IP Options + + When received options are passed up to TCP from the IP + layer, TCP MUST ignore options that it does not understand. + + A TCP MAY support the Time Stamp and Record Route options. + + An application MUST be able to specify a source route when + it actively opens a TCP connection, and this MUST take + precedence over a source route received in a datagram. + + When a TCP connection is OPENed passively and a packet + arrives with a completed IP Source Route option (containing + a return route), TCP MUST save the return route and use it + for all segments sent on this connection. If a different + source route arrives in a later segment, the later + definition SHOULD override the earlier one. + + 4.2.3.9 ICMP Messages + + TCP MUST act on an ICMP error message passed up from the IP + layer, directing it to the connection that created the + error. The necessary demultiplexing information can be + found in the IP header contained within the ICMP message.
+ + o Source Quench + + TCP MUST react to a Source Quench by slowing + transmission on the connection. The RECOMMENDED + procedure is for a Source Quench to trigger a "slow + start," as if a retransmission timeout had occurred. + + o Destination Unreachable -- codes 0, 1, 5 + + Since these Unreachable messages indicate soft error + + + +Internet Engineering Task Force [Page 103] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + conditions, TCP MUST NOT abort the connection, and it + SHOULD make the information available to the + application. + + DISCUSSION: + TCP could report the soft error condition directly + to the application layer with an upcall to the + ERROR_REPORT routine, or it could merely note the + message and report it to the application only when + and if the TCP connection times out. + + o Destination Unreachable -- codes 2-4 + + These are hard error conditions, so TCP SHOULD abort + the connection. + + o Time Exceeded -- codes 0, 1 + + This should be handled the same way as Destination + Unreachable codes 0, 1, 5 (see above). + + o Parameter Problem + + This should be handled the same way as Destination + Unreachable codes 0, 1, 5 (see above). + + + 4.2.3.10 Remote Address Validation + + A TCP implementation MUST reject as an error a local OPEN + call for an invalid remote IP address (e.g., a broadcast or + multicast address). + + An incoming SYN with an invalid source address must be + ignored either by TCP or by the IP layer (see Section + 3.2.1.3). + + A TCP implementation MUST silently discard an incoming SYN + segment that is addressed to a broadcast or multicast + address. 
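
The ICMP handling rules of Section 4.2.3.9 amount to a small lookup from message type and code to a TCP reaction. The Python sketch below shows one way to write that lookup. The function name and the action strings are hypothetical; the numeric ICMP types are those assigned in RFC-792. Combinations that the section does not mention return None, which is an assumption of this sketch rather than a requirement.

```python
# ICMP message types as assigned in RFC-792.
DEST_UNREACHABLE = 3
SOURCE_QUENCH = 4
TIME_EXCEEDED = 11
PARAMETER_PROBLEM = 12


def icmp_action(icmp_type, code):
    """Map an ICMP error passed up from IP to a (hypothetical) TCP action."""
    if icmp_type == SOURCE_QUENCH:
        # Slow transmission, as if a retransmission timeout had occurred.
        return "slow_start"
    if icmp_type == DEST_UNREACHABLE:
        if code in (0, 1, 5):
            # Soft error: do not abort, make it available to the application.
            return "report_soft_error"
        if code in (2, 3, 4):
            # Hard error: the connection should be aborted.
            return "abort_connection"
        return None  # other codes are not covered by this section
    if icmp_type == TIME_EXCEEDED and code in (0, 1):
        return "report_soft_error"
    if icmp_type == PARAMETER_PROBLEM:
        return "report_soft_error"
    return None  # not an error message discussed in this section
```

Keeping the mapping in one function makes it easy to check an implementation against the requirement summary for this section.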
+ + 4.2.3.11 TCP Traffic Patterns + + IMPLEMENTATION: + The TCP protocol specification [TCP:1] gives the + implementor much freedom in designing the algorithms + that control the message flow over the connection -- + packetizing, managing the window, sending + + + +Internet Engineering Task Force [Page 104] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + acknowledgments, etc. These design decisions are + difficult because a TCP must adapt to a wide range of + traffic patterns. Experience has shown that a TCP + implementor needs to verify the design on two extreme + traffic patterns: + + o Single-character Segments + + Even if the sender is using the Nagle Algorithm, + when a TCP connection carries remote login traffic + across a low-delay LAN the receiver will generally + get a stream of single-character segments. If + remote terminal echo mode is in effect, the + receiver's system will generally echo each + character as it is received. + + o Bulk Transfer + + When TCP is used for bulk transfer, the data + stream should be made up (almost) entirely of + segments of the size of the effective MSS. + Although TCP uses a sequence number space with + byte (octet) granularity, in bulk-transfer mode + its operation should be as if TCP used a sequence + space that counted only segments. + + Experience has furthermore shown that a single TCP can + effectively and efficiently handle these two extremes. + + The most important tool for verifying a new TCP + implementation is a packet trace program. There is a + large volume of experience showing the importance of + tracing a variety of traffic patterns with other TCP + implementations and studying the results carefully. + + + 4.2.3.12 Efficiency + + IMPLEMENTATION: + Extensive experience has led to the following + suggestions for efficient implementation of TCP: + + (a) Don't Copy Data + + In bulk data transfer, the primary CPU-intensive + tasks are copying data from one place to another + and checksumming the data. 
It is vital to + minimize the number of copies of TCP data. Since + + + +Internet Engineering Task Force [Page 105] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + the ultimate speed limitation may be fetching data + across the memory bus, it may be useful to combine + the copy with checksumming, doing both with a + single memory fetch. + + (b) Hand-Craft the Checksum Routine + + A good TCP checksumming routine is typically two + to five times faster than a simple and direct + implementation of the definition. Great care and + clever coding are often required and advisable to + make the checksumming code "blazing fast". See + [TCP:10]. + + (c) Code for the Common Case + + TCP protocol processing can be complicated, but + for most segments there are only a few simple + decisions to be made. Per-segment processing will + be greatly speeded up by coding the main line to + minimize the number of decisions in the most + common case. + + + 4.2.4 TCP/APPLICATION LAYER INTERFACE + + 4.2.4.1 Asynchronous Reports + + There MUST be a mechanism for reporting soft TCP error + conditions to the application. Generically, we assume this + takes the form of an application-supplied ERROR_REPORT + routine that may be upcalled [INTRO:7] asynchronously from + the transport layer: + + ERROR_REPORT(local connection name, reason, subreason) + + The precise encoding of the reason and subreason parameters + is not specified here. However, the conditions that are + reported asynchronously to the application MUST include: + + * ICMP error message arrived (see 4.2.3.9) + + * Excessive retransmissions (see 4.2.3.5) + + * Urgent pointer advance (see 4.2.2.4). + + However, an application program that does not want to + receive such ERROR_REPORT calls SHOULD be able to + + + +Internet Engineering Task Force [Page 106] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + effectively disable these calls. 
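
The asynchronous reporting mechanism of Section 4.2.4.1 can be sketched as an application-supplied callback that the transport layer upcalls, with a switch that lets the application disable the calls. The Python below is a minimal sketch: the class, its attributes, and the reason strings are hypothetical, since the memo leaves the encoding of the reason and subreason parameters unspecified.

```python
class Connection:
    """Hypothetical TCP connection object holding an ERROR_REPORT upcall."""

    def __init__(self, name, error_report=None):
        self.name = name                  # local connection name
        self.error_report = error_report  # application-supplied routine
        self.reports_enabled = True

    def disable_reports(self):
        """Let the application switch off ERROR_REPORT upcalls."""
        self.reports_enabled = False

    def soft_error(self, reason, subreason):
        """Upcall ERROR_REPORT(name, reason, subreason) if permitted.

        Returns True if the application routine was called.
        """
        if self.reports_enabled and self.error_report is not None:
            self.error_report(self.name, reason, subreason)
            return True
        return False


# Usage: collect reports in a list, then disable them.
seen = []
conn = Connection("conn-1", lambda name, r, s: seen.append((name, r, s)))
conn.soft_error("icmp", "net unreachable")
conn.disable_reports()
conn.soft_error("retransmissions", "R1 reached")
```

After the usage lines run, `seen` holds only the first report, because the second arrived after the application disabled the upcalls.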
+ + DISCUSSION: + These error reports generally reflect soft errors that + can be ignored without harm by many applications. It + has been suggested that these error report calls should + default to "disabled," but this is not required. + + 4.2.4.2 Type-of-Service + + The application layer MUST be able to specify the Type-of- + Service (TOS) for segments that are sent on a connection. + It is not required, but the application SHOULD be able to + change the TOS during the connection lifetime. TCP SHOULD + pass the current TOS value without change to the IP layer, + when it sends segments on the connection. + + The TOS will be specified independently in each direction on + the connection, so that the receiver application will + specify the TOS used for ACK segments. + + TCP MAY pass the most recently received TOS up to the + application. + + DISCUSSION: + Some applications (e.g., SMTP) change the nature of + their communication during the lifetime of a + connection, and therefore would like to change the TOS + specification. + + Note also that the OPEN call specified in RFC-793 + includes a parameter ("options") in which the caller + can specify IP options such as source route, record + route, or timestamp. + + 4.2.4.3 Flush Call + + Some TCP implementations have included a FLUSH call, which + will empty the TCP send queue of any data for which the user + has issued SEND calls but which is still to the right of the + current send window. That is, it flushes as much queued + send data as possible without losing sequence number + synchronization. This is useful for implementing the "abort + output" function of Telnet. + + + + + + + +Internet Engineering Task Force [Page 107] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + 4.2.4.4 Multihoming + + The user interface outlined in sections 2.7 and 3.8 of RFC- + 793 needs to be extended for multihoming. The OPEN call + MUST have an optional parameter: + + OPEN( ... [local IP address,] ...
) + + to allow the specification of the local IP address. + + DISCUSSION: + Some TCP-based applications need to specify the local + IP address to be used to open a particular connection; + FTP is an example. + + IMPLEMENTATION: + A passive OPEN call with a specified "local IP address" + parameter will await an incoming connection request to + that address. If the parameter is unspecified, a + passive OPEN will await an incoming connection request + to any local IP address, and then bind the local IP + address of the connection to the particular address + that is used. + + For an active OPEN call, a specified "local IP address" + parameter will be used for opening the connection. If + the parameter is unspecified, the networking software + will choose an appropriate local IP address (see + Section 3.3.4.2) for the connection + + 4.2.5 TCP REQUIREMENT SUMMARY + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-------------------------------------------------|--------|-|-|-|-|-|-- + | | | | | | | +Push flag | | | | | | | + Aggregate or queue un-pushed data |4.2.2.2 | | |x| | | + Sender collapse successive PSH flags |4.2.2.2 | |x| | | | + SEND call can specify PUSH |4.2.2.2 | | |x| | | + + + +Internet Engineering Task Force [Page 108] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x| + If cannot: PSH last segment |4.2.2.2 |x| | | | | + Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1 + Send max size segment when possible |4.2.2.2 | |x| | | | + | | | | | | | +Window | | | | | | | + Treat as unsigned number |4.2.2.3 |x| | | | | + Handle as 32-bit number |4.2.2.3 | |x| | | | + Shrink window from right |4.2.2.16| | | |x| | + Robust against shrinking window |4.2.2.16|x| | | | | + Receiver's window closed indefinitely |4.2.2.17| | |x| | | + Sender probe zero window 
|4.2.2.17|x| | | | | + First probe after RTO |4.2.2.17| |x| | | | + Exponential backoff |4.2.2.17| |x| | | | + Allow window stay zero indefinitely |4.2.2.17|x| | | | | + Sender timeout OK conn with zero wind |4.2.2.17| | | | |x| + | | | | | | | +Urgent Data | | | | | | | + Pointer points to last octet |4.2.2.4 |x| | | | | + Arbitrary length urgent data sequence |4.2.2.4 |x| | | | | + Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1 + ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1 + | | | | | | | +TCP Options | | | | | | | + Receive TCP option in any segment |4.2.2.5 |x| | | | | + Ignore unsupported options |4.2.2.5 |x| | | | | + Cope with illegal option length |4.2.2.5 |x| | | | | + Implement sending & receiving MSS option |4.2.2.6 |x| | | | | + Send MSS option unless 536 |4.2.2.6 | |x| | | | + Send MSS option always |4.2.2.6 | | |x| | | + Send-MSS default is 536 |4.2.2.6 |x| | | | | + Calculate effective send seg size |4.2.2.6 |x| | | | | + | | | | | | | +TCP Checksums | | | | | | | + Sender compute checksum |4.2.2.7 |x| | | | | + Receiver check checksum |4.2.2.7 |x| | | | | + | | | | | | | +Use clock-driven ISN selection |4.2.2.9 |x| | | | | + | | | | | | | +Opening Connections | | | | | | | + Support simultaneous open attempts |4.2.2.10|x| | | | | + SYN-RCVD remembers last state |4.2.2.11|x| | | | | + Passive Open call interfere with others |4.2.2.18| | | | |x| + Function: simultan. LISTENs for same port |4.2.2.18|x| | | | | + Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | | + Otherwise, use local addr of conn. 
|4.2.3.7 |x| | | | | + OPEN to broadcast/multicast IP Address |4.2.3.10| | | | |x| + Silently discard seg to bcast/mcast addr |4.2.3.10|x| | | | | + + + +Internet Engineering Task Force [Page 109] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + | | | | | | | +Closing Connections | | | | | | | + RST can contain data |4.2.2.12| |x| | | | + Inform application of aborted conn |4.2.2.13|x| | | | | + Half-duplex close connections |4.2.2.13| | |x| | | + Send RST to indicate data lost |4.2.2.13| |x| | | | + In TIME-WAIT state for 2xMSL seconds |4.2.2.13|x| | | | | + Accept SYN from TIME-WAIT state |4.2.2.13| | |x| | | + | | | | | | | +Retransmissions | | | | | | | + Jacobson Slow Start algorithm |4.2.2.15|x| | | | | + Jacobson Congestion-Avoidance algorithm |4.2.2.15|x| | | | | + Retransmit with same IP ident |4.2.2.15| | |x| | | + Karn's algorithm |4.2.3.1 |x| | | | | + Jacobson's RTO estimation alg. |4.2.3.1 |x| | | | | + Exponential backoff |4.2.3.1 |x| | | | | + SYN RTO calc same as data |4.2.3.1 | |x| | | | + Recommended initial values and bounds |4.2.3.1 | |x| | | | + | | | | | | | +Generating ACK's: | | | | | | | + Queue out-of-order segments |4.2.2.20| |x| | | | + Process all Q'd before send ACK |4.2.2.20|x| | | | | + Send ACK for out-of-order segment |4.2.2.21| | |x| | | + Delayed ACK's |4.2.3.2 | |x| | | | + Delay < 0.5 seconds |4.2.3.2 |x| | | | | + Every 2nd full-sized segment ACK'd |4.2.3.2 |x| | | | | + Receiver SWS-Avoidance Algorithm |4.2.3.3 |x| | | | | + | | | | | | | +Sending data | | | | | | | + Configurable TTL |4.2.2.19|x| | | | | + Sender SWS-Avoidance Algorithm |4.2.3.4 |x| | | | | + Nagle algorithm |4.2.3.4 | |x| | | | + Application can disable Nagle algorithm |4.2.3.4 |x| | | | | + | | | | | | | +Connection Failures: | | | | | | | + Negative advice to IP on R1 retxs |4.2.3.5 |x| | | | | + Close connection on R2 retxs |4.2.3.5 |x| | | | | + ALP can set R2 |4.2.3.5 |x| | | | |1 + Inform ALP of R1<=retxs<R2 |4.2.3.5 | |x| | | |1 +
Recommended values for R1, R2 |4.2.3.5 | |x| | | | + Same mechanism for SYNs |4.2.3.5 |x| | | | | + R2 at least 3 minutes for SYN |4.2.3.5 |x| | | | | + | | | | | | | +Send Keep-alive Packets: |4.2.3.6 | | |x| | | + - Application can request |4.2.3.6 |x| | | | | + - Default is "off" |4.2.3.6 |x| | | | | + - Only send if idle for interval |4.2.3.6 |x| | | | | + - Interval configurable |4.2.3.6 |x| | | | | + + + +Internet Engineering Task Force [Page 110] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + - Default at least 2 hrs. |4.2.3.6 |x| | | | | + - Tolerant of lost ACK's |4.2.3.6 |x| | | | | + | | | | | | | +IP Options | | | | | | | + Ignore options TCP doesn't understand |4.2.3.8 |x| | | | | + Time Stamp support |4.2.3.8 | | |x| | | + Record Route support |4.2.3.8 | | |x| | | + Source Route: | | | | | | | + ALP can specify |4.2.3.8 |x| | | | |1 + Overrides src rt in datagram |4.2.3.8 |x| | | | | + Build return route from src rt |4.2.3.8 |x| | | | | + Later src route overrides |4.2.3.8 | |x| | | | + | | | | | | | +Receiving ICMP Messages from IP |4.2.3.9 |x| | | | | + Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | | + Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x| + Dest. 
Unreach (2-4) => abort conn |4.2.3.9 | |x| | | | + Source Quench => slow start |4.2.3.9 | |x| | | | + Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | | + Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | | + | | | | | | | +Address Validation | | | | | | | + Reject OPEN call to invalid IP address |4.2.3.10|x| | | | | + Reject SYN from invalid IP address |4.2.3.10|x| | | | | + Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | | + | | | | | | | +TCP/ALP Interface Services | | | | | | | + Error Report mechanism |4.2.4.1 |x| | | | | + ALP can disable Error Report Routine |4.2.4.1 | |x| | | | + ALP can specify TOS for sending |4.2.4.2 |x| | | | | + Passed unchanged to IP |4.2.4.2 | |x| | | | + ALP can change TOS during connection |4.2.4.2 | |x| | | | + Pass received TOS up to ALP |4.2.4.2 | | |x| | | + FLUSH call |4.2.4.3 | | |x| | | + Optional local IP addr parm. in OPEN |4.2.4.4 |x| | | | | +-------------------------------------------------|--------|-|-|-|-|-|-- +-------------------------------------------------|--------|-|-|-|-|-|-- + +FOOTNOTES: + +(1) "ALP" means Application-Layer program. + + + + + + + + + + +Internet Engineering Task Force [Page 111] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + +5. REFERENCES + +INTRODUCTORY REFERENCES + + +[INTRO:1] "Requirements for Internet Hosts -- Application and Support," + IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123, + October 1989. + +[INTRO:2] "Requirements for Internet Gateways," R. Braden and J. + Postel, RFC-1009, June 1987. + +[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006, + (three volumes), SRI International, December 1985. + +[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel, + RFC-1011, May 1987. + + This document is republished periodically with new RFC numbers; the + latest version must be used. + +[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J. + Postel, RFC-980, March 1986. 
+ +[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May + 1987. + + This document is republished periodically with new RFC numbers; the + latest version must be used. + +[INTRO:7] "Modularity and Efficiency in Protocol Implementation," D. + Clark, RFC-817, July 1982. + +[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM + SOSP, Orcas Island, Washington, December 1985. + + +Secondary References: + + +[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf + and R. Kahn, IEEE Transactions on Communication, May 1974. + +[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D. + Cohen, Computer Networks, Vol. 5, No. 4, July 1981. + +[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel, + R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC, + + + +Internet Engineering Task Force [Page 112] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + March 1985. Also in: IEEE Communications Magazine, March 1985. + Also available as ISI-RS-85-153. + +[INTRO:12] "Final Text of DIS8473, Protocol for Providing the + Connectionless Mode Network Service," ANSI, published as RFC-994, + March 1986. + +[INTRO:13] "End System to Intermediate System Routing Exchange + Protocol," ANSI X3S3.3, published as RFC-995, April 1986. + + +LINK LAYER REFERENCES + + +[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893, + April 1984. + +[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826, + November 1982. + +[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet + Networks," C. Hornig, RFC-894, April 1984. + +[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802 + Networks," J. Postel and J. Reynolds, RFC-1042, February 1988. + + This RFC contains a great deal of information of importance to + Internet implementers planning to use IEEE 802 networks. + + +IP LAYER REFERENCES + + +[IP:1] "Internet Protocol (IP)," J.
Postel, RFC-791, September 1981. + +[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792, + September 1981. + +[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel, + RFC-950, August 1985. + +[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112, + August 1989. + +[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department + of Defense, August 1983. + + This specification, as amended by RFC-963, is intended to describe + + + +Internet Engineering Task Force [Page 113] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + + the Internet Protocol but has some serious omissions (e.g., the + mandatory subnet extension [IP:3] and the optional multicasting + extension [IP:4]). It is also out of date. If there is a + conflict, RFC-791, RFC-792, and RFC-950 must be taken as + authoritative, while the present document is authoritative over + all. + +[IP:6] "Some Problems with the Specification of the Military Standard + Internet Protocol," D. Sidhu, RFC-963, November 1985. + +[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel, + RFC-879, November 1983. + + Discusses and clarifies the relationship between the TCP Maximum + Segment Size option and the IP datagram size. + +[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108, + October 1989. + +[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM + SIGCOMM-87, August 1987. Published as ACM Comp Comm Review, Vol. + 17, no. 5. + + This useful paper discusses the problems created by Internet + fragmentation and presents alternative solutions. + +[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July + 1982. + + This and the following paper should be read by every implementor. + +[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982. + +SECONDARY IP REFERENCES: + + +[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J. + Mogul, RFC-922, October 1984. 
+ +[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July + 1982. + +[IP:14] "Something a Host Could Do with Source Quench: The Source Quench + Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July + 1987. + + This RFC first described directed broadcast addresses. However, + the bulk of the RFC is concerned with gateways, not hosts. + + + +Internet Engineering Task Force [Page 114] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + +UDP REFERENCES: + + +[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980. + + +TCP REFERENCES: + + +[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September + 1981. + + +[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of + Defense, August 1984. + + This specification as amended by RFC-964 is intended to describe + the same protocol as RFC-793 [TCP:1]. If there is a conflict, + RFC-793 takes precedence, and the present document is authoritative + over both. + + +[TCP:3] "Some Problems with the Specification of the Military Standard + Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964, + November 1985. + + +[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel, + RFC-879, November 1983. + + +[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813, + July 1982. + + +[TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM + SIGCOMM-87, August 1987. + + +[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88, + August 1988. + + +SECONDARY TCP REFERENCES: + + +[TCP:8] "Modularity and Efficiency in Protocol Implementation," D. + Clark, RFC-817, July 1982. + + + +Internet Engineering Task Force [Page 115] + + + + +RFC1122 TRANSPORT LAYER -- TCP October 1989 + + +[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984. + + +[TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C. + Partridge, RFC-1071, September 1988. 
+ + +[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden, + RFC-1072, October 1988. + + +Security Considerations + + There are many security issues in the communication layers of host + software, but a full discussion is beyond the scope of this RFC. + + The Internet architecture generally provides little protection + against spoofing of IP source addresses, so any security mechanism + that is based upon verifying the IP source address of a datagram + should be treated with suspicion. However, in restricted + environments some source-address checking may be possible. For + example, there might be a secure LAN whose gateway to the rest of the + Internet discarded any incoming datagram with a source address that + spoofed the LAN address. In this case, a host on the LAN could use + the source address to test for local vs. remote source. This problem + is complicated by source routing, and some have suggested that + source-routed datagram forwarding by hosts (see Section 3.3.5) should + be outlawed for security reasons. + + Security-related issues are mentioned in sections concerning the IP + Security option (Section 3.2.1.8), the ICMP Parameter Problem message + (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and + reserved TCP ports (Section 4.2.2.1). + +Author's Address + + Robert Braden + USC/Information Sciences Institute + 4676 Admiralty Way + Marina del Rey, CA 90292-6695 + + Phone: (213) 822 1511 + + EMail: Braden@ISI.EDU + + + + + + + +Internet Engineering Task Force [Page 116] + diff --git a/doc/rfc/rfc1123.txt b/doc/rfc/rfc1123.txt new file mode 100644 index 00000000..51cdf83c --- /dev/null +++ b/doc/rfc/rfc1123.txt @@ -0,0 +1,5782 @@ + + + + + + +Network Working Group Internet Engineering Task Force +Request for Comments: 1123 R. Braden, Editor + October 1989 + + + Requirements for Internet Hosts -- Application and Support + +Status of This Memo + + This RFC is an official specification for the Internet community. 
It + incorporates by reference, amends, corrects, and supplements the + primary protocol standards documents relating to hosts. Distribution + of this document is unlimited. + +Summary + + This RFC is one of a pair that defines and discusses the requirements + for Internet host software. This RFC covers the application and + support protocols; its companion RFC-1122 covers the communication + protocol layers: link layer, IP layer, and transport layer. + + + + Table of Contents + + + + + 1. INTRODUCTION ............................................... 5 + 1.1 The Internet Architecture .............................. 6 + 1.2 General Considerations ................................. 6 + 1.2.1 Continuing Internet Evolution ..................... 6 + 1.2.2 Robustness Principle .............................. 7 + 1.2.3 Error Logging ..................................... 8 + 1.2.4 Configuration ..................................... 8 + 1.3 Reading this Document .................................. 10 + 1.3.1 Organization ...................................... 10 + 1.3.2 Requirements ...................................... 10 + 1.3.3 Terminology ....................................... 11 + 1.4 Acknowledgments ........................................ 12 + + 2. GENERAL ISSUES ............................................. 13 + 2.1 Host Names and Numbers ................................. 13 + 2.2 Using Domain Name Service .............................. 13 + 2.3 Applications on Multihomed hosts ....................... 14 + 2.4 Type-of-Service ........................................ 14 + 2.5 GENERAL APPLICATION REQUIREMENTS SUMMARY ............... 15 + + + + +Internet Engineering Task Force [Page 1] + + + + +RFC1123 INTRODUCTION October 1989 + + + 3. REMOTE LOGIN -- TELNET PROTOCOL ............................ 16 + 3.1 INTRODUCTION ........................................... 16 + 3.2 PROTOCOL WALK-THROUGH .................................. 
16 + 3.2.1 Option Negotiation ................................ 16 + 3.2.2 Telnet Go-Ahead Function .......................... 16 + 3.2.3 Control Functions ................................. 17 + 3.2.4 Telnet "Synch" Signal ............................. 18 + 3.2.5 NVT Printer and Keyboard .......................... 19 + 3.2.6 Telnet Command Structure .......................... 20 + 3.2.7 Telnet Binary Option .............................. 20 + 3.2.8 Telnet Terminal-Type Option ....................... 20 + 3.3 SPECIFIC ISSUES ........................................ 21 + 3.3.1 Telnet End-of-Line Convention ..................... 21 + 3.3.2 Data Entry Terminals .............................. 23 + 3.3.3 Option Requirements ............................... 24 + 3.3.4 Option Initiation ................................. 24 + 3.3.5 Telnet Linemode Option ............................ 25 + 3.4 TELNET/USER INTERFACE .................................. 25 + 3.4.1 Character Set Transparency ........................ 25 + 3.4.2 Telnet Commands ................................... 26 + 3.4.3 TCP Connection Errors ............................. 26 + 3.4.4 Non-Default Telnet Contact Port ................... 26 + 3.4.5 Flushing Output ................................... 26 + 3.5. TELNET REQUIREMENTS SUMMARY ........................... 27 + + 4. FILE TRANSFER .............................................. 29 + 4.1 FILE TRANSFER PROTOCOL -- FTP .......................... 29 + 4.1.1 INTRODUCTION ...................................... 29 + 4.1.2. PROTOCOL WALK-THROUGH ............................ 29 + 4.1.2.1 LOCAL Type ................................... 29 + 4.1.2.2 Telnet Format Control ........................ 30 + 4.1.2.3 Page Structure ............................... 30 + 4.1.2.4 Data Structure Transformations ............... 30 + 4.1.2.5 Data Connection Management ................... 31 + 4.1.2.6 PASV Command ................................. 
31 + 4.1.2.7 LIST and NLST Commands ....................... 31 + 4.1.2.8 SITE Command ................................. 32 + 4.1.2.9 STOU Command ................................. 32 + 4.1.2.10 Telnet End-of-line Code ..................... 32 + 4.1.2.11 FTP Replies ................................. 33 + 4.1.2.12 Connections ................................. 34 + 4.1.2.13 Minimum Implementation; RFC-959 Section ..... 34 + 4.1.3 SPECIFIC ISSUES ................................... 35 + 4.1.3.1 Non-standard Command Verbs ................... 35 + 4.1.3.2 Idle Timeout ................................. 36 + 4.1.3.3 Concurrency of Data and Control .............. 36 + 4.1.3.4 FTP Restart Mechanism ........................ 36 + 4.1.4 FTP/USER INTERFACE ................................ 39 + + + +Internet Engineering Task Force [Page 2] + + + + +RFC1123 INTRODUCTION October 1989 + + + 4.1.4.1 Pathname Specification ....................... 39 + 4.1.4.2 "QUOTE" Command .............................. 40 + 4.1.4.3 Displaying Replies to User ................... 40 + 4.1.4.4 Maintaining Synchronization .................. 40 + 4.1.5 FTP REQUIREMENTS SUMMARY ......................... 41 + 4.2 TRIVIAL FILE TRANSFER PROTOCOL -- TFTP ................. 44 + 4.2.1 INTRODUCTION ...................................... 44 + 4.2.2 PROTOCOL WALK-THROUGH ............................. 44 + 4.2.2.1 Transfer Modes ............................... 44 + 4.2.2.2 UDP Header ................................... 44 + 4.2.3 SPECIFIC ISSUES ................................... 44 + 4.2.3.1 Sorcerer's Apprentice Syndrome ............... 44 + 4.2.3.2 Timeout Algorithms ........................... 46 + 4.2.3.3 Extensions ................................... 46 + 4.2.3.4 Access Control ............................... 46 + 4.2.3.5 Broadcast Request ............................ 46 + 4.2.4 TFTP REQUIREMENTS SUMMARY ......................... 47 + + 5. ELECTRONIC MAIL -- SMTP and RFC-822 ........................ 
48 + 5.1 INTRODUCTION ........................................... 48 + 5.2 PROTOCOL WALK-THROUGH .................................. 48 + 5.2.1 The SMTP Model .................................... 48 + 5.2.2 Canonicalization .................................. 49 + 5.2.3 VRFY and EXPN Commands ............................ 50 + 5.2.4 SEND, SOML, and SAML Commands ..................... 50 + 5.2.5 HELO Command ...................................... 50 + 5.2.6 Mail Relay ........................................ 51 + 5.2.7 RCPT Command ...................................... 52 + 5.2.8 DATA Command ...................................... 53 + 5.2.9 Command Syntax .................................... 54 + 5.2.10 SMTP Replies ..................................... 54 + 5.2.11 Transparency ..................................... 55 + 5.2.12 WKS Use in MX Processing ......................... 55 + 5.2.13 RFC-822 Message Specification .................... 55 + 5.2.14 RFC-822 Date and Time Specification .............. 55 + 5.2.15 RFC-822 Syntax Change ............................ 56 + 5.2.16 RFC-822 Local-part .............................. 56 + 5.2.17 Domain Literals .................................. 57 + 5.2.18 Common Address Formatting Errors ................. 58 + 5.2.19 Explicit Source Routes ........................... 58 + 5.3 SPECIFIC ISSUES ........................................ 59 + 5.3.1 SMTP Queueing Strategies .......................... 59 + 5.3.1.1 Sending Strategy .............................. 59 + 5.3.1.2 Receiving strategy ........................... 61 + 5.3.2 Timeouts in SMTP .................................. 61 + 5.3.3 Reliable Mail Receipt ............................. 63 + 5.3.4 Reliable Mail Transmission ........................ 63 + 5.3.5 Domain Name Support ............................... 65 + + + +Internet Engineering Task Force [Page 3] + + + + +RFC1123 INTRODUCTION October 1989 + + + 5.3.6 Mailing Lists and Aliases ......................... 
65 + 5.3.7 Mail Gatewaying ................................... 66 + 5.3.8 Maximum Message Size .............................. 68 + 5.4 SMTP REQUIREMENTS SUMMARY .............................. 69 + + 6. SUPPORT SERVICES ............................................ 72 + 6.1 DOMAIN NAME TRANSLATION ................................. 72 + 6.1.1 INTRODUCTION ....................................... 72 + 6.1.2 PROTOCOL WALK-THROUGH ............................. 72 + 6.1.2.1 Resource Records with Zero TTL ............... 73 + 6.1.2.2 QCLASS Values ................................ 73 + 6.1.2.3 Unused Fields ................................ 73 + 6.1.2.4 Compression .................................. 73 + 6.1.2.5 Misusing Configuration Info .................. 73 + 6.1.3 SPECIFIC ISSUES ................................... 74 + 6.1.3.1 Resolver Implementation ...................... 74 + 6.1.3.2 Transport Protocols .......................... 75 + 6.1.3.3 Efficient Resource Usage ..................... 77 + 6.1.3.4 Multihomed Hosts ............................. 78 + 6.1.3.5 Extensibility ................................ 79 + 6.1.3.6 Status of RR Types ........................... 79 + 6.1.3.7 Robustness ................................... 80 + 6.1.3.8 Local Host Table ............................. 80 + 6.1.4 DNS USER INTERFACE ................................ 81 + 6.1.4.1 DNS Administration ........................... 81 + 6.1.4.2 DNS User Interface ........................... 81 + 6.1.4.3 Interface Abbreviation Facilities ............. 82 + 6.1.5 DOMAIN NAME SYSTEM REQUIREMENTS SUMMARY ........... 84 + 6.2 HOST INITIALIZATION .................................... 87 + 6.2.1 INTRODUCTION ...................................... 87 + 6.2.2 REQUIREMENTS ...................................... 87 + 6.2.2.1 Dynamic Configuration ........................ 87 + 6.2.2.2 Loading Phase ................................ 89 + 6.3 REMOTE MANAGEMENT ...................................... 
90 + 6.3.1 INTRODUCTION ...................................... 90 + 6.3.2 PROTOCOL WALK-THROUGH ............................. 90 + 6.3.3 MANAGEMENT REQUIREMENTS SUMMARY ................... 92 + + 7. REFERENCES ................................................. 93 + + + + + + + + + + + + +Internet Engineering Task Force [Page 4] + + + + +RFC1123 INTRODUCTION October 1989 + + +1. INTRODUCTION + + This document is one of a pair that defines and discusses the + requirements for host system implementations of the Internet protocol + suite. This RFC covers the applications layer and support protocols. + Its companion RFC, "Requirements for Internet Hosts -- Communications + Layers" [INTRO:1] covers the lower layer protocols: transport layer, + IP layer, and link layer. + + These documents are intended to provide guidance for vendors, + implementors, and users of Internet communication software. They + represent the consensus of a large body of technical experience and + wisdom, contributed by members of the Internet research and vendor + communities. + + This RFC enumerates standard protocols that a host connected to the + Internet must use, and it incorporates by reference the RFCs and + other documents describing the current specifications for these + protocols. It corrects errors in the referenced documents and adds + additional discussion and guidance for an implementor. + + For each protocol, this document also contains an explicit set of + requirements, recommendations, and options. The reader must + understand that the list of requirements in this document is + incomplete by itself; the complete set of requirements for an + Internet host is primarily defined in the standard protocol + specification documents, with the corrections, amendments, and + supplements contained in this RFC. 
+ + A good-faith implementation of the protocols that was produced after + careful reading of the RFC's and with some interaction with the + Internet technical community, and that followed good communications + software engineering practices, should differ from the requirements + of this document in only minor ways. Thus, in many cases, the + "requirements" in this RFC are already stated or implied in the + standard protocol documents, so that their inclusion here is, in a + sense, redundant. However, they were included because some past + implementation has made the wrong choice, causing problems of + interoperability, performance, and/or robustness. + + This document includes discussion and explanation of many of the + requirements and recommendations. A simple list of requirements + would be dangerous, because: + + o Some required features are more important than others, and some + features are optional. + + o There may be valid reasons why particular vendor products that + + + +Internet Engineering Task Force [Page 5] + + + + +RFC1123 INTRODUCTION October 1989 + + + are designed for restricted contexts might choose to use + different specifications. + + However, the specifications of this document must be followed to meet + the general goal of arbitrary host interoperation across the + diversity and complexity of the Internet system. Although most + current implementations fail to meet these requirements in various + ways, some minor and some major, this specification is the ideal + towards which we need to move. + + These requirements are based on the current level of Internet + architecture. This document will be updated as required to provide + additional clarifications or to include additional information in + those areas in which specifications are still evolving. + + This introductory section begins with general advice to host software + vendors, and then gives some guidance on reading the rest of the + document. 
Section 2 contains general requirements that may be + applicable to all application and support protocols. Sections 3, 4, + and 5 contain the requirements on protocols for the three major + applications: Telnet, file transfer, and electronic mail, + respectively. Section 6 covers the support applications: the domain + name system, system initialization, and management. Finally, all + references will be found in Section 7. + + 1.1 The Internet Architecture + + For a brief introduction to the Internet architecture from a host + viewpoint, see Section 1.1 of [INTRO:1]. That section also + contains recommended references for general background on the + Internet architecture. + + 1.2 General Considerations + + There are two important lessons that vendors of Internet host + software have learned and which a new vendor should consider + seriously. + + 1.2.1 Continuing Internet Evolution + + The enormous growth of the Internet has revealed problems of + management and scaling in a large datagram-based packet + communication system. These problems are being addressed, and + as a result there will be continuing evolution of the + specifications described in this document. These changes will + be carefully planned and controlled, since there is extensive + participation in this planning by the vendors and by the + organizations responsible for operations of the networks. + + + +Internet Engineering Task Force [Page 6] + + + + +RFC1123 INTRODUCTION October 1989 + + + Development, evolution, and revision are characteristic of + computer network protocols today, and this situation will + persist for some years. A vendor who develops computer + communication software for the Internet protocol suite (or any + other protocol suite!) and then fails to maintain and update + that software for changing specifications is going to leave a + trail of unhappy customers. The Internet is a large + communication network, and the users are in constant contact + through it. 
Experience has shown that knowledge of + deficiencies in vendor software propagates quickly through the + Internet technical community. + + 1.2.2 Robustness Principle + + At every layer of the protocols, there is a general rule whose + application can lead to enormous benefits in robustness and + interoperability: + + "Be liberal in what you accept, and + conservative in what you send" + + Software should be written to deal with every conceivable + error, no matter how unlikely; sooner or later a packet will + come in with that particular combination of errors and + attributes, and unless the software is prepared, chaos can + ensue. In general, it is best to assume that the network is + filled with malevolent entities that will send in packets + designed to have the worst possible effect. This assumption + will lead to suitable protective design, although the most + serious problems in the Internet have been caused by + unenvisaged mechanisms triggered by low-probability events; + mere human malice would never have taken so devious a course! + + Adaptability to change must be designed into all levels of + Internet host software. As a simple example, consider a + protocol specification that contains an enumeration of values + for a particular header field -- e.g., a type field, a port + number, or an error code; this enumeration must be assumed to + be incomplete. Thus, if a protocol specification defines four + possible error codes, the software must not break when a fifth + code shows up. An undefined code might be logged (see below), + but it must not cause a failure. + + The second part of the principle is almost as important: + software on other hosts may contain deficiencies that make it + unwise to exploit legal but obscure protocol features. It is + unwise to stray far from the obvious and simple, lest untoward + effects result elsewhere. 
A corollary of this is "watch out + + + +Internet Engineering Task Force [Page 7] + + + + +RFC1123 INTRODUCTION October 1989 + + + for misbehaving hosts"; host software should be prepared, not + just to survive other misbehaving hosts, but also to cooperate + to limit the amount of disruption such hosts can cause to the + shared communication facility. + + 1.2.3 Error Logging + + The Internet includes a great variety of host and gateway + systems, each implementing many protocols and protocol layers, + and some of these contain bugs and mis-features in their + Internet protocol software. As a result of complexity, + diversity, and distribution of function, the diagnosis of user + problems is often very difficult. + + Problem diagnosis will be aided if host implementations include + a carefully designed facility for logging erroneous or + "strange" protocol events. It is important to include as much + diagnostic information as possible when an error is logged. In + particular, it is often useful to record the header(s) of a + packet that caused an error. However, care must be taken to + ensure that error logging does not consume prohibitive amounts + of resources or otherwise interfere with the operation of the + host. + + There is a tendency for abnormal but harmless protocol events + to overflow error logging files; this can be avoided by using a + "circular" log, or by enabling logging only while diagnosing a + known failure. It may be useful to filter and count duplicate + successive messages. One strategy that seems to work well is: + (1) always count abnormalities and make such counts accessible + through the management protocol (see Section 6.3); and (2) + allow the logging of a great variety of events to be + selectively enabled. For example, it might be useful to be able + to "log everything" or to "log everything for host X". 
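The logging strategy in the paragraph above (always count abnormalities, log selectively, keep the log "circular", collapse duplicate successive messages) can be sketched in a few lines of Python. This is an illustration only, not part of the RFC text or of any BIND source: the class name ProtocolEventLog, its attributes, and the event-kind strings are all hypothetical.

```python
from collections import Counter, deque


class ProtocolEventLog:
    """Hypothetical sketch: count every abnormal event, log only on request."""

    def __init__(self, capacity=4):
        self.counts = Counter()             # (1) counters are always maintained
        self.ring = deque(maxlen=capacity)  # "circular" log: oldest entry is dropped
        self.log_all = False                # "log everything"
        self.logged_hosts = set()           # "log everything for host X"

    def record(self, host, kind, header=b""):
        # The count is updated even when detailed logging is disabled.
        self.counts[kind] += 1
        if not (self.log_all or host in self.logged_hosts):
            return
        if self.ring and self.ring[-1][:3] == (host, kind, header):
            # Duplicate of the previous entry: bump its repeat count instead
            # of storing a second copy.
            last = self.ring.pop()
            self.ring.append(last[:3] + (last[3] + 1,))
        else:
            # Entry layout (an assumption): (host, kind, header bytes, repeats).
            self.ring.append((host, kind, header, 1))
```

In this sketch the `counts` mapping is what a management agent would expose, while `ring` holds the bounded detail log. A real host would also need to bound the size of each stored header and protect the structure against concurrent access.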
+ + Note that different managements may have differing policies + about the amount of error logging that they want normally + enabled in a host. Some will say, "if it doesn't hurt me, I + don't want to know about it", while others will want to take a + more watchful and aggressive attitude about detecting and + removing protocol abnormalities. + + 1.2.4 Configuration + + It would be ideal if a host implementation of the Internet + protocol suite could be entirely self-configuring. This would + allow the whole suite to be implemented in ROM or cast into + silicon, it would simplify diskless workstations, and it would + + + +Internet Engineering Task Force [Page 8] + + + + +RFC1123 INTRODUCTION October 1989 + + + be an immense boon to harried LAN administrators as well as + system vendors. We have not reached this ideal; in fact, we + are not even close. + + At many points in this document, you will find a requirement + that a parameter be a configurable option. There are several + different reasons behind such requirements. In a few cases, + there is current uncertainty or disagreement about the best + value, and it may be necessary to update the recommended value + in the future. In other cases, the value really depends on + external factors -- e.g., the size of the host and the + distribution of its communication load, or the speeds and + topology of nearby networks -- and self-tuning algorithms are + unavailable and may be insufficient. In some cases, + configurability is needed because of administrative + requirements. + + Finally, some configuration options are required to communicate + with obsolete or incorrect implementations of the protocols, + distributed without sources, that unfortunately persist in many + parts of the Internet. To make correct systems coexist with + these faulty systems, administrators often have to "mis- + configure" the correct systems. 
This problem will correct + itself gradually as the faulty systems are retired, but it + cannot be ignored by vendors. + + When we say that a parameter must be configurable, we do not + intend to require that its value be explicitly read from a + configuration file at every boot time. We recommend that + implementors set up a default for each parameter, so a + configuration file is only necessary to override those defaults + that are inappropriate in a particular installation. Thus, the + configurability requirement is an assurance that it will be + POSSIBLE to override the default when necessary, even in a + binary-only or ROM-based product. + + This document requires a particular value for such defaults in + some cases. The choice of default is a sensitive issue when + the configuration item controls the accommodation to existing + faulty systems. If the Internet is to converge successfully to + complete interoperability, the default values built into + implementations must implement the official protocol, not + "mis-configurations" to accommodate faulty implementations. + Although marketing considerations have led some vendors to + choose mis-configuration defaults, we urge vendors to choose + defaults that will conform to the standard. + + Finally, we note that a vendor needs to provide adequate + + + +Internet Engineering Task Force [Page 9] + + + + +RFC1123 INTRODUCTION October 1989 + + + documentation on all configuration parameters, their limits and + effects. + + + 1.3 Reading this Document + + 1.3.1 Organization + + In general, each major section is organized into the following + subsections: + + (1) Introduction + + (2) Protocol Walk-Through -- considers the protocol + specification documents section-by-section, correcting + errors, stating requirements that may be ambiguous or + ill-defined, and providing further clarification or + explanation. 
+ + (3) Specific Issues -- discusses protocol design and + implementation issues that were not included in the walk- + through. + + (4) Interfaces -- discusses the service interface to the next + higher layer. + + (5) Summary -- contains a summary of the requirements of the + section. + + Under many of the individual topics in this document, there is + parenthetical material labeled "DISCUSSION" or + "IMPLEMENTATION". This material is intended to give + clarification and explanation of the preceding requirements + text. It also includes some suggestions on possible future + directions or developments. The implementation material + contains suggested approaches that an implementor may want to + consider. + + The summary sections are intended to be guides and indexes to + the text, but are necessarily cryptic and incomplete. The + summaries should never be used or referenced separately from + the complete RFC. + + 1.3.2 Requirements + + In this document, the words that are used to define the + significance of each particular requirement are capitalized. + These words are: + + + +Internet Engineering Task Force [Page 10] + + + + +RFC1123 INTRODUCTION October 1989 + + + * "MUST" + + This word or the adjective "REQUIRED" means that the item + is an absolute requirement of the specification. + + * "SHOULD" + + This word or the adjective "RECOMMENDED" means that there + may exist valid reasons in particular circumstances to + ignore this item, but the full implications should be + understood and the case carefully weighed before choosing + a different course. + + * "MAY" + + This word or the adjective "OPTIONAL" means that this item + is truly optional. One vendor may choose to include the + item because a particular marketplace requires it or + because it enhances the product, for example; another + vendor may omit the same item. + + + An implementation is not compliant if it fails to satisfy one + or more of the MUST requirements for the protocols it + implements. 
An implementation that satisfies all the MUST and + all the SHOULD requirements for its protocols is said to be + "unconditionally compliant"; one that satisfies all the MUST + requirements but not all the SHOULD requirements for its + protocols is said to be "conditionally compliant". + + 1.3.3 Terminology + + This document uses the following technical terms: + + Segment + A segment is the unit of end-to-end transmission in the + TCP protocol. A segment consists of a TCP header followed + by application data. A segment is transmitted by + encapsulation in an IP datagram. + + Message + This term is used by some application layer protocols + (particularly SMTP) for an application data unit. + + Datagram + A [UDP] datagram is the unit of end-to-end transmission in + the UDP protocol. + + + + +Internet Engineering Task Force [Page 11] + + + + +RFC1123 INTRODUCTION October 1989 + + + Multihomed + A host is said to be multihomed if it has multiple IP + addresses to connected networks. + + + + 1.4 Acknowledgments + + This document incorporates contributions and comments from a large + group of Internet protocol experts, including representatives of + university and research labs, vendors, and government agencies. + It was assembled primarily by the Host Requirements Working Group + of the Internet Engineering Task Force (IETF). + + The Editor would especially like to acknowledge the tireless + dedication of the following people, who attended many long + meetings and generated 3 million bytes of electronic mail over the + past 18 months in pursuit of this document: Philip Almquist, Dave + Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve + Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore), + John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG), + Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge + (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software). 
+ + In addition, the following people made major contributions to the + effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia + (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA), + Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL), + John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill + Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul + (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue + Technology), and Mike StJohns (DCA). The following also made + significant contributions to particular areas: Eric Allman + (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic + (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn + (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann + (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun + Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen + (Toronto). + + We are grateful to all, including any contributors who may have + been inadvertently omitted from this list. + + + + + + + + + +Internet Engineering Task Force [Page 12] + + + + +RFC1123 APPLICATIONS LAYER -- GENERAL October 1989 + + +2. GENERAL ISSUES + + This section contains general requirements that may be applicable to + all application-layer protocols. + + 2.1 Host Names and Numbers + + The syntax of a legal Internet host name was specified in RFC-952 + [DNS:4]. One aspect of host name syntax is hereby changed: the + restriction on the first character is relaxed to allow either a + letter or a digit. Host software MUST support this more liberal + syntax. + + Host software MUST handle host names of up to 63 characters and + SHOULD handle host names of up to 255 characters. + + Whenever a user inputs the identity of an Internet host, it SHOULD + be possible to enter either (1) a host domain name or (2) an IP + address in dotted-decimal ("#.#.#.#") form. 
The host SHOULD check + the string syntactically for a dotted-decimal number before + looking it up in the Domain Name System. + + DISCUSSION: + This last requirement is not intended to specify the complete + syntactic form for entering a dotted-decimal host number; + that is considered to be a user-interface issue. For + example, a dotted-decimal number must be enclosed within + "[ ]" brackets for SMTP mail (see Section 5.2.17). This + notation could be made universal within a host system, + simplifying the syntactic checking for a dotted-decimal + number. + + If a dotted-decimal number can be entered without such + identifying delimiters, then a full syntactic check must be + made, because a segment of a host domain name is now allowed + to begin with a digit and could legally be entirely numeric + (see Section 6.1.2.4). However, a valid host name can never + have the dotted-decimal form #.#.#.#, since at least the + highest-level component label will be alphabetic. + + 2.2 Using Domain Name Service + + Host domain names MUST be translated to IP addresses as described + in Section 6.1. + + Applications using domain name services MUST be able to cope with + soft error conditions. Applications MUST wait a reasonable + interval between successive retries due to a soft error, and MUST + + + +Internet Engineering Task Force [Page 13] + + + + +RFC1123 APPLICATIONS LAYER -- GENERAL October 1989 + + + allow for the possibility that network problems may deny service + for hours or even days. + + An application SHOULD NOT rely on the ability to locate a WKS + record containing an accurate listing of all services at a + particular host address, since the WKS RR type is not often used + by Internet sites. To confirm that a service is present, simply + attempt to use it. + + 2.3 Applications on Multihomed hosts + + When the remote host is multihomed, the name-to-address + translation will return a list of alternative IP addresses. 
As + specified in Section 6.1.3.4, this list should be in order of + decreasing preference. Application protocol implementations + SHOULD be prepared to try multiple addresses from the list until + success is obtained. More specific requirements for SMTP are + given in Section 5.3.4. + + When the local host is multihomed, a UDP-based request/response + application SHOULD send the response with an IP source address + that is the same as the specific destination address of the UDP + request datagram. The "specific destination address" is defined + in the "IP Addressing" section of the companion RFC [INTRO:1]. + + Similarly, a server application that opens multiple TCP + connections to the same client SHOULD use the same local IP + address for all. + + 2.4 Type-of-Service + + Applications MUST select appropriate TOS values when they invoke + transport layer services, and these values MUST be configurable. + Note that a TOS value contains 5 bits, of which only the most- + significant 3 bits are currently defined; the other two bits MUST + be zero. + + DISCUSSION: + As gateway algorithms are developed to implement Type-of- + Service, the recommended values for various application + protocols may change. In addition, it is likely that + particular combinations of users and Internet paths will want + non-standard TOS values. For these reasons, the TOS values + must be configurable. + + See the latest version of the "Assigned Numbers" RFC + [INTRO:5] for the recommended TOS values for the major + application protocols. 
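The TOS rules of Section 2.4 can be illustrated with a short sketch. It assumes the IP-layer layout in which delay, throughput, and reliability are the three most-significant bits of the 5-bit TOS field. The constant names, the `DEFAULT_TOS` table, and its entries are hypothetical placeholders, not the recommended values from the "Assigned Numbers" RFC.

```python
# Illustrative sketch only; names and default values are assumptions.

LOW_DELAY = 0b10000          # most-significant bit of the 5-bit TOS field
HIGH_THROUGHPUT = 0b01000
HIGH_RELIABILITY = 0b00100
RESERVED_BITS = 0b00011      # the two undefined bits, which MUST be zero

# Hypothetical per-application defaults. A real host would load these
# from configuration so that an operator can change them.
DEFAULT_TOS = {
    "interactive": LOW_DELAY,
    "bulk": HIGH_THROUGHPUT,
}


def validate_tos(value):
    """Return value unchanged if it is a legal 5-bit TOS field."""
    if not 0 <= value <= 0b11111:
        raise ValueError("TOS field is 5 bits wide")
    if value & RESERVED_BITS:
        raise ValueError("the two low-order TOS bits must be zero")
    return value


def tos_for(application, overrides=None):
    """Look up the TOS for an application; configured overrides win."""
    table = dict(DEFAULT_TOS)
    table.update(overrides or {})
    return validate_tos(table.get(application, 0))
```

Keeping the defaults in a table that configuration can override is one way to meet the requirement that TOS values be configurable; validating on every lookup catches a configured value that sets a reserved bit.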
+
+   2.5  GENERAL APPLICATION REQUIREMENTS SUMMARY
+
+                                               |          | | | |S| |
+                                               |          | | | |H| |F
+                                               |          | | | |O|M|o
+                                               |          | |S| |U|U|o
+                                               |          | |H| |L|S|t
+                                               |          |M|O| |D|T|n
+                                               |          |U|U|M| | |o
+                                               |          |S|L|A|N|N|t
+                                               |          |T|D|Y|O|O|t
+FEATURE                                        |SECTION   | | | |T|T|e
+-----------------------------------------------|----------|-|-|-|-|-|--
+                                               |          | | | | | |
+User interfaces:                               |          | | | | | |
+  Allow host name to begin with digit          |2.1       |x| | | | |
+  Host names of up to 63 characters            |2.1       |x| | | | |
+  Host names of up to 255 characters           |2.1       | |x| | | |
+  Support dotted-decimal host numbers          |2.1       | |x| | | |
+  Check syntactically for dotted-dec first     |2.1       | |x| | | |
+                                               |          | | | | | |
+Map domain names per Section 6.1               |2.2       |x| | | | |
+Cope with soft DNS errors                      |2.2       |x| | | | |
+  Reasonable interval between retries          |2.2       |x| | | | |
+  Allow for long outages                       |2.2       |x| | | | |
+Expect WKS records to be available             |2.2       | | | |x| |
+                                               |          | | | | | |
+Try multiple addr's for remote multihomed host |2.3       | |x| | | |
+UDP reply src addr is specific dest of request |2.3       | |x| | | |
+Use same IP addr for related TCP connections   |2.3       | |x| | | |
+Specify appropriate TOS values                 |2.4       |x| | | | |
+  TOS values configurable                      |2.4       |x| | | | |
+  Unused TOS bits zero                         |2.4       |x| | | | |
+                                               |          | | | | | |
+                                               |          | | | | | |
+
+3.  REMOTE LOGIN -- TELNET PROTOCOL
+
+   3.1  INTRODUCTION
+
+      Telnet is the standard Internet application protocol for remote
+      login. It provides the encoding rules to link a user's
+      keyboard/display on a client ("user") system with a command
+      interpreter on a remote server system. A subset of the Telnet
+      protocol is also incorporated within other application protocols,
+      e.g., FTP and SMTP.
 + + Telnet uses a single TCP connection, and its normal data stream + ("Network Virtual Terminal" or "NVT" mode) is 7-bit ASCII with + escape sequences to embed control functions. Telnet also allows + the negotiation of many optional modes and functions. + + The primary Telnet specification is to be found in RFC-854 + [TELNET:1], while the options are defined in many other RFCs; see + Section 7 for references. + + 3.2 PROTOCOL WALK-THROUGH + + 3.2.1 Option Negotiation: RFC-854, pp. 2-3 + + Every Telnet implementation MUST include option negotiation and + subnegotiation machinery [TELNET:2]. + + A host MUST carefully follow the rules of RFC-854 to avoid + option-negotiation loops. A host MUST refuse (i.e., reply + WONT/DONT to a DO/WILL) an unsupported option. Option + negotiation SHOULD continue to function (even if all requests + are refused) throughout the lifetime of a Telnet connection. + + If all option negotiations fail, a Telnet implementation MUST + default to, and support, an NVT. + + DISCUSSION: + Even though more sophisticated "terminals" and supporting + option negotiations are becoming the norm, all + implementations must be prepared to support an NVT for any + user-server communication. + + 3.2.2 Telnet Go-Ahead Function: RFC-854, p. 5, and RFC-858 + + On a host that never sends the Telnet command Go Ahead (GA), + the Telnet Server MUST attempt to negotiate the Suppress Go + Ahead option (i.e., send "WILL Suppress Go Ahead"). A User or + Server Telnet MUST always accept negotiation of the Suppress Go + Ahead option. + + When it is driving a full-duplex terminal for which GA has no + meaning, a User Telnet implementation MAY ignore GA commands. + + DISCUSSION: + Half-duplex ("locked-keyboard") line-at-a-time terminals + for which the Go-Ahead mechanism was designed have largely + disappeared from the scene. 
It turned out to be difficult + to implement sending the Go-Ahead signal in many operating + systems, even some systems that support native half-duplex + terminals. The difficulty is typically that the Telnet + server code does not have access to information about + whether the user process is blocked awaiting input from + the Telnet connection, i.e., it cannot reliably determine + when to send a GA command. Therefore, most Telnet Server + hosts do not send GA commands. + + The effect of the rules in this section is to allow either + end of a Telnet connection to veto the use of GA commands. + + There is a class of half-duplex terminals that is still + commercially important: "data entry terminals," which + interact in a full-screen manner. However, supporting + data entry terminals using the Telnet protocol does not + require the Go Ahead signal; see Section 3.3.2. + + 3.2.3 Control Functions: RFC-854, pp. 7-8 + + The list of Telnet commands has been extended to include EOR + (End-of-Record), with code 239 [TELNET:9]. + + Both User and Server Telnets MAY support the control functions + EOR, EC, EL, and Break, and MUST support AO, AYT, DM, IP, NOP, + SB, and SE. + + A host MUST be able to receive and ignore any Telnet control + functions that it does not support. + + DISCUSSION: + Note that a Server Telnet is required to support the + Telnet IP (Interrupt Process) function, even if the server + host has an equivalent in-stream function (e.g., Control-C + in many systems). The Telnet IP function may be stronger + than an in-stream interrupt command, because of the out- + of-band effect of TCP urgent data. + + The EOR control function may be used to delimit the + + + +Internet Engineering Task Force [Page 17] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + stream. An important application is data entry terminal + support (see Section 3.3.2). 
There was concern that since + EOR had not been defined in RFC-854, a host that was not + prepared to correctly ignore unknown Telnet commands might + crash if it received an EOR. To protect such hosts, the + End-of-Record option [TELNET:9] was introduced; however, a + properly implemented Telnet program will not require this + protection. + + 3.2.4 Telnet "Synch" Signal: RFC-854, pp. 8-10 + + When it receives "urgent" TCP data, a User or Server Telnet + MUST discard all data except Telnet commands until the DM (and + end of urgent) is reached. + + When it sends Telnet IP (Interrupt Process), a User Telnet + SHOULD follow it by the Telnet "Synch" sequence, i.e., send as + TCP urgent data the sequence "IAC IP IAC DM". The TCP urgent + pointer points to the DM octet. + + When it receives a Telnet IP command, a Server Telnet MAY send + a Telnet "Synch" sequence back to the user, to flush the output + stream. The choice ought to be consistent with the way the + server operating system behaves when a local user interrupts a + process. + + When it receives a Telnet AO command, a Server Telnet MUST send + a Telnet "Synch" sequence back to the user, to flush the output + stream. + + A User Telnet SHOULD have the capability of flushing output + when it sends a Telnet IP; see also Section 3.4.5. + + DISCUSSION: + There are three possible ways for a User Telnet to flush + the stream of server output data: + + (1) Send AO after IP. + + This will cause the server host to send a "flush- + buffered-output" signal to its operating system. + However, the AO may not take effect locally, i.e., + stop terminal output at the User Telnet end, until + the Server Telnet has received and processed the AO + and has sent back a "Synch". + + (2) Send DO TIMING-MARK [TELNET:7] after IP, and discard + all output locally until a WILL/WONT TIMING-MARK is + + + +Internet Engineering Task Force [Page 18] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + received from the Server Telnet. 
+ + Since the DO TIMING-MARK will be processed after the + IP at the server, the reply to it should be in the + right place in the output data stream. However, the + TIMING-MARK will not send a "flush buffered output" + signal to the server operating system. Whether or + not this is needed is dependent upon the server + system. + + (3) Do both. + + The best method is not entirely clear, since it must + accommodate a number of existing server hosts that do not + follow the Telnet standards in various ways. The safest + approach is probably to provide a user-controllable option + to select (1), (2), or (3). + + 3.2.5 NVT Printer and Keyboard: RFC-854, p. 11 + + In NVT mode, a Telnet SHOULD NOT send characters with the + high-order bit 1, and MUST NOT send it as a parity bit. + Implementations that pass the high-order bit to applications + SHOULD negotiate binary mode (see Section 3.2.6). + + + DISCUSSION: + Implementors should be aware that a strict reading of + RFC-854 allows a client or server expecting NVT ASCII to + ignore characters with the high-order bit set. In + general, binary mode is expected to be used for + transmission of an extended (beyond 7-bit) character set + with Telnet. + + However, there exist applications that really need an 8- + bit NVT mode, which is currently not defined, and these + existing applications do set the high-order bit during + part or all of the life of a Telnet connection. Note that + binary mode is not the same as 8-bit NVT mode, since + binary mode turns off end-of-line processing. For this + reason, the requirements on the high-order bit are stated + as SHOULD, not MUST. + + RFC-854 defines a minimal set of properties of a "network + virtual terminal" or NVT; this is not meant to preclude + additional features in a real terminal. A Telnet + connection is fully transparent to all 7-bit ASCII + characters, including arbitrary ASCII control characters. 
+ + + +Internet Engineering Task Force [Page 19] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + For example, a terminal might support full-screen commands + coded as ASCII escape sequences; a Telnet implementation + would pass these sequences as uninterpreted data. Thus, + an NVT should not be conceived as a terminal type of a + highly-restricted device. + + 3.2.6 Telnet Command Structure: RFC-854, p. 13 + + Since options may appear at any point in the data stream, a + Telnet escape character (known as IAC, with the value 255) to + be sent as data MUST be doubled. + + 3.2.7 Telnet Binary Option: RFC-856 + + When the Binary option has been successfully negotiated, + arbitrary 8-bit characters are allowed. However, the data + stream MUST still be scanned for IAC characters, any embedded + Telnet commands MUST be obeyed, and data bytes equal to IAC + MUST be doubled. Other character processing (e.g., replacing + CR by CR NUL or by CR LF) MUST NOT be done. In particular, + there is no end-of-line convention (see Section 3.3.1) in + binary mode. + + DISCUSSION: + The Binary option is normally negotiated in both + directions, to change the Telnet connection from NVT mode + to "binary mode". + + The sequence IAC EOR can be used to delimit blocks of data + within a binary-mode Telnet stream. + + 3.2.8 Telnet Terminal-Type Option: RFC-1091 + + The Terminal-Type option MUST use the terminal type names + officially defined in the Assigned Numbers RFC [INTRO:5], when + they are available for the particular terminal. However, the + receiver of a Terminal-Type option MUST accept any name. + + DISCUSSION: + RFC-1091 [TELNET:10] updates an earlier version of the + Terminal-Type option defined in RFC-930. The earlier + version allowed a server host capable of supporting + multiple terminal types to learn the type of a particular + client's terminal, assuming that each physical terminal + had an intrinsic type. 
However, today a "terminal" is + often really a terminal emulator program running in a PC, + perhaps capable of emulating a range of terminal types. + Therefore, RFC-1091 extends the specification to allow a + more general terminal-type negotiation between User and + Server Telnets. + + 3.3 SPECIFIC ISSUES + + 3.3.1 Telnet End-of-Line Convention + + The Telnet protocol defines the sequence CR LF to mean "end- + of-line". For terminal input, this corresponds to a command- + completion or "end-of-line" key being pressed on a user + terminal; on an ASCII terminal, this is the CR key, but it may + also be labelled "Return" or "Enter". + + When a Server Telnet receives the Telnet end-of-line sequence + CR LF as input from a remote terminal, the effect MUST be the + same as if the user had pressed the "end-of-line" key on a + local terminal. On server hosts that use ASCII, in particular, + receipt of the Telnet sequence CR LF must cause the same effect + as a local user pressing the CR key on a local terminal. Thus, + CR LF and CR NUL MUST have the same effect on an ASCII server + host when received as input over a Telnet connection. + + A User Telnet MUST be able to send any of the forms: CR LF, CR + NUL, and LF. A User Telnet on an ASCII host SHOULD have a + user-controllable mode to send either CR LF or CR NUL when the + user presses the "end-of-line" key, and CR LF SHOULD be the + default. + + The Telnet end-of-line sequence CR LF MUST be used to send + Telnet data that is not terminal-to-computer (e.g., for Server + Telnet sending output, or the Telnet protocol incorporated + in another application protocol). + + DISCUSSION: + To allow interoperability between arbitrary Telnet clients + and servers, the Telnet protocol defined a standard + representation for a line terminator. 
Since the ASCII + character set includes no explicit end-of-line character, + systems have chosen various representations, e.g., CR, LF, + and the sequence CR LF. The Telnet protocol chose the CR + LF sequence as the standard for network transmission. + + Unfortunately, the Telnet protocol specification in RFC- + 854 [TELNET:1] has turned out to be somewhat ambiguous on + what character(s) should be sent from client to server for + the "end-of-line" key. The result has been a massive and + continuing interoperability headache, made worse by + various faulty implementations of both User and Server + + + +Internet Engineering Task Force [Page 21] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + Telnets. + + Although the Telnet protocol is based on a perfectly + symmetric model, in a remote login session the role of the + user at a terminal differs from the role of the server + host. For example, RFC-854 defines the meaning of CR, LF, + and CR LF as output from the server, but does not specify + what the User Telnet should send when the user presses the + "end-of-line" key on the terminal; this turns out to be + the point at issue. + + When a user presses the "end-of-line" key, some User + Telnet implementations send CR LF, while others send CR + NUL (based on a different interpretation of the same + sentence in RFC-854). These will be equivalent for a + correctly-implemented ASCII server host, as discussed + above. For other servers, a mode in the User Telnet is + needed. + + The existence of User Telnets that send only CR NUL when + CR is pressed creates a dilemma for non-ASCII hosts: they + can either treat CR NUL as equivalent to CR LF in input, + thus precluding the possibility of entering a "bare" CR, + or else lose complete interworking. + + Suppose a user on host A uses Telnet to log into a server + host B, and then execute B's User Telnet program to log + into server host C. 
It is desirable for the Server/User + Telnet combination on B to be as transparent as possible, + i.e., to appear as if A were connected directly to C. In + particular, correct implementation will make B transparent + to Telnet end-of-line sequences, except that CR LF may be + translated to CR NUL or vice versa. + + IMPLEMENTATION: + To understand Telnet end-of-line issues, one must have at + least a general model of the relationship of Telnet to the + local operating system. The Server Telnet process is + typically coupled into the terminal driver software of the + operating system as a pseudo-terminal. A Telnet end-of- + line sequence received by the Server Telnet must have the + same effect as pressing the end-of-line key on a real + locally-connected terminal. + + Operating systems that support interactive character-at- + a-time applications (e.g., editors) typically have two + internal modes for their terminal I/O: a formatted mode, + in which local conventions for end-of-line and other + + + +Internet Engineering Task Force [Page 22] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + formatting rules have been applied to the data stream, and + a "raw" mode, in which the application has direct access + to every character as it was entered. A Server Telnet + must be implemented in such a way that these modes have + the same effect for remote as for local terminals. For + example, suppose a CR LF or CR NUL is received by the + Server Telnet on an ASCII host. In raw mode, a CR + character is passed to the application; in formatted mode, + the local system's end-of-line convention is used. + + 3.3.2 Data Entry Terminals + + DISCUSSION: + In addition to the line-oriented and character-oriented + ASCII terminals for which Telnet was designed, there are + several families of video display terminals that are + sometimes known as "data entry terminals" or DETs. The + IBM 3270 family is a well-known example. 
+ + Two Internet protocols have been designed to support + generic DETs: SUPDUP [TELNET:16, TELNET:17], and the DET + option [TELNET:18, TELNET:19]. The DET option drives a + data entry terminal over a Telnet connection using (sub-) + negotiation. SUPDUP is a completely separate terminal + protocol, which can be entered from Telnet by negotiation. + Although both SUPDUP and the DET option have been used + successfully in particular environments, neither has + gained general acceptance or wide implementation. + + A different approach to DET interaction has been developed + for supporting the IBM 3270 family through Telnet, + although the same approach would be applicable to any DET. + The idea is to enter a "native DET" mode, in which the + native DET input/output stream is sent as binary data. + The Telnet EOR command is used to delimit logical records + (e.g., "screens") within this binary stream. + + IMPLEMENTATION: + The rules for entering and leaving native DET mode are as + follows: + + o The Server uses the Terminal-Type option [TELNET:10] + to learn that the client is a DET. + + o It is conventional, but not required, that both ends + negotiate the EOR option [TELNET:9]. + + o Both ends negotiate the Binary option [TELNET:3] to + + + +Internet Engineering Task Force [Page 23] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + enter native DET mode. + + o When either end negotiates out of binary mode, the + other end does too, and the mode then reverts to + normal NVT. + + + 3.3.3 Option Requirements + + Every Telnet implementation MUST support the Binary option + [TELNET:3] and the Suppress Go Ahead option [TELNET:5], and + SHOULD support the Echo [TELNET:4], Status [TELNET:6], End-of- + Record [TELNET:9], and Extended Options List [TELNET:8] + options. + + A User or Server Telnet SHOULD support the Window Size Option + [TELNET:12] if the local operating system provides the + corresponding capability. 
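The option requirements above, together with the refusal rule of Section 3.2.1, can be sketched as a function that builds the reply to a DO or WILL request. The command and option codes are the standard Telnet values; the function name and the contents of `SUPPORTED` are assumptions made for illustration. The sketch deliberately omits the per-option state that a real implementation needs in order to follow the RFC-854 loop-avoidance rules.

```python
# Illustrative sketch only: it decides accept/refuse for one request and
# does not track option state, so it is not a complete negotiator.

IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251

BINARY = 0
SUPPRESS_GO_AHEAD = 3
END_OF_RECORD = 25

# Hypothetical set of options this endpoint is willing to enable.
SUPPORTED = {BINARY, SUPPRESS_GO_AHEAD, END_OF_RECORD}


def reply_to_request(verb, option, supported=SUPPORTED):
    """Return the 3-byte reply to a received DO or WILL."""
    if verb == DO:
        # Peer asks us to enable the option on our side.
        answer = WILL if option in supported else WONT
    elif verb == WILL:
        # Peer offers to enable the option on its side.
        answer = DO if option in supported else DONT
    else:
        raise ValueError("only DO and WILL are requests")
    return bytes([IAC, answer, option])
```

For example, `reply_to_request(DO, SUPPRESS_GO_AHEAD)` yields IAC WILL Suppress-Go-Ahead, while a DO for an option outside the set is answered with WONT.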
+ + DISCUSSION: + Note that the End-of-Record option only signifies that a + Telnet can receive a Telnet EOR without crashing; + therefore, every Telnet ought to be willing to accept + negotiation of the End-of-Record option. See also the + discussion in Section 3.2.3. + + 3.3.4 Option Initiation + + When the Telnet protocol is used in a client/server situation, + the server SHOULD initiate negotiation of the terminal + interaction mode it expects. + + DISCUSSION: + The Telnet protocol was defined to be perfectly + symmetrical, but its application is generally asymmetric. + Remote login has been known to fail because NEITHER side + initiated negotiation of the required non-default terminal + modes. It is generally the server that determines the + preferred mode, so the server needs to initiate the + negotiation; since the negotiation is symmetric, the user + can also initiate it. + + A client (User Telnet) SHOULD provide a means for users to + enable and disable the initiation of option negotiation. + + DISCUSSION: + A user sometimes needs to connect to an application + service (e.g., FTP or SMTP) that uses Telnet for its + + + +Internet Engineering Task Force [Page 24] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + control stream but does not support Telnet options. User + Telnet may be used for this purpose if initiation of + option negotiation is disabled. + + 3.3.5 Telnet Linemode Option + + DISCUSSION: + An important new Telnet option, LINEMODE [TELNET:12], has + been proposed. The LINEMODE option provides a standard + way for a User Telnet and a Server Telnet to agree that + the client rather than the server will perform terminal + character processing. When the client has prepared a + complete line of text, it will send it to the server in + (usually) one TCP packet. This option will greatly + decrease the packet cost of Telnet sessions and will also + give much better user response over congested or long- + delay networks. 
+ + The LINEMODE option allows dynamic switching between local + and remote character processing. For example, the Telnet + connection will automatically negotiate into single- + character mode while a full screen editor is running, and + then return to linemode when the editor is finished. + + We expect that when this RFC is released, hosts should + implement the client side of this option, and may + implement the server side of this option. To properly + implement the server side, the server needs to be able to + tell the local system not to do any input character + processing, but to remember its current terminal state and + notify the Server Telnet process whenever the state + changes. This will allow password echoing and full screen + editors to be handled properly, for example. + + 3.4 TELNET/USER INTERFACE + + 3.4.1 Character Set Transparency + + User Telnet implementations SHOULD be able to send or receive + any 7-bit ASCII character. Where possible, any special + character interpretations by the user host's operating system + SHOULD be bypassed so that these characters can conveniently be + sent and received on the connection. + + Some character value MUST be reserved as "escape to command + mode"; conventionally, doubling this character allows it to be + entered as data. The specific character used SHOULD be user + selectable. + + + +Internet Engineering Task Force [Page 25] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + On binary-mode connections, a User Telnet program MAY provide + an escape mechanism for entering arbitrary 8-bit values, if the + host operating system doesn't allow them to be entered directly + from the keyboard. 
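Whatever mechanism is used to enter 8-bit values, a data byte equal to IAC (255) must still be doubled before it is sent (Sections 3.2.6 and 3.2.7). A minimal sketch of that sender-side step follows; the function name is hypothetical, and the input is assumed to be pure user data with no Telnet commands mixed in.

```python
# Illustrative sketch only; assumes `data` holds user data, not commands.

IAC = 0xFF  # Telnet "Interpret As Command" byte, value 255


def escape_iac(data):
    """Double every IAC byte so the receiver treats it as data."""
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))
```

Telnet commands generated by the program itself would be appended after this step, so that their IAC prefix is not doubled.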
+ + IMPLEMENTATION: + The transparency issues are less pressing on servers, but + implementors should take care in dealing with issues like: + masking off parity bits (sent by an older, non-conforming + client) before they reach programs that expect only NVT + ASCII, and properly handling programs that request 8-bit + data streams. + + 3.4.2 Telnet Commands + + A User Telnet program MUST provide a user the capability of + entering any of the Telnet control functions IP, AO, or AYT, + and SHOULD provide the capability of entering EC, EL, and + Break. + + 3.4.3 TCP Connection Errors + + A User Telnet program SHOULD report to the user any TCP errors + that are reported by the transport layer (see "TCP/Application + Layer Interface" section in [INTRO:1]). + + 3.4.4 Non-Default Telnet Contact Port + + A User Telnet program SHOULD allow the user to optionally + specify a non-standard contact port number at the Server Telnet + host. + + 3.4.5 Flushing Output + + A User Telnet program SHOULD provide the user the ability to + specify whether or not output should be flushed when an IP is + sent; see Section 3.2.4. + + For any output flushing scheme that causes the User Telnet to + flush output locally until a Telnet signal is received from the + Server, there SHOULD be a way for the user to manually restore + normal output, in case the Server fails to send the expected + signal. + + + + + + + + +Internet Engineering Task Force [Page 26] + + + + +RFC1123 REMOTE LOGIN -- TELNET October 1989 + + + 3.5. 
TELNET REQUIREMENTS SUMMARY
+
+
+                                                 |        | | | |S| |
+                                                 |        | | | |H| |F
+                                                 |        | | | |O|M|o
+                                                 |        | |S| |U|U|o
+                                                 |        | |H| |L|S|t
+                                                 |        |M|O| |D|T|n
+                                                 |        |U|U|M| | |o
+                                                 |        |S|L|A|N|N|t
+                                                 |        |T|D|Y|O|O|t
+FEATURE                                          |SECTION | | | |T|T|e
+-------------------------------------------------|--------|-|-|-|-|-|--
+                                                 |        | | | | | |
+Option Negotiation                               |3.2.1   |x| | | | |
+  Avoid negotiation loops                        |3.2.1   |x| | | | |
+  Refuse unsupported options                     |3.2.1   |x| | | | |
+  Negotiation OK anytime on connection           |3.2.1   | |x| | | |
+  Default to NVT                                 |3.2.1   |x| | | | |
+  Send official name in Term-Type option         |3.2.8   |x| | | | |
+  Accept any name in Term-Type option            |3.2.8   |x| | | | |
+  Implement Binary, Suppress-GA options          |3.3.3   |x| | | | |
+  Echo, Status, EOR, Ext-Opt-List options        |3.3.3   | |x| | | |
+  Implement Window-Size option if appropriate    |3.3.3   | |x| | | |
+  Server initiate mode negotiations              |3.3.4   | |x| | | |
+  User can enable/disable init negotiations      |3.3.4   | |x| | | |
+                                                 |        | | | | | |
+Go-Aheads                                        |        | | | | | |
+  Non-GA server negotiate SUPPRESS-GA option     |3.2.2   |x| | | | |
+  User or Server accept SUPPRESS-GA option       |3.2.2   |x| | | | |
+  User Telnet ignore GA's                        |3.2.2   | | |x| | |
+                                                 |        | | | | | |
+Control Functions                                |        | | | | | |
+  Support SE NOP DM IP AO AYT SB                 |3.2.3   |x| | | | |
+  Support EOR EC EL Break                        |3.2.3   | | |x| | |
+  Ignore unsupported control functions           |3.2.3   |x| | | | |
+  User, Server discard urgent data up to DM      |3.2.4   |x| | | | |
+  User Telnet send "Synch" after IP, AO, AYT     |3.2.4   | |x| | | |
+  Server Telnet reply Synch to IP                |3.2.4   | | |x| | |
+  Server Telnet reply Synch to AO                |3.2.4   |x| | | | |
+  User Telnet can flush output when send IP      |3.2.4   | |x| | | |
+                                                 |        | | | | | |
+Encoding                                         |        | | | | | |
+  Send high-order bit in NVT mode                |3.2.5   | | | |x| |
+  Send high-order bit as parity bit              |3.2.5   | | | | |x|
+  Negot. BINARY if pass high-ord. bit to applic  |3.2.5   | |x| | | |
+  Always double IAC data byte                    |3.2.6   |x| | | | |
+  Double IAC data byte in binary mode            |3.2.7   |x| | | | |
+  Obey Telnet cmds in binary mode                |3.2.7   |x| | | | |
+  End-of-line, CR NUL in binary mode             |3.2.7   | | | | |x|
+                                                 |        | | | | | |
+End-of-Line                                      |        | | | | | |
+  EOL at Server same as local end-of-line        |3.3.1   |x| | | | |
+  ASCII Server accept CR LF or CR NUL for EOL    |3.3.1   |x| | | | |
+  User Telnet able to send CR LF, CR NUL, or LF  |3.3.1   |x| | | | |
+  ASCII user able to select CR LF/CR NUL         |3.3.1   | |x| | | |
+  User Telnet default mode is CR LF              |3.3.1   | |x| | | |
+  Non-interactive uses CR LF for EOL             |3.3.1   |x| | | | |
+                                                 |        | | | | | |
+User Telnet interface                            |        | | | | | |
+  Input & output all 7-bit characters            |3.4.1   | |x| | | |
+  Bypass local op sys interpretation             |3.4.1   | |x| | | |
+  Escape character                               |3.4.1   |x| | | | |
+  User-settable escape character                 |3.4.1   | |x| | | |
+  Escape to enter 8-bit values                   |3.4.1   | | |x| | |
+  Can input IP, AO, AYT                          |3.4.2   |x| | | | |
+  Can input EC, EL, Break                        |3.4.2   | |x| | | |
+  Report TCP connection errors to user           |3.4.3   | |x| | | |
+  Optional non-default contact port              |3.4.4   | |x| | | |
+  Can spec: output flushed when IP sent          |3.4.5   | |x| | | |
+  Can manually restore output mode               |3.4.5   | |x| | | |
+                                                 |        | | | | | |
+
+4.  FILE TRANSFER
+
+   4.1  FILE TRANSFER PROTOCOL -- FTP
+
+      4.1.1  INTRODUCTION
+
+         The File Transfer Protocol FTP is the primary Internet standard
+         for file transfer. The current specification is contained in
+         RFC-959 [FTP:1].
+
+         FTP uses separate simultaneous TCP connections for control and
+         for data transfer. The FTP protocol includes many features,
+         some of which are not commonly implemented.
However, for every + feature in FTP, there exists at least one implementation. The + minimum implementation defined in RFC-959 was too small, so a + somewhat larger minimum implementation is defined here. + + Internet users have been unnecessarily burdened for years by + deficient FTP implementations. Protocol implementors have + suffered from the erroneous opinion that implementing FTP ought + to be a small and trivial task. This is wrong, because FTP has + a user interface, because it has to deal (correctly) with the + whole variety of communication and operating system errors that + may occur, and because it has to handle the great diversity of + real file systems in the world. + + 4.1.2. PROTOCOL WALK-THROUGH + + 4.1.2.1 LOCAL Type: RFC-959 Section 3.1.1.4 + + An FTP program MUST support TYPE I ("IMAGE" or binary type) + as well as TYPE L 8 ("LOCAL" type with logical byte size 8). + A machine whose memory is organized into m-bit words, where + m is not a multiple of 8, MAY also support TYPE L m. + + DISCUSSION: + The command "TYPE L 8" is often required to transfer + binary data between a machine whose memory is organized + into (e.g.) 36-bit words and a machine with an 8-bit + byte organization. For an 8-bit byte machine, TYPE L 8 + is equivalent to IMAGE. + + "TYPE L m" is sometimes specified to the FTP programs + on two m-bit word machines to ensure the correct + transfer of a native-mode binary file from one machine + to the other. However, this command should have the + same effect on these machines as "TYPE I". + + + + +Internet Engineering Task Force [Page 29] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + 4.1.2.2 Telnet Format Control: RFC-959 Section 3.1.1.5.2 + + A host that makes no distinction between TYPE N and TYPE T + SHOULD implement TYPE T to be identical to TYPE N. + + DISCUSSION: + This provision should ease interoperation with hosts + that do make this distinction. 
+ + Many hosts represent text files internally as strings + of ASCII characters, using the embedded ASCII format + effector characters (LF, BS, FF, ...) to control the + format when a file is printed. For such hosts, there + is no distinction between "print" files and other + files. However, systems that use record structured + files typically need a special format for printable + files (e.g., ASA carriage control). For the latter + hosts, FTP allows a choice of TYPE N or TYPE T. + + 4.1.2.3 Page Structure: RFC-959 Section 3.1.2.3 and Appendix I + + Implementation of page structure is NOT RECOMMENDED in + general. However, if a host system does need to implement + FTP for "random access" or "holey" files, it MUST use the + defined page structure format rather than define a new + private FTP format. + + 4.1.2.4 Data Structure Transformations: RFC-959 Section 3.1.2 + + An FTP transformation between record-structure and file- + structure SHOULD be invertible, to the extent possible while + making the result useful on the target host. + + DISCUSSION: + RFC-959 required strict invertibility between record- + structure and file-structure, but in practice, + efficiency and convenience often preclude it. + Therefore, the requirement is being relaxed. There are + two different objectives for transferring a file: + processing it on the target host, or just storage. For + storage, strict invertibility is important. For + processing, the file created on the target host needs + to be in the format expected by application programs on + that host. + + As an example of the conflict, imagine a record- + oriented operating system that requires some data files + to have exactly 80 bytes in each record. While STORing + + + +Internet Engineering Task Force [Page 30] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + a file on such a host, an FTP Server must be able to + pad each line or record to 80 bytes; a later retrieval + of such a file cannot be strictly invertible. 
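ILLUSTRATION: The 80-byte record conflict described in Section 4.1.2.4 can be sketched as follows. This is only a minimal sketch under stated assumptions: the names RECORD_LEN, store_line and retrieve_record are hypothetical, and space padding is assumed (a real record-oriented system might pad differently). It shows why a later retrieval cannot be strictly invertible: trailing spaces in the original line cannot be told apart from padding.

```python
RECORD_LEN = 80  # fixed record size taken from the example in the text


def store_line(line: bytes) -> bytes:
    """Pad one line to a fixed-length record, as a storing server might."""
    if len(line) > RECORD_LEN:
        raise ValueError("line longer than one record")
    return line.ljust(RECORD_LEN, b" ")  # assumption: pad with spaces


def retrieve_record(record: bytes) -> bytes:
    """Best-effort inverse: strip what is assumed to be padding."""
    return record.rstrip(b" ")


original = [b"short line", b"ends with spaces   "]
stored = [store_line(line) for line in original]
returned = [retrieve_record(record) for record in stored]

# The first line survives the round trip; the second loses its own
# trailing spaces, so the transformation is not strictly invertible.
```

The result is still useful on the target host, which is the trade-off the relaxed requirement permits.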
+ + 4.1.2.5 Data Connection Management: RFC-959 Section 3.3 + + A User-FTP that uses STREAM mode SHOULD send a PORT command + to assign a non-default data port before each transfer + command is issued. + + DISCUSSION: + This is required because of the long delay after a TCP + connection is closed until its socket pair can be + reused, to allow multiple transfers during a single FTP + session. Sending a PORT command can be avoided if a + transfer mode other than stream is used, by leaving the + data transfer connection open between transfers. + + 4.1.2.6 PASV Command: RFC-959 Section 4.1.2 + + A server-FTP MUST implement the PASV command. + + If multiple third-party transfers are to be executed during + the same session, a new PASV command MUST be issued before + each transfer command, to obtain a unique port pair. + + IMPLEMENTATION: + The format of the 227 reply to a PASV command is not + well standardized. In particular, an FTP client cannot + assume that the parentheses shown on page 40 of RFC-959 + will be present (and in fact, Figure 3 on page 43 omits + them). Therefore, a User-FTP program that interprets + the PASV reply must scan the reply for the first digit + of the host and port numbers. + + Note that the host number h1,h2,h3,h4 is the IP address + of the server host that is sending the reply, and that + p1,p2 is a non-default data transfer port that PASV has + assigned. + + 4.1.2.7 LIST and NLST Commands: RFC-959 Section 4.1.3 + + The data returned by an NLST command MUST contain only a + simple list of legal pathnames, such that the server can use + them directly as the arguments of subsequent data transfer + commands for the individual files. + + The data returned by a LIST or NLST command SHOULD use an + + + +Internet Engineering Task Force [Page 31] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + implied TYPE AN, unless the current type is EBCDIC, in which + case an implied TYPE EN SHOULD be used. 
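ILLUSTRATION: The PASV reply handling described under IMPLEMENTATION in Section 4.1.2.6 might be sketched as below. This is a hedged sketch, not a definitive parser: the function name parse_pasv_reply is hypothetical, and it assumes the six numbers h1,h2,h3,h4,p1,p2 appear as one comma-separated run somewhere after the reply code. It does not rely on the parentheses being present.

```python
import re
from typing import Tuple

# Six comma-separated decimal fields: h1,h2,h3,h4,p1,p2
_PASV_NUMBERS = re.compile(
    r"(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})"
)


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from a 227 reply, with or without parentheses."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 reply")
    # Start scanning after the 3-digit reply code itself.
    match = _PASV_NUMBERS.search(reply, 3)
    if match is None:
        raise ValueError("no host/port numbers found in reply")
    fields = [int(group) for group in match.groups()]
    if any(value > 255 for value in fields):
        raise ValueError("field out of range")
    host = ".".join(str(value) for value in fields[:4])
    port = fields[4] * 256 + fields[5]  # p1 is the high-order byte
    return host, port
```

For example, both "227 Entering Passive Mode (10,0,0,1,19,137)" and a reply that omits the parentheses yield the same kind of (host, port) pair.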
+ + DISCUSSION: + Many FTP clients support macro-commands that will get + or put files matching a wildcard specification, using + NLST to obtain a list of pathnames. The expansion of + "multiple-put" is local to the client, but "multiple- + get" requires cooperation by the server. + + The implied type for LIST and NLST is designed to + provide compatibility with existing User-FTPs, and in + particular with multiple-get commands. + + 4.1.2.8 SITE Command: RFC-959 Section 4.1.3 + + A Server-FTP SHOULD use the SITE command for non-standard + features, rather than invent new private commands or + unstandardized extensions to existing commands. + + 4.1.2.9 STOU Command: RFC-959 Section 4.1.3 + + The STOU command stores into a uniquely named file. When it + receives an STOU command, a Server-FTP MUST return the + actual file name in the "125 Transfer Starting" or the "150 + Opening Data Connection" message that precedes the transfer + (the 250 reply code mentioned in RFC-959 is incorrect). The + exact format of these messages is hereby defined to be as + follows: + + 125 FILE: pppp + 150 FILE: pppp + + where pppp represents the unique pathname of the file that + will be written. + + 4.1.2.10 Telnet End-of-line Code: RFC-959, Page 34 + + Implementors MUST NOT assume any correspondence between READ + boundaries on the control connection and the Telnet EOL + sequences (CR LF). + + DISCUSSION: + Thus, a server-FTP (or User-FTP) must continue reading + characters from the control connection until a complete + Telnet EOL sequence is encountered, before processing + the command (or response, respectively). Conversely, a + single READ from the control connection may include + + + +Internet Engineering Task Force [Page 32] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + more than one FTP command. + + 4.1.2.11 FTP Replies: RFC-959 Section 4.2, Page 35 + + A Server-FTP MUST send only correctly formatted replies on + the control connection. 
Note that RFC-959 (unlike earlier + versions of the FTP spec) contains no provision for a + "spontaneous" reply message. + + A Server-FTP SHOULD use the reply codes defined in RFC-959 + whenever they apply. However, a server-FTP MAY use a + different reply code when needed, as long as the general + rules of Section 4.2 are followed. When the implementor has + a choice between a 4xx and 5xx reply code, a Server-FTP + SHOULD send a 4xx (temporary failure) code when there is any + reasonable possibility that a failed FTP will succeed a few + hours later. + + A User-FTP SHOULD generally use only the highest-order digit + of a 3-digit reply code for making a procedural decision, to + prevent difficulties when a Server-FTP uses non-standard + reply codes. + + A User-FTP MUST be able to handle multi-line replies. If + the implementation imposes a limit on the number of lines + and if this limit is exceeded, the User-FTP MUST recover, + e.g., by ignoring the excess lines until the end of the + multi-line reply is reached. + + A User-FTP SHOULD NOT interpret a 421 reply code ("Service + not available, closing control connection") specially, but + SHOULD detect closing of the control connection by the + server. + + DISCUSSION: + Server implementations that fail to strictly follow the + reply rules often cause FTP user programs to hang. + Note that RFC-959 resolved ambiguities in the reply + rules found in earlier FTP specifications and must be + followed. + + It is important to choose FTP reply codes that properly + distinguish between temporary and permanent failures, + to allow the successful use of file transfer client + daemons. These programs depend on the reply codes to + decide whether or not to retry a failed transfer; using + a permanent failure code (5xx) for a temporary error + will cause these programs to give up unnecessarily. 
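ILLUSTRATION: The reply-handling rules of Section 4.1.2.11 can be sketched for a User-FTP as follows. This is a minimal sketch under stated assumptions: the names read_reply, reply_class, should_retry and MAX_KEPT_LINES are hypothetical, replies are assumed to arrive already split into complete lines, and only the RFC-959 multi-line form ("ddd-" on the first line, "ddd " on the last) is handled.

```python
from typing import Iterable, List, Tuple

MAX_KEPT_LINES = 100  # hypothetical local limit on stored reply lines


def read_reply(lines: Iterable[str],
               max_kept: int = MAX_KEPT_LINES) -> Tuple[int, List[str]]:
    """Read one (possibly multi-line) reply; ignore lines over the limit."""
    it = iter(lines)
    first = next(it)
    code = int(first[:3])
    kept = [first]
    if first[3:4] == "-":                 # multi-line reply
        terminator = first[:3] + " "      # same code followed by a space
        for line in it:
            if len(kept) < max_kept:      # excess lines are dropped,
                kept.append(line)         # but reading continues
            if line.startswith(terminator):
                break
    return code, kept


def reply_class(code: int) -> int:
    """Only the highest-order digit drives procedural decisions."""
    return code // 100


def should_retry(code: int) -> bool:
    """4xx is a temporary failure worth retrying; 5xx is permanent."""
    return reply_class(code) == 4
```

A file transfer daemon built this way retries after a 451 but gives up after a 550, which is why a server's choice between 4xx and 5xx matters.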
+ + + +Internet Engineering Task Force [Page 33] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + When the meaning of a reply matches exactly the text + shown in RFC-959, uniformity will be enhanced by using + the RFC-959 text verbatim. However, a Server-FTP + implementor is encouraged to choose reply text that + conveys specific system-dependent information, when + appropriate. + + 4.1.2.12 Connections: RFC-959 Section 5.2 + + The words "and the port used" in the second paragraph of + this section of RFC-959 are erroneous (historical), and they + should be ignored. + + On a multihomed server host, the default data transfer port + (L-1) MUST be associated with the same local IP address as + the corresponding control connection to port L. + + A user-FTP MUST NOT send any Telnet controls other than + SYNCH and IP on an FTP control connection. In particular, it + MUST NOT attempt to negotiate Telnet options on the control + connection. However, a server-FTP MUST be capable of + accepting and refusing Telnet negotiations (i.e., sending + DONT/WONT). + + DISCUSSION: + Although the RFC says: "Server- and User- processes + should follow the conventions for the Telnet + protocol...[on the control connection]", it is not the + intent that Telnet option negotiation is to be + employed. + + 4.1.2.13 Minimum Implementation; RFC-959 Section 5.1 + + The following commands and options MUST be supported by + every server-FTP and user-FTP, except in cases where the + underlying file system or operating system does not allow or + support a particular command. + + Type: ASCII Non-print, IMAGE, LOCAL 8 + Mode: Stream + Structure: File, Record* + Commands: + USER, PASS, ACCT, + PORT, PASV, + TYPE, MODE, STRU, + RETR, STOR, APPE, + RNFR, RNTO, DELE, + CWD, CDUP, RMD, MKD, PWD, + + + +Internet Engineering Task Force [Page 34] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + LIST, NLST, + SYST, STAT, + HELP, NOOP, QUIT. 
+ + *Record structure is REQUIRED only for hosts whose file + systems support record structure. + + DISCUSSION: + Vendors are encouraged to implement a larger subset of + the protocol. For example, there are important + robustness features in the protocol (e.g., Restart, + ABOR, block mode) that would be an aid to some Internet + users but are not widely implemented. + + A host that does not have record structures in its file + system may still accept files with STRU R, recording + the byte stream literally. + + 4.1.3 SPECIFIC ISSUES + + 4.1.3.1 Non-standard Command Verbs + + FTP allows "experimental" commands, whose names begin with + "X". If these commands are subsequently adopted as + standards, there may still be existing implementations using + the "X" form. At present, this is true for the directory + commands: + + RFC-959 "Experimental" + + MKD XMKD + RMD XRMD + PWD XPWD + CDUP XCUP + CWD XCWD + + All FTP implementations SHOULD recognize both forms of these + commands, by simply equating them with extra entries in the + command lookup table. + + IMPLEMENTATION: + A User-FTP can access a server that supports only the + "X" forms by implementing a mode switch, or + automatically using the following procedure: if the + RFC-959 form of one of the above commands is rejected + with a 500 or 502 response code, then try the + experimental form; any other response would be passed + to the user. + + + +Internet Engineering Task Force [Page 35] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + 4.1.3.2 Idle Timeout + + A Server-FTP process SHOULD have an idle timeout, which will + terminate the process and close the control connection if + the server is inactive (i.e., no command or data transfer in + progress) for a long period of time. The idle timeout time + SHOULD be configurable, and the default should be at least 5 + minutes. + + A client FTP process ("User-PI" in RFC-959) will need + timeouts on responses only if it is invoked from a program. 
+ + DISCUSSION: + Without a timeout, a Server-FTP process may be left + pending indefinitely if the corresponding client + crashes without closing the control connection. + + 4.1.3.3 Concurrency of Data and Control + + DISCUSSION: + The intent of the designers of FTP was that a user + should be able to send a STAT command at any time while + data transfer was in progress and that the server-FTP + would reply immediately with status -- e.g., the number + of bytes transferred so far. Similarly, an ABOR + command should be possible at any time during a data + transfer. + + Unfortunately, some small-machine operating systems + make such concurrent programming difficult, and some + other implementers seek minimal solutions, so some FTP + implementations do not allow concurrent use of the data + and control connections. Even such a minimal server + must be prepared to accept and defer a STAT or ABOR + command that arrives during data transfer. + + 4.1.3.4 FTP Restart Mechanism + + The description of the 110 reply on pp. 40-41 of RFC-959 is + incorrect; the correct description is as follows. A restart + reply message, sent over the control connection from the + receiving FTP to the User-FTP, has the format: + + 110 MARK ssss = rrrr + + Here: + + * ssss is a text string that appeared in a Restart Marker + + + +Internet Engineering Task Force [Page 36] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + in the data stream and encodes a position in the + sender's file system; + + * rrrr encodes the corresponding position in the + receiver's file system. + + The encoding, which is specific to a particular file system + and network implementation, is always generated and + interpreted by the same system, either sender or receiver. + + When an FTP that implements restart receives a Restart + Marker in the data stream, it SHOULD force the data to that + point to be written to stable storage before encoding the + corresponding position rrrr. 
An FTP sending Restart Markers + MUST NOT assume that 110 replies will be returned + synchronously with the data, i.e., it must not await a 110 + reply before sending more data. + + Two new reply codes are hereby defined for errors + encountered in restarting a transfer: + + 554 Requested action not taken: invalid REST parameter. + + A 554 reply may result from a FTP service command that + follows a REST command. The reply indicates that the + existing file at the Server-FTP cannot be repositioned + as specified in the REST. + + 555 Requested action not taken: type or stru mismatch. + + A 555 reply may result from an APPE command or from any + FTP service command following a REST command. The + reply indicates that there is some mismatch between the + current transfer parameters (type and stru) and the + attributes of the existing file. + + DISCUSSION: + Note that the FTP Restart mechanism requires that Block + or Compressed mode be used for data transfer, to allow + the Restart Markers to be included within the data + stream. The frequency of Restart Markers can be low. + + Restart Markers mark a place in the data stream, but + the receiver may be performing some transformation on + the data as it is stored into stable storage. In + general, the receiver's encoding must include any state + information necessary to restart this transformation at + any point of the FTP data stream. For example, in TYPE + + + +Internet Engineering Task Force [Page 37] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + A transfers, some receiver hosts transform CR LF + sequences into a single LF character on disk. If a + Restart Marker happens to fall between CR and LF, the + receiver must encode in rrrr that the transfer must be + restarted in a "CR has been seen and discarded" state. + + Note that the Restart Marker is required to be encoded + as a string of printable ASCII characters, regardless + of the type of the data. 
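ILLUSTRATION: The "CR has been seen and discarded" case above can be sketched as follows. This is only a sketch under stated assumptions: the class AsciiReceiver and the rrrr layout (a decimal offset followed by "C" or "N") are hypothetical, chosen merely to show that the receiver's encoding carries transformation state as well as a file position, and that it is a printable ASCII string.

```python
class AsciiReceiver:
    """Receives TYPE A data and stores each CR LF as a single LF."""

    def __init__(self) -> None:
        self.stored = bytearray()  # stands in for the file on disk
        self.pending_cr = False    # True if a CR was seen and withheld

    def feed(self, data: bytes) -> None:
        for byte in data:
            if self.pending_cr:
                self.pending_cr = False
                if byte == 0x0A:
                    self.stored.append(0x0A)  # CR LF becomes LF
                    continue
                self.stored.append(0x0D)      # a lone CR is kept
            if byte == 0x0D:
                self.pending_cr = True
            else:
                self.stored.append(byte)

    def encode_position(self) -> str:
        """Build rrrr: file offset plus the CR state, in printable ASCII."""
        return "%d%s" % (len(self.stored), "C" if self.pending_cr else "N")

    @classmethod
    def restart(cls, rrrr: str, on_disk: bytes) -> "AsciiReceiver":
        """Rebuild the receiver state from rrrr and the stored file."""
        receiver = cls()
        receiver.stored = bytearray(on_disk[:int(rrrr[:-1])])
        receiver.pending_cr = rrrr[-1] == "C"
        return receiver
```

If a Restart Marker falls between CR and LF, the restarted receiver resumes in the pending-CR state and produces the same file as an uninterrupted transfer.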
+ + RFC-959 says that restart information is to be returned + "to the user". This should not be taken literally. In + general, the User-FTP should save the restart + information (ssss,rrrr) in stable storage, e.g., append + it to a restart control file. An empty restart control + file should be created when the transfer first starts + and deleted automatically when the transfer completes + successfully. It is suggested that this file have a + name derived in an easily-identifiable manner from the + name of the file being transferred and the remote host + name; this is analogous to the means used by many text + editors for naming "backup" files. + + There are three cases for FTP restart. + + (1) User-to-Server Transfer + + The User-FTP puts Restart Markers <ssss> at + convenient places in the data stream. When the + Server-FTP receives a Marker, it writes all prior + data to disk, encodes its file system position and + transformation state as rrrr, and returns a "110 + MARK ssss = rrrr" reply over the control + connection. The User-FTP appends the pair + (ssss,rrrr) to its restart control file. + + To restart the transfer, the User-FTP fetches the + last (ssss,rrrr) pair from the restart control + file, repositions its local file system and + transformation state using ssss, and sends the + command "REST rrrr" to the Server-FTP. + + (2) Server-to-User Transfer + + The Server-FTP puts Restart Markers <ssss> at + convenient places in the data stream. When the + User-FTP receives a Marker, it writes all prior + data to disk, encodes its file system position and + + + +Internet Engineering Task Force [Page 38] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + transformation state as rrrr, and appends the pair + (rrrr,ssss) to its restart control file. 
+ + To restart the transfer, the User-FTP fetches the + last (rrrr,ssss) pair from the restart control + file, repositions its local file system and + transformation state using rrrr, and sends the + command "REST ssss" to the Server-FTP. + + (3) Server-to-Server ("Third-Party") Transfer + + The sending Server-FTP puts Restart Markers <ssss> + at convenient places in the data stream. When it + receives a Marker, the receiving Server-FTP writes + all prior data to disk, encodes its file system + position and transformation state as rrrr, and + sends a "110 MARK ssss = rrrr" reply over the + control connection to the User. The User-FTP + appends the pair (ssss,rrrr) to its restart + control file. + + To restart the transfer, the User-FTP fetches the + last (ssss,rrrr) pair from the restart control + file, sends "REST ssss" to the sending Server-FTP, + and sends "REST rrrr" to the receiving Server-FTP. + + + 4.1.4 FTP/USER INTERFACE + + This section discusses the user interface for a User-FTP + program. + + 4.1.4.1 Pathname Specification + + Since FTP is intended for use in a heterogeneous + environment, User-FTP implementations MUST support remote + pathnames as arbitrary character strings, so that their form + and content are not limited by the conventions of the local + operating system. + + DISCUSSION: + In particular, remote pathnames can be of arbitrary + length, and all the printing ASCII characters as well + as space (0x20) must be allowed. RFC-959 allows a + pathname to contain any 7-bit ASCII character except CR + or LF. + + + + + +Internet Engineering Task Force [Page 39] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + 4.1.4.2 "QUOTE" Command + + A User-FTP program MUST implement a "QUOTE" command that + will pass an arbitrary character string to the server and + display all resulting response messages to the user. 
+ + To make the "QUOTE" command useful, a User-FTP SHOULD send + transfer control commands to the server as the user enters + them, rather than saving all the commands and sending them + to the server only when a data transfer is started. + + DISCUSSION: + The "QUOTE" command is essential to allow the user to + access servers that require system-specific commands + (e.g., SITE or ALLO), or to invoke new or optional + features that are not implemented by the User-FTP. For + example, "QUOTE" may be used to specify "TYPE A T" to + send a print file to hosts that require the + distinction, even if the User-FTP does not recognize + that TYPE. + + 4.1.4.3 Displaying Replies to User + + A User-FTP SHOULD display to the user the full text of all + error reply messages it receives. It SHOULD have a + "verbose" mode in which all commands it sends and the full + text and reply codes it receives are displayed, for + diagnosis of problems. + + 4.1.4.4 Maintaining Synchronization + + The state machine in a User-FTP SHOULD be forgiving of + missing and unexpected reply messages, in order to maintain + command synchronization with the server. + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 40] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + 4.1.5 FTP REQUIREMENTS SUMMARY + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-------------------------------------------|---------------|-|-|-|-|-|-- +Implement TYPE T if same as TYPE N |4.1.2.2 | |x| | | | +File/Record transform invertible if poss. 
|4.1.2.4 | |x| | | | +User-FTP send PORT cmd for stream mode |4.1.2.5 | |x| | | | +Server-FTP implement PASV |4.1.2.6 |x| | | | | + PASV is per-transfer |4.1.2.6 |x| | | | | +NLST reply usable in RETR cmds |4.1.2.7 |x| | | | | +Implied type for LIST and NLST |4.1.2.7 | |x| | | | +SITE cmd for non-standard features |4.1.2.8 | |x| | | | +STOU cmd return pathname as specified |4.1.2.9 |x| | | | | +Use TCP READ boundaries on control conn. |4.1.2.10 | | | | |x| + | | | | | | | +Server-FTP send only correct reply format |4.1.2.11 |x| | | | | +Server-FTP use defined reply code if poss. |4.1.2.11 | |x| | | | + New reply code following Section 4.2 |4.1.2.11 | | |x| | | +User-FTP use only high digit of reply |4.1.2.11 | |x| | | | +User-FTP handle multi-line reply lines |4.1.2.11 |x| | | | | +User-FTP handle 421 reply specially |4.1.2.11 | | | |x| | + | | | | | | | +Default data port same IP addr as ctl conn |4.1.2.12 |x| | | | | +User-FTP send Telnet cmds exc. SYNCH, IP |4.1.2.12 | | | | |x| +User-FTP negotiate Telnet options |4.1.2.12 | | | | |x| +Server-FTP handle Telnet options |4.1.2.12 |x| | | | | +Handle "Experimental" directory cmds |4.1.3.1 | |x| | | | +Idle timeout in server-FTP |4.1.3.2 | |x| | | | + Configurable idle timeout |4.1.3.2 | |x| | | | +Receiver checkpoint data at Restart Marker |4.1.3.4 | |x| | | | +Sender assume 110 replies are synchronous |4.1.3.4 | | | | |x| + | | | | | | | +Support TYPE: | | | | | | | + ASCII - Non-Print (AN) |4.1.2.13 |x| | | | | + ASCII - Telnet (AT) -- if same as AN |4.1.2.2 | |x| | | | + ASCII - Carriage Control (AC) |959 3.1.1.5.2 | | |x| | | + EBCDIC - (any form) |959 3.1.1.2 | | |x| | | + IMAGE |4.1.2.1 |x| | | | | + LOCAL 8 |4.1.2.1 |x| | | | | + + + +Internet Engineering Task Force [Page 41] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + + LOCAL m |4.1.2.1 | | |x| | |2 + | | | | | | | +Support MODE: | | | | | | | + Stream |4.1.2.13 |x| | | | | + Block |959 3.4.2 | | |x| | | + | | | | | | | +Support STRUCTURE: | | | 
| | | | + File |4.1.2.13 |x| | | | | + Record |4.1.2.13 |x| | | | |3 + Page |4.1.2.3 | | | |x| | + | | | | | | | +Support commands: | | | | | | | + USER |4.1.2.13 |x| | | | | + PASS |4.1.2.13 |x| | | | | + ACCT |4.1.2.13 |x| | | | | + CWD |4.1.2.13 |x| | | | | + CDUP |4.1.2.13 |x| | | | | + SMNT |959 5.3.1 | | |x| | | + REIN |959 5.3.1 | | |x| | | + QUIT |4.1.2.13 |x| | | | | + | | | | | | | + PORT |4.1.2.13 |x| | | | | + PASV |4.1.2.6 |x| | | | | + TYPE |4.1.2.13 |x| | | | |1 + STRU |4.1.2.13 |x| | | | |1 + MODE |4.1.2.13 |x| | | | |1 + | | | | | | | + RETR |4.1.2.13 |x| | | | | + STOR |4.1.2.13 |x| | | | | + STOU |959 5.3.1 | | |x| | | + APPE |4.1.2.13 |x| | | | | + ALLO |959 5.3.1 | | |x| | | + REST |959 5.3.1 | | |x| | | + RNFR |4.1.2.13 |x| | | | | + RNTO |4.1.2.13 |x| | | | | + ABOR |959 5.3.1 | | |x| | | + DELE |4.1.2.13 |x| | | | | + RMD |4.1.2.13 |x| | | | | + MKD |4.1.2.13 |x| | | | | + PWD |4.1.2.13 |x| | | | | + LIST |4.1.2.13 |x| | | | | + NLST |4.1.2.13 |x| | | | | + SITE |4.1.2.8 | | |x| | | + STAT |4.1.2.13 |x| | | | | + SYST |4.1.2.13 |x| | | | | + HELP |4.1.2.13 |x| | | | | + NOOP |4.1.2.13 |x| | | | | + | | | | | | | + + + +Internet Engineering Task Force [Page 42] + + + + +RFC1123 FILE TRANSFER -- FTP October 1989 + + +User Interface: | | | | | | | + Arbitrary pathnames |4.1.4.1 |x| | | | | + Implement "QUOTE" command |4.1.4.2 |x| | | | | + Transfer control commands immediately |4.1.4.2 | |x| | | | + Display error messages to user |4.1.4.3 | |x| | | | + Verbose mode |4.1.4.3 | |x| | | | + Maintain synchronization with server |4.1.4.4 | |x| | | | + +Footnotes: + +(1) For the values shown earlier. + +(2) Here m is number of bits in a memory word. + +(3) Required for host with record-structured file system, optional + otherwise. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 43] + + + + +RFC1123 FILE TRANSFER -- TFTP October 1989 + + + 4.2 TRIVIAL FILE TRANSFER PROTOCOL -- TFTP + + 4.2.1 INTRODUCTION + + The Trivial File Transfer Protocol TFTP is defined in RFC-783 + [TFTP:1]. + + TFTP provides its own reliable delivery with UDP as its + transport protocol, using a simple stop-and-wait acknowledgment + system. Since TFTP has an effective window of only one 512 + octet segment, it can provide good performance only over paths + that have a small delay*bandwidth product. The TFTP file + interface is very simple, providing no access control or + security. + + TFTP's most important application is bootstrapping a host over + a local network, since it is simple and small enough to be + easily implemented in EPROM [BOOT:1, BOOT:2]. Vendors are + urged to support TFTP for booting. + + 4.2.2 PROTOCOL WALK-THROUGH + + The TFTP specification [TFTP:1] is written in an open style, + and does not fully specify many parts of the protocol. + + 4.2.2.1 Transfer Modes: RFC-783, Page 3 + + The transfer mode "mail" SHOULD NOT be supported. + + 4.2.2.2 UDP Header: RFC-783, Page 17 + + The Length field of a UDP header is incorrectly defined; it + includes the UDP header length (8). + + 4.2.3 SPECIFIC ISSUES + + 4.2.3.1 Sorcerer's Apprentice Syndrome + + There is a serious bug, known as the "Sorcerer's Apprentice + Syndrome," in the protocol specification. While it does not + cause incorrect operation of the transfer (the file will + always be transferred correctly if the transfer completes), + this bug may cause excessive retransmission, which may cause + the transfer to time out. 
+ + Implementations MUST contain the fix for this problem: the + sender (i.e., the side originating the DATA packets) must + never resend the current DATA packet on receipt of a + + + +Internet Engineering Task Force [Page 44] + + + + +RFC1123 FILE TRANSFER -- TFTP October 1989 + + + duplicate ACK. + + DISCUSSION: + The bug is caused by the protocol rule that either + side, on receiving an old duplicate datagram, may + resend the current datagram. If a packet is delayed in + the network but later successfully delivered after + either side has timed out and retransmitted a packet, a + duplicate copy of the response may be generated. If + the other side responds to this duplicate with a + duplicate of its own, then every datagram will be sent + in duplicate for the remainder of the transfer (unless + a datagram is lost, breaking the repetition). Worse + yet, since the delay is often caused by congestion, + this duplicate transmission will usually cause more + congestion, leading to more delayed packets, etc. + + The following example may help to clarify this problem. + + TFTP A TFTP B + + (1) Receive ACK X-1 + Send DATA X + (2) Receive DATA X + Send ACK X + (ACK X is delayed in network, + and A times out): + (3) Retransmit DATA X + + (4) Receive DATA X again + Send ACK X again + (5) Receive (delayed) ACK X + Send DATA X+1 + (6) Receive DATA X+1 + Send ACK X+1 + (7) Receive ACK X again + Send DATA X+1 again + (8) Receive DATA X+1 again + Send ACK X+1 again + (9) Receive ACK X+1 + Send DATA X+2 + (10) Receive DATA X+2 + Send ACK X+2 + (11) Receive ACK X+1 again + Send DATA X+2 again + (12) Receive DATA X+2 again + Send ACK X+2 again + + + + +Internet Engineering Task Force [Page 45] + + + + +RFC1123 FILE TRANSFER -- TFTP October 1989 + + + Notice that once the delayed ACK arrives, the protocol + settles down to duplicate all further packets + (sequences 5-8 and 9-12). 
The problem is caused not by + either side timing out, but by both sides + retransmitting the current packet when they receive a + duplicate. + + The fix is to break the retransmission loop, as + indicated above. This is analogous to the behavior of + TCP. It is then possible to remove the retransmission + timer on the receiver, since the resent ACK will never + cause any action; this is a useful simplification where + TFTP is used in a bootstrap program. It is OK to allow + the timer to remain, and it may be helpful if the + retransmitted ACK replaces one that was genuinely lost + in the network. The sender still requires a retransmit + timer, of course. + + 4.2.3.2 Timeout Algorithms + + A TFTP implementation MUST use an adaptive timeout. + + IMPLEMENTATION: + TCP retransmission algorithms provide a useful base to + work from. At least an exponential backoff of + retransmission timeout is necessary. + + 4.2.3.3 Extensions + + A variety of non-standard extensions have been made to TFTP, + including additional transfer modes and a secure operation + mode (with passwords). None of these have been + standardized. + + 4.2.3.4 Access Control + + A server TFTP implementation SHOULD include some + configurable access control over what pathnames are allowed + in TFTP operations. + + 4.2.3.5 Broadcast Request + + A TFTP request directed to a broadcast address SHOULD be + silently ignored. + + DISCUSSION: + Due to the weak access control capability of TFTP, + directed broadcasts of TFTP requests to random networks + + + +Internet Engineering Task Force [Page 46] + + + + +RFC1123 FILE TRANSFER -- TFTP October 1989 + + + could create a significant security hole. 
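ILLUSTRATION: The sender-side fix of Section 4.2.3.1 and the exponential backoff of Section 4.2.3.2 might be combined as sketched below. This is a hedged sketch, not a complete TFTP implementation: the class TftpSender and its parameters are hypothetical, no sockets are used, and resetting the timeout after each new ACK is a simplification (a fuller adaptive timeout would also estimate round-trip time, as TCP does).

```python
from typing import List, Optional


class TftpSender:
    """Decides what the DATA-sending side transmits; performs no I/O."""

    def __init__(self, blocks: List[bytes],
                 initial_timeout: float = 1.0,
                 max_timeout: float = 64.0) -> None:
        self.blocks = blocks      # block numbers start at 1
        self.current = 1          # DATA block awaiting its ACK
        self.initial_timeout = initial_timeout
        self.max_timeout = max_timeout
        self.timeout = initial_timeout

    @property
    def done(self) -> bool:
        return self.current > len(self.blocks)

    def on_ack(self, acked: int) -> Optional[int]:
        """Return the block number to send next, or None to send nothing."""
        if acked != self.current:
            return None           # duplicate ACK: never resend DATA
        self.current += 1
        self.timeout = self.initial_timeout
        return None if self.done else self.current

    def on_timeout(self) -> Optional[int]:
        """Only a timer expiry triggers retransmission, with backoff."""
        if self.done:
            return None
        self.timeout = min(self.timeout * 2, self.max_timeout)
        return self.current
```

Because a duplicate ACK produces no transmission, the loop in steps (7) through (12) of the example cannot start; lost packets are still recovered by the sender's timer.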
+ + 4.2.4 TFTP REQUIREMENTS SUMMARY + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-------------------------------------------------|--------|-|-|-|-|-|-- +Fix Sorcerer's Apprentice Syndrome |4.2.3.1 |x| | | | | +Transfer modes: | | | | | | | + netascii |RFC-783 |x| | | | | + octet |RFC-783 |x| | | | | + mail |4.2.2.1 | | | |x| | + extensions |4.2.3.3 | | |x| | | +Use adaptive timeout |4.2.3.2 |x| | | | | +Configurable access control |4.2.3.4 | |x| | | | +Silently ignore broadcast request |4.2.3.5 | |x| | | | +-------------------------------------------------|--------|-|-|-|-|-|-- +-------------------------------------------------|--------|-|-|-|-|-|-- + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 47] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + +5. ELECTRONIC MAIL -- SMTP and RFC-822 + + 5.1 INTRODUCTION + + In the TCP/IP protocol suite, electronic mail in a format + specified in RFC-822 [SMTP:2] is transmitted using the Simple Mail + Transfer Protocol (SMTP) defined in RFC-821 [SMTP:1]. + + While SMTP has remained unchanged over the years, the Internet + community has made several changes in the way SMTP is used. In + particular, the conversion to the Domain Name System (DNS) has + caused changes in address formats and in mail routing. In this + section, we assume familiarity with the concepts and terminology + of the DNS, whose requirements are given in Section 6.1. + + RFC-822 specifies the Internet standard format for electronic mail + messages. RFC-822 supersedes an older standard, RFC-733, that may + still be in use in a few places, although it is obsolete. The two + formats are sometimes referred to simply by number ("822" and + "733"). 
+ + RFC-822 is used in some non-Internet mail environments with + different mail transfer protocols than SMTP, and SMTP has also + been adapted for use in some non-Internet environments. Note that + this document presents the rules for the use of SMTP and RFC-822 + for the Internet environment only; other mail environments that + use these protocols may be expected to have their own rules. + + 5.2 PROTOCOL WALK-THROUGH + + This section covers both RFC-821 and RFC-822. + + The SMTP specification in RFC-821 is clear and contains numerous + examples, so implementors should not find it difficult to + understand. This section simply updates or annotates portions of + RFC-821 to conform with current usage. + + RFC-822 is a long and dense document, defining a rich syntax. + Unfortunately, incomplete or defective implementations of RFC-822 + are common. In fact, nearly all of the many formats of RFC-822 + are actually used, so an implementation generally needs to + recognize and correctly interpret all of the RFC-822 syntax. + + 5.2.1 The SMTP Model: RFC-821 Section 2 + + DISCUSSION: + Mail is sent by a series of request/response transactions + between a client, the "sender-SMTP," and a server, the + + + +Internet Engineering Task Force [Page 48] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + "receiver-SMTP". These transactions pass (1) the message + proper, which is composed of header and body, and (2) SMTP + source and destination addresses, referred to as the + "envelope". + + The SMTP programs are analogous to Message Transfer Agents + (MTAs) of X.400. There will be another level of protocol + software, closer to the end user, that is responsible for + composing and analyzing RFC-822 message headers; this + component is known as the "User Agent" in X.400, and we + use that term in this document. There is a clear logical + distinction between the User Agent and the SMTP + implementation, since they operate on different levels of + protocol. 
Note, however, that this distinction may not + be exactly reflected in the structure of typical + implementations of Internet mail. Often there is a + program known as the "mailer" that implements SMTP and + also some of the User Agent functions; the rest of the + User Agent functions are included in a user interface used + for entering and reading mail. + + The SMTP envelope is constructed at the originating site, + typically by the User Agent when the message is first + queued for the Sender-SMTP program. The envelope + addresses may be derived from information in the message + header, supplied by the user interface (e.g., to implement + a bcc: request), or derived from local configuration + information (e.g., expansion of a mailing list). The SMTP + envelope cannot in general be re-derived from the header + at a later stage in message delivery, so the envelope is + transmitted separately from the message itself using the + MAIL and RCPT commands of SMTP. + + The text of RFC-821 suggests that mail is to be delivered + to an individual user at a host. With the advent of the + domain system and of mail routing using mail-exchange (MX) + resource records, implementors should now think of + delivering mail to a user at a domain, which may or may + not be a particular host. This DOES NOT change the fact + that SMTP is a host-to-host mail exchange protocol. + + 5.2.2 Canonicalization: RFC-821 Section 3.1 + + The domain names that a Sender-SMTP sends in MAIL and RCPT + commands MUST have been "canonicalized," i.e., they must be + fully-qualified principal names or domain literals, not + nicknames or domain abbreviations. A canonicalized name either + identifies a host directly or is an MX name; it cannot be a + CNAME.
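The canonicalization rule of Section 5.2.2 can be illustrated with a small sketch. It is not taken from RFC-821 or from this document: the function name `canonicalize`, the `default_domain` argument, and the in-memory `cnames` table are all assumptions made for illustration. A real mailer would ask the domain name resolver rather than a dictionary, and would qualify abbreviations according to local policy.

```python
import re

# Dotted-decimal domain literal such as "[10.0.0.1]" (RFC-822 "dtext").
_DOMAIN_LITERAL = re.compile(r"^\[\d{1,3}(\.\d{1,3}){3}\]$")


def canonicalize(domain, default_domain, cnames):
    """Return a canonical form of `domain` for use in MAIL or RCPT.

    `default_domain` and `cnames` (alias -> principal name) are
    hypothetical stand-ins for local configuration and DNS CNAME lookups.
    """
    if _DOMAIN_LITERAL.match(domain):
        return domain                      # domain literals are already canonical
    name = domain.rstrip(".").lower()
    if "." not in name:                    # abbreviation: qualify it (assumed policy)
        name = name + "." + default_domain.lower()
    seen = set()
    while name in cnames:                  # nickname: follow it to the principal name
        if name in seen:
            raise ValueError("CNAME loop for " + domain)
        seen.add(name)
        name = cnames[name].lower()
    return name
```

With a table mapping `mail.example.org` to `mx1.example.org`, the abbreviation `mail` would be qualified and then replaced by the principal name, so that only `mx1.example.org` appears in the envelope.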
+ + 5.2.3 VRFY and EXPN Commands: RFC-821 Section 3.3 + + A receiver-SMTP MUST implement VRFY and SHOULD implement EXPN + (this requirement overrides RFC-821). However, there MAY be + configuration information to disable VRFY and EXPN in a + particular installation; this might even allow EXPN to be + disabled for selected lists. + + A new reply code is defined for the VRFY command: + + 252 Cannot VRFY user (e.g., info is not local), but will + take message for this user and attempt delivery. + + DISCUSSION: + SMTP users and administrators make regular use of these + commands for diagnosing mail delivery problems. With the + increasing use of multi-level mailing list expansion + (sometimes more than two levels), EXPN has been + increasingly important for diagnosing inadvertent mail + loops. On the other hand, some feel that EXPN represents + a significant privacy, and perhaps even a security, + exposure. + + 5.2.4 SEND, SOML, and SAML Commands: RFC-821 Section 3.4 + + An SMTP MAY implement the commands to send a message to a + user's terminal: SEND, SOML, and SAML. + + DISCUSSION: + It has been suggested that the use of mail relaying + through an MX record is inconsistent with the intent of + SEND to deliver a message immediately and directly to a + user's terminal. However, an SMTP receiver that is unable + to write directly to the user terminal can return a "251 + User Not Local" reply to the RCPT following a SEND, to + inform the originator of possibly deferred delivery. + + 5.2.5 HELO Command: RFC-821 Section 3.5 + + The sender-SMTP MUST ensure that the <domain> parameter in a + HELO command is a valid principal host domain name for the + client host. As a result, the receiver-SMTP will not have to + perform MX resolution on this name in order to validate the + HELO parameter. 
+ + The HELO receiver MAY verify that the HELO parameter really + + + +Internet Engineering Task Force [Page 50] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + corresponds to the IP address of the sender. However, the + receiver MUST NOT refuse to accept a message, even if the + sender's HELO command fails verification. + + DISCUSSION: + Verifying the HELO parameter requires a domain name lookup + and may therefore take considerable time. An alternative + tool for tracking bogus mail sources is suggested below + (see "DATA Command"). + + Note also that the HELO argument is still required to have + valid <domain> syntax, since it will appear in a Received: + line; otherwise, a 501 error is to be sent. + + IMPLEMENTATION: + When HELO parameter validation fails, a suggested + procedure is to insert a note about the unknown + authenticity of the sender into the message header (e.g., + in the "Received:" line). + + 5.2.6 Mail Relay: RFC-821 Section 3.6 + + We distinguish three types of mail (store-and-) forwarding: + + (1) A simple forwarder or "mail exchanger" forwards a message + using private knowledge about the recipient; see section + 3.2 of RFC-821. + + (2) An SMTP mail "relay" forwards a message within an SMTP + mail environment as the result of an explicit source route + (as defined in section 3.6 of RFC-821). The SMTP relay + function uses the "@...:" form of source route from RFC- + 822 (see Section 5.2.19 below). + + (3) A mail "gateway" passes a message between different + environments. The rules for mail gateways are discussed + below in Section 5.3.7. + + An Internet host that is forwarding a message but is not a + gateway to a different mail environment (i.e., it falls under + (1) or (2)) SHOULD NOT alter any existing header fields, + although the host will add an appropriate Received: line as + required in Section 5.2.8. + + A Sender-SMTP SHOULD NOT send a RCPT TO: command containing an + explicit source route using the "@...:" address form. 
Thus, + the relay function defined in section 3.6 of RFC-821 should + not be used. + + + +Internet Engineering Task Force [Page 51] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + DISCUSSION: + The intent is to discourage all source routing and to + abolish explicit source routing for mail delivery within + the Internet environment. Source-routing is unnecessary; + the simple target address "user@domain" should always + suffice. This is the result of an explicit architectural + decision to use universal naming rather than source + routing for mail. Thus, SMTP provides end-to-end + connectivity, and the DNS provides globally-unique, + location-independent names. MX records handle the major + case where source routing might otherwise be needed. + + A receiver-SMTP MUST accept the explicit source route syntax in + the envelope, but it MAY implement the relay function as + defined in section 3.6 of RFC-821. If it does not implement + the relay function, it SHOULD attempt to deliver the message + directly to the host to the right of the right-most "@" sign. + + DISCUSSION: + For example, suppose a host that does not implement the + relay function receives a message with the SMTP command: + "RCPT TO:<@ALPHA,@BETA:joe@GAMMA>", where ALPHA, BETA, and + GAMMA represent domain names. Rather than immediately + refusing the message with a 550 error reply as suggested + on page 20 of RFC-821, the host should try to forward the + message to GAMMA directly, using: "RCPT TO:<joe@GAMMA>". + Since this host does not support relaying, it is not + required to update the reverse path. + + Some have suggested that source routing may be needed + occasionally for manually routing mail around failures; + however, the reality and importance of this need is + controversial. The use of explicit SMTP mail relaying for + this purpose is discouraged, and in fact it may not be + successful, as many host systems do not support it. 
Some + have used the "%-hack" (see Section 5.2.16) for this + purpose. + + 5.2.7 RCPT Command: RFC-821 Section 4.1.1 + + A host that supports a receiver-SMTP MUST support the reserved + mailbox "Postmaster". + + The receiver-SMTP MAY verify RCPT parameters as they arrive; + however, RCPT responses MUST NOT be delayed beyond a reasonable + time (see Section 5.3.2). + + Therefore, a "250 OK" response to a RCPT does not necessarily + + + +Internet Engineering Task Force [Page 52] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + imply that the delivery address(es) are valid. Errors found + after message acceptance will be reported by mailing a + notification message to an appropriate address (see Section + 5.3.3). + + DISCUSSION: + The set of conditions under which a RCPT parameter can be + validated immediately is an engineering design choice. + Reporting destination mailbox errors to the Sender-SMTP + before mail is transferred is generally desirable to save + time and network bandwidth, but this advantage is lost if + RCPT verification is lengthy. + + For example, the receiver can verify immediately any + simple local reference, such as a single locally- + registered mailbox. On the other hand, the "reasonable + time" limitation generally implies deferring verification + of a mailing list until after the message has been + transferred and accepted, since verifying a large mailing + list can take a very long time. An implementation might + or might not choose to defer validation of addresses that + are non-local and therefore require a DNS lookup. If a + DNS lookup is performed but a soft domain system error + (e.g., timeout) occurs, validity must be assumed. + + 5.2.8 DATA Command: RFC-821 Section 4.1.1 + + Every receiver-SMTP (not just one that "accepts a message for + relaying or for final delivery" [SMTP:1]) MUST insert a + "Received:" line at the beginning of a message. 
In this line, + called a "time stamp line" in RFC-821: + + * The FROM field SHOULD contain both (1) the name of the + source host as presented in the HELO command and (2) a + domain literal containing the IP address of the source, + determined from the TCP connection. + + * The ID field MAY contain an "@" as suggested in RFC-822, + but this is not required. + + * The FOR field MAY contain a list of <path> entries when + multiple RCPT commands have been given. + + + An Internet mail program MUST NOT change a Received: line that + was previously added to the message header. + + DISCUSSION: + Including both the source host and the IP source address + in the Received: line may provide enough information for + tracking illicit mail sources and eliminate a need to + explicitly verify the HELO parameter. + + Received: lines are intended mainly for humans tracing + mail routes, primarily for diagnosis of faults. See also + the discussion under 5.3.7. + + When the receiver-SMTP makes "final delivery" of a message, + then it MUST pass the MAIL FROM: address from the SMTP envelope + with the message, for use if an error notification message must + be sent later (see Section 5.3.3). There is an analogous + requirement when gatewaying from the Internet into a different + mail environment; see Section 5.3.7. + + DISCUSSION: + Note that the final reply to the DATA command depends only + upon the successful transfer and storage of the message. + Any problem with the destination address(es) must either + (1) have been reported in an SMTP error reply to the RCPT + command(s), or (2) be reported in a later error message + mailed to the originator. + + IMPLEMENTATION: + The MAIL FROM: information may be passed as a parameter or + in a Return-Path: line inserted at the beginning of the + message.
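The time stamp line requirement of Section 5.2.8 might be sketched as follows. The function names and the "name ([ip])" layout of the FROM field are assumptions for illustration; this document only requires that both the HELO name and a domain literal with the IP address be present. The date is produced by the standard `email.utils.format_datetime` call, which emits a four-digit year and a numeric timezone.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def received_line(helo_name, peer_ip, local_host, when):
    """Build a Received: line for a newly accepted message.

    `peer_ip` must come from the TCP connection, not from anything the
    client sent. The "name ([ip])" layout is an assumed convention.
    """
    return "Received: from {} ([{}]) by {}; {}".format(
        helo_name, peer_ip, local_host, format_datetime(when)
    )


def prepend_received(message_text, line):
    """Insert the new line at the very beginning of the message.

    Received: lines already present in `message_text` are left untouched.
    """
    return line + "\r\n" + message_text
```

A caller would pass the name from the HELO command and the peer address of the socket, for example `received_line("client.example.org", "192.0.2.7", "mx.example.net", datetime.now(timezone.utc))`.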
+ + 5.2.9 Command Syntax: RFC-821 Section 4.1.2 + + The syntax shown in RFC-821 for the MAIL FROM: command omits + the case of an empty path: "MAIL FROM: <>" (see RFC-821 Page + 15). An empty reverse path MUST be supported. + + 5.2.10 SMTP Replies: RFC-821 Section 4.2 + + A receiver-SMTP SHOULD send only the reply codes listed in + section 4.2.2 of RFC-821 or in this document. A receiver-SMTP + SHOULD use the text shown in examples in RFC-821 whenever + appropriate. + + A sender-SMTP MUST determine its actions only by the reply + code, not by the text (except for 251 and 551 replies); any + text, including no text at all, must be acceptable. The space + (blank) following the reply code is considered part of the + text. Whenever possible, a sender-SMTP SHOULD test only the + + + +Internet Engineering Task Force [Page 54] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + first digit of the reply code, as specified in Appendix E of + RFC-821. + + DISCUSSION: + Interoperability problems have arisen with SMTP systems + using reply codes that are not listed explicitly in RFC- + 821 Section 4.3 but are legal according to the theory of + reply codes explained in Appendix E. + + 5.2.11 Transparency: RFC-821 Section 4.5.2 + + Implementors MUST be sure that their mail systems always add + and delete periods to ensure message transparency. + + 5.2.12 WKS Use in MX Processing: RFC-974, p. 5 + + RFC-974 [SMTP:3] recommended that the domain system be queried + for WKS ("Well-Known Service") records, to verify that each + proposed mail target does support SMTP. Later experience has + shown that WKS is not widely supported, so the WKS step in MX + processing SHOULD NOT be used. + + The following are notes on RFC-822, organized by section of that + document. 
+ + 5.2.13 RFC-822 Message Specification: RFC-822 Section 4 + + The syntax shown for the Return-path line omits the possibility + of a null return path, which is used to prevent looping of + error notifications (see Section 5.3.3). The complete syntax + is: + + return = "Return-path" ":" route-addr + / "Return-path" ":" "<" ">" + + The set of optional header fields is hereby expanded to include + the Content-Type field defined in RFC-1049 [SMTP:7]. This + field "allows mail reading systems to automatically identify + the type of a structured message body and to process it for + display accordingly". [SMTP:7] A User Agent MAY support this + field. + + 5.2.14 RFC-822 Date and Time Specification: RFC-822 Section 5 + + The syntax for the date is hereby changed to: + + date = 1*2DIGIT month 2*4DIGIT + + + + +Internet Engineering Task Force [Page 55] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + All mail software SHOULD use 4-digit years in dates, to ease + the transition to the next century. + + There is a strong trend towards the use of numeric timezone + indicators, and implementations SHOULD use numeric timezones + instead of timezone names. However, all implementations MUST + accept either notation. If timezone names are used, they MUST + be exactly as defined in RFC-822. + + The military time zones are specified incorrectly in RFC-822: + they count the wrong way from UT (the signs are reversed). As + a result, military time zones in RFC-822 headers carry no + information. + + Finally, note that there is a typo in the definition of "zone" + in the syntax summary of appendix D; the correct definition + occurs in Section 3 of RFC-822. + + 5.2.15 RFC-822 Syntax Change: RFC-822 Section 6.1 + + The syntactic definition of "mailbox" in RFC-822 is hereby + changed to: + + mailbox = addr-spec ; simple address + / [phrase] route-addr ; name & addr-spec + + That is, the phrase preceding a route address is now OPTIONAL. 
+ This change makes the following header field legal, for + example: + + From: <craig@nnsc.nsf.net> + + 5.2.16 RFC-822 Local-part: RFC-822 Section 6.2 + + The basic mailbox address specification has the form: "local- + part@domain". Here "local-part", sometimes called the "left- + hand side" of the address, is domain-dependent. + + A host that is forwarding the message but is not the + destination host implied by the right-hand side "domain" MUST + NOT interpret or modify the "local-part" of the address. + + When mail is to be gatewayed from the Internet mail environment + into a foreign mail environment (see Section 5.3.7), routing + information for that foreign environment MAY be embedded within + the "local-part" of the address. The gateway will then + interpret this local part appropriately for the foreign mail + environment. + + + +Internet Engineering Task Force [Page 56] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + DISCUSSION: + Although source routes are discouraged within the Internet + (see Section 5.2.6), there are non-Internet mail + environments whose delivery mechanisms do depend upon + source routes. Source routes for extra-Internet + environments can generally be buried in the "local-part" + of the address (see Section 5.2.16) while mail traverses + the Internet. When the mail reaches the appropriate + Internet mail gateway, the gateway will interpret the + local-part and build the necessary address or route for + the target mail environment. + + For example, an Internet host might send mail to: + "a!b!c!user@gateway-domain". The complex local part + "a!b!c!user" would be uninterpreted within the Internet + domain, but could be parsed and understood by the + specified mail gateway. + + An embedded source route is sometimes encoded in the + "local-part" using "%" as a right-binding routing + operator. 
For example, in: + + user%domain%relay3%relay2@relay1 + + the "%" convention implies that the mail is to be routed + from "relay1" through "relay2", "relay3", and finally to + "user" at "domain". This is commonly known as the "%- + hack". It is suggested that "%" have lower precedence + than any other routing operator (e.g., "!") hidden in the + local-part; for example, "a!b%c" would be interpreted as + "(a!b)%c". + + Only the target host (in this case, "relay1") is permitted + to analyze the local-part "user%domain%relay3%relay2". + + 5.2.17 Domain Literals: RFC-822 Section 6.2.3 + + A mailer MUST be able to accept and parse an Internet domain + literal whose content ("dtext"; see RFC-822) is a dotted- + decimal host address. This satisfies the requirement of + Section 2.1 for the case of mail. + + An SMTP MUST accept and recognize a domain literal for any of + its own IP addresses. + + + + + + + +Internet Engineering Task Force [Page 57] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + 5.2.18 Common Address Formatting Errors: RFC-822 Section 6.1 + + Errors in formatting or parsing 822 addresses are unfortunately + common. This section mentions only the most common errors. A + User Agent MUST accept all valid RFC-822 address formats, and + MUST NOT generate illegal address syntax. + + o A common error is to leave out the semicolon after a group + identifier. + + o Some systems fail to fully-qualify domain names in + messages they generate. The right-hand side of an "@" + sign in a header address field MUST be a fully-qualified + domain name. + + For example, some systems fail to fully-qualify the From: + address; this prevents a "reply" command in the user + interface from automatically constructing a return + address. + + DISCUSSION: + Although RFC-822 allows the local use of abbreviated + domain names within a domain, the application of + RFC-822 in Internet mail does not allow this. 
The + intent is that an Internet host must not send an SMTP + message header containing an abbreviated domain name + in an address field. This allows the address fields + of the header to be passed without alteration across + the Internet, as required in Section 5.2.6. + + o Some systems mis-parse multiple-hop explicit source routes + such as: + + @relay1,@relay2,@relay3:user@domain. + + + o Some systems over-qualify domain names by adding a + trailing dot to some or all domain names in addresses or + message-ids. This violates RFC-822 syntax. + + + 5.2.19 Explicit Source Routes: RFC-822 Section 6.2.7 + + Internet host software SHOULD NOT create an RFC-822 header + containing an address with an explicit source route, but MUST + accept such headers for compatibility with earlier systems. + + DISCUSSION: + + + +Internet Engineering Task Force [Page 58] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + In an understatement, RFC-822 says "The use of explicit + source routing is discouraged". Many hosts implemented + RFC-822 source routes incorrectly, so the syntax cannot be + used unambiguously in practice. Many users feel the + syntax is ugly. Explicit source routes are not needed in + the mail envelope for delivery; see Section 5.2.6. For + all these reasons, explicit source routes using the RFC- + 822 notations are not to be used in Internet mail headers. + + As stated in Section 5.2.16, it is necessary to allow an + explicit source route to be buried in the local-part of an + address, e.g., using the "%-hack", in order to allow mail + to be gatewayed into another environment in which explicit + source routing is necessary. The vigilant will observe + that there is no way for a User Agent to detect and + prevent the use of such implicit source routing when the + destination is within the Internet. We can only + discourage source routing of any kind within the Internet, + as unnecessary and undesirable. 
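For hosts that must still accept these forms, the two address rewrites described in Sections 5.2.6 and 5.2.16 can be sketched as below. The function names are hypothetical, and the sketch works on plain strings only: it ignores quoted local-parts and comments, which a full RFC-822 parser would have to handle.

```python
def strip_source_route(path):
    """Reduce "<@A,@B:user@D>" (with or without brackets) to "user@D"."""
    inner = path.strip()
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    if inner.startswith("@") and ":" in inner:
        inner = inner.split(":", 1)[1]     # drop the "@A,@B:" route prefix
    return inner


def percent_hack_next_hop(address):
    """At the target host, rewrite "u%d%r2@r1" to "u%d@r2".

    "%" binds to the right, so the right-most "%" becomes the new "@".
    An address whose local-part has no "%" is returned unchanged.
    """
    local, _, _host = address.rpartition("@")
    if "%" not in local:
        return address
    rest, _, next_host = local.rpartition("%")
    return rest + "@" + next_host
```

Only the host named to the right of the "@" should apply `percent_hack_next_hop`; intermediate hosts are not permitted to interpret the local-part. Because the split is made at the right-most "%", an address such as "a!b%c@gw" becomes "a!b@c", which matches the suggested precedence of "%" below "!".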
+ + 5.3 SPECIFIC ISSUES + + 5.3.1 SMTP Queueing Strategies + + The common structure of a host SMTP implementation includes + user mailboxes, one or more areas for queueing messages in + transit, and one or more daemon processes for sending and + receiving mail. The exact structure will vary depending on the + needs of the users on the host and the number and size of + mailing lists supported by the host. We describe several + optimizations that have proved helpful, particularly for + mailers supporting high traffic levels. + + Any queueing strategy MUST include: + + o Timeouts on all activities. See Section 5.3.2. + + o Never sending error messages in response to error + messages. + + + 5.3.1.1 Sending Strategy + + The general model of a sender-SMTP is one or more processes + that periodically attempt to transmit outgoing mail. In a + typical system, the program that composes a message has some + method for requesting immediate attention for a new piece of + outgoing mail, while mail that cannot be transmitted + + + +Internet Engineering Task Force [Page 59] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + immediately MUST be queued and periodically retried by the + sender. A mail queue entry will include not only the + message itself but also the envelope information. + + The sender MUST delay retrying a particular destination + after one attempt has failed. In general, the retry + interval SHOULD be at least 30 minutes; however, more + sophisticated and variable strategies will be beneficial + when the sender-SMTP can determine the reason for non- + delivery. + + Retries continue until the message is transmitted or the + sender gives up; the give-up time generally needs to be at + least 4-5 days. The parameters to the retry algorithm MUST + be configurable. + + A sender SHOULD keep a list of hosts it cannot reach and + corresponding timeouts, rather than just retrying queued + mail items. 
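A minimal sketch of the retry rules above, assuming times are given in seconds and that the caller supplies the current time. The class and function names are hypothetical. The defaults mirror the 30 minute retry interval and a 5 day give-up time from the text, and both stay configurable as required.

```python
class RetryPolicy:
    """Configurable retry parameters."""

    def __init__(self, retry_interval=30 * 60, give_up_after=5 * 24 * 3600):
        self.retry_interval = retry_interval    # seconds between attempts
        self.give_up_after = give_up_after      # seconds before giving up


class UnreachableHosts:
    """Per-host "do not retry before" times, shared by all queued messages."""

    def __init__(self, policy):
        self.policy = policy
        self._not_before = {}

    def mark_failed(self, host, now):
        self._not_before[host] = now + self.policy.retry_interval

    def mark_reachable(self, host):
        self._not_before.pop(host, None)

    def may_try(self, host, now):
        return now >= self._not_before.get(host, 0)


def next_action(queued_at, host, hosts, now):
    """Return "give_up", "wait" or "send" for one queue entry."""
    if now - queued_at >= hosts.policy.give_up_after:
        return "give_up"
    return "send" if hosts.may_try(host, now) else "wait"
```

Keeping the timeout per host rather than per message means that one failed connection defers every message queued for that host, so the queue runner does not repeat the same failing attempt for each item.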
+ + DISCUSSION: + Experience suggests that failures are typically + transient (the target system has crashed), favoring a + policy of two connection attempts in the first hour the + message is in the queue, and then backing off to once + every two or three hours. + + The sender-SMTP can shorten the queueing delay by + cooperation with the receiver-SMTP. In particular, if + mail is received from a particular address, it is good + evidence that any mail queued for that host can now be + sent. + + The strategy may be further modified as a result of + multiple addresses per host (see Section 5.3.4), to + optimize delivery time vs. resource usage. + + A sender-SMTP may have a large queue of messages for + each unavailable destination host, and if it retried + all these messages in every retry cycle, there would be + excessive Internet overhead and the daemon would be + blocked for a long period. Note that an SMTP can + generally determine that a delivery attempt has failed + only after a timeout of a minute or more; a one minute + timeout per connection will result in a very large + delay if it is repeated for dozens or even hundreds of + queued messages. + + + + +Internet Engineering Task Force [Page 60] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + When the same message is to be delivered to several users on + the same host, only one copy of the message SHOULD be + transmitted. That is, the sender-SMTP should use the + command sequence: RCPT, RCPT,... RCPT, DATA instead of the + sequence: RCPT, DATA, RCPT, DATA,... RCPT, DATA. + Implementation of this efficiency feature is strongly urged. + + Similarly, the sender-SMTP MAY support multiple concurrent + outgoing mail transactions to achieve timely delivery. + However, some limit SHOULD be imposed to protect the host + from devoting all its resources to mail. + + The use of the different addresses of a multihomed host is + discussed below. 
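The single-copy rule in this section (RCPT, RCPT,... RCPT, DATA) can be sketched as follows. The function names are hypothetical, and grouping by the domain string is a simplification: a real sender would group by the target host chosen after MX processing.

```python
from collections import defaultdict


def group_by_host(recipients):
    """Map each destination domain to the recipients queued for it."""
    groups = defaultdict(list)
    for rcpt in recipients:
        domain = rcpt.rpartition("@")[2].lower()
        groups[domain].append(rcpt)
    return dict(groups)


def transaction_commands(sender, recipients):
    """SMTP commands for one transaction: MAIL, then every RCPT, then DATA."""
    cmds = ["MAIL FROM:<{}>".format(sender)]
    cmds += ["RCPT TO:<{}>".format(r) for r in recipients]
    cmds.append("DATA")
    return cmds
```

One call to `transaction_commands` per group yields a single DATA command, and therefore a single copy of the message, for each destination.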
+ + 5.3.1.2 Receiving strategy + + The receiver-SMTP SHOULD attempt to keep a pending listen on + the SMTP port at all times. This will require the support + of multiple incoming TCP connections for SMTP. Some limit + MAY be imposed. + + IMPLEMENTATION: + When the receiver-SMTP receives mail from a particular + host address, it could notify the sender-SMTP to retry + any mail pending for that host address. + + 5.3.2 Timeouts in SMTP + + There are two approaches to timeouts in the sender-SMTP: (a) + limit the time for each SMTP command separately, or (b) limit + the time for the entire SMTP dialogue for a single mail + message. A sender-SMTP SHOULD use option (a), per-command + timeouts. Timeouts SHOULD be easily reconfigurable, preferably + without recompiling the SMTP code. + + DISCUSSION: + Timeouts are an essential feature of an SMTP + implementation. If the timeouts are too long (or worse, + there are no timeouts), Internet communication failures or + software bugs in receiver-SMTP programs can tie up SMTP + processes indefinitely. If the timeouts are too short, + resources will be wasted with attempts that time out part + way through message delivery. + + If option (b) is used, the timeout has to be very large, + e.g., an hour, to allow time to expand very large mailing + lists. The timeout may also need to increase linearly + + + +Internet Engineering Task Force [Page 61] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + with the size of the message, to account for the time to + transmit a very large message. A large fixed timeout + leads to two problems: a failure can still tie up the + sender for a very long time, and very large messages may + still spuriously time out (which is a wasteful failure!). + + Using the recommended option (a), a timer is set for each + SMTP command and for each buffer of the data transfer. + The latter means that the overall timeout is inherently + proportional to the size of the message. 
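The proportionality claim can be made concrete with a small calculation. This is a sketch only: the function name is hypothetical, and the block size and per-block timeout used in the example are assumed values, not requirements.

```python
import math


def worst_case_data_seconds(message_bytes, block_bytes, per_block_timeout):
    """Upper bound on time spent in the data transfer under option (a).

    Each buffer gets its own timer, so the bound is the number of
    buffers multiplied by the per-buffer timeout.
    """
    blocks = max(1, math.ceil(message_bytes / block_bytes))
    return blocks * per_block_timeout
```

With an assumed 4096-byte buffer and a 180 second timer, a 10,000-byte message needs 3 buffers and is bounded by 540 seconds, while a 20,000-byte message needs 5 buffers and is bounded by 900 seconds. No single fixed limit has to be chosen in advance for all message sizes.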
+ + Based on extensive experience with busy mail-relay hosts, the + minimum per-command timeout values SHOULD be as follows: + + o Initial 220 Message: 5 minutes + + A Sender-SMTP process needs to distinguish between a + failed TCP connection and a delay in receiving the initial + 220 greeting message. Many receiver-SMTPs will accept a + TCP connection but delay delivery of the 220 message until + their system load will permit more mail to be processed. + + o MAIL Command: 5 minutes + + + o RCPT Command: 5 minutes + + A longer timeout would be required if processing of + mailing lists and aliases were not deferred until after + the message was accepted. + + o DATA Initiation: 2 minutes + + This is while awaiting the "354 Start Input" reply to a + DATA command. + + o Data Block: 3 minutes + + This is while awaiting the completion of each TCP SEND + call transmitting a chunk of data. + + o DATA Termination: 10 minutes. + + This is while awaiting the "250 OK" reply. When the + receiver gets the final period terminating the message + data, it typically performs processing to deliver the + message to a user mailbox. A spurious timeout at this + point would be very wasteful, since the message has been + + + +Internet Engineering Task Force [Page 62] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + successfully sent. + + A receiver-SMTP SHOULD have a timeout of at least 5 minutes + while it is awaiting the next command from the sender. + + 5.3.3 Reliable Mail Receipt + + When the receiver-SMTP accepts a piece of mail (by sending a + "250 OK" message in response to DATA), it is accepting + responsibility for delivering or relaying the message. It must + take this responsibility seriously, i.e., it MUST NOT lose the + message for frivolous reasons, e.g., because the host later + crashes or because of a predictable resource shortage. 
+ + If there is a delivery failure after acceptance of a message, + the receiver-SMTP MUST formulate and mail a notification + message. This notification MUST be sent using a null ("<>") + reverse path in the envelope; see Section 3.6 of RFC-821. The + recipient of this notification SHOULD be the address from the + envelope return path (or the Return-Path: line). However, if + this address is null ("<>"), the receiver-SMTP MUST NOT send a + notification. If the address is an explicit source route, it + SHOULD be stripped down to its final hop. + + DISCUSSION: + For example, suppose that an error notification must be + sent for a message that arrived with: + "MAIL FROM:<@a,@b:user@d>". The notification message + should be sent to: "RCPT TO:<user@d>". + + Some delivery failures after the message is accepted by + SMTP will be unavoidable. For example, it may be + impossible for the receiver-SMTP to validate all the + delivery addresses in RCPT command(s) due to a "soft" + domain system error or because the target is a mailing + list (see earlier discussion of RCPT). + + To avoid receiving duplicate messages as the result of + timeouts, a receiver-SMTP MUST seek to minimize the time + required to respond to the final "." that ends a message + transfer. See RFC-1047 [SMTP:4] for a discussion of this + problem. + + 5.3.4 Reliable Mail Transmission + + To transmit a message, a sender-SMTP determines the IP address + of the target host from the destination address in the + envelope. Specifically, it maps the string to the right of the + + + +Internet Engineering Task Force [Page 63] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + "@" sign into an IP address. This mapping or the transfer + itself may fail with a soft error, in which case the sender- + SMTP will requeue the outgoing mail for a later retry, as + required in Section 5.3.1.1. 
+ + When it succeeds, the mapping can result in a list of + alternative delivery addresses rather than a single address, + because of (a) multiple MX records, (b) multihoming, or both. + To provide reliable mail transmission, the sender-SMTP MUST be + able to try (and retry) each of the addresses in this list in + order, until a delivery attempt succeeds. However, there MAY + also be a configurable limit on the number of alternate + addresses that can be tried. In any case, a host SHOULD try at + least two addresses. + + The following information is to be used to rank the host + addresses: + + (1) Multiple MX Records -- these contain a preference + indication that should be used in sorting. If there are + multiple destinations with the same preference and there + is no clear reason to favor one (e.g., by address + preference), then the sender-SMTP SHOULD pick one at + random to spread the load across multiple mail exchanges + for a specific organization; note that this is a + refinement of the procedure in [DNS:3]. + + (2) Multihomed host -- The destination host (perhaps taken + from the preferred MX record) may be multihomed, in which + case the domain name resolver will return a list of + alternative IP addresses. It is the responsibility of the + domain name resolver interface (see Section 6.1.3.4 below) + to have ordered this list by decreasing preference, and + SMTP MUST try them in the order presented. + + DISCUSSION: + Although the capability to try multiple alternative + addresses is required, there may be circumstances where + specific installations want to limit or disable the use of + alternative addresses. The question of whether a sender + should attempt retries using the different addresses of a + multihomed host has been controversial. 
The main argument + for using the multiple addresses is that it maximizes the + probability of timely delivery, and indeed sometimes the + probability of any delivery; the counter argument is that + it may result in unnecessary resource use. + + Note that resource use is also strongly determined by the + + + +Internet Engineering Task Force [Page 64] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + sending strategy discussed in Section 5.3.1. + + 5.3.5 Domain Name Support + + SMTP implementations MUST use the mechanism defined in Section + 6.1 for mapping between domain names and IP addresses. This + means that every Internet SMTP MUST include support for the + Internet DNS. + + In particular, a sender-SMTP MUST support the MX record scheme + [SMTP:3]. See also Section 7.4 of [DNS:2] for information on + domain name support for SMTP. + + 5.3.6 Mailing Lists and Aliases + + An SMTP-capable host SHOULD support both the alias and the list + form of address expansion for multiple delivery. When a + message is delivered or forwarded to each address of an + expanded list form, the return address in the envelope + ("MAIL FROM:") MUST be changed to be the address of a person + who administers the list, but the message header MUST be left + unchanged; in particular, the "From" field of the message is + unaffected. + + DISCUSSION: + An important mail facility is a mechanism for multi- + destination delivery of a single message, by transforming + or "expanding" a pseudo-mailbox address into a list of + destination mailbox addresses. When a message is sent to + such a pseudo-mailbox (sometimes called an "exploder"), + copies are forwarded or redistributed to each mailbox in + the expanded list. 
We classify such a pseudo-mailbox as + an "alias" or a "list", depending upon the expansion + rules: + + (a) Alias + + To expand an alias, the recipient mailer simply + replaces the pseudo-mailbox address in the envelope + with each of the expanded addresses in turn; the rest + of the envelope and the message body are left + unchanged. The message is then delivered or + forwarded to each expanded address. + + (b) List + + A mailing list may be said to operate by + "redistribution" rather than by "forwarding". To + + + +Internet Engineering Task Force [Page 65] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + expand a list, the recipient mailer replaces the + pseudo-mailbox address in the envelope with each of + the expanded addresses in turn. The return address in + the envelope is changed so that all error messages + generated by the final deliveries will be returned to + a list administrator, not to the message originator, + who generally has no control over the contents of the + list and will typically find error messages annoying. + + + 5.3.7 Mail Gatewaying + + Gatewaying mail between different mail environments, i.e., + different mail formats and protocols, is complex and does not + easily yield to standardization. See for example [SMTP:5a], + [SMTP:5b]. However, some general requirements may be given for + a gateway between the Internet and another mail environment. + + (A) Header fields MAY be rewritten when necessary as messages + are gatewayed across mail environment boundaries. + + DISCUSSION: + This may involve interpreting the local-part of the + destination address, as suggested in Section 5.2.16. + + The other mail systems gatewayed to the Internet + generally use a subset of RFC-822 headers, but some + of them do not have an equivalent to the SMTP + envelope. Therefore, when a message leaves the + Internet environment, it may be necessary to fold the + SMTP envelope information into the message header. 
A + possible solution would be to create new header + fields to carry the envelope information (e.g., "X- + SMTP-MAIL:" and "X-SMTP-RCPT:"); however, this would + require changes in mail programs in the foreign + environment. + + (B) When forwarding a message into or out of the Internet + environment, a gateway MUST prepend a Received: line, but + it MUST NOT alter in any way a Received: line that is + already in the header. + + DISCUSSION: + This requirement is a subset of the general + "Received:" line requirement of Section 5.2.8; it is + restated here for emphasis. + + Received: fields of messages originating from other + + + +Internet Engineering Task Force [Page 66] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + environments may not conform exactly to RFC822. + However, the most important use of Received: lines is + for debugging mail faults, and this debugging can be + severely hampered by well-meaning gateways that try + to "fix" a Received: line. + + The gateway is strongly encouraged to indicate the + environment and protocol in the "via" clauses of + Received field(s) that it supplies. + + (C) From the Internet side, the gateway SHOULD accept all + valid address formats in SMTP commands and in RFC-822 + headers, and all valid RFC-822 messages. Although a + gateway must accept an RFC-822 explicit source route + ("@...:" format) in either the RFC-822 header or in the + envelope, it MAY or may not act on the source route; see + Sections 5.2.6 and 5.2.19. + + DISCUSSION: + It is often tempting to restrict the range of + addresses accepted at the mail gateway to simplify + the translation into addresses for the remote + environment. This practice is based on the + assumption that mail users have control over the + addresses their mailers send to the mail gateway. In + practice, however, users have little control over the + addresses that are finally sent; their mailers are + free to change addresses into any legal RFC-822 + format. 
+ + (D) The gateway MUST ensure that all header fields of a + message that it forwards into the Internet meet the + requirements for Internet mail. In particular, all + addresses in "From:", "To:", "Cc:", etc., fields must be + transformed (if necessary) to satisfy RFC-822 syntax, and + they must be effective and useful for sending replies. + + + (E) The translation algorithm used to convert mail from the + Internet protocols to another environment's protocol + SHOULD try to ensure that error messages from the foreign + mail environment are delivered to the return path from the + SMTP envelope, not to the sender listed in the "From:" + field of the RFC-822 message. + + DISCUSSION: + Internet mail lists usually place the address of the + mail list maintainer in the envelope but leave the + + + +Internet Engineering Task Force [Page 67] + + + + +RFC1123 MAIL -- SMTP & RFC-822 October 1989 + + + original message header intact (with the "From:" + field containing the original sender). This yields + the behavior the average recipient expects: a reply + to the header gets sent to the original sender, not + to a mail list maintainer; however, errors get sent + to the maintainer (who can fix the problem) and not + the sender (who probably cannot). + + (F) Similarly, when forwarding a message from another + environment into the Internet, the gateway SHOULD set the + envelope return path in accordance with an error message + return address, if any, supplied by the foreign + environment. + + + 5.3.8 Maximum Message Size + + Mailer software MUST be able to send and receive messages of at + least 64K bytes in length (including header), and a much larger + maximum size is highly desirable. + + DISCUSSION: + Although SMTP does not define the maximum size of a + message, many systems impose implementation limits. + + The current de facto minimum limit in the Internet is 64K + bytes. However, electronic mail is used for a variety of + purposes that create much larger messages. 
For example, + mail is often used instead of FTP for transmitting ASCII + files, and in particular to transmit entire documents. As + a result, messages can be 1 megabyte or even larger. We + note that the present document together with its lower- + layer companion contains 0.5 megabytes. + + + 5.4 SMTP REQUIREMENTS SUMMARY +
+                                               |          | | | |S| |
+                                               |          | | | |H| |F
+                                               |          | | | |O|M|o
+                                               |          | |S| |U|U|o
+                                               |          | |H| |L|S|t
+                                               |          |M|O| |D|T|n
+                                               |          |U|U|M| | |o
+                                               |          |S|L|A|N|N|t
+                                               |          |T|D|Y|O|O|t
+FEATURE                                        |SECTION   | | | |T|T|e
+-----------------------------------------------|----------|-|-|-|-|-|--
+                                               |          | | | | | |
+RECEIVER-SMTP:                                 |          | | | | | |
+  Implement VRFY                               |5.2.3     |x| | | | |
+  Implement EXPN                               |5.2.3     | |x| | | |
+  EXPN, VRFY configurable                      |5.2.3     | | |x| | |
+  Implement SEND, SOML, SAML                   |5.2.4     | | |x| | |
+  Verify HELO parameter                        |5.2.5     | | |x| | |
+  Refuse message with bad HELO                 |5.2.5     | | | | |x|
+  Accept explicit src-route syntax in env.     |5.2.6     |x| | | | |
+  Support "postmaster"                         |5.2.7     |x| | | | |
+  Process RCPT when received (except lists)    |5.2.7     | | |x| | |
+  Long delay of RCPT responses                 |5.2.7     | | | | |x|
+                                               |          | | | | | |
+  Add Received: line                           |5.2.8     |x| | | | |
+  Received: line include domain literal        |5.2.8     | |x| | | |
+  Change previous Received: line               |5.2.8     | | | | |x|
+  Pass Return-Path info (final deliv/gwy)      |5.2.8     |x| | | | |
+  Support empty reverse path                   |5.2.9     |x| | | | |
+  Send only official reply codes               |5.2.10    | |x| | | |
+  Send text from RFC-821 when appropriate      |5.2.10    | |x| | | |
+  Delete "." for transparency                  |5.2.11    |x| | | | |
+  Accept and recognize self domain literal(s)  |5.2.17    |x| | | | |
+                                               |          | | | | | |
+  Error message about error message            |5.3.1     | | | | |x|
+  Keep pending listen on SMTP port             |5.3.1.2   | |x| | | |
+  Provide limit on recv concurrency            |5.3.1.2   | | |x| | |
+  Wait at least 5 mins for next sender cmd     |5.3.2     | |x| | | |
+  Avoidable delivery failure after "250 OK"    |5.3.3     | | | | |x|
+  Send error notification msg after accept     |5.3.3     |x| | | | |
+  Send using null return path                  |5.3.3     |x| | | | |
+  Send to envelope return path                 |5.3.3     | |x| | | |
+  Send to null address                         |5.3.3     | | | | |x|
+  Strip off explicit src route                 |5.3.3     | |x| | | |
+  Minimize acceptance delay (RFC-1047)         |5.3.3     |x| | | | |
+-----------------------------------------------|----------|-|-|-|-|-|--
+                                               |          | | | | | |
+SENDER-SMTP:                                   |          | | | | | |
+  Canonicalized domain names in MAIL, RCPT     |5.2.2     |x| | | | |
+  Implement SEND, SOML, SAML                   |5.2.4     | | |x| | |
+  Send valid principal host name in HELO       |5.2.5     |x| | | | |
+  Send explicit source route in RCPT TO:       |5.2.6     | | | |x| |
+  Use only reply code to determine action      |5.2.10    |x| | | | |
+  Use only high digit of reply code when poss. |5.2.10    | |x| | | |
+  Add "." for transparency                     |5.2.11    |x| | | | |
+                                               |          | | | | | |
+  Retry messages after soft failure            |5.3.1.1   |x| | | | |
+  Delay before retry                           |5.3.1.1   |x| | | | |
+  Configurable retry parameters                |5.3.1.1   |x| | | | |
+  Retry once per each queued dest host         |5.3.1.1   | |x| | | |
+  Multiple RCPT's for same DATA                |5.3.1.1   | |x| | | |
+  Support multiple concurrent transactions     |5.3.1.1   | | |x| | |
+  Provide limit on concurrency                 |5.3.1.1   | |x| | | |
+                                               |          | | | | | |
+  Timeouts on all activities                   |5.3.1     |x| | | | |
+  Per-command timeouts                         |5.3.2     | |x| | | |
+  Timeouts easily reconfigurable               |5.3.2     | |x| | | |
+  Recommended times                            |5.3.2     | |x| | | |
+  Try alternate addr's in order                |5.3.4     |x| | | | |
+  Configurable limit on alternate tries        |5.3.4     | | |x| | |
+  Try at least two alternates                  |5.3.4     | |x| | | |
+  Load-split across equal MX alternates        |5.3.4     | |x| | | |
+  Use the Domain Name System                   |5.3.5     |x| | | | |
+  Support MX records                           |5.3.5     |x| | | | |
+  Use WKS records in MX processing             |5.2.12    | | | |x| |
+-----------------------------------------------|----------|-|-|-|-|-|--
+                                               |          | | | | | |
+MAIL FORWARDING:                               |          | | | | | |
+  Alter existing header field(s)               |5.2.6     | | | |x| |
+  Implement relay function: 821/section 3.6    |5.2.6     | | |x| | |
+  If not, deliver to RHS domain                |5.2.6     | |x| | | |
+  Interpret 'local-part' of addr               |5.2.16    | | | | |x|
+                                               |          | | | | | |
+MAILING LISTS AND ALIASES                      |          | | | | | |
+  Support both                                 |5.3.6     | |x| | | |
+  Report mail list error to local admin.       |5.3.6     |x| | | | |
+                                               |          | | | | | |
+MAIL GATEWAYS:                                 |          | | | | | |
+  Embed foreign mail route in local-part       |5.2.16    | | |x| | |
+  Rewrite header fields when necessary         |5.3.7     | | |x| | |
+  Prepend Received: line                       |5.3.7     |x| | | | |
+  Change existing Received: line               |5.3.7     | | | | |x|
+  Accept full RFC-822 on Internet side         |5.3.7     | |x| | | |
+  Act on RFC-822 explicit source route         |5.3.7     | | |x| | |
+  Send only valid RFC-822 on Internet side     |5.3.7     |x| | | | |
+  Deliver error msgs to envelope addr          |5.3.7     | |x| | | |
+  Set env return path from err return addr     |5.3.7     | |x| | | |
+                                               |          | | | | | |
+USER AGENT -- RFC-822                          |          | | | | | |
+  Allow user to enter <route> address          |5.2.6     | | | |x| |
+  Support RFC-1049 Content Type field          |5.2.13    | | |x| | |
+  Use 4-digit years                            |5.2.14    | |x| | | |
+  Generate numeric timezones                   |5.2.14    | |x| | | |
+  Accept all timezones                         |5.2.14    |x| | | | |
+  Use non-num timezones from RFC-822           |5.2.14    |x| | | | |
+  Omit phrase before route-addr                |5.2.15    | | |x| | |
+  Accept and parse dot.dec. domain literals    |5.2.17    |x| | | | |
+  Accept all RFC-822 address formats           |5.2.18    |x| | | | |
+  Generate invalid RFC-822 address format      |5.2.18    | | | | |x|
+  Fully-qualified domain names in header       |5.2.18    |x| | | | |
+  Create explicit src route in header          |5.2.19    | | | |x| |
+  Accept explicit src route in header          |5.2.19    |x| | | | |
+                                               |          | | | | | |
+Send/recv at least 64KB messages               |5.3.8     |x| | | | |
+ +6. SUPPORT SERVICES + + 6.1 DOMAIN NAME TRANSLATION + + 6.1.1 INTRODUCTION + + Every host MUST implement a resolver for the Domain Name System + (DNS), and it MUST implement a mechanism using this DNS + resolver to convert host names to IP addresses and vice-versa + [DNS:1, DNS:2].
+ + In addition to the DNS, a host MAY also implement a host name + translation mechanism that searches a local Internet host + table. See Section 6.1.3.8 for more information on this + option. + + DISCUSSION: + Internet host name translation was originally performed by + searching local copies of a table of all hosts. This + table became too large to update and distribute in a + timely manner and too large to fit into many hosts, so the + DNS was invented. + + The DNS creates a distributed database used primarily for + the translation between host names and host addresses. + Implementation of DNS software is required. The DNS + consists of two logically distinct parts: name servers and + resolvers (although implementations often combine these + two logical parts in the interest of efficiency) [DNS:2]. + + Domain name servers store authoritative data about certain + sections of the database and answer queries about the + data. Domain resolvers query domain name servers for data + on behalf of user processes. Every host therefore needs a + DNS resolver; some host machines will also need to run + domain name servers. Since no name server has complete + information, in general it is necessary to obtain + information from more than one name server to resolve a + query. + + 6.1.2 PROTOCOL WALK-THROUGH + + An implementor must study references [DNS:1] and [DNS:2] + carefully. They provide a thorough description of the theory, + protocol, and implementation of the domain name system, and + reflect several years of experience. + + + + + +Internet Engineering Task Force [Page 72] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + 6.1.2.1 Resource Records with Zero TTL: RFC-1035 Section 3.2.1 + + All DNS name servers and resolvers MUST properly handle RRs + with a zero TTL: return the RR to the client but do not + cache it. 
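The zero-TTL rule of Section 6.1.2.1 can be illustrated with a small sketch. It is not taken from any particular resolver; the class RRCache and the function handle_answer are hypothetical names, and the clock is injectable only so that the behavior can be demonstrated without waiting.

```python
import time


class RRCache:
    """Tiny TTL-respecting cache keyed by (name, rrtype). Illustrative only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rrtype) -> (expiry time, rdata)

    def store(self, name, rrtype, ttl, rdata):
        # A zero TTL means "valid for this transaction only": never cache it.
        if ttl <= 0:
            return
        self._entries[(name.lower(), rrtype)] = (self._clock() + ttl, rdata)

    def lookup(self, name, rrtype):
        key = (name.lower(), rrtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expiry, rdata = entry
        if self._clock() >= expiry:
            del self._entries[key]  # the cached RR has timed out
            return None
        return rdata


def handle_answer(cache, name, rrtype, ttl, rdata):
    """Cache the RR when permitted, and always hand it back to the client."""
    cache.store(name, rrtype, ttl, rdata)
    return rdata
```

The point of the design is that the caching decision and the reply to the client are separate: a zero-TTL RR skips the cache but is still returned.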
+ + DISCUSSION: + Zero TTL values are interpreted to mean that the RR can + only be used for the transaction in progress, and + should not be cached; they are useful for extremely + volatile data. + + 6.1.2.2 QCLASS Values: RFC-1035 Section 3.2.5 + + A query with "QCLASS=*" SHOULD NOT be used unless the + requestor is seeking data from more than one class. In + particular, if the requestor is only interested in Internet + data types, QCLASS=IN MUST be used. + + 6.1.2.3 Unused Fields: RFC-1035 Section 4.1.1 + + Unused fields in a query or response message MUST be zero. + + 6.1.2.4 Compression: RFC-1035 Section 4.1.4 + + Name servers MUST use compression in responses. + + DISCUSSION: + Compression is essential to avoid overflowing UDP + datagrams; see Section 6.1.3.2. + + 6.1.2.5 Misusing Configuration Info: RFC-1035 Section 6.1.2 + + Recursive name servers and full-service resolvers generally + have some configuration information containing hints about + the location of root or local name servers. An + implementation MUST NOT include any of these hints in a + response. + + DISCUSSION: + Many implementors have found it convenient to store + these hints as if they were cached data, but some + neglected to ensure that this "cached data" was not + included in responses. This has caused serious + problems in the Internet when the hints were obsolete + or incorrect. + + + + + +Internet Engineering Task Force [Page 73] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + 6.1.3 SPECIFIC ISSUES + + 6.1.3.1 Resolver Implementation + + A name resolver SHOULD be able to multiplex concurrent + requests if the host supports concurrent processes. + + In implementing a DNS resolver, one of two different models + MAY optionally be chosen: a full-service resolver, or a stub + resolver. 
+ + + (A) Full-Service Resolver + + A full-service resolver is a complete implementation of + the resolver service, and is capable of dealing with + communication failures, failure of individual name + servers, location of the proper name server for a given + name, etc. It must satisfy the following requirements: + + o The resolver MUST implement a local caching + function to avoid repeated remote access for + identical requests, and MUST time out information + in the cache. + + o The resolver SHOULD be configurable with start-up + information pointing to multiple root name servers + and multiple name servers for the local domain. + This ensures that the resolver will be able to + access the whole name space in normal cases, and + will be able to access local domain information + should the local network become disconnected from + the rest of the Internet. + + + (B) Stub Resolver + + A "stub resolver" relies on the services of a recursive + name server on the connected network or a "nearby" + network. This scheme allows the host to pass on the + burden of the resolver function to a name server on + another host. This model is often essential for less + capable hosts, such as PCs, and is also recommended + when the host is one of several workstations on a local + network, because it allows all of the workstations to + share the cache of the recursive name server and hence + reduce the number of domain requests exported by the + local network. + + At a minimum, the stub resolver MUST be capable of + directing its requests to redundant recursive name + servers. Note that recursive name servers are allowed + to restrict the sources of requests that they will + honor, so the host administrator must verify that the + service will be provided. Stub resolvers MAY implement + caching if they choose, but if so, MUST time out cached + information.
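The stub resolver's minimum requirement, directing requests to redundant recursive name servers, might look like the following sketch. The names stub_resolve, SoftError, and the send_query callback are assumptions made for illustration; a real stub resolver would put its UDP/TCP transport behind such a hook.

```python
class SoftError(Exception):
    """Temporary failure: the application may retry later."""


def stub_resolve(name, servers, send_query):
    """Ask each configured recursive server in turn until one answers.

    send_query(server, name) is a hypothetical transport hook. It returns an
    answer, or raises OSError (which includes TimeoutError) when the server
    is unreachable or does not respond.
    """
    if not servers:
        raise ValueError("at least one recursive name server is required")
    last_error = None
    for server in servers:
        try:
            return send_query(server, name)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next server
    raise SoftError(f"no recursive server answered for {name!r}") from last_error
```

Reporting exhaustion of the server list as a soft error, rather than as "name does not exist", keeps the failure retryable by the application.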
+ + + 6.1.3.2 Transport Protocols + + DNS resolvers and recursive servers MUST support UDP, and + SHOULD support TCP, for sending (non-zone-transfer) queries. + Specifically, a DNS resolver or server that is sending a + non-zone-transfer query MUST send a UDP query first. If the + Answer section of the response is truncated and if the + requester supports TCP, it SHOULD try the query again using + TCP. + + DNS servers MUST be able to service UDP queries and SHOULD + be able to service TCP queries. A name server MAY limit the + resources it devotes to TCP queries, but it SHOULD NOT + refuse to service a TCP query just because it would have + succeeded with UDP. + + Truncated responses MUST NOT be saved (cached) and later + used in such a way that the fact that they are truncated is + lost. + + DISCUSSION: + UDP is preferred over TCP for queries because UDP + queries have much lower overhead, both in packet count + and in connection state. The use of UDP is essential + for heavily-loaded servers, especially the root + servers. UDP also offers additional robustness, since + a resolver can attempt several UDP queries to different + servers for the cost of a single TCP query. + + It is possible for a DNS response to be truncated, + although this is a very rare occurrence in the present + Internet DNS. Practically speaking, truncation cannot + be predicted, since it is data-dependent. The + dependencies include the number of RRs in the answer, + the size of each RR, and the savings in space realized + by the name compression algorithm. As a rule of thumb, + truncation in NS and MX lists should not occur for + answers containing 15 or fewer RRs. + + + +Internet Engineering Task Force [Page 75] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + Whether it is possible to use a truncated answer + depends on the application. A mailer must not use a + truncated MX response, since this could lead to mail + loops. 
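A minimal sketch of the "UDP first, TCP on truncation" decision follows. It reads the TC bit from the fixed 12-octet message header defined in RFC-1035 Section 4.1.1. The function names and the returned labels are hypothetical, and the sketch simplifies by treating any response with TC set as unusable for mail routing, since the header bit alone does not say which section was cut.

```python
import struct

TC_FLAG = 0x0200  # truncation bit within the 16-bit flags word of the header


def is_truncated(message: bytes) -> bool:
    """Return True when the TC bit is set in a raw DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-octet header")
    _ident, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


def next_step(udp_response: bytes, tcp_supported: bool, for_mail_routing: bool) -> str:
    """Decide what to do with a response to a UDP query (labels are illustrative)."""
    if not is_truncated(udp_response):
        return "use"
    if tcp_supported:
        return "retry-tcp"  # SHOULD try the query again using TCP
    # Without TCP, a mailer must not act on a truncated MX answer.
    return "discard" if for_mail_routing else "application-decides"
```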
+ + Responsible practices can make UDP suffice in the vast + majority of cases. Name servers must use compression + in responses. Resolvers must differentiate truncation + of the Additional section of a response (which only + loses extra information) from truncation of the Answer + section (which for MX records renders the response + unusable by mailers). Database administrators should + list only a reasonable number of primary names in lists + of name servers, MX alternatives, etc. + + However, it is also clear that some new DNS record + types defined in the future will contain information + exceeding the 512 byte limit that applies to UDP, and + hence will require TCP. Thus, resolvers and name + servers should implement TCP services as a backup to + UDP today, with the knowledge that they will require + the TCP service in the future. + + By private agreement, name servers and resolvers MAY arrange + to use TCP for all traffic between themselves. TCP MUST be + used for zone transfers. + + A DNS server MUST have sufficient internal concurrency that + it can continue to process UDP queries while awaiting a + response or performing a zone transfer on an open TCP + connection [DNS:2]. + + A server MAY support a UDP query that is delivered using an + IP broadcast or multicast address. However, the Recursion + Desired bit MUST NOT be set in a query that is multicast, + and MUST be ignored by name servers receiving queries via a + broadcast or multicast address. A host that sends broadcast + or multicast DNS queries SHOULD send them only as occasional + probes, caching the IP address(es) it obtains from the + response(s) so it can normally send unicast queries. + + DISCUSSION: + Broadcast or (especially) IP multicast can provide a + way to locate nearby name servers without knowing their + IP addresses in advance. However, general broadcasting + of recursive queries can result in excessive and + unnecessary load on both network and servers. 
+ + + + +Internet Engineering Task Force [Page 76] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + 6.1.3.3 Efficient Resource Usage + + The following requirements on servers and resolvers are very + important to the health of the Internet as a whole, + particularly when DNS services are invoked repeatedly by + higher level automatic servers, such as mailers. + + (1) The resolver MUST implement retransmission controls to + insure that it does not waste communication bandwidth, + and MUST impose finite bounds on the resources consumed + to respond to a single request. See [DNS:2] pages 43- + 44 for specific recommendations. + + (2) After a query has been retransmitted several times + without a response, an implementation MUST give up and + return a soft error to the application. + + (3) All DNS name servers and resolvers SHOULD cache + temporary failures, with a timeout period of the order + of minutes. + + DISCUSSION: + This will prevent applications that immediately + retry soft failures (in violation of Section 2.2 + of this document) from generating excessive DNS + traffic. + + (4) All DNS name servers and resolvers SHOULD cache + negative responses that indicate the specified name, or + data of the specified type, does not exist, as + described in [DNS:2]. + + (5) When a DNS server or resolver retries a UDP query, the + retry interval SHOULD be constrained by an exponential + backoff algorithm, and SHOULD also have upper and lower + bounds. + + IMPLEMENTATION: + A measured RTT and variance (if available) should + be used to calculate an initial retransmission + interval. If this information is not available, a + default of no less than 5 seconds should be used. + Implementations may limit the retransmission + interval, but this limit must exceed twice the + Internet maximum segment lifetime plus service + delay at the name server. 
+ + (6) When a resolver or server receives a Source Quench for + a query it has issued, it SHOULD take steps to reduce + the rate of querying that server in the near future. A + server MAY ignore a Source Quench that it receives as + the result of sending a response datagram. + + IMPLEMENTATION: + One recommended action to reduce the rate is to + send the next query attempt to an alternate + server, if there is one available. Another is to + back off the retry interval for the same server. + + + 6.1.3.4 Multihomed Hosts + + When the host name-to-address function encounters a host + with multiple addresses, it SHOULD rank or sort the + addresses using knowledge of the immediately connected + network number(s) and any other applicable performance or + history information. + + DISCUSSION: + The different addresses of a multihomed host generally + imply different Internet paths, and some paths may be + preferable to others in performance, reliability, or + administrative restrictions. There is no general way + for the domain system to determine the best path. A + recommended approach is to base this decision on local + configuration information set by the system + administrator. + + IMPLEMENTATION: + The following scheme has been used successfully: + + (a) Incorporate into the host configuration data a + Network-Preference List, which is simply a list of + networks in preferred order. This list may be + empty if there is no preference. + + (b) When a host name is mapped into a list of IP + addresses, these addresses should be sorted by + network number, into the same order as the + corresponding networks in the Network-Preference + List. IP addresses whose networks do not appear + in the Network-Preference List should be placed at + the end of the list.
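The Network-Preference List scheme in (a) and (b) above can be sketched as a stable sort. The function name is hypothetical, and the sketch assumes that networks are written as address prefixes (for example "10.0.0.0/8") rather than as the classful network numbers the text has in mind.

```python
import ipaddress


def sort_by_network_preference(addresses, preference_list):
    """Order addresses to follow the Network-Preference List.

    Addresses on networks that are not listed keep their relative order and
    are placed at the end. An empty list leaves the order unchanged.
    """
    networks = [ipaddress.ip_network(net) for net in preference_list]

    def rank(addr_text):
        addr = ipaddress.ip_address(addr_text)
        for position, network in enumerate(networks):
            if addr in network:
                return position
        return len(networks)  # unlisted networks sort last

    # sorted() is stable, so addresses with equal rank keep their input order.
    return sorted(addresses, key=rank)
```

For instance, with the list ["10.0.0.0/8", "192.0.2.0/24"], an address on 10.0.0.0/8 is tried before one on 192.0.2.0/24, and any other address comes last.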
+ + + + + + +Internet Engineering Task Force [Page 78] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + 6.1.3.5 Extensibility + + DNS software MUST support all well-known, class-independent + formats [DNS:2], and SHOULD be written to minimize the + trauma associated with the introduction of new well-known + types and local experimentation with non-standard types. + + DISCUSSION: + The data types and classes used by the DNS are + extensible, and thus new types will be added and old + types deleted or redefined. Introduction of new data + types ought to be dependent only upon the rules for + compression of domain names inside DNS messages, and + the translation between printable (i.e., master file) + and internal formats for Resource Records (RRs). + + Compression relies on knowledge of the format of data + inside a particular RR. Hence compression must only be + used for the contents of well-known, class-independent + RRs, and must never be used for class-specific RRs or + RR types that are not well-known. The owner name of an + RR is always eligible for compression. + + A name server may acquire, via zone transfer, RRs that + the server doesn't know how to convert to printable + format. A resolver can receive similar information as + the result of queries. For proper operation, this data + must be preserved, and hence the implication is that + DNS software cannot use textual formats for internal + storage. + + The DNS defines domain name syntax very generally -- a + string of labels each containing up to 63 8-bit octets, + separated by dots, and with a maximum total of 255 + octets. Particular applications of the DNS are + permitted to further constrain the syntax of the domain + names they use, although the DNS deployment has led to + some applications allowing more general names. In + particular, Section 2.1 of this document liberalizes + slightly the syntax of a legal Internet host name that + was defined in RFC-952 [DNS:4]. 
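The general DNS name syntax described above (labels of up to 63 octets, at most 255 octets in total) can be checked with a few lines of code. This is a sketch under one stated assumption: the 255-octet total is measured on the encoded form of RFC-1035, where each label carries one length octet and the name ends with a zero-length root label. The function name is hypothetical.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255


def check_domain_name(name: bytes) -> None:
    """Raise ValueError if a dotted name breaks the general DNS size limits.

    Only length limits are checked. Any 8-bit octet is accepted inside a
    label, because the DNS itself does not restrict label contents.
    """
    if name == b".":
        return  # the root name
    if name.endswith(b"."):
        name = name[:-1]  # a trailing dot only marks the name as complete
    wire_length = 1  # the terminating zero-length root label
    for label in name.split(b"."):
        if not label:
            raise ValueError("empty label")
        if len(label) > MAX_LABEL_OCTETS:
            raise ValueError("label longer than 63 octets")
        wire_length += 1 + len(label)  # one length octet plus the label
    if wire_length > MAX_NAME_OCTETS:
        raise ValueError("name longer than 255 octets")
```

An application that wants the narrower host name syntax would apply its own character rules on top of this check.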
+ + 6.1.3.6 Status of RR Types + + Name servers MUST be able to load all RR types except MD and + MF from configuration files. The MD and MF types are + obsolete and MUST NOT be implemented; in particular, name + servers MUST NOT load these types from configuration files. + + DISCUSSION: + The RR types MB, MG, MR, NULL, MINFO and RP are + considered experimental, and applications that use the + DNS cannot expect these RR types to be supported by + most domains. Furthermore, these types are subject to + redefinition. + + The TXT and WKS RR types have not been widely used by + Internet sites; as a result, an application cannot rely + on the existence of a TXT or WKS RR in most + domains. + + 6.1.3.7 Robustness + + DNS software may need to operate in environments where the + root servers or other servers are unavailable due to network + connectivity or other problems. In this situation, DNS name + servers and resolvers MUST continue to provide service for + the reachable part of the name space, while giving temporary + failures for the rest. + + DISCUSSION: + Although the DNS is meant to be used primarily in the + connected Internet, it should be possible to use the + system in networks which are unconnected to the + Internet. Hence implementations must not depend on + access to root servers before providing service for + local names. + + 6.1.3.8 Local Host Table + + DISCUSSION: + A host may use a local host table as a backup or + supplement to the DNS. This raises the question of + which takes precedence, the DNS or the host table; the + most flexible approach would make this a configuration + option. + + Typically, the contents of such a supplementary host + table will be determined locally by the site.
However, + a publicly available table of Internet hosts is + maintained by the DDN Network Information Center (DDN + NIC), with a format documented in [DNS:4]. This table + can be retrieved from the DDN NIC using a protocol + described in [DNS:5]. It must be noted that this table + contains only a small fraction of all Internet hosts. + Hosts using this protocol to retrieve the DDN NIC host + table should use the VERSION command to check whether the + table has changed before requesting the entire table + with the ALL command. The VERSION identifier should be + treated as an arbitrary string and tested only for + equality; no numerical sequence may be assumed. + + The DDN NIC host table includes administrative + information that is not needed for host operation and + is therefore not currently included in the DNS + database; examples include network and gateway entries. + However, much of this additional information will be + added to the DNS in the future. Conversely, the DNS + provides essential services (in particular, MX records) + that are not available from the DDN NIC host table. + + 6.1.4 DNS USER INTERFACE + + 6.1.4.1 DNS Administration + + This document is concerned with design and implementation + issues in host software, not with administrative or + operational issues. However, administrative issues are of + particular importance in the DNS, since errors in particular + segments of this large distributed database can cause poor + or erroneous performance for many sites. These issues are + discussed in [DNS:6] and [DNS:7]. + + 6.1.4.2 DNS User Interface + + Hosts MUST provide an interface to the DNS for all + application programs running on the host. This interface + will typically direct requests to a system process to + perform the resolver function [DNS:1, DNS:2].
+ + At a minimum, the basic interface MUST support a request for + all information of a specific type and class associated with + a specific name, and it MUST return either all of the + requested information, a hard error code, or a soft error + indication. When there is no error, the basic interface + returns the complete response information without + modification, deletion, or ordering, so that the basic + interface will not need to be changed to accommodate new + data types. + + DISCUSSION: + The soft error indication is an essential part of the + interface, since it may not always be possible to + access particular information from the DNS; see Section + 6.1.3.3. + + + +Internet Engineering Task Force [Page 81] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + A host MAY provide other DNS interfaces tailored to + particular functions, transforming the raw domain data into + formats more suited to these functions. In particular, a + host MUST provide a DNS interface to facilitate translation + between host addresses and host names. + + 6.1.4.3 Interface Abbreviation Facilities + + User interfaces MAY provide a method for users to enter + abbreviations for commonly-used names. Although the + definition of such methods is outside of the scope of the + DNS specification, certain rules are necessary to insure + that these methods allow access to the entire DNS name space + and to prevent excessive use of Internet resources. + + If an abbreviation method is provided, then: + + (a) There MUST be some convention for denoting that a name + is already complete, so that the abbreviation method(s) + are suppressed. A trailing dot is the usual method. + + (b) Abbreviation expansion MUST be done exactly once, and + MUST be done in the context in which the name was + entered. 
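Rules (a) and (b) above can be illustrated with a minimal Python sketch. It is not taken from the RFC: the function name `expand_abbreviation`, the `default_domain` parameter, and the `example.edu` names are assumptions made for illustration. The idea is that a trailing dot marks a name as complete, and that the expanded result is itself marked complete so a later pass cannot expand it a second time.

```python
def expand_abbreviation(name, default_domain):
    """Expand a user-entered name at most once (hypothetical helper).

    A trailing dot marks the name as already complete (rule (a)), so it is
    returned untouched. Otherwise the default domain is appended and the
    result is itself marked complete, so that a later pass cannot expand
    it again (rule (b)).
    """
    if name.endswith("."):
        return name
    return f"{name}.{default_domain.strip('.')}."
```

Under these assumptions, passing the output back through the function with a different default domain leaves it unchanged, which is the property a mail queue would rely on when it stores an expanded destination.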
+ + + DISCUSSION: + For example, if an abbreviation is used in a mail + program for a destination, the abbreviation should be + expanded into a full domain name and stored in the + queued message with an indication that it is already + complete. Otherwise, the abbreviation might be + expanded with a mail system search list, not the + user's, or a name could grow due to repeated + canonicalization attempts interacting with wildcards. + + The two most common abbreviation methods are: + + (1) Interface-level aliases + + Interface-level aliases are conceptually implemented as + a list of alias/domain name pairs. The list can be + per-user or per-host, and separate lists can be + associated with different functions, e.g. one list for + host name-to-address translation, and a different list + for mail domains. When the user enters a name, the + interface attempts to match the name to the alias + component of a list entry, and if a matching entry can + + + +Internet Engineering Task Force [Page 82] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + be found, the name is replaced by the domain name found + in the pair. + + Note that interface-level aliases and CNAMEs are + completely separate mechanisms; interface-level aliases + are a local matter while CNAMEs are an Internet-wide + aliasing mechanism which is a required part of any DNS + implementation. + + (2) Search Lists + + A search list is conceptually implemented as an ordered + list of domain names. When the user enters a name, the + domain names in the search list are used as suffixes to + the user-supplied name, one by one, until a domain name + with the desired associated data is found, or the + search list is exhausted. Search lists often contain + the name of the local host's parent domain or other + ancestor domains. Search lists are often per-user or + per-process. + + It SHOULD be possible for an administrator to disable a + DNS search-list facility.
Administrative denial may be + warranted in some cases, to prevent abuse of the DNS. + + There is danger that a search-list mechanism will + generate excessive queries to the root servers while + testing whether user input is a complete domain name, + lacking a final period to mark it as complete. A + search-list mechanism MUST have one of, and SHOULD have + both of, the following two provisions to prevent this: + + (a) The local resolver/name server can implement + caching of negative responses (see Section + 6.1.3.3). + + (b) The search list expander can require two or more + interior dots in a generated domain name before it + tries using the name in a query to non-local + domain servers, such as the root. + + DISCUSSION: + The intent of this requirement is to avoid + excessive delay for the user as the search list is + tested, and more importantly to prevent excessive + traffic to the root and other high-level servers. + For example, if the user supplied a name "X" and + the search list contained the root as a component, + + + +Internet Engineering Task Force [Page 83] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + a query would have to consult a root server before + the next search list alternative could be tried. + The resulting load seen by the root servers and + gateways near the root would be multiplied by the + number of hosts in the Internet. + + The negative caching alternative limits the effect + to the first time a name is used. The interior + dot rule is simpler to implement but can prevent + easy use of some top-level names. 
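The two provisions discussed above can be sketched in Python as follows. This is an illustration under stated assumptions, not an implementation from the RFC: the names `interior_dots`, `search_candidates`, and `resolve` are hypothetical, `lookup` is a caller-supplied function assumed to return `None` only for a definite "no such name" answer, and the negative cache is a plain set with no timeout (a real cache would expire entries and would not record soft errors).

```python
def interior_dots(fqdn):
    """Count dots inside a name, ignoring the trailing dot that marks it complete."""
    return fqdn.rstrip(".").count(".")


def search_candidates(name, search_list, min_interior_dots=2):
    """Return the names a search-list expander would be willing to query.

    A name with a trailing dot is already complete and is used as-is.
    Otherwise each search-list suffix is appended in order, and generated
    names with too few interior dots are dropped (provision (b)).
    """
    if name.endswith("."):
        return [name]
    candidates = []
    for suffix in search_list:
        suffix = suffix.strip(".")
        # An empty suffix stands for the root as a search-list component.
        fqdn = f"{name}.{suffix}." if suffix else f"{name}."
        if interior_dots(fqdn) >= min_interior_dots:
            candidates.append(fqdn)
    return candidates


def resolve(name, search_list, lookup, negative_cache):
    """Try candidates in order, skipping names already known not to exist (provision (a))."""
    for fqdn in search_candidates(name, search_list):
        if fqdn in negative_cache:
            continue
        answer = lookup(fqdn)
        if answer is not None:
            return fqdn, answer
        negative_cache.add(fqdn)
    return None, None
```

With a search list of `["cs.example.edu", "example.edu", ""]` and the input `x`, the sketch generates `x.cs.example.edu.` and `x.example.edu.` but never `x.`, so the root is not consulted for the bare name. The same filter is why the interior dot rule "can prevent easy use of some top-level names": a user who really wants a top-level name has to type the trailing dot.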
+ + + 6.1.5 DOMAIN NAME SYSTEM REQUIREMENTS SUMMARY + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-----------------------------------------------|-----------|-|-|-|-|-|-- +GENERAL ISSUES | | | | | | | + | | | | | | | +Implement DNS name-to-address conversion |6.1.1 |x| | | | | +Implement DNS address-to-name conversion |6.1.1 |x| | | | | +Support conversions using host table |6.1.1 | | |x| | | +Properly handle RR with zero TTL |6.1.2.1 |x| | | | | +Use QCLASS=* unnecessarily |6.1.2.2 | |x| | | | + Use QCLASS=IN for Internet class |6.1.2.2 |x| | | | | +Unused fields zero |6.1.2.3 |x| | | | | +Use compression in responses |6.1.2.4 |x| | | | | + | | | | | | | +Include config info in responses |6.1.2.5 | | | | |x| +Support all well-known, class-indep. types |6.1.3.5 |x| | | | | +Easily expand type list |6.1.3.5 | |x| | | | +Load all RR types (except MD and MF) |6.1.3.6 |x| | | | | +Load MD or MF type |6.1.3.6 | | | | |x| +Operate when root servers, etc. 
unavailable |6.1.3.7 |x| | | | | +-----------------------------------------------|-----------|-|-|-|-|-|-- +RESOLVER ISSUES: | | | | | | | + | | | | | | | +Resolver support multiple concurrent requests |6.1.3.1 | |x| | | | +Full-service resolver: |6.1.3.1 | | |x| | | + Local caching |6.1.3.1 |x| | | | | + + + +Internet Engineering Task Force [Page 84] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + Information in local cache times out |6.1.3.1 |x| | | | | + Configurable with starting info |6.1.3.1 | |x| | | | +Stub resolver: |6.1.3.1 | | |x| | | + Use redundant recursive name servers |6.1.3.1 |x| | | | | + Local caching |6.1.3.1 | | |x| | | + Information in local cache times out |6.1.3.1 |x| | | | | +Support for remote multi-homed hosts: | | | | | | | + Sort multiple addresses by preference list |6.1.3.4 | |x| | | | + | | | | | | | +-----------------------------------------------|-----------|-|-|-|-|-|-- +TRANSPORT PROTOCOLS: | | | | | | | + | | | | | | | +Support UDP queries |6.1.3.2 |x| | | | | +Support TCP queries |6.1.3.2 | |x| | | | + Send query using UDP first |6.1.3.2 |x| | | | |1 + Try TCP if UDP answers are truncated |6.1.3.2 | |x| | | | +Name server limit TCP query resources |6.1.3.2 | | |x| | | + Punish unnecessary TCP query |6.1.3.2 | | | |x| | +Use truncated data as if it were not |6.1.3.2 | | | | |x| +Private agreement to use only TCP |6.1.3.2 | | |x| | | +Use TCP for zone transfers |6.1.3.2 |x| | | | | +TCP usage not block UDP queries |6.1.3.2 |x| | | | | +Support broadcast or multicast queries |6.1.3.2 | | |x| | | + RD bit set in query |6.1.3.2 | | | | |x| + RD bit ignored by server is b'cast/m'cast |6.1.3.2 |x| | | | | + Send only as occasional probe for addr's |6.1.3.2 | |x| | | | +-----------------------------------------------|-----------|-|-|-|-|-|-- +RESOURCE USAGE: | | | | | | | + | | | | | | | +Transmission controls, per [DNS:2] |6.1.3.3 |x| | | | | + Finite bounds per request |6.1.3.3 |x| | | | | +Failure after retries => 
soft error |6.1.3.3 |x| | | | | +Cache temporary failures |6.1.3.3 | |x| | | | +Cache negative responses |6.1.3.3 | |x| | | | +Retries use exponential backoff |6.1.3.3 | |x| | | | + Upper, lower bounds |6.1.3.3 | |x| | | | +Client handle Source Quench |6.1.3.3 | |x| | | | +Server ignore Source Quench |6.1.3.3 | | |x| | | +-----------------------------------------------|-----------|-|-|-|-|-|-- +USER INTERFACE: | | | | | | | + | | | | | | | +All programs have access to DNS interface |6.1.4.2 |x| | | | | +Able to request all info for given name |6.1.4.2 |x| | | | | +Returns complete info or error |6.1.4.2 |x| | | | | +Special interfaces |6.1.4.2 | | |x| | | + Name<->Address translation |6.1.4.2 |x| | | | | + | | | | | | | +Abbreviation Facilities: |6.1.4.3 | | |x| | | + + + +Internet Engineering Task Force [Page 85] + + + + +RFC1123 SUPPORT SERVICES -- DOMAINS October 1989 + + + Convention for complete names |6.1.4.3 |x| | | | | + Conversion exactly once |6.1.4.3 |x| | | | | + Conversion in proper context |6.1.4.3 |x| | | | | + Search list: |6.1.4.3 | | |x| | | + Administrator can disable |6.1.4.3 | |x| | | | + Prevention of excessive root queries |6.1.4.3 |x| | | | | + Both methods |6.1.4.3 | |x| | | | +-----------------------------------------------|-----------|-|-|-|-|-|-- +-----------------------------------------------|-----------|-|-|-|-|-|-- + +1. Unless there is private agreement between particular resolver and + particular server. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 86] + + + + +RFC1123 SUPPORT SERVICES -- INITIALIZATION October 1989 + + + 6.2 HOST INITIALIZATION + + 6.2.1 INTRODUCTION + + This section discusses the initialization of host software + across a connected network, or more generally across an + Internet path. This is necessary for a diskless host, and may + optionally be used for a host with disk drives. 
For a diskless + host, the initialization process is called "network booting" + and is controlled by a bootstrap program located in a boot ROM. + + To initialize a diskless host across the network, there are two + distinct phases: + + (1) Configure the IP layer. + + Diskless machines often have no permanent storage in which + to store network configuration information, so that + sufficient configuration information must be obtained + dynamically to support the loading phase that follows. + This information must include at least the IP addresses of + the host and of the boot server. To support booting + across a gateway, the address mask and a list of default + gateways are also required. + + (2) Load the host system code. + + During the loading phase, an appropriate file transfer + protocol is used to copy the system code across the + network from the boot server. + + A host with a disk may perform the first step, dynamic + configuration. This is important for microcomputers, whose + floppy disks allow network configuration information to be + mistakenly duplicated on more than one host. Also, + installation of new hosts is much simpler if they automatically + obtain their configuration information from a central server, + saving administrator time and decreasing the probability of + mistakes. + + 6.2.2 REQUIREMENTS + + 6.2.2.1 Dynamic Configuration + + A number of protocol provisions have been made for dynamic + configuration. + + o ICMP Information Request/Reply messages + + + +Internet Engineering Task Force [Page 87] + + + + +RFC1123 SUPPORT SERVICES -- INITIALIZATION October 1989 + + + This obsolete message pair was designed to allow a host + to find the number of the network it is on. + Unfortunately, it was useful only if the host already + knew the host number part of its IP address, + information that hosts requiring dynamic configuration + seldom had. 
+ + o Reverse Address Resolution Protocol (RARP) [BOOT:4] + + RARP is a link-layer protocol for a broadcast medium + that allows a host to find its IP address given its + link layer address. Unfortunately, RARP does not work + across IP gateways and therefore requires a RARP server + on every network. In addition, RARP does not provide + any other configuration information. + + o ICMP Address Mask Request/Reply messages + + These ICMP messages allow a host to learn the address + mask for a particular network interface. + + o BOOTP Protocol [BOOT:2] + + This protocol allows a host to determine the IP + addresses of the local host and the boot server, the + name of an appropriate boot file, and optionally the + address mask and list of default gateways. To locate a + BOOTP server, the host broadcasts a BOOTP request using + UDP. Ad hoc gateway extensions have been used to + transmit the BOOTP broadcast through gateways, and in + the future the IP Multicasting facility will provide a + standard mechanism for this purpose. + + + The suggested approach to dynamic configuration is to use + the BOOTP protocol with the extensions defined in "BOOTP + Vendor Information Extensions" RFC-1084 [BOOT:3]. RFC-1084 + defines some important general (not vendor-specific) + extensions. In particular, these extensions allow the + address mask to be supplied in BOOTP; we RECOMMEND that the + address mask be supplied in this manner. + + DISCUSSION: + Historically, subnetting was defined long after IP, and + so a separate mechanism (ICMP Address Mask messages) + was designed to supply the address mask to a host. + However, the IP address mask and the corresponding IP + address conceptually form a pair, and for operational + + + +Internet Engineering Task Force [Page 88] + + + + +RFC1123 SUPPORT SERVICES -- INITIALIZATION October 1989 + + + simplicity they ought to be defined at the same time + and by the same mechanism, whether a configuration file + or a dynamic mechanism like BOOTP. 
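To make concrete why the address and its mask are treated as a pair, here is a small Python sketch using the standard `ipaddress` module. The function name `is_on_link` and the documentation-range addresses are assumptions for illustration only; the point is that a host cannot decide whether a destination (for example, a boot server) is reached directly or through a default gateway unless it has both values.

```python
import ipaddress


def is_on_link(address, mask, destination):
    """Return True if destination lies on the network defined by address and mask."""
    # The address and mask are combined into one interface object; neither
    # value alone is enough to derive the connected network.
    interface = ipaddress.ip_interface(f"{address}/{mask}")
    return ipaddress.ip_address(destination) in interface.network
```

A host configured with `192.0.2.10` and mask `255.255.255.0` would treat `192.0.2.77` as directly reachable and would need a default gateway for `198.51.100.1`.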
+ + Note that BOOTP is not sufficiently general to specify + the configurations of all interfaces of a multihomed + host. A multihomed host must either use BOOTP + separately for each interface, or configure one + interface using BOOTP to perform the loading, and + perform the complete initialization from a file later. + + Application layer configuration information is expected + to be obtained from files after loading of the system + code. + + 6.2.2.2 Loading Phase + + A suggested approach for the loading phase is to use TFTP + [BOOT:1] between the IP addresses established by BOOTP. + + TFTP to a broadcast address SHOULD NOT be used, for reasons + explained in Section 4.2.3.4. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 89] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + + 6.3 REMOTE MANAGEMENT + + 6.3.1 INTRODUCTION + + The Internet community has recently put considerable effort + into the development of network management protocols. The + result has been a two-pronged approach [MGT:1, MGT:6]: the + Simple Network Management Protocol (SNMP) [MGT:4] and the + Common Management Information Protocol over TCP (CMOT) [MGT:5]. + + In order to be managed using SNMP or CMOT, a host will need to + implement an appropriate management agent. An Internet host + SHOULD include an agent for either SNMP or CMOT. + + Both SNMP and CMOT operate on a Management Information Base + (MIB) that defines a collection of management values. By + reading and setting these values, a remote application may + query and change the state of the managed system. + + A standard MIB [MGT:3] has been defined for use by both + management protocols, using data types defined by the Structure + of Management Information (SMI) defined in [MGT:2]. Additional + MIB variables can be introduced under the "enterprises" and + "experimental" subtrees of the MIB naming space [MGT:2]. 
+ + Every protocol module in the host SHOULD implement the relevant + MIB variables. A host SHOULD implement the MIB variables as + defined in the most recent standard MIB, and MAY implement + other MIB variables when appropriate and useful. + + 6.3.2 PROTOCOL WALK-THROUGH + + The MIB is intended to cover both hosts and gateways, although + there may be detailed differences in MIB application to the two + cases. This section contains the appropriate interpretation of + the MIB for hosts. It is likely that later versions of the MIB + will include more entries for host management. + + A managed host must implement the following groups of MIB + object definitions: System, Interfaces, Address Translation, + IP, ICMP, TCP, and UDP. + + The following specific interpretations apply to hosts: + + o ipInHdrErrors + + Note that the error "time-to-live exceeded" can occur in a + host only when it is forwarding a source-routed datagram. + + + +Internet Engineering Task Force [Page 90] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + + o ipOutNoRoutes + + This object counts datagrams discarded because no route + can be found. This may happen in a host if all the + default gateways in the host's configuration are down. + + o ipFragOKs, ipFragFails, ipFragCreates + + A host that does not implement intentional fragmentation + (see "Fragmentation" section of [INTRO:1]) MUST return the + value zero for these three objects. + + o icmpOutRedirects + + For a host, this object MUST always be zero, since hosts + do not send Redirects. + + o icmpOutAddrMaskReps + + For a host, this object MUST always be zero, unless the + host is an authoritative source of address mask + information. + + o ipAddrTable + + For a host, the "IP Address Table" object is effectively a + table of logical interfaces. 
+ + o ipRoutingTable + + For a host, the "IP Routing Table" object is effectively a + combination of the host's Routing Cache and the static + route table described in "Routing Outbound Datagrams" + section of [INTRO:1]. + + Within each ipRouteEntry, ipRouteMetric1...4 normally will + have no meaning for a host and SHOULD always be -1, while + ipRouteType will normally have the value "remote". + + If destinations on the connected network do not appear in + the Route Cache (see "Routing Outbound Datagrams section + of [INTRO:1]), there will be no entries with ipRouteType + of "direct". + + + DISCUSSION: + The current MIB does not include Type-of-Service in an + ipRouteEntry, but a future revision is expected to make + + + +Internet Engineering Task Force [Page 91] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + + this addition. + + We also expect the MIB to be expanded to allow the remote + management of applications (e.g., the ability to partially + reconfigure mail systems). Network service applications + such as mail systems should therefore be written with the + "hooks" for remote management. + + 6.3.3 MANAGEMENT REQUIREMENTS SUMMARY + + | | | | |S| | + | | | | |H| |F + | | | | |O|M|o + | | |S| |U|U|o + | | |H| |L|S|t + | |M|O| |D|T|n + | |U|U|M| | |o + | |S|L|A|N|N|t + | |T|D|Y|O|O|t +FEATURE |SECTION | | | |T|T|e +-----------------------------------------------|-----------|-|-|-|-|-|-- +Support SNMP or CMOT agent |6.3.1 | |x| | | | +Implement specified objects in standard MIB |6.3.1 | |x| | | | + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 92] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + +7. REFERENCES + + This section lists the primary references with which every + implementer must be thoroughly familiar. It also lists some + secondary references that are suggested additional reading. 
+ + INTRODUCTORY REFERENCES: + + + [INTRO:1] "Requirements for Internet Hosts -- Communication Layers," + IETF Host Requirements Working Group, R. Braden, Ed., RFC-1122, + October 1989. + + [INTRO:2] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006, + (three volumes), SRI International, December 1985. + + [INTRO:3] "Official Internet Protocols," J. Reynolds and J. Postel, + RFC-1011, May 1987. + + This document is republished periodically with new RFC numbers; + the latest version must be used. + + [INTRO:4] "Protocol Document Order Information," O. Jacobsen and J. + Postel, RFC-980, March 1986. + + [INTRO:5] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, + May 1987. + + This document is republished periodically with new RFC numbers; + the latest version must be used. + + + TELNET REFERENCES: + + + [TELNET:1] "Telnet Protocol Specification," J. Postel and J. + Reynolds, RFC-854, May 1983. + + [TELNET:2] "Telnet Option Specification," J. Postel and J. Reynolds, + RFC-855, May 1983. + + [TELNET:3] "Telnet Binary Transmission," J. Postel and J. Reynolds, + RFC-856, May 1983. + + [TELNET:4] "Telnet Echo Option," J. Postel and J. Reynolds, RFC-857, + May 1983. + + [TELNET:5] "Telnet Suppress Go Ahead Option," J. Postel and J. + + + +Internet Engineering Task Force [Page 93] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + + Reynolds, RFC-858, May 1983. + + [TELNET:6] "Telnet Status Option," J. Postel and J. Reynolds, RFC- + 859, May 1983. + + [TELNET:7] "Telnet Timing Mark Option," J. Postel and J. Reynolds, + RFC-860, May 1983. + + [TELNET:8] "Telnet Extended Options List," J. Postel and J. + Reynolds, RFC-861, May 1983. + + [TELNET:9] "Telnet End-Of-Record Option," J. Postel, RFC-885, + December 1983. + + [TELNET:10] "Telnet Terminal-Type Option," J. VanBokkelen, RFC-1091, + February 1989. + + This document supersedes RFC-930. + + [TELNET:11] "Telnet Window Size Option," D. Waitzman, RFC-1073, + October 1988.
+ + [TELNET:12] "Telnet Linemode Option," D. Borman, RFC-1116, August + 1989. + + [TELNET:13] "Telnet Terminal Speed Option," C. Hedrick, RFC-1079, + December 1988. + + [TELNET:14] "Telnet Remote Flow Control Option," C. Hedrick, RFC- + 1080, November 1988. + + + SECONDARY TELNET REFERENCES: + + + [TELNET:15] "Telnet Protocol," MIL-STD-1782, U.S. Department of + Defense, May 1984. + + This document is intended to describe the same protocol as RFC- + 854. In case of conflict, RFC-854 takes precedence, and the + present document takes precedence over both. + + [TELNET:16] "SUPDUP Protocol," M. Crispin, RFC-734, October 1977. + + [TELNET:17] "Telnet SUPDUP Option," M. Crispin, RFC-736, October + 1977. + + [TELNET:18] "Data Entry Terminal Option," J. Day, RFC-732, June 1977. + + + +Internet Engineering Task Force [Page 94] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + + [TELNET:19] "TELNET Data Entry Terminal option -- DODIIS + Implementation," A. Yasuda and T. Thompson, RFC-1043, February + 1988. + + + FTP REFERENCES: + + + [FTP:1] "File Transfer Protocol," J. Postel and J. Reynolds, RFC- + 959, October 1985. + + [FTP:2] "Document File Format Standards," J. Postel, RFC-678, + December 1974. + + [FTP:3] "File Transfer Protocol," MIL-STD-1780, U.S. Department of + Defense, May 1984. + + This document is based on an earlier version of the FTP + specification (RFC-765) and is obsolete. + + + TFTP REFERENCES: + + + [TFTP:1] "The TFTP Protocol Revision 2," K. Sollins, RFC-783, June + 1981. + + + MAIL REFERENCES: + + + [SMTP:1] "Simple Mail Transfer Protocol," J. Postel, RFC-821, August + 1982. + + [SMTP:2] "Standard For The Format of ARPA Internet Text Messages," + D. Crocker, RFC-822, August 1982. + + This document obsoleted an earlier specification, RFC-733. + + [SMTP:3] "Mail Routing and the Domain System," C. Partridge, RFC- + 974, January 1986. + + This RFC describes the use of MX records, a mandatory extension + to the mail delivery process. 
+ + [SMTP:4] "Duplicate Messages and SMTP," C. Partridge, RFC-1047, + February 1988. + + + + +Internet Engineering Task Force [Page 95] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + + [SMTP:5a] "Mapping between X.400 and RFC 822," S. Kille, RFC-987, + June 1986. + + [SMTP:5b] "Addendum to RFC-987," S. Kille, RFC-???, September 1987. + + The two preceding RFC's define a proposed standard for + gatewaying mail between the Internet and the X.400 environments. + + [SMTP:6] "Simple Mail Transfer Protocol," MIL-STD-1781, U.S. + Department of Defense, May 1984. + + This specification is intended to describe the same protocol as + does RFC-821. However, MIL-STD-1781 is incomplete; in + particular, it does not include MX records [SMTP:3]. + + [SMTP:7] "A Content-Type Field for Internet Messages," M. Sirbu, + RFC-1049, March 1988. + + + DOMAIN NAME SYSTEM REFERENCES: + + + [DNS:1] "Domain Names - Concepts and Facilities," P. Mockapetris, + RFC-1034, November 1987. + + This document and the following one obsolete RFC-882, RFC-883, + and RFC-973. + + [DNS:2] "Domain Names - Implementation and Specification," RFC-1035, + P. Mockapetris, November 1987. + + + [DNS:3] "Mail Routing and the Domain System," C. Partridge, RFC-974, + January 1986. + + + [DNS:4] "DoD Internet Host Table Specification," K. Harrenstein, + RFC-952, M. Stahl, E. Feinler, October 1985. + + SECONDARY DNS REFERENCES: + + + [DNS:5] "Hostname Server," K. Harrenstein, M. Stahl, E. Feinler, + RFC-953, October 1985. + + [DNS:6] "Domain Administrators Guide," M. Stahl, RFC-1032, November + 1987. + + + + +Internet Engineering Task Force [Page 96] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + + [DNS:7] "Domain Administrators Operations Guide," M. Lottor, RFC- + 1033, November 1987. + + [DNS:8] "The Domain Name System Handbook," Vol. 4 of Internet + Protocol Handbook, NIC 50007, SRI Network Information Center, + August 1989. 
+ + + SYSTEM INITIALIZATION REFERENCES: + + + [BOOT:1] "Bootstrap Loading Using TFTP," R. Finlayson, RFC-906, June + 1984. + + [BOOT:2] "Bootstrap Protocol (BOOTP)," W. Croft and J. Gilmore, RFC- + 951, September 1985. + + [BOOT:3] "BOOTP Vendor Information Extensions," J. Reynolds, RFC- + 1084, December 1988. + + Note: this RFC revised and obsoleted RFC-1048. + + [BOOT:4] "A Reverse Address Resolution Protocol," R. Finlayson, T. + Mann, J. Mogul, and M. Theimer, RFC-903, June 1984. + + + MANAGEMENT REFERENCES: + + + [MGT:1] "IAB Recommendations for the Development of Internet Network + Management Standards," V. Cerf, RFC-1052, April 1988. + + [MGT:2] "Structure and Identification of Management Information for + TCP/IP-based internets," M. Rose and K. McCloghrie, RFC-1065, + August 1988. + + [MGT:3] "Management Information Base for Network Management of + TCP/IP-based internets," M. Rose and K. McCloghrie, RFC-1066, + August 1988. + + [MGT:4] "A Simple Network Management Protocol," J. Case, M. Fedor, + M. Schoffstall, and C. Davin, RFC-1098, April 1989. + + [MGT:5] "The Common Management Information Services and Protocol + over TCP/IP," U. Warrier and L. Besaw, RFC-1095, April 1989. + + [MGT:6] "Report of the Second Ad Hoc Network Management Review + Group," V. Cerf, RFC-1109, August 1989. + + + +Internet Engineering Task Force [Page 97] + + + + +RFC1123 SUPPORT SERVICES -- MANAGEMENT October 1989 + + +Security Considerations + + There are many security issues in the application and support + programs of host software, but a full discussion is beyond the scope + of this RFC. Security-related issues are mentioned in sections + concerning TFTP (Sections 4.2.1, 4.2.3.4, 4.2.3.5), the SMTP VRFY and + EXPN commands (Section 5.2.3), the SMTP HELO command (5.2.5), and the + SMTP DATA command (Section 5.2.8). 
+ +Author's Address + + Robert Braden + USC/Information Sciences Institute + 4676 Admiralty Way + Marina del Rey, CA 90292-6695 + + Phone: (213) 822 1511 + + EMail: Braden@ISI.EDU + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Engineering Task Force [Page 98] + diff --git a/doc/rfc/rfc1183.txt b/doc/rfc/rfc1183.txt new file mode 100644 index 00000000..6f080448 --- /dev/null +++ b/doc/rfc/rfc1183.txt @@ -0,0 +1,619 @@ + + + + + + +Network Working Group C. Everhart +Request for Comments: 1183 Transarc +Updates: RFCs 1034, 1035 L. Mamakos + University of Maryland + R. Ullmann + Prime Computer + P. Mockapetris, Editor + ISI + October 1990 + + + New DNS RR Definitions + +Status of this Memo + + This memo defines five new DNS types for experimental purposes. This + RFC describes an Experimental Protocol for the Internet community, + and requests discussion and suggestions for improvements. + Distribution of this memo is unlimited. + +Table of Contents + + Introduction.................................................... 1 + 1. AFS Data Base location....................................... 2 + 2. Responsible Person........................................... 3 + 2.1. Identification of the guilty party......................... 3 + 2.2. The Responsible Person RR.................................. 4 + 3. X.25 and ISDN addresses, Route Binding....................... 6 + 3.1. The X25 RR................................................. 6 + 3.2. The ISDN RR................................................ 7 + 3.3. The Route Through RR....................................... 8 + REFERENCES and BIBLIOGRAPHY..................................... 9 + Security Considerations......................................... 10 + Authors' Addresses.............................................. 
11 + +Introduction + + This RFC defines the format of new Resource Records (RRs) for the + Domain Name System (DNS), and reserves corresponding DNS type + mnemonics and numerical codes. The definitions are in three + independent sections: (1) location of AFS database servers, (2) + location of responsible persons, and (3) representation of X.25 and + ISDN addresses and route binding. All are experimental. + + This RFC assumes that the reader is familiar with the DNS [3,4]. The + data shown is for pedagogical use and does not necessarily reflect + the real Internet. + + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 1] + +RFC 1183 New DNS RR Definitions October 1990 + + +1. AFS Data Base location + + This section defines an extension of the DNS to locate servers both + for AFS (AFS is a registered trademark of Transarc Corporation) and + for the Open Software Foundation's (OSF) Distributed Computing + Environment (DCE) authenticated naming system using HP/Apollo's NCA, + both to be components of the OSF DCE. The discussion assumes that + the reader is familiar with AFS [5] and NCA [6]. + + The AFS (originally the Andrew File System) system uses the DNS to + map from a domain name to the name of an AFS cell database server. + The DCE Naming service uses the DNS for a similar function: mapping + from the domain name of a cell to authenticated name servers for that + cell. The method uses a new RR type with mnemonic AFSDB and type + code of 18 (decimal). + + AFSDB has the following format: + + <owner> <ttl> <class> AFSDB <subtype> <hostname> + + Both RDATA fields are required in all AFSDB RRs. The <subtype> field + is a 16 bit integer. The <hostname> field is a domain name of a host + that has a server for the cell named by the owner name of the RR. + + The format of the AFSDB RR is class insensitive. AFSDB records cause + type A additional section processing for <hostname>. 
This, in fact, + is the rationale for using a new type code, rather than trying to + build the same functionality with TXT RRs. + + Note that the format of AFSDB in a master file is identical to MX. + For purposes of the DNS itself, the subtype is merely an integer. + The present subtype semantics are discussed below, but changes are + possible and will be announced in subsequent RFCs. + + In the case of subtype 1, the host has an AFS version 3.0 Volume + Location Server for the named AFS cell. In the case of subtype 2, + the host has an authenticated name server holding the cell-root + directory node for the named DCE/NCA cell. + + The use of subtypes is motivated by two considerations. First, the + space of DNS RR types is limited. Second, the services provided are + sufficiently distinct that it would continue to be confusing for a + client to attempt to connect to a cell's servers using the protocol + for one service, if the cell offered only the other service. + + As an example of the use of this RR, suppose that the Toaster + Corporation has deployed AFS 3.0 but not (yet) the OSF's DCE. Their + cell, named toaster.com, has three "AFS 3.0 cell database server" + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 2] + +RFC 1183 New DNS RR Definitions October 1990 + + + machines: bigbird.toaster.com, ernie.toaster.com, and + henson.toaster.com. These three machines would be listed in three + AFSDB RRs. These might appear in a master file as: + + toaster.com. AFSDB 1 bigbird.toaster.com. + toaster.com. AFSDB 1 ernie.toaster.com. + toaster.com. AFSDB 1 henson.toaster.com. + + As another example use of this RR, suppose that Femto College (domain + name femto.edu) has deployed DCE, and that their DCE cell root + directory is served by processes running on green.femto.edu and + turquoise.femto.edu. Furthermore, their DCE file servers also run + AFS 3.0-compatible volume location servers, on the hosts + turquoise.femto.edu and orange.femto.edu. 
These machines would be + listed in four AFSDB RRs, which might appear in a master file as: + + femto.edu. AFSDB 2 green.femto.edu. + femto.edu. AFSDB 2 turquoise.femto.edu. + femto.edu. AFSDB 1 turquoise.femto.edu. + femto.edu. AFSDB 1 orange.femto.edu. + +2. Responsible Person + + The purpose of this section is to provide a standard method for + associating responsible person identification to any name in the DNS. + + The domain name system functions as a distributed database which + contains many different forms of information. For a particular name + or host, you can discover its Internet address, mail forwarding + information, hardware type and operating system among others. + + A key aspect of the DNS is that the tree-structured namespace can be + divided into pieces, called zones, for purposes of distributing + control and responsibility. The responsible person for zone database + purposes is named in the SOA RR for that zone. This section + describes an extension which allows different responsible persons to + be specified for different names in a zone. + +2.1. Identification of the guilty party + + Often it is desirable to be able to identify the responsible entity + for a particular host. When that host is down or malfunctioning, it + is difficult to contact those parties which might resolve or repair + the host. Mail sent to POSTMASTER may not reach the person in a + timely fashion. If the host is one of a multitude of workstations, + there may be no responsible person who can be contacted on that + host. 
+ + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 3] + +RFC 1183 New DNS RR Definitions October 1990 + + + The POSTMASTER mailbox on that host continues to be a good contact + point for mail problems, and the zone contact in the SOA record for + database problem, but the RP record allows us to associate a mailbox + to entities that don't receive mail or are not directly connected + (namespace-wise) to the problem (e.g., GATEWAY.ISI.EDU might want to + point at HOTLINE@BBN.COM, and GATEWAY doesn't get mail, nor does the + ISI zone administrator have a clue about fixing gateways). + +2.2. The Responsible Person RR + + The method uses a new RR type with mnemonic RP and type code of 17 + (decimal). + + RP has the following format: + + <owner> <ttl> <class> RP <mbox-dname> <txt-dname> + + Both RDATA fields are required in all RP RRs. + + The first field, <mbox-dname>, is a domain name that specifies the + mailbox for the responsible person. Its format in master files uses + the DNS convention for mailbox encoding, identical to that used for + the RNAME mailbox field in the SOA RR. The root domain name (just + ".") may be specified for <mbox-dname> to indicate that no mailbox is + available. + + The second field, <txt-dname>, is a domain name for which TXT RR's + exist. A subsequent query can be performed to retrieve the + associated TXT resource records at <txt-dname>. This provides a + level of indirection so that the entity can be referred to from + multiple places in the DNS. The root domain name (just ".") may be + specified for <txt-dname> to indicate that the TXT_DNAME is absent, + and no associated TXT RR exists. + + The format of the RP RR is class insensitive. RP records cause no + additional section processing. (TXT additional section processing + for <txt-dname> is allowed as an option, but only if it is disabled + for the root, i.e., "."). 
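The mailbox convention described above for <mbox-dname> can be sketched in a few lines of Python. This is an illustrative sketch only: the function name mbox_dname_to_address is hypothetical, and it assumes the master-file form in which the first label is the local part and a backslash escapes a literal dot inside that label.

```python
def mbox_dname_to_address(mbox_dname):
    """Turn an RP <mbox-dname> (master-file text) into user@domain.

    Returns None for the root name ".", which the RP RR uses to say
    that no mailbox is available.
    """
    if mbox_dname == ".":
        return None
    # Drop the trailing root dot of a fully qualified name.
    name = mbox_dname[:-1] if mbox_dname.endswith(".") else mbox_dname
    local = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            # Escaped character (typically "\."): keep it in the local part.
            local.append(name[i + 1])
            i += 2
        elif ch == ".":
            # First unescaped dot ends the local part.
            break
        else:
            local.append(ch)
            i += 1
    domain = name[i + 1:]
    return "".join(local) + "@" + domain
```

For instance, the name louie.trantor.umd.edu. used in the examples of this section maps to louie@trantor.umd.edu, and "." maps to no mailbox at all.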
+ + The Responsible Person RR can be associated with any node in the + Domain Name System hierarchy, not just at the leaves of the tree. + + The TXT RR associated with the TXT_DNAME contains free format text + suitable for humans. Refer to [4] for more details on the TXT RR. + + Multiple RP records at a single name may be present in the database. + They should have identical TTLs. + + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 4] + +RFC 1183 New DNS RR Definitions October 1990 + + + EXAMPLES + + Some examples of how the RP record might be used follow. + + sayshell.umd.edu. A 128.8.1.14 + MX 10 sayshell.umd.edu. + HINFO NeXT UNIX + WKS 128.8.1.14 tcp ftp telnet smtp + RP louie.trantor.umd.edu. LAM1.people.umd.edu. + + LAM1.people.umd.edu. TXT ( + "Louis A. Mamakos, (301) 454-2946, don't call me at home!" ) + + In this example, the responsible person's mailbox for the host + SAYSHELL.UMD.EDU is louie@trantor.umd.edu. The TXT RR at + LAM1.people.umd.edu provides additional information and advice. + + TERP.UMD.EDU. A 128.8.10.90 + MX 10 128.8.10.90 + HINFO MICROVAX-II UNIX + WKS 128.8.10.90 udp domain + WKS 128.8.10.90 tcp ftp telnet smtp domain + RP louie.trantor.umd.edu. LAM1.people.umd.edu. + RP root.terp.umd.edu. ops.CS.UMD.EDU. + + TRANTOR.UMD.EDU. A 128.8.10.14 + MX 10 trantor.umd.edu. + HINFO MICROVAX-II UNIX + WKS 128.8.10.14 udp domain + WKS 128.8.10.14 tcp ftp telnet smtp domain + RP louie.trantor.umd.edu. LAM1.people.umd.edu. + RP petry.netwolf.umd.edu. petry.people.UMD.EDU. + RP root.trantor.umd.edu. ops.CS.UMD.EDU. + RP gregh.sunset.umd.edu. . + + LAM1.people.umd.edu. TXT "Louis A. Mamakos (301) 454-2946" + petry.people.umd.edu. TXT "Michael G. Petry (301) 454-2946" + ops.CS.UMD.EDU. TXT "CS Operations Staff (301) 454-2943" + + This set of resource records has two hosts, TRANTOR.UMD.EDU and + TERP.UMD.EDU, as well as a number of TXT RRs. 
Note that TERP.UMD.EDU + and TRANTOR.UMD.EDU both reference the same pair of TXT resource + records, although the mail box names (root.terp.umd.edu and + root.trantor.umd.edu) differ. + + Here, we obviously care much more if the machine flakes out, as we've + specified four persons which might want to be notified of problems or + other events involving TRANTOR.UMD.EDU. In this example, the last RP + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 5] + +RFC 1183 New DNS RR Definitions October 1990 + + + RR for TRANTOR.UMD.EDU specifies a mailbox (gregh.sunset.umd.edu), + but no associated TXT RR. + +3. X.25 and ISDN addresses, Route Binding + + This section describes an experimental representation of X.25 and + ISDN addresses in the DNS, as well as a route binding method, + analogous to the MX for mail routing, for very large scale networks. + + There are several possible uses, all experimental at this time. + First, the RRs provide simple documentation of the correct addresses + to use in static configurations of IP/X.25 [11] and SMTP/X.25 [12]. + + The RRs could also be used automatically by an internet network-layer + router, typically IP. The procedure would be to map IP address to + domain name, then name to canonical name if needed, then following RT + records, and finally attempting an IP/X.25 call to the address found. + Alternately, configured domain names could be resolved to identify IP + to X.25/ISDN bindings for a static but periodically refreshed routing + table. + + This provides a function similar to ARP for wide area non-broadcast + networks that will scale well to a network with hundreds of millions + of hosts. + + Also, a standard address binding reference will facilitate other + experiments in the use of X.25 and ISDN, especially in serious + inter-operability testing. The majority of work in such a test is + establishing the n-squared entries in static tables. 
+ + Finally, the RRs are intended for use in a proposal [13] by one of + the authors for a possible next-generation internet. + +3.1. The X25 RR + + The X25 RR is defined with mnemonic X25 and type code 19 (decimal). + + X25 has the following format: + + <owner> <ttl> <class> X25 <PSDN-address> + + <PSDN-address> is required in all X25 RRs. + + <PSDN-address> identifies the PSDN (Public Switched Data Network) + address in the X.121 [10] numbering plan associated with <owner>. + Its format in master files is a <character-string> syntactically + identical to that used in TXT and HINFO. + + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 6] + +RFC 1183 New DNS RR Definitions October 1990 + + + The format of X25 is class insensitive. X25 RRs cause no additional + section processing. + + The <PSDN-address> is a string of decimal digits, beginning with the + 4 digit DNIC (Data Network Identification Code), as specified in + X.121. National prefixes (such as a 0) MUST NOT be used. + + For example: + + Relay.Prime.COM. X25 311061700956 + +3.2. The ISDN RR + + The ISDN RR is defined with mnemonic ISDN and type code 20 (decimal). + + An ISDN (Integrated Service Digital Network) number is simply a + telephone number. The intent of the members of the CCITT is to + upgrade all telephone and data network service to a common service. + + The numbering plan (E.163/E.164) is the same as the familiar + international plan for POTS (an un-official acronym, meaning Plain + Old Telephone Service). In E.166, CCITT says "An E.163/E.164 + telephony subscriber may become an ISDN subscriber without a number + change." + + ISDN has the following format: + + <owner> <ttl> <class> ISDN <ISDN-address> <sa> + + The <ISDN-address> field is required; <sa> is optional. + + <ISDN-address> identifies the ISDN number of <owner> and DDI (Direct + Dial In) if any, as defined by E.164 [8] and E.163 [7], the ISDN and + PSTN (Public Switched Telephone Network) numbering plan. 
E.163 + defines the country codes, and E.164 the form of the addresses. Its + format in master files is a <character-string> syntactically + identical to that used in TXT and HINFO. + + <sa> specifies the subaddress (SA). The format of <sa> in master + files is a <character-string> syntactically identical to that used in + TXT and HINFO. + + The format of ISDN is class insensitive. ISDN RRs cause no + additional section processing. + + The <ISDN-address> is a string of characters, normally decimal + digits, beginning with the E.163 country code and ending with the DDI + if any. Note that ISDN, in Q.931, permits any IA5 character in the + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 7] + +RFC 1183 New DNS RR Definitions October 1990 + + + general case. + + The <sa> is a string of hexadecimal digits. For digits 0-9, the + concrete encoding in the Q.931 call setup information element is + identical to BCD. + + For example: + + Relay.Prime.COM. IN ISDN 150862028003217 + sh.Prime.COM. IN ISDN 150862028003217 004 + + (Note: "1" is the country code for the North American Integrated + Numbering Area, i.e., the system of "area codes" familiar to people + in those countries.) + + The RR data is the ASCII representation of the digits. It is encoded + as one or two <character-string>s, i.e., count followed by + characters. + + CCITT recommendation E.166 [9] defines prefix escape codes for the + representation of ISDN (E.163/E.164) addresses in X.121, and PSDN + (X.121) addresses in E.164. It specifies that the exact codes are a + "national matter", i.e., different on different networks. A host + connected to the ISDN may be able to use both the X25 and ISDN + addresses, with the local prefix added. + +3.3. The Route Through RR + + The Route Through RR is defined with mnemonic RT and type code 21 + (decimal). + + The RT resource record provides a route-through binding for hosts + that do not have their own direct wide area network addresses. 
It is + used in much the same way as the MX RR. + + RT has the following format: + + <owner> <ttl> <class> RT <preference> <intermediate-host> + + Both RDATA fields are required in all RT RRs. + + The first field, <preference>, is a 16 bit integer, representing the + preference of the route. Smaller numbers indicate more preferred + routes. + + <intermediate-host> is the domain name of a host which will serve as + an intermediate in reaching the host specified by <owner>. The DNS + RRs associated with <intermediate-host> are expected to include at + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 8] + +RFC 1183 New DNS RR Definitions October 1990 + + + least one A, X25, or ISDN record. + + The format of the RT RR is class insensitive. RT records cause type + X25, ISDN, and A additional section processing for <intermediate- + host>. + + For example, + + sh.prime.com. IN RT 2 Relay.Prime.COM. + IN RT 10 NET.Prime.COM. + *.prime.com. IN RT 90 Relay.Prime.COM. + + When a host is looking up DNS records to attempt to route a datagram, + it first looks for RT records for the destination host, which point + to hosts with address records (A, X25, ISDN) compatible with the wide + area networks available to the host. If it is itself in the set of + RT records, it discards any RTs with preferences higher or equal to + its own. If there are no (remaining) RTs, it can then use address + records of the destination itself. + + Wild-card RTs are used exactly as are wild-card MXs. RT's do not + "chain"; that is, it is not valid to use the RT RRs found for a host + referred to by an RT. + + The concrete encoding is identical to the MX RR. + +REFERENCES and BIBLIOGRAPHY + + [1] Stahl, M., "Domain Administrators Guide", RFC 1032, Network + Information Center, SRI International, November 1987. + + [2] Lottor, M., "Domain Administrators Operations Guide", RFC 1033, + Network Information Center, SRI International, November, 1987. 
+ + [3] Mockapetris, P., "Domain Names - Concepts and Facilities", RFC + 1034, USC/Information Sciences Institute, November 1987. + + [4] Mockapetris, P., "Domain Names - Implementation and + Specification", RFC 1035, USC/Information Sciences Institute, + November 1987. + + [5] Spector, A., and M. Kazar, "Uniting File Systems", UNIX Review, + 7(3), pp. 61-69, March 1989. + + [6] Zahn, et al., "Network Computing Architecture", Prentice-Hall, + 1989. + + [7] International Telegraph and Telephone Consultative Committee, + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 9] + +RFC 1183 New DNS RR Definitions October 1990 + + + "Numbering Plan for the International Telephone Service", CCITT + Recommendations E.163., IXth Plenary Assembly, Melbourne, 1988, + Fascicle II.2 ("Blue Book"). + + [8] International Telegraph and Telephone Consultative Committee, + "Numbering Plan for the ISDN Era", CCITT Recommendations E.164., + IXth Plenary Assembly, Melbourne, 1988, Fascicle II.2 ("Blue + Book"). + + [9] International Telegraph and Telephone Consultative Committee, + "Numbering Plan Interworking in the ISDN Era", CCITT + Recommendations E.166., IXth Plenary Assembly, Melbourne, 1988, + Fascicle II.2 ("Blue Book"). + + [10] International Telegraph and Telephone Consultative Committee, + "International Numbering Plan for the Public Data Networks", + CCITT Recommendations X.121., IXth Plenary Assembly, Melbourne, + 1988, Fascicle VIII.3 ("Blue Book"); provisional, Geneva, 1978; + amended, Geneva, 1980, Malaga-Torremolinos, 1984 and Melbourne, + 1988. + + [11] Korb, J., "Standard for the Transmission of IP datagrams Over + Public Data Networks", RFC 877, Purdue University, September + 1983. + + [12] Ullmann, R., "SMTP on X.25", RFC 1090, Prime Computer, February + 1989. + + [13] Ullmann, R., "TP/IX: The Next Internet", Prime Computer + (unpublished), July 1990. 
+ + [14] Mockapetris, P., "DNS Encoding of Network Names and Other Types", + RFC 1101, USC/Information Sciences Institute, April 1989. + +Security Considerations + + Security issues are not addressed in this memo. + + + + + + + + + + + + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 10] + +RFC 1183 New DNS RR Definitions October 1990 + + +Authors' Addresses + + Craig F. Everhart + Transarc Corporation + The Gulf Tower + 707 Grant Street + Pittsburgh, PA 15219 + + Phone: +1 412 338 4467 + + EMail: Craig_Everhart@transarc.com + + + Louis A. Mamakos + Network Infrastructure Group + Computer Science Center + University of Maryland + College Park, MD 20742-2411 + + Phone: +1-301-405-7836 + + Email: louie@Sayshell.UMD.EDU + + + Robert Ullmann 10-30 + Prime Computer, Inc. + 500 Old Connecticut Path + Framingham, MA 01701 + + Phone: +1 508 620 2800 ext 1736 + + Email: Ariel@Relay.Prime.COM + + + Paul Mockapetris + USC Information Sciences Institute + 4676 Admiralty Way + Marina del Rey, CA 90292 + + Phone: 213-822-1511 + + EMail: pvm@isi.edu + + + + + + + + + +Everhart, Mamakos, Ullmann & Mockapetris [Page 11] +
\ No newline at end of file diff --git a/doc/rfc/rfc1535.txt b/doc/rfc/rfc1535.txt new file mode 100644 index 00000000..03bddeeb --- /dev/null +++ b/doc/rfc/rfc1535.txt @@ -0,0 +1,283 @@ + + + + + + +Network Working Group E. Gavron +Request for Comments: 1535 ACES Research Inc. +Category: Informational October 1993 + + + A Security Problem and Proposed Correction + With Widely Deployed DNS Software + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard. Distribution of this memo is + unlimited. + +Abstract + + This document discusses a flaw in some of the currently distributed + name resolver clients. The flaw exposes a security weakness related + to the search heuristic invoked by these same resolvers when users + provide a partial domain name, and which is easy to exploit (although + not by the masses). This document points out the flaw, a case in + point, and a solution. + +Background + + Current Domain Name Server clients are designed to ease the burden of + remembering IP dotted quad addresses. As such they translate human- + readable names into addresses and other resource records. Part of + the translation process includes understanding and dealing with + hostnames that are not fully qualified domain names (FQDNs). + + An absolute "rooted" FQDN is of the format {name}{.} A non "rooted" + domain name is of the format {name} + + A domain name may have many parts and typically these include the + host, domain, and type. Example: foobar.company.com or + fooschool.university.edu. + +Flaw + + The problem with most widely distributed resolvers based on the BSD + BIND resolver is that they attempt to resolve a partial name by + processing a search list of partial domains to be added to portions + of the specified host name until a DNS record is found. This + "feature" is disabled by default in the official BIND 4.9.2 release. 
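The search heuristic described in the Flaw section can be sketched as follows. This is a minimal illustration, not code taken from BIND: the function name implicit_search_candidates is hypothetical, and the sketch assumes the resolver simply appends successively shorter suffixes of the searching host's own domain before trying the name as given.

```python
def implicit_search_candidates(name, host_domain):
    """List the rooted names an unrestricted search heuristic would try.

    name        -- the name typed by the user
    host_domain -- the domain of the searching host, e.g. "Tech.ACES.COM"
    """
    if name.endswith("."):
        # Already an absolute "rooted" FQDN: no search list is applied.
        return [name]
    labels = host_domain.rstrip(".").split(".")
    candidates = []
    # Append ever shorter suffixes of the host's domain, right up to the
    # top-level domain; that last step is the one that crosses into
    # publicly administered name space.
    for i in range(len(labels)):
        candidates.append(name + "." + ".".join(labels[i:]) + ".")
    # Finally try the name exactly as given, rooted.
    candidates.append(name + ".")
    return candidates
```

Called with "UnivHost.University.EDU" and "Tech.ACES.COM", the sketch yields four candidates, the third of which falls under the EDU.COM domain rather than under EDU.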
+ + Example: A TELNET attempt by User@Machine.Tech.ACES.COM + to UnivHost.University.EDU + + + +Gavron [Page 1] + +RFC 1535 DNS Software Enhancements October 1993 + + + The resolver client will realize that since "UnivHost.University.EDU" + does not end with a ".", it is not an absolute "rooted" FQDN. It + will then try the following combinations until a resource record is + found: + + UnivHost.University.EDU.Tech.ACES.COM. + UnivHost.University.EDU.ACES.COM. + UnivHost.University.EDU.COM. + UnivHost.University.EDU. + +Security Issue + + After registering the EDU.COM domain, it was discovered that an + unliberal application of one wildcard CNAME record would cause *all* + connects from any .COM site to any .EDU site to terminate at one + target machine in the private edu.com sub-domain. + + Further discussion reveals that specific hostnames registered in + this private subdomain, or any similarly named subdomain, may be used + to spoof a host. + + Example: harvard.edu.com. CNAME targethost + + Thus all connects to Harvard.edu from all .com sites would end up at + targethost, a machine which could provide a Harvard.edu login banner. + + This is clearly unacceptable. Further, it could only be made worse + with domains like COM.EDU, MIL.GOV, GOV.COM, etc. + +Public vs. Local Name Space Administration + + The specification of the Domain Name System and the software that + implements it provide an undifferentiated hierarchy which permits + delegation of administration for subordinate portions of the name + space. Actual administration of the name space is divided between + "public" and "local" portions. Public administration pertains to all + top-level domains, such as .COM and .EDU. For some domains, it also + pertains to some number of sub-domain levels. The multi-level nature + of the public administration is most evident for top-level domains + for countries. 
For example, in the Fully Qualified Domain Name, + dbc.mtview.ca.us., the portion "mtview.ca.us" represents three levels + of public administration. Only the left-most portion is subject to + local administration. + + + + + + + + +Gavron [Page 2] + +RFC 1535 DNS Software Enhancements October 1993 + + + The danger of the heuristic search common in current practice is that + it is possible to "intercept" the search by matching against an + unintended value while walking up the search list. While this is + potentially dangerous at any level, it is entirely unacceptable when + the error impacts users outside of a local administration. + + When attempting to resolve a partial domain name, DNS resolvers use + the Domain Name of the searching host for deriving the search list. + Existing DNS resolvers do not distinguish the portion of that name + which is in the locally administered scope from the part that is + publicly administered. + +Solution(s) + + At a minimum, DNS resolvers must honor the BOUNDARY between local and + public administration, by limiting any search lists to locally- + administered portions of the Domain Name space. This requires a + parameter which shows the scope of the name space controlled by the + local administrator. + + This would permit progressive searches from the most qualified to + less qualified up through the locally controlled domain, but not + beyond. + + For example, if the local user were trying to reach: + + User@chief.admin.DESERTU.EDU from + starburst.astro.DESERTU.EDU, + + it is reasonable to permit the user to enter just chief.admin, and + for the search to cover: + + chief.admin.astro.DESERTU.EDU + chief.admin.DESERTU.EDU + + but not + + chief.admin.EDU + + In this case, the value of "search" should be set to "DESERTU.EDU" + because that's the scope of the name space controlled by the local + DNS administrator. + + This is more than a mere optimization hack. 
The local administrator + has control over the assignment of names within the locally + administered domain, so the administrator can make sure that + abbreviations result in the right thing. Outside of the local + control, users are necessarily at risk. + + + +Gavron [Page 3] + +RFC 1535 DNS Software Enhancements October 1993 + + + A more stringent mechanism is implemented in BIND 4.9.2, to respond + to this problem: + + The DNS Name resolver client narrows its IMPLICIT search list IF ANY + to only try the first and the last of the examples shown. + + Any additional search alternatives must be configured into the + resolver EXPLICITLY. + + DNS Name resolver software SHOULD NOT use implicit search lists in + attempts to resolve partial names into absolute FQDNs other than the + host's immediate parent domain. + + Resolvers which continue to use implicit search lists MUST limit + their scope to locally administered sub-domains. + + DNS Name resolver software SHOULD NOT come pre-configured with + explicit search lists that perpetuate this problem. + + Further, in any event where a "." exists in a specified name it + should be assumed to be a fully qualified domain name (FQDN) and + SHOULD be tried as a rooted name first. + + Example: Given user@a.b.c.d connecting to e.f.g.h only two tries + should be attempted as a result of using an implicit + search list: + + e.f.g.h. and e.f.g.h.b.c.d. + + Given user@a.b.c.d connecting to host x, those same two + tries would appear as: + + x.b.c.d. and x. + + Some organizations make regular use of multi-part, partially + qualified Domain Names. For example, host foo.loc1.org.city.state.us + might be used to making references to bar.loc2, or mumble.loc3, all + of which refer to whatever.locN.org.city.state.us + + The stringent implicit search rules for BIND 4.9.2 will now cause + these searches to fail. 
To return the ability for them to succeed, + configuration of the client resolvers must be changed to include an + explicit search rule for org.city.state.us. That is, it must contain + an explicit rule for any -- and each -- portion of the locally- + administered sub-domain that it wishes to have as part of the search + list. + + + + + +Gavron [Page 4] + +RFC 1535 DNS Software Enhancements October 1993 + + +References + + [1] Mockapetris, P., "Domain Names Concepts and Facilities", STD 13, + RFC 1034, USC/Information Sciences Institute, November 1987. + + [2] Mockapetris, P., "Domain Names Implementation and Specification", + STD 13, RFC 1035, USC/Information Sciences Institute, November + 1987. + + [3] Partridge, C., "Mail Routing and the Domain System", STD 14, RFC + 974, CSNET CIC BBN, January 1986. + + [4] Kumar, A., Postel, J., Neuman, C., Danzig, P., and S. Miller, + "Common DNS Implementation Errors and Suggested Fixes", RFC 1536, + USC/Information Sciences Institute, USC, October 1993. + + [5] Beertema, P., "Common DNS Data File Configuration Errors", RFC + 1537, CWI, October 1993. + +Security Considerations + + This memo indicates vulnerabilities with all too-forgiving DNS + clients. It points out a correction that would eliminate the future + potential of the problem. + +Author's Address + + Ehud Gavron + ACES Research Inc. + PO Box 14546 + Tucson, AZ 85711 + + Phone: (602) 743-9841 + EMail: gavron@aces.com + + + + + + + + + + + + + + + + + +Gavron [Page 5] + diff --git a/doc/rfc/rfc1536.txt b/doc/rfc/rfc1536.txt new file mode 100644 index 00000000..5ff2b25d --- /dev/null +++ b/doc/rfc/rfc1536.txt @@ -0,0 +1,675 @@ + + + + + + +Network Working Group A. Kumar +Request for Comments: 1536 J. Postel +Category: Informational C. Neuman + ISI + P. Danzig + S. Miller + USC + October 1993 + + + Common DNS Implementation Errors and Suggested Fixes + +Status of this Memo + + This memo provides information for the Internet community. 
It does + not specify an Internet standard. Distribution of this memo is + unlimited. + +Abstract + + This memo describes common errors seen in DNS implementations and + suggests some fixes. Where applicable, violations of recommendations + from STD 13, RFC 1034 and STD 13, RFC 1035 are mentioned. The memo + also describes, where relevant, the algorithms followed in BIND + (versions 4.8.3 and 4.9 which the authors referred to) to serve as an + example. + +Introduction + + The last few years have seen, virtually, an explosion of DNS traffic + on the NSFnet backbone. Various DNS implementations and various + versions of these implementations interact with each other, producing + huge amounts of unnecessary traffic. Attempts are being made by + researchers all over the internet, to document the nature of these + interactions, the symptomatic traffic patterns and to devise remedies + for the sick pieces of software. + + This draft is an attempt to document fixes for known DNS problems so + people know what problems to watch out for and how to repair broken + software. + +1. Fast Retransmissions + + DNS implements the classic request-response scheme of client-server + interaction. UDP is, therefore, the chosen protocol for communication + though TCP is used for zone transfers. The onus of requerying in case + no response is seen in a "reasonable" period of time, lies with the + client. Although RFC 1034 and 1035 do not recommend any + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 1] + +RFC 1536 Common DNS Implementation Errors October 1993 + + + retransmission policy, RFC 1035 does recommend that the resolvers + should cycle through a list of servers. Both name servers and stub + resolvers should, therefore, implement some kind of a retransmission + policy based on round trip time estimates of the name servers. The + client should back-off exponentially, probably to a maximum timeout + value. + + However, clients might not implement either of the two. 
They might + not wait a sufficient amount of time before retransmitting or they + might not back-off their inter-query times sufficiently. + + Thus, what the server would see will be a series of queries from the + same querying entity, spaced very close together. Of course, a + correctly implemented server discards all duplicate queries but the + queries contribute to wide-area traffic, nevertheless. + + We classify a retransmission of a query as a pure Fast retry timeout + problem when a series of query packets meet the following conditions. + + a. Query packets are seen within a time less than a "reasonable + waiting period" of each other. + + b. No response to the original query was seen i.e., we see two or + more queries, back to back. + + c. The query packets share the same query identifier. + + d. The server eventually responds to the query. + +A GOOD IMPLEMENTATION: + + BIND (we looked at versions 4.8.3 and 4.9) implements a good + retransmission algorithm which solves or limits all of these + problems. The Berkeley stub-resolver queries servers at an interval + that starts at the greater of 4 seconds and 5 seconds divided by the + number of servers the resolver queries. The resolver cycles through + servers and at the end of a cycle, backs off the time out + exponentially. + + The Berkeley full-service resolver (built in with the program + "named") starts with a time-out equal to the greater of 4 seconds and + two times the round-trip time estimate of the server. The time-out + is backed off with each cycle, exponentially, to a ceiling value of + 45 seconds. + + + + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 2] + +RFC 1536 Common DNS Implementation Errors October 1993 + + +FIXES: + + a. Estimate round-trip times or set a reasonably high initial + time-out. + + b. Back-off timeout periods exponentially. + + c. Yet another fundamental though difficult fix is to send the + client an acknowledgement of a query, with a round-trip time + estimate. 
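Fixes (a) and (b) can be sketched together as a small timeout schedule. This is only an illustration built on the figures quoted above for the Berkeley full-service resolver (a floor of 4 seconds, twice the round-trip estimate, a ceiling of 45 seconds). The function name timeout_schedule and the doubling factor are assumptions: the text says the back-off is exponential but does not give the base.

```python
def timeout_schedule(rtt_estimate, cycles, floor=4.0, ceiling=45.0, factor=2.0):
    """Return the time-out (in seconds) to use on each retry cycle.

    rtt_estimate -- current round-trip time estimate for the server
    cycles       -- number of cycles through the server list
    factor       -- multiplier per cycle (assumed to be 2 here)
    """
    # Start at the greater of the floor and twice the round-trip estimate.
    timeout = max(floor, 2.0 * rtt_estimate)
    schedule = []
    for _ in range(cycles):
        # Never wait longer than the ceiling value.
        schedule.append(min(timeout, ceiling))
        # Back off exponentially for the next cycle.
        timeout *= factor
    return schedule
```

With a round-trip estimate of half a second and five cycles, the schedule is 4, 8, 16, 32 and 45 seconds, so a silent server sees at most a handful of queries in the first minute rather than a rapid burst.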
+ + Since UDP is used, no response is expected by the client until the + query is complete. Thus, it is less likely to have information about + previous packets on which to estimate its back-off time. Unless, you + maintain state across queries, so subsequent queries to the same + server use information from previous queries. Unfortunately, such + estimates are likely to be inaccurate for chained requests since the + variance is likely to be high. + + The fix chosen in the ARDP library used by Prospero is that the + server will send an initial acknowledgement to the client in those + cases where the server expects the query to take a long time (as + might be the case for chained queries). This initial acknowledgement + can include an expected time to wait before retrying. + + This fix is more difficult since it requires that the client software + also be trained to expect the acknowledgement packet. This, in an + internet of millions of hosts is at best a hard problem. + +2. Recursion Bugs + + When a server receives a client request, it first looks up its zone + data and the cache to check if the query can be answered. If the + answer is unavailable in either place, the server seeks names of + servers that are more likely to have the information, in its cache or + zone data. It then does one of two things. If the client desires the + server to recurse and the server architecture allows recursion, the + server chains this request to these known servers closest to the + queried name. If the client doesn't seek recursion or if the server + cannot handle recursion, it returns the list of name servers to the + client assuming the client knows what to do with these records. + + The client queries this new list of name servers to get either the + answer, or names of another set of name servers to query. This + process repeats until the client is satisfied. Servers might also go + through this chaining process if the server returns a CNAME record + for the queried name. 
Some servers reprocess this name to try and get + the desired record type. + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 3] + +RFC 1536 Common DNS Implementation Errors October 1993 + + + However, in certain cases, this chain of events may not be good. For + example, a broken or malicious name server might list itself as one + of the name servers to query again. The unsuspecting client resends + the same query to the same server. + + In another situation, more difficult to detect, a set of servers + might form a loop wherein A refers to B and B refers to A. This loop + might involve more than two servers. + + Yet another error is where the client does not know how to process + the list of name servers returned, and requeries the same server + since that is one (of the few) servers it knows. + + We, therefore, classify recursion bugs into three distinct + categories: + + a. Ignored referral: Client did not know how to handle NS records + in the AUTHORITY section. + + b. Too many referrals: Client called on a server too many times, + beyond a "reasonable" number, with the same query. This is + different from a Fast retransmission problem and a Server + Failure detection problem in that a response is seen for every + query. Also, the identifiers are always different. It implies + the client is in a loop and should have detected that and broken + it. (RFC 1035 mentions that the client should not recurse beyond + a certain depth.) + + c. Malicious Server: a server refers to itself in the authority + section. If a server does not have an answer now, it is very + unlikely it will be any better the next time you query it, + especially when it claims to be authoritative over a domain. + + RFC 1034 warns against such situations, on page 35. 
+ + "Bound the amount of work (packets sent, parallel processes + started) so that a request can't get into an infinite loop or + start off a chain reaction of requests or queries with other + implementations EVEN IF SOMEONE HAS INCORRECTLY CONFIGURED + SOME DATA." + +A GOOD IMPLEMENTATION: + + BIND fixes at least one of these problems. It places an upper limit + on the number of recursive queries it will make, to answer a + question. It chases a maximum of 20 referral links and 8 canonical + name translations. + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 4] + +RFC 1536 Common DNS Implementation Errors October 1993 + + +FIXES: + + a. Set an upper limit on the number of referral links and CNAME + links you are willing to chase. + + Note that this is not guaranteed to break only recursion loops. + It could, in a rare case, prune off a very long search path, + prematurely. We know, however, with high probability, that if + the number of links cross a certain metric (two times the depth + of the DNS tree), it is a recursion problem. + + b. Watch out for self-referring servers. Avoid them whenever + possible. + + c. Make sure you never pass off an authority NS record with your + own name on it! + + d. Fix clients to accept iterative answers from servers not built + to provide recursion. Such clients should either be happy with + the non-authoritative answer or be willing to chase the + referral links themselves. + +3. Zero Answer Bugs: + + Name servers sometimes return an authoritative NOERROR with no + ANSWER, AUTHORITY or ADDITIONAL records. This happens when the + queried name is valid but it does not have a record of the desired + type. Of course, the server has authority over the domain. + + However, once again, some implementations of resolvers do not + interpret this kind of a response reasonably. They always expect an + answer record when they see an authoritative NOERROR. These entities + continue to resend their queries, possibly endlessly. 
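The zero-answer case described above can be sketched in a few lines. This is a minimal illustration in Python, not code taken from BIND or from any resolver discussed in this memo: the Response type, the Verdict names and the classify function are hypothetical, and only the header fields that matter here are modeled. The one fact relied on from RFC 1035 is that RCODE 0 means NOERROR.

```python
from dataclasses import dataclass
from enum import Enum

NOERROR = 0  # RCODE value for "no error" (RFC 1035)


class Verdict(Enum):
    ANSWER = "answer"    # usable records were returned
    NO_DATA = "no-data"  # name is valid, but has no record of this type
    OTHER = "other"      # anything else; handled by other logic


@dataclass
class Response:
    """Hypothetical, stripped-down view of a DNS response header."""
    rcode: int
    authoritative: bool
    answer_count: int


def classify(resp: Response) -> Verdict:
    """Decide what an authoritative NOERROR response means."""
    if resp.rcode != NOERROR:
        return Verdict.OTHER
    if resp.answer_count > 0:
        return Verdict.ANSWER
    if resp.authoritative:
        # Authoritative NOERROR with zero answers: the record type does
        # not exist for this name. Resending the query will not help.
        return Verdict.NO_DATA
    return Verdict.OTHER


def should_resend(resp: Response) -> bool:
    """A resolver should stop once it has an answer or a NO_DATA verdict."""
    return classify(resp) is Verdict.OTHER
```

The point of the sketch is only that NO_DATA is a terminal result: a resolver that treats it as "no reply yet" is the one that ends up resending endlessly.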
+ +A GOOD IMPLEMENTATION: + + BIND resolver code does not query a server more than 3 times. If it + is unable to get an answer from 4 servers, querying them three times + each, it returns an error. + + Of course, it treats a zero-answer response the way it should be + treated: with respect! + +FIXES: + + a. Set an upper limit on the number of retransmissions for a given + query, at the very least. + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 5] + +RFC 1536 Common DNS Implementation Errors October 1993 + + + b. Fix resolvers to interpret such a response as an authoritative + statement of non-existence of the record type for the given + name. + +4. Inability to detect server failure: + + Servers in the internet are not very reliable (they go down every + once in a while) and resolvers are expected to adapt to the changed + scenario by not querying the server for a while. Thus, when a server + does not respond to a query, resolvers should try another server. + Also, non-stub resolvers should update their round trip time estimate + for the server to a large value so that the server is not tried again + before other, faster servers. + + Stub resolvers, however, cycle through a fixed set of servers and if, + unfortunately, a server is down while others do not respond for other + reasons (high load, recursive resolution of query is taking more time + than the resolver's time-out, ....), the resolver queries the dead + server again! In fact, some resolvers might not set an upper limit on + the number of query retransmissions they will send and continue to + query dead servers indefinitely. + + Name servers running system or chained queries might also suffer from + the same problem. They store names of servers they should query for a + given domain. They cycle through these names and in case none of them + answers, hit each one more than once. It is, once again, important + that there be an upper limit on the number of retransmissions, to + prevent network overload. 
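The upper limit on retransmissions called for above can be made concrete with a small sketch. It is written in Python purely for illustration and is not BIND's code. The limits of 3 rounds over at most 4 servers mirror the BIND behavior described earlier, while the function name, the 2-second base time-out and the doubling factor are assumptions chosen for the example.

```python
def retry_schedule(servers, max_rounds=3, base_timeout=2.0, factor=2.0):
    """Return the complete, finite list of (server, timeout) attempts.

    Each round tries every server once; the time-out grows by `factor`
    after each round, so a dead server is never hammered at a fixed rate.
    The base time-out and factor are illustrative values only.
    """
    schedule = []
    timeout = base_timeout
    for _ in range(max_rounds):
        for server in servers:
            schedule.append((server, timeout))
        timeout *= factor
    return schedule


if __name__ == "__main__":
    # Hypothetical server names, used only to show the shape of the output.
    for server, timeout in retry_schedule(["ns1", "ns2", "ns3", "ns4"]):
        print(f"query {server}, wait up to {timeout:.0f}s")
```

Because the schedule is built in full before any packet is sent, the worst case is known in advance: 4 servers and 3 rounds give exactly 12 transmissions, after which the caller reports an error instead of trying again.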
+ + This behavior is clearly in violation of the dictum in RFC 1035 (page + 46) + + "If a resolver gets a server error or other bizarre response + from a name server, it should remove it from SLIST, and may + wish to schedule an immediate transmission to the next + candidate server address." + + Removal from SLIST implies that the server is not queried again for + some time. + + Correctly implemented full-service resolvers should, as pointed out + before, update round trip time values for servers that do not respond + and query them only after other, good servers. Full-service resolvers + might, however, not follow any of these common sense directives. They + query dead servers, and they query them endlessly. + + + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 6] + +RFC 1536 Common DNS Implementation Errors October 1993 + + +A GOOD IMPLEMENTATION: + + BIND places an upper limit on the number of times it queries a + server. Both the stub-resolver and the full-service resolver code do + this. Also, since the full-service resolver estimates round-trip + times and sorts name server addresses by these estimates, it does not + query a dead server again, until and unless all the other servers in + the list are dead too! Further, BIND implements exponential back-off + too. + +FIXES: + + a. Set an upper limit on number of retransmissions. + + b. Measure round-trip time from servers (some estimate is better + than none). Treat no response as a "very large" round-trip + time. + + c. Maintain a weighted rtt estimate and decay the "large" value + slowly, with time, so that the server is eventually tested + again, but not after an indefinitely long period. + + d. Follow an exponential back-off scheme so that even if you do + not restrict the number of queries, you do not overload the + net excessively. + +5. Cache Leaks: + + Every resource record returned by a server is cached for TTL seconds, + where the TTL value is returned with the RR. 
Full-service (or stub) + resolvers cache the RR and answer any queries based on this cached + information, in the future, until the TTL expires. After that, one + more query to the wide-area network gets the RR in cache again. + + Full-service resolvers might not implement this caching mechanism + well. They might impose a limit on the cache size or might not + interpret the TTL value correctly. In either case, queries repeated + within a TTL period of a RR constitute a cache leak. + +A GOOD/BAD IMPLEMENTATION: + + BIND has no restriction on the cache size and the size is governed by + the limits on the virtual address space of the machine it is running + on. BIND caches RRs for the duration of the TTL returned with each + record. + + It does, however, not follow the RFCs with respect to interpretation + of a 0 TTL value. If a record has a TTL value of 0 seconds, BIND uses + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 7] + +RFC 1536 Common DNS Implementation Errors October 1993 + + + the minimum TTL value, for that zone, from the SOA record and caches + it for that duration. This, though it saves some traffic on the + wide-area network, is not correct behavior. + +FIXES: + + a. Look over your caching mechanism to ensure TTLs are interpreted + correctly. + + b. Do not restrict cache sizes (come on, memory is cheap!). + Expired entries are reclaimed periodically, anyway. Of course, + the cache size is bound to have some physical limit. But, when + possible, this limit should be large (run your name server on + a machine with a large amount of physical memory). + + c. Possibly, a mechanism is needed to flush the cache, when it is + known or even suspected that the information has changed. + +6. Name Error Bugs: + + This bug is very similar to the Zero Answer bug. A server returns an + authoritative NXDOMAIN when the queried name is known to be bad, by + the server authoritative for the domain, in the absence of negative + caching. 
This authoritative NXDOMAIN response is usually accompanied + by the SOA record for the domain, in the authority section. + + Resolvers should recognize that the name they queried for was a bad + name and should stop querying further. + + Some resolvers might, however, not interpret this correctly and + continue to query servers, expecting an answer record. + + Some applications, in fact, prompt NXDOMAIN answers! When given a + perfectly good name to resolve, they append the local domain to it + e.g., an application in the domain "foo.bar.com", when trying to + resolve the name "usc.edu" first tries "usc.edu.foo.bar.com", then + "usc.edu.bar.com" and finally the good name "usc.edu". This causes at + least two queries that return NXDOMAIN, for every good query. The + problem is aggravated since the negative answers from the previous + queries are not cached. When the same name is sought again, the + process repeats. + + Some DNS resolver implementations suffer from this problem, too. They + append successive sub-parts of the local domain using an implicit + searchlist mechanism, when certain conditions are satisfied and try + the original name, only when this first set of iterations fails. This + behavior recently caused pandemonium in the Internet when the domain + "edu.com" was registered and a wildcard "CNAME" record placed at the + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 8] + +RFC 1536 Common DNS Implementation Errors October 1993 + + + top level. All machines from "com" domains trying to connect to hosts + in the "edu" domain ended up with connections to the local machine in + the "edu.com" domain! + +GOOD/BAD IMPLEMENTATIONS: + + Some local versions of BIND already implement negative caching. They + typically cache negative answers with a very small TTL, sufficient to + answer a burst of queries spaced close together, as is typically + seen. + + The next official public release of BIND (4.9.2) will have negative + caching as an ifdef'd feature. 
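The short-lived negative cache described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the BIND implementation: the class name, the method names and the 10-second default lifetime are all hypothetical, and the caller supplies the current time so the behavior is easy to check.

```python
class NegativeCache:
    """Remember names that recently produced an authoritative NXDOMAIN."""

    def __init__(self, ttl_seconds=10.0):
        # A deliberately small lifetime: long enough to absorb a burst of
        # repeated queries, short enough that a newly added name is not
        # hidden for long. The default is an assumed, illustrative value.
        self.ttl = ttl_seconds
        self._expiry = {}

    def remember(self, name, now):
        """Record a negative answer for `name` received at time `now`."""
        self._expiry[name.lower()] = now + self.ttl

    def is_known_bad(self, name, now):
        """Return True if a still-valid negative answer is cached."""
        key = name.lower()
        expires = self._expiry.get(key)
        if expires is None:
            return False
        if now >= expires:
            # Entry has aged out; drop it so the name is queried again.
            del self._expiry[key]
            return False
        return True
```

A resolver that consults such a cache before sending a query avoids repeating the same failing lookups, for instance the "usc.edu.foo.bar.com" and "usc.edu.bar.com" attempts from the example above, each time the same name is sought within the lifetime.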
+ + The BIND resolver appends local domain to the given name, when one of + two conditions is met: + + i. The name has no periods and the flag RES_DEFNAME is set. + ii. There is no trailing period and the flag RES_DNSRCH is set. + + The flags RES_DEFNAME and RES_DNSRCH are default resolver options, in + BIND, but can be changed at compile time. + + Only if the name, so generated, returns an NXDOMAIN is the original + name tried as a Fully Qualified Domain Name. And only if it contains + at least one period. + +FIXES: + + a. Fix the resolver code. + + b. Negative Caching. Negative caching servers will restrict the + traffic seen on the wide-area network, even if not curb it + altogether. + + c. Applications and resolvers should not append the local domain to + names they seek to resolve, as far as possible. Names + interspersed with periods should be treated as Fully Qualified + Domain Names. + + In other words, Use searchlists only when explicitly specified. + No implicit searchlists should be used. A name that contains + any dots should first be tried as a FQDN and if that fails, with + the local domain name (or searchlist if specified) appended. A + name containing no dots can be appended with the searchlist right + away, but once again, no implicit searchlists should be used. + + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 9] + +RFC 1536 Common DNS Implementation Errors October 1993 + + + Associated with the name error bug is another problem where a server + might return an authoritative NXDOMAIN, although the name is valid. A + secondary server, on start-up, reads the zone information from the + primary, through a zone transfer. While it is in the process of + loading the zones, it does not have information about them, although + it is authoritative for them. Thus, any query for a name in that + domain is answered with an NXDOMAIN response code. 
This problem might + not be disastrous were it not for negative caching servers that cache + this answer and so propagate incorrect information over the internet. + +BAD IMPLEMENTATION: + + BIND apparently suffers from this problem. + + Also, a new name added to the primary database will take a while to + propagate to the secondaries. Until that time, they will return + NXDOMAIN answers for a good name. Negative caching servers store this + answer, too and aggravate this problem further. This is probably a + more general DNS problem but is apparently more harmful in this + situation. + +FIX: + + a. Servers should start answering only after loading all the zone + data. A failed server is better than a server handing out + incorrect information. + + b. Negative cache records for a very small time, sufficient only + to ward off a burst of requests for the same bad name. This + could be related to the round-trip time of the server from + which the negative answer was received. Alternatively, a + statistical measure of the amount of time for which queries + for such names are received could be used. Minimum TTL value + from the SOA record is not advisable since they tend to be + pretty large. + + c. A "PUSH" (or, at least, a "NOTIFY") mechanism should be allowed + and implemented, to allow the primary server to inform + secondaries that the database has been modified since it last + transferred zone data. To alleviate the problem of "too many + zone transfers" that this might cause, Incremental Zone + Transfers should also be part of DNS. Also, the primary should + not NOTIFY/PUSH with every update but bunch a good number + together. + + + + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 10] + +RFC 1536 Common DNS Implementation Errors October 1993 + + +7. Format Errors: + + Some resolvers issue query packets that do not necessarily conform to + standards as laid out in the relevant RFCs. This unnecessarily + increases net traffic and wastes server time. 
+ +FIXES: + + a. Fix resolvers. + + b. Each resolver should verify the format of packets before sending + them out, using a mechanism outside of the resolver. This is, + obviously, needed only if step a cannot be followed. + +References + + [1] Mockapetris, P., "Domain Names Concepts and Facilities", STD 13, + RFC 1034, USC/Information Sciences Institute, November 1987. + + [2] Mockapetris, P., "Domain Names Implementation and Specification", + STD 13, RFC 1035, USC/Information Sciences Institute, November + 1987. + + [3] Partridge, C., "Mail Routing and the Domain System", STD 14, RFC + 974, CSNET CIC BBN, January 1986. + + [4] Gavron, E., "A Security Problem and Proposed Correction With + Widely Deployed DNS Software", RFC 1535, ACES Research Inc., + October 1993. + + [5] Beertema, P., "Common DNS Data File Configuration Errors", RFC + 1537, CWI, October 1993. + +Security Considerations + + Security issues are not discussed in this memo. + + + + + + + + + + + + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 11] + +RFC 1536 Common DNS Implementation Errors October 1993 + + +Authors' Addresses + + Anant Kumar + USC Information Sciences Institute + 4676 Admiralty Way + Marina Del Rey CA 90292-6695 + + Phone:(310) 822-1511 + FAX: (310) 823-6741 + EMail: anant@isi.edu + + + Jon Postel + USC Information Sciences Institute + 4676 Admiralty Way + Marina Del Rey CA 90292-6695 + + Phone:(310) 822-1511 + FAX: (310) 823-6714 + EMail: postel@isi.edu + + + Cliff Neuman + USC Information Sciences Institute + 4676 Admiralty Way + Marina Del Rey CA 90292-6695 + + Phone:(310) 822-1511 + FAX: (310) 823-6714 + EMail: bcn@isi.edu + + + Peter Danzig + Computer Science Department + University of Southern California + University Park + + EMail: danzig@caldera.usc.edu + + + Steve Miller + Computer Science Department + University of Southern California + University Park + Los Angeles CA 90089 + + EMail: smiller@caldera.usc.edu + + + + +Kumar, Postel, Neuman, Danzig & Miller [Page 12] + diff 
--git a/doc/rfc/rfc1537.txt b/doc/rfc/rfc1537.txt new file mode 100644 index 00000000..81b97683 --- /dev/null +++ b/doc/rfc/rfc1537.txt @@ -0,0 +1,507 @@ + + + + + + +Network Working Group P. Beertema +Request for Comments: 1537 CWI +Category: Informational October 1993 + + + Common DNS Data File Configuration Errors + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard. Distribution of this memo is + unlimited. + +Abstract + + This memo describes errors often found in DNS data files. It points + out common mistakes system administrators tend to make and why they + often go unnoticed for long periods of time. + +Introduction + + Due to the lack of extensive documentation and automated tools, DNS + zone files have mostly been configured by system administrators, by + hand. Some of the rules for writing the data files are rather subtle + and a few common mistakes are seen in domains worldwide. + + This document is an attempt to list "surprises" that administrators + might find hidden in their zone files. It describes the symptoms of + the malady and prescribes medicine to cure that. It also gives some + general recommendations and advice on specific nameserver and zone + file issues and on the (proper) use of the Domain Name System. + +1. SOA records + + A problem I've found in quite some nameservers is that the various + timers have been set (far) too low. Especially for top level domain + nameservers this causes unnecessary traffic over international and + intercontinental links. + + Unfortunately the examples given in the BIND manual, in RFC's and in + some expert documents give those very short timer values, and that's + most likely what people have modeled their SOA records after. 
+ + First of all a short explanation of the timers used in the SOA + record: + + + + + + +Beertema [Page 1] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + + - Refresh: The SOA record of the primary server is checked + every "refresh" time by the secondary servers; + if it has changed, a zone transfer is done. + + - Retry: If a secondary server cannot reach the primary + server, it tries it again every "retry" time. + + - Expire: If for "expire" time the primary server cannot + be reached, all information about the zone is + invalidated on the secondary servers (i.e., they + are no longer authoritative for that zone). + + - Minimum TTL: The default TTL value for all records in the + zone file; a different TTL value may be given + explicitly in a record when necessary. + (This timer is named "Minimum", and that's + what its function should be according to + STD 13, RFC 1035, but most (all?) + implementations take it as the default value + exported with records without an explicit TTL + value). + + For top level domain servers I would recommend the following values: + + 86400 ; Refresh 24 hours + 7200 ; Retry 2 hours + 2592000 ; Expire 30 days + 345600 ; Minimum TTL 4 days + + For other servers I would suggest: + + 28800 ; Refresh 8 hours + 7200 ; Retry 2 hours + 604800 ; Expire 7 days + 86400 ; Minimum TTL 1 day + + but here the frequency of changes, the required speed of propagation, + the reachability of the primary server etc. play a role in optimizing + the timer values. + +2. Glue records + + Quite often, people put unnecessary glue (A) records in their zone + files. Even worse is that I've even seen *wrong* glue records for an + external host in a primary zone file! Glue records need only be in a + zone file if the server host is within the zone and there is no A + record for that host elsewhere in the zone file. 
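The SOA timer values recommended in section 1 are written in seconds, which makes them easy to mistype. A small helper such as the one below, an illustrative Python sketch with a hypothetical function name, converts a timer back into days and hours so that the comment next to each value can be checked against the number itself.

```python
UNITS = (("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1))


def humanize(seconds):
    """Render a timer value in seconds as days, hours, minutes, seconds."""
    parts = []
    for name, size in UNITS:
        count, seconds = divmod(seconds, size)
        if count:
            parts.append(f"{count} {name}{'s' if count != 1 else ''}")
    return " ".join(parts) or "0 seconds"


if __name__ == "__main__":
    # The values suggested in section 1 for top level domain servers.
    for label, value in (("Refresh", 86400), ("Retry", 7200),
                         ("Expire", 2592000), ("Minimum TTL", 345600)):
        print(f"{value:>8} ; {label:<12} {humanize(value)}")
```

Running the same check over the values suggested for other servers (28800, 7200, 604800, 86400) gives 8 hours, 2 hours, 7 days and 1 day, in agreement with the comments in the text.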
+ + + + +Beertema [Page 2] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + + Old BIND versions ("native" 4.8.3 and older versions) showed the + problem that wrong glue records could enter secondary servers in a + zone transfer. + +3. "Secondary server surprise" + + I've seen it happen on various occasions that hosts got bombarded by + nameserver requests without knowing why. On investigation it turned + out then that such a host was supposed to (i.e., the information was + in the root servers) run secondary for some domain (or reverse (in- + addr.arpa)) domain, without that host's nameserver manager having + been asked or even been told so! + + Newer BIND versions (4.9 and later) solved this problem. At the same + time though the fix has the disadvantage that it's far less easy to + spot this problem. + + Practice has shown that most domain registrars accept registrations + of nameservers without checking if primary (!) and secondary servers + have been set up, informed, or even asked. It should also be noted + that a combination of long-lasting unreachability of primary + nameservers, (therefore) expiration of zone information, plus static + IP routing, can lead to massive network traffic that can fill up + lines completely. + +4. "MX records surprise" + + In a sense similar to point 3. Sometimes nameserver managers enter MX + records in their zone files that point to external hosts, without + first asking or even informing the systems managers of those external + hosts. This has to be fought out between the nameserver manager and + the systems managers involved. Only as a last resort, if really + nothing helps to get the offending records removed, can the systems + manager turn to the naming authority of the domain above the + offending domain to get the problem sorted out. + +5. "Name extension surprise" + + Sometimes one encounters weird names, which appear to be an external + name extended with a local domain. 
This is caused by forgetting to + terminate a name with a dot: names in zone files that don't end with + a dot are always expanded with the name of the current zone (the + domain that the zone file stands for or the last $ORIGIN). + + Example: zone file for foo.xx: + + pqr MX 100 relay.yy. + xyz MX 100 relay.yy (no trailing dot!) + + + +Beertema [Page 3] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + + When fully written out this stands for: + + pqr.foo.xx. MX 100 relay.yy. + xyz.foo.xx. MX 100 relay.yy.foo.xx. (name extension!) + +6. Missing secondary servers + + It is required that there be at least 2 nameservers for a domain. For + obvious reasons the nameservers for top level domains need to be very + well reachable from all over the Internet. This implies that there + must be more than just 2 of them; besides, most of the (secondary) + servers should be placed at "strategic" locations, e.g., close to a + point where international and/or intercontinental lines come + together. To keep things manageable, there shouldn't be too many + servers for a domain either. + + Important aspects in selecting the location of primary and secondary + servers are reliability (network, host) and expedient contacts: in + case of problems, changes/fixes must be carried out quickly. It + should be considered logical that primary servers for European top + level domains should run on a host in Europe, preferably (if + possible) in the country itself. For each top level domain there + should be 2 secondary servers in Europe and 2 in the USA, but there + may of course be more on either side. An excessive number of + nameservers is not a good idea though; a recommended maximum is 7 + nameservers. In Europe, EUnet has offered to run a secondary server + for each European top level domain. + +7. Wildcard MX records + + Wildcard MX records should be avoided where possible. 
They often + cause confusion and errors: especially beginning nameserver managers + tend to overlook the fact that a host/domain listed with ANY type of + record in a zone file is NOT covered by an overall wildcard MX record + in that zone; this goes not only for simple domain/host names, but + also for names that cover one or more domains. Take the following + example in zone foo.bar: + + * MX 100 mailhost + pqr MX 100 mailhost + abc.def MX 100 mailhost + + This makes pqr.foo.bar, def.foo.bar and abc.def.foo.bar valid + domains, but the wildcard MX record covers NONE of them, nor anything + below them. To cover everything by MX records, the required entries + are: + + + + + +Beertema [Page 4] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + + * MX 100 mailhost + pqr MX 100 mailhost + *.pqr MX 100 mailhost + abc.def MX 100 mailhost + *.def MX 100 mailhost + *.abc.def MX 100 mailhost + + An overall wildcard MX record is almost never useful. + + In particular the zone file of a top level domain should NEVER + contain only an overall wildcard MX record (*.XX). The effect of such + a wildcard MX record can be that mail is unnecessarily sent across + possibly expensive links, only to fail at the destination or gateway + that the record points to. Top level domain zone files should + explicitly list at least all the officially registered primary + subdomains. + + Whereas overall wildcard MX records should be avoided, wildcard MX + records are acceptable as an explicit part of subdomain entries, + provided they are allowed under a given subdomain (to be determined + by the naming authority for that domain). + + Example: + + foo.xx. MX 100 gateway.xx. + MX 200 fallback.yy. + *.foo.xx. MX 100 gateway.xx. + MX 200 fallback.yy. +8. Hostnames + + People appear to sometimes look only at STD 11, RFC 822 to determine + whether a particular hostname is correct or not. 
Hostnames should + strictly conform to the syntax given in STD 13, RFC 1034 (page 11), + with *addresses* in addition conforming to RFC 822. As an example + take "c&w.blues" which is perfectly legal according to RFC 822, but + which can have quite surprising effects on particular systems, e.g., + "telnet c&w.blues" on a Unix system. + +9. HINFO records + + There appears to be a common misunderstanding that one of the data + fields (usually the second field) in HINFO records is optional. A + recent scan of all reachable nameservers in only one country revealed + some 300 incomplete HINFO records. Specifying two data fields in a + HINFO record is mandatory (RFC 1033), but note that this does *not* + mean that HINFO records themselves are mandatory. + + + + + +Beertema [Page 5] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + +10. Safety measures and specialties + + Nameservers and resolvers aren't flawless. Bogus queries should be + kept from being forwarded to the root servers, since they'll only + lead to unnecessary intercontinental traffic. Known bogus queries + that can easily be dealt with locally are queries for 0 and broadcast + addresses. To catch such queries, every nameserver should run + primary for the 0.in-addr.arpa and 255.in-addr.arpa zones; the zone + files need only contain a SOA and an NS record. + + Also each nameserver should run primary for 0.0.127.in-addr.arpa; + that zone file should contain a SOA and NS record and an entry: + + 1 PTR localhost. + + There has been extensive discussion about whether or not to append + the local domain to it. The conclusion was that "localhost." would be + the best solution; reasons given were: + + - "localhost" itself is used and expected to work on some systems. + + - translating 127.0.0.1 into "localhost.my_domain" can cause some + software to connect to itself using the loopback interface when + it didn't want to. 
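The effect of running primary for the three zones named above can be illustrated with a short sketch. It is written in Python for illustration only; the function names are hypothetical, and the zone list is taken directly from the text. Given an IPv4 address, the sketch builds the in-addr.arpa name that a reverse lookup would query and reports which locally served zone, if any, would catch it before it reaches the root servers.

```python
# Zones that, per the text, every nameserver should run primary for.
LOCAL_ZONES = ("0.in-addr.arpa", "255.in-addr.arpa", "0.0.127.in-addr.arpa")


def reverse_name(ipv4):
    """Return the in-addr.arpa name used for a reverse (PTR) lookup."""
    octets = ipv4.split(".")
    if len(octets) != 4 or not all(
        o.isdigit() and 0 <= int(o) <= 255 for o in octets
    ):
        raise ValueError(f"not a dotted-quad IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def local_zone_for(ipv4):
    """Return the locally served zone that answers for `ipv4`, or None."""
    name = reverse_name(ipv4)
    for zone in LOCAL_ZONES:
        if name == zone or name.endswith("." + zone):
            return zone
    return None
```

For example, 127.0.0.1 maps to 1.0.0.127.in-addr.arpa, which falls inside 0.0.127.in-addr.arpa and is answered locally with the "localhost." PTR record, whereas an ordinary address such as 192.16.184.3 matches none of the three zones and is resolved in the usual way.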
+ + Note that all domains that contain hosts should have a "localhost" A + record in them. + + People maintaining zone files with the Serial number given in dotted + decimal notation (e.g., when SCCS is used to maintain the files) + should beware of a bug in all BIND versions: if the serial number is + in Release.Version (dotted decimal) notation, then it is virtually + impossible to change to a higher release: because of the wrong way + that notation is turned into an integer, it results in a serial + number that is LOWER than that of the former release. + + For this reason and because the Serial is an (unsigned) integer + according to STD 13, RFC 1035, it is recommended not to use the + dotted decimal notation. A recommended notation is to use the date + (yyyymmdd), if necessary with an extra digit (yyyymmddn) if there is + or can be more than one change per day in a zone file. + + Very old versions of DNS resolver code have a bug that causes queries + for A records with domain names like "192.16.184.3" to go out. This + happens when users type in IP addresses and the resolver code does + not catch this case before sending out a DNS query. This problem has + been fixed in all resolver implementations known to us but if it + still pops up it is very serious because all those queries will go to + + + +Beertema [Page 6] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + + the root servers looking for top level domains like "3" etc. It is + strongly recommended to install the latest (publicly) available BIND + version plus all available patches to get rid of these and other + problems. + + Running secondary nameserver off another secondary nameserver is + possible, but not recommended unless really necessary: there are + known cases where it has led to problems like bogus TTL values. 
This + can be caused by older or flawed implementations, but secondary + nameservers in principle should always transfer their zones from the + official primary nameserver. + +11. Some general points + + The Domain Name System and nameserver are purely technical tools, not + meant in any way to exert control or impose politics. The function of + a naming authority is that of a clearing house. Anyone registering a + subdomain under a particular (top level) domain becomes the naming + authority and is therewith solely responsible for that subdomain. + Requests to enter MX or NS records concerning such a subdomain + therefore always MUST be honored by the registrar of the next higher + domain. + + Examples of practices that are not allowed are: + + - imposing specific mail routing (MX records) when registering + a subdomain. + + - making registration of a subdomain dependent on the use of + certain networks or services. + + - using TXT records as a means of (free) commercial advertising. + + In the last case a network service provider could decide to cut off + a particular site until the offending TXT records have been removed + from the site's zone file. + + Of course there are obvious cases where a naming authority can refuse + to register a particular subdomain and can require a proposed name to + be changed in order to get it registered (think of DEC trying to + register a domain IBM.XX). + + There are also cases where one has to probe the authority of the + person sending in the application - not every systems manager should + be able to register a domain name for a whole university. 
The naming + authority can impose certain extra rules as long as they don't + violate or conflict with the rights and interest of the registrars of + subdomains; a top level domain registrar may e.g., require that there + + + +Beertema [Page 7] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + + be primary subdomain "ac" and "co" only and that subdomains be + registered under those primary subdomains. + + The naming authority can also interfere in exceptional cases like the + one mentioned in point 4, e.g., by temporarily removing a domain's + entry from the nameserver zone files; this of course should be done + only with extreme care and only as a last resort. + + When adding NS records for subdomains, top level domain nameserver + managers should realize that the people setting up the nameserver for + a subdomain often are rather inexperienced and can make mistakes that + can easily lead to the subdomain becoming completely unreachable or + that can cause unnecessary DNS traffic (see point 1). It is therefore + highly recommended that, prior to entering such an NS record, the + (top level) nameserver manager does a couple of sanity checks on the + new nameserver (SOA record and timers OK?, MX records present where + needed? No obvious errors made? Listed secondary servers + operational?). Things that cannot be caught though by such checks + are: + + - resolvers set up to use external hosts as nameservers + + - nameservers set up to use external hosts as forwarders + without permission from those hosts. + + Care should also be taken when registering 2-letter subdomains. + Although this is allowed, an implication is that abbreviated + addressing (see STD 11, RFC 822, paragraph 6.2.2) is not possible in + and under that subdomain. When requested to register such a domain, + one should always notify the people of this consequence. 
As an + example take the name "cs", which is commonly used for Computer + Science departments: it is also the name of the top level domain for + Czecho-Slovakia, so within the domain cs.foo.bar the user@host.cs is + ambiguous in that it can denote both a user on the host + host.cs.foo.bar and a user on the host "host" in Czecho-Slovakia. + (This example does not take into account the recent political changes + in the mentioned country). + +References + + [1] Mockapetris, P., "Domain Names Concepts and Facilities", STD 13, + RFC 1034, USC/Information Sciences Institute, November 1987. + + [2] Mockapetris, P., "Domain Names Implementation and Specification", + STD 13, RFC 1035, USC/Information Sciences Institute, November + 1987. + + + + + +Beertema [Page 8] + +RFC 1537 Common DNS Data File Configuration Errors October 1993 + + + [3] Partridge, C., "Mail Routing and the Domain System", STD 14, RFC + 974, CSNET CIC BBN, January 1986. + + [4] Gavron, E., "A Security Problem and Proposed Correction With + Widely Deployed DNS Software", RFC 1535, ACES Research Inc., + October 1993. + + [5] Kumar, A., Postel, J., Neuman, C., Danzig, P., and S. Miller, + "Common DNS Implementation Errors and Suggested Fixes", RFC 1536, + USC/Information Sciences Institute, USC, October 1993. + +Security Considerations + + Security issues are not discussed in this memo. + +Author's Address + + Piet Beertema + CWI + Kruislaan 413 + NL-1098 SJ Amsterdam + The Netherlands + + Phone: +31 20 592 4112 + FAX: +31 20 592 4199 + EMail: Piet.Beertema@cwi.nl + + +Editor's Address + + Anant Kumar + USC Information Sciences Institute + 4676 Admiralty Way + Marina Del Rey CA 90292-6695 + + Phone:(310) 822-1511 + FAX: (310) 823-6741 + EMail: anant@isi.edu + + + + + + + + + + + + + +Beertema [Page 9] +
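The ambiguity that RFC 1537 describes for 2-letter subdomains can be illustrated with a short sketch. It assumes a hypothetical expansion rule in which an abbreviated name is tried under the local domain and each of its parents, and finally as an absolute name. The function name `candidate_names` and the rule itself are assumptions for illustration only, not behaviour defined by the memo.

```python
def candidate_names(name, local_domain):
    """List the fully qualified names an abbreviated name could stand for.

    Hypothetical rule: append the local domain and each of its parent
    domains in turn, then treat the name as already absolute.
    """
    labels = local_domain.split(".")
    suffixes = [".".join(labels[i:]) for i in range(len(labels))]
    return [name + "." + suffix for suffix in suffixes] + [name]


# Within cs.foo.bar, "host.cs" matches both a local host and a host
# directly under the top level domain "cs".
names = candidate_names("host.cs", "cs.foo.bar")
print(names)
```

Both `host.cs.foo.bar` and the absolute name `host.cs` appear in the result, which is the ambiguity the text warns about.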
\ No newline at end of file diff --git a/doc/rfc/rfc1591.txt b/doc/rfc/rfc1591.txt new file mode 100644 index 00000000..89e0a254 --- /dev/null +++ b/doc/rfc/rfc1591.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group J. Postel +Request for Comments: 1591 ISI +Category: Informational March 1994 + + + Domain Name System Structure and Delegation + + +Status of this Memo + + This memo provides information for the Internet community. This memo + does not specify an Internet standard of any kind. Distribution of + this memo is unlimited. + +1. Introduction + + This memo provides some information on the structure of the names in + the Domain Name System (DNS), specifically the top-level domain + names; and on the administration of domains. The Internet Assigned + Numbers Authority (IANA) is the overall authority for the IP + Addresses, the Domain Names, and many other parameters, used in the + Internet. The day-to-day responsibility for the assignment of IP + Addresses, Autonomous System Numbers, and most top and second level + Domain Names is handled by the Internet Registry (IR) and regional + registries. + +2. The Top Level Structure of the Domain Names + + In the Domain Name System (DNS) naming of computers there is a + hierarchy of names. The root of the system is unnamed. There are a set + of what are called "top-level domain names" (TLDs). These are the + generic TLDs (EDU, COM, NET, ORG, GOV, MIL, and INT), and the two + letter country codes from ISO-3166. It is extremely unlikely that + any other TLDs will be created. + + Under each TLD may be created a hierarchy of names. Generally, under + the generic TLDs the structure is very flat. That is, many + organizations are registered directly under the TLD, and any further + structure is up to the individual organizations. + + In the country TLDs, there is a wide variation in the structure, in + some countries the structure is very flat, in others there is + substantial structural organization. 
In some country domains the + second levels are generic categories (such as, AC, CO, GO, and RE), + in others they are based on political geography, and in still others, + organization names are listed directly under the country code. The + organization for the US country domain is described in RFC 1480 [1]. + + + + +Postel [Page 1] + +RFC 1591 Domain Name System Structure and Delegation March 1994 + + + Each of the generic TLDs was created for a general category of + organizations. The country code domains (for example, FR, NL, KR, + US) are each organized by an administrator for that country. These + administrators may further delegate the management of portions of the + naming tree. These administrators are performing a public service on + behalf of the Internet community. Descriptions of the generic + domains and the US country domain follow. + + Of these generic domains, five are international in nature, and two + are restricted to use by entities in the United States. + + World Wide Generic Domains: + + COM - This domain is intended for commercial entities, that is + companies. This domain has grown very large and there is + concern about the administrative load and system performance if + the current growth pattern is continued. Consideration is + being taken to subdivide the COM domain and only allow future + commercial registrations in the subdomains. + + EDU - This domain was originally intended for all educational + institutions. Many Universities, colleges, schools, + educational service organizations, and educational consortia + have registered here. More recently a decision has been taken + to limit further registrations to 4 year colleges and + universities. Schools and 2-year colleges will be registered + in the country domains (see US Domain, especially K12 and CC, + below). 
+ + NET - This domain is intended to hold only the computers of network + providers, that is the NIC and NOC computers, the + administrative computers, and the network node computers. The + customers of the network provider would have domain names of + their own (not in the NET TLD). + + ORG - This domain is intended as the miscellaneous TLD for + organizations that didn't fit anywhere else. Some non- + government organizations may fit here. + + INT - This domain is for organizations established by international + treaties, or international databases. + + United States Only Generic Domains: + + GOV - This domain was originally intended for any kind of government + office or agency. More recently a decision was taken to + register only agencies of the US Federal government in this + domain. State and local agencies are registered in the country + + + +Postel [Page 2] + +RFC 1591 Domain Name System Structure and Delegation March 1994 + + + domains (see US Domain, below). + + MIL - This domain is used by the US military. + + Example country code Domain: + + US - As an example of a country domain, the US domain provides for + the registration of all kinds of entities in the United States + on the basis of political geography, that is, a hierarchy of + <entity-name>.<locality>.<state-code>.US. For example, + "IBM.Armonk.NY.US". In addition, branches of the US domain are + provided within each state for schools (K12), community colleges + (CC), technical schools (TEC), state government agencies + (STATE), councils of governments (COG),libraries (LIB), museums + (MUS), and several other generic types of entities (see RFC 1480 + for details [1]). + + To find a contact for a TLD use the "whois" program to access the + database on the host rs.internic.net. Append "-dom" to the name of + TLD you are interested in. For example: + + whois -h rs.internic.net us-dom + or + whois -h rs.internic.net edu-dom + +3. 
The Administration of Delegated Domains + + The Internet Assigned Numbers Authority (IANA) is responsible for the + overall coordination and management of the Domain Name System (DNS), + and especially the delegation of portions of the name space called + top-level domains. Most of these top-level domains are two-letter + country codes taken from the ISO standard 3166. + + A central Internet Registry (IR) has been selected and designated to + handle the bulk of the day-to-day administration of the Domain Name + System. Applications for new top-level domains (for example, country + code domains) are handled by the IR with consultation with the IANA. + The central IR is INTERNIC.NET. Second level domains in COM, EDU, + ORG, NET, and GOV are registered by the Internet Registry at the + InterNIC. The second level domains in the MIL are registered by the + DDN registry at NIC.DDN.MIL. Second level names in INT are + registered by the PVM at ISI.EDU. + + While all requests for new top-level domains must be sent to the + Internic (at hostmaster@internic.net), the regional registries are + often enlisted to assist in the administration of the DNS, especially + in solving problems with a country administration. Currently, the + RIPE NCC is the regional registry for Europe and the APNIC is the + + + +Postel [Page 3] + +RFC 1591 Domain Name System Structure and Delegation March 1994 + + + regional registry for the Asia-Pacific region, while the INTERNIC + administers the North America region, and all the as yet undelegated + regions. + + The contact mailboxes for these regional registries are: + + INTERNIC hostmaster@internic.net + APNIC hostmaster@apnic.net + RIPE NCC ncc@ripe.net + + The policy concerns involved when a new top-level domain is + established are described in the following. Also mentioned are + concerns raised when it is necessary to change the delegation of an + established domain from one party to another. 
+ + A new top-level domain is usually created and its management + delegated to a "designated manager" all at once. + + Most of these same concerns are relevant when a sub-domain is + delegated and in general the principles described here apply + recursively to all delegations of the Internet DNS name space. + + The major concern in selecting a designated manager for a domain is + that it be able to carry out the necessary responsibilities, and have + the ability to do an equitable, just, honest, and competent job. + + 1) The key requirement is that for each domain there be a designated + manager for supervising that domain's name space. In the case of + top-level domains that are country codes this means that there is + a manager that supervises the domain names and operates the domain + name system in that country. + + The manager must, of course, be on the Internet. There must be + Internet Protocol (IP) connectivity to the nameservers and email + connectivity to the management and staff of the manager. + + There must be an administrative contact and a technical contact + for each domain. For top-level domains that are country codes at + least the administrative contact must reside in the country + involved. + + 2) These designated authorities are trustees for the delegated + domain, and have a duty to serve the community. + + The designated manager is the trustee of the top-level domain for + both the nation, in the case of a country code, and the global + Internet community. + + + + +Postel [Page 4] + +RFC 1591 Domain Name System Structure and Delegation March 1994 + + + Concerns about "rights" and "ownership" of domains are + inappropriate. It is appropriate to be concerned about + "responsibilities" and "service" to the community. + + 3) The designated manager must be equitable to all groups in the + domain that request domain names. 
+ + This means that the same rules are applied to all requests, all + requests must be processed in a non-discriminatory fashion, and + academic and commercial (and other) users are treated on an equal + basis. No bias shall be shown regarding requests that may come + from customers of some other business related to the manager -- + e.g., no preferential service for customers of a particular data + network provider. There can be no requirement that a particular + mail system (or other application), protocol, or product be used. + + There are no requirements on subdomains of top-level domains + beyond the requirements on higher-level domains themselves. That + is, the requirements in this memo are applied recursively. In + particular, all subdomains shall be allowed to operate their own + domain name servers, providing in them whatever information the + subdomain manager sees fit (as long as it is true and correct). + + 4) Significantly interested parties in the domain should agree that + the designated manager is the appropriate party. + + The IANA tries to have any contending parties reach agreement + among themselves, and generally takes no action to change things + unless all the contending parties agree; only in cases where the + designated manager has substantially mis-behaved would the IANA + step in. + + However, it is also appropriate for interested parties to have + some voice in selecting the designated manager. + + There are two cases where the IANA and the central IR may + establish a new top-level domain and delegate only a portion of + it: (1) there are contending parties that cannot agree, or (2) the + applying party may not be able to represent or serve the whole + country. The latter case sometimes arises when a party outside a + country is trying to be helpful in getting networking started in a + country -- this is sometimes called a "proxy" DNS service. 
+ + The Internet DNS Names Review Board (IDNB), a committee + established by the IANA, will act as a review panel for cases in + which the parties can not reach agreement among themselves. The + IDNB's decisions will be binding. + + + + +Postel [Page 5] + +RFC 1591 Domain Name System Structure and Delegation March 1994 + + + 5) The designated manager must do a satisfactory job of operating the + DNS service for the domain. + + That is, the actual management of the assigning of domain names, + delegating subdomains and operating nameservers must be done with + technical competence. This includes keeping the central IR (in + the case of top-level domains) or other higher-level domain + manager advised of the status of the domain, responding to + requests in a timely manner, and operating the database with + accuracy, robustness, and resilience. + + There must be a primary and a secondary nameserver that have IP + connectivity to the Internet and can be easily checked for + operational status and database accuracy by the IR and the IANA. + + In cases when there are persistent problems with the proper + operation of a domain, the delegation may be revoked, and possibly + delegated to another designated manager. + + 6) For any transfer of the designated manager trusteeship from one + organization to another, the higher-level domain manager (the IANA + in the case of top-level domains) must receive communications from + both the old organization and the new organization that assure the + IANA that the transfer is mutually agreed, and that the new + organization understands its responsibilities. + + It is also very helpful for the IANA to receive communications + from other parties that may be concerned or affected by the + transfer. + +4. 
Rights to Names + + 1) Names and Trademarks + + In case of a dispute between domain name registrants as to the + rights to a particular name, the registration authority shall have + no role or responsibility other than to provide the contact + information to both parties. + + The registration of a domain name does not have any Trademark + status. It is up to the requestor to be sure he is not violating + anyone else's Trademark. + + 2) Country Codes + + The IANA is not in the business of deciding what is and what is + not a country. + + + + +Postel [Page 6] + +RFC 1591 Domain Name System Structure and Delegation March 1994 + + + The selection of the ISO 3166 list as a basis for country code + top-level domain names was made with the knowledge that ISO has a + procedure for determining which entities should be and should not + be on that list. + +5. Security Considerations + + Security issues are not discussed in this memo. + +6. Acknowledgements + + Many people have made comments on draft versions of these descriptions + and procedures. Steve Goldstein and John Klensin have been + particularly helpful. + +7. Author's Address + + Jon Postel + USC/Information Sciences Institute + 4676 Admiralty Way + Marina del Rey, CA 90292 + + Phone: 310-822-1511 + Fax: 310-823-6714 + EMail: Postel@ISI.EDU + +8. References + + [1] Cooper, A., and J. Postel, "The US Domain", RFC 1480, + USC/Information Sciences Institute, June 1993. + + [2] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC 1340, + USC/Information Sciences Institute, July 1992. + + [3] Mockapetris, P., "Domain Names - Concepts and Facilities", STD + 13, RFC 1034, USC/Information Sciences Institute, November 1987. + + [4] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, USC/Information Sciences + Institute, November 1987. + + [6] Partridge, C., "Mail Routing and the Domain System", STD 14, RFC + 974, CSNET CIC BBN, January 1986. 
+ + [7] Braden, R., Editor, "Requirements for Internet Hosts -- + Application and Support", STD 3, RFC 1123, Internet Engineering + Task Force, October 1989. + + + + +Postel [Page 7] + diff --git a/doc/rfc/rfc1706.txt b/doc/rfc/rfc1706.txt new file mode 100644 index 00000000..5b5d8219 --- /dev/null +++ b/doc/rfc/rfc1706.txt @@ -0,0 +1,563 @@ + + + + + + +Network Working Group B. Manning +Request for Comments: 1706 ISI +Obsoletes: 1637, 1348 R. Colella +Category: Informational NIST + October 1994 + + + DNS NSAP Resource Records + + +Status of this Memo + + This memo provides information for the Internet community. This memo + does not specify an Internet standard of any kind. Distribution of + this memo is unlimited. + +Abstract + + OSI lower layer protocols, comprising the connectionless network + protocol (CLNP) and supporting routing protocols, are deployed in + some parts of the global Internet. Maintenance and debugging of CLNP + connectivity is greatly aided by support in the Domain Name System + (DNS) for mapping between names and NSAP addresses. + + This document defines the format of one new Resource Record (RR) for + the DNS for domain name-to-NSAP mapping. The RR may be used with any + NSAP address format. + + NSAP-to-name translation is accomplished through use of the PTR RR + (see STD 13, RFC 1035 for a description of the PTR RR). This paper + describes how PTR RRs are used to support this translation. + + This document obsoletes RFC 1348 and RFC 1637. + + + + + + + + + + + + + + + + + + +Manning & Colella [Page 1] + +RFC 1706 DNS NSAP RRs October 1994 + + +1. Introduction + + OSI lower layer protocols, comprising the connectionless network + protocol (CLNP) [5] and supporting routing protocols, are deployed in + some parts of the global Internet. 
Maintenance and debugging of CLNP + connectivity is greatly aided by support in the Domain Name System + (DNS) [7] [8] for mapping between names and NSAP (network service + access point) addresses [6] [Note: NSAP and NSAP address are used + interchangeably throughout this memo]. + + This document defines the format of one new Resource Record (RR) for + the DNS for domain name-to-NSAP mapping. The RR may be used with any + NSAP address format. + + NSAP-to-name translation is accomplished through use of the PTR RR + (see RFC 1035 for a description of the PTR RR). This paper describes + how PTR RRs are used to support this translation. + + This memo assumes that the reader is familiar with the DNS. Some + familiarity with NSAPs is useful; see [1] or Annex A of [6] for + additional information. + +2. Background + + The reason for defining DNS mappings for NSAPs is to support the + existing CLNP deployment in the Internet. Debugging with CLNP ping + and traceroute has become more difficult with only numeric NSAPs as + the scale of deployment has increased. Current debugging is supported + by maintaining and exchanging a configuration file with name/NSAP + mappings similar in function to hosts.txt. This suffers from the lack + of a central coordinator for this file and also from the perspective + of scaling. The former describes the most serious short-term + problem. Scaling of a hosts.txt-like solution has well-known long- + term scaling deficiencies. + +3. Scope + + The methods defined in this paper are applicable to all NSAP formats. + + As a point of reference, there is a distinction between registration + and publication of addresses. For IP addresses, the IANA is the root + registration authority and the DNS a publication method. For NSAPs, + Annex A of the network service definition, ISO8348 [6], describes the + root registration authority and this memo defines how the DNS is used + as a publication method. 
+ + + + + + +Manning & Colella [Page 2] + +RFC 1706 DNS NSAP RRs October 1994 + + +4. Structure of NSAPs + + NSAPs are hierarchically structured to allow distributed + administration and efficient routing. Distributed administration + permits subdelegated addressing authorities to, as allowed by the + delegator, further structure the portion of the NSAP space under + their delegated control. Accommodating this distributed authority + requires that there be little or no a priori knowledge of the + structure of NSAPs built into DNS resolvers and servers. + + For the purposes of this memo, NSAPs can be thought of as a tree of + identifiers. The root of the tree is ISO8348 [6], and has as its + immediately registered subordinates the one-octet Authority and + Format Identifiers (AFIs) defined there. The size of subsequently- + defined fields depends on which branch of the tree is taken. The + depth of the tree varies according to the authority responsible for + defining subsequent fields. + + An example is the authority under which U.S. GOSIP defines NSAPs [2]. + Under the AFI of 47, NIST (National Institute of Standards and + Technology) obtained a value of 0005 (the AFI of 47 defines the next + field as being two octets consisting of four BCD digits from the + International Code Designator space [3]). NIST defined the subsequent + fields in [2], as shown in Figure 1. The field immediately following + 0005 is a format identifier for the rest of the U.S. GOSIP NSAP + structure, with a hex value of 80. Following this is the three-octet + field, values for which are allocated to network operators; the + registration authority for this field is delegated to GSA (General + Services Administration). + + The last octet of the NSAP is the NSelector (NSel). In practice, the + NSAP minus the NSel identifies the CLNP protocol machine on a given + system, and the NSel identifies the CLNP user. 
Since there can be + more than one CLNP user (meaning multiple NSel values for a given + "base" NSAP), the representation of the NSAP should be CLNP-user + independent. To achieve this, an NSel value of zero shall be used + with all NSAP values stored in the DNS. An NSAP with NSel=0 + identifies the network layer itself. It is left to the application + retrieving the NSAP to determine the appropriate value to use in that + instance of communication. + + When CLNP is used to support TCP and UDP services, the NSel value + used is the appropriate IP PROTO value as registered with the IANA. + For "standard" OSI, the selection of NSel values is left as a matter + of local administration. Administrators of systems that support the + OSI transport protocol [4] in addition to TCP/UDP must select NSels + for use by OSI Transport that do not conflict with the IP PROTO + values. + + + +Manning & Colella [Page 3] + +RFC 1706 DNS NSAP RRs October 1994 + + + |--------------| + | <-- IDP --> | + |--------------|-------------------------------------| + | AFI | IDI | <-- DSP --> | + |-----|--------|-------------------------------------| + | 47 | 0005 | DFI | AA |Rsvd | RD |Area | ID |Sel | + |-----|--------|-----|----|-----|----|-----|----|----| + octets | 1 | 2 | 1 | 3 | 2 | 2 | 2 | 6 | 1 | + |-----|--------|-----|----|-----|----|-----|----|----| + + IDP Initial Domain Part + AFI Authority and Format Identifier + IDI Initial Domain Identifier + DSP Domain Specific Part + DFI DSP Format Identifier + AA Administrative Authority + Rsvd Reserved + RD Routing Domain Identifier + Area Area Identifier + ID System Identifier + SEL NSAP Selector + + Figure 1: GOSIP Version 2 NSAP structure. + + + In the NSAP RRs in Master Files and in the printed text in this memo, + NSAPs are often represented as a string of "."-separated hex values. + The values correspond to convenient divisions of the NSAP to make it + more readable. 
For example, the "."-separated fields might correspond + to the NSAP fields as defined by the appropriate authority (RARE, + U.S. GOSIP, ANSI, etc.). The use of this notation is strictly for + readability. The "."s do not appear in DNS packets and DNS servers + can ignore them when reading Master Files. For example, a printable + representation of the first four fields of a U.S. GOSIP NSAP might + look like + + 47.0005.80.005a00 + + and a full U.S. GOSIP NSAP might appear as + + 47.0005.80.005a00.0000.1000.0020.00800a123456.00. + + Other NSAP formats have different lengths and different + administratively defined field widths to accomodate different + requirements. For more information on NSAP formats in use see RFC + 1629 [1]. + + + + + +Manning & Colella [Page 4] + +RFC 1706 DNS NSAP RRs October 1994 + + +5. The NSAP RR + + The NSAP RR is defined with mnemonic "NSAP" and TYPE code 22 + (decimal) and is used to map from domain names to NSAPs. Name-to-NSAP + mapping in the DNS using the NSAP RR operates analogously to IP + address lookup. A query is generated by the resolver requesting an + NSAP RR for a provided domain name. + + NSAP RRs conform to the top level RR format and semantics as defined + in Section 3.2.1 of RFC 1035. + + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | | + / / + / NAME / + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | TYPE = NSAP | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | CLASS = IN | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | TTL | + | | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | RDLENGTH | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / RDATA / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + where: + + * NAME: an owner name, i.e., the name of the node to which this + resource record pertains. + + * TYPE: two octets containing the NSAP RR TYPE code of 22 (decimal). 
+ + * CLASS: two octets containing the RR IN CLASS code of 1. + + * TTL: a 32 bit signed integer that specifies the time interval in + seconds that the resource record may be cached before the source + of the information should again be consulted. Zero values are + interpreted to mean that the RR can only be used for the + transaction in progress, and should not be cached. For example, + SOA records are always distributed with a zero TTL to prohibit + caching. Zero values can also be used for extremely volatile data. + + + +Manning & Colella [Page 5] + +RFC 1706 DNS NSAP RRs October 1994 + + + * RDLENGTH: an unsigned 16 bit integer that specifies the length in + octets of the RDATA field. + + * RDATA: a variable length string of octets containing the NSAP. + The value is the binary encoding of the NSAP as it would appear in + the CLNP source or destination address field. A typical example of + such an NSAP (in hex) is shown below. For this NSAP, RDLENGTH is + 20 (decimal); "."s have been omitted to emphasize that they don't + appear in the DNS packets. + + 39840f80005a0000000001e13708002010726e00 + + NSAP RRs cause no additional section processing. + +6. NSAP-to-name Mapping Using the PTR RR + + The PTR RR is defined in RFC 1035. This RR is typically used under + the "IN-ADDR.ARPA" domain to map from IPv4 addresses to domain names. + + Similarly, the PTR RR is used to map from NSAPs to domain names under + the "NSAP.INT" domain. A domain name is generated from the NSAP + according to the rules described below. A query is sent by the + resolver requesting a PTR RR for the provided domain name. + + A domain name is generated from an NSAP by reversing the hex nibbles + of the NSAP, treating each nibble as a separate subdomain, and + appending the top-level subdomain name "NSAP.INT" to it. 
For example, + the domain name used in the reverse lookup for the NSAP + + 47.0005.80.005a00.0000.0001.e133.ffffff000162.00 + + would appear as + + 0.0.2.6.1.0.0.0.f.f.f.f.f.f.3.3.1.e.1.0.0.0.0.0.0.0.0.0.a.5.0.0. \ + 0.8.5.0.0.0.7.4.NSAP.INT. + + [Implementation note: For sanity's sake user interfaces should be + designed to allow users to enter NSAPs using their natural order, + i.e., as they are typically written on paper. Also, arbitrary "."s + should be allowed (and ignored) on input.] + +7. Master File Format + + The format of NSAP RRs (and NSAP-related PTR RRs) in Master Files + conforms to Section 5, "Master Files," of RFC 1035. Below are + examples of the use of these RRs in Master Files to support name-to- + NSAP and NSAP-to-name mapping. + + + + +Manning & Colella [Page 6] + +RFC 1706 DNS NSAP RRs October 1994 + + + The NSAP RR introduces a new hex string format for the RDATA field. + The format is "0x" (i.e., a zero followed by an 'x' character) + followed by a variable length string of hex characters (0 to 9, a to + f). The hex string is case-insensitive. "."s (i.e., periods) may be + inserted in the hex string anywhere after the "0x" for readability. + The "."s have no significance other than for readability and are not + propagated in the protocol (e.g., queries or zone transfers). + + + ;;;;;; + ;;;;;; Master File for domain nsap.nist.gov. + ;;;;;; + + + @ IN SOA emu.ncsl.nist.gov. root.emu.ncsl.nist.gov. ( + 1994041800 ; Serial - date + 1800 ; Refresh - 30 minutes + 300 ; Retry - 5 minutes + 604800 ; Expire - 7 days + 3600 ) ; Minimum - 1 hour + IN NS emu.ncsl.nist.gov. + IN NS tuba.nsap.lanl.gov. + ; + ; + $ORIGIN nsap.nist.gov. 
+ ; + ; hosts + ; + bsdi1 IN NSAP 0x47.0005.80.005a00.0000.0001.e133.ffffff000161.00 + IN A 129.6.224.161 + IN HINFO PC_486 BSDi1.1 + ; + bsdi2 IN NSAP 0x47.0005.80.005a00.0000.0001.e133.ffffff000162.00 + IN A 129.6.224.162 + IN HINFO PC_486 BSDi1.1 + ; + cursive IN NSAP 0x47.0005.80.005a00.0000.0001.e133.ffffff000171.00 + IN A 129.6.224.171 + IN HINFO PC_386 DOS_5.0/NCSA_Telnet(TUBA) + ; + infidel IN NSAP 0x47.0005.80.005a00.0000.0001.e133.ffffff000164.00 + IN A 129.6.55.164 + IN HINFO PC/486 BSDi1.0(TUBA) + ; + ; routers + ; + cisco1 IN NSAP 0x47.0005.80.005a00.0000.0001.e133.aaaaaa000151.00 + IN A 129.6.224.151 + + + +Manning & Colella [Page 7] + +RFC 1706 DNS NSAP RRs October 1994 + + + IN A 129.6.225.151 + IN A 129.6.229.151 + ; + 3com1 IN NSAP 0x47.0005.80.005a00.0000.0001.e133.aaaaaa000111.00 + IN A 129.6.224.111 + IN A 129.6.225.111 + IN A 129.6.228.111 + + + + + ;;;;;; + ;;;;;; Master File for reverse mapping of NSAPs under the + ;;;;;; NSAP prefix: + ;;;;;; + ;;;;;; 47.0005.80.005a00.0000.0001.e133 + ;;;;;; + + + @ IN SOA emu.ncsl.nist.gov. root.emu.ncsl.nist.gov. ( + 1994041800 ; Serial - date + 1800 ; Refresh - 30 minutes + 300 ; Retry - 5 minutes + 604800 ; Expire - 7 days + 3600 ) ; Minimum - 1 hour + IN NS emu.ncsl.nist.gov. + IN NS tuba.nsap.lanl.gov. + ; + ; + $ORIGIN 3.3.1.e.1.0.0.0.0.0.0.0.0.0.a.5.0.0.0.8.5.0.0.0.7.4.NSAP.INT. + ; + 0.0.1.6.1.0.0.0.f.f.f.f.f.f IN PTR bsdi1.nsap.nist.gov. + ; + 0.0.2.6.1.0.0.0.f.f.f.f.f.f IN PTR bsdi2.nsap.nist.gov. + ; + 0.0.1.7.1.0.0.0.f.f.f.f.f.f IN PTR cursive.nsap.nist.gov. + ; + 0.0.4.6.1.0.0.0.f.f.f.f.f.f IN PTR infidel.nsap.nist.gov. + ; + 0.0.1.5.1.0.0.0.a.a.a.a.a.a IN PTR cisco1.nsap.nist.gov. + ; + 0.0.1.1.1.0.0.0.a.a.a.a.a.a IN PTR 3com1.nsap.nist.gov. + +8. Security Considerations + + Security issues are not discussed in this memo. + + + + + +Manning & Colella [Page 8] + +RFC 1706 DNS NSAP RRs October 1994 + + +9. 
Authors' Addresses + + Bill Manning + USC/Information Sciences Institute + 4676 Admiralty Way + Marina del Rey, CA. 90292 + USA + + Phone: +1.310.822.1511 + EMail: bmanning@isi.edu + + + Richard Colella + National Institute of Standards and Technology + Technology/B217 + Gaithersburg, MD 20899 + USA + + Phone: +1 301-975-3627 + Fax: +1 301 590-0932 + EMail: colella@nist.gov + +10. References + + [1] Colella, R., Gardner, E., Callon, R., and Y. Rekhter, "Guidelines + for OSI NSAP Allocation in the Internet", RFC 1629, NIST, + Wellfleet, Mitre, T.J. Watson Research Center, IBM Corp., May + 1994. + + [2] GOSIP Advanced Requirements Group. Government Open Systems + Interconnection Profile (GOSIP) Version 2. Federal Information + Processing Standard 146-1, U.S. Department of Commerce, National + Institute of Standards and Technology, Gaithersburg, MD, April + 1991. + + [3] ISO/IEC. Data interchange - structures for the identification of + organization. International Standard 6523, ISO/IEC JTC 1, + Switzerland, 1984. + + [4] ISO/IEC. Connection oriented transport protocol specification. + International Standard 8073, ISO/IEC JTC 1, Switzerland, 1986. + + [5] ISO/IEC. Protocol for Providing the Connectionless-mode Network + Service. International Standard 8473, ISO/IEC JTC 1, + Switzerland, 1986. + + + + + + +Manning & Colella [Page 9] + +RFC 1706 DNS NSAP RRs October 1994 + + + [6] ISO/IEC. Information Processing Systems -- Data Communications -- + Network Service Definition. International Standard 8348, ISO/IEC + JTC 1, Switzerland, 1993. + + [7] Mockapetris, P., "Domain Names -- Concepts and Facilities", STD + 13, RFC 1034, USC/Information Sciences Institute, November 1987. + + [8] Mockapetris, P., "Domain Names -- Implementation and + Specification", STD 13, RFC 1035, USC/Information Sciences + Institute, November 1987. 
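The two mechanical rules in RFC 1706, the "0x" hex string format for NSAP RDATA in Master Files (Section 7) and the nibble-reversed owner name used for PTR lookups (Section 6), can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the memo; the function names `parse_nsap` and `nsap_to_ptr_name` are hypothetical, and no DNS library is assumed.

```python
def parse_nsap(text):
    """Convert a Master File NSAP string ('0x' plus hex digits) to RDATA octets."""
    if not text.lower().startswith("0x"):
        raise ValueError("NSAP hex string must start with '0x'")
    # The "."s are for readability only and never appear in DNS packets.
    digits = text[2:].replace(".", "")
    if len(digits) % 2:
        raise ValueError("NSAP must contain a whole number of octets")
    return bytes.fromhex(digits)  # accepts upper- and lower-case hex


def nsap_to_ptr_name(nsap):
    """Build the reverse-lookup owner name for an NSAP given as octets."""
    nibbles = nsap.hex()  # two hex digits per octet, in address order
    # Reverse the nibbles, make each one a label, and append NSAP.INT.
    return ".".join(reversed(nibbles)) + ".NSAP.INT."


# The bsdi2 NSAP from the sample Master File in the memo.
rdata = parse_nsap("0x47.0005.80.005a00.0000.0001.e133.ffffff000162.00")
print(len(rdata))               # value that would go in RDLENGTH
print(nsap_to_ptr_name(rdata))
```

For this NSAP the RDATA is 20 octets long, and the generated name is the one the memo gives as its reverse-lookup example. Keeping parsing and name generation separate mirrors the memo's implementation note: users enter NSAPs in natural order with arbitrary "."s, and the reversal happens only when the query name is built.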
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Manning & Colella [Page 10] + diff --git a/doc/rfc/rfc1750.txt b/doc/rfc/rfc1750.txt new file mode 100644 index 00000000..56d478c7 --- /dev/null +++ b/doc/rfc/rfc1750.txt @@ -0,0 +1,1683 @@ + + + + + + +Network Working Group D. Eastlake, 3rd +Request for Comments: 1750 DEC +Category: Informational S. Crocker + Cybercash + J. Schiller + MIT + December 1994 + + + Randomness Recommendations for Security + +Status of this Memo + + This memo provides information for the Internet community. This memo + does not specify an Internet standard of any kind. Distribution of + this memo is unlimited. + +Abstract + + Security systems today are built on increasingly strong cryptographic + algorithms that foil pattern analysis attempts. However, the security + of these systems is dependent on generating secret quantities for + passwords, cryptographic keys, and similar quantities. The use of + pseudo-random processes to generate secret quantities can result in + pseudo-security. The sophisticated attacker of these security + systems may find it easier to reproduce the environment that produced + the secret quantities, searching the resulting small set of + possibilities, than to locate the quantities in the whole of the + number space. + + Choosing random quantities to foil a resourceful and motivated + adversary is surprisingly difficult. This paper points out many + pitfalls in using traditional pseudo-random number generation + techniques for choosing such quantities. It recommends the use of + truly random hardware techniques and shows that the existing hardware + on many systems can be used for this purpose. It provides + suggestions to ameliorate the problem when a hardware solution is not + available. And it gives examples of how large such quantities need + to be for some particular applications. 
+ + + + + + + + + + + + +Eastlake, Crocker & Schiller [Page 1] + +RFC 1750 Randomness Recommendations for Security December 1994 + + +Acknowledgements + + Comments on this document that have been incorporated were received + from (in alphabetic order) the following: + + David M. Balenson (TIS) + Don Coppersmith (IBM) + Don T. Davis (consultant) + Carl Ellison (Stratus) + Marc Horowitz (MIT) + Christian Huitema (INRIA) + Charlie Kaufman (IRIS) + Steve Kent (BBN) + Hal Murray (DEC) + Neil Haller (Bellcore) + Richard Pitkin (DEC) + Tim Redmond (TIS) + Doug Tygar (CMU) + +Table of Contents + + 1. Introduction........................................... 3 + 2. Requirements........................................... 4 + 3. Traditional Pseudo-Random Sequences.................... 5 + 4. Unpredictability....................................... 7 + 4.1 Problems with Clocks and Serial Numbers............... 7 + 4.2 Timing and Content of External Events................ 8 + 4.3 The Fallacy of Complex Manipulation.................. 8 + 4.4 The Fallacy of Selection from a Large Database....... 9 + 5. Hardware for Randomness............................... 10 + 5.1 Volume Required...................................... 10 + 5.2 Sensitivity to Skew.................................. 10 + 5.2.1 Using Stream Parity to De-Skew..................... 11 + 5.2.2 Using Transition Mappings to De-Skew............... 12 + 5.2.3 Using FFT to De-Skew............................... 13 + 5.2.4 Using Compression to De-Skew....................... 13 + 5.3 Existing Hardware Can Be Used For Randomness......... 14 + 5.3.1 Using Existing Sound/Video Input................... 14 + 5.3.2 Using Existing Disk Drives......................... 14 + 6. Recommended Non-Hardware Strategy..................... 14 + 6.1 Mixing Functions..................................... 15 + 6.1.1 A Trivial Mixing Function.......................... 15 + 6.1.2 Stronger Mixing Functions.......................... 
16
+     6.1.3 Diffie-Hellman as a Mixing Function................ 17
+     6.1.4 Using a Mixing Function to Stretch Random Bits..... 17
+     6.1.5 Other Factors in Choosing a Mixing Function........ 18
+    6.2 Non-Hardware Sources of Randomness................... 19
+    6.3 Cryptographically Strong Sequences................... 19
+     6.3.1 Traditional Strong Sequences....................... 20
+     6.3.2 The Blum Blum Shub Sequence Generator.............. 21
+   7. Key Generation Standards.............................. 22
+    7.1 US DoD Recommendations for Password Generation....... 23
+    7.2 X9.17 Key Generation................................. 23
+   8. Examples of Randomness Required....................... 24
+    8.1 Password Generation.................................. 24
+    8.2 A Very High Security Cryptographic Key............... 25
+     8.2.1 Effort per Key Trial............................... 25
+     8.2.2 Meet in the Middle Attacks......................... 26
+     8.2.3 Other Considerations............................... 26
+   9. Conclusion............................................ 27
+   10. Security Considerations.............................. 27
+   References............................................... 28
+   Authors' Addresses....................................... 30
+
+1. Introduction
+
+   Software cryptography is coming into wider use. Systems like
+   Kerberos, PEM, PGP, etc. are maturing and becoming a part of the
+   network landscape [PEM]. These systems provide substantial
+   protection against snooping and spoofing. However, there is a
+   potential flaw. At the heart of all cryptographic systems is the
+   generation of secret, unguessable (i.e., random) numbers.
+
+   For the present, the lack of generally available facilities for
+   generating such unpredictable numbers is an open wound in the design
+   of cryptographic software.
+   For the software developer who wants to
+   build a key or password generation procedure that runs on a wide
+   range of hardware, the only safe strategy so far has been to force
+   the local installation to supply a suitable routine to generate
+   random numbers. To say the least, this is an awkward, error-prone
+   and unpalatable solution.
+
+   It is important to keep in mind that the requirement is for data that
+   an adversary has a very low probability of guessing or determining.
+   This will fail if pseudo-random data is used which only meets
+   traditional statistical tests for randomness or which is based on
+   limited range sources, such as clocks. Frequently such random
+   quantities are determinable by an adversary searching through an
+   embarrassingly small space of possibilities.
+
+   This informational document suggests techniques for producing random
+   quantities that will be resistant to such attack. It recommends that
+   future systems include hardware random number generation or provide
+   access to existing hardware that can be used for this purpose. It
+   suggests methods for use if such hardware is not available. And it
+   gives some estimates of the number of random bits required for sample
+   applications.
+
+2. Requirements
+
+   Probably the most commonly encountered randomness requirement today
+   is the user password. This is usually a simple character string.
+   Obviously, if a password can be guessed, it does not provide
+   security. (For re-usable passwords, it is desirable that users be
+   able to remember the password. This may make it advisable to use
+   pronounceable character strings or phrases composed of ordinary
+   words. But this only affects the format of the password information,
+   not the requirement that the password be very hard to guess.)
+
+   Many other requirements come from the cryptographic arena.
+   Cryptographic techniques can be used to provide a variety of services
+   including confidentiality and authentication. Such services are
+   based on quantities, traditionally called "keys", that are unknown to
+   and unguessable by an adversary.
+
+   In some cases, such as the use of symmetric encryption with one
+   time pads [CRYPTO*] or the US Data Encryption Standard [DES], the
+   parties who wish to communicate confidentially and/or with
+   authentication must all know the same secret key. In other cases,
+   using what are called asymmetric or "public key" cryptographic
+   techniques, keys come in pairs. One key of the pair is private and
+   must be kept secret by one party, the other is public and can be
+   published to the world. It is computationally infeasible to
+   determine the private key from the public key [ASYMMETRIC, CRYPTO*].
+
+   The frequency and volume of the requirement for random quantities
+   differ greatly for different cryptographic systems. Using pure RSA
+   [CRYPTO*], random quantities are required when the key pair is
+   generated, but thereafter any number of messages can be signed
+   without any further need for randomness. The public key Digital
+   Signature Algorithm that has been proposed by the US National
+   Institute of Standards and Technology (NIST) requires good random
+   numbers for each signature. And encrypting with a one time pad, in
+   principle the strongest possible encryption technique, requires a
+   volume of randomness equal to all the messages to be processed.
+
+   In most of these cases, an adversary can try to determine the
+   "secret" key by trial and error. (This is possible as long as the
+   key is enough smaller than the message that the correct key can be
+   uniquely identified.) The probability of an adversary succeeding at
+   this must be made acceptably low, depending on the particular
+   application.
+   The size of the space the adversary must search is
+   related to the amount of key "information" present in the information
+   theoretic sense [SHANNON]. This depends on the number of different
+   secret values possible and the probability of each value as follows:
+
+                      -----
+                       \
+        Bits-of-info =  \  - p   * log  ( p  )
+                        /     i       2    i
+                       /
+                      -----
+
+   where i varies from 1 to the number of possible secret values and p
+   sub i is the probability of the value numbered i. (Since p sub i is
+   less than one, the log will be negative so each term in the sum will
+   be non-negative.)
+
+   If there are 2^n different values of equal probability, then n bits
+   of information are present and an adversary would, on the average,
+   have to try half of the values, or 2^(n-1) , before guessing the
+   secret quantity. If the probability of different values is unequal,
+   then there is less information present and fewer guesses will, on
+   average, be required by an adversary. In particular, any values that
+   the adversary can know are impossible, or are of low probability, can
+   be initially ignored by an adversary, who will search through the
+   more probable values first.
+
+   For example, consider a cryptographic system that uses 56 bit keys.
+   If these 56 bit keys are derived by using a fixed pseudo-random
+   number generator that is seeded with an 8 bit seed, then an adversary
+   needs to search through only 256 keys (by running the pseudo-random
+   number generator with every possible seed), not the 2^56 keys that
+   may at first appear to be the case. Only 8 bits of "information" are
+   in these 56 bit keys.
+
+3. Traditional Pseudo-Random Sequences
+
+   Most traditional sources of random numbers use deterministic sources
+   of "pseudo-random" numbers. These typically start with a "seed"
+   quantity and use numeric or logical operations to produce a sequence
+   of values.
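As a rough illustration of the information measure in Section 2 and of the 8 bit seed example, the following Python sketch computes the bits of information in a distribution and then recovers a 56 bit key by trying every seed. The generator toy_prng_key and its constants are hypothetical stand-ins chosen for this sketch; they do not describe any real system's key derivation.

```python
import math


def bits_of_info(probabilities):
    """Shannon information: sum over i of -p_i * log2(p_i)."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def toy_prng_key(seed, bits=56):
    """Hypothetical fixed generator: repeat a linear congruential step
    and keep one byte per step until `bits` bits have been produced.
    The constants are illustrative only."""
    state = seed
    key = 0
    for _ in range(bits // 8):
        state = (state * 1103515245 + 12345) % (2 ** 31)
        key = (key << 8) | ((state >> 16) & 0xFF)
    return key


def recover_seed(observed_key):
    """Adversary's search: only 256 candidate seeds, not 2^56 keys."""
    for seed in range(256):
        if toy_prng_key(seed) == observed_key:
            return seed
    return None


# An 8 bit seed with all values equally likely, and a skewed single bit.
print(bits_of_info([1 / 256] * 256))
print(bits_of_info([0.9, 0.1]))

# The 56 bit key is found by running the generator with every seed.
secret_key = toy_prng_key(0xA7)
print(recover_seed(secret_key))
```

However wide the key appears, the search never needs more than 256 trials, because the key carries no more information than its seed.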
+
+   [KNUTH] has a classic exposition on pseudo-random numbers.
+   Applications he mentions are simulation of natural phenomena,
+   sampling, numerical analysis, testing computer programs, decision
+   making, and games. None of these have the same characteristics as
+   the sort of security uses we are talking about. Only in the last two
+   could there be an adversary trying to find the random quantity.
+   However, in these cases, the adversary normally has only a single
+   chance to use a guessed value. In guessing passwords or attempting
+   to break an encryption scheme, the adversary normally has many,
+   perhaps unlimited, chances at guessing the correct value and should
+   be assumed to be aided by a computer.
+
+   For testing the "randomness" of numbers, Knuth suggests a variety of
+   measures including statistical and spectral. These tests check
+   things like autocorrelation between different parts of a "random"
+   sequence or distribution of its values. They could be met by a
+   constant stored random sequence, such as the "random" sequence
+   printed in the CRC Standard Mathematical Tables [CRC].
+
+   A typical pseudo-random number generation technique, known as a
+   linear congruential pseudo-random number generator, is modular
+   arithmetic where the N+1th value is calculated from the Nth value by
+
+        V    = ( V  * a + b )(Mod c)
+         N+1      N
+
+   The above technique has a strong relationship to linear shift
+   register pseudo-random number generators, which are well understood
+   cryptographically [SHIFT*]. In such generators bits are introduced
+   at one end of a shift register as the Exclusive Or (binary sum
+   without carry) of bits from selected fixed taps into the register.
+
+   For example:
+
+      +----+     +----+     +----+                      +----+
+      | B  | <-- | B  | <-- | B  | <--  . . . . . . <-- | B  | <-+
+      |  0 |     |  1 |     |  2 |                      |  n |   |
+      +----+     +----+     +----+                      +----+   |
+        |                     |            |                     |
+        |                     |            V                  +-----+
+        |                     V            +----------------> |     |
+        V                     +-----------------------------> | XOR |
+        +---------------------------------------------------> |     |
+                                                              +-----+
+
+
+       V    = ( ( V  * 2 ) + B .xor. B ... )(Mod 2^n)
+        N+1         N         0       2
+
+   The goodness of traditional pseudo-random number generator algorithms
+   is measured by statistical tests on such sequences. Carefully chosen
+   values of the initial V and a, b, and c or the placement of shift
+   register taps in the above simple processes can produce excellent
+   statistics.
+
+   These sequences may be adequate in simulations (Monte Carlo
+   experiments) as long as the sequence is orthogonal to the structure
+   of the space being explored. Even there, subtle patterns may cause
+   problems. However, such sequences are clearly bad for use in
+   security applications. They are fully predictable if the initial
+   state is known. Depending on the form of the pseudo-random number
+   generator, the sequence may be determinable from observation of a
+   short portion of the sequence [CRYPTO*, STERN]. For example, with
+   the generators above, one can determine V(n+1) given knowledge of
+   V(n). In fact, it has been shown that with these techniques, even if
+   only one bit of the pseudo-random values is released, the seed can be
+   determined from short sequences.
+
+   Not only have linear congruential generators been broken, but
+   techniques are now known for breaking all polynomial congruential
+   generators [KRAWCZYK].
+
+4. Unpredictability
+
+   Randomness in the traditional sense described in section 3 is NOT the
+   same as the unpredictability required for security use.
+
+   For example, use of a widely available constant sequence, such as
+   that from the CRC tables, is very weak against an adversary.
Once + they learn of or guess it, they can easily break all security, future + and past, based on the sequence [CRC]. Yet the statistical + properties of these tables are good. + + The following sections describe the limitations of some randomness + generation techniques and sources. + +4.1 Problems with Clocks and Serial Numbers + + Computer clocks, or similar operating system or hardware values, + provide significantly fewer real bits of unpredictability than might + appear from their specifications. + + Tests have been done on clocks on numerous systems and it was found + that their behavior can vary widely and in unexpected ways. One + version of an operating system running on one set of hardware may + actually provide, say, microsecond resolution in a clock while a + different configuration of the "same" system may always provide the + same lower bits and only count in the upper bits at much lower + resolution. This means that successive reads on the clock may + produce identical values even if enough time has passed that the + value "should" change based on the nominal clock resolution. There + are also cases where frequently reading a clock can produce + artificial sequential values because of extra code that checks for + + + +Eastlake, Crocker & Schiller [Page 7] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + the clock being unchanged between two reads and increases it by one! + Designing portable application code to generate unpredictable numbers + based on such system clocks is particularly challenging because the + system designer does not always know the properties of the system + clocks that the code will execute on. + + Use of a hardware serial number such as an Ethernet address may also + provide fewer bits of uniqueness than one would guess. 
Such + quantities are usually heavily structured and subfields may have only + a limited range of possible values or values easily guessable based + on approximate date of manufacture or other data. For example, it is + likely that most of the Ethernet cards installed on Digital Equipment + Corporation (DEC) hardware within DEC were manufactured by DEC + itself, which significantly limits the range of built in addresses. + + Problems such as those described above related to clocks and serial + numbers make code to produce unpredictable quantities difficult if + the code is to be ported across a variety of computer platforms and + systems. + +4.2 Timing and Content of External Events + + It is possible to measure the timing and content of mouse movement, + key strokes, and similar user events. This is a reasonable source of + unguessable data with some qualifications. On some machines, inputs + such as key strokes are buffered. Even though the user's inter- + keystroke timing may have sufficient variation and unpredictability, + there might not be an easy way to access that variation. Another + problem is that no standard method exists to sample timing details. + This makes it hard to build standard software intended for + distribution to a large range of machines based on this technique. + + The amount of mouse movement or the keys actually hit are usually + easier to access than timings but may yield less unpredictability as + the user may provide highly repetitive input. + + Other external events, such as network packet arrival times, can also + be used with care. In particular, the possibility of manipulation of + such times by an adversary must be considered. 
+
+4.3 The Fallacy of Complex Manipulation
+
+   One strategy which may give a misleading appearance of
+   unpredictability is to take a very complex algorithm (or an excellent
+   traditional pseudo-random number generator with good statistical
+   properties) and calculate a cryptographic key by starting with the
+   current value of a computer system clock as the seed. An adversary
+   who knew roughly when the generator was started would have a
+   relatively small number of seed values to test as they would know
+   likely values of the system clock. Large numbers of pseudo-random
+   bits could be generated but the search space an adversary would need
+   to check could be quite small.
+
+   Thus very strong and/or complex manipulation of data will not help if
+   the adversary can learn what the manipulation is and there is not
+   enough unpredictability in the starting seed value. Even if they
+   cannot learn what the manipulation is, they may be able to use the
+   limited number of results stemming from a limited number of seed
+   values to defeat security.
+
+   Another serious strategy error is to assume that a very complex
+   pseudo-random number generation algorithm will produce strong random
+   numbers when there has been no theory behind or analysis of the
+   algorithm. There is an excellent example of this fallacy right near
+   the beginning of chapter 3 in [KNUTH] where the author describes a
+   complex algorithm. It was intended that the machine language program
+   corresponding to the algorithm would be so complicated that a person
+   trying to read the code without comments wouldn't know what the
+   program was doing. Unfortunately, actual use of this algorithm
+   showed that it almost immediately converged to a single repeated
+   value in one case and a small cycle of values in another case.
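The clock-seeding weakness described at the start of this section can be sketched in a few lines of Python. The names key_from_clock and search_clock_window, the clock value, and the one hour uncertainty are all assumptions made for illustration. Python's random module (a Mersenne Twister generator) stands in for "an excellent traditional pseudo-random number generator with good statistical properties"; it is not intended for security use.

```python
import random


def key_from_clock(clock_seconds, bits=128):
    """Hypothetical key generator seeded only with a clock reading."""
    return random.Random(clock_seconds).getrandbits(bits)


def search_clock_window(observed_key, earliest, latest, bits=128):
    """Try every whole-second clock value in the suspected window."""
    for candidate in range(earliest, latest + 1):
        if key_from_clock(candidate, bits) == observed_key:
            return candidate
    return None


start_time = 788_918_400            # illustrative clock value, in seconds
key = key_from_clock(start_time)

# The adversary knows the start time only to within one hour either way,
# so at most 7201 seeds need to be tried to recover a "128 bit" key.
found = search_clock_window(key, start_time - 3600, start_time + 3600)
print(found is not None)
```

The key is 128 bits wide, yet the adversary's work is bounded by the number of plausible clock values, which here is a few thousand.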
+
+   Not only does complex manipulation not help you if you have a limited
+   range of seeds but blindly chosen complex manipulation can destroy
+   the randomness in a good seed!
+
+4.4 The Fallacy of Selection from a Large Database
+
+   Another strategy that can give a misleading appearance of
+   unpredictability is to select a quantity randomly from a database
+   and assume that its strength is related to the total number of bits
+   in the database. For example, typical USENET servers as of this date
+   process over 35 megabytes of information per day. Assume a random
+   quantity was selected by fetching 32 bytes of data from a random
+   starting point in this data. This does not yield 32*8 = 256 bits
+   worth of unguessability. Even after allowing that much of the data
+   is human language and probably has more like 2 or 3 bits of
+   information per byte, it doesn't yield 32*2.5 = 80 bits of
+   unguessability. For an adversary with access to the same 35
+   megabytes the unguessability rests only on the starting point of the
+   selection. That is, at best, about 25 bits of unguessability in this
+   case.
+
+   The same argument applies to selecting sequences from the data on a
+   CD ROM or Audio CD recording or any other large public database. If
+   the adversary has access to the same database, this "selection from a
+   large volume of data" step buys very little. However, if a selection
+   can be made from data to which the adversary has no access, such as
+   system buffers on an active multi-user system, it may be of some
+   help.
+
+5. Hardware for Randomness
+
+   Is there any hope for strong portable randomness in the future?
+   There might be. All that's needed is a physical source of
+   unpredictable numbers.
+
+   A thermal noise or radioactive decay source and a fast, free-running
+   oscillator would do the trick directly [GIFFORD].
+   This is a trivial
+   amount of hardware, and could easily be included as a standard part
+   of a computer system's architecture. Furthermore, any system with a
+   spinning disk or the like has an adequate source of randomness
+   [DAVIS]. All that's needed is the common perception among computer
+   vendors that this small additional hardware and the software to
+   access it is necessary and useful.
+
+5.1 Volume Required
+
+   How much unpredictability is needed? Is it possible to quantify the
+   requirement in, say, number of random bits per second?
+
+   The answer is that not very much is needed. For DES, the key is 56
+   bits and, as we show in an example in Section 8, even the highest
+   security system is unlikely to require keying material of over 200
+   bits. If a series of keys is needed, it can be generated from a
+   strong random seed using a cryptographically strong sequence as
+   explained in Section 6.3. A few hundred random bits generated once a
+   day would be enough using such techniques. Even if the random bits
+   are generated as slowly as one per second and it is not possible to
+   overlap the generation process, it should be tolerable in high
+   security applications to wait 200 seconds occasionally.
+
+   These numbers are trivial to achieve. It could be done by a person
+   repeatedly tossing a coin. Almost any hardware process is likely to
+   be much faster.
+
+5.2 Sensitivity to Skew
+
+   Is there any specific requirement on the shape of the distribution of
+   the random numbers? The good news is the distribution need not be
+   uniform. All that is needed is a conservative estimate of how
+   non-uniform it is to bound performance. Two simple techniques to
+   de-skew the bit stream are given below and stronger techniques are
+   mentioned in Section 6.1.2 below.
+
+5.2.1 Using Stream Parity to De-Skew
+
+   Consider taking a sufficiently long string of bits and mapping the
+   string to "zero" or "one". The mapping will not yield a perfectly
+   uniform distribution, but it can be as close as desired. One mapping
+   that serves the purpose is to take the parity of the string. This
+   has the advantages that it is robust across all degrees of skew up to
+   the estimated maximum skew and is absolutely trivial to implement in
+   hardware.
+
+   The following analysis gives the number of bits that must be sampled:
+
+   Suppose the ratio of ones to zeros is 0.5 + e : 0.5 - e, where e is
+   between 0 and 0.5 and is a measure of the "eccentricity" of the
+   distribution. Consider the distribution of the parity function of N
+   bit samples. The probabilities that the parity will be one or zero
+   will be the sum of the odd or even terms in the binomial expansion of
+   (p + q)^N, where p = 0.5 + e, the probability of a one, and q = 0.5 -
+   e, the probability of a zero.
+
+   These sums can be computed easily as
+
+                            N            N
+        1/2 * ( ( p + q )  + ( p - q )  )
+   and
+                            N            N
+        1/2 * ( ( p + q )  - ( p - q )  ).
+
+   (Which one corresponds to the probability the parity will be 1
+   depends on whether N is odd or even.)
+
+   Since p + q = 1 and p - q = 2e, these expressions reduce to
+
+                      N
+        1/2 * [1 + (2e) ]
+   and
+                      N
+        1/2 * [1 - (2e) ].
+
+   Neither of these will ever be exactly 0.5 unless e is zero, but we
+   can bring them arbitrarily close to 0.5. If we want the
+   probabilities to be within some delta d of 0.5, then we need
+
+                            N
+        ( 0.5 + ( 0.5 * (2e)  ) )  <  0.5 + d.
+
+   Solving for N yields N > log(2d)/log(2e). (Note that 2e is less than
+   1, so its log is negative. Division by a negative number reverses
+   the sense of an inequality.)
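The bound N > log(2d)/log(2e) is easy to evaluate numerically. The Python sketch below does so for several eccentricities with d = 0.001. The function names are assumptions of this sketch, and the case e = 0 is handled separately because log(0) is undefined.

```python
import math


def parity_sample_length(e, d=0.001):
    """Smallest integer N satisfying N > log(2d)/log(2e), so that the
    parity of N samples is within d of a 50/50 distribution."""
    if e == 0:
        return 1                      # already unbiased; one bit suffices
    if not 0 < e < 0.5:
        raise ValueError("e must satisfy 0 <= e < 0.5")
    bound = math.log(2 * d) / math.log(2 * e)
    return max(1, math.floor(bound) + 1)


def parity_bias(e, n):
    """Distance of the parity's probability from 0.5 after n samples,
    that is, 0.5 * (2e)^n."""
    return 0.5 * (2 * e) ** n


for e in (0.0, 0.10, 0.20, 0.30, 0.40, 0.45, 0.49):
    n = parity_sample_length(e)
    print(f"Prob(1)={0.5 + e:.2f}  e={e:.2f}  N={n}")
```

Because the remaining bias is 0.5 * (2e)^N, the required N grows quickly as e approaches 0.5.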
+
+   The following table gives the length of the string which must be
+   sampled for various degrees of skew in order to come within 0.001 of
+   a 50/50 distribution.
+
+                       +---------+--------+-------+
+                       | Prob(1) |    e   |   N   |
+                       +---------+--------+-------+
+                       |   0.5   |  0.00  |     1 |
+                       |   0.6   |  0.10  |     4 |
+                       |   0.7   |  0.20  |     7 |
+                       |   0.8   |  0.30  |    13 |
+                       |   0.9   |  0.40  |    28 |
+                       |   0.95  |  0.45  |    59 |
+                       |   0.99  |  0.49  |   308 |
+                       +---------+--------+-------+
+
+   The last entry shows that even if the distribution is skewed 99% in
+   favor of ones, the parity of a string of 308 samples will be within
+   0.001 of a 50/50 distribution.
+
+5.2.2 Using Transition Mappings to De-Skew
+
+   Another technique, originally due to von Neumann [VON NEUMANN], is to
+   examine a bit stream as a sequence of non-overlapping pairs. You
+   could then discard any 00 or 11 pairs found, interpret 01 as a 0 and
+   10 as a 1. Assume the probability of a 1 is 0.5+e and the
+   probability of a 0 is 0.5-e where e is the eccentricity of the source
+   as described in the previous section. Then the probability of each
+   pair is as follows:
+
+            +------+-----------------------------------------+
+            | pair |            probability                  |
+            +------+-----------------------------------------+
+            |  00  | (0.5 - e)^2          =  0.25 - e + e^2  |
+            |  01  | (0.5 - e)*(0.5 + e)  =  0.25     - e^2  |
+            |  10  | (0.5 + e)*(0.5 - e)  =  0.25     - e^2  |
+            |  11  | (0.5 + e)^2          =  0.25 + e + e^2  |
+            +------+-----------------------------------------+
+
+   This technique will completely eliminate any bias but at the expense
+   of taking an indeterminate number of input bits for any particular
+   desired number of output bits. The probability of any particular
+   pair being discarded is 0.5 + 2e^2 so the expected number of input
+   bits to produce X output bits is X/(0.25 - e^2).
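A minimal Python sketch of the transition mapping follows. The function names, the fixed seed, and the choice e = 0.3 are assumptions made for illustration, and the simulated source produces independent bits, which is the case the analysis above covers.

```python
import random


def von_neumann_deskew(bits):
    """Examine non-overlapping pairs: 01 -> 0, 10 -> 1, and discard
    any 00 or 11 pair."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)         # 01 yields 0, 10 yields 1
    return out


def expected_input_bits(x, e):
    """Expected input bits needed for x output bits: x / (0.25 - e^2)."""
    return x / (0.25 - e * e)


rng = random.Random(1750)             # fixed seed so the run is repeatable
e = 0.3
skewed = [1 if rng.random() < 0.5 + e else 0 for _ in range(100_000)]
unbiased = von_neumann_deskew(skewed)

# Fraction of ones before and after de-skewing.
print(sum(skewed) / len(skewed), sum(unbiased) / len(unbiased))
# Observed and predicted input bits consumed per output bit.
print(len(skewed) / len(unbiased), expected_input_bits(1, e))
```

With e = 0.3 the formula predicts 6.25 input bits per output bit, so the price of removing the bias is a large reduction in volume.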
+
+   This technique assumes that the bits are from a stream where each bit
+   has the same probability of being a 0 or 1 as any other bit in the
+   stream and that bits are not correlated, i.e., that the bits are
+   independent and identically distributed. If alternate bits were from
+   two correlated sources, for example, the above analysis breaks down.
+
+   The above technique also provides another illustration of how a
+   simple statistical analysis can mislead if one is not always on the
+   lookout for patterns that could be exploited by an adversary. If the
+   algorithm were mis-read slightly so that overlapping successive bit
+   pairs were used instead of non-overlapping pairs, the statistical
+   analysis given is the same; however, instead of providing an unbiased
+   uncorrelated series of random 1's and 0's, it instead produces a
+   totally predictable sequence of exactly alternating 1's and 0's.
+
+5.2.3 Using FFT to De-Skew
+
+   When real world data consists of strongly biased or correlated bits,
+   it may still contain useful amounts of randomness. This randomness
+   can be extracted through use of the discrete Fourier transform or its
+   optimized variant, the FFT.
+
+   Using the Fourier transform of the data, strong correlations can be
+   discarded. If adequate data is processed and remaining correlations
+   decay, spectral lines approaching statistical independence and
+   normally distributed randomness can be produced [BRILLINGER].
+
+5.2.4 Using Compression to De-Skew
+
+   Reversible compression techniques also provide a crude method of
+   de-skewing a skewed bit stream. This follows directly from the
+   definition of reversible compression and the formula in Section 2
+   above for the amount of information in a sequence.
+   Since the
+   compression is reversible, the same amount of information must be
+   present in the shorter output than was present in the longer input.
+   By the Shannon information equation, this is only possible if, on
+   average, the probabilities of the different shorter sequences are
+   more uniformly distributed than were the probabilities of the longer
+   sequences. Thus the shorter sequences are de-skewed relative to the
+   input.
+
+   However, many compression techniques add a somewhat predictable
+   preface to their output stream and may insert such a sequence again
+   periodically in their output or otherwise introduce subtle patterns
+   of their own. They should be considered only a rough technique
+   compared with those described above or in Section 6.1.2. At a
+   minimum, the beginning of the compressed sequence should be skipped
+   and only later bits used for applications requiring random bits.
+
+5.3 Existing Hardware Can Be Used For Randomness
+
+   As described below, many computers come with hardware that can, with
+   care, be used to generate truly random quantities.
+
+5.3.1 Using Existing Sound/Video Input
+
+   Increasingly computers are being built with inputs that digitize some
+   real world analog source, such as sound from a microphone or video
+   input from a camera. Under appropriate circumstances, such input can
+   provide reasonably high quality random bits. The "input" from a
+   sound digitizer with no source plugged in or a camera with the lens
+   cap on, if the system has enough gain to detect anything, is
+   essentially thermal noise.
+
+   For example, on a SPARCstation, one can read from the /dev/audio
+   device with nothing plugged into the microphone jack. Such data is
+   essentially random noise although it should not be trusted without
+   some checking in case of hardware failure.
It will, in any case, + need to be de-skewed as described elsewhere. + + Combining this with compression to de-skew one can, in UNIXese, + generate a huge amount of medium quality random data by doing + + cat /dev/audio | compress - >random-bits-file + +5.3.2 Using Existing Disk Drives + + Disk drives have small random fluctuations in their rotational speed + due to chaotic air turbulence [DAVIS]. By adding low level disk seek + time instrumentation to a system, a series of measurements can be + obtained that include this randomness. Such data is usually highly + correlated so that significant processing is needed, including FFT + (see section 5.2.3). Nevertheless experimentation has shown that, + with such processing, disk drives easily produce 100 bits a minute or + more of excellent random data. + + Partly offsetting this need for processing is the fact that disk + drive failure will normally be rapidly noticed. Thus, problems with + this method of random number generation due to hardware failure are + very unlikely. + +6. Recommended Non-Hardware Strategy + + What is the best overall strategy for meeting the requirement for + unguessable random numbers in the absence of a reliable hardware + source? It is to obtain random input from a large number of + uncorrelated sources and to mix them with a strong mixing function. + + + +Eastlake, Crocker & Schiller [Page 14] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + Such a function will preserve the randomness present in any of the + sources even if other quantities being combined are fixed or easily + guessable. This may be advisable even with a good hardware source as + hardware can also fail, though this should be weighed against any + increase in the chance of overall failure due to added software + complexity. 
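The strategy just described, gathering several inputs and passing them through a strong mixing function, might be sketched as follows. This is only an illustration under stated assumptions: SHA-256 from Python's standard library stands in for the hash functions discussed in this memo, the helper name mix_sources is hypothetical, and the three sample inputs are placeholders rather than a vetted set of sources.

```python
import hashlib
import os
import time


def mix_sources(*sources: bytes) -> bytes:
    """Mix any number of byte strings into one 32-byte digest.

    SHA-256 is an assumed stand-in for a strong mixing function.
    """
    h = hashlib.sha256()
    for s in sources:
        # Length-prefix each input so ("ab", "c") and ("a", "bc") mix differently.
        h.update(len(s).to_bytes(8, "big"))
        h.update(s)
    return h.digest()


if __name__ == "__main__":
    # Placeholder inputs: a clock reading, a process ID, and OS-supplied bytes.
    sample = mix_sources(
        time.time_ns().to_bytes(8, "big"),
        os.getpid().to_bytes(4, "big"),
        os.urandom(16),
    )
    print(sample.hex())
```

Even when one input is fixed or guessable, the digest still depends on every other input, which is the property the text relies on. The output can never contain more unpredictability than the inputs supplied.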
+ +6.1 Mixing Functions + + A strong mixing function is one which combines two or more inputs and + produces an output where each output bit is a different complex non- + linear function of all the input bits. On average, changing any + input bit will change about half the output bits. But because the + relationship is complex and non-linear, no particular output bit is + guaranteed to change when any particular input bit is changed. + + Consider the problem of converting a stream of bits that is skewed + towards 0 or 1 to a shorter stream which is more random, as discussed + in Section 5.2 above. This is simply another case where a strong + mixing function is desired, mixing the input bits to produce a + smaller number of output bits. The technique given in Section 5.2.1 + of using the parity of a number of bits is simply the result of + successively Exclusive Or'ing them which is examined as a trivial + mixing function immediately below. Use of stronger mixing functions + to extract more of the randomness in a stream of skewed bits is + examined in Section 6.1.2. + +6.1.1 A Trivial Mixing Function + + A trivial example for single bit inputs is the Exclusive Or function, + which is equivalent to addition without carry, as shown in the table + below. This is a degenerate case in which the one output bit always + changes for a change in either input bit. But, despite its + simplicity, it will still provide a useful illustration. + + +-----------+-----------+----------+ + | input 1 | input 2 | output | + +-----------+-----------+----------+ + | 0 | 0 | 0 | + | 0 | 1 | 1 | + | 1 | 0 | 1 | + | 1 | 1 | 0 | + +-----------+-----------+----------+ + + If inputs 1 and 2 are uncorrelated and combined in this fashion then + the output will be an even better (less skewed) random bit than the + inputs.
If we assume an "eccentricity" e as defined in Section 5.2 + above, then the output eccentricity relates to the input eccentricity + + + +Eastlake, Crocker & Schiller [Page 15] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + as follows: + + e_output = 2 * e_input1 * e_input2 + + Since e is never greater than 1/2, the eccentricity is always + improved except in the case where at least one input is a totally + skewed constant. This is illustrated in the following table where + the top and left side values are the two input eccentricities and the + entries are the output eccentricity: + + +--------+--------+--------+--------+--------+--------+--------+ + | e | 0.00 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | + +--------+--------+--------+--------+--------+--------+--------+ + | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | + | 0.10 | 0.00 | 0.02 | 0.04 | 0.06 | 0.08 | 0.10 | + | 0.20 | 0.00 | 0.04 | 0.08 | 0.12 | 0.16 | 0.20 | + | 0.30 | 0.00 | 0.06 | 0.12 | 0.18 | 0.24 | 0.30 | + | 0.40 | 0.00 | 0.08 | 0.16 | 0.24 | 0.32 | 0.40 | + | 0.50 | 0.00 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | + +--------+--------+--------+--------+--------+--------+--------+ + + However, keep in mind that the above calculations assume that the + inputs are not correlated. If the inputs were, say, the parity of + the number of minutes from midnight on two clocks accurate to a few + seconds, then each might appear random if sampled at random intervals + much longer than a minute. Yet if they were both sampled and + combined with xor, the result would be zero most of the time. + +6.1.2 Stronger Mixing Functions + + The US Government Data Encryption Standard [DES] is an example of a + strong mixing function for multiple bit quantities. It takes up to + 120 bits of input (64 bits of "data" and 56 bits of "key") and + produces 64 bits of output each of which is dependent on a complex + non-linear function of all input bits.
Other strong encryption + functions with this characteristic can also be used by considering + them to mix all of their key and data input bits. + + Another good family of mixing functions are the "message digest" or + hashing functions such as The US Government Secure Hash Standard + [SHS] and the MD2, MD4, MD5 [MD2, MD4, MD5] series. These functions + all take an arbitrary amount of input and produce an output mixing + all the input bits. The MD* series produce 128 bits of output and SHS + produces 160 bits. + + + + + + +Eastlake, Crocker & Schiller [Page 16] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + Although the message digest functions are designed for variable + amounts of input, DES and other encryption functions can also be used + to combine any number of inputs. If 64 bits of output is adequate, + the inputs can be packed into a 64 bit data quantity and successive + 56 bit keys, padding with zeros if needed, which are then used to + successively encrypt using DES in Electronic Codebook Mode [DES + MODES]. If more than 64 bits of output are needed, use more complex + mixing. For example, if inputs are packed into three quantities, A, + B, and C, use DES to encrypt A with B as a key and then with C as a + key to produce the 1st part of the output, then encrypt B with C and + then A for more output and, if necessary, encrypt C with A and then B + for yet more output. Still more output can be produced by reversing + the order of the keys given above to stretch things. The same can be + done with the hash functions by hashing various subsets of the input + data to produce multiple outputs. But keep in mind that it is + impossible to get more bits of "randomness" out than are put in. + + An example of using a strong mixing function would be to reconsider + the case of a string of 308 bits each of which is biased 99% towards + zero. 
The parity technique given in Section 5.2.1 above reduced this + to one bit with only a 1/1000 deviance from being equally likely a + zero or one. But, applying the equation for information given in + Section 2, this 308 bit sequence has 5 bits of information in it. + Thus hashing it with SHS or MD5 and taking the bottom 5 bits of the + result would yield 5 unbiased random bits as opposed to the single + bit given by calculating the parity of the string. + +6.1.3 Diffie-Hellman as a Mixing Function + + Diffie-Hellman exponential key exchange is a technique that yields a + shared secret between two parties that can be made computationally + infeasible for a third party to determine even if they can observe + all the messages between the two communicating parties. This shared + secret is a mixture of initial quantities generated by each of them + [D-H]. If these initial quantities are random, then the shared + secret contains the combined randomness of them both, assuming they + are uncorrelated. + +6.1.4 Using a Mixing Function to Stretch Random Bits + + While it is not necessary for a mixing function to produce the same + or fewer bits than its inputs, mixing bits cannot "stretch" the + amount of random unpredictability present in the inputs. Thus four + inputs of 32 bits each where there are 12 bits worth of + unpredictability (such as 4,096 equally probable values) in each + input cannot produce more than 48 bits worth of unpredictable output. + The output can be expanded to hundreds or thousands of bits by, for + example, mixing with successive integers, but the clever adversary's + + + +Eastlake, Crocker & Schiller [Page 17] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + search space is still 2^48 possibilities. Furthermore, mixing to + fewer bits than are input will tend to strengthen the randomness of + the output the way using Exclusive Or to produce one bit from two did + above.
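The limit on "stretching" can be seen in a small experiment. The sketch below is an illustration only: it assumes SHA-256 as the mixing function, uses a deliberately tiny 12 bit seed so that exhaustive search finishes at once, and the names expand and recover_seed are hypothetical. The seed is mixed with successive integers to produce a long output, yet an adversary who sees that output needs at most 4,096 guesses to recover the seed.

```python
import hashlib

SEED_BITS = 12  # deliberately tiny so the exhaustive search is instant


def expand(seed: int, n_blocks: int) -> bytes:
    """Stretch a small seed by mixing it with successive integers."""
    out = b""
    for counter in range(n_blocks):
        block = seed.to_bytes(2, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
    return out


def recover_seed(observed: bytes, n_blocks: int) -> int:
    """Adversary's view: try every possible seed until the output matches."""
    for guess in range(2 ** SEED_BITS):
        if expand(guess, n_blocks) == observed:
            return guess
    raise ValueError("seed not in the assumed search space")


if __name__ == "__main__":
    stream = expand(1234, 4)  # 128 bytes of output from 12 bits of seed
    print(len(stream) * 8, "output bits, seed recovered:", recover_seed(stream, 4))
```

However long the expanded output is, the work needed to predict it is bounded by the size of the seed space, not by the number of output bits.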
+ + The last table in Section 6.1.1 shows that mixing a random bit with a + constant bit with Exclusive Or will produce a random bit. While this + is true, it does not provide a way to "stretch" one random bit into + more than one. If, for example, a random bit is mixed with a 0 and + then with a 1, this produces a two bit sequence but it will always be + either 01 or 10. Since there are only two possible values, there is + still only the one bit of original randomness. + +6.1.5 Other Factors in Choosing a Mixing Function + + For local use, DES has the advantages that it has been widely tested + for flaws, is widely documented, and is widely implemented with + hardware and software implementations available all over the world + including source code available by anonymous FTP. The SHS and MD* + family are younger algorithms which have been less tested but there + is no particular reason to believe they are flawed. Both MD5 and SHS + were derived from the earlier MD4 algorithm. They all have source + code available by anonymous FTP [SHS, MD2, MD4, MD5]. + + DES and SHS have been vouched for by the US National Security Agency + (NSA) on the basis of criteria that primarily remain secret. While + this is the cause of much speculation and doubt, investigation of DES + over the years has indicated that NSA involvement in modifications to + its design, which originated with IBM, was primarily to strengthen + it. No concealed or special weakness has been found in DES. It is + almost certain that the NSA modification to MD4 to produce the SHS + similarly strengthened the algorithm, possibly against threats not + yet known in the public cryptographic community. + + DES, SHS, MD4, and MD5 are royalty free for all purposes. MD2 has + been freely licensed only for non-profit use in connection with + Privacy Enhanced Mail [PEM].
Between the MD* algorithms, some people + believe that, as with "Goldilocks and the Three Bears", MD2 is strong + but too slow, MD4 is fast but too weak, and MD5 is just right. + + Another advantage of the MD* or similar hashing algorithms over + encryption algorithms is that they are not subject to the same + regulations imposed by the US Government prohibiting the unlicensed + export or import of encryption/decryption software and hardware. The + same should be true of DES rigged to produce an irreversible hash + code but most DES packages are oriented to reversible encryption. + + + + + +Eastlake, Crocker & Schiller [Page 18] + +RFC 1750 Randomness Recommendations for Security December 1994 + + +6.2 Non-Hardware Sources of Randomness + + The best source of input for mixing would be a hardware randomness + source such as disk drive timing affected by air turbulence, audio input + with thermal noise, or radioactive decay. However, if that is not + available there are other possibilities. These include system + clocks, system or input/output buffers, user/system/hardware/network + serial numbers and/or addresses and timing, and user input. + Unfortunately, any of these sources can produce limited or + predictable values under some circumstances. + + Some of the sources listed above would be quite strong on multi-user + systems where, in essence, each user of the system is a source of + randomness. However, on a small single user system, such as a + typical IBM PC or Apple Macintosh, it might be possible for an + adversary to assemble a similar configuration. This could give the + adversary inputs to the mixing process that were sufficiently + correlated to those used originally as to make exhaustive search + practical. + + The use of multiple random inputs with a strong mixing function is + recommended and can overcome weakness in any particular input.
For + example, the timing and content of requested "random" user keystrokes + can yield hundreds of random bits but conservative assumptions need + to be made. For example, assuming a few bits of randomness if the + inter-keystroke interval is unique in the sequence up to that point + and a similar assumption if the key hit is unique but assuming that + no bits of randomness are present in the initial key value or if the + timing or key value duplicate previous values. The results of mixing + these timings and characters typed could be further combined with + clock values and other inputs. + + This strategy may make practical portable code to produce good random + numbers for security even if some of the inputs are very weak on some + of the target systems. However, it may still fail against a high + grade attack on small single user systems, especially if the + adversary has ever been able to observe the generation process in the + past. A hardware based random source is still preferable. + +6.3 Cryptographically Strong Sequences + + In cases where a series of random quantities must be generated, an + adversary may learn some values in the sequence. In general, they + should not be able to predict other values from the ones that they + know. + + + + + + +Eastlake, Crocker & Schiller [Page 19] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + The correct technique is to start with a strong random seed, take + cryptographically strong steps from that seed [CRYPTO2, CRYPTO3], and + do not reveal the complete state of the generator in the sequence + elements. If each value in the sequence can be calculated in a fixed + way from the previous value, then when any value is compromised, all + future values can be determined. This would be the case, for + example, if each value were a constant function of the previously + used values, even if the function were a very strong, non-invertible + message digest function. 
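The danger of a sequence whose next value is a fixed function of the previous value can be shown directly. The following sketch is illustrative only: SHA-256 is an assumed stand-in for a strong message digest, and the function names and the "out"/"next" labels are hypothetical. In the weak generator each published value is the whole generator state, so one leaked value yields every later one. In the masked generator the state stays hidden and only part of a separate digest is published.

```python
import hashlib


def digest(data: bytes) -> bytes:
    """Assumed stand-in for a strong message digest function."""
    return hashlib.sha256(data).digest()


def weak_sequence(seed: bytes, count: int) -> list:
    """Each published value IS the full state: value[i+1] = digest(value[i])."""
    values = []
    state = digest(seed)
    for _ in range(count):
        values.append(state)
        state = digest(state)
    return values


def masked_sequence(seed: bytes, count: int, reveal: int = 8) -> list:
    """Hidden state advances on its own; only `reveal` bytes are published."""
    values = []
    state = digest(seed)
    for _ in range(count):
        values.append(digest(b"out" + state)[:reveal])
        state = digest(b"next" + state)
    return values


if __name__ == "__main__":
    weak = weak_sequence(b"secret seed", 5)
    # An adversary who learns weak[2] computes weak[3] without knowing the seed.
    print(digest(weak[2]) == weak[3])
```

The strength of the digest does not help the weak generator, because the adversary never needs to invert it, only to run it forward.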
+ + It should be noted that if your technique for generating a sequence + of key values is fast enough, it can trivially be used as the basis + for a confidentiality system. If two parties use the same sequence + generating technique and start with the same seed material, they will + generate identical sequences. These could, for example, be xor'ed at + one end with data being sent, encrypting it, and xor'ed with this + data as received, decrypting it due to the reversible properties of + the xor operation. + +6.3.1 Traditional Strong Sequences + + A traditional way to achieve a strong sequence has been to have the + values be produced by hashing the quantities produced by + concatenating the seed with successive integers or the like and then + mask the values obtained so as to limit the amount of generator state + available to the adversary. + + It may also be possible to use an "encryption" algorithm with a + random key and seed value to encrypt and feed back some or all of the + output encrypted value into the value to be encrypted for the next + iteration. Appropriate feedback techniques will usually be + recommended with the encryption algorithm. An example is shown below + where shifting and masking are used to combine the cypher output + feedback. This type of feedback is recommended by the US Government + in connection with DES [DES MODES].
+ + + + + + + + + + + + + + + + +Eastlake, Crocker & Schiller [Page 20] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + +---------------+ + | V | + | | n | + +--+------------+ + | | +---------+ + | +---------> | | +-----+ + +--+ | Encrypt | <--- | Key | + | +-------- | | +-----+ + | | +---------+ + V V + +------------+--+ + | V | | + | n+1 | + +---------------+ + + Note that if a shift of one is used, this is the same as the shift + register technique described in Section 3 above but with the all + important difference that the feedback is determined by a complex + non-linear function of all bits rather than a simple linear or + polynomial combination of output from a few bit position taps. + + It has been shown by Donald W. Davies that this sort of shifted + partial output feedback significantly weakens an algorithm compared + with feeding all of the output bits back as input. In particular, + for DES, repeatedly encrypting a full 64 bit quantity will give an + expected repeat in about 2^63 iterations. Feeding back anything less + than 64 (and more than 0) bits will give an expected repeat in + between 2^31 and 2^32 iterations! + + To predict values of a sequence from others when the sequence was + generated by these techniques is equivalent to breaking the + cryptosystem or inverting the "non-invertible" hashing involved with + only partial information available. The less information revealed + each iteration, the harder it will be for an adversary to predict the + sequence. Thus it is best to use only one bit from each value. It + has been shown that in some cases this makes it impossible to break a + system even when the cryptographic system is invertible and can be + broken if all of each generated value was revealed. + +6.3.2 The Blum Blum Shub Sequence Generator + + Currently the generator which has the strongest public proof of + strength is called the Blum Blum Shub generator after its inventors + [BBS].
It is also very simple and is based on quadratic residues. + Its only disadvantage is that it is computationally intensive + compared with the traditional techniques given in 6.3.1 above. This + is not a serious drawback if it is used for moderately infrequent + purposes, such as generating session keys. + + + +Eastlake, Crocker & Schiller [Page 21] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + Simply choose two large prime numbers, say p and q, which both have + the property that you get a remainder of 3 if you divide them by 4. + Let n = p * q. Then you choose a random number x relatively prime to + n. The initial seed for the generator and the method for calculating + subsequent values are then + + s_0 = ( x^2 )(Mod n) + + s_(i+1) = ( (s_i)^2 )(Mod n) + + You must be careful to use only a few bits from the bottom of each s. + It is always safe to use only the lowest order bit. If you use no + more than the + + log2 ( log2 ( s_i ) ) + + low order bits, then predicting any additional bits from a sequence + generated in this manner is provably as hard as factoring n. As long + as the initial x is secret, you can even make n public if you want. + + An interesting characteristic of this generator is that you can + directly calculate any of the s values. In particular + + s_i = ( (s_0)^( (2^i)(Mod (( p - 1 ) * ( q - 1 )) ) ) )(Mod n) + + This means that in applications where many keys are generated in this + fashion, it is not necessary to save them all. Each key can be + effectively indexed and recovered from that small index and the + initial s and n. + +7. Key Generation Standards + + Several public standards are now in place for the generation of keys. + Two of these are described below. Both use DES but any equally + strong or stronger mixing function could be substituted.
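Returning briefly to the Blum Blum Shub generator of Section 6.3.2, its iteration and its direct-indexing property can be sketched in a few lines. This is a toy illustration, not a usable generator: the primes 499 and 547 (both leaving remainder 3 when divided by 4) and the starting value 12345 are assumptions chosen only so the arithmetic is easy to follow, the function names are hypothetical, and real use requires large secret primes.

```python
from math import gcd


def bbs_states(p: int, q: int, x: int, count: int) -> list:
    """Return s_0 .. s_count, where s_0 = x^2 mod n and s_(i+1) = s_i^2 mod n."""
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must both leave remainder 3 when divided by 4")
    n = p * q
    if gcd(x, n) != 1:
        raise ValueError("x must be relatively prime to n")
    s = pow(x, 2, n)
    states = [s]
    for _ in range(count):
        s = pow(s, 2, n)
        states.append(s)
    return states


def bbs_bits(p: int, q: int, x: int, count: int) -> list:
    """Use only the lowest order bit of each s value."""
    return [s & 1 for s in bbs_states(p, q, x, count)]


def bbs_direct(p: int, q: int, x: int, i: int) -> int:
    """Compute s_i directly from s_0 using the exponent 2^i mod ((p-1)*(q-1))."""
    n = p * q
    s0 = pow(x, 2, n)
    exponent = pow(2, i, (p - 1) * (q - 1))
    return pow(s0, exponent, n)


if __name__ == "__main__":
    P, Q, X = 499, 547, 12345  # toy values, far too small for real use
    print(bbs_bits(P, Q, X, 16))
    print(bbs_direct(P, Q, X, 10) == bbs_states(P, Q, X, 10)[10])
```

The direct form needs p and q, so only the holder of the factorization can jump to an arbitrary index; anyone knowing n alone can still step the sequence forward from a known s value.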
+ + + + + + + + +Eastlake, Crocker & Schiller [Page 22] + +RFC 1750 Randomness Recommendations for Security December 1994 + + +7.1 US DoD Recommendations for Password Generation + + The United States Department of Defense has specific recommendations + for password generation [DoD]. They suggest using the US Data + Encryption Standard [DES] in Output Feedback Mode [DES MODES] as + follows: + + use an initialization vector determined from + the system clock, + system ID, + user ID, and + date and time; + use a key determined from + system interrupt registers, + system status registers, and + system counters; and, + as plain text, use an external randomly generated 64 bit + quantity such as 8 characters typed in by a system + administrator. + + The password can then be calculated from the 64 bit "cipher text" + generated in 64-bit Output Feedback Mode. As many bits as are needed + can be taken from these 64 bits and expanded into a pronounceable + word, phrase, or other format if a human being needs to remember the + password. + +7.2 X9.17 Key Generation + + The American National Standards Institute has specified a method for + generating a sequence of keys as follows: + + s_0 is the initial 64 bit seed + + g_n is the sequence of generated 64 bit key quantities + + k is a random key reserved for generating this key sequence + + t is the time at which a key is generated to as fine a resolution + as is available (up to 64 bits). + + DES ( K, Q ) is the DES encryption of quantity Q with key K + + + + + + + + +Eastlake, Crocker & Schiller [Page 23] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + g_n = DES ( k, DES ( k, t ) .xor. s_n ) + + s_(n+1) = DES ( k, DES ( k, t ) .xor. g_n ) + + If g_n is to be used as a DES key, then every eighth bit should + be adjusted for parity for that use but the entire 64 bit unmodified + g should be used in calculating the next s. + +8.
Examples of Randomness Required + + Below are two examples showing rough calculations of needed + randomness for security. The first is for moderate security + passwords while the second assumes a need for a very high security + cryptographic key. + +8.1 Password Generation + + Assume that user passwords change once a year and it is desired that + the probability that an adversary could guess the password for a + particular account be less than one in a thousand. Further assume + that sending a password to the system is the only way to try a + password. Then the crucial question is how often an adversary can + try possibilities. Assume that delays have been introduced into a + system so that, at most, an adversary can make one password try every + six seconds. That's 600 per hour or about 15,000 per day or about + 5,000,000 tries in a year. Assuming any sort of monitoring, it is + unlikely someone could actually try continuously for a year. In + fact, even if log files are only checked monthly, 500,000 tries is + more plausible before the attack is noticed and steps taken to change + passwords and make it harder to try more passwords. + + To have a one in a thousand chance of guessing the password in + 500,000 tries implies a universe of at least 500,000,000 passwords or + about 2^29. Thus 29 bits of randomness are needed. This can probably + be achieved using the US DoD recommended inputs for password + generation as it has 8 inputs which probably average over 5 bits of + randomness each (see section 7.1). Using a list of 1000 words, the + password could be expressed as a three word phrase (1,000,000,000 + possibilities) or, using case insensitive letters and digits, six + would suffice ((26+10)^6 = 2,176,782,336 possibilities). + + For a higher security password, the number of bits required goes up. + To decrease the probability by 1,000 requires increasing the universe + of passwords by the same factor which adds about 10 bits. 
Thus to + have only a one in a million chance of a password being guessed under + the above scenario would require 39 bits of randomness and a password + + + +Eastlake, Crocker & Schiller [Page 24] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + that was a four word phrase from a 1000 word list or eight + letters/digits. To go to a one in 10^9 chance, 49 bits of randomness + are needed implying a five word phrase or ten letter/digit password. + + In a real system, of course, there are also other factors. For + example, the larger and harder to remember passwords are, the more + likely users are to write them down resulting in an additional risk + of compromise. + +8.2 A Very High Security Cryptographic Key + + Assume that a very high security key is needed for symmetric + encryption / decryption between two parties. Assume an adversary can + observe communications and knows the algorithm being used. Within + the field of random possibilities, the adversary can try key values + in hopes of finding the one in use. Assume further that brute force + trial of keys is the best the adversary can do. + +8.2.1 Effort per Key Trial + + How much effort will it take to try each key? For very high security + applications it is best to assume a low value of effort. Even if it + would clearly take tens of thousands of computer cycles or more to + try a single key, there may be some pattern that enables huge blocks + of key values to be tested with much less effort per key. Thus it is + probably best to assume no more than a couple hundred cycles per key. + (There is no clear lower bound on this as computers operate in + parallel on a number of bits and a poor encryption algorithm could + allow many keys or even groups of keys to be tested in parallel. + However, we need to assume some value and can hope that a reasonably + strong algorithm has been chosen for our hypothetical high security + task.) 
+ + If the adversary can command a highly parallel processor or a large + network of work stations, 2*10^10 cycles per second is probably a + minimum assumption for availability today. Looking forward just a + couple years, there should be at least an order of magnitude + improvement. Thus assuming 10^9 keys could be checked per second or + 3.6*10^12 per hour or 6*10^14 per week or 2.4*10^15 per month is + reasonable. This implies a need for a minimum of 51 bits of + randomness in keys to be sure they cannot be found in a month. Even + then it is possible that, a few years from now, a highly determined + and resourceful adversary could break the key in 2 weeks (on average + they need try only half the keys). + + + + + + + +Eastlake, Crocker & Schiller [Page 25] + +RFC 1750 Randomness Recommendations for Security December 1994 + + +8.2.2 Meet in the Middle Attacks + + If chosen or known plain text and the resulting encrypted text are + available, a "meet in the middle" attack is possible if the structure + of the encryption algorithm allows it. (In a known plain text + attack, the adversary knows all or part of the messages being + encrypted, possibly some standard header or trailer fields. In a + chosen plain text attack, the adversary can force some chosen plain + text to be encrypted, possibly by "leaking" an exciting text that + would then be sent by the adversary over an encrypted channel.) + + An oversimplified explanation of the meet in the middle attack is as + follows: the adversary can half-encrypt the known or chosen plain + text with all possible first half-keys, sort the output, then half- + decrypt the encoded text with all the second half-keys. If a match + is found, the full key can be assembled from the halves and used to + decrypt other parts of the message or other messages. At its best, + this type of attack can halve the exponent of the work required by + the adversary while adding a large but roughly constant factor of + effort.
To be assured of safety against this, a doubling of the + amount of randomness in the key to a minimum of 102 bits is required. + + The meet in the middle attack assumes that the cryptographic + algorithm can be decomposed in this way but we can not rule that out + without a deep knowledge of the algorithm. Even if a basic algorithm + is not subject to a meet in the middle attack, an attempt to produce + a stronger algorithm by applying the basic algorithm twice (or two + different algorithms sequentially) with different keys may gain less + added security than would be expected. Such a composite algorithm + would be subject to a meet in the middle attack. + + Enormous resources may be required to mount a meet in the middle + attack but they are probably within the range of the national + security services of a major nation. Essentially all nations spy on + other nations government traffic and several nations are believed to + spy on commercial traffic for economic advantage. + +8.2.3 Other Considerations + + Since we have not even considered the possibilities of special + purpose code breaking hardware or just how much of a safety margin we + want beyond our assumptions above, probably a good minimum for a very + high security cryptographic key is 128 bits of randomness which + implies a minimum key length of 128 bits. If the two parties agree + on a key by Diffie-Hellman exchange [D-H], then in principle only + half of this randomness would have to be supplied by each party. + However, there is probably some correlation between their random + inputs so it is probably best to assume that each party needs to + + + +Eastlake, Crocker & Schiller [Page 26] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + provide at least 96 bits worth of randomness for very high security + if Diffie-Hellman is used. 
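The key size reasoning of Sections 8.2.1 and 8.2.2 can be checked with a short calculation. The sketch below is only a worked example: it takes the rate of 10^9 key trials per second from the text, assumes a 30 day month, and uses hypothetical function names. It reports how many bits of randomness are needed so that exhausting the key space takes a full month, and then doubles that figure to allow for a meet in the middle attack.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30 day month


def bits_to_resist(keys_per_second: float, seconds: float) -> float:
    """Bits of key randomness for which a full search takes `seconds`."""
    return math.log2(keys_per_second * seconds)


def with_meet_in_the_middle(bits: float) -> float:
    """The attack can halve the work exponent, so double the bits required."""
    return 2 * bits


if __name__ == "__main__":
    base = bits_to_resist(1e9, SECONDS_PER_MONTH)
    print(f"one month of brute force: about {base:.1f} bits")
    print(f"with meet in the middle:  about {with_meet_in_the_middle(base):.1f} bits")
```

Both results are estimates that inherit every assumption about trial rate and attack time, which is why the text then adds a further safety margin.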
+ + This amount of randomness is beyond the limit of that in the inputs + recommended by the US DoD for password generation and could require + user typing timing, hardware random number generation, or other + sources. + + It should be noted that key length calculations such as those above + are controversial and depend on various assumptions about the + cryptographic algorithms in use. In some cases, a professional with + a deep knowledge of code breaking techniques and of the strength of + the algorithm in use could be satisfied with less than half of the + key size derived above. + +9. Conclusion + + Generation of unguessable "random" secret quantities for security use + is an essential but difficult task. + + We have shown that hardware techniques to produce such randomness + would be relatively simple. In particular, the volume and quality + would not need to be high and existing computer hardware, such as + disk drives, can be used. Computational techniques are available to + process low quality random quantities from multiple sources or a + larger quantity of such low quality input from one source and produce + a smaller quantity of higher quality, less predictable key material. + In the absence of hardware sources of randomness, a variety of user + and software sources can frequently be used instead with care; + however, most modern systems already have hardware, such as disk + drives or audio input, that could be used to produce high quality + randomness. + + Once a sufficient quantity of high quality seed key material (a few + hundred bits) is available, strong computational techniques are + available to produce cryptographically strong sequences of + unpredictable quantities from this seed material. + +10. Security Considerations + + The entirety of this document concerns techniques and recommendations + for generating unguessable "random" quantities for use as passwords, + cryptographic keys, and similar security uses.
+ + + + + + + + +Eastlake, Crocker & Schiller [Page 27] + +RFC 1750 Randomness Recommendations for Security December 1994 + + +References + + [ASYMMETRIC] - Secure Communications and Asymmetric Cryptosystems, + edited by Gustavus J. Simmons, AAAS Selected Symposium 69, Westview + Press, Inc. + + [BBS] - A Simple Unpredictable Pseudo-Random Number Generator, SIAM + Journal on Computing, v. 15, n. 2, 1986, L. Blum, M. Blum, & M. Shub. + + [BRILLINGER] - Time Series: Data Analysis and Theory, Holden-Day, + 1981, David Brillinger. + + [CRC] - C.R.C. Standard Mathematical Tables, Chemical Rubber + Publishing Company. + + [CRYPTO1] - Cryptography: A Primer, A Wiley-Interscience Publication, + John Wiley & Sons, 1981, Alan G. Konheim. + + [CRYPTO2] - Cryptography: A New Dimension in Computer Data Security, + A Wiley-Interscience Publication, John Wiley & Sons, 1982, Carl H. + Meyer & Stephen M. Matyas. + + [CRYPTO3] - Applied Cryptography: Protocols, Algorithms, and Source + Code in C, John Wiley & Sons, 1994, Bruce Schneier. + + [DAVIS] - Cryptographic Randomness from Air Turbulence in Disk + Drives, Advances in Cryptology - Crypto '94, Springer-Verlag Lecture + Notes in Computer Science #839, 1994, Don Davis, Ross Ihaka, and + Philip Fenstermacher. + + [DES] - Data Encryption Standard, United States of America, + Department of Commerce, National Institute of Standards and + Technology, Federal Information Processing Standard (FIPS) 46-1. + - Data Encryption Algorithm, American National Standards Institute, + ANSI X3.92-1981. + (See also FIPS 112, Password Usage, which includes FORTRAN code for + performing DES.) + + [DES MODES] - DES Modes of Operation, United States of America, + Department of Commerce, National Institute of Standards and + Technology, Federal Information Processing Standard (FIPS) 81. + - Data Encryption Algorithm - Modes of Operation, American National + Standards Institute, ANSI X3.106-1983.
+ + [D-H] - New Directions in Cryptography, IEEE Transactions on + Information Theory, November 1976, Whitfield Diffie and Martin + E. Hellman. + + + + +Eastlake, Crocker & Schiller [Page 28] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + [DoD] - Password Management Guideline, United States of America, + Department of Defense, Computer Security Center, CSC-STD-002-85. + (See also FIPS 112, Password Usage, which incorporates CSC-STD-002-85 + as one of its appendices.) + + [GIFFORD] - Natural Random Number, MIT/LCS/TM-371, September 1988, + David K. Gifford. + + [KNUTH] - The Art of Computer Programming, Volume 2: Seminumerical + Algorithms, Chapter 3: Random Numbers. Addison Wesley Publishing + Company, Second Edition 1982, Donald E. Knuth. + + [KRAWCZYK] - How to Predict Congruential Generators, Journal of + Algorithms, V. 13, N. 4, December 1992, H. Krawczyk. + + [MD2] - The MD2 Message-Digest Algorithm, RFC1319, April 1992, B. + Kaliski. + [MD4] - The MD4 Message-Digest Algorithm, RFC1320, April 1992, R. + Rivest. + [MD5] - The MD5 Message-Digest Algorithm, RFC1321, April 1992, R. + Rivest. + + [PEM] - RFCs 1421 through 1424: + - RFC 1424, Privacy Enhancement for Internet Electronic Mail: Part + IV: Key Certification and Related Services, 02/10/1993, B. Kaliski + - RFC 1423, Privacy Enhancement for Internet Electronic Mail: Part + III: Algorithms, Modes, and Identifiers, 02/10/1993, D. Balenson + - RFC 1422, Privacy Enhancement for Internet Electronic Mail: Part + II: Certificate-Based Key Management, 02/10/1993, S. Kent + - RFC 1421, Privacy Enhancement for Internet Electronic Mail: Part I: + Message Encryption and Authentication Procedures, 02/10/1993, J. Linn + + [SHANNON] - The Mathematical Theory of Communication, University of + Illinois Press, 1963, Claude E. Shannon. 
(originally from: Bell + System Technical Journal, July and October 1948) + + [SHIFT1] - Shift Register Sequences, Aegean Park Press, Revised + Edition 1982, Solomon W. Golomb. + + [SHIFT2] - Cryptanalysis of Shift-Register Generated Stream Cypher + Systems, Aegean Park Press, 1984, Wayne G. Barker. + + [SHS] - Secure Hash Standard, United States of America, National + Institute of Standards and Technology, Federal Information Processing + Standard (FIPS) 180, April 1993. + + [STERN] - Secret Linear Congruential Generators are not + Cryptographically Secure, Proceedings of IEEE STOC, 1987, J. Stern. + + + +Eastlake, Crocker & Schiller [Page 29] + +RFC 1750 Randomness Recommendations for Security December 1994 + + + [VON NEUMANN] - Various techniques used in connection with random + digits, von Neumann's Collected Works, Vol. 5, Pergamon Press, 1963, + J. von Neumann. + +Authors' Addresses + + Donald E. Eastlake 3rd + Digital Equipment Corporation + 550 King Street, LKG2-1/BB3 + Littleton, MA 01460 + + Phone: +1 508 486 6577(w) +1 508 287 4877(h) + EMail: dee@lkg.dec.com + + + Stephen D. Crocker + CyberCash Inc. + 2086 Hunters Crest Way + Vienna, VA 22181 + + Phone: +1 703-620-1222(w) +1 703-391-2651 (fax) + EMail: crocker@cybercash.com + + + Jeffrey I. Schiller + Massachusetts Institute of Technology + 77 Massachusetts Avenue + Cambridge, MA 02139 + + Phone: +1 617 253 0161(w) + EMail: jis@mit.edu + + + + + + + + + + + + + + + + + + + + +Eastlake, Crocker & Schiller [Page 30] + diff --git a/doc/rfc/rfc1876.txt b/doc/rfc/rfc1876.txt new file mode 100644 index 00000000..a289cffe --- /dev/null +++ b/doc/rfc/rfc1876.txt @@ -0,0 +1,1011 @@ + + + + + + +Network Working Group C. Davis +Request for Comments: 1876 Kapor Enterprises +Updates: 1034, 1035 P. Vixie +Category: Experimental Vixie Enterprises + T. Goodwin + FORE Systems + I. 
Dickinson + University of Warwick + January 1996 + + + A Means for Expressing Location Information in the Domain Name System + +Status of this Memo + + This memo defines an Experimental Protocol for the Internet + community. This memo does not specify an Internet standard of any + kind. Discussion and suggestions for improvement are requested. + Distribution of this memo is unlimited. + +1. Abstract + + This memo defines a new DNS RR type for experimental purposes. This + RFC describes a mechanism to allow the DNS to carry location + information about hosts, networks, and subnets. Such information for + a small subset of hosts is currently contained in the flat-file UUCP + maps. However, just as the DNS replaced the use of HOSTS.TXT to + carry host and network address information, it is possible to replace + the UUCP maps as carriers of location information. + + This RFC defines the format of a new Resource Record (RR) for the + Domain Name System (DNS), and reserves a corresponding DNS type + mnemonic (LOC) and numerical code (29). + + This RFC assumes that the reader is familiar with the DNS [RFC 1034, + RFC 1035]. The data shown in our examples is for pedagogical use and + does not necessarily reflect the real Internet. + + + + + + + + + + + + + + +Davis, et al Experimental [Page 1] + +RFC 1876 Location Information in the DNS January 1996 + + +2. 
RDATA Format + + MSB LSB + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 0| VERSION | SIZE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 2| HORIZ PRE | VERT PRE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 4| LATITUDE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 6| LATITUDE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 8| LONGITUDE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 10| LONGITUDE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 12| ALTITUDE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + 14| ALTITUDE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + (octet) + +where: + +VERSION Version number of the representation. This must be zero. + Implementations are required to check this field and make + no assumptions about the format of unrecognized versions. + +SIZE The diameter of a sphere enclosing the described entity, in + centimeters, expressed as a pair of four-bit unsigned + integers, each ranging from zero to nine, with the most + significant four bits representing the base and the second + number representing the power of ten by which to multiply + the base. This allows sizes from 0e0 (<1cm) to 9e9 + (90,000km) to be expressed. This representation was chosen + such that the hexadecimal representation can be read by + eye; 0x15 = 1e5. Four-bit values greater than 9 are + undefined, as are values with a base of zero and a non-zero + exponent. + + Since 20000000m (represented by the value 0x29) is greater + than the equatorial diameter of the WGS 84 ellipsoid + (12756274m), it is therefore suitable for use as a + "worldwide" size. + +HORIZ PRE The horizontal precision of the data, in centimeters, + expressed using the same representation as SIZE. This is + the diameter of the horizontal "circle of error", rather + + + +Davis, et al Experimental [Page 2] + +RFC 1876 Location Information in the DNS January 1996 + + + than a "plus or minus" value. 
(This was chosen to match + the interpretation of SIZE; to get a "plus or minus" value, + divide by 2.) + +VERT PRE The vertical precision of the data, in centimeters, + expressed using the same representation as for SIZE. This + is the total potential vertical error, rather than a "plus + or minus" value. (This was chosen to match the + interpretation of SIZE; to get a "plus or minus" value, + divide by 2.) Note that if altitude above or below sea + level is used as an approximation for altitude relative to + the [WGS 84] ellipsoid, the precision value should be + adjusted. + +LATITUDE The latitude of the center of the sphere described by the + SIZE field, expressed as a 32-bit integer, most significant + octet first (network standard byte order), in thousandths + of a second of arc. 2^31 represents the equator; numbers + above that are north latitude. + +LONGITUDE The longitude of the center of the sphere described by the + SIZE field, expressed as a 32-bit integer, most significant + octet first (network standard byte order), in thousandths + of a second of arc, rounded away from the prime meridian. + 2^31 represents the prime meridian; numbers above that are + east longitude. + +ALTITUDE The altitude of the center of the sphere described by the + SIZE field, expressed as a 32-bit integer, most significant + octet first (network standard byte order), in centimeters, + from a base of 100,000m below the [WGS 84] reference + spheroid used by GPS (semimajor axis a=6378137.0, + reciprocal flattening rf=298.257223563). Altitude above + (or below) sea level may be used as an approximation of + altitude relative to the [WGS 84] spheroid, though due + to the Earth's surface not being a perfect spheroid, there + will be differences. For example, the geoid (which sea + level approximates) for the continental US ranges from 10 + meters to 50 meters below the [WGS 84] spheroid. + Adjustments to ALTITUDE and/or VERT PRE will be necessary + in most cases. 
The Defense Mapping Agency publishes geoid + height values relative to the [WGS 84] ellipsoid. + + + + + + + + + +Davis, et al Experimental [Page 3] + +RFC 1876 Location Information in the DNS January 1996 + + +3. Master File Format + + The LOC record is expressed in a master file in the following format: + + <owner> <TTL> <class> LOC ( d1 [m1 [s1]] {"N"|"S"} d2 [m2 [s2]] + {"E"|"W"} alt["m"] [siz["m"] [hp["m"] + [vp["m"]]]] ) + + (The parentheses are used for multi-line data as specified in [RFC + 1035] section 5.1.) + + where: + + d1: [0 .. 90] (degrees latitude) + d2: [0 .. 180] (degrees longitude) + m1, m2: [0 .. 59] (minutes latitude/longitude) + s1, s2: [0 .. 59.999] (seconds latitude/longitude) + alt: [-100000.00 .. 42849672.95] BY .01 (altitude in meters) + siz, hp, vp: [0 .. 90000000.00] (size/precision in meters) + + If omitted, minutes and seconds default to zero, size defaults to 1m, + horizontal precision defaults to 10000m, and vertical precision + defaults to 10m. These defaults are chosen to represent typical + ZIP/postal code area sizes, since it is often easy to find + approximate geographical location by ZIP/postal code. + +4. Example Data + +;;; +;;; note that these data would not all appear in one zone file +;;; + +;; network LOC RR derived from ZIP data. note use of precision defaults +cambridge-net.kei.com. LOC 42 21 54 N 71 06 18 W -24m 30m + +;; higher-precision host LOC RR. note use of vertical precision default +loiosh.kei.com. LOC 42 21 43.952 N 71 5 6.344 W + -24m 1m 200m + +pipex.net. LOC 52 14 05 N 00 08 50 E 10m + +curtin.edu.au. LOC 32 7 19 S 116 2 25 E 10m + +rwy04L.logan-airport.boston. LOC 42 21 28.764 N 71 00 51.617 W + -44m 2000m + + + + + + +Davis, et al Experimental [Page 4] + +RFC 1876 Location Information in the DNS January 1996 + + +5. 
Application use of the LOC RR + +5.1 Suggested Uses + + Some uses for the LOC RR have already been suggested, including the + USENET backbone flow maps, a "visual traceroute" application showing + the geographical path of an IP packet, and network management + applications that could use LOC RRs to generate a map of hosts and + routers being managed. + +5.2 Search Algorithms + + This section specifies how to use the DNS to translate domain names + and/or IP addresses into location information. + + If an application wishes to have a "fallback" behavior, displaying a + less precise or larger area when a host does not have an associated + LOC RR, it MAY support use of the algorithm in section 5.2.3, as + noted in sections 5.2.1 and 5.2.2. If fallback is desired, this + behaviour is the RECOMMENDED default, but in some cases it may need + to be modified based on the specific requirements of the application + involved. + + This search algorithm is designed to allow network administrators to + specify the location of a network or subnet without requiring LOC RR + data for each individual host. For example, a computer lab with 24 + workstations, all of which are on the same subnet and in basically + the same location, would only need a LOC RR for the subnet. + (However, if the file server's location has been more precisely + measured, a separate LOC RR for it can be placed in the DNS.) + +5.2.1 Searching by Name + + If the application is beginning with a name, rather than an IP + address (as the USENET backbone flow maps do), it MUST check for a + LOC RR associated with that name. (CNAME records should be followed + as for any other RR type.) + + If there is no LOC RR for that name, all A records (if any) + associated with the name MAY be checked for network (or subnet) LOC + RRs using the "Searching by Network or Subnet" algorithm (5.2.3). 
If + multiple A records exist and have associated network or subnet LOC + RRs, the application may choose to use any, some, or all of the LOC + RRs found, possibly in combination. It is suggested that multi-homed + hosts have LOC RRs for their name in the DNS to avoid any ambiguity + in these cases. + + + + + +Davis, et al Experimental [Page 5] + +RFC 1876 Location Information in the DNS January 1996 + + + Note that domain names that do not have associated A records must + have a LOC RR associated with their name in order for location + information to be accessible. + +5.2.2 Searching by Address + + If the application is beginning with an IP address (as a "visual + traceroute" application might be) it MUST first map the address to a + name using the IN-ADDR.ARPA namespace (see [RFC 1034], section + 5.2.1), then check for a LOC RR associated with that name. + + If there is no LOC RR for the name, the address MAY be checked for + network (or subnet) LOC RRs using the "Searching by Network or + Subnet" algorithm (5.2.3). + +5.2.3 Searching by Network or Subnet + + Even if a host's name does not have any associated LOC RRs, the + network(s) or subnet(s) it is on may. If the application wishes to + search for such less specific data, the following algorithm SHOULD be + followed to find a network or subnet LOC RR associated with the IP + address. This algorithm is adapted slightly from that specified in + [RFC 1101], sections 4.3 and 4.4. + + Since subnet LOC RRs are (if present) more specific than network LOC + RRs, it is best to use them if available. In order to do so, we + build a stack of network and subnet names found while performing the + [RFC 1101] search, then work our way down the stack until a LOC RR is + found. + + 1. create a host-zero address using the network portion of the IP + address (one, two, or three bytes for class A, B, or C networks, + respectively). 
For example, for the host 128.9.2.17, on the class + B network 128.9, this would result in the address "128.9.0.0". + + 2. Reverse the octets, suffix IN-ADDR.ARPA, and query for PTR and A + records. Retrieve: + + 0.0.9.128.IN-ADDR.ARPA. PTR isi-net.isi.edu. + A 255.255.255.0 + + Push the name "isi-net.isi.edu" onto the stack of names to be + searched for LOC RRs later. + + + + + + + + +Davis, et al Experimental [Page 6] + +RFC 1876 Location Information in the DNS January 1996 + + + 3. Since an A RR was found, repeat using mask from RR + (255.255.255.0), constructing a query for 0.2.9.128.IN-ADDR.ARPA. + Retrieve: + + 0.2.9.128.IN-ADDR.ARPA. PTR div2-subnet.isi.edu. + A 255.255.255.240 + + Push the name "div2-subnet.isi.edu" onto the stack of names to be + searched for LOC RRs later. + + 4. Since another A RR was found, repeat using mask 255.255.255.240 + (x'FFFFFFF0'), constructing a query for 16.2.9.128.IN-ADDR.ARPA. + Retrieve: + + 16.2.9.128.IN-ADDR.ARPA. PTR inc-subsubnet.isi.edu. + + Push the name "inc-subsubnet.isi.edu" onto the stack of names to + be searched for LOC RRs later. + + 5. Since no A RR is present at 16.2.9.128.IN-ADDR.ARPA., there are no + more subnet levels to search. We now pop the top name from the + stack and check for an associated LOC RR. Repeat until a LOC RR + is found. + + In this case, assume that inc-subsubnet.isi.edu does not have an + associated LOC RR, but that div2-subnet.isi.edu does. We will + then use div2-subnet.isi.edu's LOC RR as an approximation of this + host's location. (Note that even if isi-net.isi.edu has a LOC RR, + it will not be used if a subnet also has a LOC RR.) + +5.3 Applicability to non-IN Classes and non-IP Addresses + + The LOC record is defined for all RR classes, and may be used with + non-IN classes such as HS and CH. The semantics of such use are not + defined by this memo. 
+ + The search algorithm in section 5.2.3 may be adapted to other + addressing schemes by extending [RFC 1101]'s encoding of network + names to cover those schemes. Such extensions are not defined by + this memo. + + + + + + + + + + + +Davis, et al Experimental [Page 7] + +RFC 1876 Location Information in the DNS January 1996 + + +6. References + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and Facilities", + STD 13, RFC 1034, USC/Information Sciences Institute, + November 1987. + + [RFC 1035] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, USC/Information Sciences + Institute, November 1987. + + [RFC 1101] Mockapetris, P., "DNS Encoding of Network Names and Other + Types", RFC 1101, USC/Information Sciences Institute, + April 1989. + + [WGS 84] United States Department of Defense; DoD WGS-1984 - Its + Definition and Relationships with Local Geodetic Systems; + Washington, D.C.; 1985; Report AD-A188 815 DMA; 6127; 7-R- + 138-R; CV, KV; + +7. Security Considerations + + High-precision LOC RR information could be used to plan a penetration + of physical security, leading to potential denial-of-machine attacks. + To avoid any appearance of suggesting this method to potential + attackers, we declined the opportunity to name this RR "ICBM". + +8. Authors' Addresses + + The authors as a group can be reached as <loc@pipex.net>. + + Christopher Davis + Kapor Enterprises, Inc. 
+ 238 Main Street, Suite 400 + Cambridge, MA 02142 + + Phone: +1 617 576 4532 + EMail: ckd@kei.com + + + Paul Vixie + Vixie Enterprises + Star Route Box 159A + Woodside, CA 94062 + + Phone: +1 415 747 0204 + EMail: paul@vix.com + + + + + +Davis, et al Experimental [Page 8] + +RFC 1876 Location Information in the DNS January 1996 + + + Tim Goodwin + Public IP Exchange Ltd (PIPEX) + 216 The Science Park + Cambridge CB4 4WA + UK + + Phone: +44 1223 250250 + EMail: tim@pipex.net + + + Ian Dickinson + FORE Systems + 2475 The Crescent + Solihull Parkway + Birmingham Business Park + B37 7YE + UK + + Phone: +44 121 717 4444 + EMail: idickins@fore.co.uk + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Davis, et al Experimental [Page 9] + +RFC 1876 Location Information in the DNS January 1996 + + +Appendix A: Sample Conversion Routines + +/* + * routines to convert between on-the-wire RR format and zone file + * format. Does not contain conversion to/from decimal degrees; + * divide or multiply by 60*60*1000 for that. + */ + +static unsigned int poweroften[10] = {1, 10, 100, 1000, 10000, 100000, + 1000000,10000000,100000000,1000000000}; + +/* takes an XeY precision/size value, returns a string representation.*/ +static const char * +precsize_ntoa(prec) + u_int8_t prec; +{ + static char retbuf[sizeof("90000000.00")]; + unsigned long val; + int mantissa, exponent; + + mantissa = (int)((prec >> 4) & 0x0f) % 10; + exponent = (int)((prec >> 0) & 0x0f) % 10; + + val = mantissa * poweroften[exponent]; + + /* val is unsigned long, so print it with %lu rather than %d */ + (void) sprintf(retbuf,"%lu.%.2lu", val/100, val%100); + return (retbuf); +} + +/* converts ascii size/precision X * 10**Y(cm) to 0xXY. 
moves pointer.*/ +static u_int8_t +precsize_aton(strptr) + char **strptr; +{ + unsigned int mval = 0, cmval = 0; + u_int8_t retval = 0; + register char *cp; + register int exponent; + register int mantissa; + + cp = *strptr; + + while (isdigit(*cp)) + mval = mval * 10 + (*cp++ - '0'); + + if (*cp == '.') { /* centimeters */ + cp++; + if (isdigit(*cp)) { + + + +Davis, et al Experimental [Page 10] + +RFC 1876 Location Information in the DNS January 1996 + + + cmval = (*cp++ - '0') * 10; + if (isdigit(*cp)) { + cmval += (*cp++ - '0'); + } + } + } + cmval = (mval * 100) + cmval; + + for (exponent = 0; exponent < 9; exponent++) + if (cmval < poweroften[exponent+1]) + break; + + mantissa = cmval / poweroften[exponent]; + if (mantissa > 9) + mantissa = 9; + + retval = (mantissa << 4) | exponent; + + *strptr = cp; + + return (retval); +} + +/* converts ascii lat/lon to unsigned encoded 32-bit number. + * moves pointer. */ +static u_int32_t +latlon2ul(latlonstrptr,which) + char **latlonstrptr; + int *which; +{ + register char *cp; + u_int32_t retval; + int deg = 0, min = 0, secs = 0, secsfrac = 0; + + cp = *latlonstrptr; + + while (isdigit(*cp)) + deg = deg * 10 + (*cp++ - '0'); + + while (isspace(*cp)) + cp++; + + if (!(isdigit(*cp))) + goto fndhemi; + + while (isdigit(*cp)) + min = min * 10 + (*cp++ - '0'); + + + + +Davis, et al Experimental [Page 11] + +RFC 1876 Location Information in the DNS January 1996 + + + while (isspace(*cp)) + cp++; + + if (!(isdigit(*cp))) + goto fndhemi; + + while (isdigit(*cp)) + secs = secs * 10 + (*cp++ - '0'); + + if (*cp == '.') { /* decimal seconds */ + cp++; + if (isdigit(*cp)) { + secsfrac = (*cp++ - '0') * 100; + if (isdigit(*cp)) { + secsfrac += (*cp++ - '0') * 10; + if (isdigit(*cp)) { + secsfrac += (*cp++ - '0'); + } + } + } + } + + while (!isspace(*cp)) /* if any trailing garbage */ + cp++; + + while (isspace(*cp)) + cp++; + + fndhemi: + switch (*cp) { + case 'N': case 'n': + case 'E': case 'e': + retval = ((unsigned)1<<31) + + 
(((((deg * 60) + min) * 60) + secs) * 1000) + + secsfrac; + break; + case 'S': case 's': + case 'W': case 'w': + retval = ((unsigned)1<<31) + - (((((deg * 60) + min) * 60) + secs) * 1000) + - secsfrac; + break; + default: + retval = 0; /* invalid value -- indicates error */ + break; + } + + switch (*cp) { + + + +Davis, et al Experimental [Page 12] + +RFC 1876 Location Information in the DNS January 1996 + + + case 'N': case 'n': + case 'S': case 's': + *which = 1; /* latitude */ + break; + case 'E': case 'e': + case 'W': case 'w': + *which = 2; /* longitude */ + break; + default: + *which = 0; /* error */ + break; + } + + cp++; /* skip the hemisphere */ + + while (!isspace(*cp)) /* if any trailing garbage */ + cp++; + + while (isspace(*cp)) /* move to next field */ + cp++; + + *latlonstrptr = cp; + + return (retval); +} + +/* converts a zone file representation in a string to an RDATA + * on-the-wire representation. */ +u_int32_t +loc_aton(ascii, binary) + const char *ascii; + u_char *binary; +{ + const char *cp, *maxcp; + u_char *bcp; + + u_int32_t latit = 0, longit = 0, alt = 0; + u_int32_t lltemp1 = 0, lltemp2 = 0; + int altmeters = 0, altfrac = 0, altsign = 1; + u_int8_t hp = 0x16; /* default = 1e6 cm = 10000.00m = 10km */ + u_int8_t vp = 0x13; /* default = 1e3 cm = 10.00m */ + u_int8_t siz = 0x12; /* default = 1e2 cm = 1.00m */ + int which1 = 0, which2 = 0; + + cp = ascii; + maxcp = cp + strlen(ascii); + + lltemp1 = latlon2ul(&cp, &which1); + + + +Davis, et al Experimental [Page 13] + +RFC 1876 Location Information in the DNS January 1996 + + + lltemp2 = latlon2ul(&cp, &which2); + + switch (which1 + which2) { + case 3: /* 1 + 2, the only valid combination */ + if ((which1 == 1) && (which2 == 2)) { /* normal case */ + latit = lltemp1; + longit = lltemp2; + } else if ((which1 == 2) && (which2 == 1)) {/*reversed*/ + longit = lltemp1; + latit = lltemp2; + } else { /* some kind of brokenness */ + return 0; + } + break; + default: /* we didn't get one of each */ + 
return 0; + } + + /* altitude */ + if (*cp == '-') { + altsign = -1; + cp++; + } + + if (*cp == '+') + cp++; + + while (isdigit(*cp)) + altmeters = altmeters * 10 + (*cp++ - '0'); + + if (*cp == '.') { /* decimal meters */ + cp++; + if (isdigit(*cp)) { + altfrac = (*cp++ - '0') * 10; + if (isdigit(*cp)) { + altfrac += (*cp++ - '0'); + } + } + } + + alt = (10000000 + (altsign * (altmeters * 100 + altfrac))); + + while (!isspace(*cp) && (cp < maxcp)) + /* if trailing garbage or m */ + cp++; + + while (isspace(*cp) && (cp < maxcp)) + cp++; + + + +Davis, et al Experimental [Page 14] + +RFC 1876 Location Information in the DNS January 1996 + + + if (cp >= maxcp) + goto defaults; + + siz = precsize_aton(&cp); + + while (!isspace(*cp) && (cp < maxcp))/*if trailing garbage or m*/ + cp++; + + while (isspace(*cp) && (cp < maxcp)) + cp++; + + if (cp >= maxcp) + goto defaults; + + hp = precsize_aton(&cp); + + while (!isspace(*cp) && (cp < maxcp))/*if trailing garbage or m*/ + cp++; + + while (isspace(*cp) && (cp < maxcp)) + cp++; + + if (cp >= maxcp) + goto defaults; + + vp = precsize_aton(&cp); + + defaults: + + bcp = binary; + *bcp++ = (u_int8_t) 0; /* version byte */ + *bcp++ = siz; + *bcp++ = hp; + *bcp++ = vp; + PUTLONG(latit,bcp); + PUTLONG(longit,bcp); + PUTLONG(alt,bcp); + + return (16); /* size of RR in octets */ +} + +/* takes an on-the-wire LOC RR and prints it in zone file + * (human readable) format. 
*/ +char * +loc_ntoa(binary,ascii) + const u_char *binary; + char *ascii; +{ + + + +Davis, et al Experimental [Page 15] + +RFC 1876 Location Information in the DNS January 1996 + + + static char tmpbuf[255*3]; + + register char *cp; + register const u_char *rcp; + + int latdeg, latmin, latsec, latsecfrac; + int longdeg, longmin, longsec, longsecfrac; + char northsouth, eastwest; + int altmeters, altfrac, altsign; + + const int referencealt = 100000 * 100; + + int32_t latval, longval, altval; + u_int32_t templ; + u_int8_t sizeval, hpval, vpval, versionval; + + char *sizestr, *hpstr, *vpstr; + + rcp = binary; + if (ascii) + cp = ascii; + else { + cp = tmpbuf; + } + + versionval = *rcp++; + + if (versionval) { + sprintf(cp,"; error: unknown LOC RR version"); + return (cp); + } + + sizeval = *rcp++; + + hpval = *rcp++; + vpval = *rcp++; + + GETLONG(templ,rcp); + latval = (templ - ((unsigned)1<<31)); + + GETLONG(templ,rcp); + longval = (templ - ((unsigned)1<<31)); + + GETLONG(templ,rcp); + if (templ < referencealt) { /* below WGS 84 spheroid */ + altval = referencealt - templ; + altsign = -1; + } else { + + + +Davis, et al Experimental [Page 16] + +RFC 1876 Location Information in the DNS January 1996 + + + altval = templ - referencealt; + altsign = 1; + } + + if (latval < 0) { + northsouth = 'S'; + latval = -latval; + } + else + northsouth = 'N'; + + latsecfrac = latval % 1000; + latval = latval / 1000; + latsec = latval % 60; + latval = latval / 60; + latmin = latval % 60; + latval = latval / 60; + latdeg = latval; + + if (longval < 0) { + eastwest = 'W'; + longval = -longval; + } + else + eastwest = 'E'; + + longsecfrac = longval % 1000; + longval = longval / 1000; + longsec = longval % 60; + longval = longval / 60; + longmin = longval % 60; + longval = longval / 60; + longdeg = longval; + + altfrac = altval % 100; + altmeters = (altval / 100) * altsign; + + sizestr = savestr(precsize_ntoa(sizeval)); + hpstr = savestr(precsize_ntoa(hpval)); + vpstr = 
savestr(precsize_ntoa(vpval)); + + sprintf(cp, + "%d %.2d %.2d.%.3d %c %d %.2d %.2d.%.3d %c %d.%.2dm + %sm %sm %sm", + latdeg, latmin, latsec, latsecfrac, northsouth, + longdeg, longmin, longsec, longsecfrac, eastwest, + altmeters, altfrac, sizestr, hpstr, vpstr); + + + + +Davis, et al Experimental [Page 17] + +RFC 1876 Location Information in the DNS January 1996 + + + free(sizestr); + free(hpstr); + free(vpstr); + + return (cp); +} + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Davis, et al Experimental [Page 18] + diff --git a/doc/rfc/rfc1982.txt b/doc/rfc/rfc1982.txt new file mode 100644 index 00000000..5a34bc42 --- /dev/null +++ b/doc/rfc/rfc1982.txt @@ -0,0 +1,394 @@ + + + + + + +Network Working Group R. Elz +Request for Comments: 1982 University of Melbourne +Updates: 1034, 1035 R. Bush +Category: Standards Track RGnet, Inc. + August 1996 + + + Serial Number Arithmetic + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + This memo defines serial number arithmetic, as used in the Domain + Name System. The DNS has long relied upon serial number arithmetic, + a concept which has never really been defined, certainly not in an + IETF document, though which has been widely understood. This memo + supplies the missing definition. It is intended to update RFC1034 + and RFC1035. + +1. Introduction + + The serial number field of the SOA resource record is defined in + RFC1035 as + + SERIAL The unsigned 32 bit version number of the original copy of + the zone. Zone transfers preserve this value. This value + wraps and should be compared using sequence space + arithmetic. 
+ + RFC1034 uses the same terminology when defining secondary server zone + consistency procedures. + + Unfortunately the term "sequence space arithmetic" is not defined in + either RFC1034 or RFC1035, nor do any of their references provide + further information. + + This phrase seems to have been intending to specify arithmetic as + used in TCP sequence numbers [RFC793], and defined in [IEN-74]. + + Unfortunately, the arithmetic defined in [IEN-74] is not adequate for + the purposes of the DNS, as no general comparison operator is + + + +Elz & Bush Standards Track [Page 1] + +RFC 1982 Serial Number Arithmetic August 1996 + + + defined. + + To avoid further problems with this simple field, this document + defines the field and the operations available upon it. This + definition is intended merely to clarify the intent of RFC1034 and + RFC1035, and is believed to generally agree with current + implementations. However, older, superseded, implementations are + known to have treated the serial number as a simple unsigned integer, + with no attempt to implement any kind of "sequence space arithmetic", + however that may have been interpreted, and further, ignoring the + requirement that the value wraps. Nothing can be done with these + implementations, beyond extermination. + +2. Serial Number Arithmetic + + Serial numbers are formed from non-negative integers from a finite + subset of the range of all integer values. The lowest integer in + every subset used for this purpose is zero, the maximum is always one + less than a power of two. + + When considered as serial numbers however no value has any particular + significance, there is no minimum or maximum serial number, every + value has a successor and predecessor. + + To define a serial number to be used in this way, the size of the + serial number space must be given. 
This value, called "SERIAL_BITS", + gives the power of two which results in one larger than the largest + integer corresponding to a serial number value. This also specifies + the number of bits required to hold every possible value of a serial + number of the defined type. The operations permitted upon serial + numbers are defined in the following section. + +3. Operations upon the serial number + + Only two operations are defined upon serial numbers, addition of a + positive integer of limited range, and comparison with another serial + number. + +3.1. Addition + + Serial numbers may be incremented by the addition of a positive + integer n, where n is taken from the range of integers + [0 .. (2^(SERIAL_BITS - 1) - 1)]. For a sequence number s, the + result of such an addition, s', is defined as + + s' = (s + n) modulo (2 ^ SERIAL_BITS) + + + + + +Elz & Bush Standards Track [Page 2] + +RFC 1982 Serial Number Arithmetic August 1996 + + + where the addition and modulus operations here act upon values that + are non-negative values of unbounded size in the usual ways of + integer arithmetic. + + Addition of a value outside the range + [0 .. (2^(SERIAL_BITS - 1) - 1)] is undefined. + +3.2. Comparison + + Any two serial numbers, s1 and s2, may be compared. The definition + of the result of this comparison is as follows. + + For the purposes of this definition, consider two integers, i1 and + i2, from the unbounded set of non-negative integers, such that i1 and + s1 have the same numeric value, as do i2 and s2. Arithmetic and + comparisons applied to i1 and i2 use ordinary unbounded integer + arithmetic. + + Then, s1 is said to be equal to s2 if and only if i1 is equal to i2, + in all other cases, s1 is not equal to s2. 
+ + s1 is said to be less than s2 if, and only if, s1 is not equal to s2, + and + + (i1 < i2 and i2 - i1 < 2^(SERIAL_BITS - 1)) or + (i1 > i2 and i1 - i2 > 2^(SERIAL_BITS - 1)) + + s1 is said to be greater than s2 if, and only if, s1 is not equal to + s2, and + + (i1 < i2 and i2 - i1 > 2^(SERIAL_BITS - 1)) or + (i1 > i2 and i1 - i2 < 2^(SERIAL_BITS - 1)) + + Note that there are some pairs of values s1 and s2 for which s1 is + not equal to s2, but for which s1 is neither greater than, nor less + than, s2. An attempt to use these ordering operators on such pairs + of values produces an undefined result. + + The reason for this is that those pairs of values are such that any + simple definition that were to define s1 to be less than s2 where + (s1, s2) is such a pair, would also usually cause s2 to be less than + s1, when the pair is (s2, s1). This would mean that the particular + order selected for a test could cause the result to differ, leading + to unpredictable implementations. + + While it would be possible to define the test in such a way that the + inequality would not have this surprising property, while being + defined for all pairs of values, such a definition would be + + + +Elz & Bush Standards Track [Page 3] + +RFC 1982 Serial Number Arithmetic August 1996 + + + unnecessarily burdensome to implement, and difficult to understand, + and would still allow cases where + + s1 < s2 and (s1 + 1) > (s2 + 1) + + which is just as non-intuitive. + + Thus the problem case is left undefined, implementations are free to + return either result, or to flag an error, and users must take care + not to depend on any particular outcome. Usually this will mean + avoiding allowing those particular pairs of numbers to co-exist. + + The relationships greater than or equal to, and less than or equal + to, follow in the natural way from the above definitions. + +4. Corollaries + + These definitions give rise to some results of note. + +4.1. 
Corollary 1 + + For any sequence number s and any integer n such that addition of n + to s is well defined, (s + n) >= s. Further (s + n) == s only when + n == 0, in all other defined cases, (s + n) > s. + +4.2. Corollary 2 + + If s' is the result of adding the non-zero integer n to the sequence + number s, and m is another integer from the range defined as able to + be added to a sequence number, and s" is the result of adding m to + s', then it is undefined whether s" is greater than, or less than s, + though it is known that s" is not equal to s. + +4.3. Corollary 3 + + If s" from the previous corollary is further incremented, then there + is no longer any known relationship between the result and s. + +4.4. Corollary 4 + + If in corollary 2 the value (n + m) is such that addition of the sum + to sequence number s would produce a defined result, then corollary 1 + applies, and s" is known to be greater than s. + + + + + + + + +Elz & Bush Standards Track [Page 4] + +RFC 1982 Serial Number Arithmetic August 1996 + + +5. Examples + +5.1. A trivial example + + The simplest meaningful serial number space has SERIAL_BITS == 2. In + this space, the integers that make up the serial number space are 0, + 1, 2, and 3. That is, 3 == 2^SERIAL_BITS - 1. + + In this space, the largest integer that it is meaningful to add to a + sequence number is 2^(SERIAL_BITS - 1) - 1, or 1. + + Then, as defined 0+1 == 1, 1+1 == 2, 2+1 == 3, and 3+1 == 0. + Further, 1 > 0, 2 > 1, 3 > 2, and 0 > 3. It is undefined whether + 2 > 0 or 0 > 2, and whether 1 > 3 or 3 > 1. + +5.2. A slightly larger example + + Consider the case where SERIAL_BITS == 8. In this space the integers + that make up the serial number space are 0, 1, 2, ... 254, 255. + 255 == 2^SERIAL_BITS - 1. + + In this space, the largest integer that it is meaningful to add to a + sequence number is 2^(SERIAL_BITS - 1) - 1, or 127. + + Addition is as expected in this space, for example: 255+1 == 0, + 100+100 == 200, and 200+100 == 44. 
+ + Comparison is more interesting, 1 > 0, 44 > 0, 100 > 0, 100 > 44, + 200 > 100, 255 > 200, 0 > 255, 100 > 255, 0 > 200, and 44 > 200. + + Note that 100+100 > 100, but that (100+100)+100 < 100. Incrementing + a serial number can cause it to become "smaller". Of course, + incrementing by a smaller number will allow many more increments to + be made before this occurs. However this is always something to be + aware of, it can cause surprising errors, or be useful as it is the + only defined way to actually cause a serial number to decrease. + + The pairs of values 0 and 128, 1 and 129, 2 and 130, etc, to 127 and + 255 are not equal, but in each pair, neither number is defined as + being greater than, or less than, the other. + + It could be defined (arbitrarily) that 128 > 0, 129 > 1, + 130 > 2, ..., 255 > 127, by changing the comparison operator + definitions, as mentioned above. However note that that would cause + 255 > 127, while (255 + 1) < (127 + 1), as 0 < 128. Such a + definition, apart from being arbitrary, would also be more costly to + implement. + + + + +Elz & Bush Standards Track [Page 5] + +RFC 1982 Serial Number Arithmetic August 1996 + + +6. Citation + + As this defined arithmetic may be useful for purposes other than for + the DNS serial number, it may be referenced as Serial Number + Arithmetic from RFC1982. Any such reference shall be taken as + implying that the rules of sections 2 to 5 of this document apply to + the stated values. + +7. The DNS SOA serial number + + The serial number in the DNS SOA Resource Record is a Serial Number + as defined above, with SERIAL_BITS being 32. That is, the serial + number is a non negative integer with values taken from the range + [0 .. 4294967295]. That is, a 32 bit unsigned integer. + + The maximum defined increment is 2147483647 (2^31 - 1). 
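The comparison rules of section 3.2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name serial_compare and its return convention (-1, 0, 1, or None for the undefined case) are inventions of this example, and an implementation is free to handle the undefined case differently.

```python
from typing import Optional


def serial_compare(s1: int, s2: int, serial_bits: int = 32) -> Optional[int]:
    """Compare two serial numbers following section 3.2.

    Returns -1 if s1 < s2, 0 if equal, 1 if s1 > s2, and None when the
    RFC leaves the ordering undefined. The return convention is an
    assumption of this sketch.
    """
    half = 1 << (serial_bits - 1)
    if s1 == s2:
        return 0
    if (s1 < s2 and s2 - s1 < half) or (s1 > s2 and s1 - s2 > half):
        return -1
    if (s1 < s2 and s2 - s1 > half) or (s1 > s2 and s1 - s2 < half):
        return 1
    # The two values differ by exactly 2^(serial_bits - 1).
    return None
```

For the 32-bit SOA serial, serial_compare(0, 4294967295) returns 1, since zero is the successor of the largest value, while serial_compare(0, 2147483648) returns None.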
+ + Care should be taken that the serial number not be incremented, in + one or more steps, by more than this maximum within the period given + by the value of SOA.expire. Doing so may leave some secondary + servers with out of date copies of the zone, but with a serial number + "greater" than that of the primary server. Of course, special + circumstances may require this rule be set aside, for example, when + the serial number needs to be set lower for some reason. If this + must be done, then take special care to verify that ALL servers have + correctly succeeded in following the primary server's serial number + changes, at each step. + + Note that each, and every, increment to the serial number must be + treated as the start of a new sequence of increments for this + purpose, as well as being the continuation of all previous sequences + started within the period specified by SOA.expire. + + Caution should also be exercised before causing the serial number to + be set to the value zero. While this value is not in any way special + in serial number arithmetic, or to the DNS SOA serial number, many + DNS implementations have incorrectly treated zero as a special case, + with special properties, and unusual behaviour may be expected if + zero is used as a DNS SOA serial number. + + + + + + + + + + + + +Elz & Bush Standards Track [Page 6] + +RFC 1982 Serial Number Arithmetic August 1996 + + +8. Document Updates + + RFC1034 and RFC1035 are to be treated as if the references to + "sequence space arithmetic" therein are replaced by references to + serial number arithmetic, as defined in this document. + +9. Security Considerations + + This document does not consider security. + + It is not believed that anything in this document adds to any + security issues that may exist with the DNS, nor does it do anything + to lessen them. + +References + + [RFC1034] Domain Names - Concepts and Facilities, + P. Mockapetris, STD 13, ISI, November 1987. 
+ + [RFC1035] Domain Names - Implementation and Specification + P. Mockapetris, STD 13, ISI, November 1987 + + [RFC793] Transmission Control protocol + Information Sciences Institute, STD 7, USC, September 1981 + + [IEN-74] Sequence Number Arithmetic + William W. Plummer, BB&N Inc, September 1978 + +Acknowledgements + + Thanks to Rob Austein for suggesting clarification of the undefined + comparison operators, and to Michael Patton for attempting to locate + another reference for this procedure. Thanks also to members of the + IETF DNSIND working group of 1995-6, in particular, Paul Mockapetris. + +Authors' Addresses + + Robert Elz Randy Bush + Computer Science RGnet, Inc. + University of Melbourne 10361 NE Sasquatch Lane + Parkville, Vic, 3052 Bainbridge Island, Washington, 98110 + Australia. United States. + + EMail: kre@munnari.OZ.AU EMail: randy@psg.com + + + + + + + +Elz & Bush Standards Track [Page 7] diff --git a/doc/rfc/rfc1995.txt b/doc/rfc/rfc1995.txt new file mode 100644 index 00000000..b50bdc60 --- /dev/null +++ b/doc/rfc/rfc1995.txt @@ -0,0 +1,451 @@ + + + + + + +Network Working Group M. Ohta +Request for Comments: 1995 Tokyo Institute of Technology +Updates: 1035 August 1996 +Category: Standards Track + + + Incremental Zone Transfer in DNS + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + This document proposes extensions to the DNS protocols to provide an + incremental zone transfer (IXFR) mechanism. + +1. Introduction + + For rapid propagation of changes to a DNS database [STD13], it is + necessary to reduce latency by actively notifying servers of the + change. 
This is accomplished by the NOTIFY extension of the DNS + [NOTIFY]. + + The current full zone transfer mechanism (AXFR) is not an efficient + means to propagate changes to a small part of a zone, as it transfers + the entire zone file. + + Incremental transfer (IXFR) as proposed is a more efficient + mechanism, as it transfers only the changed portion(s) of a zone. + + In this document, a secondary name server which requests IXFR is + called an IXFR client and a primary or secondary name server which + responds to the request is called an IXFR server. + +2. Brief Description of the Protocol + + If an IXFR client, which likely has an older version of a zone, + thinks it needs new information about the zone (typically through SOA + refresh timeout or the NOTIFY mechanism), it sends an IXFR message + containing the SOA serial number of its, presumably outdated, copy of + the zone. + + + + + +Ohta Standards Track [Page 1] + +RFC 1995 Incremental Zone Transfer in DNS August 1996 + + + An IXFR server should keep record of the newest version of the zone + and the differences between that copy and several older versions. + When an IXFR request with an older version number is received, the + IXFR server needs to send only the differences required to make that + version current. Alternatively, the server may choose to transfer + the entire zone just as in a normal full zone transfer. + + When a zone has been updated, it should be saved in stable storage + before the new version is used to respond to IXFR (or AXFR) queries. + Otherwise, if the server crashes, data which is no longer available + may have been distributed to secondary servers, which can cause + persistent database inconsistencies. + + If an IXFR query with the same or newer version number than that of + the server is received, it is replied to with a single SOA record of + the server's current version, just as in AXFR. + + Transport of a query may be by either UDP or TCP. 
If an IXFR query + is via UDP, the IXFR server may attempt to reply using UDP if the + entire response can be contained in a single DNS packet. If the UDP + reply does not fit, the query is responded to with a single SOA + record of the server's current version to inform the client that a + TCP query should be initiated. + + Thus, a client should first make an IXFR query using UDP. If the + query type is not recognized by the server, an AXFR (preceded by a + UDP SOA query) should be tried, ensuring backward compatibility. If + the query response is a single packet with the entire new zone, or if + the server does not have a newer version than the client, everything + is done. Otherwise, a TCP IXFR query should be tried. + + To ensure integrity, servers should use UDP checksums for all UDP + responses. A cautious client which receives a UDP packet with a + checksum value of zero should ignore the result and try a TCP IXFR + instead. + + The query type value of IXFR assigned by IANA is 251. + +3. Query Format + + The IXFR query packet format is the same as that of a normal DNS + query, but with the query type being IXFR and the authority section + containing the SOA record of client's version of the zone. + + + + + + + + +Ohta Standards Track [Page 2] + +RFC 1995 Incremental Zone Transfer in DNS August 1996 + + +4. Response Format + + If incremental zone transfer is not available, the entire zone is + returned. The first and the last RR of the response is the SOA + record of the zone. I.e. the behavior is the same as an AXFR + response except the query type is IXFR. + + If incremental zone transfer is available, one or more difference + sequences is returned. The list of difference sequences is preceded + and followed by a copy of the server's current version of the SOA. + + Each difference sequence represents one update to the zone (one SOA + serial change) consisting of deleted RRs and added RRs. 
The first RR + of the deleted RRs is the older SOA RR and the first RR of the added + RRs is the newer SOA RR. + + Modification of an RR is performed first by removing the original RR + and then adding the modified one. + + The sequences of differential information are ordered oldest first, + newest last. Thus, the differential sequences are the history of + changes made since the version known by the IXFR client up to the + server's current version. + + RRs in the incremental transfer messages may be partial. That is, if + a single RR of multiple RRs of the same RR type changes, only the + changed RR is transferred. + + An IXFR client should only replace an older version with a newer + version after all the differences have been successfully processed. + + An incremental response is different from that of a non-incremental + response in that it begins with two SOA RRs, the server's current SOA + followed by the SOA of the client's version which is about to be + replaced. + + 5. Purging Strategy + + An IXFR server cannot be required to hold all previous versions + forever and may delete them anytime. In general, there is a trade-off + between the size of storage space and the possibility of using IXFR. + + Information about older versions should be purged if the total length + of an IXFR response would be longer than that of an AXFR response. + Given that the purpose of IXFR is to reduce AXFR overhead, this + strategy is quite reasonable. The strategy assures that the amount + of storage required is at most twice that of the current zone + information. + + + +Ohta Standards Track [Page 3] + +RFC 1995 Incremental Zone Transfer in DNS August 1996 + + + Information older than the SOA expire period may also be purged. + +6. Optional Condensation of Multiple Versions + + An IXFR server may optionally condense multiple difference sequences + into a single difference sequence, thus dropping information on + intermediate versions.
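Condensation of difference sequences can be sketched in Python as below. This is a hedged illustration, not how any particular server does it: the Record tuple, the Diff type and the condense function are all hypothetical names, and SOA RRs are represented only by their serial numbers rather than as full records.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

# Hypothetical record representation: (owner name, RR type, RDATA).
Record = Tuple[str, str, str]


class Diff(NamedTuple):
    """One difference sequence: RRs deleted from old_serial, added in new_serial."""
    old_serial: int
    deleted: FrozenSet[Record]
    new_serial: int
    added: FrozenSet[Record]


def condense(diffs: List[Diff]) -> Diff:
    """Merge difference sequences (ordered oldest first) into a single one."""
    if not diffs:
        raise ValueError("at least one difference sequence is required")
    deleted, added = set(), set()
    for diff in diffs:
        for rr in diff.deleted:
            if rr in added:
                added.discard(rr)    # added by an earlier step, now gone again
            else:
                deleted.add(rr)
        for rr in diff.added:
            if rr in deleted:
                deleted.discard(rr)  # deleted by an earlier step, now restored
            else:
                added.add(rr)
    return Diff(diffs[0].old_serial, frozenset(deleted),
                diffs[-1].new_serial, frozenset(added))
```

Applied to the two steps of the JAIN.AD.JP example (serial 1 to 2, then 2 to 3), this yields a single sequence that deletes the NEZU address and adds the two final JAIN-BB addresses.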
+ + This may be beneficial if a lot of versions, not all of which are + useful, are generated. For example, if multiple ftp servers share a + single DNS name and the IP address associated with the name is + changed once a minute to balance load between the ftp servers, it is + not so important to keep track of all the history of changes. + + But, this feature may not be so useful if an IXFR client has access + to two IXFR servers: A and B, with inconsistent condensation results. + The current version of the IXFR client, received from server A, may + be unknown to server B. In such a case, server B cannot provide + incremental data from the unknown version and a full zone transfer is + necessary. + + Condensation is completely optional. Clients can't detect from the + response whether the server has condensed the reply or not. + + For interoperability, IXFR servers, including those without the + condensation feature, should not flag an error even if they receive a + client's IXFR request with an unknown version number and should, + instead, attempt to perform a full zone transfer. + +7. Example + + Given the following three generations of data with the current serial + number of 3, + + JAIN.AD.JP. IN SOA NS.JAIN.AD.JP. mohta.jain.ad.jp. ( + 1 600 600 3600000 604800) + IN NS NS.JAIN.AD.JP. + NS.JAIN.AD.JP. IN A 133.69.136.1 + NEZU.JAIN.AD.JP. IN A 133.69.136.5 + + NEZU.JAIN.AD.JP. is removed and JAIN-BB.JAIN.AD.JP. is added. + + jain.ad.jp. IN SOA ns.jain.ad.jp. mohta.jain.ad.jp. ( + 2 600 600 3600000 604800) + IN NS NS.JAIN.AD.JP. + NS.JAIN.AD.JP. IN A 133.69.136.1 + JAIN-BB.JAIN.AD.JP. IN A 133.69.136.4 + IN A 192.41.197.2 + + + +Ohta Standards Track [Page 4] + +RFC 1995 Incremental Zone Transfer in DNS August 1996 + + + One of the IP addresses of JAIN-BB.JAIN.AD.JP. is changed. + + JAIN.AD.JP. IN SOA ns.jain.ad.jp. mohta.jain.ad.jp. ( + 3 600 600 3600000 604800) + IN NS NS.JAIN.AD.JP. + NS.JAIN.AD.JP. IN A 133.69.136.1 + JAIN-BB.JAIN.AD.JP.
IN A 133.69.136.3 + IN A 192.41.197.2 + + The following IXFR query + + +---------------------------------------------------+ + Header | OPCODE=SQUERY | + +---------------------------------------------------+ + Question | QNAME=JAIN.AD.JP., QCLASS=IN, QTYPE=IXFR | + +---------------------------------------------------+ + Answer | <empty> | + +---------------------------------------------------+ + Authority | JAIN.AD.JP. IN SOA serial=1 | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + + could be replied to with the following full zone transfer message: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=JAIN.AD.JP., QCLASS=IN, QTYPE=IXFR | + +---------------------------------------------------+ + Answer | JAIN.AD.JP. IN SOA serial=3 | + | JAIN.AD.JP. IN NS NS.JAIN.AD.JP. | + | NS.JAIN.AD.JP. IN A 133.69.136.1 | + | JAIN-BB.JAIN.AD.JP. IN A 133.69.136.3 | + | JAIN-BB.JAIN.AD.JP. IN A 192.41.197.2 | + | JAIN.AD.JP. IN SOA serial=3 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + + + + + + + + + + +Ohta Standards Track [Page 5] + +RFC 1995 Incremental Zone Transfer in DNS August 1996 + + + or with the following incremental message: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=JAIN.AD.JP., QCLASS=IN, QTYPE=IXFR | + +---------------------------------------------------+ + Answer | JAIN.AD.JP. IN SOA serial=3 | + | JAIN.AD.JP. IN SOA serial=1 | + | NEZU.JAIN.AD.JP. IN A 133.69.136.5 | + | JAIN.AD.JP. IN SOA serial=2 | + | JAIN-BB.JAIN.AD.JP. 
IN A 133.69.136.4 | + | JAIN-BB.JAIN.AD.JP. IN A 192.41.197.2 | + | JAIN.AD.JP. IN SOA serial=2 | + | JAIN-BB.JAIN.AD.JP. IN A 133.69.136.4 | + | JAIN.AD.JP. IN SOA serial=3 | + | JAIN-BB.JAIN.AD.JP. IN A 133.69.136.3 | + | JAIN.AD.JP. IN SOA serial=3 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + + or with the following condensed incremental message: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=JAIN.AD.JP., QCLASS=IN, QTYPE=IXFR | + +---------------------------------------------------+ + Answer | JAIN.AD.JP. IN SOA serial=3 | + | JAIN.AD.JP. IN SOA serial=1 | + | NEZU.JAIN.AD.JP. IN A 133.69.136.5 | + | JAIN.AD.JP. IN SOA serial=3 | + | JAIN-BB.JAIN.AD.JP. IN A 133.69.136.3 | + | JAIN-BB.JAIN.AD.JP. IN A 192.41.197.2 | + | JAIN.AD.JP. IN SOA serial=3 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + + + + + + + + +Ohta Standards Track [Page 6] + +RFC 1995 Incremental Zone Transfer in DNS August 1996 + + + or, if UDP packet overflow occurs, with the following message: + + +---------------------------------------------------+ + Header | OPCODE=SQUERY, RESPONSE | + +---------------------------------------------------+ + Question | QNAME=JAIN.AD.JP., QCLASS=IN, QTYPE=IXFR | + +---------------------------------------------------+ + Answer | JAIN.AD.JP. IN SOA serial=3 | + +---------------------------------------------------+ + Authority | <empty> | + +---------------------------------------------------+ + Additional | <empty> | + +---------------------------------------------------+ + +8. 
Acknowledgements + + The original idea of IXFR was conceived by Anant Kumar, Steve Hotz + and Jon Postel. + + For the refinement of the protocol and documentation, many people + have contributed including, but not limited to, Anant Kumar, Robert + Austein, Paul Vixie, Randy Bush, Mark Andrews, Robert Elz and the + members of the IETF DNSIND working group. + +9. References + + [NOTIFY] Vixie, P., "DNS NOTIFY: A Mechanism for Prompt + Notification of Zone Changes", RFC 1996, August 1996. + + [STD13] Mockapetris, P., "Domain Name System", STD 13, RFC 1034 and + RFC 1035, November 1987. + +10. Security Considerations + + Though DNS is related to several security problems, no attempt is + made to fix them in this document. + + This document is believed to introduce no additional security + problems to the current DNS protocol. + + + + + + + + + + + + +Ohta Standards Track [Page 7] + +RFC 1995 Incremental Zone Transfer in DNS August 1996 + + +11. Author's Address + + Masataka Ohta + Computer Center + Tokyo Institute of Technology + 2-12-1, O-okayama, Meguro-ku, Tokyo 152, JAPAN + + Phone: +81-3-5734-3299 + Fax: +81-3-5734-3415 + EMail: mohta@necom830.hpcl.titech.ac.jp + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Ohta Standards Track [Page 8] + diff --git a/doc/rfc/rfc1996.txt b/doc/rfc/rfc1996.txt new file mode 100644 index 00000000..b08f2007 --- /dev/null +++ b/doc/rfc/rfc1996.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group P. Vixie +Request for Comments: 1996 ISC +Updates: 1035 August 1996 +Category: Standards Track + + + A Mechanism for Prompt Notification of Zone Changes (DNS NOTIFY) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol.
Distribution of this memo is unlimited. + +Abstract + + This memo describes the NOTIFY opcode for DNS, by which a master + server advises a set of slave servers that the master's data has been + changed and that a query should be initiated to discover the new + data. + +1. Rationale and Scope + + 1.1. Slow propagation of new and changed data in a DNS zone can be + due to a zone's relatively long refresh times. Longer refresh times + are beneficial in that they reduce load on the master servers, but + that benefit comes at the cost of long intervals of incoherence among + authority servers whenever the zone is updated. + + 1.2. The DNS NOTIFY transaction allows master servers to inform slave + servers when the zone has changed -- an interrupt as opposed to poll + model -- which it is hoped will reduce propagation delay while not + unduly increasing the masters' load. This specification only allows + slaves to be notified of SOA RR changes, but the architecture of + NOTIFY is intended to be extensible to other RR types. + + 1.3. This document intentionally gives more definition to the roles + of "Master," "Slave" and "Stealth" servers, their enumeration in NS + RRs, and the SOA MNAME field. In that sense, this document can be + considered an addendum to [RFC1035]. + + + + + + + + + +Vixie Standards Track [Page 1] + +RFC 1996 DNS NOTIFY August 1996 + + +2. Definitions and Invariants + + 2.1. The following definitions are used in this document: + + Slave an authoritative server which uses zone transfer to + retrieve the zone. All slave servers are named in + the NS RRs for the zone. + + Master any authoritative server configured to be the source + of zone transfer for one or more slave servers. + + Primary Master master server at the root of the zone transfer + dependency graph. The primary master is named in the + zone's SOA MNAME field and optionally by an NS RR. + There is by definition only one primary master server + per zone.
+ + Stealth like a slave server except not listed in an NS RR for + the zone. A stealth server, unless explicitly + configured to do otherwise, will set the AA bit in + responses and be capable of acting as a master. A + stealth server will only be known by other servers if + they are given static configuration data indicating + its existence. + + Notify Set set of servers to be notified of changes to some + zone. Default is all servers named in the NS RRset, + except for any server also named in the SOA MNAME. + Some implementations will permit the name server + administrator to override this set or add elements to + it (such as, for example, stealth servers). + + 2.2. The zone's servers must be organized into a dependency graph + such that there is a primary master, and all other servers must use + AXFR or IXFR either from the primary master or from some slave which + is also a master. No loops are permitted in the AXFR dependency + graph. + +3. NOTIFY Message + + 3.1. When a master has updated one or more RRs in which slave servers + may be interested, the master may send the changed RR's name, class, + type, and optionally, new RDATA(s), to each known slave server using + a best efforts protocol based on the NOTIFY opcode. + + 3.2. NOTIFY uses the DNS Message Format, although it uses only a + subset of the available fields. Fields not otherwise described + herein are to be filled with binary zero (0), and implementations + + + +Vixie Standards Track [Page 2] + +RFC 1996 DNS NOTIFY August 1996 + + + must ignore all messages for which this is not the case. + + 3.3. NOTIFY is similar to QUERY in that it has a request message with + the header QR flag "clear" and a response message with QR "set". The + response message contains no useful information, but its reception by + the master is an indication that the slave has received the NOTIFY + and that the master can remove the slave from any retry queue for + this NOTIFY event. + + 3.4. 
The transport protocol used for a NOTIFY transaction will be UDP + unless the master has reason to believe that TCP is necessary; for + example, if a firewall has been installed between master and slave, + and only TCP has been allowed; or, if the changed RR is too large to + fit in a UDP/DNS datagram. + + 3.5. If TCP is used, both master and slave must continue to offer + name service during the transaction, even when the TCP transaction is + not making progress. The NOTIFY request is sent once, and a + "timeout" is said to have occurred if no NOTIFY response is received + within a reasonable interval. + + 3.6. If UDP is used, a master periodically sends a NOTIFY request to + a slave until either too many copies have been sent (a "timeout"), an + ICMP message is received indicating that the port is unreachable, or a + NOTIFY response is received from the slave with a matching query ID, + QNAME, IP source address, and UDP source port number. + + Note: + The interval between transmissions, and the total number of + retransmissions, should be operational parameters specifiable by + the name server administrator, perhaps on a per-zone basis. + Reasonable defaults are a 60 second interval (or timeout if + using TCP), and a maximum of 5 retransmissions (for UDP). It is + considered reasonable to use additive or exponential backoff for + the retry interval. + + 3.7. A NOTIFY request has QDCOUNT>0, ANCOUNT>=0, AUCOUNT>=0, + ADCOUNT>=0. If ANCOUNT>0, then the answer section represents an + unsecure hint at the new RRset for this <QNAME,QCLASS,QTYPE>. A + slave receiving such a hint is free to treat equivalence of this + answer section with its local data as a "no further work needs to be + done" indication. If ANCOUNT=0, or ANCOUNT>0 and the answer section + differs from the slave's local data, then the slave should query its + known masters to retrieve the new data. + + 3.8.
In no case shall the answer section of a NOTIFY request be used + to update a slave's local data, or to indicate that a zone transfer + needs to be undertaken, or to change the slave's zone refresh timers. + + + +Vixie Standards Track [Page 3] + +RFC 1996 DNS NOTIFY August 1996 + + + Only a "data present; data same" condition can lead a slave to act + differently if ANCOUNT>0 than it would if ANCOUNT=0. + + 3.9. This version of the NOTIFY specification makes no use of the + authority or additional data sections, and so conforming + implementations should set AUCOUNT=0 and ADCOUNT=0 when transmitting + requests. Since a future revision of this specification may define a + backwards compatible use for either or both of these sections, + current implementations must ignore these sections, but not the + entire message, if AUCOUNT>0 and/or ADCOUNT>0. + + 3.10. If a slave receives a NOTIFY request from a host that is not a + known master for the zone containing the QNAME, it should ignore the + request and produce an error message in its operations log. + + Note: + This implies that slaves of a multihomed master must either know + their master by the "closest" of the master's interface + addresses, or must know all of the master's interface addresses. + Otherwise, a valid NOTIFY request might come from an address + that is not on the slave's state list of masters for the zone, + which would be an error. + + 3.11. The only defined NOTIFY event at this time is that the SOA RR + has changed. Upon completion of a NOTIFY transaction for QTYPE=SOA, + the slave should behave as though the zone given in the QNAME had + reached its REFRESH interval (see [RFC1035]), i.e., it should query + its masters for the SOA of the zone given in the NOTIFY QNAME, and + check the answer to see if the SOA SERIAL has been incremented since + the last time the zone was fetched. If so, a zone transfer (either + AXFR or IXFR) should be initiated. 
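The slave-side decision described in 3.10 and 3.11 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the string results and the representation of masters as a set of address strings are all hypothetical, and master_serial stands for the SOA SERIAL the slave obtained by querying the sender of the NOTIFY. "Incremented" is interpreted here using the serial number arithmetic of RFC 1982 with 32 bits.

```python
from typing import Set

SERIAL_BITS = 32  # size of the DNS SOA serial number space


def serial_incremented(local: int, remote: int) -> bool:
    """True if remote is greater than local in serial number arithmetic."""
    delta = (remote - local) % (1 << SERIAL_BITS)
    # delta == 0 means equal; delta == 2^31 is the undefined case,
    # which this sketch treats as "not incremented".
    return 0 < delta < (1 << (SERIAL_BITS - 1))


def on_notify(source: str, known_masters: Set[str],
              local_serial: int, master_serial: int) -> str:
    """Return the action a slave might take for a NOTIFY with QTYPE=SOA.

    The action names are hypothetical labels, not protocol elements.
    """
    if source not in known_masters:
        return "ignore-and-log"   # 3.10: not a known master for the zone
    if serial_incremented(local_serial, master_serial):
        return "start-transfer"   # 3.11: initiate AXFR or IXFR
    return "up-to-date"
```

A real server would perform the SOA query between the two checks; the sketch takes its result as an argument to stay self-contained.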
+ + Note: + Because a deep server dependency graph may have multiple paths + from the primary master to any given slave, it is possible that + a slave will receive a NOTIFY from one of its known masters even + though the rest of its known masters have not yet updated their + copies of the zone. Therefore, when issuing a QUERY for the + zone's SOA, the query should be directed at the known master who + was the source of the NOTIFY event, and not at any of the other + known masters. This represents a departure from [RFC1035], + which specifies that upon expiry of the SOA REFRESH interval, + all known masters should be queried in turn. + + 3.12. If a NOTIFY request is received by a slave who does not + implement the NOTIFY opcode, it will respond with a NOTIMP + (unimplemented feature error) message. A master server who receives + such a NOTIMP should consider the NOTIFY transaction complete for + + + +Vixie Standards Track [Page 4] + +RFC 1996 DNS NOTIFY August 1996 + + + that slave. + +4. Details and Examples + + 4.1. Retaining query state information across host reboots is + optional, but it is reasonable to simply execute an SOA NOTIFY + transaction on each authority zone when a server first starts. + + 4.2. Each slave is likely to receive several copies of the same + NOTIFY request: One from the primary master, and one from each other + slave as that slave transfers the new zone and notifies its potential + peers. The NOTIFY protocol supports this multiplicity by requiring + that NOTIFY be sent by a slave/master only AFTER it has updated the + SOA RR or has determined that no update is necessary, which in + practice means after a successful zone transfer. Thus, barring + delivery reordering, the last NOTIFY any slave receives will be the + one indicating the latest change. 
Since a slave always requests SOAs + and AXFR/IXFRs only from its known masters, it will have an + opportunity to retry its QUERY for the SOA after each of its masters + have completed each zone update. + + 4.3. If a master server seeks to avoid causing a large number of + simultaneous outbound zone transfers, it may delay for an arbitrary + length of time before sending a NOTIFY message to any given slave. + It is expected that the time will be chosen at random, so that each + slave will begin its transfer at a unique time. The delay shall not + in any case be longer than the SOA REFRESH time. + + Note: + This delay should be a parameter that each primary master name + server can specify, perhaps on a per-zone basis. Random delays + of between 30 and 60 seconds would seem adequate if the servers + share a LAN and the zones are of moderate size. + + 4.4. A slave which receives a valid NOTIFY should defer action on any + subsequent NOTIFY with the same <QNAME,QCLASS,QTYPE> until it has + completed the transaction begun by the first NOTIFY. This duplicate + rejection is necessary to avoid having multiple notifications lead to + pummeling the master server. + + + + + + + + + + + + +Vixie Standards Track [Page 5] + +RFC 1996 DNS NOTIFY August 1996 + + + 4.5 Zone has Updated on Primary Master + + Primary master sends a NOTIFY request to all servers named in Notify + Set. The NOTIFY request has the following characteristics: + + query ID: (new) + op: NOTIFY (4) + resp: NOERROR + flags: AA + qcount: 1 + qname: (zone name) + qclass: (zone class) + qtype: T_SOA + + 4.6 Zone has Updated on a Slave that is also a Master + + As above in 4.5, except that this server's Notify Set may be + different from the Primary Master's due to optional static + specification of local stealth servers. 
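The request characteristics listed in 4.5 map onto the fixed 12-byte DNS header of [RFC1035]. The sketch below shows one way to fill that header. The function and macro names are hypothetical, the sketch sends no answer section (ANCOUNT=0), and the question section (qname, qclass, qtype) would have to be appended by the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define DNS_FLAG_QR       0x8000u /* set only in responses */
#define DNS_FLAG_AA       0x0400u /* authoritative answer */
#define DNS_OPCODE_SHIFT  11      /* opcode occupies bits 11..14 */
#define DNS_OPCODE_NOTIFY 4u      /* op: NOTIFY (4) */

/* Hypothetical helper: write the header of a NOTIFY message into out. */
size_t notify_header(uint8_t out[12], uint16_t id, int is_response)
{
    uint16_t flags =
        (uint16_t)((DNS_OPCODE_NOTIFY << DNS_OPCODE_SHIFT) | DNS_FLAG_AA);

    if (is_response)
        flags |= DNS_FLAG_QR;           /* RCODE stays 0 (NOERROR) */

    out[0] = (uint8_t)(id >> 8);        /* query ID, network byte order */
    out[1] = (uint8_t)(id & 0xff);
    out[2] = (uint8_t)(flags >> 8);
    out[3] = (uint8_t)(flags & 0xff);
    out[4] = 0;  out[5] = 1;            /* qcount: 1 */
    out[6] = 0;  out[7] = 0;            /* ANCOUNT=0 in this sketch */
    out[8] = 0;  out[9] = 0;            /* AUCOUNT=0, as 3.9 requires */
    out[10] = 0; out[11] = 0;           /* ADCOUNT=0, as 3.9 requires */
    return 12;
}
```

Calling the helper with is_response set to a nonzero value and the same id yields a header that differs from the request only in the QR bit.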
+ + 4.7 Slave Receives a NOTIFY Request from a Master + + When a slave server receives a NOTIFY request from one of its locally + designated masters for the zone enclosing the given QNAME, with + QTYPE=SOA and QR=0, it should enter the state it would if the zone's + refresh timer had expired. It will also send a NOTIFY response back + to the NOTIFY request's source, with the following characteristics: + + query ID: (same) + op: NOTIFY (4) + resp: NOERROR + flags: QR AA + qcount: 1 + qname: (zone name) + qclass: (zone class) + qtype: T_SOA + + This is intended to be identical to the NOTIFY request, except that + the QR bit is also set. The query ID of the response must be the + same as was received in the request. + + 4.8 Master Receives a NOTIFY Response from Slave + + When a master server receives a NOTIFY response, it deletes this + query from the retry queue, thus completing the "notification + process" of "this" RRset change to "that" server. + + + + + +Vixie Standards Track [Page 6] + +RFC 1996 DNS NOTIFY August 1996 + + +5. Security Considerations + + We believe that the NOTIFY operation's only security considerations + are: + + 1. That a NOTIFY request with a forged IP/UDP source address can + cause a slave to send spurious SOA queries to its masters, + leading to a benign denial of service attack if the forged + requests are sent very often. + + 2. That TCP spoofing could be used against a slave server given + NOTIFY as a means of synchronizing an SOA query and UDP/DNS + spoofing as a means of forcing a zone transfer. + +6. References + + [RFC1035] + Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, November 1987. + + [IXFR] + Ohta, M., "Incremental Zone Transfer", RFC 1995, August 1996. + +7. 
Author's Address + + Paul Vixie + Internet Software Consortium + Star Route Box 159A + Woodside, CA 94062 + + Phone: +1 415 747 0204 + EMail: paul@vix.com + + + + + + + + + + + + + + + + + + + +Vixie Standards Track [Page 7] + diff --git a/doc/rfc/rfc2052.txt b/doc/rfc/rfc2052.txt new file mode 100644 index 00000000..46ba3629 --- /dev/null +++ b/doc/rfc/rfc2052.txt @@ -0,0 +1,563 @@ + + + + + + +Network Working Group A. Gulbrandsen +Request for Comments: 2052 Troll Technologies +Updates: 1035, 1183 P. Vixie +Category: Experimental Vixie Enterprises + October 1996 + + + A DNS RR for specifying the location of services (DNS SRV) + +Status of this Memo + + This memo defines an Experimental Protocol for the Internet + community. This memo does not specify an Internet standard of any + kind. Discussion and suggestions for improvement are requested. + Distribution of this memo is unlimited. + +Abstract + + This document describes a DNS RR which specifies the location of the + server(s) for a specific protocol and domain (like a more general + form of MX). + +Overview and rationale + + Currently, one must either know the exact address of a server to + contact it, or broadcast a question. This has led to, for example, + ftp.whatever.com aliases, the SMTP-specific MX RR, and using MAC- + level broadcasts to locate servers. + + The SRV RR allows administrators to use several servers for a single + domain, to move services from host to host with little fuss, and to + designate some hosts as primary servers for a service and others as + backups. + + Clients ask for a specific service/protocol for a specific domain + (the word domain is used here in the strict RFC 1034 sense), and get + back the names of any available servers. 
+ +Introductory example + + When a SRV-cognizant web-browser wants to retrieve + + http://www.asdf.com/ + + it does a lookup of + + http.tcp.www.asdf.com + + + + +Gulbrandsen & Vixie Experimental [Page 1] + +RFC 2052 DNS SRV RR October 1996 + + + and retrieves the document from one of the servers in the reply. The + example zone file near the end of the memo contains answering RRs for + this query. + +The format of the SRV RR + + Here is the format of the SRV RR, whose DNS type code is 33: + + Service.Proto.Name TTL Class SRV Priority Weight Port Target + + (There is an example near the end of this document.) + + Service + The symbolic name of the desired service, as defined in Assigned + Numbers or locally. + + Some widely used services, notably POP, don't have a single + universal name. If Assigned Numbers names the service + indicated, that name is the only name which is legal for SRV + lookups. Only locally defined services may be named locally. + The Service is case insensitive. + + Proto + TCP and UDP are at present the most useful values + for this field, though any name defined by Assigned Numbers or + locally may be used (as for Service). The Proto is case + insensitive. + + Name + The domain this RR refers to. The SRV RR is unique in that the + name one searches for is not this name; the example near the end + shows this clearly. + + TTL + Standard DNS meaning. + + Class + Standard DNS meaning. + + Priority + As for MX, the priority of this target host. A client MUST + attempt to contact the target host with the lowest-numbered + priority it can reach; target hosts with the same priority + SHOULD be tried in pseudorandom order. The range is 0-65535. + + + + + + + +Gulbrandsen & Vixie Experimental [Page 2] + +RFC 2052 DNS SRV RR October 1996 + + + Weight + Load balancing mechanism. When selecting a target host among + those that have the same priority, the chance of trying this + one first SHOULD be proportional to its weight.
The range of + this number is 1-65535. Domain administrators are urged to use + Weight 0 when there isn't any load balancing to do, to make the + RR easier to read for humans (less noisy). + + Port + The port on this target host of this service. The range is + 0-65535. This is often as specified in Assigned Numbers but + need not be. + + Target + As for MX, the domain name of the target host. There MUST be + one or more A records for this name. Implementors are urged, but + not required, to return the A record(s) in the Additional Data + section. Name compression is to be used for this field. + + A Target of "." means that the service is decidedly not + available at this domain. + +Domain administrator advice + + Asking everyone to update their telnet (for example) clients when the + first internet site adds a SRV RR for Telnet/TCP is futile (even if + desirable). Therefore SRV will have to coexist with A record lookups + for a long time, and DNS administrators should try to provide A + records to support old clients: + + - Where the services for a single domain are spread over several + hosts, it seems advisable to have a list of A RRs at the same + DNS node as the SRV RR, listing reasonable (if perhaps + suboptimal) fallback hosts for Telnet, NNTP and other protocols + likely to be used with this name. Note that some programs only + try the first address they get back from e.g. gethostbyname(), + and we don't know how widespread this behaviour is. + + - Where one service is provided by several hosts, one can either + provide A records for all the hosts (in which case the round- + robin mechanism, where available, will share the load equally) + or just for one (presumably the fastest). + + - If a host is intended to provide a service only when the main + server(s) is/are down, it probably shouldn't be listed in A + records. 
+ + + + + +Gulbrandsen & Vixie Experimental [Page 3] + +RFC 2052 DNS SRV RR October 1996 + + + - Hosts that are referenced by backup A records must use the port + number specified in Assigned Numbers for the service. + + Currently there's a practical limit of 512 bytes for DNS replies. + Until all resolvers can handle larger responses, domain + administrators are strongly advised to keep their SRV replies below + 512 bytes. + + All round numbers, wrote Dr. Johnson, are false, and these numbers + are very round: A reply packet has a 30-byte overhead plus the name + of the service ("telnet.tcp.asdf.com" for instance); each SRV RR adds + 20 bytes plus the name of the target host; each NS RR in the NS + section is 15 bytes plus the name of the name server host; and + finally each A RR in the additional data section is 20 bytes or so, + and there are A's for each SRV and NS RR mentioned in the answer. + This size estimate is extremely crude, but shouldn't underestimate + the actual answer size by much. If an answer may be close to the + limit, using e.g. "dig" to look at the actual answer is a good idea. + +The "Weight" field + + Weight, the load balancing field, is not quite satisfactory, but the + actual load on typical servers changes much too quickly to be kept + around in DNS caches. It seems to the authors that offering + administrators a way to say "this machine is three times as fast as + that one" is the best that can practically be done. + + The only way the authors can see of getting a "better" load figure is + asking a separate server when the client selects a server and + contacts it. For short-lived services like SMTP an extra step in the + connection establishment seems too expensive, and for long-lived + services like telnet, the load figure may well be thrown off a minute + after the connection is established when someone else starts or + finishes a heavy job. 
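The proportional choice that the Weight field is meant to express ("this machine is three times as fast as that one") can be sketched as below. This is an illustration only: the function name is hypothetical, and the caller is assumed to supply a number r drawn uniformly from 0 up to, but not including, the sum of the weights.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick one of n targets that share a priority.
 * r must be drawn by the caller uniformly from [0, sum of weights).
 * Returns the chosen index, or n if no target could be chosen
 * (all weights zero, or r out of range).
 */
size_t pick_by_weight(const uint16_t *weights, size_t n, uint32_t r)
{
    uint32_t running = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        running += weights[i];  /* upper edge of this target's slice */
        if (r < running)
            return i;
    }
    return n;
}
```

With weights 1 and 3, one value of r out of four selects the first target and three select the second, which is the three-quarters split used for telnet in the example zone file near the end of the memo.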
+ +The Port number + + Currently, the translation from service name to port number happens + at the client, often using a file such as /etc/services. + + Moving this information to the DNS makes it less necessary to update + these files on every single computer of the net every time a new + service is added, and makes it possible to move standard services out + of the "root-only" port range on Unix. + + + + + + + +Gulbrandsen & Vixie Experimental [Page 4] + +RFC 2052 DNS SRV RR October 1996 + + +Usage rules + + A SRV-cognizant client SHOULD use this procedure to locate a list of + servers and connect to the preferred one: + + Do a lookup for QNAME=service.protocol.target, QCLASS=IN, + QTYPE=SRV. + + If the reply is NOERROR, ANCOUNT>0 and there is at least one SRV + RR which specifies the requested Service and Protocol in the + reply: + + If there is precisely one SRV RR, and its Target is "." + (the root domain), abort. + + Else, for all such RR's, build a list of (Priority, Weight, + Target) tuples + + Sort the list by priority (lowest number first) + + Create a new empty list + + For each distinct priority level + While there are still elements left at this priority + level + Select an element randomly, with probability + proportional to Weight, and move it to the tail of the new list + + For each element in the new list + + query the DNS for A RR's for the Target or use any + RR's found in the Additional Data section of the + earlier SRV query. + + for each A RR found, try to connect to the (protocol, + address, service). + + else if the service desired is SMTP + + skip to RFC 974 (MX).
+ + else + + Do a lookup for QNAME=target, QCLASS=IN, QTYPE=A + + for each A RR found, try to connect to the (protocol, + address, service) + + + + +Gulbrandsen & Vixie Experimental [Page 5] + +RFC 2052 DNS SRV RR October 1996 + + + Notes: + + - Port numbers SHOULD NOT be used in place of the symbolic service + or protocol names (for the same reason why variant names cannot + be allowed: Applications would have to do two or more lookups). + + - If a truncated response comes back from an SRV query, and the + Additional Data section has at least one complete RR in it, the + answer MUST be considered complete and the client resolver + SHOULD NOT retry the query using TCP, but use normal UDP queries + for A RR's missing from the Additional Data section. + + - A client MAY use means other than Weight to choose among target + hosts with equal Priority. + + - A client MUST parse all of the RR's in the reply. + + - If the Additional Data section doesn't contain A RR's for all + the SRV RR's and the client may want to connect to the target + host(s) involved, the client MUST look up the A RR(s). (This + happens quite often when the A RR has shorter TTL than the SRV + or NS RR's.) + + - A future standard could specify that a SRV RR whose Protocol was + TCP and whose Service was SMTP would override RFC 974's rules + with regard to the use of an MX RR. This would allow firewalled + organizations with several SMTP relays to control the load + distribution using the Weight field. + + - Future protocols could be designed to use SRV RR lookups as the + means by which clients locate their servers. + +Fictional example + + This is (part of) the zone file for asdf.com, a still-unused domain: + + $ORIGIN asdf.com. + @ SOA server.asdf.com. root.asdf.com. ( + 1995032001 3600 3600 604800 86400 ) + NS server.asdf.com. + NS ns1.ip-provider.net. + NS ns2.ip-provider.net. + ftp.tcp SRV 0 0 21 server.asdf.com. + finger.tcp SRV 0 0 79 server.asdf.com. 
+ ; telnet - use old-slow-box or new-fast-box if either is + ; available, make three quarters of the logins go to + ; new-fast-box. + telnet.tcp SRV 0 1 23 old-slow-box.asdf.com. + + + +Gulbrandsen & Vixie Experimental [Page 6] + +RFC 2052 DNS SRV RR October 1996 + + + SRV 0 3 23 new-fast-box.asdf.com. + ; if neither old-slow-box nor new-fast-box is up, switch to + ; using the sysadmin's box and the server + SRV 1 0 23 sysadmins-box.asdf.com. + SRV 1 0 23 server.asdf.com. + ; HTTP - server is the main server, new-fast-box is the backup + ; (On new-fast-box, the HTTP daemon runs on port 8000) + http.tcp SRV 0 0 80 server.asdf.com. + SRV 10 0 8000 new-fast-box.asdf.com. + ; since we want to support both http://asdf.com/ and + ; http://www.asdf.com/ we need the next two RRs as well + http.tcp.www SRV 0 0 80 server.asdf.com. + SRV 10 0 8000 new-fast-box.asdf.com. + ; SMTP - mail goes to the server, and to the IP provider if + ; the net is down + smtp.tcp SRV 0 0 25 server.asdf.com. + SRV 1 0 25 mailhost.ip-provider.net. + @ MX 0 server.asdf.com. + MX 1 mailhost.ip-provider.net. + ; NNTP - use the IP provider's NNTP server + nntp.tcp SRV 0 0 119 nntphost.ip-provider.net. + ; IDB is a locally defined protocol + idb.tcp SRV 0 0 2025 new-fast-box.asdf.com. + ; addresses + server A 172.30.79.10 + old-slow-box A 172.30.79.11 + sysadmins-box A 172.30.79.12 + new-fast-box A 172.30.79.13 + ; backup A records - new-fast-box and old-slow-box are + ; included, naturally, and server is too, but might go + ; if the load got too bad + @ A 172.30.79.10 + A 172.30.79.11 + A 172.30.79.13 + ; backup A RR for www.asdf.com + www A 172.30.79.10 + ; NO other services are supported + *.tcp SRV 0 0 0 . + *.udp SRV 0 0 0 . + + In this example, a telnet connection to "asdf.com." needs an SRV + lookup of "telnet.tcp.asdf.com." and possibly A lookups of + "new-fast-box.asdf.com." and/or the other hosts named.
The size of the + SRV reply is approximately 375 bytes: + + 30 bytes general overhead + 20 bytes for the query string, "telnet.tcp.asdf.com." + 130 bytes for 4 SRV RR's, 20 bytes each plus the lengths of "new-fast-box", + + + +Gulbrandsen & Vixie Experimental [Page 7] + +RFC 2052 DNS SRV RR October 1996 + + + "old-slow-box", "server" and "sysadmins-box" - + "asdf.com" in the query section is quoted here and doesn't + need to be counted again. + 75 bytes for 3 NS RRs, 15 bytes each plus the lengths of + "server", "ns1.ip-provider.net." and "ns2" - again, + "ip-provider.net." is quoted and only needs to be counted once. + 120 bytes for the 6 A RR's mentioned by the SRV and NS RR's. + +References + + RFC 1918: Rekhter, Y., Moskowitz, R., Karrenberg, D., de Groot, G., + and E. Lear, "Address Allocation for Private Internets", + RFC 1918, February 1996. + + RFC 1916: Berkowitz, H., Ferguson, P., Leland, W. and P. Nesser, + "Enterprise Renumbering: Experience and Information + Solicitation", RFC 1916, February 1996. + + RFC 1912: Barr, D., "Common DNS Operational and Configuration + Errors", RFC 1912, February 1996. + + RFC 1900: Carpenter, B., and Y. Rekhter, "Renumbering Needs Work", + RFC 1900, February 1996. + + RFC 1920: Postel, J., "INTERNET OFFICIAL PROTOCOL STANDARDS", + STD 1, RFC 1920, March 1996. + + RFC 1814: Gerich, E., "Unique Addresses are Good", RFC 1814, June + 1995. + + RFC 1794: Brisco, T., "DNS Support for Load Balancing", RFC 1794, + April 1995. + + RFC 1713: Romao, A., "Tools for DNS debugging", RFC 1713, + November 1994. + + RFC 1712: Farrell, C., Schulze, M., Pleitner, S., and D. Baldoni, + "DNS Encoding of Geographical Location", RFC 1712, November + 1994. + + RFC 1706: Manning, B. and R. Colella, "DNS NSAP Resource Records", + RFC 1706, October 1994. + + RFC 1700: Reynolds, J., and J. Postel, "ASSIGNED NUMBERS", + STD 2, RFC 1700, October 1994. + + RFC 1183: Ullmann, R., Mockapetris, P., Mamakos, L., and + C.
Everhart, "New DNS RR Definitions", RFC 1183, November + 1990. + + + + +Gulbrandsen & Vixie Experimental [Page 8] + +RFC 2052 DNS SRV RR October 1996 + + + RFC 1101: Mockapetris, P., "DNS encoding of network names and other + types", RFC 1101, April 1989. + + RFC 1035: Mockapetris, P., "Domain names - implementation and + specification", STD 13, RFC 1035, November 1987. + + RFC 1034: Mockapetris, P., "Domain names - concepts and + facilities", STD 13, RFC 1034, November 1987. + + RFC 1033: Lottor, M., "Domain administrators operations guide", + RFC 1033, November 1987. + + RFC 1032: Stahl, M., "Domain administrators guide", RFC 1032, + November 1987. + + RFC 974: Partridge, C., "Mail routing and the domain system", + STD 14, RFC 974, January 1986. + +Security Considerations + + The authors believe this RR does not cause any new security problems. + Some problems become more visible, though. + + - The ability to specify ports on a fine-grained basis obviously + changes how a router can filter packets. It becomes impossible + to block internal clients from accessing specific external + services, slightly harder to block internal users from running + unauthorised services, and more important for the router + operations and DNS operations personnel to cooperate. + + - There is no way a site can keep its hosts from being referenced + as servers (as, indeed, some sites become unwilling secondary + MXes today). This could lead to denial of service. + + - With SRV, DNS spoofers can supply false port numbers, as well as + host names and addresses. The authors do not see any practical + effect of this. + + We assume that as the DNS-security people invent new features, DNS + servers will return the relevant RRs in the Additional Data section + when answering an SRV query.
+ + + + + + + + + + +Gulbrandsen & Vixie Experimental [Page 9] + +RFC 2052 DNS SRV RR October 1996 + + +Authors' Addresses + + Arnt Gulbrandsen + Troll Tech + Postboks 6133 Etterstad + N-0602 Oslo + Norway + + Phone: +47 22646966 + EMail: agulbra@troll.no + + + Paul Vixie + Vixie Enterprises + Star Route 159A + Woodside, CA 94062 + + Phone: (415) 747-0204 + EMail: paul@vix.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Gulbrandsen & Vixie Experimental [Page 10] + diff --git a/doc/rfc/rfc2104.txt b/doc/rfc/rfc2104.txt new file mode 100644 index 00000000..a205103a --- /dev/null +++ b/doc/rfc/rfc2104.txt @@ -0,0 +1,620 @@ + + + + + + +Network Working Group H. Krawczyk +Request for Comments: 2104 IBM +Category: Informational M. Bellare + UCSD + R. Canetti + IBM + February 1997 + + + HMAC: Keyed-Hashing for Message Authentication + +Status of This Memo + + This memo provides information for the Internet community. This memo + does not specify an Internet standard of any kind. Distribution of + this memo is unlimited. + +Abstract + + This document describes HMAC, a mechanism for message authentication + using cryptographic hash functions. HMAC can be used with any + iterative cryptographic hash function, e.g., MD5, SHA-1, in + combination with a secret shared key. The cryptographic strength of + HMAC depends on the properties of the underlying hash function. + +1. Introduction + + Providing a way to check the integrity of information transmitted + over or stored in an unreliable medium is a prime necessity in the + world of open computing and communications. Mechanisms that provide + such integrity check based on a secret key are usually called + "message authentication codes" (MAC). Typically, message + authentication codes are used between two parties that share a secret + key in order to validate information transmitted between these + parties. In this document we present such a MAC mechanism based on + cryptographic hash functions. 
This mechanism, called HMAC, is based + on work by the authors [BCK1] where the construction is presented and + cryptographically analyzed. We refer to that work for the details on + the rationale and security analysis of HMAC, and its comparison to + other keyed-hash methods. + + + + + + + + + + + +Krawczyk, et. al. Informational [Page 1] + +RFC 2104 HMAC February 1997 + + + HMAC can be used in combination with any iterated cryptographic hash + function. MD5 and SHA-1 are examples of such hash functions. HMAC + also uses a secret key for calculation and verification of the + message authentication values. The main goals behind this + construction are + + * To use, without modifications, available hash functions. + In particular, hash functions that perform well in software, + and for which code is freely and widely available. + + * To preserve the original performance of the hash function without + incurring a significant degradation. + + * To use and handle keys in a simple way. + + * To have a well understood cryptographic analysis of the strength of + the authentication mechanism based on reasonable assumptions on the + underlying hash function. + + * To allow for easy replaceability of the underlying hash function in + case that faster or more secure hash functions are found or + required. + + This document specifies HMAC using a generic cryptographic hash + function (denoted by H). Specific instantiations of HMAC need to + define a particular hash function. Current candidates for such hash + functions include SHA-1 [SHA], MD5 [MD5], RIPEMD-128/160 [RIPEMD]. + These different realizations of HMAC will be denoted by HMAC-SHA1, + HMAC-MD5, HMAC-RIPEMD, etc. + + Note: To the date of writing of this document MD5 and SHA-1 are the + most widely used cryptographic hash functions. MD5 has been recently + shown to be vulnerable to collision search attacks [Dobb]. 
This + attack and other currently known weaknesses of MD5 do not compromise + the use of MD5 within HMAC as specified in this document (see + [Dobb]); however, SHA-1 appears to be a cryptographically stronger + function. To this date, MD5 can be considered for use in HMAC for + applications where the superior performance of MD5 is critical. In + any case, implementers and users need to be aware of possible + cryptanalytic developments regarding any of these cryptographic hash + functions, and the eventual need to replace the underlying hash + function. (See section 6 for more information on the security of + HMAC.) + + + + + + + + +Krawczyk, et. al. Informational [Page 2] + +RFC 2104 HMAC February 1997 + + +2. Definition of HMAC + + The definition of HMAC requires a cryptographic hash function, which + we denote by H, and a secret key K. We assume H to be a cryptographic + hash function where data is hashed by iterating a basic compression + function on blocks of data. We denote by B the byte-length of such + blocks (B=64 for all the above mentioned examples of hash functions), + and by L the byte-length of hash outputs (L=16 for MD5, L=20 for + SHA-1). The authentication key K can be of any length up to B, the + block length of the hash function. Applications that use keys longer + than B bytes will first hash the key using H and then use the + resultant L byte string as the actual key to HMAC. In any case the + minimal recommended length for K is L bytes (as the hash output + length). See section 3 for more information on keys. + + We define two fixed and different strings ipad and opad as follows + (the 'i' and 'o' are mnemonics for inner and outer): + + ipad = the byte 0x36 repeated B times + opad = the byte 0x5C repeated B times. 
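The pad definitions above, together with the key-length rule, can be sketched as below for the case B=64. This is a minimal illustration under stated assumptions: the function and macro names are hypothetical, and the sketch rejects keys longer than B bytes instead of hashing them, so that it does not depend on any particular hash function H.

```c
#include <stddef.h>
#include <string.h>

#define HMAC_B 64  /* block length B in bytes for MD5 and SHA-1 */

/*
 * Hypothetical helper: fill k_ipad with (K XOR ipad) and k_opad with
 * (K XOR opad), where K is the key padded with zero bytes to B bytes.
 * Returns 0 on success, -1 if the key is longer than B bytes (such a
 * key would first have to be hashed with H, which is not done here).
 */
int hmac_prepare_pads(const unsigned char *key, size_t key_len,
                      unsigned char k_ipad[HMAC_B],
                      unsigned char k_opad[HMAC_B])
{
    size_t i;

    if (key_len > HMAC_B)
        return -1;

    /* Zero-pad the key to B bytes in both buffers. */
    memset(k_ipad, 0, HMAC_B);
    memset(k_opad, 0, HMAC_B);
    memcpy(k_ipad, key, key_len);
    memcpy(k_opad, key, key_len);

    /* XOR every byte with the repeated pad byte. */
    for (i = 0; i < HMAC_B; i++) {
        k_ipad[i] ^= 0x36;
        k_opad[i] ^= 0x5c;
    }
    return 0;
}
```

The two resulting B-byte strings are the values that get prefixed to the inner and outer hash inputs in the computation that follows.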
+ + To compute HMAC over the data `text' we perform + + H(K XOR opad, H(K XOR ipad, text)) + + Namely, + + (1) append zeros to the end of K to create a B byte string + (e.g., if K is of length 20 bytes and B=64, then K will be + appended with 44 zero bytes 0x00) + (2) XOR (bitwise exclusive-OR) the B byte string computed in step + (1) with ipad + (3) append the stream of data 'text' to the B byte string resulting + from step (2) + (4) apply H to the stream generated in step (3) + (5) XOR (bitwise exclusive-OR) the B byte string computed in + step (1) with opad + (6) append the H result from step (4) to the B byte string + resulting from step (5) + (7) apply H to the stream generated in step (6) and output + the result + + For illustration purposes, sample code based on MD5 is provided as an + appendix. + + + + + + + +Krawczyk, et. al. Informational [Page 3] + +RFC 2104 HMAC February 1997 + + +3. Keys + + The key for HMAC can be of any length (keys longer than B bytes are + first hashed using H). However, less than L bytes is strongly + discouraged as it would decrease the security strength of the + function. Keys longer than L bytes are acceptable but the extra + length would not significantly increase the function strength. (A + longer key may be advisable if the randomness of the key is + considered weak.) + + Keys need to be chosen at random (or using a cryptographically strong + pseudo-random generator seeded with a random seed), and periodically + refreshed. (Current attacks do not indicate a specific recommended + frequency for key changes as these attacks are practically + infeasible. However, periodic key refreshment is a fundamental + security practice that helps against potential weaknesses of the + function and keys, and limits the damage of an exposed key.) + +4. Implementation Note + + HMAC is defined in such a way that the underlying hash function H can + be used with no modification to its code. 
In particular, it uses the + function H with the pre-defined initial value IV (a fixed value + specified by each iterative hash function to initialize its + compression function). However, if desired, a performance + improvement can be achieved at the cost of (possibly) modifying the + code of H to support variable IVs. + + The idea is that the intermediate results of the compression function + on the B-byte blocks (K XOR ipad) and (K XOR opad) can be precomputed + only once at the time of generation of the key K, or before its first + use. These intermediate results are stored and then used to + initialize the IV of H each time that a message needs to be + authenticated. This method saves, for each authenticated message, + the application of the compression function of H on two B-byte blocks + (i.e., on (K XOR ipad) and (K XOR opad)). Such a savings may be + significant when authenticating short streams of data. We stress + that the stored intermediate values need to be treated and protected + the same as secret keys. + + Choosing to implement HMAC in the above way is a decision of the + local implementation and has no effect on inter-operability. + + + + + + + + + +Krawczyk, et. al. Informational [Page 4] + +RFC 2104 HMAC February 1997 + + +5. Truncated output + + A well-known practice with message authentication codes is to + truncate the output of the MAC and output only part of the bits + (e.g., [MM, ANSI]). Preneel and van Oorschot [PV] show some + analytical advantages of truncating the output of hash-based MAC + functions. The results in this area are not absolute as for the + overall security advantages of truncation. It has advantages (less + information on the hash result available to an attacker) and + disadvantages (less bits to predict for the attacker). 
Applications + of HMAC can choose to truncate the output of HMAC by outputting the t + leftmost bits of the HMAC computation for some parameter t (namely, + the computation is carried in the normal way as defined in section 2 + above but the end result is truncated to t bits). We recommend that + the output length t be not less than half the length of the hash + output (to match the birthday attack bound) and not less than 80 bits + (a suitable lower bound on the number of bits that need to be + predicted by an attacker). We propose denoting a realization of HMAC + that uses a hash function H with t bits of output as HMAC-H-t. For + example, HMAC-SHA1-80 denotes HMAC computed using the SHA-1 function + and with the output truncated to 80 bits. (If the parameter t is not + specified, e.g. HMAC-MD5, then it is assumed that all the bits of the + hash are output.) + +6. Security + + The security of the message authentication mechanism presented here + depends on cryptographic properties of the hash function H: the + resistance to collision finding (limited to the case where the + initial value is secret and random, and where the output of the + function is not explicitly available to the attacker), and the + message authentication property of the compression function of H when + applied to single blocks (in HMAC these blocks are partially unknown + to an attacker as they contain the result of the inner H computation + and, in particular, cannot be fully chosen by the attacker). + + These properties, and actually stronger ones, are commonly assumed + for hash functions of the kind used with HMAC. In particular, a hash + function for which the above properties do not hold would become + unsuitable for most (probably, all) cryptographic applications, + including alternative message authentication schemes based on such + functions. (For a complete analysis and rationale of the HMAC + function the reader is referred to [BCK1].) + + + + + + + + +Krawczyk, et. al. 
Informational [Page 5] + +RFC 2104 HMAC February 1997 + + + Given the limited confidence gained so far as for the cryptographic + strength of candidate hash functions, it is important to observe the + following two properties of the HMAC construction and its secure use + for message authentication: + + 1. The construction is independent of the details of the particular + hash function H in use and then the latter can be replaced by any + other secure (iterative) cryptographic hash function. + + 2. Message authentication, as opposed to encryption, has a + "transient" effect. A published breaking of a message authentication + scheme would lead to the replacement of that scheme, but would have + no adversarial effect on information authenticated in the past. This + is in sharp contrast with encryption, where information encrypted + today may suffer from exposure in the future if, and when, the + encryption algorithm is broken. + + The strongest attack known against HMAC is based on the frequency of + collisions for the hash function H ("birthday attack") [PV,BCK2], and + is totally impractical for minimally reasonable hash functions. + + As an example, if we consider a hash function like MD5 where the + output length equals L=16 bytes (128 bits) the attacker needs to + acquire the correct message authentication tags computed (with the + _same_ secret key K!) on about 2**64 known plaintexts. This would + require the processing of at least 2**64 blocks under H, an + impossible task in any realistic scenario (for a block length of 64 + bytes this would take 250,000 years in a continuous 1Gbps link, and + without changing the secret key K during all this time). This attack + could become realistic only if serious flaws in the collision + behavior of the function H are discovered (e.g. collisions found + after 2**30 messages). 
Such a discovery would determine the immediate + replacement of the function H (the effects of such failure would be + far more severe for the traditional uses of H in the context of + digital signatures, public key certificates, etc.). + + Note: this attack needs to be strongly contrasted with regular + collision attacks on cryptographic hash functions where no secret key + is involved and where 2**64 off-line parallelizable (!) operations + suffice to find collisions. The latter attack is approaching + feasibility [VW] while the birthday attack on HMAC is totally + impractical. (In the above examples, if one uses a hash function + with, say, 160 bit of output then 2**64 should be replaced by 2**80.) + + + + + + + + +Krawczyk, et. al. Informational [Page 6] + +RFC 2104 HMAC February 1997 + + + A correct implementation of the above construction, the choice of + random (or cryptographically pseudorandom) keys, a secure key + exchange mechanism, frequent key refreshments, and good secrecy + protection of keys are all essential ingredients for the security of + the integrity verification mechanism provided by HMAC. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Krawczyk, et. al. Informational [Page 7] + +RFC 2104 HMAC February 1997 + + +Appendix -- Sample Code + + For the sake of illustration we provide the following sample code for + the implementation of HMAC-MD5 as well as some corresponding test + vectors (the code is based on MD5 code as described in [MD5]). 
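As a cross-check on the construction and on the test vectors that accompany the sample code, here is a minimal Python sketch. It is not part of the RFC. It assumes Python 3 with the standard `hashlib` and `hmac` modules, and the names `hmac_md5`, `truncate`, `BLOCK_SIZE`, `VECTORS` and `check_vectors` are illustrative. It builds HMAC-MD5 directly from the ipad/opad definition and keeps the t leftmost bits in the sense of HMAC-H-t.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # byte length B of an MD5 input block


def hmac_md5(key: bytes, text: bytes) -> bytes:
    """Compute MD5(K XOR opad, MD5(K XOR ipad, text)) from the definition."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.md5(key).digest()       # keys longer than B are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")      # then zero-padded to B bytes
    k_ipad = bytes(b ^ 0x36 for b in key)
    k_opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.md5(k_ipad + text).digest()
    return hashlib.md5(k_opad + inner).digest()


def truncate(tag: bytes, t_bits: int) -> bytes:
    """Return the t leftmost bits of a tag (whole bytes only in this sketch)."""
    if t_bits % 8 != 0:
        raise ValueError("this sketch only handles t as a multiple of 8")
    return tag[: t_bits // 8]


# (key, data, expected digest) taken from the test vectors in the appendix.
VECTORS = [
    (b"\x0b" * 16, b"Hi There", "9294727a3638bb1c13f48ef8158bfc9d"),
    (b"Jefe", b"what do ya want for nothing?", "750c783e6ab0b503eaa86e310a5db738"),
    (b"\xaa" * 16, b"\xdd" * 50, "56be34521d144c88dbb8c733f0e8b3f6"),
]


def check_vectors() -> bool:
    """True if the hand-built HMAC matches both the library and the vectors."""
    for key, data, expected in VECTORS:
        tag = hmac_md5(key, data)
        library_tag = hmac.new(key, data, hashlib.md5).digest()
        if not hmac.compare_digest(tag, library_tag):
            return False
        if tag.hex() != expected:
            return False
    return True
```

Calling `truncate(hmac_md5(key, data), 80)` would give an HMAC-MD5-80 tag, which satisfies both lower bounds recommended for t (at least half the 128-bit hash output and at least 80 bits).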
+
+/*
+** Function: hmac_md5
+*/
+
+void
+hmac_md5(text, text_len, key, key_len, digest)
+unsigned char*  text;                /* pointer to data stream */
+int             text_len;            /* length of data stream */
+unsigned char*  key;                 /* pointer to authentication key */
+int             key_len;             /* length of authentication key */
+caddr_t         digest;              /* caller digest to be filled in */
+
+{
+        MD5_CTX context;
+        unsigned char k_ipad[65];    /* inner padding -
+                                      * key XORd with ipad
+                                      */
+        unsigned char k_opad[65];    /* outer padding -
+                                      * key XORd with opad
+                                      */
+        unsigned char tk[16];
+        int i;
+        /* if key is longer than 64 bytes reset it to key=MD5(key) */
+        if (key_len > 64) {
+
+                MD5_CTX      tctx;
+
+                MD5Init(&tctx);
+                MD5Update(&tctx, key, key_len);
+                MD5Final(tk, &tctx);
+
+                key = tk;
+                key_len = 16;
+        }
+
+        /*
+         * the HMAC_MD5 transform looks like:
+         *
+         * MD5(K XOR opad, MD5(K XOR ipad, text))
+         *
+         * where K is an n byte key
+         * ipad is the byte 0x36 repeated 64 times
+         * opad is the byte 0x5c repeated 64 times
+         * and text is the data being protected
+         */
+
+        /* start out by storing key in pads */
+        bzero( k_ipad, sizeof k_ipad);
+        bzero( k_opad, sizeof k_opad);
+        bcopy( key, k_ipad, key_len);
+        bcopy( key, k_opad, key_len);
+
+        /* XOR key with ipad and opad values */
+        for (i=0; i<64; i++) {
+                k_ipad[i] ^= 0x36;
+                k_opad[i] ^= 0x5c;
+        }
+        /*
+         * perform inner MD5
+         */
+        MD5Init(&context);                   /* init context for 1st
+                                              * pass */
+        MD5Update(&context, k_ipad, 64);     /* start with inner pad */
+        MD5Update(&context, text, text_len); /* then text of datagram */
+        MD5Final(digest, &context);          /* finish up 1st pass */
+        /*
+         * perform outer MD5
+         */
+        MD5Init(&context);                   /* init context for 2nd
+                                              * pass */
+        MD5Update(&context, k_opad, 64);     /* start with outer pad */
+        MD5Update(&context, digest, 16);     /* then results of 1st
+                                              * hash */
+        MD5Final(digest, &context);          /* finish up 2nd pass */
+}
+
+Test Vectors (Trailing '\0' of a character string
not included in test): + + key = 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b + key_len = 16 bytes + data = "Hi There" + data_len = 8 bytes + digest = 0x9294727a3638bb1c13f48ef8158bfc9d + + key = "Jefe" + data = "what do ya want for nothing?" + data_len = 28 bytes + digest = 0x750c783e6ab0b503eaa86e310a5db738 + + key = 0xAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA + + + +Krawczyk, et. al. Informational [Page 9] + +RFC 2104 HMAC February 1997 + + + key_len 16 bytes + data = 0xDDDDDDDDDDDDDDDDDDDD... + ..DDDDDDDDDDDDDDDDDDDD... + ..DDDDDDDDDDDDDDDDDDDD... + ..DDDDDDDDDDDDDDDDDDDD... + ..DDDDDDDDDDDDDDDDDDDD + data_len = 50 bytes + digest = 0x56be34521d144c88dbb8c733f0e8b3f6 + +Acknowledgments + + Pau-Chen Cheng, Jeff Kraemer, and Michael Oehler, have provided + useful comments on early drafts, and ran the first interoperability + tests of this specification. Jeff and Pau-Chen kindly provided the + sample code and test vectors that appear in the appendix. Burt + Kaliski, Bart Preneel, Matt Robshaw, Adi Shamir, and Paul van + Oorschot have provided useful comments and suggestions during the + investigation of the HMAC construction. + +References + + [ANSI] ANSI X9.9, "American National Standard for Financial + Institution Message Authentication (Wholesale)," American + Bankers Association, 1981. Revised 1986. + + [Atk] Atkinson, R., "IP Authentication Header", RFC 1826, August + 1995. + + [BCK1] M. Bellare, R. Canetti, and H. Krawczyk, + "Keyed Hash Functions and Message Authentication", + Proceedings of Crypto'96, LNCS 1109, pp. 1-15. + (http://www.research.ibm.com/security/keyed-md5.html) + + [BCK2] M. Bellare, R. Canetti, and H. Krawczyk, + "Pseudorandom Functions Revisited: The Cascade Construction", + Proceedings of FOCS'96. + + [Dobb] H. Dobbertin, "The Status of MD5 After a Recent Attack", + RSA Labs' CryptoBytes, Vol. 2 No. 2, Summer 1996. + http://www.rsa.com/rsalabs/pubs/cryptobytes.html + + [PV] B. Preneel and P. 
van Oorschot, "Building fast MACs from hash + functions", Advances in Cryptology -- CRYPTO'95 Proceedings, + Lecture Notes in Computer Science, Springer-Verlag Vol.963, + 1995, pp. 1-14. + + [MD5] Rivest, R., "The MD5 Message-Digest Algorithm", + RFC 1321, April 1992. + + + +Krawczyk, et. al. Informational [Page 10] + +RFC 2104 HMAC February 1997 + + + [MM] Meyer, S. and Matyas, S.M., Cryptography, New York Wiley, + 1982. + + [RIPEMD] H. Dobbertin, A. Bosselaers, and B. Preneel, "RIPEMD-160: A + strengthened version of RIPEMD", Fast Software Encryption, + LNCS Vol 1039, pp. 71-82. + ftp://ftp.esat.kuleuven.ac.be/pub/COSIC/bosselae/ripemd/. + + [SHA] NIST, FIPS PUB 180-1: Secure Hash Standard, April 1995. + + [Tsu] G. Tsudik, "Message authentication with one-way hash + functions", In Proceedings of Infocom'92, May 1992. + (Also in "Access Control and Policy Enforcement in + Internetworks", Ph.D. Dissertation, Computer Science + Department, University of Southern California, April 1991.) + + [VW] P. van Oorschot and M. Wiener, "Parallel Collision + Search with Applications to Hash Functions and Discrete + Logarithms", Proceedings of the 2nd ACM Conf. Computer and + Communications Security, Fairfax, VA, November 1994. + +Authors' Addresses + + Hugo Krawczyk + IBM T.J. Watson Research Center + P.O.Box 704 + Yorktown Heights, NY 10598 + + EMail: hugo@watson.ibm.com + + Mihir Bellare + Dept of Computer Science and Engineering + Mail Code 0114 + University of California at San Diego + 9500 Gilman Drive + La Jolla, CA 92093 + + EMail: mihir@cs.ucsd.edu + + Ran Canetti + IBM T.J. Watson Research Center + P.O.Box 704 + Yorktown Heights, NY 10598 + + EMail: canetti@watson.ibm.com + + + + + + +Krawczyk, et. al. Informational [Page 11] + + diff --git a/doc/rfc/rfc2119.txt b/doc/rfc/rfc2119.txt new file mode 100644 index 00000000..c19d55b0 --- /dev/null +++ b/doc/rfc/rfc2119.txt @@ -0,0 +1,171 @@ +
+
+
+
+
+
+Network Working Group S. Bradner
+Request for Comments: 2119 Harvard University
+BCP: 14 March 1997
+Category: Best Current Practice
+
+
+ Key words for use in RFCs to Indicate Requirement Levels
+
+Status of this Memo
+
+ This document specifies an Internet Best Current Practices for the
+ Internet Community, and requests discussion and suggestions for
+ improvements. Distribution of this memo is unlimited.
+
+Abstract
+
+ In many standards track documents several words are used to signify
+ the requirements in the specification. These words are often
+ capitalized. This document defines these words as they should be
+ interpreted in IETF documents. Authors who follow these guidelines
+ should incorporate this phrase near the beginning of their document:
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+ NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ RFC 2119.
+
+ Note that the force of these words is modified by the requirement
+ level of the document in which they are used.
+
+1. MUST This word, or the terms "REQUIRED" or "SHALL", mean that the
+ definition is an absolute requirement of the specification.
+
+2. MUST NOT This phrase, or the phrase "SHALL NOT", mean that the
+ definition is an absolute prohibition of the specification.
+
+3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
+ may exist valid reasons in particular circumstances to ignore a
+ particular item, but the full implications must be understood and
+ carefully weighed before choosing a different course.
+
+4. SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED", mean that
+ there may exist valid reasons in particular circumstances when the
+ particular behavior is acceptable or even useful, but the full
+ implications should be understood and the case carefully weighed
+ before implementing any behavior described with this label.
+
+
+
+
+
+Bradner Best Current Practice [Page 1]
+
+RFC 2119 RFC Key Words March 1997
+
+
+5. MAY This word, or the adjective "OPTIONAL", mean that an item is
+ truly optional. One vendor may choose to include the item because a
+ particular marketplace requires it or because the vendor feels that
+ it enhances the product while another vendor may omit the same item.
+ An implementation which does not include a particular option MUST be
+ prepared to interoperate with another implementation which does
+ include the option, though perhaps with reduced functionality. In the
+ same vein an implementation which does include a particular option
+ MUST be prepared to interoperate with another implementation which
+ does not include the option (except, of course, for the feature the
+ option provides).
+
+6. Guidance in the use of these Imperatives
+
+ Imperatives of the type defined in this memo must be used with care
+ and sparingly. In particular, they MUST only be used where it is
+ actually required for interoperation or to limit behavior which has
+ potential for causing harm (e.g., limiting retransmissions). For
+ example, they must not be used to try to impose a particular method
+ on implementors where the method is not required for
+ interoperability.
+
+7. Security Considerations
+
+ These terms are frequently used to specify behavior with security
+ implications. The effects on security of not implementing a MUST or
+ SHOULD, or doing something the specification says MUST NOT or SHOULD
+ NOT be done may be very subtle. Document authors should take the time
+ to elaborate the security implications of not following
+ recommendations or requirements as most implementors will not have
+ had the benefit of the experience and discussion that produced the
+ specification.
+
+8. Acknowledgments
+
+ The definitions of these terms are an amalgam of definitions taken
+ from a number of RFCs. In addition, suggestions have been
+ incorporated from a number of people including Robert Ullmann, Thomas
+ Narten, Neal McBurnett, and Robert Elz.
+
+9. Author's Address
+
+ Scott Bradner
+ Harvard University
+ 1350 Mass. Ave.
+ Cambridge, MA 02138
+
+ phone - +1 617 495 3864
+
+ email - sob@harvard.edu
+
diff --git a/doc/rfc/rfc2136.txt b/doc/rfc/rfc2136.txt new file mode 100644 index 00000000..4d62702e --- /dev/null +++ b/doc/rfc/rfc2136.txt @@ -0,0 +1,1460 @@ + + + + + + +Network Working Group P. Vixie, Editor +Request for Comments: 2136 ISC +Updates: 1035 S. Thomson +Category: Standards Track Bellcore + Y. Rekhter + Cisco + J. Bound + DEC + April 1997 + + Dynamic Updates in the Domain Name System (DNS UPDATE) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + The Domain Name System was originally designed to support queries of + a statically configured database. While the data was expected to + change, the frequency of those changes was expected to be fairly low, + and all updates were made as external edits to a zone's Master File. + + Using this specification of the UPDATE opcode, it is possible to add + or delete RRs or RRsets from a specified zone. Prerequisites are + specified separately from update operations, and can specify a + dependency upon either the previous existence or nonexistence of an + RRset, or the existence of a single RR. + + UPDATE is atomic, i.e., all prerequisites must be satisfied or else + no update operations will take place. There are no data dependent + error conditions defined after the prerequisites have been met. + +1 - Definitions + + This document intentionally gives more definition to the roles of + "Master," "Slave," and "Primary Master" servers, and their + enumeration in NS RRs, and the SOA MNAME field. 
In that sense, the + following server type definitions can be considered an addendum to + [RFC1035], and are intended to be consistent with [RFC1996]: + + Slave an authoritative server that uses AXFR or IXFR to + retrieve the zone and is named in the zone's NS + RRset. + + + +Vixie, et. al. Standards Track [Page 1] + +RFC 2136 DNS Update April 1997 + + + Master an authoritative server configured to be the + source of AXFR or IXFR data for one or more slave + servers. + + Primary Master master server at the root of the AXFR/IXFR + dependency graph. The primary master is named in + the zone's SOA MNAME field and optionally by an NS + RR. There is by definition only one primary master + server per zone. + + A domain name identifies a node within the domain name space tree + structure. Each node has a set (possibly empty) of Resource Records + (RRs). All RRs having the same NAME, CLASS and TYPE are called a + Resource Record Set (RRset). + + The pseudocode used in this document is for example purposes only. + If it is found to disagree with the text, the text shall be + considered authoritative. If the text is found to be ambiguous, the + pseudocode can be used to help resolve the ambiguity. + + 1.1 - Comparison Rules + + 1.1.1. Two RRs are considered equal if their NAME, CLASS, TYPE, + RDLENGTH and RDATA fields are equal. Note that the time-to-live + (TTL) field is explicitly excluded from the comparison. + + 1.1.2. The rules for comparison of character strings in names are + specified in [RFC1035 2.3.3]. + + 1.1.3. Wildcarding is disabled. That is, a wildcard ("*") in an + update only matches a wildcard ("*") in the zone, and vice versa. + + 1.1.4. Aliasing is disabled: A CNAME in the zone matches a CNAME in + the update, and will not otherwise be followed. All UPDATE + operations are done on the basis of canonical names. + + 1.1.5. The following RR types cannot be appended to an RRset. 
If the + following comparison rules are met, then an attempt to add the new RR + will result in the replacement of the previous RR: + + SOA compare only NAME, CLASS and TYPE -- it is not possible to + have more than one SOA per zone, even if any of the data + fields differ. + + WKS compare only NAME, CLASS, TYPE, ADDRESS, and PROTOCOL + -- only one WKS RR is possible for this tuple, even if the + services masks differ. + + + + +Vixie, et. al. Standards Track [Page 2] + +RFC 2136 DNS Update April 1997 + + + CNAME compare only NAME, CLASS, and TYPE -- it is not possible + to have more than one CNAME RR, even if their data fields + differ. + + 1.2 - Glue RRs + + For the purpose of determining whether a domain name used in the + UPDATE protocol is contained within a specified zone, a domain name + is "in" a zone if it is owned by that zone's domain name. See + section 7.18 for details. + + 1.3 - New Assigned Numbers + + CLASS = NONE (254) + RCODE = YXDOMAIN (6) + RCODE = YXRRSET (7) + RCODE = NXRRSET (8) + RCODE = NOTAUTH (9) + RCODE = NOTZONE (10) + Opcode = UPDATE (5) + +2 - Update Message Format + + The DNS Message Format is defined by [RFC1035 4.1]. Some extensions + are necessary (for example, more error codes are possible under + UPDATE than under QUERY) and some fields must be overloaded (see + description of CLASS fields below). + + The overall format of an UPDATE message is, following [ibid]: + + +---------------------+ + | Header | + +---------------------+ + | Zone | specifies the zone to be updated + +---------------------+ + | Prerequisite | RRs or RRsets which must (not) preexist + +---------------------+ + | Update | RRs or RRsets to be added or deleted + +---------------------+ + | Additional Data | additional data + +---------------------+ + + + + + + + + + + +Vixie, et. al. Standards Track [Page 3] + +RFC 2136 DNS Update April 1997 + + + The Header Section specifies that this message is an UPDATE, and + describes the size of the other sections. 
The Zone Section names the + zone that is to be updated by this message. The Prerequisite Section + specifies the starting invariants (in terms of zone content) required + for this update. The Update Section contains the edits to be made, + and the Additional Data Section contains data which may be necessary + to complete, but is not part of, this update. + + 2.1 - Transport Issues + + An update transaction may be carried in a UDP datagram, if the + request fits, or in a TCP connection (at the discretion of the + requestor). When TCP is used, the message is in the format described + in [RFC1035 4.2.2]. + + 2.2 - Message Header + + The header of the DNS Message Format is defined by [RFC 1035 4.1]. + Not all opcodes define the same set of flag bits, though as a + practical matter most of the bits defined for QUERY (in [ibid]) are + identically defined by the other opcodes. UPDATE uses only one flag + bit (QR). + + The DNS Message Format specifies record counts for its four sections + (Question, Answer, Authority, and Additional). UPDATE uses the same + fields, and the same section formats, but the naming and use of these + sections differs as shown in the following modified header, after + [RFC1035 4.1.1]: + + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ID | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + |QR| Opcode | Z | RCODE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ZOCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | PRCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | UPCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ADCOUNT | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + + + + + + +Vixie, et. al. Standards Track [Page 4] + +RFC 2136 DNS Update April 1997 + + + These fields are used as follows: + + ID A 16-bit identifier assigned by the entity that generates any + kind of request. 
This identifier is copied in the
+           corresponding reply and can be used by the requestor to match
+           replies to outstanding requests, or by the server to detect
+           duplicated requests from some requestor.
+
+   QR      A one bit field that specifies whether this message is a
+           request (0), or a response (1).
+
+   Opcode  A four bit field that specifies the kind of request in this
+           message.  This value is set by the originator of a request
+           and copied into the response.  The Opcode value that
+           identifies an UPDATE message is five (5).
+
+   Z       Reserved for future use.  Should be zero (0) in all requests
+           and responses.  A non-zero Z field should be ignored by
+           implementations of this specification.
+
+   RCODE   Response code - this four bit field is undefined in requests
+           and set in responses.  The values and meanings of this field
+           within responses are as follows:
+
+              Mnemonic   Value   Description
+              ------------------------------------------------------------
+              NOERROR    0       No error condition.
+              FORMERR    1       The name server was unable to interpret
+                                 the request due to a format error.
+              SERVFAIL   2       The name server encountered an internal
+                                 failure while processing this request,
+                                 for example an operating system error
+                                 or a forwarding timeout.
+              NXDOMAIN   3       Some name that ought to exist,
+                                 does not exist.
+              NOTIMP     4       The name server does not support
+                                 the specified Opcode.
+              REFUSED    5       The name server refuses to perform the
+                                 specified operation for policy or
+                                 security reasons.
+              YXDOMAIN   6       Some name that ought not to exist,
+                                 does exist.
+              YXRRSET    7       Some RRset that ought not to exist,
+                                 does exist.
+              NXRRSET    8       Some RRset that ought to exist,
+                                 does not exist.
+              NOTAUTH    9       The server is not authoritative for
+                                 the zone named in the Zone Section.
+              NOTZONE    10      A name used in the Prerequisite or
+                                 Update Section is not within the
+                                 zone denoted by the Zone Section.
+ + ZOCOUNT The number of RRs in the Zone Section. + + PRCOUNT The number of RRs in the Prerequisite Section. + + UPCOUNT The number of RRs in the Update Section. + + ADCOUNT The number of RRs in the Additional Data Section. + + 2.3 - Zone Section + + The Zone Section has the same format as that specified in [RFC1035 + 4.1.2], with the fields redefined as follows: + + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | | + / ZNAME / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ZTYPE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | ZCLASS | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + UPDATE uses this section to denote the zone of the records being + updated. All records to be updated must be in the same zone, and + therefore the Zone Section is allowed to contain exactly one record. + The ZNAME is the zone name, the ZTYPE must be SOA, and the ZCLASS is + the zone's class. + + 2.4 - Prerequisite Section + + This section contains a set of RRset prerequisites which must be + satisfied at the time the UPDATE packet is received by the primary + master server. The format of this section is as specified by + [RFC1035 4.1.3]. There are five possible sets of semantics that can + be expressed here, summarized as follows and then explained below. + + (1) RRset exists (value independent). At least one RR with a + specified NAME and TYPE (in the zone and class specified by + the Zone Section) must exist. + + + +Vixie, et. al. Standards Track [Page 6] + +RFC 2136 DNS Update April 1997 + + + (2) RRset exists (value dependent). A set of RRs with a + specified NAME and TYPE exists and has the same members + with the same RDATAs as the RRset specified here in this + Section. + + (3) RRset does not exist. No RRs with a specified NAME and TYPE + (in the zone and class denoted by the Zone Section) can exist. + + (4) Name is in use. 
At least one RR with a specified NAME (in
+        the zone and class specified by the Zone Section) must exist.
+        Note that this prerequisite is NOT satisfied by empty
+        nonterminals.
+
+   (5)  Name is not in use.  No RR of any type is owned by a
+        specified NAME.  Note that this prerequisite IS satisfied by
+        empty nonterminals.
+
+   The syntax of these is as follows:
+
+   2.4.1 - RRset Exists (Value Independent)
+
+   At least one RR with a specified NAME and TYPE (in the zone and class
+   specified in the Zone Section) must exist.
+
+   For this prerequisite, a requestor adds to the section a single RR
+   whose NAME and TYPE are equal to that of the zone RRset whose
+   existence is required.  RDLENGTH is zero and RDATA is therefore
+   empty.  CLASS must be specified as ANY to differentiate this
+   condition from that of an actual RR whose RDLENGTH is naturally zero
+   (0) (e.g., NULL).  TTL is specified as zero (0).
+
+   2.4.2 - RRset Exists (Value Dependent)
+
+   A set of RRs with a specified NAME and TYPE exists and has the same
+   members with the same RDATAs as the RRset specified here in this
+   section.  While RRset ordering is undefined and therefore not
+   significant to this comparison, the sets must be identical in their
+   extent.
+
+   For this prerequisite, a requestor adds to the section an entire
+   RRset whose preexistence is required.  NAME and TYPE are that of the
+   RRset being denoted.  CLASS is that of the zone.  TTL must be
+   specified as zero (0) and is ignored when comparing RRsets for
+   identity.
+
+   2.4.3 - RRset Does Not Exist
+
+   No RRs with a specified NAME and TYPE (in the zone and class denoted
+   by the Zone Section) can exist.
+
+   For this prerequisite, a requestor adds to the section a single RR
+   whose NAME and TYPE are equal to that of the RRset whose nonexistence
+   is required.  The RDLENGTH of this record is zero (0), and the RDATA
+   field is therefore empty.
CLASS must be specified as NONE in order + to distinguish this condition from a valid RR whose RDLENGTH is + naturally zero (0) (for example, the NULL RR). TTL must be specified + as zero (0). + + 2.4.4 - Name Is In Use + + Name is in use. At least one RR with a specified NAME (in the zone + and class specified by the Zone Section) must exist. Note that this + prerequisite is NOT satisfied by empty nonterminals. + + For this prerequisite, a requestor adds to the section a single RR + whose NAME is equal to that of the name whose ownership of an RR is + required. RDLENGTH is zero and RDATA is therefore empty. CLASS must + be specified as ANY to differentiate this condition from that of an + actual RR whose RDLENGTH is naturally zero (0) (e.g., NULL). TYPE + must be specified as ANY to differentiate this case from that of an + RRset existence test. TTL is specified as zero (0). + + 2.4.5 - Name Is Not In Use + + Name is not in use. No RR of any type is owned by a specified NAME. + Note that this prerequisite IS satisfied by empty nonterminals. + + For this prerequisite, a requestor adds to the section a single RR + whose NAME is equal to that of the name whose nonownership of any RRs + is required. RDLENGTH is zero and RDATA is therefore empty. CLASS + must be specified as NONE. TYPE must be specified as ANY. TTL must + be specified as zero (0). + + 2.5 - Update Section + + This section contains RRs to be added to or deleted from the zone. + The format of this section is as specified by [RFC1035 4.1.3]. There + are four possible sets of semantics, summarized below and with + details to follow. + + + + + + + +Vixie, et. al. Standards Track [Page 8] + +RFC 2136 DNS Update April 1997 + + + (1) Add RRs to an RRset. + (2) Delete an RRset. + (3) Delete all RRsets from a name. + (4) Delete an RR from an RRset. 
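The four operations listed above are told apart purely by the CLASS and TYPE metavalues of each Update RR, as Sections 2.5.1 through 2.5.4 spell out. A minimal Python sketch of that dispatch follows. It is an illustration rather than part of the RFC: the `UpdateRR` record and `classify_update` function are hypothetical names, NONE = 254 is the value assigned in Section 1.3, and IN = 1, A = 1 and ANY = 255 are the RFC 1035 values. It does not attempt the full prescan (unrecognized types and the other QUERY metatypes are not checked).

```python
from dataclasses import dataclass

CLASS_IN, CLASS_NONE, CLASS_ANY = 1, 254, 255
TYPE_A, TYPE_ANY = 1, 255


@dataclass(frozen=True)
class UpdateRR:
    """Hypothetical in-memory form of one RR from the Update Section."""
    name: str
    rtype: int
    rclass: int
    ttl: int = 0
    rdata: bytes = b""


def classify_update(rr: UpdateRR, zclass: int = CLASS_IN) -> str:
    """Name the update operation an RR encodes, or raise on a format error."""
    if rr.rclass == zclass:
        if rr.rtype == TYPE_ANY:
            raise ValueError("FORMERR: cannot add an RR of TYPE ANY")
        return "add to an RRset"
    if rr.rclass == CLASS_ANY:
        # Both deletion-by-name forms carry TTL 0 and empty RDATA.
        if rr.ttl != 0 or rr.rdata:
            raise ValueError("FORMERR: CLASS ANY needs TTL 0 and empty RDATA")
        if rr.rtype == TYPE_ANY:
            return "delete all RRsets from a name"
        return "delete an RRset"
    if rr.rclass == CLASS_NONE:
        if rr.ttl != 0:
            raise ValueError("FORMERR: CLASS NONE needs TTL 0")
        return "delete an RR from an RRset"
    raise ValueError("FORMERR: CLASS must be the zone class, ANY or NONE")
```

For example, `classify_update(UpdateRR("www.example.com.", TYPE_ANY, CLASS_ANY))` names the "delete all RRsets from a name" case, while the same name with `TYPE_A` and `CLASS_NONE` plus a 4-byte RDATA names the single-RR deletion.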
+ + The syntax of these is as follows: + + 2.5.1 - Add To An RRset + + RRs are added to the Update Section whose NAME, TYPE, TTL, RDLENGTH + and RDATA are those being added, and CLASS is the same as the zone + class. Any duplicate RRs will be silently ignored by the primary + master. + + 2.5.2 - Delete An RRset + + One RR is added to the Update Section whose NAME and TYPE are those + of the RRset to be deleted. TTL must be specified as zero (0) and is + otherwise not used by the primary master. CLASS must be specified as + ANY. RDLENGTH must be zero (0) and RDATA must therefore be empty. + If no such RRset exists, then this Update RR will be silently ignored + by the primary master. + + 2.5.3 - Delete All RRsets From A Name + + One RR is added to the Update Section whose NAME is that of the name + to be cleansed of RRsets. TYPE must be specified as ANY. TTL must + be specified as zero (0) and is otherwise not used by the primary + master. CLASS must be specified as ANY. RDLENGTH must be zero (0) + and RDATA must therefore be empty. If no such RRsets exist, then + this Update RR will be silently ignored by the primary master. + + 2.5.4 - Delete An RR From An RRset + + RRs to be deleted are added to the Update Section. The NAME, TYPE, + RDLENGTH and RDATA must match the RR being deleted. TTL must be + specified as zero (0) and will otherwise be ignored by the primary + master. CLASS must be specified as NONE to distinguish this from an + RR addition. If no such RRs exist, then this Update RR will be + silently ignored by the primary master. + + + + + + + + + + + +Vixie, et. al. Standards Track [Page 9] + +RFC 2136 DNS Update April 1997 + + + 2.6 - Additional Data Section + + This section contains RRs which are related to the update itself, or + to new RRs being added by the update. For example, out of zone glue + (A RRs referred to by new NS RRs) should be presented here. The + server can use or ignore out of zone glue, at the discretion of the + server implementor. 
The format of this section is as specified by + [RFC1035 4.1.3]. + +3 - Server Behavior + + A server, upon receiving an UPDATE request, will signal NOTIMP to the + requestor if the UPDATE opcode is not recognized or if it is + recognized but has not been implemented. Otherwise, processing + continues as follows. + + 3.1 - Process Zone Section + + 3.1.1. The Zone Section is checked to see that there is exactly one + RR therein and that the RR's ZTYPE is SOA, else signal FORMERR to the + requestor. Next, the ZNAME and ZCLASS are checked to see if the zone + so named is one of this server's authority zones, else signal NOTAUTH + to the requestor. If the server is a zone slave, the request will be + forwarded toward the primary master. + + 3.1.2 - Pseudocode For Zone Section Processing + + if (zcount != 1 || ztype != SOA) + return (FORMERR) + if (zone_type(zname, zclass) == SLAVE) + return forward() + if (zone_type(zname, zclass) == MASTER) + return update() + return (NOTAUTH) + + Sections 3.2 through 3.8 describe the primary master's behaviour, + whereas Section 6 describes a forwarder's behaviour. + + 3.2 - Process Prerequisite Section + + Next, the Prerequisite Section is checked to see that all + prerequisites are satisfied by the current state of the zone. Using + the definitions expressed in Section 1.2, if any RR's NAME is not + within the zone specified in the Zone Section, signal NOTZONE to the + requestor. + + + + + + +Vixie, et. al. Standards Track [Page 10] + +RFC 2136 DNS Update April 1997 + + + 3.2.1. For RRs in this section whose CLASS is ANY, test to see that + TTL and RDLENGTH are both zero (0), else signal FORMERR to the + requestor. If TYPE is ANY, test to see that there is at least one RR + in the zone whose NAME is the same as that of the Prerequisite RR, + else signal NXDOMAIN to the requestor. 
If TYPE is not ANY, test to + see that there is at least one RR in the zone whose NAME and TYPE are + the same as that of the Prerequisite RR, else signal NXRRSET to the + requestor. + + 3.2.2. For RRs in this section whose CLASS is NONE, test to see that + the TTL and RDLENGTH are both zero (0), else signal FORMERR to the + requestor. If the TYPE is ANY, test to see that there are no RRs in + the zone whose NAME is the same as that of the Prerequisite RR, else + signal YXDOMAIN to the requestor. If the TYPE is not ANY, test to + see that there are no RRs in the zone whose NAME and TYPE are the + same as that of the Prerequisite RR, else signal YXRRSET to the + requestor. + + 3.2.3. For RRs in this section whose CLASS is the same as the ZCLASS, + test to see that the TTL is zero (0), else signal FORMERR to the + requestor. Then, build an RRset for each unique <NAME,TYPE> and + compare each resulting RRset for set equality (same members, no more, + no less) with RRsets in the zone. If any Prerequisite RRset is not + entirely and exactly matched by a zone RRset, signal NXRRSET to the + requestor. If any RR in this section has a CLASS other than ZCLASS + or NONE or ANY, signal FORMERR to the requestor. + + 3.2.4 - Table Of Metavalues Used In Prerequisite Section + + CLASS TYPE RDATA Meaning + ------------------------------------------------------------ + ANY ANY empty Name is in use + ANY rrset empty RRset exists (value independent) + NONE ANY empty Name is not in use + NONE rrset empty RRset does not exist + zone rrset rr RRset exists (value dependent) + + + + + + + + + + + + + + + +Vixie, et. al. 
Standards Track [Page 11] + +RFC 2136 DNS Update April 1997 + + + 3.2.5 - Pseudocode for Prerequisite Section Processing + + for rr in prerequisites + if (rr.ttl != 0) + return (FORMERR) + if (zone_of(rr.name) != ZNAME) + return (NOTZONE); + if (rr.class == ANY) + if (rr.rdlength != 0) + return (FORMERR) + if (rr.type == ANY) + if (!zone_name<rr.name>) + return (NXDOMAIN) + else + if (!zone_rrset<rr.name, rr.type>) + return (NXRRSET) + if (rr.class == NONE) + if (rr.rdlength != 0) + return (FORMERR) + if (rr.type == ANY) + if (zone_name<rr.name>) + return (YXDOMAIN) + else + if (zone_rrset<rr.name, rr.type>) + return (YXRRSET) + if (rr.class == zclass) + temp<rr.name, rr.type> += rr + else + return (FORMERR) + + for rrset in temp + if (zone_rrset<rrset.name, rrset.type> != rrset) + return (NXRRSET) + + 3.3 - Check Requestor's Permissions + + 3.3.1. Next, the requestor's permission to update the RRs named in + the Update Section may be tested in an implementation dependent + fashion or using mechanisms specified in a subsequent Secure DNS + Update protocol. If the requestor does not have permission to + perform these updates, the server may write a warning message in its + operations log, and may either signal REFUSED to the requestor, or + ignore the permission problem and proceed with the update. + + + + + + + + +Vixie, et. al. Standards Track [Page 12] + +RFC 2136 DNS Update April 1997 + + + 3.3.2. While the exact processing is implementation defined, if these + verification activities are to be performed, this is the point in the + server's processing where such performance should take place, since + if a REFUSED condition is encountered after an update has been + partially applied, it will be necessary to undo the partial update + and restore the zone to its original state before answering the + requestor. 
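Editorial sketch (not part of RFC 2136): the prerequisite checks of Section 3.2 might be expressed in Python roughly as below. The zone model (a dict mapping (name, type) to a set of RDATA values), the `RR` tuple, and the helper names `in_zone` and `check_prerequisites` are all assumptions made for illustration. Names are assumed to be already normalized to lower case, and `in_zone` ignores zone cuts, which the definitions in Section 1.2 of the RFC do not.

```python
from collections import defaultdict
from typing import NamedTuple

ANY, NONE = "ANY", "NONE"   # metavalues, modelled here as plain strings


class RR(NamedTuple):       # hypothetical record shape, for illustration only
    name: str
    rrclass: str
    rrtype: str
    ttl: int
    rdata: bytes            # empty bytes stands for RDLENGTH == 0


def in_zone(name, zname):
    # Simplified membership test; a real server must also honour zone cuts.
    name, zname = name.rstrip("."), zname.rstrip(".")
    return zname == "" or name == zname or name.endswith("." + zname)


def check_prerequisites(prereqs, zone, zname, zclass):
    """Return an RCODE name on failure, or None if all prerequisites hold."""
    temp = defaultdict(set)  # value-dependent RRsets, compared at the end
    for rr in prereqs:
        if rr.ttl != 0:
            return "FORMERR"
        if not in_zone(rr.name, zname):
            return "NOTZONE"
        name_in_use = any(n == rr.name and rdatas
                          for (n, _type), rdatas in zone.items())
        rrset = zone.get((rr.name, rr.rrtype))
        if rr.rrclass == ANY:
            if rr.rdata:
                return "FORMERR"
            if rr.rrtype == ANY:
                if not name_in_use:
                    return "NXDOMAIN"
            elif not rrset:
                return "NXRRSET"
        elif rr.rrclass == NONE:
            if rr.rdata:
                return "FORMERR"
            if rr.rrtype == ANY:
                if name_in_use:
                    return "YXDOMAIN"
            elif rrset:
                return "YXRRSET"
        elif rr.rrclass == zclass:
            temp[(rr.name, rr.rrtype)].add(rr.rdata)
        else:
            return "FORMERR"
    # Each value-dependent RRset must match the zone's RRset exactly.
    for key, rdatas in temp.items():
        if zone.get(key, set()) != rdatas:
            return "NXRRSET"
    return None
```

A return value of None here simply means that processing would continue with the permission check and the Update Section; nothing is modified by this step.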
+ + 3.3.3 - Pseudocode for Permission Checking + + if (security policy exists) + if (this update is not permitted) + if (local option) + log a message about permission problem + if (local option) + return (REFUSED) + + 3.4 - Process Update Section + + Next, the Update Section is processed as follows. + + 3.4.1 - Prescan + + The Update Section is parsed into RRs and each RR's CLASS is checked + to see if it is ANY, NONE, or the same as the Zone Class, else signal + a FORMERR to the requestor. Using the definitions in Section 1.2, + each RR's NAME must be in the zone specified by the Zone Section, + else signal NOTZONE to the requestor. + + 3.4.1.2. For RRs whose CLASS is not ANY, check the TYPE and if it is + ANY, AXFR, MAILA, MAILB, or any other QUERY metatype, or any + unrecognized type, then signal FORMERR to the requestor. For RRs + whose CLASS is ANY or NONE, check the TTL to see that it is zero (0), + else signal a FORMERR to the requestor. For any RR whose CLASS is + ANY, check the RDLENGTH to make sure that it is zero (0) (that is, + the RDATA field is empty), and that the TYPE is not AXFR, MAILA, + MAILB, or any other QUERY metatype besides ANY, or any unrecognized + type, else signal FORMERR to the requestor. + + + + + + + + + + + + + +Vixie, et. al. Standards Track [Page 13] + +RFC 2136 DNS Update April 1997 + + + 3.4.1.3 - Pseudocode For Update Section Prescan + + [rr] for rr in updates + if (zone_of(rr.name) != ZNAME) + return (NOTZONE); + if (rr.class == zclass) + if (rr.type & ANY|AXFR|MAILA|MAILB) + return (FORMERR) + elsif (rr.class == ANY) + if (rr.ttl != 0 || rr.rdlength != 0 + || rr.type & AXFR|MAILA|MAILB) + return (FORMERR) + elsif (rr.class == NONE) + if (rr.ttl != 0 || rr.type & ANY|AXFR|MAILA|MAILB) + return (FORMERR) + else + return (FORMERR) + + 3.4.2 - Update + + The Update Section is parsed into RRs and these RRs are processed in + order. + + 3.4.2.1. 
If any system failure (such as an out of memory condition, + or a hardware error in persistent storage) occurs during the + processing of this section, signal SERVFAIL to the requestor and undo + all updates applied to the zone during this transaction. + + 3.4.2.2. Any Update RR whose CLASS is the same as ZCLASS is added to + the zone. In case of duplicate RDATAs (which for SOA RRs is always + the case, and for WKS RRs is the case if the ADDRESS and PROTOCOL + fields both match), the Zone RR is replaced by Update RR. If the + TYPE is SOA and there is no Zone SOA RR, or the new SOA.SERIAL is + lower (according to [RFC1982]) than or equal to the current Zone SOA + RR's SOA.SERIAL, the Update RR is ignored. In the case of a CNAME + Update RR and a non-CNAME Zone RRset or vice versa, ignore the CNAME + Update RR, otherwise replace the CNAME Zone RR with the CNAME Update + RR. + + 3.4.2.3. For any Update RR whose CLASS is ANY and whose TYPE is ANY, + all Zone RRs with the same NAME are deleted, unless the NAME is the + same as ZNAME in which case only those RRs whose TYPE is other than + SOA or NS are deleted. For any Update RR whose CLASS is ANY and + whose TYPE is not ANY all Zone RRs with the same NAME and TYPE are + deleted, unless the NAME is the same as ZNAME in which case neither + SOA or NS RRs will be deleted. + + + + + +Vixie, et. al. Standards Track [Page 14] + +RFC 2136 DNS Update April 1997 + + + 3.4.2.4. For any Update RR whose class is NONE, any Zone RR whose + NAME, TYPE, RDATA and RDLENGTH are equal to the Update RR is deleted, + unless the NAME is the same as ZNAME and either the TYPE is SOA or + the TYPE is NS and the matching Zone RR is the only NS remaining in + the RRset, in which case this Update RR is ignored. + + 3.4.2.5. Signal NOERROR to the requestor. 
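Editorial sketch (not part of RFC 2136): paragraph 3.4.2.2 above compares SOA serials "according to [RFC1982]", which is a wraparound comparison rather than an ordinary integer one. A minimal Python illustration follows, assuming 32-bit serials. The function names are hypothetical. When two serials differ by exactly 2**31 the comparison is undefined in RFC 1982; this sketch then reports neither as lower, which is an assumption of the sketch and not a rule from either RFC.

```python
SERIAL_BITS = 32
MODULUS = 1 << SERIAL_BITS          # 2**32
HALF = 1 << (SERIAL_BITS - 1)       # 2**31


def serial_lt(s1, s2):
    """True if s1 is lower than s2 under RFC 1982 serial arithmetic."""
    return s1 != s2 and ((s2 - s1) % MODULUS) < HALF


def soa_update_ignored(zone_serial, new_serial):
    """An SOA Update RR is ignored if its serial is lower than or equal
    to the zone's current serial (3.4.2.2)."""
    return new_serial == zone_serial or serial_lt(new_serial, zone_serial)


def next_serial(serial):
    """Increment modulo 2**32, stepping over zero, which the RFC says
    an SOA SERIAL must never be set to."""
    return (serial + 1) % MODULUS or 1
```

For example, a zone serial of 0xFFFFFFFF is considered lower than 0, so an update carrying serial 0 would move in a positive direction, even though the RFC separately advises against ever using zero.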
+ + 3.4.2.6 - Table Of Metavalues Used In Update Section + + CLASS TYPE RDATA Meaning + --------------------------------------------------------- + ANY ANY empty Delete all RRsets from a name + ANY rrset empty Delete an RRset + NONE rrset rr Delete an RR from an RRset + zone rrset rr Add to an RRset + + 3.4.2.7 - Pseudocode For Update Section Processing + + [rr] for rr in updates + if (rr.class == zclass) + if (rr.type == CNAME) + if (zone_rrset<rr.name, ~CNAME>) + next [rr] + elsif (zone_rrset<rr.name, CNAME>) + next [rr] + if (rr.type == SOA) + if (!zone_rrset<rr.name, SOA> || + zone_rr<rr.name, SOA>.serial > rr.soa.serial) + next [rr] + for zrr in zone_rrset<rr.name, rr.type> + if (rr.type == CNAME || rr.type == SOA || + (rr.type == WKS && rr.proto == zrr.proto && + rr.address == zrr.address) || + rr.rdata == zrr.rdata) + zrr = rr + next [rr] + zone_rrset<rr.name, rr.type> += rr + elsif (rr.class == ANY) + if (rr.type == ANY) + if (rr.name == zname) + zone_rrset<rr.name, ~(SOA|NS)> = Nil + else + zone_rrset<rr.name, *> = Nil + elsif (rr.name == zname && + (rr.type == SOA || rr.type == NS)) + next [rr] + else + + + +Vixie, et. al. Standards Track [Page 15] + +RFC 2136 DNS Update April 1997 + + + zone_rrset<rr.name, rr.type> = Nil + elsif (rr.class == NONE) + if (rr.type == SOA) + next [rr] + if (rr.type == NS && zone_rrset<rr.name, NS> == rr) + next [rr] + zone_rr<rr.name, rr.type, rr.data> = Nil + return (NOERROR) + + 3.5 - Stability + + When a zone is modified by an UPDATE operation, the server must + commit the change to nonvolatile storage before sending a response to + the requestor or answering any queries or transfers for the modified + zone. It is reasonable for a server to store only the update records + as long as a system reboot or power failure will cause these update + records to be incorporated into the zone the next time the server is + started. 
It is also reasonable for the server to copy the entire + modified zone to nonvolatile storage after each update operation, + though this would have suboptimal performance for large zones. + + 3.6 - Zone Identity + + If the zone's SOA SERIAL is changed by an update operation, that + change must be in a positive direction (using modulo 2**32 arithmetic + as specified by [RFC1982]). Attempts to replace an SOA with one + whose SERIAL is less than the current one will be silently ignored by + the primary master server. + + If the zone's SOA's SERIAL is not changed as a result of an update + operation, then the server shall increment it automatically before + the SOA or any changed name or RR or RRset is included in any + response or transfer. The primary master server's implementor might + choose to autoincrement the SOA SERIAL if any of the following events + occurs: + + (1) Each update operation. + + (2) A name, RR or RRset in the zone has changed and has subsequently + been visible to a DNS client since the unincremented SOA was + visible to a DNS client, and the SOA is about to become visible + to a DNS client. + + (3) A configurable period of time has elapsed since the last update + operation. This period shall be less than or equal to one third + of the zone refresh time, and the default shall be the lesser of + that maximum and 300 seconds. + + + + +Vixie, et. al. Standards Track [Page 16] + +RFC 2136 DNS Update April 1997 + + + (4) A configurable number of updates has been applied since the last + SOA change. The default value for this configuration parameter + shall be one hundred (100). + + It is imperative that the zone's contents and the SOA's SERIAL be + tightly synchronized. If the zone appears to change, the SOA must + appear to change as well. + + 3.7 - Atomicity + + During the processing of an UPDATE transaction, the server must + ensure atomicity with respect to other (concurrent) UPDATE or QUERY + transactions. 
No two transactions can be processed concurrently if + either depends on the final results of the other; in particular, a + QUERY should not be able to retrieve RRsets which have been partially + modified by a concurrent UPDATE, and an UPDATE should not be able to + start from prerequisites that might not still hold at the completion + of some other concurrent UPDATE. Finally, if two UPDATE transactions + would modify the same names, RRs or RRsets, then such UPDATE + transactions must be serialized. + + 3.8 - Response + + At the end of UPDATE processing, a response code will be known. A + response message is generated by copying the ID and Opcode fields + from the request, and either copying the ZOCOUNT, PRCOUNT, UPCOUNT, + and ADCOUNT fields and associated sections, or placing zeros (0) in + the these "count" fields and not including any part of the original + update. The QR bit is set to one (1), and the response is sent back + to the requestor. If the requestor used UDP, then the response will + be sent to the requestor's source UDP port. If the requestor used + TCP, then the response will be sent back on the requestor's open TCP + connection. + +4 - Requestor Behaviour + + 4.1. From a requestor's point of view, any authoritative server for + the zone can appear to be able to process update requests, even + though only the primary master server is actually able to modify the + zone's master file. Requestors are expected to know the name of the + zone they intend to update and to know or be able to determine the + name servers for that zone. + + + + + + + + + +Vixie, et. al. Standards Track [Page 17] + +RFC 2136 DNS Update April 1997 + + + 4.2. If update ordering is desired, the requestor will need to know + the value of the existing SOA RR. Requestors who update the SOA RR + must update the SOA SERIAL field in a positive direction (as defined + by [RFC1982]) and also preserve the other SOA fields unless the + requestor's explicit intent is to change them. 
The SOA SERIAL field + must never be set to zero (0). + + 4.3. If the requestor has reasonable cause to believe that all of a + zone's servers will be equally reachable, then it should arrange to + try the primary master server (as given by the SOA MNAME field if + matched by some NS NSDNAME) first to avoid unnecessary forwarding + inside the slave servers. (Note that the primary master will in some + cases not be reachable by all requestors, due to firewalls or network + partitioning.) + + 4.4. Once the zone's name servers been found and possibly sorted so + that the ones more likely to be reachable and/or support the UPDATE + opcode are listed first, the requestor composes an UPDATE message of + the following form and sends it to the first name server on its list: + + ID: (new) + Opcode: UPDATE + Zone zcount: 1 + Zone zname: (zone name) + Zone zclass: (zone class) + Zone ztype: T_SOA + Prerequisite Section: (see previous text) + Update Section: (see previous text) + Additional Data Section: (empty) + + 4.5. If the requestor receives a response, and the response has an + RCODE other than SERVFAIL or NOTIMP, then the requestor returns an + appropriate response to its caller. + + 4.6. If a response is received whose RCODE is SERVFAIL or NOTIMP, or + if no response is received within an implementation dependent timeout + period, or if an ICMP error is received indicating that the server's + port is unreachable, then the requestor will delete the unusable + server from its internal name server list and try the next one, + repeating until the name server list is empty. If the requestor runs + out of servers to try, an appropriate error will be returned to the + requestor's caller. + + + + + + + + + +Vixie, et. al. Standards Track [Page 18] + +RFC 2136 DNS Update April 1997 + + +5 - Duplicate Detection, Ordering and Mutual Exclusion + + 5.1. For correct operation, mechanisms may be needed to ensure + idempotence, order UPDATE requests and provide mutual exclusion. 
An + UPDATE message or response might be delivered zero times, one time, + or multiple times. Datagram duplication is of particular interest + since it covers the case of the so-called "replay attack" where a + correct request is duplicated maliciously by an intruder. + + 5.2. Multiple UPDATE requests or responses in transit might be + delivered in any order, due to network topology changes or load + balancing, or to multipath forwarding graphs wherein several slave + servers all forward to the primary master. In some cases, it might + be required that the earlier update not be applied after the later + update, where "earlier" and "later" are defined by an external time + base visible to some set of requestors, rather than by the order of + request receipt at the primary master. + + 5.3. A requestor can ensure transaction idempotence by explicitly + deleting some "marker RR" (rather than deleting the RRset of which it + is a part) and then adding a new "marker RR" with a different RDATA + field. The Prerequisite Section should specify that the original + "marker RR" must be present in order for this UPDATE message to be + accepted by the server. + + 5.4. If the request is duplicated by a network error, all duplicate + requests will fail since only the first will find the original + "marker RR" present and having its known previous value. The + decisions of whether to use such a "marker RR" and what RR to use are + left up to the application programmer, though one obvious choice is + the zone's SOA RR as described below. + + 5.5. Requestors can ensure update ordering by externally + synchronizing their use of successive values of the "marker RR." + Mutual exclusion can be addressed as a degenerate case, in that a + single succession of the "marker RR" is all that is needed. + + 5.6. A special case where update ordering and datagram duplication + intersect is when an RR validly changes to some new value and then + back to its previous value. 
Without a "marker RR" as described + above, this sequence of updates can leave the zone in an undefined + state if datagrams are duplicated. + + 5.7. To achieve an atomic multitransaction "read-modify-write" cycle, + a requestor could first retrieve the SOA RR, and build an UPDATE + message one of whose prerequisites was the old SOA RR. It would then + specify updates that would delete this SOA RR and add a new one with + an incremented SOA SERIAL, along with whatever actual prerequisites + + + +Vixie, et. al. Standards Track [Page 19] + +RFC 2136 DNS Update April 1997 + + + and updates were the object of the transaction. If the transaction + succeeds, the requestor knows that the RRs being changed were not + otherwise altered by any other requestor. + +6 - Forwarding + + When a zone slave forwards an UPDATE message upward toward the zone's + primary master server, it must allocate a new ID and prepare to enter + the role of "forwarding server," which is a requestor with respect to + the forward server. + + 6.1. The set of forward servers will be same as the set of servers + this zone slave would use as the source of AXFR or IXFR data. So, + while the original requestor might have used the zone's NS RRset to + locate its update server, a forwarder always forwards toward its + designated zone master servers. + + 6.2. If the original requestor used TCP, then the TCP connection from + the requestor is still open and the forwarder must use TCP to forward + the message. If the original requestor used UDP, the forwarder may + use either UDP or TCP to forward the message, at the whim of the + implementor. + + 6.3. It is reasonable for forward servers to be forwarders + themselves, if the AXFR dependency graph being followed is a deep one + involving firewalls and multiple connectivity realms. In most cases + the AXFR dependency graph will be shallow and the forward server will + be the primary master server. + + 6.4. 
The forwarder will not respond to its requestor until it + receives a response from its forward server. UPDATE transactions + involving forwarders are therefore time synchronized with respect to + the original requestor and the primary master server. + + 6.5. When there are multiple possible sources of AXFR data and + therefore multiple possible forward servers, a forwarder will use the + same fallback strategy with respect to connectivity or timeout errors + that it would use when performing an AXFR. This is implementation + dependent. + + 6.6. When a forwarder receives a response from a forward server, it + copies this response into a new response message, assigns its + requestor's ID to that message, and sends the response back to the + requestor. + + + + + + + +Vixie, et. al. Standards Track [Page 20] + +RFC 2136 DNS Update April 1997 + + +7 - Design, Implementation, Operation, and Protocol Notes + + Some of the principles which guided the design of this UPDATE + specification are as follows. Note that these are not part of the + formal specification and any disagreement between this section and + any other section of this document should be resolved in favour of + the other section. + + 7.1. Using metavalues for CLASS is possible only because all RRs in + the packet are assumed to be in the same zone, and CLASS is an + attribute of a zone rather than of an RRset. (It is for this reason + that the Zone Section is not optional.) + + 7.2. Since there are no data-present or data-absent errors possible + from processing the Update Section, any necessary data-present and + data- absent dependencies should be specified in the Prerequisite + Section. + + 7.3. The Additional Data Section can be used to supply a server with + out of zone glue that will be needed in referrals. For example, if + adding a new NS RR to HOME.VIX.COM specifying a nameserver called + NS.AU.OZ, the A RR for NS.AU.OZ can be included in the Additional + Data Section. 
Servers can use this information or ignore it, at the + discretion of the implementor. We discourage caching this + information for use in subsequent DNS responses. + + 7.4. The Additional Data Section might be used if some of the RRs + later needed for Secure DNS Update are not actually zone updates, but + rather ancillary keys or signatures not intended to be stored in the + zone (as an update would be), yet necessary for validating the update + operation. + + 7.5. It is expected that in the absence of Secure DNS Update, a + server will only accept updates if they come from a source address + that has been statically configured in the server's description of a + primary master zone. DHCP servers would be likely candidates for + inclusion in this statically configured list. + + 7.6. It is not possible to create a zone using this protocol, since + there is no provision for a slave server to be told who its master + servers are. It is expected that this protocol will be extended in + the future to cover this case. Therefore, at this time, the addition + of SOA RRs is unsupported. For similar reasons, deletion of SOA RRs + is also unsupported. + + + + + + + +Vixie, et. al. Standards Track [Page 21] + +RFC 2136 DNS Update April 1997 + + + 7.7. The prerequisite for specifying that a name own at least one RR + differs semantically from QUERY, in that QUERY would return + <NOERROR,ANCOUNT=0> rather than NXDOMAIN if queried for an RRset at + this name, while UPDATE's prerequisite condition [Section 2.4.4] + would NOT be satisfied. + + 7.8. It is possible for a UDP response to be lost in transit and for + a request to be retried due to a timeout condition. In this case an + UPDATE that was successful the first time it was received by the + primary master might ultimately appear to have failed when the + response to a duplicate request is finally received by the requestor. 
+ (This is because the original prerequisites may no longer be + satisfied after the update has been applied.) For this reason, + requestors who require an accurate response code must use TCP. + + 7.9. Because a requestor who requires an accurate response code will + initiate their UPDATE transaction using TCP, a forwarder who receives + a request via TCP must forward it using TCP. + + 7.10. Deferral of SOA SERIAL autoincrements is made possible so that + serial numbers can be conserved and wraparound at 2**32 can be made + an infrequent occurance. Visible (to DNS clients) SOA SERIALs need + to differ if the zone differs. Note that the Authority Section SOA + in a QUERY response is a form of visibility, for the purposes of this + prerequisite. + + 7.11. A zone's SOA SERIAL should never be set to zero (0) due to + interoperability problems with some older but widely installed + implementations of DNS. When incrementing an SOA SERIAL, if the + result of the increment is zero (0) (as will be true when wrapping + around 2**32), it is necessary to increment it again or set it to one + (1). See [RFC1982] for more detail on this subject. + + 7.12. Due to the TTL minimalization necessary when caching an RRset, + it is recommended that all TTLs in an RRset be set to the same value. + While the DNS Message Format permits variant TTLs to exist in the + same RRset, and this variance can exist inside a zone, such variance + will have counterintuitive results and its use is discouraged. + + 7.13. Zone cut management presents some obscure corner cases to the + add and delete operations in the Update Section. It is possible to + delete an NS RR as long as it is not the last NS RR at the root of a + zone. If deleting all RRs from a name, SOA and NS RRs at the root of + a zone are unaffected. If deleting RRsets, it is not possible to + delete either SOA or NS RRsets at the top of a zone. 
An attempt to + add an SOA will be treated as a replace operation if an SOA already + exists, or as a no-op if the SOA would be new. + + + + +Vixie, et. al. Standards Track [Page 22] + +RFC 2136 DNS Update April 1997 + + + 7.14. No semantic checking is required in the primary master server + when adding new RRs. Therefore a requestor can cause CNAME or NS or + any other kind of RR to be added even if their target name does not + exist or does not have the proper RRsets to make the original RR + useful. Primary master servers that DO implement this kind of + checking should take great care to avoid out-of-zone dependencies + (whose veracity cannot be authoritatively checked) and should + implement all such checking during the prescan phase. + + 7.15. Nonterminal or wildcard CNAMEs are not well specified by + [RFC1035] and their use will probably lead to unpredictable results. + Their use is discouraged. + + 7.16. Empty nonterminals (nodes with children but no RRs of their + own) will cause <NOERROR,ANCOUNT=0> responses to be sent in response + to a query of any type for that name. There is no provision for + empty terminal nodes -- so if all RRs of a terminal node are deleted, + the name is no longer in use, and queries of any type for that name + will result in an NXDOMAIN response. + + 7.17. In a deep AXFR dependency graph, it has not historically been + an error for slaves to depend mutually upon each other. This + configuration has been used to enable a zone to flow from the primary + master to all slaves even though not all slaves have continuous + connectivity to the primary master. UPDATE's use of the AXFR + dependency graph for forwarding prohibits this kind of dependency + loop, since UPDATE forwarding has no loop detection analagous to the + SOA SERIAL pretest used by AXFR. + + 7.18. 
Previously existing names which are occluded by a new zone cut + are still considered part of the parent zone, for the purposes of + zone transfers, even though queries for such names will be referred + to the new subzone's servers. If a zone cut is removed, all parent + zone names that were occluded by it will again become visible to + queries. (This is a clarification of [RFC1034].) + + 7.19. If a server is authoritative for both a zone and its child, + then queries for names at the zone cut between them will be answered + authoritatively using only data from the child zone. (This is a + clarification of [RFC1034].) + + + + + + + + + + + +Vixie, et. al. Standards Track [Page 23] + +RFC 2136 DNS Update April 1997 + + + 7.20. Update ordering using the SOA RR is problematic since there is + no way to know which of a zone's NS RRs represents the primary + master, and the zone slaves can be out of date if their SOA.REFRESH + timers have not elapsed since the last time the zone was changed on + the primary master. We recommend that a zone needing ordered updates + use only servers which implement NOTIFY (see [RFC1996]) and IXFR (see + [RFC1995]), and that a client receiving a prerequisite error while + attempting an ordered update simply retry after a random delay period + to allow the zone to settle. + +8 - Security Considerations + + 8.1. In the absence of [RFC2137] or equivilent technology, the + protocol described by this document makes it possible for anyone who + can reach an authoritative name server to alter the contents of any + zones on that server. This is a serious increase in vulnerability + from the current technology. Therefore it is very strongly + recommended that the protocols described in this document not be used + without [RFC2137] or other equivalently strong security measures, + e.g. IPsec. + + 8.2. 
A denial of service attack can be launched by flooding an update + forwarder with TCP sessions containing updates that the primary + master server will ultimately refuse due to permission problems. + This arises due to the requirement that an update forwarder receiving + a request via TCP use a synchronous TCP session for its forwarding + operation. The connection management mechanisms of [RFC1035 4.2.2] + are sufficient to prevent large scale damage from such an attack, but + not to prevent some queries from going unanswered during the attack. + +Acknowledgements + + We would like to thank the IETF DNSIND working group for their input + and assistance, in particular, Rob Austein, Randy Bush, Donald + Eastlake, Masataka Ohta, Mark Andrews, and Robert Elz. Special + thanks to Bill Simpson, Ken Wallich and Bob Halley for reviewing this + document. + + + + + + + + + + + + + + +Vixie, et. al. Standards Track [Page 24] + +RFC 2136 DNS Update April 1997 + + +References + + [RFC1035] + Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, USC/Information Sciences + Institute, November 1987. + + [RFC1982] + Elz, R., "Serial Number Arithmetic", RFC 1982, University of + Melbourne, August 1996. + + [RFC1995] + Ohta, M., "Incremental Zone Transfer", RFC 1995, Tokyo Institute + of Technology, August 1996. + + [RFC1996] + Vixie, P., "A Mechanism for Prompt Notification of Zone Changes", + RFC 1996, Internet Software Consortium, August 1996. + + [RFC2065] + Eastlake, D., and C. Kaufman, "Domain Name System Protocol + Security Extensions", RFC 2065, January 1997. + + [RFC2137] + Eastlake, D., "Secure Domain Name System Dynamic Update", RFC + 2137, April 1997. 
+ +Authors' Addresses + + Yakov Rekhter + Cisco Systems + 170 West Tasman Drive + San Jose, CA 95134-1706 + + Phone: +1 914 528 0090 + EMail: yakov@cisco.com + + + Susan Thomson + Bellcore + 445 South Street + Morristown, NJ 07960 + + Phone: +1 201 829 4514 + EMail: set@thumper.bellcore.com + + + + + + +Vixie, et. al. Standards Track [Page 25] + +RFC 2136 DNS Update April 1997 + + + Jim Bound + Digital Equipment Corp. + 110 Spitbrook Rd ZK3-3/U14 + Nashua, NH 03062-2698 + + Phone: +1 603 881 0400 + EMail: bound@zk3.dec.com + + + Paul Vixie + Internet Software Consortium + Star Route Box 159A + Woodside, CA 94062 + + Phone: +1 415 747 0204 + EMail: paul@vix.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Vixie, et. al. Standards Track [Page 26] + + diff --git a/doc/rfc/rfc2137.txt b/doc/rfc/rfc2137.txt new file mode 100644 index 00000000..ceb3613d --- /dev/null +++ b/doc/rfc/rfc2137.txt @@ -0,0 +1,619 @@ + + + + + + +Network Working Group D. Eastlake 3rd +Request for Comments: 2137 CyberCash, Inc. +Updates: 1035 April 1997 +Category: Standards Track + + + Secure Domain Name System Dynamic Update + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + Domain Name System (DNS) protocol extensions have been defined to + authenticate the data in DNS and provide key distribution services + [RFC2065]. DNS Dynamic Update operations have also been defined + [RFC2136], but without a detailed description of security for the + update operation. 
This memo describes how to use DNSSEC digital + signatures covering requests and data to secure updates and restrict + updates to those authorized to perform them as indicated by the + updater's possession of cryptographic keys. + +Acknowledgements + + The contributions of the following persons (who are listed in + alphabetic order) to this memo are gratefully acknowledged: + + Olafur Gudmundsson (ogud@tis.com> + Charlie Kaufman <Charlie_Kaufman@iris.com> + Stuart Kwan <skwan@microsoft.com> + Edward Lewis <lewis@tis.com> + +Table of Contents + + 1. Introduction............................................2 + 1.1 Overview of DNS Dynamic Update.........................2 + 1.2 Overview of DNS Security...............................2 + 2. Two Basic Modes.........................................3 + 3. Keys....................................................5 + 3.1 Update Keys............................................6 + 3.1.1 Update Key Name Scope................................6 + 3.1.2 Update Key Class Scope...............................6 + 3.1.3 Update Key Signatory Field...........................6 + + + +Eastlake Standards Track [Page 1] + +RFC 2137 SDNSDU April 1997 + + + 3.2 Zone Keys and Update Modes.............................8 + 3.3 Wildcard Key Punch Through.............................9 + 4. Update Signatures.......................................9 + 4.1 Update Request Signatures..............................9 + 4.2 Update Data Signatures................................10 + 5. Security Considerations................................10 + References................................................10 + Author's Address..........................................11 + +1. Introduction + + Dynamic update operations have been defined for the Domain Name + System (DNS) in RFC 2136, but without a detailed description of + security for those updates. Means of securing the DNS and using it + for key distribution have been defined in RFC 2065. 
+ + This memo proposes techniques based on the defined DNS security + mechanisms to authenticate DNS updates. + + Familiarity with the DNS system [RFC 1034, 1035] is assumed. + Familiarity with the DNS security and dynamic update proposals will + be helpful. + +1.1 Overview of DNS Dynamic Update + + DNS dynamic update defines a new DNS opcode, new DNS request and + response structure if that opcode is used, and new error codes. An + update can specify complex combinations of deletion and insertion + (with or without pre-existence testing) of resource records (RRs) + with one or more owner names; however, all testing and changes for + any particular DNS update request are restricted to a single zone. + Updates occur at the primary server for a zone. + + The primary server for a secure dynamic zone must increment the zone + SOA serial number when an update occurs or the next time the SOA is + retrieved if one or more updates have occurred since the previous SOA + retrieval and the updates themselves did not update the SOA. + +1.2 Overview of DNS Security + + DNS security authenticates data in the DNS by also storing digital + signatures in the DNS as SIG resource records (RRs). A SIG RR + provides a digital signature on the set of all RRs with the same + owner name and class as the SIG and whose type is the type covered by + the SIG. The SIG RR cryptographically binds the covered RR set to + the signer, time signed, signature expiration date, etc. There are + one or more keys associated with every secure zone and all data in + the secure zone is signed either by a zone key or by a dynamic update + + + +Eastlake Standards Track [Page 2] + +RFC 2137 SDNSDU April 1997 + + + key tracing its authority to a zone key. + + DNS security also defines transaction SIGs and request SIGs. + Transaction SIGs appear at the end of a response. 
Transaction SIGs + authenticate the response and bind it to the corresponding request + with the key of the host where the responding DNS server is. Request + SIGs appear at the end of a request and authenticate the request with + the key of the submitting entity. + + Request SIGs are the primary means of authenticating update requests. + + DNS security also permits the storage of public keys in the DNS via + KEY RRs. These KEY RRs are also, of course, authenticated by SIG + RRs. KEY RRs for zones are stored in their superzone and subzone + servers, if any, so that the secure DNS tree of zones can be + traversed by a security aware resolver. + +2. Two Basic Modes + + A dynamic secure zone is any secure DNS zone containing one or more + KEY RRs that can authorize dynamic updates, i.e., entity or user KEY + RRs with the signatory field non-zero, and whose zone KEY RR + signatory field indicates that updates are implemented. There are two + basic modes of dynamic secure zone which relate to the update + strategy, mode A and mode B. A summary comparison table is given + below and then each mode is described. 
+ + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 3] + +RFC 2137 SDNSDU April 1997 + +
+                 SUMMARY OF DYNAMIC SECURE ZONE MODES
+
+   CRITERIA:                |  MODE A            |  MODE B
+   =========================+====================+===================
+   Definition:              | Zone Key Off line  | Zone Key On line
+   =========================+====================+===================
+   Server Workload          |  Low               |  High
+   -------------------------+--------------------+-------------------
+   Static Data Security     |  Very High         |  Medium-High
+   -------------------------+--------------------+-------------------
+   Dynamic Data Security    |  Medium            |  Medium-High
+   -------------------------+--------------------+-------------------
+   Key Restrictions         |  Fine grain        |  Coarse grain
+   -------------------------+--------------------+-------------------
+   Dynamic Data Temporality |  Transient         |  Permanent
+   -------------------------+--------------------+-------------------
+   Dynamic Key Rollover     |  No                |  Yes
+   -------------------------+--------------------+-------------------
+
+ For mode A, the zone owner key and static zone master file are always + kept off-line for maximum security of the static zone contents. + + As a consequence, any dynamically added or changed RRs are signed in + the secure zone by their authorizing dynamic update key and they are + backed up, along with this SIG RR, in a separate online dynamic + master file. In this type of zone, server computation is minimized + since the server need only check signatures on the update data and + request, which have already been signed by the updater, generally a + much faster operation than signing data. However, the AXFR SIG and + NXT RRs which cover the zone under the zone key will not cover + dynamically added data.
Thus, for type A dynamic secure zones, zone + transfer security is not automatically provided for dynamically added + RRs, where they could be omitted, and authentication is not provided + for the server denial of the existence of a dynamically added type. + Because the dynamically added RRs retain their update KEY signed SIG, + finer grained control of updates can be implemented via bits in the + KEY RR signatory field. Because dynamic data is only stored in the + online dynamic master file and only authenticated by dynamic keys + which expire, updates are transient in nature. Key rollover for an + entity that can authorize dynamic updates is more cumbersome since + the authority of their key must be traceable to a zone key and so, in + general, they must securely communicate a new key to the zone + authority for manual transfer to the off line static master file. + NOTE: for this mode the zone SOA must be signed by a dynamic update + key and that private key must be kept on line so that the SOA can be + changed for updates. + + + + + +Eastlake Standards Track [Page 4] + +RFC 2137 SDNSDU April 1997 + + + For mode B, the zone owner key and master file are kept on-line at + the zone primary server. When authenticated updates succeed, SIGs + under the zone key for the resulting data (including the possible NXT + type bit map changes) are calculated and these SIG (and possible NXT) + changes are entered into the zone and the unified on-line master + file. (The zone transfer AXFR SIG may be recalculated for each + update or on demand when a zone transfer is requested and it is out + of date.) + + As a consequence, this mode requires considerably more computational + effort on the part of the server as the public/private keys are + generally arranged so that signing (calculating a SIG) is more effort + than verifying a signature.
The security of static data in the zone + is decreased because the ultimate state of the static data being + served and the ultimate zone authority private key are all on-line on + the net. This means that if the primary server is subverted, false + data could be authenticated to secondaries and other + servers/resolvers. On the other hand, this mode of operation means + that data added dynamically is more secure than in mode A. Dynamic + data will be covered by the AXFR SIG and thus always protected during + zone transfers and will be included in NXT RRs so that it can be + falsely denied by a server only to the same extent that static data + can (i.e., if it is within a wild card scope). Because the zone key + is used to sign all the zone data, the information as to who + originated the current state of dynamic RR sets is lost, making + unavailable the effects of some of the update control bits in the KEY + RR signatory field. In addition, the incorporation of the updates + into the primary master file and their authentication by the zone key + makes them permanent in nature. Maintaining the zone key on-line + also means that dynamic update keys which are signed by the zone key + can be dynamically updated since the zone key is available to + dynamically sign new values. + + NOTE: The Mode A / Mode B distinction only affects the validation + and performance of update requests. It has no effect on retrievals. + One reasonable operational scheme may be to keep a mostly static main + zone operating in Mode A and have one or more dynamic subzones + operating in Mode B. + +3. Keys + + Dynamic update requests depend on update keys as described in section + 3.1 below. In addition, the zone secure dynamic update mode and + availability of some options are indicated in the zone key. Finally, + a special rule is used in searching for KEYs to validate updates as + described in section 3.3.
+ + + + + +Eastlake Standards Track [Page 5] + +RFC 2137 SDNSDU April 1997 + + +3.1 Update Keys + + All update requests to a secure zone must include signatures by one + or more key(s) that together can authorize that update. In order for + the Domain Name System (DNS) server receiving the request to confirm + this, the key or keys must be available to and authenticated by that + server as a specially flagged KEY Resource Record. + + The scope of authority of such keys is indicated by their KEY RR + owner name, class, and signatory field flags as described below. In + addition, such KEY RRs must be entity or user keys and not have the + authentication use prohibited bit on. All parts of the actual update + must be within the scope of at least one of the keys used for a + request SIG on the update request as described in section 4. + +3.1.1 Update Key Name Scope + + The owner name of any update authorizing KEY RR must (1) be the same + as the owner name of any RRs being added or deleted or (2) be a wildcard + name including within its extended scope (see section 3.3) the name + of any RRs being added or deleted and those RRs must be in the same + zone. + +3.1.2 Update Key Class Scope + + The class of any update authorizing KEY RR must be the same as the + class of any RRs being added or deleted. + +3.1.3 Update Key Signatory Field + + The four bit "signatory field" (see RFC 2065) of any update + authorizing KEY RR must be non-zero. The bits have the meanings + described below for non-zone keys (see section 3.2 for zone type + keys). +
+                 UPDATE KEY RR SIGNATORY FIELD BITS
+
+              0           1           2           3
+        +-----------+-----------+-----------+-----------+
+        |   zone    |  strong   |  unique   |  general  |
+        +-----------+-----------+-----------+-----------+
+
+ Bit 0, zone control - If nonzero, this key is authorized to attach, + detach, and move zones by creating and deleting NS, glue A, and + zone KEY RR(s). If zero, the key can not authorize any update + that would affect such RRs.
This bit is meaningful for both + type A and type B dynamic secure zones. + + + + +Eastlake Standards Track [Page 6] + +RFC 2137 SDNSDU April 1997 + + + NOTE: do not confuse the "zone" signatory field bit with the + "zone" key type bit. + + Bit 1, strong update - If nonzero, this key is authorized to add and + delete RRs even if there are other RRs with the same owner name + and class that are authenticated by a SIG signed with a + different dynamic update KEY. If zero, the key can only + authorize updates where any existing RRs of the same owner and + class are authenticated by a SIG using the same key. This bit + is meaningful only for type A dynamic zones and is ignored in + type B dynamic zones. + + Keeping this bit zero on multiple KEY RRs with the same or + nested wild card owner names permits multiple entities to exist + that can create and delete names but can not affect RRs with + different owner names from any they created. In effect, this + creates two levels of dynamic update key, strong and weak, where + weak keys are limited in interfering with each other but a + strong key can interfere with any weak keys or other strong + keys. + + Bit 2, unique name update - If nonzero, this key is authorized to add + and update RRs for only a single owner name. If there already + exist RRs with one or more names signed by this key, they may be + updated but no new name created until the number of existing + names is reduced to zero. This bit is meaningful only for mode + A dynamic zones and is ignored in mode B dynamic zones. This bit + is meaningful only if the owner name is a wildcard. (Any + dynamic update KEY with a non-wildcard name is, in effect, a + unique name update key.) + + This bit can be used to restrict a KEY from flooding a zone with + new names. In conjunction with a local administratively imposed + limit on the number of dynamic RRs with a particular name, it + can completely restrict a KEY from flooding a zone with RRs.
+ + Bit 3, general update - The general update signatory field bit has no + special meaning. If the other three bits are all zero, it must + be one so that the field is non-zero to designate that the key + is an update key. The meaning of all values of the signatory + field with the general bit and one or more other signatory field + bits on is reserved. + + All the signatory bit update authorizations described above only + apply if the update is within the name and class scope as per + sections 3.1.1 and 3.1.2. + + + + + +Eastlake Standards Track [Page 7] + +RFC 2137 SDNSDU April 1997 + + +3.2 Zone Keys and Update Modes + + Zone type keys are automatically authorized to sign anything in their + zone, of course, regardless of the value of their signatory field. + For zone keys, the signatory field bits have different meanings than + they do for update keys, as shown below. The signatory field + MUST be zero if dynamic update is not supported for a zone and MUST + be non-zero if it is. +
+                  ZONE KEY RR SIGNATORY FIELD BITS
+
+              0           1           2           3
+        +-----------+-----------+-----------+-----------+
+        |   mode    |  strong   |  unique   |  general  |
+        +-----------+-----------+-----------+-----------+
+
+ Bit 0, mode - This bit indicates the update mode for this zone. Zero + indicates mode A while a one indicates mode B. + + Bit 1, strong update - If nonzero, this indicates that the "strong" + key feature described in section 3.1.3 above is implemented and + enabled for this secure zone. If zero, the feature is not + available. Has no effect if the zone is a mode B secure update + zone. + + Bit 2, unique name update - If nonzero, this indicates that the + "unique name" feature described in section 3.1.3 above is + implemented and enabled for this secure zone. If zero, this + feature is not available. Has no effect if the zone is a mode B + secure update zone. + + Bit 3, general - This bit has no special meaning.
If dynamic update + for a zone is supported and the other bits in the zone key + signatory field are zero, it must be a one. The meaning of zone + keys where the signatory field has the general bit and one or + more other bits on is reserved. + + If there are multiple dynamic update KEY RRs for a zone and zone + policy is in transition, they might have different non-zero signatory + fields. In that case, strong and unique name restrictions must be + enforced as long as there is a non-expired zone key being advertised + that indicates mode A with the strong or unique name bit on + respectively. Mode B updates MUST be supported as long as there is a + non-expired zone key that indicates mode B. Mode A updates may be + treated as mode B updates at server option if non-expired zone keys + indicate that both are supported. + + + + + +Eastlake Standards Track [Page 8] + +RFC 2137 SDNSDU April 1997 + + + A server that will be executing update operations on a zone, that is, + the primary master server, MUST NOT advertise a zone key that will + attract requests for a mode or features that it can not support. + +3.3 Wildcard Key Punch Through + + Just as a zone key is valid throughout the entire zone, update keys + with wildcard names are valid throughout their extended scope, within + the zone. That is, they remain valid for any name that would match + them, even existing specific names within their apparent scope. + + If this were not so, then whenever a name within a wildcard scope was + created by dynamic update, it would be necessary to first create a + copy of the KEY RR with this name, because otherwise the existence of + the more specific name would hide the authorizing KEY RR and would + make later updates impossible. An updater could create such a KEY RR + but could not zone sign it with their authorizing signer. They would + have to sign it with the same key using the wildcard name as signer.
+ Thus in creating, for example, one hundred type A RRs authorized by a + *.1.1.1.in-addr.arpa. KEY RR, without key punch through 100 As, 100 + KEYs, and 200 SIGs would have to be created as opposed to merely 100 + As and 100 SIGs with key punch through. + +4. Update Signatures + + Two kinds of signatures can appear in updates. Request signatures, + which are always required, cover the entire request and authenticate + the DNS header, including opcode, counts, etc., as well as the data. + Data signatures, on the other hand, appear only among the RRs to be + added and are only required for mode A operation. These two types of + signatures are described further below. + +4.1 Update Request Signatures + + An update can affect multiple owner names in a zone. It may be that + these different names are covered by different dynamic update keys. + For every owner name affected, the updater must know a private key + valid for that name (and the zone's class) and must prove this by + appending request SIG RRs under each such key. + + As specified in RFC 2065, a request signature is a SIG RR occurring + at the end of a request with a type covered field of zero. For an + update, request signatures occur in the Additional information + section. Each request SIG signs the entire request, including DNS + header, but excluding any other request SIG(s) and with the ARCOUNT + in the DNS header set to what it would be without the request SIGs. + + + + + +Eastlake Standards Track [Page 9] + +RFC 2137 SDNSDU April 1997 + + +4.2 Update Data Signatures + + Mode A dynamic secure zones require that the update requester provide + SIG RRs that will authenticate the after update state of all RR sets + that are changed by the update and are non-empty after the update. + These SIG RRs appear in the request as RRs to be added and the + request must delete any previous data SIG RRs that are invalidated by + the request.
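The mode A rule in the paragraph above (data SIG RRs are needed for every RR set that the update changes and that is non-empty afterwards) can be illustrated with a minimal sketch. It assumes a hypothetical in-memory zone model, a dict keyed by (owner, class, type) holding sets of rdata strings, and the function and parameter names are illustrative only; none of this is defined by this memo or by RFC 2136.

```python
def rrsets_needing_data_sigs(before, operations):
    """Return the RR sets a mode A updater would have to cover with data SIGs.

    before     -- hypothetical zone model: dict mapping (owner, rrclass, rrtype)
                  to a set of rdata strings (SIG RRs themselves are left out)
    operations -- ordered list of ("add" | "delete", key, rdata) tuples
    """
    # Apply the update to a copy so the pre-update state stays available.
    after = {key: set(rdatas) for key, rdatas in before.items()}
    for action, key, rdata in operations:
        if action == "add":
            after.setdefault(key, set()).add(rdata)
        elif action == "delete":
            after.get(key, set()).discard(rdata)
        else:
            raise ValueError(f"unknown action: {action!r}")

    # An RR set is "changed" when its rdata differs before and after.
    changed = [
        key for key in set(before) | set(after)
        if before.get(key, set()) != after.get(key, set())
    ]
    # Only RR sets that are still non-empty need a fresh data SIG.
    return sorted(key for key in changed if after.get(key))
```

An RR set emptied by the update drops out of the result, since there is nothing left to sign; under the text above, its old data SIG RR would instead be deleted by the request.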
+ + In Mode B dynamic secure zones, all zone data is authenticated by + zone key SIG RRs. In this case, data signatures need not be included + with the update. A resolver can determine which mode an updatable + secure zone is using by examining the signatory field bits of the + zone KEY RR (see section 3.2). + +5. Security Considerations + + Any zone permitting dynamic updates is inherently less secure than a + static secure zone maintained off line as recommended in RFC 2065. If + nothing else, secure dynamic update requires on line change to and + re-signing of the zone SOA resource record (RR) to increase the SOA + serial number. This means that compromise of the primary server host + could lead to arbitrary serial number changes. + + Isolation of dynamic RRs to separate zones from those holding most + static RRs can limit the damage that could occur from breach of a + dynamic zone's security. + +References + + [RFC2065] Eastlake, D., and C. Kaufman, "Domain Name System Security + Extensions", RFC 2065, CyberCash, Iris, January 1997. + + [RFC2136] Vixie, P., Editor, Thomson, S., Rekhter, Y., and J. Bound, + "Dynamic Updates in the Domain Name System (DNS UPDATE)", RFC 2136, + April 1997. + + [RFC1035] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, November 1987. + + [RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities", + STD 13, RFC 1034, November 1987. + + + + + + + + + +Eastlake Standards Track [Page 10] + +RFC 2137 SDNSDU April 1997 + + +Author's Address + + Donald E. Eastlake, 3rd + CyberCash, Inc.
+ 318 Acton Street + Carlisle, MA 01741 USA + + Phone: +1 508-287-4877 + +1 508-371-7148 (fax) + +1 703-620-4200 (main office, Reston, Virginia, USA) + EMail: dee@cybercash.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 11] + diff --git a/doc/rfc/rfc2163.txt b/doc/rfc/rfc2163.txt new file mode 100644 index 00000000..00fcee7c --- /dev/null +++ b/doc/rfc/rfc2163.txt @@ -0,0 +1,1459 @@ + + + + + + +Network Working Group C. Allocchio +Request for Comments: 2163 GARR-Italy +Obsoletes: 1664 January 1998 +Category: Standards Track + + + Using the Internet DNS to Distribute + MIXER Conformant Global Address Mapping (MCGAM) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1998). All Rights Reserved. + +Abstract + + This memo is the complete technical specification to store in the + Internet Domain Name System (DNS) the mapping information (MCGAM) + needed by MIXER conformant e-mail gateways and other tools to map + RFC822 domain names into X.400 O/R names and vice versa. Mapping + information can be managed in a distributed rather than a centralised + way. Organizations can publish their MIXER mapping or preferred + gateway routing information using just local resources (their local + DNS server), avoiding the need for a strong coordination with any + centralised organization. MIXER conformant gateways and tools located + on Internet hosts can retrieve the mapping information querying the + DNS instead of having fixed tables which need to be centrally updated + and distributed. + + This memo obsoletes RFC1664. 
It includes the changes introduced by + the MIXER specification with respect to RFC1327: the new 'gate1' (O/R + addresses to domain) table is fully supported. Full backward + compatibility with the RFC1664 specification is maintained, too. + + RFC1664 was a joint effort of the IETF X400 operation working group + (x400ops) and TERENA (formerly named "RARE") Mail and Messaging + working group (WG-MSG). This update was performed by the IETF MIXER + working group. + + + + + + +Allocchio Standards Track [Page 1] + +RFC 2163 MIXER MCGAM January 1998 + + +1. Introduction + + The connectivity between the Internet SMTP mail and other mail + services, including the Internet X.400 mail and the commercial X.400 + service providers, is assured by the Mail eXchanger (MX) record + information distributed via the Internet Domain Name System (DNS). A + number of documents then specify in detail how to convert or encode + addresses from/to RFC822 style to the other mail system syntax. + However, only conversion methods provide, via some algorithm or a set + of mapping rules, a smooth translation, resulting in addresses + indistinguishable from the native ones in both RFC822 and foreign + world. + + MIXER describes a set of mappings (MIXER Conformant Global Address + Mapping - MCGAM) which will enable interworking between systems + operating the CCITT X.400 (1984/88/92) Recommendations and systems + using the RFC822 mail protocol, or protocols derived from + RFC822. That document addresses conversion of services, addresses, + message envelopes, and message bodies between the two mail systems. + This document is concerned with one aspect of MIXER: the mechanism + for mapping between X.400 O/R addresses and RFC822 domain names. As + described in Appendix F of MIXER, implementation of the mappings + requires a database which maps between X.400 O/R addresses and domain + names; in RFC1327 this database was statically defined.
+ + The original approach in RFC1327 required many efforts to maintain + the correct mapping: all the gateways needed to get coherent tables + to apply the same mappings, the conversion tables had to be + distributed among all the operational gateways, and also every update + needed to be distributed. + + The concept of mapping rules distribution and use has been revised in + the new MIXER specification, introducing the concept of MIXER + Conformant Global Address Mapping (MCGAM). A MCGAM does not need to + be globally installed by any MIXER conformant gateway in the world + any more. However MIXER requires now efficient methods to publish its + MCGAM. + + Static tables are one of the possible methods to publish MCGAM. + However this static mechanism requires quite a long time to be spent + modifying and distributing the information, putting heavy constraints + on the time schedule of every update. In fact it does not appear + efficient compared to the Internet Domain Name Service (DNS). More + over it does not look feasible to distribute the database to a large + number of other useful applications, like local address converters, + e-mail User Agents or any other tool requiring the mapping rules to + produce correct results. + + + + +Allocchio Standards Track [Page 2] + +RFC 2163 MIXER MCGAM January 1998 + + + Two much more efficient methods are proposed by MIXER for publication + of MCGAM: the Internet DNS and X.500. This memo is the complete + technical specification for publishing MCGAM via Internet DNS. + + A first proposal to use the Internet DNS to store, retrieve and + maintain those mappings was introduced by two of the authors of + RFC1664 (B. Cole and R. Hagens) adopting two new DNS resource record + (RR) types: TO-X400 and TO-822. This proposal now adopts a more + complete strategy, and requires one new RR only. 
The distribution of + MCGAMs via DNS is in fact an important service for the whole Internet + community: it completes the information given by MX resource record + and it allows to produce clean addresses when messages are exchanged + among the Internet RFC822 world and the X.400 one (both Internet and + Public X.400 service providers). + + A first experiment in using the DNS without expanding the current set + of RR and using available ones was deployed by some of the authors of + RFC1664 at the time of its development. The existing PTR resource + records were used to store the mapping rules, and a new DNS tree was + created under the ".it" top level domain. The result of the + experiment was positive, and a few test applications ran under this + provisional set up. This test was also very useful in order to define + a possible migration strategy during the deployment of the new DNS + containing the new RR. The Internet DNS nameservers wishing to + provide this mapping information need in fact to be modified to + support the new RR type, and in the real Internet, due to the large + number of different implementations, this takes some time. + + The basic idea is to adopt a new DNS RR to store the mapping + information. The RFC822 to X.400 mapping rules (including the so + called 'gate2' rules) will be stored in the ordinary DNS tree, while + the definition of a new branch of the name space defined under each + national top level domain is envisaged in order to contain the X.400 + to RFC822 mappings ('table1' and 'gate1'). A "two-way" mapping + resolution schema is thus fully implemented. + + The creation of the new domain name space representing the X.400 O/R + names structure also provides the chance to use the DNS to distribute + dynamically other X.400 related information, thus solving other + efficiency problems currently affecting the X.400 MHS service. + + In this paper we will adopt the MCGAM syntax, showing how it can be + stored into the Internet DNS. 
+ + + + + + + + +Allocchio Standards Track [Page 3] + +RFC 2163 MIXER MCGAM January 1998 + + +1.1 Definitions syntax + + The definitions in this document are given in BNF-like syntax, using + the following conventions: +
+      |   means choice
+      \   is used for continuation of a definition over several lines
+      []  means optional
+      {}  means repeated one or more times
+
+ The definitions, however, are detailed only down to a certain level, + and below it self-explaining character text strings will be used. + +2. Motivation + + Implementations of MIXER gateways require that a database store + address mapping information for X.400 and RFC822. This information + must be made available (published) to all MIXER gateways. In the + Internet community, the DNS has proven to be a practical means for + providing a distributed name service. Advantages of using a DNS based + system over a table based approach for mapping between O/R addresses + and domain names are: + + - It avoids fetching and storing of entire mapping tables by every + host that wishes to implement MIXER gateways and/or tools. + + - Modifications to the DNS based mapping information can be made + available in a more timely manner than with a table driven + approach. + + - It allows full authority delegation, in agreement with the + Internet regionalization process. + + - Table management is not necessarily required for DNS-based + MIXER gateways. + + - One can determine the mappings in use by a remote gateway by + querying the DNS (remote debugging). + + Also many other tools, like address converters and User Agents, can + take advantage of the real-time availability of MIXER tables, + allowing a much easier maintenance of the information. + +3.
The domain space for X.400 O/R name addresses + + Usual domain names (the ones normally used as the global part of an + RFC822 e-mail address) and their associated information, i.e., host + IP addresses, mail exchanger names, etc., are stored in the DNS as a + + + +Allocchio Standards Track [Page 4] + +RFC 2163 MIXER MCGAM January 1998 + + + distributed database under a number of top-level domains. Some top- + level domains are used for traditional categories or international + organisations (EDU, COM, NET, ORG, INT, MIL...). On the other hand + any country has its own two letter ISO country code as top-level + domain (FR, DE, GB, IT, RU, ...), including "US" for USA. The + special top-level/second-level couple IN-ADDR.ARPA is used to store + the IP address to domain name relationship. This memo defines in the + above structure the appropriate way to locate the X.400 O/R name + space, thus making it possible to store in DNS the MIXER mappings (MCGAMs). + + The MIXER mapping information is composed of four tables: + + - 'table1' and 'gate1' give the translation from X.400 to RFC822; + - 'table2' and 'gate2' tables map RFC822 into X.400. + + Each mapping table is composed of mapping rules, and a single mapping + rule is composed of a keyword (the argument of the mapping function + derived from the address to be translated) and a translator (the + mapping function parameter): + + keyword#translator# + + the '#' sign is a delimiter enclosing the translator. An example: + + foo.bar.us#PRMD$foo\.bar.ADMD$intx.C$us# + + Local mappings are not intended for use outside their restricted + environment, thus they should not be included in DNS. If local + mappings are used, they should be stored using static local tables, + exactly as local static host tables can be used with DNS. + + The keyword of a 'table2' and 'gate2' table entry is a valid RFC822 + domain; thus the usual domain name space can be used without problems + to store these entries.
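The `keyword#translator#` rule format shown above can be taken apart with a few lines of code. The following is a minimal sketch only: the function names are hypothetical, and it assumes, based on the single example given, that the translator is a dot-separated list of `ATTRIBUTE$value` pairs in which `\.` stands for a literal dot inside a value.

```python
import re


def parse_mapping_rule(rule):
    """Split a 'keyword#translator#' rule into (keyword, translator)."""
    if not rule.endswith("#"):
        raise ValueError("rule must end with the '#' delimiter")
    keyword, sep, translator = rule[:-1].partition("#")
    if not sep or not keyword:
        raise ValueError("rule must have the form keyword#translator#")
    return keyword, translator


def split_translator(translator):
    """Split an O/R style translator into (attribute, value) pairs.

    Assumption: components are separated by unescaped dots and each
    component has the form ATTRIBUTE$value.
    """
    pairs = []
    # Split on dots that are not preceded by a backslash.
    for component in re.split(r"(?<!\\)\.", translator):
        attribute, _, value = component.partition("$")
        # Undo the escaping of literal dots inside the value.
        pairs.append((attribute, value.replace("\\.", ".")))
    return pairs


keyword, translator = parse_mapping_rule(r"foo.bar.us#PRMD$foo\.bar.ADMD$intx.C$us#")
print(keyword)                       # foo.bar.us
print(split_translator(translator))  # [('PRMD', 'foo.bar'), ('ADMD', 'intx'), ('C', 'us')]
```

Keeping the two steps separate means the first function works for any table, while the second applies only when the translator is an O/R address, as in the RFC822 to X.400 direction of the example.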
+ On the other hand, the keyword of a 'table1' and 'gate1' entry + belongs to the X.400 O/R name space. The X.400 O/R name space does + not usually fit into the usual domain name space, although there are + a number of similarities; a new name structure is thus needed to + represent it. This new name structure contains the X.400 mail + domains. + + To ensure the correct functioning of the DNS system, the new X.400 + name structure must be hooked to the existing domain name space in a + way which respects the existing name hierarchy. + + A possible solution was to create another special branch, starting + from the root of the DNS tree, somehow similar to the in-addr.arpa + tree. This idea would have required to establish a central authority + + + +Allocchio Standards Track [Page 5] + +RFC 2163 MIXER MCGAM January 1998 + + + to coordinate at international level the management of each national + X.400 name tree, including the X.400 public service providers. This + coordination problem is a heavy burden if approached globally. More + over the X.400 name structure is very 'country oriented': thus while + it requires a coordination at national level, it does not have + concepts like the international root. In fact the X.400 international + service is based on a large number of bilateral agreements, and only + within some communities an international coordination service exists. + + The X.400 two letter ISO country codes, however, are the same used + for the RFC822 country top-level domains and this gives us an + appropriate hook to insert the new branches. The proposal is, in + fact, to create under each national top level ISO country code a new + branch in the name space. This branch represents exactly the X.400 + O/R name structure as defined in each single country, following the + ADMD, PRMD, O, OU hierarchy. A unique reserved label 'X42D' is placed + under each country top-level domain, and hence the national X.400 + name space derives its own structure: + + . 
(root) + | + +-----------------+-----------+--------+-----------------+... + | | | | + edu it us fr + | | | | + +---+---+... +-----+-----+... +-----+-----+... +--+---+... + | | | | | | | | | | + ... ... cnr X42D infn va ca X42D X42D inria + | | | | + +------------+------------+... ... ... +----+-------+... + | | | | | + ADMD-PtPostel ADMD-garr ADMD-Master400 ADMD-atlas ADMD-red + | | | | + +----------+----+... ... +-------+------+... ... + | | | | + PRMD-infn PRMD-STET PRMD-Telecom PRMD-Renault + | | | | + ... ... ... ... + + + The creation of the X.400 new name tree at national level solves the + problem of the international coordination. Actually the coordination + problem is just moved to the national level, but it thus becomes easier + to solve. The coordination at national level between the X.400 + communities and the Internet world is already a requirement for the + creation of the national static MIXER mapping tables; the use of the + Internet DNS gives further motivations for this coordination. + + The coordination at national level also fits in the new concept of + MCGAM publication. The DNS in fact allows a step by step authority + distribution, up to a final complete delegation: thus organizations + wishing to publish their MCGAM just need to receive delegation also + for their branch of the new X.400 name space. A further advantage of + the national based solution is to allow each country to set up its + own X.400 name structure in DNS and to deploy its own authority + delegation according to its local time scale and requirements, with + no loss of global service in the mean time. And last, placing the new + X.400 name tree and coordination process at national level fits into + the Internet regionalization and internationalisation process, as it + requires local bodies to take care of local coordination problems. 
+ + The DNS name space thus contains completely the information required + by an e-mail gateway or tool to perform the X.400-RFC822 mapping: a + simple query to the nearest nameserver provides it. Moreover there is + no more any need to store, maintain and distribute manually any + mapping table. The new X.400 name space can also contain further + information about the X.400 community, as DNS allows for it a + complete set of resource records, and thus it allows further + developments. This set of RRs in the new X.400 name space must be + considered 'reserved' and thus not used until further specifications. + + The construction of the new domain space trees will follow the same + procedures used when organising at first the already existing DNS + space: at first the information will be stored in a quite centralised + way, and distribution of authority will be gradually achieved. A + separate document will describe the implementation phase and the + methods to assure a smooth introduction of the new service. + +4. The new DNS resource record for MIXER mapping rules: PX + + The specification of the Internet DNS (RFC1035) provides a number of + specific resource records (RRs) to contain specific pieces of + information. In particular they contain the Mail eXchanger (MX) RR + and the host Address (A) records which are used by the Internet SMTP + mailers. As we will store the RFC822 to X.400 mapping information in + the already existing DNS name tree, we need to define a new DNS RR in + order to avoid any possible clash or misuse of already existing data + structures. The same new RR will also be used to store the mappings + from X.400 to RFC822. More over the mapping information, i.e., the + MCGAMs, has a specific format and syntax which require an appropriate + data structure and processing. A further advantage of defining a new + RR is the ability to include flexibility for some eventual future + development. 
+ + + + + + +Allocchio Standards Track [Page 7] + +RFC 2163 MIXER MCGAM January 1998 + + + The definition of the new 'PX' DNS resource record is: + + class: IN (Internet) + + name: PX (pointer to X.400/RFC822 mapping information) + + value: 26 + + The PX RDATA format is: + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | PREFERENCE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / MAP822 / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / MAPX400 / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + where: + + PREFERENCE A 16 bit integer which specifies the preference given to + this RR among others at the same owner. Lower values + are preferred; + + MAP822 A <domain-name> element containing <rfc822-domain>, the + RFC822 part of the MCGAM; + + MAPX400 A <domain-name> element containing the value of + <x400-in-domain-syntax> derived from the X.400 part of + the MCGAM (see sect. 4.2); + + PX records cause no additional section processing. The PX RR format + is the usual one: + + <name> [<class>] [<TTL>] <type> <RDATA> + + When we store in DNS a 'table1' or a 'gate1' entry, then <name> will + be an X.400 mail domain name in DNS syntax (see sect. 4.2). When we + store a 'table2' or a 'gate2' table entry, <name> will be an RFC822 + mail domain name, including both fully qualified DNS domains and mail + only domains (MX-only domains). All normal DNS conventions, like + default values, wildcards, abbreviations and message compression, + apply also for all the components of the PX RR. In particular <name>, + MAP822 and MAPX400, as <domain-name> elements, must have the final + "." (root) when they are fully qualified. 
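To make the RDATA layout above concrete, a minimal sketch of how a PX RDATA field could be serialised is given here. The helper names encode_name and encode_px_rdata are hypothetical; the sketch assumes fully qualified ASCII names, emits uncompressed labels only, and does not handle escaped characters in the textual form.

```python
import struct

PX_TYPE = 26  # RR type value assigned to PX in this memo


def encode_name(name: str) -> bytes:
    """Encode a fully qualified <domain-name> as length-prefixed labels.

    Hypothetical helper: no message compression, no escape handling.
    """
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # the bare root name "." has no labels
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("a DNS label is limited to 63 octets")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)


def encode_px_rdata(preference: int, map822: str, mapx400: str) -> bytes:
    """PREFERENCE (16 bit, network byte order), then MAP822, then MAPX400."""
    return struct.pack("!H", preference) + encode_name(map822) + encode_name(mapx400)


if __name__ == "__main__":
    rdata = encode_px_rdata(50, "ab.fr.", "PRMD-ab.ADMD-ac.C-fr.")
    print(PX_TYPE, rdata.hex())
```

MAP822 always precedes MAPX400 in the RDATA, whichever direction the stored rule maps.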
+ +4.1 Additional features of the PX resource record + + The definition of the RDATA for the PX resource record, and the fact + that DNS allows a distinction between an exact value and a wildcard + match for the <name> parameter, represent an extension of the MIXER + specification for mapping rules. In fact, any MCGAM entry is an + implicit wildcard entry, i.e., the rule + + net2.it#PRMD$net2.ADMD$p400.C$it# + + covers any RFC822 domain ending with 'net2.it', unless more detailed + rules for some subdomain in 'net2.it' are present. Thus there is no + possibility to specify explicitly a MCGAM as an exact match only + rule. In DNS an entry like + + *.net2.it. IN PX 10 net2.it. PRMD-net2.ADMD-p400.C-it. + + specifies the usual wildcard match as for MIXER tables. However an + entry like + + ab.net2.it. IN PX 10 ab.net2.it. O-ab.PRMD-net2.ADMDb.C-it. + + is valid only for an exact match of 'ab.net2.it' RFC822 domain. + + Note also that in DNS syntax there is no '#' delimiter around MAP822 + and MAPX400 fields: the syntax defined in sect. 4.2 in fact does not + allow the <blank> (ASCII decimal 32) character within these fields, + making unneeded the use of an explicit delimiter as required in the + MIXER original syntax. + + Another extension to the MIXER specifications is the PREFERENCE value + defined as part of the PX RDATA section. This numeric value has + exactly the same meaning as the similar one used for the MX RR. It + is thus possible to specify more than one single mapping for a domain + (both from RFC822 to X.400 and vice versa), giving also a preference + order. In MIXER static tables, however, you cannot specify more than + one mapping per each RFC822 domain, and the same restriction applies + for any X.400 domain mapping to an RFC822 one. + + Moreover, in the X.400 recommendations a note suggests that an + ADMD=<blank> should be reserved for some special cases. 
Various + national functional profile specifications for an X.400 MHS states + that if an X.400 PRMD is reachable via any of its national ADMDs, + independently of its actual single or multiple connectivity with + them, it should use ADMD=<blank> to advertise this fact. Again, if a + PRMD has no connections to any ADMD it should use ADMD=0 to notify + its status, etc. However, in most of the current real situations, the + ADMD service providers do not accept messages coming from their + + + +Allocchio Standards Track [Page 9] + +RFC 2163 MIXER MCGAM January 1998 + + + subscribers if they have a blank ADMD, forcing them to have their own + ADMD value. In such a situation there are problems in indicating + properly the actually working mappings for domains with multiple + connectivity. The PX RDATA 'PREFERENCE' extension was introduced to + take in consideration these problems. + + However, as these extensions are not available with MIXER static + tables, it is strongly discouraged to use them when interworking with + any table based gateway or application. The extensions were in fact + introduced just to add more flexibility, like the PREFERENCE value, + or they were already implicit in the DNS mechanism, like the + wildcard specification. They should be used very carefully or just + considered 'reserved for future use'. In particular, for current use, + the PREFERENCE value in the PX record specification should be fixed + to a value of 50, and only wildcard specifications should be used + when specifying <name> values. + +4.2 The DNS syntax for an X.400 'domain' + + The syntax definition of the MCGAM rules is defined in appendix F of + that document. However that syntax is not very human oriented and + contains a number of characters which have a special meaning in other + fields of the Internet DNS. 
Thus in order to avoid any possible + problem, especially due to some old DNS implementations still being + used in the Internet, we define a syntax for the X.400 part of any + MCGAM rules (and hence for any X.400 O/R name) which makes it + compatible with a <domain-name> element, i.e., + + <domain-name> ::= <subdomain> | " " + <subdomain> ::= <label> | <label> "." <subdomain> + <label> ::= <alphanum>| + <alphanum> {<alphanumhyphen>} <alphanum> + <alphanum> ::= "0".."9" | "A".."Z" | "a".."z" + <alphanumhyphen> ::= "0".."9" | "A".."Z" | "a".."z" | "-" + + (see RFC1035, section 2.3.1, page 8). The legal character set for + <label> does not correspond to the IA5 Printablestring one used in + MIXER to define MCGAM rules. However a very simple "escape mechanism" + can be applied in order to bypass the problem. We can in fact simply + describe the X.400 part of a MCGAM rule format as: + + <map-rule> ::= <map-elem> | <map-elem> { "." <map-elem> } + <map-elem> ::= <attr-label> "$" <attr-value> + <attr-label> ::= "C" | "ADMD" | "PRMD" | "O" | "OU" + <attr-value> ::= " " | "@" | IA5-Printablestring + + + + + + +Allocchio Standards Track [Page 10] + +RFC 2163 MIXER MCGAM January 1998 + + + As you can notice <domain-name> and <map-rule> look similar, and also + <label> and <map-elem> look the same. If we define the correct method + to transform a <map-elem> into a <label> and vice versa the problem + to write a MCGAM rule in <domain-name> syntax is solved. + + The RFC822 domain part of any MCGAM rule is of course already in + <domain-name> syntax, and thus remains unchanged. + + In particular, in a 'table1' or 'gate1' mapping rule the 'keyword' + value must be converted into <x400-in-domain-syntax> (X.400 mail DNS + mail domain), while the 'translator' value is already a valid RFC822 + domain. Vice versa in a 'table2' or 'gate2' mapping rule, the + 'translator' must be converted into <x400-in-domain-syntax>, while + the 'keyword' is already a valid RFC822 domain. 
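The <map-rule> grammar above can be illustrated with a minimal sketch that splits the X.400 part of a MCGAM rule into (attribute label, attribute value) pairs. The function name split_map_rule is hypothetical; the sketch assumes that a backslash always quotes the character that follows it, so that "\." is kept inside a value instead of ending it.

```python
from __future__ import annotations

ATTR_LABELS = {"C", "ADMD", "PRMD", "O", "OU"}


def split_map_rule(rule: str) -> list[tuple[str, str]]:
    """Split '<map-elem>.<map-elem>...' on dots that are not quoted."""
    elems, current, i = [], [], 0
    while i < len(rule):
        ch = rule[i]
        if ch == "\\" and i + 1 < len(rule):
            current.append(rule[i:i + 2])  # keep the quoted pair untouched
            i += 2
            continue
        if ch == ".":
            elems.append("".join(current))  # unquoted dot ends the element
            current = []
        else:
            current.append(ch)
        i += 1
    elems.append("".join(current))

    pairs = []
    for elem in elems:
        label, sep, value = elem.partition("$")
        if not sep or label not in ATTR_LABELS:
            raise ValueError("not a valid <map-elem>: %r" % elem)
        pairs.append((label, value))
    return pairs


if __name__ == "__main__":
    for pair in split_map_rule(r"OU$sales dept\..O$@.PRMD$ACME.ADMD$ .C$GB"):
        print(pair)
```

Working on already separated pairs keeps the per-attribute character translation independent of the dot handling.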
+ +4.2.1 IA5-Printablestring to <alphanumhyphen> mappings + + The problem of unmatching IA5-Printablestring and <label> character + set definition is solved by a simple character mapping rule: whenever + an IA5 character does not belong to <alphanumhyphen>, then it is + mapped using its 3 digit decimal ASCII code, enclosed in hyphens. A + small set of special rules is also defined for the most frequent + cases. Moreover some frequent characters combinations used in MIXER + rules are also mapped as special cases. + + Let's then define the following simple rules: + + MCGAM rule DNS store translation conditions + ----------------------------------------------------------------- + <attr-label>$@ <attr-label> missing attribute + <attr-label>$<blank> <attr-label>"b" blank attribute + <attr-label>$xxx <attr-label>-xxx elsewhere + + Non <alphanumhyphen> characters in <attr-value>: + + MCGAM rule DNS store translation conditions + ----------------------------------------------------------------- + - -h- hyphen + \. -d- quoted dot + <blank> -b- blank + <non A/N character> -<3digit-decimal>- elsewhere + + If the DNS store translation of <attr-value> happens to end with an + hyphen, then this last hyphen is omitted. + + Let's now have some examples: + + + + + +Allocchio Standards Track [Page 11] + +RFC 2163 MIXER MCGAM January 1998 + + + MCGAM rule DNS store translation conditions + ----------------------------------------------------------------- + PRMD$@ PRMD missing attribute + ADMD$<blank> ADMDb blank attribute + ADMD$400-net ADMD-400-h-net hyphen mapping + PRMD$UK\.BD PRMD-UK-d-BD quoted dot mapping + O$ACME Inc\. 
O-ACME-b-Inc-d blank & final hyphen + PRMD$main-400-a PRMD-main-h-400-h-a hyphen mapping + O$-123-b O--h-123-h-b hyphen mapping + OU$123-x OU-123-h-x hyphen mapping + PRMD$Adis+co PRMD-Adis-043-co 3digit mapping + + Thus, an X.400 part from a MCGAM like + + OU$uuu.O$@.PRMD$ppp\.rrr.ADMD$aaa ddd-mmm.C$cc + + translates to + + OU-uuu.O.PRMD-ppp-d-rrr.ADMD-aaa-b-ddd-h-mmm.C-cc + + Another example: + + OU$sales dept\..O$@.PRMD$ACME.ADMD$ .C$GB + + translates to + + OU-sales-b-dept-d.O.PRMD-ACME.ADMDb.C-GB + +4.2.2 Flow chart + + In order to achieve the proper DNS store translations of the X.400 + part of a MCGAM or any other X.400 O/R name, some software tools will + be used. It is in fact evident that the above rules for converting + mapping table from MIXER to DNS format (and vice versa) are not user + friendly enough to think of a human made conversion. + + To help in designing such tools, we describe hereunder a small flow + chart. The fundamental rule to be applied during translation is, + however, the following: + + "A string must be parsed from left to right, moving appropriately + the pointer in order not to consider again the already translated + left section of the string in subsequent analysis." + + + + + + + + +Allocchio Standards Track [Page 12] + +RFC 2163 MIXER MCGAM January 1998 + + + Flow chart 1 - Translation from MIXER to DNS format: + + parse single attribute + (enclosed in "." separators) + | + (yes) --- <label>$@ ? --- (no) + | | + map to <label> (no) <label>$<blank> ? (yes) + | | | + | map to <label>- map to <label>"b" + | | | + | map "\." to -d- | + | | | + | map "-" to -h- | + | | | + | map non A/N char to -<3digit>- | + restart | | | + ^ | remove (if any) last "-" | + | | | | + | \-------> add a "." <--------------/ + | | + \---------- take next attribute (if any) + + + Flow chart 2 - Translation from DNS to MIXER format: + + + parse single attribute + (enclosed in "." separators) + | + (yes) ---- <label> ? 
---- (no) + | | + map to <label>$@ (no) <label>"b" ? (yes) + | | | + | map to <label>$ map to <label>$<blank> + | | | + | map -d- to "\." | + | | | + | map -h- to "-" | + | | | + | map -b- to " " | + restart | | | + ^ | map -<3digit>- to non A/N char | + | | | | + | \--------> add a "." <----------/ + | | + \------------- take next attribute (if any) + + Note that the above flow charts deal with the translation of the + attributes syntax, only. + +4.2.3 The Country Code convention in the <name> value. + + The RFC822 domain space and the X.400 O/R address space, as said in + section 3, have one specific common feature: the X.400 ISO country + codes are the same as the RFC822 ISO top level domains for countries. + In the previous sections we have also defined a method to write in + <domain-name> syntax any X.400 domain, while in section 3 we + described the new name space starting at each country top level + domain under the X42D.cc (where 'cc' is the two letter ISO country + code). + + The <name> value for a 'table1' or 'gate1' entry in DNS should thus + be derived from the X.400 domain value, translated to <domain-name> + syntax, adding the 'X42D.cc.' post-fix to it, i.e., + + ADMD$acme.C$fr + + produces in <domain-name> syntax the key: + + ADMD-acme.C-fr + + which is post-fixed by 'X42D.fr.' resulting in: + + ADMD-acme.C-fr.X42D.fr. + + However, due to the identical encoding for X.400 country codes and + RFC822 country top level domains, the string 'C-fr.X42D.fr.' is + clearly redundant. + + We thus define the 'Country Code convention' for the <name> key, + i.e., + + "The C-cc section of an X.400 domain in <domain-name> syntax must + be omitted when creating a <name> key, as it is identical to the + top level country code used to identify the DNS zone where the + information is stored". 
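The character mapping of section 4.2.1, Flow chart 1 and the Country Code convention can be sketched together as follows. This is a minimal illustration and not a reference implementation: the function names are hypothetical, and the sketch assumes that "\." is the only quoted sequence appearing in a rule.

```python
import re


def translate_value(value: str) -> str:
    """Map an <attr-value> to <alphanumhyphen> characters, left to right."""
    out, i = [], 0
    while i < len(value):
        ch = value[i]
        if value.startswith("\\.", i):
            out.append("-d-")  # quoted dot
            i += 2
            continue
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ch == "-":
            out.append("-h-")  # hyphen
        elif ch == " ":
            out.append("-b-")  # blank
        else:
            out.append("-%03d-" % ord(ch))  # 3 digit decimal ASCII code
        i += 1
    text = "".join(out)
    return text[:-1] if text.endswith("-") else text  # omit a final hyphen


def translate_elem(elem: str) -> str:
    label, sep, value = elem.partition("$")
    if not sep:
        raise ValueError("missing '$' in %r" % elem)
    if value == "@":
        return label  # missing attribute
    if value == " ":
        return label + "b"  # blank attribute
    return label + "-" + translate_value(value)


def to_dns_syntax(x400_domain: str) -> str:
    """Translate the X.400 part of a MCGAM into <domain-name> syntax."""
    elems = re.split(r"(?<!\\)\.", x400_domain)  # split on unquoted dots only
    return ".".join(translate_elem(e) for e in elems)


def name_key(x400_domain: str) -> str:
    """Apply the Country Code convention: drop C-cc, append X42D.cc."""
    labels = to_dns_syntax(x400_domain).split(".")
    country = labels[-1]
    if not country.startswith("C-"):
        raise ValueError("the X.400 domain must end with a country attribute")
    return ".".join(labels[:-1] + ["X42D", country[2:]]) + "."


if __name__ == "__main__":
    print(to_dns_syntax(r"OU$sales dept\..O$@.PRMD$ACME.ADMD$ .C$GB"))
    print(name_key(r"PRMD$ux\.av.ADMD$ .C$gb"))
```

The value is scanned once from left to right, in line with the fundamental rule quoted in section 4.2.2, so characters that have already been translated are never examined again.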
+ + Thus we obtain the following <name> key examples: + + X.400 domain DNS <name> key + -------------------------------------------------------------------- + ADMD$acme.C$fr ADMD-acme.X42D.fr. + PRMD$ux\.av.ADMD$ .C$gb PRMD-ux-d-av.ADMDb.X42D.gb. + PRMD$ppb.ADMD$Dat 400.C$de PRMD-ppb.ADMD-Dat-b-400.X42D.de. + +4.3 Creating the appropriate DNS files + + Using MIXER's assumption of an asymmetric mapping between X.400 and + RFC822 addresses, two separate relations are required to store the + mapping database: MIXER 'table1' and MIXER 'table2'; thus also in DNS + we will maintain the two different sections, even if they will both + use the PX resource record. Moreover MIXER also specifies two + additional tables: MIXER 'gate1' and 'gate2' tables. These additional + tables, however, have the same syntax rules as MIXER 'table1' and + 'table2' respectively, and thus the same translation procedure as + 'table1' and 'table2' will be applied; some details about the MIXER + 'gate1' and 'gate2' tables are discussed in section 4.4. + + Let's now check how to create, from an MCGAM entry, the appropriate + DNS entry in a DNS data file. We can again define an MCGAM entry, as + defined in appendix F of MIXER, as: + + <x400-domain>#<rfc822-domain># (case A: 'table1' and 'gate1' + entry) + + and + + <rfc822-domain>#<x400-domain># (case B: 'table2' and 'gate2' + entry) + + The two cases must be considered separately. Let's consider case A. + + - take <x400-domain> and translate it into <domain-name> syntax, + obtaining <x400-in-domain-syntax>; + - create the <name> key from <x400-in-domain-syntax> i.e., apply + the Country Code convention described in sect. 4.2.3; + - construct the DNS PX record as: + + *.<name> IN PX 50 <rfc822-domain> <x400-in-domain-syntax> + + Please note that within PX RDATA the <rfc822-domain> precedes the + <x400-in-domain-syntax> also for a 'table1' and 'gate1' entry. 
+ + An example: from the 'table1' rule + + PRMD$ab.ADMD$ac.C$fr#ab.fr# + + we obtain + + *.PRMD-ab.ADMD-ac.X42D.fr. IN PX 50 ab.fr. PRMD-ab.ADMD-ac.C-fr. + + Note that <name>, <rfc822-domain> and <x400-in-domain-syntax> are + fully qualified <domain-name> elements, thus ending with a ".". + + Let's now consider case B. + + - take <rfc822-domain> as <name> key; + - translate <x400-domain> into <x400-in-domain-syntax>; + - construct the DNS PX record as: + + *.<name> IN PX 50 <rfc822-domain> <x400-in-domain-syntax> + + An example: from the 'table2' rule + + ab.fr#PRMD$ab.ADMD$ac.C$fr# + + we obtain + + *.ab.fr. IN PX 50 ab.fr. PRMD-ab.ADMD-ac.C-fr. + + Again note the fully qualified <domain-name> elements. + + A file containing the MIXER mapping rules and MIXER 'gate1' and + 'gate2' table written in DNS format will look like the following + fictitious example: + + ! + ! MIXER table 1: X.400 --> RFC822 + ! + *.ADMD-acme.X42D.it. IN PX 50 it. ADMD-acme.C-it. + *.PRMD-accred.ADMD-tx400.X42D.it. IN PX 50 \ + accred.it. PRMD-accred.ADMD-tx400.C-it. + *.O-u-h-newcity.PRMD-x4net.ADMDb.X42D.it. IN PX 50 \ + cs.ncty.it. O-u-h-newcity.PRMD-x4net.ADMDb.C-it. + ! + ! MIXER table 2: RFC822 --> X.400 + ! + *.nrc.it. IN PX 50 nrc.it. PRMD-nrc.ADMD-acme.C-it. + *.ninp.it. IN PX 50 ninp.it. O.PRMD-ninp.ADMD-acme.C-it. + *.bd.it. IN PX 50 bd.it. PRMD-uk-d-bd.ADMDb.C-it. + ! + ! MIXER Gate 1 Table + ! + *.ADMD-XKW-h-Mail.X42D.it. IN PX 50 \ + XKW-gateway.it. ADMD-XKW-h-Mail.C-it.G. + *.PRMD-Super-b-Inc.ADMDb.X42D.it. IN PX 50 \ + GlobalGw.it. PRMD-Super-b-Inc.ADMDb.C-it.G. + ! + ! MIXER Gate 2 Table + ! + my.it. IN PX 50 my.it. OU-int-h-gw.O.PRMD-ninp.ADMD-acme.C-it.G. + co.it. IN PX 50 co.it. O-mhs-h-relay.PRMD-x4net.ADMDb.C-it.G. 
+ + (here the "\" indicates continuation on the same line, as wrapping is + done only due to typographical reasons). + + Note the special suffix ".G." on the right side of the 'gate1' and + 'gate2' Tables section whose aim is described in section 4.4. The + corresponding MIXER tables are: + + # + # MIXER table 1: X.400 --> RFC822 + # + ADMD$acme.C$it#it# + PRMD$accred.ADMD$tx400.C$it#accred.it# + O$u-newcity.PRMD$x4net.ADMD$ .C$it#cs.ncty.it# + # + # MIXER table 2: RFC822 --> X.400 + # + nrc.it#PRMD$nrc.ADMD$acme.C$it# + ninp.it#O$@.PRMD$ninp.ADMD$acme.C$it# + bd.it#PRMD$uk\.bd.ADMD$ .C$it# + # + # MIXER Gate 1 Table + # + ADMD$XKW-Mail.C$it#XKW-gateway.it# + PRMD$Super Inc.ADMD$ .C$it#GlobalGw.it# + # + # MIXER Gate 2 Table + # + my.it#OU$int-gw.O$@.PRMD$ninp.ADMD$acme.C$it# + co.it#O$mhs-relay.PRMD$x4net.ADMD$ .C$it# + +4.4 Storing the MIXER 'gate1' and 'gate2' tables + + Section 4.3.4 of MIXER also specifies how an address should be + converted between RFC822 and X.400 in case a complete mapping is + impossible. To allow the use of DDAs for non mappable domains, the + MIXER 'gate2' table is thus introduced. + + In a totally similar way, when an X.400 address cannot be completely + converted into RFC822, section 4.3.5 of MIXER specifies how to encode + (LHS encoding) the address itself, pointing then to the appropriate + MIXER conformant gateway, indicated in the MIXER 'gate1' table. + + DNS must store and distribute also these 'gate1' and 'gate2' data. + + One of the major features of the DNS is the ability to distribute the + authority: a certain site runs the "primary" nameserver for one + determined sub-tree and thus it is also the only place allowed to + update information regarding that sub-tree. This fact allows, in our + case, a further additional feature to the table based approach. 
In + fact we can avoid one possible ambiguity about the use of the 'gate1' + and 'gate2' tables (and thus of LHS and DDAs encoding). + + The authority maintaining a DNS entry in the usual RFC822 domain + space is the only one allowed to decide if its domain should be + mapped using Standard Attributes (SA) syntax or Domain Defined + Attributes (DDA) one. If the authority decides that its RFC822 domain + should be mapped using SA, then the PX RDATA will be a 'table2' + entry, otherwise it will be a 'gate2' table entry. Thus for an RFC822 + domain we can no longer have two possible entries, one from 'table2' + and another one from 'gate2' table, and the action for a gateway + is clearly stated. + + Similarly, the authority maintaining a DNS entry in the new X.400 name + space is the only one allowed to decide if its X.400 domain should be + mapped using SA syntax or Left Hand Side (LHS) encoding. If the + authority decides that its X.400 domain should be mapped using SA, + then the PX RDATA will be a 'table1' entry, otherwise it will be a + 'gate1' table entry. Thus also for an X.400 domain we can no longer + have two possible entries, one from 'table1' and another one from + 'gate1' table, and the action for a gateway is clearly stated. + + The MIXER 'gate1' table syntax is actually identical to MIXER + 'table1', and 'gate2' table syntax is identical to MIXER 'table2'. + Thus the same syntax translation rules from MIXER to DNS format can + be applied in both cases. However a gateway or any other application + must know if the answer it got from DNS contains some 'table1', + 'table2' or some 'gate1', 'gate2' table information. This is easily + obtained flagging with an additional ".G." post-fix the PX RDATA + value when it contains a 'gate1' or 'gate2' table entry. The example + in section 4.3 shows clearly the result. As any X.400 O/R domain must + end with a country code ("C-xx" in our DNS syntax) the additional + ".G." 
creates no conflicts or ambiguities at all. This postfix must + obviously be removed before using the MIXER 'gate1' or 'gate2' table + data. + +5. Finding MIXER mapping information from DNS + + The MIXER mapping information is stored in DNS both in the normal + RFC822 domain name space, and in the newly defined X.400 name space. + The information, stored in PX resource records, does not represent a + full RFC822 or X.400 O/R address: it is a template which specifies + the fields of the domain that are used by the mapping algorithm. + + When mapping information is stored in the DNS, queries to the DNS are + issued whenever an iterative search through the mapping table would + be performed (MIXER: section 4.3.4, State I; section 4.3.5, mapping + B). Due to the DNS search mechanism, DNS by itself returns the + longest possible match in the stored mapping rule with a single + query, thus no iteration and/or multiple queries are needed. As + specified in MIXER, a search of the mapping table will result in + either success (mapping found) or failure (query failed, mapping not + found). + + When a DNS query is issued, a third possible result is timeout. If + the result is timeout, the gateway operation is delayed and then + retried at a later time. A result of success or failure is processed + according to the algorithms specified in MIXER. If a DNS error code + is returned, an error message should be logged and the gateway + operation is delayed as for timeout. These pathological situations, + however, should be avoided with a careful duplication and caching + mechanism which DNS itself provides. + + Searching the nameserver which can authoritatively solve the query is + automatically performed by the DNS distributed name service. 
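The handling of a returned MAPX400 value (removal of the ".G." flag described in section 4.4, followed by the reverse translation of Flow chart 2) can be sketched as follows. The function names are hypothetical, and the sketch assumes that attribute labels are returned by DNS in upper case, exactly as in the examples of this memo.

```python
from __future__ import annotations

import re


def decode_value(enc: str) -> str:
    """Reverse the character mapping of section 4.2.1 for one value."""
    out, i = [], 0
    while i < len(enc):
        if enc[i] != "-":
            out.append(enc[i])
            i += 1
            continue
        end = enc.find("-", i + 1)
        if end == -1:
            end = len(enc)  # the final hyphen was omitted when encoding
        token = enc[i + 1:end]
        if token == "d":
            out.append("\\.")
        elif token == "h":
            out.append("-")
        elif token == "b":
            out.append(" ")
        elif len(token) == 3 and token.isdigit():
            out.append(chr(int(token)))
        else:
            raise ValueError("unknown escape %r" % token)
        i = end + 1
    return "".join(out)


def decode_label(label: str) -> str:
    """Turn one DNS label back into a MIXER <map-elem>."""
    m = re.fullmatch(r"(OU|O|C|ADMD|PRMD)(|b|-.+)", label)
    if m is None:
        raise ValueError("not an X.400 label: %r" % label)
    attr, rest = m.groups()
    if rest == "":
        return attr + "$@"  # missing attribute
    if rest == "b":
        return attr + "$ "  # blank attribute
    return attr + "$" + decode_value(rest[1:])


def decode_mapx400(mapx400: str) -> tuple[str, bool]:
    """Return (X.400 part in MIXER syntax, True for a gate table entry)."""
    labels = mapx400.rstrip(".").split(".")
    is_gate = labels[-1] == "G"
    if is_gate:
        labels = labels[:-1]  # the ".G." flag is removed before use
    return ".".join(decode_label(lab) for lab in labels), is_gate


if __name__ == "__main__":
    print(decode_mapx400("OU-int-h-gw.O.PRMD-ninp.ADMD-acme.C-it.G."))
```

The boolean tells the caller whether the rule came from a 'gate1' or 'gate2' table, and therefore whether LHS or DDA encoding applies, or from a 'table1' or 'table2' entry.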
+ +5.1 A DNS query example + + An MIXER mail-gateway located in the Internet, when translating + addresses from RFC822 to X.400, can get information about the MCGAM + rule asking the DNS. As an example, when translating the address + SUN.CCE.NRC.IT, the gateway will just query DNS for the associated PX + resource record. The DNS should contain a PX record like this: + + *.cce.nrc.it. IN PX 50 cce.nrc.it. O-cce.PRMD-nrc.ADMD-acme.C-it. + + The first query will return immediately the appropriate mapping rule + in DNS store format. + + There is no ".G." at the end of the obtained PX RDATA value, thus + applying the syntax translation specified in paragraph 4.2 the MIXER + Table 2 mapping rule will be obtained. + + Let's now take another example where a 'gate2' table rule is + returned. If we are looking for an RFC822 domain ending with top + level domain "MW", and the DNS contains a PX record like this, + + *.mw. IN PX 50 mw. O-cce.PRMD-nrc.ADMD-acme.C-it.G. + + DNS will return 'mw.' and 'O-cce.PRMD-nrc.ADMD-acme.C-it.G.', i.e., a + 'gate2' table entry in DNS store format. Dropping the final ".G." and + applying the syntax translation specified in paragraph 4.2 the + original rule will be available. More over, the ".G." flag also tells + the gateway to use DDA encoding for the inquired RFC822 domain. + + + + +Allocchio Standards Track [Page 19] + +RFC 2163 MIXER MCGAM January 1998 + + + On the other hand, translating from X.400 to RFC822 the address + + C=de; ADMD=pkz; PRMD=nfc; O=top; + + the mail gateway should convert the syntax according to paragraph + 4.2, apply the 'Country code convention' described in 4.2.3 to derive + the appropriate DNS translation of the X.400 O/R name and then query + DNS for the corresponding PX resource record. The obtained record for + which the PX record must be queried is thus: + + O-top.PRMD-nfc.ADMD-pkz.X42D.de. + + The DNS could contain: + + *.ADMD-pkz.X42D.de. IN PX 50 pkz.de. ADMD-pkz.C-de. 
+ + Assuming that there are not more specific records in DNS, the + wildcard mechanism will return the MIXER 'table1' rule in encoded + format. + + Finally, an example where a 'gate1' rule is involved. If we are + looking for an X.400 domain ending with ADMD=PWT400; C=US; , and the + DNS contains a PX record like this, + + *.ADMD-PWT400.X42D.us. IN PX 50 intGw.com. ADMD-PWT400.C-us.G. + + DNS will return 'intGw.com.' and 'ADMD-PWT400.C-us.G.', i.e., a + 'gate1' table entry in DNS store format. Dropping the final ".G." and + applying the syntax translation specified in paragraph 4.2 the + original rule will be available. More over, the ".G." flag also tells + the gateway to use LHS encoding for the inquired X.400 domain. + +6. Administration of mapping information + + The DNS, using the PX RR, is able to distribute the MCGAM rules to + all MIXER gateways located on the Internet. However, not all MIXER + gateways will be able to use the Internet DNS. It is expected that + some gateways in a particular management domain will conform to one + of the following models: + + (a) Table-based, (b) DNS-based, (c) X.500-based + + Table-based management domains will continue to publish their MCGAM + rules and retrieve the mapping tables via the International Mapping + Table coordinator, manually or via some automated procedures. Their + MCGAM information can be made available also in DNS by the + appropriate DNS authorities, using the same mechanism already in + place for MX records: if a branch has not yet in place its own DNS + + + +Allocchio Standards Track [Page 20] + +RFC 2163 MIXER MCGAM January 1998 + + + server, some higher authority in the DNS tree will provide the + service for it. A transition procedure similar to the one used to + migrate from the 'hosts.txt' tables to DNS can be applied also to the + deployment phase of this specification. An informational document + describing the implementation phase and the detailed coordination + procedures is expected. 
+ + Another distributed directory service which can distribute the MCGAM + information is X.500. Coordination with table-based domains can be + obtained in an identical way as for the DNS case. + + Coordination of MCGAM information between DNS and X.500 is more + complex, as it requires some kind of uploading information between the + two systems. The ideal solution is a dynamic alignment mechanism + which transparently makes the DNS mapping information available in + X.500 and vice versa. Some work in this specific field is already + being done [see Costa] which can result in a global transparent + directory service, where the information is stored in DNS or in + X.500, but is visible completely by any of the two systems. + + However we must remember that the MIXER concept of MCGAM rules + publication is different from the old RFC1327 concept of globally + distributed, coordinated and unique mapping rules. In fact MIXER no + longer requires any conformant gateway or tool to know the complete set + of MCGAM: it only requires to use some set (possibly empty) of + valid MCGAM rules, published either by Tables, DNS or X.500 + mechanisms or any combination of these methods. Moreover MIXER + specifies that also incomplete sets of MCGAM can be used, and + supplementary local unpublished (but valid) MCGAM can also be used. + As a consequence, the problem of coordination between the three + systems proposed by MIXER for MCGAM publication is non essential, and + important only for efficient operational matters. It does not in fact + affect the correct behaviour of MIXER conformant gateways and tools. + +7. Conclusion + + The introduction of the new PX resource record and the definition of + the X.400 O/R name space in the DNS structure provide a good + repository for MCGAM information. The mapping information is stored + in the DNS tree structure so that it can be easily obtained using the + DNS distributed name service. 
At the same time the definition of the + appropriate DNS space for X.400 O/R names provides a repository in which + to store and distribute some other X.400 MHS information. The use of + the DNS has many known advantages in storing, managing and updating + the information. A number of successful tests were performed + under the provisional top level domain "X400.IT" when RFC1664 was + developed, and their results confirmed the advantages of the method. + Operational experience for over 2 years with the RFC1664 specification + + + +Allocchio Standards Track [Page 21] + +RFC 2163 MIXER MCGAM January 1998 + + + confirmed the feasibility of the method, and helped identify some + operational procedures to deploy the insertion of MCGAM into DNS. + + Software to query the DNS and then to convert between the textual + representation of DNS resource records and the address format defined + in MIXER was developed with RFC1664. This software also allows a + smooth implementation and deployment period, possibly taking care + of the transition phase. This software can be easily used (with + little or no modification) also for this updated specification, + supporting the new 'gate1' MIXER table. DNS software implementations + supporting RFC1664 also support, with no modification, the new + specification in this memo. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Allocchio Standards Track [Page 22] + +RFC 2163 MIXER MCGAM January 1998 + + + A further informational document describing the operation and + implementation of the service is expected. + +8. Acknowledgements + + We wish to thank all those who contributed to the discussion and + revision of this document: many of their ideas and suggestions + constitute essential parts of this work. In particular thanks to Jon + Postel, Paul Mockapetris, Rob Austin and the whole IETF x400ops, + TERENA wg-msg and IETF namedroppers groups. 
A special mention to + Christian Huitema for his fundamental contribution to this work. + + This document is a revision of RFC1664, edited by one of its authors + on behalf of the IETF MIXER working group. The current editor wishes + to thank here also the authors of RFC1664: + + Antonio Blasco Bonito RFC822: bonito@cnuce.cnr.it + CNUCE - CNR X.400: C=it;A=garr;P=cnr; + Reparto infr. reti O=cnuce;S=bonito; + Viale S. Maria 36 + I 56126 Pisa + Italy + + + Bruce Cole RFC822: bcole@cisco.com + Cisco Systems Inc. X.400: C=us;A= ;P=Internet; + P.O. Box 3075 DD.rfc-822=bcole(a)cisco.com; + 1525 O'Brien Drive + Menlo Park, CA 94026 + U.S.A. + + + Silvia Giordano RFC822: giordano@cscs.ch + Centro Svizzero di X.400: C=ch;A=arcom;P=switch;O=cscs; + Calcolo Scientifico S=giordano; + Via Cantonale + CH 6928 Manno + Switzerland + + + Robert Hagens RFC822: hagens@ans.net + Advanced Network and Services X.400: C=us;A= ;P=Internet; + 1875 Campus Commons Drive DD.rfc-822=hagens(a)ans.net; + Reston, VA 22091 + U.S.A. + + + + + + +Allocchio Standards Track [Page 23] + +RFC 2163 MIXER MCGAM January 1998 + + +9. References + + [CCITT] CCITT SG 5/VII, "Recommendation X.400, Message Handling + Systems: System Model - Service Elements", October 1988. + + [RFC 1327] Kille, S., "Mapping between X.400(1988)/ISO 10021 and RFC + 822", RFC 1327, March 1992. + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and Facilities", + STD 13, RFC 1034, USC/Information Sciences Institute, November + 1987. + + [RFC 1035] Mockapetris, P., "Domain names - Implementation and + Specification", STD 13, RFC 1035, USC/Information Sciences + Institute, November 1987. + + [RFC 1033] Lottor, M., "Domain Administrators Operation Guide", RFC + 1033, SRI International, November 1987. + + [RFC 2156] Kille, S. E., " MIXER (Mime Internet X.400 Enhanced + Relay): Mapping between X.400 and RFC 822/MIME", RFC 2156, + January 1998. + + [Costa] Costa, A., Macedo, J., and V. 
Freitas, "Accessing and + Managing DNS Information in the X.500 Directory", Proceedings of + the 4th Joint European Networking Conference, Trondheim, NO, May + 1993. + +10. Security Considerations + + This document specifies a means by which DNS "PX" records can direct + the translation between X.400 and Internet mail addresses. + + This can indirectly affect the routing of mail across a gateway + between X.400 and Internet Mail. A successful attack on this service + could cause incorrect translation of an originator address (thus + "forging" the originator address), or incorrect translation of a + recipient address (thus directing the mail to an unauthorized + recipient, or making it appear to an authorized recipient, that the + message was intended for recipients other than those chosen by the + originator) or could force the mail path via some particular gateway + or message transfer agent where mail security can be affected by + compromised software. + + + + + + + + +Allocchio Standards Track [Page 24] + +RFC 2163 MIXER MCGAM January 1998 + + + There are several means by which an attacker might be able to deliver + incorrect PX records to a client. These include: (a) compromise of a + DNS server, (b) generating a counterfeit response to a client's DNS + query, (c) returning incorrect "additional information" in response + to an unrelated query. + + Clients using PX records SHOULD ensure that routing and address + translations are based only on authoritative answers. Once DNS + Security mechanisms [RFC 2065] become more widely deployed, clients + SHOULD employ those mechanisms to verify the authenticity and + integrity of PX records. + +11. 
Author's Address + + Claudio Allocchio + Sincrotrone Trieste + SS 14 Km 163.5 Basovizza + I 34012 Trieste + Italy + + RFC822: Claudio.Allocchio@elettra.trieste.it + X.400: C=it;A=garr;P=Trieste;O=Elettra; + S=Allocchio;G=Claudio; + Phone: +39 40 3758523 + Fax: +39 40 3758565 + + + + + + + + + + + + + + + + + + + + + + + + + + +Allocchio Standards Track [Page 25] + +RFC 2163 MIXER MCGAM January 1998 + + +12. Full Copyright Statement + + Copyright (C) The Internet Society (1998). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
+ + + + + + + + + + + + + + + + + + + + + + + + +Allocchio Standards Track [Page 26] + diff --git a/doc/rfc/rfc2168.txt b/doc/rfc/rfc2168.txt new file mode 100644 index 00000000..3eed1bdb --- /dev/null +++ b/doc/rfc/rfc2168.txt @@ -0,0 +1,1123 @@ + + + + + + +Network Working Group R. Daniel +Request for Comments: 2168 Los Alamos National Laboratory +Category: Experimental M. Mealling + Network Solutions, Inc. + June 1997 + + + Resolution of Uniform Resource Identifiers + using the Domain Name System + +Status of this Memo +=================== + + This memo defines an Experimental Protocol for the Internet + community. This memo does not specify an Internet standard of any + kind. Discussion and suggestions for improvement are requested. + Distribution of this memo is unlimited. + +Abstract: +========= + + Uniform Resource Locators (URLs) are the foundation of the World Wide + Web, and are a vital Internet technology. However, they have proven + to be brittle in practice. The basic problem is that URLs typically + identify a particular path to a file on a particular host. There is + no graceful way of changing the path or host once the URL has been + assigned. Neither is there a graceful way of replicating the resource + located by the URL to achieve better network utilization and/or fault + tolerance. Uniform Resource Names (URNs) have been hypothesized as an + adjunct to URLs that would overcome such problems. URNs and URLs are + both instances of a broader class of identifiers known as Uniform + Resource Identifiers (URIs). + + The requirements document for URN resolution systems[15] defines the + concept of a "resolver discovery service". This document describes + the first, experimental, RDS. It is implemented by a new DNS Resource + Record, NAPTR (Naming Authority PoinTeR), that provides rules for + mapping parts of URIs to domain names. By changing the mapping + rules, we can change the host that is contacted to resolve a URI. 
+ This will allow a more graceful handling of URLs over long time + periods, and forms the foundation for a new proposal for Uniform + Resource Names. + + + + + + + + + +Daniel & Mealling Experimental [Page 1] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + In addition to locating resolvers, the NAPTR provides for other + naming systems to be grandfathered into the URN world, provides + independence between the name assignment system and the resolution + protocol system, and allows multiple services (Name to Location, Name + to Description, Name to Resource, ...) to be offered. In conjunction + with the SRV RR, the NAPTR record allows those services to be + replicated for the purposes of fault tolerance and load balancing. + +Introduction: +============= + + Uniform Resource Locators have been a significant advance in + retrieving Internet-accessible resources. However, their brittle + nature over time has been recognized for several years. The Uniform + Resource Identifier working group proposed the development of Uniform + Resource Names to serve as persistent, location-independent + identifiers for Internet resources in order to overcome most of the + problems with URLs. RFC-1737 [1] sets forth requirements on URNs. + + During the lifetime of the URI-WG, a number of URN proposals were + generated. The developers of several of those proposals met in a + series of meetings, resulting in a compromise known as the Knoxville + framework. The major principle behind the Knoxville framework is + that the resolution system must be separate from the way names are + assigned. This is in marked contrast to most URLs, which identify the + host to contact and the protocol to use. Readers are referred to [2] + for background on the Knoxville framework and for additional + information on the context and purpose of this proposal. + + Separating the way names are resolved from the way they are + constructed provides several benefits. 
It allows multiple naming + approaches and resolution approaches to compete, as it allows + different protocols and resolvers to be used. There is just one + problem with such a separation - how do we resolve a name when it + can't give us directions to its resolver? + + For the short term, DNS is the obvious candidate for the resolution + framework, since it is widely deployed and understood. However, it is + not appropriate to use DNS to maintain information on a per-resource + basis. First of all, DNS was never intended to handle that many + records. Second, the limited record size is inappropriate for catalog + information. Third, domain names are not appropriate as URNs. + + Therefore our approach is to use DNS to locate "resolvers" that can + provide information on individual resources, potentially including + the resource itself. To accomplish this, we "rewrite" the URI into a + domain name following the rules provided in NAPTR records. Rewrite + rules provide considerable power, which is important when trying to + + + +Daniel & Mealling Experimental [Page 2] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + meet the goals listed above. However, collections of rules can become + difficult to understand. To lessen this problem, the NAPTR rules are + *always* applied to the original URI, *never* to the output of + previous rules. + + Locating a resolver through the rewrite procedure may take multiple + steps, but the beginning is always the same. The start of the URI is + scanned to extract its colon-delimited prefix. (For URNs, the prefix + is always "urn:" and we extract the following colon-delimited + namespace identifier [3]). NAPTR resolution begins by taking the + extracted string, appending the well-known suffix ".urn.net", and + querying the DNS for NAPTR records at that domain name. Based on the + results of this query, zero or more additional DNS queries may be + needed to locate resolvers for the URI. 
The details of the + conversation between the client and the resolver thus located are + outside the bounds of this draft. Three brief examples of this + procedure are given in the next section. + + The NAPTR RR provides the level of indirection needed to keep the + naming system independent of the resolution system, its protocols, + and services. Coupled with the new SRV resource record proposal[4] + there is also the potential for replicating the resolver on multiple + hosts, overcoming some of the most significant problems of URLs. This + is an important and subtle point. Not only do the NAPTR and SRV + records allow us to replicate the resource, we can replicate the + resolvers that know about the replicated resource. Preventing a + single point of failure at the resolver level is a significant + benefit. Separating the resolution procedure from the way names are + constructed has additional benefits. Different resolution procedures + can be used over time, and resolution procedures that are determined + to be useful can be extended to deal with additional namespaces. + +Caveats +======= + + The NAPTR proposal is the first resolution procedure to be considered + by the URN-WG. There are several concerns about the proposal which + have motivated the group to recommend it for publication as an + Experimental rather than a standards-track RFC. + + First, URN resolution is new to the IETF and we wish to gain + operational experience before recommending any procedure for the + standards track. Second, the NAPTR proposal is based on DNS and + consequently inherits concerns about security and administration. The + recent advancement of the DNSSEC and secure update drafts to Proposed + Standard reduce these concerns, but we wish to experiment with those + new capabilities in the context of URN administration. A third area + of concern is the potential for a noticeable impact on the DNS. 
We + + + +Daniel & Mealling Experimental [Page 3] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + believe that the proposal makes appropriate use of caching and + additional information, but it is best to go slow where the potential + for impact on a core system like the DNS is concerned. Fourth, the + rewrite rules in the NAPTR proposal are based on regular expressions. + Since regular expressions are difficult for humans to construct + correctly, concerns exist about the usability and maintainability of + the rules. This is especially true where international character sets + are concerned. Finally, the URN-WG is developing a requirements + document for URN Resolution Services[15], but that document is not + complete. That document needs to precede any resolution service + proposals on the standards track. + +Terminology +=========== + + "Must" or "Shall" - Software that does not behave in the manner that + this document says it must is not conformant to this + document. + "Should" - Software that does not follow the behavior that this + document says it should may still be conformant, but is + probably broken in some fundamental way. + "May" - Implementations may or may not provide the described + behavior, while still remaining conformant to this + document. + +Brief overview and examples of the NAPTR RR: +============================================ + + A detailed description of the NAPTR RR will be given later, but to + give a flavor for the proposal we first give a simple description of + the record and three examples of its use. + + The key fields in the NAPTR RR are order, preference, service, flags, + regexp, and replacement: + + * The order field specifies the order in which records MUST be + processed when multiple NAPTR records are returned in response to a + single query. A naming authority may have delegated a portion of + its namespace to another agency. 
Evaluating the NAPTR records in + the correct order is necessary for delegation to work properly. + + * The preference field specifies the order in which records SHOULD be + processed when multiple NAPTR records have the same value of + "order". This field lets a service provider specify the order in + which resolvers are contacted, so that more capable machines are + contacted in preference to less capable ones. + + + + + +Daniel & Mealling Experimental [Page 4] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + * The service field specifies the resolution protocol and resolution + service(s) that will be available if the rewrite specified by the + regexp or replacement fields is applied. Resolution protocols are + the protocols used to talk with a resolver. They will be specified + in other documents, such as [5]. Resolution services are operations + such as N2R (URN to Resource), N2L (URN to URL), N2C (URN to URC), + etc. These will be discussed in the URN Resolution Services + document[6], and their behavior in a particular resolution protocol + will be given in the specification for that protocol (see [5] for a + concrete example). + + * The flags field contains modifiers that affect what happens in the + next DNS lookup, typically for optimizing the process. Flags may + also affect the interpretation of the other fields in the record, + therefore, clients MUST skip NAPTR records which contain an unknown + flag value. + + * The regexp field is one of two fields used for the rewrite rules, + and is the core concept of the NAPTR record. The regexp field is a + String containing a sed-like substitution expression. (The actual + grammar for the substitution expressions is given later in this + draft). The substitution expression is applied to the original URN + to determine the next domain name to be queried. The regexp field + should be used when the domain name to be generated is conditional + on information in the URI. 
If the next domain name is always known, + which is anticipated to be a common occurrence, the replacement + field should be used instead. + + * The replacement field is the other field that may be used for the + rewrite rule. It is an optimization of the rewrite process for the + case where the next domain name is fixed instead of being + conditional on the content of the URI. The replacement field is a + domain name (subject to compression if a DNS sender knows that a + given recipient is able to decompress names in this RR type's RDATA + field). If the rewrite is more complex than a simple substitution + of a domain name, the replacement field should be set to . and the + regexp field used. + + + + + + + + + + + + + + +Daniel & Mealling Experimental [Page 5] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + Note that the client applies all the substitutions and performs all + lookups, they are not performed in the DNS servers. Note also that it + is the belief of the developers of this document that regexps should + rarely be used. The replacement field seems adequate for the vast + majority of situations. Regexps are only necessary when portions of a + namespace are to be delegated to different resolvers. Finally, note + that the regexp and replacement fields are, at present, mutually + exclusive. However, developers of client software should be aware + that a new flag might be defined which requires values in both + fields. + +Example 1 +--------- + + Consider a URN that uses the hypothetical DUNS namespace. DUNS + numbers are identifiers for approximately 30 million registered + businesses around the world, assigned and maintained by Dunn and + Bradstreet. The URN might look like: + + urn:duns:002372413:annual-report-1997 + + The first step in the resolution process is to find out about the + DUNS namespace. The namespace identifier, "duns", is extracted from + the URN, prepended to urn.net, and the NAPTRs for duns.urn.net looked + up. 
It might return records of the form: + +duns.urn.net +;; order pref flags service regexp replacement + IN NAPTR 100 10 "s" "dunslink+N2L+N2C" "" dunslink.udp.isi.dandb.com + IN NAPTR 100 20 "s" "rcds+N2C" "" rcds.udp.isi.dandb.com + IN NAPTR 100 30 "s" "http+N2L+N2C+N2R" "" http.tcp.isi.dandb.com + + The order field contains equal values, indicating that no name + delegation order has to be followed. The preference field indicates + that the provider would like clients to use the special dunslink + protocol, followed by the RCDS protocol, and that HTTP is offered as + a last resort. All the records specify the "s" flag, which will be + explained momentarily. The service fields say that if we speak + dunslink, we will be able to issue either the N2L or N2C requests to + obtain a URL or a URC (description) of the resource. The Resource + Cataloging and Distribution Service (RCDS)[7] could be used to get a + URC for the resource, while HTTP could be used to get a URL, URC, or + the resource itself. All the records supply the next domain name to + query, none of them need to be rewritten with the aid of regular + expressions. + + + + + + +Daniel & Mealling Experimental [Page 6] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + The general case might require multiple NAPTR rewrites to locate a + resolver, but eventually we will come to the "terminal NAPTR". Once + we have the terminal NAPTR, our next probe into the DNS will be for a + SRV or A record instead of another NAPTR. Rather than probing for a + non-existent NAPTR record to terminate the loop, the flags field is + used to indicate a terminal lookup. If it has a value of "s", the + next lookup should be for SRV RRs, "a" denotes that A records should + be sought. A "p" flag is also provided to indicate that the next action + is Protocol-specific, but that looking up another NAPTR will not be + part of it. + + Since our example RR specified the "s" flag, it was terminal. 
+ Assuming our client does not know the dunslink protocol, our next + action is to lookup SRV RRs for rcds.udp.isi.dandb.com, which will + tell us hosts that can provide the necessary resolution service. That + lookup might return: + + ;; Pref Weight Port Target + rcds.udp.isi.dandb.com IN SRV 0 0 1000 defduns.isi.dandb.com + IN SRV 0 0 1000 dbmirror.com.au + IN SRV 0 0 1000 ukmirror.com.uk + + telling us three hosts that could actually do the resolution, and + giving us the port we should use to talk to their RCDS server. (The + reader is referred to the SRV proposal [4] for the interpretation of + the fields above). + + There is opportunity for significant optimization here. We can return + the SRV records as additional information for terminal NAPTRs (and + the A records as additional information for those SRVs). While this + recursive provision of additional information is not explicitly + blessed in the DNS specifications, it is not forbidden, and BIND does + take advantage of it [8]. This is a significant optimization. In + conjunction with a long TTL for *.urn.net records, the average number + of probes to DNS for resolving DUNS URNs would approach one. + Therefore, DNS server implementors SHOULD provide additional + information with NAPTR responses. The additional information will be + either SRV or A records. If SRV records are available, their A + records should be provided as recursive additional information. + + Note that the example NAPTR records above are intended to represent + the reply the client will see. They are not quite identical to what + the domain administrator would put into the zone files. For one + thing, the administrator should supply the trailing '.' character on + any FQDNs. + + + + + + +Daniel & Mealling Experimental [Page 7] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + +Example 2 +--------- + + Consider a URN namespace based on MIME Content-Ids. 
The URN might + look like this: + + urn:cid:199606121851.1@mordred.gatech.edu + + (Note that this example is chosen for pedagogical purposes, and does + not conform to the recently-approved CID URL scheme.) + + The first step in the resolution process is to find out about the CID + namespace. The namespace identifier, cid, is extracted from the URN, + prepended to urn.net, and the NAPTR for cid.urn.net looked up. It + might return records of the form: + + cid.urn.net + ;; order pref flags service regexp replacement + IN NAPTR 100 10 "" "" "/urn:cid:.+@([^\.]+\.)(.*)$/\2/i" . + + We have only one NAPTR response, so ordering the responses is not a + problem. The replacement field is empty, so we check the regexp + field and use the pattern provided there. We apply that regexp to the + entire URN to see if it matches, which it does. The \2 part of the + substitution expression returns the string "gatech.edu". Since the + flags field does not contain "s" or "a", the lookup is not terminal + and our next probe to DNS is for more NAPTR records: + lookup(query=NAPTR, "gatech.edu"). + + Note that the rule does not extract the full domain name from the + CID; instead, it assumes the CID comes from a host and extracts its + domain. While all hosts, such as mordred, could have their very own + NAPTR, maintaining those records for all the machines at a site as + large as Georgia Tech would be an intolerable burden. Wildcards are + not appropriate here since they only return results when there are no + exactly matching names already in the system. 
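The rewrite step described above for the cid.urn.net record can be sketched in a few lines of Python. This is only an illustration, not part of the memo: the function name apply_naptr_regexp is hypothetical, Python's re module stands in for whatever regular expression dialect the final grammar prescribes, and the sketch assumes the delimiter never appears escaped inside the pattern or the replacement. It returns only the expanded replacement, which is the result the example describes.

```python
import re


def apply_naptr_regexp(subst_expr, uri):
    """Apply a sed-like NAPTR substitution expression to a URI.

    Returns the rewritten string, or None if the pattern does not match.
    Simplification (assumption): the delimiter is never escaped inside
    the pattern or the replacement.
    """
    delim = subst_expr[0]                  # first character is the delimiter
    parts = subst_expr[1:].split(delim)    # -> [pattern, replacement, flags]
    if len(parts) != 3:
        raise ValueError("expected <delim>pattern<delim>replacement<delim>flags")
    pattern, replacement, flags = parts

    re_flags = re.IGNORECASE if "i" in flags else 0
    match = re.search(pattern, uri, re_flags)
    if match is None:
        return None
    # Expand backreferences (for example \2) using the groups of the match.
    return match.expand(replacement)


# The expression as the client receives it from the cid.urn.net record.
CID_RULE = r"/urn:cid:.+@([^\.]+\.)(.*)$/\2/i"
next_name = apply_naptr_regexp(CID_RULE, "urn:cid:199606121851.1@mordred.gatech.edu")
print(next_name)  # prints gatech.edu
```

Because the record's flags field is empty, the string returned here would be used as the name for another NAPTR query rather than for an SRV or A lookup.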
+ + The record returned from the query on "gatech.edu" might look like: + +gatech.edu IN NAPTR +;; order pref flags service regexp replacement + IN NAPTR 100 50 "s" "z3950+N2L+N2C" "" z3950.tcp.gatech.edu + IN NAPTR 100 50 "s" "rcds+N2C" "" rcds.udp.gatech.edu + IN NAPTR 100 50 "s" "http+N2L+N2C+N2R" "" http.tcp.gatech.edu + + + + + + + +Daniel & Mealling Experimental [Page 8] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + Continuing with our example, we note that the values of the order and + preference fields are equal in all records, so the client is free to + pick any record. The flags field tells us that these are the last + NAPTR patterns we should see, and after the rewrite (a simple + replacement in this case) we should look up SRV records to get + information on the hosts that can provide the necessary service. + + Assuming we prefer the Z39.50 protocol, our lookup might return: + + ;; Pref Weight Port Target + z3950.tcp.gatech.edu IN SRV 0 0 1000 z3950.gatech.edu + IN SRV 0 0 1000 z3950.cc.gatech.edu + IN SRV 0 0 1000 z3950.uga.edu + + telling us three hosts that could actually do the resolution, and + giving us the port we should use to talk to their Z39.50 server. + + Recall that the regular expression used \2 to extract a domain name + from the CID, and \. for matching the literal '.' characters + separating the domain name components. Since '\' is the escape + character, literal occurrences of a backslash must be escaped by + another backslash. For the case of the cid.urn.net record above, the + regular expression entered into the zone file should be + "/urn:cid:.+@([^\\.]+\\.)(.*)$/\\2/i". When the client code actually + receives the record, the pattern will have been converted to + "/urn:cid:.+@([^.]+\.)(.*)$/\2/i". + +Example 3 +--------- + + Even if URN systems were in place now, there would still be a + tremendous number of URLs. 
It should be possible to develop a URN + resolution system that can also provide location independence for + those URLs. This is related to the requirement in [1] to be able to + grandfather in names from other naming systems, such as ISO Formal + Public Identifiers, Library of Congress Call Numbers, ISBNs, ISSNs, + etc. + + The NAPTR RR could also be used for URLs that have already been + assigned. Assume we have the URL for a very popular piece of + software that the publisher wishes to mirror at multiple sites around + the world: + + http://www.foo.com/software/latest-beta.exe + + + + + + + +Daniel & Mealling Experimental [Page 9] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + We extract the prefix, "http", and lookup NAPTR records for + http.urn.net. This might return a record of the form + + http.urn.net IN NAPTR + ;; order pref flags service regexp replacement + 100 90 "" "" "!http://([^/:]+)!\1!i" . + + This expression returns everything after the first double slash and + before the next slash or colon. (We use the '!' character to delimit + the parts of the substitution expression. Otherwise we would have to + use backslashes to escape the forward slashes, and would have a + regexp in the zone file that looked like + "/http:\\/\\/([^\\/:]+)/\\1/i".). + + Applying this pattern to the URL extracts "www.foo.com". Looking up + NAPTR records for that might return: + + www.foo.com + ;; order pref flags service regexp replacement + IN NAPTR 100 100 "s" "http+L2R" "" http.tcp.foo.com + IN NAPTR 100 100 "s" "ftp+L2R" "" ftp.tcp.foo.com + + Looking up SRV records for http.tcp.foo.com would return information + on the hosts that foo.com has designated to be its mirror sites. The + client can then pick one for the user. + +NAPTR RR Format +=============== + + The format of the NAPTR RR is given below. The DNS type code for + NAPTR is 35. 
+ + Domain TTL Class Order Preference Flags Service Regexp + Replacement + + where: + + Domain + The domain name this resource record refers to. + TTL + Standard DNS Time To Live field + Class + Standard DNS meaning + + + + + + + + +Daniel & Mealling Experimental [Page 10] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + Order + A 16-bit integer specifying the order in which the NAPTR + records MUST be processed to ensure correct delegation of + portions of the namespace over time. Low numbers are processed + before high numbers, and once a NAPTR is found that "matches" + a URN, the client MUST NOT consider any NAPTRs with a higher + value for order. + + Preference + A 16-bit integer which specifies the order in which NAPTR + records with equal "order" values SHOULD be processed, low + numbers being processed before high numbers. This is similar + to the preference field in an MX record, and is used so domain + administrators can direct clients towards more capable hosts + or lighter weight protocols. + + Flags + A String giving flags to control aspects of the rewriting and + interpretation of the fields in the record. Flags are single + characters from the set [A-Z0-9]. The case of the alphabetic + characters is not significant. + + At this time only three flags, "S", "A", and "P", are defined. + "S" means that the next lookup should be for SRV records + instead of NAPTR records. "A" means that the next lookup + should be for A records. The "P" flag says that the remainder + of the resolution shall be carried out in a Protocol-specific + fashion, and we should not do any more DNS queries. + + The remaining alphabetic flags are reserved. The numeric flags + may be used for local experimentation. The S, A, and P flags + are all mutually exclusive, and resolution libraries MAY + signal an error if more than one is given. 
(Experimental code + and code for assisting in the creation of NAPTRs would be more + likely to signal such an error than a client such as a + browser). We anticipate that multiple flags will be allowed in + the future, so implementers MUST NOT assume that the flags + field can only contain 0 or 1 characters. Finally, if a client + encounters a record with an unknown flag, it MUST ignore it + and move to the next record. This test takes precedence even + over the "order" field. Since flags can control the + interpretation placed on fields, a novel flag might change the + interpretation of the regexp and/or replacement fields such + that it is impossible to determine if a record matched a URN. + + + + + + + +Daniel & Mealling Experimental [Page 11] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + Service + Specifies the resolution service(s) available down this + rewrite path. It may also specify the particular protocol that + is used to talk with a resolver. A protocol MUST be specified + if the flags field states that the NAPTR is terminal. If a + protocol is specified, but the flags field does not state that + the NAPTR is terminal, the next lookup MUST be for a NAPTR. + The client MAY choose not to perform the next lookup if the + protocol is unknown, but that behavior MUST NOT be relied + upon. + + The service field may take any of the values below (using the + Augmented BNF of RFC 822[9]): + + service_field = [ [protocol] *("+" rs)] + protocol = ALPHA *31ALPHANUM + rs = ALPHA *31ALPHANUM + // The protocol and rs fields are limited to 32 + // characters and must start with an alphabetic. + // The current set of "known" strings are: + // protocol = "rcds" / "thttp" / "hdl" / "rwhois" / "z3950" + // rs = "N2L" / "N2Ls" / "N2R" / "N2Rs" / "N2C" + // / "N2Ns" / "L2R" / "L2Ns" / "L2Ls" / "L2C" + + i.e. an optional protocol specification followed by 0 or more + resolution services. Each resolution service is indicated by + an initial '+' character. 
+
+      Note that the empty string is also a valid service field. This
+      will typically be seen at the top levels of a namespace, when
+      it is impossible to know what services and protocols will be
+      offered by a particular publisher within that name space.
+
+      At this time the known protocols are rcds[7], hdl[10] (binary,
+      UDP-based protocols), thttp[5] (a textual, TCP-based
+      protocol), rwhois[11] (textual, UDP or TCP based), and
+      Z39.50[12] (binary, TCP-based). More will be allowed later.
+      The names of the protocols must be formed from the characters
+      [a-zA-Z0-9]. Case of the characters is not significant.
+
+      The service requests currently allowed will be described in
+      more detail in [6], but in brief they are:
+         N2L  - Given a URN, return a URL.
+         N2Ls - Given a URN, return a set of URLs.
+         N2R  - Given a URN, return an instance of the resource.
+         N2Rs - Given a URN, return multiple instances of the
+                resource, typically encoded using
+                multipart/alternative.
+         N2C  - Given a URN, return a collection of
+                meta-information on the named resource. The format
+                of this response is the subject of another document.
+         N2Ns - Given a URN, return all URNs that are also
+                identifiers for the resource.
+         L2R  - Given a URL, return the resource.
+         L2Ns - Given a URL, return all the URNs that are
+                identifiers for the resource.
+         L2Ls - Given a URL, return all the URLs for instances of
+                the same resource.
+         L2C  - Given a URL, return a description of the
+                resource.
+
+      The actual format of the service request and response will be
+      determined by the resolution protocol, and is the subject for
+      other documents (e.g. [5]). Protocols need not offer all
+      services. The labels for service requests shall be formed from
+      the set of characters [A-Z0-9]. The case of the alphabetic
+      characters is not significant.
+ + Regexp + A STRING containing a substitution expression that is applied + to the original URI in order to construct the next domain name + to lookup. The grammar of the substitution expression is given + in the next section. + + Replacement + The next NAME to query for NAPTR, SRV, or A records depending + on the value of the flags field. As mentioned above, this may + be compressed. + +Substitution Expression Grammar: +================================ + + The content of the regexp field is a substitution expression. True + sed(1) substitution expressions are not appropriate for use in this + application for a variety of reasons, therefore the contents of the + regexp field MUST follow the grammar below: + +subst_expr = delim-char ere delim-char repl delim-char *flags +delim-char = "/" / "!" / ... (Any non-digit or non-flag character other + than backslash '\'. All occurances of a delim_char in a + subst_expr must be the same character.) +ere = POSIX Extended Regular Expression (see [13], section + 2.8.4) +repl = dns_str / backref / repl dns_str / repl backref +dns_str = 1*DNS_CHAR +backref = "\" 1POS_DIGIT + + + +Daniel & Mealling Experimental [Page 13] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + +flags = "i" +DNS_CHAR = "-" / "0" / ... / "9" / "a" / ... / "z" / "A" / ... / "Z" +POS_DIGIT = "1" / "2" / ... / "9" ; 0 is not an allowed backref +value domain name (see RFC-1123 [14]). + + The result of applying the substitution expression to the original + URI MUST result in a string that obeys the syntax for DNS host names + [14]. Since it is possible for the regexp field to be improperly + specified, such that a non-conforming host name can be constructed, + client software SHOULD verify that the result is a legal host name + before making queries on it. 
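The host name check recommended above can be sketched as follows. This is a minimal illustration in Python and not part of the specification. The function name `is_legal_hostname` is hypothetical, and the limits it applies (labels of 1 to 63 letters, digits, or hyphens that neither begin nor end with a hyphen, and at most 253 characters overall) are this sketch's reading of the host name syntax referenced in [14].

```python
import re

# One DNS label: 1-63 characters drawn from letters, digits and hyphen,
# neither starting nor ending with a hyphen (assumed reading of [14]).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_legal_hostname(name: str) -> bool:
    """Return True if `name` is syntactically usable as a DNS host name."""
    if name.endswith("."):           # tolerate a single trailing root dot
        name = name[:-1]
    if not name or len(name) > 253:  # assumed overall length limit
        return False
    # Every dot-separated label must match the label pattern in full.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

A client would call such a check on the output of the substitution expression and refuse to issue a query when it returns False, for example when a badly written regexp leaves a path component such as "/software" in the result.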
+ + Backref expressions in the repl portion of the substitution + expression are replaced by the (possibly empty) string of characters + enclosed by '(' and ')' in the ERE portion of the substitution + expression. N is a single digit from 1 through 9, inclusive. It + specifies the N'th backref expression, the one that begins with the + N'th '(' and continues to the matching ')'. For example, the ERE + (A(B(C)DE)(F)G) + has backref expressions: + \1 = ABCDEFG + \2 = BCDE + \3 = C + \4 = F + \5..\9 = error - no matching subexpression + + The "i" flag indicates that the ERE matching SHALL be performed in a + case-insensitive fashion. Furthermore, any backref replacements MAY + be normalized to lower case when the "i" flag is given. + + The first character in the substitution expression shall be used as + the character that delimits the components of the substitution + expression. There must be exactly three non-escaped occurrences of + the delimiter character in a substitution expression. Since escaped + occurrences of the delimiter character will be interpreted as + occurrences of that character, digits MUST NOT be used as delimiters. + Backrefs would be confused with literal digits were this allowed. + Similarly, if flags are specified in the substitution expression, the + delimiter character must not also be a flag character. + + + + + + + + + + + + +Daniel & Mealling Experimental [Page 14] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + +Advice to domain administrators: +================================ + + Beware of regular expressions. Not only are they a pain to get + correct on their own, but there is the previously mentioned + interaction with DNS. Any backslashes in a regexp must be entered + twice in a zone file in order to appear once in a query response. + More seriously, the need for double backslashes has probably not been + tested by all implementors of DNS servers. We anticipate that urn.net + will be the heaviest user of regexps. 
Only when delegating portions
+   of namespaces should the typical domain administrator need to use
+   regexps.
+
+   On a related note, beware of interactions with the shell when
+   manipulating regexps from the command line. Since '\' is a common
+   escape character in shells, there is a good chance that when you
+   think you are saying "\\" you are actually saying "\". Similar
+   caveats apply to characters such as
+
+   The "A" flag allows the next lookup to be for A records rather than
+   SRV records. Since there is no place for a port specification in the
+   NAPTR record, when the "A" flag is used the specified protocol must
+   be running on its default port.
+
+   The URN Syntax draft defines a canonical form for each URN, which
+   requires %encoding characters outside a limited repertoire. The
+   regular expressions MUST be written to operate on that canonical
+   form. Since international character sets will end up with extensive
+   use of %encoded characters, regular expressions operating on them
+   will be essentially impossible to read or write by hand.
+
+Usage
+=====
+
+   For the edification of implementers, pseudocode for a client routine
+   using NAPTRs is given below. This code is provided merely as a
+   convenience; it does not have any weight as a standard way to process
+   NAPTR records. Also, as is the case with pseudocode, it has never
+   been executed and may contain logical errors. You have been warned.
+
+      //
+      // findResolver(URN)
+      // Given a URN, find a host that can resolve it.
+      //
+      findResolver(string URN) {
+        // prepend prefix to urn.net
+        sprintf(key, "%s.urn.net", extractNS(URN));
+        do {
+          rewrite_flag = false;
+          terminal = false;
+          if (key has been seen) {
+            quit with a loop detected error
+          }
+          add key to list of "seens"
+          records = lookup(type=NAPTR, key); // get all NAPTR RRs for 'key'
+
+          discard any records with an unknown value in the "flags" field.
+          sort NAPTR records by "order" field and "preference" field
+              (with "order" being more significant than "preference").
+          n_naptrs = number of NAPTR records in response.
+          curr_order = records[0].order;
+          max_order = records[n_naptrs-1].order;
+
+          // Process current batch of NAPTRs according to "order" field.
+          for (j=0; j < n_naptrs && records[j].order <= max_order; j++) {
+            if (unknown_flag) // skip this record and go to next one
+               continue;
+            newkey = rewrite(URN, records[j].replacement,
+                             records[j].regexp);
+            if (!newkey) // Skip to next record if the rewrite didn't match
+               continue;
+            // We did do a rewrite, shrink max_order to current value
+            // so that delegation works properly
+            max_order = records[j].order;
+            // Will we know what to do with the protocol and services
+            // specified in the NAPTR? If not, try next record.
+            if (!isKnownProto(records[j].services)) {
+              continue;
+            }
+            if (!isKnownService(records[j].services)) {
+              continue;
+            }
+
+            // At this point we have a successful rewrite and we will
+            // know how to speak the protocol and request a known
+            // resolution service. Before we do the next lookup, check
+            // some optimization possibilities.
+
+            // strcasecmp() returns 0 when the strings are equal.
+            flags = records[j].flags;
+            if (strcasecmp(flags, "S") == 0
+             || strcasecmp(flags, "P") == 0
+             || strcasecmp(flags, "A") == 0) {
+               terminal = true;
+               services = records[j].services;
+               addnl = any SRV and/or A records returned as additional
+                       info for records[j].
+            }
+            key = newkey;
+            rewrite_flag = true;
+            break;
+          }
+        } while (rewrite_flag && !terminal);
+
+        // Did we not find our way to a resolver?
+        if (!rewrite_flag) {
+           report an error
+           return NULL;
+        }
+
+        // Leave rest to another protocol?
+        if (strcasecmp(flags, "P") == 0) {
+           return key as host to talk to;
+        }
+
+        // If not, keep plugging
+        if (!addnl) { // No SRVs came in as additional info, look them up
+          srvs = lookup(type=SRV, key);
+        }
+
+        sort SRV records by preference, weight, ...
+        foreach (SRV record) { // in order of preference
+          try contacting srv[j].target using the protocol and one of the
+              resolution service requests from the "services" field of the
+              last NAPTR record.
+          if (successful)
+            return (target, protocol, service);
+            // Actually we would probably return a result, but this
+            // code was supposed to just tell us a good host to talk to.
+        }
+        die with an "unable to find a host" error;
+      }
+
+Notes:
+======
+
+     - A client MUST process multiple NAPTR records in the order
+       specified by the "order" field; it MUST NOT simply use the first
+       record that provides a known protocol and service combination.
+
+     - If a record at a particular order matches the URI, but the
+       client doesn't know the specified protocol and service, the
+       client SHOULD continue to examine records that have the same
+       order. The client MUST NOT consider records with a higher value
+       of order. This is necessary to make delegation of portions of
+       the namespace work. The order field is what lets site
+       administrators say "all requests for URIs matching pattern x go
+       to server 1, all others go to server 2".
+ (A match is defined as: + 1) The NAPTR provides a replacement domain name + or + 2) The regular expression matches the URN + ) + + - When multiple RRs have the same "order", the client should use + the value of the preference field to select the next NAPTR to + consider. However, because of preferred protocols or services, + estimates of network distance and bandwidth, etc. clients may + use different criteria to sort the records. + - If the lookup after a rewrite fails, clients are strongly + encouraged to report a failure, rather than backing up to pursue + other rewrite paths. + - When a namespace is to be delegated among a set of resolvers, + regexps must be used. Each regexp appears in a separate NAPTR + RR. Administrators should do as little delegation as possible, + because of limitations on the size of DNS responses. + - Note that SRV RRs impose additional requirements on clients. + +Acknowledgments: +================= + + The editors would like to thank Keith Moore for all his consultations + during the development of this draft. We would also like to thank + Paul Vixie for his assistance in debugging our implementation, and + his answers on our questions. Finally, we would like to acknowledge + our enormous intellectual debt to the participants in the Knoxville + series of meetings, as well as to the participants in the URI and URN + working groups. + +References: +=========== + + [1] Sollins, Karen and Larry Masinter, "Functional Requirements + for Uniform Resource Names", RFC-1737, Dec. 1994. + + [2] The URN Implementors, Uniform Resource Names: A Progress Report, + http://www.dlib.org/dlib/february96/02arms.html, D-Lib Magazine, + February 1996. + + + +Daniel & Mealling Experimental [Page 18] + +RFC 2168 Resolution of URIs Using the DNS June 1997 + + + [3] Moats, Ryan, "URN Syntax", RFC-2141, May 1997. + + [4] Gulbrandsen, A. and P. Vixie, "A DNS RR for specifying + the location of services (DNS SRV)", RFC-2052, October 1996. 
+
+   [5]  Daniel, Jr., Ron, "A Trivial Convention for using HTTP in URN
+        Resolution", RFC-2169, June 1997.
+
+   [6]  URN-WG, "URN Resolution Services", Work in Progress.
+
+   [7]  Moore, Keith, Shirley Browne, Jason Cox, and Jonathan Gettler,
+        Resource Cataloging and Distribution System, Technical Report
+        CS-97-346, University of Tennessee, Knoxville, December 1996.
+
+   [8]  Paul Vixie, personal communication.
+
+   [9]  Crocker, Dave H., "Standard for the Format of ARPA Internet Text
+        Messages", RFC-822, August 1982.
+
+   [10] Orth, Charles and Bill Arms; Handle Resolution Protocol
+        Specification, http://www.handle.net/docs/client_spec.html
+
+   [11] Williamson, S., M. Kosters, D. Blacka, J. Singh, K. Zeilstra,
+        "Referral Whois Protocol (RWhois)", RFC-2167, June 1997.
+
+   [12] Information Retrieval (Z39.50): Application Service Definition
+        and Protocol Specification, ANSI/NISO Z39.50-1995, July 1995.
+
+   [13] IEEE Standard for Information Technology - Portable Operating
+        System Interface (POSIX) - Part 2: Shell and Utilities (Vol. 1);
+        IEEE Std 1003.2-1992; The Institute of Electrical and
+        Electronics Engineers; New York; 1993. ISBN:1-55937-255-9
+
+   [14] Braden, R., "Requirements for Internet Hosts - Application and
+        Support", RFC-1123, Oct. 1989.
+
+   [15] Sollins, Karen, "Requirements and a Framework for URN Resolution
+        Systems", November 1996, Work in Progress.
+
+Security Considerations
+=======================
+
+   The use of "urn.net" as the registry for URN namespaces is subject to
+   denial of service attacks, as well as other DNS spoofing attacks. The
+   interactions with DNSSEC are currently being studied. It is expected
+   that NAPTR records will be signed with SIG records once the DNSSEC
+   work is deployed.
+
+   The rewrite rules make identifiers from other namespaces subject to
+   the same attacks as normal domain names.
Since they have not been + easily resolvable before, this may or may not be considered a + problem. + + Regular expressions should be checked for sanity, not blindly passed + to something like PERL. + + This document has discussed a way of locating a resolver, but has not + discussed any detail of how the communication with the resolver takes + place. There are significant security considerations attached to the + communication with a resolver. Those considerations are outside the + scope of this document, and must be addressed by the specifications + for particular resolver communication protocols. + +Author Contact Information: +=========================== + + Ron Daniel + Los Alamos National Laboratory + MS B287 + Los Alamos, NM, USA, 87545 + voice: +1 505 665 0597 + fax: +1 505 665 4939 + email: rdaniel@lanl.gov + + + Michael Mealling + Network Solutions + 505 Huntmar Park Drive + Herndon, VA 22070 + voice: (703) 742-0400 + fax: (703) 742-9552 + email: michaelm@internic.net + URL: http://www.netsol.com/ + + + + + + + +Daniel & Mealling Experimental [Page 20] + diff --git a/doc/rfc/rfc2181.txt b/doc/rfc/rfc2181.txt new file mode 100644 index 00000000..7899e1cb --- /dev/null +++ b/doc/rfc/rfc2181.txt @@ -0,0 +1,842 @@ + + + + + + +Network Working Group R. Elz +Request for Comments: 2181 University of Melbourne +Updates: 1034, 1035, 1123 R. Bush +Category: Standards Track RGnet, Inc. + July 1997 + + + Clarifications to the DNS Specification + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +1. 
Abstract + + This document considers some areas that have been identified as + problems with the specification of the Domain Name System, and + proposes remedies for the defects identified. Eight separate issues + are considered: + + + IP packet header address usage from multi-homed servers, + + TTLs in sets of records with the same name, class, and type, + + correct handling of zone cuts, + + three minor issues concerning SOA records and their use, + + the precise definition of the Time to Live (TTL) + + Use of the TC (truncated) header bit + + the issue of what is an authoritative, or canonical, name, + + and the issue of what makes a valid DNS label. + + The first six of these are areas where the correct behaviour has been + somewhat unclear, we seek to rectify that. The other two are already + adequately specified, however the specifications seem to be sometimes + ignored. We seek to reinforce the existing specifications. + + + + + + + + + + + + + + +Elz & Bush Standards Track [Page 1] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + + +Contents + + 1 Abstract ................................................... 1 + 2 Introduction ............................................... 2 + 3 Terminology ................................................ 3 + 4 Server Reply Source Address Selection ...................... 3 + 5 Resource Record Sets ....................................... 4 + 6 Zone Cuts .................................................. 8 + 7 SOA RRs .................................................... 10 + 8 Time to Live (TTL) ......................................... 10 + 9 The TC (truncated) header bit .............................. 11 + 10 Naming issues .............................................. 11 + 11 Name syntax ................................................ 13 + 12 Security Considerations .................................... 14 + 13 References ................................................. 
14 + 14 Acknowledgements ........................................... 15 + 15 Authors' Addresses ......................................... 15 + + + + +2. Introduction + + Several problem areas in the Domain Name System specification + [RFC1034, RFC1035] have been noted through the years [RFC1123]. This + document addresses several additional problem areas. The issues here + are independent. Those issues are the question of which source + address a multi-homed DNS server should use when replying to a query, + the issue of differing TTLs for DNS records with the same label, + class and type, and the issue of canonical names, what they are, how + CNAME records relate, what names are legal in what parts of the DNS, + and what is the valid syntax of a DNS name. + + Clarifications to the DNS specification to avoid these problems are + made in this memo. A minor ambiguity in RFC1034 concerned with SOA + records is also corrected, as is one in the definition of the TTL + (Time To Live) and some possible confusion in use of the TC bit. + + + + + + + + + + + + +Elz & Bush Standards Track [Page 2] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + +3. Terminology + + This memo does not use the oft used expressions MUST, SHOULD, MAY, or + their negative forms. In some sections it may seem that a + specification is worded mildly, and hence some may infer that the + specification is optional. That is not correct. Anywhere that this + memo suggests that some action should be carried out, or must be + carried out, or that some behaviour is acceptable, or not, that is to + be considered as a fundamental aspect of this specification, + regardless of the specific words used. If some behaviour or action + is truly optional, that will be clearly specified by the text. + +4. 
Server Reply Source Address Selection + + Most, if not all, DNS clients, expect the address from which a reply + is received to be the same address as that to which the query + eliciting the reply was sent. This is true for servers acting as + clients for the purposes of recursive query resolution, as well as + simple resolver clients. The address, along with the identifier (ID) + in the reply is used for disambiguating replies, and filtering + spurious responses. This may, or may not, have been intended when + the DNS was designed, but is now a fact of life. + + Some multi-homed hosts running DNS servers generate a reply using a + source address that is not the same as the destination address from + the client's request packet. Such replies will be discarded by the + client because the source address of the reply does not match that of + a host to which the client sent the original request. That is, it + appears to be an unsolicited response. + +4.1. UDP Source Address Selection + + To avoid these problems, servers when responding to queries using UDP + must cause the reply to be sent with the source address field in the + IP header set to the address that was in the destination address + field of the IP header of the packet containing the query causing the + response. If this would cause the response to be sent from an IP + address that is not permitted for this purpose, then the response may + be sent from any legal IP address allocated to the server. That + address should be chosen to maximise the possibility that the client + will be able to use it for further queries. Servers configured in + such a way that not all their addresses are equally reachable from + all potential clients need take particular care when responding to + queries sent to anycast, multicast, or similar, addresses. + + + + + + + +Elz & Bush Standards Track [Page 3] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + +4.2. 
Port Number Selection + + Replies to all queries must be directed to the port from which they + were sent. When queries are received via TCP this is an inherent + part of the transport protocol. For queries received by UDP the + server must take note of the source port and use that as the + destination port in the response. Replies should always be sent from + the port to which they were directed. Except in extraordinary + circumstances, this will be the well known port assigned for DNS + queries [RFC1700]. + +5. Resource Record Sets + + Each DNS Resource Record (RR) has a label, class, type, and data. It + is meaningless for two records to ever have label, class, type and + data all equal - servers should suppress such duplicates if + encountered. It is however possible for most record types to exist + with the same label, class and type, but with different data. Such a + group of records is hereby defined to be a Resource Record Set + (RRSet). + +5.1. Sending RRs from an RRSet + + A query for a specific (or non-specific) label, class, and type, will + always return all records in the associated RRSet - whether that be + one or more RRs. The response must be marked as "truncated" if the + entire RRSet will not fit in the response. + +5.2. TTLs of RRs in an RRSet + + Resource Records also have a time to live (TTL). It is possible for + the RRs in an RRSet to have different TTLs. No uses for this have + been found that cannot be better accomplished in other ways. This + can, however, cause partial replies (not marked "truncated") from a + caching server, where the TTLs for some but not all the RRs in the + RRSet have expired. + + Consequently the use of differing TTLs in an RRSet is hereby + deprecated, the TTLs of all RRs in an RRSet must be the same. + + Should a client receive a response containing RRs from an RRSet with + differing TTLs, it should treat this as an error. 
If the RRSet + concerned is from a non-authoritative source for this data, the + client should simply ignore the RRSet, and if the values were + required, seek to acquire them from an authoritative source. Clients + that are configured to send all queries to one, or more, particular + servers should treat those servers as authoritative for this purpose. + Should an authoritative source send such a malformed RRSet, the + + + +Elz & Bush Standards Track [Page 4] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + client should treat the RRs for all purposes as if all TTLs in the + RRSet had been set to the value of the lowest TTL in the RRSet. In + no case may a server send an RRSet with TTLs not all equal. + +5.3. DNSSEC Special Cases + + Two of the record types added by DNS Security (DNSSEC) [RFC2065] + require special attention when considering the formation of Resource + Record Sets. Those are the SIG and NXT records. It should be noted + that DNS Security is still very new, and there is, as yet, little + experience with it. Readers should be prepared for the information + related to DNSSEC contained in this document to become outdated as + the DNS Security specification matures. + +5.3.1. SIG records and RRSets + + A SIG record provides signature (validation) data for another RRSet + in the DNS. Where a zone has been signed, every RRSet in the zone + will have had a SIG record associated with it. The data type of the + RRSet is included in the data of the SIG RR, to indicate with which + particular RRSet this SIG record is associated. Were the rules above + applied, whenever a SIG record was included with a response to + validate that response, the SIG records for all other RRSets + associated with the appropriate node would also need to be included. + In some cases, this could be a very large number of records, not + helped by their being rather large RRs. 
+ + Thus, it is specifically permitted for the authority section to + contain only those SIG RRs with the "type covered" field equal to the + type field of an answer being returned. However, where SIG records + are being returned in the answer section, in response to a query for + SIG records, or a query for all records associated with a name + (type=ANY) the entire SIG RRSet must be included, as for any other RR + type. + + Servers that receive responses containing SIG records in the + authority section, or (probably incorrectly) as additional data, must + understand that the entire RRSet has almost certainly not been + included. Thus, they must not cache that SIG record in a way that + would permit it to be returned should a query for SIG records be + received at that server. RFC2065 actually requires that SIG queries + be directed only to authoritative servers to avoid the problems that + could be caused here, and while servers exist that do not understand + the special properties of SIG records, this will remain necessary. + However, careful design of SIG record processing in new + implementations should permit this restriction to be relaxed in the + future, so resolvers do not need to treat SIG record queries + specially. + + + +Elz & Bush Standards Track [Page 5] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + It has been occasionally stated that a received request for a SIG + record should be forwarded to an authoritative server, rather than + being answered from data in the cache. This is not necessary - a + server that has the knowledge of SIG as a special case for processing + this way would be better to correctly cache SIG records, taking into + account their characteristics. Then the server can determine when it + is safe to reply from the cache, and when the answer is not available + and the query must be forwarded. + +5.3.2. NXT RRs + + Next Resource Records (NXT) are even more peculiar. 
There will only + ever be one NXT record in a zone for a particular label, so + superficially, the RRSet problem is trivial. However, at a zone cut, + both the parent zone, and the child zone (superzone and subzone in + RFC2065 terminology) will have NXT records for the same name. Those + two NXT records do not form an RRSet, even where both zones are + housed at the same server. NXT RRSets always contain just a single + RR. Where both NXT records are visible, two RRSets exist. However, + servers are not required to treat this as a special case when + receiving NXT records in a response. They may elect to notice the + existence of two different NXT RRSets, and treat that as they would + two different RRSets of any other type. That is, cache one, and + ignore the other. Security aware servers will need to correctly + process the NXT record in the received response though. + +5.4. Receiving RRSets + + Servers must never merge RRs from a response with RRs in their cache + to form an RRSet. If a response contains data that would form an + RRSet with data in a server's cache the server must either ignore the + RRs in the response, or discard the entire RRSet currently in the + cache, as appropriate. Consequently the issue of TTLs varying + between the cache and a response does not cause concern, one will be + ignored. That is, one of the data sets is always incorrect if the + data from an answer differs from the data in the cache. The + challenge for the server is to determine which of the data sets is + correct, if one is, and retain that, while ignoring the other. Note + that if a server receives an answer containing an RRSet that is + identical to that in its cache, with the possible exception of the + TTL value, it may, optionally, update the TTL in its cache with the + TTL of the received answer. It should do this if the received answer + would be considered more authoritative (as discussed in the next + section) than the previously cached answer. 
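The no-merge rule of section 5.4 can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not a description of any real server. The names `CachedRRSet` and `receive_rrset` are hypothetical, the integer `rank` stands in for the trust ranking (larger means more trusted), and keeping the cached copy when ranks are equal is this sketch's own policy choice.

```python
from dataclasses import dataclass


@dataclass
class CachedRRSet:
    rdata: frozenset  # the record data values making up the RRSet
    ttl: int          # one TTL shared by every RR in the RRSet
    rank: int         # hypothetical trust score, larger is more trusted


def receive_rrset(cache, key, rdata, ttl, rank):
    """Apply a received RRSet to `cache` without ever merging records.

    `key` is a (name, class, type) tuple identifying the RRSet.
    """
    incoming = CachedRRSet(frozenset(rdata), ttl, rank)
    cached = cache.get(key)
    if cached is None:
        cache[key] = incoming
    elif cached.rdata == incoming.rdata:
        # Identical records: only the TTL is refreshed, and only when
        # the new answer is more trusted (assumed policy).
        if incoming.rank > cached.rank:
            cached.ttl, cached.rank = incoming.ttl, incoming.rank
    elif incoming.rank > cached.rank:
        # Different records: discard the whole cached RRSet.
        cache[key] = incoming
    # Otherwise the received RRs are ignored.
    return cache[key]
```

In every branch the cache holds either the old RRSet or the new one in full, never a union of the two.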
+ + + + + + + +Elz & Bush Standards Track [Page 6] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + +5.4.1. Ranking data + + When considering whether to accept an RRSet in a reply, or retain an + RRSet already in its cache instead, a server should consider the + relative likely trustworthiness of the various data. An + authoritative answer from a reply should replace cached data that had + been obtained from additional information in an earlier reply. + However additional information from a reply will be ignored if the + cache contains data from an authoritative answer or a zone file. + + The accuracy of data available is assumed from its source. + Trustworthiness shall be, in order from most to least: + + + Data from a primary zone file, other than glue data, + + Data from a zone transfer, other than glue, + + The authoritative data included in the answer section of an + authoritative reply. + + Data from the authority section of an authoritative answer, + + Glue from a primary zone, or glue from a zone transfer, + + Data from the answer section of a non-authoritative answer, and + non-authoritative data from the answer section of authoritative + answers, + + Additional information from an authoritative answer, + Data from the authority section of a non-authoritative answer, + Additional information from non-authoritative answers. + + Note that the answer section of an authoritative answer normally + contains only authoritative data. However when the name sought is an + alias (see section 10.1.1) only the record describing that alias is + necessarily authoritative. Clients should assume that other records + may have come from the server's cache. Where authoritative answers + are required, the client should query again, using the canonical name + associated with the alias. 
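The ordering in section 5.4.1 can be expressed as a simple ranking. The sketch below is illustrative only: the `Trust` enum, its member names, and `should_replace` are hypothetical, and the choice to keep cached data when both sources have equal rank is an assumption, since the text does not specify the tie case.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical ranks, higher is more trustworthy (section 5.4.1 order)."""
    ADDITIONAL_OR_NONAUTH_AUTHORITY = 1  # the three least trusted sources
    NONAUTH_ANSWER = 2                   # incl. non-authoritative data in an authoritative answer
    GLUE = 3                             # glue from a primary zone or a zone transfer
    AUTH_AUTHORITY_SECTION = 4           # authority section of an authoritative answer
    AUTH_ANSWER = 5                      # authoritative data in an authoritative answer section
    ZONE_TRANSFER = 6                    # zone transfer data, other than glue
    PRIMARY_ZONE = 7                     # primary zone file data, other than glue


def should_replace(cached: Trust, incoming: Trust) -> bool:
    """True when the incoming RRSet should displace the cached one.

    Assumption: on equal rank the cached data is retained.
    """
    return incoming > cached
```

For example, `should_replace(Trust.ADDITIONAL_OR_NONAUTH_AUTHORITY, Trust.AUTH_ANSWER)` is true, while additional information never displaces data from an authoritative answer or a zone file.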
+ + Unauthenticated RRs received and cached from the least trustworthy of + those groupings, that is data from the additional data section, and + data from the authority section of a non-authoritative answer, should + not be cached in such a way that they would ever be returned as + answers to a received query. They may be returned as additional + information where appropriate. Ignoring this would allow the + trustworthiness of relatively untrustworthy data to be increased + without cause or excuse. + + When DNS security [RFC2065] is in use, and an authenticated reply has + been received and verified, the data thus authenticated shall be + considered more trustworthy than unauthenticated data of the same + type. Note that throughout this document, "authoritative" means a + reply with the AA bit set. DNSSEC uses trusted chains of SIG and KEY + + + +Elz & Bush Standards Track [Page 7] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + records to determine the authenticity of data, the AA bit is almost + irrelevant. However DNSSEC aware servers must still correctly set + the AA bit in responses to enable correct operation with servers that + are not security aware (almost all currently). + + Note that, glue excluded, it is impossible for data from two + correctly configured primary zone files, two correctly configured + secondary zones (data from zone transfers) or data from correctly + configured primary and secondary zones to ever conflict. Where glue + for the same name exists in multiple zones, and differs in value, the + nameserver should select data from a primary zone file in preference + to secondary, but otherwise may choose any single set of such data. + Choosing that which appears to come from a source nearer the + authoritative data source may make sense where that can be + determined. Choosing primary data over secondary allows the source + of incorrect glue data to be discovered more readily, when a problem + with such data exists. 
Where a server can detect from two zone files + that one or more are incorrectly configured, so as to create + conflicts, it should refuse to load the zones determined to be + erroneous, and issue suitable diagnostics. + + "Glue" above includes any record in a zone file that is not properly + part of that zone, including nameserver records of delegated sub- + zones (NS records), address records that accompany those NS records + (A, AAAA, etc), and any other stray data that might appear. + +5.5. Sending RRSets (reprise) + + A Resource Record Set should only be included once in any DNS reply. + It may occur in any of the Answer, Authority, or Additional + Information sections, as required. However it should not be repeated + in the same, or any other, section, except where explicitly required + by a specification. For example, an AXFR response requires the SOA + record (always an RRSet containing a single RR) be both the first and + last record of the reply. Where duplicates are required this way, + the TTL transmitted in each case must be the same. + +6. Zone Cuts + + The DNS tree is divided into "zones", which are collections of + domains that are treated as a unit for certain management purposes. + Zones are delimited by "zone cuts". Each zone cut separates a + "child" zone (below the cut) from a "parent" zone (above the cut). + The domain name that appears at the top of a zone (just below the cut + that separates the zone from its parent) is called the zone's + "origin". The name of the zone is the same as the name of the domain + at the zone's origin. Each zone comprises that subset of the DNS + tree that is at or below the zone's origin, and that is above the + + + +Elz & Bush Standards Track [Page 8] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + cuts that separate the zone from its children (if any). The + existence of a zone cut is indicated in the parent zone by the + existence of NS records specifying the origin of the child zone. 
A + child zone does not contain any explicit reference to its parent. + +6.1. Zone authority + + The authoritative servers for a zone are enumerated in the NS records + for the origin of the zone, which, along with a Start of Authority + (SOA) record are the mandatory records in every zone. Such a server + is authoritative for all resource records in a zone that are not in + another zone. The NS records that indicate a zone cut are the + property of the child zone created, as are any other records for the + origin of that child zone, or any sub-domains of it. A server for a + zone should not return authoritative answers for queries related to + names in another zone, which includes the NS, and perhaps A, records + at a zone cut, unless it also happens to be a server for the other + zone. + + Other than the DNSSEC cases mentioned immediately below, servers + should ignore data other than NS records, and necessary A records to + locate the servers listed in the NS records, that may happen to be + configured in a zone at a zone cut. + +6.2. DNSSEC issues + + The DNS security mechanisms [RFC2065] complicate this somewhat, as + some of the new resource record types added are very unusual when + compared with other DNS RRs. In particular the NXT ("next") RR type + contains information about which names exist in a zone, and hence + which do not, and thus must necessarily relate to the zone in which + it exists. The same domain name may have different NXT records in + the parent zone and the child zone, and both are valid, and are not + an RRSet. See also section 5.3.2. + + Since NXT records are intended to be automatically generated, rather + than configured by DNS operators, servers may, but are not required + to, retain all differing NXT records they receive regardless of the + rules in section 5.4. 
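The zone membership rule of section 6 (a zone holds the names at or below its origin and above the cuts to its children) can be sketched as below. The function names are hypothetical, and names are assumed to be plain dot-separated ASCII text rather than wire-format labels.

```python
from typing import Iterable, List


def labels(name: str) -> List[str]:
    """Split a textual domain name into lower-cased labels; the root is []."""
    name = name.rstrip(".").lower()
    return name.split(".") if name else []


def is_at_or_below(name: str, ancestor: str) -> bool:
    """True when `name` equals `ancestor` or lies beneath it in the tree."""
    n, a = labels(name), labels(ancestor)
    return len(n) >= len(a) and n[len(n) - len(a):] == a


def is_in_zone(name: str, origin: str, child_origins: Iterable[str]) -> bool:
    """True when `name` is at or below `origin` and above every zone cut.

    A name at a cut (the child's origin) belongs to the child zone, so a
    server for the parent alone would not answer authoritatively for it.
    """
    if not is_at_or_below(name, origin):
        return False
    return not any(is_at_or_below(name, child) for child in child_origins)
```

With origin `example.com.` and a single cut at `sub.example.com.`, the name `www.example.com.` is in the zone, while `sub.example.com.` and everything beneath it are not.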
+ + For a secure parent zone to securely indicate that a subzone is + insecure, DNSSEC requires that a KEY RR indicating that the subzone + is insecure, and the parent zone's authenticating SIG RR(s) be + present in the parent zone, as they by definition cannot be in the + subzone. Where a subzone is secure, the KEY and SIG records will be + present, and authoritative, in that zone, but should also always be + present in the parent zone (if secure). + + + + +Elz & Bush Standards Track [Page 9] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + Note that in none of these cases should a server for the parent zone, + not also being a server for the subzone, set the AA bit in any + response for a label at a zone cut. + +7. SOA RRs + + Three minor issues concerning the Start of Zone of Authority (SOA) + Resource Record need some clarification. + +7.1. Placement of SOA RRs in authoritative answers + + RFC1034, in section 3.7, indicates that the authority section of an + authoritative answer may contain the SOA record for the zone from + which the answer was obtained. When discussing negative caching, + RFC1034 section 4.3.4 refers to this technique but mentions the + additional section of the response. The former is correct, as is + implied by the example shown in section 6.2.5 of RFC1034. SOA + records, if added, are to be placed in the authority section. + +7.2. TTLs on SOA RRs + + It may be observed that in section 3.2.1 of RFC1035, which defines + the format of a Resource Record, that the definition of the TTL field + contains a throw away line which states that the TTL of an SOA record + should always be sent as zero to prevent caching. This is mentioned + nowhere else, and has not generally been implemented. + Implementations should not assume that SOA records will have a TTL of + zero, nor are they required to send SOA records with a TTL of zero. + +7.3. 
The SOA.MNAME field + + It is quite clear in the specifications, yet seems to have been + widely ignored, that the MNAME field of the SOA record should contain + the name of the primary (master) server for the zone identified by + the SOA. It should not contain the name of the zone itself. That + information would be useless, as to discover it, one needs to start + with the domain name of the SOA record - that is the name of the + zone. + +8. Time to Live (TTL) + + The definition of values appropriate to the TTL field in STD 13 is + not as clear as it could be, with respect to how many significant + bits exist, and whether the value is signed or unsigned. It is + hereby specified that a TTL value is an unsigned number, with a + minimum value of 0, and a maximum value of 2147483647. That is, a + maximum of 2^31 - 1. When transmitted, this value shall be encoded + in the less significant 31 bits of the 32 bit TTL field, with the + + + +Elz & Bush Standards Track [Page 10] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + most significant, or sign, bit set to zero. + + Implementations should treat TTL values received with the most + significant bit set as if the entire value received was zero. + + Implementations are always free to place an upper bound on any TTL + received, and treat any larger values as if they were that upper + bound. The TTL specifies a maximum time to live, not a mandatory + time to live. + +9. The TC (truncated) header bit + + The TC bit should be set in responses only when an RRSet is required + as a part of the response, but could not be included in its entirety. + The TC bit should not be set merely because some extra information + could have been included, but there was insufficient room. This + includes the results of additional section processing. In such cases + the entire RRSet that will not fit in the response should be omitted, + and the reply sent as is, with the TC bit clear. 
If the recipient of + the reply needs the omitted data, it can construct a query for that + data and send that separately. + + Where TC is set, the partial RRSet that would not completely fit may + be left in the response. When a DNS client receives a reply with TC + set, it should ignore that response, and query again, using a + mechanism, such as a TCP connection, that will permit larger replies. + +10. Naming issues + + It has sometimes been inferred from some sections of the DNS + specification [RFC1034, RFC1035] that a host, or perhaps an interface + of a host, is permitted exactly one authoritative, or official, name, + called the canonical name. There is no such requirement in the DNS. + +10.1. CNAME resource records + + The DNS CNAME ("canonical name") record exists to provide the + canonical name associated with an alias name. There may be only one + such canonical name for any one alias. That name should generally be + a name that exists elsewhere in the DNS, though there are some rare + applications for aliases with the accompanying canonical name + undefined in the DNS. An alias name (label of a CNAME record) may, + if DNSSEC is in use, have SIG, NXT, and KEY RRs, but may have no + other data. That is, for any label in the DNS (any domain name) + exactly one of the following is true: + + + + + + +Elz & Bush Standards Track [Page 11] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + + one CNAME record exists, optionally accompanied by SIG, NXT, and + KEY RRs, + + one or more records exist, none being CNAME records, + + the name exists, but has no associated RRs of any type, + + the name does not exist at all. + +10.1.1. CNAME terminology + + It has been traditional to refer to the label of a CNAME record as "a + CNAME". This is unfortunate, as "CNAME" is an abbreviation of + "canonical name", and the label of a CNAME record is most certainly + not a canonical name. It is, however, an entrenched usage. 
Care + must therefore be taken to be very clear whether the label, or the + value (the canonical name) of a CNAME resource record is intended. + In this document, the label of a CNAME resource record will always be + referred to as an alias. + +10.2. PTR records + + Confusion about canonical names has led to a belief that a PTR + record should have exactly one RR in its RRSet. This is incorrect; + the relevant section of RFC1034 (section 3.6.2) indicates that the + value of a PTR record should be a canonical name. That is, it should + not be an alias. There is no implication in that section that only + one PTR record is permitted for a name. No such restriction should + be inferred. + + Note that while the value of a PTR record must not be an alias, there + is no requirement that the process of resolving a PTR record not + encounter any aliases. The label that is being looked up for a PTR + value might have a CNAME record. That is, it might be an alias. The + value of that CNAME RR, if not another alias, which it should not be, + will give the location where the PTR record is found. That record + gives the result of the PTR type lookup. This final result, the + value of the PTR RR, is the label which must not be an alias. + +10.3. MX and NS records + + The domain name used as the value of a NS resource record, or part of + the value of a MX resource record must not be an alias. Not only is + the specification clear on this point, but using an alias in either + of these positions neither works as well as might be hoped, nor well + fulfills the ambition that may have led to this approach. This + domain name must have as its value one or more address records. + Currently those will be A records, however in the future other record + types giving addressing information may be acceptable. It can also + have other RRs, but never a CNAME RR.
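A zone-checking tool could flag NS and MX values that are aliases, as section 10.3 forbids. The sketch below is a hedged illustration: the in-memory `Zone` shape, the textual MX value format ("preference exchange"), and the function name are all assumptions, and names are assumed to be already normalised (same case, trailing dot).

```python
from typing import Dict, List, Tuple

# Hypothetical zone model: owner name -> list of (type, textual value).
Zone = Dict[str, List[Tuple[str, str]]]


def find_alias_targets(zone: Zone) -> List[Tuple[str, str, str]]:
    """Return (owner, type, target) for every NS or MX value that is an alias."""
    aliases = {
        owner for owner, rrs in zone.items()
        if any(rtype == "CNAME" for rtype, _ in rrs)
    }
    problems = []
    for owner, rrs in zone.items():
        for rtype, value in rrs:
            if rtype == "NS":
                target = value
            elif rtype == "MX":
                target = value.split()[-1]  # drop the preference field
            else:
                continue
            if target in aliases:
                problems.append((owner, rtype, target))
    return problems
```

This only sees aliases defined in the same data set; a target in another zone would need a lookup, which is outside the scope of the sketch.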
+ + + + +Elz & Bush Standards Track [Page 12] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + Searching for either NS or MX records causes "additional section + processing" in which address records associated with the value of the + record sought are appended to the answer. This helps avoid needless + extra queries that are easily anticipated when the first was made. + + Additional section processing does not include CNAME records, let + alone the address records that may be associated with the canonical + name derived from the alias. Thus, if an alias is used as the value + of an NS or MX record, no address will be returned with the NS or MX + value. This can cause extra queries, and extra network burden, on + every query. It is trivial for the DNS administrator to avoid this + by resolving the alias and placing the canonical name directly in the + affected record just once when it is updated or installed. In some + particular hard cases the lack of the additional section address + records in the results of a NS lookup can cause the request to fail. + +11. Name syntax + + Occasionally it is assumed that the Domain Name System serves only + the purpose of mapping Internet host names to data, and mapping + Internet addresses to host names. This is not correct, the DNS is a + general (if somewhat limited) hierarchical database, and can store + almost any kind of data, for almost any purpose. + + The DNS itself places only one restriction on the particular labels + that can be used to identify resource records. That one restriction + relates to the length of the label and the full name. The length of + any one label is limited to between 1 and 63 octets. A full domain + name is limited to 255 octets (including the separators). The zero + length full name is defined as representing the root of the DNS tree, + and is typically written and displayed as ".". 
Those restrictions + aside, any binary string whatever can be used as the label of any + resource record. Similarly, any binary string can serve as the value + of any record that includes a domain name as some or all of its value + (SOA, NS, MX, PTR, CNAME, and any others that may be added). + Implementations of the DNS protocols must not place any restrictions + on the labels that can be used. In particular, DNS servers must not + refuse to serve a zone because it contains labels that might not be + acceptable to some DNS client programs. A DNS server may be + configurable to issue warnings when loading, or even to refuse to + load, a primary zone containing labels that might be considered + questionable, however this should not happen by default. + + Note however, that the various applications that make use of DNS data + can have restrictions imposed on what particular values are + acceptable in their environment. For example, that any binary label + can have an MX record does not imply that any binary name can be used + as the host part of an e-mail address. Clients of the DNS can impose + + + +Elz & Bush Standards Track [Page 13] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + + whatever restrictions are appropriate to their circumstances on the + values they use as keys for DNS lookup requests, and on the values + returned by the DNS. If the client has such restrictions, it is + solely responsible for validating the data from the DNS to ensure + that it conforms before it makes any use of that data. + + See also [RFC1123] section 6.1.3.5. + +12. Security Considerations + + This document does not consider security. + + In particular, nothing in section 4 is any way related to, or useful + for, any security related purposes. + + Section 5.4.1 is also not related to security. Security of DNS data + will be obtained by the Secure DNS [RFC2065], which is mostly + orthogonal to this memo. 
+ + It is not believed that anything in this document adds to any + security issues that may exist with the DNS, nor does it do anything + that will necessarily lessen them. Correct implementation of the + clarifications in this document might play some small part in + limiting the spread of non-malicious bad data in the DNS, but only + DNSSEC can help with deliberate attempts to subvert DNS data. + +13. References + + [RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities", + STD 13, RFC 1034, November 1987. + + [RFC1035] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, November 1987. + + [RFC1123] Braden, R., "Requirements for Internet Hosts - Application + and Support", STD 3, RFC 1123, January 1989. + + [RFC1700] Reynolds, J., Postel, J., "Assigned Numbers", + STD 2, RFC 1700, October 1994. + + [RFC2065] Eastlake, D., Kaufman, C., "Domain Name System Security + Extensions", RFC 2065, January 1997. + + + + + + + + + +Elz & Bush Standards Track [Page 14] + +RFC 2181 Clarifications to the DNS Specification July 1997 + + +14. Acknowledgements + + This memo arose from discussions in the DNSIND working group of the + IETF in 1995 and 1996; the members of that working group are largely + responsible for the ideas captured herein. Particular thanks to + Donald E. Eastlake, 3rd, and Olafur Gudmundsson, for help with the + DNSSEC issues in this document, and to John Gilmore for pointing out + where the clarifications were not necessarily clarifying. Bob Halley + suggested clarifying the placement of SOA records in authoritative + answers, and provided the references. Michael Patton, as usual, and + Mark Andrews, Alan Barrett and Stan Barber provided much assistance + with many details. Josh Littlefield helped make sure that the + clarifications didn't cause problems in some irritating corner cases. + +15.
Authors' Addresses + + Robert Elz + Computer Science + University of Melbourne + Parkville, Victoria, 3052 + Australia. + + EMail: kre@munnari.OZ.AU + + + Randy Bush + RGnet, Inc. + 5147 Crystal Springs Drive NE + Bainbridge Island, Washington, 98110 + United States. + + EMail: randy@psg.com + + + + + + + + + + + + + + + + + + + +Elz & Bush Standards Track [Page 15] diff --git a/doc/rfc/rfc2230.txt b/doc/rfc/rfc2230.txt new file mode 100644 index 00000000..03995fe2 --- /dev/null +++ b/doc/rfc/rfc2230.txt @@ -0,0 +1,619 @@ + + + + + + +Network Working Group R. Atkinson +Request for Comments: 2230 NRL +Category: Informational November 1997 + + + Key Exchange Delegation Record for the DNS + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1997). All Rights Reserved. + +ABSTRACT + + This note describes a mechanism whereby authorisation for one node to + act as key exchanger for a second node is delegated and made + available via the Secure DNS. This mechanism is intended to be used + only with the Secure DNS. It can be used with several security + services. For example, a system seeking to use IP Security [RFC- + 1825, RFC-1826, RFC-1827] to protect IP packets for a given + destination can use this mechanism to determine the set of authorised + remote key exchanger systems for that destination. + +1. INTRODUCTION + + + The Domain Name System (DNS) is the standard way that Internet nodes + locate information about addresses, mail exchangers, and other data + relating to remote Internet nodes. [RFC-1035, RFC-1034] More + recently, Eastlake and Kaufman have defined standards-track security + extensions to the DNS. [RFC-2065] These security extensions can be + used to authenticate signed DNS data records and can also be used to + store signed public keys in the DNS. 
+ + The KX record is useful in providing an authenticatable method of + delegating authorisation for one node to provide key exchange + services on behalf of one or more, possibly different, nodes. This + note specifies the syntax and semantics of the KX record, which is + currently in limited deployment in certain IP-based networks. The + + + + + + + +Atkinson Informational [Page 1] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + reader is assumed to be familiar with the basics of DNS, including + familiarity with [RFC-1035, RFC-1034]. This document is not on the + IETF standards-track and does not specify any level of standard. + This document merely provides information for the Internet community. + +1.1 Identity Terminology + + This document relies upon the concept of "identity domination". This + concept might be new to the reader and so is explained in this + section. The subject of endpoint naming for security associations + has historically been somewhat contentious. This document takes no + position on what forms of identity should be used. In a network, + there are several forms of identity that are possible. + + For example, IP Security has defined notions of identity that + include: IP Address, IP Address Range, Connection ID, Fully-Qualified + Domain Name (FQDN), and User with Fully Qualified Domain Name (USER + FQDN). + + A USER FQDN identity dominates a FQDN identity. A FQDN identity in + turn dominates an IP Address identity. Similarly, a Connection ID + dominates an IP Address identity. An IP Address Range dominates each + IP Address identity for each IP address within that IP address range. + Also, for completeness, an IP Address identity is considered to + dominate itself. + +2. APPROACH + + This document specifies a new kind of DNS Resource Record (RR), known + as the Key Exchanger (KX) record. A Key Exchanger Record has the + mnemonic "KX" and the type code of 36.
Each KX record is associated + with a fully-qualified domain name. The KX record is modeled on the + MX record described in [Part86]. Any given domain, subdomain, or host + entry in the DNS might have a KX record. + +2.1 IPsec Examples + + In these two examples, let S be the originating node and let D be the + destination node. S2 is another node on the same subnet as S. D2 is + another node on the same subnet as D. R1 and R2 are IPsec-capable + routers. The path from S to D goes via first R1 and later R2. The + return path from D to S goes via first R2 and later R1. + + IETF-standard IP Security uses unidirectional Security Associations + [RFC-1825]. Therefore, a typical IP session will use a pair of + related Security Associations, one in each direction. The examples + below talk about how to setup an example Security Association, but in + practice a pair of matched Security Associations will normally be + + + +Atkinson Informational [Page 2] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + used. + +2.1.1 Subnet-to-Subnet Example + + If neither S nor D implements IPsec, security can still be provided + between R1 and R2 by building a secure tunnel. This can use either + AH or ESP. + + S ---+ +----D + | | + +- R1 -----[zero or more routers]-------R2-+ + | | + S2---+ +----D2 + + Figure 1: Network Diagram for Subnet-to-Subnet Example + + In this example, R1 makes the policy decision to provide the IPsec + service for traffic from R1 destined for R2. Once R1 has decided + that the packet from S to D should be protected, it performs a secure + DNS lookup for the records associated with domain D. If R1 only + knows the IP address for D, then a secure reverse DNS lookup will be + necessary to determine the domain D, before that forward secure DNS + lookup for records associated with domain D. 
If these DNS records of + domain D include a KX record for the IPsec service, then R1 knows + which set of nodes are authorised key exchanger nodes for the + destination D. + + In this example, let there be at least one KX record for D and let + the most preferred KX record for D point at R2. R1 then selects a + key exchanger (in this example, R2) for D from the list obtained from + the secure DNS. Then R1 initiates a key management session with that + key exchanger (in this example, R2) to set up an IPsec Security + Association between R1 and D. In this example, R1 knows (either by + seeing an outbound packet arriving from S destined to D or via other + methods) that S will be sending traffic to D. In this example R1's + policy requires that traffic from S to D should be segregated at + least on a host-to-host basis, so R1 desires an IPsec Security + Association with source identity that dominates S, proxy identity + that dominates R1, and destination identity that dominates R2. + + In turn, R2 is able to authenticate the delegation of Key Exchanger + authorisation for target S to R1 by making an authenticated forward + DNS lookup for KX records associated with S and verifying that at + least one such record points to R1. The identity S is typically + given to R2 as part of the key management process between R1 and R2. + + + + + + +Atkinson Informational [Page 3] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + If R2 initially only knows the IP address of S, then it will need to + perform a secure reverse DNS lookup to obtain the fully-qualified + domain name for S prior to that secure forward DNS lookup. + + If R2 does not receive an authenticated DNS response indicating that + R1 is an authorised key exchanger for S, then R2 will not accept the + SA negotiation from R1 on behalf of identity S.
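The selection and verification steps of the subnet-to-subnet example can be sketched as follows. This is an illustration under stated assumptions, not part of the KX specification text above: the `(preference, exchanger)` tuple shape and both function names are hypothetical, a lower preference value is assumed to be more preferred (following the MX convention the KX record is modelled on), and `None` stands for "no authenticated DNS response was obtained".

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one KX record: (preference, exchanger name).
KX = Tuple[int, str]


def _norm(name: str) -> str:
    """Compare domain names case-insensitively, ignoring a trailing dot."""
    return name.rstrip(".").lower()


def select_key_exchanger(authenticated_kx: List[KX]) -> str:
    """Pick the most preferred exchanger (assumed: lowest preference value).

    Raises ValueError when the list is empty.
    """
    return min(authenticated_kx, key=lambda record: record[0])[1]


def accepts_negotiation(authenticated_kx: Optional[List[KX]],
                        initiator: str) -> bool:
    """True only when an authenticated KX record for the claimed identity
    points at the node that initiated key management."""
    if authenticated_kx is None:
        return False  # no authenticated response: refuse the negotiation
    return any(_norm(exchanger) == _norm(initiator)
               for _, exchanger in authenticated_kx)
```

In the example, R1 would call something like `select_key_exchanger` on the KX records of D, and the responder would call something like `accepts_negotiation` on the KX records of S with R1 as the initiator.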
+ + If the proposed IPsec Security Association is acceptable to both R1 + and R2, each of which might have separate policies, then they create + that IPsec Security Association via Key Management. + + Note that for unicast traffic, Key Management will typically also + setup a separate (but related) IPsec Security Association for the + return traffic. That return IPsec Security Association will have + equivalent identities. In this example, that return IPsec Security + Association will have a source identity that dominates D, a proxy + identity that dominates R2, and a destination identity that dominates + R1. + + Once the IPsec Security Association has been created, then R1 uses it + to protect traffic from S destined for D via a secure tunnel that + originates at R1 and terminates at R2. For the case of unicast, R2 + will use the return IPsec Security Association to protect traffic + from D destined for S via a secure tunnel that originates at R2 and + terminates at R1. + +2.1.2 Subnet-to-Host Example + + Consider the case where D and R1 implement IPsec, but S does not + implement IPsec, which is an interesting variation on the previous + example. This example is shown in Figure 2 below. + + S ---+ + | + +- R1 -----[zero or more routers]-------D + | + S2---+ + + Figure 2: Network Diagram for Subnet-to-Host Example + + In this example, R1 makes the policy decision that IP Security is + needed for the packet travelling from S to D. Then, R1 performs the + secure DNS lookup for D and determines that D is its own key + exchanger, either from the existence of a KX record for D pointing to + D or from an authenticated DNS response indicating that no KX record + exists for D. 
If R1 does not initially know the domain name of D, + then prior to the above forward secure DNS lookup, R1 performs a + + + +Atkinson Informational [Page 4] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + secure reverse DNS lookup on the IP address of D to determine the + fully-qualified domain name for that IP address. R1 then initiates + key management with D to create an IPsec Security Association on + behalf of S. + + In turn, D can verify that R1 is authorised to create an IPsec + Security Association on behalf of S by performing a DNS KX record + lookup for target S. R1 usually provides identity S to D via key + management. If D only has the IP address of S, then D will need to + perform a secure reverse lookup on the IP address of S to determine + domain name S prior to the secure forward DNS lookup on S to locate + the KX records for S. + + If D does not receive an authenticated DNS response indicating that + R1 is an authorised key exchanger for S, then D will not accept the + SA negotiation from R1 on behalf of identity S. + + If the IPsec Security Association is successfully established between + R1 and D, that IPsec Security Association has a source identity that + dominates S's IP address, a proxy identity that dominates R1's IP + address, and a destination identity that dominates D's IP address. + + Finally, R1 begins providing the security service for packets from S + that transit R1 destined for D. When D receives such packets, D + examines the SA information during IPsec input processing and sees + that R1's address is listed as valid proxy address for that SA and + that S is the source address for that SA. Hence, D knows at input + processing time that R1 is authorised to provide security on behalf + of S. Therefore packets coming from R1 with valid IP security that + claim to be from S are trusted by D to have really come from S. + +2.1.3 Host to Subnet Example + + Now consider the above case from D's perspective (i.e. 
where D is + sending IP packets to S). This variant is sometimes known as the + Mobile Host or "road warrior" case. The same basic concepts apply, but + the details are covered here in the hope of improved clarity. + + S ---+ + | + +- R1 -----[zero or more routers]-------D + | + S2---+ + + Figure 3: Network Diagram for Host-to-Subnet Example + + + + + + +Atkinson Informational [Page 5] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + In this example, D makes the policy decision that IP Security is + needed for the packets from D to S. Then D performs the secure DNS + lookup for S and discovers that a KX record for S exists and points + at R1. If D only has the IP address of S, then it performs a secure + reverse DNS lookup on the IP address of S prior to the forward secure + DNS lookup for S. + + D then initiates key management with R1, where R1 is acting on behalf + of S, to create an appropriate Security Association. Because D is + acting as its own key exchanger, R1 does not need to perform a secure + DNS lookup for KX records associated with D. + + D and R1 then create an appropriate IPsec Security + Association. This IPsec Security Association is set up as a secure + tunnel with a source identity that dominates D's IP Address and a + destination identity that dominates R1's IP Address. Because D + performs IPsec for itself, no proxy identity is needed in this IPsec + Security Association. If the proxy identity is non-null in this + situation, then the proxy identity must dominate D's IP Address. + + Finally, D sends secured IP packets to R1. R1 receives those + packets, provides IPsec input processing (including appropriate + inner/outer IP address validation), and forwards valid packets along + to S. + +2.2 Other Examples + + This mechanism can be extended for use with other services as well.
+ To give some insight into other possible uses, this section discusses + use of KX records in environments using a Key Distribution Center + (KDC), such as Kerberos [KN93], and a possible use of KX records in + conjunction with mobile nodes accessing the network via a dialup + service. + +2.2.1 KDC Examples + + This example considers the situation of a destination node + implementing IPsec that can only obtain its Security Association + information from a Key Distribution Center (KDC). Let the KDC + implement both the KDC protocol and also a non-KDC key management + protocol (e.g. ISAKMP). In such a case, each client node of the KDC + might have its own KX record pointing at the KDC so that nodes not + implementing the KDC protocol can still create Security Associations + with each of the client nodes of the KDC. + + In the event the session initiator were not using the KDC but the + session target was an IPsec node that only used the KDC, the + initiator would find the KX record for the target pointing at the + + + +Atkinson Informational [Page 6] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + KDC. Then, the external key management exchange (e.g. ISAKMP) would + be between the initiator and the KDC. Then the KDC would distribute + the IPsec SA to the KDC-only IPsec node using the KDC. The IPsec + traffic itself could travel directly between the initiator and the + destination node. + + In the event the initiator node could only use the KDC and the target + were not using the KDC, the initiator would send its request for a + key to the KDC. The KDC would then initiate an external key + management exchange (e.g. ISAKMP) with a node that the target's KX + record(s) pointed to, on behalf of the initiator node. + + The target node could verify that the KDC were allowed to proxy for + the initiator node by looking up the KX records for the initiator + node and finding a KX record for the initiator that listed the KDC. 
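The authorisation check that recurs in the examples above (D checking that R1 may act for S, or a target checking that the KDC may act for an initiator) can be sketched as follows. This is a minimal illustration, not part of the specification: the function names, the tuple layout and the example host names are hypothetical, and the records passed in are assumed to have already been authenticated through Secure DNS. An owner with no KX record is treated as its own key exchanger, as the text describes.

```python
def _norm(name):
    """Compare domain names case-insensitively, ignoring a trailing dot."""
    return name.rstrip(".").lower()


def key_exchangers(owner, kx_records):
    """Return the key exchangers for `owner`, most preferred first.

    `kx_records` is an iterable of (owner, preference, exchanger) tuples
    that are ASSUMED to be already validated by Secure DNS.
    """
    wanted = _norm(owner)
    matches = sorted(
        (preference, _norm(exchanger))
        for rr_owner, preference, exchanger in kx_records
        if _norm(rr_owner) == wanted
    )
    if not matches:
        # No KX record exists: the node acts as its own key exchanger.
        return [wanted]
    # Lower preference values are preferred, so sorted order is the answer.
    return [exchanger for _, exchanger in matches]


def is_authorised_exchanger(owner, claimed, kx_records):
    """True if `claimed` may negotiate key exchange on behalf of `owner`."""
    return _norm(claimed) in key_exchangers(owner, kx_records)


# Hypothetical zone data: S delegates to R1 first, then to a KDC.
RECORDS = [
    ("s.example.org.", 20, "kdc.example.org."),
    ("s.example.org.", 10, "r1.example.org."),
]
```

With this data, is_authorised_exchanger("s.example.org.", "r1.example.org.", RECORDS) is true, while a request from any host not listed for S would be refused.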
+ + Then the external key exchange would be performed between the KDC and + the target node. Then the KDC would distribute the resulting IPsec + Security Association to the initiator. Again, IPsec traffic itself + could travel directly between the initiator and the destination. + +2.2.2 Dial-Up Host Example + + This example outlines a possible use of KX records with mobile hosts + that dial into the network via PPP and are dynamically assigned an IP + address and domain-name at dial-in time. + + Consider the situation where each mobile node is dynamically assigned + both a domain name and an IP address at the time that node dials into + the network. Let the policy require that each mobile node act as its + own Key Exchanger. In this case, it is important that dial-in nodes + use addresses from one or more well known IP subnets or address pools + dedicated to dial-in access. If that is true, then no KX record or + other action is needed to ensure that each node will act as its own + Key Exchanger because lack of a KX record indicates that the node is + its own Key Exchanger. + + Consider the situation where the mobile node's domain name remains + constant but its IP address changes. Let the policy require that + each mobile node act as its own Key Exchanger. In this case, there + might be operational problems when another node attempts to perform a + secure reverse DNS lookup on the IP address to determine the + corresponding domain name. The authenticated DNS binding (in the + form of a PTR record) between the mobile node's currently assigned IP + address and its permanent domain name will need to be securely + updated each time the node is assigned a new IP address. There are + no mechanisms for accomplishing this that are both IETF-standard and + widely deployed as of the time this note was written. 
Use of Dynamic + + + +Atkinson Informational [Page 7] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + DNS Update without authentication is a significant security risk and + hence is not recommended for this situation. + +3. SYNTAX OF KX RECORD + + A KX record has the DNS TYPE of "KX" and a numeric value of 36. A KX + record is a member of the Internet ("IN") CLASS in the DNS. Each KX + record is associated with a <domain-name> entry in the DNS. A KX + record has the following textual syntax: + + <domain-name> IN KX <preference> <domain-name> + + For this description, let the <domain-name> item to the left of the + "KX" string be called <domain-name 1> and the <domain-name> item to + the right of the "KX" string be called <domain-name 2>. <preference> + is a non-negative integer. + + Internet nodes about to initiate a key exchange with <domain-name 1> + should instead contact <domain-name 2> to initiate the key exchange + for a security service between the initiator and <domain-name 2>. If + more than one KX record exists for <domain-name 1>, then the + <preference> field is used to indicate preference among the systems + delegated to. Lower values are preferred over higher values. The + <domain-name 2> is authorised to provide key exchange services on + behalf of <domain-name 1>. The <domain-name 2> MUST have a CNAME + record, an A record, or an AAAA record associated with it. + +3.1 KX RDATA format + + The KX DNS record has the following RDATA format: + + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + | PREFERENCE | + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + / EXCHANGER / + / / + +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ + + where: + + PREFERENCE A 16-bit non-negative integer which specifies the + preference given to this RR among other KX records + at the same owner. Lower values are preferred. + + EXCHANGER A <domain-name> which specifies a host willing to + act as a key exchanger for the owner name.
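As an illustration of the RDATA layout above, the following sketch packs and unpacks a KX RDATA field: a 16-bit PREFERENCE in network byte order followed by EXCHANGER as a sequence of length-prefixed labels. The function names are hypothetical and the code assumes ASCII host names. It writes names without compression and rejects compression pointers when reading, which is a design choice of this sketch and matches the document's requirement that the KX RDATA field not be compressed.

```python
import struct


def encode_name(name):
    """Encode a domain name as uncompressed DNS wire-format labels."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # the root name has no labels
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError("label longer than 63 octets")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)


def pack_kx_rdata(preference, exchanger):
    """Build KX RDATA: 16-bit PREFERENCE followed by EXCHANGER."""
    if not 0 <= preference <= 0xFFFF:
        raise ValueError("preference must fit in 16 bits")
    return struct.pack("!H", preference) + encode_name(exchanger)


def unpack_kx_rdata(rdata):
    """Return (preference, exchanger) parsed from KX RDATA."""
    if len(rdata) < 3:
        raise ValueError("KX RDATA too short")
    (preference,) = struct.unpack_from("!H", rdata, 0)
    labels = []
    pos = 2
    while True:
        if pos >= len(rdata):
            raise ValueError("truncated EXCHANGER name")
        length = rdata[pos]
        if length & 0xC0:
            raise ValueError("compressed name in KX RDATA")
        pos += 1
        if length == 0:
            break
        if pos + length > len(rdata):
            raise ValueError("truncated label")
        labels.append(rdata[pos:pos + length].decode("ascii"))
        pos += length
    return preference, ".".join(labels) + "."
```

For example, pack_kx_rdata(10, "kx1.example.com.") yields the two octets 0x00 0x0A followed by the labels "kx1", "example" and "com" and a terminating zero octet. When several KX records share an owner, sorting the decoded (preference, exchanger) pairs in ascending order gives the order in which exchangers should be tried.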
+ + + + + +Atkinson Informational [Page 8] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + KX records MUST cause type A additional section processing for the + host specified by EXCHANGER. In the event that the host processing + the DNS transaction supports IPv6, KX records MUST also cause type + AAAA additional section processing. + + The KX RDATA field MUST NOT be compressed. + +4. SECURITY CONSIDERATIONS + + KX records MUST always be signed using the method(s) defined by the + DNS Security extensions specified in [RFC-2065]. All unsigned KX + records MUST be ignored because of the security vulnerability caused + by assuming that unsigned records are valid. All signed KX records + whose signatures do not correctly validate MUST be ignored because of + the potential security vulnerability in trusting an invalid KX + record. + + KX records MUST be ignored by systems not implementing Secure DNS + because such systems have no mechanism to authenticate the KX record. + + If a node does not have a permanent DNS entry and some form of + Dynamic DNS Update is in use, then those dynamic DNS updates MUST be + fully authenticated to prevent an adversary from injecting false DNS + records (especially the KX, A, and PTR records) into the Domain Name + System. If false records were inserted into the DNS without being + signed by the Secure DNS mechanisms, then a denial-of-service attack + results. If false records were inserted into the DNS and were + (erroneously) signed by the signing authority, then an active attack + results. + + Myriad serious security vulnerabilities can arise if the restrictions + throughout this document are not strictly adhered to. Implementers + should carefully consider the openly published issues relating to DNS + security [Bell95,Vixie95] as they build their implementations. + Readers should also consider the security considerations discussed in + the DNS Security Extensions document [RFC-2065]. + +5.
REFERENCES + + + [RFC-1825] Atkinson, R., "IP Authentication Header", RFC 1826, + August 1995. + + [RFC-1827] Atkinson, R., "IP Encapsulating Security Payload", + RFC 1827, August 1995. + + + + + + +Atkinson Informational [Page 9] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + + [Bell95] Bellovin, S., "Using the Domain Name System for System + Break-ins", Proceedings of 5th USENIX UNIX Security + Symposium, USENIX Association, Berkeley, CA, June 1995. + ftp://ftp.research.att.com/dist/smb/dnshack.ps + + [RFC-2065] Eastlake, D., and C. Kaufman, "Domain Name System + Security Extensions", RFC 2065, January 1997. + + [RFC-1510] Kohl, J., and C. Neuman, "The Kerberos Network + Authentication Service", RFC 1510, September 1993. + + [RFC-1035] Mockapetris, P., "Domain names - implementation and + specification", STD 13, RFC 1035, November 1987. + + [RFC-1034] Mockapetris, P., "Domain names - concepts and + facilities", STD 13, RFC 1034, November 1987. + + [Vixie95] Vixie, P., "DNS and BIND Security Issues", Proceedings of + the 5th USENIX UNIX Security Symposium, USENIX + Association, Berkeley, CA, June 1995. + ftp://ftp.vix.com/pri/vixie/bindsec.psf + +ACKNOWLEDGEMENTS + + Development of this DNS record was primarily performed during 1993 + through 1995. The author's work on this was sponsored jointly by the + Computing Systems Technology Office (CSTO) of the Advanced Research + Projects Agency (ARPA) and by the Information Security Program Office + (PD71E), Space & Naval Warfare Systems Command (SPAWAR). In that + era, Dave Mihelcic and others provided detailed review and + constructive feedback. More recently, Bob Moscowitz and Todd Welch + provided detailed review and constructive feedback of a + work-in-progress version of this document.
+ +AUTHOR'S ADDRESS + + Randall Atkinson + Code 5544 + Naval Research Laboratory + 4555 Overlook Avenue, SW + Washington, DC 20375-5337 + + Phone: (DSN) 354-8590 + EMail: atkinson@itd.nrl.navy.mil + + + + + + + +Atkinson Informational [Page 10] + +RFC 2230 DNS Key Exchange Delegation Record November 1997 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1997). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ + + + + + + + + + + + + + + + + + + + + + + + +Atkinson Informational [Page 11] + diff --git a/doc/rfc/rfc2308.txt b/doc/rfc/rfc2308.txt new file mode 100644 index 00000000..9123a952 --- /dev/null +++ b/doc/rfc/rfc2308.txt @@ -0,0 +1,1067 @@ + + + + + + +Network Working Group M. Andrews +Request for Comments: 2308 CSIRO +Updates: 1034, 1035 March 1998 +Category: Standards Track + + + Negative Caching of DNS Queries (DNS NCACHE) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1998). All Rights Reserved. + +Abstract + + [RFC1034] provided a description of how to cache negative responses. + It however had a fundamental flaw in that it did not allow a name + server to hand out those cached responses to other resolvers, thereby + greatly reducing the effect of the caching. This document addresses + issues raise in the light of experience and replaces [RFC1034 Section + 4.3.4]. + + Negative caching was an optional part of the DNS specification and + deals with the caching of the non-existence of an RRset [RFC2181] or + domain name. + + Negative caching is useful as it reduces the response time for + negative answers. It also reduces the number of messages that have + to be sent between resolvers and name servers hence overall network + traffic. A large proportion of DNS traffic on the Internet could be + eliminated if all resolvers implemented negative caching. With this + in mind negative caching should no longer be seen as an optional part + of a DNS resolver. 
+ + + + + + + + + + + +Andrews Standards Track [Page 1] + +RFC 2308 DNS NCACHE March 1998 + + +1 - Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + + "Negative caching" - the storage of knowledge that something does not + exist. We can store the knowledge that a record has a particular + value. We can also do the reverse, that is, to store the knowledge + that a record does not exist. It is the storage of knowledge that + something does not exist, cannot or does not give an answer that we + call negative caching. + + "QNAME" - the name in the query section of an answer, or where this + resolves to a CNAME, or CNAME chain, the data field of the last + CNAME. The last CNAME in this sense is that which contains a value + which does not resolve to another CNAME. Implementations should note + that including CNAME records in responses in order, so that the first + has the label from the query section, and then each in sequence has + the label from the data section of the previous (where more than one + CNAME is needed) allows the sequence to be processed in one pass, and + considerably eases the task of the receiver. Other relevant records + (such as SIG RRs [RFC2065]) can be interspersed amongst the CNAMEs. + + "NXDOMAIN" - an alternate expression for the "Name Error" RCODE as + described in [RFC1035 Section 4.1.1] and the two terms are used + interchangeably in this document. + + "NODATA" - a pseudo RCODE which indicates that the name is valid, for + the given class, but there are no records of the given type. A NODATA + response has to be inferred from the answer. + + "FORWARDER" - a nameserver used to resolve queries instead of + directly using the authoritative nameserver chain.
The forwarder + typically either has better access to the Internet, or maintains a + bigger cache which may be shared amongst many resolvers. How a + server is identified as a FORWARDER, or knows it is a FORWARDER is + outside the scope of this document. However if you are being used as + a forwarder the query will have the recursion desired flag set. + + An understanding of [RFC1034], [RFC1035] and [RFC2065] is expected + when reading this document. + + + + + + + + + +Andrews Standards Track [Page 2] + +RFC 2308 DNS NCACHE March 1998 + + +2 - Negative Responses + + The most common negative responses indicate that a particular RRset + does not exist in the DNS. The first sections of this document deal + with this case. Other negative responses can indicate failures of a + nameserver; those are dealt with in section 7 (Other Negative + Responses). + + A negative response is indicated by one of the following conditions: + +2.1 - Name Error + + Name errors (NXDOMAIN) are indicated by the presence of "Name Error" + in the RCODE field. In this case the domain referred to by the QNAME + does not exist. Note: the answer section may have SIG and CNAME RRs + and the authority section may have SOA, NXT [RFC2065] and SIG RRsets. + + It is possible to distinguish between a referral and an NXDOMAIN + response by the presence of NXDOMAIN in the RCODE regardless of the + presence of NS or SOA records in the authority section. + + NXDOMAIN responses can be categorised into four types by the contents + of the authority section. These are shown below along with a + referral for comparison. Fields not mentioned are not important in + terms of the examples. + + NXDOMAIN RESPONSE: TYPE 1. + + Header: + RCODE=NXDOMAIN + Query: + AN.EXAMPLE. A + Answer: + AN.EXAMPLE. CNAME TRIPPLE.XX. + Authority: + XX. SOA NS1.XX. HOSTMASTER.NS1.XX. .... + XX. NS NS1.XX. + XX. NS NS2.XX. + Additional: + NS1.XX. A 127.0.0.2 + NS2.XX. A 127.0.0.3 + + NXDOMAIN RESPONSE: TYPE 2.
+ + Header: + RCODE=NXDOMAIN + Query: + AN.EXAMPLE. A + + + +Andrews Standards Track [Page 3] + +RFC 2308 DNS NCACHE March 1998 + + + Answer: + AN.EXAMPLE. CNAME TRIPPLE.XX. + Authority: + XX. SOA NS1.XX. HOSTMASTER.NS1.XX. .... + Additional: + <empty> + + NXDOMAIN RESPONSE: TYPE 3. + + Header: + RCODE=NXDOMAIN + Query: + AN.EXAMPLE. A + Answer: + AN.EXAMPLE. CNAME TRIPPLE.XX. + Authority: + <empty> + Additional: + <empty> + + NXDOMAIN RESPONSE: TYPE 4. + + Header: + RCODE=NXDOMAIN + Query: + AN.EXAMPLE. A + Answer: + AN.EXAMPLE. CNAME TRIPPLE.XX. + Authority: + XX. NS NS1.XX. + XX. NS NS2.XX. + Additional: + NS1.XX. A 127.0.0.2 + NS2.XX. A 127.0.0.3 + + REFERRAL RESPONSE. + + Header: + RCODE=NOERROR + Query: + AN.EXAMPLE. A + Answer: + AN.EXAMPLE. CNAME TRIPPLE.XX. + Authority: + XX. NS NS1.XX. + XX. NS NS2.XX. + Additional: + NS1.XX. A 127.0.0.2 + + + +Andrews Standards Track [Page 4] + +RFC 2308 DNS NCACHE March 1998 + + + NS2.XX. A 127.0.0.3 + + Note, in the four examples of NXDOMAIN responses, it is known that + the name "AN.EXAMPLE." exists, and has as its value a CNAME record. + The NXDOMAIN refers to "TRIPPLE.XX", which is then known not to + exist. On the other hand, in the referral example, it is shown that + "AN.EXAMPLE" exists, and has a CNAME RR as its value, but nothing is + known one way or the other about the existence of "TRIPPLE.XX", other + than that "NS1.XX" or "NS2.XX" can be consulted as the next step in + obtaining information about it. + + Where no CNAME records appear, the NXDOMAIN response refers to the + name in the label of the RR in the question section. + +2.1.1 Special Handling of Name Error + + This section deals with errors encountered when implementing negative + caching of NXDOMAIN responses. + + There are a large number of resolvers currently in existence that + fail to correctly detect and process all forms of NXDOMAIN response. + Some resolvers treat a TYPE 1 NXDOMAIN response as a referral.
To + alleviate this problem it is recommended that servers that are + authoritative for the NXDOMAIN response only send TYPE 2 NXDOMAIN + responses, that is the authority section contains a SOA record and no + NS records. If a non-authoritative server sends a TYPE 1 NXDOMAIN + response to one of these old resolvers, the result will be an + unnecessary query to an authoritative server. This is undesirable, + but not fatal except when the server is being used as a FORWARDER. If, + however, such a resolver is using the server as a FORWARDER, + it will be necessary to disable the sending of TYPE 1 + NXDOMAIN responses to it and use TYPE 2 NXDOMAIN instead. + + Some resolvers incorrectly continue processing if the authoritative + answer flag is not set, looping until the query retry threshold is + exceeded and then returning SERVFAIL. This is a problem when your + nameserver is listed as a FORWARDER for such resolvers. If the + nameserver is used as a FORWARDER by such a resolver, the authority + flag will have to be forced on for NXDOMAIN responses to these + resolvers. In practice this causes no problems even if turned on + always, and has been the default behaviour in BIND from 4.9.3 + onwards. + +2.2 - No Data + + NODATA is indicated by an answer with the RCODE set to NOERROR and no + relevant answers in the answer section. The authority section will + contain an SOA record, or there will be no NS records there. + + + +Andrews Standards Track [Page 5] + +RFC 2308 DNS NCACHE March 1998 + + + NODATA responses have to be algorithmically determined from the + response's contents as there is no RCODE value to indicate NODATA. + In some cases to determine with certainty that NODATA is the correct + response it can be necessary to send another query. + + The authority section may contain NXT and SIG RRsets in addition to + NS and SOA records. CNAME and SIG records may exist in the answer + section.
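Because NODATA has no RCODE of its own, a receiver has to infer it. The sketch below shows one way the rules stated so far could be applied. It is a simplified illustration: the function name and the string constants are hypothetical, a "relevant answer" is approximated as any answer record whose type equals the query type, and the follow-up query that is sometimes needed for certainty is not modelled.

```python
def classify_response(rcode, qtype, answer_types, authority_types):
    """Classify a response as ANSWER, NXDOMAIN, NODATA, REFERRAL or OTHER.

    `answer_types` and `authority_types` are lists of the record types
    (as strings such as "A", "CNAME", "SOA", "NS") found in the answer
    and authority sections of the response.
    """
    if rcode == "NXDOMAIN":
        # Name Error wins regardless of NS or SOA records in authority.
        return "NXDOMAIN"
    if rcode != "NOERROR":
        return "OTHER"  # e.g. SERVFAIL, not covered by this sketch
    if qtype in answer_types:
        return "ANSWER"
    # NOERROR with no relevant answer: NODATA if the authority section
    # has an SOA record or has no NS records, otherwise a referral.
    if "SOA" in authority_types or "NS" not in authority_types:
        return "NODATA"
    return "REFERRAL"
```

An empty authority section with RCODE NOERROR and no answers therefore classifies as NODATA, while an authority section holding only NS records classifies as a referral.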
+ + It is possible to distinguish between a NODATA and a referral + response by the presence of a SOA record in the authority section or + the absence of NS records in the authority section. + + NODATA responses can be categorised into three types by the contents + of the authority section. These are shown below along with a + referral for comparison. Fields not mentioned are not important in + terms of the examples. + + NODATA RESPONSE: TYPE 1. + + Header: + RCODE=NOERROR + Query: + ANOTHER.EXAMPLE. A + Answer: + <empty> + Authority: + EXAMPLE. SOA NS1.XX. HOSTMASTER.NS1.XX. .... + EXAMPLE. NS NS1.XX. + EXAMPLE. NS NS2.XX. + Additional: + NS1.XX. A 127.0.0.2 + NS2.XX. A 127.0.0.3 + + NODATA RESPONSE: TYPE 2. + + Header: + RCODE=NOERROR + Query: + ANOTHER.EXAMPLE. A + Answer: + <empty> + Authority: + EXAMPLE. SOA NS1.XX. HOSTMASTER.NS1.XX. .... + Additional: + <empty> + + + + + +Andrews Standards Track [Page 6] + +RFC 2308 DNS NCACHE March 1998 + + + NODATA RESPONSE: TYPE 3. + + Header: + RCODE=NOERROR + Query: + ANOTHER.EXAMPLE. A + Answer: + <empty> + Authority: + <empty> + Additional: + <empty> + + REFERRAL RESPONSE. + + Header: + RCODE=NOERROR + Query: + ANOTHER.EXAMPLE. A + Answer: + <empty> + Authority: + EXAMPLE. NS NS1.XX. + EXAMPLE. NS NS2.XX. + Additional: + NS1.XX. A 127.0.0.2 + NS2.XX. A 127.0.0.3 + + + These examples, unlike the NXDOMAIN examples above, have no CNAME + records, however they could, in just the same way that the NXDOMAIN + examples did, in which case it would be the value of the last CNAME + (the QNAME) for which NODATA would be concluded. + +2.2.1 - Special Handling of No Data + + There are a large number of resolvers currently in existence that + fail to correctly detect and process all forms of NODATA response. + Some resolvers treat a TYPE 1 NODATA response as a referral.
To + alleviate this problem it is recommended that servers that are + authoritative for the NODATA response only send TYPE 2 NODATA + responses, that is the authority section contains a SOA record and no + NS records. Sending a TYPE 1 NODATA response from a non- + authoritative server to one of these resolvers will only result in an + unnecessary query. If a server is listed as a FORWARDER for another + resolver it may also be necessary to disable the sending of TYPE 1 + NODATA response for non-authoritative NODATA responses. + + + + +Andrews Standards Track [Page 7] + +RFC 2308 DNS NCACHE March 1998 + + + Some name servers fail to set the RCODE to NXDOMAIN in the presence + of CNAMEs in the answer section. If a definitive NXDOMAIN / NODATA + answer is required in this case the resolver must query again using + the QNAME as the query label. + +3 - Negative Answers from Authoritative Servers + + Name servers authoritative for a zone MUST include the SOA record of + the zone in the authority section of the response when reporting an + NXDOMAIN or indicating that no data of the requested type exists. + This is required so that the response may be cached. The TTL of this + record is set from the minimum of the MINIMUM field of the SOA record + and the TTL of the SOA itself, and indicates how long a resolver may + cache the negative answer. The TTL SIG record associated with the + SOA record should also be trimmed in line with the SOA's TTL. + + If the containing zone is signed [RFC2065] the SOA and appropriate + NXT and SIG records MUST be added. + +4 - SOA Minimum Field + + The SOA minimum field has been overloaded in the past to have three + different meanings, the minimum TTL value of all RRs in a zone, the + default TTL of RRs which did not contain a TTL value and the TTL of + negative responses. 
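The third of these meanings is the one Section 3 relies on: the SOA record placed in the authority section of a negative answer carries a TTL equal to the smaller of the SOA's MINIMUM field and the SOA's own TTL. A minimal sketch of that arithmetic follows. The function names are hypothetical, and reading "trimmed in line with" as "never larger than the SOA's TTL" is an assumption of this sketch rather than something the text spells out.

```python
def negative_ttl(soa_ttl, soa_minimum):
    """TTL (seconds) for the SOA record sent with a negative answer."""
    return min(soa_ttl, soa_minimum)


def trimmed_sig_ttl(sig_ttl, soa_ttl_in_response):
    """TTL for the SIG covering that SOA.

    ASSUMPTION: "trimmed" is taken to mean capped at the SOA's TTL.
    """
    return min(sig_ttl, soa_ttl_in_response)


# Worked example: an SOA with TTL 86400 and MINIMUM 3600 lets resolvers
# cache the negative answer for at most 3600 seconds.
example_ttl = negative_ttl(86400, 3600)
```

If the SOA's own TTL is the smaller value, for instance a TTL of 300 with a MINIMUM of 3600, the negative answer is cacheable for 300 seconds.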
+ + Despite being the original defined meaning, the first of these, the + minimum TTL value of all RRs in a zone, has never in practice been + used and is hereby deprecated. + + The second, the default TTL of RRs which contain no explicit TTL in + the master zone file, is relevant only at the primary server. After + a zone transfer all RRs have explicit TTLs and it is impossible to + determine whether the TTL for a record was explicitly set or derived + from the default. Where a server does not + require RRs to include the TTL value explicitly, it should provide a + mechanism, not being the value of the MINIMUM field of the SOA + record, from which the missing TTL values are obtained. How this is + done is implementation dependent. + + The Master File format [RFC 1035 Section 5] is extended to include + the following directive: + + $TTL <TTL> [comment] + + + + + + + +Andrews Standards Track [Page 8] + +RFC 2308 DNS NCACHE March 1998 + + + All resource records appearing after the directive, and which do not + explicitly include a TTL value, have their TTL set to the TTL given + in the $TTL directive. SIG records without an explicit TTL get their + TTL from the "original TTL" of the SIG record [RFC 2065 Section 4.5]. + + The remaining current meaning, that of being the TTL to be used + for negative responses, is the new defined meaning of the SOA minimum + field. + +5 - Caching Negative Answers + + Like normal answers, negative answers have a time to live (TTL). As + there is no record in the answer section to which this TTL can be + applied, the TTL must be carried by another method. This is done by + including the SOA record from the zone in the authority section of + the reply. When the authoritative server creates this record its TTL + is taken from the minimum of the SOA.MINIMUM field and SOA's TTL.
+ This TTL decrements in a similar manner to a normal cached answer and + upon reaching zero (0) indicates the cached negative answer MUST NOT + be used again. + + A negative answer that resulted from a name error (NXDOMAIN) should + be cached such that it can be retrieved and returned in response to + another query for the same <QNAME, QCLASS> that resulted in the + cached negative response. + + A negative answer that resulted from a no data error (NODATA) should + be cached such that it can be retrieved and returned in response to + another query for the same <QNAME, QTYPE, QCLASS> that resulted in + the cached negative response. + + The NXT record, if it exists in the authority section of a negative + answer received, MUST be stored such that it can be located and + returned with the SOA record in the authority section, as should any SIG + records in the authority section. For NXDOMAIN answers there is no + "necessary" obvious relationship between the NXT records and the + QNAME. The NXT record MUST have the same owner name as the query + name for NODATA responses. + + Negative responses without SOA records SHOULD NOT be cached as there + is no way to prevent the negative responses looping forever between a + pair of servers even with a short TTL. + + Despite the DNS forming a tree of servers, with various + misconfigurations it is possible to form a loop in the query graph, e.g. + two servers listing each other as forwarders, various lame server + configurations. Without a TTL countdown a cached negative response + + + + +Andrews Standards Track [Page 9] + +RFC 2308 DNS NCACHE March 1998 + + + when received by the next server would have its TTL reset. This + negative indication could then live forever circulating between the + servers involved. + + As with caching positive responses it is sensible for a resolver to + limit for how long it will cache a negative response as the protocol + supports caching for up to 68 years.
Such a limit should not be + greater than that applied to positive answers and preferably be + tunable. Values of one to three hours have been found to work well + and would make a sensible default. Values exceeding one day have + been found to be problematic. + +6 - Negative answers from the cache + + When a server, in answering a query, encounters a cached negative + response it MUST add the cached SOA record to the authority section + of the response with the TTL decremented by the amount of time it was + stored in the cache. This allows the NXDOMAIN / NODATA response to + time out correctly. + + If an NXT record was cached along with the SOA record it MUST be added to + the authority section. If a SIG record was cached along with an NXT + record it SHOULD be added to the authority section. + + As with all answers coming from the cache, negative answers SHOULD + have an implicit referral built into the answer. This enables the + resolver to locate an authoritative source. An implicit referral is + characterised by NS records in the authority section referring the + resolver towards an authoritative source. NXDOMAIN types 1 and 4 + responses contain implicit referrals as does the NODATA type 1 response. + +7 - Other Negative Responses + + Caching of other negative responses is not covered by any existing + RFC. There is no way to indicate a desired TTL in these responses. + Care needs to be taken to ensure that there are no forwarding loops. + +7.1 Server Failure (OPTIONAL) + + Server failures fall into two major classes. The first is where a + server can determine that it has been misconfigured for a zone. This + may be where it has been listed as a server, but not configured to be + a server for the zone, or where it has been configured to be a server + for the zone, but cannot obtain the zone data for some reason.
This + can occur either because the zone file does not exist or contains + errors, or because another server from which the zone should have + been available either did not respond or was unable or unwilling to + supply the zone. + + + +Andrews Standards Track [Page 10] + +RFC 2308 DNS NCACHE March 1998 + + + The second class is where the server needs to obtain an answer from + elsewhere, but is unable to do so, due to network failures, other + servers that don't reply, or return server failure errors, or + similar. + + In either case a resolver MAY cache a server failure response. If it + does so it MUST NOT cache it for longer than five (5) minutes, and it + MUST be cached against the specific query tuple <query name, type, + class, server IP address>. + +7.2 Dead / Unreachable Server (OPTIONAL) + + Dead / Unreachable servers are servers that fail to respond in any + way to a query or where the transport layer has provided an + indication that the server does not exist or is unreachable. A + server may be deemed to be dead or unreachable if it has not + responded to an outstanding query within 120 seconds. + + Examples of transport layer indications are: + + ICMP error messages indicating host, net or port unreachable. + TCP resets + IP stack error messages providing similar indications to those above. + + A server MAY cache a dead server indication. If it does so it MUST + NOT be deemed dead for longer than five (5) minutes. The indication + MUST be stored against query tuple <query name, type, class, server + IP address> unless there was a transport layer indication that the + server does not exist, in which case it applies to all queries to + that specific IP address. + +8 - Changes from RFC 1034 + + Negative caching in resolvers is no-longer optional, if a resolver + caches anything it must also cache negative answers. + + Non-authoritative negative answers MAY be cached. + + The SOA record from the authority section MUST be cached. 
Name error + indications must be cached against the tuple <query name, QCLASS>. + No data indications must be cached against <query name, QTYPE, + QCLASS> tuple. + + A cached SOA record must be added to the response. This was + explicitly not allowed because previously the distinction between a + normal cached SOA record, and the SOA cached as a result of a + negative response was not made, and simply extracting a normal cached + SOA and adding that to a cached negative response causes problems. + + + +Andrews Standards Track [Page 11] + +RFC 2308 DNS NCACHE March 1998 + + + The $TTL TTL directive was added to the master file format. + +9 - History of Negative Caching + + This section presents a potted history of negative caching in the DNS + and forms no part of the technical specification of negative caching. + + It is interesting to note that the same concepts were re-invented in + both the CHIVES and BIND servers. + + The history of the early CHIVES work (Section 9.1) was supplied by + Rob Austein <sra@epilogue.com> and is reproduced here in the form in + which he supplied it [MPA]. + + Sometime around the spring of 1985, I mentioned to Paul Mockapetris + that our experience with his JEEVES DNS resolver had pointed out the + need for some kind of negative caching scheme. Paul suggested that + we simply cache authoritative errors, using the SOA MINIMUM value for + the zone that would have contained the target RRs. I'm pretty sure + that this conversation took place before RFC-973 was written, but it + was never clear to me whether this idea was something that Paul came + up with on the spot in response to my question or something he'd + already been planning to put into the document that became RFC-973. 
+ In any case, neither of us was entirely sure that the SOA MINIMUM + value was really the right metric to use, but it was available and + was under the control of the administrator of the target zone, both + of which seemed to us at the time to be important feature. + + Late in 1987, I released the initial beta-test version of CHIVES, the + DNS resolver I'd written to replace Paul's JEEVES resolver. CHIVES + included a search path mechanism that was used pretty heavily at + several sites (including my own), so CHIVES also included a negative + caching mechanism based on SOA MINIMUM values. The basic strategy + was to cache authoritative error codes keyed by the exact query + parameters (QNAME, QCLASS, and QTYPE), with a cache TTL equal to the + SOA MINIMUM value. CHIVES did not attempt to track down SOA RRs if + they weren't supplied in the authoritative response, so it never + managed to completely eliminate the gratuitous DNS error message + traffic, but it did help considerably. Keep in mind that this was + happening at about the same time as the near-collapse of the ARPANET + due to congestion caused by exponential growth and the the "old" + (pre-VJ) TCP retransmission algorithm, so negative caching resulted + in drasticly better DNS response time for our users, mailer daemons, + etcetera. + + + + + + + +Andrews Standards Track [Page 12] + +RFC 2308 DNS NCACHE March 1998 + + + As far as I know, CHIVES was the first resolver to implement negative + caching. CHIVES was developed during the twilight years of TOPS-20, + so it never ran on very many machines, but the few machines that it + did run on were the ones that were too critical to shut down quickly + no matter how much it cost to keep them running. So what few users + we did have tended to drive CHIVES pretty hard. Several interesting + bits of DNS technology resulted from that, but the one that's + relevant here is the MAXTTL configuration parameter. 
+ + Experience with JEEVES had already shown that RRs often showed up + with ridiculously long TTLs (99999999 was particularly popular for + many years, due to bugs in the code and documentation of several + early versions of BIND), and that robust software that blindly + believed such TTLs could create so many strange failures that it was + often necessary to reboot the resolver frequently just to clear this + garbage out of the cache. So CHIVES had a configuration parameter + "MAXTTL", which specified the maximum "reasonable" TTL in a received + RR. RRs with TTLs greater than MAXTTL would either have their TTLs + reduced to MAXTTL or would be discarded entirely, depending on the + setting of another configuration parameter. + + When we started getting field experience with CHIVES's negative + caching code, it became clear that the SOA MINIMUM value was often + large enough to cause the same kinds of problems for negative caching + as the huge TTLs in RRs had for normal caching (again, this was in + part due to a bug in several early versions of BIND, where a + secondary server would authoritatively deny all knowledge of its + zones if it couldn't contact the primaries on reboot). So we started + running the negative cache TTLs through the MAXTTL check too, and + continued to experiment. + + The configuration that seemed to work best on WSMR-SIMTEL20.ARMY.MIL + (last of the major Internet TOPS-20 machines to be shut down, thus + the last major user of CHIVES, thus the place where we had the + longest experimental baseline) was to set MAXTTL to about three days. + Most of the traffic initiated by SIMTEL20 in its last years was + mail-related, and the mail queue timeout was set to one week, so this + gave a "stuck" message several tries at complete DNS resolution, + without bogging down the system with a lot of useless queries. 
Since + (for reasons that now escape me) we only had the single MAXTTL + parameter rather than separate ones for positive and negative + caching, it's not clear how much effect this setting of MAXTTL had on + the negative caching code. + + CHIVES also included a second, somewhat controversial mechanism which + took the place of negative caching in some cases. The CHIVES + resolver daemon could be configured to load DNS master files, giving + it the ability to act as what today would be called a "stealth + + + +Andrews Standards Track [Page 13] + +RFC 2308 DNS NCACHE March 1998 + + + secondary". That is, when configured in this way, the resolver had + direct access to authoritative information for heavily-used zones. + The search path mechanisms in CHIVES reflected this: there were + actually two separate search paths, one of which only searched local + authoritative zone data, and one which could generate normal + iterative queries. This cut down on the need for negative caching in + cases where usage was predictably heavy (e.g., the resolver on + XX.LCS.MIT.EDU always loaded the zone files for both LCS.MIT.EDU and + AI.MIT.EDU and put both of these suffixes into the "local" search + path, since between them the hosts in these two zones accounted for + the bulk of the DNS traffic). Not all sites running CHIVES chose to + use this feature; C.CS.CMU.EDU, for example, chose to use the + "remote" search path for everything because there were too many + different sub-zones at CMU for zone shadowing to be practical for + them, so they relied pretty heavily on negative caching even for + local traffic. + + Overall, I still think the basic design we used for negative caching + was pretty reasonable: the zone administrator specified how long to + cache negative answers, and the resolver configuration chose the + actual cache time from the range between zero and the period + specified by the zone administrator. 
There are a lot of details I'd + do differently now (like using a new SOA field instead of overloading + the MINIMUM field), but after more than a decade, I'd be more worried + if we couldn't think of at least a few improvements. + +9.2 BIND + + While not the first attempt to get negative caching into BIND, in + July 1993, BIND 4.9.2 ALPHA, Anant Kumar of ISI supplied code that + implemented, validation and negative caching (NCACHE). This code had + a 10 minute TTL for negative caching and only cached the indication + that there was a negative response, NXDOMAIN or NOERROR_NODATA. This + is the origin of the NODATA pseudo response code mentioned above. + + Mark Andrews of CSIRO added code (RETURNSOA) that stored the SOA + record such that it could be retrieved by a similar query. UUnet + complained that they were getting old answers after loading a new + zone, and the option was turned off, BIND 4.9.3-alpha5, April 1994. + In reality this indicated that the named needed to purge the space + the zone would occupy. Functionality to do this was added in BIND + 4.9.3 BETA11 patch2, December 1994. + + RETURNSOA was re-enabled by default, BIND 4.9.5-T1A, August 1996. + + + + + + + +Andrews Standards Track [Page 14] + +RFC 2308 DNS NCACHE March 1998 + + +10 Example + + The following example is based on a signed zone that is empty apart + from the nameservers. We will query for WWW.XX.EXAMPLE showing + initial response and again 10 minutes later. Note 1: during the + intervening 10 minutes the NS records for XX.EXAMPLE have expired. + Note 2: the TTL of the SIG records are not explicitly set in the zone + file and are hence the TTL of the RRset they are the signature for. + + Zone File: + + $TTL 86400 + $ORIGIN XX.EXAMPLE. + @ IN SOA NS1.XX.EXAMPLE. HOSTMATER.XX.EXAMPLE. ( + 1997102000 ; serial + 1800 ; refresh (30 mins) + 900 ; retry (15 mins) + 604800 ; expire (7 days) + 1200 ) ; minimum (20 mins) + IN SIG SOA ... + 1200 IN NXT NS1.XX.EXAMPLE. 
A NXT SIG SOA NS KEY + IN SIG NXT ... XX.EXAMPLE. ... + 300 IN NS NS1.XX.EXAMPLE. + 300 IN NS NS2.XX.EXAMPLE. + IN SIG NS ... XX.EXAMPLE. ... + IN KEY 0x4100 1 1 ... + IN SIG KEY ... XX.EXAMPLE. ... + IN SIG KEY ... EXAMPLE. ... + NS1 IN A 10.0.0.1 + IN SIG A ... XX.EXAMPLE. ... + 1200 IN NXT NS2.XX.EXAMPLE. A NXT SIG + IN SIG NXT ... + NS2 IN A 10.0.0.2 + IN SIG A ... XX.EXAMPLE. ... + 1200 IN NXT XX.EXAMPLE. A NXT SIG + IN SIG NXT ... XX.EXAMPLE. ... + + Initial Response: + + Header: + RDCODE=NXDOMAIN, AA=1, QR=1, TC=0 + Query: + WWW.XX.EXAMPLE. IN A + Answer: + <empty> + Authority: + XX.EXAMPLE. 1200 IN SOA NS1.XX.EXAMPLE. ... + XX.EXAMPLE. 1200 IN SIG SOA ... XX.EXAMPLE. ... + + + +Andrews Standards Track [Page 15] + +RFC 2308 DNS NCACHE March 1998 + + + NS2.XX.EXAMPLE. 1200 IN NXT XX.EXAMPLE. NXT A NXT SIG + NS2.XX.EXAMPLE. 1200 IN SIG NXT ... XX.EXAMPLE. ... + XX.EXAMPLE. 86400 IN NS NS1.XX.EXAMPLE. + XX.EXAMPLE. 86400 IN NS NS2.XX.EXAMPLE. + XX.EXAMPLE. 86400 IN SIG NS ... XX.EXAMPLE. ... + Additional + XX.EXAMPLE. 86400 IN KEY 0x4100 1 1 ... + XX.EXAMPLE. 86400 IN SIG KEY ... EXAMPLE. ... + NS1.XX.EXAMPLE. 86400 IN A 10.0.0.1 + NS1.XX.EXAMPLE. 86400 IN SIG A ... XX.EXAMPLE. ... + NS2.XX.EXAMPLE. 86400 IN A 10.0.0.2 + NS3.XX.EXAMPLE. 86400 IN SIG A ... XX.EXAMPLE. ... + + After 10 Minutes: + + Header: + RDCODE=NXDOMAIN, AA=0, QR=1, TC=0 + Query: + WWW.XX.EXAMPLE. IN A + Answer: + <empty> + Authority: + XX.EXAMPLE. 600 IN SOA NS1.XX.EXAMPLE. ... + XX.EXAMPLE. 600 IN SIG SOA ... XX.EXAMPLE. ... + NS2.XX.EXAMPLE. 600 IN NXT XX.EXAMPLE. NXT A NXT SIG + NS2.XX.EXAMPLE. 600 IN SIG NXT ... XX.EXAMPLE. ... + EXAMPLE. 65799 IN NS NS1.YY.EXAMPLE. + EXAMPLE. 65799 IN NS NS2.YY.EXAMPLE. + EXAMPLE. 65799 IN SIG NS ... XX.EXAMPLE. ... + Additional + XX.EXAMPLE. 65800 IN KEY 0x4100 1 1 ... + XX.EXAMPLE. 65800 IN SIG KEY ... EXAMPLE. ... + NS1.YY.EXAMPLE. 65799 IN A 10.100.0.1 + NS1.YY.EXAMPLE. 65799 IN SIG A ... EXAMPLE. ... + NS2.YY.EXAMPLE. 
65799 IN A 10.100.0.2 + NS3.YY.EXAMPLE. 65799 IN SIG A ... EXAMPLE. ... + EXAMPLE. 65799 IN KEY 0x4100 1 1 ... + EXAMPLE. 65799 IN SIG KEY ... . ... + + +11 Security Considerations + + It is believed that this document does not introduce any significant + additional security threats other that those that already exist when + using data from the DNS. + + + + + + +Andrews Standards Track [Page 16] + +RFC 2308 DNS NCACHE March 1998 + + + With negative caching it might be possible to propagate a denial of + service attack by spreading a NXDOMAIN message with a very high TTL. + Without negative caching that would be much harder. A similar effect + could be achieved previously by spreading a bad A record, so that the + server could not be reached - which is almost the same. It has the + same effect as far as what the end user is able to do, but with a + different psychological effect. With the bad A, I feel "damn the + network is broken again" and try again tomorrow. With the "NXDOMAIN" + I feel "Oh, they've turned off the server and it doesn't exist any + more" and probably never bother trying this server again. + + A practical example of this is a SMTP server where this behaviour is + encoded. With a NXDOMAIN attack the mail message would bounce + immediately, where as with a bad A attack the mail would be queued + and could potentially get through after the attack was suspended. + + For such an attack to be successful, the NXDOMAIN indiction must be + injected into a parent server (or a busy caching resolver). One way + this might be done by the use of a CNAME which results in the parent + server querying an attackers server. Resolvers that wish to prevent + such attacks can query again the final QNAME ignoring any NS data in + the query responses it has received for this query. + + Implementing TTL sanity checking will reduce the effectiveness of + such an attack, because a successful attack would require re- + injection of the bogus data at more frequent intervals. 
+ + DNS Security [RFC2065] provides a mechanism to verify whether a + negative response is valid or not, through the use of NXT and SIG + records. This document supports the use of that mechanism by + promoting the transmission of the relevant security records even in a + non security aware server. + +Acknowledgments + + I would like to thank Rob Austein for his history of the CHIVES + nameserver. The DNSIND working group, in particular Robert Elz for + his valuable technical and editorial contributions to this document. + + + + + + + + + + + + + +Andrews Standards Track [Page 17] + +RFC 2308 DNS NCACHE March 1998 + + +References + + [RFC1034] + Mockapetris, P., "DOMAIN NAMES - CONCEPTS AND FACILITIES," + STD 13, RFC 1034, November 1987. + + [RFC1035] + Mockapetris, P., "DOMAIN NAMES - IMPLEMENTATION AND + SPECIFICATION," STD 13, RFC 1035, November 1987. + + [RFC2065] + Eastlake, D., and C. Kaufman, "Domain Name System Security + Extensions," RFC 2065, January 1997. + + [RFC2119] + Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels," BCP 14, RFC 2119, March 1997. + + [RFC2181] + Elz, R., and R. Bush, "Clarifications to the DNS + Specification," RFC 2181, July 1997. + +Author's Address + + Mark Andrews + CSIRO - Mathematical and Information Sciences + Locked Bag 17 + North Ryde NSW 2113 + AUSTRALIA + + Phone: +61 2 9325 3148 + EMail: Mark.Andrews@cmis.csiro.au + + + + + + + + + + + + + + + + + + + +Andrews Standards Track [Page 18] + +RFC 2308 DNS NCACHE March 1998 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1998). All Rights Reserved. 
+ + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Andrews Standards Track [Page 19] + diff --git a/doc/rfc/rfc2373.txt b/doc/rfc/rfc2373.txt new file mode 100644 index 00000000..1b31496f --- /dev/null +++ b/doc/rfc/rfc2373.txt @@ -0,0 +1,1459 @@ +
+
+
+
+
+
+Network Working Group R. Hinden
+Request for Comments: 2373 Nokia
+Obsoletes: 1884 S. Deering
+Category: Standards Track Cisco Systems
+ July 1998
+
+ IP Version 6 Addressing Architecture
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+Abstract
+
+ This specification defines the addressing architecture of the IP
+ Version 6 protocol [IPV6]. The document includes the IPv6 addressing
+ model, text representations of IPv6 addresses, definition of IPv6
+ unicast addresses, anycast addresses, and multicast addresses, and an
+ IPv6 node's required addresses.
+
+Table of Contents
+
+ 1. Introduction.................................................2
+ 2. IPv6 Addressing..............................................2
+ 2.1 Addressing Model.........................................3
+ 2.2 Text Representation of Addresses.........................3
+ 2.3 Text Representation of Address Prefixes..................5
+ 2.4 Address Type Representation..............................6
+ 2.5 Unicast Addresses........................................7
+ 2.5.1 Interface Identifiers................................8
+ 2.5.2 The Unspecified Address..............................9
+ 2.5.3 The Loopback Address.................................9
+ 2.5.4 IPv6 Addresses with Embedded IPv4 Addresses.........10
+ 2.5.5 NSAP Addresses......................................10
+ 2.5.6 IPX Addresses.......................................10
+ 2.5.7 Aggregatable Global Unicast Addresses...............11
+ 2.5.8 Local-use IPv6 Unicast Addresses....................11
+ 2.6 Anycast Addresses.......................................12
+ 2.6.1 Required Anycast Address............................13
+ 2.7 Multicast Addresses.....................................14
+
+
+
+Hinden & Deering Standards Track [Page 1]
+
+RFC 2373 IPv6 Addressing Architecture July 1998
+
+
+ 2.7.1 Pre-Defined Multicast Addresses.....................15
+ 2.7.2 Assignment of New IPv6 Multicast Addresses..........17
+ 2.8 A Node's Required Addresses.............................17
+ 3. Security Considerations.....................................18
+ APPENDIX A: Creating EUI-64 based Interface Identifiers........19
+ APPENDIX B: ABNF Description of Text Representations...........22
+ APPENDIX C: CHANGES FROM RFC-1884..............................23
+ REFERENCES.....................................................24
+ AUTHORS' ADDRESSES.............................................25
+ FULL COPYRIGHT STATEMENT.......................................26
+
+
+1.0 INTRODUCTION
+
+ This specification defines the addressing architecture of the IP
+ Version 6 protocol. It includes a detailed description of the
+ currently defined address formats for IPv6 [IPV6].
+
+ The authors would like to acknowledge the contributions of Paul
+ Francis, Scott Bradner, Jim Bound, Brian Carpenter, Matt Crawford,
+ Deborah Estrin, Roger Fajman, Bob Fink, Peter Ford, Bob Gilligan,
+ Dimitry Haskin, Tom Harsch, Christian Huitema, Tony Li, Greg
+ Minshall, Thomas Narten, Erik Nordmark, Yakov Rekhter, Bill Simpson,
+ and Sue Thomson.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC 2119].
+
+2.0 IPv6 ADDRESSING
+
+ IPv6 addresses are 128-bit identifiers for interfaces and sets of
+ interfaces. There are three types of addresses:
+
+ Unicast: An identifier for a single interface. A packet sent to
+ a unicast address is delivered to the interface
+ identified by that address.
+
+ Anycast: An identifier for a set of interfaces (typically
+ belonging to different nodes). A packet sent to an
+ anycast address is delivered to one of the interfaces
+ identified by that address (the "nearest" one, according
+ to the routing protocols' measure of distance).
+
+ Multicast: An identifier for a set of interfaces (typically
+ belonging to different nodes). A packet sent to a
+ multicast address is delivered to all interfaces
+ identified by that address.
+
+
+ There are no broadcast addresses in IPv6, their function being
+ superseded by multicast addresses.
+
+ In this document, fields in addresses are given a specific name, for
+ example "subscriber". When this name is used with the term "ID" for
+ identifier after the name (e.g., "subscriber ID"), it refers to the
+ contents of the named field. When it is used with the term "prefix"
+ (e.g., "subscriber prefix") it refers to all of the address up to and
+ including this field.
+
+ In IPv6, all zeros and all ones are legal values for any field,
+ unless specifically excluded. Specifically, prefixes may contain
+ zero-valued fields or end in zeros.
+
+2.1 Addressing Model
+
+ IPv6 addresses of all types are assigned to interfaces, not nodes.
+ An IPv6 unicast address refers to a single interface. Since each
+ interface belongs to a single node, any of that node's interfaces'
+ unicast addresses may be used as an identifier for the node.
+
+ All interfaces are required to have at least one link-local unicast
+ address (see section 2.8 for additional required addresses). A
+ single interface may also be assigned multiple IPv6 addresses of any
+ type (unicast, anycast, and multicast) or scope. Unicast addresses
+ with scope greater than link-scope are not needed for interfaces that
+ are not used as the origin or destination of any IPv6 packets to or
+ from non-neighbors. This is sometimes convenient for point-to-point
+ interfaces. There is one exception to this addressing model:
+
+ A unicast address or a set of unicast addresses may be assigned to
+ multiple physical interfaces if the implementation treats the
+ multiple physical interfaces as one interface when presenting it to
+ the internet layer. This is useful for load-sharing over multiple
+ physical interfaces.
+
+ Currently IPv6 continues the IPv4 model that a subnet prefix is
+ associated with one link. Multiple subnet prefixes may be assigned
+ to the same link.
+
+2.2 Text Representation of Addresses
+
+ There are three conventional forms for representing IPv6 addresses as
+ text strings:
+
+ 1. The preferred form is x:x:x:x:x:x:x:x, where the 'x's are the
+ hexadecimal values of the eight 16-bit pieces of the address.
+ Examples:
+
+ FEDC:BA98:7654:3210:FEDC:BA98:7654:3210
+
+ 1080:0:0:0:8:800:200C:417A
+
+ Note that it is not necessary to write the leading zeros in an
+ individual field, but there must be at least one numeral in every
+ field (except for the case described in 2.).
+
+ 2. Due to some methods of allocating certain styles of IPv6
+ addresses, it will be common for addresses to contain long strings
+ of zero bits. In order to make writing addresses containing zero
+ bits easier, a special syntax is available to compress the zeros.
+ The use of "::" indicates multiple groups of 16 bits of zeros.
+ The "::" can only appear once in an address. The "::" can also be
+ used to compress the leading and/or trailing zeros in an address.
+
+ For example the following addresses:
+
+ 1080:0:0:0:8:800:200C:417A  a unicast address
+ FF01:0:0:0:0:0:0:101        a multicast address
+ 0:0:0:0:0:0:0:1             the loopback address
+ 0:0:0:0:0:0:0:0             the unspecified address
+
+ may be represented as:
+
+ 1080::8:800:200C:417A       a unicast address
+ FF01::101                   a multicast address
+ ::1                         the loopback address
+ ::                          the unspecified address
+
+ 3. An alternative form that is sometimes more convenient when dealing
+ with a mixed environment of IPv4 and IPv6 nodes is
+ x:x:x:x:x:x:d.d.d.d, where the 'x's are the hexadecimal values of
+ the six high-order 16-bit pieces of the address, and the 'd's are
+ the decimal values of the four low-order 8-bit pieces of the
+ address (standard IPv4 representation). Examples:
+
+ 0:0:0:0:0:0:13.1.68.3
+
+ 0:0:0:0:0:FFFF:129.144.52.38
+
+ or in compressed form:
+
+ ::13.1.68.3
+
+ ::FFFF:129.144.52.38
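The three text forms above can be checked mechanically. The following is a minimal sketch, not part of the RFC, that assumes Python's standard `ipaddress` module as the parser. The names `examples`, `mixed`, and `embedded` are illustrative only. Note that the module prints hexadecimal digits in lower case, so the comparison is made case-insensitively.

```python
import ipaddress

# The four example addresses from form 2, written in full and with "::".
examples = {
    "1080:0:0:0:8:800:200C:417A": "1080::8:800:200C:417A",
    "FF01:0:0:0:0:0:0:101": "FF01::101",
    "0:0:0:0:0:0:0:1": "::1",
    "0:0:0:0:0:0:0:0": "::",
}

for full, short in examples.items():
    addr = ipaddress.IPv6Address(full)
    # Both spellings parse to the same 128-bit value.
    assert addr == ipaddress.IPv6Address(short)
    # The library emits lower-case hex, so compare case-insensitively.
    assert addr.compressed.upper() == short.upper()

# Form 3: the low-order 32 bits written in dotted-decimal IPv4 notation.
mixed = ipaddress.IPv6Address("::FFFF:129.144.52.38")
embedded = ipaddress.IPv4Address(int(mixed) & 0xFFFFFFFF)

print(mixed.exploded)  # all eight 16-bit pieces, zero-padded
print(embedded)        # the four low-order octets as an IPv4 address
```

Comparing parsed values rather than strings sidesteps the fact that one address has several legal spellings.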
+
+2.3 Text Representation of Address Prefixes
+
+ The text representation of IPv6 address prefixes is similar to the
+ way IPv4 address prefixes are written in CIDR notation. An IPv6
+ address prefix is represented by the notation:
+
+ ipv6-address/prefix-length
+
+ where
+
+ ipv6-address is an IPv6 address in any of the notations listed
+ in section 2.2.
+
+ prefix-length is a decimal value specifying how many of the
+ leftmost contiguous bits of the address comprise
+ the prefix.
+
+ For example, the following are legal representations of the 60-bit
+ prefix 12AB00000000CD3 (hexadecimal):
+
+ 12AB:0000:0000:CD30:0000:0000:0000:0000/60
+ 12AB::CD30:0:0:0:0/60
+ 12AB:0:0:CD30::/60
+
+ The following are NOT legal representations of the above prefix:
+
+ 12AB:0:0:CD3/60   may drop leading zeros, but not trailing zeros,
+                   within any 16-bit chunk of the address
+
+ 12AB::CD30/60     address to left of "/" expands to
+                   12AB:0000:0000:0000:0000:0000:0000:CD30
+
+ 12AB::CD3/60      address to left of "/" expands to
+                   12AB:0000:0000:0000:0000:0000:0000:0CD3
+
+ When writing both a node address and a prefix of that node address
+ (e.g., the node's subnet prefix), the two can be combined as follows:
+
+ the node address       12AB:0:0:CD30:123:4567:89AB:CDEF
+ and its subnet number  12AB:0:0:CD30::/60
+
+ can be abbreviated as  12AB:0:0:CD30:123:4567:89AB:CDEF/60
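The legal and not-legal prefix spellings above can be exercised with a short sketch. It again assumes Python's standard `ipaddress` module; the helper `is_valid_prefix` is a hypothetical name introduced here for illustration. One caveat: the module rejects `12AB:0:0:CD3/60` as malformed, but it rejects `12AB::CD30/60` and `12AB::CD3/60` for a different reason, namely that they parse correctly yet have bits set beyond the 60-bit prefix length, which is why they do not denote the prefix in question.

```python
import ipaddress

# The 60-bit prefix 12AB00000000CD3 from the text.
prefix = ipaddress.IPv6Network("12AB:0:0:CD30::/60")


def is_valid_prefix(text):
    """Return True if text parses as a prefix with no bits set past its length."""
    try:
        ipaddress.IPv6Network(text, strict=True)
        return True
    except ValueError:
        return False


# All three legal spellings denote the same prefix.
legal = [
    "12AB:0000:0000:CD30:0000:0000:0000:0000/60",
    "12AB::CD30:0:0:0:0/60",
    "12AB:0:0:CD30::/60",
]
assert all(ipaddress.IPv6Network(text) == prefix for text in legal)

# None of the three not-legal spellings is accepted as a prefix.
not_legal = ["12AB:0:0:CD3/60", "12AB::CD30/60", "12AB::CD3/60"]
assert not any(is_valid_prefix(text) for text in not_legal)

# The combined "node address/prefix-length" form carries both pieces.
iface = ipaddress.IPv6Interface("12AB:0:0:CD30:123:4567:89AB:CDEF/60")
print(iface.ip)       # the node address
print(iface.network)  # its subnet prefix
```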
+
+2.4 Address Type Representation
+
+ The specific type of an IPv6 address is indicated by the leading bits
+ in the address. The variable-length field comprising these leading
+ bits is called the Format Prefix (FP). The initial allocation of
+ these prefixes is as follows:
+
+ Allocation                            Prefix         Fraction of
+                                       (binary)       Address Space
+ -----------------------------------   --------       -------------
+ Reserved                              0000 0000      1/256
+ Unassigned                            0000 0001      1/256
+
+ Reserved for NSAP Allocation          0000 001       1/128
+ Reserved for IPX Allocation           0000 010       1/128
+
+ Unassigned                            0000 011       1/128
+ Unassigned                            0000 1         1/32
+ Unassigned                            0001           1/16
+
+ Aggregatable Global Unicast Addresses 001            1/8
+ Unassigned                            010            1/8
+ Unassigned                            011            1/8
+ Unassigned                            100            1/8
+ Unassigned                            101            1/8
+ Unassigned                            110            1/8
+
+ Unassigned                            1110           1/16
+ Unassigned                            1111 0         1/32
+ Unassigned                            1111 10        1/64
+ Unassigned                            1111 110       1/128
+ Unassigned                            1111 1110 0    1/512
+
+ Link-Local Unicast Addresses          1111 1110 10   1/1024
+ Site-Local Unicast Addresses          1111 1110 11   1/1024
+
+ Multicast Addresses                   1111 1111      1/256
+
+ Notes:
+
+ (1) The "unspecified address" (see section 2.5.2), the loopback
+ address (see section 2.5.3), and the IPv6 Addresses with
+ Embedded IPv4 Addresses (see section 2.5.4), are assigned out
+ of the 0000 0000 format prefix space.
+
+ (2) The format prefixes 001 through 111, except for Multicast
+ Addresses (1111 1111), are all required to have 64-bit
+ interface identifiers in EUI-64 format. See section 2.5.1 for
+ definitions.
+
+ This allocation supports the direct allocation of aggregation
+ addresses, local use addresses, and multicast addresses. Space is
+ reserved for NSAP addresses and IPX addresses. The remainder of the
+ address space is unassigned for future use. This can be used for
+ expansion of existing use (e.g., additional aggregatable addresses,
+ etc.) or new uses (e.g., separate locators and identifiers). Fifteen
+ percent of the address space is initially allocated. The remaining
+ 85% is reserved for future use.
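The "fifteen percent" figure can be reproduced from the fractions in the allocation table. The sketch below is an illustration only: it sums the fractions of every row that is not marked "Unassigned", using Python's standard `fractions` module. The dictionary name `allocated` is an assumption of this example.

```python
from fractions import Fraction

# Fractions of the address space for every row not marked "Unassigned".
allocated = {
    "Reserved (0000 0000)": Fraction(1, 256),
    "Reserved for NSAP Allocation": Fraction(1, 128),
    "Reserved for IPX Allocation": Fraction(1, 128),
    "Aggregatable Global Unicast Addresses": Fraction(1, 8),
    "Link-Local Unicast Addresses": Fraction(1, 1024),
    "Site-Local Unicast Addresses": Fraction(1, 1024),
    "Multicast Addresses": Fraction(1, 256),
}

total = sum(allocated.values())
print(total)                      # exact fraction of the space allocated
print(f"{float(total):.1%}")      # the same value as a percentage
print(f"{float(1 - total):.1%}")  # the share left for future use
```

The exact sum is 154/1024, which rounds to the 15% and 85% quoted in the text.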
+
+ Unicast addresses are distinguished from multicast addresses by the
+ value of the high-order octet of the addresses: a value of FF
+ (11111111) identifies an address as a multicast address; any other
+ value identifies an address as a unicast address. Anycast addresses
+ are taken from the unicast address space, and are not syntactically
+ distinguishable from unicast addresses.
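As an illustration of how the Format Prefix determines the address type, the sketch below matches the leading bits of an address against the assigned rows of the allocation table and treats everything else as unassigned. It is a sketch under stated assumptions, not a reference implementation: `ASSIGNED_PREFIXES`, `format_prefix`, and `is_multicast` are names invented for this example, and Python's standard `ipaddress` module is assumed only for parsing.

```python
import ipaddress

# Assigned rows of the allocation table, as (leading bits, allocation).
# The prefixes are disjoint, so the order of the rows does not matter.
ASSIGNED_PREFIXES = [
    ("00000000", "Reserved"),
    ("0000001", "Reserved for NSAP Allocation"),
    ("0000010", "Reserved for IPX Allocation"),
    ("001", "Aggregatable Global Unicast Addresses"),
    ("1111111010", "Link-Local Unicast Addresses"),
    ("1111111011", "Site-Local Unicast Addresses"),
    ("11111111", "Multicast Addresses"),
]


def format_prefix(text):
    """Return the allocation whose Format Prefix matches the address."""
    bits = format(int(ipaddress.IPv6Address(text)), "0128b")
    for prefix, allocation in ASSIGNED_PREFIXES:
        if bits.startswith(prefix):
            return allocation
    return "Unassigned"


def is_multicast(text):
    """A high-order octet of FF marks a multicast address; anything else is unicast."""
    return int(ipaddress.IPv6Address(text)) >> 120 == 0xFF


for example in ["::1", "3FFE::1", "FE80::1", "FEC0::1", "FF01::101"]:
    print(example, "->", format_prefix(example), "| multicast:", is_multicast(example))
```

Because anycast addresses are drawn from the unicast space, no test of this kind can tell an anycast address from a unicast one.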
+
+2.5 Unicast Addresses
+
+ IPv6 unicast addresses are aggregatable with contiguous bit-wise
+ masks similar to IPv4 addresses under Class-less Interdomain Routing
+ [CIDR].
+
+ There are several forms of unicast address assignment in IPv6,
+ including the aggregatable global unicast address, the NSAP
+ address, the IPX hierarchical address, the site-local address, the
+ link-local address, and the IPv4-capable host address. Additional
+ address types can be defined in the future.
+
+ IPv6 nodes may have considerable or little knowledge of the internal
+ structure of the IPv6 address, depending on the role the node plays
+ (for instance, host versus router). At a minimum, a node may
+ consider that unicast addresses (including its own) have no internal
+ structure:
+
+ |                           128 bits                              |
+ +-----------------------------------------------------------------+
+ |                          node address                           |
+ +-----------------------------------------------------------------+
+
+ A slightly more sophisticated (but still rather simple) host may
+ additionally be aware of subnet prefix(es) for the link(s) it is
+ attached to, where different addresses may have different values for
+ n:
+
+ |                         n bits                 |   128-n bits   |
+ +------------------------------------------------+----------------+
+ |                   subnet prefix                | interface ID   |
+ +------------------------------------------------+----------------+
+
+ Still more sophisticated hosts may be aware of other hierarchical
+ boundaries in the unicast address. Though a very simple router may
+ have no knowledge of the internal structure of IPv6 unicast
+ addresses, routers will more generally have knowledge of one or more
+ of the hierarchical boundaries for the operation of routing
+ protocols. The known boundaries will differ from router to router,
+ depending on what positions the router holds in the routing
+ hierarchy.
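
The split of an address into an n-bit subnet prefix and a (128-n)-bit interface ID can be sketched as plain integer arithmetic. The function name split_address and the sample address are assumptions chosen for illustration; the text itself only gives the field layout.

```python
import ipaddress


def split_address(text: str, n: int) -> tuple:
    """Return (subnet prefix, interface ID) as integers for an n-bit prefix."""
    if not 0 <= n <= 128:
        raise ValueError("prefix length must be between 0 and 128")
    value = int(ipaddress.IPv6Address(text))
    interface_bits = 128 - n
    prefix = value >> interface_bits                    # high-order n bits
    interface_id = value & ((1 << interface_bits) - 1)  # low-order 128-n bits
    return prefix, interface_id


prefix, iid = split_address("12AB:0:0:CD30:123:4567:89AB:CDEF", 60)
print(hex(prefix))  # 0x12ab00000000cd3
print(hex(iid))     # 0x123456789abcdef
```

A node that treats addresses as opaque corresponds to n = 0 or n = 128 in this sketch; a router aware of more boundaries would apply the same shift-and-mask step at each boundary it knows.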
+
+2.5.1 Interface Identifiers
+
+ Interface identifiers in IPv6 unicast addresses are used to identify
+ interfaces on a link. They are required to be unique on that link.
+ They may also be unique over a broader scope. In many cases an
+ interface's identifier will be the same as that interface's link-
+ layer address. The same interface identifier may be used on multiple
+ interfaces on a single node.
+
+ Note that the use of the same interface identifier on multiple
+ interfaces of a single node does not affect the global uniqueness of
+ the interface identifier or of any IPv6 address created using that
+ interface identifier.
+
+ In a number of the format prefixes (see section 2.4) Interface IDs
+ are required to be 64 bits long and to be constructed in IEEE EUI-64
+ format [EUI64]. EUI-64 based Interface identifiers may have global
+ scope when a global token is available (e.g., IEEE 48bit MAC) or may
+ have local scope where a global token is not available (e.g., serial
+ links, tunnel end-points, etc.). It is required that the "u" bit
+ (universal/local bit in IEEE EUI-64 terminology) be inverted when
+ forming the interface identifier from the EUI-64. The "u" bit is set
+ to one (1) to indicate global scope, and it is set to zero (0) to
+ indicate local scope. The first three octets in binary of an EUI-64
+ identifier are as follows:
+
+  0       0 0       1 1       2
+ |0       7 8       5 6       3|
+ +----+----+----+----+----+----+
+ |cccc|ccug|cccc|cccc|cccc|cccc|
+ +----+----+----+----+----+----+
+
+ written in Internet standard bit-order, where "u" is the
+ universal/local bit, "g" is the individual/group bit, and "c" are the
+ bits of the company_id. Appendix A: "Creating EUI-64 based Interface
+ Identifiers" provides examples on the creation of different EUI-64
+ based interface identifiers.
+
+ The motivation for inverting the "u" bit when forming the interface
+ identifier is to make it easy for system administrators to hand
+ configure local scope identifiers when hardware tokens are not
+ available. This is expected to be the case for serial links, tunnel
+ end-points, etc. The alternative would have been for these to be of the
+ form 0200:0:0:1, 0200:0:0:2, etc., instead of the much simpler ::1,
+ ::2, etc.
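
The effect of the "u" bit inversion can be sketched as follows. The helper names invert_u_bit and as_groups are assumptions made for this example; the bit masks follow from the octet layout shown earlier in this section, where "u" is the second-lowest bit of the first octet and "g" is the lowest.

```python
U_BIT = 0x02  # universal/local bit: second-lowest bit of the first octet
G_BIT = 0x01  # individual/group bit: lowest bit of the first octet


def invert_u_bit(identifier: bytes) -> bytes:
    """Flip the "u" bit of an 8-octet identifier (EUI-64 <-> interface ID)."""
    if len(identifier) != 8:
        raise ValueError("expected a 64-bit (8-octet) identifier")
    return bytes([identifier[0] ^ U_BIT]) + identifier[1:]


def as_groups(identifier: bytes) -> str:
    """Render 8 octets as four colon-separated 16-bit hexadecimal groups."""
    groups = [int.from_bytes(identifier[i:i + 2], "big") for i in range(0, 8, 2)]
    return ":".join(f"{group:x}" for group in groups)


hand_configured = (1).to_bytes(8, "big")         # the interface ID ::1
print(as_groups(hand_configured))                # 0:0:0:1
print(as_groups(invert_u_bit(hand_configured)))  # 200:0:0:1
```

The second line of output is the 0200:0:0:1 form mentioned in the text: without the inversion rule, an administrator would have to type that value to obtain a local scope identifier.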
+
+ The use of the universal/local bit in the IEEE EUI-64 identifier is
+ to allow development of future technology that can take advantage of
+ interface identifiers with global scope.
+
+ The details of forming interface identifiers are defined in the
+ appropriate "IPv6 over <link>" specification such as "IPv6 over
+ Ethernet" [ETHER], "IPv6 over FDDI" [FDDI], etc.
+
+2.5.2 The Unspecified Address
+
+ The address 0:0:0:0:0:0:0:0 is called the unspecified address. It
+ must never be assigned to any node. It indicates the absence of an
+ address. One example of its use is in the Source Address field of
+ any IPv6 packets sent by an initializing host before it has learned
+ its own address.
+
+ The unspecified address must not be used as the destination address
+ of IPv6 packets or in IPv6 Routing Headers.
+
+2.5.3 The Loopback Address
+
+ The unicast address 0:0:0:0:0:0:0:1 is called the loopback address.
+ It may be used by a node to send an IPv6 packet to itself. It may
+ never be assigned to any physical interface. It may be thought of as
+ being associated with a virtual interface (e.g., the loopback
+ interface).
+
+ The loopback address must not be used as the source address in IPv6
+ packets that are sent outside of a single node. An IPv6 packet with
+ a destination address of loopback must never be sent outside of a
+ single node and must never be forwarded by an IPv6 router.
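
The sending rules for the unspecified and loopback addresses can be summarised in a small check. This is a sketch under stated assumptions: the function name may_send_off_node is invented here, and it models only the rules in sections 2.5.2 and 2.5.3, not a complete packet validator.

```python
import ipaddress

UNSPECIFIED = ipaddress.IPv6Address("0:0:0:0:0:0:0:0")
LOOPBACK = ipaddress.IPv6Address("0:0:0:0:0:0:0:1")


def may_send_off_node(source: str, destination: str) -> bool:
    """Apply the unspecified-address and loopback rules to one packet."""
    src = ipaddress.IPv6Address(source)
    dst = ipaddress.IPv6Address(destination)
    if dst == UNSPECIFIED:
        return False  # the unspecified address is never a valid destination
    if src == LOOPBACK or dst == LOOPBACK:
        return False  # loopback traffic must stay inside the node
    return True       # an unspecified source is allowed while initializing


print(may_send_off_node("::", "FF02::1"))   # True
print(may_send_off_node("::1", "FE80::1"))  # False
```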
+
+2.5.4 IPv6 Addresses with Embedded IPv4 Addresses
+
+ The IPv6 transition mechanisms [TRAN] include a technique for hosts
+ and routers to dynamically tunnel IPv6 packets over IPv4 routing
+ infrastructure. IPv6 nodes that utilize this technique are assigned
+ special IPv6 unicast addresses that carry an IPv4 address in the low-
+ order 32-bits. This type of address is termed an "IPv4-compatible
+ IPv6 address" and has the format:
+
+ |                80 bits               | 16 |      32 bits        |
+ +--------------------------------------+--------------------------+
+ |0000..............................0000|0000|    IPv4 address     |
+ +--------------------------------------+----+---------------------+
+
+ A second type of IPv6 address which holds an embedded IPv4 address is
+ also defined. This address is used to represent the addresses of
+ IPv4-only nodes (those that *do not* support IPv6) as IPv6 addresses.
+ This type of address is termed an "IPv4-mapped IPv6 address" and has
+ the format:
+
+ |                80 bits               | 16 |      32 bits        |
+ +--------------------------------------+--------------------------+
+ |0000..............................0000|FFFF|    IPv4 address     |
+ +--------------------------------------+----+---------------------+
+
+2.5.5 NSAP Addresses
+
+ The mapping of NSAP addresses into IPv6 addresses is defined in
+ [NSAP]. That document recommends that network implementors who have
+ planned or deployed an OSI NSAP addressing plan, and who wish to
+ deploy or transition to IPv6, should redesign a native IPv6
+ addressing plan to meet their needs. However, it also defines a set
+ of mechanisms for the support of OSI NSAP addressing in an IPv6
+ network. These mechanisms are the ones that must be used if such
+ support is required. [NSAP] also defines a mapping of IPv6
+ addresses within the OSI address format, should this be required.
+
+2.5.6 IPX Addresses
+
+ The mapping of IPX addresses into IPv6 addresses is as follows:
+
+ |   7   |                   121 bits                              |
+ +-------+---------------------------------------------------------+
+ |0000010|                 to be defined                           |
+ +-------+---------------------------------------------------------+
+
+ The draft definition, motivation, and usage are under study.
+
+2.5.7 Aggregatable Global Unicast Addresses
+
+ The aggregatable global unicast address is defined in [AGGR].
+ This address format is designed to support both the current provider
+ based aggregation and a new type of aggregation called exchanges.
+ The combination will allow efficient routing aggregation both for
+ sites that connect directly to providers and for sites that connect
+ to exchanges. Sites will have the choice to connect to either type
+ of aggregation point.
+
+ The IPv6 aggregatable global unicast address format is as follows:
+
+ | 3|  13 | 8 |   24   |   16   |          64 bits               |
+ +--+-----+---+--------+--------+--------------------------------+
+ |FP| TLA |RES|  NLA   |  SLA   |         Interface ID           |
+ |  | ID  |   |  ID    |  ID    |                                |
+ +--+-----+---+--------+--------+--------------------------------+
+
+ Where
+
+    001            Format Prefix (3 bit) for Aggregatable Global
+                   Unicast Addresses
+    TLA ID         Top-Level Aggregation Identifier
+    RES            Reserved for future use
+    NLA ID         Next-Level Aggregation Identifier
+    SLA ID         Site-Level Aggregation Identifier
+    INTERFACE ID   Interface Identifier
+
+ The contents, field sizes, and assignment rules are defined in
+ [AGGR].
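
As a rough illustration of the field layout above, the following sketch slices an address into its six fields. The function name parse_aggregatable, the FIELDS table and the sample address are assumptions made for this example; the authoritative field definitions remain those in [AGGR].

```python
import ipaddress

# (field name, width in bits), high-order field first; widths sum to 128.
FIELDS = [
    ("FP", 3),
    ("TLA ID", 13),
    ("RES", 8),
    ("NLA ID", 24),
    ("SLA ID", 16),
    ("Interface ID", 64),
]


def parse_aggregatable(text: str) -> dict:
    """Slice an aggregatable global unicast address into its fields."""
    value = int(ipaddress.IPv6Address(text))
    remaining = 128
    fields = {}
    for name, width in FIELDS:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    if fields["FP"] != 0b001:
        raise ValueError("format prefix is not 001")
    return fields


sample = parse_aggregatable("3FFE:0012:3456:0001:0260:97FF:FE40:EFAB")
for name, _ in FIELDS:
    print(f"{name:<13}{sample[name]:#x}")
```

For the sample address this yields TLA ID 0x1ffe, RES 0x0, NLA ID 0x123456, SLA ID 0x1 and Interface ID 0x26097fffe40efab.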
+
+2.5.8 Local-Use IPv6 Unicast Addresses
+
+ There are two types of local-use unicast addresses defined. These
+ are Link-Local and Site-Local. The Link-Local is for use on a single
+ link and the Site-Local is for use in a single site. Link-Local
+ addresses have the following format:
+
+ |   10     |
+ |  bits    |        54 bits          |          64 bits           |
+ +----------+-------------------------+----------------------------+
+ |1111111010|           0             |       interface ID         |
+ +----------+-------------------------+----------------------------+
+
+ Link-Local addresses are designed to be used for addressing on a
+ single link for purposes such as auto-address configuration, neighbor
+ discovery, or when no routers are present.
+
+ Routers must not forward any packets with link-local source or
+ destination addresses to other links.
+
+ Site-Local addresses have the following format:
+
+ |   10     |
+ |  bits    |   38 bits   |  16 bits  |         64 bits            |
+ +----------+-------------+-----------+----------------------------+
+ |1111111011|    0        | subnet ID |       interface ID         |
+ +----------+-------------+-----------+----------------------------+
+
+ Site-Local addresses are designed to be used for addressing inside of
+ a site without the need for a global prefix.
+
+ Routers must not forward any packets with site-local source or
+ destination addresses outside of the site.
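
The two local-use formats and their forwarding limits can be sketched as below. The names link_local, site_local and forwarding_limit are assumptions for illustration; the 10-bit prefixes are taken from the diagrams above, and the sketch ignores multicast scoping entirely.

```python
import ipaddress

LINK_LOCAL_PREFIX = 0b1111111010  # FE80::/10
SITE_LOCAL_PREFIX = 0b1111111011  # FEC0::/10


def link_local(interface_id: int) -> ipaddress.IPv6Address:
    """10-bit prefix, 54 zero bits, 64-bit interface ID."""
    return ipaddress.IPv6Address((LINK_LOCAL_PREFIX << 118) | interface_id)


def site_local(subnet_id: int, interface_id: int) -> ipaddress.IPv6Address:
    """10-bit prefix, 38 zero bits, 16-bit subnet ID, 64-bit interface ID."""
    value = (SITE_LOCAL_PREFIX << 118) | (subnet_id << 64) | interface_id
    return ipaddress.IPv6Address(value)


def forwarding_limit(text: str) -> str:
    """Return how far a router may forward a packet using this address."""
    top10 = int(ipaddress.IPv6Address(text)) >> 118
    if top10 == LINK_LOCAL_PREFIX:
        return "link"
    if top10 == SITE_LOCAL_PREFIX:
        return "site"
    return "unrestricted"


print(link_local(0x026097FFFE40EFAB))  # fe80::260:97ff:fe40:efab
print(forwarding_limit("FEC0:0:0:12::1"))  # site
```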
+
+2.6 Anycast Addresses
+
+ An IPv6 anycast address is an address that is assigned to more than
+ one interface (typically belonging to different nodes), with the
+ property that a packet sent to an anycast address is routed to the
+ "nearest" interface having that address, according to the routing
+ protocols' measure of distance.
+
+ Anycast addresses are allocated from the unicast address space, using
+ any of the defined unicast address formats. Thus, anycast addresses
+ are syntactically indistinguishable from unicast addresses. When a
+ unicast address is assigned to more than one interface, thus turning
+ it into an anycast address, the nodes to which the address is
+ assigned must be explicitly configured to know that it is an anycast
+ address.
+
+ For any assigned anycast address, there is a longest address prefix P
+ that identifies the topological region in which all interfaces
+ belonging to that anycast address reside. Within the region
+ identified by P, each member of the anycast set must be advertised as
+ a separate entry in the routing system (commonly referred to as a
+ "host route"); outside the region identified by P, the anycast
+ address may be aggregated into the routing advertisement for prefix
+ P.
+
+ Note that, in the worst case, the prefix P of an anycast set may be
+ the null prefix, i.e., the members of the set may have no topological
+ locality. In that case, the anycast address must be advertised as a
+ separate routing entry throughout the entire internet, which presents
+ a severe scaling limit on how many such "global" anycast sets may be
+ supported. Therefore, it is expected that support for global anycast
+ sets may be unavailable or very restricted.
+
+ One expected use of anycast addresses is to identify the set of
+ routers belonging to an organization providing internet service.
+ Such addresses could be used as intermediate addresses in an IPv6
+ Routing header, to cause a packet to be delivered via a particular
+ aggregation or sequence of aggregations. Some other possible uses
+ are to identify the set of routers attached to a particular subnet,
+ or the set of routers providing entry into a particular routing
+ domain.
+
+ There is little experience with widespread, arbitrary use of internet
+ anycast addresses, and some known complications and hazards when
+ using them in their full generality [ANYCST]. Until more experience
+ has been gained and solutions agreed upon for those problems, the
+ following restrictions are imposed on IPv6 anycast addresses:
+
+ o An anycast address must not be used as the source address of an
+ IPv6 packet.
+
+ o An anycast address must not be assigned to an IPv6 host, that
+ is, it may be assigned to an IPv6 router only.
+
+2.6.1 Required Anycast Address
+
+ The Subnet-Router anycast address is predefined. Its format is as
+ follows:
+
+ |                         n bits                 |   128-n bits   |
+ +------------------------------------------------+----------------+
+ |                   subnet prefix                | 00000000000000 |
+ +------------------------------------------------+----------------+
+
+ The "subnet prefix" in an anycast address is the prefix which
+ identifies a specific link. This anycast address is syntactically
+ the same as a unicast address for an interface on the link with the
+ interface identifier set to zero.
+
+ Packets sent to the Subnet-Router anycast address will be delivered
+ to one router on the subnet. All routers are required to support the
+ Subnet-Router anycast addresses for the subnets on which they have
+ interfaces.
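
Since the Subnet-Router anycast address is simply the subnet prefix followed by zeros, it can be derived with the standard ipaddress module. The function name subnet_router_anycast is an assumption for this sketch; strict=False is the module's own option for accepting input whose host bits are not zero.

```python
import ipaddress


def subnet_router_anycast(prefix: str) -> ipaddress.IPv6Address:
    """Return the subnet prefix with all interface identifier bits zero."""
    network = ipaddress.IPv6Network(prefix, strict=False)
    return network.network_address


print(subnet_router_anycast("12AB:0:0:CD30::/60"))       # 12ab:0:0:cd30::
print(subnet_router_anycast("12AB:0:0:CD3F:1::1/60"))    # 12ab:0:0:cd30::
```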
+
+ The Subnet-Router anycast address is intended to be used for
+ applications where a node needs to communicate with one of a set of
+ routers on a remote subnet, for example when a mobile host needs to
+ communicate with one of the mobile agents on its "home" subnet.
+
+2.7 Multicast Addresses
+
+ An IPv6 multicast address is an identifier for a group of nodes. A
+ node may belong to any number of multicast groups. Multicast
+ addresses have the following format:
+
+ |   8    |  4 |  4 |                  112 bits                   |
+ +--------+----+----+---------------------------------------------+
+ |11111111|flgs|scop|                  group ID                   |
+ +--------+----+----+---------------------------------------------+
+
+ 11111111 at the start of the address identifies the address as
+ being a multicast address.
+
+                               +-+-+-+-+
+ flgs is a set of 4 flags:     |0|0|0|T|
+                               +-+-+-+-+
+
+ The high-order 3 flags are reserved, and must be initialized to
+ 0.
+
+ T = 0 indicates a permanently-assigned ("well-known") multicast
+ address, assigned by the global internet numbering authority.
+
+ T = 1 indicates a non-permanently-assigned ("transient")
+ multicast address.
+
+ scop is a 4-bit multicast scope value used to limit the scope of
+ the multicast group. The values are:
+
+ 0 reserved
+ 1 node-local scope
+ 2 link-local scope
+ 3 (unassigned)
+ 4 (unassigned)
+ 5 site-local scope
+ 6 (unassigned)
+ 7 (unassigned)
+ 8 organization-local scope
+ 9 (unassigned)
+ A (unassigned)
+ B (unassigned)
+ C (unassigned)
+ D (unassigned)
+ E global scope
+ F reserved
+
+ group ID identifies the multicast group, either permanent or
+ transient, within the given scope.
+
+ The "meaning" of a permanently-assigned multicast address is
+ independent of the scope value. For example, if the "NTP servers
+ group" is assigned a permanent multicast address with a group ID of
+ 101 (hex), then:
+
+ FF01:0:0:0:0:0:0:101 means all NTP servers on the same node as the
+ sender.
+
+ FF02:0:0:0:0:0:0:101 means all NTP servers on the same link as the
+ sender.
+
+ FF05:0:0:0:0:0:0:101 means all NTP servers at the same site as the
+ sender.
+
+ FF0E:0:0:0:0:0:0:101 means all NTP servers in the internet.
+
+ Non-permanently-assigned multicast addresses are meaningful only
+ within a given scope. For example, a group identified by the non-
+ permanent, site-local multicast address FF15:0:0:0:0:0:0:101 at one
+ site bears no relationship to a group using the same address at a
+ different site, nor to a non-permanent group using the same group ID
+ with different scope, nor to a permanent group with the same group
+ ID.
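
The multicast field layout and the NTP examples above can be checked with a short decoder. The function name parse_multicast and the returned dictionary keys are assumptions for this sketch; the scope names are those listed in the scop table.

```python
import ipaddress

SCOPES = {
    0x1: "node-local",
    0x2: "link-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}


def parse_multicast(text: str) -> dict:
    """Decode the flgs, scop and group ID fields of a multicast address."""
    value = int(ipaddress.IPv6Address(text))
    if value >> 120 != 0xFF:
        raise ValueError("not a multicast address")
    flgs = (value >> 116) & 0xF
    scop = (value >> 112) & 0xF
    if scop in (0x0, 0xF):
        scope = "reserved"
    else:
        scope = SCOPES.get(scop, "unassigned")
    return {
        "transient": bool(flgs & 0x1),          # the T flag
        "scope": scope,
        "group_id": value & ((1 << 112) - 1),   # low-order 112 bits
    }


print(parse_multicast("FF02:0:0:0:0:0:0:101"))
# {'transient': False, 'scope': 'link-local', 'group_id': 257}
print(parse_multicast("FF15:0:0:0:0:0:0:101"))
# {'transient': True, 'scope': 'site-local', 'group_id': 257}
```

Both addresses carry group ID 101 (hex), but only the first refers to the permanently assigned NTP group; the second has T = 1 and is meaningful only within its own site.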
+
+ Multicast addresses must not be used as source addresses in IPv6
+ packets or appear in any routing header.
+
+2.7.1 Pre-Defined Multicast Addresses
+
+ The following well-known multicast addresses are pre-defined:
+
+ Reserved Multicast Addresses:   FF00:0:0:0:0:0:0:0
+                                 FF01:0:0:0:0:0:0:0
+                                 FF02:0:0:0:0:0:0:0
+                                 FF03:0:0:0:0:0:0:0
+                                 FF04:0:0:0:0:0:0:0
+                                 FF05:0:0:0:0:0:0:0
+                                 FF06:0:0:0:0:0:0:0
+                                 FF07:0:0:0:0:0:0:0
+                                 FF08:0:0:0:0:0:0:0
+                                 FF09:0:0:0:0:0:0:0
+                                 FF0A:0:0:0:0:0:0:0
+                                 FF0B:0:0:0:0:0:0:0
+                                 FF0C:0:0:0:0:0:0:0
+                                 FF0D:0:0:0:0:0:0:0
+                                 FF0E:0:0:0:0:0:0:0
+                                 FF0F:0:0:0:0:0:0:0
+
+ The above multicast addresses are reserved and shall never be
+ assigned to any multicast group.
+
+ All Nodes Addresses:    FF01:0:0:0:0:0:0:1
+                         FF02:0:0:0:0:0:0:1
+
+ The above multicast addresses identify the group of all IPv6 nodes,
+ within scope 1 (node-local) or 2 (link-local).
+
+ All Routers Addresses:   FF01:0:0:0:0:0:0:2
+                          FF02:0:0:0:0:0:0:2
+                          FF05:0:0:0:0:0:0:2
+
+ The above multicast addresses identify the group of all IPv6 routers,
+ within scope 1 (node-local), 2 (link-local), or 5 (site-local).
+
+ Solicited-Node Address: FF02:0:0:0:0:1:FFXX:XXXX
+
+ The above multicast address is computed as a function of a node's
+ unicast and anycast addresses. The solicited-node multicast address
+ is formed by taking the low-order 24 bits of the address (unicast or
+ anycast) and appending those bits to the prefix
+ FF02:0:0:0:0:1:FF00::/104 resulting in a multicast address in the
+ range
+
+ FF02:0:0:0:0:1:FF00:0000
+
+ to
+
+ FF02:0:0:0:0:1:FFFF:FFFF
+
+ For example, the solicited node multicast address corresponding to
+ the IPv6 address 4037::01:800:200E:8C6C is FF02::1:FF0E:8C6C. IPv6
+ addresses that differ only in the high-order bits, e.g. due to
+ multiple high-order prefixes associated with different aggregations,
+ will map to the same solicited-node address thereby reducing the
+ number of multicast addresses a node must join.
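
The solicited-node computation is a mask and an OR, as the following sketch shows using the example address from the text. The function name solicited_node is an assumption; the prefix constant is the FF02:0:0:0:0:1:FF00::/104 prefix given above.

```python
import ipaddress

SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("FF02:0:0:0:0:1:FF00:0"))


def solicited_node(text: str) -> ipaddress.IPv6Address:
    """Append the low-order 24 bits of an address to the /104 prefix."""
    low24 = int(ipaddress.IPv6Address(text)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


print(solicited_node("4037::01:800:200E:8C6C"))  # ff02::1:ff0e:8c6c
print(solicited_node("FE80::800:200E:8C6C"))     # ff02::1:ff0e:8c6c
```

The second call uses a different high-order prefix with the same low-order bits and lands on the same solicited-node address, which is the property the paragraph above describes.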
+
+ A node is required to compute and join the associated Solicited-Node
+ multicast addresses for every unicast and anycast address it is
+ assigned.
+
+2.7.2 Assignment of New IPv6 Multicast Addresses
+
+ The current approach [ETHER] to map IPv6 multicast addresses into
+ IEEE 802 MAC addresses takes the low order 32 bits of the IPv6
+ multicast address and uses it to create a MAC address. Note that
+ Token Ring networks are handled differently. This is defined in
+ [TOKEN]. Group IDs less than or equal to 32 bits will generate
+ unique MAC addresses. Because of this, new IPv6 multicast addresses
+ should be assigned so that the group identifier is always in the low
+ order 32 bits as shown in the following:
+
+ |   8    |  4 |  4 |          80 bits          |     32 bits     |
+ +--------+----+----+---------------------------+-----------------+
+ |11111111|flgs|scop|   reserved must be zero   |    group ID     |
+ +--------+----+----+---------------------------+-----------------+
+
+ While this limits the number of permanent IPv6 multicast groups to
+ 2^32, this is unlikely to be a limitation in the future. If it
+ becomes necessary to exceed this limit in the future, multicast will
+ still work but the processing will be slightly slower.
+
+ Additional IPv6 multicast addresses are defined and registered by the
+ IANA [MASGN].
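
The reason for keeping group IDs within 32 bits can be illustrated with the Ethernet mapping. Note the assumptions: this section only says that the low-order 32 bits are used; the 33-33 leading octets in the sketch come from the IPv6-over-Ethernet specification (later published as RFC 2464), and the function name multicast_mac is invented for the example.

```python
import ipaddress


def multicast_mac(text: str) -> str:
    """Map an IPv6 multicast address to an Ethernet MAC (33-33 + low 32 bits)."""
    address = ipaddress.IPv6Address(text)
    if not address.is_multicast:
        raise ValueError("not a multicast address")
    low32 = int(address) & 0xFFFFFFFF
    octets = bytes([0x33, 0x33]) + low32.to_bytes(4, "big")
    return "-".join(f"{octet:02X}" for octet in octets)


print(multicast_mac("FF02::101"))             # 33-33-00-00-01-01
print(multicast_mac("FF02:0:0:0:0:1:0:101"))  # 33-33-00-00-01-01
```

The second address has a group ID wider than 32 bits and maps to the same MAC address as the first, so a receiver would have to filter the two groups in software. Group IDs that fit in 32 bits avoid that overlap.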
+
+2.8 A Node's Required Addresses
+
+ A host is required to recognize the following addresses as
+ identifying itself:
+
+ o Its Link-Local Address for each interface
+ o Assigned Unicast Addresses
+ o Loopback Address
+ o All-Nodes Multicast Addresses
+ o Solicited-Node Multicast Address for each of its assigned
+ unicast and anycast addresses
+ o Multicast Addresses of all other groups to which the host
+ belongs.
+
+ A router is required to recognize all addresses that a host is
+ required to recognize, plus the following addresses as identifying
+ itself:
+
+ o The Subnet-Router anycast addresses for the interfaces it is
+ configured to act as a router on.
+ o All other Anycast addresses with which the router has been
+ configured.
+ o All-Routers Multicast Addresses
+ o Multicast Addresses of all other groups to which the router
+ belongs.
+
+ The only address prefixes which should be predefined in an
+ implementation are the:
+
+ o Unspecified Address
+ o Loopback Address
+ o Multicast Prefix (FF)
+ o Local-Use Prefixes (Link-Local and Site-Local)
+ o Pre-Defined Multicast Addresses
+ o IPv4-Compatible Prefixes
+
+ Implementations should assume all other addresses are unicast unless
+ specifically configured (e.g., anycast addresses).
+
+3. Security Considerations
+
+ IPv6 addressing documents do not have any direct impact on Internet
+ infrastructure security. Authentication of IPv6 packets is defined
+ in [AUTH].
+
+APPENDIX A : Creating EUI-64 based Interface Identifiers
+--------------------------------------------------------
+
+ Depending on the characteristics of a specific link or node there are
+ a number of approaches for creating EUI-64 based interface
+ identifiers. This appendix describes some of these approaches.
+
+Links or Nodes with EUI-64 Identifiers
+
+ The only change needed to transform an EUI-64 identifier to an
+ interface identifier is to invert the "u" (universal/local) bit. For
+ example, a globally unique EUI-64 identifier of the form:
+
+ |0              1|1              3|3              4|4              6|
+ |0              5|6              1|2              7|8              3|
+ +----------------+----------------+----------------+----------------+
+ |cccccc0gcccccccc|ccccccccmmmmmmmm|mmmmmmmmmmmmmmmm|mmmmmmmmmmmmmmmm|
+ +----------------+----------------+----------------+----------------+
+
+ where "c" are the bits of the assigned company_id, "0" is the value
+ of the universal/local bit to indicate global scope, "g" is the
+ individual/group bit, and "m" are the bits of the manufacturer-
+ selected extension identifier. The IPv6 interface identifier would
+ be of the form:
+
+ |0              1|1              3|3              4|4              6|
+ |0              5|6              1|2              7|8              3|
+ +----------------+----------------+----------------+----------------+
+ |cccccc1gcccccccc|ccccccccmmmmmmmm|mmmmmmmmmmmmmmmm|mmmmmmmmmmmmmmmm|
+ +----------------+----------------+----------------+----------------+
+
+ The only change is inverting the value of the universal/local bit.
+
+Links or Nodes with IEEE 802 48 bit MAC's
+
+ [EUI64] defines a method to create an EUI-64 identifier from an IEEE
+ 48bit MAC identifier. This is to insert two octets, with hexadecimal
+ values of 0xFF and 0xFE, in the middle of the 48 bit MAC (between the
+ company_id and vendor supplied id). For example, the 48 bit MAC with
+ global scope:
+
+ |0              1|1              3|3              4|
+ |0              5|6              1|2              7|
+ +----------------+----------------+----------------+
+ |cccccc0gcccccccc|ccccccccmmmmmmmm|mmmmmmmmmmmmmmmm|
+ +----------------+----------------+----------------+
+
+ where "c" are the bits of the assigned company_id, "0" is the value
+ of the universal/local bit to indicate global scope, "g" is the
+ individual/group bit, and "m" are the bits of the manufacturer-
+ selected extension identifier. The interface identifier would be of
+ the form:
+
+ |0              1|1              3|3              4|4              6|
+ |0              5|6              1|2              7|8              3|
+ +----------------+----------------+----------------+----------------+
+ |cccccc1gcccccccc|cccccccc11111111|11111110mmmmmmmm|mmmmmmmmmmmmmmmm|
+ +----------------+----------------+----------------+----------------+
+
+ When IEEE 802 48bit MAC addresses are available (on an interface or a
+ node), an implementation should use them to create interface
+ identifiers due to their availability and uniqueness properties.
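
The MAC-to-interface-identifier procedure above amounts to inserting two octets and flipping one bit. The sketch below assumes a hypothetical helper name, interface_id_from_mac, and an arbitrary sample MAC address chosen only for illustration.

```python
def interface_id_from_mac(mac: str) -> bytes:
    """Build a 64-bit interface identifier from a 48-bit MAC address."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert 0xFF 0xFE between the company_id and the vendor-supplied id.
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]
    # Invert the universal/local bit of the first octet.
    return bytes([eui64[0] ^ 0x02]) + eui64[1:]


print(interface_id_from_mac("00-60-97-40-EF-AB").hex())  # 026097fffe40efab
```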
+
+Links with Non-Global Identifiers
+
+ There are a number of types of links that, while multi-access, do not
+ have globally unique link identifiers. Examples include LocalTalk
+ and Arcnet. The method to create an EUI-64 formatted identifier is
+ to take the link identifier (e.g., the LocalTalk 8 bit node
+ identifier) and zero fill it to the left. For example, a LocalTalk 8
+ bit node identifier of hexadecimal value 0x4F results in the
+ following interface identifier:
+
+ |0              1|1              3|3              4|4              6|
+ |0              5|6              1|2              7|8              3|
+ +----------------+----------------+----------------+----------------+
+ |0000000000000000|0000000000000000|0000000000000000|0000000001001111|
+ +----------------+----------------+----------------+----------------+
+
+ Note that this results in the universal/local bit set to "0" to
+ indicate local scope.
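
Zero filling to the left is a one-line conversion, sketched here with the 0x4F example. The function name and the width_bits parameter are assumptions introduced for this illustration.

```python
def interface_id_from_link_id(link_id: int, width_bits: int = 8) -> bytes:
    """Zero-fill a short link identifier on the left to 64 bits."""
    if not 0 <= link_id < (1 << width_bits):
        raise ValueError("link identifier does not fit in the given width")
    return link_id.to_bytes(8, "big")


identifier = interface_id_from_link_id(0x4F)
print(identifier.hex())             # 000000000000004f
print((identifier[0] & 0x02) >> 1)  # 0, the universal/local bit (local scope)
```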
+
+Links without Identifiers
+
+ There are a number of links that do not have any type of built-in
+ identifier. The most common of these are serial links and configured
+ tunnels. Interface identifiers must be chosen that are unique for
+ the link.
+
+ When no built-in identifier is available on a link the preferred
+ approach is to use a global interface identifier from another
+ interface or one which is assigned to the node itself. To use this
+ approach no other interface connecting the same node to the same link
+ may use the same identifier.
+
+ If there is no global interface identifier available for use on the
+ link the implementation needs to create a local scope interface
+ identifier. The only requirement is that it be unique on the link.
+ There are many possible approaches to select a link-unique interface
+ identifier. They include:
+
+ Manual Configuration
+ Generated Random Number
+ Node Serial Number (or other node-specific token)
+
+ The link-unique interface identifier should be generated in such a
+ way that it does not change after a reboot of a node or when
+ interfaces are added to or deleted from the node.
+
+ The selection of the appropriate algorithm is link and implementation
+ dependent. The details on forming interface identifiers are defined
+ in the appropriate "IPv6 over <link>" specification. It is strongly
+ recommended that a collision detection algorithm be implemented as
+ part of any automatic algorithm.
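
One possible shape for the "Generated Random Number" approach, with a collision check, is sketched below. Everything here is hypothetical: the function name pick_interface_id, the in_use set standing in for whatever collision detection the link provides, and the retry limit are assumptions, not part of the specification.

```python
import random

U_BIT_MASK = 0x02 << 56  # universal/local bit within a 64-bit identifier


def pick_interface_id(in_use: set, rng: random.Random, max_attempts: int = 16) -> int:
    """Pick a random local-scope identifier that is not already in use."""
    for _ in range(max_attempts):
        candidate = rng.getrandbits(64) & ~U_BIT_MASK  # clear "u": local scope
        if candidate not in in_use:                    # stand-in collision check
            return candidate
    raise RuntimeError("could not find a link-unique identifier")


already_taken = {0x1, 0x2}
identifier = pick_interface_id(already_taken, random.Random(1998))
print(f"{identifier:016x}")
```

To satisfy the stability recommendation above, an implementation would need to store the chosen value and reuse it after a reboot rather than draw a new one each time.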
+
+APPENDIX B: ABNF Description of Text Representations
+----------------------------------------------------
+
+ This appendix defines the text representation of IPv6 addresses and
+ prefixes in Augmented BNF [ABNF] for reference purposes.
+
+ IPv6address = hexpart [ ":" IPv4address ]
+ IPv4address = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT
+
+ IPv6prefix = hexpart "/" 1*2DIGIT
+
+ hexpart = hexseq | hexseq "::" [ hexseq ] | "::" [ hexseq ]
+ hexseq = hex4 *( ":" hex4)
+ hex4 = 1*4HEXDIG
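
The grammar above can be transcribed almost rule for rule into regular expressions. This is a sketch only: the pattern and function names are assumptions, and the transcription is deliberately literal, so it inherits the grammar's purely syntactic nature.

```python
import re

HEX4 = r"[0-9A-Fa-f]{1,4}"                                       # hex4
HEXSEQ = rf"{HEX4}(?::{HEX4})*"                                  # hexseq
HEXPART = rf"(?:{HEXSEQ}(?:::(?:{HEXSEQ})?)?|::(?:{HEXSEQ})?)"   # hexpart
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"                                # IPv4address

IPV6_ADDRESS = re.compile(rf"{HEXPART}(?::{IPV4})?")
IPV6_PREFIX = re.compile(rf"{HEXPART}/\d{{1,2}}")


def is_ipv6_address(text: str) -> bool:
    """True if the text matches the IPv6address rule."""
    return IPV6_ADDRESS.fullmatch(text) is not None


def is_ipv6_prefix(text: str) -> bool:
    """True if the text matches the IPv6prefix rule."""
    return IPV6_PREFIX.fullmatch(text) is not None


print(is_ipv6_address("1080::8:800:200C:417A"))  # True
print(is_ipv6_address("0:0:0:0:0:0:13.1.68.3"))  # True
print(is_ipv6_address("1080:::1"))               # False
print(is_ipv6_prefix("12AB:0:0:CD30::/60"))      # True
```

Because hexseq places no upper bound on the number of groups, a string such as 1:2:3:4:5:6:7:8:9 also matches; checking that an address has exactly 128 bits needs a separate step beyond the grammar.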
+
+APPENDIX C: CHANGES FROM RFC-1884
+---------------------------------
+
+ The following changes were made from RFC-1884 "IP Version 6
+ Addressing Architecture":
+
+ - Added an appendix providing an ABNF description of text
+ representations.
+ - Clarification that link-unique identifiers should not change after
+ reboot or other interface reconfigurations.
+ - Clarification of Address Model based on comments.
+ - Changed aggregation format terminology to be consistent with
+ aggregation draft.
+ - Added text to allow interface identifier to be used on more than
+ one interface on same node.
+ - Added rules for defining new multicast addresses.
+ - Added appendix describing procedures for creating EUI-64 based
+ interface ID's.
+ - Added notation for defining IPv6 prefixes.
+ - Changed solicited node multicast definition to use a longer
+ prefix.
+ - Added site scope all routers multicast address.
+ - Defined Aggregatable Global Unicast Addresses to use "001" Format
+ Prefix.
+ - Changed "010" (Provider-Based Unicast) and "100" (Reserved for
+ Geographic) Format Prefixes to Unassigned.
+ - Added section on Interface ID definition for unicast addresses.
+ Requires use of EUI-64 in range of format prefixes and rules for
+ setting global/local scope bit in EUI-64.
+ - Updated NSAP text to reflect working in RFC1888.
+ - Removed protocol specific IPv6 multicast addresses (e.g., DHCP)
+ and referenced the IANA definitions.
+ - Removed section "Unicast Address Example". Had become OBE.
+ - Added new and updated references.
+ - Minor text clarifications and improvements.
+
+REFERENCES
+
+ [ABNF] Crocker, D., and P. Overell, "Augmented BNF for
+ Syntax Specifications: ABNF", RFC 2234, November 1997.
+
+ [AGGR] Hinden, R., O'Dell, M., and S. Deering, "An
+ Aggregatable Global Unicast Address Format", RFC 2374, July
+ 1998.
+
+ [AUTH] Atkinson, R., "IP Authentication Header", RFC 1826, August
+ 1995.
+
+ [ANYCST] Partridge, C., Mendez, T., and W. Milliken, "Host
+ Anycasting Service", RFC 1546, November 1993.
+
+ [CIDR] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless
+ Inter-Domain Routing (CIDR): An Address Assignment and
+ Aggregation Strategy", RFC 1519, September 1993.
+
+ [ETHER] Crawford, M., "Transmission of IPv6 Packets over Ethernet
+ Networks", Work in Progress.
+
+ [EUI64] IEEE, "Guidelines for 64-bit Global Identifier (EUI-64)
+ Registration Authority",
+ http://standards.ieee.org/db/oui/tutorials/EUI64.html,
+ March 1997.
+
+ [FDDI] Crawford, M., "Transmission of IPv6 Packets over FDDI
+ Networks", Work in Progress.
+
+ [IPV6] Deering, S., and R. Hinden, Editors, "Internet Protocol,
+ Version 6 (IPv6) Specification", RFC 1883, December 1995.
+
+ [MASGN] Hinden, R., and S. Deering, "IPv6 Multicast Address
+ Assignments", RFC 2375, July 1998.
+
+ [NSAP] Bound, J., Carpenter, B., Harrington, D., Houldsworth, J.,
+ and A. Lloyd, "OSI NSAPs and IPv6", RFC 1888, August 1996.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [TOKEN] Thomas, S., "Transmission of IPv6 Packets over Token Ring
+ Networks", Work in Progress.
+
+ [TRAN] Gilligan, R., and E. Nordmark, "Transition Mechanisms for
+ IPv6 Hosts and Routers", RFC 1933, April 1996.
+
+AUTHORS' ADDRESSES
+
+ Robert M. Hinden
+ Nokia
+ 232 Java Drive
+ Sunnyvale, CA 94089
+ USA
+
+ Phone: +1 408 990-2004
+ Fax: +1 408 743-5677
+ EMail: hinden@iprg.nokia.com
+
+
+ Stephen E. Deering
+ Cisco Systems, Inc.
+ 170 West Tasman Drive
+ San Jose, CA 95134-1706
+ USA
+
+ Phone: +1 408 527-8213
+ Fax: +1 408 527-8254
+ EMail: deering@cisco.com
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
diff --git a/doc/rfc/rfc2374.txt b/doc/rfc/rfc2374.txt new file mode 100644 index 00000000..676a4c4d --- /dev/null +++ b/doc/rfc/rfc2374.txt @@ -0,0 +1,675 @@ +
+
+
+
+
+
+Network Working Group R. Hinden
+Request for Comments: 2374 Nokia
+Obsoletes: 2073 M. O'Dell
+Category: Standards Track UUNET
+ S. Deering
+ Cisco
+ July 1998
+
+
+ An IPv6 Aggregatable Global Unicast Address Format
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+1.0 Introduction
+
+ This document defines an IPv6 aggregatable global unicast address
+ format for use in the Internet. The address format defined in this
+ document is consistent with the IPv6 Protocol [IPV6] and the "IPv6
+ Addressing Architecture" [ARCH]. It is designed to facilitate
+ scalable Internet routing.
+
+ This document replaces RFC 2073, "An IPv6 Provider-Based Unicast
+ Address Format". RFC 2073 will become historic. The Aggregatable
+ Global Unicast Address Format is an improvement over RFC 2073 in a
+ number of areas. The major changes include removal of the registry
+ bits because they are not needed for route aggregation, support of
+ EUI-64 based interface identifiers, support of provider and exchange
+ based aggregation, separation of public and site topology, and new
+ aggregation based terminology.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+
+
+
+
+
+
+
+Hinden, et. al. Standards Track [Page 1]
+
+RFC 2374 IPv6 Global Unicast Address Format July 1998
+
+
+2.0 Overview of the IPv6 Address
+
+ IPv6 addresses are 128-bit identifiers for interfaces and sets of
+ interfaces. There are three types of addresses: Unicast, Anycast,
+ and Multicast. This document defines a specific type of Unicast
+ address.
+
+ In this document, fields in addresses are given specific names, for
+ example "subnet". When this name is used with the term "ID" (for
+ "identifier") after the name (e.g., "subnet ID"), it refers to the
+ contents of the named field. When it is used with the term "prefix"
+ (e.g. "subnet prefix") it refers to all of the addressing bits to
+ the left of and including this field.
+
+ IPv6 unicast addresses are designed assuming that the Internet
+ routing system makes forwarding decisions based on a "longest prefix
+ match" algorithm on arbitrary bit boundaries and does not have any
+ knowledge of the internal structure of IPv6 addresses. The structure
+ in IPv6 addresses is for assignment and allocation. The only
+ exception to this is the distinction made between unicast and
+ multicast addresses.
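The longest-prefix-match rule described above can be sketched as follows. This is a minimal illustration, not a router implementation: the function name, the route table, and the next-hop labels are all hypothetical, and the prefixes were chosen only to show that matches need not fall on field boundaries.

```python
import ipaddress


def longest_prefix_match(destination, routes):
    """Return (network, next_hop) for the longest matching prefix, or None.

    `routes` maps prefix strings to next-hop labels (hypothetical data).
    The lookup looks only at prefix bits, never at TLA/NLA/SLA field
    boundaries, which mirrors the assumption stated in the text.
    """
    addr = ipaddress.IPv6Address(destination)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.IPv6Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best


# Hypothetical routing table with prefixes on arbitrary bit boundaries.
ROUTES = {
    "2000::/3": "default-aggregatable",
    "2001:db8::/29": "provider-a",
    "2001:db8:40::/43": "exchange-x",
}
```

A linear scan is used here for clarity; real routers use tries or similar structures, which this sketch does not attempt to model.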
+
+ The specific type of an IPv6 address is indicated by the leading bits
+ in the address. The variable-length field comprising these leading
+ bits is called the Format Prefix (FP).
+
+ This document defines an address format for the 001 (binary) Format
+ Prefix for Aggregatable Global Unicast addresses. The same address
+ format could be used for other Format Prefixes, as long as these
+ Format Prefixes also identify IPv6 unicast addresses. Only the "001"
+ Format Prefix is defined here.
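As a rough sketch of the Format Prefix test, the snippet below checks whether the three leading bits of an address are 001. The function names are illustrative assumptions; only the 3-bit prefix defined in this document is examined, even though Format Prefixes in general are variable-length.

```python
import ipaddress

AGGREGATABLE_FP = 0b001  # the 3-bit Format Prefix defined in this document


def format_prefix(address):
    """Return the three most significant bits of an IPv6 address."""
    return int(ipaddress.IPv6Address(address)) >> 125


def is_aggregatable_global_unicast(address):
    """True if the address starts with binary 001, i.e. lies in 2000::/3."""
    return format_prefix(address) == AGGREGATABLE_FP
```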
+
+3.0 IPv6 Aggregatable Global Unicast Address Format
+
+ This document defines an address format for the IPv6 aggregatable
+ global unicast address assignment. The authors believe that this
+ address format will be widely used for IPv6 nodes connected to the
+ Internet. This address format is designed to support both the
+ current provider-based aggregation and a new type of exchange-based
+ aggregation. The combination will allow efficient routing
+ aggregation for sites that connect directly to providers and for
+ sites that connect to exchanges. Sites will have the choice to
+ connect to either type of aggregation entity.
+
+ While this address format is designed to support exchange-based
+ aggregation (in addition to current provider-based aggregation), it is
+ not dependent on exchanges for its overall route aggregation
+ properties. It will provide efficient route aggregation with only
+ provider-based aggregation.
+
+ Aggregatable addresses are organized into a three level hierarchy:
+
+ - Public Topology
+ - Site Topology
+ - Interface Identifier
+
+ Public topology is the collection of providers and exchanges who
+ provide public Internet transit services. Site topology is local to
+ a specific site or organization which does not provide public transit
+ service to nodes outside of the site. Interface identifiers identify
+ interfaces on links.
+
+     ______________                  ______________
+ --+/              \+--------------+/              \+----------
+   (       P1       )    +----+    (       P3       )  +----+
+   +\______________/     |    |----+\______________/+--|    |--
+   |                  +--| X1 |                       +| X2 |
+   | ______________  /   |    |-+    ______________  / |    |--
+   +/              \+    +-+--+  \  /              \+  +----+
+   (       P2       )     /       \ +(       P4       )
+ --+\______________/     /         \  \______________/
+     |                  /           \      |      |
+     |                 /             |     |      |
+     |                /              |     |      |
+    _|_             _/_             _|_   _|_    _|_
+   /   \           /   \           /   \ /   \  /   \
+  ( S.A )         ( S.B )         ( P5 ) ( P6 )( S.C )
+   \___/           \___/           \___/ \___/  \___/
+                                     |    / \
+                                    _|_ _/_  \    ___
+                                   /   \ /   \ +-/   \
+                                  ( S.D ) ( S.E ) ( S.F )
+                                   \___/ \___/   \___/
+
+ As shown in the figure above, the aggregatable address format is
+ designed to support long-haul providers (shown as P1, P2, P3, and
+ P4), exchanges (shown as X1 and X2), multiple levels of providers
+ (shown as P5 and P6), and subscribers (shown as S.x). Exchanges
+ (unlike current NAPs, FIXes, etc.) will allocate IPv6 addresses.
+ Organizations who connect to these exchanges will also subscribe
+ (directly, indirectly via the exchange, etc.) for long-haul service
+ from one or more long-haul providers. Doing so, they will achieve
+ addressing independence from long-haul transit providers. They will
+ be able to change long-haul providers without having to renumber
+ their organization. They can also be multihomed via the exchange to
+ more than one long-haul provider without having to have address
+ prefixes from each long-haul provider. Note that the mechanisms used
+ for this type of provider selection and portability are not discussed
+ in the document.
+
+3.1 Aggregatable Global Unicast Address Structure
+
+ The aggregatable global unicast address format is as follows:
+
+ | 3|  13 | 8 |   24   |   16   |          64 bits               |
+ +--+-----+---+--------+--------+--------------------------------+
+ |FP| TLA |RES|  NLA   |  SLA   |         Interface ID           |
+ |  | ID  |   |  ID    |  ID    |                                |
+ +--+-----+---+--------+--------+--------------------------------+
+
+ <--Public Topology--->   Site
+                       <-------->
+                        Topology
+                                 <------Interface Identifier----->
+
+ Where
+
+ FP            Format Prefix (001)
+ TLA ID        Top-Level Aggregation Identifier
+ RES           Reserved for future use
+ NLA ID        Next-Level Aggregation Identifier
+ SLA ID        Site-Level Aggregation Identifier
+ INTERFACE ID  Interface Identifier
+
+ The following sections specify each part of the IPv6 Aggregatable
+ Global Unicast address format.
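The field layout above can be illustrated with a small decoder. This is a sketch under stated assumptions: the `AggregatableAddress` type, the `FIELDS` table, and `split_fields` are names invented for this example, and the sample address in the usage note is arbitrary.

```python
import ipaddress
from typing import NamedTuple


class AggregatableAddress(NamedTuple):
    fp: int
    tla_id: int
    res: int
    nla_id: int
    sla_id: int
    interface_id: int


# (field name, width in bits), most significant field first, per section 3.1.
FIELDS = [
    ("fp", 3),
    ("tla_id", 13),
    ("res", 8),
    ("nla_id", 24),
    ("sla_id", 16),
    ("interface_id", 64),
]


def split_fields(address):
    """Split a 128-bit IPv6 address into the six fields of the format."""
    value = int(ipaddress.IPv6Address(address))
    remaining = 128
    parts = {}
    for name, width in FIELDS:
        remaining -= width  # bit offset of this field's least significant bit
        parts[name] = (value >> remaining) & ((1 << width) - 1)
    return AggregatableAddress(**parts)
```

For example, `split_fields("2001:12:3456:789a::1")` yields FP 1, TLA ID 1, RES 0, NLA ID 0x123456, SLA ID 0x789a, and Interface ID 1.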
+
+3.2 Top-Level Aggregation ID
+
+ Top-Level Aggregation Identifiers (TLA ID) are the top level in the
+ routing hierarchy. Default-free routers must have a routing table
+ entry for every active TLA ID and will probably have additional
+ entries providing routing information for the TLA ID in which they
+ are located. They may have additional entries in order to optimize
+ routing for their specific topology, but the routing topology at all
+ levels must be designed to minimize the number of additional entries
+ fed into the default free routing tables.
+
+ This addressing format supports 8,192 (2^13) TLA ID's. Additional
+ TLA ID's may be added by either growing the TLA field to the right
+ into the reserved field or by using this format for additional format
+ prefixes.
+
+ The issues relating to TLA ID assignment are beyond the scope of this
+ document. They will be described in a document under preparation.
+
+3.3 Reserved
+
+ The Reserved field is reserved for future use and must be set to
+ zero.
+
+ The Reserved field allows for future growth of the TLA and NLA fields
+ as appropriate. See section 4.0 for a discussion.
+
+3.4 Next-Level Aggregation Identifier
+
+ Next-Level Aggregation Identifiers are used by organizations
+ assigned a TLA ID to create an addressing hierarchy and to identify
+ sites. The organization can assign the top part of the NLA ID in a
+ manner to create an addressing hierarchy appropriate to its network.
+ It can use the remainder of the bits in the field to identify sites
+ it wishes to serve. This is shown as follows:
+
+ |  n  |     24-n bits      |   16   |     64 bits     |
+ +-----+--------------------+--------+-----------------+
+ |NLA1 |      Site ID       | SLA ID |  Interface ID   |
+ +-----+--------------------+--------+-----------------+
+
+ Each organization assigned a TLA ID receives 24 bits of NLA ID space.
+ This NLA ID space allows each organization to provide service to
+ approximately as many organizations as the current IPv4 Internet can
+ support total networks.
+
+ Organizations assigned TLA ID's may also support NLA ID's in their
+ own Site ID space. This allows the organization assigned a TLA ID to
+ provide service to organizations providing public transit service and
+ to organizations who do not provide public transit service. These
+ organizations receiving an NLA ID may also choose to use their Site
+ ID space to support other NLA ID's. This is shown as follows:
+
+ |  n  |     24-n bits      |   16   |     64 bits     |
+ +-----+--------------------+--------+-----------------+
+ |NLA1 |      Site ID       | SLA ID |  Interface ID   |
+ +-----+--------------------+--------+-----------------+
+
+       |  m  |    24-n-m    |   16   |     64 bits     |
+       +-----+--------------+--------+-----------------+
+       |NLA2 |   Site ID    | SLA ID |  Interface ID   |
+       +-----+--------------+--------+-----------------+
+
+             |  o  |24-n-m-o|   16   |     64 bits     |
+             +-----+--------+--------+-----------------+
+             |NLA3 | Site ID| SLA ID |  Interface ID   |
+             +-----+--------+--------+-----------------+
+
+ The design of the bit layout of the NLA ID space for a specific TLA
+ ID is left to the organization responsible for that TLA ID. Likewise
+ the design of the bit layout of the next level NLA ID is the
+ responsibility of the previous level NLA ID. It is recommended that
+ organizations assigning NLA address space use "slow start" allocation
+ procedures similar to [RFC2050].
+
+ The design of an NLA ID allocation plan is a tradeoff between routing
+ aggregation efficiency and flexibility. Creating hierarchies allows
+ for a greater amount of aggregation and results in smaller routing
+ tables. Flat NLA ID assignment provides for easier allocation and
+ attachment flexibility, but results in larger routing tables.
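The nested NLA layout described in this section can be sketched as a bit split. The function name and the level widths in the usage note are hypothetical; the document leaves the actual bit layout to the organization responsible for each level.

```python
NLA_BITS = 24  # width of the NLA ID field in this address format


def split_nla(nla_id, level_widths):
    """Split a 24-bit NLA ID into successive NLA levels plus a Site ID.

    `level_widths` lists the bit widths n, m, o, ... chosen by each
    level of provider (hypothetical values). Whatever is left over
    after those levels is returned as the Site ID.
    """
    if not 0 <= nla_id < (1 << NLA_BITS):
        raise ValueError("NLA ID must fit in 24 bits")
    if any(width < 0 for width in level_widths) or sum(level_widths) > NLA_BITS:
        raise ValueError("level widths must be non-negative and total at most 24")
    remaining = NLA_BITS
    levels = []
    for width in level_widths:
        remaining -= width
        levels.append((nla_id >> remaining) & ((1 << width) - 1))
    site_id = nla_id & ((1 << remaining) - 1)
    return levels, site_id
```

With widths `[8, 4]`, the NLA ID 0x123456 splits into NLA1 = 0x12, NLA2 = 0x3, and a 12-bit Site ID of 0x456. A flat assignment corresponds to an empty width list, where the whole field is the Site ID.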
+
+3.5 Site-Level Aggregation Identifier
+
+ The SLA ID field is used by an individual organization to create its
+ own local addressing hierarchy and to identify subnets. This is
+ analogous to subnets in IPv4 except that each organization has a much
+ greater number of subnets. The 16 bit SLA ID field supports 65,536
+ individual subnets.
+
+ Organizations may choose either to route their SLA ID "flat" (i.e.,
+ not create any logical relationship between the SLA identifiers, which
+ results in larger routing tables), or to create a hierarchy of two or
+ more levels (which results in smaller routing tables) in the SLA ID
+ field. The latter is shown as follows:
+
+ |  n  |    16-n    |               64 bits               |
+ +-----+------------+-------------------------------------+
+ |SLA1 |   Subnet   |            Interface ID             |
+ +-----+------------+-------------------------------------+
+
+       | m  |16-n-m |               64 bits               |
+       +----+-------+-------------------------------------+
+       |SLA2|Subnet |            Interface ID             |
+       +----+-------+-------------------------------------+
+
+ The approach chosen for structuring an SLA ID field is the
+ responsibility of the individual organization.
+
+ The number of subnets supported in this address format should be
+ sufficient for all but the largest of organizations. Organizations
+ which need additional subnets can arrange with the organization they
+ are obtaining Internet service from to obtain additional site
+ identifiers and use this to create additional subnets.
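A site's use of the SLA ID field can be sketched with Python's standard `ipaddress` module. The helper names and the 2001:db8:1::/48 site prefix in the usage note are assumptions made for illustration; the document itself does not prescribe any particular split.

```python
import ipaddress


def sla1_blocks(site_prefix, sla1_bits):
    """Yield the 2**sla1_bits first-level (SLA1) blocks of a /48 site."""
    site = ipaddress.IPv6Network(site_prefix)
    if site.prefixlen != 48:
        raise ValueError("a site prefix in this format is 48 bits long")
    if not 1 <= sla1_bits <= 16:
        raise ValueError("SLA1 must use between 1 and 16 bits")
    return site.subnets(prefixlen_diff=sla1_bits)


def subnet_for(site_prefix, sla_id):
    """Return the /64 subnet selected by a 16-bit SLA ID within a site."""
    site = ipaddress.IPv6Network(site_prefix)
    if site.prefixlen != 48 or not 0 <= sla_id < (1 << 16):
        raise ValueError("need a /48 site prefix and a 16-bit SLA ID")
    base = int(site.network_address) | (sla_id << 64)
    return ipaddress.IPv6Network((base, 64))
```

For instance, a 4-bit SLA1 divides the hypothetical site 2001:db8:1::/48 into sixteen /52 blocks, and SLA ID 0x1234 selects the subnet 2001:db8:1:1234::/64.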
+
+3.6 Interface ID
+
+ Interface identifiers are used to identify interfaces on a link.
+ They are required to be unique on that link. They may also be unique
+ over a broader scope. In many cases an interface's identifier will be
+ the same as, or be based on, the interface's link-layer address.
+ Interface IDs used in the aggregatable global unicast address format
+ are required to be 64 bits long and to be constructed in IEEE EUI-64
+ format [EUI64]. These identifiers may have global scope when a
+ global token (e.g., IEEE 48-bit MAC) is available or may have local
+ scope where a global token is not available (e.g., serial links,
+ tunnel end-points, etc.). The "u" bit (universal/local bit in IEEE
+ EUI-64 terminology) in the EUI-64 identifier must be set correctly,
+ as defined in [ARCH], to indicate global or local scope.
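As an illustration of deriving an Interface ID from a global token, the sketch below converts a 48-bit MAC address into a 64-bit identifier. It assumes the procedure given in [ARCH] (insert the octets FF-FE in the middle of the MAC and invert the universal/local bit); the function name is hypothetical, and [ARCH] together with the relevant "IPv6 over <link>" specification remains the authority.

```python
def mac_to_interface_id(mac):
    """Build a 64-bit Interface ID from a 48-bit MAC address string.

    Assumed steps (per the EUI-64 based procedure in [ARCH]):
      1. insert 0xFF 0xFE between the third and fourth octets;
      2. invert the universal/local ("u") bit, 0x02 of the first octet.
    """
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    eui64 = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    eui64[0] ^= 0x02  # flip the "u" bit
    return int.from_bytes(eui64, "big")
```

For the MAC 34-56-78-9A-BC-DE this produces 36-56-78-FF-FE-9A-BC-DE, written here as the integer 0x365678FFFE9ABCDE.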
+
+ The procedures for creating EUI-64 based Interface Identifiers are
+ defined in [ARCH]. The details on forming interface identifiers are
+ defined in the appropriate "IPv6 over <link>" specification such as
+ "IPv6 over Ethernet" [ETHER], "IPv6 over FDDI" [FDDI], etc.
+
+4.0 Technical Motivation
+
+ The design choices for the size of the fields in the aggregatable
+ address format were based on the need to meet a number of technical
+ requirements. These are described in the following paragraphs.
+
+ The size of the Top-Level Aggregation Identifier is 13 bits. This
+ allows for 8,192 TLA ID's. This size was chosen to ensure that the
+ default-free routing table in top level routers in the Internet is
+ kept within the limits, with a reasonable margin, of the current
+ routing technology. The margin is important because default-free
+ routers will also carry a significant number of longer (i.e., more-
+ specific) prefixes for optimizing paths internal to a TLA and between
+ TLAs.
+
+ The important issue is not only the size of the default-free routing
+ table, but the complexity of the topology that determines the number
+ of copies of the default-free routes that a router must examine while
+ computing a forwarding table. In current practice with IPv4 it is
+ common to see a prefix announced fifteen times via different paths.
+
+ The complexity of Internet topology is very likely to increase in the
+ future. It is important that IPv6 default-free routing support
+ additional complexity as well as a considerably larger internet.
+
+ It should be noted for comparison that at the time of this writing
+ (spring, 1998) the IPv4 default-free routing table contains
+ approximately 50,000 prefixes. While this shows that it is possible
+ to support more routes than 8,192, it is a matter of debate if the
+ number of prefixes supported today in IPv4 is already too high for
+ current routing technology. There are serious issues of route
+ stability as well as cases of providers not supporting all top level
+ prefixes. The technical requirement was to pick a TLA ID size that
+ was below, with a reasonable margin, what was being done with IPv4.
+
+ The choice of 13 bits for the TLA field was an engineering
+ compromise. Fewer bits would not have supported enough top level
+ organizations. More bits would have exceeded what
+ can be reasonably accommodated, with a reasonable margin, with
+ current routing technology in order to deal with the issues described
+ in the previous paragraphs.
+
+ If in the future, routing technology improves to support a larger
+ number of top level routes in the default-free routing tables there
+ are two choices on how to increase the number of TLA identifiers. The
+ first is to expand the TLA ID field into the reserved field. This
+ would increase the number of TLA ID's to approximately 2 million.
+ The second approach is to allocate another format prefix (FP) for use
+ with this address format. Either or a combination of these
+ approaches allows the number of TLA ID's to increase significantly.
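The counts quoted in this section follow directly from the field widths. The short sketch below reproduces the arithmetic; the names `FIELD_BITS`, `id_count`, and `expanded_tla_count` are illustrative only.

```python
# Field widths in bits, taken from the address format in section 3.1.
FIELD_BITS = {"FP": 3, "TLA": 13, "RES": 8, "NLA": 24, "SLA": 16, "INTERFACE": 64}


def id_count(bits):
    """Number of distinct identifiers that fit in a field of `bits` bits."""
    return 1 << bits


def expanded_tla_count(borrowed_res_bits):
    """TLA IDs available if the TLA field grows into the Reserved field."""
    if not 0 <= borrowed_res_bits <= FIELD_BITS["RES"]:
        raise ValueError("can borrow at most the 8 Reserved bits")
    return id_count(FIELD_BITS["TLA"] + borrowed_res_bits)
```

With no borrowed bits the TLA field holds 2^13 = 8,192 identifiers; absorbing all 8 Reserved bits gives 2^21 = 2,097,152, the "approximately 2 million" mentioned above.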
+
+ The size of the Reserved field is 8 bits. This size was chosen to
+ allow significant growth of the TLA ID field, the NLA ID field, or
+ both.
+
+ The size of the Next-Level Aggregation Identifier field is 24 bits.
+ This allows for approximately sixteen million NLA ID's if used in a
+ flat manner. Used hierarchically it allows for a complexity roughly
+ equivalent to the IPv4 address space (assuming an average network
+ size of 254 interfaces). If in the future additional room for
+ complexity is needed in the NLA ID, this may be accommodated by
+ extending the NLA ID into the Reserved field.
+
+ The size of the Site-Level Aggregation Identifier field is 16 bits.
+ This supports 65,536 individual subnets per site. The design goal
+ for the size of this field was to be sufficient for all but the
+ largest of organizations. Organizations which need additional
+ subnets can arrange with the organization they are obtaining Internet
+ service from to obtain additional site identifiers and use this to
+ create additional subnets.
+
+ The Site-Level Aggregation Identifier field was given a fixed size in
+ order to force the length of all prefixes identifying a particular
+ site to be the same length (i.e., 48 bits). This facilitates
+ movement of sites in the topology (e.g., changing service providers
+ and multi-homing to multiple service providers).
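Because every site prefix is exactly 48 bits, moving a site amounts to swapping the top 48 bits and keeping the rest. The sketch below illustrates this; the `renumber` helper and the prefixes in the usage note are hypothetical, and the mechanisms for actually carrying out a renumbering are outside the scope of this document.

```python
import ipaddress

LOW_80_MASK = (1 << 80) - 1  # SLA ID (16 bits) plus Interface ID (64 bits)


def renumber(address, new_site_prefix):
    """Move an address to a new /48 site prefix, keeping SLA ID and Interface ID."""
    new_site = ipaddress.IPv6Network(new_site_prefix)
    if new_site.prefixlen != 48:
        raise ValueError("site prefixes in this format are 48 bits long")
    low = int(ipaddress.IPv6Address(address)) & LOW_80_MASK
    return ipaddress.IPv6Address(int(new_site.network_address) | low)
```

For example, moving 2001:db8:1:1234::abcd to the site prefix 2001:db8:ffff::/48 gives 2001:db8:ffff:1234::abcd; the subnet and interface portions are untouched.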
+
+ The size of the Interface Identifier field is 64 bits. This size
+ was chosen to meet the requirement specified in [ARCH] to support
+ EUI-64 based Interface Identifiers.
+
+5.0 Acknowledgments
+
+ The authors would like to express our thanks to Thomas Narten, Bob
+ Fink, Matt Crawford, Allison Mankin, Jim Bound, Christian Huitema,
+ Scott Bradner, Brian Carpenter, John Stewart, and Daniel Karrenberg
+ for their review and constructive comments.
+
+6.0 References
+
+ [ALLOC] IAB and IESG, "IPv6 Address Allocation Management",
+ RFC 1881, December 1995.
+
+ [ARCH] Hinden, R., and S. Deering, "IP Version 6 Addressing Architecture",
+ RFC 2373, July 1998.
+
+ [AUTH] Atkinson, R., "IP Authentication Header", RFC 1826, August
+ 1995.
+
+ [AUTO] Thomson, S., and T. Narten, "IPv6 Stateless Address
+ Autoconfiguration", RFC 1971, August 1996.
+
+ [ETHER] Crawford, M., "Transmission of IPv6 Packets over Ethernet
+ Networks", Work in Progress.
+
+ [EUI64] IEEE, "Guidelines for 64-bit Global Identifier (EUI-64)
+ Registration Authority",
+ http://standards.ieee.org/db/oui/tutorials/EUI64.html,
+ March 1997.
+
+ [FDDI] Crawford, M., "Transmission of IPv6 Packets over FDDI
+ Networks", Work in Progress.
+
+ [IPV6] Deering, S., and R. Hinden, "Internet Protocol, Version 6
+ (IPv6) Specification", RFC 1883, December 1995.
+
+ [RFC2050] Hubbard, K., Kosters, M., Conrad, D., Karrenberg, D.,
+ and J. Postel, "Internet Registry IP Allocation
+ Guidelines", BCP 12, RFC 2050, November 1996.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+7.0 Security Considerations
+
+ IPv6 addressing documents do not have any direct impact on Internet
+ infrastructure security. Authentication of IPv6 packets is defined
+ in [AUTH].
+
+8.0 Authors' Addresses
+
+ Robert M. Hinden
+ Nokia
+ 232 Java Drive
+ Sunnyvale, CA 94089
+ USA
+
+ Phone: 1 408 990-2004
+ EMail: hinden@iprg.nokia.com
+
+
+ Mike O'Dell
+ UUNET Technologies, Inc.
+ 3060 Williams Drive
+ Fairfax, VA 22030
+ USA
+
+ Phone: 1 703 206-5890
+ EMail: mo@uunet.uu.net
+
+
+ Stephen E. Deering
+ Cisco Systems, Inc.
+ 170 West Tasman Drive
+ San Jose, CA 95134-1706
+ USA
+
+ Phone: 1 408 527-8213
+ EMail: deering@cisco.com
+
+9.0 Full Copyright Statement
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
diff --git a/doc/rfc/rfc2375.txt b/doc/rfc/rfc2375.txt new file mode 100644 index 00000000..a1fe8b9a --- /dev/null +++ b/doc/rfc/rfc2375.txt @@ -0,0 +1,451 @@ +
+
+
+
+
+
+Network Working Group R. Hinden
+Request for Comments: 2375 Ipsilon Networks
+Category: Informational S. Deering
+ Cisco
+ July 1998
+
+
+ IPv6 Multicast Address Assignments
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+1.0 Introduction
+
+ This document defines the initial assignment of IPv6 multicast
+ addresses. It is based on the "IP Version 6 Addressing Architecture"
+ [ADDARCH] and current IPv4 multicast address assignment found in
+ <ftp://venera.isi.edu/in-notes/iana/assignments/multicast-addresses>.
+ It adapts the IPv4 assignments that are relevant to IPv6 assignments.
+ IPv4 assignments that were not relevant were not converted into IPv6
+ assignments. Comments are solicited on this conversion.
+
+ All other IPv6 multicast addresses are reserved.
+
+ Sections 2 and 3 specify reserved and preassigned IPv6 multicast
+ addresses.
+
+ [ADDARCH] defines rules for assigning new IPv6 multicast addresses.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+2. Fixed Scope Multicast Addresses
+
+ These permanently assigned multicast addresses are valid over a
+ specified scope value.
+
+
+
+
+
+
+
+Hinden & Deering Informational [Page 1]
+
+RFC 2375 IPv6 Multicast Address Assignments July 1998
+
+
+2.1 Node-Local Scope
+
+ FF01:0:0:0:0:0:0:1        All Nodes Address           [ADDARCH]
+ FF01:0:0:0:0:0:0:2        All Routers Address         [ADDARCH]
+
+2.2 Link-Local Scope
+
+ FF02:0:0:0:0:0:0:1        All Nodes Address           [ADDARCH]
+ FF02:0:0:0:0:0:0:2        All Routers Address         [ADDARCH]
+ FF02:0:0:0:0:0:0:3        Unassigned                  [JBP]
+ FF02:0:0:0:0:0:0:4        DVMRP Routers               [RFC1075,JBP]
+ FF02:0:0:0:0:0:0:5        OSPFIGP                     [RFC2328,Moy]
+ FF02:0:0:0:0:0:0:6        OSPFIGP Designated Routers  [RFC2328,Moy]
+ FF02:0:0:0:0:0:0:7        ST Routers                  [RFC1190,KS14]
+ FF02:0:0:0:0:0:0:8        ST Hosts                    [RFC1190,KS14]
+ FF02:0:0:0:0:0:0:9        RIP Routers                 [RFC2080]
+ FF02:0:0:0:0:0:0:A        EIGRP Routers               [Farinacci]
+ FF02:0:0:0:0:0:0:B        Mobile-Agents               [Bill Simpson]
+
+ FF02:0:0:0:0:0:0:D        All PIM Routers             [Farinacci]
+ FF02:0:0:0:0:0:0:E        RSVP-ENCAPSULATION          [Braden]
+
+ FF02:0:0:0:0:0:1:1        Link Name                   [Harrington]
+ FF02:0:0:0:0:0:1:2        All-dhcp-agents             [Bound,Perkins]
+
+ FF02:0:0:0:0:1:FFXX:XXXX  Solicited-Node Address      [ADDARCH]
+
+2.3 Site-Local Scope
+
+ FF05:0:0:0:0:0:0:2        All Routers Address         [ADDARCH]
+
+ FF05:0:0:0:0:0:1:3        All-dhcp-servers            [Bound,Perkins]
+ FF05:0:0:0:0:0:1:4        All-dhcp-relays             [Bound,Perkins]
+ FF05:0:0:0:0:0:1:1000     Service Location            [RFC2165]
+ -FF05:0:0:0:0:0:1:13FF
+
+3.0 All Scope Multicast Addresses
+
+ These permanently assigned multicast addresses are valid over all
+ scope ranges. This is shown by an "X" in the scope field of the
+ address that means any legal scope value.
+
+ Note that, as defined in [ADDARCH], IPv6 multicast addresses which
+ are only different in scope represent different groups. Nodes must
+ join each group individually.
+
+ The IPv6 multicast addresses with variable scope are as follows:
+
+ FF0X:0:0:0:0:0:0:0      Reserved Multicast Address        [ADDARCH]
+
+ FF0X:0:0:0:0:0:0:100    VMTP Managers Group               [RFC1045,DRC3]
+ FF0X:0:0:0:0:0:0:101    Network Time Protocol (NTP)       [RFC1119,DLM1]
+ FF0X:0:0:0:0:0:0:102    SGI-Dogfight                      [AXC]
+ FF0X:0:0:0:0:0:0:103    Rwhod                             [SXD]
+ FF0X:0:0:0:0:0:0:104    VNP                               [DRC3]
+ FF0X:0:0:0:0:0:0:105    Artificial Horizons - Aviator     [BXF]
+ FF0X:0:0:0:0:0:0:106    NSS - Name Service Server         [BXS2]
+ FF0X:0:0:0:0:0:0:107    AUDIONEWS - Audio News Multicast  [MXF2]
+ FF0X:0:0:0:0:0:0:108    SUN NIS+ Information Service      [CXM3]
+ FF0X:0:0:0:0:0:0:109    MTP Multicast Transport Protocol  [SXA]
+ FF0X:0:0:0:0:0:0:10A    IETF-1-LOW-AUDIO                  [SC3]
+ FF0X:0:0:0:0:0:0:10B    IETF-1-AUDIO                      [SC3]
+ FF0X:0:0:0:0:0:0:10C    IETF-1-VIDEO                      [SC3]
+ FF0X:0:0:0:0:0:0:10D    IETF-2-LOW-AUDIO                  [SC3]
+ FF0X:0:0:0:0:0:0:10E    IETF-2-AUDIO                      [SC3]
+ FF0X:0:0:0:0:0:0:10F    IETF-2-VIDEO                      [SC3]
+
+ FF0X:0:0:0:0:0:0:110    MUSIC-SERVICE                     [Guido van Rossum]
+ FF0X:0:0:0:0:0:0:111    SEANET-TELEMETRY                  [Andrew Maffei]
+ FF0X:0:0:0:0:0:0:112    SEANET-IMAGE                      [Andrew Maffei]
+ FF0X:0:0:0:0:0:0:113    MLOADD                            [Braden]
+ FF0X:0:0:0:0:0:0:114    any private experiment            [JBP]
+ FF0X:0:0:0:0:0:0:115    DVMRP on MOSPF                    [Moy]
+ FF0X:0:0:0:0:0:0:116    SVRLOC                            [Veizades]
+ FF0X:0:0:0:0:0:0:117    XINGTV                            <hgxing@aol.com>
+ FF0X:0:0:0:0:0:0:118    microsoft-ds                      <arnoldm@microsoft.com>
+ FF0X:0:0:0:0:0:0:119    nbc-pro                           <bloomer@birch.crd.ge.com>
+ FF0X:0:0:0:0:0:0:11A    nbc-pfn                           <bloomer@birch.crd.ge.com>
+ FF0X:0:0:0:0:0:0:11B    lmsc-calren-1                     [Uang]
+ FF0X:0:0:0:0:0:0:11C    lmsc-calren-2                     [Uang]
+ FF0X:0:0:0:0:0:0:11D    lmsc-calren-3                     [Uang]
+ FF0X:0:0:0:0:0:0:11E    lmsc-calren-4                     [Uang]
+ FF0X:0:0:0:0:0:0:11F    ampr-info                         [Janssen]
+
+ FF0X:0:0:0:0:0:0:120    mtrace                            [Casner]
+ FF0X:0:0:0:0:0:0:121    RSVP-encap-1                      [Braden]
+ FF0X:0:0:0:0:0:0:122    RSVP-encap-2                      [Braden]
+ FF0X:0:0:0:0:0:0:123    SVRLOC-DA                         [Veizades]
+ FF0X:0:0:0:0:0:0:124    rln-server                        [Kean]
+ FF0X:0:0:0:0:0:0:125    proshare-mc                       [Lewis]
+ FF0X:0:0:0:0:0:0:126    dantz                             [Yackle]
+ FF0X:0:0:0:0:0:0:127    cisco-rp-announce                 [Farinacci]
+ FF0X:0:0:0:0:0:0:128    cisco-rp-discovery                [Farinacci]
+ FF0X:0:0:0:0:0:0:129    gatekeeper                        [Toga]
+ FF0X:0:0:0:0:0:0:12A    iberiagames                       [Marocho]
+
+ FF0X:0:0:0:0:0:0:201    "rwho" Group (BSD) (unofficial)   [JBP]
+ FF0X:0:0:0:0:0:0:202    SUN RPC PMAPPROC_CALLIT           [BXE1]
+
+ FF0X:0:0:0:0:0:2:0000
+ -FF0X:0:0:0:0:0:2:7FFD  Multimedia Conference Calls       [SC3]
+ FF0X:0:0:0:0:0:2:7FFE   SAPv1 Announcements               [SC3]
+ FF0X:0:0:0:0:0:2:7FFF   SAPv0 Announcements (deprecated)  [SC3]
+ FF0X:0:0:0:0:0:2:8000
+ -FF0X:0:0:0:0:0:2:FFFF  SAP Dynamic Assignments           [SC3]
+
+5.0 References
+
+ [ADDARCH] Hinden, R., and S. Deering, "IP Version 6 Addressing
+ Architecture", RFC 2373, July 1998.
+
+ [AUTORFC] Thompson, S., and T. Narten, "IPv6 Stateless Address
+ Autoconfiguration", RFC 1971, August 1996.
+
+ [ETHER] Crawford, M., "Transmission of IPv6 Packets over Ethernet
+ Networks", Work in Progress.
+
+ [RFC1045] Cheriton, D., "VMTP: Versatile Message Transaction Protocol
+ Specification", RFC 1045, February 1988.
+
+ [RFC1075] Waitzman, D., Partridge, C., and S. Deering, "Distance
+ Vector Multicast Routing Protocol", RFC 1075, November
+ 1988.
+
+ [RFC1112] Deering, S., "Host Extensions for IP Multicasting", STD 5,
+ RFC 1112, Stanford University, August 1989.
+
+ [RFC1119] Mills, D., "Network Time Protocol (Version 1),
+ Specification and Implementation", STD 12, RFC 1119, July
+ 1988.
+
+ [RFC1190] Topolcic, C., Editor, "Experimental Internet Stream
+ Protocol, Version 2 (ST-II)", RFC 1190, October 1990.
+
+ [RFC2080] Malkin, G., and R. Minnear, "RIPng for IPv6", RFC 2080,
+ January 1997.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC2165] Veizades, J., Guttman, E., Perkins, C., and S. Kaplan,
+ "Service Location Protocol", RFC 2165, June 1997.
+
+ [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.
+
+6. People
+
+ <arnoldm@microsoft.com>
+
+ [AXC]        Andrew Cherenson <arc@SGI.COM>
+
+ [Braden]     Bob Braden, <braden@isi.edu>, April 1996.
+
+ [Bob Brenner]
+
+ [Bressler]   David J. Bressler, <bressler@tss.com>, April 1996.
+
+ <bloomer@birch.crd.ge.com>
+
+ [Bound]      Jim Bound <bound@zk3.dec.com>
+
+ [BXE1]       Brendan Eich <brendan@illyria.wpd.sgi.com>
+
+ [BXF]        Bruce Factor <ahi!bigapple!bruce@uunet.UU.NET>
+
+ [BXS2]       Bill Schilit <schilit@parc.xerox.com>
+
+ [Casner]     Steve Casner, <casner@isi.edu>, January 1995.
+
+ [CXM3]       Chuck McManis <cmcmanis@sun.com>
+
+ [Tim Clark]
+
+ [DLM1]       David Mills <Mills@HUEY.UDEL.EDU>
+
+ [DRC3]       Dave Cheriton <cheriton@PESCADERO.STANFORD.EDU>
+
+ [DXS3]       Daniel Steinberg <Daniel.Steinberg@Eng.Sun.COM>
+
+ [Farinacci]  Dino Farinacci, <dino@cisco.com>
+
+ [GSM11]      Gary S. Malkin <GMALKIN@XYLOGICS.COM>
+
+ [Harrington] Dan Harrington, <dan@lucent.com>, July 1996.
+
+ <hgxing@aol.com>
+
+ [IANA]       IANA <iana@iana.org>
+
+ [Janssen]    Rob Janssen, <rob@pe1chl.ampr.org>, January 1995.
+
+ [JBP]        Jon Postel <postel@isi.edu>
+
+ [JXM1]       Jim Miner <miner@star.com>
+
+ [Kean]       Brian Kean, <bkean@dca.com>, August 1995.
+
+ [KS14]       <mystery contact>
+
+ [Lee]        Choon Lee, <cwl@nsd.3com.com>, April 1996.
+
+ [Lewis]      Mark Lewis, <Mark_Lewis@ccm.jf.intel.com>, October 1995.
+
+ [Malamud]    Carl Malamud, <carl@radio.com>, January 1996.
+
+ [Andrew Maffei]
+
+ [Marocho]    Jose Luis Marocho, <73374.313@compuserve.com>, July 1996.
+ + [Moy] John Moy <jmoy@casc.com> + + [MXF2] Martin Forssen <maf@dtek.chalmers.se> + + [Perkins] Charlie Perkins, <cperkins@corp.sun.com> + + [Guido van Rossum] + + [SC3] Steve Casner <casner@isi.edu> + + [Simpson] Bill Simpson <bill.simpson@um.cc.umich.edu> November 1994. + + [Joel Snyder] + + [SXA] Susie Armstrong <Armstrong.wbst128@XEROX.COM> + + [SXD] Steve Deering <deering@PARC.XEROX.COM> + + [tynan] Dermot Tynan, <dtynan@claddagh.ie>, August 1995. + + [Toga] Jim Toga, <jtoga@ibeam.jf.intel.com>, May 1996. + + [Uang] Yea Uang <uang@force.decnet.lockheed.com> November 1994. + + [Veizades] John Veizades, <veizades@tgv.com>, May 1995. + + [Yackle] Dotty Yackle, <ditty_yackle@dantz.com>, February 1996. + + + + + + + + +Hinden & Deering Informational [Page 6] + +RFC 2375 IPv6 Multicast Address Assignments July 1998 + + +7.0 Security Considerations + + This document defines the initial assignment of IPv6 multicast + addresses. As such it does not directly impact the security of the + Internet infrastructure or its applications. + +8.0 Authors' Addresses + + Robert M. Hinden + Ipsilon Networks, Inc. + 232 Java Drive + Sunnyvale, CA 94089 + USA + + Phone: +1 415 990 2004 + EMail: hinden@ipsilon.com + + + Stephen E. Deering + Cisco Systems, Inc. + 170 West Tasman Drive + San Jose, CA 95134-1706 + USA + + Phone: +1 408 527-8213 + EMail: deering@cisco.com + + + + + + + + + + + + + + + + + + + + + + + + + +Hinden & Deering Informational [Page 7] + +RFC 2375 IPv6 Multicast Address Assignments July 1998 + + +9.0 Full Copyright Statement + + Copyright (C) The Internet Society (1998). All Rights Reserved. 
+ + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Hinden & Deering Informational [Page 8] + diff --git a/doc/rfc/rfc2418.txt b/doc/rfc/rfc2418.txt new file mode 100644 index 00000000..9bdb2c53 --- /dev/null +++ b/doc/rfc/rfc2418.txt @@ -0,0 +1,1459 @@ + + + + + + +Network Working Group S. 
Bradner +Request for Comments: 2418 Editor +Obsoletes: 1603 Harvard University +BCP: 25 September 1998 +Category: Best Current Practice + + + IETF Working Group + Guidelines and Procedures + +Status of this Memo + + This document specifies an Internet Best Current Practices for the + Internet Community, and requests discussion and suggestions for + improvements. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1998). All Rights Reserved. + +Abstract + + The Internet Engineering Task Force (IETF) has responsibility for + developing and reviewing specifications intended as Internet + Standards. IETF activities are organized into working groups (WGs). + This document describes the guidelines and procedures for formation + and operation of IETF working groups. It also describes the formal + relationship between IETF participants WG and the Internet + Engineering Steering Group (IESG) and the basic duties of IETF + participants, including WG Chairs, WG participants, and IETF Area + Directors. + +Table of Contents + + Abstract ......................................................... 1 + 1. Introduction .................................................. 2 + 1.1. IETF approach to standardization .......................... 4 + 1.2. Roles within a Working Group .............................. 4 + 2. Working group formation ....................................... 4 + 2.1. Criteria for formation .................................... 4 + 2.2. Charter ................................................... 6 + 2.3. Charter review & approval ................................. 8 + 2.4. Birds of a feather (BOF) .................................. 9 + 3. Working Group Operation ....................................... 10 + 3.1. Session planning .......................................... 11 + 3.2. Session venue ............................................. 11 + 3.3. Session management ........................................ 13 + 3.4. 
Contention and appeals .................................... 15 + + + +Bradner Best Current Practice [Page 1] + +RFC 2418 Working Group Guidelines September 1998 + + + 4. Working Group Termination ..................................... 15 + 5. Rechartering a Working Group .................................. 15 + 6. Staff Roles ................................................... 16 + 6.1. WG Chair .................................................. 16 + 6.2. WG Secretary .............................................. 18 + 6.3. Document Editor ........................................... 18 + 6.4. WG Facilitator ............................................ 18 + 6.5. Design teams .............................................. 19 + 6.6. Working Group Consultant .................................. 19 + 6.7. Area Director ............................................. 19 + 7. Working Group Documents ....................................... 19 + 7.1. Session documents ......................................... 19 + 7.2. Internet-Drafts (I-D) ..................................... 19 + 7.3. Request For Comments (RFC) ................................ 20 + 7.4. Working Group Last-Call ................................... 20 + 7.5. Submission of documents ................................... 21 + 8. Review of documents ........................................... 21 + 9. Security Considerations ....................................... 22 + 10. Acknowledgments .............................................. 23 + 11. References ................................................... 23 + 12. Editor's Address ............................................. 23 + Appendix: Sample Working Group Charter .......................... 24 + Full Copyright Statement ......................................... 26 + +1. 
Introduction + + The Internet, a loosely-organized international collaboration of + autonomous, interconnected networks, supports host-to-host + communication through voluntary adherence to open protocols and + procedures defined by Internet Standards. There are also many + isolated interconnected networks, which are not connected to the + global Internet but use the Internet Standards. Internet Standards + are developed in the Internet Engineering Task Force (IETF). This + document defines guidelines and procedures for IETF working groups. + The Internet Standards Process of the IETF is defined in [1]. The + organizations involved in the IETF Standards Process are described in + [2] as are the roles of specific individuals. + + The IETF is a large, open community of network designers, operators, + vendors, users, and researchers concerned with the Internet and the + technology used on it. The primary activities of the IETF are + performed by committees known as working groups. There are currently + more than 100 working groups. (See the IETF web page for an up-to- + date list of IETF Working Groups - http://www.ietf.org.) Working + groups tend to have a narrow focus and a lifetime bounded by the + completion of a specific set of tasks, although there are exceptions. + + + + + +Bradner Best Current Practice [Page 2] + +RFC 2418 Working Group Guidelines September 1998 + + + For management purposes, the IETF working groups are collected + together into areas, with each area having a separate focus. For + example, the security area deals with the development of security- + related technology. Each IETF area is managed by one or two Area + Directors (ADs). There are currently 8 areas in the IETF but the + number changes from time to time. (See the IETF web page for a list + of the current areas, the Area Directors for each area, and a list of + which working groups are assigned to each area.) 
+ + In many areas, the Area Directors have formed an advisory group or + directorate. These comprise experienced members of the IETF and the + technical community represented by the area. The specific name and + the details of the role for each group differ from area to area, but + the primary intent is that these groups assist the Area Director(s), + e.g., with the review of specifications produced in the area. + + The IETF area directors are selected by a nominating committee, which + also selects an overall chair for the IETF. The nominations process + is described in [3]. + + The area directors sitting as a body, along with the IETF Chair, + comprise the Internet Engineering Steering Group (IESG). The IETF + Executive Director is an ex-officio participant of the IESG, as are + the IAB Chair and a designated Internet Architecture Board (IAB) + liaison. The IESG approves IETF Standards and approves the + publication of other IETF documents. (See [1].) + + A small IETF Secretariat provides staff and administrative support + for the operation of the IETF. + + There is no formal membership in the IETF. Participation is open to + all. This participation may be by on-line contribution, attendance + at face-to-face sessions, or both. Anyone from the Internet + community who has the time and interest is urged to participate in + IETF meetings and any of its on-line working group discussions. + Participation is by individual technical contributors, rather than by + formal representatives of organizations. + + This document defines procedures and guidelines for the formation and + operation of working groups in the IETF. It defines the relations of + working groups to other bodies within the IETF. The duties of working + group Chairs and Area Directors with respect to the operation of the + working group are also defined. 
When used in this document the key + words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", + "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be + interpreted as described in RFC 2119 [6]. RFC 2119 defines the use + of these key words to help make the intent of standards track + documents as clear as possible. The same key words are used in this + + + +Bradner Best Current Practice [Page 3] + +RFC 2418 Working Group Guidelines September 1998 + + + document to help smooth WG operation and reduce the chance for + confusion about the processes. + +1.1. IETF approach to standardization + + Familiarity with The Internet Standards Process [1] is essential for + a complete understanding of the philosophy, procedures and guidelines + described in this document. + +1.2. Roles within a Working Group + + The document, "Organizations Involved in the IETF Standards Process" + [2] describes the roles of a number of individuals within a working + group, including the working group chair and the document editor. + These descriptions are expanded later in this document. + +2. Working group formation + + IETF working groups (WGs) are the primary mechanism for development + of IETF specifications and guidelines, many of which are intended to + be standards or recommendations. A working group may be established + at the initiative of an Area Director or it may be initiated by an + individual or group of individuals. Anyone interested in creating an + IETF working group MUST obtain the advice and consent of the IETF + Area Director(s) in whose area the working group would fall and MUST + proceed through the formal steps detailed in this section. + + Working groups are typically created to address a specific problem or + to produce one or more specific deliverables (a guideline, standards + specification, etc.). Working groups are generally expected to be + short-lived in nature. 
Upon completion of its goals and achievement + of its objectives, the working group is terminated. A working group + may also be terminated for other reasons (see section 4). + Alternatively, with the concurrence of the IESG, Area Director, the + WG Chair, and the WG participants, the objectives or assignment of + the working group may be extended by modifying the working group's + charter through a rechartering process (see section 5). + +2.1. Criteria for formation + + When determining whether it is appropriate to create a working group, + the Area Director(s) and the IESG will consider several issues: + + - Are the issues that the working group plans to address clear and + relevant to the Internet community? + + - Are the goals specific and reasonably achievable, and achievable + within a reasonable time frame? + + + +Bradner Best Current Practice [Page 4] + +RFC 2418 Working Group Guidelines September 1998 + + + - What are the risks and urgency of the work, to determine the level + of effort required? + + - Do the working group's activities overlap with those of another + working group? If so, it may still be appropriate to create the + working group, but this question must be considered carefully by + the Area Directors as subdividing efforts often dilutes the + available technical expertise. + + - Is there sufficient interest within the IETF in the working + group's topic with enough people willing to expend the effort to + produce the desired result (e.g., a protocol specification)? + Working groups require considerable effort, including management + of the working group process, editing of working group documents, + and contributing to the document text. 
IETF experience suggests + that these roles typically cannot all be handled by one person; a + minimum of four or five active participants in the management + positions is typically required in addition to a minimum of one + or two dozen people who will attend the working group meetings + and contribute on the mailing list. NOTE: The interest must be + broad enough that a working group would not be seen as merely the + activity of a single vendor. + + - Is there enough expertise within the IETF in the working group's + topic, and are those people interested in contributing in the + working group? + + - Does a base of interested consumers (end-users) appear to exist + for the planned work? Consumer interest can be measured by + participation of end-users within the IETF process, as well as by + less direct means. + + - Does the IETF have a reasonable role to play in the determination + of the technology? There are many Internet-related technologies + that may be interesting to IETF members but in some cases the IETF + may not be in a position to affect the course of the technology in + the "real world". This can happen, for example, if the technology + is being developed by another standards body or an industry + consortium. + + - Are all known intellectual property rights issues relevant to the + proposed working group's efforts understood? + + - Is the proposed work plan an open IETF effort or is it an attempt + to "bless" non-IETF technology where the effect of input from IETF + participants may be limited? + + + + + +Bradner Best Current Practice [Page 5] + +RFC 2418 Working Group Guidelines September 1998 + + + - Is there a good understanding of any existing work that is + relevant to the topics that the proposed working group is to + pursue? This includes work within the IETF and elsewhere. + + - Do the working group's goals overlap with known work in another + standards body, and if so is adequate liaison in place?
+ + Considering the above criteria, the Area Director(s), using his or + her best judgement, will decide whether to pursue the formation of + the group through the chartering process. + +2.2. Charter + + The formation of a working group requires a charter which is + primarily negotiated between a prospective working group Chair and + the relevant Area Director(s), although final approval is made by the + IESG with advice from the Internet Architecture Board (IAB). A + charter is a contract between a working group and the IETF to perform + a set of tasks. A charter: + + 1. Lists relevant administrative information for the working group; + 2. Specifies the direction or objectives of the working group and + describes the approach that will be taken to achieve the goals; + and + 3. Enumerates a set of milestones together with time frames for their + completion. + + When the prospective Chair(s), the Area Director and the IETF + Secretariat are satisfied with the charter form and content, it + becomes the basis for forming a working group. Note that an Area + Director MAY require holding an exploratory Birds of a Feather (BOF) + meeting, as described below, to gauge the level of support for a + working group before submitting the charter to the IESG and IAB for + approval. + + Charters may be renegotiated periodically to reflect the current + status, organization or goals of the working group (see section 5). + Hence, a charter is a contract between the IETF and the working group, + which commits to meeting explicit milestones and delivering + specific "products". + + Specifically, each charter consists of the following sections: + + Working group name + A working group name should be reasonably descriptive or + identifiable. Additionally, the group shall define an acronym + (maximum 8 printable ASCII characters) to reference the group in + the IETF directories, mailing lists, and general documents.
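The acronym rule above (maximum 8 printable ASCII characters) can be checked mechanically. The following is a minimal Python sketch, not part of the RFC. The names MAX_ACRONYM_LENGTH and is_valid_wg_acronym are hypothetical, and the choice to reject the space character and the empty string is an assumption, since the text does not address either case.

```python
# Limit taken from the charter rule quoted above: "maximum 8 printable
# ASCII characters". The constant name is hypothetical.
MAX_ACRONYM_LENGTH = 8


def is_valid_wg_acronym(acronym: str) -> bool:
    """Return True if the acronym fits the stated length and character rule.

    Assumption: "printable ASCII" is read here as the visible range
    0x21-0x7E, so spaces are rejected. An empty acronym is also rejected,
    because it could not reference the group anywhere.
    """
    if not 1 <= len(acronym) <= MAX_ACRONYM_LENGTH:
        return False
    return all(0x21 <= ord(ch) <= 0x7E for ch in acronym)
```

For example, a six-letter lowercase acronym passes the check, while a twelve-letter name or one containing a non-ASCII letter does not.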
+ + + +Bradner Best Current Practice [Page 6] + +RFC 2418 Working Group Guidelines September 1998 + + + Chair(s) + The working group may have one or more Chairs to perform the + administrative functions of the group. The email address(es) of + the Chair(s) shall be included. Generally, a working group is + limited to two chairs. + + Area and Area Director(s) + The name of the IETF area with which the working group is + affiliated and the name and electronic mail address of the + associated Area Director(s). + + Responsible Area Director + The Area Director who acts as the primary IESG contact for the + working group. + + Mailing list + An IETF working group MUST have a general Internet mailing list. + Most of the work of an IETF working group will be conducted on the + mailing list. The working group charter MUST include: + + 1. The address to which a participant sends a subscription request + and the procedures to follow when subscribing, + + 2. The address to which a participant sends submissions and + special procedures, if any, and + + 3. The location of the mailing list archive. A message archive + MUST be maintained in a public place which can be accessed via + FTP or via the web. + + As a service to the community, the IETF Secretariat operates a + mailing list archive for working group mailing lists. In order + to take advantage of this service, working group mailing lists + MUST include the address "wg_acronym-archive@lists.ietf.org" + (where "wg_acronym" is the working group acronym) in the + mailing list in order that a copy of all mailing list messages + be recorded in the Secretariat's archive. Those archives are + located at ftp://ftp.ietf.org/ietf-mail-archive. For + robustness, WGs SHOULD maintain an additional archive separate + from that maintained by the Secretariat. + + Description of working group + The focus and intent of the group shall be set forth briefly. 
By + reading this section alone, an individual should be able to decide + whether this group is relevant to their own work. The first + paragraph must give a brief summary of the problem area, basis, + goal(s) and approach(es) planned for the working group. This + paragraph can be used as an overview of the working group's + + + +Bradner Best Current Practice [Page 7] + +RFC 2418 Working Group Guidelines September 1998 + + + effort. + + To facilitate evaluation of the intended work and to provide on- + going guidance to the working group, the charter must describe the + problem being solved and should discuss objectives and expected + impact with respect to: + + - Architecture + - Operations + - Security + - Network management + - Scaling + - Transition (where applicable) + + Goals and milestones + The working group charter MUST establish a timetable for specific + work items. While this may be renegotiated over time, the list of + milestones and dates facilitates the Area Director's tracking of + working group progress and status, and it is indispensable to + potential participants identifying the critical moments for input. + Milestones shall consist of deliverables that can be qualified as + showing specific achievement; e.g., "Internet-Draft finished" is + fine, but "discuss via email" is not. It is helpful to specify + milestones for every 3-6 months, so that progress can be gauged + easily. This milestone list is expected to be updated + periodically (see section 5). + + An example of a WG charter is included as Appendix A. + +2.3. Charter review & approval + + Proposed working groups often comprise technically competent + participants who are not familiar with the history of Internet + architecture or IETF processes. This can, unfortunately, lead to + good working group consensus about a bad design. To facilitate + working group efforts, an Area Director may assign a Consultant from + among the ranks of senior IETF participants. 
(Consultants are + described in section 6.) At the discretion of the Area Director, + approval of a new WG may be withheld in the absence of sufficient + consultant resources. + + Once the Area Director (and the Area Directorate, as the Area + Director deems appropriate) has approved the working group charter, + the charter is submitted for review by the IAB and approval by the + IESG. After a review period of at least a week the proposed charter + is posted to the IETF-announce mailing list as a public notice that + the formation of the working group is being considered. At the same + time the proposed charter is also posted to the "new-work" mailing + + + +Bradner Best Current Practice [Page 8] + +RFC 2418 Working Group Guidelines September 1998 + + + list. This mailing list has been created to let qualified + representatives from other standards organizations know about pending + IETF working groups. After another review period lasting at least a + week the IESG MAY approve the charter as-is, it MAY request that + changes be made in the charter, or MAY decline to approve chartering + of the working group. + + If the IESG approves the formation of the working group it remands + the approved charter to the IETF Secretariat who records and enters + the information into the IETF tracking database. The working group + is announced to the IETF-announce mailing list by the IETF Secretariat. + +2.4. Birds of a Feather (BOF) + + Often it is not clear whether an issue merits the formation of a + working group. To facilitate exploration of the issues the IETF + offers the possibility of a Birds of a Feather (BOF) session, as well + as the early formation of an email list for preliminary discussion. + In addition, a BOF may serve as a forum for a single presentation or + discussion, without any intent to form a working group. + + A BOF is a session at an IETF meeting which permits "market research" + and technical "brainstorming".
Any individual may request permission + to hold a BOF on a subject. The request MUST be filed with a relevant + Area Director who must approve a BOF before it can be scheduled. The + person who requests the BOF may be asked to serve as Chair of the + BOF. + + The Chair of the BOF is also responsible for providing a report on + the outcome of the BOF. If the Area Director approves, the BOF is + then scheduled by submitting a request to agenda@ietf.org with copies + to the Area Director(s). A BOF description and agenda are required + before a BOF can be scheduled. + + Available time for BOFs is limited, and BOFs are held at the + discretion of the ADs for an area. The AD(s) may require additional + assurances before authorizing a BOF. For example, + + - The Area Director MAY require the establishment of an open email + list prior to authorizing a BOF. This permits initial exchanges + and sharing of framework, vocabulary and approaches, in order to + make the time spent in the BOF more productive. + + - The Area Director MAY require that a BOF be held, prior to + establishing a working group (see section 2.2). + + - The Area Director MAY require that there be a draft of the WG + charter prior to holding a BOF. + + + +Bradner Best Current Practice [Page 9] + +RFC 2418 Working Group Guidelines September 1998 + + + - The Area Director MAY require that a BOF not be held until an + Internet-Draft describing the proposed technology has been + published so it can be used as a basis for discussion in the BOF. + + In general, a BOF on a particular topic is held only once (ONE slot + at one IETF Plenary meeting). Under unusual circumstances Area + Directors may, at their discretion, allow a BOF to meet for a second + time. BOFs are not permitted to meet three times. Note that all + other things being equal, WGs will be given priority for meeting + space over BOFs. Also, occasionally BOFs may be held for other + purposes than to discuss formation of a working group. 
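The limit on repeated BOFs described above (normally one session, a second only at the Area Directors' discretion, never a third) reduces to a small decision rule. The sketch below is illustrative only. The function name bof_may_meet and its parameters are hypothetical, and it covers only the repeat-count rule, not the separate requirement that an Area Director approve every BOF before it is scheduled.

```python
def bof_may_meet(previous_meetings: int, ad_allows_second: bool = False) -> bool:
    """Apply the repeat-count rule for a BOF on one topic.

    previous_meetings: how many times a BOF on this topic has already met.
    ad_allows_second:  whether the Area Directors have, at their
                       discretion, allowed a second session.
    """
    if previous_meetings < 0:
        raise ValueError("previous_meetings cannot be negative")
    if previous_meetings == 0:
        return True               # the normal single BOF slot
    if previous_meetings == 1:
        return ad_allows_second   # unusual circumstances only
    return False                  # BOFs are not permitted to meet three times
```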
+ + Usually the outcome of a BOF will be one of the following: + + - There was enough interest and focus in the subject to warrant the + formation of a WG; + + - While there was a reasonable level of interest expressed in the + BOF, some other criterion for working group formation was not met + (see section 2.1); + + - The discussion came to a fruitful conclusion, with results to be + written down and published, but there is no need to establish + a WG; or + + - There was not enough interest in the subject to warrant the + formation of a WG. + +3. Working Group Operation + + The IETF has basic requirements for open and fair participation and + for thorough consideration of technical alternatives. Within those + constraints, working groups are autonomous and each determines most + of the details of its own operation with respect to session + participation, reaching closure, etc. The core rule for operation is + that acceptance or agreement is achieved via working group "rough + consensus". WG participants should specifically note the + requirements for disclosure of conflicts of interest in [2]. + + A number of procedural questions and issues will arise over time, and + it is the function of the Working Group Chair(s) to manage the group + process, keeping in mind that the overall purpose of the group is to + make progress towards reaching rough consensus in realizing the + working group's goals and objectives. + + There are few hard and fast rules on organizing or conducting working + group activities, but a set of guidelines and practices has evolved + over time that has proven successful. These are listed here, with + + + +Bradner Best Current Practice [Page 10] + +RFC 2418 Working Group Guidelines September 1998 + + + actual choices typically determined by the working group participants + and the Chair(s). + +3.1. Session planning + + For coordinated, structured WG interactions, the Chair(s) MUST + publish a draft agenda well in advance of the actual session.
The + agenda should contain at least: + + - The items for discussion; + - The estimated time necessary per item; and + - A clear indication of what documents the participants will need to + read before the session in order to be well prepared. + + Publication of the working group agenda shall include sending a copy + of the agenda to the working group mailing list and to + agenda@ietf.org. + + All working group actions shall be taken in a public forum, and wide + participation is encouraged. A working group will conduct much of its + business via electronic mail distribution lists but may meet + periodically to discuss and review task status and progress, to + resolve specific issues and to direct future activities. IETF + Plenary meetings are the primary venue for these face-to-face working + group sessions, and it is common (though not required) that active + "interim" face-to-face meetings, telephone conferences, or video + conferences may also be held. Interim meetings are subject to the + same rules for advance notification, reporting, open participation, + and process that apply to other working group meetings. + + All working group sessions (including those held outside of the IETF + meetings) shall be reported by making minutes available. These + minutes should include the agenda for the session, an account of the + discussion including any decisions made, and a list of attendees. The + Working Group Chair is responsible for ensuring that session minutes + are written and distributed, though the actual task may be performed + by someone designated by the Working Group Chair. The minutes shall + be submitted in printable ASCII text for publication in the IETF + Proceedings, and for posting in the IETF Directories and are to be + sent to: minutes@ietf.org + +3.2. Session venue + + Each working group will determine the balance of email and face-to- + face sessions that is appropriate for achieving its milestones.
+ Electronic mail permits the widest participation; face-to-face + meetings often permit better focus and therefore can be more + efficient for reaching a consensus among a core of the working group + + + +Bradner Best Current Practice [Page 11] + +RFC 2418 Working Group Guidelines September 1998 + + + participants. In determining the balance, the WG must ensure that + its process does not serve to exclude contribution by email-only + participants. Decisions reached during a face-to-face meeting about + topics or issues which have not been discussed on the mailing list, + or are significantly different from a previously reached mailing list + consensus, MUST be reviewed on the mailing list. + + IETF Meetings + If a WG needs a session at an IETF meeting, the Chair must apply for + time slots as soon as the first announcement of that IETF meeting is + made by the IETF Secretariat to the WG-chairs list. Session time is + a scarce resource at IETF meetings, so placing requests early will + facilitate schedule coordination for WGs requiring the same set of + experts. + + The application for a WG session at an IETF meeting MUST be made to + the IETF Secretariat at the address agenda@ietf.org. Some Area + Directors may want to coordinate WG sessions in their area and + request that time slots be coordinated through them. If this is the + case it will be noted in the IETF meeting announcement. A WG + scheduling request MUST contain: + + - The working group name and full title; + - The amount of time requested; + - The rough outline of the WG agenda that is expected to be covered; + - The estimated number of people that will attend the WG session; + - Related WGs that should not be scheduled for the same time slot(s); + and + - Optionally a request can be added for the WG session to be + transmitted over the Internet in audio and video.
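The required contents of a WG scheduling request listed above can be modeled as a small record with a completeness check. This is a sketch under stated assumptions, not a format defined by the RFC or used by the IETF Secretariat. The class name SessionRequest, its field names, and the method missing_items are all hypothetical. Treating an empty list of conflicting WGs as acceptable is also an assumption, on the reading that a group may have no conflicts to report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionRequest:
    """Hypothetical record mirroring the scheduling request items above."""
    wg_name: str                     # working group name (acronym)
    wg_title: str                    # full title
    minutes_requested: int           # amount of time requested
    agenda_outline: List[str]        # rough outline of the agenda
    expected_attendees: int          # estimated number of attendees
    conflicting_wgs: List[str] = field(default_factory=list)
    request_av_transmission: bool = False  # the optional audio/video item

    def missing_items(self) -> List[str]:
        """Return descriptions of required items that are absent or empty."""
        problems = []
        if not self.wg_name.strip() or not self.wg_title.strip():
            problems.append("working group name and full title")
        if self.minutes_requested <= 0:
            problems.append("amount of time requested")
        if not self.agenda_outline:
            problems.append("rough outline of the agenda")
        if self.expected_attendees <= 0:
            problems.append("estimated number of attendees")
        # Assumption: an empty conflicting_wgs list means "no conflicts".
        return problems
```

A Chair-side tool could call missing_items() before mailing the request to agenda@ietf.org and refuse to send it while the returned list is non-empty.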
+ + NOTE: While open discussion and contribution are essential to working + group success, the Chair is responsible for ensuring forward + progress. When acceptable to the WG, the Chair may call for + restricted participation (but not restricted attendance!) at IETF + working group sessions for the purpose of achieving progress. The + Working Group Chair then has the authority to refuse to grant the + floor to any individual who is unprepared or otherwise covering + inappropriate material, or who, in the opinion of the Chair, is + disrupting the WG process. The Chair should consult with the Area + Director(s) if the individual persists in disruptive behavior. + + On-line + It can be quite useful to conduct email exchanges in the same manner + as a face-to-face session, with a published schedule and agenda, as + well as on-going summarization and consensus polling. + + + + + +Bradner Best Current Practice [Page 12] + +RFC 2418 Working Group Guidelines September 1998 + + + Many working group participants hold that mailing list discussion is + the best place to consider and resolve issues and make decisions. The + choice of operational style is made by the working group itself. It + is important to note, however, that Internet email discussion is + possible for a much wider base of interested persons than is + attendance at IETF meetings, due to the time and expense required to + attend. + + As with face-to-face sessions, occasionally one or more individuals + may engage in behavior on a mailing list which disrupts the WG's + progress. In these cases the Chair should attempt to discourage the + behavior by communicating directly with the offending individual + rather than on the open mailing list. If the behavior persists then + the Chair must involve the Area Director in the issue.
As a last + resort and after explicit warnings, the Area Director, with the + approval of the IESG, may request that the mailing list maintainer + block the ability of the offending individual to post to the mailing + list. (If the mailing list software permits this type of operation.) + Even if this is done, the individual must not be prevented from + receiving messages posted to the list. Other methods of mailing list + control may be considered but must be approved by the AD(s) and the + IESG. + +3.3. Session management + + Working groups make decisions through a "rough consensus" process. + IETF consensus does not require that all participants agree although + this is, of course, preferred. In general, the dominant view of the + working group shall prevail. (However, it must be noted that + "dominance" is not to be determined on the basis of volume or + persistence, but rather a more general sense of agreement.) Consensus + can be determined by a show of hands, humming, or any other means on + which the WG agrees (by rough consensus, of course). Note that 51% + of the working group does not qualify as "rough consensus" and 99% is + better than rough. It is up to the Chair to determine if rough + consensus has been reached. + + It can be particularly challenging to gauge the level of consensus on + a mailing list. There are two different cases where a working group + may be trying to understand the level of consensus via a mailing list + discussion. But in both cases the volume of messages on a topic is + not, by itself, a good indicator of consensus since one or two + individuals may be generating much of the traffic. + + In the case where a consensus which has been reached during a face- + to-face meeting is being verified on a mailing list the people who + were in the meeting and expressed agreement must be taken into + account. 
If there were 100 people in a meeting and only a few people + + + +Bradner Best Current Practice [Page 13] + +RFC 2418 Working Group Guidelines September 1998 + + + on the mailing list disagree with the consensus of the meeting then + the consensus should be seen as being verified. Note that enough + time should be given to the verification process for the mailing list + readers to understand and consider any objections that may be raised + on the list. The normal two week last-call period should be + sufficient for this. + + The other case is where the discussion has been held entirely over + the mailing list. The determination of the level of consensus may be + harder to do in this case since most people subscribed to mailing + lists do not actively participate in discussions on the list. It is + left to the discretion of the working group chair how to evaluate the + level of consensus. The most common method used is for the working + group chair to state what he or she believes to be the consensus view + and, at the same time, to request comments from the list about the + stated conclusion. + + The challenge to managing working group sessions is to balance the + need for open and fair consideration of the issues against the need + to make forward progress. The working group, as a whole, has the + final responsibility for striking this balance. The Chair has the + responsibility for overseeing the process but may delegate direct + process management to a formally-designated Facilitator. + + It is occasionally appropriate to revisit a topic, to re-evaluate + alternatives or to improve the group's understanding of a relevant + decision. However, unnecessary repeated discussions on issues can be + avoided if the Chair makes sure that the main arguments in the + discussion (and the outcome) are summarized and archived after a + discussion has come to a conclusion.
It is also good practice to note + important decisions/consensus reached by email in the minutes of the + next 'live' session, and to summarize briefly the decision-making + history in the final documents the WG produces. + + To facilitate making forward progress, a Working Group Chair may wish + to decide to reject or defer the input from a member, based upon the + following criteria: + + Old + The input pertains to a topic that already has been resolved and is + redundant with information previously available; + + Minor + The input is new and pertains to a topic that has already been + resolved, but it is felt to be of minor import to the existing + decision; + + + + + +Bradner Best Current Practice [Page 14] + +RFC 2418 Working Group Guidelines September 1998 + + + Timing + The input pertains to a topic that the working group has not yet + opened for discussion; or + + Scope + The input is outside of the scope of the working group charter. + +3.4. Contention and appeals + + Disputes are possible at various stages during the IETF process. As + much as possible the process is designed so that compromises can be + made, and genuine consensus achieved; however, there are times when + even the most reasonable and knowledgeable people are unable to + agree. To achieve the goals of openness and fairness, such conflicts + must be resolved by a process of open review and discussion. + + Formal procedures for requesting a review of WG, Chair, Area Director + or IESG actions and conducting appeals are documented in The Internet + Standards Process [1]. + +4. Working Group Termination + + Working groups are typically chartered to accomplish a specific task + or tasks. After the tasks are complete, the group will be disbanded. 
+ However, if a WG produces a Proposed or Draft Standard, the WG will + frequently become dormant rather than disband (i.e., the WG will no + longer conduct formal activities, but the mailing list will remain + available to review the work as it moves to Draft Standard and + Standard status.) + + If, at some point, it becomes evident that a working group is unable + to complete the work outlined in the charter, or if the assumptions + on which that work was based have been modified in discussion or by + experience, the Area Director, in consultation with the working group, + can either: + + 1. Recharter to refocus its tasks, + 2. Choose new Chair(s), or + 3. Disband. + + If the working group disagrees with the Area Director's choice, it + may appeal to the IESG (see section 3.4). + +5. Rechartering a Working Group + + Updated milestones are renegotiated with the Area Director and the + IESG, as needed, and then are submitted to the IESG Secretariat: + iesg-secretary@ietf.org. + + + +Bradner Best Current Practice [Page 15] + +RFC 2418 Working Group Guidelines September 1998 + + + Rechartering (other than revising milestones) a working group follows + the same procedures that the initial chartering does (see section 2). + The revised charter must be submitted to the IESG and IAB for + approval. As with the initial chartering, the IESG may approve the new + charter as-is, it may request that changes be made in the new charter + (including having the Working Group continue to use the old charter), + or it may decline to approve the rechartered working group. In the + latter case, the working group is disbanded. + +6. Staff Roles + + Working groups require considerable care and feeding. In addition to + general participation, successful working groups benefit from the + efforts of participants filling specific functional roles.
The Area + Director must agree to the specific people performing the WG Chair, + and Working Group Consultant roles, and they serve at the discretion + of the Area Director. + +6.1. WG Chair + + The Working Group Chair is concerned with making forward progress + through a fair and open process, and has wide discretion in the + conduct of WG business. The Chair must ensure that a number of tasks + are performed, either directly or by others assigned to the tasks. + + The Chair has the responsibility and the authority to make decisions, + on behalf of the working group, regarding all matters of working + group process and staffing, in conformance with the rules of the + IETF. The AD has the authority and the responsibility to assist in + making those decisions at the request of the Chair or when + circumstances warrant such an intervention. + + The Chair's responsibility encompasses at least the following: + + Ensure WG process and content management + + The Chair has ultimate responsibility for ensuring that a working + group achieves forward progress and meets its milestones. The + Chair is also responsible to ensure that the working group + operates in an open and fair manner. For some working groups, + this can be accomplished by having the Chair perform all + management-related activities. In other working groups -- + particularly those with large or divisive participation -- it is + helpful to allocate process and/or secretarial functions to other + participants. Process management pertains strictly to the style + of working group interaction and not to its content. It ensures + fairness and detects redundancy. The secretarial function + encompasses document editing. It is quite common for a working + + + +Bradner Best Current Practice [Page 16] + +RFC 2418 Working Group Guidelines September 1998 + + + group to assign the task of specification Editor to one or two + participants. Sometimes, they also are part of the design team, + described below. 
+ + Moderate the WG email list + + The Chair should attempt to ensure that the discussions on this + list are relevant and that they converge to consensus agreements. + The Chair should make sure that discussions on the list are + summarized and that the outcome is well documented (to avoid + repetition). The Chair also may choose to schedule organized on- + line "sessions" with agenda and deliverables. These can be + structured as true meetings, conducted over the course of several + days (to allow participation across the Internet). + + Organize, prepare and chair face-to-face and on-line formal + sessions. + + Plan WG Sessions + + The Chair must plan and announce all WG sessions well in advance + (see section 3.1). + + Communicate results of sessions + + The Chair and/or Secretary must ensure that minutes of a session + are taken and that an attendance list is circulated (see section + 3.1). + + Immediately after a session, the WG Chair MUST provide the Area + Director with a very short report (approximately one paragraph, + via email) on the session. + + Distribute the workload + + Of course, each WG will have participants who may not be able (or + want) to do any work at all. Most of the time the bulk of the work + is done by a few dedicated participants. It is the task of the + Chair to motivate enough experts to allow for a fair distribution + of the workload. + + Document development + + Working groups produce documents and documents need authors. The + Chair must make sure that authors of WG documents incorporate + changes as agreed to by the WG (see section 6.3). + + + + + +Bradner Best Current Practice [Page 17] + +RFC 2418 Working Group Guidelines September 1998 + + + Document publication + + The Chair and/or Document Editor will work with the RFC Editor to + ensure document conformance with RFC publication requirements [5] + and to coordinate any editorial changes suggested by the RFC + Editor. 
A particular concern is that all participants are working + from the same version of a document at the same time. + + Document implementations + + Under the procedures described in [1], the Chair is responsible + for documenting the specific implementations which qualify the + specification for Draft or Internet Standard status along with + documentation about testing of the interoperation of these + implementations. + +6.2. WG Secretary + + Taking minutes and editing working group documents often is performed + by a specifically-designated participant or set of participants. In + this role, the Secretary's job is to record WG decisions, rather than + to perform basic specification. + +6.3. Document Editor + + Most IETF working groups focus their efforts on a document, or set of + documents, that capture the results of the group's work. A working + group generally designates a person or persons to serve as the Editor + for a particular document. The Document Editor is responsible for + ensuring that the contents of the document accurately reflect the + decisions that have been made by the working group. + + As a general practice, the Working Group Chair and Document Editor + positions are filled by different individuals to help ensure that the + resulting documents accurately reflect the consensus of the working + group and that all processes are followed. + +6.4. WG Facilitator + + When meetings tend to become distracted or divisive, it often is + helpful to assign the task of "process management" to one + participant. Their job is to oversee the nature, rather than the + content, of participant interactions. That is, they attend to the + style of the discussion and to the schedule of the agenda, rather + than making direct technical contributions themselves. + + + + + + +Bradner Best Current Practice [Page 18] + +RFC 2418 Working Group Guidelines September 1998 + + +6.5. 
Design teams + + It is often useful, and perhaps inevitable, for a sub-group of a + working group to develop a proposal to solve a particular problem. + Such a sub-group is called a design team. In order for a design team + to remain small and agile, it is acceptable to have closed membership + and private meetings. Design teams may range from an informal chat + between people in a hallway to a formal set of expert volunteers that + the WG chair or AD appoints to attack a controversial problem. The + output of a design team is always subject to approval, rejection or + modification by the WG as a whole. + +6.6. Working Group Consultant + + At the discretion of the Area Director, a Consultant may be assigned + to a working group. Consultants have specific technical background + appropriate to the WG and experience in Internet architecture and + IETF process. + +6.7. Area Director + + Area Directors are responsible for ensuring that working groups in + their area produce coherent, coordinated, architecturally consistent + and timely output as a contribution to the overall results of the + IETF. + +7. Working Group Documents + +7.1. Session documents + + All relevant documents to be discussed at a session should be + published and available as Internet-Drafts at least two weeks before + a session starts. Any document which does not meet this publication + deadline can only be discussed in a working group session with the + specific approval of the working group chair(s). Since it is + important that working group members have adequate time to review all + documents, granting such an exception should only be done under + unusual conditions. The final session agenda should be posted to the + working group mailing list at least two weeks before the session and + sent at that time to agenda@ietf.org for publication on the IETF web + site. + +7.2. 
Internet-Drafts (I-D) + + The Internet-Drafts directory is provided to working groups as a + resource for posting and disseminating in-process copies of working + group documents. This repository is replicated at various locations + around the Internet. It is encouraged that draft documents be posted + + + +Bradner Best Current Practice [Page 19] + +RFC 2418 Working Group Guidelines September 1998 + + + as soon as they become reasonably stable. + + It is stressed here that Internet-Drafts are working documents and + have no official standards status whatsoever. They may, eventually, + turn into a standards-track document or they may sink from sight. + Internet-Drafts are submitted to: internet-drafts@ietf.org + + The format of an Internet-Draft must be the same as for an RFC [2]. + Further, an I-D must contain: + + - Beginning, standard, boilerplate text which is provided by the + Secretariat on their web site and in the ftp directory; + - The I-D filename; and + - The expiration date for the I-D. + + Complete specification of requirements for an Internet-Draft are + found in the file "1id-guidelines.txt" in the Internet-Drafts + directory at an Internet Repository site. The organization of the + Internet-Drafts directory is found in the file "1id-organization" in + the Internet-Drafts directory at an Internet Repository site. This + file also contains the rules for naming Internet-Drafts. (See [1] + for more information about Internet-Drafts.) + +7.3. Request For Comments (RFC) + + The work of an IETF working group often results in publication of one + or more documents, as part of the Request For Comments (RFCs) [1] + series. This series is the archival publication record for the + Internet community. A document can be written by an individual in a + working group, by a group as a whole with a designated Editor, or by + others not involved with the IETF. 
+ + NOTE: The RFC series is a publication mechanism only and publication + does not determine the IETF status of a document. Status is + determined through separate, explicit status labels assigned by the + IESG on behalf of the IETF. In other words, the reader is reminded + that all Internet Standards are published as RFCs, but NOT all RFCs + specify standards [4]. + +7.4. Working Group Last-Call + + When a WG decides that a document is ready for publication it may be + submitted to the IESG for consideration. In most cases the + determination that a WG feels that a document is ready for + publication is done by the WG Chair issuing a working group Last- + Call. The decision to issue a working group Last-Call is at the + discretion of the WG Chair working with the Area Director. A working + group Last-Call serves the same purpose within a working group that + + + +Bradner Best Current Practice [Page 20] + +RFC 2418 Working Group Guidelines September 1998 + + + an IESG Last-Call does in the broader IETF community (see [1]). + +7.5. Submission of documents + + Once a WG has determined that at least rough consensus exists within + the WG for the advancement of a document the following must be done: + + - The version of the relevant document exactly as agreed to by the WG + MUST be in the Internet-Drafts directory. + + - The relevant document MUST be formatted according to section 7.3. + + - The WG Chair MUST send email to the relevant Area Director. A copy + of the request MUST also be sent to the IESG Secretariat. The mail + MUST contain the reference to the document's ID filename, and the + action requested. The copy of the message to the IESG Secretariat + is to ensure that the request gets recorded by the Secretariat so + that they can monitor the progress of the document through the + process. + + Unless returned by the IESG to the WG for further development, + progressing of the document is then the responsibility of the IESG.
+ After IESG approval, responsibility for final disposition is the + joint responsibility of the RFC Editor, the WG Chair and the Document + Editor. + +8. Review of documents + + The IESG reviews all documents submitted for publication as RFCs. + Usually minimal IESG review is necessary in the case of a submission + from a WG intended as an Informational or Experimental RFC. More + extensive review is undertaken in the case of standards-track + documents. + + Prior to the IESG beginning their deliberations on standards-track + documents, IETF Secretariat will issue a "Last-Call" to the IETF + mailing list (see [1]). This Last Call will announce the intention of + the IESG to consider the document, and it will solicit final comments + from the IETF within a period of two weeks. It is important to note + that a Last-Call is intended as a brief, final check with the + Internet community, to make sure that no important concerns have been + missed or misunderstood. The Last-Call should not serve as a more + general, in-depth review. + + The IESG review takes into account responses to the Last-Call and + will lead to one of these possible conclusions: + + + + + +Bradner Best Current Practice [Page 21] + +RFC 2418 Working Group Guidelines September 1998 + + + 1. The document is accepted as is for the status requested. + This fact will be announced by the IETF Secretariat to the IETF + mailing list and to the RFC Editor. + + 2. The document is accepted as-is but not for the status requested. + This fact will be announced by the IETF Secretariat to the IETF + mailing list and to the RFC Editor (see [1] for more details). + + 3. Changes regarding content are suggested to the author(s)/WG. + Suggestions from the IESG must be clear and direct, so as to + facilitate working group and author correction of the + specification. 
If the author(s)/WG can explain to the + satisfaction of the IESG why the changes are not necessary, the + document will be accepted for publication as under point 1, above. + If the changes are made the revised document may be resubmitted + for IESG review. + + 4. Changes are suggested by the IESG and a change in status is + recommended. + The processes described above for 3 and 2 are followed in that + order. + + 5. The document is rejected. + Any document rejection will be accompanied by specific and + thorough arguments from the IESG. Although the IETF and working + group process is structured such that this alternative is not + likely to arise for documents coming from a working group, the + IESG has the right and responsibility to reject documents that the + IESG feels are fatally flawed in some way. + + If any individual or group of individuals feels that the review + treatment has been unfair, there is the opportunity to make a + procedural complaint. The mechanism for this type of complaint is + described in [1]. + +9. Security Considerations + + Documents describing IETF processes, such as this one, do not have an + impact on the security of the network infrastructure or of Internet + applications. + + It should be noted that all IETF working groups are required to + examine and understand the security implications of any technology + they develop. This analysis must be included in any resulting RFCs + in a Security Considerations section. Note that merely noting a + significant security hole is no longer sufficient. IETF-developed + technologies should not add insecurity to the environment in which + they are run. + + + +Bradner Best Current Practice [Page 22] + +RFC 2418 Working Group Guidelines September 1998 + + +10. Acknowledgments + + This revision of this document relies heavily on the previous version + (RFC 1603) which was edited by Erik Huizer and Dave Crocker. It has + been reviewed by the Poisson Working Group. + +11.
References + + [1] Bradner, S., Editor, "The Internet Standards Process -- Revision + 3", BCP 9, RFC 2026, October 1996. + + [2] Hovey, R., and S. Bradner, "The Organizations involved in the + IETF Standards Process", BCP 11, RFC 2028, October 1996. + + [3] Galvin, J., "IAB and IESG Selection, Confirmation, and Recall + Process: Operation of the Nominating and Recall Committees", BCP + 10, RFC 2282, February 1998. + + [4] Huitema, C., J. Postel, S. Crocker, "Not all RFCs are Standards", + RFC 1796, April 1995. + + [5] Postel, J., and J. Reynolds, "Instructions to RFC Authors", RFC + 2223, October 1997. + + [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement + Level", BCP 14, RFC 2119, March 1997. + + +12. Editor's Address + + Scott Bradner + Harvard University + 1350 Mass Ave. + Cambridge MA + 02138 + USA + + Phone +1 617 495 3864 + EMail: sob@harvard.edu + + + + + + + + + + + + +Bradner Best Current Practice [Page 23] + +RFC 2418 Working Group Guidelines September 1998 + + + Appendix: Sample Working Group Charter + + Working Group Name: + IP Telephony (iptel) + + IETF Area: + Transport Area + + Chair(s): + Jonathan Rosenberg <jdrosen@bell-labs.com> + + Transport Area Director(s): + Scott Bradner <sob@harvard.edu> + Allyn Romanow <allyn@mci.net> + + Responsible Area Director: + Allyn Romanow <allyn@mci.net> + + Mailing Lists: + General Discussion: iptel@lists.research.bell-labs.com + To Subscribe: iptel-request@lists.research.bell-labs.com + Archive: http://www.bell-labs.com/mailing-lists/siptel + + Description of Working Group: + + Before Internet telephony can become a widely deployed service, a + number of protocols must be deployed. These include signaling and + capabilities exchange, but also include a number of "peripheral" + protocols for providing related services. + + The primary purpose of this working group is to develop two such + supportive protocols and a framework document. They are: + + 1. Call Processing Syntax.
When a call is set up between two + endpoints, the signaling will generally pass through several servers + (such as an H.323 gatekeeper) which are responsible for forwarding, + redirecting, or proxying the signaling messages. For example, a user + may make a call to j.doe@bigcompany.com. The signaling message to + initiate the call will arrive at some server at bigcompany. This + server can inform the caller that the callee is busy, forward the + call initiation request to another server closer to the user, or drop + the call completely (among other possibilities). It is very desirable + to allow the callee to provide input to this process, guiding the + server in its decision on how to act. This can enable a wide variety + of advanced personal mobility and call agent services. + + + + + + +Bradner Best Current Practice [Page 24] + +RFC 2418 Working Group Guidelines September 1998 + + + Such preferences can be expressed in a call processing syntax, which + can be authored by the user (or generated automatically by some + tool), and then uploaded to the server. The group will develop this + syntax, and specify means of securely transporting and extending it. + The result will be a single standards track RFC. + + 2. In addition, the group will write a service model document, which + describes the services that are enabled by the call processing + syntax, and discusses how the syntax can be used. This document will + result in a single RFC. + + 3. Gateway Attribute Distribution Protocol. When making a call + between an IP host and a PSTN user, a telephony gateway must be used. + The selection of such gateways can be based on many criteria, + including client expressed preferences, service provider preferences, + and availability of gateways, in addition to destination telephone + number.
Since gateways outside of the hosts' administrative domain + might be used, a protocol is required to allow gateways in remote + domains to distribute their attributes (such as PSTN connectivity, + supported codecs, etc.) to entities in other domains which must make + a selection of a gateway. The protocol must allow for scalable, + bandwidth efficient, and very secure transmission of these + attributes. The group will investigate and design a protocol for this + purpose, generate an Internet Draft, and advance it to RFC as + appropriate. + + Goals and Milestones: + + May 98 Issue first Internet-Draft on service framework + Jul 98 Submit framework ID to IESG for publication as an RFC. + Aug 98 Issue first Internet-Draft on Call Processing Syntax + Oct 98 Submit Call processing syntax to IESG for consideration + as a Proposed Standard. + Dec 98 Achieve consensus on basics of gateway attribute + distribution protocol + Jan 99 Submit Gateway Attribute Distribution protocol to IESG + for consideration as a RFC (info, exp, stds track TB + + + + + + + + + + + + + + +Bradner Best Current Practice [Page 25] + +RFC 2418 Working Group Guidelines September 1998 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1998). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. 
However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Bradner Best Current Practice [Page 26] + diff --git a/doc/rfc/rfc2535.txt b/doc/rfc/rfc2535.txt new file mode 100644 index 00000000..fe0b3d07 --- /dev/null +++ b/doc/rfc/rfc2535.txt @@ -0,0 +1,2635 @@ + + + + + + +Network Working Group D. Eastlake +Request for Comments: 2535 IBM +Obsoletes: 2065 March 1999 +Updates: 2181, 1035, 1034 +Category: Standards Track + + Domain Name System Security Extensions + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. 
+ +Abstract + + Extensions to the Domain Name System (DNS) are described that provide + data integrity and authentication to security aware resolvers and + applications through the use of cryptographic digital signatures. + These digital signatures are included in secured zones as resource + records. Security can also be provided through non-security aware + DNS servers in some cases. + + The extensions provide for the storage of authenticated public keys + in the DNS. This storage of keys can support general public key + distribution services as well as DNS security. The stored keys + enable security aware resolvers to learn the authenticating key of + zones in addition to those for which they are initially configured. + Keys associated with DNS names can be retrieved to support other + protocols. Provision is made for a variety of key types and + algorithms. + + In addition, the security extensions provide for the optional + authentication of DNS protocol transactions and requests. + + This document incorporates feedback on RFC 2065 from early + implementers and potential users. + + + + + + + + +Eastlake Standards Track [Page 1] + +RFC 2535 DNS Security Extensions March 1999 + + +Acknowledgments + + The significant contributions and suggestions of the following + persons (in alphabetic order) to DNS security are gratefully + acknowledged: + + James M. Galvin + John Gilmore + Olafur Gudmundsson + Charlie Kaufman + Edward Lewis + Thomas Narten + Radia J. Perlman + Jeffrey I. Schiller + Steven (Xunhua) Wang + Brian Wellington + +Table of Contents + + Abstract...................................................1 + Acknowledgments............................................2 + 1. Overview of Contents....................................4 + 2. 
Overview of the DNS Extensions..........................5 + 2.1 Services Not Provided..................................5 + 2.2 Key Distribution.......................................5 + 2.3 Data Origin Authentication and Integrity...............6 + 2.3.1 The SIG Resource Record..............................7 + 2.3.2 Authenticating Name and Type Non-existence...........7 + 2.3.3 Special Considerations With Time-to-Live.............7 + 2.3.4 Special Considerations at Delegation Points..........8 + 2.3.5 Special Considerations with CNAME....................8 + 2.3.6 Signers Other Than The Zone..........................9 + 2.4 DNS Transaction and Request Authentication.............9 + 3. The KEY Resource Record................................10 + 3.1 KEY RDATA format......................................10 + 3.1.1 Object Types, DNS Names, and Keys...................11 + 3.1.2 The KEY RR Flag Field...............................11 + 3.1.3 The Protocol Octet..................................13 + 3.2 The KEY Algorithm Number Specification................14 + 3.3 Interaction of Flags, Algorithm, and Protocol Bytes...15 + 3.4 Determination of Zone Secure/Unsecured Status.........15 + 3.5 KEY RRs in the Construction of Responses..............17 + 4. 
The SIG Resource Record................................17 + 4.1 SIG RDATA Format......................................17 + 4.1.1 Type Covered Field..................................18 + 4.1.2 Algorithm Number Field..............................18 + 4.1.3 Labels Field........................................18 + 4.1.4 Original TTL Field..................................19 + + + +Eastlake Standards Track [Page 2] + +RFC 2535 DNS Security Extensions March 1999 + + + 4.1.5 Signature Expiration and Inception Fields...........19 + 4.1.6 Key Tag Field.......................................20 + 4.1.7 Signer's Name Field.................................20 + 4.1.8 Signature Field.....................................20 + 4.1.8.1 Calculating Transaction and Request SIGs..........21 + 4.2 SIG RRs in the Construction of Responses..............21 + 4.3 Processing Responses and SIG RRs......................22 + 4.4 Signature Lifetime, Expiration, TTLs, and Validity....23 + 5. Non-existent Names and Types...........................24 + 5.1 The NXT Resource Record...............................24 + 5.2 NXT RDATA Format......................................25 + 5.3 Additional Complexity Due to Wildcards................26 + 5.4 Example...............................................26 + 5.5 Special Considerations at Delegation Points...........27 + 5.6 Zone Transfers........................................27 + 5.6.1 Full Zone Transfers.................................28 + 5.6.2 Incremental Zone Transfers..........................28 + 6. How to Resolve Securely and the AD and CD Bits.........29 + 6.1 The AD and CD Header Bits.............................29 + 6.2 Staticly Configured Keys..............................31 + 6.3 Chaining Through The DNS..............................31 + 6.3.1 Chaining Through KEYs...............................31 + 6.3.2 Conflicting Data....................................33 + 6.4 Secure Time...........................................33 + 7. 
ASCII Representation of Security RRs...................34 + 7.1 Presentation of KEY RRs...............................34 + 7.2 Presentation of SIG RRs...............................35 + 7.3 Presentation of NXT RRs...............................36 + 8. Canonical Form and Order of Resource Records...........36 + 8.1 Canonical RR Form.....................................36 + 8.2 Canonical DNS Name Order..............................37 + 8.3 Canonical RR Ordering Within An RRset.................37 + 8.4 Canonical Ordering of RR Types........................37 + 9. Conformance............................................37 + 9.1 Server Conformance....................................37 + 9.2 Resolver Conformance..................................38 + 10. Security Considerations...............................38 + 11. IANA Considerations...................................39 + References................................................39 + Author's Address..........................................41 + Appendix A: Base 64 Encoding..............................42 + Appendix B: Changes from RFC 2065.........................44 + Appendix C: Key Tag Calculation...........................46 + Full Copyright Statement..................................47 + + + + + + + +Eastlake Standards Track [Page 3] + +RFC 2535 DNS Security Extensions March 1999 + + +1. Overview of Contents + + This document standardizes extensions of the Domain Name System (DNS) + protocol to support DNS security and public key distribution. It + assumes that the reader is familiar with the Domain Name System, + particularly as described in RFCs 1033, 1034, 1035 and later RFCs. An + earlier version of these extensions appears in RFC 2065. This + replacement for that RFC incorporates early implementation experience + and requests from potential users. + + Section 2 provides an overview of the extensions and the key + distribution, data origin authentication, and transaction and request + security they provide. 
+ + Section 3 discusses the KEY resource record, its structure, and use + in DNS responses. These resource records represent the public keys + of entities named in the DNS and are used for key distribution. + + Section 4 discusses the SIG digital signature resource record, its + structure, and use in DNS responses. These resource records are used + to authenticate other resource records in the DNS and optionally to + authenticate DNS transactions and requests. + + Section 5 discusses the NXT resource record (RR) and its use in DNS + responses including full and incremental zone transfers. The NXT RR + permits authenticated denial of the existence of a name or of an RR + type for an existing name. + + Section 6 discusses how a resolver can be configured with a starting + key or keys and proceed to securely resolve DNS requests. + Interactions between resolvers and servers are discussed for various + combinations of security aware and security non-aware. Two + additional DNS header bits are defined for signaling between + resolvers and servers. + + Section 7 describes the ASCII representation of the security resource + records for use in master files and elsewhere. + + Section 8 defines the canonical form and order of RRs for DNS + security purposes. + + Section 9 defines levels of conformance for resolvers and servers. + + Section 10 provides a few paragraphs on overall security + considerations. + + Section 11 specifies IANA considerations for allocation of additional + values of parameters defined in this document. + + Appendix A gives details of base 64 encoding which is used in the + file representation of some RRs defined in this document. + + Appendix B summarizes changes between this memo and RFC 2065. + + Appendix C specifies how to calculate the simple checksum used as a + key tag in most SIG RRs.
+ + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + +2. Overview of the DNS Extensions + + The Domain Name System (DNS) protocol security extensions provide + three distinct services: key distribution as described in Section 2.2 + below, data origin authentication as described in Section 2.3 below, + and transaction and request authentication, described in Section 2.4 + below. + + Special considerations related to "time to live", CNAMEs, and + delegation points are also discussed in Section 2.3. + +2.1 Services Not Provided + + It is part of the design philosophy of the DNS that the data in it is + public and that the DNS gives the same answers to all inquirers. + Following this philosophy, no attempt has been made to include any + sort of access control lists or other means to differentiate + inquirers. + + No effort has been made to provide for any confidentiality for + queries or responses. (This service may be available via IPSEC [RFC + 2401], TLS, or other security protocols.) + + Protection is not provided against denial of service. + +2.2 Key Distribution + + A resource record format is defined to associate keys with DNS names. + This permits the DNS to be used as a public key distribution + mechanism in support of DNS security itself and other protocols. + + The syntax of a KEY resource record (RR) is described in Section 3. + It includes an algorithm identifier, the actual public key + parameter(s), and a variety of flags including those indicating the + type of entity the key is associated with and/or asserting that there + is no key associated with that entity. 
+ + Under conditions described in Section 3.5, security aware DNS servers + will automatically attempt to return KEY resources as additional + information, along with those resource records actually requested, to + minimize the number of queries needed. + +2.3 Data Origin Authentication and Integrity + + Authentication is provided by associating with resource record sets + (RRsets [RFC 2181]) in the DNS cryptographically generated digital + signatures. Commonly, there will be a single private key that + authenticates an entire zone but there might be multiple keys for + different algorithms, signers, etc. If a security aware resolver + reliably learns a public key of the zone, it can authenticate, for + signed data read from that zone, that it is properly authorized. The + most secure implementation is for the zone private key(s) to be kept + off-line and used to re-sign all of the records in the zone + periodically. However, there are cases, for example dynamic update + [RFCs 2136, 2137], where DNS private keys need to be on-line [RFC + 2541]. + + The data origin authentication key(s) are associated with the zone + and not with the servers that store copies of the data. That means + compromise of a secondary server or, if the key(s) are kept off line, + even the primary server for a zone, will not necessarily affect the + degree of assurance that a resolver has that it can determine whether + data is genuine. + + A resolver could learn a public key of a zone either by reading it + from the DNS or by having it statically configured. To reliably learn + a public key by reading it from the DNS, the key itself must be + signed with a key the resolver trusts. The resolver must be + configured with at least a public key which authenticates one zone as + a starting point.
From there, it can securely read public keys of + other zones, if the intervening zones in the DNS tree are secure and + their signed keys are accessible. + + Adding data origin authentication and integrity requires no change to + the "on-the-wire" DNS protocol beyond the addition of the signature + resource type and the key resource type needed for key distribution. + (Data non-existence authentication also requires the NXT RR as + described in Section 2.3.2.) This service can be supported by existing + resolver and caching server implementations so long as they can + support the additional resource types (see Section 9). The one + exception is that CNAME referrals in a secure zone can not be + authenticated if they are from non-security aware servers (see + Section 2.3.5). + + If signatures are separately retrieved and verified when retrieving + the information they authenticate, there will be more trips to the + server and performance will suffer. Security aware servers mitigate + that degradation by attempting to send the signature(s) needed (see + Section 4.2). + +2.3.1 The SIG Resource Record + + The syntax of a SIG resource record (signature) is described in + Section 4. It cryptographically binds the RRset being signed to the + signer and a validity interval. + + Every name in a secured zone will have associated with it at least + one SIG resource record for each resource type under that name except + for glue address RRs and delegation point NS RRs. A security aware + server will attempt to return, with RRs retrieved, the corresponding + SIGs. If a server is not security aware, the resolver must retrieve + all the SIG records for a name and select the one or ones that sign + the resource record set(s) that resolver is interested in. + +2.3.2 Authenticating Name and Type Non-existence + + The above security mechanism only provides a way to sign existing + RRsets in a zone.
"Data origin" authentication is not obviously + provided for the non-existence of a domain name in a zone or the + non-existence of a type for an existing name. This gap is filled by + the NXT RR which authenticatably asserts a range of non-existent + names in a zone and the non-existence of types for the existing name + just before that range. + + Section 5 below covers the NXT RR. + +2.3.3 Special Considerations With Time-to-Live + + A digital signature will fail to verify if any change has occurred to + the data between the time it was originally signed and the time the + signature is verified. This conflicts with our desire to have the + time-to-live (TTL) field of resource records tick down while they are + cached. + + This could be avoided by leaving the time-to-live out of the digital + signature, but that would allow unscrupulous servers to set + arbitrarily long TTL values undetected. Instead, we include the + "original" TTL in the signature and communicate that data along with + the current TTL. Unscrupulous servers under this scheme can + manipulate the TTL but a security aware resolver will bound the TTL + value it uses at the original signed value. Separately, signatures + include a signature inception time and a signature expiration time. A + + + +Eastlake Standards Track [Page 7] + +RFC 2535 DNS Security Extensions March 1999 + + + resolver that knows the absolute time can determine securely whether + a signature is in effect. It is not possible to rely solely on the + signature expiration as a substitute for the TTL, however, since the + TTL is primarily a database consistency mechanism and non-security + aware servers that depend on TTL must still be supported. + +2.3.4 Special Considerations at Delegation Points + + DNS security would like to view each zone as a unit of data + completely under the control of the zone owner with each entry + (RRset) signed by a special private key held by the zone manager. 
+ But the DNS protocol views the leaf nodes in a zone, which are also + the apex nodes of a subzone (i.e., delegation points), as "really" + belonging to the subzone. These nodes occur in two master files and + might have RRs signed by both the upper and lower zone's keys. A + retrieval could get a mixture of these RRs and SIGs, especially since + one server could be serving both the zone above and below a + delegation point. [RFC 2181] + + There MUST be a zone KEY RR, signed by its superzone, for every + subzone if the superzone is secure. This will normally appear in the + subzone and may also be included in the superzone. But, in the case + of an unsecured subzone which can not or will not be modified to add + any security RRs, a KEY declaring the subzone to be unsecured MUST + appear with the superzone signature in the superzone, if the + superzone is secure. For all but one other RR type the data from the + subzone is more authoritative so only the subzone KEY RR should be + signed in the superzone if it appears there. The NS and any glue + address RRs SHOULD only be signed in the subzone. The SOA and any + other RRs that have the zone name as owner should appear only in the + subzone and thus are signed only there. The NXT RR type is the + exceptional case that will always appear differently and + authoritatively in both the superzone and subzone, if both are + secure, as described in Section 5. + +2.3.5 Special Considerations with CNAME + + There is a problem when security related RRs with the same owner name + as a CNAME RR are retrieved from a non-security-aware server. In + particular, an initial retrieval for the CNAME or any other type may + not retrieve any associated SIG, KEY, or NXT RR. For retrieved types + other than CNAME, it will retrieve that type at the target name of + the CNAME (or chain of CNAMEs) and will also return the CNAME. 
In + particular, a specific retrieval for type SIG will not get the SIG, + if any, at the original CNAME domain name but rather a SIG at the + target name. + + Security aware servers must be used for CNAMEs to be handled securely + in the DNS. + Security aware servers MUST (1) allow KEY, SIG, and NXT RRs along + with CNAME RRs, (2) suppress CNAME processing on retrieval of these + types as well as on retrieval of the type CNAME, and (3) + automatically return SIG RRs authenticating the CNAME or CNAMEs + encountered in resolving a query. This is a change from the previous + DNS standard [RFCs 1034/1035] which prohibited any other RR type at a + node where a CNAME RR was present. + +2.3.6 Signers Other Than The Zone + + There are cases where the signer in a SIG resource record is other + than one of the private key(s) used to authenticate a zone. + + One is for support of dynamic update [RFC 2136] (or future requests + which require secure authentication) where an entity is permitted to + authenticate/update its records [RFC 2137] and the zone is operating + in a mode where the zone key is not on line. The public key of the + entity must be present in the DNS and be signed by a zone level key + but the other RR(s) may be signed with the entity's key. + + A second case is support of transaction and request authentication as + described in Section 2.4. + + In addition, signatures can be included on resource records within + the DNS for use by applications other than DNS. DNS related + signatures authenticate that data originated with the authority of a + zone owner or that a request or transaction originated with the + relevant entity. Other signatures can provide other types of + assurances.
+ +2.4 DNS Transaction and Request Authentication + + The data origin authentication service described above protects + retrieved resource records and the non-existence of resource records + but provides no protection for DNS requests or for message headers. + + If header bits are falsely set by a bad server, there is little that + can be done. However, it is possible to add transaction + authentication. Such authentication means that a resolver can be + sure it is at least getting messages from the server it thinks it + queried and that the response is from the query it sent (i.e., that + these messages have not been diddled in transit). This is + accomplished by optionally adding a special SIG resource record at + the end of the reply which digitally signs the concatenation of the + server's response and the resolver's query. + + + + + +Eastlake Standards Track [Page 9] + +RFC 2535 DNS Security Extensions March 1999 + + + Requests can also be authenticated by including a special SIG RR at + the end of the request. Authenticating requests serves no function + in older DNS servers and requests with a non-empty additional + information section produce error returns or may even be ignored by + many of them. However, this syntax for signing requests is defined as + a way of authenticating secure dynamic update requests [RFC 2137] or + future requests requiring authentication. + + The private keys used in transaction security belong to the entity + composing the reply, not to the zone involved. Request + authentication may also involve the private key of the host or other + entity composing the request or other private keys depending on the + request authority it is sought to establish. The corresponding public + key(s) are normally stored in and retrieved from the DNS for + verification. + + Because requests and replies are highly variable, message + authentication SIGs can not be pre-calculated. 
Thus it will be + necessary to keep the private key on-line, for example in software or + in a directly connected piece of hardware. + +3. The KEY Resource Record + + The KEY resource record (RR) is used to store a public key that is + associated with a Domain Name System (DNS) name. This can be the + public key of a zone, a user, or a host or other end entity. Security + aware DNS implementations MUST be designed to handle at least two + simultaneously valid keys of the same type associated with the same + name. + + The type number for the KEY RR is 25. + + A KEY RR is, like any other RR, authenticated by a SIG RR. KEY RRs + must be signed by a zone level key. + +3.1 KEY RDATA format + + The RDATA for a KEY RR consists of flags, a protocol octet, the + algorithm number octet, and the public key itself. The format is as + follows: +
+                        1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |             flags             |    protocol   |   algorithm   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                                                               /
+   /                          public key                           /
+   /                                                               /
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
+ + The KEY RR is not intended for storage of certificates and a separate + certificate RR has been developed for that purpose, defined in [RFC + 2538]. + + The meanings of the KEY RR owner name, flags, and protocol octet are + described in Sections 3.1.1 through 3.1.3 below. The flags and + algorithm must be examined before any data following the algorithm + octet as they control the existence and format of any following data. + The algorithm and public key fields are described in Section 3.2. + The format of the public key is algorithm dependent.
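The layout above can be illustrated with a short parser. This is a minimal sketch and not part of the RFC or of BIND: the names KeyRdata, parse_key_rdata, key_type, name_type, and signatory are hypothetical, and the code assumes that bit 0 in the diagrams is the most significant bit of the 16-bit flags field, as the flag definitions in Section 3.1.2 suggest. It handles the two cases the text calls out: the "no key" value, where the RR stops after the algorithm octet, and the flag extension bit, which inserts a second 16-bit flag field before the key data.

```python
import struct
from typing import NamedTuple, Optional


class KeyRdata(NamedTuple):
    flags: int                 # 16-bit flags field
    protocol: int              # protocol octet
    algorithm: int             # algorithm number octet
    ext_flags: Optional[int]   # second flag field, present only if bit 3 is set
    public_key: bytes          # algorithm-dependent key material (may be empty)


def key_type(flags: int) -> int:
    """Bits 0-1 (most significant first); 0b11 is the "no key" value."""
    return (flags >> 14) & 0x3


def name_type(flags: int) -> int:
    """Bits 6-7: 0b00 user, 0b01 zone, 0b10 non-zone entity, 0b11 reserved."""
    return (flags >> 8) & 0x3


def signatory(flags: int) -> int:
    """Bits 12-15: the signatory field."""
    return flags & 0xF


def parse_key_rdata(rdata: bytes) -> KeyRdata:
    if len(rdata) < 4:
        raise ValueError("KEY RDATA must hold at least 4 octets")
    # Network byte order: 16-bit flags, then the protocol and algorithm octets.
    flags, protocol, algorithm = struct.unpack("!HBB", rdata[:4])
    rest = rdata[4:]
    if key_type(flags) == 0b11:
        # "No key": the RR stops after the algorithm octet.
        # Assumption: trailing octets, if any, are ignored rather than rejected.
        return KeyRdata(flags, protocol, algorithm, None, b"")
    ext_flags = None
    if flags & 0x1000:  # bit 3, the flag extension bit
        if len(rest) < 2:
            raise ValueError("extension bit set but second flag field missing")
        (ext_flags,) = struct.unpack("!H", rest[:2])
        rest = rest[2:]
    return KeyRdata(flags, protocol, algorithm, ext_flags, rest)


# A zone key (name type 01) for protocol 3 (dnssec), algorithm 3 (DSA),
# followed by three made-up octets standing in for key material.
zone_key = parse_key_rdata(struct.pack("!HBB", 0x0100, 3, 3) + b"\x01\x02\x03")
print(name_type(zone_key.flags), zone_key.protocol, zone_key.algorithm)
```

Examining the flags before touching the remaining octets mirrors the requirement that the flags and algorithm be inspected first, since they control whether any further data exists and how it is laid out.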
+ + KEY RRs do not specify their validity period but their authenticating + SIG RR(s) do as described in Section 4 below. + +3.1.1 Object Types, DNS Names, and Keys + + The public key in a KEY RR is for the object named in the owner name. + + A DNS name may refer to three different categories of things. For + example, foo.host.example could be (1) a zone, (2) a host or other + end entity, or (3) the mapping into a DNS name of the user or + account foo@host.example. Thus, there are flag bits, as described + below, in the KEY RR to indicate with which of these roles the owner + name and public key are associated. Note that an appropriate zone + KEY RR MUST occur at the apex node of a secure zone and zone KEY RRs + occur only at delegation points. + +3.1.2 The KEY RR Flag Field + + In the "flags" field: +
+     0   1   2   3   4   5   6   7   8   9   0   1   2   3   4   5
+   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
+   |  A/C  | Z | XT| Z | Z | NAMTYP| Z | Z | Z | Z |      SIG      |
+   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
+ + Bits 0 and 1 are the key "type" bits whose values have the following + meanings: + + 10: Use of the key is prohibited for authentication. + 01: Use of the key is prohibited for confidentiality. + 00: Use of the key for authentication and/or confidentiality + is permitted. Note that DNS security makes use of keys + for authentication only. Confidentiality use flagging is + provided for use of keys in other protocols. + Implementations not intended to support key distribution + for confidentiality MAY require that the confidentiality + use prohibited bit be on for keys they serve. + 11: If both bits are one, the "no key" value, there is no key + information and the RR stops after the algorithm octet. + By the use of this "no key" value, a signed KEY RR can + authenticatably assert that, for example, a zone is not + secured. See Section 3.4 below.
+ + Bit 2 is reserved and must be zero. + + Bit 3 is reserved as a flag extension bit. If it is a one, a second + 16 bit flag field is added after the algorithm octet and + before the key data. This bit MUST NOT be set unless one or + more such additional bits have been defined and are non-zero. + + Bits 4-5 are reserved and must be zero. + + Bits 6 and 7 form a field that encodes the name type. Field values + have the following meanings: + + 00: indicates that this is a key associated with a "user" or + "account" at an end entity, usually a host. The coding + of the owner name is that used for the responsible + individual mailbox in the SOA and RP RRs: The owner name + is the user name as the name of a node under the entity + name. For example, "j_random_user" on + host.subdomain.example could have a public key associated + through a KEY RR with name + j_random_user.host.subdomain.example. It could be used + in a security protocol where authentication of a user was + desired. This key might be useful in IP or other + security for a user level service such as telnet, ftp, + rlogin, etc. + 01: indicates that this is a zone key for the zone whose name + is the KEY RR owner name. This is the public key used + for the primary DNS security feature of data origin + authentication. Zone KEY RRs occur only at delegation + points. + 10: indicates that this is a key associated with the non-zone + "entity" whose name is the RR owner name. This will + commonly be a host but could, in some parts of the DNS + tree, be some other type of entity such as a telephone + number [RFC 1530] or numeric IP address. This is the + public key used in connection with DNS request and + transaction authentication services. It could also be + used in an IP-security protocol where authentication at + the host, rather than user, level was desired, such as + routing, NTP, etc. + 11: reserved.
+ + Bits 8-11 are reserved and must be zero. + + Bits 12-15 are the "signatory" field. If non-zero, they indicate + that the key can validly sign things as specified in DNS + dynamic update [RFC 2137]. Note that zone keys (see bits + 6 and 7 above) always have authority to sign any RRs in + the zone regardless of the value of the signatory field. + +3.1.3 The Protocol Octet + + It is anticipated that keys stored in DNS will be used in conjunction + with a variety of Internet protocols. It is intended that the + protocol octet and possibly some of the currently unused (must be + zero) bits in the KEY RR flags as specified in the future will be + used to indicate a key's validity for different protocols. + + The following values of the Protocol Octet are reserved as indicated: + + VALUE Protocol + + 0 -reserved + 1 TLS + 2 email + 3 dnssec + 4 IPSEC + 5-254 - available for assignment by IANA + 255 All + + In more detail: + 1 is reserved for use in connection with TLS. + 2 is reserved for use in connection with email. + 3 is used for DNS security. The protocol field SHOULD be set to + this value for zone keys and other keys used in DNS security. + Implementations that can determine that a key is a DNS + security key by the fact that flags label it a zone key or the + signatory flag field is non-zero are NOT REQUIRED to check the + protocol field. + 4 is reserved to refer to the Oakley/IPSEC [RFC 2401] protocol + and indicates that this key is valid for use in conjunction + + + +Eastlake Standards Track [Page 13] + +RFC 2535 DNS Security Extensions March 1999 + + + with that security standard. This key could be used in + connection with secured communication on behalf of an end + entity or user whose name is the owner name of the KEY RR if + the entity or user flag bits are set. The presence of a KEY + resource with this protocol value is an assertion that the + host speaks Oakley/IPSEC. 
+ 255 indicates that the key can be used in connection with any + protocol for which KEY RR protocol octet values have been + defined. The use of this value is discouraged and the use of + different keys for different protocols is encouraged. + +3.2 The KEY Algorithm Number Specification + + This octet is the key algorithm parallel to the same field for the + SIG resource as described in Section 4.1. The following values are + assigned: + + VALUE Algorithm + + 0 - reserved, see Section 11 + 1 RSA/MD5 [RFC 2537] - recommended + 2 Diffie-Hellman [RFC 2539] - optional, key only + 3 DSA [RFC 2536] - MANDATORY + 4 reserved for elliptic curve crypto + 5-251 - available, see Section 11 + 252 reserved for indirect keys + 253 private - domain name (see below) + 254 private - OID (see below) + 255 - reserved, see Section 11 + + Algorithm specific formats and procedures are given in separate + documents. The mandatory to implement for interoperability algorithm + is number 3, DSA. It is recommended that the RSA/MD5 algorithm, + number 1, also be implemented. Algorithm 2 is used to indicate + Diffie-Hellman keys and algorithm 4 is reserved for elliptic curve. + + Algorithm number 252 indicates an indirect key format where the + actual key material is elsewhere. This format is to be defined in a + separate document. + + Algorithm numbers 253 and 254 are reserved for private use and will + never be assigned a specific algorithm. For number 253, the public + key area and the signature begin with a wire encoded domain name. + Only local domain name compression is permitted. The domain name + indicates the private algorithm to use and the remainder of the + public key area is whatever is required by that algorithm. 
For + number 254, the public key area for the KEY RR and the signature + begin with an unsigned length byte followed by a BER encoded Object + + + +Eastlake Standards Track [Page 14] + +RFC 2535 DNS Security Extensions March 1999 + + + Identifier (ISO OID) of that length. The OID indicates the private + algorithm in use and the remainder of the area is whatever is + required by that algorithm. Entities should only use domain names + and OIDs they control to designate their private algorithms. + + Values 0 and 255 are reserved but the value 0 is used in the + algorithm field when that field is not used. An example is in a KEY + RR with the top two flag bits on, the "no-key" value, where no key is + present. + +3.3 Interaction of Flags, Algorithm, and Protocol Bytes + + Various combinations of the no-key type flags, algorithm byte, + protocol byte, and any future assigned protocol indicating flags are + possible. The meaning of these combinations is indicated below: + + NK = no key type (flags bits 0 and 1 on) + AL = algorithm byte + PR = protocols indicated by protocol byte or future assigned flags + + x represents any valid non-zero value(s). + + AL PR NK Meaning + 0 0 0 Illegal, claims key but has bad algorithm field. + 0 0 1 Specifies total lack of security for owner zone. + 0 x 0 Illegal, claims key but has bad algorithm field. + 0 x 1 Specified protocols unsecured, others may be secure. + x 0 0 Gives key but no protocols to use it. + x 0 1 Denies key for specific algorithm. + x x 0 Specifies key for protocols. + x x 1 Algorithm not understood for protocol. + +3.4 Determination of Zone Secure/Unsecured Status + + A zone KEY RR with the "no-key" type field value (both key type flag + bits 0 and 1 on) indicates that the zone named is unsecured while a + zone KEY RR with a key present indicates that the zone named is + secure. The secured versus unsecured status of a zone may vary with + different cryptographic algorithms. 
Even for the same algorithm, + conflicting zone KEY RRs may be present. + + Zone KEY RRs, like all RRs, are only trusted if they are + authenticated by a SIG RR whose signer field is a signer for which + the resolver has a public key it trusts and where resolver policy + permits that signer to sign for the KEY owner name. Untrusted zone + KEY RRs MUST be ignored in determining the security status of the + zone. However, there can be multiple sets of trusted zone KEY RRs + for a zone with different algorithms, signers, etc. + + For any particular algorithm, zones can be (1) secure, indicating + that any retrieved RR must be authenticated by a SIG RR or it will be + discarded as bogus, (2) unsecured, indicating that SIG RRs are not + expected or required for RRs retrieved from the zone, or (3) + experimentally secure, which indicates that SIG RRs might or might + not be present but must be checked if found. The status of a zone is + determined as follows: + + 1. If, for a zone and algorithm, every trusted zone KEY RR for the + zone says there is no key for that zone, it is unsecured for that + algorithm. + + 2. If there is at least one trusted no-key zone KEY RR and one + trusted key specifying zone KEY RR, then that zone is only + experimentally secure for the algorithm. Both authenticated and + non-authenticated RRs for it should be accepted by the resolver. + + 3. If every trusted zone KEY RR that the zone and algorithm has is + key specifying, then it is secure for that algorithm and only + authenticated RRs from it will be accepted. + + Examples: + + (1) A resolver initially trusts only signatures by the superzone of + zone Z within the DNS hierarchy. Thus it will look only at the KEY + RRs that are signed by the superzone. If it finds only no-key KEY + RRs, it will assume the zone is not secure.
If it finds only key
+   specifying KEY RRs, it will assume the zone is secure and reject any
+   unsigned responses. If it finds both, it will assume the zone is
+   experimentally secure.
+
+   (2) A resolver trusts the superzone of zone Z (to which it got
+   securely from its local zone) and a third party, cert-auth.example.
+   When considering data from zone Z, it may be signed by the superzone
+   of Z, by cert-auth.example, by both, or by neither. The following
+   table indicates whether zone Z will be considered secure,
+   experimentally secure, or unsecured, depending on the signed zone KEY
+   RRs for Z:
+
+                     c e r t - a u t h . e x a m p l e
+
+        KEY RRs|   None    |  NoKeys   |  Mixed   |   Keys   |
+     S       --+-----------+-----------+----------+----------+
+     u  None   | illegal   | unsecured | experim. | secure   |
+     p       --+-----------+-----------+----------+----------+
+     e  NoKeys | unsecured | unsecured | experim. | secure   |
+     r       --+-----------+-----------+----------+----------+
+     Z  Mixed  | experim.  | experim.  | experim. | secure   |
+     o       --+-----------+-----------+----------+----------+
+     n  Keys   | secure    | secure    | secure   | secure   |
+     e         +-----------+-----------+----------+----------+
+
+3.5 KEY RRs in the Construction of Responses
+
+   An explicit request for KEY RRs does not cause any special additional
+   information processing except, of course, for the corresponding SIG
+   RR from a security aware server (see Section 4.2).
+
+   Security aware DNS servers include KEY RRs as additional information
+   in responses, where a KEY is available, in the following cases:
+
+   (1) On the retrieval of SOA or NS RRs, the KEY RRset with the same
+   name (perhaps just a zone key) SHOULD be included as additional
+   information if space is available. If not all additional information
+   will fit, type A and AAAA glue RRs have higher priority than KEY
+   RR(s).
+
+   (2) On retrieval of type A or AAAA RRs, the KEY RRset with the same
+   name (usually just a host RR and NOT the zone key (which usually
+   would have a different name)) SHOULD be included if space is
+   available. On inclusion of A or AAAA RRs as additional information,
+   the KEY RRset with the same name should also be included but with
+   lower priority than the A or AAAA RRs.
+
+4. The SIG Resource Record
+
+   The SIG or "signature" resource record (RR) is the fundamental way
+   that data is authenticated in the secure Domain Name System (DNS). As
+   such it is the heart of the security provided.
+
+   The SIG RR unforgeably authenticates an RRset [RFC 2181] of a
+   particular type, class, and name and binds it to a time interval and
+   the signer's domain name. This is done using cryptographic
+   techniques and the signer's private key. The signer is frequently
+   the owner of the zone from which the RR originated.
+
+   The type number for the SIG RR type is 24.
+
+4.1 SIG RDATA Format
+
+   The RDATA portion of a SIG RR is as shown below. The integrity of
+   the RDATA information is protected by the signature field.
+
+                        1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |        type covered           |  algorithm    |     labels    |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                         original TTL                          |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      signature expiration                     |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      signature inception                      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |            key tag            |                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         signer's name         +
+   |                                                               /
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-/
+   /                                                               /
+   /                           signature                           /
+   /                                                               /
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+4.1.1 Type Covered Field
+
+   The "type covered" is the type of the other RRs covered by this SIG.
+
+4.1.2 Algorithm Number Field
+
+   This octet is as described in section 3.2.
+
+4.1.3 Labels Field
+
+   The "labels" octet is an unsigned count of how many labels there are
+   in the original SIG RR owner name not counting the null label for
+   root and not counting any initial "*" for a wildcard. If a secured
+   retrieval is the result of wildcard substitution, it is necessary
+   for the resolver to use the original form of the name in verifying
+   the digital signature. This field makes it easy to determine the
+   original form.
+
+   If, on retrieval, the RR appears to have a longer name than indicated
+   by "labels", the resolver can tell it is the result of wildcard
+   substitution. If the RR owner name appears to be shorter than the
+   labels count, the SIG RR must be considered corrupt and ignored. The
+   maximum number of labels allowed in the current DNS is 127 but the
+   entire octet is reserved and would be required should DNS names ever
+   be expanded to 255 labels.
The following table gives some examples.
+   The value of "labels" is at the top, the retrieved owner name on the
+   left, and the table entry is the name to use in signature
+   verification except that "bad" means the RR is corrupt.
+
+   labels= | 0   | 1    | 2      | 3        | 4        |
+   --------+-----+------+--------+----------+----------+
+          .| .   | bad  | bad    | bad      | bad      |
+         d.| *.  | d.   | bad    | bad      | bad      |
+       c.d.| *.  | *.d. | c.d.   | bad      | bad      |
+     b.c.d.| *.  | *.d. | *.c.d. | b.c.d.   | bad      |
+   a.b.c.d.| *.  | *.d. | *.c.d. | *.b.c.d. | a.b.c.d. |
+
+4.1.4 Original TTL Field
+
+   The "original TTL" field is included in the RDATA portion to avoid
+   (1) authentication problems that caching servers would otherwise
+   cause by decrementing the real TTL field and (2) security problems
+   that unscrupulous servers could otherwise cause by manipulating the
+   real TTL field. This original TTL is protected by the signature
+   while the current TTL field is not.
+
+   NOTE: The "original TTL" must be restored into the covered RRs when
+   the signature is verified (see Section 8). This generally implies
+   that all RRs for a particular type, name, and class, that is, all the
+   RRs in any particular RRset, must have the same TTL to start with.
+
+4.1.5 Signature Expiration and Inception Fields
+
+   The SIG is valid from the "signature inception" time until the
+   "signature expiration" time. Both are unsigned numbers of seconds
+   since the start of 1 January 1970, GMT, ignoring leap seconds. (See
+   also Section 4.4.) Ring arithmetic is used as for DNS SOA serial
+   numbers [RFC 1982] which means that these times can never be more
+   than about 68 years in the past or the future. This means that these
+   times are ambiguous modulo ~136.09 years. However, there is no
+   security flaw because keys are required to be changed to new random
+   keys by [RFC 2541] at least every five years.
This means that the
+   probability that the same key is in use N*136.09 years later should
+   be the same as the probability that a random guess will work.
+
+   A SIG RR may have an expiration time numerically less than the
+   inception time if the expiration time is near the 32 bit wrap around
+   point and/or the signature is long lived.
+
+   (To prevent misordering of network requests to update a zone
+   dynamically, monotonically increasing "signature inception" times may
+   be necessary.)
+
+   A secure zone must be considered changed for SOA serial number
+   purposes not only when its data is updated but also when new SIG RRs
+   are inserted (i.e., the zone or any part of it is re-signed).
+
+4.1.6 Key Tag Field
+
+   The "key tag" is a two octet quantity that is used to efficiently
+   select between multiple keys which may be applicable and thus check
+   that a public key about to be used for the computationally expensive
+   effort to check the signature is possibly valid. For algorithm 1
+   (MD5/RSA) as defined in [RFC 2537], it is the next to the bottom two
+   octets of the public key modulus needed to decode the signature
+   field. That is to say, the most significant 16 of the least
+   significant 24 bits of the modulus in network (big endian) order. For
+   all other algorithms, including private algorithms, it is calculated
+   as a simple checksum of the KEY RR as described in Appendix C.
+
+4.1.7 Signer's Name Field
+
+   The "signer's name" field is the domain name of the signer generating
+   the SIG RR. This is the owner name of the public KEY RR that can be
+   used to verify the signature. It is frequently the zone which
+   contained the RRset being authenticated. Which signers should be
+   authorized to sign what is a significant resolver policy question as
+   discussed in Section 6.
The signer's name may be compressed with
+   standard DNS name compression when being transmitted over the
+   network.
+
+4.1.8 Signature Field
+
+   The actual signature portion of the SIG RR binds the other RDATA
+   fields to the RRset of the "type covered" RRs with that owner name
+   and class. This covered RRset is thereby authenticated. To
+   accomplish this, a data sequence is constructed as follows:
+
+         data = RDATA | RR(s)...
+
+   where "|" is concatenation,
+
+   RDATA is the wire format of all the RDATA fields in the SIG RR itself
+   (including the canonical form of the signer's name) before but not
+   including the signature, and
+
+   RR(s) is the RRset of the RR(s) of the type covered with the same
+   owner name and class as the SIG RR in canonical form and order as
+   defined in Section 8.
+
+   How this data sequence is processed into the signature is algorithm
+   dependent. These algorithm dependent formats and procedures are
+   described in separate documents (Section 3.2).
+
+   SIGs SHOULD NOT be included in a zone for any "meta-type" such as
+   ANY, AXFR, etc. (but see section 5.6.2 with regard to IXFR).
+
+4.1.8.1 Calculating Transaction and Request SIGs
+
+   A response message from a security aware server may optionally
+   contain a special SIG at the end of the additional information
+   section to authenticate the transaction.
+
+   This SIG has a "type covered" field of zero, which is not a valid RR
+   type. It is calculated by using a "data" (see Section 4.1.8) of the
+   entire preceding DNS reply message, including DNS header but not the
+   IP header and before the reply RR counts have been adjusted for the
+   inclusion of any transaction SIG, concatenated with the entire DNS
+   query message that produced this response, including the query's DNS
+   header and any request SIGs but not its IP header.
That is
+
+         data = full response (less transaction SIG) | full query
+
+   Verification of the transaction SIG (which is signed by the server
+   host key, not the zone key) by the requesting resolver shows that the
+   query and response were not tampered with in transit, that the
+   response corresponds to the intended query, and that the response
+   comes from the queried server.
+
+   A DNS request may be optionally signed by including one or more SIGs
+   at the end of the query. Such SIGs are identified by having a "type
+   covered" field of zero. They sign the preceding DNS request message
+   including DNS header but not including the IP header or any request
+   SIGs at the end and before the request RR counts have been adjusted
+   for the inclusions of any request SIG(s).
+
+   WARNING: Request SIGs are unnecessary for any currently defined
+   request other than update [RFC 2136, 2137] and will cause some old
+   DNS servers to give an error return or ignore a query. However, such
+   SIGs may in the future be needed for other requests.
+
+   Except where needed to authenticate an update or similar privileged
+   request, servers are not required to check request SIGs.
+
+4.2 SIG RRs in the Construction of Responses
+
+   Security aware DNS servers SHOULD, for every authenticated RRset the
+   query will return, attempt to send the available SIG RRs which
+   authenticate the requested RRset. The following rules apply to the
+   inclusion of SIG RRs in responses:
+
+   1. When an RRset is placed in a response, its SIG RR has a higher
+      priority for inclusion than additional RRs that may need to be
+      included. If space does not permit its inclusion, the response
+      MUST be considered truncated except as provided in 2 below.
+
+   2.
When a SIG RR is present in the zone for an additional
+      information section RR, the response MUST NOT be considered
+      truncated merely because space does not permit the inclusion of
+      the SIG RR with the additional information.
+
+   3. SIGs to authenticate glue records and NS RRs for subzones at a
+      delegation point are unnecessary and MUST NOT be sent.
+
+   4. If a SIG covers any RR that would be in the answer section of
+      the response, its automatic inclusion MUST be in the answer
+      section. If it covers an RR that would appear in the authority
+      section, its automatic inclusion MUST be in the authority
+      section. If it covers an RR that would appear in the additional
+      information section it MUST appear in the additional information
+      section. This is a change in the existing standard [RFCs 1034,
+      1035] which contemplates only NS and SOA RRs in the authority
+      section.
+
+   5. Optionally, DNS transactions may be authenticated by a SIG RR at
+      the end of the response in the additional information section
+      (Section 4.1.8.1). Such SIG RRs are signed by the DNS server
+      originating the response. Although the signer field MUST be a
+      name of the originating server host, the owner name, class, TTL,
+      and original TTL, are meaningless. The class and TTL fields
+      SHOULD be zero. To conserve space, the owner name SHOULD be
+      root (a single zero octet). If transaction authentication is
+      desired, that SIG RR must be considered the highest priority for
+      inclusion.
+
+4.3 Processing Responses and SIG RRs
+
+   The following rules apply to the processing of SIG RRs included in a
+   response:
+
+   1. A security aware resolver that receives a response from a
+      security aware server via a secure communication with the AD bit
+      (see Section 6.1) set, MAY choose to accept the RRs as received
+      without verifying the zone SIG RRs.
+
+   2. In other cases, a security aware resolver SHOULD verify the SIG
+      RRs for the RRs of interest.
This may involve initiating
+      additional queries for SIG or KEY RRs, especially in the case of
+      getting a response from a server that does not implement
+      security. (As explained in 2.3.5 above, it will not be possible
+      to secure CNAMEs being served up by non-secure resolvers.)
+
+      NOTE: Implementers might expect the above SHOULD to be a MUST.
+      However, local policy or the calling application may not require
+      the security services.
+
+   3. If SIG RRs are received in response to a user query explicitly
+      specifying the SIG type, no special processing is required.
+
+   If the message does not pass integrity checks or the SIG does not
+   check against the signed RRs, the SIG RR is invalid and should be
+   ignored. If all of the SIG RR(s) purporting to authenticate an RRset
+   are invalid, then the RRset is not authenticated.
+
+   If the SIG RR is the last RR in a response in the additional
+   information section and has a type covered of zero, it is a
+   transaction signature of the response and the query that produced the
+   response. It MAY be optionally checked and the message rejected if
+   the checks fail. But even if the checks succeed, such a transaction
+   authentication SIG does NOT directly authenticate any RRs in the
+   message. Only a proper SIG RR signed by the zone or a key tracing
+   its authority to the zone or to static resolver configuration can
+   directly authenticate RRs, depending on resolver policy (see Section
+   6). If a resolver does not implement transaction and/or request
+   SIGs, it MUST ignore them without error.
+
+   If all checks indicate that the SIG RR is valid then RRs verified by
+   it should be considered authenticated.
+
+4.4 Signature Lifetime, Expiration, TTLs, and Validity
+
+   Security aware servers MUST NOT consider SIG RRs to authenticate
+   anything before their signature inception or after their expiration
+   time (see also Section 6).
Security aware servers MUST NOT consider
+   any RR to be authenticated after all its signatures have expired.
+   When a secure server caches authenticated data, if the TTL would
+   expire at a time further in the future than the authentication
+   expiration time, the server SHOULD trim the TTL in the cache entry
+   so that it does not extend beyond the authentication expiration
+   time. Within
+   these constraints, servers should continue to follow DNS TTL aging.
+   Thus authoritative servers should continue to follow the zone refresh
+   and expire parameters and a non-authoritative server should count
+   down the TTL and discard RRs when the TTL is zero (even for a SIG
+   that has not yet reached its authentication expiration time). In
+   addition, when RRs are transmitted in a query response, the TTL
+   should be trimmed so that current time plus the TTL does not extend
+   beyond the authentication expiration time. Thus, in general, the TTL
+   on a transmitted RR would be
+
+         min(authExpTim,max(zoneMinTTL,min(originalTTL,currentTTL)))
+
+   When signatures are generated, signature expiration times should be
+   set far enough in the future that it is quite certain that new
+   signatures can be generated before the old ones expire. However,
+   setting expiration too far into the future could mean a long time to
+   flush any bad data or signatures that may have been generated.
+
+   It is recommended that signature lifetime be a small multiple of the
+   TTL (i.e., 4 to 16 times the TTL) but not less than a reasonable
+   maximum re-signing interval and not less than the zone expiry time.
+
+5. Non-existent Names and Types
+
+   The SIG RR mechanism described in Section 4 above provides strong
+   authentication of RRs that exist in a zone. But it is not clear
+   above how to verifiably deny the existence of a name in a zone or a
+   type for an existent name.
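
The TTL expression given in Section 4.4 can be sketched in a few lines. This is only an illustration under stated assumptions: authExpTim is read here as the number of seconds remaining until the authentication expiration time, the function and parameter names are hypothetical, and the ring arithmetic of Section 4.1.5 is ignored by treating times as plain integers.

```python
def seconds_until_expiration(sig_expiration: int, now: int) -> int:
    """Seconds left before the signature expires (assumption: plain
    integer times, no 32-bit wrap-around handling)."""
    return max(0, sig_expiration - now)


def transmitted_ttl(auth_exp_remaining: int, zone_min_ttl: int,
                    original_ttl: int, current_ttl: int) -> int:
    """min(authExpTim, max(zoneMinTTL, min(originalTTL, currentTTL)))"""
    return min(auth_exp_remaining,
               max(zone_min_ttl, min(original_ttl, current_ttl)))
```

With these definitions, a record whose signature expires in 600 seconds is never sent with a TTL above 600, whatever its original or current TTL.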
+
+   The nonexistence of a name in a zone is indicated by the NXT ("next")
+   RR for a name interval containing the nonexistent name. An NXT RR or
+   RRs and its or their SIG(s) are returned in the authority section,
+   along with the error, if the server is security aware. The same is
+   true for a non-existent type under an existing name except that there
+   is no error indication other than an empty answer section
+   accompanying the NXT(s). This is a change in the existing standard
+   [RFCs 1034/1035] which contemplates only NS and SOA RRs in the
+   authority section. NXT RRs will also be returned if an explicit query
+   is made for the NXT type.
+
+   The existence of a complete set of NXT records in a zone means that
+   any query for any name and any type to a security aware server
+   serving the zone will result in a reply containing at least one
+   signed RR unless it is a query for delegation point NS or glue A or
+   AAAA RRs.
+
+5.1 The NXT Resource Record
+
+   The NXT resource record is used to securely indicate that RRs with an
+   owner name in a certain name interval do not exist in a zone and to
+   indicate what RR types are present for an existing name.
+
+   The owner name of the NXT RR is an existing name in the zone. Its
+   RDATA is a "next" name and a type bit map. Thus the NXT RRs in a zone
+   create a chain of all of the literal owner names in that zone,
+   including unexpanded wildcards but omitting the owner name of glue
+   address records unless they would otherwise be included. This implies
+   a canonical ordering of all domain names in a zone as described in
+   Section 8. The presence of the NXT RR means that no name between its
+   owner name and the name in its RDATA area exists and that no other
+   types exist under its owner name.
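
The interval test implied by an NXT RR can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names are hypothetical, the ordering is assumed to compare lower-cased labels from the rightmost label leftward with a missing label sorting first, escaped dots inside labels are ignored, and the queried name is assumed to lie inside the zone. The wrap-around branch follows this document's treatment of the name space as circular, where the last NXT names the zone itself as its next name.

```python
from typing import Tuple


def canonical_key(name: str) -> Tuple[bytes, ...]:
    """Sort key: lower-cased labels, most significant (rightmost) first."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return tuple(label.encode("ascii") for label in reversed(labels))


def nxt_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if qname falls strictly between owner and next_name."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if o < n:
        return o < q < n
    # Last NXT in the zone: next_name is the zone name, so the interval
    # runs from owner to the end of the zone's name space.
    return q > o
```

For a zone with big, medium, small, and tiny under foo.nil, `nxt_covers("big.foo.nil.", "medium.foo.nil.", "huge.foo.nil.")` is true, so that NXT demonstrates that huge.foo.nil does not exist.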
+
+   There is a potential problem with the last NXT in a zone as it wants
+   to have an owner name which is the last existing name in canonical
+   order, which is easy, but it is not obvious what name to put in its
+   RDATA to indicate the entire remainder of the name space. This is
+   handled by treating the name space as circular and putting the zone
+   name in the RDATA of the last NXT in a zone.
+
+   The NXT RRs for a zone SHOULD be automatically calculated and added
+   to the zone when SIGs are added. The NXT RR's TTL SHOULD NOT exceed
+   the zone minimum TTL.
+
+   The type number for the NXT RR is 30.
+
+   NXT RRs are only signed by zone level keys.
+
+5.2 NXT RDATA Format
+
+   The RDATA for an NXT RR consists simply of a domain name followed by
+   a bit map, as shown below.
+
+                        1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                  next domain name                             /
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                    type bit map                               /
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   The NXT RR type bit map format currently defined is one bit per RR
+   type present for the owner name. A one bit indicates that at least
+   one RR of that type is present for the owner name. A zero indicates
+   that no such RR is present. All bits not specified because they are
+   beyond the end of the bit map are assumed to be zero. Note that bit
+   30, for NXT, will always be on so the minimum bit map length is
+   actually four octets. Trailing zero octets are prohibited in this
+   format. The first bit represents RR type zero (an illegal type which
+   can not be present) and so will be zero in this format. This format
+   is not used if there exists an RR with a type number greater than
+   127.
If the zero bit of the type bit map is a one, it indicates that
+   a different format is being used which will always be the case if a
+   type number greater than 127 is present.
+
+   The domain name may be compressed with standard DNS name compression
+   when being transmitted over the network. The size of the bit map can
+   be inferred from the RDLENGTH and the length of the next domain name.
+
+5.3 Additional Complexity Due to Wildcards
+
+   Proving that a non-existent name response is correct or that a
+   wildcard expansion response is correct makes things a little more
+   complex.
+
+   In particular, when a non-existent name response is returned, an NXT
+   must be returned showing that the exact name queried did not exist
+   and, in general, one or more additional NXTs need to be returned to
+   also prove that there wasn't a wildcard whose expansion should have
+   been returned. (There is no need to return multiple copies of the
+   same NXT.) These NXTs, if any, are returned in the authority section
+   of the response.
+
+   Furthermore, if a wildcard expansion is returned in a response, in
+   general one or more NXTs also need to be returned in the authority
+   section to prove that no more specific name (including possibly more
+   specific wildcards in the zone) existed on which the response should
+   have been based.
+
+5.4 Example
+
+   Assume zone foo.nil has entries for
+
+         big.foo.nil.
+         medium.foo.nil.
+         small.foo.nil.
+         tiny.foo.nil.
+
+   Then a query to a security aware server for huge.foo.nil would
+   produce an error reply with an RCODE of NXDOMAIN and the authority
+   section data including something like the following:
+
+   foo.nil.  NXT big.foo.nil. NS KEY SOA NXT ;prove no *.foo.nil
+   foo.nil.  SIG NXT 1 2 ( ;type-cov=NXT, alg=1, labels=2
+                19970102030405 ;signature expiration
+                19961211100908 ;signature inception
+                2143           ;key identifier
+                foo.nil.
;signer
+    AIYADP8d3zYNyQwW2EM4wXVFdslEJcUx/fxkfBeH1El4ixPFhpfHFElxbvKoWmvjDTCm
+    fiYy2X+8XpFjwICHc398kzWsTMKlxovpz2FnCTM= ;signature (640 bits)
+                      )
+   big.foo.nil. NXT medium.foo.nil. A MX SIG NXT ;prove no huge.foo.nil
+   big.foo.nil. SIG NXT 1 3 ( ;type-cov=NXT, alg=1, labels=3
+                19970102030405 ;signature expiration
+                19961211100908 ;signature inception
+                2143           ;key identifier
+                foo.nil.       ;signer
+    MxFcby9k/yvedMfQgKzhH5er0Mu/vILz45IkskceFGgiWCn/GxHhai6VAuHAoNUz4YoU
+    1tVfSCSqQYn6//11U6Nld80jEeC8aTrO+KKmCaY= ;signature (640 bits)
+                      )
+   Note that this response implies that big.foo.nil is an existing name
+   in the zone and thus has other RR types associated with it than NXT.
+   However, only the NXT (and its SIG) RR appear in the response to this
+   query for huge.foo.nil, which is a non-existent name.
+
+5.5 Special Considerations at Delegation Points
+
+   A name (other than root) which is the head of a zone also appears as
+   the leaf in a superzone. If both are secure, there will always be
+   two different NXT RRs with the same name. They can be easily
+   distinguished by their signers, the next domain name fields, the
+   presence of the SOA type bit, etc. Security aware servers should
+   return the correct NXT automatically when required to authenticate
+   the non-existence of a name and both NXTs, if available, on explicit
+   query for type NXT.
+
+   Non-security aware servers will never automatically return an NXT and
+   some old implementations may only return the NXT from the subzone on
+   explicit queries.
+
+5.6 Zone Transfers
+
+   The subsections below describe how full and incremental zone
+   transfers are secured.
+
+   SIG RRs secure all authoritative RRs transferred for both full and
+   incremental [RFC 1995] zone transfers.
NXT RRs are an essential
+   element in secure zone transfers and assure that every authoritative
+   name and type will be present; however, if there are multiple SIGs
+   with the same name and type covered, a subset of the SIGs could be
+   sent as long as at least one is present and, in the case of unsigned
+   delegation point NS or glue A or AAAA RRs a subset of these RRs or
+   simply a modified set could be sent as long as at least one of each
+   type is included.
+
+   When an incremental or full zone transfer request is received with
+   the same or newer version number than that of the server's copy of
+   the zone, it is replied to with just the SOA RR of the server's
+   current version and the SIG RRset verifying that SOA RR.
+
+   The complete NXT chains specified in this document enable a resolver
+   to obtain, by successive queries chaining through NXTs, all of the
+   names in a zone even if zone transfers are prohibited. Different
+   format NXTs may be specified in the future to avoid this.
+
+5.6.1 Full Zone Transfers
+
+   To provide server authentication that a complete transfer has
+   occurred, transaction authentication SHOULD be used on full zone
+   transfers. This provides strong server based protection for the
+   entire zone in transit.
+
+5.6.2 Incremental Zone Transfers
+
+   Individual RRs in an incremental (IXFR) transfer [RFC 1995] can be
+   verified in the same way as for a full zone transfer and the
+   integrity of the NXT name chain and correctness of the NXT type bits
+   for the zone after the incremental RR deletes and adds can check each
+   disjoint area of the zone updated. But the completeness of an
+   incremental transfer can not be confirmed because usually neither the
+   deleted RR section nor the added RR section has a complete zone NXT
+   chain.
As a result, a server which securely supports IXFR must
+   handle IXFR SIG RRs for each incremental transfer set that it
+   maintains.
+
+   The IXFR SIG is calculated over the incremental zone update
+   collection of RRs in the order in which it is transmitted: old SOA,
+   then deleted RRs, then new SOA and added RRs. Within each section,
+   RRs must be ordered as specified in Section 8. If condensation of
+   adjacent incremental update sets is done by the zone owner, the
+   original IXFR SIG for each set included in the condensation must be
+   discarded and a new IXFR SIG calculated to cover the resulting
+   condensed set.
+
+   The IXFR SIG really belongs to the zone as a whole, not to the zone
+   name. Although it SHOULD be correct for the zone name, the labels
+   field of an IXFR SIG is otherwise meaningless. The IXFR SIG is only
+   sent as part of an incremental zone transfer. After validation of
+   the IXFR SIG, the transferred RRs MAY be considered valid without
+   verification of the internal SIGs if such trust in the server
+   conforms to local policy.
+
+6. How to Resolve Securely and the AD and CD Bits
+
+   Retrieving or resolving secure data from the Domain Name System (DNS)
+   involves starting with one or more trusted public keys that have been
+   statically configured at the resolver. With starting trusted keys, a
+   resolver willing to perform cryptography can progress securely
+   through the secure DNS structure to the zone of interest as described
+   in Section 6.3. Such trusted public keys would normally be configured
+   in a manner similar to that described in Section 6.2. However, as a
+   practical matter, a security aware resolver would still gain some
+   confidence in the results it returns even if it was not configured
+   with any keys but trusted what it got from a local well known server
+   as if it were statically configured.
+
+   Data stored at a security aware server needs to be internally
+   categorized as Authenticated, Pending, or Insecure. There is also a
+   fourth transient state of Bad which indicates that all SIG checks
+   have explicitly failed on the data. Such Bad data is not retained at
+   a security aware server. Authenticated means that the data has a
+   valid SIG under a KEY traceable via a chain of zero or more SIG and
+   KEY RRs allowed by the resolver's policies to a KEY statically
+   configured at the resolver. Pending data has no authenticated SIGs
+   and at least one additional SIG the resolver is still trying to
+   authenticate. Insecure data is data which it is known can never be
+   either Authenticated or found Bad in the zone where it was found
+   because it is in or has been reached via an unsecured zone or because
+   it is unsigned glue address or delegation point NS data. Behavior in
+   terms of control of and flagging based on such data labels is
+   described in Section 6.1.
+
+   The proper validation of signatures requires a reasonably secure
+   shared opinion of the absolute time between resolvers and servers as
+   described in Section 6.4.
+
+6.1 The AD and CD Header Bits
+
+   Two previously unused bits are allocated out of the DNS
+   query/response format header. The AD (authentic data) bit indicates
+   in a response that all the data included in the answer and authority
+   portion of the response has been authenticated by the server
+   according to the policies of that server. The CD (checking disabled)
+   bit indicates in a query that Pending (non-authenticated) data is
+   acceptable to the resolver sending the query.
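
A minimal sketch of reading and setting these two bits in a raw DNS message follows. It assumes the bit positions this section allocates from the former Z field: AD at bit 10 and CD at bit 11 of the second 16-bit header word, counting from the most significant bit. The function names are hypothetical and the code touches only the 12-octet header.

```python
import struct

# Bit n (0 = most significant) of a 16-bit word has mask 1 << (15 - n).
AD_MASK = 1 << (15 - 10)  # authentic data, 0x0020
CD_MASK = 1 << (15 - 11)  # checking disabled, 0x0010


def header_flags(message: bytes) -> int:
    """Return the 16-bit flags word that follows the ID field."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 octets long")
    _ident, flags = struct.unpack("!HH", message[:4])
    return flags


def has_ad(message: bytes) -> bool:
    """True if a response carries the AD bit."""
    return bool(header_flags(message) & AD_MASK)


def set_cd(message: bytes) -> bytes:
    """Return a copy of a query with the CD bit asserted."""
    ident, flags = struct.unpack("!HH", message[:4])
    return struct.pack("!HH", ident, flags | CD_MASK) + message[4:]
```

A resolver that does its own cryptographic checking could pass each outgoing query through `set_cd`, and should consult `has_ad` only for servers it trusts and reaches over a secure path.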
+
+   These bits are allocated from the previously must-be-zero Z field as
+   follows:
+
+                                    1  1  1  1  1  1
+      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+    |                      ID                       |
+    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+    |QR|   Opcode  |AA|TC|RD|RA| Z|AD|CD|   RCODE   |
+    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+    |                    QDCOUNT                    |
+    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+    |                    ANCOUNT                    |
+    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+    |                    NSCOUNT                    |
+    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+    |                    ARCOUNT                    |
+    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
+
+   These bits are zero in old servers and resolvers. Thus the responses
+   of old servers are not flagged as authenticated to security aware
+   resolvers and queries from non-security aware resolvers do not assert
+   the checking disabled bit and thus will be answered by security aware
+   servers only with Authenticated or Insecure data. Security aware
+   resolvers MUST NOT trust the AD bit unless they trust the server they
+   are talking to and either have a secure path to it or use DNS
+   transaction security.
+
+   Any security aware resolver willing to do cryptography SHOULD assert
+   the CD bit on all queries to permit it to impose its own policies and
+   to reduce DNS latency time by allowing security aware servers to
+   answer with Pending data.
+
+   Security aware servers MUST NOT return Bad data. For non-security
+   aware resolvers or security aware resolvers requesting service by
+   having the CD bit clear, security aware servers MUST return only
+   Authenticated or Insecure data in the answer and authority sections
+   with the AD bit set in the response. Security aware servers SHOULD
+   return Pending data, with the AD bit clear in the response, to
+   security aware resolvers requesting this service by asserting the CD
+   bit in their request.
The AD bit MUST NOT be set on a response + unless all of the RRs in the answer and authority sections of the + response are either Authenticated or Insecure. The AD bit does not + cover the additional information section. + + + + + + + +Eastlake Standards Track [Page 30] + +RFC 2535 DNS Security Extensions March 1999 + + +6.2 Statically Configured Keys + + The public key to authenticate a zone SHOULD be defined in local + configuration files before that zone is loaded at the primary server + so the zone can be authenticated. + + While it might seem logical for everyone to start with a public key + associated with the root zone and statically configure this in every + resolver, this has problems. The logistics of updating every DNS + resolver in the world should this key ever change would be severe. + Furthermore, many organizations will explicitly wish their "interior" + DNS implementations to completely trust only their own DNS servers. + Interior resolvers of such organizations can then go through the + organization's zone servers to access data outside the organization's + domain and need not be configured with keys above the organization's + DNS apex. + + Host resolvers that are not part of a larger organization may be + configured with a key for the domain of their local ISP whose + recursive secure DNS caching server they use. + +6.3 Chaining Through The DNS + + Starting with one or more trusted keys for any zone, it should be + possible to retrieve signed keys for that zone's subzones which have + a key. A secure sub-zone is indicated by a KEY RR with non-null key + information appearing with the NS RRs in the sub-zone and which may + also be present in the parent. These make it possible to descend + within the tree of zones. + +6.3.1 Chaining Through KEYs + + In general, some RRset that you wish to validate in the secure DNS + will be signed by one or more SIG RRs. 
Each of these SIG RRs has a + signer under whose name is stored the public KEY to use in + authenticating the SIG. Each of those KEYs will, generally, also be + signed with a SIG. And those SIGs will have signer names also + referring to KEYs. And so on. As a result, authentication leads to + chains of alternating SIG and KEY RRs with the first SIG signing the + original data whose authenticity is to be shown and the final KEY + being some trusted key statically configured at the resolver performing + the authentication. + + In testing such a chain, the validity periods of the SIGs encountered + must be intersected to determine the validity period of the + authentication of the data, a purely algorithmic process. In + addition, the validation of each SIG over the data with reference to + a KEY must meet the objective cryptographic test implied by the + + + +Eastlake Standards Track [Page 31] + +RFC 2535 DNS Security Extensions March 1999 + + + cryptographic algorithm used (although even here the resolver may + have policies as to trusted algorithms and key lengths). Finally, + the judgement that a SIG with a particular signer name can + authenticate data (possibly a KEY RRset) with a particular owner + name, is primarily a policy question. Ultimately, this is a policy + local to the resolver and any clients that depend on that resolver's + decisions. It is, however, recommended that the policy below be + adopted: + + Let A < B mean that A is a shorter domain name than B formed by + dropping one or more whole labels from the left end of B, i.e., + A is a direct or indirect superdomain of B. Let A = B mean that + A and B are the same domain name (i.e., are identical after + letter case canonicalization). Let A > B mean that A is a + longer domain name than B formed by adding one or more whole + labels on the left end of B, i.e., A is a direct or indirect + subdomain of B. + + Let Static be the owner names of the set of statically configured + trusted keys at a resolver. 
+ + Then Signer is a valid signer name for a SIG authenticating an + RRset (possibly a KEY RRset) with owner name Owner at the + resolver if any of the following three rules applies: + + (1) Owner > or = Signer (except that if Signer is root, Owner + must be root or a top level domain name). That is, Owner is the + same as or a subdomain of Signer. + + (2) ( Owner < Signer ) and ( Signer > or = some Static ). That + is, Owner is a superdomain of Signer and Signer is statically + configured or a subdomain of a statically configured key. + + (3) Signer = some Static. That is, the signer is exactly some + statically configured key. + + Rule 1 is the rule for descending the DNS tree and includes a special + prohibition on the root zone key due to the restriction that the root + zone be only one label deep. This is the most fundamental rule. + + Rule 2 is the rule for ascending the DNS tree from one or more + statically configured keys. Rule 2 has no effect if only root zone + keys are statically configured. + + Rule 3 is a rule permitting direct cross certification. Rule 3 has + no effect if only root zone keys are statically configured. + + + + + +Eastlake Standards Track [Page 32] + +RFC 2535 DNS Security Extensions March 1999 + + + Great care should be taken that the consequences have been fully + considered before making any local policy adjustments to these rules + (other than dispensing with rules 2 and 3 if only root zone keys are + statically configured). + +6.3.2 Conflicting Data + + It is possible that there will be multiple SIG-KEY chains that appear + to authenticate conflicting RRset answers to the same query. A + resolver should choose only the most reliable answer to return and + discard other data. This choice of most reliable is a matter of + local policy which could take into account differing trust in + algorithms, key sizes, statically configured keys, zones traversed, + etc. The technique given below is recommended for taking into + account SIG-KEY chain length. 
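Looking back at the signer-name policy recommended in Section 6.3.1, the three rules can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a reference implementation: names are assumed to be absolute names in text form without a trailing dot and without escaped characters, the root is written ".", and every function name is hypothetical. The rules are applied literally as they are written in Section 6.3.1.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A = B: identical after letter case canonicalization. */
static bool name_eq(const char *a, const char *b)
{
    for (; *a && *b; ++a, ++b)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
    return *a == *b;
}

static bool is_root(const char *n) { return strcmp(n, ".") == 0; }

/* A top level domain name has exactly one label. */
static bool is_tld(const char *n)
{
    return !is_root(n) && strchr(n, '.') == NULL;
}

/* sub > anc: sub is anc with one or more whole labels added on the left. */
static bool name_sub(const char *sub, const char *anc)
{
    size_t ls = strlen(sub), la = strlen(anc);

    if (is_root(sub))
        return false;
    if (is_root(anc))
        return true;              /* every other name lies below the root */
    if (ls <= la + 1)
        return false;             /* no room for a label and a dot */
    return sub[ls - la - 1] == '.' && name_eq(sub + (ls - la), anc);
}

/* Returns true if Signer may authenticate an RRset owned by Owner,
 * given the owner names of the statically configured keys. */
bool signer_allowed(const char *owner, const char *signer,
                    const char *const *statics, size_t nstatics)
{
    bool at_static = false, under_static = false;
    size_t i;

    for (i = 0; i < nstatics; ++i) {
        if (name_eq(signer, statics[i]))
            at_static = true;
        else if (name_sub(signer, statics[i]))
            under_static = true;
    }

    /* Rule 1: Owner >= Signer, with the special case for a root signer. */
    if (name_eq(owner, signer) || name_sub(owner, signer)) {
        if (!is_root(signer) || is_root(owner) || is_tld(owner))
            return true;
    }

    /* Rule 2: Owner < Signer and Signer >= some Static. */
    if (name_sub(signer, owner) && (at_static || under_static))
        return true;

    /* Rule 3: Signer = some Static. */
    return at_static;
}
```

For example, with no statically configured keys, a SIG whose signer is "example" is acceptable for owner "foo.example" under rule 1, while a root signer is not acceptable for that owner.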
+ + A resolver should keep track of the number of successive secure zones + traversed from a statically configured key starting point to any secure + zone it can reach. In general, the lower such a distance number is, + the greater the confidence in the data. Statically configured data + should be given a distance number of zero. If a query encounters + different Authenticated data for the same query with different + distance values, that with a larger value should be ignored unless + some other local policy covers the case. + + A security conscious resolver should completely refuse to step from a + secure zone into an unsecured zone unless the unsecured zone is + certified to be non-secure by the presence of an authenticated KEY RR + for the unsecured zone with the no-key type value. Otherwise the + resolver is getting bogus or spoofed data. + + If legitimate unsecured zones are encountered in traversing the DNS + tree, then no zone can be trusted as secure that can be reached only + via information from such non-secure zones. Since the unsecured zone + data could have been spoofed, the "secure" zone reached via it could + be counterfeit. The "distance" to data in such zones or zones + reached via such zones could be set to 256 or more as this exceeds + the largest possible distance through secure zones in the DNS. + +6.4 Secure Time + + Coordinated interpretation of the time fields in SIG RRs requires + that reasonably consistent time be available to the hosts + implementing the DNS security extensions. + + A variety of time synchronization protocols exist including the + Network Time Protocol (NTP [RFC 1305, 2030]). If such protocols are + used, they MUST be used securely so that time cannot be spoofed. + + + +Eastlake Standards Track [Page 33] + +RFC 2535 DNS Security Extensions March 1999 + + + Otherwise, for example, a host could get its clock turned back and + might then believe old SIG RRs, and the data they authenticate, which + were valid but are no longer. 
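To make the dependence on the local clock concrete, the following C sketch checks whether the current time falls inside a SIG validity period. It assumes the inception and expiration fields are 32-bit counts of seconds compared with serial number ring arithmetic [RFC 1982], as noted in Appendix B; the function names are hypothetical. A host whose clock has been turned back passes an old value of "now" to such a check, which is how an expired SIG could be accepted.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is at or after b on the 32-bit ring, i.e. the forward
 * distance from b to a is less than 2**31. */
static bool serial_ge(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) < UINT32_C(0x80000000);
}

/* Hypothetical check: is "now" within [inception, expiration]? */
bool sig_time_ok(uint32_t now, uint32_t inception, uint32_t expiration)
{
    return serial_ge(now, inception) && serial_ge(expiration, now);
}
```

Because the comparison is made on the ring, a validity period that spans the 32-bit wrap point is handled the same way as any other.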
+ +7. ASCII Representation of Security RRs + + This section discusses the format for master file and other ASCII + presentation of the three DNS security resource records. + + The algorithm field in KEY and SIG RRs can be represented as either + an unsigned integer or symbolically. The following initial symbols are + defined as indicated: + + Value Symbol + + 001 RSAMD5 + 002 DH + 003 DSA + 004 ECC + 252 INDIRECT + 253 PRIVATEDNS + 254 PRIVATEOID + +7.1 Presentation of KEY RRs + + KEY RRs may appear as single logical lines in a zone data master file + [RFC 1033]. + + The flag field is represented as an unsigned integer or a sequence of + the following mnemonics separated by instances of the vertical bar ("|") + character: + + BIT Mnemonic Explanation + 0-1 key type + NOCONF =1 confidentiality use prohibited + NOAUTH =2 authentication use prohibited + NOKEY =3 no key present + 2 FLAG2 - reserved + 3 EXTEND flags extension + 4 FLAG4 - reserved + 5 FLAG5 - reserved + 6-7 name type + USER =0 (default, may be omitted) + ZONE =1 + HOST =2 (host or other end entity) + NTYP3 - reserved + 8 FLAG8 - reserved + 9 FLAG9 - reserved + + + +Eastlake Standards Track [Page 34] + +RFC 2535 DNS Security Extensions March 1999 + + + 10 FLAG10 - reserved + 11 FLAG11 - reserved + 12-15 signatory field, values 0 to 15 + can be represented by SIG0, SIG1, ... SIG15 + + No flag mnemonic need be present if the bit or field it represents is + zero. + + The protocol octet can be represented as either an unsigned integer + or symbolically. The following initial symbols are defined: + + 000 NONE + 001 TLS + 002 EMAIL + 003 DNSSEC + 004 IPSEC + 255 ALL + + Note that if the type flags field has the NOKEY value, nothing + appears after the algorithm octet. + + The remaining public key portion is represented in base 64 (see + Appendix A) and may be divided up into any number of white space + separated substrings, down to single base 64 digits, which are + concatenated to obtain the full public key. 
These substrings can span + lines using the standard parentheses. + + Note that the public key may have internal sub-fields but these do + not appear in the master file representation. For example, with + algorithm 1 there is a public exponent size, then a public exponent, + and then a modulus. With algorithm 254, there will be an OID size, + an OID, and algorithm dependent information. But in both cases only a + single logical base 64 string will appear in the master file. + +7.2 Presentation of SIG RRs + + A data SIG RR may be represented as a single logical line in a zone + data file [RFC 1033] but there are some special considerations as + described below. (It does not make sense to include a transaction or + request authenticating SIG RR in a file as they are a transient + authentication that covers data including an ephemeral transaction + number and so must be calculated in real time.) + + There is no particular problem with the signer, covered type, and + times. The time fields appear in the form YYYYMMDDHHMMSS where YYYY + is the year, the first MM is the month number (01-12), DD is the day + of the month (01-31), HH is the hour in 24-hour notation (00-23), + the second MM is the minute (00-59), and SS is the second (00-59). + + + +Eastlake Standards Track [Page 35] + +RFC 2535 DNS Security Extensions March 1999 + + + The original TTL field appears as an unsigned integer. + + If the original TTL, which applies to the type signed, is the same as + the TTL of the SIG RR itself, it may be omitted. The date field + which follows it is larger than the maximum possible TTL so there is + no ambiguity. + + The "labels" field appears as an unsigned integer. + + The key tag appears as an unsigned number. + + However, the signature itself can be very long. 
It is the last data + field and is represented in base 64 (see Appendix A) and may be + divided up into any number of white space separated substrings, down + to single base 64 digits, which are concatenated to obtain the full + signature. These substrings can be split between lines using the + standard parentheses. + +7.3 Presentation of NXT RRs + + NXT RRs do not appear in original unsigned zone master files since + they should be derived from the zone as it is being signed. If a + signed file with NXTs added is printed or NXTs are printed by + debugging code, they appear as the next domain name followed by the + RR type present bits as an unsigned integer or sequence of RR + mnemonics. + +8. Canonical Form and Order of Resource Records + + This section specifies, for purposes of domain name system (DNS) + security, the canonical form of resource records (RRs), their name + order, and their overall order. A canonical name order is necessary + to construct the NXT name chain. A canonical form and ordering + within an RRset is necessary in consistently constructing and + verifying SIG RRs. A canonical ordering of types within a name is + required in connection with incremental transfer (Section 5.6.2). + +8.1 Canonical RR Form + + For purposes of DNS security, the canonical form for an RR is the + wire format of the RR with domain names (1) fully expanded (no name + compression via pointers), (2) all domain name letters set to lower + case, (3) owner name wild cards in master file form (no substitution + made for *), and (4) the original TTL substituted for the current + TTL. + + + + + + +Eastlake Standards Track [Page 36] + +RFC 2535 DNS Security Extensions March 1999 + + +8.2 Canonical DNS Name Order + + For purposes of DNS security, the canonical ordering of owner names + is to sort individual labels as unsigned left justified octet strings + where the absence of an octet sorts before a zero value octet and + upper case letters are treated as lower case letters. 
Names in a + zone are sorted by sorting on the highest level label and then, + within those names with the same highest level label by the next + lower label, etc. down to leaf node labels. Within a zone, the zone + name itself always exists and all other names are the zone name with + some prefix of lower level labels. Thus the zone name itself always + sorts first. + + Example: + foo.example + a.foo.example + yljkjljk.a.foo.example + Z.a.foo.example + zABC.a.FOO.EXAMPLE + z.foo.example + *.z.foo.example + \200.z.foo.example + +8.3 Canonical RR Ordering Within An RRset + + Within any particular owner name and type, RRs are sorted by RDATA as + a left justified unsigned octet sequence where the absence of an + octet sorts before the zero octet. + +8.4 Canonical Ordering of RR Types + + When RRs of the same name but different types must be ordered, they + are ordered by type, considering the type to be an unsigned integer, + except that SIG RRs are placed immediately after the type they cover. + Thus, for example, an A record would be put before an MX record + because A is type 1 and MX is type 15 but if both were signed, the + order would be A < SIG(A) < MX < SIG(MX). + +9. Conformance + + Levels of server and resolver conformance are defined below. + +9.1 Server Conformance + + Two levels of server conformance for DNS security are defined as + follows: + + + + + +Eastlake Standards Track [Page 37] + +RFC 2535 DNS Security Extensions March 1999 + + + BASIC: Basic server compliance is the ability to store and retrieve + (including zone transfer) SIG, KEY, and NXT RRs. Any secondary or + caching server for a secure zone MUST have at least basic compliance + and even then some things, such as secure CNAMEs, will not work + without full compliance. 
+ + FULL: Full server compliance adds the following to basic compliance: + (1) ability to read SIG, KEY, and NXT RRs in zone files and (2) + ability, given a zone file and private key, to add appropriate SIG + and NXT RRs, possibly via a separate application, (3) proper + automatic inclusion of SIG, KEY, and NXT RRs in responses, (4) + suppression of CNAME following on retrieval of the security type RRs, + (5) recognize the CD query header bit and set the AD query header + bit, as appropriate, and (6) proper handling of the two NXT RRs at + delegation points. Primary servers for secure zones MUST be fully + compliant and for complete secure operation, all secondary, caching, + and other servers handling the zone SHOULD be fully compliant as + well. + +9.2 Resolver Conformance + + Two levels of resolver compliance (including the resolver portion of + a server) are defined for DNS Security: + + BASIC: A basic compliance resolver can handle SIG, KEY, and NXT RRs + when they are explicitly requested. + + FULL: A fully compliant resolver (1) understands KEY, SIG, and NXT + RRs including verification of SIGs at least for the mandatory + algorithm, (2) maintains appropriate information in its local caches + and database to indicate which RRs have been authenticated and to + what extent they have been authenticated, (3) performs additional + queries as necessary to attempt to obtain KEY, SIG, or NXT RRs when + needed, (4) normally sets the CD query header bit on its queries. + +10. Security Considerations + + This document specifies extensions to the Domain Name System (DNS) + protocol to provide data integrity and data origin authentication, + public key distribution, and optional transaction and request + security. + + It should be noted that, at most, these extensions guarantee the + validity of resource records, including KEY resource records, + retrieved from the DNS. They do not magically solve other security + problems. 
For example, using secure DNS you can have high confidence + in the IP address you retrieve for a host name; however, this does + not stop someone from substituting an unauthorized host at that + + + +Eastlake Standards Track [Page 38] + +RFC 2535 DNS Security Extensions March 1999 + + + address or capturing packets sent to that address and falsely + responding with packets apparently from that address. Any reasonably + complete security system will require the protection of many + additional facets of the Internet beyond DNS. + + The implementation of NXT RRs as described herein enables a resolver + to determine all the names in a zone even if zone transfers are + prohibited (section 5.6). This is an active area of work and may + change. + + A number of precautions in DNS implementation have evolved over the + years to harden the insecure DNS against spoofing. These precautions + should not be abandoned but should be considered to provide + additional protection in case of key compromise in secure DNS. + +11. IANA Considerations + + KEY RR flag bits 2 and 8-11 and all flag extension field bits can be + assigned by IETF consensus as defined in RFC 2434. The remaining + values of the NAMTYP flag field and flag bits 4 and 5 (which could + conceivably become an extension of the NAMTYP field) can only be + assigned by an IETF Standards Action [RFC 2434]. + + Algorithm numbers 5 through 251 are available for assignment should + sufficient reason arise. However, the designation of a new algorithm + could have a major impact on interoperability and requires an IETF + Standards Action [RFC 2434]. The existence of the private algorithm + types 253 and 254 should satisfy most needs for private or proprietary + algorithms. + + Additional values of the Protocol Octet (5-254) can be assigned by + IETF Consensus [RFC 2434]. + + The meaning of the first bit of the NXT RR "type bit map" being a one + can only be assigned by a standards action. 
+ +References + + [RFC 1033] Lottor, M., "Domain Administrators Operations Guide", RFC + 1033, November 1987. + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and + Facilities", STD 13, RFC 1034, November 1987. + + [RFC 1035] Mockapetris, P., "Domain Names - Implementation and + Specifications", STD 13, RFC 1035, November 1987. + + + + + +Eastlake Standards Track [Page 39] + +RFC 2535 DNS Security Extensions March 1999 + + + [RFC 1305] Mills, D., "Network Time Protocol (v3)", RFC 1305, March + 1992. + + [RFC 1530] Malamud, C. and M. Rose, "Principles of Operation for the + TPC.INT Subdomain: General Principles and Policy", RFC + 1530, October 1993. + + [RFC 2401] Kent, S. and R. Atkinson, "Security Architecture for the + Internet Protocol", RFC 2401, November 1998. + + [RFC 1982] Elz, R. and R. Bush, "Serial Number Arithmetic", RFC + 1982, September 1996. + + [RFC 1995] Ohta, M., "Incremental Zone Transfer in DNS", RFC 1995, + August 1996. + + [RFC 2030] Mills, D., "Simple Network Time Protocol (SNTP) Version 4 + for IPv4, IPv6 and OSI", RFC 2030, October 1996. + + [RFC 2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail + Extensions (MIME) Part One: Format of Internet Message + Bodies", RFC 2045, November 1996. + + [RFC 2065] Eastlake, D. and C. Kaufman, "Domain Name System Security + Extensions", RFC 2065, January 1997. + + [RFC 2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC 2136] Vixie, P., Thomson, S., Rekhter, Y. and J. Bound, + "Dynamic Updates in the Domain Name System (DNS UPDATE)", + RFC 2136, April 1997. + + [RFC 2137] Eastlake, D., "Secure Domain Name System Dynamic Update", + RFC 2137, April 1997. + + [RFC 2181] Elz, R. and R. Bush, "Clarifications to the DNS + Specification", RFC 2181, July 1997. + + [RFC 2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an + IANA Considerations Section in RFCs", BCP 26, RFC 2434, + October 1998. 
+ + [RFC 2537] Eastlake, D., "RSA/MD5 KEYs and SIGs in the Domain Name + System (DNS)", RFC 2537, March 1999. + + [RFC 2539] Eastlake, D., "Storage of Diffie-Hellman Keys in the + Domain Name System (DNS)", RFC 2539, March 1999. + + + +Eastlake Standards Track [Page 40] + +RFC 2535 DNS Security Extensions March 1999 + + + [RFC 2536] Eastlake, D., "DSA KEYs and SIGs in the Domain Name + System (DNS)", RFC 2536, March 1999. + + [RFC 2538] Eastlake, D. and O. Gudmundsson, "Storing Certificates in + the Domain Name System", RFC 2538, March 1999. + + [RFC 2541] Eastlake, D., "DNS Operational Security Considerations", + RFC 2541, March 1999. + + [RSA FAQ] - RSADSI Frequently Asked Questions periodic posting. + +Author's Address + + Donald E. Eastlake 3rd + IBM + 65 Shindegan Hill Road + RR #1 + Carmel, NY 10512 + + Phone: +1-914-784-7913 (w) + +1-914-276-2668 (h) + Fax: +1-914-784-3833 (w-fax) + EMail: dee3@us.ibm.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 41] + +RFC 2535 DNS Security Extensions March 1999 + + +Appendix A: Base 64 Encoding + + The following encoding technique is taken from [RFC 2045] by N. + Borenstein and N. Freed. It is reproduced here in an edited form for + convenience. + + A 65-character subset of US-ASCII is used, enabling 6 bits to be + represented per printable character. (The extra 65th character, "=", + is used to signify a special processing function.) + + The encoding process represents 24-bit groups of input bits as output + strings of 4 encoded characters. Proceeding from left to right, a + 24-bit input group is formed by concatenating 3 8-bit input groups. + These 24 bits are then treated as 4 concatenated 6-bit groups, each + of which is translated into a single digit in the base 64 alphabet. + + Each 6-bit group is used as an index into an array of 64 printable + characters. The character referenced by the index is placed in the + output string. 
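The encoding process described in this appendix can be sketched in C as shown here. This is an illustrative sketch rather than normative code: the function name and buffer convention are assumptions, the alphabet string is the one listed in Table 1, and the handling of a short final group follows the padding cases given in the remainder of this appendix.

```c
#include <stddef.h>

/* The 64 encoding characters of Table 1, indexed by 6-bit value. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len octets from in.  The caller must provide an output buffer
 * of at least 4 * ((len + 2) / 3) + 1 characters.  Returns the number
 * of characters written, not counting the terminating NUL. */
size_t b64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;

    /* Full 24-bit groups: three octets in, four characters out. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned long g = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) |
                          (unsigned long)in[i + 2];
        out[o++] = b64_alphabet[(g >> 18) & 0x3F];
        out[o++] = b64_alphabet[(g >> 12) & 0x3F];
        out[o++] = b64_alphabet[(g >> 6) & 0x3F];
        out[o++] = b64_alphabet[g & 0x3F];
    }

    /* One or two octets remain: pad with zero bits, then with '='. */
    if (i < len) {
        unsigned long g = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            g |= (unsigned long)in[i + 1] << 8;
        out[o++] = b64_alphabet[(g >> 18) & 0x3F];
        out[o++] = b64_alphabet[(g >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? b64_alphabet[(g >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }

    out[o] = '\0';
    return o;
}
```

In a master file the resulting string may then be broken into white space separated substrings, as described in Sections 7.1 and 7.2.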
+ + Table 1: The Base 64 Alphabet + + Value Encoding Value Encoding Value Encoding Value Encoding + 0 A 17 R 34 i 51 z + 1 B 18 S 35 j 52 0 + 2 C 19 T 36 k 53 1 + 3 D 20 U 37 l 54 2 + 4 E 21 V 38 m 55 3 + 5 F 22 W 39 n 56 4 + 6 G 23 X 40 o 57 5 + 7 H 24 Y 41 p 58 6 + 8 I 25 Z 42 q 59 7 + 9 J 26 a 43 r 60 8 + 10 K 27 b 44 s 61 9 + 11 L 28 c 45 t 62 + + 12 M 29 d 46 u 63 / + 13 N 30 e 47 v + 14 O 31 f 48 w (pad) = + 15 P 32 g 49 x + 16 Q 33 h 50 y + + Special processing is performed if fewer than 24 bits are available + at the end of the data being encoded. A full encoding quantum is + always completed at the end of a quantity. When fewer than 24 input + bits are available in an input group, zero bits are added (on the + right) to form an integral number of 6-bit groups. Padding at the + end of the data is performed using the '=' character. Since all base + 64 input is an integral number of octets, only the following cases + + + +Eastlake Standards Track [Page 42] + +RFC 2535 DNS Security Extensions March 1999 + + + can arise: (1) the final quantum of encoding input is an integral + multiple of 24 bits; here, the final unit of encoded output will be + an integral multiple of 4 characters with no "=" padding, (2) the + final quantum of encoding input is exactly 8 bits; here, the final + unit of encoded output will be two characters followed by two "=" + padding characters, or (3) the final quantum of encoding input is + exactly 16 bits; here, the final unit of encoded output will be three + characters followed by one "=" padding character. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 43] + +RFC 2535 DNS Security Extensions March 1999 + + +Appendix B: Changes from RFC 2065 + + This section summarizes the most important changes that have been + made since RFC 2065. + + 1. 
Most of Section 7 of [RFC 2065] called "Operational + Considerations", has been removed and may be made into a separate + document [RFC 2541]. + + 2. The KEY RR has been changed by (2a) eliminating the "experimental" + flag as unnecessary, (2b) reserving a flag bit for flags + expansion, (2c) more compactly encoding a number of bit fields in + such a way as to leave unchanged bits actually used by the limited + code currently deployed, (2d) eliminating the IPSEC and email flag + bits which are replaced by values of the protocol field and adding + a protocol field value for DNS security itself, (2e) adding + material to indicate that zone KEY RRs occur only at delegation + points, and (2f) removing the description of the RSA/MD5 algorithm + to a separate document [RFC 2537]. Section 3.4 describing the + meaning of various combinations of "no-key" and key present KEY + RRs has been added and the secure / unsecure status of a zone has + been clarified as being per algorithm. + + 3. The SIG RR has been changed by (3a) renaming the "time signed" + field to be the "signature inception" field, (3b) clarifying that + signature expiration and inception use serial number ring + arithmetic, (3c) changing the definition of the key footprint/tag + for algorithms other than 1 and adding Appendix C to specify its + calculation. In addition, the SIG covering type AXFR has been + eliminated while one covering IXFR [RFC 1995] has been added (see + section 5.6). + + 4. Algorithm 3, the DSA algorithm, is now designated as the mandatory + to implement algorithm. Algorithm 1, the RSA/MD5 algorithm, is + now a recommended option. Algorithm 2 and 4 are designated as the + Diffie-Hellman key and elliptic cryptography algorithms + respectively, all to be defined in separate documents. Algorithm + code point 252 is designated to indicate "indirect" keys, to be + defined in a separate document, where the actual key is elsewhere. 
+ Both the KEY and SIG RR definitions have been simplified by + eliminating the "null" algorithm 253 as defined in [RFC 2065]. + That algorithm had been included because at the time it was + thought it might be useful in DNS dynamic update [RFC 2136]. It + was in fact not so used and it is dropped to simplify DNS + security. However, that algorithm number has been re-used to + indicate private algorithms where a domain name specifies the + algorithm. + + + + +Eastlake Standards Track [Page 44] + +RFC 2535 DNS Security Extensions March 1999 + + + 5. The NXT RR has been changed so that (5a) the NXT RRs in a zone + cover all names, including wildcards as literal names without + expansion, except for glue address records whose names would not + otherwise appear, (5b) all NXT bit map areas whose first octet has + bit zero set have been reserved for future definition, (5c) the + number of and circumstances under which an NXT must be returned in + connection with wildcard names has been extended, and (5d) in + connection with the bit map, references to the WKS RR have been + removed and vertical bars ("|") have been added between the RR + type mnemonics in the ASCII representation. + + 6. Information on the canonical form and ordering of RRs has been + moved into a separate Section 8. + + 7. A subsection covering incremental and full zone transfer has been + added in Section 5. + + 8. Concerning DNS chaining: Further specification and policy + recommendations on secure resolution have been added, primarily in + Section 6.3.1. It is now clearly stated that authenticated data + has a validity period of the intersection of the validity periods + of the SIG RRs in its authentication chain. The requirement to + statically configure a superzone's key signed by a zone in all of + the zone's authoritative servers has been removed. 
The + recommendation to continue DNS security checks in a secure island + of DNS data that is separated from other parts of the DNS tree by + insecure zones and does not contain a zone for which a key has + been statically configured was dropped. + + 9. It was clarified that the presence of the AD bit in a response + does not apply to the additional information section or to glue + address or delegation point NS RRs. The AD bit only indicates + that the answer and authority sections of the response are + authenticated. + + 10. It is now required that KEY RRs and NXT RRs be signed only with + zone-level keys. + + 11. An IANA Considerations section and references to RFC 2434 have + been added. + + + + + + + + + + + + +Eastlake Standards Track [Page 45] + +RFC 2535 DNS Security Extensions March 1999 + + +Appendix C: Key Tag Calculation + + The key tag field in the SIG RR is just a means of more efficiently + selecting the correct KEY RR to use when there is more than one KEY + RR candidate available, for example, in verifying a signature. It is + possible for more than one candidate key to have the same tag, in + which case each must be tried until one works or all fail. The + following reference implementation of how to calculate the Key Tag, + for all algorithms other than algorithm 1, is in ANSI C. It is coded + for clarity, not efficiency. (See section 4.1.6 for how to determine + the Key Tag of an algorithm 1 key.) + + /* assumes int is at least 16 bits + first byte of the key tag is the most significant byte of return + value + second byte of the key tag is the least significant byte of + return value + */ + + int keytag ( + + unsigned char key[], /* the RDATA part of the KEY RR */ + unsigned int keysize /* the RDLENGTH */ + ) + { + long int ac; /* assumed to be 32 bits or larger */ + unsigned int i; /* index into key[] */ + + /* even-indexed octets are the high byte of each 16-bit word; + widen before shifting so a 16-bit int cannot overflow */ + for ( ac = 0, i = 0; i < keysize; ++i ) + ac += (i&1) ? 
key[i] : (long int)key[i]<<8; + ac += (ac>>16) & 0xFFFF; + return ac & 0xFFFF; + } + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 46] + +RFC 2535 DNS Security Extensions March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 47] + diff --git a/doc/rfc/rfc2536.txt b/doc/rfc/rfc2536.txt new file mode 100644 index 00000000..88be242b --- /dev/null +++ b/doc/rfc/rfc2536.txt @@ -0,0 +1,339 @@ + + + + + + +Network Working Group D. 
EastLake +Request for Comments: 2536 IBM +Category: Standards Track March 1999 + + + DSA KEYs and SIGs in the Domain Name System (DNS) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +Abstract + + A standard method for storing US Government Digital Signature + Algorithm keys and signatures in the Domain Name System is described + which utilizes DNS KEY and SIG resource records. + +Table of Contents + + Abstract...................................................1 + 1. Introduction............................................1 + 2. DSA KEY Resource Records................................2 + 3. DSA SIG Resource Records................................3 + 4. Performance Considerations..............................3 + 5. Security Considerations.................................4 + 6. IANA Considerations.....................................4 + References.................................................5 + Author's Address...........................................5 + Full Copyright Statement...................................6 + +1. Introduction + + The Domain Name System (DNS) is the global hierarchical replicated + distributed database system for Internet addressing, mail proxy, and + other information. The DNS has been extended to include digital + signatures and cryptographic keys as described in [RFC 2535]. Thus + the DNS can now be secured and can be used for secure key + distribution. 
+ + + + + +Eastlake Standards Track [Page 1] + +RFC 2536 DSA in the DNS March 1999 + + + This document describes how to store US Government Digital Signature + Algorithm (DSA) keys and signatures in the DNS. Familiarity with the + US Digital Signature Algorithm is assumed [Schneier]. Implementation + of DSA is mandatory for DNS security. + +2. DSA KEY Resource Records + + DSA public keys are stored in the DNS as KEY RRs using algorithm + number 3 [RFC 2535]. The structure of the algorithm specific portion + of the RDATA part of this RR is as shown below. These fields, from Q + through Y are the "public key" part of the DSA KEY RR. + + The period of key validity is not in the KEY RR but is indicated by + the SIG RR(s) which signs and authenticates the KEY RR(s) at that + domain name. + + Field Size + ----- ---- + T 1 octet + Q 20 octets + P 64 + T*8 octets + G 64 + T*8 octets + Y 64 + T*8 octets + + As described in [FIPS 186] and [Schneier]: T is a key size parameter + chosen such that 0 <= T <= 8. (The meaning for algorithm 3 if the T + octet is greater than 8 is reserved and the remainder of the RDATA + portion may have a different format in that case.) Q is a prime + number selected at key generation time such that 2**159 < Q < 2**160 + so Q is always 20 octets long and, as with all other fields, is + stored in "big-endian" network order. P, G, and Y are calculated as + directed by the FIPS 186 key generation algorithm [Schneier]. P is + in the range 2**(511+64T) < P < 2**(512+64T) and so is 64 + 8*T + octets long. G and Y are quantities modulus P and so can be up to + the same length as P and are allocated fixed size fields with the + same number of octets as P. + + During the key generation process, a random number X must be + generated such that 1 <= X <= Q-1. 
X is the private key and is used + in the final step of public key generation where Y is computed as + + Y = G**X mod P + + + + + + + + + +Eastlake Standards Track [Page 2] + +RFC 2536 DSA in the DNS March 1999 + + +3. DSA SIG Resource Records + + The signature portion of the SIG RR RDATA area, when using the US + Digital Signature Algorithm, is shown below with fields in the order + they occur. See [RFC 2535] for fields in the SIG RR RDATA which + precede the signature itself. + + Field Size + ----- ---- + T 1 octet + R 20 octets + S 20 octets + + The data signed is determined as specified in [RFC 2535]. Then the + following steps are taken, as specified in [FIPS 186], where Q, P, G, + and Y are as specified in the public key [Schneier]: + + hash = SHA-1 ( data ) + + Generate a random K such that 0 < K < Q. + + R = ( G**K mod P ) mod Q + + S = ( K**(-1) * (hash + X*R) ) mod Q + + Since Q is 160 bits long, R and S can not be larger than 20 octets, + which is the space allocated. + + T is copied from the public key. It is not logically necessary in + the SIG but is present so that values of T > 8 can more conveniently + be used as an escape for extended versions of DSA or other algorithms + as later specified. + +4. Performance Considerations + + General signature generation speeds are roughly the same for RSA [RFC + 2537] and DSA. With sufficient pre-computation, signature generation + with DSA is faster than RSA. Key generation is also faster for DSA. + However, signature verification is an order of magnitude slower than + RSA when the RSA public exponent is chosen to be small as is + recommended for KEY RRs used in domain name system (DNS) data + authentication. + + Current DNS implementations are optimized for small transfers, + typically less than 512 bytes including overhead. 
While larger + transfers will perform correctly and work is underway to make larger + transfers more efficient, it is still advisable at this time to make + reasonable efforts to minimize the size of KEY RR sets stored within + + + +Eastlake Standards Track [Page 3] + +RFC 2536 DSA in the DNS March 1999 + + + the DNS consistent with adequate security. Keep in mind that in a + secure zone, at least one authenticating SIG RR will also be + returned. + +5. Security Considerations + + Many of the general security considerations in [RFC 2535] apply. Keys + retrieved from the DNS should not be trusted unless (1) they have + been securely obtained from a secure resolver or independently + verified by the user and (2) this secure resolver and secure + obtainment or independent verification conform to security policies + acceptable to the user. As with all cryptographic algorithms, + evaluating the necessary strength of the key is essential and + dependent on local policy. + + The key size limitation of a maximum of 1024 bits ( T = 8 ) in the + current DSA standard may limit the security of DSA. For particularly + critical applications, implementors are encouraged to consider the + range of available algorithms and key sizes. + + DSA assumes the ability to frequently generate high quality random + numbers. See [RFC 1750] for guidance. DSA is designed so that if + manipulated rather than random numbers are used, very high bandwidth + covert channels are possible. See [Schneier] and more recent + research. The leakage of an entire DSA private key in only two DSA + signatures has been demonstrated. DSA provides security only if + trusted implementations, including trusted random number generation, + are used. + +6. IANA Considerations + + Allocation of meaning to values of the T parameter that are not + defined herein requires an IETF standards action. It is intended + that values unallocated herein be used to cover future extensions of + the DSS standard. 
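As a rough illustration of the DSA KEY field layout in section 2 and of the reserved T values mentioned above, the following sketch splits the algorithm-specific public key portion into its T, Q, P, G and Y fields. The struct and function names are hypothetical and are not taken from the RFC or from any library. It assumes the caller has already stripped the KEY RR fields that precede the public key, and it simply rejects T > 8 because the format for those values is reserved.

```c
#include <stddef.h>

/* Hypothetical view onto an algorithm 3 (DSA) public key; illustrative only. */
struct dsa_key_view {
    unsigned int t;            /* key size parameter, 0..8 */
    const unsigned char *q;    /* 20 octets */
    const unsigned char *p;    /* 64 + 8*T octets */
    const unsigned char *g;    /* same length as P */
    const unsigned char *y;    /* same length as P */
    size_t plen;               /* length in octets of P, G and Y */
};

/*
 * Split the public key portion (T | Q | P | G | Y) into its fields.
 * Returns 0 on success, -1 if T > 8 (reserved) or if the supplied
 * length does not match the layout implied by T.
 */
int dsa_key_parse(const unsigned char *pub, size_t len,
                  struct dsa_key_view *out)
{
    size_t plen;

    if (pub == NULL || out == NULL || len < 1)
        return -1;
    if (pub[0] > 8)
        return -1;                      /* reserved; format may differ */

    plen = 64 + 8 * (size_t)pub[0];
    if (len != 1 + 20 + 3 * plen)
        return -1;

    out->t = pub[0];
    out->q = pub + 1;
    out->p = out->q + 20;
    out->g = out->p + plen;
    out->y = out->g + plen;
    out->plen = plen;
    return 0;
}
```

With T = 0 the public key portion is 1 + 20 + 3*64 = 213 octets, and with T = 8 it is 1 + 20 + 3*128 = 405 octets.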
+ + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 4] + +RFC 2536 DSA in the DNS March 1999 + + +References + + [FIPS 186] U.S. Federal Information Processing Standard: Digital + Signature Standard. + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and + Facilities", STD 13, RFC 1034, November 1987. + + [RFC 1035] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, November 1987. + + [RFC 1750] Eastlake, D., Crocker, S. and J. Schiller, "Randomness + Recommendations for Security", RFC 1750, December 1994. + + [RFC 2535] Eastlake, D., "Domain Name System Security Extensions", + RFC 2535, March 1999. + + [RFC 2537] Eastlake, D., "RSA/MD5 KEYs and SIGs in the Domain Name + System (DNS)", RFC 2537, March 1999. + + [Schneier] Schneier, B., "Applied Cryptography Second Edition: + protocols, algorithms, and source code in C", 1996. + +Author's Address + + Donald E. Eastlake 3rd + IBM + 65 Shindegan Hill Road, RR #1 + Carmel, NY 10512 + + Phone: +1-914-276-2668(h) + +1-914-784-7913(w) + Fax: +1-914-784-3833(w) + EMail: dee3@us.ibm.com + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 5] + +RFC 2536 DSA in the DNS March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. 
However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 6] + diff --git a/doc/rfc/rfc2537.txt b/doc/rfc/rfc2537.txt new file mode 100644 index 00000000..cb75cf5b --- /dev/null +++ b/doc/rfc/rfc2537.txt @@ -0,0 +1,339 @@ + + + + + + +Network Working Group D. Eastlake +Request for Comments: 2537 IBM +Category: Standards Track March 1999 + + + RSA/MD5 KEYs and SIGs in the Domain Name System (DNS) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. 
+ +Abstract + + A standard method for storing RSA keys and RSA/MD5 based + signatures in the Domain Name System is described which utilizes DNS + KEY and SIG resource records. + +Table of Contents + + Abstract...................................................1 + 1. Introduction............................................1 + 2. RSA Public KEY Resource Records.........................2 + 3. RSA/MD5 SIG Resource Records............................2 + 4. Performance Considerations..............................3 + 5. Security Considerations.................................4 + References.................................................4 + Author's Address...........................................5 + Full Copyright Statement...................................6 + +1. Introduction + + The Domain Name System (DNS) is the global hierarchical replicated + distributed database system for Internet addressing, mail proxy, and + other information. The DNS has been extended to include digital + signatures and cryptographic keys as described in [RFC 2535]. Thus + the DNS can now be secured and used for secure key distribution. + + + + + + + +Eastlake Standards Track [Page 1] + +RFC 2537 RSA/MD5 KEYs and SIGs in the DNS March 1999 + + + This document describes how to store RSA keys and RSA/MD5 based + signatures in the DNS. Familiarity with the RSA algorithm is assumed + [Schneier]. Implementation of the RSA algorithm in DNS is + recommended. + + The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY" + in this document are to be interpreted as described in RFC 2119. + +2. RSA Public KEY Resource Records + + RSA public keys are stored in the DNS as KEY RRs using algorithm + number 1 [RFC 2535]. The structure of the algorithm specific portion + of the RDATA part of such RRs is as shown below. 
+ + Field Size + ----- ---- + exponent length 1 or 3 octets (see text) + exponent as specified by length field + modulus remaining space + + For interoperability, the exponent and modulus are each currently + limited to 4096 bits in length. The public key exponent is a + variable length unsigned integer. Its length in octets is + represented as one octet if it is in the range of 1 to 255 and by a + zero octet followed by a two octet unsigned length if it is longer + than 255 bytes. The public key modulus field is a multiprecision + unsigned integer. The length of the modulus can be determined from + the RDLENGTH and the preceding RDATA fields including the exponent. + Leading zero octets are prohibited in the exponent and modulus. + +3. RSA/MD5 SIG Resource Records + + The signature portion of the SIG RR RDATA area, when using the + RSA/MD5 algorithm, is calculated as shown below. The data signed is + determined as specified in [RFC 2535]. See [RFC 2535] for fields in + the SIG RR RDATA which precede the signature itself. + + + hash = MD5 ( data ) + + signature = ( 00 | 01 | FF* | 00 | prefix | hash ) ** e (mod n) + + + + + + + + + + +Eastlake Standards Track [Page 2] + +RFC 2537 RSA/MD5 KEYs and SIGs in the DNS March 1999 + + + where MD5 is the message digest algorithm documented in [RFC 1321], + "|" is concatenation, "e" is the private key exponent of the signer, + and "n" is the modulus of the signer's public key. 01, FF, and 00 + are fixed octets of the corresponding hexadecimal value. "prefix" is + the ASN.1 BER MD5 algorithm designator prefix specified in [RFC + 2437], that is, + + hex 3020300c06082a864886f70d020505000410 [NETSEC]. + + This prefix is included to make it easier to use RSAREF (or similar + packages such as EuroRef). The FF octet MUST be repeated the maximum + number of times such that the value of the quantity being + exponentiated is the same length in octets as the value of n. 
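The block layout just described (00 | 01 | FF* | 00 | prefix | hash, padded to the length of n) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the MD5 hash is assumed to have been computed elsewhere, and the modular exponentiation that turns the block into a signature is not shown. The requirement of at least 8 FF octets is borrowed from PKCS #1 and is not stated in the text above.

```c
#include <stddef.h>
#include <string.h>

/* ASN.1 BER MD5 designator prefix quoted in the text (18 octets). */
static const unsigned char md5_prefix[18] = {
    0x30, 0x20, 0x30, 0x0c, 0x06, 0x08, 0x2a, 0x86, 0x48,
    0x86, 0xf7, 0x0d, 0x02, 0x05, 0x05, 0x00, 0x04, 0x10
};

/*
 * Hypothetical helper: fill out[0..nlen-1] with
 *     00 | 01 | FF ... FF | 00 | prefix | hash
 * where nlen is the length of the modulus n in octets and hash is the
 * 16 octet MD5 digest of the data to be signed.
 * Returns 0 on success, -1 if nlen is too small to hold the block.
 */
int rsamd5_build_block(const unsigned char hash[16],
                       unsigned char *out, size_t nlen)
{
    /* 00, 01 and the 00 separator, plus prefix and hash */
    const size_t fixed = 3 + sizeof md5_prefix + 16;
    size_t ff;

    if (hash == NULL || out == NULL)
        return -1;
    if (nlen < fixed + 8)       /* assumption: PKCS #1 minimum padding */
        return -1;

    ff = nlen - fixed;          /* FF repeated the maximum number of times */
    out[0] = 0x00;
    out[1] = 0x01;
    memset(out + 2, 0xFF, ff);
    out[2 + ff] = 0x00;
    memcpy(out + 3 + ff, md5_prefix, sizeof md5_prefix);
    memcpy(out + 3 + ff + sizeof md5_prefix, hash, 16);
    return 0;
}
```

For a 512 bit modulus (64 octets) this gives 27 FF octets, followed by the 00 separator, the 18 octet prefix and the 16 octet hash.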
+ + (The above specifications are identical to the corresponding part of + Public Key Cryptographic Standard #1 [RFC 2437].) + + The size of n, including most and least significant bits (which will + be 1) MUST be not less than 512 bits and not more than 4096 bits. n + and e SHOULD be chosen such that the public exponent is small. + + Leading zero bytes are permitted in the RSA/MD5 algorithm signature. + + A public exponent of 3 minimizes the effort needed to verify a + signature. Use of 3 as the public exponent is weak for + confidentiality uses since, if the same data can be collected + encrypted under three different keys with an exponent of 3 then, + using the Chinese Remainder Theorem [NETSEC], the original plain text + can be easily recovered. This weakness is not significant for DNS + security because we seek only authentication, not confidentiality. + +4. Performance Considerations + + General signature generation speeds are roughly the same for RSA and + DSA [RFC 2536]. With sufficient pre-computation, signature + generation with DSA is faster than RSA. Key generation is also + faster for DSA. However, signature verification is an order of + magnitude slower with DSA when the RSA public exponent is chosen to + be small as is recommended for KEY RRs used in domain name system + (DNS) data authentication. + + Current DNS implementations are optimized for small transfers, + typically less than 512 bytes including overhead. While larger + transfers will perform correctly and work is underway to make larger + + + + + + + +Eastlake Standards Track [Page 3] + +RFC 2537 RSA/MD5 KEYs and SIGs in the DNS March 1999 + + + transfers more efficient, it is still advisable at this time to make + reasonable efforts to minimize the size of KEY RR sets stored within + the DNS consistent with adequate security. Keep in mind that in a + secure zone, at least one authenticating SIG RR will also be + returned. + +5. 
Security Considerations + + Many of the general security considerations in [RFC 2535] apply. Keys + retrieved from the DNS should not be trusted unless (1) they have + been securely obtained from a secure resolver or independently + verified by the user and (2) this secure resolver and secure + obtainment or independent verification conform to security policies + acceptable to the user. As with all cryptographic algorithms, + evaluating the necessary strength of the key is essential and + dependent on local policy. + + For interoperability, the RSA key size is limited to 4096 bits. For + particularly critical applications, implementors are encouraged to + consider the range of available algorithms and key sizes. + +References + + [NETSEC] Kaufman, C., Perlman, R. and M. Speciner, "Network + Security: PRIVATE Communications in a PUBLIC World", + Series in Computer Networking and Distributed + Communications, 1995. + + [RFC 2437] Kaliski, B. and J. Staddon, "PKCS #1: RSA Cryptography + Specifications Version 2.0", RFC 2437, October 1998. + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and + Facilities", STD 13, RFC 1034, November 1987. + + [RFC 1035] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, November 1987. + + [RFC 1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, + April 1992. + + [RFC 2535] Eastlake, D., "Domain Name System Security Extensions", + RFC 2535, March 1999. + + [RFC 2536] Eastlake, D., "DSA KEYs and SIGs in the Domain Name + System (DNS)", RFC 2536, March 1999. + + + + + + +Eastlake Standards Track [Page 4] + +RFC 2537 RSA/MD5 KEYs and SIGs in the DNS March 1999 + + + [Schneier] Schneier, B., "Applied Cryptography Second Edition: + protocols, algorithms, and source code in C", 1996, John + Wiley and Sons, ISBN 0-471-11709-9. + +Author's Address + + Donald E. 
Eastlake 3rd + IBM + 65 Shindegan Hill Road, RR #1 + Carmel, NY 10512 + + Phone: +1-914-276-2668(h) + +1-914-784-7913(w) + Fax: +1-914-784-3833(w) + EMail: dee3@us.ibm.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 5] + +RFC 2537 RSA/MD5 KEYs and SIGs in the DNS March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
+ + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 6] + diff --git a/doc/rfc/rfc2538.txt b/doc/rfc/rfc2538.txt new file mode 100644 index 00000000..c53e3efd --- /dev/null +++ b/doc/rfc/rfc2538.txt @@ -0,0 +1,563 @@ + + + + + + +Network Working Group D. Eastlake +Request for Comments: 2538 IBM +Category: Standards Track O. Gudmundsson + TIS Labs + March 1999 + + + Storing Certificates in the Domain Name System (DNS) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +Abstract + + Cryptographic public keys are frequently published and their + authenticity demonstrated by certificates. A CERT resource record + (RR) is defined so that such certificates and related certificate + revocation lists can be stored in the Domain Name System (DNS). + +Table of Contents + + Abstract...................................................1 + 1. Introduction............................................2 + 2. The CERT Resource Record................................2 + 2.1 Certificate Type Values................................3 + 2.2 Text Representation of CERT RRs........................4 + 2.3 X.509 OIDs.............................................4 + 3. Appropriate Owner Names for CERT RRs....................5 + 3.1 X.509 CERT RR Names....................................5 + 3.2 PGP CERT RR Names......................................6 + 4. Performance Considerations..............................6 + 5. IANA Considerations.....................................7 + 6. 
Security Considerations.................................7 + References.................................................8 + Authors' Addresses.........................................9 + Full Copyright Notice.....................................10 + + + + + + +Eastlake & Gudmundsson Standards Track [Page 1] + +RFC 2538 Storing Certificates in the DNS March 1999 + + +1. Introduction + + Public keys are frequently published in the form of a certificate and + their authenticity is commonly demonstrated by certificates and + related certificate revocation lists (CRLs). A certificate is a + binding, through a cryptographic digital signature, of a public key, + a validity interval and/or conditions, and identity, authorization, + or other information. A certificate revocation list is a list of + certificates that are revoked, and incidental information, all signed + by the signer (issuer) of the revoked certificates. Examples are + X.509 certificates/CRLs in the X.500 directory system or PGP + certificates/revocations used by PGP software. + + Section 2 below specifies a CERT resource record (RR) for the storage + of certificates in the Domain Name System. + + Section 3 discusses appropriate owner names for CERT RRs. + + Sections 4, 5, and 6 below cover performance, IANA, and security + considerations, respectively. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + +2. The CERT Resource Record + + The CERT resource record (RR) has the structure given below. Its RR + type code is 37. 
+
+                       1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
+   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |             type              |             key tag           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |   algorithm   |                                               /
+   +---------------+            certificate or CRL                 /
+   /                                                               /
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
+ + The type field is the certificate type as defined in section 2.1 + below. + + The algorithm field has the same meaning as the algorithm field in + KEY and SIG RRs [RFC 2535] except that a zero algorithm field + indicates the algorithm is unknown to a secure DNS, which may simply + be the result of the algorithm not having been standardized for + secure DNS. + + + +Eastlake & Gudmundsson Standards Track [Page 2] + +RFC 2538 Storing Certificates in the DNS March 1999 + + + The key tag field is the 16 bit value computed for the key embedded + in the certificate as specified in the DNSSEC Standard [RFC 2535]. + This field is used as an efficiency measure to pick which CERT RRs + may be applicable to a particular key. The key tag can be calculated + for the key in question and then only CERT RRs with the same key tag + need be examined. However, the key must always be transformed to the + format it would have as the public key portion of a KEY RR before the + key tag is computed. This is only possible if the key is applicable + to an algorithm (and limits such as key size limits) defined for DNS + security. If it is not, the algorithm field MUST BE zero and the tag + field is meaningless and SHOULD BE zero. 
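The fixed part of the CERT RDATA described above (a 16 bit type, a 16 bit key tag and an 8 bit algorithm, followed by the certificate or CRL) could be read along the following lines. This is a minimal sketch; the struct and function names are hypothetical, and the multi-octet fields are assumed to be in the usual DNS network (big-endian) order.

```c
#include <stddef.h>

/* Hypothetical parsed view of a CERT RR RDATA; illustrative only. */
struct cert_rdata_view {
    unsigned int type;          /* certificate type, 16 bits */
    unsigned int key_tag;       /* key tag, 16 bits */
    unsigned int algorithm;     /* algorithm, 8 bits; 0 means unknown */
    const unsigned char *cert;  /* certificate or CRL octets */
    size_t cert_len;            /* number of certificate or CRL octets */
};

/*
 * Split CERT RDATA into its fixed header and the certificate / CRL.
 * Returns 0 on success, -1 if the RDATA is shorter than the 5 octet
 * fixed header.
 */
int cert_rdata_parse(const unsigned char *rdata, size_t rdlength,
                     struct cert_rdata_view *out)
{
    if (rdata == NULL || out == NULL || rdlength < 5)
        return -1;

    out->type      = ((unsigned int)rdata[0] << 8) | rdata[1];
    out->key_tag   = ((unsigned int)rdata[2] << 8) | rdata[3];
    out->algorithm = rdata[4];
    out->cert      = rdata + 5;
    out->cert_len  = rdlength - 5;
    return 0;
}
```

A caller looking for certificates that match a particular key would compute that key's tag and skip any CERT RR whose key_tag differs, as the text suggests.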
+ +2.1 Certificate Type Values + + The following values are defined or reserved: +
+       Value  Mnemonic  Certificate Type
+       -----  --------  ----------- ----
+           0            reserved
+           1   PKIX     X.509 as per PKIX
+           2   SPKI     SPKI cert
+           3   PGP      PGP cert
+       4-252            available for IANA assignment
+         253   URI      URI private
+         254   OID      OID private
+   255-65534            available for IANA assignment
+       65535            reserved
+ + The PKIX type is reserved to indicate an X.509 certificate conforming + to the profile being defined by the IETF PKIX working group. The + certificate section will start with a one byte unsigned OID length + and then an X.500 OID indicating the nature of the remainder of the + certificate section (see 2.3 below). (NOTE: X.509 certificates do + not include their X.500 directory type designating OID as a prefix.) + + The SPKI type is reserved to indicate a certificate formatted as to be + specified by the IETF SPKI working group. + + The PGP type indicates a Pretty Good Privacy certificate as described + in RFC 2440 and its extensions and successors. + + The URI private type indicates a certificate format defined by an + absolute URI. The certificate portion of the CERT RR MUST begin with + a null terminated URI [RFC 2396] and the data after the null is the + private format certificate itself. The URI SHOULD be such that a + retrieval from it will lead to documentation on the format of the + certificate. Recognition of private certificate types need not be + based on URI equality but can use various forms of pattern matching + + + +Eastlake & Gudmundsson Standards Track [Page 3] + +RFC 2538 Storing Certificates in the DNS March 1999 + + + so that, for example, subtype or version information can also be + encoded into the URI. + + The OID private type indicates a private format certificate specified + by an ISO OID prefix. The certificate section will start with a + one byte unsigned OID length and then a BER encoded OID indicating + the nature of the remainder of the certificate section. 
This can be + an X.509 certificate format or some other format. X.509 certificates + that conform to the IETF PKIX profile SHOULD be indicated by the PKIX + type, not the OID private type. Recognition of private certificate + types need not be based on OID equality but can use various forms of + pattern matching such as OID prefix. + +2.2 Text Representation of CERT RRs + + The RDATA portion of a CERT RR has the type field as an unsigned + integer or as a mnemonic symbol as listed in section 2.1 above. + + The key tag field is represented as an unsigned integer. + + The algorithm field is represented as an unsigned integer or a + mnemonic symbol as listed in [RFC 2535]. + + The certificate / CRL portion is represented in base 64 and may be + divided up into any number of white space separated substrings, down + to single base 64 digits, which are concatenated to obtain the full + signature. These substrings can span lines using the standard + parenthesis. + + Note that the certificate / CRL portion may have internal sub-fields + but these do not appear in the master file representation. For + example, with type 254, there will be an OID size, an OID, and then + the certificate / CRL proper. But only a single logical base 64 + string will appear in the text representation. + +2.3 X.509 OIDs + + OIDs have been defined in connection with the X.500 directory for + user certificates, certification authority certificates, revocations + of certification authority, and revocations of user certificates. 
+ The following table lists the OIDs, their BER encoding, and their + length prefixed hex format for use in CERT RRs: + + + + + + + + + +Eastlake & Gudmundsson Standards Track [Page 4] + +RFC 2538 Storing Certificates in the DNS March 1999 + + + id-at-userCertificate + = { joint-iso-ccitt(2) ds(5) at(4) 36 } + == 0x 03 55 04 24 + id-at-cACertificate + = { joint-iso-ccitt(2) ds(5) at(4) 37 } + == 0x 03 55 04 25 + id-at-authorityRevocationList + = { joint-iso-ccitt(2) ds(5) at(4) 38 } + == 0x 03 55 04 26 + id-at-certificateRevocationList + = { joint-iso-ccitt(2) ds(5) at(4) 39 } + == 0x 03 55 04 27 + +3. Appropriate Owner Names for CERT RRs + + It is recommended that certificate CERT RRs be stored under a domain + name related to their subject, i.e., the name of the entity intended + to control the private key corresponding to the public key being + certified. It is recommended that certificate revocation list CERT + RRs be stored under a domain name related to their issuer. + + Following some of the guidelines below may result in the use in DNS + names of characters that require DNS quoting which is to use a + backslash followed by the octal representation of the ASCII code for + the character such as \000 for NULL. + +3.1 X.509 CERT RR Names + + Some X.509 versions permit multiple names to be associated with + subjects and issuers under "Subject Alternate Name" and "Issuer + Alternate Name". 
For example, X.509v3 has such Alternate Names with + an ASN.1 specification as follows: +
+         GeneralName ::= CHOICE {
+            otherName                  [0] INSTANCE OF OTHER-NAME,
+            rfc822Name                 [1] IA5String,
+            dNSName                    [2] IA5String,
+            x400Address                [3] EXPLICIT OR-ADDRESS.&Type,
+            directoryName              [4] EXPLICIT Name,
+            ediPartyName               [5] EDIPartyName,
+            uniformResourceIdentifier  [6] IA5String,
+            iPAddress                  [7] OCTET STRING,
+            registeredID               [8] OBJECT IDENTIFIER
+         }
+ + The recommended locations of CERT storage are as follows, in priority + order: + + + + +Eastlake & Gudmundsson Standards Track [Page 5] + +RFC 2538 Storing Certificates in the DNS March 1999 + + + (1) If a domain name is included in the identification in the + certificate or CRL, that should be used. + (2) If a domain name is not included but an IP address is included, + then the translation of that IP address into the appropriate + inverse domain name should be used. + (3) If neither of the above is used but a URI containing a domain + name is present, that domain name should be used. + (4) If none of the above is included but a character string name is + included, then it should be treated as described for PGP names in + 3.2 below. + (5) If none of the above apply, then the distinguished name (DN) + should be mapped into a domain name as specified in RFC 2247. + + Example 1: Assume that an X.509v3 certificate is issued to /CN=John + Doe/DC=Doe/DC=com/DC=xy/O=Doe Inc/C=XY/ with Subject Alternative + names of (a) string "John (the Man) Doe", (b) domain name john- + doe.com, and (c) uri <https://www.secure.john-doe.com:8080/>. Then + the storage locations recommended, in priority order, would be + (1) john-doe.com, + (2) www.secure.john-doe.com, and + (3) Doe.com.xy. 
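Rule (2) above relies on translating an IP address into its inverse domain name. For IPv4 that translation reverses the four octets and appends in-addr.arpa, so 10.251.13.201 would map to 201.13.251.10.in-addr.arpa. The following is a minimal sketch of that step only; the function name is hypothetical and IPv6 is not handled.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the in-addr.arpa name for an IPv4 address
 * given as four octets in network order (addr[0] is the first octet of
 * the dotted quad).  Returns 0 on success, -1 if the output buffer is
 * too small.
 */
int ipv4_to_inaddr_arpa(const unsigned char addr[4],
                        char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa",
                     (unsigned int)addr[3], (unsigned int)addr[2],
                     (unsigned int)addr[1], (unsigned int)addr[0]);

    if (n < 0 || (size_t)n >= outlen)
        return -1;      /* encoding error or truncated output */
    return 0;
}
```

The resulting name is where a CERT RR would be stored when the certificate identifies its subject only by IP address.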
+ + Example 2: Assume that an X.509v3 certificate is issued to /CN=James + Hacker/L=Basingstoke/O=Widget Inc/C=GB/ with Subject Alternate names + of (a) domain name widget.foo.example, (b) IPv4 address + 10.251.13.201, and (c) string "James Hacker + <hacker@mail.widget.foo.example>". Then the storage locations + recommended, in priority order, would be + (1) widget.foo.example, + (2) 201.13.251.10.in-addr.arpa, and + (3) hacker.mail.widget.foo.example. + +3.2 PGP CERT RR Names + + PGP signed keys (certificates) use a general character string User ID + [RFC 2440]. However, it is recommended by PGP that such names include + the RFC 822 email address of the party, as in "Leslie Example + <Leslie@host.example>". If such a format is used, the CERT should be + under the standard translation of the email address into a domain + name, which would be leslie.host.example in this case. If no RFC 822 + name can be extracted from the string name no specific domain name is + recommended. + +4. Performance Considerations + + Current Domain Name System (DNS) implementations are optimized for + small transfers, typically not more than 512 bytes including + overhead. While larger transfers will perform correctly and work is + + + +Eastlake & Gudmundsson Standards Track [Page 6] + +RFC 2538 Storing Certificates in the DNS March 1999 + + + underway to make larger transfers more efficient, it is still + advisable at this time to make every reasonable effort to minimize + the size of certificates stored within the DNS. Steps that can be + taken may include using the fewest possible optional or extensions + fields and using short field values for variable length fields that + must be included. + +5. IANA Considerations + + Certificate types 0x0000 through 0x00FF and 0xFF00 through 0xFFFF can + only be assigned by an IETF standards action [RFC 2434] (and this + document assigns 0x0001 through 0x0003 and 0x00FD and 0x00FE). 
+ Certificate types 0x0100 through 0xFEFF are assigned through IETF + Consensus [RFC 2434] based on RFC documentation of the certificate + type. The availability of private types under 0x00FD and 0x00FE + should satisfy most requirements for proprietary or private types. + +6. Security Considerations + + By definition, certificates contain their own authenticating + signature. Thus it is reasonable to store certificates in non-secure + DNS zones or to retrieve certificates from DNS with DNS security + checking not implemented or deferred for efficiency. The results MAY + be trusted if the certificate chain is verified back to a known + trusted key and this conforms with the user's security policy. + + Alternatively, if certificates are retrieved from a secure DNS zone + with DNS security checking enabled and are verified by DNS security, + the key within the retrieved certificate MAY be trusted without + verifying the certificate chain if this conforms with the user's + security policy. + + CERT RRs are not used in connection with securing the DNS security + additions so there are no security considerations related to CERT RRs + and securing the DNS itself. + + + + + + + + + + + + + + + + +Eastlake & Gudmundsson Standards Track [Page 7] + +RFC 2538 Storing Certificates in the DNS March 1999 + + +References + + RFC 1034 Mockapetris, P., "Domain Names - Concepts and Facilities", + STD 13, RFC 1034, November 1987. + + RFC 1035 Mockapetris, P., "Domain Names - Implementation and + Specifications", STD 13, RFC 1035, November 1987. + + RFC 2119 Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + RFC 2247 Kille, S., Wahl, M., Grimstad, A., Huber, R. and S. + Sataluri, "Using Domains in LDAP/X.500 Distinguished + Names", RFC 2247, January 1998. + + RFC 2396 Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform + Resource Identifiers (URI): Generic Syntax", RFC 2396, + August 1998. 
+ + RFC 2440 Callas, J., Donnerhacke, L., Finney, H. and R. Thayer, + "OpenPGP Message Format", RFC 2440, November 1998. + + RFC 2434 Narten, T. and H. Alvestrand, "Guidelines for Writing an + IANA Considerations Section in RFCs", BCP 26, RFC 2434, + October 1998. + + RFC 2535 Eastlake, D., "Domain Name System (DNS) Security + Extensions", RFC 2535, March 1999. + + RFC 2459 Housley, R., Ford, W., Polk, W. and D. Solo, "Internet + X.509 Public Key Infrastructure Certificate and CRL + Profile", RFC 2459, January 1999. + + + + + + + + + + + + + + + + + + + +Eastlake & Gudmundsson Standards Track [Page 8] + +RFC 2538 Storing Certificates in the DNS March 1999 + + +Authors' Addresses + + Donald E. Eastlake 3rd + IBM + 65 Shindegan Hill Road + RR#1 + Carmel, NY 10512 USA + + Phone: +1-914-784-7913 (w) + +1-914-276-2668 (h) + Fax: +1-914-784-3833 (w-fax) + EMail: dee3@us.ibm.com + + + Olafur Gudmundsson + TIS Labs at Network Associates + 3060 Washington Rd, Route 97 + Glenwood MD 21738 + + Phone: +1 443-259-2389 + EMail: ogud@tislabs.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake & Gudmundsson Standards Track [Page 9] + +RFC 2538 Storing Certificates in the DNS March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works.
However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake & Gudmundsson Standards Track [Page 10] + diff --git a/doc/rfc/rfc2539.txt b/doc/rfc/rfc2539.txt new file mode 100644 index 00000000..cf32523d --- /dev/null +++ b/doc/rfc/rfc2539.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group D. Eastlake +Request for Comments: 2539 IBM +Category: Standards Track March 1999 + + + Storage of Diffie-Hellman Keys in the Domain Name System (DNS) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. 
+ +Abstract + + A standard method for storing Diffie-Hellman keys in the Domain Name + System is described which utilizes DNS KEY resource records. + +Acknowledgements + + Part of the format for Diffie-Hellman keys and the description + thereof was taken from a work in progress by: + + Ashar Aziz <ashar.aziz@eng.sun.com> + Tom Markson <markson@incog.com> + Hemma Prafullchandra <hemma@eng.sun.com> + + In addition, the following person provided useful comments that have + been incorporated: + + Ran Atkinson <rja@inet.org> + Thomas Narten <narten@raleigh.ibm.com> + + + + + + + + + + + + + +Eastlake Standards Track [Page 1] + +RFC 2539 Diffie-Hellman Keys in the DNS March 1999 + + +Table of Contents + + Abstract...................................................1 + Acknowledgements...........................................1 + 1. Introduction............................................2 + 1.1 About This Document....................................2 + 1.2 About Diffie-Hellman...................................2 + 2. Diffie-Hellman KEY Resource Records.....................3 + 3. Performance Considerations..............................4 + 4. IANA Considerations.....................................4 + 5. Security Considerations.................................4 + References.................................................5 + Author's Address...........................................5 + Appendix A: Well known prime/generator pairs...............6 + A.1. Well-Known Group 1: A 768 bit prime..................6 + A.2. Well-Known Group 2: A 1024 bit prime.................6 + Full Copyright Notice......................................7 + +1. Introduction + + The Domain Name System (DNS) is the current global hierarchical + replicated distributed database system for Internet addressing, mail + proxy, and similar information. The DNS has been extended to include + digital signatures and cryptographic keys as described in [RFC 2535]. 
+ Thus the DNS can now be used for secure key distribution. + +1.1 About This Document + + This document describes how to store Diffie-Hellman keys in the DNS. + Familiarity with the Diffie-Hellman key exchange algorithm is assumed + [Schneier]. + +1.2 About Diffie-Hellman + + Diffie-Hellman requires two parties to interact to derive keying + information which can then be used for authentication. Since DNS SIG + RRs are primarily used as stored authenticators of zone information + for many different resolvers, no Diffie-Hellman algorithm SIG RR is + defined. For example, assume that two parties have local secrets "i" + and "j". Assume they each respectively calculate X and Y as follows: + + X = g**i ( mod p ) Y = g**j ( mod p ) + + They exchange these quantities and then each calculates a Z as + follows: + + Zi = Y**i ( mod p ) Zj = X**j ( mod p ) + + + + +Eastlake Standards Track [Page 2] + +RFC 2539 Diffie-Hellman Keys in the DNS March 1999 + + + Zi and Zj will both be equal to g**(i*j) ( mod p ) and will be a + shared secret between the two parties that an adversary who does not + know i or j will not be able to learn from the exchanged messages + (unless the adversary can derive i or j by performing a discrete + logarithm mod p, which is hard for strong p and g). + + The private key for each party is their secret i (or j). The public + key is the pair p and g, which must be the same for the parties, and + their individual X (or Y). + +2. Diffie-Hellman KEY Resource Records + + Diffie-Hellman keys are stored in the DNS as KEY RRs using algorithm + number 2. The structure of the RDATA portion of this RR is as shown + below. The first 4 octets, including the flags, protocol, and + algorithm fields, are common to all KEY RRs as described in [RFC + 2535]. The remainder, from prime length through public value, is the + "public key" part of the KEY RR. The period of key validity is not in + the KEY RR but is indicated by the SIG RR(s) which sign and + authenticate the KEY RR(s) at that domain name.
+ + 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | KEY flags | protocol | algorithm=2 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | prime length (or flag) | prime (p) (or special) / + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + / prime (p) (variable length) | generator length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | generator (g) (variable length) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | public value length | public value (variable length)/ + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + / public value (g^i mod p) (variable length) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Prime length is the length of the Diffie-Hellman prime (p) in bytes if it + is 16 or greater. Prime contains the binary representation of the + Diffie-Hellman prime with most significant byte first (i.e., in + network order). If the "prime length" field is 1 or 2, then the "prime" + field is actually an unsigned index into a table of 65,536 + prime/generator pairs and the generator length SHOULD be zero. See + Appendix A for defined table entries and Section 4 for information on + allocating additional table entries. The meaning of a zero or 3 + through 15 value for "prime length" is reserved. + + + + + + +Eastlake Standards Track [Page 3] + +RFC 2539 Diffie-Hellman Keys in the DNS March 1999 + + + Generator length is the length of the generator (g) in bytes. + Generator is the binary representation of the generator with most + significant byte first. Public value length is the length of the public + value (g**i (mod p)) in bytes. Public value is the binary + representation of the DH public value with most significant byte + first.
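As an illustration of the layout just described, the following Python sketch splits the RDATA of an algorithm 2 KEY RR into its fields. It is a minimal reading of the diagram, not a reference implementation: the function name `parse_dh_key_rdata` and the keys of the returned dictionary are hypothetical, and the sketch only separates fields without judging whether the key is acceptable.

```python
import struct


def parse_dh_key_rdata(rdata: bytes) -> dict:
    """Split the RDATA of an algorithm-2 KEY RR into its fields (sketch)."""
    # First 4 octets: 16-bit flags, 8-bit protocol, 8-bit algorithm.
    flags, protocol, algorithm = struct.unpack_from("!HBB", rdata, 0)
    if algorithm != 2:
        raise ValueError("not a Diffie-Hellman KEY RR (algorithm != 2)")

    def read_field(offset: int):
        # Each field is a 16-bit big-endian length followed by that many bytes.
        (length,) = struct.unpack_from("!H", rdata, offset)
        start, end = offset + 2, offset + 2 + length
        if end > len(rdata):
            raise ValueError("RDATA truncated")
        return rdata[start:end], end

    prime_field, offset = read_field(4)
    generator_field, offset = read_field(offset)
    public_field, offset = read_field(offset)

    result = {
        "flags": flags,
        "protocol": protocol,
        "prime": None,
        "generator": None,
        "well_known_pair": None,
        "public_value": int.from_bytes(public_field, "big"),
    }
    if len(prime_field) in (1, 2):
        # A prime length of 1 or 2 means the field is a table index.
        result["well_known_pair"] = int.from_bytes(prime_field, "big")
    elif len(prime_field) >= 16:
        result["prime"] = int.from_bytes(prime_field, "big")
        result["generator"] = int.from_bytes(generator_field, "big")
    else:
        # Prime lengths of 0 and 3 through 15 are reserved.
        raise ValueError("reserved prime length")
    return result
```

A caller that receives a table index would look up the prime and generator in its own copy of the well-known pairs. That lookup is deliberately left out of the sketch.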
+ + The corresponding algorithm=2 SIG resource record is not used, so no + format for it is defined. + +3. Performance Considerations + + Current DNS implementations are optimized for small transfers, + typically less than 512 bytes including overhead. While larger + transfers will perform correctly and work is underway to make larger + transfers more efficient, it is still advisable to make reasonable + efforts to minimize the size of KEY RR sets stored within the DNS + consistent with adequate security. Keep in mind that in a secure + zone, an authenticating SIG RR will also be returned. + +4. IANA Considerations + + Assignment of meaning to Prime Lengths of 0 and 3 through 15 requires + an IETF consensus. + + Well known prime/generator pairs number 0x0000 through 0x07FF can + only be assigned by an IETF standards action and this Proposed + Standard assigns 0x0001 through 0x0002. Pairs number 0x0800 through + 0xBFFF can be assigned based on RFC documentation. Pairs number + 0xC000 through 0xFFFF are available for private use and are not + centrally coordinated. Use of such private pairs outside of a closed + environment may result in conflicts. + +5. Security Considerations + + Many of the general security considerations in [RFC 2535] apply. Keys + retrieved from the DNS should not be trusted unless (1) they have + been securely obtained from a secure resolver or independently + verified by the user and (2) this secure resolver and secure + obtainment or independent verification conform to security policies + acceptable to the user. As with all cryptographic algorithms, + evaluating the necessary strength of the key is important and + dependent on local policy. + + In addition, the usual Diffie-Hellman key strength considerations + apply. (p-1)/2 should also be prime, g should be primitive mod p, p + should be "large", etc.
[Schneier] + + + + +Eastlake Standards Track [Page 4] + +RFC 2539 Diffie-Hellman Keys in the DNS March 1999 + + +References + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and + Facilities", STD 13, RFC 1034, November 1987. + + [RFC 1035] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, November 1987. + + [RFC 2535] Eastlake, D., "Domain Name System Security Extensions", + RFC 2535, March 1999. + + [Schneier] Bruce Schneier, "Applied Cryptography: Protocols, + Algorithms, and Source Code in C", 1996, John Wiley and + Sons + +Author's Address + + Donald E. Eastlake 3rd + IBM + 65 Shindegan Hill Road, RR #1 + Carmel, NY 10512 + + Phone: +1-914-276-2668(h) + +1-914-784-7913(w) + Fax: +1-914-784-3833(w) + EMail: dee3@us.ibm.com + + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 5] + +RFC 2539 Diffie-Hellman Keys in the DNS March 1999 + + +Appendix A: Well known prime/generator pairs + + These numbers are copied from the IPSEC effort where the derivation + of these values is more fully explained and additional information is + available. Richard Schroeppel performed all the mathematical and + computational work for this appendix. + +A.1. Well-Known Group 1: A 768 bit prime + + The prime is 2^768 - 2^704 - 1 + 2^64 * { [2^638 pi] + 149686 }. Its + decimal value is + 155251809230070893513091813125848175563133404943451431320235 + 119490296623994910210725866945387659164244291000768028886422 + 915080371891804634263272761303128298374438082089019628850917 + 0691316593175367469551763119843371637221007210577919 + + Prime modulus: Length (32 bit words): 24, Data (hex): + FFFFFFFF FFFFFFFF C90FDAA2 2168C234 C4C6628B 80DC1CD1 + 29024E08 8A67CC74 020BBEA6 3B139B22 514A0879 8E3404DD + EF9519B3 CD3A431B 302B0A6D F25F1437 4FE1356D 6D51C245 + E485B576 625E7EC6 F44C42E9 A63A3620 FFFFFFFF FFFFFFFF + + Generator: Length (32 bit words): 1, Data (hex): 2 + +A.2. 
Well-Known Group 2: A 1024 bit prime + + The prime is 2^1024 - 2^960 - 1 + 2^64 * { [2^894 pi] + 129093 }. + Its decimal value is + 179769313486231590770839156793787453197860296048756011706444 + 423684197180216158519368947833795864925541502180565485980503 + 646440548199239100050792877003355816639229553136239076508735 + 759914822574862575007425302077447712589550957937778424442426 + 617334727629299387668709205606050270810842907692932019128194 + 467627007 + + Prime modulus: Length (32 bit words): 32, Data (hex): + FFFFFFFF FFFFFFFF C90FDAA2 2168C234 C4C6628B 80DC1CD1 + 29024E08 8A67CC74 020BBEA6 3B139B22 514A0879 8E3404DD + EF9519B3 CD3A431B 302B0A6D F25F1437 4FE1356D 6D51C245 + E485B576 625E7EC6 F44C42E9 A637ED6B 0BFF5CB6 F406B7ED + EE386BFB 5A899FA5 AE9F2411 7C4B1FE6 49286651 ECE65381 + FFFFFFFF FFFFFFFF + + Generator: Length (32 bit words): 1, Data (hex): 2 + + + + + + + +Eastlake Standards Track [Page 6] + +RFC 2539 Diffie-Hellman Keys in the DNS March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. 
+ + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Standards Track [Page 7] + diff --git a/doc/rfc/rfc2540.txt b/doc/rfc/rfc2540.txt new file mode 100644 index 00000000..63148061 --- /dev/null +++ b/doc/rfc/rfc2540.txt @@ -0,0 +1,339 @@ + + + + + + +Network Working Group D. Eastlake +Request for Comments: 2540 IBM +Category: Experimental March 1999 + + + Detached Domain Name System (DNS) Information + +Status of this Memo + + This memo defines an Experimental Protocol for the Internet + community. It does not specify an Internet standard of any kind. + Discussion and suggestions for improvement are requested. + Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +Abstract + + A standard format is defined for representing detached DNS + information. This is anticipated to be of use for storing + information retrieved from the Domain Name System (DNS), including + security information, in archival contexts or contexts not connected + to the Internet. + +Table of Contents + + Abstract...................................................1 + 1. Introduction............................................1 + 2. General Format..........................................2 + 2.1 Binary Format..........................................3 + 2.2. Text Format...........................................4 + 3. Usage Example...........................................4 + 4. 
IANA Considerations.....................................4 + 5. Security Considerations.................................4 + References.................................................5 + Author's Address...........................................5 + Full Copyright Statement...................................6 + +1. Introduction + + The Domain Name System (DNS) is a replicated hierarchical distributed + database system [RFC 1034, 1035] that can provide highly available + service. It provides the operational basis for Internet host name to + address translation, automatic SMTP mail routing, and other basic + Internet functions. The DNS has been extended as described in [RFC + 2535] to permit the general storage of public cryptographic keys in + + + +Eastlake Experimental [Page 1] + +RFC 2540 Detached DNS Information March 1999 + + + the DNS and to enable the authentication of information retrieved + from the DNS through digital signatures. + + The DNS was not originally designed for storage of information + outside of the active zones and authoritative master files that are + part of the connected DNS. However, there may be cases where this is + useful, particularly in connection with archived security + information. + +2. General Format + + The formats used for detached Domain Name System (DNS) information + are similar to those used for connected DNS information. The primary + difference is that elements of the connected DNS system (unless they + are an authoritative server for the zone containing the information) + are required to count down the Time To Live (TTL) associated with + each DNS Resource Record (RR) and discard them (possibly fetching a + fresh copy) when the TTL reaches zero. In contrast to this, detached + information may be stored in an off-line file, where it cannot be + updated, and perhaps used to authenticate historic data or it might + be received via non-DNS protocols long after it was retrieved from + the DNS.
Therefore, it is not practical to count down detached DNS + information TTL and it may be necessary to keep the data beyond the + point where the TTL (which is defined as an unsigned field) would + underflow. To preserve information as to the freshness of this + detached data, it is accompanied by its retrieval time. + + Whatever retrieves the information from the DNS must associate this + retrieval time with it. The retrieval time remains fixed thereafter. + When the current time minus the retrieval time exceeds the TTL for + any particular detached RR, it is no longer a valid copy within the + normal connected DNS scheme. This may make it invalid in context for + some detached purposes as well. If the RR is a SIG (signature) RR it + also has an expiration time. Regardless of the TTL, it and any RRs + it signs can not be considered authenticated after the signature + expiration time. + + + + + + + + + + + + + + + +Eastlake Experimental [Page 2] + +RFC 2540 Detached DNS Information March 1999 + + +2.1 Binary Format + + The standard binary format for detached DNS information is as + follows: + + 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | first retrieval time | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RR count | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Resource Records (RRs) | + / / + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| + | next retrieval time | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RR count | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Resource Records (RRs) | + / / + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + / ... 
/ + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | hex 20 | + +-+-+-+-+-+-+-+-+ + + Retrieval time - the time that the immediately following information + was obtained from the connected DNS system. It is an unsigned + number of seconds since the start of 1 January 1970, GMT, + ignoring leap seconds, in network (big-endian) order. Note that + this time cannot be before the initial proposal of this + standard. Therefore, the initial byte of an actual retrieval + time, considered as a 32 bit unsigned quantity, would always be + larger than 20 hex. The end of detached DNS information is + indicated by a "retrieval time" field initial byte equal to 0x20. + Use of a "retrieval time" field with a leading unsigned byte of + zero indicates a 64 bit quantity (actually 8 leading zero bits plus a 56 + bit quantity). This 64 bit format will be required when + retrieval time is larger than 0xFFFFFFFF, which is some time in + the year 2106. The meaning of retrieval times with an initial + byte between 0x01 and 0x1F is reserved (see section 4). + Retrieval times will not generally be 32 bit aligned with respect + to each other due to the variable length nature of RRs. + + RR count - an unsigned integer number (with bytes in network order) + of following resource records retrieved at the preceding + retrieval time. + + + + + +Eastlake Experimental [Page 3] + +RFC 2540 Detached DNS Information March 1999 + + + Resource Records - the actual data which is in the same format as if + it were being transmitted in a DNS response. In particular, name + compression via pointers is permitted with the origin at the + beginning of the particular detached information data section, + just after the RR count. + +2.2. Text Format + + The standard text format for detached DNS information is as + prescribed for zone master files [RFC 1035] except that the $INCLUDE + control entry is prohibited and the new $DATE entry is required + (unless the information set is empty).
$DATE is followed by the date + and time that the following information was obtained from the DNS + system as described for retrieval time in section 2.1 above. It is + in the text format YYYYMMDDHHMMSS where YYYY is the year (which may + be more than four digits to cover years after 9999), the first MM is + the month number (01-12), DD is the day of the month (01-31), HH is + the hour in 24-hour notation (00-23), the second MM is the minute + (00-59), and SS is the second (00-59). Thus a $DATE must appear + before the first RR and at every change in retrieval time through the + detached information. + +3. Usage Example + + A document might be authenticated by a key retrieved from the DNS in + a KEY resource record (RR). To later prove the authenticity of this + document, it would be desirable to preserve the KEY RR for that + public key, the SIG RR signing that KEY RR, the KEY RR for the key + used to authenticate that SIG, and so on through SIG and KEY RRs + until a well known trusted key is reached, perhaps the key for the + DNS root or some third party authentication service. (In some cases + these KEY RRs will actually be sets of KEY RRs with the same owner + and class because SIGs actually sign such record sets.) + + This information could be preserved as a set of detached DNS + information blocks. + +4. IANA Considerations + + Allocation of meanings to retrieval time fields with an initial byte + between 0x01 and 0x1F requires an IETF consensus. + +5. Security Considerations + + The entirety of this document concerns a means to represent detached + DNS information. Such detached resource records may be security + relevant and/or secured information as described in [RFC 2535]. The + detached format provides no overall security for sets of detached + + + +Eastlake Experimental [Page 4] + +RFC 2540 Detached DNS Information March 1999 + + + information or for the association between retrieval time and + information.
This can be provided by wrapping the detached + information format with some other form of signature. However, if + the detached information is accompanied by SIG RRs, its validity + period is indicated in those SIG RRs so the retrieval time might be + of secondary importance. + +References + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and + Facilities", STD 13, RFC 1034, November 1987. + + [RFC 1035] Mockapetris, P., " Domain Names - Implementation and + Specifications", STD 13, RFC 1035, November 1987. + + [RFC 2535] Eastlake, D., "Domain Name System Security Extensions", + RFC 2535, March 1999. + +Author's Address + + Donald E. Eastlake 3rd + IBM + 65 Shindegan Hill Road, RR #1 + Carmel, NY 10512 + + Phone: +1-914-276-2668(h) + +1-914-784-7913(w) + Fax: +1-914-784-3833(w) + EMail: dee3@us.ibm.com + + + + + + + + + + + + + + + + + + + + + + +Eastlake Experimental [Page 5] + +RFC 2540 Detached DNS Information March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. 
+ + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Experimental [Page 6] + diff --git a/doc/rfc/rfc2541.txt b/doc/rfc/rfc2541.txt new file mode 100644 index 00000000..a62ed2b4 --- /dev/null +++ b/doc/rfc/rfc2541.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group D. Eastlake +Request for Comments: 2541 IBM +Category: Informational March 1999 + + + DNS Security Operational Considerations + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +Abstract + + Secure DNS is based on cryptographic techniques. A necessary part of + the strength of these techniques is careful attention to the + operational aspects of key and signature generation, lifetime, size, + and storage. In addition, special attention must be paid to the + security of the high level zones, particularly the root zone. This + document discusses these operational aspects for keys and signatures + used in connection with the KEY and SIG DNS resource records. 
+ +Acknowledgments + + The contributions and suggestions of the following persons (in + alphabetic order) are gratefully acknowledged: + + John Gilmore + Olafur Gudmundsson + Charlie Kaufman + + + + + + + + + + + + + + + + +Eastlake Informational [Page 1] + +RFC 2541 DNS Security Operational Considerations March 1999 + + +Table of Contents + + Abstract...................................................1 + Acknowledgments............................................1 + 1. Introduction............................................2 + 2. Public/Private Key Generation...........................2 + 3. Public/Private Key Lifetimes............................2 + 4. Public/Private Key Size Considerations..................3 + 4.1 RSA Key Sizes..........................................3 + 4.2 DSS Key Sizes..........................................4 + 5. Private Key Storage.....................................4 + 6. High Level Zones, The Root Zone, and The Meta-Root Key..5 + 7. Security Considerations.................................5 + References.................................................6 + Author's Address...........................................6 + Full Copyright Statement...................................7 + +1. Introduction + + This document describes operational considerations for the + generation, lifetime, size, and storage of DNS cryptographic keys and + signatures for use in the KEY and SIG resource records [RFC 2535]. + Particular attention is paid to high level zones and the root zone. + +2. Public/Private Key Generation + + Careful generation of all keys is a sometimes overlooked but + absolutely essential element in any cryptographically secure system. + The strongest algorithms used with the longest keys are still of no + use if an adversary can guess enough to lower the size of the likely + key space so that it can be exhaustively searched. Technical + suggestions for the generation of random keys will be found in [RFC + 1750]. 
+ + Long term keys are particularly sensitive as they will represent a + more valuable target and be subject to attack for a longer time than + short period keys. It is strongly recommended that long term key + generation occur off-line in a manner isolated from the network via + an air gap or, at a minimum, high level secure hardware. + +3. Public/Private Key Lifetimes + + No key should be used forever. The longer a key is in use, the + greater the probability that it will have been compromised through + carelessness, accident, espionage, or cryptanalysis. Furthermore, if + + + + + + +Eastlake Informational [Page 2] + +RFC 2541 DNS Security Operational Considerations March 1999 + + + key rollover is a rare event, there is an increased risk that, when + the time does come to change the key, no one at the site will + remember how to do it or operational problems will have developed in + the key rollover procedures. + + While public key lifetime is a matter of local policy, these + considerations imply that, unless there are extraordinary + circumstances, no long term key should have a lifetime significantly + over four years. In fact, a reasonable guideline for long term keys + that are kept off-line and carefully guarded is a 13 month lifetime + with the intent that they be replaced every year. A reasonable + maximum lifetime for keys that are used for transaction security or + the like and are kept on line is 36 days with the intent that they be + replaced monthly or more often. In many cases, a key lifetime of + somewhat over a day may be reasonable. + + On the other hand, public keys with too short a lifetime can lead to + excessive resource consumption in re-signing data and retrieving + fresh information because cached information becomes stale. In the + Internet environment, almost all public keys should have lifetimes no + shorter than three minutes, which is a reasonable estimate of maximum + packet delay even in unusual circumstances. + +4. 
Public/Private Key Size Considerations + + There are a number of factors that affect public key size choice for + use in the DNS security extension. Unfortunately, these factors + usually do not all point in the same direction. Choice of zone key + size should generally be made by the zone administrator depending on + their local conditions. + + For most schemes, larger keys are more secure but slower. In + addition, larger keys increase the size of the KEY and SIG RRs. This + increases the chance of DNS UDP packet overflow and the possible + necessity for using higher overhead TCP in responses. + +4.1 RSA Key Sizes + + Given a small public exponent, verification (the most common + operation) for the MD5/RSA algorithm will vary roughly with the + square of the modulus length, signing will vary with the cube of the + modulus length, and key generation (the least common operation) will + vary with the fourth power of the modulus length. The current best + algorithms for factoring a modulus and breaking RSA security vary + roughly with the 1.6 power of the modulus itself. Thus going from a + 640 bit modulus to a 1280 bit modulus only increases the verification + time by a factor of 4 but may increase the work factor of breaking + the key by over 2^900. + + + +Eastlake Informational [Page 3] + +RFC 2541 DNS Security Operational Considerations March 1999 + + + The recommended minimum RSA algorithm modulus size is 704 bits which + is believed by the author to be secure at this time. But high level + zones in the DNS tree may wish to set a higher minimum, perhaps 1000 + bits, for security reasons. (Since the United States National + Security Agency generally permits export of encryption systems using + an RSA modulus of up to 512 bits, use of that small a modulus, i.e. + n, must be considered weak.) + + For an RSA key used only to secure data and not to secure other keys, + 704 bits should be adequate at this time. 
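The scaling rules quoted in section 4.1 can be checked with a little arithmetic. The following is a minimal sketch in C, not part of the RFC: the helper name cost_factor is hypothetical, and it models only the polynomial costs the text states (verification with the square, signing with the cube, key generation with the fourth power of the modulus length). It says nothing about the work factor of breaking a key.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the RFC): relative cost increase when the
 * modulus length grows by `ratio`, for an operation whose cost scales with
 * the `power`-th power of that length.
 */
static uint64_t cost_factor(uint64_t ratio, unsigned power)
{
    uint64_t factor = 1;

    while (power-- > 0)
        factor *= ratio;        /* multiply ratio by itself `power` times */
    return factor;
}
```

Going from a 640 bit to a 1280 bit modulus doubles the length, so cost_factor(1280 / 640, 2) gives the factor of 4 for verification that the text mentions. Under the same assumptions, signing would cost about 8 times as much and key generation about 16 times as much.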
+ +4.2 DSS Key Sizes + + DSS keys are probably roughly as strong as an RSA key of the same + length but DSS signatures are significantly smaller. + +5. Private Key Storage + + It is recommended that, where possible, zone private keys and the + zone file master copy be kept and used in off-line, non-network + connected, physically secure machines only. Periodically an + application can be run to add authentication to a zone by adding SIG + and NXT RRs and adding no-key type KEY RRs for subzones/algorithms + where a real KEY RR for the subzone with that algorithm is not + provided. Then the augmented file can be transferred, perhaps by + sneaker-net, to the networked zone primary server machine. + + The idea is to have a one way information flow to the network to + avoid the possibility of tampering from the network. Keeping the + zone master file on-line on the network and simply cycling it through + an off-line signer does not do this. The on-line version could still + be tampered with if the host it resides on is compromised. For + maximum security, the master copy of the zone file should be off net + and should not be updated based on an unsecured network mediated + communication. + + This is not possible if the zone is to be dynamically updated + securely [RFC 2137]. At least a private key capable of updating the + SOA and NXT chain must be on line in that case. + + Secure resolvers must be configured with some trusted on-line public + key information (or a secure path to such a resolver) or they will be + unable to authenticate. Although on line, this public key + information must be protected or it could be altered so that spoofed + DNS data would appear authentic. + + + + + + +Eastlake Informational [Page 4] + +RFC 2541 DNS Security Operational Considerations March 1999 + + + Non-zone private keys, such as host or user keys, generally have to + be kept on line to be used for real-time purposes such as DNS + transaction security. + +6. 
High Level Zones, The Root Zone, and The Meta-Root Key + + Higher level zones are generally more sensitive than lower level + zones. Anyone controlling or breaking the security of a zone thereby + obtains authority over all of its subdomains (except in the case of + resolvers that have locally configured the public key of a + subdomain). Therefore, extra care should be taken with high level + zones and strong keys used. + + The root zone is the most critical of all zones. Someone controlling + or compromising the security of the root zone would control the + entire DNS name space of all resolvers using that root zone (except + in the case of resolvers that have locally configured the public key + of a subdomain). Therefore, the utmost care must be taken in the + securing of the root zone. The strongest and most carefully handled + keys should be used. The root zone private key should always be kept + off line. + + Many resolvers will start at a root server for their access to and + authentication of DNS data. Securely updating an enormous population + of resolvers around the world will be extremely difficult. Yet the + guidelines in section 3 above would imply that the root zone private + key be changed annually or more often and if it were statically + configured at all these resolvers, it would have to be updated when + changed. + + To permit relatively frequent change to the root zone key yet + minimize exposure of the ultimate key of the DNS tree, there will be + a "meta-root" key used very rarely and then only to sign a sequence + of regular root key RRsets with overlapping time validity periods + that are to be rolled out. The root zone contains the meta-root and + current regular root KEY RR(s) signed by SIG RRs under both the + meta-root and other root private key(s) themselves. + + The utmost security in the storage and use of the meta-root key is + essential. The exact techniques and precautions to be used are + beyond the scope of this document. 
Because of its special position, + it may be best to continue with the same meta-root key for an + extended period of time such as ten to fifteen years. + +7. Security Considerations + + The entirety of this document is concerned with operational + considerations of public/private key pair DNS Security. + + + +Eastlake Informational [Page 5] + +RFC 2541 DNS Security Operational Considerations March 1999 + + +References + + [RFC 1034] Mockapetris, P., "Domain Names - Concepts and + Facilities", STD 13, RFC 1034, November 1987. + + [RFC 1035] Mockapetris, P., "Domain Names - Implementation and + Specifications", STD 13, RFC 1035, November 1987. + + [RFC 1750] Eastlake, D., Crocker, S. and J. Schiller, "Randomness + Requirements for Security", RFC 1750, December 1994. + + [RFC 2065] Eastlake, D. and C. Kaufman, "Domain Name System + Security Extensions", RFC 2065, January 1997. + + [RFC 2137] Eastlake, D., "Secure Domain Name System Dynamic + Update", RFC 2137, April 1997. + + [RFC 2535] Eastlake, D., "Domain Name System Security Extensions", + RFC 2535, March 1999. + + [RSA FAQ] RSADSI Frequently Asked Questions periodic posting. + +Author's Address + + Donald E. Eastlake 3rd + IBM + 65 Shindegan Hill Road, RR #1 + Carmel, NY 10512 + + Phone: +1-914-276-2668(h) + +1-914-784-7913(w) + Fax: +1-914-784-3833(w) + EMail: dee3@us.ibm.com + + + + + + + + + + + + + + + + + + +Eastlake Informational [Page 6] + +RFC 2541 DNS Security Operational Considerations March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. 
However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Eastlake Informational [Page 7] + diff --git a/doc/rfc/rfc2553.txt b/doc/rfc/rfc2553.txt new file mode 100644 index 00000000..6989bf30 --- /dev/null +++ b/doc/rfc/rfc2553.txt @@ -0,0 +1,2299 @@ + + + + + + +Network Working Group R. Gilligan +Request for Comments: 2553 FreeGate +Obsoletes: 2133 S. Thomson +Category: Informational Bellcore + J. Bound + Compaq + W. Stevens + Consultant + March 1999 + + + Basic Socket Interface Extensions for IPv6 + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +Abstract + + The de facto standard application program interface (API) for TCP/IP + applications is the "sockets" interface. Although this API was + developed for Unix in the early 1980s it has also been implemented on + a wide variety of non-Unix systems. 
TCP/IP applications written + using the sockets API have in the past enjoyed a high degree of + portability and we would like the same portability with IPv6 + applications. But changes are required to the sockets API to support + IPv6 and this memo describes these changes. These include a new + socket address structure to carry IPv6 addresses, new address + conversion functions, and some new socket options. These extensions + are designed to provide access to the basic IPv6 features required by + TCP and UDP applications, including multicasting, while introducing a + minimum of change into the system and providing complete + compatibility for existing IPv4 applications. Additional extensions + for advanced IPv6 features (raw sockets and access to the IPv6 + extension headers) are defined in another document [4]. + + + + + + + + + + +Gilligan, et. al. Informational [Page 1] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +Table of Contents + + 1. Introduction.................................................3 + 2. Design Considerations........................................3 + 2.1 What Needs to be Changed....................................4 + 2.2 Data Types..................................................5 + 2.3 Headers.....................................................5 + 2.4 Structures..................................................5 + 3. 
Socket Interface.............................................6 + 3.1 IPv6 Address Family and Protocol Family.....................6 + 3.2 IPv6 Address Structure......................................6 + 3.3 Socket Address Structure for 4.3BSD-Based Systems...........7 + 3.4 Socket Address Structure for 4.4BSD-Based Systems...........8 + 3.5 The Socket Functions........................................9 + 3.6 Compatibility with IPv4 Applications.......................10 + 3.7 Compatibility with IPv4 Nodes..............................10 + 3.8 IPv6 Wildcard Address......................................11 + 3.9 IPv6 Loopback Address......................................12 + 3.10 Portability Additions.....................................13 + 4. Interface Identification....................................16 + 4.1 Name-to-Index..............................................16 + 4.2 Index-to-Name..............................................17 + 4.3 Return All Interface Names and Indexes.....................17 + 4.4 Free Memory................................................18 + 5. Socket Options..............................................18 + 5.1 Unicast Hop Limit..........................................18 + 5.2 Sending and Receiving Multicast Packets....................19 + 6. Library Functions...........................................21 + 6.1 Nodename-to-Address Translation............................21 + 6.2 Address-To-Nodename Translation............................24 + 6.3 Freeing memory for getipnodebyname and getipnodebyaddr.....26 + 6.4 Protocol-Independent Nodename and Service Name Translation.26 + 6.5 Socket Address Structure to Nodename and Service Name......29 + 6.6 Address Conversion Functions...............................31 + 6.7 Address Testing Macros.....................................32 + 7. Summary of New Definitions..................................33 + 8. Security Considerations.....................................35 + 9. 
Year 2000 Considerations....................................35 + Changes From RFC 2133..........................................35 + Acknowledgments................................................38 + References.....................................................39 + Authors' Addresses.............................................40 + Full Copyright Statement.......................................41 + + + + + + + + +Gilligan, et. al. Informational [Page 2] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +1. Introduction + + While IPv4 addresses are 32 bits long, IPv6 interfaces are identified + by 128-bit addresses. The socket interface makes the size of an IP + address quite visible to an application; virtually all TCP/IP + applications for BSD-based systems have knowledge of the size of an + IP address. Those parts of the API that expose the addresses must be + changed to accommodate the larger IPv6 address size. IPv6 also + introduces new features (e.g., traffic class and flowlabel), some of + which must be made visible to applications via the API. This memo + defines a set of extensions to the socket interface to support the + larger address size and new features of IPv6. + +2. Design Considerations + + There are a number of important considerations in designing changes + to this well-worn API: + + - The API changes should provide both source and binary + compatibility for programs written to the original API. That + is, existing program binaries should continue to operate when + run on a system supporting the new API. In addition, existing + applications that are re-compiled and run on a system supporting + the new API should continue to operate. Simply put, the API + changes for IPv6 should not break existing programs. An + additional mechanism for implementations to verify this is to + verify the new symbols are protected by Feature Test Macros as + described in IEEE Std 1003.1. (Such Feature Test Macros are not + defined by this RFC.) 
+ + - The changes to the API should be as small as possible in order + to simplify the task of converting existing IPv4 applications to + IPv6. + + - Where possible, applications should be able to use this API to + interoperate with both IPv6 and IPv4 hosts. Applications should + not need to know which type of host they are communicating with. + + - IPv6 addresses carried in data structures should be 64-bit + aligned. This is necessary in order to obtain optimum + performance on 64-bit machine architectures. + + Because of the importance of providing IPv4 compatibility in the API, + these extensions are explicitly designed to operate on machines that + provide complete support for both IPv4 and IPv6. A subset of this + API could probably be designed for operation on systems that support + only IPv6. However, this is not addressed in this memo. + + + + +Gilligan, et. al. Informational [Page 3] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +2.1 What Needs to be Changed + + The socket interface API consists of a few distinct components: + + - Core socket functions. + + - Address data structures. + + - Name-to-address translation functions. + + - Address conversion functions. + + The core socket functions -- those functions that deal with such + things as setting up and tearing down TCP connections, and sending + and receiving UDP packets -- were designed to be transport + independent. Where protocol addresses are passed as function + arguments, they are carried via opaque pointers. A protocol-specific + address data structure is defined for each protocol that the socket + functions support. Applications must cast pointers to these + protocol-specific address structures into pointers to the generic + "sockaddr" address structure when using the socket functions. These + functions need not change for IPv6, but a new IPv6-specific address + data structure is needed. + + The "sockaddr_in" structure is the protocol-specific data structure + for IPv4. 
This data structure actually includes 8-octets of unused + space, and it is tempting to try to use this space to adapt the + sockaddr_in structure to IPv6. Unfortunately, the sockaddr_in + structure is not large enough to hold the 16-octet IPv6 address as + well as the other information (address family and port number) that + is needed. So a new address data structure must be defined for IPv6. + + IPv6 addresses are scoped [2] so they could be link-local, site, + organization, global, or other scopes at this time undefined. To + support applications that want to be able to identify a set of + interfaces for a specific scope, the IPv6 sockaddr_in structure must + support a field that can be used by an implementation to identify a + set of interfaces identifying the scope for an IPv6 address. + + The name-to-address translation functions in the socket interface are + gethostbyname() and gethostbyaddr(). These are left as is and new + functions are defined to support IPv4 and IPv6. Additionally, the + POSIX 1003.g draft [3] specifies a new nodename-to-address + translation function which is protocol independent. This function + can also be used with IPv4 and IPv6. + + + + + + +Gilligan, et. al. Informational [Page 4] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + The address conversion functions -- inet_ntoa() and inet_addr() -- + convert IPv4 addresses between binary and printable form. These + functions are quite specific to 32-bit IPv4 addresses. We have + designed two analogous functions that convert both IPv4 and IPv6 + addresses, and carry an address type parameter so that they can be + extended to other protocol families as well. + + Finally, a few miscellaneous features are needed to support IPv6. + New interfaces are needed to support the IPv6 traffic class, flow + label, and hop limit header fields. New socket options are needed to + control the sending and receiving of IPv6 multicast packets. 
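The two address conversion functions described above, which take an address type parameter, correspond to what POSIX systems provide as inet_pton() and inet_ntop(). The following is a minimal sketch in C under that assumption: the helper name roundtrip_v6 is hypothetical, and the code presumes a system where <arpa/inet.h> declares both functions.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: parse a printable IPv6 address into binary form and
 * convert it back to text. Returns 0 on success, -1 if either step fails.
 */
static int roundtrip_v6(const char *text, char *out, socklen_t outlen)
{
    struct in6_addr addr;

    /* inet_pton() returns 1 only when `text` is a valid AF_INET6 address. */
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return -1;

    /* inet_ntop() returns NULL if the output buffer is too small. */
    if (inet_ntop(AF_INET6, &addr, out, outlen) == NULL)
        return -1;

    return 0;
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for any IPv6 address in printable form. Passing AF_INET instead, with a struct in_addr, handles IPv4 addresses through the same pair of calls.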
+ + The socket interface will be enhanced in the future to provide access + to other IPv6 features. These extensions are described in [4]. + +2.2 Data Types + + The data types of the structure elements given in this memo are + intended to be examples, not absolute requirements. Whenever + possible, data types from Draft 6.6 (March 1997) of POSIX 1003.1g are + used: uintN_t means an unsigned integer of exactly N bits (e.g., + uint16_t). We also assume the argument data types from 1003.1g when + possible (e.g., the final argument to setsockopt() is a size_t + value). Whenever buffer sizes are specified, the POSIX 1003.1 size_t + data type is used (e.g., the two length arguments to getnameinfo()). + +2.3 Headers + + When function prototypes and structures are shown we show the headers + that must be #included to cause that item to be defined. + +2.4 Structures + + When structures are described the members shown are the ones that + must appear in an implementation. Additional, nonstandard members + may also be defined by an implementation. As an additional + precaution nonstandard members could be verified by Feature Test + Macros as described in IEEE Std 1003.1. (Such Feature Test Macros + are not defined by this RFC.) + + The ordering shown for the members of a structure is the recommended + ordering, given alignment considerations of multibyte members, but an + implementation may order the members differently. + + + + + + + + +Gilligan, et. al. Informational [Page 5] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +3. Socket Interface + + This section specifies the socket interface changes for IPv6. + +3.1 IPv6 Address Family and Protocol Family + + A new address family name, AF_INET6, is defined in <sys/socket.h>. + The AF_INET6 definition distinguishes between the original + sockaddr_in address data structure, and the new sockaddr_in6 data + structure. + + A new protocol family name, PF_INET6, is defined in <sys/socket.h>. 
+ Like most of the other protocol family names, this will usually be + defined to have the same value as the corresponding address family + name: + + #define PF_INET6 AF_INET6 + + The PF_INET6 is used in the first argument to the socket() function + to indicate that an IPv6 socket is being created. + +3.2 IPv6 Address Structure + + A new in6_addr structure holds a single IPv6 address and is defined + as a result of including <netinet/in.h>: + + struct in6_addr { + uint8_t s6_addr[16]; /* IPv6 address */ + }; + + This data structure contains an array of sixteen 8-bit elements, + which make up one 128-bit IPv6 address. The IPv6 address is stored + in network byte order. + + The structure in6_addr above is usually implemented with an embedded + union with extra fields that force the desired alignment level in a + manner similar to BSD implementations of "struct in_addr". Those + additional implementation details are omitted here for simplicity. + + An example is as follows: + + + + + + + + + + + +Gilligan, et. al. Informational [Page 6] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + struct in6_addr { + union { + uint8_t _S6_u8[16]; + uint32_t _S6_u32[4]; + uint64_t _S6_u64[2]; + } _S6_un; + }; + #define s6_addr _S6_un._S6_u8 + +3.3 Socket Address Structure for 4.3BSD-Based Systems + + In the socket interface, a different protocol-specific data structure + is defined to carry the addresses for each protocol suite. Each + protocol-specific data structure is designed so it can be cast into a + protocol-independent data structure -- the "sockaddr" structure. + Each has a "family" field that overlays the "sa_family" of the + sockaddr data structure. This field identifies the type of the data + structure. + + The sockaddr_in structure is the protocol-specific address data + structure for IPv4. It is used to pass addresses between applications + and the system in the socket functions. 
The following sockaddr_in6 + structure holds IPv6 addresses and is defined as a result of including + the <netinet/in.h> header: + +struct sockaddr_in6 { + sa_family_t sin6_family; /* AF_INET6 */ + in_port_t sin6_port; /* transport layer port # */ + uint32_t sin6_flowinfo; /* IPv6 traffic class & flow info */ + struct in6_addr sin6_addr; /* IPv6 address */ + uint32_t sin6_scope_id; /* set of interfaces for a scope */ +}; + + This structure is designed to be compatible with the sockaddr data + structure used in the 4.3BSD release. + + The sin6_family field identifies this as a sockaddr_in6 structure. + This field overlays the sa_family field when the buffer is cast to a + sockaddr data structure. The value of this field must be AF_INET6. + + The sin6_port field contains the 16-bit UDP or TCP port number. This + field is used in the same way as the sin_port field of the + sockaddr_in structure. The port number is stored in network byte + order. + + + + + + + +Gilligan, et. al. Informational [Page 7] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + The sin6_flowinfo field is a 32-bit field that contains two pieces of + information: the traffic class and the flow label. The contents and + interpretation of this member is specified in [1]. The sin6_flowinfo + field SHOULD be set to zero by an implementation prior to using the + sockaddr_in6 structure by an application on receive operations. + + The sin6_addr field is a single in6_addr structure (defined in the + previous section). This field holds one 128-bit IPv6 address. The + address is stored in network byte order. + + The ordering of elements in this structure is specifically designed + so that when sin6_addr field is aligned on a 64-bit boundary, the + start of the structure will also be aligned on a 64-bit boundary. + This is done for optimum performance on 64-bit architectures. 
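The field descriptions above can be illustrated by filling in a sockaddr_in6 before passing it to a socket function. The following is a minimal sketch in C, not taken from the RFC: the helper name fill_sockaddr_in6 is hypothetical, and the code assumes a system whose <netinet/in.h> declares the structure shown in this section.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: initialize a sockaddr_in6 for the given address and
 * port (port supplied in host byte order).
 */
static void fill_sockaddr_in6(struct sockaddr_in6 *sa,
                              const struct in6_addr *addr,
                              unsigned short port)
{
    /* Clearing the structure leaves sin6_flowinfo and sin6_scope_id zero. */
    memset(sa, 0, sizeof(*sa));

    sa->sin6_family = AF_INET6;   /* must be AF_INET6 */
    sa->sin6_port = htons(port);  /* stored in network byte order */
    sa->sin6_addr = *addr;        /* already in network byte order */
}
```

Zeroing the whole structure first is a design choice in this sketch. It avoids passing uninitialized flow or scope information to the system and also covers any nonstandard members an implementation may add.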
+ + The sin6_scope_id field is a 32-bit integer that identifies a set of + interfaces as appropriate for the scope of the address carried in the + sin6_addr field. For a link scope sin6_addr, sin6_scope_id would be + an interface index. For a site scope sin6_addr, sin6_scope_id would + be a site identifier. The mapping of sin6_scope_id to an interface + or set of interfaces is left to implementation and future + specifications on the subject of site identifiers. + + Notice that the sockaddr_in6 structure will normally be larger than + the generic sockaddr structure. On many existing implementations the + sizeof(struct sockaddr_in) equals sizeof(struct sockaddr), with both + being 16 bytes. Any existing code that makes this assumption needs + to be examined carefully when converting to IPv6. + +3.4 Socket Address Structure for 4.4BSD-Based Systems + + The 4.4BSD release includes a small, but incompatible change to the + socket interface. The "sa_family" field of the sockaddr data + structure was changed from a 16-bit value to an 8-bit value, and the + space saved used to hold a length field, named "sa_len". The + sockaddr_in6 data structure given in the previous section cannot be + correctly cast into the newer sockaddr data structure. For this + reason, the following alternative IPv6 address data structure is + provided to be used on systems based on 4.4BSD. It is defined as a + result of including the <netinet/in.h> header. + + + + + + + + + + + +Gilligan, et. al. 
Informational [Page 8] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +struct sockaddr_in6 { + uint8_t sin6_len; /* length of this struct */ + sa_family_t sin6_family; /* AF_INET6 */ + in_port_t sin6_port; /* transport layer port # */ + uint32_t sin6_flowinfo; /* IPv6 flow information */ + struct in6_addr sin6_addr; /* IPv6 address */ + uint32_t sin6_scope_id; /* set of interfaces for a scope */ +}; + + The only differences between this data structure and the 4.3BSD + variant are the inclusion of the length field, and the change of the + family field to an 8-bit data type. The definitions of all the other + fields are identical to the structure defined in the previous + section. + + Systems that provide this version of the sockaddr_in6 data structure + must also declare SIN6_LEN as a result of including the + <netinet/in.h> header. This macro allows applications to determine + whether they are being built on a system that supports the 4.3BSD or + 4.4BSD variants of the data structure. + +3.5 The Socket Functions + + Applications call the socket() function to create a socket descriptor + that represents a communication endpoint. The arguments to the + socket() function tell the system which protocol to use, and what + format address structure will be used in subsequent functions. For + example, to create an IPv4/TCP socket, applications make the call: + + s = socket(PF_INET, SOCK_STREAM, 0); + + To create an IPv4/UDP socket, applications make the call: + + s = socket(PF_INET, SOCK_DGRAM, 0); + + Applications may create IPv6/TCP and IPv6/UDP sockets by simply using + the constant PF_INET6 instead of PF_INET in the first argument. For + example, to create an IPv6/TCP socket, applications make the call: + + s = socket(PF_INET6, SOCK_STREAM, 0); + + To create an IPv6/UDP socket, applications make the call: + + s = socket(PF_INET6, SOCK_DGRAM, 0); + + + + + + + +Gilligan, et. al. 
Informational [Page 9] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + Once the application has created a PF_INET6 socket, it must use the + sockaddr_in6 address structure when passing addresses in to the + system. The functions that the application uses to pass addresses + into the system are: + + bind() + connect() + sendmsg() + sendto() + + The system will use the sockaddr_in6 address structure to return + addresses to applications that are using PF_INET6 sockets. The + functions that return an address from the system to an application + are: + + accept() + recvfrom() + recvmsg() + getpeername() + getsockname() + + No changes to the syntax of the socket functions are needed to + support IPv6, since all of the "address carrying" functions use an + opaque address pointer, and carry an address length as a function + argument. + +3.6 Compatibility with IPv4 Applications + + In order to support the large base of applications using the original + API, system implementations must provide complete source and binary + compatibility with the original API. This means that systems must + continue to support PF_INET sockets and the sockaddr_in address + structure. Applications must be able to create IPv4/TCP and IPv4/UDP + sockets using the PF_INET constant in the socket() function, as + described in the previous section. Applications should be able to + hold a combination of IPv4/TCP, IPv4/UDP, IPv6/TCP and IPv6/UDP + sockets simultaneously within the same process. + + Applications using the original API should continue to operate as + they did on systems supporting only IPv4. That is, they should + continue to interoperate with IPv4 nodes. + +3.7 Compatibility with IPv4 Nodes + + The API also provides a different type of compatibility: the ability + for IPv6 applications to interoperate with IPv4 applications. This + feature uses the IPv4-mapped IPv6 address format defined in the IPv6 + addressing architecture specification [2]. 
This address format
+   allows the IPv4 address of an IPv4 node to be represented as an IPv6
+   address. The IPv4 address is encoded into the low-order 32 bits of
+   the IPv6 address, and the high-order 96 bits hold the fixed prefix
+   0:0:0:0:0:FFFF. IPv4-mapped addresses are written as follows:
+
+      ::FFFF:<IPv4-address>
+
+   These addresses can be generated automatically by the
+   getipnodebyname() function when the specified host has only IPv4
+   addresses (as described in Section 6.1).
+
+   Applications may use PF_INET6 sockets to open TCP connections to IPv4
+   nodes, or send UDP packets to IPv4 nodes, by simply encoding the
+   destination's IPv4 address as an IPv4-mapped IPv6 address, and
+   passing that address, within a sockaddr_in6 structure, in the
+   connect() or sendto() call. When applications use PF_INET6 sockets
+   to accept TCP connections from IPv4 nodes, or receive UDP packets
+   from IPv4 nodes, the system returns the peer's address to the
+   application in the accept(), recvfrom(), or getpeername() call using
+   a sockaddr_in6 structure encoded this way.
+
+   Few applications will likely need to know which type of node they are
+   interoperating with. However, for those applications that do need to
+   know, the IN6_IS_ADDR_V4MAPPED() macro, defined in Section 6.7, is
+   provided.
+
+3.8 IPv6 Wildcard Address
+
+   While the bind() function allows applications to select the source IP
+   address of UDP packets and TCP connections, applications often want
+   the system to select the source address for them. With IPv4, one
+   specifies the address as the symbolic constant INADDR_ANY (called the
+   "wildcard" address) in the bind() call, or simply omits the bind()
+   entirely.
+
+   Since the IPv6 address type is a structure (struct in6_addr), a
+   symbolic constant can be used to initialize an IPv6 address variable,
+   but cannot be used in an assignment.
Therefore systems provide the + IPv6 wildcard address in two forms. + + The first version is a global variable named "in6addr_any" that is an + in6_addr structure. The extern declaration for this variable is + defined in <netinet/in.h>: + + extern const struct in6_addr in6addr_any; + + + + + + +Gilligan, et. al. Informational [Page 11] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + Applications use in6addr_any similarly to the way they use INADDR_ANY + in IPv4. For example, to bind a socket to port number 23, but let + the system select the source address, an application could use the + following code: + + struct sockaddr_in6 sin6; + . . . + sin6.sin6_family = AF_INET6; + sin6.sin6_flowinfo = 0; + sin6.sin6_port = htons(23); + sin6.sin6_addr = in6addr_any; /* structure assignment */ + . . . + if (bind(s, (struct sockaddr *) &sin6, sizeof(sin6)) == -1) + . . . + + The other version is a symbolic constant named IN6ADDR_ANY_INIT and + is defined in <netinet/in.h>. This constant can be used to + initialize an in6_addr structure: + + struct in6_addr anyaddr = IN6ADDR_ANY_INIT; + + Note that this constant can be used ONLY at declaration time. It can + not be used to assign a previously declared in6_addr structure. For + example, the following code will not work: + + /* This is the WRONG way to assign an unspecified address */ + struct sockaddr_in6 sin6; + . . . + sin6.sin6_addr = IN6ADDR_ANY_INIT; /* will NOT compile */ + + Be aware that the IPv4 INADDR_xxx constants are all defined in host + byte order but the IPv6 IN6ADDR_xxx constants and the IPv6 + in6addr_xxx externals are defined in network byte order. + +3.9 IPv6 Loopback Address + + Applications may need to send UDP packets to, or originate TCP + connections to, services residing on the local node. In IPv4, they + can do this by using the constant IPv4 address INADDR_LOOPBACK in + their connect(), sendto(), or sendmsg() call. 
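The byte-order note at the end of Section 3.8 and the IPv4 loopback usage above can be sketched as follows. This is a minimal illustration, not part of the API: the helper names fill_v4_loopback(), fill_v6_wildcard() and wildcard_forms_agree() are hypothetical, and the code assumes a system that already provides in6addr_any and IN6ADDR_ANY_INIT in <netinet/in.h>.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: fill an IPv4 socket address for the local node.
 * INADDR_LOOPBACK is defined in host byte order, so it has to pass
 * through htonl() before it is stored. */
void fill_v4_loopback(struct sockaddr_in *sin, unsigned short port)
{
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_port = htons(port);
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Hypothetical helper: fill an IPv6 wildcard socket address.
 * in6addr_any is already in network byte order and is copied with a
 * plain structure assignment. */
void fill_v6_wildcard(struct sockaddr_in6 *sin6, unsigned short port)
{
    memset(sin6, 0, sizeof(*sin6));
    sin6->sin6_family = AF_INET6;
    sin6->sin6_port = htons(port);
    sin6->sin6_addr = in6addr_any;
}

/* Hypothetical helper: returns 1 when the declaration-time initializer
 * and the global variable hold the same bytes. */
int wildcard_forms_agree(void)
{
    struct in6_addr from_init = IN6ADDR_ANY_INIT;
    return memcmp(&from_init, &in6addr_any, sizeof(from_init)) == 0;
}
```

The only conversion the caller performs on the IPv6 side is htons() on the port; the address itself is never byte-swapped.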
+ + IPv6 also provides a loopback address to contact local TCP and UDP + services. Like the unspecified address, the IPv6 loopback address is + provided in two forms -- a global variable and a symbolic constant. + + + + + + + +Gilligan, et. al. Informational [Page 12] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + The global variable is an in6_addr structure named + "in6addr_loopback." The extern declaration for this variable is + defined in <netinet/in.h>: + + extern const struct in6_addr in6addr_loopback; + + Applications use in6addr_loopback as they would use INADDR_LOOPBACK + in IPv4 applications (but beware of the byte ordering difference + mentioned at the end of the previous section). For example, to open + a TCP connection to the local telnet server, an application could use + the following code: + + struct sockaddr_in6 sin6; + . . . + sin6.sin6_family = AF_INET6; + sin6.sin6_flowinfo = 0; + sin6.sin6_port = htons(23); + sin6.sin6_addr = in6addr_loopback; /* structure assignment */ + . . . + if (connect(s, (struct sockaddr *) &sin6, sizeof(sin6)) == -1) + . . . + + The symbolic constant is named IN6ADDR_LOOPBACK_INIT and is defined + in <netinet/in.h>. It can be used at declaration time ONLY; for + example: + + struct in6_addr loopbackaddr = IN6ADDR_LOOPBACK_INIT; + + Like IN6ADDR_ANY_INIT, this constant cannot be used in an assignment + to a previously declared IPv6 address variable. + +3.10 Portability Additions + + One simple addition to the sockets API that can help application + writers is the "struct sockaddr_storage". This data structure can + simplify writing code portable across multiple address families and + platforms. This data structure is designed with the following goals. + + - It has a large enough implementation specific maximum size to + store the desired set of protocol specific socket address data + structures. 
Specifically, it is at least large enough to
+       accommodate sockaddr_in and sockaddr_in6 and possibly other
+       protocol specific socket addresses too.
+     - It is aligned at an appropriate boundary so that pointers to it
+       can be cast to pointers to protocol specific socket address
+       data structures and used to access their fields without
+       alignment problems (e.g., a pointer to it can be cast to a
+       pointer to sockaddr_in6 and/or sockaddr_in).
+     - It has the initial field(s) isomorphic to the fields of the
+       "struct sockaddr" data structure on that implementation which
+       can be used as discriminants for deriving the protocol in use.
+       These initial field(s) would on most implementations either be a
+       single field of type "sa_family_t" (isomorphic to sa_family
+       field, 16 bits) or two fields of type uint8_t and sa_family_t
+       respectively, (isomorphic to sa_len and sa_family, 8 bits
+       each).
+
+   An example implementation design of such a data structure would be as
+   follows.
+
+/*
+ * Desired design of maximum size and alignment
+ */
+#define _SS_MAXSIZE    128  /* Implementation specific max size */
+#define _SS_ALIGNSIZE  (sizeof (int64_t))
+                         /* Implementation specific desired alignment */
+/*
+ * Definitions used for sockaddr_storage structure paddings design.
+ */
+#define _SS_PAD1SIZE   (_SS_ALIGNSIZE - sizeof (sa_family_t))
+#define _SS_PAD2SIZE   (_SS_MAXSIZE - (sizeof (sa_family_t) + \
+                              _SS_PAD1SIZE + _SS_ALIGNSIZE))
+struct sockaddr_storage {
+    sa_family_t  __ss_family;     /* address family */
+    /* Following fields are implementation specific */
+    char      __ss_pad1[_SS_PAD1SIZE];
+              /* 6 byte pad, this is to make implementation */
+              /* specific pad up to alignment field that */
+              /* follows explicit in the data structure */
+    int64_t   __ss_align;     /* field to force desired structure */
+              /* storage alignment */
+    char      __ss_pad2[_SS_PAD2SIZE];
+              /* 112 byte pad to achieve desired size, */
+              /* _SS_MAXSIZE value minus size of __ss_family, */
+              /* __ss_pad1, __ss_align fields is 112 */
+};
+
+   On implementations where the sockaddr data structure includes a
+   "sa_len" field, this data structure would look like this:
+
+/*
+ * Definitions used for sockaddr_storage structure paddings design.
+ */
+#define _SS_PAD1SIZE   (_SS_ALIGNSIZE - \
+                              (sizeof (uint8_t) + sizeof (sa_family_t)))
+#define _SS_PAD2SIZE   (_SS_MAXSIZE - (sizeof (uint8_t) + \
+                              sizeof (sa_family_t) + \
+                              _SS_PAD1SIZE + _SS_ALIGNSIZE))
+struct sockaddr_storage {
+    uint8_t      __ss_len;        /* address length */
+    sa_family_t  __ss_family;     /* address family */
+    /* Following fields are implementation specific */
+    char      __ss_pad1[_SS_PAD1SIZE];
+              /* 6 byte pad, this is to make implementation */
+              /* specific pad up to alignment field that */
+              /* follows explicit in the data structure */
+    int64_t   __ss_align;     /* field to force desired structure */
+              /* storage alignment */
+    char      __ss_pad2[_SS_PAD2SIZE];
+              /* 112 byte pad to achieve desired size, */
+              /* _SS_MAXSIZE value minus size of __ss_len, */
+              /* __ss_family, __ss_pad1, __ss_align fields is 112 */
+};
+
+   The above example implementation illustrates a data structure which
+   will align on a 64 bit boundary.
An implementation specific field
+   "__ss_align" along with "__ss_pad1" is used to force a 64-bit
+   alignment, which is sufficient for the needs of the sockaddr_in6
+   (IPv6) and sockaddr_in (IPv4) address data structures. The size of
+   padding field __ss_pad1 depends on the chosen alignment boundary.
+   The size of padding field __ss_pad2 depends on the value chosen for
+   the total size of the structure. This size and alignment are
+   represented in the above example by implementation specific (not
+   required) constants _SS_MAXSIZE (chosen value 128) and
+   _SS_ALIGNSIZE (with chosen value 8). Constants _SS_PAD1SIZE (derived
+   value 6) and _SS_PAD2SIZE (derived value 112) are also for
+   illustration and not required. The implementation specific
+   definitions and structure field names above start with an underscore
+   to denote implementation private namespace. Portable code is not
+   expected to access or reference those fields or constants.
+
+   The sockaddr_storage structure solves the problem of declaring
+   storage for automatic variables which is large enough and aligned
+   enough for storing a socket address data structure of any family.
+   For example, code with a file descriptor and without the context of
+   the address family can pass a pointer to a variable of this type
+   where a pointer to a socket address structure is expected in calls
+   such as getpeername() and determine the address family by accessing
+   the received content after the call.
+
+   The sockaddr_storage structure may also be useful and applied to
+   certain other interfaces where a generic socket address large enough
+   and aligned for use with multiple address families may be needed. A
+   discussion of those interfaces is outside the scope of this document.
+
+
+
+
+Gilligan, et. al.
Informational [Page 15] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + Also, much existing code assumes that any socket address structure + can fit in a generic sockaddr structure. While this has been true + for IPv4 socket address structures, it has always been false for Unix + domain socket address structures (but in practice this has not been a + problem) and it is also false for IPv6 socket address structures + (which can be a problem). + + So now an application can do the following: + + struct sockaddr_storage __ss; + struct sockaddr_in6 *sin6; + sin6 = (struct sockaddr_in6 *) &__ss; + +4. Interface Identification + + This API uses an interface index (a small positive integer) to + identify the local interface on which a multicast group is joined + (Section 5.3). Additionally, the advanced API [4] uses these same + interface indexes to identify the interface on which a datagram is + received, or to specify the interface on which a datagram is to be + sent. + + Interfaces are normally known by names such as "le0", "sl1", "ppp2", + and the like. On Berkeley-derived implementations, when an interface + is made known to the system, the kernel assigns a unique positive + integer value (called the interface index) to that interface. These + are small positive integers that start at 1. (Note that 0 is never + used for an interface index.) There may be gaps so that there is no + current interface for a particular positive interface index. + + This API defines two functions that map between an interface name and + index, a third function that returns all the interface names and + indexes, and a fourth function to return the dynamic memory allocated + by the previous function. How these functions are implemented is + left up to the implementation. 4.4BSD implementations can implement + these functions using the existing sysctl() function with the + NET_RT_IFLIST command. Other implementations may wish to use ioctl() + for this purpose. 
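The four functions that this section goes on to define can be exercised together. The following is a minimal sketch, assuming a system whose <net/if.h> already declares them; the function name check_interface_roundtrip() is hypothetical and not part of the API.

```c
#include <net/if.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: walk every interface reported by if_nameindex()
 * and check that name -> index and index -> name agree.  Returns the
 * number of mismatches, or -1 if the list could not be obtained. */
int check_interface_roundtrip(void)
{
    struct if_nameindex *list = if_nameindex();
    struct if_nameindex *p;
    char name[IF_NAMESIZE];   /* size includes the terminating null */
    int mismatches = 0;

    if (list == NULL)
        return -1;            /* errno describes the failure */

    /* The array ends with an entry whose if_index is 0 and whose
     * if_name is NULL. */
    for (p = list; p->if_index != 0 && p->if_name != NULL; p++) {
        printf("%u: %s\n", p->if_index, p->if_name);
        if (if_nametoindex(p->if_name) != p->if_index)
            mismatches++;
        if (if_indextoname(p->if_index, name) == NULL ||
            strcmp(name, p->if_name) != 0)
            mismatches++;
    }

    if_freenameindex(list);   /* release what if_nameindex() allocated */
    return mismatches;
}
```

Because 0 is never a valid interface index, a return value of 0 from if_nametoindex() can be tested directly as the "no such interface" case.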
+
+4.1 Name-to-Index
+
+   The first function maps an interface name into its corresponding
+   index.
+
+      #include <net/if.h>
+
+      unsigned int  if_nametoindex(const char *ifname);
+
+   If the specified interface name does not exist, the return value is
+   0, and errno is set to ENXIO. If there was a system error (such as
+   running out of memory), the return value is 0 and errno is set to the
+   proper value (e.g., ENOMEM).
+
+4.2 Index-to-Name
+
+   The second function maps an interface index into its corresponding
+   name.
+
+      #include <net/if.h>
+
+      char  *if_indextoname(unsigned int ifindex, char *ifname);
+
+   The ifname argument must point to a buffer of at least IF_NAMESIZE
+   bytes into which the interface name corresponding to the specified
+   index is returned. (IF_NAMESIZE is also defined in <net/if.h> and
+   its value includes a terminating null byte at the end of the
+   interface name.) This pointer is also the return value of the
+   function. If there is no interface corresponding to the specified
+   index, NULL is returned, and errno is set to ENXIO. If there was a
+   system error (such as running out of memory), if_indextoname()
+   returns NULL and errno is set to the proper value (e.g., ENOMEM).
+
+4.3 Return All Interface Names and Indexes
+
+   The if_nameindex structure holds the information about a single
+   interface and is defined as a result of including the <net/if.h>
+   header.
+
+      struct if_nameindex {
+        unsigned int   if_index;  /* 1, 2, ... */
+        char          *if_name;   /* null terminated name: "le0", ... */
+      };
+
+   The third function returns an array of if_nameindex structures, one
+   structure per interface.
+
+      struct if_nameindex  *if_nameindex(void);
+
+   The end of the array of structures is indicated by a structure with
+   an if_index of 0 and an if_name of NULL.
The function returns a NULL
+   pointer upon an error, and sets errno to the appropriate value.
+
+   The memory used for this array of structures along with the interface
+   names pointed to by the if_name members is obtained dynamically.
+   This memory is freed by the next function.
+
+4.4 Free Memory
+
+   The following function frees the dynamic memory that was allocated by
+   if_nameindex().
+
+      #include <net/if.h>
+
+      void  if_freenameindex(struct if_nameindex *ptr);
+
+   The argument to this function must be a pointer that was returned by
+   if_nameindex().
+
+   Currently <net/if.h> does not contain prototypes for these
+   functions. It is recommended that the prototypes, along with the
+   if_nameindex structure, be defined in <net/if.h>.
+
+5. Socket Options
+
+   A number of new socket options are defined for IPv6. All of these
+   new options are at the IPPROTO_IPV6 level. That is, the "level"
+   parameter in the getsockopt() and setsockopt() calls is IPPROTO_IPV6
+   when using these options. The constant name prefix IPV6_ is used in
+   all of the new socket options. This serves to clearly identify these
+   options as applying to IPv6.
+
+   The declarations for IPPROTO_IPV6, the new IPv6 socket options, and
+   related constants defined in this section are obtained by including
+   the header <netinet/in.h>.
+
+5.1 Unicast Hop Limit
+
+   A new setsockopt() option controls the hop limit used in outgoing
+   unicast IPv6 packets. The name of this option is IPV6_UNICAST_HOPS,
+   and it is used at the IPPROTO_IPV6 layer.
The following example
+   illustrates how it is used:
+
+      int  hoplimit = 10;
+
+      if (setsockopt(s, IPPROTO_IPV6, IPV6_UNICAST_HOPS,
+                     (char *) &hoplimit, sizeof(hoplimit)) == -1)
+          perror("setsockopt IPV6_UNICAST_HOPS");
+
+   When the IPV6_UNICAST_HOPS option is set with setsockopt(), the
+   option value given is used as the hop limit for all subsequent
+   unicast packets sent via that socket. If the option is not set, the
+   system selects a default value. The integer hop limit value (called
+   x) is interpreted as follows:
+
+      x < -1:        return an error of EINVAL
+      x == -1:       use kernel default
+      0 <= x <= 255: use x
+      x >= 256:      return an error of EINVAL
+
+   The IPV6_UNICAST_HOPS option may be used with getsockopt() to
+   determine the hop limit value that the system will use for subsequent
+   unicast packets sent via that socket. For example:
+
+      int  hoplimit;
+      socklen_t  len = sizeof(hoplimit);
+
+      if (getsockopt(s, IPPROTO_IPV6, IPV6_UNICAST_HOPS,
+                     (char *) &hoplimit, &len) == -1)
+          perror("getsockopt IPV6_UNICAST_HOPS");
+      else
+          printf("Using %d for hop limit.\n", hoplimit);
+
+5.2 Sending and Receiving Multicast Packets
+
+   IPv6 applications may send UDP multicast packets by simply specifying
+   an IPv6 multicast address in the address argument of the sendto()
+   function.
+
+   Three socket options at the IPPROTO_IPV6 layer control some of the
+   parameters for sending multicast packets. Setting these options is
+   not required: applications may send multicast packets without using
+   these options. The setsockopt() options for controlling the sending
+   of multicast packets are summarized below. These three options can
+   also be used with getsockopt().
+
+      IPV6_MULTICAST_IF
+
+         Set the interface to use for outgoing multicast packets. The
+         argument is the index of the interface to use.
+
+         Argument type: unsigned int
+
+      IPV6_MULTICAST_HOPS
+
+         Set the hop limit to use for outgoing multicast packets. (Note
+         a separate option - IPV6_UNICAST_HOPS - is provided to set the
+         hop limit to use for outgoing unicast packets.)
+
+         The interpretation of the argument is the same as for the
+         IPV6_UNICAST_HOPS option:
+
+            x < -1:        return an error of EINVAL
+            x == -1:       use kernel default
+            0 <= x <= 255: use x
+            x >= 256:      return an error of EINVAL
+
+            If IPV6_MULTICAST_HOPS is not set, the default is 1
+            (same as IPv4 today)
+
+         Argument type: int
+
+      IPV6_MULTICAST_LOOP
+
+         If a multicast datagram is sent to a group to which the sending
+         host itself belongs (on the outgoing interface), a copy of the
+         datagram is looped back by the IP layer for local delivery if
+         this option is set to 1. If this option is set to 0, a copy
+         is not looped back. Other option values return an error of
+         EINVAL.
+
+         If IPV6_MULTICAST_LOOP is not set, the default is 1 (loopback;
+         same as IPv4 today).
+
+         Argument type: unsigned int
+
+   The reception of multicast packets is controlled by the two
+   setsockopt() options summarized below. An error of EOPNOTSUPP is
+   returned if these two options are used with getsockopt().
+
+      IPV6_JOIN_GROUP
+
+         Join a multicast group on a specified local interface. If the
+         interface index is specified as 0, the kernel chooses the local
+         interface. For example, some kernels look up the multicast
+         group in the normal IPv6 routing table and use the resulting
+         interface.
+
+         Argument type: struct ipv6_mreq
+
+      IPV6_LEAVE_GROUP
+
+         Leave a multicast group on a specified interface.
+
+         Argument type: struct ipv6_mreq
+
+   The argument type of both of these options is the ipv6_mreq structure,
+   defined as a result of including the <netinet/in.h> header:
+
+
+
+
+
+Gilligan, et. al.
Informational [Page 20] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + struct ipv6_mreq { + struct in6_addr ipv6mr_multiaddr; /* IPv6 multicast addr */ + unsigned int ipv6mr_interface; /* interface index */ + }; + + Note that to receive multicast datagrams a process must join the + multicast group and bind the UDP port to which datagrams will be + sent. Some processes also bind the multicast group address to the + socket, in addition to the port, to prevent other datagrams destined + to that same port from being delivered to the socket. + +6. Library Functions + + New library functions are needed to perform a variety of operations + with IPv6 addresses. Functions are needed to lookup IPv6 addresses + in the Domain Name System (DNS). Both forward lookup (nodename-to- + address translation) and reverse lookup (address-to-nodename + translation) need to be supported. Functions are also needed to + convert IPv6 addresses between their binary and textual form. + + We note that the two existing functions, gethostbyname() and + gethostbyaddr(), are left as-is. New functions are defined to handle + both IPv4 and IPv6 addresses. + +6.1 Nodename-to-Address Translation + + The commonly used function gethostbyname() is inadequate for many + applications, first because it provides no way for the caller to + specify anything about the types of addresses desired (IPv4 only, + IPv6 only, IPv4-mapped IPv6 are OK, etc.), and second because many + implementations of this function are not thread safe. RFC 2133 + defined a function named gethostbyname2() but this function was also + inadequate, first because its use required setting a global option + (RES_USE_INET6) when IPv6 addresses were required, and second because + a flag argument is needed to provide the caller with additional + control over the types of addresses required. 
+
+   The following function is new and must be thread safe:
+
+      #include <sys/socket.h>
+      #include <netdb.h>
+
+      struct hostent *getipnodebyname(const char *name, int af, int flags,
+                                      int *error_num);
+
+   The name argument can be either a node name or a numeric address
+   string (i.e., a dotted-decimal IPv4 address or an IPv6 hex address).
+   The af argument specifies the address family, either AF_INET or
+   AF_INET6. On failure, the error code is returned to the caller
+   through the error_num pointer rather than through a global variable,
+   to support thread safe error code returns. error_num will be set to
+   one of the following values:
+
+      HOST_NOT_FOUND
+
+         No such host is known.
+
+      NO_ADDRESS
+
+         The server recognized the request and the name but no address is
+         available. Another type of request to the name server for the
+         domain might return an answer.
+
+      NO_RECOVERY
+
+         An unexpected server failure occurred which cannot be recovered.
+
+      TRY_AGAIN
+
+         A temporary and possibly transient error occurred, such as a
+         failure of a server to respond.
+
+   The flags argument specifies the types of addresses that are searched
+   for, and the types of addresses that are returned. We note that a
+   special flags value of AI_DEFAULT (defined below) should handle most
+   applications.
+
+   That is, porting simple applications to use IPv6 replaces the call
+
+      hptr = gethostbyname(name);
+
+   with
+
+      hptr = getipnodebyname(name, AF_INET6, AI_DEFAULT, &error_num);
+
+   and changes any subsequent error diagnosis code to use error_num
+   instead of externally declared variables, such as h_errno.
+
+   Applications desiring finer control over the types of addresses
+   searched for and returned can specify other combinations of the
+   flags argument.
+
+
+
+
+
+
+
+
+Gilligan, et. al.
Informational [Page 22] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + A flags of 0 implies a strict interpretation of the af argument: + + - If flags is 0 and af is AF_INET, then the caller wants only + IPv4 addresses. A query is made for A records. If successful, + the IPv4 addresses are returned and the h_length member of the + hostent structure will be 4, else the function returns a NULL + pointer. + + - If flags is 0 and if af is AF_INET6, then the caller wants only + IPv6 addresses. A query is made for AAAA records. If + successful, the IPv6 addresses are returned and the h_length + member of the hostent structure will be 16, else the function + returns a NULL pointer. + + Other constants can be logically-ORed into the flags argument, to + modify the behavior of the function. + + - If the AI_V4MAPPED flag is specified along with an af of + AF_INET6, then the caller will accept IPv4-mapped IPv6 + addresses. That is, if no AAAA records are found then a query + is made for A records and any found are returned as IPv4-mapped + IPv6 addresses (h_length will be 16). The AI_V4MAPPED flag is + ignored unless af equals AF_INET6. + + - The AI_ALL flag is used in conjunction with the AI_V4MAPPED + flag, and is only used with the IPv6 address family. When AI_ALL + is logically or'd with AI_V4MAPPED flag then the caller wants + all addresses: IPv6 and IPv4-mapped IPv6. A query is first made + for AAAA records and if successful, the IPv6 addresses are + returned. Another query is then made for A records and any found + are returned as IPv4-mapped IPv6 addresses. h_length will be 16. + Only if both queries fail does the function return a NULL pointer. + This flag is ignored unless af equals AF_INET6. 
+ + - The AI_ADDRCONFIG flag specifies that a query for AAAA records + should occur only if the node has at least one IPv6 source + address configured and a query for A records should occur only + if the node has at least one IPv4 source address configured. + + For example, if the node has no IPv6 source addresses + configured, and af equals AF_INET6, and the node name being + looked up has both AAAA and A records, then: + + (a) if only AI_ADDRCONFIG is specified, the function + returns a NULL pointer; + (b) if AI_ADDRCONFIG | AI_V4MAPPED is specified, the A + records are returned as IPv4-mapped IPv6 addresses; + + + + +Gilligan, et. al. Informational [Page 23] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + The special flags value of AI_DEFAULT is defined as + + #define AI_DEFAULT (AI_V4MAPPED | AI_ADDRCONFIG) + + We noted that the getipnodebyname() function must allow the name + argument to be either a node name or a literal address string (i.e., + a dotted-decimal IPv4 address or an IPv6 hex address). This saves + applications from having to call inet_pton() to handle literal + address strings. + + There are four scenarios based on the type of literal address string + and the value of the af argument. + + The two simple cases are: + + When name is a dotted-decimal IPv4 address and af equals AF_INET, or + when name is an IPv6 hex address and af equals AF_INET6. The members + of the returned hostent structure are: h_name points to a copy of the + name argument, h_aliases is a NULL pointer, h_addrtype is a copy of + the af argument, h_length is either 4 (for AF_INET) or 16 (for + AF_INET6), h_addr_list[0] is a pointer to the 4-byte or 16-byte + binary address, and h_addr_list[1] is a NULL pointer. 
+
+   When name is a dotted-decimal IPv4 address and af equals AF_INET6,
+   the result depends on flags. If AI_V4MAPPED is set (with or without
+   AI_ALL), an IPv4-mapped IPv6 address is returned: h_name points to
+   an IPv6 hex address containing the IPv4-mapped IPv6 address,
+   h_aliases is a NULL pointer, h_addrtype is AF_INET6, h_length is 16,
+   h_addr_list[0] is a pointer to the 16-byte binary address, and
+   h_addr_list[1] is a NULL pointer. Otherwise the function returns a
+   NULL pointer.
+
+   It is an error when name is an IPv6 hex address and af equals
+   AF_INET. The function's return value is a NULL pointer and error_num
+   equals HOST_NOT_FOUND.
+
+6.2 Address-To-Nodename Translation
+
+   The following function has the same arguments as the existing
+   gethostbyaddr() function, but adds an error number.
+
+      #include <sys/socket.h>
+      #include <netdb.h>
+
+      struct hostent *getipnodebyaddr(const void *src, size_t len,
+                                      int af, int *error_num);
+
+   As with getipnodebyname(), getipnodebyaddr() must be thread safe.
+   The error_num value is returned to the caller with the appropriate
+   error code, to support thread safe error code returns. The following
+   error conditions may be returned for error_num:
+
+      HOST_NOT_FOUND
+
+         No such host is known.
+
+      NO_ADDRESS
+
+         The server recognized the request and the name but no address
+         is available. Another type of request to the name server for
+         the domain might return an answer.
+
+      NO_RECOVERY
+
+         An unexpected server failure occurred which cannot be
+         recovered.
+
+      TRY_AGAIN
+
+         A temporary and possibly transient error occurred, such as a
+         failure of a server to respond.
+
+   One possible source of confusion is the handling of IPv4-mapped IPv6
+   addresses and IPv4-compatible IPv6 addresses, but the following logic
+   should apply.
+
+      1.
If af is AF_INET6, and if len equals 16, and if the IPv6 + address is an IPv4-mapped IPv6 address or an IPv4-compatible + IPv6 address, then skip over the first 12 bytes of the IPv6 + address, set af to AF_INET, and set len to 4. + + 2. If af is AF_INET, lookup the name for the given IPv4 address + (e.g., query for a PTR record in the in-addr.arpa domain). + + 3. If af is AF_INET6, lookup the name for the given IPv6 address + (e.g., query for a PTR record in the ip6.int domain). + + 4. If the function is returning success, then the single address + that is returned in the hostent structure is a copy of the + first argument to the function with the same address family + that was passed as an argument to this function. + + + + + + + +Gilligan, et. al. Informational [Page 25] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + All four steps listed are performed, in order. Also note that the + IPv6 hex addresses "::" and "::1" MUST NOT be treated as IPv4- + compatible addresses, and if the address is "::", HOST_NOT_FOUND MUST + be returned and a query of the address not performed. + + Also for the macro in section 6.7 IN6_IS_ADDR_V4COMPAT MUST return + false for "::" and "::1". + +6.3 Freeing memory for getipnodebyname and getipnodebyaddr + + The hostent structure does not change from its existing definition. + This structure, and the information pointed to by this structure, are + dynamically allocated by getipnodebyname and getipnodebyaddr. The + following function frees this memory: + + #include <netdb.h> + + void freehostent(struct hostent *ptr); + +6.4 Protocol-Independent Nodename and Service Name Translation + + Nodename-to-address translation is done in a protocol-independent + fashion using the getaddrinfo() function that is taken from the + Institute of Electrical and Electronic Engineers (IEEE) POSIX 1003.1g + (Protocol Independent Interfaces) draft specification [3]. 
+ + The official specification for this function will be the final POSIX + standard, with the following additional requirements: + + - getaddrinfo() (along with the getnameinfo() function described + in the next section) must be thread safe. + + - The AI_NUMERICHOST is new with this document. + + - All fields in socket address structures returned by + getaddrinfo() that are not filled in through an explicit + argument (e.g., sin6_flowinfo and sin_zero) must be set to 0. + (This makes it easier to compare socket address structures.) + + - getaddrinfo() must fill in the length field of a socket address + structure (e.g., sin6_len) on systems that support this field. + + We are providing this independent description of the function because + POSIX standards are not freely available (as are IETF documents). + + #include <sys/socket.h> + #include <netdb.h> + + + + +Gilligan, et. al. Informational [Page 26] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + int getaddrinfo(const char *nodename, const char *servname, + const struct addrinfo *hints, + struct addrinfo **res); + + The addrinfo structure is defined as a result of including the + <netdb.h> header. + + struct addrinfo { + int ai_flags; /* AI_PASSIVE, AI_CANONNAME, AI_NUMERICHOST */ + int ai_family; /* PF_xxx */ + int ai_socktype; /* SOCK_xxx */ + int ai_protocol; /* 0 or IPPROTO_xxx for IPv4 and IPv6 */ + size_t ai_addrlen; /* length of ai_addr */ + char *ai_canonname; /* canonical name for nodename */ + struct sockaddr *ai_addr; /* binary address */ + struct addrinfo *ai_next; /* next structure in linked list */ + }; + + The return value from the function is 0 upon success or a nonzero + error code. 
The following names are the nonzero error codes from + getaddrinfo(), and are defined in <netdb.h>: + + EAI_ADDRFAMILY address family for nodename not supported + EAI_AGAIN temporary failure in name resolution + EAI_BADFLAGS invalid value for ai_flags + EAI_FAIL non-recoverable failure in name resolution + EAI_FAMILY ai_family not supported + EAI_MEMORY memory allocation failure + EAI_NODATA no address associated with nodename + EAI_NONAME nodename nor servname provided, or not known + EAI_SERVICE servname not supported for ai_socktype + EAI_SOCKTYPE ai_socktype not supported + EAI_SYSTEM system error returned in errno + + The nodename and servname arguments are pointers to null-terminated + strings or NULL. One or both of these two arguments must be a non- + NULL pointer. In the normal client scenario, both the nodename and + servname are specified. In the normal server scenario, only the + servname is specified. A non-NULL nodename string can be either a + node name or a numeric host address string (i.e., a dotted-decimal + IPv4 address or an IPv6 hex address). A non-NULL servname string can + be either a service name or a decimal port number. + + The caller can optionally pass an addrinfo structure, pointed to by + the third argument, to provide hints concerning the type of socket + that the caller supports. In this hints structure all members other + than ai_flags, ai_family, ai_socktype, and ai_protocol must be zero + or a NULL pointer. A value of PF_UNSPEC for ai_family means the + + + +Gilligan, et. al. Informational [Page 27] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + caller will accept any protocol family. A value of 0 for ai_socktype + means the caller will accept any socket type. A value of 0 for + ai_protocol means the caller will accept any protocol. For example, + if the caller handles only TCP and not UDP, then the ai_socktype + member of the hints structure should be set to SOCK_STREAM when + getaddrinfo() is called. 
If the caller handles only IPv4 and not + IPv6, then the ai_family member of the hints structure should be set + to PF_INET when getaddrinfo() is called. If the third argument to + getaddrinfo() is a NULL pointer, this is the same as if the caller + had filled in an addrinfo structure initialized to zero with + ai_family set to PF_UNSPEC. + + Upon successful return a pointer to a linked list of one or more + addrinfo structures is returned through the final argument. The + caller can process each addrinfo structure in this list by following + the ai_next pointer, until a NULL pointer is encountered. In each + returned addrinfo structure the three members ai_family, ai_socktype, + and ai_protocol are the corresponding arguments for a call to the + socket() function. In each addrinfo structure the ai_addr member + points to a filled-in socket address structure whose length is + specified by the ai_addrlen member. + + If the AI_PASSIVE bit is set in the ai_flags member of the hints + structure, then the caller plans to use the returned socket address + structure in a call to bind(). In this case, if the nodename + argument is a NULL pointer, then the IP address portion of the socket + address structure will be set to INADDR_ANY for an IPv4 address or + IN6ADDR_ANY_INIT for an IPv6 address. + + If the AI_PASSIVE bit is not set in the ai_flags member of the hints + structure, then the returned socket address structure will be ready + for a call to connect() (for a connection-oriented protocol) or + either connect(), sendto(), or sendmsg() (for a connectionless + protocol). In this case, if the nodename argument is a NULL pointer, + then the IP address portion of the socket address structure will be + set to the loopback address. 
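The effect of AI_PASSIVE on a NULL nodename, as described in the two paragraphs above, can be sketched as follows. This is a minimal illustration and not text from the memo: the helper name null_node_kind() and its one-character result codes are hypothetical, and the sketch assumes a system whose getaddrinfo() follows the behaviour described here.

```c
#define _DEFAULT_SOURCE   /* glibc: keep getaddrinfo() visible under strict -std=c99/c11 */
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not defined by the memo): call getaddrinfo() with a
 * NULL nodename and classify the first address returned.
 * Returns 'W' for the wildcard address, 'L' for the loopback address,
 * '?' for anything else, and 'E' if getaddrinfo() fails.
 */
static char null_node_kind(int family, const char *servname, int passive)
{
    struct addrinfo hints, *res = NULL;
    char kind = '?';

    /* nodename is NULL here, so servname must be non-NULL. */
    assert(servname != NULL);

    /* All hints members other than the four set below must be zero. */
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = passive ? AI_PASSIVE : 0;

    if (getaddrinfo(NULL, servname, &hints, &res) != 0)
        return 'E';

    if (res->ai_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 =
            (const struct sockaddr_in6 *)res->ai_addr;
        if (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr))
            kind = 'W';
        else if (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr))
            kind = 'L';
    } else if (res->ai_family == AF_INET) {
        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)res->ai_addr;
        if (sin->sin_addr.s_addr == htonl(INADDR_ANY))
            kind = 'W';
        else if (sin->sin_addr.s_addr == htonl(INADDR_LOOPBACK))
            kind = 'L';
    }

    /* Everything returned by getaddrinfo() is dynamically allocated. */
    freeaddrinfo(res);
    return kind;
}
```

A server would pass passive = 1 and hand the resulting wildcard address to bind(); a client would pass passive = 0 and hand the loopback address to connect(). The service string (for example "8080") is an arbitrary choice for illustration.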
+ + If the AI_CANONNAME bit is set in the ai_flags member of the hints + structure, then upon successful return the ai_canonname member of the + first addrinfo structure in the linked list will point to a null- + terminated string containing the canonical name of the specified + nodename. + + If the AI_NUMERICHOST bit is set in the ai_flags member of the hints + structure, then a non-NULL nodename string must be a numeric host + address string. Otherwise an error of EAI_NONAME is returned. This + flag prevents any type of name resolution service (e.g., the DNS) + from being called. + + + +Gilligan, et. al. Informational [Page 28] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + All of the information returned by getaddrinfo() is dynamically + allocated: the addrinfo structures, and the socket address structures + and canonical node name strings pointed to by the addrinfo + structures. To return this information to the system the function + freeaddrinfo() is called: + + #include <sys/socket.h> #include <netdb.h> + + void freeaddrinfo(struct addrinfo *ai); + + The addrinfo structure pointed to by the ai argument is freed, along + with any dynamic storage pointed to by the structure. This operation + is repeated until a NULL ai_next pointer is encountered. + + To aid applications in printing error messages based on the EAI_xxx + codes returned by getaddrinfo(), the following function is defined. + + #include <sys/socket.h> #include <netdb.h> + + char *gai_strerror(int ecode); + + The argument is one of the EAI_xxx values defined earlier and the + return value points to a string describing the error. If the + argument is not one of the EAI_xxx values, the function still returns + a pointer to a string whose contents indicate an unknown error. 
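The following sketch ties together AI_NUMERICHOST, freeaddrinfo(), and the EAI_xxx return codes described above. It is an illustration under stated assumptions rather than part of the memo: the helper name parse_numeric_host() is hypothetical, and the addresses used in the usage note are documentation-range examples.

```c
#define _DEFAULT_SOURCE   /* glibc: keep getaddrinfo() visible under strict -std=c99/c11 */
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not defined by the memo): convert a numeric host
 * address string into a socket address without invoking any name
 * resolution service. Returns 0 on success or the EAI_xxx code from
 * getaddrinfo() on failure.
 */
static int parse_numeric_host(const char *host, struct sockaddr_storage *out,
                              socklen_t *outlen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host != NULL && out != NULL && outlen != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 literals */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* nodename must be a numeric string */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;                    /* e.g. EAI_NONAME for a non-numeric name */

    /* Copy the first result out before releasing the list. */
    memset(out, 0, sizeof(*out));
    memcpy(out, res->ai_addr, res->ai_addrlen);
    *outlen = (socklen_t)res->ai_addrlen;

    freeaddrinfo(res);
    return 0;
}
```

Called with "2001:db8::1" or "192.0.2.1" the helper is expected to succeed; called with a host name it is expected to return EAI_NONAME, which the caller can pass to gai_strerror() to obtain a printable message.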
+ +6.5 Socket Address Structure to Nodename and Service Name + + The POSIX 1003.1g specification includes no function to perform the + reverse conversion from getaddrinfo(): to look up a nodename and + service name, given the binary address and port. Therefore, we + define the following function: + + #include <sys/socket.h> + #include <netdb.h> + + int getnameinfo(const struct sockaddr *sa, socklen_t salen, + char *host, size_t hostlen, + char *serv, size_t servlen, + int flags); + + This function looks up an IP address and port number provided by the + caller in the DNS and system-specific database, and returns text + strings for both in buffers provided by the caller. The function + indicates successful completion by a zero return value; a non-zero + return value indicates failure. + + + + + +Gilligan, et. al. Informational [Page 29] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + The first argument, sa, points to either a sockaddr_in structure (for + IPv4) or a sockaddr_in6 structure (for IPv6) that holds the IP + address and port number. The salen argument gives the length of the + sockaddr_in or sockaddr_in6 structure. + + The function returns the nodename associated with the IP address in + the buffer pointed to by the host argument. The caller provides the + size of this buffer via the hostlen argument. The service name + associated with the port number is returned in the buffer pointed to + by serv, and the servlen argument gives the length of this buffer. + The caller specifies not to return either string by providing a zero + value for the hostlen or servlen arguments. Otherwise, the caller + must provide buffers large enough to hold the nodename and the + service name, including the terminating null characters. + + Unfortunately most systems do not provide constants that specify the + maximum size of either a fully-qualified domain name or a service + name. 
Therefore to aid the application in allocating buffers for + these two returned strings the following constants are defined in + <netdb.h>: + + #define NI_MAXHOST 1025 + #define NI_MAXSERV 32 + + The first value is actually defined as the constant MAXDNAME in recent + versions of BIND's <arpa/nameser.h> header (older versions of BIND + define this constant to be 256) and the second is a guess based on the + services listed in the current Assigned Numbers RFC. + + The final argument is a flag that changes the default actions of this + function. By default the fully-qualified domain name (FQDN) for the + host is looked up in the DNS and returned. If the flag bit NI_NOFQDN + is set, only the nodename portion of the FQDN is returned for local + hosts. + + If the flag bit NI_NUMERICHOST is set, or if the host's name cannot be + located in the DNS, the numeric form of the host's address is returned + instead of its name (e.g., by calling inet_ntop() instead of + getipnodebyaddr()). If the flag bit NI_NAMEREQD is set, an error is + returned if the host's name cannot be located in the DNS. + + If the flag bit NI_NUMERICSERV is set, the numeric form of the service + address is returned (e.g., its port number) instead of its name. The + two NI_NUMERICxxx flags are required to support the "-n" flag that + many commands provide. + + + + + + +Gilligan, et. al. Informational [Page 30] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + A fifth flag bit, NI_DGRAM, specifies that the service is a datagram + service, and causes getservbyport() to be called with a second + argument of "udp" instead of its default of "tcp". This is required + for the few ports (e.g. 512-514) that have different services for UDP + and TCP. + + These NI_xxx flags are defined in <netdb.h> along with the AI_xxx + flags already defined for getaddrinfo(). 
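As a sketch of the flags described in this section, the fragment below calls getnameinfo() with both NI_NUMERICxxx flags set, so that no DNS or services lookup is attempted. The helper name numeric_name() is hypothetical and not defined by the memo; the fallback definitions of NI_MAXHOST and NI_MAXSERV use the values given above and only apply on systems whose <netdb.h> does not supply them.

```c
#define _DEFAULT_SOURCE   /* glibc: keep NI_MAXHOST/NI_MAXSERV visible under strict -std=c99/c11 */
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fall back to the values given in the memo if the system omits them. */
#ifndef NI_MAXHOST
#define NI_MAXHOST 1025
#endif
#ifndef NI_MAXSERV
#define NI_MAXSERV 32
#endif

/*
 * Hypothetical helper (not defined by the memo): return the numeric text
 * form of the address and port held in sa. The caller supplies buffers of
 * NI_MAXHOST and NI_MAXSERV bytes. Returns 0 on success, non-zero on
 * failure, as getnameinfo() does.
 */
static int numeric_name(const struct sockaddr *sa, socklen_t salen,
                        char *host, char *serv)
{
    assert(sa != NULL && host != NULL && serv != NULL);

    /* NI_NUMERICHOST and NI_NUMERICSERV together mirror a command's "-n" flag. */
    return getnameinfo(sa, salen, host, NI_MAXHOST, serv, NI_MAXSERV,
                       NI_NUMERICHOST | NI_NUMERICSERV);
}
```

Dropping NI_NUMERICHOST from the flags would request a name lookup instead, and adding NI_NAMEREQD would turn a failed lookup into an error rather than a numeric fallback.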
+ +6.6 Address Conversion Functions + + The two functions inet_addr() and inet_ntoa() convert an IPv4 address + between binary and text form. IPv6 applications need similar + functions. The following two functions convert both IPv6 and IPv4 + addresses: + + #include <sys/socket.h> + #include <arpa/inet.h> + + int inet_pton(int af, const char *src, void *dst); + + const char *inet_ntop(int af, const void *src, + char *dst, size_t size); + + The inet_pton() function converts an address in its standard text + presentation form into its numeric binary form. The af argument + specifies the family of the address. Currently the AF_INET and + AF_INET6 address families are supported. The src argument points to + the string being passed in. The dst argument points to a buffer into + which the function stores the numeric address. The address is + returned in network byte order. Inet_pton() returns 1 if the + conversion succeeds, 0 if the input is not a valid IPv4 dotted- + decimal string or a valid IPv6 address string, or -1 with errno set + to EAFNOSUPPORT if the af argument is unknown. The calling + application must ensure that the buffer referred to by dst is large + enough to hold the numeric address (e.g., 4 bytes for AF_INET or 16 + bytes for AF_INET6). + + If the af argument is AF_INET, the function accepts a string in the + standard IPv4 dotted-decimal form: + + ddd.ddd.ddd.ddd + + where ddd is a one to three digit decimal number between 0 and 255. + Note that many implementations of the existing inet_addr() and + inet_aton() functions accept nonstandard input: octal numbers, + hexadecimal numbers, and fewer than four numbers. inet_pton() does + not accept these formats. + + + +Gilligan, et. al. 
Informational [Page 31] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + If the af argument is AF_INET6, then the function accepts a string in + one of the standard IPv6 text forms defined in Section 2.2 of the + addressing architecture specification [2]. + + The inet_ntop() function converts a numeric address into a text + string suitable for presentation. The af argument specifies the + family of the address. This can be AF_INET or AF_INET6. The src + argument points to a buffer holding an IPv4 address if the af + argument is AF_INET, or an IPv6 address if the af argument is + AF_INET6, the address must be in network byte order. The dst + argument points to a buffer where the function will store the + resulting text string. The size argument specifies the size of this + buffer. The application must specify a non-NULL dst argument. For + IPv6 addresses, the buffer must be at least 46-octets. For IPv4 + addresses, the buffer must be at least 16-octets. In order to allow + applications to easily declare buffers of the proper size to store + IPv4 and IPv6 addresses in string form, the following two constants + are defined in <netinet/in.h>: + + #define INET_ADDRSTRLEN 16 + #define INET6_ADDRSTRLEN 46 + + The inet_ntop() function returns a pointer to the buffer containing + the text string if the conversion succeeds, and NULL otherwise. Upon + failure, errno is set to EAFNOSUPPORT if the af argument is invalid or + ENOSPC if the size of the result buffer is inadequate. + +6.7 Address Testing Macros + + The following macros can be used to test for special IPv6 addresses. 
+ + #include <netinet/in.h> + + int IN6_IS_ADDR_UNSPECIFIED (const struct in6_addr *); + int IN6_IS_ADDR_LOOPBACK (const struct in6_addr *); + int IN6_IS_ADDR_MULTICAST (const struct in6_addr *); + int IN6_IS_ADDR_LINKLOCAL (const struct in6_addr *); + int IN6_IS_ADDR_SITELOCAL (const struct in6_addr *); + int IN6_IS_ADDR_V4MAPPED (const struct in6_addr *); + int IN6_IS_ADDR_V4COMPAT (const struct in6_addr *); + + int IN6_IS_ADDR_MC_NODELOCAL(const struct in6_addr *); + int IN6_IS_ADDR_MC_LINKLOCAL(const struct in6_addr *); + int IN6_IS_ADDR_MC_SITELOCAL(const struct in6_addr *); + int IN6_IS_ADDR_MC_ORGLOCAL (const struct in6_addr *); + int IN6_IS_ADDR_MC_GLOBAL (const struct in6_addr *); + + + + + +Gilligan, et. al. Informational [Page 32] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + The first seven macros return true if the address is of the specified + type, or false otherwise. The last five test the scope of a + multicast address and return true if the address is a multicast + address of the specified scope or false if the address is either not + a multicast address or not of the specified scope. Note that + IN6_IS_ADDR_LINKLOCAL and IN6_IS_ADDR_SITELOCAL return true only for + the two local-use IPv6 unicast addresses. These two macros do not + return true for IPv6 multicast addresses of either link-local scope + or site-local scope. + +7. Summary of New Definitions + + The following list summarizes the constants, structure, and extern + definitions discussed in this memo, sorted by header. 
+ + <net/if.h> IF_NAMESIZE + <net/if.h> struct if_nameindex{}; + + <netdb.h> AI_ADDRCONFIG + <netdb.h> AI_DEFAULT + <netdb.h> AI_ALL + <netdb.h> AI_CANONNAME + <netdb.h> AI_NUMERICHOST + <netdb.h> AI_PASSIVE + <netdb.h> AI_V4MAPPED + <netdb.h> EAI_ADDRFAMILY + <netdb.h> EAI_AGAIN + <netdb.h> EAI_BADFLAGS + <netdb.h> EAI_FAIL + <netdb.h> EAI_FAMILY + <netdb.h> EAI_MEMORY + <netdb.h> EAI_NODATA + <netdb.h> EAI_NONAME + <netdb.h> EAI_SERVICE + <netdb.h> EAI_SOCKTYPE + <netdb.h> EAI_SYSTEM + <netdb.h> NI_DGRAM + <netdb.h> NI_MAXHOST + <netdb.h> NI_MAXSERV + <netdb.h> NI_NAMEREQD + <netdb.h> NI_NOFQDN + <netdb.h> NI_NUMERICHOST + <netdb.h> NI_NUMERICSERV + <netdb.h> struct addrinfo{}; + + <netinet/in.h> IN6ADDR_ANY_INIT + <netinet/in.h> IN6ADDR_LOOPBACK_INIT + <netinet/in.h> INET6_ADDRSTRLEN + + + +Gilligan, et. al. Informational [Page 33] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + <netinet/in.h> INET_ADDRSTRLEN + <netinet/in.h> IPPROTO_IPV6 + <netinet/in.h> IPV6_JOIN_GROUP + <netinet/in.h> IPV6_LEAVE_GROUP + <netinet/in.h> IPV6_MULTICAST_HOPS + <netinet/in.h> IPV6_MULTICAST_IF + <netinet/in.h> IPV6_MULTICAST_LOOP + <netinet/in.h> IPV6_UNICAST_HOPS + <netinet/in.h> SIN6_LEN + <netinet/in.h> extern const struct in6_addr in6addr_any; + <netinet/in.h> extern const struct in6_addr in6addr_loopback; + <netinet/in.h> struct in6_addr{}; + <netinet/in.h> struct ipv6_mreq{}; + <netinet/in.h> struct sockaddr_in6{}; + + <sys/socket.h> AF_INET6 + <sys/socket.h> PF_INET6 + <sys/socket.h> struct sockaddr_storage; + + The following list summarizes the function and macro prototypes + discussed in this memo, sorted by header. 
+ +<arpa/inet.h> int inet_pton(int, const char *, void *); +<arpa/inet.h> const char *inet_ntop(int, const void *, + char *, size_t); + +<net/if.h> char *if_indextoname(unsigned int, char *); +<net/if.h> unsigned int if_nametoindex(const char *); +<net/if.h> void if_freenameindex(struct if_nameindex *); +<net/if.h> struct if_nameindex *if_nameindex(void); + +<netdb.h> int getaddrinfo(const char *, const char *, + const struct addrinfo *, + struct addrinfo **); +<netdb.h> int getnameinfo(const struct sockaddr *, socklen_t, + char *, size_t, char *, size_t, int); +<netdb.h> void freeaddrinfo(struct addrinfo *); +<netdb.h> char *gai_strerror(int); +<netdb.h> struct hostent *getipnodebyname(const char *, int, int, + int *); +<netdb.h> struct hostent *getipnodebyaddr(const void *, size_t, + int, int *); +<netdb.h> void freehostent(struct hostent *); + +<netinet/in.h> int IN6_IS_ADDR_LINKLOCAL(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_LOOPBACK(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_MC_GLOBAL(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_MC_LINKLOCAL(const struct in6_addr *); + + + +Gilligan, et. al. Informational [Page 34] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +<netinet/in.h> int IN6_IS_ADDR_MC_NODELOCAL(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_MC_ORGLOCAL(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_MC_SITELOCAL(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_MULTICAST(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_SITELOCAL(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_UNSPECIFIED(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_V4COMPAT(const struct in6_addr *); +<netinet/in.h> int IN6_IS_ADDR_V4MAPPED(const struct in6_addr *); + +8. Security Considerations + + IPv6 provides a number of new security mechanisms, many of which need + to be accessible to applications. 
Companion memos detailing the + extensions to the socket interfaces to support IPv6 security are + being written. + +9. Year 2000 Considerations + + There are no issues for this memo concerning the Year 2000 issue + regarding the use of dates. + +Changes From RFC 2133 + + Changes made in the March 1998 Edition (-01 draft): + + Changed all "hostname" to "nodename" for consistency with other + IPv6 documents. + + Section 3.3: changed comment for sin6_flowinfo to be "traffic + class & flow info" and updated corresponding text description to + current definition of these two fields. + + Section 3.10 ("Portability Additions") is new. + + Section 6: a new paragraph was added reiterating that the existing + gethostbyname() and gethostbyaddr() are not changed. + + Section 6.1: change gethostbyname3() to getnodebyname(). Add + AI_DEFAULT to handle majority of applications. Renamed + AI_V6ADDRCONFIG to AI_ADDRCONFIG and define it for A records and + IPv4 addresses too. Defined exactly what getnodebyname() must + return if the name argument is a numeric address string. + + Section 6.2: change gethostbyaddr() to getnodebyaddr(). Reword + items 2 and 3 in the description of how to handle IPv4-mapped and + IPv4- compatible addresses to "lookup a name" for a given address, + instead of specifying what type of DNS query to issue. + + + + +Gilligan, et. al. Informational [Page 35] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + Section 6.3: added two more requirements to getaddrinfo(). + + Section 7: added the following constants to the list for + <netdb.h>: AI_ADDRCONFIG, AI_ALL, and AI_V4MAPPED. Add union + sockaddr_union and SA_LEN to the lists for <sys/socket.h>. + + Updated references. + + Changes made in the November 1997 Edition (-00 draft): + + The data types have been changed to conform with Draft 6.6 of the + Posix 1003.1g standard. + + Section 3.2: data type of s6_addr changed to "uint8_t". 
+ + Section 3.3: data type of sin6_family changed to "sa_family_t". + data type of sin6_port changed to "in_port_t", data type of + sin6_flowinfo changed to "uint32_t". + + Section 3.4: same as Section 3.3, plus data type of sin6_len + changed to "uint8_t". + + Section 6.2: first argument of gethostbyaddr() changed from "const + char *" to "const void *" and second argument changed from "int" + to "size_t". + + Section 6.4: second argument of getnameinfo() changed from + "size_t" to "socklen_t". + + The wording was changed when new structures were defined, to be + more explicit as to which header must be included to define the + structure: + + Section 3.2 (in6_addr{}), Section 3.3 (sockaddr_in6{}), Section + 3.4 (sockaddr_in6{}), Section 4.3 (if_nameindex{}), Section 5.3 + (ipv6_mreq{}), and Section 6.3 (addrinfo{}). + + Section 4: NET_RT_LIST changed to NET_RT_IFLIST. + + Section 5.1: The IPV6_ADDRFORM socket option was removed. + + Section 5.3: Added a note that an option value other than 0 or 1 + for IPV6_MULTICAST_LOOP returns an error. Added a note that + IPV6_MULTICAST_IF, IPV6_MULTICAST_HOPS, and IPV6_MULTICAST_LOOP + can also be used with getsockopt(), but IPV6_ADD_MEMBERSHIP and + IPV6_DROP_MEMBERSHIP cannot be used with getsockopt(). + + + + + +Gilligan, et. al. Informational [Page 36] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + Section 6.1: Removed the description of gethostbyname2() and its + associated RES_USE_INET6 option, replacing it with + gethostbyname3(). + + Section 6.2: Added requirement that gethostbyaddr() be thread + safe. Reworded step 4 to avoid using the RES_USE_INET6 option. + + Section 6.3: Added the requirement that getaddrinfo() and + getnameinfo() be thread safe. Added the AI_NUMERICHOST flag. + + Section 6.6: Added clarification about IN6_IS_ADDR_LINKLOCAL and + IN6_IS_ADDR_SITELOCAL macros. + + Changes made to the draft -01 specification Sept 98 + + Changed priority to traffic class in the spec. 
+ + Added the need for scope identification in section 2.1. + + Added sin6_scope_id to struct sockaddr_in6 in sections 3.3 and + 3.4. + + Changed 3.10 to use generic storage structure to support holding + IPv6 addresses and removed the SA_LEN macro. + + Distinguished between invalid input parameters and system failures + for Interface Identification in Section 4.1 and 4.2. + + Added defaults for multicast operations in section 5.2 and changed + the names from ADD to JOIN and DROP to LEAVE to be consistent with + IPv6 multicast terminology. + + Changed getnodebyname to getipnodebyname, getnodebyaddr to + getipnodebyaddr, and added MT safe error code to function + parameters in section 6. + + Moved freehostent to its own sub-section after getipnodebyaddr now + 6.3 (so this bumps all remaining sections in section 6. + + Clarified the use of AI_ALL and AI_V4MAPPED that these are + dependent on the AF parameter and must be used as a conjunction in + section 6.1. + + Removed the restriction that literal addresses cannot be used with + a flags argument in section 6.1. + + Added Year 2000 Section to the draft + + + + +Gilligan, et. al. Informational [Page 37] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + Deleted Reference to the following because the attached is deleted + from the ID directory and has expired. But the logic from the + aforementioned draft still applies, so that was kept in Section + 6.2 bullets after 3rd paragraph. + + [7] P. Vixie, "Reverse Name Lookups of Encapsulated IPv4 + Addresses in IPv6", Internet-Draft, <draft-vixie-ipng- + ipv4ptr-00.txt>, May 1996. + + Deleted the following reference as it is no longer referenced. + And the draft has expired. + + [3] D. McDonald, "A Simple IP Security API Extension to BSD + Sockets", Internet-Draft, <draft-mcdonald-simple-ipsec-api- + 01.txt>, March 1997. + + Deleted the following reference as it is no longer referenced. + + [4] C. 
Metz, "Network Security API for Sockets", + Internet-Draft, <draft-metz-net-security-api-01.txt>, January + 1998. + + Update current references to current status. + + Added alignment notes for in6_addr and sin6_addr. + + Clarified further that AI_V4MAPPED must be used with a dotted IPv4 + literal address for getipnodebyname(), when address family is + AF_INET6. + + Added text to clarify "::" and "::1" when used by + getipnodebyaddr(). + +Acknowledgments + + Thanks to the many people who made suggestions and provided feedback + to this document, including: Werner Almesberger, Ran Atkinson, Fred + Baker, Dave Borman, Andrew Cherenson, Alex Conta, Alan Cox, Steve + Deering, Richard Draves, Francis Dupont, Robert Elz, Marc Hasson, Tom + Herbert, Bob Hinden, Wan-Yen Hsu, Christian Huitema, Koji Imada, + Markus Jork, Ron Lee, Alan Lloyd, Charles Lynn, Dan McDonald, Dave + Mitton, Thomas Narten, Josh Osborne, Craig Partridge, Jean-Luc + Richier, Erik Scoredos, Keith Sklower, Matt Thomas, Harvey Thompson, + Dean D. Throop, Karen Tracey, Glenn Trewitt, Paul Vixie, David + Waitzman, Carl Williams, and Kazu Yamamoto, + + + + + + +Gilligan, et. al. Informational [Page 38] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + + The getaddrinfo() and getnameinfo() functions are taken from an + earlier Internet Draft by Keith Sklower. As noted in that draft, + William Durst, Steven Wise, Michael Karels, and Eric Allman provided + many useful discussions on the subject of protocol-independent name- + to-address translation, and reviewed early versions of Keith + Sklower's original proposal. Eric Allman implemented the first + prototype of getaddrinfo(). The observation that specifying the pair + of name and service would suffice for connecting to a service + independent of protocol details was made by Marshall Rose in a + proposal to X/Open for a "Uniform Network Interface". 
+ + Craig Metz, Jack McCann, Erik Nordmark, Tim Hartrick, and Mukesh + Kacker made many contributions to this document. Ramesh Govindan + made a number of contributions and co-authored an earlier version of + this memo. + +References + + [1] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) + Specification", RFC 2460, December 1998. + + [2] Hinden, R. and S. Deering, "IP Version 6 Addressing + Architecture", RFC 2373, July 1998. + + [3] IEEE, "Protocol Independent Interfaces", IEEE Std 1003.1g, DRAFT + 6.6, March 1997. + + [4] Stevens, W. and M. Thomas, "Advanced Sockets API for IPv6", RFC + 2292, February 1998. + + + + + + + + + + + + + + + + + + + + + + +Gilligan, et. al. Informational [Page 39] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +Authors' Addresses + + Robert E. Gilligan + FreeGate Corporation + 1208 E. Arques Ave. + Sunnyvale, CA 94086 + + Phone: +1 408 617 1004 + EMail: gilligan@freegate.com + + + Susan Thomson + Bell Communications Research + MRE 2P-343, 445 South Street + Morristown, NJ 07960 + + Phone: +1 201 829 4514 + EMail: set@thumper.bellcore.com + + + Jim Bound + Compaq Computer Corporation + 110 Spitbrook Road ZK3-3/U14 + Nashua, NH 03062-2698 + + Phone: +1 603 884 0400 + EMail: bound@zk3.dec.com + + + W. Richard Stevens + 1202 E. Paseo del Zorro + Tucson, AZ 85718-2826 + + Phone: +1 520 297 9416 + EMail: rstevens@kohala.com + + + + + + + + + + + + + + + + +Gilligan, et. al. Informational [Page 40] + +RFC 2553 Basic Socket Interface Extensions for IPv6 March 1999 + + +Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. 
+ + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Gilligan, et. al. Informational [Page 41] + diff --git a/doc/rfc/rfc2671.txt b/doc/rfc/rfc2671.txt new file mode 100644 index 00000000..ec05f808 --- /dev/null +++ b/doc/rfc/rfc2671.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group P. Vixie +Request for Comments: 2671 ISC +Category: Standards Track August 1999 + + + Extension Mechanisms for DNS (EDNS0) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. 
Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +Abstract + + The Domain Name System's wire protocol includes a number of fixed + fields whose range has been or soon will be exhausted and does not + allow clients to advertise their capabilities to servers. This + document describes backward compatible mechanisms for allowing the + protocol to grow. + +1 - Rationale and Scope + +1.1. DNS (see [RFC1035]) specifies a Message Format and within such + messages there are standard formats for encoding options, errors, + and name compression. The maximum allowable size of a DNS Message + is fixed. Many of DNS's protocol limits are too small for uses + which are or which are desired to become common. There is no way + for implementations to advertise their capabilities. + +1.2. Existing clients will not know how to interpret the protocol + extensions detailed here. In practice, these clients will be + upgraded when they have need of a new feature, and only new + features will make use of the extensions. We must however take + account of client behaviour in the face of extra fields, and design + a fallback scheme for interoperability with these clients. + + + + + + + + + +Vixie Standards Track [Page 1] + +RFC 2671 Extension Mechanisms for DNS (EDNS0) August 1999 + + +2 - Affected Protocol Elements + +2.1. The DNS Message Header's (see [RFC1035 4.1.1]) second full 16-bit + word is divided into a 4-bit OPCODE, a 4-bit RCODE, and a number of + 1-bit flags. The original reserved Z bits have been allocated to + various purposes, and most of the RCODE values are now in use. + More flags and more possible RCODEs are needed. + +2.2. The first two bits of a wire format domain label are used to denote + the type of the label. 
[RFC1035 4.1.4] allocates two of the four + possible types and reserves the other two. Proposals for use of + the remaining types far outnumber those available. More label + types are needed. + +2.3. DNS Messages are limited to 512 octets in size when sent over UDP. + While the minimum maximum reassembly buffer size still allows a + limit of 512 octets of UDP payload, most of the hosts now connected + to the Internet are able to reassemble larger datagrams. Some + mechanism must be created to allow requestors to advertise larger + buffer sizes to responders. + +3 - Extended Label Types + +3.1. The "0 1" label type will now indicate an extended label type, + whose value is encoded in the lower six bits of the first octet of + a label. All subsequently developed label types should be encoded + using an extended label type. + +3.2. The "1 1 1 1 1 1" extended label type will be reserved for future + expansion of the extended label type code space. + +4 - OPT pseudo-RR + +4.1. One OPT pseudo-RR can be added to the additional data section of + either a request or a response. An OPT is called a pseudo-RR + because it pertains to a particular transport level message and not + to any actual DNS data. OPT RRs shall never be cached, forwarded, + or stored in or loaded from master files. The quantity of OPT + pseudo-RRs per message shall be either zero or one, but not + greater. + +4.2. An OPT RR has a fixed part and a variable set of options expressed + as {attribute, value} pairs. The fixed part holds some DNS meta + data and also a small collection of new protocol elements which we + expect to be so popular that it would be a waste of wire space to + encode them as {attribute, value} pairs. + + + + + +Vixie Standards Track [Page 2] + +RFC 2671 Extension Mechanisms for DNS (EDNS0) August 1999 + + +4.3. 
The fixed part of an OPT RR is structured as follows: + + Field Name Field Type Description + ------------------------------------------------------ + NAME domain name empty (root domain) + TYPE u_int16_t OPT + CLASS u_int16_t sender's UDP payload size + TTL u_int32_t extended RCODE and flags + RDLEN u_int16_t describes RDATA + RDATA octet stream {attribute,value} pairs + +4.4. The variable part of an OPT RR is encoded in its RDATA and is + structured as zero or more of the following: + + +0 (MSB) +1 (LSB) + +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ + 0: | OPTION-CODE | + +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ + 2: | OPTION-LENGTH | + +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ + 4: | | + / OPTION-DATA / + / / + +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ + + OPTION-CODE (Assigned by IANA.) + + OPTION-LENGTH Size (in octets) of OPTION-DATA. + + OPTION-DATA Varies per OPTION-CODE. + +4.5. The sender's UDP payload size (which OPT stores in the RR CLASS + field) is the number of octets of the largest UDP payload that can + be reassembled and delivered in the sender's network stack. Note + that path MTU, with or without fragmentation, may be smaller than + this. + +4.5.1. Note that a 512-octet UDP payload requires a 576-octet IP + reassembly buffer. Choosing 1280 on an Ethernet connected + requestor would be reasonable. The consequence of choosing too + large a value may be an ICMP message from an intermediate + gateway, or even a silent drop of the response message. + +4.5.2. Both requestors and responders are advised to take account of the + path's discovered MTU (if already known) when considering message + sizes. + + + + + +Vixie Standards Track [Page 3] + +RFC 2671 Extension Mechanisms for DNS (EDNS0) August 1999 + + +4.5.3. 
The requestor's maximum payload size can change over time, and + should therefore not be cached for use beyond the transaction in + which it is advertised. + +4.5.4. The responder's maximum payload size can change over time, but + can be reasonably expected to remain constant between two + sequential transactions; for example, a meaningless QUERY to + discover a responder's maximum UDP payload size, followed + immediately by an UPDATE which takes advantage of this size. + (This is considered preferrable to the outright use of TCP for + oversized requests, if there is any reason to suspect that the + responder implements EDNS, and if a request will not fit in the + default 512 payload size limit.) + +4.5.5. Due to transaction overhead, it is unwise to advertise an + architectural limit as a maximum UDP payload size. Just because + your stack can reassemble 64KB datagrams, don't assume that you + want to spend more than about 4KB of state memory per ongoing + transaction. + +4.6. The extended RCODE and flags (which OPT stores in the RR TTL field) + are structured as follows: + + +0 (MSB) +1 (LSB) + +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ + 0: | EXTENDED-RCODE | VERSION | + +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ + 2: | Z | + +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ + + EXTENDED-RCODE Forms upper 8 bits of extended 12-bit RCODE. Note + that EXTENDED-RCODE value "0" indicates that an + unextended RCODE is in use (values "0" through "15"). + + VERSION Indicates the implementation level of whoever sets + it. Full conformance with this specification is + indicated by version "0." Requestors are encouraged + to set this to the lowest implemented level capable + of expressing a transaction, to minimize the + responder and network load of discovering the + greatest common implementation level between + requestor and responder. 
A requestor's version + numbering strategy should ideally be a run time + configuration option. + + If a responder does not implement the VERSION level + of the request, then it answers with RCODE=BADVERS. + All responses will be limited in format to the + + + +Vixie Standards Track [Page 4] + +RFC 2671 Extension Mechanisms for DNS (EDNS0) August 1999 + + + VERSION level of the request, but the VERSION of each + response will be the highest implementation level of + the responder. In this way a requestor will learn + the implementation level of a responder as a side + effect of every response, including error responses, + including RCODE=BADVERS. + + Z Set to zero by senders and ignored by receivers, + unless modified in a subsequent specification. + +5 - Transport Considerations + +5.1. The presence of an OPT pseudo-RR in a request should be taken as an + indication that the requestor fully implements the given version of + EDNS, and can correctly understand any response that conforms to + that feature's specification. + +5.2. Lack of use of these features in a request must be taken as an + indication that the requestor does not implement any part of this + specification and that the responder may make no use of any + protocol extension described here in its response. + +5.3. Responders who do not understand these protocol extensions are + expected to send a response with RCODE NOTIMPL, FORMERR, or + SERVFAIL. Therefore use of extensions should be "probed" such that + a responder who isn't known to support them be allowed a retry with + no extensions if it responds with such an RCODE. If a responder's + capability level is cached by a requestor, a new probe should be + sent periodically to test for changes to responder capability. 
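The OPT pseudo-RR layout of sections 4.3, 4.4 and 4.6 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (build_opt_rr, parse_opt_rr, full_rcode) are invented here, and only the field widths, type code 41 and BADVERS value 16 come from the memo.

```python
import struct

OPT_TYPE = 41   # RR type code assigned to OPT (section 7)
BADVERS = 16    # extended RCODE assigned in section 7


def build_opt_rr(udp_payload_size, ext_rcode=0, version=0, options=()):
    """Encode an OPT pseudo-RR: fixed part (4.3) followed by options (4.4)."""
    rdata = b"".join(
        struct.pack("!HH", code, len(data)) + data for code, data in options
    )
    # The TTL field carries EXTENDED-RCODE, VERSION and the 16-bit Z field;
    # Z is always sent as zero (4.6).
    ttl = (ext_rcode << 24) | (version << 16)
    return (
        b"\x00"  # NAME: empty (root domain)
        + struct.pack("!HHIH", OPT_TYPE, udp_payload_size, ttl, len(rdata))
        + rdata
    )


def parse_opt_rr(wire):
    """Decode bytes laid out as in build_opt_rr into a dict of fields."""
    if wire[0] != 0:
        raise ValueError("OPT owner name must be the root domain")
    rr_type, payload, ttl, rdlen = struct.unpack("!HHIH", wire[1:11])
    if rr_type != OPT_TYPE:
        raise ValueError("not an OPT pseudo-RR")
    rdata = wire[11:11 + rdlen]
    options, pos = [], 0
    while pos < len(rdata):
        code, length = struct.unpack("!HH", rdata[pos:pos + 4])
        options.append((code, rdata[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return {
        "payload": payload,
        "ext_rcode": ttl >> 24,
        "version": (ttl >> 16) & 0xFF,
        "z": ttl & 0xFFFF,
        "options": options,
    }


def full_rcode(header_rcode, ext_rcode):
    """Combine the 4-bit header RCODE with the upper 8 bits held in OPT."""
    return (ext_rcode << 4) | header_rcode
```

A requestor following section 5.3 might send build_opt_rr(1280) in the additional section and retry without it if the responder answers NOTIMPL, FORMERR or SERVFAIL. The retry logic is left out of the sketch because it depends on the surrounding resolver.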
+ +6 - Security Considerations + + Requestor-side specification of the maximum buffer size may open a + new DNS denial of service attack if responders can be made to send + messages which are too large for intermediate gateways to forward, + thus leading to potential ICMP storms between gateways and + responders. + +7 - IANA Considerations + + The IANA has assigned RR type code 41 for OPT. + + It is the recommendation of this document and its working group + that IANA create a registry for EDNS Extended Label Types, for EDNS + Option Codes, and for EDNS Version Numbers. + + This document assigns label type 0b01xxxxxx as "EDNS Extended Label + Type." We request that IANA record this assignment. + + + +Vixie Standards Track [Page 5] + +RFC 2671 Extension Mechanisms for DNS (EDNS0) August 1999 + + + This document assigns extended label type 0bxx111111 as "Reserved + for future extended label types." We request that IANA record this + assignment. + + This document assigns option code 65535 to "Reserved for future + expansion." + + This document expands the RCODE space from 4 bits to 12 bits. This + will allow IANA to assign more than the 16 distinct RCODE values + allowed in [RFC1035]. + + This document assigns EDNS Extended RCODE "16" to "BADVERS". + + IESG approval should be required to create new entries in the EDNS + Extended Label Type or EDNS Version Number registries, while any + published RFC (including Informational, Experimental, or BCP) + should be grounds for allocation of an EDNS Option Code. + +8 - Acknowledgements + + Paul Mockapetris, Mark Andrews, Robert Elz, Don Lewis, Bob Halley, + Donald Eastlake, Rob Austein, Matt Crawford, Randy Bush, and Thomas + Narten were each instrumental in creating and refining this + specification. + +9 - References + + [RFC1035] Mockapetris, P., "Domain Names - Implementation and + Specification", STD 13, RFC 1035, November 1987. 
+ +10 - Author's Address + + Paul Vixie + Internet Software Consortium + 950 Charter Street + Redwood City, CA 94063 + + Phone: +1 650 779 7001 + EMail: vixie@isc.org + + + + + + + + + + + + +Vixie Standards Track [Page 6] + +RFC 2671 Extension Mechanisms for DNS (EDNS0) August 1999 + + +11 - Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. 
+ + + + + + + + + + + + + + + + + + + +Vixie Standards Track [Page 7] + diff --git a/doc/rfc/rfc2672.txt b/doc/rfc/rfc2672.txt new file mode 100644 index 00000000..11030168 --- /dev/null +++ b/doc/rfc/rfc2672.txt @@ -0,0 +1,507 @@ + + + + + + +Network Working Group M. Crawford +Request for Comments: 2672 Fermilab +Category: Standards Track August 1999 + + + Non-Terminal DNS Name Redirection + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +1. Introduction + + This document defines a new DNS Resource Record called "DNAME", which + provides the capability to map an entire subtree of the DNS name + space to another domain. It differs from the CNAME record which maps + a single node of the name space. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [KWORD]. + +2. Motivation + + This Resource Record and its processing rules were conceived as a + solution to the problem of maintaining address-to-name mappings in a + context of network renumbering. Without the DNAME mechanism, an + authoritative DNS server for the address-to-name mappings of some + network must be reconfigured when that network is renumbered. With + DNAME, the zone can be constructed so that it needs no modification + when renumbered. DNAME can also be useful in other situations, such + as when an organizational unit is renamed. + +3. The DNAME Resource Record + + The DNAME RR has mnemonic DNAME and type code 39 (decimal). 
+ + + + + + + +Crawford Standards Track [Page 1] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + + DNAME has the following format: + + <owner> <ttl> <class> DNAME <target> + + The format is not class-sensitive. All fields are required. The + RDATA field <target> is a <domain-name> [DNSIS]. + + The DNAME RR causes type NS additional section processing. + + The effect of the DNAME record is the substitution of the record's + <target> for its <owner> as a suffix of a domain name. A "no- + descendants" limitation governs the use of DNAMEs in a zone file: + + If a DNAME RR is present at a node N, there may be other data at N + (except a CNAME or another DNAME), but there MUST be no data at + any descendant of N. This restriction applies only to records of + the same class as the DNAME record. + + This rule assures predictable results when a DNAME record is cached + by a server which is not authoritative for the record's zone. It + MUST be enforced when authoritative zone data is loaded. Together + with the rules for DNS zone authority [DNSCLR] it implies that DNAME + and NS records can only coexist at the top of a zone which has only + one node. + + The compression scheme of [DNSIS] MUST NOT be applied to the RDATA + portion of a DNAME record unless the sending server has some way of + knowing that the receiver understands the DNAME record format. + Signalling such understanding is expected to be the subject of future + DNS Extensions. + + Naming loops can be created with DNAME records or a combination of + DNAME and CNAME records, just as they can with CNAME records alone. + Resolvers, including resolvers embedded in DNS servers, MUST limit + the resources they devote to any query. Implementors should note, + however, that fairly lengthy chains of DNAME records may be valid. + +4. Query Processing + + To exploit the DNAME mechanism the name resolution algorithms [DNSCF] + must be modified slightly for both servers and resolvers. 
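The suffix substitution described in section 3 can be sketched as follows. This is an illustration under stated assumptions, not the memo's algorithm: the helper names are invented, names are handled as dot-separated text without escaped dots, the 255-octet bound is the wire-format name limit of RFC 1035, and the sketch rewrites only names strictly below the DNAME owner.

```python
MAX_NAME_OCTETS = 255  # wire-format limit on a domain name (RFC 1035)


class NameTooLong(Exception):
    """Raised where a server would set RCODE YXDOMAIN."""


def labels(name):
    """Split a textual name into lower-cased labels, ignoring the final dot."""
    name = name.rstrip(".")
    return name.lower().split(".") if name else []


def wire_length(lbls):
    """Octets needed on the wire: a length octet per label plus the root."""
    return sum(1 + len(label) for label in lbls) + 1


def dname_substitute(qname, owner, target):
    """Replace the suffix `owner` of `qname` with `target`.

    Returns None when qname is not strictly below owner, in which case
    this sketch performs no substitution.
    """
    q, o, t = labels(qname), labels(owner), labels(target)
    if len(q) <= len(o) or q[len(q) - len(o):] != o:
        return None
    new = q[:len(q) - len(o)] + t
    if wire_length(new) > MAX_NAME_OCTETS:
        raise NameTooLong(qname)
    return ".".join(new) + "."
```

With the records of section 5.1, dname_substitute("www.frobozz.example.", "frobozz.example.", "frobozz-division.acme.example.") yields "www.frobozz-division.acme.example.", which is also the RDATA a server would place in a synthesized CNAME.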
+ + Both modified algorithms incorporate the operation of making a + substitution on a name (either QNAME or SNAME) under control of a + DNAME record. This operation will be referred to as "the DNAME + substitution". + + + + + +Crawford Standards Track [Page 2] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + +4.1. Processing by Servers + + For a server performing non-recursive service steps 3.c and 4 of + section 4.3.2 [DNSCF] are changed to check for a DNAME record before + checking for a wildcard ("*") label, and to return certain DNAME + records from zone data and the cache. + + DNS clients sending Extended DNS [EDNS0] queries with Version 0 or + non-extended queries are presumed not to understand the semantics of + the DNAME record, so a server which implements this specification, + when answering a non-extended query, SHOULD synthesize a CNAME record + for each DNAME record encountered during query processing to help the + client reach the correct DNS data. The behavior of clients and + servers under Extended DNS versions greater than 0 will be specified + when those versions are defined. + + The synthesized CNAME RR, if provided, MUST have + + The same CLASS as the QCLASS of the query, + + TTL equal to zero, + + An <owner> equal to the QNAME in effect at the moment the DNAME RR + was encountered, and + + An RDATA field containing the new QNAME formed by the action of + the DNAME substitution. + + If the server has the appropriate key on-line [DNSSEC, SECDYN], it + MAY generate and return a SIG RR for the synthesized CNAME RR. + + The revised server algorithm is: + + 1. Set or clear the value of recursion available in the response + depending on whether the name server is willing to provide + recursive service. If recursive service is available and + requested via the RD bit in the query, go to step 5, otherwise + step 2. + + 2. Search the available zones for the zone which is the nearest + ancestor to QNAME. 
If such a zone is found, go to step 3, + otherwise step 4. + + 3. Start matching down, label by label, in the zone. The matching + process can terminate several ways: + + + + + + +Crawford Standards Track [Page 3] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + + a. If the whole of QNAME is matched, we have found the node. + + If the data at the node is a CNAME, and QTYPE doesn't match + CNAME, copy the CNAME RR into the answer section of the + response, change QNAME to the canonical name in the CNAME RR, + and go back to step 1. + + Otherwise, copy all RRs which match QTYPE into the answer + section and go to step 6. + + b. If a match would take us out of the authoritative data, we have + a referral. This happens when we encounter a node with NS RRs + marking cuts along the bottom of a zone. + + Copy the NS RRs for the subzone into the authority section of + the reply. Put whatever addresses are available into the + additional section, using glue RRs if the addresses are not + available from authoritative data or the cache. Go to step 4. + + c. If at some label, a match is impossible (i.e., the + corresponding label does not exist), look to see whether the + last label matched has a DNAME record. + + If a DNAME record exists at that point, copy that record into + the answer section. If substitution of its <target> for its + <owner> in QNAME would overflow the legal size for a <domain- + name>, set RCODE to YXDOMAIN [DNSUPD] and exit; otherwise + perform the substitution and continue. If the query was not + extended [EDNS0] with a Version indicating understanding of the + DNAME record, the server SHOULD synthesize a CNAME record as + described above and include it in the answer section. Go back + to step 1. + + If there was no DNAME record, look to see if the "*" label + exists. + + If the "*" label does not exist, check whether the name we are + looking for is the original QNAME in the query or a name we + have followed due to a CNAME. 
If the name is original, set an + authoritative name error in the response and exit. Otherwise + just exit. + + If the "*" label does exist, match RRs at that node against + QTYPE. If any match, copy them into the answer section, but + set the owner of the RR to be QNAME, and not the node with the + "*" label. Go to step 6. + + + + + +Crawford Standards Track [Page 4] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + + 4. Start matching down in the cache. If QNAME is found in the cache, + copy all RRs attached to it that match QTYPE into the answer + section. If QNAME is not found in the cache but a DNAME record is + present at an ancestor of QNAME, copy that DNAME record into the + answer section. If there was no delegation from authoritative + data, look for the best one from the cache, and put it in the + authority section. Go to step 6. + + 5. Use the local resolver or a copy of its algorithm (see resolver + section of this memo) to answer the query. Store the results, + including any intermediate CNAMEs and DNAMEs, in the answer + section of the response. + + 6. Using local data only, attempt to add other RRs which may be + useful to the additional section of the query. Exit. + + Note that there will be at most one ancestor with a DNAME as + described in step 4 unless some zone's data is in violation of the + no-descendants limitation in section 3. An implementation might take + advantage of this limitation by stopping the search of step 3c or + step 4 when a DNAME record is encountered. + +4.2. Processing by Resolvers + + A resolver or a server providing recursive service must be modified + to treat a DNAME as somewhat analogous to a CNAME. The resolver + algorithm of [DNSCF] section 5.3.3 is modified to renumber step 4.d + as 4.e and insert a new 4.d. The complete algorithm becomes: + + 1. See if the answer is in local information, and if so return it to + the client. + + 2. Find the best servers to ask. + + 3. 
Send them queries until one returns a response. + + 4. Analyze the response, either: + + a. if the response answers the question or contains a name error, + cache the data as well as returning it back to the client. + + b. if the response contains a better delegation to other servers, + cache the delegation information, and go to step 2. + + c. if the response shows a CNAME and that is not the answer + itself, cache the CNAME, change the SNAME to the canonical name + in the CNAME RR and go to step 1. + + + + +Crawford Standards Track [Page 5] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + + d. if the response shows a DNAME and that is not the answer + itself, cache the DNAME. If substitution of the DNAME's + <target> for its <owner> in the SNAME would overflow the legal + size for a <domain-name>, return an implementation-dependent + error to the application; otherwise perform the substitution + and go to step 1. + + e. if the response shows a server failure or other bizarre + contents, delete the server from the SLIST and go back to step + 3. + + A resolver or recursive server which understands DNAME records but + sends non-extended queries MUST augment step 4.c by deleting from the + reply any CNAME records which have an <owner> which is a subdomain of + the <owner> of any DNAME record in the response. + +5. Examples of Use + +5.1. Organizational Renaming + + If an organization with domain name FROBOZZ.EXAMPLE became part of an + organization with domain name ACME.EXAMPLE, it might ease transition + by placing information such as this in its old zone. + + frobozz.example. DNAME frobozz-division.acme.example. + MX 10 mailhub.acme.example. + + The response to an extended recursive query for www.frobozz.example + would contain, in the answer section, the DNAME record shown above + and the relevant RRs for www.frobozz-division.acme.example. + +5.2. 
Classless Delegation of Shorter Prefixes + + The classless scheme for in-addr.arpa delegation [INADDR] can be + extended to prefixes shorter than 24 bits by use of the DNAME record. + For example, the prefix 192.0.8.0/22 can be delegated by the + following records. + + $ORIGIN 0.192.in-addr.arpa. + 8/22 NS ns.slash-22-holder.example. + 8 DNAME 8.8/22 + 9 DNAME 9.8/22 + 10 DNAME 10.8/22 + 11 DNAME 11.8/22 + + + + + + + +Crawford Standards Track [Page 6] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + + A typical entry in the resulting reverse zone for some host with + address 192.0.9.33 might be + + $ORIGIN 8/22.0.192.in-addr.arpa. + 33.9 PTR somehost.slash-22-holder.example. + + The same advisory remarks concerning the choice of the "/" character + apply here as in [INADDR]. + +5.3. Network Renumbering Support + + If IPv4 network renumbering were common, maintenance of address space + delegation could be simplified by using DNAME records instead of NS + records to delegate. + + $ORIGIN new-style.in-addr.arpa. + 189.190 DNAME in-addr.example.net. + + $ORIGIN in-addr.example.net. + 188 DNAME in-addr.customer.example. + + $ORIGIN in-addr.customer.example. + 1 PTR www.customer.example. + 2 PTR mailhub.customer.example. + ; etc ... + + This would allow the address space 190.189.0.0/16 assigned to the ISP + "example.net" to be changed without the necessity of altering the + zone files describing the use of that space by the ISP and its + customers. + + Renumbering IPv4 networks is currently so arduous a task that + updating the DNS is only a small part of the labor, so this scheme + may have a low value. But it is hoped that in IPv6 the renumbering + task will be quite different and the DNAME mechanism may play a + useful part. + +6. IANA Considerations + + This document defines a new DNS Resource Record type with the + mnemonic DNAME and type code 39 (decimal). The naming/numbering + space is defined in [DNSIS]. 
This name and number have already been + registered with the IANA. + + + + + + + + +Crawford Standards Track [Page 7] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + +7. Security Considerations + + The DNAME record is similar to the CNAME record with regard to the + consequences of insertion of a spoofed record into a DNS server or + resolver, differing in that the DNAME's effect covers a whole subtree + of the name space. The facilities of [DNSSEC] are available to + authenticate this record type. + +8. References + + [DNSCF] Mockapetris, P., "Domain names - concepts and facilities", + STD 13, RFC 1034, November 1987. + + [DNSCLR] Elz, R. and R. Bush, "Clarifications to the DNS + Specification", RFC 2181, July 1997. + + [DNSIS] Mockapetris, P., "Domain names - implementation and + specification", STD 13, RFC 1035, November 1987. + + [DNSSEC] Eastlake, 3rd, D. and C. Kaufman, "Domain Name System + Security Extensions", RFC 2065, January 1997. + + [DNSUPD] Vixie, P., Ed., Thomson, S., Rekhter, Y. and J. Bound, + "Dynamic Updates in the Domain Name System", RFC 2136, April + 1997. + + [EDNS0] Vixie, P., "Extensions mechanisms for DNS (EDNS0)", RFC + 2671, August 1999. + + [INADDR] Eidnes, H., de Groot, G. and P. Vixie, "Classless IN- + ADDR.ARPA delegation", RFC 2317, March 1998. + + [KWORD] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels," BCP 14, RFC 2119, March 1997. + + [SECDYN] D. Eastlake, 3rd, "Secure Domain Name System Dynamic + Update", RFC 2137, April 1997. + +9. Author's Address + + Matt Crawford + Fermilab MS 368 + PO Box 500 + Batavia, IL 60510 + USA + + Phone: +1 630 840-3461 + EMail: crawdad@fnal.gov + + + +Crawford Standards Track [Page 8] + +RFC 2672 Non-Terminal DNS Name Redirection August 1999 + + +10. Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. 
+ + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Crawford Standards Track [Page 9] + diff --git a/doc/rfc/rfc2673.txt b/doc/rfc/rfc2673.txt new file mode 100644 index 00000000..19d272e9 --- /dev/null +++ b/doc/rfc/rfc2673.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group M. 
Crawford +Request for Comments: 2673 Fermilab +Category: Standards Track August 1999 + + + Binary Labels in the Domain Name System + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +1. Introduction and Terminology + + This document defines a "Bit-String Label" which may appear within + domain names. This new label type compactly represents a sequence of + "One-Bit Labels" and enables resource records to be stored at any + bit-boundary in a binary-named section of the domain name tree. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [KWORD]. + +2. Motivation + + Binary labels are intended to efficiently solve the problem of + storing data and delegating authority on arbitrary boundaries when + the structure of underlying name space is most naturally represented + in binary. + +3. Label Format + + Up to 256 One-Bit Labels can be grouped into a single Bit-String + Label. Within a Bit-String Label the most significant or "highest + level" bit appears first. This is unlike the ordering of DNS labels + themselves, which has the least significant or "lowest level" label + first. Nonetheless, this ordering seems to be the most natural and + efficient for representing binary labels. 
+ + + + + + +Crawford Standards Track [Page 1] + +RFC 2673 Binary Labels in the Domain Name System August 1999 + + + Among consecutive Bit-String Labels, the bits in the first-appearing + label are less significant or "at a lower level" than the bits in + subsequent Bit-String Labels, just as ASCII labels are ordered. + +3.1. Encoding + + 0 1 2 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 . . . + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-//+-+-+-+-+-+-+ + |0 1| ELT | Count | Label ... | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+-+-+-+-+ + + (Each tic mark represents one bit.) + + + ELT 000001 binary, the six-bit extended label type [EDNS0] + assigned to the Bit-String Label. + + Count The number of significant bits in the Label field. A Count + value of zero indicates that 256 bits are significant. + (Thus the null label representing the DNS root cannot be + represented as a Bit String Label.) + + Label The bit string representing a sequence of One-Bit Labels, + with the most significant bit first. That is, the One-Bit + Label in position 17 in the diagram above represents a + subdomain of the domain represented by the One-Bit Label in + position 16, and so on. + + The Label field is padded on the right with zero to seven + pad bits to make the entire field occupy an integral number + of octets. These pad bits MUST be zero on transmission and + ignored on reception. + + A sequence of bits may be split into two or more Bit-String Labels, + but the division points have no significance and need not be + preserved. An excessively clever server implementation might split + Bit-String Labels so as to maximize the effectiveness of message + compression [DNSIS]. A simpler server might divide Bit-String Labels + at zone boundaries, if any zone boundaries happen to fall between + One-Bit Labels. + +3.2. 
Textual Representation
+
+   A Bit-String Label is represented in text -- in a zone file, for
+   example -- as a <bit-spec> surrounded by the delimiters "\[" and "]".
+   The <bit-spec> is either a dotted quad or a base indicator and a
+   sequence of digits appropriate to that base, optionally followed by a
+   slash and a length. The base indicators are "b", "o" and "x",
+   denoting base 2, 8 and 16 respectively. The length counts the
+   significant bits and MUST be between 1 and 32, inclusive, after a
+   dotted quad, or between 1 and 256, inclusive, after one of the other
+   forms. If the length is omitted, the implicit length is 32 for a
+   dotted quad or 1, 3 or 4 times the number of binary, octal or
+   hexadecimal digits supplied, respectively, for the other forms.
+
+   In augmented Backus-Naur form [ABNF],
+
+      bit-string-label =  "\[" bit-spec "]"
+
+      bit-spec         =  bit-data [ "/" length ]
+                       /  dotted-quad [ "/" slength ]
+
+      bit-data         =  "x" 1*64HEXDIG
+                       /  "o" 1*86OCTDIG
+                       /  "b" 1*256BIT
+
+      dotted-quad      =  decbyte "." decbyte "." decbyte "." decbyte
+
+      decbyte          =  1*3DIGIT
+
+      length           =  NZDIGIT *2DIGIT
+
+      slength          =  NZDIGIT [ DIGIT ]
+
+      OCTDIG           =  %x30-37
+
+      NZDIGIT          =  %x31-39
+
+   If a <length> is present, the number of digits in the <bit-data> MUST
+   be just sufficient to contain the number of bits specified by the
+   <length>. If there are insignificant bits in a final hexadecimal or
+   octal digit, they MUST be zero. A <dotted-quad> always has all four
+   parts even if the associated <slength> is less than 24, but, like the
+   other forms, insignificant bits MUST be zero.
+
+   Each number represented by a <decbyte> must be between 0 and 255,
+   inclusive.
+
+   The number represented by <length> must be between 1 and 256
+   inclusive.
+
+   The number represented by <slength> must be between 1 and 32
+   inclusive.
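
The grammar and length rules of section 3.2, together with the wire
layout of section 3.1, can be sketched in a few lines of Python. This
is an illustrative sketch only and not part of RFC 2673 or of BIND: the
names parse_bit_string_label and to_wire are hypothetical, the base
indicators are matched in lower case only, and an implicit length above
256 (possible with 86 octal digits) is simply rejected, which is an
assumption of this sketch.

```python
import re

# Bits per digit and digit syntax for each base indicator (section 3.2).
_BASES = {
    "b": (1, r"[01]{1,256}"),
    "o": (3, r"[0-7]{1,86}"),
    "x": (4, r"[0-9A-Fa-f]{1,64}"),
}


def parse_bit_string_label(text):
    """Return the significant bits of a textual label as a '0'/'1' string."""
    m = re.fullmatch(r"\\\[(.+)\]", text)
    if m is None:
        raise ValueError("label must be delimited by \\[ and ]")
    spec, slash, length_text = m.group(1).partition("/")
    if slash and not re.fullmatch(r"[1-9][0-9]{0,2}", length_text):
        raise ValueError("malformed length")
    length = int(length_text) if slash else None

    if "." in spec:
        # Dotted-quad form: always four decbytes, at most 32 bits.
        parts = spec.split(".")
        if len(parts) != 4 or not all(
            re.fullmatch(r"[0-9]{1,3}", p) for p in parts
        ):
            raise ValueError("malformed dotted quad")
        if any(int(p) > 255 for p in parts):
            raise ValueError("decbyte out of range")
        bits = "".join(format(int(p), "08b") for p in parts)
        max_length = 32
        if length is None:
            length = 32
    else:
        # Base indicator followed by digits of that base.
        base, digits = spec[:1], spec[1:]
        if base not in _BASES or not re.fullmatch(_BASES[base][1], digits):
            raise ValueError("malformed bit-data")
        width = _BASES[base][0]
        bits = "".join(format(int(d, 16), "0%db" % width) for d in digits)
        max_length = 256
        if length is None:
            length = len(bits)
        elif len(digits) != -(-length // width):
            # Digits must be "just sufficient" to hold <length> bits.
            raise ValueError("digit count does not match length")

    if not 1 <= length <= max_length:
        raise ValueError("length out of range")
    if "1" in bits[length:]:
        raise ValueError("insignificant bits must be zero")
    return bits[:length]


def to_wire(bits):
    """Encode 1..256 bits as one Bit-String Label in wire form."""
    if not 1 <= len(bits) <= 256:
        raise ValueError("a Bit-String Label holds 1 to 256 bits")
    padded = bits + "0" * (-len(bits) % 8)      # zero pad on the right
    label = int(padded, 2).to_bytes(len(padded) // 8, "big")
    # First octet: '01' prefix plus ELT 000001; a Count of 0 means 256.
    return bytes([0x41, len(bits) % 256]) + label
```

Under these assumptions, parse_bit_string_label(r"\[o64072/14]") and
parse_bit_string_label(r"\[xd074/14]") both yield the 14-bit string
11010000011101, and to_wire of that string gives the four octets
41 0e d0 74 (hexadecimal).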
+
+   When the textual form of a Bit-String Label is generated by machine,
+   the length SHOULD be explicit, not implicit.
+
+3.2.1. Examples
+
+   The following four textual forms represent the same Bit-String Label.
+
+      \[b11010000011101]
+      \[o64072/14]
+      \[xd074/14]
+      \[208.116.0.0/14]
+
+   The following represents two consecutive Bit-String Labels which
+   denote the same relative point in the DNS tree as any of the above
+   single Bit-String Labels.
+
+      \[b11101].\[o640]
+
+3.3. Canonical Representation and Sort Order
+
+   Both the wire form and the text form of binary labels have a degree
+   of flexibility in their grouping into multiple consecutive Bit-String
+   Labels. For generating and checking DNS signature records [DNSSEC]
+   binary labels must be in a predictable form. This canonical form is
+   defined as the form which has the fewest possible Bit-String Labels
+   and in which all except possibly the first (least significant) label
+   in any sequence of consecutive Bit-String Labels is of maximum
+   length.
+
+   For example, the canonical form of any sequence of up to 256 One-Bit
+   Labels has a single Bit-String Label, and the canonical form of a
+   sequence of 513 to 768 One-Bit Labels has three Bit-String Labels of
+   which the second and third contain 256 label bits.
+
+   The canonical sort order of domain names [DNSSEC] is extended to
+   encompass binary labels as follows. Sorting is still label-by-label,
+   from most to least significant, where a label may now be a One-Bit
+   Label or a standard (code 00) label. Any One-Bit Label sorts before
+   any standard label, and a 0 bit sorts before a 1 bit. The absence of
+   a label sorts before any label, as specified in [DNSSEC].
+
+   For example, the following domain names are correctly sorted.
+
+      foo.example
+      \[b1].foo.example
+      \[b100].foo.example
+      \[b101].foo.example
+      bravo.\[b10].foo.example
+      alpha.foo.example
+
+4. Processing Rules
+
+   A One-Bit Label never matches any other kind of label. In
+   particular, the DNS labels represented by the single ASCII characters
+   "0" and "1" do not match One-Bit Labels represented by the bit values
+   0 and 1.
+
+5. Discussion
+
+   A Count of zero in the wire-form represents a 256-bit sequence, not
+   to optimize that particular case, but to make it completely
+   impossible to have a zero-bit label.
+
+6. IANA Considerations
+
+   This document defines one Extended Label Type, termed the Bit-String
+   Label, and requests registration of the code point 000001 binary in
+   the space defined by [EDNS0].
+
+7. Security Considerations
+
+   All security considerations which apply to traditional ASCII DNS
+   labels apply equally to binary labels. The canonicalization and
+   sorting rules of section 3.3 allow these to be addressed by DNS
+   Security [DNSSEC].
+
+8. References
+
+   [ABNF]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
+            Specifications: ABNF", RFC 2234, November 1997.
+
+   [DNSIS]  Mockapetris, P., "Domain names - implementation and
+            specification", STD 13, RFC 1035, November 1987.
+
+   [DNSSEC] Eastlake, D., 3rd, C. Kaufman, "Domain Name System Security
+            Extensions", RFC 2065, January 1997.
+
+   [EDNS0]  Vixie, P., "Extension mechanisms for DNS (EDNS0)", RFC 2671,
+            August 1999.
+
+   [KWORD]  Bradner, S., "Key words for use in RFCs to Indicate
+            Requirement Levels," BCP 14, RFC 2119, March 1997.
+
+9.
Author's Address
+
+   Matt Crawford
+   Fermilab MS 368
+   PO Box 500
+   Batavia, IL 60510
+   USA
+
+   Phone: +1 630 840-3461
+   EMail: crawdad@fnal.gov
+
+10. Full Copyright Statement
+
+   Copyright (C) The Internet Society (1999). All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works. However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+ |