INTERNET-DRAFT                             John C Klensin
21 October 2002
Expires April 2003

			 National and Local Characters in DNS TLD Names
					  draft-klensin-idn-tld-00.txt

Status of this Memo

	This document is an Internet-Draft and is in full conformance
	with all provisions of Section 10 of RFC2026 except that the
	right to produce derivative works is not granted.

	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF), its areas, and its working groups.  Note that
	other groups may also distribute working documents as
	Internet-Drafts.

	Internet-Drafts are draft documents valid for a maximum of six
	months and may be updated, replaced, or obsoleted by other
	documents at any time.  It is inappropriate to use Internet-
	Drafts as reference material or to cite them other than as
	"work in progress."

	The list of current Internet-Drafts can be accessed at
	http://www.ietf.org/ietf/1id-abstracts.txt

	The list of Internet-Draft Shadow Directories can be accessed at
	http://www.ietf.org/shadow.html.


Abstract

   In the context of work on internationalizing the Domain Name System
   (DNS), there have been extensive discussions about "multilingual" or
   "internationalized" top level domain names (TLDs), especially for
   countries whose predominant language is not written in a Roman-based
   script.  This document reviews some of the motivations for such
   domains and the constraints that the DNS imposes.  It then suggests
   an alternative, local translation, that may solve a superset of the
   problem while avoiding protocol changes, serious deployment delays,
   and other difficulties.

Table of Contents

1. Introduction
1.1 Background on the "Multilingual Name" Problem
1.2 Domain Name System Constraints
1.3 Internationalization and Localization
2. Client-side solutions
2.1 IDNA and the client
2.2 Local translation tables for TLD names
3. Advantages and disadvantages of local translation
3.1 Every TLD in the local language and character set
3.2 Unification of country code domains
3.3 User understanding of local and global references
3.4 Limits on TLD propagation
4. Security Considerations
5. References
6. Acknowledgements
7. Author's Address


1. Introduction

1.1 Background on the "Multilingual Name" Problem

People who share a language prefer to communicate in it, using whatever
characters are normally used to write that language, rather than in some
"foreign" one.  There have been standards for using mutually-agreed
characters and languages in electronic mail message bodies and selected
headers since the introduction of MIME in 1992 [MIME] and the Web has
permitted multilingual text since its inception.  However, since domain
names are exposed to users in email addresses and URLs, and
corresponding arrangements in other protocols, demand rapidly arose to
permit domain names in applications that used characters other than
those of the very restrictive, ASCII-subset, "LDH" conventions [LDH].
The effort to do this rapidly became known as "multilingual domain
names", although that is a misnomer, since the DNS deals only with
characters and identifier strings, and not, except by accident, what
people usually think of as "names".  And there has been little
interest in what would actually be a "multilingual name" -- i.e., a name
that contains components from more than one language -- but only the use
of strings conforming to different languages in the context of the DNS.

1.1.1 Approaches to the requirement

If the requirement is seen, not as "modifying the DNS", but as
"providing users with access to the DNS from a variety of languages and
character sets", three sets of proposals have emerged in the IETF and
elsewhere.  They are:

   (1) Perform processing in client software that recodes a user-visible
   string into an ASCII-compatible form that can safely be passed
   through the DNS protocols and stored in the DNS.  This is the
   approach used, for example, in the IETF's "IDNA" protocol [IDNA].

   (2) Modify the DNS to be more hospitable to non-ASCII names and
   strings.  There have been a variety of proposals to do this in almost
   as many ways, some of which have been implemented on a proprietary
   basis by various vendors.  None of them have gained acceptance in the
   IETF community, primarily because they would take a long time to
   deploy and would leave many problems unsolved.

   (3) Move the problem out of the DNS entirely, relying instead on a
   "directory" or "presentation" layer to handle internationalization.
   The rationale for this approach is discussed in [DNSROLE].

This document proposes a fourth approach, applicable to the top level
domains (TLDs) only (see section 1.2.1 for a discussion of the special
issues that make TLDs problematic).  That approach could be used as an
alternate or supplement to the strategies summarized above.


1.1.2 Writing the name of one's country in its own characters

An early focus of the "multilingual domain name" efforts was expressed
in statements such as "users in my country, in which ASCII is rarely
used, should be able to write an entire domain name in their own
character set."  In particular, since all top-level domain names, at
present, follow the LDH rules, the somewhat more restrictive naming
rules discussed in [STD3], and the coding conventions specified in
[RFC1591], all fully-qualified DNS names were effectively required to
contain at least one ASCII label (the TLD name), and that was considered
inappropriate.  One should, instead, be able to write the name of the
ccTLD for China in Chinese, the name of the ccTLD for Saudi Arabia in
Arabic, and so on.

1.1.3 Countries with multiple languages and countries with multiple
     names

From a user interface standpoint, writing ccTLD names in local
characters is a problem.  As discussed in section 1.2.2, the DNS itself
does not easily permit a domain to be referred to by more than one name
(or spelling or translation of a name).  Countries with more than one
official language would require that the country name be represented in
each of those languages.  And, just as it is important that a user in
China be able to represent the name of the Chinese ccTLD in Chinese
characters, she should be able to access a Chinese-language site in
France using Chinese characters, requiring that she be able to write the
name of the French ccTLD in those characters rather than in a form based
on a Roman character set.


1.2 Domain Name System Constraints

1.2.1 Administrative hierarchy

The domain name system is designed around the idea of an "administrative
hierarchy", with the entity responsible for a given node of the
hierarchy responsible for policies applicable to its subhierarchies (Cf.
[STD13]).  The model works quite well for the domain and subdomains of a
particular enterprise, where the hierarchy can be organized to match the
organizational structure, there are established ways to set policies and
there are, at least presumably, shared assumptions about overall goals
and objectives among all registrants in the domain.  It is more
problematic when a domain is shared by unrelated entities which lack
common policy assumptions.  It is difficult to reach agreement on rules
that should apply to all of them.  That situation always prevails for
the labels registered in a TLD (second-level names) except in those TLDs
for which the second level is structural (e.g., the .CO, .AC, .GOV
conventions in many ccTLDs), in which case it exists for the labels
within that structural level.

TLDs may, but need not, have consistent registration policies for those
second (or third) level names.  Countries (or ccTLD administrators) have
often adopted rules about what entities may register in those ccTLDs,
and the forms the names may take.  RFC 1591 outlined registration norms
for most of the gTLDs, even though those norms have been largely ignored
in recent years. And some recent "sponsored" domains are based on quite
specific rules about appropriate registrations.  Homogeneous
registration rules for the root are, by contrast, impossible: almost by
definition, the subdomains registered in it are diverse and no single
policy applying to all root subdomains (TLDs) is feasible.

1.2.2 Aliases

In an environment different from the DNS, a rational way to permit
assigning local-language names to a country code (or other) domain would
be to set up an alias for the name, or to use some sort of "see instead"
reference.  But the DNS does not have quite the right facilities for
either.  Instead, it supports a "CNAME" record, whose label can refer
only to a particular label and not to a subtree.  For example, if A.B.C
is a fully-qualified name, then a CNAME reference from X to A would make
X.B.C appear to have the same values as A.B.C.  However, a CNAME
reference from Y to C would not make A.B.Y referenceable (or even
defined) at all.  A second record type, DNAME [RFC2672], can provide an
alias for a portion of the tree.  But it is problematic technically, and
its use is strongly discouraged except for transition uses from one
domain to another.
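
The difference between the two record types can be illustrated with a
toy model.  The following Python sketch is not a resolver: the function
names are hypothetical, names are plain dotted strings without a
trailing dot, and the alias tables are ordinary dictionaries.  It only
shows why an alias on a single label does not carry a subtree with it.

```python
# Toy model of the two alias mechanisms.  Real resolution (RFC 1034,
# RFC 2672) involves much more than this; all names are illustrative.

def apply_cname(name, cnames):
    """A CNAME aliases exactly one owner name, never a subtree."""
    name = name.lower()
    return cnames.get(name, name)


def apply_dname(name, dnames):
    """A DNAME rewrites names strictly below its owner."""
    name = name.lower()
    for owner, target in dnames.items():
        if name.endswith("." + owner):
            return name[: -len(owner)] + target
    return name


# The example from the text: X.B.C is an alias for A.B.C, and Y is an
# alias for C.
cnames = {"x.b.c": "a.b.c", "y": "c"}

print(apply_cname("X.B.C", cnames))       # a.b.c
print(apply_cname("A.B.Y", cnames))       # a.b.y (alias on Y not used)
print(apply_dname("A.B.Y", {"y": "c"}))   # a.b.c
```

In this model the CNAME from Y to C is never consulted when A.B.Y is
looked up, because the lookup key is the whole name; only the
DNAME-style suffix rewrite reaches names beneath Y.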


1.3 Internationalization and Localization

It has often been observed that while many people talk about
"internationalization" (a term we typically use for making something
globally accessible while incorporating a broad-range "universal"
character set and conventions appropriate to all languages), they often
really mean, and want, "localization" (making things work well in a
particular locality, or well, but potentially differently, for a broad
range of localities).  Anything that actually involves the DNS must be
global and hence internationalized since the DNS cannot meaningfully
support different responses based, e.g., on the location of the user
making a query.  While the DNS cannot support localization internally,
many of the features discussed earlier in this section are much more
easily thought about in local terms -- whether localized to a
geographical area, users of a language, or using some other criteria --
than in global ones.

2. Client-side solutions

Traditionally, the IETF has avoided becoming involved in standardization
for actions that take place strictly on individual hosts on the network,
assuming that it should confine itself to behavior that is observable
"on the wire", i.e., in protocols between network hosts.  Exceptions to
this general principle have been made when different clients were
required to utilize data or interpret values in compatible ways to
preserve interoperability: the standards for email and web body formats,
and IDNA itself, are examples of these exceptions.  Regardless of what
is required to be standardized, it is almost never required, and often
unwise, that a user interface, by default, present on-the-wire formats
to the user.  However, in most cases when the presentation format and
the wire format differ, the client program must either ensure that the
wire format can be reconstructed from user input or keep the wire
format, while hidden, bound to the presentation mechanism so that it
can be recovered.  And, while it is rarely a goal in itself, it
is often necessary that the user be at least vaguely aware that the wire
("real") format is different from the presentation one and that the wire
format be available for debugging.


2.1 IDNA and the client

As mentioned above, IDNA itself is entirely a client-side protocol.  It
works by providing labels to the DNS in a special format (so-called
"ACE").  When labels in that format are encountered, they are
transformed, by the client, back into internationalized (normally
Unicode) characters.  In the context of this document, the important
observation about IDNA is that any application program that supports it
is already doing considerable transformation work on the client; it is
not simply presenting the on-the-wire formats to the user.
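
As a rough illustration of that client-side transformation, the sketch
below uses the "idna" codec in the Python standard library.  That codec
implements IDNA as later published in RFC 3490, including the "xn--"
ACE prefix; both postdate this draft, so the exact wire form shown is an
assumption about the final protocol rather than something this document
specifies.  The helper names to_wire and to_display are hypothetical.

```python
def to_wire(name):
    """Convert a user-visible name to its ASCII (ACE) wire form."""
    return name.encode("idna").decode("ascii")


def to_display(wire):
    """Convert an ACE wire-form name back to Unicode for display."""
    return wire.encode("ascii").decode("idna")


shown = "b\u00fccher.example"      # "bucher" with u-umlaut
wire = to_wire(shown)
print(wire)                        # xn--bcher-kva.example
assert to_display(wire) == shown   # the round trip is lossless here
```

Only the ASCII string in "wire" would ever be passed to the DNS; the
Unicode form exists solely in the client.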


2.2 Local translation tables for TLD names

We suggest that, in addition to maintaining the code and tables required
to support IDNA, clients may want to maintain a table that contains a
list of TLDs and that maps between them and locally-desirable names.
For ccTLDs, these might be the names (or locally-standard abbreviations)
by which the relevant countries are known locally (whether in ASCII
characters or others).  With some care on the part of the application
designer (e.g., to ensure that local forms do not conflict with the
actual TLD names), a particular TLD name input from the user could be
either in local or standard form without special tagging or problems.
When DNS names are received by these client programs, the TLD labels
would be mapped to local form before IDNA is applied to the rest of the
name; when names are received from users, local TLD names would be
mapped to the global ones before being passed into IDNA or for other DNS
processing.
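
One possible shape for such a table is sketched below in Python.
Everything in it is an assumption made for illustration: the table
contents, the function names, the small set of "actual" TLDs, and the
use of Python's built-in "idna" codec (which implements the later RFC
3490 form of IDNA) for the remainder of the name.  Names are assumed to
be written without a trailing dot.

```python
# Illustrative subset of actual TLD names.
GLOBAL_TLDS = {"cn", "fr", "sa", "com", "org"}

# Hypothetical table for one locale: actual TLD -> local form.
# "\u4e2d\u56fd" and "\u6cd5\u56fd" are Chinese for China and France.
LOCAL_NAMES = {"cn": "\u4e2d\u56fd", "fr": "\u6cd5\u56fd"}


def build_reverse(local_names, global_tlds):
    """Invert the table, refusing local forms that collide."""
    reverse = {}
    for tld, local in local_names.items():
        key = local.casefold()
        if key in reverse or (key in global_tlds and key != tld):
            raise ValueError("ambiguous local form for " + tld)
        reverse[key] = tld
    return reverse


def to_wire(name, reverse):
    """User input: map a local TLD form to the actual TLD, then IDNA."""
    head, sep, last = name.rpartition(".")
    tld = reverse.get(last.casefold(), last.lower())
    return (head + sep + tld).encode("idna").decode("ascii")


def from_wire(wire, local_names):
    """Received name: localize the TLD, decode the rest with IDNA."""
    head, sep, tld = wire.rpartition(".")
    shown = head.encode("ascii").decode("idna")
    return shown + sep + local_names.get(tld.lower(), tld)


REVERSE = build_reverse(LOCAL_NAMES, GLOBAL_TLDS)
print(to_wire("example.\u6cd5\u56fd", REVERSE))   # example.fr
print(to_wire("example.FR", REVERSE))             # example.fr
```

Because the reverse lookup falls through to the label as typed, the
user may enter either the local or the standard form, which is the
behavior described above.  The collision check in build_reverse is the
"care on the part of the application designer" in its simplest form.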


3. Advantages and disadvantages of local translation

3.1 Every TLD in the local language and character set

The notion of a top-level domain whose name matches, e.g., the name that
is used for a country in that country or the name of a language in that
language is, as mentioned above, immediately appealing.  But most of the
reasons for it argue equally strongly for other TLDs being accessible
from that language.  A user in Korea who can access the national ccTLD
in the Korean language and character set has every reason to expect that
both generic top level domains and domains associated with other
countries would be similarly accessible, especially if the second-level
domains bear Korean names.  A user in Spain or Portugal, or in Latin
America, would presumably have similar expectations, but would expect to
use Spanish or Portuguese names, not Korean ones.

That level of local optimization is not realistic -- some would argue
not possible -- with the DNS since it would ultimately require that
every top level domain be replicated for each of the world's languages.
That replication process would involve not just the top level domain
itself: in principle, all of its subtrees would need to be completely
replicated as well (or at least all of the subtrees for which the
language associated with a given replica was relevant).  The
administrative hierarchy characteristics of the DNS (see section 1.2.1)
turn the replication process into an administrative nightmare: every
administrator of a second-level domain in the world would be forced to
maintain dozens, probably hundreds, of similar zone files for the
replicas of the domain.  Even if only the zones relevant to a
particular country or language were replicated, the administrative and
tracking problems to bind these to the appropriate top-level domain and
keep all of the replicas synchronized would be extremely difficult at
best.  And many administrators of third- and fourth-level domains, and
beyond, would be faced with similar problems.

By contrast, dealing with the names of TLDs as a localization problem,
using local translation, is fairly simple.  Each function represented by
a TLD -- a country, generic registrations, or purpose-specific
registrations -- could be represented in the local language and
character set as needed.  And, for countries with many languages, or
users living, working, or visiting countries where their language was
not dominant, "local" could be defined in terms of the needs or wishes
of each particular user.

3.2 Unification of country code domains

It follows from some of the comments above that, while there appears to
be some immediate appeal in having (at least) two domains for each
country, one using the ISO 3166-1 code and another one using a name
based on the national name in the national language, such a situation
would create considerable problems for registrants in the multiple
domains.  For registrants maintaining enterprise or organizational
subdomains, ease of administration in a single family of zone files will
usually make a registration in a single top-level domain preferable to
replicated sets of them, at least as long as their functional
requirements (such as local-language access) are met by the unified
structure.

Of course, having replicated domains might be popular with registries
and registrars, since replication would almost inevitably increase the
total number of domains to be registered.

3.3 User understanding of local and global references

While the IDNA tables (actually Nameprep and Stringprep -- see the IDNA
specification) must be identical globally for IDNA to work reliably, the
tables for mapping between local names and TLD names could be locally
determined, and differ from one locale to another, as long as users
understood that international interchange of names required using the
standard forms.  That understanding could be assisted by software.  It
is likely that, at least for the foreseeable future, DNS names being
passed among users in different countries, or using different languages,
will be forced to be in ACE form to guarantee compatibility in any
event, so the marginal knowledge or effort needed to put TLD names into
standard form and transmit them that way would be very small.
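
The interchange rule can be pictured with two locales whose tables
differ.  The Python sketch below is illustrative only: the two tables,
their contents, and the function names are hypothetical, and the
second-level label is left in plain ASCII for brevity.

```python
# Hypothetical per-locale tables: actual TLD -> local form.
TABLE_ES = {"de": "alemania", "fr": "francia"}
TABLE_PT = {"de": "alemanha", "fr": "fran\u00e7a"}


def to_standard(name, table):
    """Replace a locale's TLD form with the actual TLD."""
    reverse = {local: tld for tld, local in table.items()}
    head, sep, last = name.rpartition(".")
    return head + sep + reverse.get(last.lower(), last.lower())


def to_locale(name, table):
    """Replace the actual TLD with a locale's own form."""
    head, sep, last = name.rpartition(".")
    return head + sep + table.get(last.lower(), last)


sent = to_standard("ejemplo.alemania", TABLE_ES)
print(sent)                        # ejemplo.de
print(to_locale(sent, TABLE_PT))   # ejemplo.alemanha

# Passing the local form across locales does not work: the receiving
# table has no entry for it, so the name is left unmapped.
print(to_standard("ejemplo.alemania", TABLE_PT))   # ejemplo.alemania
```

The name survives the exchange only because the standard form is what
travels between the two users; each side then applies its own table.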

3.4 Limits on TLD propagation

The concept of using local translation does have one side-effect, which
some portions of the Internet community might consider undesirable.
The size and complexity of translation tables, and maintaining those
tables, will be, to a considerable extent, a function of the number of
top-level domains, the frequency with which new domains are added, and
the number of domains that are added at a time.  A country or other
locale that wished to maintain a full set of translations (i.e., so that
every TLD had a representation in the local language) would presumably
find setting up a table for the current collection of a few hundred
domains to be a task that would take some days.  If the number of TLDs
were relatively stable, with a relatively small number being added at
infrequent intervals, the updates could probably be dealt with on an ad
hoc basis.  But, if large numbers of domains were added frequently, or
if the total number of TLDs became very large, maintaining the table
might require dedicated staff.  Worse, updating the tables stored on
client machines might require update and synchronization protocols and
all of the related complexities.

4. Security Considerations

IDNA provides a client-based mechanism for presenting Unicode names in
applications while passing only ASCII-based names on the wire.  As such,
it constitutes a major step along the path of introducing a client-based
presentation layer into the Internet.  Client-based presentation layer
transformations introduce risks from variant tables that can change
meaning without external protection.  For example, if a mapping table
normally maps A onto C and that table is altered by an attacker so that
A maps onto D instead, much mischief can be committed.  On the other
hand, these are not the usual sort of network attacks: they may be
thought of as falling into the "users can always cause harm to
themselves" category.  The local translation model outlined here does
not significantly increase the risks over those associated with IDNA,
but may provide some new avenues for exploiting them.

Both this approach and IDNA rely on having updated programs present
information to the user in a very different form than the one in which
it is transmitted on the wire.  Unless the internal (wire) form is
always used in interchange, there are possibilities for ambiguity and
confusion about references.

5. References

[DNSROLE] Klensin, J.C., "Role of the Domain Name System", work in
    progress (draft-klensin-dns-role-04.txt).

[IDNA] Faltstrom, P., P. Hoffman, and A. M. Costello,
    "Internationalizing Domain Names in Applications (IDNA)", work in
    progress (draft-ietf-idn-idna-13.txt).

[LDH] STD13 and comments

[MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail
    Extensions): Mechanisms for Specifying and Describing the Format of
    Internet Message Bodies", RFC 1341, June 1992.  Updated and replaced
    by Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    Extensions (MIME) Part One: Format of Internet Message Bodies",
    RFC 2045, November 1996.  Also, Moore, K., "Representation of
    Non-ASCII Text in Internet Message Headers", RFC 1342, June 1992.
    Updated and replaced by Moore, K., "MIME (Multipurpose Internet
    Mail Extensions) Part Three: Message Header Extensions for
    Non-ASCII Text", RFC 2047, November 1996.

[RFC1591] Postel, J., "Domain Name System Structure and Delegation",
    RFC 1591, March 1994.

[RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC 2672,
    August 1999.

[STD3] Braden, R., Ed., "Requirements for Internet Hosts - Application
    and Support", RFC 1123, October 1989.

[STD13] Mockapetris, P.V., "Domain names - concepts and facilities",
    RFC 1034, and "Domain names - implementation and specification",
    RFC 1035, November 1987.

6. Acknowledgements

This document was inspired by a number of conversations in ICANN, IETF,
MINC, and private contexts about the future evolution and
internationalization of top level domains.  Discussions within, and
about, the ICANN IDN Committee have been particularly helpful, although
several of the members of that committee may be surprised about where
those discussions led.

7. Author's Address

John C Klensin
1770 Massachusetts Ave, #322
Cambridge, MA 02140 USA
email: john+ietf@jck.com