Internet Draft                                   Author: Sergey Charikov
draft-charikov-idn-reg-01.txt                                    CyrLINC
                                                   Created: May 25, 2003
                                                   Updated: July,13,1007
Intended status: Best Current Practice

     Internationalized Domain Names Registration and Administration
      Guideline for Russian, Ukrainian, Bulgarian and Byelorussian
                        languages in ASCII TLDs.

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document is a guideline for registries and registrars on
   registering internationalized domain names (IDNs) in the Russian,
   Ukrainian, Bulgarian and Byelorussian languages in a zone. Before
   accepting registrations of domain names into a zone, the zone's
   registry should decide which codepoints in the [UNICODE] character
   set the zone will accept. The registry should also decide whether
   particular characters in a registered domain name should cause
   registration of multiple equivalent domain names. With those
   decisions, the registry can safely register names using the tables
   and algorithms described here.

1. Introduction

   The current Russian alphabet has 33 characters, which are
   represented by 33 code points of the Unicode Cyrillic set: U+0451
   and U+0430-U+044F (33 lowercase characters).
   The Ukrainian alphabet has 32 characters: it omits U+044A, U+044B,
   U+044D and U+0451 of the Russian alphabet and adds U+0454, U+0456
   and U+0457. The 30 characters of the Bulgarian alphabet are a
   subset of the Russian alphabet, omitting U+044B, U+044D and U+0451.
   The Byelorussian alphabet has 32 characters: it omits U+0438,
   U+044A and U+044B of the Russian alphabet and adds U+0456 and
   U+045E.

   People in Russia, Ukraine, Bulgaria and Byelorussia write in
   Cyrillic, but they often use Latin script too. Standard keyboards
   used in these countries have both Cyrillic and Latin characters.
   Therefore some registries could allow Latin script for domain name
   registration in their zones.

   As [FRAMEWORK] describes, IDN registration increases the risk of
   misunderstandings, cybersquatting, and other forms of confusion.
   For some human languages, there are characters and/or strings that
   have equivalent or near-equivalent meanings. If someone is allowed
   to register a name with such a character or string, the registry
   might want to automatically register all the names that have the
   same meaning in that language. Further, some registries might want
   to restrict the set of characters to be registered for
   language-based reasons. In addition, IDNA allows the use of
   thousands of non-alphanumeric characters, and some zone
   administrators will want to prohibit some or all of these
   characters.

   Sixteen Cyrillic characters (U+0430, U+0432, U+0433, U+0435,
   U+0438, U+043A, U+043C, U+043D, U+043E, U+043F, U+0440, U+0441,
   U+0442, U+0443, U+0445, U+0456) are similar to the Latin characters
   U+0061, U+0062, U+0072, U+0065, U+0075, U+006B, U+006D, U+0068,
   U+006F, U+006E, U+0070, U+0063, U+0074, U+0079, U+0078, U+0069,
   respectively. The technique described in [FRAMEWORK] is therefore
   useful for registering IDNs in Cyrillic zones.

1.1 Terminology

   A "string" is an ordered set of one or more characters.

   This document discusses characters that have equivalent or
   near-equivalent characters or strings.
   The "base character" is the character that has one or more
   equivalents; the "variant(s)" are the character(s) and/or string(s)
   that are equivalent to the base character. A "registration bundle"
   is the set of all labels that come from expanding all base
   characters for a single name into their variants.

   A registry is the administrative authority for a DNS zone. That is,
   the registry is the body that makes and enforces policies that are
   used in a particular zone in the DNS.

2. Language-based tables

   The registration strategy described in this document uses a table
   that lists all characters allowed for input and any variants of
   those characters. Note that the table lists all characters allowed,
   not only the ones that have variants.

   The tables are language-specific, although it is possible to create
   a single table that covers multiple languages. The following three
   sub-sections describe the use of tables.

2.1 Tables for zones that use names from a single language
    (Annexes I, III, V, VII)

   These tables are constructed without variants. They contain the
   base characters of the language used in the zone, the digits 0-9
   and the hyphen.

2.2 Table for common zones that use names from four Cyrillic languages
    (Annex IX)

   This table is constructed without variants. It contains the base
   characters of the four languages used in the zone, the digits 0-9
   and the hyphen.

2.3 Tables for zones that use names from Cyrillic languages and a base
    Latin set (Annexes II, IV, VI, VIII, X)

   These tables are constructed with variants for the overlapping
   Latin characters. They contain the base characters of the Cyrillic
   languages, the base Latin set, the digits 0-9 and the hyphen.

3. Table processing rules

   The input to the process is called the "input label". The output of
   the process is either failure (the input label cannot be registered
   at all), or a registration bundle that contains one or more labels
   that have been processed with ToASCII.

4. Table format

   Each character in the table is given in the "U+" notation for
   Unicode characters.
   The lines of the table are terminated with either a carriage return
   character (ASCII 0x0D), a linefeed character (ASCII 0x0A), or a
   sequence of carriage return followed by linefeed (ASCII 0x0D 0x0A).
   The order of the lines in the table does not matter.

   Each line in the table starts with the character that is allowed in
   the registry. If that character has any variants, the base
   character is followed by a vertical bar character ("|", ASCII 0x7C)
   and the variant string. If the base character has more than one
   variant, the variants are separated by a colon (":", ASCII 0x3A).
   Strings are given without any intervening spaces.

   The following is an example of how a table might look. The entries
   in this table are purposely silly and should not be used by any
   registry as the basis for choosing variants. For the example,
   assume that the registry:

   - allows the CYRILLIC SMALL LETTER EL character (U+043B) with no
     variants;

   - allows the CYRILLIC SMALL LETTER A character (U+0430), which has
     a single variant of LATIN SMALL LETTER A (U+0061).

   The table would look like:

   U+043B
   U+0430|U+0061

   The registry's table MUST NOT have more than one entry for a
   particular base character.

5. Steps after registering an input label

   A registry has three options for how to handle the case where the
   registration bundle has more than one label. The policy options
   are:

   1) Allocate all labels to the same registrant, making the zone
      information identical to that of the input label.

   2) Block all labels so they cannot be registered in the future.

   3) Allocate some labels and block some other labels.

   Option 1 will cause end users to be able to find names with
   variants more easily, but will result in larger zone files. For
   some language tables, the zone file could become so large that it
   could negatively affect the ability of the registry to perform name
   resolution. Option 2 does not increase the size of the zone file,
   but it may cause end users to not be able to find names with
   variants that they would expect.
   Option 3 is likely to cause the most confusion with users because
   including some variants will cause a name to be found, but using
   other variants will cause the name not to be found.

   With any of these three options, the registry MUST keep a database
   that links each label in the registration bundle to the input
   label. This link needs to be maintained so that changes in the
   non-DNS registration information (such as the label's owner name
   and address) are reflected in every member of the registration
   bundle as well.

6. References

   [IDNA]       "Internationalizing Domain Names in Applications",
                RFC 3490.
                http://www.ietf.org/rfc/rfc3490.txt

   [FRAMEWORK]  Hoffman, P., "Framework for Registering
                Internationalized Domain Names",
                draft-hoffman-idn-reg-00.txt, March 25, 2003.
                http://www.ietf.org/internet-drafts/draft-hoffman-idn-reg-00.txt

   [UNICODE]    The Unicode Consortium. The Unicode Standard, Version
                3.2.0 is defined by The Unicode Standard, Version 3.0
                (Reading, MA, Addison-Wesley, 2000. ISBN
                0-201-61633-5), as amended by the Unicode Standard
                Annex #27: Unicode 3.1
                (http://www.unicode.org/reports/tr27/) and by the
                Unicode Standard Annex #28: Unicode 3.2
                (http://www.unicode.org/reports/tr28/).

7. Author's address

   Sergey Charikov
   Cyrillic Languages Internet Names Consortium
   1812 street 2/1
   Moscow, 121170
   Russia
   s.shar@regtime.net

Annex

I. Table for zones that use names from the Russian language

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430          CYRILLIC SMALL LETTER A
   U+0431          CYRILLIC SMALL LETTER BE
   U+0432          CYRILLIC SMALL LETTER VE
   U+0433          CYRILLIC SMALL LETTER GHE
   U+0434          CYRILLIC SMALL LETTER DE
   U+0435          CYRILLIC SMALL LETTER IE
   U+0436          CYRILLIC SMALL LETTER ZHE
   U+0437          CYRILLIC SMALL LETTER ZE
   U+0438          CYRILLIC SMALL LETTER I
   U+0439          CYRILLIC SMALL LETTER SHORT I
   U+043A          CYRILLIC SMALL LETTER KA
   U+043B          CYRILLIC SMALL LETTER EL
   U+043C          CYRILLIC SMALL LETTER EM
   U+043D          CYRILLIC SMALL LETTER EN
   U+043E          CYRILLIC SMALL LETTER O
   U+043F          CYRILLIC SMALL LETTER PE
   U+0440          CYRILLIC SMALL LETTER ER
   U+0441          CYRILLIC SMALL LETTER ES
   U+0442          CYRILLIC SMALL LETTER TE
   U+0443          CYRILLIC SMALL LETTER U
   U+0444          CYRILLIC SMALL LETTER EF
   U+0445          CYRILLIC SMALL LETTER HA
   U+0446          CYRILLIC SMALL LETTER TSE
   U+0447          CYRILLIC SMALL LETTER CHE
   U+0448          CYRILLIC SMALL LETTER SHA
   U+0449          CYRILLIC SMALL LETTER SHCHA
   U+044A          CYRILLIC SMALL LETTER HARD SIGN
   U+044B          CYRILLIC SMALL LETTER YERU
   U+044C          CYRILLIC SMALL LETTER SOFT SIGN
   U+044D          CYRILLIC SMALL LETTER E
   U+044E          CYRILLIC SMALL LETTER YU
   U+044F          CYRILLIC SMALL LETTER YA
   U+0451          CYRILLIC SMALL LETTER IO

II. Table for zones that use names from the Russian language and a
    base Latin set

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430|U+0061
   U+0431
   U+0432|U+0062
   U+0433|U+0072
   U+0434
   U+0435|U+0065
   U+0436
   U+0437
   U+0438|U+0075
   U+0439
   U+043A|U+006B
   U+043B
   U+043C|U+006D
   U+043D|U+0068
   U+043E|U+006F
   U+043F|U+006E
   U+0440|U+0070
   U+0441|U+0063
   U+0442|U+0074
   U+0443|U+0079
   U+0444
   U+0445|U+0078
   U+0446
   U+0447
   U+0448
   U+0449
   U+044A
   U+044B
   U+044C
   U+044D
   U+044E
   U+044F
   U+0451

III. Table for zones that use names from the Ukrainian language

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430          CYRILLIC SMALL LETTER A
   U+0431          CYRILLIC SMALL LETTER BE
   U+0432          CYRILLIC SMALL LETTER VE
   U+0433          CYRILLIC SMALL LETTER GHE
   U+0434          CYRILLIC SMALL LETTER DE
   U+0435          CYRILLIC SMALL LETTER IE
   U+0436          CYRILLIC SMALL LETTER ZHE
   U+0437          CYRILLIC SMALL LETTER ZE
   U+0438          CYRILLIC SMALL LETTER I
   U+0439          CYRILLIC SMALL LETTER SHORT I
   U+043A          CYRILLIC SMALL LETTER KA
   U+043B          CYRILLIC SMALL LETTER EL
   U+043C          CYRILLIC SMALL LETTER EM
   U+043D          CYRILLIC SMALL LETTER EN
   U+043E          CYRILLIC SMALL LETTER O
   U+043F          CYRILLIC SMALL LETTER PE
   U+0440          CYRILLIC SMALL LETTER ER
   U+0441          CYRILLIC SMALL LETTER ES
   U+0442          CYRILLIC SMALL LETTER TE
   U+0443          CYRILLIC SMALL LETTER U
   U+0444          CYRILLIC SMALL LETTER EF
   U+0445          CYRILLIC SMALL LETTER HA
   U+0446          CYRILLIC SMALL LETTER TSE
   U+0447          CYRILLIC SMALL LETTER CHE
   U+0448          CYRILLIC SMALL LETTER SHA
   U+0449          CYRILLIC SMALL LETTER SHCHA
   U+044C          CYRILLIC SMALL LETTER SOFT SIGN
   U+044E          CYRILLIC SMALL LETTER YU
   U+044F          CYRILLIC SMALL LETTER YA
   U+0454          CYRILLIC SMALL LETTER UKRAINIAN IE
   U+0456          CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
   U+0457          CYRILLIC SMALL LETTER YI

IV. Table for zones that use names from the Ukrainian language and a
    base Latin set

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430|U+0061
   U+0431
   U+0432|U+0062
   U+0433|U+0072
   U+0434
   U+0435|U+0065
   U+0436
   U+0437
   U+0438|U+0075
   U+0439
   U+043A|U+006B
   U+043B
   U+043C|U+006D
   U+043D|U+0068
   U+043E|U+006F
   U+043F|U+006E
   U+0440|U+0070
   U+0441|U+0063
   U+0442|U+0074
   U+0443|U+0079
   U+0444
   U+0445|U+0078
   U+0446
   U+0447
   U+0448
   U+0449
   U+044C
   U+044E
   U+044F
   U+0454
   U+0456|U+0069
   U+0457

V. Table for zones that use names from the Bulgarian language

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430          CYRILLIC SMALL LETTER A
   U+0431          CYRILLIC SMALL LETTER BE
   U+0432          CYRILLIC SMALL LETTER VE
   U+0433          CYRILLIC SMALL LETTER GHE
   U+0434          CYRILLIC SMALL LETTER DE
   U+0435          CYRILLIC SMALL LETTER IE
   U+0436          CYRILLIC SMALL LETTER ZHE
   U+0437          CYRILLIC SMALL LETTER ZE
   U+0438          CYRILLIC SMALL LETTER I
   U+0439          CYRILLIC SMALL LETTER SHORT I
   U+043A          CYRILLIC SMALL LETTER KA
   U+043B          CYRILLIC SMALL LETTER EL
   U+043C          CYRILLIC SMALL LETTER EM
   U+043D          CYRILLIC SMALL LETTER EN
   U+043E          CYRILLIC SMALL LETTER O
   U+043F          CYRILLIC SMALL LETTER PE
   U+0440          CYRILLIC SMALL LETTER ER
   U+0441          CYRILLIC SMALL LETTER ES
   U+0442          CYRILLIC SMALL LETTER TE
   U+0443          CYRILLIC SMALL LETTER U
   U+0444          CYRILLIC SMALL LETTER EF
   U+0445          CYRILLIC SMALL LETTER HA
   U+0446          CYRILLIC SMALL LETTER TSE
   U+0447          CYRILLIC SMALL LETTER CHE
   U+0448          CYRILLIC SMALL LETTER SHA
   U+0449          CYRILLIC SMALL LETTER SHCHA
   U+044A          CYRILLIC SMALL LETTER HARD SIGN
   U+044C          CYRILLIC SMALL LETTER SOFT SIGN
   U+044E          CYRILLIC SMALL LETTER YU
   U+044F          CYRILLIC SMALL LETTER YA

VI. Table for zones that use names from the Bulgarian language and a
    base Latin set

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430|U+0061
   U+0431
   U+0432|U+0062
   U+0433|U+0072
   U+0434
   U+0435|U+0065
   U+0436
   U+0437
   U+0438|U+0075
   U+0439
   U+043A|U+006B
   U+043B
   U+043C|U+006D
   U+043D|U+0068
   U+043E|U+006F
   U+043F|U+006E
   U+0440|U+0070
   U+0441|U+0063
   U+0442|U+0074
   U+0443|U+0079
   U+0444
   U+0445|U+0078
   U+0446
   U+0447
   U+0448
   U+0449
   U+044A
   U+044C
   U+044E
   U+044F

VII. Table for zones that use names from the Byelorussian language

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430          CYRILLIC SMALL LETTER A
   U+0431          CYRILLIC SMALL LETTER BE
   U+0432          CYRILLIC SMALL LETTER VE
   U+0433          CYRILLIC SMALL LETTER GHE
   U+0434          CYRILLIC SMALL LETTER DE
   U+0435          CYRILLIC SMALL LETTER IE
   U+0436          CYRILLIC SMALL LETTER ZHE
   U+0437          CYRILLIC SMALL LETTER ZE
   U+0439          CYRILLIC SMALL LETTER SHORT I
   U+043A          CYRILLIC SMALL LETTER KA
   U+043B          CYRILLIC SMALL LETTER EL
   U+043C          CYRILLIC SMALL LETTER EM
   U+043D          CYRILLIC SMALL LETTER EN
   U+043E          CYRILLIC SMALL LETTER O
   U+043F          CYRILLIC SMALL LETTER PE
   U+0440          CYRILLIC SMALL LETTER ER
   U+0441          CYRILLIC SMALL LETTER ES
   U+0442          CYRILLIC SMALL LETTER TE
   U+0443          CYRILLIC SMALL LETTER U
   U+0444          CYRILLIC SMALL LETTER EF
   U+0445          CYRILLIC SMALL LETTER HA
   U+0446          CYRILLIC SMALL LETTER TSE
   U+0447          CYRILLIC SMALL LETTER CHE
   U+0448          CYRILLIC SMALL LETTER SHA
   U+0449          CYRILLIC SMALL LETTER SHCHA
   U+044C          CYRILLIC SMALL LETTER SOFT SIGN
   U+044D          CYRILLIC SMALL LETTER E
   U+044E          CYRILLIC SMALL LETTER YU
   U+044F          CYRILLIC SMALL LETTER YA
   U+0451          CYRILLIC SMALL LETTER IO
   U+0456          CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
   U+045E          CYRILLIC SMALL LETTER SHORT U

VIII. Table for zones that use names from the Byelorussian language
      and a base Latin set

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430|U+0061
   U+0431
   U+0432|U+0062
   U+0433|U+0072
   U+0434
   U+0435|U+0065
   U+0436
   U+0437
   U+0439
   U+043A|U+006B
   U+043B
   U+043C|U+006D
   U+043D|U+0068
   U+043E|U+006F
   U+043F|U+006E
   U+0440|U+0070
   U+0441|U+0063
   U+0442|U+0074
   U+0443|U+0079
   U+0444
   U+0445|U+0078
   U+0446
   U+0447
   U+0448
   U+0449
   U+044C
   U+044D
   U+044E
   U+044F
   U+0451
   U+0456|U+0069
   U+045E

IX. Table for common zones that use names from the Russian, Ukrainian,
    Bulgarian and Byelorussian languages

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430          CYRILLIC SMALL LETTER A
   U+0431          CYRILLIC SMALL LETTER BE
   U+0432          CYRILLIC SMALL LETTER VE
   U+0433          CYRILLIC SMALL LETTER GHE
   U+0434          CYRILLIC SMALL LETTER DE
   U+0435          CYRILLIC SMALL LETTER IE
   U+0436          CYRILLIC SMALL LETTER ZHE
   U+0437          CYRILLIC SMALL LETTER ZE
   U+0438          CYRILLIC SMALL LETTER I
   U+0439          CYRILLIC SMALL LETTER SHORT I
   U+043A          CYRILLIC SMALL LETTER KA
   U+043B          CYRILLIC SMALL LETTER EL
   U+043C          CYRILLIC SMALL LETTER EM
   U+043D          CYRILLIC SMALL LETTER EN
   U+043E          CYRILLIC SMALL LETTER O
   U+043F          CYRILLIC SMALL LETTER PE
   U+0440          CYRILLIC SMALL LETTER ER
   U+0441          CYRILLIC SMALL LETTER ES
   U+0442          CYRILLIC SMALL LETTER TE
   U+0443          CYRILLIC SMALL LETTER U
   U+0444          CYRILLIC SMALL LETTER EF
   U+0445          CYRILLIC SMALL LETTER HA
   U+0446          CYRILLIC SMALL LETTER TSE
   U+0447          CYRILLIC SMALL LETTER CHE
   U+0448          CYRILLIC SMALL LETTER SHA
   U+0449          CYRILLIC SMALL LETTER SHCHA
   U+044A          CYRILLIC SMALL LETTER HARD SIGN
   U+044B          CYRILLIC SMALL LETTER YERU
   U+044C          CYRILLIC SMALL LETTER SOFT SIGN
   U+044D          CYRILLIC SMALL LETTER E
   U+044E          CYRILLIC SMALL LETTER YU
   U+044F          CYRILLIC SMALL LETTER YA
   U+0451          CYRILLIC SMALL LETTER IO
   U+0454          CYRILLIC SMALL LETTER UKRAINIAN IE
   U+0456          CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
   U+0457          CYRILLIC SMALL LETTER YI
   U+045E          CYRILLIC SMALL LETTER SHORT U

X. Table for common zones that use names from the Russian, Ukrainian,
   Bulgarian and Byelorussian languages and a base Latin set

   U+002D          HYPHEN-MINUS
   U+0030..U+0039  DIGIT ZERO ..
                   DIGIT NINE
   U+0430|U+0061
   U+0431
   U+0432|U+0062
   U+0433|U+0072
   U+0434
   U+0435|U+0065
   U+0436
   U+0437
   U+0438|U+0075
   U+0439
   U+043A|U+006B
   U+043B
   U+043C|U+006D
   U+043D|U+0068
   U+043E|U+006F
   U+043F|U+006E
   U+0440|U+0070
   U+0441|U+0063
   U+0442|U+0074
   U+0443|U+0079
   U+0444
   U+0445|U+0078
   U+0446
   U+0447
   U+0448
   U+0449
   U+044A
   U+044B
   U+044C
   U+044D
   U+044E
   U+044F
   U+0451
   U+0454
   U+0456|U+0069
   U+0457
   U+045E
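
   The table format of Section 4 and the registration bundle of
   Sections 1.1 and 3 can be illustrated with a short, non-normative
   Python sketch. It is only one possible reading of the text, made
   under the following assumptions: the function names parse_table,
   expand_bundle and to_ascii are hypothetical; only lines in the
   strict Section 4 syntax are parsed (not the character names or the
   "U+0030..U+0039" range rows printed in the annexes); the bundle is
   formed by combining, for every character of the input label, the
   base character and each of its variants; and ToASCII is
   approximated with Python's built-in "idna" codec, which implements
   RFC 3490.

```python
import itertools
import re

# One or more code points in "U+" notation with no intervening spaces.
FIELD = re.compile(r"(?:U\+[0-9A-Fa-f]{4,6})+")
CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def decode_field(field):
    """Convert 'U+0430' (or a run such as 'U+0061U+0062') to a string."""
    if not FIELD.fullmatch(field):
        raise ValueError(f"malformed field: {field!r}")
    return "".join(chr(int(h, 16)) for h in CODEPOINT.findall(field))


def parse_table(text):
    """Parse a Section 4 table into {base character: [variant strings]}."""
    table = {}
    # Lines end with CR, LF, or CR LF; their order does not matter.
    for line in re.split(r"\r\n|\r|\n", text):
        line = line.strip()
        if not line:
            continue
        base_field, _, variant_field = line.partition("|")
        base = decode_field(base_field)
        if len(base) != 1:
            raise ValueError(f"base must be one character: {base_field!r}")
        if base in table:
            # The table MUST NOT have two entries for one base character.
            raise ValueError(f"duplicate base character: {base_field}")
        variants = variant_field.split(":") if variant_field else []
        table[base] = [decode_field(v) for v in variants]
    return table


def expand_bundle(input_label, table):
    """Return the registration bundle, or raise ValueError on failure."""
    if not input_label:
        raise ValueError("empty input label")
    choices = []
    for ch in input_label:
        if ch not in table:
            raise ValueError(f"U+{ord(ch):04X} is not allowed in this zone")
        # The base character itself, followed by its variants.
        choices.append([ch] + table[ch])
    return ["".join(parts) for parts in itertools.product(*choices)]


def to_ascii(label):
    """Assumed stand-in for ToASCII: Python's RFC 3490 'idna' codec."""
    return label.encode("idna").decode("ascii")


if __name__ == "__main__":
    # The example table from Section 4.
    example = parse_table("U+043B\nU+0430|U+0061\n")
    for label in expand_bundle("\u043b\u0430", example):
        print(" ".join(f"U+{ord(c):04X}" for c in label), to_ascii(label))
```

   With the example table of Section 4, the input label U+043B U+0430
   yields a bundle of two labels, one ending in CYRILLIC SMALL LETTER
   A and one ending in LATIN SMALL LETTER A. The size of the bundle is
   the product of the number of alternatives for each character, which
   is why Option 1 of Section 5 can produce large zone files.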