~vcs-imports/libiconv/trunk : contents of NOTES at revision 1256

Q: Why does libiconv support encoding XXX? Why does libiconv not support
   encoding ZZZ?

A: libiconv, as an internationalization library, supports those character
   sets and encodings which are in wide-spread use in at least one territory
   of the world.

   Hint1: On http://www.w3c.org/International/O-charset-lang.html you find a
   page "Languages, countries, and the charsets typically used for them".
   From this table, we can conclude that the following are in active use:

     ISO-8859-1, CP1252   Afrikaans, Albanian, Basque, Catalan, Danish, Dutch,
                          English, Faroese, Finnish, French, Galician, German,
                          Icelandic, Irish, Italian, Norwegian, Portuguese,
                          Scottish, Spanish, Swedish
     ISO-8859-2           Croatian, Czech, Hungarian, Polish, Romanian, Slovak,
                          Slovenian
     ISO-8859-3           Esperanto, Maltese
     ISO-8859-5           Bulgarian, Byelorussian, Macedonian, Russian,
                          Serbian, Ukrainian
     ISO-8859-6           Arabic
     ISO-8859-7           Greek
     ISO-8859-8           Hebrew
     ISO-8859-9, CP1254   Turkish
     ISO-8859-10          Inuit, Lapp
     ISO-8859-13          Latvian, Lithuanian
     ISO-8859-15          Estonian
     KOI8-R               Russian
     SHIFT_JIS            Japanese
     ISO-2022-JP          Japanese
     EUC-JP               Japanese

   Ordered by frequency on the web (1997):
     ISO-8859-1, CP1252   96%
     SHIFT_JIS             1.6%
     ISO-2022-JP           1.2%
     EUC-JP                0.4%
     CP1250                0.3%
     CP1251                0.2%
     CP850                 0.1%
     MACINTOSH             0.1%
     ISO-8859-5            0.1%
     ISO-8859-2            0.0%

   Hint2: The character sets mentioned in the XFree86 4.0 locale.alias file.

     ISO-8859-1           Afrikaans, Basque, Breton, Catalan, Danish, Dutch,
                          English, Estonian, Faroese, Finnish, French,
                          Galician, German, Greenlandic, Icelandic,
                          Indonesian, Irish, Italian, Lithuanian, Norwegian,
                          Occitan, Portuguese, Scottish, Spanish, Swedish,
                          Walloon, Welsh
     ISO-8859-2           Albanian, Croatian, Czech, Hungarian, Polish,
                          Romanian, Serbian, Slovak, Slovenian
     ISO-8859-3           Esperanto
     ISO-8859-4           Estonian, Latvian, Lithuanian
     ISO-8859-5           Bulgarian, Byelorussian, Macedonian, Russian,
                          Serbian, Ukrainian
     ISO-8859-6           Arabic
     ISO-8859-7           Greek
     ISO-8859-8           Hebrew
     ISO-8859-9           Turkish
     ISO-8859-14          Breton, Irish, Scottish, Welsh
     ISO-8859-15          Basque, Breton, Catalan, Danish, Dutch, Estonian,
                          Faroese, Finnish, French, Galician, German,
                          Greenlandic, Icelandic, Irish, Italian, Lithuanian,
                          Norwegian, Occitan, Portuguese, Scottish, Spanish,
                          Swedish, Walloon, Welsh
     KOI8-R               Russian
     KOI8-U               Russian, Ukrainian
     EUC-JP (alias eucJP)      Japanese
     ISO-2022-JP (alias JIS7)  Japanese
     SHIFT_JIS (alias SJIS)    Japanese
     U90                       Japanese
     S90                       Japanese
     EUC-CN (alias eucCN)      Chinese
     EUC-TW (alias eucTW)      Chinese
     BIG5                      Chinese
     EUC-KR (alias eucKR)      Korean
     ARMSCII-8                 Armenian
     GEORGIAN-ACADEMY          Georgian
     GEORGIAN-PS               Georgian
     TIS-620 (alias TACTIS)    Thai
     MULELAO-1                 Laotian
     IBM-CP1133                Laotian
     VISCII                    Vietnamese
     TCVN                      Vietnamese
     NUNACOM-8                 Inuktitut

   Hint3: The character sets supported by Netscape Communicator 4.

     Where is this documented? For the complete picture, I had to use
     "strings netscape" and then a lot of guesswork. For a quick take,
     look at the "View - Character set" menu of Netscape Communicator 4.6:

     ISO-8859-{1,2,5,7,9,15}
     WINDOWS-{1250,1251,1253}
     KOI8-R               Cyrillic
     CP866                Cyrillic
     Autodetect           Japanese  (EUC-JP, ISO-2022-JP, ISO-2022-JP-2, SJIS)
     EUC-JP               Japanese
     SHIFT_JIS            Japanese
     GB2312               Chinese
     BIG5                 Chinese
     EUC-TW               Chinese
     Autodetect           Korean    (EUC-KR, ISO-2022-KR, but not JOHAB)

     UTF-8
     UTF-7

   Hint4: The character sets supported by Microsoft Internet Explorer 4.

     ISO-8859-{1,2,3,4,5,6,7,8,9}
     WINDOWS-{1250,1251,1252,1253,1254,1255,1256,1257}
     KOI8-R               Cyrillic
     KOI8-RU              Ukrainian
     ASMO-708             Arabic
     EUC-JP               Japanese
     ISO-2022-JP          Japanese
     SHIFT_JIS            Japanese
     GB2312               Chinese
     HZ-GB-2312           Chinese
     BIG5                 Chinese
     EUC-KR               Korean
     ISO-2022-KR          Korean
     WINDOWS-874          Thai
     WINDOWS-1258         Vietnamese

     UTF-8
     UTF-7
     UNICODE             actually UNICODE-LITTLE
     UNICODEFEFF         actually UNICODE-BIG

     and various DOS character sets: DOS-720, DOS-862, IBM852, CP866.

   We take the union of all these four sets. The result is:

   European and Semitic languages
     * ASCII.
       We implement this because it is occasionally useful to know or to
       check whether some text is entirely ASCII (i.e. if the conversion
       ISO-8859-x -> UTF-8 is trivial).
     * ISO-8859-{1,2,3,4,5,6,7,8,9,10}
       We implement these because they are widely used, except ISO-8859-4,
       which appears to have been superseded by ISO-8859-13 in the Baltic
       countries. But it's an ISO standard anyway.
     * ISO-8859-13
       We implement this because it's a standard in Lithuania and Latvia.
     * ISO-8859-14
       We implement this because it's an ISO standard.
     * ISO-8859-15
       We implement this because it's increasingly used in Europe, because
       of the Euro symbol.
     * ISO-8859-16
       We implement this because it's an ISO standard.
     * KOI8-R, KOI8-U
       We implement these because they appear to be the predominant encodings
       on Unix in Russia and Ukraine, respectively.
     * KOI8-RU
       We implement this because MSIE4 supports it.
     * KOI8-T
       We implement this because it is the locale encoding in glibc's Tajik
       locale.
     * PT154
       We implement this because it is the locale encoding in glibc's Kazakh
       locale.
     * RK1048
       We implement this because it's a standard in Kazakhstan.
     * CP{1250,1251,1252,1253,1254,1255,1256,1257}
       We implement these because they are the predominant Windows encodings
       in Europe.
     * CP850
       We implement this because it is mentioned as occurring on the web
       in the aforementioned statistics.
     * CP862
       We implement this because Ron Aaron says it is sometimes used in web
       pages and emails.
     * CP866
       We implement this because Netscape Communicator does.
     * CP1131
       We implement this because it is the locale encoding of a Belarusian
       locale in FreeBSD and MacOS X.
     * Mac{Roman,CentralEurope,Croatian,Romania,Cyrillic,Greek,Turkish} and
       Mac{Hebrew,Arabic}
       We implement these because the Sun JDK does, and because Mac users
       don't deserve to be punished.
     * Macintosh
       We implement this because it is mentioned as occurring on the web
       in the aforementioned statistics.
   Japanese
     * EUC-JP, SHIFT_JIS, ISO-2022-JP
       We implement these because they are widely used. EUC-JP and SHIFT_JIS
       are used more for files, whereas ISO-2022-JP is recommended for email.
     * CP932
       We implement this because it is the Microsoft variant of SHIFT_JIS,
       used on Windows.
     * ISO-2022-JP-2
       We implement this because it's the common way to represent mails which
       make use of JIS X 0212 characters.
     * ISO-2022-JP-1
       We implement this because it's in the RFCs, but I don't think it is
       really used.
     * ISO-2022-JP-MS
       We implement this because Microsoft Outlook Express / Microsoft MimeOLE
       sends emails in this encoding.
     * U90, S90
       We DON'T implement these because I have no information about what they
       are or who uses them.
   Simplified Chinese
     * EUC-CN = GB2312
       We implement this because it is the widely used representation
       of simplified Chinese.
     * GBK
       We implement this because it appears to be used on Solaris and Windows.
     * GB18030
       We implement this because it is an official requirement in the
       People's Republic of China.
     * ISO-2022-CN
       We implement this because it is in the RFCs, but I have no idea
       whether it is really used.
     * ISO-2022-CN-EXT
       We implement this because it's in the RFCs, but I don't think it is
       really used.
     * HZ = HZ-GB-2312
       We implement this because the RFCs recommend it for Usenet postings,
       and because MSIE4 supports it.
   Traditional Chinese
     * EUC-TW
       We implement it because it appears to be used on Unix.
     * BIG5
       We implement it because it is the de-facto standard for traditional
       Chinese.
     * CP950
       We implement this because it is the Microsoft variant of BIG5, used
       on Windows.
     * BIG5+
       We DON'T implement this because it doesn't appear to be in wide use.
       Only the CWEX fonts use this encoding. Furthermore, the conversion
       tables in the big5p package are not coherent: If you convert directly,
       you get different results than when you convert via GBK.
     * BIG5-HKSCS
       We implement it because it is the de-facto standard for traditional
       Chinese in Hong Kong.
   Korean
     * EUC-KR
       We implement this because it appears to be the widely used
       representation for Korean.
     * CP949
       We implement this because it is the Microsoft variant of EUC-KR, used
       on Windows.
     * ISO-2022-KR
       We implement it because it is in the RFCs and because MSIE4 supports
       it, but I have no idea whether it's really used.
     * JOHAB
       We implement this because it is apparently used on Windows as a locale
       encoding (codepage 1361).
     * ISO-646-KR
       We DON'T implement this because although an old ASCII variant, its
       glyph for 0x7E is not clear: RFC 1345 and unicode.org's JOHAB.TXT
       say it's a tilde, but Ken Lunde's "CJKV information processing" says
       it's an overline. And it is not ISO-IR registered.
   Armenian
     * ARMSCII-8
       We implement it because XFree86 supports it.
   Georgian
     * Georgian-Academy, Georgian-PS
       We implement these because both appear to be used for Georgian;
       XFree86 supports them.
   Thai
     * ISO-8859-11, TIS-620
       We implement these because they seem to be standard for Thai.
     * CP874
       We implement this because MSIE4 supports it.
     * MacThai
       We implement this because the Sun JDK does, and because Mac users
       don't deserve to be punished.
   Laotian
     * MuleLao-1, CP1133
       We implement these because XFree86 supports them. I have no idea which
       one is used more widely.
   Vietnamese
     * VISCII, TCVN
       We implement these because XFree86 supports them.
     * CP1258
       We implement this because MSIE4 supports it.
   Other languages
     * NUNACOM-8 (Inuktitut)
       We DON'T implement this because it isn't part of Unicode yet, and
       therefore doesn't convert to anything except itself.
   Platform specifics
     * HP-ROMAN8, NEXTSTEP
       We implement these because they were the native character set on HPs
       and NeXTs for a long time, and libiconv is intended to be usable on
       these old machines.
   Full Unicode
     * UTF-8, UCS-2, UCS-4
       We implement these. Obviously.
     * UCS-2BE, UCS-2LE, UCS-4BE, UCS-4LE
       We implement these because they are the preferred internal
       representation of strings in Unicode aware applications. These are
       non-ambiguous names, known to glibc. (glibc doesn't have
       UCS-2-INTERNAL and UCS-4-INTERNAL.)
     * UTF-16, UTF-16BE, UTF-16LE
       We implement these, because UTF-16 is still the favourite encoding of
       the president of the Unicode Consortium (for political reasons), and
       because they appear in RFC 2781.
     * UTF-32, UTF-32BE, UTF-32LE
       We implement these because they are part of Unicode 3.1.
     * UTF-7
       We implement this because it is essential functionality for mail
       applications.
     * C99
       We implement it because it's used for C and C++ programs and because
       it's a nice encoding for debugging.
     * JAVA
       We implement it because it's used for Java programs and because it's
       a nice encoding for debugging.
     * UNICODE (little endian), UNICODEFEFF (big endian)
       We DON'T implement these because they are stupid and not standardized.
   Full Unicode, in terms of 'uint16_t' or 'uint32_t'
   (with machine dependent endianness and alignment)
     * UCS-2-INTERNAL, UCS-4-INTERNAL
       We implement these because they are the preferred internal
       representation of strings in Unicode aware applications.
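
   To illustrate the difference between the fixed byte orders (UCS-2BE,
   UCS-2LE) and the machine dependent one (UCS-2-INTERNAL), here is a minimal
   sketch in C. It is not code from libiconv; the function names are made up
   for this example. It stores one 16-bit unit in each byte order and shows
   that the native 'uint16_t' layout coincides with one of the two, depending
   on the host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, for illustration only (not part of libiconv). */

/* Store one UCS-2 unit in big-endian byte order, as in UCS-2BE. */
static void put_ucs2be(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit >> 8);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Store one UCS-2 unit in little-endian byte order, as in UCS-2LE. */
static void put_ucs2le(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)(unit >> 8);
}

/* Store one UCS-2 unit the way a 'uint16_t' sits in memory on this host,
   which is what a machine dependent encoding exposes. */
static void put_ucs2_native(uint16_t unit, unsigned char out[2])
{
    memcpy(out, &unit, sizeof unit);
}

/* Return 1 on a big-endian host, 0 on a little-endian host. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char bytes[2];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x01;
}
```

   For U+20AC, put_ucs2be yields the bytes 20 AC and put_ucs2le yields AC 20;
   put_ucs2_native matches whichever of the two host_is_big_endian selects.
   This is why the BE/LE names are the safe choice for data that leaves the
   process.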

Q: Support encodings mentioned in RFC 1345 ?
A: No, they are not in use any more. Supporting ISO-646 variants is pointless
   since ISO-8859-* have been adopted.

Q: Support EBCDIC ?
A: Available through --enable-extra-encodings.
   Why? Because several people (Ulrich Schwab, Calvin Buckley) have shown
   interest in these encodings, by preparing forks of GNU libiconv.

Q: How do I add a new character set?
A: 1. Explain the "why" in this file, above.
   2. You need to have a conversion table from/to Unicode. Transform it into
   the format used by the mapping tables found on ftp.unicode.org: each line
   contains the character code, in hex, with 0x prefix, then whitespace,
   then the Unicode code point, in hex, 4 hex digits, with 0x prefix. '#'
   counts as a comment delimiter until end of line.
   Please also send your table to Mark Leisher <mleisher@crl.nmsu.edu> so he
   can include it in his collection.
   3. If it's an 8-bit character set, use the '8bit_tab_to_h' program in the
   tools directory to generate the C code for the conversion. You may tweak
   the resulting C code if you are not satisfied with its quality, but this
   is rarely needed.
   If it's a two-dimensional character set (with rows and columns), use the
   'cjk_tab_to_h' program in the tools directory to generate the C code for
   the conversion. You will need to modify the main() function to recognize
   the new character set name, with the proper dimensions, but that shouldn't
   be too hard. This yields the CCS. The CES you have to write by hand.
   4. Store the resulting C code file in the lib directory. Add a #include
   directive to converters.h, and add an entry to the encodings.def file.
   5. Compile the package, and test your new encoding using a program like
   iconv(1) or clisp(1).
   6. Augment the testsuite: Add a line to tests/Makefile.in. For a stateless
   encoding, create the complete table as a TXT file. For a stateful encoding,
   provide a text snippet encoded using your new encoding and its UTF-8
   equivalent.
   7. Update the README and man/iconv_open.3, to mention the new encoding.
   Add a note in the NEWS file.
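
   As an illustration of the table format described in step 2, here is a
   minimal sketch in C of a reader for one line of such a mapping table. It is
   not the code of the programs in the tools directory; the function names and
   return conventions are assumptions made for this example. Unlike the
   description above, it accepts any number of hex digits rather than exactly
   four.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "0x" followed by hex digits at *pp.
   On success, store the value, advance *pp past the digits, return 1. */
static int parse_hex(const char **pp, unsigned long *value)
{
    const char *p = *pp;
    char *end;
    if (p[0] != '0' || (p[1] != 'x' && p[1] != 'X')
        || !isxdigit((unsigned char)p[2]))
        return 0;
    *value = strtoul(p + 2, &end, 16);
    *pp = end;
    return 1;
}

/* Hypothetical helper: parse one table line of the form
       0xA4 <whitespace> 0x20AC   # optional comment
   Returns 1 if a mapping was read, 0 if the line is blank or only a
   comment, and -1 if the line is malformed or too long. */
static int parse_mapping_line(const char *line,
                              unsigned long *code, unsigned long *ucs)
{
    char buf[256];
    const char *p;
    size_t len = strcspn(line, "#\r\n");  /* cut at comment or line end */

    if (len >= sizeof buf)
        return -1;
    memcpy(buf, line, len);
    buf[len] = '\0';

    p = buf;
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;                         /* nothing but whitespace/comment */

    if (!parse_hex(&p, code))
        return -1;
    if (!isspace((unsigned char)*p))
        return -1;                        /* whitespace separator required */
    while (isspace((unsigned char)*p))
        p++;
    if (!parse_hex(&p, ucs))
        return -1;

    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' ? 1 : -1;           /* reject trailing garbage */
}
```

   A line such as "0xA4 0x20AC # EURO SIGN" (the ISO-8859-15 entry for the
   Euro symbol) yields code 0xA4 and Unicode code point 0x20AC; a line
   consisting only of a '#' comment yields 0.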

Q: What about bidirectional text? Should it be tagged or reversed when
   converting from ISO-8859-8 or ISO-8859-6 to Unicode? Qt appears to do
   this, see qt-2.0.1/src/tools/qrtlcodec.cpp.
A: After reading RFC 1556: I don't think so. Support for ISO-8859-8-I and
   ISO-8859-8-E remains to be implemented.
   On the other hand, a page on www.w3c.org says that ISO-8859-8 in *email*
   is visually encoded, ISO-8859-8 in *HTML* is logically encoded, i.e.
   the same as ISO-8859-8-I. I'm confused.

Other character sets not implemented:
"MNEMONIC" = "csMnemonic"
"MNEM" = "csMnem"
"ISO-10646-UCS-Basic" = "csUnicodeASCII"
"ISO-10646-Unicode-Latin1" = "csUnicodeLatin1" = "ISO-10646"
"ISO-10646-J-1"
"UNICODE-1-1" = "csUnicode11"
"csWindows31Latin5"

Other aliases not implemented (and not implemented in glibc-2.1 either):
  From MSIE4:
    ISO-8859-1: alias ISO8859-1
    ISO-8859-2: alias ISO8859-2
    KSC_5601: alias KS_C_5601
    UTF-8: aliases UNICODE-1-1-UTF-8 UNICODE-2-0-UTF-8


Q: How can I integrate libiconv into my package?
A: Just copy the entire libiconv package into a subdirectory of your package.
   At configuration time, call libiconv's configure script with the
   appropriate --srcdir option and maybe --enable-static or --disable-shared.
   Then "cd libiconv && make && make install-lib libdir=... includedir=...".
   'install-lib' is a special (not GNU standardized) target which installs
   only the include file - in $(includedir) - and the library - in $(libdir) -
   and does not use other directory variables. After "installing" libiconv
   in your package's build directory, building of your package can proceed.

Q: Why is the testsuite so big?
A: Because some of the tests are very comprehensive.
   If you don't feel like using the testsuite, you can simply remove the
   tests/ directory.


1 by Bruno Haible Import from libiconv-0.3.	1	Q: Why does libiconv support encoding XXX? Why does libiconv not support
	2	encoding ZZZ?
	3
	4	A: libiconv, as an internationalization library, supports those character
	5	sets and encodings which are in wide-spread use in at least one territory
	6	of the world.
	7
63 by Bruno Haible Small update, initiated by Nerijus Baliunas.	8	Hint1: On http://www.w3c.org/International/O-charset-lang.html you find a
	9	page "Languages, countries, and the charsets typically used for them".
1 by Bruno Haible Import from libiconv-0.3.	10	From this table, we can conclude that the following are in active use:
	11
	12	ISO-8859-1, CP1252 Afrikaans, Albanian, Basque, Catalan, Danish, Dutch,
	13	English, Faroese, Finnish, French, Galician, German,
	14	Icelandic, Irish, Italian, Norwegian, Portuguese,
	15	Scottish, Spanish, Swedish
	16	ISO-8859-2 Croatian, Czech, Hungarian, Polish, Romanian, Slovak,
	17	Slovenian
	18	ISO-8859-3 Esperanto, Maltese
	19	ISO-8859-5 Bulgarian, Byelorussian, Macedonian, Russian,
	20	Serbian, Ukrainian
	21	ISO-8859-6 Arabic
	22	ISO-8859-7 Greek
	23	ISO-8859-8 Hebrew
	24	ISO-8859-9, CP1254 Turkish
63 by Bruno Haible Small update, initiated by Nerijus Baliunas.	25	ISO-8859-10 Inuit, Lapp
	26	ISO-8859-13 Latvian, Lithuanian
	27	ISO-8859-15 Estonian
1 by Bruno Haible Import from libiconv-0.3.	28	KOI8-R Russian
	29	SHIFT_JIS Japanese
	30	ISO-2022-JP Japanese
	31	EUC-JP Japanese
	32
	33	Ordered by frequency on the web (1997):
	34	ISO-8859-1, CP1252 96%
	35	SHIFT_JIS 1.6%
	36	ISO-2022-JP 1.2%
	37	EUC-JP 0.4%
	38	CP1250 0.3%
	39	CP1251 0.2%
	40	CP850 0.1%
	41	MACINTOSH 0.1%
	42	ISO-8859-5 0.1%
	43	ISO-8859-2 0.0%
	44
	45	Hint2: The character sets mentioned in the XFree86 4.0 locale.alias file.
	46
	47	ISO-8859-1 Afrikaans, Basque, Breton, Catalan, Danish, Dutch,
	48	English, Estonian, Faroese, Finnish, French,
	49	Galician, German, Greenlandic, Icelandic,
	50	Indonesian, Irish, Italian, Lithuanian, Norwegian,
	51	Occitan, Portuguese, Scottish, Spanish, Swedish,
	52	Walloon, Welsh
	53	ISO-8859-2 Albanian, Croatian, Czech, Hungarian, Polish,
	54	Romanian, Serbian, Slovak, Slovenian
	55	ISO-8859-3 Esperanto
	56	ISO-8859-4 Estonian, Latvian, Lithuanian
	57	ISO-8859-5 Bulgarian, Byelorussian, Macedonian, Russian,
	58	Serbian, Ukrainian
	59	ISO-8859-6 Arabic
	60	ISO-8859-7 Greek
	61	ISO-8859-8 Hebrew
	62	ISO-8859-9 Turkish
	63	ISO-8859-14 Breton, Irish, Scottish, Welsh
	64	ISO-8859-15 Basque, Breton, Catalan, Danish, Dutch, Estonian,
	65	Faroese, Finnish, French, Galician, German,
	66	Greenlandic, Icelandic, Irish, Italian, Lithuanian,
	67	Norwegian, Occitan, Portuguese, Scottish, Spanish,
	68	Swedish, Walloon, Welsh
	69	KOI8-R Russian
	70	KOI8-U Russian, Ukrainian
	71	EUC-JP (alias eucJP) Japanese
	72	ISO-2022-JP (alias JIS7) Japanese
	73	SHIFT_JIS (alias SJIS) Japanese
	74	U90 Japanese
	75	S90 Japanese
	76	EUC-CN (alias eucCN) Chinese
	77	EUC-TW (alias eucTW) Chinese
	78	BIG5 Chinese
	79	EUC-KR (alias eucKR) Korean
	80	ARMSCII-8 Armenian
	81	GEORGIAN-ACADEMY Georgian
	82	GEORGIAN-PS Georgian
	83	TIS-620 (alias TACTIS) Thai
	84	MULELAO-1 Laothian
	85	IBM-CP1133 Laothian
	86	VISCII Vietnamese
	87	TCVN Vietnamese
	88	NUNACOM-8 Inuktitut
	89
	90	Hint3: The character sets supported by Netscape Communicator 4.
	91
92	Where is this documented? For the complete picture, I had to use
93	"strings netscape" and then a lot of guesswork. For a quick take,
94	look at the "View - Character set" menu of Netscape Communicator 4.6:
95
96	ISO-8859-{1,2,5,7,9,15}
97	WINDOWS-{1250,1251,1253}
98	KOI8-R Cyrillic
99	CP866 Cyrillic
100	Autodetect Japanese (EUC-JP, ISO-2022-JP, ISO-2022-JP-2, SJIS)
101	EUC-JP Japanese
102	SHIFT_JIS Japanese
103	GB2312 Chinese
104	BIG5 Chinese
105	EUC-TW Chinese
106	Autodetect Korean (EUC-KR, ISO-2022-KR, but not JOHAB)
107
108	UTF-8
109	UTF-7
110
111	Hint4: The character sets supported by Microsoft Internet Explorer 4.
112
113	ISO-8859-{1,2,3,4,5,6,7,8,9}
114	WINDOWS-{1250,1251,1252,1253,1254,1255,1256,1257}
115	KOI8-R Cyrillic
116	KOI8-RU Ukrainian
117	ASMO-708 Arabic
118	EUC-JP Japanese
119	ISO-2022-JP Japanese
120	SHIFT_JIS Japanese
121	GB2312 Chinese
122	HZ-GB-2312 Chinese
123	BIG5 Chinese
124	EUC-KR Korean
125	ISO-2022-KR Korean
126	WINDOWS-874 Thai
127	WINDOWS-1258 Vietnamese
128
129	UTF-8
130	UTF-7
131	UNICODE actually UNICODE-LITTLE
132	UNICODEFEFF actually UNICODE-BIG
133
134	and various DOS character sets: DOS-720, DOS-862, IBM852, CP866.
135
136	We take the union of all these four sets. The result is:
137
138	European and Semitic languages
139	* ASCII.
140	We implement this because it is occasionally useful to know or to
141	check whether some text is entirely ASCII (i.e. if the conversion
142	ISO-8859-x -> UTF-8 is trivial).
143	* ISO-8859-{1,2,3,4,5,6,7,8,9,10}
144	We implement this because they are widely used. Except ISO-8859-4
46 by Bruno Haible Fix rationale about ISO-8859-4 and ISO-8859-13. Comments by	145	which appears to have been superseded by ISO-8859-13 in the baltic
1 by Bruno Haible Import from libiconv-0.3.	146	countries. But it's an ISO standard anyway.
46 by Bruno Haible Fix rationale about ISO-8859-4 and ISO-8859-13. Comments by	147	* ISO-8859-13
	148	We implement this because it's a standard in Lithuania and Latvia.
	149	* ISO-8859-14
1 by Bruno Haible Import from libiconv-0.3.	150	We implement this because it's an ISO standard.
	151	* ISO-8859-15
	152	We implement this because it's increasingly used in Europe, because
	153	of the Euro symbol.
3 by Bruno Haible Upgrade to libiconv-1.1.	154	* ISO-8859-16
3 by Bruno Haible Upgrade to libiconv-1.1.	155	We implement this because it's an ISO standard.
1 by Bruno Haible Import from libiconv-0.3.	156	* KOI8-R, KOI8-U
	157	We implement this because it appears to be the predominant encoding
	158	on Unix in Russia and Ukraine, respectively.
	159	* KOI8-RU
	160	We implement this because MSIE4 supports it.
301 by Bruno Haible Add KOI8-T encoding.	161	* KOI8-T
	162	We implement this because it is the locale encoding in glibc's Tajik
	163	locale.
574 by Bruno Haible Support for PT154 encoding.	164	* PT154
	165	We implement this because it is the locale encoding in glibc's Kazakh
	166	locale.
861 by Bruno Haible Add support for the Kazakh RK1048 encoding.	167	* RK1048
	168	We implement this because it's a standard in Kazakhstan.
1 by Bruno Haible Import from libiconv-0.3.	169	* CP{1250,1251,1252,1253,1254,1255,1256,1257}
	170	We implement these because they are the predominant Windows encodings
	171	in Europe.
	172	* CP850
	173	We implement this because it is mentioned as occurring in the web
	174	in the aforementioned statistics.
109 by Bruno Haible Add support for CP862.	175	* CP862
	176	We implement this because Ron Aaron says it is sometimes used in web
	177	pages and emails.
1 by Bruno Haible Import from libiconv-0.3.	178	* CP866
1 by Bruno Haible Import from libiconv-0.3.	179	We implement this because Netscape Communicator does.
989 by Bruno Haible New converter for CP1131.	180	* CP1131
	181	We implement this because it is the locale encoding of a Belorusian
	182	locale in FreeBSD and MacOS X.
1 by Bruno Haible Import from libiconv-0.3.	183	* Mac{Roman,CentralEurope,Croatian,Romania,Cyrillic,Greek,Turkish} and
	184	Mac{Hebrew,Arabic}
	185	We implement these because the Sun JDK does, and because Mac users
	186	don't deserve to be punished.
	187	* Macintosh
	188	We implement this because it is mentioned as occurring in the web
	189	in the aforementioned statistics.
	190	Japanese
309 by Bruno Haible Write Shift_JIS instead of Shift-JIS.	191	* EUC-JP, SHIFT_JIS, ISO-2022-JP
309 by Bruno Haible Write Shift_JIS instead of Shift-JIS.	192	We implement these because they are widely used. EUC-JP and SHIFT_JIS
1 by Bruno Haible Import from libiconv-0.3.	193	are more used for files, whereas ISO-2022-JP is recommended for email.
3 by Bruno Haible Upgrade to libiconv-1.1.	194	* CP932
309 by Bruno Haible Write Shift_JIS instead of Shift-JIS.	195	We implement this because it is the Microsoft variant of SHIFT_JIS,
3 by Bruno Haible Upgrade to libiconv-1.1.	196	used on Windows.
1 by Bruno Haible Import from libiconv-0.3.	197	* ISO-2022-JP-2
	198	We implement this because it's the common way to represent mails which
	199	make use of JIS X 0212 characters.
	200	* ISO-2022-JP-1
	201	We implement this because it's in the RFCs, but I don't think it is
	202	really used.
1084 by Bruno Haible New encoding ISO-2022-CP-MS.	203	* ISO-2022-JP-MS
	204	We implement this because Microsoft Outlook Express / Microsoft MimeOLE
	205	sends emails in this encoding.
1 by Bruno Haible Import from libiconv-0.3.	206	* U90, S90
	207	We DON'T implement this because I have no informations about what it
	208	is or who uses it.
	209	Simplified Chinese
	210	* EUC-CN = GB2312
	211	We implement this because it is the widely used representation
	212	of simplified Chinese.
	213	* GBK
	214	We implement this because it appears to be used on Solaris and Windows.
54 by Bruno Haible Add support for GB18030 and BIG5HKSCS.	215	* GB18030
	216	We implement this because it is an official requirement in the
	217	People's Republic of China.
1 by Bruno Haible Import from libiconv-0.3.	218	* ISO-2022-CN
	219	We implement this because it is in the RFCs, but I have no idea
	220	whether it is really used.
	221	* ISO-2022-CN-EXT
	222	We implement this because it's in the RFCs, but I don't think it is
	223	really used.
	224	* HZ = HZ-GB-2312
	225	We implement this because the RFCs recommend it for Usenet postings,
	226	and because MSIE4 supports it.
	227	Traditional Chinese
	228	* EUC-TW
	229	We implement it because it appears to be used on Unix.
	230	* BIG5
	231	We implement it because it is the de-facto standard for traditional
	232	Chinese.
	233	* CP950
	234	We implement this because it is the Microsoft variant of BIG5, used
	235	on Windows.
	236	* BIG5+
	237	We DON'T implement this because it doesn't appear to be in wide use.
	238	Only the CWEX fonts use this encoding. Furthermore, the conversion
	239	tables in the big5p package are not coherent: If you convert directly,
	240	you get different results than when you convert via GBK.
202 by Bruno Haible Rename BIG5HKSCS to BIG5-HKSCS.	241	* BIG5-HKSCS
54 by Bruno Haible Add support for GB18030 and BIG5HKSCS.	242	We implement it because it is the de-facto standard for traditional
54 by Bruno Haible Add support for GB18030 and BIG5HKSCS.	243	Chinese in Hongkong.
1 by Bruno Haible Import from libiconv-0.3.	244	Korean
3 by Bruno Haible Upgrade to libiconv-1.1.	245	* EUC-KR
1 by Bruno Haible Import from libiconv-0.3.	246	We implement these because they appear to be the widely used
1 by Bruno Haible Import from libiconv-0.3.	247	representations for Korean.
3 by Bruno Haible Upgrade to libiconv-1.1.	248	* CP949
	249	We implement this because it is the Microsoft variant of EUC-KR, used
	250	on Windows.
1 by Bruno Haible Import from libiconv-0.3.	251	* ISO-2022-KR
	252	We implement it because it is in the RFCs and because MSIE4 supports
	253	it, but I have no idea whether it's really used.
3 by Bruno Haible Upgrade to libiconv-1.1.	254	* JOHAB
68 by Bruno Haible Document JOHAB again.	255	We implement this because it is apparently used on Windows as a locale
68 by Bruno Haible Document JOHAB again.	256	encoding (codepage 1361).
3 by Bruno Haible Upgrade to libiconv-1.1.	257	* ISO-646-KR
	258	We DON'T implement this because although an old ASCII variant, its
	259	glyph for 0x7E is not clear: RFC 1345 and unicode.org's JOHAB.TXT
	260	say it's a tilde, but Ken Lunde's "CJKV information processing" says
	261	it's an overline. And it is not ISO-IR registered.
1 by Bruno Haible Import from libiconv-0.3.	262	Armenian
	263	* ARMSCII-8
	264	We implement it because XFree86 supports it.
	265	Georgian
	266	* Georgian-Academy, Georgian-PS
	267	We implement these because they appear to be both used for Georgian;
	268	Xfree86 supports them.
	269	Thai
499 by Bruno Haible New encoding ISO-8859-11.	270	* ISO-8859-11, TIS-620
499 by Bruno Haible New encoding ISO-8859-11.	271	We implement these because it seems to be standard for Thai.
1 by Bruno Haible Import from libiconv-0.3.	272	* CP874
	273	We implement this because MSIE4 supports it.
	274	* MacThai
	275	We implement this because the Sun JDK does, and because Mac users
	276	don't deserve to be punished.
	277	Laotian
	278	* MuleLao-1, CP1133
	279	We implement these because XFree86 supports them. I have no idea which
	280	one is used more widely.
	281	Vietnamese
	282	* VISCII, TCVN
	283	We implement these because XFree86 supports them.
	284	* CP1258
	285	We implement this because MSIE4 supports it.
	286	Other languages
	287	* NUNACOM-8 (Inuktitut)
	288	We DON'T implement this because it isn't part of Unicode yet, and
	289	therefore doesn't convert to anything except itself.
	290	Platform specifics
	291	* HP-ROMAN8, NEXTSTEP
	292	We implement these because they were the native character set on HPs
	293	and NeXTs for a long time, and libiconv is intended to be usable on
	294	these old machines.
	295	Full Unicode
	296	* UTF-8, UCS-2, UCS-4
	297	We implement these. Obviously.
20 by Bruno Haible Upgrade to libiconv-1.3.	298	* UCS-2BE, UCS-2LE, UCS-4BE, UCS-4LE
	299	We implement these because they are the preferred internal
	300	representation of strings in Unicode aware applications. These are
	301	non-ambiguous names, known to glibc. (glibc doesn't have
	302	UCS-2-INTERNAL and UCS-4-INTERNAL.)
13 by Bruno Haible Upgrade to libiconv-1.2.	303	* UTF-16, UTF-16BE, UTF-16LE
	304	We implement these, because UTF-16 is still the favourite encoding of
	305	the president of the Unicode Consortium (for political reasons), and
	306	because they appear in RFC 2781.
180 by Bruno Haible Add UTF-32 encodings.	307	* UTF-32, UTF-32BE, UTF-32LE
180 by Bruno Haible Add UTF-32 encodings.	308	We implement these because they are part of Unicode 3.1.
1 by Bruno Haible Import from libiconv-0.3.	309	* UTF-7
	310	We implement this because it is essential functionality for mail
	311	applications.
325 by Bruno Haible New encoding C99.	312	* C99
	313	We implement it because it's used for C and C++ programs and because
	314	it's a nice encoding for debugging.
1 by Bruno Haible Import from libiconv-0.3.	315	* JAVA
	316	We implement it because it's used for Java programs and because it's
	317	a nice encoding for debugging.
	318	* UNICODE (big endian), UNICODEFEFF (little endian)
	319	We DON'T implement these because they are stupid and not standardized.
	320	Full Unicode, in terms of 'uint16_t' or 'uint32_t'
	321	(with machine dependent endianness and alignment)
	322	* UCS-2-INTERNAL, UCS-4-INTERNAL
	323	We implement these because they are the preferred internal
	324	representation of strings in Unicode aware applications.
	325
	326	Q: Should libiconv support the encodings mentioned in RFC 1345?
	327	A: No, they are not in use any more. Supporting ISO-646 variants is pointless
	328	since ISO-8859-* have been adopted.
	329
	330	Q: Does libiconv support EBCDIC?
	331	A: Yes, through the --enable-extra-encodings configure option.
	332	Why? Because several people (Ulrich Schwab, Calvin Buckley) have shown
	333	interest in these encodings, by preparing forks of GNU libiconv.
	334
	335	Q: How do I add a new character set?
	336	A: 1. Explain the "why" in this file, above.
	337	2. You need to have a conversion table from/to Unicode. Transform it into
	338	the format used by the mapping tables found on ftp.unicode.org: each line
	339	contains the character code, in hex, with 0x prefix, then whitespace,
	340	then the Unicode code point, in hex, 4 hex digits, with 0x prefix. '#'
	341	counts as a comment delimiter until end of line.
	342	Please also send your table to Mark Leisher <mleisher@crl.nmsu.edu> so he
	343	can include it in his collection.
	344	3. If it's an 8-bit character set, use the '8bit_tab_to_h' program in the
	345	tools directory to generate the C code for the conversion. You may tweak
	346	the resulting C code if you are not satisfied with its quality, but this
	347	is rarely needed.
	348	If it's a two-dimensional character set (with rows and columns), use the
	349	'cjk_tab_to_h' program in the tools directory to generate the C code for
	350	the conversion. You will need to modify the main() function to recognize
	351	the new character set name, with the proper dimensions, but that shouldn't
	352	be too hard. This yields the CCS. The CES you have to write by hand.
	353	4. Store the resulting C code file in the lib directory. Add a #include
	354	directive to converters.h, and add an entry to the encodings.def file.
	355	5. Compile the package, and test your new encoding using a program like
	356	iconv(1) or clisp(1).
	357	6. Augment the testsuite: Add a line to tests/Makefile.in. For a stateless
	358	encoding, create the complete table as a TXT file. For a stateful encoding,
	359	provide a text snippet encoded using your new encoding and its UTF-8
	360	equivalent.
	361	7. Update the README and man/iconv_open.3, to mention the new encoding.
	362	Add a note in the NEWS file.
	363
	364	Q: What about bidirectional text? Should it be tagged or reversed when
	365	converting from ISO-8859-8 or ISO-8859-6 to Unicode? Qt appears to do
	366	this, see qt-2.0.1/src/tools/qrtlcodec.cpp.
	367	A: After reading RFC 1556: I don't think so. Support for ISO-8859-8-I and
	368	ISO-8859-E remains to be implemented.
	369	On the other hand, a page on www.w3c.org says that ISO-8859-8 in email
	370	is visually encoded, ISO-8859-8 in HTML is logically encoded, i.e.
	371	the same as ISO-8859-8-I. I'm confused.
	372
	373	Other character sets not implemented:
	374	"MNEMONIC" = "csMnemonic"
	375	"MNEM" = "csMnem"
	376	"ISO-10646-UCS-Basic" = "csUnicodeASCII"
	377	"ISO-10646-Unicode-Latin1" = "csUnicodeLatin1" = "ISO-10646"
	378	"ISO-10646-J-1"
	379	"UNICODE-1-1" = "csUnicode11"
	380	"csWindows31Latin5"
	381
	382	Other aliases not implemented (and not implemented in glibc-2.1 either):
	383	From MSIE4:
	384	ISO-8859-1: alias ISO8859-1
	385	ISO-8859-2: alias ISO8859-2
	386	KSC_5601: alias KS_C_5601
	387	UTF-8: aliases UNICODE-1-1-UTF-8 UNICODE-2-0-UTF-8
	388
	389
	390	Q: How can I integrate libiconv into my package?
	391	A: Just copy the entire libiconv package into a subdirectory of your package.
	392	At configuration time, call libiconv's configure script with the
	393	appropriate --srcdir option and maybe --enable-static or --disable-shared.
	394	Then "cd libiconv && make && make install-lib libdir=... includedir=...".
	395	'install-lib' is a special (not GNU standardized) target which installs
	396	only the include file - in $(includedir) - and the library - in $(libdir) -
	397	and does not use other directory variables. After "installing" libiconv
	398	in your package's build directory, building of your package can proceed.
	399
	400	Q: Why is the testsuite so big?
	401	A: Because some of the tests are very comprehensive.
	402	If you don't feel like using the testsuite, you can simply remove the
	403	tests/ directory.
	404