// http://en.wikipedia.org/wiki/UTF-16

// In computing, UTF-16 (16-bit Unicode Transformation Format) is a variable-length character encoding
// for Unicode, capable of encoding the entire Unicode repertoire. The encoding form maps code points
// (characters) into a sequence of 16-bit words, called code units. For characters in the Basic
// Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other
// planes, the encoding will result in a pair of 16-bit words, together called a surrogate pair. All possible
// code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800 through U+DFFF
// (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or
// future character assignment or use.
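// The surrogate-pair mapping described above can be sketched as follows. This is an illustrative
// helper, not part of this header: the name AppendUTF16 is hypothetical, and the sketch assumes the
// input is already a valid Unicode scalar value (at most U+10FFFF and outside U+D800 through U+DFFF).

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: append the UTF-16 code units for one code point to `out`.
// Assumes `cp` is a valid scalar value (checked only by assert in debug builds).
void AppendUTF16(char32_t cp, std::vector<char16_t>& out)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // BMP: a single 16-bit code unit equal to the code point.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Other planes: subtract 0x10000 to get a 20-bit value,
        // then split it into two 10-bit halves.
        char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
}
```

// For example, U+1F600 becomes the pair 0xD83D 0xDE00, while U+0041 stays a single unit 0x0041.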

// As many uses in computing require units of bytes (octets), there are three related encoding schemes
// which map to octet sequences instead of words: namely UTF-16, UTF-16BE, and UTF-16LE. They
// differ only in the byte order chosen to represent each 16-bit unit and in whether they use a
// Byte Order Mark. All of the schemes result in either a 2- or 4-byte sequence for any given character.
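// A sketch of how byte order and the Byte Order Mark shape the serialized bytes. SerializeUTF16 is a
// hypothetical helper, not part of this header; it takes code units as std::u16string and writes the
// BOM (U+FEFF) only when asked to.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units to bytes in big- or
// little-endian order, optionally prefixed with the Byte Order Mark U+FEFF.
std::vector<unsigned char> SerializeUTF16(const std::u16string& units, bool bigEndian, bool writeBOM)
{
    std::vector<unsigned char> bytes;
    auto put = [&](char16_t u) {
        unsigned char hi = static_cast<unsigned char>(u >> 8);
        unsigned char lo = static_cast<unsigned char>(u & 0xFF);
        if (bigEndian) { bytes.push_back(hi); bytes.push_back(lo); }
        else           { bytes.push_back(lo); bytes.push_back(hi); }
    };
    if (writeBOM)
        put(0xFEFF);  // appears as FE FF (big-endian) or FF FE (little-endian)
    for (char16_t u : units)
        put(u);
    return bytes;
}
```

// The UTF-16BE and UTF-16LE labels already fix the byte order, so text in those schemes is written
// without a BOM (writeBOM = false); the unmarked UTF-16 scheme may begin with one.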

// UTF-16 is officially defined in Annex Q of the international standard ISO/IEC 10646-1. It is also
// described in The Unicode Standard version 3.0 and higher, as well as in the IETF's RFC 2781.

// UCS-2 (2-byte Universal Character Set) is an obsolete character encoding which is a predecessor
// to UTF-16. The UCS-2 encoding form is nearly identical to that of UTF-16, except that it does not
// support surrogate pairs and therefore can only encode characters in the BMP range U+0000 through
// U+FFFF. As a consequence it is a fixed-length encoding that always encodes characters into a
// single 16-bit value. As with UTF-16, there are three related encoding schemes (UCS-2, UCS-2BE, UCS-2LE)
// that map characters to a specific byte sequence.
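// Because UCS-2 lacks surrogate pairs, encoding a code point either yields exactly one 16-bit unit
// or fails. EncodeUCS2 is a hypothetical helper that reports failure through std::optional (C++17).

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: encode one code point as UCS-2.
// Returns std::nullopt outside U+0000..U+FFFF and for the surrogate
// range U+D800..U+DFFF, which contains no characters.
std::optional<char16_t> EncodeUCS2(char32_t cp)
{
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return std::nullopt;
    return static_cast<char16_t>(cp);  // always exactly one 16-bit unit
}
```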

// Because of the technical similarities and upward compatibility from UCS-2 to UTF-16, the two
// encodings are often erroneously conflated and used as if interchangeable, so that strings encoded
// in UTF-16 are sometimes misidentified as being encoded in UCS-2.
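// One practical symptom of this conflation: code that treats UTF-16 as UCS-2 takes the number of
// 16-bit units as the number of characters. The hypothetical CountCodePoints below counts code points
// instead, treating a high/low surrogate pair as one; counting an unpaired surrogate as one code
// point is an assumed policy (a strict decoder might reject it).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-16 string.
// A UCS-2 view would report s.size(), overcounting each non-BMP character by one.
std::size_t CountCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        // A high surrogate followed by a low surrogate forms a single code point.
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < s.size()
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i;  // skip the low half
        ++count;
    }
    return count;
}
```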

//! Convert UTF-16 to UTF-8
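
// UTF-8 as originally designed allowed sequences of up to six bytes, but RFC 3629 limits it to code
// points up to U+10FFFF, which need at most four bytes. BMP characters (a single UTF-16 code unit)
// take at most three bytes; characters outside the BMP (a UTF-16 surrogate pair) take four.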

explicit NUTF8(const UNICHAR* Source);
explicit NUTF8(const std::wstring& Source);

operator const char* ();

void Convert(const UNICHAR*);
//void Convert(const t_UTF32*);
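// NUTF8 only declares its conversion here. The stand-alone sketch below shows one way to convert
// UTF-16 to UTF-8. It is not this class's implementation: Utf16ToUtf8 is a hypothetical name, it takes
// char16_t input because the width of UNICHAR is not defined in this header, and replacing unpaired
// surrogates with U+FFFD is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: UTF-16 (char16_t) -> UTF-8 (std::string).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Recombine a surrogate pair into a supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate (assumed replacement policy)
        }
        if (cp < 0x80) {                     // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {             // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {           // 3 bytes: rest of the BMP
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                             // 4 bytes: supplementary planes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

// The four output branches mirror the byte counts noted above: at most three bytes for BMP
// characters and four for characters that UTF-16 stores as a surrogate pair.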

//! Convert UTF-8 to UTF-16

explicit NUTF16(const char* Source);
explicit NUTF16(const std::string& Source);

operator const UNICHAR* ();

void Convert(const char*);
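// Likewise, a stand-alone sketch of the UTF-8 to UTF-16 direction that NUTF16 declares. Utf8ToUtf16
// is hypothetical and emits char16_t. On malformed input (an invalid lead or continuation byte, a
// truncated sequence, an overlong form, a surrogate, or a value above U+10FFFF) it emits U+FFFD and
// resumes at the next byte; that is an assumed, simplified error policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: UTF-8 (std::string) -> UTF-16 (char16_t).
std::u16string Utf8ToUtf16(const std::string& in)
{
    static const char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};  // smallest value per length
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)               { cp = b;        len = 1; }
        else if ((b >> 5) == 0x06)  { cp = b & 0x1F; len = 2; }
        else if ((b >> 4) == 0x0E)  { cp = b & 0x0F; len = 3; }
        else if ((b >> 3) == 0x1E)  { cp = b & 0x07; len = 4; }
        else { out += u'\uFFFD'; ++i; continue; }  // stray continuation or invalid lead byte

        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;  // continuation bytes must be 10xxxxxx
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogate code points, and values beyond U+10FFFF.
        if (ok && (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;
        if (!ok) { out += u'\uFFFD'; ++i; continue; }

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // BMP: one code unit
        } else {
            cp -= 0x10000;                      // other planes: surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```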