* @title: Character Set Conversion
* @short_description: Convert strings between different character sets
*
* The g_convert() family of functions wraps the functionality of iconv(). In
* addition to pure character set conversions, GLib has functions to deal
* with the extra complications of encodings for file names.
*
* <refsect2 id="file-name-encodings">
* <title>File Name Encodings</title>
* Historically, Unix has not had a defined encoding for file
* names: a file name is valid as long as it does not have path
* separators in it ("/"). However, displaying file names may
* require conversion: from the character set in which they were
* created, to the character set in which the application
* operates. Consider the Spanish file name
* "<filename>Presentación.sxi</filename>". If the
* application which created it uses ISO-8859-1 for its encoding,
* the actual file name on disk would look like this:
* <programlisting id="filename-iso8859-1">
* Character:  P  r  e  s  e  n  t  a  c  i  ó  n  .  s  x  i
* Hex code:   50 72 65 73 65 6e 74 61 63 69 f3 6e 2e 73 78 69
* </programlisting>
* However, if the application uses UTF-8, the actual file name on
* disk would look like this:
* <programlisting id="filename-utf-8">
* Character:  P  r  e  s  e  n  t  a  c  i  ó     n  .  s  x  i
* Hex code:   50 72 65 73 65 6e 74 61 63 69 c3 b3 6e 2e 73 78 69
* </programlisting>
* GLib uses UTF-8 for its strings, and GUI toolkits like GTK+
* that use GLib do the same thing. If you get a file name from
* the file system, for example, from readdir(3) or from g_dir_read_name(),
* and you wish to display the file name to the user, you
* <emphasis>will</emphasis> need to convert it into UTF-8. The
* opposite case is when the user types the name of a file they
* wish to save: the toolkit will give you that string in
* UTF-8 encoding, and you will need to convert it to the
* character set used for file names before you can create the
* file with open(2) or fopen(3).
*
* By default, GLib assumes that file names on disk are in UTF-8
* encoding. This is a valid assumption for file systems which
* were created relatively recently: most applications use UTF-8
* encoding for their strings, and that is also what they use for
* the file names they create. However, older file systems may
* still contain file names created in "older" encodings, such as
* ISO-8859-1. In this case, for compatibility reasons, you may
* want to instruct GLib to use that particular encoding for file
* names rather than UTF-8. You can do this by specifying the
* encoding for file names in the <link
* linkend="G_FILENAME_ENCODING"><envar>G_FILENAME_ENCODING</envar></link>
* environment variable. For example, if your installation uses
* ISO-8859-1 for file names, you can put this in your
* <filename>~/.profile</filename>:
* <programlisting>
* export G_FILENAME_ENCODING=ISO-8859-1
* </programlisting>
*
* GLib provides the functions g_filename_to_utf8() and
* g_filename_from_utf8() to perform the necessary conversions. These
* functions convert file names from the encoding specified in
* <envar>G_FILENAME_ENCODING</envar> to UTF-8 and vice versa.
* <xref linkend="file-name-encodings-diagram"/> illustrates how
* these functions are used to convert between UTF-8 and the
* encoding for file names in the file system.
* <figure id="file-name-encodings-diagram">
* <title>Conversion between File Name Encodings</title>
* <graphic fileref="file-name-encodings.png" format="PNG"/>
* </figure>
* <refsect3 id="file-name-encodings-checklist">
* <title>Checklist for Application Writers</title>
* This section is a practical summary of the detailed
* description above. You can use this as a checklist of
* things to do to make sure your applications process file
* name encodings correctly.
*
* If you get a file name from the file system from a function
* such as readdir(3) or gtk_file_chooser_get_filename(),
* you do not need to do any conversion to pass that
* file name to functions like open(2), rename(2), or
* fopen(3): those are "raw" file names which the file
* system understands.
*
* If you need to display a file name, convert it to UTF-8 first by
* using g_filename_to_utf8(). If conversion fails, display a string like
* "<literal>Unknown file name</literal>". <emphasis>Do not</emphasis>
* convert this string back into the encoding used for file names if you
* wish to pass it to the file system; use the original file name instead.
* For example, the document window of a word processor could display
* "Unknown file name" in its title bar but still let the user save the
* file, as it would keep the raw file name internally. This can happen
* if the user has not set the <envar>G_FILENAME_ENCODING</envar>
* environment variable even though they have files whose names are not
* encoded in UTF-8.
*
* If your user interface lets the user type a file name for saving or
* renaming, convert it to the encoding used for file names in the file
* system by using g_filename_from_utf8(). Pass the converted file name
* to functions like fopen(3). If conversion fails, ask the user to enter
* a different file name. This can happen if the user types Japanese
* characters when <envar>G_FILENAME_ENCODING</envar> is set to
* <literal>ISO-8859-1</literal>, for example.
* </refsect3>
* </refsect2>
*/

/* We try to terminate strings in unknown charsets with this many zero bytes
* to ensure that multibyte strings really are nul-terminated when we return
* them from g_convert() and friends.