<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 30. Unicode/Charsets</title><link rel="stylesheet" href="../samba.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"><link rel="home" href="index.html" title="The Official Samba 3.4.x HOWTO and Reference Guide"><link rel="up" href="optional.html" title="Part III. Advanced Configuration"><link rel="prev" href="integrate-ms-networks.html" title="Chapter 29. Integrating MS Windows Networks with Samba"><link rel="next" href="Backup.html" title="Chapter 31. Backup Techniques"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 30. Unicode/Charsets</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="integrate-ms-networks.html">Prev</a> </td><th width="60%" align="center">Part III. Advanced Configuration</th><td width="20%" align="right"> <a accesskey="n" href="Backup.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="unicode"></a>Chapter 30. 
Unicode/Charsets</h2></div><div><div class="author"><h3 class="author"><span class="firstname">Jelmer</span> <span class="othername">R.</span> <span class="orgname">The Samba Team</span> <span class="surname">Vernooij</span></h3><div class="affiliation"><span class="orgname">The Samba Team<br></span><div class="address"><p><code class="email">&lt;<a class="email" href="mailto:jelmer@samba.org">jelmer@samba.org</a>&gt;</code></p></div></div></div></div><div><div class="author"><h3 class="author"><span class="firstname">John</span> <span class="othername">H.</span> <span class="orgname">Samba Team</span> <span class="surname">Terpstra</span></h3><div class="affiliation"><span class="orgname">Samba Team<br></span><div class="address"><p><code class="email">&lt;<a class="email" href="mailto:jht@samba.org">jht@samba.org</a>&gt;</code></p></div></div></div></div><div><div class="author"><h3 class="author"><span class="firstname">TAKAHASHI</span> <span class="surname">Motonobu</span></h3><span class="contrib">Japanese character support</span> <div class="affiliation"><div class="address"><p><code class="email">&lt;<a class="email" href="mailto:monyo@home.monyo.com">monyo@home.monyo.com</a>&gt;</code></p></div></div></div></div><div><p class="pubdate">25 March 2003</p></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="unicode.html#id2669864">Features and Benefits</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2669916">What Are Charsets and Unicode?</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2670049">Samba and Charsets</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2670185">Conversion from Old Names</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2670216">Japanese Charsets</a></span></dt><dd><dl><dt><span class="sect2"><a href="unicode.html#id2670356">Basic Parameter Setting</a></span></dt><dt><span class="sect2"><a href="unicode.html#id2670996">Individual
Implementations</a></span></dt><dt><span class="sect2"><a href="unicode.html#id2671120">Migration from Samba-2.2 Series</a></span></dt></dl></dd><dt><span class="sect1"><a href="unicode.html#id2671266">Common Errors</a></span></dt><dd><dl><dt><span class="sect2"><a href="unicode.html#id2671272">CP850.so Can't Be Found</a></span></dt></dl></dd></dl></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2669864"></a>Features and Benefits</h2></div></div></div><p>
<a class="indexterm" name="id2669872"></a>
Every industry eventually matures. One of the great areas of maturation has been
the effort, over the past decade, to make it possible for anyone
anywhere to use a computer. It has not always been that way. In fact, not so long
ago, it was common for software to be written for exclusive use in the country of
origin.
</p><p>
Of all the effort that has been brought to bear on providing native
language support for all computer users, the work of the
<a class="ulink" href="http://www.openi18n.org/" target="_top">Openi18n organization</a>
deserves special mention.
</p><p>
<a class="indexterm" name="id2669900"></a>
Samba-2.x supported a single locale through a mechanism called
<span class="emphasis"><em>codepages</em></span>. Samba-3 is destined to become a truly transglobal
file- and printer-sharing platform.
</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2669916"></a>What Are Charsets and Unicode?</h2></div></div></div><p>
|
<a class="indexterm" name="id2669924"></a>
Computers communicate in numbers. In text, each number is
translated to a corresponding character. The meaning assigned
to a given number depends on the <span class="emphasis"><em>character set (charset)</em></span> that is used.
</p><p>
<a class="indexterm" name="id2669941"></a>
<a class="indexterm" name="id2669948"></a>
A charset can be seen as a table that is used to translate numbers to
characters. Not all computers use the same charset (there are charsets
with German umlauts, Japanese characters, and so on). The American Standard Code
for Information Interchange (ASCII) has long been the most common character
encoding scheme used by computers. ASCII itself defines only 128 characters (values 0 to 127);
8-bit extensions of it, such as the DOS codepages and ISO-8859-1, use the remaining
values to define charsets of up to 256 characters. In these encodings, each character takes exactly one byte.
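</p>
<p>To make the idea of a charset as a lookup table concrete, here is a minimal Python sketch. It assumes that Python's built-in codecs are faithful stand-ins for the charset tables, and decode_each is a hypothetical helper written only for this illustration. The same three byte values are decoded with two different single-byte charsets:</p>

```python
# Assumption: Python's built-in codecs stand in for the charset tables.


def decode_each(raw: bytes, charset: str) -> list:
    """Look up every byte on its own in a single-byte charset table."""
    return [bytes([value]).decode(charset) for value in raw]


sample = bytes([0x41, 0x81, 0xE9])

# 0x41 is "A" in both tables, because both are supersets of ASCII.
# 0x81 and 0xE9 lie above 127, outside ASCII, so each table is free
# to assign them a different character.
print("cp850:  ", decode_each(sample, "cp850"))    # DOS codepage 850
print("latin-1:", decode_each(sample, "latin-1"))  # ISO-8859-1
```

<p>With CP850, the bytes 0x81 and 0xE9 come out as “ü” and “Ú”; with ISO-8859-1, 0x81 is an unprintable control character and 0xE9 is “é”. Only the ASCII byte 0x41 means the same in both tables.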
|
</p><p>
<a class="indexterm" name="id2669966"></a>
<a class="indexterm" name="id2669973"></a>
There are also charsets that support extended characters, but those need more
storage space per character than ASCII encoding does. A charset that uses two bytes per
character can contain <code class="literal">256 * 256 = 65536</code> characters, which is enough for the
commonly used characters of most languages. Such charsets are called multibyte charsets because they use
more than one byte to store a character.
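</p>
<p>A short Python sketch of the storage difference, using Python's cp932 codec as an assumed stand-in for a Windows multibyte codepage (bytes_per_char is a hypothetical helper). It counts how many bytes each character of a mixed Japanese/ASCII filename takes:</p>

```python
# Assumption: Python's cp932 codec stands in for the Windows codepage.


def bytes_per_char(text: str, charset: str) -> list:
    """Return how many bytes each character of text occupies in charset."""
    return [len(ch.encode(charset)) for ch in text]


# One byte offers 256 values; two bytes offer 256 * 256 of them.
print(256, 256 * 256)

# "共有" ("share") followed by the ASCII extension ".txt".
print(bytes_per_char("共有.txt", "cp932"))
```

<p>The two Japanese characters take two bytes each, while the characters of “.txt” still take one byte each, so a multibyte charset such as CP932 does not double the size of plain ASCII text.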
|
</p><p>
<a class="indexterm" name="id2669994"></a>
One standardized multibyte charset is
<a class="ulink" href="http://www.unicode.org/" target="_top">Unicode</a>, which can be stored in several
encodings, such as UTF-8 and UTF-16. A big advantage of using a universal multibyte charset
like Unicode is that you only need one: there is no need to make sure that two
computers use the same charset when they are communicating.
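</p>
<p>The advantage of a single universal charset can be shown with a small Python sketch (can_encode is a hypothetical helper, and Python's codecs are assumed to stand in for the charsets). A string that mixes German and Japanese fits into neither legacy codepage, but encodes without loss in UTF-8:</p>

```python
# Assumption: Python's built-in codecs stand in for the charsets.


def can_encode(text: str, charset: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


name = "Grüße 共有"
for charset in ("cp850", "cp932", "utf-8"):
    print(charset, can_encode(name, charset))
```

<p>CP850 lacks the Japanese characters and CP932 lacks the German ones, whereas UTF-8 can store the whole string.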
|
</p><p>
<a class="indexterm" name="id2670015"></a>
<a class="indexterm" name="id2670022"></a>
<a class="indexterm" name="id2670028"></a>
Old Windows clients use legacy charsets, which Microsoft calls
<em class="parameter"><code>codepages</code></em> (mostly single-byte, although East Asian codepages such as CP932 use one or two bytes per character).
However, the SMB/CIFS protocol has no support for negotiating the charset to be used. Thus, you
have to make sure you are using the same charset when talking to an older client.
Newer clients (Windows NT, 200x, XP) talk Unicode over the wire.
|
</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2670049"></a>Samba and Charsets</h2></div></div></div><p>
<a class="indexterm" name="id2670057"></a>
<a class="indexterm" name="id2670064"></a>
As of Samba-3, Samba can (and will) talk Unicode over the wire. Internally,
Samba knows of three kinds of character sets:
</p><div class="variablelist"><dl><dt><span class="term"><a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a></span></dt><dd><p>
<a class="indexterm" name="id2670096"></a>
<a class="indexterm" name="id2670102"></a>
This is the charset used internally by your operating system.
The default is <code class="constant">UTF-8</code>, which is fine for most
systems and covers all characters in all languages. The default
in previous Samba releases was to save filenames in the encoding of the
clients: for example, CP850 for Western European countries.
</p></dd><dt><span class="term"><a class="link" href="smb.conf.5.html#DISPLAYCHARSET" target="_top">display charset</a></span></dt><dd><p>This is the charset Samba uses to print messages
on your screen. It should generally be the same as the <em class="parameter"><code>unix charset</code></em>.
</p></dd><dt><span class="term"><a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a></span></dt><dd><p>This is the charset Samba uses when communicating with
DOS and Windows 9x/Me clients. Samba talks Unicode to all newer clients.
The default depends on the charsets you have installed on your system.
Run <code class="literal">testparm -sv | grep "dos charset"</code> to see
what the default is on your system.
</p></dd></dl></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2670185"></a>Conversion from Old Names</h2></div></div></div><p>
<a class="indexterm" name="id2670193"></a>
Because previous Samba versions did not do any charset conversion,
characters in filenames are usually not correct in the UNIX charset, but only
in the local charset used by the DOS/Windows clients.
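</p>
<p>The following Python sketch illustrates the problem and the kind of conversion convmv performs. It assumes that the old files were written for CP850 clients and that the new <em class="parameter"><code>unix charset</code></em> is UTF-8; reencode_name is a hypothetical helper, not convmv's actual code:</p>

```python
# Assumptions: old names are CP850, the new unix charset is UTF-8,
# and Python's codecs stand in for iconv().


def reencode_name(raw: bytes, old_charset: str, new_charset: str) -> bytes:
    """Reinterpret a filename stored in old_charset and re-encode it."""
    return raw.decode(old_charset).encode(new_charset)


# "über.txt" as an old Samba server stored it for a CP850 client:
# the single byte 0x81 stands for "ü".
old_name = "über.txt".encode("cp850")

# Read as UTF-8, the lone byte 0x81 is invalid, so the name looks broken.
print(old_name.decode("utf-8", errors="replace"))

# After conversion, "ü" becomes the two UTF-8 bytes 0xc3 0xbc.
print(reencode_name(old_name, "cp850", "utf-8"))
```

<p>Python marks the undecodable byte 0x81 with a replacement character; other UTF-8 tools show such a byte in their own way, but never as “ü”. convmv applies this kind of conversion to whole directory trees, so it is the tool to use in practice.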
|
</p><p>Bjoern Jacke has written a utility named <a class="ulink" href="http://j3e.de/linux/convmv/" target="_top">convmv</a>
that can convert whole directory structures to different charsets with a single command.
</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2670216"></a>Japanese Charsets</h2></div></div></div><p>
Setting up Japanese charsets is quite difficult, mainly for the following reasons:
|
</p><div class="itemizedlist"><ul type="disc"><li><p>
<a class="indexterm" name="id2670232"></a>
The Windows character set is extended from the original legacy Japanese
standard (JIS X 0208) and is not standardized. This means that a strictly
standardized implementation cannot support the full Windows character set.
</p></li><li><p>
<a class="indexterm" name="id2670247"></a>
<a class="indexterm" name="id2670254"></a>
<a class="indexterm" name="id2670260"></a>
<a class="indexterm" name="id2670267"></a>
<a class="indexterm" name="id2670274"></a>
Mainly for historical reasons, there are several encoding methods for
Japanese, which are not fully compatible with each other. There are
two major encoding methods. One is the Shift_JIS series, used in Windows
and some UNIX systems. The other is the EUC-JP series, used in most UNIX systems
and Linux. Moreover, Samba previously also offered its own encoding
methods, named CAP and HEX, to keep interoperability with CAP/NetAtalk and with
UNIX systems that cannot use Japanese filenames. Some implementations of the
EUC-JP series cannot support the full Windows character set.
</p></li><li><p>There are several code conversion tables between Unicode and legacy
Japanese character sets. One is compatible with Windows, another one
is based on the reference published by the Unicode consortium, and others are
mixed implementations. The Unicode consortium does not officially
define any conversion tables between Unicode and legacy character
sets, so there cannot be a single standard one.
</p></li><li><p>The character sets and conversion tables available in iconv() depend
on the iconv library that is available. In addition, the Japanese locale
names may differ from system to system. This means that the values of
the charset parameters depend on the implementation of iconv() you are using.
</p><p>
<a class="indexterm" name="id2670324"></a>
<a class="indexterm" name="id2670330"></a>
<a class="indexterm" name="id2670337"></a>
<a class="indexterm" name="id2670344"></a>
Although Windows uses the fixed 2-byte UCS-2 encoding internally,
Shift_JIS series encoding is usually used in Japanese environments,
just as ASCII encoding is in English environments.
</p></li></ul></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2670356"></a>Basic Parameter Setting</h3></div></div></div><p>
|
<a class="indexterm" name="id2670363"></a>
The <a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a> and
<a class="link" href="smb.conf.5.html#DISPLAYCHARSET" target="_top">display charset</a>
should be set to the locale compatible with the character set
and encoding method used on Windows. This is usually CP932
but sometimes has a different name.
|
</p><p>
<a class="indexterm" name="id2670400"></a>
<a class="indexterm" name="id2670406"></a>
<a class="indexterm" name="id2670413"></a>
The <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a> can be the Shift_JIS series,
the EUC-JP series, or UTF-8. UTF-8 is always available, but the availability of the other locales,
and their names, depend on the system.
</p><p>
Additionally, you can use the Shift_JIS series as the
value of the <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a>
parameter together with the vfs_cap module, which does the same thing as
setting “<span class="quote">coding system = CAP</span>” in the Samba 2.2 series.
</p><p>
Which value to choose for <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a>
is a difficult question. Here is a list of details, advantages, and
disadvantages of each possible value.
</p><div class="variablelist"><dl><dt><span class="term">Shift_JIS series</span></dt><dd><p>
Shift_JIS series means a locale that is equivalent to <code class="constant">Shift_JIS</code>,
used as a standard on Japanese Windows. In the case of <code class="constant">Shift_JIS</code>,
for example, if a Japanese filename consisting of 0x8ba4 and 0x974c
(a 4-byte Japanese character string meaning “<span class="quote">share</span>”) and “<span class="quote">.txt</span>”
is written from Windows on Samba, the filename on UNIX becomes
0x8ba4, 0x974c, “<span class="quote">.txt</span>” (an 8-byte BINARY string), the same as on Windows.
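</p>
<p>The byte sequence in this example can be reproduced with a short Python sketch, assuming Python's <code class="constant">cp932</code> codec as a stand-in for the Shift_JIS series locale. With the Shift_JIS series as unix charset, the bytes stored on UNIX are the ones the Windows client uses:</p>

```python
# Assumption: Python's cp932 codec stands in for the Shift_JIS series locale.
name = "共有.txt"  # "share" followed by ".txt"

# With a Shift_JIS series unix charset, the bytes stored on UNIX are the
# same CP932 bytes the Windows client uses.
on_disk = name.encode("cp932")
print(on_disk.hex(" "))
print(len(on_disk), "bytes")
```

<p>The last byte of “有”, 0x4c, happens to be the ASCII code of the letter “L”, which is why it appears unchanged in the CAP encoding example further below.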
|
</p><p>The Shift_JIS series is usually used as the Japanese locale on some commercial
UNIX systems, such as HP-UX and AIX (although the EUC-JP locale series can also be used
there). If you use the Shift_JIS series on these platforms,
Japanese filenames created from Windows can also be read on
UNIX.</p><p>
If your UNIX system already works with Shift_JIS and there is a user
who needs to use Japanese filenames written from Windows, the
Shift_JIS series is the best choice. However, broken filenames
may be displayed, and some commands that cannot handle non-ASCII
filenames may abort while parsing them. In particular, a filename may
contain the byte “<span class="quote">\ (0x5c)</span>” as the second byte of a double-byte character, which needs to be handled carefully.
It is best not to touch filenames written from Windows on UNIX.
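</p>
<p>The “<span class="quote">\ (0x5c)</span>” problem comes from the second byte of some double-byte Shift_JIS characters falling into the ASCII range. A minimal Python sketch, assuming Python's <code class="constant">cp932</code> and <code class="constant">euc_jp</code> codecs as stand-ins for the locales; naive_split is a hypothetical helper that mimics a byte-oriented tool treating 0x5c as a separator:</p>

```python
# Assumption: Python's cp932 and euc_jp codecs stand in for the locales.


def naive_split(raw: bytes) -> list:
    """Split a byte string at every 0x5c, as a byte-oriented tool might."""
    return raw.split(b"\x5c")


# "表" ("table") is 0x95 0x5c in Shift_JIS: its second byte is a backslash.
sjis_name = "表.txt".encode("cp932")
print(sjis_name, naive_split(sjis_name))

# In EUC-JP every byte of a Japanese character is 0x80 or above,
# so no stray backslash can appear.
eucjp_name = "表.txt".encode("euc_jp")
print(eucjp_name, naive_split(eucjp_name))
```

<p>The Shift_JIS name is cut in two at the second byte of “表”, while the EUC-JP name stays intact.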
|
</p><p>
Note that most Japanized free software actually works with EUC-JP
only. It is good practice to verify that the Japanized free software you use can work
with Shift_JIS.
</p></dd><dt><span class="term">EUC-JP series</span></dt><dd><p>
<a class="indexterm" name="id2670547"></a>
<a class="indexterm" name="id2670554"></a>
EUC-JP series means a locale that is equivalent to the industry
standard called EUC-JP, widely used in Japanese UNIX (although EUC
also covers languages other than Japanese, for example EUC-KR).
In the case of the EUC-JP series, for example, if a Japanese
filename consisting of 0x8ba4 and 0x974c and “<span class="quote">.txt</span>” is written from
Windows on Samba, the filename on UNIX becomes 0xb6a6, 0xcdad,
“<span class="quote">.txt</span>” (an 8-byte BINARY string).
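</p>
<p>The EUC-JP bytes in this example follow from the Shift_JIS bytes by simple arithmetic, because both encode the same JIS X 0208 code. A minimal Python sketch of that textbook mapping; sjis_pair_to_eucjp is a hypothetical helper, and real converters such as iconv() use lookup tables that also cover extensions this sketch ignores:</p>

```python
# Assumption: characters are from JIS X 0208; Python's codecs are used
# only to produce input bytes and to cross-check the result.


def sjis_pair_to_eucjp(s1: int, s2: int) -> bytes:
    """Map one double-byte Shift_JIS character to EUC-JP arithmetically.

    Covers JIS X 0208 characters only (lead bytes 0x81-0x9F and 0xE0-0xEF);
    vendor extensions such as those in CP932 are not handled.
    """
    if s1 >= 0xE0:
        s1 -= 0x40  # skip the lead-byte gap used by half-width katakana
    row = (s1 - 0x70) * 2 - 1  # each lead byte covers two JIS rows
    if s2 >= 0x9F:
        row += 1  # trail bytes 0x9F-0xFC belong to the even row
        cell = s2 - 0x7E
    elif s2 >= 0x80:
        cell = s2 - 0x20  # trail byte 0x7F is unused, so shift by one more
    else:
        cell = s2 - 0x1F
    # EUC-JP is the 7-bit JIS code with the high bit set on both bytes.
    return bytes([row | 0x80, cell | 0x80])


windows_bytes = "共有".encode("cp932")  # 0x8b 0xa4 0x97 0x4c
converted = b"".join(
    sjis_pair_to_eucjp(windows_bytes[i], windows_bytes[i + 1])
    for i in range(0, len(windows_bytes), 2)
)
print(converted.hex(" "))
print(converted == "共有".encode("euc_jp"))
```

<p>For the example filename this yields 0xb6a6 and 0xcdad, the bytes given above, and the last line cross-checks the result against Python's own euc_jp codec.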
|
</p><p>
<a class="indexterm" name="id2670579"></a>
<a class="indexterm" name="id2670585"></a>
<a class="indexterm" name="id2670592"></a>
<a class="indexterm" name="id2670599"></a>
<a class="indexterm" name="id2670606"></a>
<a class="indexterm" name="id2670613"></a>
<a class="indexterm" name="id2670619"></a>
<a class="indexterm" name="id2670626"></a>
<a class="indexterm" name="id2670633"></a>
<a class="indexterm" name="id2670640"></a>
EUC-JP is usually used as the Japanese locale on open source UNIX systems such as Linux and FreeBSD, and on commercial UNIX systems such as Solaris,
IRIX, and Tru64 UNIX (although Solaris can also use Shift_JIS and UTF-8,
and Tru64 UNIX can also use Shift_JIS). If you use the EUC-JP series, most Japanese filenames created from
Windows can also be read on UNIX. Also, most Japanized free software works mainly with EUC-JP only.
</p><p>
It is recommended to choose the EUC-JP series when using Japanese filenames on UNIX.
</p><p>
Although there is no character that needs to be treated as carefully
as “<span class="quote">\ (0x5c)</span>”, broken filenames may be displayed and some
commands that cannot handle non-ASCII filenames may abort
while parsing them.
</p><p>
<a class="indexterm" name="id2670673"></a>
Moreover, if you built Samba against a separately installed libiconv,
the eucJP-ms locale included in libiconv and the EUC-JP series locale
included in the operating system may not be compatible. In this case, you may need to
avoid using incompatible characters in filenames.
</p></dd><dt><span class="term">UTF-8</span></dt><dd><p>
UTF-8 means a locale equivalent to UTF-8, the international standard defined by the Unicode consortium. In
UTF-8, a <em class="parameter"><code>character</code></em> is expressed using 1 to 4 bytes; characters of the Basic Multilingual Plane,
which include the common Japanese characters, take at most 3 bytes. In the case of the Japanese language,
most characters are expressed using 3 bytes. Because Windows uses Shift_JIS, in which most Japanese characters
take 2 bytes, the Japanese part of a UTF-8 filename is typically 1.5 times as long as the original
Shift_JIS string. In the case of UTF-8, for example, if a Japanese
filename consisting of 0x8ba4 and 0x974c and “<span class="quote">.txt</span>” is written from Windows on Samba, the filename
on UNIX becomes 0xe585b1, 0xe69c89, “<span class="quote">.txt</span>” (a 10-byte BINARY string).
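</p>
<p>The 1.5 ratio can be checked with a small Python sketch, assuming Python's <code class="constant">cp932</code> codec as a stand-in for Windows Shift_JIS; the numbers apply to this example filename only:</p>

```python
# Assumption: Python's cp932 codec stands in for Windows Shift_JIS.
name = "共有.txt"

sjis = name.encode("cp932")
utf8 = name.encode("utf-8")
print("Shift_JIS:", sjis.hex(" "), len(sjis), "bytes")
print("UTF-8:    ", utf8.hex(" "), len(utf8), "bytes")

# Compare only the Japanese part, "共有": 3 bytes per character in UTF-8
# versus 2 bytes per character in Shift_JIS.
japanese = name[:2]
ratio = len(japanese.encode("utf-8")) / len(japanese.encode("cp932"))
print("ratio:", ratio)
```

<p>The complete name grows only from 8 to 10 bytes, because “.txt” stays at 4 bytes; the factor 1.5 applies to the Japanese part.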
|
</p><p>
For systems where iconv() is not available or where iconv()'s locales
are not compatible with Windows, UTF-8 is the only locale available.
</p><p>
At the time this section was written, no system used UTF-8 as its default locale for Japanese.
</p><p>
Some broken filenames may be displayed, and some commands that
cannot handle non-ASCII filenames may abort while parsing
them. In particular, filenames may contain “<span class="quote">\ (0x5c)</span>”, which
must be handled carefully, so it is best not to touch filenames
written from Windows on UNIX.
</p><p>
<a class="indexterm" name="id2670746"></a>
<a class="indexterm" name="id2670753"></a>
<a class="indexterm" name="id2670760"></a>
In addition, although this does not concern Samba directly, there are
subtle differences between the iconv() function, which is
generally used on UNIX, and the conversion functions used on other platforms,
such as Windows and Java. Conversion between
Shift_JIS and UTF-8 must therefore be done with care and with an understanding
of the limitations involved in the process.
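</p>
<p>A well-known instance of such a difference is the Shift_JIS byte pair 0x8160. Python happens to ship two conversion tables that disagree about it, so a short Python sketch can show the effect. Python's codecs are assumed here only as stand-ins for the JIS-based and Windows tables; what a particular iconv() does depends on its version and patches:</p>

```python
# Assumption: Python's shift_jis and cp932 codecs stand in for the
# JIS-based and the Windows conversion tables, respectively.
import unicodedata

raw = b"\x81\x60"  # one double-byte character in Shift_JIS / CP932

decoded = {charset: raw.decode(charset) for charset in ("shift_jis", "cp932")}
for charset, ch in decoded.items():
    print(charset, "-> U+%04X" % ord(ch), unicodedata.name(ch))

# The two tables disagree, so a name converted through one table and
# read back through the other no longer compares equal.
print(decoded["shift_jis"] == decoded["cp932"])
```

<p>The JIS-based table yields WAVE DASH (U+301C) and the Windows table yields FULLWIDTH TILDE (U+FF5E), so the same filename bytes turn into two different Unicode characters.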
|
</p><p>
<a class="indexterm" name="id2670777"></a>
Although Mac OS X uses UTF-8 as its encoding method for filenames,
it uses an extended UTF-8 specification that Samba cannot handle, so
the UTF-8 locale is not available for Mac OS X.
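</p>
<p>In practice, the extended form used by Mac OS X is a decomposed one: it stores filenames in a variant of Unicode Normalization Form D (NFD), whereas Windows clients normally send precomposed characters. A minimal Python sketch with the standard unicodedata module, using plain NFD as an approximation of the Mac OS X variant:</p>

```python
# Assumption: plain NFD approximates the Mac OS X filename form.
import unicodedata

composed = "が"  # precomposed, as Windows clients normally send it (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # "か" + combining mark

print(composed.encode("utf-8").hex(" "))    # one 3-byte sequence
print(decomposed.encode("utf-8").hex(" "))  # two 3-byte sequences
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

<p>A byte-wise comparison treats the two forms as different names unless they are normalized first.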
|
</p></dd><dt><span class="term">Shift_JIS series + vfs_cap (CAP encoding)</span></dt><dd><p>
<a class="indexterm" name="id2670798"></a>
<a class="indexterm" name="id2670804"></a>
<a class="indexterm" name="id2670811"></a>
CAP encoding means a specification used in CAP and NetAtalk, file
server software for Macintosh. In the case of CAP encoding, for
example, if a Japanese filename consisting of 0x8ba4 and 0x974c and
“<span class="quote">.txt</span>” is written from Windows on Samba, the filename on UNIX
becomes “<span class="quote">:8b:a4:97L.txt</span>” (a 14-byte ASCII string).
</p><p>
For CAP encoding, a byte that cannot be expressed as an ASCII
character (0x80 or above) is encoded in an “<span class="quote">:xx</span>” form, while ASCII bytes are kept as they are
(0x4c in the example is the letter L). You still need to take
care with filenames that contain “<span class="quote">\ (0x5c)</span>”, but filenames are not
broken on a system that cannot handle non-ASCII filenames.
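</p>
<p>The CAP rule is simple enough to sketch in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the vfs_cap source: the function names are hypothetical, lowercase hex digits are assumed to match the example, and the escaping of a literal “:” in a filename is not modeled.</p>

```python
# Assumptions: lowercase hex as in the example; literal ":" not escaped.
import re


def cap_encode(raw: bytes) -> str:
    """Write every byte of 0x80 or above as ':xx'; keep ASCII bytes as is."""
    return "".join(":%02x" % b if b >= 0x80 else chr(b) for b in raw)


def cap_decode(name: str) -> bytes:
    """Turn every ':xx' back into the byte it stands for."""
    hex_byte = re.compile(r":([0-9a-fA-F]{2})")
    # Each match becomes the single character with that code point;
    # latin-1 maps such characters one-to-one back to bytes.
    text = hex_byte.sub(lambda m: chr(int(m.group(1), 16)), name)
    return text.encode("latin-1")


windows_bytes = "共有.txt".encode("cp932")  # 0x8b 0xa4 0x97 0x4c ".txt"
cap_name = cap_encode(windows_bytes)
print(cap_name, len(cap_name), "bytes")
print(cap_decode(cap_name) == windows_bytes)
```

<p>The encoded name is pure ASCII and 14 bytes long, as in the example above, and decoding it restores the original CP932 bytes.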
|
</p><p>
The greatest merit of CAP encoding is that it encodes
filenames compatibly with CAP and NetAtalk, which are respectively the Columbia AppleTalk
Package and the NetAtalk Open Source software project.
Since these applications write filenames on UNIX with CAP encoding, if a
directory is shared with both Samba and NetAtalk, you need to use
CAP encoding to prevent non-ASCII filenames from being broken.
</p><p>
However, NetAtalk has recently been patched on some systems (for example,
the Japanese Vine Linux distribution) to write filenames in EUC-JP.
In this case, you need to choose the EUC-JP series instead of CAP encoding.
</p><p>
vfs_cap itself can also be used with locales other than the Shift_JIS series, on
systems that cannot handle non-ASCII characters or systems that
share files with NetAtalk.
</p><p>
To use CAP encoding on Samba-3, set the unix charset parameter and load the cap VFS module,
as in <a class="link" href="unicode.html#vfscap-intl" title="Example 30.1. VFS CAP">the VFS CAP smb.conf file</a>.
|
</p><div class="example"><a name="vfscap-intl"></a><p class="title"><b>Example 30.1. VFS CAP</b></p><div class="example-contents"><table class="simplelist" border="0" summary="Simple list"><tr><td> </td></tr><tr><td><em class="parameter"><code>[global]</code></em></td></tr><tr><td># the locale name "CP932" may be different</td></tr><tr><td><a class="indexterm" name="id2670910"></a><em class="parameter"><code>dos charset = CP932</code></em></td></tr><tr><td><a class="indexterm" name="id2670922"></a><em class="parameter"><code>unix charset = CP932</code></em></td></tr><tr><td> </td></tr><tr><td><em class="parameter"><code>[cap-share]</code></em></td></tr><tr><td><a class="indexterm" name="id2670943"></a><em class="parameter"><code>vfs objects = cap</code></em></td></tr></table></div></div><br class="example-break"><p>
|
<a class="indexterm" name="id2670958"></a>
<a class="indexterm" name="id2670964"></a>
<a class="indexterm" name="id2670971"></a>
<a class="indexterm" name="id2670978"></a>
If you use GNU libiconv, set unix charset to CP932. With this setting,
filenames in the “<span class="quote">cap-share</span>” share are written with CAP encoding.
</p></dd></dl></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2670996"></a>Individual Implementations</h3></div></div></div><p>
Here is some additional information regarding individual implementations:
</p><div class="variablelist"><dl><dt><span class="term">GNU libiconv</span></dt><dd><p>
To handle Japanese correctly, you should apply the patch
<a class="ulink" href="http://www2d.biglobe.ne.jp/~msyk/software/libiconv-patch.html" target="_top">libiconv-1.8-cp932-patch.diff.gz</a>
to libiconv-1.8.
</p><p>
Using the patched libiconv-1.8, these settings are available:
</p><pre class="programlisting">
dos charset     = CP932
unix charset    = CP932 / eucJP-ms / UTF-8
                  |       |
                  |       +-- EUC-JP series
                  +-- Shift_JIS series
display charset = CP932
</pre><p>
Other Japanese locales (for example, Shift_JIS and EUC-JP) should not
be used because of their lack of compatibility with Windows.
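</p>
<p>The compatibility problem shows up as soon as a Windows-specific character has to be converted. A minimal Python sketch, assuming Python's <code class="constant">shift_jis</code> and <code class="constant">euc_jp</code> codecs as stand-ins for plain Shift_JIS and EUC-JP locales (eucJP-ms and patched iconv tables are not modeled; convert_name is a hypothetical helper):</p>

```python
# Assumption: Python's codecs stand in for the iconv() locales;
# eucJP-ms is not available in Python and is not modeled.
from typing import Optional


def convert_name(client_bytes: bytes, unix_charset: str) -> Optional[bytes]:
    """Convert a CP932 filename to unix_charset, or return None on failure."""
    try:
        return client_bytes.decode("cp932").encode(unix_charset)
    except UnicodeEncodeError:
        return None


# "①" (circled digit one) is a Windows (NEC) extension that is not part
# of plain Shift_JIS or EUC-JP.
from_windows = "①.txt".encode("cp932")

for unix_charset in ("shift_jis", "euc_jp", "cp932", "utf-8"):
    print(unix_charset, convert_name(from_windows, unix_charset))
```

<p>Only the CP932 and UTF-8 targets can hold the character in this sketch, which matches the advice above to avoid the plain Shift_JIS and EUC-JP locales.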
|
</p></dd><dt><span class="term">GNU glibc</span></dt><dd><p>
To handle Japanese correctly, you should apply a <a class="ulink" href="http://www2d.biglobe.ne.jp/~msyk/software/glibc/" target="_top">patch</a>
to glibc-2.2.5/2.3.1/2.3.2 or use the patch-merged versions, glibc-2.3.3 or later.
</p><p>
Using the above glibc, these settings are available:
</p><table class="simplelist" border="0" summary="Simple list"><tr><td><a class="indexterm" name="id2671073"></a><em class="parameter"><code>dos charset = CP932</code></em></td></tr><tr><td><a class="indexterm" name="id2671085"></a><em class="parameter"><code>unix charset = CP932 / eucJP-ms / UTF-8</code></em></td></tr><tr><td><a class="indexterm" name="id2671097"></a><em class="parameter"><code>display charset = CP932</code></em></td></tr></table><p>
Other Japanese locales (for example, Shift_JIS and EUC-JP) should not
be used because of their lack of compatibility with Windows.
|
</p></dd></dl></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2671120"></a>Migration from Samba-2.2 Series</h3></div></div></div><p>
Up to and including the Samba-2.2 series, the “<span class="quote">coding system</span>” parameter was used. The default codepage in Samba
2.x was code page 850. In the Samba-3 series, this has been replaced by the <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a> parameter. <a class="link" href="unicode.html#japancharsets" title="Table 30.1. Japanese Character Sets in Samba-2.2 and Samba-3">Japanese Character Sets in Samba-2.2 and Samba-3</a>
shows the mapping table when migrating from the Samba-2.2 series to Samba-3.
</p><div class="table"><a name="japancharsets"></a><p class="title"><b>Table 30.1. Japanese Character Sets in Samba-2.2 and Samba-3</b></p><div class="table-contents"><table summary="Japanese Character Sets in Samba-2.2 and Samba-3" border="1"><colgroup><col align="center"><col align="center"></colgroup><thead><tr><th align="center">Samba-2.2 Coding System</th><th align="center">Samba-3 unix charset</th></tr></thead><tbody><tr><td align="center">SJIS</td><td align="center">Shift_JIS series</td></tr><tr><td align="center">EUC</td><td align="center">EUC-JP series</td></tr><tr><td align="center">EUC3<sup>[<a name="id2671215" href="#ftn.id2671215" class="footnote">a</a>]</sup></td><td align="center">EUC-JP series</td></tr><tr><td align="center">CAP</td><td align="center">Shift_JIS series + VFS</td></tr><tr><td align="center">HEX</td><td align="center">currently none</td></tr><tr><td align="center">UTF8</td><td align="center">UTF-8</td></tr><tr><td align="center">UTF8-Mac<sup>[<a name="id2671246" href="#ftn.id2671246" class="footnote">b</a>]</sup></td><td align="center">currently none</td></tr><tr><td align="center">others</td><td align="center">none</td></tr></tbody><tbody class="footnotes"><tr><td colspan="2"><div class="footnote"><p><sup>[<a name="ftn.id2671215" href="#id2671215" class="para">a</a>] </sup>Only exists in Japanese Samba version</p></div><div class="footnote"><p><sup>[<a name="ftn.id2671246" href="#id2671246" class="para">b</a>] </sup>Only exists in Japanese Samba version</p></div></td></tr></tbody></table></div></div><br class="table-break"></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2671266"></a>Common Errors</h2></div></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2671272"></a>CP850.so Can't Be Found</h3></div></div></div><p>“<span class="quote">Samba is complaining about a missing <code 
class="filename">CP850.so</code> file.</span>”</p><p>
CP850 is the default <a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a>.
The <a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a> is used to convert data to the codepage used by your DOS clients.
If you do not have any DOS clients, you can safely ignore this message. </p><p>
CP850 should be supported by your local iconv implementation. Make sure you have all the required packages installed.
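</p>
<p>One way to check whether the C library's iconv() knows CP850 at all is to ask iconv_open() directly. A minimal Python sketch using ctypes, assuming a Linux system where iconv_open() is provided by the C library already loaded into the Python process; on other systems, or if Samba was built against a separate libiconv, the answer may differ from what Samba sees. iconv_supports is a hypothetical helper:</p>

```python
# Assumption: Linux, with iconv_open() exported by the C library that the
# Python process has already loaded (Samba may use a different libiconv).
import ctypes

# iconv_open() returns (iconv_t) -1 when a conversion is not supported.
ICONV_FAILED = ctypes.c_void_p(-1).value


def iconv_supports(from_code: str, to_code: str = "UTF-8") -> bool:
    """Ask the C library's iconv_open() whether it can convert from_code."""
    libc = ctypes.CDLL(None)  # symbols of the running process, including libc
    libc.iconv_open.restype = ctypes.c_void_p
    libc.iconv_open.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    libc.iconv_close.argtypes = [ctypes.c_void_p]

    handle = libc.iconv_open(to_code.encode("ascii"), from_code.encode("ascii"))
    if handle is None or handle == ICONV_FAILED:
        return False
    libc.iconv_close(handle)
    return True


print("CP850 supported:", iconv_supports("CP850"))
```

<p>A False result for CP850 on the machine that runs Samba points to the same cause as the error message: the iconv implementation lacks that charset.</p>
<p>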
|
If you compiled Samba from source, make sure that the configure process found iconv. This can be
confirmed by checking the <code class="filename">config.log</code> file that is generated when
<code class="literal">configure</code> is executed.</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="integrate-ms-networks.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="optional.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="Backup.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 29. Integrating MS Windows Networks with Samba </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 31. Backup Techniques</td></tr></table></div></body></html>