:mod:`gettext` --- Multilingual internationalization services
=============================================================

.. module:: gettext
   :synopsis: Multilingual internationalization services.
.. moduleauthor:: Barry A. Warsaw <barry@zope.com>
.. sectionauthor:: Barry A. Warsaw <barry@zope.com>

The :mod:`gettext` module provides internationalization (I18N) and localization
(L10N) services for your Python modules and applications. It supports both the
GNU ``gettext`` message catalog API and a higher level, class-based API that may
be more appropriate for Python files. The interface described below allows you
to write your module and application messages in one natural language, and
provide a catalog of translated messages for running under different natural
languages.

Some hints on localizing your Python modules and applications are also given.


GNU :program:`gettext` API
--------------------------

The :mod:`gettext` module defines the following API, which is very similar to
the GNU :program:`gettext` API. If you use this API you will affect the
translation of your entire application globally. Often this is what you want if
your application is monolingual, with the choice of language dependent on the
locale of your user. If you are localizing a Python module, or if your
application needs to switch languages on the fly, you probably want to use the
class-based API instead.


.. function:: bindtextdomain(domain[, localedir])

   Bind the *domain* to the locale directory *localedir*. More concretely,
   :mod:`gettext` will look for binary :file:`.mo` files for the given domain using
   the path (on Unix): :file:`localedir/language/LC_MESSAGES/domain.mo`, where
   *language* is searched for in the environment variables :envvar:`LANGUAGE`,
   :envvar:`LC_ALL`, :envvar:`LC_MESSAGES`, and :envvar:`LANG`, in that order.

   If *localedir* is omitted or ``None``, then the current binding for *domain* is
   returned. [#]_
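
As a minimal sketch of the binding behaviour described above, the following
snippet binds a domain to a directory and then queries that binding. The domain
name ``myapplication`` and the temporary directory are placeholders chosen for
illustration; no catalog files need to exist for the binding to be recorded.

```python
import gettext
import os
import tempfile

# Hypothetical locale directory; a real application would ship its
# catalogs under <localedir>/<language>/LC_MESSAGES/<domain>.mo.
localedir = tempfile.mkdtemp()

# Binding returns the directory now associated with the domain.
bound = gettext.bindtextdomain('myapplication', localedir)

# Omitting localedir queries the current binding without changing it.
queried = gettext.bindtextdomain('myapplication')

# Path that would be probed for a German catalog of this domain.
expected_mo = os.path.join(localedir, 'de', 'LC_MESSAGES', 'myapplication.mo')
```

Passing an explicit absolute path, as here, avoids depending on the
system-specific default locale directory.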


.. function:: bind_textdomain_codeset(domain[, codeset])

   Bind the *domain* to *codeset*, changing the encoding of strings returned by the
   :func:`gettext` family of functions. If *codeset* is omitted, then the current
   binding is returned.


.. function:: textdomain([domain])

   Change or query the current global domain. If *domain* is ``None``, then the
   current global domain is returned, otherwise the global domain is set to
   *domain*, which is returned.


.. function:: gettext(message)

   Return the localized translation of *message*, based on the current global
   domain, language, and locale directory. This function is usually aliased as
   :func:`_` in the local namespace (see examples below).


.. function:: lgettext(message)

   Equivalent to :func:`gettext`, but the translation is returned in the preferred
   system encoding, if no other encoding was explicitly set with
   :func:`bind_textdomain_codeset`.


.. function:: dgettext(domain, message)

   Like :func:`gettext`, but look the message up in the specified *domain*.


.. function:: ldgettext(domain, message)

   Equivalent to :func:`dgettext`, but the translation is returned in the preferred
   system encoding, if no other encoding was explicitly set with
   :func:`bind_textdomain_codeset`.
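
The interplay between the global domain and the domain-specific lookups can be
sketched as follows. The domain names and the empty temporary directory are
assumptions made for this example; because no catalog exists there, each lookup
simply hands the message back unchanged.

```python
import gettext
import tempfile

# Hypothetical domains bound to an empty directory, so no catalog is found.
empty_dir = tempfile.mkdtemp()
gettext.bindtextdomain('myapplication', empty_dir)
gettext.bindtextdomain('otherdomain', empty_dir)

# Set the global domain; the new domain is returned.
current = gettext.textdomain('myapplication')

# Calling without an argument only queries the global domain.
queried = gettext.textdomain()

# Looked up in the global domain set above.
greeting = gettext.gettext('Hello')

# Looked up in an explicitly named domain; the global domain is untouched.
farewell = gettext.dgettext('otherdomain', 'Goodbye')
```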


.. function:: ngettext(singular, plural, n)

   Like :func:`gettext`, but consider plural forms. If a translation is found,
   apply the plural formula to *n*, and return the resulting message (some
   languages have more than two plural forms). If no translation is found, return
   *singular* if *n* is 1; return *plural* otherwise.

   The plural formula is taken from the catalog header. It is a C or Python
   expression that has a free variable *n*; the expression evaluates to the index
   of the plural in the catalog. See the GNU gettext documentation for the precise
   syntax to be used in :file:`.po` files and the formulas for a variety of
   languages.

   .. versionadded:: 2.3
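
The fallback rule for plural lookups can be illustrated with a small sketch.
It uses :func:`dngettext`, the domain-specific variant of :func:`ngettext`, so
that the global domain is left alone; the domain ``pluraldemo`` and the helper
``describe`` are hypothetical names, and the domain is bound to an empty
directory so that no translation is found.

```python
import gettext
import tempfile

# Bind a hypothetical domain to an empty directory so that no catalog
# is found and the documented fallback rule applies.
gettext.bindtextdomain('pluraldemo', tempfile.mkdtemp())

def describe(n):
    """Return a count message, choosing singular or plural from n."""
    template = gettext.dngettext(
        'pluraldemo',
        'There is %(num)d file in this directory',
        'There are %(num)d files in this directory',
        n)
    return template % {'num': n}

one = describe(1)
many = describe(3)
zero = describe(0)
```

Without a catalog, only *n* equal to 1 selects the singular string, so a count
of zero yields the plural form.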


.. function:: lngettext(singular, plural, n)

   Equivalent to :func:`ngettext`, but the translation is returned in the preferred
   system encoding, if no other encoding was explicitly set with
   :func:`bind_textdomain_codeset`.

   .. versionadded:: 2.4


.. function:: dngettext(domain, singular, plural, n)

   Like :func:`ngettext`, but look the message up in the specified *domain*.

   .. versionadded:: 2.3


.. function:: ldngettext(domain, singular, plural, n)

   Equivalent to :func:`dngettext`, but the translation is returned in the
   preferred system encoding, if no other encoding was explicitly set with
   :func:`bind_textdomain_codeset`.

   .. versionadded:: 2.4

Note that GNU :program:`gettext` also defines a :func:`dcgettext` function, but
this was deemed not useful and so it is currently unimplemented.

Here's an example of typical usage for this API::

   import gettext
   gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
   gettext.textdomain('myapplication')
   _ = gettext.gettext
   # ...
   print _('This is a translatable string.')


Class-based API
---------------

The class-based API of the :mod:`gettext` module gives you more flexibility and
greater convenience than the GNU :program:`gettext` API. It is the recommended
way of localizing your Python applications and modules. :mod:`gettext` defines
a "translations" class which implements the parsing of GNU :file:`.mo` format
files, and has methods for returning either standard 8-bit strings or Unicode
strings. Instances of this "translations" class can also install themselves in
the built-in namespace as the function :func:`_`.


.. function:: find(domain[, localedir[, languages[, all]]])

   This function implements the standard :file:`.mo` file search algorithm. It
   takes a *domain*, identical to what :func:`textdomain` takes. Optional
   *localedir* is as in :func:`bindtextdomain`. Optional *languages* is a list of
   strings, where each string is a language code.

   If *localedir* is not given, then the default system locale directory is used.
   [#]_ If *languages* is not given, then the following environment variables are
   searched: :envvar:`LANGUAGE`, :envvar:`LC_ALL`, :envvar:`LC_MESSAGES`, and
   :envvar:`LANG`. The first one returning a non-empty value is used for the
   *languages* variable. The environment variables should contain a
   colon-separated list of languages, which will be split on the colon to produce
   the expected list of language code strings.

   :func:`find` then expands and normalizes the languages, and then iterates
   through them, searching for an existing file built of these components:

   :file:`localedir/language/LC_MESSAGES/domain.mo`

   The first such file name that exists is returned by :func:`find`. If no such
   file is found, then ``None`` is returned. If *all* is true, a list of all
   matching file names is returned instead, in the order in which they appear in
   the languages list or the environment variables.
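
The search performed by :func:`find` can be tried out with a throwaway locale
tree. In this sketch the domain ``myapplication``, the temporary directory, and
the empty placeholder file are all assumptions for illustration; :func:`find`
only checks that a file exists at the expected path and does not parse it.

```python
import gettext
import os
import tempfile

# Build a temporary locale tree containing one (empty) German catalog file.
localedir = tempfile.mkdtemp()
catalog_dir = os.path.join(localedir, 'de', 'LC_MESSAGES')
os.makedirs(catalog_dir)
mo_path = os.path.join(catalog_dir, 'myapplication.mo')
open(mo_path, 'wb').close()

# First match for an explicit language list.
found = gettext.find('myapplication', localedir, languages=['de'])

# No catalog exists for French, so the result is None.
missing = gettext.find('myapplication', localedir, languages=['fr'])

# With all=True every match is returned as a list, in language order.
every = gettext.find('myapplication', localedir,
                     languages=['fr', 'de'], all=True)
```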


.. function:: translation(domain[, localedir[, languages[, class_[, fallback[, codeset]]]]])

   Return a :class:`Translations` instance based on the *domain*, *localedir*, and
   *languages*, which are first passed to :func:`find` to get a list of the
   associated :file:`.mo` file paths. Instances with identical :file:`.mo` file
   names are cached. The actual class instantiated is *class_* if provided,
   otherwise :class:`GNUTranslations`. The class's constructor must take a single
   file object argument. If provided, *codeset* will change the charset used to
   encode translated strings.

   If multiple files are found, later files are used as fallbacks for earlier ones.
   To allow setting the fallback, :func:`copy.copy` is used to clone each
   translation object from the cache; the actual instance data is still shared with
   the cache.

   If no :file:`.mo` file is found, this function raises :exc:`IOError` if
   *fallback* is false (which is the default), and returns a
   :class:`NullTranslations` instance if *fallback* is true.

   .. versionchanged:: 2.4
      Added the *codeset* parameter.
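
The effect of the *fallback* argument can be seen in a short sketch. The
domain name and the empty temporary directory are assumptions made for the
example, chosen so that no :file:`.mo` file can be found.

```python
import gettext
import tempfile

localedir = tempfile.mkdtemp()   # empty, so no catalog can be found

# Without fallback, a missing catalog is reported as IOError.
try:
    gettext.translation('myapplication', localedir, languages=['de'])
    raised = False
except IOError:
    raised = True

# With fallback=True a NullTranslations instance is returned instead,
# which passes messages through unchanged.
t = gettext.translation('myapplication', localedir, languages=['de'],
                        fallback=True)
message = t.gettext('This string will be translated.')
```

Requesting a fallback is a convenient way to keep a program usable when its
catalogs have not been installed yet.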


.. function:: install(domain[, localedir[, unicode [, codeset[, names]]]])

   This installs the function :func:`_` in Python's builtin namespace, based on
   *domain*, *localedir*, and *codeset* which are passed to the function
   :func:`translation`. The *unicode* flag is passed to the resulting translation
   object's :meth:`install` method.

   For the *names* parameter, please see the description of the translation
   object's :meth:`install` method.

   As seen below, you usually mark the strings in your application that are
   candidates for translation, by wrapping them in a call to the :func:`_`
   function, like this::

      print _('This string will be translated.')

   For convenience, you want the :func:`_` function to be installed in Python's
   builtin namespace, so it is easily accessible in all modules of your
   application.

   .. versionchanged:: 2.4
      Added the *codeset* parameter.

   .. versionchanged:: 2.5
      Added the *names* parameter.


The :class:`NullTranslations` class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Translation classes are what actually implement the translation of original
source file message strings to translated message strings. The base class used
by all translation classes is :class:`NullTranslations`; this provides the basic
interface you can use to write your own specialized translation classes. Here
are the methods of :class:`NullTranslations`:


.. class:: NullTranslations([fp])

   Takes an optional file object *fp*, which is ignored by the base class.
   Initializes "protected" instance variables *_info* and *_charset* which are set
   by derived classes, as well as *_fallback*, which is set through
   :meth:`add_fallback`. It then calls ``self._parse(fp)`` if *fp* is not
   ``None``.


   .. method:: _parse(fp)

      No-op'd in the base class, this method takes file object *fp*, and reads
      the data from the file, initializing its message catalog. If you have an
      unsupported message catalog file format, you should override this method
      to parse your format.


   .. method:: add_fallback(fallback)

      Add *fallback* as the fallback object for the current translation
      object. A translation object should consult the fallback if it cannot provide a
      translation for a given message.


   .. method:: gettext(message)

      If a fallback has been set, forward :meth:`gettext` to the
      fallback. Otherwise, return *message* unchanged. Overridden in derived
      classes.


   .. method:: lgettext(message)

      If a fallback has been set, forward :meth:`lgettext` to the
      fallback. Otherwise, return *message* unchanged. Overridden in derived
      classes.

      .. versionadded:: 2.4


   .. method:: ugettext(message)

      If a fallback has been set, forward :meth:`ugettext` to the
      fallback. Otherwise, return *message* as a Unicode
      string. Overridden in derived classes.


   .. method:: ngettext(singular, plural, n)

      If a fallback has been set, forward :meth:`ngettext` to the
      fallback. Otherwise, return *singular* if *n* is 1 and *plural*
      otherwise. Overridden in derived classes.

      .. versionadded:: 2.3


   .. method:: lngettext(singular, plural, n)

      If a fallback has been set, forward :meth:`lngettext` to the
      fallback. Otherwise, return *singular* if *n* is 1 and *plural*
      otherwise. Overridden in derived classes.

      .. versionadded:: 2.4


   .. method:: ungettext(singular, plural, n)

      If a fallback has been set, forward :meth:`ungettext` to the fallback.
      Otherwise, return *singular* if *n* is 1 and *plural* otherwise, as a
      Unicode string. Overridden in derived classes.

      .. versionadded:: 2.3


   .. method:: info()

      Return the "protected" :attr:`_info` variable.


   .. method:: charset()

      Return the "protected" :attr:`_charset` variable.


   .. method:: output_charset()

      Return the "protected" :attr:`_output_charset` variable, which defines the
      encoding used to return translated messages.

      .. versionadded:: 2.4


   .. method:: set_output_charset(charset)

      Change the "protected" :attr:`_output_charset` variable, which defines the
      encoding used to return translated messages.

      .. versionadded:: 2.4


   .. method:: install([unicode [, names]])

      If the *unicode* flag is false, this method installs :meth:`self.gettext`
      into the built-in namespace, binding it to ``_``. If *unicode* is true,
      it binds :meth:`self.ugettext` instead. By default, *unicode* is false.

      If the *names* parameter is given, it must be a sequence containing the
      names of functions you want to install in the builtin namespace in
      addition to :func:`_`. Supported names are ``'gettext'`` (bound to
      :meth:`self.gettext` or :meth:`self.ugettext` according to the *unicode*
      flag), ``'ngettext'`` (bound to :meth:`self.ngettext` or
      :meth:`self.ungettext` according to the *unicode* flag), ``'lgettext'``
      and ``'lngettext'``.

      Note that this is only one way, albeit the most convenient way, to make
      the :func:`_` function available to your application. Because it affects
      the entire application globally, and specifically the built-in namespace,
      localized modules should never install :func:`_`. Instead, they should use
      this code to make :func:`_` available to their module::

         import gettext
         t = gettext.translation('mymodule', ...)
         _ = t.gettext

      This puts :func:`_` only in the module's global namespace and so only
      affects calls within this module.

      .. versionchanged:: 2.5
         Added the *names* parameter.
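
To illustrate how the base class and its fallback chain can be used, here is a
minimal sketch of a specialized translation class. ``DictTranslations`` and its
in-memory message table are hypothetical and not part of :mod:`gettext`; the
sketch relies only on the :class:`NullTranslations` constructor,
:meth:`gettext`, and :meth:`add_fallback`.

```python
import gettext

class DictTranslations(gettext.NullTranslations):
    """Hypothetical translation class backed by an in-memory dict."""

    def __init__(self, messages):
        gettext.NullTranslations.__init__(self)
        self._messages = dict(messages)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # Defer to the base class, which consults the fallback if one
        # was set and otherwise returns the message unchanged.
        return gettext.NullTranslations.gettext(self, message)

primary = DictTranslations({'Hello': 'Hallo'})
secondary = DictTranslations({'Goodbye': 'Auf Wiedersehen'})
primary.add_fallback(secondary)

hello = primary.gettext('Hello')        # found in the primary object
goodbye = primary.gettext('Goodbye')    # found through the fallback
unknown = primary.gettext('Thanks')     # not found anywhere
```

A class that reads a custom file format would override :meth:`_parse` instead
of the constructor; the dict is used here only to keep the example short.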


The :class:`GNUTranslations` class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :mod:`gettext` module provides one additional class derived from
:class:`NullTranslations`: :class:`GNUTranslations`. This class overrides
:meth:`_parse` to enable reading GNU :program:`gettext` format :file:`.mo` files
in both big-endian and little-endian format. It also coerces both message ids
and message strings to Unicode.

:class:`GNUTranslations` parses optional meta-data out of the translation
catalog. It is convention with GNU :program:`gettext` to include meta-data as
the translation for the empty string. This meta-data is in :rfc:`822`\ -style
``key: value`` pairs, and should contain the ``Project-Id-Version`` key. If the
key ``Content-Type`` is found, then the ``charset`` property is used to
initialize the "protected" :attr:`_charset` instance variable, defaulting to
``None`` if not found. If the charset encoding is specified, then all message
ids and message strings read from the catalog are converted to Unicode using
this encoding. The :meth:`ugettext` method always returns a Unicode string,
while the :meth:`gettext` method returns an encoded 8-bit string. For the
message id arguments of both methods, either Unicode strings or 8-bit strings
containing only US-ASCII characters are acceptable. Note that the Unicode
versions of the methods (i.e. :meth:`ugettext` and :meth:`ungettext`) are the
recommended interface to use for internationalized Python programs.

The entire set of key/value pairs is placed into a dictionary and set as the
"protected" :attr:`_info` instance variable.

If the :file:`.mo` file's magic number is invalid, or if other problems occur
while reading the file, instantiating a :class:`GNUTranslations` class can raise
:exc:`IOError`.

The following methods are overridden from the base class implementation:


.. method:: GNUTranslations.gettext(message)

   Look up the *message* id in the catalog and return the corresponding message
   string, as an 8-bit string encoded with the catalog's charset encoding, if
   known. If there is no entry in the catalog for the *message* id, and a fallback
   has been set, the look up is forwarded to the fallback's :meth:`gettext` method.
   Otherwise, the *message* id is returned.


.. method:: GNUTranslations.lgettext(message)

   Equivalent to :meth:`gettext`, but the translation is returned in the preferred
   system encoding, if no other encoding was explicitly set with
   :meth:`set_output_charset`.

   .. versionadded:: 2.4


.. method:: GNUTranslations.ugettext(message)

   Look up the *message* id in the catalog and return the corresponding message
   string, as a Unicode string. If there is no entry in the catalog for the
   *message* id, and a fallback has been set, the look up is forwarded to the
   fallback's :meth:`ugettext` method. Otherwise, the *message* id is returned.


.. method:: GNUTranslations.ngettext(singular, plural, n)

   Do a plural-forms lookup of a message id. *singular* is used as the message id
   for purposes of lookup in the catalog, while *n* is used to determine which
   plural form to use. The returned message string is an 8-bit string encoded with
   the catalog's charset encoding, if known.

   If the message id is not found in the catalog, and a fallback is specified, the
   request is forwarded to the fallback's :meth:`ngettext` method. Otherwise, when
   *n* is 1 *singular* is returned, and *plural* is returned in all other cases.

   .. versionadded:: 2.3


.. method:: GNUTranslations.lngettext(singular, plural, n)

   Equivalent to :meth:`ngettext`, but the translation is returned in the preferred
   system encoding, if no other encoding was explicitly set with
   :meth:`set_output_charset`.

   .. versionadded:: 2.4


.. method:: GNUTranslations.ungettext(singular, plural, n)

   Do a plural-forms lookup of a message id. *singular* is used as the message id
   for purposes of lookup in the catalog, while *n* is used to determine which
   plural form to use. The returned message string is a Unicode string.

   If the message id is not found in the catalog, and a fallback is specified, the
   request is forwarded to the fallback's :meth:`ungettext` method. Otherwise,
   when *n* is 1 *singular* is returned, and *plural* is returned in all other
   cases.

   Here is an example::

      n = len(os.listdir('.'))
      cat = GNUTranslations(somefile)
      message = cat.ungettext(
          'There is %(num)d file in this directory',
          'There are %(num)d files in this directory',
          n) % {'num': n}

   .. versionadded:: 2.3
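
The meta-data handling described for :class:`GNUTranslations` can be observed
with a tiny catalog built in memory. The helper ``build_mo`` below is a
hypothetical, simplified writer for the little-endian :file:`.mo` layout (no
hash table, no plural entries) and is meant only as a sketch; real catalogs
should be produced with :program:`msgfmt.py` or GNU :program:`msgfmt`.

```python
import gettext
import io
import struct

def build_mo(messages):
    """Serialize a {msgid: msgstr} dict of byte strings as .mo data."""
    keys = sorted(messages)
    ids = b''
    strs = b''
    entries = []
    for key in keys:
        value = messages[key]
        # Record (length, offset) of each string relative to its pool.
        entries.append((len(key), len(ids), len(value), len(strs)))
        ids += key + b'\0'
        strs += value + b'\0'
    count = len(keys)
    key_table = 7 * 4                      # the header is seven 32-bit words
    value_table = key_table + 8 * count    # each table entry is two words
    ids_start = value_table + 8 * count
    strs_start = ids_start + len(ids)
    data = struct.pack('<7I', 0x950412de, 0, count,
                       key_table, value_table, 0, 0)
    for entry in entries:
        data += struct.pack('<2I', entry[0], ids_start + entry[1])
    for entry in entries:
        data += struct.pack('<2I', entry[2], strs_start + entry[3])
    return data + ids + strs

# The translation of the empty message id carries the catalog meta-data.
header = (b'Project-Id-Version: demo 1.0\n'
          b'Content-Type: text/plain; charset=UTF-8\n')
mo_data = build_mo({b'': header, b'Hello': b'Hallo'})

catalog = gettext.GNUTranslations(io.BytesIO(mo_data))
charset = catalog.charset()
project = catalog.info()['project-id-version']
greeting = catalog.gettext('Hello')
untranslated = catalog.gettext('Goodbye')
```

Note that the keys of the dictionary returned by :meth:`info` are stored in
lower case, which is why ``'project-id-version'`` is used for the lookup.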


Solaris message catalog support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Solaris operating system defines its own binary :file:`.mo` file format, but
since no documentation can be found on this format, it is not supported at this
time.


The Catalog constructor
^^^^^^^^^^^^^^^^^^^^^^^

.. index:: single: GNOME

GNOME uses a version of the :mod:`gettext` module by James Henstridge, but this
version has a slightly different API. Its documented usage was::

   import gettext
   cat = gettext.Catalog(domain, localedir)
   _ = cat.gettext
   print _('hello world')

For compatibility with this older module, the function :func:`Catalog` is an
alias for the :func:`translation` function described above.

One difference between this module and Henstridge's: his catalog objects
supported access through a mapping API, but this appears to be unused and so is
not currently supported.


Internationalizing your programs and modules
--------------------------------------------

Internationalization (I18N) refers to the operation by which a program is made
aware of multiple languages. Localization (L10N) refers to the adaptation of
your program, once internationalized, to the local language and cultural habits.
In order to provide multilingual messages for your Python programs, you need to
take the following steps:

#. prepare your program or module by specially marking translatable strings

#. run a suite of tools over your marked files to generate raw message catalogs

#. create language specific translations of the message catalogs

#. use the :mod:`gettext` module so that message strings are properly translated

In order to prepare your code for I18N, you need to look at all the strings in
your files. Any string that needs to be translated should be marked by wrapping
it in ``_('...')``, that is, a call to the function :func:`_`. For example::

   filename = 'mylog.txt'
   message = _('writing a log message')
   fp = open(filename, 'w')
   fp.write(message)
   fp.close()

In this example, the string ``'writing a log message'`` is marked as a candidate
for translation, while the strings ``'mylog.txt'`` and ``'w'`` are not.

The Python distribution comes with two tools which help you generate the message
catalogs once you've prepared your source code. These may or may not be
available from a binary distribution, but they can be found in a source
distribution, in the :file:`Tools/i18n` directory.

The :program:`pygettext` [#]_ program scans all your Python source code looking
for the strings you previously marked as translatable. It is similar to the GNU
:program:`gettext` program except that it understands all the intricacies of
Python source code, but knows nothing about C or C++ source code. You don't
need GNU ``gettext`` unless you're also going to be translating C code (such as
C extension modules).

:program:`pygettext` generates textual Uniforum-style message catalog
:file:`.pot` files, essentially structured human-readable files which
contain every marked string in the source code, along with a placeholder for the
translation strings. :program:`pygettext` is a command line script whose
interface is similar to that of :program:`xgettext`; for details on its use,
run ``pygettext.py --help``.

Copies of these :file:`.pot` files are then handed over to the individual human
translators who write language-specific versions for every supported natural
language. They send you back the filled in language-specific versions as a
:file:`.po` file. Using the :program:`msgfmt.py` [#]_ program (in the
:file:`Tools/i18n` directory), you take the :file:`.po` files from your
translators and generate the machine-readable :file:`.mo` binary catalog files.
The :file:`.mo` files are what the :mod:`gettext` module uses for the actual
translation processing during run-time.

How you use the :mod:`gettext` module in your code depends on whether you are
internationalizing a single module or your entire application. The next two
sections will discuss each case.


Localizing your module
^^^^^^^^^^^^^^^^^^^^^^

If you are localizing your module, you must take care not to make global
changes, e.g. to the built-in namespace. You should not use the GNU ``gettext``
API but instead the class-based API.

Let's say your module is called "spam" and the module's various natural language
translation :file:`.mo` files reside in :file:`/usr/share/locale` in GNU
:program:`gettext` format. Here's what you would put at the top of your
module::

   import gettext
   t = gettext.translation('spam', '/usr/share/locale')
   _ = t.lgettext

If your translators were providing you with Unicode strings in their :file:`.po`
files, you'd instead do::

   import gettext
   t = gettext.translation('spam', '/usr/share/locale')
   _ = t.ugettext


Localizing your application
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are localizing your application, you can install the :func:`_` function
globally into the built-in namespace, usually in the main driver file of your
application. This will let all your application-specific files just use
``_('...')`` without having to explicitly install it in each file.

In the simple case then, you need only add the following bit of code to the main
driver file of your application::

   import gettext
   gettext.install('myapplication')

If you need to set the locale directory or the *unicode* flag, you can pass
these into the :func:`install` function::

   import gettext
   gettext.install('myapplication', '/usr/share/locale', unicode=1)
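
The global effect of :func:`install` can be checked with a short sketch. The
empty temporary locale directory is an assumption made for this example;
because :func:`install` requests a fallback translation, :func:`_` is installed
even though no catalog is found there. The *unicode* flag is deliberately left
out so that the snippet does not depend on it.

```python
import gettext
import tempfile

try:
    import builtins                    # Python 3
except ImportError:
    import __builtin__ as builtins     # Python 2

localedir = tempfile.mkdtemp()   # hypothetical, empty locale directory

# Install _ and, through the names parameter, ngettext as well.
gettext.install('myapplication', localedir, names=['ngettext'])

# Both names now live in the built-in namespace of the whole process.
greeting = builtins._('Hello')
count = builtins.ngettext('%d item', '%d items', 2) % 2
```

Since every module in the process sees these names afterwards, this approach
is best reserved for the main driver file of an application.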


Changing languages on the fly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If your program needs to support many languages at the same time, you may want
to create multiple translation instances and then switch between them
explicitly, like so::

   import gettext

   lang1 = gettext.translation('myapplication', languages=['en'])
   lang2 = gettext.translation('myapplication', languages=['fr'])
   lang3 = gettext.translation('myapplication', languages=['de'])

   # start by using language1
   lang1.install()

   # ... time goes by, user selects language 2
   lang2.install()

   # ... more time goes by, user selects language 3
   lang3.install()


Deferred translations
^^^^^^^^^^^^^^^^^^^^^

In most coding situations, strings are translated where they are coded.
Occasionally however, you need to mark strings for translation, but defer actual
translation until later. A classic example is::

   animals = ['mollusk', 'albatross', 'rat', 'penguin', 'python']
   # ...
   for a in animals:
       print a

Here, you want to mark the strings in the ``animals`` list as being
translatable, but you don't actually want to translate them until they are
printed.

Here is one way you can handle this situation::

   def _(message): return message

   animals = [_('mollusk'), _('albatross'), _('rat'), _('penguin'), _('python')]

   del _

   # ...
   for a in animals:
       print _(a)

This works because the dummy definition of :func:`_` simply returns the string
unchanged. And this dummy definition will temporarily override any definition
of :func:`_` in the built-in namespace (until the :keyword:`del` command). Take
care, though, if you have a previous definition of :func:`_` in the local
namespace.

Note that the second use of :func:`_` will not identify "a" as being
translatable to the :program:`pygettext` program, since it is not a string.

Another way to handle this is with the following example::

   def N_(message): return message

   animals = [N_('mollusk'), N_('albatross'), N_('rat'), N_('penguin'), N_('python')]

   # ...
   for a in animals:
       print _(a)

In this case, you are marking translatable strings with the function :func:`N_`,
[#]_ which won't conflict with any definition of :func:`_`. However, you will
need to teach your message extraction program to look for translatable strings
marked with :func:`N_`. :program:`pygettext` and :program:`xpot` both support
this through the use of command line switches.
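
The marker approach can be tried without any catalog files. In the sketch
below, the ``catalog`` dictionary and the stand-in :func:`_` function are
hypothetical replacements for a real translation object; they only serve to
make the moment of translation visible.

```python
def N_(message):
    """Mark a string for extraction without translating it."""
    return message

# Marked at definition time, still untranslated.
animals = [N_('mollusk'), N_('python')]

# Stand-in for a real translation function; this German table is
# example data, not part of any shipped catalog.
catalog = {'mollusk': 'Weichtier', 'python': 'Python'}

def _(message):
    return catalog.get(message, message)

# Translation happens only at the point of use.
translated = [_(a) for a in animals]
```

The list itself keeps the original message ids, so it can be translated again
later if the active language changes.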


:func:`gettext` vs. :func:`lgettext`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In Python 2.4 the :func:`lgettext` family of functions was introduced. The
intention of these functions is to provide an alternative which is more
compliant with the current implementation of GNU gettext. Unlike
:func:`gettext`, which returns strings encoded with the same codeset used in the
translation file, :func:`lgettext` will return strings encoded with the
preferred system encoding, as returned by :func:`locale.getpreferredencoding`.
Also notice that Python 2.4 introduces new functions to explicitly choose the
codeset used in translated strings. If a codeset is explicitly set, even
:func:`lgettext` will return translated strings in the requested codeset, as
would be expected in the GNU gettext implementation.


Acknowledgements
----------------

The following people contributed code, feedback, design suggestions, previous
implementations, and valuable experience to the creation of this module:

* Juan David Ibáñez Palomar

.. rubric:: Footnotes

.. [#] The default locale directory is system dependent; for example, on Red Hat Linux
   it is :file:`/usr/share/locale`, but on Solaris it is :file:`/usr/lib/locale`.
   The :mod:`gettext` module does not try to support these system dependent
   defaults; instead its default is :file:`sys.prefix/share/locale`. For this
   reason, it is always best to call :func:`bindtextdomain` with an explicit
   absolute path at the start of your application.

.. [#] See the footnote for :func:`bindtextdomain` above.

.. [#] François Pinard has written a program called :program:`xpot` which does a
   similar job. It is available as part of his :program:`po-utils` package at
   http://po-utils.progiciels-bpi.ca/.

.. [#] :program:`msgfmt.py` is binary compatible with GNU :program:`msgfmt` except that
   it provides a simpler, all-Python implementation. With this and
   :program:`pygettext.py`, you generally won't need to install the GNU
   :program:`gettext` package to internationalize your Python applications.

.. [#] The choice of :func:`N_` here is totally arbitrary; it could have just as easily
   been :func:`MarkThisStringForTranslation`.