\input texinfo @c -*-texinfo-*-
@setfilename gettext.info
@settitle GNU @code{gettext} utilities

@dircategory GNU Gettext Utilities
* Gettext: (gettext). GNU gettext utilities.
* gettextize: (gettext)gettextize Invocation. Prepare a package for gettext.
* msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files.
* msgmerge: (gettext)msgmerge Invocation. Update two PO files into one.
* xgettext: (gettext)xgettext Invocation. Extract strings into a PO file.

This file provides documentation for GNU @code{gettext} utilities.
It also serves as a reference for the free Translation Project.

Copyright (C) 1995, 1996, 1997, 1998, 2001 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to process this file through TeX and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by the Foundation.
@title GNU gettext tools, version @value{VERSION}
@subtitle Native Language Support Library and Tools
@subtitle Edition @value{EDITION}, @value{UPDATED}
@author Ulrich Drepper
@author Fran@,{c}ois Pinard

@vskip 0pt plus 1filll
Copyright @copyright{} 1995, 1996, 1997, 1998, 2001 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by the Foundation.
@node Top, Introduction, (dir), (dir)
@top GNU @code{gettext} utilities

* Introduction:: Introduction
* Basics:: PO Files and PO Mode Basics
* Sources:: Preparing Program Sources
* Template:: Making the PO Template File
* Creating:: Creating a New PO File
* Updating:: Updating Existing PO Files
* Binaries:: Producing Binary MO Files
* Users:: The User's View
* Programmers:: The Programmer's View
* Translators:: The Translator's View
* Maintainers:: The Maintainer's View
* Conclusion:: Concluding Remarks

* Language Codes:: ISO 639 language codes
* Country Codes:: ISO 3166 country codes

--- The Detailed Node Listing ---

* Why:: The Purpose of GNU @code{gettext}
* Concepts:: I18n, L10n, and Such
* Aspects:: Aspects in Native Language Support
* Files:: Files Conveying Translations
* Overview:: Overview of GNU @code{gettext}

PO Files and PO Mode Basics

* Installation:: Completing GNU @code{gettext} Installation
* PO Files:: The Format of PO Files
* Main PO Commands:: Main Commands
* Entry Positioning:: Entry Positioning
* Normalizing:: Normalizing Strings in Entries

Preparing Program Sources

* Triggering:: Triggering @code{gettext} Operations
* Mark Keywords:: How Marks Appear in Sources
* Marking:: Marking Translatable Strings
* c-format:: Telling something about the following string
* Special cases:: Special Cases of Translatable Strings

Making the PO Template File

* xgettext Invocation:: Invoking the @code{xgettext} Program

Updating Existing PO Files

* msgmerge Invocation:: Invoking the @code{msgmerge} Program
* Translated Entries:: Translated Entries
* Fuzzy Entries:: Fuzzy Entries
* Untranslated Entries:: Untranslated Entries
* Obsolete Entries:: Obsolete Entries
* Modifying Translations:: Modifying Translations
* Modifying Comments:: Modifying Comments
* Subedit:: Mode for Editing Translations
* C Sources Context:: C Sources Context
* Auxiliary:: Consulting Auxiliary PO Files
* Compendium:: Using Translation Compendiums

Producing Binary MO Files

* msgfmt Invocation:: Invoking the @code{msgfmt} Program
* MO Files:: The Format of GNU MO Files

* Matrix:: The Current @file{ABOUT-NLS} Matrix
* Installers:: Magic for Installers
* End Users:: Magic for End Users

The Programmer's View

* catgets:: About @code{catgets}
* gettext:: About @code{gettext}
* Comparison:: Comparing the two interfaces
* Using libintl.a:: Using libintl.a in own programs
* gettext grok:: Being a @code{gettext} grok
* Temp Programmers:: Temporary Notes for the Programmers Chapter

* Interface to catgets:: The interface
* Problems with catgets:: Problems with the @code{catgets} interface?!

* Interface to gettext:: The interface
* Ambiguities:: Solving ambiguities
* Locating Catalogs:: Locating message catalog files
* Charset conversion:: How to request conversion to Unicode
* Plural forms:: Additional functions for handling plurals
* GUI program problems:: Another technique for solving ambiguities
* Optimized gettext:: Optimization of the *gettext functions

Temporary Notes for the Programmers Chapter

* Temp Implementations:: Temporary - Two Possible Implementations
* Temp catgets:: Temporary - About @code{catgets}
* Temp WSI:: Temporary - Why a single implementation
* Temp Notes:: Temporary - Notes

The Translator's View

* Trans Intro 0:: Introduction 0
* Trans Intro 1:: Introduction 1
* Discussions:: Discussions
* Organization:: Organization
* Information Flow:: Information Flow

* Central Coordination:: Central Coordination
* National Teams:: National Teams
* Mailing Lists:: Mailing Lists

* Sub-Cultures:: Sub-Cultures
* Organizational Ideas:: Organizational Ideas

The Maintainer's View

* Flat and Non-Flat:: Flat or Non-Flat Directory Structures
* Prerequisites:: Prerequisite Works
* gettextize Invocation:: Invoking the @code{gettextize} Program
* Adjusting Files:: Files You Must Create or Alter

Files You Must Create or Alter

* po/POTFILES.in:: @file{POTFILES.in} in @file{po/}
* configure.in:: @file{configure.in} at top level
* config.guess:: @file{config.guess}, @file{config.sub} at top level
* aclocal:: @file{aclocal.m4} at top level
* acconfig:: @file{acconfig.h} at top level
* Makefile:: @file{Makefile.in} at top level
* src/Makefile:: @file{Makefile.in} in @file{src/}

* History:: History of GNU @code{gettext}
* References:: Related Readings
@node Introduction, Basics, Top, Top
@chapter Introduction

This manual is still in @emph{DRAFT} state. Some sections are still
empty, or almost. We keep merging material from other sources
(essentially e-mail folders) while the proper integration of this
material is delayed.

In this manual, we use @emph{he} when speaking of the programmer or
maintainer, @emph{she} when speaking of the translator, and @emph{they}
when speaking of the installers or end users of the translated program.
This is only a convenience for clarifying the documentation. It is
@emph{absolutely} not meant to imply that some roles are more appropriate
to males or females. Besides, as you might guess, GNU @code{gettext}
is meant to be useful for people using computers, whatever their sex,
race, religion or nationality!

This chapter explains the goals sought in the creation
of GNU @code{gettext} and the free Translation Project.
Then, it explains a few broad concepts around
Native Language Support, and positions message translation with regard
to other aspects of national and cultural variance, as they apply
to programs. It also surveys those files used to convey the
translations. It explains how the various tools interact in the
initial generation of these files, and later, how the maintenance
cycle should usually operate.

Please send suggestions and corrections to:

@example
@r{Internet address:}
bug-gnu-utils@@gnu.org
@end example

Please include the manual's edition number and update date in your messages.

* Why:: The Purpose of GNU @code{gettext}
* Concepts:: I18n, L10n, and Such
* Aspects:: Aspects in Native Language Support
* Files:: Files Conveying Translations
* Overview:: Overview of GNU @code{gettext}
@node Why, Concepts, Introduction, Introduction
@section The Purpose of GNU @code{gettext}

Usually, programs are written and documented in English, and use
English at execution time to interact with users. This is true
not only of GNU software, but also of a great deal of commercial
and free software. Using a common language is quite handy for
communication between developers, maintainers and users from all
countries. On the other hand, most people are less comfortable with
English than with their own native language, and would prefer to
use their mother tongue for day-to-day work, as far as possible.
Many would simply @emph{love} to see their computer screen showing
a lot less English, and far more of their own language.

However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it. They have no confidence at all that the dream might ever
come true. Yet some have not lost hope, and have organized themselves.
The Translation Project is a formalization of this hope into a
workable structure, which has a good chance to get all of us nearer
the achievement of a truly multi-lingual set of programs.

GNU @code{gettext} is an important step for the Translation Project,
as it is an asset on which we may build many other steps. This package
offers to programmers, translators and even users, a well integrated
set of tools and documentation. Specifically, the GNU @code{gettext}
utilities are a set of tools that provides a framework within which
other free packages may produce multi-lingual messages. These tools
include:

@itemize @bullet
@item
A set of conventions about how programs should be written to support
message catalogs.

@item
A directory and file naming organization for the message catalogs
themselves.

@item
A runtime library supporting the retrieval of translated messages.

@item
A few stand-alone programs to massage in various ways the sets of
translatable strings, or already translated strings.

@item
A special mode for Emacs@footnote{In this manual, all mentions of Emacs
refer to either GNU Emacs or to XEmacs, which people sometimes call FSF
Emacs and Lucid Emacs, respectively.} which helps in preparing these sets
and bringing them up to date.
@end itemize

GNU @code{gettext} is designed to minimize the impact of
internationalization on program sources, keeping this impact as small
and hardly noticeable as possible. Internationalization has better
chances of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.

The Translation Project also uses the GNU @code{gettext} distribution
as a vehicle for documenting its structure and methods. This goes
beyond the strict technicalities of documenting GNU @code{gettext}
proper. This way, translators will find in a single place, as
far as possible, all they need to know for properly doing their
translating work. This supplemental documentation might also
help programmers, and even curious users, in understanding how GNU
@code{gettext} is related to the remainder of the Translation
Project, and consequently get a glimpse of the @emph{big picture}.
@node Concepts, Aspects, Why, Introduction
@section I18n, L10n, and Such

Two long words appear all the time when we discuss support of native
language in programs, and these words have a precise meaning, worth
being explained here, once and for all in this document. The words are
@emph{internationalization} and @emph{localization}. Many people,
tired of writing these long words over and over again, got into the
habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first
and last letter of each word, and replacing the run of intermediate
letters by a number merely telling how many such letters there are.
But in this manual, for the sake of clarity, we will patiently write
the names in full, each time@dots{}
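
The abbreviation rule just described is mechanical enough to sketch in a
few lines of C. The function name @code{numeronym} and its buffer-based
signature are illustrative assumptions only; nothing of the kind is part
of GNU @code{gettext}.

```c
#include <stdio.h>
#include <string.h>

/* Write the abbreviation of `word` into `out` (capacity `size`):
   first letter, number of letters in between, last letter.
   Words shorter than three letters are copied unchanged.
   Returns 0 on success, -1 if `out` is too small. */
int numeronym(const char *word, char *out, size_t size)
{
    size_t len = strlen(word);
    int written;

    if (len < 3)
        written = snprintf(out, size, "%s", word);
    else
        written = snprintf(out, size, "%c%zu%c",
                           word[0], len - 2, word[len - 1]);

    return (written >= 0 && (size_t) written < size) ? 0 : -1;
}
```

Applied to the two words above, this yields @samp{i18n} (18 letters
between the @samp{i} and the final @samp{n}) and @samp{l10n}.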
By @dfn{internationalization}, one refers to the operation by which a
program, or a set of programs turned into a package, is made aware of and
able to support multiple languages. This is a generalization process,
by which the programs are untied from using only English strings or
other English specific habits, and connected to generic ways of doing
the same, instead. Program developers may use various techniques to
internationalize their programs. Some of these have been standardized.
GNU @code{gettext} offers one of these standards. @xref{Programmers}.

By @dfn{localization}, one means the operation by which, in a set
of programs already internationalized, one gives the program all
needed information so that it can adapt itself to handle its input
and output in a fashion which is correct for some native language and
cultural habits. This is a particularization process, by which generic
methods already implemented in an internationalized program are used
in specific ways. The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some
country, together with all associated translations targeted to the
same native language, is called the @dfn{locale} for this language
or country. Users achieve localization of programs by setting proper
values to special environment variables, prior to executing those
programs, identifying which locale should be used.
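
As a minimal sketch of how a program picks up the locale requested
through the environment, the following C fragment relies on the standard
@code{setlocale} function. The helper name @code{adopt_user_locale} is
hypothetical, and the fragment assumes a POSIX system, where the
@code{LC_MESSAGES} category exists and the variables @code{LC_ALL},
@code{LC_MESSAGES} and @code{LANG} are consulted; the text above does
not name these variables.

```c
#include <locale.h>
#include <stddef.h>

/* Adopt whatever locale the user's environment requests and return the
   name now in effect for messages, or NULL if the request was rejected. */
const char *adopt_user_locale(void)
{
    /* The empty string means "consult the environment". */
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;

    /* A NULL second argument only queries the current setting. */
    return setlocale(LC_MESSAGES, NULL);
}
```

A program that never calls @code{setlocale} stays in the default
@samp{C} locale, whatever the environment says.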
In fact, locale message support is only one component of the cultural
data that makes up a particular locale. There is a whole host of
routines and functions provided to aid programmers in developing
internationalized software and which allow them to access the data
stored in a particular locale. When someone presently refers to a
particular locale, they are obviously referring to the data stored
within that particular locale. Similarly, if a programmer is referring
to ``accessing the locale routines'', they are referring to the
complete suite of routines that access all of the locale's information.

One uses the expression @dfn{Native Language Support}, or merely NLS,
for speaking of the overall activity or feature encompassing both
internationalization and localization, allowing for multi-lingual
interactions in a program. In a nutshell, one could say that
internationalization is the operation by which further localizations
are made possible.

Also, very roughly said, when it comes to multi-lingual messages,
internationalization is usually taken care of by programmers, and
localization is usually taken care of by translators.
@node Aspects, Files, Concepts, Introduction
@section Aspects in Native Language Support

For a totally multi-lingual distribution, there are many things to
translate beyond output messages.

@itemize @bullet
@item
As of today, GNU @code{gettext} offers a complete toolset for
translating messages output by C programs. Perl scripts and shell
scripts will also need to be translated. Even if there are today some hooks
by which this can be done, these hooks are not integrated as well as they
should be.

@item
Some programs, like @code{autoconf} or @code{bison}, are able
to produce other programs (or scripts). Even if the generating
programs themselves are internationalized, the generated programs they
produce may need internationalization on their own, and this indirect
internationalization could be automated right from the generating
program. In fact, quite usually, generating and generated programs
could be internationalized independently, as the effort needed is
fairly orthogonal.

@item
A few programs include textual tables which might need translation
themselves, independently of the strings contained in the program
itself. For example, @w{RFC 1345} gives an English description for each
character which the @code{recode} program is able to reconstruct at execution.
Since these descriptions are extracted from the RFC by mechanical means,
translating them properly would require a prior translation of the RFC
itself.

@item
Almost all programs accept options, which are often worded so as to
be descriptive for the English readers; one might want to consider
offering translated versions for program options as well.

@item
Many programs read, interpret, compile, or are somewhat driven by
input files which are texts containing keywords, identifiers, or
replies which are inherently translatable. For example, one may want
@code{gcc} to allow diacriticized characters in identifiers or use
translated keywords; @samp{rm -i} might accept something other than
@samp{y} or @samp{n} for replies, etc. Even if the program will
eventually make most of its output in the foreign languages, one has
to decide whether the input syntax, option values, etc., are to be
localized or not.

@item
The manual accompanying a package, as well as all documentation files
in the distribution, could surely be translated, too. Translating a
manual, with the intent of later keeping up with updates, is a major
undertaking in itself, generally.
@end itemize
As we already stressed, translation is only one aspect of locales.
Other internationalization aspects are system services and are handled
in GNU @code{libc}. There
are many attributes that are needed to define a country's cultural
conventions. These attributes include, besides the country's native
language, the formatting of the date and time, the representation of
numbers, the symbols for currency, etc. These local @dfn{rules} are
termed the country's locale. The locale represents the knowledge
needed to support the country's native attributes.

There are a few major areas which may vary between countries and
hence, define what a locale must describe. The following list helps
put multi-lingual messages into the proper context of other tasks
related to locales. See the GNU @code{libc} manual for details.

@table @emph

@item Characters and Codesets
The codeset most commonly used throughout the USA and most English
speaking parts of the world is the ASCII codeset. However, there are
many characters needed by various locales that are not found within
this codeset. The 8-bit @w{ISO 8859-1} codeset has most of the special
characters needed to handle the major European languages. However, in
many cases, the @w{ISO 8859-1} font is not adequate. Hence each locale
will need to specify which codeset it needs to use and will need
to have the appropriate character handling routines to cope with
that codeset.

@item Currency
The symbols used vary from country to country, as does the position
of the symbol. Software needs to be able to transparently
display currency figures in the native mode for each locale.

@item Dates
The format of date varies between locales. For example, Christmas day
in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
Other countries might use @w{ISO 8601} dates, etc.

Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},
or otherwise. Some locales require time to be specified in 24-hour
mode rather than as AM or PM. Further, the nature and yearly extent
of the Daylight Saving correction vary widely between countries.

@item Numbers
Numbers can be represented differently in different locales.
For example, the following numbers are all written correctly for
their respective locales:

Some programs could go further and use different unit systems, like
English units or Metric units, or even take into account variants
about how numbers are spelled in full.

@item Messages
The most obvious area is the language support within a locale. This is
where GNU @code{gettext} provides the means for developers and users to
easily change the language that the software uses to communicate to
the user.

@end table

Components of locale outside of message handling are standardized in
the ISO C standard and the SUSv2 specification. GNU @code{libc}
fully implements this, and most other modern systems provide more
or less reasonable support for at least some of the missing components.
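
To make the ISO C part concrete, here is a small sketch that queries two
of the non-message components mentioned above: the preferred date format,
through @code{strftime}, and the decimal separator, through
@code{localeconv}. Both are standard C library functions; the two helper
names are assumptions made for this illustration.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format Christmas day 1994 the way the current LC_TIME locale writes
   dates.  Returns the number of characters stored, 0 if `out` is too
   small. */
size_t format_christmas_1994(char *out, size_t size)
{
    struct tm day = {0};

    day.tm_year = 94;   /* years since 1900 */
    day.tm_mon  = 11;   /* months since January */
    day.tm_mday = 25;
    day.tm_wday = 0;    /* 25 December 1994 was a Sunday */

    /* %x is the locale's preferred date representation. */
    return strftime(out, size, "%x", &day);
}

/* Return the decimal separator of the current LC_NUMERIC locale. */
const char *decimal_separator(void)
{
    return localeconv()->decimal_point;
}
```

In the default @samp{C} locale these give @samp{12/25/94} and @samp{.};
after a successful @code{setlocale} call naming another locale, the same
code may produce that locale's conventions instead, provided the locale
is installed on the system.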
@node Files, Overview, Aspects, Introduction
@section Files Conveying Translations

The letters PO in @file{.po} files mean Portable Object, to
distinguish them from @file{.mo} files, where MO stands for Machine
Object. This paradigm, as well as the PO file format, is inspired
by the NLS standard developed by Uniforum, and implemented by Sun
in their Solaris system.

PO files are meant to be read and edited by humans, and associate each
original, translatable string of a given package with its translation
in a particular target language. A single PO file is dedicated to
a single target language. If a package supports many languages,
there is one such PO file per language supported, and each package
has its own set of PO files. These PO files are best created by
the @code{xgettext} program, and later updated or refreshed through
the @code{msgmerge} program. Program @code{xgettext} extracts all
marked messages from a set of C files and initializes a PO file with
empty translations. Program @code{msgmerge} takes care of adjusting
PO files between releases of the corresponding sources, commenting
obsolete entries, initializing new ones, and updating all source
line references. Files ending with @file{.pot} are a kind of base
translation file found in distributions, in PO file format, and
@file{.pox} files are often temporary PO files.

MO files are meant to be read by programs, and are binary in nature.
A few systems already offer tools for creating and handling MO files
as part of the Native Language Support coming with the system, but the
format of these MO files is often different from system to system,
and non-portable. The tools already provided with these systems don't
support all the features of GNU @code{gettext}. Therefore GNU
@code{gettext} uses its own format for MO files. Files ending with
@file{.gmo} are really MO files, when it is known that these files use
the GNU format.
@node Overview, , Files, Introduction
@section Overview of GNU @code{gettext}

The following diagram summarizes the relation between the files
handled by GNU @code{gettext} and the tools acting on these files.
It is followed by somewhat detailed explanations, which you should
read while keeping an eye on the diagram. Having a clear understanding
of these interrelations would surely help programmers, translators
and maintainers.
@example
@group
Original C Sources ---> PO mode ---> Marked C Sources ---.
                                                         |
              .---------<--- GNU gettext Library         |
.--- make <---+                                          |
|             `---------<--------------------+-----------'
|                                            |
|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
|   |                                            |             ^
|   |                                            `---.         |
|   `---.                                            +---> PO mode ---.
|       +----> msgmerge ------> LANG.pox --->--------'                |
|   .---'                                                             |
|   |                                                                 |
|   `-------------<---------------.                                   |
|                                 +--- LANG.po <--- New LANG.pox <----'
|   .--- LANG.gmo <--- msgfmt <---'
|   |
|   `---> install ---> /.../LANG/PACKAGE.mo ---.
|                                              +---> "Hello world!"
`-------> install ---> /.../bin/PROGRAM -------'
@end group
@end example
The indication @samp{PO mode} appears in two places in this picture,
and you may safely read it as merely meaning ``hand editing'', using
any editor of your choice, really. However, for those of you who are
lucky enough to use Emacs, PO mode has been specifically created
for providing a cozy environment for editing or modifying PO files.
While editing a PO file, PO mode allows for the easy browsing of
auxiliary and compendium PO files, as well as for following references into
the set of C program sources from which PO files have been derived.
It has a few special features, among which are the interactive marking
of program strings as translatable, and the validation of PO files
with easy repositioning to PO file lines showing errors.

As a programmer, the first step to bringing GNU @code{gettext}
into your package is identifying, right in the C sources, those strings
which are meant to be translatable, and those which are untranslatable.
This tedious job can be done a little more comfortably using Emacs PO
mode, but you can use any means familiar to you for modifying your
C sources. Besides this, some other simple, standard changes are needed to
properly initialize the translation library. @xref{Sources}, for
more information about all this.

For newly written software the strings of course can and should be
marked while writing it. The @code{gettext} approach makes this
very easy. Simply put the following lines at the beginning of each file
or in a central header file:

@example
@group
#define _(String) (String)
#define N_(String) (String)
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)
@end group
@end example

Doing this allows you to prepare the sources for internationalization.
Later, when you feel ready to use the @code{gettext} library,
simply replace these definitions with the following:

@example
@group
#include <libintl.h>
#define _(String) gettext (String)
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)
@end group
@end example

and link against @file{libintl.a} or @file{libintl.so}. Note that on
GNU systems, you don't need to link with @code{libintl} because the
@code{gettext} library functions are already contained in GNU libc.
That is all you have to change.
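
One way to keep both sets of definitions in a single file is to choose
between them at compile time. The sketch below does so; the switch
@code{ENABLE_NLS}, the package name, the catalog directory and the three
functions are hypothetical names chosen for this example, not
requirements of the library.

```c
#include <locale.h>

#ifdef ENABLE_NLS               /* hypothetical build-time switch */
# include <libintl.h>
# define _(String) gettext (String)
# define gettext_noop(String) (String)
# define N_(String) gettext_noop (String)
#else                           /* placeholders: strings stay untranslated */
# define _(String) (String)
# define N_(String) (String)
# define textdomain(Domain)
# define bindtextdomain(Package, Directory)
#endif

/* Hypothetical package name and catalog directory. */
#define PACKAGE   "example-package"
#define LOCALEDIR "/usr/share/locale"

/* N_ marks strings in a static initializer, where no function call is
   allowed; the lookup happens later, through _(), at the point of use. */
static const char *const level_names[] = { N_("warning"), N_("error") };

/* Call once at program start, before any message is looked up. */
void init_messages(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(PACKAGE, LOCALEDIR);
    textdomain(PACKAGE);
}

const char *greeting(void)
{
    return _("Hello world!");
}

/* Return the display name for level 0 (warning) or nonzero (error). */
const char *level_name(int level)
{
    return _(level_names[level != 0]);
}
```

Compiled without the switch, the functions simply return the original
English strings. Compiled with it, they return whatever translation the
installed catalog for the current locale provides, or the original
string when there is none.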
Once the C sources have been modified, the @code{xgettext} program
is used to find and extract all translatable strings, and create a
PO template file out of all these. This @file{@var{package}.pot} file
contains all original program strings. It has sets of pointers to
exactly where in C sources each string is used. All translations
are set to empty. The letter @kbd{t} in @file{.pot} marks this as
a Template PO file, not yet oriented towards any particular language.
@xref{xgettext Invocation}, for more details about how one calls the
@code{xgettext} program. If you are @emph{really} lazy, you might
be interested in working a lot more right away, and preparing the
whole distribution setup (@pxref{Maintainers}). By doing so, you
spare yourself typing the @code{xgettext} command, as @code{make}
should now generate the proper things automatically for you!

The first time through, there is no @file{@var{lang}.po} yet, so the
@code{msgmerge} step may be skipped and replaced by a mere copy of
@file{@var{package}.pot} to @file{@var{lang}.pox}, where @var{lang}
represents the target language.

Then comes the initial translation of messages. Translation in
itself is a whole matter, still exclusively meant for humans,
and whose complexity far overwhelms the level of this manual.
Nevertheless, a few hints are given in some other chapter of this
manual (@pxref{Translators}). You will also find there indications
about how to contact translating teams, or become part of them,
for sharing your translating concerns with others who target the same
native language.

While adding the translated messages into the @file{@var{lang}.pox}
PO file, if you do not have Emacs handy, you are on your own
for ensuring that your efforts fully respect the PO file format and quoting
conventions (@pxref{PO Files}). This is surely not an impossible task,
as this is the way many people have handled PO files already for Uniforum or
Solaris. On the other hand, by using PO mode in Emacs, most details
of PO file format are taken care of for you, but you have to acquire
some familiarity with PO mode itself. Besides main PO mode commands
(@pxref{Main PO Commands}), you should know how to move between entries
(@pxref{Entry Positioning}), and how to handle untranslated entries
(@pxref{Untranslated Entries}).

If some common translations have already been saved into a compendium
PO file, translators may use PO mode for initializing untranslated
entries from the compendium, and also save selected translations into
the compendium, updating it (@pxref{Compendium}). Compendium files
are meant to be exchanged between members of a given translation team.

Programs, or packages of programs, are dynamic in nature: users write
bug reports and suggestions for improvement, maintainers react by
modifying programs in various ways. The fact that a package has
already been internationalized should not make maintainers shy
of adding new strings, or modifying strings already translated.
They just do their job the best they can. For the Translation
Project to work smoothly, it is important that maintainers do not
carry translation concerns on their already loaded shoulders, and that
translators be kept as free as possible of programmatic concerns.

The only concern maintainers should have is carefully marking new
strings as translatable, when they should be, and not otherwise
worrying about them being translated, as this will come in proper time.
Consequently, when programs and their strings are adjusted in various
ways by maintainers, and for matters usually unrelated to translation,
@code{xgettext} would construct @file{@var{package}.pot} files which are
evolving over time, so the translations carried by @file{@var{lang}.po}
are slowly fading out of date.

It is important for translators (and even maintainers) to understand
that package translation is a continuous process in the lifetime of a
package, and not something which is done once and for all at the start.
After an initial burst of translation activity for a given package,
interventions are needed once in a while, because here and there,
translated entries become obsolete, and new untranslated entries
appear, needing translation.

The @code{msgmerge} program has the purpose of refreshing an already
existing @file{@var{lang}.po} file, by comparing it with a newer
@file{@var{package}.pot} template file, extracted by @code{xgettext}
out of recent C sources. The refreshing operation adjusts all
references to C source locations for strings, since these strings
move as programs are modified. Also, @code{msgmerge} comments out as
obsolete, in @file{@var{lang}.pox}, those already translated entries
which are no longer used in the program sources (@pxref{Obsolete
Entries}). It finally discovers new strings and inserts them in
the resulting PO file as untranslated entries (@pxref{Untranslated
Entries}). @xref{msgmerge Invocation}, for more information about what
@code{msgmerge} really does.
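
The bookkeeping just described can be pictured with a toy model in C.
It only compares original strings for exact equality and counts the
three outcomes; it is an illustration under that simplifying assumption,
not the algorithm @code{msgmerge} implements, and every name in it is
made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Outcome of comparing an old PO file with a fresh template. */
struct merge_counts {
    size_t kept;      /* in both: the translation carries over        */
    size_t added;     /* only in the template: new, untranslated      */
    size_t obsolete;  /* only in the old PO file: to be commented out */
};

/* Return 1 if `id` occurs among the `n` strings in `ids`, else 0. */
static int contains(const char *const *ids, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(ids[i], id) == 0)
            return 1;
    return 0;
}

/* Classify entries by exact match of their original strings. */
struct merge_counts classify_entries(const char *const *old_ids, size_t n_old,
                                     const char *const *pot_ids, size_t n_pot)
{
    struct merge_counts c = {0, 0, 0};

    for (size_t i = 0; i < n_pot; i++) {
        if (contains(old_ids, n_old, pot_ids[i]))
            c.kept++;
        else
            c.added++;
    }
    for (size_t i = 0; i < n_old; i++)
        if (!contains(pot_ids, n_pot, old_ids[i]))
            c.obsolete++;

    return c;
}
```

For an old file holding @samp{Open}, @samp{Close} and @samp{Quit} and a
template holding @samp{Open}, @samp{Quit}, @samp{Save} and @samp{Print},
the model keeps two entries, adds two untranslated ones and marks one as
obsolete.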
748
Whatever route or means taken, the goal is to obtain an updated
749
@file{@var{lang}.pox} file offering translations for all strings.
750
When this is properly achieved, this file @file{@var{lang}.pox} may
751
take the place of the previous official @file{@var{lang}.po} file.
The temporal mobility, or fluidity, of PO files is an integral part of
the translation game, and should be well understood and accepted.
People resisting it will have a hard time participating in the
Translation Project, or will give a hard time to other participants! In
particular, maintainers should relax and include all available official
PO files in their distributions, even if these have not recently been
updated, without nagging or otherwise trying to exert pressure on the
translator teams to get the job done. The pressure should rather come
from the community of users speaking a particular language, and
maintainers should consider themselves fairly relieved of any concern
about the adequacy of translation files. On the other hand, translators
should reasonably try updating the PO files they are responsible for,
while the package is undergoing pretest, prior to an official
distribution.

Once the PO file is complete and dependable, the @code{msgfmt} program
is used for turning the PO file into a machine-oriented format, which
allows efficient retrieval of translations by the programs of the
package, whenever needed at runtime (@pxref{MO Files}). @xref{msgfmt
Invocation}, for more information about all modalities of execution
for the @code{msgfmt} program.

Finally, the modified and marked C sources are compiled and linked
with the GNU @code{gettext} library, usually through the operation of
@code{make}, provided a suitable @file{Makefile} exists for the project,
and the resulting executable is installed somewhere users will find it.
The MO files themselves should also be properly installed. Provided the
appropriate environment variables are set (@pxref{End Users}), the
program should localize itself automatically, whenever it executes.

The remainder of this manual explains in depth the various
steps outlined above.

@node Basics, Sources, Introduction, Top
@chapter PO Files and PO Mode Basics

The GNU @code{gettext} toolset helps programmers and translators
in producing, updating and using translation files, mainly those
PO files which are textual, editable files. This chapter focuses on
the format of PO files, and contains a PO mode starter. The description
of PO mode is spread throughout this manual instead of being concentrated
in one place. Here we present only the basics of PO mode.

@menu
* Installation:: Completing GNU @code{gettext} Installation
* PO Files:: The Format of PO Files
* Main PO Commands:: Main Commands
* Entry Positioning:: Entry Positioning
* Normalizing:: Normalizing Strings in Entries
@end menu

@node Installation, PO Files, Basics, Basics
@section Completing GNU @code{gettext} Installation

Once you have received, unpacked, configured and compiled the GNU
@code{gettext} distribution, the @samp{make install} command puts in
place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and
@code{msgmerge}, as well as their available message catalogs. To
top off a comfortable installation, you might also want to make the
PO mode available to your Emacs users.

During the installation of the PO mode, you might want to modify your
file @file{.emacs}, once and for all, so it contains a few lines looking
like:

@example
(setq auto-mode-alist
      (cons '("\\.po[tx]?\\'\\|\\.po\\." . po-mode) auto-mode-alist))
(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
@end example

Later, whenever you edit some @file{.po}, @file{.pot} or @file{.pox}
file, or any file having the string @samp{.po.} within its name,
Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and
automatically activates PO mode commands for the associated buffer.
The string @samp{PO} appears in the mode line for any buffer for
which PO mode is active. Many PO files may be active at once in a
single Emacs session.

If you are using Emacs version 20 or newer, and have already installed
the appropriate international fonts on your system, you may also tell
Emacs how to determine automatically the coding system of every PO file.
This will often (but not always) cause the necessary fonts to be loaded
and used for displaying the translations on your Emacs screen. For this
to happen, add the lines:

@example
(modify-coding-system-alist 'file "\\.po[tx]?\\'\\|\\.po\\."
                            'po-find-file-coding-system)
(autoload 'po-find-file-coding-system "po-mode")
@end example

@noindent
to your @file{.emacs} file. If, with this, you still see boxes instead
of international characters, try a different font set (via Shift Mouse
button 1).

@node PO Files, Main PO Commands, Installation, Basics
@section The Format of PO Files

A PO file is made up of many entries, each entry holding the relation
between an original untranslated string and its corresponding
translation. All entries in a given PO file usually pertain
to a single project, and all translations are expressed in a single
target language. One PO file @dfn{entry} has the following schematic
structure:

@example
@var{white-space}
# @var{translator-comments}
#. @var{automatic-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
msgid @var{untranslated-string}
msgstr @var{translated-string}
@end example

The general structure of a PO file should be well understood by
the translator. When using PO mode, very little has to be known
about the format details, as PO mode takes care of them for her.

Entries begin with some optional white space. Usually, when generated
through GNU @code{gettext} tools, there is exactly one blank line
between entries. Then comments follow, on lines all starting with the
character @kbd{#}. There are two kinds of comments: those which have
some white space immediately following the @kbd{#}, which are
created and maintained exclusively by the translator, and those which
have some non-white character just after the @kbd{#}, which
are created and maintained automatically by GNU @code{gettext} tools.
All comments, of either kind, are optional.

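As a rough illustration of the two kinds of comments, the following C
sketch classifies one line of a PO file by looking at the character
right after the @kbd{#}. The names @code{classify_po_comment} and
@code{enum po_comment_kind} are hypothetical, chosen for this example
only; they are not part of GNU @code{gettext}. The sketch assumes that a
lone @kbd{#} counts as a translator comment, and it does not treat
commented-out obsolete entries specially.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result codes, for illustration only.  */
enum po_comment_kind
{
  PO_NOT_A_COMMENT,
  PO_TRANSLATOR_COMMENT,  /* '#' followed by white space, or alone */
  PO_AUTOMATIC_COMMENT    /* '#' followed by a non-white character */
};

/* Classify one line of a PO file, given without its trailing newline.  */
enum po_comment_kind
classify_po_comment (const char *line)
{
  if (line == NULL || line[0] != '#')
    return PO_NOT_A_COMMENT;
  if (line[1] == '\0' || isspace ((unsigned char) line[1]))
    return PO_TRANSLATOR_COMMENT;
  return PO_AUTOMATIC_COMMENT;
}
```

A tool that rewrites PO files could use such a test to decide which
comment lines it must preserve untouched and which ones it may
regenerate.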
After white space and comments, entries show two strings, namely
first the untranslated string as it appears in the original program
sources, and then, the translation of this string. The original
string is introduced by the keyword @code{msgid}, and the translation,
by @code{msgstr}. The two strings, untranslated and translated,
are quoted in various ways in the PO file, using @kbd{"}
delimiters and @kbd{\} escapes, but the translator does not really
have to pay attention to the precise quoting format, as PO mode fully
takes care of quoting for her.

The @code{msgid} strings, as well as automatic comments, are produced
and managed by other GNU @code{gettext} tools, and PO mode does not
provide means for the translator to alter these. The most she can
do is delete them, and only by deleting the whole entry.
On the other hand, the @code{msgstr} string, as well as translator
comments, are really meant for the translator, and PO mode gives her
the full control she needs.

The comment lines beginning with @kbd{#,} are special because they are
not completely ignored by the programs as comments generally are. The
comma-separated list of @var{flag}s is used by the @code{msgfmt}
program to give the user some better diagnostic messages. Currently
there are two forms of flags defined:

@table @kbd
@item fuzzy
This flag can be generated by the @code{msgmerge} program or it can be
inserted by the translator herself. It shows that the @code{msgstr}
string might not be a correct translation (anymore). Only the translator
can judge if the translation requires further modification, or is
acceptable as is. Once satisfied with the translation, she then removes
this @kbd{fuzzy} attribute. The @code{msgmerge} program inserts this
flag when it has paired the @code{msgid} and @code{msgstr} entries by
fuzzy search only. @xref{Fuzzy Entries}.

@item c-format
@itemx no-c-format
These flags should not be added by a human. Instead only the
@code{xgettext} program adds them. In an automated PO file processing
system as proposed here, the user's changes would be thrown away again as
soon as the @code{xgettext} program generates a new template file.

If the @kbd{c-format} flag is given for a string, the @code{msgfmt}
program does some more tests to check the validity of the translation.
@xref{msgfmt Invocation}.
@end table

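To give an idea of the kind of test a @kbd{c-format} flag makes
possible, here is a simplified C sketch which checks that a translation
uses the same @code{printf} conversion characters, in the same order, as
the original string. This is only an illustration under stated
assumptions: it is not the algorithm @code{msgfmt} uses, the function
names are hypothetical, length modifiers are ignored, and reordered
arguments are not supported.

```c
#include <string.h>

/* Return the conversion character of the next printf directive found
   at or after *PP, skipping "%%", and advance *PP past it.  Return
   '\0' when no further directive exists.  */
static char
next_conversion (const char **pp)
{
  const char *p = *pp;

  while ((p = strchr (p, '%')) != NULL)
    {
      p++;                                       /* skip '%' */
      if (*p == '%')                             /* literal percent */
        {
          p++;
          continue;
        }
      p += strspn (p, "-+ #0");                  /* flags */
      p += strspn (p, "0123456789*");            /* field width */
      if (*p == '.')
        p += 1 + strspn (p + 1, "0123456789*");  /* precision */
      p += strspn (p, "hlLjzt");                 /* length modifiers */
      if (*p == '\0')
        break;                                   /* truncated directive */
      *pp = p + 1;
      return *p;
    }
  *pp = strchr (*pp, '\0');
  return '\0';
}

/* Return 1 if MSGID and MSGSTR contain the same conversion characters
   in the same order, 0 otherwise.  */
int
same_conversions (const char *msgid, const char *msgstr)
{
  for (;;)
    {
      char a = next_conversion (&msgid);
      char b = next_conversion (&msgstr);
      if (a != b)
        return 0;
      if (a == '\0')
        return 1;
    }
}
```

A translation that swaps @samp{%d} and @samp{%s}, or drops a directive,
fails this check, which is the sort of mistake that would otherwise only
show up as a crash or garbage output at run time.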
A different kind of entry is used for translations which involve
plural forms.

@example
@var{white-space}
# @var{translator-comments}
#. @var{automatic-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
msgid @var{untranslated-string-singular}
msgid_plural @var{untranslated-string-plural}
msgstr[0] @var{translated-string-case-0}
@dots{}
msgstr[N] @var{translated-string-case-n}
@end example

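On the program side, entries of this kind correspond to calls to the
@code{ngettext} function declared in @file{libintl.h}. The following C
sketch shows one such call; the function name @code{format_removed} and
the message texts are made up for this example.

```c
#include <libintl.h>
#include <stdio.h>

/* Format a message about N removed files into BUF.  ngettext selects
   among the msgstr[...] translations according to N; when no
   translation is available it falls back on the singular string for
   N == 1 and on the plural string otherwise.  */
int
format_removed (char *buf, size_t size, unsigned long n)
{
  const char *fmt = ngettext ("%lu file removed", "%lu files removed", n);
  return snprintf (buf, size, fmt, n);
}
```

The two string arguments would end up as the @code{msgid} and
@code{msgid_plural} fields of the corresponding PO file entry.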
Sometimes a few lines, usually whitespace or comments, follow the
very last entry of a PO file. Such lines are not part of any entry,
and PO mode is unable to take action on those lines. By using the
PO mode function @w{@kbd{M-x po-normalize}}, the translator may get
rid of those spurious lines. @xref{Normalizing}.

The remainder of this section may be safely skipped by those using
PO mode, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file. On the other hand, those
not having Emacs handy should read on carefully.

Each of @var{untranslated-string} and @var{translated-string} respects
the C syntax for a character string, including the surrounding quotes
and embedded backslashed escape sequences. When the time comes
to write multi-line strings, one should not use escaped newlines.
Instead, a closing quote should follow the last character on the
line to be continued, and an opening quote should resume the string
at the beginning of the following PO file line. For example:

@example
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
@end example

In this example, the empty string is used on the first line, to
allow better alignment of the @kbd{H} from the word @samp{Here}
over the @kbd{f} from the word @samp{for}. In this example, the
@code{msgid} keyword is followed by three strings, which are meant
to be concatenated. Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the necessity of @code{msgid} to be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner disposition. The empty string could have
been omitted, but only if the string starting with @samp{Here} was
moved up to the first line, right after @code{msgid}.@footnote{This
limitation is not imposed by GNU @code{gettext}, but is for compatibility
with the @code{msgfmt} implementation on Solaris.} It was not really necessary
either to switch between the two last quoted strings immediately after
the newline @samp{\n}; the switch could have occurred after @emph{any}
other character. We just did it this way because it is neater.

One should carefully distinguish between end of lines marked as
@samp{\n} @emph{inside} quotes, which are part of the represented
string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.

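The distinction can be made concrete with a small C sketch which
rebuilds the represented string from the quoted segments following a
keyword. It is only an illustration, not the parser used by GNU
@code{gettext}: the function name @code{po_decode_string} is
hypothetical, and only the escapes @samp{\n}, @samp{\t}, @samp{\"} and
@samp{\\} are interpreted.

```c
#include <stddef.h>

/* Decode the quoted segments found in SRC into DST, whose capacity CAP
   includes the terminating NUL.  Text outside quotes, such as the line
   breaks of the PO file itself, is ignored.  Return the decoded
   length, or -1 if DST is too small or a quote is left open.  */
long
po_decode_string (const char *src, char *dst, size_t cap)
{
  size_t len = 0;
  int in_quotes = 0;

  for (; *src != '\0'; src++)
    {
      char c = *src;

      if (!in_quotes)
        {
          if (c == '"')
            in_quotes = 1;
          continue;             /* skip text between segments */
        }
      if (c == '"')
        {
          in_quotes = 0;
          continue;
        }
      if (c == '\\' && src[1] != '\0')
        {
          src++;
          switch (*src)
            {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            default:  c = *src; break;  /* \" and \\; others kept as is */
            }
        }
      if (len + 1 >= cap)
        return -1;
      dst[len++] = c;
    }
  if (in_quotes || cap == 0)
    return -1;
  dst[len] = '\0';
  return (long) len;
}
```

Feeding this function the same text split over one, two or three PO
file lines yields the same decoded string each time, since only the
@samp{\n} escapes inside quotes produce newlines.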
Outside strings, white lines and comments may be used freely.
Comments start at the beginning of a line with @samp{#} and extend
until the end of the PO file line. Comments written by translators
should have the initial @samp{#} immediately followed by some white
space. If the @samp{#} is not immediately followed by white space,
this comment is most likely generated and managed by specialized GNU
tools, and might disappear or be replaced unexpectedly when the PO
file is given to @code{msgmerge}.

@node Main PO Commands, Entry Positioning, PO Files, Basics
@section Main PO mode Commands

After setting up Emacs with something similar to the lines in
@ref{Installation}, PO mode is activated for a window when Emacs finds a
PO file in that window. This makes the window read-only and establishes
@code{po-mode-map}. PO mode is a genuine Emacs mode, which is not derived
from text mode in any way. Functions found on @code{po-mode-hook},
if any, will be executed.

When PO mode is active in a window, the letters @samp{PO} appear
in the mode line for that window. The mode line also displays how
many entries of each kind are held in the PO file. For example,
the string @samp{132t+3f+10u+2o} would tell the translator that the
PO file contains 132 translated entries (@pxref{Translated Entries}),
3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
Entries}). Items with a zero count are not shown. So, in this example, if
the fuzzy entries were unfuzzied, the untranslated entries were translated
and the obsolete entries were deleted, the mode line would merely display
@samp{145t} for the counters.

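The layout of these counters can be sketched in C as follows. This is
not the Emacs Lisp code of PO mode, only an illustration of the
convention described above; the function name
@code{po_mode_line_counters} and its parameters are hypothetical.

```c
#include <stdio.h>

/* Write counters such as "132t+3f+10u+2o" into BUF, leaving out any
   kind of entry whose count is zero.  */
void
po_mode_line_counters (char *buf, size_t size,
                       int translated, int fuzzy,
                       int untranslated, int obsolete)
{
  const int counts[4] = { translated, fuzzy, untranslated, obsolete };
  const char tags[4] = { 't', 'f', 'u', 'o' };
  size_t used = 0;
  int i;

  if (size == 0)
    return;
  buf[0] = '\0';
  for (i = 0; i < 4; i++)
    {
      int n;

      if (counts[i] == 0)
        continue;
      n = snprintf (buf + used, size - used, "%s%d%c",
                    used > 0 ? "+" : "", counts[i], tags[i]);
      if (n < 0 || (size_t) n >= size - used)
        break;                  /* output truncated; stop here */
      used += (size_t) n;
    }
}
```

With the counts 132, 3, 10 and 2 this produces the string of the
example, and with 145, 0, 0 and 0 it produces the shorter form.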
The main PO commands are those which do not fit into the other categories of
subsequent sections. These allow for quitting PO mode or for managing windows
in special ways.

@table @kbd
@item U
Undo last modification to the PO file.

@item Q
Quit processing and save the PO file.

@item q
Quit processing, possibly after confirmation.

@item O
Temporarily leave the PO file window.

@item ?
@itemx h
Show help about PO mode.

@item =
Give some PO file statistics.

@item V
Batch validate the format of the whole PO file.

@end table

The command @kbd{U} (@code{po-undo}) interfaces to the Emacs
@emph{undo} facility. @xref{Undo, , Undoing Changes, emacs, The Emacs
Editor}. Each time @kbd{U} is typed, modifications which the translator
did to the PO file are undone a little more. For the purpose of
undoing, each PO mode command is atomic. This is especially true for
the @kbd{@key{RET}} command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
implied several actions. However, while in the editing window, one
can undo the editing work in much smaller steps.

The commands @kbd{Q} (@code{po-quit}) and @kbd{q}
(@code{po-confirm-and-quit}) are used when the translator is done with the
PO file. The former is a bit less verbose than the latter. If the file
has been modified, it is saved to disk first. In both cases, and prior to
all this, the commands check if some untranslated message remains in the
PO file and, if so, the translator is asked if she really wants to stop
working with this PO file. This is the preferred way of getting rid
of an Emacs PO file buffer. Merely killing it through the usual command
@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.

The command @kbd{O} (@code{po-other-window}) is another, softer way
to leave PO mode temporarily. It just moves the cursor to some other
Emacs window, and pops one up if necessary. For example, if the translator
just got PO mode to show some source context in some other window, she might
discover some apparent bug in the program source that needs correction.
This command allows the translator to switch roles, become a programmer,
and have the cursor right in the window containing the program she
wants to modify. By later getting the cursor back
in the PO file window, or by asking Emacs to edit this file once again,
PO mode is then recovered.

The command @kbd{h} (@code{po-help}) displays a summary of all available PO
mode commands. The translator should then type any character to resume
normal PO mode operations. The command @kbd{?} has the same effect
as @kbd{h}.

The command @kbd{=} (@code{po-statistics}) computes the total number of
entries in the PO file, the ordinal of the current entry (counted from
1), the number of untranslated entries, the number of obsolete entries,
and displays all these numbers.

The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in verbose
mode over the current PO file. This command first offers to save the
current PO file on disk. The @code{msgfmt} tool, from GNU @code{gettext},
has the purpose of creating a MO file out of a PO file, and PO mode uses
the features of this program for checking the overall format of a PO file,
as well as all individual entries.

The program @code{msgfmt} runs asynchronously with Emacs, so the
translator regains control immediately while her PO file is being studied.
Error output is collected in the Emacs @samp{*compilation*} buffer,
displayed in another window. The regular Emacs command @kbd{C-x `}
(@code{next-error}), as well as other usual compile commands, allow the
translator to reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.

@node Entry Positioning, Normalizing, Main PO Commands, Basics
@section Entry Positioning

The cursor in a PO file window is almost always part of
an entry. The only exceptions are the special case when the cursor
is after the last entry in the file, or when the PO file is
empty. The entry containing the cursor is called the
current entry. Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file; it also selects the entry on which commands operate.

Some PO mode commands alter the position of the cursor in a specialized
way. A few of those special-purpose positioning commands are described here;
the others are described in following sections.

@table @kbd
@item .
Redisplay the current entry.

@item n
Select the entry after the current one.

@item p
Select the entry before the current one.

@item <
Select the first entry in the PO file.

@item >
Select the last entry in the PO file.

@item m
Record the location of the current entry for later use.

@item r
Return to a previously saved entry location.

@item x
Exchange the current entry location with the previously saved one.

@end table

Any Emacs command able to reposition the cursor may be used
to select the current entry in PO mode, including commands which
move by characters, lines, paragraphs, screens or pages, and search
commands. However, there is a kind of standard way to display the
current entry in PO mode, which usual Emacs commands moving
the cursor do not especially try to enforce. The command @kbd{.}
(@code{po-current-entry}) has the sole purpose of redisplaying the
current entry properly, after the current entry has been changed by
means external to PO mode, or the Emacs screen otherwise altered.

It is yet to be decided if PO mode helps the translator, or otherwise
irritates her, by forcing a rigid window disposition while she
is doing her work. We originally had quite precise ideas about
how windows should behave, but on the other hand, anyone used to
Emacs is often happy to keep full control. Maybe a fixed window
disposition might be offered as a PO mode option that the translator
might activate or deactivate at will, so it could be offered on an
experimental basis. If nobody feels a real need for using it, or
a compulsion for writing it, we should drop this whole idea.
The incentive for doing it should come from translators rather than
programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers @emph{thinking} about
how @emph{others} should do translation.

The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
(@code{po-previous-entry}) move the cursor to the entry following,
or preceding, the current one. If @kbd{n} is given while the
cursor is on the last entry of the PO file, or if @kbd{p}
is given while the cursor is on the first entry, the cursor does not move.

The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
(@code{po-last-entry}) move the cursor to the first entry, or last
entry, of the PO file. When the cursor is located past the last
entry in a PO file, most PO mode commands will return an error saying
@samp{After last entry}. Moreover, the commands @kbd{<} and @kbd{>}
have the special property of being able to work even when the cursor
is not within some PO file entry, and one may use them for nicely
correcting this situation. But even these commands will fail on a
truly empty PO file. There are development plans for PO mode
to interactively fill an empty PO file from sources. @xref{Marking}.

The translator may decide, before working at the translation of
a particular entry, that she needs to browse the remainder of the
PO file, maybe for finding the terminology or phraseology used
in related entries. She can of course use the standard Emacs idioms
for saving the current cursor location in some register, and use that
register for getting back, or else, use the location ring.

PO mode offers another approach, by which cursor locations may be saved
onto a special stack. The command @kbd{m} (@code{po-push-location})
merely adds the location of current entry to the stack, pushing
the already saved locations under the new one. The command
@kbd{r} (@code{po-pop-location}) consumes the top stack element and
repositions the cursor to the entry associated with that top element.
This position is then lost, for the next @kbd{r} will move the cursor
to the previously saved location, and so on until no locations remain
on the stack.

If the translator wants the position to be kept on the location stack,
maybe for taking a look at the entry associated with the top
element, then go elsewhere with the intent of getting back later, she
ought to use @kbd{m} immediately after @kbd{r}.

The command @kbd{x} (@code{po-exchange-location}) simultaneously
repositions the cursor to the entry associated with the top element of
the stack of saved locations, and replaces that top element with the
location of the current entry before the move. Consequently, repeating
the @kbd{x} command toggles alternatively between two entries.
For achieving this, the translator will position the cursor on the
first entry, use @kbd{m}, then position to the second entry, and
merely use @kbd{x} for making the switch.

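The behavior of @kbd{m}, @kbd{r} and @kbd{x} can be modelled by a few
lines of C. The sketch below is not how PO mode is implemented (PO mode
is written in Emacs Lisp); the structure, the fixed capacity and the
function names are assumptions made for the sake of illustration, with
entry locations reduced to plain numbers.

```c
#include <stddef.h>

#define PO_STACK_MAX 32         /* arbitrary capacity for this sketch */

struct po_locations
{
  long saved[PO_STACK_MAX];     /* saved entry locations, top is last */
  size_t count;                 /* number of saved locations */
  long current;                 /* location of the current entry */
};

/* m: save the current location on top of the stack.  Return 0 on
   success, -1 if the stack is full.  */
int
po_push_location (struct po_locations *s)
{
  if (s->count == PO_STACK_MAX)
    return -1;
  s->saved[s->count++] = s->current;
  return 0;
}

/* r: move to the top saved location and remove it from the stack.  */
int
po_pop_location (struct po_locations *s)
{
  if (s->count == 0)
    return -1;
  s->current = s->saved[--s->count];
  return 0;
}

/* x: move to the top saved location, leaving the location held before
   the move in its place on the stack.  */
int
po_exchange_location (struct po_locations *s)
{
  long previous;

  if (s->count == 0)
    return -1;
  previous = s->current;
  s->current = s->saved[s->count - 1];
  s->saved[s->count - 1] = previous;
  return 0;
}
```

Calling @code{po_exchange_location} twice in a row brings the cursor
back where it started, which is the toggling effect described above,
while @code{po_pop_location} shrinks the stack at each call.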
@node Normalizing, , Entry Positioning, Basics
@section Normalizing Strings in Entries

There are many different ways for encoding a particular string into a
PO file entry, because there are so many different ways to split and
quote multi-line strings, and even, to represent special characters
by backslashed escape sequences. Some features of PO mode rely on
the ability for PO mode to scan an already existing PO file for a
particular string encoded into the @code{msgid} field of some entry.
Even if PO mode has internally all the built-in machinery for
implementing this recognition easily, doing it fast is technically
difficult. To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.

A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical representation.
Having both @code{xgettext} and PO mode converging towards a uniform
way of representing equivalent strings would be useful, as the internal
normalization needed by PO mode could be automatically satisfied
when using @code{xgettext} from GNU @code{gettext}. An explicit
PO mode normalization should then be only necessary for PO files
imported from elsewhere, or for when the convention itself evolves.

So, for achieving normalization of at least the strings of a given
PO file needing a canonical representation, the following PO mode
command is available:

@table @kbd
@item M-x po-normalize
Tidy the whole PO file by making entries more uniform.

@end table

The special command @kbd{M-x po-normalize}, which has no associated
keys, revises all entries, ensuring that strings of both original
and translated entries use uniform internal quoting in the PO file.
It also removes any crumb after the last entry. This command may be
useful for PO files freshly imported from elsewhere, or if we ever
improve on the canonical quoting format we use. This canonical format
is not only meant for getting cleaner PO files, but also for greatly
speeding up @code{msgid} string lookup for some other PO mode commands.

@kbd{M-x po-normalize} presently makes three passes over the entries.
The first implements heuristics for converting PO files for GNU
@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
fields were using K&R style C string syntax for multi-line strings.
These heuristics may fail for comments not related to obsolete
entries and ending with a backslash; they also depend on subsequent
passes for finalizing the proper commenting of continued lines for
obsolete entries. This first pass might disappear once all old PO
files have been adjusted. The second and third passes normalize
all @code{msgid} and @code{msgstr} strings respectively. They also
clean out those trailing backslashes used by XView's @code{msgfmt}
for continued lines.

Having such an explicit normalizing command allows for importing PO
files from other sources, but also eases the evolution of the current
convention, evolution driven mostly by aesthetic concerns, as of now.
It is easy to make suggested adjustments at a later time, as the
normalizing command and eventually, other GNU @code{gettext} tools
should greatly automate conformance. A description of the canonical
string format is given below, for the particular benefit of those not
having Emacs handy, and who would nevertheless want to handcraft
their PO files in nice ways.

Right now, in PO mode, strings are single line or multi-line. A string
goes multi-line if and only if it has @emph{embedded} newlines, that
is, if it matches @samp{[^\n]\n+[^\n]}. So, we would have:

@example
msgstr "\n\nHello, world!\n\n\n"
@end example

but, replacing the space by a newline, this becomes:

@example
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"
@end example

We are deliberately using a caricatural example, here, to make the
point clearer. Usually, multi-lines are not that bad looking.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string,
and also all newlines introducing empty lines (that is, for @w{@var{n}
> 1}, the last @var{n}-1 newlines would go together on a separate
string), so making the previous example appear:

@example
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"
@end example

There are a few yet undecided little points about string normalization,
to be documented in this manual, once these questions settle.

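The rule deciding whether a string goes multi-line, namely a match of
@samp{[^\n]\n+[^\n]}, can also be written without a regular expression.
The following C sketch is one possible rendering; the function name
@code{po_has_embedded_newline} is hypothetical and the code is not
taken from PO mode.

```c
/* Return 1 if S contains an embedded newline, that is, a run of one or
   more newlines with a non-newline character on each side; return 0
   otherwise.  */
int
po_has_embedded_newline (const char *s)
{
  const char *p;

  for (p = s; *p != '\0'; p++)
    if (*p == '\n' && p > s && p[-1] != '\n')
      {
        /* P starts a run of newlines preceded by ordinary text.  */
        while (*p == '\n')
          p++;
        /* Embedded only if ordinary text follows the run.  */
        return *p != '\0';
      }
  return 0;
}
```

Leading and trailing newlines alone never make a string multi-line
under this rule, which is why the first example above stays on a single
line.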
@node Sources, Template, Basics, Top
@chapter Preparing Program Sources

@c FIXME: Rewrite (the whole chapter).

For the programmer, changes to the C source code fall into three
categories. First, you have to make the localization functions
known to all modules needing message translation. Second, you should
properly trigger the operation of GNU @code{gettext} when the program
initializes, usually from the @code{main} function. Last, you should
identify and especially mark all constant strings in your program
needing translation.

Presuming that your set of programs, or package, has been adjusted
so all needed GNU @code{gettext} files are available, and your
@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
having translated C strings should contain the line:

@example
#include <libintl.h>
@end example

The remaining changes to your C sources are discussed in the further
sections of this chapter.

@menu
* Triggering:: Triggering @code{gettext} Operations
* Mark Keywords:: How Marks Appear in Sources
* Marking:: Marking Translatable Strings
* c-format:: Telling something about the following string
* Special cases:: Special Cases of Translatable Strings
@end menu

@node Triggering, Mark Keywords, Sources, Sources
@section Triggering @code{gettext} Operations

The initialization of locale data should be done with more or less
the same code in every program, as demonstrated below:

@example
setlocale (LC_ALL, "");
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
@end example

@var{PACKAGE} and @var{LOCALEDIR} should be provided either by
@file{config.h} or by the Makefile. For now consult the @code{gettext}
sources for more information.

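For reference, here is how these three calls might be packaged in a
self-contained C function. The fallback values given to @code{PACKAGE}
and @code{LOCALEDIR} and the function name @code{init_translation} are
placeholders invented for this sketch; a real package would get the two
macros from @file{config.h} or from the Makefile.

```c
#include <libintl.h>
#include <locale.h>

/* Normally provided by config.h or by the Makefile; the values below
   are placeholders for this sketch.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Set up message translation; meant to be called once, early in main.  */
void
init_translation (void)
{
  setlocale (LC_ALL, "");               /* adopt the user's locale */
  bindtextdomain (PACKAGE, LOCALEDIR);  /* where the MO files live */
  textdomain (PACKAGE);                 /* default domain for gettext */
}
```

A program would call @code{init_translation ()} at the start of
@code{main}, before producing any output which might need translation.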
The use of @code{LC_ALL} might not be appropriate for you.
@code{LC_ALL} includes all locale categories and especially
@code{LC_CTYPE}. This latter category is responsible for determining
character classes with the @code{isalnum} etc. functions from
@file{ctype.h}, which could be wrong, especially for programs that
process some kind of input language. For example, this would mean that
source code using the @,{c} (c-cedilla character) is runnable in
France but not in the U.S.

Some systems also have problems with parsing numbers using the
@code{scanf} functions if a locale other than @code{"C"} is selected
through @code{LC_ALL}.
The standards say that additional formats besides the one known in the
@code{"C"} locale might be recognized. But some systems seem to reject
numbers in the @code{"C"} locale format. In some situations, it might
also be a problem with the notation itself which makes it impossible to
recognize whether the number is in the @code{"C"} locale or the local
format. This can happen if thousands separator characters are used.
Some locales define this character according to the national
conventions to @code{'.'} which is the same character used in the
@code{"C"} locale to denote the decimal point.

So it is sometimes necessary to replace the @code{LC_ALL} line in the
code above by a sequence of @code{setlocale} lines:

@example
setlocale (LC_CTYPE, "");
setlocale (LC_MESSAGES, "");
@end example

On all POSIX conformant systems the locale categories @code{LC_CTYPE},
@code{LC_COLLATE}, @code{LC_MONETARY}, @code{LC_NUMERIC}, and
@code{LC_TIME} are available. On some modern systems there is also a
locale category @code{LC_MESSAGES}, which is called @code{LC_RESPONSES}
on some old, XPG2 compliant systems.

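A minimal C sketch of this selective initialization follows. The
function name @code{init_locale_selectively} is hypothetical, and the
@code{#ifdef} guard is only an assumption about how a program might
cope with systems lacking @code{LC_MESSAGES}.

```c
#include <locale.h>

/* Adopt the user's locale for character classification and messages
   only.  LC_NUMERIC and the other categories keep the "C" locale the
   program starts with, so number parsing is not affected.  */
void
init_locale_selectively (void)
{
  setlocale (LC_CTYPE, "");
#ifdef LC_MESSAGES
  setlocale (LC_MESSAGES, "");
#endif
}
```

After this call, @code{setlocale (LC_NUMERIC, NULL)} still reports the
@code{"C"} locale, whatever the user's environment says.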
Note that changing the @code{LC_CTYPE} also affects the functions
declared in the @code{<ctype.h>} standard header. If this is not
desirable in your application (for example in a compiler's parser),
you can use a set of substitute functions which hardwire the C locale,
such as found in the @file{c-ctype.h} and @file{c-ctype.c} files
in the gettext source distribution.

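To show what such substitute functions amount to, here is a small C
sketch. It is not the code found in the gettext sources; the
@code{ascii_} names are invented for this example, and the functions
assume an ASCII-compatible execution character set.

```c
/* Locale-independent replacements for a few <ctype.h> functions.
   Their results never depend on setlocale.  */
int
ascii_isdigit (int c)
{
  return c >= '0' && c <= '9';
}

int
ascii_isalpha (int c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

int
ascii_isalnum (int c)
{
  return ascii_isalpha (c) || ascii_isdigit (c);
}
```

A parser built on functions like these accepts the same identifiers in
every locale, since a byte such as 0xE7 (c-cedilla in ISO-8859-1) is
never classified as a letter.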
It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
normally avoided because a @code{setlocale} call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.

1448
@node Mark Keywords, Marking, Triggering, Sources
1449
@section How Marks Appear in Sources
1451
All strings requiring translation should be marked in the C sources. Marking
1452
is done in such a way that each translatable string appears to be
1453
the sole argument of some function or preprocessor macro. There are
1454
only a few such possible functions or macros meant for translation,
1455
and their names are said to be marking keywords. The marking is
1456
attached to strings themselves, rather than to what we do with them.
1457
This approach has more uses. A blatant example is an error message
1458
produced by formatting. The format string needs translation, as
1459
well as some strings inserted through some @samp{%s} specification
1460
in the format, while the result from @code{sprintf} may have so many
1461
different instances that it is impractical to list them all in some
1462
@samp{error_string_out()} routine, say.
1464
This marking operation has two goals. The first goal of marking
1465
is for triggering the retrieval of the translation, at run time.
1466
The keyword is possibly resolved into a routine able to dynamically
1467
return the proper translation, as far as possible or wanted, for the
1468
argument string. Most localizable strings are found in executable
1469
positions, that is, attached to variables or given as parameters to
1470
functions. But this is not universal usage, and some translatable
1471
strings appear in structured initializations. @xref{Special cases}.
1473
The second goal of the marking operation is to help @code{xgettext}
1474
at properly extracting all translatable strings when it scans a set
1475
of program sources and produces PO file templates.
1477
The canonical keyword for marking translatable strings is
1478
@samp{gettext}; it gave its name to the whole GNU @code{gettext}
1479
package. For packages making only light use of the @samp{gettext}
1480
keyword, macro or function, it is easily used @emph{as is}. However,
1481
for packages using the @code{gettext} interface more heavily, it
1482
is usually more convenient to give the main keyword a shorter, less
1483
obtrusive name. Indeed, the keyword might appear on a lot of strings
1484
all over the package, and programmers usually do not want nor need
1485
their program sources to remind them forcefully, all the time, that they
1486
are internationalized. Further, a long keyword has the disadvantage
1487
of using more horizontal space, forcing more indentation work on
1488
sources for those trying to keep them within 79 or 80 columns.
1490
Many packages use @samp{_} (a simple underline) as a keyword,
1491
and write @samp{_("Translatable string")} instead of @samp{gettext
1492
("Translatable string")}. Further, the coding rule, from GNU standards,
1493
requiring a space between the keyword and the opening
parenthesis, is relaxed, in practice, for this particular usage.
1495
So, the textual overhead per translatable string is reduced to
1496
only three characters: the underline and the two parentheses.
1497
However, even if GNU @code{gettext} uses this convention internally,
1498
it does not offer it officially. The real, genuine keyword is truly
1499
@samp{gettext} indeed. It is fairly easy for those wanting to use
1500
@samp{_} instead of @samp{gettext} to declare:
1503
#include <libintl.h>
1504
#define _(String) gettext (String)
1508
instead of merely using @samp{#include <libintl.h>}.
1510
Later on, the maintenance is relatively easy. If, as a programmer,
1511
you add or modify a string, you will have to ask yourself if the
1512
new or altered string requires translation, and include it within
1513
@samp{_()} if you think it should be translated. @samp{"%s: %d"} is
1514
an example of a string @emph{not} requiring translation!
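
The following sketch illustrates the convention, assuming a system that
provides @code{<libintl.h>} (the GNU C library or GNU @code{libintl}).
The function @code{format_diagnostic} and its message text are
hypothetical. Only the sentence is wrapped in @samp{_()}, while the
purely technical prefix is left unmarked.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper: write "FILE:LINE: message" into BUF and return
   the number of characters the complete text needs.  */
static int
format_diagnostic (char *buf, size_t size, const char *file, int line)
{
  assert (buf != NULL && size > 0 && file != NULL);

  /* Not marked: nothing here could be translated.  */
  int used = snprintf (buf, size, "%s:%d: ", file, line);
  if (used < 0 || (size_t) used >= size)
    return used;

  /* Marked: xgettext extracts this string, and gettext looks it up at
     run time.  Without a message catalog, gettext returns its
     argument unchanged.  */
  int rest = snprintf (buf + used, size - (size_t) used, "%s",
                       _("cannot open file"));
  return rest < 0 ? rest : used + rest;
}
```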
1516
@node Marking, c-format, Mark Keywords, Sources
1517
@section Marking Translatable Strings
1519
In PO mode, one set of features is meant more for the programmer than
1520
for the translator, and allows him to interactively mark which strings,
1521
in a set of program sources, are translatable, and which are not.
1522
Even if it is a fairly easy job for a programmer to find and mark
1523
such strings by other means, using any editor of his choice, PO mode
1524
makes this work more comfortable. Further, this gives translators
1525
who feel a little like programmers, or programmers who feel a little
1526
like translators, a tool letting them work at marking translatable
1527
strings in the program sources, while simultaneously producing a set of
1528
translations in some language, for the package being internationalized.
1530
The set of program sources targeted by the PO mode commands described
here should have an Emacs tags table constructed for your project,
1532
prior to using these PO file commands. This is easy to do. In any
1533
shell window, change the directory to the root of your project, then
1534
execute a command resembling:
1537
etags src/*.[hc] lib/*.[hc]
1541
presuming here you want to process all @file{.h} and @file{.c} files
1542
from the @file{src/} and @file{lib/} directories. This command will
1543
explore all said files and create a @file{TAGS} file in your root
1544
directory, somewhat summarizing the contents using a special file
1545
format Emacs can understand.
1547
For packages following the GNU coding standards, there is
1548
a make goal @code{tags} or @code{TAGS} which constructs the tag files in
1549
all directories and for all files containing source code.
1551
Once your @file{TAGS} file is ready, the following commands assist
1552
the programmer in marking translatable strings in his set of sources.
1553
But these commands are necessarily driven from within a PO file
1554
window, and it is likely that you do not even have such a PO file yet.
1555
This is not a problem at all, as you may safely open a new, empty PO
1556
file, mainly for using these commands. This empty PO file will slowly
1557
fill in while you mark strings as translatable in your program sources.
1561
Search through program sources for a string which looks like a
1562
candidate for translation.
1565
Mark the last string found with @samp{_()}.
1568
Mark the last string found with a keyword taken from a set of possible
1569
keywords. This command with a prefix allows some management of these
1574
The @kbd{,} (@code{po-tags-search}) command searches for the next
1575
occurrence of a string which looks like a possible candidate for
1576
translation, and displays the program source in another Emacs window,
1577
positioned in such a way that the string is near the top of this other
1578
window. If the string is too big to fit whole in this window, it is
1579
positioned so only its end is shown. In any case, the cursor
1580
is left in the PO file window. If the shown string would be better
1581
presented differently in different native languages, you may mark it
1582
using @kbd{M-,} or @kbd{M-.}. Otherwise, you might rather ignore it
1583
and skip to the next string by merely repeating the @kbd{,} command.
1585
A string is a good candidate for translation if it contains a sequence
1586
of three or more letters. A string containing at most two letters in
1587
a row will be considered as a candidate if it has more letters than
1588
non-letters. The command disregards strings containing no letters,
1589
or isolated letters only. It also disregards strings within comments,
1590
or strings already marked with some keyword PO mode knows (see below).
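
The letter-counting rule above can be sketched in C as follows. This is
only an illustration of the rule as stated in this section, not the
Emacs Lisp code of PO mode. The name @code{looks_translatable} is
hypothetical, and the reading that a longest run of exactly two letters
triggers the comparison of letters against non-letters is an assumption.
The checks for comments and for already marked strings are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether the contents of a string
   literal look like a candidate for translation.  */
static bool
looks_translatable (const char *s)
{
  size_t letters = 0;   /* total number of letters */
  size_t others = 0;    /* total number of non-letters */
  size_t run = 0;       /* length of the current run of letters */
  size_t longest = 0;   /* length of the longest run seen so far */

  assert (s != NULL);
  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return true;               /* three or more letters in a row */
  if (longest == 2)
    return letters > others;   /* short runs: letters must dominate */
  return false;                /* no letters, or isolated letters only */
}
```

Under this reading, @code{"Translatable string"} is a candidate, while
@code{"%s: %d"} is not, since it contains isolated letters only.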
1592
If you have never told Emacs about some @file{TAGS} file to use, the
1593
command will request that you specify one from the minibuffer, the
1594
first time you use the command. You may later change your @file{TAGS}
1595
file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
1596
which will ask you to name the precise @file{TAGS} file you want
1597
to use. @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.
1599
Each time you use the @kbd{,} command, the search resumes from where it was
1600
left by the previous search, and goes through all program sources,
1601
obeying the @file{TAGS} file, until all sources have been processed.
1602
However, by giving a prefix argument to the command @w{(@kbd{C-u
1603
,})}, you may request that the search be restarted all over again
1604
from the first program source; but in this case, strings that you
1605
recently marked as translatable will be automatically skipped.
1607
Using this @kbd{,} command does not prevent the use of other regular
1608
Emacs tags commands. For example, regular @code{tags-search} or
1609
@code{tags-query-replace} commands may be used without disrupting the
1610
independent @kbd{,} search sequence. However, as implemented, the
@emph{initial} @kbd{,} command (or the @kbd{,} command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.
1615
The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
1616
recently found string with the @samp{_} keyword. The @kbd{M-.}
1617
(@code{po-select-mark-and-mark}) command will request that you type
1618
one keyword from the minibuffer and use that keyword for marking
1619
the string. Both commands will automatically create a new PO file
1620
untranslated entry for the string being marked, and make it the
1621
current entry (making it easy for you to immediately proceed to its
1622
translation, if you feel like doing it right away). It is possible
1623
that the modifications made to the program source by @kbd{M-,} or
1624
@kbd{M-.} render some source line longer than 80 columns, forcing you
1625
to break and re-indent this line differently. You may use the @kbd{O}
1626
command from PO mode, or any other window changing command from
1627
Emacs, to break out into the program source window, and do any
1628
needed adjustments. You will have to use some regular Emacs command
1629
to return the cursor to the PO file window, if you want command
1630
@kbd{,} for the next string, say.
1632
The @kbd{M-.} command has a few built-in speedups, so you do not
1633
have to explicitly type all keywords all the time. The first such
1634
speedup is that you are presented with a @emph{preferred} keyword,
1635
which you may accept by merely typing @kbd{@key{RET}} at the prompt.
1636
The second speedup is that you may type any non-ambiguous prefix of the
1637
keyword you really mean, and the command will complete it automatically
1638
for you. This also means that PO mode has to @emph{know} all
1639
your possible keywords, and that it will not accept mistyped keywords.
1641
If you reply @kbd{?} to the keyword request, the command gives a
1642
list of all known keywords, from which you may choose. When the
1643
command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
1644
updating any program source or PO file buffer, and does some simple
1645
keyword management instead. In this case, the command asks for a
1646
keyword, written in full, which becomes a new allowed keyword for
1647
later @kbd{M-.} commands. Moreover, this new keyword automatically
1648
becomes the @emph{preferred} keyword for later commands. By typing
1649
an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
1650
changes the @emph{preferred} keyword and does nothing more.
1652
All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
1653
when scanning for strings, and strings already marked by any of those
1654
known keywords are automatically skipped. If many PO files are opened
1655
simultaneously, each one has its own independent set of known keywords.
1656
There is no provision in PO mode, currently, for deleting a known
1657
keyword; you have to quit the file (maybe using @kbd{q}) and reopen
1658
it afresh. When a PO file is newly brought up in an Emacs window, only
1659
@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
1660
is preferred for the @kbd{M-.} command. In fact, it is not useful to
prefer @samp{_}, as this one is already built into the @kbd{M-,} command.
1663
@node c-format, Special cases, Marking, Sources
1664
@section Special Comments preceding Keywords
1666
@c FIXME document c-format and no-c-format.
1668
In C programs strings are often used within calls of functions from the
1669
@code{printf} family. The special thing about these format strings is
1670
that they can contain format specifiers introduced with @kbd{%}. Assume
we have the following code:
1674
printf (gettext ("String `%s' has %d characters\n"), s, strlen (s));
1678
A possible German translation for the above string might be:
1681
"%d Zeichen lang ist die Zeichenkette `%s'"
1684
A C programmer, even if he cannot speak German, will recognize that
1685
there is something wrong here. The order of the two format specifiers
has changed, but of course the order of the arguments in the
@code{printf} call has not.
This will most probably lead to problems because now the length of the
string is regarded as the address.
1690
To prevent errors at runtime caused by translations the @code{msgfmt}
1691
tool can check statically whether the arguments in the original and the
1692
translation string match in type and number. If this is not the case, a
warning will be given and the error cannot cause problems at runtime.
1696
For the word order in the above German translation to be correct, one
would have to write:
1700
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
1704
The routines in @code{msgfmt} know about this special notation.
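
The effect of the numbered notation can be tried out with the sketch
below. It assumes a C library whose @code{printf} family implements the
POSIX numbered argument specifiers, which ISO C alone does not
guarantee. The helper name @code{format_length_report} is hypothetical,
and the format string is passed in as a parameter to stand for whatever
@code{gettext} would return.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a report about S into BUF.  FORMAT is
   the original or the translated format string.  The arguments are
   always passed in the order (string, length), whatever order the
   translation mentions them in.  */
static int
format_length_report (char *buf, size_t size, const char *format,
                      const char *s)
{
  assert (buf != NULL && format != NULL && s != NULL);
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```

With the original format string the arguments are consumed in order.
With the German format string above, @samp{%2$d} selects the length and
@samp{%1$s} selects the string, so no argument is misinterpreted.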
1706
Because not all strings in a program must be format strings it is not
1707
useful for @code{msgfmt} to test all the strings in the @file{.po} file.
1708
This might cause problems because the string might contain what looks
1709
like a format specifier, but the string is not used in @code{printf}.
1711
Therefore @code{xgettext} adds a special tag to those messages it
1712
thinks might be a format string. There is no absolute rule for this,
1713
only a heuristic. In the @file{.po} file the entry is marked using the
1714
@code{c-format} flag in the @kbd{#,} comment line (@pxref{PO Files}).
1716
The careful reader now might say that this again can cause problems.
1717
The heuristic might guess it wrong. This is true and therefore
@code{xgettext} knows about a special kind of comment which lets
the programmer take over the decision. If in the same line as, or
in the line immediately preceding, the @code{gettext} keyword
the @code{xgettext} program finds a comment containing the words
@kbd{xgettext:c-format}, it will mark the string in any case with
the @kbd{c-format} flag. This kind of comment should be used when
@code{xgettext} does not recognize the string as a format string but
it really is one and it should be tested. Please note that when the
comment is in the same line as the @code{gettext} keyword, it must be
before the string to be translated.
1729
This situation happens quite often. The @code{printf} function is often
1730
called with strings which do not contain a format specifier. Of course
1731
one would normally use @code{fputs} but it does happen. In this case
1732
@code{xgettext} does not recognize this as a format string but what
1733
happens if the translation introduces a valid format specifier? The
1734
@code{printf} function will try to access one of the parameters but none
1735
exists because the original code does not refer to any parameter.
1737
@code{xgettext} of course could make a wrong decision the other way
1738
round, i.e. a string marked as a format string actually is not a format
1739
string. In this case @code{msgfmt} might give too many warnings and
1740
would prevent translating the @file{.po} file. The method to prevent
1741
this wrong decision is similar to the one used above, only the comment
1742
to use must contain the string @kbd{xgettext:no-c-format}.
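
As a sketch of how both comments might be placed, consider the fragment
below. The array @code{status_labels}, its strings and the accessor
@code{status_label} are hypothetical. The first string contains a
@samp{%} that is not meant as a format specifier, the second one is
meant to be passed to @code{printf} although it contains no specifier.

```c
#include <assert.h>
#include <stddef.h>

/* No-op marker; it only makes the strings visible to xgettext.  */
#define gettext_noop(String) (String)

static const char *const status_labels[] = {
  /* xgettext:no-c-format */
  gettext_noop ("Disk usage is above 90% of capacity"),
  /* xgettext:c-format */
  gettext_noop ("Operation finished\n")
};

/* Hypothetical accessor: return the untranslated label number I.  */
static const char *
status_label (size_t i)
{
  assert (i < sizeof status_labels / sizeof status_labels[0]);
  return status_labels[i];
}
```

Each comment sits in the line immediately preceding the keyword it
applies to, which is one of the two placements described above.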
1744
If a string is marked with @kbd{c-format} and this is not correct the
1745
user can find out who is responsible for the decision. See
1746
@ref{xgettext Invocation} to see how the @kbd{--debug} option can be
1747
used for solving this problem.
1749
@node Special cases, , c-format, Sources
1750
@section Special Cases of Translatable Strings
1752
The attentive reader might now point out that it is not always possible
1753
to mark translatable strings with @code{gettext} or something like this.
1754
Consider the following case:
1759
static const char *messages[] = @{
1760
"some very meaningful message",
1766
= index > 1 ? "a default message" : messages[index];
1774
While it is no problem to mark the string @code{"a default message"} it
1775
is not possible to mark the string initializers for @code{messages}.
1776
What is to be done? We have to fulfill two tasks. First we have to mark the
1777
strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
1778
can find them, and second we have to translate the strings at runtime
1779
before printing them.
1781
The first task can be fulfilled by creating a new keyword, which names a
1782
no-op. For the second we have to mark all access points to a string
1783
from the array. So one solution can look like this:
1787
#define gettext_noop(String) (String)
1790
static const char *messages[] = @{
1791
gettext_noop ("some very meaningful message"),
1792
gettext_noop ("and another one")
1797
= index > 1 ? gettext ("a default message") : gettext (messages[index]);
1805
Please convince yourself that the string which is written by
1806
@code{fputs} is translated in any case. How to make @code{xgettext} aware of
1807
the additional keyword @code{gettext_noop} is explained in @ref{xgettext
1810
The above is of course not the only solution. You could also come along
1811
with the following one:
1815
#define gettext_noop(String) (String)
1818
static const char *messages[] = @{
1819
gettext_noop ("some very meaningful message"),
1820
gettext_noop ("and another one")
1825
= index > 1 ? gettext_noop ("a default message") : messages[index];
1827
fputs (gettext (string));
1833
But this has some drawbacks. First the programmer has to take care that
1834
he uses @code{gettext_noop} for the string @code{"a default message"}.
1835
A use of @code{gettext} could have in rare cases unpredictable results.
1836
The second drawback is found in the internals of the GNU @code{gettext}
1837
Library which will make this solution less efficient.
1839
One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case. But this analysis is
generally not very difficult. If it should be difficult in some situation,
you can use this second method there.
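
For reference, the first solution might be assembled into a complete
unit as sketched below, assuming a system that provides
@code{<libintl.h>}. The function name @code{select_message} is
hypothetical. It wraps the conditional expression from the example so
that every access point to the array goes through @code{gettext}.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* No-op marker: makes the initializers visible to xgettext without
   calling any function at initialization time.  */
#define gettext_noop(String) (String)

static const char *messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical helper: return the translated message for INDEX, or a
   translated default message when INDEX is past the end of the array.
   INDEX is assumed not to be negative.  */
static const char *
select_message (int index)
{
  assert (index >= 0);
  return index > 1
         ? gettext ("a default message")
         : gettext (messages[index]);
}
```

Without a message catalog @code{gettext} returns its argument, so the
function then yields the English strings unchanged.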
1844
@node Template, Creating, Sources, Top
1845
@chapter Making the PO Template File
1847
After preparing the sources, the programmer creates a PO template file.
1848
This section explains how to use @code{xgettext} for this purpose.
1853
* xgettext Invocation:: Invoking the @code{xgettext} Program
1856
@node xgettext Invocation, , Template, Template
1857
@section Invoking the @code{xgettext} Program
1862
xgettext [@var{option}] @var{inputfile} @dots{}
1867
@itemx --extract-all
1868
Extract all strings.
1870
@item -c [@var{tag}]
1871
@itemx --add-comments[=@var{tag}]
1872
Place comment block with @var{tag} (or those preceding keyword lines)
1877
Recognize C++ style comments.
1880
Use the flags @kbd{c-format} and @kbd{possible-c-format} to show who was
1881
responsible for marking a message as a format string. The latter form is
used if the @code{xgettext} program decided, the former form is used if
the programmer prescribed it.
1885
By default only the @kbd{c-format} form is used. The translator should
1886
not have to care about these details.
1889
@itemx --default-domain=@var{name}
1890
Use @file{@var{name}.po} for output (instead of @file{messages.po}).
1892
The special domain name @file{-} or @file{/dev/stdout} means to write
1893
the output to @file{stdout}.
1895
@item -D @var{directory}
1896
@itemx --directory=@var{directory}
1897
Change to @var{directory} before beginning to search and scan source
1898
files. The resulting @file{.po} file will be written relative to the
1899
original directory, though.
1902
@itemx --files-from=@var{file}
1903
Read the names of the input files from @var{file} instead of getting
1904
them from the command line.
1907
Always write an output file even if no message is defined.
1911
Display this help and exit.
1914
@itemx --input-path=@var{list}
1915
List of directories searched for input files.
1918
@itemx --join-existing
1919
Join messages with existing file.
1922
@itemx --keyword[=@var{keywordspec}]
1923
Additional keyword to be looked for (without @var{keywordspec} means not to
1924
use default keywords).
1926
If @var{keywordspec} is a C identifier @var{id}, @code{xgettext} looks
1927
for strings in the first argument of each call to the function or macro
1928
@var{id}. If @var{keywordspec} is of the form
1929
@samp{@var{id}:@var{argnum}}, @code{xgettext} looks for strings in the
1930
@var{argnum}th argument of the call. If @var{keywordspec} is of the form
1931
@samp{@var{id}:@var{argnum1},@var{argnum2}}, @code{xgettext} looks for
1932
strings in the @var{argnum1}st argument and in the @var{argnum2}nd argument
1933
of the call, and treats them as singular/plural variants for a message
1934
with plural handling.
1936
The default keyword specifications, which are always looked for if not
1937
explicitly disabled, are @code{gettext}, @code{dgettext:2},
1938
@code{dcgettext:2}, @code{ngettext:1,2}, @code{dngettext:2,3},
1939
@code{dcngettext:2,3}, and @code{gettext_noop}.
1941
@item -m [@var{string}]
1942
@itemx --msgstr-prefix[=@var{string}]
1943
Use @var{string} or "" as prefix for msgstr entries.
1945
@item -M [@var{string}]
1946
@itemx --msgstr-suffix[=@var{string}]
1947
Use @var{string} or "" as suffix for msgstr entries.
1950
Do not write @samp{#: @var{filename}:@var{line}} lines.
1953
@itemx --add-location
1954
Generate @samp{#: @var{filename}:@var{line}} lines (default).
1957
Don't write header with @samp{msgid ""} entry.
1959
This is useful for testing purposes because it eliminates a source
1960
of variance for generated @code{.gmo} files. We can ship some of
1961
these files in the GNU @code{gettext} package, and the result of
1962
regenerating them through @code{msgfmt} should yield the same values.
1965
@itemx --output-dir=@var{dir}
1966
Output files will be placed in directory @var{dir}.
1969
@itemx --sort-output
1970
Generate sorted output and remove duplicates.
1973
Write out a strict Uniforum conforming PO file.
1977
Output version information and exit.
1980
@itemx --exclude-file=@var{file}
1981
Entries from @var{file} are not extracted.
1985
Search path for supplementary PO files is:
1986
@file{/usr/local/share/nls/src/}.
1988
If @var{inputfile} is @samp{-}, standard input is read.
1990
This implementation of @code{xgettext} is able to process a few awkward
1991
cases, like strings in preprocessor macros, ANSI concatenation of
1992
adjacent strings, and escaped end of lines for continued strings.
1994
@node Creating, Updating, Template, Top
1995
@chapter Creating a New PO File
1997
When starting a new translation, the translator copies the
1998
@file{@var{package}.pot} template file to a file called
1999
@file{@var{LANG}.po}. Then she modifies the initial comments and
2000
the header entry of this file.
2002
The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
2003
"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible
2004
information. This can be done in any text editor; if Emacs is used
2005
and it switched to PO mode automatically (because it has recognized
2006
the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.
2008
Modifying the header entry can already be done using PO mode: in Emacs,
2009
type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the
2010
entry. You should fill in the following fields.
2013
@item Project-Id-Version
2014
This is the name and version of the package.
2016
@item POT-Creation-Date
2017
This has already been filled in by @code{xgettext}.
2019
@item PO-Revision-Date
2020
You don't need to fill this in. It will be filled by the Emacs PO mode
2021
when you save the file.
2023
@item Last-Translator
2024
Fill in your name and email address (without double quotes).
2027
Fill in the English name of the language, and the email address of the
2028
language team you are part of.
2030
Before starting a translation, it is a good idea to get in touch with
2031
your translation team, not only to make sure you don't do duplicated work,
2032
but also to coordinate difficult linguistic issues.
2034
In the Free Translation Project, each translation team has its own mailing
2035
list. The up-to-date list of teams can be found at the Free Translation
2036
Project's homepage, @file{http://www.iro.umontreal.ca/contrib/po/HTML/},
2037
in the "National teams" area.
2040
Replace @samp{CHARSET} with the character encoding used for your language,
2041
in your locale, or UTF-8. This field is needed for correct operation of the
2042
@code{msgmerge} and @code{msgfmt} programs, as well as for users whose
2043
locale's character encoding differs from yours (see @ref{Charset conversion}).
2045
You get the character encoding of your locale by running the shell command
2046
@samp{locale charmap}. If the result is @samp{C} or @samp{ANSI_X3.4-1968},
2047
which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your
2048
locale is not correctly configured. In this case, ask your translation
2049
team which charset to use. @samp{ASCII} is not usable for any language
2052
Because the PO files must be portable to operating systems with less advanced
2053
internationalization facilities, the character encodings that can be used
2054
are limited to those supported by both GNU @code{libc} and GNU
2055
@code{libiconv}. These are:
2056
@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},
2057
@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},
2058
@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-15},
2059
@code{KOI8-R}, @code{KOI8-U}, @code{CP850}, @code{CP866}, @code{CP874},
2060
@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},
2061
@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},
2062
@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},
2063
@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},
2064
@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{UTF-8}.
2066
@c This data is taken from glibc/localedata/SUPPORTED.
2067
In the GNU system, the following encodings are frequently used for the
2068
corresponding languages.
2071
@item @code{ISO-8859-1} for
2072
Afrikaans, Albanian, Basque, Catalan, Dutch, English, Estonian, Faroese,
2073
Finnish, French, Galician, German, Greenlandic, Icelandic, Indonesian,
2074
Irish, Italian, Malay, Norwegian, Portuguese, Spanish, Swedish,
2075
@item @code{ISO-8859-2} for
2076
Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian,
2077
@item @code{ISO-8859-3} for Maltese,
2078
@item @code{ISO-8859-5} for Macedonian, Serbian,
2079
@item @code{ISO-8859-6} for Arabic,
2080
@item @code{ISO-8859-7} for Greek,
2081
@item @code{ISO-8859-8} for Hebrew,
2082
@item @code{ISO-8859-9} for Turkish,
2083
@item @code{ISO-8859-13} for Latvian, Lithuanian,
2084
@item @code{ISO-8859-15} for
2085
Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
2086
Italian, Portuguese, Spanish, Swedish,
2087
@item @code{KOI8-R} for Russian,
2088
@item @code{KOI8-U} for Ukrainian,
2089
@item @code{CP1251} for Bulgarian, Byelorussian,
2090
@item @code{GB2312}, @code{GBK}, @code{GB18030}
2091
for simplified writing of Chinese,
2092
@item @code{BIG5}, @code{BIG5-HKSCS}
2093
for traditional writing of Chinese,
2094
@item @code{EUC-JP} for Japanese,
2095
@item @code{EUC-KR} for Korean,
2096
@item @code{TIS-620} for Thai,
2097
@item @code{UTF-8} for any language, including those listed above.
2100
When single quote characters or double quote characters are used in
2101
translations for your language, and your locale's encoding is one of the
2102
ISO-8859-* charsets, it is best if you create your PO files in UTF-8
2103
encoding, instead of your locale's encoding. This is because in UTF-8
2104
the real quote characters can be represented (single quote characters:
2105
U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
2106
ISO-8859-* charsets has them all. Users in UTF-8 locales will see the
2107
real quote characters, whereas users in ISO-8859-* locales will see the
2108
vertical apostrophe and the vertical double quote instead (because that's
2109
what the character set conversion will transliterate them to).
2111
To enter such quote characters under X11, you can change your keyboard
2112
mapping using the @code{xmodmap} program. The X11 names of the quote
2113
characters are "leftsinglequotemark", "rightsinglequotemark",
2114
"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
2115
"doublelowquotemark".
2117
Note that only recent versions of GNU Emacs support the UTF-8 encoding:
2118
Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't
2119
support the UTF-8 encoding.
2121
The character encoding name can be written in either upper or lower case.
2122
Usually upper case is preferred.
2124
@item Content-Transfer-Encoding
2125
Set this to @code{8bit}.
2128
This field is optional. It is only needed if the PO file has plural forms.
2129
You can find them by searching for the @samp{msgid_plural} keyword. The
2130
format of the plural forms field is described in @ref{Plural forms}.
2133
@node Updating, Binaries, Creating, Top
2134
@chapter Updating Existing PO Files
2139
* msgmerge Invocation:: Invoking the @code{msgmerge} Program
2140
* Translated Entries:: Translated Entries
2141
* Fuzzy Entries:: Fuzzy Entries
2142
* Untranslated Entries:: Untranslated Entries
2143
* Obsolete Entries:: Obsolete Entries
2144
* Modifying Translations:: Modifying Translations
2145
* Modifying Comments:: Modifying Comments
2146
* Subedit:: Mode for Editing Translations
2147
* C Sources Context:: C Sources Context
2148
* Auxiliary:: Consulting Auxiliary PO Files
2149
* Compendium:: Using Translation Compendiums
2152
@node msgmerge Invocation, Translated Entries, Updating, Updating
2153
@section Invoking the @code{msgmerge} Program
2159
@c tupdate --version
2160
@c tupdate @var{new} @var{old}
2163
@c File @var{new} is the last created PO file (generally by
2164
@c @code{xgettext}). It need not contain any translations. File
2165
@c @var{old} is the PO file including the old translations which will
2166
@c be taken over to the newly created file as long as they still match.
2168
@c When English messages change in the programs, this is reflected in
2169
@c the PO file as extracted by @code{xgettext}. In large messages, that
2170
@c can be hard to detect, and will obviously result in an incomplete
2171
@c translation. One of the virtues of @code{tupdate} is that it detects
2172
@c such changes, saving the previous translation into a PO file comment,
2173
@c so marking the entry as obsolete, and giving the modified string with
2174
@c an empty translation, that is, marking the entry as untranslated.

@node Translated Entries, Fuzzy Entries, msgmerge Invocation, Updating
@section Translated Entries

Each PO file entry for which the @code{msgstr} field has been filled with
a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
is said to be a @dfn{translated} entry. Only translated entries will
later be compiled by GNU @code{msgfmt} and become usable in programs.
Other entry types will be excluded; translation will not occur for them.
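
As a minimal illustration, a translated entry might look like the
following sketch. The source reference, the message, and the French
translation are all hypothetical.

@example
#: src/hello.c:42
msgid "File not found"
msgstr "Fichier introuvable"
@end example

@noindent
The @code{msgstr} field is filled in and no @samp{#, fuzzy} comment
precedes the entry, so it counts as translated.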

Some commands are more specifically related to translated entry processing.

@table @kbd
@item t
Find the next translated entry.

@item M-t
Find the previous translated entry.

@end table

The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{M-t}
(@code{po-previous-translated-entry}) move forwards or backwards, looking
for a translated entry. If none is found, the search is extended and
wraps around in the PO file buffer.

Translated entries usually result from the translator having edited in
a translation for them (@pxref{Modifying Translations}). However, if the
variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, an entry that
receives a new translation first becomes a fuzzy entry, which must
later be unfuzzied before it becomes an official, genuine translated entry.
@xref{Fuzzy Entries}.

@node Fuzzy Entries, Untranslated Entries, Translated Entries, Updating
@section Fuzzy Entries

Each PO file entry may have a set of @dfn{attributes}, which are
qualities given a name and explicitly associated with the translation,
using a special system comment. One of these attributes
has the name @code{fuzzy}, and entries having this attribute are said
to have a fuzzy translation. They are called fuzzy entries, for short.

Fuzzy entries, even if they count as translated entries for
most other purposes, usually call for revision by the translator.
They may be produced by applying the program @code{msgmerge} to
update an older translated PO file according to a new PO template
file, when this tool hypothesizes that some new @code{msgid} has
been modified only slightly from an older one, and chooses to pair
what it thinks is the old translation with the new, modified entry.
The slight alteration in the original string (the @code{msgid} string)
should often be reflected in the translated string, and this requires
the intervention of the translator. For this reason, @code{msgmerge}
might mark some entries as being fuzzy.
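
To make this concrete, here is a hedged sketch of what such an entry
might look like after merging. It assumes, purely for illustration, that
the programmer changed a hypothetical message from @samp{Unable to open
file} to @samp{Unable to open the input file}, and that @code{msgmerge}
paired the old French translation with the new string.

@example
#: src/io.c:87
#, fuzzy
msgid "Unable to open the input file"
msgstr "Impossible d'ouvrir le fichier"
@end example

@noindent
The @samp{#, fuzzy} line is the system comment carrying the attribute.
The translation no longer mentions the input file, so the translator
would revise it before removing the fuzzy mark.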

Also, the translator may decide herself to mark an entry as fuzzy
for her own convenience, when she wants to remember that the entry
has to be revisited later. So, some commands are more specifically
related to fuzzy entry processing.

@table @kbd
@item f
Find the next fuzzy entry.

@item M-f
Find the previous fuzzy entry.

@item @key{TAB}
Remove the fuzzy attribute of the current entry.

@end table

The commands @kbd{f} (@code{po-next-fuzzy}) and @kbd{M-f}
(@code{po-previous-fuzzy}) move forwards or backwards, looking for
a fuzzy entry. If none is found, the search is extended and wraps
around in the PO file buffer.

The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
attribute associated with an entry, usually leaving it translated.
Further, if the variable @code{po-auto-select-on-unfuzzy} is not
@code{nil}, the @kbd{@key{TAB}} command will automatically look
for another interesting entry to work on. The initial value of
@code{po-auto-select-on-unfuzzy} is @code{nil}.

The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}. However,
if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
ensure some kind of double check later. In this case, the usual paradigm
is that an entry becomes fuzzy (if not already) whenever the translator
modifies it. If she is satisfied with the translation, she then uses
@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
in the same stroke. If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
to move to another entry, leaving the entry fuzzy.

The translator may also use the @kbd{@key{DEL}} command
(@code{po-fade-out-entry}) over any translated entry to mark it as being
fuzzy, as an easy way to leave a reminder that she wants to return
to this entry later.

Also, when time comes to quit working on a PO file buffer with the @kbd{q}
command, the translator is asked for confirmation if some fuzzy string
still exists.

@node Untranslated Entries, Obsolete Entries, Fuzzy Entries, Updating
@section Untranslated Entries

When @code{xgettext} originally creates a PO file, unless told
otherwise, it initializes the @code{msgid} field with the untranslated
string, and leaves the @code{msgstr} string empty. Such entries,
having an empty translation, are said to be @dfn{untranslated} entries.
Later, when the programmer slightly modifies some string right in
the program, this change is reflected in the PO file
by the appearance of a new untranslated entry for the modified string.

The usual commands moving from entry to entry consider untranslated
entries on the same level as active entries. Untranslated entries
are easily recognizable by the fact that they end with @w{@samp{msgstr ""}}.
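
For instance, an untranslated entry as first written by @code{xgettext}
might look like this sketch, where the source reference and the message
are hypothetical:

@example
#: src/io.c:87
msgid "Unable to open the input file"
msgstr ""
@end example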

The work of the translator might be (quite naively) seen as the process
of seeking an untranslated entry, editing a translation for
it, and repeating these actions until no untranslated entries remain.
Some commands are more specifically related to untranslated entry
processing.

@table @kbd
@item u
Find the next untranslated entry.

@item M-u
Find the previous untranslated entry.

@item k
Turn the current entry into an untranslated one.

@end table

The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{M-u}
(@code{po-previous-untranslated-entry}) move forwards or backwards,
looking for an untranslated entry. If none is found, the search is
extended and wraps around in the PO file buffer.

An entry can be turned back into an untranslated entry by
merely emptying its translation, using the command @kbd{k}
(@code{po-kill-msgstr}). @xref{Modifying Translations}.

Also, when time comes to quit working on a PO file buffer
with the @kbd{q} command, the translator is asked for confirmation,
if some untranslated string still exists.

@node Obsolete Entries, Modifying Translations, Untranslated Entries, Updating
@section Obsolete Entries

By @dfn{obsolete} PO file entries, we mean those entries which are
commented out, usually by @code{msgmerge} when it finds that the
translation is not needed anymore by the package being localized.

The usual commands moving from entry to entry consider obsolete
entries on the same level as active entries. Obsolete entries are
easily recognizable by the fact that all their lines start with
@kbd{#}, even those lines containing @code{msgid} or @code{msgstr}.
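
As an illustration only, an obsolete entry might look like the sketch
below. It assumes the @samp{#~} prefix that recent versions of
@code{msgmerge} write. The message and its translation are
hypothetical.

@example
#~ msgid "Unable to open file"
#~ msgstr "Impossible d'ouvrir le fichier"
@end example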

Commands exist for emptying the translation or reinitializing it
to the original untranslated string. Commands interfacing with the
kill ring may force some previously saved text into the translation.
The user may interactively edit the translation. All these commands
may apply to obsolete entries, carefully leaving the entry obsolete.

Moreover, some commands are more specifically related to obsolete
entry processing.

@table @kbd
@item o
Find the next obsolete entry.

@item M-o
Find the previous obsolete entry.

@item @key{DEL}
Make an active entry obsolete, or zap out an obsolete entry.

@end table

The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{M-o}
(@code{po-previous-obsolete-entry}) move forwards or backwards,
looking for an obsolete entry. If none is found, the search is
extended and wraps around in the PO file buffer.

PO mode does not provide ways for un-commenting an obsolete entry
and making it active, because this would reintroduce an original
untranslated string which does not correspond to any marked string
in the program sources. This goes with the philosophy of never
introducing useless @code{msgid} values.

However, it is possible to comment out an active entry, thus making
it obsolete. GNU @code{gettext} utilities will later react to the
disappearance of a translation by using the untranslated string.
The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry
a little further towards annihilation. If the entry is active (it is a
translated entry), then it is first made fuzzy. If it is already fuzzy,
then the entry is merely commented out, with confirmation. If the entry
is already obsolete, then it is completely deleted from the PO file.
It is easy to recycle the translation so deleted into some other PO file
entry, usually one which is untranslated. @xref{Modifying Translations}.

Here is a quite interesting problem to solve for later development of
PO mode, for those nights you are not sleepy. The idea would be that
PO mode might become bright enough, one of these days, to make good
guesses at retrieving the most probable candidate, among all obsolete
entries, for initializing the translation of a newly appeared string.
I think it might be quite a hard problem to do this algorithmically, as
we have to develop good and efficient measures of string similarity.
Right now, PO mode leaves the decision entirely to the translator
when the time comes to find the adequate obsolete translation; it
merely tries to provide handy tools for helping her to do so.

@node Modifying Translations, Modifying Comments, Obsolete Entries, Updating
@section Modifying Translations

PO mode prevents direct editing of the PO file by the usual
means Emacs gives for altering a buffer's contents. By doing so,
it aims to help the translator avoid little clerical errors
about the overall file format, or the proper quoting of strings,
as those errors would be easily made. Other kinds of errors are
still possible, but some may be caught and diagnosed by the batch
validation process, which the translator may always trigger by the
@kbd{V} command. For all other errors, the translator has to rely on
her own judgment, and also on the linguistic reports submitted to her
by the users of the translated package, having the same mother tongue.

When the time comes to create a translation, or to correct an error
diagnosed mechanically or reported by a user, the translator has to
resort to using the following commands for modifying the translations.

@table @kbd
@item @key{RET}
Interactively edit the translation.

@item @key{LFD}
Reinitialize the translation with the original, untranslated string.

@item k
Save the translation on the kill ring, and delete it.

@item w
Save the translation on the kill ring, without deleting it.

@item y
Replace the translation, taking the new from the kill ring.

@end table

The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
window meant to edit in a new translation, or to modify an already existing
translation. The new window contains a copy of the translation taken from
the current PO file entry, all ready for editing, expunged of all quoting
marks, fully modifiable and with the complete extent of Emacs modifying
commands. When the translator is done with her modifications, she may use
@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
results, or @w{@kbd{C-c C-k}} to abort her modifications. @xref{Subedit},
for more information.

The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or
reinitializes the translation with the original string. This command is
normally used when the translator wants to redo a fresh translation of
the original string, disregarding any previous work.

It is possible to arrange for the @kbd{@key{LFD}} command to be executed
automatically whenever an untranslated entry is edited. If you set
@code{po-auto-edit-with-msgid} to @code{t}, the translation gets
initialized with the original string, in case none exists already.
The default value for @code{po-auto-edit-with-msgid} is @code{nil}.

In fact, whether it is best to start a translation with an empty
string, or rather with a copy of the original string, is a matter of
taste or habit. Sometimes, the source language and the
target language are so different that it is simply best to start writing
on an empty page. At other times, the source and target languages
are so close that it would be a waste to retype a number of words
already written in the original string. A translator may also
like having the original string right under her eyes, as she will
progressively overwrite the original text with the translation, even
if this requires some extra editing work to get rid of the original.

The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
translation string, thus turning the entry into an untranslated
one. But while doing so, its previous contents are put aside in
a special place, known as the kill ring. The command @kbd{w}
(@code{po-kill-ring-save-msgstr}) also has the effect of taking a
copy of the translation onto the kill ring, but it otherwise leaves
the entry alone, and does @emph{not} remove the translation from the
entry. Both commands use exactly the Emacs kill ring, which is shared
between buffers, and which is well known already to Emacs lovers.

The translator may use @kbd{k} or @kbd{w} many times in the course
of her work, as the kill ring may hold several saved translations.
From the kill ring, strings may later be reinserted in various
Emacs buffers. In particular, the kill ring may be used for moving
translation strings between different entries of a single PO file
buffer, or if the translator is handling many such buffers at once,
even between PO files.

To facilitate exchanges with buffers which are not in PO mode, the
translation string put on the kill ring by the @kbd{k} command is fully
unquoted before being saved: external quotes are removed, multi-line
strings are concatenated, and backslash escape sequences are turned
into their corresponding characters. In the special case of obsolete
entries, the translation is also uncommented prior to saving.
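
As a sketch of this unquoting, consider a hypothetical multi-line
translation stored in the PO file as follows:

@example
msgstr ""
"Usage: prog [OPTION]\n"
"Try prog --help for more information.\n"
@end example

@noindent
Under the rules just described, the text saved on the kill ring would
be the two plain lines below, each ending with a real newline
character, with no surrounding quotes or backslash sequences:

@example
Usage: prog [OPTION]
Try prog --help for more information.
@end example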

The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the
translation of the current entry by a string taken from the kill ring.
Following Emacs terminology, we then say that the replacement
string is @dfn{yanked} into the PO file buffer.
@xref{Yanking, , , emacs, The Emacs Editor}.
The first time @kbd{y} is used, the translation receives the value of
the most recent addition to the kill ring. If @kbd{y} is typed once
again, immediately, without intervening keystrokes, the translation
just inserted is taken away and replaced by the second most recent
addition to the kill ring. By repeating @kbd{y} many times in a row,
the translator may travel along the kill ring for saved strings,
until she finds the string she really wanted.

When a string is yanked into a PO file entry, it is fully and
automatically requoted to comply with the format PO files should
have. Further, if the entry is obsolete, PO mode then appropriately
pushes the inserted string inside comments. Once again, translators
should not burden themselves with quoting considerations beyond, of
course, what the translated string itself requires for
the program using it.

Note that @kbd{k} or @kbd{w} are not the only commands pushing strings
on the kill ring, as almost any PO mode command replacing translation
strings (or the translator comments) automatically saves the old string
on the kill ring. The main exceptions to this general rule are the
yanking commands themselves.

To better illustrate the operation of killing and yanking, let's
use an actual example, taken from a common situation. When the
programmer slightly modifies some string right in the program, his
change is later reflected in the PO file by the appearance
of a new untranslated entry for the modified string, and the fact
that the entry translating the original or unmodified string becomes
obsolete. In many cases, the translator might spare herself some work
by retrieving the unmodified translation from the obsolete entry,
then initializing the untranslated entry @code{msgstr} field with
this retrieved translation. Once this is done, the obsolete entry is
not wanted anymore, and may be safely deleted.

When the translator finds an untranslated entry and suspects that a
slight variant of the translation exists, she immediately uses @kbd{m}
to mark the current entry location, then starts looking through obsolete
entries with @kbd{o}, hoping to find some translation corresponding
to the unmodified string. Once found, she uses the @kbd{@key{DEL}} command
for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}
the translation, that is, pushes the translation on the kill ring.
Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}
then @emph{yanks} the saved translation right into the @code{msgstr}
field. The translator is then free to use @kbd{@key{RET}} for fine
tuning the translation contents, and maybe to later use @kbd{u},
then @kbd{m} again, for going on with the next untranslated string.
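
The sequence above can be pictured with a small sketch using
hypothetical messages, and assuming the @samp{#~} prefix for obsolete
entries. Before the operation, the PO file might contain an
untranslated entry and an obsolete one:

@example
#: src/io.c:87
msgid "Unable to open the input file"
msgstr ""

#~ msgid "Unable to open file"
#~ msgstr "Impossible d'ouvrir le fichier"
@end example

@noindent
After @kbd{@key{DEL}} on the obsolete entry, @kbd{r}, and @kbd{y},
the obsolete entry is gone and the remaining entry would carry the
recycled translation, ready for fine tuning with @kbd{@key{RET}}:

@example
#: src/io.c:87
msgid "Unable to open the input file"
msgstr "Impossible d'ouvrir le fichier"
@end example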

When some sequence of keys has to be typed over and over again, the
translator may find it useful to become better acquainted with the Emacs
capability of learning these sequences and playing them back on request.
@xref{Keyboard Macros, , , emacs, The Emacs Editor}.

@node Modifying Comments, Subedit, Modifying Translations, Updating
@section Modifying Comments

Any translation work done seriously will raise many linguistic
difficulties, for which decisions have to be made, and the choices
further documented. These documents may be saved within the
PO file in the form of translator comments, which the translator
is free to create, delete, or modify at will. These comments may
be useful to herself when she returns to this PO file after a while.

Comments not having whitespace after the initial @samp{#}, for example,
those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
comments; they are exclusively created by other @code{gettext} tools.
So, the commands below will never alter such system added comments,
as they are not meant for the translator to modify. @xref{PO Files}.
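
The distinction can be seen in the following sketch of a single entry,
whose contents are hypothetical. Only the first line, where whitespace
follows the @samp{#}, is a translator comment that the commands of this
section would touch.

@example
# Keep the wording short, it appears in a narrow column.
#. Shown in the status bar.
#: src/ui.c:210
#, c-format
msgid "%d files read"
msgstr "%d fichiers lus"
@end example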

The following commands are somewhat similar to those modifying translations,
so the general indications given for those apply here. @xref{Modifying
Translations}.

@table @kbd
@item #
Interactively edit the translator comments.

@item K
Save the translator comments on the kill ring, and delete them.

@item W
Save the translator comments on the kill ring, without deleting them.

@item Y
Replace the translator comments, taking the new from the kill ring.

@end table

These commands parallel PO mode commands for modifying the translation
strings, and behave much the same way as they do, except that they handle
the part of PO file comments meant for translator usage, rather
than the translation strings. So, if the descriptions given below are
rather succinct, it is because the full details have already been given.
@xref{Modifying Translations}.

The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
containing a copy of the translator comments on the current PO file entry.
If there are no such comments, PO mode understands that the translator wants
to add a comment to the entry, and she is presented with an empty screen.
Comment marks (@kbd{#}) and the space following them are automatically
removed before editing, and reinstated after. For translator comments
pertaining to obsolete entries, the uncommenting and recommenting operations
are done twice. Once in the editing window, the keys @w{@kbd{C-c C-c}}
allow the translator to tell she is finished with editing the comment.
@xref{Subedit}, for further details.

Functions found on @code{po-subedit-mode-hook}, if any, are executed after
the string has been inserted in the edit buffer.

The command @kbd{K} (@code{po-kill-comment}) gets rid of all
translator comments, while saving those comments on the kill ring.
The command @kbd{W} (@code{po-kill-ring-save-comment}) takes
a copy of the translator comments on the kill ring, but leaves
them undisturbed in the current entry. The command @kbd{Y}
(@code{po-yank-comment}) completely replaces the translator comments
by a string taken at the front of the kill ring. When this command
is immediately repeated, the comments just inserted are withdrawn,
and replaced by other strings taken along the kill ring.

On the kill ring, all strings have the same nature. There is no
distinction between @emph{translation} strings and @emph{translator
comments} strings. So, for example, let's presume the translator
has just finished editing a translation, and wants to create a new
translator comment to document why the previous translation was
not good, just to remember what the problem was. Foreseeing that she
will do that in her documentation, the translator may want to quote
the previous translation in her translator comments. To do so, she
may initialize the translator comments with the previous translation,
still at the head of the kill ring. Because editing already pushed the
previous translation on the kill ring, she merely has to type @kbd{M-w}
prior to @kbd{#}, and the previous translation will be right there,
all ready for being introduced by some explanatory text.

On the other hand, presume there are some translator comments already
and that the translator wants to add to those comments, instead
of wholly replacing them. Then, she should edit the comment right
away with @kbd{#}. Once inside the editing window, she can use the
regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
(@code{yank-pop}) to get the previous translation where she likes.

@node Subedit, C Sources Context, Modifying Comments, Updating
@section Details of Sub Edition

The PO subedit minor mode has a few peculiarities worth being described
in fuller detail. It installs a few commands over the usual editing set
of Emacs, which are described below.

@table @kbd
@item C-c C-c
Return the edited text into the PO file.

@item C-c C-k
Abort the edit, preserving the original text.

@item C-c C-a
Consult auxiliary PO files.

@end table

The window's contents represent a translation for a given message,
or a translator comment. The translator may modify this window to
her heart's content. Once this is done, the command @w{@kbd{C-c C-c}}
(@code{po-subedit-exit}) may be used to return the edited translation into
the PO file, replacing the original translation, even if it moved out of
sight or if buffers were switched.

If the translator becomes unsatisfied with her translation or comment,
to the extent she prefers keeping what existed prior to the
@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
(@code{po-subedit-abort}) to merely discard the edit, while preserving
the original translation or comment. Another way would be for her to exit
normally with @w{@kbd{C-c C-c}}, then type @kbd{U} once for undoing the
whole effect of the last edit.

The command @w{@kbd{C-c C-a}} allows for glancing through translations
already achieved in other languages, directly while editing the current
translation. This may be quite convenient when the translator is fluent
in many languages, but of course, only makes sense when such completed
auxiliary PO files are already available to her (@pxref{Auxiliary}).

Functions found on @code{po-subedit-mode-hook}, if any, are executed after
the string has been inserted in the edit buffer.

While editing her translation, the translator should pay attention to not
inserting unwanted @kbd{@key{RET}} (newline) characters at the end of
the translated string if those are not meant to be there, or to removing
such characters when they are required. Since these characters are not
visible in the editing buffer, they are easily introduced by mistake.
To help her, @kbd{@key{RET}} automatically puts the character @kbd{<}
at the end of the string being edited, but this @kbd{<} is not really
part of the string. On exiting the editing window with @w{@kbd{C-c C-c}},
PO mode automatically removes such @kbd{<} and all whitespace added after
it. If the translator adds characters after the terminating @kbd{<}, it
loses its delimiting property and integrally becomes part of the string.
If she removes the delimiting @kbd{<}, then the edited string is taken
@emph{as is}, with all trailing newlines, even if invisible. Also, if
the translated string ought to end itself with a genuine @kbd{<}, then
the delimiting @kbd{<} may not be removed; so the string should appear,
in the editing window, as ending with two @kbd{<} in a row.

When a translation (or a comment) is being edited, the translator may move
the cursor back into the PO file buffer and freely move to other entries,
browsing at will. If, with an edit pending, the translator wanders in the
PO file buffer, she may decide to start modifying another entry. Each entry
being edited has its own subedit buffer. It is possible to simultaneously
edit the translation @emph{and} the comment of a single entry, or to
edit entries in different PO files, all at once. Typing @kbd{@key{RET}}
on a field already being edited merely resumes that particular edit. Yet,
the translator had better be comfortable handling many Emacs windows!

Pending subedits may be completed or aborted in any order, regardless
of how or when they were started. When many subedits are pending and the
translator asks to quit the PO file (with the @kbd{q} command), subedits
are automatically resumed one at a time, so she may decide for each of them.

@node C Sources Context, Auxiliary, Subedit, Updating
@section C Sources Context

PO mode is particularly powerful when used with PO files
created through GNU @code{gettext} utilities, as those utilities
insert special comments in the PO files they generate.
Some of these special comments relate the PO file entry to
exactly where the untranslated string appears in the program sources.
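
As an illustration, such a reference comment starts with @samp{#:} and
lists file names and line numbers. In the sketch below the two
locations are hypothetical.

@example
#: src/main.c:123 lib/error.c:45
msgid "Out of memory"
msgstr ""
@end example

@noindent
Here the same string would be used in two places, so the entry offers
two source contexts to consult.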

When the translator gets to an untranslated entry, she is fairly
often faced with an original string which is not as informative as
it normally should be, being succinct, cryptic, or otherwise ambiguous.
Before choosing how to translate the string, she needs to understand
better what the string really means and how tight the translation has
to be. Most of the time, when problems arise, the only way left to make
her judgment is to look at the true program sources from which this
string originated, searching for surrounding comments the programmer
might have put in there, and looking around for helpful clues.

Surely, when looking at program sources, the translator will receive
more help if she is a fluent programmer. However, even if she is
not versed in programming and feels a little lost in C code, the
translator should not be shy about taking a look, once in a while.
It is most probable that she will still be able to find some of the
hints she needs. She will learn quickly to not feel uncomfortable
in program code, paying more attention to programmer's comments,
variable and function names (if he dared to choose them well), and
overall organization, than to the programming itself.

The following commands are meant to help the translator get
program source context for a PO file entry.

@table @kbd
@item s
Resume the display of a program source context, or cycle through them.

@item M-s
Display a program source context selected by menu.

@item S
Add a directory to the search path for source files.

@item M-S
Delete a directory from the search path for source files.

@end table

The commands @kbd{s} (@code{po-cycle-reference}) and @kbd{M-s}
(@code{po-select-source-reference}) both open another window displaying
some source program file, and already positioned in such a way that
it shows an actual use of the string to be translated. By doing
so, the command gives source program context for the string. But if
the entry has no source context references, or if all references
are unresolved along the search path for program sources, then the
command diagnoses this as an error.

Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
in the PO file window. If the translator really wants to
get into the program source window, she ought to do it explicitly,
maybe by using the command @kbd{O}.

When @kbd{s} is typed for the first time, or for a PO file entry which
is different from the last one used for getting source context, then the
command reacts by giving the first context available for this entry,
if any. If some context has already been recently displayed for the
current PO file entry, and the translator wandered off to do other
things, typing @kbd{s} again will merely resume, in another window,
the context last displayed. In particular, if the translator moved
the cursor away from the context in the source file, the command will
bring the cursor back to the context. By using @kbd{s} many times
in a row, with no other commands intervening, PO mode will cycle to
the next available contexts for this particular entry, getting back
to the first context once the last has been shown.

The command @kbd{M-s} behaves differently. Instead of cycling through
references, it lets the translator choose a particular reference among
many, and displays that reference. It is best used with completion:
if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
response to the question, she will be offered a menu of all possible
references, as a reminder of which are the acceptable answers.
This command is useful only where there are really many contexts
available for a single string to translate.

Program source files are usually found relative to where the PO
file stands. As a special provision, when this fails, the file is
also looked for, but relative to the directory immediately above it.
Those two cases take proper care of most PO files. However, it might
happen that a PO file has been moved, or is edited in a different
place than its normal location. When this happens, the translator
should tell PO mode in which directory the genuine PO file normally
sits. Many such directories may be specified, and all together, they
constitute what is called the @dfn{search path} for program sources.
The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
enter a new directory at the front of the search path, and the command
@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
one of the directories she does not want anymore on the search path.

@node Auxiliary, Compendium, C Sources Context, Updating
@section Consulting Auxiliary PO Files

PO mode can help the knowledgeable translator who is fluent in
many languages take advantage of translations already achieved
in other languages she happens to know. It provides these other
language translations as additional context for her own work. Moreover,
it has features to ease the production of translations for many languages
at once, for translators preferring to work in this way.
2811
An @dfn{auxiliary} PO file is an existing PO file meant for the same
2812
package the translator is working on, but targeted at a different
language. Commands exist for declaring and handling auxiliary
2814
PO files, and also for showing contexts for the entry under work.
2816
Here are the auxiliary file commands available in PO mode.
2820
@table @kbd
@item a
Seek auxiliary files for another translation for the same entry.

@item M-a
Switch to a particular auxiliary file.

@item A
Declare this PO file as an auxiliary file.

@item M-A
Remove this PO file from the list of auxiliary files.

@end table
2833
Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
2834
PO file to the list of auxiliary files, while command @kbd{M-A}
2835
(@code{po-ignore-as-auxiliary}) just removes it.
2837
The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
2838
files, round-robin, searching for a translated entry in some other language
2839
having an @code{msgid} field identical to the one for the current entry.
2840
The found PO file, if any, takes the place of the current PO file in
2841
the display (its window gets on top). Before doing so, the current PO
2842
file is also made into an auxiliary file, if not already. So, @kbd{a}
2843
in this newly displayed PO file will seek another PO file, and so on,
2844
so repeating @kbd{a} will eventually yield back the original PO file.
2846
The command @kbd{M-a} (@code{po-select-auxiliary}) asks the translator
2847
for her choice of a particular auxiliary file, with completion, and
2848
then switches to that selected PO file. The command also checks if
2849
the selected file has an @code{msgid} field identical to the one for
2850
the current entry, and if yes, this entry becomes current. Otherwise,
2851
the cursor of the selected file is left undisturbed.
2853
For all this to work fully, auxiliary PO files will have to be normalized,
2854
in such a way that @code{msgid} fields are written @emph{exactly}
2855
the same way. It is possible to write @code{msgid} fields in various
2856
ways for representing the same string; different writing would break the
2857
proper behaviour of the auxiliary file commands of PO mode. This is not
2858
expected to be much of a problem in practice, as most existing PO files have
2859
their @code{msgid} entries written by the same GNU @code{gettext} tools.
2861
However, PO files initially created by PO mode itself, while marking
2862
strings in source files, are normalised differently. So are PO
2863
files resulting from the @samp{M-x normalize} command. Until these
2864
discrepancies between PO mode and other GNU @code{gettext} tools get
2865
fully resolved, the translator should stay aware of normalisation issues.
2867
@node Compendium, , Auxiliary, Updating
2868
@section Using Translation Compendiums
2872
Compendiums are yet to be implemented.
2874
An upcoming PO mode feature will let the translator maintain a
2875
compendium of already achieved translations. A @dfn{compendium}
2876
is a special PO file containing a set of translations recurring in
2877
many different packages. The translator will be given commands for
2878
adding entries to her compendium, and later initializing untranslated
2879
entries, or updating already translated entries, from translations
2880
kept in the compendium. For this to work, however, the compendium
2881
would have to be normalized. @xref{Normalizing}.
2883
@c It is not useful that I modify the @file{lib/} routines if not done in
2884
@c the true sources. How do you/I/they proceed for getting this job done?
2885
@c I presume that @file{lib/} routines will all use @code{gettext} for
2888
@node Binaries, Users, Updating, Top
2889
@chapter Producing Binary MO Files
2894
* msgfmt Invocation:: Invoking the @code{msgfmt} Program
2895
* MO Files:: The Format of GNU MO Files
2898
@node msgfmt Invocation, MO Files, Binaries, Binaries
2899
@section Invoking the @code{msgfmt} Program
2904
Usage: msgfmt [@var{option}] @var{filename}.po @dots{}
2908
@item -a @var{number}
2909
@itemx --alignment=@var{number}
2910
Align strings to @var{number} bytes (default: 1).
2911
@c Currently the README mentions that this constant could be changed by
2912
@c the installer by changing the value in config.h. Should this go away?
2916
Display this help and exit.
2919
Binary file will not include the hash table.
2922
@itemx --output-file=@var{file}
2923
Specify output file name as @var{file}.
2926
Direct the program to work strictly following the Uniforum/Sun
2927
implementation. Currently this only affects the naming of the output
2928
file. If this option is not given the name of the output file is the
2929
same as the domain name. If the strict Uniforum mode is enabled the
2930
suffix @file{.mo} is added to the file name if it is not already
present.
2933
We find this behaviour of Sun's implementation rather silly and so by
2934
default this mode is @emph{not} selected.
2938
Detect and diagnose input file anomalies which might represent
2939
translation errors. The @code{msgid} and @code{msgstr} strings are
2940
studied and compared. It is considered abnormal that one string
2941
starts or ends with a newline while the other does not.
2943
Also, if the string represents a format string used in a
2944
@code{printf}-like function both strings should have the same number of
2945
@samp{%} format specifiers, with matching types. If the flag
2946
@code{c-format} or @code{possible-c-format} appears in the special
2947
comment @key{#,} for this entry a check is performed. For example, the
2948
check will diagnose using @samp{%.*s} against @samp{%s}, or @samp{%d}
2949
against @samp{%s}, or @samp{%d} against @samp{%x}. It can even handle
2950
positional parameters.
2952
Normally the @code{xgettext} program automatically decides whether a
2953
string is a format string or not. This algorithm is not perfect,
2954
though. It might regard a string as a format string though it is not
2955
used in a @code{printf}-like function and so @code{msgfmt} might report
2956
errors where there are none. Or the other way round: a string is not
2957
regarded as a format string but it is used in a @code{printf}-like
function.
2960
To solve this problem the programmer can dictate the decision to the
2961
@code{xgettext} program (@pxref{c-format}). The translator should not
2962
consider removing the flag from the @key{#,} line. This "fix" would be
2963
reversed again as soon as @code{msgmerge} is called the next time.
2967
Output version information and exit.
2971
If input file is @samp{-}, standard input is read. If output file
2972
is @samp{-}, output is written to standard output.
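
The format string comparison described above can be approximated in a
few lines of C. The following is only a sketch of the idea, not the
code @code{msgfmt} itself uses: the names @code{format_signature} and
@code{formats_match} are made up for this illustration, and positional
parameters are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/* Append C to SIG, keeping room for the terminating NUL.  */
static int
sig_put (char *sig, size_t size, size_t *n, char c)
{
  if (*n + 1 >= size)
    return -1;
  sig[(*n)++] = c;
  return 0;
}

/* Reduce FMT to a "signature": every '*', length modifier and
   conversion character, in order of appearance.  "%%" is a literal
   percent sign and contributes nothing.  Returns 0 on success, -1 if
   SIG is too small or FMT ends in the middle of a directive.  */
static int
format_signature (const char *fmt, char *sig, size_t size)
{
  size_t n = 0;

  while (*fmt != '\0')
    {
      if (*fmt++ != '%')
        continue;
      if (*fmt == '%')
        {
          fmt++;
          continue;
        }
      /* Skip flags, width and precision; remember '*' and modifiers.  */
      while (*fmt != '\0' && strchr ("-+ #0123456789.*hlLjzt", *fmt) != NULL)
        {
          if (strchr ("*hlLjzt", *fmt) != NULL
              && sig_put (sig, size, &n, *fmt) < 0)
            return -1;
          fmt++;
        }
      if (*fmt == '\0' || sig_put (sig, size, &n, *fmt) < 0)
        return -1;
      fmt++;
    }
  sig[n] = '\0';
  return 0;
}

/* Return 1 if MSGID and MSGSTR expect the same arguments, else 0.  */
int
formats_match (const char *msgid, const char *msgstr)
{
  char a[64], b[64];

  if (format_signature (msgid, a, sizeof a) < 0
      || format_signature (msgstr, b, sizeof b) < 0)
    return 0;
  return strcmp (a, b) == 0;
}
```

With this sketch, @code{formats_match ("%d files", "%d Dateien")} reports
a match, while @samp{%s} against @samp{%.*s} and @samp{%d} against
@samp{%x} do not.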
2974
@node MO Files, , msgfmt Invocation, Binaries
2975
@section The Format of GNU MO Files
2977
The format of the generated MO files is best described by a picture,
2978
which appears below.
2980
The first two words serve the identification of the file. The magic
2981
number will always signal GNU MO files. The number is stored in the
2982
byte order of the generating machine, so the magic number really is
2983
two numbers: @code{0x950412de} and @code{0xde120495}. The second
2984
word describes the current revision of the file format. For now the
2985
revision is 0. This might change in future versions, and ensures
2986
that the readers of MO files can distinguish new formats from old
2987
ones, so that both can be handled correctly. The version is kept
2988
separate from the magic number, instead of using different magic
2989
numbers for different formats, mainly because @file{/etc/magic} is
2990
not updated often. It might be better to have magic separated from
2991
internal format version identification.
2993
Next come a number of pointers to later tables in the file, allowing
2994
for the extension of the prefix part of MO files without having to
2995
recompile programs reading them. This might become useful for later
2996
inserting a few flag bits, indication about the charset used, new
2997
tables, or other things.
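
To make the layout of the header more concrete, here is a minimal
sketch of a reader for the first seven words of an MO file. It is not
taken from the GNU @code{gettext} sources; the names @code{struct
mo_header}, @code{mo_word} and @code{mo_parse_header} are assumptions
made for this example. The sketch decides the byte order of the file by
looking at which of the two magic values it sees.

```c
#include <stddef.h>
#include <stdint.h>

#define MO_MAGIC         0x950412deU
#define MO_MAGIC_SWAPPED 0xde120495U

/* The header words described above (hypothetical field names).  */
struct mo_header
{
  int big_endian;          /* nonzero if the file is big-endian */
  uint32_t revision;       /* file format revision */
  uint32_t nstrings;       /* N */
  uint32_t orig_offset;    /* O */
  uint32_t trans_offset;   /* T */
  uint32_t hash_size;      /* S */
  uint32_t hash_offset;    /* H */
};

/* Read one 32-bit word at P in the given byte order.  */
static uint32_t
mo_word (const unsigned char *p, int big_endian)
{
  if (big_endian)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}

/* Fill H from the LEN bytes at BUF.  Returns 0 on success, -1 if the
   buffer is too short or does not start with an MO magic number.  */
int
mo_parse_header (const unsigned char *buf, size_t len, struct mo_header *h)
{
  uint32_t magic;

  if (len < 28)
    return -1;
  magic = mo_word (buf, 0);          /* try little-endian first */
  if (magic == MO_MAGIC)
    h->big_endian = 0;
  else if (magic == MO_MAGIC_SWAPPED)
    h->big_endian = 1;
  else
    return -1;
  h->revision = mo_word (buf + 4, h->big_endian);
  h->nstrings = mo_word (buf + 8, h->big_endian);
  h->orig_offset = mo_word (buf + 12, h->big_endian);
  h->trans_offset = mo_word (buf + 16, h->big_endian);
  h->hash_size = mo_word (buf + 20, h->big_endian);
  h->hash_offset = mo_word (buf + 24, h->big_endian);
  return 0;
}
```

Reading the words byte by byte, rather than casting the buffer to an
integer pointer, keeps the sketch independent of the byte order and
alignment rules of the machine running it.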
2999
Then, at offset @var{O} and offset @var{T} in the picture, two tables
3000
of string descriptors can be found. In both tables, each string
3001
descriptor uses two 32-bit integers, one for the string length,
3002
another for the offset of the string in the MO file, counting in bytes
3003
from the start of the file. The first table contains descriptors
3004
for the original strings, and is sorted so the original strings
3005
are in increasing lexicographical order. The second table contains
3006
descriptors for the translated strings, and is parallel to the first
3007
table: to find the corresponding translation one has to access the
3008
array slot in the second array with the same index.
3010
Having the original strings sorted enables the use of simple binary
3011
search, for when the MO file does not contain a hashing table, or
3012
for when it is not practical to use the hashing table provided in
3013
the MO file. This also has another advantage, as the empty string
3014
in a GNU @code{gettext} PO file is usually @emph{translated} into
3015
some system information attached to that particular MO file, and the
3016
empty string necessarily becomes the first in both the original and
3017
translated tables, making the system information very easy to find.
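
The binary search over the table of original strings, and the parallel
access to the translation table, might look as follows. This is a
sketch under simplifying assumptions, not the lookup code of GNU
@code{gettext}: the whole MO file is assumed to be in memory, to be
little-endian, and to be well formed (no bounds are checked). The names
@code{rd32}, @code{mo_find} and @code{mo_translation} are invented for
the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit word (assumed file byte order).  */
static uint32_t
rd32 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Search the N descriptors at offset O of the MO image for MSGID.
   Each descriptor is 8 bytes: the length, then the offset of the string.
   Returns the index of the entry, or -1 if MSGID is not present.  */
long
mo_find (const unsigned char *mo, uint32_t n, uint32_t o, const char *msgid)
{
  uint32_t lo = 0, hi = n;

  while (lo < hi)
    {
      uint32_t mid = lo + (hi - lo) / 2;
      const char *s = (const char *) mo + rd32 (mo + o + 8 * mid + 4);
      int cmp = strcmp (msgid, s);

      if (cmp == 0)
        return (long) mid;
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return -1;
}

/* Return the translation stored at index IDX of the table at offset T.  */
const char *
mo_translation (const unsigned char *mo, uint32_t t, long idx)
{
  return (const char *) mo + rd32 (mo + t + 8 * (uint32_t) idx + 4);
}
```

Looking up the empty string with @code{mo_find} always yields index 0
when the entry exists, which is how the system information mentioned
above can be located.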
3019
The size @var{S} of the hash table can be zero. In this case, the
3020
hash table itself is not contained in the MO file. Some people might
3021
prefer this because a precomputed hashing table takes disk space, and
3022
does not win @emph{that} much speed. The hash table contains indices
3023
to the sorted array of strings in the MO file. Conflict resolution is
3024
done by double hashing. The precise hashing algorithm used is fairly
3025
dependent on GNU @code{gettext} code, and is not documented here.
3027
As for the strings themselves, they follow the hash table, and each
3028
is terminated with a @key{NUL}, and this @key{NUL} is not counted in
3029
the length which appears in the string descriptor. The @code{msgfmt}
3030
program has an option selecting the alignment for MO file strings.
3031
With this option, each string is separately aligned so it starts at
3032
an offset which is a multiple of the alignment value. On some RISC
3033
machines, a correct alignment will speed things up.
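
The effect of the alignment value on string offsets is plain
arithmetic. As an illustration only (the helper name @code{mo_align}
is hypothetical and the function assumes a nonzero alignment), the
start of each string can be computed by rounding the next free offset
up to a multiple of the alignment:

```c
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGNMENT (ALIGNMENT > 0).  */
size_t
mo_align (size_t offset, size_t alignment)
{
  return (offset + alignment - 1) / alignment * alignment;
}
```

With the default alignment of 1 the offset is returned unchanged; with
an alignment of 4, a string that would otherwise start at offset 61
starts at offset 64.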
3035
Plural forms are stored by letting the plural of the original string
3036
follow the singular of the original string, separated through a
3037
@key{NUL} byte. The length which appears in the string descriptor
3038
includes both. However, only the singular of the original string
3039
takes part in the hash table lookup. The plural variants of the
3040
translation are all stored consecutively, separated through a
3041
@key{NUL} byte. Here also, the length in the string descriptor
3042
includes all of them.
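
Picking one plural variant out of such a stored translation amounts to
skipping over @key{NUL} separators. The following sketch assumes the
caller already has a pointer to the string data and the length taken
from the string descriptor; the function name @code{mo_plural_variant}
is made up for this example, and choosing @emph{which} index to use is
a separate matter not covered here.

```c
#include <stddef.h>
#include <string.h>

/* Return the INDEX-th NUL-separated variant within the LEN bytes at
   DATA, or NULL if there are fewer variants.  LEN is the descriptor
   length, so it does not count the final terminating NUL.  */
const char *
mo_plural_variant (const char *data, size_t len, unsigned int index)
{
  const char *p = data;
  const char *end = data + len;

  while (index > 0)
    {
      const char *nul = memchr (p, '\0', (size_t) (end - p));

      if (nul == NULL)
        return NULL;            /* ran out of variants */
      p = nul + 1;
      index--;
    }
  return p;
}
```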
3044
Nothing prevents a MO file from having embedded @key{NUL}s in strings.
3045
However, the program interface currently used already presumes
3046
that strings are @key{NUL} terminated, so embedded @key{NUL}s are
3047
somewhat useless. But the MO file format is general enough so other
3048
interfaces would be later possible, if for example, we ever want to
3049
implement wide characters right in MO files, where @key{NUL} bytes may
3050
accidentally appear. (No, we don't want to have wide characters in MO
3051
files. They would make the file unnecessarily large, and the
3052
@samp{wchar_t} type being platform dependent, MO files would be
3053
platform dependent as well.)
3055
This particular issue has been strongly debated in the GNU
3056
@code{gettext} development forum, and it is to be expected that the MO file
3057
format will evolve or change over time. It is even possible that many
3058
formats may later be supported concurrently. But surely, we have to
3059
start somewhere, and the MO file format described here is a good start.
3060
Nothing is cast in concrete, and the format may later evolve fairly
3061
easily, so we should feel comfortable with the current approach.
3066
@example
@group
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .    (possibly more entries later)         .
             .                                          .
             |                                          |
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  <------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  <---------------' |
             |                                          |        |
             | NUL terminated 1st translation  <-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
@end group
@end example
3115
@node Users, Programmers, Binaries, Top
3116
@chapter The User's View
3118
When GNU @code{gettext} has truly reached its goal, average users
3119
should feel some kind of astonished pleasure, seeing the effect of
3120
that strange kind of magic that just makes their own native language
3121
appear everywhere on their screens. As for naive users, they would
3122
ideally have no special pleasure about it, merely taking their own
3123
language for @emph{granted}, and becoming rather unhappy otherwise.
3125
So, let's try to describe here how we would like the magic to operate,
3126
as we want the users' view to be the simplest, among all ways one
3127
could look at GNU @code{gettext}. All other software engineers:
3128
programmers, translators, maintainers, should work together in such a
3129
way that the magic becomes possible. This is a long and progressive
3130
undertaking, and information is available about the progress of the
3131
Translation Project.
3133
When a package is distributed, there are two kinds of users:
3134
@dfn{installers} who fetch the distribution, unpack it, configure
3135
it, compile it and install it for themselves or others to use; and
3136
@dfn{end users} that call programs of the package, once these have
3137
been installed at their site. GNU @code{gettext} is offering magic
3138
for both installers and end users.
3141
* Matrix:: The Current @file{ABOUT-NLS} Matrix
3142
* Installers:: Magic for Installers
3143
* End Users:: Magic for End Users
3146
@node Matrix, Installers, Users, Users
3147
@section The Current @file{ABOUT-NLS} Matrix
3149
Languages are not equally supported in all packages using GNU
3150
@code{gettext}. To know if some package uses GNU @code{gettext}, one
3151
may check the distribution for the @file{ABOUT-NLS} information file, for
3152
some @file{@var{ll}.po} files, often kept together into some @file{po/}
3153
directory, or for an @file{intl/} directory. Internationalized packages
3154
usually have many @file{@var{ll}.po} files, where @var{ll} represents
3155
the language. @xref{End Users}, for a complete description of the format
for @var{ll}.
3158
More generally, a matrix is available for showing the current state
3159
of the Translation Project, listing which packages are prepared for
3160
multi-lingual messages, and which languages are supported by each.
3161
Because this information changes often, this matrix is not kept within
3162
this GNU @code{gettext} manual. This information is often found in
3163
file @file{ABOUT-NLS} from various distributions, but is also as old as
3164
the distribution itself. A recent copy of this @file{ABOUT-NLS} file,
3165
containing up-to-date information, should generally be found on the
3166
Translation Project sites, and also on most GNU archive sites.
3168
@node Installers, End Users, Matrix, Users
3169
@section Magic for Installers
3171
By default, packages fully using GNU @code{gettext}, internally,
3172
are installed in such a way that they allow translation of
3173
messages. At @emph{configuration} time, those packages should
3174
automatically detect whether the underlying host system already provides
3175
the GNU @code{gettext} functions. If not,
3176
the GNU @code{gettext} library should be automatically prepared
3177
and used. Installers may use special options at configuration
3178
time for changing this behavior. The command @samp{./configure
3179
--with-included-gettext} bypasses system @code{gettext} to
3180
use the included GNU @code{gettext} instead,
3181
while @samp{./configure --disable-nls}
3182
produces programs totally unable to translate messages.
3184
Internationalized packages usually have many @file{@var{ll}.po}
files. Unless translations are disabled, all those available are installed together
3187
with the package. However, the environment variable @code{LINGUAS}
3188
may be set, prior to configuration, to limit the installed set.
3189
@code{LINGUAS} should then contain a space separated list of two-letter
3190
codes, stating which languages are allowed.
3192
@node End Users, , Installers, Users
3193
@section Magic for End Users
3195
We consider here those packages using GNU @code{gettext} internally,
3196
and for which the installers did not disable translation at
3197
@emph{configure} time. Then, users only have to set the @code{LANG}
3198
environment variable to the appropriate @samp{@var{ll}_@var{CC}}
3199
combination prior to using the programs in the package. @xref{Matrix}.
3200
For example, let's presume a German site. At the shell prompt, users
3201
merely have to execute @w{@samp{setenv LANG de_DE}} (in @code{csh}) or
3202
@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}). They could even do
3203
this from their @file{.login} or @file{.profile} file.
3205
@node Programmers, Translators, Users, Top
3206
@chapter The Programmer's View
3208
@c FIXME: Reorganize whole chapter.
3210
One aim of the current message catalog implementation provided by
3211
GNU @code{gettext} was to use the system's message catalog handling, if the
3212
installer wishes to do so. So we perhaps should first take a look at
3213
the solutions we know about. The people in the POSIX committee did not
3214
manage to agree on one of the semi-official standards which we'll
3215
describe below. In fact they couldn't agree on anything, so they decided
3216
only to include an example of an interface. The major Unix vendors
3217
are split in the usage of the two most important specifications: X/Open's
3218
catgets vs. Uniforum's gettext interface. We'll describe them both and
3219
later explain our solution of this dilemma.
3222
* catgets:: About @code{catgets}
3223
* gettext:: About @code{gettext}
3224
* Comparison:: Comparing the two interfaces
3225
* Using libintl.a:: Using libintl.a in own programs
3226
* gettext grok:: Being a @code{gettext} grok
3227
* Temp Programmers:: Temporary Notes for the Programmers Chapter
3230
@node catgets, gettext, Programmers, Programmers
3231
@section About @code{catgets}
3233
The @code{catgets} implementation is defined in the X/Open Portability
3234
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
3235
process of creating this standard seemed to be too slow for some of
3236
the Unix vendors so they created their implementations on preliminary
3237
versions of the standard. Of course this leads again to problems while
3238
writing platform independent programs: even the usage of @code{catgets}
3239
does not guarantee a unique interface.
3241
Another, personal comment on this: only a bunch of committee members
3242
could have made this interface. They never really tried to program
3243
using this interface. It is a fast, memory-saving implementation; a
user can happily live with it. But programmers hate it (at least me and
3245
some others do@dots{})
3247
But we must not forget one point: after all the trouble with transferring
3248
the rights on Unix(tm) they at last came to X/Open, the very same who
3249
published this specification. This leads me to making the prediction
3250
that this interface will be in future Unix standards (e.g. Spec1170) and
3251
therefore part of all Unix implementations (implementations which are
3252
@emph{allowed} to wear this name).
3255
* Interface to catgets:: The interface
3256
* Problems with catgets:: Problems with the @code{catgets} interface?!
3259
@node Interface to catgets, Problems with catgets, catgets, catgets
3260
@subsection The Interface
3262
The interface to the @code{catgets} implementation consists of three
3263
functions which correspond to those used in file access: @code{catopen}
3264
to open the catalog for using, @code{catgets} for accessing the message
3265
tables, and @code{catclose} for closing after work is done. Prototypes
3266
for the functions and the needed definitions are in the
3267
@code{<nl_types.h>} header file.
3269
@code{catopen} is used like this:
3272
nl_catd catd = catopen ("catalog_name", 0);
3275
The function takes the name of the catalog as its first argument. This usually
3276
refers to the name of the program or the package. The second parameter
3277
is not further specified in the standard. I don't even know whether it
3278
is implemented consistently among various systems. So the common advice
3279
is to use @code{0} as the value. The return value is a handle to the
3280
message catalog, comparable to the file handles returned by @code{open}.
3282
This handle is of course used in the @code{catgets} function which can
be used like this:
3286
char *translation = catgets (catd, set_no, msg_id, "original string");
3289
The first parameter is this catalog descriptor. The second parameter
3290
specifies the set of messages in this catalog, in which the message
3291
described by @code{msg_id} is obtained. @code{catgets} therefore uses a
3292
three-stage addressing:
3295
catalog name @result{} set number @result{} message ID @result{} translation
3298
@c Anybody else loving Haskell??? :-) -- Uli
3300
The fourth argument is not used to address the translation. It is given
3301
as a default value in case one of the addressing stages fails. One
3302
important thing to remember is that although the return type of catgets
3303
is @code{char *} the resulting string @emph{must not} be changed. It
3304
would better be @code{const char *}, but the standard was published in
3305
1988, one year before ANSI C.
3308
The last of these functions, @code{catclose}, is used and behaves as expected:
3314
After this no @code{catgets} call using the descriptor is legal anymore.
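
Putting the three calls together, a complete lookup might be sketched
as below. The wrapper @code{lookup_message} and its parameters are
hypothetical, introduced only for this illustration. Because no
@code{catgets} result should be relied upon once the catalog has been
closed, the sketch copies the message into a caller-supplied buffer
before calling @code{catclose}.

```c
#include <nl_types.h>
#include <stddef.h>
#include <string.h>

/* Look up message MSG_ID of set SET_NO in CATALOG and copy it into
   BUF.  If the catalog or the message cannot be found, FALLBACK is
   copied instead.  Returns 1 if a catalog entry was found, else 0.  */
int
lookup_message (const char *catalog, int set_no, int msg_id,
                const char *fallback, char *buf, size_t bufsize)
{
  nl_catd catd;
  const char *text = fallback;
  int found = 0;

  if (bufsize == 0)
    return 0;
  catd = catopen (catalog, 0);
  if (catd != (nl_catd) -1)
    {
      /* catgets hands back the fallback pointer when the lookup fails.  */
      text = catgets (catd, set_no, msg_id, fallback);
      found = (text != fallback);
    }
  strncpy (buf, text, bufsize - 1);
  buf[bufsize - 1] = '\0';
  if (catd != (nl_catd) -1)
    catclose (catd);
  return found;
}
```

Note how the caller has to know both the set number and the numeric
message ID; the fallback string plays no part in the addressing.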
3316
@node Problems with catgets, , Interface to catgets, catgets
3317
@subsection Problems with the @code{catgets} Interface?!
3319
Now that this description seemed to be really easy --- where are the
3320
problems we speak of? In fact the interface could be used in a
3321
reasonable way, but constructing the message catalogs is a pain. The
3322
reason for this lies in the third argument of @code{catgets}: the unique
3323
message ID. This has to be a numeric value for all messages in a single
3324
set. Perhaps you could imagine the problems keeping such a list while
3325
changing the source code. Add a new message here, remove one there. Of
3326
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
3328
want to say that the other approach has no problems, but they are far
easier to manage.
3331
@node gettext, Comparison, catgets, Programmers
3332
@section About @code{gettext}
3334
The definition of the @code{gettext} interface comes from a Uniforum
3335
proposal and it is followed by at least one major Unix vendor
3336
(Sun) in its last developments. It is not specified in any official
standard.
3339
The main points about this solution are that it does not follow the
3340
method of normal file handling (open-use-close) and that it does not
3341
burden the programmer with so many tasks, especially the unique key handling.
3342
Of course here is also a unique key needed, but this key is the message
3343
itself (however long or short it is). See @ref{Comparison}, for a more
3344
detailed comparison of the two methods.
3346
The following section contains a rather detailed description of the
3347
interface. We make it that detailed because this is the interface
3348
we chose for the GNU @code{gettext} Library. Programmers interested
3349
in using this library will be interested in this description.
3352
* Interface to gettext:: The interface
3353
* Ambiguities:: Solving ambiguities
3354
* Locating Catalogs:: Locating message catalog files
3355
* Charset conversion:: How to request conversion to Unicode
3356
* Plural forms:: Additional functions for handling plurals
3357
* GUI program problems:: Another technique for solving ambiguities
3358
* Optimized gettext:: Optimization of the *gettext functions
3361
@node Interface to gettext, Ambiguities, gettext, gettext
3362
@subsection The Interface
3364
The minimal functionality an interface must have is a) to select a
3365
domain the strings are coming from (a single domain for all programs is
3366
not reasonable because its construction and maintenance is difficult,
3367
perhaps impossible) and b) to access a string in a selected domain.
3369
This is essentially the description of the @code{gettext} interface. It
3370
has a global domain which unqualified usages reference. Of course this
3371
domain is selectable by the user.
3374
char *textdomain (const char *domain_name);
3377
This provides the possibility to change or query the current status of
3378
the current global domain of the @code{LC_MESSAGES} category. The
3379
argument is a null-terminated string, whose characters must be legal for
use in filenames. If the @var{domain_name} argument is @code{NULL},
the function returns the current value. If no value has been set
3382
before, the name of the default domain is returned: @emph{messages}.
3383
Please note that although the return value of @code{textdomain} is of
3384
type @code{char *} no changing is allowed. It is also important to know
3385
that no checks of the availability are made. If the name is not
3386
available you will see this by the fact that no translations are provided.
3389
To use a domain set by @code{textdomain} the function
3392
char *gettext (const char *msgid);
3395
is to be used. This is the simplest reasonable form one can imagine.
3396
The translation of the string @var{msgid} is returned if it is available
3397
in the current domain. If not available the argument itself is
3398
returned. If the argument is @code{NULL} the result is undefined.
3400
One thing which should come to mind is that no explicit dependency to
3401
the used domain is given. The current value of the domain for the
3402
@code{LC_MESSAGES} locale is used. If this changes between two
3403
executions of the same @code{gettext} call in the program, both calls
3404
reference a different message catalog.
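
A minimal sketch of how a program might use these two functions is
shown below. The helper @code{init_messages} and the shorthand macro
@code{_} are conventions assumed for this example, not part of the
interface itself; the call to @code{setlocale} is likewise an
assumption about how the program selects the user's locale.

```c
#include <libintl.h>
#include <locale.h>

/* Shorthand commonly used to mark and translate a string.  */
#define _(msgid) gettext (msgid)

/* Select the user's locale and make PACKAGE the current domain.
   Returns the new current domain, or NULL on failure.  */
const char *
init_messages (const char *package)
{
  setlocale (LC_ALL, "");
  return textdomain (package);
}
```

After @code{init_messages ("hello")} (the package name is just a
placeholder), a call such as @code{puts (_("Hello, world!"))} prints
the translation when a catalog for the domain is found, and the
original string otherwise.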
3406
For the easiest case, which is normally used in internationalized
3407
packages, once at the beginning of execution a call to @code{textdomain}
3408
is issued, setting the domain to a unique name, normally the package
3409
name. In the following code all strings which have to be translated are
3410
filtered through the @code{gettext} function. That's all, the package speaks
your language.
3413
@node Ambiguities, Locating Catalogs, Interface to gettext, gettext
3414
@subsection Solving Ambiguities
3416
While this single name domain works well for most applications there
3417
might be the need to get translations from more than one domain. Of
3418
course one could switch between different domains with calls to
3419
@code{textdomain}, but this is really not convenient nor is it fast. A
3420
possible situation could be one case subject to discussion during this
writing: all
error messages of functions in the set of commonly used functions should
3423
go into a separate domain @code{error}. By this means we would only need
3424
to translate them once.
3425
Another case are messages from a library, as these @emph{have} to be
3426
independent of the current domain set by the application.
3429
For these reasons there are two more functions to retrieve strings:
3432
char *dgettext (const char *domain_name, const char *msgid);
3433
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
3437
Both take an additional argument at the first place, which corresponds
3438
to the argument of @code{textdomain}. The third argument of
3439
@code{dcgettext} allows using a locale category other than @code{LC_MESSAGES}.
3440
But I really don't know where this can be useful. If the
3441
@var{domain_name} is @code{NULL} or @var{category} has a value besides
3442
the known ones, the result is undefined. It should also be noted that
3443
this function is not part of the second known implementation of this
3444
function family, the one found in Solaris.
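
For the library case, the idea can be sketched as follows. The domain
name @code{libsketch} and the two wrapper functions are hypothetical;
the point is only that the library names its own domain explicitly and
therefore does not depend on what the application passed to
@code{textdomain}.

```c
#include <libintl.h>
#include <locale.h>

/* Domain owned by the library (hypothetical name).  */
#define LIB_DOMAIN "libsketch"

/* Translate MSGID using the library's own domain.  */
const char *
lib_message (const char *msgid)
{
  return dgettext (LIB_DOMAIN, msgid);
}

/* Same, but look the message up under the given locale CATEGORY,
   for example LC_MESSAGES or LC_TIME.  */
const char *
lib_message_in_category (const char *msgid, int category)
{
  return dcgettext (LIB_DOMAIN, msgid, category);
}
```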
3446
A second ambiguity can arise from the fact that perhaps more than one
3447
domain has the same name. This can be solved by specifying where the
3448
needed message catalog files can be found.
3451
char *bindtextdomain (const char *domain_name,
3452
const char *dir_name);
3455
Calling this function binds the given domain to a file in the specified
3456
directory (how this file is determined follows below). In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using @code{textdomain}). A
3459
@code{NULL} pointer for the @var{dir_name} parameter returns the binding
3460
associated with @var{domain_name}. If @var{domain_name} itself is
3461
@code{NULL} nothing happens and a @code{NULL} pointer is returned. Here
again, as for all the other functions, none of the return values may
be changed!
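
As an illustration, a program could wrap @code{bindtextdomain} as in
the sketch below. The wrapper name @code{bind_catalog_dir} is made up
for this example, and the refusal of relative directory names is a
design choice of the sketch (a relative name would depend on the
current directory), not a requirement of the interface.

```c
#include <libintl.h>
#include <string.h>

/* Bind DOMAIN to the absolute directory DIR.
   Returns 0 on success, -1 if DIR is relative or the binding failed.  */
int
bind_catalog_dir (const char *domain, const char *dir)
{
  const char *bound;

  if (dir[0] != '/')
    return -1;
  bound = bindtextdomain (domain, dir);
  /* The returned string must be treated as read-only.  */
  return (bound != NULL && strcmp (bound, dir) == 0) ? 0 : -1;
}
```

A later call @code{bindtextdomain (domain, NULL)} can be used to query
the directory that was bound this way.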
3465
It is important to remember that relative path names for the
3466
@var{dir_name} parameter can be trouble. Since the path is always
3467
computed relative to the current directory different results will be
3468
achieved when the program executes a @code{chdir} command. Relative
3469
paths should always be avoided to prevent such dependencies and unreliability.
3472
@node Locating Catalogs, Charset conversion, Ambiguities, gettext
3473
@subsection Locating Message Catalog Files
3475
Because many different languages for many different packages have to be
3476
stored we need some way to add this information to message catalog
files. The way usually used in Unix environments is to have this encoding
3478
in the file name. This is also done here. The directory name given in
3479
@code{bindtextdomain}'s second argument (or the default directory),
3480
followed by the value and name of the locale and the domain name are
concatenated:
3484
@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
3487
The default value for @var{dir_name} is system specific. For the GNU
3488
library, and for packages adhering to its conventions, it's:
3490
/usr/local/share/locale
3494
@var{locale} is the value of the locale whose name is this
3495
@code{LC_@var{category}}. For @code{gettext} and @code{dgettext} this
3496
@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
3497
systems, e.g.@: Ultrix, don't have @code{LC_MESSAGES}. Here we use a more or
3498
less arbitrary value for it, namely 1729, the smallest positive integer
3499
which can be represented in two different ways as the sum of two cubes.}
3500
The value of the locale is determined through
3501
@code{setlocale (LC_@var{category}, NULL)}.
3502
@footnote{When the system does not support @code{setlocale} its behavior
3503
in setting the locale values is simulated by looking at the environment
variables.}
3505
@code{dcgettext} specifies the locale category by the third argument.
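
As a rough illustration of the naming scheme described above, the
following sketch assembles such a file name from its four parts. The
helper @code{catalog_path} and its signature are assumptions made for
this example only; they are not part of @code{libintl}, which performs
the lookup internally.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a libintl function: writes
   DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo into BUF.
   Returns 0 on success, -1 if the result does not fit in SIZE bytes.  */
int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  /* Three slashes, the ".mo" suffix and the terminating NUL byte.  */
  size_t needed = strlen (dir_name) + strlen (locale) + strlen (category)
                  + strlen (domain_name) + sizeof "///.mo";

  if (needed > size)
    return -1;
  snprintf (buf, size, "%s/%s/%s/%s.mo",
            dir_name, locale, category, domain_name);
  return 0;
}
```

Called with the default directory shown above, the locale @code{de}, the
category name @code{LC_MESSAGES} and a domain called @code{hello}, the
helper fills the buffer with the name of the German catalog for that
domain.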
3507
@node Charset conversion, Plural forms, Locating Catalogs, gettext
3508
@subsection How to specify the output character set @code{gettext} uses
@code{gettext} not only looks up a translation in a message catalog. It
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

The output character set is, by default, the value of @code{nl_langinfo
(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
locale. But programs which store strings in a locale independent way
(e.g. UTF-8) can request that @code{gettext} and related functions
return the translations in that encoding, by use of the
@code{bind_textdomain_codeset} function.

Note that the @var{msgid} argument to @code{gettext} is not subject to
character set conversion. Also, when @code{gettext} does not find a
translation for @var{msgid}, it returns @var{msgid} unchanged,
independently of the current output character set. It is therefore
recommended that all @var{msgid}s be US-ASCII strings.

3530
@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
3531
The @code{bind_textdomain_codeset} function can be used to specify the
3532
output character set for message catalogs for domain @var{domainname}.
3533
The @var{codeset} argument must be a valid codeset name which can be used
3534
for the @code{iconv_open} function, or a null pointer.
3536
If the @var{codeset} parameter is the null pointer,
3537
@code{bind_textdomain_codeset} returns the currently selected codeset
3538
for the domain with the name @var{domainname}. It returns @code{NULL} if
3539
no codeset has yet been selected.
3541
The @code{bind_textdomain_codeset} function can be used several times.
3542
If used multiple times with the same @var{domainname} argument, the
3543
later call overrides the settings made by the earlier one.

The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @var{errno} is set accordingly.
@end deftypefun
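
A minimal sketch of how a program might use this function follows. The
wrapper name @code{use_utf8_output} is hypothetical, and the sketch
assumes an implementation that behaves as described above (such as the
one in the GNU C library); other implementations may differ.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical wrapper: request UTF-8 output for DOMAIN and verify the
   setting by querying it again with a null codeset argument.
   Returns 1 if UTF-8 is now selected, 0 otherwise.  */
int
use_utf8_output (const char *domain)
{
  const char *current;

  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    return 0;                   /* For example, out of memory.  */

  /* A null codeset only reports the currently selected codeset.  */
  current = bind_textdomain_codeset (domain, NULL);
  return current != NULL && strcmp (current, "UTF-8") == 0;
}
```

A program that keeps all its strings in UTF-8 would call such a wrapper
once per domain during start-up, before the first translation is
requested.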
3553
@node Plural forms, GUI program problems, Charset conversion, gettext
3554
@subsection Additional functions for plural forms

The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches: the
handling of plural forms.

Looking through Unix source code written before anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:

@example
printf ("%d file%s deleted", n, n == 1 ? "" : "s");
@end example

After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}. Both look unnatural and should be avoided. First
attempts to solve the problem correctly looked like this:

@example
if (n == 1)
  printf ("%d file deleted", n);
else
  printf ("%d files deleted", n);
@end example
3582
But this does not solve the problem. It helps languages where the
3583
plural form of a noun is not simply constructed by adding an `s' but
3584
that is all. Once again people fell into the trap of believing the
3585
rules their language is using are universal. But the handling of plural
3586
forms differs widely between the language families. For example,
3587
Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:
3590
In Polish we use e.g. plik (file) this way:
3598
and so on (o' means 8859-2 oacute which should be rather okreska,
3599
similar to aogonek).

There are two things which can differ between languages (and even inside
language families):

@itemize @bullet
@item
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.

@item
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this is given in an extra section.
@end itemize

The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization since it is
only usable for certain, hardcoded language environments. Instead the
extended @code{gettext} interface should be used.

These extra functions take two strings and a numerical argument instead
of the single key string. The idea behind this is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator. The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written in
a Germanic language. This is a limitation, but since the GNU C library
(as well as the GNU @code{gettext} package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

3645
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form. If no
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.

An example for the use of this function is:

@example
printf (ngettext ("%d file removed", "%d files removed", n), n);
@end example

Please note that the numeric value @var{n} has to be passed to the
@code{printf} function as well. It is not sufficient to pass it only to
@code{ngettext}.
@end deftypefun

3666
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
The @code{dngettext} function is similar to the @code{dgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun

@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
The @code{dcngettext} function is similar to the @code{dcgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.
@end deftypefun
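
The behavior of these functions when no message catalog is found can be
sketched in a few lines. The function @code{plural_fallback} below is a
hypothetical stand-in written for illustration; it is not the code used
inside the library.

```c
/* Hypothetical illustration of the documented no-catalog fallback:
   the singular string is chosen only when N is exactly one, which is
   the Germanic rule.  */
const char *
plural_fallback (const char *msgid1, const char *msgid2,
                 unsigned long int n)
{
  return n == 1 ? msgid1 : msgid2;
}
```

Note that zero selects the plural string, as it does in English
(``0 files removed'').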
3680
Now, how do these functions solve the problem of the plural forms?
3681
Without the input of linguists (which was not available) it was not
3682
possible to determine whether there are only a few different forms in
3683
which plural forms are formed or whether the number can increase with
3684
every new supported language.
3686
Therefore the solution implemented is to allow the translator to specify
3687
the rules of how to select the plural form. Since the formula varies
3688
with every language this is the only viable solution except for
3689
hardcoding the information in the code (which still would require the
3690
possibility of extensions to not prevent the use of new languages).

The information about the plural form selection has to be stored in the
header entry of the PO file (the one with the empty @code{msgid} string).
The plural form information looks like this:

@example
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
@end example

The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following @code{plural} is an expression using the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}. This
expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression. The resulting value
must be greater than or equal to zero and smaller than the value given as the
value of @code{nplurals}.
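
Because the @code{plural} expression uses C syntax, a rule can be tried
out by pasting it into a small C function. The sketch below does this
for the two-form rule @code{n != 1} and for the three-form rule that this
section lists for Russian, among others. The function names are
hypothetical; the library evaluates the expression from the catalog
header at run time rather than compiling it.

```c
/* plural=n != 1;  (nplurals=2, e.g. English)  */
unsigned long
plural_index_two_form (unsigned long n)
{
  return n != 1;
}

/* plural=n%10==1 && n%100!=11 ? 0 :
          n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
   (nplurals=3, e.g. Russian)  */
unsigned long
plural_index_three_form (unsigned long n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* The result must be smaller than nplurals; it cannot be negative
   because the type is unsigned.  */
int
plural_index_valid (unsigned long index, unsigned long nplurals)
{
  return index < nplurals;
}
```

Checking a handful of values such as 1, 2, 5, 11, 21 and 112 against the
forms a native speaker expects is a quick way to catch mistakes in a rule
before it goes into a PO file header.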

The following rules are known at this point. The languages are listed
together with their families. But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).@footnote{Additions are welcome. Send appropriate information to
@email{bug-glibc-manual@@gnu.org}.}
3720
@item Only one form:
3721
Some languages only require one single form. There is no distinction
3722
between the singular and plural form. An appropriate header entry
3723
would look like this:
3726
Plural-Forms: nplurals=1; plural=0;
3730
Languages with this property include:
3733
@item Finno-Ugric family
3737
@item Turkic/Altaic family
3741
@item Two forms, singular used for one only
3742
This is the form used in most existing programs since it is what English
3743
is using. A header entry would look like this:
3746
Plural-Forms: nplurals=2; plural=n != 1;
(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
3753
Languages with this property include:
3756
@item Germanic family
3757
Danish, Dutch, English, German, Norwegian, Swedish
3758
@item Finno-Ugric family
3760
@item Latin/Greek family
3762
@item Semitic family
3764
@item Romanic family
3765
Italian, Portuguese, Spanish
3770
@item Two forms, singular used for zero and one
3771
Exceptional case in the language family. The header entry would be:
3774
Plural-Forms: nplurals=2; plural=n>1;
3778
Languages with this property include:
3781
@item Romanic family
3782
French, Brazilian Portuguese
3785
@item Three forms, special case for zero
3786
The header entry would be:
3789
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
3793
Languages with this property include:
3800
@item Three forms, special cases for one and two
3801
The header entry would be:
3804
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
3808
Languages with this property include:
3815
@item Three forms, special case for numbers ending in 1[2-9]
3816
The header entry would look like this:
3819
Plural-Forms: nplurals=3; \
3820
plural=n%10==1 && n%100!=11 ? 0 : \
3821
n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
3825
Languages with this property include:
3832
@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
3833
The header entry would look like this:
3836
Plural-Forms: nplurals=3; \
3837
plural=n%10==1 && n%100!=11 ? 0 : \
3838
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
3842
Languages with this property include:
3846
Croatian, Czech, Russian, Slovak, Ukrainian
3849
@item Three forms, special case for one and some numbers ending in 2, 3, or 4
3850
The header entry would look like this:
Plural-Forms: nplurals=3; \
plural=n==1 ? 0 : \
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
3859
Languages with this property include:
3866
@item Four forms, special case for one and all numbers ending in 02, 03, or 04
3867
The header entry would look like this:
3870
Plural-Forms: nplurals=4; \
3871
plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
3875
Languages with this property include:
3883
@node GUI program problems, Optimized gettext, Plural forms, gettext
3884
@subsection How to use @code{gettext} in GUI programs

One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts the
length. But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program and may need different translations. This is
especially true for the one-word strings which are frequently used in
GUI programs.

As a consequence many people say that the @code{gettext} approach is
wrong and instead @code{catgets} should be used, which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.

As an example consider the following fictional situation. A GUI program
has a menu bar with the following entries:
3906
+------------+------------+--------------------------------------+
3907
| File | Printer | |
3908
+------------+------------+--------------------------------------+
3911
+----------+ | Connect |

To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated there has to be
at some point in the code a call to a function of the @code{gettext}
family. But in two places the string passed into the function would be
@code{Open}. The translations might not be the same and therefore we
are in the dilemma described above.

One solution to this problem is to artificially lengthen the strings
to make them unambiguous. But what would the program do if no
translation is available? The lengthened string is not what should be
printed. So we should use a slightly modified version of the functions.

To lengthen the strings a uniform method should be used. E.g., in the
example above the strings could be chosen as
3937
Menu|Printer|Connect

Now all the strings are different and if now instead of @code{gettext}
the following little wrapper function is used, everything works just
fine:

@example
char *
sgettext (const char *msgid)
@{
  char *msgval = gettext (msgid);
  if (msgval == msgid)
    msgval = strrchr (msgid, '|') + 1;
  return msgval;
@}
@end example

3956
What this little function does is to recognize the case when no
3957
translation is available. This can be done very efficiently by a
3958
pointer comparison since the return value is the input value. If there
3959
is no translation we know that the input string is in the format we used
3960
for the Menu entries and therefore contains a @code{|} character. We
3961
simply search for the last occurrence of this character and return a
3962
pointer to the character following it. That's it!
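
The wrapper relies on every untranslated string containing a @code{|}
character. Where that cannot be guaranteed, a slightly more defensive
variant may be preferable. The following is a sketch only; the name
@code{sgettext_checked} is hypothetical and the function is not provided
by the GNU @code{gettext} package.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical variant of the wrapper described above.  If no
   translation is found and MSGID contains no '|', MSGID is returned
   unchanged instead of computing a pointer from a null strrchr
   result.  */
const char *
sgettext_checked (const char *msgid)
{
  const char *msgval = gettext (msgid);

  if (msgval == msgid)          /* No translation available.  */
    {
      const char *separator = strrchr (msgid, '|');
      if (separator != NULL)
        msgval = separator + 1; /* Skip the disambiguating prefix.  */
    }
  return msgval;
}
```

The extra test costs one comparison in the untranslated case and lets the
same wrapper be used for ordinary strings that carry no prefix.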

If one now consistently uses the lengthened string form and replaces
the @code{gettext} calls with calls to @code{sgettext} (this is normally
limited to very few places in the GUI implementation) then it is
possible to produce a program which can be internationalized.

The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
and the @code{ngettext} equivalents) can and should have corresponding
functions as well which look almost identical, except for the parameters
and the call to the underlying function.

Now there is of course the question why such functions do not exist in
the GNU gettext package. There are two parts to the answer.

@itemize @bullet
@item
They are easy to write and therefore can be provided by the project they
are used in. This is not an answer by itself and must be seen together
with the second part, which is:

@item
There is no way the gettext package can contain a version which can work
everywhere. The problem is the selection of the character to separate
the prefix from the actual string in the lengthened string. The
examples above used @code{|}, which is quite a good choice because it
resembles a notation frequently used in this context and it also is a
character not often used in message strings.

But what if the character is used in message strings? Or if the chosen
character is not available in the character set on the machine one
compiles on (e.g., @code{|} is not required to exist for @w{ISO C}; this is
why the @file{iso646.h} file exists in @w{ISO C} programming environments)?
@end itemize

There is only one more comment to be made. The wrapper function above
requires that the translated strings are not lengthened themselves.
This is only logical. There is no need to disambiguate the strings
(since they are never used as keys for a search) and one also saves
quite some memory and disk space by doing this.
4003
@node Optimized gettext, , GUI program problems, gettext
4004
@subsection Optimization of the *gettext functions
4006
At this point of the discussion we should talk about an advantage of the
4007
GNU @code{gettext} implementation. Some readers might have pointed out
4008
that an internationalized program might have a poor performance if some
4009
string has to be translated in an inner loop. While this is unavoidable
4010
when the string varies from one run of the loop to the other it is
4011
simply a waste of time when the string is always the same. Take the
4019
puts (gettext ("Hello world"));
4026
When the locale selection does not change between two runs the resulting
4027
string is always the same. One way to use this is:
4032
str = gettext ("Hello world");

But this solution is not usable in all situations (e.g. when the locale
selection changes) nor does it lead to legible code.
4045
For this reason, GNU @code{gettext} caches previous translation results.
4046
When the same translation is requested twice, with no new message
4047
catalogs being loaded in between, @code{gettext} will, the second time,
4048
find the result through a single cache lookup.
4050
@node Comparison, Using libintl.a, gettext, Programmers
4051
@section Comparing the Two Interfaces
4053
@c FIXME: arguments to catgets vs. gettext
4054
@c Partly done 950718 -- drepper
4056
The following discussion is perhaps a little bit colored. As said
4057
above we implemented GNU @code{gettext} following the Uniforum
4058
proposal and this surely has its reasons. But it should show how we
4059
came to this decision.

First we take a look at the development process. When we write an
application using NLS provided by @code{gettext} we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated we use @code{gettext("@dots{}")} instead of
@code{"@dots{}"}. At the beginning of each source file (or in a central
header file) we define

@example
#define gettext(String) (String)
@end example

Even this definition can be avoided when the system supports the
@code{gettext} function in its C library. When we compile this code the
result is the same as if no NLS code is used. When you take a look at
the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
instead of @code{gettext("@dots{}")}. This reduces the number of
additional characters per translatable string to @emph{3} (in words:
three).

When now a production version of the program is needed we simply replace
the definition

@example
#define _(String) (String)
@end example

@noindent
by

@example
#include <libintl.h>
#define _(String) gettext (String)
@end example

Additionally we run the program @file{xgettext} on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.

The same procedure can be done for the @code{gettext_noop} invocations
(@pxref{Special cases}). One usually defines @code{gettext_noop} as a
no-op macro. So you should consider the following code for your project:

@example
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)
@end example

@code{N_} is a short form similar to @code{_}. The @file{Makefile} in
the @file{po/} directory of GNU @code{gettext} knows by default both of the
mentioned short forms so you are invited to follow this proposal for
your own packages.
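
The definitions discussed in this section can be collected in one place
so that a single switch selects between the development and the
production variant. The following is a sketch under stated assumptions:
the preprocessor symbol @code{ENABLE_NLS} is assumed to be supplied by
the build system, and the array @code{day_names} and the function
@code{day_label} are hypothetical names used only to show where each
macro belongs.

```c
/* ENABLE_NLS is assumed to be defined by the build system when
   translations are wanted; without it the macros expand to the
   plain strings.  */
#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
# define _(String) (String)
#endif
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)

/* N_ only marks the strings; a function call would not be allowed in
   a static initializer.  */
static const char *const day_names[] = { N_("Monday"), N_("Tuesday") };

/* The actual translation happens where the string is used.  */
const char *
day_label (int index)
{
  return _(day_names[index]);
}
```

With @code{ENABLE_NLS} undefined the function simply returns the English
strings, which matches the development setup described above.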

Now to @code{catgets}. The main problem is the work for the
programmer. Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs etc. If he wants to have the same
quality in the message catalog as the GNU @code{gettext} program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog. This is
nearly a Mission: Impossible.
4125
But there are also some points people might call advantages speaking for
4126
@code{catgets}. If you have a single word in a string and this string
4127
is used in different contexts it is likely that in one or the other
4128
language the word has different translations. Example:
4131
printf ("%s: %d", gettext ("number"), number_of_errors)
4133
printf ("you should see %d %s", number_count,
4134
number_count == 1 ? gettext ("number") : gettext ("numbers"))
4137
Here we have to translate two times the string @code{"number"}. Even
4138
if you do not speak a language beside English it might be possible to
4139
recognize that the two words have a different meaning. In German the
4140
first appearance has to be translated to @code{"Anzahl"} and the second

Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem and decided that
it does not weigh that much. The solution for the above problem could
look like this:

@example
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
@end example

We believe that we can solve all conflicts with this method. If it is
difficult one can also consider changing one of the conflicting strings a
little bit. But it is not impossible to overcome.

@code{catgets} allows the same original entry to have different translations,
but @code{gettext} has another, scalable approach for solving ambiguities
of this kind; see @ref{Ambiguities}.
4164
@node Using libintl.a, gettext grok, Comparison, Programmers
4165
@section Using libintl.a in own programs

Starting with version 0.9.4 the library @code{libintl.a} should be
self-contained. I.e., you can use it in your own programs without
providing additional functions. The @file{Makefile} will put the header
and the library in directories selected using the @code{$(prefix)}.

One exception to the above is found on HP-UX 10.01 systems. Here the C
library does not contain the @code{alloca} function (and the HP compiler
does not generate it inlined). But it is not intended to rewrite the whole
library just because of this dumb system. Instead include the
@code{alloca} function in all packages you use @code{libintl.a} in.
4178
@node gettext grok, Temp Programmers, Using libintl.a, Programmers
4179
@section Being a @code{gettext} grok

To fully exploit the functionality of the GNU @code{gettext} library it
is surely helpful to read the source code. But for those who don't want
to spend that much time reading the (sometimes complicated) code, here
is a list of comments:
4187
@item Changing the language at runtime

For interactive programs it might be useful to offer a selection of the
used language at runtime. To understand how to do this one needs to know
how the used language is determined while executing the @code{gettext}
function. The method which is presented here only works correctly
with the GNU implementation of the @code{gettext} functions.

In the function @code{dcgettext} at every call the current setting of
the highest priority environment variable is determined and used.
Highest priority means here the following list with decreasing
priority:
4201
@item @code{LANGUAGE}
4203
@item @code{LC_xxx}, according to selected locale

Afterwards the path is constructed using the found value and the
translation file is loaded if available.

What happens now when the value for, say, @code{LANGUAGE} changes? According
to the process explained above the new value of this variable is found
as soon as the @code{dcgettext} function is called. But this also means
the (perhaps) different message catalog file is loaded. In other
words: the used language is changed.

But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the calling of
the @code{dcgettext} function as long as no new catalog is loaded. But
if @code{dcgettext} is not called the program also cannot notice that the
@code{LANGUAGE} variable has changed (@pxref{Optimized gettext}). The
solution for this is very easy. Include the following code in the
language switching function.

@example
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  @{
    extern int  _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  @}
@end example

The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
The programmer will need a construct like this only
when developing programs which run for a long time and allow the user to
select the language at runtime. Non-interactive programs (like all
these little Unix tools) should never need this.
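
The two steps named in the fragment above can be wrapped in one function.
This is a sketch that works only with the GNU implementation; it assumes
that ``making the change known'' means incrementing the counter
@code{_nl_msg_cat_cntr}, and the function name @code{switch_language} is
hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: select LANGUAGE_CODE (for example "fr") for all
   following translations.  Returns 0 on success and -1 if the
   environment could not be changed.  */
int
switch_language (const char *language_code)
{
  /* Internal counter of the GNU implementation; bumping it invalidates
     previously cached lookups.  */
  extern int _nl_msg_cat_cntr;

  if (setenv ("LANGUAGE", language_code, 1) != 0)
    return -1;

  ++_nl_msg_cat_cntr;
  return 0;
}
```

A long-running interactive program would call such a function from its
language menu and then redraw its interface so that the new translations
become visible.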
4243
@node Temp Programmers, , gettext grok, Programmers
4244
@section Temporary Notes for the Programmers Chapter
4247
* Temp Implementations:: Temporary - Two Possible Implementations
4248
* Temp catgets:: Temporary - About @code{catgets}
4249
* Temp WSI:: Temporary - Why a single implementation
4250
* Temp Notes:: Temporary - Notes
4253
@node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers
4254
@subsection Temporary - Two Possible Implementations
4256
There are two competing methods for language independent messages:
4257
the X/Open @code{catgets} method, and the Uniforum @code{gettext}
4258
method. The @code{catgets} method indexes messages by integers; the
4259
@code{gettext} method indexes them by their English translations.
4260
The @code{catgets} method has been around longer and is supported
4261
by more vendors. The @code{gettext} method is supported by Sun,
4262
and it has been heard that the COSE multi-vendor initiative is
4263
supporting it. Neither method is a POSIX standard; the POSIX.1
4264
committee had a lot of disagreement in this area.
4266
Neither one is in the POSIX standard. There was much disagreement
4267
in the POSIX.1 committee about using the @code{gettext} routines
4268
vs. @code{catgets} (XPG). In the end the committee couldn't
4269
agree on anything, so no messaging system was included as part
4270
of the standard. I believe the informative annex of the standard
4271
includes the XPG3 messaging interfaces, ``@dots{}as an example of
4272
a messaging system that has been implemented@dots{}''
4274
They were very careful not to say anywhere that you should use one
4275
set of interfaces over the other. For more on this topic please
4276
see the Programming for Internationalization FAQ.
4278
@node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers
4279
@subsection Temporary - About @code{catgets}

There have been a few discussions of late on the use of
@code{catgets} as a base. I think it important to present both
sides of the argument and hence am opting to play devil's advocate
for @code{catgets}.
4286
I'll not deny the fact that @code{catgets} could have been designed
4287
a lot better. It currently has quite a number of limitations and
4288
these have already been pointed out.
4290
However there is a great deal to be said for consistency and
4291
standardization. A common recurring problem when writing Unix
4292
software is the myriad portability problems across Unix platforms.
4293
It seems as if every Unix vendor had a look at the operating system
4294
and found parts they could improve upon. Undoubtedly, these
4295
modifications are probably innovative and solve real problems.
4296
However, software developers have a hard time keeping up with all
4297
these changes across so many platforms.
4299
And this has prompted the Unix vendors to begin to standardize their
4300
systems. Hence the impetus for Spec1170. Every major Unix vendor
4301
has committed to supporting this standard and every Unix software
4302
developer waits with glee the day they can write software to this
4303
standard and simply recompile (without having to use autoconf)
4304
across different platforms.
4306
As I understand it, Spec1170 is roughly based upon version 4 of the
4307
X/Open Portability Guidelines (XPG4). Because @code{catgets} and
4308
friends are defined in XPG4, I'm led to believe that @code{catgets}
4309
is a part of Spec1170 and hence will become a standardized component
4310
of all Unix systems.
4312
@node Temp WSI, Temp Notes, Temp catgets, Temp Programmers
4313
@subsection Temporary - Why a single implementation
4315
Now it seems kind of wasteful to me to have two different systems
4316
installed for accessing message catalogs. If we do want to remedy
4317
@code{catgets} deficiencies why don't we try to expand @code{catgets}
4318
(in a compatible manner) rather than implement an entirely new system.
4319
Otherwise, we'll end up with two message catalog access systems installed
4320
with an operating system - one set of routines for packages using GNU
4321
@code{gettext} for their internationalization, and another set of routines
4322
(catgets) for all other software. Bloated?
4324
Supposing another catalog access system is implemented. Which do
4325
we recommend? At least for Linux, we need to attract as many
4326
software developers as possible. Hence we need to make it as easy
4327
for them to port their software as possible. Which means supporting
4328
@code{catgets}. We will be implementing the @code{libintl} code
4329
within our @code{libc}, but does this mean we also have to incorporate
4330
another message catalog access scheme within our @code{libc} as well?
4331
And what about people who are going to be using the @code{libintl}
4332
+ non-@code{catgets} routines. When they port their software to
4333
other platforms, they're now going to have to include the front-end
4334
(@code{libintl}) code plus the back-end code (the non-@code{catgets}
4335
access routines) with their software instead of just including the
4336
@code{libintl} code with their software.
4338
Message catalog support is however only the tip of the iceberg.
4339
What about the data for the other locale categories? They also have
4340
a number of deficiencies. Are we going to abandon them as well and
4341
develop another duplicate set of routines (should @code{libintl}
4342
expand beyond message catalog support)?
4344
Like many parts of Unix that can be improved upon, we're stuck with balancing
compatibility with the past with useful improvements and innovations for
the future.
4348
@node Temp Notes, , Temp WSI, Temp Programmers
4349
@subsection Temporary - Notes
4351
X/Open agreed very late on the standard form so that many
4352
implementations differ from the final form. Both of my systems (old
4353
Linux catgets and Ultrix-4) have a strange variation.
4355
OK. After incorporating the last changes I have to spend some time on
4356
making the GNU/Linux @code{libc} @code{gettext} functions. So in the future
Solaris will not be the only system having @code{gettext}.
4359
@node Translators, Maintainers, Programmers, Top
4360
@chapter The Translator's View
4362
@c FIXME: Reorganize whole chapter.
4365
* Trans Intro 0:: Introduction 0
4366
* Trans Intro 1:: Introduction 1
4367
* Discussions:: Discussions
4368
* Organization:: Organization
4369
* Information Flow:: Information Flow
4372
@node Trans Intro 0, Trans Intro 1, Translators, Translators
4373
@section Introduction 0
4375
Free software is going international! The Translation Project is a way
4376
to get maintainers, translators and users all together, so free software
4377
will gradually become able to speak many native languages.
4379
The GNU @code{gettext} tool set contains @emph{everything} maintainers
4380
need for internationalizing their packages for messages. It also
contains quite useful tools for helping translators localize
messages into their native language, once a package has already been
internationalized.
4385
To achieve the Translation Project, we need many interested
4386
people who like their own language and write it well, and who are also
4387
able to synergize with other translators speaking the same language.
4388
If you'd like to volunteer to @emph{work} at translating messages,
4389
please send mail to your translating team.
4391
Each team has its own mailing list, courtesy of Linux
4392
International. You may reach your translating team at the address
4393
@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
4394
code for your language. Language codes are @emph{not} the same as
4395
country codes given in @w{ISO 3166}. The following translating teams
exist:
4399
Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
4400
Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
4401
@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
4402
Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
4403
@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
4404
Swedish @code{sv} and Turkish @code{tr}.
4408
For example, you may reach the Chinese translating team by writing to
4409
@file{zh@@li.org}. When you become a member of the translating team
4410
for your own language, you may subscribe to its list. For example,
4411
Swedish people can send a message to @w{@file{sv-request@@li.org}},
4412
having this message body:
4418
Keep in mind that team members should be interested in @emph{working}
4419
at translations, or at solving translational difficulties, rather than
4420
merely lurking around. If your team does not exist yet and you want to
4421
start one, please write to @w{@file{translation@@iro.umontreal.ca}};
4422
you will then reach the coordinator for all translator teams.
4424
A handful of GNU packages have already been adapted and provided
4425
with message translations for several languages. Translation
4426
teams have begun to organize, using these packages as a starting
4427
point. But there are many more packages and many languages for
4428
which we have no volunteer translators. If you would like to
4429
volunteer to work at translating messages, please send mail to
4430
@file{translation@@iro.umontreal.ca} indicating what language(s)
you can work on.
4433
@node Trans Intro 1, Discussions, Trans Intro 0, Translators
4434
@section Introduction 1
4436
This is now official, GNU is going international! Here is the
4437
announcement submitted for the January 1995 GNU Bulletin:
4440
A handful of GNU packages have already been adapted and provided
4441
with message translations for several languages. Translation
4442
teams have begun to organize, using these packages as a starting
4443
point. But there are many more packages and many languages
4444
for which we have no volunteer translators. If you'd like to
4445
volunteer to work at translating messages, please send mail to
4446
@samp{translation@@iro.umontreal.ca} indicating what language(s)
you can work on.
4450
This document should answer many questions for those who are curious about
4451
the process or would like to contribute. Please at least skim over it,
in the hope of cutting down a little of the high volume of e-mail generated by this
4453
collective effort towards internationalization of free software.
4455
Most free programming which is widely shared is done in English, and
4456
currently, English is used as the main communicating language between
4457
national communities collaborating on free software. This very document
4458
is written in English. This will not change in the foreseeable future.
4460
However, there is a strong appetite from national communities for
4461
having more software able to write using national language and habits,
4462
and there is an on-going effort to modify free software in such a way
4463
that it becomes able to do so. The experiments conducted so far raised
an enthusiastic response from pretesters, so we believe that
internationalization of free software is destined to succeed.
4467
For suggestions, clarifications, additions or corrections to this
document, please e-mail @file{translation@@iro.umontreal.ca}.
4470
@node Discussions, Organization, Trans Intro 1, Translators
4471
@section Discussions
4473
Facing this internationalization effort, a few users expressed their
4474
concerns. Some of these doubts are presented and discussed here.
4477
@item Smaller groups
4479
Some languages are not spoken by a very large number of people, so people
4480
speaking them sometimes consider that there may not be all that much
4481
demand for such versions of free software packages. Moreover, many people
4482
being @emph{into computers}, in some countries, generally seem to prefer
4483
English versions of their software.
4485
On the other hand, people might enjoy their own language a lot, and be
very motivated to give themselves the pleasure of having their
beloved free software speaking their mother tongue. They do themselves
a personal favor, and do not pay that much attention to the number of
people benefiting from their work.
4491
@item Misinterpretation
4493
Other users are shy to push forward their own language, seeing in this
4494
some kind of misplaced propaganda. Someone thought there must be some
4495
users of the language over the networks pestering other people with it.
4497
But any spoken language is worth localization, because there are
4498
people behind the language for whom the language is important and
4499
dear to their hearts.
4501
@item Odd translations
4503
The biggest problem is to find the right translations so that
4504
everybody can understand the messages. Translations are usually a
4505
little odd. Some people get used to English, to the extent they may
4506
find translations into their own language ``rather pushy, obnoxious
4507
and sometimes even hilarious.'' As a French-speaking man, I have
experience of those instruction manuals for goods, so poorly
translated into French in Korea or Taiwan@dots{}
4511
The fact is that we sometimes have to create a kind of national
4512
computer culture, and this is not easy without the collaboration of
4513
many people liking their mother tongue. This is why translations are
4514
better achieved by people knowing and loving their own language, and
4515
ready to work together at improving the results they obtain.
4517
@item Dependencies over the GPL or LGPL
4519
Some people wonder if using GNU @code{gettext} necessarily brings their
4520
package under the protective wing of the GNU General Public License or
4521
the GNU Library General Public License, when they do not want to make
4522
their program free, or want other kinds of freedom. The simplest
4523
answer is ``normally not''.
4525
The GNU @code{gettext} library, i.e. the contents of @code{libintl},
4526
is covered by the GNU Library General Public License. The rest of
4527
the GNU @code{gettext} package is covered by the GNU General Public
License.
4530
The mere marking of localizable strings in a package, or conditional
4531
inclusion of a few lines for initialization, is not really including
4532
GPL'ed or LGPL'ed code. However, since the localization routines in
4533
@code{libintl} are under the LGPL, the LGPL needs to be considered.
4534
It gives the right to distribute the complete unmodified source of
4535
@code{libintl} even with non-free programs. It also gives the right
4536
to use @code{libintl} as a shared library, even for non-free programs.
4537
But it gives the right to use @code{libintl} as a static library or
4538
to incorporate @code{libintl} into another library only to free
software.
4543
@node Organization, Information Flow, Discussions, Translators
4544
@section Organization
4546
On a larger scale, the true solution would be to organize some kind of
4547
fairly precise set up in which volunteers could participate. I gave
4548
some thought to this idea lately, and realize there will be some
4549
touchy points. I thought of writing to Richard Stallman to launch
4550
such a project, but feel it might be good to shake out the ideas
4551
between ourselves first. Most probably Linux International has
4552
some experience in the field already, or would like to orchestrate
4553
the volunteer work, maybe. Food for thought, in any case!
4555
I guess we have to set up something early, somehow, that will help
4556
many possible contributors of the same language to interlock and avoid
4557
work duplication, and further be put in contact for solving together
4558
problems particular to their tongue (in most languages, there are many
4559
difficulties peculiar to translating technical English). My Swedish
4560
contributor acknowledged these difficulties, and I'm well aware of
4563
This is surely not a technical issue, but we should manage so the
4564
effort of local contributors is maximally useful, despite the national
4565
team layer interface between contributors and maintainers.
4567
The Translation Project needs some setup for coordinating language
4568
coordinators. Localizing evolving programs will surely
4569
become a permanent and continuous activity in the free software community,
4571
The setup should be minimally completed and tested before GNU
4572
@code{gettext} becomes an official reality. The e-mail address
4573
@file{translation@@iro.umontreal.ca} has been set up for receiving
4574
offers from volunteers and general e-mail on these topics. This address
4575
reaches the Translation Project coordinator.
4578
* Central Coordination:: Central Coordination
4579
* National Teams:: National Teams
4580
* Mailing Lists:: Mailing Lists
4583
@node Central Coordination, National Teams, Organization, Organization
4584
@subsection Central Coordination
4586
I also think GNU will need, sooner than it thinks, someone to set up
4587
a way to organize and coordinate these groups. Some kind of group
4588
of groups. My opinion is that it would be good that GNU delegates
4589
this task to a small group of collaborating volunteers, shortly.
4590
Perhaps in @file{gnu.announce} a list of this national committee's
4593
My role as coordinator would simply be to refer to Ulrich any German
4594
speaking volunteer interested in localization of free software packages, and
maybe helping national groups to initially organize, while maintaining
national registries until national groups are ready to take over.
4597
In fact, the coordinator should make it easy for volunteers to get in contact with
4598
one another for creating national teams, which should then select
4599
one coordinator per language, or country (regionalized language).
4600
If well done, the coordination should be useful without being an
4601
overwhelming task, the time to put delegations in place.
4603
@node National Teams, Mailing Lists, Central Coordination, Organization
4604
@subsection National Teams
4606
I suggest we look for volunteer coordinators/editors for individual
4607
languages. These people will scan contributions of translation files
4608
for various programs, for their own languages, and will ensure high
4609
and uniform standards of diction.
4611
From my current experience with other people in these days, those who
4612
provide localizations are very enthusiastic about the process, and are
4613
more interested in the localization process than in the program they
4614
localize, and want to do many programs, not just one. This seems
4615
to confirm that having a coordinator/editor for each language is a
good idea.
4618
We need to choose someone who is good at writing clear and concise
4619
prose in the language in question. That is hard---we can't check
4620
it ourselves. So we need to ask a few people to judge each other's
4621
writing and select the one who is best.
4623
I announced my prerelease to a few dozen people, and you would not
believe all the discussions it generated already. I shudder to think
what will happen when this will be launched, for true, officially,
world wide. Who am I to arbitrate between two Czechoslovak users
4627
contradicting each other, for example?
4629
I assume that your German is not much better than my French so that
4630
I would not be able to judge these formulations. What I would
suggest is that for each language there is a group for people who
maintain the PO files and judge changes. I suspect there will
4633
be cultural differences between how such groups of people will behave.
4634
Some will have relaxed ways, reach consensus easily, and have anyone
4635
of the group relate to the maintainers, while others will fight to
4636
death, organize heavy administrations up to national standards, and
4637
use strict channels.
4639
The German team is putting out a good example. Right now, they are
4640
maybe half a dozen people revising translations of each other and
4641
discussing the linguistic issues. I do not even have all the names.
4642
Ulrich Drepper is taking care of coordinating the German team.
4643
He subscribed to all my pretest lists, so I do not even have to warn
4644
him specifically of incoming releases.
4646
I'm sure that it is a good idea to get teams for each language working
on translations. That will make the translations better and more
consistent.
4651
* Sub-Cultures:: Sub-Cultures
4652
* Organizational Ideas:: Organizational Ideas
4655
@node Sub-Cultures, Organizational Ideas, National Teams, National Teams
4656
@subsubsection Sub-Cultures
4658
Taking French for example, there are a few sub-cultures around computers
4659
which developed diverging vocabularies. Picking volunteers here and
4660
there without addressing this problem in an organized way, soon in the
4661
project, might produce a distasteful mix of internationalized programs,
4662
and possibly trigger endless quarrels among those who really care.
4664
Keeping some kind of unity in the way French localization of
4665
internationalized programs is achieved is a difficult (and delicate) job.
4666
Knowing the Latin character of French people (:-), if we take this
4667
the wrong way, we could end up nowhere, or spoil a lot of energies.
4668
Maybe we should begin to address this problem seriously @emph{before}
4669
GNU @code{gettext} becomes officially published. And I suspect that this
4672
@node Organizational Ideas, , Sub-Cultures, National Teams
4673
@subsubsection Organizational Ideas
4675
I expect the next big changes after the official release. Please note
4676
that I use the German translation of the short GPL message. We need
4677
to set a few good examples before the localization goes out for true
4678
in the free software community. Here are a few points to discuss:
4682
Each group should have one FTP server (at least one master).
4685
The files on the server should reflect the latest version (of
4686
course!) and it should also contain a RCS directory with the
4687
corresponding archives (I don't have this now).
4690
There should also be a ChangeLog file (this is more useful than the
4691
RCS archive but can be generated automatically from the latter by
4695
A @dfn{core group} should judge about questionable changes (for now
4696
this group consists solely of me but I ask some others occasionally;
4697
this also seems to work).
4701
@node Mailing Lists, , National Teams, Organization
4702
@subsection Mailing Lists
4704
If we get any inquiries about GNU @code{gettext}, send them on to:
4707
@file{translation@@iro.umontreal.ca}
4710
The @file{*-pretest} lists are quite useful to me, maybe the idea could
4711
be generalized to many GNU, and non-GNU packages. But each maintainer
4714
Fran@,{c}ois, we have a mechanism in place here at
4715
@file{gnu.ai.mit.edu} to track teams, support mailing lists for
4716
them and log members. We have a slight preference that you use it.
4717
If this is OK with you, I can get you clued in.
4719
Things are changing! A few years ago, when Daniel Fekete and I
4720
asked for a mailing list for GNU localization, nested at the FSF, we
4721
were politely invited to organize it anywhere else, and so did we.
4722
For communicating with my pretesters, I later made a handful of
4723
mailing lists located at iro.umontreal.ca and administered by
4724
@code{majordomo}. These lists have been @emph{very} dependable
4727
I suspect that the German team will organize itself a mailing list
4728
located in Germany, and so forth for other countries. But before they
4729
organize for true, it could surely be useful to offer mailing lists
4730
located at the FSF to each national team. So yes, please explain to me
4731
how I should proceed to create and handle them.
4733
We should create temporary mailing lists, one per country, to help
4734
people organize. Temporary, because once regrouped and structured, it
4735
would be fair for the volunteers from each country to bring @emph{their} list
back there and manage it as they want. My feeling is that, in the long
4737
run, each team should run its own list, from within their country.
4738
There also should be some central list to which all teams could
4739
subscribe as they see fit, as long as each team is represented in it.
4741
@node Information Flow, , Organization, Translators
4742
@section Information Flow
4744
There will surely be some discussion about these messages after the
4745
packages are finally released. If people now send you some proposals
4746
for better messages, how do you proceed? Jim, please note that
4747
right now, as I put forward nearly a dozen localizable programs, I
4748
receive both the translations and the coordination concerns about them.
4750
If I put one of my things to pretest, Ulrich receives the announcement
4751
and passes it on to the German team, who make last minute revisions.
4752
Then he submits the translation files to me @emph{as the maintainer}.
4753
For free packages I do not maintain, I would not even hear about it.
4754
This scheme could be made to work for the whole Translation Project,
4755
I think. For security reasons, maybe Ulrich (national coordinators,
4756
in fact) should update the central registry kept at the Translation Project
4757
(Jim, me, or Len's recruits) once in a while.
4759
In December/January, I was aggressively ready to internationalize
4760
all of GNU, giving myself the duty of one small GNU package per week
4761
or so, taking many weeks or months for bigger packages. But it does
4762
not work this way. I first did all the things I'm responsible for.
4763
I've nothing against some missionary work on other maintainers, but
4764
I'm also losing a lot of energy over it---same debates over again.
4766
And when the first localized packages are released we'll get a lot of
4767
responses about ugly translations :-). Surely, and we need to have
4768
beforehand a fairly good idea about how to handle the information
4769
flow between the national teams and the package maintainers.
4771
Please start saving somewhere a quick history of each PO file. I know
4772
for sure that the file format will change, allowing for comments.
4773
It would be nice that each file has a kind of log, and references for
4774
those who want to submit comments or gripes, or otherwise contribute.
4775
I sent a proposal for a fast and flexible format, but it is not
4776
receiving acceptance yet by the GNU deciders. I'll tell you when I
4777
have more information about this.
4779
@node Maintainers, Conclusion, Translators, Top
4780
@chapter The Maintainer's View
4782
The maintainer of a package has many responsibilities. One of them
4783
is ensuring that the package will install easily on many platforms,
4784
and that the magic we described earlier (@pxref{Users}) will work
4785
for installers and end users.
4787
Of course, there are many possible ways by which GNU @code{gettext}
4788
might be integrated in a distribution, and this chapter does not cover
4789
them in all generality. Instead, it details one possible approach which
4790
is especially adequate for many free software distributions following GNU
4791
standards, or even better, Gnits standards, because GNU @code{gettext}
4792
is purposely for helping the internationalization of the whole GNU
4793
project, and as many other good free packages as possible. So, the
4794
maintainer's view presented here presumes that the package already has
4795
a @file{configure.in} file and uses GNU Autoconf.
4797
Nevertheless, GNU @code{gettext} may surely be useful for free packages
4798
not following GNU standards and conventions, but the maintainers of such
4799
packages might have to show imagination and initiative in organizing
4800
their distributions so @code{gettext} works for them in all situations.
4801
There are surely many, out there.
4803
Even if @code{gettext} methods are now stabilizing, slight adjustments
4804
might be needed between successive @code{gettext} versions, so you
4805
should ideally revise this chapter in subsequent releases, looking
4809
* Flat and Non-Flat:: Flat or Non-Flat Directory Structures
4810
* Prerequisites:: Prerequisite Works
4811
* gettextize Invocation:: Invoking the @code{gettextize} Program
4812
* Adjusting Files:: Files You Must Create or Alter
4815
@node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers
4816
@section Flat or Non-Flat Directory Structures
4818
Some free software packages are distributed as @code{tar} files which unpack
4819
in a single directory; these are said to be @dfn{flat} distributions.
4820
Other free software packages have a one level hierarchy of subdirectories, using
4821
for example a subdirectory named @file{doc/} for the Texinfo manual and
4822
man pages, another called @file{lib/} for holding functions meant to
4823
replace or complement C libraries, and a subdirectory @file{src/} for
4824
holding the proper sources for the package. These other distributions
4825
are said to be @dfn{non-flat}.
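
As a rough illustration, the skeleton of such a non-flat distribution
could be laid out as in the following sketch. The package name
@file{hello} and the use of a scratch directory are assumptions made
for this example only; the three subdirectory names are the ones
mentioned above.

```shell
#!/bin/sh
# Sketch: create the skeleton of a non-flat distribution in a scratch area.
# "hello" is a hypothetical package name used only for illustration.
set -eu
top=$(mktemp -d)/hello

# One level of subdirectories: documentation, replacement library
# functions, and the proper sources of the package.
mkdir -p "$top/doc" "$top/lib" "$top/src"

# Show the resulting hierarchy.
(cd "$top" && ls -d */)
```

A flat distribution, by contrast, would keep all of these files directly
in the top level directory.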
4827
We cannot say much about flat distributions. A flat
4828
directory structure has the disadvantage of increasing the difficulty
4829
of updating to a new version of GNU @code{gettext}. Also, if you have
4830
many PO files, this could somewhat pollute your single directory.
4831
Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
4832
scripts, @code{sed} scripts and complicated Makefile rules, which don't
4833
fit well into an existing flat structure. For these reasons, we
4834
recommend using a non-flat approach in this case as well.
4836
Maybe because GNU @code{gettext} itself has a non-flat structure,
4837
we have more experience with this approach, and this is what will be
4838
described in the remainder of this chapter. Some maintainers might
4839
use this as an opportunity to unflatten their package structure.
4841
@node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers
4842
@section Prerequisite Works
4844
There are some works which are required for using GNU @code{gettext}
in one of your packages. These works have some kind of generality
that escapes the point by point descriptions used in the remainder
4847
of this chapter. So, we describe them here.
4851
Before attempting to use @code{gettextize} you should install some
4852
other packages first.
4853
Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
4854
@code{gettext} are already installed at your site, and if not, proceed
4855
to do this first. If you have to install these things, beware that
4856
GNU @code{m4} must be fully installed before GNU Autoconf is even
4859
To further ease the task of a package maintainer the @code{automake}
4860
package was designed and implemented. GNU @code{gettext} now uses this
4861
tool and the @file{Makefile}s in the @file{intl/} and @file{po/}
directories therefore know about all the goals necessary for using @code{automake}
4863
and @file{libintl} in one project.
4865
Those four packages are only needed by you, as a maintainer; the
4866
installers of your own package and end users do not really need any of
4867
GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
4868
for successfully installing and running your package, with messages
4869
properly translated. But this is not completely true if you provide
4870
internationalized shell scripts within your own package: GNU
4871
@code{gettext} shall then be installed at the user site if the end users
4872
want to see the translation of shell script messages.
4875
Your package should use Autoconf and have a @file{configure.in} file.
4876
If it does not, you have to learn how. The Autoconf documentation
4877
is quite well written, it is a good idea that you print it and get
4881
Your C sources should have already been modified according to
4882
instructions given earlier in this manual. @xref{Sources}.
4885
Your @file{po/} directory should receive all PO files submitted to you
4886
by the translator teams, each having @file{@var{ll}.po} as a name.
4887
It is not usually easy to get translation
4888
work done before your package gets internationalized and available!
4889
Since the cycle has to start somewhere, the easiest for the maintainer
4890
is to start with absolutely no PO files, and wait until various
4891
translator teams get interested in your package, and submit PO files.
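
Because each PO file is named after its language code, the set of
languages a package currently supports can be read off the
@file{po/} directory. The following is a minimal sketch of that idea;
the function name @code{list_languages} is hypothetical and not part of
GNU @code{gettext}.

```shell
#!/bin/sh
# Sketch: derive the language codes from the ll.po files in a directory.
# list_languages is a hypothetical helper, not a gettext tool.
list_languages () {
    for f in "$1"/*.po; do
        # With no PO files the pattern stays unexpanded; print nothing.
        [ -e "$f" ] || continue
        # Strip the directory and the .po suffix, leaving the code "ll".
        basename "$f" .po
    done
}

# Usage, from the top level directory of a package:
#   list_languages po
```

Starting with an empty @file{po/} directory, as suggested above, simply
yields an empty list.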
4895
It is worth adding here a few words about how the maintainer should
4896
ideally behave with PO file submissions. As a maintainer, your role is
to authenticate the origin of the submission as being the representative
4898
of the appropriate translating teams of the Translation Project (forward
4899
the submission to @file{translation@@iro.umontreal.ca} in case of doubt),
4900
to ensure that the PO file format is not severely broken and does not
4901
prevent successful installation, and for the rest, merely to put these
4902
PO files in @file{po/} for distribution.
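
One possible way of checking that a submitted PO file is not severely
broken is to try compiling it with @code{msgfmt} and to discard the
result. The sketch below assumes that @code{msgfmt} is available in
your @code{PATH}; the function name @code{check_po} and the file name
used in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: try to compile a submitted PO file, discarding the MO output.
# check_po is a hypothetical helper; it returns msgfmt's exit status,
# or 2 when msgfmt is not installed.
check_po () {
    if ! command -v msgfmt >/dev/null 2>&1; then
        echo "msgfmt not found; cannot check $1" >&2
        return 2
    fi
    # A PO file with syntax errors makes msgfmt exit with a failure status.
    msgfmt -o /dev/null "$1"
}

# Usage (hypothetical file name):
#   check_po po/fr.po && echo "fr.po compiles"
```

Such a check only concerns the file format. It says nothing about the
quality of the translations, which remains the business of the
translation team.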
4904
As a maintainer, you do not have to take on your shoulders the
4905
responsibility of checking if the translations are adequate or
4906
complete, and should avoid diving into linguistic matters. Translation
4907
teams drive themselves and are fully responsible for their linguistic
4908
choices for the Translation Project. Keep in mind that translator teams are @emph{not}
4909
driven by maintainers. You can help by carefully redirecting all
4910
communications and reports from users about linguistic matters to the
4911
appropriate translation team, or explain to users how to reach or join
4912
their team. The simplest might be to send them the @file{ABOUT-NLS} file.
4914
Maintainers should @emph{never ever} apply PO file bug reports
4915
themselves, short-cutting translation teams. If some translator has
difficulty getting some of her points through her team, it should not be
an option for her to directly negotiate translations with maintainers.
4918
Teams ought to settle their problems themselves, if any. If you, as
4919
a maintainer, ever think there is a real problem with a team, please
4920
never try to @emph{solve} a team's problem on your own.
4922
@node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers
4923
@section Invoking the @code{gettextize} Program
4925
Some files are consistently and identically needed in every package
4926
internationalized through GNU @code{gettext}. As a matter of
4927
convenience, the @code{gettextize} program puts all these files right
4928
in your package. This program has the following synopsis:
4931
gettextize [ @var{option}@dots{} ] [ @var{directory} ]
4935
and accepts the following options:
4940
Copy the needed files instead of making symbolic links. Using links
4941
would allow the package to always use the latest @code{gettext} code
4942
available on the system, but it might disturb some mechanism the
4943
maintainer is used to applying to the sources. Because running
4944
@code{gettextize} is easy there shouldn't be problems with using copies.
4948
Force replacement of files which already exist.
4952
Display this help and exit.
4955
Output version information and exit.
4959
If @var{directory} is given, this is the top level directory of a
4960
package to prepare for using GNU @code{gettext}. If not given, it
4961
is assumed that the current directory is the top level directory of
the package.
4964
The program @code{gettextize} provides the following files. However,
4965
no existing file will be replaced unless the option @code{--force}
4966
(@code{-f}) is specified.
4970
The @file{ABOUT-NLS} file is copied in the main directory of your package,
4971
the one being at the top level. This file gives the main indications
4972
about how to install and use the Native Language Support features
4973
of your program. You might elect to use a more recent copy of this
4974
@file{ABOUT-NLS} file than the one provided through @code{gettextize},
4975
if you have one handy. You may also fetch a more recent copy of file
4976
@file{ABOUT-NLS} from Translation Project sites, and from most GNU
4980
A @file{po/} directory is created for eventually holding
4981
all translation files, but initially only containing the file
4982
@file{po/Makefile.in.in} from the GNU @code{gettext} distribution
(beware the double @samp{.in} in the file name). If the @file{po/}
4984
directory already exists, it will be preserved along with the files
4985
it contains, and only @file{Makefile.in.in} will be overwritten.
4988
A @file{intl/} directory is created and filled with most of the files
4989
originally in the @file{intl/} directory of the GNU @code{gettext}
4990
distribution. Also, if option @code{--force} (@code{-f}) is given,
4991
the @file{intl/} directory is emptied first.
4995
If your site supports symbolic links, @code{gettextize} will not
4996
actually copy the files into your package, but establish symbolic
4997
links instead. This avoids duplicating the disk space needed in
4998
all packages. Merely using the @samp{-h} option while creating the
4999
@code{tar} archive of your distribution will resolve each link by an
5000
actual copy in the distribution archive. So, to insist, you really
5001
should use the @samp{-h} option with @code{tar} within the @code{dist}
5002
goal of your main @file{Makefile.in}.
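
The effect of @samp{-h} can be observed with a small experiment, sketched
below. The directory and file names are made up for the illustration;
only the @samp{-h} option of @code{tar} comes from the text above.

```shell
#!/bin/sh
# Sketch: show that "tar -h" archives the file a symbolic link points to.
# All directory and file names here are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/shared" "$work/pkg"
echo "shared source" > "$work/shared/gettext.c"

# The package holds only a symbolic link to the shared file.
ln -s ../shared/gettext.c "$work/pkg/gettext.c"

# -h makes tar follow the link and store the file contents instead.
(cd "$work" && tar -chf pkg.tar pkg)

# Unpack elsewhere and look at what the archive really contained.
mkdir "$work/unpacked"
(cd "$work/unpacked" && tar -xf ../pkg.tar)
if [ -L "$work/unpacked/pkg/gettext.c" ]; then
    echo "archive contains a symbolic link"
else
    echo "archive contains a regular file"
fi
```

Without @samp{-h}, the archive would carry the link itself, which would
dangle on any machine where the link target is not present.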

It is interesting to understand that most new files for supporting
GNU @code{gettext} facilities in a package go in the @file{intl/}
and @file{po/} subdirectories. One distinction between these two
directories is that @file{intl/} is meant to be completely identical
in all packages using GNU @code{gettext}, while all newly created
files, which have to be different, go into @file{po/}. There is a
common @file{Makefile.in.in} in @file{po/}, because the @file{po/}
directory needs its own @file{Makefile}, and it has been designed so
that it can be identical in all packages.

@node Adjusting Files, , gettextize Invocation, Maintainers
@section Files You Must Create or Alter

Besides files which are automatically added through @code{gettextize},
there are many files needing revision to interact properly with
GNU @code{gettext}. If you are closely following GNU standards for
Makefile engineering and auto-configuration, the adaptations should
be easier to achieve. Here is a point by point description of the
changes needed in each.

So, here comes a list of files, each one followed by a description of
all alterations it needs. Many examples are taken from the GNU
@code{gettext} @value{VERSION} distribution itself. You may indeed
refer to the source code of the GNU @code{gettext} package, as it
is intended to be a good example and master implementation for using
its own functionality.

@menu
* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
* configure.in::                @file{configure.in} at top level
* config.guess::                @file{config.guess}, @file{config.sub} at top level
* aclocal::                     @file{aclocal.m4} at top level
* acconfig::                    @file{acconfig.h} at top level
* Makefile::                    @file{Makefile.in} at top level
* src/Makefile::                @file{Makefile.in} in @file{src/}
@end menu

@node po/POTFILES.in, configure.in, Adjusting Files, Adjusting Files
@subsection @file{POTFILES.in} in @file{po/}

The @file{po/} directory should receive a file named
@file{POTFILES.in}. This file tells which files, among all program
sources, have marked strings needing translation. Here is an example:

@example
# List of source files containing translatable strings.
# Copyright (C) 1995 Free Software Foundation, Inc.

# Common library files

# Package source files
@end example

Comments introduced by a hash mark and blank lines are ignored. All
other lines list those source files containing strings marked for
translation (@pxref{Mark Keywords}), in a notation relative to the top
level of your whole distribution, rather than the location of the
@file{POTFILES.in} file itself.

@node configure.in, config.guess, po/POTFILES.in, Adjusting Files
@subsection @file{configure.in} at top level

@enumerate
@item Declare the package and version.

This is done by a set of lines like these:

@example
PACKAGE=gettext
VERSION=@value{VERSION}
AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
@end example

Of course, you replace @samp{gettext} with the name of your package,
and @samp{@value{VERSION}} with its version number, exactly as they
should appear in the packaged @code{tar} file name of your distribution
(@file{gettext-@value{VERSION}.tar.gz}, here).

@item Declare the available translations.

This is done by defining @code{ALL_LINGUAS} to the whitespace-separated,
quoted list of available languages, on a single line, such as
@samp{ALL_LINGUAS="de fr"}.

This example means that German and French PO files are available, so
that these languages are currently supported by your package. If you
want to further restrict, at installation time, the set of installed
languages, this should not be done by modifying @code{ALL_LINGUAS} in
@file{configure.in}, but rather by using the @code{LINGUAS} environment
variable (@pxref{Installers}).

@item Check for internationalization support.

The main @code{m4} macro for triggering internationalization
support is @code{AM_GNU_GETTEXT}. Just add the line
@samp{AM_GNU_GETTEXT} to @file{configure.in}.

This call is purposely simple, even though it generates a lot of
configure-time checking and actions.

@item Have output files created.

The @code{AC_OUTPUT} directive, at the end of your @file{configure.in}
file, needs to be modified as follows:

@example
AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in],
          [@var{existing additional actions}])
@end example

The modification to the first argument to @code{AC_OUTPUT} asks
for substitution in the @file{intl/} and @file{po/} directories.
Note the @samp{.in} suffix used for @file{po/} only. This is because
the distributed file is really @file{po/Makefile.in.in}.
@end enumerate

@node config.guess, aclocal, configure.in, Adjusting Files
@subsection @file{config.guess}, @file{config.sub} at top level

You need to add the GNU @file{config.guess} and @file{config.sub} files
to your distribution. They are needed because the @file{intl/} directory
has platform dependent support for determining the locale's character
encoding and therefore needs to identify the platform.

You can obtain the newest version of @file{config.guess} and
@file{config.sub} from @file{ftp://ftp.gnu.org/pub/gnu/config/}.
Less recent versions are also contained in the GNU @code{automake} and
GNU @code{libtool} packages.

Normally, @file{config.guess} and @file{config.sub} are put at the
top level of a distribution. But it is also possible to put them in a
subdirectory, together with other configuration support files like
@file{install-sh}, @file{ltconfig}, @file{ltmain.sh},
@file{mkinstalldirs} or @file{missing}. All you need to do, other than
moving the files, is to add the following line to your
@file{configure.in}:

@example
AC_CONFIG_AUX_DIR([@var{subdir}])
@end example

@node aclocal, acconfig, config.guess, Adjusting Files
@subsection @file{aclocal.m4} at top level

If you do not have an @file{aclocal.m4} file in your distribution,
the simplest approach is to concatenate the files @file{codeset.m4},
@file{gettext.m4}, @file{glibc21.m4}, @file{iconv.m4}, @file{isc-posix.m4},
@file{lcmessage.m4}, @file{progtest.m4} from GNU @code{gettext}'s
@file{m4/} directory into a single file.

If you already have an @file{aclocal.m4} file, then you will have
to merge the said macro files into your @file{aclocal.m4}. Note that if
you are upgrading from a previous release of GNU @code{gettext}, you
should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
@code{AM_WITH_NLS}, etc.), as they usually
change a little from one release of GNU @code{gettext} to the next.
Their contents may vary as we get more experience with strange systems.

These macros check for the internationalization support functions
and related information. Hopefully, once stabilized, these macros
might be integrated in the standard Autoconf set, because this
piece of @code{m4} code will be the same for all projects using GNU
@code{gettext}.

@node acconfig, Makefile, aclocal, Adjusting Files
@subsection @file{acconfig.h} at top level

Earlier GNU @code{gettext} releases required you to put definitions for
@code{ENABLE_NLS}, @code{HAVE_GETTEXT}, @code{HAVE_LC_MESSAGES},
@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an
@file{acconfig.h} file. This is not needed any more; you can remove
them from your @file{acconfig.h} file unless your package uses them
independently from the @file{intl/} directory.

@node Makefile, src/Makefile, acconfig, Adjusting Files
@subsection @file{Makefile.in} at top level

Here are a few modifications you need to make to your main, top-level
@file{Makefile.in} file.

Add the following lines near the beginning of your @file{Makefile.in},
so the @samp{dist:} goal will work properly (as explained further down):

@example
PACKAGE = @@PACKAGE@@
VERSION = @@VERSION@@
@end example

Add file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so the file gets
distributed.

Wherever you process subdirectories in your @file{Makefile.in}, be sure
you also process the subdirectories @samp{intl} and @samp{po}. Special
rules in the @file{Makefiles} take care of the case where no
internationalization is wanted.

If you are using Makefiles, either generated by automake, or hand-written
so they carefully follow the GNU coding standards, the affected goals for
which the new subdirectories must be handled include @samp{installdirs},
@samp{install}, @samp{uninstall}, @samp{clean}, @samp{distclean}.
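
As a minimal sketch of such subdirectory processing, assuming a
@code{SUBDIRS} variable that lists the subdirectories to visit, the
goals listed above might simply recurse as shown here. Any work the
top-level directory must do itself for these goals is omitted, and the
recipe lines must begin with a tab character:

@example
# Illustrative only: run the same goal in every subdirectory.
installdirs install uninstall clean distclean:
	for subdir in $(SUBDIRS); do \
	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
	done
@end example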

Here is an example of a canonical order of processing. In this
example, we also define @code{SUBDIRS} in @code{Makefile.in} for it
to be further used in the @samp{dist:} goal.

@example
SUBDIRS = doc intl lib src @@POSUB@@
@end example

Note that you must arrange for @samp{make} to descend into the
@code{intl} directory before descending into other directories containing
code which makes use of the @code{libintl.h} header file. For this
reason, here we mention @code{intl} before @code{lib} and @code{src}.
You will have to adapt this example to your own package.

A delicate point is the @samp{dist:} goal, as both
@file{intl/Makefile} and @file{po/Makefile} will later assume that the
proper directory has been set up from the main @file{Makefile}. Here is
an example of what the @samp{dist:} goal might look like:

@example
distdir = $(PACKAGE)-$(VERSION)
dist: Makefile
	rm -fr $(distdir)
	mkdir $(distdir)
	chmod 777 $(distdir)
	for file in $(DISTFILES); do \
	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
	done
	for subdir in $(SUBDIRS); do \
	  mkdir $(distdir)/$$subdir || exit 1; \
	  chmod 777 $(distdir)/$$subdir; \
	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
	done
	tar chozf $(distdir).tar.gz $(distdir)
@end example

@node src/Makefile, , Makefile, Adjusting Files
@subsection @file{Makefile.in} in @file{src/}

Some of the modifications made in the main @file{Makefile.in} will
also be needed in the @file{Makefile.in} from your package sources,
which we assume here to be in the @file{src/} subdirectory. Here are
all the modifications needed in @file{src/Makefile.in}:

In view of the @samp{dist:} goal, you should have these lines near the
beginning of @file{src/Makefile.in}:

@example
PACKAGE = @@PACKAGE@@
VERSION = @@VERSION@@
@end example

If not done already, you should guarantee that @code{top_srcdir}
gets defined. This will serve for @code{cpp} include files. Just add
the following line:

@example
top_srcdir = @@top_srcdir@@
@end example

You might also want to define @code{subdir} as @samp{src}, later
allowing for almost uniform @samp{dist:} goals in all your
@file{Makefile.in}. At least, the @samp{dist:} goal below assumes that
you have done so.

The @code{main} function of your program will normally call
@code{bindtextdomain} (@pxref{Triggering}), like this:

@example
bindtextdomain (@var{PACKAGE}, LOCALEDIR);
@end example

To make @code{LOCALEDIR} known to the program, add the following lines to
@file{Makefile.in}:

@example
datadir = @@datadir@@
localedir = $(datadir)/locale
DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
@end example

Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus
@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.

You should ensure that the final linking will use @code{@@INTLLIBS@@} as
a library. An easy way to achieve this is to arrange for it to be
included in @code{LIBS}, like this:

@example
LIBS = @@INTLLIBS@@ @@LIBS@@
@end example

In most packages internationalized with GNU @code{gettext}, one will
find a directory @file{lib/} in which a library containing some helper
functions will be built. (You need at least the few functions which the
GNU @code{gettext} Library itself needs.) However, some of the functions
in @file{lib/} also give messages to the user, which of course should be
translated, too. To take care of this, it is not enough to place the support
library (say @file{libsupport.a}) just between the @code{@@INTLLIBS@@}
and @code{@@LIBS@@} in the above example. Instead one has to write this:

@example
LIBS = ../lib/libsupport.a @@INTLLIBS@@ ../lib/libsupport.a @@LIBS@@
@end example

You should also ensure that directory @file{intl/} will be searched for
C preprocessor include files in all circumstances. So, you have to
arrange for both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} to
be given to the C compiler.
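
One way to arrange this is sketched here, under the assumption that
your @file{src/Makefile.in} collects its include options in a variable
and defines @code{srcdir} from @code{@@srcdir@@}. The names
@code{INCLUDES} and @code{COMPILE} are only illustrative and may differ
in your package:

@example
# -I../intl covers the build tree, -I$(top_srcdir)/intl the sources.
INCLUDES = -I. -I$(srcdir) -I.. -I../intl -I$(top_srcdir)/intl
COMPILE = $(CC) -c $(DEFS) $(INCLUDES) $(CPPFLAGS) $(CFLAGS)

.c.o:
	$(COMPILE) $<
@end example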

Your @samp{dist:} goal has to conform to the others. Here is a
reasonable definition for it:

@example
distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
dist: Makefile $(DISTFILES)
	for file in $(DISTFILES); do \
	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
	done
@end example

@node Conclusion, Language Codes, Maintainers, Top
@chapter Concluding Remarks

We would like to conclude this GNU @code{gettext} manual by presenting
a history of the Translation Project so far. We finally give
a few pointers for those who want to do further research or readings
about Native Language Support matters.

@menu
* History::                     History of GNU @code{gettext}
* References::                  Related Readings
@end menu

@node History, References, Conclusion, Conclusion
@section History of GNU @code{gettext}

Internationalization concerns and algorithms have been informally
and casually discussed for years in GNU, sometimes around GNU
@code{libc}, maybe around the upcoming @code{Hurd}, or otherwise
(nobody clearly remembers). And even then, when the work started for
real, it was somewhat independent of these previous discussions.

This all began in July 1994, when Patrick D'Cruze had the idea and
initiative of internationalizing version 3.9.2 of GNU @code{fileutils}.
He then asked Jim Meyering, the maintainer, how to get those changes
folded into an official release. That first draft was full of
@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
nicer ways. Patrick and Jim shared some attempts and experiments
in this area. Then, feeling that this might eventually have a deeper
impact on GNU, Jim wanted to know what the standards were, and contacted
Richard Stallman, who very quickly and verbally described an overall
design for what was meant to become @code{glocale}, at that time.

Jim implemented @code{glocale} and got a lot of exhausting feedback
from Patrick and Richard, of course, but also from Mitchum DSouza
(who wrote a @code{catgets}-like package), Roland McGrath, maybe David
MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
pulling in various directions, not always compatible, to the extent
that after a couple of test releases, @code{glocale} was torn apart.

While Jim took some distance and time and became a dad for a second
time, Roland wanted to get GNU @code{libc} internationalized, and
got Ulrich Drepper involved in that project. Instead of starting
from @code{glocale}, Ulrich rewrote something from scratch, but
more conformant to the set of guidelines that emerged out of the
@code{glocale} effort. Then, Ulrich got people from the previous
forum to involve themselves in this new project, and the switch
from @code{glocale} to what was first named @code{msgutils}, renamed
@code{nlsutils}, and later @code{gettext}, became officially accepted
by Richard in May 1995 or so.

Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
in April 1995. The first official release of the package, including
PO mode, occurred in July 1995, and was numbered 0.7. Other people
contributed to the effort by providing a discussion forum around
Ulrich, writing little pieces of code, or testing. These are listed
in the @code{THANKS} file which comes with the GNU @code{gettext}
distribution.

While this was being done, Fran@,{c}ois adapted half a dozen
GNU packages to @code{glocale} first, then later to @code{gettext},
putting them in pretest, thus providing along the way an effective
user environment for fine tuning the evolving tools. He also took
the responsibility of organizing and coordinating the Translation
Project. After nearly a year of informal exchanges between people from
many countries, translator teams started to exist in May 1995, through
the creation and support by Patrick D'Cruze of twenty unmoderated
mailing lists for that many native languages, and two moderated
lists: one for reaching all teams at once, the other for reaching
all willing maintainers of internationalized free software packages.

Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
of Greg McGary, as a kind of contribution to Ulrich's package.
He also gave a hand with the GNU @code{gettext} Texinfo manual.

@node References, , History, Conclusion
@section Related Readings

Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
bibliography on internationalization matters, called
@cite{Internationalization Reference List}, which is available as:

@example
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
@end example

Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
Internationalisation}. This FAQ discusses writing programs which
can handle different language conventions, character sets, etc.;
and is applicable to all character set encodings, with particular
emphasis on @w{ISO 8859-1}. It is regularly published in Usenet
groups @file{comp.unix.questions}, @file{comp.std.internat},
@file{comp.software.international}, @file{comp.lang.c},
@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
and @file{news.answers}. The home location of this document is:

@example
ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
@end example

Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
over the responsibility of maintaining it. It may be found as:

@example
ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
...locale-tutorial-0.8.txt.gz
@end example

This site is mirrored in:

@example
ftp://ftp.ibp.fr/pub/linux/sunsite/
@end example

A French version of the same tutorial should be available at:

@example
ftp://ftp.ibp.fr/pub/linux/french/docs/
@end example

together with French translations of many Linux-related documents.

@node Language Codes, Country Codes, Conclusion, Top
@appendix Language Codes

The @w{ISO 639} standard defines two-letter codes for many languages.
All abbreviations for languages used in the Translation Project should
come from this standard.

@include iso-639.texi

@node Country Codes, , Language Codes, Top
@appendix Country Codes

The @w{ISO 3166} standard defines two-letter codes for many countries
and territories. All abbreviations for countries used in the Translation
Project should come from this standard.

@include iso-3166.texi

@c texinfo-column-for-description: 32