\input texinfo @c -*-texinfo-*-

2

@c %**start of header

3

@setfilename gettext.info

4

@settitle GNU @code{gettext} utilities

5

@finalout

6

@c %**end of header

7

8

@include version.texi

9

10

@dircategory GNU Gettext Utilities

11

@direntry

12

* Gettext: (gettext). GNU gettext utilities.

13

* gettextize: (gettext)gettextize Invocation. Prepare a package for gettext.

14

* msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files.

15

* msgmerge: (gettext)msgmerge Invocation. Update two PO files into one.

16

* xgettext: (gettext)xgettext Invocation. Extract strings into a PO file.

17

@end direntry

18

19

@ifinfo

20

This file provides documentation for GNU @code{gettext} utilities.

21

It also serves as a reference for the free Translation Project.

22

23

24

25

Permission is granted to make and distribute verbatim copies of

26

this manual provided the copyright notice and this permission notice

27

are preserved on all copies.

28

29

@ignore

30

Permission is granted to process this file through TeX and print the

31

results, provided the printed document carries copying permission

32

notice identical to this one except for the removal of this paragraph

33

(this paragraph not being relevant to the printed manual).

34

35

@end ignore

36

Permission is granted to copy and distribute modified versions of this

37

manual under the conditions for verbatim copying, provided that the entire

38

resulting derived work is distributed under the terms of a permission

39

notice identical to this one.

40

41

Permission is granted to copy and distribute translations of this manual

42

into another language, under the above conditions for modified versions,

43

except that this permission notice may be stated in a translation approved

44

by the Foundation.

45

@end ifinfo

46

47

@titlepage

48

@title GNU gettext tools, version @value{VERSION}

49

@subtitle Native Language Support Library and Tools

50

@subtitle Edition @value{EDITION}, @value{UPDATED}

51

@author Ulrich Drepper

52

@author Jim Meyering

53

@author Fran@,{c}ois Pinard

54

55

@page

56

@vskip 0pt plus 1filll

57

Copyright @copyright{} 1995, 1996, 1997, 1998, 2001 Free Software Foundation, Inc.

58

59

Permission is granted to make and distribute verbatim copies of

60

this manual provided the copyright notice and this permission notice

61

are preserved on all copies.

62

63

Permission is granted to copy and distribute modified versions of this

64

manual under the conditions for verbatim copying, provided that the entire

65

resulting derived work is distributed under the terms of a permission

66

notice identical to this one.

67

68

Permission is granted to copy and distribute translations of this manual

69

into another language, under the above conditions for modified versions,

70

except that this permission notice may be stated in a translation approved

71

by the Foundation.

72

@end titlepage

73

74

@ifinfo

75

@node Top, Introduction, (dir), (dir)

76

@top GNU @code{gettext} utilities

77

78

@menu

79

* Introduction:: Introduction

80

* Basics:: PO Files and PO Mode Basics

81

* Sources:: Preparing Program Sources

82

* Template:: Making the PO Template File

83

* Creating:: Creating a New PO File

84

* Updating:: Updating Existing PO Files

85

* Binaries:: Producing Binary MO Files

86

* Users:: The User's View

87

* Programmers:: The Programmer's View

88

* Translators:: The Translator's View

89

* Maintainers:: The Maintainer's View

90

* Conclusion:: Concluding Remarks

91

92

* Language Codes:: ISO 639 language codes

93

* Country Codes:: ISO 3166 country codes

94

95

@detailmenu

96

--- The Detailed Node Listing ---

97

98

Introduction

99

100

* Why:: The Purpose of GNU @code{gettext}

101

* Concepts:: I18n, L10n, and Such

102

* Aspects:: Aspects in Native Language Support

103

* Files:: Files Conveying Translations

104

* Overview:: Overview of GNU @code{gettext}

105

106

PO Files and PO Mode Basics

107

108

* Installation:: Completing GNU @code{gettext} Installation

109

* PO Files:: The Format of PO Files

110

* Main PO Commands:: Main Commands

111

* Entry Positioning:: Entry Positioning

112

* Normalizing:: Normalizing Strings in Entries

113

114

Preparing Program Sources

115

116

* Triggering:: Triggering @code{gettext} Operations

117

* Mark Keywords:: How Marks Appear in Sources

118

* Marking:: Marking Translatable Strings

119

* c-format:: Telling something about the following string

120

* Special cases:: Special Cases of Translatable Strings

121

122

Making the PO Template File

123

124

* xgettext Invocation:: Invoking the @code{xgettext} Program

125

126

Updating Existing PO Files

127

128

* msgmerge Invocation:: Invoking the @code{msgmerge} Program

129

* Translated Entries:: Translated Entries

130

* Fuzzy Entries:: Fuzzy Entries

131

* Untranslated Entries:: Untranslated Entries

132

* Obsolete Entries:: Obsolete Entries

133

* Modifying Translations:: Modifying Translations

134

* Modifying Comments:: Modifying Comments

135

* Subedit:: Mode for Editing Translations

136

* C Sources Context:: C Sources Context

137

* Auxiliary:: Consulting Auxiliary PO Files

138

* Compendium:: Using Translation Compendiums

139

140

Producing Binary MO Files

141

142

* msgfmt Invocation:: Invoking the @code{msgfmt} Program

143

* MO Files:: The Format of GNU MO Files

144

145

The User's View

146

147

* Matrix:: The Current @file{ABOUT-NLS} Matrix

148

* Installers:: Magic for Installers

149

* End Users:: Magic for End Users

150

151

The Programmer's View

152

153

* catgets:: About @code{catgets}

154

* gettext:: About @code{gettext}

155

* Comparison:: Comparing the two interfaces

156

* Using libintl.a:: Using libintl.a in own programs

157

* gettext grok:: Being a @code{gettext} grok

158

* Temp Programmers:: Temporary Notes for the Programmers Chapter

159

160

About @code{catgets}

161

162

* Interface to catgets:: The interface

163

* Problems with catgets:: Problems with the @code{catgets} interface?!

164

165

About @code{gettext}

166

167

* Interface to gettext:: The interface

168

* Ambiguities:: Solving ambiguities

169

* Locating Catalogs:: Locating message catalog files

170

* Charset conversion:: How to request conversion to Unicode

171

* Plural forms:: Additional functions for handling plurals

172

* GUI program problems:: Another technique for solving ambiguities

173

* Optimized gettext:: Optimization of the *gettext functions

174

175

Temporary Notes for the Programmers Chapter

176

177

* Temp Implementations:: Temporary - Two Possible Implementations

178

* Temp catgets:: Temporary - About @code{catgets}

179

* Temp WSI:: Temporary - Why a single implementation

180

* Temp Notes:: Temporary - Notes

181

182

The Translator's View

183

184

* Trans Intro 0:: Introduction 0

185

* Trans Intro 1:: Introduction 1

186

* Discussions:: Discussions

187

* Organization:: Organization

188

* Information Flow:: Information Flow

189

190

Organization

191

192

* Central Coordination:: Central Coordination

193

* National Teams:: National Teams

194

* Mailing Lists:: Mailing Lists

195

196

National Teams

197

198

* Sub-Cultures:: Sub-Cultures

199

* Organizational Ideas:: Organizational Ideas

200

201

The Maintainer's View

202

203

* Flat and Non-Flat:: Flat or Non-Flat Directory Structures

204

* Prerequisites:: Prerequisite Works

205

* gettextize Invocation:: Invoking the @code{gettextize} Program

206

* Adjusting Files:: Files You Must Create or Alter

207

208

Files You Must Create or Alter

209

210

* po/POTFILES.in:: @file{POTFILES.in} in @file{po/}

211

* configure.in:: @file{configure.in} at top level

212

* config.guess:: @file{config.guess}, @file{config.sub} at top level

213

* aclocal:: @file{aclocal.m4} at top level

214

* acconfig:: @file{acconfig.h} at top level

215

* Makefile:: @file{Makefile.in} at top level

216

* src/Makefile:: @file{Makefile.in} in @file{src/}

217

218

Concluding Remarks

219

220

* History:: History of GNU @code{gettext}

221

* References:: Related Readings

222

223

@end detailmenu

224

@end menu

225

226

@end ifinfo

227

228

@node Introduction, Basics, Top, Top

229

@chapter Introduction

230

231

@quotation

232

This manual is still in @emph{DRAFT} state. Some sections are still

233

empty, or almost. We keep merging material from other sources

234

(essentially e-mail folders) while the proper integration of this

235

material is delayed.

236

@end quotation

237

238

In this manual, we use @emph{he} when speaking of the programmer or

239

maintainer, @emph{she} when speaking of the translator, and @emph{they}

240

when speaking of the installers or end users of the translated program.

241

This is only a convenience for clarifying the documentation. It is

242

@emph{absolutely} not meant to imply that some roles are more appropriate

243

to males or females. Besides, as you might guess, GNU @code{gettext}

244

is meant to be useful for people using computers, whatever their sex,

245

race, religion or nationality!

246

247

This chapter explains the goals sought in the creation

248

of GNU @code{gettext} and the free Translation Project.

249

Then, it explains a few broad concepts around

250

Native Language Support, and positions message translation with regard

251

to other aspects of national and cultural variance, as they apply
to programs. It also surveys those files used to convey the

253

translations. It explains how the various tools interact in the

254

initial generation of these files, and later, how the maintenance

255

cycle should usually operate.

256

257

Please send suggestions and corrections to:

258

259

@example

260

@group

261

@r{Internet address:}

262

bug-gnu-utils@@gnu.org

263

@end group

264

@end example

265

266

@noindent

267

Please include the manual's edition number and update date in your messages.

268

269

@menu

270

* Why:: The Purpose of GNU @code{gettext}

271

* Concepts:: I18n, L10n, and Such

272

* Aspects:: Aspects in Native Language Support

273

* Files:: Files Conveying Translations

274

* Overview:: Overview of GNU @code{gettext}

275

@end menu

276

277

@node Why, Concepts, Introduction, Introduction

278

@section The Purpose of GNU @code{gettext}

279

280

Usually, programs are written and documented in English, and use

281

English at execution time to interact with users. This is true

282

not only of GNU software, but also of a great deal of commercial

283

and free software. Using a common language is quite handy for

284

communication between developers, maintainers and users from all

285

countries. On the other hand, most people are less comfortable with

286

English than with their own native language, and would prefer to

287

use their mother tongue for day-to-day work, as far as possible.

288

Many would simply @emph{love} to see their computer screen showing

289

a lot less of English, and far more of their own language.

290

291

However, to many people, this dream might appear so far-fetched that
they may believe it is not even worth spending time thinking about
it. They have no confidence at all that the dream might ever
come true. Yet some have not lost hope, and have organized themselves.

295

The Translation Project is a formalization of this hope into a

296

workable structure, which has a good chance to get all of us nearer

297

the achievement of a truly multi-lingual set of programs.

298

299

GNU @code{gettext} is an important step for the Translation Project,

300

as it is an asset on which we may build many other steps. This package

301

offers to programmers, translators and even users, a well integrated

302

set of tools and documentation. Specifically, the GNU @code{gettext}

303

utilities are a set of tools that provides a framework within which

304

other free packages may produce multi-lingual messages. These tools

305

include

306

307

@itemize @bullet

308

@item

309

A set of conventions about how programs should be written to support

310

message catalogs.

311

312

@item

313

A directory and file naming organization for the message catalogs

314

themselves.

315

316

@item

317

A runtime library supporting the retrieval of translated messages.

318

319

@item

320

A few stand-alone programs to massage in various ways the sets of

321

translatable strings, or already translated strings.

322

323

@item

324

A special mode for Emacs@footnote{In this manual, all mentions of Emacs
refer to either GNU Emacs or XEmacs, which people sometimes call FSF
Emacs and Lucid Emacs, respectively.} which helps in preparing these sets
and bringing them up to date.

328

@end itemize

329

330

GNU @code{gettext} is designed to minimize the impact of

331

internationalization on program sources, keeping this impact as small

332

and hardly noticeable as possible. Internationalization has better

333

chances of succeeding if it is very lightweight, or at least
appears to be so, when looking at program sources.

335

336

The Translation Project also uses the GNU @code{gettext} distribution

337

as a vehicle for documenting its structure and methods. This goes

338

beyond the strict technicalities of documenting GNU @code{gettext}
proper. This way, translators will find in a single place, as
far as possible, all they need to know to do their
translating work properly. This supplemental documentation might also
help programmers, and even curious users, understand how GNU
@code{gettext} relates to the remainder of the Translation
Project, and consequently get a glimpse of the @emph{big picture}.

345

346

@node Concepts, Aspects, Why, Introduction

347

@section I18n, L10n, and Such

348

349

Two long words appear all the time when we discuss support of native

350

language in programs, and these words have a precise meaning, worth

351

being explained here, once and for all in this document. The words are

352

@emph{internationalization} and @emph{localization}. Many people,

353

tired of writing these long words over and over again, took the

354

habit of writing @dfn{i18n} and @dfn{l10n} instead, quoting the first

355

and last letter of each word, and replacing the run of intermediate

356

letters by a number merely telling how many such letters there are.

357

But in this manual, for the sake of clarity, we will patiently write

358

the names in full, each time@dots{}
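
The abbreviation rule just described is mechanical enough to sketch in
a few lines of C. The helper below is purely illustrative: the name
@code{numeronym} is our own and is not part of GNU @code{gettext}, and
the sketch assumes plain ASCII words of at least three letters.

@example
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GNU gettext.  Write into BUF the
   first letter of WORD, the count of letters in between, and the
   last letter.  Return BUF, or NULL if WORD has fewer than three
   letters or the result does not fit in BUFSIZE bytes.  */
static char *
numeronym (const char *word, char *buf, size_t bufsize)
@{
  size_t len = strlen (word);
  int n;

  if (len < 3)
    return NULL;
  n = snprintf (buf, bufsize, "%c%lu%c", word[0],
                (unsigned long) (len - 2), word[len - 1]);
  if (n < 0 || (size_t) n >= bufsize)
    return NULL;
  return buf;
@}

int
main (void)
@{
  char buf[32];

  if (numeronym ("internationalization", buf, sizeof buf) != NULL)
    printf ("%s\n", buf);       /* prints i18n */
  if (numeronym ("localization", buf, sizeof buf) != NULL)
    printf ("%s\n", buf);       /* prints l10n */
  return 0;
@}
@end example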

359

360

By @dfn{internationalization}, one refers to the operation by which a

361

program, or a set of programs turned into a package, is made aware of and

362

able to support multiple languages. This is a generalization process,

363

by which the programs are untied from calling only English strings or

364

other English specific habits, and connected to generic ways of doing

365

the same, instead. Program developers may use various techniques to

366

internationalize their programs. Some of these have been standardized.

367

GNU @code{gettext} offers one of these standards. @xref{Programmers}.

368

369

By @dfn{localization}, one means the operation by which, in a set

370

of programs already internationalized, one gives the program all

371

needed information so that it can adapt itself to handle its input

372

and output in a fashion which is correct for some native language and

373

cultural habits. This is a particularisation process, by which generic

374

methods already implemented in an internationalized program are used

375

in specific ways. The programming environment puts several functions
at the programmer's disposal which allow this runtime configuration.
The formal description of a specific set of cultural habits for some

378

country, together with all associated translations targeted to the

379

same native language, is called the @dfn{locale} for this language

380

or country. Users achieve localization of programs by setting proper

381

values to special environment variables, prior to executing those

382

programs, identifying which locale should be used.
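
Seen from the program's side, honoring the user's choice takes a single
call. The following is a minimal sketch, assuming a POSIX-like system:
it asks the C library to adopt whatever locale the environment
variables (such as @code{LANG} or @code{LC_ALL}) select, and prints the
locale name before and after. The helper name @code{current_locale}
is ours, chosen only for this illustration.

@example
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: return the name of the locale currently in
   effect, as reported by the C library.  Passing NULL to setlocale
   queries the locale without changing it.  */
static const char *
current_locale (void)
@{
  return setlocale (LC_ALL, NULL);
@}

int
main (void)
@{
  /* Every C program starts out in the "C" locale.  */
  printf ("at startup:       %s\n", current_locale ());

  /* An empty string means "take the locale from the environment".
     A NULL result means the requested locale is not available.  */
  if (setlocale (LC_ALL, "") == NULL)
    fprintf (stderr, "requested locale is not available\n");

  printf ("from environment: %s\n", current_locale ());
  return 0;
@}
@end example

@noindent
Running this program with, say, @samp{LANG=de_DE} in the environment
would be expected to change the second line, provided such a locale
is installed on the system; the locale name here is only an example.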

383

384

In fact, locale message support is only one component of the cultural

385

data that makes up a particular locale. There are a whole host of

386

routines and functions provided to aid programmers in developing

387

internationalized software and which allow them to access the data

388

stored in a particular locale. When someone presently refers to a

389

particular locale, they are obviously referring to the data stored

390

within that particular locale. Similarly, if a programmer is referring

391

to ``accessing the locale routines'', they are referring to the

392

complete suite of routines that access all of the locale's information.

393

394

One uses the expression @dfn{Native Language Support}, or merely NLS,

395

for speaking of the overall activity or feature encompassing both

396

internationalization and localization, allowing for multi-lingual

397

interactions in a program. In a nutshell, one could say that

398

internationalization is the operation by which further localizations

399

are made possible.

400

401

Also, very roughly said, when it comes to multi-lingual messages,

402

internationalization is usually taken care of by programmers, and

403

localization is usually taken care of by translators.

404

405

@node Aspects, Files, Concepts, Introduction

406

@section Aspects in Native Language Support

407

408

For a totally multi-lingual distribution, there are many things to

409

translate beyond output messages.

410

411

@itemize @bullet

412

@item

413

As of today, GNU @code{gettext} offers a complete toolset for

414

translating messages output by C programs. Perl scripts and shell

415

scripts will also need to be translated. Even if there are today some hooks

416

by which this can be done, these hooks are not integrated as well as they

417

should be.

418

419

@item

420

Some programs, like @code{autoconf} or @code{bison}, are able

421

to produce other programs (or scripts). Even if the generating

422

programs themselves are internationalized, the generated programs they

423

produce may need internationalization on their own, and this indirect

424

internationalization could be automated right from the generating

425

program. In fact, quite usually, generating and generated programs

426

could be internationalized independently, as the effort needed is

427

fairly orthogonal.

428

429

@item

430

A few programs include textual tables which might need translation

431

themselves, independently of the strings contained in the program

432

itself. For example, @w{RFC 1345} gives an English description for each

433

character which the @code{recode} program is able to reconstruct at execution.

434

Since these descriptions are extracted from the RFC by mechanical means,

435

translating them properly would require a prior translation of the RFC

436

itself.

437

438

@item

439

Almost all programs accept options, which are often spelled out so as to
be descriptive for English readers; one might want to consider

441

offering translated versions for program options as well.

442

443

@item

444

Many programs read, interpret, compile, or are somewhat driven by

445

input files which are texts containing keywords, identifiers, or

446

replies which are inherently translatable. For example, one may want

447

@code{gcc} to allow diacriticized characters in identifiers or use

448

translated keywords; @samp{rm -i} might accept something other than

449

@samp{y} or @samp{n} for replies, etc. Even if the program will

450

eventually make most of its output in the foreign languages, one has

451

to decide whether the input syntax, option values, etc., are to be

452

localized or not.

453

454

@item

455

The manual accompanying a package, as well as all documentation files

456

in the distribution, could surely be translated, too. Translating a

457

manual, with the intent of later keeping up with updates, is a major

458

undertaking in itself, generally.

459

460

@end itemize

461

462

As we already stressed, translation is only one aspect of locales.

463

Other internationalization aspects are system services and are handled

464

in GNU @code{libc}. There

465

are many attributes that are needed to define a country's cultural

466

conventions. These attributes include, besides the country's native

467

language, the formatting of the date and time, the representation of

468

numbers, the symbols for currency, etc. These local @dfn{rules} are

469

termed the country's locale. The locale represents the knowledge

470

needed to support the country's native attributes.

471

472

There are a few major areas which may vary between countries and

473

hence, define what a locale must describe. The following list helps

474

putting multi-lingual messages into the proper context of other tasks

475

related to locales. See the GNU @code{libc} manual for details.

476

477

@table @emph

478

479

@item Characters and Codesets

480

481

The codeset most commonly used throughout the USA and most English
speaking parts of the world is the ASCII codeset. However, there are
many characters needed by various locales that are not found within
this codeset. The 8-bit @w{ISO 8859-1} codeset has most of the special
characters needed to handle the major European languages. However, in
many cases, the @w{ISO 8859-1} codeset is not adequate. Hence each locale
will need to specify which codeset it needs to use and will need
to have the appropriate character handling routines to cope with
the codeset.

490

491

@item Currency

492

493

The symbols used vary from country to country as does the position

494

used by the symbol. Software needs to be able to transparently

495

display currency figures in the native mode for each locale.

496

497

@item Dates

498

499

The format of date varies between locales. For example, Christmas day

500

in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.

501

Other countries might use @w{ISO 8601} dates, etc.

502

503

Time of the day may be noted as @var{hh}:@var{mm}, @var{hh}.@var{mm},

504

or otherwise. Some locales require time to be specified in 24-hour

505

mode rather than as AM or PM. Further, the nature and yearly extent

506

of the Daylight Saving correction vary widely between countries.

507

508

@item Numbers

509

510

Numbers can be represented differently in different locales.

511

For example, the following numbers are all written correctly for

512

their respective locales:

513

514

@example

515

12,345.67       English
12.345,67       German
1,2345.67       Asia

518

@end example

519

520

Some programs could go further and use different unit systems, like

521

English units or Metric units, or even take into account variants

522

about how numbers are spelled in full.

523

524

@item Messages

525

526

The most obvious area is the language support within a locale. This is

527

where GNU @code{gettext} provides the means for developers and users to

528

easily change the language that the software uses to communicate to

529

the user.

530

531

@end table
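
The C library exposes several of these areas directly. As a rough
sketch (the function name @code{show_conventions} is ours, and what
gets printed depends entirely on which locales are installed), the
following program prints the decimal point, thousands separator and
currency symbol of a locale, along with that locale's preferred date
representation.

@example
#include <locale.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: switch to the locale called NAME and print a
   few of its conventions.  An empty NAME selects the locale given by
   the environment.  */
static void
show_conventions (const char *name)
@{
  struct lconv *lc;
  char date[64];
  time_t now = time (NULL);
  struct tm *tm = localtime (&now);

  if (setlocale (LC_ALL, name) == NULL)
    @{
      printf ("locale \"%s\" is not available\n", name);
      return;
    @}

  lc = localeconv ();
  printf ("locale:              %s\n", setlocale (LC_ALL, NULL));
  printf ("decimal point:       \"%s\"\n", lc->decimal_point);
  printf ("thousands separator: \"%s\"\n", lc->thousands_sep);
  printf ("currency symbol:     \"%s\"\n", lc->currency_symbol);

  /* %x asks strftime for the locale's own date representation.  */
  if (tm != NULL && strftime (date, sizeof date, "%x", tm) > 0)
    printf ("today's date:        %s\n", date);
@}

int
main (void)
@{
  show_conventions ("C");       /* the portable default */
  show_conventions ("");        /* whatever the user has selected */
  return 0;
@}
@end example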

532

533

Components of locale outside of message handling are standardized in

534

the ISO C standard and the SUSv2 specification. GNU @code{libc}

535

fully implements this, and most other modern systems provide a more

536

or less reasonable support for at least some of the missing components.

537

538

@node Files, Overview, Aspects, Introduction

539

@section Files Conveying Translations

540

541

The letters PO in @file{.po} files mean Portable Object, to
distinguish them from @file{.mo} files, where MO stands for Machine

543

Object. This paradigm, as well as the PO file format, is inspired

544

by the NLS standard developed by Uniforum, and implemented by Sun

545

in their Solaris system.

546

547

PO files are meant to be read and edited by humans, and associate each

548

original, translatable string of a given package with its translation

549

in a particular target language. A single PO file is dedicated to

550

a single target language. If a package supports many languages,

551

there is one such PO file per language supported, and each package

552

has its own set of PO files. These PO files are best created by

553

the @code{xgettext} program, and later updated or refreshed through

554

the @code{msgmerge} program. Program @code{xgettext} extracts all

555

marked messages from a set of C files and initializes a PO file with

556

empty translations. Program @code{msgmerge} takes care of adjusting

557

PO files between releases of the corresponding sources, commenting

558

obsolete entries, initializing new ones, and updating all source

559

line references. Files ending with @file{.pot} are template
files found in distributions, in PO file format, from which
translations start; @file{.pox} files are often temporary PO files.

562

563

MO files are meant to be read by programs, and are binary in nature.

564

A few systems already offer tools for creating and handling MO files

565

as part of the Native Language Support coming with the system, but the

566

format of these MO files is often different from system to system,

567

and non-portable. The tools already provided with these systems don't

568

support all the features of GNU @code{gettext}. Therefore GNU

569

@code{gettext} uses its own format for MO files. Files ending with

570

@file{.gmo} are really MO files, when it is known that these files use

571

the GNU format.
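
To give a feel for what ``binary in nature'' means: a GNU MO file
begins with a 32-bit magic number, @code{0x950412de}, stored in the
byte order of the machine that produced the file. The sketch below
checks the first four bytes of a file for either byte order. The
function name @code{looks_like_gnu_mo} is ours; this is an
illustration only, not the loader used by the library.

@example
#include <stddef.h>
#include <stdio.h>

#define MO_MAGIC 0x950412deUL

/* Hypothetical helper.  Return 1 if BUF starts with the GNU MO magic
   number in little-endian order, 2 if in big-endian order, and 0 if
   it does not look like a GNU MO file at all.  */
static int
looks_like_gnu_mo (const unsigned char *buf, size_t len)
@{
  unsigned long le, be;

  if (len < 4)
    return 0;
  le = (unsigned long) buf[0]
       | ((unsigned long) buf[1] << 8)
       | ((unsigned long) buf[2] << 16)
       | ((unsigned long) buf[3] << 24);
  be = ((unsigned long) buf[0] << 24)
       | ((unsigned long) buf[1] << 16)
       | ((unsigned long) buf[2] << 8)
       | (unsigned long) buf[3];
  if (le == MO_MAGIC)
    return 1;
  if (be == MO_MAGIC)
    return 2;
  return 0;
@}

int
main (int argc, char **argv)
@{
  unsigned char head[4];
  size_t n;
  FILE *fp;

  if (argc != 2)
    @{
      fprintf (stderr, "usage: %s FILE\n", argv[0]);
      return 2;
    @}
  fp = fopen (argv[1], "rb");
  if (fp == NULL)
    @{
      perror (argv[1]);
      return 2;
    @}
  n = fread (head, 1, sizeof head, fp);
  fclose (fp);

  if (looks_like_gnu_mo (head, n))
    printf ("%s: GNU MO file\n", argv[1]);
  else
    printf ("%s: not a GNU MO file\n", argv[1]);
  return 0;
@}
@end example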

572

573

@node Overview, , Files, Introduction

574

@section Overview of GNU @code{gettext}

575

576

The following diagram summarizes the relation between the files

577

handled by GNU @code{gettext} and the tools acting on these files.

578

It is followed by somewhat detailed explanations, which you should

579

read while keeping an eye on the diagram. Having a clear understanding

580

of these interrelations would surely help programmers, translators

581

and maintainers.

582

583

@example

584

@group

585

Original C Sources ---> PO mode ---> Marked C Sources ---.
                                                         |
              .---------<--- GNU gettext Library         |
.--- make <---+                                          |
|             `---------<--------------------+-----------'
|                                            |
|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
|   |                                            |              ^
|   |                                            `---.          |
|   `---.                                            +---> PO mode ---.
|       +----> msgmerge ------> LANG.pox --->--------'                |
|   .---'                                                             |
|   |                                                                 |
|   `-------------<---------------.                                   |
|                                 +--- LANG.po <--- New LANG.pox <----'
|   .--- LANG.gmo <--- msgfmt <---'
|   |
|   `---> install ---> /.../LANG/PACKAGE.mo ---.
|                                              +---> "Hello world!"
`-------> install ---> /.../bin/PROGRAM -------'

605

@end group

606

@end example

607

608

The indication @samp{PO mode} appears in two places in this picture,

609

and you may safely read it as merely meaning ``hand editing'', using

610

any editor of your choice, really. However, for those of you who are
lucky users of Emacs, PO mode has been specifically created

612

for providing a cozy environment for editing or modifying PO files.

613

While editing a PO file, PO mode allows for the easy browsing of

614

auxiliary and compendium PO files, as well as for following references into

615

the set of C program sources from which PO files have been derived.

616

It has a few special features, among which are the interactive marking

617

of program strings as translatable, and the validation of PO files

618

with easy repositioning to PO file lines showing errors.

619

620

As a programmer, the first step to bringing GNU @code{gettext}

621

into your package is identifying, right in the C sources, those strings

622

which are meant to be translatable, and those which are untranslatable.

623

This tedious job can be done a little more comfortably using Emacs PO
mode, but you can use any means familiar to you for modifying your
C sources. Besides this, some other simple, standard changes are needed to

626

properly initialize the translation library. @xref{Sources}, for

627

more information about all this.

628

629

For newly written software the strings of course can and should be

630

marked while writing it. The @code{gettext} approach makes this

631

very easy. Simply put the following lines at the beginning of each file

632

or in a central header file:

633

634

@example

635

@group

636

#define _(String) (String)
#define N_(String) (String)
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)

640

@end group

641

@end example

642

643

@noindent

644

Doing this allows you to prepare the sources for internationalization.

645

Later when you feel ready for the step to use the @code{gettext} library

646

simply replace these definitions by the following:

647

648

@example

649

@group

650

#include <libintl.h>
#define _(String) gettext (String)
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)

654

@end group

655

@end example

656

657

and link against @file{libintl.a} or @file{libintl.so}. Note that on

658

GNU systems, you don't need to link with @code{libintl} because the

659

@code{gettext} library functions are already contained in GNU libc.

660

That is all you have to change.
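
Putting these pieces together, a complete internationalized program
can be quite short. The following is a minimal sketch rather than a
canonical template: the domain name @samp{hello} and the directory
@file{/usr/share/locale} are placeholders picked for illustration, and
the table of color names is there only to show @code{N_} at work.

@example
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

#define _(String) gettext (String)
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)

/* N_ only marks strings for extraction.  It is needed where gettext
   cannot be called, such as in a static initializer.  */
static const char *const colors[] = @{ N_("red"), N_("green"), N_("blue") @};

int
main (void)
@{
  size_t i;

  /* Adopt the user's locale, then select the message catalog.  The
     domain "hello" and the directory are assumptions of this sketch.  */
  setlocale (LC_ALL, "");
  bindtextdomain ("hello", "/usr/share/locale");
  textdomain ("hello");

  printf ("%s\n", _("Hello world!"));

  /* Strings marked with N_ are translated at the point of use.  */
  for (i = 0; i < sizeof colors / sizeof colors[0]; i++)
    printf ("%s\n", gettext (colors[i]));
  return 0;
@}
@end example

@noindent
When no catalog for the selected language can be found,
@code{gettext} simply returns its argument, so the program then
prints the original English strings.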

661

662

Once the C sources have been modified, the @code{xgettext} program

663

is used to find and extract all translatable strings, and create a

664

PO template file out of all these. This @file{@var{package}.pot} file

665

contains all original program strings. It has sets of pointers to

666

exactly where in C sources each string is used. All translations

667

are set to empty. The letter @kbd{t} in @file{.pot} marks this as

668

a Template PO file, not yet oriented towards any particular language.

669

@xref{xgettext Invocation}, for more details about how one calls the

670

@code{xgettext} program. If you are @emph{really} lazy, you might

671

be interested in working a lot more right away, and preparing the

672

whole distribution setup (@pxref{Maintainers}). By doing so, you

673

spare yourself typing the @code{xgettext} command, as @code{make}

674

should now generate the proper things automatically for you!

675

676

The first time through, there is no @file{@var{lang}.po} yet, so the

677

@code{msgmerge} step may be skipped and replaced by a mere copy of

678

@file{@var{package}.pot} to @file{@var{lang}.pox}, where @var{lang}

679

represents the target language.

680

681

Then comes the initial translation of messages. Translation in

682

itself is a whole matter, still exclusively meant for humans,

683

and whose complexity far overwhelms the level of this manual.

684

Nevertheless, a few hints are given in some other chapter of this

685

manual (@pxref{Translators}). You will also find there indications

686

about how to contact translating teams, or become part of them,

687

for sharing your translating concerns with others who target the same

688

native language.

689

690

While adding the translated messages into the @file{@var{lang}.pox}

691

PO file, if you do not have Emacs handy, you are on your own

692

for ensuring that your efforts fully respect the PO file format, and quoting

693

conventions (@pxref{PO Files}). This is surely not an impossible task,

694

as this is the way many people have handled PO files already for Uniforum or

695

Solaris. On the other hand, by using PO mode in Emacs, most details

696

of PO file format are taken care of for you, but you have to acquire

697

some familiarity with PO mode itself. Besides main PO mode commands

698

(@pxref{Main PO Commands}), you should know how to move between entries

699

(@pxref{Entry Positioning}), and how to handle untranslated entries

700

(@pxref{Untranslated Entries}).

701

702

If some common translations have already been saved into a compendium

703

PO file, translators may use PO mode for initializing untranslated

704

entries from the compendium, and also save selected translations into

705

the compendium, updating it (@pxref{Compendium}). Compendium files

706

are meant to be exchanged between members of a given translation team.

707

708

Programs, or packages of programs, are dynamic in nature: users write
bug reports and suggestions for improvements, maintainers react by
modifying programs in various ways. The fact that a package has
already been internationalized should not make maintainers shy
of adding new strings, or modifying strings already translated.
They just do their job the best they can. For the Translation
Project to work smoothly, it is important that maintainers do not
carry translation concerns on their already loaded shoulders, and that
translators be kept as free as possible of programmatic concerns.

717

718

The only concern maintainers should have is carefully marking new
strings as translatable, when they should be; they need not otherwise
worry about them being translated, as this will come in proper time.
Consequently, when programs and their strings are adjusted in various
ways by maintainers, and for matters usually unrelated to translation,
@code{xgettext} constructs @file{@var{package}.pot} files which
evolve over time, so the translations carried by @file{@var{lang}.po}
slowly fall out of date.

726

727

It is important for translators (and even maintainers) to understand

728

that package translation is a continuous process in the lifetime of a

729

package, and not something which is done once and for all at the start.

730

After an initial burst of translation activity for a given package,

731

interventions are needed once in a while, because here and there,

732

translated entries become obsolete, and new untranslated entries

733

appear, needing translation.

734

735

The @code{msgmerge} program has the purpose of refreshing an already

736

existing @file{@var{lang}.po} file, by comparing it with a newer

737

@file{@var{package}.pot} template file, extracted by @code{xgettext}

738

out of recent C sources. The refreshing operation adjusts all

739

references to C source locations for strings, since these strings

740

move as programs are modified. Also, @code{msgmerge} comments out as

741

obsolete, in @file{@var{lang}.pox}, those already translated entries

742

which are no longer used in the program sources (@pxref{Obsolete

743

Entries}). It finally discovers new strings and inserts them in

744

the resulting PO file as untranslated entries (@pxref{Untranslated

745

Entries}). @xref{msgmerge Invocation}, for more information about what

746

@code{msgmerge} really does.
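
The refreshing operation just described can be pictured with a small
model. What follows is only a sketch written for this explanation, not
the algorithm @code{msgmerge} really uses: it ignores fuzzy matching and
source references altogether, and the names @code{merge}, @code{old} and
@code{template_ids} are hypothetical.

```python
def merge(old, template_ids):
    """Toy model of refreshing a PO file against a newer template.

    old          -- dict mapping msgid to msgstr, as found in lang.po
    template_ids -- list of msgids found in the newer package.pot
    Returns (merged, obsolete), both dicts mapping msgid to msgstr.
    """
    merged = {}
    for msgid in template_ids:
        # Keep an existing translation; new strings start out untranslated.
        merged[msgid] = old.get(msgid, "")
    # Translated entries no longer present in the sources become obsolete.
    obsolete = {msgid: msgstr for msgid, msgstr in old.items()
                if msgid not in merged and msgstr}
    return merged, obsolete


old = {"Open": "Ouvrir", "Close": "Fermer", "Help": ""}
merged, obsolete = merge(old, ["Open", "Save"])
print(merged)
print(obsolete)
```

In this model, @samp{Save} shows up as a new untranslated entry, and
@samp{Close} is set aside as obsolete because it had a translation but
is no longer in the template.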

747

748

Whatever route or means taken, the goal is to obtain an updated

749

@file{@var{lang}.pox} file offering translations for all strings.

750

When this is properly achieved, this file @file{@var{lang}.pox} may

751

take the place of the previous official @file{@var{lang}.po} file.

752

753

The temporal mobility, or fluidity of PO files, is an integral part of

754

the translation game, and should be well understood, and accepted.

755

People resisting it will have a hard time participating in the

756

Translation Project, or will give a hard time to other participants! In

757

particular, maintainers should relax and include all available official

758

PO files in their distributions, even if these have not recently been

759

updated, without banging or otherwise trying to exert pressure on the

760

translator teams to get the job done. The pressure should rather come

761

from the community of users speaking a particular language, and

762

maintainers should consider themselves fairly relieved of any concern

763

about the adequacy of translation files. On the other hand, translators

764

should reasonably try updating the PO files they are responsible for,

765

while the package is undergoing pretest, prior to an official

766

distribution.

767

768

Once the PO file is complete and dependable, the @code{msgfmt} program

769

is used for turning the PO file into a machine-oriented format, which

770

may yield efficient retrieval of translations by the programs of the

771

package, whenever needed at runtime (@pxref{MO Files}). @xref{msgfmt

772

Invocation}, for more information about all modalities of execution

773

for the @code{msgfmt} program.

774

775

Finally, the modified and marked C sources are compiled and linked

776

with the GNU @code{gettext} library, usually through the operation of

777

@code{make}, given a suitable @file{Makefile} exists for the project,

778

and the resulting executable is installed somewhere users will find it.

779

The MO files themselves should also be properly installed. Given the

780

appropriate environment variables are set (@pxref{End Users}), the

781

program should localize itself automatically, whenever it executes.

782

783

The remainder of this manual has the purpose of explaining in depth the various

784

steps outlined above.

785

786

@node Basics, Sources, Introduction, Top

787

@chapter PO Files and PO Mode Basics

788

789

The GNU @code{gettext} toolset helps programmers and translators
in producing, updating and using translation files, mainly those
PO files which are textual, editable files. This chapter stresses
the format of PO files, and contains a PO mode starter. PO mode
description is spread throughout this manual instead of being concentrated
in one place. Here we present only the basics of PO mode.

795

796

@menu
* Installation::                Completing GNU @code{gettext} Installation
* PO Files::                    The Format of PO Files
* Main PO Commands::            Main Commands
* Entry Positioning::           Entry Positioning
* Normalizing::                 Normalizing Strings in Entries
@end menu

803

804

@node Installation, PO Files, Basics, Basics

805

@section Completing GNU @code{gettext} Installation

806

807

Once you have received, unpacked, configured and compiled the GNU

808

@code{gettext} distribution, the @samp{make install} command puts in

809

place the programs @code{xgettext}, @code{msgfmt}, @code{gettext}, and

810

@code{msgmerge}, as well as their available message catalogs. To

811

top off a comfortable installation, you might also want to make the

812

PO mode available to your Emacs users.

813

814

During the installation of the PO mode, you might want to modify your

815

file @file{.emacs}, once and for all, so it contains a few lines looking

816

like:

817

818

@example
(setq auto-mode-alist
      (cons '("\\.po[tx]?\\'\\|\\.po\\." . po-mode) auto-mode-alist))
(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
@end example

823

824

Later, whenever you edit some @file{.po}, @file{.pot} or @file{.pox}

825

file, or any file having the string @samp{.po.} within its name,

826

Emacs loads @file{po-mode.elc} (or @file{po-mode.el}) as needed, and

827

automatically activates PO mode commands for the associated buffer.

828

The string @emph{PO} appears in the mode line for any buffer for

829

which PO mode is active. Many PO files may be active at once in a

830

single Emacs session.
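
To see which file names the pattern given above selects, one may
experiment outside Emacs. The following sketch restates the Emacs
regular expression in Python syntax, assuming that the Emacs construct
@samp{\'} (end of string) corresponds to @samp{\Z} in Python. The helper
name @code{is_po_file_name} is hypothetical and is not part of PO mode.

```python
import re

# Python rendering of the Emacs regexp "\\.po[tx]?\\'\\|\\.po\\.".
# \Z anchors at the end of the string, like \' does in Emacs.
PO_FILE_NAME = re.compile(r"\.po[tx]?\Z|\.po\.")


def is_po_file_name(name):
    """Return True if NAME matches the pattern used in auto-mode-alist."""
    return PO_FILE_NAME.search(name) is not None


for name in ["fr.po", "hello.pot", "de.pox", "fr.po.orig", "notes.pod"]:
    print(name, is_po_file_name(name))
```

The first four names match. The last one does not, since @samp{.pod}
neither ends the name after an optional @samp{t} or @samp{x} nor
contains @samp{.po.}.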

831

832

If you are using Emacs version 20 or newer, and have already installed

833

the appropriate international fonts on your system, you may also tell

834

Emacs how to determine automatically the coding system of every PO file.

835

This will often (but not always) cause the necessary fonts to be loaded

836

and used for displaying the translations on your Emacs screen. For this

837

to happen, add the lines:

838

839

@example
(modify-coding-system-alist 'file "\\.po[tx]?\\'\\|\\.po\\."
                            'po-find-file-coding-system)
(autoload 'po-find-file-coding-system "po-mode")
@end example

844

845

@noindent

846

to your @file{.emacs} file. If, with this, you still see boxes instead

847

of international characters, try a different font set (via Shift Mouse

848

button 1).

849

850

@node PO Files, Main PO Commands, Installation, Basics

851

@section The Format of PO Files

852

853

A PO file is made up of many entries, each entry holding the relation

854

between an original untranslated string and its corresponding

855

translation. All entries in a given PO file usually pertain

856

to a single project, and all translations are expressed in a single

857

target language. One PO file @dfn{entry} has the following schematic

858

structure:

859

860

@example
@var{white-space}
# @var{translator-comments}
#. @var{automatic-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
msgid @var{untranslated-string}
msgstr @var{translated-string}
@end example

869

870

The general structure of a PO file should be well understood by

871

the translator. When using PO mode, very little has to be known

872

about the format details, as PO mode takes care of them for her.

873

874

Entries begin with some optional white space. Usually, when generated

875

through GNU @code{gettext} tools, there is exactly one blank line

876

between entries. Then comments follow, on lines all starting with the

877

character @kbd{#}. There are two kinds of comments: those which have

878

some white space immediately following the @kbd{#}, which comments are

879

created and maintained exclusively by the translator, and those which

880

have some non-white character just after the @kbd{#}, which comments

881

are created and maintained automatically by GNU @code{gettext} tools.

882

All comments, of either kind, are optional.
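
The distinction between the two kinds of comments depends only on the
character following the @kbd{#}. Here is a minimal sketch of that rule,
assuming the markers shown in the schematic entry above. The function
name @code{classify_comment} is hypothetical, and treating a bare
@kbd{#} as a translator comment is an assumption of this sketch.

```python
def classify_comment(line):
    """Classify one PO comment line by the character following '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line")
    marker = line[1:2]
    # White space after '#' means the translator owns the comment.
    # A bare '#' is treated the same way (an assumption of this sketch).
    if marker == "" or marker.isspace():
        return "translator"
    # Anything else is maintained automatically by the tools.
    kinds = {".": "automatic", ":": "reference", ",": "flag"}
    return kinds.get(marker, "other-automatic")


for line in ["# checked with the team", "#. from the programmer",
             "#: src/main.c:42", "#, fuzzy, c-format"]:
    print(classify_comment(line), "<-", line)
```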

883

884

After white space and comments, entries show two strings, namely

885

first the untranslated string as it appears in the original program

886

sources, and then, the translation of this string. The original

887

string is introduced by the keyword @code{msgid}, and the translation,

888

by @code{msgstr}. The two strings, untranslated and translated,

889

are quoted in various ways in the PO file, using @kbd{"}

890

delimiters and @kbd{\} escapes, but the translator does not really

891

have to pay attention to the precise quoting format, as PO mode fully

892

takes care of quoting for her.

893

894

The @code{msgid} strings, as well as automatic comments, are produced

895

and managed by other GNU @code{gettext} tools, and PO mode does not

896

provide means for the translator to alter these. The most she can

897

do is merely deleting them, and only by deleting the whole entry.

898

On the other hand, the @code{msgstr} string, as well as translator

899

comments, are really meant for the translator, and PO mode gives her

900

the full control she needs.

901

902

The comment lines beginning with @kbd{#,} are special because they are

903

not completely ignored by the programs as comments generally are. The

904

comma separated list of @var{flag}s is used by the @code{msgfmt}

905

program to give the user some better diagnostic messages. Currently

906

there are two forms of flags defined:

907

908

@table @kbd
@item fuzzy
This flag can be generated by the @code{msgmerge} program or it can be
inserted by the translator herself. It shows that the @code{msgstr}
string might not be a correct translation (anymore). Only the translator
can judge if the translation requires further modification, or is
acceptable as is. Once satisfied with the translation, she then removes
this @kbd{fuzzy} attribute. The @code{msgmerge} program inserts this
when it has combined the @code{msgid} and @code{msgstr} entries after fuzzy
search only. @xref{Fuzzy Entries}.

@item c-format
@itemx no-c-format
These flags should not be added by a human. Instead only the
@code{xgettext} program adds them. In an automated PO file processing
system as proposed here, the user changes would be thrown away again as
soon as the @code{xgettext} program generates a new template file.

In case the @kbd{c-format} flag is given for a string, the @code{msgfmt}
program does some more tests to check the validity of the translation.
@xref{msgfmt Invocation}.

@end table
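
To give an idea of the kind of test that makes sense for a
@kbd{c-format} entry, here is a rough sketch which compares the
@code{printf} conversions found in both strings. It is only an
illustration under simplifying assumptions, not the checks
@code{msgfmt} actually performs. The pattern covers common directives
only, and the names @code{directives} and @code{same_directives} are
hypothetical.

```python
import re

# Simplified printf directive: '%%', or '%' followed by optional flags,
# width, precision and length modifier, then a conversion character.
DIRECTIVE = re.compile(
    r"%(?:%|[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|L|z|j|t)?([diouxXeEfgGcsp]))")


def directives(text):
    """Return the conversion characters used in TEXT, in order."""
    return [m.group(1) for m in DIRECTIVE.finditer(text) if m.group(1)]


def same_directives(msgid, msgstr):
    """True if both strings use the same conversions in the same order."""
    return directives(msgid) == directives(msgstr)


print(same_directives("%d files in %s", "%d fichiers dans %s"))
print(same_directives("%d files", "%s fichiers"))
```

The second pair is rejected, because the translation would hand a
number to a @samp{%s} conversion.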

931

932

A different kind of entries is used for translations which involve

933

plural forms.

934

935

@example
@var{white-space}
# @var{translator-comments}
#. @var{automatic-comments}
#: @var{reference}@dots{}
#, @var{flag}@dots{}
msgid @var{untranslated-string-singular}
msgid_plural @var{untranslated-string-plural}
msgstr[0] @var{translated-string-case-0}
...
msgstr[N] @var{translated-string-case-n}
@end example

947

948

It happens that some lines, usually whitespace or comments, follow the

949

very last entry of a PO file. Such lines are not part of any entry,

950

and PO mode is unable to take action on those lines. By using the

951

PO mode function @w{@kbd{M-x po-normalize}}, the translator may get

952

rid of those spurious lines. @xref{Normalizing}.

953

954

The remainder of this section may be safely skipped by those using

955

PO mode, yet it may be interesting for everybody to have a better

956

idea of the precise format of a PO file. On the other hand, those

957

not having Emacs handy should carefully continue reading on.

958

959

Each of @var{untranslated-string} and @var{translated-string} respects
the C syntax for a character string, including the surrounding quotes
and embedded backslashed escape sequences. When the time comes
to write multi-line strings, one should not use escaped newlines.
Instead, a closing quote should follow the last character on the
line to be continued, and an opening quote should resume the string
at the beginning of the following PO file line. For example:

966

967

@example
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
@end example

972

973

@noindent
In this example, the empty string is used on the first line, to
allow better alignment of the @kbd{H} from the word @samp{Here}
over the @kbd{f} from the word @samp{for}. In this example, the
@code{msgid} keyword is followed by three strings, which are meant
to be concatenated. Concatenating the empty string does not change
the resulting overall string, but it is a way for us to comply with
the necessity of @code{msgid} to be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as
we find this to be a cleaner disposition. The empty string could have
been omitted, but only if the string starting with @samp{Here} was
promoted to the first line, right after @code{msgid}.@footnote{This
limitation is not imposed by GNU @code{gettext}, but is for compatibility
with the @code{msgfmt} implementation on Solaris.} It was not really necessary
either to switch between the two last quoted strings immediately after
the newline @samp{\n}; the switch could have occurred after @emph{any}
other character. We just did it this way because it is neater.

990

991

One should carefully distinguish between end of lines marked as
@samp{\n} @emph{inside} quotes, which are part of the represented
string, and end of lines in the PO file itself, outside string quotes,
which have no effect on the represented string.
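
The concatenation rule may be made concrete with a short sketch which
decodes the quoted pieces following a keyword and joins them. It is
written for this explanation only. It handles just four escape
sequences (@samp{\n}, @samp{\t}, @samp{\"} and @samp{\\}), and the names
@code{unquote} and @code{join_pieces} are hypothetical.

```python
# The few C escapes this sketch understands.
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote(piece):
    """Decode one double-quoted piece as written on a PO file line."""
    piece = piece.strip()
    if len(piece) < 2 or piece[0] != '"' or piece[-1] != '"':
        raise ValueError("expected a quoted string: %r" % (piece,))
    out = []
    chars = iter(piece[1:-1])
    for ch in chars:
        if ch == "\\":
            # Replace the backslash and the next character by one character.
            ch = ESCAPES[next(chars)]
        out.append(ch)
    return "".join(out)


def join_pieces(pieces):
    """Concatenate the quoted pieces; PO file line ends contribute nothing."""
    return "".join(unquote(piece) for piece in pieces)


pieces = [
    r'""',
    r'"Here is an example of how one might continue a very long string\n"',
    r'"for the common case the string represents multi-line output.\n"',
]
message = join_pieces(pieces)
print(repr(message))
```

The resulting string holds exactly two newlines, both coming from
@samp{\n} inside quotes. The line breaks between the three pieces leave
no trace.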

995

996

Outside strings, white lines and comments may be used freely.

997

Comments start at the beginning of a line with @samp{#} and extend

998

until the end of the PO file line. Comments written by translators

999

should have the initial @samp{#} immediately followed by some white

1000

space. If the @samp{#} is not immediately followed by white space,

1001

this comment is most likely generated and managed by specialized GNU

1002

tools, and might disappear or be replaced unexpectedly when the PO

1003

file is given to @code{msgmerge}.

1004

1005

@node Main PO Commands, Entry Positioning, PO Files, Basics

1006

@section Main PO mode Commands

1007

1008

After setting up Emacs with something similar to the lines in
@ref{Installation}, PO mode is activated for a window when Emacs finds a
PO file in that window. This makes the window read-only and establishes
@code{po-mode-map}. PO mode is a genuine Emacs mode, and is not derived
from text mode in any way. Functions found on @code{po-mode-hook},
if any, will be executed.

1014

1015

When PO mode is active in a window, the letters @samp{PO} appear
in the mode line for that window. The mode line also displays how
many entries of each kind are held in the PO file. For example,
the string @samp{132t+3f+10u+2o} would tell the translator that the
PO file contains 132 translated entries (@pxref{Translated Entries}),
3 fuzzy entries (@pxref{Fuzzy Entries}), 10 untranslated entries
(@pxref{Untranslated Entries}) and 2 obsolete entries (@pxref{Obsolete
Entries}). Items with a zero count are not shown. So, in this example, if
the fuzzy entries were unfuzzied, the untranslated entries were translated
and the obsolete entries were deleted, the mode line would merely display
@samp{145t} for the counters.
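
The arithmetic behind these counters can be checked with a few lines of
code. This sketch merely parses a counter string of the form shown
above. The function name @code{parse_counters} is hypothetical and has
nothing to do with how PO mode itself computes the display.

```python
import re

# Letters used in the mode line, and the kind of entry each one counts.
KINDS = {"t": "translated", "f": "fuzzy", "u": "untranslated", "o": "obsolete"}


def parse_counters(text):
    """Turn a string such as '132t+3f+10u+2o' into a dict of counts."""
    counts = {name: 0 for name in KINDS.values()}
    for number, letter in re.findall(r"(\d+)([tfuo])", text):
        counts[KINDS[letter]] = int(number)
    return counts


counts = parse_counters("132t+3f+10u+2o")
# Once fuzzy and untranslated entries are handled, all of them count as
# translated; deleted obsolete entries simply disappear.
finished = counts["translated"] + counts["fuzzy"] + counts["untranslated"]
print(counts)
print("%dt" % finished)
```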

1026

1027

The main PO commands are those which do not fit into the other categories of

1028

subsequent sections. These allow for quitting PO mode or for managing windows

1029

in special ways.

1030

1031

@table @kbd
@item U
Undo last modification to the PO file.

@item Q
Quit processing and save the PO file.

@item q
Quit processing, possibly after confirmation.

@item O
Temporarily leave the PO file window.

@item ?
@itemx h
Show help about PO mode.

@item =
Give some PO file statistics.

@item V
Batch validate the format of the whole PO file.

@end table

1055

1056

The command @kbd{U} (@code{po-undo}) interfaces to the Emacs
@emph{undo} facility. @xref{Undo, , Undoing Changes, emacs, The Emacs
Editor}. Each time @kbd{U} is typed, modifications which the translator
did to the PO file are undone a little more. For the purpose of
undoing, each PO mode command is atomic. This is especially true for
the @kbd{@key{RET}} command: the whole edit made by a single
use of this command is undone at once, even if the edit itself
implied several actions. However, while in the editing window, one
can undo the editing work quite parsimoniously.

1065

1066

The commands @kbd{Q} (@code{po-quit}) and @kbd{q}

1067

(@code{po-confirm-and-quit}) are used when the translator is done with the

1068

PO file. The former is a bit less verbose than the latter. If the file

1069

has been modified, it is saved to disk first. In both cases, and prior to

1070

all this, the commands check if some untranslated message remains in the

1071

PO file and, if yes, the translator is asked if she really wants to leave

1072

off working with this PO file. This is the preferred way of getting rid

1073

of an Emacs PO file buffer. Merely killing it through the usual command

1074

@w{@kbd{C-x k}} (@code{kill-buffer}) is not the tidiest way to proceed.

1075

1076

The command @kbd{O} (@code{po-other-window}) is another, softer way,
to leave PO mode, temporarily. It just moves the cursor to some other
Emacs window, and pops one if necessary. For example, if the translator
just got PO mode to show some source context in some other window, she might
discover some apparent bug in the program source that needs correction.
This command allows the translator to change sex, become a programmer,
and have the cursor right in the window containing the program she
(or rather @emph{he}) wants to modify. By later getting the cursor back
in the PO file window, or by asking Emacs to edit this file once again,
PO mode is then recovered.

1086

1087

The command @kbd{h} (@code{po-help}) displays a summary of all available PO

1088

mode commands. The translator should then type any character to resume

1089

normal PO mode operations. The command @kbd{?} has the same effect

1090

as @kbd{h}.

1091

1092

The command @kbd{=} (@code{po-statistics}) computes the total number of

1093

entries in the PO file, the ordinal of the current entry (counted from

1094

1), the number of untranslated entries, the number of obsolete entries,

1095

and displays all these numbers.

1096

1097

The command @kbd{V} (@code{po-validate}) launches @code{msgfmt} in verbose

1098

mode over the current PO file. This command first offers to save the

1099

current PO file on disk. The @code{msgfmt} tool, from GNU @code{gettext},

1100

has the purpose of creating a MO file out of a PO file, and PO mode uses

1101

the features of this program for checking the overall format of a PO file,

1102

as well as all individual entries.

1103

1104

The program @code{msgfmt} runs asynchronously with Emacs, so the
translator regains control immediately while her PO file is being studied.
Error output is collected in the Emacs @samp{*compilation*} buffer,
displayed in another window. The regular Emacs command @kbd{C-x `}
(@code{next-error}), as well as other usual compile commands, allow the
translator to reposition quickly to the offending parts of the PO file.
Once the cursor is on the line in error, the translator may decide on
any PO mode action which would help correct the error.

1112

1113

@node Entry Positioning, Normalizing, Main PO Commands, Basics

1114

@section Entry Positioning

1115

1116

The cursor in a PO file window is almost always part of
an entry. The only exceptions are the special case when the cursor
is after the last entry in the file, or when the PO file is
empty. The entry where the cursor is found to be is said to be the
current entry. Many PO mode commands operate on the current entry,
so moving the cursor does more than allow the translator to browse
the PO file; it also selects the entry on which commands operate.

1123

1124

Some PO mode commands alter the position of the cursor in a specialized
way. A few of those special-purpose positioning commands are described here;
the others are described in following sections.

1127

1128

@table @kbd

@item .
Redisplay the current entry.

@item n
Select the entry after the current one.

@item p
Select the entry before the current one.

@item <
Select the first entry in the PO file.

@item >
Select the last entry in the PO file.

@item m
Record the location of the current entry for later use.

@item r
Return to a previously saved entry location.

@item x
Exchange the current entry location with the previously saved one.

@end table

1157

1158

Any Emacs command able to reposition the cursor may be used

1159

to select the current entry in PO mode, including commands which

1160

move by characters, lines, paragraphs, screens or pages, and search

1161

commands. However, there is a kind of standard way to display the

1162

current entry in PO mode, which usual Emacs commands moving

1163

the cursor do not especially try to enforce. The command @kbd{.}

1164

(@code{po-current-entry}) has the sole purpose of redisplaying the

1165

current entry properly, after the current entry has been changed by

1166

means external to PO mode, or the Emacs screen otherwise altered.

1167

1168

It is yet to be decided if PO mode helps the translator, or otherwise
irritates her, by forcing a rigid window disposition while she
is doing her work. We originally had quite precise ideas about
how windows should behave, but on the other hand, anyone used to
Emacs is often happy to keep full control. Maybe a fixed window
disposition might be offered as a PO mode option that the translator
might activate or deactivate at will, so it could be offered on an
experimental basis. If nobody feels a real need for using it, or
a compulsion for writing it, we should drop this whole idea.
The incentive for doing it should come from translators rather than
programmers, as opinions from an experienced translator are surely
worth more to me than opinions from programmers @emph{thinking} about
how @emph{others} should do translation.

1181

1182

The commands @kbd{n} (@code{po-next-entry}) and @kbd{p}
(@code{po-previous-entry}) move the cursor to the entry following,
or preceding, the current one. If @kbd{n} is given while the
cursor is on the last entry of the PO file, or if @kbd{p}
is given while the cursor is on the first entry, no move is done.

1187

1188

The commands @kbd{<} (@code{po-first-entry}) and @kbd{>}
(@code{po-last-entry}) move the cursor to the first entry, or last
entry, of the PO file. When the cursor is located past the last
entry in a PO file, most PO mode commands will return an error saying
@samp{After last entry}. However, the commands @kbd{<} and @kbd{>}
have the special property of being able to work even when the cursor
is not within any PO file entry, and one may use them for nicely
correcting this situation. But even these commands will fail on a
truly empty PO file. There are development plans for PO mode
to interactively fill an empty PO file from sources. @xref{Marking}.

1198

1199

The translator may decide, before working at the translation of

1200

a particular entry, that she needs to browse the remainder of the

1201

PO file, maybe for finding the terminology or phraseology used

1202

in related entries. She can of course use the standard Emacs idioms

1203

for saving the current cursor location in some register, and use that

1204

register for getting back, or else, use the location ring.

1205

1206

PO mode offers another approach, by which cursor locations may be saved

1207

onto a special stack. The command @kbd{m} (@code{po-push-location})

1208

merely adds the location of current entry to the stack, pushing

1209

the already saved locations under the new one. The command

1210

@kbd{r} (@code{po-pop-location}) consumes the top stack element and

1211

repositions the cursor to the entry associated with that top element.

1212

This position is then lost, for the next @kbd{r} will move the cursor

1213

to the previously saved location, and so on until no locations remain

1214

on the stack.

1215

1216

If the translator wants the position to be kept on the location stack,

1217

maybe for taking a look at the entry associated with the top

1218

element, then go elsewhere with the intent of getting back later, she

1219

ought to use @kbd{m} immediately after @kbd{r}.

1220

1221

The command @kbd{x} (@code{po-exchange-location}) simultaneously

1222

repositions the cursor to the entry associated with the top element of

1223

the stack of saved locations, and replaces that top element with the

1224

location of the current entry before the move. Consequently, repeating

1225

the @kbd{x} command toggles alternatively between two entries.

1226

For achieving this, the translator will position the cursor on the

1227

first entry, use @kbd{m}, then position to the second entry, and

1228

merely use @kbd{x} for making the switch.
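
The behavior of the three commands @kbd{m}, @kbd{r} and @kbd{x} can be
modelled with an ordinary stack. The class below is only a sketch of the
description above, not PO mode's own code. The class and method names
are hypothetical, entries are represented by plain numbers, and an
error on an empty stack is an assumption of this sketch.

```python
class LocationStack:
    """Toy model of the stack of saved entry locations."""

    def __init__(self):
        self._saved = []

    def push(self, current):
        """Like 'm': remember the location of the current entry."""
        self._saved.append(current)

    def pop(self):
        """Like 'r': consume the top location and return it."""
        return self._saved.pop()

    def exchange(self, current):
        """Like 'x': go to the top location, saving CURRENT in its place."""
        target = self._saved.pop()
        self._saved.append(current)
        return target


stack = LocationStack()
cursor = 5               # positioned on the first entry of interest
stack.push(cursor)       # m
cursor = 42              # move to the second entry
cursor = stack.exchange(cursor)   # x: back on entry 5
first_switch = cursor
cursor = stack.exchange(cursor)   # x again: back on entry 42
print(first_switch, cursor)
```

Repeating @code{exchange} alternates between the two entries, since
each call saves the place being left.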

1229

1230

@node Normalizing, , Entry Positioning, Basics

1231

@section Normalizing Strings in Entries

1232

1233

There are many different ways for encoding a particular string into a
PO file entry, because there are so many different ways to split and
quote multi-line strings, and even, to represent special characters
by backslashed escape sequences. Some features of PO mode rely on
the ability for PO mode to scan an already existing PO file for a
particular string encoded into the @code{msgid} field of some entry.
Even if PO mode has internally all the built-in machinery for
implementing this recognition easily, doing it fast is technically
difficult. To facilitate a solution to this efficiency problem,
we decided on a canonical representation for strings.

1243

1244

A conventional representation of strings in a PO file is currently

1245

under discussion, and PO mode experiments with a canonical representation.

1246

Having both @code{xgettext} and PO mode converging towards a uniform

1247

way of representing equivalent strings would be useful, as the internal

1248

normalization needed by PO mode could be automatically satisfied

1249

when using @code{xgettext} from GNU @code{gettext}. An explicit

1250

PO mode normalization should then be only necessary for PO files

1251

imported from elsewhere, or for when the convention itself evolves.

1252

1253

So, for achieving normalization of at least the strings of a given

1254

PO file needing a canonical representation, the following PO mode

1255

command is available:

1256

1257

@table @kbd

1258

@item M-x po-normalize

1259

Tidy the whole PO file by making entries more uniform.

1260

1261

@end table

1262

1263

The special command @kbd{M-x po-normalize}, which has no associated

1264

keys, revises all entries, ensuring that strings of both original

1265

and translated entries use uniform internal quoting in the PO file.

1266

It also removes any crumb after the last entry. This command may be

1267

useful for PO files freshly imported from elsewhere, or if we ever

1268

improve on the canonical quoting format we use. This canonical format

1269

is not only meant for getting cleaner PO files, but also for greatly

1270

speeding up @code{msgid} string lookup for some other PO mode commands.

1271

1272

@kbd{M-x po-normalize} presently makes three passes over the entries.
The first implements heuristics for converting PO files for GNU
@code{gettext} 0.6 and earlier, in which @code{msgid} and @code{msgstr}
fields were using K&R style C string syntax for multi-line strings.
These heuristics may fail for comments not related to obsolete
entries and ending with a backslash; they also depend on subsequent
passes for finalizing the proper commenting of continued lines for
obsolete entries. This first pass might disappear once all oldish PO
files have been adjusted. The second and third passes normalize
all @code{msgid} and @code{msgstr} strings respectively. They also
clean out those trailing backslashes used by XView's @code{msgfmt}
for continued lines.

Having such an explicit normalizing command allows for importing PO
files from other sources, but also eases the evolution of the current
convention, an evolution driven mostly by aesthetic concerns, as of now.
It is easy to make suggested adjustments at a later time, as the
normalizing command and, eventually, other GNU @code{gettext} tools
should greatly automate conformance. A description of the canonical
string format is given below, for the particular benefit of those not
having Emacs handy, and who would nevertheless want to handcraft
their PO files in nice ways.

Right now, in PO mode, strings are single line or multi-line. A string
goes multi-line if and only if it has @emph{embedded} newlines, that
is, if it matches @samp{[^\n]\n+[^\n]}. So, we would have:

@example
msgstr "\n\nHello, world!\n\n\n"
@end example

but, replacing the space by a newline, this becomes:

@example
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"
@end example

We are deliberately using a caricatural example, here, to make the
point clearer. Usually, multi-lines are not that bad looking.
It is probable that we will implement the following suggestion.
We might lump together all initial newlines into the otherwise empty
first string, and also all newlines introducing empty lines (that is,
for a run of @w{@var{n} > 1} newlines, the last @var{n}-1 newlines would
go together on a separate string), so making the previous example appear:

@example
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"
@end example

There are a few yet undecided little points about string normalization,
to be documented in this manual, once these questions settle.
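
The multi-line rule stated earlier in this section can be sketched in C
for readers who handcraft PO files without Emacs. This is only an
illustration of the pattern @samp{[^\n]\n+[^\n]}; the function name
@code{has_embedded_newline} is hypothetical and is not part of PO mode.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if S contains an embedded newline, that is, a run of one
   or more newlines with a non-newline character on both sides
   (the pattern [^\n]\n+[^\n]).  Return 0 otherwise.
   Hypothetical helper, not part of PO mode.  */
int
has_embedded_newline (const char *s)
{
  size_t i = 0;

  assert (s != NULL);

  /* Leading newlines have no non-newline character before them.  */
  while (s[i] == '\n')
    i++;

  /* From here on, the first newline met is preceded by a non-newline
     character.  Its run is embedded if something follows it;
     otherwise the run is trailing and the string has no embedded
     newline at all.  */
  for (; s[i] != '\0'; i++)
    if (s[i] == '\n')
      {
        while (s[i] == '\n')
          i++;
        return s[i] != '\0';
      }

  return 0;
}
```

With this sketch, the string with a space after the comma stays on a
single line, while the variant with a newline after the comma goes
multi-line.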

@node Sources, Template, Basics, Top
@chapter Preparing Program Sources

@c FIXME: Rewrite (the whole chapter).

For the programmer, changes to the C source code fall into three
categories. First, you have to make the localization functions
known to all modules needing message translation. Second, you should
properly trigger the operation of GNU @code{gettext} when the program
initializes, usually from the @code{main} function. Last, you should
identify and especially mark all constant strings in your program
needing translation.

Presuming that your set of programs, or package, has been adjusted
so all needed GNU @code{gettext} files are available, and your
@file{Makefile} files are adjusted (@pxref{Maintainers}), each C module
having translated C strings should contain the line:

@example
#include <libintl.h>
@end example

The remaining changes to your C sources are discussed in the further
sections of this chapter.

@menu
* Triggering::                  Triggering @code{gettext} Operations
* Mark Keywords::               How Marks Appear in Sources
* Marking::                     Marking Translatable Strings
* c-format::                    Telling something about the following string
* Special cases::               Special Cases of Translatable Strings
@end menu

@node Triggering, Mark Keywords, Sources, Sources
@section Triggering @code{gettext} Operations

The initialization of locale data should be done with more or less
the same code in every program, as demonstrated below:

@example
@group
int
main (int argc, char **argv)
@{
  @dots{}
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  @dots{}
@}
@end group
@end example

@var{PACKAGE} and @var{LOCALEDIR} should be provided either by
@file{config.h} or by the Makefile. For now consult the @code{gettext}
sources for more information.
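
As a minimal sketch of how these two names might be supplied, the
fragment below gives fallback definitions and wraps the three
initialization calls in one function. The helper name
@code{init_localization} and both fallback values are assumptions made
for illustration only; a real package would take the values from
@file{config.h} or from the Makefile.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Normally supplied by config.h or by -D options in the Makefile.
   These fallback values are placeholders for illustration only.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical helper wrapping the three initialization calls.
   Returns 1 if PACKAGE became the current text domain, 0 otherwise.  */
int
init_localization (void)
{
  const char *domain;

  /* An empty name would make textdomain reset the default domain.  */
  assert (PACKAGE[0] != '\0');

  /* Adopt the locale requested by the user's environment.  If the
     request cannot be honored, the program stays in the "C" locale.  */
  setlocale (LC_ALL, "");

  /* Tell gettext where the message catalogs of PACKAGE live.  */
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return 0;

  /* Make PACKAGE the default domain for later gettext calls.  */
  domain = textdomain (PACKAGE);
  return domain != NULL && strcmp (domain, PACKAGE) == 0;
}
```

Calling the helper once, early in @code{main}, keeps the initialization
in a single place.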

The use of @code{LC_ALL} might not be appropriate for you.
@code{LC_ALL} includes all locale categories and especially
@code{LC_CTYPE}. This latter category is responsible for determining
character classes with the @code{isalnum} etc. functions from
@file{ctype.h}, and its result could be wrong, especially for programs
which process some kind of input language. For example, this would mean
that source code using the @,{c} (c-cedilla character) is runnable in
France but not in the U.S.

Some systems also have problems with parsing numbers using the
@code{scanf} functions if a locale other than the @code{"C"} locale is
used. The standards say that formats in addition to the one known in the
@code{"C"} locale might be recognized. But some systems seem to reject
numbers in the @code{"C"} locale format. In some situations, it might
also be a problem with the notation itself which makes it impossible to
recognize whether the number is in the @code{"C"} locale or the local
format. This can happen if thousands separator characters are used.
Some locales define this character, according to the national
conventions, to be @code{'.'}, which is the same character used in the
@code{"C"} locale to denote the decimal point.

So it is sometimes necessary to replace the @code{LC_ALL} line in the
code above by a sequence of @code{setlocale} lines:

@example
@group
@{
  @dots{}
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  @dots{}
@}
@end group
@end example

@noindent
On all POSIX conformant systems the locale categories @code{LC_CTYPE},
@code{LC_COLLATE}, @code{LC_MONETARY}, @code{LC_NUMERIC}, and
@code{LC_TIME} are available. On some modern systems there is also a
locale category @code{LC_MESSAGES}, which is called
@code{LC_RESPONSES} on some old, XPG2 compliant systems.
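
The effect of setting only selected categories can be checked at run
time. The sketch below assumes it is called before any other
@code{setlocale} call in the program, so that every category still
starts out in the @code{"C"} locale; the helper name
@code{init_partial_locale} is hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: adopt the user's locale for character
   classification and for messages only.  Returns 1 if LC_NUMERIC is
   still the "C" locale afterwards, 0 otherwise.  Assumes no earlier
   setlocale call has changed LC_NUMERIC.  */
int
init_partial_locale (void)
{
  const char *numeric;

  setlocale (LC_CTYPE, "");
#ifdef LC_MESSAGES
  /* LC_MESSAGES is not required by ISO C, hence the guard.  */
  setlocale (LC_MESSAGES, "");
#endif

  /* Passing NULL queries a category without changing it.  */
  numeric = setlocale (LC_NUMERIC, NULL);
  assert (numeric != NULL);
  return strcmp (numeric, "C") == 0;
}
```

Because @code{LC_NUMERIC} is left alone, number parsing with the
@code{scanf} functions keeps using the @code{"C"} locale notation.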

Note that changing the @code{LC_CTYPE} also affects the functions
declared in the @code{<ctype.h>} standard header. If this is not
desirable in your application (for example in a compiler's parser),
you can use a set of substitute functions which hardwire the C locale,
such as found in the @file{c-ctype.h} and @file{c-ctype.c} files
in the gettext source distribution.
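
To illustrate what such a substitute function looks like, here is a
minimal sketch of a classification function that ignores
@code{LC_CTYPE} altogether. The name @code{ascii_isalnum} is
hypothetical and is not taken from the gettext sources, and the code
assumes an ASCII-compatible execution character set.

```c
#include <assert.h>

/* Hypothetical stand-in for isalnum that always behaves as in the
   "C" locale, whatever LC_CTYPE is set to.  Assumes that letters are
   contiguous in the execution character set, as in ASCII.  */
int
ascii_isalnum (int c)
{
  /* Same argument domain as the <ctype.h> functions: EOF (assumed
     to be -1 here) or a value representable as unsigned char.  */
  assert (c >= -1 && c <= 255);

  return (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'z')
         || (c >= 'A' && c <= 'Z');
}
```

A parser using such a function rejects the c-cedilla character in
identifiers in every locale, which makes its behavior reproducible.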

It is also possible to switch the locale back and forth between the
environment dependent locale and the C locale, but this approach is
normally avoided because a @code{setlocale} call is expensive,
because it is tedious to determine the places where a locale switch
is needed in a large program's source, and because switching a locale
is not multithread-safe.

@node Mark Keywords, Marking, Triggering, Sources
@section How Marks Appear in Sources

All strings requiring translation should be marked in the C sources. Marking
is done in such a way that each translatable string appears to be
the sole argument of some function or preprocessor macro. There are
only a few such possible functions or macros meant for translation,
and their names are said to be marking keywords. The marking is
attached to strings themselves, rather than to what we do with them.
This approach has more uses. A blatant example is an error message
produced by formatting. The format string needs translation, as
well as some strings inserted through some @samp{%s} specification
in the format, while the result from @code{sprintf} may have so many
different instances that it is impractical to list them all in some
@samp{error_string_out()} routine, say.

This marking operation has two goals. The first goal of marking
is for triggering the retrieval of the translation, at run time.
The keyword is possibly resolved into a routine able to dynamically
return the proper translation, as far as possible or wanted, for the
argument string. Most localizable strings are found in executable
positions, that is, attached to variables or given as parameters to
functions. But this is not universal usage, and some translatable
strings appear in structured initializations. @xref{Special cases}.

The second goal of the marking operation is to help @code{xgettext}
properly extract all translatable strings when it scans a set
of program sources and produces PO file templates.

The canonical keyword for marking translatable strings is
@samp{gettext}, which gave its name to the whole GNU @code{gettext}
package. For packages making only light use of the @samp{gettext}
keyword, macro or function, it is easily used @emph{as is}. However,
for packages using the @code{gettext} interface more heavily, it
is usually more convenient to give the main keyword a shorter, less
obtrusive name. Indeed, the keyword might appear on a lot of strings
all over the package, and programmers usually neither want nor need
their program sources to remind them forcefully, all the time, that they
are internationalized. Further, a long keyword has the disadvantage
of using more horizontal space, forcing more indentation work on
sources for those trying to keep them within 79 or 80 columns.

Many packages use @samp{_} (a simple underline) as a keyword,
and write @samp{_("Translatable string")} instead of @samp{gettext
("Translatable string")}. Further, the coding rule from the GNU
standards, which wants a space between the keyword and the opening
parenthesis, is relaxed, in practice, for this particular usage.
So, the textual overhead per translatable string is reduced to
only three characters: the underline and the two parentheses.
However, even if GNU @code{gettext} uses this convention internally,
it does not offer it officially. The real, genuine keyword is truly
@samp{gettext} indeed. It is fairly easy for those wanting to use
@samp{_} instead of @samp{gettext} to declare:

@example
#include <libintl.h>
#define _(String) gettext (String)
@end example

@noindent
instead of merely using @samp{#include <libintl.h>}.

Later on, the maintenance is relatively easy. If, as a programmer,
you add or modify a string, you will have to ask yourself if the
new or altered string requires translation, and include it within
@samp{_()} if you think it should be translated. @samp{"%s: %d"} is
an example of a string @emph{not} requiring translation!
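
The following sketch shows this decision applied in a small function.
The function @code{format_status} and its messages are invented for
illustration; only the @samp{_} macro comes from the text above.

```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Hypothetical helper: write "NAME: status" into BUF.  Returns the
   number of characters snprintf would have written.  */
int
format_status (char *buf, size_t size, const char *name, int ok)
{
  assert (buf != NULL && name != NULL);

  /* The status words are shown to the user, so they are marked.
     The format "%s: %s" holds nothing to translate and stays
     unmarked.  */
  return snprintf (buf, size, "%s: %s", name,
                   ok ? _("succeeded") : _("failed"));
}
```

When no message catalog is installed, @code{gettext} returns its
argument unchanged, so the function still produces English output.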

@node Marking, c-format, Mark Keywords, Sources
@section Marking Translatable Strings

In PO mode, one set of features is meant more for the programmer than
for the translator, and allows him to interactively mark which strings,
in a set of program sources, are translatable, and which are not.
Even if it is a fairly easy job for a programmer to find and mark
such strings by other means, using any editor of his choice, PO mode
makes this work more comfortable. Further, this gives translators
who feel a little like programmers, or programmers who feel a little
like translators, a tool letting them work at marking translatable
strings in the program sources, while simultaneously producing a set of
translations in some language, for the package being internationalized.

The set of program sources, targeted by the PO mode commands described
here, should have an Emacs tags table constructed for your project,
prior to using these PO file commands. This is easy to do. In any
shell window, change the directory to the root of your project, then
execute a command resembling:

@example
etags src/*.[hc] lib/*.[hc]
@end example

@noindent
presuming here you want to process all @file{.h} and @file{.c} files
from the @file{src/} and @file{lib/} directories. This command will
explore all said files and create a @file{TAGS} file in your root
directory, somewhat summarizing the contents using a special file
format Emacs can understand.

For packages following the GNU coding standards, there is
a make goal @code{tags} or @code{TAGS} which constructs the tag files in
all directories and for all files containing source code.

Once your @file{TAGS} file is ready, the following commands assist
the programmer in marking translatable strings in his set of sources.
But these commands are necessarily driven from within a PO file
window, and it is likely that you do not even have such a PO file yet.
This is not a problem at all, as you may safely open a new, empty PO
file, mainly for using these commands. This empty PO file will slowly
fill in while you mark strings as translatable in your program sources.

@table @kbd
@item ,
Search through program sources for a string which looks like a
candidate for translation.

@item M-,
Mark the last string found with @samp{_()}.

@item M-.
Mark the last string found with a keyword taken from a set of possible
keywords. This command with a prefix allows some management of these
keywords.

@end table

The @kbd{,} (@code{po-tags-search}) command searches for the next
occurrence of a string which looks like a possible candidate for
translation, and displays the program source in another Emacs window,
positioned in such a way that the string is near the top of this other
window. If the string is too big to fit whole in this window, it is
positioned so only its end is shown. In any case, the cursor
is left in the PO file window. If the shown string would be better
presented differently in different native languages, you may mark it
using @kbd{M-,} or @kbd{M-.}. Otherwise, you might rather ignore it
and skip to the next string by merely repeating the @kbd{,} command.

A string is a good candidate for translation if it contains a sequence
of three or more letters. A string containing at most two letters in
a row will be considered as a candidate if it has more letters than
non-letters. The command disregards strings containing no letters,
or isolated letters only. It also disregards strings within comments,
or strings already marked with some keyword PO mode knows (see below).
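
The letter-counting part of this rule can be approximated in C as
follows. This is a sketch of the rule as worded above, not PO mode's
actual Emacs Lisp code; the name @code{looks_translatable} is
hypothetical, and the checks for comments and for already marked
strings are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Approximation of the candidate test described in the text; not
   PO mode's actual implementation.  Returns 1 if S looks like a
   candidate for translation, 0 otherwise.  */
int
looks_translatable (const char *s)
{
  size_t letters = 0, others = 0, run = 0, longest = 0;

  assert (s != NULL);

  for (; *s != '\0'; s++)
    {
      if (isalpha ((unsigned char) *s))
        {
          letters++;
          if (++run > longest)
            longest = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest >= 3)
    return 1;                   /* three or more letters in a row */
  if (longest == 2)
    return letters > others;    /* short runs: letters must dominate */
  return 0;                     /* no letters, or isolated letters only */
}
```

Under this sketch, a format such as @samp{"%s: %d"} is skipped because
its letters are all isolated, while an ordinary sentence is offered as
a candidate.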

If you have never told Emacs about some @file{TAGS} file to use, the
command will request that you specify one from the minibuffer, the
first time you use the command. You may later change your @file{TAGS}
file by using the regular Emacs command @w{@kbd{M-x visit-tags-table}},
which will ask you to name the precise @file{TAGS} file you want
to use. @xref{Tags, , Tag Tables, emacs, The Emacs Editor}.

Each time you use the @kbd{,} command, the search resumes from where it was
left by the previous search, and goes through all program sources,
obeying the @file{TAGS} file, until all sources have been processed.
However, by giving a prefix argument to the command @w{(@kbd{C-u
,})}, you may request that the search be restarted all over again
from the first program source; but in this case, strings that you
recently marked as translatable will be automatically skipped.

Using this @kbd{,} command does not prevent the use of other regular
Emacs tags commands. For example, regular @code{tags-search} or
@code{tags-query-replace} commands may be used without disrupting the
independent @kbd{,} search sequence. However, as implemented, the
@emph{initial} @kbd{,} command (or the @kbd{,} command used with a
prefix) might also reinitialize the regular Emacs tags searching to the
first tags file; this reinitialization might be considered spurious.

The @kbd{M-,} (@code{po-mark-translatable}) command will mark the
recently found string with the @samp{_} keyword. The @kbd{M-.}
(@code{po-select-mark-and-mark}) command will request that you type
one keyword from the minibuffer and use that keyword for marking
the string. Both commands will automatically create a new PO file
untranslated entry for the string being marked, and make it the
current entry (making it easy for you to immediately proceed to its
translation, if you feel like doing it right away). It is possible
that the modifications made to the program source by @kbd{M-,} or
@kbd{M-.} render some source line longer than 80 columns, forcing you
to break and re-indent this line differently. You may use the @kbd{O}
command from PO mode, or any other window changing command from
Emacs, to break out into the program source window, and do any
needed adjustments. You will have to use some regular Emacs command
to return the cursor to the PO file window, if you want to use the
@kbd{,} command for the next string, say.

The @kbd{M-.} command has a few built-in speedups, so you do not
have to explicitly type all keywords all the time. The first such
speedup is that you are presented with a @emph{preferred} keyword,
which you may accept by merely typing @kbd{@key{RET}} at the prompt.
The second speedup is that you may type any non-ambiguous prefix of the
keyword you really mean, and the command will complete it automatically
for you. This also means that PO mode has to @emph{know} all
your possible keywords, and that it will not accept mistyped keywords.

If you reply @kbd{?} to the keyword request, the command gives a
list of all known keywords, from which you may choose. When the
command is prefixed by an argument @w{(@kbd{C-u M-.})}, it inhibits
updating any program source or PO file buffer, and does some simple
keyword management instead. In this case, the command asks for a
keyword, written in full, which becomes a new allowed keyword for
later @kbd{M-.} commands. Moreover, this new keyword automatically
becomes the @emph{preferred} keyword for later commands. By typing
an already known keyword in response to @w{@kbd{C-u M-.}}, one merely
changes the @emph{preferred} keyword and does nothing more.

All keywords known for @kbd{M-.} are recognized by the @kbd{,} command
when scanning for strings, and strings already marked by any of those
known keywords are automatically skipped. If many PO files are opened
simultaneously, each one has its own independent set of known keywords.
There is no provision in PO mode, currently, for deleting a known
keyword; you have to quit the file (maybe using @kbd{q}) and reopen
it afresh. When a PO file is newly brought up in an Emacs window, only
@samp{gettext} and @samp{_} are known as keywords, and @samp{gettext}
is preferred for the @kbd{M-.} command. In fact, it is not useful to
prefer @samp{_}, as this one is already built into the @kbd{M-,} command.

@node c-format, Special cases, Marking, Sources
@section Special Comments preceding Keywords

@c FIXME document c-format and no-c-format.

In C programs strings are often used within calls of functions from the
@code{printf} family. The special thing about these format strings is
that they can contain format specifiers introduced with @kbd{%}. Assume
we have the code

@example
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
@end example

@noindent
A possible German translation for the above string might be:

@example
"%d Zeichen lang ist die Zeichenkette `%s'"
@end example

A C programmer, even if he cannot speak German, will recognize that
there is something wrong here. The order of the two format specifiers
has changed, but of course the order of the arguments in the
@code{printf} call has not. This will most probably lead to problems
because now the length of the string is regarded as the address.

To prevent errors at runtime caused by translations, the @code{msgfmt}
tool can check statically whether the arguments in the original and the
translation string match in type and number. If this is not the case, a
warning will be given and the error cannot cause problems at runtime.

@noindent
For the word order in the above German translation to be correct, one
would have to write

@example
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
@end example

@noindent
The routines in @code{msgfmt} know about this special notation.
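
The effect of the numbered notation can be tried directly. The sketch
below assumes a C library whose @code{printf} family supports the
POSIX @samp{%@var{n}$} extension, which ISO C does not require; the
helper name @code{format_german} is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the German message into BUF using
   numbered argument specifiers.  The "%N$" notation is a POSIX
   extension to printf, not part of ISO C.  */
int
format_german (char *buf, size_t size, const char *s)
{
  assert (buf != NULL && s != NULL);

  /* The arguments stay in the order of the original English call:
     first the string, then its length.  */
  return snprintf (buf, size,
                   "%2$d Zeichen lang ist die Zeichenkette `%1$s'",
                   s, (int) strlen (s));
}
```

The translated format picks its arguments by position, so the calling
code does not need to change when the word order does.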

Because not all strings in a program are format strings, it is not
useful for @code{msgfmt} to test all the strings in the @file{.po} file.
This might cause problems because the string might contain what looks
like a format specifier, but the string is not used in @code{printf}.

Therefore @code{xgettext} adds a special tag to those messages it
thinks might be a format string. There is no absolute rule for this,
only a heuristic. In the @file{.po} file the entry is marked using the
@code{c-format} flag in the @kbd{#,} comment line (@pxref{PO Files}).

The careful reader now might say that this again can cause problems.
The heuristic might guess wrong. This is true, and therefore
@code{xgettext} knows about a special kind of comment which lets
the programmer take over the decision. If, on the same line as or
on the line immediately preceding the @code{gettext} keyword,
the @code{xgettext} program finds a comment containing the words
@kbd{xgettext:c-format}, it will mark the string in any case with
the @kbd{c-format} flag. This kind of comment should be used when
@code{xgettext} does not recognize the string as a format string but
it really is one and it should be tested. Please note that when the
comment is on the same line as the @code{gettext} keyword, it must be
before the string to be translated.

This situation happens quite often. The @code{printf} function is often
called with strings which do not contain a format specifier. Of course
one would normally use @code{fputs}, but it does happen. In this case
@code{xgettext} does not recognize this as a format string, but what
happens if the translation introduces a valid format specifier? The
@code{printf} function will try to access one of the parameters, but none
exists because the original code does not refer to any parameter.

@code{xgettext} of course could make a wrong decision the other way
round, i.e.@: a string marked as a format string actually is not a format
string. In this case @code{msgfmt} might give too many warnings and
would prevent translating the @file{.po} file. The method to prevent
this wrong decision is similar to the one used above, only the comment
to use must contain the string @kbd{xgettext:no-c-format}.
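
Both comment forms might be used as in the sketch below. The two
functions and their messages are invented for illustration, and whether
the heuristic would really misjudge these particular strings is an
assumption.

```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write a translated notice to STREAM.  The
   string holds no specifier, so the heuristic would presumably not
   flag it; the comment asks for the c-format flag anyway, so that a
   translation introducing a specifier gets noticed.  */
int
print_notice (FILE *stream)
{
  assert (stream != NULL);
  /* xgettext:c-format */
  return fprintf (stream, gettext ("Operation finished\n"));
}

/* Hypothetical helper: return a label containing a literal percent
   sign.  It is never used as a format, so the comment switches
   format checking off for it.  */
const char *
progress_label (void)
{
  /* xgettext:no-c-format */
  return gettext ("100% complete");
}
```

In both functions the comment sits on the line immediately preceding
the @code{gettext} keyword, which is one of the two placements
described above.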

If a string is marked with @kbd{c-format} and this is not correct the
user can find out who is responsible for the decision. See
@ref{xgettext Invocation} to see how the @kbd{--debug} option can be
used for solving this problem.

@node Special cases, , c-format, Sources
@section Special Cases of Translatable Strings

The attentive reader might now point out that it is not always possible
to mark translatable strings with @code{gettext} or something like this.
Consider the following case:

@example
@group
@{
  static const char *messages[] = @{
    "some very meaningful message",
    "and another one"
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? "a default message" : messages[index];

  fputs (string, stdout);
  @dots{}
@}
@end group
@end example

While it is no problem to mark the string @code{"a default message"} it
is not possible to mark the string initializers for @code{messages}.
What is to be done? We have to fulfill two tasks. First we have to mark the
strings so that the @code{xgettext} program (@pxref{xgettext Invocation})
can find them, and second we have to translate the strings at runtime
before printing them.

The first task can be fulfilled by creating a new keyword, which names a
no-op. For the second we have to mark all access points to a string
from the array. So one solution can look like this:

@example
@group
#define gettext_noop(String) (String)

@{
  static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? gettext ("a default message") : gettext (messages[index]);

  fputs (string, stdout);
  @dots{}
@}
@end group
@end example

Please convince yourself that the string which is written by
@code{fputs} is translated in any case. How to make @code{xgettext}
aware of the additional keyword @code{gettext_noop} is explained in
@ref{xgettext Invocation}.
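
For readers who want to compile this first solution, a self-contained
variant might look as follows. The function name @code{message_for} is
hypothetical, and the bounds test is written against the table size
rather than the literal @code{1} used in the fragment.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* Marks a string for xgettext without translating it in place.  */
#define gettext_noop(String) (String)

static const char *const messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

/* Hypothetical helper: return the translated message for INDEX, or
   a translated default when INDEX is outside the table.  */
const char *
message_for (size_t index)
{
  size_t count = sizeof messages / sizeof messages[0];

  assert (count > 0);

  /* Every path goes through gettext exactly once.  */
  return index < count ? gettext (messages[index])
                       : gettext ("a default message");
}
```

The table itself stays a constant initializer, and the translation
happens only when an entry is actually retrieved.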

The above is of course not the only solution. You could also come along
with the following one:

@example
@group
#define gettext_noop(String) (String)

@{
  static const char *messages[] = @{
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  @};
  const char *string;
  @dots{}
  string
    = index > 1 ? gettext_noop ("a default message") : messages[index];

  fputs (gettext (string), stdout);
  @dots{}
@}
@end group
@end example

But this has some drawbacks. First, the programmer has to take care
to use @code{gettext_noop} for the string @code{"a default message"}.
A use of @code{gettext} could in rare cases have unpredictable results.
Second, the internals of the GNU @code{gettext} library make this
solution less efficient.

One advantage is that you need not perform control flow analysis to make
sure the output is really translated in any case. But this analysis is
generally not very difficult. Should it prove difficult in some
situation, you can use this second method there.

@node Template, Creating, Sources, Top
@chapter Making the PO Template File

After preparing the sources, the programmer creates a PO template file.
This section explains how to use @code{xgettext} for this purpose.

@c FIXME: Rewrite.

@menu
* xgettext Invocation::         Invoking the @code{xgettext} Program
@end menu

@node xgettext Invocation, , Template, Template
@section Invoking the @code{xgettext} Program

@c FIXME: Rewrite.

@example
xgettext [@var{option}] @var{inputfile} @dots{}
@end example

@table @samp
@item -a
@itemx --extract-all
Extract all strings.

@item -c [@var{tag}]
@itemx --add-comments[=@var{tag}]
Place comment block with @var{tag} (or those preceding keyword lines)
in output file.

@item -C
@itemx --c++
Recognize C++ style comments.

@item --debug
Use the flags @kbd{c-format} and @kbd{possible-c-format} to show who was
responsible for marking a message as a format string. The latter form is
used if the @code{xgettext} program decided, the former form is used if
the programmer prescribed it.

By default only the @kbd{c-format} form is used. The translator should
not have to care about these details.

@item -d @var{name}
@itemx --default-domain=@var{name}
Use @file{@var{name}.po} for output (instead of @file{messages.po}).

The special domain name @file{-} or @file{/dev/stdout} means to write
the output to @file{stdout}.

@item -D @var{directory}
@itemx --directory=@var{directory}
Change to @var{directory} before beginning to search and scan source
files. The resulting @file{.po} file will be written relative to the
original directory, though.

@item -f @var{file}
@itemx --files-from=@var{file}
Read the names of the input files from @var{file} instead of getting
them from the command line.

@item --force
Always write an output file even if no message is defined.

@item -h
@itemx --help
Display this help and exit.

@item -I @var{list}
@itemx --input-path=@var{list}
List of directories searched for input files.

@item -j
@itemx --join-existing
Join messages with existing file.

@item -k [@var{keywordspec}]
@itemx --keyword[=@var{keywordspec}]
Additional keyword to be looked for (without @var{keywordspec} means not to
use default keywords).

If @var{keywordspec} is a C identifier @var{id}, @code{xgettext} looks
for strings in the first argument of each call to the function or macro
@var{id}. If @var{keywordspec} is of the form
@samp{@var{id}:@var{argnum}}, @code{xgettext} looks for strings in the
@var{argnum}th argument of the call. If @var{keywordspec} is of the form
@samp{@var{id}:@var{argnum1},@var{argnum2}}, @code{xgettext} looks for
strings in the @var{argnum1}st argument and in the @var{argnum2}nd argument
of the call, and treats them as singular/plural variants for a message
with plural handling.

The default keyword specifications, which are always looked for if not
explicitly disabled, are @code{gettext}, @code{dgettext:2},
@code{dcgettext:2}, @code{ngettext:1,2}, @code{dngettext:2,3},
@code{dcngettext:2,3}, and @code{gettext_noop}.

1940

1941

@item -m [@var{string}]

1942

@itemx --msgstr-prefix[=@var{string}]

1943

Use @var{string} or "" as prefix for msgstr entries.

1944

1945

@item -M [@var{string}]

1946

@itemx --msgstr-suffix[=@var{string}]

1947

Use @var{string} or "" as suffix for msgstr entries.

1948

1949

@item --no-location

1950

Do not write @samp{#: @var{filename}:@var{line}} lines.

1951

1952

@item -n

1953

@itemx --add-location

1954

Generate @samp{#: @var{filename}:@var{line}} lines (default).

1955

1956

@item --omit-header

1957

Don't write header with @samp{msgid ""} entry.

1958

1959

This is useful for testing purposes because it eliminates a source

1960

of variance for generated @code{.gmo} files. We can ship some of

1961

these files in the GNU @code{gettext} package, and the result of

1962

regenerating them through @code{msgfmt} should yield the same values.

1963

1964

@item -p @var{dir}

1965

@itemx --output-dir=@var{dir}

1966

Output files will be placed in directory @var{dir}.

1967

1968

@item -s

1969

@itemx --sort-output

1970

Generate sorted output and remove duplicates.

1971

1972

@item --strict

1973

Write out a strict Uniforum conforming PO file.

1974

1975

@item -v

1976

@itemx --version

1977

Output version information and exit.

1978

1979

@item -x @var{file}

1980

@itemx --exclude-file=@var{file}

1981

Entries from @var{file} are not extracted.

1982

1983

@end table
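
The three @var{keywordspec} forms accepted by @samp{--keyword} can be
illustrated with a small parser. The following Python sketch is only an
illustration and is not the code @code{xgettext} uses; the names
@code{KeywordSpec} and @code{parse_keywordspec} are hypothetical.

```python
from typing import NamedTuple, Optional


class KeywordSpec(NamedTuple):
    """One parsed keyword specification (illustrative type, not part of gettext)."""
    name: str
    singular_arg: int            # 1-based position of the msgid argument
    plural_arg: Optional[int]    # 1-based position of the plural argument, if any


def parse_keywordspec(spec: str) -> KeywordSpec:
    """Parse 'id', 'id:argnum' or 'id:argnum1,argnum2'."""
    name, sep, args = spec.partition(":")
    if not name:
        raise ValueError("empty keyword name in %r" % spec)
    if not sep:
        # Bare identifier: the string is the first argument of the call.
        return KeywordSpec(name, 1, None)
    numbers = [int(part) for part in args.split(",")]
    if len(numbers) == 1:
        return KeywordSpec(name, numbers[0], None)
    if len(numbers) == 2:
        # Two numbers: singular and plural variants of one message.
        return KeywordSpec(name, numbers[0], numbers[1])
    raise ValueError("too many argument numbers in %r" % spec)


# The default keyword specifications listed in the option description.
DEFAULT_KEYWORDS = [
    "gettext", "dgettext:2", "dcgettext:2", "ngettext:1,2",
    "dngettext:2,3", "dcngettext:2,3", "gettext_noop",
]

for spec in DEFAULT_KEYWORDS:
    print(parse_keywordspec(spec))
```

Read this way, @code{dngettext:2,3} means that the second argument holds
the singular string and the third argument holds the plural string.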

1984

1985

Search path for supplementary PO files is:

1986

@file{/usr/local/share/nls/src/}.

1987

1988

If @var{inputfile} is @samp{-}, standard input is read.

1989

1990

This implementation of @code{xgettext} is able to process a few awkward

1991

cases, like strings in preprocessor macros, ANSI concatenation of

1992

adjacent strings, and escaped end of lines for continued strings.

1993

1994

@node Creating, Updating, Template, Top

1995

@chapter Creating a New PO File

1996

1997

When starting a new translation, the translator copies the

1998

@file{@var{package}.pot} template file to a file called

1999

@file{@var{LANG}.po}. Then she modifies the initial comments and

2000

the header entry of this file.

2001

2002

The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and

2003

"FIRST AUTHOR <EMAIL@@ADDRESS>, YEAR" ought to be replaced by sensible

2004

information. This can be done in any text editor; if Emacs is used

2005

and it switched to PO mode automatically (because it has recognized

2006

the file's suffix), you can disable it by typing @kbd{M-x fundamental-mode}.

2007

2008

Modifying the header entry can already be done using PO mode: in Emacs,

2009

type @kbd{M-x po-mode RET} and then @kbd{RET} again to start editing the

2010

entry. You should fill in the following fields.

2011

2012

@table @asis

2013

@item Project-Id-Version

2014

This is the name and version of the package.

2015

2016

@item POT-Creation-Date

2017

This has already been filled in by @code{xgettext}.

2018

2019

@item PO-Revision-Date

2020

You don't need to fill this in. It will be filled by the Emacs PO mode

2021

when you save the file.

2022

2023

@item Last-Translator

2024

Fill in your name and email address (without double quotes).

2025

2026

@item Language-Team

2027

Fill in the English name of the language, and the email address of the

2028

language team you are part of.

2029

2030

Before starting a translation, it is a good idea to get in touch with

2031

your translation team, not only to make sure you don't do duplicated work,

2032

but also to coordinate difficult linguistic issues.

2033

2034

In the Free Translation Project, each translation team has its own mailing

2035

list. The up-to-date list of teams can be found at the Free Translation

2036

Project's homepage, @file{http://www.iro.umontreal.ca/contrib/po/HTML/},

2037

in the "National teams" area.

2038

2039

@item Content-Type

2040

Replace @samp{CHARSET} with the character encoding used for your language,

2041

in your locale, or UTF-8. This field is needed for correct operation of the

2042

@code{msgmerge} and @code{msgfmt} programs, as well as for users whose

2043

locale's character encoding differs from yours (see @ref{Charset conversion}).

2044

2045

You get the character encoding of your locale by running the shell command

2046

@samp{locale charmap}. If the result is @samp{C} or @samp{ANSI_X3.4-1968},

2047

which is equivalent to @samp{ASCII} (= @samp{US-ASCII}), it means that your

2048

locale is not correctly configured. In this case, ask your translation

2049

team which charset to use. @samp{ASCII} is not usable for any language

2050

except Latin.

2051

2052

Because the PO files must be portable to operating systems with less advanced

2053

internationalization facilities, the character encodings that can be used

2054

are limited to those supported by both GNU @code{libc} and GNU

2055

@code{libiconv}. These are:

2056

@code{ASCII}, @code{ISO-8859-1}, @code{ISO-8859-2}, @code{ISO-8859-3},

2057

@code{ISO-8859-4}, @code{ISO-8859-5}, @code{ISO-8859-6}, @code{ISO-8859-7},

2058

@code{ISO-8859-8}, @code{ISO-8859-9}, @code{ISO-8859-13}, @code{ISO-8859-15},

2059

@code{KOI8-R}, @code{KOI8-U}, @code{CP850}, @code{CP866}, @code{CP874},

2060

@code{CP932}, @code{CP949}, @code{CP950}, @code{CP1250}, @code{CP1251},

2061

@code{CP1252}, @code{CP1253}, @code{CP1254}, @code{CP1255}, @code{CP1256},

2062

@code{CP1257}, @code{GB2312}, @code{EUC-JP}, @code{EUC-KR}, @code{EUC-TW},

2063

@code{BIG5}, @code{BIG5-HKSCS}, @code{GBK}, @code{GB18030}, @code{SHIFT_JIS},

2064

@code{JOHAB}, @code{TIS-620}, @code{VISCII}, @code{UTF-8}.

2065

2066

@c This data is taken from glibc/localedata/SUPPORTED.

2067

In the GNU system, the following encodings are frequently used for the

2068

corresponding languages.

2069

2070

@itemize

2071

@item @code{ISO-8859-1} for

2072

Afrikaans, Albanian, Basque, Catalan, Dutch, English, Estonian, Faroese,

2073

Finnish, French, Galician, German, Greenlandic, Icelandic, Indonesian,

2074

Irish, Italian, Malay, Norwegian, Portuguese, Spanish, Swedish,

2075

@item @code{ISO-8859-2} for

2076

Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian,

2077

@item @code{ISO-8859-3} for Maltese,

2078

@item @code{ISO-8859-5} for Macedonian, Serbian,

2079

@item @code{ISO-8859-6} for Arabic,

2080

@item @code{ISO-8859-7} for Greek,

2081

@item @code{ISO-8859-8} for Hebrew,

2082

@item @code{ISO-8859-9} for Turkish,

2083

@item @code{ISO-8859-13} for Latvian, Lithuanian,

2084

@item @code{ISO-8859-15} for

2085

Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,

2086

Italian, Portuguese, Spanish, Swedish,

2087

@item @code{KOI8-R} for Russian,

2088

@item @code{KOI8-U} for Ukrainian,

2089

@item @code{CP1251} for Bulgarian, Byelorussian,

2090

@item @code{GB2312}, @code{GBK}, @code{GB18030}

2091

for simplified writing of Chinese,

2092

@item @code{BIG5}, @code{BIG5-HKSCS}

2093

for traditional writing of Chinese,

2094

@item @code{EUC-JP} for Japanese,

2095

@item @code{EUC-KR} for Korean,

2096

@item @code{TIS-620} for Thai,

2097

@item @code{UTF-8} for any language, including those listed above.

2098

@end itemize

2099

2100

When single quote characters or double quote characters are used in

2101

translations for your language, and your locale's encoding is one of the

2102

ISO-8859-* charsets, it is best if you create your PO files in UTF-8

2103

encoding, instead of your locale's encoding. This is because in UTF-8

2104

the real quote characters can be represented (single quote characters:

2105

U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of

2106

ISO-8859-* charsets has them all. Users in UTF-8 locales will see the

2107

real quote characters, whereas users in ISO-8859-* locales will see the

2108

vertical apostrophe and the vertical double quote instead (because that's

2109

what the character set conversion will transliterate them to).

2110

2111

To enter such quote characters under X11, you can change your keyboard

2112

mapping using the @code{xmodmap} program. The X11 names of the quote

2113

characters are "leftsinglequotemark", "rightsinglequotemark",

2114

"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",

2115

"doublelowquotemark".

2116

2117

Note that only recent versions of GNU Emacs support the UTF-8 encoding:

2118

Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't

2119

support the UTF-8 encoding.

2120

2121

The character encoding name can be written in either upper or lower case.

2122

Usually upper case is preferred.

2123

2124

@item Content-Transfer-Encoding

2125

Set this to @code{8bit}.

2126

2127

@item Plural-Forms

2128

This field is optional. It is only needed if the PO file has plural forms.

2129

You can find them by searching for the @samp{msgid_plural} keyword. The

2130

format of the plural forms field is described in @ref{Plural forms}.

2131

@end table
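
As a rough aid for the header fields described above, the following
Python sketch checks two of them: that @samp{CHARSET} has been replaced
by one of the portable encodings listed under Content-Type, and that
Content-Transfer-Encoding is @code{8bit}. It is a hypothetical helper,
not part of GNU @code{gettext}; the function names and the sample header
strings are assumptions made for illustration.

```python
import re

# Encodings listed above as supported by both GNU libc and GNU libiconv.
PORTABLE_CHARSETS = frozenset("""
    ASCII ISO-8859-1 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6
    ISO-8859-7 ISO-8859-8 ISO-8859-9 ISO-8859-13 ISO-8859-15 KOI8-R KOI8-U
    CP850 CP866 CP874 CP932 CP949 CP950 CP1250 CP1251 CP1252 CP1253 CP1254
    CP1255 CP1256 CP1257 GB2312 EUC-JP EUC-KR EUC-TW BIG5 BIG5-HKSCS GBK
    GB18030 SHIFT_JIS JOHAB TIS-620 VISCII UTF-8
""".split())


def header_fields(msgstr):
    """Split the header entry's decoded msgstr ('Name: value' lines) into a dict."""
    fields = dict()
    for line in msgstr.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def charset_of(fields):
    """Return the charset named in Content-Type, upper-cased, or '' if absent."""
    content_type = fields.get("Content-Type", "")
    match = re.search(r"charset=([^\s;]+)", content_type, re.IGNORECASE)
    return match.group(1).upper() if match else ""


def check_header(msgstr):
    """Return a list of human-readable problems found in the header."""
    fields = header_fields(msgstr)
    problems = []
    charset = charset_of(fields)
    if not charset:
        problems.append("Content-Type names no charset")
    elif charset == "CHARSET":
        problems.append("Content-Type still contains the CHARSET placeholder")
    elif charset not in PORTABLE_CHARSETS:
        problems.append("charset %r is not in the portable list" % charset)
    if fields.get("Content-Transfer-Encoding", "").lower() != "8bit":
        problems.append("Content-Transfer-Encoding should be 8bit")
    return problems


# Sample headers, written here only for illustration.
template_header = (
    "Project-Id-Version: PACKAGE VERSION\n"
    "Content-Type: text/plain; charset=CHARSET\n"
    "Content-Transfer-Encoding: ENCODING\n"
)
filled_header = (
    "Project-Id-Version: hello 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)
print(check_header(template_header))
print(check_header(filled_header))
```

The charset is upper-cased before the comparison because the encoding
name may be written in either case.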

2132

2133

@node Updating, Binaries, Creating, Top

2134

@chapter Updating Existing PO Files

2135

2136

@c FIXME: Rewrite.

2137

2138

@menu

2139

* msgmerge Invocation:: Invoking the @code{msgmerge} Program

2140

* Translated Entries:: Translated Entries

2141

* Fuzzy Entries:: Fuzzy Entries

2142

* Untranslated Entries:: Untranslated Entries

2143

* Obsolete Entries:: Obsolete Entries

2144

* Modifying Translations:: Modifying Translations

2145

* Modifying Comments:: Modifying Comments

2146

* Subedit:: Mode for Editing Translations

2147

* C Sources Context:: C Sources Context

2148

* Auxiliary:: Consulting Auxiliary PO Files

2149

* Compendium:: Using Translation Compendiums

2150

@end menu

2151

2152

@node msgmerge Invocation, Translated Entries, Updating, Updating

2153

@section Invoking the @code{msgmerge} Program

2154

2155

@c FIXME: Rewrite.

2156

2157

@c @example

2158

@c tupdate --help

2159

@c tupdate --version

2160

@c tupdate @var{new} @var{old}

2161

@c @end example

2162

2163

@c File @var{new} is the last created PO file (generally by

2164

@c @code{xgettext}). It need not contain any translations. File

2165

@c @var{old} is the PO file including the old translations which will

2166

@c be taken over to the newly created file as long as they still match.

2167

2168

@c When English messages change in the programs, this is reflected in

2169

@c the PO file as extracted by @code{xgettext}. In large messages, that

2170

@c can be hard to detect, and will obviously result in an incomplete

2171

@c translation. One of the virtues of @code{tupdate} is that it detects

2172

@c such changes, saving the previous translation into a PO file comment,

2173

@c so marking the entry as obsolete, and giving the modified string with

2174

@c an empty translation, that is, marking the entry as untranslated.

2175

2176

@node Translated Entries, Fuzzy Entries, msgmerge Invocation, Updating

2177

@section Translated Entries

2178

2179

Each PO file entry for which the @code{msgstr} field has been filled with
a translation, and which is not marked as fuzzy (@pxref{Fuzzy Entries}),
is said to be a @dfn{translated} entry. Only translated entries will
later be compiled by GNU @code{msgfmt} and become usable in programs.
Other entry types will be excluded; translation will not occur for them.

2184

2185

Some commands are more specifically related to translated entry processing.

2186

2187

@table @kbd
@item t
Find the next translated entry.

@item M-t
Find the previous translated entry.

@end table

2195

2196

The commands @kbd{t} (@code{po-next-translated-entry}) and @kbd{M-t}
(@code{po-previous-translated-entry}) move forwards or backwards, chasing
for a translated entry. If none is found, the search is extended and
wraps around in the PO file buffer.

2200

2201

Translated entries usually result from the translator having edited in
a translation for them (@pxref{Modifying Translations}). However, if the
variable @code{po-auto-fuzzy-on-edit} is not @code{nil}, an entry that
receives a new translation first becomes a fuzzy entry, which must
later be unfuzzied before it becomes a genuine translated entry.
@xref{Fuzzy Entries}.

2207

2208

@node Fuzzy Entries, Untranslated Entries, Translated Entries, Updating

2209

@section Fuzzy Entries

2210

2211

Each PO file entry may have a set of @dfn{attributes}, which are
qualities given a name and explicitly associated with the translation,
using a special system comment. One of these attributes
has the name @code{fuzzy}, and entries having this attribute are said
to have a fuzzy translation. They are called fuzzy entries, for short.

2216

2217

Fuzzy entries, even if they count as translated entries for
most other purposes, usually call for revision by the translator.
They may be produced by applying the program @code{msgmerge} to
update an older translated PO file according to a new PO template
file, when this tool hypothesises that some new @code{msgid} has
been modified only slightly from an older one, and chooses to pair
what it thinks to be the old translation with the new modified entry.
The slight alteration in the original string (the @code{msgid} string)
should often be reflected in the translated string, and this requires
the intervention of the translator. For this reason, @code{msgmerge}
might mark some entries as being fuzzy.
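
The idea of pairing a slightly modified @code{msgid} with an old
translation can be sketched in a few lines of Python. This manual does
not describe how @code{msgmerge} measures similarity, so the sketch
substitutes the standard @code{difflib} module; the function name
@code{merge_entry} and the 0.6 cutoff are assumptions chosen for
illustration.

```python
import difflib


def merge_entry(new_msgid, old_translations, cutoff=0.6):
    """Pick a translation for new_msgid from old_translations.

    old_translations maps old msgid strings to their msgstr.
    Returns (msgstr, fuzzy): fuzzy is True when the translation was
    borrowed from a merely similar msgid and needs review.
    """
    if new_msgid in old_translations:
        # Unchanged string: keep the translation as it is.
        return old_translations[new_msgid], False
    candidates = difflib.get_close_matches(
        new_msgid, list(old_translations), n=1, cutoff=cutoff)
    if candidates:
        # Slightly modified string: reuse the old translation, marked fuzzy.
        return old_translations[candidates[0]], True
    # Nothing similar enough: the entry stays untranslated.
    return "", False


old = dict([("Cannot open file %s", "Impossible d'ouvrir le fichier %s")])
print(merge_entry("Cannot open file %s", old))      # exact match, not fuzzy
print(merge_entry("Cannot open the file %s", old))  # close match, fuzzy
print(merge_entry("Out of memory", old))            # no match, untranslated
```

In the second call the borrowed translation still lacks a counterpart for
the added word, which is exactly why a fuzzy entry calls for revision.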

2228

2229

Also, the translator may decide herself to mark an entry as fuzzy

2230

for her own convenience, when she wants to remember that the entry

2231

has to be later revisited. So, some commands are more specifically

2232

related to fuzzy entry processing.

2233

2234

@table @kbd
@item f
Find the next fuzzy entry.

@item M-f
Find the previous fuzzy entry.

@item @key{TAB}
Remove the fuzzy attribute of the current entry.

@end table

2245

2246

The commands @kbd{f} (@code{po-next-fuzzy-entry}) and @kbd{M-f}
(@code{po-previous-fuzzy-entry}) move forwards or backwards, chasing for
a fuzzy entry. If none is found, the search is extended and wraps
around in the PO file buffer.

2250

2251

The command @kbd{@key{TAB}} (@code{po-unfuzzy}) removes the fuzzy
attribute associated with an entry, usually leaving it translated.
Further, if the variable @code{po-auto-select-on-unfuzzy} is not
@code{nil}, the @kbd{@key{TAB}} command will automatically chase
for another interesting entry to work on. The initial value of
@code{po-auto-select-on-unfuzzy} is @code{nil}.

2257

2258

The initial value of @code{po-auto-fuzzy-on-edit} is @code{nil}. However,
if the variable @code{po-auto-fuzzy-on-edit} is set to @code{t}, any entry
edited through the @kbd{@key{RET}} command is marked fuzzy, as a way to
ensure some kind of double check, later. In this case, the usual paradigm
is that an entry becomes fuzzy (if not already) whenever the translator
modifies it. If she is satisfied with the translation, she then uses
@kbd{@key{TAB}} to pick another entry to work on, clearing the fuzzy attribute
at the same time. If she is not satisfied yet, she merely uses @kbd{@key{SPC}}
to chase another entry, leaving the entry fuzzy.

2267

2268

The translator may also use the @kbd{@key{DEL}} command
(@code{po-fade-out-entry}) over any translated entry to mark it as being
fuzzy, as an easy way to leave a reminder that she wants to return
to this entry later.

Also, when the time comes to quit working on a PO file buffer with the @kbd{q}
command, the translator is asked for confirmation if any fuzzy entries
still exist.

2276

2277

@node Untranslated Entries, Obsolete Entries, Fuzzy Entries, Updating

2278

@section Untranslated Entries

2279

2280

When @code{xgettext} originally creates a PO file, unless told

2281

otherwise, it initializes the @code{msgid} field with the untranslated

2282

string, and leaves the @code{msgstr} string to be empty. Such entries,

2283

having an empty translation, are said to be @dfn{untranslated} entries.

2284

Later, when the programmer slightly modifies some string right in

2285

the program, this change is later reflected in the PO file

2286

by the appearance of a new untranslated entry for the modified string.

2287

2288

The usual commands moving from entry to entry consider untranslated

2289

entries on the same level as active entries. Untranslated entries

2290

are easily recognizable by the fact they end with @w{@samp{msgstr ""}}.

2291

2292

The work of the translator might be (quite naively) seen as the process

2293

of seeking for an untranslated entry, editing a translation for

2294

it, and repeating these actions until no untranslated entries remain.

2295

Some commands are more specifically related to untranslated entry

2296

processing.

2297

2298

@table @kbd
@item u
Find the next untranslated entry.

@item M-u
Find the previous untranslated entry.

@item k
Turn the current entry into an untranslated one.

@end table

2309

2310

The commands @kbd{u} (@code{po-next-untranslated-entry}) and @kbd{M-u}
(@code{po-previous-untranslated-entry}) move forwards or backwards,
chasing for an untranslated entry. If none is found, the search is
extended and wraps around in the PO file buffer.

2314

2315

An entry can be turned back into an untranslated entry by

2316

merely emptying its translation, using the command @kbd{k}

2317

(@code{po-kill-msgstr}). @xref{Modifying Translations}.

2318

2319

Also, when the time comes to quit working on a PO file buffer
with the @kbd{q} command, the translator is asked for confirmation
if any untranslated entries still exist.

2322

2323

@node Obsolete Entries, Modifying Translations, Untranslated Entries, Updating

2324

@section Obsolete Entries

2325

2326

By @dfn{obsolete} PO file entries, we mean those entries which are

2327

commented out, usually by @code{msgmerge} when it found that the

2328

translation is not needed anymore by the package being localized.

2329

2330

The usual commands moving from entry to entry consider obsolete

2331

entries on the same level as active entries. Obsolete entries are

2332

easily recognizable by the fact that all their lines start with

2333

@kbd{#}, even those lines containing @code{msgid} or @code{msgstr}.
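
The four kinds of entries described so far (translated, fuzzy,
untranslated and obsolete) can be told apart mechanically from the text
of an entry. The following Python sketch is a simplified illustration,
not the logic used by PO mode. It assumes that each entry is passed as
its own block of lines and that the fuzzy attribute appears as a flag in
a @samp{#,} comment; the function name @code{classify_entry} is
hypothetical.

```python
def classify_entry(entry_text):
    """Return 'obsolete', 'fuzzy', 'untranslated' or 'translated'."""
    lines = [line for line in entry_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty entry")
    # Obsolete: every line is commented out, msgid and msgstr included.
    if all(line.startswith("#") for line in lines):
        return "obsolete"
    # Fuzzy: a flag comment (assumed to start with "#,") carries "fuzzy".
    for line in lines:
        if line.startswith("#,"):
            flags = [flag.strip() for flag in line[2:].split(",")]
            if "fuzzy" in flags:
                return "fuzzy"
    # Untranslated: the entry ends with an empty msgstr.
    if lines[-1].strip() == 'msgstr ""':
        return "untranslated"
    return "translated"


samples = [
    'msgid "Hello"\nmsgstr "Bonjour"',
    '#, fuzzy\nmsgid "Hello"\nmsgstr "Bonjour"',
    '#: hello.c:10\nmsgid "Hello"\nmsgstr ""',
    '#~ msgid "Hello"\n#~ msgstr "Bonjour"',
]
for sample in samples:
    print(classify_entry(sample))
```

A multi-line translation also begins with @samp{msgstr ""}, but it is
followed by continuation lines, so the test on the last line does not
mistake it for an untranslated entry.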

2334

2335

Commands exist for emptying the translation or reinitializing it

2336

to the original untranslated string. Commands interfacing with the

2337

kill ring may force some previously saved text into the translation.

2338

The user may interactively edit the translation. All these commands

2339

may apply to obsolete entries, carefully leaving the entry obsolete

2340

after the fact.

2341

2342

Moreover, some commands are more specifically related to obsolete

2343

entry processing.

2344

2345

@table @kbd
@item o
Find the next obsolete entry.

@item M-o
Find the previous obsolete entry.

@item @key{DEL}
Make an active entry obsolete, or zap out an obsolete entry.

@end table

2356

2357

The commands @kbd{o} (@code{po-next-obsolete-entry}) and @kbd{M-o}

2358

(@code{po-previous-obsolete-entry}) move forwards or backwards,

2359

chasing for an obsolete entry. If none is found, the search is

2360

extended and wraps around in the PO file buffer.

2361

2362

PO mode does not provide ways for un-commenting an obsolete entry

2363

and making it active, because this would reintroduce an original

2364

untranslated string which does not correspond to any marked string

2365

in the program sources. This goes with the philosophy of never

2366

introducing useless @code{msgid} values.

2367

2368

However, it is possible to comment out an active entry, so making

2369

it obsolete. GNU @code{gettext} utilities will later react to the

2370

disappearance of a translation by using the untranslated string.

2371

The command @kbd{@key{DEL}} (@code{po-fade-out-entry}) pushes the current entry

2372

a little further towards annihilation. If the entry is active (it is a

2373

translated entry), then it is first made fuzzy. If it is already fuzzy,

2374

then the entry is merely commented out, with confirmation. If the entry

2375

is already obsolete, then it is completely deleted from the PO file.

2376

It is easy to recycle the translation so deleted into some other PO file

2377

entry, usually one which is untranslated. @xref{Modifying Translations}.
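
The successive effects of @kbd{@key{DEL}} amount to a small state
machine, sketched below in Python. The state names and the function
@code{fade_out} are made up for this illustration. The text does not say
what @kbd{@key{DEL}} does to an untranslated entry, so the sketch rejects
that case rather than guess.

```python
def fade_out(state, confirm=lambda: True):
    """Return the state an entry reaches after one DEL keystroke."""
    if state == "translated":
        return "fuzzy"
    if state == "fuzzy":
        # Commenting out asks for confirmation; a refusal leaves the entry alone.
        return "obsolete" if confirm() else "fuzzy"
    if state == "obsolete":
        return "deleted"
    raise ValueError("no transition described for state %r" % state)


state = "translated"
history = [state]
while state != "deleted":
    state = fade_out(state)
    history.append(state)
print(" -> ".join(history))   # translated -> fuzzy -> obsolete -> deleted
```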

2378

2379

Here is a quite interesting problem to solve for later development of

2380

PO mode, for those nights you are not sleepy. The idea would be that

2381

PO mode might become bright enough, one of these days, to make good

2382

guesses at retrieving the most probable candidate, among all obsolete

2383

entries, for initializing the translation of a newly appeared string.

2384

I think it might be a quite hard problem to do this algorithmically, as

2385

we have to develop good and efficient measures of string similarity.

2386

Right now, PO mode leaves the decision entirely to the translator
when the time comes to find the adequate obsolete translation; it
merely tries to provide handy tools for helping her to do so.

2389

2390

@node Modifying Translations, Modifying Comments, Obsolete Entries, Updating

2391

@section Modifying Translations

2392

2393

PO mode prevents direct editing of the PO file by the usual
means Emacs gives for altering a buffer's contents. By doing so,
it aims to help the translator avoid little clerical errors
about the overall file format, or the proper quoting of strings,
as those errors would be easily made. Other kinds of errors are
still possible, but some may be caught and diagnosed by the batch
validation process, which the translator may always trigger by the
@kbd{V} command. For all other errors, the translator has to rely on
her own judgment, and also on the linguistic reports submitted to her
by the users of the translated package, having the same mother tongue.

2403

2404

When the time comes to create a translation, or to correct an error
diagnosed mechanically or reported by a user, the translator uses
the following commands for modifying the translations.

2407

2408

@table @kbd
@item @key{RET}
Interactively edit the translation.

@item @key{LFD}
Reinitialize the translation with the original, untranslated string.

@item k
Save the translation on the kill ring, and delete it.

@item w
Save the translation on the kill ring, without deleting it.

@item y
Replace the translation, taking the new one from the kill ring.

@end table

2425

2426

The command @kbd{@key{RET}} (@code{po-edit-msgstr}) opens a new Emacs
window meant to edit in a new translation, or to modify an already existing
translation. The new window contains a copy of the translation taken from
the current PO file entry, ready for editing, stripped of all quoting
marks, fully modifiable and with the complete extent of Emacs modifying
commands. When the translator is done with her modifications, she may use
@w{@kbd{C-c C-c}} to close the subedit window with the automatically requoted
results, or @w{@kbd{C-c C-k}} to abort her modifications. @xref{Subedit},
for more information.

2435

2436

The command @kbd{@key{LFD}} (@code{po-msgid-to-msgstr}) initializes, or

2437

reinitializes the translation with the original string. This command is

2438

normally used when the translator wants to redo a fresh translation of

2439

the original string, disregarding any previous work.

2440

2441

It is possible to arrange for the @kbd{@key{LFD}} command to be executed
automatically whenever an untranslated entry is edited. If you set
@code{po-auto-edit-with-msgid} to @code{t}, the translation gets
initialized with the original string, in case none exists already.
The default value for @code{po-auto-edit-with-msgid} is @code{nil}.

2446

2447

In fact, whether it is best to start a translation with an empty

2448

string, or rather with a copy of the original string, is a matter of

2449

taste or habit. Sometimes, the source language and the

2450

target language are so different that it is simply best to start writing

2451

on an empty page. At other times, the source and target languages

2452

are so close that it would be a waste to retype a number of words

2453

already being written in the original string. A translator may also

2454

like having the original string right under her eyes, as she will

2455

progressively overwrite the original text with the translation, even

2456

if this requires some extra editing work to get rid of the original.

2457

2458

The command @kbd{k} (@code{po-kill-msgstr}) merely empties the
translation string, so turning the entry into an untranslated
one. But while doing so, its previous contents are set aside in
a special place, known as the kill ring. The command @kbd{w}
(@code{po-kill-ring-save-msgstr}) also takes a
copy of the translation onto the kill ring, but it otherwise leaves
the entry alone, and does @emph{not} remove the translation from the
entry. Both commands use the Emacs kill ring itself, which is shared
between buffers, and which is well known already to Emacs lovers.

2467

2468

The translator may use @kbd{k} or @kbd{w} many times in the course

2469

of her work, as the kill ring may hold several saved translations.

2470

From the kill ring, strings may later be reinserted in various

2471

Emacs buffers. In particular, the kill ring may be used for moving

2472

translation strings between different entries of a single PO file

2473

buffer, or if the translator is handling many such buffers at once,

2474

even between PO files.

2475

2476

To facilitate exchanges with buffers which are not in PO mode, the

2477

translation string put on the kill ring by the @kbd{k} command is fully

2478

unquoted before being saved: external quotes are removed, multi-line

2479

strings are concatenated, and backslash escaped sequences are turned

2480

into their corresponding characters. In the special case of obsolete

2481

entries, the translation is also uncommented prior to saving.
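
The unquoting steps just listed can be sketched in Python as follows.
This is an illustration only, not the Emacs Lisp code of PO mode. The
function name @code{unquote_po_string} is hypothetical, only a few common
backslash escapes are handled, and obsolete lines are assumed to be
prefixed with @samp{#} or @samp{#~}.

```python
import re

# Backslash escapes handled by this sketch; others are passed through.
_ESCAPES = dict([("n", "\n"), ("t", "\t"), ('"', '"'), ("\\", "\\")])


def _unescape(match):
    char = match.group(1)
    return _ESCAPES.get(char, char)


def unquote_po_string(raw, obsolete=False):
    """Turn the raw lines of a msgstr field into the plain string they encode."""
    pieces = []
    for line in raw.splitlines():
        line = line.strip()
        if obsolete:
            # Uncomment first: drop the leading comment marker.
            line = line.lstrip("#~").strip()
        if line.startswith("msgstr"):
            line = line[len("msgstr"):].strip()
        # Remove the external quotes of each line.
        if len(line) >= 2 and line[0] == '"' and line[-1] == '"':
            pieces.append(line[1:-1])
    # Concatenate multi-line strings, then decode backslash escapes.
    return re.sub(r"\\(.)", _unescape, "".join(pieces))


raw = 'msgstr ""\n"Hello, \\"world\\"\\n"\n"second line\\n"'
print(repr(unquote_po_string(raw)))
```

Escapes are decoded only after the lines have been joined, so that an
escaped backslash followed by @samp{n} is not mistaken for a newline.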

2482

2483

The command @kbd{y} (@code{po-yank-msgstr}) completely replaces the

2484

translation of the current entry by a string taken from the kill ring.

2485

Following Emacs terminology, we then say that the replacement

2486

string is @dfn{yanked} into the PO file buffer.

2487

@xref{Yanking, , , emacs, The Emacs Editor}.

2488

The first time @kbd{y} is used, the translation receives the value of

2489

the most recent addition to the kill ring. If @kbd{y} is typed once

2490

again, immediately, without intervening keystrokes, the translation

2491

just inserted is taken away and replaced by the second most recent

2492

addition to the kill ring. By repeating @kbd{y} many times in a row,

2493

the translator may travel along the kill ring for saved strings,

2494

until she finds the string she really wanted.
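
The behavior of repeated @kbd{y} commands can be modelled with a toy
kill ring, sketched here in Python. The class @code{KillRing} and its
method names are hypothetical. Two details are assumptions rather than
statements from this manual: that the ring wraps around after its oldest
entry, and that a new run of @kbd{y} starts again from the most recent
addition.

```python
class KillRing:
    """Toy model of the kill ring as used by the k, w and y commands."""

    def __init__(self):
        self._entries = []        # most recent addition first
        self._position = None     # index of the last yanked entry, if any

    def save(self, text):
        """Record text, as k or w would; this ends any run of y commands."""
        self._entries.insert(0, text)
        self._position = None

    def interrupt(self):
        """Model any keystroke other than y, which ends a run of y commands."""
        self._position = None

    def yank(self):
        """Return the string that y would put into the translation."""
        if not self._entries:
            raise IndexError("kill ring is empty")
        if self._position is None:
            self._position = 0
        else:
            # Assumption: after the oldest entry, the ring wraps to the newest.
            self._position = (self._position + 1) % len(self._entries)
        return self._entries[self._position]


ring = KillRing()
for text in ["premier", "second", "dernier"]:
    ring.save(text)

print(ring.yank())   # most recent addition
print(ring.yank())   # second most recent addition
ring.interrupt()
print(ring.yank())   # most recent again, because the run of y was broken
```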

2495

2496

When a string is yanked into a PO file entry, it is fully and
automatically requoted for complying with the format PO files should
have. Further, if the entry is obsolete, PO mode then appropriately
pushes the inserted string inside comments. Once again, translators
should not burden themselves with quoting considerations; their only
concern is, of course, that the translated string itself be suitable
for the program using it.
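
Requoting is the reverse of the unquoting done when a string is killed.
The Python sketch below shows one plausible way to do it; it is not the
code of PO mode. The function name @code{quote_po_string} is
hypothetical, the split after each newline is a layout choice made for
this example, and the @samp{#~} prefix for obsolete entries is an
assumption.

```python
def _escape(chunk):
    """Escape backslashes first, then quotes, tabs and newlines."""
    return (chunk.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\t", "\\t").replace("\n", "\\n"))


def quote_po_string(text, obsolete=False):
    """Return the lines of a msgstr field holding text."""
    parts = text.split("\n")
    # One chunk per line of text, each keeping its trailing newline.
    chunks = [part + "\n" for part in parts[:-1]]
    if parts[-1]:
        chunks.append(parts[-1])
    if len(chunks) <= 1:
        lines = ['msgstr "%s"' % _escape(text)]
    else:
        # Multi-line strings start with an empty string on the msgstr line.
        lines = ['msgstr ""'] + ['"%s"' % _escape(chunk) for chunk in chunks]
    if obsolete:
        # Assumption: obsolete entries carry a "#~ " prefix on every line.
        lines = ["#~ " + line for line in lines]
    return "\n".join(lines)


print(quote_po_string('Say "hi"\n'))
print(quote_po_string("first line\nsecond line\n"))
print(quote_po_string("old translation", obsolete=True))
```

Backslashes are escaped before anything else so that the backslashes
introduced for quotes and newlines are not escaped a second time.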

2503

2504

Note that @kbd{k} or @kbd{w} are not the only commands pushing strings

2505

on the kill ring, as almost any PO mode command replacing translation

2506

strings (or the translator comments) automatically saves the old string

2507

on the kill ring. The main exceptions to this general rule are the

2508

yanking commands themselves.

2509

2510

To better illustrate the operation of killing and yanking, let's

2511

use an actual example, taken from a common situation. When the

2512

programmer slightly modifies some string right in the program, his

2513

change is later reflected in the PO file by the appearance

2514

of a new untranslated entry for the modified string, and the fact

2515

that the entry translating the original or unmodified string becomes

2516

obsolete. In many cases, the translator might spare herself some work

2517

by retrieving the unmodified translation from the obsolete entry,

2518

then initializing the untranslated entry @code{msgstr} field with

2519

this retrieved translation. Once this is done, the obsolete entry is
not wanted anymore, and may be safely deleted.

2521

2522

When the translator finds an untranslated entry and suspects that a

2523

slight variant of the translation exists, she immediately uses @kbd{m}

2524

to mark the current entry location, then starts chasing obsolete

2525

entries with @kbd{o}, hoping to find some translation corresponding

2526

to the unmodified string. Once found, she uses the @kbd{@key{DEL}} command

2527

for deleting the obsolete entry, knowing that @kbd{@key{DEL}} also @emph{kills}

2528

the translation, that is, pushes the translation on the kill ring.

2529

Then, @kbd{r} returns to the initial untranslated entry, and @kbd{y}

2530

then @emph{yanks} the saved translation right into the @code{msgstr}

2531

field. The translator is then free to use @kbd{@key{RET}} for fine

2532

tuning the translation contents, and maybe to later use @kbd{u},

2533

then @kbd{m} again, for going on with the next untranslated string.

2534

2535

When some sequence of keys has to be typed over and over again, the

2536

translator may find it useful to become better acquainted with the Emacs

2537

capability of learning these sequences and playing them back under request.

2538

@xref{Keyboard Macros, , , emacs, The Emacs Editor}.

2539

2540

@node Modifying Comments, Subedit, Modifying Translations, Updating

2541

@section Modifying Comments

2542

2543

Any translation work done seriously will raise many linguistic

2544

difficulties, for which decisions have to be made, and the choices

2545

further documented. These documents may be saved within the

2546

PO file in form of translator comments, which the translator

2547

is free to create, delete, or modify at will. These comments may

2548

be useful to herself when she returns to this PO file after a while.

2549

2550

Comments not having whitespace after the initial @samp{#}, for example,
those beginning with @samp{#.} or @samp{#:}, are @emph{not} translator
comments; they are exclusively created by other @code{gettext} tools.
So, the commands below will never alter such system added comments,
as they are not meant for the translator to modify. @xref{PO Files}.
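
The rule that separates translator comments from system added comments
is simple enough to state as code. In the Python sketch below, the
function name @code{is_translator_comment} is hypothetical, and treating
a bare @samp{#} line as an empty translator comment is an assumption.

```python
def is_translator_comment(line):
    """True if line is a comment with whitespace right after the initial '#'."""
    if not line.startswith("#"):
        return False
    rest = line[1:]
    # Assumption: a bare "#" counts as an (empty) translator comment.
    return rest == "" or rest[0].isspace()


for line in ["# Check the gender here.", "#. From the programmer",
             "#: hello.c:10", "#, fuzzy", 'msgid "Hello"']:
    print("%-28s %s" % (line, is_translator_comment(line)))
```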

2555

2556

The following commands are somewhat similar to those modifying translations,

2557

so the general indications given for those apply here. @xref{Modifying

2558

Translations}.

2559

2560

@table @kbd

@item #
Interactively edit the translator comments.

@item K
Save the translator comments on the kill ring, and delete them.

@item W
Save the translator comments on the kill ring, without deleting them.

@item Y
Replace the translator comments, taking the new ones from the kill ring.

@end table

2575

2576

These commands parallel PO mode commands for modifying the translation
strings, and behave much the same way as they do, except that they handle
the part of PO file comments meant for translator use, rather
than the translation strings. So, if the descriptions given below are
slightly succinct, it is because the full details have already been given.
@xref{Modifying Translations}.

The command @kbd{#} (@code{po-edit-comment}) opens a new Emacs window
containing a copy of the translator comments on the current PO file entry.
If there are no such comments, PO mode understands that the translator wants
to add a comment to the entry, and she is presented with an empty screen.
Comment marks (@kbd{#}) and the space following them are automatically
removed before editing, and reinstated after. For translator comments
pertaining to obsolete entries, the uncommenting and recommenting operations
are done twice. Once in the editing window, the keys @w{@kbd{C-c C-c}}
allow the translator to signal that she has finished editing the comment.
@xref{Subedit}, for further details.

Functions found on @code{po-subedit-mode-hook}, if any, are executed after
the string has been inserted in the edit buffer.

The command @kbd{K} (@code{po-kill-comment}) gets rid of all
translator comments, while saving those comments on the kill ring.
The command @kbd{W} (@code{po-kill-ring-save-comment}) places
a copy of the translator comments on the kill ring, but leaves
them undisturbed in the current entry. The command @kbd{Y}
(@code{po-yank-comment}) completely replaces the translator comments
by a string taken at the front of the kill ring. When this command
is immediately repeated, the comments just inserted are withdrawn,
and replaced by other strings taken along the kill ring.

On the kill ring, all strings have the same nature. There is no
distinction between @emph{translation} strings and @emph{translator
comments} strings. So, for example, let's presume the translator
has just finished editing a translation, and wants to create a new
translator comment to document why the previous translation was
not good, just to remember what the problem was. Foreseeing that she
will do that in her documentation, the translator may want to quote
the previous translation in her translator comments. To do so, she
may initialize the translator comments with the previous translation,
still at the head of the kill ring. Because editing already pushed the
previous translation on the kill ring, she merely has to type @kbd{M-w}
prior to @kbd{#}, and the previous translation will be right there,
all ready for being introduced by some explanatory text.

On the other hand, presume there are some translator comments already
and that the translator wants to add to those comments, instead
of wholly replacing them. Then, she should edit the comment right
away with @kbd{#}. Once inside the editing window, she can use the
regular Emacs commands @kbd{C-y} (@code{yank}) and @kbd{M-y}
(@code{yank-pop}) to get the previous translation where she likes.

@node Subedit, C Sources Context, Modifying Comments, Updating
@section Details of Sub Edition

The PO subedit minor mode has a few peculiarities worth being described
in fuller detail. It installs a few commands over the usual editing set
of Emacs, which are described below.

@table @kbd
@item C-c C-c
Complete edition.

@item C-c C-k
Abort edition.

@item C-c C-a
Consult auxiliary PO files.

@end table

The window's contents represent a translation for a given message,
or a translator comment. The translator may modify this window to
her heart's content. Once this is done, the command @w{@kbd{C-c C-c}}
(@code{po-subedit-exit}) may be used to return the edited translation into
the PO file, replacing the original translation, even if it moved out of
sight or if buffers were switched.

If the translator becomes unsatisfied with her translation or comment,
to the extent she prefers keeping what was existent prior to the
@kbd{@key{RET}} or @kbd{#} command, she may use the command @w{@kbd{C-c C-k}}
(@code{po-subedit-abort}) to simply discard the edit, while preserving
the original translation or comment. Another way would be for her to exit
normally with @w{@kbd{C-c C-c}}, then type @kbd{U} once for undoing the
whole effect of the last edit.

The command @w{@kbd{C-c C-a}} allows for glancing through translations
already achieved in other languages, directly while editing the current
translation. This may be quite convenient when the translator is fluent
in many languages, but of course, only makes sense when such completed
auxiliary PO files are already available to her (@pxref{Auxiliary}).

Functions found on @code{po-subedit-mode-hook}, if any, are executed after
the string has been inserted in the edit buffer.

While editing her translation, the translator should take care not to
insert unwanted @kbd{@key{RET}} (newline) characters at the end of
the translated string if those are not meant to be there, and not to remove
such characters when they are required. Since these characters are not
visible in the editing buffer, they are easily introduced by mistake.
To help her, @kbd{@key{RET}} automatically puts the character @kbd{<}
at the end of the string being edited, but this @kbd{<} is not really
part of the string. On exiting the editing window with @w{@kbd{C-c C-c}},
PO mode automatically removes such @kbd{<} and all whitespace added after
it. If the translator adds characters after the terminating @kbd{<}, it
loses its delimiting property and integrally becomes part of the string.
If she removes the delimiting @kbd{<}, then the edited string is taken
@emph{as is}, with all trailing newlines, even if invisible. Also, if
the translated string ought to end itself with a genuine @kbd{<}, then
the delimiting @kbd{<} must not be removed; so the string should appear,
in the editing window, as ending with two @kbd{<} in a row.
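
The handling of the delimiting @kbd{<} can be summarized by the following
Python sketch. It is only a model of the behavior described above, not
the Emacs Lisp code of PO mode, and the function name
@code{finish_subedit} is hypothetical.

@example
def finish_subedit(buffer_text):
    """Return the string to store, following the rules described above."""
    stripped = buffer_text.rstrip()
    if stripped.endswith("<"):
        # The delimiter is still last: drop it and the whitespace after it.
        return stripped[:-1]
    # Delimiter removed, or followed by other text: keep the buffer as is.
    return buffer_text


print(repr(finish_subedit("Bonjour<\n")))    # delimiter intact
print(repr(finish_subedit("Bonjour\n<")))    # a wanted final newline
print(repr(finish_subedit("a <<")))          # genuine '<' plus delimiter
print(repr(finish_subedit("Bonjour\n\n")))   # delimiter removed by hand
@end example

In the last call the two trailing newlines survive, which is exactly the
situation the delimiter is meant to make visible.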

When a translation (or a comment) is being edited, the translator may move
the cursor back into the PO file buffer and freely move to other entries,
browsing at will. If, with an edit pending, the translator wanders in the
PO file buffer, she may decide to start modifying another entry. Each entry
being edited has its own subedit buffer. It is possible to simultaneously
edit the translation @emph{and} the comment of a single entry, or to
edit entries in different PO files, all at once. Typing @kbd{@key{RET}}
on a field already being edited merely resumes that particular edit. Yet,
the translator had better be comfortable handling many Emacs windows!

Pending subedits may be completed or aborted in any order, regardless
of how or when they were started. When many subedits are pending and the
translator asks to quit the PO file (with the @kbd{q} command), subedits
are automatically resumed one at a time, so she may decide for each of them.

@node C Sources Context, Auxiliary, Subedit, Updating
@section C Sources Context

PO mode is particularly powerful when used with PO files
created through GNU @code{gettext} utilities, as those utilities
insert special comments in the PO files they generate.
Some of these special comments relate the PO file entry to
exactly where the untranslated string appears in the program sources.

When the translator gets to an untranslated entry, she is fairly
often faced with an original string which is not as informative as
it normally should be, being succinct, cryptic, or otherwise ambiguous.
Before choosing how to translate the string, she needs to understand
better what the string really means and how tight the translation has
to be. Most of the time, when problems arise, the only way left to make
her judgment is looking at the true program sources from where this
string originated, searching for surrounding comments the programmer
might have put in there, and looking around for helping clues of
@emph{any} kind.

Surely, when looking at program sources, the translator will receive
more help if she is a fluent programmer. However, even if she is
not versed in programming and feels a little lost in C code, the
translator should not be shy about taking a look, once in a while.
It is most probable that she will still be able to find some of the
hints she needs. She will learn quickly to not feel uncomfortable
in program code, paying more attention to programmer's comments,
variable and function names (if the programmer dared to choose them
well), and overall organization, than to programming itself.

The following commands are meant to help the translator get
program source context for a PO file entry.

@table @kbd
@item s
Resume the display of a program source context, or cycle through them.

@item M-s
Display a program source context selected by menu.

@item S
Add a directory to the search path for source files.

@item M-S
Delete a directory from the search path for source files.

@end table

The commands @kbd{s} (@code{po-cycle-reference}) and @kbd{M-s}
(@code{po-select-source-reference}) both open another window displaying
some source program file, and already positioned in such a way that
it shows an actual use of the string to be translated. By doing
so, the command gives source program context for the string. But if
the entry has no source context references, or if all references
are unresolved along the search path for program sources, then the
command diagnoses this as an error.

Even if @kbd{s} (or @kbd{M-s}) opens a new window, the cursor stays
in the PO file window. If the translator really wants to
get into the program source window, she ought to do it explicitly,
maybe by using command @kbd{O}.

When @kbd{s} is typed for the first time, or for a PO file entry which
differs from the last one used for getting source context, then the
command reacts by giving the first context available for this entry,
if any. If some context has already been recently displayed for the
current PO file entry, and the translator wandered off to do other
things, typing @kbd{s} again will merely resume, in another window,
the context last displayed. In particular, if the translator moved
the cursor away from the context in the source file, the command will
bring the cursor back to the context. When @kbd{s} is used many times
in a row, with no other commands intervening, PO mode cycles to
the next available contexts for this particular entry, getting back
to the first context once the last has been shown.

The command @kbd{M-s} behaves differently. Instead of cycling through
references, it lets the translator choose a particular reference among
many, and displays that reference. It is best used with completion:
if the translator types @kbd{@key{TAB}} immediately after @kbd{M-s}, in
response to the question, she will be offered a menu of all possible
references, as a reminder of which are the acceptable answers.
This command is useful only where there are really many contexts
available for a single string to translate.

Program source files are usually found relative to where the PO
file stands. As a special provision, when this fails, the file is
also looked for, but relative to the directory immediately above it.
Those two cases take proper care of most PO files. However, it might
happen that a PO file has been moved, or is edited in a different
place than its normal location. When this happens, the translator
should tell PO mode the directory in which the genuine PO file
normally sits. Many such directories may be specified, and all together,
they constitute what is called the @dfn{search path} for program sources.
The command @kbd{S} (@code{po-consider-source-path}) is used to interactively
enter a new directory at the front of the search path, and the command
@kbd{M-S} (@code{po-ignore-source-path}) is used to select, with completion,
one of the directories she does not want anymore on the search path.
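
As an illustration, the lookup described above may be sketched in Python
as follows. The function @code{find_source} and its parameters are
hypothetical, and the order in which the directories are tried (PO file
directory, its parent, then the directories entered with @kbd{S}) is an
assumption made for this sketch; PO mode itself may order them
differently.

@example
import os


def find_source(reference, po_file, search_path=(), exists=os.path.exists):
    """Return the first existing candidate for a source reference."""
    po_dir = os.path.dirname(os.path.abspath(po_file))
    # Assumed order: beside the PO file, one level up, then extra entries.
    bases = [po_dir, os.path.dirname(po_dir)] + list(search_path)
    for base in bases:
        candidate = os.path.normpath(os.path.join(base, reference))
        if exists(candidate):
            return candidate
    return None  # every reference is unresolved


# A stand-in for the file system, so the sketch runs anywhere (POSIX paths).
known_files = set(["/work/pkg/src/main.c"])
print(find_source("src/main.c", "/work/pkg/po/fr.po",
                  exists=known_files.__contains__))
@end example

Passing the @code{exists} test as a parameter is only a convenience for
trying the sketch without real source files.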

@node Auxiliary, Compendium, C Sources Context, Updating
@section Consulting Auxiliary PO Files

PO mode is able to help the knowledgeable translator, fluent in
many languages, take advantage of translations already achieved
in other languages she just happens to know. It provides these other
language translations as additional context for her own work. Moreover,
it has features to ease the production of translations for many languages
at once, for translators preferring to work in this way.

An @dfn{auxiliary} PO file is an existing PO file meant for the same
package the translator is working on, but targeted at a different
language. Commands exist for declaring and handling auxiliary
PO files, and also for showing contexts for the entry under work.

Here are the auxiliary file commands available in PO mode.

@table @kbd
@item a
Seek auxiliary files for another translation for the same entry.

@item M-a
Switch to a particular auxiliary file.

@item A
Declare this PO file as an auxiliary file.

@item M-A
Remove this PO file from the list of auxiliary files.

@end table

Command @kbd{A} (@code{po-consider-as-auxiliary}) adds the current
PO file to the list of auxiliary files, while command @kbd{M-A}
(@code{po-ignore-as-auxiliary}) just removes it.

The command @kbd{a} (@code{po-cycle-auxiliary}) seeks all auxiliary PO
files, round-robin, searching for a translated entry in some other language
having an @code{msgid} field identical to the one for the current entry.
The found PO file, if any, takes the place of the current PO file in
the display (its window gets on top). Before doing so, the current PO
file is also made into an auxiliary file, if not already. So, @kbd{a}
in this newly displayed PO file will seek another PO file, and so on,
so repeating @kbd{a} will eventually yield back the original PO file.

The command @kbd{M-a} (@code{po-select-auxiliary}) asks the translator
for her choice of a particular auxiliary file, with completion, and
then switches to that selected PO file. The command also checks if
the selected file has an @code{msgid} field identical to the one for
the current entry, and if yes, this entry becomes current. Otherwise,
the cursor of the selected file is left undisturbed.

For all this to work fully, auxiliary PO files will have to be normalized,
in such a way that @code{msgid} fields are written @emph{exactly}
the same way. It is possible to write @code{msgid} fields in various
ways for representing the same string, and different writings would break
the proper behaviour of the auxiliary file commands of PO mode. This is not
expected to be much of a problem in practice, as most existing PO files have
their @code{msgid} entries written by the same GNU @code{gettext} tools.

However, PO files initially created by PO mode itself, while marking
strings in source files, are normalised differently. So are PO
files resulting from the @samp{M-x normalize} command. Until these
discrepancies between PO mode and other GNU @code{gettext} tools get
fully resolved, the translator should stay aware of normalisation issues.

@node Compendium, , Auxiliary, Updating
@section Using Translation Compendiums

@c FIXME: Rewrite.

Compendiums are yet to be implemented.

An upcoming PO mode feature will let the translator maintain a
compendium of already achieved translations. A @dfn{compendium}
is a special PO file containing a set of translations recurring in
many different packages. The translator will be given commands for
adding entries to her compendium, and later initializing untranslated
entries, or updating already translated entries, from translations
kept in the compendium. For this to work, however, the compendium
would have to be normalized. @xref{Normalizing}.

@c It is not useful that I modify the @file{lib/} routines if not done in
@c the true sources. How do you/I/they proceed for getting this job done?
@c I presume that @file{lib/} routines will all use @code{gettext} for
@c the time being.

@node Binaries, Users, Updating, Top
@chapter Producing Binary MO Files

@c FIXME: Rewrite.

@menu
* msgfmt Invocation::           Invoking the @code{msgfmt} Program
* MO Files::                    The Format of GNU MO Files
@end menu

@node msgfmt Invocation, MO Files, Binaries, Binaries
@section Invoking the @code{msgfmt} Program

@c FIXME: Rewrite.

@example
Usage: msgfmt [@var{option}] @var{filename}.po @dots{}
@end example

@table @samp
@item -a @var{number}
@itemx --alignment=@var{number}
Align strings to @var{number} bytes (default: 1).
@c Currently the README mentions that this constant could be changed by
@c the installer by changing the value in config.h. Should this go away?

@item -h
@itemx --help
Display this help and exit.

@item --no-hash
Binary file will not include the hash table.

@item -o @var{file}
@itemx --output-file=@var{file}
Specify output file name as @var{file}.

@item --strict
Direct the program to work strictly following the Uniforum/Sun
implementation. Currently this only affects the naming of the output
file. If this option is not given, the name of the output file is the
same as the domain name. If the strict Uniforum mode is enabled, the
suffix @file{.mo} is added to the file name if it is not already
present.

We find this behaviour of Sun's implementation rather silly and so by
default this mode is @emph{not} selected.

@item -v
@itemx --verbose
Detect and diagnose input file anomalies which might represent
translation errors. The @code{msgid} and @code{msgstr} strings are
studied and compared. It is considered abnormal that one string
starts or ends with a newline while the other does not.

Also, if the string represents a format string used in a
@code{printf}-like function, both strings should have the same number of
@samp{%} format specifiers, with matching types. If the flag
@code{c-format} or @code{possible-c-format} appears in the special
comment @samp{#,} for this entry, a check is performed. For example, the
check will diagnose using @samp{%.*s} against @samp{%s}, or @samp{%d}
against @samp{%s}, or @samp{%d} against @samp{%x}. It can even handle
positional parameters.

Normally the @code{xgettext} program automatically decides whether a
string is a format string or not. This algorithm is not perfect,
though. It might regard a string as a format string though it is not
used in a @code{printf}-like function and so @code{msgfmt} might report
errors where there are none. Or the other way round: a string is not
regarded as a format string but it is used in a @code{printf}-like
function.

To solve this problem, the programmer can dictate the decision to the
@code{xgettext} program (@pxref{c-format}). The translator should not
consider removing the flag from the @samp{#,} line. This ``fix'' would be
reversed again as soon as @code{msgmerge} is called the next time.

@item -V
@itemx --version
Output version information and exit.

@end table

If the input file is @samp{-}, standard input is read. If the output file
is @samp{-}, output is written to standard output.
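
The two checks described under @samp{--verbose} can be approximated by
the following Python sketch. It is a simplified model, not the code of
@code{msgfmt}: the names are hypothetical, positional parameters and
length modifiers are ignored, and the regular expression only knows the
common @code{printf} conversions.

@example
import re

# One printf directive: flags, width, precision, length, conversion.
DIRECTIVE = re.compile(
    r"%(?:%|[-+ #0]*(\*|\d+)?(?:\.(\*|\d+))?"
    r"(?:hh|h|ll|l|L|j|z|t)?([diouxXeEfgGcsp]))"
)


def newline_mismatch(msgid, msgstr):
    """True if only one of the strings starts, or ends, with a newline."""
    starts = msgid.startswith("\n") != msgstr.startswith("\n")
    ends = msgid.endswith("\n") != msgstr.endswith("\n")
    return starts or ends


def directive_signature(text):
    """List what is consumed: '*' for a star, else the conversion letter."""
    signature = []
    for width, precision, conversion in DIRECTIVE.findall(text):
        if conversion == "":
            continue  # "%%" prints a percent sign and takes no argument
        if width == "*":
            signature.append("*")
        if precision == "*":
            signature.append("*")
        signature.append(conversion)
    return signature


def format_mismatch(msgid, msgstr):
    """True if the two strings do not consume the same arguments."""
    return directive_signature(msgid) != directive_signature(msgstr)


print(newline_mismatch("Done.\n", "Fertig."))
print(format_mismatch("%.*s", "%s"))
print(format_mismatch("%d of %s", "%d von %s"))
@end example

An untranslated entry, whose @code{msgstr} is empty, would have to be
skipped before applying either test.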

@node MO Files, , msgfmt Invocation, Binaries
@section The Format of GNU MO Files

The format of the generated MO files is best described by a picture,
which appears below.

The first two words serve to identify the file. The magic
number will always signal GNU MO files. The number is stored in the
byte order of the generating machine, so the magic number really is
two numbers: @code{0x950412de} and @code{0xde120495}. The second
word describes the current revision of the file format. For now the
revision is 0. This might change in future versions, and ensures
that the readers of MO files can distinguish new formats from old
ones, so that both can be handled correctly. The version is kept
separate from the magic number, instead of using different magic
numbers for different formats, mainly because @file{/etc/magic} is
not updated often. It might be better to have magic separated from
internal format version identification.
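
A reader can use the magic number to discover the byte order of a file.
The Python sketch below shows one way to do this; the function name
@code{mo_byte_order} is hypothetical, and the returned character is the
byte order prefix understood by Python's @code{struct} module.

@example
import struct

MAGIC = 0x950412DE


def mo_byte_order(first_four_bytes):
    """Return '<' or '>' according to how the magic number was stored."""
    if struct.unpack("<I", first_four_bytes)[0] == MAGIC:
        return "<"  # written on a little-endian machine
    if struct.unpack(">I", first_four_bytes)[0] == MAGIC:
        return ">"  # written on a big-endian machine
    raise ValueError("not a GNU MO file")


little = struct.pack("<I", MAGIC)
print(mo_byte_order(little))
# Read with the wrong byte order, the same bytes give the second number.
print(hex(struct.unpack(">I", little)[0]))
@end example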

Next follow a number of pointers to later tables in the file, allowing
for the extension of the prefix part of MO files without having to
recompile programs reading them. This might become useful for later
inserting a few flag bits, indication about the charset used, new
tables, or other things.

Then, at offset @var{O} and offset @var{T} in the picture, two tables
of string descriptors can be found. In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
another for the offset of the string in the MO file, counting in bytes
from the start of the file. The first table contains descriptors
for the original strings, and is sorted so the original strings
are in increasing lexicographical order. The second table contains
descriptors for the translated strings, and is parallel to the first
table: to find the corresponding translation one has to access the
array slot in the second array with the same index.

Having the original strings sorted enables the use of simple binary
search, for when the MO file does not contain a hashing table, or
for when it is not practical to use the hashing table provided in
the MO file. This also has another advantage, as the empty string
in a GNU @code{gettext} PO file is usually @emph{translated} into
some system information attached to that particular MO file, and the
empty string necessarily becomes the first in both the original and
translated tables, making the system information very easy to find.

The size @var{S} of the hash table can be zero. In this case, the
hash table itself is not contained in the MO file. Some people might
prefer this because a precomputed hashing table takes disk space, and
does not win @emph{that} much speed. The hash table contains indices
to the sorted array of strings in the MO file. Conflict resolution is
done by double hashing. The precise hashing algorithm used is fairly
dependent on GNU @code{gettext} code, and is not documented here.

As for the strings themselves, they follow the hash table, and each
is terminated with a @key{NUL}, and this @key{NUL} is not counted in
the length which appears in the string descriptor. The @code{msgfmt}
program has an option selecting the alignment for MO file strings.
With this option, each string is separately aligned so it starts at
an offset which is a multiple of the alignment value. On some RISC
machines, a correct alignment will speed things up.

Plural forms are stored by letting the plural of the original string
follow the singular of the original string, separated through a
@key{NUL} byte. The length which appears in the string descriptor
includes both. However, only the singular of the original string
takes part in the hash table lookup. The plural variants of the
translation are all stored consecutively, separated through a
@key{NUL} byte. Here also, the length in the string descriptor
includes all of them.
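
As a small illustration of this layout, the following Python lines split
such a stored string back into its forms. The helper @code{split_forms}
and the sample strings are hypothetical.

@example
def split_forms(stored):
    """Split the bytes covered by one descriptor into their forms."""
    return stored.split(b"\0")


original = b"%d file\0%d files"            # singular, NUL, plural
translation = b"%d fichier\0%d fichiers"   # one part per plural variant
hash_key = split_forms(original)[0]        # only the singular is looked up
print(hash_key, split_forms(translation))
print(len(original))                       # counts both forms and the NUL
@end example

The final terminating @key{NUL} of each stored string is not part of
these byte strings, in keeping with the descriptor length.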

Nothing prevents a MO file from having embedded @key{NUL}s in strings.
However, the program interface currently used already presumes
that strings are @key{NUL} terminated, so embedded @key{NUL}s are
somewhat useless. But the MO file format is general enough so other
interfaces would be later possible, if for example, we ever want to
implement wide characters right in MO files, where @key{NUL} bytes may
accidentally appear. (No, we don't want to have wide characters in MO
files. They would make the file unnecessarily large, and the
@samp{wchar_t} type being platform dependent, MO files would be
platform dependent as well.)

This particular issue has been strongly debated in the GNU
@code{gettext} development forum, and it is to be expected that the MO
file format will evolve or change over time. It is even possible that many
formats may later be supported concurrently. But surely, we have to
start somewhere, and the MO file format described here is a good start.
Nothing is cast in concrete, and the format may later evolve fairly
easily, so we should feel comfortable with the current approach.

@example
@group
        byte
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .    (possibly more entries later)         .
             .                                          .
             |                                          |
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  <------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  <---------------' |
             |                                          |        |
             | NUL terminated 1st translation  <-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
@end group
@end example
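
To make the picture concrete, here is a hedged Python sketch that builds
a tiny MO image in memory, reads back both descriptor tables, and looks
up a string by binary search. All function names are hypothetical, the
image is always little-endian, and no hash table is written (@var{S} is
zero); real MO files produced by @code{msgfmt} may differ in these
respects.

@example
import bisect
import struct

MAGIC = 0x950412DE


def build_mo(pairs):
    """Build a minimal little-endian MO image with no hash table."""
    items = sorted((key.encode("utf-8"), value.encode("utf-8"))
                   for key, value in pairs)
    count = len(items)
    o_table = 28                     # the header is seven 32-bit words
    t_table = o_table + 8 * count
    start = t_table + 8 * count      # strings follow both tables
    descriptors, blob = [], b""
    for column in (0, 1):            # originals first, then translations
        for item in items:
            text = item[column]
            where = start + len(blob)
            descriptors.append(struct.pack("<II", len(text), where))
            blob += text + b"\0"     # the NUL is not counted in the length
    header = struct.pack("<7I", MAGIC, 0, count,
                         o_table, t_table, 0, start)
    return header + b"".join(descriptors) + blob


def read_table(data, order, table, count):
    """Read COUNT (length, offset) descriptors and fetch their strings."""
    strings = []
    for index in range(count):
        length, offset = struct.unpack_from(order + "II", data,
                                            table + 8 * index)
        strings.append(data[offset:offset + length])
    return strings


def load_mo(data):
    """Return the parallel lists of original and translated strings."""
    order = "<"
    if struct.unpack_from("<I", data, 0)[0] != MAGIC:
        order = ">"
    fields = struct.unpack_from(order + "7I", data, 0)
    magic, revision, count, o_table, t_table = fields[:5]
    if magic != MAGIC or revision != 0:
        raise ValueError("unsupported MO file")
    return (read_table(data, order, o_table, count),
            read_table(data, order, t_table, count))


def lookup(originals, translations, msgid):
    """Binary search in the sorted originals; fall back to MSGID."""
    key = msgid.encode("utf-8")
    index = bisect.bisect_left(originals, key)
    if index < len(originals) and originals[index] == key:
        return translations[index].decode("utf-8")
    return msgid


image = build_mo([("hello", "bonjour"), ("bye", "au revoir"),
                  ("", "Project-Id-Version: demo\n")])
originals, translations = load_mo(image)
print(originals[0] == b"")           # the empty string sorts first
print(lookup(originals, translations, "bye"))
@end example

Because the originals are sorted, the same index addresses both tables,
which is why the lookup needs nothing more than a binary search.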

@node Users, Programmers, Binaries, Top
@chapter The User's View

When GNU @code{gettext} has truly reached its goal, average users
should feel some kind of astonished pleasure, seeing the effect of
that strange kind of magic that just makes their own native language
appear everywhere on their screens. As for naive users, they would
ideally have no special pleasure about it, merely taking their own
language for @emph{granted}, and becoming rather unhappy otherwise.

So, let's try to describe here how we would like the magic to operate,
as we want the users' view to be the simplest, among all ways one
could look at GNU @code{gettext}. All other software engineers:
programmers, translators, maintainers, should work together in such a
way that the magic becomes possible. This is a long and progressive
undertaking, and information is available about the progress of the
Translation Project.

When a package is distributed, there are two kinds of users:
@dfn{installers} who fetch the distribution, unpack it, configure
it, compile it and install it for themselves or others to use; and
@dfn{end users} that call programs of the package, once these have
been installed at their site. GNU @code{gettext} is offering magic
for both installers and end users.

@menu
* Matrix::                      The Current @file{ABOUT-NLS} Matrix
* Installers::                  Magic for Installers
* End Users::                   Magic for End Users
@end menu

@node Matrix, Installers, Users, Users
@section The Current @file{ABOUT-NLS} Matrix

Languages are not equally supported in all packages using GNU
@code{gettext}. To know if some package uses GNU @code{gettext}, one
may check the distribution for the @file{ABOUT-NLS} information file, for
some @file{@var{ll}.po} files, often kept together into some @file{po/}
directory, or for an @file{intl/} directory. Internationalized packages
usually have many @file{@var{ll}.po} files, where @var{ll} represents
the language. @xref{End Users}, for a complete description of the format
for @var{ll}.

More generally, a matrix is available for showing the current state
of the Translation Project, listing which packages are prepared for
multi-lingual messages, and which languages are supported by each.
Because this information changes often, this matrix is not kept within
this GNU @code{gettext} manual. This information is often found in
file @file{ABOUT-NLS} from various distributions, but is also as old as
the distribution itself. A recent copy of this @file{ABOUT-NLS} file,
containing up-to-date information, should generally be found on the
Translation Project sites, and also on most GNU archive sites.

@node Installers, End Users, Matrix, Users
@section Magic for Installers

By default, packages fully using GNU @code{gettext}, internally,
are installed in such a way that they allow translation of
messages. At @emph{configuration} time, those packages should
automatically detect whether the underlying host system already provides
the GNU @code{gettext} functions. If not,
the GNU @code{gettext} library should be automatically prepared
and used. Installers may use special options at configuration
time for changing this behavior. The command
@samp{./configure --with-included-gettext} bypasses system @code{gettext} to
use the included GNU @code{gettext} instead,
while @samp{./configure --disable-nls}
produces programs totally unable to translate messages.

Internationalized packages usually have many @file{@var{ll}.po}
files. Unless
translations are disabled, all those available are installed together
with the package. However, the environment variable @code{LINGUAS}
may be set, prior to configuration, to limit the installed set.
@code{LINGUAS} should then contain a space separated list of two-letter
codes, stating which languages are allowed.

@node End Users, , Installers, Users

3193

@section Magic for End Users

3194

3195

We consider here those packages using GNU @code{gettext} internally,

3196

and for which the installers did not disable translation at

3197

@emph{configure} time. Then, users only have to set the @code{LANG}

3198

environment variable to the appropriate @samp{@var{ll}_@var{CC}}

3199

combination prior to using the programs in the package. @xref{Matrix}.

3200

For example, let's presume a German site. At the shell prompt, users

3201

merely have to execute @w{@samp{setenv LANG de_DE}} (in @code{csh}) or

3202

@w{@samp{export LANG; LANG=de_DE}} (in @code{sh}). They could even do

3203

this from their @file{.login} or @file{.profile} file.

3204

3205

@node Programmers, Translators, Users, Top

3206

@chapter The Programmer's View

3207

3208

@c FIXME: Reorganize whole chapter.

3209

3210

One aim of the current message catalog implementation provided by
GNU @code{gettext} was to use the system's message catalog handling, if the
installer wishes to do so. So we perhaps should first take a look at
the solutions we know about. The people in the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below. In fact they couldn't agree on anything, so they decided
only to include an example of an interface. The major Unix vendors
are split in the usage of the two most important specifications: X/Open's
@code{catgets} vs. Uniforum's @code{gettext} interface. We'll describe them both and
later explain our solution to this dilemma.

3220

3221

@menu

3222

* catgets:: About @code{catgets}

3223

* gettext:: About @code{gettext}

3224

* Comparison:: Comparing the two interfaces

3225

* Using libintl.a:: Using libintl.a in own programs

3226

* gettext grok:: Being a @code{gettext} grok

3227

* Temp Programmers:: Temporary Notes for the Programmers Chapter

3228

@end menu

3229

3230

@node catgets, gettext, Programmers, Programmers

3231

@section About @code{catgets}

3232

3233

The @code{catgets} implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
process of creating this standard seemed to be too slow for some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard. Of course this leads again to problems while
writing platform-independent programs: even the usage of @code{catgets}
does not guarantee a uniform interface.

3240

3241

Another, personal comment on this: only a bunch of committee members
could have made this interface. They never really tried to program
using this interface. It is a fast, memory-saving implementation, and a
user can happily live with it. But programmers hate it (at least I and
some others do@dots{}).

3246

3247

But we must not forget one point: after all the trouble with transferring
the rights on Unix(tm) they at last came to X/Open, the very same who
published this specification. This leads me to making the prediction
that this interface will be in future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (implementations which are
@emph{allowed} to wear this name).

3253

3254

@menu

3255

* Interface to catgets:: The interface

3256

* Problems with catgets:: Problems with the @code{catgets} interface?!

3257

@end menu

3258

3259

@node Interface to catgets, Problems with catgets, catgets, catgets

3260

@subsection The Interface

3261

3262

The interface to the @code{catgets} implementation consists of three
functions which correspond to those used in file access: @code{catopen}
to open the catalog for use, @code{catgets} for accessing the message
tables, and @code{catclose} for closing after work is done. Prototypes
for the functions and the needed definitions are in the
@code{<nl_types.h>} header file.

3268

3269

@code{catopen} is used like this:

3270

3271

@example

3272

nl_catd catd = catopen ("catalog_name", 0);

3273

@end example

3274

3275

The function takes as its first argument the name of the catalog. This usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems. So the common advice
is to use @code{0} as the value. The return value is a handle to the
message catalog, analogous to the file descriptors returned by @code{open}.

3281

3282

This handle is of course used in the @code{catgets} function which can

3283

be used like this:

3284

3285

@example

3286

char *translation = catgets (catd, set_no, msg_id, "original string");

3287

@end example

3288

3289

The first parameter is the catalog descriptor. The second parameter
specifies the set of messages in this catalog from which the message
identified by @code{msg_id} is obtained. @code{catgets} therefore uses a
three-stage addressing:

3293

3294

@display

3295

catalog name @result{} set number @result{} message ID @result{} translation

3296

@end display

3297

3298

@c Anybody else loving Haskell??? :-) -- Uli

3299

3300

The fourth argument is not used to address the translation. It is given
as a default value in case one of the addressing stages fails. One
important thing to remember is that although the return type of @code{catgets}
is @code{char *} the resulting string @emph{must not} be changed. It
should rather be @code{const char *}, but the standard was published in
1988, one year before ANSI C.

3306

3307

@noindent

3308

The last of these functions is used and behaves as expected:

3309

3310

@example

3311

catclose (catd);

3312

@end example

3313

3314

After this no @code{catgets} call using the descriptor is legal anymore.
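
Putting the three calls together, a lookup might be sketched as follows.
This is only a sketch under assumptions: the helper name
@code{lookup_message} and its buffer-based signature are hypothetical and
not part of any standard. The text is copied out before @code{catclose}
is called because the string returned by @code{catgets} may no longer be
valid once the catalog is closed.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper (not part of any standard): copy message MSG_ID
   of set SET_NO from catalog CATALOG_NAME into BUF, or FALLBACK when
   the catalog or the message cannot be found.  Returns 0 on success
   and -1 if BUF is too small.  */
int
lookup_message (const char *catalog_name, int set_no, int msg_id,
                const char *fallback, char *buf, size_t buflen)
{
  const char *text = fallback;
  nl_catd catd = catopen (catalog_name, 0);
  int n;

  if (catd != (nl_catd) -1)
    /* The returned string must not be modified.  */
    text = catgets (catd, set_no, msg_id, fallback);

  /* Copy before closing: the string may live inside the catalog.  */
  n = snprintf (buf, buflen, "%s", text);

  if (catd != (nl_catd) -1)
    catclose (catd);

  return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

When the catalog cannot be opened, the helper simply hands back the
default string, which mirrors the role of the fourth argument of
@code{catgets}.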

3315

3316

@node Problems with catgets, , Interface to catgets, catgets

3317

@subsection Problems with the @code{catgets} Interface?!

3318

3319

This description seemed really easy, so where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of @code{catgets}: the unique
message ID. This has to be a numeric value that is unique among all
messages in a single set. Perhaps you can imagine the problems of keeping
such a list while changing the source code. Add a new message here,
remove one there. Of course a lot of tools have been developed to help
organize this chaos, but each of them fails in one aspect or another. We
don't want to say that the other approach has no problems, but they are
far easier to manage.

3330

3331

@node gettext, Comparison, catgets, Programmers

3332

@section About @code{gettext}

3333

3334

The definition of the @code{gettext} interface comes from a Uniforum
proposal and it is followed by at least one major Unix vendor
(Sun) in its latest developments. It is not specified in any official
standard, though.

3338

3339

The main points about this solution are that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here as well, but this key is the message
itself (however long or short it is). @xref{Comparison}, for a more
detailed comparison of the two methods.

3345

3346

The following section contains a rather detailed description of the

3347

interface. We make it that detailed because this is the interface

3348

we chose for the GNU @code{gettext} Library. Programmers interested

3349

in using this library will be interested in this description.

3350

3351

@menu

3352

* Interface to gettext:: The interface

3353

* Ambiguities:: Solving ambiguities

3354

* Locating Catalogs:: Locating message catalog files

3355

* Charset conversion:: How to request conversion to Unicode

3356

* Plural forms:: Additional functions for handling plurals

3357

* GUI program problems:: Another technique for solving ambiguities

3358

* Optimized gettext:: Optimization of the *gettext functions

3359

@end menu

3360

3361

@node Interface to gettext, Ambiguities, gettext, gettext

3362

@subsection The Interface

3363

3364

The minimal functionality an interface must have is a) to select a

3365

domain the strings are coming from (a single domain for all programs is

3366

not reasonable because its construction and maintenance is difficult,

3367

perhaps impossible) and b) to access a string in a selected domain.

3368

3369

This is essentially the description of the @code{gettext} interface. It
has a global domain which unqualified usages reference. Of course this
domain is selectable by the user.

3372

3373

@example

3374

char *textdomain (const char *domain_name);

3375

@end example

3376

3377

This provides the possibility to change or query the current status of
the current global domain of the @code{LC_MESSAGES} category. The
argument is a null-terminated string, whose characters must be legal
for use in filenames. If the @var{domain_name} argument is @code{NULL},
the function returns the current value. If no value has been set
before, the name of the default domain is returned: @emph{messages}.
Please note that although the return value of @code{textdomain} is of
type @code{char *} no changing is allowed. It is also important to know
that no checks of the availability are made. If the name is not
available you will see this by the fact that no translations are provided.

3387

3388

@noindent

3389

To use a domain set by @code{textdomain} the function

3390

3391

@example

3392

char *gettext (const char *msgid);

3393

@end example

3394

3395

is to be used. This is the simplest reasonable form one can imagine.

3396

The translation of the string @var{msgid} is returned if it is available

3397

in the current domain. If not available the argument itself is

3398

returned. If the argument is @code{NULL} the result is undefined.

3399

3400

One thing which should come to mind is that no explicit dependency on
the used domain is given. The current value of the domain for the
@code{LC_MESSAGES} locale is used. If this changes between two
executions of the same @code{gettext} call in the program, the two calls
reference different message catalogs.

3405

3406

For the easiest case, which is normally used in internationalized

3407

packages, once at the beginning of execution a call to @code{textdomain}

3408

is issued, setting the domain to a unique name, normally the package

3409

name. In the following code all strings which have to be translated are

3410

filtered through the @code{gettext} function. That's all, the package speaks

3411

your language.
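
The easiest case described above might look like this in C. This is a
minimal sketch, not code taken from any particular package: the helper
names @code{init_messages} and @code{greeting}, the message text, and
the call to @code{setlocale} (assumed here so that the user's
environment is honored) are illustrative assumptions.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical start-up helper: select the user's locale and make
   PACKAGE the global domain.  Returns the domain now in effect, or
   NULL on failure.  The result must not be modified.  */
const char *
init_messages (const char *package)
{
  setlocale (LC_ALL, "");
  return textdomain (package);
}

/* Every translatable string is passed through gettext.  When no
   catalog provides a translation, the argument itself comes back.  */
const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
```

A program would call @code{init_messages} once at the beginning of
execution, typically with the package name, and afterwards print
strings such as the one returned by @code{greeting}.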

3412

3413

@node Ambiguities, Locating Catalogs, Interface to gettext, gettext

3414

@subsection Solving Ambiguities

3415

3416

While this single name domain works well for most applications there
might be the need to get translations from more than one domain. Of
course one could switch between different domains with calls to
@code{textdomain}, but this is neither convenient nor fast. A
possible situation could be one case subject to discussion during this
writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain @code{error}. By this means we would only need
to translate them once.
Another case are messages from a library, as these @emph{have} to be
independent of the current domain set by the application.

3427

3428

@noindent

3429

For these reasons there are two more functions to retrieve strings:

3430

3431

@example

3432

char *dgettext (const char *domain_name, const char *msgid);

3433

char *dcgettext (const char *domain_name, const char *msgid,

3434

int category);

3435

@end example

3436

3437

Both take an additional argument in the first position, which corresponds
to the argument of @code{textdomain}. The third argument of
@code{dcgettext} allows the use of a locale category other than @code{LC_MESSAGES}.
But I really don't know where this can be useful. If the
@var{domain_name} is @code{NULL} or @var{category} has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
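
A library would typically use @code{dgettext} with its own domain so
that its messages do not depend on whatever domain the application has
selected. The following is a sketch under assumptions: the domain name
@code{libfoo} and the function @code{libfoo_strerror} are hypothetical.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical domain name of an imaginary library.  */
#define LIBFOO_DOMAIN "libfoo"

/* Return a message for CODE, looked up in the library's own domain
   and not in the application's global domain.  */
const char *
libfoo_strerror (int code)
{
  switch (code)
    {
    case 0:
      return dgettext (LIBFOO_DOMAIN, "no error");
    case 1:
      return dgettext (LIBFOO_DOMAIN, "out of memory");
    default:
      /* Same lookup as dgettext, with the category spelled out.  */
      return dcgettext (LIBFOO_DOMAIN, "unknown error", LC_MESSAGES);
    }
}
```

Calling @code{libfoo_strerror} leaves the domain set by
@code{textdomain} untouched, so the application's own @code{gettext}
calls keep referring to the application's catalog.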

3445

3446

A second ambiguity can arise from the fact that perhaps more than one
domain has the same name. This can be solved by specifying where the
needed message catalog files can be found.

3449

3450

@example

3451

char *bindtextdomain (const char *domain_name,

3452

const char *dir_name);

3453

@end example

3454

3455

Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using @code{textdomain}). A
@code{NULL} pointer for the @var{dir_name} parameter returns the binding
associated with @var{domain_name}. If @var{domain_name} itself is
@code{NULL} nothing happens and a @code{NULL} pointer is returned. Here
again, as for all the other functions, none of the return
values may be changed!

3464

3465

It is important to remember that relative path names for the
@var{dir_name} parameter can be trouble. Since the path is always
computed relative to the current directory, different results will be
achieved when the program executes a @code{chdir} command. Relative
paths should always be avoided to avoid dependencies and
unreliable behavior.
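
One way to follow this advice is to refuse relative directories before
calling @code{bindtextdomain}. The sketch below assumes POSIX-style
path names, where an absolute path starts with a slash. The helper
names @code{bind_catalog_dir} and @code{current_catalog_dir} are
hypothetical.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical wrapper: bind DOMAIN to DIR only if DIR is an absolute
   POSIX-style path.  Returns the binding, or NULL when DIR is
   rejected or the binding fails.  The result must not be modified.  */
const char *
bind_catalog_dir (const char *domain, const char *dir)
{
  if (dir == NULL || dir[0] != '/')
    return NULL;
  return bindtextdomain (domain, dir);
}

/* Query the current binding: a NULL directory argument changes
   nothing and returns the directory bound to DOMAIN.  */
const char *
current_catalog_dir (const char *domain)
{
  return bindtextdomain (domain, NULL);
}
```

The directory does not have to exist at the time of the call. Whether
a catalog is actually found there only shows up when a translation is
requested.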

3471

3472

@node Locating Catalogs, Charset conversion, Ambiguities, gettext

3473

@subsection Locating Message Catalog Files

3474

3475

Because many different languages for many different packages have to be
stored we need some way to add this information to message catalog
files. The way usually used in Unix environments is to have this encoding
in the file name. This is also done here. The directory name given in
@code{bindtextdomain}'s second argument (or the default directory),
followed by the value of the locale, the name of the locale category,
and the domain name, are concatenated:

3482

3483

@example

3484

@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo

3485

@end example

3486

3487

The default value for @var{dir_name} is system specific. For the GNU

3488

library, and for packages adhering to its conventions, it's:

3489

@example

3490

/usr/local/share/locale

3491

@end example

3492

3493

@noindent

3494

@var{locale} is the value of the locale whose name is this
@code{LC_@var{category}}. For @code{gettext} and @code{dgettext} this
@code{LC_@var{category}} is always @code{LC_MESSAGES}.@footnote{Some
systems, e.g. Ultrix, don't have @code{LC_MESSAGES}. Here we use a more or
less arbitrary value for it, namely 1729, the smallest positive integer
which can be represented in two different ways as the sum of two cubes.}
The value of the locale is determined through
@code{setlocale (LC_@var{category}, NULL)}.
@footnote{When the system does not support @code{setlocale} its behavior
in setting the locale values is simulated by looking at the environment
variables.}
@code{dcgettext} specifies the locale category by the third argument.
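
The concatenation described in this section can be illustrated with a
few lines of C. This is only a sketch of the naming scheme: the
function @code{catalog_path} is hypothetical, and it does not model
any additional search or fallback steps a real implementation may
perform.

```c
#include <stdio.h>

/* Hypothetical helper: build
     DIR_NAME/LOCALE/CATEGORY/DOMAIN_NAME.mo
   in BUF.  CATEGORY is the category name, e.g. "LC_MESSAGES".
   Returns 0 on success and -1 if BUF is too small.  */
int
catalog_path (char *buf, size_t buflen, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, buflen, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

For a domain @code{hello}, the locale @code{de_DE} and the default
directory shown above, this yields
@file{/usr/local/share/locale/de_DE/LC_MESSAGES/hello.mo}.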

3506

3507

@node Charset conversion, Plural forms, Locating Catalogs, gettext

3508

@subsection How to specify the output character set @code{gettext} uses

3509

3510

@code{gettext} not only looks up a translation in a message catalog. It

3511

also converts the translation on the fly to the desired output character

3512

set. This is useful if the user is working in a different character set

3513

than the translator who created the message catalog, because it avoids

3514

distributing variants of message catalogs which differ only in the

3515

character set.

3516

3517

The output character set is, by default, the value of @code{nl_langinfo

3518

(CODESET)}, which depends on the @code{LC_CTYPE} part of the current

3519

locale. But programs which store strings in a locale independent way

3520

(e.g. UTF-8) can request that @code{gettext} and related functions

3521

return the translations in that encoding, by use of the

3522

@code{bind_textdomain_codeset} function.

3523

3524

Note that the @var{msgid} argument to @code{gettext} is not subject to

3525

character set conversion. Also, when @code{gettext} does not find a

3526

translation for @var{msgid}, it returns @var{msgid} unchanged --

3527

independently of the current output character set. It is therefore

3528

recommended that all @var{msgid}s be US-ASCII strings.

3529

3530

@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})

3531

The @code{bind_textdomain_codeset} function can be used to specify the

3532

output character set for message catalogs for domain @var{domainname}.

3533

The @var{codeset} argument must be a valid codeset name which can be used

3534

for the @code{iconv_open} function, or a null pointer.

3535

3536

If the @var{codeset} parameter is the null pointer,

3537

@code{bind_textdomain_codeset} returns the currently selected codeset

3538

for the domain with the name @var{domainname}. It returns @code{NULL} if

3539

no codeset has yet been selected.

3540

3541

The @code{bind_textdomain_codeset} function can be used several times.

3542

If used multiple times with the same @var{domainname} argument, the

3543

later call overrides the settings made by the earlier one.

3544

3545

The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @var{errno} is set accordingly.

3551

@end deftypefun
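
A program that keeps its strings in UTF-8 might request that encoding
as sketched below. The helper names @code{request_utf8} and
@code{selected_codeset} are hypothetical, and the codeset name
@code{UTF-8} is assumed to be one that @code{iconv_open} accepts on
the system in question.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: ask for translations of DOMAIN to be returned
   in UTF-8.  Returns the selected codeset name, or NULL on failure.
   The result must not be modified.  */
const char *
request_utf8 (const char *domain)
{
  return bind_textdomain_codeset (domain, "UTF-8");
}

/* Query only: a null CODESET argument changes nothing and returns
   the codeset currently selected for DOMAIN.  */
const char *
selected_codeset (const char *domain)
{
  return bind_textdomain_codeset (domain, NULL);
}
```

The request affects only the translations that are returned. The
@var{msgid} argument itself is never converted.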

3552

3553

@node Plural forms, GUI program problems, Charset conversion, gettext

3554

@subsection Additional functions for plural forms

3555

3556

The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.

3560

3561

Looking through Unix source code before the time anybody thought about

3562

internationalization (and, sadly, even afterwards) one can often find

3563

code similar to the following:

3564

3565

@smallexample

3566

printf ("%d file%s deleted", n, n == 1 ? "" : "s");

3567

@end smallexample

3568

3569

@noindent

3570

After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
@code{"file(s)"}. Both look unnatural and should be avoided. First
attempts to solve the problem correctly looked like this:

3574

3575

@smallexample

3576

if (n == 1)

3577

printf ("%d file deleted", n);

3578

else

3579

printf ("%d files deleted", n);

3580

@end smallexample

3581

3582

But this does not solve the problem. It helps languages where the

3583

plural form of a noun is not simply constructed by adding an `s' but

3584

that is all. Once again people fell into the trap of believing the

3585

rules their language is using are universal. But the handling of plural

3586

forms differs widely between the language families. For example,

3587

Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:

3588

3589

@quotation

3590

In Polish we use e.g. plik (file) this way:

3591

@example

3592

1 plik

3593

2,3,4 pliki

3594

5-21 pliko'w

3595

22-24 pliki

3596

25-31 pliko'w

3597

@end example

3598

and so on (o' means 8859-2 oacute which should be rather okreska,

3599

similar to aogonek).

3600

@end quotation

3601

3602

There are two things which can differ between languages (and even inside
language families):

3604

3605

@itemize @bullet

3606

@item

3607

The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.

3612

3613

@item

3614

The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romanic and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this in an extra section.

3620

@end itemize

3621

3622

The consequence of this is that application writers should not try to

3623

solve the problem in their code. This would be localization since it is

3624

only usable for certain, hardcoded language environments. Instead the

3625

extended @code{gettext} interface should be used.

3626

3627

These extra functions take two strings and a numerical argument instead
of the one key string. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the translator.
The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

3636

3637

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation but since the GNU C library
(as well as the GNU @code{gettext} package) is written as part of the
GNU project and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

3644

3645

@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})

3646

The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The @var{msgid1} parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form. If no
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.

3654

3655

An example for the use of this function is:

3656

3657

@smallexample

3658

printf (ngettext ("%d file removed", "%d files removed", n), n);

3659

@end smallexample

3660

3661

Please note that the numeric value @var{n} has to be passed to the

3662

@code{printf} function as well. It is not sufficient to pass it only to

3663

@code{ngettext}.

3664

@end deftypefun
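
The point that @var{n} has to be passed twice can be seen in a slightly
larger sketch. The function name @code{format_removed} and the message
texts are illustrative assumptions. The example formats into a caller
supplied buffer so that the result can be inspected.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format a "files removed" message for N into
   BUF.  N is passed once to ngettext, to select the form, and once
   to snprintf, to fill in the number.  Returns the snprintf result.  */
int
format_removed (char *buf, size_t buflen, unsigned long n)
{
  return snprintf (buf, buflen,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

Without a message catalog the English strings are used, so a count of
one selects the first string and any other count, including zero,
selects the second.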

3665

3666

@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})

3667

The @code{dngettext} function is similar to the @code{dgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.

3671

@end deftypefun

3672

3673

@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})

3674

The @code{dcngettext} function is similar to the @code{dcgettext} function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way @code{ngettext} handles them.

3678

@end deftypefun

3679

3680

Now, how do these functions solve the problem of the plural forms?
Without the input of linguists (which was not available) it was not
possible to determine whether there are only a few different ways in
which plural forms are formed or whether the number can increase with
every new supported language.

3685

3686

Therefore the solution implemented is to allow the translator to specify

3687

the rules of how to select the plural form. Since the formula varies

3688

with every language this is the only viable solution except for

3689

hardcoding the information in the code (which still would require the

3690

possibility of extensions to not prevent the use of new languages).

3691

3692

The information about the plural form selection has to be stored in the

3693

header entry of the PO file (the one with the empty @code{msgid} string).

3694

The plural form information looks like this:

3695

3696

@smallexample

3697

Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;

3698

@end smallexample

3699

3700

The @code{nplurals} value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following @code{plural} is an expression which uses the C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is @code{n}. This
expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
numeric value passed to these functions is then substituted for all uses
of the variable @code{n} in the expression. The resulting value then
must be greater than or equal to zero and smaller than the value given as the
value of @code{nplurals}.
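
The selection step can be pictured with a small C sketch. It is not
how any particular library is implemented: here the expression
@code{n != 1} is simply hard-coded as a C function instead of being
read from the header entry, and the names @code{plural_rule_two_forms}
and @code{select_plural} are hypothetical.

```c
#include <stddef.h>

/* The rule "nplurals=2; plural=n != 1;" written as a C function.  */
unsigned long
plural_rule_two_forms (unsigned long n)
{
  return n != 1;
}

/* Hypothetical selector: apply RULE to N and return the matching
   entry of FORMS, or NULL if the rule yields an index outside the
   range 0 <= index < NPLURALS.  */
const char *
select_plural (const char *const *forms, unsigned long nplurals,
               unsigned long (*rule) (unsigned long), unsigned long n)
{
  unsigned long index = rule (n);
  return index < nplurals ? forms[index] : NULL;
}
```

The range check corresponds to the requirement that the value of the
expression be smaller than @code{nplurals}. A rule that produces a
larger value has no translation to point at.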

3711

3712

@noindent

3713

The following rules are known at this point. The languages are listed
with their families. But this does not necessarily mean the information can be
generalized for the whole family (as can be easily seen in the table
below).@footnote{Additions are welcome. Send appropriate information to
@email{bug-glibc-manual@@gnu.org}.}

3718

3719

@table @asis

3720

@item Only one form:

3721

Some languages only require one single form. There is no distinction

3722

between the singular and plural form. An appropriate header entry

3723

would look like this:

3724

3725

@smallexample

3726

Plural-Forms: nplurals=1; plural=0;

3727

@end smallexample

3728

3729

@noindent

3730

Languages with this property include:

3731

3732

@table @asis

3733

@item Finno-Ugric family

3734

Hungarian

3735

@item Asian family

3736

Japanese, Korean

3737

@item Turkic/Altaic family

3738

Turkish

3739

@end table

3740

3741

@item Two forms, singular used for one only

3742

This is the form used in most existing programs since it is what English

3743

is using. A header entry would look like this:

3744

3745

@smallexample

3746

Plural-Forms: nplurals=2; plural=n != 1;

3747

@end smallexample

3748

3749

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)

3751

3752

@noindent

3753

Languages with this property include:

3754

3755

@table @asis

3756

@item Germanic family

3757

Danish, Dutch, English, German, Norwegian, Swedish

3758

@item Finno-Ugric family

3759

Estonian, Finnish

3760

@item Latin/Greek family

3761

Greek

3762

@item Semitic family

3763

Hebrew

3764

@item Romanic family

3765

Italian, Portuguese, Spanish

3766

@item Artificial

3767

Esperanto

3768

@end table

3769

3770

@item Two forms, singular used for zero and one

3771

Exceptional case in the language family. The header entry would be:

3772

3773

@smallexample

3774

Plural-Forms: nplurals=2; plural=n>1;

3775

@end smallexample

3776

3777

@noindent

3778

Languages with this property include:

3779

3780

@table @asis

3781

@item Romanic family

3782

French, Brazilian Portuguese

3783

@end table

3784

3785

@item Three forms, special case for zero

3786

The header entry would be:

3787

3788

@smallexample

3789

Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;

3790

@end smallexample

3791

3792

@noindent

3793

Languages with this property include:

3794

3795

@table @asis

3796

@item Baltic family

3797

Latvian

3798

@end table

3799

3800

@item Three forms, special cases for one and two

3801

The header entry would be:

3802

3803

@smallexample

3804

Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;

3805

@end smallexample

3806

3807

@noindent

3808

Languages with this property include:

3809

3810

@table @asis

3811

@item Celtic

3812

Gaeilge

3813

@end table

3814

3815

@item Three forms, special case for numbers ending in 1[2-9]

3816

The header entry would look like this:

3817

3818

@smallexample

3819

Plural-Forms: nplurals=3; \

3820

plural=n%10==1 && n%100!=11 ? 0 : \

3821

n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;

3822

@end smallexample

3823

3824

@noindent

3825

Languages with this property include:

3826

3827

@table @asis

3828

@item Baltic family

3829

Lithuanian

3830

@end table

3831

3832

@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]

3833

The header entry would look like this:

3834

3835

@smallexample

3836

Plural-Forms: nplurals=3; \

3837

plural=n%10==1 && n%100!=11 ? 0 : \

3838

n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

3839

@end smallexample

3840

3841

@noindent

3842

Languages with this property include:

3843

3844

@table @asis

3845

@item Slavic family

3846

Croatian, Czech, Russian, Slovak, Ukrainian

3847

@end table

3848

3849

@item Three forms, special case for one and some numbers ending in 2, 3, or 4

3850

The header entry would look like this:

3851

3852

@smallexample

3853

Plural-Forms: nplurals=3; \

3854

plural=n==1 ? 0 : \

3855

n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;

3856

@end smallexample

3857

3858

@noindent

3859

Languages with this property include:

3860

3861

@table @asis

3862

@item Slavic family

3863

Polish

3864

@end table

@item Four forms, special case for one and all numbers ending in 02, 03, or 04
The header entry would look like this:

@smallexample
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
@end smallexample

@noindent
Languages with this property include:

@table @asis
@item Slavic family
Slovenian
@end table
@end table
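The @code{plural=} expressions in these header entries use C syntax, so they can be checked by copying them into small C functions. The following is only a sketch for experimenting with three of the rules listed above; the function names are made up for this illustration and are not part of the @code{gettext} library.

```c
/* Each function returns the plural form index that the corresponding
   Plural-Forms expression selects for the count n.  The expressions are
   copied from the header entries; the function names are hypothetical.  */

/* Rule listed for the Slavic family (e.g. Russian), nplurals=3.  */
unsigned long
plural_index_russian (unsigned long n)
{
  return n % 10 == 1 && n % 100 != 11 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Rule listed for Polish, nplurals=3.  */
unsigned long
plural_index_polish (unsigned long n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}

/* Rule listed for Slovenian, nplurals=4.  */
unsigned long
plural_index_slovenian (unsigned long n)
{
  return n % 100 == 1 ? 0
         : n % 100 == 2 ? 1
         : n % 100 == 3 || n % 100 == 4 ? 2
         : 3;
}
```

Comparing the two three-form rules shows where they differ: for 21 the Russian rule selects form 0, while the Polish rule selects form 2, because only the number 1 itself gets the singular form there.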

@node GUI program problems, Optimized gettext, Plural forms, gettext
@subsection How to use @code{gettext} in GUI programs

One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts their
length. But strings which do not contain entire sentences, or at
least large fragments of a sentence, may appear in more than one
situation in the program and might need a different translation in each.
This is especially true for the one-word strings which are frequently
used in GUI programs.

As a consequence many people say that the @code{gettext} approach is
wrong and that @code{catgets} should be used instead, which indeed does
not have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.

@noindent
As an example consider the following fictional situation. A GUI program
has a menu bar with the following entries:

@smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
@end smallexample

To have the strings @code{File}, @code{Printer}, @code{Open},
@code{New}, @code{Select}, and @code{Connect} translated there has to be
at some point in the code a call to a function of the @code{gettext}
family. But in two places the string passed into the function would be
@code{Open}. The translations might not be the same and therefore we
are in the dilemma described above.

One solution to this problem is to artificially lengthen the strings
to make them unambiguous. But what would the program do if no
translation is available? The lengthened string is not what should be
printed. So we should use a slightly modified version of the functions.

To lengthen the strings a uniform method should be used. E.g., in the
example above the strings could be chosen as

@smallexample
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
@end smallexample

Now all the strings are different, and if the following little wrapper
function is used instead of @code{gettext}, everything works just
fine:

@cindex sgettext
@smallexample
  char *
  sgettext (const char *msgid)
  @{
    char *msgval = gettext (msgid);
    if (msgval == msgid)
      msgval = strrchr (msgid, '|') + 1;
    return msgval;
  @}
@end smallexample

What this little function does is to recognize the case when no
translation is available. This can be done very efficiently by a
pointer comparison since in that case the return value is the input
value itself. If there is no translation we know that the input string
is in the format we used for the menu entries and therefore contains a
@code{|} character. We simply search for the last occurrence of this
character and return a pointer to the character following it. That's it!

If one now consistently uses the lengthened string form and replaces
the @code{gettext} calls with calls to @code{sgettext} (this is normally
limited to very few places in the GUI implementation) then it is
possible to produce a program which can be internationalized.

The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
and the @code{ngettext} equivalents) can and should have corresponding
functions as well which look almost identical, except for the parameters
and the call to the underlying function.
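As an illustration of such corresponding functions, here is a minimal sketch of a more defensive wrapper and of its @code{dgettext} counterpart. The names @code{sgettext_safe} and @code{dsgettext_safe} are hypothetical and chosen only for this example. The sketch assumes that @code{|} is the separator, as in the menu strings above, and it additionally tolerates a @var{msgid} that contains no separator at all.

```c
#include <libintl.h>
#include <string.h>

/* Look up msgid; if no translation is found, return the part after the
   last '|', or the whole string when it contains no '|'.
   Hypothetical helper, not provided by the gettext package.  */
const char *
sgettext_safe (const char *msgid)
{
  const char *msgval = gettext (msgid);
  if (msgval == msgid)
    {
      /* Untranslated: strip the disambiguating prefix, if there is one.  */
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}

/* The same idea for an explicit text domain; only the lookup call and
   the parameter list differ.  Also hypothetical.  */
const char *
dsgettext_safe (const char *domainname, const char *msgid)
{
  const char *msgval = dgettext (domainname, msgid);
  if (msgval == msgid)
    {
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}
```

With no message catalog installed, @code{sgettext_safe ("Menu|File|Open")} yields @code{"Open"}, and a plain string such as @code{"Cancel"} is returned unchanged. Whether the extra check is worth having depends on how strictly a project enforces the prefix convention.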

Now there is of course the question of why such functions do not exist in
the GNU gettext package. The answer has two parts.

@itemize @bullet
@item
They are easy to write and therefore can be provided by the project they
are used in. This is not an answer by itself and must be seen together
with the second part which is:

@item
There is no way the gettext package can contain a version which can work
everywhere. The problem is the selection of the character to separate
the prefix from the actual string in the lengthened string. The
examples above used @code{|} which is a quite good choice because it
resembles a notation frequently used in this context and it also is a
character not often used in message strings.

But what if the character is used in message strings? Or if the chosen
character is not available in the character set on the machine on which
one compiles? (E.g., @code{|} is not required to exist for @w{ISO C}; this is
why the @file{iso646.h} file exists in @w{ISO C} programming environments.)
@end itemize

There is only one more comment to make. The wrapper function above
requires that the translated strings are not lengthened themselves.
This is only logical. There is no need to disambiguate the translations
(since they are never used as keys for a search) and one also saves
quite some memory and disk space by doing this.

@node Optimized gettext, , GUI program problems, gettext
@subsection Optimization of the *gettext functions

At this point of the discussion we should talk about an advantage of the
GNU @code{gettext} implementation. Some readers might have pointed out
that an internationalized program might perform poorly if some
string has to be translated in an inner loop. While this is unavoidable
when the string varies from one run of the loop to the other, it is
simply a waste of time when the string is always the same. Take the
following example:

@example
@group
@{
  while (@dots{})
    @{
      puts (gettext ("Hello world"));
    @}
@}
@end group
@end example

@noindent
When the locale selection does not change between two runs the resulting
string is always the same. One way to exploit this is:

@example
@group
@{
  str = gettext ("Hello world");
  while (@dots{})
    @{
      puts (str);
    @}
@}
@end group
@end example

@noindent
But this solution is not usable in all situations (e.g., when the locale
selection changes) nor does it lead to legible code.

For this reason, GNU @code{gettext} caches previous translation results.
When the same translation is requested twice, with no new message
catalogs being loaded in between, @code{gettext} will, the second time,
find the result through a single cache lookup.

@node Comparison, Using libintl.a, gettext, Programmers
@section Comparing the Two Interfaces

@c FIXME: arguments to catgets vs. gettext
@c Partly done 950718 -- drepper

The following discussion is perhaps a little bit biased. As said
above we implemented GNU @code{gettext} following the Uniforum
proposal and this surely has its reasons. But it should show how we
came to this decision.

First we take a look at the development process. When we write an
application using NLS provided by @code{gettext} we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated do we use @code{gettext("@dots{}")} instead of
@code{"@dots{}"}. At the beginning of each source file (or in a central
header file) we define

@example
#define gettext(String) (String)
@end example

Even this definition can be avoided when the system supports the
@code{gettext} function in its C library. When we compile this code the
result is the same as if no NLS code were used. When you take a look at
the GNU @code{gettext} code you will see that we use @code{_("@dots{}")}
instead of @code{gettext("@dots{}")}. This reduces the number of
additional characters per translatable string to @emph{3} (in words:
three).

When a production version of the program is needed we simply replace
the definition

@example
#define _(String) (String)
@end example

@noindent
by

@example
#include <libintl.h>
#define _(String) gettext (String)
@end example

@noindent
Additionally we run the program @file{xgettext} on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that become available.

The same procedure can be done for the @code{gettext_noop} invocations
(@pxref{Special cases}). One usually defines @code{gettext_noop} as a
no-op macro. So you should consider the following code for your project:

@example
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)
@end example

@code{N_} is a short form similar to @code{_}. The @file{Makefile} in
the @file{po/} directory of GNU @code{gettext} knows by default both of the
mentioned short forms so you are invited to follow this proposal for
your own ease.
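To show how the two short forms work together, here is a minimal sketch of a source file that uses both. It assumes that the build system defines a macro named @code{ENABLE_NLS} to a nonzero value when translations are wanted; that macro name, the @code{menu_labels} array and the @code{menu_label} function are assumptions made for this illustration only.

```c
#include <stddef.h>

#if ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
/* Development setup: translatable strings pass through unchanged.  */
# define _(String) (String)
#endif
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)

/* A static initializer cannot call a function, so these strings are only
   marked with N_ here and are translated later, at the point of use.  */
static const char *const menu_labels[] =
{
  N_("Open"), N_("New"), N_("Select")
};

/* Return the (possibly translated) label with index i.
   The caller must pass an index smaller than 3.  */
const char *
menu_label (size_t i)
{
  return _(menu_labels[i]);
}
```

Without @code{ENABLE_NLS}, @code{menu_label (0)} simply returns @code{"Open"}. With it, the same call goes through @code{gettext} and returns a translation when one is available.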

Now to @code{catgets}. The main problem is the work for the
programmer. Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs etc. If he wants to have the same
quality in the message catalog as the GNU @code{gettext} program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog. This is
nearly a Mission: Impossible.

But there are also some points people might call advantages speaking for
@code{catgets}. If you have a single word in a string and this string
is used in different contexts it is likely that in one or the other
language the word has different translations. Example:

@example
printf ("%s: %d", gettext ("number"), number_of_errors)

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"))
@end example

Here we have to translate the string @code{"number"} twice. Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning. In German the
first appearance has to be translated to @code{"Anzahl"} and the second
to @code{"Zahl"}.

Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem and decided that
it does not weigh that much. The solution for the above problem could
be very easy:

@example
printf ("%s %d", gettext ("number:"), number_of_errors)

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count)
@end example

We believe that we can solve all conflicts with this method. If it is
difficult one can also consider changing one of the conflicting strings a
little bit. But it is not impossible to overcome.

@code{catgets} allows the same original entry to have different translations,
but @code{gettext} has another, scalable approach for solving ambiguities
of this kind: @xref{Ambiguities}.
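The second statement of the rewritten example still chooses between singular and plural with an English-only test on @code{number_count}. One possible variation, sketched here under the assumption that the @code{ngettext} function mentioned earlier is available from @file{libintl.h}, leaves that choice to the message catalog. The helper name @code{format_number_count} is hypothetical, and the count is taken as an @code{unsigned long} to match the parameter type of @code{ngettext}.

```c
#include <libintl.h>
#include <stdio.h>

/* Format the "you should see ..." message for number_count into buf.
   ngettext picks the form that matches the count in the current language;
   without a catalog it falls back to the two English strings.
   Returns what snprintf returns.  Hypothetical helper.  */
int
format_number_count (char *buf, size_t size, unsigned long number_count)
{
  return snprintf (buf, size,
                   ngettext ("you should see %lu number",
                             "you should see %lu numbers",
                             number_count),
                   number_count);
}
```

The translator then sees the complete sentence in both forms and can supply as many plural forms as the language requires.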

@node Using libintl.a, gettext grok, Comparison, Programmers
@section Using libintl.a in own programs

Starting with version 0.9.4 the library @code{libintl.a} should be
self-contained. I.e., you can use it in your own programs without
providing additional functions. The @file{Makefile} will put the header
and the library in directories selected using the @code{$(prefix)}.

One exception to the above is found on HP-UX 10.01 systems. Here the C
library does not contain the @code{alloca} function (and the HP compiler
does not generate it inlined). But it is not intended to rewrite the whole
library just because of this dumb system. Instead include the
@code{alloca} function in every package in which you use @code{libintl.a}.

@node gettext grok, Temp Programmers, Using libintl.a, Programmers
@section Being a @code{gettext} grok

To fully exploit the functionality of the GNU @code{gettext} library it
is surely helpful to read the source code. But for those who don't want
to spend that much time in reading the (sometimes complicated) code here
is a list of comments:

@itemize @bullet
@item Changing the language at runtime

For interactive programs it might be useful to offer a selection of the
used language at runtime. To understand how to do this one needs to know
how the used language is determined while executing the @code{gettext}
function. The method which is presented here only works correctly
with the GNU implementation of the @code{gettext} functions.

In the function @code{dcgettext} at every call the current setting of
the highest priority environment variable is determined and used.
Highest priority means here the following list with decreasing
priority:

@enumerate
@item @code{LANGUAGE}
@item @code{LC_ALL}
@item @code{LC_xxx}, according to selected locale
@item @code{LANG}
@end enumerate

Afterwards the path is constructed using the found value and the
translation file is loaded if available.

What happens now when the value for, say, @code{LANGUAGE} changes? According
to the process explained above the new value of this variable is found
as soon as the @code{dcgettext} function is called. But this also means
the (perhaps) different message catalog file is loaded. In other
words: the used language is changed.

But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the calling of
the @code{dcgettext} function as long as no new catalog is loaded. But
if @code{dcgettext} is not called the program also cannot notice that the
@code{LANGUAGE} variable has changed (@pxref{Optimized gettext}). A
solution for this is very easy. Include the following code in the
language switching function.

@example
  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  @{
    extern int  _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  @}
@end example

The variable @code{_nl_msg_cat_cntr} is defined in @file{loadmsgcat.c}.
The programmer will find himself in need of a construct like this only
when developing programs which run for a longer time and allow the user to
select the language at runtime. Non-interactive programs (like all
these little Unix tools) should never need this.

@end itemize
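The priority list given above can be sketched as a small lookup function. This is only an illustration of the order described in the text, not the code used inside the library, and the real lookup may apply further rules that are ignored here. The function name is hypothetical, and @code{LC_MESSAGES} is assumed to be the @code{LC_xxx} variable that matters for message translation.

```c
#include <stdlib.h>

/* Return the name of the highest-priority environment variable that is
   set to a non-empty value, following the order LANGUAGE, LC_ALL,
   LC_MESSAGES, LANG.  Returns NULL if none of them is set.
   Hypothetical helper for illustration only.  */
const char *
message_locale_source (void)
{
  static const char *const names[] =
    { "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG" };
  size_t i;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return names[i];
    }
  return NULL;
}
```

After a call such as @code{setenv ("LANGUAGE", "fr", 1)}, this function reports @code{LANGUAGE} regardless of how the other three variables are set, which is why the language switching code in the text changes that variable.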

@node Temp Programmers, , gettext grok, Programmers
@section Temporary Notes for the Programmers Chapter

@menu
* Temp Implementations::        Temporary - Two Possible Implementations
* Temp catgets::                Temporary - About @code{catgets}
* Temp WSI::                    Temporary - Why a single implementation
* Temp Notes::                  Temporary - Notes
@end menu

@node Temp Implementations, Temp catgets, Temp Programmers, Temp Programmers
@subsection Temporary - Two Possible Implementations

There are two competing methods for language independent messages:
the X/Open @code{catgets} method, and the Uniforum @code{gettext}
method. The @code{catgets} method indexes messages by integers; the
@code{gettext} method indexes them by their English translations.
The @code{catgets} method has been around longer and is supported
by more vendors. The @code{gettext} method is supported by Sun,
and it has been heard that the COSE multi-vendor initiative is
supporting it. Neither method is a POSIX standard; the POSIX.1
committee had a lot of disagreement in this area.

Neither one is in the POSIX standard. There was much disagreement
in the POSIX.1 committee about using the @code{gettext} routines
vs. @code{catgets} (XPG). In the end the committee couldn't
agree on anything, so no messaging system was included as part
of the standard. I believe the informative annex of the standard
includes the XPG3 messaging interfaces, ``@dots{}as an example of
a messaging system that has been implemented@dots{}''

They were very careful not to say anywhere that you should use one
set of interfaces over the other. For more on this topic please
see the Programming for Internationalization FAQ.

@node Temp catgets, Temp WSI, Temp Implementations, Temp Programmers
@subsection Temporary - About @code{catgets}

There have been a few discussions of late on the use of
@code{catgets} as a base. I think it important to present both
sides of the argument and hence am opting to play devil's advocate
for a little bit.

I'll not deny the fact that @code{catgets} could have been designed
a lot better. It currently has quite a number of limitations and
these have already been pointed out.

However there is a great deal to be said for consistency and
standardization. A common recurring problem when writing Unix
software is the myriad portability problems across Unix platforms.
It seems as if every Unix vendor had a look at the operating system
and found parts they could improve upon. No doubt these
modifications are innovative and solve real problems.
However, software developers have a hard time keeping up with all
these changes across so many platforms.

And this has prompted the Unix vendors to begin to standardize their
systems. Hence the impetus for Spec1170. Every major Unix vendor
has committed to supporting this standard and every Unix software
developer awaits with glee the day they can write software to this
standard and simply recompile (without having to use autoconf)
across different platforms.

As I understand it, Spec1170 is roughly based upon version 4 of the
X/Open Portability Guidelines (XPG4). Because @code{catgets} and
friends are defined in XPG4, I'm led to believe that @code{catgets}
is a part of Spec1170 and hence will become a standardized component
of all Unix systems.

@node Temp WSI, Temp Notes, Temp catgets, Temp Programmers
@subsection Temporary - Why a single implementation

Now it seems kind of wasteful to me to have two different systems
installed for accessing message catalogs. If we do want to remedy
the deficiencies of @code{catgets}, why don't we try to expand @code{catgets}
(in a compatible manner) rather than implement an entirely new system?
Otherwise, we'll end up with two message catalog access systems installed
with an operating system - one set of routines for packages using GNU
@code{gettext} for their internationalization, and another set of routines
(catgets) for all other software. Bloated?

Suppose another catalog access system is implemented. Which do
we recommend? At least for Linux, we need to attract as many
software developers as possible. Hence we need to make it as easy
for them to port their software as possible. Which means supporting
@code{catgets}. We will be implementing the @code{libintl} code
within our @code{libc}, but does this mean we also have to incorporate
another message catalog access scheme within our @code{libc} as well?
And what about people who are going to be using the @code{libintl}
+ non-@code{catgets} routines? When they port their software to
other platforms, they're now going to have to include the front-end
(@code{libintl}) code plus the back-end code (the non-@code{catgets}
access routines) with their software instead of just including the
@code{libintl} code with their software.

Message catalog support is however only the tip of the iceberg.
What about the data for the other locale categories? They also have
a number of deficiencies. Are we going to abandon them as well and
develop another duplicate set of routines (should @code{libintl}
expand beyond message catalog support)?

Like many parts of Unix that can be improved upon, we're stuck with balancing
compatibility with the past against useful improvements and innovations for
the future.

@node Temp Notes, , Temp WSI, Temp Programmers
@subsection Temporary - Notes

X/Open agreed very late on the standard form so that many
implementations differ from the final form. Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

OK. After incorporating the last changes I have to spend some time on
making the GNU/Linux @code{libc} @code{gettext} functions. So in future
Solaris will not be the only system having @code{gettext}.

@node Translators, Maintainers, Programmers, Top
@chapter The Translator's View

@c FIXME: Reorganize whole chapter.

@menu
* Trans Intro 0::               Introduction 0
* Trans Intro 1::               Introduction 1
* Discussions::                 Discussions
* Organization::                Organization
* Information Flow::            Information Flow
@end menu

@node Trans Intro 0, Trans Intro 1, Translators, Translators
@section Introduction 0

Free software is going international! The Translation Project is a way
to get maintainers, translators and users all together, so free software
will gradually become able to speak many native languages.

The GNU @code{gettext} tool set contains @emph{everything} maintainers
need for internationalizing their packages for messages. It also
contains quite useful tools for helping translators localize
messages into their native language, once a package has already been
internationalized.

To achieve the Translation Project, we need many interested
people who like their own language and write it well, and who are also
able to synergize with other translators speaking the same language.
If you'd like to volunteer to @emph{work} at translating messages,
please send mail to your translating team.

Each team has its own mailing list, courtesy of Linux
International. You may reach your translating team at the address
@file{@var{ll}@@li.org}, replacing @var{ll} by the two-letter @w{ISO 639}
code for your language. Language codes are @emph{not} the same as
country codes given in @w{ISO 3166}. The following translating teams
exist:

@quotation
Chinese @code{zh}, Czech @code{cs}, Danish @code{da}, Dutch @code{nl},
Esperanto @code{eo}, Finnish @code{fi}, French @code{fr}, Irish
@code{ga}, German @code{de}, Greek @code{el}, Italian @code{it},
Japanese @code{ja}, Indonesian @code{in}, Norwegian @code{no}, Polish
@code{pl}, Portuguese @code{pt}, Russian @code{ru}, Spanish @code{es},
Swedish @code{sv} and Turkish @code{tr}.
@end quotation

@noindent
For example, you may reach the Chinese translating team by writing to
@file{zh@@li.org}. When you become a member of the translating team
for your own language, you may subscribe to its list. For example,
Swedish people can send a message to @w{@file{sv-request@@li.org}},
having this message body:

@example
subscribe
@end example

Keep in mind that team members should be interested in @emph{working}
at translations, or at solving translational difficulties, rather than
merely lurking around. If your team does not exist yet and you want to
start one, please write to @w{@file{translation@@iro.umontreal.ca}};
you will then reach the coordinator for all translator teams.

A handful of GNU packages have already been adapted and provided
with message translations for several languages. Translation
teams have begun to organize, using these packages as a starting
point. But there are many more packages and many languages for
which we have no volunteer translators. If you would like to
volunteer to work at translating messages, please send mail to
@file{translation@@iro.umontreal.ca} indicating what language(s)
you can work on.

@node Trans Intro 1, Discussions, Trans Intro 0, Translators
@section Introduction 1

This is now official: GNU is going international! Here is the
announcement submitted for the January 1995 GNU Bulletin:

@quotation
A handful of GNU packages have already been adapted and provided
with message translations for several languages. Translation
teams have begun to organize, using these packages as a starting
point. But there are many more packages and many languages
for which we have no volunteer translators. If you'd like to
volunteer to work at translating messages, please send mail to
@samp{translation@@iro.umontreal.ca} indicating what language(s)
you can work on.
@end quotation

This document should answer many questions for those who are curious about
the process or would like to contribute. Please at least skim over it,
in the hope of cutting down a little the high volume of e-mail generated by
this collective effort towards internationalization of free software.

Most free programming which is widely shared is done in English, and
currently, English is used as the main communicating language between
national communities collaborating on free software. This very document
is written in English. This will not change in the foreseeable future.

However, there is a strong appetite from national communities for
having more software able to write using national language and habits,
and there is an on-going effort to modify free software in such a way
that it becomes able to do so. The experiments conducted so far raised
an enthusiastic response from pretesters, so we believe that
internationalization of free software is destined to succeed.

For suggestions, clarifications, additions or corrections to this
document, please e-mail @file{translation@@iro.umontreal.ca}.

@node Discussions, Organization, Trans Intro 1, Translators
@section Discussions

Facing this internationalization effort, a few users expressed their
concerns. Some of these doubts are presented and discussed here.

@itemize @bullet
@item Smaller groups

Some languages are not spoken by a very large number of people, so people
speaking them sometimes consider that there may not be all that much
demand for such versions of free software packages. Moreover, many people
being @emph{into computers}, in some countries, generally seem to prefer
English versions of their software.

On the other hand, people might enjoy their own language a lot, and be
very motivated to give themselves the pleasure of having their
beloved free software speaking their mother tongue. They do themselves
a personal favor, and do not pay that much attention to the number of
people benefiting from their work.

@item Misinterpretation

Other users are shy to push forward their own language, seeing in this
some kind of misplaced propaganda. Someone thought there must be some
users of the language over the networks pestering other people with it.

But any spoken language is worth localization, because there are
people behind the language for whom the language is important and
dear to their hearts.

@item Odd translations

The biggest problem is to find the right translations so that
everybody can understand the messages. Translations are usually a
little odd. Some people get used to English, to the extent they may
find translations into their own language ``rather pushy, obnoxious
and sometimes even hilarious.'' As a French speaking man, I have
the experience of those instruction manuals for goods, so poorly
translated into French in Korea or Taiwan@dots{}

The fact is that we sometimes have to create a kind of national
computer culture, and this is not easy without the collaboration of
many people liking their mother tongue. This is why translations are
better achieved by people knowing and loving their own language, and
ready to work together at improving the results they obtain.

@item Dependencies over the GPL or LGPL

Some people wonder if using GNU @code{gettext} necessarily brings their
package under the protective wing of the GNU General Public License or
the GNU Library General Public License, when they do not want to make
their program free, or want other kinds of freedom. The simplest
answer is ``normally not''.

The GNU @code{gettext} library, i.e. the contents of @code{libintl},
is covered by the GNU Library General Public License. The rest of
the GNU @code{gettext} package is covered by the GNU General Public
License.

The mere marking of localizable strings in a package, or conditional
inclusion of a few lines for initialization, is not really including
GPL'ed or LGPL'ed code. However, since the localization routines in
@code{libintl} are under the LGPL, the LGPL needs to be considered.
It gives the right to distribute the complete unmodified source of
@code{libintl} even with non-free programs. It also gives the right
to use @code{libintl} as a shared library, even for non-free programs.
But it gives the right to use @code{libintl} as a static library or
to incorporate @code{libintl} into another library only to free
software.

@end itemize

@node Organization, Information Flow, Discussions, Translators
@section Organization

On a larger scale, the true solution would be to organize some kind of
fairly precise set up in which volunteers could participate. I gave
some thought to this idea lately, and realize there will be some
touchy points. I thought of writing to Richard Stallman to launch
such a project, but feel it might be good to shake out the ideas
between ourselves first. Most probably Linux International has
some experience in the field already, or would like to orchestrate
the volunteer work, maybe. Food for thought, in any case!

I guess we have to set up something early, somehow, that will help
many possible contributors of the same language to interlock and avoid
work duplication, and further be put in contact for solving together
problems particular to their tongue (in most languages, there are many
difficulties peculiar to translating technical English). My Swedish
contributor acknowledged these difficulties, and I'm well aware of
them for French.

This is surely not a technical issue, but we should manage so that the
effort of locale contributors is maximally useful, despite the national
team layer interface between contributors and maintainers.

The Translation Project needs some setup for coordinating language
coordinators. Localizing evolving programs will surely
become a permanent and continuous activity in the free software community,
once well started.
The setup should be minimally completed and tested before GNU
@code{gettext} becomes an official reality. The e-mail address
@file{translation@@iro.umontreal.ca} has been set up for receiving
offers from volunteers and general e-mail on these topics. This address
reaches the Translation Project coordinator.

@menu
* Central Coordination::        Central Coordination
* National Teams::              National Teams
* Mailing Lists::               Mailing Lists
@end menu

@node Central Coordination, National Teams, Organization, Organization
@subsection Central Coordination

I also think that GNU will need, sooner than it thinks, someone to set up
a way to organize and coordinate these groups. Some kind of group
of groups. My opinion is that it would be good for GNU to delegate
this task to a small group of collaborating volunteers, shortly.
Perhaps a list of these national committees
can be published in @file{gnu.announce}.

My role as coordinator would simply be to refer to Ulrich any German
speaking volunteer interested in localization of free software packages, and
maybe helping national groups to initially organize, while maintaining
national registries until national groups are ready to take over.
In fact, the coordinator should make it easy for volunteers to get in
contact with one another for creating national teams, which should then
select one coordinator per language, or country (regionalized language).
If well done, the coordination should be useful without being an
overwhelming task, during the time it takes to put delegations in place.

@node National Teams, Mailing Lists, Central Coordination, Organization
@subsection National Teams

I suggest we look for volunteer coordinators/editors for individual
languages. These people will scan contributions of translation files
for various programs, for their own languages, and will ensure high
and uniform standards of diction.

From my recent experience with other people, those who
provide localizations are very enthusiastic about the process, and are
more interested in the localization process than in the program they
localize, and want to do many programs, not just one. This seems
to confirm that having a coordinator/editor for each language is a
good idea.

We need to choose someone who is good at writing clear and concise
prose in the language in question. That is hard, because we can't check
it ourselves. So we need to ask a few people to judge each other's
writing and select the one who is best.

I announced my prerelease to a few dozen people, and you would not
believe all the discussions it generated already. I shudder to think
what will happen when this is launched for real, officially,
world wide. Who am I to arbitrate between two Czechoslovak users
contradicting each other, for example?

I assume that your German is not much better than my French so that
I would not be able to judge these formulations. What I would
suggest is that for each language there is a group of people who
maintain the PO files and judge changes. I suspect there will
be cultural differences between how such groups of people will behave.
Some will have relaxed ways, reach consensus easily, and have anyone
of the group relate to the maintainers, while others will fight to
death, organize heavy administrations up to national standards, and
use strict channels.

The German team is setting a good example. Right now, they are
maybe half a dozen people revising each other's translations and
discussing the linguistic issues. I do not even have all the names.
Ulrich Drepper is taking care of coordinating the German team.
He subscribed to all my pretest lists, so I do not even have to warn
him specifically of incoming releases.

I'm sure that it is a good idea to get teams for each language working
on translations. That will make the translations better and more
consistent.

@menu
* Sub-Cultures::                Sub-Cultures
* Organizational Ideas::        Organizational Ideas
@end menu

@node Sub-Cultures, Organizational Ideas, National Teams, National Teams
@subsubsection Sub-Cultures

Taking French for example, there are a few sub-cultures around computers
which developed diverging vocabularies. Picking volunteers here and
there without addressing this problem in an organized way, early in the
project, might produce a distasteful mix of internationalized programs,
and possibly trigger endless quarrels among those who really care.

Keeping some kind of unity in the way French localization of
internationalized programs is achieved is a difficult (and delicate) job.
Knowing the Latin character of French people (:-), if we take this
the wrong way, we could end up nowhere, or waste a lot of energy.
Maybe we should begin to address this problem seriously @emph{before}
GNU @code{gettext} is officially published. And I suspect that this
means soon!

@node Organizational Ideas, , Sub-Cultures, National Teams
@subsubsection Organizational Ideas

I expect the next big changes after the official release. Please note
that I use the German translation of the short GPL message. We need
to set a few good examples before the localization goes out for real
in the free software community. Here are a few points to discuss:

@itemize @bullet
@item
Each group should have one FTP server (at least one master).

@item
The files on the server should reflect the latest version (of
course!) and it should also contain an RCS directory with the
corresponding archives (I don't have this now).

@item
There should also be a ChangeLog file (this is more useful than the
RCS archive but can be generated automatically from the latter by
Emacs).

@item
A @dfn{core group} should judge questionable changes (for now
this group consists solely of me but I ask some others occasionally;
this also seems to work).

@end itemize

@node Mailing Lists, , National Teams, Organization
@subsection Mailing Lists

If we get any inquiries about GNU @code{gettext}, send them on to:

@example
@file{translation@@iro.umontreal.ca}
@end example

The @file{*-pretest} lists are quite useful to me; maybe the idea could
be generalized to many GNU and non-GNU packages. But to each maintainer
his/her way!

Fran@,{c}ois, we have a mechanism in place here at
@file{gnu.ai.mit.edu} to track teams, support mailing lists for
them and log members. We have a slight preference that you use it.
If this is OK with you, I can get you clued in.

Things are changing! A few years ago, when Daniel Fekete and I
asked for a mailing list for GNU localization, nested at the FSF, we
were politely invited to organize it anywhere else, and so we did.
For communicating with my pretesters, I later made a handful of
mailing lists located at iro.umontreal.ca and administered by
@code{majordomo}. These lists have been @emph{very} dependable
so far@dots{}

I suspect that the German team will organize itself a mailing list
located in Germany, and so forth for other countries. But before they
organize for real, it could surely be useful to offer mailing lists
located at the FSF to each national team. So yes, please explain to me
how I should proceed to create and handle them.

We should create temporary mailing lists, one per country, to help
people organize. Temporary, because once regrouped and structured, it
would be fair that the volunteers from each country bring back
@emph{their} list in there and manage it as they want. My feeling is
that, in the long run, each team should run its own list, from within
its country.
There also should be some central list to which all teams could
subscribe as they see fit, as long as each team is represented in it.

@node Information Flow, , Organization, Translators
@section Information Flow

There will surely be some discussion about these messages after the
packages are finally released. If people now send you some proposals
for better messages, how do you proceed? Jim, please note that
right now, as I put forward nearly a dozen localizable programs, I
receive both the translations and the coordination concerns about them.

If I put one of my things to pretest, Ulrich receives the announcement
and passes it on to the German team, who make last minute revisions.
Then he submits the translation files to me @emph{as the maintainer}.
For free packages I do not maintain, I would not even hear about it.
This scheme could be made to work for the whole Translation Project,
I think. For security reasons, maybe Ulrich (national coordinators,
in fact) should update the central registry kept at the Translation Project
(Jim, me, or Len's recruits) once in a while.

In December/January, I was aggressively ready to internationalize
all of GNU, giving myself the duty of one small GNU package per week
or so, taking many weeks or months for bigger packages. But it does
not work this way. I first did all the things I'm responsible for.
I've nothing against some missionary work on other maintainers, but
I'm also losing a lot of energy over it: the same debates over again.

And when the first localized packages are released we'll get a lot of
responses about ugly translations :-). Surely, we need to have
a fairly good idea beforehand about how to handle the information
flow between the national teams and the package maintainers.

Please start saving somewhere a quick history of each PO file. I know
for sure that the file format will change, allowing for comments.
It would be nice if each file had a kind of log, and references for
those who want to submit comments or gripes, or otherwise contribute.
I sent a proposal for a fast and flexible format, but it has not
yet been accepted by the GNU deciders. I'll tell you when I
have more information about this.

@node Maintainers, Conclusion, Translators, Top
@chapter The Maintainer's View

The maintainer of a package has many responsibilities. One of them
is ensuring that the package will install easily on many platforms,
and that the magic we described earlier (@pxref{Users}) will work
for installers and end users.

Of course, there are many possible ways by which GNU @code{gettext}
might be integrated in a distribution, and this chapter does not cover
them in all generality. Instead, it details one possible approach which
is especially suitable for many free software distributions following GNU
standards, or even better, Gnits standards, because GNU @code{gettext}
is meant to help the internationalization of the whole GNU
project, and as many other good free packages as possible. So, the
maintainer's view presented here presumes that the package already has
a @file{configure.in} file and uses GNU Autoconf.

Nevertheless, GNU @code{gettext} may surely be useful for free packages
not following GNU standards and conventions, but the maintainers of such
packages might have to show imagination and initiative in organizing
their distributions so that @code{gettext} works for them in all situations.
There are surely many, out there.

Even if @code{gettext} methods are now stabilizing, slight adjustments
might be needed between successive @code{gettext} versions, so you
should ideally reread this chapter in subsequent releases, looking
for changes.

@menu
* Flat and Non-Flat::           Flat or Non-Flat Directory Structures
* Prerequisites::               Prerequisite Works
* gettextize Invocation::       Invoking the @code{gettextize} Program
* Adjusting Files::             Files You Must Create or Alter
@end menu

@node Flat and Non-Flat, Prerequisites, Maintainers, Maintainers
@section Flat or Non-Flat Directory Structures

Some free software packages are distributed as @code{tar} files which unpack
in a single directory; these are said to be @dfn{flat} distributions.
Other free software packages have a one-level hierarchy of subdirectories, using
for example a subdirectory named @file{doc/} for the Texinfo manual and
man pages, another called @file{lib/} for holding functions meant to
replace or complement C libraries, and a subdirectory @file{src/} for
holding the proper sources for the package. These other distributions
are said to be @dfn{non-flat}.

We cannot say much about flat distributions. A flat
directory structure has the disadvantage of increasing the difficulty
of updating to a new version of GNU @code{gettext}. Also, if you have
many PO files, this could somewhat pollute your single directory.
Also, GNU @code{gettext}'s libintl sources consist of C sources, shell
scripts, @code{sed} scripts and complicated Makefile rules, which don't
fit well into an existing flat structure. For these reasons, we
recommend using the non-flat approach in this case as well.

Maybe because GNU @code{gettext} itself has a non-flat structure,
we have more experience with this approach, and this is what will be
described in the remainder of this chapter. Some maintainers might
use this as an opportunity to unflatten their package structure.

@node Prerequisites, gettextize Invocation, Flat and Non-Flat, Maintainers
@section Prerequisite Works

There are some works which are required for using GNU @code{gettext}
in one of your packages. These works have some kind of generality
that escapes the point by point descriptions used in the remainder
of this chapter. So, we describe them here.

@itemize @bullet
@item
Before attempting to use @code{gettextize} you should install some
other packages first.
Ensure that recent versions of GNU @code{m4}, GNU Autoconf and GNU
@code{gettext} are already installed at your site, and if not, proceed
to do this first. If you have to install these things, beware that
GNU @code{m4} must be fully installed before GNU Autoconf is even
@emph{configured}.

To further ease the task of a package maintainer the @code{automake}
package was designed and implemented. GNU @code{gettext} now uses this
tool and the @file{Makefile}s in the @file{intl/} and @file{po/}
directories therefore know about all the goals necessary for using
@code{automake} and @file{libintl} in one project.

Those four packages are only needed by you, as a maintainer; the
installers of your own package and end users do not really need any of
GNU @code{m4}, GNU Autoconf, GNU @code{gettext}, or GNU @code{automake}
for successfully installing and running your package, with messages
properly translated. But this is not completely true if you provide
internationalized shell scripts within your own package: GNU
@code{gettext} shall then be installed at the user site if the end users
want to see the translation of shell script messages.

@item
Your package should use Autoconf and have a @file{configure.in} file.
If it does not, you have to learn how. The Autoconf documentation
is quite well written; it is a good idea to print it and get
familiar with it.

@item
Your C sources should have already been modified according to
instructions given earlier in this manual. @xref{Sources}.

@item
Your @file{po/} directory should receive all PO files submitted to you
by the translator teams, each having @file{@var{ll}.po} as a name.
It is not usually easy to get translation
work done before your package gets internationalized and available!
Since the cycle has to start somewhere, the easiest for the maintainer
is to start with absolutely no PO files, and wait until various
translator teams get interested in your package, and submit PO files.

@end itemize

It is worth adding here a few words about how the maintainer should
ideally behave with PO file submissions. As a maintainer, your role is
to authenticate the origin of the submission as being the representative
of the appropriate translating team of the Translation Project (forward
the submission to @file{translation@@iro.umontreal.ca} in case of doubt),
to ensure that the PO file format is not severely broken and does not
prevent successful installation, and for the rest, merely to put these
PO files in @file{po/} for distribution.
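
As an illustration only, the format check mentioned above could be
scripted along the following lines. This is a minimal sketch and not
part of GNU @code{gettext}: the function name @code{check_po} is
hypothetical, and the sketch assumes that the installed @code{msgfmt}
accepts @samp{-o} to name its output file, so that compiling into
@file{/dev/null} merely reports whether the PO file can be processed.

```shell
# check_po FILE: hypothetical helper, not part of GNU gettext.
# Tries to compile FILE with msgfmt, discarding the output, so that a
# severely broken PO file is noticed before it is put into po/.
# Exit status: 0 if msgfmt accepted the file, 2 if msgfmt is not
# installed, otherwise the failure status returned by msgfmt.
check_po() (
  if command -v msgfmt >/dev/null 2>&1; then
    msgfmt -o /dev/null "$1"
  else
    echo "check_po: msgfmt not found, cannot check $1" >&2
    exit 2
  fi
)
```

A maintainer might run @code{check_po po/de.po} on each submission
before adding it to the distribution. Such a check says nothing about
the quality of the translations, which remains the business of the
translation team.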

As a maintainer, you do not have to take on your shoulders the
responsibility of checking if the translations are adequate or
complete, and should avoid diving into linguistic matters. Translation
teams drive themselves and are fully responsible for their linguistic
choices for the Translation Project. Keep in mind that translator teams are @emph{not}
driven by maintainers. You can help by carefully redirecting all
communications and reports from users about linguistic matters to the
appropriate translation team, or by explaining to users how to reach or join
their team. The simplest might be to send them the @file{ABOUT-NLS} file.

Maintainers should @emph{never ever} apply PO file bug reports
themselves, short-cutting translation teams. If some translator has
difficulty getting some of her points through her team, that is no
reason for her to negotiate translations directly with maintainers.
Teams ought to settle their problems themselves, if any. If you, as
a maintainer, ever think there is a real problem with a team, please
never try to @emph{solve} a team's problem on your own.

@node gettextize Invocation, Adjusting Files, Prerequisites, Maintainers
@section Invoking the @code{gettextize} Program

Some files are consistently and identically needed in every package
internationalized through GNU @code{gettext}. As a matter of
convenience, the @code{gettextize} program puts all these files right
in your package. This program has the following synopsis:

@example
gettextize [ @var{option}@dots{} ] [ @var{directory} ]
@end example

@noindent
and accepts the following options:

@table @samp
@item -c
@itemx --copy
Copy the needed files instead of making symbolic links. Using links
would allow the package to always use the latest @code{gettext} code
available on the system, but it might disturb some mechanism the
maintainer is used to applying to the sources. Because running
@code{gettextize} is easy there shouldn't be problems with using copies.

@item -f
@itemx --force
Force replacement of files which already exist.

@item -h
@itemx --help
Display this help and exit.

@item --version
Output version information and exit.

@end table

If @var{directory} is given, this is the top level directory of a
package to prepare for using GNU @code{gettext}. If not given, it
is assumed that the current directory is the top level directory of
such a package.

The program @code{gettextize} provides the following files. However,
no existing file will be replaced unless the option @code{--force}
(@code{-f}) is specified.

@enumerate
@item
The @file{ABOUT-NLS} file is copied in the main directory of your package,
the one being at the top level. This file gives the main indications
about how to install and use the Native Language Support features
of your program. You might elect to use a more recent copy of this
@file{ABOUT-NLS} file than the one provided through @code{gettextize},
if you have one handy. You may also fetch a more recent copy of file
@file{ABOUT-NLS} from Translation Project sites, and from most GNU
archive sites.

@item
A @file{po/} directory is created for eventually holding
all translation files, but initially only containing the file
@file{po/Makefile.in.in} from the GNU @code{gettext} distribution
(beware the double @samp{.in} in the file name). If the @file{po/}
directory already exists, it will be preserved along with the files
it contains, and only @file{Makefile.in.in} will be overwritten.

@item
An @file{intl/} directory is created and filled with most of the files
originally in the @file{intl/} directory of the GNU @code{gettext}
distribution. Also, if option @code{--force} (@code{-f}) is given,
the @file{intl/} directory is emptied first.

@end enumerate

If your site supports symbolic links, @code{gettextize} will not
actually copy the files into your package, but establish symbolic
links instead. This avoids duplicating the disk space needed in
all packages. Merely using the @samp{-h} option while creating the
@code{tar} archive of your distribution will replace each link with an
actual copy in the distribution archive. So, to insist, you really
should use the @samp{-h} option with @code{tar} in the @code{dist}
goal of your main @file{Makefile.in}.
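
The effect of @samp{-h} can be tried on a scratch directory. The
following is only a sketch under stated assumptions: the function name
@code{make_dist_archive} is hypothetical, and a real @code{dist} goal
would normally assemble a versioned directory before archiving it.

```shell
# make_dist_archive TOPDIR ARCHIVE: hypothetical helper.
# Archives TOPDIR into ARCHIVE with tar, using -h so that symbolic
# links are followed and the archive holds real copies of the files
# they point to, rather than the links themselves.
make_dist_archive() (
  parent=$(dirname "$1")
  base=$(basename "$1")
  tar -chf "$2" -C "$parent" "$base"
)
```

Unpacking such an archive elsewhere gives ordinary files under
@file{intl/}, so installers do not need the original link targets on
their own systems.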

It is interesting to understand that most new files for supporting
GNU @code{gettext} facilities in one package go in @file{intl/}
and @file{po/} subdirectories. One distinction between these two
directories is that @file{intl/} is meant to be completely identical
in all packages using GNU @code{gettext}, while all newly created
files, which have to be different, go into @file{po/}. There is a
common @file{Makefile.in.in} in @file{po/}, because the @file{po/}
directory needs its own @file{Makefile}, and it has been designed so
it can be identical in all packages.

@node Adjusting Files, , gettextize Invocation, Maintainers
@section Files You Must Create or Alter

Besides files which are automatically added through @code{gettextize},
there are many files needing revision for properly interacting with
GNU @code{gettext}. If you are closely following GNU standards for
Makefile engineering and auto-configuration, the adaptations should
be easier to achieve. Here is a point by point description of the
changes needed in each.

So, here comes a list of files, each one followed by a description of
all alterations it needs. Many examples are taken from the GNU
@code{gettext} @value{VERSION} distribution itself. You may indeed
refer to the source code of the GNU @code{gettext} package, as it
is intended to be a good example and master implementation for using
its own functionality.

@menu
* po/POTFILES.in::              @file{POTFILES.in} in @file{po/}
* configure.in::                @file{configure.in} at top level
* config.guess::                @file{config.guess}, @file{config.sub} at top level
* aclocal::                     @file{aclocal.m4} at top level
* acconfig::                    @file{acconfig.h} at top level
* Makefile::                    @file{Makefile.in} at top level
* src/Makefile::                @file{Makefile.in} in @file{src/}
@end menu

@node po/POTFILES.in, configure.in, Adjusting Files, Adjusting Files
@subsection @file{POTFILES.in} in @file{po/}

The @file{po/} directory should receive a file named
@file{POTFILES.in}. This file tells which files, among all program
sources, have marked strings needing translation. Here is an example
of such a file:

@example
@group
# List of source files containing translatable strings.

# Common library files
lib/error.c
lib/getopt.c
lib/xmalloc.c

# Package source files
src/gettext.c
src/msgfmt.c
src/xgettext.c
@end group
@end example

@noindent
Hash-marked comments and blank lines are ignored. All other lines
list those source files containing strings marked for translation
(@pxref{Mark Keywords}), in a notation relative to the top level
of your whole distribution, rather than the location of the
@file{POTFILES.in} file itself.
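
The reading rules just described can be mimicked with a few lines of
shell, which is handy for spotting stale entries. This is only a sketch:
the function names @code{list_potfiles} and @code{missing_potfiles} are
hypothetical, and the sketch assumes that a comment line is one whose
first non-blank character is a hash mark.

```shell
# list_potfiles FILE: hypothetical helper.
# Prints the source files named in a POTFILES.in-style FILE, skipping
# hash-marked comment lines and blank lines.
list_potfiles() (
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1"
)

# missing_potfiles FILE TOPDIR: hypothetical helper.
# Prints every listed file that does not exist relative to TOPDIR,
# the top level of the distribution.
missing_potfiles() (
  list_potfiles "$1" | while IFS= read -r f; do
    test -f "$2/$f" || printf '%s\n' "$f"
  done
)
```

Run from the top level of a package, @code{missing_potfiles
po/POTFILES.in .} prints nothing when every listed source file is
present.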

@node configure.in, config.guess, po/POTFILES.in, Adjusting Files
@subsection @file{configure.in} at top level

@enumerate
@item Declare the package and version.

This is done by a set of lines like these:

@example
PACKAGE=gettext
VERSION=@value{VERSION}
AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
AC_SUBST(PACKAGE)
AC_SUBST(VERSION)
@end example

@noindent
Of course, you replace @samp{gettext} with the name of your package,
and @samp{@value{VERSION}} with its version number, exactly as they
should appear in the packaged @code{tar} file name of your distribution
(@file{gettext-@value{VERSION}.tar.gz}, here).

@item Declare the available translations.

This is done by defining @code{ALL_LINGUAS} to the whitespace-separated,
quoted list of available languages, in a single line, like this:

@example
ALL_LINGUAS="de fr"
@end example

@noindent
This example means that German and French PO files are available, so
that these languages are currently supported by your package. If you
want to further restrict, at installation time, the set of installed
languages, this should not be done by modifying @code{ALL_LINGUAS} in
@file{configure.in}, but rather by using the @code{LINGUAS} environment
variable (@pxref{Installers}).

@item Check for internationalization support.

Here is the main @code{m4} macro for triggering internationalization
support. Just add this line to @file{configure.in}:

@example
AM_GNU_GETTEXT
@end example

@noindent
This call is purposely simple, even if it generates a lot of
configure-time checking and actions.

@item Have output files created.

The @code{AC_OUTPUT} directive, at the end of your @file{configure.in}
file, needs to be modified as follows:

@example
AC_OUTPUT([@var{existing configuration files} intl/Makefile po/Makefile.in],
[@var{existing additional actions}])
@end example

The modification to the first argument to @code{AC_OUTPUT} asks
for substitution in the @file{intl/} and @file{po/} directories.
Note the @samp{.in} suffix used for @file{po/} only. This is because
the distributed file is really @file{po/Makefile.in.in}.

@end enumerate
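
To illustrate the second item above, the way @code{LINGUAS} narrows
@code{ALL_LINGUAS} at installation time can be pictured as keeping only
the languages present in both lists. The following is a sketch of that
selection, not the code which @code{AM_GNU_GETTEXT} actually runs: the
function name @code{select_linguas} is hypothetical, and the sketch
assumes that an empty @code{LINGUAS} means no restriction at all.

```shell
# select_linguas ALL_LINGUAS LINGUAS: hypothetical helper.
# Prints the languages of ALL_LINGUAS which are also named in LINGUAS,
# in ALL_LINGUAS order.  An empty LINGUAS is taken to mean "no
# restriction" (an assumption of this sketch).
select_linguas() (
  all=$1
  wanted=$2
  if test -z "$wanted"; then
    printf '%s\n' "$all"
    exit 0
  fi
  result=
  for lang in $all; do
    for w in $wanted; do
      if test "$lang" = "$w"; then
        result="$result $lang"
      fi
    done
  done
  # Strip the leading space left by the loop above.
  printf '%s\n' "$result" | sed 's/^ //'
)
```

With @code{ALL_LINGUAS="de fr"}, an installer who sets
@code{LINGUAS="fr ja"} would thus get only the French catalog, since no
Japanese PO file is shipped with the package.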

@node config.guess, aclocal, configure.in, Adjusting Files
@subsection @file{config.guess}, @file{config.sub} at top level

You need to add the GNU @file{config.guess} and @file{config.sub} files
to your distribution. They are needed because the @file{intl/} directory
has platform-dependent support for determining the locale's character
encoding and therefore needs to identify the platform.

You can obtain the newest version of @file{config.guess} and
@file{config.sub} from @file{ftp://ftp.gnu.org/pub/gnu/config/}.
Less recent versions are also contained in the GNU @code{automake} and
GNU @code{libtool} packages.

Normally, @file{config.guess} and @file{config.sub} are put at the
top level of a distribution. But it is also possible to put them in a
subdirectory, together with other configuration support files like
@file{install-sh}, @file{ltconfig}, @file{ltmain.sh},
@file{mkinstalldirs} or @file{missing}. All you need to do, other than
moving the files, is to add the following line to your
@file{configure.in}:

@example
AC_CONFIG_AUX_DIR([@var{subdir}])
@end example

@node aclocal, acconfig, config.guess, Adjusting Files
@subsection @file{aclocal.m4} at top level

If you do not have an @file{aclocal.m4} file in your distribution,
the simplest approach is to concatenate the files @file{codeset.m4},
@file{gettext.m4}, @file{glibc21.m4}, @file{iconv.m4}, @file{isc-posix.m4},
@file{lcmessage.m4} and @file{progtest.m4} from GNU @code{gettext}'s
@file{m4/} directory into a single file.
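
As a rough sketch of this concatenation, written as a Makefile rule to
match the other fragments in this chapter: @code{GETTEXT_M4DIR} and
@code{GETTEXT_M4} are hypothetical variables, not defined by GNU
@code{gettext} itself, and the path shown is a placeholder that you must
replace with the actual location of GNU @code{gettext}'s @file{m4/}
directory.

@example
# Hypothetical location of gettext's m4/ directory; adjust as needed.
GETTEXT_M4DIR = /path/to/gettext/m4
GETTEXT_M4 = codeset.m4 gettext.m4 glibc21.m4 iconv.m4 isc-posix.m4 \
             lcmessage.m4 progtest.m4

# The rule has no prerequisites, so make runs it only when
# aclocal.m4 does not exist yet.
aclocal.m4:
	(cd $(GETTEXT_M4DIR) && cat $(GETTEXT_M4)) > $@@
@end example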

If you already have an @file{aclocal.m4} file, then you will have
to merge these macro files into your @file{aclocal.m4}. Note that if
you are upgrading from a previous release of GNU @code{gettext}, you
should most probably @emph{replace} the macros (@code{AM_GNU_GETTEXT},
@code{AM_WITH_NLS}, etc.), as they usually
change a little from one release of GNU @code{gettext} to the next.
Their contents may vary as we get more experience with strange systems
out there.

These macros check for the internationalization support functions
and related information. Hopefully, once stabilized, these macros
might be integrated into the standard Autoconf set, because this
piece of @code{m4} code will be the same for all projects using GNU
@code{gettext}.

@node acconfig, Makefile, aclocal, Adjusting Files
@subsection @file{acconfig.h} at top level

Earlier GNU @code{gettext} releases required you to put definitions for
@code{ENABLE_NLS}, @code{HAVE_GETTEXT}, @code{HAVE_LC_MESSAGES},
@code{HAVE_STPCPY}, @code{PACKAGE} and @code{VERSION} into an
@file{acconfig.h} file. This is no longer needed; you can remove
them from your @file{acconfig.h} file unless your package uses them
independently of the @file{intl/} directory.

@node Makefile, src/Makefile, acconfig, Adjusting Files
@subsection @file{Makefile.in} at top level

Here are a few modifications you need to make to your main, top-level
@file{Makefile.in} file.

@enumerate
@item
Add the following lines near the beginning of your @file{Makefile.in},
so the @samp{dist:} goal will work properly (as explained further down):

@example
PACKAGE = @@PACKAGE@@
VERSION = @@VERSION@@
@end example

@item
Add the file @file{ABOUT-NLS} to the @code{DISTFILES} definition, so that it
gets distributed.

@item
Wherever you process subdirectories in your @file{Makefile.in}, be sure
you also process the subdirectories @samp{intl} and @samp{po}. Special
rules in the @file{Makefile}s take care of the case where no
internationalization is wanted.

If you are using Makefiles that are either generated by automake or
hand-written to carefully follow the GNU coding standards, the affected
goals for which the new subdirectories must be handled include
@samp{installdirs}, @samp{install}, @samp{uninstall}, @samp{clean} and
@samp{distclean}.

Here is an example of a canonical order of processing. In this
example, we also define @code{SUBDIRS} in @file{Makefile.in} so that
it can be used further in the @samp{dist:} goal.

@example
SUBDIRS = doc intl lib src @@POSUB@@
@end example

@noindent
You will have to adapt this list to your own package.

Note that you must arrange for @samp{make} to descend into the
@code{intl} directory before descending into other directories containing
code which makes use of the @code{libintl.h} header file. For this
reason, here we mention @code{intl} before @code{lib} and @code{src}.

@item
A delicate point is the @samp{dist:} goal, as both
@file{intl/Makefile} and @file{po/Makefile} will later assume that the
proper directory has been set up from the main @file{Makefile}. Here is
an example of what the @samp{dist:} goal might look like:

@example
distdir = $(PACKAGE)-$(VERSION)
dist: Makefile
	rm -fr $(distdir)
	mkdir $(distdir)
	chmod 777 $(distdir)
	for file in $(DISTFILES); do \
	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
	done
	for subdir in $(SUBDIRS); do \
	  mkdir $(distdir)/$$subdir || exit 1; \
	  chmod 777 $(distdir)/$$subdir; \
	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
	done
	tar chozf $(distdir).tar.gz $(distdir)
	rm -fr $(distdir)
@end example

@end enumerate
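
The third item above asks that the standard goals also descend into
@file{intl/} and @file{po/}. As a rough sketch, assuming a hand-written
@file{Makefile.in} that defines @code{SUBDIRS} as shown above and has no
other recipe for these goals, a single rule can forward each goal to every
subdirectory. This rule is illustrative only and is not taken from the GNU
@code{gettext} distribution; Makefiles generated by automake already
contain equivalent rules.

@example
# Illustrative only: forward each standard goal to every subdirectory,
# in the order given by SUBDIRS, stopping at the first failure.
installdirs install uninstall clean distclean:
	for subdir in $(SUBDIRS); do \
	  (cd $$subdir && $(MAKE) $@@) || exit 1; \
	done
@end example

@noindent
A real package would normally add its own top-level actions to each of
these goals, for example removing generated files in @samp{distclean}.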

@node src/Makefile, , Makefile, Adjusting Files
@subsection @file{Makefile.in} in @file{src/}

Some of the modifications made in the main @file{Makefile.in} will
also be needed in the @file{Makefile.in} of your package sources,
which we assume here to be in the @file{src/} subdirectory. Here are
all the modifications needed in @file{src/Makefile.in}:

@enumerate
@item
In view of the @samp{dist:} goal, you should have these lines near the
beginning of @file{src/Makefile.in}:

@example
PACKAGE = @@PACKAGE@@
VERSION = @@VERSION@@
@end example

@item
If you have not done so already, make sure that @code{top_srcdir}
is defined. It is used to locate @code{cpp} include files. Just add
the line:

@example
top_srcdir = @@top_srcdir@@
@end example

@item
You might also want to define @code{subdir} as @samp{src}, which later
allows almost uniform @samp{dist:} goals in all your
@file{Makefile.in} files. At least, the @samp{dist:} goal below assumes that
you used:

@example
subdir = src
@end example

@item
The @code{main} function of your program will normally call
@code{bindtextdomain} (@pxref{Triggering}), like this:

@example
bindtextdomain (@var{PACKAGE}, LOCALEDIR);
@end example

To make @code{LOCALEDIR} known to the program, add the following lines to
@file{Makefile.in}:

@example
datadir = @@datadir@@
localedir = $(datadir)/locale
DEFS = -DLOCALEDIR=\"$(localedir)\" @@DEFS@@
@end example

Note that @code{@@datadir@@} defaults to @samp{$(prefix)/share}, thus
@code{$(localedir)} defaults to @samp{$(prefix)/share/locale}.

@item
You should ensure that the final linking will use @code{@@INTLLIBS@@} as
a library. An easy way to achieve this is to arrange for it to be included in
@code{LIBS}, like this:

@example
LIBS = @@INTLLIBS@@ @@LIBS@@
@end example

In most packages internationalized with GNU @code{gettext}, one will
find a directory @file{lib/} in which a library containing some helper
functions will be built. (You need at least the few functions which the
GNU @code{gettext} Library itself needs.) However, some of the functions
in @file{lib/} also give messages to the user, which of course should be
translated, too. Because each library then needs symbols from the other,
it is not enough to place the support
library (say @file{libsupport.a}) just between @code{@@INTLLIBS@@}
and @code{@@LIBS@@} in the above example. Instead, one has to write this:

@example
LIBS = ../lib/libsupport.a @@INTLLIBS@@ ../lib/libsupport.a @@LIBS@@
@end example

@item
You should also ensure that directory @file{intl/} will be searched for
C preprocessor include files in all circumstances. So, you have to
arrange for both @samp{-I../intl} and @samp{-I$(top_srcdir)/intl} to
be passed to the C compiler.

@item
Your @samp{dist:} goal has to be consistent with the others. Here is a
reasonable definition for it:

@example
distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
dist: Makefile $(DISTFILES)
	for file in $(DISTFILES); do \
	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
	done
@end example

@end enumerate
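
The item above about searching @file{intl/} for include files gives no
concrete lines. One possible arrangement, sketched here under the
assumption that the C compilation rule in @file{src/Makefile.in} uses
variables named @code{INCLUDES}, @code{CPPFLAGS} and @code{CFLAGS} (these
names and the suffix rule are illustrative, not mandated by GNU
@code{gettext}), is:

@example
# Search both the build-tree and the source-tree copies of intl/,
# which also covers builds done outside the source directory.
INCLUDES = -I../intl -I$(top_srcdir)/intl

.c.o:
	$(CC) -c $(DEFS) $(INCLUDES) $(CPPFLAGS) $(CFLAGS) $<
@end example

@noindent
If your package already passes include options through another variable,
add the two @samp{-I} options there instead.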

@node Conclusion, Language Codes, Maintainers, Top
@chapter Concluding Remarks

We would like to conclude this GNU @code{gettext} manual by presenting
a history of the Translation Project so far. Finally, we give
a few pointers for those who want to do further research or reading
about Native Language Support matters.

@menu
* History::                     History of GNU @code{gettext}
* References::                  Related Readings
@end menu

@node History, References, Conclusion, Conclusion
@section History of GNU @code{gettext}

Internationalization concerns and algorithms have been informally
and casually discussed for years in GNU, sometimes around GNU
@code{libc}, maybe around the upcoming @code{Hurd}, or otherwise
(nobody clearly remembers). Even so, when the work started for
real, it proceeded somewhat independently of these earlier discussions.

This all began in July 1994, when Patrick D'Cruze had the idea of, and
took the initiative in, internationalizing version 3.9.2 of GNU
@code{fileutils}.
He then asked Jim Meyering, the maintainer, how to get those changes
folded into an official release. That first draft was full of
@code{#ifdef}s and somewhat disconcerting, and Jim wanted to find
nicer ways. Patrick and Jim shared some trials and experiments
in this area. Then, feeling that this might eventually have a deeper
impact on GNU, Jim wanted to know what the standards were, and contacted
Richard Stallman, who very quickly and verbally described an overall
design for what was, at that time, meant to become @code{glocale}.

Jim implemented @code{glocale} and got a lot of exhausting feedback
from Patrick and Richard, of course, but also from Mitchum DSouza
(who wrote a @code{catgets}-like package), Roland McGrath, maybe David
MacKenzie, Fran@,{c}ois Pinard, and Paul Eggert, all pushing and
pulling in various directions, not always compatible, to the extent
that after a couple of test releases, @code{glocale} was torn apart.

While Jim took some distance and time and became a dad for the second
time, Roland wanted to get GNU @code{libc} internationalized, and
got Ulrich Drepper involved in that project. Instead of starting
from @code{glocale}, Ulrich rewrote something from scratch, but
more conformant to the set of guidelines that emerged out of the
@code{glocale} effort. Then, Ulrich got people from the previous
forum to get involved in this new project, and the switch
from @code{glocale} to what was first named @code{msgutils}, renamed
@code{nlsutils}, and later @code{gettext}, became officially accepted
by Richard in May 1995 or so.

Let's summarize by saying that Ulrich Drepper wrote GNU @code{gettext}
in April 1995. The first official release of the package, including
PO mode, occurred in July 1995, and was numbered 0.7. Other people
contributed to the effort by providing a discussion forum around
Ulrich, writing little pieces of code, or testing. They are listed
in the @file{THANKS} file which comes with the GNU @code{gettext}
distribution.

While this was being done, Fran@,{c}ois adapted half a dozen
GNU packages to @code{glocale} first, then later to @code{gettext},
putting them in pretest, and so providing along the way an effective
user environment for fine-tuning the evolving tools. He also took on
the responsibility of organizing and coordinating the Translation
Project. After nearly a year of informal exchanges between people from
many countries, translator teams started to exist in May 1995, through
the creation and support by Patrick D'Cruze of twenty unmoderated
mailing lists for that many native languages, and two moderated
lists: one for reaching all teams at once, the other for reaching
all willing maintainers of internationalized free software packages.

Fran@,{c}ois also wrote PO mode in June 1995 with the collaboration
of Greg McGary, as a kind of contribution to Ulrich's package.
He also gave a hand with the GNU @code{gettext} Texinfo manual.

@node References, , History, Conclusion
@section Related Readings

Eugene H. Dorr (@file{dorre@@well.com}) maintains an interesting
bibliography on internationalization matters, called
@cite{Internationalization Reference List}, which is available as:
@example
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
@end example

Michael Gschwind (@file{mike@@vlsivie.tuwien.ac.at}) maintains a
Frequently Asked Questions (FAQ) list, entitled @cite{Programming for
Internationalisation}. This FAQ discusses writing programs which
can handle different language conventions, character sets, etc.,
and is applicable to all character set encodings, with particular
emphasis on @w{ISO 8859-1}. It is regularly published in Usenet
groups @file{comp.unix.questions}, @file{comp.std.internat},
@file{comp.software.international}, @file{comp.lang.c},
@file{comp.windows.x}, @file{comp.std.c}, @file{comp.answers}
and @file{news.answers}. The home location of this document is:
@example
ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
@end example

Patrick D'Cruze (@file{pdcruze@@li.org}) wrote a tutorial about NLS
matters, and Jochen Hein (@file{Hein@@student.tu-clausthal.de}) took
over the responsibility of maintaining it. It may be found as:
@example
ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
...locale-tutorial-0.8.txt.gz
@end example
@noindent
This site is mirrored at:
@example
ftp://ftp.ibp.fr/pub/linux/sunsite/
@end example

A French version of the same tutorial should be available at:
@example
ftp://ftp.ibp.fr/pub/linux/french/docs/
@end example
@noindent
together with French translations of many Linux-related documents.

@node Language Codes, Country Codes, Conclusion, Top
@appendix Language Codes

The @w{ISO 639} standard defines two-letter codes for many languages.
All abbreviations for languages used in the Translation Project should
come from this standard.

@table @samp
@include iso-639.texi
@end table

@node Country Codes, , Language Codes, Top
@appendix Country Codes

The @w{ISO 3166} standard defines two-letter codes for many countries
and territories. All abbreviations for countries used in the Translation
Project should come from this standard.

@table @samp
@include iso-3166.texi
@end table

@contents
@bye

@c Local variables:
@c texinfo-column-for-description: 32
@c End: