
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_10.html">previous</A>, <A HREF="gettext_12.html">next</A>, <A HREF="gettext_25.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.

<HR>

<H1><A NAME="SEC178" HREF="gettext_toc.html#TOC178">11 The Programmer's View</A></H1>

One aim of the current message catalog implementation provided by
GNU <CODE>gettext</CODE> was to use the system's message catalog handling, if the
installer wishes to do so. So we should perhaps first take a look at
the solutions we know about. The people on the POSIX committee did not
manage to agree on one of the semi-official standards which we'll
describe below. In fact they couldn't agree on anything, so they decided
only to include an example of an interface. The major Unix vendors
are split between the two most important specifications: X/Open's
<CODE>catgets</CODE> vs. Uniforum's <CODE>gettext</CODE> interface. We'll describe them both and
later explain our solution to this dilemma.

<H2><A NAME="SEC179" HREF="gettext_toc.html#TOC179">11.1 About <CODE>catgets</CODE></A></H2>

The <CODE>catgets</CODE> implementation is defined in the X/Open Portability
Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
process of creating this standard seemed to be too slow for some of
the Unix vendors, so they based their implementations on preliminary
versions of the standard. Of course this again leads to problems when
writing platform-independent programs: even the use of <CODE>catgets</CODE>
does not guarantee a uniform interface.

Another, personal comment on this: only a bunch of committee members
could have made this interface. They never really tried to program
using it. It is a fast, memory-saving implementation, and a
user can happily live with it. But programmers hate it (at least I and
some others do...)

But we must not forget one point: after all the trouble with transferring
the rights to Unix(tm), they at last came to X/Open, the very same organization that
published this specification. This leads me to predict
that this interface will be in future Unix standards (e.g. Spec1170) and
therefore part of all Unix implementations (that is, implementations which are
allowed to carry this name).

<H3><A NAME="SEC180" HREF="gettext_toc.html#TOC180">11.1.1 The Interface</A></H3>

The interface to the <CODE>catgets</CODE> implementation consists of three
functions which correspond to those used in file access: <CODE>catopen</CODE>
to open the catalog for use, <CODE>catgets</CODE> for accessing the message
tables, and <CODE>catclose</CODE> for closing after work is done. Prototypes
for the functions and the needed definitions are in the
<CODE>&lt;nl_types.h&gt;</CODE> header file.

<CODE>catopen</CODE> is used like this:

<PRE>
nl_catd catd = catopen ("catalog_name", 0);
</PRE>

The function takes as its first argument the name of the catalog. This usually
refers to the name of the program or the package. The second parameter
is not further specified in the standard. I don't even know whether it
is implemented consistently among various systems. So the common advice
is to use <CODE>0</CODE> as the value. The return value is a handle to the
message catalog, equivalent to the file handles returned by <CODE>open</CODE>.

This handle is of course used in the <CODE>catgets</CODE> function, which can
be used like this:

<PRE>
char *translation = catgets (catd, set_no, msg_id, "original string");
</PRE>

The first parameter is the catalog descriptor. The second parameter
specifies the set of messages in this catalog from which the message
identified by <CODE>msg_id</CODE> is obtained. <CODE>catgets</CODE> therefore uses
three-stage addressing:

<PRE>
catalog name =&gt; set number =&gt; message ID =&gt; translation
</PRE>

The fourth argument is not used to address the translation. It is given
as a default value in case one of the addressing stages fails. One
important thing to remember is that although the return type of <CODE>catgets</CODE>
is <CODE>char *</CODE>, the resulting string must not be changed. It
would better be <CODE>const char *</CODE>, but the standard was published in
1988, one year before ANSI C.

The last of these functions is used and behaves as expected:

<PRE>
catclose (catd);
</PRE>

After this, no <CODE>catgets</CODE> call using the descriptor is legal anymore.
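The open-use-close cycle described above can be sketched as one small function. This is only an illustration: <CODE>lookup_message</CODE> is a hypothetical helper, not part of the interface, and the explicit check against <CODE>(nl_catd) -1</CODE> assumes the usual convention that <CODE>catopen</CODE> reports failure with that value.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copy message MSG_ID of set SET_NO from the named
   catalog into BUF, or copy FALLBACK if the catalog cannot be opened or
   the message is missing.  Returns BUF.  */
const char *
lookup_message (const char *catalog_name, int set_no, int msg_id,
                const char *fallback, char *buf, size_t buflen)
{
  nl_catd catd = catopen (catalog_name, 0);

  if (catd == (nl_catd) -1)
    {
      /* No catalog: behave as catgets would and use the default text.  */
      snprintf (buf, buflen, "%s", fallback);
      return buf;
    }

  /* catgets hands back FALLBACK when the set or message is not found.
     The result must not be modified, so it is copied instead.  */
  snprintf (buf, buflen, "%s", catgets (catd, set_no, msg_id, fallback));

  /* After catclose the descriptor must not be used anymore.  */
  catclose (catd);
  return buf;
}
```

Copying the text before calling <CODE>catclose</CODE> is a design choice of this sketch; it avoids keeping a pointer into a catalog that has already been closed.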

<H3><A NAME="SEC181" HREF="gettext_toc.html#TOC181">11.1.2 Problems with the <CODE>catgets</CODE> Interface?!</A></H3>

Now that this description seemed to be really easy -- where are the
problems we speak of? In fact the interface could be used in a
reasonable way, but constructing the message catalogs is a pain. The
reason for this lies in the third argument of <CODE>catgets</CODE>: the unique
message ID. This has to be a numeric value that is unique among all messages in a single
set. Perhaps you can imagine the problems of keeping such a list up to date while
changing the source code. Add a new message here, remove one there. Of
course a lot of tools have been developed to help organize this
chaos, but each of them fails in one aspect or another. We don't
want to say that the other approach has no problems, but they are far
easier to manage.

<H2><A NAME="SEC182" HREF="gettext_toc.html#TOC182">11.2 About <CODE>gettext</CODE></A></H2>

The definition of the <CODE>gettext</CODE> interface comes from a Uniforum
proposal. It was submitted there by Sun, who had implemented the
<CODE>gettext</CODE> function in SunOS 4, around 1990. Nowadays, the
<CODE>gettext</CODE> interface is specified by the OpenI18N standard.

The main point about this solution is that it does not follow the
method of normal file handling (open-use-close) and that it does not
burden the programmer with so many tasks, especially the unique key handling.
Of course a unique key is needed here too, but this key is the message
itself (however long or short it is). See section <A HREF="gettext_11.html#SEC190">11.3 Comparing the Two Interfaces</A> for a more
detailed comparison of the two methods.

The following section contains a rather detailed description of the
interface. We make it that detailed because this is the interface
we chose for the GNU <CODE>gettext</CODE> Library. Programmers
using this library will want to read this description.

<H3><A NAME="SEC183" HREF="gettext_toc.html#TOC183">11.2.1 The Interface</A></H3>

The minimal functionality an interface must have is a) to select a
domain the strings are coming from (a single domain for all programs is
not reasonable because its construction and maintenance is difficult,
perhaps impossible) and b) to access a string in a selected domain.

This is essentially the description of the <CODE>gettext</CODE> interface. It
has a global domain which unqualified usages reference. Of course this
domain is selectable by the user.

<PRE>
char *textdomain (const char *domain_name);
</PRE>

This provides the possibility to change or query the
current global domain of the <CODE>LC_MESSAGES</CODE> category. The
argument is a null-terminated string, whose characters must be legal
in file names. If the <VAR>domain_name</VAR> argument is <CODE>NULL</CODE>,
the function returns the current value. If no value has been set
before, the name of the default domain is returned: <CODE>messages</CODE>.
Please note that although the return value of <CODE>textdomain</CODE> is of
type <CODE>char *</CODE>, it must not be changed. It is also important to know
that no checks of the availability are made. If the name is not
available, you will notice this by the fact that no translations are provided.
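Querying and setting the global domain might look like the following sketch. The helper names are hypothetical, and treating a <CODE>NULL</CODE> return as failure is an assumption based on the GNU implementation, which is not spelled out in the text above.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: returns 1 while the global domain is still the
   default one ("messages"), 0 otherwise.  */
int
uses_default_domain (void)
{
  /* A NULL argument only queries.  The returned string must not be
     modified.  */
  const char *current = textdomain (NULL);
  return current != NULL && strcmp (current, "messages") == 0;
}

/* Hypothetical helper: make DOMAIN_NAME the global domain.  Returns 1
   on success.  No check is made that a catalog for it exists.  */
int
select_domain (const char *domain_name)
{
  /* Assumption: a NULL result signals failure (e.g. out of memory).  */
  return textdomain (domain_name) != NULL;
}
```

Note that <CODE>select_domain</CODE> succeeds even for a domain without any catalog; as the text says, the only visible effect is that no translations are provided.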

To use a domain set by <CODE>textdomain</CODE> the function

<PRE>
char *gettext (const char *msgid);
</PRE>

is to be used. This is the simplest reasonable form one can imagine.
The translation of the string <VAR>msgid</VAR> is returned if it is available
in the current domain. If it is not available, the argument itself is
returned. If the argument is <CODE>NULL</CODE> the result is undefined.

One thing which should come to mind is that no explicit dependency on
the domain in use is given. The current value of the domain is used.
If this changes between two
executions of the same <CODE>gettext</CODE> call in the program, the two calls
reference different message catalogs.

In the easiest case, which is the one normally used in internationalized
packages, a call to <CODE>textdomain</CODE> is issued once at the beginning of execution,
setting the domain to a unique name, normally the package
name. In the code that follows, all strings which have to be translated are
filtered through the <CODE>gettext</CODE> function. That's all, the package speaks
your language.
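A minimal sketch of this easiest case follows. The domain name <CODE>hello-sketch</CODE>, the helper names, and the <CODE>_</CODE> shorthand macro are assumptions made for illustration; the <CODE>setlocale</CODE> call is not mentioned in the paragraph above but is what adopts the user's locale settings in the first place.

```c
#include <libintl.h>
#include <locale.h>

/* Common shorthand (a convention, not part of the interface).  */
#define _(msgid) gettext (msgid)

/* Hypothetical start-up routine: call once at the beginning of
   execution.  */
void
init_i18n (void)
{
  /* Adopt the locale from the user's environment.  */
  setlocale (LC_ALL, "");
  /* Select the package's domain as the global one.  */
  textdomain ("hello-sketch");
}

/* Every translatable string is filtered through gettext.  Without a
   matching catalog the argument itself comes back.  */
const char *
greeting (void)
{
  return _("Hello, world!");
}
```

A program would call <CODE>init_i18n ()</CODE> first and then print <CODE>greeting ()</CODE>; with no catalog installed for the domain, the English text is shown unchanged.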

<H3><A NAME="SEC184" HREF="gettext_toc.html#TOC184">11.2.2 Solving Ambiguities</A></H3>

While this single name domain works well for most applications, there
might be the need to get translations from more than one domain. Of
course one could switch between different domains with calls to
<CODE>textdomain</CODE>, but this is neither convenient nor fast. A
possible situation could be one case subject to discussion during this
writing: all
error messages of functions in the set of commonly used functions should
go into a separate domain <CODE>error</CODE>. By this means we would only need
to translate them once.
Another case is messages from a library, as these have to be
independent of the current domain set by the application.

For these reasons there are two more functions to retrieve strings:

<PRE>
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
</PRE>

Both take an additional argument in the first place, which corresponds
to the argument of <CODE>textdomain</CODE>. The third argument of
<CODE>dcgettext</CODE> allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>.
But I really don't know where this can be useful. If the
<VAR>domain_name</VAR> is <CODE>NULL</CODE> or <VAR>category</VAR> has a value other than
the known ones, the result is undefined. It should also be noted that
this function is not part of the second known implementation of this
function family, the one found in Solaris.
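The library case mentioned above can be sketched as follows. The domain name <CODE>libfoo-sketch</CODE> and the function <CODE>libfoo_strerror</CODE> are hypothetical; the point is only that the library names its own domain on every lookup and never touches the application's global domain.

```c
#include <libintl.h>

/* Hypothetical domain owned by the library, independent of whatever
   the application passed to textdomain.  */
#define LIBFOO_DOMAIN "libfoo-sketch"

/* Hypothetical library function: translate an error code into a
   message taken from the library's own domain.  */
const char *
libfoo_strerror (int code)
{
  switch (code)
    {
    case 0:
      return dgettext (LIBFOO_DOMAIN, "no error");
    case 1:
      return dgettext (LIBFOO_DOMAIN, "file not found");
    default:
      return dgettext (LIBFOO_DOMAIN, "unknown error");
    }
}
```

Because the domain is passed explicitly, an application may call <CODE>textdomain</CODE> with any name it likes without affecting which catalog the library consults.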

A second ambiguity can arise from the fact that more than one
domain may have the same name. This can be solved by specifying where the
needed message catalog files can be found.

<PRE>
char *bindtextdomain (const char *domain_name,
                      const char *dir_name);
</PRE>

Calling this function binds the given domain to a file in the specified
directory (how this file is determined follows below). In particular, a
file in the system's default place is no longer favored over the specified
file (as it would be by solely using <CODE>textdomain</CODE>). A
<CODE>NULL</CODE> pointer for the <VAR>dir_name</VAR> parameter returns the binding
associated with <VAR>domain_name</VAR>. If <VAR>domain_name</VAR> itself is
<CODE>NULL</CODE> nothing happens and a <CODE>NULL</CODE> pointer is returned. Here
again, as for all the other functions, none of the return
values may be changed!

It is important to remember that relative path names for the
<VAR>dir_name</VAR> parameter can be trouble. Since the path is always
computed relative to the current directory, different results will be
obtained when the program executes a <CODE>chdir</CODE> command. Relative
paths should always be avoided to prevent such dependencies and
unreliabilities.
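One way to follow this advice is to refuse relative directories before they reach <CODE>bindtextdomain</CODE>. The wrapper below is a hypothetical sketch and assumes Unix-style paths, where an absolute path starts with a slash.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical wrapper: bind DOMAIN_NAME to DIR_NAME only if DIR_NAME
   is an absolute path.  Returns the bound directory (which must not be
   modified), or NULL if the arguments are rejected or the call fails.  */
const char *
bind_domain_absolute (const char *domain_name, const char *dir_name)
{
  /* Assumption: Unix-style paths, absolute means a leading '/'.  */
  if (domain_name == NULL || dir_name == NULL || dir_name[0] != '/')
    return NULL;
  return bindtextdomain (domain_name, dir_name);
}
```

Afterwards, <CODE>bindtextdomain (domain_name, NULL)</CODE> can be used to query the binding, as described above.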

<H3><A NAME="SEC185" HREF="gettext_toc.html#TOC185">11.2.3 Locating Message Catalog Files</A></H3>

Because many different languages for many different packages have to be
stored, we need some way to attach this information to the message catalog
files. The way usually used in Unix environments is to encode it
in the file name. This is also done here. The directory name given in
<CODE>bindtextdomain</CODE>'s second argument (or the default directory),
followed by the name of the locale, the locale category, and the domain name
are concatenated:

<PRE>
<VAR>dir_name</VAR>/<VAR>locale</VAR>/LC_<VAR>category</VAR>/<VAR>domain_name</VAR>.mo
</PRE>

The default value for <VAR>dir_name</VAR> is system specific. For the GNU
library, and for packages adhering to its conventions, it's:

<PRE>
/usr/local/share/locale
</PRE>

<VAR>locale</VAR> is the name of the locale selected for the category designated by
<CODE>LC_<VAR>category</VAR></CODE>. For <CODE>gettext</CODE> and <CODE>dgettext</CODE> this
<CODE>LC_<VAR>category</VAR></CODE> is always <CODE>LC_MESSAGES</CODE>.<A NAME="DOCF3" HREF="gettext_foot.html#FOOT3">(3)</A>
The name of the locale is determined through
<CODE>setlocale (LC_<VAR>category</VAR>, NULL)</CODE>.

When using the function <CODE>dcgettext</CODE>, you can specify the locale category
through the third argument.
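The concatenation rule can be written down directly. <CODE>catalog_path</CODE> is a hypothetical helper that only builds the name according to the pattern shown above; it does not open the file and ignores any additional searching the real library may perform.

```c
#include <stdio.h>

/* Hypothetical helper: build
     DIR_NAME/LOCALE/CATEGORY_NAME/DOMAIN_NAME.mo
   in BUF.  CATEGORY_NAME is the full category name, for example
   "LC_MESSAGES".  Returns 1 on success, 0 if BUF is too small.  */
int
catalog_path (char *buf, size_t buflen, const char *dir_name,
              const char *locale, const char *category_name,
              const char *domain_name)
{
  int n = snprintf (buf, buflen, "%s/%s/%s/%s.mo",
                    dir_name, locale, category_name, domain_name);
  return n >= 0 && (size_t) n < buflen;
}
```

For the default directory, the locale <CODE>de</CODE>, and a domain called <CODE>hello</CODE>, this yields <CODE>/usr/local/share/locale/de/LC_MESSAGES/hello.mo</CODE>.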

<H3><A NAME="SEC186" HREF="gettext_toc.html#TOC186">11.2.4 How to specify the output character set <CODE>gettext</CODE> uses</A></H3>

<CODE>gettext</CODE> not only looks up a translation in a message catalog. It
also converts the translation on the fly to the desired output character
set. This is useful if the user is working in a different character set
than the translator who created the message catalog, because it avoids
distributing variants of message catalogs which differ only in the
character set.

The output character set is, by default, the value of
<CODE>nl_langinfo (CODESET)</CODE>, which depends on the <CODE>LC_CTYPE</CODE> part of the current
locale. But programs which store strings in a locale independent way
(e.g. UTF-8) can request that <CODE>gettext</CODE> and related functions
return the translations in that encoding, by use of the
<CODE>bind_textdomain_codeset</CODE> function.

Note that the <VAR>msgid</VAR> argument to <CODE>gettext</CODE> is not subject to
character set conversion. Also, when <CODE>gettext</CODE> does not find a
translation for <VAR>msgid</VAR>, it returns <VAR>msgid</VAR> unchanged --
independently of the current output character set. It is therefore
recommended that all <VAR>msgid</VAR>s be US-ASCII strings.

<DL>
<DT>Function: char * bind_textdomain_codeset (const char *<VAR>domainname</VAR>, const char *<VAR>codeset</VAR>)

The <CODE>bind_textdomain_codeset</CODE> function can be used to specify the
output character set for message catalogs for domain <VAR>domainname</VAR>.
The <VAR>codeset</VAR> argument must be a valid codeset name which can be used
for the <CODE>iconv_open</CODE> function, or a null pointer.

If the <VAR>codeset</VAR> parameter is the null pointer,
<CODE>bind_textdomain_codeset</CODE> returns the currently selected codeset
for the domain with the name <VAR>domainname</VAR>. It returns <CODE>NULL</CODE> if
no codeset has yet been selected.

The <CODE>bind_textdomain_codeset</CODE> function can be used several times.
If used multiple times with the same <VAR>domainname</VAR> argument, the
later call overrides the settings made by the earlier one.

The <CODE>bind_textdomain_codeset</CODE> function returns a pointer to a
string containing the name of the selected codeset. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
<CODE>bind_textdomain_codeset</CODE>, the return value is <CODE>NULL</CODE> and the
global variable <VAR>errno</VAR> is set accordingly.
</DL>
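A program that keeps its strings in UTF-8 might use the function as in this sketch. <CODE>request_utf8</CODE> is a hypothetical helper, and the domain name passed to it is whatever the package uses.

```c
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: ask that translations of DOMAIN_NAME be
   returned in UTF-8, whatever the locale's own character set is.
   Returns 1 on success, 0 if the library reports failure.  */
int
request_utf8 (const char *domain_name)
{
  /* The returned string is owned by the library and must not be
     modified.  */
  return bind_textdomain_codeset (domain_name, "UTF-8") != NULL;
}
```

Calling <CODE>bind_textdomain_codeset (domain_name, NULL)</CODE> before and after shows the effect: it returns <CODE>NULL</CODE> while nothing has been selected and the selected name afterwards.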

<H3><A NAME="SEC187" HREF="gettext_toc.html#TOC187">11.2.5 Using contexts for solving ambiguities</A></H3>

One place where the <CODE>gettext</CODE> functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs). The
problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts the
length. But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program and might need different translations. This is
especially true for the one-word strings which are frequently used in
GUI programs.

As a consequence many people say that the <CODE>gettext</CODE> approach is
wrong and that <CODE>catgets</CODE> should be used instead, which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the <CODE>gettext</CODE> functions.

Contexts can be added to strings to be translated. A context-dependent
translation lookup is a search for the translation of a given string
that is limited to a given context. The translation for the same string
in a different context can be different. The different translations of
the same string in different contexts can be stored in the same
MO file, and can be edited by the translator in the same PO file.

The <TT>‘gettext.h’</TT> include file contains the lookup macros for strings
with contexts. They are implemented as thin macros and inline functions
over the functions from <CODE>&lt;libintl.h&gt;</CODE>.

<PRE>
const char *pgettext (const char *msgctxt, const char *msgid);
</PRE>

In a call of this macro, <VAR>msgctxt</VAR> and <VAR>msgid</VAR> must be string
literals. The macro returns the translation of <VAR>msgid</VAR>, restricted
to the context given by <VAR>msgctxt</VAR>.

The <VAR>msgctxt</VAR> string is visible in the PO file to the translator.
You should try to make it somehow canonical and never changing, because
every time you change a <VAR>msgctxt</VAR>, the translator will have to review
the translation of <VAR>msgid</VAR>.

Finding a canonical <VAR>msgctxt</VAR> string that doesn't change over time can
be hard. But you shouldn't use the file name or class name containing the
<CODE>pgettext</CODE> call -- because it is a common development task to rename
a file or a class, and it shouldn't cause translator work. Also you shouldn't
use a comment in the form of a complete English sentence as <VAR>msgctxt</VAR> --
because orthography or grammar changes are often applied to such sentences,
and again, it shouldn't force the translator to do a review.

The <SAMP>‘p’</SAMP> in <SAMP>‘pgettext’</SAMP> stands for “particular”: <CODE>pgettext</CODE>
fetches a particular translation of the <VAR>msgid</VAR>.

<PRE>
const char *dpgettext (const char *domain_name,
                       const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name,
                        const char *msgctxt, const char *msgid,
                        int category);
</PRE>

These are generalizations of <CODE>pgettext</CODE>. They behave similarly to
<CODE>dgettext</CODE> and <CODE>dcgettext</CODE>, respectively. The <VAR>domain_name</VAR>
argument defines the translation domain. The <VAR>category</VAR> argument
allows the use of a locale category other than <CODE>LC_MESSAGES</CODE>.

As an example consider the following fictional situation. A GUI program
has a menu bar with the following entries:

<PRE>
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
</PRE>

To have the strings <CODE>File</CODE>, <CODE>Printer</CODE>, <CODE>Open</CODE>,
<CODE>New</CODE>, <CODE>Select</CODE>, and <CODE>Connect</CODE> translated, there has to be
at some point in the code a call to a function of the <CODE>gettext</CODE>
family. But in two places the string passed into the function would be
<CODE>Open</CODE>. The translations might not be the same and therefore we
are in the dilemma described above.

What distinguishes the two places is the menu path from the menu root to
the particular menu entries:

<PRE>
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
</PRE>

The context is thus the menu path without its last part. So, the calls
look like this:

<PRE>
pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")
</PRE>

Whether or not to use the <SAMP>‘|’</SAMP> character at the end of the context is a
matter of style.
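To make the idea of a context-restricted lookup concrete, here is a simplified stand-in written only with functions from <CODE>&lt;libintl.h&gt;</CODE>. It is a sketch, not the code from <TT>‘gettext.h’</TT>: the name <CODE>sketch_pgettext</CODE>, the fixed buffer size, and the use of the <CODE>"\004"</CODE> character to join context and message in the lookup key are assumptions of this illustration.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical stand-in for a context lookup: search for the key
   "MSGCTXT\004MSGID" in the current domain and fall back to MSGID
   itself when no translation is found.  */
const char *
sketch_pgettext (const char *msgctxt, const char *msgid)
{
  char key[256];
  int n = snprintf (key, sizeof key, "%s\004%s", msgctxt, msgid);

  /* Key does not fit: behave as if there were no translation.  */
  if (n < 0 || (size_t) n >= sizeof key)
    return msgid;

  const char *translation = gettext (key);

  /* gettext returns its argument when nothing is found; in that case
     show the plain message, not the combined key.  */
  if (translation == key)
    return msgid;
  return translation;
}
```

With this helper, <CODE>sketch_pgettext ("Menu|File|", "Open")</CODE> and <CODE>sketch_pgettext ("Menu|Printer|", "Open")</CODE> use different keys and can therefore receive different translations, while both show <CODE>Open</CODE> when no catalog is installed.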

For more complex cases, where the <VAR>msgctxt</VAR> or <VAR>msgid</VAR> are not
string literals, more general macros are available:

<PRE>
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name,
                            const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name,
                             const char *msgctxt, const char *msgid,
                             int category);
</PRE>

Here <VAR>msgctxt</VAR> and <VAR>msgid</VAR> can be arbitrary string-valued expressions.
These macros are more general. But when both argument expressions
are string literals, the macros without the <SAMP>‘_expr’</SAMP> suffix are more
efficient.

<H3><A NAME="SEC188" HREF="gettext_toc.html#TOC188">11.2.6 Additional functions for plural forms</A></H3>

The functions of the <CODE>gettext</CODE> family described so far (and all the
<CODE>catgets</CODE> functions as well) have one problem in the real world
which has been neglected completely in all existing approaches. What
is meant here is the handling of plural forms.

Looking through Unix source code before the time anybody thought about
internationalization (and, sadly, even afterwards) one can often find
code similar to the following:

<PRE>
printf ("%d file%s deleted", n, n == 1 ? "" : "s");
</PRE>

After the first complaints from people internationalizing the code, people
either completely avoided formulations like this or used strings like
<CODE>"file(s)"</CODE>. Both look unnatural and should be avoided. The first
attempts to solve the problem correctly looked like this:

<PRE>
if (n == 1)
  printf ("%d file deleted", n);
else
  printf ("%d files deleted", n);
</PRE>

But this does not solve the problem. It helps languages where the
plural form of a noun is not simply constructed by adding an
‘s’,
but that is all. Once again people fell into the trap of believing the
rules their language is using are universal. But the handling of plural
forms differs widely between the language families. For example,
Rafal Maszkowski <CODE>&lt;rzm@mat.uni.torun.pl&gt;</CODE> reports:

<BLOCKQUOTE>
In Polish we use e.g. plik (file) this way:

<PRE>
1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w
</PRE>

and so on (o' means 8859-2 oacute which should be rather okreska,
similar to aogonek).
</BLOCKQUOTE>
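The quoted rule can be written as a small selection function. This is a sketch: the function name and the numbering of the three forms (0 for "plik", 1 for "pliki", 2 for "pliko'w") are choices made for this illustration.

```c
/* Hypothetical helper: index of the Polish form to use for N files.
   0 selects "plik", 1 selects "pliki", 2 selects "pliko'w".  */
unsigned int
polish_plural_index (unsigned long int n)
{
  if (n == 1)
    return 0;
  /* Numbers ending in 2, 3 or 4 take the second form, except those
     ending in 12, 13 or 14.  */
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
    return 1;
  return 2;
}
```

For the values in the table this gives 0 for 1, 1 for 2 to 4 and 22 to 24, and 2 for 5 to 21 and 25 to 31. A simple test for <CODE>n == 1</CODE> clearly cannot express this.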

There are two things which can differ between languages (and even inside
language families):

<UL>
<LI>
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an
‘s’)
is hardly found in German.

<LI>
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms. More
information on this in an extra section.
</UL>

The consequence of this is that application writers should not try to
solve the problem in their code. This would be localization since it is
only usable for certain, hardcoded language environments. Instead the
extended <CODE>gettext</CODE> interface should be used.

These extra functions take two strings and a numerical argument instead of the one key
string. The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the translator.
The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
<CODE>gettext</CODE> behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.

This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language. This is a limitation, but since the GNU C library
(as well as the GNU <CODE>gettext</CODE> package) is written as part of the
GNU package and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.

<DL>
<DT>Function: char * ngettext (const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)

The <CODE>ngettext</CODE> function is similar to the <CODE>gettext</CODE> function
as it finds the message catalogs in the same way. But it takes two
extra arguments. The <VAR>msgid1</VAR> parameter must contain the singular
form of the string to be converted. It is also used as the key for the
search in the catalog. The <VAR>msgid2</VAR> parameter is the plural form.
The parameter <VAR>n</VAR> is used to determine the plural form. If no
message catalog is found <VAR>msgid1</VAR> is returned if <CODE>n == 1</CODE>,
otherwise <CODE>msgid2</CODE>.

An example for the use of this function is:

<PRE>
printf (ngettext ("%d file removed", "%d files removed", n), n);
</PRE>

Please note that the numeric value <VAR>n</VAR> has to be passed to the
<CODE>printf</CODE> function as well. It is not sufficient to pass it only to
<CODE>ngettext</CODE>.

In the English singular case, the number -- always 1 -- can be replaced with
"one":

<PRE>
printf (ngettext ("One file removed", "%d files removed", n), n);
</PRE>

This works because the <SAMP>‘printf’</SAMP> function discards excess arguments that
are not consumed by the format string.

If this function is meant to yield a format string that takes two or more
arguments, you can not use it like this:

<PRE>
printf (ngettext ("%d file removed from directory %s",
                  "%d files removed from directory %s",
                  n),
        n, dir);
</PRE>

because in many languages the translators want to replace the <SAMP>‘%d’</SAMP>
with an explicit word in the singular case, just like “one” in English,
and C format strings cannot consume the second argument but skip the first
argument. Instead, you have to reorder the arguments so that <SAMP>‘n’</SAMP>
comes last:

<PRE>
printf (ngettext ("%2$d file removed from directory %1$s",
                  "%2$d files removed from directory %1$s",
                  n),
        dir, n);
</PRE>

See section <A HREF="gettext_15.html#SEC249">15.3.1 C Format Strings</A> for details about this argument reordering syntax.

When you know that the value of <CODE>n</CODE> is within a given range, you can
specify it as a comment directed to the <CODE>xgettext</CODE> tool. This
information may help translators to use more adequate translations. Like
this:

<PRE>
if (days &gt; 7 &amp;&amp; days &lt; 14)
  /* xgettext: range: 1..6 */
  printf (ngettext ("one week and one day", "one week and %d days",
                    days - 7),
          days - 7);
</PRE>

It is also possible to use this function when the strings don't contain a
cardinal number:

<PRE>
puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));
</PRE>

In this case the number <VAR>n</VAR> is only used to choose the plural form.
</DL>
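Putting the pieces together, a helper that formats such a message might look like the sketch below. <CODE>format_removed</CODE> is a hypothetical name, and the sketch uses <CODE>%lu</CODE> because its count is an <CODE>unsigned long int</CODE>, unlike the <CODE>int</CODE> used in the examples above.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write "N file(s) removed" in the right plural
   form into BUF.  Returns the value of snprintf.  */
int
format_removed (char *buf, size_t buflen, unsigned long int n)
{
  /* N is passed twice: once to ngettext to pick the form, and once to
     snprintf to fill in the number.  */
  return snprintf (buf, buflen,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

Without a message catalog the Germanic fallback applies, so a count of 1 selects the first string and every other count selects the second.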

<DL>
<DT>Function: char * dngettext (const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>)

The <CODE>dngettext</CODE> function is similar to the <CODE>dgettext</CODE> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.
</DL>

884

885

886

887

<DL>

888

<DT>Function: char * dcngettext (const char *<VAR>domain</VAR>, const char *<VAR>msgid1</VAR>, const char *<VAR>msgid2</VAR>, unsigned long int <VAR>n</VAR>, int <VAR>category</VAR>)

889

890

The <CODE>dcngettext</CODE> function is similar to the <CODE>dcgettext</CODE> function in the
way the message catalog is selected. The difference is that it takes
two extra parameters to provide the correct plural form. These two
parameters are handled in the same way <CODE>ngettext</CODE> handles them.

894

</DL>

895

896

897

898

Now, how do these functions solve the problem of the plural forms?

899

Without the input of linguists (which was not available) it was not

900

possible to determine whether there are only a few different forms in

901

which plural forms are formed or whether the number can increase with

902

every new supported language.

903

904

905

906

Therefore the solution implemented is to allow the translator to specify
the rules of how to select the plural form. Since the formula varies
with every language this is the only viable solution except for
hardcoding the information in the code (which would still have to be
extensible so as not to prevent the use of new languages).

911

912

913

914

915

916

917

The information about the plural form selection has to be stored in the

918

header entry of the PO file (the one with the empty <CODE>msgid</CODE> string).

919

The plural form information looks like this:

920

921

922

923

<PRE>

Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;

</PRE>

926

927

928

The <CODE>nplurals</CODE> value must be a decimal number which specifies how
many different plural forms exist for this language. The string
following <CODE>plural</CODE> is an expression in C language
syntax. Exceptions are that no negative numbers are allowed, numbers
must be decimal, and the only variable allowed is <CODE>n</CODE>. Spaces are
allowed in the expression, but backslash-newlines are not; in the
examples below the backslash-newlines are present for formatting purposes
only. This expression will be evaluated whenever one of the functions
<CODE>ngettext</CODE>, <CODE>dngettext</CODE>, or <CODE>dcngettext</CODE> is called. The
numeric value passed to these functions is then substituted for all uses
of the variable <CODE>n</CODE> in the expression. The resulting value
must be greater than or equal to zero and smaller than the value given as the
value of <CODE>nplurals</CODE>.
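To make the evaluation step concrete, here is a minimal sketch in C of what applying the header <SAMP>‘nplurals=2; plural=n == 1 ? 0 : 1;’</SAMP> amounts to. The names <CODE>plural_index</CODE> and <CODE>select_form</CODE> are hypothetical and are not part of the <CODE>gettext</CODE> API. The fallback to the first form for an out-of-range result is an assumption made for this sketch only.

```c
/* Hypothetical stand-in for the expression "plural=n == 1 ? 0 : 1".  */
static unsigned long
plural_index (unsigned long n)
{
  return n == 1 ? 0 : 1;
}

/* Pick one of the nplurals translated forms for the count n.  The result
   of the expression must satisfy 0 <= index < nplurals; this sketch falls
   back to the first form otherwise (an assumption, not documented
   behavior).  */
static const char *
select_form (const char *const forms[], unsigned long nplurals,
             unsigned long n)
{
  unsigned long index = plural_index (n);
  if (index >= nplurals)
    index = 0;
  return forms[index];
}
```

The array passed as <CODE>forms</CODE> plays the role of the translated plural strings of a single PO entry, in order.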

941

942

943

944

945

The following rules are known at this point. The languages are listed
together with their families, but this does not necessarily mean that the
information can be generalized for the whole family (as can easily be seen
in the table below).<A NAME="DOCF5" HREF="gettext_foot.html#FOOT5">(5)</A>

<DL>

<DT>Only one form:

<DD>
Some languages only require one single form. There is no distinction
between the singular and plural form. An appropriate header entry
would look like this:

<PRE>
Plural-Forms: nplurals=1; plural=0;
</PRE>

Languages with this property include:

<DL>
<DT>Asian family
<DD>
Japanese,
Vietnamese,
Korean

973

</DL>

974

975

<DT>Two forms, singular used for one only

<DD>
This is the form used in most existing programs since it is what English
uses. A header entry would look like this:

<PRE>
Plural-Forms: nplurals=2; plural=n != 1;
</PRE>

(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)

987

988

Languages with this property include:

<DL>
<DT>Germanic family
<DD>
English,
German,
Dutch,
Swedish,
Danish,
Norwegian,
Faroese
<DT>Romanic family
<DD>
Spanish,
Portuguese,
Italian
<DT>Slavic family
<DD>
Bulgarian
<DT>Latin/Greek family
<DD>
Greek
<DT>Finno-Ugric family
<DD>
Finnish,
Estonian
<DT>Semitic family
<DD>
Hebrew
<DT>Artificial
<DD>
Esperanto

1020

</DL>

1021

1022

Other languages using the same header entry are:

<DL>
<DT>Finno-Ugric family
<DD>
Hungarian
<DT>Turkic/Altaic family
<DD>
Turkish

1032

</DL>

1033

1034

Hungarian does not appear to have a plural if you look at sentences involving
cardinal numbers. For example, “1 apple” is “1 alma”, and “123 apples” is
“123 alma”. But when the number is not explicit, the distinction between
singular and plural exists: “the apple” is “az alma”, and “the apples” is
“az almák”. Since <CODE>ngettext</CODE> has to support both types of sentences,
it is classified here, under “two forms”.

1040

1041

The same holds for Turkish: “1 apple” is “1 elma”, and “123 apples” is

1042

“123 elma”. But when the number is omitted, the distinction between singular

1043

and plural exists: “the apple” is “elma”, and “the apples” is

1044

“elmalar”.

1045

1046

<DT>Two forms, singular used for zero and one

<DD>
Exceptional case in the language family. The header entry would be:

<PRE>
Plural-Forms: nplurals=2; plural=n>1;
</PRE>

Languages with this property include:

<DL>
<DT>Romanic family
<DD>
Brazilian Portuguese,
French

1063

</DL>

1064

1065

<DT>Three forms, special case for zero

<DD>
The header entry would be:

<PRE>
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
</PRE>

Languages with this property include:

<DL>
<DT>Baltic family
<DD>
Latvian

1081

</DL>

1082

1083

<DT>Three forms, special cases for one and two

<DD>
The header entry would be:

<PRE>
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
</PRE>

Languages with this property include:

<DL>
<DT>Celtic
<DD>
Gaeilge (Irish)

1099

</DL>

1100

1101

<DT>Three forms, special case for numbers ending in 00 or [2-9][0-9]

<DD>
The header entry would be:

<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
</PRE>

Languages with this property include:

<DL>
<DT>Romanic family
<DD>
Romanian

1118

</DL>

1119

1120

<DT>Three forms, special case for numbers ending in 1[2-9]

<DD>
The header entry would look like this:

<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL>
<DT>Baltic family
<DD>
Lithuanian

1138

</DL>

1139

1140

<DT>Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]

<DD>
The header entry would look like this:

<PRE>
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL>
<DT>Slavic family
<DD>
Russian,
Ukrainian,
Serbian,
Croatian

1161

</DL>

1162

1163

<DT>Three forms, special cases for 1 and 2, 3, 4

<DD>
The header entry would look like this:

<PRE>
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
</PRE>

Languages with this property include:

<DL>
<DT>Slavic family
<DD>
Czech,
Slovak

1181

</DL>

1182

1183

<DT>Three forms, special case for one and some numbers ending in 2, 3, or 4

<DD>
The header entry would look like this:

<PRE>
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
</PRE>

Languages with this property include:

<DL>
<DT>Slavic family
<DD>
Polish

1201

</DL>

1202

1203

<DT>Four forms, special case for one and all numbers ending in 02, 03, or 04

<DD>
The header entry would look like this:

<PRE>
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
</PRE>

Languages with this property include:

<DL>
<DT>Slavic family
<DD>
Slovenian

1220

</DL>

1221

</DL>
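The longer formulas in the table are easiest to check by trying a few values. The following sketch transcribes the three-form rule listed above for Russian, Ukrainian, Serbian and Croatian into a C function. The function name <CODE>plural_slavic3</CODE> is hypothetical and only serves this illustration.

```c
/* Direct transcription of
     plural=n%10==1 && n%100!=11 ? 0 :
            n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;  */
static unsigned long
plural_slavic3 (unsigned long n)
{
  return (n % 10 == 1 && n % 100 != 11 ? 0
          : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
          : 2);
}
```

With this rule the values 1, 21 and 101 select form 0; the values 2, 3, 4 and 22 select form 1; and the values 0, 5, 11, 12 and 111 select form 2.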

1222

1223

1224

You might now ask, <CODE>ngettext</CODE> handles only numbers <VAR>n</VAR> of type

1225

<SAMP>‘unsigned long’</SAMP>. What about larger integer types? What about negative

1226

numbers? What about floating-point numbers?

1227

1228

1229

1230

About larger integer types, such as <SAMP>‘uintmax_t’</SAMP> or

1231

<SAMP>‘unsigned long long’</SAMP>: they can be handled by reducing the value to a

1232

range that fits in an <SAMP>‘unsigned long’</SAMP>. Simply casting the value to

1233

<SAMP>‘unsigned long’</SAMP> would not do the right thing, since it would treat

1234

<CODE>ULONG_MAX + 1</CODE> like zero, <CODE>ULONG_MAX + 2</CODE> like singular, and

1235

the like. Here you can exploit the fact that all mentioned plural form

1236

formulas eventually become periodic, with a period that is a divisor of 100

1237

(or 1000 or 1000000). So, when you reduce a large value to another one in

1238

the range [1000000, 1999999] that ends in the same 6 decimal digits, you

1239

can assume that it will lead to the same plural form selection. This code

1240

does this:

1241

1242

1243

1244

<PRE>

#include <inttypes.h>
#include <limits.h>   /* for ULONG_MAX */
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX
                   ? (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);

</PRE>
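The reduction can also be factored into a small helper. This is only a sketch: the name <CODE>reduce_for_plural</CODE> and the explicit <CODE>limit</CODE> parameter are hypothetical. The parameter stands in for <CODE>ULONG_MAX</CODE> so that the helper can be exercised even on platforms where <SAMP>‘unsigned long’</SAMP> is as wide as <SAMP>‘uintmax_t’</SAMP>.

```c
#include <stdint.h>

/* Map n to a value that fits below limit and ends in the same six
   decimal digits, so that the periodic plural formulas select the same
   form.  Values that already fit are returned unchanged.  */
static unsigned long
reduce_for_plural (uintmax_t n, uintmax_t limit)
{
  if (n > limit)
    return (unsigned long) (n % 1000000 + 1000000);
  return (unsigned long) n;
}
```

A caller would pass <CODE>ULONG_MAX</CODE> as <CODE>limit</CODE> and hand the result to <CODE>ngettext</CODE>, while still printing the original value.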

1254

1255

1256

Negative and floating-point values usually represent physical entities for

1257

which singular and plural don't clearly apply. In such cases, there is no

1258

need to use <CODE>ngettext</CODE>; a simple <CODE>gettext</CODE> call with a form suitable

1259

for all values will do. For example:

1260

1261

1262

1263

<PRE>

printf (gettext ("Time elapsed: %.3f seconds"),
        num_milliseconds * 0.001);

</PRE>

1267

1268

1269

Even if <VAR>num_milliseconds</VAR> happens to be a multiple of 1000, the output

1270

1271

<PRE>

Time elapsed: 1.000 seconds

</PRE>

1274

1275

1276

is acceptable in English, and similarly for other languages.

1277

1278

1279

1280

The translators' perspective regarding plural forms is explained in

1281

section <A HREF="gettext_12.html#SEC209">12.6 Translating plural forms</A>.

1282

1283

1284

1285

1286

<H3><A NAME="SEC189" HREF="gettext_toc.html#TOC189">11.2.7 Optimization of the *gettext functions</A></H3>

1287

1288

1289

1290

1291

1292

At this point of the discussion we should talk about an advantage of the

1293

GNU <CODE>gettext</CODE> implementation. Some readers might have pointed out

1294

that an internationalized program might have a poor performance if some

1295

string has to be translated in an inner loop. While this is unavoidable

1296

when the string varies from one run of the loop to the other it is

1297

simply a waste of time when the string is always the same. Take the

1298

following example:

1299

1300

1301

1302

<PRE>

{
  while (...)
    {
      puts (gettext ("Hello world"));
    }
}

</PRE>

1310

1311

1312

When the locale selection does not change between two runs the resulting

1313

string is always the same. One way to use this is:

1314

1315

1316

1317

<PRE>

{
  str = gettext ("Hello world");
  while (...)
    {
      puts (str);
    }
}

</PRE>

1326

1327

1328

But this solution is not usable in all situations (e.g. when the locale
selection changes) nor does it lead to legible code.

1330

1331

1332

1333

For this reason, GNU <CODE>gettext</CODE> caches previous translation results.

1334

When the same translation is requested twice, with no new message

1335

catalogs being loaded in between, <CODE>gettext</CODE> will, the second time,

1336

find the result through a single cache lookup.
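The idea behind such a cache can be sketched as follows. This is an illustration only and not the code used inside GNU <CODE>gettext</CODE>: the names <CODE>cached_lookup</CODE>, <CODE>slow_lookup</CODE>, <CODE>catalog_generation</CODE> and <CODE>slow_lookup_calls</CODE> are all hypothetical, and the cache holds a single entry for brevity.

```c
#include <stddef.h>

/* Hypothetical counter, incremented whenever a new catalog is loaded.  */
static unsigned int catalog_generation;

/* Counts how often the expensive path is taken.  */
static unsigned int slow_lookup_calls;

/* Stand-in for the real catalog search; here it simply returns msgid.  */
static const char *
slow_lookup (const char *msgid)
{
  ++slow_lookup_calls;
  return msgid;
}

/* Return the translation of msgid, repeating the expensive search only if
   the message or the set of loaded catalogs changed since the last call.  */
static const char *
cached_lookup (const char *msgid)
{
  static const char *last_msgid;
  static const char *last_result;
  static unsigned int last_generation;

  if (last_result == NULL || last_msgid != msgid
      || last_generation != catalog_generation)
    {
      last_result = slow_lookup (msgid);
      last_msgid = msgid;
      last_generation = catalog_generation;
    }
  return last_result;
}
```

Calling <CODE>cached_lookup</CODE> twice with the same string takes the expensive path once; incrementing <CODE>catalog_generation</CODE> in between forces a fresh search.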

1337

1338

1339

1340

1341

<H2><A NAME="SEC190" HREF="gettext_toc.html#TOC190">11.3 Comparing the Two Interfaces</A></H2>

1342

1343

1344

1345

1346

1347

1348

1349

The following discussion is perhaps a little bit colored. As said

1350

above we implemented GNU <CODE>gettext</CODE> following the Uniforum

1351

proposal and this surely has its reasons. But it should show how we

1352

came to this decision.

1353

1354

1355

1356

First we take a look at the development process. When we write an
application using NLS provided by <CODE>gettext</CODE> we proceed as always.
Only when we come to a string which might be seen by the users and thus
has to be translated do we use <CODE>gettext("...")</CODE> instead of
<CODE>"..."</CODE>. At the beginning of each source file (or in a central
header file) we define

1362

1363

1364

1365

<PRE>

#define gettext(String) (String)

</PRE>

1368

1369

1370

Even this definition can be avoided when the system supports the

1371

<CODE>gettext</CODE> function in its C library. When we compile this code the

1372

result is the same as if no NLS code is used. When you take a look at

1373

the GNU <CODE>gettext</CODE> code you will see that we use <CODE>_("...")</CODE>

1374

instead of <CODE>gettext("...")</CODE>. This reduces the number of

1375

additional characters per translatable string to 3 (in words:

1376

three).

1377

1378

1379

1380

When now a production version of the program is needed we simply replace

1381

the definition

1382

1383

1384

1385

<PRE>

#define _(String) (String)

</PRE>

by

<PRE>

#include <libintl.h>
#define _(String) gettext (String)

</PRE>

1400

1401

1402

Additionally we run the program <TT>‘xgettext’</TT> on all source code files
which contain translatable strings and that's it: we have a running
program which does not depend on translations being available, but which
can use any that becomes available.
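In practice the two definitions are often selected by a preprocessor conditional, so that the same source builds with and without NLS. The sketch below assumes a configuration macro named <CODE>ENABLE_NLS</CODE>, the name conventionally defined by the <CODE>gettext</CODE> Autoconf macros; treat it as an assumption for your own build system. The function <CODE>greeting</CODE> is hypothetical.

```c
#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
#else
/* Without NLS the marker expands to the untranslated string.  */
# define _(String) (String)
#endif

/* Hypothetical example function using the marker.  */
static const char *
greeting (void)
{
  return _("Hello world");
}
```

Only the block at the top changes between the development and the production build; the marked strings themselves stay untouched.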

1406

1407

1408

1409

1410

The same procedure can be done for the <CODE>gettext_noop</CODE> invocations

1411

(see section <A HREF="gettext_4.html#SEC23">4.7 Special Cases of Translatable Strings</A>). One usually defines <CODE>gettext_noop</CODE> as a

1412

no-op macro. So you should consider the following code for your project:

1413

1414

1415

1416

<PRE>

#define gettext_noop(String) String
#define N_(String) gettext_noop (String)

</PRE>

1420

1421

1422

<CODE>N_</CODE> is a short form similar to <CODE>_</CODE>. The <TT>‘Makefile’</TT> in

1423

the <TT>‘po/’</TT> directory of GNU <CODE>gettext</CODE> knows by default both of the

1424

mentioned short forms so you are invited to follow this proposal for

1425

your own ease.
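A typical use of the two short forms together is a table of messages in a static initializer, where a function call is not allowed: <CODE>N_</CODE> marks the strings for extraction and <CODE>_</CODE> translates them when they are used. The sketch below defines <CODE>_</CODE> as a no-op so that it is self-contained; the names <CODE>messages</CODE> and <CODE>message_text</CODE> are hypothetical.

```c
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
/* No-op stand-in; a production build would map this to gettext.  */
#define _(String) (String)

/* N_ only marks the strings; the array holds the untranslated text.  */
static const char *const messages[] = {
  N_("some very meaningful message"),
  N_("and another one")
};

/* The translation happens at the point of use.  */
static const char *
message_text (unsigned int i)
{
  return _(messages[i]);
}
```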

1426

1427

1428

1429

Now to <CODE>catgets</CODE>. The main problem is the work for the
programmer. Every time he comes to a translatable string he has to
define a number (or a symbolic constant) which also has to be defined in
the message catalog file. He also has to take care of duplicate
entries, duplicate message IDs etc. If he wants to have the same
quality in the message catalog as the GNU <CODE>gettext</CODE> program
provides he also has to put the descriptive comments for the strings and
the location in all source code files in the message catalog. This is
nearly a Mission: Impossible.

1438

1439

1440

1441

But there are also some points people might call advantages speaking for

1442

<CODE>catgets</CODE>. If you have a single word in a string and this string

1443

is used in different contexts it is likely that in one or the other

1444

language the word has different translations. Example:

1445

1446

1447

1448

<PRE>

printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));

</PRE>

1454

1455

1456

Here we have to translate the string <CODE>"number"</CODE> twice. Even
if you do not speak a language besides English it might be possible to
recognize that the two words have a different meaning. In German the
first appearance has to be translated to <CODE>"Anzahl"</CODE> and the second
to <CODE>"Zahl"</CODE>.

1461

1462

1463

1464

Now you can say that this example is really esoteric. And you are
right! This is exactly how we felt about this problem and we decided that
it does not weigh that much. The solution for the above problem could
be very easy:

1468

1469

1470

1471

<PRE>

printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);

</PRE>

1478

1479

1480

We believe that we can solve all conflicts with this method. If it is
difficult one can also consider changing one of the conflicting strings a
little bit. But it is not impossible to overcome.

1483

1484

1485

1486

<CODE>catgets</CODE> allows the same original entry to have different translations,
but <CODE>gettext</CODE> has another, scalable approach for solving ambiguities
of this kind: see section <A HREF="gettext_11.html#SEC184">11.2.2 Solving Ambiguities</A>.

1489

1490

1491

1492

1493

<H2><A NAME="SEC191" HREF="gettext_toc.html#TOC191">11.4 Using libintl.a in own programs</A></H2>

1494

1495

1496

Starting with version 0.9.4 the library <CODE>libintl.a</CODE> should be
self-contained. I.e., you can use it in your own programs without
providing additional functions. The <TT>‘Makefile’</TT> will put the header
and the library in directories selected using <CODE>$(prefix)</CODE>.

1500

1501

1502

1503

1504

<H2><A NAME="SEC192" HREF="gettext_toc.html#TOC192">11.5 Being a <CODE>gettext</CODE> grok</A></H2>

1505

1506

1507

NOTE: This documentation section is outdated and needs to be

1508

revised.

1509

1510

1511

1512

To fully exploit the functionality of the GNU <CODE>gettext</CODE> library it
is surely helpful to read the source code. But for those who don't want
to spend that much time in reading the (sometimes complicated) code here
is a list of comments:

1516

1517

1518

1519

<UL>

1520

<LI>Changing the language at runtime

1521

1522

1523

1524

For interactive programs it might be useful to offer a selection of the
used language at runtime. To understand how to do this one needs to know
how the used language is determined while executing the <CODE>gettext</CODE>
function. The method which is presented here only works correctly
with the GNU implementation of the <CODE>gettext</CODE> functions.

1529

1530

In the function <CODE>dcgettext</CODE> at every call the current setting of

1531

the highest priority environment variable is determined and used.

1532

Highest priority means here the following list with decreasing

1533

priority:

1534

1535

1536

<OL>

1537

<LI><CODE>LANGUAGE</CODE>
<LI><CODE>LC_ALL</CODE>
<LI><CODE>LC_xxx</CODE>, according to selected locale category
<LI><CODE>LANG</CODE>

</OL>

1556

1557

Afterwards the path is constructed using the found value and the

1558

translation file is loaded if available.

1559

1560

What happens now when the value for, say, <CODE>LANGUAGE</CODE> changes? According

1561

to the process explained above the new value of this variable is found

1562

as soon as the <CODE>dcgettext</CODE> function is called. But this also means

1563

the (perhaps) different message catalog file is loaded. In other

1564

words: the used language is changed.

1565

1566

But there is one little catch. The code for gcc-2.7.0 and up provides
some optimization. This optimization normally prevents the calling of
the <CODE>dcgettext</CODE> function as long as no new catalog is loaded. But
if <CODE>dcgettext</CODE> is not called the program also cannot detect that the
<CODE>LANGUAGE</CODE> variable has changed (see section <A HREF="gettext_11.html#SEC189">11.2.7 Optimization of the *gettext functions</A>). The
solution for this is very easy. Include the following code in the
language switching function.

1573

1574

1575

<PRE>

  /* Change language.  */
  setenv ("LANGUAGE", "fr", 1);

  /* Make change known.  */
  {
    extern int  _nl_msg_cat_cntr;
    ++_nl_msg_cat_cntr;
  }

</PRE>

1585

1586

1587

The variable <CODE>_nl_msg_cat_cntr</CODE> is defined in <TT>‘loadmsgcat.c’</TT>.
You don't need to know what this is for. But it can be used to detect
whether a <CODE>gettext</CODE> implementation is GNU gettext and not a non-GNU
system's native gettext implementation.

1591

1592

</UL>

1593

1594

1595

1596

<H2><A NAME="SEC193" HREF="gettext_toc.html#TOC193">11.6 Temporary Notes for the Programmers Chapter</A></H2>

1597

1598

1599

NOTE: This documentation section is outdated and needs to be

1600

revised.

1601

1602

1603

1604

1605

1606

<H3><A NAME="SEC194" HREF="gettext_toc.html#TOC194">11.6.1 Temporary - Two Possible Implementations</A></H3>

1607

1608

1609

There are two competing methods for language independent messages:

1610

the X/Open <CODE>catgets</CODE> method, and the Uniforum <CODE>gettext</CODE>

1611

method. The <CODE>catgets</CODE> method indexes messages by integers; the

1612

<CODE>gettext</CODE> method indexes them by their English translations.

1613

The <CODE>catgets</CODE> method has been around longer and is supported

1614

by more vendors. The <CODE>gettext</CODE> method is supported by Sun,

1615

and it has been heard that the COSE multi-vendor initiative is

1616

supporting it. Neither method is a POSIX standard; the POSIX.1

1617

committee had a lot of disagreement in this area.

1618

1619

1620

1621

Neither one is in the POSIX standard. There was much disagreement

1622

in the POSIX.1 committee about using the <CODE>gettext</CODE> routines

1623

vs. <CODE>catgets</CODE> (XPG). In the end the committee couldn't

1624

agree on anything, so no messaging system was included as part

1625

of the standard. I believe the informative annex of the standard

1626

includes the XPG3 messaging interfaces, “...as an example of

1627

a messaging system that has been implemented...”

1628

1629

1630

1631

They were very careful not to say anywhere that you should use one

1632

set of interfaces over the other. For more on this topic please

1633

see the Programming for Internationalization FAQ.

1634

1635

1636

1637

1638

<H3><A NAME="SEC195" HREF="gettext_toc.html#TOC195">11.6.2 Temporary - About <CODE>catgets</CODE></A></H3>

1639

1640

1641

There have been a few discussions of late on the use of

1642

<CODE>catgets</CODE> as a base. I think it important to present both

1643

sides of the argument and hence am opting to play devil's advocate

1644

for a little bit.

1645

1646

1647

1648

I'll not deny the fact that <CODE>catgets</CODE> could have been designed

1649

a lot better. It currently has quite a number of limitations and

1650

these have already been pointed out.

1651

1652

1653

1654

However there is a great deal to be said for consistency and

1655

standardization. A common recurring problem when writing Unix

1656

software is the myriad portability problems across Unix platforms.

1657

It seems as if every Unix vendor had a look at the operating system

1658

and found parts they could improve upon. Undoubtedly, these

1659

modifications are probably innovative and solve real problems.

1660

However, software developers have a hard time keeping up with all

1661

these changes across so many platforms.

1662

1663

1664

1665

And this has prompted the Unix vendors to begin to standardize their

1666

systems. Hence the impetus for Spec1170. Every major Unix vendor

1667

has committed to supporting this standard and every Unix software

1668

developer waits with glee the day they can write software to this

1669

standard and simply recompile (without having to use autoconf)

1670

across different platforms.

1671

1672

1673

1674

As I understand it, Spec1170 is roughly based upon version 4 of the

1675

X/Open Portability Guidelines (XPG4). Because <CODE>catgets</CODE> and

1676

friends are defined in XPG4, I'm led to believe that <CODE>catgets</CODE>

1677

is a part of Spec1170 and hence will become a standardized component

1678

of all Unix systems.

1679

1680

1681

1682

1683

<H3><A NAME="SEC196" HREF="gettext_toc.html#TOC196">11.6.3 Temporary - Why a single implementation</A></H3>

1684

1685

1686

Now it seems kind of wasteful to me to have two different systems

1687

installed for accessing message catalogs. If we do want to remedy

1688

<CODE>catgets</CODE> deficiencies why don't we try to expand <CODE>catgets</CODE>

1689

(in a compatible manner) rather than implement an entirely new system.

1690

Otherwise, we'll end up with two message catalog access systems installed

1691

with an operating system - one set of routines for packages using GNU

1692

<CODE>gettext</CODE> for their internationalization, and another set of routines

1693

(catgets) for all other software. Bloated?

1694

1695

1696

1697

Supposing another catalog access system is implemented. Which do

1698

we recommend? At least for Linux, we need to attract as many

1699

software developers as possible. Hence we need to make it as easy

1700

for them to port their software as possible. Which means supporting

1701

<CODE>catgets</CODE>. We will be implementing the <CODE>libintl</CODE> code

1702

within our <CODE>libc</CODE>, but does this mean we also have to incorporate

1703

another message catalog access scheme within our <CODE>libc</CODE> as well?

1704

And what about people who are going to be using the <CODE>libintl</CODE>

1705

+ non-<CODE>catgets</CODE> routines. When they port their software to

1706

other platforms, they're now going to have to include the front-end

1707

(<CODE>libintl</CODE>) code plus the back-end code (the non-<CODE>catgets</CODE>

1708

access routines) with their software instead of just including the

1709

<CODE>libintl</CODE> code with their software.

1710

1711

1712

1713

Message catalog support is however only the tip of the iceberg.

1714

What about the data for the other locale categories? They also have

1715

a number of deficiencies. Are we going to abandon them as well and

1716

develop another duplicate set of routines (should <CODE>libintl</CODE>

1717

expand beyond message catalog support)?

1718

1719

1720

1721

Like many parts of Unix that can be improved upon, we're stuck with balancing

1722

compatibility with the past with useful improvements and innovations for

1723

the future.

1724

1725

1726

1727

1728

<H3><A NAME="SEC197" HREF="gettext_toc.html#TOC197">11.6.4 Temporary - Notes</A></H3>

1729

1730

1731

X/Open agreed very late on the standard form so that many
implementations differ from the final form. Both of my systems (old
Linux catgets and Ultrix-4) have a strange variation.

1734

1735

1736

1737

OK. After incorporating the last changes I have to spend some time on
making the GNU/Linux <CODE>libc</CODE> <CODE>gettext</CODE> functions. So in the future
Solaris will not be the only system having <CODE>gettext</CODE>.

1740

1741

1742

<HR>

1743

1744

</BODY>

1745

</HTML>