Liblouis User's and Programmer's Manual
3 How to Write Translation Tables
3.1 Hyphenation Tables
3.2 Character-Definition Opcodes
3.3 Braille Indicator Opcodes
3.5 Special Symbol Opcodes
3.6 Special Processing Opcodes
3.7 Translation Opcodes
3.8 Character-Class Opcodes
3.10 The Context and Multipass Opcodes
3.11 The correct Opcode
3.12 Miscellaneous Opcodes
4 Notes on Back-Translation
5 Programming with liblouis
5.3 Data structure of liblouis tables
5.5 lou_translateString
5.7 lou_backTranslateString
5.13 lou_readCharFromFile

Liblouis User's and Programmer's Manual
***************************************

This manual is for liblouis (version 1.8.0, 18 November 2009), a
Braille Translation and Back-Translation Library derived from the Linux

Copyright (C) 1999-2006 by the BRLTTY Team.

Copyright (C) 2004-2007 ViewPlus Technologies, Inc. `www.viewplus.com'.

Copyright (C) 2007,2009 Abilitiessoft, Inc. `www.abilitiessoft.com'.

This file is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser (or library) General Public
License (LGPL) as published by the Free Software Foundation;
either version 3, or (at your option) any later version.

This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser (or Library) General Public License LGPL for more details.

You should have received a copy of the GNU Lesser (or Library)
General Public License (LGPL) along with this program; see the
file COPYING. If not, write to the Free Software Foundation, 51
Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

Liblouis is an open-source braille translator and back-translator
derived from the translation routines in the BRLTTY screenreader for
Linux. It has, however, gone far beyond these routines. It is named in
honor of Louis Braille. In Linux and Mac OS X it is a shared library,
and in Windows it is a DLL. For installation instructions see the
README file. Please report bugs and oddities to the maintainer,
<john.boyer@abilitiessoft.com>.

This documentation is derived from Chapter 7 of the BRLTTY manual,
but it has been extensively rewritten to cover new features.

Please read the following copyright and warranty information. Note
that this information also applies to all source code, tables and other
files in this distribution of liblouis. It applies similarly to the
sister library liblouisxml.

This file is maintained by John J. Boyer
<john.boyer@abilitiessoft.com>.

Persons who wish to program with liblouis but will not be writing
translation tables may want to skip ahead to *note Programming with

Five test programs are provided as part of the liblouis package. They
are intended for testing liblouis and for debugging tables. None of
them is suitable for braille transcription. An application that can be
used for transcription is `xml2brl', which is part of the liblouisxml
package (*note Introduction: (liblouisxml)Top.). The source code of the
test programs can be studied to learn how to use the liblouis library
and they can be used to perform the following functions.

All of these programs recognize the `--help' and `--version' options.

Print a usage message listing all available options, then exit

Print the version number, then exit successfully.

The lou_debug tool is intended for debugging liblouis translation
tables. The command line for lou_debug is:

     lou_debug [OPTIONS] TABLE[,TABLE,...]

The command line options that are accepted by lou_debug are described
in *note common options::.

The table (or comma-separated list of tables) is compiled. If no
errors are found a brief command summary is printed, then the prompt
`Command:'. You can then input one of the command letters and get
output, as described below.

Most of the commands print information in the various arrays of
`TranslationTableHeader'. Since these arrays are pointers to chains of
hashed items, the commands first print the hash number, then the first
item, then the next item chained to it, and so on. After each item
there is a prompt indicated by `=>'. You can then press enter (`<RET>')
to see the next item in the chain or the first item in the next chain.
Or you can press `h' (for next-(h)ash) to skip to the next hash chain.
You can also press `e' to exit the command and go back to the

Bring up a screen of somewhat more extensive help.

Display the first forward-translation rule in the first non-empty
hash bucket. The number of the bucket is displayed at the
beginning of the chain. Each rule is identified by the word
`Rule:'. The fields are displayed by phrases consisting of the
name of the field, an equal sign, and its value. The before and
after fields are displayed only if they are nonzero. Special
opcodes such as the `correct' opcode (*note correct: correct
opcode.) and the multipass opcodes are shown with the code that
instructs the virtual machine that interprets them. If you want to
see only the rules for a particular character string you can type
`p' at the `command:' prompt. This will take you to the
`particular:' prompt, where you can press `f' and then type in the
string. The whole hash chain containing the string will be

Display back-translation rules. This display is very similar to
that of forward translation rules except that the dot pattern is
displayed before the character string.

Display character definitions, again within their hash chains.

Display single-cell dot definitions. If a character-definition
opcode gives a multi-cell dot pattern, it is displayed among the
back-translation rules.

Display the character-to-dots map. This is set up by the
character-definition opcodes and can also be influenced by the
`display' opcode (*note display: display opcode.).

Display the dot-to-character map, which shows which single-cell dot
patterns map to which characters.

Show the multi-cell dot patterns which have been assigned to the
characters from 0 to 255 to comply with computer braille codes
such as a 6-dot code. Note that the character-definition opcodes
should use 8-dot computer braille.

Bring up a secondary (`particular:') prompt from which you can
examine particular character strings, dot patterns, etc. The
commands (given in its own command summary) are very similar to
those of the main `command:' prompt, but you can type a character
string or dot pattern. They include `h', `f', `b', `c', `d', `C',
`D', `z' and `x' (to exit this prompt), but not `p', `i' and `m'.

Show braille indicators. This shows the dot patterns for various
opcodes such as the `capsign' opcode (*note capsign: capsign
opcode.) and the `numsign' opcode (*note numsign: numsign opcode.).
It also shows emphasis dot patterns, such as those for the
`italword', the `firstletterbold' opcode (*note firstletterbold:
firstletterbold opcode.), etc. If a given opcode has not been used
nothing is printed for it.

Display various miscellaneous information about the table, such as
the number of passes, whether certain opcodes have been used, and
whether there is a hyphenation table.

To use this program type the following:

     lou_checktable [OPTIONS] TABLE

The command line options that are accepted by lou_checktable are
described in *note common options::.

If the table contains errors, appropriate messages will be displayed.
If there are no errors the message `no errors found.' will be shown.

This program tests every capability of the liblouis library. It is
completely interactive. Invoke it as follows:

     lou_allround [OPTIONS]

The command line options that are accepted by lou_allround are
described in *note common options::.

You will see a few lines telling you how to use the program. Pressing
one of the letters in parentheses and then enter will take you to a
message asking for more information or for the answer to a yes/no
question. Typing the letter `r' and then <RET> will take you to a
screen where you can enter a line to be processed by the library and
then view the results.

This program translates whatever is on the standard input unit and
prints it on the standard output unit. It is intended for large-scale
testing of the accuracy of translation and back-translation. The
command line for lou_translate is:

     lou_translate [OPTION] TABLE

Aside from the standard options (*note common options::) this program
also accepts the following options:

Do a forward translation.

Do a backward translation.

To use it to translate or back-translate a file use a line like

     lou_translate --forward en-us-g2.ctb <liblouis.txt >testtrans

This program checks the accuracy of hyphenation in Braille translation
for both translated and untranslated words. It is completely
interactive. Invoke it as follows:

     lou_checkhyphens [OPTIONS]

The command line options that are accepted by lou_checkhyphens are
described in *note common options::.

You will see a few lines telling you how to use the program.

3 How to Write Translation Tables
*********************************

Many translation (contraction) tables have already been made up. They
are included in this distribution in the tables directory and should be
studied as part of the documentation. The most helpful (and normative)
are listed in the following table:

Character definitions for U.S. tables

Remove excessive white-space

Uncontracted American English

Contracted or Grade 2 American English

Make liblouis output conform to BRF standard

8-dot computer braille for use in coding examples

6-dot computer braille

Nemeth Code translation for use with liblouisxml

Fixes errors at the boundaries of math and text

The names used for files containing translation tables are completely
arbitrary. They are not interpreted in any way by the translator.
Contraction tables may be 8-bit ASCII files, 16-bit big-endian Unicode
files or 16-bit little-endian Unicode files. Blank lines are ignored.
Any leading and trailing white-space (any number of blanks and/or tabs)
is ignored. Lines which begin with a number sign or hatch mark (`#')
are ignored, i.e. they are comments. If the number sign is not the
first non-blank character in the line, it is treated as an ordinary
character. If the first non-blank character is less-than (`<') the line
is also treated as a comment. This makes it possible to mark up tables
as XHTML documents. Lines which are not blank or comments define table
entries. The general format of a table entry is:

     opcode operands comments

Table entries may not be split between lines. The opcode is a
mnemonic that specifies what the entry does. The operands may be
character sequences, braille dot patterns or occasionally something
else. They are described for each opcode. With some exceptions, opcodes
expect a certain number of operands. Any text on the line after the last
operand is ignored, and may be a comment. A few opcodes accept a
variable number of operands. In this case a number sign begins a
comment unless it is preceded by a backslash (`\'). *Note Opcode
Index::, for a list of opcodes, with a link to each one.
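
The line rules above can be sketched in a few lines of Python. This is
an illustrative model only, not the liblouis table compiler. The
function names `classify_line' and `split_entry' are hypothetical, and
the number of operands is passed in by the caller because it depends on
the opcode.

```python
def classify_line(line: str) -> str:
    """Return 'blank', 'comment' or 'entry' for one line of a table."""
    # Leading and trailing blanks and tabs are ignored.
    stripped = line.strip(" \t\r\n")
    if not stripped:
        return "blank"
    # A number sign or a less-than sign counts only when it is the
    # first non-blank character of the line.
    if stripped[0] in "#<":
        return "comment"
    return "entry"


def split_entry(line: str, operand_count: int):
    """Split an entry into (opcode, operands, trailing comment).

    operand_count is an assumption supplied by the caller; this sketch
    has no knowledge of the real opcode list.
    """
    fields = line.split()
    opcode = fields[0]
    operands = fields[1:1 + operand_count]
    # Any text after the last operand is ignored by the translator.
    comment = " ".join(fields[1 + operand_count:])
    return opcode, operands, comment
```

For instance, `classify_line("  # note")' yields `comment', while a line
such as `always a#b 1-2' is an entry, because the number sign is not
its first non-blank character.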

Here are some examples of table entries.

     always world 456-2456 A word and the dot pattern of its contraction

Most opcodes have both a "characters" operand and a "dots" operand,
though some have only one and a few have other types.

The characters operand consists of any combination of characters and
escape sequences preceded and followed by whitespace. Escape sequences
are used to represent difficult characters. They begin with a backslash

"escape" character (hex 1b, dec 27)

4-digit hexadecimal value of a character

If liblouis has been compiled for 32-bit Unicode the following are

5-digit (20 bit) character

The dots operand is a braille dot pattern. The real braille dots, 1
through 8, must be specified with their standard numbers. liblouis
recognizes "virtual dots," which are used for special purposes, such as
distinguishing accent marks. There are seven virtual dots. They are
specified by the number 9 and the letters `a' through `f'. For a
multi-cell dot pattern, the cell specifications must be separated from
one another by a dash (`-'). For example, the contraction for the
English word `lord' (the letter `l' preceded by dot 5) would be
specified as 5-123. A space may be specified with the special dot

An opcode which is helpful in writing translation tables is
`include'. Its format is:

It reads the file indicated by `filename' and incorporates or
includes its entries into the table. Included files can include other
files, which can include other files, etc. For an example, see what
files are included by the entry `include en-us-g1.ctb' in the table
`en-us-g2.ctb'. If the included file is not in the same directory as
the main table, use a full pathname for filename.
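
The nesting of included files can be pictured with the following Python
sketch. It is a simplified model under stated assumptions, not the way
liblouis itself loads tables: the function name `expand_includes' is
hypothetical, a relative filename is resolved against the directory of
the including file, and the check for circular includes is an addition
of this sketch.

```python
import os


def expand_includes(path, _stack=()):
    """Return the non-blank lines of a table with includes expanded in place."""
    real = os.path.realpath(path)
    if real in _stack:
        # Guard added by this sketch; the manual does not discuss cycles.
        raise ValueError(f"circular include: {path}")
    base = os.path.dirname(real)
    lines = []
    with open(real, encoding="utf-8") as handle:
        for raw in handle:
            fields = raw.split()
            if len(fields) >= 2 and fields[0] == "include":
                # os.path.join leaves a full pathname untouched.
                nested = os.path.join(base, fields[1])
                lines.extend(expand_includes(nested, _stack + (real,)))
            elif fields:
                lines.append(raw.strip())
    return lines
```

Because the included entries are spliced in at the position of the
`include' line, their place in the combined table is the place of that
line.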

The order of the various types of opcodes or table entries is
important. Character-definition opcodes should come first. However, if
the optional `display' opcode (*note display: display opcode.) is used
it should precede character-definition opcodes. Braille-indicator
opcodes should come next. Translation opcodes should follow. The
`context' opcode (*note context: context opcode.) is a translation
opcode, even though it is considered along with the multipass opcodes.
These latter should follow the translation opcodes. The `correct'
opcode (*note correct: correct opcode.) can be used anywhere after the
character-definition opcodes, but it is probably a good idea to group
all `correct' opcodes together. The `include' opcode (*note include:
include opcode.) can be used anywhere, but the order of entries in the
combined table must conform to the order given above. Within each type
of opcode, the order of entries is generally unimportant. Thus the
translation entries can be grouped alphabetically or in any other order

3.1 Hyphenation Tables
======================

Hyphenation tables are necessary to make opcodes such as the `nocross'
opcode (*note nocross: nocross opcode.) function properly. There are no
opcodes for hyphenation table entries because these tables have a
special format. Therefore, they cannot be specified as part of an
ordinary table. Rather, they must be included using the `include'
opcode (*note include: include opcode.). Hyphenation tables must
follow character definitions. For an example of a hyphenation table,
see `hyph_en_US.dic'.

3.2 Character-Definition Opcodes
================================

These opcodes are needed to define attributes such as digit,
punctuation, letter, etc. for all characters and their dot patterns.
liblouis has no built-in character definitions, but such definitions
are essential to the operation of the `context' opcode (*note context:
context opcode.), the `correct' opcode (*note correct: correct
opcode.), the multipass opcodes and the back-translator. If the dot
pattern is a single cell, it is used to define the mapping between dot
patterns and characters, unless a `display' opcode (*note display:
display opcode.) for that character-dot-pattern pair has been used
previously. If only a single-cell dot pattern has been given for a
character, that dot pattern is defined with the character's own
attributes. If more than one cell is given and some of them have not
previously been defined as single cells, the undefined cells are
entered into the dots table with the space attribute. This is done for
backward compatibility with old tables, but it may cause problems with
the above opcodes or back-translation. For this reason, every
single-cell dot pattern should be defined before it is used in a
multi-cell character representation. The best way to do this is to use
the 8-dot computer braille representation for the particular braille
code. If a character or dot pattern used in any rule, except those with
the `display' opcode, the `repeated' opcode (*note repeated: repeated
opcode.) or the `replace' opcode (*note replace: replace opcode.), is
not defined by one of the character-definition opcodes, liblouis will
give an error message and refuse to continue until the problem is
fixed. If the translator or back-translator encounters an undefined
character in its input it produces a succinct error indication in its
output, and the character is treated as a space.

`space character dots'
Defines a character as a space and also defines the dot pattern as

     space \s 0 \s is the escape sequence for blank; 0 means no dots.

`punctuation character dots'
Associates a punctuation mark in the particular language with a
braille representation and defines the character and dot pattern as
punctuation. For example:

     punctuation . 46 dot pattern for period in NAB computer braille

`digit character dots'
Associates a digit with a dot pattern and defines the character as
a digit. For example:

     digit 0 356 NAB computer braille

`uplow characters dots [,dots]'
The characters operand must be a pair of letters, of which the
first is uppercase and the second lowercase. The first dots
suboperand indicates the dot pattern for the upper-case letter. It
may have more than one cell. The second dots suboperand must be
separated from the first by a comma and is optional, as indicated
by the square brackets. If present, it indicates the dot pattern
for the lower-case letter. It may also have more than one cell. If
the second dots suboperand is not present the first is used for
the lower-case letter as well as the upper-case letter. This
opcode is needed because not all languages follow a consistent
pattern in assigning Unicode codes to upper and lower case
letters. It should be used even for languages that do. The
distinction is important in the forward translator. For example:

`grouping name characters dots ,dots'
This opcode is used to indicate pairs of grouping symbols used in
processing mathematical expressions. These symbols are usually
generated by the MathML interpreter in liblouisxml. They are used
in multipass opcodes. The name operand must contain only letters,
but they may be upper- or lower-case. The characters operand must
contain exactly two Unicode characters. The dots operand must
contain exactly two braille cells, separated by a comma. Note that
grouping dot patterns also need to be declared with the exactdots
opcode. The characters may need to be declared with the math

     grouping mrow \x0001\x0002 1e,2e
     grouping mfrac \x0003\x0004 3e,4e

`letter character dots'
Associates a letter in the language with a braille representation
and defines the character as a letter. This is intended for
letters which are neither uppercase nor lowercase.

`lowercase character dots'
Associates a character with a dot pattern and defines the
character as a lowercase letter. Both the character and the dot
pattern have the attributes lowercase and letter.

`uppercase character dots'
Associates a character with a dot pattern and defines the
character as an uppercase letter. Both the character and the dot
pattern have the attributes uppercase and letter. `lowercase' and
`uppercase' should be used when a letter has only one case.
Otherwise use the `uplow' opcode (*note uplow: uplow opcode.).

`litdigit digit dots'
Associates a digit with the dot pattern which should be used to
represent it in literary texts. For example:

`sign character dots'
Associates a character with a dot pattern and defines both as a
sign. This opcode should be used for things like at sign (`@'),
percent (`%'), dollar sign (`$'), etc. Do not use it to define
ordinary punctuation such as period and comma. For example:

     sign % 4-25-1234 literary percent sign

`math character dots'
Associates a character and a dot pattern and defines them as a
mathematical symbol. It should be used for less than (`<'),
greater than (`>'), equals (`='), plus (`+'), etc. For example:

3.3 Braille Indicator Opcodes
=============================

Braille indicators are dot patterns which are inserted into the braille
text to indicate such things as capitalization, italic type, computer
braille, etc. The opcodes which define them are followed only by a dot
pattern, which may be one or more cells.
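
The dot pattern that follows each of these opcodes uses the notation
described earlier: dots 1 through 8, the virtual dots 9 and `a' through
`f', and a dash between cells. The Python sketch below shows one way to
read that notation. It is an illustration only, not liblouis's own
parser; the name `parse_dots' is hypothetical, and treating a cell
written as `0' as an empty cell follows the `space' example, where 0
means no dots.

```python
REAL_DOTS = "12345678"
VIRTUAL_DOTS = "9abcdef"  # the seven virtual dots named in the manual


def parse_dots(pattern: str):
    """Parse a dots operand such as '5-123' into a list of cells.

    Each cell is returned as a frozenset of dot names.
    """
    cells = []
    for cell in pattern.split("-"):
        if cell == "0":
            # No dots at all, i.e. a blank cell.
            cells.append(frozenset())
            continue
        if not cell or any(d not in REAL_DOTS + VIRTUAL_DOTS for d in cell):
            raise ValueError(f"bad cell {cell!r} in {pattern!r}")
        cells.append(frozenset(cell))
    return cells
```

With this reading, the pattern 5-123 for `lord' has two cells, the
first holding dot 5 and the second holding dots 1, 2 and 3.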

The dot pattern which indicates capitalization of a single letter.
In English, this is dot 6. For example:

The dot pattern which begins a block of capital letters. For

The dot pattern which ends a block of capital letters within a

This indicator is needed in Grade 2 to show that a single letter is
not a contraction. It is also used when an abbreviation happens to
be a sequence of letters that is the same as a contraction. For

The letters in the operand will not be preceded by a letter sign.
More than one `noletsign' opcode can be used. This is equivalent
to a single entry containing all the letters. In addition, if a
single letter, such as `a' in English, is defined as a `word'
(*note word: word opcode.) or `largesign' (*note largesign:
largesign opcode.), it will be treated as though it had also been
specified in a `noletsign' entry.

`noletsignbefore characters'
If any of the characters precedes a single letter without a space, a
letter sign is not used. By default the characters apostrophe
(`'') and period (`.') have this property. Use of a
`noletsignbefore' entry cancels the defaults. If more than one
`noletsignbefore' entry is used, the characters in all entries are

`noletsignafter characters'
If any of the characters follows a single letter without a space, a
letter sign is not used. By default the characters apostrophe
(`'') and period (`.') have this property. Use of a
`noletsignafter' entry cancels the defaults. If more than one
`noletsignafter' entry is used, the characters in all entries are

The translator inserts this indicator before numbers made up of
digits defined with the `litdigit' opcode (*note litdigit:
litdigit opcode.) to show that they are a number and not letters
or some other symbols. For example:

These also define braille indicators, but they require more
explanation. There are four sets, for italic, bold, underline and
computer braille. In each of the first three sets there are seven
opcodes, for use before the first word of a phrase, for use before the
last word, for use after the last word, for use before the first letter
(or character) if emphasis starts in the middle of a word, for use
after the last letter (or character) if emphasis ends in the middle of
a word, before a single letter (or character), and to specify the
length of a phrase to which the first-word and last-word-before
indicators apply. This rather elaborate set of emphasis opcodes was
devised to try to meet all contingencies. It is unlikely that a
translation table will contain all of them. The translator checks for
their presence. If they are present, it first looks to see if the
single-letter indicator should be used. Then it looks at the word (or
phrase) indicators and finally at the multi-letter indicators.

The translator will apply up to two emphasis indicators to each
phrase or string of characters, depending on what the `typeform'
parameter in its calling sequence indicates (*note Programming with

For computer braille there are only two braille indicators, for the
beginning and end of a sequence of characters to be rendered in
computer braille. Such a sequence may also have other emphasis. The
computer braille indicators are applied not only when computer braille
is indicated in the `typeform' parameter, but also when a sequence of
characters is determined to be computer braille because it contains a
subsequence defined by the `compbrl' opcode (*note compbrl: compbrl
opcode.) or the `literal' opcode (*note literal: literal opcode.).

Here are the various emphasis opcodes.

This is the braille indicator to be placed before the first word
of an italicized phrase that is longer than the value given in the
`lenitalphrase' opcode (*note lenitalphrase: lenitalphrase
opcode.). For example:

     firstwordital 46-46 English indicator

`lastworditalbefore dots'

These two opcodes are synonyms. This is the braille indicator to be
placed before the last word of an italicized phrase. In addition,
if `firstwordital' is not used, this braille indicator is doubled
and placed before the first word. Do not use `lastworditalbefore'
and `lastworditalafter' in the same table. For example:

     lastworditalbefore 4-6

`lastworditalafter dots'
This is the braille indicator to be placed after the last word of
an italicized phrase. Do not use `lastworditalbefore' and
`lastworditalafter' in the same table. See also the
`lenitalphrase' opcode (*note lenitalphrase: lenitalphrase
opcode.) for more information.

`firstletterital dots'

These two opcodes are synonyms. This is the braille indicator to be
placed before the first letter (or character) if italicization
begins in the middle of a word.

`lastletterital dots'

These two opcodes are synonyms. This is the braille indicator to be
placed after the last letter (or character) when italicization
ends in the middle of a word.

`singleletterital dots'
This braille indicator is used if only a single letter (or
character) is italicized.

`lenitalphrase number'
If `lastworditalbefore' is used, an italicized phrase is checked
to see how many words it contains. If this number is less than or
equal to the number given in the `lenitalphrase' opcode, the
`lastworditalbefore' sign is placed in front of each word. If it
is greater, the `firstwordital' indicator is placed before the
first word and the `lastworditalbefore' indicator is placed after
the last word. Note that if the `firstwordital' opcode is not used
its indicator is made up by doubling the dot pattern given in the
`lastworditalbefore' entry. For example:

This is the braille indicator to be placed before the first word
of a bold phrase. For example:

     firstwordbold 456-456

`lastwordboldbefore dots'

These two opcodes are synonyms. This is the braille indicator to be
placed before the last word of a bold phrase. In addition, if
`firstwordbold' is not used, this braille indicator is doubled and
placed before the first word. Do not use `lastwordboldbefore' and
`lastwordboldafter' in the same table. For example:

     lastwordboldbefore 456

`lastwordboldafter dots'
This is the braille indicator to be placed after the last word of a
bold phrase. Do not use `lastwordboldbefore' and
`lastwordboldafter' in the same table.

`firstletterbold dots'

These two opcodes are synonyms. This is the braille indicator to be
placed before the first letter (or character) if bold emphasis
begins in the middle of a word.

`lastletterbold dots'

These two opcodes are synonyms. This is the braille indicator to be
placed after the last letter (or character) when bold emphasis
ends in the middle of a word.

`singleletterbold dots'
This braille indicator is used if only a single letter (or
character) is in boldface.

`lenboldphrase number'
If `lastwordboldbefore' is used, a bold phrase is checked to see
how many words it contains. If this number is less than or equal to
the number given in the `lenboldphrase' opcode, the
`lastwordboldbefore' sign is placed in front of each word. If it
is greater, the `firstwordbold' indicator is placed before the
first word and the `lastwordboldbefore' indicator is placed after
the last word. Note that if the `firstwordbold' opcode is not used
its indicator is made up by doubling the dot pattern given in the
`lastwordboldbefore' entry.

`firstwordunder dots'
This is the braille indicator to be placed before the first word
of an underlined phrase.

`lastwordunderbefore dots'

These two opcodes are synonyms. This is the braille indicator to be
placed before the last word of an underlined phrase. In addition,
if `firstwordunder' is not used, this braille indicator is doubled
and placed before the first word.

`lastwordunderafter dots'
This is the braille indicator to be placed after the last word of
an underlined phrase.

`firstletterunder dots'

These two opcodes are synonyms. This is the braille indicator to be
placed before the first letter (or character) if underline emphasis
begins in the middle of a word.

`lastletterunder dots'

These two opcodes are synonyms. This is the braille indicator to be
placed after the last letter (or character) when underline emphasis
ends in the middle of a word.

`singleletterunder dots'
This braille indicator is used if only a single letter (or
character) is underlined.

`lenunderphrase number'
If `lastwordunderbefore' is used, an underlined phrase is checked
to see how many words it contains. If this number is less than or
equal to the number given in the `lenunderphrase' opcode, the
`lastwordunderbefore' sign is placed in front of each word. If it
is greater, the `firstwordunder' indicator is placed before the
first word and the `lastwordunderbefore' indicator is placed after
the last word. Note that if the `firstwordunder' opcode is not
used its indicator is made up by doubling the dot pattern given in
the `lastwordunderbefore' entry.
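
The phrase-length rule shared by the italic, bold and underline sets
can be modelled as follows. This Python sketch is an interpretation of
the descriptions above, not the translator's actual code, and the name
`mark_phrase' is hypothetical. One point is an assumption: the
phrase-length entries say the last-word-before indicator goes after the
last word, while the description of that opcode itself says before the
last word. The sketch follows the latter.

```python
def mark_phrase(words, len_phrase, last_word_before, first_word=None):
    """Return the words of an emphasized phrase with indicators added.

    words            -- the words of the phrase, in order
    len_phrase       -- the number given in the len...phrase entry
    last_word_before -- dot pattern of the last-word-before indicator
    first_word       -- dot pattern of the first-word indicator, if any
    """
    if first_word is None:
        # A missing first-word indicator is made by doubling the
        # last-word-before dot pattern.
        first_word = f"{last_word_before}-{last_word_before}"
    if len(words) <= len_phrase:
        # Short phrase: the indicator goes in front of each word.
        marked = []
        for word in words:
            marked += [last_word_before, word]
        return marked
    # Long phrase: first-word indicator, then the last-word indicator
    # before the final word (assumption, see the text above).
    return [first_word] + words[:-1] + [last_word_before, words[-1]]
```

With a phrase length of 3 and an indicator of 46, a two-word phrase
gets 46 before each word, and a four-word phrase gets 46-46 before the
first word and 46 before the fourth.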

This braille indicator is placed before a sequence of characters
translated in computer braille, whether this sequence is indicated
in the `typeform' parameter (*note Programming with liblouis::) or
inferred because it contains a subsequence specified by the
`compbrl' opcode (*note compbrl: compbrl opcode.).

This braille indicator is placed after a sequence of characters
translated in computer braille, whether this sequence is indicated
in the `typeform' parameter (*note Programming with liblouis::) or
inferred because it contains a subsequence specified by the
`compbrl' opcode (*note compbrl: compbrl opcode.).

3.5 Special Symbol Opcodes
==========================

These opcodes define certain symbols, such as the decimal point, which
require special treatment.
847
`decpoint character dots'
848
This opcode defines the decimal point. The character operand must
849
have only one character. For example, in `en-us-g1.ctb' we have:
853
`hyphen character dots'
854
This opcode defines the hyphen, that is, the character used in
855
compound words such as have-nots. The back-translator uses it to
856
determine the end of individual words.
859
3.6 Special Processing Opcodes
860
==============================
862
These opcodes cause special processing to be carried out.
865
This opcode has no operands. If it is specified, words or parts of
words in all caps are not contracted. This is needed for languages
870
3.7 Translation Opcodes
=======================
873
These opcodes define the braille representations for character
874
sequences. Each of them defines an entry within the contraction table.
875
These entries may be defined in any order except, as noted below, when
876
they define alternate representations for the same character sequence.
878
Each of these opcodes specifies a condition under which the
translation is legal, and each also has a characters operand and a dots
operand. The text being translated is processed strictly from left to
right, character by character, with the most eligible entry for each
position being used. If there is more than one eligible entry for a
given position in the text, then the one with the longest character
string is used. If there is more than one eligible entry for the same
character string, then the one defined first is tested for legality
first. (This is the only case in which the order of the entries makes a
difference.)
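The selection rule described above (leftmost position, then longest
character string, then first defined) can be sketched in a few lines of
Python. This is only a toy model and not the liblouis implementation:
the function `translate`, the rule tuples and the `is_legal` callbacks
are invented for illustration, and the dot strings `END` and `ANY` are
placeholders rather than real dot patterns.

```python
def translate(text, rules):
    """Toy model of entry selection.

    rules is a list of (characters, dots, is_legal) tuples in table
    order; is_legal(text, pos, length) stands in for the condition
    attached to the opcode.
    """
    output = []
    pos = 0
    while pos < len(text):
        # Entries whose characters operand matches at this position.
        candidates = [r for r in rules if r[0] and text.startswith(r[0], pos)]
        # Longest string first. sorted() is stable, so entries with the
        # same string keep the order in which they were defined.
        candidates = sorted(candidates, key=lambda r: -len(r[0]))
        for characters, dots, is_legal in candidates:
            if is_legal(text, pos, len(characters)):
                output.append(dots)
                pos += len(characters)
                break
        else:
            # No eligible entry: copy the character through unchanged.
            output.append(text[pos])
            pos += 1
    return output


def always(text, pos, length):
    return True


def at_end(text, pos, length):
    return pos + length == len(text)


RULES = [
    ("wor", "END", at_end),          # hypothetical entry, tested first
    ("wor", "ANY", always),          # same string, defined second
    ("world", "456-2456", always),   # longer string wins over both
]
```

With these hypothetical rules, `translate("world", RULES)` uses the
longest entry, while `translate("worm", RULES)` falls through the first
`wor` entry (its condition fails) to the second.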
889
The characters operand is a sequence or string of characters preceded
890
and followed by whitespace. Each character can be entered in the normal
891
way, or it can be defined as a four-digit hexadecimal number preceded
894
The dots operand defines the braille representation for the
characters operand. It may also be specified as an equals sign (`=').
This means that the default representation for each character
(*note Character-Definition Opcodes::) within the sequence is to be
used.
900
In what follows the word `characters' means a sequence of one or
901
more consecutive letters between spaces and/or punctuation marks.
904
This is an opcode prefix, that is to say, it modifies the
905
operation of the opcode that follows it on the same line. noback
906
specifies that no back-translation is to be done using this line.
911
This is an opcode prefix which modifies the operation of the opcode
following it on the same line. nofor specifies that forward
translation is not to use the information on this line.
917
These two opcodes are synonyms. If the characters are found within
a block of text surrounded by whitespace, the entire block is
translated according to the default braille representations
defined by the character-definition opcodes (*note
Character-Definition Opcodes::) if 8-dot computer braille is
enabled, or according to the dot patterns given in the `comp6'
opcode (*note comp6: comp6 opcode.) if 6-dot computer braille is
enabled. For example:
925
compbrl www translate URLs in computer braille
927
`comp6 character dots'
928
This opcode specifies the translation of characters in 6-dot
929
computer braille. It is necessary because the translation of a
930
single character may require more than one cell. The first operand
931
must be a character with a decimal representation from 0 to 255
932
inclusive. The second operand may specify as many cells as
933
necessary. The opcode is somewhat of a misnomer, since any dots,
934
not just dots 1 through 6, can be specified. This even includes
938
Like `compbrl', except that the string is uncontracted. `prepunc'
939
opcode (*note prepunc: prepunc opcode.) and `postpunc' opcode
940
(*note postpunc: postpunc opcode.) rules are applied, however.
941
This is useful for specifying that foreign words should not be
942
contracted in an entire document.
944
`replace characters {characters}'
945
Replace the first set of characters, no matter where they appear,
946
with the second. Note that the second operand is _NOT_ a dot
947
pattern. It is also optional. If it is omitted the character(s)
948
in the first operand will be discarded. This is useful for
949
ignoring characters. It is possible that the "ignored" characters
950
may still affect the translation indirectly. Therefore, it is
951
preferable to use `correct' opcode (*note correct: correct
954
`always characters dots'
955
Replace the characters with the dot pattern no matter where they
956
appear. Do _NOT_ use an entry such as `always a 1'. Use the
957
`uplow', `letter', etc. character definition opcodes instead. For
960
always world 456-2456 unconditional translation
962
`repeated characters dots'
963
Replace the characters with the dot pattern no matter where they
964
appear. Ignore any consecutive repetitions of the same character
965
sequence. This is useful for shortening long strings of spaces or
966
hyphens or periods. For example:
968
repeated --- 36-36-36 shorten separator lines made with hyphens
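As a rough illustration of the effect of `repeated' on the input, the
following Python sketch collapses every run of consecutive copies of a
character sequence into a single copy. The function name
`collapse_repeats` is made up for this example, and the sketch works on
print characters only; it does not produce dot patterns.

```python
import re


def collapse_repeats(text, sequence):
    """Replace each run of one or more consecutive copies of sequence
    with a single copy."""
    pattern = "(?:" + re.escape(sequence) + ")+"
    # A function is used as the replacement so that the sequence is
    # inserted literally, whatever characters it contains.
    return re.sub(pattern, lambda match: sequence, text)
```

For instance, `collapse_repeats("a------b", "---")` leaves a single
`---` between the two letters.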
970
`repword characters dots'
971
When characters are encountered, check to see if the word before
this string matches the word after it. If so, replace characters
with dots and eliminate the second word and any word following
another occurrence of characters that is the same. This opcode is
used in Malaysian braille. In this case the rule is:
979
`largesign characters dots'
980
Replace the characters with the dot pattern no matter where they
981
appear. In addition, if two words defined as large signs follow
982
each other, remove the space between them. For example, in
983
`en-us-g2.ctb' the words `and' and `the' are both defined as large
984
signs. Thus, in the phrase `the cat and the dog' the space would
985
be deleted between `and' and `the', with the result `the cat
986
andthe dog'. Of course, `and' and `the' would be properly
987
contracted. The term `largesign' is a bit of braille jargon that
988
pleases braille experts.
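The space-removal part of `largesign' can be modelled as follows. This
is a sketch under simplifying assumptions: words are separated by
single spaces, the set of large signs is passed in by the caller, and
the hypothetical function `join_large_signs` works on print text rather
than on dot patterns.

```python
def join_large_signs(text, large_signs):
    """Remove the space between two adjacent words that are both
    large signs."""
    output = []
    previous_was_large = False
    for word in text.split(" "):
        is_large = word in large_signs
        if is_large and previous_was_large:
            # Glue this word onto the previous large sign.
            output[-1] += word
        else:
            output.append(word)
        previous_was_large = is_large
    return " ".join(output)
```

Calling `join_large_signs("the cat and the dog", {"and", "the"})`
reproduces the `andthe' result described above.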
990
`word characters dots'
991
Replace the characters with the dot pattern if they are a word,
992
that is, are surrounded by whitespace and/or punctuation.
994
`syllable characters dots'
995
As its name indicates, this opcode defines a "syllable" which must
be represented by exactly the dot patterns given. Contractions may
not cross the boundaries of this "syllable" either from left or
right. The character string defined by this opcode need not be a
lexical syllable, though it usually will be. The equal sign in the
following example means that the default representation for
each character within the sequence is to be used (*note
Translation Opcodes::):
1004
syllable horse = sawhorse, horseradish
1006
`nocross characters dots'
1007
Replace the characters with the dot pattern if the characters are
1008
all in one syllable (do not cross a syllable boundary). For this
1009
opcode to work, a hyphenation table must be included. If this is
1010
not done, `nocross' behaves like the `always' opcode (*note
1011
always: always opcode.). For example, if the English Grade 2 table
is being used and the appropriate hyphenation table has been
included, `nocross sh 146' will cause the `sh' in `monkshood' not
1016
`joinword characters dots'
1017
Replace the characters with the dot pattern if they are a word
1018
which is followed by whitespace and a letter. In addition remove
1019
the whitespace. For example, `en-us-g2.ctb' has `joinword to 235'.
1020
This means that if the word `to' is followed by another word the
1021
contraction is to be used and the space is to be omitted. If these
1022
conditions are not met, the word is translated according to any
1023
other opcodes that may apply to it.
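The condition attached to `joinword' can be approximated with a regular
expression. The sketch below is an illustration only: the function
`apply_joinword` is hypothetical, "letter" is narrowed to ASCII letters,
and the dot pattern is shown in square brackets instead of being
emitted as braille.

```python
import re


def apply_joinword(text, word, dots):
    """Replace word plus the following whitespace with [dots], but only
    when the word stands alone and a letter follows the whitespace."""
    pattern = r"\b" + re.escape(word) + r"\s+(?=[A-Za-z])"
    return re.sub(pattern, lambda match: "[" + dots + "]", text)
```

With the entry `joinword to 235', `apply_joinword("go to school", "to",
"235")` joins the contraction to the next word, while `go to 5' and
`where to?' are left for other opcodes.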
1025
`lowword characters dots'
1026
Replace the characters with the dot pattern if they are a word
1027
preceded and followed by whitespace. No punctuation either before
1028
or after the word is allowed. The term `lowword' derives from the
1029
fact that in English these contractions are written in the lower
1030
part of the cell. For example:
1034
`contraction characters'
1035
If you look at `en-us-g2.ctb' you will see that some words are
1036
actually contracted into some of their own letters. A famous
1037
example among braille transcribers is `also', which is contracted
1038
as `al'. But this is also the name of a person. To take another
1039
example, `altogether' is contracted as `alt', but this is the
1040
abbreviation for the alternate key on a computer keyboard.
1041
Similarly `could' is contracted into `cd', but this is the
1042
abbreviation for compact disk. To prevent confusion in such cases,
1043
the letter sign (see `letsign' opcode (*note letsign: letsign
1044
opcode.)) is placed before such letter combinations when they
1045
actually are abbreviations, not contractions. The `contraction'
1046
opcode tells the translator to do this.
1048
`sufword characters dots'
1049
Replace the characters with the dot pattern if they are either a
1050
word or at the beginning of a word.
1052
`prfword characters dots'
1053
Replace the characters with the dot pattern if they are either a
1054
word or at the end of a word.
1056
`begword characters dots'
1057
Replace the characters with the dot pattern if they are at the
1058
beginning of a word.
1060
`begmidword characters dots'
1061
Replace the characters with the dot pattern if they are either at
1062
the beginning or in the middle of a word.
1064
`midword characters dots'
1065
Replace the characters with the dot pattern if they are in the
1068
`midendword characters dots'
1069
Replace the characters with the dot pattern if they are either in
1070
the middle or at the end of a word.
1072
`endword characters dots'
1073
Replace the characters with the dot pattern if they are at the end
1076
`partword characters dots'
1077
Replace the characters with the dot pattern if the characters are
anywhere in a word, that is, if they are preceded or followed by a
letter.
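The word-position opcodes from `word' to `partword' differ only in
where inside a word the characters may occur. The following Python
sketch summarizes one reading of those descriptions. It is an
assumption of this example, not a statement about the liblouis source,
that `begword', `midword' and `endword' exclude a match covering the
whole word; the names `ELIGIBLE_POSITIONS`, `position_in_word` and
`eligible_opcodes` are invented here.

```python
# Positions at which each opcode is taken to apply (one reading of the
# descriptions above, labelled as an assumption).
ELIGIBLE_POSITIONS = {
    "word": {"whole"},
    "sufword": {"whole", "begin"},
    "prfword": {"whole", "end"},
    "begword": {"begin"},
    "begmidword": {"begin", "middle"},
    "midword": {"middle"},
    "midendword": {"middle", "end"},
    "endword": {"end"},
    "partword": {"begin", "middle", "end"},
}


def position_in_word(word, start, length):
    """Classify the span word[start:start + length]."""
    at_start = start == 0
    at_end = start + length == len(word)
    if at_start and at_end:
        return "whole"
    if at_start:
        return "begin"
    if at_end:
        return "end"
    return "middle"


def eligible_opcodes(word, start, length):
    """Return the opcodes whose position condition the span satisfies."""
    position = position_in_word(word, start, length)
    return sorted(opcode for opcode, allowed in ELIGIBLE_POSITIONS.items()
                  if position in allowed)
```

For example, the letters `the' at the start of `there' satisfy
`begword', `begmidword', `sufword' and `partword' under this reading,
but not `word'.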
1082
Note that the operand must begin with an at sign (`@'). The dot
1083
pattern following it is evaluated for validity. If it is valid,
1084
whenever an at sign followed by this dot pattern appears in the
1085
source document it is replaced by the characters corresponding to
1086
the dot pattern in the output. This opcode is intended for use in
1087
liblouisxml semantic-action files to specify exact dot patterns,
1088
as in mathematical codes. For example:
1090
exactdots @4-46-12356
1091
will produce the characters with these dot patterns in the output.
1093
`prepunc characters dots'
1094
Replace the characters with the dot pattern if they are part of
1095
punctuation at the beginning of a word.
1097
`postpunc characters dots'
1098
Replace the characters with the dot pattern if they are part of
1099
punctuation at the end of a word.
1101
`begnum characters dots'
1102
Replace the characters with the dot pattern if they are at the
1103
beginning of a number, that is, before all its digits. For
1104
example, in `en-us-g1.ctb' we have `begnum # 4'.
1106
`midnum characters dots'
1107
Replace the characters with the dot pattern if they are in the
1108
middle of a number. For example, `en-us-g1.ctb' has `midnum . 46'.
1109
This is because the decimal point has a different dot pattern than
1112
`endnum characters dots'
1113
Replace the characters with the dot pattern if they are at the end
1114
of a number. For example `en-us-g1.ctb' has `endnum th 1456'.
1115
This handles things like `4th'. A letter sign is _NOT_ inserted.
1117
`joinnum characters dots'
1118
Replace the characters with the dot pattern. In addition, if
whitespace and a number follow, omit the whitespace.
1122
3.8 Character-Class Opcodes
===========================
1125
These opcodes define and use character classes. A character class
1126
associates a set of characters with a name. The name then refers to any
1127
character within the class. A character may belong to more than one
1130
The basic character classes correspond to the character definition
1131
opcodes, with the exception of the `uplow' opcode (*note uplow: uplow
1132
opcode.), which defines characters belonging to the two classes
1133
`uppercase' and `lowercase'. These classes are:
1136
White-space characters such as blank and tab
1142
Both uppercase and lowercase alphabetic characters
1145
Lowercase alphabetic characters
1148
Uppercase alphabetic characters
1154
Signs such as percent (`%')
1157
Mathematical symbols
1163
Not properly defined
1166
The opcodes which define and use character classes are shown below.
1167
For examples see `fr-abrege.ctb'.
1169
`class name characters'
1170
Define a new character class. The characters operand must be
1171
specified as a string. A character class may not be used until it
1174
`after class opcode ...'
1175
The specified opcode is further constrained in that the matched
1176
character sequence must be immediately preceded by a character
1177
belonging to the specified class. If this opcode is used more than
1178
once on the same line then the union of the characters in all the
1181
`before class opcode ...'
1182
The specified opcode is further constrained in that the matched
1183
character sequence must be immediately followed by a character
1184
belonging to the specified class. If this opcode is used more than
1185
once on the same line then the union of the characters in all the
1192
The swap opcodes are needed to tell the `context' opcode (*note
1193
context: context opcode.), the `correct' opcode (*note correct: correct
1194
opcode.) and multipass opcodes which dot patterns to swap for which
1195
characters. There are three, `swapcd', `swapdd' and `swapcc'. The first
1196
swaps dot patterns for characters. The second swaps dot patterns for
1197
dot patterns and the third swaps characters for characters. The first
1198
is used in the `context' opcode and the second is used in the multipass
1199
opcodes. Dot patterns are separated by commas and may contain more than
1202
`swapcd name characters dots, dots, dots, ...'
1203
See above paragraph for explanation. For example:
1205
swapcd dropped 0123456789 356,2,23,...
1207
`swapdd name dots, dots, dots ... dotpattern1, dotpattern2, dotpattern3, ...'
1208
The `swapdd' opcode defines substitutions for the multipass
1209
opcodes. In the second operand the dot patterns must be single
1210
cells, but in the third operand multi-cell dot patterns are
1211
allowed. This is because multi-cell patterns in the second operand
1212
would lead to ambiguities.
1214
`swapcc name characters characters'
1215
The `swapcc' opcode swaps characters in its second operand for
1216
characters in the corresponding places in its third operand. It is
1217
intended for use with `correct' opcodes and can solve problems
1218
such as formatting phone numbers.
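The positional pairing performed by `swapcd' and `swapcc' can be
pictured with two small Python helpers. They are illustrative only:
the names `swapcd_map` and `swapcc_apply` are hypothetical, and the
`swapcd' example uses just the first three characters and dot patterns
of the `dropped' entry above, because the remaining patterns are elided
there.

```python
def swapcd_map(characters, dots):
    """Pair each character with the dot pattern in the same position."""
    patterns = dots.split(",")
    if len(patterns) != len(characters):
        raise ValueError("need exactly one dot pattern per character")
    return dict(zip(characters, patterns))


def swapcc_apply(text, characters, replacements):
    """Swap each character for the one in the same position of the
    replacement string."""
    if len(characters) != len(replacements):
        raise ValueError("both operands must have the same length")
    return text.translate(str.maketrans(characters, replacements))
```

A hypothetical phone-number cleanup such as
`swapcc_apply("555.1234", ".", "-")` shows the kind of problem `swapcc'
is meant to help with when used from a `correct' entry.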
1221
3.10 The Context and Multipass Opcodes
======================================
1224
`context test action'
1228
The `context' and multipass opcodes (`pass2', `pass3' and `pass4')
1229
provide translation capabilities beyond those of the basic
1230
translation opcodes (*note Translation Opcodes::) discussed
1231
previously. The multipass opcodes cause additional passes to be
1232
made over the string to be translated. The number after the word
1233
`pass' indicates in which pass the entry is to be applied. If no
1234
multipass opcodes are given, only the first translation pass is
1235
made. The `context' opcode is basically a multipass opcode for the
1236
first pass. It differs slightly from the multipass opcodes per se.
1237
The format of all these opcodes is:
1241
The `test' and `action' operands have suboperands. Each suboperand
1242
begins with a non-alphanumeric character and ends when another
1243
non-alphanumeric character is encountered. The suboperands and
1244
their initial characters are as follows.
1247
a string of characters. This string must be terminated by
1248
another double quote. It may contain any characters. If a
1249
double quote is needed within the string, it must be preceded
1250
by a backslash (`\'). If a space is needed, it must be
1251
represented by the escape sequence \s. This suboperand is
1252
valid only in the test part of the `context' opcode.
1255
a sequence of dot patterns. Cells are separated by hyphens as
1256
usual. This suboperand is not valid in the test part of the
1257
context and correct opcodes.
1260
a string of attributes, such as `d' for digit, `l' for
1261
letter, etc. More than one attribute can be given. If you
1262
wish to check characters with any attribute, use the letter
1263
`a'. Input characters are checked to see if they have at
1264
least one of the attributes. The attribute string can be
1265
followed by numbers specifying how many characters are to be
1266
checked. If no numbers are given, 1 is assumed. If two
1267
numbers separated by a hyphen are given, the input is checked
1268
to make sure that at least the first number of characters with
1269
the attributes are present, but no more than the second
1270
number. If only one number is present, then exactly that many
1271
characters must have the attributes. A period instead of the
1272
numbers indicates an indefinite number of characters. This
1273
suboperand is valid in all test parts but not in action
1274
parts. For the characters which can be used in attribute
1275
strings, see the following table.
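The count syntax of the attribute suboperand can be made concrete with
a small parser. This Python sketch only splits a suboperand such as
`$d1-3` into its parts; it does not match input. The function name
`parse_attribute_suboperand` is invented, and treating a period as "one
or more" is an assumption of this example, since the text above says
only that the number is indefinite.

```python
import re


def parse_attribute_suboperand(suboperand):
    """Return (attributes, minimum, maximum); maximum is None when the
    number of characters is indefinite."""
    match = re.fullmatch(r"\$([A-Za-z]+)(?:(\.)|(\d+)(?:-(\d+))?)?", suboperand)
    if match is None:
        raise ValueError("not an attribute suboperand: " + suboperand)
    attributes, period, low, high = match.groups()
    if period:
        return attributes, 1, None          # assumption: at least one
    if low is None:
        return attributes, 1, 1             # no numbers: 1 is assumed
    if high is None:
        return attributes, int(low), int(low)   # exactly that many
    return attributes, int(low), int(high)      # a range
```

So `$d` checks one digit, `$d3` exactly three, and `$dl1-3` between one
and three characters that are digits or letters.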
1277
`! (exclamation point)'
1278
reverses the logical meaning of the suboperand which follows.
1279
For example, !$d is true only if the character is _NOT_ a
1280
digit. This suboperand is valid in test parts only.
1283
the name of a class defined by the `class' opcode (*note
1284
class: class opcode.) or the name of a swap set defined by
1285
the swap opcodes (*note Swap Opcodes::). Names may contain
1286
only letters. The letters may be upper or lower-case. The
1287
case matters. Class names may be used in test parts only.
1288
Swap names are valid everywhere.
1291
Name: the name of a grouping pair. The left brace indicates
1292
that the first (or left) member of the pair is to be used in
1293
matching. If this is between replacement brackets it must be
1294
the only item. This is also valid in the action part.
1297
Name: the name of a grouping pair. The right brace indicates
1298
that the second (or right) member is to be used in matching.
1299
See the remarks on the left brace immediately above.
1302
Search the input for the expression following the slash and
1303
return true if found. This can be used to set a variable.
1306
Move backward. If a number follows, move backward that number
1307
of characters. The program never moves backward beyond the
1308
beginning of the input string. This suboperand is valid only
1312
start replacement here. This suboperand must always be paired
1313
with a right bracket and is valid only in test parts.
1316
end replacement here. This suboperand must always be paired
1317
with a left bracket and is valid only in test parts.
1319
`# (number sign or crosshatch)'
1320
test or set a variable. Variables are referred to by numbers
1321
1 to 50, for example, `#1', `#2', `#25'. Variables may be set
1322
by one `context' or multipass opcode and tested by another.
1323
Thus, an operation that occurs at one place in a translation
1324
can tell an operation that occurs later about itself. This
1325
feature will be used in math translation, and it may also
1326
help to alleviate the need for new opcodes. This suboperand
1327
is valid everywhere.
1329
Variables are set in the action part. To set a variable use an
1330
expression like `#1=1', `#2=5', etc. Variables are also
1331
incremented and decremented in the action part with
1332
expressions like `#1+', `#3-', etc. These operators increment
1333
or decrement the variable by 1.
1335
Variables are tested in the test part with expressions like
1336
`#1=2', `#3<4', `#5>6', etc.
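The variable expressions can be modelled with a dictionary. The sketch
below is a toy interpreter, not liblouis code: the function names are
invented, and it assumes that a variable which has never been set has
the value 0, which the text above does not state.

```python
import re


def _variable_number(text):
    number = int(text)
    if not 1 <= number <= 50:
        raise ValueError("variables are numbered 1 to 50")
    return number


def apply_action(variables, expression):
    """Apply an action-part expression: '#n=value', '#n+' or '#n-'."""
    match = re.fullmatch(r"#(\d+)(?:=(\d+)|([+-]))", expression)
    if match is None:
        raise ValueError("not a variable action: " + expression)
    number = _variable_number(match.group(1))
    if match.group(2) is not None:
        variables[number] = int(match.group(2))
    elif match.group(3) == "+":
        variables[number] = variables.get(number, 0) + 1
    else:
        variables[number] = variables.get(number, 0) - 1


def check_variable(variables, expression):
    """Evaluate a test-part expression: '#n=value', '#n<value' or
    '#n>value'."""
    match = re.fullmatch(r"#(\d+)([=<>])(\d+)", expression)
    if match is None:
        raise ValueError("not a variable test: " + expression)
    value = variables.get(_variable_number(match.group(1)), 0)  # assumed default
    operand = int(match.group(3))
    if match.group(2) == "=":
        return value == operand
    if match.group(2) == "<":
        return value < operand
    return value > operand
```

Setting `#1=1` in one entry and incrementing it with `#1+` in another
makes a later test `#1=2` succeed.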
1339
Copy the characters or dot patterns in the input within the
1340
replacement brackets into the output and discard anything
1341
else that may match. This feature is used, for example, for
1342
handling numeric subscripts in Nemeth. This suboperand is
1343
valid only in action parts.
1346
Valid only in the action part. The characters to be replaced
are simply ignored. That is, they are replaced with nothing.
If either member of a grouping pair is in the replace
brackets, the other member at the same level is also removed.
1352
The characters which can be used in attribute strings are as
1375
`w first user-defined class'
1377
`x second user-defined class'
1379
`y third user-defined class'
1381
`z fourth user-defined class'
1383
Note that if any multipass opcode or the `correct' opcode is used
and the `pass1Only' mode bit (*note lou_translateString::) is not
set, input and output positions may be incorrect.
1388
3.11 The correct Opcode
=======================
1391
`correct test action'
1392
Because some input (such as that from an OCR program) may contain
1393
systematic errors, it is sometimes advantageous to use a
1394
pre-translation pass to remove them. The errors and their
1395
corrections are specified by the `correct' opcode. If there are no
1396
`correct' opcodes in a table, the pre-translation pass is not
1397
used. The format of the `correct' opcode is very similar to that
1398
of the `context' opcode (*note context: context opcode.). The only
1399
difference is that in the action part strings may be used and dot
1400
patterns may not be used. Some examples of `correct' opcode
1403
correct "\\" ? Eliminate backslashes
1404
correct "cornf" "comf" fix a common "scano"
1405
correct "cornm" "comm"
1406
correct "cornp" "comp"
1407
correct "*" ? Get rid of stray asterisks
1408
correct "|" ? ditto for vertical bars
1409
correct "\s?" "?" drop space before question mark
1411
Note that if the `correct' opcode is used and the `pass1Only' mode
bit (*note lou_translateString::) is not set, input and output
positions may be incorrect.
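The effect of the example `correct' entries on the text can be imitated
with plain string replacement. This Python sketch is a simplification
made for illustration: it applies each correction to the whole string
in turn, which is not necessarily how liblouis scans its input, and the
names `CORRECTIONS` and `pre_translate` are invented.

```python
# (wrong, right) pairs mirroring the example entries above; an empty
# right-hand side plays the role of the `?' action.
CORRECTIONS = [
    ("\\", ""),          # eliminate backslashes
    ("cornf", "comf"),   # fix a common "scano"
    ("cornm", "comm"),
    ("cornp", "comp"),
    ("*", ""),           # get rid of stray asterisks
    ("|", ""),           # ditto for vertical bars
    (" ?", "?"),         # drop space before question mark
]


def pre_translate(text, corrections=CORRECTIONS):
    """Apply every correction, in order, before translation proper."""
    for wrong, right in corrections:
        text = text.replace(wrong, right)
    return text
```

Because the replacements are literal, an entry such as `cornp' also
changes legitimate words that contain those letters, so corrections of
this kind are best kept specific.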
1416
3.12 Miscellaneous Opcodes
==========================
1420
Read the file indicated by `filename' and incorporate or include
its entries into the table. Included files can include other files,
which can include other files, etc. For an example, see what files
are included by the entry `include en-us-g1.ctb' in the table
`en-us-g2.ctb'. If the included file is not in the same directory
as the main table, use a full pathname for filename.
1428
Not implemented, but recognized and ignored for backward
1431
`display character dots'
1432
Associates dot patterns with the characters which will be sent to a
1433
braille embosser, display or screen font. The character must be in
1434
the range 0-255 and the dots must specify a single cell. Here are
1437
display a 1 When the character a is sent to the embosser or display,
# it will produce a dot 1.

display L 123 When the character L is sent to the display or embosser,
# it produces dots 1-2-3.
1443
The display opcode is optional. It is used when the embosser or
display has a different mapping of characters to dot patterns than
that given in *note Character-Definition Opcodes::. If used,
display entries must precede character-definition entries.
1448
`multind dots opcode opcode ...'
1449
The `multind' opcode tells the back-translator that a sequence of
1450
braille cells represents more than one braille indicator. For
1451
example, in `en-us-g1.ctb' we have `multind 56-6 letsign capsign'.
1452
The back-translator can generally handle single braille indicators,
1453
but it cannot apply them when they immediately follow each other.
1454
It recognizes the letter sign if it is followed by a letter and
1455
takes appropriate action. It also recognizes the capital sign if
1456
it is followed by a letter. But when there is a letter sign
1457
followed by a capital sign it fails to recognize the letter sign
1458
unless the sequence has been defined with `multind'. A `multind'
1459
entry may not contain a comment because liblouis would attempt to
1460
interpret it as an opcode.
1463
4 Notes on Back-Translation
***************************
1466
Back-translation is carried out by the function
1467
`lou_backTranslateString'. Its calling sequence is described in *note
1468
Programming with liblouis::. Tables containing no `context' opcode
1469
(*note context: context opcode.), `correct' opcode (*note correct:
1470
correct opcode.) or multipass opcodes can be used for both forward and
1471
backward translation. If these opcodes are needed, different tables will
be required. `lou_backTranslateString' first performs `pass4', if
present, then `pass3', then `pass2', then the back-translation, then
1474
corrections. Note that this is exactly the inverse of forward
1477
5 Programming with liblouis
***************************
1483
Liblouis may contain code borrowed from the Linux screenreader BRLTTY,
1484
Copyright (C) 1999-2006 by the BRLTTY Team.
1486
Copyright (C) 2004-2007 ViewPlus Technologies, Inc. `www.viewplus.com'.
1488
Copyright (C) 2007,2009 Abilitiessoft, Inc. `www.abilitiessoft.com'.
1490
Liblouis is free software: you can redistribute it and/or modify it
1491
under the terms of the GNU Lesser General Public License as published
1492
by the Free Software Foundation, either version 3 of the License, or
1493
(at your option) any later version.
1495
Liblouis is distributed in the hope that it will be useful, but
1496
WITHOUT ANY WARRANTY; without even the implied warranty of
1497
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
1498
General Public License for more details.
1500
You should have received a copy of the GNU Lesser General Public
1501
License along with Liblouis. If not, see `http://www.gnu.org/licenses/'.
1506
You use the liblouis library by calling eleven functions,
1507
`lou_translateString', `lou_backTranslateString', `lou_logFile',
1508
`lou_logPrint', `lou_getTable', `lou_translate', `lou_backTranslate',
1509
`lou_hyphenate', `lou_readCharFromFile', `lou_version' and `lou_free'.
1510
These are described below. The header file, `liblouis.h', also contains
1511
brief descriptions. Liblouis is written in straight C. It has just
1512
three code modules, `compileTranslationTable.c',
1513
`lou_translateString.c' and `lou_backTranslateString.c'. In addition,
1514
there are two header files, `liblouis.h', which defines the API, and
1515
`louis.h', used only internally. The latter includes `liblouis.h'.
1517
Persons who wish to use liblouis from Python may want to skip ahead
1518
to *note Python bindings::.
1520
`compileTranslationTable.c' keeps track of all translation tables
1521
which an application has used. It is called by the translation,
1522
hyphenation and checking functions when they start. If a table has not
1523
yet been compiled `compileTranslationTable.c' checks it for correctness
1524
and compiles it into an efficient internal representation. The main
1525
entry point is `lou_getTable'. Since it is the module that keeps track
1526
of memory usage, it also contains the `lou_free' function. In addition,
1527
it contains the `lou_logFile' and `lou_logPrint' functions, plus some
1528
utility functions which are used by the other modules.
1530
By default, liblouis handles all characters internally as 16-bit
1531
unsigned integers. It can be compiled for 32-bit characters as
1532
explained below. The meanings of these integers are not hard-coded.
1533
Rather they are defined by the character-definition opcodes. However,
1534
the standard printable characters, from decimal 32 to 126, are
1535
recognized for the purpose of processing the opcodes. Hence, the
1536
following definition is included in `liblouis.h'. It is correct for
1537
computers with at least 32-bit processors.
1539
#define widechar unsigned short int
1541
To make liblouis handle 32-bit Unicode simply remove the word
1542
`short' in the above `define'. This will cause the translate and
1543
back-translate functions to expect input in 32-bit form and to deliver
1544
their output in this form. The input to the compiler (tables) is
1545
unaffected except that two new escape sequences for 20-bit and 32-bit
1546
characters are recognized.
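The practical difference between the two character widths is which code
points fit into a single element of a buffer. The following Python
check is a sketch with an invented function name, `fits_in_widechar`;
it only compares code points against the width and does not call
liblouis.

```python
def fits_in_widechar(text, bits=16):
    """Return True if every code point of text fits in one unsigned
    integer of the given width."""
    limit = 1 << bits
    return all(ord(character) < limit for character in text)
```

A character beyond U+FFFF, such as U+1D11E, does not fit in one 16-bit
element but does fit once the width is 32 bits.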
1548
Here are the definitions of the eleven liblouis functions and their
1549
parameters. They are given in terms of 16-bit Unicode. If liblouis has
1550
been compiled for 32-bit Unicode simply read 32 instead of 16.
1552
5.3 Data structure of liblouis tables
=====================================
1555
The data structure `TranslationTableHeader' is defined by a `typedef'
1556
statement in `louis.h'. To find the beginning, search for the word
1557
`header'. As its name implies, this is actually the table header. Data
1558
are placed in the `ruleArea' array, which is the last item defined in
1559
this structure. This array is declared with a length of 1 and is
1560
expanded as needed. The table header consists mostly of arrays of
1561
pointers of size `HASHNUM'. These pointers are actually offsets into
1562
`ruleArea' and point to chains of items which have been placed in the
1563
same hash bucket by a simple hashing algorithm. `HASHNUM' should be a
1564
prime and is currently 1123. The structure of the table was chosen to
1565
optimize speed rather than memory usage.
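The arrangement described above (arrays of `HASHNUM' offsets into one
growing area, with colliding items chained together) can be pictured
with a small Python model. Everything in it except the value 1123 is
an assumption made for illustration: the hash function `toy_hash`, the
class `ToyTable`, the entry layout and the use of offset 0 to mean "no
entry" are not taken from `louis.h'.

```python
HASHNUM = 1123


def toy_hash(key):
    """Hypothetical hash: sum of code points modulo HASHNUM."""
    return sum(ord(character) for character in key) % HASHNUM


class ToyTable:
    def __init__(self):
        # One offset per bucket; 0 means the bucket is empty (assumed).
        self.buckets = [0] * HASHNUM
        # Slot 0 is reserved so that 0 can never be a valid offset.
        self.rule_area = [None]

    def add(self, key, value):
        bucket = toy_hash(key)
        # The new entry points at the previous head of the chain.
        entry = {"key": key, "value": value, "next": self.buckets[bucket]}
        self.rule_area.append(entry)
        self.buckets[bucket] = len(self.rule_area) - 1

    def find(self, key):
        offset = self.buckets[toy_hash(key)]
        while offset:
            entry = self.rule_area[offset]
            if entry["key"] == key:
                return entry["value"]
            offset = entry["next"]
        return None
```

Keys that land in the same bucket, such as `ab` and `ba` under this toy
hash, are found by walking the chain of offsets.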
1567
The first part of the table contains miscellaneous information, such
1568
as the number of passes and whether various opcodes have been used. It
1569
also contains the amount of memory allocated to the table and the
1570
amount actually used.
1572
The next section contains pointers to various braille indicators and
1573
begins with `capitalSign'. The rules pointed to contain the dot pattern
1574
for the indicator and an opcode which is used by the back-translator
1575
but does not appear in the list of opcodes. The braille indicators also
1576
include various kinds of emphasis, such as italic and bold and
1577
information about the length of emphasized phrases. The latter is
1578
contained directly in the table item instead of in a rule.
1580
After the braille indicators comes information about when a letter
1581
sign should be used.
1583
Next is an array of size `HASHNUM' which points to character
1584
definitions. These are created by the character-definition opcodes.
1586
Following this is a similar array pointing to definitions of
1587
single-cell dot patterns. This is also created from the
1588
character-definition opcodes. If a character definition contains a
1589
multi-cell dot pattern this is compiled into ordinary forward and
1590
backward rules. If such a multi-cell dot pattern contains a single cell
1591
which has not previously been defined that cell is placed in this
1592
array, but is given the attribute `undefined'.
1594
Next come arrays that map characters to single-cell dot patterns and
1595
dots to characters. These are created from both character-definition
1596
opcodes and display opcodes.
1598
Next is an array of size 256 which maps characters in this range to
1599
dot patterns which may consist of multiple cells. It is used, for
1600
example, to map `{' to dots 456-246. These mappings are created by the
1601
`compdots' or the `comp6' opcode (*note comp6: comp6 opcode.).
1603
Next are two small arrays that hold pointers to chains of rules
1604
produced by the `swapcd' opcode (*note swapcd: swapcd opcode.) and the
1605
`swapdd' opcode (*note swapdd: swapdd opcode.) and by some multipass,
1606
context and correct opcodes.
1608
Now we get to an array of size `HASHNUM' which points to chains of
1609
rules for forward translation.
1611
Following this is a similar array for back-translation.
1613
Finally comes the `ruleArea', an array of variable size to which
1614
various structures are mapped and to which almost everything else
1620
char *lou_version ()
1622
This function returns a pointer to a character string containing the
1623
version of liblouis, plus other information, such as the release date
1624
and perhaps notable changes.
1626
5.5 lou_translateString
=======================
1629
int lou_translateString (
1630
const char *const trantab,
1631
const widechar *const inbuf,
1639
This function takes a string of 16-bit Unicode characters in `inbuf'
1640
and translates it into a string of 16-bit characters in `outbuf'. Each
1641
16-bit character produces a particular dot pattern in one braille cell
1642
when sent to an embosser or braille display or to a screen typefont.
1643
Which 16-bit character represents which dot pattern is indicated by the
1644
character-definition and display opcodes in the translation table.
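
Because `inbuf' holds 16-bit characters rather than a C string, callers usually have to convert their text first. The sketch below shows one way to do this for plain 7-bit ASCII input. The helper `ascii_to_widechar' is hypothetical, and the `typedef' assumes a build in which `widechar' is an unsigned 16-bit integer; in a real program the definition in `liblouis.h' should be used instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a 16-bit build in which widechar is an unsigned 16-bit
 * integer. Check liblouis.h for the real definition. */
typedef unsigned short widechar;

/* Copy a 7-bit ASCII string into a widechar buffer.
 * Returns the number of characters written (no terminator is added),
 * stopping early if the buffer is full. */
int ascii_to_widechar(const char *text, widechar *buf, int max_len)
{
    assert(text != NULL && buf != NULL);

    int len = 0;
    while (text[len] != '\0' && len < max_len) {
        buf[len] = (widechar)(unsigned char)text[len];
        len++;
    }
    return len;
}
```

The returned length is the value to store in the integer passed as `inlen'.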
1646
The `trantab' parameter points to a list of translation tables
1647
separated by commas. If only one table is given, no comma should be
1648
used after it. It is these tables which control just how the
1649
translation is made, whether in Grade 2, Grade 1, or something else.
1650
The first table in the list must be a full pathname, unless the tables
1651
are in the current directory. The pathname is extracted up to the
1652
filename. The first table is then compiled. The pathname is then added
1653
to the name of the second table, which is compiled, and so on. The
1654
tables in a list are all compiled into the same internal table. The
1655
list is then regarded as the name of this table. As explained in *note
1656
How to Write Translation Tables::, each table is a file which may be
1657
plain text, big-endian Unicode or little-endian Unicode. A table (or
1658
list of tables) is compiled into an internal representation the first
1659
time it is used. Liblouis keeps track of which tables have been
1660
compiled. For this reason, it is essential to call the `lou_free'
1661
function at the end of your application to avoid memory leaks. Do _NOT_
1662
call `lou_free' after each translation. This will force liblouis to
1663
compile the translation tables each time they are used, leading to
1666
Note that both the `inlen' and `outlen' parameters are pointers to
1667
integers. When the function is called, these integers contain the
1668
maximum input and output lengths, respectively. When it returns, they
1669
are set to the actual lengths used.
1671
The `typeform' parameter is used to indicate italic type, boldface
1672
type, computer braille, etc. It is a string of characters with the same
1673
length as the input buffer pointed to by `inbuf'. However, it is used
1674
to pass back character-by-character results, so enough space must be
1675
provided to match the `*outlen' parameter. Each character indicates
1676
the typeform of the corresponding character in the input buffer. The
1677
values are as follows: 0 plain-text; 1 italic; 2 bold; 4 underline; 8
1678
computer braille. These values can be added for multiple emphasis. If
1679
this parameter is `NULL', no checking for typeforms is done. In
1680
addition, if this parameter is not `NULL', it is set on return to have
1681
an 8 at every position corresponding to a character in `outbuf' which
1682
was defined to have a dot representation containing dot 7, dot 8 or
1683
both, and to 0 otherwise.
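
A minimal sketch of how a caller might fill the `typeform' array before translation follows. The constant names and the helper `mark_typeform' are hypothetical; the values are the ones listed above, and the names and values in `liblouis.h' take precedence if they differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the values are the ones listed in the text.
 * The names and values in liblouis.h take precedence. */
enum {
    TF_PLAIN = 0,
    TF_ITALIC = 1,
    TF_BOLD = 2,
    TF_UNDERLINE = 4,
    TF_COMPUTER_BRAILLE = 8
};

/* Add `flag' to typeform[start] .. typeform[end - 1], clipping the
 * range to the buffer. Since the values are distinct powers of 2,
 * OR-ing them is the same as adding them. */
void mark_typeform(char *typeform, int len, int start, int end, int flag)
{
    assert(typeform != NULL);

    if (start < 0)
        start = 0;
    if (end > len)
        end = len;
    for (int i = start; i < end; i++)
        typeform[i] |= (char)flag;
}
```

With an 11-character input whose `typeform' array starts out as all zeros, marking positions 0 to 4 italic and then positions 3 to 4 bold leaves the value 3 (italic plus bold) at positions 3 and 4. Remember that the array must also be large enough for the output, as described above.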
1685
The `spacing' parameter is used to indicate differences in spacing
1686
between the input string and the translated output string. It is also
1687
of the same length as the string pointed to by `inbuf'. If this
1688
parameter is `NULL', no spacing information is computed.
1690
The `mode' parameter specifies how the translation should be done.
1691
The valid values of mode are listed in `liblouis.h'. They are all
1692
powers of 2, so that a combined mode can be specified by adding up
1695
The function returns 1 if no errors were encountered and 0 if a
1696
complete translation could not be done.
1702
const char *const trantab,
1703
const widechar * const inbuf,
1714
This function adds the parameters `outputPos', `inputPos' and
1715
`cursorPos', to facilitate use in screenreader programs. The
1716
`outputPos' parameter must point to an array of integers with at least
1717
`outlen' elements. On return, this array will contain the position in
1718
`inbuf' corresponding to each output position. Similarly, `inputPos'
1719
must point to an array of integers of at least `inlen' elements. On
1720
return, this array will contain the position in `outbuf' corresponding
1721
to each position in `inbuf'. `cursorPos' must point to an integer
1722
containing the position of the cursor in the input. On return, it will
1723
contain the cursor position in the output. Any parameter after `outlen'
1724
may be `NULL'. In this case, the actions corresponding to it will not
1725
be carried out. The `mode' parameter, however, must be present and must
1726
be an integer, not a pointer to an integer. If the `compbrlAtCursor'
1727
bit is set in the `mode' parameter, the space-bounded characters
1728
containing the cursor will be translated in computer braille.
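
The two position arrays can be used, for example, to map a routing key on a braille display back to the text, or to find the cell that shows a given character. The helpers below are a hypothetical sketch of such lookups; they only read arrays that a translation call is assumed to have filled in already, and they do not call liblouis themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Return the position in the input that corresponds to output cell
 * `cell', using the `outputPos' array filled in by the translation
 * call, or -1 if the cell lies outside the translated text. */
int input_pos_for_cell(const int *outputPos, int outlen, int cell)
{
    assert(outputPos != NULL);

    if (cell < 0 || cell >= outlen)
        return -1;
    return outputPos[cell];
}

/* Return the position in the output that corresponds to input
 * position `pos', using the `inputPos' array, or -1 if `pos' lies
 * outside the input. */
int cell_for_input_pos(const int *inputPos, int inlen, int pos)
{
    assert(inputPos != NULL);

    if (pos < 0 || pos >= inlen)
        return -1;
    return inputPos[pos];
}
```

Here `outlen' and `inlen' are the actual lengths reported on return, not the buffer sizes passed in.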
1730
5.7 lou_backTranslateString
1731
===========================
1733
int lou_backTranslateString (
1734
const char *const trantab,
1735
const widechar *const inbuf,
1743
This is exactly the opposite of `lou_translateString'. `inbuf' is a
1744
string of 16-bit Unicode characters representing braille. `outbuf' will
1745
contain a string of 16-bit Unicode characters. `typeform' will indicate
1746
any emphasis found in the input string, while `spacing' will indicate
1747
any differences in spacing between the input and output strings. The
1748
`typeform' and `spacing' parameters may be `NULL' if this information is
1749
not needed. `mode' again specifies how the back-translation should be
1752
5.8 lou_backTranslate
1753
=====================
1755
int lou_backTranslate (
1756
const char *const trantab,
1757
const widechar *const inbuf,
1768
This function is exactly the inverse of `lou_translate'.
1774
const char *const trantab,
1775
const widechar * const inbuf,
1780
This function looks at the characters in `inbuf' and if it finds a
1781
sequence of letters attempts to hyphenate it as a word. Note that
1782
`lou_hyphenate' operates on single words only, and spaces or punctuation
1783
marks between letters are not allowed. Leading and trailing punctuation
1784
marks are ignored. The table named by the `trantab' parameter must
1785
contain a hyphenation table. If it does not, the function does nothing.
1786
`inlen' is the length of the character string in `inbuf'. `hyphens' is
1787
an array of characters and must be of size `inlen'. If hyphenation is
1788
successful it will have a 1 at the beginning of each syllable and a 0
1789
elsewhere. If the `mode' parameter is 0, `inbuf' is assumed to contain
1790
untranslated characters. Any nonzero value means that `inbuf' contains
1791
a translation. In this case, it is back-translated, hyphenation is
1792
performed, and it is retranslated so that the hyphens can be placed
1793
correctly. The `lou_translate' and `lou_backTranslate' functions are
1794
used in this process. `lou_hyphenate' returns 1 if hyphenation was
1795
successful and 0 otherwise. In the latter case, the contents of the
1796
`hyphens' parameter are undefined. This function was provided for use in
1802
void lou_logFile (char *fileName);
1804
This function is used when it is not convenient either to let
1805
messages be printed on stderr or to use redirection, as when liblouis
1806
is used in a GUI application or in liblouisxml. Any error messages
1807
generated will be printed to the file given in this call. The entire
1808
pathname of the file must be given.
1813
void lou_logPrint (char *format, ...);
1815
This function is called like `printf'. It can be used by other
1816
libraries to print messages to the file specified by the call to
1817
`lou_logFile'. In particular, it is used by the companion library
1823
void *lou_getTable (char *tablelist);
1825
`tablelist' is a list of names of table files separated by commas,
1826
as explained previously (*note `trantab' parameter in
1827
`lou_translateString': translation-tables.). If no errors are found
1828
this function returns a pointer to the compiled table. If errors are
1829
found, messages are printed to the log file, which is stderr unless a
1830
different filename has been given using the `lou_logFile' function.
1831
Errors result in a `NULL' pointer being returned.
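
The way a table list is expanded, with the directory of the first table prepended to the later names, can be pictured with the sketch below. It is an illustration of the rule described for the `trantab' parameter, not the code liblouis uses. The function `resolve_table' is hypothetical, and '/' is assumed to be the directory separator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the full path of entry number `index' (counting from 0) of a
 * comma-separated table list to `out'. The directory part of the
 * first entry is prepended to every later entry.
 * Returns 0 on success, -1 if the entry does not exist or the result
 * does not fit. Assumes '/' is the directory separator. */
int resolve_table(const char *list, int index, char *out, size_t out_size)
{
    assert(list != NULL && out != NULL);

    /* Directory prefix: everything in the first entry up to and
     * including its last '/'. Empty if there is no '/'. */
    size_t first_len = strcspn(list, ",");
    size_t dir_len = 0;
    for (size_t i = 0; i < first_len; i++)
        if (list[i] == '/')
            dir_len = i + 1;

    /* Walk to the requested entry. */
    const char *entry = list;
    for (int i = 0; i < index; i++) {
        entry += strcspn(entry, ",");
        if (*entry != ',')
            return -1;
        entry++;
    }
    size_t name_len = strcspn(entry, ",");
    if (index < 0 || name_len == 0)
        return -1;

    int n;
    if (index == 0)
        n = snprintf(out, out_size, "%.*s", (int)name_len, entry);
    else
        n = snprintf(out, out_size, "%.*s%.*s",
                     (int)dir_len, list, (int)name_len, entry);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For the list `/tables/first.ctb,second.cti' (made-up file names), entry 1 resolves to `/tables/second.cti'.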
1833
5.13 lou_readCharFromFile
1834
=========================
1836
int lou_readCharFromFile (const char *fileName, int *mode);
1838
This function is provided for situations where it is necessary to
1839
read a file which may contain little-endian or big-endian 16-bit Unicode
1840
characters or ASCII8 characters. The return value is a little-endian
1841
character, encoded as an integer. The `fileName' parameter is the name
1842
of the file to be read. The `mode' parameter is a pointer to an integer
1843
which must be set to 1 on the first call. After that, the function
1844
takes care of it. On end-of-file the function returns `EOF'.
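
To make the byte-order issue concrete, the sketch below combines two consecutive bytes of a 16-bit Unicode file into a single character value. The function `decode_utf16_unit' is hypothetical and is not how `lou_readCharFromFile' is necessarily implemented; it only shows why the two byte orders must be treated differently.

```c
#include <assert.h>
#include <stddef.h>

/* Combine two consecutive bytes of a 16-bit Unicode file into one
 * character value. `big_endian' is nonzero when the most significant
 * byte comes first in the file. */
int decode_utf16_unit(const unsigned char bytes[2], int big_endian)
{
    assert(bytes != NULL);

    if (big_endian)
        return (bytes[0] << 8) | bytes[1];
    return (bytes[1] << 8) | bytes[0];
}
```

The bytes 0x28 0x01 read as big-endian and the bytes 0x01 0x28 read as little-endian both give the value 0x2801.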
1851
This function should be called at the end of the application to free
1852
all memory allocated by liblouis. Failure to do so will result in
1853
memory leaks. Do _NOT_ call `lou_free' after each translation. This
1854
will force liblouis to compile the translation tables every time they
1855
are used, resulting in great inefficiency.
1857
5.15 Python bindings
1858
====================
1860
There are Python bindings for `lou_translateString', `lou_translate'
1861
and `lou_version'. For installation instructions see the `README'
1862
file in the `python' directory. Usage information is included in the
1863
Python module itself.
1868
after: See 3.8. (line 1174)
1869
always: See 3.7. (line 954)
1870
before: See 3.8. (line 1181)
1871
begbold: See 3.4. (line 757)
1872
begcaps: See 3.3. (line 593)
1873
begcomp: See 3.4. (line 826)
1874
begital: See 3.4. (line 707)
1875
begmidword: See 3.7. (line 1060)
1876
begnum: See 3.7. (line 1101)
1877
begunder: See 3.4. (line 799)
1878
begword: See 3.7. (line 1056)
1879
boldsign: See 3.4. (line 742)
1880
capsign: See 3.3. (line 587)
1881
capsnocont: See 3.6. (line 864)
1882
class: See 3.8. (line 1169)
1883
comp6: See 3.7. (line 927)
1884
compbrl: See 3.7. (line 915)
1885
context: See 3.10. (line 1224)
1886
contraction: See 3.7. (line 1034)
1887
correct: See 3.11. (line 1391)
1888
decpoint: See 3.5. (line 847)
1889
digit: See 3.2. (line 501)
1890
display: See 3.12. (line 1431)
1891
endbold: See 3.4. (line 763)
1892
endcaps: See 3.3. (line 599)
1893
endcomp: See 3.4. (line 833)
1894
endital: See 3.4. (line 713)
1895
endnum: See 3.7. (line 1112)
1896
endunder: See 3.4. (line 805)
1897
endword: See 3.7. (line 1072)
1898
exactdots: See 3.7. (line 1081)
1899
firstletterbold: See 3.4. (line 757)
1900
firstletterital: See 3.4. (line 707)
1901
firstletterunder: See 3.4. (line 799)
1902
firstwordbold: See 3.4. (line 736)
1903
firstwordital: See 3.4. (line 682)
1904
firstwordunder: See 3.4. (line 784)
1905
grouping: See 3.2. (line 524)
1906
hyphen: See 3.5. (line 853)
1907
include: See 3.12. (line 1419)
1908
italsign: See 3.4. (line 690)
1909
joinnum: See 3.7. (line 1117)
1910
joinword: See 3.7. (line 1016)
1911
largesign: See 3.7. (line 979)
1912
lastletterbold: See 3.4. (line 763)
1913
lastletterital: See 3.4. (line 713)
1914
lastletterunder: See 3.4. (line 805)
1915
lastwordboldafter: See 3.4. (line 752)
1916
lastwordboldbefore: See 3.4. (line 742)
1917
lastworditalafter: See 3.4. (line 700)
1918
lastworditalbefore: See 3.4. (line 690)
1919
lastwordunderafter: See 3.4. (line 795)
1920
lastwordunderbefore: See 3.4. (line 788)
1921
lenboldphrase: See 3.4. (line 773)
1922
lenitalphrase: See 3.4. (line 723)
1923
lenunderphrase: See 3.4. (line 815)
1924
letsign: See 3.3. (line 605)
1925
letter: See 3.2. (line 539)
1926
litdigit: See 3.2. (line 556)
1927
literal: See 3.7. (line 915)
1928
locale: See 3.12. (line 1427)
1929
lowercase: See 3.2. (line 544)
1930
lowword: See 3.7. (line 1025)
1931
math: See 3.2. (line 571)
1932
midendword: See 3.7. (line 1068)
1933
midnum: See 3.7. (line 1106)
1934
midword: See 3.7. (line 1064)
1935
multind: See 3.12. (line 1448)
1936
noback: See 3.7. (line 903)
1937
nocont: See 3.7. (line 937)
1938
nocross: See 3.7. (line 1006)
1939
nofor: See 3.7. (line 910)
1940
noletsign: See 3.3. (line 613)
1941
noletsignafter: See 3.3. (line 630)
1942
noletsignbefore: See 3.3. (line 622)
1943
numsign: See 3.3. (line 638)
1944
partword: See 3.7. (line 1076)
1945
pass2: See 3.10. (line 1224)
1946
pass3: See 3.10. (line 1224)
1947
pass4: See 3.10. (line 1224)
1948
postpunc: See 3.7. (line 1097)
1949
prepunc: See 3.7. (line 1093)
1950
prfword: See 3.7. (line 1052)
1951
punctuation: See 3.2. (line 494)
1952
repeated: See 3.7. (line 962)
1953
replace: See 3.7. (line 944)
1954
repword: See 3.7. (line 970)
1955
sign: See 3.2. (line 563)
1956
singleletterbold: See 3.4. (line 769)
1957
singleletterital: See 3.4. (line 719)
1958
singleletterunder: See 3.4. (line 811)
1959
space: See 3.2. (line 488)
1960
sufword: See 3.7. (line 1048)
1961
swapcc: See 3.9. (line 1214)
1962
swapcd: See 3.9. (line 1202)
1963
swapdd: See 3.9. (line 1207)
1964
syllable: See 3.7. (line 994)
1965
undersign: See 3.4. (line 788)
1966
uplow: See 3.2. (line 507)
1967
uppercase: See 3.2. (line 549)
1968
word: See 3.7. (line 990)
1972
lou_backTranslate: See 5.8. (line 1755)
1973
lou_backTranslateString: See 5.7. (line 1733)
1974
lou_free: See 5.14. (line 1849)
1975
lou_getTable: See 5.12. (line 1823)
1976
lou_hyphenate: See 5.9. (line 1773)
1977
lou_logFile: See 5.10. (line 1802)
1978
lou_logPrint: See 5.11. (line 1813)
1979
lou_readCharFromFile: See 5.13. (line 1836)
1980
lou_translate: See 5.6. (line 1701)
1981
lou_translateString: See 5.5. (line 1629)
1982
lou_version: See 5.4. (line 1620)
1986
lou_allround: See 2.3. (line 238)
1987
lou_checkhyphens: See 2.5. (line 282)
1988
lou_checktable: See 2.2. (line 225)
1989
lou_debug: See 2.1. (line 127)
1990
lou_translate: See 2.4. (line 256)