% This file is part of Malaga, a system for Natural Language Analysis.
% Copyright (C) 1995-1999 Bjoern Beutel
% Universitaet Erlangen-Nuernberg
% Abteilung fuer Computerlinguistik
% e-mail: malaga@linguistik.uni-erlangen.de

% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 2 of the License, or
% (at your option) any later version.

% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.

% You should have received a copy of the GNU General Public License
% along with this program; if not, write to the Free Software
% Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

% description =================================================================

% This file contains the documentation of the programming package Malaga

% format of the document ======================================================

\documentclass[twoside]{report}

\oddsidemargin -0.5 cm
\evensidemargin 0.5 cm

% title page and table of contents ============================================

\author{Bj\"orn Beutel\\
Abteilung f\"ur Computerlinguistik\\
Universit\"at Erlangen-N\"urnberg, Germany}
\date{August 18th, 1999}

\pagenumbering{arabic}

% plain text ==================================================================

\chapter{Introduction}

The name ``Malaga'' has two different meanings: on the one hand, it is the name
of a special-purpose programming language, namely a language for implementing
grammars for natural languages. On the other hand, it is the name of a program
package for developing Malaga grammars and testing them by analysing words
and sentences.

``Malaga'' is an acronym for ``{\bf M}erely {\bf a} {\bf L}eft-{\bf
A}ssociative-{\bf G}rammar {\bf A}pplication''. We will explain the
formalism of Left Associative Grammars (LAG) later.

The program package ``Malaga'' was developed by Bj\"orn~Beutel at the
``Abteilung f\"ur Computerlinguistik der Universit\"at Erlangen-N\"urnberg'',
Germany. It has a number of predecessors: the program packages {\bf LAMA},
{\bf IMP}, {\bf MAGIC}, {\bf MOSAIC} and {\bf LAP}, all of which were developed
at the same department. They are all based on LAG.

Gerald~Sch\"uller implemented parts of the original debugger, the original
Emacs Malaga mode and the original {\bf Tree} and {\bf Variable} output.

As of 1999, morphology grammars exist for several natural languages,
for example German, Italian, English and Korean.

If you have questions, criticism or suggestions for the improvement of Malaga,
you can send an e-mail to {\tt malaga@linguistik.uni-erlangen.de} or
write to the following address:

Universitaet Erlangen-Nuernberg \\
Abteilung fuer Computerlinguistik \\
Bismarckstrasse 12

%------------------------------------------------------------------------------

\chapter{Left Associative Grammars}

A formal grammar for a natural language can be used to check whether a sentence
or a word form is grammatically well-formed (a word form is a particular
inflectional form of a word, so ``book'' and ``books'' are two different word
forms of the word ``book''). Furthermore, it can describe the structure and
meaning of a sentence or a word form by a data structure that is
constructed during the analysis process.

Left Associative Grammar (LAG) is one such kind of formal grammar. An LAG
analyses a sentence (or a word form) step by step: its parts are concatenated
from left to right, hence the name ``Left Associative Grammar''. A
single LAG rule can only join two parts into a bigger one: it concatenates the
Start part (the beginning of the sentence or word form that has
already been analysed) and the Next part (the next word form or the
next allomorph). Consider the following sentence:

Shakespeare liked writing comedies.

The sentence is analysed in five rule applications:

``'' + ``Shakespeare''\\
``Shakespeare'' + ``liked''\\
``Shakespeare liked'' + ``writing''\\
``Shakespeare liked writing'' + ``comedies''\\
``Shakespeare liked writing comedies'' + ``.''
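
The left-to-right order of these rule applications can be sketched in a few
lines of Python. This is only an illustration and not part of Malaga: the
function name {\tt left\_associative\_steps} is hypothetical, and the input is
assumed to be already split into word forms.

```python
def left_associative_steps(word_forms):
    """Return the (start, next) pairs joined by successive rule applications."""
    steps = []
    start = []  # the sentence start is empty before the first rule application
    for next_form in word_forms:
        steps.append((" ".join(start), next_form))
        start.append(next_form)  # the result becomes the new sentence start
    return steps


for start, next_form in left_associative_steps(
        ["Shakespeare", "liked", "writing", "comedies", "."]):
    print(f'"{start}" + "{next_form}"')
```

Each pass through the loop corresponds to one rule application: the Start part
grows by exactly one Next part.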

To apply a rule, it is not sufficient to know the spelling of a word or an
allomorph. A rule also requires morphological and syntactic information, such
as word class, gender, meaning of a suffix and much more. This information,
associated with a linguistic unit (sentence, word form or allomorph), is called
its {\em category\/}. The analysis of a sentence or a word returns such a
category.

Now we'll take a closer look at how a sentence is analysed.

\begin{enumerate}
\item Before we can start to analyse a sentence, the analysis automaton must be
  in an {\em initial state\/}. The initial state determines:
  \begin{itemize}
  \item the category of the empty sentence start, and
  \item the {\em combination rule\/} that checks whether the empty sentence
    start may be combined with the first word form (which is yet to be
    read). This rule also determines the resulting category of the new sentence
    start (which consists of the old sentence start and the first word form).
  \end{itemize}

\item The next word form to be analysed is read and analysed morphologically.
  If there is no valid word form, the analysis process aborts.

\item The category that morphology assigns to this word form is called the Next
  category. The category of the input that has been analysed syntactically so
  far is called the Start category.

\item The active combination rule checks whether the
  sentence start (which may be empty), represented by the Start category, may
  be combined with the next word form, represented by the Next category. In a
  rule, categories can be compared by logical tests. The rule then constructs
  the Result category, which is the category of the new
  sentence start (including the word form that has been read). Finally, the
  rule specifies which {\em
  successor rule\/} is active in the next step. Execution then continues at
  step~2.

  Instead of calling a successor rule, a rule can also accept the analysed
  sentence. In this case the Result category of this rule is the category of
  the complete analysed sentence.
\end{enumerate}

Morphological analysis operates analogously, except that a word form, composed
of allomorphs, is analysed. The next allomorph (step 2) is found in the

This sketch is of course simplified. There can be ambiguities in an analysis,
induced by several causes:

\begin{itemize}
\item The initial state can call several rules to analyse the first word form
\item A rule has multiple successor rules.
\item In morphology, the continuation of the input matches several trie
\item In syntax analysis, the next word form is assigned several categories to
\end{itemize}

These ambiguities are handled by splitting the analysis into several
subanalyses: if there are two lexicon entries for a word form, for example, the
analysis continues using the first entry (and its category) as well as the
second one. You can compare this with a branching path. The analyses are
continued independently of each other, so one analysis can succeed while the
other fails. Each analysis path can divide repeatedly if another ambiguity is
met. If several analysis paths are continued until they accept, the analysis
process returns more than one result.
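
The branching of analysis paths can be illustrated with a small Python sketch.
Everything in it is an assumption made for illustration: the toy lexicon, the
two rules {\tt subject} and {\tt predicate}, the use of tuples of word classes
as categories, and the marker {\tt ACCEPT} that stands for ``accept instead of
calling a successor rule''. It does not reproduce how Malaga represents rules
or categories.

```python
ACCEPT = "accept"  # hypothetical marker: the rule accepts instead of naming a successor


def subject(start, next_cat):
    """Combine the sentence start with a noun; the successor rule is 'predicate'."""
    if next_cat.startswith("noun"):
        return start + (next_cat,), ["predicate"]
    return None  # the rule does not apply, so this path fails


def predicate(start, next_cat):
    """Combine the sentence start with a verb and accept the sentence."""
    if next_cat == "verb":
        return start + (next_cat,), [ACCEPT]
    return None


RULES = {"subject": subject, "predicate": predicate}
# Toy lexicon: both word forms are ambiguous.
LEXICON = {"sheep": ["noun:sg", "noun:pl"], "sleep": ["verb", "noun:sg"]}


def analyse(word_forms, lexicon=LEXICON, rules=RULES):
    """Return the Result categories of all analysis paths that accept."""
    # A path is (Start category, names of the rules active in the next step).
    paths = [((), ["subject"])]
    for form in word_forms:
        new_paths = []
        for start, active in paths:
            for next_cat in lexicon.get(form, []):  # one branch per lexicon entry
                for name in active:                  # one branch per active rule
                    if name == ACCEPT:
                        continue
                    outcome = rules[name](start, next_cat)
                    if outcome is not None:
                        new_paths.append(outcome)
        paths = new_paths
    return [category for category, successors in paths if ACCEPT in successors]


print(analyse(["sheep", "sleep"]))
```

For the input ``sheep sleep'' both noun readings of ``sheep'' survive, while
the noun reading of ``sleep'' fails in the rule {\tt predicate}, so the sketch
returns two results.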

%------------------------------------------------------------------------------

\chapter{The Malaga Programs}

The Malaga programs are all started in a similar manner: either you give the
name of a {\em project file} as argument (this is not possible if you start
{\bf malrul} or {\bf malsym}), or you give the names of the files that are
needed by the program (for {\bf malmake}, you have to give the project file as
argument). The file type is recognised by the file name ending.
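
As an illustration of recognising a file type by its ending, here is a minimal
Python sketch. The endings are the ones used in this manual; the table
{\tt FILE\_KINDS} and the function {\tt file\_kind} are hypothetical names and
do not describe how the Malaga programs are implemented.

```python
import os

# File name endings as used in this manual.
FILE_KINDS = {
    ".pro": "project file",
    ".sym": "symbol file",
    ".esym": "extended symbol file",
    ".all": "allomorph rule file",
    ".lex": "lexicon file",
    ".mor": "morphology rule file",
    ".syn": "syntax rule file",
}


def file_kind(name):
    """Classify a file by its ending; compiled files carry an extra "_c"."""
    ending = os.path.splitext(name)[1]
    compiled = ending.endswith("_c")
    if compiled:
        ending = ending[:-2]
    kind = FILE_KINDS.get(ending)
    if kind is None:
        return None
    return "compiled " + kind if compiled else kind


for name in ["english.pro", "english.sym_c", "english.lex"]:
    print(name, "->", file_kind(name))
```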

Assume you've written a grammar that consists of a symbol file ``english.sym'',
an allomorph rule file ``english.all'', a lexicon file ``english.lex'' and a
morphology rule file ``english.mor'', and you have also written a project file
``english.pro''. Then you can start the program {\bf malaga} in two ways (after
you've compiled the grammar files):

{\tt malaga english.pro}

{\tt malaga english.sym\_c english.mor\_c english.lex\_c}

If you use the first command line, the names of the grammar files will be read
from the project file. The second command line contains the names of the
compiled files explicitly. The order of the names is of no importance. The name
of the allomorph rule file must not be included if you are starting {\bf
malaga}, since this file is not used by {\bf malaga} itself; it is only needed
by {\bf mallex} to compile the lexicon file.

If you just want to know which version of a Malaga program you are using, you
can get the version number by using the option ``{\tt -version}'':

{\tt malrul -version}

The program only emits a few lines with information about its version number

%------------------------------------------------------------------------------

Several files, taken together, form a Malaga grammar:

\begin{itemize}
\item a lexicon of base forms (the {\em lexicon file\/}, ending in ``{\tt
  .lex}''),

\item a file with rules which generate the allomorphs of the base forms (the
  {\em allomorph rule file\/}, ending in ``{\tt .all}''),

\item a file with LAG rules which combine allomorphs to word forms (the {\em
  morphology rule file\/}, ending in ``{\tt .mor}''),

\item (optional) a file with LAG rules that combine word forms to sentences
  (the {\em syntax rule file\/}, ending in ``{\tt .syn}''),

\item a file with the category symbols used (the {\em symbol file\/}, whose
  name ends in ``{\tt .sym}''), and

\item (optional) a file with additional category symbols that may only be used
  in a syntax rule file (the {\em extended symbol file\/}, whose name ends in
  ``{\tt .esym}'').
\end{itemize}

You can group these files together into a {\em project\/}. To do this, you have
to write a project file, with a name ending in ``{\tt .pro}'', in which you list
the names of the individual files, each one after a keyword (each file type on
a line of its own). Imagine you have written a grammar that consists of the files
``standard.sym'', ``webster.lex'', ``english.all'', ``english.mor'' and
``english.syn''. The project file for this grammar will look like this:

By using the {\bf include} statement, you can include further source
files in your source files, so a part of your grammar can consist of
several files. Assume you've got a lexicon file ``webster.lex'' that
includes other lexicon files:

\begin{verbatim}
include "suffixes.lex";
include "adjectives.lex";
include "particles.lex";
include "abbreviations.lex";
include "numbers.lex";
\end{verbatim}

In this case, you must write the names of all these files in the ``{\tt lex:}''
line of your project file after the name of the main lexicon file:

\begin{verbatim}
lex: webster.lex suffixes.lex verbs.lex adjectives.lex
lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex
\end{verbatim}

Since there are many files in this example, the ``{\tt lex:}'' line has
been split into two lines, each line starting with ``{\tt lex:}''.

If you want to extend an existing project (for example, you might want to add a
syntax rule file to a morphology grammar), you can include the project file of
the morphology grammar in the project file of your syntax grammar by using a
line starting with ``{\tt include:}'':

\begin{verbatim}
include: /projects/grammars/english/english.pro
syn: english_syntax.syn
\end{verbatim}

The file entries in the project file of the morphology are treated as if they
replaced the ``{\tt include:}'' line.
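
The effect of an ``{\tt include:}'' line can be sketched in Python as follows.
This is a minimal illustration under several assumptions: the function name
{\tt read\_project} is hypothetical, files are supplied through a lookup
function instead of being read from disk, the contents shown for the included
project file are invented for the example, and the question of how relative
path names are resolved is ignored.

```python
def read_project(name, read_lines):
    """Return the (keyword, value) entries of a project file.

    Entries of an included project file appear at the position of the
    "include:" line that names it.
    """
    entries = []
    for line in read_lines(name):
        line = line.strip()
        if not line:
            continue
        keyword, _, value = line.partition(":")
        keyword, value = keyword.strip(), value.strip()
        if keyword == "include":
            entries.extend(read_project(value, read_lines))
        else:
            entries.append((keyword, value))
    return entries


# Hypothetical file contents, kept in memory for the example.
FILES = {
    "/projects/grammars/english/english.pro": [
        "sym: english.sym",
        "lex: english.lex",
        "mor: english.mor",
    ],
    "syntax.pro": [
        "include: /projects/grammars/english/english.pro",
        "syn: english_syntax.syn",
    ],
}

for keyword, value in read_project("syntax.pro", FILES.__getitem__):
    print(keyword + ":", value)
```

Because the included entries are spliced in where the ``{\tt include:}'' line
stands, the order of all entries is the same as if they had been written out
in a single project file.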

The programs {\bf malaga} and {\bf mallex} can set options like {\bf hidden} or
{\bf robust} from the project file, so you do not need to set these options each
time you start {\bf malaga}. Each line in the project file that starts with
``{\tt malaga:}'' or ``{\tt mallex:}'' is executed when {\bf
malaga} or {\bf mallex}, respectively, is started. You may only use the
{\bf set} command in these lines, so you can only set options. Here's an example:

\begin{verbatim}
malaga: set hidden +semantics
malaga: set robust on
mallex: set hidden +semantics +syntax
\end{verbatim}

When you start {\bf malaga}, the commands ``{\tt set hidden +semantics}'' and
``{\tt set robust on}'' will be executed; when you start {\bf mallex}, the
command ``{\tt set hidden +semantics +syntax}'' will be executed.

Options in project files that are read in by ``{\tt include:}'' lines in other
project files will be executed as if they were at the position of the ``{\tt
include:}'' line.

Lines that start with ``{\tt morinfo:}'' contain information about the
morphology; lines that start with ``{\tt syninfo:}'' contain information about
the syntax. In {\bf malaga}, you get this information if you use the command
{\bf info}.

\begin{verbatim}
morinfo: =====================================
morinfo: Deutsche Malaga Morphologie 3.0
morinfo: written by Oliver Lorenz, 11.04.1997
morinfo: dmm@linguistik.uni-erlangen.de
morinfo: =====================================
\end{verbatim}

%------------------------------------------------------------------------------

\section{The Malaga startup file ``.malagarc''}

If there are options that you want to use with every Malaga project, you
can create a personal startup file called
``{\tt .malagarc}'' in your home directory. You can enter {\bf malaga} and
{\bf mallex} options in the same manner as you do in the project file:

\begin{verbatim}
malaga: set hidden +semantics
malaga: set robust on
mallex: set hidden +semantics +syntax
\end{verbatim}

The options in the project file are applied first and those in the startup
file afterwards, so you can override options in
the project file by setting them in the startup file. In the startup file, you
should set the {\bf display} option if you want to use the graphical display
program written in Tcl/Tk.
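
The order in which the two files take effect can be pictured with the following
Python sketch. It is an illustration only: the function name
{\tt effective\_settings} is hypothetical, each option value is treated as an
opaque string that simply replaces an earlier value, and the value
``{\tt off}'' for {\bf robust} is assumed as the counterpart of ``{\tt on}''.

```python
def effective_settings(project_lines, startup_lines, program):
    """Apply "<program>: set <option> <value>" lines in order; later lines win."""
    settings = {}
    prefix = program + ":"
    # Project file lines come first, startup file lines second.
    for line in list(project_lines) + list(startup_lines):
        if not line.startswith(prefix):
            continue  # the line belongs to another program
        words = line[len(prefix):].split()
        if len(words) >= 2 and words[0] == "set":
            settings[words[1]] = " ".join(words[2:])
    return settings


project = ["malaga: set hidden +semantics", "malaga: set robust on"]
startup = ["malaga: set robust off"]  # "off" is an assumed value
print(effective_settings(project, startup, "malaga"))
```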

You can set some attributes of the graphical user interface, namely the
position, the size, and the font size of each window that is part of the user
interface. Here is an example which sets every option available:

\begin{verbatim}
result_geometry: 628x480+640+0
tree_geometry: 628x480+640+512
path_geometry: 628x480+640+0
variables_geometry: 628x480+0+512
variables_font_size: 12
\end{verbatim}

The geometry defines the size and/or position of each window. The first two
numbers (``{\tt 628x480}'') define the width and the height of the window in
pixels; the last two numbers (``{\tt +640+512}'') define the position of its
upper left corner. The available font sizes are 8, 10, 12, 14, and 18 pixels.
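
To make the structure of a geometry value explicit, the following Python sketch
splits it into its four numbers. The function name {\tt parse\_geometry} is
hypothetical, and the sketch only handles the complete form with both size and
position, as used in the example above.

```python
import re

# WIDTHxHEIGHT+X+Y, for example "628x480+640+512".
GEOMETRY = re.compile(r"(\d+)x(\d+)\+(\d+)\+(\d+)")


def parse_geometry(text):
    """Split a complete geometry value into width, height, x and y (pixels)."""
    match = GEOMETRY.fullmatch(text)
    if match is None:
        raise ValueError(f"not a WIDTHxHEIGHT+X+Y geometry: {text!r}")
    width, height, x, y = map(int, match.groups())
    return {"width": width, "height": height, "x": x, "y": y}


print(parse_geometry("628x480+640+512"))
```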

%------------------------------------------------------------------------------

\section{The Program ``malaga''}

The program {\bf malaga} is the user interface for analysing word forms and
sentences, displaying the results and finding bugs in a grammar. You can start
{\bf malaga} by giving either the name of a project file or the names of the
grammar files as arguments:

{\tt malaga english.pro}

{\tt malaga english.sym\_c english.mor\_c english.lex\_c english.syn\_c}

If you are not using a project file, you have to give:

\begin{itemize}
\item the symbol file,
\item the lexicon file,
\item the morphology rule file, and
\item the syntax rule file (optional).
\end{itemize}

When {\bf malaga} has been started, it loads the symbol file, the lexicon file
and the rule file(s). After loading, the {\em prompt\/} appears. Then {\bf
malaga} is ready to execute your commands:

\begin{verbatim}
malaga (4.3) - Copyright (C) 1995-1999 Bjoern Beutel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software which you may redistribute under certain conditions.
For details, refer to the GNU General Public License.
\end{verbatim}

You can now enter any {\bf malaga} command. If you are not sure about the name
of a command, use the command {\bf help} to get an overview of all {\bf malaga}
commands.

If you want to quit {\bf malaga}, enter the command {\bf quit}.

You can use the following command line options when you start {\bf malaga}:

\begin{itemize}
\item ``{\tt -morphology}'' or ``{\tt -m}'' starts {\bf malaga} in {\em
  morphology mode\/}. That is, word forms are read from the
  standard input stream and analysed (one word form per line). The analysis
  result is written to the standard output stream.
\item ``{\tt -syntax}'' or ``{\tt -s}'' starts {\bf malaga} in {\em syntax
  mode\/}. That is, sentences are read from the standard input
  stream and analysed (one sentence per line). The analysis result is
  written to the standard output stream.
\end{itemize}

%------------------------------------------------------------------------------

\section{The Program ``mallex''}

By using {\bf mallex}, you can make the allomorph rules process the entries of
a base form lexicon. A run time lexicon (with the ending ``{\tt .lex\_c}'')
will be built. Normally, {\bf mallex} starts in {\em batch mode\/}. If you want
to run it interactively, you must give it the option ``{\tt -interactive}'' or
``{\tt -i}'' when starting (if you start it from Emacs with ``{\tt M-x
mallex}'', this will be done automatically).

You can start {\bf mallex} either with the name of a project file or with the
names of the needed grammar files:

{\tt mallex english.pro}

{\tt mallex english.sym\_c english.all\_c english.lex}

If you are not using a project file, you must give

\begin{itemize}
\item the symbol file,
\item the allomorph rule file, and
\item the lexicon file (in batch mode).
\end{itemize}

If you have started {\bf mallex} by using the option ``{\tt -interactive}'' or
``{\tt -i}'', {\bf mallex} runs interactively: it loads the symbol file and the
allomorph rule file. Then the {\em prompt\/} appears:

\begin{verbatim}
mallex (4.3) - Copyright (C) 1995-1999 Bjoern Beutel
This program comes with ABSOLUTELY NO WARRANTY.
This is free software which you may redistribute under certain conditions.
For details, refer to the GNU General Public License.
\end{verbatim}

You can now enter any {\bf mallex} command. If you do not remember the command
names, you can use the command {\bf help} to see an overview of the {\bf
mallex} commands.

If you want to quit {\bf mallex}, enter the command {\bf quit}.

If you've started {\bf mallex} in batch mode, it creates the run time lexicon
file from the base form lexicon file. If the lexicons are very big or the
allomorph rules are very complex, this can take some minutes. After creation,

You can use the following command line options when you start {\bf mallex}:

\begin{itemize}
\item ``{\tt -interactive}'' or ``{\tt -i}'' runs {\bf mallex} in interactive
  mode.
\item ``{\tt -readable}'' or ``{\tt -r}'' runs {\bf mallex} in batch mode and
  outputs the allomorph lexicon in readable form on the standard output stream.
\end{itemize}

%------------------------------------------------------------------------------

\section{The Program ``malmake''}

The program {\bf malmake} reads a project file, checks whether all required
grammar files exist, and translates all grammar files that have not yet been
translated or whose source files have changed since they were last translated.
{\bf malmake} itself calls the programs {\bf malsym}, {\bf mallex} and {\bf
malrul} if needed. An example: assume you have written a morphology grammar
whose grammar files are bundled in a project file ``{\tt english.pro}'':

\begin{verbatim}
sym: rules/english.sym
all: rules/english.all
lex: rules/english.lex lex/adjectives.lex
lex: lex/particles.lex lex/suffixes.lex lex/verbs.lex
lex: lex/nouns.lex lex/abbreviations.lex lex/numbers.lex
mor: rules/english.mor
mallex: set hidden +semantics +syntax
malaga: set hidden +semantics
\end{verbatim}

When executing ``{\tt malmake english.pro}'' for the first time, the symbol file,
the rule files and the lexicon file will be translated. When nothing needs to
be translated, {\bf malmake} only reports:

\begin{verbatim}
project is up to date
\end{verbatim}

The translation of a big lexicon can take a long time, since the allomorph
rules have to be executed for each lexicon entry.
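
The decision whether a grammar file has to be translated again can be sketched
in Python. The manual only states that files are retranslated when their
sources have changed; comparing file modification times, as done here, is an
assumption about how such a check could work, and the function name
{\tt needs\_translation} is hypothetical.

```python
import os


def needs_translation(sources, compiled):
    """Return True if `compiled` is missing or older than one of its sources."""
    if not os.path.exists(compiled):
        return True  # never translated so far
    compiled_time = os.path.getmtime(compiled)
    return any(os.path.getmtime(source) > compiled_time for source in sources)


# A lexicon depends on every file named in the "lex:" lines.
lexicon_sources = ["rules/english.lex", "lex/adjectives.lex", "lex/verbs.lex"]
if all(os.path.exists(source) for source in lexicon_sources):
    print(needs_translation(lexicon_sources, "rules/english.lex_c"))
```

Listing all included lexicon files as sources is what makes a change in, say,
``{\tt lex/verbs.lex}'' trigger a new translation of the run time lexicon.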

%------------------------------------------------------------------------------

\section{The Program ``malrul''}

The program {\bf malrul} translates Malaga rule files, i.e.\ files that have
the endings ``{\tt .all}'', ``{\tt .mor}'' or ``{\tt .syn}''. The compiled
file gets the ending ``{\tt .all\_c}'', ``{\tt .mor\_c}'', or ``{\tt .syn\_c}''.
Give the following arguments when starting {\bf malrul}:

\begin{itemize}
\item the rule file that is to be translated, and
\item the associated symbol file.
\end{itemize}

The order of the arguments is arbitrary. Here is an example:

{\tt malrul english.mor english.sym\_c}

%------------------------------------------------------------------------------

\section{The Program ``malsym''}

{\bf malsym} translates Malaga symbol files, i.e.\ files having the
ending ``{\tt .sym}'' or ``{\tt .esym}''. The translated file gets the ending
``{\tt .sym\_c}'' or ``{\tt .esym\_c}''.

{\tt malsym english.sym}

If you are translating an extended symbol file with the ending ``{\tt .esym}'',
enter the name of the compiled symbol file as an additional argument:

{\tt malsym english.esym english.sym\_c}

This argument is needed since extended symbol files are extensions of ordinary
symbol files.

%------------------------------------------------------------------------------

\chapter{The Commands of ``malaga'' and ``mallex''}

Since the user interfaces of {\bf malaga} and {\bf mallex} are very similar and
share many commands, we describe them together in one
chapter. Commands that can be used only in {\bf malaga} or only in {\bf mallex}
are marked with the name of the program in which they can be used.

%------------------------------------------------------------------------------

\section{The Command ``break''}

If you want to stop the rules at a specific point, for example to take a look
at the variables, you can use the command {\bf break} to set {\em
breakpoints\/}. A breakpoint is a point in the rule source text where rule
execution is interrupted, so you can enter commands in debug mode. Breakpoints
are only active in debug mode, that is, when you have started rule execution by a
debug command or you have continued rule execution by one of the commands {\bf
step}, {\bf next}, {\bf walk}, or {\bf go}.

After the command name, {\bf break}, you can give one of the following
arguments:

\begin{description}
\item[a line number.] A breakpoint is set at this line in the current source
  file. If there is no statement starting at this line, the breakpoint will be
  set at the nearest line where a statement starts. You can, for example, set a
  breakpoint at line 245 in the current source file by entering the command
  ``{\tt break 245}''.

\item[a file name and a line number.] A breakpoint is set at this line in this
  file. If there is no statement starting at this line, the breakpoint will be
  set at the nearest line where a statement starts. An example:

  {\tt break english.syn 59}

\item[a rule name.] A breakpoint is set at the first statement in this rule. An
  example:

  {\tt break final\_rule}
\end{description}

If the rule name or the file name is ambiguous, you can insert an abbreviation
for the rule system you refer to. Put it in front of the rule name or the file
name. The following abbreviations are used:

\begin{description}
\item[all] for allomorph rules,
\item[mor] for morphology rules,
\item[syn] for syntax rules,
\end{description}

If you omit the argument, the breakpoint is set on the current line in the
current file (this is helpful in debug mode).

Every breakpoint gets a unique number once it has been set, so you can delete
it later, when you do not need it any longer.
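
The bookkeeping described above can be sketched in Python. The class
{\tt Breakpoints} is hypothetical and rests on two assumptions that the manual
does not spell out: ``the nearest line where a statement starts'' is taken to
be the first such line at or after the requested line, and the number of a
deleted breakpoint is not reused.

```python
import bisect


class Breakpoints:
    """Numbered breakpoints for one source file (illustration only)."""

    def __init__(self, statement_lines):
        # Lines of the source file at which a statement starts.
        self.statement_lines = sorted(statement_lines)
        self.by_number = {}
        self.next_number = 1

    def set(self, line):
        """Set a breakpoint at `line` or at the next statement start."""
        index = bisect.bisect_left(self.statement_lines, line)
        if index == len(self.statement_lines):
            raise ValueError("no statement starts at or after this line")
        number = self.next_number
        self.next_number += 1
        self.by_number[number] = self.statement_lines[index]
        return number

    def delete(self, number):
        """Delete the breakpoint with the given number."""
        del self.by_number[number]


breakpoints = Breakpoints([240, 247, 260])
first = breakpoints.set(245)  # no statement starts at line 245
print(first, breakpoints.by_number[first])
```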

You can list the breakpoints using the command {\bf list} and delete them using
the command {\bf delete}.

%------------------------------------------------------------------------------

\section{The Command ``clear-cache'' (malaga)}

If you have changed your settings so that the word form cache is no longer
valid, you can clear the cache using {\bf clear-cache}.

%------------------------------------------------------------------------------

\section{The Command ``debug-entry'' (mallex)}

Use {\bf debug-entry} to find errors in your allomorph rules. This command
works like {\bf ga}, but the allomorph generation will be stopped before the
first statement of the first rule is executed:

\begin{verbatim}
mallex> debug-entry [surface: "john", class: name]
at rule "irregular_verb"
\end{verbatim}

The prompt ``{\tt debug>}'' that appears instead of ``{\tt mallex>}'' indicates
that {\bf mallex} is currently executing the allomorph rules but has been
interrupted. Since this ability has been developed to support the {\em
debugging\/} of Malaga rules, this mode is called {\em debug mode\/}.

When {\bf mallex} comes to the start of a new rule in debug mode (as in the
example above), the name of this rule is printed. When in debug mode, you can
always get the name of the current rule using the command {\bf rule}.

If you're running {\bf mallex} from Emacs, another Emacs window will display
the source file. An arrow points to the statement that will be
executed next:

\begin{verbatim}
allo_rule irregular_verb ($entry):
=>? $entry.class = verb;
\end{verbatim}

In debug mode, you can, for example, get the variables that are currently
defined (using {\bf variable} or {\bf print}), and you can execute statements
(using {\bf step}, {\bf next}, {\bf walk}, {\bf go}, or {\bf run}). If you want
to leave debug mode, just enter {\bf run}. The remaining statements for
generation will then be executed without interruption.

%------------------------------------------------------------------------------

\section{The Command ``debug-file'' (mallex)}

Use the command {\bf debug-file} to make the allomorph rules work on
a lexicon file in debug mode. Assume you have written a lexicon file ``{\tt
mini.lex}'':

\begin{verbatim}
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
\end{verbatim}

To let the rules process this lexicon in debug mode, enter:

{\tt debug-file mini.lex}

%------------------------------------------------------------------------------

\section{The Command ``debug-line'' (mallex)}

Use the command {\bf debug-line} to make the allomorph rules generate
allomorphs for a single lexicon entry in debug mode. Assume you want to test
the second line in the lexicon file ``{\tt mini.lex}'':

\begin{verbatim}
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
\end{verbatim}

Enter the following line:

{\tt debug-line mini.lex 2}

Then {\bf mallex} stops in debug mode at the entry of the first allomorph rule
that is executed for the lexicon entry
``\verb#[surface: "table", class: noun];#''.

If there is no lexicon entry at this line, the subsequent lexicon entry will be
used.

%------------------------------------------------------------------------------

\section{The Command ``debug-mor'' (malaga)}

Use the command {\bf debug-mor} to find errors in your morphology combination
rules. This command analyses the rest of the command line morphologically and
executes the morphology combination rules in debug mode. Debug mode is
explained for the command {\bf debug}.

%------------------------------------------------------------------------------

\section{The Command ``debug-node'' (malaga)}

Use the command {\bf debug-node} to execute the successor rules of a specific
LAG state in debug mode. You must have analysed a word form or a
sentence beforehand. Make {\bf malaga} display the analysis tree by entering {\bf
tree}, move the mouse pointer to the state node you want to debug, and press
the left mouse button. A window opens in which this state's category is shown.
The window's title line contains the number of the state node. Use this number
as argument for {\bf debug-node}. The last analysis input will be analysed
again; when the first successor rule of the specified state is reached,
analysis stops and {\bf malaga} switches to debug mode.

%------------------------------------------------------------------------------

\section{The Command ``debug-syn'' (malaga)}

Use the command {\bf debug-syn} to find errors in your syntax combination
rules. This command analyses the rest of the command line syntactically and
executes the syntax combination rules in debug mode. Debug mode is explained
for the command {\bf debug}.

%------------------------------------------------------------------------------

\section{The Command ``delete''}

If you want to delete a breakpoint, use the command {\bf delete} with the
number of the breakpoint as argument.

Enter ``{\tt delete all}'' to delete all breakpoints.

%------------------------------------------------------------------------------

\section{The Command ``ga'' (mallex)}

Use the command {\bf ga} (short for ``generate allomorphs'') to generate
allomorphs. This is useful for testing allomorph generation from within {\bf
mallex}. When you enter the command, give a lexicon entry as argument. All
allomorphs that the allomorph rules generate from this entry are
printed on screen. For example:

\begin{verbatim}
mallex> ga [surface: "john", class: name]
surf: "john", cat: [class: name, base_form: "abraham"]
\end{verbatim}

If the rules create multiple allomorphs from an entry, they are displayed one
after another.

%------------------------------------------------------------------------------

\section{The Command ``ga-file'' (mallex)}

Use the command {\bf ga-file} to make the allomorph rules generate allomorphs
for a lexicon file. Assume you have written a lexicon file ``{\tt mini.lex}'':

\begin{verbatim}
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
\end{verbatim}

To generate the allomorphs for this lexicon, enter:

{\tt ga-file mini.lex}

This will produce a readable allomorph file whose name ends in ``{\tt .cat}''
(for {\em categories\/}):

\begin{verbatim}
surf: "man", cat: [class: noun, syn: singular]
surf: "men", cat: [class: noun, syn: plural]
surf: "table", cat: [class: noun]
surf: "wise", cat: [class: adjective, restr: complete]
surf: "wis", cat: [class: adjective, restr: inflect]
\end{verbatim}

%------------------------------------------------------------------------------

\section{The Command ``ga-line'' (mallex)}

Use the command {\bf ga-line} to make the allomorph rules generate
allomorphs for a single lexicon entry. Assume you want to test
the second line in the lexicon file ``{\tt mini.lex}'':

\begin{verbatim}
[surface: "m{a}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
\end{verbatim}

Enter the following line:

{\tt ga-line mini.lex 2}

Then {\bf mallex} generates allomorphs for the lexicon entry
``\verb#[surface: "table", class: noun];#''.

If there is no lexicon entry at this line, the subsequent lexicon entry will be
used.

%------------------------------------------------------------------------------
830
\section{The Command ``get''}
832
This command is used to query settings of {\bf malaga} or {\bf mallex}. Enter
833
it together with the name of the option whose setting you want to know. The
834
possible options are described in the next chapter.
835
If you just enter ``{\tt get}'', all settings will be shown.
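
As a brief illustration, the two forms of the command could be entered as
follows. The option name {\bf pruning} is taken from the next chapter, and the
replies that {\bf malaga} prints are left out here:

\begin{verbatim}
malaga> get pruning
malaga> get
\end{verbatim}

The first line asks for a single setting; the second one lists all settings.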
837
%------------------------------------------------------------------------------
839
\section{The Command ``go''}
841
This command can only be executed in debug mode. The rule execution will be
842
resumed and continued until a breakpoint is met or the rules have been executed
completely.
845
%------------------------------------------------------------------------------
847
\section{The Command ``help''}
849
Use this command to get a list of the commands you can use. If you give the
850
name of a command or an option as argument, a short explanation of this item
851
will be printed. If a name represents a command as well as an option, prepend
852
``{\tt command}'' or ``{\tt option}'' to it.
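
For instance, {\bf tree} names a command as well as an option, so it is one of
the items that needs such a prefix. A short sketch of the different kinds of
call, with the printed help texts left out:

\begin{verbatim}
malaga> help
malaga> help ma
malaga> help command tree
malaga> help option tree
\end{verbatim}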
854
%------------------------------------------------------------------------------
856
\section{The Command ``info'' (malaga)}
858
This command gives you information about the morphology or syntax rules you are
using:

\begin{itemize}
\item ``{\tt info mor}'' prints the lines in your project file(s) that begin
  with ``{\tt morinfo:}''
\item ``{\tt info syn}'' prints the lines in your project file(s) that begin
  with ``{\tt syninfo:}''
\end{itemize}
867
%------------------------------------------------------------------------------
869
\section{The Command ``list''}
871
If you enter the command {\bf list}, all breakpoints are listed. For each
breakpoint, its number, the name of the source file and the source line are
shown.
875
%------------------------------------------------------------------------------
877
\section{The Command ``ma'' (malaga)}
879
The command {\bf ma} (for {\em morphological analysis\/}) starts a word form
880
analysis. Give the word form that you want to be analysed as argument:
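
For illustration, with a hypothetical word form that the loaded grammar is
assumed to cover:

\begin{verbatim}
malaga> ma houses
\end{verbatim}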
884
Malaga will show the results automatically; if you have switched on the
{\bf tree} option, it will also show the analysis tree. You can
look at the results using {\bf result} or at the entire analysis tree using
{\bf tree}.
889
If you do not enter a word form behind the command {\bf ma}, {\bf malaga}
890
re-analyses the last input.
892
%------------------------------------------------------------------------------
894
\section{The Command ``ma-file'' (malaga)}
896
The command {\bf ma-file} can be used to analyse files that contain word lists.
897
A word list consists of a number of word forms, each word form on a line on its
898
own. There may be empty lines in a word list. The following example is a word
899
list called ``{\tt word-list}'':
907
To analyse this word list, enter:
909
{\tt ma-file word-list result}
911
This will produce a file ``{\tt result}'' that contains the analysis results.
912
If the second argument is missing, the result will be written to a file whose
913
name ends in ``{\tt .cat}'' (for {\em categories\/}); for ``{\tt word-list}'',
914
its name will be ``{\tt word-list.cat}'':
916
\begin{verbatim}
1: "table": [class: noun, ...]
2: "men's": [class: noun, ...]
3: "blue": [class: noun, ...]
3: "blue": [class: adjective, ...]
3: "blue": [class: name, ...]
4: "handicap": unknown
\end{verbatim}
924
The number at the line start represents the line number of the analysed
925
original word form. The output format can be changed by using the commands
926
{\bf output-format} and {\bf unknown-format}.
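
Because every result line starts with the line number and the surface of its
input, the input file can be read back from the result file above. Under that
reading, and assuming that it contained no empty lines, ``{\tt word-list}''
would hold these four word forms:

\begin{verbatim}
table
men's
blue
handicap
\end{verbatim}

Line 3 yields three result lines because ``blue'' received three readings;
the word form in line 4 was not recognised.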
928
If a runtime error occurs during the analysis of a word, the error message will
929
be inserted into the result file, and the next word will be processed.
931
After the analysis, some statistics will be printed: The number of analysed and
932
recognised word forms, the average number of results per word form, and the
933
average number of word forms that have been analysed per second (if the
934
analysis took long enough).
936
%------------------------------------------------------------------------------
938
\section{The Command ``mg'' (malaga)}
940
Use the command {\bf mg} to generate all word forms that consist of a specified
941
set of allomorphs. For example, the command
943
{\tt mg 3 un able believe}
945
generates all word forms that consist of up to three allomorphs, where only the
946
specified allomorphs (``un'', ``able'', and ``believe'') are used. The word
947
forms are numbered from 1 onward, but different analyses of the same word form
948
get the same index. The output will look like this:
955
Please note that generation does not know of filters, pruning rules and
958
%------------------------------------------------------------------------------
960
\section{The Command ``next''}
962
This command can only be executed in debug mode. The rule execution will be
963
resumed and continues until a different source line is met or until the rules
964
have been executed completely. It is like {\bf step}, but subrules will be
965
executed without interruption. If you specify a number as argument, the command
966
will be repeated as often as specified.
968
%------------------------------------------------------------------------------
970
\section{The Command ``output''}
972
This command prints the results of the last analysis or allomorph generation as
973
ordinary text. The output format can be changed by using the commands {\bf
974
allo-format} (for {\bf mallex}), {\bf output-format}, and {\bf
975
unknown-format} (for {\bf malaga}).
977
%------------------------------------------------------------------------------
979
\section{The Command ``print''}
981
You can only use the command {\bf print} in debug mode or if the previous
982
analysis has stopped with an error in the combination rules. Using this
983
command, you get the values of all Malaga variables currently defined. The
984
variables will be printed in the order of their definitions:
986
\begin{verbatim}
malaga> sa-debug You are beautiful.
entering rule "Noun", start: "", next: "You"
debug> print
$sentence = [class: main_clause, parts: <>]
$word = [class: pronoun, result: S2]
\end{verbatim}
993
You can specify any variable names (including the ``{\tt \$}'') as arguments to
this command; you can even specify a path behind each of the
variable names. In this case, only the values of the specified variables or
paths will be printed:

\begin{verbatim}
debug> print $word
$word = [class: pronoun, result: S2]
debug> print $word.class
$word.class = pronoun
\end{verbatim}
1004
If the variable values are very complex, the output of {\bf print} can be
1005
confusing. Please use the command {\bf variables} in this case.
1007
%------------------------------------------------------------------------------
1009
\section{The Command ``quit''}
1011
Use this command to leave {\bf malaga} or {\bf mallex}.
1013
%------------------------------------------------------------------------------
1015
\section{The Command ``result''}
1017
If you have previously analysed a word form or a sentence using {\bf ma} or
1018
{\bf sa} (in {\bf malaga}), or you have generated allomorphs using {\bf ga} or
1019
{\bf ga-line} (in {\bf mallex}), you can display the results with ``{\tt
1020
result}''. The results will be displayed in a separate window,
which is called ``{\bf Results}'' for {\bf malaga} and ``{\bf Allomorph}'' for
{\bf mallex}. They are numbered from 1 onward.
1024
If you are executing the command {\bf result} for the first time, or if you
1025
have closed a {\bf Results/Allomorph} window that you'd opened before, a window
1026
will open, displaying the values of all results/allomorphs of the last
1027
analysis/generation.
1029
If there is a {\bf Results/Allomorph} window currently opened, the new results/allomorphs
1030
will be displayed in this window.
1032
The {\bf Results/Allomorph} window has a menu with some commands:
1034
\begin{description}
\item[Window:] Here, two items can be selected:
  \begin{description}
  \item[Export Postscript$...$:] Choose this item to convert the display
    content to Postscript and save it as a file.
  \item[Close:] Choose this item to close the {\bf Results/Allomorph} window.
  \end{description}
\item[Font size:] Choose one of the menu's subitems to change the font size.
\end{description}
1043
%------------------------------------------------------------------------------
1045
\section{The Command ``rule''}
1047
This command can only be used in debug mode or after rule execution has been
stopped by an error. It prints the name of the rule that has been executed;
additionally, the Start and Next surface are printed in {\bf malaga}. For
example:

\begin{verbatim}
at rule "flexion", start: "hous", next: "es"
\end{verbatim}
1056
%------------------------------------------------------------------------------
1058
\section{The Command ``run''}
1060
This command can only be used in debug mode. The rule execution will be
1061
resumed, and the rules will be executed completely without any interruption.
1063
If you have invoked the debug mode by the command {\bf debug-node}, rule
1064
execution will be stopped again when another Next item is analysed.
1066
%------------------------------------------------------------------------------
1068
\section{The Command ``sa'' (malaga)}
1070
If you have started {\bf malaga} with a syntax file in your command line or in
1071
the project file, you can start syntactic analyses using the command {\bf sa}
1072
(short for {\em syntactic analysis\/}). Put the sentence you want to be
1073
analysed as argument behind the command name:
1075
\begin{verbatim}
malaga> sa The man is in town.
\end{verbatim}
1077
Malaga will show the results automatically; if you have switched on the
{\bf tree} option, it will also show the analysis tree. You can
look at the results using {\bf result} or at the entire analysis tree using
{\bf tree}.
1082
If you do not enter a sentence behind the command {\bf sa}, {\bf malaga}
1083
re-analyses the last input.
1085
%------------------------------------------------------------------------------
1087
\section{The Command ``sa-file'' (malaga)}
1089
Using the command {\bf sa-file}, you can analyse files that contain sentence
1090
lists. In a sentence list, each sentence stands on a line of its own; empty
lines are permitted. Here is an example, a sentence list named ``{\tt
sentence-list}'':
1100
To analyse this sentence list, enter:
1102
{\tt sa-file sentence-list result}
1104
This will produce a file ``{\tt result}'' that contains the analysis results.
1105
If the second argument is missing, the result will be written to a file whose
1106
name ends in ``{\tt .cat}'' (for {\em categories\/}); for ``{\tt
1107
sentence-list}'', its name will be ``{\tt sentence-list.cat}''.
1109
\begin{verbatim}
1: "He sleeps.": [functor: [syn: <S3>, sem: <"sleep">],
                  arguments: <[syn: S3, sem: "definite pronoun"]>]
2: "He slept.": [functor: [syn: <S3>, sem: <"sleep">],
                 arguments: <[syn: S3, sem: "definite pronoun"]>]
3: "He has slept.": [functor: [syn: <S3>, sem: <"have", "sleep">],
                     arguments: <[syn: S3, sem: "definite pronoun"]>]
4: "He had slept.": [functor: [syn: <S3>, sem: <"have", "sleep">],
                     arguments: <[syn: S3, sem: "definite pronoun"]>]
\end{verbatim}
1119
The number at the line start represents the line number of the analysed
1120
original sentence. The output format can be changed by using the commands
1121
{\bf output-format} and {\bf unknown-format}.
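
Since each result starts with the line number and the surface of its input, the
input file can be read back from the result file above. Under that reading,
and assuming that it contained no empty lines, ``{\tt sentence-list}'' would
hold these four sentences:

\begin{verbatim}
He sleeps.
He slept.
He has slept.
He had slept.
\end{verbatim}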
1123
If a runtime error occurs during the analysis of a sentence, the error message
1124
will be inserted into the result file, and the next sentence will be processed.
1126
After the analysis, some statistics will be printed: The number of analysed and
1127
recognised sentences, the average number of results per sentence, and the
1128
average number of sentences that have been analysed per second (if the analysis
took long enough).
1131
%------------------------------------------------------------------------------
1133
\section{The Command ``set''}
1135
This command is used to change the settings of {\bf malaga} or {\bf
1136
mallex}. The command line
1137
``{\tt set} {\it option argument\/}'' changes {\it option\/} to
{\it argument\/}.
1140
If you want to get the current state of an option, use the command {\bf get}.
1141
Options can also be set in the project file. The possible options are
1142
described in the next chapter.
1144
%------------------------------------------------------------------------------
1146
\section{The Command ``sg'' (malaga)}
1148
Use {\bf sg} to generate sentences that are composed of a specified set of word
1149
forms. For example, the command

{\tt sg 3 . ? he she sleeps}

generates all sentences that consist of up to three word forms, where only the
specified word forms (``.'', ``?'', ``he'', ``she'', and ``sleeps'') are used.
The sentences are numbered from 1 onward, but different analyses of the same
sentence get the same index. The output looks like this:
1158
\begin{verbatim}
malaga> sg 3 . ? he she sleeps
\end{verbatim}
1164
Please note that generation does not know of filters, pruning rules and
1167
%------------------------------------------------------------------------------
1169
\section{The Command ``step''}
1171
This command can only be executed in debug mode. The rule execution will be
1172
resumed and continues until a different source line is met or until the rules
1173
have been executed completely. If you specify a number as argument, the command
1174
will be repeated as often as specified.
1176
%------------------------------------------------------------------------------
1178
\section{The Command ``trace''}
1180
If you are executing your rules in debug mode or the rules were interrupted
by an error, this command shows where rule execution currently stopped. If it
stopped in a subrule, all calling rules are also shown.

\begin{verbatim}
line 23 in file "dmm-deutsch.syn", rule "fill_valencies"
line 391 in file "dmm-deutsch.syn", rule "main_clause_end"
\end{verbatim}

This means that rule execution stopped in line 23 of ``{\tt dmm-deutsch.syn}'',
in rule ``{\tt fill\_valencies}''. This subrule was called from line 391 in
``{\tt dmm-deutsch.syn}'', in rule ``{\tt main\_clause\_end}''.
1192
%------------------------------------------------------------------------------
1194
\section{The Command ``transmit'' (malaga)}
1197
%------------------------------------------------------------------------------
1199
\section{The Command ``tree'' (malaga)}
1201
If you've started a grammatical analysis using one of the commands {\bf ma} or
{\bf sa} (or their debug variants), you can make {\bf malaga} display the
analysis tree with the command {\bf tree}.
1207
If the analysis has not yet finished (in debug mode or in case of an error), an
1208
intermediate result will be shown.
1210
If you're executing the command {\bf tree} for the first time, or if you've
1211
closed the {\bf Tree} window before, a new tree window will open in which the
1212
current analysis tree will be displayed.
1214
If there is already a {\bf Tree} window open, the new analysis tree will be
1215
displayed in this window.
1217
In the upper left corner of the {\bf Tree} window, you will see the sentence or
1218
the word form that has been analysed. Below, the analysis tree is displayed. An
1219
analysis path always follows the edges from the left to the right.
1221
A circle node stands for a LAG state, a two-circle node stands for an end
state.

Above each edge, the Next surface that has been read in by the corresponding
rule application is displayed. On the bottom of an edge, you'll see the name of
this rule.
1228
You can click on a node using the left mouse button. Then another window will
1229
open, namely the {\bf Path} window. The {\bf Path} window displays the surface,
1230
the category and the successor rules of the state you've clicked on. The node
1231
will be highlighted by a fatter border. If you've already clicked on a node,
1232
you can click on one of its successor nodes using the right mouse button. Then
1233
all rule applications, from the state clicked on previously up to the state
1234
clicked on this time, will be displayed in the {\bf Path} window. The
1235
corresponding path will be highlighted in the {\bf Tree} window.
1237
If you're clicking on a Next surface using the left mouse button, the surface
1238
and its category will be displayed in the {\bf Path} window.
1240
You can also click on rule names using the left mouse button. Then the
1241
corresponding rule application will be displayed in the {\bf Path} window,
1242
i.e.\ the Start, Next and Result surface, the Start, Next and Result category,
1243
and the successor rules.
1245
There are some commands that can be started from the {\bf Tree} menu bar:
1248
\begin{description}
\item[Window:] Here you can select from two menu items:
  \begin{description}
  \item[Export Postscript$...$:] Convert the displayed analysis tree to a
    Postscript file.
  \item[Close:] Close the {\bf Tree} window.
  \end{description}
\item[Font size:] Select an item in this menu to adjust the font size.
\item[View:] Specify which nodes of the analysis tree are actually displayed.
  \begin{description}
  \item[Result paths only:] Only the nodes that are part of a complete analysis
    path are displayed.
  \item[All but dead ends:] All analysis states are displayed.
  \item[All nodes:] All analysis states are displayed, and also rectangular
    nodes for rule applications that did not succeed (dead ends).
  \end{description}
\item[Result:] Select an end state to display in the {\bf Path} window.
  \begin{description}
  \item[First result:] Display the first end state.
  \item[Previous result:] If there is an end state displayed in the {\bf Path}
    window, jump to the previous one.
  \item[Next result:] If there is an end state displayed in the {\bf Path}
    window, jump to the next one.
  \item[Last result:] Display the last end state.
  \end{description}
\end{description}
1274
The {\bf Path} window has its own menu bar, which contains the menus {\bf
Window}, {\bf Font size} and {\bf Result} with the same menu items as the
corresponding menus in the {\bf Tree} window.
1278
%------------------------------------------------------------------------------
1280
\section{The Command ``variables''}
1282
Use this command if you want to examine the values of the currently defined
variables. They will be displayed in a window of their own. You do not need to
give any arguments, but you can only execute this command if {\bf malaga} is in
debug mode or if the previous analysis has been stopped by an error in the
rules.
1288
If you are executing the command {\bf variables} for the first time, or if you
1289
have closed a {\bf Variables} window that you'd opened before, a window will
1290
open, displaying the values of all variables currently defined.
1292
If there is a {\bf Variables} window currently opened, the new variable
1293
contents will be displayed in this window.
1295
The {\bf Variables} window has a menu with some commands:
1298
\item[Window:] Here, two items can be selected:
1300
\item[Export Postscript$...$:] Choose this item to convert the variable
1301
display to Postscript and save it as a file.
1302
\item[Close:] Choose this item to close the {\bf Variables} window.
1304
\item[Font size:] Choose one of the menu's subitems to change the font size.
1307
\item[Show selected variables:] Choose one of the menu's subitems (variable
1308
names) to hide (or show) the corresponding variable.
1309
\item[Show all variables:] Choose this item to display all variables that are
  currently defined.
\item[Show no variables:] Choose this item to suppress the display of all
  variables.
1316
%------------------------------------------------------------------------------
1318
\section{The Command ``walk''}
1320
This command works in debug mode only. The rule execution will be continued and
1321
stopped again as soon as a new rule is executed, a breakpoint is met or there
1322
are no more rules to execute.
1324
%------------------------------------------------------------------------------
1326
\chapter{The Options of ``malaga'' and ``mallex''}
1328
The programs {\bf malaga} and {\bf mallex} share some of their options, so we
1329
describe them in a common chapter. Options can be set using the command {\bf
1330
set}, and you can get the current value of an option using {\bf get}.
1331
Options that can be used only in {\bf malaga} or only in {\bf mallex}
are marked with the name of the program in which they can be used.
1334
%------------------------------------------------------------------------------
1336
\section{The Option ``alias''}
1338
With {\bf alias}, you can define abbreviations for longer command lines. As
1339
arguments, give a name and an expansion, that is a command line which the name
1340
will stand for. If the expansion contains spaces, enclose it in double quotes.
1341
Omit the expansion if you want to delete an existing abbreviation.
1343
If you type in the name of an alias at your command line, its expansion will be
executed.
1346
Aliases cannot be nested.
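
A minimal sketch of defining, using and deleting an alias follows. The alias
name ``{\tt wl}'' is made up for this illustration, and the expansion reuses
the {\bf ma-file} call shown in the previous chapter:

\begin{verbatim}
malaga> set alias wl "ma-file word-list result"
malaga> wl
malaga> set alias wl
\end{verbatim}

The first line defines the alias (the expansion is quoted because it contains
spaces), the second line runs it, and the third line, which omits the
expansion, deletes it again.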
1348
%------------------------------------------------------------------------------
1350
\section{The Option ``allo-format'' (mallex)}
1352
With {\bf allo-format}, you can change the output format for the generated allomorphs. Enter a format string as argument. If the
1353
format string contains spaces, enclose it in double quotes. If the argument is
1354
an empty string (\verb#""#), no allomorphs will be shown.
1356
In the format string, the following sequences have a special meaning:
1358
\begin{description}
\item[``{\tt \%c}'':] will be replaced by the allomorph category.
\item[``{\tt \%n}'':] will be replaced by the allomorph number.
\item[``{\tt \%s}'':] will be replaced by the allomorph surface.
\end{description}
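
As a sketch, the following setting would print the number, the surface and the
category of each allomorph, in that order. The particular layout is an
arbitrary choice for illustration and not necessarily the built-in default:

\begin{verbatim}
mallex> set allo-format "%n: %s: %c"
\end{verbatim}

The double quotes are needed because the format string contains spaces.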
1363
%------------------------------------------------------------------------------
1365
\section{The Option ``cache-size'' (malaga)}
1367
Malaga has a cache for word forms. You can set the cache size, i.e.\ the maximum
1368
number of words in the cache, to {\em n} with ``{\tt set cache-size }{\em
1369
n\/}''. If you set the cache size to 0, the cache is deactivated.
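
For example (the value 1000 is an arbitrary choice for illustration):

\begin{verbatim}
malaga> set cache-size 1000
malaga> set cache-size 0
\end{verbatim}

The first line allows up to 1000 word forms in the cache; the second line
switches the cache off.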
1371
%------------------------------------------------------------------------------
1373
\section{The Option ``display''}
1375
If you want to use any program that shows the Malaga trees, results or
1376
variables graphically, set the command line that starts this program via the
1377
{\bf display} option. We recommend setting it in your {\tt .malagarc} file.

\begin{verbatim}
set display "wish ~/malaga/tcl/display.tcl"
\end{verbatim}
1382
%------------------------------------------------------------------------------
1384
\section{The Option ``hidden''}
1386
Some grammars can produce very large categories, so it can be useful not to
1387
show the values of some specified attributes. To achieve this, use the
1388
option {\bf hidden}. You can give any number of arguments to this option. The
1389
following arguments are available:
1391
\begin{itemize}
\item ``{\tt +}{\it attribute\_name\/}'': The specified attribute name will be
  put in parentheses if it occurs in a value; the attribute value will not be
  shown.
\item ``{\tt -}{\it attribute\_name\/}'': The specified attribute will be shown
  completely again in the future.
\item ``{\tt none}'': All attributes will be shown completely
  again in the future.
\end{itemize}
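
As an illustration, the following lines use the attribute names {\tt sem} and
{\tt syn} that occur in the sentence analyses shown in the previous chapter;
the attributes of your own grammar would be used in the same way:

\begin{verbatim}
malaga> set hidden +sem
malaga> set hidden -sem
malaga> set hidden +sem +syn
malaga> set hidden none
\end{verbatim}

The first line hides the value of {\tt sem}, the second line shows it again,
the third line hides two attributes with a single command, and the last line
makes all attributes visible again.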
1400
%------------------------------------------------------------------------------
1402
\section{The Option ``mor-out-filter'' (malaga)}
1404
Use the option {\bf mor-out-filter} to switch the morphology output-filter
on or off:

\begin{itemize}
\item ``{\tt set mor-out-filter yes}'' activates the filter;
\item ``{\tt set mor-out-filter no}'' deactivates the filter.
\end{itemize}
1411
%------------------------------------------------------------------------------
1413
\section{The Option ``output''}
1415
In {\bf malaga}, you can use the {\bf output} option to execute the
{\bf output} command each time you invoke an analysis with {\bf ma} or
{\bf sa}. In {\bf mallex}, you can use the {\bf output} option to execute the
{\bf output} command each time you invoke an allomorph generation with
{\bf ga} or {\bf ga-line}. Set it in one of the following ways:

\begin{itemize}
\item ``{\tt set output on}'': The {\bf output} command will be executed after
  each analysis or generation.
\item ``{\tt set output off}'': The {\bf output} command will not be executed
  automatically.
\end{itemize}
1427
%------------------------------------------------------------------------------
1429
\section{The Option ``output-format'' (malaga)}
1431
With {\bf output-format}, you can change the output format for analysed items
1432
that have been recognised. Enter a format string as argument. If the format
1433
string contains spaces, enclose it in double quotes. If the argument is an
1434
empty string (\verb#""#), no recognised forms will be shown.
1436
In the format string, the following sequences have a special meaning:
1438
\begin{description}
\item[``{\tt \%c}'':] will be replaced by the result category of the analysis.
\item[``{\tt \%l}'':] will be replaced by the line number of the analysed form.
\item[``{\tt \%n}'':] will be replaced by the number of analysis states for
  this form.
\item[``{\tt \%r}'':] will be replaced by the reading index (the results for a
  form are indexed from 1 to the number of results).
\item[``{\tt \%s}'':] will be replaced by the surface.
\end{description}
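
As a sketch, the following setting would print, for each reading, the line
number, the surface and the result category, in that order. The layout is
chosen for illustration only; how exactly the surface is quoted in the output
is not specified here:

\begin{verbatim}
malaga> set output-format "%l: %s: %c"
\end{verbatim}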
1447
%------------------------------------------------------------------------------
1449
\section{The Option ``pruning'' (malaga)}
1451
In your syntax rules, you may have specified a pruning rule that can prune the
1452
syntax analysis tree, i.e.\ it can reduce the number of parallel paths. If you
want this pruning rule to be executed, use the option {\bf pruning}.
Use one of the following arguments:

\begin{itemize}
\item ``{\tt set pruning on}'' activates the pruning rule;
\item ``{\tt set pruning off}'' deactivates the pruning rule.
\end{itemize}
1460
%------------------------------------------------------------------------------
1462
\section{The Option ``result''}
1464
In {\bf malaga}, you can use the {\bf result} option to execute the
{\bf result} command each time you invoke an analysis with {\bf ma} or
{\bf sa}. In {\bf mallex}, you can use the {\bf result} option to execute the
{\bf result} command each time you invoke an allomorph generation with
{\bf ga} or {\bf ga-line}.

Set it in one of the following ways:

\begin{itemize}
\item ``{\tt set result on}'': The {\bf result} command will be executed after
  each analysis or generation.
\item ``{\tt set result off}'': The {\bf result} command will not be executed
  automatically.
\end{itemize}
1477
%------------------------------------------------------------------------------
1479
\section{The Option ``robust'' (malaga)}
1481
With this option, you can specify whether a robust-rule is run for the
word forms that could not be recognised by the LAG rules. The robust-rule gets
the surface of an unknown word form as parameter, and it can create one or more
results by executing the {\bf result} statement.

\begin{itemize}
\item ``{\tt set robust on}'' enables this function;
\item ``{\tt set robust off}'' disables it.
\end{itemize}
1490
%------------------------------------------------------------------------------
1492
\section{The Option ``sort-records''}
1494
There are different ways to determine the order in which the attributes of a
1495
record are printed. With {\bf sort-records}, you can choose between three
orderings:

\begin{itemize}
\item ``{\tt set sort-records internal}'': The attributes will be printed in
  the order they have internally.
\item ``{\tt set sort-records alphabetic}'': The attributes will be ordered
  alphabetically by their names.
\item ``{\tt set sort-records definition}'': The attributes will be printed in
  the order in which their names appear in the symbol table.
\end{itemize}
1506
%------------------------------------------------------------------------------
1508
\section{The Option ``switch''}
1510
Malaga rules can query simple Malaga values ({\em switches}) that you can
1511
change during run time. Use the option {\bf switch} to change the values:
1513
\begin{itemize}
\item ``{\tt set switch {\it name value\/}}'' sets the switch {\it name}, which
  must be a symbol, to {\it value}, which can be any Malaga value.
\end{itemize}
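
A small sketch of such a setting follows. Both the switch name ``{\tt style}''
and its values are hypothetical; which switches are meaningful depends on what
your rules actually query, and the symbols used are assumed to be defined in
your grammar:

\begin{verbatim}
malaga> set switch style formal
malaga> set switch style <formal, written>
\end{verbatim}

The first line sets the switch to a single symbol, the second one to a list of
symbols.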
1517
%------------------------------------------------------------------------------
1519
\section{The Option ``syn-in-filter'' (malaga)}
1521
Use the option {\bf syn-in-filter} to switch the syntax input-filter on or
off:

\begin{itemize}
\item ``{\tt set syn-in-filter yes}'' activates the filter;
\item ``{\tt set syn-in-filter no}'' deactivates the filter.
\end{itemize}
1528
%------------------------------------------------------------------------------
1530
\section{The Option ``syn-out-filter'' (malaga)}
1532
Use the option {\bf syn-out-filter} to switch the syntax output-filter on
or off:

\begin{itemize}
\item ``{\tt set syn-out-filter yes}'' activates the filter;
\item ``{\tt set syn-out-filter no}'' deactivates the filter.
\end{itemize}
1539
%------------------------------------------------------------------------------
1541
\section{The Option ``transmit'' (malaga)}
1543
If you want to use the {\bf transmit} function in {\bf malaga}, you have to set
1544
a command line that starts the transmit process using the {\bf transmit}
1545
option. Here is an example:
1547
\begin{verbatim}
set transmit "my_transmit_program"
\end{verbatim}
1550
%------------------------------------------------------------------------------
1552
\section{The Option ``tree'' (malaga)}
1554
You can use {\bf tree} to make {\bf malaga} execute the {\bf tree} command each
time you invoke an analysis with {\bf ma} or {\bf sa}. Set it in one of
the following ways:

\begin{itemize}
\item ``{\tt set tree on}'': The {\bf tree} command will be executed after each
  analysis.
\item ``{\tt set tree off}'': The {\bf tree} command will not be executed
  automatically.
\end{itemize}
1564
%------------------------------------------------------------------------------
1566
\section{The Option ``unknown-format'' (malaga)}
1568
With {\bf unknown-format}, you can change the output format for analysed items
1569
that have not been recognised. Enter a format string as argument. If the
1570
format string contains spaces, enclose it in double quotes. If the argument is
1571
an empty string (\verb#""#), no unrecognised forms will be shown.
1573
In the format string, the following sequences have a special meaning:
1575
\begin{description}
\item[``{\tt \%l}'':] will be replaced by the line number of the analysed form.
\item[``{\tt \%n}'':] will be replaced by the number of analysis states for
  this form.
\item[``{\tt \%s}'':] will be replaced by the surface.
\end{description}
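
As a sketch, the following setting would report each unrecognised form with its
line number and surface, followed by the fixed text ``unknown''. It is modelled
on the last line of the ``{\tt word-list.cat}'' listing in the previous
chapter; whether it reproduces the built-in default exactly is an assumption:

\begin{verbatim}
malaga> set unknown-format "%l: %s: unknown"
\end{verbatim}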
1581
%------------------------------------------------------------------------------
1583
\section{The Option ``variables''}
1585
When {\bf malaga} or {\bf mallex} stops in debug mode while executing a
1586
malaga rule, they can automatically show the defined variables at this point.
1587
Use the option {\bf variables} to invoke this behaviour.
1589
\begin{itemize}
\item ``{\tt set variables on}'': The {\bf variables} command will be executed
  each time {\bf malaga} or {\bf mallex} stops in debug mode.
\item ``{\tt set variables off}'': The {\bf variables} command will not be
  executed automatically.
\end{itemize}
1595
%------------------------------------------------------------------------------
1597
\chapter{Definition of the Programming Language Malaga}
1599
%------------------------------------------------------------------------------
1601
\section{Characterisation of Malaga}
1603
A Malaga rule file looks much like a program in languages such as Pascal or C
(of course, those languages do not have a Left Associative Grammar formalism
built in). As with compiled languages, a Malaga source file must be translated
before execution. The generated Malaga code, however, is not machine code but
an {\em intermediate code\/} that has to be executed ({\em interpreted\/}) by
an analysis program.
1610
We may characterise Malaga as follows, as far as programming structures and
1611
data structures are concerned:
1615
\item[structured values:] The basic values in Malaga are symbols (names that
1616
can be used e.g. for categories or subcategories), numbers (floating point
1617
numbers), and strings. Values can be combined to ordered lists or records
1618
(also known as feature structures). A value in a list or a record can be a
1619
list or a record itself. An ``ambiguous'' symbol like ``{\tt
1620
singular\_plural}'' can be assigned a list of symbols like ``{\tt
1621
<singular, plural>}''; such a symbol is called a {\em multi symbol\/}.
1623
\item[structured statements:] In Malaga, the concept of statement blocks is
1624
implemented in a similar way as it is in the programming language Pascal.
1625
There are structured control statements to select or repeat a statement
1626
sequence. A variable is always defined {\em locally\/}, i.e.\ it only exists
1627
from the point where it has been defined up to the end of the statement
1628
sequence in which it has been defined.
1630
\item[no type restrictions:] Any value can be assigned to a variable and the
1631
programmer can freely define the structure of values.
1633
\item[no side effects:] Malaga is, unlike programming languages like Pascal or
1634
C, free of side effects. If a variable gets a value, no other variable will
1635
be changed. Analysis paths are independent of each other.
1637
\item[termination:] A Malaga grammar that contains no recursive subrules and no
1638
{\bf repeat} statements is guaranteed to terminate, i.e.\ it can never hang
1641
\item[variables:] In a {\bf define} statement, a variable is defined and gets
1642
an initial value. Use an assignment to set a variable that has already
1643
been defined to a new value.
1645
\item[operators:] Many generative grammar theories or linguistic programming
1646
languages use the concept of unification of feature structures.
1647
Malaga does not use unification, but it offers some operators to build lists
1648
or records (feature structures) explicitly. Since Malaga does without
1649
unification, analyses are much faster.
1653
%------------------------------------------------------------------------------
1655
\section{Malaga Source Texts}
1657
Source texts in Malaga are format-free; this means that between lexical symbols
1658
(strings, identifiers, keywords, numerals and symbols such as ``{\tt +}'',
1659
``$\sim$'' or ``{\tt :=}'') there may be blanks or newlines (whitespaces) or
1660
comments. Between two identifiers or two keywords there {\em must\/} be at
1661
least one whitespace to separate them syntactically.
1663
In this documentation, the syntax of the source text components is defined
1664
formally in EBNF notation. The EBNF lines are printed in typewriter style and
1665
headed by ``{\tt \$\$}''.
1667
%------------------------------------------------------------------------------
1669
\subsection{Comments}
1672
$$ Comment ::= "#" {printing_char} .
1675
A comment may be inserted anywhere whitespace may be inserted. A
comment begins with the symbol ``{\tt \#}'' and extends to the end of the line.
Comments are ignored.
1679
%------------------------------------------------------------------------------
1681
\subsection{The {\bf include} Statement}
1684
$$ Include ::= "include" String ";" .
1687
A Malaga file may contain the statement
1689
{\tt {\bf include} "{\it filename\/}";}
1692
In a rule file, it can stand anywhere a rule can stand. In lexicon files, it
1693
can stand in place of a value; in symbol files, it can replace a symbol
1694
definition. The text of the included file is inserted verbatim at the very
1695
location where the {\bf include} statement occurs. The file name has to be
1696
stated relative to the directory of the file which contains the {\bf include}
1699
%------------------------------------------------------------------------------
1701
\subsection{Identifiers}
1704
$$ Identifier ::= (Letter | "_" | "&" | "|") {Letter | Digit | "_" | "&" | "|"} .
1707
In Malaga, names for variables, constants, symbols, and rules (see below
for explanation) are called {\em identifiers\/}. An identifier may consist of
1709
uppercase and lowercase characters, the underscore ``{\tt \_}'', the ampersand
1710
``{\tt \&}'', the vertical bar ``{\tt |}'', and, from the second character on,
1711
also of digits. Uppercase and lowercase characters are not distinguished, i.e.,
1712
Malaga is {\em not\/} case-sensitive. Malaga keywords must not be used as
1713
identifiers. A variable name must start with a ``{\tt \$}'', a constant name
1714
must start with a ``{\tt @}''. The same identifier may be used as variable
1715
name, constant name, symbol name, or rule name independently. Malaga can
1716
distinguish them by the context in which they occur.
1718
Valid identifiers would be ``{\tt Noun}'', ``{\tt noun}'' (the same as the
1719
first), ``{\tt R2D2}'', ``{\tt Vb\_aux}'', ``{\tt A|G|D}'', ``{\tt \_INF}''.
1720
Identifiers like ``{\tt 2Noun}'', ``{\tt Verb.Frame}'', ``{\tt OK?}'', ``{\tt
1721
\_~INF}'' are {\em not\/} valid.
1723
%------------------------------------------------------------------------------
1727
Malaga expressions can have values with very complex structures. To describe
1728
how those values can be composed from simple values a few rules suffice. Simple
1729
values in Malaga are {\em symbols\/}, {\em numbers}, and {\em strings}, which
1730
can be composed to form {\em records\/} and {\em lists\/}.
1732
%------------------------------------------------------------------------------
1734
\subsection{Symbols}
1737
$$ Symbol ::= Identifier .
1740
The central data type in Malaga is the symbol. It is used for describing
1741
syntactic or semantic properties of an allomorph, a word, or a sentence. A
1742
symbol is an identifier like ``{\tt Verb}'', ``{\tt reflexive}'', ``{\tt
1743
Sing\_1}''. The symbols ``{\tt nil}'', ``{\tt yes}'', ``{\tt no}'', ``{\tt
1744
symbol}'', ``{\tt string}'', ``{\tt number}'', ``{\tt
1745
list}'', and ``{\tt record}'' are predefined and have special meanings.
1747
%------------------------------------------------------------------------------
1749
\subsection{Numbers}
1752
$$ Number ::= ["-"] Digit {Digit} ["." Digit {Digit}] ["E" ["+" | "-"] Digit {Digit}] .
1755
A number in Malaga consists of an optional ``-'' sign, an integer part, an
1756
optional fractional part and an optional exponent of the form ``{\tt
1757
E$[$+$|$-$]n$}''. There must be a dot between the integer part and the
1758
fractional part. Examples: ``{\tt 0}'', ``{\tt 1}'', ``{\tt 1.0}'', ``{\tt
1759
-13.75}'', ``{\tt 1.2E-5}''.
1761
%------------------------------------------------------------------------------
1763
\subsection{Strings}
1766
$$ String ::= '"' {printing_char_except_double_quotes | '\"' | '\\'} '"' .
1769
A string may consist of any number of characters (it may also be empty). It
1770
must be enclosed in double quotes and must not extend over more than one line.
1771
Within the double quotes there may be any combination of printable characters
1772
except the backslash ``\verb#\#'' and the double quotes. These characters must
1773
be preceded by a ``\verb#\#'' (escape character). Examples: {\tt "Hello"}, {\tt
1774
"He~says:~\verb#\#"Great\verb#\#""}.
1776
%------------------------------------------------------------------------------
1781
$$ List ::= "<" [Expression {"," Expression}] ">" .
1784
A list is an ordered sequence of values. The values are separated by commas and
1785
enclosed in angle brackets:
1787
{\tt <{\it element1}, {\it element2}, $...$>}
1789
A list may as well be empty. The elements in a list may be arbitrarily complex;
1790
they may also be lists or records.
1792
%------------------------------------------------------------------------------
1794
\subsection{Records}
1797
$$ Record ::= "[" [Symbol-Value-Pair {"," Symbol-Value-Pair}] "]" .
1798
$$ Symbol-Value-Pair ::= Expression ":" Expression .
1801
A record is a collection of attributes. An {\em attribute\/} consists of a
1802
symbol, the {\em attribute name\/}, and an associated {\em attribute value},
1803
which can be an arbitrary Malaga value. The attribute name serves as an access
1804
key for the attribute value, so all attributes in a record must have different
1807
Records are noted down as follows:
1810
[{\it name1}:\ {\it value1}, {\it name2}:\ {\it value2}, $...$]
1813
where {\it name i} denotes an attribute name and {\it value i} the associated
1814
attribute value. Example: ``{\tt [Class:\ Verb, Reg:\ Reg, Val:\ dirObj]}''.
1816
A record with no attributes, ``{\tt []}'', is called the {\em empty record\/}.
1818
%------------------------------------------------------------------------------
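
As a small illustration of how the value types described above can be nested,
the following record combines a string, symbols, a list, and an embedded
record. The attribute names and symbols are hypothetical; they are assumed to
be declared in the symbol file.

\begin{verbatim}
# A record whose attribute values are a string, a symbol,
# a list of symbols, and another record.
[Surf: "houses",
 Class: Noun,
 Forms: <Nom, Acc>,
 Agr: [Num: Plur, Pers: Third]]
\end{verbatim}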
1820
\section{Expressions}
1823
$$ Expression ::= ["-"] Term {("+" | "-") Term} .
1824
$$ Term ::= Factor {("*" | "/") Factor} .
1825
$$ Factor ::= Value {"." Value} .
1826
$$ Value ::= Symbol | String | Number | List | Record | Constant
1827
$$ | Subrule-Invocation | Variable | "(" Condition ")" .
1828
$$ Constant-Expression ::= Expression .
1831
An expression is the form in which a value is used in Malaga. Values can be
1834
{\tt [Surf:\ "he", Class:\ Pron, Case\&Number:\ S3]}
1837
Variables (these are placeholders for values within a rule) can as well be used
1843
Furthermore, constants (placeholders for values in a rule file) can be used as
1846
{\tt @combination\_table}
1849
All three forms can be mixed:
1851
{\tt [Surf:\ "he", Class:\ Pron, Case\&Number:\ \$result]}
1854
Furthermore, there are operators which modify values or combine two values to
1855
form a new value. Using those operators complex values can be composed. All
1856
operators work left-associatively and have different priorities (an operator
1857
with higher priority is applied before one with lower priority):
1859
\begin{tabular}{|c|c|}
1861
operator & priority \\
1865
{\tt *}, {\tt /} & 2 \\
1867
{\tt +}, {\tt -} & 1 \\
1872
The order in which the operators are to be applied can be changed by bracketing
1873
with round parentheses ``{\tt ()}''.
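
The effect of priority, left-associativity, and parentheses can be sketched
with plain numbers. The lines below are expression fragments, not complete
statements; the values in the comments follow from the rules stated above.

\begin{verbatim}
1 + 2 * 3      # "*" is applied before "+": yields 7
(1 + 2) * 3    # parentheses change the order: yields 9
10 - 4 - 3     # left-associative, read as (10 - 4) - 3: yields 3
\end{verbatim}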
1875
%------------------------------------------------------------------------------
1877
\subsection{Variables}
1880
$$ Variable ::= "$" Identifier .
1883
A variable is marked by a ``{\tt \$}'' preceding its name. The name may be any
1884
valid identifier. A variable is defined by the {\bf define} statement; it
1885
receives a value and may from this point on be used in all expressions within
1886
the statement sequence. In such a statement sequence (and all subordinated
1887
statement sequences) a variable with the same name must not be defined again.
1889
%------------------------------------------------------------------------------
1891
\subsection{Constants}
1894
$$ Constant ::= "@" Identifier .
1897
A constant is marked by a ``{\tt @}'' preceding its name. The name may be any
1898
valid identifier. A constant is defined by a constant definition in a rule
1899
file, outside a rule. It is assigned a value and can be used in subsequent
1900
rules and constant definitions in that rule file.
1902
%------------------------------------------------------------------------------
1904
\subsection{Subrule Invocations}
1907
$$ Subrule-Invocation ::= Rule-Name "(" Expression {"," Expression} ")" .
1908
$$ Rule-Name ::= Identifier .
1911
A subrule is invoked when an expression ``{\tt {\it subrule} ({\it value1},
1912
{\it value2}, $...$)}'' is evaluated. The expression yields the value that is
1913
returned by the {\bf return} statement in the subrule. The number of parameters
1914
in a subrule invocation must match the number of parameters in the subrule
1917
There are a number of default subrules which are predefined. They are called
1918
{\em functions\/} and they all take one parameter only.
1920
%------------------------------------------------------------------------------
1922
\subsection{The Function ``{\bf atoms}''}
1924
The expression ``{\tt {\bf atoms}({\it symbol\/})}'' yields the list of atomic
1925
symbols for {\it symbol}. If {\it symbol\/} is not a multi-symbol, it yields
1926
the list {\tt <{\it symbol}>}.
1928
%------------------------------------------------------------------------------
1930
\subsection{The Function ``{\bf capital}''}
1932
The expression ``{\tt {\bf capital} ({\it string\/})}'' yields {\tt yes} if the
1933
first character of the string {\it string\/} is a capital letter, else it
1936
%------------------------------------------------------------------------------
1938
\subsection{The Function ``{\bf length}''}
1940
The expression ``{\tt {\bf length} ({\it list\/})}'' yields the number of
1941
elements in ``{\it list}''.
1943
%------------------------------------------------------------------------------
1945
\subsection{The Function ``{\bf multi}''}
1947
The expression ``{\tt {\bf multi}({\it list\/})}'', where {\it list\/} is a
1948
list of symbols, yields the multi symbol whose atomic list corresponds to {\it
1949
list\/}. If {\it list\/} contains a single atomic symbol, the expression yields
this symbol.
1952
%------------------------------------------------------------------------------
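
The functions {\bf atoms} and {\bf multi} are inverse to each other. As a
sketch, assume that the symbol table defines the atomic symbols {\tt singular}
and {\tt plural} and the multi-symbol {\tt singular\_plural} with the list
{\tt <singular, plural>}, as in the characterisation of Malaga above:

\begin{verbatim}
atoms (singular_plural)       # yields <singular, plural>
atoms (singular)              # atomic symbol: yields <singular>
multi (<singular, plural>)    # yields singular_plural
multi (<singular>)            # single atomic symbol: yields singular
\end{verbatim}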
1954
\subsection{The Function ``{\bf set}''}
1956
The expression ``{\tt {\bf set}({\it list\/})}'' yields a list which contains
1957
each element of {\it list}, but only once. That means, the list is converted to
1960
%------------------------------------------------------------------------------
1962
\subsection{The Function ``{\bf switch}''}
1964
The expression ``{\tt {\bf switch} ({\it symbol\/})}'' yields the current value
1965
of the switch associated to ``{\it symbol}''. Use the option {\bf switch} to
1968
%------------------------------------------------------------------------------
1970
\subsection{The Function ``{\bf symbol\_name}''}
1972
The expression ``{\tt {\bf symbol\_name} ({\it symbol\/})}'' yields the name of
1973
{\it symbol} as a string.
1975
%------------------------------------------------------------------------------
1977
\subsection{The Function ``{\bf transmit}'' (malaga)}
1979
The expression ``{\tt {\bf transmit} ({\it value\/})}'' writes {\it value},
1980
converted to text format, to the transmit process via pipe and reads a value in
1981
text format from the transmit process via pipe. The answer is converted to the
1982
internal Malaga value format and returned as the result of the expression.
1984
When this function is evaluated, the transmit process is started if it has not
1985
been started yet. The command line of the transmit process is specified by the
1986
option {\bf transmit}.
1988
%------------------------------------------------------------------------------
1990
\subsection{The Function ``{\bf truncate}''}
1992
The expression ``{\tt {\bf truncate} ({\it number\/})}'' yields the largest
1993
integer number that is not greater than {\it number}.
1995
%------------------------------------------------------------------------------
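
Following the definition above, {\bf truncate} rounds towards the next smaller
integer, which matters for negative arguments. The values in the comments are
derived from that definition:

\begin{verbatim}
truncate (2.7)     # yields 2
truncate (3)       # already an integer: yields 3
truncate (-2.7)    # largest integer not greater than -2.7: yields -3
\end{verbatim}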
1997
\subsection{The Function ``{\bf value\_type}''}
1999
The expression ``{\tt {\bf value\_type} ({\it value\/})}'' yields the type of
2000
{\it value}. The type information is coded as one of the symbols ``{\tt
2001
symbol}'', ``{\tt string}'', ``{\tt number}'', ``{\tt
2002
list}'', or ``{\tt record}''.
2004
%------------------------------------------------------------------------------
2006
\subsection{The Operator ``{\tt .}''}
2008
This operator may only be used in the following ways:
2010
\item The expression ``{\tt {\it record\/}.{\it symbol\/}}'' yields the
2011
attribute value of the attribute of {\it record\/} whose name is {\it
2012
symbol}. If there is no attribute in {\it record\/} whose name is {\it
2013
symbol}, the expression yields the special symbol {\tt nil}.
2014
\item The expression ``{\tt {\it list\/}.{\it number}}'' yields the element of
2015
{\it list\/} at position {\it number}. If there is no element at position
2016
{\it number\/} in {\it list\/}, the expression yields the special symbol {\tt nil}.
2018
\item The expression ``{\tt {\it value\/}.{\it list\/}}'', where {\it list\/}
2019
is a list {\tt <{\it e1}, {\it e2}, $...$>} of symbols and/or numbers, serves
2020
as an abbreviation for ``{\tt {\it value\/}.{\it e1}.{\it e2}$...$}''.
2023
%------------------------------------------------------------------------------
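
The following fragments sketch attribute access and the path abbreviation. The
variable {\tt \$r} and all attribute names and symbols are hypothetical; {\tt
\$r} is assumed to hold the record shown in the first comment line.

\begin{verbatim}
# Assume $r has the value [Class: Verb, Agr: [Num: Plur]].
$r.Class          # yields Verb
$r.Agr.Num        # yields Plur
$r.<Agr, Num>     # abbreviation for $r.Agr.Num: yields Plur
$r.Tense          # no attribute named Tense: yields nil
\end{verbatim}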
2025
\subsection{The Operator ``{\tt +}''}
2027
This operator may only be used in the following ways:
2029
\item The expression ``{\tt {\it string1\/} + {\it string2\/}}'' yields the
2030
concatenation of {\it string1\/} and {\it string2}.
2031
\item The expression ``{\tt {\it list1\/} + {\it list2\/}}'' yields the
2032
concatenation of {\it list1\/} and {\it list2}.
2033
\item The expression ``{\tt {\it number1\/} + {\it number2\/}}'' yields the sum
2034
of {\it number1\/} and {\it number2}.
2035
\item The expression ``{\tt {\it record1\/} + {\it record2\/}}'' yields a
2036
record which consists of all attributes of {\it record1} and {\it record2}. If
{\it record1} and {\it record2} have common attribute names, the
2038
corresponding attributes in the result record will have the attribute values
2039
from {\it record2}, in contrast to the operator ``{\tt *}''.
2042
%------------------------------------------------------------------------------
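
A few expression fragments illustrate the four uses of ``{\tt +}''. The symbols
and attribute names are hypothetical. The order in which the attributes of the
resulting record are listed is an assumption; it does not matter for
comparisons.

\begin{verbatim}
"un" + "do"       # string concatenation: yields "undo"
<a, b> + <c>      # list concatenation: yields <a, b, c>
1.5 + 2           # sum: yields 3.5

# Num occurs in both records; the value of the right operand wins.
[Class: Noun, Num: Sing] + [Num: Plur, Gender: Fem]
# yields [Class: Noun, Num: Plur, Gender: Fem]
\end{verbatim}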
2044
\subsection{The Operator ``{\tt -}''}
2046
This operator may only be used in the following ways:
2048
\item The expression ``{\tt {\it record\/} - {\it symbol\/}}'' yields {\it
2049
record\/} without the attribute named {\it symbol}, if {\it symbol\/} is an
2050
attribute name in {\it record}. If not, the expression yields {\it record}.
2051
\item The expression ``{\tt {\it record\/} - {\it list\/}}'', where {\it
2052
list\/} is a list of symbols, yields {\it record\/} without the attributes
2054
\item The expression ``{\tt {\it list\/} - {\it number\/}}'' yields {\it
2055
list\/} without the element at index {\it number}. If this element does not exist,
2056
the expression yields {\it list}.
2057
\item The expression ``{\tt {\it list1\/} - {\it list2\/}}'' yields the
2058
multi-set difference of the two lists {\it list1} and {\it list2}. This
2059
means, it yields the list {\it list1}, but the first $n$ appearances of each
2060
element will be deleted, if that element appears $n$ times in {\it list2}.
2061
\item The expression ``{\tt {\it number1\/} - {\it number2\/}}'' yields the
2062
difference of {\it number1\/} and {\it number2}.
2065
%------------------------------------------------------------------------------
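
The multi-set difference of two lists is easiest to follow on an example. The
fragments below use hypothetical symbols and attribute names; the values in the
comments are derived from the descriptions above.

\begin{verbatim}
[Class: Noun, Num: Sing] - Num       # yields [Class: Noun]
[Class: Noun, Num: Sing] - Gender    # no such attribute: record unchanged

# The right list contains "a" twice and "c" once, so the first two
# "a" and the first "c" are deleted from the left list.
<a, b, a, c, a> - <a, c, a>          # yields <b, a>

7 - 2.5                              # yields 4.5
\end{verbatim}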
2067
\subsection{The Operator ``{\tt *}''}
2069
This operator may only be used in the following ways:
2071
\item The expression ``{\tt {\it record\/} * {\it symbol\/}}'' yields the
record which only contains the attribute of {\it record\/} whose name is {\it
symbol}.
\item The expression ``{\tt {\it record1\/} * {\it record2\/}}'' yields a
record which consists of all attributes of {\it record1} and {\it record2}. If
{\it record1} and {\it record2} have common attribute names, the
corresponding attributes in the result record will have the attribute values
from {\it record1}, in contrast to the operator ``{\tt +}''.
2083
\item The expression ``{\tt {\it record\/} * {\it list\/}}'', where {\it
2084
list\/} is a list of symbols, yields the record which only contains the
2085
attributes of {\it record\/} whose names are in {\it list}.
2086
\item The expression ``{\tt {\it list1\/} * {\it list2\/}}'' yields the
2087
``intersection'' of the lists interpreted as multi-sets; if an element is $m$
2088
times contained in {\it list1}
2089
and $n$ times contained in {\it list2}, it will be {\bf min}($m$, $n$) times
2090
contained in the result.
2091
\item The expression ``{\tt {\it number1\/} * {\it number2\/}}'' yields the
2092
product of {\it number1\/} and {\it number2}.
2095
%------------------------------------------------------------------------------
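
The following fragments sketch the selection of attributes and the multi-set
intersection. Symbols and attribute names are hypothetical. For the list
intersection, only the number of occurrences is derived from the description
above; the order of the elements in the result is not specified there.

\begin{verbatim}
[Class: Noun, Num: Sing, Gender: Fem] * Num
# yields [Num: Sing]

[Class: Noun, Num: Sing, Gender: Fem] * <Class, Gender>
# yields [Class: Noun, Gender: Fem]

# "a": min(3, 2) = 2 occurrences, "b": min(1, 2) = 1 occurrence.
<a, a, a, b> * <a, a, b, b>
# yields a list with two "a" and one "b"

3 * 4     # yields 12
\end{verbatim}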
2097
\subsection{The Operator ``{\tt /}''}
2099
This operator may only be used in the following ways:
2101
\item The expression ``{\tt {\it list1\/} / {\it list2\/}}'' yields the
2102
list which contains all elements of {\it list1} which are not elements of
2104
\item The expression ``{\tt {\it list\/} / {\it number\/}}'' yields the list
2105
which contains all elements of {\it list\/} without the leftmost {\it
2106
number\/} elements, if {\it number\/} is positive, or without the rightmost
2107
-{\it number\/} elements, if {\it number\/} is negative.
2108
\item The expression ``{\tt {\it number1\/} / {\it number2\/}}'', where {\it
2109
number2\/} is not 0, yields the quotient of {\it number1\/} and {\it number2}.
2113
%------------------------------------------------------------------------------
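
As a sketch with hypothetical symbols, the fragments below show the three uses
of ``{\tt /}''. Note the difference to ``{\tt -}'': {\tt <a, b, a, c> - <a>}
removes only the first ``{\tt a}'' and yields {\tt <b, a, c>}, whereas ``{\tt
/}'' removes every occurrence.

\begin{verbatim}
<a, b, a, c> / <a>    # all "a" are dropped: yields <b, c>
<a, b, c, d> / 1      # without the leftmost element: yields <b, c, d>
<a, b, c, d> / -2     # without the two rightmost elements: yields <a, b>
7 / 2                 # quotient: yields 3.5
\end{verbatim}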
2115
\section{Conditions}
2118
$$ Condition ::= Comparison ({"and" Comparison} | {"or" Comparison}) .
2119
$$ Comparison ::= ["not"] (Expression [Comparison-Operator Expression]
2120
| Match-Comparison) .
2121
$$ Comparison-Operator ::= "=" | "/=" | "~" | "/~" | "in" | "less" | "greater"
2122
| "less_equal" | "greater_equal" .
2125
A condition can either be true or false, as in ``{\tt Verb = Verb}'' or ``{\tt
2126
Verb = Noun}'', respectively.
2127
An expression that is evaluated to any of the symbols {\tt yes} or {\tt no} is
2130
A condition can be used everywhere a (non-constant) value is needed. It will
2131
evaluate to {\tt yes} or {\tt no}. In this case, the condition must be
2132
surrounded by parentheses.
2134
%------------------------------------------------------------------------------
2136
\subsection{The Operators ``{\tt =}'' and ``{\tt /=}''}
2138
The condition ``{\tt {\it expr1} = {\it expr2}}'' tests whether the
2139
expressions {\it expr1} and {\it expr2} are equal. There are several
2143
\item[{\it expr1} and {\it expr2} are strings, symbols or numbers.] In this
2144
case {\it expr1} and {\it expr2} must be identical.
2145
\item[{\it expr1} and {\it expr2} are lists.] In this case {\it expr1}
2146
and {\it expr2} must match element by element.
2147
\item[{\it expr1} and {\it expr2} are records.] In this case {\it expr1}
must contain the same attributes (though not necessarily in the same order)
as {\it expr2}.
2152
For nested structures, equality is tested recursively.
2154
If {\it expr1} and {\it expr2} do not have the same type, the test
2155
results in an error; only the symbol {\tt nil} can be compared to any value.
2157
The comparison ``{\tt {\it expr1} /= {\it expr2}}'' holds iff the
2158
comparison ``{\tt {\it expr1} = {\it expr2}}'' does not hold.
2160
%------------------------------------------------------------------------------
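
The difference between comparing lists and comparing records can be sketched
as follows. The symbols, attribute names, and the variable {\tt \$r} are
hypothetical.

\begin{verbatim}
# Holds: the order of attributes in a record is irrelevant.
[Class: Noun, Num: Sing] = [Num: Sing, Class: Noun]

# Does not hold: lists are compared element by element.
<a, b> = <b, a>

# Holds: nested values are compared recursively.
<a, [Num: Sing]> = <a, [Num: Sing]>

# Permitted for any type of $r.Tense, since nil can be
# compared to any value.
$r.Tense /= nil
\end{verbatim}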
2162
\subsection{The Operators ``{\bf less}'', ``{\bf less\_equal}'', ``{\bf
2163
greater}'', ``{\bf greater\_equal}''}
2165
A condition of type ``{\it expr1} {\it operator} {\it expr2}'' compares
2166
two numbers. Here, {\it operator\/} can have the following values:
2168
\begin{tabular}{|c|c|}
2170
\it operator & meaning \\
2174
\bf less\_equal & $\leq$ \\
2176
\bf greater & $>$ \\
2178
\bf greater\_equal & $\geq$ \\
2183
If either {\it expr1} or {\it expr2} is not a number, an error will be
2186
%------------------------------------------------------------------------------
2188
\subsection{The Operators ``$\sim$'' and ``/$\sim$''}
2190
For a comparison ``{\tt {\it expr1} $\sim$ {\it expr2}}'', {\it expr1}
2191
and {\it expr2} must be lists or symbols.
2193
If {\it expr1} and {\it expr2} are symbols, the lists of their atomic
symbols ({\tt {\bf atoms}({\it expr1})} and {\tt {\bf atoms}({\it
expr2})}) will be used for the comparison instead of the symbols themselves.

The comparison tests whether the lists {\em congruate\/}, that is, whether
they have at least one element in common.
2200
The comparison ``{\tt {\it expr1} /$\sim$ {\it expr2}}'' holds iff the
2201
comparison ``{\tt {\it expr1} $\sim$ {\it expr2}}'' does not hold.
2203
%------------------------------------------------------------------------------
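
As a sketch, assume that the symbol table defines the atomic symbols {\tt
singular} and {\tt plural} and the multi-symbol {\tt singular\_plural} with the
list {\tt <singular, plural>}; the symbols {\tt a}, {\tt b}, and {\tt c} are
hypothetical.

\begin{verbatim}
# <singular, plural> and <singular> share "singular": holds.
singular_plural ~ singular

# <singular> and <plural> have no common element: does not hold.
singular ~ plural

# Common element "b": holds.
<a, b> ~ <b, c>

# Holds, because "singular ~ plural" does not hold.
singular /~ plural
\end{verbatim}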
2205
\subsection{The Operator ``{\bf in}''}
2207
The operator ``{\bf in}'' can only be used in the following ways:
2209
\item The condition ``{\tt {\it symbol} {\bf in} {\it record}}'' holds iff {\it
2210
record\/} contains an attribute named {\it symbol}.
2211
\item The condition ``{\tt {\it value} {\bf in} {\it list}}'' holds iff {\it
2212
value\/} is an element of {\it list}.
2215
%------------------------------------------------------------------------------
2217
\subsection{The {\bf matches} Condition (Regular Expressions)}
2220
$$ Match-Comparison ::= Expression "matches" "(" Segment {"," Segment} ")".
2221
$$ Segment ::= [Variable ":"] Constant-Expression .
2226
{\tt {\it expr\/} {\bf matches} ({\it pattern\/})}
2228
interprets {\it pattern\/} as a pattern (a regular expression) and
2229
tests whether {\it expr\/} matches {\it pattern\/}. Patterns are defined as
2232
\item {\it pattern} ::= {\it alternative} \{ ``{\tt |}'' {\it
2235
The string must be identical with one of the alternatives.
2237
\item {\it alternative} $::=$
2238
\{ {\it atom} [ ``{\tt *}'' $|$ ``{\tt ?}'' $|$ ``{\tt +}'' ] \}
2240
An alternative is a (possibly empty) sequence of atoms. An atom in a pattern
2241
corresponds to a character in a string. By using an optional postfix operator
2242
it is possible to specify for any atom how often it may be repeated within
the string at that location: zero times or once (``{\tt ?}''), at least once
(``{\tt +}''), or arbitrarily often, including zero times (``{\tt *}'').
2246
\item {\it atom} ::= ``{\tt (}'' {\it pattern} ``{\tt )}''
2248
A pattern may be grouped by parentheses.
2250
\item {\it atom} ::= ``{\tt [}'' [ ``\verb#^#'' ] {\it range} \{
2251
{\it range} \} ``{\tt ]}''
2253
A character class. It represents exactly one character from one of the
2254
ranges. If the symbol ``\verb#^#'' is the first one in the class, the
2255
expression represents exactly one character that is {\em not\/} contained in
2258
\item {\it atom} ::= ``{\tt .}''
2260
Represents any character.
2262
\item {\it atom} ::= {\it character}
2264
Represents the character itself.
2266
\item {\it range} ::= {\it character1} [ ``{\tt -}'' {\it character2} ]
2268
The range contains any character with a code at least as big as the code of
2269
{\it character1} and not bigger than the code of {\it character2}. The
2270
code of {\it character2} must be at least as big as the code of {\it
2271
character1}. If {\it character2} is omitted, the range only contains
2274
\item {\it character} ::= Any character except ``\verb#*?+[]^-.\|()#''
2276
To use one of the characters ``\verb#*?+[]^-.\|()#'', it must be preceded by
2277
a ``\verb#\#'' (escape character).
2281
You can divide the pattern into segments:
2283
{\tt \$surf {\bf matches} ("un|in|im|ir|il", ".*", "(en)?")}
2287
{\tt \$surf {\bf matches} ("(un|in|im|ir|il).*(en)?")}.
2290
A section of the string can be stored in a variable by prefixing the respective
2291
pattern with ``{\tt {\it variable\_name\/}:}'', as in
2293
{\tt \$surf {\bf matches} (\$a:\ "un|in|im|ir|il", ".*")}
2296
The variables defined by pattern matching are only defined in the statement
2297
sequence which is being executed if the pattern matching is successful. A
2298
matches condition that is
2300
\item contained in a disjunction (an {\bf or} condition),
2301
\item contained in a negation (a {\bf not} condition), or
2302
\item used as a value (e.g. in an assignment)
2304
may not have variable definitions in it.
2306
%------------------------------------------------------------------------------
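
A worked example may help to see what a segmented pattern does. The strings
and the variable name {\tt \$prefix} are hypothetical; the outcomes in the
comments are derived from the pattern rules above.

\begin{verbatim}
# Holds: "un" is matched by the first segment and "broken" by the
# second one. $prefix is defined as "un" in the statement sequence
# that is executed when the match succeeds.
"unbroken" matches ($prefix: "un|in|im|ir|il", ".*")

# Does not hold: the string does not begin with any of the
# alternatives of the first segment.
"broken" matches ("un|in|im|ir|il", ".*")
\end{verbatim}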
2308
\section{The Operators {\bf not}, {\bf and}, and {\bf or}}
2310
Conditions can be combined logically:
2313
\item The condition ``{\bf not} {\it cond\/}'' is true if condition {\it
2315
\item The condition ``{\it cond1} {\bf and} {\it cond2} {\bf and} {\it cond3}
2316
{\bf and} $...$'' is true if all conditions {\it cond1}, {\it cond2}, {\it
2317
cond3}, $...$ are true. The conditions are only tested until one of them
2318
is false (short-cut evaluation).
2319
\item The condition ``{\it cond1} {\bf or} {\it cond2} {\bf or} {\it cond3}
2320
{\bf or} $...$'' is true if at least one of the conditions {\it cond1}, {\it
2321
cond2}, {\it cond3}, $...$ is true. The conditions are only tested until
2322
one of them is true (short-cut evaluation).
2325
The operator {\bf not} takes exactly one argument. Complex conditions have to
2326
be put in parentheses ``(\,)''.
2328
The operators {\bf and} and {\bf or} may not be mixed; otherwise the order of
2329
evaluation would be ambiguous. They have to be put in parentheses
2332
%------------------------------------------------------------------------------
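
The following condition fragments sketch how parentheses are used to combine
{\bf and}, {\bf or}, and {\bf not}. The variables and symbols are hypothetical.

\begin{verbatim}
($x = a and $y = b) or $z = c    # valid: the "and" group is parenthesised
$x = a and ($y = b or $z = c)    # valid: the "or" group is parenthesised
not ($x = a or $y = b)           # "not" applied to a complex condition

# Not valid, because "and" and "or" are mixed without parentheses:
# $x = a and $y = b or $z = c
\end{verbatim}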
2334
\section{The Symbol Table}
2337
$$ Symbol-Definition ::= Symbol [":=" "<" Symbol {"," Symbol} ">"] ";".
2340
Every symbol used in a grammar has to be defined exactly once in the {\em
2341
symbol table\/}. Every symbol must be followed by a semicolon:
2343
{\tt verb; noun; adjective;}
2345
Symbols that are being defined that way are called {\em atomic symbols\/}. A
2346
symbol can also be defined as a {\em multi-symbol\/}. Then the entry for this
2347
symbol has the following format:
2349
{\tt {\it symbol\/} := {\it list\/};}
2351
The {\it list\/} for this symbol must consist of at least two atomic symbols,
2352
all different from those that have already been defined. This list will be
2353
used by the operators ``$\sim$'' and ``{\tt /}$\sim$'', ``{\bf atoms}'', and
2354
``{\bf multi}''. The lists in the symbol table must all be different; two lists
that differ only in the order of their elements do not count as different.
2357
%------------------------------------------------------------------------------
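
A minimal symbol table might look as follows. The symbol names are chosen for
illustration only; the multi-symbol {\tt singular\_plural} is the one used in
the characterisation of Malaga above.

\begin{verbatim}
# Atomic symbols.
singular; plural;
nominative; genitive; dative; accusative;

# Multi-symbols, each defined by a list of atomic symbols.
singular_plural := <singular, plural>;
nom_acc := <nominative, accusative>;
\end{verbatim}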
2359
\section{The Initial State}
2362
$$ Initial ::= "initial" Constant-Expression "," Rule-Set ";" .
2363
$$ Rule-Set ::= "rules" (Rules {"else" Rules} | "(" Rules {"else" Rules} ")") .
2364
$$ Rules ::= Rule-Name {"," Rule-Name} .
2367
The initial state in a combination rule file is defined as follows:
2371
{\bf initial} {\it value},
2372
{\bf rules} {\it rule1}, {\it rule2}, $...$;
2377
The initial state specifies a category for the empty word start (or sentence
2378
start) in a combi rule file; the rules listed behind {\bf rules} are applied in
2379
parallel to combine the empty word (sentence) start with the first allomorph
2380
(word form). The rules may be enclosed in parentheses.
2382
If you want rules to be executed only if no other rule
2383
has been successful, you can put their names behind the other rules'
2384
names and write an {\bf else} in front of them:
2387
{\bf initial} {\it value}, {\bf rules} {\it rule1}, {\it rule2}
2388
{\bf else} {\it rule3}, {\it rule4} {\bf else} $...$;
2391
If none of the normal rules {\it rule1} and {\it rule2} has been
successful, {\it rule3} and {\it rule4} are executed. If these rules also
fail, the next rules are executed, and so on.
2395
%------------------------------------------------------------------------------
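
As a sketch, an initial state with a fallback rule could be written as shown
below. The category and the rule names {\tt prefix}, {\tt stem}, and {\tt
unknown\_form} are hypothetical; the rules are assumed to be defined as
combi-rules in the same rule file.

\begin{verbatim}
# Try "prefix" and "stem" in parallel; only if neither of them
# succeeds, try "unknown_form".
initial [Class: Word_Start],
  rules prefix, stem else unknown_form;
\end{verbatim}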
2397
\section{The Constant Definition}
2400
$$ Constant-Definition ::= "define" Constant ":=" Constant-Expression ";" .
2403
A constant definition is of the form
2405
{\tt {\bf define} @{\it constant\/} := {\it expr\/};}
2407
The constant expression {\it expr\/} will be evaluated and the constant @{\it
constant\/} will be defined to have this value. The constant must not have been
2409
defined previously. The constant is valid from this definition up to the end of
2412
%------------------------------------------------------------------------------
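
The following constant definitions are a sketch with hypothetical names and
symbols. The third definition uses a constant that has been defined before it
in the same rule file.

\begin{verbatim}
define @noun_classes := <Common_Noun, Proper_Noun>;
define @default_agr := [Num: Sing, Gender: Fem];

# A constant may be used in subsequent constant definitions.
define @known_classes := @noun_classes + <Verb, Adjective>;
\end{verbatim}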
2417
$$ Rule ::= Rule-Type Rule-Name "(" Variable {"," Variable} ")" ":"
2418
$$ {Statement} "end" [Rule-Type] [Rule-Name] ";" .
2419
$$ Rule-Type ::= "allo_rule" | "combi_rule" | "end_rule" | "pruning_rule"
2420
$$ | "robust_rule" | "input_filter" | "output_filter" | "subrule" .
2423
A rule is a sequence of statements that is executed as a unit:
2427
{\bf combi\_rule} {\it name} ({\it \$param1}, {\it \$param2}, $...$): \\
2428
\qquad {\it statement1} \\
2429
\qquad {\it statement2} \\
2431
{\bf end} {\it name}; \\
2436
A rule has to begin with one of the keywords {\bf allo\_rule}, {\bf
2437
combi\_rule}, {\bf end\_rule}, {\bf pruning\_rule}, {\bf robust\_rule}, {\bf
2438
input\_filter}, {\bf output\_filter} or {\bf subrule}. It is followed by its
2439
{\em parameter list}, a list of variable names in parentheses. The variables
2440
will be assigned the parameter values when the rule is executed. The number of
2441
parameters depends on the rule type. The rule names have the following
2445
\item[``{\tt {\bf allo\_rule} ({\it \$lex\_entry\/})}'':] An allo-rule
2446
must occur exactly once in an allomorph rule file. It analyses a lexical
2447
entry and must generate one or more allomorph entries (via {\bf result}). An
2448
allomorph rule has one parameter, namely the lexicon entry.
2449
\item[``{\tt {\bf combi\_rule} ({\it \$start, \$next, \$surf, \$index\/})}'':]
2450
Any number of combi-rules may occur in a combi-rule file. Before
such a rule is processed, the next segment (either the next allomorph or the
next word form) is read. The first parameter is the Start category, the
2453
second is the Next category, the third is the Next surface, and the fourth is
2454
the Next index. The third and the fourth parameter are optional. A combi-rule
2455
may state a successor rule set or accept the analysed input (both via {\bf
2457
\item[``{\tt {\bf pruning\_rule} ({\it \$list\/})}'':] A pruning-rule may occur
2458
at most once in a syntax rule file. During syntax analysis, it can decide
2459
which states are still valid and which are to be deleted. The parameter is a
2460
list of categories of the states that have consumed the same input so far.
2461
The pruning-rule must execute a {\bf return} statement with a list of {\tt
2462
yes}- and {\tt no}-symbols. Each state in {\it \$list\/} corresponds to a
2463
symbol in the result list. If the symbol is {\tt yes}, the corresponding
2464
state is preserved. If the symbol is {\tt no}, the state is abandoned.
2465
\item[``{\tt {\bf robust\_rule} ({\it \$surface})}'':] A
robust-rule may occur at most once in a morphology rule file. If robust
2467
analysis has been switched on by the {\bf robust} command, and a word form
2468
could not be recognised by the combi-rules, the robust-rule is executed with
2469
the surface of the word form as its parameter. A robust-rule can accept the
2470
word form via {\bf result}.
2471
\item[``{\tt {\bf input\_filter} ({\it \$cat\_list\/})}'':] An input-filter may
2472
occur at most once in a syntax rule file. The input-filter is called after a
2473
word form has been analysed. It gets one parameter, namely the list of the
2474
analysis results, and it transforms it to one or more filtered results (via
2476
\item[``{\tt {\bf output\_filter} ({\it \$cat\_list\/})}'':] An output-filter
2477
may occur at most once in any rule file.
2479
\item[In allo-rule files:] The output-filter is called after all lexicon entries
have been processed by the allo-rule. The filter is called for every
2481
allomorph surface. It gets one parameter, namely the list of the generated
2482
categories with that surface, and it transforms it to one or more filtered allomorph
2483
categories (via {\bf result}).
2484
\item[In combi-rule files:] The output-filter is called after an item has
2485
been analysed. It gets one parameter, namely the list of the analysis
2486
results, and it transforms it to one or more filtered results (via {\bf
2489
\item[``{\tt {\bf subrule} ({\it \$param1}, {\it \$param2}, $...$)}'':] Any
2490
number of subrules may occur in any rule file. A subrule can be invoked from
other rules and must return a value to the calling rule via {\bf return}. It
can have any number of parameters, but at least one.
2495
If a rule is executed, all statements in the rule are processed sequentially.
After that, the rule execution is terminated. The {\bf if} statement,
the {\bf foreach} statement, and the {\bf parallel} statement may change the
processing order. Special conditions apply if:
2501
\item A condition in a test statement does not hold. In this case the
2502
processing of the rule path is terminated. This is not an error.
2503
\item The {\bf fail} statement was executed. This is a special case of case 1.
2504
\item An {\bf assert} condition does not hold. In this case the processing of
2505
the whole grammar is terminated and an error message is displayed. This rule
2506
termination can be used to find categorisation or programming flaws in the
2507
rule system or in the lexicon.
2508
\item The {\bf error} statement was executed. This is a special case of
2510
\item The {\bf return} statement was executed in a subrule or in a pruning
rule. In a subrule, this terminates the subrule in the current rule path and
immediately returns to the calling rule. In a pruning rule, this terminates
2516
%------------------------------------------------------------------------------
2518
\section{Statements}
2521
$$ Statement ::= Assert-Statement | Assignment
2522
$$ | Choose-Statement | Define-Statement
2523
$$ | Error-Statement | Fail-Statement | Foreach-Statement
2524
$$ | If-Statement | Parallel-Statement | Repeat-Statement
2525
$$ | Require-Statement | Result-Statement | Return-Statement .
2528
A rule body contains a sequence of statements.
2530
The statements are the assignment and the statements beginning with
2531
{\bf assert}, {\bf choose}, {\bf define}, {\bf error},
2532
{\bf fail}, {\bf foreach}, {\bf if}, {\bf parallel}, {\bf repeat},
2533
{\bf require}, {\bf result}, and {\bf return}.
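
As an illustration, the following sketch shows a small rule body built from
two of these statements. It is a hypothetical example, not taken from an actual
grammar: the rule name {\tt add\_ending}, the attribute name {\tt class}, and
the symbol {\tt ending} are assumptions, and comments are assumed to start
with {\tt \#}.

\begin{verbatim}
combi_rule add_ending ($start, $next):
  # Terminate this rule path silently unless the next segment is an ending.
  require $next.class = ending;
  # Accept the input with the unchanged Start category as the result.
  result $start, accept;
end combi_rule add_ending;
\end{verbatim}
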
2535
%------------------------------------------------------------------------------
2537
\subsection{The {\bf assert} Statement}
2540
$$ Assert-Statement ::= ("assert" | "!") Condition ";" .
2545
{\tt {\bf assert} {\it condition\/};}
2549
{\tt ! {\it condition\/};}
2552
tests whether {\it condition\/} holds. If this is not the case, an error
2553
message with the line number in the source code is printed and the processing
2554
of {\em all} paths is terminated.
2556
The {\bf assert} statement should be used to check whether there are structural
2557
flaws in the lexicon or the rule system.
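
A minimal sketch of both forms, assuming an allo-rule whose parameter is
called {\tt \$lex\_entry} and a hypothetical lexicon that is supposed to
contain only nouns (the attribute name {\tt class} and the symbol {\tt noun}
are made up for this example):

\begin{verbatim}
# Stop the whole run with an error if an entry is not a noun.
assert $lex_entry.class = noun;
# The same test, written in the short form.
! $lex_entry.class = noun;
\end{verbatim}
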
2559
%------------------------------------------------------------------------------
2561
\subsection{The Assignment}
2564
$$ Assignment ::= Variable {"." Value}
2565
$$ (":=" | ":=+" | ":=-" | ":=*" | ":=/") Expression ";" .
2568
To set the value of an already defined variable to a different value, use a
2569
statement of the following form:
2571
{\tt {\it \$var\/} := {\it expr\/};}
2573
The expression {\it expr\/} is evaluated and the result is assigned to the
2574
variable {\it \$var\/}. The variable must have already been defined.
2576
You can optionally specify a path behind the variable that is to be set by an
2579
{\tt {\it \$var\/}.{\it part1\/}.{\it part2\/} := {\it value\/};}
2581
In this case, only the value of ``{\tt {\it \$var\/}.{\it part1}.{\it
2582
part2}}'' will be set to {\it value\/}; the remainder of the variable
2583
{\it \$var} will be unchanged. Each {\it part\/} must be an expression that
2584
evaluates to a symbol, a number or a list of symbols and numbers.
2586
You can also use one of four other assignment operators instead of the operator
``{\tt :=}'': The statement ``{\tt {\it \$var\/} :=+ {\it value\/};}'' is a
shorthand for ``{\tt {\it \$var\/} := {\it \$var\/} + {\it value\/};}''; the
same holds analogously for the assignment operators ``{\tt :=-}'',
``{\tt :=*}'', and ``{\tt :=/}''. Here, too, {\it \$var\/} may be followed by
a path.
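
The following sketch shows the plain assignment and two of the shorthand
operators applied to a number. The variable name {\tt \$count} is hypothetical,
and the variable is first introduced by a {\bf define} statement, since an
assignment requires an already defined variable:

\begin{verbatim}
define $count := 1;
$count := 5;     # $count is now 5
$count :=+ 2;    # same as "$count := $count + 2;", $count is now 7
$count :=* 3;    # same as "$count := $count * 3;", $count is now 21
\end{verbatim}
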
2592
%------------------------------------------------------------------------------
2594
\subsection{The {\bf choose} Statement}
2597
$$ Choose-Statement ::= "choose" Variable "in" Expression ";" .
2600
The {\bf choose} statement chooses an element of a list. Its format is:
2603
{\tt {\bf choose} {\it \$var\/} {\bf in} {\it expr\/};}
2606
For every element in the list {\it expr\/} a rule path is created; in this rule
2607
path the element is stored in the variable {\it \$var\/}. Thus the number of
2608
rule paths can multiply. If, for example, {\it expr\/} has the value {\tt <A,
2609
B, C>}, the currently processed rule path has three continuations: In the
2610
first one {\it \$var\/} has the value {\tt A}, in the second one it has the
2611
value {\tt B} and in the third one it has the value {\tt C}. The three paths
2612
behave independently from now on; some may fail while others may be processed
2613
successfully, and the results can be different.
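
The branching described above can be sketched as follows. The variable name
{\tt \$letter} is hypothetical; the {\bf require} statement is only added to
show that the paths may fare differently:

\begin{verbatim}
choose $letter in <A, B, C>;
# Three rule paths continue from here: $letter is A, B, or C.
require $letter = B;
# Only the path in which $letter is B gets past the test above;
# the other two paths are terminated without an error.
\end{verbatim}
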
2615
The {\bf choose} statement can also be used for records. In that case, the
2616
variable {\it \$var\/} gets a different attribute name of the record {\it
2617
expr\/} in each path.
2619
The {\bf choose} statement also works for numbers:
2621
\item If {\it expr\/} is a positive number {\it n\/}, the variable {\it
2622
\$var\/} is assigned the numbers {\tt 1}, {\tt 2}, $...$, {\it n},
2623
respectively, in each path.
2624
\item If {\it expr\/} is a negative number {\it -n\/}, the variable {\it
2625
\$var\/} is assigned the numbers {\tt -1}, {\tt -2}, $...$, {\it -n\/},
2626
respectively, in each path.
2629
%------------------------------------------------------------------------------
2631
\subsection{The {\bf define} Statement}
2634
$$ Define-Statement ::= "define" Variable ":=" Expression ";" .
2637
A {\bf define} statement is of the form
2639
{\tt {\bf define} {\it \$var\/} := {\it expr\/};}
2642
The expression {\it expr\/} is evaluated and the result is assigned to the
variable {\it \$var\/}. The variable must not have been defined before this
statement; it is defined by the statement and exists only until the statement
sequence in which the definition is situated has been processed fully.
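
The scope rule can be sketched as follows, assuming a combi-rule with a
parameter {\tt \$start}; the variable name {\tt \$label}, the attribute name
{\tt class}, and the symbols are hypothetical:

\begin{verbatim}
if $start.class = noun then
  define $label := nominal;
  # $label exists up to the end of this statement sequence.
end if;
# $label does not exist here any more, so it may be defined anew.
define $label := other;
\end{verbatim}
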
2647
%------------------------------------------------------------------------------
2649
\subsection{The {\bf error} Statement}
2652
$$ Error-Statement ::= "error" String ";" .
2655
The statement {\bf error} terminates the execution of {\em all\/} paths and
2656
prints out a given error message string and the line of the source text.
2658
{\tt {\bf error} {\it message\/};}
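
A short sketch, assuming an allo-rule parameter {\tt \$lex\_entry}; the
attribute name {\tt class}, the symbol {\tt unknown}, and the message text are
made up for this example:

\begin{verbatim}
if $lex_entry.class = unknown then
  # Abort all paths and report the problem.
  error "lexical entry has an unknown class";
end if;
\end{verbatim}
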
2661
%------------------------------------------------------------------------------
2663
\subsection{The {\bf fail} Statement}
2666
$$ Fail-Statement ::= "fail" ";" .
2669
The {\bf fail} statement terminates the current rule path. Its format is:
2674
%------------------------------------------------------------------------------
2676
\subsection{The {\bf foreach} Statement}
2679
$$ Foreach-Statement ::= "foreach" Variable "in" Expression ":" {Statement}
2680
$$ "end" ["foreach"] ";" .
2683
You may wish to manipulate all elements of a list or a record {\it
2684
sequentially} in {\it one} rule path. For this purpose, the {\bf foreach}
2685
statement was introduced. It has the following format:
2688
{\bf foreach} {\it \$var\/} {\bf in} {\it expr\/}:\ {\it statements\/}
2693
The first, second, third, $...$ element of the list {\it expr\/} is assigned
to {\it \$var\/} in turn, and the statement sequence {\it statements\/} is
executed for each of those assignments.
2697
Each time the {\it statements\/} are executed, the variable {\it
\$var\/} is defined anew. Its scope is the block {\it statements\/}.
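
For example, the following sketch adds up the elements of a list of numbers
within a single rule path. The variable names {\tt \$sum} and {\tt \$number}
are hypothetical:

\begin{verbatim}
define $sum := 0;
foreach $number in <1, 2, 3>:
  # $number is 1, then 2, then 3.
  $sum :=+ $number;
end foreach;
# $sum is now 6.
\end{verbatim}
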
2700
The {\bf foreach} statement also works for records. In that case, the variable
2701
{\it \$var\/} is assigned the first, second, $...$ attribute name of the record
2704
The {\bf foreach} statement also works for numbers:
2706
\item If {\it expr\/} is a positive number {\it n\/}, the variable {\it
2707
\$var\/} is assigned the numbers {\tt 1}, {\tt 2}, $...$, {\it n\/}
2709
\item If {\it expr\/} is a negative number {\it -n\/}, the variable {\it
2710
\$var\/} is assigned the numbers {\tt -1}, {\tt -2}, $...$, {\it -n\/}
2714
%------------------------------------------------------------------------------
2716
\subsection{The {\bf if} Statement}
2719
$$ If-Statement ::= "if" Condition "then" {Statement}
2720
$$ {"elseif" Condition "then" {Statement}}
2721
$$ ["else" {Statement}] "end" ["if"] ";" .
2724
An {\bf if} statement has the following form:
2727
\begin{tabular}{llll}
2728
{\bf if} & {\it condition1} & {\bf then} & {\it statements1} \\
2729
{\bf elseif} & {\it condition2} & {\bf then} & {\it statements2} \\
2730
{\bf else} & & & {\it statements3} \\
2736
The second line may be repeated any number of times (including zero); the
third line may be omitted.
2739
Firstly, {\it condition1} is evaluated. If it is satisfied, the
2740
statement sequence {\it statements1} is executed.
2742
If the first condition is not satisfied, {\it condition2} is evaluated; if
2743
the result is true, {\it statements2} is executed. This procedure is
2744
repeated for every {\bf elseif} part until a condition is satisfied.
2746
If neither the {\bf if} condition nor any of the {\bf elseif} conditions is
satisfied, the statement sequence {\it statements3} is executed (if it exists).
2749
After the {\bf if} statement has been processed the next statement is executed.
2751
The {\bf if} after the {\bf end} may be omitted.
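
A small sketch with all three parts, assuming a previously defined numeric
variable {\tt \$count}; the variable {\tt \$form} and the symbols are
hypothetical:

\begin{verbatim}
define $form := nil;
if $count = 1 then
  $form := singular;
elseif $count = 2 then
  $form := dual;
else
  $form := plural;
end if;
\end{verbatim}
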
2753
%------------------------------------------------------------------------------
2755
\subsection{The {\bf parallel} Statement}
2758
$$ Parallel-Statement ::= "parallel" {Statement} {"and" {Statement}}
2759
$$ "end" ["parallel"] ";" .
2762
Using the {\bf parallel} statement more than one continuation of an
2763
analysis can be generated. Its format is:
2767
{\bf parallel} & {\it statements1}\\
2768
{\bf and} & {\it statements2}\\
2769
{\bf and} & {\it statements3}\\
2771
{\bf end parallel};\\
2776
This creates as many rule paths as there are statement sequences. In the first
2777
rule path, {\it statements1} are executed, in the second one {\it statements2}
2778
are executed, etc. Each rule path continues by executing the statements
2779
following the {\bf parallel} statement.
2781
The keyword {\bf parallel} after the {\bf end} can be omitted.
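
The following sketch creates two rule paths that differ only in the value of
one variable. The variable name {\tt \$gender} and the symbols are
hypothetical:

\begin{verbatim}
define $gender := nil;
parallel
  $gender := masculine;
and
  $gender := feminine;
end parallel;
# Two rule paths continue here, one for each value of $gender.
\end{verbatim}
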
2783
%------------------------------------------------------------------------------
2785
\subsection{The {\bf repeat} Statement}
2788
$$ Repeat-Statement ::= "repeat" {Statement} "while" Condition ";" {Statement}
$$ "end" ["while"] ";" .
2792
You may wish to repeat a sequence of statements while a specific condition
2793
holds. This can be realised by the {\bf repeat} loop. It has the following form:
2798
{\bf while} {\it condition\/} ;\\
2804
The statements {\it statements1\/} are executed. Then, {\it condition\/}
2805
is tested. If it holds, the {\it statements2\/} are
2806
executed and the {\bf repeat} statement is executed again. If {\it condition\/}
2807
does not hold, execution proceeds after the {\bf repeat} statement.
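
The order of execution can be traced with the following sketch, which adds up
the numbers 1 to 3. The variable names are hypothetical, the inequality
operator {\tt /=} is assumed to be available as described in the section on
conditions, and the optional keyword after {\bf end} is left out:

\begin{verbatim}
define $n := 1;
define $sum := 0;
repeat
  $sum :=+ $n;      # executed before each test
while $n /= 3;
  $n :=+ 1;         # executed only if the test has succeeded
end;
# $sum is now 6 and $n is 3.
\end{verbatim}
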
2809
%------------------------------------------------------------------------------
2811
\subsection{The {\bf require} Statement}
2814
$$ Require-Statement ::= ("require" | "?") Condition ";" .
2817
A statement of the form
2819
{\tt {\bf require} {\it condition\/};}
2823
{\tt ?\ {\it condition\/};}
2825
tests whether {\it condition\/} is true. If this is not the case, the rule path
is terminated {\em without\/} an error message. Test statements should be used
to decide whether the word start (or sentence start) read so far is grammatical
according to the interpretation of the rule path.
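
A typical use is an agreement test in a combi-rule. The sketch below assumes
the parameters {\tt \$start} and {\tt \$next} and a hypothetical attribute
{\tt number}:

\begin{verbatim}
# Continue this rule path only if both categories agree in number.
require $start.number = $next.number;
# The same test, written in the short form.
? $start.number = $next.number;
\end{verbatim}
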
2830
%------------------------------------------------------------------------------
2832
\subsection{The {\bf result} Statement}
2835
$$ Result-Statement ::= "result" Expression ["," (Rule-Set | "accept")] ";" .
2839
\item[In combi rules:] The statement
2843
{\bf result} {\it expr\/},\\
2844
{\bf rules} {\it rule1}, {\it rule2}, $...$;
2848
specifies the Result category of the rule and the successor rules. The value
{\it expr\/} is the Result category. After the keyword {\bf rules}, the names
of all successor rules are enumerated. For every successor rule that is
executed, a new rule path is created. The rule set may be enclosed in
2854
If you want successor rules to be executed only if no other rule has been
2855
successful, you can put their names behind the other rules' names and write an
2856
{\bf else} in front of them:
2859
{\bf rules} {\it rule1}, {\it rule2}
2860
{\bf else} {\it rule3}, {\it rule4} {\bf else} $...$;
2863
If none of the normal rules (here: {\it rule1} and {\it rule2}) has been
successful, {\it rule3} and {\it rule4} are executed. If these rules also fail,
the next rules are executed, and so on. A rule has been successful if it has
executed at least one {\bf result} statement.
2868
\item[In combi-rules and end-rules:]
2869
If the input is to be accepted by the {\bf result} statement (and therefore no
successor rules are to be called), the following format has to be used:
2871
{\tt {\bf result} {\it expr\/}, {\bf accept};}
2873
If this statement is reached in a rule path, the input is accepted as
2874
grammatically well-formed. The value {\it expr\/} is returned as the result of
2875
the morphological or syntactic analysis.
2877
\item[In filters and robust-rules:] The format of a {\bf result} statement
in a filter or robust-rule is:
2880
{\tt {\bf result} {\it expr\/};}
2882
If this statement is reached, the value {\it expr\/} is used as a result of the
2885
\item[In allo-rules:] The format of the {\bf result} statement in an allo-rule is:
2888
{\tt {\bf result} {\it surface\/}, {\it category\/};}
2890
It creates an entry in the allomorph lexicon. The allomorph surface
2891
{\it surface\/} must be a string; {\it category\/} is the categorical
2892
information of the allomorph.
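
The two combi-rule formats described above can be sketched as follows. The
rule names {\tt noun\_ending}, {\tt verb\_ending}, and {\tt unknown\_ending}
are hypothetical, and {\tt \$start} is assumed to be the first rule parameter:

\begin{verbatim}
# Continue with two successor rules; unknown_ending is tried only
# if neither of them has been successful.
result $start, rules noun_ending, verb_ending else unknown_ending;

# Accept the input with $start as the analysis result.
result $start, accept;
\end{verbatim}
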
2896
%------------------------------------------------------------------------------
2898
\subsection{The {\bf return} Statement}
2901
$$ Return-Statement ::= "return" Expression ";" .
2904
In a subrule, the {\bf return} statement is of the form
2907
{\tt {\bf return} {\it expr\/};}
2909
The value of {\it expr} is returned to the rule that invoked this subrule and
2910
the subrule execution is finished.
2912
In a pruning rule, the {\bf return} statement is of the same form. Here, {\it
expr\/} must be a list of {\tt yes}- and {\tt no}-symbols. Each state
2914
in the category list, which is the pruning rule parameter, corresponds to a
2915
symbol in the result list. If the symbol is {\tt yes}, the corresponding state
2916
is preserved. If the symbol is {\tt no}, the state is abandoned.
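
Both uses can be sketched as follows. The rule names, the variable names, and
the attribute {\tt class} are hypothetical. The sketch also assumes that
{\tt <>} denotes the empty list, that {\tt +} concatenates lists, and that the
categories handed to the pruning rule are records:

\begin{verbatim}
# A subrule that returns twice its argument.
subrule double ($number):
  return $number * 2;
end subrule double;

# A pruning rule that keeps only the states whose category is a noun.
pruning_rule keep_nouns ($list):
  define $answers := <>;
  foreach $category in $list:
    if $category.class = noun then
      $answers :=+ <yes>;
    else
      $answers :=+ <no>;
    end if;
  end foreach;
  return $answers;
end pruning_rule keep_nouns;
\end{verbatim}
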
2918
%------------------------------------------------------------------------------
2922
A Malaga grammar system comprises several files: a symbol file, a lexicon file,
2923
an allomorph rule file, a morphology rule file, an extended symbol file
2924
(optional), and a syntax rule file (optional). The type of a file is
indicated by the suffix of its file name. A grammar for the English language may
2926
consist of the files ``{\tt english.sym}'', ``{\tt english.lex}'', ``{\tt
2927
english.all}'', ``{\tt english.mor}'' and ``{\tt english.syn}''.
2929
%------------------------------------------------------------------------------
2931
\subsection{The Symbol File}
2934
$$ Symbol-File ::= {Symbol-Definition | Include} .
2937
A symbol file has the suffix ``{\tt .sym}''. It contains the symbol table.
2939
%------------------------------------------------------------------------------
2941
\subsection{The Extended Symbol File}
2944
$$ Extended-Symbol-File ::= Symbol-File .
2947
An extended symbol file has the suffix ``{\tt .esym}''. It contains an
additional symbol table whose symbols may only be used in the
2951
%------------------------------------------------------------------------------
2953
\subsection{The Lexicon File}
2956
$$ Lexicon-File ::= {Constant-Definition | Constant-Expression ";"} .
2959
A lexicon file has the suffix ``{\tt .lex}''. It consists of any number of
2960
values and constant definitions, each terminated by a semicolon. Each value
stands for a lexical entry. A value may contain named constants and the
operators ``.'', ``+'', ``-'', ``*'', and ``/''.
The format of the lexical entries is free, although it should be consistent
with the conception of the whole rule system.
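
A lexicon file might look like the following sketch. The entries are invented
for illustration; the attribute names, the constant name
{\tt @noun\_defaults}, and the record notation in square brackets are
assumptions that follow the value syntax described elsewhere in this manual:

\begin{verbatim}
# Attributes shared by the entries below.
define @noun_defaults := [class: noun, number: singular];

# Two lexical entries, each terminated by a semicolon.
[surface: "house"] + @noun_defaults;
[surface: "tree"] + @noun_defaults;
\end{verbatim}
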
2966
%------------------------------------------------------------------------------
2968
\subsection{The Allomorph Rule File}
2971
$$ Rule-File ::= {Rule | Constant-Definition | Initial | Include} .
2972
$$ Allomorph-Rule-File ::= Rule-File .
2975
The allomorph lexicon is generated from the base form lexicon by applying the
allo-rule to the base form entries. The allomorph generation rule file has
2977
the suffix ``{\tt .all}'' and consists of one allo-rule, an optional
2978
output-filter, and any number of subrules and constant definitions.
2980
For every lexical entry, the allo-rule is executed with the value of the
2981
lexicon entry as parameter. The allo-rule can generate allomorphs using the
2982
{\bf result} statement.
2984
After all allomorphs have been produced, the output-filter is executed once for
2985
each surface in the (intermediate) allomorph lexicon. As parameter, the
2986
output-filter gets the list of categories that share that surface. An entry in
the final allomorph lexicon is created every time the {\bf result} statement is
executed. The surface cannot be changed by the output-filter.
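
A minimal allomorph rule file could consist of a single allo-rule like the one
sketched below. It assumes that every lexical entry is a record with a string
attribute {\tt surface}; the rule name {\tt copy\_entry} is hypothetical:

\begin{verbatim}
allo_rule copy_entry ($lex_entry):
  # Create one allomorph per lexical entry: the surface is taken from
  # the entry, and the entry itself serves as the category.
  result $lex_entry.surface, $lex_entry;
end allo_rule copy_entry;
\end{verbatim}
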
2990
%------------------------------------------------------------------------------
2992
\subsection{The Combi-Rule Files}
2995
$$ Combi-Rule-File ::= Rule-File .
2998
A grammar system includes up to two combination rule files: one for
2999
morphological combination with the suffix ``{\tt .mor}'' and (optionally) one
3000
for syntactic combination with the suffix ``{\tt .syn}''.
3002
A combination rule file consists of an initial state and any number of
3003
combi-rules, subrules, and constant definitions. A syntax rule
3004
file may contain one optional pruning-rule, one optional input-filter and one
3005
optional output-filter; a morphology rule file may contain
3006
one optional robust-rule and one optional output-filter.
3008
Beginning with the rules listed in the initial state, the rules and
3009
their successors are processed until a {\bf result} statement with the
3010
keyword {\bf accept} is encountered in every path. A path dies if there is no
3011
more input (from the lexicon or from the morphology) that can be processed.
3013
In morphology, if analysis has created no result and robust analysis has been
3014
switched on, the robust-rule will be called with the analysis surface and can
3017
In syntax, when a new word form has been imported from morphology, the
3018
input-filter can take a look at its categories and create new result
3021
In syntax, if a pruning-rule is present and pruning has been activated, the
3022
concatenation of the next word form is preceded by the following step: The
3023
categories of all current LAG states are merged into a list, which is the
3024
parameter of the pruning rule. The pruning-rule must execute a {\bf return}
3025
statement with a list of {\tt yes}- and {\tt no}-symbols. Each state in the
3026
category list corresponds to a symbol in the result list. If the symbol is {\tt
3027
yes}, the corresponding state is preserved. If the symbol is {\tt no}, the
3030
After analysis, the output-filter can take a look at all result categories and
3031
create new result categories.
3035
% end of file =================================================================