@setfilename malaga.info

@c Copyright (C) 1995 Bjoern Beutel.

@dircategory Malaga - a Natural Language Analysis System

* Malaga: (malaga). A Grammar Development Environment for Natural Languages.
* malaga: (malaga)malaga. Analyse words/sentences using a Malaga grammar.
* mallex: (malaga)mallex. Run allomorph rules on lexicon entries.
* malmake: (malaga)malmake. Compile all files of a Malaga grammar.
* malrul: (malaga)malrul. Compile a rule file of a Malaga grammar.
* malsym: (malaga)malsym. Compile a symbol file of a Malaga grammar.

@c ----------------------------------------------------------------------------
@subtitle User's and Programmer's Manual

@vskip 0pt plus 1 filll
Copyright @copyright{} 1995 Bj@"orn Beutel.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Free Software Foundation.

@c ----------------------------------------------------------------------------

@node Top, Contents, (dir), (dir)

@node Top, Introduction, (dir), (dir)

@majorheading Malaga 6.13

This is the documentation for Malaga, a software package for the
development and application of grammars that are used for the analysis
of words and sentences of natural languages.

Copyright (C) 1995 Bjoern Beutel.

* Contents:: Table of Contents.
* Introduction:: What is Malaga?
* Formalism:: The grammar formalism used by Malaga.
* The Programs:: Invoking @code{malaga} and its friends.
* Commands:: Interactive commands for @code{malaga} and @code{mallex}.
* Options:: Interactive options for @code{malaga} and @code{mallex}.
* The Language:: Definition of the Programming Language Malaga.

@c ----------------------------------------------------------------------------

@node Contents, Introduction, Top, Top

@c ----------------------------------------------------------------------------

@node Introduction, Formalism, Contents, Top

@node Introduction, Formalism, Top, Top

@chapter Introduction
The name ``Malaga'' is used with two different meanings: on the one
hand, it is the name of a special-purpose programming language, namely a
language for implementing grammars for natural languages. On the other hand,
it is the name of a program package for developing Malaga grammars
and testing them by analysing words and sentences. ``Malaga'' is an
acronym for ``@b{M}erely @b{a} @b{L}anguage-@b{A}nalysing @b{G}rammar

The program package ``Malaga'' has been developed by Bj@"orn Beutel.
@cindex Beutel, Bj@"orn

@cindex Sch@"uller, Gerald
Gerald Sch@"uller
has implemented parts of the original debugger, the original
Emacs Malaga mode and the original Tree and Variable output.

So far, morphology grammars for several natural languages have been
developed with Malaga, including German, Italian, English,
Spanish, Albanian and Korean.

@c ----------------------------------------------------------------------------

@node Formalism, The Programs, Introduction, Top
@chapter Malaga's Grammar Formalism

A formal grammar for a natural language can be used to check whether a
sentence or a word form is grammatically well-formed (a word form is a
special inflectional form of a word, so ``book'' and ``books'' are two
different word forms of the word ``book''). Furthermore, a grammar can
describe the structure and meaning of a sentence or a word form by a
data structure that is constructed during the analysis process.

Malaga uses a formalism that is derived from Left-Associative
Grammar (LAG), which has been developed by Roland Hausser. An LAG
analyses a sentence (or a word form) step by step:
its parts are concatenated from left to right, hence the name
``Left-Associative Grammar''. A single LAG rule can only join two
parts into a bigger one: it concatenates the state part (which is the
beginning of the sentence or word form that has already been analysed)
and the link part (which is the next word form or the next
allomorph). In contrast to LAG, Malaga's formalism already reads in
the first part of a word form or of a sentence by applying a rule.
Take a look at the following sentence:

@example
Shakespeare liked writing comedies.
@end example

The sentence is analysed by five rule applications:

``'' + ``Shakespeare'' @*
``Shakespeare'' + ``liked'' @*
``Shakespeare liked'' + ``writing'' @*
``Shakespeare liked writing'' + ``comedies'' @*
``Shakespeare liked writing comedies'' + ``.'' @*
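
The five rule applications listed above can be reproduced by a small
sketch. It is written in Python, not in Malaga, and it only models the
surfaces; the function name @code{left_associative_steps} is made up for
this illustration and is not part of the Malaga package.

```python
def left_associative_steps(words):
    """Return the (state, link) pairs combined in a left-to-right analysis."""
    steps = []
    state = ""  # the initial state has an empty surface
    for link in words:
        steps.append((state, link))
        # Concatenate state and link. A real rule would also build
        # the feature structure of the successor state here.
        state = link if not state else state + " " + link
    return steps


steps = left_associative_steps(
    ["Shakespeare", "liked", "writing", "comedies", "."])
for state, link in steps:
    print(repr(state), "+", repr(link))
```

Each pair consists of the state surface analysed so far and the link that
is added next, so the first pair has an empty state.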

To apply a rule, it is not sufficient to know the spelling of a word or an
allomorph. A rule also requires morphological and syntactic information, such
as word class, gender, meaning of a suffix etc. This information, which is
associated with an element of an utterance, like a sentence, a word form or an
allomorph, is called its @dfn{feature structure}. The analysis of a sentence or
a word form returns such a feature structure as result.

Now let us take a closer look at how a sentence is analysed.

@enumerate
@item
Before we can start to analyse a sentence, the analysis automaton must be
in an @dfn{initial state}. The initial state includes:

@itemize
@item
a feature structure for that state, and
@item
the @dfn{combination rule} checking whether it is allowed to start with a
specific word form. This rule also builds the feature structure of the
successor state (whose surface consists of the first word form).
@end itemize

@item
The next word form that is going to be added is read and analysed
morphologically. If there is no valid word form, the analysis process

The feature structure that morphology assigns to this word form is called the
link's feature structure. The feature structure of the input that has been
analysed syntactically so far is called the state's feature structure.

@item
The active combination rule checks whether it is allowed to combine the state's
surface (which may be empty if the rule is operating on the initial state) with
the link, i.e., the next word form. The combination rule takes the feature
structures of the state and of the link as parameters. They can be compared by
logical tests, and finally the rule constructs the feature structure of the
successor state (whose surface includes the word form that has been read).
The rule also specifies which @dfn{successor rule} is active in the successor
state. Execution then continues at step 2.

Instead of specifying a successor rule, a rule can also @emph{accept} the
analysed sentence. In that case, the feature structure of the successor state
will be used as the feature structure of the complete analysed sentence.
@end enumerate

Morphological analysis operates analogously, except that a word form,
composed of allomorphs, is analysed. The link (step
2) is found in the allomorph lexicon.

This sketch is of course simplified. There can be ambiguities in an
analysis, induced by several causes:

@itemize
@item
The initial state may contain several rules to analyse the first word
@item
A rule may have multiple successor rules.
@item
In morphology, the continuation of the input may match several trie entries.
@item
In syntax analysis, the link may be assigned several feature structures
@end itemize

Malaga handles these ambiguities by dividing the analysis into several
subanalyses: if there are two lexicon entries for a word form, for example, the
analysis continues using the first entry (and its feature structure) as well as
the second one. You can compare this with a branching path. The analyses are
continued independently of each other, so one analysis path can accept the
input while the other fails. Each analysis path can divide repeatedly when
other ambiguities are met. If several analysis paths are continued until they
accept, the analysis process returns more than one result.
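
The branching of analysis paths can be illustrated by a minimal Python
sketch. Everything in it is an assumption made for illustration: the toy
lexicon, the two rules @code{start} and @code{predicate}, and the
convention that an accepting rule marks its result with a
@code{sentence} attribute. It does not show how Malaga itself is
implemented.

```python
# Hypothetical toy lexicon: a word form may have several feature structures.
LEXICON = {
    "they": [{"class": "pronoun"}],
    "fish": [{"class": "noun"}, {"class": "verb"}],
}


def start(state, link):
    # Only a pronoun may start the input; "predicate" is the successor rule.
    if link["class"] == "pronoun":
        return {"subject": True}, ["predicate"]
    return None  # the rule fails for this link


def predicate(state, link):
    # Combine a subject with a verb and accept (no successor rule).
    if state.get("subject") and link["class"] == "verb":
        return {"sentence": True}, []
    return None


RULES = {"start": start, "predicate": predicate}


def analyse(words):
    """Return the feature structures of all analysis paths that accept."""
    paths = [({}, ["start"])]  # initial state: empty structure, one rule
    for word in words:
        next_paths = []
        for state, rule_names in paths:
            for name in rule_names:                 # several rules: branch
                for link in LEXICON.get(word, []):  # several readings: branch
                    result = RULES[name](state, link)
                    if result is not None:          # failed paths are dropped
                        next_paths.append(result)
        paths = next_paths
    return [state for state, _ in paths if state.get("sentence")]


results = analyse(["they", "fish"])
```

The word form ``fish'' has two readings here, so the analysis divides
into two paths after ``they''; the noun path fails and the verb path
accepts.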

@c ----------------------------------------------------------------------------

@node The Programs, Commands, Formalism, Top
@chapter The Malaga Programs

The Malaga programs are all started in a similar manner: you either give
the name of a @dfn{project file} as argument (this is not possible if
you start @code{malrul} or @code{malsym}), or you give the names of the
files that are needed by the program (this is not possible for
@code{malmake} and @code{malaga}, which require the project file as
argument). The file type is recognised by the file name ending.

Assume you've written a grammar that consists of a symbol file
@file{english.sym}, an allomorph rule file @file{english.all}, a lexicon
file @file{english.lex} and a morphology rule file @file{english.mor},
and you have also written a project file @file{english.pro}. You first
have to create binary files from these files:

The binary files have the same name as their source counterparts, but have a
@file{_l} (for little endian processors like x86), a @file{_b}
(for big endian processors like HPPA) or a @file{_c} (for other architectures)
appended. Now you can start the program @code{malaga} by entering
the following command line: @code{malaga english.pro}.
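
The naming scheme for binary files can be sketched in Python as follows.
The helper @code{binary_name} is hypothetical; how the Malaga programs
determine the architecture internally is not described here, so the
sketch simply relies on Python's @code{sys.byteorder}.

```python
import sys


def binary_name(source_name, byteorder=sys.byteorder):
    """Return the name of the compiled file for a grammar source file."""
    # "_l" for little endian, "_b" for big endian, "_c" for anything else.
    suffix = {"little": "_l", "big": "_b"}.get(byteorder, "_c")
    return source_name + suffix


print(binary_name("english.sym"))
```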

The names of the grammar files will be read from the project file.

If you want to know about the command line arguments of a Malaga
program, you can get help by using the option @samp{-help} or
@samp{-h}, like @code{mallex -help}.
@cindex @code{help} (command line option)

If you just want to know which version of a Malaga program you are using, you
can get the version number by using the option @samp{-version} or
@samp{-v}, like @code{malrul -version}.
@cindex @code{version} (command line option)

The program just emits a few lines with information about its version number
and about using and copying it.

* Projects:: Describing the parts of a Malaga grammar.
* Profiles:: Settings for @code{malaga}, @code{mallex} and @code{malshow}.
* malaga:: Analysing words and sentences.
* mallex:: Generating and debugging the allomorph lexicon.
* malmake:: Controlling the compilation of a Malaga grammar.
* malrul:: Compiling a Malaga rule file.
* malsym:: Compiling a Malaga symbol file.

@c ----------------------------------------------------------------------------

@node Projects, Profiles, The Programs, The Programs

Several files, taken together, form a Malaga grammar:

@item The @dfn{lexicon file} (@file{.lex})
A lexicon of base forms.

@item The @dfn{prelex file} (@file{.prelex}, optional)
A precompiled lexicon in binary format.

@item The @dfn{allomorph rule file} (@file{.all})
A file with rules which generate the allomorphs of the base forms.

@item The @dfn{morphology rule file} (@file{.mor})
A file with rules which combine allomorphs to word forms.

@item The @dfn{symbol file} (@file{.sym})
A file with the symbols that may be used in rules and feature structures.

@item The @dfn{syntax rule file} (@file{.syn}, optional)
A file with rules that combine word forms to sentences.

@item The @dfn{extended symbol file} (@file{.esym}, optional)
A file with additional symbols that may only be used in a syntax rule file.

You can group these files together into a @dfn{project}. To do this, you
have to write a project file, with a name ending in @file{.pro}, in
which you list the names of the files, each one after a keyword
(each file type on a line of its own). Imagine you have written a
grammar that consists of the files @file{standard.sym},
@file{webster.lex}, @file{english.all}, @file{english.mor}, and
@file{english.syn}. The project file for this grammar will look like

In your source files, you can include further source files by using the
@code{include} statement, so a binary file of your grammar may depend on
several source files. The program @code{malmake} uses the information in the
project file to check for dependencies between source files and binaries, so
the project file must contain the names of all source files for a specific
binary. Relative path names are always relative to the directory of

Assume you've got a lexicon file @file{webster.lex} that

@example
include "suffixes.lex";
include "verbs.lex";
include "adjectives.lex";
include "nouns.lex";
include "particles.lex";
include "abbreviations.lex";
include "names.lex";
include "numbers.lex";
@end example

In this case, you must write the names of all these files in the @samp{lex:}
line of your project file after the name of the real lexicon file:

@example
lex: webster.lex suffixes.lex verbs.lex adjectives.lex
lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex
@end example

Since there are many files in this example, the @samp{lex:} line has
been divided into two lines, each line starting with @samp{lex:}.
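
The way repeated keyword lines add up can be sketched in Python. The
function @code{parse_project} is a hypothetical helper written for this
illustration only; it treats everything after the colon as a list of
file names and therefore does not handle @samp{malaga:}, @samp{mallex:}
or @samp{info:} lines properly.

```python
def parse_project(text):
    """Collect project file entries, keyed by the keyword before the colon."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(":")
        # Repeated keywords (e.g. two "lex:" lines) extend the same list.
        entries.setdefault(keyword, []).extend(rest.split())
    return entries


project = parse_project("""\
lex: webster.lex suffixes.lex verbs.lex adjectives.lex
lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex
""")
print(project["lex"])
```

Both @samp{lex:} lines end up in one list, with the real lexicon file
@file{webster.lex} first.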

If you want to extend an existing project (for example, you might want to add a
syntax rule file to a morphology grammar), you can include the project file of
the morphology grammar in the project file of your syntax grammar by using a
line starting with @samp{include:}:

@example
include: /projects/grammars/english/english.pro
syn: english-syntax.syn
@end example

The file entries in the project file of the morphology are treated as
if they replaced the @samp{include:} line. Relative paths in the
included file are relative to the @emph{included} directory, not the
@emph{including} directory.

The programs @code{malaga} and @code{mallex} can set options like
@code{hidden} or @code{robust} from the project file, so you do not need
to set these options each time you start @code{malaga}. Each line in the
project file that starts with @samp{malaga:} or @samp{mallex:}
is executed when @code{malaga} or @code{mallex},
respectively, is started. You may only use the @code{set}
command in such lines, so you can only set options in the project file. Here is an
example:

@example
malaga: set hidden +semantics
malaga: set robust-rule on
mallex: set hidden +semantics +syntax
@end example

When you start @code{malaga}, the commands @code{set hidden +semantics} and
@code{set robust-rule on} will be executed; when you start @code{mallex}, the
command @code{set hidden +semantics +syntax} will be executed.

Options in project files that are read in by @samp{include:} lines in other
project files will be executed as if they were in place of the
@samp{include:} line.

Lines in project files that start with @samp{info:} contain information
about the grammar. In @code{malaga}, you get this information if you use the
command @code{info}. Example:

@example
info: =====================================
info: Deutsche Malaga Morphologie 3.0
info: written by Oliver Lorenz, 11.04.1997
info: =====================================
@end example

@cindex character set
The @code{malshow} display program normally assumes that the character
set is @samp{iso8859-1}. If your grammar uses a different character set,
insert the name of the character set into your project file:

The Korean writing system, Hangul, needs special treatment, because the
characters it uses are syllables that must be split up into individual letters
for morphological analysis. Such a conversion is built into @code{malaga}. To
activate it, insert the following line into your project file:

@example
char-set: hangul
@end example

@c ----------------------------------------------------------------------------

@node Profiles, malaga, Projects, The Programs
@section The Malaga Profiles @file{.malagarc} or @file{malaga.ini}
@cindex @code{.malagarc} (file)
@cindex @code{malaga.ini} (file)

If there are options that you want to use with every Malaga
project, you may create a personal startup file. On Unix systems, it is
located in your home directory and is called @file{.malagarc}. On Windows
NT based systems, it is located in your user profile directory and is
called @file{malaga.ini}. On Windows DOS based systems, it is located in
the root directory of your system drive and is also called
@file{malaga.ini}. You can enter @code{malaga}
and @code{mallex} options in the same manner as you do in the project
file:

@example
malaga: set display-line "malshow"
malaga: set use-display yes
mallex: set display-line "malshow"
mallex: set use-display yes
@end example

The settings in your personal startup file override the settings in the project
file.

You can set some attributes of the graphical user interface, namely the
position, the size, and the font size of each window that is part of the user
interface. Here is an example which sets every option available:

@cindex window geometry

@example
allomorphs_geometry: 628x480+640+0
path_geometry: 628x480+640+0
result_geometry: 628x480+640+0
tree_geometry: 628x480+640+512
variables_geometry: 628x480+640+512
expressions_geometry: 628x480+640+0
@end example

The geometry defines the size and/or position of each window. The first
two numbers (@samp{628x480}) define the width and the height of the
window in pixels, the last two numbers (@samp{+640+512}) define the
position of its upper left corner. The available font sizes are 8, 10,
12, 14, 18, and 24 pixels.
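
The meaning of the four numbers in a geometry value can be made concrete
with a short Python sketch. The function @code{parse_geometry} is
hypothetical and only accepts the complete form with both size and
position; values that give only one of the two are not covered.

```python
import re

# WIDTHxHEIGHT+X+Y, all four parts given as non-negative integers.
GEOMETRY = re.compile(r"(\d+)x(\d+)\+(\d+)\+(\d+)")


def parse_geometry(value):
    """Split a 'WIDTHxHEIGHT+X+Y' string into four integers."""
    match = GEOMETRY.fullmatch(value)
    if match is None:
        raise ValueError("malformed geometry: " + value)
    width, height, x, y = map(int, match.groups())
    return width, height, x, y


print(parse_geometry("628x480+640+512"))
```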

@c ----------------------------------------------------------------------------

@node malaga, mallex, Profiles, The Programs
@section The Program @code{malaga}
@cindex @code{malaga} (program)

The program @code{malaga} is the user interface for analysing word forms and
sentences, displaying the results and finding bugs in a grammar. Start
@code{malaga} with the name of a project file as argument:

When @code{malaga} has been started, it loads the symbol file, the lexicon file,
the morphology rule file, and the syntax rule file, if there is one. After
loading, the @dfn{prompt} appears. Then @code{malaga} is ready to execute your

@example
This is malaga, version 6.13.
Copyright (C) 1995 Bjoern Beutel.
This program is part of Malaga, a system for Natural Language Analysis.
You can distribute it under the terms of the GNU General Public License.
@end example

You can now enter any @code{malaga} command. If you are not sure about
the name of a command, use the command @code{help} to get an overview of
all @code{malaga} commands.

If you want to quit @code{malaga}, enter the command @code{quit}.

You can use the following command line options when you start @code{malaga}:

@item @samp{-morphology} or @samp{-m}
@cindex @code{morphology} (command line option)
Starts @code{malaga} in @dfn{morphology mode}. That is, word forms are
read in from the standard input stream and analysed (one word form
per line). The analysis result is written to the standard output
stream.

@item @samp{-syntax} or @samp{-s}
@cindex @code{syntax} (command line option)
Starts @code{malaga} in @dfn{syntax mode}. That is, sentences are
read in from the standard input stream and analysed (one sentence per
line). The analysis result is written to the standard output
stream.

@item @samp{-quoted} or @samp{-q}
@cindex @code{quoted} (command line option)
When @code{malaga} has been started in syntax or morphology mode, and the
option @samp{-quoted} has been used, then each input line must be enclosed in
double quotes which are removed prior to analysis. Within the double quotes
there may be any combination of printable characters except the backslash
@samp{\} and the double quotes. These characters must be preceded by a @samp{\}.

@item @samp{-input} or @samp{-i}
@cindex @code{input} (command line option)
Starts @code{malaga} in @dfn{argument analysis mode}. That is, the
argument following the @samp{-input} is analysed. Either the
@samp{-morphology} or the @samp{-syntax} option must also be
given. The analysis result is pretty-printed to the standard
output stream.
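
The quoting convention of the @samp{-quoted} option can be sketched in
Python. The function @code{unquote} is a hypothetical illustration of
the rule stated above (enclosing double quotes are removed, and a
backslash makes the following character literal); the exact behaviour of
@code{malaga} on malformed lines is not specified here, so the sketch
simply raises an error.

```python
def unquote(line):
    """Strip the enclosing double quotes and resolve backslash escapes."""
    if len(line) < 2 or not (line.startswith('"') and line.endswith('"')):
        raise ValueError("input line must be enclosed in double quotes")
    result = []
    chars = iter(line[1:-1])
    for char in chars:
        if char == "\\":
            # A backslash makes the next character literal.
            char = next(chars, "")
        result.append(char)
    return "".join(result)


print(unquote('"Shakespeare liked writing comedies."'))
```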

@c ----------------------------------------------------------------------------

@node mallex, malmake, malaga, The Programs
@section The Program @code{mallex}
@cindex @code{mallex} (program)

By using @code{mallex}, you can make the allomorph rules process the entries of

You can start @code{mallex} either with the name of a project file or with the
names of the needed grammar files:

@example
mallex english.sym english.all english.lex
@end example

If you are not using a project file, you must give

@itemize
@item
the name of the symbol file (@file{.sym}),
@item
the name of the allomorph rule file (@file{.all}),
@item
the name of the lexicon file (@file{.lex}, in batch mode), and
@item
the name of the prelex file (@file{.prelex}, in batch mode, optional).
@end itemize

Normally, @code{mallex} runs interactively: it loads the symbol file and the
allomorph rule file. Then the @dfn{prompt} appears:

@example
This is mallex, version 6.13.
Copyright (C) 1995 Bjoern Beutel.
This program is part of Malaga, a system for Natural Language Analysis.
You can distribute it under the terms of the GNU General Public License.
@end example

You can now enter any @code{mallex} command. If you do not remember the command
names, you can use the command @code{help} to see an overview of the
@code{mallex} commands.

If you want to quit @code{mallex}, enter the command @code{quit}.

If you have started @code{mallex} by using the option @samp{-binary}
or @samp{-b}, it creates the run-time lexicon file from the base form
lexicon file and the optional prelex file. If the lexicons are very
big or the allomorph rules are very complex, this can take some
time. After creation, @code{mallex} exits.

If you have started @code{mallex} by using the option @samp{-prelex}
or @samp{-p}, it creates a precompiled lexicon file from the source
lexicon file and the optional prelex file and exits.

You can use the following command line options when you start

@item @samp{-binary} or @samp{-b}
@cindex @code{binary} (command line option)
Runs @code{mallex} in batch mode and creates the run-time lexicon.

@item @samp{-readable} or @samp{-r}
@cindex @code{readable} (command line option)
Runs @code{mallex} in batch mode and outputs the allomorph lexicon in
readable form on the standard output stream.

@item @samp{-prelex} or @samp{-p}
@cindex @code{prelex} (command line option)
Runs @code{mallex} in batch mode, but doesn't apply the allomorph filter yet.
Outputs the allomorph lexicon as a @file{.prelex} binary file.

@c ----------------------------------------------------------------------------

@node malmake, malrul, mallex, The Programs
@section The Program @code{malmake}
@cindex @code{malmake} (program)

The program @code{malmake} reads a project file, checks that all required
grammar files exist, and translates all grammar files that have not
yet been translated or whose source files have changed since they were
translated. @code{malmake} itself calls the programs
@code{malsym}, @code{mallex} and @code{malrul} if needed. An example:
assume you have written a morphology grammar whose grammar files are
bundled in a project file @file{english.pro}:

@example
sym: rules/english.sym
all: rules/english.all
lex: rules/english.lex lex/adjectives.lex
lex: lex/particles.lex lex/suffixes.lex lex/verbs.lex
lex: lex/nouns.lex lex/abbreviations.lex lex/numbers.lex
mor: rules/english.mor
mallex: set hidden +semantics +syntax
malaga: set hidden +semantics
@end example

When executing @code{malmake english.pro} for the first time, the symbol file,
the rule files and the lexicon file will be translated:

project is up to date

If you want all files to be recompiled unconditionally, use the option
@samp{-new} or @samp{-n}.

The translation of a big lexicon can take some minutes, since the allomorph
rules have to be executed for each lexicon entry.
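
The decision whether a binary has to be translated again can be sketched
in Python. This is only an illustration of the rule described in this
section, based on file modification times; the function
@code{needs_translation} is hypothetical, and the actual test used by
@code{malmake} may differ.

```python
import os


def needs_translation(binary, sources):
    """Return True if the binary is missing or older than any source."""
    if not os.path.exists(binary):
        return True  # not yet translated
    binary_time = os.path.getmtime(binary)
    # Retranslate if any source file changed after the last translation.
    return any(os.path.getmtime(source) > binary_time for source in sources)
```

For a lexicon, @code{sources} would hold all files named in the
@samp{lex:} lines of the project file, which is why included files must
be listed there.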

@c ----------------------------------------------------------------------------

@node malrul, malsym, malmake, The Programs
@section The Program @code{malrul}

The program @code{malrul} translates Malaga rule files, i.e.@ files that
have the endings @file{.all}, @file{.mor} or @file{.syn}. The compiled
file gets the suffix @file{_l}, @file{_b}, or @file{_c}, depending on the
endianness of your processor. Give the following arguments if you are starting

@itemize
@item
the name of the rule file that is to be translated, and
@item
the name of the associated symbol file
(@file{.sym} or @file{.esym}).
@end itemize

The order of the arguments is arbitrary. Here is an example:

@example
malrul english.mor english.sym
@end example

@c ----------------------------------------------------------------------------

@node malsym, , malrul, The Programs
@section The Program @code{malsym}

@code{malsym} translates Malaga symbol files, i.e.@ files having the
ending @file{.sym} or @file{.esym}. The translated file gets the suffix
@file{_l}, @file{_b}, or @file{_c}, depending on the endianness of your
processor.

If you are translating an extended symbol file with the ending
@file{.esym}, enter the name of the compiled symbol file after the command
line option @samp{-use} or @samp{-u}:

@example
malsym english.esym -use english.sym
@end example

This argument is needed since extended symbol files are extensions of ordinary

If you use the command line option @samp{-hangul} when starting
@code{malsym}, the symbol file and all the Malaga files that use it will
split up Hangul syllables into individual letters internally. This option
is invoked by @code{malmake} if the project file contains the line
@samp{char-set: hangul}.

@c ----------------------------------------------------------------------------

@node Commands, Options, The Programs, Top
@chapter The Commands of @code{malaga} and @code{mallex}

Since the user interfaces of @code{malaga} and @code{mallex} are very
similar and share many commands, I will
describe them in a common chapter. Commands that can be used
only in @code{malaga} or only in @code{mallex} are marked by the name of the
program in which they can be used.

* backtrace:: Show where rule execution has stopped.
* break:: Add a new breakpoint.
* clear-cache:: Clear the word cache.
* continue:: Continue execution up to next breakpoint.
* debug-ga:: Debug Generating Allomorphs.
* debug-ga-file:: Debug Generating Allomorphs from a file.
* debug-ga-line:: Debug Generating Allomorphs from a single line in a file.
* debug-ma:: Debug Morphology Analysis.
* debug-ma-line:: Debug Morphology Analysis of a line in a file.
* debug-sa:: Debug Syntax Analysis.
* debug-sa-line:: Debug Syntax Analysis of a line in a file.
* debug-state:: Debug rule execution at an analysis state.
* delete:: Delete breakpoints.
* down:: Show code position and variables in calling rule.
* finish:: Continue execution up to return or path termination.
* frame:: Show code position and variables of a frame.
* ga:: Generate Allomorphs.
* ga-file:: Generate Allomorphs from a file.
* ga-line:: Generate Allomorphs from a single line in a file.
* get:: Get current values of options.
* help:: Get help about commands and options.
* info:: Get info about current grammar.
* list:: List current breakpoints.
* ma:: Analyse a word.
* ma-file:: Analyse words in a file.
* ma-line:: Analyse a word at a line in a file.
* mg:: Generate words from allomorphs.
* next:: Continue execution up to next line, skip subrules.
* print:: Print a variable or constant or a part of it.
* quit:: Quit @code{malaga} or @code{mallex}.
* read-constants:: Read constant definitions in lexicon file.
* result:: Show results.
* run:: Continue execution up to the end.
* sa:: Analyse a sentence.
* sa-file:: Analyse sentences in a file.
* sa-line:: Analyse a sentence at a line in a file.
* set:: Set values of options.
* sg:: Generate sentences from words.
* step:: Continue execution up to next line, enter subrules.
* transmit:: Send value to transmit process and print answer.
* tree:: Display analysis tree.
* up:: Show code position and variables in called rule.
* variables:: Display current variables.
* walk:: Execute until next rule.
* where:: Show current analysis state.

@c ----------------------------------------------------------------------------

@node backtrace, break, Commands, Commands
@section The Command @code{backtrace}
@cindex @code{backtrace} (command)

If you are executing your rules in debug mode or the rules were interrupted
by an error, this command shows where rule execution is currently stopped. If it
stopped in a subrule, all calling rules are also shown. The currently examined
rule is marked with a @samp{*}:

@example
*2: "dmm.mor", line 1218, rule "deletePOS"
 1: "dmm.mor", line 31, rule "Start"
@end example

This means that rule execution stopped in frame 2, line 1218 of @file{dmm.mor},
in rule @code{deletePOS}. This subrule was called from frame 1, line 31 in
@file{dmm.mor}, in rule @code{Start}.
873
@c ----------------------------------------------------------------------------
875
@node break, clear-cache, backtrace, Commands
876
@section The Command @code{break}
877
@cindex @code{break} (command)
880
If you want to stop the rules at a specific point, for example to take a look
881
at the variables, you can use the command @code{break} to set
882
@dfn{breakpoints}. A breakpoint is a point in the rule source text where rule
883
execution is interrupted, so you can enter commands in debug mode. Breakpoints
884
are only active in debug mode, this means you have started rule execution by a
885
debug command or you have continued rule execution by one of the
886
commands @code{step}, @code{next}, @code{walk}, or @code{continue}.
888
Behind the command name, @code{break}, you can give one of the following
893
A breakpoint is set at this line in the current source file. If there is
894
no statement starting at this line, the breakpoint will be set at the
895
nearest line where a statement starts. You can, for example, set a
896
breakpoint at line 245 in the current source file by entering the
903
@item A file name and a line number.
904
A breakpoint is set at this line in this file. If there is no statement
905
starting at this line, the breakpoint will be set at the nearest line
906
where a statement starts. An example:
913
A breakpoint is set at the first statement in this rule. An example:
920
If the rule name or the file name is ambiguous, you can insert an abbreviation
921
for the rule system you refer to. Put it in front of the rule name or the file
922
name. The following abbreviations are used:
926
@samp{all} for allomorph rules,
928
@samp{mor} for morphology rules,
930
@samp{syn} for syntax rules,
933
If you omit any argument, the breakpoint is set on the current line in the
934
current file (this is helpful in debug mode).
936
Every breakpoint gets a unique number once it has been set, so you can delete
937
it later, when you do not need it any longer.
939
You can list the breakpoints using the command @code{list} and delete
940
them using @code{delete}.
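
The rule that a breakpoint moves to the nearest line where a statement
starts can be pictured with a small Python sketch. It is not Malaga's
implementation: the function name @code{resolve_breakpoint_line} is
hypothetical, and the choice of the later line when two statement lines
are equally near is an assumption of this example.

```python
import bisect


def resolve_breakpoint_line(statement_lines, requested):
    """Pick the statement start line nearest to the requested line.

    statement_lines is a sorted list of the line numbers at which a
    statement starts.  Ties go to the later line; this tie-break is an
    assumption of the sketch, not documented Malaga behaviour.
    """
    if not statement_lines:
        raise ValueError("source file contains no statements")
    pos = bisect.bisect_left(statement_lines, requested)
    if pos == 0:
        return statement_lines[0]
    if pos == len(statement_lines):
        return statement_lines[-1]
    before = statement_lines[pos - 1]
    after = statement_lines[pos]
    if requested - before < after - requested:
        return before
    return after


# Hypothetical statement start lines of a rule file.
starts = [240, 243, 247, 252]
print(resolve_breakpoint_line(starts, 244))
```

With these made-up start lines, a request for line 244 resolves to
line 243, the closest line at which a statement begins.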
942
@c ----------------------------------------------------------------------------
944
@node clear-cache, continue, break, Commands
945
@section The Command @code{clear-cache} (@code{malaga})
946
@cindex @code{clear-cache} (@code{malaga} command)
948
If you have changed your settings so that the wordform cache is no longer
949
valid, you can clear the cache using @code{clear-cache}. This can be necessary
950
if you have turned on/off input or output filters or modified switches.
952
@c ----------------------------------------------------------------------------
954
@node continue, debug-ga, clear-cache, Commands
955
@section The Command @code{continue}
956
@cindex @code{continue} (command)
958
This command can only be executed in debug mode. It resumes rule execution and
963
Rule execution is continued until a breakpoint is met or the rules have
964
been executed completely.
967
Rule execution is continued until a breakpoint is met, the rules have
968
been executed completely or the given line in the current source file is
969
met. If there is no statement starting at this line, execution will be
970
stopped at the nearest line where a statement starts. You can, for
971
example, continue execution until line 245 in the current source file is
972
met by entering the command
978
@item A file name and a line number.
979
Rule execution is continued until a breakpoint is met, the rules have
980
been executed completely or the given line in the given file is met. If
981
there is no statement starting at this line, execution will be stopped
982
at the nearest line where a statement starts. An example:
985
continue english.syn 59
989
Rule execution is continued until a breakpoint is met, the rules have
990
been executed completely or the first statement of the given rule is
998
The comparison must be of the form @code{@var{variable} = @var{value}},
999
where @var{variable} may be any variable name, optionally followed by a path,
1000
and @var{value} may be any Malaga value. Rule execution is continued
1001
until a breakpoint is met, the rules have been executed completely or
1002
until @var{variable} is defined and its value is @var{value}.
1005
@c ----------------------------------------------------------------------------
1007
@node debug-ga, debug-ga-file, continue, Commands
1008
@section The Command @code{debug-ga} (@code{mallex})
1009
@cindex @code{debug-ga} (@code{mallex} command)
1011
Use @code{debug-ga} to find errors in your allomorph rules. This command
1012
works like @code{ga}, but the allomorph generation will be stopped before the
1013
first statement of the first rule is executed:
1017
mallex> debug-ga [surface: "john", class: name]
1018
at rule "irregular_verb"
1023
The prompt @samp{debug>} that appears instead of @samp{mallex>} indicates
1024
that @code{mallex} is currently executing the allomorph rules but has been
1025
interrupted. Since this ability has been developed to support the
1026
@emph{debugging} of Malaga rules, this mode is called @dfn{debug mode}.
1028
When @code{mallex} arrives at the start of a new rule in debug mode (as in the
1029
example above), the name of this rule is printed. When in debug mode, you can
1030
always get the name of the current rule using the command @code{rule}.
1032
If you're running @code{mallex} from Emacs, another Emacs window will display
1033
the source file. An arrow points to the statement that will be
1039
allo_rule irregular_verb ($entry):
1040
=>? $entry.class = verb;
1045
In debug mode, you can, for example, get the variables that are
1046
currently defined (using @code{variables} or @code{print}), and you can
1047
execute statements (using @code{step}, @code{next}, @code{walk},
1048
@code{continue}, or @code{run}). If you want to quit the debug mode,
1049
just enter @code{run}. The remaining statements for generation will then
1050
be executed without interruption.
1052
@c ----------------------------------------------------------------------------
1054
@node debug-ga-file, debug-ga-line, debug-ga, Commands
1055
@section The Command @code{debug-ga-file} (@code{mallex})
1056
@cindex @code{debug-ga-file} (@code{mallex} command)
1058
Use the command @code{debug-ga-file} to make the allomorph rules work on
1059
a lexicon file in debug mode. Assume you have written a lexicon file
1063
[surface: "m@{a@}n", class: noun];
1064
[surface: "table", class: noun];
1065
[surface: "wise", class: adjective];
1068
To let the rules process this lexicon in debug mode, enter:
1071
debug-ga-file mini.lex
1074
@c ----------------------------------------------------------------------------
1076
@node debug-ga-line, debug-ma, debug-ga-file, Commands
1077
@section The Command @code{debug-ga-line} (@code{mallex})
1078
@cindex @code{debug-ga-line} (@code{mallex} command)
1080
Use the command @code{debug-ga-line} to make the allomorph rules generate
1081
allomorphs for a single lexicon entry in debug mode. Assume you want to test
1082
the second line in the lexicon file @file{mini.lex}:
1085
[surface: "m@{a@}n", class: noun];
1086
[surface: "table", class: noun];
1087
[surface: "wise", class: adjective];
1090
Enter the following line:
1093
debug-ga-line mini.lex 2
1096
Then @code{mallex} stops in debug mode at the entry of the first allomorph rule
1097
that is being executed for the lexicon entry
1100
[surface: "table", class: noun];
1103
If there is no lexicon entry at this line, the subsequent lexicon entry will be
1106
@c ----------------------------------------------------------------------------
1108
@node debug-ma, debug-ma-line, debug-ga-line, Commands
1109
@section The Command @code{debug-ma} (@code{malaga})
1110
@cindex @code{debug-ma} (@code{malaga} command)
1112
Use the command @code{debug-ma} to find errors in your morphology combination
1113
rules. This command analyses the rest of the command line morphologically and
1114
executes the morphology combination rules in debug mode. Debug mode is
1115
explained for the command @code{debug-ga}.
1117
@c ----------------------------------------------------------------------------
1119
@node debug-ma-line, debug-sa, debug-ma, Commands
1120
@section The Command @code{debug-ma-line} (@code{malaga})
1121
@cindex @code{debug-ma-line} (@code{malaga} command)
1123
Use the command @code{debug-ma-line} to find errors in your morphology
combination rules. This command works like @code{ma-line}: it analyses a
single line of a text file morphologically, but executes the morphology
combination rules in debug mode. Debug mode is explained for the command
@code{debug-ga}.
1128
@c ----------------------------------------------------------------------------
1130
@node debug-sa, debug-sa-line, debug-ma-line, Commands
1131
@section The Command @code{debug-sa} (@code{malaga})
1132
@cindex @code{debug-sa} (@code{malaga} command)
1134
Use the command @code{debug-sa} to find errors in your syntax combination
1135
rules. This command analyses the rest of the command line syntactically and
1136
executes the syntax combination rules in debug mode. Debug mode is explained
1137
for the command @code{debug-ga}.
1139
@c ----------------------------------------------------------------------------
1141
@node debug-sa-line, debug-state, debug-sa, Commands
1142
@section The Command @code{debug-sa-line} (@code{malaga})
1143
@cindex @code{debug-sa-line} (@code{malaga} command)
1145
Use the command @code{debug-sa-line} to find errors in your syntax
combination rules. This command works like @code{sa-line}: it analyses a
single line of a text file syntactically, but executes the syntax combination
rules in debug mode. Debug mode is explained for the command @code{debug-ga}.
1150
@c ----------------------------------------------------------------------------
1152
@node debug-state, delete, debug-sa-line, Commands
1153
@section The Command @code{debug-state} (@code{malaga})
1154
@cindex @code{debug-state} (@code{malaga} command)
1156
Use the command @code{debug-state} to execute the successor rules of a
1157
specific LAG state in debug mode. Before you can use it, you must have
analysed a word form or a sentence. Make @code{malaga} display the
1159
analysis tree by entering @code{tree}, move the mouse pointer over the
1160
state you want to debug, and press the left mouse button. A window
1161
opens in which this state's feature structure is shown. The window's title
1162
line contains the index of the state. Use this number as argument for
1163
@code{debug-state}. The last analysis input will then be analysed again.
Analysis stops at the first successor rule of the specified state,
and @code{malaga} switches to debug mode. Debug mode is
1166
explained for the command @code{debug-ga}.
1168
@c ----------------------------------------------------------------------------
1170
@node delete, down, debug-state, Commands
1171
@section The Command @code{delete}
1172
@cindex @code{delete} (command)
1174
If you want to delete a breakpoint, use the command @code{delete} with the
1175
number of the breakpoint as argument.
1177
Enter @samp{delete all} to delete all breakpoints.
1179
@c ----------------------------------------------------------------------------
1181
@node down, finish, delete, Commands
1182
@section The Command @code{down}
1183
@cindex @code{down} (command)
1185
If you want to look at the source and the variables of the (sub)rule that is
1186
currently being called by the current subrule, you can do this by entering
1187
@code{down}. You can list the frames via @code{backtrace}.
1189
@c ----------------------------------------------------------------------------
1191
@node finish, frame, down, Commands
1192
@section The Command @code{finish}
1193
@cindex @code{finish} (command)
1195
This command can only be executed in debug mode. The rule execution will be
1196
resumed and continues until a @code{return} statement is met or until
1197
the current rule path terminates.
1199
@c ----------------------------------------------------------------------------
1201
@node frame, ga, finish, Commands
1202
@section The Command @code{frame}
1203
@cindex @code{frame} (command)
1205
If you want to look at the source and the variables of a (sub)rule that has
1206
called the current subrule, directly or indirectly, you can do this by typing
1207
@code{frame} and the number of the frame you want to examine. You can list the
1208
frames via @code{backtrace}.
1210
@c ----------------------------------------------------------------------------
1212
@node ga, ga-file, frame, Commands
1213
@section The Command @code{ga} (@code{mallex})
1214
@cindex @code{ga} (@code{mallex} command)
1216
Use the command @code{ga} (short for @emph{generate allomorphs}) to
1217
generate allomorphs. This is useful for testing allomorph generation
1218
from within @code{mallex}. When you enter the command, give a lexicon
1219
entry as argument. All allomorphs that are generated from this entry by
1220
the allomorph rules are printed on screen. For example:
1224
mallex> ga [Lemma: "!", POS: Punctuation, Type: ExclamationMark]
1225
"!": [POS: <Punctuation>,
1226
Punctuation: <[Allomorph: "!",
1231
Type: ExclamationMark,
1238
If the rules create multiple allomorphs from an entry, they are displayed one
1241
@c ----------------------------------------------------------------------------
1243
@node ga-file, ga-line, ga, Commands
1244
@section The Command @code{ga-file} (@code{mallex})
1245
@cindex @code{ga-file} (@code{mallex} command)
1247
Use the command @code{ga-file} to make the allomorph rules generate allomorphs
1248
for a lexicon file. Assume you have written a lexicon file @file{mini.lex}:
1251
[surface: "m@{a@}n", class: noun];
1252
[surface: "table", class: noun];
1253
[surface: "wise", class: adjective];
1256
To generate the allomorphs for this lexicon, enter @samp{ga-file mini.lex}.
1258
This will produce a readable allomorph file whose name ends in
1259
@file{.out}; for @file{mini.lex} its name will be @file{mini.lex.out}:
1262
"man": [class: noun, syn: singular]
1263
"men": [class: noun, syn: plural]
1264
"table": [class: noun]
1265
"wise": [class: adjective, restr: complete]
1266
"wis": [class: adjective, restr: inflect]
1269
@c ----------------------------------------------------------------------------
1271
@node ga-line, get, ga-file, Commands
1272
@section The Command @code{ga-line} (@code{mallex})
1273
@cindex @code{ga-line} (@code{mallex} command)
1275
Use the command @code{ga-line} to make the allomorph rules generate
1276
allomorphs for a single lexicon entry. Assume you want to test
1277
the second line in the lexicon file @file{mini.lex}:
1280
[surface: "m@{a@}n", class: noun];
1281
[surface: "table", class: noun];
1282
[surface: "wise", class: adjective];
1285
Enter the following line:
1291
Then @code{mallex} generates allomorphs for
1292
@code{[surface: "table", class: noun];}.
1294
If there is no lexicon entry at this line, the subsequent lexicon entry will be
1297
@c ----------------------------------------------------------------------------
1299
@node get, help, ga-line, Commands
1300
@section The Command @code{get}
1301
@cindex @code{get} (command)
1303
This command is used to query settings of @code{malaga} or
1304
@code{mallex}. Enter it together with the name of the option whose
1305
setting you want to know. The possible options are described in the next
1306
chapter. If you just enter @samp{get}, all settings will be shown.
1308
@c ----------------------------------------------------------------------------
1310
@node help, info, get, Commands
1311
@section The Command @code{help}
1312
@cindex @code{help} (command)
1314
Use this command to get a list of the commands you can use. If you give the
1315
name of a command or an option as argument, a short explanation of this item
1316
will be printed. If a name represents a command as well as an option, prepend
1317
@samp{command} or @samp{option} to it.
1319
@c ----------------------------------------------------------------------------
1321
@node info, list, help, Commands
1322
@section The Command @code{info} (@code{malaga})
1323
@cindex @code{info} (@code{malaga} command)
1325
This command gives you information about the grammar you are using. It
1328
@c ----------------------------------------------------------------------------
1330
@node list, ma, info, Commands
1331
@section The Command @code{list}
1332
@cindex @code{list} (command)
1334
If you enter the command @code{list}, all breakpoints are listed. For each
1335
breakpoint, its number, the name of the source file and the source line are
1338
@c ----------------------------------------------------------------------------
1340
@node ma, ma-file, list, Commands
1341
@section The Command @code{ma} (@code{malaga})
1342
@cindex @code{ma} (@code{malaga} command)
1344
The command @code{ma} (for @emph{morphological analysis}) starts a word form
1345
analysis. Give the word form that you want to be analysed as argument:
1351
Malaga will show the results automatically, and it will also show the
1352
analysis tree automatically if you specified it using the
1353
@code{auto-tree} option. You can look at the results using
1354
@code{result} or at the entire analysis tree using @code{tree}.
1356
If you do not enter a word form behind the command @code{ma}, @code{malaga}
1357
re-analyses the last input.
1359
@c ----------------------------------------------------------------------------
1361
@node ma-file, ma-line, ma, Commands
1362
@section The Command @code{ma-file} (@code{malaga})
1363
@cindex @code{ma-file} (@code{malaga} command)
1365
The command @code{ma-file} can be used to analyse files that contain
1366
word lists. A word list consists of a number of word forms, each word
1367
form on a line on its own. There may be empty lines in a word list. The
1368
following example is a word list called @file{word-list}:
1377
To analyse this word list, enter:
1380
ma-file word-list result
1383
This will produce a file @file{result} that contains the analysis
1384
results. If the second argument is missing, the result will be written
1385
to a file whose name ends in @file{.out}; for @file{word-list}, its name
1386
will be @file{word-list.out}:
1389
1: "table": [class: noun, ...]
1390
2: "men's": [class: noun, ...]
1391
3: "blue": [class: noun, ...]
1392
3: "blue": [class: adjective, ...]
1393
3: "blue": [class: name, ...]
1394
4: "handicap": unknown
1397
The number at the line start represents the line number of the analysed
1398
original word form. The output format can be changed by using the options
1399
@code{result-format} and @code{unknown-format}.
1401
If a runtime error occurs during the analysis of a word, the line will be
1402
printed in the format given by the option @code{error-format}.
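
A result file in this layout is easy to post-process. The following
Python sketch groups the analyses by input line number. It assumes the
output layout shown above (it will not fit if you changed
@code{result-format} or @code{unknown-format}), and the name
@code{group_results} is invented for this illustration.

```python
import re

# Matches lines such as:  3: "blue": [class: adjective, ...]
RESULT_RE = re.compile(r'^(\d+):\s*"([^"]*)":\s*(.*)$')


def group_results(lines):
    """Map each input line number to (word form, list of analyses).

    A word form reported as "unknown" gets an empty list of analyses.
    """
    grouped = dict()
    for raw in lines:
        match = RESULT_RE.match(raw.strip())
        if match is None:
            continue  # not a result line in the assumed layout
        number, surface, value = match.groups()
        entry = grouped.setdefault(int(number), (surface, []))
        if value != "unknown":
            entry[1].append(value)
    return grouped


sample = [
    '1: "table": [class: noun, ...]',
    '3: "blue": [class: noun, ...]',
    '3: "blue": [class: adjective, ...]',
    '4: "handicap": unknown',
]
for number, (surface, analyses) in group_results(sample).items():
    print(number, surface, len(analyses))
```

Because ambiguous word forms repeat their line number, grouping by that
number recovers one entry per input word form.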
1404
After the analysis, some statistics will be printed:
1407
@item The number of analysed word forms.
1408
@item The number of recognised word forms.
1409
@item The number of word forms recognised by combi-rules and end-rules.
1410
@item The number of word forms recognised by robust-rules.
1411
@item The number of word forms whose analyses produced errors.
1412
@item The average number of results per word form.
1413
@item The analysis run time.
1414
@item The average number of word forms that have been analysed per second.
1415
@item The number of cache accesses.
1416
@item The number of cache hits.
1419
@c ----------------------------------------------------------------------------
1421
@node ma-line, mg, ma-file, Commands
1422
@section The Command @code{ma-line} (@code{malaga})
1423
@cindex @code{ma-line} (@code{malaga} command)
1425
You can use this command to analyse a single line in a text file
1426
morphologically. Assume you want to analyse the word in the third line in the
1427
file @file{words}. Then enter the following command:
1433
Malaga will show the results automatically, and it will also show the
1434
analysis tree automatically if you specified it using the
1435
@code{auto-tree} option. You can look at the results using @code{result}
1436
or at the entire analysis tree using @code{tree}.
1438
@c ----------------------------------------------------------------------------
1440
@node mg, next, ma-line, Commands
1441
@section The Command @code{mg} (@code{malaga})
1442
@cindex @code{mg} (@code{malaga} command)
1444
Use the command @code{mg} to generate all word forms that consist of a
1445
specified set of allomorphs. For example, enter:
1448
mg 3 un able believe
1451
This generates all word forms that consist of up to three allomorphs,
1452
where only the specified allomorphs (@samp{un}, @samp{able}, and
1453
@samp{believe}) are used. The word forms are numbered from 1 onward, but
1454
different analyses of the same word form get the same index. The output
1455
will look like this:
1459
malaga> mg 3 un able believe
1468
Please note that generation does not know of filters, pruning rules and
1471
@c ----------------------------------------------------------------------------
1473
@node next, print, mg, Commands
1474
@section The Command @code{next}
1475
@cindex @code{next} (command)
1477
This command can only be executed in debug mode. The rule execution
1478
will be resumed and continues until a different source line is met, a
1479
different path is going to be executed since the old one has
1480
terminated, or until the rules have been executed completely. It is
1481
like @code{step}, but subrules will be executed without
1482
interruption. If you specify a number as argument, the command will be
1483
repeated as often as specified.
1485
@c ----------------------------------------------------------------------------
1487
@node print, quit, next, Commands
1488
@section The Command @code{print}
1489
@cindex @code{print} (command)
1491
The command @code{print} is used to print the current values of Malaga
1492
variables or named constants, or parts of them. You can specify any
1493
variable or constant names (including the @samp{$} or @samp{@@}) as
1494
arguments to this command; you can also specify a path of attributes
1495
and/or indexes (with suffix @samp{L} or @samp{R}) behind each of the
1496
variable or constant names. In that case, only the values of the
1497
specified paths are printed:
1502
$word = [class: pronoun,
1504
debug> print $word.class
1505
$word.class = pronoun
1506
debug> print @@plan.1L.name
1507
@@plan.1L.name = declarative
1512
If the option @code{use-display} is on, the expressions will be displayed in a
window of their own. If the @code{Expressions} window is not open yet, it will
1514
open now. If there is an open @code{Expressions} window, the new
1515
expressions and their values will be displayed in this window.
1517
You can left-click on an expression to make its value disappear or appear
1518
again. You can middle-click or right-click on an expression to erase
1521
The @code{Expressions} window has a menu with some commands:
1526
@item Export Postscript...
1527
Export the @code{Expressions} window as an Encapsulated PostScript (EPS) file.
1529
Close the @code{Expressions} window.
1534
Select an item to adjust the font size.
1536
Normally, all values and subvalues are aligned at their bottom. If
1537
this option is active, records are ``hanging down'': they are
1538
aligned at their top.
1543
Clear all expressions.
1545
Display the values of all expressions currently displayed.
1547
Suppress the values of all expressions currently displayed.
1551
@c ----------------------------------------------------------------------------
1553
@node quit, read-constants, print, Commands
1554
@section The Command @code{quit}
1555
@cindex @code{quit} (command)
1557
Use this command to leave @code{malaga} or @code{mallex}.
1559
@c ----------------------------------------------------------------------------
1561
@node read-constants, result, quit, Commands
1562
@section The Command @code{read-constants} (@code{mallex})
1563
@cindex @code{read-constants} (@code{mallex} command)
1565
If you want to parse lexicon entries that use Malaga constants (prefixed by
1566
@samp{@@}), these constants can be read in using the command
1567
@samp{read-constants @var{lexicon-file}}. It parses @var{lexicon-file} and
1568
memorizes all constant definitions in it.
1570
@c ----------------------------------------------------------------------------
1572
@node result, run, read-constants, Commands
1573
@section The Command @code{result}
1574
@cindex @code{result} (command)
1576
If you have previously analysed a word form or a sentence using
1577
@code{ma}, @code{ma-line}, @code{sa}, or @code{sa-line} (in
1578
@code{malaga}), or you have generated allomorphs using @code{ga} or
1579
@code{ga-line} (in @code{mallex}), you can display the results with
1583
@item @code{use-display} is off:
1584
The results will be printed on standard output.
1586
@item @code{use-display} is on:
1587
The results will be displayed in a window of their own, which is called
1588
@code{Results} for @code{malaga} and @code{Allomorphs} for
1589
@code{mallex}. They are numbered from 1 onward.
1591
If you are executing the command @code{result} for the first time, or if
1592
you have closed a @code{Results/Allomorphs} window that you'd opened
1593
before, a window will open, displaying the values of all
1594
results/allomorphs of the last analysis/generation.
1596
If there is a @code{Results/Allomorphs} window currently opened, the new
1597
results/allomorphs will be displayed in this window.
1600
The @code{Results/Allomorphs} window has a menu with some commands:
1605
@item Export Postscript...
1606
Export the result display as an Encapsulated PostScript (EPS) file.
1608
Close the @code{Results/Allomorphs} window.
1613
Select an item to adjust the font size.
1615
Normally, all values and subvalues are aligned at their bottom. If this
1616
option is active, records are ``hanging down'': they are aligned at
1621
@c ----------------------------------------------------------------------------
1623
@node run, sa, result, Commands
1624
@section The Command @code{run}
1625
@cindex @code{run} (command)
1627
This command can only be used in debug mode. The rule execution will be
1628
resumed, and the rules will be executed completely without any interruption.
1630
If you have invoked debug mode by the command @code{debug-state}, rule
1631
execution will be stopped again when another link is going to be analysed.
1633
@c ----------------------------------------------------------------------------
1635
@node sa, sa-file, run, Commands
1636
@section The Command @code{sa} (@code{malaga})
1637
@cindex @code{sa} (@code{malaga} command)
1639
If you have started @code{malaga} with a syntax file in your command line or in
1640
the project file, you can start syntactic analyses using the command @code{sa}
1641
(short for @emph{syntactic analysis}). Put the sentence you want to be
1642
analysed as argument behind the command name:
1645
sa The man is in town.
1648
Malaga will show the results automatically, and it will also show the analysis
1649
tree automatically if you specified it using the @code{auto-tree} option. You can
1650
look at the results using @code{result} or at the entire analysis tree using
1653
If you do not enter a sentence behind the command @code{sa}, @code{malaga}
1654
re-analyses the last input.
1656
@c ----------------------------------------------------------------------------
1658
@node sa-file, sa-line, sa, Commands
1659
@section The Command @code{sa-file} (@code{malaga})
1660
@cindex @code{sa-file} (@code{malaga} command)
1662
Using the command @code{sa-file}, you can analyse files that contain
1663
sentence lists. In a sentence list, each sentence stands in a line on
1664
its own; empty lines are permitted. Here is an example, a sentence list
1665
named @file{sentence-list}:
1674
To analyse this sentence list, enter:
1677
sa-file sentence-list result
1680
This will produce a file @file{result} that contains the analysis
1681
results. If the second argument is missing, the result will be written
1682
to a file whose name ends in @file{.out}; for @file{sentence-list}, its
1683
name will be @file{sentence-list.out}.
1686
1: "He sleeps.": [functor: [syn: <S3>, sem: <"sleep">]]
1687
2: "He slept.": [functor: [syn: <S3>, sem: <"sleep">]]
1688
3: "He has slept.": [functor: [syn: <S3>, sem: <"have", "sleep">]]
1689
4: "He had slept.": [functor: [syn: <S3>, sem: <"have", "sleep">]]
1692
The number at the line start represents the line number of the analysed
1693
original sentence. The output format can be changed by using the options
1694
@code{result-format} and @code{unknown-format}.
1696
If a runtime error occurs during the analysis of a sentence, the line will be
1697
printed in the format given by the option @code{error-format}.
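
Some of the figures that @code{sa-file} reports can be recomputed from a
result file. The Python sketch below is a rough illustration, not
Malaga code. It assumes the output layout shown above, assumes that an
unrecognised sentence is written with the value @samp{unknown}, and
takes the average over all analysed sentences; the name
@code{summarise} and the fifth sample line are made up for the example.

```python
import re

# Matches lines such as:  1: "He sleeps.": [functor: ...]
LINE_RE = re.compile(r'^(\d+):\s*"(.*?)":\s*(.*)$')


def summarise(lines):
    """Return (analysed, recognised, average results per sentence)."""
    results_per_input = dict()
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # not a result line in the assumed layout
        number = int(match.group(1))
        count = results_per_input.get(number, 0)
        if match.group(3) != "unknown":
            count += 1
        results_per_input[number] = count
    analysed = len(results_per_input)
    recognised = sum(1 for count in results_per_input.values() if count > 0)
    total = sum(results_per_input.values())
    average = float(total) / analysed if analysed else 0.0
    return analysed, recognised, average


sample = [
    '1: "He sleeps.": [functor: [syn: <S3>, sem: <"sleep">]]',
    '2: "He slept.": [functor: [syn: <S3>, sem: <"sleep">]]',
    '3: "He has slept.": [functor: [syn: <S3>, sem: <"have", "sleep">]]',
    '4: "He had slept.": [functor: [syn: <S3>, sem: <"have", "sleep">]]',
    '5: "Sleeps he had.": unknown',  # hypothetical unrecognised sentence
]
print(summarise(sample))
```

Run time and cache figures cannot be derived this way, since they are
not contained in the result file.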
1699
After the analysis, some statistics will be printed:
1702
@item The number of analysed sentences.
1703
@item The number of recognised sentences.
1704
@item The number of sentences recognised by combi-rules and end-rules.
1705
@item The number of sentences recognised by robust-rules.
1706
@item The number of sentences whose analyses produced errors.
1707
@item The average number of results per sentence.
1708
@item The analysis run time.
1709
@item The average number of sentences that have been analysed per second.
1710
@item The number of cache accesses.
1711
@item The number of cache hits.
1714
@c ----------------------------------------------------------------------------
1716
@node sa-line, set, sa-file, Commands
1717
@section The Command @code{sa-line} (@code{malaga})
1718
@cindex @code{sa-line} (@code{malaga} command)
1720
If you have started @code{malaga} with a syntax file in your command
1721
line or in the project file, you can analyse a single line of a text file
syntactically using the command @code{sa-line}. Assume you
1723
want to analyse the sentence in the third line in the file
1724
@file{sentences}. Then enter the following command:
1730
Malaga will show the results automatically, and it will also show the
1731
analysis tree automatically if you specified it using the
1732
@code{auto-tree} option. You can look at the results using
1733
@code{result} or at the entire analysis tree using @code{tree}.
1735
@c ----------------------------------------------------------------------------
1737
@node set, sg, sa-line, Commands
1738
@section The Command @code{set}
1739
@cindex @code{set} (command)
1741
This command is used to change the settings of @code{malaga} or
1742
@code{mallex}. The command line @samp{set @var{option argument}} changes
1743
@var{option} to @var{argument}. If you want to get the current state of
1744
an option, use the command @code{get}. Options can also be set in the
1745
project file. The possible options are described in the next chapter.
1747
@c ----------------------------------------------------------------------------
1749
@node sg, step, set, Commands
1750
@section The Command @code{sg} (@code{malaga})
1751
@cindex @code{sg} (@code{malaga} command)
1753
Use @code{sg} to generate sentences that are composed of a specified set
1754
of word forms. For example, enter:
1757
sg 3 . ? he she sleeps
1760
This generates all sentences that consist of up to three word forms, where only the specified
1761
word forms (``.'', ``?'', ``he'', ``she'', and ``sleeps'') are used. The
1762
sentences are numbered from 1 onward, but different analyses of the same
1763
sentence get the same index. The output looks like this:
1767
malaga> sg 3 . ? he she sleeps
1776
Please note that generation does not know of filters, pruning rules and
1779
@c ----------------------------------------------------------------------------
1781
@node step, transmit, sg, Commands
1782
@section The Command @code{step}
1783
@cindex @code{step} (command)
1785
This command can only be executed in debug mode. The rule execution
1786
will be resumed and continues until a different source line is met, a
1787
different path is going to be executed since the old one has
1788
terminated, or until the rules have been executed completely.
1790
@c ----------------------------------------------------------------------------
1792
@node transmit, tree, step, Commands
1793
@section The Command @code{transmit}
1794
@cindex @code{transmit} (command)
1796
If you have specified a transmit command line (to do this, use the option
@code{transmit-line}), you can send a value to the transmit process; its
answer will be printed:
1801
malaga> set transmit-line cat
1802
malaga> transmit [surf: "go", POS: verb];
1809
@c ----------------------------------------------------------------------------
1811
@node tree, up, transmit, Commands
1812
@section The Command @code{tree} (@code{malaga})
1813
@cindex @code{tree} (@code{malaga} command)
1815
If you've started a grammatical analysis using one of the commands @code{ma} or
1816
@code{sa} (or their debug variants), you can make @code{malaga} display the
1823
If the analysis has not yet finished (in debug mode or in case of an error), a
1824
partial tree will be shown.

If you execute the command @code{tree} for the first time, or if you've
closed the @code{Tree} window before, a new tree window will open in which the
current analysis tree will be displayed.

If there is already a @code{Tree} window open, the new analysis tree will be
displayed in this window.

In the upper left corner of the @code{Tree} window, you will see the
sentence or the word form that has been analysed. Below, the analysis
tree is displayed. An analysis path always follows the edges from the

A circle node stands for a LAG state, a two-circle node stands for an end
state. A crossed circle stands for a LAG state that has been removed by a
pruning rule, and a crossed two-circle node stands for an end state that is
invalid because some input still remains. A box node
is not a state, but a @dfn{dead end}, which means that no rule has created a
state at this position.

Above each edge, the link's surface of the corresponding rule application is
displayed. Below the edge, you'll see the name of the applied rule.

You can click on a node using any mouse button. Then another window will
open, namely the @code{Path} window. The @code{Path} window displays the
surface, the feature structure and the successor rules of the state you've
clicked on. The node will be highlighted by a red border. If you've
already clicked on a node, you can click on one of its successor nodes
using the right mouse button or on one of its predecessor nodes using
the left mouse button. Then all rule applications, from the state
clicked on previously up to the state clicked on this time, will be
displayed in the @code{Path} window. The corresponding path will be
highlighted in the @code{Tree} window. If you click on a node with the
middle mouse button, only this node will be displayed in the @code{Path}
window.

If you click on a link surface using any mouse button, the surface
and its feature structure will be displayed in the @code{Path} window.

You can also click on rule names using any mouse button. Then the corresponding
rule application will be displayed in the @code{Path} window, i.e.@ the
surfaces and feature structures of the original state, the link, and the
successor state, and the successor rules.

There are some commands that can be started from the @code{Tree} menu bar:

@table @code
@item Export Postscript...
Export the displayed analysis tree as an Encapsulated PostScript file.
@end table

Close the @code{Tree} window.

Select an item in this menu to adjust the font size.

Specify which nodes of the analysis tree are actually displayed.

All analysis states are displayed, and also boxes for rule
applications that did not succeed (dead ends).

All analysis states are displayed.

@table @code
@item Complete paths
Only the nodes that are part of a complete analysis are displayed.
@end table

Select an end state to display in the @code{Path} window.

Display the first end state.

If there is an end state displayed in the @code{Path} window, jump
to the previous one.

If there is an end state displayed in the @code{Path} window, jump
to the next one.

Display the last end state.

The @code{Path} window has its own menu bar, which contains the menus
@code{Window}, @code{Style} and @code{End States} with the same menu
items as the corresponding menus in the @code{Tree} window, and two
additional options in @code{Style}:

Normally, all values and subvalues are aligned at their bottom. If this
option is active, records are ``hanging down'': they are aligned at
their top.

Normally, a state is displayed with surface, feature structure and rule set
stacked. If this option is active, they are displayed aligned on on

@c ----------------------------------------------------------------------------

@node up, variables, tree, Commands
@section The Command @code{up}
@cindex @code{up} (command)

If you want to look at the source and the variables of the (sub)rule that has
called the current subrule, you can do this by entering @code{up}. You can list
the frames via @code{backtrace}.

@c ----------------------------------------------------------------------------

@node variables, walk, up, Commands
@section The Command @code{variables}
@cindex @code{variables} (command)

If you invoke @code{variables}, you get the values of all Malaga
variables that are currently defined. The variables will be shown in the
order of their definitions. You can only use the command
@code{variables} in debug mode or if the previous analysis has stopped
with an error in the combination rules.

If the option @code{use-display} is off, the variables will be printed on

@example
malaga> sa-debug You are so beautiful.
entering rule "Noun", surf: "", link: "You", state: 1
$sentence = [class: main_clause,
$word = [class: pronoun,
@end example

If the option @code{use-display} is on, the variables will be displayed in
a window of their own. If the @code{Variables} window is not open yet, it will
open now. If there is an open @code{Variables} window, the new variable
contents will be displayed in this window.

You can left-click on a variable name to make its value disappear or appear

The @code{Variables} window has a menu with some commands:

@table @code
@item Export Postscript...
Export the variable display as an Encapsulated PostScript file.
@end table

Close the @code{Variables} window.

Select an item to adjust the font size.

Normally, all values and subvalues are aligned at their bottom. If
this option is active, records are ``hanging down'': they are
aligned at their top.

Display the values of all variables currently defined.

Suppress the values of all variables currently defined.

@c ----------------------------------------------------------------------------

@node walk, where, variables, Commands
@section The Command @code{walk}
@cindex @code{walk} (command)

This command works in debug mode only. Rule execution continues and
stops again as soon as a new rule is executed, a breakpoint is reached, or
there are no more rules to execute.

@c ----------------------------------------------------------------------------

@node where, , walk, Commands
@section The Command @code{where}
@cindex @code{where} (command)

This command can only be used in debug mode or after rule execution has been
stopped by an error. It prints the name of the rule that has been executed;
additionally, the surfaces of state and link are printed in @code{malaga}. For

@example
at rule "flexion", surf: "hous", link: "es", state: 2
@end example

@c ----------------------------------------------------------------------------

@node Options, The Language, Commands, Top
@chapter The Options of @code{malaga} and @code{mallex}

The programs @code{malaga} and @code{mallex} share some of their
options, so I will describe them in a common chapter. Options can be set
using the command @code{set}, and you can get the current value of an
option using @code{get}. Options that can be used only in @code{malaga} or
only in @code{mallex} are marked by the name of the program in which

@menu
* alias::           Shortcuts for other commands.
* allo-format::     The output format for allomorphs in readable form.
* auto-tree::       Is the analysis tree displayed automatically after analysis?
* auto-variables::  Are variables displayed automatically in debug mode?
* cache-size::      The size of the word form cache.
* display-line::    The command line for the graphical display.
* error-format::    The output format for analyses that reported an error.
* hidden::          The attributes whose values are hidden in output.
* mor-incomplete::  Will we accept words that have been incompletely parsed?
* mor-out-filter::  Will the morphology output filter be executed?
* pruning-rule::    Will the pruning rule be executed?
* result-format::   The output format for successful analyses.
* result-list::     Pack all analysis results in a single list.
* robust-rule::     Will the robust rule be executed?
* sort-records::    The order of the attributes in a record when printed.
* switch::          User options that can be read by the grammar.
* syn-incomplete::  Will we accept sentences that have been incompletely parsed?
* syn-in-filter::   Will the syntax input filter be executed?
* syn-out-filter::  Will the syntax output filter be executed?
* transmit-line::   The command line for the transmit process.
* unknown-format::  The output format for analyses that got no results.
* use-display::     Will the program in @code{display-line} be used for output?
* use-ksc::         Is KSC5601 used for output? (for Hangul grammars only)
@end menu

@c ----------------------------------------------------------------------------

@node alias, allo-format, Options, Options
@section The Option @code{alias}
@cindex @code{alias} (option)

With @code{alias}, you can define abbreviations for longer command
lines. As arguments, give an alias name and an expansion (a command line
which the name will stand for). If the expansion contains spaces,
enclose it in double quotes. Use @code{set alias @var{name}} to delete

If you type the name of an alias at your command line, its expansion
will be executed. The character sequence @samp{%a} in your alias
definition will be replaced by what follows the alias name in the

Aliases cannot be nested.
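
As an illustration, the following session sketches a hypothetical alias
@code{sd} for the debug variant of sentence analysis. The alias name is
only an assumption chosen for this example; the expansion is enclosed in
double quotes because it contains a space:

@example
malaga> set alias sd "sa-debug %a"
malaga> sd You are so beautiful.
@end example

With this definition, the second line should behave as if you had entered
@code{sa-debug You are so beautiful.}, since @samp{%a} stands for the text
that follows the alias name.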

@c ----------------------------------------------------------------------------

@node allo-format, auto-tree, alias, Options
@section The Option @code{allo-format} (@code{mallex})
@cindex @code{allo-format} (@code{mallex} option)

With @code{allo-format}, you can change the output format for the
generated allomorphs. Enter a format string as argument. If the format
string contains spaces, enclose it in double quotes. If the argument is
an empty string (@code{""}), no allomorphs will be shown.

In the format string, the following sequences have a special meaning:

Will be replaced by the allomorph's feature structure.

Will be replaced by the allomorph's number.

Will be replaced by the allomorph's surface.

@c ----------------------------------------------------------------------------

@node auto-tree, auto-variables, allo-format, Options
@section The Option @code{auto-tree} (@code{malaga})
@cindex @code{auto-tree} (@code{malaga} option)

You can use @code{auto-tree} to make @code{malaga} execute the
@code{tree} command each time you start an analysis with @code{ma}
or @code{sa}. Set it in one of the following ways:

@table @code
@item set auto-tree yes
The @code{tree} command will be executed after each analysis.
@item set auto-tree no
The @code{tree} command will not be executed automatically.
@end table

@c ----------------------------------------------------------------------------

@node auto-variables, cache-size, auto-tree, Options
@section The Option @code{auto-variables}
@cindex @code{auto-variables} (option)

When @code{malaga} or @code{mallex} stops in debug mode while executing
a Malaga rule, it can automatically show the variables defined at this
point. Use the option @code{auto-variables} to set this behaviour.

@table @code
@item set auto-variables yes
The @code{variables} command will be executed each time
@code{malaga} or @code{mallex} stops in debug mode.
@item set auto-variables no
The @code{variables} command will not be executed automatically.
@end table

@c ----------------------------------------------------------------------------

@node cache-size, display-line, auto-variables, Options
@section The Option @code{cache-size} (@code{malaga})
@cindex @code{cache-size} (@code{malaga} option)

Malaga has a cache for word forms. You can set the cache size, i.e.@ the
maximum number of word forms in the cache, to @var{n} with
@code{set cache-size @var{n}}.
If you set the cache size to 0, the cache will be deactivated.

When malaga analyses a word form or sentence, it tries to get a word form from
the cache before it uses the morphology combination rules. To do so, malaga
separates the first word form from the remaining input. It uses spacing
characters as separators; so if a word form contains a space or does not end
with a space, caching will not work.
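
A short sketch of how the cache might be adjusted during a session; the
size 5000 is an arbitrary value chosen for illustration, not a recommended
setting:

@example
malaga> set cache-size 0
malaga> set cache-size 5000
malaga> get cache-size
@end example

The first line deactivates the cache, the second one activates it again
with room for at most 5000 word forms, and the third one queries the
current value of the option.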

@c ----------------------------------------------------------------------------

@node display-line, error-format, cache-size, Options
@section The Option @code{display-line}
@cindex @code{display-line} (option)

The programs @code{malaga} and @code{mallex} use the program @code{malshow} to
show the Malaga trees, results or variables graphically. If you want to use a
different display program, set the command line that starts this program with
the @code{display-line} option, like this:

@example
set display-line "java -classpath /opt/malaga/amalgam Amalgam"
@end example

@c ----------------------------------------------------------------------------

@node error-format, hidden, display-line, Options
@section The Option @code{error-format} (@code{malaga})
@cindex @code{error-format} (@code{malaga} option)

With @code{error-format}, you can change the output format for items
that produced an analysis error. Enter a format string as argument. If the
format string contains spaces, enclose it in double quotes. If the argument is
an empty string (@code{""}), no forms that produced an error will be shown.

In the format string, the following sequences have a special meaning:

Will be replaced by the error message for the analysed form.

Will be replaced by the line number of the analysed form.

Will be replaced by the number of analysis states for this form.

Will be replaced by the surface.

@c ----------------------------------------------------------------------------

@node hidden, mor-incomplete, error-format, Options
@section The Option @code{hidden}
@cindex @code{hidden} (option)

Some grammars can produce very large feature structures, so it can be useful
not to show the values of some specified attributes. To achieve this, use the
option @code{hidden}. You can give any number of arguments to this option. The
following arguments are available:

@table @code
@item +@var{attribute-name}
The specified attribute name will be put in parentheses if it occurs in
a value; the attribute value will not be shown.
@item -@var{attribute-name}
The specified attribute will be shown completely again in the future.
@end table

All attributes will be shown completely again in the future.
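
For example, a session that hides two attributes and later shows one of
them again might look like this. The attribute names @code{semantics} and
@code{arguments} are hypothetical and stand for attributes of your own
grammar:

@example
malaga> set hidden +semantics +arguments
malaga> set hidden -arguments
@end example

After the second command, only the value of @code{semantics} should still
be suppressed in the output.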

@c ----------------------------------------------------------------------------

@node mor-incomplete, mor-out-filter, hidden, Options
@section The Option @code{mor-incomplete} (@code{malaga})
@cindex @code{mor-incomplete} (@code{malaga} option)

If you want to get morphological analysis results not only for the whole input
line, but for any grammatically well-formed prefix of the input line, you can
use the option @code{mor-incomplete}:

@table @code
@item set mor-incomplete yes
Accept words that have been incompletely parsed.
@item set mor-incomplete no
Only accept words that have been completely parsed.
@end table

Note that this option has no effect in subordinate morphological analyses that
are needed by syntactic analysis.

@c ----------------------------------------------------------------------------

@node mor-out-filter, pruning-rule, mor-incomplete, Options
@section The Option @code{mor-out-filter} (@code{malaga})
@cindex @code{mor-out-filter} (@code{malaga} option)

Use the option @code{mor-out-filter} to switch the morphology output filter
on or off:

@table @code
@item set mor-out-filter yes
Activate the filter.
@item set mor-out-filter no
Deactivate the filter.
@end table

@c ----------------------------------------------------------------------------

@node pruning-rule, result-format, mor-out-filter, Options
@section The Option @code{pruning-rule} (@code{malaga})
@cindex @code{pruning-rule} (@code{malaga} option)

In your syntax rules, you may have specified a pruning rule that can prune the
syntax analysis tree, i.e.@ it can reduce the number of parallel paths. If you
want this pruning rule to be executed, use the option @code{pruning-rule}.
Use one of the following arguments:

@table @code
@item set pruning-rule yes
Activate the pruning rule.
@item set pruning-rule no
Deactivate the pruning rule.
@end table

@c ----------------------------------------------------------------------------

@node result-format, result-list, pruning-rule, Options
@section The Option @code{result-format} (@code{malaga})
@cindex @code{result-format} (@code{malaga} option)

With @code{result-format}, you can change the output format for analysed items
that have been recognised. Enter a format string as argument. If the format
string contains spaces, enclose it in double quotes. If the argument is an
empty string (@code{""}), no recognised forms will be shown.

In the format string, the following sequences have a special meaning:

Will be replaced by the result feature structure of the analysis.

Will be replaced by the line number of the analysed form.

Will be replaced by the number of analysis states for this form.

Will be replaced by the reading index (the results for a form are
indexed from 1 to the number of results).

Will be replaced by the surface.

@c ----------------------------------------------------------------------------

@node result-list, robust-rule, result-format, Options
@section The Option @code{result-list} (@code{malaga})
@cindex @code{result-list} (@code{malaga} option)

With this option, you can specify whether you want malaga to pack all
analysis results into a single list when printing. This option only
has an impact in filter mode or when a file is being analysed. Even results
of different lengths are combined; this could not be achieved by an
output filter. Results of different lengths can occur when the option
@code{mor-incomplete} or @code{syn-incomplete} is active.

@table @code
@item set result-list yes
Combine results into a single list.
@item set result-list no
Leave results unchanged.
@end table
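
As a sketch of how the options mentioned above might be combined before a
file is analysed (this is one possible configuration, not a required one):

@example
malaga> set mor-incomplete yes
malaga> set result-list yes
@end example

With both options active, results of different lengths for the same input
should be packed into one list when they are printed.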

@c ----------------------------------------------------------------------------

@node robust-rule, sort-records, result-list, Options
@section The Option @code{robust-rule} (@code{malaga})
@cindex @code{robust-rule} (@code{malaga} option)

With this option, you can specify whether you want to run a robust rule for
the word forms that could not be recognised by LAG rules. The robust rule gets
the surface of an unknown word form as parameter, and it can create one or more
results by executing the @code{result} statement.

@table @code
@item set robust-rule yes
Enable the robust rule.
@item set robust-rule no
Disable the robust rule.
@end table

@c ----------------------------------------------------------------------------

@node sort-records, switch, robust-rule, Options
@section The Option @code{sort-records}
@cindex @code{sort-records} (option)
@cindex order, attribute
@cindex attribute order

There are different ways to determine the order in which the attributes of a
record are printed. With @code{sort-records}, you can choose between three

@table @code
@item set sort-records internal
The attributes will be printed in the order they have internally.
@item set sort-records alphabetic
The attributes will be ordered alphabetically by their names.
@item set sort-records definition
The attributes will be ordered by their names; the order is the same as
in the symbol table.
@end table

@c ----------------------------------------------------------------------------

@node switch, syn-incomplete, sort-records, Options
@section The Option @code{switch}
@cindex @code{switch} (option)

Malaga rules can query simple Malaga values (@dfn{switches}) that you can
change during run time. Use the option @code{switch} to change the values:

@table @code
@item set switch @var{name} @var{value}
Set the switch @var{name}, which must be a symbol, to @var{value}, which
can be any Malaga value.
@end table
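
For instance, two switches might be set as follows. The switch names
@code{tracing} and @code{max_depth} are hypothetical; this sketch assumes
that your grammar defines them as symbols and queries them in its rules:

@example
malaga> set switch tracing yes
malaga> set switch max_depth 3
@end example

The first value is the predefined symbol @code{yes}, the second one is a
number; any other Malaga value could be given in the same way.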

@c ----------------------------------------------------------------------------

@node syn-incomplete, syn-in-filter, switch, Options
@section The Option @code{syn-incomplete} (@code{malaga})
@cindex @code{syn-incomplete} (@code{malaga} option)

If you want to get syntactic analysis results not only for the whole input
line, but for any grammatically well-formed prefix of the sentence, you can use
the option @code{syn-incomplete}:

@table @code
@item set syn-incomplete yes
Accept sentences that have been incompletely parsed.
@item set syn-incomplete no
Only accept sentences that have been completely parsed.
@end table

@c ----------------------------------------------------------------------------

@node syn-in-filter, syn-out-filter, syn-incomplete, Options
@section The Option @code{syn-in-filter} (@code{malaga})
@cindex @code{syn-in-filter} (@code{malaga} option)

Use the option @code{syn-in-filter} to switch the syntax input filter on or
off:

@table @code
@item set syn-in-filter yes
Activate the filter.
@item set syn-in-filter no
Deactivate the filter.
@end table

@c ----------------------------------------------------------------------------

@node syn-out-filter, transmit-line, syn-in-filter, Options
@section The Option @code{syn-out-filter} (@code{malaga})
@cindex @code{syn-out-filter} (@code{malaga} option)

Use the option @code{syn-out-filter} to switch the syntax output filter on
or off:

@table @code
@item set syn-out-filter yes
Activate the filter.
@item set syn-out-filter no
Deactivate the filter.
@end table

@c ----------------------------------------------------------------------------

@node transmit-line, unknown-format, syn-out-filter, Options
@section The Option @code{transmit-line}
@cindex @code{transmit-line} (option)

If you want to use the @code{transmit} function in @code{malaga} or
@code{mallex}, you have to set a command line that starts the transmit
process using the @code{transmit-line} option. Here is an example:

@example
set transmit-line "my-transmit-program"
@end example

@c ----------------------------------------------------------------------------

@node unknown-format, use-display, transmit-line, Options
@section The Option @code{unknown-format} (@code{malaga})
@cindex @code{unknown-format} (@code{malaga} option)

With @code{unknown-format}, you can change the output format for analysed items
that have not been recognised. Enter a format string as argument. If the
format string contains spaces, enclose it in double quotes. If the argument is
an empty string (@code{""}), no unrecognised forms will be shown.

In the format string, the following sequences have a special meaning:

Will be replaced by the line number of the analysed form.

Will be replaced by the number of analysis states for this form.

Will be replaced by the surface.

@c ----------------------------------------------------------------------------

@node use-display, use-ksc, unknown-format, Options
@section The Option @code{use-display}
@cindex @code{use-display} (option)

If you want the output of the commands @code{result} and @code{variables} to be
shown by the @code{Display} process, use the option @code{use-display}:

@table @code
@item set use-display yes
Use the @code{Display} process to show the output of @code{result} and
@code{variables}.
@item set use-display no
Print the output of @code{result} and @code{variables} on your terminal.
@end table

@c ----------------------------------------------------------------------------

@node use-ksc, , use-display, Options
@section The Option @code{use-ksc}
@cindex @code{use-ksc} (option)
@cindex Hangul, romanised
@cindex romanised Hangul

If you are using a Hangul grammar (with @samp{char-set: hangul} in your
project file), you can make Malaga use KSC5601 code as well as romanised

@table @code
@item set use-ksc yes
Print output using KSC5601 code.
@item set use-ksc no
Print output using romanised Hangul.
@end table

@c ----------------------------------------------------------------------------

@node The Language, Index, Options, Top
@chapter The Programming Language Malaga
@cindex Malaga, programming language

@menu
* Characterisation::     The abstract characteristics of the language.
* Source Texts::         General rules for Malaga source files.
* Values::               The types that make up any Malaga data.
* Expressions::          How operators can combine values.
* Conditions::           Expressions yielding a boolean value.
* Boolean Operators::    The operators @code{and}, @code{or} and @code{not}.
* Symbol Table::         All symbols have to be defined here.
* Initial State::        The initial LAG state.
* Constant Definition::  Constants can be used in lexicon and rule files.
* Rules::                Rules are comparable to functions in C.
* Statements::           The atoms of which a rule is constructed.
* Files::                The files that make up a Malaga grammar.
* Syntax Summary::       Formal description of the Malaga syntax.
@end menu

@c ----------------------------------------------------------------------------

@node Characterisation, Source Texts, The Language, The Language
@section Characterisation of Malaga

A Malaga rule file closely resembles a program in a language like Pascal
or C (of course, those languages do not have a Left-Associative Grammar
formalism built in). A Malaga source file must be translated before
execution, just as in compiled languages. However, the
generated Malaga code is not machine code but an @emph{intermediate code}
that has to be executed (@dfn{interpreted}) by an analysis program.
Malaga may be characterised as follows, as far as programming structures and
data structures are concerned:

@table @asis
@item structured values:
The basic values in Malaga are symbols (names that can be used e.g.@ for
categories or subcategories), numbers (floating point numbers), and
strings. Values can be combined to ordered lists or records (also known
as attribute-value matrices). A value in a list or a record can be a list or a
record itself. An ``ambiguous'' symbol like @code{singular_plural} can
be assigned a list of symbols like @code{<singular, plural>}; such a
symbol is called a @dfn{multi symbol}.

@item structured statements:
In Malaga, the concept of statement blocks is implemented in a similar
way as it is in the programming language Pascal. There are structured
control statements to select or repeat a statement sequence. A variable
is always defined @dfn{locally}, i.e.@ it only exists from the point
where it has been defined up to the end of the statement sequence in
which it has been defined.

@item no type restrictions:
Any value can be assigned to a variable and the programmer can freely
define the structure of values.

@item no side effects:
Malaga is, unlike programming languages like Pascal or C, free of side
effects. If a variable gets a value, no other variable will be
changed. Analysis paths are independent of each other.
@end table

A Malaga grammar that contains no recursive subrules and no
@code{repeat} statements is guaranteed to terminate, i.e.@ it can never

In a @code{define} statement, a variable is defined and gets an initial
value. Use an assignment to set a variable that has already been defined

Many generative grammar theories or linguistic programming languages use the
concept of unification of feature structures. Malaga does not use unification,
but it offers some operators to build feature structures explicitly. Since
Malaga does without unification, analyses are much faster.
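
The structured values and the difference between definition and assignment
described in this section can be sketched in a few lines of Malaga. The
variable name, the attribute names and the symbols are hypothetical, and the
exact statement syntax is assumed here rather than taken from this section:

@example
# Define a variable whose value is a record containing a list.
define $word := [class: pronoun, num: <singular, plural>];

# An assignment changes a variable that has already been defined.
$word := [class: pronoun, num: <singular>];
@end example

The record is built explicitly from its parts; no unification of feature
structures is involved.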

@c ----------------------------------------------------------------------------

@node Source Texts, Values, Characterisation, The Language
@section Malaga Source Texts

Source texts in Malaga are format-free; this means that between lexical symbols
(strings, identifiers, keywords, numerals and symbols such as @samp{+},
@samp{~} or @samp{:=}) there may be blanks or newlines (whitespace) or
comments. Between two identifiers or two keywords there @emph{must} be at
least one whitespace character to separate them syntactically.

@menu
* Comments::     How to insert comments in your source file.
* Include::      How to read other files from your source file.
* Identifiers::  Names in Malaga source files.
@end menu

@c ----------------------------------------------------------------------------

@node Comments, Include, Source Texts, Source Texts
@subsection Comments

A comment may be inserted wherever whitespace may be inserted. A
comment begins with the symbol @samp{#} and extends to the end of the line.
Comments are ignored.
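
A minimal sketch of both placements; the variable name is hypothetical, and
the @code{define} statement only serves as something to comment on:

@example
# This whole line is a comment.
define $count := 0;   # a comment may also follow a statement
@end example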

@c ----------------------------------------------------------------------------

@node Include, Identifiers, Comments, Source Texts
@subsection The @code{include} Statement
@cindex @code{include} (statement)

A Malaga file may contain the statement

@example
include "@var{filename}";
@end example

In a rule file, it can stand wherever a rule can stand. In lexicon
files, it can stand in place of a value; in symbol files, it can replace
a symbol definition. The text of the included file is inserted verbatim
at the very location where the @code{include} statement occurs. The file
name has to be given relative to the directory of the file which
contains the @code{include} statement.
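
As an illustration, a rule file might pull in shared subrules from a
subdirectory. The directory and file name are hypothetical and only serve
to show how the relative path is resolved:

@example
# Looked up relative to the directory of the including file.
include "shared/subrules.mor";
@end example

If the including file is located in the directory @file{grammar}, the
statement above would read the file @file{grammar/shared/subrules.mor}.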

@c ----------------------------------------------------------------------------

@node Identifiers, , Include, Source Texts
@subsection Identifiers

In Malaga, names for variables, constants, symbols, and rules (see below
for explanation) are called @dfn{identifiers}. An identifier may consist of
uppercase and lowercase characters, the underscore @samp{_}, the ampersand
@samp{&}, the vertical bar @samp{|}, and, from the second character on,
also of digits. Uppercase and lowercase characters are not distinguished, i.e.,
Malaga is @emph{not} case-sensitive. Malaga keywords must not be used as
identifiers. A variable name must start with a @samp{$}, a constant name
must start with a @samp{@@}. The same identifier may be used as variable
name, constant name, symbol name, or rule name independently. Malaga can
distinguish them by the context in which they occur.

Valid identifiers would be @samp{Noun}, @samp{noun} (the same as the
first), @samp{R2D2}, @samp{Vb_aux}, @samp{A|G|D}, @samp{_INF}.
Identifiers like @samp{2Noun}, @samp{Verb.Frame}, @samp{OK?},
@samp{_~INF} are @emph{not} valid.
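
The following lines sketch how the same identifier can serve as a variable
name and as a symbol name at the same time. They assume that the symbol
@code{Noun} is defined in the symbol table of the grammar; the variable
names are hypothetical:

@example
define $Noun := Noun;    # the variable $Noun and the symbol Noun coexist
define $other := NOUN;   # NOUN denotes the same symbol as Noun
@end example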

@c ----------------------------------------------------------------------------

@node Values, Expressions, Source Texts, The Language

Malaga expressions can have values with very complex structures. A few rules
suffice to describe how such values can be composed from simple values. Simple
values in Malaga are @dfn{symbols}, @dfn{numbers}, and @dfn{strings},
which can be composed to form @dfn{records} and @dfn{lists}.

@menu
* Symbols::  The atomic datatype that is basic to Malaga.
* Numbers::  Floating point numbers, also used for indexes.
* Strings::  A sequence of characters, used to store text.
* Lists::    An ordered sequence of subvalues.
* Records::  A set of attribute-value pairs.
@end menu
2665
@c ----------------------------------------------------------------------------
2667
@node Symbols, Numbers, Values, Values
2671
The central data type in Malaga is the symbol. It is used for describing
2672
syntactic or semantic properties of an allomorph, a word, or a
2673
sentence. A symbol is an identifier like @samp{Verb}, @samp{reflexive},
2674
@samp{Sing_1}. The symbols @samp{nil}, @samp{yes}, @samp{no},
2675
@samp{symbol}, @samp{string}, @samp{number}, @samp{list}, and
2676
@samp{record} are predefined and have special meanings.
2678
@c ----------------------------------------------------------------------------
2680
@node Numbers, Strings, Symbols, Values
2684
A number in Malaga consists of an integer part, an optional fractional
2685
part and an optional exponent of the form @samp{E[+|-]n}. There must be
2686
a dot between the integer part and the fractional part. Examples:
2687
@samp{0}, @samp{1}, @samp{1.0}, @samp{13.75}, @samp{1.2E-5}.
Alternatively, a number may consist of an integer followed by
@samp{L} (indicating that the number is intended as a list index
counting from the @emph{left} border), or by @samp{R} (indicating that
the number is intended as a list index counting from the @emph{right}
border). Examples: @code{5L} = @code{5}, @code{12R} = @code{-12}.
2695
@c ----------------------------------------------------------------------------
2697
@node Strings, Lists, Numbers, Values
A string may consist of any number of characters (it may also be empty). It
must be enclosed in double quotes and must not extend over more than one line.
Within the double quotes there may be any combination of printable characters.
The backslash @samp{\} and the double quote, however, must each be preceded
by a @samp{\} (escape character).
2706
@cindex escape character (@samp{\})
2707
Examples: @code{"Hello"}, @code{"He says: \"Great\""}.
2709
@c ----------------------------------------------------------------------------
2711
@node Lists, Records, Strings, Values
2715
A list is an ordered sequence of values. The values are separated by commas and
2716
enclosed in angle brackets:
2719
<@var{element1}, @var{element2}, ...>
A list may also be empty. The elements of a list may be arbitrarily complex;
they may themselves be lists or records.
2725
@c ----------------------------------------------------------------------------
2727
@node Records, , Lists, Values
A record is a collection of attributes. An @emph{attribute} consists of a
symbol, the @emph{attribute name}, and an associated @emph{attribute value},
which can be an arbitrary Malaga value. The attribute name serves as an access
key for the attribute value, so all attributes in a record must have different
2738
Records are written as follows:
2741
[@var{name1}: @var{value1}, @var{name2}: @var{value2}, ...]
2744
where @var{name i} denotes an attribute name and @var{value i} the associated
2745
attribute value. Example: @code{[Class: Verb, Reg: Reg, Val: dirObj]}.
A record with no attributes, @code{[]}, is called the @dfn{empty record}.
2748
@cindex record, empty
2749
@cindex empty record
2751
@c ----------------------------------------------------------------------------
2753
@node Expressions, Conditions, Values, The Language
2754
@section Expressions
2757
An expression is the form in which a value is used in Malaga. Values can be
2761
[Surf: "he", Class: Pron, Case&Number: S3]
2764
Variables (these are placeholders for values within a rule) can as well be used
2771
Furthermore, constants (placeholders for values in a rule file) can be used as
2778
All three forms can be mixed:
2781
[Surf: "he", Class: Pron, Case&Number: $result]
2784
Furthermore, there are operators which modify values or combine two
2785
values to form a new value. Complex values can be composed using those
2786
operators. All operators have a priority assigned. An operator with
2787
higher priority is applied before an operator with lower priority. If
2788
two operators have the same priority, they are applied from the left to
2789
the right. The order in which the operators are to be applied can be
2790
changed by bracketing with round parentheses @samp{()}.
2791
@cindex priority, operator
2792
@cindex operator priority
2795
@item unary @samp{-}
2799
@item @samp{*}, @samp{/}
2801
@item @samp{+}, @samp{-}
2806
* Malaga Variables:: Containers for Malaga Values in a Rule.
2807
* Constants:: Global containers for Malaga Values.
2808
* Subrule Calls:: Call a subrule from another rule.
2809
* Atoms:: The atoms of a multisymbol.
2810
* Capital:: Does a string begin with a capital letter?
2811
* Floor:: Round down to the next integer.
2812
* Length:: The length of a list or a string.
2813
* Multi:: The multisymbol of the given atoms list.
2814
* Set:: Make a list contain unique elements only.
2815
* Substring:: Get a substring of a string.
2816
* Switch:: Get a user-defined value.
2817
* Transmit:: Call the transmit process.
2818
* Value_String:: Convert a value to a string.
2819
* Value_Type:: Get the type of a value.
2820
* Unary Minus:: Negate a value.
2821
* Operator Dot:: Select an attribute or a list element.
2822
* Operator Plus:: Concat strings, lists or records, or add.
2823
* Operator Minus:: Delete an attribute or an element, or subtract.
2824
* Operator Times:: Intersect lists, concat records, or multiply.
2825
* Operator Divide:: Delete elements from a list, or divide.
2828
@c ----------------------------------------------------------------------------
2830
@node Malaga Variables, Constants, Expressions, Expressions
2831
@subsection Variables
2833
A variable is marked by a @samp{$} preceding its name. The name may be any
2834
valid identifier. A variable is defined by the @code{define} statement; it
2835
receives a value and may from this point on be used in all expressions within
2836
the statement sequence. In such a statement sequence (and all subordinated
2837
statement sequences) a variable with the same name must not be defined again.
2839
@c ----------------------------------------------------------------------------
2841
@node Constants, Subrule Calls, Malaga Variables, Expressions
2842
@subsection Constants
2845
A constant is marked by a @samp{@@} preceding its name. The name may be any
2846
valid identifier. A constant is defined by a constant definition in a rule
2847
file, outside a rule. It is assigned a value and can be used in subsequent
2848
rules and constant definitions in that rule file.
2850
@c ----------------------------------------------------------------------------
2852
@node Subrule Calls, Atoms, Constants, Expressions
2853
@subsection Subrule Invocations
2854
@cindex subrules, calling
2856
A subrule is invoked when an expression
2857
@code{@var{subrule}(@var{value1}, @var{value2}, ...)} is evaluated.
2859
The expression yields the value that is returned by the @code{return}
2860
statement in the subrule.
2861
@cindex @code{return} (statement)
The number of parameters in a subrule invocation must match the number of
parameters in the subrule definition.

A number of default subrules are predefined. They are called
2870
@c ----------------------------------------------------------------------------
2872
@node Atoms, Capital, Subrule Calls, Expressions
2873
@subsection The Function @code{atoms}
2874
@cindex @code{atoms} (function)
2876
The expression @code{atoms(@var{symbol})} yields the list of atomic
2877
symbols for @var{symbol}. If @var{symbol} is not a multi-symbol, it
2878
yields the list @code{<@var{symbol}>}.
2880
@c ----------------------------------------------------------------------------
2882
@node Capital, Floor, Atoms, Expressions
2883
@subsection The Function @code{capital}
2884
@cindex @code{capital} (function)
2886
The expression @code{capital(@var{string})} yields @code{yes} if the
2887
first character of @var{string} is a capital letter, else it yields
2890
@c ----------------------------------------------------------------------------
2892
@node Floor, Length, Capital, Expressions
2893
@subsection The Function @code{floor}
2894
@cindex @code{floor} (function)
2896
The expression @code{floor(@var{number})} yields the largest integer
2897
number that is not greater than @var{number}.
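
As a brief illustration, the following fragment applies @code{floor} to
a positive and a negative number. It is only a sketch: it is assumed to
stand inside a rule body, the variable names are hypothetical, and the
values in the comments are derived from the definition above rather than
taken from a Malaga run.

@example
define $a := floor(2.7);     # 2
define $b := floor(-2.7);    # -3, the largest integer not greater than -2.7
define $c := floor(5);       # 5, an integer is left unchanged
@end example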
2899
@c ----------------------------------------------------------------------------
2901
@node Length, Multi, Floor, Expressions
2902
@subsection The Function @code{length}
2903
@cindex @code{length} (function)
2905
The expression @code{length(@var{list})} yields the number of
2906
elements in @var{list}.
2908
The expression @code{length(@var{string})} yields the number of
2909
characters in @var{string}.
2912
@c ----------------------------------------------------------------------------
2914
@node Multi, Set, Length, Expressions
2915
@subsection The Function @code{multi}
2916
@cindex @code{multi} (function)
The expression @code{multi(@var{list})}, where @var{list} is a list of
symbols, yields the multi-symbol whose atom list corresponds to
@var{list}. If @var{list} contains a single atomic symbol, the
expression yields this symbol.
2923
@c ----------------------------------------------------------------------------
2925
@node Set, Substring, Multi, Expressions
2926
@subsection The Function @code{set}
2927
@cindex @code{set} (function)
2929
The expression @code{set(@var{list})} yields a list which contains
2930
each element of @var{list}, but only once. That means, the list is
2933
@c ----------------------------------------------------------------------------
2935
@node Substring, Switch, Set, Expressions
2936
@subsection The Function @code{substring}
2937
@cindex @code{substring} (function)
2939
The expression @code{substring(@var{string}, @var{start_index},
2940
@var{end_index})} yields the substring of @var{string} that starts at
2941
@var{start_index} and ends at @var{end_index}, both inclusive. A
2942
positive index counts from the string start: @code{1L} is the index of
2943
the first character; a negative index counts from the string end:
2944
@code{1R} is the index of the last character. If @var{end_index} is
2945
omitted, it is assumed to be the same as @var{start_index}, so
2946
@code{substring(@var{string}, @var{index})} yields the character at
2947
@var{index} in @var{string}. If @var{end_index} is less than
2948
@var{start_index}, the function yields an empty string.
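
The index rules above can be sketched with a few calls. The fragment is
assumed to stand inside a rule body, the variable names are hypothetical,
and the results in the comments are derived from the description above,
not captured from a Malaga run.

@example
define $word := "malaga";
define $c1 := substring($word, 1L);        # "m", the first character
define $c2 := substring($word, 1R);        # "a", the last character
define $s1 := substring($word, 1L, 3L);    # "mal"
define $s2 := substring($word, 2R, 1R);    # "ga", the last two characters
define $s3 := substring($word, 3L, 2L);    # "", end index before start index
@end example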
2950
@c ----------------------------------------------------------------------------
2952
@node Switch, Transmit, Substring, Expressions
2953
@subsection The Function @code{switch}
2954
@cindex @code{switch} (function)
The expression @code{switch(@var{symbol})} yields the current value of
the switch associated with @var{symbol}. Use the option @code{switch} to
2960
@c ----------------------------------------------------------------------------
2962
@node Transmit, Value_String, Switch, Expressions
2963
@subsection The Function @code{transmit}
2964
@cindex @code{transmit} (function)
2966
The expression @code{transmit(@var{value})} writes @var{value},
2967
converted to text format, to the transmit process via pipe and reads a
2968
value in text format from the transmit process via pipe. The answer is
2969
converted to the internal Malaga value format and returned as the
2970
result of the expression.
2972
When this function is evaluated, the transmit process is started if it
2973
is not running. The command line of the transmit process is specified by
2974
the option @code{transmit}.
2976
@c ----------------------------------------------------------------------------
2978
@node Value_String, Value_Type, Transmit, Expressions
2979
@subsection The Function @code{value_string}
2980
@cindex @code{value_string} (function)
2982
The expression @code{value_string(@var{value})} returns @var{value}
2983
converted to text format as a string.
2985
@c ----------------------------------------------------------------------------
2987
@node Value_Type, Unary Minus, Value_String, Expressions
2988
@subsection The Function @code{value_type}
2989
@cindex @code{value_type} (function)
2991
The expression @code{value_type(@var{value})} yields the type of
2992
@var{value}. The type information is coded as one of the symbols
2993
@code{symbol}, @code{string}, @code{number}, @code{list}, or
2996
@c ----------------------------------------------------------------------------
2998
@node Unary Minus, Operator Dot, Value_Type, Expressions
2999
@subsection Unary @samp{-}
3001
A @samp{-} in front of a value of type @code{number} negates that value.
3003
@c ----------------------------------------------------------------------------
3005
@node Operator Dot, Operator Plus, Unary Minus, Expressions
3006
@subsection The Operator @samp{.}
3008
This operator may only be used in the following ways:
3011
@item @var{record}.@var{symbol}
3012
This yields the attribute value of the attribute of @var{record} whose
3013
name is @var{symbol}. If there is no attribute in @var{record} whose
3014
name is @var{symbol}, the expression yields the special symbol
3017
@item @var{list}.@var{number}
3018
This yields the element of @var{list} at position @var{number}. If
3019
there is no element at position @var{number} in @var{list}, the
3020
expression yields the special symbol @code{nil}.
3022
@item @var{value}.@var{list}
3023
Here, @var{list} must be a list @code{<@var{e1}, @var{e2}, ...>} of
3024
symbols and/or numbers. This expression serves as an abbreviation for
3025
@code{@var{value}.@var{e1}.@var{e2}...}.
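
The following fragment sketches the three uses of @samp{.} on a nested
value. It is assumed to stand inside a rule body; the variable names and
the symbols @code{Syn}, @code{Args}, @code{Subj}, and @code{DirObj} are
hypothetical and would have to be defined in the symbol table. The
results in the comments follow from the rules above.

@example
define $v := [Syn: [Args: <Subj, DirObj>]];
define $a := $v.Syn.Args.2;        # DirObj, the second list element
define $b := $v.<Syn, Args, 2>;    # DirObj, abbreviation for the line above
define $c := $v.Syn.Args.1R;       # DirObj, the first element from the right
define $d := $v.Syn.Args.3;        # nil, there is no third element
@end example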
3028
@c ----------------------------------------------------------------------------
3030
@node Operator Plus, Operator Minus, Operator Dot, Expressions
3031
@subsection The Operator @samp{+}
3032
@cindex @code{+} (operator)
3034
This operator may only be used in the following ways:
3037
@item @var{string1} + @var{string2}
3038
This yields the concatenation of @var{string1} and @var{string2}.
3040
@item @var{list1} + @var{list2}
3041
This yields the concatenation of @var{list1} and @var{list2}.
3043
@item @var{number1} + @var{number2}
3044
This yields the sum of @var{number1} and @var{number2}.
3046
@item @var{record1} + @var{record2}
This yields a record which consists of all attributes of @var{record1}
and @var{record2}. If @var{record1} and @var{record2} have common
attribute names, the corresponding attributes in the result record will
have the attribute values from @var{record2}, in contrast to the
3054
@c ----------------------------------------------------------------------------
3056
@node Operator Minus, Operator Times, Operator Plus, Expressions
3057
@subsection The Operator @samp{-}
3058
@cindex @code{-} (operator)
3060
This operator may only be used in the following ways:
3063
@item @var{record} - @var{symbol}
3064
This yields @var{record} without the attribute named @var{symbol}, if
3065
@var{symbol} is an attribute name in @var{record}. If not, the
3066
expression yields @var{record}.
3068
@item @var{record} - @var{list}
3069
Here, @var{list} must be a list of symbols. This yields @var{record}
3070
without the attributes in @var{list}.
3072
@item @var{list} - @var{number}
3073
This yields @var{list} without the element at index @var{number}. If
3074
this element does not exist, the expression yields @var{list}.
3076
@item @var{list1} - @var{list2}
3077
This yields the multi-set difference of the two lists @var{list1} and
3078
@var{list2}. This means, it yields the list @var{list1}, but the first
3079
@var{n} appearances of each element will be deleted, if that element
3080
appears @var{n} times in @var{list2}.
3082
@item @var{number1} - @var{number2}
3083
This yields the difference of @var{number1} and @var{number2}.
3086
@c ----------------------------------------------------------------------------
3088
@node Operator Times, Operator Divide, Operator Minus, Expressions
3089
@subsection The Operator @samp{*}
3090
@cindex @code{*} (operator)
3092
This operator may only be used in the following ways:
3095
@item @var{record} * @var{symbol}
3096
This yields the record which only contains the attribute of @var{record}
3097
whose name is @var{symbol}.
3099
@item @var{record1} * @var{record2}
This yields a record which consists of all attributes of @var{record1}
and @var{record2}. If @var{record1} and @var{record2} have common
attribute names, the corresponding attributes in the result record will
have the attribute values from @var{record1}, in contrast to the
3106
@item @var{record} * @var{list}
Here, @var{list} must be a list of symbols. This yields the record which
only contains the attributes of @var{record} whose names are in
3111
@item @var{list1} * @var{list2}
3112
This yields the @dfn{intersection} of the lists interpreted as
3113
multi-sets; if an element is @var{m} times contained in @var{list1} and
3114
@var{n} times contained in @var{list2}, it will be @code{min(@var{m},
3115
@var{n})} times contained in the result.
3117
@item @var{number1} * @var{number2}
3118
This yields the product of @var{number1} and @var{number2}.
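
For records, @samp{+} and @samp{*} differ only in which operand supplies
the value of a shared attribute. The following sketch contrasts them and
also shows the record forms of @samp{-} and @samp{*} with a symbol. It is
assumed to stand inside a rule body; all variable names and symbols are
hypothetical, and the results in the comments are derived from the
operator descriptions (the order of attributes in a result is not
implied).

@example
define $r := [Class: Verb, Tense: Past];
define $s := [Tense: Present, Person: P3];
define $r1 := $r + $s;    # Class: Verb, Tense: Present, Person: P3
define $r2 := $r * $s;    # Class: Verb, Tense: Past, Person: P3
define $r3 := $r - Tense; # [Class: Verb]
define $r4 := $r * Class; # [Class: Verb]
define $r5 := $r - Case;  # $r unchanged, it has no attribute Case
@end example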
3121
@c ----------------------------------------------------------------------------
3123
@node Operator Divide, , Operator Times, Expressions
3124
@subsection The Operator @samp{/}
3125
@cindex @code{/} (operator)
3127
This operator may only be used in the following ways:
3130
@item @var{list1} / @var{list2}
3131
This yields the list which contains all elements of @var{list1} which
3132
are not elements of @var{list2}.
3134
@item @var{list} / @var{number}
3135
This yields the list which contains all elements of @var{list} without
3136
the leftmost @var{number} elements, if @var{number} is positive, or
3137
without the rightmost -@var{number} elements, if @var{number} is
3140
@item @var{number1} / @var{number2}
3141
Here, @var{number2} must not be 0. This yields the quotient of
3142
@var{number1} and @var{number2}.
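
The list forms of @samp{-}, @samp{*}, and @samp{/} are easy to confuse,
so the following sketch applies them to the same list. It is assumed to
stand inside a rule body; the variable names and the symbols @code{a} to
@code{d} are hypothetical, and the results in the comments are derived
from the operator descriptions rather than from a Malaga run.

@example
define $l := <a, b, a, c, a>;
define $d1 := $l - <a, a>;    # <b, c, a>: the first two appearances of a go
define $d2 := $l / <a>;       # <b, c>: every element found in <a> goes
define $d3 := $l * <a, a, d>; # a occurs min(3, 2) = 2 times, d not at all
define $d4 := $l - 2L;        # <a, a, c, a>: without the element at index 2
define $d5 := $l / 2;         # <a, c, a>: without the leftmost 2 elements
define $d6 := $l / (-2);      # <a, b, a>: without the rightmost 2 elements
@end example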
3145
@c ----------------------------------------------------------------------------
3147
@node Conditions, Boolean Operators, Expressions, The Language
3150
@cindex @code{yes} (symbol)
3151
@cindex @code{no} (symbol)
A condition can either be true or false, as in @code{Verb = Verb} or
@code{Verb = Noun}, respectively. An expression that evaluates to
either of the symbols @code{yes} or @code{no} is a valid condition.

A condition can be used wherever a (non-constant) value is needed. It
then evaluates to @code{yes} or @code{no}. In this case, the condition
must be surrounded by parentheses.
3162
* Equals and Not Equals:: Compare any values for equality.
3163
* Comparing numbers:: @samp{<}, @samp{<=}, @samp{>}, @samp{>=}
3164
* Congruent and Not Congruent:: Test for congruency.
3165
* Operator In:: Test an element or attribute for inclusion.
3166
* Regular Expressions:: String patterns.
3169
@c ----------------------------------------------------------------------------
3171
@node Equals and Not Equals, Comparing numbers, Conditions, Conditions
3172
@subsection The Operators @samp{=} and @samp{/=}
3173
@cindex @code{=} (operator)
3174
@cindex @code{/=} (operator)
3176
The condition @code{@var{expr1} = @var{expr2}} tests whether the
3177
expressions @var{expr1} and @var{expr2} are equal. There are several
3181
@item @var{expr1} and @var{expr2} are strings, symbols or numbers.
3182
In this case @var{expr1} and @var{expr2} must be identical.
3183
@item @var{expr1} and @var{expr2} are lists.
3184
In this case @var{expr1} and @var{expr2} must match element by element.
3185
@item @var{expr1} and @var{expr2} are records.
In this case @var{expr1} must contain the same attributes (though not
necessarily in the same order) as @var{expr2}.
3190
For nested structures, equality is tested recursively.
3192
If @var{expr1} and @var{expr2} do not have the same type, the test
3193
results in an error; only the symbol @code{nil} can be compared to any value.
3195
The comparison @code{@var{expr1} /= @var{expr2}} holds iff the
3196
comparison @code{@var{expr1} = @var{expr2}} does not hold.
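
The cases above can be sketched as follows. The fragment is assumed to
stand inside a rule body; the variable names and the symbols are
hypothetical. Each condition is used as a value and is therefore put in
parentheses. The results in the comments are derived from the rules
above.

@example
define $t1 := (<a, b> = <a, b>);              # yes
define $t2 := (<a, b> = <b, a>);              # no, lists match element by element
define $t3 := ([A: x, B: y] = [B: y, A: x]);  # yes, attribute order is irrelevant
define $t4 := ([A: <x>] /= [A: <y>]);         # yes, nested values are compared too
define $t5 := ("he" = nil);                   # no, but allowed: nil compares to any value
# ("he" = 5) would be an error, because the types differ.
@end example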
3198
@c ----------------------------------------------------------------------------
3200
@node Comparing numbers, Congruent and Not Congruent, Equals and Not Equals, Conditions
3201
@subsection The Operators @code{less}, @code{less_equal}, @code{greater}, @code{greater_equal}
3202
@cindex @code{less} (operator)
3203
@cindex @code{less_equal} (operator)
3204
@cindex @code{greater} (operator)
3205
@cindex @code{greater_equal} (operator)
3207
A condition of type @code{@var{expr1} @var{operator} @var{expr2}} compares
3208
two numbers. Here, @var{operator} can have the following values:
3221
If either @var{expr1} or @var{expr2} is not a number, an error will be
3224
@c ----------------------------------------------------------------------------
3226
@node Congruent and Not Congruent, Operator In, Comparing numbers, Conditions
3227
@subsection The Operators @samp{~} and @samp{/~}
3228
@cindex @code{~} (operator)
3229
@cindex @code{/~} (operator)
3231
The operator @samp{~} can be used in the following ways:
3234
@item @var{list1} ~ @var{list2}
3235
This tests whether @var{list1} and @var{list2} do @dfn{congruate},
3236
this means, whether they have an element in common.
3238
@item @var{symbol1} ~ @var{symbol2}
3239
This tests if @code{atoms(@var{symbol1})} and
3240
@code{atoms(@var{symbol2})}, the lists of their atomic symbols, do
3244
The comparison @code{@var{expr1} /~ @var{expr2}} holds iff the
3245
comparison @code{@var{expr1} ~ @var{expr2}} does not hold.
3247
@c ----------------------------------------------------------------------------
3249
@node Operator In, Regular Expressions, Congruent and Not Congruent, Conditions
3250
@subsection The Operator @code{in}
3251
@cindex @code{in} (operator)
3253
The operator @code{in} can only be used in the following ways:
3256
@item @var{symbol} in @var{record}
3257
This condition holds if and only if @var{record} contains an attribute named
3260
@item @var{value} in @var{list}
3261
This condition holds if and only if @var{value} is an element of
3265
@c ----------------------------------------------------------------------------
3267
@node Regular Expressions, , Operator In, Conditions
3268
@subsection The @code{matches} Condition (Regular Expressions)
3269
@cindex @code{matches} (operator)
3270
@cindex expressions, regular
3271
@cindex regular expressions
3272
@cindex patterns, string
3273
@cindex string patterns
3276
@code{@var{expr} matches @var{pattern}}
3278
@code{@var{expr} matches (@var{pattern})}
3279
interprets @var{pattern} as a pattern (a regular expression) and
3280
tests whether @var{expr} matches @var{pattern}. Patterns are defined as
3284
@item @var{pattern} ::= @var{alternative} @{@samp{|} @var{alternative}@}
3285
The string must be identical with one of the alternatives.
3287
@item @var{alternative} ::= @{@var{atom} [@samp{*} | @samp{?} | @samp{+}]@}
3288
An alternative is a (possibly empty) sequence of atoms. An atom in a
3289
pattern corresponds to a character in a string. By using an optional
3290
postfix operator it is possible to specify for any atom how often it may
3291
be repeated within the string at that location: zero times or
3292
once (@samp{?}), at least once (@samp{+}), or arbitrarily often,
3293
including zero times (@samp{*}).
3295
Normally, these operators are @emph{greedy}, i.e. they try to match as
3296
much as possible. If you put a @samp{?} behind a postfix operator, it
3297
will try to match as few characters as possible. This can make a
3298
difference if you're assigning variables in your pattern.
3300
@item @var{atom} ::= @samp{(} @var{pattern} @samp{)}
3301
A pattern may be grouped by parentheses.
3303
@item @var{atom} ::= @samp{[} [@samp{^}] @var{range} @{@var{range}@} @samp{]}
3304
A character class. It represents exactly one character from one of the
3305
ranges. If the symbol @samp{^} is the first one in the class, the
3306
expression represents exactly one character that is @emph{not} contained
3307
in one of the ranges.
3309
@item @var{atom} ::= @samp{.}
3310
Represents any character.
3312
@item @var{atom} ::= @var{character}
3313
Represents the character itself.
3315
@item @var{range} ::= @var{character1} [@samp{-} @var{character2}]
3316
The range contains any character with a code at least as big as the code
3317
of @var{character1} and not bigger than the code of
3318
@var{character2}. The code of @var{character2} must be at least as big
3319
as the code of @var{character1}. If @var{character2} is omitted, the
3320
range only contains @var{character1}.
3322
@item @var{character} ::= Any character except @samp{*?+[]^-.\|()}
3323
To use one of the characters @samp{*?+[]^-.|()}, it must be preceded by
3324
a @samp{\\} (pattern escape). To insert the pattern escape itself, you
3325
have to double it: @samp{\\\\}.
3329
You can divide the pattern into segments:
3332
$surf matches ("un|in|im|ir|il", ".*", "(en)?")
3338
$surf matches ("(un|in|im|ir|il).*(en)?")
3341
A section of the string can be stored in a variable by suffixing the respective
3342
pattern with @samp{: @var{variable_name}}, as in
3345
$surf matches ("un|in|im|ir|il": $a, ".*")
3348
For backwards compatibility, you may also prefix the pattern with the variable
3352
$surf matches $a: "un|in|im|ir|il", ".*"
3355
The variables defined by pattern matching are only defined in the statement
3356
sequence which is being executed if the pattern matching is successful.
3357
A @code{matches} condition may not have variable definitions in it if it
3362
contained in a disjunction (an @code{or} condition),
3364
contained in a negation (a @code{not} condition), or
3366
used as a truth value (e.g. in an assignment).
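
The effect of greedy and non-greedy operators on variable definitions
can be sketched with the conditions below. They are shown on their own,
without the surrounding statement; @code{$surf} is assumed to hold the
string @code{"unbelievable"}, and the other variable names are
hypothetical. The values in the comments are derived from the pattern
rules above, not captured from a Malaga run.

@example
# Assumption: $surf = "unbelievable"

$surf matches ("un|in|im|ir|il": $prefix, ".*": $rest)
# holds; $prefix = "un", $rest = "believable"

$surf matches (".*": $a, "e.*")
# holds; the greedy ".*" takes as much as possible: $a = "unbelievabl"

$surf matches (".*?": $a, "e.*")
# holds; the non-greedy ".*?" takes as little as possible: $a = "unb"
@end example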
3369
@c ----------------------------------------------------------------------------
3371
@node Boolean Operators, Symbol Table, Conditions, The Language
3372
@section The Operators @code{not}, @code{and}, and @code{or}
3373
@cindex @code{not} (operator)
3374
@cindex @code{and} (operator)
3375
@cindex @code{or} (operator)
3376
@cindex boolean operators
3378
Conditions can be combined logically:
3381
@item not @var{cond}
3382
This is true if condition @var{cond} is false.
3384
@item @var{cond1} and @var{cond2} and @var{cond3} and ...
3385
This is true if all conditions @var{cond1}, @var{cond2}, @var{cond3},
3386
... are true. The conditions are tested one by one from left to right
3387
until one of them is false. This is called @dfn{short-cut evaluation}.
3389
@item @var{cond1} or @var{cond2} or @var{cond3} or ...
3390
This is true if at least one of the conditions @var{cond1},
3391
@var{cond2}, @var{cond3}, ... is true. The conditions are tested one
3392
by one from left to right until one of them is true. This is also a form
3393
of short-cut evaluation.
3396
The operator @code{not} takes exactly one argument. If its argument contains
3397
another logical operator, put it in parentheses @samp{()}, as in
3398
@code{not (@var{cond1} or @var{cond2})}.
3400
The operators @code{and} and @code{or} may not be mixed as in
3401
@code{@var{cond1} and @var{cond2} or @var{cond3}}; here the order of
3402
evaluation would be ambiguous. Use parentheses @samp{()} to indicate in which
3403
order the condition is to be evaluated, as in
3404
@code{(@var{cond1} and @var{cond2}) or @var{cond3}}.
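
A short sketch of combined conditions follows. It is assumed to stand
inside a rule body in which the hypothetical variables @code{$cat} and
@code{$num} are already defined; the symbols are hypothetical as well.
Since each condition is used as a value here, it is enclosed in
parentheses.

@example
# Parentheses make the grouping of "and" and "or" explicit.
define $ok := (($cat = Verb and $num /= nil) or $cat = Noun);

# The argument of "not" contains "or", so it is put in parentheses.
define $other := (not ($cat = Verb or $cat = Noun));

# Not allowed, because "and" and "or" are mixed without parentheses:
#   $cat = Verb and $num /= nil or $cat = Noun
@end example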
3406
@c ----------------------------------------------------------------------------
3408
@node Symbol Table, Initial State, Boolean Operators, The Language
3409
@section The Symbol Table
3410
@cindex symbol table
3411
@cindex symbol definition
3413
Every symbol used in a grammar has to be defined at least once in the
3414
@dfn{symbol table}. Every symbol must be followed by a semicolon:
3415
@code{verb; noun; adjective;}
Symbols that are defined in this way are called @dfn{atoms}. A
symbol can also be defined as a @dfn{molecule}. In that case, the entry for
this symbol has the following format:
3424
@var{symbol} := @var{list};
3427
The @var{list} for this symbol must consist of at least two atoms; no atom may
3428
occur more than once in the list. This list will be used by the operators
3429
@samp{~} and @samp{/~}, @code{atoms}, and @code{multi}. The
3430
lists in the symbol table must be different from each other; it does not
3431
suffice that they only differ in the order of their elements. If a symbol is
3432
defined more than once in the symbol table, the definitions must all match:
3433
Either the symbol must always be defined atomic or it must always be molecular
3434
with the same atom-list.
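
As a sketch, a symbol file for a small case system might look as
follows. All symbol names are hypothetical; the molecule names use the
vertical bar, which is a legal identifier character.

@example
# Atoms
Nom; Gen; Dat; Acc;

# Molecules: at least two atoms each, no atom repeated,
# and no two molecules with the same set of atoms.
Nom|Acc := <Nom, Acc>;
Gen|Dat := <Gen, Dat>;
@end example

With these definitions, and following the descriptions of @code{atoms},
@code{multi}, and @samp{~}, a rule could contain statements such as the
ones below (the variable names are again hypothetical):

@example
define $l := atoms(Nom|Acc);         # <Nom, Acc>
define $m := multi(<Gen, Dat>);      # Gen|Dat
define $t := (Nom|Acc ~ Acc);        # yes, both atom lists contain Acc
define $f := (Nom|Acc ~ Gen|Dat);    # no, the atom lists share no element
@end example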
3436
@c ----------------------------------------------------------------------------
3438
@node Initial State, Constant Definition, Symbol Table, The Language
3439
@section The Initial State
3440
@cindex state, initial
3441
@cindex initial state
3443
The initial state in a combination rule file is defined as follows:
3446
initial @var{value}, rules @var{rule1}, @var{rule2}, ...;
3449
The initial state of a combi rule file specifies a feature structure and a list
3450
of rules (behind the keyword @code{rules}). Each of the rules will be applied
3451
to read in the first allomorph (in morphology) or word form (in syntax). The
3452
list may be enclosed in parentheses.
3454
@cindex failing rule
3455
@cindex rule, failing
3456
@cindex successful rule
3457
@cindex rule, successful
3458
A combi rule or an end rule is successful if it creates at least one
3459
new state, otherwise it fails. If you want rules to be executed only
3460
if all other rules failed, you can put their names behind the other rules'
3461
names and write an @code{else} in front of them:
3464
initial @var{value}, rules @var{rule1}, @var{rule2} else
3465
@var{rule3}, @var{rule4} else ...;
3468
If both rules @var{rule1} and @var{rule2} fail, @var{rule3} and
3469
@var{rule4} are executed. If these rules also fail, the next rules are
3470
executed, and so on.
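
A concrete initial state might be sketched like this. The rule names and
the content of the feature structure are hypothetical; only the overall
form is taken from the description above.

@example
# Start with an empty feature structure. First try Noun_Start and
# Verb_Start; only if both fail, try Unknown_Start.
initial [], rules Noun_Start, Verb_Start else Unknown_Start;
@end example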
3472
@c ----------------------------------------------------------------------------
3474
@node Constant Definition, Rules, Initial State, The Language
3475
@section The Constant Definition
3476
@cindex constant definition
3477
@cindex definition, constant
3479
A constant definition is of the form
3482
define @@@var{constant} := @var{expr};
3485
The constant expression @var{expr} will be evaluated and the constant
3486
@@@var{constant} will be defined to have this value. The constant must
3487
not have been defined previously. The constant is valid from this
3488
definition up to the end of the rule file. If you use the keyword
3489
@code{default} instead of @code{define}, you provide a default value for
3490
@@@var{constant}. This means, the value is only preliminary and may be
3491
changed by a normal constant definition. After a constant has been used
3492
in an expression, its value may not be changed any more.
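
The interplay of @code{default} and @code{define} can be sketched as
follows. The constant names are hypothetical, and the comments restate
the rules above rather than the output of a Malaga run.

@example
default @@max_len := 10;          # preliminary value
define @@max_len := 20;           # allowed: replaces the default value
define @@limit := @@max_len * 2;   # 40; @@max_len has now been used

# From here on, neither of the following would be accepted:
#   define @@max_len := 30;       # @@max_len is already defined and used
#   define @@limit := 50;         # @@limit is already defined
@end example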
3494
@c ----------------------------------------------------------------------------
3496
@node Rules, Statements, Constant Definition, The Language
3499
@cindex @code{allo_rule} (rule)
3500
@cindex @code{combi_rule} (rule)
3501
@cindex @code{end_rule} (rule)
3502
@cindex @code{pruning_rule} (rule)
3503
@cindex @code{robust_rule} (rule)
3504
@cindex @code{input_filter} (rule)
3505
@cindex @code{output_filter} (rule)
3506
@cindex @code{subrule} (rule)
3508
A rule is a sequence of statements that is executed as a unit:
3511
combi_rule @var{name}(@var{$param1}, @var{$param2}, ...):
3518
A rule has to begin with one of the keywords @code{allo_rule},
3519
@code{combi_rule}, @code{end_rule}, @code{pruning_rule},
3520
@code{robust_rule}, @code{input_filter}, @code{output_filter} or
3521
@code{subrule}. The keyword is followed by the rule name and the @emph{parameter list}, a list of
3522
variable names. The variables will be assigned the
3523
parameter values when the rule is executed. The number of parameters
3524
depends on the rule type. The rule types have the following meanings:
3527
@item allo_rule (@var{$lex_entry})
3528
An allo-rule must occur exactly once in an allomorph rule file. It
3529
analyses a lexical entry and must generate one or more allomorph entries
3530
via @code{result}. An allomorph rule has one parameter, namely the
3533
@item combi_rule (@var{$state}, @var{$link}, @var{$surf}, @var{$index})
3534
Any number of combi-rules may occur in a combi-rule file. Before processing
3535
such a rule, the @dfn{link} is read in, which is either the word form or
3536
the allomorph that follows the state's surface. The first parameter of the rule
3537
is the state's feature structure, the second is the link's feature structure,
3538
the third is the link's surface, and the fourth is the link's index. The third
3539
and the fourth parameter are optional. A combi-rule may state a successor rule
3540
set or accept the analysed input (both via @code{result}).
3542
@item end_rule (@var{$state}, @var{$remain_input})
3543
Any number of end-rules may occur in a combi-rule file. The first parameter is
3544
the state's feature structure, the second, which is optional, is the remaining
3545
input. If the rule takes only one parameter, it is only called if the remaining
3546
input is empty or begins with a space. An end-rule may accept the analysed
3547
input via @code{result}.
3549
@item pruning_rule (@var{$list})
3550
A pruning-rule may occur at most once in a syntax rule file. During
3551
syntax analysis, it can decide which states are still valid and which
3552
are to be deleted. The parameter is a list of feature structures of the states
3553
that have consumed the same input so far. The pruning-rule must execute
3554
a @code{return} statement with a list of the symbols @code{yes} and/or
3555
@code{no}. Each state in @var{$list} corresponds to a symbol in the
3556
result list. If the symbol is @code{yes}, the corresponding state is
3557
preserved. If the symbol is @code{no}, the state is abandoned.
3559
@item robust_rule (@var{$surface}, @var{$remain_input})
3560
A robust-rule may occur at most once in a morphology rule file. If
3561
robust analysis has been switched on by the @code{robust} command, and a
3562
word form could not be recognised by the combi-rules, the robust-rule is
3563
executed with the surface of the next word form as its first
3564
parameter. The next word form is defined as the remaining input up to
3565
(but excluding) the next space. The optional second parameter contains
3566
the whole remaining input. A robust-rule can accept any prefix of the
3567
remaining input via @code{result}.
3569
@item input_filter (@var{$feature_structure_list})
3570
An input-filter may occur at most once in a syntax rule file. The
3571
input-filter is called after a word form has been analysed. It gets one
3572
parameter, namely the list of the analysis results, and it transforms it
3573
to one or more filtered results (via @code{result}).
3575
@item output_filter (@var{$feature_structure_list})
3576
An output-filter may occur at most once in any rule file.
3579
@item In allo-rule files:
3580
The output-filter is called after all lexicon entries have been processed
3581
by the allo-rules. The filter is called for every allomorph surface. It
3582
gets one parameter, namely the list of the generated feature structures with
3583
that surface, and it transforms it to one or more filtered allomorph
3584
feature structures (via @code{result}).
3586
@item In combi-rule files:
3587
The output-filter is called after an item has been analysed. It gets one
3588
parameter, namely the list of the analysis results, and it transforms it
3589
to one or more filtered results (via @code{result}).
3592
@item subrule (@var{$param1}, @var{$param2}, ...)
3593
Any number of subrules may occur in any rule file. A subrule can be
3594
invoked from other rules and it must return a value to this rule via
3595
@code{return}. It can have any number of parameters (at least one).
3598
If a rule is executed, all statements in the rule are processed sequentially.
3599
After that, the rule execution is terminated. However, the @code{if} statement,
3600
the @code{foreach} statement, and the @code{parallel} statement may change the
3601
processing order. Special conditions apply if:
3605
A condition in a @code{require} statement does not hold. In this case the
3606
processing of the current rule path is terminated. This is not an error.
3608
The @code{stop} statement was executed. In this case the
3609
processing of the current rule path is terminated. This is not an error.
3611
An @code{assert} condition does not hold. In this case the processing of
3612
the whole grammar is terminated and an error message is displayed. This rule
3613
termination can be used to find bugs in the rule system or in the lexicon.
3615
The @code{error} statement was executed. In this case the processing of
3616
the whole grammar is terminated and an error message is displayed.
3618
The @code{return} statement was executed in a subrule or in a pruning
3619
rule. In a subrule, this terminates the subrule in the current rule path and
3620
immediately returns to the calling rule. In a pruning rule, this terminates
3624
@c ----------------------------------------------------------------------------
3626
@node Statements, Files, Rules, The Language
3630
A rule body contains a sequence of statements.
3632
The statements are the assignment and the statements beginning with
3633
@code{assert}, @code{break}, @code{choose}, @code{define}, @code{error},
3634
@code{foreach}, @code{if}, @code{parallel}, @code{repeat},
3635
@code{require}, @code{result}, @code{return}, and @code{stop}.
3638
* Assert:: Report an error if condition is not met.
3639
* The Assignment:: Assign a new value to a variable.
3640
* Break:: Break a @code{foreach} loop.
3641
* Choose:: Branch the current path for different values.
3642
* Define:: Define a new variable.
3643
* Error:: Report an error.
3644
* Foreach:: Repeat statements for a given number of iterations.
3645
* If:: Conditionally execute statements.
3646
* Parallel:: Branch the current path for different statements.
3647
* Repeat:: Repeat statements for an unknown number of iterations.
3648
* Require:: Terminate the current path if condition is not met.
3649
* Result:: Emit a result in a rule.
3650
* Return:: Terminate the current subrule and return a value.
3651
* Stop:: Terminate a path.
3655
@c ----------------------------------------------------------------------------
3657
@node Assert, The Assignment, Statements, Statements
3658
@subsection The @code{assert} Statement
3660
The statement @code{assert @var{condition};} or @code{!
3661
@var{condition};} tests whether @var{condition} holds. If this is not
3662
the case, an error message with the line number in the source code is
3663
printed and the processing of @emph{all} paths is terminated.
3665
The @code{assert} statement should be used to check whether there are
3666
structural flaws in the lexicon or the rule system.
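
For example, a rule could verify an assumption about its parameter before
relying on it. This is only a sketch: the variable @code{$lex_entry} and the
attribute and symbol names are hypothetical.

@example
# Abort the whole analysis if a lexicon entry has an unexpected class.
assert $lex_entry.Class = noun or $lex_entry.Class = verb;

# The same kind of check, written with the short form.
! $lex_entry.Surf /= "";
@end example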
3668
@c ----------------------------------------------------------------------------
3670
@node The Assignment, Break, Assert, Statements
3671
@subsection The Assignment
3674
To set the value of an already defined variable to a different value, use a
3675
statement of the following form:
3678
@var{$var} := @var{expr};
3681
The expression @var{expr} is evaluated and the result is assigned to the
3682
variable @var{$var}. The variable must have already been defined.
3684
You can assign the elements of a list value to multiple variables at once:
3685
@cindex list assignment
3688
<@var{$var1}, @var{$var2}, ... > := @var{expr};
3691
The first, second, ... element of @var{expr}, which must be a list, is
3692
assigned to variable @var{$var1}, @var{$var2}, ... respectively. Any of
3693
these variables may be followed by a path.
3694
The number of variables must match the length of the list value.
3696
You can optionally specify a path behind the variable that is to be set by an
3700
@var{$var}.@var{part1}.@var{part2} := @var{value};
3703
In this case, only the value of @code{@var{$var}.@var{part1}.@var{part2}}
3704
will be set to @var{value}; the remainder of the variable @var{$var}
3705
will be unchanged. Each @var{part} must be an expression that evaluates
3706
to a symbol, a number or a list of symbols and numbers.
3708
You can also use one of four other assignment operators instead of the operator
3709
@samp{:=}: The statement @code{@var{$var} :=+ @var{value};} is a
3710
shorthand for @code{@var{$var} := @var{$var} + @var{value};}. The
3711
same holds for the assignment operators @samp{:=-}, @samp{:=*}, and
3712
@samp{:=/}. Here, @var{$var} may be followed by a path again.
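
The assignment forms described in this section can be sketched in one
fragment of a rule body. All variable, attribute, and symbol names are
invented for the illustration:

@example
# Variables must be defined before they can be assigned to.
define $entry := [Class: noun, Count: 1];
define $first := 0;
define $second := 0;

$entry.Class := verb;             # only the Class attribute changes
$entry.Count :=+ 2;               # same as $entry.Count := $entry.Count + 2;
<$first, $second> := <10, 20>;    # list assignment to two variables
$second :=* 2;                    # same as $second := $second * 2;
@end example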
3714
@c ----------------------------------------------------------------------------
3716
@node Break, Choose, The Assignment, Statements
3717
@subsection The @code{break} Statement
3718
@cindex @code{break} (statement)
3720
The @code{break} statement leaves the @code{foreach} loop labelled @var{label}.
3726
If the label is omitted, it leaves the innermost @code{foreach} loop it is
3727
contained in. The break statement must be situated in the body of the
3728
@code{foreach} loop it is meant to leave.
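
A labelled loop with an early exit might be written as in the following
sketch. The label @code{search} and the variables @code{$items} and
@code{$wanted} are hypothetical; both variables are assumed to have been
defined earlier in the rule:

@example
define $found := no;
search: foreach $item in $items:
  if $item = $wanted then
    $found := yes;
    break search;    # leaves the loop labelled "search"
  end if;
end foreach;
@end example

Since there is only one loop here, a plain @code{break;} would leave the
same loop.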
3730
@c ----------------------------------------------------------------------------
3732
@node Choose, Define, Break, Statements
3733
@subsection The @code{choose} Statement
3734
@cindex @code{choose} (statement)
3736
The @code{choose} statement chooses an element of a list. Its format
3740
choose @var{$var} in @var{expr};
3743
For every element in the list @var{expr} a rule path is created; in this
3744
rule path the element is stored in the variable @var{$var}. Thus the
3745
number of rule paths can multiply. If, for example, @var{expr} has the
3746
value @code{<A, B, C>}, the currently processed rule path has three
3747
continuations: In the first one @var{$var} has the value @code{A}, in
3748
the second one it has the value @code{B} and in the third one it has the
3749
value @code{C}. The three paths behave independently from now on.
3751
The @code{choose} statement can also be used for records. In that case, the
3752
variable @var{$var} gets a different attribute name of the record
3753
@var{expr} in each path.
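
The branching behaviour for a list can be sketched as follows. The symbols
are hypothetical and would have to be defined in the symbol file:

@example
# Creates three rule paths, one for each list element.
choose $case in <nominative, genitive, dative>;

# This test runs separately in each path; the genitive path ends here,
# while the other two paths continue.
require $case /= genitive;
@end example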
3755
The @code{choose} statement also works for numbers:
3758
If @var{expr} is a positive number @var{n}, the variable @var{$var} is
3759
assigned the numbers 1, 2, ..., @var{n}, respectively, in each path.
3761
If @var{expr} is a negative number @var{-n}, the variable @var{$var} is
3762
assigned the numbers -1, -2, ..., @var{-n}, respectively, in each path.
3765
@c ----------------------------------------------------------------------------
3767
@node Define, Error, Choose, Statements
3768
@subsection The @code{define} Statement
3769
@cindex @code{define} (statement)
3771
A @code{define} statement is of the form
3773
define @var{$var} := @var{expr};
3775
The expression @var{expr} is evaluated and the result is assigned to the
3776
variable @var{$var}. The variable must not have been defined before this statement;
3777
it is defined by the statement and only exists until the statement sequence in
3778
which the definition is situated has been fully processed.
3780
You can assign the elements of a list value to multiple variables at once:
3782
define <@var{$var1}, @var{$var2}, ... > := @var{expr};
3784
The first, second, ... element of @var{expr}, which must be a list, is
3785
assigned to the new variable @var{$var1}, @var{$var2}, ... respectively.
3786
The number of variables must match the length of the list value.
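
Both forms are shown in this short sketch; the variable names and values
are made up for the illustration:

@example
# Define a single new variable.
define $stem := "walk";

# Define two new variables from a two-element list.
define <$person, $number> := <3, singular>;
@end example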
3788
@c ----------------------------------------------------------------------------
3790
@node Error, Foreach, Define, Statements
3791
@subsection The @code{error} Statement
3792
@cindex @code{error} (statement)
3794
The statement @code{error} terminates the execution of @emph{all} paths and
3795
prints out the given expression, which must be a string, and the line of the
3799
error @var{message};
3802
@c ----------------------------------------------------------------------------
3804
@node Foreach, If, Error, Statements
3805
@subsection The @code{foreach} Statement
3806
@cindex @code{foreach} (statement)
3808
You may wish to manipulate all elements of a list or a record
3809
@emph{sequentially} in @emph{one} rule path. For this purpose, the
3810
@code{foreach} statement was introduced. It has the following format:
3813
foreach @var{$var} in @var{expr}:
3818
Sequentially, @var{$var} is assigned a number of values, depending on the
3819
type of @var{expr}, and the statement sequence @var{statements} is executed
3820
for each of those assignments. Each time @var{statements} is executed,
the variable @var{$var} is defined anew. Its scope is the
3822
block @var{statements}.
3826
If @var{expr} is a list, @var{$var} is assigned the first, second,
3827
third, ... element of @var{expr}.
3829
If @var{expr} is a record, @var{$var} is assigned the first, second,
3830
... attribute name of @var{expr}.
3832
If @var{expr} is a positive number @var{n}, the variable @var{$var} is
3833
assigned the numbers 1, 2, ..., @var{n} sequentially.
3835
If @var{expr} is a negative number @var{-n}, the variable @var{$var} is
3836
assigned the numbers -1, -2, ..., @var{-n} sequentially.
3838
If @var{expr} is an empty list, an empty record or the number 0, the
3839
foreach loop is terminated immediately.
3842
@c ----------------------------------------------------------------------------
3844
@node If, Parallel, Foreach, Statements
3845
@subsection The @code{if} Statement
3846
@cindex @code{if} (statement)
3847
@cindex @code{else} (keyword)
3848
@cindex @code{elseif} (keyword)
3850
An @code{if} statement has the following form:
3853
if @var{condition1} then
3855
elseif @var{condition2} then
3862
The @code{elseif} part may be repeated any number of times (including zero);
3863
the @code{else} part may be omitted.
3865
First, @var{condition1} is evaluated. If it is satisfied, the
3866
statement sequence @var{statements1} is executed.
3868
If the first condition is not satisfied, @var{condition2} is evaluated; if
3869
the result is true, @var{statements2} is executed. This procedure is
3870
repeated for every @code{elseif} part until a condition is satisfied.
3872
If the @code{if} condition and @code{elseif} conditions fail, the statement
3873
sequence @var{statements3} is executed (if it exists).
3875
After the @code{if} statement has been processed, the following statement is
3878
The @code{if} after the @code{end} may be omitted.
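
A complete @code{if} statement with all three parts might look like this
sketch. The variable @code{$number} is assumed to be defined already, and
the symbols and endings are invented:

@example
# Define the variable outside the branches so that it survives them.
define $ending := "";

if $number = plural then
  $ending := "s";
elseif $number = dual then
  $ending := "e";
else
  $ending := "";
end if;
@end example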
3880
@c ----------------------------------------------------------------------------
3882
@node Parallel, Repeat, If, Statements
3883
@subsection The @code{parallel} Statement
3884
@cindex @code{parallel} (statement)
3886
By using the @code{parallel} statement, more than one continuation of an
3887
analysis can be generated. Its format is:
3900
This creates as many rule paths as there are statement sequences. In the
3901
first rule path, @var{statements1} are executed, in the second one
3902
@var{statements2} are executed, etc. Each rule path continues by
3903
executing the statements following the @code{parallel} statement.
3905
The keyword @code{parallel} behind the @code{end} can be omitted.
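
As an illustration, the following sketch creates two rule paths that differ
only in the value of one variable. The variable and symbol names are
hypothetical:

@example
define $reading := nil;

parallel
  $reading := noun;
or
  $reading := verb;
end parallel;

# Both rule paths continue here, each with its own value of $reading.
@end example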
3907
@c ----------------------------------------------------------------------------
3909
@node Repeat, Require, Parallel, Statements
3910
@subsection The @code{repeat} Statement
3911
@cindex @code{repeat} (statement)
3913
You may wish to repeat a sequence of statements while a specific condition
3914
holds. This can be realised by the @code{repeat} loop. It has the following
3920
while @var{condition};
3925
The statements @var{statements1} are executed. Then, @var{condition}
3926
is tested. If it holds, the @var{statements2} are
3927
executed and the @code{repeat} statement is executed again. If @var{condition}
3928
does not hold, execution proceeds after the @code{repeat} statement.
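
The order of execution can be traced with a small sketch that uses both
statement sequences. The variable names are made up:

@example
define $n := 0;
define $sum := 0;

repeat
  $n :=+ 1;                # statements before the test
while $n less_equal 3;
  $sum :=+ $n;             # statements after the test
end repeat;

# The test fails when $n reaches 4, so $sum is 1 + 2 + 3 = 6 here.
@end example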
3930
If @var{statements1} is empty, the @code{repeat} loop is equivalent to a
3934
repeat while @var{condition};
3939
If @var{statements2} is empty, the @code{repeat} loop is equivalent to a
3945
while @var{condition};
3949
@c ----------------------------------------------------------------------------
3951
@node Require, Result, Repeat, Statements
3952
@subsection The @code{require} Statement
3953
@cindex @code{require} (statement)
3955
A statement of the form
3958
require @var{condition};
3967
tests whether @var{condition} is true. If this is not the case the rule path
3968
is terminated @emph{without} an error message. @code{require} statements should be used to
3969
decide whether the combination of a state and a link is grammatical.
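
In a combi-rule, such tests might look like the following sketch. The
attribute and symbol names are hypothetical:

@example
# End this rule path silently unless a suffix follows a stem.
require $state.Class = stem;

# The same kind of test, written with the short form.
? $link.Class = suffix;
@end example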
3971
@c ----------------------------------------------------------------------------
3973
@node Result, Return, Require, Statements
3974
@subsection The @code{result} Statement
3975
@cindex @code{result} (statement)
3976
@cindex @code{accept} (keyword)
3979
@item In combi rules:
3983
result @var{expr}, rules @var{rule1}, @var{rule2}, ...;
3986
specifies the Result feature structure of the rule and the successor rules. The
3987
value @var{expr} is the Result feature structure. Behind the keyword
3988
@code{rules} the names of all successor rules are enumerated. For every
3989
successor rule that is being executed a new rule path will be created. The rule
3990
set may be enclosed in parentheses.
3992
If you want successor rules to be executed only if no other rule has
3993
been successful, you can put their names behind the other rules' names
3994
and write an @code{else} in front of them:
3998
rules @var{rule1}, @var{rule2} else @var{rule3}, @var{rule4} else ...;
4001
If none of the normal rules (here: @var{rule1} and @var{rule2}) has been
4002
successful, @var{rule3} and @var{rule4} are executed. If these rules also fail,
4003
the next rules are executed, and so on. A rule has been successful if at least
4004
one @code{result} statement has been executed.
4006
@item In combi-rules and end-rules:
4007
If the input is to be accepted by the @code{result} statement (and
4008
therefore no successor rules are to be called) the following format has
4012
result @var{expr}, accept;
4015
If this statement is reached in a rule path, the input is accepted as
4016
grammatically well-formed. The value @var{expr} is returned as the
4017
result of the morphological or syntactic analysis.
4020
The format of a @code{result} statement in a filter or robust-rule is
4026
If this statement is reached, the value @var{expr} is used as a result
4027
of the executed rule.
4029
@item In robust-rules:
4030
The format of a @code{result} statement in a robust-rule:
4033
result @var{feature_structure};
4039
result @var{surface}, @var{feature_structure};
4042
The word form @var{surface} with feature structure @var{feature_structure} is
4043
used as a result of the robust-rule. @var{surface} must be a prefix of the
4044
input that has not been parsed yet. If it is omitted, the input up to, but
4045
excluding, the first space is taken.
4047
@item In allo-rules:
4048
The format of the @code{result} statement in an allo rule is:
4051
result @var{surface}, @var{feature_structure};
4054
It creates an entry in the allomorph lexicon. The allomorph surface
4055
@var{surface} must be a string; @var{feature_structure} is the feature
4056
structure of the allomorph.
4060
@c ----------------------------------------------------------------------------
4062
@node Return, Stop, Result, Statements
4063
@subsection The @code{return} Statement
4064
@cindex @code{return} (statement)
4066
In a subrule, the @code{return} statement is of the following form:
4072
The value of @var{expr} is returned to the rule that invoked this subrule and
4073
the subrule execution is finished.
4075
In a pruning rule, the @code{return} statement is of the same form. Here,
4076
@var{expr} must be a list of the symbols @code{yes} and/or
4077
@code{no}. Each state in the feature structure list, which is the pruning rule
4078
parameter, corresponds to a symbol in the result list. If the symbol is
4079
@code{yes}, the corresponding state is preserved. If the symbol is @code{no},
4080
the state is abandoned.
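
Both uses of @code{return} are sketched below. The rule, attribute, and
symbol names are hypothetical, and the sketch assumes that @samp{+}
concatenates two strings or two lists:

@example
# A subrule that builds a plural surface from a stem surface.
subrule plural_surface ($surf):
  return $surf + "s";
end subrule plural_surface;

# A pruning-rule that keeps only the states whose class is "noun".
pruning_rule keep_nouns ($list):
  define $decisions := <>;
  foreach $state in $list:
    if $state.Class = noun then
      $decisions :=+ <yes>;
    else
      $decisions :=+ <no>;
    end if;
  end foreach;
  return $decisions;
end pruning_rule keep_nouns;
@end example

Another rule could invoke the subrule in an expression, for example as
@code{plural_surface("walk")}.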
4082
@c ----------------------------------------------------------------------------
4084
@node Stop, , Return, Statements
4085
@subsection The @code{stop} Statement
4086
@cindex @code{stop} (statement)
4088
The @code{stop} statement terminates the current rule path. Its format is:
4094
@c ----------------------------------------------------------------------------
4096
@node Files, Syntax Summary, Statements, The Language
4100
A Malaga grammar system comprises several files: a symbol file, a lexicon file,
4101
an allomorph rule file, a morphology rule file, an extended symbol file
4102
(optional), and a syntax rule file (optional). The type of a file can be
4103
seen by the ending of the file name. A grammar for the English language may
4104
consist of the files @file{english.sym}, @file{english.lex},
4105
@file{english.all}, @file{english.mor} and @file{english.syn}.
4108
* Symbol File:: The definition of all morphology symbols.
4109
* Extended Symbol File:: Additional syntax symbols.
4110
* Lexicon File:: The lexicon from which allomorphs will be created.
4111
* Allomorph Rule File:: The rules that create the allomorphs.
4112
* Combi-Rule Files:: The LAG rules that combine the allomorphs or words.
4115
@c ----------------------------------------------------------------------------
4117
@node Symbol File, Extended Symbol File, Files, Files
4118
@subsection The Symbol File
4119
@cindex files, symbol
4120
@cindex symbol files
4122
A symbol file has the suffix @file{.sym}. It contains the symbol table.
4124
@c ----------------------------------------------------------------------------
4126
@node Extended Symbol File, Lexicon File, Symbol File, Files
4127
@subsection The Extended Symbol File
4128
@cindex files, extended symbol
4129
@cindex symbol files, extended
4130
@cindex extended symbol files
4132
An extended symbol file has the suffix @file{.esym}. It contains an
4133
additional symbol table that contains symbols that may only be used in the
4136
@c ----------------------------------------------------------------------------
4138
@node Lexicon File, Allomorph Rule File, Extended Symbol File, Files
4139
@subsection The Lexicon File
4140
@cindex files, lexicon
4141
@cindex lexicon files
4143
A lexicon file has the suffix @file{.lex}. It consists of any number of
4144
values and constant definitions, each terminated by a semicolon. Each
4145
value stands for a lexical entry. A value may contain named constants
4146
and the operators @samp{.}, @samp{+}, @samp{-}, @samp{*}, and @samp{/}.
4147
The format of the lexical entries is free,
4148
although it should be consistent with the conception of the whole rule
4151
@c ----------------------------------------------------------------------------
4153
@node Allomorph Rule File, Combi-Rule Files, Lexicon File, Files
4154
@subsection The Allomorph Rule File
4155
@cindex files, allomorph rule
4156
@cindex rule files, allomorph
4157
@cindex allomorph rule files
4159
The allomorph lexicon is generated from the base form lexicon by applying the
4160
allo-rule on the base form entries. The allomorph generation rule file has
4161
the suffix @file{.all} and consists of one allo-rule, an optional
4162
output-filter, and any number of subrules and constant definitions.
4164
For every lexical entry, the allo-rule is executed with the value of the
4165
lexicon entry as parameter. The allo-rule can generate allomorphs using the
4166
@code{result} statement.
4168
After all allomorphs have been produced, the output-filter is executed once for
4169
each surface in the (intermediate) allomorph lexicon. As parameter, the
4170
output-filter gets the list of feature structures that share that surface. An
4171
entry in the final allomorph lexicon is created every time the @code{result}
4172
statement is executed. The surface cannot be changed by the output-filter.
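
A minimal allo-rule might look like the following sketch. The rule name,
the attributes @code{Surf}, @code{Class}, and @code{Number}, and the symbols
are hypothetical; the sketch also assumes that @samp{+} concatenates two
strings:

@example
allo_rule make_allomorphs ($lex_entry):
  # Every entry yields its base form as an allomorph.
  result $lex_entry.Surf, $lex_entry;

  # Nouns additionally yield a plural allomorph.
  if $lex_entry.Class = noun then
    result $lex_entry.Surf + "s", [Class: noun, Number: plural];
  end if;
end allo_rule make_allomorphs;
@end example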
4174
@c ----------------------------------------------------------------------------
4176
@node Combi-Rule Files, , Allomorph Rule File, Files
4177
@subsection The Combi-Rule Files
4178
@cindex files, combi-rule
4179
@cindex files, syntax rule
4180
@cindex files, morphology rule
4181
@cindex rule files, syntax
4182
@cindex rule files, morphology
4183
@cindex combi-rule files
4184
@cindex syntax rule files
4185
@cindex morphology rule files
4187
A grammar system includes up to two combination rule files: one for
4188
morphological combination with the suffix @file{.mor} and (optionally) one
4189
for syntactic combination with the suffix @file{.syn}.
4191
A combination rule file consists of an initial state and any number of
4192
combi-rules, subrules, and constant definitions. A syntax rule
4193
file may contain one optional pruning-rule, one optional input-filter and one
4194
optional output-filter; a morphology rule file may contain
4195
one optional robust-rule and one optional output-filter.
4197
Beginning with the rules listed in the initial state, the rules and
4198
their successors are processed until a @code{result} statement with the
4199
keyword @code{accept} is encountered in every path. A path dies if there is no
4200
more input (from the lexicon or from the morphology) that can be processed.
4202
In morphology, if analysis has created no result and robust analysis has been
4203
switched on, the robust-rule will be called with the analysis surface and can
4206
In syntax, when a new word form has been imported from morphology, the
4207
input-filter can take a look at its feature structures and create new result
4210
In syntax, if a pruning-rule is present and pruning has been activated,
4211
the concatenation of the next word form is preceded by the following
4212
step: The feature structures of all current LAG states are merged into a list,
4213
which is the parameter of the pruning rule. The pruning-rule must
4214
execute a @code{return} statement with a list of the symbols @code{yes}
4215
and @code{no}. Each state in the feature structure list corresponds to a symbol
4216
in the result list. If the symbol is @code{yes}, the corresponding state
4217
is preserved. If the symbol is @code{no}, the state is abandoned.
4219
After analysis, the output-filter can take a look at all result
4220
feature structures and create new result feature structures.
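
The parts of a small morphology rule file can be sketched as follows. All
rule, attribute, and symbol names are hypothetical, and the sketch assumes
that an end-rule is named in a rule set in the same way as a combi-rule:

@example
# Analysis starts in this state; only the rule "stem" is tried first.
initial [Class: start], rules stem;

# Accept a noun stem; a plural ending or the end of the word may follow.
combi_rule stem ($state, $link):
  require $link.Class = noun_stem;
  result [Class: noun, Number: singular], rules plural, finish;
end combi_rule stem;

# Attach a plural ending to a noun.
combi_rule plural ($state, $link):
  require $state.Class = noun;
  require $link.Class = plural_ending;
  result [Class: noun, Number: plural], rules finish;
end combi_rule plural;

# Accept the word form with the state's feature structure as the result.
end_rule finish ($state):
  result $state, accept;
end end_rule finish;
@end example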
4222
@c ----------------------------------------------------------------------------
4224
@node Syntax Summary, , Files, The Language
4225
@section Summary of the Malaga Syntax
4226
@cindex syntax, Malaga
4228
The syntax of Malaga source texts is defined formally by a sort of EBNF
4233
Terminals like @code{assert} and @samp{:=} stand for themselves.
4235
Nonterminals like @var{assignment} are defined by @dfn{productions}.
4237
A bar `|' separates alternatives.
4239
Brackets `[]' enclose optional parts.
4241
Curly braces `@{@}' enclose parts that are repeated zero times, one time, or
4244
Parentheses `()' are used for grouping.
4247
The start productions for Malaga source texts are
4248
@var{lexicon-file}, @var{rule-file}, and @var{symbol-file}. A
4249
nonterminal marked with @samp{*} in its definition is a lexical
4254
@item assert-statement:
4255
(@code{assert} | @samp{!}) @var{condition} @samp{;}
4258
@var{path} (@samp{:=} | @samp{:=+} | @samp{:=-} | @samp{:=*} |
4259
@samp{:=/}) @var{expression} @samp{;} | @samp{<} @var{path} @{@samp{,}
4260
@var{path}@} @samp{>} @samp{:=} @var{expression} @samp{;}
4262
@item break-statement:
4263
@code{break} [@var{label}] @samp{;}
4265
@item choose-statement:
4266
@code{choose} @var{variable} @code{in} @var{expression} @samp{;}
4269
@samp{#} @{@var{printing-char}@}
4272
[@code{not}] (@var{expression} [@var{comparison-operator}
4273
@var{expression}] | @var{match-comparison})
4275
@item comparison-operator:
4276
@samp{=} | @samp{/=} | @samp{~} | @samp{/~} | @code{in} | @code{less} |
4277
@code{greater} | @code{less_equal} | @code{greater_equal}
4280
@var{comparison} (@{@code{and} @var{comparison}@} | @{@code{or}
4284
@samp{@@} @var{identifier}
4286
@item constant-definition:
4287
(@code{define} | @code{default}) @var{constant} @samp{:=}
4288
@var{constant-expression} @samp{;}
4290
@item constant-expression:
4293
@item define-statement:
4294
@code{define} @var{variable} @samp{:=} @var{expression} @samp{;} |
4295
@code{define} @samp{<} @var{variable} @{@samp{,} @var{variable}@}
4296
@samp{>} @samp{:=} @var{expression} @samp{;}
4298
@item error-statement:
4299
@code{error} @var{expression} @samp{;}
4302
@var{term} @{(@samp{+} | @samp{-}) @var{term}@}
4305
@var{value} @{@samp{.} @var{value}@}
4307
@item foreach-statement:
4308
[@var{label} @samp{:}] @code{foreach} @var{variable} @code{in}
4309
@var{expression} @samp{:} @var{statements} @code{end}
4310
[@code{foreach}] @samp{;}
4313
(@var{letter} | @samp{_} | @samp{&}) @{@var{letter} | @var{digit} |
4314
@samp{_} | @samp{&}@}
4317
@code{if} @var{condition} @code{then} @var{statements}
4318
@{@code{elseif} @var{condition} @code{then} @var{statements}@}
4319
[@code{else} @var{statements}] @code{end} [@code{if}] @samp{;}
4322
@code{include} @var{string} @samp{;}
4325
@code{initial} @var{constant-expression} @samp{,} @var{rule-set} @samp{;}
4331
@{@var{constant-definition} | @var{constant-expression} @samp{;}@}
4334
@samp{<} @{@var{expression} @{@samp{,} @var{expression}@}@} @samp{>}
4337
@var{constant-expression} [@samp{:} @var{variable}] | @var{variable}
4338
@samp{:} @var{constant-expression}
4340
@item match-comparison:
4341
@var{expression} @code{matches} ( @samp{(} @var{match} @{@samp{,}
4342
@var{match}@} @samp{)} | @var{match} @{@samp{,} @var{match}@} )
4345
@var{digit} @{@var{digit}@} ( @samp{L} | @samp{R} | [@samp{.} @var{digit}
4346
@{@var{digit}@}] [@samp{E} @var{digit} @{@var{digit}@}] )
4348
@item parallel-statement:
4349
@code{parallel} @var{statements} @{@code{or} @var{statements}@}
4350
@code{end} [@code{parallel}] @samp{;}
4353
@var{variable} @{@samp{.} @var{value}@}
4356
@samp{[} @{@var{symbol-value-pair} @{@samp{,}
4357
@var{symbol-value-pair}@}@} @samp{]}
4359
@item repeat-statement:
4360
@code{repeat} @var{statements} @code{while} @var{condition} @samp{;}
4361
@var{statements} @code{end} [@code{repeat}] @samp{;}
4363
@item require-statement:
4364
(@code{require} | @samp{?}) @var{condition} @samp{;}
4366
@item result-statement:
4367
@code{result} @var{expression} [@samp{,} (@var{rule-set} |
4368
@code{accept})] @samp{;}
4370
@item return-statement:
4371
@code{return} @var{expression} @samp{;}
4374
@var{rule-type} @var{rule-name} @samp{(} @var{variable} @{@samp{,}
4375
@var{variable}@} @samp{)} @samp{:} @var{statements} @code{end}
4376
[@var{rule-type}] [@var{rule-name}] @samp{;}
4379
@{@var{rule} | @var{constant-definition} | @var{initial} |
4386
@code{rules} (@var{rules} @{@code{else} @var{rules}@} | @samp{(}
4387
@var{rules} @{@code{else} @var{rules}@} @samp{)})
4390
@code{allo_rule} | @code{combi_rule} | @code{end_rule} |
4391
@code{pruning_rule} | @code{robust_rule} | @code{input_filter} |
4392
@code{output_filter} | @code{subrule}
4395
@var{rule-name} @{@samp{,} @var{rule-name}@}
4398
@{@var{assert-statement} | @var{assignment} | @var{choose-statement} |
4399
@var{define-statement} | @var{error-statement} | @var{foreach-statement} |
4400
@var{if-statement} | @var{parallel-statement} | @var{repeat-statement} |
4401
@var{require-statement} | @var{result-statement} | @var{return-statement} |
4402
@var{stop-statement}@}
4404
@item stop-statement:
4405
@code{stop} @samp{;}
4408
@samp{"} @{@var{char-except-double-quotes} | @samp{\"} | @samp{\\}@} @samp{"}
4410
@item subrule-invocation:
4411
@var{rule-name} @samp{(} @var{expression} @{@samp{,} @var{expression}@}
4416
@item symbol-definition:
4417
@var{symbol} [@samp{:=} @samp{<} @var{symbol} @{@samp{,} @var{symbol}@}
4421
@{@var{symbol-definition} | @var{include}@}
4423
@item symbol-value-pair:
4424
@var{expression} @samp{:} @var{expression}
4427
@var{factor} @{(@samp{*} | @samp{/}) @var{factor}@}
4430
[@samp{-}] (@var{symbol} | @var{string} | @var{number} | @var{list} |
4431
@var{record} | @var{constant} | @var{subrule-invocation} |
4432
@var{variable} | @samp{(} @var{condition} @samp{)})
4435
@samp{$} @var{identifier}
4438
@c ----------------------------------------------------------------------------
4440
@node Index, , The Language, Top
4446
@c End of file. ===============================================================