<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<title>DM4 §28: How nouns are parsed</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="dm4.css">
<a href="index.html">home</a> /
<a href="contents.html">contents</a> /
<a href="ch4.html" title="Chapter IV: Describing and Parsing">chapter IV</a> /
<a href="s27.html" title="§27: Listing and grouping objects">prev</a> /
<a href="s29.html" title="§29: Plural names for duplicated objects">next</a> /
<a href="dm4index.html">index</a>
<a id="p216" name="p216"></a>
<h2>§28 How nouns are parsed</h2>
<blockquote>The Naming of Cats is a difficult matter,<br>
It isn't just one of your holiday games;<br>
You may think at first I'm as mad as a hatter<br>
When I tell you, a cat must have THREE DIFFERENT NAMES.<br>
— T. S. Eliot (1888–1965), <i>The Naming of Cats</i></blockquote>
<p class="normal"><span class="atleft"><img src="dm4-216_1.jpg" alt=""></span>
Suppose we have a tomato defined with</p>
<p class="lynxonly"></p>
<pre class="code">name 'fried' 'green' 'tomato',</pre>
<p class="normal">but which is going to redden later and need to be
referred to as “red tomato”. The <code>name</code> property
holds an array of dictionary words, so that</p>
<p class="lynxonly"></p>
<pre class="code">
tomato.&name-->0 == 'fried'
tomato.&name-->1 == 'green'
tomato.&name-->2 == 'tomato'
</pre>
<p class="normal">(Recall that <code>X.#Y</code> tells you the number of
<code>-></code> entries in such a property array, in this case six,
so that <code>X.#Y/2</code> tells you the number of <code>--></code>
entries, in this case three.) You are quite free to alter this array
during play:</p>
<p class="lynxonly"></p>
<pre class="code">tomato.&name-->1 = 'red';</pre>
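<p class="normal">As a minimal sketch of how such a change might be
made at the moment of ripening, the routine below walks the array and
replaces every <code>'green'</code> with <code>'red'</code>. The routine
name <code>RipenTomato</code> is hypothetical, not part of the library,
and the loop assumes the tomato has a <code>name</code> property as above.</p>

<p class="lynxonly"></p>
<pre class="code">
[ RipenTomato i;
    ! tomato.#name is the length in bytes, so halve it to count words
    for (i = (tomato.#name)/2 - 1 : i >= 0 : i--)
        if (tomato.&name-->i == 'green')
            tomato.&name-->i = 'red';
];
</pre>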
<p class="normal">The down side of this technique is that it's clumsy,
when all's said and done, and not so very flexible, because you can't
change the length of the <code>tomato.&name</code> array during
play. Of course you <em>could</em> define the tomato</p>
<p class="lynxonly"></p>
<pre class="code">
with name 'fried' 'green' 'tomato' 'blank.' 'blank.' 'blank.'
          'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.'
          'blank.' 'blank.' 'blank.' 'blank.' 'blank.' 'blank.',
</pre>
<p class="normal">or something similar, giving yourself another (say)
fifteen “slots” to put new names into, but this is
inelegant even by Inform standards. Instead, an object like the tomato
can be given a <code>parse_name</code> routine, allowing complete flexibility
for the designer to specify just what names it does and doesn't match.
It is time to begin looking into the parser and how it works.</p>
<a id="p217" name="p217"></a>
<p class="dotbreak">· · · · ·</p>
<p class="normal">The Inform parser has two cardinal principles:
firstly, it is designed to be as “open-access” as possible,
because a parser cannot ever be general enough for every game without
being highly modifiable. This means that there are many levels on
which you can augment or override what it does. Secondly, it tries
to be generous in what it accepts from the player, understanding the
broadest possible range of commands and making no effort to be strict
in rejecting ungrammatical requests. For instance, given a shallow
pool nearby, “examine shallow” has an adjective without
a noun: but it's clear what the player means. In general, all sensible
commands should be accepted but it is not important whether or not
nonsensical ones are rejected.</p>
<p class="indent">The first thing the parser does is to read in text
from the keyboard and break it up into a stream of words: so the text
“wizened man, eat the grey bread” becomes</p>
<p class="syntax"><tt>wizened</tt> / <tt>man</tt> / <tt>,</tt> /
<tt>eat</tt> / <tt>the</tt> / <tt>grey</tt> / <tt>bread</tt></p>
<p class="normal">and these words are numbered from 1. At all times
the parser keeps a “word number” marker to keep its place
along this line, and this is held in the variable <code>wn</code>.
The routine <code>NextWord()</code> returns the word at the current
position of the marker, and moves it forward, i.e., adds 1 to <code>wn</code>.
For instance, the parser may find itself at word 6 and trying to
match “grey bread” as the name of an object. Calling
<code>NextWord()</code> returns the value <code>'grey'</code> and
calling it again gives <code>'bread'</code>.</p>
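<p class="normal">Because <code>NextWord()</code> always advances
<code>wn</code>, a routine that only wants to look ahead has to put the
marker back afterwards. A minimal sketch, in which the name
<code>PeekWord</code> is hypothetical and not a library routine:</p>

<p class="lynxonly"></p>
<pre class="code">
[ PeekWord w;
    w = NextWord();   ! read the word at wn, which advances wn by 1
    wn--;             ! step back so that the same word can be read again
    return w;
];
</pre>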
<p class="indent">Note that if the player had mistyped “grye bread”,
“grye” being a word which isn't mentioned anywhere in the
program or created by the library, then <code>NextWord()</code> returns
0 for ‘not in the dictionary’. Inform creates the dictionary
of a story file by taking all the <code>name</code> words of objects, all
the verbs and prepositions from grammar lines, and all the words used
in constants like <code>'frog'</code> written in the source code,
and then sorting these into alphabetical order.</p>
<p class="aside"><span class="warning">▲</span>
However, the story file's dictionary only has 9-character resolution.
(And only 6 if Inform has been told to compile an early-model story
file: see <a href="s45.html">§45</a>.) Thus the values of
<code>'polyunsaturate'</code> and <code>'polyunsaturated'</code>
are equal. Also, upper case and lower case letters are considered the
same. Although dictionary words are permitted to contain numerals
or typewriter symbols like <code>-</code>, <code>:</code> or
<code>/</code>, these cost as much as two ordinary letters, so
<code>'catch-22'</code> looks the same as <code>'catch-2'</code> or
<code>'catch-207'</code>.</p>
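<p class="aside">The equality can be checked directly. The following is
only a sketch: the routine name <code>TestResolution</code> is hypothetical,
and it assumes a story file with the usual 9-character resolution.</p>

<p class="lynxonly"></p>
<pre class="code">
[ TestResolution;
    ! Both constants are cut down to the same nine characters, "polyunsat"
    if ('polyunsaturate' == 'polyunsaturated')
        print "The dictionary cannot tell these two words apart.^";
];
</pre>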
<p class="aside"><span class="warning">▲▲</span>
A dictionary word can even contain spaces, full stops or commas, but
if so it is ‘untypeable’. For instance, <code>'in,out'</code>
is an untypeable word because if the player were to type something
like “go in,out”, the text would be broken up into four
words, <code>go</code> /
<a id="p218" name="p218"></a>
<code>in</code> / <code>,</code> / <code>out</code>.
Thus <code>'in,out'</code> may be in the story file's dictionary but
it will never match against any word of what the player typed. Surprisingly,
this can be useful, as it was at the end of <a href="s18.html">§18</a>.</p>
<p class="dotbreak">· · · · ·</p>
<p class="normal">Since the story file's dictionary isn't always perfect,
there is sometimes no alternative but to actually look at the player's
text one character at a time: for instance, to check that a 12-digit
phone number has been typed correctly and in full.</p>
<p class="indent">The routine <code>WordAddress(wordnum)</code> returns
a byte array of the characters in the word, and <code>WordLength(wordnum)</code>
tells you how many characters there are in it. Given the above example
text of “wizened man, eat the grey bread”:</p>
<p class="lynxonly"></p>
<pre class="code">
WordAddress(4)->0 == 'e'
WordAddress(4)->1 == 'a'
WordAddress(4)->2 == 't'
</pre>
<p class="normal">because word number 4 is “eat”. (Recall
that the comma is considered as a word in its own right.)</p>
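<p class="normal">The phone number check mentioned above might be
sketched as follows. The routine name <code>IsPhoneNumber</code> is
hypothetical, and the sketch assumes that the number is typed as a single
word of exactly twelve digits, with no spaces or punctuation.</p>

<p class="lynxonly"></p>
<pre class="code">
[ IsPhoneNumber wordnum addr i;
    if (WordLength(wordnum) ~= 12) rfalse;
    addr = WordAddress(wordnum);
    for (i = 11 : i >= 0 : i--)
        if (addr->i > '9' || '0' > addr->i) rfalse;   ! not a digit
    rtrue;
];
</pre>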
<p class="aside"><span class="warning">▲</span>
The parser provides a basic routine for comparing a word against the
texts <code>'0'</code>, <code>'1'</code>, <code>'2'</code>, …,
<code>'9999'</code>, <code>'10000'</code> or, in other words, against
small numbers. This is the library routine <code>TryNumber(wordnum)</code>,
which tries to parse the word at <code>wordnum</code> as a number and
returns that number, if it finds a match. Besides numbers written
out in digits, it also recognises the texts <code>'one'</code>,
<code>'two'</code>, <code>'three'</code>, …, <code>'twenty'</code>.
If it fails to recognise the text as a number, it returns −1,000;
if it finds a number greater than 10,000, it rounds down and returns
10,000.</p>
<p class="dotbreak">· · · · ·</p>
<p class="normal">To return to the naming of objects, the parser normally
recognises any arrangement of some or all of the <code>name</code> words of an
object as a noun which refers to it: and the more words, the better the
match is considered to be. Thus “fried green tomato” is
a better match than “fried tomato” or “green tomato”
but all three are considered to match. On the other hand, so is
“fried green”, and “green green tomato green fried
green” is considered a very good match indeed. The method is quick
and good at understanding a wide variety of sensible texts, though
poor at throwing out foolish ones. (An example of the parser's strategy
of being generous rather than strict.) To be more precise, here is what
happens when the parser wants to match some text against an object:</p>
<a id="p219" name="p219"></a>
<ol style="list-style-type:decimal">
<li>If the object provides a <code>parse_name</code> routine, ask
this routine to determine how good a match there is.</li>
<li>If there was no <code>parse_name</code> routine, or if there was
but it returned −1, ask the entry point routine <code>ParseNoun</code>,
if the game has one, to make the decision.</li>
<li>If there was no <code>ParseNoun</code> entry point, or if there
was but it returned −1, look at the <code>name</code> of the object
and match the longest possible sequence of words given in the <code>name</code>.</li>
</ol>
<p class="normal">So: a <code>parse_name</code> routine, if provided,
is expected to try to match as many words as possible starting from
the current position of <code>wn</code> and reading them in one at
a time using the <code>NextWord()</code> routine. Thus it must not stop
just because the first word makes sense, but must keep reading and find
out how many words in a row make sense. It should return:</p>
<p class="lynxonly"></p>
<div class="inset"><table>
<tr><td>0</td><td> if the text didn't make any sense at all,</td></tr>
<tr><td><i>k</i></td><td> if <i>k</i> words in a row of the text seem to refer to the object,</td></tr>
<tr><td>−1</td><td> to tell the parser it doesn't want to decide.</td></tr>
</table></div>
<p class="normal">The word marker <code>wn</code> can be left anywhere
afterwards. For example, here is the fried tomato with which this section
started:</p>
<p class="lynxonly"></p>
<pre class="code">
parse_name [ n colour;
    if (self.ripe) colour = 'red'; else colour = 'green';
    ! Count how many words in a row name the tomato
    while (NextWord() == 'tomato' or 'fried' or colour) n++;
    return n;
],
</pre>
<p class="normal">The effect of this is that if <code>tomato.ripe</code> is
true then the tomato responds to the names “tomato”, “fried”
and “red”, and otherwise to “tomato”, “fried”
and “green”.</p>
<p class="indent">As a second example of how <code>parse_name</code> can
be useful, suppose you define:</p>
<p class="lynxonly"></p>
<pre class="code">
Object -> "fly in amber"
  with name 'fly' 'in' 'amber';
</pre>
<p class="normal">If the player then types “put fly in amber in
hole”, the parser will be thrown, because it will think “fly
in amber in” is all just naming the object and then it won't
know what the word “hole” is doing at the end. However:</p>
<p class="lynxonly"></p>
<pre class="code">
Object -> "fly in amber"
  with parse_name [;
           if (NextWord() ~= 'fly' or 'amber') return 0;
           if (NextWord() == 'in' && NextWord() == 'amber') return 3;
           return 1;
       ];
</pre>
<a id="p220" name="p220"></a>
<p class="normal">Now the word “in” is only recognised as
part of the fly's name if it is followed by the word “amber”,
and the ambiguity goes away. (“amber in amber” is also
recognised, but then it's not worth the bother of excluding.)</p>
<p class="aside"><span class="warning">▲</span>
<code>parse_name</code> is also used to spot plurals: see
<a href="s29.html">§29</a>.</p>
<a id="ex71" name="ex71"></a>
<p class="aside"><span class="warning"><b>•</b>
<b><a href="sa6.html#ans71">EXERCISE 71</a></b></span><br>
Rewrite the tomato's <code>parse_name</code> to insist that the adjectives
must come before the noun, which must be present.</p>
<a id="ex72" name="ex72"></a>
<p class="aside"><span class="warning"><b>•</b>
<b><a href="sa6.html#ans72">EXERCISE 72</a></b></span><br>
Create a musician called Princess who, when kissed, is transformed
into “<tt>/?%?/</tt> (the artiste formerly known as Princess)”.</p>
<a id="ex73" name="ex73"></a>
<p class="aside"><span class="warning"><b>•</b>
<b><a href="sa6.html#ans73">EXERCISE 73</a></b></span><br>
Construct a drinks machine capable of serving cola, coffee or tea,
using only one object for the buttons and one for the possible drinks.</p>
<a id="ex74" name="ex74"></a>
<p class="aside"><span class="warning"><b>•</b>
<b><a href="sa6.html#ans74">EXERCISE 74</a></b></span><br>
Write a <code>parse_name</code> routine which looks through <code>name</code>
in just the way that the parser would have done anyway if there hadn't
been a <code>parse_name</code> in the first place.</p>
<a id="ex75" name="ex75"></a>
<p class="aside"><span class="warning"><b>•</b>▲
<b><a href="sa6.html#ans75">EXERCISE 75</a></b></span><br>
Some adventure game parsers split object names into ‘adjectives’
and ‘nouns’, so that only the pattern ‹<span class="token">0 or more adjectives</span>›
‹<span class="token">1 or more nouns</span>› is recognised. Implement this.</p>
<a id="ex76" name="ex76"></a>
<p class="aside"><span class="warning"><b>•</b>
<b><a href="sa6.html#ans76">EXERCISE 76</a></b></span><br>
During debugging it sometimes helps to be able to refer to objects by
their internal numbers, so that “put object 31 on object 5”
would work. Implement this.</p>
<a id="ex77" name="ex77"></a>
<p class="aside"><span class="warning"><b>•</b>▲
<b><a href="sa6.html#ans77">EXERCISE 77</a></b></span><br>
How could the word “<tt>#</tt>” be made a wild-card,
meaning “match any single object”?</p>
<a id="ex78" name="ex78"></a>
<p class="aside"><span class="warning"><b>•</b>▲▲
<b><a href="sa6.html#ans78">EXERCISE 78</a></b></span><br>
And how could “<tt>*</tt>” be a wild-card for “match
any collection of objects”? (Note: you need to have read
<a href="s29.html">§29</a> to answer this.)</p>
<p class="aside"><span class="warning"><b>•</b>
<b>REFERENCES</b></span><br>
Straightforward <code>parse_name</code> examples are the chess pieces
object and the kittens class of ‘Alice Through the Looking-Glass’.
Lengthier ones are found in ‘Balances’, especially
in the white cubes class.
<span class="warning"><b>•</b></span>Miron Schmidt's library
extension <tt>"calyx_adjectives.h"</tt>, based on earlier
work by Andrew Clover, provides for objects to have “adnames”
as well as “names”: “adnames” are usually
adjectives, and are regarded as being less good
<a id="p221" name="p221"></a>
matches for an object
than “names”. In this system “get string”
would take either a string bag or a ball of string, but if both were
present would take the ball of string, because “string”
is in that case a noun rather than an adjective.</p>