<HTML><HEAD><TITLE>Section 24: How nouns are parsed</TITLE></HEAD>
<BODY BGCOLOR="#FFFFFF">
<TR><TD Valign="top"><A HREF="contents.html">Contents</A><BR><A HREF="section23.html">Back</A><BR><A HREF="section25.html">Forward</A><TD bgcolor="#F5DEB3"><BLOCKQUOTE><H3>24. How nouns are parsed</H3></BLOCKQUOTE><TR><TD><TD>
<BLOCKQUOTE>
<BR>The Naming of Cats is a difficult matter,
<BR>It isn't just one of your holiday games;
<BR>You may think at first I'm as mad as a hatter
<BR>When I tell you, a cat must have THREE DIFFERENT NAMES.
<BR><P>...T. S. Eliot (<B>1888</B>--<B>1965</B>), <I>The Naming of Cats</I></BLOCKQUOTE>
<BLOCKQUOTE>
Bulldust, coolamon, dashiki, fizgig, grungy, jirble, pachinko,
poodle-faker, sharny, taghairm
<P>...Catachrestic words from Chambers English Dictionary</BLOCKQUOTE>
<P>Suppose we have a tomato defined with
<PRE>
   name "fried" "green" "tomato",
</PRE>
but which is going to redden later and will then need to be referred to as "red
tomato". It's perfectly straightforward to alter the <TT>name</TT> property of
an object, which is a word array of dictionary words. For example,
<PRE>
   for (i=0:2*i&lt;obj.#name:i++) print (address) (obj.&amp;name)--&gt;i, "^";
</PRE>
prints out the list of dictionary words held in <TT>name</TT> for a given
object. This array can be written to as well, so we could just set
<PRE>
   (tomato.&amp;name)--&gt;1 = 'red';
</PRE>
but this is not a flexible or elegant solution, and it's time to
begin delving into the parser.
<P><TR><TD Valign="top"><IMG SRC="icons/dbend.gif" ALT="/\"><TD bgcolor="#EEEEEE"><SMALL> Note that we can't change the size of the <TT>name</TT> array.
To simulate a growing list of names, we could define the object with
<TT>name</TT> set to, say, 30 copies of an 'untypeable word' (see below)
such as <TT>'blank.'</TT>, and overwrite these entries as new names are needed.</SMALL><TR><TD><TD>
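<P>As a rough sketch of the padding idea (not library code: the routine name <TT>AddNameWord</TT> and the choice of three spare slots are assumptions made for illustration), a helper might fill the first unused slot:
<PRE>
Object tomato "fried green tomato"
  with name "fried" "green" "tomato" "blank." "blank." "blank.";

! Hypothetical helper: store w in the first spare slot of obj's name array.
! Returns true if a slot was found, false if the array is already full.
[ AddNameWord obj w  i n;
  n = (obj.#name)/2;                 ! number of entries, two bytes each
  for (i=0: i&lt;n: i++)
      if ((obj.&amp;name)--&gt;i == 'blank.') {
          (obj.&amp;name)--&gt;i = w;
          rtrue;
      }
  rfalse;
];
</PRE>
After <TT>AddNameWord(tomato, 'red');</TT> the player could refer to the "red tomato", though "green" would still be understood as well, since nothing has been removed.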
<P>The Inform parser is designed to be as "open-access" as possible,
because no parser can be general enough for every game without
being highly modifiable. The first thing it does
is to read in text from the keyboard and break it up into a
stream of words, so the text "wizened man, eat the grey bread" becomes
<P><TT>wizened</TT> / <TT>man</TT> / <TT>,</TT> / <TT>eat</TT> / <TT>the</TT> / <TT>grey</TT> / <TT>bread</TT><BR>
and these words are numbered from 1. At all times the parser keeps
a "word number" marker to keep its place along this line, and this
is held in the variable <TT>wn</TT>. The routine <TT>NextWord()</TT> returns
the word at the current position of the marker, and moves it forward,
i.e. adds 1 to <TT>wn</TT>. For instance, the parser may find itself at
word 6 and trying to match "grey bread" as the name of an object.
Calling <TT>NextWord()</TT> gives the value <TT>'grey'</TT> and calling it again
gives <TT>'bread'</TT>.
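<P>The way <TT>NextWord()</TT> moves the marker can be seen in a small sketch. The routine name <TT>GreyBreadHere</TT> is hypothetical; it only illustrates saving <TT>wn</TT>, reading ahead, and putting the marker back if the text turns out not to match:
<PRE>
! Hypothetical routine: is the text at the marker "grey bread"?
! On success wn is left just after "bread"; on failure wn is restored.
[ GreyBreadHere  saved;
  saved = wn;                        ! remember where we started
  if (NextWord() == 'grey' &amp;&amp; NextWord() == 'bread') rtrue;
  wn = saved;                        ! no match: put the marker back
  rfalse;
];
</PRE>
With the example text and <TT>wn</TT> equal to 6, this would return true and leave <TT>wn</TT> at 8.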
<P>Note that if the player had mistyped "grye bread", "grye" being a word
which isn't mentioned anywhere in the program or created by the library,
<TT>NextWord()</TT> would return 0 for 'misunderstood word'. Writing something like
<TT>if (w=='grye') ...</TT> somewhere in the program makes Inform put "grye"
into the dictionary automatically.
<P><TR><TD Valign="top"><IMG SRC="icons/dbend.gif" ALT="/\"><TD bgcolor="#EEEEEE"><SMALL> Remember that the game's dictionary only has 9-character
resolution. (And only 6 if Inform has been
told to compile an early-model story file: see <A HREF="section31.html">Section 31</A>.)
Thus the values of <TT>'polyunsaturate'</TT> and
<TT>'polyunsaturated'</TT> are equal.
Also, upper case and lower case letters are considered the same.
Words are permitted to contain numerals or symbols (but not at
present to contain accented letters).</SMALL><TR><TD><TD>
<P><TR><TD Valign="top"><IMG SRC="icons/ddbend.gif" ALT="/\/\"><TD bgcolor="#EEEEEE"><SMALL> A dictionary word can even contain spaces, full stops or commas.
If so it is 'untypeable'. For instance, <TT>'in,out'</TT> is an untypeable
word because if the player does type it then the parser cuts it into
three, never checking the dictionary for the entire word. Thus the
constant <TT>'in,out'</TT> can never be anything that <TT>NextWord</TT> returns.
This can actually be useful (as it was in <A HREF="section16.html">Section 16</A>).</SMALL><TR><TD><TD>
<P><TR><TD Valign="top"><IMG SRC="icons/dbend.gif" ALT="/\"><TD bgcolor="#EEEEEE"><SMALL> It can also be useful to check for numbers. The library
routine <TT>TryNumber(wordnum)</TT> tries to parse the word
at <TT>wordnum</TT> as a number (recognising decimal numbers and English
ones from "one" to "twenty"), returning -1000 if it fails
altogether, or else the number. Values exceeding 10000 are rounded
down to 10000.</SMALL><TR><TD><TD>
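<P>A minimal sketch of how <TT>TryNumber</TT> might be combined with the word marker follows. The routine name <TT>NumberHere</TT> is hypothetical, and the sketch assumes that <TT>TryNumber</TT> itself leaves <TT>wn</TT> unchanged:
<PRE>
! Hypothetical routine: read a number at the current word, if there is one.
! Returns the number and steps wn past it, or returns -1000 and leaves
! wn where it was.
[ NumberHere  n;
  n = TryNumber(wn);
  if (n ~= -1000) wn++;              ! only consume the word on success
  return n;
];
</PRE>
Passing the failure value -1000 straight back keeps 0 available as a genuine result, since the player may well type "0".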
<P><TR><TD Valign="top"><IMG SRC="icons/ddbend.gif" ALT="/\/\"><TD bgcolor="#EEEEEE"><SMALL> Sometimes there is no alternative but to actually look at
the player's text one character at a time (for instance, to check
a 20-digit phone number). The routine <TT>WordAddress(wordnum)</TT>
returns a byte array of the characters in the word, and
<TT>WordLength(wordnum)</TT> tells you how many characters there are in it.
Thus in the above example, where word 4 is "eat",
<PRE>
    thetext = WordAddress(4);
    print WordLength(4), " ", (char) thetext-&gt;0, (char) thetext-&gt;2;
</PRE>
prints the text "3 et".</SMALL><TR><TD><TD>
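<P>The phone-number case might be approached along the following lines. This is only a sketch: the routine name <TT>AllDigits</TT> is hypothetical, and it checks nothing beyond every character of the word being a decimal digit.
<PRE>
! Hypothetical routine: true if every character of word number wordnum
! in the player's text is one of the digits 0 to 9.
[ AllDigits wordnum  p len i c;
  p = WordAddress(wordnum);          ! byte array of the word's characters
  len = WordLength(wordnum);         ! how many characters it has
  for (i=0: i&lt;len: i++) {
      c = p-&gt;i;
      if (c &lt; '0' || c &gt; '9') rfalse;
  }
  rtrue;
];
</PRE>
A test such as <TT>if (AllDigits(wn) &amp;&amp; WordLength(wn) == 20) ...</TT> would then accept exactly the 20-digit words.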
<P>An object can affect how its name is parsed by giving a <TT>parse_name</TT>
routine. This is expected to try to match as many words as possible
starting from the current position of <TT>wn</TT>, reading them in one at
a time using the <TT>NextWord()</TT> routine. Thus it must not stop just because
the first word makes sense, but must keep reading and find out how many
words in a row make sense. It should then return:
<P>0<SAMP> </SAMP> if the text didn't make any sense at all,<BR>
<I>k</I><SAMP> </SAMP> if <I>k</I> words in a row of the text seem to refer to the object, or<BR>
-1<SAMP> </SAMP> to tell the parser it doesn't want to decide after all.<BR>
<P>The word marker <TT>wn</TT> can be left anywhere afterwards. For example:
<PRE>
Object -&gt; thing "weird thing"
  with parse_name
       [ i; while (NextWord()=='weird' or 'thing') i++;
            return i;
       ];
</PRE>
This definition duplicates (very nearly) the effect of having defined:
<PRE>
Object -&gt; thing "weird thing"
  with name "weird" "thing";
</PRE>
<P>Which isn't very useful. But the tomato can now be coded up with
<PRE>
  parse_name
  [ i j; if (self has general) j='red'; else j='green';
         while (NextWord()=='tomato' or 'fried' or j) i++;
         return i;
  ],
</PRE>
so that "green" only applies until the tomato's <TT>general</TT> attribute has
been set, whereupon "red" does.
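<P>The -1 return value allows a variation on the same idea, sketched here as an illustration rather than a recommended design: while the tomato is unripe the routine declines to decide, so that the ordinary <TT>name</TT> words are used, and it only takes over once <TT>general</TT> has been set.
<PRE>
Object -&gt; tomato "fried green tomato"
  with name "fried" "green" "tomato",
       parse_name
       [ i;
           if (self hasnt general) return -1;    ! unripe: leave it to name
           while (NextWord()=='red' or 'fried' or 'tomato') i++;
           return i;                             ! ripe: "green" no longer matches
       ];
</PRE>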
<P><TR><TD Valign="top"><IMG SRC="icons/exercise.gif" ALT="??"><TD bgcolor="#FBB9AC"><A NAME="ex56"><B>EXERCISE 56:</B><BR>(link to <A HREF="answers2/answer56.html">the answer</A>)<TR><TD><TD> Rewrite this to insist that the adjectives
must come before the noun, which must be present.
<P><TR><TD Valign="top"><IMG SRC="icons/exercise.gif" ALT="??"><TD bgcolor="#FBB9AC"><A NAME="ex57"><B>EXERCISE 57:</B><BR>(link to <A HREF="answers2/answer57.html">the answer</A>)<TR><TD><TD> Create a musician called Princess who, when kissed,
is transformed into "<TT>/?%?/</TT> (the
artiste formerly known as Princess)".
<P><TR><TD Valign="top"><IMG SRC="icons/exercise.gif" ALT="??"><TD bgcolor="#FBB9AC"><A NAME="ex58"><B>EXERCISE 58:</B><BR>(link to <A HREF="answers2/answer58.html">the answer</A>)<TR><TD><TD> (Cf. 'Cafè Inform'.) Construct a drinks machine capable
of serving cola, coffee or tea, using only one object for the buttons
and one for the possible drinks.
<P><TR><TD Valign="top"><IMG SRC="icons/dbend.gif" ALT="/\"><TD bgcolor="#EEEEEE"><SMALL> <TT>parse_name</TT> is also used to spot plurals: see <A HREF="section25.html">Section 25</A>.</SMALL><TR><TD><TD>
<P>Suppose that an object doesn't have a <TT>parse_name</TT> routine, or that
it has one but it returned -1. The parser then looks at the <TT>name</TT>
words. It recognises any arrangement of some or all of these words
as a match (the more words, the better). Thus "fried green tomato" is
understood, as are "fried tomato" and "green tomato". On the other
hand, so are "fried green" and "green green tomato green fried green".
This method is quick and good at understanding a wide variety of
sensible inputs, though bad at throwing out foolish ones.
<P>However, you can affect this by using the <TT>ParseNoun</TT> entry point.
This is called with one argument, the object in question, and should
work exactly as if it were a <TT>parse_name</TT> routine: i.e., returning
-1, 0 or the number of words matched as above. Remember that it
is called very often and should not be horribly slow. For example,
the following duplicates what the parser usually does:
<PRE>
[ ParseNoun obj n;
  while (IsAWordIn(NextWord(),obj,name) == 1) n++; return n;
];
[ IsAWordIn w obj prop k l m;
  k=obj.&amp;prop; l=(obj.#prop)/2;
  for (m=0:m&lt;l:m++)
      if (w==k--&gt;m) rtrue;
  rfalse;
];
</PRE>
In this example <TT>IsAWordIn</TT> just checks to see if <TT>w</TT> is one of the
entries in the word array <TT>obj.&amp;prop</TT>.
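<P>As one possible variation, offered only as a sketch, <TT>ParseNoun</TT> could be made slightly stricter by refusing to count a word typed twice in a row, so that "green green tomato" no longer matches as three words. It reuses the <TT>IsAWordIn</TT> routine given above and would take the place of the <TT>ParseNoun</TT> shown there; the local names <TT>w</TT> and <TT>last</TT> are arbitrary.
<PRE>
! Sketch of a stricter ParseNoun: stops at an unknown word, at a word
! not in the name array, or at an immediate repeat of the previous word.
[ ParseNoun obj  n w last;
  for (::) {
      w = NextWord();
      if (w == 0 || w == last) break;            ! unknown or repeated word
      if (IsAWordIn(w, obj, name) == 0) break;   ! not one of obj's names
      last = w; n++;
  }
  return n;
];
</PRE>
This only rejects immediate repetition. Throwing out every foolish ordering would need more bookkeeping, and since the routine is called very often the extra work should be kept small.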
<P><TR><TD Valign="top"><IMG SRC="icons/dexercise.gif" ALT="??/\"><TD bgcolor="#FBB9AC"><A NAME="ex59"><B>EXERCISE 59:</B><BR>(link to <A HREF="answers2/answer59.html">the answer</A>)<TR><TD><TD> Many adventure-game parsers split object names into
'adjectives' and 'nouns', so that only the pattern
<I><B>&lt;0 or more adjectives&gt;</B></I> <I><B>&lt;1 or more nouns&gt;</B></I>
is recognised. Implement this.
<P><TR><TD Valign="top"><IMG SRC="icons/dexercise.gif" ALT="??/\"><TD bgcolor="#FBB9AC"><A NAME="ex60"><B>EXERCISE 60:</B><BR>(link to <A HREF="answers2/answer60.html">the answer</A>)<TR><TD><TD> During debugging it sometimes helps to
be able to refer to objects by their internal numbers, so that
"put object 31 on object 5" would work. Implement this.
<P><TR><TD Valign="top"><IMG SRC="icons/dexercise.gif" ALT="??/\"><TD bgcolor="#FBB9AC"><A NAME="ex61"><B>EXERCISE 61:</B><BR>(link to <A HREF="answers2/answer61.html">the answer</A>)<TR><TD><TD> How could the word "<TT>#</TT>" be made a wild-card,
meaning "match any single object"?
<P><TR><TD Valign="top"><IMG SRC="icons/ddexercise.gif" ALT="??/\/\"><TD bgcolor="#FBB9AC"><A NAME="ex62"><B>EXERCISE 62:</B><BR>(link to <A HREF="answers2/answer62.html">the answer</A>)<TR><TD><TD> And how could "<TT>*</TT>" be a wild-card for
"match any collection of objects"?
<P><TR><TD Valign="top"><IMG SRC="icons/ddexercise.gif" ALT="??/\/\"><TD bgcolor="#FBB9AC"><A NAME="ex63"><B>EXERCISE 63:</B><BR>(link to <A HREF="answers2/answer63.html">the answer</A>)<TR><TD><TD> There is no problem with calling a container
"hole in wall", because the parser will understand
"put apple in hole in wall" as "put (apple) in (hole in wall)".
But create a fly in amber, so that "put fly in amber in
hole in wall" works properly and isn't misinterpreted as
"put (fly) in (amber in hole in wall)".
(Warning: you may need to know about the <TT>BeforeParsing</TT> entry
point (see <A HREF="section26.html">Section 26</A>) and the format of the <TT>parse</TT> buffer (see
<A HREF="section27.html">Section 27</A>).)
<P><TR><TD Valign="top"><IMG SRC="icons/refs.gif" ALT="*"><TD bgcolor="#EEEEEE"><B>REFERENCES:</B><BR><SMALL> Straightforward <TT>parse_name</TT> examples are the chess-pieces object
and the kittens class of 'Alice Through The Looking-Glass'. Lengthier
ones are found in 'Balances', especially in the white cubes class.</SMALL>
<HR><A HREF="contents.html">Contents</A> / <A HREF="section23.html">Back</A> / <A HREF="section25.html">Forward</A> <BR>
<A HREF="chapter1.html">Chapter I</A> / <A HREF="chapter2.html">Chapter II</A> / <A HREF="chapter3.html">Chapter III</A> / <A HREF="chapter4.html">Chapter IV</A> / <A HREF="chapter5.html">Chapter V</A> / <A HREF="chapter6.html">Chapter VI</A> / <A HREF="chapterA.html">Appendix</A><HR><SMALL><I>Mechanically translated to HTML from third edition as revised 16 May 1997. Copyright © Graham Nelson 1993, 1994, 1995, 1996, 1997: all rights reserved.</I></SMALL></BODY></HTML>