<meta http-equiv="content-type" content="text/html; charset=Unicode-2-0">
<TITLE>A Neural Network for Disambiguating Pinyin Chinese Input</TITLE>
<BODY bgcolor="ffffff" text="000000">
<i><font size="-1">Reproduced here with permission from the <a href="http://www.calico.org">Computer Assisted Language Instruction Consortium</a>, from the Proceedings of the CALICO '94 Annual Symposium (March 14-18, 1994, Northern Arizona University)<br></font></i>
<H1><B>A Neural Network for Disambiguating Pinyin Chinese Input</B></H1>
<B>Mei Yuan, Richard A. Kunst, and Frank L. Borchardt</B>
The most user-friendly way of typing Chinese on a personal computer is to enter it phonetically, as one would speak it: the typist enters a standard Roman transcription such as <I>pinyin</I>, and the computer does the work of looking up the <I>pinyin</I> words in an internal dictionary of correspondences between <I>pinyin</I> and Chinese Hanzi characters, then converting them instantly into the correct characters. This phonetic conversion method is the main Chinese input method for both courseware authors and students in the <B>WinCALIS</B> 2.0 Computer Assisted Language Learning (CALL) authoring system for Windows. But <I>pinyin</I>-based typing of Chinese has an inconvenience: there are many homophones, words which sound alike even when tones are taken into account. In such cases, the computer can only present the typist with a selection list and ask him to choose the desired word. (If Chinese were English, the typist would enter the phonetic transcription "tu" and be presented with the homophone selection list "to," "two," and "too.")
Sometimes the typist must search and choose from among dozens of Chinese characters. Homophones are most numerous among the single-syllable words which form the high-frequency core vocabulary of any language. Choosing from a list of homophones is obviously inefficient. Some efforts have been made to deal with this problem, such as maintaining frequency lists and typing longer contexts of syllable combinations; these are also used extensively in <B>WinCALIS</B> 2.0. Here we would like to introduce a new approach for the cases where those other approaches fail: a neural network used to predict the most likely word class, and hence the most likely homophone, from the context.
Perhaps because our software is used to assist in language teaching, it is easy to imagine that syntax could help with prediction. The problem is that natural language is so flexible that it cannot be defined as precisely as a computer language. The concept of the neural network has been discussed for several years and explained in various ways; here we would like to explain it by drawing an analogy. There are two ways to learn language: one is native language learning, the other is foreign language learning. The difference between them is just like that between the neural network and the traditional computer. Foreign language learning, by which we mean learning at school, starts with characters, pronunciation, sentence structure, and so on. That means the teacher knows all the details and tells the student what to do and how to do it. This is like a traditional computer algorithm: it does what people already know exactly how to do, with predictable results. The neural network, by contrast, is designed to simulate human intelligence and to tell us how to do something after it learns by itself. This is similar to native language learning. Parents are hardly concerned with syntax at all, yet after enough listening the child will speak perfectly. Parents never talk about the rules of the language, but the child will follow those rules without ever noticing them. Similarly, we provide the neural network only with samples of Chinese text, and no rules; after learning by itself, it knows what to do. It knows which homophone fits a given context, just as an English speaker knows which form of /tu/ fits a given context. A neural network thus looks very suitable for the problem of homophone disambiguation.
The structure of the neural network in <B>WinCALIS</B> is shown in Figure 1.
<center><img src="../neural/1fig1.gif" width=300 height=355 alt="Structure of the neural network"><BR>
<I>Figure 1. Structure of the Neural Network</I></center>
There are 23 nodes in both the input layer and the output layer. Each node stands for one word class category, such as "noun," "verb," "adverb," etc. There are 7 nodes in the hidden layer. The input is one particular word class, and the output indicates the probabilities of each of the 23 word classes appearing after that word class.
To train the neural network is to find an appropriate weight matrix for the training material. We used the generalized delta rule as the learning algorithm for the network. The training material consists of many sample sentences. While the typists who compiled the training material typed Chinese, a file containing all the information about word classes was created automatically. For example, when the following sentence was typed:

Wǒ yǒu sān bǎ yàoshi.

I have three (measure for keys) keys.

the corresponding word class string "PR Vt NU ME NO EP" (which stands for pronoun, transitive verb, number, measure, noun, and period) was also saved.
When the neural network was trained, any two adjacent word classes were used as a pair of Input and Target. Thus in the sentence above, the pairs of Inputs and Targets are (PR Vt) (Vt NU) (NU ME) (ME NO) (NO EP). When the neural network gets a word class as input, it converts the word class to a vector I = (i<sub>1</sub>, i<sub>2</sub>, i<sub>3</sub> ... i<sub>23</sub>) by giving a high value to the node corresponding to that word class and a low value to all the others. So if the input word class is AD (adverb), the input vector is I = (0.999, 0.001, 0.001 ... 0.001). The target is converted in the same way.
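The pairing and encoding just described can be sketched in Python. This is only a sketch under the paper's description: the class list follows Figure 2 below, and the helper names are our own. (The sample sentence's tag string uses the finer tags PR and Vt from the text; how those map onto the table's 23 classes is not given, so the sketch pairs the tags as-is.)

```python
# The 23 word classes of Figure 2 (one input node and one output node each).
WORD_CLASSES = ["AD", "AS", "AJ", "CV", "IN", "MA", "ME", "NO", "NU", "PA",
                "PO", "PS", "PT", "PU", "SP", "SV", "VA", "VE", "VC", "VO",
                "EP", "EC", "PH"]

def encode(word_class):
    """Convert a word class to a 23-element vector: a high value (0.999)
    on its own node, a low value (0.001) on all the others."""
    return [0.999 if wc == word_class else 0.001 for wc in WORD_CLASSES]

def training_pairs(tags):
    """Any two adjacent word classes become an (Input, Target) pair."""
    return list(zip(tags, tags[1:]))

# The sample sentence's tag string yields five pairs:
pairs = training_pairs(["PR", "Vt", "NU", "ME", "NO", "EP"])
# [('PR', 'Vt'), ('Vt', 'NU'), ('NU', 'ME'), ('ME', 'NO'), ('NO', 'EP')]
```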
After it gets its input, the neural network calculates step by step from the input layer to the output layer:

<blockquote><b>H = W<sup>h</sup>·I ;<br>
O = W<sup>o</sup>·H<br>
(Here, H is Hidden, I is Input, O is Output)</b></blockquote>

Because the output will be a probability, whose value ranges from 0 to 1, we apply the sigmoid function f(x) = 1/(1+e<sup>-x</sup>) at each node, giving the output vector

<blockquote><b>O = (o<sub>1</sub>, o<sub>2</sub>, ... o<sub>23</sub>)</b></blockquote>
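The forward calculation, with the sigmoid applied at each layer, can be sketched in plain Python. This is a minimal sketch: the weight values passed in are placeholders, not the trained network, and the function names are our own.

```python
import math

def sigmoid(x):
    # f(x) = 1/(1+e^-x): squashes any value into (0, 1), so each output
    # node can be read as a probability.
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    # Multiply a weight matrix (a list of rows) by a vector.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def forward(w_hidden, w_output, inp):
    """Propagate an input vector through the hidden layer to the output
    layer; the 23-7-23 sizes are set by the shapes of the weight matrices."""
    hidden = [sigmoid(x) for x in matvec(w_hidden, inp)]
    output = [sigmoid(x) for x in matvec(w_output, hidden)]
    return hidden, output
```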
Then it compares the output with the target and gets the errors. Using these errors, it back-propagates step by step from the output layer back to the input layer, in order to adjust the weight matrices:

<blockquote><b>&delta;<sup>o</sup> = (T &minus; O)·f'(O)<br>
f'(x) = (1/(1+e<sup>-x</sup>))' = f(x)·(1 &minus; f(x))<br>
&delta;<sup>h</sup> = (W<sup>o</sup>)<sup>T</sup>·&delta;<sup>o</sup>·f'(H)<br>
W(t+1) = W(t) + &eta;·&delta;·I<sup>T</sup> + &alpha;·&Delta;W(t&minus;1)</b></blockquote>

(Here, T is the Target, &eta; the learning rate, and &alpha; the momentum term.)
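One back-propagation step under the equations above might look like this in Python. Again a sketch: the learning rate `eta` is our assumption (the paper does not give its value), the momentum term is omitted for brevity, and `hidden` and `output` are the activations from the forward pass.

```python
def dsigmoid(y):
    # The sigmoid derivative expressed through its own output:
    # f'(x) = f(x)(1 - f(x)).
    return y * (1.0 - y)

def backprop_step(w_hidden, w_output, inp, target, hidden, output, eta=0.5):
    """Adjust both weight matrices in place for one Input-Target pair.
    The momentum term alpha*deltaW(t-1) from the text is omitted here."""
    # Output-layer error signal: delta_o = (T - O) . f'(O)
    delta_o = [(t - o) * dsigmoid(o) for t, o in zip(target, output)]
    # Hidden-layer error, back-propagated through the (pre-update)
    # output weights: delta_h = (W^o)^T . delta_o . f'(H)
    delta_h = [dsigmoid(h) * sum(w_output[k][j] * delta_o[k]
                                 for k in range(len(delta_o)))
               for j, h in enumerate(hidden)]
    # Weight updates: W(t+1) = W(t) + eta . delta . I^T
    for k, row in enumerate(w_output):
        for j in range(len(row)):
            row[j] += eta * delta_o[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_h[j] * inp[i]
```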
After completing those steps for a single Input-Target pair, the next Input-Target pair is used to repeat the above steps, then the next, and so on until the whole material has been learned (about 60,000 word class tokens in all).
We did not use masses of training material, so the neural network occasionally had to re-learn the same material. When the table resulting from a trial run showed only a few changes after one more round of training, we considered that the weight matrices had converged sufficiently. Below is the table of weights which <B>WinCALIS</B> uses now.
<center><table border>
<caption><i>Figure 2. Table of Neural Network Prediction Weights</i></caption>
<tr><td>Adverb</td><td>AD ( VE 0.6700 VA 0.0826 CV 0.0769 AD 0.0713 )</td></tr>
<tr><td>Adstative</td><td>AS ( AJ 0.9750 VE 0.1478 NO 0.0594 )</td></tr>
<tr><td>Adjective</td><td>AJ ( NO 0.3135 VE 0.1979 PS 0.1131 EC 0.0885 EP 0.0802 )</td></tr>
<tr><td>Coverb (Prep.)</td><td>CV ( NO 0.5748 VE 0.1640 )</td></tr>
<tr><td>Interjection</td><td>IN ( NO 0.4567 VE 0.1553 AJ 0.0780 )</td></tr>
<tr><td>Movable Adv.</td><td>MA ( NO 0.5776 VE 0.2537 AJ 0.0594 )</td></tr>
<tr><td>Measure</td><td>ME ( NO 0.4664 VE 0.1407 AJ 0.0801 )</td></tr>
<tr><td>Noun</td><td>NO ( NO 0.3253 VE 0.1812 EC 0.0741 EP 0.0686 PS 0.0592 AD 0.0569 )</td></tr>
<tr><td>Number</td><td>NU ( ME 0.5017 NU 0.3380 VE 0.1672 NO 0.0751 )</td></tr>
<tr><td>Particle</td><td>PA ( EP 0.3095 NO 0.1819 NU 0.1483 EC 0.1085 VE 0.0513 )</td></tr>
<tr><td>Ordinaliz. Part.</td><td>PO ( NU 0.8621 VE 0.1375 )</td></tr>
<tr><td>Subord. Particle</td><td>PS ( NO 0.6858 AJ 0.0894 )</td></tr>
<tr><td>Prevrb. Sub. Part.</td><td>PT ( VE 0.5069 AJ 0.2129 NO 0.1655 CV 0.1092 AD 0.0529 PS 0.0504 )</td></tr>
<tr><td>Pstvrb. Sub. Part.</td><td>PU ( AJ 0.5234 VE 0.1836 NO 0.1256 AD 0.1092 )</td></tr>
<tr><td>Specifier</td><td>SP ( ME 0.6236 NU 0.1553 NO 0.0875 )</td></tr>
<tr><td>Stative Verb</td><td>SV ( NO 0.2955 VE 0.1879 )</td></tr>
<tr><td>Aux. Verb</td><td>VA ( VE 0.6698 AJ 0.1123 CV 0.0839 AD 0.0517 )</td></tr>
<tr><td>Verb</td><td>VE ( NO 0.3315 VE 0.1636 NU 0.0706 EC 0.0624 )</td></tr>
<tr><td>Verb-Complmnt.</td><td>VC ( VE 0.2961 NO 0.1656 )</td></tr>
<tr><td>Verb-Obj. Cmpd.</td><td>VO ( VE 0.2391 )</td></tr>
<tr><td>Period</td><td>EP ( NO 0.6524 VE 0.1531 AD 0.0588 MA 0.0563 )</td></tr>
<tr><td>Comma</td><td>EC ( VE 0.2786 NO 0.2205 AD 0.1555 AJ 0.0731 MA 0.0559 )</td></tr>
<tr><td>Phrase</td><td>PH ( VE 0.4510 AD 0.0750 NO 0.0581 )</td></tr>
</table></center>
This table indicates the probability of one word class being followed by another. For example:
<li>after AD (adverb), VE (verb) is the most probable word class, occurring 67% of the time;
<li>the second most probable is VA (auxiliary verb), occurring 8% of the time;
<li>CV (coverb, similar to the English preposition) and AD (adverb) come next, each occurring 7% of the time;
<li>the other 19 word classes have probabilities so low that they are neglected.
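Used at conversion time, a row of this table works as a simple lookup. Here is a sketch with just the AD row transcribed from Figure 2; the structure and function name are our own.

```python
# Follower probabilities kept by the network, one row per word class.
# Only the AD (adverb) row from Figure 2 is transcribed here.
FOLLOWER_PROBS = {
    "AD": {"VE": 0.6700, "VA": 0.0826, "CV": 0.0769, "AD": 0.0713},
}

def likely_followers(word_class):
    """Return the probable following word classes, most probable first;
    classes below the table's cutoff are simply absent."""
    probs = FOLLOWER_PROBS.get(word_class, {})
    return sorted(probs, key=probs.get, reverse=True)
```

After an adverb, the ranked list starts with VE, matching the bullet points above.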
The neural network comes into play when the typist presses the space bar, which serves as a convert key, after finishing a word in <I>pinyin</I> transcription, and homophones are found in a search of the <B>WinCALIS</B> internal dictionary of <I>pinyin</I>-Chinese character correspondences. The network gets the word class of the preceding, already converted word from the text buffer, does its calculations, and gets the list of probable following word classes. Then the word classes of the homophones are compared with this list. If only one word has a word class belonging to the list, it is considered the one the typist wanted and is converted automatically, without any user intervention. For example, in the sentence cited above,

Wǒ yǒu sān bǎ yàoshi.

I have three (measure for keys) keys.
in the <B>WinCALIS</B> internal Chinese dictionary, <I>yàoshi</I> has the word entries:

要是 "if" MA (movable adverb)

钥匙 "key" NO (noun).

<I>Bǎ</I> is a measure word which may be followed by NO (noun), VE (verb), or AJ (adjective), but not by MA (movable adverb), so the neural network predicts it is the <I>yàoshi</I> meaning "key" 钥匙.
If more than one match is found, the most probable word is compared with the second most probable (but different) word. If the difference in probability is significant, the neural network can be sure that the most probable word is the one the typist wants. Consider the word <I>jìn</I> after <I>hěn</I> "very":
In the dictionary, the syllable <I>jìn</I> has the word entries:

进 "come in" VE

近 "near" AJ

劲 "energy" NO

<I>Hěn</I> is AS (adstative), so AJ gets a probability of 0.975, while VE and NO get only 0.1478 and 0.0594, respectively. The difference is very large, so the neural network is sure it is the <I>jìn</I> meaning "near" 近.
But consider another example. In the <B>WinCALIS</B> internal dictionary, <I>qī</I> has the word entries:

七 "seven" NU<br>
期 "period" ME

<I>Qī</I> 七 is a NU (number), so it may be followed by either another NU (number) or a ME (measure), and the difference in probability between NU and ME (0.5017 - 0.3380 = 0.1637) is considered insignificant. The neural network will not predict in this situation. In fact, both <I>qī qī (nián)</I> 七七年 "(the year) seventy-seven" and <I>(dì) qī qī</I> 第七期 "seven(th) period" are possible.
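The decision rule in the two examples above can be sketched as follows. The 0.2 significance threshold is our assumption: the paper only shows that 0.975 vs. 0.1478 is decisive while a gap of 0.1637 is not, without naming the actual cutoff.

```python
def choose_homophone(candidates, follower_probs, threshold=0.2):
    """candidates: (hanzi, word_class) pairs for one pinyin syllable.
    follower_probs: the table row for the preceding word's class.
    Returns the chosen hanzi, or None when the network should not
    predict and the typist gets a selection list instead."""
    scored = sorted(((follower_probs.get(wc, 0.0), hz) for hz, wc in candidates),
                    reverse=True)
    if scored[0][0] == 0.0:
        return None  # no candidate's class is in the probable list
    if len(scored) == 1 or scored[0][0] - scored[1][0] >= threshold:
        return scored[0][1]
    return None  # the difference is insignificant

# After hěn (AS), 近 "near" (AJ, 0.975) wins decisively over 进 and 劲:
AS_ROW = {"AJ": 0.9750, "VE": 0.1478, "NO": 0.0594}
# After qī (NU), 七 (NU, 0.3380) vs. 期 (ME, 0.5017) is too close to call:
NU_ROW = {"ME": 0.5017, "NU": 0.3380, "VE": 0.1672, "NO": 0.0751}
```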
When we analyzed the neural network's training and operation, it also suggested to us that we regroup the word classes. We had originally combined all the verbs Vi (verb-intransitive), Vt (verb-transitive), VA (verb-auxiliary), and Vr (verb-resultative) into one category VE. Because the word class NO (noun) was more likely to follow this mega-word class VE (verb) than any other, when one typed an auxiliary verb + main verb phrase like <I>néng tīng</I> "can listen," the neural network always favored the noun for the second word, choosing the nonsensical 能厅 "can hall" instead of the desired 能听 "can listen." We considered the difference between VA (auxiliary verbs) and regular verbs and decided to separate VA from the VE category. Remarkably, the neural network confirms that the lists of probable words after regular VE and VA are totally different. The same is true of the newly created word class AS (adstative), split off from the more general word class AD (adverb), and of PT (preverbal subordinating particle de 地) and PU (postverbal subordinating particle de 得), differentiated from PS (subordinating particle de 的).
The network usually selects correctly by offering the output with the highest probability in that context; the likelihood that it will offer the correct choice in second or third place, if its first choice is in error, approaches 100%. Considering that the system allows for twenty other outputs, this result is highly significant. The network has "learned" the allowable and unallowable sequences of Chinese syntax and applies that "learning" to the actual task of typing Chinese in phonetic representation.
314
<B>Presenters' Biodata </B>
316
Mei Yuan is a visiting researcher at the Humanities Computing
317
Facility of Duke University. She received a B.E. degree in Computer
318
Science from Zhejiang University and an M.S. degree in Computer
319
Assisted Design from the Shanghai Maritime Institute. She is
320
investigating the design and application of back-propagation networks
321
in Chinese natural language processing.
323
Frank L. Borchardt, Ph.D., is Professor of German at Duke University
324
and Executive Director of <a href="http://agoralang.com:2410/calico.html">CALICO</a>.
326
Richard A. Kunst, Ph.D., is a Research Associate in the Duke University
327
Computer Assisted Language Learning (DUCALL) Project, where he
328
is developing full support for all the languages of the world
329
in <B>WinCALIS</B> 2.0.
<B>Contact Information</B>
<p><center><a href="http://www.humancomp.org">The Humanities Computing Laboratory</a>
<br>A Nonprofit Education and Research Corporation
<br>301 W. Main St. Suite 400-I
<br>Durham, NC 27701 USA
<br><i>Voice: (919) 667-9556, 656-5915</i>
<br><i>Fax: (919) 667-9556</i>
<br>E-mail: <a href="mailto:info@humancomp.org"><i>info@humancomp.org</i></a></center></p>