<meta http-equiv="content-type" content="text/html; charset=Unicode-2-0">
<TITLE>A Neural Network for Disambiguating Pinyin Chinese Input</TITLE>
<BODY bgcolor="ffffff" text="000000">
<i><font size="-1">Reproduced here with permission from the <a href="http://www.calico.org">Computer Assisted Language Instruction Consortium</a>, from the Proceedings of the CALICO '94 Annual Symposium (March 14-18, 1994, Northern Arizona University)<br></font></i>
<H1><B>A Neural Network for Disambiguating Pinyin Chinese Input</B></H1>
<B>Mei Yuan, Richard A. Kunst, and Frank L. Borchardt</B>
The most user-friendly way of typing Chinese on a personal computer is to enter it phonetically, as one would speak it: the typist enters a standard Roman transcription such as <I>pinyin</I>, and the computer does the work of looking up the <I>pinyin</I> words in an internal dictionary of correspondences between <I>pinyin</I> and Chinese Hanzi characters, then converting them instantly into the correct characters. This phonetic conversion method is the main Chinese input method for both courseware authors and students in the <B>WinCALIS</B> 2.0 Computer Assisted Language Learning (CALL) authoring system for Windows. But <I>pinyin</I>-based typing of Chinese has an inconvenience: there are many homophones, words which sound alike even when tones are taken into account. In such cases, the computer can only present the typist with a selection list and ask him to choose the desired word. (If Chinese were English, the typist would enter the phonetic transcription "tu" and be presented with the homophone selection list "to," "two," and "too.")
Sometimes the typist must search and choose from among dozens of Chinese characters. Homophones are most numerous among the single-syllable words which form the high-frequency core vocabulary of any language. Choosing from a list of homophones is obviously inefficient. Some efforts have been made to deal with this problem, such as maintaining frequency lists and typing longer contexts of syllable combinations; these are also used extensively in <B>WinCALIS</B> 2.0. Here we would like to introduce a new approach for the cases where those other approaches fail: a neural network used to predict the most likely word class, and hence the most likely homophone, from the context.
Perhaps because our software is used to assist in language teaching, it is easy to imagine that syntax could help with prediction. The problem is that natural language is so flexible that it cannot be defined as precisely as a computer language. The concept of the neural network has been discussed for several years and explained in various ways; here we would like to explain it by drawing an analogy. There are two ways to learn language: one is native language learning, the other is foreign language learning. The difference between them is just like that between the neural network and the traditional computer. Foreign language learning, by which we mean learning at school, starts with characters, pronunciation, sentence structure, and so on. That means the teacher knows all the details and tells the student what to do and how to do it. This is like a traditional computer algorithm: it does what people already know exactly how to do, with predictable results. The neural network, by contrast, is designed to simulate human intelligence and to tell us how to do something after it learns by itself. This is similar to native language learning. Parents are hardly concerned with syntax at all, yet after enough listening the child will speak perfectly. Parents never talk about the rules of the language, but the child will follow those rules without ever noticing them. Similarly, we provide the neural network only with samples of Chinese text, and no rules; after learning by itself, it knows what to do. It knows which homophone fits a given context, just as an English speaker knows which form of /tu/ fits a given context. A neural network thus looks very suitable for the problem of homophone disambiguation.
The structure of the neural network in <B>WinCALIS</B> is shown in Figure 1.
<center><img src="../neural/1fig1.gif" width=300 height=355 alt="Structure of the neural network"><BR>
<I>Figure 1. Structure of the Neural Network</I></center>
There are 23 nodes in both the input layer and the output layer. Each node stands for one word class category, such as "noun," "verb," "adverb," etc. There are 7 nodes in the hidden layer. The input is one particular word class, and the output indicates the probabilities of each of the 23 word classes appearing after that word class.
To train the neural network is to find an appropriate weight matrix for the training material. We used the generalized delta rule as the learning algorithm for the network. The training material consists of many sample sentences. While the typists who compiled the training material typed Chinese, a file containing all the information about word classes was created automatically. For example, when the following sentence was typed:

Wǒ yǒu sān bǎ yàoshi.

I have three (measure for keys) keys.

the corresponding word class string "PR Vt NU ME NO EP" (which stands for pronoun, transitive verb, number, measure, noun, and period) was also saved.
When the neural network was trained, any two adjacent word classes were used as a pair of Input and Target. Thus in the sentence above, the pairs of Inputs and Targets are (PR Vt) (Vt NU) (NU ME) (ME NO) (NO EP). When the neural network gets a word class as input, it converts the word class to a vector I = (i<sub>1</sub>, i<sub>2</sub>, i<sub>3</sub> ... i<sub>23</sub>) by giving a high value to the node corresponding to that word class and a low value to all the others. So if the input word class is AD (adverb), the input vector is I = (0.999, 0.001, 0.001 ... 0.001). The target is converted in the same way.
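The pairing and encoding just described can be sketched in Python. This is only a sketch under the paper's description: the class list follows Figure 2 below, and the helper names are our own. (The sample sentence's tag string uses the finer tags PR and Vt from the text; how those map onto the table's 23 classes is not given, so the sketch pairs the tags as-is.)

```python
# The 23 word classes of Figure 2 (one input node and one output node each).
WORD_CLASSES = ["AD", "AS", "AJ", "CV", "IN", "MA", "ME", "NO", "NU", "PA",
                "PO", "PS", "PT", "PU", "SP", "SV", "VA", "VE", "VC", "VO",
                "EP", "EC", "PH"]

def encode(word_class):
    """Convert a word class to a 23-element vector: a high value (0.999)
    on its own node, a low value (0.001) on all the others."""
    return [0.999 if wc == word_class else 0.001 for wc in WORD_CLASSES]

def training_pairs(tags):
    """Any two adjacent word classes become an (Input, Target) pair."""
    return list(zip(tags, tags[1:]))

# The sample sentence's tag string yields five pairs:
pairs = training_pairs(["PR", "Vt", "NU", "ME", "NO", "EP"])
# [('PR', 'Vt'), ('Vt', 'NU'), ('NU', 'ME'), ('ME', 'NO'), ('NO', 'EP')]
```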
After it gets its input, the neural network calculates step by step from the input layer to the output layer:

<blockquote><b>H = W<sup>h</sup>·I ;<br>
O = W<sup>o</sup>·H<br>
(Here, H is Hidden, I is Input, O is Output)</b></blockquote>

Because the output will be a probability, whose value ranges from 0 to 1, we apply the sigmoid function f(x) = 1/(1+e<sup>-x</sup>) at each node, giving the output vector

<blockquote><b>O = (o<sub>1</sub>, o<sub>2</sub>, ... o<sub>23</sub>)</b></blockquote>
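The forward calculation, with the sigmoid applied at each layer, can be sketched in plain Python. This is a minimal sketch: the weight values passed in are placeholders, not the trained network, and the function names are our own.

```python
import math

def sigmoid(x):
    # f(x) = 1/(1+e^-x): squashes any value into (0, 1), so each output
    # node can be read as a probability.
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    # Multiply a weight matrix (a list of rows) by a vector.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def forward(w_hidden, w_output, inp):
    """Propagate an input vector through the hidden layer to the output
    layer; the 23-7-23 sizes are set by the shapes of the weight matrices."""
    hidden = [sigmoid(x) for x in matvec(w_hidden, inp)]
    output = [sigmoid(x) for x in matvec(w_output, hidden)]
    return hidden, output
```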
Then it compares the output with the target and gets the errors. Using these errors, it back-propagates step by step from the output layer back to the input layer, in order to adjust the weight matrices:

<blockquote><b>&delta;<sup>o</sup> = (T &minus; O)·f'(O)<br>
f'(x) = (1/(1+e<sup>-x</sup>))' = f(x)·(1 &minus; f(x))<br>
&delta;<sup>h</sup> = (W<sup>o</sup>)<sup>T</sup>·&delta;<sup>o</sup>·f'(H)<br>
W(t+1) = W(t) + &eta;·&delta;·I<sup>T</sup> + &alpha;·&Delta;W(t&minus;1)</b></blockquote>

(Here, T is the Target, &eta; the learning rate, and &alpha; the momentum term.)
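One back-propagation step under the equations above might look like this in Python. Again a sketch: the learning rate `eta` is our assumption (the paper does not give its value), the momentum term is omitted for brevity, and `hidden` and `output` are the activations from the forward pass.

```python
def dsigmoid(y):
    # The sigmoid derivative expressed through its own output:
    # f'(x) = f(x)(1 - f(x)).
    return y * (1.0 - y)

def backprop_step(w_hidden, w_output, inp, target, hidden, output, eta=0.5):
    """Adjust both weight matrices in place for one Input-Target pair.
    The momentum term alpha*deltaW(t-1) from the text is omitted here."""
    # Output-layer error signal: delta_o = (T - O) . f'(O)
    delta_o = [(t - o) * dsigmoid(o) for t, o in zip(target, output)]
    # Hidden-layer error, back-propagated through the (pre-update)
    # output weights: delta_h = (W^o)^T . delta_o . f'(H)
    delta_h = [dsigmoid(h) * sum(w_output[k][j] * delta_o[k]
                                 for k in range(len(delta_o)))
               for j, h in enumerate(hidden)]
    # Weight updates: W(t+1) = W(t) + eta . delta . I^T
    for k, row in enumerate(w_output):
        for j in range(len(row)):
            row[j] += eta * delta_o[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_h[j] * inp[i]
```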
After completing those steps for a single Input-Target pair, the next Input-Target pair is used to repeat the above steps, then the next, and so on until the whole material has been learned (about 60,000 word class tokens in all).
We did not use masses of training material, so the neural network occasionally had to re-learn the same material. When the table resulting from a trial run showed only a few changes after one more round of training, we considered that the weight matrices had converged sufficiently. Below is the table of weights which <B>WinCALIS</B> uses now.
<center><table border>
<caption><i>Figure 2. Table of Neural Network Prediction Weights</i></caption>
<tr><td>Adverb</td><td>AD ( VE 0.6700 VA 0.0826 CV 0.0769 AD 0.0713 )</td></tr>
<tr><td>Adstative</td><td>AS ( AJ 0.9750 VE 0.1478 NO 0.0594 )</td></tr>
<tr><td>Adjective</td><td>AJ ( NO 0.3135 VE 0.1979 PS 0.1131 EC 0.0885 EP 0.0802 )</td></tr>
<tr><td>Coverb (Prep.)</td><td>CV ( NO 0.5748 VE 0.1640 )</td></tr>
<tr><td>Interjection</td><td>IN ( NO 0.4567 VE 0.1553 AJ 0.0780 )</td></tr>
<tr><td>Movable Adv.</td><td>MA ( NO 0.5776 VE 0.2537 AJ 0.0594 )</td></tr>
<tr><td>Measure</td><td>ME ( NO 0.4664 VE 0.1407 AJ 0.0801 )</td></tr>
<tr><td>Noun</td><td>NO ( NO 0.3253 VE 0.1812 EC 0.0741 EP 0.0686 PS 0.0592 AD 0.0569 )</td></tr>
<tr><td>Number</td><td>NU ( ME 0.5017 NU 0.3380 VE 0.1672 NO 0.0751 )</td></tr>
<tr><td>Particle</td><td>PA ( EP 0.3095 NO 0.1819 NU 0.1483 EC 0.1085 VE 0.0513 )</td></tr>
<tr><td>Ordinaliz. Part.</td><td>PO ( NU 0.8621 VE 0.1375 )</td></tr>
<tr><td>Subord. Particle</td><td>PS ( NO 0.6858 AJ 0.0894 )</td></tr>
<tr><td>Prevrb. Sub. Part.</td><td>PT ( VE 0.5069 AJ 0.2129 NO 0.1655 CV 0.1092 AD 0.0529 PS 0.0504 )</td></tr>
<tr><td>Pstvrb. Sub. Part.</td><td>PU ( AJ 0.5234 VE 0.1836 NO 0.1256 AD 0.1092 )</td></tr>
<tr><td>Specifier</td><td>SP ( ME 0.6236 NU 0.1553 NO 0.0875 )</td></tr>
<tr><td>Stative Verb</td><td>SV ( NO 0.2955 VE 0.1879 )</td></tr>
<tr><td>Aux. Verb</td><td>VA ( VE 0.6698 AJ 0.1123 CV 0.0839 AD 0.0517 )</td></tr>
<tr><td>Verb</td><td>VE ( NO 0.3315 VE 0.1636 NU 0.0706 EC 0.0624 )</td></tr>
<tr><td>Verb-Complmnt.</td><td>VC ( VE 0.2961 NO 0.1656 )</td></tr>
<tr><td>Verb-Obj. Cmpd.</td><td>VO ( VE 0.2391 )</td></tr>
<tr><td>Period</td><td>EP ( NO 0.6524 VE 0.1531 AD 0.0588 MA 0.0563 )</td></tr>
<tr><td>Comma</td><td>EC ( VE 0.2786 NO 0.2205 AD 0.1555 AJ 0.0731 MA 0.0559 )</td></tr>
<tr><td>Phrase</td><td>PH ( VE 0.4510 AD 0.0750 NO 0.0581 )</td></tr>
</table></center>
This table indicates the probability of one word class being followed by another. For example:
<li>after AD (adverb), VE (verb) is the most probable word class, occurring 67% of the time;
<li>the second most probable is VA (auxiliary verb), occurring 8% of the time;
<li>CV (coverb, similar to the English preposition) and AD (adverb) come next, each occurring 7% of the time;
<li>the other 19 word classes have probabilities so low that they are neglected.
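Used at conversion time, a row of this table works as a simple lookup. Here is a sketch with just the AD row transcribed from Figure 2; the structure and function name are our own.

```python
# Follower probabilities kept by the network, one row per word class.
# Only the AD (adverb) row from Figure 2 is transcribed here.
FOLLOWER_PROBS = {
    "AD": {"VE": 0.6700, "VA": 0.0826, "CV": 0.0769, "AD": 0.0713},
}

def likely_followers(word_class):
    """Return the probable following word classes, most probable first;
    classes below the table's cutoff are simply absent."""
    probs = FOLLOWER_PROBS.get(word_class, {})
    return sorted(probs, key=probs.get, reverse=True)
```

After an adverb, the ranked list starts with VE, matching the bullet points above.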
The neural network comes into play when the typist presses the space bar, which serves as a convert key, after finishing a word in <I>pinyin</I> transcription, and homophones are found in a search of the <B>WinCALIS</B> internal dictionary of <I>pinyin</I>-Chinese character correspondences. The network gets the word class of the preceding, already converted word from the text buffer, does its calculations, and gets the list of probable following word classes. Then the word classes of the homophones are compared with this list. If only one word has a word class belonging to the list, it is considered the one the typist wanted and is converted automatically, without any user intervention. For example, in the sentence cited above,

Wǒ yǒu sān bǎ yàoshi.

I have three (measure for keys) keys.
in the <B>WinCALIS</B> internal Chinese dictionary, <I>yàoshi</I> has the word entries:

要是 "if" MA (movable adverb)

钥匙 "key" NO (noun).

<I>Bǎ</I> is a measure word which may be followed by NO (noun), VE (verb), or AJ (adjective), but not by MA (movable adverb), so the neural network predicts it is the <I>yàoshi</I> meaning "key" 钥匙.
If more than one match is found, the most probable word is compared with the second most probable (but different) word. If the difference in probability is significant, the neural network can be sure that the most probable word is the one the typist wants. Consider the word <I>jìn</I> after <I>hěn</I> "very":
In the dictionary, the syllable <I>jìn</I> has the word entries:

进 "come in" VE

近 "near" AJ

劲 "energy" NO

<I>Hěn</I> is AS (adstative), so AJ gets a probability of 0.975, while VE and NO get only 0.1478 and 0.0594, respectively. The difference is very large, so the neural network is sure it is the <I>jìn</I> meaning "near" 近.
But consider another example. In the <B>WinCALIS</B> internal dictionary, <I>qī</I> has the word entries:

七 "seven" NU<br>
期 "period" ME

<I>Qī</I> 七 is a NU (number), so it may be followed by either another NU (number) or a ME (measure), and the difference in probability between NU and ME (0.5017 - 0.3380 = 0.1637) is considered insignificant. The neural network will not predict in this situation. In fact, both <I>qī qī (nián)</I> 七七年 "(the year) seventy-seven" and <I>(dì) qī qī</I> 第七期 "seven(th) period" are possible.
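The decision rule in the two examples above can be sketched as follows. The 0.2 significance threshold is our assumption: the paper only shows that 0.975 vs. 0.1478 is decisive while a gap of 0.1637 is not, without naming the actual cutoff.

```python
def choose_homophone(candidates, follower_probs, threshold=0.2):
    """candidates: (hanzi, word_class) pairs for one pinyin syllable.
    follower_probs: the table row for the preceding word's class.
    Returns the chosen hanzi, or None when the network should not
    predict and the typist gets a selection list instead."""
    scored = sorted(((follower_probs.get(wc, 0.0), hz) for hz, wc in candidates),
                    reverse=True)
    if scored[0][0] == 0.0:
        return None  # no candidate's class is in the probable list
    if len(scored) == 1 or scored[0][0] - scored[1][0] >= threshold:
        return scored[0][1]
    return None  # the difference is insignificant

# After hěn (AS), 近 "near" (AJ, 0.975) wins decisively over 进 and 劲:
AS_ROW = {"AJ": 0.9750, "VE": 0.1478, "NO": 0.0594}
# After qī (NU), 七 (NU, 0.3380) vs. 期 (ME, 0.5017) is too close to call:
NU_ROW = {"ME": 0.5017, "NU": 0.3380, "VE": 0.1672, "NO": 0.0751}
```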
When we analyzed the neural network's training and operation, it also suggested to us that we regroup the word classes. We had originally combined all the verbs Vi (verb-intransitive), Vt (verb-transitive), VA (verb-auxiliary), and Vr (verb-resultative) into one category VE. Because the word class NO (noun) was more likely to follow this mega-word class VE (verb) than any other, when one typed an auxiliary verb + main verb phrase like <I>néng tīng</I> "can listen," the neural network always favored the noun for the second word, choosing the nonsensical 能厅 "can hall" instead of the desired 能听 "can listen." We considered the difference between VA (auxiliary verbs) and regular verbs and decided to separate VA from the VE category. Remarkably, the neural network confirms that the lists of probable words after regular VE and VA are totally different. The same is true of the newly created word class AS (adstative), split off from the more general word class AD (adverb), and of PT (preverbal subordinating particle de 地) and PU (postverbal subordinating particle de 得), differentiated from PS (subordinating particle de 的).
The network usually selects correctly by offering the output with the highest probability in that context; the likelihood that it will offer the correct choice in second or third place, if its first choice is in error, approaches 100%. Considering that the system allows for twenty other outputs, this result is highly significant. The network has "learned" the allowable and unallowable sequences of Chinese syntax and applies that "learning" to the actual task of typing Chinese in phonetic representation.
314
<B>Presenters' Biodata </B>
316
Mei Yuan is a visiting researcher at the Humanities Computing
317
Facility of Duke University. She received a B.E. degree in Computer
318
Science from Zhejiang University and an M.S. degree in Computer
319
Assisted Design from the Shanghai Maritime Institute. She is
320
investigating the design and application of back-propagation networks
321
in Chinese natural language processing.
323
Frank L. Borchardt, Ph.D., is Professor of German at Duke University
324
and Executive Director of <a href="http://agoralang.com:2410/calico.html">CALICO</a>.
326
Richard A. Kunst, Ph.D., is a Research Associate in the Duke University
327
Computer Assisted Language Learning (DUCALL) Project, where he
328
is developing full support for all the languages of the world
329
in <B>WinCALIS</B> 2.0.
<B>Contact Information</B>
<p><center><a href="http://www.humancomp.org">The Humanities Computing Laboratory</a>
<br>A Nonprofit Education and Research Corporation
<br>301 W. Main St. Suite 400-I
<br>Durham, NC 27701 USA
<br><i>Voice: (919) 667-9556, 656-5915</i>
<br><i>Fax: (919) 667-9556</i>
<br>E-mail: <a href="mailto:info@humancomp.org"><i>info@humancomp.org</i></a></center></p>