~ubuntu-branches/ubuntu/utopic/robojournal/utopic


Viewing changes to ui/hunspell/affentry.cxx

Committer: Package Import Robot
Author(s): Ritesh Raj Sarraf
Date: 2013-10-22 11:50:46 UTC
mfrom: (1.2.1) (2.1.2 experimental)
Revision ID: package-import@ubuntu.com-20131022115046-gl6s110no2x4buoc

Tags: 0.4.2-1

* [b62d8ee] Imported Upstream version 0.4.2
* [344ee28] Update patches
* [3366262] Fix Vcs Links
* Upload to unstable

files added:
.pc/.quilt_patches

.pc/.quilt_series

.pc/fix-documentation-link.patch

.pc/fix-documentation-link.patch/ui

.pc/fix-documentation-link.patch/ui/mainwindow.cpp

MAINTAINERS.txt

README.md

README_en_US.txt

changelog.xhtml

compile-instructions.xhtml

core

core/buffer.cpp

core/buffer.h

core/settingsmanager.cpp

core/settingsmanager.h

debian/patches/disable-spellcheck.patch

debian/patches/fix-documentation-link.patch

debian/robojournal-doc.docs

doc/compile_doc.pl

doc/css

doc/css/master.css

doc/export.html

doc/faq.html

doc/fundamentals.html

doc/getting_started.html

doc/help_debugger.bat

doc/img

doc/img/rj_doc_header.png

doc/img/rj_doc_header.psd

doc/img/rj_icon.png

doc/img/screens

doc/img/screens/.directory

doc/img/screens/assign_tag1.png

doc/img/screens/assign_tag2.png

doc/img/screens/assign_tag3.png

doc/img/screens/assign_tag4.png

doc/img/screens/background_browse.png

doc/img/screens/clear_background.png

doc/img/screens/color_chooser.png

doc/img/screens/define_tag1.png

doc/img/screens/define_tag2.png

doc/img/screens/define_tag3.png

doc/img/screens/editor_new_entry.png

doc/img/screens/editor_new_entry2.png

doc/img/screens/export_preview.png

doc/img/screens/jc0.png

doc/img/screens/jc1.png

doc/img/screens/jc2.png

doc/img/screens/jc3.png

doc/img/screens/js0.png

doc/img/screens/js1.png

doc/img/screens/js2.png

doc/img/screens/js3.png

doc/img/screens/login1.png

doc/img/screens/mainwindow_connected_mode.png

doc/img/screens/mainwindow_default.png

doc/img/screens/mainwindow_default2.png

doc/img/screens/modify1.png

doc/img/screens/modify2.png

doc/img/screens/mw_anatomy.png

doc/img/screens/mw_delete1.png

doc/img/screens/mw_delete2.png

doc/img/screens/mw_delete3.png

doc/img/screens/mw_disconnect1.png

doc/img/screens/mw_export1.png

doc/img/screens/mw_export2.png

doc/img/screens/mw_export3.png

doc/img/screens/mw_export4.png

doc/img/screens/mw_export5.png

doc/img/screens/mw_export6.png

doc/img/screens/mw_modify1.png

doc/img/screens/mw_modify2.png

doc/img/screens/mw_navigation1.png

doc/img/screens/mw_navigation2.png

doc/img/screens/mw_new_entry.png

doc/img/screens/mw_new_entry_2.png

doc/img/screens/mw_search1.png

doc/img/screens/mw_search2.png

doc/img/screens/mw_search_anatomy.png

doc/img/screens/mw_tag1.png

doc/img/screens/mw_tag2.png

doc/img/screens/mw_tag3.png

doc/img/screens/mw_tag4.png

doc/img/screens/mysql_install1.png

doc/img/screens/mysql_install2.png

doc/img/screens/mysql_install3.png

doc/img/screens/mysql_install4.png

doc/img/screens/mysql_install5.png

doc/img/screens/mysql_install6.png

doc/img/screens/mysql_install7.png

doc/img/screens/pattern_search1.png

doc/img/screens/pattern_search2.png

doc/img/screens/pattern_search3.png

doc/img/screens/pattern_search4.png

doc/img/screens/preferences_appearance.png

doc/img/screens/preferences_editor.png

doc/img/screens/preferences_export.png

doc/img/screens/preferences_general.png

doc/img/screens/preferences_journal.png

doc/img/screens/preferences_mysql.png

doc/img/screens/range_indicator.png

doc/img/screens/remove_tag1.png

doc/img/screens/remove_tag2.png

doc/img/screens/remove_tag3.png

doc/img/screens/select_background.png

doc/img/screens/select_dictionary.png

doc/img/screens/set_gender.png

doc/img/screens/set_name.png

doc/img/screens/spell_check_demo.png

doc/img/screens/tag_reminder.png

doc/img/screens/tag_search1.png

doc/img/screens/tag_search2.png

doc/img/screens/tag_search3.png

doc/img/screens/tag_search4.png

doc/img/screens/toolbar_with_labels.png

doc/img/screens/welcome2.png

doc/img/screens/welcome_to_robojournal.png

doc/img/screens/win_setup1.png

doc/index.html

doc/legacy_compile_doc.sh

doc/preferences.html

doc/preview_doc.sh

doc/rj_help.png

doc/robojournal.qhcp

doc/robojournal.qhp

doc/search.html

doc/setup.html

doc/shortcut_keys.html

doc/tags.html

doc/template.html

doc/toc.html

en_US.aff

en_US.dic

icons/application_side_contract.png

icons/application_side_expand.png

icons/arrow-circle-double.png

icons/arrow-curve-180-left.png

icons/arrow-curve.png

icons/arrow-repeat.png

icons/arrow_rotate_clockwise.png

icons/balloon-quotation.png

icons/bin.png

icons/binocular.png

icons/book.png

icons/clear-text.png

icons/color-swatch.png

icons/disk.png

icons/document-globe.png

icons/document-tree.png

icons/eraser.png

icons/external.png

icons/folder.png

icons/funnel.png

icons/highlight-cyan.png

icons/highlight-green.png

icons/highlight-orange.png

icons/highlight-pink.png

icons/highlight-purple.png

icons/highlight_yellow'.png

icons/latest.png

icons/modify2.png

icons/mysql_icon2.png

icons/na.png

icons/navigation-180-button.png

icons/node.png

icons/page_white.png

icons/page_white_copy.png

icons/page_white_database.png

icons/page_white_stack.png

icons/pencil-small.png

icons/pencil2.png

icons/picture.png

icons/prohibition-button.png

icons/robojournal-icon-big.png

icons/server.png

icons/spell-check-error.png

icons/spell_swap.png

icons/sqlite_icon.png

icons/tag_red.png

icons/tag_red_add.png

icons/textfield.png

icons/time.png

icons/ui-text-area.png

icons/user-female2.png

icons/user.png

icons/write2.png

known_bugs.txt

license.txt

linux_compile.pl

menus

menus/robojournal

menus/robojournal.desktop

menus/robojournal.xpm

pkg/changelog

pkg/compat

pkg/control

pkg/copyright

pkg/fedora-rpmbuild.patch

pkg/menu

pkg/robojournal.spec

pkg/rules

pkg/watch

robojournal.7.gz

robojournal32.png

robojournal48.png

robojournal64.png

sql/mysqlcore.cpp

sql/mysqlcore.h

sql/psqlcore.cpp

sql/psqlcore.h

sql/sqlitecore.cpp

sql/sqlitecore.h

sql/sqlshield.cpp

sql/sqlshield.h

ui/SpellTextEdit.cpp

ui/SpellTextEdit.h

ui/aboutrj.cpp

ui/aboutrj.h

ui/aboutrj.ui

ui/configurationappearance.cpp

ui/configurationappearance.h

ui/configurationappearance.ui

ui/configurationeditor.cpp

ui/configurationeditor.h

ui/configurationeditor.ui

ui/configurationexport.cpp

ui/configurationexport.h

ui/configurationexport.ui

ui/configurationgeneral.cpp

ui/configurationgeneral.h

ui/configurationgeneral.ui

ui/configurationjournal.cpp

ui/configurationjournal.h

ui/configurationjournal.ui

ui/configurationmysql.cpp

ui/configurationmysql.h

ui/configurationmysql.ui

ui/dblogin.cpp

ui/dblogin.h

ui/dblogin.ui

ui/editor.cpp

ui/editor.h

ui/editor.ui

ui/entryexporter.cpp

ui/entryexporter.h

ui/entryexporter.ui

ui/entrysearch.cpp

ui/exportpreview.cpp

ui/exportpreview.h

ui/exportpreview.ui

ui/firstrun.cpp

ui/firstrun.h

ui/firstrun.ui

ui/highlighter.cpp

ui/highlighter.h

ui/hunspell

ui/hunspell/affentry.cxx

ui/hunspell/affentry.hxx

ui/hunspell/affixmgr.cxx

ui/hunspell/affixmgr.hxx

ui/hunspell/atypes.hxx

ui/hunspell/baseaffix.hxx

ui/hunspell/csutil.cxx

ui/hunspell/csutil.hxx

ui/hunspell/dictmgr.cxx

ui/hunspell/dictmgr.hxx

ui/hunspell/filemgr.cxx

ui/hunspell/filemgr.hxx

ui/hunspell/hashmgr.cxx

ui/hunspell/hashmgr.hxx

ui/hunspell/htypes.hxx

ui/hunspell/hunspell.cxx

ui/hunspell/hunspell.h

ui/hunspell/hunspell.hxx

ui/hunspell/hunzip.cxx

ui/hunspell/hunzip.hxx

ui/hunspell/langnum.hxx

ui/hunspell/license.hunspell

ui/hunspell/license.myspell

ui/hunspell/phonet.cxx

ui/hunspell/phonet.hxx

ui/hunspell/suggestmgr.cxx

ui/hunspell/suggestmgr.hxx

ui/hunspell/utf_info.cxx

ui/hunspell/w_char.hxx

ui/journalcreator.cpp

ui/journalcreator.h

ui/journalcreator.ui

ui/journalselector.cpp

ui/journalselector.h

ui/journalselector.ui

ui/mainwindow.cpp

ui/mainwindow.h

ui/mainwindow.ui

ui/newconfig.cpp

ui/newconfig.h

ui/newconfig.ui

ui/tagger.cpp

ui/tagger.h

ui/tagger.ui

ui/tagreminder.cpp

ui/tagreminder.h

ui/tagreminder.ui

win32_cleanup.bat

win32_compile.bat

files removed:
CHANGELOG.htm

INSTALL-GUIDE-r3.htm

LICENSE.txt

aboutrj.cpp

aboutrj.h

aboutrj.ui

buffer.cpp

buffer.h

config.cpp

config.h

config.ui

configmanager.cpp

configmanager.h

dblogin.cpp

dblogin.h

dblogin.ui

editor.cpp

editor.h

editor.ui

entrysearch.cpp

entrysearch.h

entrysearch.ui

firstrun.cpp

firstrun.h

firstrun.ui

icons/action_paste.gif

icons/action_save.gif

icons/arrow_redo.png

icons/arrow_undo.png

icons/asterisk_yellow.png

icons/book_edit.png

icons/book_open.png

icons/calendar_view_day.png

icons/calendar_view_month.png

icons/color_wheel.png

icons/comment_edit.png

icons/copy.gif

icons/cross.png

icons/database_error.png

icons/eye.png

icons/folder_picture.png

icons/magnifier.png

icons/note_edit.png

icons/note_new.gif

icons/page_white_edit.png

icons/text_dropcaps.png

icons/wand.png

journalcreator.cpp

journalcreator.h

journalcreator.ui

mainwindow.cpp

mainwindow.h

mainwindow.ui

mysqlcore.cpp

mysqlcore.h

newdatabase.h

psqlcore.cpp

psqlcore.h

sqlitecore.cpp

sqlitecore.h

files modified:
.pc/applied-patches

debian/changelog

debian/compat

debian/control

debian/patches/robojournal-xpm-icon.patch

debian/patches/series

debian/rules

icons/add.png *

icons/application.png *

icons/bullet_black.png *

icons/bullet_blue.png *

icons/connect.png *

icons/cut.png *

icons/database.png *

icons/database_add.png *

icons/database_link.png *

icons/delete.png *

icons/disconnect.png *

icons/paste_plain.png *

icons/pencil.png *

icons/printer.png *

icons/resultset_first.png *

icons/resultset_last.png *

icons/robojournal-icon.png *

icons/robojournal.ico *

icons/robojournal.png *

icons/shield.png *

icons/tag_orange.png *

icons/wrench.png *

icons/wrench_orange.png *

images.qrc

main.cpp

robojournal.pro
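In ui/hunspell/affentry.cxx, `PfxEntry::add` and `SfxEntry::add` derive a word from a root in two steps. They remove a strip string from one end of the root, then attach an append string in its place. The `checkword` members run the same transformation backwards to recover a candidate root for dictionary lookup.

The following is a minimal sketch of only that string arithmetic, using `std::string`. The `AffixRule` struct and the four function names are hypothetical illustrations, not part of Hunspell. The sketch deliberately omits the condition test, the affix flags, the fixed-size buffers and the UTF-8 handling that the real code deals with.

```cpp
#include <cassert>  // assert() for quick self-checks
#include <string>

// Hypothetical, simplified affix rule: only the strip/append strings of an
// affix entry are modelled (no flags, conditions or morphology).
struct AffixRule {
    std::string strip;  // characters removed from the root
    std::string appnd;  // characters added in their place
};

// Same idea as PfxEntry::add: the root must be longer than `strip` and start
// with it; `strip` is dropped from the front and `appnd` is prepended.
// Returns "" when the rule does not apply.
std::string apply_prefix(const AffixRule& r, const std::string& root) {
    if (root.size() <= r.strip.size() ||
        root.compare(0, r.strip.size(), r.strip) != 0)
        return "";
    return r.appnd + root.substr(r.strip.size());
}

// Same idea as SfxEntry::add, working on the end of the root.
std::string apply_suffix(const AffixRule& r, const std::string& root) {
    if (root.size() <= r.strip.size() ||
        root.compare(root.size() - r.strip.size(), r.strip.size(), r.strip) != 0)
        return "";
    return root.substr(0, root.size() - r.strip.size()) + r.appnd;
}

// Reverse direction, as in the first step of PfxEntry::checkword: the caller
// has already established that `word` starts with `appnd`; remove it and put
// the stripped characters back to obtain the candidate root.
std::string prefix_root(const AffixRule& r, const std::string& word) {
    if (word.size() <= r.appnd.size()) return "";  // like the tmpl > 0 test
    return r.strip + word.substr(r.appnd.size());
}

// Reverse direction for suffixes, as in SfxEntry::checkword.
std::string suffix_root(const AffixRule& r, const std::string& word) {
    if (word.size() <= r.appnd.size()) return "";
    return word.substr(0, word.size() - r.appnd.size()) + r.strip;
}
```

With a rule `{"y", "ies"}`, `apply_suffix` turns "fly" into "flies", and `suffix_root` recovers "fly" from "flies". In affentry.cxx, a recovered root must still pass `test_condition()` and the flag checks on its dictionary entry before it is accepted.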

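Both `test_condition()` members in ui/hunspell/affentry.cxx check an affix condition. A condition is a small pattern language:

- a literal character matches itself;
- `.` matches any character;
- a bracket group such as `[^aeiou]` matches one character from (or, with `^`, not from) a set.

Prefix conditions are compared with the start of the root; suffix conditions are compared with its end.

The following is a simplified, ASCII-only reimplementation of that idea for illustration. `CondUnit`, `parse_condition`, `unit_matches` and `condition_holds` are hypothetical names. The sketch does not reproduce the UTF-8 byte stepping, the split long-condition storage (`aeLONGCOND`) or every edge case of the original.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One position of an affix condition: a literal ("y"), any character ("."),
// or a bracket group ("[aeiou]" or the negated "[^aeiou]").
struct CondUnit {
    bool any = false;      // '.' outside brackets
    bool negated = false;  // group starts with '^'
    std::string chars;     // literal character or group members
};

// Split a condition into units; returns false on an unterminated '['.
// ASCII only: multibyte UTF-8 characters are not handled here.
bool parse_condition(const std::string& cond, std::vector<CondUnit>& out) {
    out.clear();
    for (std::size_t i = 0; i < cond.size(); ++i) {
        CondUnit u;
        if (cond[i] == '.') {
            u.any = true;
        } else if (cond[i] == '[') {
            std::size_t close = cond.find(']', i + 1);
            if (close == std::string::npos) return false;
            std::size_t start = i + 1;
            if (start < close && cond[start] == '^') {
                u.negated = true;
                ++start;
            }
            u.chars = cond.substr(start, close - start);  // '.' is literal here
            i = close;
        } else {
            u.chars = std::string(1, cond[i]);
        }
        out.push_back(u);
    }
    return true;
}

bool unit_matches(const CondUnit& u, char c) {
    if (u.any) return true;
    bool member = u.chars.find(c) != std::string::npos;
    return u.negated ? !member : member;
}

// Prefix conditions are compared with the first characters of the root,
// suffix conditions with the last ones. Roots shorter than the condition
// are rejected.
bool condition_holds(const std::string& cond, const std::string& root,
                     bool is_suffix) {
    std::vector<CondUnit> units;
    if (!parse_condition(cond, units)) return false;
    if (root.size() < units.size()) return false;
    std::size_t offset = is_suffix ? root.size() - units.size() : 0;
    for (std::size_t k = 0; k < units.size(); ++k)
        if (!unit_matches(units[k], root[offset + k])) return false;
    return true;
}
```

For example, `condition_holds("[^aeiou]y", "fly", true)` is true, while the same call with "play" is false. A typical English rule that replaces a final y with ies relies on exactly this distinction.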

ui/hunspell/affentry.cxx

/*
    This file is part of RoboJournal.
    MADE IN USA

    RoboJournal is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    RoboJournal is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with RoboJournal. If not, see <http://www.gnu.org/licenses/>.
*/

/* ***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
 *
 * The contents of this file are subject to the Mozilla Public License Version
 * 1.1 (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 * http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS" basis,
 * WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
 * for the specific language governing rights and limitations under the
 * License.
 *
 * The Original Code is Hunspell, based on MySpell.
 *
 * The Initial Developers of the Original Code are
 * Kevin Hendricks (MySpell) and Laszlo Nemeth (Hunspell).
 *
 * Contributor(s):
 * David Einstein
 * Davide Prina
 * Giuseppe Modugno
 * Gianluca Turconi
 * Simon Brouwer
 * Noll Janos
 * Biro Arpad
 * Goldman Eleonora
 * Sarlos Tamas
 * Bencsath Boldizsar
 * Halacsy Peter
 * Dvornik Laszlo
 * Gefferth Andras
 * Nagy Viktor
 * Varga Daniel
 * Chris Halls
 * Rene Engelhard
 * Bram Moolenaar
 * Dafydd Jones
 * Harri Pitkanen
 * Andras Timar
 * Tor Lillqvist
 *
 * Alternatively, the contents of this file may be used under the terms of
 * either the GNU General Public License Version 2 or later (the "GPL"), or
 * the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
 * in which case the provisions of the GPL or the LGPL are applicable instead
 * of those above. If you wish to allow use of your version of this file only
 * under the terms of either the GPL or the LGPL, and not to allow others to
 * use your version of this file under the terms of the MPL, indicate your
 * decision by deleting the provisions above and replace them with the notice
 * and other provisions required by the GPL or the LGPL. If you do not delete
 * the provisions above, a recipient may use your version of this file under
 * the terms of any one of the MPL, the GPL or the LGPL.
 *
 * ***** END LICENSE BLOCK ***** */

#include "license.hunspell"
#include "license.myspell"

#ifndef MOZILLA_CLIENT
#include <cstdlib>
#include <cstring>
#include <cctype>
#include <cstdio>
#else
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#endif

#include "ui/hunspell/affentry.hxx"
#include "ui/hunspell/csutil.hxx"

#ifndef MOZILLA_CLIENT
#ifndef WIN32
using namespace std;
#endif
#endif

PfxEntry::PfxEntry(AffixMgr* pmgr, affentry* dp)
{
    // register affix manager
    pmyMgr = pmgr;

    // set up its initial values

    aflag = dp->aflag;         // flag
    strip = dp->strip;         // string to strip
    appnd = dp->appnd;         // string to append
    stripl = dp->stripl;       // length of strip string
    appndl = dp->appndl;       // length of append string
    numconds = dp->numconds;   // length of the condition
    opts = dp->opts;           // cross product flag
    // then copy over all of the conditions
    if (opts & aeLONGCOND) {
        memcpy(c.conds, dp->c.l.conds1, MAXCONDLEN_1);
        c.l.conds2 = dp->c.l.conds2;
    } else memcpy(c.conds, dp->c.conds, MAXCONDLEN);
    next = NULL;
    nextne = NULL;
    nexteq = NULL;
    morphcode = dp->morphcode;
    contclass = dp->contclass;
    contclasslen = dp->contclasslen;
}

PfxEntry::~PfxEntry()
{
    aflag = 0;
    if (appnd) free(appnd);
    if (strip) free(strip);
    pmyMgr = NULL;
    appnd = NULL;
    strip = NULL;
    if (opts & aeLONGCOND) free(c.l.conds2);
    if (morphcode && !(opts & aeALIASM)) free(morphcode);
    if (contclass && !(opts & aeALIASF)) free(contclass);
}

// add prefix to this word assuming conditions hold
char * PfxEntry::add(const char * word, int len)
{
    char tword[MAXWORDUTF8LEN + 4];

    if ((len > stripl) && (len >= numconds) && test_condition(word) &&
        (!stripl || (strncmp(word, strip, stripl) == 0)) &&
        ((MAXWORDUTF8LEN + 4) > (len + appndl - stripl))) {
        /* we have a match so add prefix */
        char * pp = tword;
        if (appndl) {
            strcpy(tword,appnd);
            pp += appndl;
        }
        strcpy(pp, (word + stripl));
        return mystrdup(tword);
    }
    return NULL;
}

inline char * PfxEntry::nextchar(char * p) {
    if (p) {
        p++;
        if (opts & aeLONGCOND) {
            // jump to the 2nd part of the condition
            if (p == c.conds + MAXCONDLEN_1) return c.l.conds2;
        // end of the MAXCONDLEN length condition
        } else if (p == c.conds + MAXCONDLEN) return NULL;
    }
    return p;
}

inline int PfxEntry::test_condition(const char * st)
{
    const char * pos = NULL; // group with pos input position
    bool neg = false;        // complementer
    bool ingroup = false;    // character in the group
    if (numconds == 0) return 1;
    char * p = c.conds;
    while (1) {
        switch (*p) {
        case '\0': return 1;
        case '[': {
            neg = false;
            ingroup = false;
            p = nextchar(p);
            pos = st; break;
        }
        case '^': { p = nextchar(p); neg = true; break; }
        case ']': {
            if ((neg && ingroup) || (!neg && !ingroup)) return 0;
            pos = NULL;
            p = nextchar(p);
            // skip the next character
            if (!ingroup) for (st++; (opts & aeUTF8) && (*st & 0xc0) == 0x80; st++);
            if (*st == '\0' && p && *p != '\0') return 0; // word <= condition
            break;
        }
        case '.': if (!pos) { // dots are not metacharacters in groups: [.]
            p = nextchar(p);
            // skip the next character
            for (st++; (opts & aeUTF8) && (*st & 0xc0) == 0x80; st++);
            if (*st == '\0') return 0; // word <= condition
            break;
        }
        default: {
            if (*st == *p) {
                st++;
                p = nextchar(p);
                if ((opts & aeUTF8) && (*(st - 1) & 0x80)) { // multibyte
                    while (p && (*p & 0xc0) == 0x80) {       // character
                        if (*p != *st) {
                            if (!pos) return 0;
                            st = pos;
                            break;
                        }
                        p = nextchar(p);
                        st++;
                    }
                    if (pos && st != pos) {
                        ingroup = true;
                        while (p && *p != ']' && (p = nextchar(p)));
                    }
                } else if (pos) {
                    ingroup = true;
                    while (p && *p != ']' && (p = nextchar(p)));
                }
            } else if (pos) { // group
                p = nextchar(p);
            } else return 0;
        }
        }
        if (!p) return 1;
    }
}

// check if this prefix entry matches
struct hentry * PfxEntry::checkword(const char * word, int len, char in_compound, const FLAG needflag)
{
    int tmpl;                    // length of tmpword
    struct hentry * he;          // hash entry of root word or NULL
    char tmpword[MAXWORDUTF8LEN + 4];

    // on entry prefix is 0 length or already matches the beginning of the word.
    // So if the remaining root word has positive length
    // and if there are enough chars in root word and added back strip chars
    // to meet the number of characters conditions, then test it

    tmpl = len - appndl;

    if (tmpl > 0) {

        // generate new root word by removing prefix and adding
        // back any characters that would have been stripped

        if (stripl) strcpy (tmpword, strip);
        strcpy ((tmpword + stripl), (word + appndl));

        // now make sure all of the conditions on characters
        // are met. Please see the appendix at the end of
        // this file for more info on exactly what is being
        // tested

        // if all conditions are met then check if the resulting
        // root word is in the dictionary

        if (test_condition(tmpword)) {
            tmpl += stripl;
            if ((he = pmyMgr->lookup(tmpword)) != NULL) {
                do {
                    if (TESTAFF(he->astr, aflag, he->alen) &&
                        // forbid single prefixes with needaffix flag
                        ! TESTAFF(contclass, pmyMgr->get_needaffix(), contclasslen) &&
                        // needflag
                        ((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
                         (contclass && TESTAFF(contclass, needflag, contclasslen))))
                        return he;
                    he = he->next_homonym; // check homonyms
                } while (he);
            }

            // prefix matched but no root word was found
            // if aeXPRODUCT is allowed, try again but now
            // cross checked combined with a suffix

            //if ((opts & aeXPRODUCT) && in_compound) {
            if ((opts & aeXPRODUCT)) {
                he = pmyMgr->suffix_check(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this, NULL,
                                          0, NULL, FLAG_NULL, needflag, in_compound);
                if (he) return he;
            }
        }
    }
    return NULL;
}

// check if this prefix entry matches
struct hentry * PfxEntry::check_twosfx(const char * word, int len,
                                       char in_compound, const FLAG needflag)
{
    int tmpl;                    // length of tmpword
    struct hentry * he;          // hash entry of root word or NULL
    char tmpword[MAXWORDUTF8LEN + 4];

    // on entry prefix is 0 length or already matches the beginning of the word.
    // So if the remaining root word has positive length
    // and if there are enough chars in root word and added back strip chars
    // to meet the number of characters conditions, then test it

    tmpl = len - appndl;

    if ((tmpl > 0) && (tmpl + stripl >= numconds)) {

        // generate new root word by removing prefix and adding
        // back any characters that would have been stripped

        if (stripl) strcpy (tmpword, strip);
        strcpy ((tmpword + stripl), (word + appndl));

        // now make sure all of the conditions on characters
        // are met. Please see the appendix at the end of
        // this file for more info on exactly what is being
        // tested

        // if all conditions are met then check if the resulting
        // root word is in the dictionary

        if (test_condition(tmpword)) {
            tmpl += stripl;

            // prefix matched but no root word was found
            // if aeXPRODUCT is allowed, try again but now
            // cross checked combined with a suffix

            if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
                he = pmyMgr->suffix_check_twosfx(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this, needflag);
                if (he) return he;
            }
        }
    }
    return NULL;
}

// check if this prefix entry matches
char * PfxEntry::check_twosfx_morph(const char * word, int len,
                                    char in_compound, const FLAG needflag)
{
    int tmpl;                    // length of tmpword
    char tmpword[MAXWORDUTF8LEN + 4];

    // on entry prefix is 0 length or already matches the beginning of the word.
    // So if the remaining root word has positive length
    // and if there are enough chars in root word and added back strip chars
    // to meet the number of characters conditions, then test it

    tmpl = len - appndl;

    if ((tmpl > 0) && (tmpl + stripl >= numconds)) {

        // generate new root word by removing prefix and adding
        // back any characters that would have been stripped

        if (stripl) strcpy (tmpword, strip);
        strcpy ((tmpword + stripl), (word + appndl));

        // now make sure all of the conditions on characters
        // are met. Please see the appendix at the end of
        // this file for more info on exactly what is being
        // tested

        // if all conditions are met then check if the resulting
        // root word is in the dictionary

        if (test_condition(tmpword)) {
            tmpl += stripl;

            // prefix matched but no root word was found
            // if aeXPRODUCT is allowed, try again but now
            // cross checked combined with a suffix

            if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
                return pmyMgr->suffix_check_twosfx_morph(tmpword, tmpl,
                                                         aeXPRODUCT, (AffEntry *)this, needflag);
            }
        }
    }
    return NULL;
}

// check if this prefix entry matches
char * PfxEntry::check_morph(const char * word, int len, char in_compound, const FLAG needflag)
{
    int tmpl;                    // length of tmpword
    struct hentry * he;          // hash entry of root word or NULL
    char tmpword[MAXWORDUTF8LEN + 4];
    char result[MAXLNLEN];
    char * st;

    *result = '\0';

    // on entry prefix is 0 length or already matches the beginning of the word.
    // So if the remaining root word has positive length
    // and if there are enough chars in root word and added back strip chars
    // to meet the number of characters conditions, then test it

    tmpl = len - appndl;

    if ((tmpl > 0) && (tmpl + stripl >= numconds)) {

        // generate new root word by removing prefix and adding
        // back any characters that would have been stripped

        if (stripl) strcpy (tmpword, strip);
        strcpy ((tmpword + stripl), (word + appndl));

        // now make sure all of the conditions on characters
        // are met. Please see the appendix at the end of
        // this file for more info on exactly what is being
        // tested

        // if all conditions are met then check if the resulting
        // root word is in the dictionary

        if (test_condition(tmpword)) {
            tmpl += stripl;
            if ((he = pmyMgr->lookup(tmpword)) != NULL) {
                do {
                    if (TESTAFF(he->astr, aflag, he->alen) &&
                        // forbid single prefixes with needaffix flag
                        ! TESTAFF(contclass, pmyMgr->get_needaffix(), contclasslen) &&
                        // needflag
                        ((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
                         (contclass && TESTAFF(contclass, needflag, contclasslen)))) {
                        if (morphcode) {
                            strcat(result, " ");
                            strcat(result, morphcode);
                        } else strcat(result,getKey());
                        if (!HENTRY_FIND(he, MORPH_STEM)) {
                            strcat(result, " ");
                            strcat(result, MORPH_STEM);
                            strcat(result, HENTRY_WORD(he));
                        }
                        // store the pointer of the hash entry
                        if (HENTRY_DATA(he)) {
                            strcat(result, " ");
                            strcat(result, HENTRY_DATA2(he));
                        } else {
                            // return with debug information
                            char * flag = pmyMgr->encode_flag(getFlag());
                            strcat(result, " ");
                            strcat(result, MORPH_FLAG);
                            strcat(result, flag);
                            free(flag);
                        }
                        strcat(result, "\n");
                    }
                    he = he->next_homonym;
                } while (he);
            }

            // prefix matched but no root word was found
            // if aeXPRODUCT is allowed, try again but now
            // cross checked combined with a suffix

            if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
                st = pmyMgr->suffix_check_morph(tmpword, tmpl, aeXPRODUCT, (AffEntry *)this,
                                                FLAG_NULL, needflag);
                if (st) {
                    strcat(result, st);
                    free(st);
                }
            }
        }
    }

    if (*result) return mystrdup(result);
    return NULL;
}

SfxEntry::SfxEntry(AffixMgr * pmgr, affentry* dp)
{
    // register affix manager
    pmyMgr = pmgr;

    // set up its initial values
    aflag = dp->aflag;         // char flag
    strip = dp->strip;         // string to strip
    appnd = dp->appnd;         // string to append
    stripl = dp->stripl;       // length of strip string
    appndl = dp->appndl;       // length of append string
    numconds = dp->numconds;   // length of the condition
    opts = dp->opts;           // cross product flag

    // then copy over all of the conditions
    if (opts & aeLONGCOND) {
        memcpy(c.l.conds1, dp->c.l.conds1, MAXCONDLEN_1);
        c.l.conds2 = dp->c.l.conds2;
    } else memcpy(c.conds, dp->c.conds, MAXCONDLEN);

    rappnd = myrevstrdup(appnd);
    morphcode = dp->morphcode;
    contclass = dp->contclass;
    contclasslen = dp->contclasslen;
}

SfxEntry::~SfxEntry()
{
    aflag = 0;
    if (appnd) free(appnd);
    if (rappnd) free(rappnd);
    if (strip) free(strip);
    pmyMgr = NULL;
    appnd = NULL;
    strip = NULL;
    if (opts & aeLONGCOND) free(c.l.conds2);
    if (morphcode && !(opts & aeALIASM)) free(morphcode);
    if (contclass && !(opts & aeALIASF)) free(contclass);
}

// add suffix to this word assuming conditions hold
char * SfxEntry::add(const char * word, int len)
{
    char tword[MAXWORDUTF8LEN + 4];

    /* make sure all conditions match */
    if ((len > stripl) && (len >= numconds) && test_condition(word + len, word) &&
        (!stripl || (strcmp(word + len - stripl, strip) == 0)) &&
        ((MAXWORDUTF8LEN + 4) > (len + appndl - stripl))) {
        /* we have a match so add suffix */
        strcpy(tword,word);
        if (appndl) {
            strcpy(tword + len - stripl, appnd);
        } else {
            *(tword + len - stripl) = '\0';
        }
        return mystrdup(tword);
    }
    return NULL;
}

inline char * SfxEntry::nextchar(char * p) {
    p++;
    if (opts & aeLONGCOND) {
        // jump to the 2nd part of the condition
        if (p == c.l.conds1 + MAXCONDLEN_1) return c.l.conds2;
    // end of the MAXCONDLEN length condition
    } else if (p == c.conds + MAXCONDLEN) return NULL;
    return p;
}

inline int SfxEntry::test_condition(const char * st, const char * beg)
{
    const char * pos = NULL; // group with pos input position
    bool neg = false;        // complementer
    bool ingroup = false;    // character in the group
    if (numconds == 0) return 1;
    char * p = c.conds;
    st--;
    int i = 1;
    while (1) {
        switch (*p) {
        case '\0': return 1;
        case '[': { p = nextchar(p); pos = st; break; }
        case '^': { p = nextchar(p); neg = true; break; }
        case ']': { if (!neg && !ingroup) return 0;
            i++;
            // skip the next character
            if (!ingroup) {
                for (; (opts & aeUTF8) && (st >= beg) && (*st & 0xc0) == 0x80; st--);
                st--;
            }
            pos = NULL;
            neg = false;
            ingroup = false;
            p = nextchar(p);
            if (st < beg && p && *p != '\0') return 0; // word <= condition
            break;
        }
        case '.': if (!pos) { // dots are not metacharacters in groups: [.]
            p = nextchar(p);
            // skip the next character
            for (st--; (opts & aeUTF8) && (st >= beg) && (*st & 0xc0) == 0x80; st--);
            if (st < beg) return 0; // word <= condition
            if (*st & 0x80) { // head of the UTF-8 character
                st--;
                if (st < beg) return 0; // word <= condition
            }
            break;
        }
        default: {
            if (*st == *p) {
                p = nextchar(p);
                if ((opts & aeUTF8) && (*st & 0x80)) {
                    st--;
                    while (p && (st >= beg)) {
                        if (*p != *st) {
                            if (!pos) return 0;
                            st = pos;
                            break;
                        }
                        // first byte of the UTF-8 multibyte character
                        if ((*p & 0xc0) != 0x80) break;
                        p = nextchar(p);
                        st--;
                    }
                    if (pos && st != pos) {
                        if (neg) return 0;
                        else if (i == numconds) return 1;
                        ingroup = true;
                        while (p && *p != ']' && (p = nextchar(p)));
                    }
                    if (p && *p != '\0') p = nextchar(p);
                } else if (pos) {
                    if (neg) return 0;
                    else if (i == numconds) return 1;
                    ingroup = true;
                    st--;
                }
                if (!pos) {
                    i++;
                    st--;
                    if (st < beg && p && *p != '\0') return 0; // word <= condition
                }
            } else if (pos) { // group
                p = nextchar(p);
            } else return 0;
        }
        }
        if (!p) return 1;
    }
}

// see if this suffix is present in the word
struct hentry * SfxEntry::checkword(const char * word, int len, int optflags,
    AffEntry* ppfx, char ** wlst, int maxSug, int * ns, const FLAG cclass, const FLAG needflag,
    const FLAG badflag)
{
    int tmpl;                    // length of tmpword
    struct hentry * he;          // hash entry pointer
    unsigned char * cp;
    char tmpword[MAXWORDUTF8LEN + 4];
    PfxEntry* ep = (PfxEntry *) ppfx;

    // if this suffix is being cross checked with a prefix
    // but it does not support cross products skip it

    if (((optflags & aeXPRODUCT) != 0) && ((opts & aeXPRODUCT) == 0))
        return NULL;

    // upon entry suffix is 0 length or already matches the end of the word.
    // So if the remaining root word has positive length
    // and if there are enough chars in root word and added back strip chars
    // to meet the number of characters conditions, then test it

    tmpl = len - appndl;
    // the second condition is not enough for UTF-8 strings
    // it is checked in test_condition()

    if ((tmpl > 0) && (tmpl + stripl >= numconds)) {

        // generate new root word by removing suffix and adding
        // back any characters that would have been stripped, or
        // by null-terminating the shorter string

        strcpy (tmpword, word);
        cp = (unsigned char *)(tmpword + tmpl);
        if (stripl) {
            strcpy ((char *)cp, strip);
            tmpl += stripl;
            cp = (unsigned char *)(tmpword + tmpl);
        } else *cp = '\0';

        // now make sure all of the conditions on characters
        // are met. Please see the appendix at the end of
        // this file for more info on exactly what is being
        // tested

        // if all conditions are met then check if the resulting
        // root word is in the dictionary

        if (test_condition((char *) cp, (char *) tmpword)) {

#ifdef SZOSZABLYA_POSSIBLE_ROOTS
            fprintf(stdout,"%s %s %c\n", word, tmpword, aflag);
#endif
            if ((he = pmyMgr->lookup(tmpword)) != NULL) {
                do {
                    // check conditional suffix (enabled by prefix)
                    if ((TESTAFF(he->astr, aflag, he->alen) || (ep && ep->getCont() &&
                            TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
                        (((optflags & aeXPRODUCT) == 0) ||
                            TESTAFF(he->astr, ep->getFlag(), he->alen) ||
                            // enabled by prefix
                            ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
                        ) &&
                        // handle cont. class
                        ((!cclass) ||
                            ((contclass) && TESTAFF(contclass, cclass, contclasslen))
                        ) &&
                        // check only in compound homonyms (bad flags)
                        (!badflag || !TESTAFF(he->astr, badflag, he->alen)
                        ) &&
                        // handle required flag
                        ((!needflag) ||
                            (TESTAFF(he->astr, needflag, he->alen) ||
                            ((contclass) && TESTAFF(contclass, needflag, contclasslen)))
                        )
                    ) return he;
                    he = he->next_homonym; // check homonyms
                } while (he);

            // obsolete stemming code (used only by the
            // experimental SuffixMgr:suggest_pos_stems)
            // store resulting root in wlst
            } else if (wlst && (*ns < maxSug)) {
                int cwrd = 1;
                for (int k=0; k < *ns; k++)
                    if (strcmp(tmpword, wlst[k]) == 0) cwrd = 0;
                if (cwrd) {
                    wlst[*ns] = mystrdup(tmpword);
                    if (wlst[*ns] == NULL) {
                        for (int j=0; j<*ns; j++) free(wlst[j]);
                        *ns = -1;
                        return NULL;
                    }
                    (*ns)++;
                }
            }
        }
    }
    return NULL;
}

737

738

// see if two-level suffix is present in the word
struct hentry * SfxEntry::check_twosfx(const char * word, int len, int optflags,
    AffEntry* ppfx, const FLAG needflag)
{
  int tmpl;                 // length of tmpword
  struct hentry * he;       // hash entry pointer
  unsigned char * cp;
  char tmpword[MAXWORDUTF8LEN + 4];
  PfxEntry* ep = (PfxEntry *) ppfx;

  // if this suffix is being cross checked with a prefix
  // but it does not support cross products skip it

  if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
    return NULL;

  // upon entry suffix is 0 length or already matches the end of the word.
  // So if the remaining root word has positive length
  // and if there are enough chars in root word and added back strip chars
  // to meet the number of character conditions, then test it

  tmpl = len - appndl;

  if ((tmpl > 0) && (tmpl + stripl >= numconds)) {

    // generate new root word by removing suffix and adding
    // back any characters that would have been stripped or
    // null terminating the shorter string

    strcpy (tmpword, word);
    cp = (unsigned char *)(tmpword + tmpl);
    if (stripl) {
      strcpy ((char *)cp, strip);
      tmpl += stripl;
      cp = (unsigned char *)(tmpword + tmpl);
    } else *cp = '\0';

    // now make sure all of the conditions on characters
    // are met. Please see the appendix at the end of
    // this file for more info on exactly what is being
    // tested

    // if all conditions are met then call suffix_check again on the root

    if (test_condition((char *) cp, (char *) tmpword)) {
      if (ppfx) {
        // handle conditional suffix
        if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
          he = pmyMgr->suffix_check(tmpword, tmpl, 0, NULL, NULL, 0, NULL, (FLAG) aflag, needflag);
        else
          he = pmyMgr->suffix_check(tmpword, tmpl, optflags, ppfx, NULL, 0, NULL, (FLAG) aflag, needflag);
      } else {
        he = pmyMgr->suffix_check(tmpword, tmpl, 0, NULL, NULL, 0, NULL, (FLAG) aflag, needflag);
      }
      if (he) return he;
    }
  }
  return NULL;
}

// see if two-level suffix is present in the word; on success return
// a newly allocated morphological description
char * SfxEntry::check_twosfx_morph(const char * word, int len, int optflags,
    AffEntry* ppfx, const FLAG needflag)
{
  int tmpl;                 // length of tmpword
  unsigned char * cp;
  char tmpword[MAXWORDUTF8LEN + 4];
  PfxEntry* ep = (PfxEntry *) ppfx;
  char * st;

  char result[MAXLNLEN];

  *result = '\0';

  // if this suffix is being cross checked with a prefix
  // but it does not support cross products skip it

  if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
    return NULL;

  // upon entry suffix is 0 length or already matches the end of the word.
  // So if the remaining root word has positive length
  // and if there are enough chars in root word and added back strip chars
  // to meet the number of character conditions, then test it

  tmpl = len - appndl;

  if ((tmpl > 0) && (tmpl + stripl >= numconds)) {

    // generate new root word by removing suffix and adding
    // back any characters that would have been stripped or
    // null terminating the shorter string

    strcpy (tmpword, word);
    cp = (unsigned char *)(tmpword + tmpl);
    if (stripl) {
      strcpy ((char *)cp, strip);
      tmpl += stripl;
      cp = (unsigned char *)(tmpword + tmpl);
    } else *cp = '\0';

    // now make sure all of the conditions on characters
    // are met. Please see the appendix at the end of
    // this file for more info on exactly what is being
    // tested

    // if all conditions are met then call suffix_check_morph again on the root

    if (test_condition((char *) cp, (char *) tmpword)) {
      if (ppfx) {
        // handle conditional suffix
        if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen)) {
          st = pmyMgr->suffix_check_morph(tmpword, tmpl, 0, NULL, aflag, needflag);
          if (st) {
            if (((PfxEntry *) ppfx)->getMorph()) {
              strcat(result, ((PfxEntry *) ppfx)->getMorph());
              strcat(result, " ");
            }
            strcat(result,st);
            free(st);
            mychomp(result);
          }
        } else {
          st = pmyMgr->suffix_check_morph(tmpword, tmpl, optflags, ppfx, aflag, needflag);
          if (st) {
            strcat(result, st);
            free(st);
            mychomp(result);
          }
        }
      } else {
        st = pmyMgr->suffix_check_morph(tmpword, tmpl, 0, NULL, aflag, needflag);
        if (st) {
          strcat(result, st);
          free(st);
          mychomp(result);
        }
      }
      if (*result) return mystrdup(result);
    }
  }
  return NULL;
}

// get next homonym with same affix
struct hentry * SfxEntry::get_next_homonym(struct hentry * he, int optflags, AffEntry* ppfx,
    const FLAG cclass, const FLAG needflag)
{
  PfxEntry* ep = (PfxEntry *) ppfx;
  FLAG eFlag = ep ? ep->getFlag() : FLAG_NULL;

  while (he->next_homonym) {
    he = he->next_homonym;
    if ((TESTAFF(he->astr, aflag, he->alen) || (ep && ep->getCont() && TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
        ((optflags & aeXPRODUCT) == 0 ||
         TESTAFF(he->astr, eFlag, he->alen) ||
         // handle conditional suffix
         ((contclass) && TESTAFF(contclass, eFlag, contclasslen))
        ) &&
        // handle cont. class
        ((!cclass) ||
         ((contclass) && TESTAFF(contclass, cclass, contclasslen))
        ) &&
        // handle required flag
        ((!needflag) ||
         (TESTAFF(he->astr, needflag, he->alen) ||
          ((contclass) && TESTAFF(contclass, needflag, contclasslen)))
        )
      ) return he;
  }
  return NULL;
}

#if 0

Appendix: Understanding Affix Code


An affix is either a prefix or a suffix attached to root words to make
other words.

Basically a Prefix or a Suffix is a set of AffEntry objects
which store information about the prefix or suffix along
with supporting routines to check if a word has a particular
prefix or suffix or a combination.

The structure affentry is defined as follows:

struct affentry
{
   unsigned short aflag;    // ID used to represent the affix
   char * strip;            // string to strip before adding affix
   char * appnd;            // the affix string to add
   unsigned char stripl;    // length of the strip string
   unsigned char appndl;    // length of the affix string
   char numconds;           // the number of conditions that must be met
   char opts;               // flag: aeXPRODUCT- combine both prefix and suffix
   char conds[SETSIZE];     // array which encodes the conditions to be met
};

Here is a suffix borrowed from the en_US.aff file. This file
is whitespace delimited.

SFX D Y 4
SFX D   0     d          e
SFX D   y     ied        [^aeiou]y
SFX D   0     ed         [^ey]
SFX D   0     ed         [aeiou]y

This information can be interpreted as follows:

The first line has 4 fields:

Field
-----
1     SFX - indicates this is a suffix
2     D   - is the name of the character flag which represents this suffix
3     Y   - indicates it can be combined with prefixes (cross product)
4     4   - indicates that a sequence of 4 affentry structures is needed to
            properly store the affix information
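Because the file is whitespace delimited, a header line can be split with ordinary stream extraction. The following is a minimal C++ sketch, not hunspell code: AffixHeader and parseAffixHeader are hypothetical names, and anything other than Y in field 3 is simply treated as no cross product.

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for the four fields of a header such as "SFX D Y 4".
struct AffixHeader {
    std::string type;   // field 1: "SFX" or "PFX"
    std::string flag;   // field 2: flag name, e.g. "D"
    bool crossProduct;  // field 3: "Y" allows combining with prefixes
    int entryCount;     // field 4: number of entry lines that follow
};

// Returns false if any of the four fields is missing or malformed.
bool parseAffixHeader(const std::string& line, AffixHeader& h) {
    std::istringstream in(line);
    std::string cross;
    if (!(in >> h.type >> h.flag >> cross >> h.entryCount)) return false;
    h.crossProduct = (cross == "Y");
    return true;
}
```

For the header above, entryCount comes out as 4, which is how many SfxEntry objects the following lines describe.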

The remaining lines describe the unique information for the 4 SfxEntry
objects that make up this affix. Each line can be interpreted
as follows (note: fields 1 and 2 serve as a check against the line 1 info):

Field
-----
1     SFX         - indicates this is a suffix
2     D           - is the name of the character flag for this affix
3     y           - the string of chars to strip off before adding affix
                    (a 0 here indicates the empty string)
4     ied         - the string of affix characters to add
5     [^aeiou]y   - the conditions which must be met before the affix
                    can be applied
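To make the field layout concrete, here is a minimal C++ sketch that reads one entry line into its fields and then runs the rule backwards, in the way SfxEntry::checkword() builds tmpword: remove the appended suffix, then add back the strip string. SfxRule, parseSfxRule and candidateRoot are hypothetical names; the sketch assumes single-byte characters and leaves out the condition test and the dictionary lookup that checkword() also performs.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical view of one entry line such as "SFX D y ied [^aeiou]y".
struct SfxRule {
    std::string flag;       // field 2
    std::string strip;      // field 3, with "0" read as the empty string
    std::string appnd;      // field 4
    std::string condition;  // field 5
};

bool parseSfxRule(const std::string& line, SfxRule& r) {
    std::istringstream in(line);
    std::string kind;
    if (!(in >> kind >> r.flag >> r.strip >> r.appnd >> r.condition)) return false;
    if (kind != "SFX") return false;
    if (r.strip == "0") r.strip.clear();
    return true;
}

// Reverse of applying the rule: drop appnd from the end of the word and
// add strip back. Condition test and lookup are not reproduced here.
std::string candidateRoot(const std::string& word, const SfxRule& r) {
    assert(word.size() > r.appnd.size());  // root must keep positive length
    assert(word.compare(word.size() - r.appnd.size(), r.appnd.size(), r.appnd) == 0);
    return word.substr(0, word.size() - r.appnd.size()) + r.strip;
}
```

For the second D entry, candidateRoot("tried", rule) yields "try", which checkword() would then test against the condition and look up in the dictionary.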

Field 5 is interesting. Since this is a suffix, field 5 tells us that
there are 2 conditions that must be met. The first condition is that
the next-to-last character in the word must *NOT* be any of the
following "a", "e", "i", "o" or "u". The second condition is that
the last character of the word must be "y".
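Before any bit tricks, the two conditions can be checked the obvious way: one character set per position, aligned with the end of the word. This is a minimal C++ sketch assuming conditions contain only literal characters and bracket groups such as [^aeiou]; CondPos, parseCondition and suffixConditionHolds are hypothetical helpers, not hunspell functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical condition position: a character set, possibly negated.
struct CondPos {
    std::string chars;
    bool negated;
    bool matches(char c) const {
        bool listed = chars.find(c) != std::string::npos;
        return negated ? !listed : listed;
    }
};

// Splits a condition such as "[^aeiou]y" into positions.
// Assumes only literal characters and bracket groups appear.
std::vector<CondPos> parseCondition(const std::string& cond) {
    std::vector<CondPos> out;
    for (std::size_t i = 0; i < cond.size(); ++i) {
        if (cond[i] == '[') {
            std::size_t close = cond.find(']', i);
            assert(close != std::string::npos);
            bool neg = (cond[i + 1] == '^');
            std::size_t start = i + (neg ? 2 : 1);
            out.push_back(CondPos{cond.substr(start, close - start), neg});
            i = close;
        } else {
            out.push_back(CondPos{std::string(1, cond[i]), false});
        }
    }
    return out;
}

// A suffix condition constrains the last positions of the word, in order.
bool suffixConditionHolds(const std::string& word, const std::string& cond) {
    std::vector<CondPos> pos = parseCondition(cond);
    if (word.size() < pos.size()) return false;
    std::size_t base = word.size() - pos.size();
    for (std::size_t i = 0; i < pos.size(); ++i)
        if (!pos[i].matches(word[base + i])) return false;
    return true;
}
```

Each check rescans the bracket group character by character, which is the cost the table encoding described next avoids.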

So how can we encode this information concisely and be able to
test for both conditions in a fast manner? The answer is found
by studying the wonderful ispell code of Geoff Kuenning, et al.
(now available under a normal BSD license).

If we set up a conds array of 256 bytes indexed (0 to 255) and access it
using a character (cast to an unsigned char) of a string, we have 8 bits
of information we can store about that character. Specifically we
could use each bit to say if that character is allowed in any of the
last (or first for prefixes) 8 characters of the word.

Basically, each character at one end of the word (up to the number
of conditions) is used to index into the conds array and the resulting
value found there says whether that character is valid for a
specific character position in the word.

For prefixes, it does this by setting bit 0 if that char is valid
in the first position, bit 1 if valid in the second position, and so on.

If a bit is not set, then that char is not valid for that position in the
word.

When working with suffixes, bit 0 is used for the character closest
to the front, bit 1 for the next character towards the end, ...,
with bit numconds-1 representing the last char at the end of the string.
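The bit layout described above can be sketched in C++ as follows. encodeCondition fills a 256-entry table so that bit i of conds[c] is set when character c is allowed at condition position i, counted left to right in the condition string; matchesWindow then tests the first numconds characters for a prefix or the last numconds characters for a suffix, so bit 0 always lands on the character closest to the front of the examined window. All names are hypothetical; the sketch assumes single-byte characters, conditions made of literal characters and bracket groups, and at most 8 positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical 256-entry condition table, one bit per condition position.
struct CondTable {
    std::uint8_t conds[256] = {};
    int numconds = 0;
};

CondTable encodeCondition(const std::string& cond) {
    CondTable t;
    for (std::size_t i = 0; i < cond.size(); ++i) {
        assert(t.numconds < 8);  // each entry has only 8 bits
        const unsigned bit = 1u << t.numconds;
        if (cond[i] == '[') {
            std::size_t close = cond.find(']', i);
            assert(close != std::string::npos);
            bool neg = (cond[i + 1] == '^');
            std::size_t start = i + (neg ? 2 : 1);
            std::string set = cond.substr(start, close - start);
            // Set this position bit for every byte the bracket group accepts.
            for (int c = 0; c < 256; ++c) {
                bool listed = set.find(static_cast<char>(c)) != std::string::npos;
                if (listed != neg) t.conds[c] |= bit;
            }
            i = close;
        } else {
            t.conds[static_cast<unsigned char>(cond[i])] |= bit;
        }
        ++t.numconds;
    }
    return t;
}

// Suffixes examine the last numconds characters, prefixes the first ones.
bool matchesWindow(const CondTable& t, const std::string& word, bool isSuffix) {
    std::size_t n = static_cast<std::size_t>(t.numconds);
    if (word.size() < n) return false;
    std::size_t base = isSuffix ? word.size() - n : 0;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char c = static_cast<unsigned char>(word[base + i]);
        if ((t.conds[c] & (1u << i)) == 0) return false;
    }
    return true;
}
```

With this encoding, deciding whether a word satisfies a condition costs one table lookup and one AND per position, however many characters a bracket group lists.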

Note: since entries in conds[] are 8 bits, only 8 conditions
(that is, only 8 character positions) can be examined at one
end of a word (the beginning for prefixes and the end for suffixes).

So to make this clearer, let us encode the conds array values for the
first two affentries for the suffix D described earlier.


  For the first affentry:
     numconds = 1             (only examine the last character)

     conds['e'] =  (1 << 0)   (the word must end in an e)
     all other entries are 0

  For the second affentry:
     numconds = 2             (only examine the last two characters)

     conds[X] = conds[X] | (1 << 0)     (aeiou are not allowed)
         where X is all characters *but* a, e, i, o, or u


     conds['y'] = conds['y'] | (1 << 1)     (the last char must be a y)
     all other bits for all other entries in the conds array are zero
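The worked values can be checked mechanically. This C++ sketch transcribes the two recipes literally into hypothetical helpers encodeFirstEntry and encodeSecondEntry. Because y is not a vowel, it is one of the characters X that receive bit 0, so its entry ends up holding both bits (the value 3), provided bit 1 is OR-ed in rather than assigned over it.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: first D entry, condition "e" (one position).
void encodeFirstEntry(std::uint8_t conds[256]) {
    std::memset(conds, 0, 256);
    conds[static_cast<unsigned char>('e')] = 1u << 0;
}

// Hypothetical helper: second D entry, condition "[^aeiou]y" (two positions).
void encodeSecondEntry(std::uint8_t conds[256]) {
    std::memset(conds, 0, 256);
    // Position 0: every byte except a, e, i, o, u. NUL is skipped because
    // it cannot occur inside a word (and strchr would match the terminator).
    for (int c = 1; c < 256; ++c)
        if (std::strchr("aeiou", c) == nullptr) conds[c] |= 1u << 0;
    // Position 1: must be y. OR-ing keeps the bit 0 that y already has.
    conds[static_cast<unsigned char>('y')] |= 1u << 1;
}
```

After encodeSecondEntry, a consonant such as r holds 1, a vowel holds 0, and y holds 3.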

#endif