2007-08-28 Theppitak Karoonboonyanan <thep@linux.thai.net>

2007-08-28 Theppitak Karoonboonyanan <thep@linux.thai.net>

* tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Update check
values, according to the new compound words support.

2007-08-28 Theppitak Karoonboonyanan <thep@linux.thai.net>

* doc/Doxyfile.in: Update for doxygen 1.5.3.

2007-08-22 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add
compounds. (O Ang - Ho Nokhuk)

* data/tdict-{common,spell}.txt: Add words.

2007-08-15 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add

* data/tdict-{common,geo}.txt: Add words.

2007-08-07 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add

* data/tdict-{common,district,geo,spell}.txt: Add words.

2007-07-20 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add
compounds. (Wo Waen - So Rusi)

* data/tdict-{common,ict,science,spell}.txt: Add words.

2007-07-12 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add
compounds. (Ro Rua - Lu)

* data/tdict-{common,science}.txt: Add words.

* data/tdict-district.txt: Move non-province names to the bottom, so
they are separated from provinces. Add two more names.

2007-07-09 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add
compounds. (Mo Ma - Yo Yak)

* data/tdict-{common,geo,ict,science,spell}.txt: Add words.

2007-07-06 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add
compounds. (Pho Phan - Pho Samphao)

* data/tdict-{common,spell,ict}.txt: Add words.

2007-06-30 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove rare words. Rearrange and add
compounds. (Tho Thahan - Fo Fa)

* data/tdict-{common,spell}.txt: Add words.

2007-06-25 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Remove more rare words. Move some
compound words from -std to -std-compound. Add some missing entries
found. (Restarted from Ko Kai - Tho Thung. We need more space to
continue adding compounds from last commit.)

* data/tdict-{common,ict}.txt: Add words.

2007-06-21 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std[-compound].txt: Add more compound words. Move some
compound words from -std to -std-compound. Remove some rare entries,
to make room for more entries. (~80% done)

* data/tdict-{common,ict}.txt: Add words.

2007-06-18 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/Makefile.am, +data/tdict-std-compound.txt, data/tdict-std.txt:
Split compound words into a new file. Selectively add compound words.

* data/tdict-{common,ict,science,spell}.txt: Add words.

2007-06-12 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (th_brk): Don't break between CR and LF.
Remove last break if at string end.

* tests/test_th[w]brk.c (main): Update test values.

2007-06-11 Theppitak Karoonboonyanan <thep@linux.thai.net>

Redesign itemization code for th_brk(), aiming at Unicode UAX #14

* src/thbrk/Makefile.am, +src/thbrk/brk-ctype.{c,h}: Add character
classification table, as well as operation table for breaking between
all class combinations.

* src/thbrk/thbrk.c (th_brk): Rewrite the itemization code, based on
the break class table.

* tests/test_th[w]brk.c (main): Update test values.

2007-06-08 Theppitak Karoonboonyanan <thep@linux.thai.net>

* configure.in: Post-release version bump.

* data/tdict-{std,common}.txt: Add words.

2007-03-03 Theppitak Karoonboonyanan <thep@linux.thai.net>

=== Version 0.1.8 ===

2007-03-03 Theppitak Karoonboonyanan <thep@linux.thai.net>

* tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Fix word
break check values, as white space handling has now been changed.

2007-03-03 Theppitak Karoonboonyanan <thep@linux.thai.net>

* configure.in: Add AC_LIBTOOL_WIN32_DLL as required for Win32.

* src/Makefile.am (libthai_la_LDFLAGS): Always pass -no-undefined to
enforce all resolved symbols. Thanks Loïc Minier.

2007-03-03 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/Makefile.am (EXTRA_DIST, libthai_la_LDFLAGS), +src/libthai.def:
Add -export-symbols flag to limit exported symbols. Thanks Loïc Minier

* src/thwstr/thwstr.c (th_wthaichunk): Declare the non-extern func as

2007-03-02 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-common.txt: Added words.

2007-02-04 Theppitak Karoonboonyanan <thep@linux.thai.net>

Yet another fix for the white space bug in th_brk(), as spotted by
Suppachoke Santiwitchaya. This is just a temporary fix for use until the
planned redesign happens.

* src/thbrk/thbrk.c (th_brk): Allow break between Thai punct and white

* src/thbrk/thbrk.c (is_breakable): Allow break between punct and
white space. Remove rule that inhibited break between space and
MAIYAMOK. It was not sufficient anyway, as the space before MAIYAMOK

* ChangeLog: Fix wrong date in previous commit.

2007-02-02 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (is_breakable): Allow break before white space.
This fixes wrong treatment of whitespace in HTML in mozlibthai
component, which caused glitches in webpages.

2007-01-13 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-{common,geo,std}.txt: Added words.

2006-10-14 Theppitak Karoonboonyanan <thep@linux.thai.net>

* configure.in: Post-release version bump.

2006-10-14 Theppitak Karoonboonyanan <thep@linux.thai.net>

* ChangeLog: Converted to UTF-8.

=== Version 0.1.7 ===

2006-10-14 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/Makefile.am: Specify LC_ALL=C to make sure 'sort' always works.

2006-10-14 Theppitak Karoonboonyanan <thep@linux.thai.net>

Fix 'make distcheck', plus a little enhancement on dict location.

* src/thbrk/brk-maximal.c (brk_get_dict): Try opening dict at
$LIBTHAI_DICTDIR environment before the default location.

* tests/Makefile.am, +tests/test-thbrk.sh, +tests/test-thwbrk.sh:
Added wrapper scripts to call test_th[w]brk programs with
LIBTHAI_DICTDIR set to trie in build tree.

* data/Makefile.am (EXTRA_DIST): Do not ship the auto-generated

2006-10-14 Theppitak Karoonboonyanan <thep@linux.thai.net>

* tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Rearrange
source. Remove some unnecessary variables. Adjust style. Fix warnings.

2006-10-13 Theppitak Karoonboonyanan <thep@linux.thai.net>

* tests/test_thbrk.c (main), tests/test_thwbrk.c (main): Fix checking
value for the length of the output from th_[w]brk_line() tests.

2006-10-11 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Added some compound words.

* data/tdict-{common,ict}.txt: Added words.

2006-10-01 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Added some compound words.

* data/tdict-{common,ict}.txt: Added words.

2006-09-19 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/Makefile.am (libthbrk_la_SOURCES), src/thbrk/thbrk.c,
+src/thbrk/brk-maximal.{h,c}: Split low-level mechanisms to

* -src/thbrk/cttex.c, -src/thbrk/dict2state.c: Removed unused files.

2006-09-19 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Removed rare word (ยาจนก) which potentially
causes weird ambiguity. Added some more compound words.

* data/tdict-{common,ict,spell}.txt: Added words.

2006-09-17 Theppitak Karoonboonyanan <thep@linux.thai.net>

* TODO: Updated plan. Cleared what has been done.

2006-09-17 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Removed rare words (มาระ, มาริ) which
potentially cause weird ambiguities. Added some more compound words.

* data/tdict-{common,geo}.txt: Added words.

2006-09-15 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Added some compound words.

* data/tdict-{common,district,geo,ict,spell}.txt: Added words.

2006-09-12 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Removed rare word (มมาก) which potentially
causes ambiguity. Moved entry (มาย) into its compound forms, as it
alone can cause ambiguity. Added some more compound words.

* data/tdict-{common,geo,ict,spell}.txt: Added words.

2006-09-11 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): Used is_breakable() to determine
breakability at the end of Thai chunk, instead of hard-coded

* src/thbrk/thbrk.c (is_breakable): Added condition for Thai chunk
ending. Also added condition so text is not breakable right after
period, comma and semicolon.

* data/tdict-std.txt: Broke "{เมทิล|เอทิล}แอลกอฮอล์" into two words.

* data/tdict-{common,geo,science,spell}.txt: Added words.

2006-09-11 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (is_breakable): Added non-breakable cases:
space + Mai Yamok; * + {right parenthesis|Khomut|...}.

2006-09-11 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (th_brk, is_breakable): Do not break after certain
punctuation marks like left quote, left parenthesis, etc., also covering

* data/tdict-common.txt: Removed Paiyan Yai. Added some more words.

2006-09-11 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Removed rare word (ทิวสะ) that caused weird
ambiguities. Added some compound words.

* data/tdict-{common,geo,ict,spell}.txt: Added words.

2006-09-07 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-{std,common,district,geo,ict,science,spell}.txt:

2006-09-06 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/tdict-std.txt: Removed two rare words (การก, ผลอ) that caused
weird ambiguities. Added some compound words.

* data/tdict-{common,district,geo,ict,science}.txt: Added words.

2006-09-05 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_recover): Guarded against accessing beyond

* data/tdict-{std,common,geo,ict}.txt: Added more entries.

2006-09-03 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): Adjusted condition in previous change a

2006-09-03 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): (Optimization) In recovery mode, stop
immediately when first solution is found.

2006-09-02 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_recover, brk_do): (Optimization) Remembered
previous recovery result for reuse, cutting off a few repeated
recoveries at the same position.

* data/tdict-{std,common}.txt: Added entries.

2006-09-01 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (best_brk_contest): Adjusted condition so that
equally scored solution that comes later overrides previous one.
Longest matching is preferred as a result for such situation.

* data/tdict-std.txt: Added three more words.

2006-09-01 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (th_isleadable): RU and LU are also leadable.
And don't bother checking for Thai digits. They are never passed.

* src/thbrk/thbrk.c (brk_do): Fixed wrong choosing of nodes with error
at end of string. Added penalty for such cases, and made sure the
break position is not marked.

* data/tdict-{std,science,ict,common}.txt: Added & removed entries.

2006-09-01 Theppitak Karoonboonyanan <thep@linux.thai.net>

* data/Makefile.am, +data/tdict-collection.txt, data/tdict-common.txt:
Split collection sets into tdict-collection.

* data/Makefile.am, +data/tdict-spell.txt, data/tdict-std.txt:
Split common typos or variations into tdict-spell.

* data/tdict-{std,common,ict,district,geo,science}.txt: Moved more
words out of tdict-std. Removed more redundant entries. Fixed typos.

2006-09-01 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/Makefile.am (dictdatadir): Added variable missed during

* data/tdict-{std,ict,common}.txt: Moved some words out of tdict-std.
Removed duplicated and redundant entries. Added some more words.

2006-08-31 Theppitak Karoonboonyanan <thep@linux.thai.net>

* configure.in, Makefile.am, src/thbrk/Makefile.am, +data/Makefile.am,
src/thbrk/tdict.sbm -> data/tdict.sbm,
src/thbrk/tdict.txt ->
data/tdict-{common,district,geo,ict,science,std}.txt: Moved tdict
generation from source to data directory.

2006-08-31 Theppitak Karoonboonyanan <thep@linux.thai.net>

=== merged from datrie_wbrk-branch into HEAD ===

2006-08-30 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): (Optimization) Unified the recovered
node immediately. Chance is that it gets superseded, rather than
picked up in later loop. Rearranged code to eliminate source

2006-08-30 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): (Optimization) When unifying converted
nodes, clear all matches rather than just the first.

2006-08-30 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): (Optimization) When successfully
recovered, stop walking immediately, increasing chance to be
superseded earlier by better candidate. Also removed unnecessary check
for str_pos < len, because it's guaranteed by brk_recover() when return

2006-08-29 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (+brk_pool_allocator_use,
brk_pool_allocator_clear): Guarded the free list with ref count, for

* src/thbrk/thbrk.c (th_brk): Requested to use the break pool
allocator at the beginning.

2006-08-29 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_pool_node_new, brk_pool_free_node):
(Optimization) Kept freed BrkPool nodes for reuse in next allocation,
reducing calls to malloc().

* src/thbrk/thbrk.c (+brk_pool_allocator_clear, th_brk): Cleared the
free list when work is done.

2006-08-29 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): Calculated penalty for unrecoverable
string with (len - recent break), not (strlen(s) - recent break).

2006-08-29 Theppitak Karoonboonyanan <thep@linux.thai.net>

* configure.in (LT_REVISION): Incremented library revision.

* src/thbrk/thbrk.c (brk_do): (Optimization) Do not contest best break
when trie walking crashes in recover mode. It won't win recovery
criterion anyway. Also got rid of one inner loop condition.

2006-08-29 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_recover): Do not try to recover after a

2006-08-26 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (th_brk): Tokenized mixed Thai-English text and
called brk_do() chunk by chunk.

* src/thbrk/thbrk.c (brk_do, brk_recover): Accepted string and length
rather than null-terminated string, to support chunk-wise breaking.

2006-08-26 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_pool_delete): Adjusted code, for tiny
performance improvement, esp. when deleting first node.

2006-08-25 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/tdict.txt: Manually revised word list. Removed some
archaic or obsolete words. Added some new terms.

2006-08-24 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c: s/penulty/penalty/. :-P

2006-08-24 Theppitak Karoonboonyanan <thep@linux.thai.net>

* src/thbrk/thbrk.c (brk_do): Calculated penalty more accurately by
measuring distance from recent break pos, rather than the crash pos.
Also added penalty on recovery failure.

* src/thbrk/thbrk.c (best_brk_contest): Fixed boolean expression by

2006-08-23 Theppitak Karoonboonyanan <thep@linux.thai.net>

* libthai.pc.in: Added datrie to Requires.

* src/thbrk/Makefile.am: Removed old dict before rebuilding.

* src/thbrk/thbrk.c (th_brk_line): Added implementation.

* src/thbrk/thbrk.c (brk_do): Be satisfied with terminal state only if
the following character can begin a word.

* src/thbrk/thbrk.c (BrkShot, BestBrk, brk_root_pool, brk_do,
brk_shot_copy, best_brk_new, best_brk_contest): Added penalty
on crash recovery, and considered it when contesting shots. This can
prevent long crash shots from showing up as maximally matched.

2006-08-22 Theppitak Karoonboonyanan <thep@linux.thai.net>

=== begin of datrie_wbrk-branch ===

* configure.in, src/thbrk/Makefile.am, src/thbrk/thbrk.c,
+src/thbrk/thbrk.sbm: Replaced old thbrk from cttex with my new
version written from scratch.

2006-08-22 Theppitak Karoonboonyanan <thep@linux.thai.net>

* configure.in: Post-release version bump.

2006-08-05 Theppitak Karoonboonyanan <thep@linux.thai.net>

* NEWS, configure.in: