by David A. van Leeuwen
Import upstream version 1.2a1

Last updated: 07/20/1999, 11:04

-------- 08/01/1998 tcLex version 1.0 --------

-------- 09/02/1998 tcLex version 1.0p1 --------

1. Corrected potential bug when a global lexer was created from within a
namespace. For example:

  namespace eval foo {
      lexer ::bar::baz ...
  }

The created command was ::foo::bar::baz instead of ::bar::baz. Also, the return
value is now the fully qualified name (like with proc) and not the specified
name (namespace-relative).

2. Corrected major bug with incremental processing. When used with rule
rejection, some rules were incorrectly bypassed. This correction is also a
performance enhancement for incremental processing.

3. Minor typos corrected in the man page: the example demonstrating the
difference between inclusive and exclusive conditions was incorrect.

4. Corrected syntax error in the default pkgIndex.tcl file provided with the
previous version. This file didn't work due to extra curly braces :(. Hopefully
doing "pkg_mkIndex" worked.

5. Added configure files for Unix, many thanks to John Ellson of Lucent
<ellson@lucent.com> for these files!

6. Changed "static const char*" to "static char*" in some places to avoid
compiler warnings on Unix. Thanks to John Ellson <ellson@lucent.com> and Paul
Vogel <vogel@cygnet.rsn.hp.com> for pointing that out.

7. Added .txt extension to all text files in the distribution. This makes them
easier to read under Windows.

8. Added the current file (changes.txt).

-------- 11/11/1998 tcLex version 1.1a1 --------

1. Completely rewrote the regexp interface. A patched version of Tcl's regexp
package is now included in the code. Although it makes the code a bit bigger
(the binary is a few KB larger), it allows for better handling of string overrun
cases, which was a major limitation in previous versions. It also allows
newline-sensitive regexps in Tcl8.0 (see below). Added files tcLexRE.h and
tcLexRE.c, which in turn includes RE80.c or RE81.c (modified regexp engines)
depending on the Tcl version number.

2. Completely reworked the string handling code, so it is Unicode-clean now
under Tcl8.1. Now it stores the Unicode string instead of UTF-8, so that using
string indices is easier (UTF-8 uses variable-byte chars and thus needs special
parsing procedures). This also brings a significant performance improvement with
big strings (previously, the whole UTF-8 string was converted to Unicode by the
regexp package every time a rule was tried).

3. Corrected bug with index under Tcl8.1 (the correction is related to the above
changes). The returned index was the byte index and not the character index.

4. Renamed tcLexPrivate.h to tcLexInt.h for more consistency with Tcl.

5. Added -args option to allow passing extra arguments, using the same syntax as
proc. For example:

  lexer foo -args {a b {c 3}} ...
  foo eval $string 1 2; # a=1, b=2, c defaults to 3

6. Added -lines flag for line-sensitive processing. This changes the behavior of
"^$" and "." in regexps, and provides a portable way to use line-sensitive
regexps (Tcl8.0 doesn't support them, and Tcl8.1 requires special syntax). This
has been implemented thanks to the inclusion of the regexp code.

7. Added a TcLex_Buffer structure to allow future improvements: different types
of inputs (string, variable, file, channel) as well as multiple input buffers.

8. Reorganized the code to make future improvements easier to implement.

9. The return value to "lexer" is now an empty string, like with proc (contrary
to what I previously wrote).

10. Fixed bug due to overzealous memory deallocation, thanks to Claude BARRAS
<barras@etca.fr>.

11. Added "input" and "unput" subcommands, following the suggestions of Neil
Walker <neil.walker@mrc-bsu.cam.ac.uk>. They are similar to flex's input() and
unput() functions, except that unput can't put arbitrary chars back into the
input string (this is a design choice, not a technical limitation).

-------- 11/19/1998 tcLex version 1.1a2 --------

1. Added -nocase flag for case-insensitivity. Under Tcl8.0, it needed further
incursion into the regexp code.

2. Added -longest flag to choose the longest matching rule (as flex does)
instead of the first matching rule (the default).

3. Reworked the rule rejection code so that it works correctly and efficiently
with -longest. This also made it safer.

-------- 11/25/1998 tcLex version 1.1a3 --------

1. Corrected major bug in the modified Tcl8.0 regexp engine, which caused some
regexps to fail (especially those with ?-marked subexpressions). For instance,
the expression "a?b" matched the string "b", but not the string "ab".

2. Added "create" and "current" subcommands to the lexer command. The first is
optional and is used when creating lexers:

  lexer ?create? <name> ?args ... args?

The second can be used during processing to get the name of the currently
active lexer, for example:

  [lexer current] index

This avoids using the name of the lexer everywhere, and is useful when lexers
are renamed, aliased or imported. Suggestion made by Leo Schubert
<leo@bj-ig.de>. These new subcommands introduce a potential incompatibility:
lexers cannot be named "create" or "current" anymore (but this shouldn't be a
problem).

-------- 12/18/1998 tcLex version 1.1b1 --------

1. TcLex is now intended to be linked against Tcl8.0.4 or Tcl8.1b1. Some changes
have been made in the source files to take the new import directives into
account when building Windows DLLs (introduced in Tcl8.0.3).

2. Slightly modified the Windows makefile.vc to build the object files into
distinct directories depending on some settings (debug, Tcl version).

3. File RE81.c is now based on the regexp source from Tcl8.1b1.

4. Completely rewrote the documentation. This now includes a comparison with
flex, as well as a classical man page. It uses HTML + CSS so that newer browsers
can display enhanced presentation while still allowing text-based browsers to
display properly formatted text.

5. Added several examples, some from Neil Walker (thanks, Neil!), some from me
(Frédéric BONNET).

-------- 01/11/1999 tcLex version 1.1b2 --------

1. Added SafeTcl entry point (Tclex_SafeInit).

2. Corrected bug that seemed to occur only on some Unix systems (eg. SGI and
Solaris) but potentially affected others as well. This caused some lexers to be
incorrectly reported as inactive even when returned by [lexer current]. The
source of the bug was a missing lower bound in the lexer state deallocator
(StateDelete) that caused subsequent states to be given a negative index,
causing the "inactive lexer" error. Bug reported by Claude BARRAS and Neil
Walker.

3. Corrected bug in the modified Tcl8.0 regexp engine that caused newlines to be
treated as any characters even in line-sensitive mode, when used with * or +.
Bug reported by Neil Walker.

4. Improved handling of ^$ in line-sensitive mode under Tcl8.0 so that they
behave the same as under Tcl8.1.

5. Corrected bug with empty string match handling: some actions were called
twice, once for the matched string and once for an empty string at the end of
the previous one.

6. Fixed Unix warnings previously reported by Claude BARRAS but forgotten in the
previous version: the struct regexec_state in RE80.c (modified Tcl8.0 regexp
engine) was used before being defined. This warning was silent under Windows
(too low warning level?).

-------- 04/04/1999 tcLex version 1.1 final --------

1. Corrected minor typo in RE80.c: in function findChar, parameter c was
declared as int* instead of int. This had no influence (it got cast to a char
anyway) but generated warnings with some compilers (not mine unfortunately )-:
Reported by Volker Hetzer <hetzer.abg@sni.de>.

2. TcLex is now intended to be linked against Tcl8.0.4 (or higher patchlevel) or
Tcl8.1b2. On the latter, tcLex is configured by default to use the new stubs
facility. Only minor code modifications were needed. Tcl8.1b1 isn't supported
anymore.

3. Removed compatibility macros from tcLexInt.h now that the old functions are
back in Tcl8.1b2.

4. Fixed major bug occurring with longest-preferred matching lexers. When
several rules matched the same number of characters, the last defined rule was
chosen instead of the first one, due to a bad comparison operator ('<' was used
instead of '<=' in RuleTry). This broke the "pascal" example.

5. Reformatted the code so that it uses 4-space indentation instead of 2, to
better conform with Tcl C coding conventions. This is rather cosmetic but makes
the code a bit more readable.
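
To illustrate the tie-breaking issue from change #4 above, here is a minimal
sketch in C. It is not the actual RuleTry code: pick_longest_rule and its
match_len array are hypothetical names, and the loop is written in a "skip
unless strictly better" style that is only assumed to resemble the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of tcLex.
 * match_len[i] is the number of characters matched by rule i, or -1 if
 * rule i did not match. Returns the index of the rule to fire under
 * longest-match semantics (first-defined rule wins on a tie), or -1 if
 * no rule matched. */
int pick_longest_rule(const int *match_len, size_t nrules)
{
    int best = -1;      /* index of the best rule so far */
    int best_len = -1;  /* its match length */

    assert(match_len != NULL || nrules == 0);
    for (size_t i = 0; i < nrules; i++) {
        /* Skip rules that do not strictly beat the current best.
         * Writing '<' here instead of '<=' would let a later rule of
         * equal length replace an earlier one. */
        if (match_len[i] <= best_len) {
            continue;
        }
        best = (int)i;
        best_len = match_len[i];
    }
    return best;
}
```

With match lengths {3, 5, 5, 2}, this returns rule 1, the first of the two
rules that matched 5 characters.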
-------- 04/30/1999 tcLex version 1.1.1 --------

1. TcLex is now intended to be linked against Tcl8.0.4 (or higher patchlevel) or
Tcl8.1b3. Tcl8.1b2 isn't supported anymore.

2. Removed redefinition of TclUtfToUniCharDString and TclUniCharToUtfDString
that were needed by stub-enabled Tcl8.1b2, now that Tcl_UtfToUniCharDString and
Tcl_UniCharToUtfDString are publicly available in Tcl8.1b3.

3. Removed the hack needed by TclRegCompObj not being exported by stub-enabled
Tcl8.1b2. Tcl8.1b3 now exports the public Tcl_GetRegExpFromObj which does the
same thing.

4. Fixed regexp inconsistency between Tcl8.0 and Tcl8.1 with line-sensitive
matching. Regexps with negated ranges (eg. [^a]) could span multiple lines under
Tcl8.0 but couldn't under Tcl8.1 (the right behavior).

5. Cleaned up the modified regexp exec code and proposed it as a patch to the
Tcl core.

6. Rewrote argument parsing code using Tcl_GetIndexFromObj to use symbolic
constants rather than integer indices.

7. Added links to Neil Walker's tcLex page (thanks Neil!) from the doc.

-------- 04/25/1999 tcLex version 1.1.2 --------

1. Corrected bug in line-sensitive matching. This bug was introduced by
change #4 of version 1.1.1 above, and was located in the negated range
processing code in certain cases.

-------- 06/24/1999 tcLex version 1.1.3 --------

1. Corrected major bug with Tcl 8.1.1. The new regexp caching scheme introduced
by Tcl 8.1.1 conflicted with the way tcLex stored compiled regexps. The regexp
handling code has been completely reworked. Bug reported by Claude BARRAS.

2. Added URL to Scriptics' regexp-HOWTO in the doc
(http://www.scriptics.com/support/howto/regexp81.html).

-------- 07/20/1999 tcLex version 1.1.4 --------

1. Corrected major bug with Tcl 8.1. The functions BufferNotStarving()
and BufferAtEnd() mixed character and byte indices, which resulted in string
overflows. Bug reported by Neil Walker. It is surprising that this bug did not
show up earlier, because the string overflows eventually occurred in virtually
any case; however, they only crashed tcLex in very specific cases (hard to
reproduce on Windows).
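
The difference between byte and character indices that caused this bug is easy
to show outside tcLex. The following is a minimal sketch in C, assuming
well-formed UTF-8 input; utf8_char_index is a hypothetical helper written for
this note, not a tcLex or Tcl function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of tcLex or Tcl.
 * Returns the number of characters that start before byte offset
 * byte_index in the UTF-8 string s, i.e. the character index that
 * corresponds to that byte index. Scanning stops at the terminating
 * NUL if byte_index points past the end of the string. */
size_t utf8_char_index(const char *s, size_t byte_index)
{
    size_t chars = 0;

    assert(s != NULL);
    for (size_t i = 0; i < byte_index && s[i] != '\0'; i++) {
        /* Continuation bytes look like 10xxxxxx and do not start a
         * new character, so they are not counted. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            chars++;
        }
    }
    return chars;
}
```

For the string "h\xC3\xA9llo" (6 bytes, 5 characters), byte index 6 maps to
character index 5. Using one kind of index where the other is expected reads
past the end of the buffer as soon as the input contains a multi-byte character.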
-------- 09/03/1999 tcLex version 1.2a1 --------

1. Added support for Tcl8.2 and higher. Now that Tcl8.2's regexp engine provides
the features needed by tcLex (ie string overrun detection and matching at the
beginning of the string), tcLex no longer needs a patched version of this
engine. This makes the code much simpler as it now uses standard Tcl library
functions. Added file RE82.c.

2. The input string is now stored as a Tcl_Obj instead of a Tcl_DString.
Reworked the related code accordingly (RuleTry(), RuleExec(),
RuleGetRange()). Under Tcl8.0, use the obj's 8-bit string. Under Tcl8.2, use the
obj's Unicode (not UTF-8) string (actually, only pass the string obj to the Tcl
library procs, which in turn use the obj's Unicode representation). Under
Tcl8.1, added a Unicode object type and related procs (eg. Tcl_NewUnicodeObj(),
Tcl_GetUnicode() and Tcl_GetCharLength()) to be source compatible with Tcl8.2.
These new Unicode objects use Unicode Tcl_DStrings as their internal rep.

3. Modified "lexer begin initial" behavior so that it empties the conditions
stack rather than pushing the "initial" condition on top of it. This makes some
lexers easier to write (eg. Neil Walker's flex examples).
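
The new "lexer begin initial" behavior from change #3 can be pictured with a
small stack model. This is only a sketch under assumptions: CondStack,
cond_begin, cond_current, COND_INITIAL and the fixed stack size are hypothetical
names chosen for illustration and do not reflect the actual tcLex structures.

```c
#include <assert.h>
#include <stddef.h>

#define COND_STACK_MAX 16
#define COND_INITIAL   0   /* hypothetical id for the "initial" condition */

/* Hypothetical conditions stack; an empty stack means that only the
 * implicit "initial" condition is active. */
typedef struct {
    int    items[COND_STACK_MAX];
    size_t depth;
} CondStack;

/* Begin a condition. Beginning "initial" empties the stack instead of
 * pushing one more entry; any other condition is pushed on top.
 * Returns 0 on success, -1 if the stack is full. */
int cond_begin(CondStack *st, int cond)
{
    assert(st != NULL);
    if (cond == COND_INITIAL) {
        st->depth = 0;
        return 0;
    }
    if (st->depth >= COND_STACK_MAX) {
        return -1;
    }
    st->items[st->depth++] = cond;
    return 0;
}

/* Currently active condition: top of the stack, or "initial" if empty. */
int cond_current(const CondStack *st)
{
    assert(st != NULL);
    return st->depth == 0 ? COND_INITIAL : st->items[st->depth - 1];
}
```

In this model, a lexer that has begun several nested conditions returns to a
clean state with a single "begin initial", without having to end each condition
in turn.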