For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Lucene 3.5.0 =======================

Changes in backwards compatibility policy

* LUCENE-3390: The first approach in Lucene 3.4.0 for missing values
support for sorting had a design problem that made the missing value
be populated directly into the FieldCache arrays during sorting,
leading to concurrency issues. To fix this behaviour, the method
signatures had to be changed:
- FieldCache.getUnValuedDocs() was renamed to FieldCache.getDocsWithField()
returning a Bits interface (backported from Lucene 4.0).
- FieldComparator.setMissingValue() was removed and added to
As this is expert API, most code will not be affected.
(Uwe Schindler, Doron Cohen, Mike McCandless)

* LUCENE-3464: IndexReader.reopen has been renamed to
IndexReader.openIfChanged (a static method), and now returns null
(instead of the old reader) if there are no changes in the index, to
prevent the common pitfall of accidentally closing the old reader.
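
The null-return contract of LUCENE-3464 can be illustrated with a small
self-contained model. This is only a sketch: FakeReader and ReaderRefresh are
hypothetical stand-ins, not Lucene classes, and the version number is an
assumed way to detect changes. The only part taken from the entry above is
that the static method returns null when nothing changed, so the caller
closes the old reader only when it actually received a new one.

```java
// Hypothetical stand-in for an index reader; not a Lucene class.
final class FakeReader {
    final long version;      // index version this reader was opened on
    boolean closed = false;

    FakeReader(long version) {
        this.version = version;
    }

    // Models the contract: null means "no changes"; otherwise a new,
    // distinct reader is returned and the old one is left open.
    static FakeReader openIfChanged(FakeReader old, long currentIndexVersion) {
        return currentIndexVersion == old.version ? null : new FakeReader(currentIndexVersion);
    }

    void close() {
        closed = true;
    }
}

final class ReaderRefresh {
    // Returns the reader to use from now on; closes the old reader only
    // when it has really been replaced.
    static FakeReader refresh(FakeReader current, long currentIndexVersion) {
        FakeReader newer = FakeReader.openIfChanged(current, currentIndexVersion);
        if (newer == null) {
            return current;  // nothing changed: keep the old reader open
        }
        current.close();     // safe: newer is a different instance
        return newer;
    }
}
```

With the old reopen() contract, an unconditional close of the previous reader
would have closed the reader still in use whenever the index was unchanged.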

* LUCENE-3541: Remove IndexInput's protected copyBuf. If you want to
keep a buffer in your IndexInput, do this yourself in your implementation,
and be sure to do the right thing on clone()! (Robert Muir)

* LUCENE-2822: TimeLimitingCollector now expects a counter clock instead of
relying on a private daemon thread. The global time limiting clock thread
has been exposed and is now lazily loaded and fully optional.
TimeLimitingCollector now supports setting the clock baseline manually to
include the prelude of a search. Previous versions set the baseline at
construction time; now the baseline is set once the first IndexReader is
passed to the collector, unless it was set before. (Simon Willnauer)

Changes in runtime behavior

* LUCENE-3520: IndexReader.openIfChanged, when passed a near-real-time
reader, will now return null if there are no changes. The API has
always reserved the right to do this; it's just that in the past for
near-real-time readers it never did. (Mike McCandless)

* SOLR-2762: (backport from 4.x line): FSTLookup could return duplicate
results or one result fewer than requested. (David Smiley, Dawid Weiss)

* LUCENE-3412: SloppyPhraseScorer was returning non-deterministic results
for queries with many repeats (Doron Cohen)

* LUCENE-3421: PayloadTermQuery's explain was wrong when includeSpanScore=false.
(Edward Drapkin via Robert Muir)

* LUCENE-3432: IndexWriter.expungeDeletes with TieredMergePolicy
should ignore the maxMergedSegmentMB setting (v.sevel via Mike

* LUCENE-3442: TermQuery.TermWeight.scorer() returns null for non-atomic
IndexReaders (optimization bug, introduced by LUCENE-2829), preventing
QueryWrapperFilter and similar classes from getting a top-level DocIdSet.
(Dan C., Uwe Schindler)

* LUCENE-3390: Corrected handling of missing values when two parallel searches
use different missing values for sorting: the missing value was populated
directly into the FieldCache arrays during sorting, leading to concurrency
issues. (Uwe Schindler, Doron Cohen, Mike McCandless)

* LUCENE-3439: Closing an NRT reader after the writer was closed was
incorrectly invoking the DeletionPolicy and (then possibly deleting
files) on the closed IndexWriter (Robert Muir, Mike McCandless)

* LUCENE-3215: SloppyPhraseScorer sometimes computed Infinite freq
(Robert Muir, Doron Cohen)

* LUCENE-3465: IndexSearcher with ExecutorService was always passing 0
for docBase to Collector.setNextReader. (Robert Muir, Mike

* LUCENE-3503: DisjunctionSumScorer would give slightly different scores
for a document depending on whether you used nextDoc() or advance().
(Mike McCandless, Robert Muir)

* LUCENE-3529: Properly support indexing an empty field with empty term text.
Previously, if you had assertions enabled you would receive an error during
flush; if you didn't, you would get an invalid index.
(Mike McCandless, Robert Muir)

* LUCENE-2633: PackedInts Packed32 and Packed64 did not support internal
structures larger than 256MB (Toke Eskildsen via Mike McCandless)

* LUCENE-3540: LUCENE-3255 dropped support for pre-1.9 indexes, but the
error message in IndexFormatTooOldException was incorrect. (Uwe Schindler,

* LUCENE-3541: IndexInput's default copyBytes() implementation was not safe
across multiple threads, because all clones shared the same buffer.

* LUCENE-3548: Fix CharsRef#append to extend length of the existing char[]
and preserve existing chars. (Simon Willnauer)

* LUCENE-3582: Normalize NaN values in NumericUtils.floatToSortableInt() /
NumericUtils.doubleToSortableLong(), so this is consistent with stored
fields. Also fix NumericRangeQuery to not falsely hit NaNs on half-open
ranges (one bound is null). Because of normalization, NumericRangeQuery
can now be used to hit NaN values by creating a query with
upper == lower == NaN (inclusive). (Dawid Weiss, Uwe Schindler)
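
To make the NaN normalization in LUCENE-3582 concrete, the following is a
minimal sketch of a sortable-int encoding for floats. It is an illustration
written for this note, not the library source; the class name SortableFloat
is hypothetical. It relies on the documented behavior of
Float.floatToIntBits, which collapses every NaN bit pattern to the single
canonical value 0x7fc00000, so all NaNs map to one sortable int.

```java
// Hypothetical helper; a sketch of the idea, not Lucene's NumericUtils.
final class SortableFloat {
    // Maps a float to an int whose signed order matches the float order.
    static int toSortableInt(float value) {
        // floatToIntBits (unlike floatToRawIntBits) canonicalizes NaN.
        int bits = Float.floatToIntBits(value);
        if (bits < 0) {
            // Negative floats: flip the magnitude bits so that a larger
            // magnitude yields a smaller int.
            bits ^= 0x7fffffff;
        }
        return bits;
    }
}
```

Because all NaNs share one encoded value, an inclusive range whose lower and
upper bounds are both NaN covers exactly that value.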

* LUCENE-3454: Rename IndexWriter.optimize to forceMerge to discourage
use of this method since it is horribly costly and rarely justified
anymore. MergePolicy.findMergesForOptimize was renamed to
findForcedMerges. IndexReader.isOptimized was
deprecated. IndexCommit.isOptimized was replaced with
getSegmentCount. (Robert Muir, Mike McCandless)

* LUCENE-3205: Deprecated MultiTermQuery.getTotalNumberOfTerms() [and
related methods], as the numbers returned are not useful
for multi-segment indexes. They were only needed for tests of
NumericRangeQuery. (Mike McCandless, Uwe Schindler)

* LUCENE-3574: Deprecate outdated constants in org.apache.lucene.util.Constants
and add new ones for Java 6 and Java 7. (Uwe Schindler)

* LUCENE-3571: Deprecate IndexSearcher(Directory). Use the constructors
that take IndexReader instead. (Robert Muir)

* LUCENE-3577: Rename IndexWriter.expungeDeletes to forceMergeDeletes,
and revamped the javadocs, to discourage
use of this method since it is horribly costly and rarely
justified. MergePolicy.findMergesToExpungeDeletes was renamed to
findForcedDeletesMerges. (Robert Muir, Mike McCandless)

* LUCENE-3448: Added FixedBitSet.and(other/DISI), andNot(other/DISI).
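
The and/andNot operations named in LUCENE-3448 are plain word-wise bit
operations. The sketch below shows them on a minimal fixed-size bit set.
SimpleFixedBits is a hypothetical class written for this note, and it makes
the simplifying assumption that both sets have the same size; it does not
reproduce the real FixedBitSet API or its DocIdSetIterator variants.

```java
// Hypothetical fixed-size bit set; a sketch, not Lucene's FixedBitSet.
final class SimpleFixedBits {
    private final long[] words;
    private final int numBits;

    SimpleFixedBits(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index " + index + " not in [0, " + numBits + ")");
        }
    }

    private void checkSameSize(SimpleFixedBits other) {
        if (other.numBits != numBits) {
            throw new IllegalArgumentException("sizes differ");
        }
    }

    void set(int index) {
        checkIndex(index);
        words[index >>> 6] |= 1L << (index & 63);
    }

    boolean get(int index) {
        checkIndex(index);
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    // Keeps only the bits that are also set in other.
    void and(SimpleFixedBits other) {
        checkSameSize(other);
        for (int i = 0; i < words.length; i++) {
            words[i] &= other.words[i];
        }
    }

    // Clears every bit that is set in other.
    void andNot(SimpleFixedBits other) {
        checkSameSize(other);
        for (int i = 0; i < words.length; i++) {
            words[i] &= ~other.words[i];
        }
    }
}
```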

* LUCENE-2215: Added IndexSearcher.searchAfter which returns results after a
specified ScoreDoc (e.g. last document on the previous page) to support deep
paging use cases. (Aaron McCurry, Grant Ingersoll, Robert Muir)
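
The "results after a specified ScoreDoc" idea in LUCENE-2215 amounts to using
the last hit of the previous page as a cursor. The sketch below models only
that cursor semantics on an in-memory list. Hit and DeepPaging are
hypothetical names, the ranking rule (higher score first, ties broken by
lower doc id) is an assumption made for the illustration, and the code sorts
every hit, which a real search implementation would avoid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical hit record; stands in for a (doc, score) pair.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class DeepPaging {
    // Assumed rank order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> RANK = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    // Returns up to pageSize hits that rank strictly after the cursor.
    // A null cursor means "start from the first page".
    static List<Hit> pageAfter(List<Hit> allHits, Hit after, int pageSize) {
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(RANK);
        List<Hit> page = new ArrayList<>();
        for (Hit h : sorted) {
            if (page.size() >= pageSize) {
                break;
            }
            if (after != null && RANK.compare(h, after) <= 0) {
                continue; // at or before the cursor: already shown
            }
            page.add(h);
        }
        return page;
    }
}
```

Each page is requested by passing the last hit of the previous page, so the
caller never has to ask for "the first N results" with an ever-growing N.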

* LUCENE-1990: Adds internal packed ints implementation, to be used
for more efficient storage of int arrays when the values are
bounded, for example for storing the terms dict index (Toke
Eskildsen via Mike McCandless)

* LUCENE-3558: Moved SearcherManager, NRTManager & SearcherLifetimeManager into
core. All classes are contained in o.a.l.search. (Simon Willnauer)

* LUCENE-3426: Add NGramPhraseQuery which extends PhraseQuery and tries to
reduce the number of terms of the query during rewrite(), in order to improve
performance. (Robert Muir, Koji Sekiguchi)

* LUCENE-3494: Optimize FilteredQuery to remove a multiply in score()
(Uwe Schindler, Robert Muir)

* LUCENE-3534: Remove filter logic from IndexSearcher and delegate to
FilteredQuery's Scorer. This is a partial backport of a cleanup in
FilteredQuery/IndexSearcher added by LUCENE-1536 to Lucene 4.0.

* LUCENE-2205: Very substantial (3-5X) RAM reduction required to hold
the terms index on opening an IndexReader (Aaron McCurry via Mike McCandless)

* LUCENE-3443: FieldCache can now set docsWithField, and create an
array, in a single pass. This results in faster init time for apps
that need both (such as sorting by a field with a missing value).

* LUCENE-3420: Disable the finalness checks in TokenStream and Analyzer
for implementing subclasses in different packages, where assertions are not
enabled. (Uwe Schindler)

* LUCENE-3506: Tests relying on assertions being enabled were no-ops because
they ignored AssertionError. With this fix, the entire test framework
(every test) now fails if assertions are disabled, unless
-Dtests.asserts.gracious=true is specified. (Doron Cohen)

* SOLR-2849: Fix dependencies in Maven POMs. (David Smiley via Steve Rowe)

* LUCENE-3561: Fix maven xxx-src.jar files that were missing resources.

======================= Lucene 3.4.0 =======================

* LUCENE-3251: Directory#copy failed to close target output if opening the
source stream failed. (Simon Willnauer)

* LUCENE-3255: If segments_N file is all zeros (due to file
corruption), don't read that to mean the index is empty. (Gregory
Tarr, Mark Harwood, Simon Willnauer, Mike McCandless)

* LUCENE-3254: Fixed minor bug in how deletes were written to disk,
causing the file to sometimes be larger than it needed to be. (Mike

* LUCENE-3224: Fixed a bug where CheckIndex would incorrectly report a
corrupt index if a term with docfreq >= 16 was indexed more than once
at the same position. (Robert Muir)

* LUCENE-3339: Fixed deadlock case when multiple threads use the new
block-add (IndexWriter.add/updateDocuments) methods. (Robert Muir,

* LUCENE-3340: Fixed case where IndexWriter was not flushing at
exactly maxBufferedDeleteTerms (Mike McCandless)

* LUCENE-3358, LUCENE-3361: StandardTokenizer and UAX29URLEmailTokenizer
wrongly discarded combining marks attached to Han or Hiragana characters.
This is fixed if you supply Version >= 3.4. If you supply a previous
Lucene version, you get the old buggy behavior for backwards compatibility.
(Trejkaz, Robert Muir)

* LUCENE-3368: IndexWriter commits segments without applying their buffered
deletes when flushing concurrently. (Simon Willnauer, Mike McCandless)

* LUCENE-3365: Create or Append mode determined before obtaining write lock
could cause IndexWriter to overwrite an existing index.
(Geoff Cooney via Simon Willnauer)

* LUCENE-3380: Fixed a bug where FileSwitchDirectory's listAll() would wrongly
throw NoSuchDirectoryException when all files written so far have been
written to one directory, but the other still has not yet been created on the
filesystem. (Robert Muir)

* LUCENE-3402: term vectors disappeared from the index if optimize() was called
following addIndexes(). (Shai Erera)

* LUCENE-3409: IndexWriter.deleteAll was failing to close pooled NRT
SegmentReaders, leading to unused files accumulating in the
Directory. (tal steier via Mike McCandless)

* LUCENE-3390: Added SortField.setMissingValue(v) to enable well defined
sorting behavior for documents that do not include the given field.
(Gilad Barkai via Doron Cohen)

* LUCENE-3418: Lucene was failing to fsync index files on commit,
meaning an operating system or hardware crash, or power loss, could
easily corrupt the index. (Mark Miller, Robert Muir, Mike

* LUCENE-3290: Added FieldInvertState.numUniqueTerms
(Mike McCandless, Robert Muir)

* LUCENE-3280: Add FixedBitSet, like OpenBitSet but is not elastic
(grow on demand if you set/get/clear too-large indices). (Mike

* LUCENE-2048: Added the ability to omit positions but still index
term frequencies. You can now control what is indexed into
the postings via AbstractField.setIndexOptions:
DOCS_ONLY: only documents are indexed: term frequencies and positions are omitted
DOCS_AND_FREQS: only documents and term frequencies are indexed: positions are omitted
DOCS_AND_FREQS_AND_POSITIONS: full postings: documents, frequencies, and positions
AbstractField.setOmitTermFrequenciesAndPositions is deprecated,
you should use DOCS_ONLY instead. (Robert Muir)

* LUCENE-3097: Added a new grouping collector that can be used to retrieve the
most relevant documents for all groups. This can be useful in situations when one
wants to compute grouping based facets / statistics on the complete query result.
(Martijn van Groningen)

* LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
suppressed exceptions in the original exception, so stack trace
will contain them. (Uwe Schindler)

* LUCENE-3289: When building an FST you can now tune how aggressively
the FST should try to share common suffixes. Typically you can
greatly reduce RAM required during building, and CPU consumed, at
the cost of a somewhat larger FST. (Mike McCandless)

* LUCENE-3327: Fix AIOOBE when TestFSTs is run with
-Dtests.verbose=true (James Dyer via Mike McCandless)

* LUCENE-3406: Add ant target 'package-local-src-tgz' to Lucene and Solr
to package sources from the local working copy.
(Seung-Yeoul Yang via Steve Rowe)

======================= Lucene 3.3.0 =======================

Changes in backwards compatibility policy

* LUCENE-3140: IndexOutput.copyBytes now takes a DataInput (superclass
of IndexInput) as its first argument. (Robert Muir, Dawid Weiss,

* LUCENE-3191: FieldComparator.value now returns an Object not
Comparable; FieldDoc.fields also changed from Comparable[] to
Object[] (Uwe Schindler, Mike McCandless)

* LUCENE-3208: Made deprecated methods Query.weight(Searcher) and
Searcher.createWeight() final to prevent override. If you have
overridden one of these methods, cut over to the non-deprecated
implementation. (Uwe Schindler, Robert Muir, Yonik Seeley)

* LUCENE-3238: Made MultiTermQuery.rewrite() final, to prevent
problems (such as not properly setting rewrite methods, or
not working correctly with things like SpanMultiTermQueryWrapper).
To rewrite to a simpler form, instead return a simpler enum
from getEnum(IndexReader). For example, to rewrite to a single term,
return a SingleTermEnum. (ludovic Boutros, Uwe Schindler, Robert Muir)

Changes in runtime behavior

* LUCENE-2834: the hash used to compute the lock file name when the
lock file is not stored in the index has changed. This means you
will see a different lucene-XXX-write.lock in your lock directory.
(Robert Muir, Uwe Schindler, Mike McCandless)

* LUCENE-3146: IndexReader.setNorm throws IllegalStateException if the field
does not store norms. (Shai Erera, Mike McCandless)

* LUCENE-3198: On Linux, if the JRE is 64 bit and supports unmapping,
FSDirectory.open now defaults to MMapDirectory instead of
NIOFSDirectory since MMapDirectory gives better performance. (Mike

* LUCENE-3200: MMapDirectory now uses chunk sizes that are powers of 2.
When setting the chunk size, it is rounded down to the next possible
value. The new default value for 64 bit platforms is 2^30 (1 GiB),
for 32 bit platforms it stays unchanged at 2^28 (256 MiB).
Internally, MMapDirectory now only uses one dedicated final IndexInput
implementation supporting multiple chunks, which makes Hotspot's life
easier. (Uwe Schindler, Robert Muir, Mike McCandless)
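
The rounding rule in LUCENE-3200 can be sketched in a few lines. This is an
illustration, not MMapDirectory's code: ChunkSize and its methods are
hypothetical names, and the only behavior taken from the entry is that a
requested size is rounded down to a power of 2.

```java
// Hypothetical helper illustrating power-of-2 chunk sizes.
final class ChunkSize {
    // Largest power of two that is less than or equal to requested.
    static int roundDownToPowerOfTwo(int requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("chunk size must be positive: " + requested);
        }
        return Integer.highestOneBit(requested);
    }

    // Exponent n such that 2^n is the rounded chunk size. With a
    // power-of-2 size, "position >>> n" selects the chunk and
    // "position & (size - 1)" gives the offset inside it.
    static int chunkSizePower(int requested) {
        return 31 - Integer.numberOfLeadingZeros(roundDownToPowerOfTwo(requested));
    }
}
```

One plausible benefit of power-of-2 sizes is that locating a byte reduces to
a shift and a mask instead of a division and a modulo.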

* LUCENE-3147,LUCENE-3152: Fixed open file handles leaks in many places in the
code. Now MockDirectoryWrapper (in test-framework) tracks all open files,
including locks, and fails if the test fails to release all of them.
(Mike McCandless, Robert Muir, Shai Erera, Simon Willnauer)

* LUCENE-3102: CachingCollector.replay was failing to call setScorer
per-segment (Martijn van Groningen via Mike McCandless)

* LUCENE-3183: Fix rare corner case where seeking to empty term
(field="", term="") with terms index interval 1 could hit
ArrayIndexOutOfBoundsException (selckin, Robert Muir, Mike

* LUCENE-3208: IndexSearcher had its own private similarity field
and corresponding get/setter overriding Searcher's implementation. If you
set a different Similarity instance on IndexSearcher, methods implemented
in the superclass Searcher were not using it, leading to strange bugs.
(Uwe Schindler, Robert Muir)

* LUCENE-3197: Fix core merge policies to not over-merge during
background optimize when documents are still being deleted
concurrently with the optimize (Mike McCandless)

* LUCENE-3222: The RAM accounting for buffered delete terms was
failing to measure the space required to hold the term's field and
text character data. (Mike McCandless)

* LUCENE-3238: Fixed bug where using WildcardQuery("prefix*") inside
of a SpanMultiTermQueryWrapper rewrote incorrectly and returned
an error instead. (ludovic Boutros, Uwe Schindler, Robert Muir)

* LUCENE-3208: Renamed protected IndexSearcher.createWeight() to expert
public method IndexSearcher.createNormalizedWeight() as this better describes
what this method does. The old method is still there for backwards
compatibility. Query.weight() was deprecated and simply delegates to
IndexSearcher. Both deprecated methods will be removed in Lucene 4.0.
(Uwe Schindler, Robert Muir, Yonik Seeley)

* LUCENE-3197: MergePolicy.findMergesForOptimize now takes
Map<SegmentInfo,Boolean> instead of Set<SegmentInfo> as the second
argument, so the merge policy knows which segments were originally
present vs produced by an optimizing merge (Mike McCandless)

* LUCENE-1736: DateTools.java general improvements.
(David Smiley via Steve Rowe)

* LUCENE-3140: Added experimental FST implementation to Lucene.
(Robert Muir, Dawid Weiss, Mike McCandless)

* LUCENE-3193: A new TwoPhaseCommitTool allows running a 2-phase commit
algorithm over objects that implement the new TwoPhaseCommit interface (such
as IndexWriter). (Shai Erera)
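
The 2-phase commit idea behind LUCENE-3193 can be sketched generically as
follows. TwoPhase, TwoPhaseRunner and RecordingParticipant are hypothetical
names written for this note and are not the Lucene types; the method names
and the "roll back everyone on any failure" policy are assumptions of the
sketch, and unchecked exceptions are used only to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract for a 2-phase commit.
interface TwoPhase {
    void prepareCommit();
    void commit();
    void rollback();
}

final class TwoPhaseRunner {
    // Phase 1 prepares every participant, phase 2 commits every
    // participant. If any step fails, all participants are rolled back
    // and the original failure is rethrown.
    static void execute(List<? extends TwoPhase> participants) {
        try {
            for (TwoPhase p : participants) {
                p.prepareCommit();
            }
            for (TwoPhase p : participants) {
                p.commit();
            }
        } catch (RuntimeException failure) {
            for (TwoPhase p : participants) {
                try {
                    p.rollback();
                } catch (RuntimeException ignored) {
                    // keep reporting the first failure
                }
            }
            throw failure;
        }
    }
}

// Test double that records which steps were invoked.
final class RecordingParticipant implements TwoPhase {
    final List<String> log = new ArrayList<>();
    private final boolean failOnPrepare;

    RecordingParticipant(boolean failOnPrepare) {
        this.failOnPrepare = failOnPrepare;
    }

    public void prepareCommit() {
        if (failOnPrepare) {
            throw new IllegalStateException("prepare failed");
        }
        log.add("prepare");
    }

    public void commit() {
        log.add("commit");
    }

    public void rollback() {
        log.add("rollback");
    }
}
```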

* LUCENE-3191: Added TopDocs.merge, to facilitate merging results from
different shards (Uwe Schindler, Mike McCandless)
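
Merging per-shard results, as mentioned in LUCENE-3191, is a k-way merge of
lists that are each already sorted. The sketch below shows that technique
with a priority queue of per-shard cursors. ShardHit and ShardMerge are
hypothetical names, the descending-score order and the tie-break by shard
index are assumptions of this illustration, and it is not the TopDocs.merge
implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit that remembers which shard it came from.
final class ShardHit {
    final int shard;
    final int doc;
    final float score;

    ShardHit(int shard, int doc, float score) {
        this.shard = shard;
        this.doc = doc;
        this.score = score;
    }
}

final class ShardMerge {
    // Each inner list must already be sorted by descending score.
    // Returns the best topN hits across all shards.
    static List<ShardHit> mergeTopN(List<List<ShardHit>> shards, int topN) {
        // Queue entries are cursors: {shardIndex, positionInShard}.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> {
            ShardHit ha = shards.get(a[0]).get(a[1]);
            ShardHit hb = shards.get(b[0]).get(b[1]);
            int byScore = Float.compare(hb.score, ha.score);
            return byScore != 0 ? byScore : Integer.compare(a[0], b[0]);
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }
        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < topN && !heads.isEmpty()) {
            int[] cursor = heads.poll();
            List<ShardHit> shard = shards.get(cursor[0]);
            merged.add(shard.get(cursor[1]));
            if (cursor[1] + 1 < shard.size()) {
                // Advance this shard's cursor with a fresh array so that
                // queued entries are never mutated in place.
                heads.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }
}
```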

* LUCENE-3179: Added OpenBitSet.prevSetBit (Paul Elschot via Mike McCandless)

* LUCENE-3210: Made TieredMergePolicy more aggressive in reclaiming
segments with deletions; added new methods
set/getReclaimDeletesWeight to control this. (Mike McCandless)

* LUCENE-1344: Create OSGi bundle using dev-tools/maven.
(Nicolas Lalevée, Luca Stancapiano via ryan)

* LUCENE-3204: The maven-ant-tasks jar is now included in the source tree;
users of the generate-maven-artifacts target no longer have to manually
place this jar in the Ant classpath. NOTE: when Ant looks for the
maven-ant-tasks jar, it looks first in its pre-existing classpath, so
any copies it finds will be used instead of the copy included in the
Lucene/Solr source tree. For this reason, it is recommended to remove
any copies of the maven-ant-tasks jar in the Ant classpath, e.g. under
~/.ant/lib/ or under the Ant installation's lib/ directory. (Steve Rowe)

======================= Lucene 3.2.0 =======================

Changes in backwards compatibility policy

* LUCENE-2953: PriorityQueue's internal heap was made private, as subclassing
with generics can lead to ClassCastException. For advanced use (e.g. in Solr)
a method getHeapArray() was added to retrieve the internal heap array as a
non-generic Object[]. (Uwe Schindler, Yonik Seeley)

* LUCENE-1076: IndexWriter.setInfoStream now throws IOException
(Mike McCandless, Shai Erera)

* LUCENE-3084: MergePolicy.OneMerge.segments was changed from
SegmentInfos to a List<SegmentInfo>. SegmentInfos itself was changed
to no longer extend Vector<SegmentInfo> (to update code that is using
Vector-API, use the new asList() and asSet() methods returning unmodifiable
collections; modifying SegmentInfos is now only possible through
the explicitly declared methods). IndexWriter.segString() now takes
Iterable<SegmentInfo> instead of List<SegmentInfo>. A simple recompile
should fix this. MergePolicy and SegmentInfos are internal/experimental
APIs not covered by the strict backwards compatibility policy.
(Uwe Schindler, Mike McCandless)

Changes in runtime behavior

* LUCENE-3065: When a NumericField is retrieved from a Document loaded
from IndexReader (or IndexSearcher), it will now come back as
NumericField not as a Field with a string-ified version of the
numeric value you had indexed. Note that this only applies for
newly-indexed Documents; older indices will still return Field
with the string-ified numeric value. If you call Document.get(),
the value still comes back as String, but Document.getFieldable()
returns NumericField instances. (Uwe Schindler, Ryan McKinley,

* LUCENE-1076: Changed the default merge policy from
LogByteSizeMergePolicy to TieredMergePolicy, as of Version.LUCENE_32
(passed to IndexWriterConfig), which is able to merge non-contiguous
segments. This means docIDs no longer necessarily stay "in order"
during indexing. If this is a problem then you can use either of
the LogMergePolicy impls. (Mike McCandless)

* LUCENE-3082: Added index upgrade tool oal.index.IndexUpgrader
that upgrades all segments to the most recent supported index
format without fully optimizing. (Uwe Schindler, Mike McCandless)

* LUCENE-1076: Added TieredMergePolicy which is able to merge non-contiguous
segments, which means docIDs no longer necessarily stay "in order".
(Mike McCandless, Shai Erera)

* LUCENE-3071: Adding ReversePathHierarchyTokenizer, added skip parameter to
PathHierarchyTokenizer (Olivier Favre via ryan)

* LUCENE-1421, LUCENE-3102: added CachingCollector which allows you to cache
document IDs and scores encountered during the search, and "replay" them to
another Collector. (Mike McCandless, Shai Erera)

* LUCENE-3112: Added experimental IndexWriter.add/updateDocuments,
enabling a block of documents to be indexed, atomically, with
guaranteed sequential docIDs. (Mike McCandless)

* LUCENE-3061: IndexWriter's getNextMerge() and merge(OneMerge) are now public
(though @lucene.experimental), allowing for custom MergeScheduler
implementations. (Shai Erera)

* LUCENE-3065: Document.getField() was deprecated, as it throws
ClassCastException when loading lazy fields or NumericFields.
(Uwe Schindler, Ryan McKinley, Mike McCandless)

* LUCENE-2027: Directory.touchFile is deprecated and will be removed
in 4.0. (Mike McCandless)

* LUCENE-2990: ArrayUtil/CollectionUtil.*Sort() methods now exit early
on empty or one-element lists/arrays. (Uwe Schindler)

* LUCENE-2897: Apply deleted terms while flushing a segment. We still
buffer deleted terms to later apply to past segments. (Mike McCandless)

* LUCENE-3126: IndexWriter.addIndexes copies incoming segments into CFS if they
aren't already and MergePolicy allows that. (Shai Erera)

* LUCENE-2996: addIndexes(IndexReader) did not flush before adding the new
indexes, causing existing deletions to be applied on the incoming indexes as
well. (Shai Erera, Mike McCandless)

* LUCENE-3024: Index with more than 2.1B terms was hitting AIOOBE when
seeking TermEnum (eg used by Solr's faceting) (Tom Burton-West, Mike

* LUCENE-3042: When a filter or consumer added Attributes to a TokenStream
chain after it was already (partly) consumed [or clearAttributes(),
captureState(), cloneAttributes(),... was called by the Tokenizer],
the Tokenizer calling clearAttributes() or capturing state after addition
may not do this on the newly added Attribute. This bug affected only
very special use cases of the TokenStream-API, most users would not
have recognized it. (Uwe Schindler, Robert Muir)

* LUCENE-3054: PhraseQuery can in some cases stack overflow in
SorterTemplate.quickSort(). This fix also adds an optimization to
PhraseQuery as term with lower doc freq will also have fewer positions.
(Uwe Schindler, Robert Muir, Otis Gospodnetic)

* LUCENE-3068: sloppy phrase query failed to match valid documents when multiple
query terms had same position in the query. (Doron Cohen)

* LUCENE-3012: Lucene writes the header now for separate norm files (*.sNNN)

* LUCENE-3006: Building javadocs will fail on warnings by default.
Override with -Dfailonjavadocwarning=false (sarowe, gsingers)

* LUCENE-3128: "ant eclipse" creates a .project file for easier Eclipse
integration (unless one already exists). (Daniel Serodio via Shai Erera)

* LUCENE-3002: added 'tests.iter.min' to control 'tests.iter' by allowing
iteration to stop once at least 'tests.iter.min' iterations ran and a failure
occurred. (Shai Erera, Chris Hostetter)

======================= Lucene 3.1.0 =======================

Changes in backwards compatibility policy

* LUCENE-2719: Changed API of internal utility class
org.apache.lucene.util.SorterTemplate to support faster quickSort using
pivot values and also merge sort and insertion sort. If you have used
this class, you have to implement two more methods for handling pivots.
(Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-1923: Renamed SegmentInfo & SegmentInfos segString method to
toString. These are advanced APIs and subject to change suddenly.
(Tim Smith via Mike McCandless)

* LUCENE-2190: Removed deprecated customScore() and customExplain()
methods from experimental CustomScoreQuery. (Uwe Schindler)

* LUCENE-2286: Enabled DefaultSimilarity.setDiscountOverlaps by default.
This means that terms with a position increment gap of zero do not
affect the norms calculation by default. (Robert Muir)

* LUCENE-2320: MergePolicy.writer is now of type SetOnce, which allows setting
the IndexWriter for a MergePolicy exactly once. You can change references to
'writer' from <code>writer.doXYZ()</code> to <code>writer.get().doXYZ()</code>
(it is also advisable to add an <code>assert writer != null;</code> before you
access the wrapped IndexWriter.)

In addition, MergePolicy only exposes a default constructor, and the one that
took IndexWriter as argument has been removed from all MergePolicy extensions.
(Shai Erera via Mike McCandless)
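
The "set exactly once" holder described in LUCENE-2320 can be sketched as
follows. SetOnceSketch is a hypothetical class written for this note, not the
Lucene SetOnce source, and throwing IllegalStateException on a second set is
an assumption of the sketch. It shows why callers go through get() to reach
the wrapped object.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical write-once holder; a sketch of the idea only.
final class SetOnceSketch<T> {
    private volatile T value;
    private final AtomicBoolean alreadySet = new AtomicBoolean(false);

    // Stores the value on the first call; any later call is rejected.
    void set(T newValue) {
        if (!alreadySet.compareAndSet(false, true)) {
            throw new IllegalStateException("value was already set");
        }
        value = newValue;
    }

    // Returns the stored value, or null if set() has not been called.
    T get() {
        return value;
    }
}
```

Since get() may return null before the value is set, a null check before
using the wrapped object is a reasonable precaution.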

* LUCENE-2328: SimpleFSDirectory.SimpleFSIndexInput is moved to
FSDirectory.FSIndexInput. Anyone extending this class will have to
fix their code on upgrading. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2302: The new interface for term attributes, CharTermAttribute,
now implements CharSequence. This requires the toString() methods of
CharTermAttribute, deprecated TermAttribute, and Token to return only
the term text and no other attribute contents. LUCENE-2374 implements
an attribute reflection API to no longer rely on toString() for attribute
inspection. (Uwe Schindler, Robert Muir)

* LUCENE-2372, LUCENE-2389: StandardAnalyzer, KeywordAnalyzer,
PerFieldAnalyzerWrapper, WhitespaceTokenizer are now final. Also removed
the now obsolete and deprecated Analyzer.setOverridesTokenStreamMethod().
Analyzer and TokenStream base classes now have an assertion in their ctor
that checks subclasses to be final or at least have final implementations
of incrementToken(), tokenStream(), and reusableTokenStream().
(Uwe Schindler, Robert Muir)

* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
actual file's length if the file exists, and throws FileNotFoundException
otherwise. Returning length=0 for a non-existent file is no longer allowed. If
you relied on that, make sure to catch the exception. (Shai Erera)

* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
creation. Previously, if you passed an empty Directory and set OpenMode to
CREATE*, IndexWriter would make a first empty commit. If you need that
behavior you can call writer.commit()/close() immediately after you create it.
(Shai Erera, Mike McCandless)

* LUCENE-2733: Removed public constructors of utility classes with only static
methods to prevent instantiation. (Uwe Schindler)

* LUCENE-2602: The default (LogByteSizeMergePolicy) merge policy now
takes deletions into account by default. You can disable this by
calling setCalibrateSizeByDeletes(false) on the merge policy. (Mike

* LUCENE-2529, LUCENE-2668: Position increment gap and offset gap of empty
values in multi-valued fields have been changed for some cases in the index.
If you index empty fields and use position/offset information on those
fields, reindexing is recommended. (David Smiley, Koji Sekiguchi)

* LUCENE-2804: Directory.setLockFactory now declares throwing an IOException.
(Shai Erera, Robert Muir)

* LUCENE-2837: Added deprecations noting that in 4.0, Searcher and
Searchable are collapsed into IndexSearcher; contrib/remote and
MultiSearcher have been removed. (Mike McCandless)

* LUCENE-2854: Deprecated SimilarityDelegator and
Similarity.lengthNorm; the latter is now final, forcing any custom
Similarity impls to cutover to the more general computeNorm (Robert
Muir, Mike McCandless)

* LUCENE-2869: Deprecated Query.getSimilarity: instead of using
"runtime" subclassing/delegation, subclass the Weight instead.

* LUCENE-2674: A new idfExplain method was added to Similarity, that
accepts an incoming docFreq. If you subclass Similarity, make sure
you also override this method on upgrade. (Robert Muir, Mike

Changes in runtime behavior

* LUCENE-1923: Made IndexReader.toString() produce something
meaningful (Tim Smith via Mike McCandless)

* LUCENE-2179: CharArraySet.clear() is now functional.
(Robert Muir, Uwe Schindler)

* LUCENE-2455: IndexWriter.addIndexes no longer optimizes the target index
before it adds the new ones. Also, the existing segments are not merged and so
the index will not end up with a single segment (unless it was empty before).
In addition, addIndexesNoOptimize was renamed to addIndexes and no longer
invokes a merge on the incoming and target segments, but instead copies the
segments to the target index. You can call maybeMerge or optimize after this
method completes, if you need to.

In addition, Directory.copyTo* were removed in favor of copy which takes the
target Directory, source and target files as arguments, and copies the source
file to the target Directory under the target file name. (Shai Erera)

* LUCENE-2663: IndexWriter no longer forcefully clears any existing
locks when create=true. This was a holdover from when
SimpleFSLockFactory was the default locking implementation, and,
even then it was dangerous since it could mask bugs in IndexWriter's
usage, allowing applications to accidentally open two writers on the
same directory. (Mike McCandless)

* LUCENE-2701: maxMergeMBForOptimize and maxMergeDocs constraints set on
LogMergePolicy now affect optimize() as well (as opposed to only regular
merges). This means that you can run optimize() and too large segments won't
be merged. (Shai Erera)

* LUCENE-2753: IndexReader and DirectoryReader .listCommits() now return a List,
guaranteeing the commits are sorted from oldest to latest. (Shai Erera)

* LUCENE-2785: TopScoreDocCollector, TopFieldCollector and
the IndexSearcher search methods that take an int nDocs will now
throw IllegalArgumentException if nDocs is 0. Instead, you should
use the newly added TotalHitCountCollector. (Mike McCandless)
697
* LUCENE-2790: LogMergePolicy.useCompoundFile's logic now factors in noCFSRatio
698
to determine whether the passed in segment should be compound.
699
(Shai Erera, Earwin Burrfoot)
701
* LUCENE-2805: IndexWriter now increments the index version on every change to
702
the index instead of for every commit. Committing or closing the IndexWriter
703
without any changes to the index will not cause any index version increment.
704
(Simon Willnauer, Mike McCandless)

* LUCENE-2650, LUCENE-2825: The behavior of FSDirectory.open has changed. On 64-bit
Windows and Solaris systems that support unmapping, FSDirectory.open returns
MMapDirectory. Additionally the behavior of MMapDirectory has been
changed to enable unmapping by default if supported by the JRE.
(Mike McCandless, Uwe Schindler, Robert Muir)

* LUCENE-2829: Improve the performance of "primary key" lookup use
case (running a TermQuery that matches one document) on a
multi-segment index. (Robert Muir, Mike McCandless)

* LUCENE-2010: Segments with 100% deleted documents are now removed on
IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)

* LUCENE-2960: Allow some changes to IndexWriterConfig to take effect
"live" (after an IW is instantiated), via
IndexWriter.getConfig().setXXX(...) (Shay Banon, Mike McCandless)

* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George
Aroush via Mike McCandless)

* LUCENE-1260: Change norm encode (float->byte) and decode
(byte->float) to be instance methods not static methods. This way a
custom Similarity can alter how norms are encoded, though they must
still be encoded as a single byte (Johan Kindgren via Mike

* LUCENE-2103: NoLockFactory should have a private constructor;
until Lucene 4.0 the default one will be deprecated.
(Shai Erera via Uwe Schindler)

* LUCENE-2177: Deprecate the Field ctors that take byte[] and Store.
Since the removal of compressed fields, Store can only be YES, so
it's not necessary to specify. (Erik Hatcher via Mike McCandless)

* LUCENE-2200: Several final classes had non-overriding protected
members. These were converted to private and unused protected
constructors removed. (Steven Rowe via Robert Muir)

* LUCENE-2240: SimpleAnalyzer and WhitespaceAnalyzer now have
Version ctors. (Simon Willnauer via Uwe Schindler)

* LUCENE-2259: Add IndexWriter.deleteUnusedFiles, to attempt removing
unused files. This is only useful on Windows, which prevents
deletion of open files. IndexWriter will eventually remove these
files itself; this method just lets you do so when you know the
files are no longer open by IndexReaders. (luocanrao via Mike

* LUCENE-2282: IndexFileNames is exposed as a public class allowing for easier
use by external code. In addition it offers a matchExtension method which
callers can use to query whether a certain file matches a certain extension.
(Shai Erera via Mike McCandless)

* LUCENE-124: Add a TopTermsBoostOnlyBooleanQueryRewrite to MultiTermQuery.
This rewrite method is similar to TopTermsScoringBooleanQueryRewrite, but
only scores terms by their boost values. For example, this can be used
with FuzzyQuery to ensure that exact matches are always scored higher,
because only the boost will be used in scoring. (Robert Muir)

* LUCENE-2015: Add a static method foldToASCII to ASCIIFoldingFilter to
expose its folding logic. (Cédrik Lime via Robert Muir)

* LUCENE-2294: IndexWriter constructors have been deprecated in favor of a
single ctor which accepts IndexWriterConfig and a Directory. You can set all
the parameters related to IndexWriter on IndexWriterConfig. The different
setter/getter methods were deprecated as well. One should call
writer.getConfig().getXYZ() to query for a parameter XYZ.
Additionally, the setter/getter related to MergePolicy were deprecated as
well. One should interact with the MergePolicy directly.
(Shai Erera via Mike McCandless)

* LUCENE-2320: IndexWriter's MergePolicy configuration was moved to
IndexWriterConfig and the respective methods on IndexWriter were deprecated.
(Shai Erera via Mike McCandless)

* LUCENE-2328: Directory now keeps track itself of the files that are written
but not yet fsynced. The old Directory.sync(String file) method is deprecated
and replaced with Directory.sync(Collection<String> files). Take a look at
FSDirectory to see a sample of what such tracking might look like, if needed
in your custom Directories. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2302: Deprecated TermAttribute and replaced by a new
CharTermAttribute. The change is backwards compatible, so
mixed new/old TokenStreams all work on the same char[] buffer
independent of which interface they use. CharTermAttribute
has shorter method names and implements CharSequence and
Appendable. This allows usage like Java's StringBuilder in
addition to direct char[] access. Also terms can directly be
used in places where CharSequence is allowed (e.g. regular
expressions).
(Uwe Schindler, Robert Muir)
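
To illustrate why implementing CharSequence and Appendable is convenient,
here is a minimal self-contained sketch. TermBufferSketch is a hypothetical
stand-in written for this note, not Lucene's CharTermAttribute; it only
shows that a char[]-backed buffer exposing those two interfaces can be
appended to like a StringBuilder and passed straight to java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-in for a term attribute backed by a growable char[].
// It mirrors only the two interfaces named in the entry above.
final class TermBufferSketch implements CharSequence, Appendable {
    private char[] buffer = new char[16];
    private int length;

    @Override
    public TermBufferSketch append(char c) {
        if (length == buffer.length) {
            buffer = Arrays.copyOf(buffer, length * 2); // grow when full
        }
        buffer[length++] = c;
        return this;
    }

    @Override
    public TermBufferSketch append(CharSequence csq) {
        CharSequence s = (csq == null) ? "null" : csq; // Appendable contract
        return append(s, 0, s.length());
    }

    @Override
    public TermBufferSketch append(CharSequence csq, int start, int end) {
        CharSequence s = (csq == null) ? "null" : csq;
        for (int i = start; i < end; i++) {
            append(s.charAt(i));
        }
        return this;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException();
        }
        return buffer[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        return new String(buffer, start, end - start);
    }

    @Override
    public String toString() {
        return new String(buffer, 0, length);
    }

    // Any CharSequence can be matched without first copying it to a String.
    static boolean matches(String regex, CharSequence term) {
        return Pattern.matches(regex, term);
    }
}
```

With this sketch, new TermBufferSketch().append("luc").append("ene") can be
handed directly to TermBufferSketch.matches("luc.*", ...).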

* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
points too. If you use an IndexDeletionPolicy which holds onto index commits
(such as SnapshotDeletionPolicy), you can call this method to remove those
commit points when they are not needed anymore (instead of waiting for the
next commit). (Shai Erera)

* LUCENE-2481: SnapshotDeletionPolicy.snapshot() and release() were replaced
with equivalent ones that take a String (id) as argument. You can pass
whatever ID you want, as long as you use the same one when calling both.

* LUCENE-2356: Add IndexWriterConfig.set/getReaderTermIndexDivisor, to
set what IndexWriter passes for termsIndexDivisor to the readers it
opens internally when applying deletions or creating a near-real-time
reader. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2167,LUCENE-2699,LUCENE-2763,LUCENE-2847: StandardTokenizer/Analyzer
in common/standard/ now implement the Word Break rules from the Unicode 6.0.0
Text Segmentation algorithm (UAX#29), covering the full range of Unicode code
points, including values from U+FFFF to U+10FFFF.

ClassicTokenizer/Analyzer retains the old (pre-Lucene 3.1) StandardTokenizer/
Analyzer implementation and behavior. Only the Unicode Basic Multilingual
Plane (code points from U+0000 to U+FFFF) is covered.

UAX29URLEmailTokenizer tokenizes URLs and E-mail addresses according to the
relevant RFCs, in addition to implementing the UAX#29 Word Break rules.
(Steven Rowe, Robert Muir, Uwe Schindler)

* LUCENE-2778: RAMDirectory now exposes newRAMFile(), which subclasses can
override to return a different RAMFile implementation. (Shai Erera)

* LUCENE-2785: Added TotalHitCountCollector whose sole purpose is to
count the number of hits matching the query. (Mike McCandless)

* LUCENE-2846: Deprecated IndexReader.setNorm(int, String, float). This method
is only syntactic sugar for setNorm(int, String, byte), but using the global
Similarity.getDefault().encodeNormValue(). Use the byte-based method instead
to ensure that the norm is encoded with your Similarity.
(Robert Muir, Mike McCandless)

* LUCENE-2374: Added Attribute reflection API: It's now possible to inspect the
contents of AttributeImpl and AttributeSource using a well-defined API.
This is e.g. used by Solr's AnalysisRequestHandlers to display all attributes
There are also some backwards incompatible changes in toString() output,
as LUCENE-2302 introduced the CharSequence interface to CharTermAttribute
leading to changed toString() return values. The new API provides a
well-defined way to get a string representation through a new method,
reflectAsString(). For backwards compatibility reasons, when toString()
was implemented by implementation subclasses, the default implementation of
AttributeImpl.reflectWith() uses toString()'s output instead to report the
Attribute's properties. Otherwise, reflectWith() uses Java's reflection
(like toString() did before) to get the attribute properties.
In addition, the mandatory equals() and hashCode() are no longer required
for AttributeImpls, but can still be provided (if needed).

* LUCENE-2691: Deprecate IndexWriter.getReader in favor of
IndexReader.open(IndexWriter) (Grant Ingersoll, Mike McCandless)

* LUCENE-2876: Deprecated Scorer.getSimilarity(). If your Scorer uses a Similarity,
it should keep it itself. Fixed Scorers to pass their parent Weight, so that
Scorer.visitSubScorers (LUCENE-2590) will work correctly.
(Robert Muir, Doron Cohen)

* LUCENE-2900: When opening a near-real-time (NRT) reader
(IndexReader.re/open(IndexWriter)) you can now specify whether
deletes should be applied. Applying deletes can be costly, and some
expert use cases can handle seeing deleted documents returned. The
deletes remain buffered so that the next time you open an NRT reader
and pass true, all deletes will be applied. (Mike McCandless)

* LUCENE-1253: LengthFilter (and Solr's KeepWordTokenFilter) now
require up front specification of enablePositionIncrement. Together with
StopFilter they have a common base class (FilteringTokenFilter) that handles
the position increments automatically. Implementors only need to override an
accept() method that filters tokens. (Uwe Schindler, Robert Muir)
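
The division of labour described above (the base class tracks position
increments, subclasses only decide accept or reject) can be sketched
without any Lucene dependency. FilteringSketch, LengthFilterSketch and
Token below are hypothetical names invented for this illustration; the
real FilteringTokenFilter works on a TokenStream, not on a List.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: owns the position-increment bookkeeping.
abstract class FilteringSketch {
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    private final boolean enablePositionIncrements;

    protected FilteringSketch(boolean enablePositionIncrements) {
        this.enablePositionIncrements = enablePositionIncrements;
    }

    // The only method a subclass has to provide.
    protected abstract boolean accept(Token token);

    final List<Token> filter(List<Token> input) {
        List<Token> output = new ArrayList<>();
        int skipped = 0; // increments of tokens dropped since the last kept one
        for (Token token : input) {
            if (accept(token)) {
                int increment = enablePositionIncrements
                        ? token.positionIncrement + skipped
                        : token.positionIncrement;
                output.add(new Token(token.term, increment));
                skipped = 0;
            } else {
                skipped += token.positionIncrement;
            }
        }
        return output;
    }
}

// Hypothetical subclass: keeps tokens whose length lies in [min, max].
final class LengthFilterSketch extends FilteringSketch {
    private final int min;
    private final int max;

    LengthFilterSketch(boolean enablePositionIncrements, int min, int max) {
        super(enablePositionIncrements);
        this.min = min;
        this.max = max;
    }

    @Override
    protected boolean accept(Token token) {
        int len = token.term.length();
        return len >= min && len <= max;
    }
}
```

Filtering "a quick fox" with a minimum length of 3 drops "a"; with position
increments enabled, "quick" then carries the increment of the dropped token
as well as its own.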

* LUCENE-2249: ParallelMultiSearcher should shut down thread pool on
close. (Martin Traverso via Uwe Schindler)

* LUCENE-2273: FieldCacheImpl.getCacheEntries() used WeakHashMap
incorrectly and led to ConcurrentModificationException.
(Uwe Schindler, Robert Muir)

* LUCENE-2328: Index files fsync tracking moved from
IndexWriter/IndexReader to Directory, and it no longer leaks memory.
(Earwin Burrfoot via Mike McCandless)

* LUCENE-2074: Reduce buffer size of lexer back to default on reset.
(Ruben Laguna, Shai Erera via Uwe Schindler)

* LUCENE-2496: Don't throw NPE if IndexWriter is opened with CREATE on
a prior (corrupt) index missing its segments_N file. (Mike

* LUCENE-2458: QueryParser no longer automatically forms phrase queries,
assuming whitespace tokenization. Previously all CJK queries, for example,
would be turned into phrase queries. The old behavior is preserved with
the matchVersion parameter for previous versions. Additionally, you can
explicitly enable the old behavior with setAutoGeneratePhraseQueries(true)

* LUCENE-2537: FSDirectory.copy() implementation was unsafe and could result in
OOM if a large file was copied. (Shai Erera)

* LUCENE-2580: MultiPhraseQuery throws AIOOBE if number of positions
exceeds number of terms at one position (Jayendra Patil via Mike McCandless)

* LUCENE-2617: Optional clauses of a BooleanQuery were not factored
into coord if the scorer for that segment returned null. This
can cause the same document to score differently depending on
what segment it resides in. (yonik)

* LUCENE-2272: Fix explain in PayloadNearQuery and also fix scoring issue (Peter Keegan via Grant Ingersoll)

* LUCENE-2732: Fix charset problems in XML loading in
HyphenationCompoundWordTokenFilter. (Uwe Schindler)

* LUCENE-2802: NRT DirectoryReader returned incorrect values from
getVersion, isOptimized, getCommitUserData, getIndexCommit and isCurrent due
to a mutable reference to the IndexWriter's SegmentInfos.
(Simon Willnauer, Earwin Burrfoot)

* LUCENE-2852: Fixed corner case in RAMInputStream that would hit a
false EOF after seeking to EOF then seeking back to same block you
were just in and then calling readBytes (Robert Muir, Mike McCandless)

* LUCENE-2860: Fixed SegmentInfo.sizeInBytes to factor includeDocStores when it
decides whether to return the cached computed size or not. (Shai Erera)

* LUCENE-2584: SegmentInfo.files() could hit ConcurrentModificationException if
called by multiple threads. (Alexander Kanarsky via Shai Erera)

* LUCENE-2809: Fixed IndexWriter.numDocs to take into account
applied but not yet flushed deletes. (Mike McCandless)

* LUCENE-2879: MultiPhraseQuery previously calculated its phrase IDF by summing
internally, it now calls Similarity.idfExplain(Collection, IndexSearcher).

* LUCENE-2693: RAM used by IndexWriter was slightly incorrectly computed.
(Jason Rutherglen via Shai Erera)

* LUCENE-1846: DateTools now uses the US locale everywhere, so DateTools.round()
is safe also in strange locales. (Uwe Schindler)

* LUCENE-2891: IndexWriterConfig did not accept -1 in setReaderTermIndexDivisor,
which can be used to prevent loading the terms index into memory. (Shai Erera)

* LUCENE-2937: Encoding a float into a byte (e.g. encoding field norms during
indexing) had an underflow detection bug that caused floatToByte(f)==0 where
f was greater than 0, but slightly less than byteToFloat(1). This meant that
certain very small field norms (index_boost * length_norm) could have
been rounded down to 0 instead of being rounded up to the smallest
positive number. (yonik)
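
The underflow case can be made concrete with a small self-contained
sketch of a one-byte float encoding. The layout assumed here (3 mantissa
bits, zero-exponent point 15) and the class name NormByteSketch are
illustrative assumptions, not a copy of Lucene's encoder; the point is the
boundary test, which must send every positive value below byteToFloat(1)
to 1 rather than 0.

```java
// Minimal sketch of a float <-> byte encoding with explicit underflow handling.
// Assumed layout: 3 mantissa bits and a zero-exponent point of 15.
final class NormByteSketch {
    private static final int MANTISSA_SHIFT = 24 - 3;
    private static final int EXPONENT_BASE = (63 - 15) << 3;

    static byte floatToByte(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> MANTISSA_SHIFT;
        if (small <= EXPONENT_BASE) {
            // Zero and negative values map to 0; tiny positive values
            // round UP to the smallest positive code instead of to 0.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= EXPONENT_BASE + 0x100) {
            return (byte) -1; // overflow: clamp to the largest code
        }
        return (byte) (small - EXPONENT_BASE);
    }

    static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << MANTISSA_SHIFT;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

In this sketch, a value such as 1e-12f lies below byteToFloat((byte) 1) and
still encodes to 1, while 0.0f encodes to 0.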

* LUCENE-2936: PhraseQuery score explanations were not correctly
identifying matches vs non-matches. (hossman)

* LUCENE-2975: A hotspot bug corrupts IndexInput#readVInt()/readVLong() if
the underlying readByte() is inlined (which happens e.g. in MMapDirectory).
The loop was unrolled, which makes the hotspot bug disappear.
(Uwe Schindler, Robert Muir, Mike McCandless)
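
For readers unfamiliar with the technique, this is roughly what an
unrolled variable-length int decoder looks like. VIntSketch is a
hypothetical class reading from a byte[]; it assumes the usual encoding of
7 payload bits per byte with the high bit as a continuation flag, and is
not the actual IndexInput code.

```java
// Sketch of variable-length int decoding with the loop unrolled by hand.
final class VIntSketch {
    private final byte[] data;
    private int pos;

    VIntSketch(byte[] data) {
        this.data = data;
    }

    private byte readByte() {
        return data[pos++];
    }

    int readVInt() {
        // An int needs at most five bytes, so five explicit steps replace the loop.
        byte b = readByte();
        int i = b & 0x7F;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        i |= (b & 0x7F) << 7;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        i |= (b & 0x7F) << 14;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        i |= (b & 0x7F) << 21;
        if ((b & 0x80) == 0) return i;
        b = readByte();
        // Fifth byte: only its low bits fit into a 32-bit int.
        return i | ((b & 0x7F) << 28);
    }
}
```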

* LUCENE-2128: Parallelized fetching document frequencies during weight
creation. (Israel Tsadok, Simon Willnauer via Uwe Schindler)

* LUCENE-2069: Added Unicode 4 support to CharArraySet. Due to the switch
to Java 5, supplementary characters are now lowercased correctly if the
set is created as case insensitive.
CharArraySet now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor,
CharArraySet yields the old behavior. (Simon Willnauer)

* LUCENE-2069: Added Unicode 4 support to LowerCaseFilter. Due to the switch
to Java 5, supplementary characters are now lowercased correctly.
LowerCaseFilter now requires a Version argument to preserve
backwards compatibility. If Version < 3.1 is passed to the constructor,
LowerCaseFilter yields the old behavior. (Simon Willnauer, Robert Muir)
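
The difference between char-based and code-point-based lowercasing can be
shown with plain JDK calls. LowercaseSketch is a hypothetical helper
written for this note, not the filter's implementation; it only contrasts
Character.toLowerCase(char) with Character.toLowerCase(int) on a
supplementary character.

```java
// Sketch: lowercasing per UTF-16 char versus per Unicode code point.
final class LowercaseSketch {
    // Old-style behavior: each char is handled alone, so the two surrogates
    // of a supplementary character are never recognized as one letter.
    static String lowerByChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Code-point-aware behavior: surrogate pairs are decoded first.
    static String lowerByCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

For example, DESERET CAPITAL LETTER LONG I (U+10400) is left unchanged by
lowerByChar but mapped to U+10428 by lowerByCodePoint.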

* LUCENE-2034: Added ReusableAnalyzerBase, an abstract subclass of Analyzer
that makes it easier to reuse TokenStreams correctly. This issue also added
StopwordAnalyzerBase, which improves consistency of all Analyzers that use
stopwords, and implemented many analyzers in contrib with it.
(Simon Willnauer via Robert Muir)

* LUCENE-2198, LUCENE-2901: Support protected words in stemming TokenFilters using a
new KeywordAttribute. (Simon Willnauer, Drew Farris via Uwe Schindler)

* LUCENE-2183, LUCENE-2240, LUCENE-2241: Added Unicode 4 support
to CharTokenizer and its subclasses. CharTokenizer now has new
int-API which is conditionally preferred to the old char-API depending
on the provided Version. Version < 3.1 will use the char-API.
(Simon Willnauer via Uwe Schindler)

* LUCENE-2247: Added a CharArrayMap<V> for performance improvements
in some stemmers and synonym filters. (Uwe Schindler)

* LUCENE-2320: Added SetOnce which wraps an object and allows it to be set
exactly once. (Shai Erera via Mike McCandless)
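
The set-exactly-once idea is small enough to sketch in a few lines.
SetOnceSketch below is a hypothetical illustration, not Lucene's SetOnce;
in particular, throwing IllegalStateException on a second set is an
assumption made for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a holder whose value may be assigned exactly once.
final class SetOnceSketch<T> {
    private final AtomicBoolean assigned = new AtomicBoolean(false);
    private volatile T value;

    // Only the first caller wins; any later call fails loudly.
    void set(T newValue) {
        if (!assigned.compareAndSet(false, true)) {
            throw new IllegalStateException("value was already set");
        }
        value = newValue;
    }

    // Returns null until set() has been called.
    T get() {
        return value;
    }
}
```

The compareAndSet guard keeps the check-and-assign race-free when several
threads try to set the value at the same time.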

* LUCENE-2314: Added AttributeSource.copyTo(AttributeSource), which
can be used together with cloneAttributes() as a replacement
for captureState()/restoreState(), if the state itself
needs to be inspected/modified. (Uwe Schindler)

* LUCENE-2293: Expose control over max number of threads that
IndexWriter will allow to run concurrently while indexing
documents (previously this was hardwired to 5), using
IndexWriterConfig.setMaxThreadStates. (Mike McCandless)

* LUCENE-2297: Enable turning on reader pooling inside IndexWriter
even when getReader (near-real-time reader) is not in use, through
IndexWriterConfig.enable/disableReaderPooling. (Mike McCandless)

* LUCENE-2331: Add NoMergePolicy which never returns any merges to execute. In
addition, add NoMergeScheduler which never executes any merges. These two are
convenient classes in case you want to disable segment merges by IndexWriter
without tweaking a particular MergePolicy's parameters, such as mergeFactor.
MergeScheduler's methods are now public. (Shai Erera via Mike McCandless)

* LUCENE-2339: Deprecate static method Directory.copy in favor of
Directory.copyTo, and use nio's FileChannel.transferTo when copying
files between FSDirectory instances. (Earwin Burrfoot via Mike

* LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)

* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
can be used to prevent commits from ever getting deleted from the index.

* LUCENE-1585: IndexWriter now accepts a PayloadProcessorProvider which can
return a DirPayloadProcessor for a given Directory, which returns a
PayloadProcessor for a given Term. The PayloadProcessor will be used to
process the payloads of the segments as they are merged (e.g. if one wants to
rewrite payloads of external indexes as they are added, or of local ones).
(Shai Erera, Michael Busch, Mike McCandless)

* LUCENE-2440: Add support for custom ExecutorService in
ParallelMultiSearcher (Edward Drapkin via Mike McCandless)

* LUCENE-2295: Added a LimitTokenCountAnalyzer / LimitTokenCountFilter
to wrap any other Analyzer and provide the same functionality as
MaxFieldLength provided on IndexWriter. This patch also fixes a bug
in the offset calculation in CharTokenizer. (Uwe Schindler, Shai Erera)

* LUCENE-2526: Don't throw NPE from MultiPhraseQuery.toString when
it's empty. (Ross Woolf via Mike McCandless)

* LUCENE-2559: Added SegmentReader.reopen methods (John Wang via Mike

* LUCENE-2590: Added Scorer.visitSubScorers, and Scorer.freq. Along
with a custom Collector these experimental methods make it possible
to gather the hit-count per sub-clause and per document while a
search is running. (Simon Willnauer, Mike McCandless)

* LUCENE-2636: Added MultiCollector which allows running the search with several
Collectors. (Shai Erera)

* LUCENE-2754, LUCENE-2757: Added a wrapper around MultiTermQueries
to add span support: SpanMultiTermQueryWrapper<Q extends MultiTermQuery>.
Using this wrapper it's easy to add fuzzy/wildcard to e.g. a SpanNearQuery.
(Robert Muir, Uwe Schindler)

* LUCENE-2838: ConstantScoreQuery now directly supports wrapping a Query
instance for stripping off scores. The use of a QueryWrapperFilter
is no longer needed and discouraged for that use case. Directly wrapping
Query improves performance, as out-of-order collection is now supported.

* LUCENE-2864: Add getMaxTermFrequency (maximum within-document TF) to
FieldInvertState so that it can be used in Similarity.computeNorm.

* LUCENE-2720: Segments now record the code version which created them.
(Shai Erera, Mike McCandless, Uwe Schindler)

* LUCENE-2474: Added expert ReaderFinishedListener API to
IndexReader, to allow apps that maintain external per-segment caches
to evict entries when a segment is finished. (Shay Banon, Yonik
Seeley, Mike McCandless)

* LUCENE-2911: The new StandardTokenizer, UAX29URLEmailTokenizer, and
the ICUTokenizer in contrib now all tag types with a consistent set
of token types (defined in StandardTokenizer). Tokens in the major
CJK types are explicitly marked to allow for custom downstream handling:
<IDEOGRAPHIC>, <HANGUL>, <KATAKANA>, and <HIRAGANA>.
(Robert Muir, Steven Rowe)

* LUCENE-2913: Add missing getters to Numeric* classes. (Uwe Schindler)

* LUCENE-1810: Added FieldSelectorResult.LATENT to not cache lazy loaded fields
(Tim Smith, Grant Ingersoll)

* LUCENE-2692: Added several new SpanQuery classes for positional checking
(match is in a range, payload is a specific value) (Grant Ingersoll)

* LUCENE-2494: Use CompletionService in ParallelMultiSearcher instead of
simple polling for results. (Edward Drapkin, Simon Willnauer)

* LUCENE-2075: Terms dict cache is now shared across threads instead
of being stored separately in thread local storage. Also fixed
terms dict so that the cache is used when seeking the thread local
term enum, which will be important for MultiTermQuery impls that do
lots of seeking (Mike McCandless, Uwe Schindler, Robert Muir, Yonik

* LUCENE-2136: If the multi reader (DirectoryReader or MultiReader)
only has a single sub-reader, delegate all enum requests to it.
This avoids the overhead of using a PQ unnecessarily. (Mike

* LUCENE-2137: Switch to AtomicInteger for some ref counting (Earwin
Burrfoot via Mike McCandless)

* LUCENE-2123, LUCENE-2261: Move FuzzyQuery rewrite to separate RewriteMode
into MultiTermQuery. The number of fuzzy expansions can be specified with
the maxExpansions parameter to FuzzyQuery.
(Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-2164: ConcurrentMergeScheduler has more control over merge
threads. First, it gives smaller merges higher thread priority than
larger ones. Second, a new set/getMaxMergeCount setting will pause
the larger merges to allow smaller ones to finish. The defaults for
these settings are now dynamic, depending on the number of CPU cores as
reported by Runtime.getRuntime().availableProcessors() (Mike

* LUCENE-2169: Improved CharArraySet.copy(), if source set is
also a CharArraySet. (Simon Willnauer via Uwe Schindler)

* LUCENE-2084: Change IndexableBinaryStringTools to work on byte[] and char[]
directly, instead of Byte/CharBuffers, and modify CollationKeyFilter to
take advantage of this for faster performance.
(Steven Rowe, Uwe Schindler, Robert Muir)

* LUCENE-2188: Add a utility class for tracking deprecated overridden
methods in non-final subclasses.
(Uwe Schindler, Robert Muir)

* LUCENE-2195: Speedup CharArraySet if set is empty.
(Simon Willnauer via Robert Muir)

* LUCENE-2285: Code cleanup. (Shai Erera via Uwe Schindler)

* LUCENE-2303: Remove code duplication in Token class by subclassing
TermAttributeImpl, move DEFAULT_TYPE constant to TypeInterface, improve
null-handling for TypeAttribute. (Uwe Schindler)

* LUCENE-2329: Switch TermsHash* from using a PostingList object per unique
term to parallel arrays, indexed by termID. This reduces garbage collection
overhead significantly, which results in great indexing performance wins
when the available JVM heap space is low. This will become even more
important when the DocumentsWriter RAM buffer is searchable in the future,
because then it will make sense to make the RAM buffers as large as
possible. (Mike McCandless, Michael Busch)

* LUCENE-2380: The terms field cache methods (getTerms,
getTermsIndex), which replace the older String equivalents
(getStrings, getStringIndex), consume quite a bit less RAM in most
cases. (Mike McCandless)

* LUCENE-2410: ~20% speedup on exact (slop=0) PhraseQuery matching.

* LUCENE-2531: Fix issue when sorting by a String field that was
causing too many fallbacks to compare-by-value (instead of by-ord).

* LUCENE-2574: IndexInput exposes copyBytes(IndexOutput, long) to allow for
efficient copying by sub-classes. Optimized copy is implemented for RAM and FS
streams. (Shai Erera)

* LUCENE-2719: Improved TermsHashPerField's sorting to use a better
quick sort algorithm that dereferences the pivot element not on
every compare call. Also replaced lots of sorting code in Lucene
by the improved SorterTemplate class.
(Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-2760: Optimize SpanFirstQuery and SpanPositionRangeQuery.

* LUCENE-2770: Make SegmentMerger always work on atomic subreaders,
even when IndexWriter.addIndexes(IndexReader...) is used with
DirectoryReaders or other MultiReaders. This saves lots of memory
during merge of norms. (Uwe Schindler, Mike McCandless)

* LUCENE-2824: Optimize BufferedIndexInput to do less bounds checks.

* LUCENE-2010: Segments with 100% deleted documents are now removed on
IndexReader or IndexWriter commit. (Uwe Schindler, Mike McCandless)

* LUCENE-1472: Removed synchronization from static DateTools methods
by using a ThreadLocal. Also converted DateTools.Resolution to a
Java 5 enum (this should not break backwards compatibility). (Uwe Schindler)
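
The ThreadLocal approach mentioned above can be sketched with JDK classes
only. DayFormatSketch is a hypothetical helper, and the "yyyyMMdd" pattern,
GMT time zone and US locale are assumptions chosen for the example;
SimpleDateFormat is not thread-safe, so each thread gets its own instance
instead of all threads synchronizing on a shared one.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch: per-thread formatter instances instead of a synchronized shared one.
final class DayFormatSketch {
    private static final ThreadLocal<SimpleDateFormat> DAY_FORMAT =
            ThreadLocal.withInitial(() -> {
                // Fixed locale and zone keep the output independent of the
                // machine's default settings.
                SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.US);
                format.setTimeZone(TimeZone.getTimeZone("GMT"));
                return format;
            });

    // No synchronization needed: the formatter is never shared across threads.
    static String formatDay(long epochMillis) {
        return DAY_FORMAT.get().format(new Date(epochMillis));
    }
}
```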

* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
into core, and moved the ICU-based collation support into contrib/icu.

* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
branch is now included in the svn repository using "svn copy"
after release. (Uwe Schindler)

* LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
JFlex 1.5 (currently only available on SVN). (Uwe Schindler)

* LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
can force them to run sequentially by passing -Drunsequential=1 on the command
line. The number of threads that are spawned per CPU defaults to '1'. If you
wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
(Robert Muir, Shai Erera, Peter Kofler)

* LUCENE-2516: Backwards tests are now compiled against released lucene-core.jar
from tarball of previous version. Backwards tests are now packaged together
with src distribution. (Uwe Schindler)

* LUCENE-2611: Added Ant target to install IntelliJ IDEA configuration:
"ant idea". See http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ

* LUCENE-2657: Switch from using Maven POM templates to full POMs when
generating Maven artifacts (Steven Rowe)

* LUCENE-2609: Added jar-test-framework Ant target which packages Lucene's
tests' framework classes. (Drew Farris, Grant Ingersoll, Shai Erera,

* LUCENE-2037: Allow JUnit4 tests in our environment (Erick Erickson
via Mike McCandless)

* LUCENE-1844: Speed up the unit tests (Mark Miller, Erick Erickson,

* LUCENE-2065: Use Java 5 generics throughout our unit tests. (Kay
Kay via Mike McCandless)

* LUCENE-2155: Fix time and zone dependent localization test failures
in queryparser tests. (Uwe Schindler, Chris Male, Robert Muir)

* LUCENE-2170: Fix thread starvation problems. (Uwe Schindler)

* LUCENE-2248, LUCENE-2251, LUCENE-2285: Refactor tests to not use
Version.LUCENE_CURRENT, but instead use a global static value
from LuceneTestCase(J4), that contains the release version.
(Uwe Schindler, Simon Willnauer, Shai Erera)

* LUCENE-2313, LUCENE-2322: Add VERBOSE to LuceneTestCase(J4) to control
verbosity of tests. If VERBOSE==false (default) tests should not print
anything other than errors to System.(out|err). The setting can be
changed with -Dtests.verbose=true on test invocation.
(Shai Erera, Paul Elschot, Uwe Schindler)

* LUCENE-2318: Remove inconsistent system property code for retrieving
temp and data directories inside test cases. It is now centralized in
LuceneTestCase(J4). Also changed lots of tests to use
getClass().getResourceAsStream() to retrieve test data. Tests needing
access to "real" files from the test folder itself, can use
LuceneTestCase(J4).getDataFile(). (Uwe Schindler)

* LUCENE-2398, LUCENE-2611: Improve tests to work better from IDEs such
as Eclipse and IntelliJ.
(Paolo Castagna, Steven Rowe via Robert Muir)

* LUCENE-2804: Add newFSDirectory to LuceneTestCase to create an FSDirectory at
random. (Shai Erera, Robert Muir)

* LUCENE-2579: Fix oal.search's package.html description of abstract
methods. (Santiago M. Mola via Mike McCandless)

* LUCENE-2625: Add a note to IndexReader.termDocs() with additional verbiage
that the TermEnum must be seeked since it is unpositioned.
(Adriano Crestani via Robert Muir)

* LUCENE-2894: Use google-code-prettify for syntax highlighting in javadoc.
(Shinichiro Abe, Koji Sekiguchi)

================== Release 2.9.4 / 3.0.3 ====================

Changes in runtime behavior

* LUCENE-2689: NativeFSLockFactory no longer attempts to acquire a
test lock just before the real lock is acquired. (Surinder Pal
Singh Bindra via Mike McCandless)

* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
handles against deleted files when compound-file was enabled (the
default) and readers are pooled. As a result of this the peak
worst-case free disk space required during optimize is now 3X the
index size, when compound file is enabled (else 2X). (Mike

* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
0.1), which means any time a merged segment is greater than 10% of
the index size, it will be left in non-compound format even if
compound format is on. This change was made to reduce peak
transient disk usage during optimize which increased due to
LUCENE-2762. (Mike McCandless)
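
The noCFSRatio rule reduces to a single size comparison. The helper below
is a hypothetical sketch of that decision, not the LogMergePolicy code; the
method name, its parameters, and the treatment of ratios of 1.0 or more as
"always compound" are assumptions for the example.

```java
// Sketch of the decision described above: large merged segments stay non-compound.
final class NoCfsRatioSketch {
    static boolean useCompoundFile(boolean compoundEnabled,
                                   long mergedSegmentBytes,
                                   long totalIndexBytes,
                                   double noCFSRatio) {
        if (!compoundEnabled) {
            return false; // compound format is switched off entirely
        }
        if (noCFSRatio >= 1.0) {
            return true; // assumed: a ratio of 1.0 or more disables the size check
        }
        // Segments larger than noCFSRatio of the whole index are left non-compound.
        return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
    }
}
```

With the default ratio of 0.1 and a 1000-byte index, a 50-byte merged
segment would be written as a compound file and a 200-byte one would not.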

* LUCENE-2142 (correct fix): FieldCacheImpl.getStringIndex no longer
throws an exception when term count exceeds doc count.
(Mike McCandless, Uwe Schindler)

* LUCENE-2513: When opening writable IndexReader on a not-current
commit, do not overwrite "future" commits. (Mike McCandless)

* LUCENE-2536: IndexWriter.rollback was failing to properly rollback
buffered deletions against segments that were flushed (Mark Harwood
via Mike McCandless)

* LUCENE-2541: Fixed NumericRangeQuery that returned incorrect results
with endpoints near Long.MIN_VALUE and Long.MAX_VALUE:
NumericUtils.splitRange() overflowed, if
- the range contained a LOWER bound
that was greater than (Long.MAX_VALUE - (1L << precisionStep))
- the range contained an UPPER bound
that was less than (Long.MIN_VALUE + (1L << precisionStep))
With standard precision steps around 4, this had no effect on
most queries, only those that met the above conditions.
Queries with large precision steps failed more easily. Queries with
precision step >=64 were not affected. Also 32 bit data types int
and float were not affected.
(Yonik Seeley, Uwe Schindler)
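
The two overflow conditions listed above can be checked with plain long
arithmetic. RangeOverflowSketch is a hypothetical helper written for this
note and does not reproduce NumericUtils.splitRange(); it only shows why
adding (1L << precisionStep) to a bound near Long.MAX_VALUE wraps around,
and how the conditions from the entry detect that case beforehand.

```java
// Sketch of the overflow conditions from the entry above.
// Assumes 0 <= precisionStep < 63 so that (1L << precisionStep) is positive.
final class RangeOverflowSketch {
    // True if lower + (1L << precisionStep) would exceed Long.MAX_VALUE.
    static boolean lowerBoundOverflows(long lower, int precisionStep) {
        return lower > Long.MAX_VALUE - (1L << precisionStep);
    }

    // True if upper - (1L << precisionStep) would fall below Long.MIN_VALUE.
    static boolean upperBoundUnderflows(long upper, int precisionStep) {
        return upper < Long.MIN_VALUE + (1L << precisionStep);
    }

    // Unchecked addition: silently wraps to a negative value on overflow.
    static long naiveAdvance(long lower, int precisionStep) {
        return lower + (1L << precisionStep);
    }
}
```

For instance, with precisionStep 4 and a lower bound of Long.MAX_VALUE - 3,
naiveAdvance returns a negative number, which is exactly the situation
lowerBoundOverflows reports.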

* LUCENE-2593: Fixed certain rare cases where a disk full could lead
to a corrupted index (Robert Muir, Mike McCandless)

* LUCENE-2620: Fixed a bug in WildcardQuery where too many asterisks
would result in unbearably slow performance. (Nick Barkas via Robert Muir)

* LUCENE-2627: Fixed bug in MMapDirectory chunking when a file is an
exact multiple of the chunk size. (Robert Muir)
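
The entry does not spell out the faulty code, so the following is only a
generic sketch of the boundary case it names: splitting a file of a given
length into fixed-size chunks when the length is an exact multiple of the
chunk size. ChunkCountSketch and its methods are hypothetical and are not
taken from MMapDirectory.

```java
// Sketch of chunk arithmetic around the "exact multiple" boundary.
final class ChunkCountSketch {
    // Number of chunks needed to cover fileLength bytes.
    static int chunkCount(long fileLength, int chunkSize) {
        if (chunkSize <= 0 || fileLength < 0) {
            throw new IllegalArgumentException("invalid size");
        }
        long full = fileLength / chunkSize;
        // Add a partial chunk only when there is a remainder.
        return (int) (fileLength % chunkSize == 0 ? full : full + 1);
    }

    // Length of the final chunk; a full chunk when the file divides evenly.
    static int lastChunkLength(long fileLength, int chunkSize) {
        if (chunkSize <= 0 || fileLength < 0) {
            throw new IllegalArgumentException("invalid size");
        }
        if (fileLength == 0) {
            return 0;
        }
        int remainder = (int) (fileLength % chunkSize);
        return remainder == 0 ? chunkSize : remainder;
    }
}
```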

* LUCENE-2634: isCurrent on an NRT reader was failing to return false
if the writer had just committed (Nikolay Zamosenchuk via Mike McCandless)

* LUCENE-2650: Added extra safety to MMapIndexInput clones to prevent accessing
an unmapped buffer if the input is closed (Mike McCandless, Uwe Schindler, Robert Muir)

* LUCENE-2384: Reset zzBuffer in StandardTokenizerImpl when lexer is reset.
(Ruben Laguna via Uwe Schindler, sub-issue of LUCENE-2074)

* LUCENE-2658: Exceptions while processing term vectors enabled for multiple
fields could lead to invalid ArrayIndexOutOfBoundsExceptions.
(Robert Muir, Mike McCandless)

* LUCENE-2235: Implement missing PerFieldAnalyzerWrapper.getOffsetGap().
(Javier Godoy via Uwe Schindler)

* LUCENE-2328: Fixed memory leak in how IndexWriter/Reader tracked
already sync'd files. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2549: Fix TimeLimitingCollector#TimeExceededException to record
the absolute docid. (Uwe Schindler)
1375
* LUCENE-2533: fix FileSwitchDirectory.listAll to not return dups when
1376
primary & secondary dirs share the same underlying directory.
1377
(Michael McCandless)
1379
* LUCENE-2365: IndexWriter.newestSegment (used normally for testing)
1380
is fixed to return null if there are no segments. (Karthick
1381
Sankarachary via Mike McCandless)

* LUCENE-2730: Fix two rare deadlock cases in IndexWriter (Mike McCandless)

* LUCENE-2744: CheckIndex was stating total number of fields,
not the number that have norms enabled, on the "test: field
norms..." output. (Mark Kristensson via Mike McCandless)

* LUCENE-2759: Fixed two near-real-time cases where doc store files
may be opened for read even though they are still open for write.

* LUCENE-2618: Fix rare thread safety issue whereby
IndexWriter.optimize could sometimes return even though the index
wasn't fully optimized (Mike McCandless)

* LUCENE-2767: Fix thread safety issue in addIndexes(IndexReader[])
that could potentially result in index corruption. (Mike

* LUCENE-2762: Fixed bug in IndexWriter causing it to hold open file
handles against deleted files when compound-file was enabled (the
default) and readers are pooled. As a result of this the peak
worst-case free disk space required during optimize is now 3X the
index size, when compound file is enabled (else 2X). (Mike

* LUCENE-2216: OpenBitSet.hashCode returned different hash codes for
sets that only differed by trailing zeros. (Dawid Weiss, yonik)
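
To illustrate the property LUCENE-2216 asks for, here is a minimal sketch of a hash over a long[] word array that is insensitive to trailing zero words. It is one possible approach under stated assumptions, not necessarily the OpenBitSet implementation; the class and method names are hypothetical.

```java
public class TrailingZeroHashSketch {
    // Walk the words from the last to the first. While only zero words
    // have been seen, h stays 0 (0 ^ 0 == 0, and rotating 0 gives 0),
    // so trailing zero words cannot influence the result.
    public static int hash(long[] words) {
        long h = 0;
        for (int i = words.length - 1; i >= 0; i--) {
            h ^= words[i];
            h = (h << 1) | (h >>> 63); // rotate left by one bit
        }
        // Fold the 64-bit value into 32 bits.
        return (int) ((h >> 32) ^ h);
    }

    public static void main(String[] args) {
        long[] compact = {5L, 9L};
        long[] padded = {5L, 9L, 0L, 0L};
        System.out.println(hash(compact) == hash(padded));
    }
}
```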

* LUCENE-2782: Fix rare potential thread hazard with
IndexWriter.commit (Mike McCandless)

* LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default =
0.1), which means any time a merged segment is greater than 10% of
the index size, it will be left in non-compound format even if
compound format is on. This change was made to reduce peak
transient disk usage during optimize which increased due to
LUCENE-2762. (Mike McCandless)
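
The noCFSRatio rule from LUCENE-2773 amounts to a single size comparison. The sketch below is illustrative only: the class, method, and parameter names are hypothetical and it does not reproduce LogMergePolicy itself.

```java
public class NoCfsRatioSketch {
    // Returns true if the merged segment may still use compound format.
    // A segment larger than noCFSRatio times the total index size is
    // left in non-compound format, as described in the entry.
    public static boolean useCompoundFile(long mergedSegmentBytes,
                                          long totalIndexBytes,
                                          double noCFSRatio) {
        return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        // With the default ratio of 0.1 and a 1000-byte index:
        System.out.println(useCompoundFile(50, 1000, 0.1));  // small segment
        System.out.println(useCompoundFile(200, 1000, 0.1)); // large segment
    }
}
```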

* LUCENE-2556: Improve memory usage after cloning TermAttribute.
(Adriano Crestani via Uwe Schindler)

* LUCENE-2098: Improve the performance of BaseCharFilter, especially for
large documents. (Robin Wojciki, Koji Sekiguchi, Robert Muir)

* LUCENE-2675 (2.9.4 only): Add support for Lucene 3.0 stored field files
also in 2.9. The file format did not change, only the version number was
upgraded to mark segments that have no compression. FieldsWriter still only
writes 2.9 segments as they could contain compressed fields. This cross-version
index format compatibility is provided here solely because Lucene 2.9 and 3.0
have the same bugfix level, features, and the same index format with this slight
compression difference. In general, Lucene does not support reading newer
indexes with older library versions. (Uwe Schindler)

* LUCENE-2239: Documented limitations in NIOFSDirectory and MMapDirectory due to
Java NIO behavior when a Thread is interrupted while blocking on IO.
(Simon Willnauer, Robert Muir)

================== Release 2.9.3 / 3.0.2 ====================

Changes in backwards compatibility policy

* LUCENE-2135: Added FieldCache.purge(IndexReader) method to the
interface. Anyone implementing FieldCache externally will need to
fix their code to implement this, on upgrading. (Mike McCandless)

Changes in runtime behavior

* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
it cannot delete the lock file, since obtaining the lock does not fail if the
file is there. (Shai Erera)

* LUCENE-2060 (2.9.3 only): Changed ConcurrentMergeScheduler's default for
maxNumThreads from 3 to 1, because in practice we get the most gains
from running a single merge in the background. More than one
concurrent merge causes a lot of thrashing (though it's possible on
SSD storage that there would be net gains). (Jason Rutherglen, Mike

* LUCENE-2046 (2.9.3 only): IndexReader should not see the index as changed, after
IndexWriter.prepareCommit has been called but before
IndexWriter.commit is called. (Peter Keegan via Mike McCandless)

* LUCENE-2119: Don't throw NegativeArraySizeException if you pass
Integer.MAX_VALUE as nDocs to IndexSearcher search methods. (Paul
Taylor via Mike McCandless)

* LUCENE-2142: FieldCacheImpl.getStringIndex no longer throws an
exception when term count exceeds doc count. (Mike McCandless)

* LUCENE-2104: NativeFSLock.release() would silently fail if the lock is held by
another thread/process. (Shai Erera via Uwe Schindler)

* LUCENE-2283: Use shared memory pool for term vector and stored
fields buffers. This memory will be reclaimed if needed according to
the configured RAM Buffer Size for the IndexWriter. This also fixes
potentially excessive memory usage when many threads are indexing a
mix of small and large documents. (Tim Smith via Mike McCandless)

* LUCENE-2300: If IndexWriter is pooling readers (because an NRT reader
has been obtained), and addIndexes* is run, do not pool the
readers from the external directory. Pooling them is harmless (the NRT
reader is correct), but a waste of resources. (Mike McCandless)

* LUCENE-2422: Don't reuse byte[] in IndexInput/Output -- it gains
little performance, and ties up possibly large amounts of memory
for apps that index large docs. (Ross Woolf via Mike McCandless)

* LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
in IndexWriter, nor the Reader in Tokenizer after close is
called. (Ruben Laguna, Uwe Schindler, Mike McCandless)

* LUCENE-2417: IndexCommit did not implement hashCode() and equals()
consistently. Now they both take Directory and version into consideration. In
addition, all of IndexCommit's methods which threw
UnsupportedOperationException are now abstract. (Shai Erera)
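
As an illustration of the LUCENE-2417 contract, the sketch below bases both equals() and hashCode() on the same two values, a directory and a version. The class is hypothetical and stands in for IndexCommit only conceptually; the directory is modeled as a plain Object.

```java
import java.util.Objects;

public class CommitIdentitySketch {
    private final Object directory; // stand-in for a Directory instance
    private final long version;

    public CommitIdentitySketch(Object directory, long version) {
        this.directory = Objects.requireNonNull(directory);
        this.version = version;
    }

    // equals() and hashCode() consult exactly the same fields, so two
    // equal objects are guaranteed to produce the same hash code.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CommitIdentitySketch)) return false;
        CommitIdentitySketch other = (CommitIdentitySketch) o;
        return directory.equals(other.directory) && version == other.version;
    }

    @Override
    public int hashCode() {
        return Objects.hash(directory, version);
    }

    public static void main(String[] args) {
        CommitIdentitySketch a = new CommitIdentitySketch("dir", 7L);
        CommitIdentitySketch b = new CommitIdentitySketch("dir", 7L);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```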

* LUCENE-2467: Fixed memory leaks in IndexWriter when large documents
are indexed. (Mike McCandless)

* LUCENE-2473: Clicking on the "More Results" link in the luceneweb.war
demo resulted in ArrayIndexOutOfBoundsException.
(Sami Siren via Robert Muir)

* LUCENE-2476: If any exception is hit init'ing IW, release the write
lock (previously we only released on IOException). (Tamas Cservenak
via Mike McCandless)

* LUCENE-2478: Fix CachingWrapperFilter to not throw NPE when
Filter.getDocIdSet() returns null. (Uwe Schindler, Daniel Noll)

* LUCENE-2468: Allow specifying how new deletions should be handled in
CachingWrapperFilter and CachingSpanFilter. By default, new
deletions are ignored in CachingWrapperFilter, since typically this
filter is AND'd with a query that correctly takes new deletions into
account. This should be a performance gain (higher cache hit rate)
in apps that reopen readers, or use near-real-time reader
(IndexWriter.getReader()), but may introduce invalid search results
(allowing deleted docs to be returned) for certain cases, so a new
expert ctor was added to CachingWrapperFilter to enforce deletions
at a performance cost. CachingSpanFilter by default recaches if
there are new deletions (Shay Banon via Mike McCandless)

* LUCENE-2299: If you open an NRT reader while addIndexes* is running,
it may miss some segments (Earwin Burrfoot via Mike McCandless)

* LUCENE-2397: Don't throw NPE from SnapshotDeletionPolicy.snapshot if
there are no commits yet (Shai Erera)

* LUCENE-2424: Fix FieldDoc.toString to actually return its fields
(Stephen Green via Mike McCandless)

* LUCENE-2311: Always pass a "fully loaded" (terms index & doc stores)
SegmentReader to IndexWriter's mergedSegmentWarmer (if set), so
that warming is free to do whatever it needs to. (Earwin Burrfoot
via Mike McCandless)

* LUCENE-3029: Fix corner case when MultiPhraseQuery is used with zero
position-increment tokens that would sometimes assign different
scores to identical docs. (Mike McCandless)

* LUCENE-2486: Fixed intermittent FileNotFoundException on doc store
files when a mergedSegmentWarmer is set on IndexWriter. (Mike

* LUCENE-2130: Fix performance issue when FuzzyQuery runs on a
multi-segment index (Michael McCandless)

* LUCENE-2281: added doBeforeFlush to IndexWriter to allow extensions to perform
operations before flush starts. Also exposed doAfterFlush as protected instead
of package-private. (Shai Erera via Mike McCandless)

* LUCENE-2356: Add IndexWriter.set/getReaderTermsIndexDivisor, to set
what IndexWriter passes for termsIndexDivisor to the readers it
opens internally when applying deletions or creating a
near-real-time reader. (Earwin Burrfoot via Mike McCandless)

* LUCENE-2494 (3.0.2 only): Use CompletionService in ParallelMultiSearcher
instead of simple polling for results. (Edward Drapkin, Simon Willnauer)

* LUCENE-2135: On IndexReader.close, forcefully evict any entries from
the FieldCache rather than waiting for the WeakHashMap to release
the reference (Mike McCandless)

* LUCENE-2161: Improve concurrency of IndexReader, especially in the
context of near real-time readers. (Mike McCandless)

* LUCENE-2360: Small speedup to recycling of reused per-doc RAM in
IndexWriter (Robert Muir, Mike McCandless)

* LUCENE-2488 (2.9.3 only): Support build with JDK 1.4 and exclude Java 1.5
contrib modules on request (pass '-Dforce.jdk14.build=true') when
compiling/testing/packaging. This marks the benchmark contrib also
as Java 1.5, as it depends on fast-vector-highlighter. (Uwe Schindler)

================== Release 2.9.2 / 3.0.1 ====================

Changes in backwards compatibility policy

* LUCENE-2123 (3.0.1 only): Removed the protected inner class ScoreTerm
from FuzzyQuery. The change was needed because the comparator of this
class had to be changed in an incompatible way. The class was never
intended to be public. (Uwe Schindler, Mike McCandless)

* LUCENE-2092: BooleanQuery was ignoring disableCoord in its hashCode
and equals methods, causing bad things to happen when caching
BooleanQueries. (Chris Hostetter, Mike McCandless)

* LUCENE-2095: Fixed: when two threads called IndexWriter.commit() at
the same time, it was possible for commit to return control back to
one of the threads before all changes were actually committed.
(Sanne Grinovero via Mike McCandless)

* LUCENE-2132 (3.0.1 only): Fix the demo result.jsp to use QueryParser
with a Version argument. (Brian Li via Robert Muir)

* LUCENE-2166: Don't incorrectly keep warning about the same immense
term, when IndexWriter.infoStream is on. (Mike McCandless)

* LUCENE-2158: At high indexing rates, NRT reader could temporarily
lose deletions. (Mike McCandless)

* LUCENE-2182: DEFAULT_ATTRIBUTE_FACTORY was failing to load
implementation class when interface was loaded by a different
class loader. (Uwe Schindler, reported on java-user by Ahmed El-dawy)

* LUCENE-2257: Increase max number of unique terms in one segment to
termIndexInterval (default 128) * ~2.1 billion = ~274 billion.
(Tom Burton-West via Mike McCandless)
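
The arithmetic behind the LUCENE-2257 figure can be reproduced as follows. This sketch assumes that "~2.1 billion" refers to Integer.MAX_VALUE; that reading is an assumption, and the class and method names are hypothetical.

```java
public class TermLimitSketch {
    // Widen to long before multiplying; an int multiplication would
    // overflow for any interval greater than 1.
    public static long maxUniqueTerms(int termIndexInterval) {
        return (long) termIndexInterval * Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 128 * 2147483647 = 274877906816, i.e. roughly 274 billion.
        System.out.println(maxUniqueTerms(128));
    }
}
```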

* LUCENE-2260: Fixed AttributeSource to not hold a strong
reference to the Attribute/AttributeImpl classes which prevents
unloading of custom attributes loaded by other classloaders
(e.g. in Solr plugins). (Uwe Schindler)

* LUCENE-1941: Fix Min/MaxPayloadFunction returning 0 when
only one payload is present. (Erik Hatcher, Mike McCandless

* LUCENE-2270: Queries consisting of all zero-boost clauses
(for example, text:foo^0) sorted incorrectly and produced
invalid docids. (yonik)

* LUCENE-1609 (3.0.1 only): Restore IndexReader.getTermInfosIndexDivisor
(it was accidentally removed in 3.0.0) (Mike McCandless)

* LUCENE-1972 (3.0.1 only): Restore SortField.getComparatorSource
(it was accidentally removed in 3.0.0) (John Wang via Uwe Schindler)

* LUCENE-2190: Added a new class CustomScoreProvider to function package
that can be subclassed to provide custom scoring to CustomScoreQuery.
The methods in CustomScoreQuery that did this before were deprecated
and replaced by a method getCustomScoreProvider(IndexReader) that
returns a custom score implementation using the above class. The change
is necessary with per-segment searching, as CustomScoreQuery is
a stateless class (like all other Queries) and does not know about
the currently searched segment. This API works similarly to Filter's
getDocIdSet(IndexReader). (Paul chez Jamespot via Mike McCandless,

* LUCENE-2080: Deprecate Version.LUCENE_CURRENT, as using this constant
will cause backwards compatibility problems when upgrading Lucene. See
the Version javadocs for additional information.

* LUCENE-2086: When resolving deleted terms, do so in term sort order
for better performance (Bogdan Ghidireac via Mike McCandless)

* LUCENE-2123 (partly, 3.0.1 only): Fixes a slowdown / memory issue
added by LUCENE-504. (Uwe Schindler, Robert Muir, Mike McCandless)

* LUCENE-2258: Remove unneeded synchronization in FuzzyTermEnum.
(Uwe Schindler, Robert Muir)

* LUCENE-2114: Change TestFilteredSearch to test on multi-segment
index as well. (Simon Willnauer via Mike McCandless)

* LUCENE-2211: Improves BaseTokenStreamTestCase to use a fake attribute
that checks if clearAttributes() was called correctly.
(Uwe Schindler, Robert Muir)

* LUCENE-2207, LUCENE-2219: Improve BaseTokenStreamTestCase to check if
end() is implemented correctly. (Koji Sekiguchi, Robert Muir)

* LUCENE-2114: Improve javadocs of Filter to call out that the
provided reader is per-segment (Simon Willnauer via Mike

======================= Release 3.0.0 =======================

Changes in backwards compatibility policy

* LUCENE-1979: Change return type of SnapshotDeletionPolicy#snapshot()
from IndexCommitPoint to IndexCommit. Code that uses this method
needs to be recompiled against Lucene 3.0 in order to work. The
previously deprecated IndexCommitPoint is also removed.

* o.a.l.Lock.isLocked() is now allowed to throw an IOException.

* LUCENE-2030: CachingWrapperFilter and CachingSpanFilter now hide
the internal cache implementation for thread safety; previously it was
declared protected. (Peter Lenahan, Uwe Schindler, Simon Willnauer)

* LUCENE-2053: If you call Thread.interrupt() on a thread inside
Lucene, Lucene will do its best to interrupt the thread. However,
instead of throwing InterruptedException (which is a checked
exception), you'll get an oal.util.ThreadInterruptedException (an
unchecked exception, subclassing RuntimeException). The interrupt
status on the thread is cleared when this exception is thrown.
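
The pattern described in LUCENE-2053 can be sketched with the JDK alone. The exception class below is a hypothetical stand-in for oal.util.ThreadInterruptedException, and Thread.sleep stands in for whatever blocking call is interrupted; this is an illustration, not Lucene's code.

```java
public class InterruptWrapSketch {
    // Hypothetical unchecked wrapper around the checked InterruptedException.
    public static class SketchInterruptedException extends RuntimeException {
        public SketchInterruptedException(InterruptedException cause) {
            super(cause);
        }
    }

    // Blocking call that reports interruption as an unchecked exception.
    public static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            // Thread.sleep clears the interrupt status before throwing,
            // and this sketch does not restore it.
            throw new SketchInterruptedException(ie);
        }
    }

    // Interrupts the current thread, then checks that pause() throws the
    // unchecked exception and that the interrupt status is cleared.
    public static boolean throwsWhenInterrupted() {
        Thread.currentThread().interrupt();
        try {
            pause(10);
            return false;
        } catch (SketchInterruptedException e) {
            return !Thread.currentThread().isInterrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsWhenInterrupted());
    }
}
```

Callers that still need the interrupt status afterwards would have to call Thread.currentThread().interrupt() themselves in the catch block.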

* LUCENE-2052: Some methods in Lucene core were changed to accept
Java 5 varargs. This is not a backwards compatibility problem as
long as you do not try to override such a method. We left common
overridden methods unchanged and added varargs to constructors,
static, or final methods (MultiSearcher,...). (Uwe Schindler)

* LUCENE-1558: IndexReader.open(Directory) now opens a readOnly=true
reader, and new IndexSearcher(Directory) does the same. Note that
this is a change in the default from 2.9, when these methods were
previously deprecated. (Mike McCandless)

* LUCENE-1753: Make not yet final TokenStreams final to enforce
decorator pattern. (Uwe Schindler)

Changes in runtime behavior

* LUCENE-1677: Remove the system property to set SegmentReader class
implementation. (Uwe Schindler)

* LUCENE-1960: As a consequence of the removal of Field.Store.COMPRESS,
support for this type of field was removed. Lucene 3.0 is still able
to read indexes with compressed fields, but as soon as merges occur
or the index is optimized, all compressed fields are decompressed
and converted to Field.Store.YES. Because of this, indexes with
compressed fields can suddenly get larger. Also, the first merge with
decompression cannot be done in raw mode and is therefore slower.
This change has no effect on code that uses such old indexes;
they behave as before (fields are automatically decompressed
during read). Indexes converted to Lucene 3.0 format can no longer
be read with previous versions.
It is recommended to optimize your indexes after upgrading to convert
to the new format and decompress all fields.
If you want compressed fields, you can use CompressionTools, which
creates a compressed byte[] to be added as a binary stored field. This
cannot be done automatically, as you also have to decompress such
fields when reading. You have to reindex to do that.
(Michael Busch, Uwe Schindler)

* LUCENE-2060: Changed ConcurrentMergeScheduler's default for
maxNumThreads from 3 to 1, because in practice we get the most
gains from running a single merge in the background. More than one
concurrent merge causes a lot of thrashing (though it's possible on
SSD storage that there would be net gains). (Jason Rutherglen,

* LUCENE-1257, LUCENE-1984, LUCENE-1985, LUCENE-2057, LUCENE-1833, LUCENE-2012,
LUCENE-1998: Port to Java 1.5:

- Add generics to public and internal APIs (see below).
- Replace new Integer(int), new Double(double),... by static valueOf() calls.
- Replace for-loops with Iterator by foreach loops.
- Replace StringBuffer with StringBuilder.
- Replace o.a.l.util.Parameter by Java 5 enums (see below).
- Add @Override annotations.
(Uwe Schindler, Robert Muir, Karl Wettin, Paul Elschot, Kay Kay, Shai Erera,

* Generify Lucene API:

- TokenStream/AttributeSource: Now addAttribute()/getAttribute() return an
instance of the requested attribute interface and no cast is needed anymore
- NumericRangeQuery, NumericRangeFilter, and FieldCacheRangeFilter
now have Integer, Long, Float, Double as type param (LUCENE-1857).
- Document.getFields() returns List<Fieldable>.
- Query.extractTerms(Set<Term>)
- CharArraySet and stop word sets in core/contrib
- PriorityQueue (LUCENE-1935)
- DisjunctionMaxQuery (LUCENE-1984)
- MultiTermQueryWrapperFilter
- CloseableThreadLocal
- o.a.l.util.cache package
- lots of internal APIs of IndexWriter
(Uwe Schindler, Michael Busch, Kay Kay, Robert Muir, Adriano Crestani)

* LUCENE-1944, LUCENE-1856, LUCENE-1957, LUCENE-1960, LUCENE-1961,
LUCENE-1968, LUCENE-1970, LUCENE-1946, LUCENE-1971, LUCENE-1975,
LUCENE-1972, LUCENE-1978, LUCENE-944, LUCENE-1979, LUCENE-1973, LUCENE-2011:
Remove deprecated methods/constructors/classes:

- Remove all String/File directory paths in IndexReader /
IndexSearcher / IndexWriter.
- Remove FSDirectory.getDirectory()
- Make FSDirectory abstract.
- Remove Field.Store.COMPRESS (see above).
- Remove Filter.bits(IndexReader) method and make
Filter.getDocIdSet(IndexReader) abstract.
- Remove old DocIdSetIterator methods and make the new ones abstract.
- Remove some methods in PriorityQueue.
- Remove old TokenStream API and backwards compatibility layer.
- Remove RangeQuery, RangeFilter and ConstantScoreRangeQuery.
- Remove SpanQuery.getTerms().
- Remove ExtendedFieldCache, custom and auto caches, SortField.AUTO.
- Remove old-style custom sort.
- Remove legacy search setting in SortField.
- Remove Hits and all references from core and contrib.
- Remove HitCollector and its TopDocs support implementations.
- Remove term field and accessors in MultiTermQuery
(and fix Highlighter).
- Remove deprecated methods in BooleanQuery.
- Remove deprecated methods in Similarity.
- Remove BoostingTermQuery.
- Remove MultiValueSource.
- Remove Scorer.explain(int).
...and some other minor ones (Uwe Schindler, Michael Busch, Mark Miller)

* LUCENE-1925: Make IndexSearcher's subReaders and docStarts members
protected; add expert ctor to directly specify reader, subReaders
and docStarts. (John Wang, Tim Smith via Mike McCandless)

* LUCENE-1945: All public classes that have a close() method now
also implement java.io.Closeable (IndexReader, IndexWriter, Directory,...).

* LUCENE-1998: Change all Parameter instances to Java 5 enums. This
is not a backwards break, only a change of the super class. Parameter
was deprecated and will be removed in a later version.
(DM Smith, Uwe Schindler)

* LUCENE-1951: When the text provided to WildcardQuery has no wildcard
characters (i.e. matches a single term), don't lose the boost and
rewrite method settings. Also, rewrite to PrefixQuery if the
wildcard has the form "foo*", for slightly faster performance. (Robert
Muir via Mike McCandless)
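
The three cases in LUCENE-1951 can be told apart by inspecting the pattern text. The sketch below is a simplified illustration with hypothetical names; it assumes '*' and '?' are the only wildcard characters and ignores any escaping, and it is not the WildcardQuery rewrite code.

```java
public class WildcardRewriteSketch {
    public enum Kind { TERM, PREFIX, WILDCARD }

    // Classifies a pattern: no wildcard characters means a single term,
    // a single trailing '*' means a prefix, anything else stays a wildcard.
    public static Kind classify(String text) {
        int star = text.indexOf('*');
        int question = text.indexOf('?');
        if (star == -1 && question == -1) {
            return Kind.TERM;
        }
        if (question == -1 && star == text.length() - 1) {
            return Kind.PREFIX;
        }
        return Kind.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println(classify("foo"));
        System.out.println(classify("foo*"));
        System.out.println(classify("f?o*"));
    }
}
```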

* LUCENE-2013: SpanRegexQuery does not work with QueryScorer.
(Benjamin Keil via Mark Miller)

* LUCENE-2088: addAttribute() should only accept interfaces that
extend Attribute. (Shai Erera, Uwe Schindler)

* LUCENE-2045: Fix silly FileNotFoundException hit if you enable
infoStream on IndexWriter and then add an empty document and commit
(Shai Erera via Mike McCandless)

* LUCENE-2046: IndexReader should not see the index as changed, after
IndexWriter.prepareCommit has been called but before
IndexWriter.commit is called. (Peter Keegan via Mike McCandless)

* LUCENE-1933: Provide a convenience AttributeFactory that creates a
Token instance for all basic attributes. (Uwe Schindler)

* LUCENE-2041: Parallelize the rest of ParallelMultiSearcher. Lots of
code refactoring and Java 5 concurrent support in MultiSearcher.
(Joey Surls, Simon Willnauer via Uwe Schindler)

* LUCENE-2051: Add CharArraySet.copy() as a simple method to copy
any Set<?> to a CharArraySet that is optimized, if Set<?> is already
a CharArraySet. (Simon Willnauer)

* LUCENE-1183: Optimize Levenshtein Distance computation in
FuzzyQuery. (Cédrik Lime via Mike McCandless)

* LUCENE-2006: Optimization of FieldDocSortedHitQueue to always
use Comparable<?> interface. (Uwe Schindler, Mark Miller)

* LUCENE-2087: Remove recursion in NumericRangeTermEnum.

* LUCENE-486: Remove test->demo dependencies. (Michael Busch)

* LUCENE-2024: Raise build requirements to Java 1.5 and ANT 1.7.0
(Uwe Schindler, Mike McCandless)

======================= Release 2.9.1 =======================

Changes in backwards compatibility policy

* LUCENE-2002: Add required Version matchVersion argument when
constructing QueryParser or MultiFieldQueryParser and default (as
of 2.9) enablePositionIncrements to true to match
StandardAnalyzer's 2.9 default (Uwe Schindler, Mike McCandless)

* LUCENE-1974: Fixed nasty bug in BooleanQuery (when it used
BooleanScorer for scoring), whereby some matching documents fail to
be collected. (Fulin Tang via Mike McCandless)

* LUCENE-1124: Make sure FuzzyQuery always matches the precise term.
(stefatwork@gmail.com via Mike McCandless)

* LUCENE-1976: Fix IndexReader.isCurrent() to return the right thing
when the reader is a near real-time reader. (Jake Mannix via Mike

* LUCENE-1986: Fix NPE when scoring PayloadNearQuery (Peter Keegan,
Mark Miller via Mike McCandless)

* LUCENE-1992: Fix thread hazard if a merge is committing just as an
exception occurs during sync (Uwe Schindler, Mike McCandless)

* LUCENE-1995: Note in javadocs that IndexWriter.setRAMBufferSizeMB
cannot exceed 2048 MB, and throw IllegalArgumentException if it
does. (Aaron McKee, Yonik Seeley, Mike McCandless)

* LUCENE-2004: Fix Constants.LUCENE_MAIN_VERSION to not be inlined
by client code. (Uwe Schindler)

* LUCENE-2016: Replace illegal U+FFFF character with the replacement
char (U+FFFD) during indexing, to prevent silent index corruption.
(Peter Keegan, Mike McCandless)
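
The substitution described in LUCENE-2016 is a one-character replacement. The following sketch shows the idea on a plain String; the class and method names are hypothetical, and Lucene applies the replacement inside its indexing code rather than through a helper like this.

```java
public class IllegalCharSketch {
    // Replaces every U+FFFF with the Unicode replacement character U+FFFD.
    public static String sanitize(String text) {
        return text.replace('\uFFFF', '\uFFFD');
    }

    public static void main(String[] args) {
        String cleaned = sanitize("a\uFFFFb");
        // Print the code point of the middle character in hex.
        System.out.println(Integer.toHexString(cleaned.charAt(1)));
    }
}
```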

* Un-deprecate search(Weight weight, Filter filter, int n) from
Searchable interface (deprecated by accident). (Uwe Schindler)

* Un-deprecate o.a.l.util.Version constants. (Mike McCandless)

* LUCENE-1987: Un-deprecate some ctors of Token, as they will not
be removed in 3.0 and are still useful. Also add some missing
o.a.l.util.Version constants for enabling invalid acronym
settings in StandardAnalyzer to be compatible with the coming
Lucene 3.0. (Uwe Schindler)

* LUCENE-1973: Un-deprecate IndexSearcher.setDefaultFieldSortScoring,
to allow controlling per-IndexSearcher whether scores are computed
when sorting by field. (Uwe Schindler, Mike McCandless)

* LUCENE-2043: Make IndexReader.commit(Map<String,String>) public.

* LUCENE-1955: Fix Hits deprecation notice to point users in right
direction. (Mike McCandless, Mark Miller)

* Fix javadoc about score tracking done by search methods in Searcher
and IndexSearcher. (Mike McCandless)

* LUCENE-2008: Javadoc improvements for TokenStream/Tokenizer/Token
(Luke Nezda via Mike McCandless)

======================= Release 2.9.0 =======================

Changes in backwards compatibility policy

* LUCENE-1575: Searchable.search(Weight, Filter, int, Sort) no
longer computes a document score for each hit by default. If
document score tracking is still needed, you can call
IndexSearcher.setDefaultFieldSortScoring(true, true) to enable
both per-hit and maxScore tracking; however, this is deprecated
and will be removed in 3.0.

Alternatively, use Searchable.search(Weight, Filter, Collector)
and pass in a TopFieldCollector instance, using the following code

    TopFieldCollector tfc = TopFieldCollector.create(sort, numHits, fillFields,
                                                     true /* trackDocScores */,
                                                     true /* trackMaxScore */,
                                                     false /* docsInOrder */);
    searcher.search(query, tfc);
    TopDocs results = tfc.topDocs();

Note that your Sort object cannot use SortField.AUTO when you
directly instantiate TopFieldCollector.

Also, the method search(Weight, Filter, Collector) was added to
the Searchable interface and the Searcher abstract class to
replace the deprecated HitCollector versions. If you either
implement Searchable or extend Searcher, you should change your
code to implement this method. If you already extend
IndexSearcher, no further changes are needed to use Collector.

Finally, the values Float.NaN and Float.NEGATIVE_INFINITY are not
valid scores. Lucene uses these values internally in certain
places, so if you have hits with such scores, it will cause
problems. (Shai Erera via Mike McCandless)

* LUCENE-1687: All methods and parsers from the interface ExtendedFieldCache
have been moved into FieldCache. ExtendedFieldCache is now deprecated and
contains only a few declarations for binary backwards compatibility.
ExtendedFieldCache will be removed in version 3.0. Users of FieldCache and
ExtendedFieldCache will be able to plug in Lucene 2.9 without recompilation.
The auto cache (FieldCache.getAuto) is now deprecated. Due to the merge of
ExtendedFieldCache and FieldCache, FieldCache can now return
long[] and double[] arrays in addition to int[] and float[] and StringIndex.

The interface changes are only notable for users implementing the interfaces,
which is unlikely to have been done, because there is no way to change
Lucene's FieldCache implementation. (Grant Ingersoll, Uwe Schindler)

* LUCENE-1630, LUCENE-1771: Weight, previously an interface, is now an abstract
class. Some of the method signatures have changed, but it should be fairly
easy to see what adjustments must be made to existing code to sync up
with the new API. You can find more detail in the API Changes section.

Going forward Searchable will be kept for convenience only and may
be changed between minor releases without any deprecation
process. It is not recommended that you implement it, but rather extend
(Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)

* LUCENE-1422, LUCENE-1693: The new Attribute based TokenStream API (see below)
has some backwards breaks in rare cases. We did our best to make the
transition as easy as possible and you are not likely to run into any problems.
If your tokenizers still implement next(Token) or next(), the calls are
automatically wrapped. The indexer and query parser use the new API
(e.g. use incrementToken() calls). All core TokenStreams are implemented using
the new API. You can mix old and new API style TokenFilters/TokenStream.
Problems only occur when you have done the following:
You have overridden next(Token) or next() in one of the non-abstract core
TokenStreams/-Filters. These classes should normally be final, but some
of them are not. In this case, next(Token)/next() would never be called.
To fail early with a hard compile/runtime error, the next(Token)/next()
methods in these TokenStreams/-Filters were made final in this release.
(Michael Busch, Uwe Schindler)

* LUCENE-1763: MergePolicy now requires an IndexWriter instance to
be passed upon instantiation. As a result, IndexWriter was removed
as a method argument from all MergePolicy methods. (Shai Erera via

* LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
compat break and caused custom SpanQuery implementations to fail at runtime
in a variety of ways. This issue attempts to remedy things by causing
a compile time break on custom SpanQuery implementations and removing
the PayloadSpans class, with its functionality now moved to Spans. To
help in alleviating future back compat pain, Spans has been changed from
an interface to an abstract class.
(Hugh Cayless, Mark Miller)

* LUCENE-1808: Query.createWeight has been changed from protected to
public. This will be a back compat break if you have overridden this
method - but you are likely already affected by the LUCENE-1693 (make Weight
abstract rather than an interface) back compat break if you have overridden
Query.createWeight, so we have taken the opportunity to make this change.
(Tim Smith, Shai Erera via Mark Miller)

* LUCENE-1708 - IndexReader.document() no longer checks if the document is
deleted. You can call IndexReader.isDeleted(n) prior to calling document(n).
(Shai Erera via Mike McCandless)

Changes in runtime behavior

* LUCENE-1424: QueryParser now by default uses constant score auto
rewriting when it generates a WildcardQuery and PrefixQuery (it
already does so for TermRangeQuery, as well). Call
setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)
to revert to slower BooleanQuery rewriting method. (Mark Miller via Mike

* LUCENE-1575: As of 2.9, the core collectors as well as
IndexSearcher's search methods that return top N results, no
longer filter documents with scores <= 0.0. If you rely on this
functionality you can use PositiveScoresOnlyCollector like this:
2086
TopDocsCollector tdc = new TopScoreDocCollector(10);
2087
Collector c = new PositiveScoresOnlyCollector(tdc);
2088
searcher.search(query, c);
2089
TopDocs hits = tdc.topDocs();

* LUCENE-1604: IndexReader.norms(String field) is now allowed to
return null if the field has no norms, as long as you've
previously called IndexReader.setDisableFakeNorms(true). This
setting now defaults to false (to preserve the fake norms back
compatible behavior) but in 3.0 will be hardwired to true. (Shon
Vella via Mike McCandless).

* LUCENE-1624: If you open IndexWriter with create=true and
autoCommit=false on an existing index, IndexWriter no longer
writes an empty commit when it's created. (Paul Taylor via Mike

* LUCENE-1593: When you call Sort() or Sort.setSort(String field,
boolean reverse), the resulting SortField array no longer ends
with SortField.FIELD_DOC (it was unnecessary as Lucene breaks ties
internally by docID). (Shai Erera via Michael McCandless)

* LUCENE-1542: When the first token(s) have 0 position increment,
IndexWriter used to incorrectly record the position as -1, if no
payload is present, or Integer.MAX_VALUE if a payload is present.
This caused positional queries to fail to match. The bug is now
fixed, but if your app relies on the buggy behavior then you must
call IndexWriter.setAllowMinus1Position(). That API is deprecated
so you must fix your application, and rebuild your index, to not
rely on this behavior by the 3.0 release of Lucene. (Jonathan
Mamou, Mark Miller via Mike McCandless)
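
The position arithmetic behind this entry can be illustrated with a small sketch. It assumes positions are a running sum of position increments that starts one below zero, which reproduces the -1 described above for the no-payload case; the payload case is not modelled. The class, method, and parameter names are hypothetical and are not Lucene API.

```java
// Hypothetical illustration, not Lucene code: assigns a position to each
// token from its position increment.
class PositionAssigner {
    // allowMinus1Position = true mimics the old (buggy) behavior in which
    // leading tokens with increment 0 were recorded at position -1.
    static int[] positions(int[] increments, boolean allowMinus1Position) {
        int[] out = new int[increments.length];
        int position = -1; // one before the first valid position
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            if (position < 0 && !allowMinus1Position) {
                position = 0; // fixed behavior: never record a negative position
            }
            out[i] = position;
        }
        return out;
    }
}
```

With increments {0, 0, 1}, the old behavior yields positions -1, -1, 0 and the fixed behavior yields 0, 0, 1, which is why an index built under the old behavior has to be rebuilt.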

* LUCENE-1715: Finalizers have been removed from the 4 core classes
that still had them, since they will cause GC to take longer, thus
tying up memory for longer, and at best they mask buggy app code.
DirectoryReader (returned from IndexReader.open) & IndexWriter
previously released the write lock during finalize.
SimpleFSDirectory.FSIndexInput closed the descriptor in its
finalizer, and NativeFSLock released the lock. It's possible
applications will be affected by this, but only if the application
is failing to close reader/writers. (Brian Groose via Mike

* LUCENE-1717: Fixed IndexWriter to account for RAM usage of
buffered deletions. (Mike McCandless)

* LUCENE-1727: Ensure that fields are stored & retrieved in the
exact order in which they were added to the document. This was
true in all Lucene releases before 2.3, but was broken in 2.3 and
2.4, and is now fixed in 2.9. (Mike McCandless)

* LUCENE-1678: The addition of Analyzer.reusableTokenStream
accidentally broke back compatibility of external analyzers that
subclassed core analyzers that implemented tokenStream but not
reusableTokenStream. This is now fixed, such that if
reusableTokenStream is invoked on such a subclass, that method
will forcefully fall back to tokenStream. (Mike McCandless)

* LUCENE-1801: Token.clear() and Token.clearNoTermBuffer() now also clear
startOffset, endOffset and type. This is not likely to affect any
Tokenizer chains, as Tokenizers normally always set these three values.
This change was made to conform to the new AttributeImpl.clear() and
AttributeSource.clearAttributes(), so that clearing works identically
whether Token is used as the single AttributeImpl for all attributes or
the 6 separate AttributeImpls are used. (Uwe Schindler, Michael Busch)

* LUCENE-1483: When searching over multiple segments, a new Scorer is now created
for each segment. Searching has been telescoped out a level and IndexSearcher now
operates much like MultiSearcher does. The Weight is created only once for the top
level Searcher, but each Scorer is passed a per-segment IndexReader. This will
result in doc ids in the Scorer being internal to the per-segment IndexReader. It
has always been outside of the API to count on a given IndexReader to contain every
doc id in the index - and if you have been ignoring MultiSearcher in your custom code
and counting on this fact, you will find your code no longer works correctly. If a
custom Scorer implementation uses any caches/filters that rely on being based on the
top level IndexReader, it will need to be updated to correctly use contextless
caches/filters, e.g., you can't count on the IndexReader to contain any given doc id or
all of the doc ids. (Mark Miller, Mike McCandless)
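
A minimal sketch of the doc id arithmetic implied above. It assumes every segment is non-empty and is assigned a base equal to the number of documents in the segments before it. The class and its methods are hypothetical helpers for illustration, not Lucene API, and they do no range checking.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: converts between top-level
// doc ids and per-segment (local) doc ids.
class SegmentDocIdMapper {
    private final int[] docBases; // docBases[i] = first top-level id in segment i

    SegmentDocIdMapper(int[] segmentSizes) {
        docBases = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            docBases[i] = base;
            base += segmentSizes[i];
        }
    }

    // A per-segment Scorer sees only local ids; adding the base gives the top-level id.
    int toTopLevel(int segment, int localDocId) {
        return docBases[segment] + localDocId;
    }

    // Index of the segment whose base is the largest one <= topLevelDocId.
    int segmentOf(int topLevelDocId) {
        int idx = Arrays.binarySearch(docBases, topLevelDocId);
        return idx >= 0 ? idx : -idx - 2;
    }

    int toLocal(int topLevelDocId) {
        return topLevelDocId - docBases[segmentOf(topLevelDocId)];
    }
}
```

A cache keyed on top-level ids would need a conversion like toTopLevel before lookup; a cache built per segment can use the local id directly.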

* LUCENE-1846: DateTools now uses the US locale to format the numbers in its
date/time strings instead of the default locale. For most locales there will
be no change in the index format, as DateFormatSymbols is using ASCII digits.
The usage of the US locale is important to guarantee correct ordering of
generated terms. (Uwe Schindler)
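
The effect of pinning the locale can be sketched with plain JDK classes. This is an illustration of the idea only, not the DateTools implementation; the class name, method name, and pattern below are assumptions chosen for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative only: formats timestamps so that string order matches time order.
class SortableDateStrings {
    static String toSortableString(long millis) {
        // Locale.US guarantees ASCII digits regardless of the JVM default locale.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        return fmt.format(new Date(millis));
    }
}
```

Because every field is zero-padded and written with ASCII digits, comparing two such strings lexicographically gives the same result as comparing the underlying timestamps.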

* LUCENE-1860: MultiTermQuery now defaults to
CONSTANT_SCORE_AUTO_REWRITE_DEFAULT rewrite method (previously it
was SCORING_BOOLEAN_QUERY_REWRITE). This means that PrefixQuery
and WildcardQuery will now produce constant score for all matching
docs, equal to the boost of the query. (Mike McCandless)

API Changes

* LUCENE-1419: Add expert API to set custom indexing chain. This API is
package-protected for now, so we don't have to officially support it.
Yet, it will give us the possibility to try out different consumers
in the chain. (Michael Busch)

* LUCENE-1427: DocIdSet.iterator() is now allowed to throw
IOException. (Paul Elschot, Mike McCandless)

* LUCENE-1422, LUCENE-1693: New TokenStream API that uses a new class called
AttributeSource instead of the Token class, which is now a utility class that
holds common Token attributes. All attributes that the Token class had have
been moved into separate classes: TermAttribute, OffsetAttribute,
PositionIncrementAttribute, PayloadAttribute, TypeAttribute and FlagsAttribute.
The new API is much more flexible; it allows Attributes to be combined
arbitrarily and custom Attributes to be defined. The new API has the same
performance as the old next(Token) approach. For conformance with this new
API Tee-/SinkTokenizer was deprecated and replaced by a new TeeSinkTokenFilter.
(Michael Busch, Uwe Schindler; additional contributions and bug fixes by
Daniel Shane, Doron Cohen)

* LUCENE-1467: Add nextDoc() and next(int) methods to OpenBitSetIterator.
These methods can be used to avoid additional calls to doc().

* LUCENE-1468: Deprecate Directory.list(), which sometimes (in
FSDirectory) filters out files that don't look like index files, in
favor of new Directory.listAll(), which does no filtering. Also,
listAll() will never return null; instead, it throws an IOException
(or subclass). Specifically, FSDirectory.listAll() will throw the
newly added NoSuchDirectoryException if the directory does not
exist. (Marcel Reutegger, Mike McCandless)

* LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
you to record an opaque commitUserData (maps String -> String) into
the commit written by IndexReader. This matches IndexWriter's
commit methods. (Jason Rutherglen via Mike McCandless)

* LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
enable compressing & decompressing binary content, external to
Lucene's indexing. Deprecated Field.Store.COMPRESS.

* LUCENE-1561: Renamed Field.omitTf to Field.omitTermFreqAndPositions
(Otis Gospodnetic via Mike McCandless)

* LUCENE-1500: Added new InvalidTokenOffsetsException to Highlighter methods
to denote issues when offsets in TokenStream tokens exceed the length of the
provided text. (Mark Harwood)

* LUCENE-1575, LUCENE-1483: HitCollector is now deprecated in favor of
a new Collector abstract class. For easy migration, people can use
HitCollectorWrapper which translates (wraps) HitCollector into
Collector. Note that this class is also deprecated and will be
removed when HitCollector is removed. Also TimeLimitedCollector
is deprecated in favor of the new TimeLimitingCollector which
extends Collector. (Shai Erera, Mark Miller, Mike McCandless)

* LUCENE-1592: The method TermEnum.skipTo() was deprecated, because
it is used nowhere in core/contrib and there is only a very inefficient
default implementation available. If you want to position a TermEnum
to another Term, create a new one using IndexReader.terms(Term).

* LUCENE-1621: MultiTermQuery.getTerm() has been deprecated as it does
not make sense for all subclasses of MultiTermQuery. Check individual
subclasses to see if they support getTerm(). (Mark Miller)

* LUCENE-1636: Make TokenFilter.input final so it's set only
once. (Wouter Heijke, Uwe Schindler via Mike McCandless).

* LUCENE-1658, LUCENE-1451: Renamed FSDirectory to SimpleFSDirectory
(but left an FSDirectory base class). Added an FSDirectory.open
static method to pick a good default FSDirectory implementation
given the OS. FSDirectories should now be instantiated using
FSDirectory.open or with public constructors rather than
FSDirectory.getDirectory(), which has been deprecated.
(Michael McCandless, Uwe Schindler, yonik)

* LUCENE-1665: Deprecate SortField.AUTO, to be removed in 3.0.
Instead, when sorting by field, the application should explicitly
state the type of the field. (Mike McCandless)

* LUCENE-1660: StopFilter, StandardAnalyzer, StopAnalyzer now
require up front specification of enablePositionIncrement (Mike

* LUCENE-1614: DocIdSetIterator's next() and skipTo() were deprecated in favor
of the new nextDoc() and advance(). The new methods return the doc ID they
landed on, saving an extra call to doc() in most cases.
For easy migration of the code, you can change the calls to next() to
nextDoc() != DocIdSetIterator.NO_MORE_DOCS and similarly for skipTo().
However it is advised that you take advantage of the returned doc ID and not
call doc() following those two.
Also, doc() was deprecated in favor of docID(). docID() should return -1 or
NO_MORE_DOCS if nextDoc/advance were not called yet, or NO_MORE_DOCS if the
iterator is exhausted. Otherwise it should return the current doc ID.
(Shai Erera via Mike McCandless)
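
The iteration contract described above can be sketched with a stand-in class. This is not the Lucene DocIdSetIterator; the class name and the array-backed storage are assumptions made for the example, and NO_MORE_DOCS is modelled as Integer.MAX_VALUE.

```java
// Stand-in for the nextDoc()/advance()/docID() contract; not the Lucene class.
class SortedIntDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // doc ids in ascending order
    private int pos = -1;
    private int doc = -1;     // -1 until nextDoc() or advance() is first called

    SortedIntDocIdIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    // Current doc id, -1 before iteration starts, NO_MORE_DOCS once exhausted.
    int docID() {
        return doc;
    }

    // Moves to the next doc and returns it, so no follow-up docID() call is needed.
    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        return doc;
    }

    // Moves to the first doc id >= target (always beyond the current doc).
    int advance(int target) {
        int d;
        while ((d = nextDoc()) < target) {
            // keep scanning; NO_MORE_DOCS ends the loop for any target
        }
        return d;
    }
}
```

A migrated loop then reads `while ((d = it.nextDoc()) != SortedIntDocIdIterator.NO_MORE_DOCS) { ... }`, using the returned value instead of calling docID() again.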

* LUCENE-1672: All ctors/opens and other methods using String/File to
specify the directory in IndexReader, IndexWriter, and IndexSearcher
were deprecated. You should instantiate the Directory manually beforehand
and pass it to these classes (LUCENE-1451, LUCENE-1658).

* LUCENE-1407: Move RemoteSearchable, RemoteCachingWrapperFilter out
of Lucene's core into new contrib/remote package. Searchable no
longer extends java.rmi.Remote (Simon Willnauer via Mike

* LUCENE-1677: The global properties
org.apache.lucene.SegmentReader.class and
ReadOnlySegmentReader.class are now deprecated, to be removed in
3.0. src/gcj/* has been removed. (Earwin Burrfoot via Mike

* LUCENE-1673: Deprecated NumberTools in favour of the new
NumericRangeQuery and its new indexing format for numeric or
date values. (Uwe Schindler)

* LUCENE-1630, LUCENE-1771: Weight is now an abstract class, and adds
a scorer(IndexReader, boolean /* scoreDocsInOrder */, boolean /*
topScorer */) method instead of scorer(IndexReader). IndexSearcher uses
this method to obtain a scorer matching the capabilities of the Collector
with respect to docID ordering. Some Scorers (like BooleanScorer) are much more
efficient if out-of-order document scoring is allowed by a Collector.
Collector must now implement acceptsDocsOutOfOrder. If you write a
Collector which does not care about doc ID order, it is recommended
that you return true. Weight has a scoresDocsOutOfOrder method, which by
default returns false. If you create a Weight which will score documents
out of order if requested, you should override that method to return true.
BooleanQuery's setAllowDocsOutOfOrder and getAllowDocsOutOfOrder have been
deprecated as they are not needed anymore. BooleanQuery will now score docs
out of order when used with a Collector that can accept docs out of order.
Finally, Weight#explain now takes a sub-reader and sub-docID, rather than
a top level reader and docID.
(Shai Erera, Chris Hostetter, Martin Ruckli, Mark Miller via Mike McCandless)

* LUCENE-1466, LUCENE-1906: Added CharFilter and MappingCharFilter, which allow
chaining & mapping of characters before tokenizers run. CharStream (subclass of
Reader) is the base class for custom java.io.Readers that support offset
correction. Tokenizers got an additional method correctOffset() that is passed
down to the underlying CharStream if input is a subclass of CharStream/-Filter.
(Koji Sekiguchi via Mike McCandless, Uwe Schindler)

* LUCENE-1703: Add IndexWriter.waitForMerges. (Tim Smith via Mike

* LUCENE-1625: CheckIndex's programmatic API now returns separate
classes detailing the status of each component in the index, and
includes more detailed status than previously. (Tim Smith via

* LUCENE-1713: Deprecated RangeQuery and RangeFilter and renamed to
TermRangeQuery and TermRangeFilter. TermRangeQuery is in constant
score auto rewrite mode by default. The new classes also have new
ctors taking field and term ranges as Strings (see also
LUCENE-1424). (Uwe Schindler)

* LUCENE-1609: The termInfosIndexDivisor must now be specified
up-front when opening the IndexReader. Attempts to call
IndexReader.setTermInfosIndexDivisor will hit an
UnsupportedOperationException. This was done to enable removal of
all synchronization in TermInfosReader, which previously could
cause threads to pile up in certain cases. (Dan Rosher via Mike

* LUCENE-1688: Deprecate the static final String stop word array in
StopAnalyzer and replace it with an immutable implementation of
CharArraySet. (Simon Willnauer via Mark Miller)

* LUCENE-1742: SegmentInfos, SegmentInfo and SegmentReader have been
made public as expert, experimental APIs. These APIs may suddenly
change from release to release (Jason Rutherglen via Mike

* LUCENE-1754: QueryWeight.scorer() can return null if no documents
are going to be matched by the query. Similarly,
Filter.getDocIdSet() can return null if no documents are going to
be accepted by the Filter. Note that these methods may return null
but are not required to: they can instead return a Scorer that
matches no documents or a DocIdSet that rejects all documents.
This is already the behavior of some QueryWeight/Filter
implementations, and is
documented here just for emphasis. (Shai Erera via Mike

* LUCENE-1705: Added IndexWriter.deleteAllDocuments. (Tim Smith via

* LUCENE-1460: Changed TokenStreams/TokenFilters in contrib to
use the new TokenStream API. (Robert Muir, Michael Busch)

* LUCENE-1748: LUCENE-1001 introduced PayloadSpans, but this was a back
compat break and caused custom SpanQuery implementations to fail at runtime
in a variety of ways. This issue attempts to remedy things by causing
a compile time break on custom SpanQuery implementations and removing
the PayloadSpans class, with its functionality now moved to Spans. To
help in alleviating future back compat pain, Spans has been changed from
an interface to an abstract class.
(Hugh Cayless, Mark Miller)

* LUCENE-1808: Query.createWeight has been changed from protected to
public. (Tim Smith, Shai Erera via Mark Miller)

* LUCENE-1826: Add constructors that take AttributeSource and
AttributeFactory to all Tokenizer implementations.

* LUCENE-1847: The Similarity#idf methods for both a Term and a Term
Collection have been deprecated. New versions that return an
IDFExplanation have been
added. (Yasoja Seneviratne, Mike McCandless, Mark Miller)

* LUCENE-1877: Made NativeFSLockFactory the default for
the new FSDirectory API (open(), FSDirectory subclass ctors).
All FSDirectory system properties were deprecated and all lock
implementations use no lock prefix if the locks are stored inside
the index directory. Because the deprecated String/File ctors of
IndexWriter and IndexReader (LUCENE-1672) and FSDirectory.getDirectory()
still use the old SimpleFSLockFactory, while the new API uses
NativeFSLockFactory, we strongly recommend not mixing the deprecated
and new APIs. (Uwe Schindler, Mike McCandless)

* LUCENE-1911: Added a new method isCacheable() to DocIdSet. This method
should return true if the underlying implementation does not use disk
I/O and is fast enough to be directly cached by CachingWrapperFilter.
OpenBitSet, SortedVIntList, and DocIdBitSet are such candidates.
The default implementation of the abstract DocIdSet class returns false.
In this case, CachingWrapperFilter copies the DocIdSetIterator into an
OpenBitSet for caching. (Uwe Schindler, Thomas Becker)

Bug fixes

* LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
implementation - Leads to Solr Cache misses.
(Todd Feak, Mark Miller via yonik)

* LUCENE-1327: Fix TermSpans#skipTo() to behave as specified in javadocs
of Terms#skipTo(). (Michael Busch)

* LUCENE-1573: Do not ignore InterruptedException (caused by
Thread.interrupt()) nor enter deadlock/spin loop. Now, an interrupt
will cause a RuntimeException to be thrown. In 3.0 we will change
public APIs to throw InterruptedException. (Jeremy Volkman via

* LUCENE-1590: Fixed so that stored-only Field instances do not change the
value of omitNorms, omitTermFreqAndPositions in FieldInfo; when you
retrieve such fields they will now have omitNorms=true and
omitTermFreqAndPositions=false (though these values are unused).
(Uwe Schindler via Mike McCandless)

* LUCENE-1587: RangeQuery#equals() could consider a RangeQuery
without a collator equal to one with a collator.
(Mark Platvoet via Mark Miller)

* LUCENE-1600: Don't call String.intern unnecessarily in some cases
when loading documents from the index. (P Eger via Mike

* LUCENE-1611: Fix case where OutOfMemoryError in IndexWriter
could cause "infinite merging" to happen. (Christiaan Fluit via

* LUCENE-1623: Properly handle back-compatibility of 2.3.x indexes that
contain field names with non-ascii characters. (Mike Streeton via

* LUCENE-1593: MultiSearcher and ParallelMultiSearcher did not break ties (in
sort) by doc Id in a consistent manner (i.e., if SortField.FIELD_DOC was used vs.
when it wasn't). (Shai Erera via Michael McCandless)

* LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
the segment's deletion count to be incorrect. (Mike McCandless)

* LUCENE-1542: When the first token(s) have 0 position increment,
IndexWriter used to incorrectly record the position as -1, if no
payload is present, or Integer.MAX_VALUE if a payload is present.
This caused positional queries to fail to match. The bug is now
fixed, but if your app relies on the buggy behavior then you must
call IndexWriter.setAllowMinus1Position(). That API is deprecated
so you must fix your application, and rebuild your index, to not
rely on this behavior by the 3.0 release of Lucene. (Jonathan
Mamou, Mark Miller via Mike McCandless)

* LUCENE-1658: Fixed MMapDirectory to correctly throw IOExceptions
on EOF, removed numeric overflow possibilities and added support
for a hack to unmap the buffers on closing IndexInput.

* LUCENE-1681: Fix infinite loop caused by a call to DocValues methods
getMinValue, getMaxValue, getAverageValue. (Simon Willnauer via Mark Miller)

* LUCENE-1599: Add clone support for SpanQuerys. SpanRegexQuery counts
on this functionality and does not work correctly without it.
(Billow Gao, Mark Miller)

* LUCENE-1718: Fix termInfosIndexDivisor to carry over to reopened
readers (Mike McCandless)

* LUCENE-1583: SpanOrQuery skipTo() doesn't always move forwards as Spans
documentation indicates it should. (Moti Nisenson via Mark Miller)

* LUCENE-1566: Sun JVM Bug
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546 causes
invalid OutOfMemoryError when reading too many bytes at once from
a file on 32bit JVMs that have a large maximum heap size. This
fix adds set/getReadChunkSize to FSDirectory so that large reads
are broken into chunks, to work around this JVM bug. On 32bit
JVMs the default chunk size is 100 MB; on 64bit JVMs, which don't
show the bug, the default is Integer.MAX_VALUE. (Simon Willnauer
via Mike McCandless)
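
The chunking idea can be sketched with plain JDK I/O. This is an illustration of the technique under the assumption that the file is read through a RandomAccessFile; it is not the FSDirectory code, and the class, method, and parameter names are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative only: fills a buffer using reads no larger than chunkSize.
class ChunkedFileReader {
    static void readFully(RandomAccessFile file, byte[] dest, int offset, int len, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int done = 0;
        while (done < len) {
            // Never ask the OS for more than chunkSize bytes in a single call.
            int toRead = Math.min(chunkSize, len - done);
            int n = file.read(dest, offset + done, toRead);
            if (n < 0) {
                throw new EOFException("read past EOF");
            }
            done += n;
        }
    }
}
```

Passing Integer.MAX_VALUE as chunkSize makes each request as large as the remaining length, which corresponds to the unchunked behavior described for 64bit JVMs.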

* LUCENE-1448: Added TokenStream.end() to perform end-of-stream
operations (i.e., to return the end offset of the tokenization).
This is important when multiple fields with the same name are added
to a document, to ensure offsets recorded in term vectors for all
of the instances are correct.
(Mike McCandless, Mark Miller, Michael Busch)

* LUCENE-1805: CloseableThreadLocal did not allow a null Object in get(),
although it does allow it in set(Object). Fix get() to not assert the object
is not null. (Shai Erera via Mike McCandless)

* LUCENE-1801: Changed all Tokenizers or TokenStreams (in core/contrib)
that are the source of Tokens to always call
AttributeSource.clearAttributes() first. (Uwe Schindler)

* LUCENE-1819: MatchAllDocsQuery.toString(field) should produce output
that is parsable by the QueryParser. (John Wang, Mark Miller)

* LUCENE-1836: Fix localization bug in the new query parser and add
new LocalizedTestCase as base class for localization junit tests.
(Robert Muir, Uwe Schindler via Michael Busch)

* LUCENE-1847: PhraseQuery/TermQuery/SpanQuery use IndexReader specific stats
in their Weight#explain methods - these stats should be corpus wide.
(Yasoja Seneviratne, Mike McCandless, Mark Miller)

* LUCENE-1885: Fix the bug that NativeFSLock.isLocked() did not work,
if the lock was obtained by another NativeFSLock(Factory) instance.
Because of this IndexReader.isLocked() and IndexWriter.isLocked() did
not work correctly. (Uwe Schindler)

* LUCENE-1899: Fix O(N^2) CPU cost when setting docIDs in order in an
OpenBitSet, due to an inefficiency in how the underlying storage is
reallocated. (Nadav Har'El via Mike McCandless)
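
Why reallocation strategy matters here can be shown with a small sketch. It uses geometric (doubling) growth, a common way to keep in-order insertion linear overall; whether OpenBitSet uses exactly this policy is an assumption, and the class below is hypothetical rather than the Lucene implementation.

```java
import java.util.Arrays;

// Hypothetical growable bit set, for illustration only.
class GrowableBitSet {
    private long[] words = new long[1];
    int reallocations = 0; // exposed only to make the growth behavior observable

    void set(int index) {
        int word = index >> 6; // 64 bits per long
        if (word >= words.length) {
            // Grow to at least double the current size. Growing to exactly
            // word + 1 would copy the whole array on nearly every new word,
            // giving O(N^2) total work when ids are set in increasing order.
            int newLen = Math.max(word + 1, words.length * 2);
            words = Arrays.copyOf(words, newLen);
            reallocations++;
        }
        words[word] |= 1L << (index & 63);
    }

    boolean get(int index) {
        int word = index >> 6;
        return word < words.length && (words[word] & (1L << (index & 63))) != 0;
    }
}
```

With doubling, the number of array copies grows logarithmically with the highest id set, so the total copying cost stays proportional to the final size.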

* LUCENE-1918: Fixed cases where a ParallelReader would
generate exceptions on being passed to
IndexWriter.addIndexes(IndexReader[]). First case was when the
ParallelReader was empty. Second case was when the ParallelReader
used to contain documents with TermVectors, but all such documents
have been deleted. (Christian Kohlschütter via Mike McCandless)

New features

* LUCENE-1411: Added expert API to open an IndexWriter on a prior
commit, obtained from IndexReader.listCommits. This makes it
possible to roll back changes to an index even after you've closed
the IndexWriter that made the changes, assuming you are using an
IndexDeletionPolicy that keeps past commits around. This is useful
when building transactional support on top of Lucene. (Mike

* LUCENE-1382: Add an optional arbitrary Map (String -> String)
"commitUserData" to IndexWriter.commit(), which is stored in the
segments file and is then retrievable via
IndexReader.getCommitUserData instance and static methods.
(Shalin Shekhar Mangar via Mike McCandless)

* LUCENE-1420: Similarity now has a computeNorm method that allows
custom Similarity classes to override how norm is computed. It's
provided a FieldInvertState instance that contains details from
inverting the field. The default impl is boost *
lengthNorm(numTerms), to be backwards compatible. Also added
{set/get}DiscountOverlaps to DefaultSimilarity, to control whether
overlapping tokens (tokens with 0 position increment) should be
counted in lengthNorm. (Andrzej Bialecki via Mike McCandless)

* LUCENE-1424: Moved constant score query rewrite capability into
MultiTermQuery, allowing TermRangeQuery, PrefixQuery and WildcardQuery
to switch between constant-score rewriting or BooleanQuery
expansion rewriting via a new setRewriteMethod method.
Deprecated ConstantScoreRangeQuery (Mark Miller via Mike

* LUCENE-1461: Added FieldCacheRangeFilter, a RangeFilter for
single-term fields that uses FieldCache to compute the filter. If
your documents all have a single term for a given field, and you
need to create many RangeFilters with varying lower/upper bounds,
then this is likely a much faster way to create the filters than
RangeFilter. FieldCacheRangeFilter allows ranges on all data types
that FieldCache supports (term ranges, byte, short, int, long, float, double).
However, it comes at the expense of added RAM consumption and slower
first-time usage due to populating the FieldCache. It also does not
support collation (Tim Sturge, Matt Ericson via Mike McCandless and

* LUCENE-1296: Add protected method CachingWrapperFilter.docIdSetToCache
to allow subclasses to choose which DocIdSet implementation to use
(Paul Elschot via Mike McCandless)

* LUCENE-1390: Added ASCIIFoldingFilter, a Filter that converts
alphabetic, numeric, and symbolic Unicode characters which are not in
the first 127 ASCII characters (the "Basic Latin" Unicode block) into
their ASCII equivalents, if one exists. ISOLatin1AccentFilter, which
handles a subset of this filter, has been deprecated.
(Andi Vajda, Steven Rowe via Mark Miller)

* LUCENE-1478: Added new SortField constructor allowing you to
specify a custom FieldCache parser to generate numeric values from
terms for a field. (Uwe Schindler via Mike McCandless)

* LUCENE-1528: Add support for Ideographic Space to the queryparser.
(Luis Alves via Michael Busch)

* LUCENE-1487: Added FieldCacheTermsFilter, to filter by multiple
terms on single-valued fields. The filter loads the FieldCache
for the field the first time it's called, and subsequent usage of
that field, even with different Terms in the filter, is fast.
(Tim Sturge, Shalin Shekhar Mangar via Mike McCandless).

* LUCENE-1314: Add clone(), clone(boolean readOnly) and
reopen(boolean readOnly) to IndexReader. Cloning an IndexReader
gives you a new reader which you can make changes to (deletions,
norms) without affecting the original reader. Now, with clone or
reopen you can change the readOnly setting of the original reader. (Jason
Rutherglen, Mike McCandless)

* LUCENE-1506: Added FilteredDocIdSet, an abstract class which you
subclass to implement the "match" method to accept or reject each
docID. Unlike ChainedFilter (under contrib/misc),
FilteredDocIdSet never requires you to materialize the full
bitset. Instead, match() is called on demand per docID. (John
Wang via Mike McCandless)
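
The on-demand matching idea can be sketched with a stand-in iterator. This is not the Lucene FilteredDocIdSet; the class name, the array-backed inner set, and the NO_MORE_DOCS value are assumptions made for the example.

```java
// Stand-in type for illustration; not the Lucene class.
abstract class FilteringIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] inner; // ascending doc ids from the wrapped set
    private int pos = -1;

    FilteringIterator(int[] innerSortedDocs) {
        this.inner = innerSortedDocs;
    }

    // Called lazily, once per candidate doc id the iteration reaches.
    protected abstract boolean match(int docId);

    int nextDoc() {
        while (++pos < inner.length) {
            if (match(inner[pos])) {
                return inner[pos];
            }
        }
        pos = inner.length; // remain exhausted on further calls
        return NO_MORE_DOCS;
    }
}
```

No bitset of accepted documents is ever built: a caller that stops early pays only for the candidates it actually visited.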

* LUCENE-1398: Add ReverseStringFilter to contrib/analyzers, a filter
to reverse the characters in each token. (Koji Sekiguchi via yonik)

* LUCENE-1551: Add expert IndexReader.reopen(IndexCommit) to allow
efficiently opening a new reader on a specific commit, sharing
resources with the original reader. (Torin Danil via Mike

* LUCENE-1434: Added org.apache.lucene.util.IndexableBinaryStringTools,
to encode byte[] as String values that are valid terms, and
maintain sort order of the original byte[] when the bytes are
interpreted as unsigned. (Steven Rowe via Mike McCandless)

* LUCENE-1543: Allow MatchAllDocsQuery to optionally use norms from
a specific field to set the score for a document. (Karl Wettin
via Mike McCandless)

* LUCENE-1586: Add IndexReader.getUniqueTermCount(). (Mike
McCandless via Derek)

* LUCENE-1516: Added "near real-time search" to IndexWriter, via a
new expert getReader() method. This method returns a reader that
searches the full index, including any uncommitted changes in the
current IndexWriter session. This should result in a faster
turnaround than the normal approach of committing the changes and
then reopening a reader. (Jason Rutherglen via Mike McCandless)

* LUCENE-1603: Added new MultiTermQueryWrapperFilter, to wrap any
MultiTermQuery as a Filter. Also made some improvements to
MultiTermQuery: return DocIdSet.EMPTY_DOCIDSET if there are no
terms in the enum; track the total number of terms it visited
during rewrite (getTotalNumberOfTerms). FilteredTermEnum is also
more friendly to subclassing. (Uwe Schindler via Mike McCandless)

* LUCENE-1605: Added BitVector.subset(). (Jeremy Volkman via Mike

* LUCENE-1618: Added FileSwitchDirectory that enables files with
specified extensions to be stored in a primary directory and the
rest of the files to be stored in the secondary directory. For
example, this can be useful for the large doc-store (stored
fields, term vectors) files in FSDirectory and the rest of the
index files in a RAMDirectory. (Jason Rutherglen via Mike

* LUCENE-1494: Added FieldMaskingSpanQuery which can be used to
cross-correlate Spans from different fields.
(Paul Cowan and Chris Hostetter)

* LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
deletions into account when considering merges. (Yasuhiro Matsuda
via Mike McCandless)

* LUCENE-1550: Added new n-gram based String distance measure for spell checking.
See the Javadocs for NGramDistance.java for a reference paper on why
this is helpful (Tom Morton via Grant Ingersoll)

* LUCENE-1470, LUCENE-1582, LUCENE-1602, LUCENE-1673, LUCENE-1701, LUCENE-1712:
Added NumericRangeQuery and NumericRangeFilter, a fast alternative to
RangeQuery/RangeFilter for numeric searches. They depend on a specific
structure of terms in the index that can be created by indexing
using the new NumericField or NumericTokenStream classes. NumericField
can only be used for indexing and optionally stores the values as
a string representation in the doc store. Documents returned from
IndexReader/IndexSearcher will return only the String value using
the standard Fieldable interface. NumericFields can be sorted on
and loaded into the FieldCache. (Uwe Schindler, Yonik Seeley,

* LUCENE-1405: Added support for Ant resource collections in contrib/ant
<index> task. (Przemyslaw Sztoch via Erik Hatcher)

* LUCENE-1699: Allow setting a TokenStream on Field/Fieldable for indexing
in conjunction with any other ways to specify stored field values,
currently binary or string values. (yonik)

* LUCENE-1701: Made the standard FieldCache.Parsers public and added
parsers for fields generated using NumericField/NumericTokenStream.
All standard parsers now also implement Serializable and enforce
their singleton status. (Uwe Schindler, Mike McCandless)

* LUCENE-1741: User configurable maximum chunk size in MMapDirectory.
On 32 bit platforms, the address space can be very fragmented, so
one big ByteBuffer for the whole file may not fit into address space.
(Eks Dev via Uwe Schindler)
2701
* LUCENE-1644: Enable 4 rewrite modes for queries deriving from
MultiTermQuery (WildcardQuery, PrefixQuery, TermRangeQuery,
NumericRangeQuery): CONSTANT_SCORE_FILTER_REWRITE first creates a
filter and then assigns constant score (boost) to docs;
CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE creates a BooleanQuery but
uses a constant score (boost); SCORING_BOOLEAN_QUERY_REWRITE also
creates a BooleanQuery but keeps the BooleanQuery's scores;
CONSTANT_SCORE_AUTO_REWRITE tries to pick the most performant
constant-score rewrite method. (Mike McCandless)
* LUCENE-1448: Added TokenStream.end(), to perform end-of-stream
operations. This is currently used to fix offset problems when
multiple fields with the same name are added to a document.
(Mike McCandless, Mark Miller, Michael Busch)
* LUCENE-1776: Add an option to not collect payloads for an ordered
SpanNearQuery. Payloads were not lazily loaded in this case as
the javadocs implied. If you have payloads and want to use an ordered
SpanNearQuery that does not need to use the payloads, you can
disable loading them with a new constructor switch. (Mark Miller)
* LUCENE-1341: Added PayloadNearQuery to enable SpanNearQuery functionality
with payloads (Peter Keegan, Grant Ingersoll, Mark Miller)
* LUCENE-1790: Added PayloadTermQuery to enable scoring of payloads
based on the maximum payload seen for a document.
Slight refactoring of Similarity and other payload queries (Grant Ingersoll, Mark Miller)
* LUCENE-1749: Addition of FieldCacheSanityChecker utility, and
hooks to use it in all existing Lucene Tests. This class can
be used by any application to inspect the FieldCache and provide
diagnostic information about the possibility of inconsistent
FieldCache usage. Namely: FieldCache entries for the same field
with different datatypes or parsers; and FieldCache entries for
the same field in both a reader, and one of its (descendant) sub
(Chris Hostetter, Mark Miller)
* LUCENE-1789: Added utility class
oal.search.function.MultiValueSource to ease the transition to
segment based searching for any apps that directly call
oal.search.function.* APIs. This class wraps any other
ValueSource, but takes care when composite (multi-segment) readers are
passed to not double RAM usage in the FieldCache. (Chris
Hostetter, Mark Miller, Mike McCandless)
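The chunking idea behind LUCENE-1741 above can be sketched as plain address arithmetic. This is a minimal illustration only, assuming a power-of-two chunk size; the class ChunkMath and its methods are hypothetical names and are not part of MMapDirectory.

```java
/** Hypothetical helper: maps a file position to a (chunk, offset) pair. */
final class ChunkMath {
    private final int chunkSizePower;  // chunk size is 2^chunkSizePower bytes
    private final long chunkSizeMask;  // low bits select the offset inside a chunk

    ChunkMath(int chunkSizePower) {
        if (chunkSizePower < 1 || chunkSizePower > 30) {
            throw new IllegalArgumentException("chunkSizePower must be in [1, 30]");
        }
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1;
    }

    /** Number of chunks needed to cover the file; the last one may be partial. */
    int chunkCount(long length) {
        return (int) ((length + chunkSizeMask) >>> chunkSizePower);
    }

    /** Index of the chunk that holds the byte at pos. */
    int chunkIndex(long pos) {
        return (int) (pos >>> chunkSizePower);
    }

    /** Offset of the byte at pos inside its chunk. */
    int chunkOffset(long pos) {
        return (int) (pos & chunkSizeMask);
    }
}
```

Each chunk would then be mapped as its own buffer, so no single mapping has to span the whole file.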
* LUCENE-1427: Fixed QueryWrapperFilter to not waste time computing
scores of the query, since they are just discarded. Also, made it
more efficient (single pass) by not creating & populating an
intermediate OpenBitSet (Paul Elschot, Mike McCandless)
* LUCENE-1443: Performance improvement for OpenBitSetDISI.inPlaceAnd()
(Paul Elschot via yonik)
* LUCENE-1484: Remove synchronization of IndexReader.document() by
using CloseableThreadLocal internally. (Jason Rutherglen via Mike
* LUCENE-1124: Short circuit FuzzyQuery.rewrite when input token length
is small compared to minSimilarity. (Timo Nentwig, Mark Miller)
* LUCENE-1316: MatchAllDocsQuery now avoids the synchronized
IndexReader.isDeleted() call per document, by directly accessing
the underlying deleteDocs BitVector. This improves performance
with non-readOnly readers, especially in a multi-threaded
environment. (Todd Feak, Yonik Seeley, Jason Rutherglen via Mike
* LUCENE-1483: When searching over multiple segments we now visit
each sub-reader one at a time. This speeds up warming, since
FieldCache entries (if required) can be shared across reopens for
those segments that did not change, and also speeds up searches
that sort by relevance or by field values. (Mark Miller, Mike
* LUCENE-1575: The new Collector class decouples collect() from
score computation. Collector.setScorer is called to establish the
current Scorer in-use per segment. Collectors that require the
score should then call Scorer.score() per hit inside
collect(). (Shai Erera via Mike McCandless)
* LUCENE-1596: MultiTermDocs speedup when set with
MultiTermDocs.seek(MultiTermEnum) (yonik)
* LUCENE-1653: Avoid creating a Calendar in every call to
DateTools#dateToString, DateTools#timeToString and
DateTools#round. (Shai Erera via Mark Miller)
* LUCENE-1688: Deprecate static final String stop word array and
replace it with an immutable implementation of CharArraySet.
Removes conversions between Set and array.
(Simon Willnauer via Mark Miller)
* LUCENE-1754: BooleanQuery.queryWeight.scorer() will return null if
it won't match any documents (e.g. if there are no required and
optional scorers, or not enough optional scorers to satisfy
minShouldMatch). (Shai Erera via Mike McCandless)
* LUCENE-1607: To speed up string interning for commonly used
strings, the StringHelper.intern() interface was added with a
default implementation that uses a lockless cache.
(Earwin Burrfoot, yonik)
* LUCENE-1800: QueryParser should use reusable TokenStreams. (yonik)
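The decoupling described in LUCENE-1575 above can be illustrated with a small stand-alone sketch. MiniScorer and MiniCollector are hypothetical stand-ins invented for this example, not the Lucene classes; they only mirror the contract stated in the entry (setScorer per segment, collect per hit, score pulled on demand).

```java
/** Hypothetical stand-in for a scorer: computing a score is the costly part. */
interface MiniScorer {
    float score();
}

/** Hypothetical stand-in for the collector contract in LUCENE-1575. */
interface MiniCollector {
    void setScorer(MiniScorer scorer); // called when the current scorer changes
    void collect(int doc);             // called once per hit, without a score
}

/** Counts hits and never asks for a score, so no scoring work is done. */
final class CountingCollector implements MiniCollector {
    int hits;

    public void setScorer(MiniScorer scorer) {
        // The score is not needed, so the scorer is ignored.
    }

    public void collect(int doc) {
        hits++;
    }
}

/** Tracks the best score, so it pulls the score inside collect(). */
final class MaxScoreCollector implements MiniCollector {
    private MiniScorer scorer;
    float max = Float.NEGATIVE_INFINITY;

    public void setScorer(MiniScorer scorer) {
        this.scorer = scorer;
    }

    public void collect(int doc) {
        max = Math.max(max, scorer.score());
    }
}
```

A collector that only counts or filters hits therefore pays nothing for scoring, while one that ranks by score pays once per hit.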
* LUCENE-1908: Scoring documentation improvements in Similarity javadocs.
(Mark Miller, Shai Erera, Ted Dunning, Jiri Kuhn, Marvin Humphrey, Doron Cohen)
* LUCENE-1872: NumericField javadoc improvements
(Michael McCandless, Uwe Schindler)
* LUCENE-1875: Make TokenStream.end javadoc less confusing.
* LUCENE-1862: Rectified duplicate package level javadocs for
o.a.l.queryParser and o.a.l.analysis.cn.
* LUCENE-1886: Improved hyperlinking in key Analysis javadocs
(Bernd Fondermann via Chris Hostetter)
* LUCENE-1884: massive javadoc and comment cleanup, primarily dealing with
(Robert Muir via Chris Hostetter)
* LUCENE-1898: Switch changes to use bullets rather than numbers and
update changes-to-html script to handle the new format.
(Steven Rowe, Mark Miller)
* LUCENE-1900: Improve Searchable Javadoc.
(Nadav Har'El, Doron Cohen, Marvin Humphrey, Mark Miller)
* LUCENE-1896: Improve Similarity#queryNorm javadocs.
(Jiri Kuhn, Mark Miller)
* LUCENE-1440: Add new targets to build.xml that allow downloading
and executing the junit testcases from an older release for
backwards-compatibility testing. (Michael Busch)
* LUCENE-1446: Add compatibility tag to common-build.xml and run
backwards-compatibility tests in the nightly build. (Michael Busch)
* LUCENE-1529: Properly test "drop-in" replacement of jar with
backwards-compatibility tests. (Mike McCandless, Michael Busch)
* LUCENE-1851: Change 'javacc' and 'clean-javacc' targets to build
and clean contrib/surround files. (Luis Alves via Michael Busch)
* LUCENE-1854: tar task should use longfile="gnu" to avoid false file
name length warnings. (Mark Miller)
* LUCENE-1791: Enhancements to the QueryUtils and CheckHits utility
classes to wrap IndexReaders and Searchers in MultiReaders or
MultiSearcher when possible to help exercise more edge cases.
(Chris Hostetter, Mark Miller)
* LUCENE-1852: Fix localization test failures.
(Robert Muir via Michael Busch)
* LUCENE-1843: Refactored all tests that use assertAnalyzesTo() & others
in core and contrib to use a new BaseTokenStreamTestCase
base class. Also rewrote some tests to use these general analysis assert
functions instead of their own (e.g. TestMappingCharFilter).
The new base class also tests tokenization with the TokenStream.next()
backwards layer enabled (using Token/TokenWrapper as attribute
implementation) and disabled (default for Lucene 3.0)
(Uwe Schindler, Robert Muir)
* LUCENE-1836: Added a new LocalizedTestCase as base class for localization
junit tests. (Robert Muir, Uwe Schindler via Michael Busch)
======================= Release 2.4.1 =======================
1. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
resources. (Christian Kohlschütter via Mike McCandless)
1. LUCENE-1452: Fixed silent data-loss case whereby binary fields are
truncated to 0 bytes during merging if the segments being merged
are non-congruent (same field name maps to different field
numbers). This bug was introduced with LUCENE-1219. (Andrzej
Bialecki via Mike McCandless).
2. LUCENE-1429: Don't throw incorrect IllegalStateException from
IndexWriter.close() if you've hit an OOM when autoCommit is true.
3. LUCENE-1474: If IndexReader.flush() is called twice when there were
pending deletions, it could lead to later false AssertionError
during IndexReader.open. (Mike McCandless)
4. LUCENE-1430: Fix false AlreadyClosedException from IndexReader.open
(masking an actual IOException) that takes String or File path.
5. LUCENE-1442: Multiple-valued NOT_ANALYZED fields can double-count
token offsets. (Mike McCandless)
6. LUCENE-1453: Ensure IndexReader.reopen()/clone() does not result in
incorrectly closing the shared FSDirectory. This bug would only
happen if you use IndexReader.open() with a File or String argument.
The returned readers are wrapped by a FilterIndexReader that
correctly handles closing of directory after reopen()/clone().
(Mark Miller, Uwe Schindler, Mike McCandless)
7. LUCENE-1457: Fix possible overflow bugs during binary
searches. (Mark Miller via Mike McCandless)
8. LUCENE-1459: Fix CachingWrapperFilter to not throw exception if
both bits() and getDocIdSet() methods are called. (Matt Jones via
9. LUCENE-1519: Fix int overflow bug during segment merging. (Deepak
via Mike McCandless)
10. LUCENE-1521: Fix int overflow bug when flushing segment.
(Shon Vella via Mike McCandless).
11. LUCENE-1544: Fix deadlock in IndexWriter.addIndexes(IndexReader[]).
(Mike McCandless via Doug Sale)
12. LUCENE-1547: Fix rare thread safety issue if two threads call
IndexWriter commit() at the same time. (Mike McCandless)
13. LUCENE-1465: NearSpansOrdered returns payloads from first possible match
rather than the correct, shortest match; Payloads could be returned even
if the max slop was exceeded; The wrong payload could be returned in
certain situations. (Jonathan Mamou, Greg Shackles, Mark Miller)
14. LUCENE-1186: Add Analyzer.close() to free internal ThreadLocal
resources. (Christian Kohlschütter via Mike McCandless)
15. LUCENE-1552: Fix IndexWriter.addIndexes(IndexReader[]) to properly
rollback IndexWriter's internal state on hitting an
exception. (Scott Garland via Mike McCandless)
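For readers unfamiliar with the class of bug named in LUCENE-1457 above, the usual cause is computing a midpoint as (lo + hi) / 2 with int arithmetic. The entry does not say how Lucene fixed it; the following is only a sketch of the standard remedy, with the hypothetical class name OverflowSafe.

```java
/** Illustrative only: overflow-safe midpoint for binary search. */
final class OverflowSafe {
    /** Midpoint of lo..hi without int overflow (requires 0 <= lo <= hi). */
    static int mid(int lo, int hi) {
        // The sum may wrap negative; the unsigned shift still yields the right value.
        return (lo + hi) >>> 1;
    }

    /** The classic buggy form, kept only for comparison. */
    static int naiveMid(int lo, int hi) {
        // Goes negative once lo + hi exceeds Integer.MAX_VALUE.
        return (lo + hi) / 2;
    }

    /** Binary search over a sorted array using the safe midpoint. */
    static int indexOf(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int m = mid(lo, hi);
            if (sorted[m] < key) {
                lo = m + 1;
            } else if (sorted[m] > key) {
                hi = m - 1;
            } else {
                return m;
            }
        }
        return -1; // not found
    }
}
```

The same wrap-around is behind int overflow fixes such as LUCENE-1519 and LUCENE-1521, where widening to long before the arithmetic is the usual approach.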
======================= Release 2.4.0 =======================
Changes in backwards compatibility policy
1. LUCENE-1340: In a minor change to Lucene's backward compatibility
policy, we are now allowing the Fieldable interface to have
changes, within reason, and made on a case-by-case basis. If an
application implements its own Fieldable, please be aware of
this. Otherwise, no need to be concerned. This is in effect for
all 2.X releases, starting with 2.4. Also note that, in all
likelihood, Fieldable will be changed in 3.0.
Changes in runtime behavior
1. LUCENE-1151: Fix StandardAnalyzer to not mis-identify host names
(eg lucene.apache.org) as an ACRONYM. To get back to the pre-2.4
backwards compatible, but buggy, behavior, you can either call
StandardAnalyzer.setDefaultReplaceInvalidAcronym(false) (static
method), or, set system property
org.apache.lucene.analysis.standard.StandardAnalyzer.replaceInvalidAcronym
to "false" on JVM startup. All StandardAnalyzer instances created
after that will then show the pre-2.4 behavior. Alternatively,
you can call setReplaceInvalidAcronym(false) to change the
behavior per instance of StandardAnalyzer. This backwards
compatibility will be removed in 3.0 (hardwiring the value to
true). (Mike McCandless)
2. LUCENE-1044: IndexWriter with autoCommit=true now commits (such
that a reader can see the changes) far less often than it used to.
Previously, every flush was also a commit. You can always force a
commit by calling IndexWriter.commit(). Furthermore, in 3.0,
autoCommit will be hardwired to false (IndexWriter constructors
that take an autoCommit argument have been deprecated) (Mike
3. LUCENE-1335: IndexWriter.addIndexes(Directory[]) and
addIndexesNoOptimize no longer allow the same Directory instance
to be passed in more than once. Internally, IndexWriter uses
Directory and segment name to uniquely identify segments, so
adding the same Directory more than once was causing duplicates
which led to problems (Mike McCandless)
4. LUCENE-1396: Improve PhraseQuery.toString() so that gaps in the
positions are indicated with a ? and multiple terms at the same
position are joined with a |. (Andrzej Bialecki via Mike
1. LUCENE-1084: Changed all IndexWriter constructors to take an
explicit parameter for maximum field size. Deprecated all the
pre-existing constructors; these will be removed in release 3.0.
NOTE: these new constructors set autoCommit to false. (Steven
Rowe via Mike McCandless)
2. LUCENE-584: Changed Filter API to return a DocIdSet instead of a
java.util.BitSet. This allows using more efficient data structures
for Filters and makes them more flexible. This deprecates
Filter.bits(), so all filters that implement this outside
the Lucene code base will need to be adapted. See also the javadocs
of the Filter class. (Paul Elschot, Michael Busch)
3. LUCENE-1044: Added IndexWriter.commit() which flushes any buffered
adds/deletes and then commits a new segments file so readers will
see the changes. Deprecate IndexWriter.flush() in favor of
IndexWriter.commit(). (Mike McCandless)
4. LUCENE-325: Added IndexWriter.expungeDeletes methods, which
consult the MergePolicy to find merges necessary to merge away all
deletes from the index. This should be a somewhat lower cost
operation than optimize. (John Wang via Mike McCandless)
5. LUCENE-1233: Return empty array instead of null when no fields
match the specified name in these methods in Document:
getFieldables, getFields, getValues, getBinaryValues. (Stefan
Trcek via Mike McCandless)
6. LUCENE-1234: Make BoostingSpanScorer protected. (Andi Vajda via Grant Ingersoll)
7. LUCENE-510: The index now stores strings as true UTF-8 bytes
(previously it was Java's modified UTF-8). If any text, either
stored fields or a token, has illegal UTF-16 surrogate characters,
these characters are now silently replaced with the Unicode
replacement character U+FFFD. This is a change to the index file
format. (Marvin Humphrey via Mike McCandless)
8. LUCENE-852: Let the SpellChecker caller specify IndexWriter mergeFactor
and RAM buffer size. (Otis Gospodnetic)
9. LUCENE-1290: Deprecate org.apache.lucene.search.Hits, Hit and HitIterator
and remove all references to these classes from the core. Also update demos
and tutorials. (Michael Busch)
10. LUCENE-1288: Add getVersion() and getGeneration() to IndexCommit.
getVersion() returns the same value that IndexReader.getVersion()
returns when the reader is opened on the same commit. (Jason
Rutherglen via Mike McCandless)
11. LUCENE-1311: Added IndexReader.listCommits(Directory) static
method to list all commits in a Directory, plus IndexReader.open
methods that accept an IndexCommit and open the index as of that
commit. These methods are only useful if you implement a custom
DeletionPolicy that keeps more than the last commit around.
(Jason Rutherglen via Mike McCandless)
12. LUCENE-1325: Added IndexCommit.isOptimized(). (Shalin Shekhar
Mangar via Mike McCandless)
13. LUCENE-1324: Added TokenFilter.reset(). (Shai Erera via Mike
14. LUCENE-1340: Added Fieldable.omitTf() method to skip indexing term
frequency, positions and payloads. This saves index space, and
indexing/searching time. (Eks Dev via Mike McCandless)
15. LUCENE-1219: Add basic reuse API to Fieldable for binary fields:
getBinaryValue/Offset/Length(); currently only lazy fields reuse
the provided byte[] result to getBinaryValue. (Eks Dev via Mike
16. LUCENE-1334: Add new constructor for Term: Term(String fieldName)
which defaults term text to "". (DM Smith via Mike McCandless)
17. LUCENE-1333: Added Token.reinit(*) APIs to re-initialize (reuse) a
Token. Also added term() method to return a String, with a
performance penalty clearly documented. Also implemented
hashCode() and equals() in Token, and fixed all core and contrib
analyzers to use the re-use APIs. (DM Smith via Mike McCandless)
18. LUCENE-1329: Add optional readOnly boolean when opening an
IndexReader. A readOnly reader is not allowed to make changes
(deletions, norms) to the index; in exchange, the isDeleted
method, often a bottleneck when searching with many threads, is
not synchronized. The default for readOnly is still false, but in
3.0 the default will become true. (Jason Rutherglen via Mike
19. LUCENE-1367: Add IndexCommit.isDeleted(). (Shalin Shekhar Mangar
via Mike McCandless)
20. LUCENE-1061: Factored out all "new XXXQuery(...)" in
QueryParser.java into protected methods newXXXQuery(...) so that
subclasses can create their own subclasses of each Query type.
(John Wang via Mike McCandless)
21. LUCENE-753: Added new Directory implementation
org.apache.lucene.store.NIOFSDirectory, which uses java.nio's
FileChannel to do file reads. On most non-Windows platforms, with
many threads sharing a single searcher, this may yield sizable
improvement to query throughput when compared to FSDirectory,
which only allows a single thread to read from an open file at a
time. (Jason Rutherglen via Mike McCandless)
22. LUCENE-1371: Added convenience method TopDocs Searcher.search(Query query, int n).
23. LUCENE-1356: Allow easy extensions of TopDocCollector by turning
constructor and fields from package to protected. (Shai Erera
24. LUCENE-1375: Added convenience method IndexCommit.getTimestamp,
which is equivalent to
getDirectory().fileModified(getSegmentsFileName()). (Mike McCandless)
25. LUCENE-1366: Rename Field.Index options to be more accurate:
TOKENIZED becomes ANALYZED; UN_TOKENIZED becomes NOT_ANALYZED;
NO_NORMS becomes NOT_ANALYZED_NO_NORMS and a new ANALYZED_NO_NORMS
is added. (Mike McCandless)
26. LUCENE-1131: Added numDeletedDocs method to IndexReader (Otis Gospodnetic)
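The difference between true UTF-8 and Java's modified UTF-8 mentioned in LUCENE-510 above can be observed with the JDK alone. The sketch below is not Lucene's encoder; it compares String.getBytes with DataOutputStream.writeUTF, and the class name Utf8Compare is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative only: encoded sizes under standard and modified UTF-8. */
final class Utf8Compare {
    /** Byte count of standard UTF-8 for s. */
    static int standardLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Byte count of modified UTF-8, excluding writeUTF's 2-byte length prefix. */
    static int modifiedLength(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.size() - 2;
    }
}
```

A supplementary character such as U+1D11E takes 4 bytes in standard UTF-8 but 6 in modified UTF-8 (each surrogate is encoded separately), and U+0000 takes 1 byte versus 2. ASCII text encodes identically in both.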
1. LUCENE-1134: Fixed BooleanQuery.rewrite to only optimize a single
clause query if minNumShouldMatch<=0. (Shai Erera via Michael Busch)
2. LUCENE-1169: Fixed bug in IndexSearcher.search(): searching with
a filter might miss some hits because scorer.skipTo() is called
without checking if the scorer is already at the right position.
scorer.skipTo(scorer.doc()) is not a NOOP, it behaves as
scorer.next(). (Eks Dev, Michael Busch)
3. LUCENE-1182: Added scorePayload to SimilarityDelegator (Andi Vajda via Grant Ingersoll)
4. LUCENE-1213: MultiFieldQueryParser was ignoring slop in case
of a single field phrase. (Trejkaz via Doron Cohen)
5. LUCENE-1228: IndexWriter.commit() was not updating the index version and as
a result, IndexReader.reopen() failed to sense index changes. (Doron Cohen)
6. LUCENE-1267: Added numDocs() and maxDoc() to IndexWriter;
deprecated docCount(). (Mike McCandless)
7. LUCENE-1274: Added new prepareCommit() method to IndexWriter,
which does phase 1 of a 2-phase commit (commit() does phase 2).
This is needed when you want to update an index as part of a
transaction involving external resources (eg a database). Also
deprecated abort(), renaming it to rollback(). (Mike McCandless)
8. LUCENE-1003: Stop RussianAnalyzer from removing numbers.
(TUSUR OpenTeam, Dmitry Lihachev via Otis Gospodnetic)
9. LUCENE-1152: SpellChecker fix around clearIndex and indexDictionary
methods, plus removal of IndexReader reference.
(Naveen Belkale via Otis Gospodnetic)
10. LUCENE-1046: Removed dead code in SpellChecker
(Daniel Naber via Otis Gospodnetic)
11. LUCENE-1189: Fixed the QueryParser to handle escaped characters within
quoted terms correctly. (Tomer Gabel via Michael Busch)
12. LUCENE-1299: Fixed NPE in SpellChecker when IndexReader is not null and field is null (Grant Ingersoll)
13. LUCENE-1303: Fixed BoostingTermQuery's explanation to be marked as a Match
depending only upon the non-payload score part, regardless of the effect of
the payload on the score. Prior to this, score of a query containing a BTQ
differed from its explanation. (Doron Cohen)
14. LUCENE-1310: Fixed SloppyPhraseScorer to work also for terms repeating more
than twice in the query. (Doron Cohen)
15. LUCENE-1351: ISOLatin1AccentFilter now cleans additional ligatures (Cedrik Lime via Grant Ingersoll)
16. LUCENE-1383: Workaround a nasty "leak" in Java's builtin
ThreadLocal, to prevent Lucene from causing unexpected
OutOfMemoryError in certain situations (notably J2EE
applications). (Chris Lu via Mike McCandless)
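The two-phase pattern that LUCENE-1274 above enables (prepareCommit() as phase 1, commit() as phase 2, rollback() on failure) can be sketched independently of Lucene. Participant and TwoPhase below are hypothetical names for this illustration; an IndexWriter and a database transaction would each sit behind such an interface.

```java
import java.util.List;

/** Hypothetical participant: anything that can prepare, commit or roll back. */
interface Participant {
    void prepareCommit() throws Exception; // phase 1: do all work that can fail
    void commit();                         // phase 2: make prepared state visible
    void rollback();                       // discard prepared or pending changes
}

/** Illustrative coordinator for a transaction spanning several resources. */
final class TwoPhase {
    /** Returns true if every participant committed, false if all rolled back. */
    static boolean commitAll(List<? extends Participant> participants) {
        try {
            for (Participant p : participants) {
                p.prepareCommit();
            }
        } catch (Exception e) {
            // Any failure in phase 1 aborts the whole transaction.
            for (Participant p : participants) {
                p.rollback();
            }
            return false;
        }
        // Phase 2 runs only after every participant prepared successfully.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

The sketch assumes rollback() is safe to call on a participant that never prepared, and that commit() after a successful prepareCommit() does not fail.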
1. LUCENE-1137: Added Token.set/getFlags() accessors for passing more information about a Token through the analysis
process. The flag is not indexed/stored and is thus only used by analysis.
2. LUCENE-1147: Add -segment option to CheckIndex tool so you can
check only a specific segment or segments in your index. (Mike
3. LUCENE-1045: Reopened this issue to add support for short and bytes.
4. LUCENE-584: Added new data structures to o.a.l.util, such as
OpenBitSet and SortedVIntList. These extend DocIdSet and can
directly be used for Filters with the new Filter API. Also changed
the core Filters to use OpenBitSet instead of java.util.BitSet.
(Paul Elschot, Michael Busch)
5. LUCENE-494: Added QueryAutoStopWordAnalyzer to allow for the automatic removal of frequently occurring terms from a query.
This Analyzer is not intended for use during indexing. (Mark Harwood via Grant Ingersoll)
6. LUCENE-1044: Change Lucene to properly "sync" files after
committing, to ensure on a machine or OS crash or power cut, even
with cached writes, the index remains consistent. Also added
explicit commit() method to IndexWriter to force a commit without
having to close. (Mike McCandless)
7. LUCENE-997: Add search timeout (partial) support.
A TimeLimitedCollector was added to allow limiting search time.
It is a partial solution since timeout is checked only when
collecting a hit, and therefore a search for rare words in a
huge index might not stop within the specified time.
(Sean Timm via Doron Cohen)
8. LUCENE-1184: Allow SnapshotDeletionPolicy to be re-used across
close/re-open of IndexWriter while still protecting an open
snapshot (Tim Brennan via Mike McCandless)
9. LUCENE-1194: Added IndexWriter.deleteDocuments(Query) to delete
documents matching the specified query. Also added static unlock
and isLocked methods (deprecating the ones in IndexReader). (Mike
10. LUCENE-1201: Add IndexReader.getIndexCommit() method. (Tim Brennan
via Mike McCandless)
11. LUCENE-550: Added InstantiatedIndex implementation. Experimental
Index store similar to MemoryIndex but allows for multiple documents
in memory. (Karl Wettin via Grant Ingersoll)
12. LUCENE-400: Added word based n-gram filter (in contrib/analyzers) called ShingleFilter and an Analyzer wrapper
that wraps another Analyzer's token stream with a ShingleFilter (Sebastian Kirsch, Steve Rowe via Grant Ingersoll)
13. LUCENE-1166: Decomposition tokenfilter for languages like German and Swedish (Thomas Peuss via Grant Ingersoll)
14. LUCENE-1187: ChainedFilter and BooleanFilter now work with new Filter API
and DocIdSetIterator-based filters. Backwards-compatibility with old
BitSet-based filters is ensured. (Paul Elschot via Michael Busch)
15. LUCENE-1295: Added new method to MoreLikeThis for retrieving interesting terms and made retrieveTerms(int) public. (Grant Ingersoll)
16. LUCENE-1298: MoreLikeThis can now accept a custom Similarity (Grant Ingersoll)
17. LUCENE-1297: Allow other string distance measures for the SpellChecker
(Thomas Morton via Otis Gospodnetic)
18. LUCENE-1001: Provide access to Payloads via Spans. All existing Span Query implementations in Lucene implement this. (Mark Miller, Grant Ingersoll)
19. LUCENE-1354: Provide programmatic access to CheckIndex (Grant Ingersoll, Mike McCandless)
20. LUCENE-1279: Add support for Collators to RangeFilter/Query and Query Parser. (Steve Rowe via Grant Ingersoll)
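As an illustration of the word-based n-grams ("shingles") that LUCENE-400 above refers to, here is a minimal sketch over a plain list of words. It is not the ShingleFilter implementation, which operates on a token stream and has options this sketch omits; the class name Shingles is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: builds word n-grams from a list of words. */
final class Shingles {
    /** Returns every run of exactly `size` adjacent words, joined by one space. */
    static List<String> of(List<String> words, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= words.size(); i++) {
            out.add(String.join(" ", words.subList(i, i + size)));
        }
        return out;
    }
}
```

For the words "please", "divide", "this", "sentence" and size 2, this yields "please divide", "divide this" and "this sentence".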
1. LUCENE-705: When building a compound file, use
RandomAccessFile.setLength() to tell the OS/filesystem to
pre-allocate space for the file. This may improve fragmentation
in how the CFS file is stored, and allows us to detect an upcoming
disk full situation before actually filling up the disk. (Mike
2. LUCENE-1120: Speed up merging of term vectors by bulk-copying the
raw bytes for each contiguous range of non-deleted documents.
3. LUCENE-1185: Avoid checking if the TermBuffer 'scratch' in
SegmentTermEnum is null for every call of scanTo().
(Christian Kohlschuetter via Michael Busch)
4. LUCENE-1217: Internal to Field.java, use isBinary instead of
runtime type checking for possible speedup of binaryValue().
(Eks Dev via Mike McCandless)
5. LUCENE-1183: Optimized TRStringDistance class (in contrib/spell) that uses
less memory than the previous version. (Cédrik LIME via Otis Gospodnetic)
6. LUCENE-1195: Improve term lookup performance by adding a LRU cache to the
TermInfosReader. In performance experiments the speedup was about 25% on
average on mid-size indexes with ~500,000 documents for queries with 3
terms and about 7% on larger indexes with ~4.3M documents. (Michael Busch)
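The LRU caching technique named in LUCENE-1195 above can be sketched with the JDK's LinkedHashMap. This is a generic illustration, not the cache used inside TermInfosReader; the class name LruCache is hypothetical, and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Small LRU cache: the least recently accessed entry is evicted first. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder=true: get() and put() move an entry to the most-recent end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict once the cache is over capacity.
        return size() > maxEntries;
    }
}
```

A lookup path would consult the cache first and fall back to the slower on-disk lookup only on a miss, storing the result for later queries that hit the same terms.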
1. LUCENE-1236: Added some clarifying remarks to EdgeNGram*.java (Hiroaki Kawai via Grant Ingersoll)
2. LUCENE-1157 and LUCENE-1256: HTML changes log, created automatically
from CHANGES.txt. This HTML file is currently visible only via the developers page.
(Steven Rowe via Doron Cohen)
3. LUCENE-1349: Fieldable can now be changed without breaking backward compatibility rules (within reason; see the note at
the top of this file and also on Fieldable.java). (Grant Ingersoll)
4. LUCENE-1873: Update documentation to reflect current Contrib area status.
(Steven Rowe, Mark Miller)
1. LUCENE-1153: Added JUnit JAR to new lib directory. Updated build to rely on local JUnit instead of ANT/lib.
2. LUCENE-1202: Small fixes to the way Clover is used to work better
with contribs. Of particular note: a single clover db is used
regardless of whether tests are run globally or in the specific
contrib directories.
3. LUCENE-1353: Javacc target in contrib/miscellaneous for
generating the precedence query parser.
1. LUCENE-1238: Fixed intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded.
Within this fix, "greedy" flag was added to TimeLimitedCollector, to allow the wrapped
collector to collect also the last doc, after the allowed time has passed. (Doron Cohen)
2. LUCENE-1348: relax TestTimeLimitedCollector to not fail due to
timeout exceeded (just because test machine is very busy).
======================= Release 2.3.2 =======================
1. LUCENE-1191: On hitting OutOfMemoryError in any index-modifying
methods in IndexWriter, do not commit any further changes to the
index to prevent risk of possible corruption. (Mike McCandless)
2. LUCENE-1197: Fixed issue whereby IndexWriter would flush by RAM
too early when TermVectors were in use. (Mike McCandless)
3. LUCENE-1198: Don't corrupt index if an exception happens inside
DocumentsWriter.init (Mike McCandless)
4. LUCENE-1199: Added defensive check for null indexReader before
calling close in IndexModifier.close() (Mike McCandless)
5. LUCENE-1200: Fix rare deadlock case in addIndexes* when
ConcurrentMergeScheduler is in use (Mike McCandless)
6. LUCENE-1208: Fix deadlock case on hitting an exception while
processing a document that had triggered a flush (Mike McCandless)
7. LUCENE-1210: Fix deadlock case on hitting an exception while
starting a merge when using ConcurrentMergeScheduler (Mike McCandless)
8. LUCENE-1222: Fix IndexWriter.doAfterFlush to always be called on
flush (Mark Ferguson via Mike McCandless)
9. LUCENE-1226: Fixed IndexWriter.addIndexes(IndexReader[]) to commit
successfully created compound files. (Michael Busch)
10. LUCENE-1150: Re-expose StandardTokenizer's constants publicly;
this was accidentally lost with LUCENE-966. (Nicolas Lalevée via
11. LUCENE-1262: Fixed bug in BufferedIndexInput.refill whereby on
hitting an exception in readInternal, the buffer is incorrectly
filled with stale bytes such that subsequent calls to readByte()
return incorrect results. (Trejkaz via Mike McCandless)
12. LUCENE-1270: Fixed intermittent case where IndexWriter.close()
would hang after IndexWriter.addIndexesNoOptimize had been
called. (Stu Hood via Mike McCandless)
1. LUCENE-1230: Include *pom.xml* in source release files. (Michael Busch)
======================= Release 2.3.1 =======================
1. LUCENE-1168: Fixed corruption cases when autoCommit=false and
documents have mixed term vectors (Suresh Guvvala via Mike
2. LUCENE-1171: Fixed some cases where OOM errors could cause
deadlock in IndexWriter (Mike McCandless).
3. LUCENE-1173: Fixed corruption case when autoCommit=false and bulk
merging of stored fields is used (Yonik via Mike McCandless).
4. LUCENE-1163: Fixed bug in CharArraySet.contains(char[] buffer, int
offset, int len) that was ignoring offset and thus giving the
wrong answer. (Thomas Peuss via Mike McCandless)
5. LUCENE-1177: Fix rare case where IndexWriter.optimize might do too
many merges at the end. (Mike McCandless)
6. LUCENE-1176: Fix corruption case when documents with no term
vector fields are added before documents with term vector fields.
7. LUCENE-1179: Fixed assert statement that was incorrectly
preventing Fields with empty-string field name from working.
(Sergey Kabashnyuk via Mike McCandless)
======================= Release 2.3.0 =======================
Changes in runtime behavior
1. LUCENE-994: Defaults for IndexWriter have been changed to maximize
out-of-the-box indexing speed. First, IndexWriter now flushes by
RAM usage (16 MB by default) instead of a fixed doc count (call
IndexWriter.setMaxBufferedDocs to get backwards compatible
behavior). Second, ConcurrentMergeScheduler is used to run merges
using background threads (call IndexWriter.setMergeScheduler(new
SerialMergeScheduler()) to get backwards compatible behavior).
Third, merges are chosen based on size in bytes of each segment
rather than document count of each segment (call
IndexWriter.setMergePolicy(new LogDocMergePolicy()) to get
3407
backwards compatible behavior).
3409
NOTE: users of ParallelReader must change back all of these
3410
defaults in order to ensure the docIDs "align" across all parallel
3415
2. LUCENE-1045: SortField.AUTO didn't work with long. When detecting
3416
the field type for sorting automatically, numbers used to be
3417
interpreted as int, then as float, if parsing the number as an int
3418
failed. Now the detection checks for int, then for long,
3419
then for float. (Daniel Naber)
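
The detection order described in item 2 (int, then long, then float) can
be sketched as follows. This is an illustrative stand-in, not the code
Lucene uses: the class name, the Kind enum, and the fallback to STRING
when no numeric parse succeeds are all assumptions.

```java
// Hypothetical sketch of the int -> long -> float detection order.
final class AutoTypeDetection {
    enum Kind { INT, LONG, FLOAT, STRING }

    static Kind detect(String value) {
        try {
            Integer.parseInt(value);
            return Kind.INT;
        } catch (NumberFormatException notAnInt) {
            // fall through to the next, wider type
        }
        try {
            Long.parseLong(value);
            return Kind.LONG;
        } catch (NumberFormatException notALong) {
            // fall through to float
        }
        try {
            Float.parseFloat(value);
            return Kind.FLOAT;
        } catch (NumberFormatException notAFloat) {
            // Assumption: anything non-numeric is treated as a string.
            return Kind.STRING;
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("42"));          // INT
        System.out.println(detect("3000000000"));  // LONG (too big for int)
        System.out.println(detect("1.5"));         // FLOAT
    }
}
```

Without the long step, a value such as 3000000000 fails the int parse
and is then accepted by the float parse, which loses precision.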
3423
1. LUCENE-843: Added IndexWriter.setRAMBufferSizeMB(...) to have
3424
IndexWriter flush whenever the buffered documents are using more
3425
than the specified amount of RAM. Also added new APIs to Token
3426
that allow one to set a char[] plus offset and length to specify a
3427
token (to avoid creating a new String() for each Token). (Mike
3430
2. LUCENE-963: Add setters to Field to allow for re-using a single
3431
Field instance during indexing. This is a sizable performance
3432
gain, especially for small documents. (Mike McCandless)
3434
3. LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
3435
permit re-using of Token and TokenStream instances during
3436
indexing. Changed Token to use a char[] as the store for the
3437
termText instead of String. This gives faster tokenization
3438
performance (~10-15%). (Mike McCandless)
3440
4. LUCENE-847: Factored MergePolicy, which determines which merges
3441
should take place and when, as well as MergeScheduler, which
3442
determines when the selected merges should actually run, out of
3443
IndexWriter. The default merge policy is now
3444
LogByteSizeMergePolicy (see LUCENE-845) and the default merge
3445
scheduler is now ConcurrentMergeScheduler (see
3446
LUCENE-870). (Steven Parkes via Mike McCandless)
3448
5. LUCENE-1052: Add IndexReader.setTermInfosIndexDivisor(int) method
3449
that allows you to reduce memory usage of the termInfos by further
3450
sub-sampling (over the termIndexInterval that was used during
3451
indexing) which terms are loaded into memory. (Chuck Williams,
3452
Doug Cutting via Mike McCandless)
3454
6. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3455
existing IndexReader (see New features -> 8.) (Michael Busch)
3457
7. LUCENE-1062: Add setData(byte[] data),
3458
setData(byte[] data, int offset, int length), getData(), getOffset()
3459
and clone() methods to o.a.l.index.Payload. Also add the field name
3460
as arg to Similarity.scorePayload(). (Michael Busch)
3462
8. LUCENE-982: Add IndexWriter.optimize(int maxNumSegments) method to
3463
"partially optimize" an index down to maxNumSegments segments.
3466
9. LUCENE-1080: Changed Token.DEFAULT_TYPE to be public.
3468
10. LUCENE-1064: Changed TopDocs constructor to be public.
3469
(Shai Erera via Michael Busch)
3471
11. LUCENE-1079: DocValues cleanup: constructor now has no params,
3472
and getInnerArray() now throws UnsupportedOperationException (Doron Cohen)
3474
12. LUCENE-1089: Added PriorityQueue.insertWithOverflow, which returns
3475
the Object (if any) that was bumped from the queue to allow
3476
re-use. (Shai Erera via Mike McCandless)
3478
13. LUCENE-1101: Token reuse 'contract' (defined in LUCENE-969)
modified so that it is the token producer's responsibility
to call Token.clear(). (Doron Cohen)
3482
14. LUCENE-1118: Changed StandardAnalyzer to skip too-long (default >
3483
255 characters) tokens. You can increase this limit by calling
3484
StandardAnalyzer.setMaxTokenLength(...). (Michael McCandless)
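
The idea behind item 12 (LUCENE-1089), returning the element that was
bumped so the caller can re-use it, can be sketched with
java.util.PriorityQueue. The BoundedQueue class below is hypothetical
and is not Lucene's PriorityQueue; in particular, returning the argument
itself when it is rejected is an assumption of this sketch.

```java
import java.util.PriorityQueue;

// Hypothetical bounded queue that keeps the maxSize largest elements.
final class BoundedQueue<T extends Comparable<T>> {
    private final PriorityQueue<T> heap = new PriorityQueue<>(); // min-heap
    private final int maxSize;

    BoundedQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns null if there was room, the previous least element if it
    // was bumped, or the argument itself if it did not qualify.
    T insertWithOverflow(T element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        T least = heap.peek();
        if (least != null && element.compareTo(least) > 0) {
            heap.poll();
            heap.add(element);
            return least;
        }
        return element;
    }

    int size() {
        return heap.size();
    }

    public static void main(String[] args) {
        BoundedQueue<Integer> queue = new BoundedQueue<>(2);
        queue.insertWithOverflow(5);
        queue.insertWithOverflow(3);
        System.out.println(queue.insertWithOverflow(4)); // 3 was bumped
    }
}
```

A caller can overwrite the fields of the returned object and offer it
again instead of allocating a new one for every candidate.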
3489
1. LUCENE-933: QueryParser fixed to not produce empty sub
3490
BooleanQueries "()" even if the Analyzer produced no
3491
tokens for input. (Doron Cohen)
3493
2. LUCENE-955: Fixed SegmentTermPositions to work correctly with the
3494
first term in the dictionary. (Michael Busch)
3496
3. LUCENE-951: Fixed NullPointerException in MultiLevelSkipListReader
3497
that was thrown after a call of TermPositions.seek().
3498
(Rich Johnson via Michael Busch)
3500
4. LUCENE-938: Fixed cases where an unhandled exception in
3501
IndexWriter's methods could cause deletes to be lost.
3502
(Steven Parkes via Mike McCandless)
3504
5. LUCENE-962: Fixed case where an unhandled exception in
3505
IndexWriter.addDocument or IndexWriter.updateDocument could cause
3506
unreferenced files in the index to not be deleted
3507
(Steven Parkes via Mike McCandless)
3509
6. LUCENE-957: RAMDirectory fixed to properly handle directories
3510
larger than Integer.MAX_VALUE. (Doron Cohen)
3512
7. LUCENE-781: MultiReader fixed to not throw NPE if isCurrent(),
3513
isOptimized() or getVersion() is called. Separated MultiReader
3514
into two classes: MultiSegmentReader extends IndexReader, is
3515
package-protected and is created automatically by IndexReader.open()
3516
in case the index has multiple segments. The public MultiReader
3517
now extends MultiSegmentReader and is intended to be used by users
3518
who want to add their own subreaders. (Daniel Naber, Michael Busch)
3520
8. LUCENE-970: FilterIndexReader now implements isOptimized(). Previously,
a call to isOptimized() would throw an NPE. (Michael Busch)
3523
9. LUCENE-832: ParallelReader fixed to not throw NPE if isCurrent(),
3524
isOptimized() or getVersion() is called. (Michael Busch)
3526
10. LUCENE-948: Fix FNFE exception caused by stale NFS client
3527
directory listing caches when writers on different machines are
3528
sharing an index over NFS and using a custom deletion policy (Mike
3531
11. LUCENE-978: Ensure TermInfosReader, FieldsReader, and FieldsReader
3532
close any streams they had opened if an exception is hit in the
3533
constructor. (Ning Li via Mike McCandless)
3535
12. LUCENE-985: If an extremely long term is in a doc (> 16383 chars),
3536
we now throw an IllegalArgumentException saying the term is too
3537
long, instead of cryptic ArrayIndexOutOfBoundsException. (Karl
3538
Wettin via Mike McCandless)
3540
13. LUCENE-991: The explain() method of BoostingTermQuery had errors
3541
when no payloads were present on a document. (Peter Keegan via
3544
14. LUCENE-992: Fixed IndexWriter.updateDocument to be atomic again
3545
(this was broken by LUCENE-843). (Ning Li via Mike McCandless)
3547
15. LUCENE-1008: Fixed corruption case when document with no term
3548
vector fields is added after documents with term vector fields.
3549
This bug was introduced with LUCENE-843. (Grant Ingersoll via
3552
16. LUCENE-1006: Fixed QueryParser to accept a "" field value (zero
3553
length quoted string.) (yonik)
3555
17. LUCENE-1010: Fixed corruption case when document with no term
3556
vector fields is added after documents with term vector fields.
3557
This case is hit during merge and would cause an EOFException.
3558
This bug was introduced with LUCENE-984. (Andi Vajda via Mike
3561
19. LUCENE-1009: Fix merge slowdown with LogByteSizeMergePolicy when
3562
autoCommit=false and documents are using stored fields and/or term
3563
vectors. (Mark Miller via Mike McCandless)
3565
20. LUCENE-1011: Fixed corruption case when two or more machines,
3566
sharing an index over NFS, can be writers in quick succession.
3567
(Patrick Kimber via Mike McCandless)
3569
21. LUCENE-1028: Fixed Weight serialization for a few queries:
3570
DisjunctionMaxQuery, ValueSourceQuery, CustomScoreQuery.
3571
Serialization check added for all queries.
3572
(Kyle Maxwell via Doron Cohen)
3574
22. LUCENE-1048: Fixed incorrect behavior in Lock.obtain(...) when the
3575
timeout argument is very large (eg Long.MAX_VALUE). Also added
3576
Lock.LOCK_OBTAIN_WAIT_FOREVER constant to never timeout. (Nikolay
3577
Diakov via Mike McCandless)
3579
23. LUCENE-1050: Throw LockReleaseFailedException in
3580
Simple/NativeFSLockFactory if we fail to delete the lock file when
3581
releasing the lock. (Nikolay Diakov via Mike McCandless)
3583
24. LUCENE-1071: Fixed SegmentMerger to correctly set payload bit in
3584
the merged segment. (Michael Busch)
3586
25. LUCENE-1042: Remove throwing of IOException in
getTermFreqVector(int, String, TermVectorMapper) to be consistent
with other getTermFreqVector calls. Also removed the throwing of
the other IOException in that method to be consistent. (Karl
Wettin via Grant Ingersoll)
3589
26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
3590
along with iterating the hits. Deleting docs already retrieved
3591
now works seamlessly. If docs not yet retrieved are deleted
3592
(e.g. from another thread), and then, relying on the initial
3593
Hits.length(), an application attempts to retrieve more hits
3594
than actually exist, a ConcurrentModificationException
3595
is thrown. (Doron Cohen)
3597
27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
3598
the type of some tokens incorrectly. This is done by adding a new flag named
3599
replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
3600
this flag to true fixes the problem. This flag is a temporary fix and is already
3601
marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
3602
LUCENE-1140: Fixed NPE caused by 1068 (Alexei Dets via Grant Ingersoll)
3604
28. LUCENE-749: ChainedFilter behavior fixed when logic of
3605
first filter is ANDNOT. (Antonio Bruno via Doron Cohen)
3607
29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
3608
term) after next() returns false. (Steven Tamm via Mike
3614
1. LUCENE-906: Elision filter for French.
3615
(Mathieu Lecarme via Otis Gospodnetic)
3617
2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
3618
not only filtering, but knowing where in a Document a Filter matches
3621
3. LUCENE-868: Added new Term Vector access features. New callback
3622
mechanism allows application to define how and where to read Term
3623
Vectors from disk. This implementation contains several extensions
3624
of the new abstract TermVectorMapper class. The new API should be
3625
back-compatible. No changes in the actual storage of Term Vectors
3627
3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
3628
to provide information about what document is being accessed.
3629
(Karl Wettin via Grant Ingersoll)
3631
4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
3632
position based lookup of term vector information.
3633
See item #3 above (LUCENE-868).
3635
5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
3636
to verify that locking is working properly. LockVerifyServer runs
3637
a separate server to verify locks. LockStressTest runs a simple
3638
tool that rapidly obtains and releases locks.
3639
VerifyingLockFactory is a LockFactory that wraps any other
3640
LockFactory and consults the LockVerifyServer whenever a lock is
3641
obtained or released, throwing an exception if an illegal lock
3642
obtain occurred. (Patrick Kimber via Mike McCandless)
3644
6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
3645
support doubles and longs. Added support into SortField for sorting
3646
on doubles and longs as well. (Grant Ingersoll)
3648
7. LUCENE-1020: Created basic index checking & repair tool
3649
(o.a.l.index.CheckIndex). When run without -fix it does a
3650
detailed test of all segments in the index and reports summary
3651
information and any errors it hit. With -fix it will remove
3652
segments that had errors. (Mike McCandless)
3654
8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
3655
existing IndexReader by only loading those portions of an index
3656
that have changed since the reader was (re)opened. reopen() can
3657
be significantly faster than open(), depending on the amount of
3658
index changes. SegmentReader, MultiSegmentReader, MultiReader,
3659
and ParallelReader implement reopen(). (Michael Busch)
3661
9. LUCENE-1040: CharArraySet useful for efficiently checking
3662
set membership of text specified by char[]. (yonik)
3664
10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
3665
live backup of an index without pausing indexing. (Mike
3668
11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
3669
ValueSource queries. (Kyle Maxwell via Doron Cohen)
3671
12. LUCENE-1095: Added an option to StopFilter to increase
3672
positionIncrement of the token succeeding a stopped token.
3673
Disabled by default. Similar option added to QueryParser
3674
to consider token positions when creating PhraseQuery
3675
and MultiPhraseQuery. Disabled by default (so by default
3676
the query parser ignores position increments).
3679
13. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
3685
1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
3686
Tokens that are cached in the LinkedList. This increases performance
3687
significantly, especially when the number of Tokens is large.
3688
(Mark Miller via Michael Busch)
3690
2. LUCENE-843: Substantial optimizations to improve how IndexWriter
3691
uses RAM for buffering documents and to speed up indexing (2X-8X
3692
faster). A single shared hash table now records the in-memory
3693
postings per unique term and is directly flushed into a single
3694
segment. (Mike McCandless)
3696
3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
3697
takes place when using compound files. (Mike McCandless)
3699
4. LUCENE-959: Remove synchronization in Document (yonik)
3701
5. LUCENE-963: Add setters to Field to allow for re-using a single
3702
Field instance during indexing. This is a sizable performance
3703
gain, especially for small documents. (Mike McCandless)
3705
6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
3706
and don't rely on exceptions. (Michael Busch)
3708
7. LUCENE-966: Very substantial speedups (~6X faster) for
3709
StandardTokenizer (StandardAnalyzer) by using JFlex instead of
3710
JavaCC to generate the tokenizer.
3711
(Stanislaw Osinski via Mike McCandless)
3713
8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
3714
TokenStream instances when possible to improve tokenization
3715
performance (~10-15%). (Mike McCandless)
3717
9. LUCENE-871: Speedup ISOLatin1AccentFilter (Ian Boston via Mike
3720
10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
3721
subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
3722
now extend DirectoryIndexReader and are the only IndexReader
3723
implementations that use SegmentInfos to access an index and
3724
acquire a write lock for index modifications. (Michael Busch)
3726
11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
3727
either RAM usage or document count or both (whichever comes
3728
first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
3729
one of the flush triggers. (Ning Li via Mike McCandless)
3731
12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
3732
raw bytes for each contiguous range of non-deleted documents.
3733
(Robert Engels via Mike McCandless)
3735
13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
3736
documents, and a slight performance increase for top level
3737
conjunctions. (yonik)
3739
14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
3740
and final. (Nathan Beyer via Michael Busch)
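
The gain described in item 1 (LUCENE-937) comes from how a LinkedList is
traversed. The sketch below is hypothetical and not taken from
CachingTokenFilter; it uses strings in place of Tokens and assumes the
slower variant accessed the list by index, which the entry above does
not state explicitly.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical sketch; the names are illustrative and not from Lucene.
final class CachedTokenReplay {

    // Indexed replay: LinkedList.get(i) walks the list on every call,
    // so visiting all n cached entries costs O(n^2) node hops.
    static int totalLengthByIndex(LinkedList<String> cache) {
        int total = 0;
        for (int i = 0; i < cache.size(); i++) {
            total += cache.get(i).length();
        }
        return total;
    }

    // Iterator replay: each next() is O(1), so the whole pass is O(n).
    static int totalLengthByIterator(LinkedList<String> cache) {
        int total = 0;
        for (Iterator<String> it = cache.iterator(); it.hasNext();) {
            total += it.next().length();
        }
        return total;
    }

    public static void main(String[] args) {
        LinkedList<String> cache = new LinkedList<>();
        cache.add("quick");
        cache.add("brown");
        cache.add("fox");
        // Both traversals visit the same entries and agree on the result.
        System.out.println(totalLengthByIndex(cache) == totalLengthByIterator(cache));
    }
}
```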
3744
1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
3745
classes, as well as a unified view. Also add an appropriate menu
3746
structure to the website. (Michael Busch)
3748
2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
3749
(Ronnie Kolehmainen via Michael Busch)
3753
1. LUCENE-908: Improvements and simplifications for how the MANIFEST
3754
file and the META-INF dir are created. (Michael Busch)
3756
2. LUCENE-935: Various improvements for the maven artifacts. Now the
3757
artifacts also include the sources as .jar files. (Michael Busch)
3759
3. Added apply-patch target to top-level build. Defaults to looking for
3760
a patch in ${basedir}/../patches with name specified by -Dpatch.name.
3761
Can also specify any location by -Dpatch.file property on the command
3762
line. This should be helpful for easy application of patches, but it
3763
is also a step towards integrating automatic patch application with
3764
JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)
3766
4. LUCENE-935: Defined property "m2.repository.url" to allow setting
3767
the url to a maven remote repository to deploy to. (Michael Busch)
3769
5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)
3771
6. LUCENE-1055: Remove gdata-server from build files and its sources
3772
from trunk. (Michael Busch)
3774
7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
3775
via scp and ssh authentication. (Michael Busch)
3777
8. LUCENE-1123: Allow overriding the specification version for
3778
MANIFEST.MF (Michael Busch)
3782
1. LUCENE-766: Test adding two fields with the same name but different
3783
term vector setting. (Nicolas Lalevée via Doron Cohen)
3785
======================= Release 2.2.0 =======================
3787
Changes in runtime behavior
3791
1. LUCENE-793: created new exceptions and added them to throws clause
3792
for many methods (all subclasses of IOException for backwards
3793
compatibility): index.StaleReaderException,
3794
index.CorruptIndexException, store.LockObtainFailedException.
3795
This was done to better call out the possible root causes of an
3796
IOException from these methods. (Mike McCandless)
3798
2. LUCENE-811: make SegmentInfos class, plus a few methods from related
3799
classes, package-private again (they were unnecessarily made public
3800
as part of LUCENE-701). (Mike McCandless)
3802
3. LUCENE-710: added optional autoCommit boolean to IndexWriter
3803
constructors. When this is false, index changes are not committed
3804
until the writer is closed. This gives explicit control over when
3805
a reader will see the changes. Also added optional custom
3806
deletion policy to explicitly control when prior commits are
3807
removed from the index. This is intended to allow applications to
3808
share an index over NFS by customizing when prior commits are
3809
deleted. (Mike McCandless)
3811
4. LUCENE-818: changed most public methods of IndexWriter,
3812
IndexReader (and its subclasses), FieldsReader and RAMDirectory to
3813
throw AlreadyClosedException if they are accessed after being
3814
closed. (Mike McCandless)
3816
5. LUCENE-834: Changed some access levels for certain Span classes to allow them
3817
to be overridden. They have been marked expert only and not for public
3818
consumption. (Grant Ingersoll)
3820
6. LUCENE-796: Removed calls to super.* from various get*Query methods in
3821
MultiFieldQueryParser, in order to allow sub-classes to override them.
3822
(Steven Parkes via Otis Gospodnetic)
3824
7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
3825
in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
3826
combination when caching is desired.
3827
(Chris Hostetter, Otis Gospodnetic)
3829
8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
3830
to enable extensibility of these classes. (Michael Busch)
3832
9. LUCENE-580: Added the public method reset() to TokenStream. This method does
3833
nothing by default, but may be overridden by subclasses to support consuming
3834
the TokenStream more than once. (Michael Busch)
3836
10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
3837
argument, available as tokenStreamValue(). This is useful to avoid the need of
3838
"dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)
3840
11. LUCENE-730: Added the new BooleanQuery methods setAllowDocsOutOfOrder() and
3841
getAllowDocsOutOfOrder(). Deprecated the methods setUseScorer14() and
3842
getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
3843
improves performance for certain queries but results in scoring out of docid
3844
order. This patch reverses this change, so now by default hit docs are scored
in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
3846
This patch also enables the tests in QueryUtils again that check for docid
3847
order. (Paul Elschot, Doron Cohen, Michael Busch)
3849
12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
3850
to optionally specify the size of the read buffer. Also added
3851
BufferedIndexInput.setBufferSize(int) to change the buffer size.
3854
13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
3855
to be public because it implements the public interface TermPositionVector.
3860
1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)
3862
2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
3863
Query parser modified to create a prefix query only for the case
3864
that there is a single trailing wildcard (and no additional wildcard
3865
or '?' in the query text). (Doron Cohen)
3867
3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
3868
and SimpleFSLockFactory. This enables all 4 builtin LockFactory
3869
implementations to be specified via the System property
3870
org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)
3872
4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
3873
failed to reduce the number of open descriptors since it was still
3874
opened once per field with norms. (yonik)
3876
5. LUCENE-823: Make sure internal file handles are closed when
3877
hitting an exception (eg disk full) while flushing deletes in
3878
IndexWriter's mergeSegments, and also during
3879
IndexWriter.addIndexes. (Mike McCandless)
3881
6. LUCENE-825: If directory is removed after
3882
FSDirectory.getDirectory() but before IndexReader.open you now get
3883
a FileNotFoundException like Lucene pre-2.1 (before this fix you
3884
got an NPE). (Mike McCandless)
3886
7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
3887
because the backslash is the escape character. Also changed the ESCAPED_CHAR
3888
list to contain all possible characters, because every character that
3889
follows a backslash should be considered as escaped. (Michael Busch)
3891
8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
3892
is consumed. Now a ParseException is thrown if a query contains too many
3893
closing parentheses. (Andreas Neumann via Michael Busch)
3895
9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
3896
Now also deleting all javacc generated files before calling javacc.
3897
(Steven Parkes, Doron Cohen)
3899
10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)
3901
11. LUCENE-828: Minor fix for Term's equal().
3902
(Paul Cowan via Otis Gospodnetic)
3904
12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
3905
and you call addIndexes, and hit an exception (eg disk full) then
3906
when IndexWriter rolls back its internal state this could corrupt
3907
the instance of IndexWriter (but, not the index itself) by
3908
referencing already deleted segments. This bug was only present
3909
in 2.2 (trunk), ie was never released. (Mike McCandless)
3911
13. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
3912
For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)
3914
14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
3915
by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
Note that as before this fix, creating a MultiSearcher from Searchers for whom custom similarity
3917
was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
3918
designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)
3920
15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
3921
has written the postings. Then the resources associated with the
3922
TokenStreams can safely be released. (Michael Busch)
3924
16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
3925
won't insert terms twice anymore. (Daniel Naber)
3927
17. LUCENE-881: QueryParser.escape() now also escapes the characters
3928
'|' and '&' which are part of the queryparser syntax. (Michael Busch)
3930
18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed to
STDERR and ignored, but are re-thrown instead. Some javadoc improvements.
3934
19. LUCENE-698: FilteredQuery now takes the query boost into account for
3935
scoring. (Michael Busch)
3937
20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
3938
enumeration. (Christian Mallwitz via Daniel Naber)
3940
21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
3941
Explanation tests now "deep" check the explanation details.
3942
(Chris Hostetter, Doron Cohen)
3944
22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
3945
skip target param and ends up at the first match.
3946
(Sudaakeran B. via Chris Hostetter & Doron Cohen)
3948
23. LUCENE-913: Two consecutive score() calls return different
3949
scores for Boolean Queries. (Michael Busch, Doron Cohen)
3951
24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
3952
box", again, by moving set/getMaxMergeDocs up from
3953
LogDocMergePolicy into LogMergePolicy. This fixes the API
3954
breakage (non backwards compatible change) caused by LUCENE-994.
3955
(Yonik Seeley via Mike McCandless)
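
The escaping behavior described in item 17 (LUCENE-881) can be sketched
as a simple character scan. The class below is hypothetical and is not
QueryParser.escape() itself; the set of special characters is an
assumption based on the documented query syntax, with '|' and '&'
included as described above.

```java
// Hypothetical sketch of backslash-escaping query syntax characters.
final class QueryEscapeSketch {
    // Assumed special characters, including '|' and '&'.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String input) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\'); // prefix each special character
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a&&b||c")); // a\&\&b\|\|c
    }
}
```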
3959
1. LUCENE-759: Added two n-gram-producing TokenFilters.
3962
2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
3963
RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)
3965
3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
3966
These metadata are called Payloads. For every position of a Token one Payload in the form
3967
of a variable length byte array can be stored in the prox file.
3968
Remark: The APIs introduced with this feature are in experimental state and thus
3969
contain appropriate warnings in the javadocs.
3972
4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
3973
values of a payload (see #3 above.) (Grant Ingersoll)
3975
5. LUCENE-834: Similarity has a new method for scoring payloads called
3976
scorePayload that can be overridden to take advantage of payload
3977
storage (see #3 above)
3979
6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
3980
implemented it in the appropriate places (Grant Ingersoll)
3982
7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
3983
on the remote side of the RMI connection.
3984
(Matt Ericson via Otis Gospodnetic)
3986
8. LUCENE-446: Added Solr's search.function for scores based on field
3987
values, plus CustomScoreQuery for simple score (post) customization.
3988
(Yonik Seeley, Doron Cohen)
3990
9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
3991
Fields such that the other Fields do not have to go through the whole Analysis process over again. For instance, if you have two
3992
Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
3993
between the two using the TeeTokenFilter and the SinkTokenizer. See TeeSinkTokenTest.java for examples.
3994
(Grant Ingersoll, Michael Busch, Yonik Seeley)
3998
1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
3999
when nextPosition() is called for the first time. This allows using instances
4000
of SegmentTermPositions instead of SegmentTermDocs without additional costs.
4003
2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
4004
IndexOutput directly now. This avoids further buffering and thus avoids
4005
unnecessary array copies. (Michael Busch)
4007
3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
4008
cases and possibly improve scoring performance. Documents can now be
4009
delivered out-of-order as they are scored (e.g. to HitCollector).
4010
N.B. A bit of code had to be disabled in QueryUtils in order for
4011
TestBoolean2 test to keep passing.
4012
(Paul Elschot via Otis Gospodnetic)
4014
4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
4015
them to keep the spell index small. (Daniel Naber)
4017
5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
4018
Together with LUCENE-888 this will allow adjusting the buffer size
4019
dynamically. (Paul Elschot, Michael Busch)
4021
6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
4022
BufferedIndexOutput. Also increase buffer size in
4023
BufferedIndexInput, but only when used during merging. Together,
4024
these increases yield 10-18% overall performance gain vs the
4025
previous 1K defaults. (Mike McCandless)
4027
7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
4028
up most queries that use skipTo(), especially on big indexes with large posting
4029
lists. For average AND queries the speedup is about 20%, for queries that
4030
contain very frequent and very rare terms the speedup can be over 80%.
4035
1. LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
4036
http://wiki.apache.org/lucene-java/ Updated the links in the docs and
4037
wherever else I found references. (Grant Ingersoll, Joe Schaefer)
4039
2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
4040
consistent with java.util.Comparator.compare(): Any integer is allowed to
4041
be returned instead of only -1/0/1.
4042
(Paul Cowan via Michael Busch)
4044
3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
4045
Solved javadoc errors under jdk5 (jars in path for gdata).
4046
Made "javadocs" target depend on "build-contrib" for first downloading
4047
contrib jars configured for dynamic download. (Note: when running
behind a firewall, a firewall prompt might pop up) (Doron Cohen)
4050
4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
4051
remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)
4053
5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)
4055
6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)
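
The contract referred to in item 2 (LUCENE-807) means callers should
test only the sign of a comparison result. The sketch below uses
java.util.Comparator directly; the class and method names are
hypothetical and nothing here is taken from ScoreDocComparator.

```java
import java.util.Comparator;

// Hypothetical sketch: a comparator that returns a difference, not -1/0/1.
final class CompareSignSketch {
    // Valid under the java.util.Comparator contract: any negative,
    // zero, or positive int may be returned. Lengths are non-negative,
    // so the subtraction cannot overflow.
    static final Comparator<String> BY_LENGTH =
            (a, b) -> a.length() - b.length();

    // Callers check the sign rather than comparing against -1 or 1.
    static boolean comesBefore(String a, String b) {
        return BY_LENGTH.compare(a, b) < 0;
    }

    public static void main(String[] args) {
        System.out.println(BY_LENGTH.compare("a", "abcd")); // -3
        System.out.println(comesBefore("a", "abcd"));       // true
    }
}
```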
4059
1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
4060
(Steven Parkes via Michael Busch)
4062
2. LUCENE-885: "ant test" now includes all contrib tests. The new
4063
"ant test-core" target can be used to run only the Core (non
4067
3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
4070
4. LUCENE-894: Add custom build file for binary distributions that includes
4071
targets to build the demos. (Chris Hostetter, Michael Busch)
4073
5. LUCENE-904: The "package" targets in build.xml now also generate .md5
4074
checksum files. (Chris Hostetter, Michael Busch)
4076
6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
4077
demo war, demo jar, and the contrib jars. (Michael Busch)
4079
7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)
4081
8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
4082
for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
4083
jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
4084
(Chris Hostetter, Michael Busch)
4086
9. LUCENE-930: Various contrib building improvements to ensure contrib
4087
dependencies are met, and test compilation errors fail the build.
4088
(Steven Parkes, Chris Hostetter)
4090
10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
4091
of the Lucene core and the contrib modules.
4092
(Sami Siren, Karl Wettin, Michael Busch)
4094
======================= Release 2.1.0 =======================
4096
Changes in runtime behavior
4098
1. 's' and 't' have been removed from the list of default stopwords
4099
in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
4100
as a stopword meant that 's-class' led to the same results as 'class'.
4101
Note that this problem still exists for 'a', e.g. in 'a-class' as
4102
'a' continues to be a stopword.
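The effect of item 1 can be illustrated with a minimal stand-alone sketch.
It does not use Lucene's analyzers: the class StopwordSketch, its analyze()
method, the hyphen-splitting rule and the two stopword sets are all
simplified assumptions made for this illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer: lowercases, splits on anything that
// is not a letter or digit, then drops stopwords. Not Lucene code.
final class StopwordSketch {
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> withS = new HashSet<>(Arrays.asList("a", "s", "t"));
        Set<String> withoutS = new HashSet<>(Arrays.asList("a"));
        System.out.println(analyze("s-class", withS));    // prints [class]
        System.out.println(analyze("s-class", withoutS)); // prints [s, class]
        System.out.println(analyze("a-class", withoutS)); // prints [class]
    }
}
```

With 's' in the stopword set, 's-class' and 'class' reduce to the same
token list; without it they stay distinct, while 'a-class' still collapses.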
4105
2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
4106
(now split into CJ and K) in StandardAnalyzer. (John Wang and
4107
Steven Rowe via Otis Gospodnetic)
4109
3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
4110
and added a few more of them to increase CJK character coverage.
4111
Also documented some of the ranges.
4114
4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
4115
QueryParser. Default is to disallow them, as before.
4116
(Steven Parkes via Otis Gospodnetic)
4118
5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
4119
for range queries. Added useOldRangeQuery property to QueryParser to allow
4120
selection of old RangeQuery class if required.
4123
6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
4124
does not contain a wildcard character (? or *), when previously a
4125
StringIndexOutOfBoundsException was thrown.
4126
(Michael Busch via Erik Hatcher)
4128
7. LUCENE-726: Removed the use of deprecated doc.fields() method and
4130
(Michael Busch via Otis Gospodnetic)
4132
8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
4133
and added a call to enumerators.remove() in TermInfosReader.close().
4134
The finalize() overrides were added to help with a pre-1.4.2 JVM bug
4135
that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
4138
9. LUCENE-771: The default location of the write lock is now the
4139
index directory, and is named simply "write.lock" (without a big
4140
digest prefix). The system properties "org.apache.lucene.lockDir"
4141
and "java.io.tmpdir" are no longer used as the global directory
4142
for storing lock files, and the LOCK_DIR field of FSDirectory is
4143
now deprecated. (Mike McCandless)
4147
1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
4148
(Samphan Raruenrom via Chris Hostetter)
4150
2. LUCENE-545: New FieldSelector API and associated changes to
4151
IndexReader and implementations. New Fieldable interface for use
4152
with the lazy field loading mechanism. (Grant Ingersoll and Chuck
4153
Williams via Grant Ingersoll)
4155
3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
4156
Smolsky, Yonik Seeley)
4158
4. LUCENE-678: Added NativeFSLockFactory, which implements locking
4159
using OS native locking (via java.nio.*). (Michael McCandless via
4162
5. LUCENE-544: Added the ability to specify different boosts for
4163
different fields when using MultiFieldQueryParser (Matt Ericson
4164
via Otis Gospodnetic)
4166
6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
4167
optimize the index when adding new segments, only performing
4168
merges as needed. (Ning Li via Yonik Seeley)
4170
7. LUCENE-573: QueryParser now allows backslash escaping in
4171
quoted terms and phrases. (Michael Busch via Yonik Seeley)
4173
8. LUCENE-716: QueryParser now allows specification of Unicode
4174
characters in terms via a unicode escape of the form \uXXXX
4175
(Michael Busch via Yonik Seeley)
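As a rough illustration of the escaping rules in items 7 and 8, the sketch
below decodes backslash escapes in a term. It is not QueryParser's actual
code: the class UnicodeEscapeSketch and its unescape() method are
hypothetical, and malformed hex digits are simply left to fail with a
NumberFormatException.

```java
// Hypothetical helper, not part of Lucene.
final class UnicodeEscapeSketch {
    // Decodes backslash-u escapes with four hex digits and drops the
    // backslash in front of any other escaped character.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == 'u' && i + 6 <= input.length()) {
                    // Four hex digits follow the 'u'.
                    out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                    i += 6;
                } else {
                    out.append(next);
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0041pache"));      // prints Apache
        System.out.println(unescape("say \\\"hi\\\""));    // prints say "hi"
    }
}
```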
4177
9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
4178
and IndexWriter.flushRamSegments(), allowing applications to
4179
control the amount of memory used to buffer documents.
4180
(Chuck Williams via Yonik Seeley)
4182
10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
4185
11. LUCENE-741: Command-line utility for modifying or removing norms
4186
on fields in an existing index. This is mostly based on LUCENE-496
4187
and lives in contrib/miscellaneous.
4188
(Chris Hostetter, Otis Gospodnetic)
4190
12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
4191
their passing unit tests.
4194
13. LUCENE-565: Added methods to IndexWriter to more efficiently
4195
handle updating documents (the "delete then add" use case). This
4196
is intended to be an eventual replacement for the existing
4197
IndexModifier. Added IndexWriter.flush() (renamed from
4198
flushRamSegments()) to flush all pending updates (held in RAM), to
4199
the Directory. (Ning Li via Mike McCandless)
4201
14. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
4202
which allow one to retrieve the size of a field without retrieving the
4203
actual field. (Chuck Williams via Grant Ingersoll)
4205
15. LUCENE-799: Properly handle lazy, compressed fields.
4206
(Mike Klaas via Grant Ingersoll)
4210
1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
4211
changing of termText via setTermText(). (Yonik Seeley)
4213
2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
4214
and is supposed to be replaced with the WordlistLoader class in
4215
package org.apache.lucene.analysis (Daniel Naber)
4217
3. LUCENE-609: Revert return type of Document.getField(s) to Field
4218
for backward compatibility, added new Document.getFieldable(s)
4219
for access to new lazy loaded fields. (Yonik Seeley)
4221
4. LUCENE-608: Document.fields() has been deprecated and a new method
4222
Document.getFields() has been added that returns a List instead of
4223
an Enumeration (Daniel Naber)
4225
5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
4226
subclass allow explain methods to produce Explanations which model
4227
"matching" independent of having a positive value.
4230
6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
4231
and IndexWriter.setDefaultCommitLockTimeout for overriding default
4232
timeout values for all future instances of IndexWriter (as well
4233
as for any other classes that may reference the static values,
4235
(Michael McCandless via Chris Hostetter)
4237
7. LUCENE-638: FSDirectory.list() now only returns the directory's
4238
Lucene-related files. Thanks to this change one can now construct
4239
a RAMDirectory from a file system directory that contains files
4240
not related to Lucene.
4241
(Simon Willnauer via Daniel Naber)
4243
8. LUCENE-635: Decoupling locking implementation from Directory
4244
implementation. Added set/getLockFactory to Directory and moved
4245
all locking code into subclasses of abstract class LockFactory.
4246
FSDirectory and RAMDirectory still default to their prior locking
4247
implementations, but now you can mix & match, for example using
4248
SingleInstanceLockFactory (ie, in memory locking) locking with an
4249
FSDirectory. Note that now you must call setDisableLocks before
4250
the instantiation of an FSDirectory if you wish to disable locking
4252
(Michael McCandless, Jeff Patterson via Yonik Seeley)
4254
9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
4255
(Steven Parkes via Otis Gospodnetic)
4257
10. LUCENE-701: Lockless commits: a commit lock is no longer required
4258
when a writer commits and a reader opens the index. This includes
4259
a change to the index file format (see docs/fileformats.html for
4260
details). It also removes all APIs associated with the commit
4261
lock & its timeout. Readers are now truly read-only and do not
4262
block one another on startup. This is the first step to getting
4263
Lucene to work correctly over NFS (second step is
4264
LUCENE-710). (Mike McCandless)
4266
11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
4267
in Similarity's MoreLikeThis class. The misspelling has been
4268
replaced by the correct spelling.
4269
(Andi Vajda via Daniel Naber)
4271
12. LUCENE-738: Reduce the size of the file that keeps track of which
4272
documents are deleted when the number of deleted documents is
4273
small. This changes the index file format and cannot be
4274
read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)
4276
13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
4277
number of open files and file descriptors for the non-compound index
4278
format. This changes the index file format, but maintains the
4279
ability to read and update older indices. The first segment merge
4280
on an older format index will create a single .nrm file for the new
4281
segment. (Doron Cohen via Yonik Seeley)
4283
14. LUCENE-732: DateTools support has been added to QueryParser, with
4284
setters for both the default Resolution, and per-field Resolution.
4285
For backwards compatibility, DateField is still used if no Resolutions
4286
are specified. (Michael Busch via Chris Hostetter)
4288
15. Added isOptimized() method to IndexReader.
4291
16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
4292
take a boolean "create" argument. Instead you should use
4293
IndexWriter's "create" argument to create a new index.
4296
17. LUCENE-780: Add a static Directory.copy() method to copy files
4297
from one Directory to another. (Jiri Kuhn via Mike McCandless)
4299
18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
4300
remove an old lock. The default implementation is to ask the
4301
lockFactory (if non null) to clear the lock. (Mike McCandless)
4303
19. LUCENE-795: Directory.renameFile() has been deprecated as it is
4304
not used anymore inside Lucene. (Daniel Naber)
4308
1. Fixed the web application demo (built with "ant war-demo") which
4309
didn't work because it used a QueryParser method that had
4310
been removed (Daniel Naber)
4312
2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
4315
3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
4316
(Karl Wettin via Yonik Seeley)
4318
4. LUCENE-587: Explanation.toHtml was producing malformed HTML
4321
5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)
4323
6. LUCENE-601: RAMDirectory and RAMFile made Serializable
4324
(Karl Wettin via Otis Gospodnetic)
4326
7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
4327
Explanations match up with the real scores.
4330
8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
4331
new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)
4333
9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
4334
disambiguate inner class scorer's use of doc() in BooleanScorer2,
4335
other test code changes. (DM Smith via Yonik Seeley)
4337
10. LUCENE-451: All core query types now use ComplexExplanations so that
4338
boosts of zero don't confuse the BooleanWeight explain method.
4341
11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
4342
(Kåre Fiedler Christiansen via Otis Gospodnetic)
4344
12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
4347
13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
4348
to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)
4350
14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
4352
(Oliver Hutchison via Chris Hostetter)
4354
15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
4357
16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
4358
lock to be shared between different directories.
4359
(Michael McCandless via Yonik Seeley)
4361
17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
4364
18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
4365
called on it before next(). (Yonik Seeley)
4367
19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
4368
to recognize ordered spans if they overlapped with unordered spans.
4369
(Paul Elschot via Chris Hostetter)
4371
20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
4372
in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)
4374
21. LUCENE-715: Fixed private constructor in IndexWriter.java to
4375
properly release the acquired write lock if there is an
4376
IOException after acquiring the write lock but before finishing
4377
instantiation. (Matthew Bogosian via Mike McCandless)
4379
22. LUCENE-651: Multiple different threads requesting the same
4380
FieldCache entry (often for Sorting by a field) at the same
4381
time caused multiple generations of that entry, which was
4382
detrimental to performance and memory use.
4383
(Oliver Hutchison via Otis Gospodnetic)
4385
23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
4386
(Doron Cohen via Otis Gospodnetic)
4388
24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
4389
classes from contrib/similarity, as their new home is under
4393
25. LUCENE-669: Do not double-close the RandomAccessFile in
4394
FSIndexInput/Output during finalize(). Besides sending an
4395
IOException up to the GC, this may also be the cause of intermittent
4396
"The handle is invalid" IOExceptions on Windows when trying to
4397
close readers or writers. (Michael Busch via Mike McCandless)
4399
26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
4400
on any exceptions (eg disk full). The semantics of these methods
4401
is now transactional: either all indices are merged or none are.
4402
Also fixed IndexWriter.mergeSegments (called outside of
4403
addIndexes(*) by addDocument, optimize, flushRamSegments) and
4404
IndexReader.commit() (called by close) to clean up and keep the
4405
instance state consistent with what's actually in the index (Mike
4408
27. LUCENE-129: Change finalizers to do "try {...} finally
4409
{super.finalize();}" to make sure we don't miss finalizers in
4410
classes above us. (Esmond Pitt via Mike McCandless)
4412
28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
4413
IndexReaders to hang around forever, in addition to not
4414
fixing the original FieldCache performance problem.
4415
(Chris Hostetter, Yonik Seeley)
4417
29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
4418
correctly raise ArrayIndexOutOfBoundsException when docNum is too
4419
large. Previously, if docNum was only slightly too large (within
4420
the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
4421
exception would be raised and instead the index would become
4422
silently corrupted. The corruption then only appears much later,
4423
in mergeSegments, when the corrupted segment is merged with
4424
segment(s) after it. (Mike McCandless)
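The failure mode in item 29 can be sketched with a byte-backed bit vector.
This is an assumption-laden illustration, not Lucene's implementation: the
class BitVectorSketch and both setter names are hypothetical. It shows why
an index only slightly beyond the logical size slips through when the
backing array is the only guard.

```java
// Hypothetical bit vector with one bit per document; not Lucene code.
final class BitVectorSketch {
    private final byte[] bits;
    private final int size;

    BitVectorSketch(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) / 8]; // 8 documents per byte, rounded up
    }

    // Unchecked: only the backing array guards the index, so any bit that
    // falls inside the last allocated byte is silently accepted.
    void setUnchecked(int bit) {
        bits[bit >> 3] |= 1 << (bit & 7);
    }

    // Checked: rejects every index outside [0, size).
    void setChecked(int bit) {
        if (bit < 0 || bit >= size) {
            throw new ArrayIndexOutOfBoundsException(bit);
        }
        bits[bit >> 3] |= 1 << (bit & 7);
    }

    public static void main(String[] args) {
        BitVectorSketch deleted = new BitVectorSketch(10); // 10 docs, 2 bytes
        deleted.setUnchecked(12); // beyond size, yet no exception
        try {
            deleted.setChecked(12);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("rejected out-of-range document number");
        }
    }
}
```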
4426
30. LUCENE-768: Fix case where an Exception during deleteDocument,
4427
undeleteAll or setNorm in IndexReader could leave the reader in a
4428
state where close() fails to release the write lock.
4431
31. Remove "tvp" from known index file extensions because it is
4432
never used. (Nicolas Lalevée via Bernhard Messer)
4434
32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
4435
rely on file length check and instead use the SegmentInfo's
4436
docCount that's already stored explicitly in the index. This is a
4437
defensive bug fix (ie, there is no known problem seen "in real
4438
life" due to this, just a possible future problem). (Chuck
4439
Williams via Mike McCandless)
4443
1. LUCENE-586: TermDocs.skipTo() is now more efficient for
4444
multi-segment indexes. This will improve the performance of many
4445
types of queries against a non-optimized index. (Andrew Hudson
4448
2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
4449
internal "files", allowing them to be GCed even if references to the
4450
RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)
4452
3. LUCENE-629: Compressed fields are no longer uncompressed and
4453
recompressed during segment merges (e.g. during indexing or
4454
optimizing), thus improving performance. (Michael Busch via Otis
4457
4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
4458
large by keeping a count of buffered documents rather than
4459
counting after each document addition. (Doron Cohen, Paul Smith,
4462
5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
4463
looping through docs. (Grant Ingersoll)
4465
6. LUCENE-672: New indexing segment merge policy flushes all
4466
buffered docs to their own segment and delays a merge until
4467
mergeFactor segments of a certain level have been accumulated.
4468
This increases indexing performance in the presence of deleted
4469
docs or partially full segments as well as enabling future
4472
NOTE: this also fixes an "under-merging" bug whereby it is
4473
possible to get far too many segments in your index (which will
4474
drastically slow down search, risk exhausting the file descriptor
4475
limit, etc.). This can happen when the number of buffered docs
4476
at close, plus the number of docs in the last non-ram segment is
4477
greater than mergeFactor. (Ning Li, Yonik Seeley)
4479
7. Lazy loaded fields unnecessarily retained an extra copy of loaded
4480
String data. (Yonik Seeley)
4482
8. LUCENE-443: ConjunctionScorer performance increase. Speed up
4483
any BooleanQuery with more than one mandatory clause.
4484
(Abdul Chaudhry, Paul Elschot via Yonik Seeley)
4486
9. LUCENE-365: DisjunctionSumScorer performance increase of
4487
~30%. Speeds up queries with optional clauses. (Paul Elschot via
4490
10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
4491
size buffers, which will speed up merging and retrieving binary
4492
and compressed fields. (Nadav Har'El via Yonik Seeley)
4494
11. LUCENE-687: Lazy skipping on proximity file speeds up most
4495
queries involving term positions, including phrase queries.
4496
(Michael Busch via Yonik Seeley)
4498
12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
4499
with calls to System.arraycopy instead, in DocumentWriter.java.
4500
(Nicolas Lalevee via Mike McCandless)
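The kind of change described in item 12 looks roughly like the following.
This is a generic sketch rather than the actual DocumentWriter.java code;
the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of replacing a manual copy loop.
final class ArrayCopySketch {
    // Before: element-by-element copy.
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    // After: a single bulk copy.
    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] positions = {3, 1, 4, 1, 5};
        System.out.println(Arrays.equals(copyWithLoop(positions),
                                         copyWithArraycopy(positions))); // prints true
    }
}
```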
4502
13. LUCENE-729: Non-recursive skipTo and next implementation of
4503
TermDocs for a MultiReader. The old implementation could
4504
recurse up to the number of segments in the index. (Yonik Seeley)
4506
14. LUCENE-739: Improve segment merging performance by reusing
4507
the norm array across different fields and doing bulk writes
4508
of norms of segments with no deleted docs.
4509
(Michael Busch via Yonik Seeley)
4511
15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
4512
to the List of clauses and replaced the internal synchronized Vector
4513
with an unsynchronized List. (Yonik Seeley)
4515
16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
4516
FSIndexInput finalizer to the actual file so all clones don't
4517
register a new finalizer. (Yonik Seeley)
4521
1. Added TestTermScorer.java (Grant Ingersoll)
4523
2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)
4525
3. LUCENE-744: Append the user.name property onto the temporary directory
4526
that is created so it doesn't interfere with other users. (Grant Ingersoll)
4530
1. Added style sheet to xdocs named lucene.css and included in the
4531
Anakia VSL descriptor. (Grant Ingersoll)
4533
2. Added scoring.xml document into xdocs. Updated Similarity.java
4534
scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
4535
Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
4538
3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)
4540
4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
4541
Issue 707. Site now builds using Forrest, just like the other Lucene
4542
siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
4543
for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
4544
Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)
4546
5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)
4548
6. LUCENE-713: Updated the Term Vector section of File Formats to include
4549
documentation on how Offset and Position info are stored in the TVF file.
4550
(Grant Ingersoll, Samir Abdou)
4552
7. Added a link to Clover Test Code Coverage Reports under the Develop
4553
section in Resources (Grant Ingersoll)
4555
8. LUCENE-748: Added details for semantics of IndexWriter.close on
4556
hitting an Exception. (Jed Wesley-Smith via Mike McCandless)
4558
9. Added some text about what is contained in releases.
4559
(Eric Haszlakiewicz via Grant Ingersoll)
4561
10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
4562
makes a full copy of the starting Directory. (Mike McCandless)
4564
11. LUCENE-764: Fix javadocs to detail temporary space requirements
4565
for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
4566
methods. (Mike McCandless)
4570
1. Added Clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721.
4571
To enable clover code coverage, you must have clover.jar in the ANT
4572
classpath and specify -Drun.clover=true on the command line.
4573
(Michael Busch and Grant Ingersoll)
4575
2. Added a sysproperty in common-build.xml per LUCENE-752 to map java.io.tmpdir to
4576
${build.dir}/test just like the tempDir sysproperty.
4578
3. LUCENE-757: Added new target named init-dist that does setup for
4579
distribution of both binary and source distributions. Called by package
4582
======================= Release 2.0.0 =======================
4586
1. All deprecated methods and fields have been removed, except
4587
DateField, which will still be supported for some time
4588
so Lucene can read its date fields from old indexes
4589
(Yonik Seeley & Grant Ingersoll)
4591
2. DisjunctionSumScorer is no longer public.
4592
(Paul Elschot via Otis Gospodnetic)
4594
3. Creating a Field with both an empty name and an empty value
4595
now throws an IllegalArgumentException
4598
4. LUCENE-301: Added new IndexWriter({String,File,Directory},
4599
Analyzer) constructors that do not take a boolean "create"
4600
argument. These new constructors will create a new index if
4601
necessary, else append to the existing one. (Dan Armbrust via
4606
1. LUCENE-496: Command line tool for modifying the field norms of an
4607
existing index; added to contrib/miscellaneous. (Chris Hostetter)
4609
2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
4614
1. LUCENE-330: Fix issue of FilteredQuery not working properly within
4615
BooleanQuery. (Paul Elschot via Erik Hatcher)
4617
2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
4618
with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)
4620
3. Added methods to get/set writeLockTimeout and commitLockTimeout in
4621
IndexWriter. These could be set in Lucene 1.4 using a system property.
4622
This feature had been removed without adding the corresponding
4623
getter/setter methods. (Daniel Naber)
4625
4. LUCENE-413: Fixed ArrayIndexOutOfBoundsExceptions thrown
4626
when using SpanQueries. (Paul Elschot via Yonik Seeley)
4628
5. Implemented FilterIndexReader.getVersion() and isCurrent()
4631
6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
4632
that sometimes caused the index order of documents to change.
4635
7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
4636
subsequent String sorts with different locales to sort identically.
4637
(Paul Cowan via Yonik Seeley)
4639
8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
4640
(Stefan Will via Yonik Seeley)
4642
9. LUCENE-514: Added getTermArrays() and extractTerms() to
4643
MultiPhraseQuery (Eric Jain & Yonik Seeley)
4645
10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
4646
(frederic via Yonik)
4648
11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
4649
NullPointerException when "exclude" query was not a SpanTermQuery.
4652
12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause
4655
13. LUCENE-561: Fixed some ParallelReader bugs. NullPointerException if the reader
4656
didn't know about the field yet, reader didn't keep track if it had deletions,
4657
and deleteDocument calls could circumvent synchronization on the subreaders.
4658
(Chuck Williams via Yonik Seeley)
4660
14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
4661
ConstantScoreQuery in order to allow their use with a MultiSearcher.
4664
15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
4665
(Peter Royal, Michael Chan, Yonik Seeley)
4667
16. LUCENE-485: Don't hold commit lock while removing obsolete index
4668
files. (Luc Vanlerberghe via cutting)
4675
1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
4676
introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)
4680
Note that this release is mostly but not 100% source compatible with
4681
the previous release of Lucene (1.4.3). In other words, you should
4682
make sure your application compiles with this version of Lucene before
4683
you replace the old Lucene JAR with the new one. Many methods have
4684
been deprecated in anticipation of release 2.0, so deprecation
4685
warnings are to be expected when upgrading from 1.4.3 to 1.9.
4689
1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
4690
effects on indexing performance and has thus been reverted. The
4691
argument for setMaxBufferedDocs(int) must now be at least 2; otherwise
4692
an exception is thrown. (Daniel Naber)
4696
1. Optimized BufferedIndexOutput.writeBytes() to use
4697
System.arraycopy() in more cases, rather than copying byte-by-byte.
4698
(Lukas Zapletal via Cutting)
4704
1. To compile and use Lucene you now need Java 1.4 or later.
4706
Changes in runtime behavior
4708
1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
4709
FuzzyQuery expands to more than BooleanQuery.maxClauseCount
4710
terms only the BooleanQuery.maxClauseCount most similar terms
4711
go into the rewritten query and thus the exception is avoided.
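The selection rule in item 1 amounts to keeping the top-k terms by
similarity. A minimal sketch of that idea follows. It does not show
FuzzyQuery's real rewrite code: the class TopTermsSketch, the method
mostSimilar() and the similarity values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical top-k selection over candidate terms; not Lucene code.
final class TopTermsSketch {
    static List<String> mostSimilar(Map<String, Double> similarityByTerm, int maxClauseCount) {
        // Min-heap on similarity: the head is always the weakest term kept so far.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Map.Entry<String, Double> e) -> e.getValue()));
        for (Map.Entry<String, Double> entry : similarityByTerm.entrySet()) {
            heap.offer(entry);
            if (heap.size() > maxClauseCount) {
                heap.poll(); // evict the least similar term instead of failing
            }
        }
        List<String> kept = new ArrayList<>();
        while (!heap.isEmpty()) {
            kept.add(heap.poll().getKey());
        }
        Collections.reverse(kept); // most similar first
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Double> candidates = new LinkedHashMap<>();
        candidates.put("lunch", 0.5);
        candidates.put("lucene", 0.9);
        candidates.put("lucerne", 0.7);
        candidates.put("lucent", 0.8);
        System.out.println(mostSimilar(candidates, 2)); // prints [lucene, lucent]
    }
}
```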
4714
2. Changed system property from "org.apache.lucene.lockdir" to
4715
"org.apache.lucene.lockDir", so that its casing follows the existing
4716
pattern used in other Lucene system properties. (Bernhard)
4718
3. The terms of RangeQueries and FuzzyQueries are now converted to
4719
lowercase by default (as has been the case for PrefixQueries
4720
and WildcardQueries before). Use setLowercaseExpandedTerms(false)
4721
to disable that behavior but note that this also affects
4722
PrefixQueries and WildcardQueries. (Daniel Naber)
4724
4. Document frequency that is computed when MultiSearcher is used is now
4725
computed correctly and "globally" across subsearchers and indices, while
4726
before it used to be computed locally to each index, which caused
4727
ranking across multiple indices not to be equivalent.
4728
(Chuck Williams, Wolf Siberski via Otis, bug #31841)
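Why local document frequencies make rankings incomparable (item 4) can be
seen in a small sketch. The class GlobalDocFreqSketch is hypothetical, and
the log-based idf formula used here is an assumption modelled on classic
TF-IDF weighting, chosen only to make the effect visible.

```java
// Hypothetical illustration of local versus global document frequency.
final class GlobalDocFreqSketch {
    // Assumed idf formula: rarer terms receive a higher weight.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Sums document frequency and document count over all sub-indexes first.
    static double globalIdf(int[] docFreqs, int[] numDocs) {
        int totalDocFreq = 0;
        int totalDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            totalDocFreq += docFreqs[i];
            totalDocs += numDocs[i];
        }
        return idf(totalDocFreq, totalDocs);
    }

    public static void main(String[] args) {
        int[] docFreqs = {1, 500};   // the term is rare in index A, common in index B
        int[] numDocs = {1000, 1000};
        // Local weights differ, so scores from A and B are not comparable.
        System.out.println(idf(docFreqs[0], numDocs[0]) > idf(docFreqs[1], numDocs[1])); // prints true
        // One global weight applies to hits from both indexes.
        System.out.println(globalIdf(docFreqs, numDocs));
    }
}
```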
4730
5. When opening an IndexWriter with create=true, Lucene now only deletes
4731
its own files from the index directory (looking at the file name suffixes
4732
to decide if a file belongs to Lucene). The old behavior was to delete
4733
all files. (Daniel Naber and Bernhard Messer, bug #34695)
4735
6. The version of an IndexReader, as returned by getCurrentVersion()
4736
and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
4737
is now initialized by the system time in milliseconds.
4738
(Bernhard Messer via Daniel Naber)
4740
7. Several default values cannot be set via system properties anymore, as
4741
this has been considered inappropriate for a library like Lucene. For
4742
most properties there are set/get methods available in IndexWriter which
4743
you should use instead. This affects the following properties:
4744
See IndexWriter for getter/setter methods:
4745
org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
4746
org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
4747
org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
4748
org.apache.lucene.mergeFactor,
4749
See BooleanQuery for getter/setter methods:
4750
org.apache.lucene.maxClauseCount
4751
See FSDirectory for getter/setter methods:
4755
8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
4756
instead of using Integer and Float classes for parsing.
4757
(Yonik Seeley via Otis Gospodnetic)
4759
9. Expert level search routines returning TopDocs and TopFieldDocs
4760
no longer normalize scores. This also fixes bugs related to
4761
MultiSearchers and score sorting/normalization.
4762
(Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
4766
1. Added support for stored compressed fields (patch #31149)
4767
(Bernhard Messer via Christoph)
4769
2. Added support for binary stored fields (patch #29370)
4770
(Drew Farris and Bernhard Messer via Christoph)
4772
3. Added support for position and offset information in term vectors
4773
(patch #18927). (Grant Ingersoll & Christoph)
4775
4. A new class DateTools has been added. It allows you to format dates
4776
in a readable format adequate for indexing. Unlike the existing
4777
DateField class, DateTools can cope with dates before 1970, and it
4778
forces you to specify the desired date resolution (e.g. month, day,
4779
second, ...) which can make RangeQuerys on those fields more efficient.
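The idea behind resolution-based date strings in item 4 can be sketched
with the JDK alone. This is not the DateTools API: the class
DateResolutionSketch, the method toIndexString() and the pattern strings
are assumptions for illustration. A coarser pattern yields fewer distinct
terms, and the strings sort lexicographically in chronological order.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper showing fixed-width, sortable date strings; not Lucene code.
final class DateResolutionSketch {
    // The pattern decides the resolution, e.g. "yyyyMM" for month, "yyyyMMdd" for day.
    static String toIndexString(Instant instant, String pattern) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC)
                .format(instant);
    }

    public static void main(String[] args) {
        Instant before1970 = Instant.parse("1969-07-20T20:17:00Z");
        System.out.println(toIndexString(before1970, "yyyyMM"));   // prints 196907
        System.out.println(toIndexString(before1970, "yyyyMMdd")); // prints 19690720
    }
}
```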
4782
5. QueryParser now correctly works with Analyzers that can return more
4783
than one token per position. For example, a query "+fast +car"
4784
would be parsed as "+fast +(car automobile)" if the Analyzer
4785
returns "car" and "automobile" at the same position whenever it
4786
finds "car" (Patch #23307).
4787
(Pierrick Brihaye, Daniel Naber)
4789
6. Permit unbuffered Directory implementations (e.g., using mmap).
4790
InputStream is replaced by the new classes IndexInput and
4791
BufferedIndexInput. OutputStream is replaced by the new classes
4792
IndexOutput and BufferedIndexOutput. InputStream and OutputStream
4793
are now deprecated and FSDirectory is now subclassable. (cutting)
4795
7. Add native Directory and TermDocs implementations that work under
4796
GCJ. These require GCC 3.4.0 or later and have only been tested
4797
on Linux. Use 'ant gcj' to build demo applications. (cutting)
4799
8. Add MMapDirectory, which uses nio to mmap input files. This is
4800
still somewhat slower than FSDirectory. However it uses less
4801
memory per query term, since a new buffer is not allocated per
4802
term, which may help applications which use, e.g., wildcard
4803
queries. It may also someday be faster. (cutting & Paul Elschot)
4805
9. Added javadocs-internal to build.xml - bug #30360
4806
(Paul Elschot via Otis)
4808
10. Added RangeFilter, a more generically useful filter than DateFilter.
4809
(Chris M Hostetter via Erik)
4811
11. Added NumberTools, a utility class for indexing numeric fields.
4812
(adapted from code contributed by Matt Quail; committed by Erik)
4814
12. Added public static IndexReader.main(String[] args) method.
4815
IndexReader can now be used directly at command line level
4816
to list and optionally extract the individual files from an existing
4817
compound index file.
4818
(adapted from code contributed by Garrett Rooney; committed by Bernhard)
4820
13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
4823
14. Added LucenePackage, whose static get() method returns java.util.Package,
4824
which lets the caller get the Lucene version information specified in
4826
(Doug Cutting via Otis)
4828
15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
4829
This provides standard java.util.Iterator iteration over Hits.
4830
Each call to the iterator's next() method returns a Hit object.
4831
(Jeremy Rayner via Erik)
4833
16. Add ParallelReader, an IndexReader that combines separate indexes
4834
over different fields into a single virtual index. (Doug Cutting)
4836
17. Add IntParser and FloatParser interfaces to FieldCache, so that
4837
fields in arbitrary formats can be cached as ints and floats.
4840
18. Added class org.apache.lucene.index.IndexModifier which combines
4841
IndexWriter and IndexReader, so you can add and delete documents without
4842
worrying about synchronization/locking issues.
4845
19. Lucene can now be used inside an unsigned applet, as Lucene's access
4846
to system properties will not cause a SecurityException anymore.
4847
(Jon Schuster via Daniel Naber, bug #34359)
4849
20. Added a new class MatchAllDocsQuery that matches all documents.
4850
(John Wang via Daniel Naber, bug #34946)
4852
21. Added ability to omit norms on a per field basis to decrease
4853
index size and memory consumption when there are many indexed fields.
4854
See Field.setOmitNorms()
4855
(Yonik Seeley, LUCENE-448)
4857
22. Added NullFragmenter to contrib/highlighter, which is useful for
4858
highlighting entire documents or fields.
4861
23. Added regular expression queries, RegexQuery and SpanRegexQuery.
4862
Note the same term enumeration caveats apply with these queries as
4863
apply to WildcardQuery and other term expanding queries.
4864
These two new queries are not currently supported via QueryParser.
4867
24. Added ConstantScoreQuery which wraps a filter and produces a score
4868
equal to the query boost for every matching document.
4869
(Yonik Seeley, LUCENE-383)
4871
25. Added ConstantScoreRangeQuery which produces a constant score for
4872
every document in the range. One advantage over a normal RangeQuery
4873
is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum
4874
number of terms the range can cover. Both endpoints may also be open.
4875
(Yonik Seeley, LUCENE-383)
4877
26. Added ability to specify a minimum number of optional clauses that
4878
must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().
4879
(Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)
4881
27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.
4882
It's very useful for searching across multiple fields.
4883
(Chuck Williams via Yonik Seeley, LUCENE-323)
4885
28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
4886
Latin 1 character set by their unaccented equivalent.
4887
(Sven Duzont via Erik Hatcher)
4889
29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
4890
This is useful for data like zip codes, ids, and some product names.
4893
30. Copied LengthFilter from contrib area to core. Removes words that are too
4894
long or too short from the stream.
4895
(David Spencer via Otis and Daniel)
4897
31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
4898
custom analyzers to put gaps between Field instances with the same field
4899
name, preventing phrase or span queries crossing these boundaries. The
4900
default implementation issues a gap of 0, allowing the default token
4901
position increment of 1 to put the next field's first token into a
4902
successive position.
4903
(Erik Hatcher, with advice from Yonik)
4905
32. StopFilter can now ignore case when checking for stop words.
4906
(Grant Ingersoll via Yonik, LUCENE-248)
4908
33. Add TopDocCollector and TopFieldDocCollector. These simplify the
4909
implementation of hit collectors that collect only the
4910
top-scoring or top-sorting hits.
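As an illustration of item 33, here is a minimal sketch of how a collector can keep only the top-scoring hits with a bounded min-heap. It is not the TopDocCollector source; the class TopScoreSketch, its Hit type, and the tie-break rule (lower document number wins on equal scores) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: retains the n best (doc, score) pairs.
final class TopScoreSketch {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Orders the weakest hit first: lower score, then (assumed) higher doc number.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    private final int n;
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(WEAKEST_FIRST);

    TopScoreSketch(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
    }

    // Called once per matching document; costs O(log n) at most.
    void collect(int doc, float score) {
        Hit hit = new Hit(doc, score);
        if (heap.size() < n) {
            heap.add(hit);
        } else if (WEAKEST_FIRST.compare(hit, heap.peek()) > 0) {
            heap.poll();   // evict the current weakest hit
            heap.add(hit);
        }
    }

    // Returns the retained hits, best first.
    List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort(WEAKEST_FIRST.reversed());
        return out;
    }
}
```

Because the heap never holds more than n entries, memory use stays constant no matter how many documents match.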
4914
1. Several methods and fields have been deprecated. The API documentation
4915
contains information about the recommended replacements. It is planned
4916
that most of the deprecated methods and fields will be removed in
4917
Lucene 2.0. (Daniel Naber)
4919
2. The Russian and the German analyzers have been moved to contrib/analyzers.
4920
Also, the WordlistLoader class has been moved one level up in the
4921
hierarchy and is now org.apache.lucene.analysis.WordlistLoader
4924
3. The API contained methods that were declared to throw an IOException
4925
but never did so. These declarations have been removed. If
4926
your code tries to catch these exceptions you might need to remove
4927
those catch clauses to avoid compile errors. (Daniel Naber)
4929
4. Add a serializable Parameter Class to standardize parameter enum
4930
classes in BooleanClause and Field. (Christoph)
4932
5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
4933
This allows custom SpanQuery subclasses that rewrite (for term expansion, for
4934
example) to nest within the built-in SpanQuery classes successfully.
4938
1. The JSP demo page (src/jsp/results.jsp) now properly closes the
4939
IndexSearcher it opens. (Daniel Naber)
4941
2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
4942
prevented deletion of obsolete segments. (Christoph Goller)
4944
3. Fix in FieldInfos to avoid the return of an extra blank field in
4945
IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)
4947
4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
4948
PhrasePrefixQuery) could provoke UnsupportedOperationException
4949
(bug #33161). (Rhett Sutphin via Daniel Naber)
4951
5. Fixed a small bug in ConjunctionScorer.skipTo() that caused a NullPointerException
4952
if skipTo() was called without a prior call to next(). (Christoph)
4954
6. Disable Similarity.coord() in the scoring of most automatically
4955
generated boolean queries. The coord() score factor is
4956
appropriate when clauses are independently specified by a user,
4957
but is usually not appropriate when clauses are generated
4958
automatically, e.g., by a fuzzy, wildcard or range query. Matches
4959
on such automatically generated queries are no longer penalized
4960
for not matching all terms. (Doug Cutting, Patch #33472)
4962
7. Getting a lock file with Lock.obtain(long) was supposed to wait for
4963
a given amount of milliseconds, but this didn't work.
4964
(John Wang via Daniel Naber, Bug #33799)
4966
8. Fix FSDirectory.createOutput() to always create new files.
4967
Previously, existing files were overwritten, and an index could be
4968
corrupted when the old version of a file was longer than the new.
4969
Now any existing file is first removed. (Doug Cutting)
4971
9. Fix BooleanQuery containing nested SpanTermQuery's, which previously
4972
could return an incorrect number of hits.
4973
(Reece Wilton via Erik Hatcher, Bug #35157)
4975
10. Fix NullPointerException that could occur with a MultiPhraseQuery
4976
inside a BooleanQuery.
4977
(Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)
4979
11. Fixed SnowballFilter to pass through the position increment from
4981
(Yonik Seeley via Erik Hatcher, LUCENE-437)
4983
12. Added Unicode range of Korean characters to StandardTokenizer,
4984
grouping contiguous characters into a token rather than one token
4985
per character. This change also changes the token type to "<CJ>"
4986
for Chinese and Japanese character tokens (previously it was "<CJK>").
4987
(Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)
4989
13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
4990
FieldInfo.storePositionWithTermVector and creates the Field with
4991
correct TermVector parameter.
4992
(Frank Steinmann via Bernhard, LUCENE-455)
4994
14. Fixed WildcardQuery to prevent "cat" matching "ca??".
4995
(Xiaozheng Ma via Bernhard, LUCENE-306)
4997
15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
4998
change the sort order when sorting by string for documents without
4999
a value for the sort field.
5000
(Luc Vanlerberghe via Yonik, LUCENE-453)
5002
16. Fixed a sorting problem with MultiSearchers that can lead to
5003
missing or duplicate docs due to equal docs sorting in an arbitrary order.
5004
(Yonik Seeley, LUCENE-456)
5006
17. A single hit using the expert level sorted search methods
5007
resulted in the score not being normalized.
5008
(Yonik Seeley, LUCENE-462)
5010
18. Fixed inefficient memory usage when loading an index into RAMDirectory.
5011
(Volodymyr Bychkoviak via Bernhard, LUCENE-475)
5013
19. Corrected term offsets returned by ChineseTokenizer.
5014
(Ray Tsang via Erik Hatcher, LUCENE-324)
5016
20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
5017
(Robert Kirchgessner via Doug Cutting, LUCENE-479)
5019
21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
5020
fixed by acquiring the commit lock.
5021
(Luc Vanlerberghe via Yonik Seeley, LUCENE-481)
5023
22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
5024
this has now been fixed. (Daniel Naber)
5026
23. Fixed QueryParser when called with a date in local form like
5027
"[1/16/2000 TO 1/18/2000]". This query did not include the documents
5028
of 1/18/2000, i.e. the last day was not included. (Daniel Naber)
5030
24. Removed sorting constraint that threw an exception if there were
5031
not yet any values for the sort field (Yonik Seeley, LUCENE-374)
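The failure mode behind item 8 (FSDirectory.createOutput()) can be reproduced with plain java.io, as in the sketch below. It is not the FSDirectory implementation; CreateOutputSketch and its method names are hypothetical. Opening an existing file for writing without truncation leaves the tail of the old, longer content in place, while removing the file first does not.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Lucene code.
final class CreateOutputSketch {
    // Writes from offset 0 without truncating: old bytes past data.length survive.
    static void overwriteInPlace(Path file, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.write(data);
        }
    }

    // Removes any existing file first, so the result holds exactly the new bytes.
    static void createFresh(Path file, byte[] data) throws IOException {
        Files.deleteIfExists(file);
        overwriteInPlace(file, data);
    }

    // Returns {size after in-place overwrite, size after delete-then-create}.
    static long[] demo() {
        try {
            Path file = Files.createTempFile("segment", ".tmp");
            try {
                Files.write(file, new byte[10]);      // old, longer version
                overwriteInPlace(file, new byte[4]);  // new, shorter version
                long stale = Files.size(file);
                createFresh(file, new byte[4]);
                long fresh = Files.size(file);
                return new long[] {stale, fresh};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demo();
        System.out.println("in place: " + sizes[0] + " bytes, fresh: " + sizes[1] + " bytes");
    }
}
```

A reader that trusts the file length would treat the stale tail as live data, which is the kind of corruption the entry describes.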
5035
1. Disk usage (peak requirements during indexing and optimization)
5036
in case of compound file format has been improved.
5037
(Bernhard, Dmitry, and Christoph)
5039
2. Optimize the performance of certain uses of BooleanScorer,
5040
TermScorer and IndexSearcher. In particular, a BooleanQuery
5041
composed of TermQuery, with not all terms required, that returns a
5042
TopDocs (e.g., through a Hits with no Sort specified) runs much
5045
3. Removed synchronization from reading of term vectors with an
5046
IndexReader (Patch #30736). (Bernhard Messer via Christoph)
5048
4. Optimize term-dictionary lookup to allocate far fewer terms when
5049
scanning for the matching term. This speeds searches involving
5050
low-frequency terms, where the cost of dictionary lookup can be
5051
significant. (cutting)
5053
5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
5054
of 0 now run 20-50% faster (Patch #31882).
5055
(Jonathan Hager via Daniel Naber)
5057
6. Added a version of BooleanScorer (BooleanScorer2) that delivers
5058
documents in increasing order and implements skipTo. For queries
5059
with required or forbidden clauses it may be faster than the old
5060
BooleanScorer; for BooleanQueries consisting only of optional
5061
clauses it is probably slower. The new BooleanScorer is now the
5062
default. (Patch 31785 by Paul Elschot via Christoph)
5064
7. Use uncached access to norms when merging to reduce RAM usage.
5065
(Bug #32847). (Doug Cutting)
5067
8. Don't read term index when random-access is not required. This
5068
reduces time to open IndexReaders and they use less memory when
5069
random access is not required, e.g., when merging segments. The
5070
term index is now read into memory lazily at the first
5071
random-access. (Doug Cutting)
5073
9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
5074
added indexes is larger than mergeFactor. Previously this could
5075
result in quadratic performance. Now performance is n log(n).
5078
10. Speed up the creation of TermEnum for indices with multiple
5079
segments and deleted documents, and thus speed up PrefixQuery,
5080
RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
5081
and sorting the first time on a field.
5082
(Yonik Seeley, LUCENE-454)
5084
11. Optimized and generalized 32 bit floating point to byte
5085
(custom 8 bit floating point) conversions. Increased the speed of
5086
Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
5087
(Yonik Seeley, LUCENE-467)
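To make item 11 concrete, here is a sketch of a 32 bit float to 8 bit float conversion done with integer bit operations. The layout used (3 mantissa bits, 5 exponent bits, zero-exponent point of 15) is an assumption for this example, as are the class and method names; the entry itself only says that a custom 8 bit floating point format is involved.

```java
// Hypothetical sketch; the 3/5-bit layout is an assumption, not taken from the entry.
final class SmallFloatSketch {
    private static final int MANTISSA_BITS = 3;
    private static final int ZERO_EXP = 15;

    // Truncates a float to one byte; negative values and zero map to 0.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - MANTISSA_BITS);
        int floor = (63 - ZERO_EXP) << MANTISSA_BITS;
        if (small <= floor) {
            // Underflow: zero stays zero, tiny positives become the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= floor + 0x100) {
            return (byte) -1;  // overflow: clamp to the largest code
        }
        return (byte) (small - floor);
    }

    // Expands a byte produced by encode() back into a float.
    static float decode(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

The conversion is lossy by design: values are truncated to the nearest representable value below them, so 0.3f decodes as 0.25f while 1.0f survives the round trip unchanged.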
5091
1. Lucene's source code repository has converted from CVS to
5092
Subversion. The new repository is at
5093
http://svn.apache.org/repos/asf/lucene/java/trunk
5095
2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
5096
Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
5097
The old issues are still available at
5098
http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
5099
(use the bug number instead of xxxx)
5104
1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
5105
messages which might contain user input (e.g. error messages about
5106
query parsing). If you used that page as a starting point for your
5107
own code please make sure your code also properly escapes HTML
5108
characters from user input in order to avoid so-called cross site
5109
scripting attacks. (Daniel Naber)
5111
2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
5112
API is supported again. (Christoph)
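For code derived from the demo page mentioned in item 1, a minimal escaping helper might look like the sketch below. The class and method names are hypothetical, and this is not the code used in results.jsp; it only shows the character replacements needed before echoing user input into HTML.

```java
// Hypothetical helper, not the code from results.jsp.
final class HtmlEscapeSketch {
    // Replaces the characters that are significant in HTML text and attributes.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Applying the helper to every error message or query string before it is written to the page keeps markup in user input from being interpreted by the browser.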
5117
1. Fixed bug #31241: Sorting could lead to incorrect results (documents
5118
missing, others duplicated) if the sort keys were not unique and there
5119
were more than 100 matches. (Daniel Naber)
5121
2. Memory leak in Sort code (bug #31240) eliminated.
5122
(Rafal Krzewski via Christoph and Daniel)
5124
3. FuzzyQuery now takes an additional parameter that specifies the
5125
minimum similarity that is required for a term to match the query.
5126
The QueryParser syntax for this is term~x, where x is a floating
5127
point number >= 0 and < 1 (a bigger number means that a higher
5128
similarity is required). Furthermore, a prefix can be specified
5129
for FuzzyQuerys so that only those terms are considered similar that
5130
start with this prefix. This can speed up FuzzyQuery greatly.
5131
(Daniel Naber, Christoph Goller)
5133
4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
5134
of relative positions. (Christoph Goller)
5136
5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
5137
(patch #9110); some unused method parameters removed; the ability
5138
to specify a minimum similarity for FuzzyQuery has been added.
5141
6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
5142
for every non-zero-scoring hit. This makes 'OR' queries that
5143
contain common terms substantially faster. (cutting)
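The interaction of minimum similarity and prefix described in item 3 can be sketched as follows. The similarity formula used here (one minus the edit distance divided by the length of the shorter string, computed on the part after the prefix) is an assumption for illustration, and FuzzySketch is a hypothetical class rather than the FuzzyQuery implementation.

```java
// Hypothetical sketch; the similarity formula is an assumption, not taken from the entry.
final class FuzzySketch {
    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Larger values mean the strings are more alike; the result can be negative.
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) return a.length() == b.length() ? 1f : 0f;
        return 1f - (float) editDistance(a, b) / shorter;
    }

    // A candidate must share the first prefixLength characters before the
    // more expensive edit distance is computed on the remainder.
    static boolean matches(String term, String candidate, int prefixLength, float minSimilarity) {
        if (prefixLength > term.length()) return false;
        if (!candidate.startsWith(term.substring(0, prefixLength))) return false;
        return similarity(term.substring(prefixLength), candidate.substring(prefixLength)) >= minSimilarity;
    }
}
```

The prefix check rejects most terms with a cheap string comparison, which is where the speed-up mentioned in the entry would come from in a design like this.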
5148
1. Fixed a performance bug in hit sorting code, where values were not
5149
correctly cached. (Aviran via cutting)
5151
2. Fixed errors in file format documentation. (Daniel Naber)
5156
1. Added "an" to the list of stop words in StopAnalyzer, to complement
5157
the existing "a" there. Fix for bug 28960
5158
(http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)
5160
2. Added new class FieldCache to manage in-memory caches of field term
5163
3. Added overloaded getFieldQuery method to QueryParser which
5164
accepts the slop factor specified for the phrase (or the default
5165
phrase slop for the QueryParser instance). This allows overriding
5166
methods to replace a PhraseQuery with a SpanNearQuery instead,
5167
keeping the proper slop factor. (Erik Hatcher)
5169
4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
5170
UTF-8 and changed the build encoding to UTF-8, to make changed files
5171
compile. (Otis Gospodnetic)
5173
5. Removed synchronization from term lookup under IndexReader methods
5174
termFreq(), termDocs() or termPositions() to improve
5175
multi-threaded performance. (cutting)
5177
6. Fix a bug where obsolete segment files were not deleted on Win32.
5182
1. Fixed several search bugs introduced by the skipTo() changes in
5183
release 1.4RC1. The index file format was changed a bit, so
5184
collections must be re-indexed to take advantage of the skipTo()
5185
optimizations. (Christoph Goller)
5187
2. Added new Document methods, removeField() and removeFields().
5190
3. Fixed inconsistencies with index closing. Indexes and directories
5191
are now only closed automatically by Lucene when Lucene opened
5192
them automatically. (Christoph Goller)
5194
4. Added new class: FilteredQuery. (Tim Jones)
5196
5. Added a new SortField type for custom comparators. (Tim Jones)
5198
6. Lock obtain timed out message now displays the full path to the lock
5199
file. (Daniel Naber via Erik)
5201
7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)
5203
8. Fixed so that FSDirectory's locks still work when the
5204
java.io.tmpdir system property is null. (cutting)
5206
9. Changed FilteredTermEnum's constructor to take no parameters,
5207
as the parameters were ignored anyway (bug #28858)
5211
1. GermanAnalyzer now throws an exception if the stopword file
5212
cannot be found (bug #27987). It now uses LowerCaseFilter
5213
(bug #18410) (Daniel Naber via Otis, Erik)
5215
2. Fixed a few bugs in the file format documentation. (cutting)
5220
1. Changed the format of the .tis file, so that:
5222
- it has a format version number, which makes it easier to
5223
back-compatibly change file formats in the future.
5225
- the term count is now stored as a long. This was the one aspect
5226
of Lucene's file formats that limited index size.
5228
- a few internal index parameters are now stored in the index, so
5229
that they can (in theory) now be changed from index to index,
5230
although there is not yet an API to do so.
5232
These changes are back compatible. The new code can read old
5233
indexes. But old code will not be able to read new indexes. (cutting)
5235
2. Added an optimized implementation of TermDocs.skipTo(). A skip
5236
table is now stored for each term in the .frq file. This only
5237
adds a percent or two to overall index size, but can substantially
5238
speed up many searches. (cutting)
5240
3. Restructured the Scorer API and all Scorer implementations to take
5241
advantage of an optimized TermDocs.skipTo() implementation. In
5242
particular, PhraseQuerys and conjunctive BooleanQuerys are
5243
faster when one clause has substantially fewer matches than the
5244
others. (A conjunctive BooleanQuery is a BooleanQuery where all
5245
clauses are required.) (cutting)
5247
4. Added new class ParallelMultiSearcher. Combined with
5248
RemoteSearchable this makes it easy to implement distributed
5249
search systems. (Jean-Francois Halleux via cutting)
5251
5. Added support for hit sorting. Results may now be sorted by any
5252
indexed field. For details see the javadoc for
5253
Searcher#search(Query, Sort). (Tim Jones via Cutting)
5255
6. Changed FSDirectory to auto-create a full directory tree that it
5256
needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)
5258
7. Added a new span-based query API. This implements, among other
5259
things, nested phrases. See javadocs for details. (Doug Cutting)
5261
8. Added new method Query.getSimilarity(Searcher), and changed
5262
scorers to use it. This permits one to subclass a Query class so
5263
that it can specify its own Similarity implementation, perhaps
5264
one that delegates through that of the Searcher. (Julien Nioche
5267
9. Added MultiReader, an IndexReader that combines multiple other
5268
IndexReaders. (Cutting)
5270
10. Added support for term vectors. See Field#isTermVectorStored().
5271
(Grant Ingersoll, Cutting & Dmitry)
5273
11. Fixed the old bug with escaping of special characters in query
5274
strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
5275
(Jean-Francois Halleux via Otis)
5277
12. Added support for overriding default values for the following,
5278
using system properties:
5279
- default commit lock timeout
5280
- default maxFieldLength
5281
- default maxMergeDocs
5282
- default mergeFactor
5283
- default minMergeDocs
5284
- default write lock timeout
5287
13. Changed QueryParser.jj to allow '-' and '+' within tokens:
5288
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
5289
(Morus Walter via Otis)
5291
14. Changed so that the compound index format is used by default.
5292
This makes indexing a bit slower, but vastly reduces the chances
5293
of file handle problems. (Cutting)
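Items 2 and 3 explain why skipTo() helps conjunctive queries when one clause is much rarer than the others. The sketch below shows the idea on in-memory arrays: the rare list drives the loop and the common list is advanced with skipTo(). It uses a binary search as a stand-in for the on-disk skip table, and SkipToSketch and Postings are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Lucene code.
final class SkipToSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class Postings {
        private final int[] docs;  // sorted ascending, no duplicates
        private int pos = 0;
        Postings(int... docs) { this.docs = docs; }

        // Advances to the first doc >= target and returns it, or NO_MORE_DOCS.
        int skipTo(int target) {
            int idx = Arrays.binarySearch(docs, pos, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // Returns the docs present in both lists, leapfrogging between them.
    static List<Integer> intersect(Postings rare, Postings common) {
        List<Integer> out = new ArrayList<>();
        int doc = rare.skipTo(0);
        while (doc != NO_MORE_DOCS) {
            int other = common.skipTo(doc);
            if (other == doc) {
                out.add(doc);
                doc = rare.skipTo(doc + 1);
            } else {
                doc = rare.skipTo(other);  // other is past doc, or exhausted
            }
        }
        return out;
    }
}
```

The number of skipTo() calls is bounded by the length of the rarer list, so long runs of non-matching documents in the common list are never visited one by one.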
5298
1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
5299
throw ParseException instead. (Erik Hatcher)
5301
2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)
5303
3. Added a new method IndexReader.setNorm(), that permits one to
5304
alter the boosting of fields after an index is created.
5306
4. Distinguish between the final position and length when indexing a
5307
field. The length is now defined as the total number of tokens,
5308
instead of the final position, as it was previously. Length is
5309
used for score normalization (Similarity.lengthNorm()) and for
5310
controlling memory usage (IndexWriter.maxFieldLength). In both of
5311
these cases, the total number of tokens is a better value to use
5312
than the final token position. Position is used in phrase
5313
searching (see PhraseQuery and Token.setPositionIncrement()).
5315
5. Fix StandardTokenizer's handling of CJK characters (Chinese,
5316
Japanese and Korean ideograms). Previously contiguous sequences
5317
were combined in a single token, which is not very useful. Now
5318
each ideogram generates a separate token, which is more useful.
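The distinction drawn in item 4 between a field's length and its final position can be seen in a few lines. The sketch below assumes that positions start at -1 and that each token advances the position by its increment; the class name and that starting value are assumptions made for the example.

```java
// Hypothetical sketch; the starting position of -1 is an assumption.
final class FieldLengthSketch {
    // Position of the last token: the running sum of position increments.
    static int finalPosition(int[] increments) {
        int pos = -1;
        for (int inc : increments) pos += inc;
        return pos;
    }

    // Length as defined in item 4: the total number of tokens.
    static int length(int[] increments) {
        return increments.length;
    }
}
```

For increments {1, 0, 2} (a word, a synonym stacked at the same position, then a word after a one-position gap), the final position is 2 while the length is 3, so the two values diverge as soon as any increment differs from 1.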
5323
1. Added minMergeDocs in IndexWriter. This can be raised to speed
5324
indexing without altering the number of files, but only using more
5325
memory. (Julien Nioche via Otis)
5327
2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)
5329
3. Fix bug #16952, in demo HTML parser, skip comments in
5330
javascript. (Christoph Goller)
5332
4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
5333
output (Daniel Naber via Christoph Goller)
5335
5. Fix bug #24301, in demo HTML parser, long titles no longer
5336
hang things. (Christoph Goller)
5338
6. Fix bug #23534, Replace use of file timestamp of segments file
5339
with an index version number stored in the segments file. This
5340
resolves problems when running on file systems with low-resolution
5341
timestamps, e.g., HFS under MacOS X. (Christoph Goller)
5343
7. Fix QueryParser so that TokenMgrError is not thrown, only
5344
ParseException. (Erik Hatcher)
5346
8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)
5348
9. Fixed a problem compiling TestRussianStem. (Christoph Goller)
5350
10. Cleaned up some build stuff. (Erik Hatcher)
5355
1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
5356
SegmentsReader. (Julien Nioche via otis)
5358
2. Changed file locking to place lock files in
5359
System.getProperty("java.io.tmpdir"), where all users are
5360
permitted to write files. This way folks can open and correctly
5361
lock indexes which are read-only to them.
5363
3. IndexWriter: added a new method, addDocument(Document, Analyzer),
5364
permitting one to easily use different analyzers for different
5365
documents in the same index.
5367
4. Minor enhancements to FuzzyTermEnum.
5368
(Christoph Goller via Otis)
5370
5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
5371
and MultiIndexSearcher to use it.
5372
(Christoph Goller via Otis)
5374
6. Fixed a bug in IndexWriter that returned incorrect docCount().
5375
(Christoph Goller via Otis)
5377
7. Fixed SegmentsReader to eliminate the confusing and slightly different
5378
behaviour of TermEnum when dealing with an enumeration of all terms,
5379
versus an enumeration starting from a specific term.
5380
This patch also fixes incorrect term document frequencies when the same term
5381
is present in multiple segments.
5382
(Christoph Goller via Otis)
5384
8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)
5386
9. Added support for the new "compound file" index format (Dmitry
5389
10. Added Locale setting to QueryParser, for use by date range parsing.
5391
11. Changed IndexReader so that it can be subclassed by classes
5392
outside of its package. Previously it had package-private
5393
abstract methods. Also modified the index merging code so that it
5394
can work on an arbitrary IndexReader implementation, and added a
5395
new method, IndexWriter.addIndexes(IndexReader[]), to take
5396
advantage of this. (cutting)
5398
12. Added a limit to the number of clauses which may be added to a
5399
BooleanQuery. The default limit is 1024 clauses. This should
5400
stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
5401
queries which run amok. (cutting)
5403
13. Add new method: IndexReader.undeleteAll(). This undeletes all
5404
deleted documents which still remain in the index. (cutting)
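A sketch of the guard described in item 12: a clause container that refuses to grow past a configured limit, so that a term-expanding query fails fast instead of exhausting memory. ClauseLimitSketch and its nested exception are hypothetical stand-ins written for this example, not the BooleanQuery source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code.
final class ClauseLimitSketch {
    static final class TooManyClauses extends RuntimeException {
        TooManyClauses(int max) { super("more than " + max + " clauses"); }
    }

    private final int maxClauses;
    private final List<String> clauses = new ArrayList<>();

    ClauseLimitSketch(int maxClauses) { this.maxClauses = maxClauses; }

    // Adds one clause, failing once the limit has been reached.
    void add(String clause) {
        if (clauses.size() >= maxClauses) throw new TooManyClauses(maxClauses);
        clauses.add(clause);
    }

    int size() { return clauses.size(); }

    // Expands a prefix into one clause per matching term, as a prefix query would.
    static ClauseLimitSketch expandPrefix(String prefix, List<String> terms, int maxClauses) {
        ClauseLimitSketch query = new ClauseLimitSketch(maxClauses);
        for (String term : terms) {
            if (term.startsWith(prefix)) query.add(term);
        }
        return query;
    }
}
```

With the default limit of 1024 named in the entry, a prefix that matches a handful of terms expands normally, while one that matches a large part of the dictionary is rejected early.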
5409
1. Fixed PriorityQueue's clear() method.
5410
Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
5411
(Matthijs Bomhoff via otis)
5413
2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
5414
Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
5415
(Dale Anson via otis)
5417
3. Added the ability to disable lock creation by using disableLuceneLocks
5418
system property. This is useful for read-only media, such as CD-ROMs.
5421
4. Added id method to Hits to be able to access the index global id.
5422
Required for sorting options.
5425
5. Added support for new range query syntax to QueryParser.jj.
5428
6. Added the ability to retrieve HTML documents' META tag values to
5430
(Mark Harwood via otis)
5432
7. Modified QueryParser to make it possible to programmatically specify the
5433
default Boolean operator (OR or AND).
5434
(Péter Halácsy via otis)
5436
8. Made many search methods and classes non-final, per requests.
5437
This includes IndexWriter and IndexSearcher, among others.
5440
9. Added class RemoteSearchable, providing support for remote
5441
searching via RMI. The test class RemoteSearchableTest.java
5442
provides an example of how this can be used. (cutting)
5444
10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
5445
test class TestPhrasePrefixQuery provides the usage example.
5446
(Anders Nielsen via otis)
5448
11. Changed the German stemming algorithm to ignore case while
5449
stripping. The new algorithm is faster and produces more equal
5450
stems from nouns and verbs derived from the same word.
5453
12. Added support for boosting the score of documents and fields via
5454
the new methods Document.setBoost(float) and Field.setBoost(float).
5456
Note: This changes the encoding of an indexed value. Indexes
5457
should be re-created from scratch in order for search scores to
5458
be correct. With the new code and an old index, searches will
5459
yield very large scores for shorter fields, and very small scores
5460
for longer fields. Once the index is re-created, scores will be
5461
as before. (cutting)
5463
13. Added new method Token.setPositionIncrement().
5465
This permits, for the purpose of phrase searching, placing
5466
multiple terms in a single position. This is useful with
5467
stemmers that produce multiple possible stems for a word.
5469
This also permits the introduction of gaps between terms, so that
5470
terms which are adjacent in a token stream will not be matched by
5471
an exact phrase query. This makes it possible, e.g., to build
5472
an analyzer where phrases are not matched over stop words which
5475
Finally, repeating a token with an increment of zero can also be
5476
used to boost scores of matches on that token. (cutting)
5478
14. Added new Filter class, QueryFilter. This constrains search
5479
results to only match those which also match a provided query.
5480
Results are cached, so that searches after the first on the same
5481
index using this filter are very fast.
5483
This could be used, for example, with a RangeQuery on a formatted
5484
date field to implement date filtering. One could re-use a
5485
single QueryFilter that matches, e.g., only documents modified
5486
within the last week. The QueryFilter and RangeQuery would only
5487
need to be reconstructed once per day. (cutting)
5489
15. Added a new IndexWriter method, getAnalyzer(). This returns the
5490
analyzer used when adding documents to this index. (cutting)
5492
16. Fixed a bug with IndexReader.lastModified(). Before, document
5493
deletion did not update this. Now it does. (cutting)
5495
17. Added Russian Analyzer.
5496
(Boris Okner via otis)
5498
18. Added a public, extensible scoring API. For details, see the
5499
javadoc for org.apache.lucene.search.Similarity.
5501
19. Changed the return type of Hits.id() from float to int. (Terry Steichen via Peter).
5503
20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
5504
(Peter Mularien via otis)
5506
21. Added getFields(String) and getValues(String) methods.
5507
Contributed by Rasik Pandey on 2002-10-09
5508
(Rasik Pandey via otis)
5510
22. Revised internal search APIs. Changes include:
5512
a. Queries are no longer modified during a search. This makes
5513
it possible, e.g., to reuse the same query instance with
5514
multiple indexes from multiple threads.
5516
b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
5517
etc.) now work correctly with MultiSearcher, fixing bugs 12619
5520
c. Boosting BooleanQuery's now works, and is supported by the
5521
query parser (problem reported by Lee Mallabone). Thus a query
5522
like "(+foo +bar)^2 +baz" is now supported and equivalent to
5523
"(+foo^2 +bar^2) +baz".
5525
d. New method: Query.rewrite(IndexReader). This permits a
5526
query to re-write itself as an alternate, more primitive query.
5527
Most of the term-expanding query classes (PrefixQuery,
5528
WildcardQuery, etc.) are now implemented using this method.
5530
e. New method: Searchable.explain(Query q, int doc). This
5531
returns an Explanation instance that describes how a particular
5532
document is scored against a query. An explanation can be
5533
displayed as either plain text, with the toString() method, or
5534
as HTML, with the toHtml() method. Note that computing an
5535
explanation is as expensive as executing the query over the
5536
entire index. This is intended to be used in developing
5537
Similarity implementations, and, for good performance, should
5538
not be displayed with every hit.
5540
f. Scorer and Weight are public, not package protected. It is now
5541
possible for someone to write a Scorer implementation that is
5542
not in the org.apache.lucene.search package. This is still
5543
fairly advanced programming, and I don't expect anyone to do
5544
this anytime soon, but at least now it is possible.
5546
g. Added public accessors to the primitive query classes
5547
(TermQuery, PhraseQuery and BooleanQuery), permitting access to
5548
their terms and clauses.
5550
Caution: These are extensive changes and they have not yet been
5551
tested extensively. Bug reports are appreciated.
5554
23. Added convenience RAMDirectory constructors taking File and String
5555
arguments, for easy FSDirectory to RAMDirectory conversion.
5558
24. Added code for manual renaming of files in FSDirectory, since it
5559
has been reported that java.io.File's renameTo(File) method sometimes
5560
fails on Windows JVMs.
5561
(Matt Tucker via otis)
5563
25. Refactored QueryParser to make it easier for people to extend it.
5564
Added the ability to automatically lower-case Wildcard terms in
5566
(Tatu Saloranta via otis)
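The caching behaviour described for QueryFilter in item 14 can be sketched as a per-index cache of matching-document bits. This is an illustration under assumptions, not the QueryFilter source: CachingFilterSketch is a hypothetical class, the index is represented by an arbitrary key object, and a WeakHashMap is used so that entries disappear once an index is no longer referenced.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical sketch, not Lucene code. I is whatever object identifies an index.
final class CachingFilterSketch<I> {
    private final Function<I, BitSet> compute;  // runs the wrapped query on an index
    private final Map<I, BitSet> cache = new WeakHashMap<>();
    int computations = 0;                       // exposed only to make the demo observable

    CachingFilterSketch(Function<I, BitSet> compute) {
        this.compute = compute;
    }

    // Returns the matching-document bits, computing them once per index.
    synchronized BitSet bits(I index) {
        BitSet cached = cache.get(index);
        if (cached == null) {
            cached = compute.apply(index);
            computations++;
            cache.put(index, cached);
        }
        return cached;
    }
}
```

Only the first search against a given index pays for running the wrapped query; later searches reuse the stored bits, which matches the entry's remark that searches after the first are very fast.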
5571
1. Changed QueryParser.jj to have "?" be a special character which
5572
allows it to be used as a wildcard term. Updated TestWildcard
5573
unit test also. (Ralf Hettesheimer via carlson)
5577
1. Renamed build.properties to default.properties and updated
5578
the BUILD.txt document to describe how to override the
5579
default.properties settings without having to edit the file. This
5580
brings the build process closer to Scarab's build process.
5583
2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)
5585
3. Updated "powered by" links. (otis)
5587
4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)
5589
5. FSDirectory now throws an exception if it cannot create a directory
5590
- Bug #6914 (Eugene Gluzberg via otis)
5592
6. Update MultiSearcher, MultiFieldParse, Constants, DateFilter,
5593
LowerCaseTokenizer javadoc (otis)
5595
7. Added fix to avoid NullPointerException in results.jsp
5596
(Mark Hayes via otis)
5598
8. Changed Wildcard search to find 0 or more char instead of 1 or more
5599
(Lee Mallobone, via otis)
5601
9. Fixed error in offset issue in GermanStemFilter - Bug #7412
5602
(Rodrigo Reyes, via otis)
10. Added unit tests for wildcard search and DateFilter (otis)
11. Allow co-existence of indexed and non-indexed fields with the same name
(cutting/casper, via otis)
12. Add escape character to query parser.
13. Applied a patch that ensures that searches that use DateFilter
don't throw an exception when no matches are found. (David Smiley, via
14. Fixed bugs in DateFilter and WildcardQuery unit tests. (cutting, otis, carlson)
1. Updated contributions section of website.
Added XML Document #3 implementation to Document Section.
Also added Term Highlighting to Misc Section. (carlson)
2. Fixed NullPointerException for phrase searches containing
unindexed terms, introduced in 1.2RC3. (cutting)
3. Changed document deletion code to obtain the index write lock,
enforcing the fact that document addition and deletion cannot be
performed concurrently. (cutting)
4. Various documentation cleanups. (otis, acoliver)
5. Updated "powered by" links. (cutting, jon)
6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)
7. Changed Term and Query to implement Serializable. (scottganyo)
8. Fixed to never delete indexes added with IndexWriter.addIndexes().
9. Upgraded to JUnit 3.7. (otis)
1. IndexWriter: fixed a bug where adding an optimized index to an
empty index failed. This was encountered using addIndexes to copy
a RAMDirectory index to an FSDirectory.
2. RAMDirectory: fixed a bug where RAMInputStream could not read
across more than a single buffer boundary.
3. Fix query parser so it accepts queries with Unicode characters.
4. Fix query parser so that PrefixQuery is used in preference to
WildcardQuery when there's only an asterisk at the end of the
term. Previously PrefixQuery would never be used.
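
The decision described in item 4 can be sketched as a small classifier. This
is an illustration only, not the QueryParser grammar: QueryKindSketch, Kind
and classify are hypothetical names, and treating "?" as a second wildcard
character is an assumption.

```java
/** Hypothetical classifier mirroring the rule described in item 4. */
final class QueryKindSketch {

    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String term) {
        int firstStar = term.indexOf('*');
        boolean hasQuestion = term.indexOf('?') >= 0;
        if (firstStar < 0 && !hasQuestion) {
            return Kind.TERM; // no wildcard characters at all
        }
        // Exactly one asterisk, at the very end, after a non-empty prefix.
        boolean onlyTrailingStar = !hasQuestion
            && term.length() > 1
            && firstStar == term.length() - 1;
        return onlyTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }
}
```

Under these assumptions "foo*" is classified as PREFIX, while "f*o" and
"fo?*" remain WILDCARD.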
5. Fix tests so they compile; fix ant file so it compiles tests
properly. Added test cases for Analyzers and PriorityQueue.
6. Updated demos, added Getting Started documentation. (acoliver)
7. Added 'contributions' section to website & docs. (carlson)
8. Removed JavaCC from source distribution for copyright reasons.
Folks must now download this separately from Metamata in order to
compile Lucene. (cutting)
9. Substantially improved the performance of DateFilter by adding the
ability to reuse TermDocs objects. (cutting)
10. Added IndexReader methods:
public static boolean indexExists(String directory);
public static boolean indexExists(File directory);
public static boolean indexExists(Directory directory);
public static boolean isLocked(Directory directory);
public static void unlock(Directory directory);
11. Fixed bugs in GermanAnalyzer (gschwarz)
- added sources to distribution
- removed broken build scripts and libraries from distribution
- SegmentsReader: fixed potential race condition
- FSDirectory: fixed so that getDirectory(xxx,true) correctly
erases the directory contents, even when the directory
has already been accessed in this JVM.
- RangeQuery: Fix issue where an inclusive range query would
include the nearest term in the index above a non-existent
specified upper term.
- SegmentTermEnum: Fix NullPointerException in clone() method
when the Term is null.
- JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
since they rely on a feature added in JDK 1.2.
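
The RangeQuery fix listed above concerns where term enumeration stops. The
intended behaviour can be sketched over a sorted set of terms; this is not
the RangeQuery code, and RangeSketch and termsInRange are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Hypothetical illustration of range bounds over sorted index terms. */
final class RangeSketch {

    static List<String> termsInRange(SortedSet<String> terms, String lower,
                                     String upper, boolean inclusive) {
        List<String> out = new ArrayList<>();
        // tailSet starts at the first term >= lower.
        for (String t : terms.tailSet(lower)) {
            int cmpUpper = t.compareTo(upper);
            // Stop at the first term above the upper bound, whether or not
            // the upper term itself exists in the index.
            if (cmpUpper > 0 || (!inclusive && cmpUpper == 0)) {
                break;
            }
            if (!inclusive && t.equals(lower)) {
                continue; // an exclusive range skips the lower term itself
            }
            out.add(t);
        }
        return out;
    }
}
```

For the terms apple, banana, cherry and date, an inclusive range from
"banana" to the absent term "cobra" yields banana and cherry, and does not
pull in "date".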
1.2 RC1 (first Apache release):
- packages renamed from com.lucene to org.apache.lucene
- license switched from LGPL to Apache
- ant-only build -- no more makefiles
- addition of lock files--now fully thread & process safe
- addition of German stemmer
- MultiSearcher now supports low-level search API
- added RangeQuery, for term-range searching
- Analyzers can choose tokenizer based on field name
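
The lock-file idea mentioned in the list above can be sketched as follows.
This is not Lucene's locking code: LockFileSketch and its methods are
hypothetical, and the use of File.createNewFile() (available since JDK 1.2)
as the atomic primitive is an assumption.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical lock built on atomic file creation; illustration only. */
final class LockFileSketch {

    private final File lockFile;

    LockFileSketch(File lockFile) {
        this.lockFile = lockFile;
    }

    /** Returns true if this call created the lock file, false if it already existed. */
    boolean obtain() throws IOException {
        // createNewFile checks for existence and creates the file atomically.
        return lockFile.createNewFile();
    }

    /** Releases the lock by deleting the lock file. */
    void release() {
        lockFile.delete();
    }

    boolean isLocked() {
        return lockFile.exists();
    }
}
```

Because the lock lives in the file system rather than in memory, it is
visible to other processes as well as to other threads.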
1.01b (last SourceForge release)
. new prefix query (search for "foo*" matches "food")
This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor

Lucene now includes a grammar-based tokenizer, StandardTokenizer.

The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters. The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email

StandardTokenizer serves two purposes:

1. It is a much better, general purpose tokenizer for use by
The easiest way for applications to start using
StandardTokenizer is to use StandardAnalyzer.

2. It provides a good example of grammar-based tokenization.
If an application has special tokenization requirements, it can
implement a custom tokenizer by copying the directory containing
the new tokenizer into the application and modifying it
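
The general idea of grammar-based tokenization can be illustrated with
java.util.regex. This is only a sketch and not the StandardTokenizer grammar:
RegexTokenizerSketch, the token type names and the patterns are all
assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-driven tokenizer; patterns are illustrative only. */
final class RegexTokenizerSketch {

    // Alternatives are tried in order, so more specific classes come first.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<EMAIL>[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+)"
        + "|(?<ACRONYM>(?:[A-Za-z]\\.){2,})"
        + "|(?<NUM>\\d+(?:[.,]\\d+)*)"
        + "|(?<ALPHANUM>[A-Za-z0-9]+)");

    private static final String[] TYPES = {"EMAIL", "ACRONYM", "NUM", "ALPHANUM"};

    /** Returns tokens as "TYPE:text"; characters matching no class are skipped. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(type + ":" + m.group());
                    break;
                }
            }
        }
        return tokens;
    }
}
```

A letter-only tokenizer would split "bob@example.com" into three terms and
drop "12.50" entirely; here each is kept as a single typed token.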
First open source release.

The code has been re-organized into a new package and directory
structure for this release. It builds OK, but has not been tested
beyond that since the re-organization.