with other getTermFreqVector calls. Also removed the throwing of the other IOException in that method to be consistent. (Karl Wettin via Grant Ingersoll)

26. LUCENE-1096: Fixed Hits behavior when hits' docs are deleted
along with iterating the hits. Deleting docs already retrieved
now works seamlessly. If docs not yet retrieved are deleted
(e.g. from another thread), and then, relying on the initial
Hits.length(), an application attempts to retrieve more hits
than actually exist, a ConcurrentModificationException
is thrown. (Doron Cohen)

27. LUCENE-1068: Changed StandardTokenizer to fix an issue with it marking
the type of some tokens incorrectly. This is done by adding a new flag named
replaceInvalidAcronym which defaults to false, the current, incorrect behavior. Setting
this flag to true fixes the problem. This flag is a temporary fix and is already
marked as being deprecated. 3.x will implement the correct approach. (Shai Erera via Grant Ingersoll)
LUCENE-1140: Fixed NPE caused by 1068 (Alexei Dets via Grant Ingersoll)

28. LUCENE-749: ChainedFilter behavior fixed when logic of
first filter is ANDNOT. (Antonio Bruno via Doron Cohen)

29. LUCENE-508: Make sure SegmentTermEnum.prev() is accurate (= last
term) after next() returns false. (Steven Tamm via Mike
McCandless)

New features

1. LUCENE-906: Elision filter for French.
(Mathieu Lecarme via Otis Gospodnetic)

2. LUCENE-960: Added a SpanQueryFilter and related classes to allow for
not only filtering, but knowing where in a Document a Filter matches
(Grant Ingersoll)

3. LUCENE-868: Added new Term Vector access features. New callback
mechanism allows application to define how and where to read Term
Vectors from disk. This implementation contains several extensions
of the new abstract TermVectorMapper class. The new API should be
back-compatible. No changes in the actual storage of Term Vectors
have taken place.
3.1 LUCENE-1038: Added setDocumentNumber() method to TermVectorMapper
to provide information about what document is being accessed.
(Karl Wettin via Grant Ingersoll)

4. LUCENE-975: Added PositionBasedTermVectorMapper that allows for
position based lookup of term vector information.
See item #3 above (LUCENE-868).

5. LUCENE-1011: Added simple tools (all in org.apache.lucene.store)
to verify that locking is working properly. LockVerifyServer runs
a separate server to verify locks. LockStressTest runs a simple
tool that rapidly obtains and releases locks.
VerifyingLockFactory is a LockFactory that wraps any other
LockFactory and consults the LockVerifyServer whenever a lock is
obtained or released, throwing an exception if an illegal lock
obtain occurred. (Patrick Kimber via Mike McCandless)

6. LUCENE-1015: Added FieldCache extension (ExtendedFieldCache) to
support doubles and longs. Added support into SortField for sorting
on doubles and longs as well. (Grant Ingersoll)

7. LUCENE-1020: Created basic index checking & repair tool
(o.a.l.index.CheckIndex). When run without -fix it does a
detailed test of all segments in the index and reports summary
information and any errors it hit. With -fix it will remove
segments that had errors. (Mike McCandless)

8. LUCENE-743: Add IndexReader.reopen() method that re-opens an
existing IndexReader by only loading those portions of an index
that have changed since the reader was (re)opened. reopen() can
be significantly faster than open(), depending on the amount of
index changes. SegmentReader, MultiSegmentReader, MultiReader,
and ParallelReader implement reopen(). (Michael Busch)

9. LUCENE-1040: CharArraySet useful for efficiently checking
set membership of text specified by char[]. (yonik)

10. LUCENE-1073: Created SnapshotDeletionPolicy to facilitate taking a
live backup of an index without pausing indexing. (Mike
McCandless)

11. LUCENE-1019: CustomScoreQuery enhanced to support multiple
ValueSource queries. (Kyle Maxwell via Doron Cohen)

12. LUCENE-1095: Added an option to StopFilter to increase
positionIncrement of the token succeeding a stopped token.
Disabled by default. Similar option added to QueryParser
to consider token positions when creating PhraseQuery
and MultiPhraseQuery. Disabled by default (so by default
the query parser ignores position increments).
(Doron Cohen)

13. LUCENE-1380: Added TokenFilter for setting position increment in special cases related to the ShingleFilter (Mck SembWever, Steve Rowe, Karl Wettin via Grant Ingersoll)
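
To make item 12 above (LUCENE-1095) more concrete, here is a minimal, self-contained sketch of what "increase positionIncrement of the token succeeding a stopped token" means. It does not use Lucene's StopFilter. The class StopPositionSketch, its Tok type and the enablePositionIncrements parameter name are all hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: not Lucene's StopFilter. All names are hypothetical. */
final class StopPositionSketch {

    /** A surviving token and its position increment relative to the previous survivor. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Drops stop words. When enablePositionIncrements is true, the token that
     * follows one or more removed stop words carries the gap they left behind;
     * otherwise every surviving token gets an increment of 1.
     */
    static List<Tok> removeStopWords(List<String> words, Set<String> stopWords,
                                     boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // stop words removed since the last surviving token
        for (String w : words) {
            if (stopWords.contains(w)) {
                skipped++;
                continue;
            }
            out.add(new Tok(w, enablePositionIncrements ? 1 + skipped : 1));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("quick", "the", "fox");
        Set<String> stops = new HashSet<>(Arrays.asList("the"));
        for (Tok t : removeStopWords(words, stops, true)) {
            System.out.println(t.text + " +" + t.positionIncrement);
        }
    }
}
```

With the option enabled, "fox" records that one position was skipped, so a phrase match can tell "quick fox" apart from "quick the fox". With it disabled, both tokens look adjacent.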

Optimizations

1. LUCENE-937: CachingTokenFilter now uses an iterator to access the
Tokens that are cached in the LinkedList. This increases performance
significantly, especially when the number of Tokens is large.
(Mark Miller via Michael Busch)

2. LUCENE-843: Substantial optimizations to improve how IndexWriter
uses RAM for buffering documents and to speed up indexing (2X-8X
faster). A single shared hash table now records the in-memory
postings per unique term and is directly flushed into a single
segment. (Mike McCandless)

3. LUCENE-892: Fixed extra "buffer to buffer copy" that sometimes
takes place when using compound files. (Mike McCandless)

4. LUCENE-959: Remove synchronization in Document (yonik)

5. LUCENE-963: Add setters to Field to allow for re-using a single
Field instance during indexing. This is a sizable performance
gain, especially for small documents. (Mike McCandless)

6. LUCENE-939: Check explicitly for boundary conditions in FieldInfos
and don't rely on exceptions. (Michael Busch)

7. LUCENE-966: Very substantial speedups (~6X faster) for
StandardTokenizer (StandardAnalyzer) by using JFlex instead of
JavaCC to generate the tokenizer.
(Stanislaw Osinski via Mike McCandless)

8. LUCENE-969: Changed core tokenizers & filters to re-use Token and
TokenStream instances when possible to improve tokenization
performance (~10-15%). (Mike McCandless)

9. LUCENE-871: Speed up ISOLatin1AccentFilter (Ian Boston via Mike
McCandless)

10. LUCENE-986: Refactored SegmentInfos from IndexReader into the new
subclass DirectoryIndexReader. SegmentReader and MultiSegmentReader
now extend DirectoryIndexReader and are the only IndexReader
implementations that use SegmentInfos to access an index and
acquire a write lock for index modifications. (Michael Busch)

11. LUCENE-1007: Allow flushing in IndexWriter to be triggered by
either RAM usage or document count or both (whichever comes
first), by adding symbolic constant DISABLE_AUTO_FLUSH to disable
one of the flush triggers. (Ning Li via Mike McCandless)

12. LUCENE-1043: Speed up merging of stored fields by bulk-copying the
raw bytes for each contiguous range of non-deleted documents.
(Robert Engels via Mike McCandless)

13. LUCENE-693: Speed up nested conjunctions (~2x) that match many
documents, and a slight performance increase for top level
conjunctions. (yonik)

14. LUCENE-1098: Make inner class StandardAnalyzer.SavedStreams static
and final. (Nathan Beyer via Michael Busch)
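
The "whichever comes first" flush rule from item 11 above (LUCENE-1007) can be sketched without Lucene as follows. This is not IndexWriter code: FlushTriggerSketch and its methods are hypothetical, and the value -1 for DISABLE_AUTO_FLUSH is an assumption made for this sketch.

```java
/** Illustrative only: not IndexWriter. All names and values are hypothetical. */
final class FlushTriggerSketch {

    /** Sentinel that switches one trigger off (value assumed for this sketch). */
    static final int DISABLE_AUTO_FLUSH = -1;

    private final int maxBufferedDocs;
    private final long maxRamBytes;
    private int bufferedDocs;
    private long ramBytes;
    private int flushCount;

    FlushTriggerSketch(int maxBufferedDocs, long maxRamBytes) {
        if (maxBufferedDocs == DISABLE_AUTO_FLUSH && maxRamBytes == DISABLE_AUTO_FLUSH) {
            throw new IllegalArgumentException("at least one flush trigger must stay enabled");
        }
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxRamBytes = maxRamBytes;
    }

    /** Buffers one document and flushes as soon as either enabled limit is reached. */
    void addDocument(long docBytes) {
        bufferedDocs++;
        ramBytes += docBytes;
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH && bufferedDocs >= maxBufferedDocs;
        boolean byRam = maxRamBytes != DISABLE_AUTO_FLUSH && ramBytes >= maxRamBytes;
        if (byDocs || byRam) {
            flush();
        }
    }

    /** Stands in for writing a segment: here it only resets the counters. */
    void flush() {
        bufferedDocs = 0;
        ramBytes = 0;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        // Document-count trigger only: the RAM trigger is disabled.
        FlushTriggerSketch writer = new FlushTriggerSketch(3, DISABLE_AUTO_FLUSH);
        for (int i = 0; i < 7; i++) {
            writer.addDocument(1024);
        }
        System.out.println("flushes: " + writer.flushCount());
    }
}
```

Rejecting a configuration with both triggers disabled is a design choice of this sketch, so that buffered documents cannot grow without bound.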

Documentation

1. LUCENE-1051: Generate separate javadocs for core, demo and contrib
classes, as well as a unified view. Also add an appropriate menu
structure to the website. (Michael Busch)

2. LUCENE-746: Fix error message in AnalyzingQueryParser.getPrefixQuery.
(Ronnie Kolehmainen via Michael Busch)

Build

1. LUCENE-908: Improvements and simplifications for how the MANIFEST
file and the META-INF dir are created. (Michael Busch)

2. LUCENE-935: Various improvements for the maven artifacts. Now the
artifacts also include the sources as .jar files. (Michael Busch)

3. Added apply-patch target to top-level build. Defaults to looking for
a patch in ${basedir}/../patches with name specified by -Dpatch.name.
Can also specify any location by -Dpatch.file property on the command
line. This should be helpful for easy application of patches, but it
is also a step towards integrating automatic patch application with
JIRA and Hudson, and is thus subject to change. (Grant Ingersoll)

4. LUCENE-935: Defined property "m2.repository.url" to allow setting
the URL of a remote maven repository to deploy to. (Michael Busch)

5. LUCENE-1051: Include javadocs in the maven artifacts. (Michael Busch)

6. LUCENE-1055: Remove gdata-server from build files and its sources
from trunk. (Michael Busch)

7. LUCENE-935: Allow deploying maven artifacts to a remote m2 repository
via scp and ssh authentication. (Michael Busch)

8. LUCENE-1123: Allow overriding the specification version for
MANIFEST.MF (Michael Busch)

Test Cases

1. LUCENE-766: Test adding two fields with the same name but different
term vector setting. (Nicolas Lalevée via Doron Cohen)

======================= Release 2.2.0 =======================

Changes in runtime behavior

API Changes

1. LUCENE-793: created new exceptions and added them to throws clause
for many methods (all subclasses of IOException for backwards
compatibility): index.StaleReaderException,
index.CorruptIndexException, store.LockObtainFailedException.
This was done to better call out the possible root causes of an
IOException from these methods. (Mike McCandless)

2. LUCENE-811: make SegmentInfos class, plus a few methods from related
classes, package-private again (they were unnecessarily made public
as part of LUCENE-701). (Mike McCandless)

3. LUCENE-710: added optional autoCommit boolean to IndexWriter
constructors. When this is false, index changes are not committed
until the writer is closed. This gives explicit control over when
a reader will see the changes. Also added optional custom
deletion policy to explicitly control when prior commits are
removed from the index. This is intended to allow applications to
share an index over NFS by customizing when prior commits are
deleted. (Mike McCandless)

4. LUCENE-818: changed most public methods of IndexWriter,
IndexReader (and its subclasses), FieldsReader and RAMDirectory to
throw AlreadyClosedException if they are accessed after being
closed. (Mike McCandless)

5. LUCENE-834: Changed some access levels for certain Span classes to allow them
to be overridden. They have been marked expert only and not for public
consumption. (Grant Ingersoll)

6. LUCENE-796: Removed calls to super.* from various get*Query methods in
MultiFieldQueryParser, in order to allow sub-classes to override them.
(Steven Parkes via Otis Gospodnetic)

7. LUCENE-857: Removed caching from QueryFilter and deprecated QueryFilter
in favour of QueryWrapperFilter or QueryWrapperFilter + CachingWrapperFilter
combination when caching is desired.
(Chris Hostetter, Otis Gospodnetic)

8. LUCENE-869: Changed FSIndexInput and FSIndexOutput to inner classes of FSDirectory
to enable extensibility of these classes. (Michael Busch)

9. LUCENE-580: Added the public method reset() to TokenStream. This method does
nothing by default, but may be overridden by subclasses to support consuming
the TokenStream more than once. (Michael Busch)

10. LUCENE-580: Added a new constructor to Field that takes a TokenStream as
argument, available as tokenStreamValue(). This is useful to avoid the need for
"dummy analyzers" for pre-analyzed fields. (Karl Wettin, Michael Busch)

11. LUCENE-730: Added the new methods setAllowDocsOutOfOrder() and
getAllowDocsOutOfOrder() to BooleanQuery. Deprecated the methods setUseScorer14() and
getUseScorer14(). The optimization patch LUCENE-730 (see Optimizations->3.)
improves performance for certain queries but results in scoring out of docid
order. This patch reverses this change, so now by default hit docs are scored
in docid order unless setAllowDocsOutOfOrder(true) is explicitly called.
This patch also enables the tests in QueryUtils again that check for docid
order. (Paul Elschot, Doron Cohen, Michael Busch)

12. LUCENE-888: Added Directory.openInput(File path, int bufferSize)
to optionally specify the size of the read buffer. Also added
BufferedIndexInput.setBufferSize(int) to change the buffer size.
(Mike McCandless)

13. LUCENE-923: Make SegmentTermPositionVector package-private. It does not need
to be public because it implements the public interface TermPositionVector.
(Michael Busch)
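
Item 1 above (LUCENE-793) relies on a general Java property: adding more specific exceptions as subclasses of IOException does not break callers that already catch IOException. The sketch below shows that property in isolation. CorruptDataException, LockFailedException and the open() method are hypothetical stand-ins, not the Lucene classes named in the entry.

```java
import java.io.IOException;

/** Illustrative only: hypothetical exception types, not Lucene's. */
final class ExceptionHierarchySketch {

    /** A more specific root cause that is still an IOException. */
    static class CorruptDataException extends IOException {
        CorruptDataException(String message) {
            super(message);
        }
    }

    /** Another specific root cause. */
    static class LockFailedException extends IOException {
        LockFailedException(String message) {
            super(message);
        }
    }

    /** Pretends to open a resource whose condition is given by state. */
    static void open(String state) throws IOException {
        if ("corrupt".equals(state)) {
            throw new CorruptDataException("checksum mismatch");
        }
        if ("locked".equals(state)) {
            throw new LockFailedException("lock held by another process");
        }
    }

    /** Newer caller: handles one specific cause and falls back to IOException. */
    static String describe(String state) {
        try {
            open(state);
            return "ok";
        } catch (CorruptDataException e) {
            return "corrupt";
        } catch (IOException e) {
            // Older callers with only this catch block keep compiling and working.
            return "io";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("fine"));
        System.out.println(describe("corrupt"));
        System.out.println(describe("locked"));
    }
}
```

The specific catch block has to come before the IOException one. Java rejects the opposite order at compile time because the subclass handler would be unreachable.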

Bug fixes

1. LUCENE-804: Fixed build.xml to pack a fully compilable src dist. (Doron Cohen)

2. LUCENE-813: Leading wildcard fixed to work with trailing wildcard.
Query parser modified to create a prefix query only for the case
that there is a single trailing wildcard (and no additional wildcard
or '?' in the query text). (Doron Cohen)

3. LUCENE-812: Add no-argument constructors to NativeFSLockFactory
and SimpleFSLockFactory. This enables all 4 builtin LockFactory
implementations to be specified via the System property
org.apache.lucene.store.FSDirectoryLockFactoryClass. (Mike McCandless)

4. LUCENE-821: The new single-norm-file introduced by LUCENE-756
failed to reduce the number of open descriptors since it was still
opened once per field with norms. (yonik)

5. LUCENE-823: Make sure internal file handles are closed when
hitting an exception (eg disk full) while flushing deletes in
IndexWriter's mergeSegments, and also during
IndexWriter.addIndexes. (Mike McCandless)

6. LUCENE-825: If directory is removed after
FSDirectory.getDirectory() but before IndexReader.open you now get
a FileNotFoundException like Lucene pre-2.1 (before this fix you
got an NPE). (Mike McCandless)

7. LUCENE-800: Removed backslash from the TERM_CHAR list in the queryparser,
because the backslash is the escape character. Also changed the ESCAPED_CHAR
list to contain all possible characters, because every character that
follows a backslash should be considered as escaped. (Michael Busch)

8. LUCENE-372: QueryParser.parse() now ensures that the entire input string
is consumed. Now a ParseException is thrown if a query contains too many
closing parentheses. (Andreas Neumann via Michael Busch)

9. LUCENE-814: javacc build targets now fix line-end-style of generated files.
Now also deleting all javacc generated files before calling javacc.
(Steven Parkes, Doron Cohen)

10. LUCENE-829: close readers in contrib/benchmark. (Karl Wettin, Doron Cohen)

11. LUCENE-828: Minor fix for Term's equals().
(Paul Cowan via Otis Gospodnetic)

12. LUCENE-846: Fixed: if IndexWriter is opened with autoCommit=false,
and you call addIndexes, and hit an exception (eg disk full) then
when IndexWriter rolls back its internal state this could corrupt
the instance of IndexWriter (but, not the index itself) by
referencing already deleted segments. This bug was only present
in 2.2 (trunk), ie was never released. (Mike McCandless)

13. LUCENE-736: Sloppy phrase query with repeating terms matches wrong docs.
For example query "B C B"~2 matches the doc "A B C D E". (Doron Cohen)

14. LUCENE-789: Fixed: custom similarity is ignored when using MultiSearcher (problem reported
by Alexey Lef). Now the similarity applied by MultiSearcher.setSimilarity(sim) is being used.
Note that as before this fix, creating a MultiSearcher from Searchers for whom custom similarity
was set has no effect - it is masked by the similarity of the MultiSearcher. This is as
designed, because MultiSearcher operates on Searchables (not Searchers). (Doron Cohen)

15. LUCENE-880: Fixed DocumentWriter to close the TokenStreams after it
has written the postings. Then the resources associated with the
TokenStreams can safely be released. (Michael Busch)

16. LUCENE-883: consecutive calls to Spellchecker.indexDictionary()
won't insert terms twice anymore. (Daniel Naber)

17. LUCENE-881: QueryParser.escape() now also escapes the characters
'|' and '&' which are part of the queryparser syntax. (Michael Busch)

18. LUCENE-886: Spellchecker clean up: exceptions are no longer printed
to STDERR and ignored, but are re-thrown. Some javadoc improvements.
(Daniel Naber)

19. LUCENE-698: FilteredQuery now takes the query boost into account for
scoring. (Michael Busch)

20. LUCENE-763: Spellchecker: LuceneDictionary used to skip first word in
enumeration. (Christian Mallwitz via Daniel Naber)

21. LUCENE-903: FilteredQuery explanation inaccuracy with boost.
Explanation tests now "deep" check the explanation details.
(Chris Hostetter, Doron Cohen)

22. LUCENE-912: DisjunctionMaxScorer first skipTo(target) call ignores the
skip target param and ends up at the first match.
(Sudaakeran B. via Chris Hostetter & Doron Cohen)

23. LUCENE-913: Two consecutive score() calls return different
scores for Boolean Queries. (Michael Busch, Doron Cohen)

24. LUCENE-1013: Fix IndexWriter.setMaxMergeDocs to work "out of the
box", again, by moving set/getMaxMergeDocs up from
LogDocMergePolicy into LogMergePolicy. This fixes the API
breakage (non backwards compatible change) caused by LUCENE-994.
(Yonik Seeley via Mike McCandless)
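
Items 7 and 17 above (LUCENE-800, LUCENE-881) both concern backslash escaping of query syntax characters. The sketch below shows the general technique of prefixing each syntax character with a backslash. It is not the QueryParser.escape() source: QueryEscapeSketch is a hypothetical class, and the SPECIAL character set is an assumption made for this illustration (it does include '|' and '&' as named in item 17).

```java
/** Illustrative only: not QueryParser.escape(). The character set is assumed. */
final class QueryEscapeSketch {

    /** Characters treated as query syntax in this sketch. */
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Returns s with a backslash inserted before every syntax character. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a&b|c"));
        System.out.println(escape("(1+1):2"));
    }
}
```

The backslash itself is in the set, so already escaped input is escaped again rather than left alone. That keeps the function a plain "treat everything literally" helper for untrusted user input.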

New features

1. LUCENE-759: Added two n-gram-producing TokenFilters.
(Otis Gospodnetic)

2. LUCENE-822: Added FieldSelector capabilities to Searchable for use with
RemoteSearcher, and other Searchable implementations. (Mark Miller, Grant Ingersoll)

3. LUCENE-755: Added the ability to store arbitrary binary metadata in the posting list.
These metadata are called Payloads. For every position of a Token one Payload in the form
of a variable length byte array can be stored in the prox file.
Remark: The APIs introduced with this feature are in experimental state and thus
contain appropriate warnings in the javadocs.
(Michael Busch)

4. LUCENE-834: Added BoostingTermQuery which can boost scores based on the
values of a payload (see #3 above.) (Grant Ingersoll)

5. LUCENE-834: Similarity has a new method for scoring payloads called
scorePayloads that can be overridden to take advantage of payload
storage (see #3 above)

6. LUCENE-834: Added isPayloadAvailable() onto TermPositions interface and
implemented it in the appropriate places (Grant Ingersoll)

7. LUCENE-853: Added RemoteCachingWrapperFilter to enable caching of Filters
on the remote side of the RMI connection.
(Matt Ericson via Otis Gospodnetic)

8. LUCENE-446: Added Solr's search.function for scores based on field
values, plus CustomScoreQuery for simple score (post) customization.
(Yonik Seeley, Doron Cohen)

9. LUCENE-1058: Added new TeeTokenFilter (like the UNIX 'tee' command) and SinkTokenizer which can be used to share tokens between two or more
Fields such that the other Fields do not have to go through the whole Analysis process over again. For instance, if you have two
Fields that share all the same analysis steps except one lowercases tokens and the other does not, you can coordinate the operations
between the two using the TeeTokenFilter and the SinkTokenizer. See TeeSinkTokenTest.java for examples.
(Grant Ingersoll, Michael Busch, Yonik Seeley)
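
The token-sharing idea in item 9 above (LUCENE-1058) can be sketched with plain iterators. This does not use TeeTokenFilter or SinkTokenizer. TeeSinkSketch, its Tee class and the analyze() helper are hypothetical, and tokens are modeled as simple strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

/** Illustrative only: not Lucene's TeeTokenFilter/SinkTokenizer. */
final class TeeSinkSketch {

    /** Passes tokens through unchanged while copying each one into a sink list. */
    static final class Tee implements Iterator<String> {
        private final Iterator<String> input;
        private final List<String> sink;

        Tee(Iterator<String> input, List<String> sink) {
            this.input = input;
            this.sink = sink;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            String token = input.next();
            sink.add(token); // remember the token for the second field
            return token;
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    /** Stand-in for the analysis steps that both fields have in common. */
    static Iterator<String> analyze(String text) {
        return Arrays.asList(text.trim().split("\\s+")).iterator();
    }

    /** First field: consumes the stream and lowercases every token. */
    static List<String> consumeLowercased(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        List<String> lowercased = consumeLowercased(new Tee(analyze("Quick Brown Fox"), sink));
        // The second field replays the sink instead of analyzing the text again.
        System.out.println(lowercased);
        System.out.println(sink);
    }
}
```

The sink is only complete after the tee has been fully consumed, so in this sketch the field that reads the tee must be processed before the field that reads the sink.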

Optimizations

1. LUCENE-761: The proxStream is now cloned lazily in SegmentTermPositions
when nextPosition() is called for the first time. This allows using instances
of SegmentTermPositions instead of SegmentTermDocs without additional costs.
(Michael Busch)

2. LUCENE-431: RAMInputStream and RAMOutputStream extend IndexInput and
IndexOutput directly now. This avoids further buffering and thus avoids
unnecessary array copies. (Michael Busch)

3. LUCENE-730: Updated BooleanScorer2 to make use of BooleanScorer in some
cases and possibly improve scoring performance. Documents can now be
delivered out-of-order as they are scored (e.g. to HitCollector).
N.B. A bit of code had to be disabled in QueryUtils in order for
TestBoolean2 test to keep passing.
(Paul Elschot via Otis Gospodnetic)

4. LUCENE-882: Spellchecker doesn't store the ngrams anymore but only indexes
them to keep the spell index small. (Daniel Naber)

5. LUCENE-430: Delay allocation of the buffer after a clone of BufferedIndexInput.
Together with LUCENE-888 this allows the buffer size to be adjusted
dynamically. (Paul Elschot, Michael Busch)

6. LUCENE-888: Increase buffer sizes inside CompoundFileWriter and
BufferedIndexOutput. Also increase buffer size in
BufferedIndexInput, but only when used during merging. Together,
these increases yield 10-18% overall performance gain vs the
previous 1K defaults. (Mike McCandless)

7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
up most queries that use skipTo(), especially on big indexes with large posting
lists. For average AND queries the speedup is about 20%, for queries that
contain very frequent and very unique terms the speedup can be over 80%.
(Michael Busch)
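
To illustrate why multiple skip levels (item 7 above, LUCENE-866) help skipTo(), here is a small in-memory sketch. It is not Lucene's on-disk skip list format: MultiLevelSkipSketch is hypothetical, the skip points are implied by array strides instead of stored pointers, and NO_MORE is an assumed sentinel.

```java
/** Illustrative only: an in-memory stand-in for a multi-level skip list. */
final class MultiLevelSkipSketch {

    /** Returned when no document id >= target exists (assumed sentinel). */
    static final int NO_MORE = -1;

    private final int[] docs;    // sorted, strictly increasing document ids
    private final int[] strides; // strides[k] = interval^(k+1), smallest first
    private int pos;             // every doc before pos is known to be < target

    MultiLevelSkipSketch(int[] sortedDocs, int interval, int maxLevels) {
        if (interval < 2) {
            throw new IllegalArgumentException("interval must be at least 2");
        }
        this.docs = sortedDocs;
        int levels = 0;
        long stride = interval;
        while (levels < maxLevels && stride < sortedDocs.length) {
            levels++;
            stride *= interval;
        }
        this.strides = new int[levels];
        stride = interval;
        for (int k = 0; k < levels; k++) {
            strides[k] = (int) stride;
            stride *= interval;
        }
    }

    /** Advances to the first doc id >= target, using the coarsest level first. */
    int skipTo(int target) {
        for (int level = strides.length - 1; level >= 0; level--) {
            int stride = strides[level];
            int next = (pos / stride + 1) * stride; // next skip point on this level
            while (next < docs.length && docs[next] < target) {
                pos = next;
                next += stride;
            }
        }
        // Finish with a short linear scan below the finest skip level.
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE;
    }

    public static void main(String[] args) {
        int[] docs = new int[1000];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = 2 * i; // even ids 0, 2, ..., 1998
        }
        MultiLevelSkipSketch postings = new MultiLevelSkipSketch(docs, 16, 3);
        System.out.println(postings.skipTo(1001));
        System.out.println(postings.skipTo(5000));
    }
}
```

With a single level, a long skip has to touch one entry every 16 documents. The second level lets the same skip jump 256 documents at a time before descending, which is where the gain on large posting lists comes from.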

Documentation

1. LUCENE-791 && INFRA-1173: Infrastructure moved the Wiki to
http://wiki.apache.org/lucene-java/ Updated the links in the docs and
wherever else I found references. (Grant Ingersoll, Joe Schaefer)

2. LUCENE-807: Fixed the javadoc for ScoreDocComparator.compare() to be
consistent with java.util.Comparator.compare(): Any integer is allowed to
be returned instead of only -1/0/1.
(Paul Cowan via Michael Busch)

3. LUCENE-875: Solved javadoc warnings & errors under jdk1.4.
Solved javadoc errors under jdk5 (jars in path for gdata).
Made "javadocs" target depend on "build-contrib" for first downloading
contrib jars configured for dynamic download. (Note: when running
behind firewall, a firewall prompt might pop up) (Doron Cohen)

4. LUCENE-740: Added SNOWBALL-LICENSE.txt to the snowball package and a
remark about the license to NOTICE.TXT. (Steven Parkes via Michael Busch)

5. LUCENE-925: Added analysis package javadocs. (Grant Ingersoll and Doron Cohen)

6. LUCENE-926: Added document package javadocs. (Grant Ingersoll)

Build

1. LUCENE-802: Added LICENSE.TXT and NOTICE.TXT to Lucene jars.
(Steven Parkes via Michael Busch)

2. LUCENE-885: "ant test" now includes all contrib tests. The new
"ant test-core" target can be used to run only the Core (non
contrib) tests.
(Chris Hostetter)

3. LUCENE-900: "ant test" now enables Java assertions (in Lucene packages).
(Doron Cohen)

4. LUCENE-894: Add custom build file for binary distributions that includes
targets to build the demos. (Chris Hostetter, Michael Busch)

5. LUCENE-904: The "package" targets in build.xml now also generate .md5
checksum files. (Chris Hostetter, Michael Busch)

6. LUCENE-907: Include LICENSE.TXT and NOTICE.TXT in the META-INF dirs of
demo war, demo jar, and the contrib jars. (Michael Busch)

7. LUCENE-909: Demo targets for running the demo. (Doron Cohen)

8. LUCENE-908: Improves content of MANIFEST file and makes it customizable
for the contribs. Adds SNOWBALL-LICENSE.txt to META-INF of the snowball
jar and makes sure that the lucli jar contains LICENSE.txt and NOTICE.txt.
(Chris Hostetter, Michael Busch)

9. LUCENE-930: Various contrib building improvements to ensure contrib
dependencies are met, and test compilation errors fail the build.
(Steven Parkes, Chris Hostetter)

10. LUCENE-622: Add ant target and pom.xml files for building maven artifacts
of the Lucene core and the contrib modules.
(Sami Siren, Karl Wettin, Michael Busch)

======================= Release 2.1.0 =======================

Changes in runtime behavior

1. 's' and 't' have been removed from the list of default stopwords
in StopAnalyzer (also used by StandardAnalyzer). Having e.g. 's'
as a stopword meant that 's-class' led to the same results as 'class'.
Note that this problem still exists for 'a', e.g. in 'a-class' as
'a' continues to be a stopword.
(Daniel Naber)

2. LUCENE-478: Updated the list of Unicode code point ranges for CJK
(now split into CJ and K) in StandardAnalyzer. (John Wang and
Steven Rowe via Otis Gospodnetic)

3. Modified some CJK Unicode code point ranges in StandardTokenizer.jj,
and added a few more of them to increase CJK character coverage.
Also documented some of the ranges.
(Otis Gospodnetic)

4. LUCENE-489: Add support for leading wildcard characters (*, ?) to
QueryParser. Default is to disallow them, as before.
(Steven Parkes via Otis Gospodnetic)

5. LUCENE-703: QueryParser changed to default to use of ConstantScoreRangeQuery
for range queries. Added useOldRangeQuery property to QueryParser to allow
selection of old RangeQuery class if required.
(Mark Harwood)

6. LUCENE-543: WildcardQuery now performs a TermQuery if the provided term
does not contain a wildcard character (? or *), when previously a
StringIndexOutOfBoundsException was thrown.
(Michael Busch via Erik Hatcher)

7. LUCENE-726: Removed the use of deprecated doc.fields() method and
Enumeration.
(Michael Busch via Otis Gospodnetic)

8. LUCENE-436: Removed finalize() in TermInfosReader and SegmentReader,
and added a call to enumerators.remove() in TermInfosReader.close().
The finalize() overrides were added to help with a pre-1.4.2 JVM bug
that has since been fixed, plus we no longer support pre-1.4.2 JVMs.
(Otis Gospodnetic)

9. LUCENE-771: The default location of the write lock is now the
index directory, and is named simply "write.lock" (without a big
digest prefix). The system properties "org.apache.lucene.lockDir"
and "java.io.tmpdir" are no longer used as the global directory
for storing lock files, and the LOCK_DIR field of FSDirectory is
now deprecated. (Mike McCandless)

New features

1. LUCENE-503: New ThaiAnalyzer and ThaiWordFilter in contrib/analyzers
(Samphan Raruenrom via Chris Hostetter)

2. LUCENE-545: New FieldSelector API and associated changes to
IndexReader and implementations. New Fieldable interface for use
with the lazy field loading mechanism. (Grant Ingersoll and Chuck
Williams via Grant Ingersoll)

3. LUCENE-676: Move Solr's PrefixFilter to Lucene core. (Yura
Smolsky, Yonik Seeley)

4. LUCENE-678: Added NativeFSLockFactory, which implements locking
using OS native locking (via java.nio.*). (Michael McCandless via
Yonik Seeley)

5. LUCENE-544: Added the ability to specify different boosts for
different fields when using MultiFieldQueryParser (Matt Ericson
via Otis Gospodnetic)

6. LUCENE-528: New IndexWriter.addIndexesNoOptimize() that doesn't
optimize the index when adding new segments, only performing
merges as needed. (Ning Li via Yonik Seeley)

7. LUCENE-573: QueryParser now allows backslash escaping in
quoted terms and phrases. (Michael Busch via Yonik Seeley)

8. LUCENE-716: QueryParser now allows specification of Unicode
characters in terms via a unicode escape of the form \uXXXX
(Michael Busch via Yonik Seeley)

9. LUCENE-709: Added RAMDirectory.sizeInBytes(), IndexWriter.ramSizeInBytes()
and IndexWriter.flushRamSegments(), allowing applications to
control the amount of memory used to buffer documents.
(Chuck Williams via Yonik Seeley)

10. LUCENE-723: QueryParser now parses *:* as MatchAllDocsQuery
(Yonik Seeley)

11. LUCENE-741: Command-line utility for modifying or removing norms
on fields in an existing index. This is mostly based on LUCENE-496
and lives in contrib/miscellaneous.
(Chris Hostetter, Otis Gospodnetic)

12. LUCENE-759: Added NGramTokenizer and EdgeNGramTokenizer classes and
their passing unit tests.
(Otis Gospodnetic)

13. LUCENE-565: Added methods to IndexWriter to more efficiently
handle updating documents (the "delete then add" use case). This
is intended to be an eventual replacement for the existing
IndexModifier. Added IndexWriter.flush() (renamed from
flushRamSegments()) to flush all pending updates (held in RAM), to
the Directory. (Ning Li via Mike McCandless)

14. LUCENE-762: Added in SIZE and SIZE_AND_BREAK FieldSelectorResult options
which allow one to retrieve the size of a field without retrieving the
actual field. (Chuck Williams via Grant Ingersoll)

15. LUCENE-799: Properly handle lazy, compressed fields.
(Mike Klaas via Grant Ingersoll)
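
Items 7 and 8 above (LUCENE-573, LUCENE-716) describe backslash escapes in query terms, including a backslash followed by 'u' and four hex digits. A minimal decoder for that kind of escape could look like the sketch below. It is not the QueryParser implementation: UnicodeEscapeSketch and unescape() are hypothetical, and the handling of malformed input is an assumption of this sketch.

```java
/** Illustrative only: not QueryParser's escape handling. */
final class UnicodeEscapeSketch {

    /**
     * Decodes backslash escapes: a backslash followed by 'u' and four hex
     * digits becomes that UTF-16 code unit; a backslash followed by any other
     * character becomes that character. A trailing lone backslash is kept.
     */
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                sb.append(c);
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            if (next != 'u') {
                sb.append(next); // ordinary escaped character
                i += 2;
                continue;
            }
            if (i + 6 > s.length()) {
                throw new IllegalArgumentException("truncated unicode escape at index " + i);
            }
            int code = 0;
            for (int k = i + 2; k < i + 6; k++) {
                int digit = Character.digit(s.charAt(k), 16);
                if (digit < 0) {
                    throw new IllegalArgumentException("non-hex digit in unicode escape at index " + k);
                }
                code = code * 16 + digit;
            }
            sb.append((char) code);
            i += 6;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the Java compiler from decoding the escape itself.
        System.out.println(unescape("caf\\u00e9"));
        System.out.println(unescape("a\\:b"));
    }
}
```

Digits are validated one by one with Character.digit rather than with Integer.parseInt, because parseInt would also accept a leading sign character.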

API Changes

1. LUCENE-438: Remove "final" from Token, implement Cloneable, allow
changing of termText via setTermText(). (Yonik Seeley)

2. org.apache.lucene.analysis.nl.WordlistLoader has been deprecated
and is supposed to be replaced with the WordlistLoader class in
package org.apache.lucene.analysis (Daniel Naber)

3. LUCENE-609: Revert return type of Document.getField(s) to Field
for backward compatibility, added new Document.getFieldable(s)
for access to new lazy loaded fields. (Yonik Seeley)

4. LUCENE-608: Document.fields() has been deprecated and a new method
Document.getFields() has been added that returns a List instead of
an Enumeration (Daniel Naber)

5. LUCENE-605: New Explanation.isMatch() method and new ComplexExplanation
subclass allows explain methods to produce Explanations which model
"matching" independent of having a positive value.
(Chris Hostetter)

6. LUCENE-621: New static methods IndexWriter.setDefaultWriteLockTimeout
and IndexWriter.setDefaultCommitLockTimeout for overriding default
timeout values for all future instances of IndexWriter (as well
as for any other classes that may reference the static values,
ie: IndexReader).
(Michael McCandless via Chris Hostetter)

4236

4237

7. LUCENE-638: FSDirectory.list() now only returns the directory's

4238

Lucene-related files. Thanks to this change one can now construct

4239

a RAMDirectory from a file system directory that contains files

4240

not related to Lucene.

4241

(Simon Willnauer via Daniel Naber)

4242

4243

8. LUCENE-635: Decoupling locking implementation from Directory

4244

implementation. Added set/getLockFactory to Directory and moved

4245

all locking code into subclasses of abstract class LockFactory.

4246

FSDirectory and RAMDirectory still default to their prior locking

4247

implementations, but now you can mix & match, for example using

4248

SingleInstanceLockFactory (ie, in memory locking) locking with an

4249

FSDirectory. Note that now you must call setDisableLocks before

4250

the instantiation a FSDirectory if you wish to disable locking

4251

for that Directory.

4252

(Michael McCandless, Jeff Patterson via Yonik Seeley)

9. LUCENE-657: Made FuzzyQuery non-final and inner ScoreTerm protected.
(Steven Parkes via Otis Gospodnetic)

10. LUCENE-701: Lockless commits: a commit lock is no longer required
when a writer commits and a reader opens the index. This includes
a change to the index file format (see docs/fileformats.html for
details). It also removes all APIs associated with the commit
lock & its timeout. Readers are now truly read-only and do not
block one another on startup. This is the first step to getting
Lucene to work correctly over NFS (second step is
LUCENE-710). (Mike McCandless)

11. LUCENE-722: DEFAULT_MIN_DOC_FREQ was misspelled DEFALT_MIN_DOC_FREQ
in Similarity's MoreLikeThis class. The misspelling has been
replaced by the correct spelling.
(Andi Vajda via Daniel Naber)

12. LUCENE-738: Reduce the size of the file that keeps track of which
documents are deleted when the number of deleted documents is
small. This changes the index file format and cannot be
read by previous versions of Lucene. (Doron Cohen via Yonik Seeley)

13. LUCENE-756: Maintain all norms in a single .nrm file to reduce the
number of open files and file descriptors for the non-compound index
format. This changes the index file format, but maintains the
ability to read and update older indices. The first segment merge
on an older format index will create a single .nrm file for the new
segment. (Doron Cohen via Yonik Seeley)

14. LUCENE-732: DateTools support has been added to QueryParser, with
setters for both the default Resolution, and per-field Resolution.
For backwards compatibility, DateField is still used if no Resolutions
are specified. (Michael Busch via Chris Hostetter)

15. Added isOptimized() method to IndexReader.
(Otis Gospodnetic)

16. LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
take a boolean "create" argument. Instead you should use
IndexWriter's "create" argument to create a new index.
(Mike McCandless)

17. LUCENE-780: Add a static Directory.copy() method to copy files
from one Directory to another. (Jiri Kuhn via Mike McCandless)

18. LUCENE-773: Added Directory.clearLock(String name) to forcefully
remove an old lock. The default implementation is to ask the
lockFactory (if non null) to clear the lock. (Mike McCandless)

19. LUCENE-795: Directory.renameFile() has been deprecated as it is
not used anymore inside Lucene. (Daniel Naber)
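
Illustration for the LUCENE-635 entry above: a rough plain-Java sketch of the
idea of decoupling locking from storage, so that any directory can be paired
with any locking strategy. All type names here (SimpleLock, SimpleLockFactory,
InMemoryLockFactory, SimpleDirectory) are hypothetical stand-ins invented for
this example; they are not Lucene's Lock, LockFactory, or Directory classes
and make no claim about their actual signatures.

```java
import java.util.HashSet;
import java.util.Set;

public class LockFactorySketch {

    /** Hypothetical lock handle; not Lucene's actual Lock class. */
    public interface SimpleLock {
        boolean obtain();
        void release();
    }

    /** Hypothetical factory: the only place that knows how locks are implemented. */
    public abstract static class SimpleLockFactory {
        public abstract SimpleLock makeLock(String name);
    }

    /** In-memory locking: held lock names are tracked in a set owned by the factory. */
    public static class InMemoryLockFactory extends SimpleLockFactory {
        private final Set<String> held = new HashSet<>();

        @Override
        public SimpleLock makeLock(final String name) {
            return new SimpleLock() {
                @Override
                public boolean obtain() {
                    synchronized (held) {
                        return held.add(name); // false if the name is already held
                    }
                }

                @Override
                public void release() {
                    synchronized (held) {
                        held.remove(name);
                    }
                }
            };
        }
    }

    /** Hypothetical directory that delegates all locking to its factory. */
    public static class SimpleDirectory {
        private SimpleLockFactory lockFactory;

        public void setLockFactory(SimpleLockFactory lockFactory) {
            this.lockFactory = lockFactory;
        }

        public SimpleLockFactory getLockFactory() {
            return lockFactory;
        }

        public SimpleLock makeLock(String name) {
            return lockFactory.makeLock(name);
        }
    }

    public static void main(String[] args) {
        SimpleDirectory dir = new SimpleDirectory();
        dir.setLockFactory(new InMemoryLockFactory());
        SimpleLock first = dir.makeLock("write.lock");
        SimpleLock second = dir.makeLock("write.lock");
        System.out.println(first.obtain());  // true
        System.out.println(second.obtain()); // false, name already held
        first.release();
        System.out.println(second.obtain()); // true
    }
}
```

Because the directory only talks to the factory, swapping in a different
locking strategy would not require touching the storage code. The sketch
deliberately omits ownership checks on release().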

Bug fixes

1. Fixed the web application demo (built with "ant war-demo") which
didn't work because it used a QueryParser method that had
been removed (Daniel Naber)

2. LUCENE-583: ISOLatin1AccentFilter fails to preserve positionIncrement
(Yonik Seeley)

3. LUCENE-575: SpellChecker min score is incorrectly changed by suggestSimilar
(Karl Wettin via Yonik Seeley)

4. LUCENE-587: Explanation.toHtml was producing malformed HTML
(Chris Hostetter)

5. Fix to allow MatchAllDocsQuery to be used with RemoteSearcher (Yonik Seeley)

6. LUCENE-601: RAMDirectory and RAMFile made Serializable
(Karl Wettin via Otis Gospodnetic)

7. LUCENE-557: Fixes to BooleanQuery and FilteredQuery so that the score
Explanations match up with the real scores.
(Chris Hostetter)

8. LUCENE-607: ParallelReader's TermEnum fails to advance properly to
new fields (Chuck Williams, Christian Kohlschuetter via Yonik Seeley)

9. LUCENE-610,LUCENE-611: Simple syntax changes to allow compilation with ecj:
disambiguate inner class scorer's use of doc() in BooleanScorer2,
other test code changes. (DM Smith via Yonik Seeley)

10. LUCENE-451: All core query types now use ComplexExplanations so that
boosts of zero don't confuse the BooleanWeight explain method.
(Chris Hostetter)

11. LUCENE-593: Fixed LuceneDictionary's inner Iterator
(Kåre Fiedler Christiansen via Otis Gospodnetic)

12. LUCENE-641: fixed an off-by-one bug with IndexWriter.setMaxFieldLength()
(Daniel Naber)

13. LUCENE-659: Make PerFieldAnalyzerWrapper delegate getPositionIncrementGap()
to the correct analyzer for the field. (Chuck Williams via Yonik Seeley)

14. LUCENE-650: Fixed NPE in Locale specific String Sort when Document
has no value.
(Oliver Hutchison via Chris Hostetter)

15. LUCENE-683: Fixed data corruption when reading lazy loaded fields.
(Yonik Seeley)

16. LUCENE-678: Fixed bug in NativeFSLockFactory which caused the same
lock to be shared between different directories.
(Michael McCandless via Yonik Seeley)

17. LUCENE-690: Fixed thread unsafe use of IndexInput by lazy loaded fields.
(Yonik Seeley)

18. LUCENE-696: Fix bug when scorer for DisjunctionMaxQuery has skipTo()
called on it before next(). (Yonik Seeley)

19. LUCENE-569: Fixed SpanNearQuery bug, for 'inOrder' queries it would fail
to recognize ordered spans if they overlapped with unordered spans.
(Paul Elschot via Chris Hostetter)

20. LUCENE-706: Updated fileformats.xml|html concerning the docdelta value
in the frequency file. (Johan Stuyts, Doron Cohen via Grant Ingersoll)

21. LUCENE-715: Fixed private constructor in IndexWriter.java to
properly release the acquired write lock if there is an
IOException after acquiring the write lock but before finishing
instantiation. (Matthew Bogosian via Mike McCandless)

22. LUCENE-651: Multiple different threads requesting the same
FieldCache entry (often for Sorting by a field) at the same
time caused multiple generations of that entry, which was
detrimental to performance and memory use.
(Oliver Hutchison via Otis Gospodnetic)

23. LUCENE-717: Fixed build.xml not to fail when there is no lib dir.
(Doron Cohen via Otis Gospodnetic)

24. LUCENE-728: Removed duplicate/old MoreLikeThis and SimilarityQueries
classes from contrib/similarity, as their new home is under
contrib/queries.
(Otis Gospodnetic)

25. LUCENE-669: Do not double-close the RandomAccessFile in
FSIndexInput/Output during finalize(). Besides sending an
IOException up to the GC, this may also be the cause of intermittent
"The handle is invalid" IOExceptions on Windows when trying to
close readers or writers. (Michael Busch via Mike McCandless)

26. LUCENE-702: Fix IndexWriter.addIndexes(*) to not corrupt the index
on any exceptions (eg disk full). The semantics of these methods
are now transactional: either all indices are merged or none are.
Also fixed IndexWriter.mergeSegments (called outside of
addIndexes(*) by addDocument, optimize, flushRamSegments) and
IndexReader.commit() (called by close) to clean up and keep the
instance state consistent with what's actually in the index (Mike
McCandless).

27. LUCENE-129: Change finalizers to do "try {...} finally
{super.finalize();}" to make sure we don't miss finalizers in
classes above us. (Esmond Pitt via Mike McCandless)

28. LUCENE-754: Fix a problem introduced by LUCENE-651, causing
IndexReaders to hang around forever, in addition to not
fixing the original FieldCache performance problem.
(Chris Hostetter, Yonik Seeley)

29. LUCENE-140: Fix IndexReader.deleteDocument(int docNum) to
correctly raise ArrayIndexOutOfBoundsException when docNum is too
large. Previously, if docNum was only slightly too large (within
the same multiple of 8, ie, up to 7 ints beyond maxDoc), no
exception would be raised and instead the index would become
silently corrupted. The corruption then only appears much later,
in mergeSegments, when the corrupted segment is merged with
segment(s) after it. (Mike McCandless)

30. LUCENE-768: Fix case where an Exception during deleteDocument,
undeleteAll or setNorm in IndexReader could leave the reader in a
state where close() fails to release the write lock.
(Mike McCandless)

31. Remove "tvp" from known index file extensions because it is
never used. (Nicolas Lalevée via Bernhard Messer)

32. LUCENE-767: Change how SegmentReader.maxDoc() is computed to not
rely on file length check and instead use the SegmentInfo's
docCount that's already stored explicitly in the index. This is a
defensive bug fix (ie, there is no known problem seen "in real
life" due to this, just a possible future problem). (Chuck
Williams via Mike McCandless)
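
Illustration for the LUCENE-140 entry above: a small plain-Java sketch of why
an index that is only slightly too large can slip past a bit vector backed by
a byte array. The array is rounded up to whole bytes, so its own bounds check
only fires once the index crosses into a byte that does not exist. The class
BitVectorBoundsSketch and its methods are assumptions for this example and
are not Lucene's actual implementation.

```java
public class BitVectorBoundsSketch {
    private final byte[] bits;
    private final int size;

    public BitVectorBoundsSketch(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) >> 3]; // round up to whole bytes
    }

    /** Relies only on the backing array's bounds check. */
    public void setUnchecked(int bit) {
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    /** Validates against the logical size before touching the array. */
    public void setChecked(int bit) {
        if (bit < 0 || bit >= size) {
            throw new ArrayIndexOutOfBoundsException(bit);
        }
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    public static void main(String[] args) {
        // 10 logical bits occupy 2 bytes, so array indexes 0..15 are addressable.
        BitVectorBoundsSketch vector = new BitVectorBoundsSketch(10);
        vector.setUnchecked(12); // silently accepted although 12 >= 10
        try {
            vector.setChecked(12);
        } catch (ArrayIndexOutOfBoundsException expected) {
            System.out.println("rejected out-of-range bit 12");
        }
    }
}
```

With a logical size of 10, bits 10 through 15 land in padding that the
unchecked variant happily writes to; only bit 16 and beyond would trip the
array's own check.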

Optimizations

1. LUCENE-586: TermDocs.skipTo() is now more efficient for
multi-segment indexes. This will improve the performance of many
types of queries against a non-optimized index. (Andrew Hudson
via Yonik Seeley)

2. LUCENE-623: RAMDirectory.close now nulls out its reference to all
internal "files", allowing them to be GCed even if references to the
RAMDirectory itself still exist. (Nadav Har'El via Chris Hostetter)

3. LUCENE-629: Compressed fields are no longer uncompressed and
recompressed during segment merges (e.g. during indexing or
optimizing), thus improving performance. (Michael Busch via Otis
Gospodnetic)

4. LUCENE-388: Improve indexing performance when maxBufferedDocs is
large by keeping a count of buffered documents rather than
counting after each document addition. (Doron Cohen, Paul Smith,
Yonik Seeley)

5. Modified TermScorer.explain to use TermDocs.skipTo() instead of
looping through docs. (Grant Ingersoll)

6. LUCENE-672: New indexing segment merge policy flushes all
buffered docs to their own segment and delays a merge until
mergeFactor segments of a certain level have been accumulated.
This increases indexing performance in the presence of deleted
docs or partially full segments as well as enabling future
optimizations.

NOTE: this also fixes an "under-merging" bug whereby it is
possible to get far too many segments in your index (which will
drastically slow down search, risks exhausting file descriptor
limit, etc.). This can happen when the number of buffered docs
at close, plus the number of docs in the last non-ram segment is
greater than mergeFactor. (Ning Li, Yonik Seeley)

7. Lazy loaded fields unnecessarily retained an extra copy of loaded
String data. (Yonik Seeley)

8. LUCENE-443: ConjunctionScorer performance increase. Speed up
any BooleanQuery with more than one mandatory clause.
(Abdul Chaudhry, Paul Elschot via Yonik Seeley)

9. LUCENE-365: DisjunctionSumScorer performance increase of
~30%. Speeds up queries with optional clauses. (Paul Elschot via
Yonik Seeley)

10. LUCENE-695: Optimized BufferedIndexInput.readBytes() for medium
size buffers, which will speed up merging and retrieving binary
and compressed fields. (Nadav Har'El via Yonik Seeley)

11. LUCENE-687: Lazy skipping on proximity file speeds up most
queries involving term positions, including phrase queries.
(Michael Busch via Yonik Seeley)

12. LUCENE-714: Replaced 2 cases of manual for-loop array copying
with calls to System.arraycopy instead, in DocumentWriter.java.
(Nicolas Lalevee via Mike McCandless)

13. LUCENE-729: Non-recursive skipTo and next implementation of
TermDocs for a MultiReader. The old implementation could
recurse up to the number of segments in the index. (Yonik Seeley)

14. LUCENE-739: Improve segment merging performance by reusing
the norm array across different fields and doing bulk writes
of norms of segments with no deleted docs.
(Michael Busch via Yonik Seeley)

15. LUCENE-745: Add BooleanQuery.clauses(), allowing direct access
to the List of clauses and replaced the internal synchronized Vector
with an unsynchronized List. (Yonik Seeley)

16. LUCENE-750: Remove finalizers from FSIndexOutput and move the
FSIndexInput finalizer to the actual file so all clones don't

Test Cases

1. Added TestTermScorer.java (Grant Ingersoll)

2. Added TestWindowsMMap.java (Benson Margulies via Mike McCandless)

3. LUCENE-744 Append the user.name property onto the temporary directory
that is created so it doesn't interfere with other users. (Grant Ingersoll)

Documentation

1. Added style sheet to xdocs named lucene.css and included in the
Anakia VSL descriptor. (Grant Ingersoll)

2. Added scoring.xml document into xdocs. Updated Similarity.java
scoring formula. (Grant Ingersoll and Steve Rowe. Updates from:
Michael McCandless, Doron Cohen, Chris Hostetter, Doug Cutting).
Issue 664.

3. Added javadocs for FieldSelectorResult.java. (Grant Ingersoll)

4. Moved xdocs directory to src/site/src/documentation/content/xdocs per
Issue 707. Site now builds using Forrest, just like the other Lucene
siblings. See http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite
for info on updating the website. (Grant Ingersoll with help from Steve Rowe,
Chris Hostetter, Doug Cutting, Otis Gospodnetic, Yonik Seeley)

5. Added in Developer and System Requirements sections under Resources (Grant Ingersoll)

6. LUCENE-713 Updated the Term Vector section of File Formats to include
documentation on how Offset and Position info are stored in the TVF file.
(Grant Ingersoll, Samir Abdou)

7. Added in link to Clover Test Code Coverage Reports under the Develop
section in Resources (Grant Ingersoll)

8. LUCENE-748: Added details for semantics of IndexWriter.close on
hitting an Exception. (Jed Wesley-Smith via Mike McCandless)

9. Added some text about what is contained in releases.
(Eric Haszlakiewicz via Grant Ingersoll)

10. LUCENE-758: Fix javadoc to clarify that RAMDirectory(Directory)
makes a full copy of the starting Directory. (Mike McCandless)

11. LUCENE-764: Fix javadocs to detail temporary space requirements
for IndexWriter's optimize(), addIndexes(*) and addDocument(...)
methods. (Mike McCandless)

Build

1. Added in clover test code coverage per http://issues.apache.org/jira/browse/LUCENE-721
To enable clover code coverage, you must have clover.jar in the ANT
classpath and specify -Drun.clover=true on the command line.
(Michael Busch and Grant Ingersoll)

2. Added a sysproperty in common-build.xml per Lucene 752 to map java.io.tmpdir to
${build.dir}/test just like the tempDir sysproperty.

3. LUCENE-757 Added new target named init-dist that does setup for
distribution of both binary and source distributions. Called by package
and package-*-src

======================= Release 2.0.0 =======================

API Changes

1. All deprecated methods and fields have been removed, except
DateField, which will still be supported for some time
so Lucene can read its date fields from old indexes
(Yonik Seeley & Grant Ingersoll)

2. DisjunctionSumScorer is no longer public.
(Paul Elschot via Otis Gospodnetic)

3. Creating a Field with both an empty name and an empty value
now throws an IllegalArgumentException
(Daniel Naber)

4. LUCENE-301: Added new IndexWriter({String,File,Directory},
Analyzer) constructors that do not take a boolean "create"
argument. These new constructors will create a new index if
necessary, else append to the existing one. (Dan Armbrust via
Mike McCandless)

New features

1. LUCENE-496: Command line tool for modifying the field norms of an
existing index; added to contrib/miscellaneous. (Chris Hostetter)

2. LUCENE-577: SweetSpotSimilarity added to contrib/miscellaneous.
(Chris Hostetter)

Bug fixes

1. LUCENE-330: Fix issue of FilteredQuery not working properly within
BooleanQuery. (Paul Elschot via Erik Hatcher)

2. LUCENE-515: Make ConstantScoreRangeQuery and ConstantScoreQuery work
with RemoteSearchable. (Philippe Laflamme via Yonik Seeley)

3. Added methods to get/set writeLockTimeout and commitLockTimeout in
IndexWriter. These could be set in Lucene 1.4 using a system property.
This feature had been removed without adding the corresponding
getter/setter methods. (Daniel Naber)

4. LUCENE-413: Fixed ArrayIndexOutOfBoundsException exceptions
when using SpanQueries. (Paul Elschot via Yonik Seeley)

5. Implemented FilterIndexReader.getVersion() and isCurrent()
(Yonik Seeley)

6. LUCENE-540: Fixed a bug with IndexWriter.addIndexes(Directory[])
that sometimes caused the index order of documents to change.
(Yonik Seeley)

7. LUCENE-526: Fixed a bug in FieldSortedHitQueue that caused
subsequent String sorts with different locales to sort identically.
(Paul Cowan via Yonik Seeley)

8. LUCENE-541: Add missing extractTerms() to DisjunctionMaxQuery
(Stefan Will via Yonik Seeley)

9. LUCENE-514: Added getTermArrays() and extractTerms() to
MultiPhraseQuery (Eric Jain & Yonik Seeley)

10. LUCENE-512: Fixed ClassCastException in ParallelReader.getTermFreqVectors
(frederic via Yonik)

11. LUCENE-352: Fixed bug in SpanNotQuery that manifested as
NullPointerException when "exclude" query was not a SpanTermQuery.
(Chris Hostetter)

12. LUCENE-572: Fixed bug in SpanNotQuery hashCode, was ignoring exclude clause
(Chris Hostetter)

13. LUCENE-561: Fixed some ParallelReader bugs. NullPointerException if the reader
didn't know about the field yet, reader didn't keep track if it had deletions,
and deleteDocument calls could circumvent synchronization on the subreaders.
(Chuck Williams via Yonik Seeley)

14. LUCENE-556: Added empty extractTerms() implementation to MatchAllDocsQuery and
ConstantScoreQuery in order to allow their use with a MultiSearcher.
(Yonik Seeley)

15. LUCENE-546: Removed 2GB file size limitations for RAMDirectory.
(Peter Royal, Michael Chan, Yonik Seeley)

16. LUCENE-485: Don't hold commit lock while removing obsolete index
files. (Luc Vanlerberghe via cutting)


1.9.1

Bug fixes

1. LUCENE-511: Fix a bug in the BufferedIndexOutput optimization
introduced in 1.9-final. (Shay Banon & Steven Tamm via cutting)

1.9 final

Note that this release is mostly but not 100% source compatible with
the previous release of Lucene (1.4.3). In other words, you should
make sure your application compiles with this version of Lucene before
you replace the old Lucene JAR with the new one. Many methods have
been deprecated in anticipation of release 2.0, so deprecation
warnings are to be expected when upgrading from 1.4.3 to 1.9.

Bug fixes

1. The fix that made IndexWriter.setMaxBufferedDocs(1) work had negative
effects on indexing performance and has thus been reverted. The
argument for setMaxBufferedDocs(int) must now be at least 2, otherwise
an exception is thrown. (Daniel Naber)

Optimizations

1. Optimized BufferedIndexOutput.writeBytes() to use
System.arraycopy() in more cases, rather than copying byte-by-byte.
(Lukas Zapletal via Cutting)
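
Illustration for the writeBytes() optimization above: a minimal plain-Java
sketch of a buffered writer that moves data into its buffer in chunks with
System.arraycopy() instead of one byte at a time. The class
BufferedWriteSketch, its in-memory sink, and its method names are assumptions
for this example; it is not the BufferedIndexOutput code.

```java
import java.io.ByteArrayOutputStream;

public class BufferedWriteSketch {
    private final byte[] buffer;
    private int position = 0;
    // Stand-in for the underlying file; an assumption of this sketch.
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    public BufferedWriteSketch(int bufferSize) {
        this.buffer = new byte[bufferSize];
    }

    /** Copies the source range into the buffer in chunks, flushing when full. */
    public void writeBytes(byte[] source, int offset, int length) {
        while (length > 0) {
            if (position == buffer.length) {
                flush();
            }
            int chunk = Math.min(length, buffer.length - position);
            System.arraycopy(source, offset, buffer, position, chunk);
            position += chunk;
            offset += chunk;
            length -= chunk;
        }
    }

    /** Pushes buffered bytes to the sink and resets the buffer. */
    public void flush() {
        sink.write(buffer, 0, position);
        position = 0;
    }

    /** Flushes and returns everything written so far. */
    public byte[] toByteArray() {
        flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        BufferedWriteSketch out = new BufferedWriteSketch(4);
        out.writeBytes(new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 0, 10);
        System.out.println(out.toByteArray().length); // 10
    }
}
```

Each loop iteration copies as many bytes as fit in the remaining buffer
space, so a 10-byte write into a 4-byte buffer takes three bulk copies
rather than ten single-byte stores.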

1.9 RC1

Requirements

1. To compile and use Lucene you now need Java 1.4 or later.

Changes in runtime behavior

1. FuzzyQuery can no longer throw a TooManyClauses exception. If a
FuzzyQuery expands to more than BooleanQuery.maxClauseCount
terms, only the BooleanQuery.maxClauseCount most similar terms
go into the rewritten query and thus the exception is avoided.
(Christoph)

2. Changed system property from "org.apache.lucene.lockdir" to
"org.apache.lucene.lockDir", so that its casing follows the existing
pattern used in other Lucene system properties. (Bernhard)

3. The terms of RangeQueries and FuzzyQueries are now converted to
lowercase by default (as was already the case for PrefixQueries
and WildcardQueries). Use setLowercaseExpandedTerms(false)
to disable that behavior but note that this also affects
PrefixQueries and WildcardQueries. (Daniel Naber)

4. Document frequency that is computed when MultiSearcher is used is now
computed correctly and "globally" across subsearchers and indices, while
before it used to be computed locally to each index, which caused
ranking across multiple indices not to be equivalent.
(Chuck Williams, Wolf Siberski via Otis, bug #31841)

5. When opening an IndexWriter with create=true, Lucene now only deletes
its own files from the index directory (looking at the file name suffixes
to decide if a file belongs to Lucene). The old behavior was to delete
all files. (Daniel Naber and Bernhard Messer, bug #34695)

6. The version of an IndexReader, as returned by getCurrentVersion()
and getVersion() doesn't start at 0 anymore for new indexes. Instead, it
is now initialized by the system time in milliseconds.
(Bernhard Messer via Daniel Naber)

7. Several default values cannot be set via system properties anymore, as
this has been considered inappropriate for a library like Lucene. For
most properties there are set/get methods available in IndexWriter which
you should use instead. This affects the following properties:
See IndexWriter for getter/setter methods:
  org.apache.lucene.writeLockTimeout, org.apache.lucene.commitLockTimeout,
  org.apache.lucene.minMergeDocs, org.apache.lucene.maxMergeDocs,
  org.apache.lucene.maxFieldLength, org.apache.lucene.termIndexInterval,
  org.apache.lucene.mergeFactor,
See BooleanQuery for getter/setter methods:
  org.apache.lucene.maxClauseCount
See FSDirectory for getter/setter methods:
  disableLuceneLocks
(Daniel Naber)

8. Fixed FieldCacheImpl to use user-provided IntParser and FloatParser,
instead of using Integer and Float classes for parsing.
(Yonik Seeley via Otis Gospodnetic)

9. Expert level search routines returning TopDocs and TopFieldDocs
no longer normalize scores. This also fixes bugs related to
MultiSearchers and score sorting/normalization.
(Luc Vanlerberghe via Yonik Seeley, LUCENE-469)
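
Illustration for the FuzzyQuery entry above (item 1): a minimal plain-Java
sketch of keeping only the N most similar candidate terms with a bounded
min-heap, so the expansion can never exceed a clause limit. The class
TopTermsSketch, the mostSimilar method, and the similarity scores are
assumptions for this example; this is not the FuzzyQuery rewrite code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopTermsSketch {

    /** Returns at most maxClauseCount terms, most similar first. */
    public static List<String> mostSimilar(Map<String, Float> candidates, int maxClauseCount) {
        List<String> result = new ArrayList<>();
        if (maxClauseCount <= 0) {
            return result;
        }
        // Min-heap on similarity: the head is always the weakest term kept so far.
        PriorityQueue<Map.Entry<String, Float>> queue = new PriorityQueue<Map.Entry<String, Float>>(
                (a, b) -> Float.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Float> candidate : candidates.entrySet()) {
            queue.add(candidate);
            if (queue.size() > maxClauseCount) {
                queue.poll(); // drop the least similar term
            }
        }
        while (!queue.isEmpty()) {
            result.add(queue.poll().getKey());
        }
        Collections.reverse(result); // heap drains weakest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> candidates = new HashMap<>();
        candidates.put("lucene", 1.0f);
        candidates.put("lucent", 0.8f);
        candidates.put("lucid", 0.5f);
        candidates.put("luck", 0.3f);
        System.out.println(mostSimilar(candidates, 2)); // [lucene, lucent]
    }
}
```

The heap never holds more than maxClauseCount + 1 entries, so memory stays
bounded no matter how many candidate terms the expansion produces.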

New features

1. Added support for stored compressed fields (patch #31149)
(Bernhard Messer via Christoph)

2. Added support for binary stored fields (patch #29370)
(Drew Farris and Bernhard Messer via Christoph)

3. Added support for position and offset information in term vectors
(patch #18927). (Grant Ingersoll & Christoph)

4. A new class DateTools has been added. It allows you to format dates
in a readable format adequate for indexing. Unlike the existing
DateField class, DateTools can cope with dates before 1970 and it
forces you to specify the desired date resolution (e.g. month, day,
second, ...) which can make RangeQuerys on those fields more efficient.
(Daniel Naber)

5. QueryParser now correctly works with Analyzers that can return more
than one token per position. For example, a query "+fast +car"
would be parsed as "+fast +(car automobile)" if the Analyzer
returns "car" and "automobile" at the same position whenever it
finds "car" (Patch #23307).
(Pierrick Brihaye, Daniel Naber)

6. Permit unbuffered Directory implementations (e.g., using mmap).
InputStream is replaced by the new classes IndexInput and
BufferedIndexInput. OutputStream is replaced by the new classes
IndexOutput and BufferedIndexOutput. InputStream and OutputStream
are now deprecated and FSDirectory is now subclassable. (cutting)

7. Add native Directory and TermDocs implementations that work under
GCJ. These require GCC 3.4.0 or later and have only been tested
on Linux. Use 'ant gcj' to build demo applications. (cutting)

8. Add MMapDirectory, which uses nio to mmap input files. This is
still somewhat slower than FSDirectory. However it uses less
memory per query term, since a new buffer is not allocated per
term, which may help applications which use, e.g., wildcard
queries. It may also someday be faster. (cutting & Paul Elschot)

9. Added javadocs-internal to build.xml - bug #30360
(Paul Elschot via Otis)

10. Added RangeFilter, a more generically useful filter than DateFilter.
(Chris M Hostetter via Erik)

11. Added NumberTools, a utility class for indexing numeric fields.
(adapted from code contributed by Matt Quail; committed by Erik)

12. Added public static IndexReader.main(String[] args) method.
IndexReader can now be used directly at command line level
to list and optionally extract the individual files from an existing
compound index file.
(adapted from code contributed by Garrett Rooney; committed by Bernhard)

13. Add IndexWriter.setTermIndexInterval() method. See javadocs.
(Doug Cutting)

14. Added LucenePackage, whose static get() method returns java.util.Package,
which lets the caller get the Lucene version information specified in
the Lucene Jar.
(Doug Cutting via Otis)

15. Added Hits.iterator() method and corresponding HitIterator and Hit objects.
This provides standard java.util.Iterator iteration over Hits.
Each call to the iterator's next() method returns a Hit object.
(Jeremy Rayner via Erik)

16. Add ParallelReader, an IndexReader that combines separate indexes
over different fields into a single virtual index. (Doug Cutting)

17. Add IntParser and FloatParser interfaces to FieldCache, so that
fields in arbitrary formats can be cached as ints and floats.
(Doug Cutting)

18. Added class org.apache.lucene.index.IndexModifier which combines
IndexWriter and IndexReader, so you can add and delete documents without
worrying about synchronization/locking issues.
(Daniel Naber)

19. Lucene can now be used inside an unsigned applet, as Lucene's access
to system properties will not cause a SecurityException anymore.
(Jon Schuster via Daniel Naber, bug #34359)

20. Added a new class MatchAllDocsQuery that matches all documents.
(John Wang via Daniel Naber, bug #34946)

21. Added ability to omit norms on a per field basis to decrease
index size and memory consumption when there are many indexed fields.
See Field.setOmitNorms()
(Yonik Seeley, LUCENE-448)

22. Added NullFragmenter to contrib/highlighter, which is useful for
highlighting entire documents or fields.
(Erik Hatcher)

23. Added regular expression queries, RegexQuery and SpanRegexQuery.
Note the same term enumeration caveats apply with these queries as
apply to WildcardQuery and other term expanding queries.
These two new queries are not currently supported via QueryParser.
(Erik Hatcher)

24. Added ConstantScoreQuery which wraps a filter and produces a score
equal to the query boost for every matching document.
(Yonik Seeley, LUCENE-383)

4870

4871

25. Added ConstantScoreRangeQuery which produces a constant score for

4872

every document in the range. One advantage over a normal RangeQuery

4873

is that it doesn't expand to a BooleanQuery and thus doesn't have a maximum

4874

number of terms the range can cover. Both endpoints may also be open.

4875

(Yonik Seeley, LUCENE-383)

4876

4877

26. Added ability to specify a minimum number of optional clauses that

4878

must match in a BooleanQuery. See BooleanQuery.setMinimumNumberShouldMatch().

4879

(Paul Elschot, Chris Hostetter via Yonik Seeley, LUCENE-395)

4880

4881

27. Added DisjunctionMaxQuery which provides the maximum score across its clauses.

4882

It's very useful for searching across multiple fields.

4883

(Chuck Williams via Yonik Seeley, LUCENE-323)

28. New class ISOLatin1AccentFilter that replaces accented characters in the ISO
Latin 1 character set with their unaccented equivalents.
(Sven Duzont via Erik Hatcher)

29. New class KeywordAnalyzer. "Tokenizes" the entire stream as a single token.
This is useful for data like zip codes, ids, and some product names.
(Erik Hatcher)

30. Copied LengthFilter from contrib area to core. Removes words that are too
long or too short from the stream.
(David Spencer via Otis and Daniel)

31. Added getPositionIncrementGap(String fieldName) to Analyzer. This allows
custom analyzers to put gaps between Field instances with the same field
name, preventing phrase or span queries crossing these boundaries. The
default implementation issues a gap of 0, allowing the default token
position increment of 1 to put the next field's first token into a
successive position.
(Erik Hatcher, with advice from Yonik)
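
The effect of the gap on token positions can be illustrated with plain Java. This is a sketch under assumptions: PositionGapSketch and positions() are hypothetical names, each field instance is modeled as an array of already-tokenized strings, and every token has a position increment of 1.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionGapSketch {
    // Assigns a position to every token of several field instances that share one name.
    // The gap is added once between consecutive instances, on top of the usual increment of 1.
    static List<Integer> positions(List<String[]> fieldInstances, int gap) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        boolean first = true;
        for (String[] tokens : fieldInstances) {
            if (!first) {
                position += gap;
            }
            for (int i = 0; i < tokens.length; i++) {
                position += 1;
                result.add(position);
            }
            first = false;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> authors = List.of(
            new String[] {"john", "smith"},
            new String[] {"jane", "doe"});
        System.out.println(positions(authors, 0));   // "smith" and "jane" end up adjacent
        System.out.println(positions(authors, 100)); // a large gap keeps the two values apart
    }
}
```

With a gap of 0 the phrase "smith jane" would match across the two values; a gap larger than any phrase slop in use prevents that.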

32. StopFilter can now ignore case when checking for stop words.
(Grant Ingersoll via Yonik, LUCENE-248)

33. Add TopDocCollector and TopFieldDocCollector. These simplify the
implementation of hit collectors that collect only the
top-scoring or top-sorting hits.

API Changes

1. Several methods and fields have been deprecated. The API documentation
contains information about the recommended replacements. It is planned
that most of the deprecated methods and fields will be removed in
Lucene 2.0. (Daniel Naber)

2. The Russian and the German analyzers have been moved to contrib/analyzers.
Also, the WordlistLoader class has been moved one level up in the
hierarchy and is now org.apache.lucene.analysis.WordlistLoader
(Daniel Naber)

3. The API contained methods that declared an IOException in their
throws clause but never threw it. These declarations have been removed. If
your code tries to catch these exceptions you might need to remove
those catch clauses to avoid compile errors. (Daniel Naber)

4. Add a serializable Parameter Class to standardize parameter enum
classes in BooleanClause and Field. (Christoph)

5. Added rewrite methods to all SpanQuery subclasses that nest other SpanQuerys.
This allows custom SpanQuery subclasses that rewrite (for term expansion, for
example) to nest within the built-in SpanQuery classes successfully.

Bug fixes

1. The JSP demo page (src/jsp/results.jsp) now properly closes the
IndexSearcher it opens. (Daniel Naber)

2. Fixed a bug in IndexWriter.addIndexes(IndexReader[] readers) that
prevented deletion of obsolete segments. (Christoph Goller)

3. Fix in FieldInfos to avoid the return of an extra blank field in
IndexReader.getFieldNames() (Patch #19058). (Mark Harwood via Bernhard)

4. Some combinations of BooleanQuery and MultiPhraseQuery (formerly
PhrasePrefixQuery) could provoke UnsupportedOperationException
(bug #33161). (Rhett Sutphin via Daniel Naber)

5. Fixed a small bug in ConjunctionScorer.skipTo() that caused a
NullPointerException if skipTo() was called without a prior call to
next(). (Christoph)

6. Disable Similarity.coord() in the scoring of most automatically
generated boolean queries. The coord() score factor is
appropriate when clauses are independently specified by a user,
but is usually not appropriate when clauses are generated
automatically, e.g., by a fuzzy, wildcard or range query. Matches
on such automatically generated queries are no longer penalized
for not matching all terms. (Doug Cutting, Patch #33472)

7. Getting a lock file with Lock.obtain(long) was supposed to wait for
a given number of milliseconds, but this didn't work.
(John Wang via Daniel Naber, Bug #33799)

8. Fix FSDirectory.createOutput() to always create new files.
Previously, existing files were overwritten, and an index could be
corrupted when the old version of a file was longer than the new.
Now any existing file is first removed. (Doug Cutting)
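
The corruption described in item 8 is easy to reproduce with standard Java file APIs. The following sketch does not use FSDirectory; CreateOutputSketch and its methods are hypothetical names, and a RandomAccessFile opened in "rw" mode stands in for whatever the directory implementation used internally, which is an assumption.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CreateOutputSketch {
    // Writes over an existing file in place: bytes beyond the new data survive from the old file.
    static void overwriteInPlace(Path file, byte[] data) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.write(data);
        }
    }

    // Removes any existing file first, so the result holds only the new bytes.
    static void createFresh(Path file, byte[] data) throws IOException {
        Files.deleteIfExists(file);
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.write(data);
        }
    }

    // Returns {file length after in-place overwrite, file length after delete-then-create}.
    static long[] demoSizes() {
        try {
            Path file = Files.createTempFile("create-output-sketch", ".bin");
            try {
                byte[] oldData = "old-longer-content".getBytes(StandardCharsets.US_ASCII);
                byte[] newData = "new".getBytes(StandardCharsets.US_ASCII);
                Files.write(file, oldData);
                overwriteInPlace(file, newData);
                long stale = Files.size(file);
                Files.write(file, oldData);
                createFresh(file, newData);
                long fresh = Files.size(file);
                return new long[] {stale, fresh};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demoSizes();
        System.out.println("in place: " + sizes[0] + " bytes, fresh: " + sizes[1] + " bytes");
    }
}
```

After the in-place overwrite the file keeps its old length, so a reader sees new data followed by stale bytes; deleting first avoids that.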

9. Fix BooleanQuery containing nested SpanTermQuery instances, which previously
could return an incorrect number of hits.
(Reece Wilton via Erik Hatcher, Bug #35157)

10. Fix NullPointerException that could occur with a MultiPhraseQuery
inside a BooleanQuery.
(Hans Hjelm and Scotty Allen via Daniel Naber, Bug #35626)

11. Fixed SnowballFilter to pass through the position increment from
the original token.
(Yonik Seeley via Erik Hatcher, LUCENE-437)

12. Added Unicode range of Korean characters to StandardTokenizer,
grouping contiguous characters into a token rather than one token
per character. This change also changes the token type to "<CJ>"
for Chinese and Japanese character tokens (previously it was "<CJK>").
(Cheolgoo Kang via Otis and Erik, LUCENE-444 and LUCENE-461)

13. FieldsReader now looks at FieldInfo.storeOffsetWithTermVector and
FieldInfo.storePositionWithTermVector and creates the Field with
the correct TermVector parameter.
(Frank Steinmann via Bernhard, LUCENE-455)

14. Fixed WildcardQuery to prevent "cat" matching "ca??".
(Xiaozheng Ma via Bernhard, LUCENE-306)
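
The intended wildcard semantics can be written down as a short recursive matcher. This is a sketch, not the WildcardQuery implementation: WildcardSketch is a hypothetical name, and it assumes '?' stands for exactly one character and '*' for zero or more characters.

```java
final class WildcardSketch {
    // Returns true if the whole text matches the whole pattern.
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length(); // pattern used up: the text must be used up too
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // '*' matches zero or more characters: try every possible amount.
            for (int k = ti; k <= t.length(); k++) {
                if (matches(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            return false; // '?' and literal characters each need one character of text
        }
        if (c == '?' || c == t.charAt(ti)) {
            return matches(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("ca??", "cat"));  // false: the second '?' has nothing to consume
        System.out.println(matches("ca??", "cats")); // true
        System.out.println(matches("ca*", "ca"));    // true: '*' may match nothing
    }
}
```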

15. Fixed a bug where MultiSearcher and ParallelMultiSearcher could
change the sort order when sorting by string for documents without
a value for the sort field.
(Luc Vanlerberghe via Yonik, LUCENE-453)

16. Fixed a sorting problem with MultiSearchers that can lead to
missing or duplicate docs due to equal docs sorting in an arbitrary order.
(Yonik Seeley, LUCENE-456)
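
Missing or duplicate documents of this kind typically arise when hits with equal sort keys have no defined order, so different passes over the same hits can cut the list at different places. A common remedy, shown here as an assumption rather than as the actual patch, is to break ties on the document number. TieBreakSketch and Hit are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;

final class TieBreakSketch {
    static final class Hit {
        final int doc;
        final String sortKey;

        Hit(int doc, String sortKey) {
            this.doc = doc;
            this.sortKey = sortKey;
        }
    }

    // Orders by sort key first and by document number second, which makes the order total.
    static final Comparator<Hit> KEY_THEN_DOC =
        Comparator.comparing((Hit h) -> h.sortKey).thenComparingInt(h -> h.doc);

    // Returns the document numbers of the first n hits in sorted order.
    static int[] topDocs(Hit[] hits, int n) {
        Hit[] sorted = hits.clone();
        Arrays.sort(sorted, KEY_THEN_DOC);
        int count = Math.min(n, sorted.length);
        int[] docs = new int[count];
        for (int i = 0; i < count; i++) {
            docs[i] = sorted[i].doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        Hit[] hits = {new Hit(3, "a"), new Hit(1, "a"), new Hit(2, "b")};
        System.out.println(Arrays.toString(topDocs(hits, 2)));
    }
}
```

Because the comparator never reports two distinct documents as equal, the top n hits are the same regardless of the order in which sub-searchers deliver them.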

17. A single hit using the expert level sorted search methods
resulted in the score not being normalized.
(Yonik Seeley, LUCENE-462)

18. Fixed inefficient memory usage when loading an index into RAMDirectory.
(Volodymyr Bychkoviak via Bernhard, LUCENE-475)

19. Corrected term offsets returned by ChineseTokenizer.
(Ray Tsang via Erik Hatcher, LUCENE-324)

20. Fixed MultiReader.undeleteAll() to correctly update numDocs.
(Robert Kirchgessner via Doug Cutting, LUCENE-479)

21. Race condition in IndexReader.getCurrentVersion() and isCurrent()
fixed by acquiring the commit lock.
(Luc Vanlerberghe via Yonik Seeley, LUCENE-481)

22. IndexWriter.setMaxBufferedDocs(1) didn't have the expected effect;
this has now been fixed. (Daniel Naber)

23. Fixed QueryParser when called with a date in local form like
"[1/16/2000 TO 1/18/2000]". This query did not include the documents
of 1/18/2000, i.e. the last day was not included. (Daniel Naber)

24. Removed sorting constraint that threw an exception if there were
not yet any values for the sort field (Yonik Seeley, LUCENE-374)

Optimizations

1. Disk usage (peak requirements during indexing and optimization)
in case of compound file format has been improved.
(Bernhard, Dmitry, and Christoph)

2. Optimize the performance of certain uses of BooleanScorer,
TermScorer and IndexSearcher. In particular, a BooleanQuery
composed of TermQuery, with not all terms required, that returns a
TopDocs (e.g., through a Hits with no Sort specified) runs much
faster. (cutting)

3. Removed synchronization from reading of term vectors with an
IndexReader (Patch #30736). (Bernhard Messer via Christoph)

4. Optimize term-dictionary lookup to allocate far fewer terms when
scanning for the matching term. This speeds searches involving
low-frequency terms, where the cost of dictionary lookup can be
significant. (cutting)

5. Optimize fuzzy queries so the standard fuzzy queries with a prefix
of 0 now run 20-50% faster (Patch #31882).
(Jonathan Hager via Daniel Naber)

6. A version of BooleanScorer (BooleanScorer2) has been added that delivers
documents in increasing order and implements skipTo. For queries
with required or forbidden clauses it may be faster than the old
BooleanScorer; for BooleanQueries consisting only of optional
clauses it is probably slower. The new BooleanScorer is now the
default. (Patch 31785 by Paul Elschot via Christoph)

7. Use uncached access to norms when merging to reduce RAM usage.
(Bug #32847). (Doug Cutting)

8. Don't read term index when random-access is not required. This
reduces time to open IndexReaders and they use less memory when
random access is not required, e.g., when merging segments. The
term index is now read into memory lazily at the first
random-access. (Doug Cutting)

9. Optimize IndexWriter.addIndexes(Directory[]) when the number of
added indexes is larger than mergeFactor. Previously this could
result in quadratic performance. Now performance is n log(n).
(Doug Cutting)

10. Speed up the creation of TermEnum for indices with multiple
segments and deleted documents, and thus speed up PrefixQuery,
RangeQuery, WildcardQuery, FuzzyQuery, RangeFilter, DateFilter,
and sorting the first time on a field.
(Yonik Seeley, LUCENE-454)

11. Optimized and generalized 32 bit floating point to byte
(custom 8 bit floating point) conversions. Increased the speed of
Similarity.encodeNorm() anywhere from 10% to 250%, depending on the JVM.
(Yonik Seeley, LUCENE-467)
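
To give a feel for what a custom 8 bit floating point value looks like, here is a sketch that packs a positive float into one byte using 3 mantissa bits and 5 exponent bits with a zero exponent point of 15. That particular layout is an assumption chosen for illustration; the entry above does not spell out the format, and SmallFloatSketch is a hypothetical name.

```java
final class SmallFloatSketch {
    // Encodes a float into a byte: 3 mantissa bits, 5 exponent bits (assumed layout).
    // Values too small to represent round up to the smallest positive code,
    // zero and negative values map to 0, and values too large saturate at 0xFF.
    static byte floatToByte(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);         // keep sign, exponent and top 3 mantissa bits
        int base = (63 - 15) << 3;            // rebias the exponent into the 5 bit range
        if (small <= base) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= base + 0x100) {
            return (byte) -1;
        }
        return (byte) (small - base);
    }

    // Decodes a byte produced by floatToByte back into a float.
    static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (float f : new float[] {0.0f, 0.3f, 1.0f, 2.5f}) {
            byte encoded = floatToByte(f);
            System.out.println(f + " -> " + (encoded & 0xff) + " -> " + byteToFloat(encoded));
        }
    }
}
```

The encoding truncates the mantissa, so decoded values are at most the original value; only shifts, adds and comparisons are needed in either direction.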

Infrastructure

1. Lucene's source code repository has converted from CVS to
Subversion. The new repository is at
http://svn.apache.org/repos/asf/lucene/java/trunk

2. Lucene's issue tracker has migrated from Bugzilla to JIRA.
Lucene's JIRA is at http://issues.apache.org/jira/browse/LUCENE
The old issues are still available at
http://issues.apache.org/bugzilla/show_bug.cgi?id=xxxx
(use the bug number instead of xxxx)


1.4.3

1. The JSP demo page (src/jsp/results.jsp) now properly escapes error
messages which might contain user input (e.g. error messages about
query parsing). If you used that page as a starting point for your
own code please make sure your code also properly escapes HTML
characters from user input in order to avoid so-called cross site
scripting attacks. (Daniel Naber)
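
For code derived from the demo page, escaping can be as simple as the helper below. It is a minimal sketch: HtmlEscapeSketch and escapeHtml() are hypothetical names, not part of the demo, and the helper only covers text placed in HTML element content or quoted attribute values.

```java
final class HtmlEscapeSketch {
    // Replaces the characters that have special meaning in HTML with entity references.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A query string that would otherwise inject markup into an error message.
        String userQuery = "<script>alert(\"x\")</script>";
        System.out.println("Cannot parse query: " + escapeHtml(userQuery));
    }
}
```

The escaping belongs at the point where text is written into the page, so that every message containing user input passes through it.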

2. QueryParser changes in 1.4.2 broke the QueryParser API. Now the old
API is supported again. (Christoph)


1.4.2

1. Fixed bug #31241: Sorting could lead to incorrect results (documents
missing, others duplicated) if the sort keys were not unique and there
were more than 100 matches. (Daniel Naber)

2. Memory leak in Sort code (bug #31240) eliminated.
(Rafal Krzewski via Christoph and Daniel)

3. FuzzyQuery now takes an additional parameter that specifies the
minimum similarity that is required for a term to match the query.
The QueryParser syntax for this is term~x, where x is a floating
point number >= 0 and < 1 (a bigger number means that a higher
similarity is required). Furthermore, a prefix can be specified
for FuzzyQuerys so that only those terms are considered similar that
start with this prefix. This can speed up FuzzyQuery greatly.
(Daniel Naber, Christoph Goller)
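
How a minimum similarity and a required prefix interact can be sketched with an edit-distance based measure. The similarity formula used here (1 minus the edit distance divided by the length of the shorter term) is an assumption for illustration, as are the names FuzzySketch, similarity() and matches(); the entry above does not define the exact measure.

```java
final class FuzzySketch {
    // Classic Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity measure: 1 - distance / length of the shorter term.
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    // A candidate matches if it shares the first prefixLength characters of the
    // query term and its similarity exceeds the minimum.
    static boolean matches(String queryTerm, String candidate, double minSimilarity, int prefixLength) {
        String prefix = queryTerm.substring(0, Math.min(prefixLength, queryTerm.length()));
        if (!candidate.startsWith(prefix)) {
            return false; // cheap rejection: no edit distance has to be computed
        }
        return similarity(queryTerm, candidate) > minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(matches("lucene", "lucent", 0.5, 0)); // one edit away
        System.out.println(matches("lucene", "lucid", 0.5, 0));  // three edits away
        System.out.println(matches("lucene", "pucene", 0.5, 1)); // rejected by the prefix check
    }
}
```

The prefix check is where the speed-up comes from: candidates that do not start with the prefix are discarded before the comparatively expensive distance computation.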

4. PhraseQuery and PhrasePrefixQuery now allow the explicit specification
of relative positions. (Christoph Goller)

5. QueryParser changes: Fix for ArrayIndexOutOfBoundsExceptions
(patch #9110); some unused method parameters removed; the ability
to specify a minimum similarity for FuzzyQuery has been added.
(Christoph Goller)

6. IndexSearcher optimization: a new ScoreDoc is no longer allocated
for every non-zero-scoring hit. This makes 'OR' queries that
contain common terms substantially faster. (cutting)


1.4.1

1. Fixed a performance bug in hit sorting code, where values were not
correctly cached. (Aviran via cutting)

2. Fixed errors in file format documentation. (Daniel Naber)


1.4 final

1. Added "an" to the list of stop words in StopAnalyzer, to complement
the existing "a" there. Fix for bug 28960
(http://issues.apache.org/bugzilla/show_bug.cgi?id=28960). (Otis)

2. Added new class FieldCache to manage in-memory caches of field term
values. (Tim Jones)

3. Added overloaded getFieldQuery method to QueryParser which
accepts the slop factor specified for the phrase (or the default
phrase slop for the QueryParser instance). This allows overriding
methods to replace a PhraseQuery with a SpanNearQuery instead,
keeping the proper slop factor. (Erik Hatcher)

4. Changed the encoding of GermanAnalyzer.java and GermanStemmer.java to
UTF-8 and changed the build encoding to UTF-8, to make changed files
compile. (Otis Gospodnetic)

5. Removed synchronization from term lookup under IndexReader methods
termFreq(), termDocs() or termPositions() to improve
multi-threaded performance. (cutting)

6. Fix a bug where obsolete segment files were not deleted on Win32.


1.4 RC3

1. Fixed several search bugs introduced by the skipTo() changes in
release 1.4RC1. The index file format was changed a bit, so
collections must be re-indexed to take advantage of the skipTo()
optimizations. (Christoph Goller)

2. Added new Document methods, removeField() and removeFields().
(Christoph Goller)

3. Fixed inconsistencies with index closing. Indexes and directories
are now only closed automatically by Lucene when Lucene opened
them automatically. (Christoph Goller)

4. Added new class: FilteredQuery. (Tim Jones)

5. Added a new SortField type for custom comparators. (Tim Jones)

6. Lock obtain timed out message now displays the full path to the lock
file. (Daniel Naber via Erik)

7. Fixed a bug in SpanNearQuery when ordered. (Paul Elschot via cutting)

8. Fixed so that FSDirectory's locks still work when the
java.io.tmpdir system property is null. (cutting)

9. Changed FilteredTermEnum's constructor to take no parameters,
as the parameters were ignored anyway (bug #28858)

1.4 RC2

1. GermanAnalyzer now throws an exception if the stopword file
cannot be found (bug #27987). It now uses LowerCaseFilter
(bug #18410) (Daniel Naber via Otis, Erik)

2. Fixed a few bugs in the file format documentation. (cutting)


1.4 RC1

1. Changed the format of the .tis file, so that:

- it has a format version number, which makes it easier to
back-compatibly change file formats in the future.

- the term count is now stored as a long. This was the one aspect
of Lucene's file formats which limited index size.

- a few internal index parameters are now stored in the index, so
that they can (in theory) now be changed from index to index,
although there is not yet an API to do so.

These changes are back compatible. The new code can read old
indexes. But old code will not be able to read new indexes. (cutting)

2. Added an optimized implementation of TermDocs.skipTo(). A skip
table is now stored for each term in the .frq file. This only
adds a percent or two to overall index size, but can substantially
speed up many searches. (cutting)
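
The principle of a skip table can be shown on an in-memory posting list. This sketch makes no claim about the .frq file layout: SkipTableSketch, the fixed skip interval and the array representation are all assumptions chosen to keep the example small.

```java
final class SkipTableSketch {
    // Records the first document number of every block of `interval` postings.
    static int[] buildSkipTable(int[] docs, int interval) {
        int blocks = (docs.length + interval - 1) / interval;
        int[] skipDocs = new int[blocks];
        for (int i = 0; i < blocks; i++) {
            skipDocs[i] = docs[i * interval];
        }
        return skipDocs;
    }

    // Returns the first document number >= target, or -1 if there is none.
    // The skip table is used to jump over whole blocks before scanning linearly.
    static int skipTo(int[] docs, int[] skipDocs, int interval, int target) {
        int block = 0;
        while (block + 1 < skipDocs.length && skipDocs[block + 1] <= target) {
            block++;
        }
        for (int i = block * interval; i < docs.length; i++) {
            if (docs[i] >= target) {
                return docs[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9, 14, 20, 27, 35, 44, 54}; // sorted document numbers for one term
        int interval = 3;
        int[] skipDocs = buildSkipTable(docs, interval);
        System.out.println(skipTo(docs, skipDocs, interval, 30));
        System.out.println(skipTo(docs, skipDocs, interval, 100));
    }
}
```

The table holds one entry per block, which is why it adds little to the index size while letting a conjunctive scorer bypass most postings of a frequent term.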

3. Restructured the Scorer API and all Scorer implementations to take
advantage of an optimized TermDocs.skipTo() implementation. In
particular, PhraseQuerys and conjunctive BooleanQuerys are
faster when one clause has substantially fewer matches than the
others. (A conjunctive BooleanQuery is a BooleanQuery where all
clauses are required.) (cutting)

4. Added new class ParallelMultiSearcher. Combined with
RemoteSearchable this makes it easy to implement distributed
search systems. (Jean-Francois Halleux via cutting)

5. Added support for hit sorting. Results may now be sorted by any
indexed field. For details see the javadoc for
Searcher#search(Query, Sort). (Tim Jones via Cutting)

6. Changed FSDirectory to auto-create a full directory tree that it
needs by using mkdirs() instead of mkdir(). (Mladen Turk via Otis)

7. Added a new span-based query API. This implements, among other
things, nested phrases. See javadocs for details. (Doug Cutting)

8. Added new method Query.getSimilarity(Searcher), and changed
scorers to use it. This permits one to subclass a Query class so
that it can specify its own Similarity implementation, perhaps
one that delegates through that of the Searcher. (Julien Nioche
via Cutting)

9. Added MultiReader, an IndexReader that combines multiple other
IndexReaders. (Cutting)

10. Added support for term vectors. See Field#isTermVectorStored().
(Grant Ingersoll, Cutting & Dmitry)

11. Fixed the old bug with escaping of special characters in query
strings: http://issues.apache.org/bugzilla/show_bug.cgi?id=24665
(Jean-Francois Halleux via Otis)

12. Added support for overriding default values for the following,
using system properties:
- default commit lock timeout
- default maxFieldLength
- default maxMergeDocs
- default mergeFactor
- default minMergeDocs
- default write lock timeout
(Otis)

13. Changed QueryParser.jj to allow '-' and '+' within tokens:
http://issues.apache.org/bugzilla/show_bug.cgi?id=27491
(Morus Walter via Otis)

14. Changed so that the compound index format is used by default.
This makes indexing a bit slower, but vastly reduces the chances
of file handle problems. (Cutting)


1.3 final

1. Added catch of BooleanQuery$TooManyClauses in QueryParser to
throw ParseException instead. (Erik Hatcher)

2. Fixed a NullPointerException in Query.explain(). (Doug Cutting)

3. Added a new method IndexReader.setNorm(), that permits one to
alter the boosting of fields after an index is created.

4. Distinguish between the final position and length when indexing a
field. The length is now defined as the total number of tokens,
instead of the final position, as it was previously. Length is
used for score normalization (Similarity.lengthNorm()) and for
controlling memory usage (IndexWriter.maxFieldLength). In both of
these cases, the total number of tokens is a better value to use
than the final token position. Position is used in phrase
searching (see PhraseQuery and Token.setPositionIncrement()).

5. Fix StandardTokenizer's handling of CJK characters (Chinese,
Japanese and Korean ideograms). Previously contiguous sequences
were combined in a single token, which is not very useful. Now
each ideogram generates a separate token, which is more useful.


1.3 RC3

1. Added minMergeDocs in IndexWriter. This can be raised to speed
indexing without altering the number of files, but only using more
memory. (Julien Nioche via Otis)

2. Fix bug #24786, in query rewriting. (bschneeman via Cutting)

3. Fix bug #16952, in demo HTML parser, skip comments in
javascript. (Christoph Goller)

4. Fix bug #19253, in demo HTML parser, add whitespace as needed to
output (Daniel Naber via Christoph Goller)

5. Fix bug #24301, in demo HTML parser, long titles no longer
hang things. (Christoph Goller)

6. Fix bug #23534, Replace use of file timestamp of segments file
with an index version number stored in the segments file. This
resolves problems when running on file systems with low-resolution
timestamps, e.g., HFS under MacOS X. (Christoph Goller)

7. Fix QueryParser so that TokenMgrError is not thrown, only
ParseException. (Erik Hatcher)

8. Fix some bugs introduced by change 11 of RC2. (Christoph Goller)

9. Fixed a problem compiling TestRussianStem. (Christoph Goller)

10. Cleaned up some build stuff. (Erik Hatcher)


1.3 RC2

1. Added getFieldNames(boolean) to IndexReader, SegmentReader, and
SegmentsReader. (Julien Nioche via otis)

2. Changed file locking to place lock files in
System.getProperty("java.io.tmpdir"), where all users are
permitted to write files. This way folks can open and correctly
lock indexes which are read-only to them.

3. IndexWriter: added a new method, addDocument(Document, Analyzer),
permitting one to easily use different analyzers for different
documents in the same index.

4. Minor enhancements to FuzzyTermEnum.
(Christoph Goller via Otis)

5. PriorityQueue: added insert(Object) method and adjusted IndexSearcher
and MultiIndexSearcher to use it.
(Christoph Goller via Otis)

6. Fixed a bug in IndexWriter that returned incorrect docCount().
(Christoph Goller via Otis)

7. Fixed SegmentsReader to eliminate the confusing and slightly different
behaviour of TermEnum when dealing with an enumeration of all terms,
versus an enumeration starting from a specific term.
This patch also fixes incorrect term document frequencies when the same term
is present in multiple segments.
(Christoph Goller via Otis)

8. Added CachingWrapperFilter and PerFieldAnalyzerWrapper. (Erik Hatcher)

9. Added support for the new "compound file" index format (Dmitry
Serebrennikov)

10. Added Locale setting to QueryParser, for use by date range parsing.

11. Changed IndexReader so that it can be subclassed by classes
outside of its package. Previously it had package-private
abstract methods. Also modified the index merging code so that it
can work on an arbitrary IndexReader implementation, and added a
new method, IndexWriter.addIndexes(IndexReader[]), to take
advantage of this. (cutting)

12. Added a limit to the number of clauses which may be added to a
BooleanQuery. The default limit is 1024 clauses. This should
stop most OutOfMemoryErrors caused by prefix, wildcard and fuzzy
queries which run amok. (cutting)
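
Why a clause limit protects against runaway term expansion can be sketched as follows. ClauseLimitSketch, its TooManyClauses exception and expandPrefix() are hypothetical stand-ins written for this example; they only mimic the idea of failing fast once an expansion exceeds a fixed number of clauses.

```java
import java.util.ArrayList;
import java.util.List;

final class ClauseLimitSketch {
    // Thrown when an expansion would exceed the configured number of clauses.
    static final class TooManyClauses extends RuntimeException {
        TooManyClauses(int max) {
            super("more than " + max + " clauses");
        }
    }

    // Expands a prefix into one clause per matching dictionary term,
    // giving up as soon as the limit would be exceeded.
    static List<String> expandPrefix(String prefix, List<String> dictionary, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        for (String term : dictionary) {
            if (!term.startsWith(prefix)) {
                continue;
            }
            if (clauses.size() >= maxClauses) {
                throw new TooManyClauses(maxClauses);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("cab", "car", "cat", "dog");
        System.out.println(expandPrefix("ca", dictionary, 1024)); // well within the limit
        try {
            expandPrefix("ca", dictionary, 2);                    // three matches, limit of two
        } catch (TooManyClauses e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing with a dedicated exception lets callers report an overly broad query to the user instead of exhausting memory.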

13. Add new method: IndexReader.undeleteAll(). This undeletes all
deleted documents which still remain in the index. (cutting)


1.3 RC1

1. Fixed PriorityQueue's clear() method.
Fix for bug 9454, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9454
(Matthijs Bomhoff via otis)

2. Changed StandardTokenizer.jj grammar for EMAIL tokens.
Fix for bug 9015, http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9015
(Dale Anson via otis)

3. Added the ability to disable lock creation by using disableLuceneLocks
system property. This is useful for read-only media, such as CD-ROMs.
(otis)

4. Added id method to Hits to be able to access the index global id.
Required for sorting options.
(carlson)

5. Added support for new range query syntax to QueryParser.jj.
(briangoetz)

6. Added the ability to retrieve HTML documents' META tag values to
HTMLParser.jj.
(Mark Harwood via otis)

7. Modified QueryParser to make it possible to programmatically specify the
default Boolean operator (OR or AND).
(Péter Halácsy via otis)

8. Made many search methods and classes non-final, per requests.
This includes IndexWriter and IndexSearcher, among others.
(cutting)

9. Added class RemoteSearchable, providing support for remote
searching via RMI. The test class RemoteSearchableTest.java
provides an example of how this can be used. (cutting)

10. Added PhrasePrefixQuery (and supporting MultipleTermPositions). The
test class TestPhrasePrefixQuery provides the usage example.
(Anders Nielsen via otis)

11. Changed the German stemming algorithm to ignore case while
stripping. The new algorithm is faster and produces more equal
stems from nouns and verbs derived from the same word.
(gschwarz)

12. Added support for boosting the score of documents and fields via
the new methods Document.setBoost(float) and Field.setBoost(float).

Note: This changes the encoding of an indexed value. Indexes
should be re-created from scratch in order for search scores to
be correct. With the new code and an old index, searches will
yield very large scores for shorter fields, and very small scores
for longer fields. Once the index is re-created, scores will be
as before. (cutting)

13. Added new method Token.setPositionIncrement().

This permits, for the purpose of phrase searching, placing
multiple terms in a single position. This is useful with
stemmers that produce multiple possible stems for a word.

This also permits the introduction of gaps between terms, so that
terms which are adjacent in a token stream will not be matched by
an exact phrase query. This makes it possible, e.g., to build
an analyzer where phrases are not matched over stop words which
have been removed.

Finally, repeating a token with an increment of zero can also be
used to boost scores of matches on that token. (cutting)

14. Added new Filter class, QueryFilter. This constrains search
results to only match those which also match a provided query.
Results are cached, so that searches after the first on the same
index using this filter are very fast.

This could be used, for example, with a RangeQuery on a formatted
date field to implement date filtering. One could re-use a
single QueryFilter that matches, e.g., only documents modified
within the last week. The QueryFilter and RangeQuery would only
need to be reconstructed once per day. (cutting)

15. Added a new IndexWriter method, getAnalyzer(). This returns the
analyzer used when adding documents to this index. (cutting)

16. Fixed a bug with IndexReader.lastModified(). Before, document
deletion did not update this. Now it does. (cutting)

17. Added Russian Analyzer.
(Boris Okner via otis)

18. Added a public, extensible scoring API. For details, see the
javadoc for org.apache.lucene.search.Similarity.

19. Fixed return of Hits.id() from float to int. (Terry Steichen via Peter).

20. Added getFieldNames() to IndexReader and Segment(s)Reader classes.
(Peter Mularien via otis)

21. Added getFields(String) and getValues(String) methods.
Contributed by Rasik Pandey on 2002-10-09
(Rasik Pandey via otis)

22. Revised internal search APIs. Changes include:

a. Queries are no longer modified during a search. This makes
it possible, e.g., to reuse the same query instance with
multiple indexes from multiple threads.

b. Term-expanding queries (e.g. PrefixQuery, WildcardQuery,
etc.) now work correctly with MultiSearcher, fixing bugs 12619
and 12667.

c. Boosting a BooleanQuery now works, and is supported by the
query parser (problem reported by Lee Mallabone). Thus a query
like "(+foo +bar)^2 +baz" is now supported and equivalent to
"(+foo^2 +bar^2) +baz".

d. New method: Query.rewrite(IndexReader). This permits a
query to re-write itself as an alternate, more primitive query.
Most of the term-expanding query classes (PrefixQuery,
WildcardQuery, etc.) are now implemented using this method.

e. New method: Searchable.explain(Query q, int doc). This
returns an Explanation instance that describes how a particular
document is scored against a query. An explanation can be
displayed as either plain text, with the toString() method, or
as HTML, with the toHtml() method. Note that computing an
explanation is as expensive as executing the query over the
entire index. This is intended to be used in developing
Similarity implementations, and, for good performance, should
not be displayed with every hit.

f. Scorer and Weight are public, not package protected. It is now
possible for someone to write a Scorer implementation that is
not in the org.apache.lucene.search package. This is still
fairly advanced programming, and I don't expect anyone to do
this anytime soon, but at least now it is possible.

g. Added public accessors to the primitive query classes
(TermQuery, PhraseQuery and BooleanQuery), permitting access to
their terms and clauses.

Caution: These are extensive changes and they have not yet been
tested extensively. Bug reports are appreciated.
(cutting)

23. Added convenience RAMDirectory constructors taking File and String
arguments, for easy FSDirectory to RAMDirectory conversion.
(otis)

24. Added code for manual renaming of files in FSDirectory, since it
has been reported that java.io.File's renameTo(File) method sometimes
fails on Windows JVMs.
(Matt Tucker via otis)

25. Refactored QueryParser to make it easier for people to extend it.
Added the ability to automatically lower-case Wildcard terms in
the QueryParser.
(Tatu Saloranta via otis)


1.2 RC6

1. Changed QueryParser.jj to have "?" be a special character which
allowed it to be used as a wildcard term. Updated TestWildcard
unit test also. (Ralf Hettesheimer via carlson)

5574

5575

1.2 RC5

5576

5577

1. Renamed build.properties to default.properties and updated

5578

the BUILD.txt document to describe how to override the

5579

default.property settings without having to edit the file. This

5580

brings the build process closer to Scarab's build process.

5581

(jon)

5582

5583

2. Added MultiFieldQueryParser class. (Kelvin Tan, via otis)

5584

5585

3. Updated "powered by" links. (otis)

5586

5587

4. Fixed instruction for setting up JavaCC - Bug #7017 (otis)

5588

5589

5. Added throwing exception if FSDirectory could not create directory

5590

- Bug #6914 (Eugene Gluzberg via otis)

6. Updated MultiSearcher, MultiFieldQueryParser, Constants, DateFilter,
LowerCaseTokenizer javadoc (otis)

7. Added fix to avoid NullPointerException in results.jsp
(Mark Hayes via otis)

8. Changed Wildcard search to match 0 or more characters instead of 1 or more
(Lee Mallobone, via otis)

9. Fixed offset error in GermanStemFilter - Bug #7412
(Rodrigo Reyes, via otis)

10. Added unit tests for wildcard search and DateFilter (otis)

11. Allow co-existence of indexed and non-indexed fields with the same name
(cutting/casper, via otis)

12. Added escape character to query parser.
(briangoetz)

13. Applied a patch that ensures that searches that use DateFilter
don't throw an exception when no matches are found. (David Smiley, via
otis)

14. Fixed bugs in DateFilter and wildcardquery unit tests. (cutting, otis, carlson)
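
The wildcard semantics from entry 8 above ("*" matches zero or more characters) and from 1.2 RC6 ("?" as a wildcard, assumed here to match exactly one character) can be sketched as a small standalone matcher. The class name is hypothetical and this is not how WildcardQuery is implemented; it only shows which terms a pattern would accept.

```java
// Illustrative matcher only; names are hypothetical and unrelated to Lucene's classes.
class WildcardSketch {

    /**
     * Returns true if text matches pattern, where '*' matches zero or more
     * characters and '?' matches exactly one character.
     */
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length(); // pattern exhausted: text must be exhausted too
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // Let '*' consume 0, 1, 2, ... characters of the remaining text.
            for (int next = ti; next <= t.length(); next++) {
                if (matches(p, pi + 1, t, next)) {
                    return true;
                }
            }
            return false;
        }
        if (ti == t.length()) {
            return false; // '?' or a literal needs one more character
        }
        return (c == '?' || c == t.charAt(ti)) && matches(p, pi + 1, t, ti + 1);
    }
}
```

With the zero-or-more rule, WildcardSketch.matches("foo*", "foo") is true; under the earlier one-or-more behavior the same pattern would have required at least one character after "foo".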

1.2 RC4

1. Updated contributions section of website.
Add XML Document #3 implementation to Document Section.
Also added Term Highlighting to Misc Section. (carlson)

2. Fixed NullPointerException for phrase searches containing
unindexed terms, introduced in 1.2RC3. (cutting)

3. Changed document deletion code to obtain the index write lock,
enforcing the fact that document addition and deletion cannot be
performed concurrently. (cutting)

4. Various documentation cleanups. (otis, acoliver)

5. Updated "powered by" links. (cutting, jon)

6. Fixed a bug in the GermanStemmer. (Bernhard Messer, via otis)

7. Changed Term and Query to implement Serializable. (scottganyo)

8. Fixed to never delete indexes added with IndexWriter.addIndexes().
(cutting)

9. Upgraded to JUnit 3.7. (otis)

1.2 RC3

1. IndexWriter: fixed a bug where adding an optimized index to an
empty index failed. This was encountered using addIndexes to copy
a RAMDirectory index to an FSDirectory.

2. RAMDirectory: fixed a bug where RAMInputStream could not read
across more than a single buffer boundary.

3. Fix query parser so it accepts queries with unicode characters.
(briangoetz)

4. Fix query parser so that PrefixQuery is used in preference to
WildcardQuery when there's only an asterisk at the end of the
term. Previously PrefixQuery would never be used.

5. Fix tests so they compile; fix ant file so it compiles tests
properly. Added test cases for Analyzers and PriorityQueue.

6. Updated demos, added Getting Started documentation. (acoliver)

7. Added 'contributions' section to website & docs. (carlson)

8. Removed JavaCC from source distribution for copyright reasons.
Folks must now download this separately from metamata in order to
compile Lucene. (cutting)

9. Substantially improved the performance of DateFilter by adding the
ability to reuse TermDocs objects. (cutting)

10. Added IndexReader methods:
    public static boolean indexExists(String directory);
    public static boolean indexExists(File directory);
    public static boolean indexExists(Directory directory);
    public static boolean isLocked(Directory directory);
    public static void unlock(Directory directory);
(cutting, otis)

11. Fixed bugs in GermanAnalyzer (gschwarz)
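
The rule in entry 4 above (prefer a prefix query when the only wildcard is a single trailing asterisk) can be sketched as a small classification step. The class, enum and method names are hypothetical, escape characters are ignored, and the handling of "?" is an assumption; the actual decision is made in the QueryParser grammar.

```java
// Hypothetical classifier for illustration; not the QueryParser grammar.
class TermClassifierSketch {

    enum Kind { TERM, PREFIX, WILDCARD }

    /** Decides which kind of query a raw term would need. */
    static Kind classify(String term) {
        int firstStar = term.indexOf('*');
        boolean hasQuestionMark = term.indexOf('?') >= 0;
        if (firstStar < 0 && !hasQuestionMark) {
            return Kind.TERM; // no wildcard characters at all
        }
        // If the first asterisk is also the last character, it is the only one.
        boolean onlyTrailingStar = firstStar == term.length() - 1 && term.length() > 1;
        if (onlyTrailingStar && !hasQuestionMark) {
            return Kind.PREFIX; // "foo*" only needs terms starting with "foo"
        }
        return Kind.WILDCARD;
    }
}
```

A prefix lookup can enumerate the matching terms directly from the sorted term list, which is presumably why it is preferred over the general wildcard case.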

1.2 RC2:
- added sources to distribution
- removed broken build scripts and libraries from distribution
- SegmentsReader: fixed potential race condition
- FSDirectory: fixed so that getDirectory(xxx,true) correctly
erases the directory contents, even when the directory
has already been accessed in this JVM.
- RangeQuery: Fix issue where an inclusive range query would
include the nearest term in the index above a non-existent
specified upper term.
- SegmentTermEnum: Fix NullPointerException in clone() method
when the Term is null.
- JDK 1.1 compatibility fix: disabled lock files for JDK 1.1,
since they rely on a feature added in JDK 1.2.
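
The RangeQuery fix listed above concerns an upper bound that is not itself an indexed term. A minimal sketch of the intended behavior, using a sorted set as a stand-in for the term index (class and method names are hypothetical, not Lucene API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Stand-in for a term index; not Lucene's RangeQuery implementation.
class InclusiveRangeSketch {

    /** Collects every indexed term t with lower <= t <= upper. */
    static List<String> termsInRange(TreeSet<String> terms, String lower, String upper) {
        List<String> result = new ArrayList<>();
        // ceiling() jumps to the first indexed term >= lower.
        String term = terms.ceiling(lower);
        // Stop as soon as a term sorts after upper, even if upper itself is not
        // indexed; the nearest term above it must not be included.
        while (term != null && term.compareTo(upper) <= 0) {
            result.add(term);
            term = terms.higher(term);
        }
        return result;
    }
}
```

For an index containing "apple", "banana", "cherry" and "date", the range "b" to "c" yields only "banana": "cherry" sorts after "c" and is excluded, which is the case the fix addresses.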

1.2 RC1 (first Apache release):
- packages renamed from com.lucene to org.apache.lucene
- license switched from LGPL to Apache
- ant-only build -- no more makefiles
- addition of lock files--now fully thread & process safe
- addition of German stemmer
- MultiSearcher now supports low-level search API
- added RangeQuery, for term-range searching
- Analyzers can choose tokenizer based on field name
- misc bug fixes.

1.01b (last Sourceforge release)
. a few bug fixes
. new Query Parser
. new prefix query (search for "foo*" matches "food")

1.0

This release fixes a few serious bugs and also includes some
performance optimizations, a stemmer, and a few other minor
enhancements.

0.04

Lucene now includes a grammar-based tokenizer, StandardTokenizer.

The only tokenizer included in the previous release (LetterTokenizer)
identified terms consisting entirely of alphabetic characters. The
new tokenizer uses a regular-expression grammar to identify more
complex classes of terms, including numbers, acronyms, email
addresses, etc.

StandardTokenizer serves two purposes:

1. It is a much better, general purpose tokenizer for use by
applications as is.

The easiest way for applications to start using
StandardTokenizer is to use StandardAnalyzer.

2. It provides a good example of grammar-based tokenization.

If an application has special tokenization requirements, it can
implement a custom tokenizer by copying the directory containing
the new tokenizer into the application and modifying it
accordingly.
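
To give a feel for what a regular-expression grammar over term classes looks like, here is a rough sketch built on java.util.regex. It is not StandardTokenizer's JavaCC grammar: the patterns, the type names and the class itself are simplified assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration; the real StandardTokenizer grammar differs.
class GrammarTokenizerSketch {

    // Alternatives are tried in this order at each position, so the more
    // specific classes (email, acronym) come before the general ones.
    private static final String[] TYPES = {"EMAIL", "ACRONYM", "NUM", "ALPHANUM"};

    private static final Pattern TOKEN = Pattern.compile(
        "(?<EMAIL>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,})"
        + "|(?<ACRONYM>(?:[A-Za-z]\\.){2,})"
        + "|(?<NUM>\\d+(?:\\.\\d+)?)"
        + "|(?<ALPHANUM>[A-Za-z0-9]+)");

    /** Returns tokens formatted as "text/TYPE", in order of appearance. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) { // the named group that matched gives the type
                    tokens.add(m.group() + "/" + type);
                    break;
                }
            }
        }
        return tokens;
    }
}
```

An application with special requirements would adjust the patterns or add new token classes, in the same spirit as copying and modifying the tokenizer's grammar.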

0.01

First open source release.

The code has been re-organized into a new package and directory
structure for this release. It builds OK, but has not been tested
beyond that since the re-organization.
