<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<title>Apache Lucene API</title>
<p>Apache Lucene is a high-performance, full-featured text search engine library.
Here's a simple example of how to use Lucene for indexing and searching (using JUnit
to check that the results are what we expect):</p>
<!-- code comes from org.apache.lucene.TestDemo: -->
<!-- ======================================================== -->
<!-- = Java Sourcecode to HTML automatically converted code = -->
<!-- = Java2Html Converter 5.0 [2006-03-04] by Markus Gebhard markus@jave.de = -->
<!-- = Further information: http://www.java2html.de = -->
<pre class="prettyprint">
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
//Directory directory = FSDirectory.open(new File("/tmp/testindex"));
IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
                                      new IndexWriter.MaxFieldLength(25000));
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, Field.Store.YES,
                  Field.Index.ANALYZED));
iwriter.addDocument(doc);
// Close the writer so the added document is committed and visible to readers:
iwriter.close();

// Now search the index:
IndexReader ireader = IndexReader.open(directory); // read-only=true
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
assertEquals(1, hits.length);
// Iterate through the results:
for (int i = 0; i < hits.length; i++) {
  Document hitDoc = isearcher.doc(hits[i].doc);
  assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}
// Release the searcher, the reader, and the directory, in that order:
isearcher.close();
ireader.close();
directory.close();</pre>
<!-- = END of automatically generated HTML code = -->
<!-- ======================================================== -->

<p>The Lucene API is divided into several packages:</p>
<ul>
<li>
<b><a href="org/apache/lucene/analysis/package-summary.html">org.apache.lucene.analysis</a></b>
defines an abstract <a href="org/apache/lucene/analysis/Analyzer.html">Analyzer</a>
API for converting text from a <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>
into a <a href="org/apache/lucene/analysis/TokenStream.html">TokenStream</a>,
an enumeration of token <a href="org/apache/lucene/util/Attribute.html">Attribute</a>s.
A TokenStream can be composed by applying <a href="org/apache/lucene/analysis/TokenFilter.html">TokenFilter</a>s
to the output of a <a href="org/apache/lucene/analysis/Tokenizer.html">Tokenizer</a>.
Tokenizers and TokenFilters are strung together and applied with an <a href="org/apache/lucene/analysis/Analyzer.html">Analyzer</a>.
A handful of Analyzer implementations are provided, including <a href="org/apache/lucene/analysis/StopAnalyzer.html">StopAnalyzer</a>
and the grammar-based <a href="org/apache/lucene/analysis/standard/StandardAnalyzer.html">StandardAnalyzer</a>.</li>

<li>
<b><a href="org/apache/lucene/document/package-summary.html">org.apache.lucene.document</a></b>
provides a simple <a href="org/apache/lucene/document/Document.html">Document</a>
class. A Document is simply a set of named <a href="org/apache/lucene/document/Field.html">Field</a>s,
whose values may be strings or instances of <a href="http://java.sun.com/products/jdk/1.2/docs/api/java/io/Reader.html">java.io.Reader</a>.</li>

<li>
<b><a href="org/apache/lucene/index/package-summary.html">org.apache.lucene.index</a></b>
provides two primary classes: <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>,
which creates indices and adds documents to them; and <a href="org/apache/lucene/index/IndexReader.html">IndexReader</a>,
which accesses the data in the index.</li>

<li>
<b><a href="org/apache/lucene/search/package-summary.html">org.apache.lucene.search</a></b>
provides data structures to represent queries (e.g. <a href="org/apache/lucene/search/TermQuery.html">TermQuery</a>
for individual words, <a href="org/apache/lucene/search/PhraseQuery.html">PhraseQuery</a>
for phrases, and <a href="org/apache/lucene/search/BooleanQuery.html">BooleanQuery</a>
for boolean combinations of queries) and the abstract <a href="org/apache/lucene/search/Searcher.html">Searcher</a>,
which turns queries into <a href="org/apache/lucene/search/TopDocs.html">TopDocs</a>.
<a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
implements search over a single IndexReader.</li>

<li>
<b><a href="org/apache/lucene/queryParser/package-summary.html">org.apache.lucene.queryParser</a></b>
uses <a href="http://javacc.dev.java.net">JavaCC</a> to implement a
<a href="org/apache/lucene/queryParser/QueryParser.html">QueryParser</a>.</li>

<li>
<b><a href="org/apache/lucene/store/package-summary.html">org.apache.lucene.store</a></b>
defines an abstract class for storing persistent data, the <a href="org/apache/lucene/store/Directory.html">Directory</a>,
which is a collection of named files written by an <a href="org/apache/lucene/store/IndexOutput.html">IndexOutput</a>
and read by an <a href="org/apache/lucene/store/IndexInput.html">IndexInput</a>.
Multiple implementations are provided, including <a href="org/apache/lucene/store/FSDirectory.html">FSDirectory</a>,
which uses a file system directory to store files, and <a href="org/apache/lucene/store/RAMDirectory.html">RAMDirectory</a>,
which implements files as memory-resident data structures.</li>

<li>
<b><a href="org/apache/lucene/util/package-summary.html">org.apache.lucene.util</a></b>
contains a few handy data structures and utility classes, e.g. <a href="org/apache/lucene/util/BitVector.html">BitVector</a>
and <a href="org/apache/lucene/util/PriorityQueue.html">PriorityQueue</a>.</li>
</ul>
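<p>To make the analysis description above more concrete, here is a minimal plain-Java sketch of
a tokenizer whose output passes through a chain of filters. It is only a toy model of the idea:
<code>AnalysisChainSketch</code>, <code>TokenStage</code>, and the helper methods are hypothetical names
invented for this illustration and are not Lucene classes, and the stop-word set is an arbitrary assumption.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of an analysis chain. None of these types are Lucene classes.
class AnalysisChainSketch {

    /** One filtering stage: consumes a token list and produces a new one (stands in for a TokenFilter). */
    interface TokenStage {
        List<String> apply(List<String> tokens);
    }

    /** Stands in for a Tokenizer: splits raw text on runs of characters that are neither letters nor digits. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Filter stage that lower-cases every token. */
    static TokenStage lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String token : tokens) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    /** Filter stage that drops every token contained in the given stop-word set. */
    static TokenStage removeStopWords(Set<String> stopWords) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String token : tokens) {
                if (!stopWords.contains(token)) {
                    out.add(token);
                }
            }
            return out;
        };
    }

    /** Stands in for an Analyzer: one tokenizer followed by the filter stages, applied in order. */
    static List<String> analyze(String text, TokenStage... stages) {
        List<String> tokens = tokenize(text);
        for (TokenStage stage : stages) {
            tokens = stage.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed stop words, chosen only for this example.
        Set<String> stopWords = new HashSet<>(Arrays.asList("this", "is", "the", "to", "be"));
        List<String> terms = analyze("This is the text to be indexed.",
                                     lowerCase(), removeStopWords(stopWords));
        System.out.println(terms); // prints [text, indexed]
    }
}
```

<p>The order of the stages matters in this sketch: lower-casing runs before stop-word removal so that
"This" and "this" are treated alike.</p>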
To use Lucene, an application should:
<ol>
<li>
Create <a href="org/apache/lucene/document/Document.html">Document</a>s by
adding
<a href="org/apache/lucene/document/Field.html">Field</a>s;</li>

<li>
Create an <a href="org/apache/lucene/index/IndexWriter.html">IndexWriter</a>
and add documents to it with <a href="org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document)">addDocument()</a>;</li>

<li>
Call <a href="org/apache/lucene/queryParser/QueryParser.html#parse(java.lang.String)">QueryParser.parse()</a>
to build a query from a string; and</li>

<li>
Create an <a href="org/apache/lucene/search/IndexSearcher.html">IndexSearcher</a>
and pass the query to its <a href="org/apache/lucene/search/Searcher.html#search(org.apache.lucene.search.Query)">search()</a>
method.</li>
</ol>
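<p>The steps above can be pictured with a small, dependency-free sketch of what an index does
conceptually: documents made of named fields go in, each term is recorded against the documents
that contain it, and a query term is looked up to find matching documents. This is an illustration
under simplifying assumptions, not how Lucene is implemented; <code>TinyIndexSketch</code> and its
methods are hypothetical names, the "analysis" is a plain lower-case split, and only single-term
queries are supported.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index. A conceptual illustration only, not Lucene code.
class TinyIndexSketch {

    // Stored field values, indexed by document id (the position in this list).
    private final List<Map<String, String>> documents = new ArrayList<>();

    // Postings: "field:term" -> sorted ids of the documents containing that term in that field.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Steps 1 and 2: take a document (field name -> text), store it, and index each of its terms. */
    int addDocument(Map<String, String> fields) {
        int docId = documents.size();
        documents.add(new HashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String lowered = field.getValue().toLowerCase(Locale.ROOT);
            for (String term : lowered.split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>())
                            .add(docId);
                }
            }
        }
        return docId;
    }

    /** Steps 3 and 4: normalize a single-term query string and return the matching document ids. */
    List<Integer> search(String field, String queryText) {
        String term = queryText.trim().toLowerCase(Locale.ROOT);
        TreeSet<Integer> hits = postings.get(field + ":" + term);
        return hits == null ? Collections.<Integer>emptyList() : new ArrayList<>(hits);
    }

    /** Returns the stored value of a field for a hit. */
    String storedValue(int docId, String field) {
        return documents.get(docId).get(field);
    }

    public static void main(String[] args) {
        TinyIndexSketch index = new TinyIndexSketch();
        Map<String, String> doc = new HashMap<>();
        doc.put("fieldname", "This is the text to be indexed.");
        index.addDocument(doc);

        List<Integer> hits = index.search("fieldname", "text");
        System.out.println(hits); // prints [0]
        for (int docId : hits) {
            System.out.println(index.storedValue(docId, "fieldname"));
        }
    }
}
```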
Some simple examples of code that performs these steps are:
<ul>
<li>
<a href="http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/contrib/demo/src/java/org/apache/lucene/demo/IndexFiles.java">IndexFiles.java</a> creates an
index for all the files contained in a directory.</li>

<li>
<a href="http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene/demo/src/java/org/apache/lucene/demo/SearchFiles.java">SearchFiles.java</a> prompts for
queries and searches an index.</li>
</ul>
To demonstrate these, try something like:
<blockquote><tt>> <b>java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups</b></tt>
<br><tt>adding rec.food.recipes/soups/abalone-chowder</tt>
<br><tt> </tt>[ ... ]

<p><tt>> <b>java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles</b></tt>
<br><tt>Query: <b>chowder</b></tt>
<br><tt>Searching for: chowder</tt>
<br><tt>34 total matching documents</tt>
<br><tt>1. rec.food.recipes/soups/spam-chowder</tt>
<br><tt> </tt>[ ... thirty-four documents contain the word "chowder" ... ]

<p><tt>Query: <b>"clam chowder" AND Manhattan</b></tt>
<br><tt>Searching for: +"clam chowder" +manhattan</tt>
<br><tt>2 total matching documents</tt>
<br><tt>1. rec.food.recipes/soups/clam-chowder</tt>
<br><tt> </tt>[ ... two documents contain the phrase "clam chowder"
and the word "manhattan" ... ]
<br> [ Note: "+" and "-" are canonical, but "AND", "OR"
and "NOT" may be used. ]</blockquote>
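<p>As a rough illustration of what the canonical "+" (required) and "-" (prohibited) markers mean,
the sketch below combines sets of matching document ids: required clauses are intersected,
prohibited clauses are subtracted, and unmarked (optional) clauses select documents only when no
clause is required. It is a simplified model that ignores scoring; <code>BooleanClauseSketch</code>
and its methods are hypothetical names, and the document ids are made up for the example rather
than taken from the demo index.</p>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of required (+), optional, and prohibited (-) clauses. Not Lucene code.
class BooleanClauseSketch {

    /** Combines the per-clause sets of matching document ids into the final set of matches. */
    static Set<Integer> evaluate(List<Set<Integer>> required,
                                 List<Set<Integer>> optional,
                                 List<Set<Integer>> prohibited) {
        Set<Integer> result = new TreeSet<>();
        if (!required.isEmpty()) {
            // A document must match every required clause.
            result.addAll(required.get(0));
            for (Set<Integer> clause : required) {
                result.retainAll(clause);
            }
        } else {
            // With no required clause, a document must match at least one optional clause.
            for (Set<Integer> clause : optional) {
                result.addAll(clause);
            }
        }
        // A document matching any prohibited clause is excluded.
        for (Set<Integer> clause : prohibited) {
            result.removeAll(clause);
        }
        return result;
    }

    /** Convenience helper for building a sorted set of document ids. */
    static Set<Integer> setOf(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        // Hypothetical document ids, invented for this example.
        Set<Integer> clamChowder = setOf(1, 4, 7); // documents containing the phrase "clam chowder"
        Set<Integer> manhattan = setOf(4, 7, 9);   // documents containing the word "manhattan"
        List<Set<Integer>> none = Collections.emptyList();

        // +"clam chowder" +manhattan
        System.out.println(evaluate(Arrays.asList(clamChowder, manhattan), none, none)); // prints [4, 7]

        // +"clam chowder" -manhattan
        System.out.println(evaluate(Arrays.asList(clamChowder), none, Arrays.asList(manhattan))); // prints [1]
    }
}
```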