Linux Build instructions
=======================================

If you downloaded CLucene as a tarball, you should be able to skip straight
to the section titled 'building'; otherwise, read the next section.

Rebuilding the autobuild scripts
--------------------------------
If you made changes to configure.ac or any of the Makefile.am
files, you will also need to run through this process.

GNU autotools is required. I have the following versions installed:

If you use significantly older versions, I can almost guarantee
issues. This is because each of the autotools is constantly changing
with little regard to backward compatibility, or even compatibility
with the other autotools.
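
Because the autotools versions matter this much, it helps to know exactly what you have installed. A minimal sketch that prints the first line of each tool's --version output; the function name show_autotools_versions is hypothetical, and it only checks autoconf, automake and libtool.

```shell
# Hypothetical helper: print the first --version line of each GNU autotool,
# or note that the tool is missing.
show_autotools_versions() {
    local tool
    for tool in autoconf automake libtool; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" --version | head -n 1
        else
            echo "$tool: not installed"
        fi
    done
}

show_autotools_versions
```

Running it before autogen.sh makes a missing or very old tool show up early, instead of as an obscure error halfway through the script.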

Run the autogen.sh script in the root directory of the CLucene source tree to run the necessary commands.

The following will get you building, assuming that you have sufficiently
recent build tools installed.

3.) If you downloaded a tar version, skip to step 5.
7.) Things will churn for a very long time; the CLucene library will
be built, as well as the examples.
8.) Check the src/demo, test and src directories.

In src/demo you should see:

In test you should see:

In src you should see:
libclucene.so.0.0.0 libclucene.la libclucene.a
and symbolic links to these files.

9.) If you want, run make install to copy the CLucene files into the system
include and lib directories.
10.) You may have to run the following so that programs can find the shared library:
export LD_LIBRARY_PATH=/path/to/clucene/lib
11.) Run ./cl_test in the test directory and check that all the tests pass.
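
The export in step 10 replaces any LD_LIBRARY_PATH you already had. A minimal sketch that prepends the CLucene library directory instead and keeps the existing entries; /path/to/clucene/lib is the same placeholder as in step 10 and must be replaced with your real library directory.

```shell
# Placeholder: replace with the directory that contains libclucene.so.
CLUCENE_LIB=/path/to/clucene/lib

# Prepend the directory. ${VAR:+...} adds the ':' separator only when
# LD_LIBRARY_PATH was already set and non-empty, so no empty entry is created.
export LD_LIBRARY_PATH="$CLUCENE_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```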

Alternative (faster) way of building:
-------------------------------------
This method does not create library files, so depending on your needs you may not
find it useful.

* Do steps 1-5 of the previous build process.
* Change directory into src/
* Change directory into test/ (cd ../test/)
* You should see cl_test_monolithic in this directory
* Run ./cl_test_monolithic and check that all the tests pass

* Packages are available for most Linux distributions through the usual channels.
* The CLucene SourceForge website also has some distributions available.

This document also contains information on building from source, troubleshooting,
performance, and creating a new distribution.

* CMake version 2.4.2 or later.
* A functioning and fairly new C++ compiler. We test mostly on GCC and Visual Studio 6+.
  Anything other than that may not work.
* Something to unzip/untar the source code.

1.) Download the latest source code from http://www.sourceforge.net/projects/clucene
[Choose stable if you want the 'time tested' version of the code. However, the
unstable version will often suit your needs better, since it is newer and has had
more work put into it. The decision is up to you.]
2.) Unpack the tarball/zip/bzip/whatever
3.) Open a command prompt, terminal window, or Cygwin session.
4.) Change directory into the root of the source code (from now on referred to as <clucene>)
5.) Create an 'out-of-source' directory for your build and change into it.
[This is by far the easiest way to build; it has the benefit of being able to
create different types of builds in the same source tree.]
# mkdir <clucene>/build-name
# cd <clucene>/build-name
6.) Configure using cmake. This can be done in many different ways, but the basic syntax is
# cmake [-G "Script name"] ..
[Where "Script name" is the name of the build scripts to generate (e.g. "Visual Studio 8 2005").
A list of supported build scripts can be found by]
7.) You can configure several options, such as the build type, debugging information,
mmap support, etc., by using the CMake GUI or by calling
Make sure you call configure again if you make any changes.
8.) Start the build. This depends on which build script you specified, but it would be something like
Or open the solution files with your IDE.
[You can also build just a single target, such as cl_test, cl_demo,
clucene-core (shared library) or clucene-core-static (static library).]
9.) The binary files will be available in <clucene>/build-name/bin
10.) Test the code (after building the tests; this is done by default, or by calling make cl_test).
11.) At this point you can install the library:
[There are options to do this from the IDE, but I find it easier to create a
distribution (see instructions below) and install that instead.]
[This creates the demo application, which demonstrates simple text indexing and searching.]
Adjust build values using ccmake or the CMake GUI and rebuild.
12.) Now you can develop your own code. This is beyond the scope of this document.
Read the README for information about documentation or to get help on the mailing list.
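
On a Unix-like system with the default Makefile generator, steps 5, 6, 8 and 9 can be collected into one shell function. This is a sketch, not part of CLucene: the function name clucene_build and its arguments are hypothetical, and Release is only an example value for CMAKE_BUILD_TYPE. It stops early if the given directory has no CMakeLists.txt or cmake is not installed.

```shell
# Hypothetical helper: configure and build CLucene in an out-of-source directory.
# Usage: clucene_build <clucene-source-dir> [build-dir-name] [build-type]
clucene_build() {
    if [ $# -lt 1 ]; then
        echo "usage: clucene_build <clucene-source-dir> [build-dir-name] [build-type]" >&2
        return 2
    fi
    local src=$1
    local name=${2:-build-name}      # same default name as in step 5
    local build_type=${3:-Release}   # example value for CMAKE_BUILD_TYPE

    if [ ! -f "$src/CMakeLists.txt" ]; then
        echo "error: $src does not contain a CMakeLists.txt" >&2
        return 1
    fi
    if ! command -v cmake >/dev/null 2>&1; then
        echo "error: cmake was not found in PATH" >&2
        return 1
    fi

    # Steps 5, 6 and 8: create the build directory, configure, then build.
    # The subshell keeps the caller's working directory unchanged.
    mkdir -p "$src/$name" &&
    (
        cd "$src/$name" &&
        cmake -DCMAKE_BUILD_TYPE="$build_type" .. &&
        make
    ) &&
    echo "binaries are in $src/$name/bin"    # step 9
}
```

For example, clucene_build ~/clucene build-debug Debug would create a debug build next to an existing release build, which is the benefit of out-of-source builds mentioned in step 5. With a Visual Studio generator you would open the solution files instead, as step 8 says.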

Some platforms require specific actions to get cmake working. Here are some general tips:

I had problems when using the standard STL library; using the -stlport4 switch worked. I had
to specify the compiler from the command line: cmake -DCXX_COMPILER=xxx -stlport4

Using ccache will speed up build times a lot. I found it easiest to add the /usr/lib/ccache directory to the beginning of your PATH. This works for most common compilers.
PATH=/usr/lib/ccache:$PATH
Note: you must do this BEFORE you configure the build with cmake, since you cannot change the compiler path after it has been configured.
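
The PATH line above can be wrapped in a small function that prepends a directory only if it is not already listed, so it is safe to put in a shell startup file and run more than once. A sketch; prepend_path is a hypothetical name, and /usr/lib/ccache is the directory mentioned above, which may differ on your distribution.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already present: leave PATH alone
        *) PATH="$1${PATH:+:$PATH}" ;;
    esac
    export PATH
}

# ccache compiler wrappers; run this before the first cmake call.
prepend_path /usr/lib/ccache
```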

CLucene is installed in CMAKE_INSTALL_PREFIX by default.

CLucene used to put config headers next to the library. This was done
because these headers are generated and are relevant to the library.
CMAKE_INSTALL_PREFIX was for system-independent files. The idea is that
you could have several versions of the library installed (ASCII version,
UCS-2 version, multithreaded, etc.) and have only one set of headers.
In version 0.9.24+ we allow this feature, but you have to use
LUCENE_SYS_INCLUDES to specify where to install these files.
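
As a hedged illustration of the two variables named above, a configure call that installs under a custom prefix and puts the generated config headers in a separate, library-specific directory might look like this. Both paths are made-up example values, not CLucene defaults.

```shell
# Run from <clucene>/build-name. /opt/clucene and the include directory
# below are example paths only; choose your own.
cmake -DCMAKE_INSTALL_PREFIX=/opt/clucene \
      -DLUCENE_SYS_INCLUDES=/opt/clucene/lib/clucene-ucs2/include \
      ..
```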

Some platforms don't provide enough file handles to run CLucene properly.
To solve this, increase the open file limit:
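
On Linux, one common way to do this is the shell's ulimit builtin. This sketch raises the soft limit on open files to the hard limit for the current shell session only; how many handles CLucene actually needs depends on your index, so treat it as a starting point rather than a fixed recipe.

```shell
# Show the current soft and hard limits on open file descriptors.
echo "soft limit: $(ulimit -S -n)"
echo "hard limit: $(ulimit -H -n)"

# Raise the soft limit up to the hard limit. This affects only this shell
# and the programs started from it (for example bin/cl_test). Raising the
# hard limit itself usually needs root privileges.
ulimit -S -n "$(ulimit -H -n)"
echo "new soft limit: $(ulimit -S -n)"
```

To make a higher limit permanent, most Linux distributions read /etc/security/limits.conf through pam_limits; check your distribution's documentation.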

GDB - GNU debugging tool (linux only)
-------------------------------------
If you get an error, try doing this. More information on GDB can be found on the internet.
When gdb shows a crash, run
A backtrace will be printed. This may help to solve the problem.
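
GDB can also be run non-interactively to get a backtrace in one step. This sketch relies on gdb's -batch, -ex and --args options; the function name gdb_backtrace is hypothetical, and bin/cl_test in the usage line is just the test program from the build instructions.

```shell
# Hypothetical helper: run a program under gdb and print a backtrace when it stops.
# Usage: gdb_backtrace <program> [arguments...]
gdb_backtrace() {
    if [ $# -lt 1 ]; then
        echo "usage: gdb_backtrace <program> [arguments...]" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "error: gdb was not found in PATH" >&2
        return 1
    fi
    # -batch exits after the -ex commands: 'run' starts the program and
    # 'bt' prints the backtrace (gdb reports "No stack." if it exited normally).
    gdb -batch -ex run -ex bt --args "$@"
}
```

For example: gdb_backtrace bin/cl_test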

* clucene-config.h is required and is distributed next to the library, so that multiple libraries can exist on the
  same machine but use the same header files.
* _HeaderFile.h files are private, and are not to be used or distributed by anything besides the clucene-core library.
* _clucene-config.h should NOT be used; it is also internal.
* HeaderFile.h files are public and are distributed, and the classes within should be exported using CLUCENE_EXPORT.
* The exception to the internal/public conventions is if you use the static library. In this case the internal
  symbols will be available (this is how the tests program tests internal code). However, this is not recommended.
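
Following these naming rules, an installed or packaged include directory should never contain a header whose name starts with an underscore. A minimal check, assuming the headers sit under a single include directory; the function name find_private_headers is hypothetical.

```shell
# Hypothetical helper: list private headers (names starting with "_") found
# under an include directory. Prints nothing and returns 0 when there are none.
find_private_headers() {
    local dir=${1:-.}
    local found
    found=$(find "$dir" -type f -name '_*.h' | sort)
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
        return 1
    fi
    return 0
}
```

For example, find_private_headers <install-prefix>/include || echo "private headers were installed" can be run after make install or on an unpacked binary package.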

Memory in CLucene has been a bit of a difficult thing to manage because of the
unclear specification about who owns what memory. This was mostly a result of
CLucene's Java-esque coding style, which came from porting from Java to C++ without
much rewriting of the API. However, CLucene is slowly improving
in this respect, and we try to follow these development and coding rules (though
we don't guarantee that they are all met at this stage):

1. Whenever possible, the caller must create the object that is being filled. For example:
   IndexReader->getDocument(id, document);
   as opposed to the old method of document = IndexReader->getDocument(id);
2. Clone always returns a new object that must be cleaned up manually.

1. What should be the convention for an object taking ownership of memory?
   Some documentation is available on this, but not much.

Working with valgrind
----------------------
Valgrind reports memory leaks and memory problems. Tests should always run
cleanly under valgrind before they are considered to have passed.
# valgrind --leak-check=full <program>
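
To make a memory error or leak fail a script rather than just appear in the log, valgrind's --error-exitcode option can be added to the command above. A sketch; the wrapper name valgrind_check is hypothetical.

```shell
# Hypothetical wrapper: run a program under valgrind's memcheck tool and
# return 1 if valgrind reports errors. With --leak-check=full, recent valgrind
# versions count definite and possible leaks as errors by default; otherwise
# the program's own exit status is returned.
# Usage: valgrind_check <program> [arguments...]
valgrind_check() {
    if [ $# -lt 1 ]; then
        echo "usage: valgrind_check <program> [arguments...]" >&2
        return 2
    fi
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "error: valgrind was not found in PATH" >&2
        return 1
    fi
    valgrind --leak-check=full --error-exitcode=1 "$@"
}
```

For example: valgrind_check bin/cl_test && echo "clean under valgrind"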

Memory leak tracking with dmalloc
---------------------------------
dmalloc (http://dmalloc.com/) is also a nice tool for finding memory leaks.
To enable it, set the ENABLE_DMALLOC flag to ON in cmake. You will, of course,
need the dmalloc library installed for this to work.

By default, cl_test will print a low number of errors and leaks into
the dmalloc.log.txt file (however, this has a tendency to report false positives).
You can override this by setting the DMALLOC_OPTIONS environment variable.
See http://dmalloc.com/ or dmalloc --usage for more information on how to use dmalloc.
# DMALLOC_OPTIONS=medium,log=dmalloc.log.txt
# export DMALLOC_OPTIONS

UPDATE: when I upgraded my machine to Ubuntu 9.04, dmalloc stopped working (caused

Performance with callgrind
--------------------------
# valgrind --tool=callgrind <command: e.g. bin/cl_test>
This will create a file like callgrind.out.12345. You can open this with kcachegrind or some
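
If no GUI is available, callgrind's own text tool, callgrind_annotate, gives a quick summary. This sketch also uses callgrind's --callgrind-out-file option to choose the output name instead of callgrind.out.<pid>; the function name callgrind_profile and the 40-line cut-off are arbitrary choices for illustration.

```shell
# Hypothetical helper: profile a program with callgrind, then print the
# top of callgrind_annotate's text report.
# Usage: callgrind_profile <output-file> <program> [arguments...]
callgrind_profile() {
    if [ $# -lt 2 ]; then
        echo "usage: callgrind_profile <output-file> <program> [arguments...]" >&2
        return 2
    fi
    local out=$1
    shift
    local tool
    for tool in valgrind callgrind_annotate; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "error: $tool was not found in PATH" >&2
            return 1
        fi
    done

    # --callgrind-out-file replaces the default callgrind.out.<pid> file name.
    valgrind --tool=callgrind --callgrind-out-file="$out" "$@"
    if [ ! -f "$out" ]; then
        echo "error: callgrind did not write $out" >&2
        return 1
    fi
    callgrind_annotate "$out" | head -n 40
}
```

For example: callgrind_profile cl_test.callgrind bin/cl_test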

Performance with gprof
----------------------
Note: I recommend callgrind; it works much better.

Compile with gprof turned on (ENABLE_GPROF in the cmake GUI or using ccmake).
I've found (at least on Windows/Cygwin) that gprof wasn't working across
DLL boundaries; running the cl_test-pedantic monolithic build worked better.

This is typically what I use to produce some meaningful output after a -pg
compiled application has exited:
# gprof bin/cl_test-pedantic.exe gmon.out >gprof.txt

Code coverage with gcov
-----------------------
To create a code coverage report of the tests, you can use gcov. Here are the
steps I followed to create a nice HTML report. You'll need the lcov package
installed to generate HTML. Also, I recommend using an out-of-source build
directory, as there are lots of files that will be generated.

NOTE: you must have lcov installed for this to work

* It is normally recommended to compile with no optimisations, so change CMAKE_BUILD_TYPE
* I have created a cl_test-gcov target which contains the necessary gcc switches
  already. So all you need to do is

If everything goes well, there will be a directory called code-coverage containing the report.

If you want to do this process manually, then:
# lcov --directory ./src/test/CMakeFiles/cl_test-gcov.dir/__/core/CLucene -c -o clucene-coverage.info
# lcov --remove clucene-coverage.info "/usr/*" > clucene-coverage.clean
# genhtml -o clucene-coverage clucene-coverage.clean

If all of those commands succeed, there will be a CLucene coverage report in the
clucene-coverage directory.

Very little benchmarking has been done on CLucene. Andi Vajda posted some
limited statistics on the CLucene list a while ago with the following results.

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
org.apache.lucene.demo.IndexFiles with java and gcj:
on Mac OS X 10.3.1 (Panther), PowerBook G4 1 GHz, 1 GB:
. running with java 1.4.1_01-99 : 20379 ms
. running with gcj 3.3.2 -O2 : 17842 ms
. running clucene 0.8.9's demo : 9930 ms

I recently did some more tests and came up with these rough results:
663 MB (797 files) of Gutenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
- Jlucene: 646453 ms, peak memory usage ~72 MB, average ~14 MB RAM
- Clucene: 232141 ms, peak memory usage ~60 MB, average ~4 MB RAM

Searching the index using 10,000 single-word queries
- Jlucene: ~60078 ms and used ~13 MB RAM
- Clucene: ~48359 ms and used ~4.2 MB RAM
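
To compare these timings directly, the speed-up is the slower time divided by the faster one. A small awk-based sketch using only the numbers quoted above; the function name speedup is hypothetical, and it treats the Clucene indexing figure as milliseconds, like the Jlucene one.

```shell
# Hypothetical helper: ratio of a slower time to a faster one, 2 decimals.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

echo "java 1.4.1 vs clucene 0.8.9 (indexing):  $(speedup 20379 9930)x"
echo "gcj 3.3.2 vs clucene 0.8.9 (indexing):   $(speedup 17842 9930)x"
echo "Jlucene vs Clucene (Gutenberg indexing): $(speedup 646453 232141)x"
echo "Jlucene vs Clucene (10,000 queries):     $(speedup 60078 48359)x"
```

With these inputs the ratios come out at roughly 2.05, 1.80, 2.78 and 1.24, so the indexing gap is much larger than the searching gap in these runs.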

CPack is used for creating distributions.
* Create an out-of-source build as usual
* Make sure the version number is correct (see <clucene>/CMakeLists.txt, right at the top of the file)
* Make sure you are compiling in the correct release mode (check ccmake or the cmake gui)
* Make sure you enable ENABLE_PACKAGING (check ccmake or the cmake gui)
* Next, check that the package is compliant using several tests (must be done from a Linux terminal, or Cygwin):
  # cd <clucene>/build-name
* Make sure the source directory is clean. Make sure there are no unknown svn files:
* Run the tests to make sure that the code is ok (documented above)
* If all tests pass, then run
  for the binary package (and header files). This will only create a tar.gz package.
  # make package_source
  for the source package. This will create a ZIP on Windows, and tar.bz2 and tar.gz packages on other platforms.

There are also options for creating RPM, Cygwin, NSIS, Debian packages, etc. It depends on your version of CPack.
to get a list of generators.

Then create a special package by calling
# cpack -G <GENERATOR> CPackConfig.cmake
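
When several package types are needed, the cpack call above can be repeated for each generator. A sketch; the function name make_packages is hypothetical, and which generator names (for example TGZ, ZIP, RPM, DEB, NSIS) work depends on your CPack version and platform.

```shell
# Hypothetical helper: run cpack once per generator in a configured build dir.
# Usage: make_packages <build-dir> <GENERATOR> [GENERATOR...]
make_packages() {
    if [ $# -lt 2 ]; then
        echo "usage: make_packages <build-dir> <GENERATOR> [GENERATOR...]" >&2
        return 2
    fi
    local build_dir=$1
    shift
    if [ ! -f "$build_dir/CPackConfig.cmake" ]; then
        echo "error: no CPackConfig.cmake in $build_dir (is ENABLE_PACKAGING on?)" >&2
        return 1
    fi
    if ! command -v cpack >/dev/null 2>&1; then
        echo "error: cpack was not found in PATH" >&2
        return 1
    fi
    # Stop at the first generator that fails; the subshell keeps the
    # caller's working directory unchanged.
    (
        cd "$build_dir" || exit 1
        for gen in "$@"; do
            cpack -G "$gen" CPackConfig.cmake || exit 1
        done
    )
}
```

For example: make_packages <clucene>/build-name TGZ DEB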