<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>Recoll user manual</title>
<meta name="GENERATOR" content="Modular DocBook HTML Stylesheet Version 1.79" />
<link rel="STYLESHEET" type="text/css" href="docbook.css" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
11
<body class="BOOK" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#840084"
13
<div class="BOOK"><a id="AEN1" name="AEN1"></a>
14
<div class="TITLEPAGE">
15
<h1 class="TITLE"><a id="AEN2" name="AEN2">Recoll user manual</a></h1>
17
<h3 class="AUTHOR"><a id="AEN4" name="AEN4"></a>Jean-Francois Dockes</h3>
19
<div class="AFFILIATION">
21
<p class="ADDRESS"><code class="EMAIL">&lt;<a
href="mailto:jean-francois.dockes@wanadoo.fr">jean-francois.dockes@wanadoo.fr</a>&gt;</code></p>
26
<p class="COPYRIGHT">Copyright © 2005 Jean-Francois Dockes</p>
29
<div class="ABSTRACT"><a id="AEN14" name="AEN14"></a>
30
<p>This document introduces full text search notions and describes the installation and
31
use of the <b class="APPLICATION">Recoll</b> application.</p>
<dt><b>Table of Contents</b></dt>
<dt>1. <a href="#RCL.INTRODUCTION">Introduction</a></dt>
<dt>1.1. <a href="#RCL.INTRODUCTION.TRYIT">Giving it a try</a></dt>
<dt>1.2. <a href="#RCL.INTRODUCTION.SEARCH">Full text search</a></dt>
<dt>1.3. <a href="#RCL.INTRODUCTION.RECOLL">Recoll overview</a></dt>
<dt>2. <a href="#RCL.INDEXING">Indexing</a></dt>
<dt>2.1. <a href="#RCL.INDEXING.INTRODUCTION">Introduction</a></dt>
<dt>2.2. <a href="#RCL.INDEXING.STORAGE">Index storage</a></dt>
<dt>2.2.1. <a href="#RCL.INDEXING.STORAGE.SECURITY">Security aspects</a></dt>
<dt>2.3. <a href="#RCL.INDEXING.CONFIG">The indexing configuration</a></dt>
<dt>2.4. <a href="#RCL.INDEXING.PERIODIC">Periodic indexing</a></dt>
<dt>2.4.1. <a href="#RCL.INDEXING.PERIODIC.EXEC">Starting indexing</a></dt>
<dt>2.4.2. <a href="#RCL.INDEXING.PERIODIC.AUTOMAT">Using <tt class="COMMAND">cron</tt>
to automate indexing</a></dt>
<dt>2.5. <a href="#RCL.INDEXING.MONITOR">Real time indexing</a></dt>
<dt>3. <a href="#RCL.SEARCH">Searching</a></dt>
<dt>3.1. <a href="#RCL.SEARCH.SIMPLE">Simple search</a></dt>
<dt>3.2. <a href="#RCL.SEARCH.RESLIST">The result list</a></dt>
<dt>3.2.1. <a href="#RCL.SEARCH.RESULTLIST.MENU">The result list right-click
<dt>3.3. <a href="#RCL.SEARCH.PREVIEW">The preview window</a></dt>
<dt>3.4. <a href="#RCL.SEARCH.LANG">The query language</a></dt>
<dt>3.5. <a href="#RCL.SEARCH.COMPLEX">Complex/advanced search</a></dt>
<dt>3.6. <a href="#RCL.SEARCH.TERMEXPLORER">The term explorer tool</a></dt>
<dt>3.7. <a href="#RCL.SEARCH.WILDCARDS">More about wildcards</a></dt>
<dt>3.8. <a href="#RCL.SEARCH.MULTIDB">Multiple databases</a></dt>
<dt>3.9. <a href="#RCL.SEARCH.HISTORY">Document history</a></dt>
<dt>3.10. <a href="#RCL.SEARCH.SORT">Sorting search results</a></dt>
<dt>3.11. <a href="#RCL.SEARCH.TIPS">Search tips, shortcuts</a></dt>
<dt>3.12. <a href="#RCL.SEARCH.CUSTOM">Customizing the search interface</a></dt>
<dt>4. <a href="#RCL.INSTALL">Installation</a></dt>
<dt>4.1. <a href="#RCL.INSTALL.BINARY">Installing a prebuilt copy</a></dt>
<dt>4.1.1. <a href="#RCL.INSTALL.BINARY.PACKAGE">Installing through a package
<dt>4.1.2. <a href="#RCL.INSTALL.BINARY.RCL">Installing a prebuilt <b
class="APPLICATION">Recoll</b></a></dt>
<dt>4.2. <a href="#RCL.INSTALL.EXTERNAL">Supporting packages</a></dt>
<dt>4.3. <a href="#RCL.INSTALL.BUILDING">Building from source</a></dt>
<dt>4.3.1. <a href="#RCL.INSTALL.BUILDING.PREREQS">Prerequisites</a></dt>
<dt>4.3.2. <a href="#RCL.INSTALL.BUILDING.BUILD">Building</a></dt>
<dt>4.3.3. <a href="#RCL.INSTALL.BUILDING.INSTALL">Installation</a></dt>
<dt>4.4. <a href="#RCL.INSTALL.CONFIG">Configuration overview</a></dt>
<dt>4.4.1. <a href="#RCL.INSTALL.CONFIG.RECOLLCONF">Main configuration file</a></dt>
<dt>4.4.2. <a href="#RCLINSTALL.CONFIG.MIMEMAP">The mimemap file</a></dt>
<dt>4.4.3. <a href="#RCLINSTALL.CONFIG.MIMECONF">The mimeconf file</a></dt>
<dt>4.4.4. <a href="#RCLINSTALL.CONFIG.MIMEVIEW">The mimeview file</a></dt>
<dt>4.4.5. <a href="#RCLINSTALL.CONFIG.EXAMPLES">Examples of configuration
174
<div class="CHAPTER">
176
<h1><a id="RCL.INTRODUCTION" name="RCL.INTRODUCTION"></a>Chapter 1. Introduction</h1>
179
<h2 class="SECT1"><a id="RCL.INTRODUCTION.TRYIT" name="RCL.INTRODUCTION.TRYIT">1.1.
180
Giving it a try</a></h2>
182
<p>If you do not like reading manuals (who does?) and would like to give <b
183
class="APPLICATION">Recoll</b> a try, just perform <a
184
href="#RCL.INSTALL.BINARY">installation</a> and start the <tt class="COMMAND">recoll</tt>
185
user interface, which will index your home directory by default, allowing you to search
186
immediately after indexing completes.</p>
188
<p>Do not do this if your home directory contains a huge number of documents and you do
189
not want to wait or are very short on disk space. In this case, you may want to edit the
190
<a href="#RCL.INDEXING.CONFIG">configuration file</a> first to restrict the indexed
193
<p>Also be aware that you may need to install the appropriate <a
194
href="#RCL.INSTALL.EXTERNAL">supporting applications</a> for document types that need
195
them (for example <b class="APPLICATION">antiword</b> for ms-word files).</p>
200
<h2 class="SECT1"><a id="RCL.INTRODUCTION.SEARCH" name="RCL.INTRODUCTION.SEARCH">1.2.
201
Full text search</a></h2>
203
<p><b class="APPLICATION">Recoll</b> is a full text search application. Full text search
204
applications let you find your data by content rather than by external attributes (like a
205
file name). More specifically, they will let you specify words (terms) that should or
206
should not appear in the text you are looking for, and return a list of matching
207
documents, ordered so that the most <span class="emphasis"><i
208
class="EMPHASIS">relevant</i></span> documents will appear first.</p>
210
<p>You do not need to remember in what file or email message you stored a given piece of
211
information. You just ask for related terms, and the tool will return a list of documents
212
where those terms are prominent, in a similar way to Internet search engines.</p>
214
<p><b class="APPLICATION">Recoll</b> tries to determine which documents are most relevant
215
to the search terms you provide. Computer algorithms for determining relevance can be
216
very complex, and in general are inferior to the power of the human mind to rapidly
217
determine relevance. The quality of relevance guessing by the search tool is probably the
218
most important element for a search application.</p>
220
<p>In many cases, you are looking for all the forms of a word, not for a specific form or
221
spelling. These different forms may include plurals, different tenses for a verb, or
222
terms derived from the same root or <span class="emphasis"><i
223
class="EMPHASIS">stem</i></span> (example: floor, floors, floored, flooring...). <b
224
class="APPLICATION">Recoll</b> will by default expand queries to all such related terms
225
(words that reduce to the same stem). This expansion can be disabled at search time.</p>
227
<p>Stemming, by itself, does not handle misspellings or phonetic searches. <b
class="APPLICATION">Recoll</b> supports these features through a specific tool (the <tt
class="LITERAL">term explorer</tt>) which lets you explore the set of index terms
in different modes.</p>
235
<h2 class="SECT1"><a id="RCL.INTRODUCTION.RECOLL" name="RCL.INTRODUCTION.RECOLL">1.3.
236
Recoll overview</a></h2>
238
<p><b class="APPLICATION">Recoll</b> uses the <a href="http://www.xapian.org"
239
target="_top"><b class="APPLICATION">Xapian</b></a> information retrieval library as its
240
storage and retrieval engine. <b class="APPLICATION">Xapian</b> is a very mature package
241
using <a href="http://www.xapian.org/docs/intro_ir.html" target="_top">a sophisticated
242
probabilistic ranking model</a>. <b class="APPLICATION">Recoll</b> provides the interface
243
to get data into (indexing) and out (searching) of the system.</p>
245
<p>In practice, <b class="APPLICATION">Xapian</b> works by remembering where terms appear
246
in your document files. The acquisition process is called indexing.</p>
248
<p>The resulting index can be big (roughly the size of the original document set), but it
249
is not a document archive. <b class="APPLICATION">Recoll</b> can only display documents
250
that still exist at the place from which they were indexed. (Actually, there is a way to
251
reconstruct a document from the information in the index, but the result is not nice, as
252
all formatting, punctuation and capitalization are lost).</p>
254
<p><b class="APPLICATION">Recoll</b> stores all internal data in <b
255
class="APPLICATION">Unicode UTF-8</b> format, and it can index files with different
256
character sets, encodings, and languages into the same index. It has input filters for
257
many document types.</p>
259
<p>Stemming depends on the document language. <b class="APPLICATION">Recoll</b> stores
260
the unstemmed versions of terms and uses auxiliary databases for term expansion. It can
261
switch stemming languages, or add a language, without re-indexing. Storing documents in
262
different languages in the same index is possible, and useful in practice, but does
263
introduce possibilities of confusion. <b class="APPLICATION">Recoll</b> currently makes
264
no attempt at automatic language recognition.</p>
266
<p><b class="APPLICATION">Recoll</b> has many parameters which define exactly what to
267
index, and how to classify and decode the source documents. These are kept in a <a
268
href="#RCL.INDEXING.CONFIG">configuration file</a>. A default configuration is copied
269
into a standard location (usually something like <tt
270
class="FILENAME">/usr/[local/]share/recoll/examples</tt>) during installation. The
271
default parameters from this file may be overridden by values that you set inside your
272
personal configuration, found by default in the <tt class="FILENAME">.recoll</tt>
273
sub-directory of your home directory. The default configuration will index your home
274
directory with default parameters and should be sufficient for giving <b
275
class="APPLICATION">Recoll</b> a try, but you may want to adjust it later.</p>
277
<p><a href="#RCL.INDEXING.PERIODIC.EXEC">Indexing</a> is started automatically the first
278
time you execute the <tt class="COMMAND">recoll</tt> search graphical user interface, or
279
by executing the <tt class="COMMAND">recollindex</tt> command.</p>
281
<p><a href="#RCL.SEARCH">Searches</a> are performed inside the <tt
282
class="COMMAND">recoll</tt> program, which has many options to help you find what you are
287
<div class="CHAPTER">
289
<h1><a id="RCL.INDEXING" name="RCL.INDEXING"></a>Chapter 2. Indexing</h1>
292
<h2 class="SECT1"><a id="RCL.INDEXING.INTRODUCTION" name="RCL.INDEXING.INTRODUCTION">2.1.
293
Introduction</a></h2>
295
<p>Indexing is the process by which the set of documents is analyzed and the data entered
296
into the database. <b class="APPLICATION">Recoll</b> indexing is normally incremental:
297
documents will only be processed if they have been modified. On the first execution, of
298
course, all documents will need processing. A full index build can be forced later by
299
specifying an option to the indexing command (<tt class="COMMAND">recollindex
302
<p><b class="APPLICATION">Recoll</b> indexing can be performed with two different
307
<div class="FORMALPARA">
308
<p><b>Periodic indexing:</b> indexing takes place at discrete times, by executing the <tt
309
class="COMMAND">recollindex</tt> command. The typical usage is to have a nightly indexing
310
run <a href="#RCL.INDEXING.PERIODIC.AUTOMAT">programmed</a> into your <tt
311
class="COMMAND">cron</tt> file.</p>
316
<div class="FORMALPARA">
317
<p><b>Real time indexing:</b> indexing takes place as soon as a file is created or
changed. <tt class="COMMAND">recollindex</tt> runs as a daemon and uses a file system
alteration monitor such as <b class="APPLICATION">Fam</b>, <b
class="APPLICATION">Gamin</b> or <b class="APPLICATION">inotify</b> to detect file
changes. Monitoring a big directory tree can consume significant system resources.</p>
326
<p>The choice between the two methods is mostly a matter of preference, and they can be
combined by setting up multiple indexes (for example, periodic indexing on a big documentation
directory, and real time indexing on a small home directory). Monitoring a big file
system tree can consume significant system resources, for dubious gains.</p>
333
<p><b class="APPLICATION">Recoll</b> knows about quite a few different document types.
The parameters for document type recognition and processing are set in <a
href="#RCL.INDEXING.CONFIG">configuration files</a>. Most file types, like HTML or word
processing files, only hold one document. Some file types, like mail folder files, can
hold many individually indexed documents.</p>
339
<p><b class="APPLICATION">Recoll</b> indexing processes plain text, HTML, OpenOffice and
e-mail files internally. Other types (e.g. PostScript, PDF, MS Word, RTF) need external
applications for preprocessing. The list is in the <a
href="#RCL.INSTALL.EXTERNAL">installation</a> section.</p>
344
<p>Without further configuration, <b class="APPLICATION">Recoll</b> will index all
345
appropriate files from your home directory, with a reasonable set of defaults.</p>
347
<p>In some cases, it may be interesting to index different areas of the file system to
348
separate databases. You can do this by using multiple configuration directories, each
349
indexing a file system area to a specific database. See the <a
350
href="#RCL.SEARCH.MULTIDB">section about using multiple databases</a> for more
351
information on multiple configurations and indexes.</p>
356
<h2 class="SECT1"><a id="RCL.INDEXING.STORAGE" name="RCL.INDEXING.STORAGE">2.2. Index
359
<p>The default location for the index data is the <tt class="FILENAME">xapiandb</tt>
360
subdirectory of the <b class="APPLICATION">Recoll</b> configuration directory, typically
361
<tt class="FILENAME">$HOME/.recoll/xapiandb/</tt>. This can be changed via two different
362
methods (with different purposes):</p>
366
<p>You can specify a different configuration directory by setting the <tt
367
class="LITERAL">RECOLL_CONFDIR</tt> environment variable, or using the <tt
368
class="LITERAL">-c</tt> option to the <b class="APPLICATION">Recoll</b> commands. This
369
method would typically be used to index different areas of the file system to different
370
indexes. For example, if you were to issue the following commands:</p>
372
<pre class="PROGRAMLISTING">
373
export RECOLL_CONFDIR=~/.indexes-email
378
Then <b class="APPLICATION">Recoll</b> would use configuration files stored in <tt
class="FILENAME">~/.indexes-email/</tt> and (unless specified otherwise in <tt
class="FILENAME">recoll.conf</tt>) would look for the index in <tt
class="FILENAME">~/.indexes-email/xapiandb/</tt>. <br />
383
<p>Using multiple configuration directories and <a
384
href="#RCL.INSTALL.CONFIG.RECOLLCONF">configuration options</a> allows you to tailor
385
multiple configurations and indexes to handle whatever subset of the available data that
386
you wish to make searchable.</p>
390
<p>You can also specify a different storage location for the index by setting the <tt
391
class="LITERAL">dbdir</tt> parameter in the configuration file (see the <a
392
href="#RCL.INSTALL.CONFIG.RECOLLCONF">configuration section</a>). This method would
393
mainly be of use if you wanted to keep the configuration directory in its default
394
location, but desired another location for the index, typically out of disk occupation
399
<p>The size of the index is determined by the size of the set of documents, but the ratio
400
can vary a lot. For a typical mixed set of documents, the index size will often be close
401
to the data set size. In specific cases (a set of compressed mbox files for example), the
402
index can become much bigger than the documents. It may also be much smaller if the
403
documents contain a lot of images or other non-indexed data (an extreme example being a
404
set of mp3 files where only the tags would be indexed).</p>
406
<p>Of course, images, sound and video do not increase the index size, which means that
nowadays (2006) even a big index will typically be negligible compared to
the total amount of data on the computer.</p>
410
<p>The index data directory (<tt class="FILENAME">xapiandb</tt>) only contains data that
411
can be completely rebuilt by an index run, and it can always be destroyed safely.</p>
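<p>The two relocation methods described in this section can be sketched in shell. The
helper below is hypothetical (it is not a <b class="APPLICATION">Recoll</b> command): it
prints where the index would be expected, given <tt class="LITERAL">RECOLL_CONFDIR</tt>
and an optional <tt class="LITERAL">dbdir</tt> value copied by hand from the
configuration file. It assumes that a relative <tt class="LITERAL">dbdir</tt> is taken
relative to the configuration directory.</p>

```shell
# Sketch only: recoll_index_dir is a hypothetical helper, not part of Recoll.
# $1: optional dbdir value, copied by hand from recoll.conf (may be omitted).
recoll_index_dir() {
    # Configuration directory: RECOLL_CONFDIR if set, else the default location.
    confdir=${RECOLL_CONFDIR:-$HOME/.recoll}
    # Index directory name: the given dbdir, else the default xapiandb.
    dbdir=${1:-xapiandb}
    case $dbdir in
        /*) printf '%s\n' "$dbdir" ;;            # absolute dbdir: used as is
        *)  printf '%s\n' "$confdir/$dbdir" ;;   # assumption: relative to confdir
    esac
}
```

<p>For example, <kbd class="USERINPUT">RECOLL_CONFDIR=~/.indexes-email
recoll_index_dir</kbd> would print the <tt class="FILENAME">xapiandb</tt> subdirectory of
that configuration directory.</p>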
415
<h3 class="SECT2"><a id="RCL.INDEXING.STORAGE.SECURITY"
416
name="RCL.INDEXING.STORAGE.SECURITY">2.2.1. Security aspects</a></h3>
418
<p>The <b class="APPLICATION">Recoll</b> index does not hold copies of the indexed
419
documents. But it does hold enough data to allow for an almost complete reconstruction.
420
If confidential data is indexed, access to the database directory should be
423
<p>As of version 1.4, <b class="APPLICATION">Recoll</b> will create the configuration
424
directory with a mode of 0700 (access by owner only). As the index data directory is by
425
default a sub-directory of the configuration directory, this should result in appropriate
428
<p>If you use another setup, you should think of the kind of protection you need for your
429
index, and set the directory and files access modes appropriately.</p>
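<p>If you do relocate the index, one possible way to get the same owner-only protection is
to create the target directory yourself before indexing. The function name below is made
up for illustration, and the path is whatever location you chose for the index.</p>

```shell
# Sketch only: create a directory that only its owner can read or enter.
# make_private_dir is a hypothetical helper name.
make_private_dir() {
    # mode 700: read/write/search for the owner, nothing for group and others
    mkdir -p "$1" && chmod 700 "$1"
}
```

<p>Because other users cannot traverse a directory with mode 0700, the files inside it
are out of their reach whatever their individual modes are.</p>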
435
<h2 class="SECT1"><a id="RCL.INDEXING.CONFIG" name="RCL.INDEXING.CONFIG">2.3. The
436
indexing configuration</a></h2>
438
<p>You can control which areas of the file system are indexed, and how files are
439
processed, by setting variables inside the <a href="#RCL.INSTALL.CONFIG"><b
440
class="APPLICATION">Recoll</b> configuration files</a>.</p>
442
<p>You can also use <a href="#RCL.SEARCH.MULTIDB">multiple indexes</a> defined by
443
separate configurations, typically to separate personal and shared indexes, or to take
444
advantage of the organization of your data to improve search precision.</p>
446
<p>The first time you start <tt class="COMMAND">recoll</tt>, you will be asked whether or
447
not you would like recoll to build the index. If you want to adjust the configuration
448
before indexing, just click <span class="GUILABEL">Cancel</span> at this point. That way,
449
recoll will have created a ~/.recoll directory containing empty configuration files.</p>
451
<p>The configuration is documented inside the <a href="#RCL.INSTALL.CONFIG">installation
chapter</a> of this document, or in the recoll.conf(5) man page. The most immediately
useful variable is probably <a
href="#RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS">topdirs</a>, which determines what subtrees
457
<p>The applications needed to index file types other than text, HTML or email (e.g. PDF,
PostScript, MS Word...) are described in the <a href="#RCL.INSTALL.EXTERNAL">external
packages section</a>.</p>
464
<h2 class="SECT1"><a id="RCL.INDEXING.PERIODIC" name="RCL.INDEXING.PERIODIC">2.4.
465
Periodic indexing</a></h2>
468
<h3 class="SECT2"><a id="RCL.INDEXING.PERIODIC.EXEC"
469
name="RCL.INDEXING.PERIODIC.EXEC">2.4.1. Starting indexing</a></h3>
471
<p>Indexing is performed either by the <tt class="COMMAND">recollindex</tt> program, or
by the indexing thread inside the <tt class="COMMAND">recoll</tt> program (use the <span
class="GUIMENU">File</span> menu). Both programs will use the <tt
class="LITERAL">RECOLL_CONFDIR</tt> variable or accept a <tt class="LITERAL">-c</tt> <tt
class="REPLACEABLE"><i>confdir</i></tt> option to specify the configuration directory to
478
<p>If the <tt class="COMMAND">recoll</tt> program finds no index when it starts, it will
479
automatically start indexing (except if canceled).</p>
481
<p>It is best to avoid interrupting the indexing process, as this may sometimes leave the
482
index in a bad state. This is not a serious problem, as you then just need to delete the
483
index files and restart the indexing. The index files are normally stored in the <tt
484
class="FILENAME">$HOME/.recoll/xapiandb</tt> directory, which you can just delete if
485
needed. Alternatively, you can start <tt class="COMMAND">recollindex</tt> with option <tt
486
class="LITERAL">-z</tt>, which will reset the database before indexing.</p>
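<p>The recovery step can be wrapped in a small shell function. This is only a sketch: the
function name and the <tt class="LITERAL">RECOLLINDEX</tt> override variable are
invented here, and combining <tt class="LITERAL">-c</tt> with <tt
class="LITERAL">-z</tt> on one command line is assumed to work as the two option
descriptions suggest.</p>

```shell
# Sketch only: reset the index for one configuration directory and rebuild it.
# reset_and_reindex and the RECOLLINDEX variable are hypothetical names.
reset_and_reindex() {
    # Configuration directory: first argument, else RECOLL_CONFDIR, else the default.
    confdir=${1:-${RECOLL_CONFDIR:-$HOME/.recoll}}
    # -z resets the database before indexing; -c selects the configuration.
    "${RECOLLINDEX:-recollindex}" -c "$confdir" -z
}
```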
491
<h3 class="SECT2"><a id="RCL.INDEXING.PERIODIC.AUTOMAT"
492
name="RCL.INDEXING.PERIODIC.AUTOMAT">2.4.2. Using <tt class="COMMAND">cron</tt> to
493
automate indexing</a></h3>
495
<p>The most common way to set up indexing is to have a cron task execute it every night.
496
For example the following <tt class="FILENAME">crontab</tt> entry would do it every day
497
at 3:30AM (supposing <tt class="COMMAND">recollindex</tt> is in your PATH):</p>
499
<pre class="PROGRAMLISTING">
500
30 3 * * * recollindex &gt; /tmp/recolltrace 2&gt;&amp;1
503
<p>The usual command to edit your <tt class="FILENAME">crontab</tt> is <kbd
504
class="USERINPUT">crontab -e</kbd> (which will usually start the <tt
505
class="COMMAND">vi</tt> editor to edit the file). You may have more sophisticated tools
506
available on your system.</p>
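<p>If you maintain several configuration directories, the nightly job could call a small
wrapper instead of <tt class="COMMAND">recollindex</tt> directly. The following is a
sketch under assumptions: the function name and the <tt class="LITERAL">RECOLLINDEX</tt>
override variable are hypothetical, and the directories you pass are your own.</p>

```shell
# Sketch only: run one indexing pass per configuration directory.
# index_all and the RECOLLINDEX variable are hypothetical names.
index_all() {
    rc=0
    for confdir in "$@"; do
        # -c selects the configuration directory for this pass
        "${RECOLLINDEX:-recollindex}" -c "$confdir" || {
            echo "indexing failed for $confdir" >&2
            rc=1
        }
    done
    # non-zero if at least one pass failed
    return $rc
}
```

<p>A failure in one configuration does not stop the others from being indexed, which
seems the more useful behaviour for an unattended run.</p>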
512
<h2 class="SECT1"><a id="RCL.INDEXING.MONITOR" name="RCL.INDEXING.MONITOR">2.5. Real time
515
<p>Real time monitoring/indexing is performed by starting the <tt
516
class="COMMAND">recollindex -m</tt> command. With this option, <tt
517
class="COMMAND">recollindex</tt> will detach from the terminal and become a daemon,
518
permanently monitoring file changes and updating the index.</p>
520
<p>The real time indexing support can be customised during package <a
521
href="#RCL.INSTALL.BUILDING.BUILD">configuration</a> with the <tt
522
class="LITERAL">--with[out]-fam</tt> or <tt class="LITERAL">--with[out]-inotify</tt>
523
options. The default is currently to include inotify monitoring on systems that support
526
<p>The <tt class="FILENAME">rclmon.sh</tt> script can be used to easily start and stop
the daemon. It can be found in the <tt class="FILENAME">examples</tt> directory
(typically <tt class="FILENAME">/usr/[local/]share/recoll/examples</tt>).</p>
530
<p>Starting the daemon is normally performed as part of the user session script. For
531
example, my out of fashion xdm-based session has a <tt class="FILENAME">.xsession</tt>
532
script with the following lines at the end:</p>
534
<pre class="PROGRAMLISTING">
535
recollconf=$HOME/.recoll-home
536
recolldata=/usr/local/share/recoll
537
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
542
<p>The indexing daemon gets started, then the window manager, for which the session
545
<p>By default the indexing daemon will monitor the state of the X11 session and exit
when the session ends, so it is not necessary to kill it explicitly. (The X11 server monitoring
can be disabled with option <tt class="LITERAL">-x</tt> to <tt
class="COMMAND">recollindex</tt>.)</p>
550
<p>Under KDE, you can place a small script to start <tt class="COMMAND">recollindex
551
-m</tt> under <tt class="FILENAME">$HOME/.kde/Autostart</tt>. This will be executed when
552
the session begins.</p>
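<p>Such a start script could be created as follows. This is a minimal sketch: the
function name and the script file name are arbitrary choices, not names defined by
<b class="APPLICATION">Recoll</b> or KDE.</p>

```shell
# Sketch only: write a tiny session start script for the indexing daemon.
# install_recoll_autostart and recollindex-monitor.sh are hypothetical names.
install_recoll_autostart() {
    # Target directory: first argument, else the KDE autostart directory.
    dir=${1:-$HOME/.kde/Autostart}
    mkdir -p "$dir" || return 1
    cat > "$dir/recollindex-monitor.sh" <<'EOF'
#!/bin/sh
# Start the Recoll real time indexing daemon for this session.
exec recollindex -m
EOF
    # The script must be executable to be run at session start.
    chmod +x "$dir/recollindex-monitor.sh"
}
```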
554
<p>There is a similar mechanism under Gnome (find the session control tool in the menus
555
and use the "Startup programs" tab).</p>
557
<p>By default, the indexing daemon will write its messages to a file inside the
558
configuration directory (this is controlled by the <tt
559
class="LITERAL">daemlogfilename</tt> and <tt class="LITERAL">daemloglevel</tt>
560
configuration parameters). You may want to change this. Also the log file will only be
561
truncated when the daemon starts. If the daemon runs permanently, the log file may grow
562
quite big, depending on the log level.</p>
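<p>To keep an eye on the log size, a check along the following lines could be run from
time to time. It is only a sketch: the function name is invented, and both the log file
path and the size limit are supplied by you, not read from the <b
class="APPLICATION">Recoll</b> configuration.</p>

```shell
# Sketch only: succeed when a log file is larger than a size limit in bytes.
# log_too_big is a hypothetical helper name.
log_too_big() {
    logfile=$1
    limit=${2:-10485760}            # default limit of 10 MiB, an arbitrary choice
    [ -f "$logfile" ] || return 1   # no such file: nothing to report
    # wc -c may pad its output with spaces; the arithmetic expansion normalizes it
    size=$(($(wc -c < "$logfile")))
    [ "$size" -gt "$limit" ]
}
```

<p>Since the log file is only truncated when the daemon starts, restarting the daemon is
the simplest remedy when the check succeeds.</p>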
564
<p>While it is convenient that data is indexed in real time, repeated indexing can
565
generate a significant load on the system when files such as email folders change. You
566
probably do not want to enable it if your system is short on resources. Periodic indexing
567
is adequate in most cases.</p>
571
<div class="CHAPTER">
573
<h1><a id="RCL.SEARCH" name="RCL.SEARCH"></a>Chapter 3. Searching</h1>
575
<p>The <tt class="COMMAND">recoll</tt> program provides the user interface for searching.
It is based on the <b class="APPLICATION">Qt</b> library.</p>
580
<h2 class="SECT1"><a id="RCL.SEARCH.SIMPLE" name="RCL.SEARCH.SIMPLE">3.1. Simple
583
<div class="PROCEDURE">
586
<p>Start the <tt class="COMMAND">recoll</tt> program.</p>
590
<p>Possibly choose a search mode: <span class="GUILABEL">Any term</span> or <span
591
class="GUILABEL">All terms</span> or <span class="GUILABEL">File name</span>.</p>
595
<p>Enter search term(s) in the text field at the top of the window.</p>
599
<p>Click the <span class="GUILABEL">Search</span> button or hit the <b
600
class="KEYCAP">Enter</b> key to start the search.</p>
605
<p>The initial default search mode is <span class="GUILABEL">All terms</span>. This will
look for documents containing all of the search terms (the ones with more terms will get
better scores). <span class="GUILABEL">Any term</span> will search for documents where at
least one of the terms appears. <span class="GUILABEL">File name</span> will specifically
look for file names.</p>
611
<p>The fourth entry (<span class="GUILABEL">Query Language</span>) is described in <a
612
href="#RCL.SEARCH.LANG">its own section</a>.</p>
614
<p>All search modes allow wildcards inside terms (<tt class="LITERAL">*</tt>, <tt
615
class="LITERAL">?</tt>, <tt class="LITERAL">[]</tt>). You may want to have a look at the
616
<a href="#RCL.SEARCH.WILDCARDS">section about wildcards</a> for more information about
619
<p>You can search for exact phrases (adjacent words in a given order) by enclosing the
620
input inside double quotes. Ex: <tt class="LITERAL">"virtual reality"</tt>.</p>
622
<p>Character case has no influence on search, except that you can disable stem expansion
for any term by capitalizing it. For example, a search for <tt class="LITERAL">floor</tt> will
also normally look for <tt class="LITERAL">flooring</tt>, <tt
class="LITERAL">floored</tt>, etc., but a search for <tt class="LITERAL">Floor</tt> will
only look for <tt class="LITERAL">floor</tt>, in any character case (stemming can also be
disabled globally in the preferences).</p>
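<p>The capitalization rule can be mimicked in a few lines of shell. This is purely an
illustration of the rule as stated above, with an invented function name. It says nothing
about how <b class="APPLICATION">Recoll</b> parses queries internally.</p>

```shell
# Illustration only: report whether a search term would be stem-expanded
# under the capitalization rule. stem_expansion_for is a hypothetical name.
stem_expansion_for() {
    case $1 in
        [[:upper:]]*) echo "no"  ;;   # capitalized term: this form only
        *)            echo "yes" ;;   # otherwise: expanded to related forms
    esac
}
```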
629
<p><b class="APPLICATION">Recoll</b> remembers the last few searches that you performed.
You can use the simple search text entry widget (a combobox) to recall them (click on the
drop-down button at the right of the text field). Please note, however, that only the search texts
are remembered, not the mode (all/any/file name).</p>
634
<p>Typing <b class="KEYCAP">Esc</b> <b class="KEYCAP">Space</b> while entering a word in
635
the simple search entry will open a window with possible completions for the word. The
636
completions are extracted from the database.</p>
638
<p>Double-clicking on a word in the result list or a preview window will insert it into
639
the simple search entry field.</p>
641
<p>Note that, apart from wildcard characters (single <tt class="LITERAL">?</tt>
642
characters are ok), you can cut and paste any text into an <span class="GUILABEL">All
643
terms</span> or <span class="GUILABEL">Any term</span> search field, punctuation,
644
newlines and all. <b class="APPLICATION">Recoll</b> will process it and produce a
645
meaningful search. This is what most differentiates this mode from the <span
646
class="GUILABEL">Query Language</span> mode, where you have to care about the syntax.</p>
648
<p>You can use the <span class="GUILABEL">Tools</span> / <span class="GUILABEL">Advanced
649
search</span> dialog for more complex searches.</p>
654
<h2 class="SECT1"><a id="RCL.SEARCH.RESLIST" name="RCL.SEARCH.RESLIST">3.2. The result
657
<p>After starting a search, a list of results will instantly be displayed in the main
660
<p>By default, the document list is presented in order of relevance (how well the system
661
estimates that the document matches the query). You can specify a different ordering by
662
using the <a href="#RCL.SEARCH.SORT"><span class="GUILABEL">Tools</span> / <span
663
class="GUILABEL">Sort parameters</span></a> dialog.</p>
665
<p>Clicking on the <tt class="LITERAL">Preview</tt> link for an entry will open an
666
internal preview window for the document. Further <tt class="LITERAL">Preview</tt> clicks
667
for the same search will open tabs in the existing preview window. You can use <b
668
class="KEYCAP">Shift</b>+Click to force the creation of another preview window, which may
669
be useful to view the documents side by side. (You can also browse successive results in
670
a single preview window by typing <b class="KEYCAP">Shift</b>+<b
671
class="KEYCAP">ArrowUp/Down</b> in the window).</p>
673
<p>Clicking the <tt class="LITERAL">Edit</tt> link will attempt to start an external
674
viewer. The viewers can be configured through the user preferences dialog, or by editing
675
the <tt class="FILENAME">mimeview</tt> configuration file.</p>
677
<p>The <tt class="LITERAL">Preview</tt> and <tt class="LITERAL">Edit</tt> links may
not be present for all entries, meaning that <b class="APPLICATION">Recoll</b> has no
configured way to preview a given file type (which was indexed by name only), or no
configured external viewer for the file type. This can sometimes be adjusted simply by
tweaking the <a href="#RCLINSTALL.CONFIG.MIMEMAP"><tt class="FILENAME">mimemap</tt></a>
and <a href="#RCLINSTALL.CONFIG.MIMEVIEW"><tt class="FILENAME">mimeview</tt></a>
configuration files (the latter can be modified with the user preferences dialog).</p>
685
<p>You can click on the <tt class="LITERAL">Query details</tt> link at the top of the
686
results page to see the query actually performed, after stem expansion and other
689
<p>Double-clicking on any word inside the result list or a preview window will insert it
690
into the simple search text.</p>
692
<p>The result list is divided into pages (the size of which you can change in the
693
preferences). Use the arrow buttons in the toolbar or the links at the bottom of the page
694
to browse the results.</p>
698
<h3 class="SECT2"><a id="RCL.SEARCH.RESULTLIST.MENU"
699
name="RCL.SEARCH.RESULTLIST.MENU">3.2.1. The result list right-click menu</a></h3>
701
<p>Apart from the preview and edit links, you can display a pop-up menu by right-clicking
702
over a paragraph in the result list. This menu has the following entries:</p>
706
<p><span class="GUILABEL">Preview</span></p>
710
<p><span class="GUILABEL">Edit</span></p>
714
<p><span class="GUILABEL">Copy File Name</span></p>
718
<p><span class="GUILABEL">Copy Url</span></p>
722
<p><span class="GUILABEL">Find similar</span></p>
730
<p><span class="GUILABEL">Parent document</span></p>
734
<p>The <span class="GUILABEL">Preview</span> and <span class="GUILABEL">Edit</span>
735
entries do the same thing as the corresponding links.</p>
737
<p>The <span class="GUILABEL">Copy File Name</span> and <span class="GUILABEL">Copy
738
Url</span> copy the relevant data to the clipboard, for later pasting.</p>
740
<p>The <span class="GUILABEL">Find similar</span> entry will select a number of relevant
terms from the current document and enter them into the simple search field. You can then
742
start a simple search, with a good chance of finding documents related to the current
745
<p>The <span class="GUILABEL">Parent document</span> entry will appear for documents
746
which are not actually files but are part of, or attached to, a higher level document.
747
This entry is mainly useful for email attachments and permits viewing the message to
748
which the document is attached. Note that the entry will also appear for an email which
749
is part of an mbox folder file, but that you can't actually view the folder (there
750
will be an error dialog if you try). <b class="APPLICATION">Recoll</b> is unfortunately
751
not yet smart enough to disable the entry in this case.</p>
757
<h2 class="SECT1"><a id="RCL.SEARCH.PREVIEW" name="RCL.SEARCH.PREVIEW">3.3. The preview
760
<p>The preview window opens when you first click a <tt class="LITERAL">Preview</tt> link
761
inside the result list.</p>
763
<p>Subsequent preview requests for a given search open new tabs in the existing window,
unless you hold the <b class="KEYCAP">Shift</b> key while clicking, which opens a
new window for side-by-side viewing.</p>
767
<p>Starting another search and requesting a preview will create a new preview window. The
768
old one stays open until you close it.</p>
770
<p>You can close a preview tab by typing <b class="KEYCAP">^W</b> (<b
771
class="KEYCAP">Ctrl</b> + <b class="KEYCAP">W</b>) in the window. Closing the last tab
772
for a window will also close the window.</p>
774
<p>Of course, you can also close a preview window by using the window manager button at
the top of the frame.</p>
777
<p>You can display the next or previous document from the result list inside a preview
tab by typing <b class="KEYCAP">Shift</b>+<b class="KEYCAP">Down</b> or <b
class="KEYCAP">Shift</b>+<b class="KEYCAP">Up</b> (<b class="KEYCAP">Down</b> and <b
class="KEYCAP">Up</b> are the arrow keys).</p>
782
<p>The preview tabs have an internal incremental search function. You initiate the search
783
either by typing a <b class="KEYCAP">/</b> (slash) inside the text area or by clicking
784
into the <span class="GUILABEL">Search for:</span> text field and entering the search
785
string. You can then use the <span class="GUILABEL">Next</span> and <span
786
class="GUILABEL">Previous</span> buttons to find the next/previous occurrence. You can
787
also type <b class="KEYCAP">F3</b> inside the text area to get to the next
790
<p>If you have a search string entered and you use <b class="KEYCAP">Shift</b>+<b
class="KEYCAP">Up</b> or <b class="KEYCAP">Shift</b>+<b class="KEYCAP">Down</b> to browse
the results, the search is run again for each successive document. If the string is found,
the cursor will be positioned at the first occurrence of the search string.</p>
797
<h2 class="SECT1"><a id="RCL.SEARCH.LANG" name="RCL.SEARCH.LANG">3.4. The query
800
<p>The query language processor is activated on the simple search entry when the search
801
mode selector is set to <span class="GUILABEL">Query Language</span>.</p>
803
<p>Here follows a sample request that we are going to explain:</p>
805
<pre class="PROGRAMLISTING">
806
mime:message/rfc822 author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
810
<p>This would search for all email messages with <tt class="REPLACEABLE"><i>John
811
Doe</i></tt> appearing as a phrase in the <tt class="LITERAL">From:</tt> header, and
812
containing either <tt class="REPLACEABLE"><i>beatles</i></tt> or <tt
813
class="REPLACEABLE"><i>lennon</i></tt> and either <tt
814
class="REPLACEABLE"><i>live</i></tt> or <tt class="REPLACEABLE"><i>unplugged</i></tt> but
815
not <tt class="REPLACEABLE"><i>potatoes</i></tt>.</p>
817
<p>The first element, <tt class="LITERAL">mime:message/rfc822</tt>, is a special switch
that restricts the results to email messages. There can be several such switches,
which then form a list of allowed types.</p>
821
<p>The second element, <tt class="LITERAL">author:"john doe"</tt>, is a phrase search
limited to a specific field. Phrase searches are specified as usual by enclosing the
words in double quotes. The field specification appears before the colon. <b
class="APPLICATION">Recoll</b> currently manages the following fields:</p>
828
<p><tt class="LITERAL">title</tt>, <tt class="LITERAL">subject</tt> or <tt
829
class="LITERAL">caption</tt> are synonyms which specify data to be searched for in the
830
document title or subject.</p>
834
<p><tt class="LITERAL">author</tt> or <tt class="LITERAL">from</tt> for searching the
document originators.</p>
839
<p><tt class="LITERAL">keyword</tt> for searching the keywords specified in the document (few
documents actually have any).</p>
844
<p>The query language is currently the only way to use the <b
845
class="APPLICATION">Recoll</b> field search capability.</p>
847
<p>All elements in the search entry are normally combined with an implicit AND. It is
848
possible to specify that elements be OR'ed instead, as in <tt
849
class="REPLACEABLE"><i>Beatles</i></tt> <tt class="LITERAL">OR</tt> <tt
850
class="REPLACEABLE"><i>Lennon</i></tt>. The <tt class="LITERAL">OR</tt> must be entered
851
literally (capitals), and it has priority over the AND associations: <tt
852
class="REPLACEABLE"><i>word1</i></tt> <tt class="REPLACEABLE"><i>word2</i></tt> <tt
853
class="LITERAL">OR</tt> <tt class="REPLACEABLE"><i>word3</i></tt> means <tt
854
class="REPLACEABLE"><i>word1</i></tt> AND (<tt class="REPLACEABLE"><i>word2</i></tt> <tt
855
class="LITERAL">OR</tt> <tt class="REPLACEABLE"><i>word3</i></tt>), not (<tt
class="REPLACEABLE"><i>word1</i></tt> AND <tt class="REPLACEABLE"><i>word2</i></tt>) <tt
class="LITERAL">OR</tt> <tt class="REPLACEABLE"><i>word3</i></tt>. Do not enter explicit
parentheses; they are not supported for now.</p>
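<p>The grouping rule described above can be illustrated with a short Python sketch. This
is not Recoll's parser: the function names <tt class="LITERAL">group_query</tt> and <tt
class="LITERAL">render</tt> are made up for this example, and the sketch only handles
plain words and the literal <tt class="LITERAL">OR</tt> keyword (no fields, phrases or
exclusions).</p>

```python
def group_query(words):
    """Group a flat word list into clauses, binding OR more tightly than AND.

    Returns a list of clauses. The words inside one clause are OR-ed,
    and the clauses themselves are AND-ed together.
    """
    clauses = []
    join_next = False  # True right after an OR keyword was seen
    for word in words:
        if word == "OR":  # only the capitalized form counts as the operator
            join_next = True
            continue
        if join_next and clauses:
            clauses[-1].append(word)   # attach to the previous clause
        else:
            clauses.append([word])     # start a new AND-ed clause
        join_next = False
    return clauses


def render(clauses):
    """Show the grouping with explicit parentheses, for display only."""
    parts = []
    for alternatives in clauses:
        if len(alternatives) == 1:
            parts.append(alternatives[0])
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(parts)


print(render(group_query("word1 word2 OR word3".split())))
# word1 AND (word2 OR word3)
```

<p>In this sketch a lower-case <tt class="LITERAL">or</tt> is treated as an ordinary
search word, which mirrors the requirement that the operator be entered in capitals.</p>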
860
<p>An entry preceded by a <tt class="LITERAL">-</tt> specifies a term that should <span
861
class="emphasis"><i class="EMPHASIS">not</i></span> appear.</p>
863
<p>Words inside phrases and capitalized words are not stem-expanded. Wildcards may be
866
<p>You can use the <tt class="LITERAL">show query</tt> link at the top of the result list
867
to check the exact query which was finally executed by Xapian.</p>
872
<h2 class="SECT1"><a id="RCL.SEARCH.COMPLEX" name="RCL.SEARCH.COMPLEX">3.5.
873
Complex/advanced search</a></h2>
875
<p>The advanced search dialog has a number of fields that will allow a more refined
876
search. Each entry field is configurable for the following modes:</p>
888
<p>None of the terms.</p>
892
<p>Phrase (exact terms in order within an adjustable window).</p>
896
<p>Proximity (terms in any order within an adjustable window).</p>
900
<p>Filename search with wildcards.</p>
904
<p>Additional entry fields can be created by clicking the <span class="GUILABEL">Add
905
clause</span> button.</p>
907
<p>You can choose whether all relevant fields are combined by an AND or an OR
conjunction. All types of clauses except "phrase" and "near" can accept a mix of single
words and phrases enclosed in double quotes. Stemming expansion will be performed for all
terms not beginning with a capital letter, except for terms inside "phrase" clauses.
Wildcards will be processed everywhere.</p>
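<p>The rules that decide whether a term is stem-expanded can be summarized in a few lines
of Python. This is only a sketch of the behaviour described in this manual (capitalized
terms, terms inside phrases and terms containing wildcard characters are not expanded);
the function name <tt class="LITERAL">wants_stem_expansion</tt> is hypothetical and the
code is not taken from <b class="APPLICATION">Recoll</b>.</p>

```python
WILDCARD_CHARS = set("*?[")  # characters that start a wildcard construct


def wants_stem_expansion(term, in_phrase=False):
    """Return True if the term would be stem-expanded under the documented rules."""
    if in_phrase:
        return False                      # words inside phrases are taken as-is
    if term[:1].isupper():
        return False                      # a leading capital disables stemming
    if any(ch in WILDCARD_CHARS for ch in term):
        return False                      # wildcards turn stem expansion off
    return True


for term in ("garden", "Garden", "gard*"):
    print(term, wants_stem_expansion(term))
```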
913
<p>Advanced search will also let you search for documents of specific mime types (for example,
only <tt class="LITERAL">text/plain</tt>, <tt class="LITERAL">text/html</tt> or <tt
class="LITERAL">application/pdf</tt>). The state of the file type selection can be
saved as the default (the file type filter will not be activated at program start-up, but
the lists will be in the restored state).</p>
919
<p>You can also restrict the search results to a sub-tree of the indexed area. If you
need to do this often, consider setting up multiple indexes instead, as the
performance will be much better.</p>
923
<p>Click on the <span class="GUILABEL">Start Search</span> button in the advanced search
924
dialog, or type <b class="KEYCAP">Enter</b> in any text field to start the search. The
925
button in the main window always performs a simple search.</p>
927
<p>Click on the <tt class="LITERAL">Show query details</tt> link at the top of the result
928
page to see the query expansion.</p>
933
<h2 class="SECT1"><a id="RCL.SEARCH.TERMEXPLORER" name="RCL.SEARCH.TERMEXPLORER">3.6. The
934
term explorer tool</a></h2>
936
<p><b class="APPLICATION">Recoll</b> automatically manages the expansion of search terms
937
to their derivatives (ie: plural/singular, verb inflections). But there are other cases
938
where the exact search term is not known. For example, you may not remember the exact
939
spelling, or only know the beginning of the name.</p>
941
<p>The term explorer tool (started from the toolbar icon or from the <span
942
class="GUILABEL">Term explorer</span> entry of the <span class="GUILABEL">Tools</span>
943
menu) can be used to search the full index terms list. It has four modes of
946
<div class="VARIABLELIST">
951
<p>In this mode of operation, you can enter a search string with shell-like wildcards (*,
?, []). For example, <tt class="REPLACEABLE"><i>xapi*</i></tt> would display all index terms
beginning with <tt class="REPLACEABLE"><i>xapi</i></tt>. (More about wildcards <a
href="#RCL.SEARCH.WILDCARDS">here</a>).</p>
957
<dt>Regular expression</dt>
960
<p>This mode will accept a regular expression as input. Example: <tt
class="REPLACEABLE"><i>word[0-9]+</i></tt>. The expression is implicitly anchored at the
beginning. For example, <tt class="REPLACEABLE"><i>press</i></tt> will match <tt
class="REPLACEABLE"><i>pression</i></tt> but not <tt
class="REPLACEABLE"><i>expression</i></tt>. You can use <tt
class="REPLACEABLE"><i>.*press</i></tt> to match the latter, but be aware that this will
cause a full index term list scan, which can take quite a long time.</p>
969
<dt>Stem expansion</dt>
972
<p>This mode will perform the usual stem expansion normally done as part of user input
processing. As such it is probably mostly useful to demonstrate the process.</p>
976
<dt>Spelling/Phonetic</dt>
979
<p>In this mode, you enter the term as you think it is spelled, and <b
980
class="APPLICATION">Recoll</b> will do its best to find index terms that sound like your
981
entry. This mode uses the <b class="APPLICATION">Aspell</b> spelling application, which
982
must be installed on your system for things to work. The language which is used to build
983
the dictionary out of the index terms (which is done at the end of an indexing pass) is
984
the one defined by your NLS environment. Weird things will probably happen if languages
990
<p>Note that in cases where <b class="APPLICATION">Recoll</b> does not know the beginning
of the string to search for (for example, a wildcard expression like <tt
class="REPLACEABLE"><i>*coll</i></tt>), the expansion can take quite a long time because
the full index term list will have to be processed. The expansion is currently limited to
200 results for wildcards and regular expressions.</p>
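<p>The anchoring behaviour of the regular expression mode can be tried out with the Python
sketch below. It uses Python's own <tt class="LITERAL">re</tt> module, whose syntax may
differ in details from the regular expression engine used by <b
class="APPLICATION">Recoll</b>, and the term list and the function name <tt
class="LITERAL">expand_regex</tt> are invented for the example. Only the anchoring at the
beginning and the limit of 200 results come from the text above.</p>

```python
import re

MAX_RESULTS = 200  # limit quoted in the text for wildcard and regex expansion


def expand_regex(pattern, index_terms, limit=MAX_RESULTS):
    """Return the index terms matched by pattern, anchored at the start of the term."""
    compiled = re.compile(pattern)
    matches = []
    for term in index_terms:
        if compiled.match(term):      # match() only anchors at the beginning
            matches.append(term)
            if len(matches) >= limit:
                break                 # stop once the result limit is reached
    return matches


# Hypothetical index term list
terms = ["expression", "press", "pression", "word12", "words"]

print(expand_regex("press", terms))       # ['press', 'pression']
print(expand_regex(".*press", terms))     # ['expression', 'press', 'pression']
print(expand_regex("word[0-9]+", terms))  # ['word12']
```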
996
<p>Double-clicking on a term in the result list will insert it into the simple search
997
entry field. You can also cut/paste between the result list and any entry field (the end
998
of lines will be taken care of).</p>
1003
<h2 class="SECT1"><a id="RCL.SEARCH.WILDCARDS" name="RCL.SEARCH.WILDCARDS">3.7. More
1004
about wildcards</a></h2>
1006
<p>All words entered in <b class="APPLICATION">Recoll</b> search fields will be processed
1007
for wildcard expansion before the request is finally executed.</p>
1009
<p>The wildcard characters are:</p>
1013
<p><tt class="LITERAL">*</tt> which matches 0 or more characters.</p>
1017
<p><tt class="LITERAL">?</tt> which matches a single character.</p>
1021
<p><tt class="LITERAL">[]</tt> which allows defining sets of characters to be matched (for example,
<tt class="LITERAL">[</tt><kbd class="USERINPUT">abc</kbd><tt class="LITERAL">]</tt>
matches a single character which may be 'a' or 'b' or 'c', and <tt class="LITERAL">[</tt><kbd
class="USERINPUT">0-9</kbd><tt class="LITERAL">]</tt> matches any single digit).</p>
1028
<p>You should be aware of a few things before using wildcards.</p>
1032
<p>Using a wildcard character at the beginning of a word can make for a slow search
1033
because <b class="APPLICATION">Recoll</b> will have to scan the whole index term list to
1034
find the matches.</p>
1038
<p>Using a <tt class="LITERAL">*</tt> at the end of a word can produce more matches than
1039
you would think, and strange search results. You can use the <a
1040
href="#RCL.SEARCH.TERMEXPLORER">term explorer</a> tool to check what completions exist
1041
for a given term. You can also see exactly what search was performed by clicking on the
1042
link at the top of the result list. In general, for natural language terms, stem
1043
expansion will produce better results than an ending <tt class="LITERAL">*</tt> (stem
1044
expansion is turned off when any wildcard character appears in the term).</p>
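<p>The cost difference between a leading and a trailing wildcard can be made concrete with
a small Python sketch. It assumes a sorted in-memory term list and uses Python's <tt
class="LITERAL">fnmatch</tt> module for the shell-like matching; the function <tt
class="LITERAL">expand_wildcard</tt> and the sample terms are hypothetical and do not
reflect how <b class="APPLICATION">Recoll</b> or Xapian store terms internally.</p>

```python
from bisect import bisect_left
from fnmatch import fnmatchcase


def expand_wildcard(pattern, sorted_terms):
    """Return (matching terms, number of terms examined) for a shell-like pattern.

    The literal prefix before the first wildcard character is used to jump
    into the sorted list; with no prefix, the whole list has to be scanned.
    """
    prefix_len = next((i for i, ch in enumerate(pattern) if ch in "*?["), len(pattern))
    prefix = pattern[:prefix_len]
    start = bisect_left(sorted_terms, prefix) if prefix else 0
    matches = []
    examined = 0
    for term in sorted_terms[start:]:
        if prefix and not term.startswith(prefix):
            break                     # past the prefix range, nothing more can match
        examined += 1
        if fnmatchcase(term, pattern):
            matches.append(term)
    return matches, examined


# Hypothetical sorted index term list
terms = ["recoil", "recoll", "recollect", "xapian", "xapians", "xerox"]

print(expand_wildcard("xapi*", terms))   # (['xapian', 'xapians'], 2)
print(expand_wildcard("*coll", terms))   # (['recoll'], 6)
```

<p>With the trailing <tt class="LITERAL">*</tt> only the terms sharing the prefix are
examined, while the leading <tt class="LITERAL">*</tt> forces every term to be tested.</p>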
1051
<h2 class="SECT1"><a id="RCL.SEARCH.MULTIDB" name="RCL.SEARCH.MULTIDB">3.8. Multiple
1054
<p>Multiple <b class="APPLICATION">Recoll</b> databases or indexes can be created by
1055
using several configuration directories which are usually set to index different areas of
1056
the file system. A specific index can be selected for updating or searching, using the
1057
<tt class="LITERAL">RECOLL_CONFDIR</tt> environment variable or the <tt
1058
class="LITERAL">-c</tt> option to <tt class="COMMAND">recoll</tt> and <tt
1059
class="COMMAND">recollindex</tt>.</p>
1061
<p>A <tt class="COMMAND">recollindex</tt> program instance can only update one specific
1064
<p>A <tt class="COMMAND">recoll</tt> program instance is also associated with a specific
1065
index, which is the one to be updated by its indexing thread, but it can use any number
1066
of <b class="APPLICATION">Recoll</b> indexes for searching. The external indexes can be
1067
selected through the <span class="GUILABEL">external indexes</span> tab in the
1068
preferences dialog.</p>
1070
<p>Index selection is performed in two phases. A set of all usable indexes must first be
defined, and then the subset of indexes to be used for searching. These
parameters are retained across program executions (they are kept separately for each <b
class="APPLICATION">Recoll</b> configuration). The set of all indexes is usually quite
stable, while the active ones might typically be adjusted quite frequently.</p>
1076
<p>The main index (defined by <tt class="LITERAL">RECOLL_CONFDIR</tt>) is always active.
1077
If this is undesirable, you can set up your base configuration to index an empty
1080
<p>As building the set of all indexes can be a little tedious when done through the user
interface, you can use the <tt class="LITERAL">RECOLL_EXTRA_DBS</tt> environment variable
to provide an initial set. This might typically be set up by a system administrator so
that users do not each have to do it. The variable should define a colon-separated list
of index directories, for example:</p>
1086
<pre class="SCREEN">
1087
export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
1090
<p>A typical usage scenario for the multiple index feature would be for a system
1091
administrator to set up a central index for shared data, that you choose to search or not
1092
in addition to your personal data. Of course, there are other possibilities. There are
1093
many cases where you know the subset of files that should be searched, and where
1094
narrowing the search can improve the results. You can achieve approximately the same
1095
effect with the directory filter in advanced search, but multiple indexes will have much
1096
better performance and may be worth the trouble.</p>
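<p>If the list of shared indexes is maintained by a script, the value of <tt
class="LITERAL">RECOLL_EXTRA_DBS</tt> can be assembled programmatically. The Python sketch
below only builds the colon-separated string and an environment mapping; the helper names
<tt class="LITERAL">extra_dbs_value</tt> and <tt class="LITERAL">recoll_environment</tt>
are assumptions made for this example, and the directories are the placeholder paths used
above.</p>

```python
import os


def extra_dbs_value(index_dirs):
    """Join index directories into the colon-separated form described above."""
    for directory in index_dirs:
        if ":" in directory:
            # A colon inside a path could not be told apart from the separator.
            raise ValueError("index directory contains a colon: " + directory)
    return ":".join(index_dirs)


def recoll_environment(index_dirs, base_env=None):
    """Return a copy of the environment with RECOLL_EXTRA_DBS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RECOLL_EXTRA_DBS"] = extra_dbs_value(index_dirs)
    return env


dirs = ["/some/place/xapiandb", "/some/other/db"]
print(extra_dbs_value(dirs))  # /some/place/xapiandb:/some/other/db
```

<p>The mapping returned by <tt class="LITERAL">recoll_environment</tt> could then be
passed as the <tt class="LITERAL">env</tt> argument of Python's <tt
class="LITERAL">subprocess.run</tt> when starting <tt class="COMMAND">recoll</tt>.</p>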
1101
<h2 class="SECT1"><a id="RCL.SEARCH.HISTORY" name="RCL.SEARCH.HISTORY">3.9. Document
1104
<p>Documents that you actually view (with the internal preview or an external tool) are
1105
entered into the document history, which is remembered. You can display the history list
1106
by using the <span class="GUILABEL">Tools/</span><span class="GUILABEL">Doc
1107
History</span> menu entry.</p>
1112
<h2 class="SECT1"><a id="RCL.SEARCH.SORT" name="RCL.SEARCH.SORT">3.10. Sorting search
1115
<p>The documents in a result list are normally sorted in order of relevance. It is
1116
possible to specify different sort parameters by using the <span class="GUIMENU">Sort
1117
parameters</span> dialog (located in the <span class="GUIMENU">Tools</span> menu).</p>
1119
<p>The tool sorts a specified number of the most relevant documents in the result list,
1120
according to specified criteria. The currently available criteria are <span
1121
class="emphasis"><i class="EMPHASIS">date</i></span> and <span class="emphasis"><i
1122
class="EMPHASIS">mime type</i></span>.</p>
1124
<p>The sort parameters stay in effect until they are explicitly reset, or the program
1125
exits. An activated sort is indicated in the result list header.</p>
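<p>The sorting step can be pictured as follows. This Python sketch assumes that each
result is a simple dictionary and that only the selected number of most relevant documents
is kept and reordered; the field names, the sample data and the function <tt
class="LITERAL">sort_top_results</tt> are hypothetical, and the sort direction actually
used by <b class="APPLICATION">Recoll</b> is not specified here.</p>

```python
def sort_top_results(results, count, criterion, descending=False):
    """Keep the `count` most relevant results, then order them by `criterion`."""
    by_relevance = sorted(results, key=lambda doc: doc["relevance"], reverse=True)
    top = by_relevance[:count]
    return sorted(top, key=lambda doc: doc[criterion], reverse=descending)


# Hypothetical result list
results = [
    {"url": "a", "relevance": 90, "date": "2006-01-10", "mimetype": "text/plain"},
    {"url": "b", "relevance": 80, "date": "2005-03-02", "mimetype": "application/pdf"},
    {"url": "c", "relevance": 70, "date": "2006-05-20", "mimetype": "text/html"},
    {"url": "d", "relevance": 10, "date": "2004-01-01", "mimetype": "text/plain"},
]

print([doc["url"] for doc in sort_top_results(results, 3, "date")])      # ['b', 'a', 'c']
print([doc["url"] for doc in sort_top_results(results, 3, "mimetype")])  # ['b', 'c', 'a']
```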
1130
<h2 class="SECT1"><a id="RCL.SEARCH.TIPS" name="RCL.SEARCH.TIPS">3.11. Search tips,
1133
<div class="FORMALPARA">
1134
<p><b>Term completion.</b> Typing <b class="KEYCAP">Esc</b> <b class="KEYCAP">Space</b>
1135
in the simple search entry field while entering a word will either complete the current
1136
word if its beginning matches a unique term in the index, or open a window to propose a
1137
list of completions.</p>
1140
<div class="FORMALPARA">
1141
<p><b>Picking up new terms from result or preview text.</b> Double-clicking on a word in
1142
the result list or in a preview window will copy it to the simple search entry field.</p>
1145
<div class="FORMALPARA">
1146
<p><b>Disabling stem expansion.</b> Entering a capitalized word in any search field will
1147
prevent stem expansion (no search for <tt class="LITERAL">gardening</tt> if you enter <tt
1148
class="LITERAL">Garden</tt> instead of <tt class="LITERAL">garden</tt>). This is the only
1149
case where character case should make a difference for a <b
1150
class="APPLICATION">Recoll</b> search. You can also disable stem expansion or change the
1151
stemming language in the preferences.</p>
1154
<div class="FORMALPARA">
1155
<p><b>Phrases.</b> A phrase can be looked for by enclosing it in double quotes. Example:
1156
<tt class="LITERAL">"user manual"</tt> will look only for occurrences of <tt
1157
class="LITERAL">user</tt> immediately followed by <tt class="LITERAL">manual</tt>. You
1158
can use the <span class="GUILABEL">This exact phrase</span> field of the advanced search
1159
dialog to the same effect. Phrases can be entered alongside simple terms in all simple or
1160
advanced search entry fields (except <span class="GUILABEL">This exact
1164
<div class="FORMALPARA">
1165
<p><b>Browsing the result list inside a preview window (1.5).</b> Entering <b
1166
class="KEYCAP">Shift-Down</b> or <b class="KEYCAP">Shift-Up</b> (<b
1167
class="KEYCAP">Shift</b> + an arrow key) in a preview window will display the next or the
1168
previous document from the result list. Any secondary search currently active will be
1169
executed on the new document.</p>
1172
<div class="FORMALPARA">
1173
<p><b>Forced opening of a preview window (1.6).</b> You can use <b
1174
class="KEYCAP">Shift</b>+Click on a result list <tt class="LITERAL">Preview</tt> link to
1175
force the creation of a preview window instead of a new tab in the existing one.</p>
1178
<div class="FORMALPARA">
1179
<p><b>AutoPhrases (1.5).</b> This option can be set in the preferences dialog. If it is
set, a phrase will be automatically built and added to simple searches when looking for
<tt class="LITERAL">Any terms</tt>. This will not radically change the results, but will
give a relevance boost to the results where the search terms appear as a phrase. For example,
searching for <tt class="LITERAL">virtual reality</tt> will still find all documents
where either <tt class="LITERAL">virtual</tt> or <tt class="LITERAL">reality</tt> or both
appear, but those which contain <tt class="LITERAL">virtual reality</tt> should appear
sooner in the list.</p>
1189
<div class="FORMALPARA">
1190
<p><b>Finding related documents.</b> Selecting the <span class="GUILABEL">Find similar
documents</span> entry in the result list paragraph right-click menu will select a set of
"interesting" terms from the current result, and insert them into the simple search entry
field. You can then edit the list if needed and start a search to find documents which may
be related to the current result.</p>
1197
<div class="FORMALPARA">
1198
<p><b>File names.</b> File names are added as terms during indexing, and you can specify
1199
them as ordinary terms in normal search fields (<b class="APPLICATION">Recoll</b> used to
1200
index all directories in the file path as terms. This has been abandoned as it did not
1201
seem really useful). Alternatively, you can use the specific file name search which will
1202
<span class="emphasis"><i class="EMPHASIS">only</i></span> look for file names and can
1203
use wildcard expansion.</p>
1206
<div class="FORMALPARA">
1207
<p><b>Query explanation.</b> You can get an exact description of what the query looked
1208
for, including stem expansion, and Boolean operators used, by clicking on the result list
1212
<div class="FORMALPARA">
1213
<p><b>Closing previews.</b> Entering <b class="KEYCAP">^W</b> in a tab will close it
1214
(and, for the last tab, close the preview window). Entering <b class="KEYCAP">Esc</b>
1215
will close the preview window and all its tabs.</p>
1218
<div class="FORMALPARA">
1219
<p><b>Quitting.</b> Entering <b class="KEYCAP">^Q</b> almost anywhere will close the
1226
<h2 class="SECT1"><a id="RCL.SEARCH.CUSTOM" name="RCL.SEARCH.CUSTOM">3.12. Customizing
1227
the search interface</a></h2>
1229
<p>It is possible to customize some aspects of the search interface by using the <span
class="GUIMENU">Query configuration</span> entry in the <span
class="GUIMENU">Preferences</span> menu.</p>
1233
<p>There are two tabs in the dialog, dealing with the interface itself, and with the
1234
parameters used for searching and returning results.</p>
1236
<div class="FORMALPARA">
1237
<p><b>User interface parameters:</b></p>
1241
<p><span class="GUILABEL">Number of results in a result page</span></p>
1245
<p><span class="GUILABEL">Result list font</span>: There is quite a lot of information
shown in the result list, and you may want to customize the font and/or font size. The
rest of the fonts used by <b class="APPLICATION">Recoll</b> are determined by your
generic Qt configuration (try the <tt class="COMMAND">qtconfig</tt> command).</p>
1252
<p><span class="GUILABEL">Result paragraph format string</span>: allows you to change the
presentation of each result list entry. This is a Qt HTML string in which the following
printf-like <tt class="LITERAL">%</tt> substitutions will be performed:</p>
1258
<div class="FORMALPARA">
1259
<p><b>%A.</b> Abstract</p>
1264
<div class="FORMALPARA">
1265
<p><b>%D.</b> Date</p>
1270
<div class="FORMALPARA">
1271
<p><b>%K.</b> Keywords (if any)</p>
1276
<div class="FORMALPARA">
1277
<p><b>%L.</b> Preview and Edit links</p>
1282
<div class="FORMALPARA">
1283
<p><b>%M.</b> Mime type</p>
1288
<div class="FORMALPARA">
1289
<p><b>%N.</b> result Number</p>
1294
<div class="FORMALPARA">
1295
<p><b>%R.</b> Relevance percentage</p>
1300
<div class="FORMALPARA">
1301
<p><b>%S.</b> Size information</p>
1306
<div class="FORMALPARA">
1307
<p><b>%T.</b> Title</p>
1312
<div class="FORMALPARA">
1313
<p><b>%U.</b> Url</p>
1318
The default value for the string is:
1320
<pre class="PROGRAMLISTING">
1321
%R %S %L &amp;nbsp;&amp;nbsp;&lt;b&gt;%T&lt;/b&gt;&lt;br&gt;
%M&amp;nbsp;%D&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;i&gt;%U&lt;/i&gt;&lt;br&gt;
1327
You may, for example, try the following for a more web-like experience:
1329
<pre class="PROGRAMLISTING">
1330
&lt;u&gt;&lt;b&gt;&lt;a href="P%N"&gt;%T&lt;/a&gt;&lt;/b&gt;&lt;/u&gt;&lt;br&gt;
%A&lt;font color=#008000&gt;%U - %S&lt;/font&gt; - %L
1335
The format of the Preview and Edit links is <tt class="LITERAL">&lt;a href="P<tt
class="REPLACEABLE"><i>docnum</i></tt>"&gt;</tt> and <tt class="LITERAL">&lt;a href="E<tt
class="REPLACEABLE"><i>docnum</i></tt>"&gt;</tt> where <tt
class="REPLACEABLE"><i>docnum</i></tt> is what %N would print. This makes the title a
preview link in the above format. <br />
1344
<p><span class="GUILABEL">HTML help browser</span>: this will let you choose your
preferred browser, which will be started from the <span class="GUIMENU">Help</span> menu
to read the user manual. You can enter a simple name if the command is in your PATH, or
browse for a full pathname.</p>
1351
<p><span class="GUILABEL">Show document type icons in result list</span>: icons in the
1352
result list can be turned off. They take quite a lot of space and convey relatively
1353
little useful information.</p>
1357
<p><span class="GUILABEL">Auto-start simple search on white space entry</span>: if this
is checked, a search will be executed each time you enter a space in the simple search
input field. This lets you look at the result list as you enter new terms. This is off by
default; you may or may not like it.</p>
1364
<p><span class="GUILABEL">Start with advanced search dialog open</span> and <span
1365
class="GUILABEL">Start with sort dialog open</span>: If you use these dialogs all the
1366
time, checking these entries will get them to open when recoll starts.</p>
1370
<p><span class="GUILABEL">Use desktop preferences to choose document editor</span>: if
this is checked, the <tt class="COMMAND">xdg-open</tt> utility will be used to open files
when you click the <span class="GUILABEL">Edit</span> link in the result list, instead of
the application defined in <tt class="FILENAME">mimeview</tt>. <tt
class="COMMAND">xdg-open</tt> will in turn use your desktop preferences to choose an
appropriate application.</p>
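<p>To get a feeling for how the result paragraph format string described above is
expanded, the substitution can be imitated in Python. This is a minimal sketch, not the
code used by <b class="APPLICATION">Recoll</b>: the function <tt
class="LITERAL">format_result</tt> and the sample values are invented, and leaving
unknown <tt class="LITERAL">%</tt> markers untouched is an assumption.</p>

```python
import re


def format_result(fmt, values):
    """Replace each %X marker in fmt by values["X"]; unknown markers are kept as-is."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    # Single pass, so text inserted for one marker is never expanded again.
    return re.sub(r"%([A-Z])", substitute, fmt)


# Hypothetical data for one result list entry
doc = {
    "N": "3",
    "T": "Recoll user manual",
    "U": "file:///home/me/doc/usermanual.html",
    "S": "120 KB",
    "A": "This document introduces full text search notions. ",
    "L": "Preview Edit",
}

web_like = ('<u><b><a href="P%N">%T</a></b></u><br>\n'
            '%A<font color=#008000>%U - %S</font> - %L')
print(format_result(web_like, doc))
```

<p>The first printed line shows the title wrapped in a link whose target is <tt
class="LITERAL">P3</tt>, that is, a preview link for result number 3.</p>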
1383
<div class="FORMALPARA">
1384
<p><b>Search parameters:</b></p>
1388
<p><span class="GUILABEL">Stemming language</span>: stemming obviously depends on the
document's language. This listbox will let you choose among the stemming databases which
were built during indexing (this is set in the <a
href="#RCL.INSTALL.CONFIG.RECOLLCONF">main configuration file</a>), or later added with
<tt class="COMMAND">recollindex -s</tt> (see the recollindex manual). Stemming languages
which are dynamically added will be deleted at the next indexing pass unless they are
also added in the configuration file.</p>
1398
<p><span class="GUILABEL">Dynamically build abstracts</span>: this decides if <b
1399
class="APPLICATION">Recoll</b> tries to build document abstracts when displaying the
1400
result list. Abstracts are constructed by taking context from the document information,
1401
around the search terms. This can slow down result list display significantly for big
1402
documents, and you may want to turn it off.</p>
1406
<p><span class="GUILABEL">Replace abstracts from documents</span>: this decides if we
1407
should synthesize and display an abstract in place of an explicit abstract found within
1408
the document itself.</p>
1412
<p><span class="GUILABEL">Synthetic abstract size</span>: adjust to taste...</p>
1416
<p><span class="GUILABEL">Synthetic abstract context words</span>: how many words should
1417
be displayed around each term occurrence.</p>
1425
<div class="FORMALPARA">
1426
<p><a id="RCL.SEARCH.CUSTOM.EXTRADB" name="RCL.SEARCH.CUSTOM.EXTRADB"></a><b>External
indexes:</b> This panel will let you browse for additional indexes that you may want to
search. External indexes are designated by their database directory (for example, <tt
class="FILENAME">/home/someothergui/.recoll/xapiandb</tt> or <tt
class="FILENAME">/usr/local/recollglobal/xapiandb</tt>).</p>
1433
<p>Once entered, the indexes will appear in the <span class="GUILABEL">External
indexes</span> list, and you can choose which ones you want to use at any moment by
checking or unchecking their entries.</p>
1437
<p>Your main database (the one the current configuration indexes to) is always
implicitly active. If this is not desirable, you can set up your configuration so that it
indexes, for example, an empty directory.</p>
1443
<div class="CHAPTER">
1445
<h1><a id="RCL.INSTALL" name="RCL.INSTALL"></a>Chapter 4. Installation</h1>
1448
<h2 class="SECT1"><a id="RCL.INSTALL.BINARY" name="RCL.INSTALL.BINARY">4.1. Installing a
1449
prebuilt copy</a></h2>
1451
<p>Recoll binary installations are always linked statically to the xapian libraries, and
1452
have no other dependencies. You will only have to check or install <a
1453
href="#RCL.INSTALL.EXTERNAL">supporting applications</a> for the file types that you want
1454
to index beyond text, HTML and mail files.</p>
1458
<h3 class="SECT2"><a id="RCL.INSTALL.BINARY.PACKAGE"
1459
name="RCL.INSTALL.BINARY.PACKAGE">4.1.1. Installing through a package system</a></h3>
1461
<p>If you use a BSD-type port system or a prebuilt package (RPM or other), just follow
1462
the usual procedure, and maybe have a look at the <a
1463
href="#RCL.INSTALL.CONFIG">configuration section</a> (but this may not be necessary for a
1464
quick test with default parameters).</p>
1469
<h3 class="SECT2"><a id="RCL.INSTALL.BINARY.RCL" name="RCL.INSTALL.BINARY.RCL">4.1.2.
1470
Installing a prebuilt <b class="APPLICATION">Recoll</b></a></h3>
1472
<p>The unpackaged binary versions are just compressed tar files of a build tree, where
1473
only the useful parts were kept (executables and sample configuration).</p>
1475
<p>The executable binary files are built with a static link to libxapian and libiconv, to
1476
make installation easier (no dependencies). However, this also means that you cannot
1477
change the versions which are used.</p>
1479
<p>After extracting the tar file, you can proceed with <a
1480
href="#RCL.INSTALL.BUILDING.INSTALL">installation</a> as if you had built the package
1481
from source (that is, just type <tt class="LITERAL">make install</tt>). The binary trees
1482
are built for installation to <tt class="FILENAME">/usr/local</tt>.</p>
1484
<p>You may then need to install external applications to process some file types that you
want indexed (for example, Acrobat or PostScript files). See the next section.</p>
1487
<p>Finally, you may want to have a look at the <a
1488
href="#RCL.INDEXING.CONFIG">configuration section</a>.</p>
1494
<h2 class="SECT1"><a id="RCL.INSTALL.EXTERNAL" name="RCL.INSTALL.EXTERNAL">4.2.
1495
Supporting packages</a></h2>
1497
<p><b class="APPLICATION">Recoll</b> uses external applications to index some file types.
1498
You need to install them for the file types that you wish to have indexed (these are
1499
run-time dependencies. None is needed for building <b
1500
class="APPLICATION">Recoll</b>):</p>
1504
<p>Openoffice: supported natively, but needs the <tt class="COMMAND">unzip</tt> command
1505
to be installed.</p>
1509
<p>PDF: pdftotext is part of the <a href="http://www.foolabs.com/xpdf/"
1510
target="_top">Xpdf</a> package.</p>
1514
<p>Postscript: <a href="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm"
1515
target="_top">pstotext</a>.</p>
1519
<p>MS Word: <a href="http://www.winfield.demon.nl" target="_top">antiword</a>.</p>
1523
<p>MS Excel and PowerPoint: <a href="http://www.45.free.net/~vitus/software/catdoc/"
1524
target="_top">catdoc</a>.</p>
1528
<p>RTF: <a href="http://www.gnu.org/software/unrtf/unrtf.html"
1529
target="_top">unrtf</a></p>
1533
<p>dvi: <a href="http://www.radicaleye.com/dvips.html" target="_top">dvips</a></p>
1537
<p>djvu: <a href="http://djvulibre.djvuzone.org/doc/index.html"
1538
target="_top">DjVuLibre</a></p>
1542
<p>MP3: <b class="APPLICATION">Recoll</b> will use the <tt class="COMMAND">id3info</tt>
1543
command from the <a href="http://id3lib.sourceforge.net/" target="_top">id3lib</a>
1544
package to extract tag information. Without it, only the file names will be indexed.</p>
1548
<p>Text, HTML, mail folders, Openoffice and Scribus files are processed internally. Lyx is
used to index Lyx files. Many filters need <tt class="COMMAND">sed</tt> and <tt
class="COMMAND">awk</tt>.</p>
1555
<h2 class="SECT1"><a id="RCL.INSTALL.BUILDING" name="RCL.INSTALL.BUILDING">4.3. Building
from source</a></h2>
<h3 class="SECT2"><a id="RCL.INSTALL.BUILDING.PREREQS"
name="RCL.INSTALL.BUILDING.PREREQS">4.3.1. Prerequisites</a></h3>
<p>At the very least, you will need to download and install the <a
href="http://www.xapian.org" target="_top">xapian core package</a> (<b
class="APPLICATION">Recoll</b> development currently uses version 0.9.5), and the <a
href="http://www.trolltech.com/products/qt/index.html" target="_top">qt run-time and
development packages</a> (<b class="APPLICATION">Recoll</b> development currently uses
version 3.3.5, but any 3.3 version is probably OK).</p>
<p>You will most probably be able to find a binary package for <b
class="APPLICATION">qt</b> for your system. You may have to compile <b
class="APPLICATION">Xapian</b> but this is not difficult (if you are using <b
class="APPLICATION">FreeBSD</b>, there is a port).</p>
<p>You may also need <a href="http://www.gnu.org/software/libiconv/"
target="_top">libiconv</a>. <b class="APPLICATION">Recoll</b> currently uses version 1.9
(this should not be critical). On <b class="APPLICATION">Linux</b> systems, the iconv
interface is part of libc and you should not need to do anything special.</p>
<h3 class="SECT2"><a id="RCL.INSTALL.BUILDING.BUILD"
name="RCL.INSTALL.BUILDING.BUILD">4.3.2. Building</a></h3>
<p><b class="APPLICATION">Recoll</b> has been built on Linux (redhat7.3, mandriva 2005/6,
Fedora Core 3/4/5), FreeBSD and Solaris 8. If you build on another system, <a
href="mailto:jean-francois.dockes@wanadoo.fr" target="_top">I would very much welcome
<p>Depending on the <b class="APPLICATION">qt</b> configuration on your system, you may
have to set the <tt class="LITERAL">QTDIR</tt> and <tt class="LITERAL">QMAKESPECS</tt>
variables in your environment:</p>
<p><tt class="LITERAL">QTDIR</tt> should point to the directory above the one that holds
the qt include files (e.g. if <tt class="FILENAME">qt.h</tt> is <tt
class="FILENAME">/usr/local/qt/include/qt.h</tt>, QTDIR should be <tt
class="FILENAME">/usr/local/qt</tt>).</p>
<p><tt class="LITERAL">QMAKESPECS</tt> should be set to the name of one of the <b
class="APPLICATION">qt</b> mkspecs sub-directories (e.g. linux-g++).</p>
<p>On many Linux systems, <tt class="LITERAL">QTDIR</tt> is set by the login scripts, and
<tt class="LITERAL">QMAKESPECS</tt> is not needed because there is a <tt
class="FILENAME">default</tt> link in <tt class="FILENAME">mkspecs/</tt>.</p>
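<p>As an illustration of the two variables described above, a Bourne shell setup might look
like the following sketch. The installation prefix <tt class="FILENAME">/usr/local/qt</tt>
and the <tt class="LITERAL">linux-g++</tt> spec name are assumptions taken from the examples
in the text, and must be adapted to your system.</p>

```shell
# Assumed Qt installation prefix; adjust to your system.
QTDIR=/usr/local/qt
# Name of one of the sub-directories of $QTDIR/mkspecs (assumed value).
QMAKESPECS=linux-g++
export QTDIR QMAKESPECS

# Sanity check: qt.h is expected under $QTDIR/include.
if [ -f "$QTDIR/include/qt.h" ]; then
    echo "Qt headers found in $QTDIR/include"
else
    echo "qt.h not found under $QTDIR/include, check QTDIR"
fi
```

<p>The check at the end is only a convenience, it does not replace what the
<tt class="COMMAND">configure</tt> script does.</p>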
<div class="FORMALPARA">
<p><b>Configure options:</b> <tt class="LITERAL">--without-aspell</tt> will disable the
code for phonetic matching of search terms. <tt class="LITERAL">--with-fam</tt> or <tt
class="LITERAL">--with-inotify</tt> will enable the code for real time indexing. Inotify
support is enabled by default on recent Linux systems.</p>
<p>Normal procedure:</p>
<pre class="SCREEN">
<kbd class="USERINPUT">cd recoll-xxx</kbd>
<kbd class="USERINPUT">configure</kbd>
<kbd class="USERINPUT">make</kbd>
<kbd class="USERINPUT">(practices usual hardship-repelling invocations)</kbd>
<p>There is little auto-configuration. The <tt class="COMMAND">configure</tt> script will
mainly link one of the system-specific files in the <tt class="FILENAME">mk</tt>
directory to <tt class="FILENAME">mk/sysconf</tt>. If your system is not known yet, it
will tell you as much, and you may want to manually copy and modify one of the existing
files (the new file name should be the output of <tt class="COMMAND">uname -s</tt>).</p>
<h3 class="SECT2"><a id="RCL.INSTALL.BUILDING.INSTALL"
name="RCL.INSTALL.BUILDING.INSTALL">4.3.3. Installation</a></h3>
<p>Either type <kbd class="USERINPUT">make install</kbd> or execute <kbd
class="USERINPUT">recollinstall <tt class="REPLACEABLE"><i>prefix</i></tt></kbd>, in the
root of the source tree. This will copy the commands to <tt class="FILENAME"><tt
class="REPLACEABLE"><i>prefix</i></tt>/bin</tt> and the sample configuration files,
scripts and other shared data to <tt class="FILENAME"><tt
class="REPLACEABLE"><i>prefix</i></tt>/share/recoll</tt>.</p>
<p>If the installation prefix given to <tt class="COMMAND">recollinstall</tt> is
different from what was specified when executing <tt class="COMMAND">configure</tt>, you
will have to set the <tt class="LITERAL">RECOLL_DATADIR</tt> environment variable to
indicate where the shared data is to be found.</p>
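<p>For example, if the data was installed under a prefix different from the one given to
<tt class="COMMAND">configure</tt>, the variable could be set as in the sketch below. The
<tt class="FILENAME">/opt/recoll</tt> prefix is a made-up value used for illustration only.</p>

```shell
# Hypothetical prefix passed to recollinstall; it differs from the
# prefix that was given to configure.
prefix=/opt/recoll

# recollinstall copies the shared data to <prefix>/share/recoll,
# so this is where recoll should be told to look for it.
RECOLL_DATADIR="$prefix/share/recoll"
export RECOLL_DATADIR
echo "RECOLL_DATADIR=$RECOLL_DATADIR"
```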
<p>You can then proceed to <a href="#RCL.INSTALL.CONFIG">configuration</a>.</p>
<h2 class="SECT1"><a id="RCL.INSTALL.CONFIG" name="RCL.INSTALL.CONFIG">4.4. Configuration
<p>Most of the parameters specific to the <tt class="COMMAND">recoll</tt> GUI are set
through the <span class="GUILABEL">Preferences</span> menu and stored in the standard QT
place (<tt class="FILENAME">$HOME/.qt/recollrc</tt>). You probably do not want to edit
<p>For other options, <b class="APPLICATION">Recoll</b> uses text configuration files.
You will have to edit them by hand for now (there is still some hope for a GUI
configuration tool in the future). The most accurate documentation for the configuration
parameters is given by comments inside the default files, and we will just give a general
<p>There are two sets of configuration files. The system-wide files are kept in a
directory named like <tt class="FILENAME">/usr/[local/]share/recoll/examples</tt>, and
define default values for the system. A parallel set of files exists by default in the
<tt class="FILENAME">.recoll</tt> directory in your home. This directory can be changed
with the <tt class="LITERAL">RECOLL_CONFDIR</tt> environment variable or the
<tt class="LITERAL">-c</tt> option to <tt class="COMMAND">recoll</tt> and <tt
class="COMMAND">recollindex</tt>.</p>
<p>If the <tt class="FILENAME">.recoll</tt> directory does not exist when <tt
class="COMMAND">recoll</tt> or <tt class="COMMAND">recollindex</tt> is started, it will
be created with a set of empty configuration files. <tt class="COMMAND">recoll</tt> will
give you a chance to edit the configuration file before starting indexing. <tt
class="COMMAND">recollindex</tt> will proceed immediately. To avoid mistakes, the
automatic directory creation will only occur for the default location, not if <tt
class="LITERAL">-c</tt> or <tt class="LITERAL">RECOLL_CONFDIR</tt> was used (in the
latter cases, you will have to create the directory yourself).</p>
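<p>A possible way to prepare a non-default configuration directory is sketched below. The
directory name is an arbitrary choice made for this example, and the
<tt class="COMMAND">recollindex</tt> invocation is only shown as a comment.</p>

```shell
# Hypothetical alternate configuration directory.
confdir="${TMPDIR:-/tmp}/recoll-alt-config"

# The directory is not created automatically for non-default locations.
mkdir -p "$confdir"

# Either export the variable...
RECOLL_CONFDIR="$confdir"
export RECOLL_CONFDIR
# ...or pass the directory with -c on each invocation, for example:
#   recollindex -c "$confdir"
echo "Using configuration directory: $RECOLL_CONFDIR"
```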
<p>All configuration files share the same format. For example, a short extract of the
main configuration file might look as follows:</p>
<pre class="PROGRAMLISTING">
# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc
[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8
<p>There are three kinds of lines:</p>
<p>Comment (starts with <span class="emphasis"><i class="EMPHASIS">#</i></span>) or
<p>Parameter assignment (<span class="emphasis"><i class="EMPHASIS">name =
value</i></span>).</p>
<p>Section definition ([<span class="emphasis"><i
class="EMPHASIS">somedirname</i></span>]).</p>
<p>Section definitions allow redefining some parameters for a directory sub-tree. They
stay in effect until another section definition, or the end of file, is encountered. Some
of the parameters used for indexing are looked up hierarchically from the current
directory location upwards. Not all parameters can be meaningfully redefined; whether
one can is specified for each parameter in the next section.</p>
<p>The tilde character (~) is expanded in file names to the name of the user's home
<p>White space is used for separation inside lists. List elements with embedded spaces
can be quoted using double-quotes.</p>
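<p>The format rules above can be tried out on a throw-away file, as in the following
sketch. The file content (directory names, the quoted list element) is invented for the
example, and the <tt class="COMMAND">grep</tt> patterns are a rough approximation of the
line kinds, not the parser used by <b class="APPLICATION">Recoll</b>.</p>

```shell
# Write a small sample file using each kind of line.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Comment line
topdirs = ~/docs "/home/me/my documents"

[~/docs/utf8-notes]
defaultcharset = utf-8
EOF

# Count the lines of each kind with approximate patterns.
comments=$(grep -c '^[[:space:]]*#' "$conf")
sections=$(grep -c '^[[:space:]]*\[.*\]' "$conf")
params=$(grep -c '^[[:space:]]*[A-Za-z][A-Za-z0-9_]*[[:space:]]*=' "$conf")
echo "comments=$comments sections=$sections parameters=$params"
rm -f "$conf"
```

<p>The second element of the <tt class="LITERAL">topdirs</tt> list is double-quoted because
it contains a space.</p>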
<h3 class="SECT2"><a id="RCL.INSTALL.CONFIG.RECOLLCONF"
name="RCL.INSTALL.CONFIG.RECOLLCONF">4.4.1. Main configuration file</a></h3>
<p><tt class="FILENAME">recoll.conf</tt> is the main configuration file. It defines
things like what to index (top directories and things to ignore), and the default
character set to use for document types which do not specify it internally.</p>
<p>The default configuration will index your home directory. If this is not appropriate,
start <tt class="COMMAND">recoll</tt> to create a blank configuration, click <span
class="GUIMENU">Cancel</span>, and edit the configuration file before restarting the
command. This will start the initial indexing, which may take some time.</p>
<div class="VARIABLELIST">
<dt><a id="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS"
name="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS"></a><tt class="LITERAL">topdirs</tt></dt>
<p>Specifies the list of directories or files to index (recursively for directories). The
indexer will not follow symbolic links inside the indexed trees. If an entry in the <tt
class="LITERAL">topdirs</tt> list is a symbolic link, indexing will not start and will
generate an error.</p>
<dt><tt class="LITERAL">dbdir</tt></dt>
<p>The name of the Xapian data directory. It will be created if needed when the index is
initialized. If this is not an absolute path, it will be interpreted relative to the
configuration directory. The value can have embedded spaces but starting or trailing
spaces will be trimmed. You cannot use quotes here.</p>
<dt><tt class="LITERAL">skippedNames</tt></dt>
<p>A space-separated list of patterns for names of files or directories that should be
completely ignored. The list defined in the default file is:</p>
<pre class="PROGRAMLISTING">
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
<p>The list can be redefined for sub-directories, but is only actually changed for the
top level ones in <tt class="LITERAL">topdirs</tt>.</p>
<p>The top-level directories are not affected by this list (that is, a directory in <tt
class="LITERAL">topdirs</tt> might match and would still be indexed).</p>
<p>The list in the default configuration does not exclude hidden directories (names
beginning with a dot), which means that it may index quite a few things that you do not
want. On the other hand, mail user agents like <b class="APPLICATION">thunderbird</b>
usually store messages in hidden directories, and you probably want this indexed. One
possible solution is to have <tt class="FILENAME">.*</tt> in <tt
class="LITERAL">skippedNames</tt>, and add things like <tt
class="FILENAME">~/.thunderbird</tt> or <tt class="FILENAME">~/.evolution</tt> in <tt
class="LITERAL">topdirs</tt>.</p>
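<p>The suggestion above could translate into a configuration extract like the one written
by the following sketch. The <tt class="LITERAL">skippedNames</tt> list is shortened for
illustration and is not the default one, and the file is created in a temporary location
rather than in your real configuration directory.</p>

```shell
# Write the extract to a scratch file so that nothing real is modified.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Index the home directory, plus the hidden mail directories that
# the ".*" pattern below would otherwise exclude.
topdirs = ~ ~/.thunderbird ~/.evolution

# Shortened list for illustration: ".*" skips hidden files and directories.
skippedNames = .* CVS tmp
EOF
cat "$conf"
```

<p>This relies on the fact, stated above, that entries of
<tt class="LITERAL">topdirs</tt> are not themselves affected by
<tt class="LITERAL">skippedNames</tt>.</p>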
<dt><tt class="LITERAL">skippedPaths</tt> and <tt
class="LITERAL">daemSkippedPaths</tt></dt>
<p>A space-separated list of patterns for <span class="emphasis"><i
class="EMPHASIS">paths</i></span> of files or directories that should be skipped. There
is no default in the sample configuration file, but the code always adds the
configuration and database directories in there.</p>
<p><tt class="LITERAL">skippedPaths</tt> is used both by batch and real time indexing.
<tt class="LITERAL">daemSkippedPaths</tt> can be used to specify things that should be
indexed at startup, but not monitored.</p>
<p>Example of use for skipping text files only in a specific directory:</p>
<pre class="PROGRAMLISTING">
skippedPaths = ~/somedir/*.txt
<dt><tt class="LITERAL">loglevel, daemloglevel</tt></dt>
<p>Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
debug/information messages. 2 only lists errors. The <tt class="LITERAL">daem</tt> version
is specific to the indexing monitor daemon.</p>
<dt><tt class="LITERAL">logfilename, daemlogfilename</tt></dt>
<p>Where the messages should go. 'stderr' can be used as a special value, and is the
default. The <tt class="LITERAL">daem</tt> version is specific to the indexing monitor
<dt><tt class="LITERAL">filtersdir</tt></dt>
<p>A directory to search for the external filter scripts used to index some types of
files. The value should not be changed, except if you want to modify one of the default
scripts. The value can be redefined for any sub-directory.</p>
<dt><tt class="LITERAL">indexstemminglanguages</tt></dt>
<p>A list of languages for which the stem expansion databases will be built. See
recollindex(1) for possible values. You can add a stem expansion database for a different
language by using <tt class="COMMAND">recollindex -s</tt>, but it will be deleted during
the next indexing. Only languages listed in the configuration file are permanent.</p>
<dt><tt class="LITERAL">defaultcharset</tt></dt>
<p>The name of the character set used for files that do not contain a character set
definition (e.g. plain text files). This can be redefined for any sub-directory. If it is
not set at all, the character set used is the one defined by the NLS environment (LC_ALL,
LC_CTYPE, LANG), or iso8859-1 if nothing is set.</p>
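<p>The fallback described above can be pictured with the following shell sketch. It assumes
the usual precedence (LC_ALL, then LC_CTYPE, then LANG) and the common
<tt class="LITERAL">language_TERRITORY.charset@modifier</tt> locale naming. The
<tt class="LITERAL">nls_charset</tt> function is a hypothetical helper written for this
example, and the actual lookup performed by <b class="APPLICATION">Recoll</b> may differ.</p>

```shell
# Hypothetical helper: guess the character set from the NLS environment.
nls_charset() {
    # First non-empty value among LC_ALL, LC_CTYPE and LANG.
    locale_name="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
    case "$locale_name" in
        *.*)
            # Keep what follows the dot, drop an optional @modifier.
            charset="${locale_name#*.}"
            charset="${charset%%@*}"
            ;;
        *)
            charset=""
            ;;
    esac
    # Nothing usable in the environment: fall back to iso8859-1.
    echo "${charset:-iso8859-1}"
}

# Usage example, run in a subshell so that the caller's environment is kept.
( LC_ALL=; LC_CTYPE=; LANG=fr_FR.UTF-8; nls_charset )
```

<p>Setting <tt class="LITERAL">defaultcharset</tt> explicitly avoids depending on the
environment in which the indexer happens to run.</p>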
<dt><tt class="LITERAL">guesscharset</tt></dt>
<p>Decide if we try to guess the character set of files if no internal value is available
(e.g. for plain text files). This does not work well in general, and should probably not
<dt><tt class="LITERAL">usesystemfilecommand</tt></dt>
<p>Decide if we use the <tt class="COMMAND">file -i</tt> system command as a final step
for determining the mime type for a file (the main procedure uses suffix associations as
defined in the <tt class="FILENAME">mimemap</tt> file). This can be useful for files with
suffix-less names, but it will also cause the indexing of many bogus "text" files.</p>
<dt><tt class="LITERAL">indexallfilenames</tt></dt>
<p><b class="APPLICATION">Recoll</b> indexes file names in a special section of the
database to allow specific file names searches using wild cards. This parameter decides
if file name indexing is performed only for files with mime types that would qualify them
for full text indexing, or for all files inside the selected subtrees, independently of
<dt><tt class="LITERAL">idxabsmlen</tt></dt>
<p><b class="APPLICATION">Recoll</b> stores an abstract for each indexed file inside the
database. This is so that they can be displayed inside the result lists without decoding
the original file. This parameter defines the size of the stored abstract (which can come
from an actual section or just be the beginning of the text). The default value is
<dt><tt class="LITERAL">iconsdir</tt></dt>
<p>The name of the directory where <tt class="COMMAND">recoll</tt> result list icons are
stored. You can change this if you want different images.</p>
<h3 class="SECT2"><a id="RCLINSTALL.CONFIG.MIMEMAP"
name="RCLINSTALL.CONFIG.MIMEMAP">4.4.2. The mimemap file</a></h3>
<p><tt class="FILENAME">mimemap</tt> specifies the file name extension to mime type
<p>For file names without an extension, or with an unknown one, the system's <tt
class="COMMAND">file -i</tt> command will be executed to determine the mime type (this
can be switched off inside the main configuration file).</p>
<p>The mappings can be specified on a per-subtree basis, which may be useful in some
cases. Example: <b class="APPLICATION">gaim</b> logs have a <tt
class="FILENAME">.txt</tt> extension but should be handled specially, which is possible
because they are usually all located in one place.</p>
<p><tt class="FILENAME">mimemap</tt> also has a <tt class="LITERAL">recoll_noindex</tt>
variable which is a list of suffixes. Matching files will be skipped (which avoids
unnecessary decompressions or <tt class="COMMAND">file</tt> executions). This is
partially redundant with <tt class="LITERAL">skippedNames</tt> in the main configuration
file, with two differences: it will not affect directories, and it cannot be made
dependent on the file-system location (it is a configuration-wide parameter). You could
accomplish with <tt class="LITERAL">skippedNames</tt> anything that <tt
class="LITERAL">recoll_noindex</tt> does. The latter is used mostly for things known to
be unindexable by a given <b class="APPLICATION">Recoll</b> version. Having it there
avoids cluttering the more user-oriented and locally customized <tt
class="LITERAL">skippedNames</tt>.</p>
<h3 class="SECT2"><a id="RCLINSTALL.CONFIG.MIMECONF"
name="RCLINSTALL.CONFIG.MIMECONF">4.4.3. The mimeconf file</a></h3>
<p><tt class="FILENAME">mimeconf</tt> specifies how the different mime types are handled
for indexing, and which icons are displayed in the <tt class="COMMAND">recoll</tt> result
<p>Changing the parameters in the [index] section is probably not a good idea except if
you are a <b class="APPLICATION">Recoll</b> developer.</p>
<p>The [icons] section allows you to change the icons which are displayed by <tt
class="COMMAND">recoll</tt> in the result lists. The values are the basenames of the png
images inside the <tt class="FILENAME">iconsdir</tt> directory (specified in <tt
class="FILENAME">recoll.conf</tt>).</p>
<h3 class="SECT2"><a id="RCLINSTALL.CONFIG.MIMEVIEW"
name="RCLINSTALL.CONFIG.MIMEVIEW">4.4.4. The mimeview file</a></h3>
<p><tt class="FILENAME">mimeview</tt> specifies which programs are started when you click
on an <span class="GUILABEL">Edit</span> link in a result list. For example, HTML is normally
displayed using <b class="APPLICATION">firefox</b>, but you may prefer <b
class="APPLICATION">Konqueror</b>, or your <b class="APPLICATION">openoffice.org</b> program
might be named <tt class="COMMAND">ooffice</tt> instead of <tt
class="COMMAND">openoffice</tt>.</p>
<p>Changes to this file can be done by direct editing, or through the <tt
class="COMMAND">recoll</tt> user preferences dialog.</p>
<p>As for the other configuration files, the normal usage is to have a <tt
class="FILENAME">mimeview</tt> inside your own configuration directory, with just the
non-default entries, which will override those from the central configuration file.</p>
<p>Please note that these entries must be placed under a <tt class="LITERAL">[view]</tt>
<p>If <span class="GUILABEL">Use desktop preferences to choose document editor</span> is
checked in the user preferences, all <tt class="FILENAME">mimeview</tt> entries will be
ignored except the one labelled <tt class="LITERAL">application/x-all</tt> (which is set
to use <tt class="COMMAND">xdg-open</tt> by default).</p>
<h3 class="SECT2"><a id="RCLINSTALL.CONFIG.EXAMPLES"
name="RCLINSTALL.CONFIG.EXAMPLES">4.4.5. Examples of configuration adjustments</a></h3>
<h4 class="SECT3"><a id="RCLINSTALL.CONFIG.EXAMPLES.ADDVIEW"
name="RCLINSTALL.CONFIG.EXAMPLES.ADDVIEW">4.4.5.1. Adding an external viewer for a
non-indexed type</a></h4>
<p>Imagine that you have some kind of file which does not have indexable content, but for
which you would like to have a functional <span class="GUILABEL">Edit</span> link in the
result list (when found by file name). The file names end in <tt
class="REPLACEABLE"><i>.blob</i></tt> and can be displayed by application <tt
class="REPLACEABLE"><i>blobviewer</i></tt>.</p>
<p>You need two entries in the configuration files for this to work:</p>
<p>In <tt class="FILENAME">$RECOLL_CONFDIR/mimemap</tt> (typically <tt
class="FILENAME">~/.recoll/mimemap</tt>), add the following line:</p>
<pre class="PROGRAMLISTING">
.blob = application/x-blobapp
<p>Note that the mime type is made up here, and you could call it <tt
class="REPLACEABLE"><i>diesel/oil</i></tt> just the same.</p>
<p>In <tt class="FILENAME">$RECOLL_CONFDIR/mimeview</tt> under the <tt
class="LITERAL">[view]</tt> section:</p>
<pre class="PROGRAMLISTING">
application/x-blobapp = blobviewer %f
<p>We are supposing that <tt class="REPLACEABLE"><i>blobviewer</i></tt> wants a file name
parameter here. You would use <tt class="LITERAL">%u</tt> if it liked URLs better.</p>
<p>If you just wanted to change the application used by <b class="APPLICATION">Recoll</b>
to display a mime type which it already knows, you would just need to edit <tt
class="FILENAME">mimeview</tt>. The entries you add in your personal file override those
in the central configuration, which you do not need to alter.</p>
<h4 class="SECT3"><a id="RCLINSTALL.CONFIG.EXAMPLES.ADDINDEX"
name="RCLINSTALL.CONFIG.EXAMPLES.ADDINDEX">4.4.5.2. Adding indexing support for a new
<p>Let us now imagine that the above <tt class="REPLACEABLE"><i>.blob</i></tt> files
actually contain indexable text and that you know how to extract it with a command line
program. Getting <b class="APPLICATION">Recoll</b> to index the files is easy. You need
to perform the above alteration, and also to add data to the <tt
class="FILENAME">mimeconf</tt> file (typically in <tt
class="FILENAME">~/.recoll/mimeconf</tt>):</p>
<p>Under the <tt class="LITERAL">[index]</tt> section, add the following line (more about
the <tt class="REPLACEABLE"><i>rclblob</i></tt> indexing script later):</p>
<pre class="PROGRAMLISTING">
application/x-blobapp = exec rclblob
<p>Under the <tt class="LITERAL">[icons]</tt> section, you should choose an icon to be
displayed for the files inside the result lists. Icons are normally 64x64 pixel PNG
files which live in <tt class="FILENAME">/usr/[local/]share/recoll/images</tt>.</p>
<p>Under the <tt class="LITERAL">[categories]</tt> section, you should add the mime type
where it makes sense (you can also create a category). Categories may be used for
filtering in advanced search.</p>
<p>The <tt class="REPLACEABLE"><i>rclblob</i></tt> filter should be an executable program
or script which exists inside <tt
class="FILENAME">/usr/[local/]share/recoll/filters</tt>. It will be given a file name as
argument and should output the text contents in html format on the standard output.</p>
<p>The html could be very minimal like the following example:</p>
<pre class="PROGRAMLISTING">
&lt;html&gt;&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8"&gt;
&lt;body&gt;some text content&lt;/body&gt;&lt;/html&gt;
<p>You should take care to escape some characters inside the text by transforming them
into appropriate entities. "<tt class="LITERAL">&amp;</tt>" should be transformed into
"<tt class="LITERAL">&amp;amp;</tt>", and "<tt class="LITERAL">&lt;</tt>" should be
transformed into "<tt class="LITERAL">&amp;lt;</tt>".</p>
<p>The character set needs to be specified in the header. It does not need to be UTF-8
(<b class="APPLICATION">Recoll</b> will take care of translating it), but it must be
accurate for good results.</p>
<p><b class="APPLICATION">Recoll</b> will also make use of other header fields if they
are present: <tt class="LITERAL">title</tt>, <tt class="LITERAL">description</tt>, <tt
class="LITERAL">keywords</tt>.</p>
<p>The easiest way to write a new filter is probably to start from an existing one.</p>
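<p>If no existing filter fits, a minimal shell version of the
<tt class="REPLACEABLE"><i>rclblob</i></tt> filter could look like the sketch below. It
assumes that the text can be extracted by a command line program, represented here by a
hypothetical <tt class="LITERAL">extract_blob_text</tt> function which simply runs
<tt class="COMMAND">cat</tt>, and that the extracted text is UTF-8. Only the two
characters mentioned above are escaped.</p>

```shell
# Hypothetical stand-in for the real text extractor: here the ".blob"
# file is assumed to hold plain UTF-8 text, so cat is enough.
extract_blob_text() {
    cat "$1"
}

# Minimal filter: takes a file name, writes HTML to standard output.
rclblob() {
    printf '%s\n' '<html><head>'
    printf '%s\n' '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
    printf '%s\n' '</head>'
    printf '%s' '<body>'
    # Escape "&" first, then "<", so that the "&" of "&lt;" is not escaped again.
    extract_blob_text "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g'
    printf '%s\n' '</body></html>'
}

# Usage example on a temporary file.
sample=$(mktemp)
printf '%s\n' 'a < b & c' > "$sample"
rclblob "$sample"
rm -f "$sample"
```

<p>In a real filter script, the body of the function would be the script itself, with the
file name taken from the first command line argument.</p>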