140
144
currently makes no attempt at automatic language recognition.</para>
142
146
<para>&RCL; has many parameters which define exactly what to
143
index, and how to classify and decode the source
144
documents. These are kept in <link
145
linkend="rcl.indexing.config">configuration files</link>. A
146
default configuration is copied into a standard location
147
(usually something like
148
<filename>/usr/[local/]share/recoll/examples</filename>)
149
during installation. The default parameters from this file may
150
be overridden by values that you set inside your personal
151
configuration, found by default in the
152
<filename>.recoll</filename> sub-directory of your home
153
directory. The default configuration will index your home
154
directory with default parameters and should be sufficient for
155
giving &RCL; a try, but you may want to adjust it
147
index, and how to classify and decode the source documents. These
148
are kept in <link linkend="rcl.indexing.config">configuration
149
files</link>. A default configuration is copied into a standard
150
location (usually something like
151
<filename>/usr/[local/]share/recoll/examples</filename>) during
152
installation. The default parameters from this file may be
153
overridden by values that you set inside your personal
154
configuration, found by default in the <filename>.recoll</filename>
155
sub-directory of your home directory. The default configuration
156
will index your home directory with default parameters and should
157
be sufficient for giving &RCL; a try, but you may want to adjust it
158
later, which can be done either by editing the text files or by
159
using configuration menus in the <command>recoll</command>
158
162
<para><link linkend="rcl.indexing.periodic.exec">Indexing</link>
159
163
is started automatically the first time you execute the
336
341
total amount of data on the computer.</para>
338
343
<para>The index data directory (<filename>xapiandb</filename>)
339
only contains data that can be completely rebuilt by an index
340
run, and it can always be destroyed safely.</para>
344
only contains data that can be completely rebuilt by an index run
345
(as long as the original documents exist), and it can always be
346
destroyed safely.</para>
342
348
<sect2 id="rcl.indexing.storage.format">
343
349
<title>Xapian index formats</title>
345
<para>If your first installation of &RCL; was 1.9.0 or more
346
recent, you can skip this section.</para>
348
<para>&XAP; has had two possible index formats for quite some
349
time. The "old" one named <literal>Quartz</literal>, and the
350
new one named <literal>Flint</literal>. &XAP; 0.9 used
351
<literal>Quartz</literal> by default, but could use
352
<literal>Flint</literal> if a specific environment variable
353
(<literal>XAPIAN_PREFER_FLINT</literal>) was set. &XAP; 1.0
354
still supports <literal>Quartz</literal> but will use
355
<literal>Flint</literal> by default for new index
358
<para>The number of disk accesses performed during indexing
359
has been much optimized in the new <literal>Flint</literal>
360
engine and you may see indexing times improved by 50% in some
361
cases (compared to <literal>Quartz</literal>), typically for
362
big indexes where disk accesses dominate the indexing
363
time. There is also a more modest improvement of index
351
<para>&XAP; versions usually support several formats for index
352
storage. A given major &XAP; version will have a current format,
353
used to create new indexes, and will also support the format from
354
the previous major version.</para>
366
356
<para>&XAP; will not automatically convert an existing index
367
from the <literal>Quartz</literal> to the
368
<literal>Flint</literal> format. If you have an older index
369
and want to take advantage of the new format (which can be
370
done without setting the environment variable as of &RCL;
371
1.8.2 and &XAP; 1.0.0), you will have to explicitly delete
372
the old index, then run a normal indexing process.</para>
357
from the older format to the newer one. If you want to upgrade to
358
the new format, or if a very old index needs to be converted
359
because its format is not supported any more, you will have to
360
explicitly delete the old index, then run a normal indexing
374
363
<para>Unfortunately, using the <literal>-z</literal> option to
375
364
<command>recollindex</command> is not sufficient to change the
376
format, you have to delete all files inside the index
365
format: you will have to delete all files inside the index
377
366
directory (typically <filename>~/.recoll/xapiandb</filename>)
378
before starting indexing.</para>
367
before starting the indexing.</para>
510
505
<sect2 id="rcl.indexing.periodic.exec">
511
506
<title>Running indexing</title>
513
<para>Indexing is performed either by the
514
<command>recollindex</command> program, or by the
515
indexing thread inside the <command>recoll</command>
516
program (use the <guimenu>File</guimenu> menu). Both programs
517
will use the <literal>RECOLL_CONFDIR</literal>
518
variable or accept a <literal>-c</literal>
519
<replaceable>confdir</replaceable> option to specify a non-default
520
configuration directory.</para>
522
<para>Reasons to use either the indexing thread or the
523
<command>recollindex</command> command:
525
<listitem><para>Starting the indexing thread is more convenient,
526
being just one click away.</para>
528
<listitem><para>The <command>recollindex</command> command has
529
more options, especially the one to reset the index
530
(<literal>-z</literal>).</para>
532
<listitem><para>The <command>recollindex</command> command will
533
not take down your GUI if it crashes (a rare occurrence,
534
but who knows...)</para>
536
<listitem><para>The <command>recollindex</command> command uses
537
<command>setpriority/nice</command> to lower its priority while
539
(it will also use <command>ionice</command> when this becomes
540
more widely available), the thread can't do it, else it would
541
also slow down the user/search interface.</para>
544
I'll let the reader decide where my heart belongs...</para>
508
<para>Indexing is always performed by the
509
<command>recollindex</command> program, which can be started
510
either from the command line or from the <guimenu>File</guimenu>
511
menu in the <command>recoll</command> GUI program. When started
512
from the GUI, the indexing will run on the same configuration
513
<command>recoll</command> was started on. When started from the
514
command line, <command>recollindex</command> will use the
515
<literal>RECOLL_CONFDIR</literal> variable or accept a
516
<literal>-c</literal> <replaceable>confdir</replaceable> option
517
to specify a non-default configuration directory.</para>
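<para>The lookup order described above can be sketched as follows. This
is only an illustration in Python, not the actual &RCL; code: the
<literal>resolve_confdir</literal> function and its parameters are
hypothetical names, and the sketch assumes that the
<literal>-c</literal> option takes precedence over
<literal>RECOLL_CONFDIR</literal>, with <filename>~/.recoll</filename>
as the fallback.</para>

```python
import os
from pathlib import Path


def resolve_confdir(cli_confdir=None, environ=None):
    """Pick a configuration directory (hypothetical helper, not Recoll code).

    Assumed precedence: -c option, then $RECOLL_CONFDIR, then ~/.recoll.
    """
    env = os.environ if environ is None else environ
    if cli_confdir:
        # Value given with the -c option wins.
        return Path(cli_confdir).expanduser()
    env_dir = env.get("RECOLL_CONFDIR")
    if env_dir:
        # Otherwise fall back to the environment variable.
        return Path(env_dir).expanduser()
    # Default personal configuration directory.
    return Path.home() / ".recoll"
```

<para>Passing the environment as a parameter keeps the sketch easy to
try out without touching the real process environment.</para>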
546
519
<para>If the <command>recoll</command> program finds no index
547
520
when it starts, it will automatically start indexing (except
548
521
if canceled).</para>
550
523
<para>The <command>recollindex</command> indexing process can be
551
interrupted by sending an
552
interrupt (^C, SIGINT) or terminate (SIGTERM) signal. Some time may
553
elapse before the process exits, because it needs to properly flush
554
and close the index. The indexing thread can be equivalently
555
stopped from the menu.</para>
524
interrupted by sending an interrupt (^C, SIGINT) or terminate
525
(SIGTERM) signal. Some time may elapse before the process exits,
526
because it needs to properly flush and close the index. The
527
indexing thread can be equivalently stopped from the menu.</para>
557
529
<para>After such an interruption, the index will be somewhat
558
530
inconsistent because some operations which are normally performed
595
567
3:30AM (supposing <command>recollindex</command> is in your
598
<programlisting>30 3 * * * recollindex > /some/tmp/dir/recolltrace 2>&1</programlisting>
570
<programlisting>30 3 * * * recollindex > /some/tmp/dir/recolltrace 2>&1</programlisting>
600
572
Or, using <command>anacron</command>:
601
<programlisting>1 15 su mylogin -c "recollindex recollindex > /tmp/rcltraceme 2>&1"</programlisting>
573
<programlisting>1 15 recollindex su mylogin -c "recollindex > /tmp/rcltraceme 2>&1"</programlisting>
576
<para>As of version 1.17 the &RCL; GUI has dialogs to manage
577
<filename>crontab</filename> entries for
578
<command>recollindex</command>. You can reach them from the
579
<guimenu>Preferences->Indexing Schedule</guimenu> menu. They only
580
work with the good old <command>cron</command>, and do not give
581
access to all features of <command>cron</command> scheduling.</para>
604
583
<para>The usual command to edit your
605
584
<filename>crontab</filename> is
606
585
<userinput>crontab -e</userinput> (which will usually start
621
601
<title>Real time indexing</title>
623
603
<para>Real time monitoring/indexing is performed by starting the
624
<command>recollindex -m</command> command. With this option,
625
<command>recollindex</command> will detach from the terminal and
626
become a daemon, permanently monitoring file changes and updating
604
<command>recollindex -m</command> command. With this option,
605
<command>recollindex</command> will detach from the terminal and
606
become a daemon, permanently monitoring file changes and updating
629
<para>The real time indexing support can be customised during package
630
<link linkend="rcl.install.building.build">configuration</link>
631
with the <literal>--with[out]-fam</literal> or
632
<literal>--with[out]-inotify</literal> options. The default is
633
currently to include inotify monitoring on systems that support
609
<para>Under KDE, Gnome and some other desktop environments, the daemon
610
can be automatically started when you log in, by creating a desktop
611
file inside the <filename>~/.config/autostart</filename> directory.
612
This can be done for you by the &RCL; GUI. Use the
613
<guimenu>Preferences->Indexing Schedule</guimenu> menu.</para>
615
<para>With older X11 setups, starting the daemon is normally
616
performed as part of the user session script.</para>
636
618
<para>The <filename>rclmon.sh</filename> script can be used to
637
619
easily start and stop the daemon. It can be found in the
638
620
<filename>examples</filename> directory (typically
639
621
<filename>/usr/local/[share/]recoll/examples</filename>).</para>
641
<para>Starting the daemon is normally performed as part
642
of the user session script. For example, my out of fashion
643
xdm-based session has a <filename>.xsession</filename> script
644
with the following lines at the end:</para>
623
<para>For example, my out of fashion xdm-based session has a
624
<filename>.xsession</filename> script with the following lines at
646
627
<programlisting>recollconf=$HOME/.recoll-home
647
628
recolldata=/usr/local/share/recoll
652
633
</programlisting>
654
635
<para>The indexing daemon gets started, then the window manager,
655
for which the session waits.</para> <para>By default the
656
indexing daemon will monitor the state of the X11 session, and
657
exit when it finishes, it is not necessary to kill it
658
explicitly. (The X11 server monitoring can be disabled with option
659
<literal>-x</literal> to <command>recollindex</command>).
636
for which the session waits.</para> <para>By default the
637
indexing daemon will monitor the state of the X11 session, and
638
exit when it finishes, so it is not necessary to kill it
639
explicitly. (The X11 server monitoring can be disabled with option
640
<literal>-x</literal> to <command>recollindex</command>).</para>
662
<para>Under KDE, you can place a small script to start
663
<command>recollindex -m</command> under
664
<filename>$HOME/.kde/Autostart</filename>. This will be executed
665
when the session begins.</para>
667
<para>There is a similar mechanism under Gnome (find the session
668
control tool in the menus and use the "Startup programs" tab).</para>
642
<para>If you use the daemon completely outside of an X11 session, you
643
need to add option <literal>-x</literal> to disable X11 session
644
monitoring (else the daemon will not start).</para>
670
646
<para>By default, the messages from the indexing daemon will be
671
647
discarded. You may want to change this by setting the
675
651
daemon runs permanently, the log file may grow quite big, depending
676
652
on the log level.</para>
654
<para>When building &RCL;, the real time indexing support can be
655
customised during package
656
<link linkend="rcl.install.building.build">configuration</link>
657
with the <literal>--with[out]-fam</literal> or
658
<literal>--with[out]-inotify</literal> options. The default is
659
currently to include inotify monitoring on systems that support
660
it, and, as of recoll 1.17, gamin support on FreeBSD.</para>
678
662
<para>While it is convenient that data is indexed in real time,
679
663
repeated indexing can generate a significant load on the
680
664
system when files such as email folders change. Also,
1071
1055
through the <guilabel>Tools</guilabel> menu or through the main
1072
1056
toolbar.</para>
1074
<para>The dialog has three parts:</para>
1077
<listitem><para>The top part allows constructing a query by
1078
combining multiple clauses of different types.
1079
Each entry field is configurable for the following modes:</para>
1058
<para>The dialog has two tabs:</para>
1061
<listitem><para>The first tab lets you specify terms to search
1062
for, and permits specifying multiple clauses which are combined
1063
to build the search.</para>
1066
<listitem><para>The second tab lets you filter the results according
1067
to file size, date of modification, mime type, or
1073
<para>Click on the <guilabel>Start Search</guilabel> button in
1074
the advanced search dialog, or type <keycap>Enter</keycap> in
1075
any text field to start the search. The button in
1076
the main window always performs a simple search.</para>
1078
<para>Click on the <literal>Show query details</literal> link at
1079
the top of the result page to see the query expansion.</para>
1081
<sect3 id="rcl.search.complex.terms">
1082
<title>Advanced search: the "find" tab</title>
1084
<para>This part of the dialog lets you construct a query by
1085
combining multiple clauses of different types. Each entry
1086
field is configurable for the following modes:</para>
1082
1089
<listitem><para>All terms.</para>
1107
1114
a mix of single words and phrases enclosed in double quotes.
1108
1115
Stemming and wildcard expansion will be performed as for simple
1109
1116
search. </para>
1112
<listitem><para>The next part allows filtering the
1113
results by their mime types.</para>
1114
<para>The state of the file type selection can be saved as
1115
the default (the file type filter will not be activated at
1116
program start-up, but the lists will be in the restored
1121
<para>The bottom part allows restricting the search results to a
1122
sub-tree of the indexed area. You can use the
1123
<guilabel>Invert</guilabel> checkbox to search for files not in
1124
the sub-tree instead. If you use directory filtering often and on
1125
big subsets of the file system, you may think of setting up
1126
multiple indexes instead, as the performance may be
1133
1118
<formalpara><title>Phrases and Proximity searches</title>
1134
1119
<para>These two clauses work in similar ways, with the
1144
1129
a slack of 1 it will match the latter, but not <literal>fox
1145
1130
quick</literal>. A proximity search for <literal>quick
1146
1131
fox</literal> with the default slack will match the
1147
latter, and also <literal>a fox is a cunning and quick animal</literal>.
1132
latter, and also <literal>a fox is a cunning and quick
1133
animal</literal>.</para>
1150
<para>Click on the <guilabel>Start Search</guilabel> button in
1151
the advanced search dialog, or type <keycap>Enter</keycap> in
1152
any text field to start the search. The button in
1153
the main window always performs a simple search.</para>
1154
<para>Click on the <literal>Show query details</literal> link at
1155
the top of the result page to see the query expansion.</para>
1138
<sect3 id="rcl.search.complex.filter">
1139
<title>Advanced search: the "filter" tab</title>
1141
<para>This part of the dialog has several sections which allow
1142
filtering the results of a search according to a number of
1148
<para>The first section allows filtering by dates of last
1149
modification. You can specify both a minimum and a maximum date. The
1150
initial values are set according to the oldest and newest documents
1151
found in the index.</para>
1155
<para>The next section allows filtering the results by
1156
file size. There are two entries for minimum and maximum
1157
size. Enter decimal numbers. You can use suffix multipliers:
1158
<literal>k/K</literal>, <literal>m/M</literal>,
1159
<literal>g/G</literal>, <literal>t/T</literal> for 1E3, 1E6,
1160
1E9, 1E12 respectively.</para>
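<para>As an illustration of the suffix multipliers, here is a minimal
Python sketch. It is not taken from &RCL;: the
<literal>parse_size</literal> name is hypothetical, and the sketch
assumes whole numbers only and at most one trailing suffix
letter.</para>

```python
# Decimal multipliers, as listed above: k/K, m/M, g/G, t/T.
_MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}


def parse_size(text):
    """Convert a size entry such as '10k' or '2M' to a byte count.

    Hypothetical helper: assumes an integer followed by an optional
    single-letter suffix. Raises ValueError on anything else.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].lower()
    if suffix in _MULTIPLIERS:
        number, factor = text[:-1], _MULTIPLIERS[suffix]
    else:
        number, factor = text, 1
    return int(number) * factor
```

<para>Note that the multipliers are powers of ten, so
<literal>10k</literal> means 10000 bytes, not 10240.</para>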
1164
<para>The next section allows filtering the results by their mime
1165
types, or mime categories (ie: media/text/message/etc.).</para>
1166
<para>You can transfer the types between two boxes, to define
1167
which will be included or excluded by the search.</para>
1168
<para>The state of the file type selection can be saved as
1169
the default (the file type filter will not be activated at
1170
program start-up, but the lists will be in the restored
1175
<para>The bottom section allows restricting the search results to a
1176
sub-tree of the indexed area. You can use the
1177
<guilabel>Invert</guilabel> checkbox to search for files not in
1178
the sub-tree instead. If you use directory filtering often and on
1179
big subsets of the file system, you may think of setting up
1180
multiple indexes instead, as the performance may be
1182
<para>You can use relative/partial paths for filtering. For example,
1183
entering <literal>dirA/dirB</literal> would match either
1184
<filename>/dir1/dirA/dirB/myfile1</filename> or
1185
<filename>/dir2/dirA/dirB/someother/myfile2</filename>.</para>
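<para>One way to picture partial path matching is sketched below in
Python. This is an assumption about the behaviour, not the &RCL;
implementation: the hypothetical
<literal>matches_partial_path</literal> function treats the filter as a
run of consecutive directory names that may appear anywhere in the
directory part of the file path.</para>

```python
from pathlib import PurePosixPath


def matches_partial_path(file_path, fragment):
    """Return True if `fragment` appears as consecutive directory
    components of `file_path` (hypothetical helper, POSIX paths only)."""
    # Only the directory part is considered, not the file name itself.
    parts = PurePosixPath(file_path).parent.parts
    wanted = PurePosixPath(fragment).parts
    n = len(wanted)
    if n == 0:
        return True  # an empty filter matches everything
    return any(parts[i:i + n] == wanted for i in range(len(parts) - n + 1))
```

<para>Comparing whole components rather than raw substrings avoids
accidental matches such as <literal>dirA/dirB</literal> against
<filename>/xdirA/dirBy/file</filename>.</para>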
1311
1346
<title>Sorting search results and collapsing duplicates</title>
1313
1348
<para>The documents in a result list are normally sorted in
1314
order of relevance. It is possible to specify different sort
1315
parameters by using the <guimenu>Sort parameters</guimenu>
1316
dialog (located in the <guimenu>Tools</guimenu> menu).</para>
1318
<para>The tool sorts a specified number of the most
1319
relevant documents in the result list, according to specified
1320
criteria. The currently available criteria are
1321
<emphasis>date</emphasis> and <emphasis>mime
1322
type</emphasis>.</para>
1324
<para>The sort parameters stay in effect until they are
1325
explicitly reset, or the program exits. An activated sort is
1326
indicated in the result list header.</para>
1349
order of relevance. It is possible to specify a different sort
1350
order, either by using the vertical arrows in the GUI toolbox to
1351
sort by date, or switching to the result table display and clicking
1352
on any header. The sort order chosen inside the result table
1353
remains active if you switch back to the result list, until you
1354
click the vertical arrows so that both are unchecked (which brings you
1355
back to sorting by relevance).</para>
1328
1357
<para>Sort parameters are remembered between program
1329
1358
invocations, but result sorting is normally always inactive
1428
1457
<formalpara><title>AutoPhrases</title>
1429
1458
<para>This option can be set in the preferences dialog. If it is
1430
set, a phrase will be automatically built and added to simple
1431
searches when looking for <literal>Any terms</literal>. This
1432
will not change radically the results, but will give a relevance
1433
boost to the results where the search terms appear as a
1434
phrase. Ie: searching for <literal>virtual reality</literal>
1435
will still find all documents where either
1436
<literal>virtual</literal> or <literal>reality</literal> or
1437
both appear, but those which contain <literal>virtual
1438
reality</literal> should appear sooner in the list.</para>
1459
set, a phrase will be automatically built and added to simple
1460
searches when looking for <literal>Any terms</literal>. This
1461
will not radically change the results, but will give a relevance
1462
boost to the results where the search terms appear as a
1463
phrase. For example, searching for <literal>virtual reality</literal>
1464
will still find all documents where either
1465
<literal>virtual</literal> or <literal>reality</literal> or
1466
both appear, but those which contain <literal>virtual
1467
reality</literal> should appear sooner in the list.</para>
1470
<para>Phrase searches can strongly slow down a query if most of the
1471
terms in the phrase are common. This is why the
1472
<literal>autophrase</literal> option is off by default for &RCL;
1473
versions before 1.17. As of version 1.17,
1474
<literal>autophrase</literal> is on by default, but very common
1475
terms will be removed from the constructed phrase. The removal
1476
threshold can be adjusted from the search preferences.</para>
1478
<formalpara><title>Phrases and abbreviations</title> <para>As of
1479
&RCL; version 1.17, dotted abbreviations like
1480
<literal>I.B.M.</literal> are also automatically indexed as a word
1481
without the dots: <literal>IBM</literal>. Searching for the word
1482
inside a phrase (ie: <literal>"the IBM company"</literal>) will only
1483
match the dotted abbreviation if you increase the phrase slack (using the
1484
advanced search panel control, or the <literal>o</literal> query
1485
language modifier). Literal occurrences of the word will be matched
1486
normally.</para></formalpara>
1532
1582
default is <literal>blue</literal>.</para>
1535
<listitem><para><guilabel>Result list font</guilabel>: There is
1536
quite a lot of information shown in the result list, and you
1537
may want to customize the font and/or font size. The rest of
1538
the fonts used by &RCL; are determined by your generic Qt
1539
config (try the <command>qtconfig</command> command).</para>
1542
<listitem><anchor id="rcl.search.custom.resultpara">
1543
<para><guilabel>Result paragraph format string</guilabel>:
1544
allows you to change the presentation of each result list
1545
entry. This is <link linkend="rcl.search.custom.reslistpara">
1546
described in its own section.</link></para>
1549
<listitem><anchor id="rcl.search.custom.abssep">
1550
<para><guilabel>Abstract snippet separator</guilabel>:
1551
for synthetic abstracts built from index data, which are
1552
usually made of several snippets from different parts of the
1553
document, this defines the snippet separator, an ellipsis by
1585
<listitem><para><guilabel>Style sheet</guilabel>:
1586
The name of a <application>Qt</application> style sheet
1587
text file which is applied to the whole Recoll application
1588
on startup. The default value is empty, but there is a
1589
skeleton style sheet (<filename>recoll.qss</filename>)
1590
inside the <filename>/usr/share/recoll/examples</filename>
1591
directory. Using a style sheet, you can change most Recoll
1592
graphical parameters: colors, fonts, etc. See the sample
1593
file for a few simple examples.</para>
1557
1596
<listitem><para><guilabel>Maximum text size highlighted for
1561
1600
text size to speed up loading.</para>
1603
<listitem><para><guilabel>Prefer HTML to plain text for
1604
preview</guilabel> if set, Recoll will display HTML as such
1605
inside the preview window. If this causes problems with the Qt
1606
HTML display, you can uncheck it to display the plain text
1607
version instead. </para>
1610
<listitem><para><guilabel>Use &lt;PRE&gt; tags instead of
1611
&lt;BR&gt; to display plain text as HTML in preview</guilabel>:
1612
when displaying plain text inside the preview window, &RCL;
1613
tries to preserve some of the original text line breaks and
1614
indentation. It can either use PRE HTML tags, which will
1615
well preserve the indentation but will force horizontal
1616
scrolling for long lines, or use BR tags to break at the
1617
original line breaks, which will let the editor introduce
1618
other line breaks according to the window width, but will
1619
lose some of the original indentation.</para>
1564
1622
<listitem><para><guilabel>Use desktop preferences to choose
1565
1623
document editor</guilabel>: if this is checked, the
1566
1624
<command>xdg-open</command> utility will be used to open files
1600
1658
state between invocations. It normally starts with sorting
1601
1659
disabled.</para>
1603
<listitem><para><guilabel>Prefer HTML to plain text for preview
1605
</guilabel> if set, Recoll will display HTML as such inside the
1606
preview window. If this causes problems with the Qt HTML
1607
display, you can uncheck it to display the plain text version
1611
1662
</itemizedlist>
1667
<formalpara id="rcl.search.custom.rl">
1668
<title>Result list parameters:</title>
1672
<listitem><para><guilabel>Number of results in a result
1673
page</guilabel></para>
1676
<listitem><para><guilabel>Result list font</guilabel>: There is
1677
quite a lot of information shown in the result list, and you
1678
may want to customize the font and/or font size. The rest of
1679
the fonts used by &RCL; are determined by your generic Qt
1680
config (try the <command>qtconfig</command> command).</para>
1683
<listitem id="rcl.search.custom.resultpara">
1684
<para><guilabel>Edit result list paragraph format string</guilabel>:
1685
allows you to change the presentation of each result list
1686
entry. See the <link linkend="rcl.search.custom.reslist">
1687
result list customisation section</link>.</para>
1690
<listitem id="rcl.search.custom.resulthead">
1691
<para><guilabel>Edit result page html header insert</guilabel>:
1692
allows you to define text inserted at the end of the result
1694
More detail in the <link linkend="rcl.search.custom.reslist">
1695
result list customisation section.</link></para>
1699
<para><guilabel>Date format</guilabel>: allows specifying the
1700
format used for displaying dates inside the result list. This
1701
should be specified as an strftime() string (man strftime).</para>
1704
<listitem id="rcl.search.custom.abssep">
1705
<para><guilabel>Abstract snippet separator</guilabel>:
1706
for synthetic abstracts built from index data, which are
1707
usually made of several snippets from different parts of the
1708
document, this defines the snippet separator, an ellipsis by
1712
</itemizedlist></para>
1616
1715
<formalpara id="rcl.search.custom.search">
1617
1716
<title>Search parameters:</title>
1720
<listitem><para><guilabel>Hide duplicate results</guilabel>:
1721
decides if result list entries are shown for identical
1722
documents found in different places.</para>
1621
1725
<listitem><para><guilabel>Stemming language</guilabel>:
1622
1726
stemming obviously depends on the document's language. This
1623
1727
listbox will let you choose among the stemming databases which
1693
1805
need to implement a way of purging the index from stale data,
1696
<sect3 id="rcl.search.custom.reslistpara">
1697
<title>The result list paragraph format</title>
1699
<para>The presentation of each result inside the result list can be
1700
customized by setting the result list paragraph format inside the
1701
<guilabel>User Interface</guilabel> tab of the <guilabel>Query
1702
configuration</guilabel>.</para>
1704
<para>This is a Qt HTML string where the following printf-like
1705
<literal>%</literal> substitutions will be performed:
1808
<sect3 id="rcl.search.custom.reslist">
1809
<title>The result list format</title>
1811
<para>The result list presentation can be exhaustively customized
1812
by adjusting two elements:</para>
1814
<listitem><para>The paragraph format</para></listitem>
1815
<listitem><para>Html code inside the header
1816
section</para></listitem>
1819
<para>These can be edited from the <guilabel>Result list</guilabel>
1820
tab of the <guilabel>Query configuration</guilabel>.</para>
1822
<para>Newer versions of Recoll (from 1.17) use a WebKit HTML
1823
object by default (this may be disabled at build time), and
1824
total customisation is possible with full support for CSS and
1825
Javascript. Conversely, there are limits to what you can do with
1826
the older Qt QTextBrowser, but still, it is possible to decide
1827
what data each result will contain, and how it will be
1830
<para>No more detail will be given about the header part (only
1831
useful with the WebKit build). If there are restrictions to
1832
what you can do, they are beyond this author's HTML/CSS/Javascript
1833
abilities... There are a few examples on the
1834
<ulink url="http://www.recoll.org/custom.html">page about
1835
customising the result list</ulink> on the &RCL; web site.</para>
1837
<sect4 id="rcl.search.custom.reslist.para">
1838
<title>The paragraph format</title>
1840
<para>This is an arbitrary HTML string where the following printf-like
1841
<literal>%</literal> substitutions will be performed:
1711
1847
<listitem><formalpara><title>%D</title><para>Date</para></formalpara>
1713
<listitem><formalpara><title>%I</title><para>Icon image name
1714
</para></formalpara>
1849
<listitem><formalpara><title>%I</title><para>Icon image
1850
name. This is normally determined from the mime type. The
1851
associations are defined inside the
1852
<link linkend="rcl.install.config.mimeconf">
1853
<filename>mimeconf</filename> configuration file</link>.
1854
If a thumbnail for the file is found at
1855
the standard Freedesktop location, this will be displayed
1856
instead.</para></formalpara>
1716
1858
<listitem><formalpara><title>%K</title><para>Keywords (if
1717
1859
any)</para></formalpara>
1719
<listitem><formalpara><title>%L</title><para>Preview and
1720
Edit links</para></formalpara>
1861
<listitem><formalpara><title>%L</title><para>Precooked Preview and
1862
Edit links</para></formalpara>
1722
1864
<listitem><formalpara><title>%M</title><para>Mime
1723
1865
type</para></formalpara>
1725
<listitem><formalpara><title>%N</title><para>result Number
1726
</para></formalpara>
1867
<listitem><formalpara><title>%N</title><para>result Number inside
1868
the result page</para></formalpara>
1728
1870
<listitem><formalpara><title>%R</title><para>Relevance
1729
percentage</para></formalpara>
1871
percentage</para></formalpara>
1731
1873
<listitem><formalpara><title>%S</title><para>Size
1732
1874
information</para></formalpara>
1734
<listitem><formalpara><title>%T</title><para>Title</para>
1876
<listitem><formalpara><title>%T</title><para>Title or Filename if
1877
not set.</para></formalpara>
1879
<listitem><formalpara><title>%t</title><para>Title or Filename if
1880
not set.</para></formalpara>
1737
1882
<listitem><formalpara><title>%U</title><para>Url</para></formalpara>
1753
1898
search process (see <link linkend="rcl.program.fields">field
1754
1899
configuration</link>). There are currently very few fields stored
1755
1900
by default, apart from the values above (only
1756
<literal>author</literal>), so this feature will need some custom
1757
local configuration to be useful. For example, you could look at
1758
the fields for the document types of interest (use the right-click
1759
menu inside the preview window), and add what you want to the list
1760
of stored fields. A candidate example would be the
1761
<literal>recipient</literal> field which is generated by the
1762
message filters.</para>
1901
<literal>author</literal> and <literal>filename</literal>), so this
1902
feature will need some custom local configuration to be useful. For
1903
example, you could look at the fields for the document types of
1904
interest (use the right-click menu inside the preview window), and
1905
add what you want to the list of stored fields. A candidate example
1906
would be the <literal>recipient</literal> field which is generated
1907
by the message filters.</para>
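<para>To make the substitution mechanism concrete, here is a small
Python sketch of printf-like <literal>%</literal> expansion. It is an
illustration only, not the &RCL; code: the
<literal>render_paragraph</literal> function is a hypothetical name,
the sample values are invented, and two behaviours are assumptions:
<literal>%%</literal> yields a literal percent sign, and unknown
letters are left untouched.</para>

```python
import re


def render_paragraph(fmt, values):
    """Expand single-letter %X substitutions in a format string.

    Hypothetical helper. `values` maps letters (e.g. "T", "R", "S")
    to the text that should replace %T, %R, %S, and so on.
    """
    def substitute(match):
        key = match.group(1)
        if key == "%":
            return "%"  # assumed escape for a literal percent sign
        # Assumption: unknown letters are left as they were written.
        return values.get(key, match.group(0))

    return re.sub(r"%([A-Za-z%])", substitute, fmt)


# Invented sample data for one result entry.
sample = {"T": "Report", "R": "87%", "S": "12 KB"}
line = render_paragraph("<b>%T</b> %R %S", sample)
```

<para>The replacement text is inserted as is and is not scanned again,
so a value that itself contains a percent sign does not trigger further
substitutions.</para>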
1764
1909
<para>The default value for the paragraph format string is:
1765
1910
<programlisting><img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br>
1891
2041
<para><command>recollq</command> has a man page (not installed by
1892
2042
default, look in the <filename>doc/man</filename> directory). The
1893
2043
Usage string is as follows:</para>
1894
<programlisting>recollq [-o|-a|-f] <query string>
2046
-P: Show the date span for all the documents present in the index
2047
[-o|-a|-f] [-q] <query string>
1895
2048
Runs a recoll query and displays result lines.
1896
Default: will interpret the argument(s) as a query language string
1897
-o Emulate the gui simple search in ANY TERM mode
1898
-a Emulate the gui simple search in ALL TERMS mode
1899
-f Emulate the gui simple search in filename mode
2049
Default: will interpret the argument(s) as a xesam query string
2051
implicit AND, Exclusion, field spec: t1 -t2 title:t3
2052
OR has priority: t1 OR t2 t3 OR t4 means (t1 OR t2) AND (t3 OR t4)
2053
Phrase: "t1 t2" (needs additional quoting on cmd line)
2054
-o Emulate the GUI simple search in ANY TERM mode
2055
-a Emulate the GUI simple search in ALL TERMS mode
2056
-f Emulate the GUI simple search in filename mode
2057
-q is just ignored (compatibility with the recoll GUI command line)
1900
2058
Common options:
1901
-c <configdir> : specify config directory, overriding $RECOLL_CONFDIR
2059
-c <configdir> : specify config directory, overriding $RECOLL_CONFDIR
1902
2060
-d also dump file contents
1903
-n <cnt> limit the maximum number of results (0->no limit, default 2000)
2061
-n [first-]<cnt> define the result slice. The default value for [first]
2062
is 0. Without the option, the default max count is 2000.
2063
Use n=0 for no limit
1904
2064
-b : basic. Just output urls, no mime types or titles
1905
-m : dump the whole document meta[] array
1906
-S fld : sort by field name
2065
-Q : no result lines, just the processed query and result count
2066
-m : dump the whole document meta[] array for each result
2067
-A : output the document abstracts
2068
-S fld : sort by field <fld>
1907
2069
-D : sort descending
2070
-i <dbdir> : additional index, several can be given
2071
-e use url encoding (%xx) for urls
2072
-F <field name list> : output exactly these fields for each result.
2073
The field values are encoded in base64, output in one line and
2074
separated by one space character. This is the recommended format
2075
for use by other programs. Use a normal query with option -m to
2076
see the field names.
1908
2077
</programlisting>
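<para>As a hedged illustration of the <literal>-F</literal> output format described above, the following Python sketch decodes one result line. It assumes only what the usage text states (base64-encoded field values, on one line, separated by single space characters). The function name <literal>decode_result_line</literal> is hypothetical, and any non-result lines that <command>recollq</command> may print are not handled here.</para>

```python
import base64


def decode_result_line(line):
    """Decode one line of `recollq -F` output into a list of field values.

    Assumption (taken from the usage text): each value is base64-encoded
    and values are separated by exactly one space character.
    """
    # Drop the line terminator only; an empty field encodes to an empty
    # token, so interior and trailing spaces must be preserved.
    tokens = line.rstrip("\r\n").split(" ")
    # Decode each token; undecodable bytes are replaced rather than raising.
    return [base64.b64decode(tok).decode("utf-8", errors="replace")
            for tok in tokens]


if __name__ == "__main__":
    # Build a sample line locally instead of running recollq.
    sample = " ".join(
        base64.b64encode(value.encode("utf-8")).decode("ascii")
        for value in ["file:///tmp/a.txt", "A title"]
    )
    print(decode_result_line(sample))
```

<para>The values come back in the order in which the fields were requested, so a caller could pair them with its own field name list, for instance with <literal>zip()</literal>.</para>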
1910
2079
<para>Sample execution:</para>
1929
2098
capabilities as the complex search interface in the
1932
<para>The language is roughly based on the <ulink
1933
url="http://www.xesam.org/main/XesamUserSearchLanguage95">
1934
Xesam</ulink> user search language specification.</para>
2101
<para>The language is roughly based on the (seemingly defunct)
2102
<ulink url="http://www.xesam.org/main/XesamUserSearchLanguage95">
2103
Xesam</ulink> user search language specification.</para>
1936
2105
<para>If the results of a query language search puzzle you and you
1937
doubt what has been actually searched for, you can use the GUI
1938
<literal>show query</literal> link at the top of the result list to
1939
check the exact query which was finally executed by Xapian.</para>
2106
doubt what has been actually searched for, you can use the GUI
2107
<literal>show query</literal> link at the top of the result list to
2108
check the exact query which was finally executed by Xapian.</para>
1941
2110
<para>Here follows a sample request that we are going to
1944
2113
<programlisting>
1945
2114
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
1993
2162
<replaceable>title:"prejudice pride"</replaceable> is not the same as
1994
2163
<replaceable>title:prejudice title:pride</replaceable>, and is
1995
2164
unlikely to find a result.</para>
1996
<para>Most Xesam phrase modifiers are unsupported, except for
1997
<literal>l</literal> (small ell) to disable stemming, and
1998
<literal>p</literal> to turn a phrase into a NEAR (unordered proximity)
1999
search. Example: <replaceable>"prejudice pride"p</replaceable></para>
2166
<para>Modifiers can be set on a phrase clause, for example to specify
2167
a proximity search (unordered). See
2168
<link linkend="rcl.search.lang.modifiers">the modifier
2169
section</link>.</para>
2001
2171
<para>&RCL; currently manages the following default fields:</para>
2028
2198
results on file location (Ex:
2029
2199
<literal>dir:/home/me/somedir</literal>). <literal>-dir</literal>
2030
2200
also works to find results outside of the specified directory, but only
2031
after release 1.15.8.</para>
2201
after release 1.15.8. A tilde inside the value will be expanded to
2202
the home directory. <literal>dir</literal> is not a regular field
2203
and only one value makes sense in a query (you can't use
2204
<literal>dir:dir1 OR dir:dir2</literal>). Relative paths make
2206
<literal>dir:share/doc</literal> would match either
2207
<filename>/usr/share/doc</filename> or
2208
<filename>/usr/local/share/doc</filename> </para>
2211
<listitem><para><literal>size</literal> for filtering the
2212
results on file size. Example:
2213
<literal>size<10000</literal>. You can use
2214
<literal><</literal>, <literal>></literal> or
2215
<literal>=</literal> as operators. You can specify a range like the
2216
following: <literal>size>100 size<1000</literal>. The usual
2217
<literal>k/K, m/M, g/G, t/T</literal> can be used as (decimal)
2218
multipliers. Ex: <literal>size>1k</literal> to search for files
2219
bigger than 1000 bytes.</para>
2034
2222
<listitem><para><literal>date</literal> for searching or filtering
2327
2517
handle the protocol.</para>
2329
2519
</itemizedlist>
2330
The following will just describe the simple filters, if you are
2331
programmer enough to write one of the other kind, it shouldn't be too
2332
difficult to make sense of one of the existing modules (ie:
2520
The following will just describe the simple filters. If you can
2521
program and want to write one of the other kinds, it shouldn't be too
2522
difficult to make sense of one of the existing modules. For example,
2523
look at <command>rclzip</command> which uses Zip file paths as
2524
internal identifiers (<literal>ipath</literal>), and
2525
<command>rclinfo</command>, which uses an integer index.</para>
2527
<sect2 id="rcl.program.filters.simple">
2528
<title>Simple filters</title>
2335
2530
<para>&RCL; simple filters are usually shell-scripts, but this is in
2336
no way necessary. These programs are extremely simple and most
2337
of the difficulty lies in extracting the text from the native
2338
format, not outputting what is expected by &RCL;. Happily
2339
enough, most document formats already have translators or text
2340
extractors which handle the difficult part and can be called
2341
from the filter. In some case the output of the translating
2342
program is appropriate, and no intermediate shell-script is
2531
no way necessary. Extracting the text from the native format is the
2532
difficult part. Outputting the format expected by &RCL; is
2533
trivial. Happily enough, most document formats have translators or
2534
text extractors which can be called from the filter. In some cases
2535
the output of the translating program is completely appropriate,
2536
and no intermediate shell-script is needed.</para>
2345
2538
<para>Filters are called with a single argument which is the
2346
2539
source file name. They should output the result to stdout.</para>
2348
<para>The <literal>RECOLL_FILTER_FORPREVIEW</literal>
2349
environment variable (values <literal>yes</literal>,
2350
<literal>no</literal>) tells the filter if the operation is
2351
for indexing or previewing. Some filters use this to output a
2352
slightly different format. This is not essential.</para>
2541
<para>When writing a filter, you should decide if it will output
2542
plain text or html. Plain text is simpler, but you will not be able
2543
to add metadata or vary the output character encoding (this will be
2544
defined in a configuration file). Additionally, some formatting may
2545
be easier to preserve when previewing html. Actually the deciding factor
2546
is metadata: &RCL; has a way to <link linkend="rcl.program.filters.html">
2547
extract metadata from the html header and use it for field
2548
searches</link>.</para>
2550
<para>The <literal>RECOLL_FILTER_FORPREVIEW</literal> environment
2551
variable (values <literal>yes</literal>, <literal>no</literal>)
2552
tells the filter if the operation is for indexing or
2553
previewing. Some filters use this to output a slightly different
2554
format, for example stripping uninteresting repeated keywords (e.g.
2555
<literal>Subject:</literal> for email) when indexing. This is not
2558
<para>You should look at one of the simple filters, for example
2559
<literal>rclps</literal> for a starting point.</para>
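<para>To make the calling convention described above concrete, here is a minimal sketch of a plain text filter written in Python rather than shell. It relies only on what this section states: a single file name argument, output on stdout, and the <literal>RECOLL_FILTER_FORPREVIEW</literal> variable. The function name <literal>run_filter</literal> and the "extraction" step (copying a text file and dropping the <literal>Subject:</literal> keyword when indexing) are illustrative assumptions, not an existing &RCL; filter.</para>

```python
#!/usr/bin/env python3
import os
import sys


def run_filter(path, environ, out):
    """Write the text extracted from `path` to the `out` stream.

    Hypothetical stand-in for a real extractor: the input is assumed to be
    UTF-8 text already. A real filter would call a format-specific
    translator at this point.
    """
    # "yes" means the output is for previewing, anything else for indexing.
    for_preview = environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"
    with open(path, encoding="utf-8", errors="replace") as src:
        for line in src:
            # Illustration only: when indexing, drop the repeated
            # "Subject:" keyword but keep the rest of the line.
            if not for_preview and line.startswith("Subject:"):
                line = line[len("Subject:"):].lstrip()
            out.write(line)
    return 0


# The filter is called with a single argument: the source file name.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(run_filter(sys.argv[1], os.environ, sys.stdout))
```

<para>Passing the environment and the output stream as parameters is a design choice of this sketch: it lets both the indexing and the previewing behaviour be exercised without running the indexer.</para>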
2561
<para>Don't forget to make your filter executable before
2566
<sect2 id="rcl.program.filters.association">
2567
<title>Telling &RCL; about the filter</title>
2569
<para>There are two elements that link a file to the filter which
2570
should process it: the association of file to mime type and the
2571
association of a mime type with a filter.</para>
2573
<para>The association of files to mime types is mostly based on
2574
name suffixes. The types are defined inside the
2575
<link linkend="rcl.install.config.mimemap">
2576
<filename>mimemap</filename> file</link>. Example:
2579
.doc = application/msword
2581
If no suffix association is found for the file name, &RCL; will try
2582
to execute the <command>file -i</command> command to determine a
2354
2585
<para>The association of file types to filters is performed in
2355
the <filename>mimeconf</filename> file. A sample:</para>
2586
the <link linkend="rcl.install.config.mimeconf">
2587
<filename>mimeconf</filename> file</link>. A sample will probably be
2588
of better help than a long explanation:</para>
2356
2589
<programlisting>
3161
3397
enable the gnu version on systems where the native one is
3164
<listitem><para><literal>--without-gui</literal> Disable the Qt
3165
interface, and auxiliary uses of X11, and compile the command
3166
line version.</para>
3400
<listitem><para><literal>--disable-qtgui</literal> Disable the Qt
3401
interface. Will allow building the indexer and the command line
3402
search program in absence of a Qt environment.</para>
3404
<listitem><para><literal>--disable-x11mon</literal> Disable
3405
X11 connection monitoring inside recollindex. Together with
3406
--disable-qtgui, this allows building recoll without Qt and
3168
3409
<listitem><para>Of course the usual
3169
3410
<application>autoconf</application> <command>configure</command>
3410
3651
</varlistentry>
3653
<varlistentry id="rcl.install.config.recollconf.skippedpathsfnmpathname">
3654
<term><literal>skippedPathsFnmPathname</literal></term>
3655
<listitem><para>The values in the
3656
<literal>*skippedPaths</literal> variables are matched by
3657
default with <literal>fnmatch(3)</literal>, with the
3658
FNM_PATHNAME and FNM_LEADING_DIR flags. This means that '/'
3659
characters must be matched explicitly. You can set
3660
<literal>skippedPathsFnmPathname</literal> to 0 to disable
3661
the use of FNM_PATHNAME (meaning that /*/dir3 will match
3662
/dir1/dir2/dir3).</para>
3412
3667
<varlistentry id="rcl.install.config.recollconf.followlinks">
3413
3668
<term><literal>followLinks</literal></term>
3414
3669
<listitem><para>Specifies if the indexer should follow