CHANGES - List of revisions

=head1 Revision History

This document contains a list of bug fixes and feature additions to Swish-e.

=head2 Version 2.4.1 - December 17, 2003

=item * Added new example CGI script

search.cgi is a new skeleton CGI script that uses SWISH::API for searching.
It is installed in the same location as swish.cgi.

=item * Add Fuzzy access to C and Perl interfaces

Added a number of functions to the C API (and SWISH::API)
to access the stemmer used when indexing a given index.

=item * Commas in numbers

Added commas to the summary display at the end of indexing.

=item * Insert whitespace between tags

Parser.c was updated to flush the text buffer before and after
every (non-inline HTML) tag.

would index as a single word "foobarbaz".

DirTree.pl was updated to work with SWISH::Filter and to work on Windows.
DirTree.pl is a program to fetch files from the file system and works with
the -S prog input method.

=item * Problem with --enable-incremental option

Fixed configure script to build incremental option. Note that this is still
experimental, but testers are welcome.

Mark Fletcher, with the help of valgrind, found a bug in the headers.c
function SwishIndexHeaderNames used by the C API.

=item * Clarify documentation regarding search order

At the prompting of Doralyn Rossmann, updated SEARCH.pod to
try to make the explanation of searching clearer, and to fix an error
in the description of nested searches.

=head2 Version 2.4.0 - October 27, 2003

=item * Note: Different Index Format

Swish-e version 2.4.0 has a different index file format from previous
versions of Swish-e. Upgrading will B<require> reindexing -- version 2.4.0
cannot read indexes created with previous versions.

=head2 Version 2.4.0 (Release Candidate 4) September 26, 2003

=item * robots.txt not closed correctly

When using the -S http method, robots.txt was not closed and that caused
the (last) .contents file to not be unlinked under Windows. Windows
seems to think filenames are related to files.

=item * SWISH::Filter and locating programs on Windows

SWISH::Filter now scans $libexecdir in addition to the PATH for programs (such as catdoc and
pdftotext), and also checks for programs by adding the extensions ".exe" and ".bat" to the

=item * Install sample templates

The sample templates included with swish.cgi are now installed
in $pkgdatadir (typically /usr/local/share/swish-e).

=head2 Version 2.4.0 (Release Candidate 3) September 11, 2003

=item * Fix parser bug meta=(foo*)

Fixed bug in query parser caused in rc2's (pr2) attempt to catch wildcards

=head2 Version 2.4.0 (Release Candidate 2) September 10, 2003

=item * Indexing HTML title

Fixed a problem when these were used in combination:

    MetaNameAlias swishtitle title

That failed to correctly reset the metaname stack and indexed text under

=item * Single Wildcards

Due to the way the query parser "works" a search of

would result in a search of "foo*". Now that results in:

    err: Single wildcard not allowed as word

=item * Fixed search parsing bug

Brad Miele reported that the word "andes" was not being found. It was being
stemmed to "and", which was then considered an operator. [moseley]

=item * Add new directive PropertyNamesSortKeyLength

PropertyNamesSortKeyLength sets the sort key length to use when sorting
string properties. The default is 100 characters. There was a hard-coded
100 char limit before, but that was a problem where people were not building
from source (Windows). The value of this is questionable -- it's intended to
limit how much memory is used when sorting while indexing and searching. [moseley]

=item * Fixed sorting issues with multiple indexes and reverse sorting

Reworked much of the sorting code. Still to do is setting the character sort order.

=item * Fixed minor memory leak

Fixed leak of not releasing memory of index file name and swish_handle
destroy, and fixed SwishStemWord to default to the Stemmer_en. [moseley]

Fixed libtest.c example program that was not cleaning up memory after an

=item * Replaced Swish-e's Porter Stemmer with Snowball

Swish-e now has support for Snowball stemmers (http://snowball.tartarus.org/).
The stemmers are enabled for an index with FuzzyIndexingMode Stemming_* where "*" can be:

    de, dk, en1, en2, es, fi, fr, it, nl, no, pt, ru, se

In addition, UseStemming yes or FuzzyIndexingMode Stemming_en will use the old stemmer.

=head2 Version 2.4.0 (Release Candidate 1) May 21, 2003

=item * Security Fix: swish.cgi

The swish.cgi script was not correctly escaping HTML when searching by
the right combination of metanames and highlighting module. This could
lead to cross-site scripting if indexing un-trusted documents. [moseley]

=item * Added Support for building a Debian Package

To build a .deb, unpack the distribution, chdir into it, then run

    $ fakeroot debian/rules binary

Then install the generated .deb file with dpkg -i

=item * Use SWISH::Filter by default with spider.pl

spider.pl is installed in the libexecdir directory as well as the SWISH::Filter modules.
PDF, MS Word, MP3, and XML documents will be indexed automatically if the required helper
applications (e.g. catdoc, pdftotext) or scripts (e.g. MP3::Tag) are installed.

Swish also knows about libexecdir, so if you specify a relative path with -S prog
swish-e will look for the program in libexecdir. This is mostly for spider.pl so
indexing only requires:

    SwishProgParameters default http://localhost/index.html

And swish-e will find spider.pl and SWISH::Filter will be used to convert docs.

=item * Fixed Document-Type bug

Document-Type was not being reset after being set by input from a -S prog program, causing
the wrong parser to be used. [moseley]

=item * New Directive: PropertyNamesNoStripChars

Swish replaces all series of low ASCII chars with a single space
character. This option instructs swish to store all chars in the property. [moseley]

=item * Change HTTP access defaults

Defaults used with -S http access method were changed.

Delay was reduced from one minute between start of each request to five seconds

MaxDepth was changed from five to zero, meaning there is no limit to depth indexed by

=item * swishspider location and SpiderDirectory

The swishspider program is now installed in $prefix/lib/swish-e by default. This can
be changed by the --libexecdir option to configure.

The SpiderDirectory option now defaults to the value of libexecdir instead of the current

=item * Added libtool and automake support

Replaces the build system with Autotools. Now builds libswish-e as
a shared library on systems that support shared libraries.
The swish-e binary links against this shared library.
Can also build outside the source tree on platforms with GNU make. [moseley]

=item * Updates to installation

Running "make install" now installs additional files.
Files include the swish-e binary, the libswish-e search library, swish-e.h
header, documentation files, the swishspider program, and Perl modules used for the example
swish.cgi search script. Directories will be created if they do not already exist.
Installation directories can be specified at build time.

=item * Fixed bug when searching at end of inverted index

Swish was not correctly detecting the end of the inverted index
when searching a wildcard word that was past the last word in the index.
Caught by Frank Heasley. [moseley]

=item * Increase sort key length from 50 to 100 characters

The setting MAX_SORT_STRING_LEN in F<src/config.h> sets the max length used
when sorting in swish-e. You may reduce this number to save memory while
sorting, or increase it if you have very long properties to sort.

=item * Remove &quot; entity from -p output

The -p option to print properties was escaping double quotes in properties
with the &quot; entity. -x does not do that, so the two were inconsistent. -p no longer
converts double quotes. The user should pick a good delimiter with -d or preferably use
the -x method for generating output.

=item * XML parser and Windows

The XML parser was being passed the incorrect buffer length when used on the Windows
platform, causing the parser to abort with an error.

=item * Version Numbering

SWISH-E versions starting with 2.3.4 use kernel version numbering. Versions are
in the form: Major.Minor.Build. Odd minor versions are development. Even minor
versions are releases. 2.3.4 would be a development version.
2.4.0 would be a release version. 2.3.20 would be the 20th build of 2.3.
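
The odd/even rule can be checked mechanically. The following is a minimal
Python sketch, not part of Swish-e; the function name classify_version is
hypothetical and exists only for illustration.

```python
def classify_version(version):
    """Classify a Major.Minor.Build string by the odd/even minor rule."""
    major, minor, build = (int(part) for part in version.split("."))
    # Odd minor numbers mark development versions, even ones mark releases.
    return "development" if minor % 2 else "release"


if __name__ == "__main__":
    for v in ("2.3.4", "2.4.0", "2.3.20"):
        print(v, classify_version(v))
```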

=item * Added RPM support

RPMs can be built with:

Copy the resulting tarball to RPM's SOURCES directory and then run as a superuser:

    rpmbuild -ba rpm/swish-e.spec

You should have swish-e packages in your RPMS/<arch> directory. [augur]

=item * Changed default perl binary location

Most perl scripts provided with SWISH-E now use /usr/bin/perl by default.
Note that some scripts are generated at build time, so those will look in the
path for the location of the perl binary.

=item * New Feature: MetaNamesRank

MetaNamesRank can be used to adjust the ranking for words based on

=item * New Swish Library API and Perl Module

The Swish-e C library interface was rewritten to provide
better memory management and better separation of data.
Most indexing related code has been removed from the library.
A new header file is provided for the API: swish-e.h.

The Perl module SWISHE was replaced with the SWISH::API module
in the Swish-e distribution.

B<Previous versions of the SWISHE module will not work with this version of Swish-e.>

If you are using the SWISHE module from a previous version of Swish then you must
either rewrite your code to use the new SWISH::API module (highly recommended)
or use the replacement SWISHE module. The replacement SWISHE module is a thin
interface to the SWISH::API module. It can be downloaded from

    http://swish-e.org/Download/old/SWISHE-0.03.tar.gz

=item * NoContents not working with libxml2 parser

Corrected problem when using NoContents with binary files and the HTML2 parser.

Trying to index image file names with:

    NoContents .gif .jpeg

failed to index the path names because the default parser
(HTML2 when libxml2 is linked with swish-e)
was not finding any text in the binary files. [moseley]

=item * Updates to swish.cgi

The example/swish.cgi script can now use the SWISH::API module
for searching an index. Combined with mod_perl this module
can improve search performance considerably.

The Perl modules used with the swish.cgi script have all been moved into
the SWISH::* namespace. Hence, files in the F<modules> directory were moved
into the F<modules/SWISH> directory.

=head2 Version 2.2.3 - December 11, 2002

Multiple -L options were ORing instead of ANDing.
Caught by Patrick Mouret. [moseley]

=head2 Version 2.2.2 - November 14, 2002

Pass non-text/* files onto indexing code IF there is a FileFilter
associated with the *extension* of the URL. Fixes the problem of not
being able to index, say, pdf files by using the FileFilter configuration

Fixed bug where nulls were stripped when using FileFilter with -S prog.
Caught by Greg Fenton. [moseley]

=head2 Version 2.2.1 - September 26, 2002

=item * NoContents with -S prog

Failed to use the correct default parser when using the No-Contents header
and libxml2 linked in. [moseley]

=item * Add tests for IRIX and sparc machines

8-byte alignment in mem_zones is required for these machines [moseley]

=item * Fixed code when removing files

Was not correctly removing words from index when parser aborted [jmruiz]

=item * Merge segfault

Fixed segfault caused by trying to print null dates while merging
duplicate files. [moseley]

=item * Documentation patches

Spelling corrections to the SWISH-CONFIG pod page [Steve Eckert]

=item * Configure corrections

Fixed a zlib test error that used "==" in a test [Steve Eckert]

=item * Updates to VMS build

The VMS build was updated [Jean-François PIÉRONNE]

=item * MANIFEST corrections

Added missing filters and vms build file into MANIFEST [moseley]

=head2 Version 2.2 - September 18, 2002

=item * Default parser

Swish-e will now use the HTML2 (libxml2) parser by default if libxml2 is
installed and DefaultContents or IndexContents is not used.

=item * Selecting parsers

Allow HTML*, XML*, and TXT* to automatically select the libxml2-based parsers
if libxml2 is linked with Swish-e, otherwise fall back to the built-in parsers.

=item * SwishSpider and Filters

Filters (FileFilter directive) did not work correctly when spidering
with the -S http method. A new filter system was developed and now
filtering of documents (e.g. pdf->html or MSWord->text) is handled
by the src/SwishSpider program.

When indexing with the -S http method only documents of content-type "text/*"
are indexed. Other documents must be converted to text by using the filter system.

=item * Buffer overflow in xml.c

Fixed bug in xml.c reported by Rodney Barnett when very long words
were indexed. [moseley]

=item * configure script updates

Updated from _WIN32 checks to feature checks using autoconf [moseley, norris]

=item * updates to run on Alpha (Linux 2.4 (Debian 3.0))

Fixed a cast error when calling zlib, and the calls to read/write packed longs
to disk. [jmruiz, moseley]

=item * COALESCE_BUFFER_MAX_SIZE

Some people were seeing the following error:

    err: Buffer too short in coalesce_word_locations.
    Increase COALESCE_BUFFER_MAX_SIZE in config.h and rebuild.

This was due to indexing binary data or files with a very large number of words.
The best solution is to not index binary data or files with a very large number

Swish-e will now automatically reallocate the buffer as needed. [jmruiz]

=head2 Version 2.2rc1 - August 29, 2002

Many large changes were made internally in the code, some for performance
reasons, some for feature changes and additions, and some to prepare
for new features in later versions of Swish-e.

=item * Documentation!

Documentation is now included in the source distribution as .pod
(perldoc) files, and as HTML files. In addition, the distribution can now
generate PDF, postscript, and unix man pages from the source .pod files.
See L<README|README> for more information.

=item * Indexing and searching speed

The indexing process has been improved. Depending on a number of
factors, you may see a significant improvement in indexing speed,
especially if upgrading from version 1.x.

Searching speed has also been improved. Properties are not loaded until
results are displayed, and properties are pre-sorted during indexing to
speed up sorting results by properties while searching.

=item * Properties are written to a separate file

Swish-e now stores document properties in a separate file. This means
there are now two files that make up a Swish-e index. The default files
are C<index.swish-e> and C<index.swish-e.prop>.

This change frees memory while indexing, allowing larger collections to
be indexed in memory.

=item * Internal data stored as Properties

Pre 2.2 some internal data was stored in fixed locations within the
index, namely the file name, file size, and title. 2.2 introduced new
internal data such as the last modified date, and document summaries.
This data is considered I<meta data> since it is data about a document.

Instead of adding new data to the internal structure of the index file,
it was decided to use the MetaNames and PropertyNames feature of Swish-e
to store this meta information. This allows for new meta data to be added
at a later time (e.g. Content-type), and provides an easy and customizable
way to print results with the C<-p> switch and the new C<-x> switch.
In addition, search results can now be sorted and limited by properties.

For example, to sort by the rank and title:

    swish-e -w foo -s swishrank desc swishtitle asc
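
The effect of that two-key sort can be illustrated outside Swish-e. This is a
minimal Python sketch on made-up result rows, not Swish-e's own sorting code;
the sample data and the function name sort_results are hypothetical, and
details such as case handling in string comparisons are not modeled.

```python
def sort_results(rows):
    """Order rows by swishrank descending, then swishtitle ascending."""
    # Negating the numeric rank gives descending order; the title breaks
    # ties in ascending order.
    return sorted(rows, key=lambda r: (-r["swishrank"], r["swishtitle"]))


if __name__ == "__main__":
    sample = [
        {"swishrank": 1000, "swishtitle": "Zebra"},
        {"swishrank": 500, "swishtitle": "Mango"},
        {"swishrank": 1000, "swishtitle": "Apple"},
    ]
    for row in sort_results(sample):
        print(row["swishrank"], row["swishtitle"])
```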

=item * The header display has been slightly reorganized.

If you are parsing output headers in a program then you may need to
adjust your code. There's a new switch C<-H> to control the level of
header output when searching.

=item * Results are now combined when searching more than one index.

Swish-e now merges (and sorts) the results from multiple indexes when
using C<-f> to specify more than one index. This change affects the way
maxhits (C<-m>) works. Here's a summary of the way it works for the

    1.3.2 - MaxHits returns first N results starting from the first index.
            e.g. maxhits=20; 15 hits Index1, 40 hits Index2
            All 15 from Index1 plus first five from Index2 = 20 hits.

    2.0.0 - MaxHits returns first N results from each index.
            e.g. Maxhits=20; 15 hits Index1, 40 hits Index2
            All 15 from Index1 plus 20 from Index2.

    2.2.0 - Results are merged and first N results are returned.
            e.g. Maxhits=20; 15 hits Index1, 40 hits Index2
            Results are merged from each index and sorted
            (rank is the default sort) and only the first
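
The three MaxHits behaviours can be compared side by side. The following is a
minimal Python simulation under stated assumptions, not Swish-e code: hits are
modeled as (rank, index name) tuples, the rank values are invented, and the
function names are hypothetical.

```python
def maxhits_1_3_2(indexes, n):
    """First n results overall, filling from the first index onward."""
    out = []
    for hits in indexes:
        out.extend(hits[: n - len(out)])
    return out


def maxhits_2_0_0(indexes, n):
    """First n results from each index, concatenated."""
    out = []
    for hits in indexes:
        out.extend(hits[:n])
    return out


def maxhits_2_2_0(indexes, n):
    """Merge all hits, sort by rank (highest first), keep the first n."""
    merged = [hit for hits in indexes for hit in hits]
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged[:n]


# Invented sample data: 15 hits in Index1 and 40 hits in Index2.
INDEX1 = [(1000 - 10 * i, "Index1") for i in range(15)]
INDEX2 = [(995 - 10 * i, "Index2") for i in range(40)]

if __name__ == "__main__":
    for func in (maxhits_1_3_2, maxhits_2_0_0, maxhits_2_2_0):
        print(func.__name__, len(func([INDEX1, INDEX2], 20)))
```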

=item * New B<prog> document source indexing method

You can now use -S prog to use an external program to supply documents
to Swish-e. This external program can be used to spider web servers,
index databases, or to convert any type of document into html, xml,
or text, so it can be indexed by Swish-e. Examples are given in the
C<prog-bin> directory.
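
As a rough illustration of what such an external program writes, here is a
minimal Python sketch. It assumes the header names Path-Name and
Content-Length as recalled from the SWISH-RUN documentation (verify them
there before relying on this); Document-Type is the header mentioned
elsewhere in this file. The function name format_document and the sample path
are hypothetical.

```python
import sys


def format_document(path, content, doc_type="HTML*"):
    """Build one document record: headers, a blank line, then the body."""
    body = content.encode("utf-8")
    headers = (
        "Path-Name: %s\n"
        "Content-Length: %d\n"  # length of the body in bytes, not characters
        "Document-Type: %s\n"
        "\n" % (path, len(body), doc_type)
    )
    return headers.encode("utf-8") + body


if __name__ == "__main__":
    record = format_document("/docs/example.html", "<html><body>hello</body></html>")
    sys.stdout.buffer.write(record)
```

Counting bytes rather than characters matters here because the body may
contain multi-byte characters.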

=item * The indexing parser was rewritten to be more logical.

TranslateCharacters now is done before WordCharacters is checked. For example,

    WordCharacters abcdefghijklmnopqrstuvwxyz
    TranslateCharacters ñ n

Now C<El Niño> will be indexed as El Nino (el and nino), even though C<ñ>
is not listed in WordCharacters.
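
The ordering can be sketched in a few lines of Python. This is an
illustration only, not the Swish-e parser: it assumes words are lowercased
first, and the names WORD_CHARACTERS, TRANSLATE, and index_words are
hypothetical.

```python
import re

WORD_CHARACTERS = "abcdefghijklmnopqrstuvwxyz"
TRANSLATE = str.maketrans({"\u00f1": "n"})  # n with tilde -> n


def index_words(text):
    """Translate characters first, then split on the word-character set."""
    translated = text.lower().translate(TRANSLATE)
    pattern = "[" + re.escape(WORD_CHARACTERS) + "]+"
    return re.findall(pattern, translated)


if __name__ == "__main__":
    print(index_words("El Ni\u00f1o"))
```

If the split were applied before the translation, the untranslated character
would act as a word boundary and the second word would be broken in two.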

Previously, stopwords were checked after stemming and soundex conversions,
as well as most of the other word checks (WordCharacters, min/max length
and so on). This meant that the stopword list probably didn't work as
expected when using stemming.

=item * The search parser was rewritten to be more logical

The search parser was rewritten to correct a number of logic errors.
Swish-e did not differentiate between meta names, Swish-e operators
and search words when parsing the query. This meant, for example,
that metanames might be broken up by the WordCharacters setting, and
that they could be stemmed.

Swish-e operator characters C<"*()=> can now be searched by escaping
with a backslash. For example:

    ./swish-e -w 'this\=odd\)word'

will end up searching for the word C<this=odd)word>. To search for a
backslash character precede it with a backslash.

Currently, searching for:

    ./swish-e -w 'this\*'

is the same as a wildcard search. This may be fixed in the future.

Searching for buzzwords with those characters will still require
backslashing. This also may change to allow some un-escaped operator
characters, but some will always need to be escaped (e.g. the double-quote

=item * Quotes and Backslash escapes in strings

A bug was fixed in the C<parse_line()> function (in F<string.c>) where
backslashes were not escaping the next character. C<parse_line()> is used
to parse a string of text into tokens (words). Normally splitting is done
at whitespace. You may use quotes (single or double) to define a string
(that might include whitespace) as a single parameter. The backslash
can also be used to escape the following character when *within* quotes
(e.g. to escape an embedded quote character).

    ReplaceRules append "foo bar"   <- define "foo bar" as a single word
    ReplaceRules append "foo\"bar"  <- escape the quotes
    ReplaceRules append 'foo"bar'   <- same thing
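
The tokenizing rules described above can be approximated as follows. This is
a minimal Python sketch written for illustration, not a port of the C
function in F<string.c>; it assumes a backslash outside quotes is kept
literally, which the text above does not specify.

```python
def parse_line(line):
    """Split a line into tokens, honoring quotes and in-quote backslashes."""
    tokens, current = [], []
    quote = None        # the quote character we are inside, if any
    in_token = False    # True once the current token has started
    chars = iter(line)
    for ch in chars:
        if quote:
            if ch == "\\":
                # Inside quotes a backslash escapes the next character.
                current.append(next(chars, ""))
            elif ch == quote:
                quote = None
            else:
                current.append(ch)
        elif ch in "\"'":
            quote, in_token = ch, True
        elif ch.isspace():
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
    if in_token:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(parse_line('ReplaceRules append "foo bar"'))
    print(parse_line(r'ReplaceRules append "foo\"bar"'))
    print(parse_line("ReplaceRules append 'foo\"bar'"))
```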

=item * Example C<user.config> file removed.

Previous versions of Swish-e included a configuration file called
C<user.config> which contained examples of all directives. This has
been replaced by a series of example configuration files located in the
C<conf> directory. The configuration directives are now described in
L<SWISH-CONFIG|SWISH-CONFIG>.

=item * Ports to Win32 and VMS

David Norris has included the files required to build Swish-e under
Windows. See C<src/win32>. A self-extracting Windows version is
available from the Download page of the swish-e.org web site.

Jean-François Piéronne has provided the files required to build Swish-e
under OpenVMS. See C<src/vms> for more information.

=item * String properties are concatenated

Multiple I<string> properties of the same name in a document are now
concatenated into one property. A space character is added between
the strings if needed. A warning will be generated if multiple numeric
or date properties are found in the same document, and the additional
properties will be ignored.

Previously, properties of the same name were added to the index, but
could not be retrieved.
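
The concatenation rule for string properties might look like the following
Python sketch. It is an illustration under an explicit assumption: a space is
treated as "needed" only when neither string already has whitespace at the
join. The function name add_string_property is hypothetical.

```python
def add_string_property(existing, new):
    """Append a repeated string property to the value collected so far."""
    if not existing:
        return new
    # Assumption: skip the separator if whitespace is already present
    # on either side of the join.
    if existing[-1:].isspace() or new[:1].isspace():
        return existing + new
    return existing + " " + new


if __name__ == "__main__":
    value = None
    for part in ("first author", "second author"):
        value = add_string_property(value, part)
    print(value)
```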

To do: remove the C<next> pointer, and allow user-defined character to
place between properties.

=item * regex type added to ReplaceRules

A more general purpose pattern replacement syntax.

Swish-e's XML parser was replaced with James Clark's expat XML parser

Swish-e can now use Daniel Veillard's libxml2 library for parsing HTML and
XML. This requires installation of the library before building Swish-e.
See the L<INSTALL|INSTALL> document for information. libxml2 is not
required, but is strongly recommended for parsing HTML over Swish-e's
internal HTML parser, and provides more features for both HTML and

=item * Support for zlib

Swish-e can be compiled with zlib. This is useful for compressing large
properties. Building Swish-e with zlib is strongly recommended if you
use its C<StoreDescription> feature.

=item * LST type of document no longer supported

LST allowed indexing of files that contained multiple documents.

=item * Temporary files

To improve security Swish-e now uses the C<mkstemp(3)> function to
create temporary files. Temporary files are used while indexing only.
This may result in some portability issues, but the security issues

(Currently this does not apply to the -S http indexing method.)

C<mkstemp> opens the temporary file with O_EXCL|O_CREAT flags. This prevents
overwriting existing files. In addition, the name of the file created
is a lot harder to guess by attackers. The temporary file is created
with only owner permissions.
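
The same guarantees can be observed from Python, whose tempfile.mkstemp
follows the same approach (exclusive creation, unpredictable name, owner-only
permissions on POSIX systems). This sketch illustrates the behaviour only and
is not Swish-e code; the "swtmp" prefix mirrors the one described in this
file, and the helper name make_temp_file is hypothetical.

```python
import os
import stat
import tempfile


def make_temp_file(directory=None):
    """Create a temp file exclusively; return (file descriptor, path)."""
    # mkstemp opens with O_CREAT | O_EXCL, so an existing file is never
    # reused, and adds random characters after the prefix.
    return tempfile.mkstemp(prefix="swtmp", dir=directory)


if __name__ == "__main__":
    fd, path = make_temp_file()
    try:
        mode = stat.S_IMODE(os.fstat(fd).st_mode)
        print(path, oct(mode))
    finally:
        os.close(fd)
        os.unlink(path)
```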

Please report any portability issues on the Swish-e discussion list.

=item * Temporary file locations

Swish-e now uses the environment variables C<TMPDIR>, C<TMP>, and
C<TEMP> (in that order) to decide where to write temporary files.
The configuration setting of L<TmpDir|SWISH-CONFIG/"item_TmpDir"> will
be used if none of the environment variables are set. Swish-e uses the
current directory otherwise; there is no default temporary directory.

Since the environment variables override the configuration settings,
a warning will be issued if you set L<TmpDir|SWISH-CONFIG/"item_TmpDir">
in the configuration file and there's also an environment variable set.
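
The lookup order can be summarized in code. This is a minimal Python sketch
of the rules as described above, not the Swish-e implementation; the function
name resolve_tmp_dir and the wording of the warning are hypothetical.

```python
import os


def resolve_tmp_dir(environ, config_tmpdir=None):
    """Return (directory, warning) following the documented lookup order."""
    for name in ("TMPDIR", "TMP", "TEMP"):
        value = environ.get(name)
        if value:
            warning = None
            if config_tmpdir:
                # The environment wins, so the config setting is ignored.
                warning = "TmpDir setting ignored because %s is set" % name
            return value, warning
    if config_tmpdir:
        return config_tmpdir, None
    # No environment variable and no TmpDir: use the current directory.
    return os.curdir, None


if __name__ == "__main__":
    print(resolve_tmp_dir(os.environ, config_tmpdir=None))
```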

Temporary files begin with the letters "swtmp" (which can be changed in
F<config.h>), followed by two or more letters that indicate the type of
temporary file, and some random characters to complete the file name.
If indexing is aborted for some reason you may find these temporary

=item * New Fuzzy indexing method Double Metaphone

Based on Lawrence Philips' Metaphone algorithm, adds two
new methods of creating a fuzzy index (in addition to Stemming and Soundex).

Changes to Configuration File Directives. Please see
L<SWISH-CONFIG|SWISH-CONFIG> for more info.

=item * New directives: IndexContents and DefaultContents

The IndexContents directive assigns internal Swish-e document parsers
to files based on their file type. The DefaultContents directive
assigns a parser to be used on files that are not assigned a parser with

=item * New directive: UndefinedMetaTags [error|ignore|index|auto]

This describes what to do when a meta tag is found in a document that
is not listed in the MetaNames directive.

=item * New directive: IgnoreTags

Will ignore text with the listed tags.

=item * New directive: SwishProgParameters *list of words*

Passes words listed to the external Swish-e program when running with
C<-S prog> document source method.

=item * New directive: ConvertHTMLEntities [yes|no]

Controls parsing and conversion of HTML entities.

=item * New directive: DontBumpPositionOnMetaTags

The word position is now bumped when a new metatag is found -- this is
to prevent phrases from matching across meta tags. This directive will
disable this behavior for the listed tags.

This directive works for HTML and XML documents.

=item * Changed directive: IndexComments

This has been changed such that comments are not indexed by default.

=item * Changed directive: IgnoreWords

The builtin list of stopwords has been removed. Use of the SwishDefault
word will generate a warning, and no stop words will be used. You must
now specify a list of stopwords, or specify a file of stopwords.

A sample file C<stopwords.txt> has been included in the F<conf/stopwords>
directory of the distribution, and can be used by the directive:

    IgnoreWords File: /path/to/stopwords.txt

=item * Change of the default for IgnoreTotalWordCountWhenRanking

The default is now "yes".

=item * New directive: Buzzwords

Buzzwords are words that should be indexed as-is, without checking
for stopwords, word length, WordCharacters, or any other of the word
limiting features. This allows indexing of things like C<C++> when "+"
is not listed in WordCharacters.

Currently, IgnoreFirstChar and IgnoreLastChar will be stripped before
processing Buzzwords.

In the future we may use separate IgnoreFirst/Last settings for buzzwords
since, for example, you may wish to index all C<+> within Swish-e words,
but strip C<+> from the start/end of Swish-e words, but not from the

=item * New directives: PropertyNamesNumeric PropertyNamesDate

Before Swish-e 2.2 all user-defined document properties were stored in
the index as strings. PropertyNamesNumeric and PropertyNamesDate tell
it that a property should be stored in binary format. This allows
for correct sorting of numeric properties.

Currently, only integers can be stored, such as a unix timestamp. (Swish-e
uses C<strtoul> to convert the number to an unsigned long internally.)

PropertyNamesDate only indicates to Swish-e that a number is a unix
timestamp, and to display the property as a formatted time when printing
results. Swish does not currently parse date strings; you must provide

=item * New directive: MetaNameAlias

You may now create alias names for MetaNames. This allows you to map or
group multiple names to the same MetaName.

=item * New directive: PropertyNameAlias

Creates aliases for a PropertyName.

=item * New directive: PropertyNamesMaxLength

Sets the max length of a text property.

=item * New directive: HTMLLinksMetaName

Defines a metaname to use for indexing href links in HTML documents.
Available only with libxml2 parser.

=item * New directive: ImageLinksMetaName

Defines a metaname to use for indexing src links in <img> tags.
Allows you to search image pathnames within HTML pages. Available only

=item * New directive: IndexAltTagMetaName

Allows indexing of image ALT tags. Only available when using the libxml2 parser.

=item * New directive: AbsoluteLinks

Attempts to convert relative links indexed with HTMLLinksMetaName and
ImageLinksMetaName to absolute links. Available only with libxml2 parser.

=item * New directive: ExtractPath

Allows you to use a regular expression to extract out part of the path
of each file and index it with a meta name. For example, this allows
searches to be limited to parts of your file tree.

=item * New directive: FileMatch

FileMatch is similar to FileRules. Where FileRules is used to exclude
files and directories, FileMatch is used to I<include> files.

=item * New directive: PreSortedIndex

Controls which properties are pre-sorted while indexing. All properties
are sorted by default.

=item * New directive: ParserWarnLevel

Sets the level of warning printed when using libxml2.

=item * New directive: obeyRobotsNoIndex [yes|NO]

When using libxml2 to parse HTML, Swish-e will skip files marked as

    <meta name="robots" content="noindex">

Also, comments may be used within HTML and XML source docs to block sections of
content from indexing:

    <!-- SwishCommand noindex -->
    <!-- SwishCommand index -->

and/or these may be used also:

=item * New directive: UndefinedXMLAttributes

This describes how the content of XML attributes should be indexed,
if at all. This is similar to UndefinedMetaTags, but is only for XML
attributes and when parsed by libxml2. The default is to not index

=item * New directive: XMLClassAttributes

XMLClassAttributes can specify a list of attribute names whose content
is combined with the element name to form metanames.

=item * New directive: PropCompressionLevel [0-9]

If compiled with zlib, Swish-e uses this setting to control the level
of compression applied to properties. Properties must be long enough
(defined in config.h) to be compressed. Useful for StoreDescription.

=item * Experimental directive: IgnoreNumberChars

Defines a set of characters. If a word is made of *only* those
characters the word will not be indexed.
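
The check amounts to a set-membership test over the characters of each word.
A minimal Python sketch follows; it is an illustration, not Swish-e code, and
the function name is_ignored and the sample character set are hypothetical.

```python
def is_ignored(word, ignore_number_chars):
    """Return True when every character of word is in the ignore set."""
    # all() is True for an empty string, so require a non-empty word.
    return bool(word) and all(ch in ignore_number_chars for ch in word)


if __name__ == "__main__":
    chars = "0123456789."
    for w in ("123", "1.5", "3com"):
        print(w, is_ignored(w, chars))
```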

=item * New directive: FuzzyIndexingMode

This configuration directive is used to define the type of "fuzzy" index to create.
Currently the options are:

Changes to command line arguments. See L<SWISH-RUN|SWISH-RUN> for
documentation on these switches.

=item * New command line argument C<-H>

Controls the level (verbosity) of header information printed with

=item * New command line argument C<-x>

Provides additional header output and allows for a I<format string>
to describe what data to print.

=item * New command line argument C<-k>

Prints words stored in the Swish-e index.

=item * New command line argument C<-N>

Provides a way to do incremental indexing by comparing last modification
dates. You pass C<-N> a path to a file and only files newer than the
last modified date of that file will be indexed.
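
The selection rule behind C<-N> is a plain modification-time comparison. The
following Python sketch shows the idea only and is not how Swish-e implements
it; the function name files_newer_than is hypothetical.

```python
import os


def files_newer_than(reference, candidates):
    """Return the candidate paths modified after the reference file."""
    cutoff = os.path.getmtime(reference)
    # Keep only files whose last-modified time is strictly newer.
    return [path for path in candidates if os.path.getmtime(path) > cutoff]
```

A typical use would pass the previous index file as the reference so that
only documents changed since the last indexing run are selected.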

=item * Removed command line argument C<-D>

C<-D> no longer dumps the index file data. Use C<-T> instead.

=item * New command line argument C<-T>

C<-T> is used for debugging indexing and searching.

=item * Enhanced command line argument C<-d>

Now C<-d> can accept some back-slashed characters to be used as output

=item * Enhanced command line argument C<-P>

Now -P sets the phrase delimiter character in searches.

=item * New command line argument C<-L>

Swish-e 2.2 contains an B<experimental> feature to limit results by a
range of property values. The behavior of this feature may change in

=item * Modified command line argument C<-v>

Now the argument C<-v 0> results in *no* output unless there is an error.
This is a bit more handy when indexing with cron.