.\" Hey, EMACS: -*- nroff -*-
.\" First parameter, NAME, should be all caps
.\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
.\" other parameters are allowed: see man(7), man(1)
.TH ESTCMD 1 "2005-06-04" "Man Page" "HyperEstraier"
.\" Please adjust this date whenever revising the manpage.
.\" Some roff macros, for reference:
.\" .nh disable hyphenation
.\" .hy enable hyphenation
.\" .ad l left justify
.\" .ad b justify to both left and right margins
.\" .nf disable filling
.\" .fi enable filling
.\" .br insert line break
.\" .sp <n> insert n+1 empty lines
.\" for manpage-specific macros, see man(7)
estcmd \- indexing and searching
.RI "[-cl] " db " [" file ]
.RI "[-cl] " db " " expr
.RI "[-cl] " db " " expr " " name " [" value "]"
.RI "" db " [" name " [" value ]]
.RI "[-onp] [-ond] " db
.RI "[-ic " enc "] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-kn " num "] [-gs|-gf|-ga] [-cd] [-ni] [-sf] [-hs] [-attr " expr "] [-ord " expr "] [-max " num "] [-sim " id "] " db " [" phrase ]
.RI "[-cl] [-fe|-ft|-fh|-fm] [-fx " sufs " " cmd "] [-fz] [-fo] [-rm " sufs "] [-ic " enc "] [-il " lang "] [-bc] [-pc " enc "] [-px " name "] [-apn] [-sd] [-cm] [-cs " num "] " db " [" file|dir ]
.RI "[-cl] [-fc] " db " [" prefix ]
.RI "[-fc] [-dfdb " file "] [-ni] [-kn " num "] " db " [" prefix ]
.RI "[-dfdb " file "] " db
.RI "[-ft|-fh|-fm] [-ic " enc "] [-il " lang "] [" file ]
.RI "[-ic " enc "] [-il " lang "] [-apn] [-wt] [" file ]
.RI "[-ren|-rla|-reu|-ror|-rjp|-rch] [-cs " num "] " db " " dnum
This manual page documents briefly the
.\" TeX users may be more comfortable with the \fB<whatever>\fP and
.\" \fI<whatever>\fP escape sequences to invoke bold face and italics,
\fBestcmd\fP is a program that can do not only indexing
.SH SUBCOMMANDS AND OPTIONS
\fBestcmd\fP is an aggregation of subcommands. The name of a subcommand is
specified by the first argument. Other arguments are parsed according to each
subcommand. The argument \fIdb\fP specifies the path of an index.
All subcommands return 0 if the operation succeeds and 1 otherwise. The
put, out, gather, purge, randput, wicked, and regression subcommands close
the database before exiting when they catch signal 1 (SIGHUP), 2 (SIGINT),
3 (SIGQUIT), 13 (SIGPIPE), or 15 (SIGTERM).
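.PP
A script that calls estcmd can branch on this exit status. The following is
a minimal sketch, not part of Hyper Estraier: run_checked is a hypothetical
helper name, and the index name casket in the usage comment is only an example.
.PP
.nf
```shell
# Hypothetical helper (an assumption, not shipped with Hyper Estraier):
# run any command and report whether it exited with status 0.
run_checked() {
  if "$@"; then
    echo "ok: $1"
  else
    status=$?
    echo "failed with status $status: $1" >&2
    return "$status"
  fi
}

# Usage, assuming an index named casket exists:
#   run_checked estcmd optimize casket || exit 1
```
.fi
.PP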
The encoding name specified by the \fB-ic\fP option should be a name registered
with the IETF, such as UTF-8 or ISO-8859-1. The language name specified by the
\fB-il\fP option should be one of "en" (English), "ja" (Japanese), "zh" (Chinese),
or "ko" (Korean).
An outer command specified by the \fB-fx\fP option of gather receives the
path of the target document as its first argument and the path for its
output as its second argument.
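.PP
As a sketch of this calling convention, the following hypothetical filter
strips markup tags with a naive sed expression and writes the remaining
text to the output path. The function name strip_tags_filter is an
assumption made for illustration. A real filter would be an executable
script passed to -fx, for example with the "T@" prefix so that its output
is treated as plain text.
.PP
.nf
```shell
# Hypothetical outer command for gather -fx (illustration only).
# $1 = path of the target document, $2 = path for output.
strip_tags_filter() {
  in_path=$1
  out_path=$2
  # Naive tag removal: delete anything between < and > on the same line.
  sed 's/<[^>]*>//g' "$in_path" > "$out_path"
}

# When saved as a script and called with two arguments, run the filter.
if [ "$#" -eq 2 ]; then
  strip_tags_filter "$1" "$2"
fi
```
.fi
.PP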
A summary of options is included below.
For a complete description, see /usr/share/doc/hyperestraier/uguide-en.html.
.RI "[-cl] " db " [" file ]
Register a document in document draft format into an index.
\fIfile\fP specifies a target file. If it is omitted, the standard
If \fB\-cl\fP is specified, regions of an overwritten document are cleaned up.
.RI "[-cl] " db " " expr
Remove information of a document from an index.
\fIexpr\fP specifies the ID number or the URI of a document.
If \fB\-cl\fP is specified, regions of the document are cleaned up.
.RI "[-cl] " db " " expr " " name " [" value "]"
Edit an attribute of a document in an index.
\fIexpr\fP specifies the ID number or the URI of a document.
\fIname\fP specifies the name of an attribute.
\fIvalue\fP specifies the value of the attribute. If it is omitted, the attribute is removed.
Output the document draft of a document in an index.
\fIexpr\fP specifies the ID number or the URI of a document.
Output a list of all documents in an index.
Output the ID number of a document specified by URI.
\fIuri\fP specifies the URI of a document.
.RI "" db " [" name " [" value ]]
\fIname\fP specifies the name of a piece of meta data. If it is omitted, a list
of all names is output.
\fIvalue\fP specifies the value of the meta data to be recorded. If it is
omitted, the current value is output. If it is an empty string, the meta
Output the number of documents and the number of unique words in an index.
.RI "[-onp] [-ond] " db
Optimize an index and clean up dispensable regions.
If \fB-onp\fP is specified, cleaning up dispensable regions is skipped.
If \fB-ond\fP is specified, optimizing the database files is skipped.
.RI "[-ic " enc "] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-kn " num "] [-gs|-gf|-ga] [-cd] [-ni] [-sf] [-hs] [-attr " expr "] [-ord " expr "] [-max " num "] [-sim " id "] " db " [" phrase ]
Search an index for documents.
\fIphrase\fP specifies the search phrase.
\fB-ic\fP specifies the input encoding. By default, it is UTF-8.
If \fB-vu\fP is specified, ID numbers and URIs are output as TSV.
If \fB-va\fP is specified, multipart format including attributes is output.
If \fB-vf\fP is specified, multipart format including document draft is output.
If \fB-vs\fP is specified, multipart format including attributes and
If \fB-vh\fP is specified, human readable format including attributes and
If \fB-vx\fP is specified, XML including attributes and snippets is
If \fB-dd\fP is specified, document draft data are dumped and saved into
\fB-kn\fP specifies the number of keywords to be extracted. By default, no
keyword is extracted.
If \fB-gs\fP is specified, every key of N-gram is checked. By default, it is
If \fB-gf\fP is specified, every third key of N-gram is checked.
If \fB-ga\fP is specified, every fourth key of N-gram is checked.
If \fB-cd\fP is specified, each document is checked to confirm that it
definitely matches the search phrase.
If \fB-ni\fP is specified, TF-IDF tuning is omitted.
If \fB-sf\fP is specified, the phrase is treated as a simplified form.
If \fB-hs\fP is specified, score information is output as a hint.
\fB-attr\fP specifies an attribute search condition. This option can
be specified multiple times.
\fB-ord\fP specifies the order expression. By default, it is
\fB-max\fP specifies the maximum number of documents to show. A negative
value means unlimited. By default, it is 10.
\fB-sim\fP specifies the ID number of the seed document for similarity search.
.RI "[-cl] [-fe|-ft|-fh|-fm] [-fx " sufs " " cmd "] [-fz] [-fo] [-rm " sufs "] [-ic " enc "] [-il " lang "] [-bc] [-pc " enc "] [-px " name "] [-apn] [-sd] [-cm] [-cs " num "] " db " [" file|dir ]
Scan the local file system and register documents into an index.
If the third argument is the name of a file, a list of paths of target
documents is read from it. If it is "-", the list is read from the standard
input. If the third argument is the name of a directory, all files under the
directory are treated as target documents.
If \fB-cl\fP is specified, regions of overwritten documents are cleaned up.
If \fB-fe\fP is specified, target files are treated as document draft. By
default, the format is detected by the suffix of each document.
If \fB-ft\fP is specified, target files are treated as plain text.
If \fB-fh\fP is specified, target files are treated as HTML.
If \fB-fm\fP is specified, target files are treated as MIME.
If \fB-fx\fP is specified, target files with the specified suffixes
are processed by the specified outer command. If the command is prefixed
with "T@", the output of the command is treated as plain text. If the
command is prefixed with "H@", the output of the command is treated as
HTML. If the command is prefixed with "M@", the output of the command is
treated as MIME. Otherwise, the output is treated as document draft. This
option can be specified multiple times.
If \fB-fz\fP is specified, documents that do not match the
condition of \fB-fx\fP are ignored.
If \fB-fo\fP is specified, target files are not read. It is useful for
efficient processing by the outer command.
If \fB-rm\fP is specified, target files with the specified suffixes are
removed. "*" matches any file. This option can be specified multiple times.
\fB-ic\fP specifies the input encoding. By default, it is detected
\fB-il\fP specifies the preferred input language. By default, English
If \fB-bc\fP is specified, binary files are detected and ignored.
\fB-pc\fP specifies the encoding of file paths. By default, it is ISO-8859-1.
\fB-px\fP specifies the name of an attribute read from the list of paths.
As the list of paths can be in TSV format, the first field is treated as
the path of a target document, and the second and subsequent fields are
definitions of attribute values. \fB-px\fP specifies the name of each
value in the second and subsequent fields. This option can be specified
If \fB-apn\fP is specified, N-gram analysis is performed against
If \fB-sd\fP is specified, the creation date and the modification date
of each file are recorded as attributes.
If \fB-cm\fP is specified, documents whose modification date has never
\fB-cs\fP specifies the size of cache memory in megabytes. By
.RI "[-cl] [-fc] " db " [" prefix ]
Purge information of documents which do not exist on the file system.
If \fIprefix\fP is specified, only documents whose URIs begin with it are targeted.
If \fB-cl\fP is specified, regions of the deleted documents are cleaned up.
If \fB-fc\fP is specified, information of all target documents is deleted.
.RI "[-fc] [-dfdb " file "] [-ni] [-kn " num "] " db " [" prefix ]
Create a database of keywords extracted from documents.
If \fIprefix\fP is specified, only documents whose URIs begin
If \fB-fc\fP is specified, all target documents are processed
whether or not they have existing records.
\fB-dfdb\fP specifies an outer database of document frequency. By default,
document frequency is calculated dynamically according to the index.
If \fB-ni\fP is specified, TF-IDF tuning is omitted.
\fB-kn\fP specifies the number of keywords to be extracted.
.RI "[-dfdb " file "] " db
Output a list of all unique words and each record size which is treated as
\fB-dfdb\fP specifies an outer database where the result is stored. By default,
the result is output to the standard output as TSV. If the outer database
already exists, the value of each record is incremented.
.RI "[-ft|-fh|-fm] [-ic " enc "] [-il " lang "] [" file ]
Convert the file into document draft.
If the \fIfile\fP argument is omitted, the standard input is used.
If \fB-fh\fP is specified, target files are treated as HTML.
If \fB-fm\fP is specified, target files are treated as MIME.
\fB-ic\fP specifies the input encoding. By default, it is detected
\fB-il\fP specifies the preferred input language. By default, English
.RI "[-ic " enc "] [-il " lang "] [-apn] [-wt] [" file ]
Break plain text down into words.
If the \fIfile\fP argument is omitted, the standard input is used. If the
\fIfile\fP string starts with '@', the string following '@' is itself the target.
\fB-ic\fP specifies the input encoding. By default, it is detected
\fB-il\fP specifies the preferred input language. By default, English
If \fB-apn\fP is specified, N-gram analysis is performed against
If \fB-wt\fP is specified, the trailing 1-gram is output.
.RI "[-ren|-rla|-reu|-ror|-rjp|-rch] [-cs " num "] " db " " dnum
Show the version information.
The following registers mail files in MH format.
\fB find /home/mikio/Mail -type f | egrep 'inbox/(business|friends)/[0-9]+$' | estcmd gather -cl -fm -cm casket -\fP
The following registers MS-Office files. estfxmsotohtml requires wvWare
\fB PATH=$PATH:/usr/local/share/hyperestraier/filter ; export PATH\fP
\fB estcmd gather -cl -fx ".doc,.xls,.ppt" "H@estfxmsotohtml" -fz -sd -cm casket .\fP
The following registers PDF files. estfxpdftohtml requires pdftotext.
\fB PATH=$PATH:/usr/local/share/hyperestraier/filter ; export PATH\fP
\fB estcmd gather -cl -fx ".pdf" "H@estfxpdftohtml" -fz -sd -cm casket .\fP
The following outputs the search result as XML.
\fB estcmd search -vx -max 8 casket 'socket AND shutdown'\fP
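.PP
The TSV list of paths that gather accepts together with -px can be prepared
as in the following sketch. The file paths, the attribute names title and
author, and the values are invented for illustration, and the mapping of one
-px option per extra field, in order, is an assumption based on the
description of -px above.
.PP
.nf
```shell
# Assumption: paths, attribute values, and the index name casket are invented.
TAB=$(awk 'BEGIN { printf "%c", 9 }')   # a literal tab character
list_file=$(mktemp)
{
  echo "docs/report.txt${TAB}Quarterly report${TAB}alice"
  echo "docs/notes.txt${TAB}Meeting notes${TAB}bob"
} > "$list_file"

# Field 1 is the path; fields 2 and 3 would be named by two -px options:
#   estcmd gather -cl -ft -px title -px author casket "$list_file"
```
.fi
.PP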
estraier was written by Mikio Hirabayashi <mikio at users.sourceforge.net>.
This manual page was written by Fumitoshi UKAI <ukai@debian.or.jp>,
for the Debian project (but may be used by others).