~ubuntu-branches/ubuntu/trusty/hyperestraier/trusty-proposed

Committer: Bazaar Package Importer
Author(s): Steve Langasek
Date: 2006-11-14 05:28:32 UTC
mfrom: (2.1.4 feisty)
Revision ID: james.westby@ubuntu.com-20061114052832-0lzqzcefn8mt4yqe

Tags: 1.4.9-1.1

* Non-maintainer upload.
* High-urgency upload for RC bugfix.
* Set HOME=$(CURDIR)/junkhome when building, otherwise the package build
  will incorrectly look for headers there -- and fail when the directory
  exists and is unreadable, as happens sometimes on sudo-using
  autobuilders!

files added:
debian/libestraier8.files

doc/cguide-en.html

doc/cguide-ja.html

doc/javanativeapi/estraier/DatabaseInformer.html

doc/javanativeapi/serialized-form.html

doc/perlnativeapi

doc/perlnativeapi/index.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000037.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000038.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000039.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000040.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000041.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000042.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000043.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000044.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000045.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000046.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000047.html

doc/rubynativeapi/classes/Estraier/Database.src/M000023.html

doc/rubynativeapi/classes/Estraier/Database.src/M000024.html

doc/rubynativeapi/classes/Estraier/Database.src/M000025.html

doc/rubynativeapi/classes/Estraier/Database.src/M000026.html

doc/rubynativeapi/classes/Estraier/Database.src/M000027.html

doc/rubynativeapi/classes/Estraier/Database.src/M000028.html

doc/rubynativeapi/classes/Estraier/Document.src/M000048.html

doc/rubynativeapi/classes/Estraier/Document.src/M000049.html

doc/rubynativeapi/classes/Estraier/Document.src/M000050.html

doc/rubynativeapi/classes/Estraier/Document.src/M000051.html

doc/rubynativeapi/classes/Estraier/Document.src/M000052.html

doc/rubynativeapi/classes/Estraier/Document.src/M000053.html

doc/rubynativeapi/classes/Estraier/Document.src/M000054.html

doc/rubynativeapi/classes/Estraier/Document.src/M000055.html

doc/rubynativeapi/classes/Estraier/Document.src/M000056.html

doc/rubynativeapi/classes/Estraier/Document.src/M000057.html

doc/rubynativeapi/classes/Estraier/Document.src/M000058.html

doc/rubynativeapi/classes/Estraier/Document.src/M000059.html

doc/rubynativeapi/classes/Estraier/Document.src/M000060.html

doc/rubynativeapi/classes/Estraier/Document.src/M000061.html

doc/rubynativeapi/classes/Estraier/Document.src/M000062.html

doc/rubynativeapi/classes/Estraier/Result.src/M000029.html

doc/rubynativeapi/classes/Estraier/Result.src/M000030.html

doc/rubynativeapi/classes/Estraier/Result.src/M000031.html

doc/rubynativeapi/classes/Estraier/Result.src/M000032.html

doc/rubynativeapi/classes/Estraier/Result.src/M000033.html

doc/rubynativeapi/classes/Estraier/Result.src/M000034.html

doc/rubynativeapi/classes/Estraier/Result.src/M000035.html

doc/rubynativeapi/classes/Estraier/Result.src/M000036.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000048.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000049.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000050.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000051.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000052.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000053.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000054.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000055.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000056.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000057.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000058.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000059.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000060.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000061.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000062.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000063.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000064.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000065.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000066.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000067.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000068.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000069.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000070.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000071.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000072.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000073.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000074.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000075.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000027.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000028.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000029.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000030.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000031.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000032.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000033.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000034.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000035.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000036.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000037.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000038.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000039.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000040.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000041.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000042.html

estbutler.c

estfraud.c

estfraud.conf

estproxy.c

estproxy.conf

estseek.help

estwaver.c

filter/estfxxdwtotxt

filter/estwnetxpnd

javanative/DatabaseInformer.java

javanative/estraier_Result.h

javanative/result.c

lab/estmgtest

lab/relwords.cgi

lab/sizecheck

lab/wpxmltoest

locale/ja/estseek.help

man/estwaver.1

misc/data004.est

misc/data005.est

misc/echigo.est

misc/francais.txt

misc/rights.txt

misc/validurl.txt

mymorph.c

mymorph.h

perlnative

perlnative/Makefile.in

perlnative/configure

perlnative/configure.in

perlnative/estcmd.pl

perlnative/example

perlnative/example/Makefile

perlnative/example/example001.pl

perlnative/example/example002.pl

perlnative/src

perlnative/src/Estraier.pm

perlnative/src/Estraier.pod

perlnative/src/Estraier.xs

perlnative/src/MANIFEST

perlnative/src/Makefile.PL

wavermod.c

wavermod.h

files removed:
debian/estcall.1

debian/estcmd.1

debian/estconfig.1

debian/estmaster.1

debian/libestraier7.files

debian/shlibs.local

doc/intro-en.html~

doc/intro-ja.html~

doc/rubynativeapi/classes/Estraier/Condition.src/M000028.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000029.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000030.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000031.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000032.html

doc/rubynativeapi/classes/Estraier/Condition.src/M000033.html

doc/rubynativeapi/classes/Estraier/Document.src/M000034.html

doc/rubynativeapi/classes/Estraier/Document.src/M000035.html

doc/rubynativeapi/classes/Estraier/Document.src/M000036.html

doc/rubynativeapi/classes/Estraier/Document.src/M000037.html

doc/rubynativeapi/classes/Estraier/Document.src/M000038.html

doc/rubynativeapi/classes/Estraier/Document.src/M000039.html

doc/rubynativeapi/classes/Estraier/Document.src/M000040.html

doc/rubynativeapi/classes/Estraier/Document.src/M000041.html

doc/rubynativeapi/classes/Estraier/Document.src/M000042.html

doc/rubynativeapi/classes/Estraier/Document.src/M000043.html

doc/rubynativeapi/classes/Estraier/Document.src/M000044.html

doc/rubynativeapi/classes/Estraier/Result.src/M000023.html

doc/rubynativeapi/classes/Estraier/Result.src/M000024.html

doc/rubynativeapi/classes/Estraier/Result.src/M000025.html

doc/rubynativeapi/classes/Estraier/Result.src/M000026.html

doc/rubynativeapi/classes/Estraier/Result.src/M000027.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000037.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000038.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000039.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000040.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000041.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000042.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000048.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000049.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000050.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000051.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000052.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000053.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000054.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000055.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000056.html

doc/rubypureapi/classes/EstraierPure/Document.src/M000057.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000027.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000028.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000029.html

doc/rubypureapi/classes/EstraierPure/NodeResult.src/M000030.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000031.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000032.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000033.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000034.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000035.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.src/M000036.html

files modified:
ChangeLog

Makefile.in

THANKS

config.guess

config.sub

configure

configure.in

debian/changelog

debian/control

debian/hyperestraier.init

debian/hyperestraier.manpages

debian/hyperestraier.postinst

debian/hyperestraier.preinst

debian/libestraier-dev.manpages

debian/rules

doc/coreframe.png

doc/index.html

doc/index.ja.html

doc/intro-en.html

doc/intro-ja.html

doc/javanativeapi/allclasses-frame.html

doc/javanativeapi/allclasses-noframe.html

doc/javanativeapi/constant-values.html

doc/javanativeapi/estraier/Cmd.html

doc/javanativeapi/estraier/Condition.html

doc/javanativeapi/estraier/Database.html

doc/javanativeapi/estraier/Document.html

doc/javanativeapi/estraier/Result.html

doc/javanativeapi/estraier/package-frame.html

doc/javanativeapi/estraier/package-summary.html

doc/javanativeapi/estraier/package-tree.html

doc/javanativeapi/index-all.html

doc/javanativeapi/index.html

doc/javanativeapi/overview-summary.html

doc/javanativeapi/overview-tree.html

doc/javanativeapi/packages.html

doc/javapureapi/allclasses-frame.html

doc/javapureapi/allclasses-noframe.html

doc/javapureapi/constant-values.html

doc/javapureapi/estraier/pure/Call.html

doc/javapureapi/estraier/pure/Condition.html

doc/javapureapi/estraier/pure/Document.html

doc/javapureapi/estraier/pure/Node.html

doc/javapureapi/estraier/pure/NodeResult.html

doc/javapureapi/estraier/pure/ResultDocument.html

doc/javapureapi/estraier/pure/package-frame.html

doc/javapureapi/estraier/pure/package-summary.html

doc/javapureapi/estraier/pure/package-tree.html

doc/javapureapi/index-all.html

doc/javapureapi/index.html

doc/javapureapi/overview-summary.html

doc/javapureapi/overview-tree.html

doc/javapureapi/packages.html

doc/logo.png

doc/nguide-en.html

doc/nguide-ja.html

doc/pguide-en.html

doc/pguide-ja.html

doc/rubynativeapi/classes/Estraier/Condition.html

doc/rubynativeapi/classes/Estraier/Database.html

doc/rubynativeapi/classes/Estraier/Database.src/M000001.html

doc/rubynativeapi/classes/Estraier/Database.src/M000002.html

doc/rubynativeapi/classes/Estraier/Database.src/M000003.html

doc/rubynativeapi/classes/Estraier/Database.src/M000004.html

doc/rubynativeapi/classes/Estraier/Database.src/M000005.html

doc/rubynativeapi/classes/Estraier/Database.src/M000006.html

doc/rubynativeapi/classes/Estraier/Database.src/M000007.html

doc/rubynativeapi/classes/Estraier/Database.src/M000008.html

doc/rubynativeapi/classes/Estraier/Database.src/M000009.html

doc/rubynativeapi/classes/Estraier/Database.src/M000010.html

doc/rubynativeapi/classes/Estraier/Database.src/M000011.html

doc/rubynativeapi/classes/Estraier/Database.src/M000012.html

doc/rubynativeapi/classes/Estraier/Database.src/M000013.html

doc/rubynativeapi/classes/Estraier/Database.src/M000014.html

doc/rubynativeapi/classes/Estraier/Database.src/M000015.html

doc/rubynativeapi/classes/Estraier/Database.src/M000016.html

doc/rubynativeapi/classes/Estraier/Database.src/M000017.html

doc/rubynativeapi/classes/Estraier/Database.src/M000018.html

doc/rubynativeapi/classes/Estraier/Database.src/M000019.html

doc/rubynativeapi/classes/Estraier/Database.src/M000020.html

doc/rubynativeapi/classes/Estraier/Database.src/M000021.html

doc/rubynativeapi/classes/Estraier/Database.src/M000022.html

doc/rubynativeapi/classes/Estraier/Document.html

doc/rubynativeapi/classes/Estraier/Result.html

doc/rubynativeapi/created.rid

doc/rubynativeapi/files/estraier-doc_rb.html

doc/rubynativeapi/fr_method_index.html

doc/rubypureapi/classes/EstraierPure/Condition.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000043.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000044.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000045.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000046.html

doc/rubypureapi/classes/EstraierPure/Condition.src/M000047.html

doc/rubypureapi/classes/EstraierPure/Document.html

doc/rubypureapi/classes/EstraierPure/Node.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000001.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000002.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000003.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000004.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000005.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000006.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000007.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000008.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000009.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000010.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000011.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000012.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000013.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000014.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000015.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000016.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000017.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000018.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000019.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000020.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000021.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000022.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000023.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000024.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000025.html

doc/rubypureapi/classes/EstraierPure/Node.src/M000026.html

doc/rubypureapi/classes/EstraierPure/NodeResult.html

doc/rubypureapi/classes/EstraierPure/ResultDocument.html

doc/rubypureapi/created.rid

doc/rubypureapi/files/estraierpure_rb.html

doc/rubypureapi/fr_method_index.html

doc/uguide-en.html

doc/uguide-ja.html

estcall.c

estcmd.c

estload.c

estmaster.c

estmtdb.c

estmtdb.h

estmttest.c

estnode.c

estnode.h

estraier.c

estraier.h

estraier.idl

estresult.dtd

estseek.c

estseek.conf

estseek.tmpl

estseek.top

estwolefind

filter/estfxasis

filter/estfxmantotxt

filter/estfxmsotohtml

filter/estfxpdftohtml

hyperestraier.pc.in

javanative/Cmd.java

javanative/Condition.java

javanative/Database.java

javanative/Document.java

javanative/Makefile.in

javanative/Result.java

javanative/Utility.java

javanative/condition.c

javanative/configure

javanative/configure.in

javanative/database.c

javanative/document.c

javanative/estraier_Condition.h

javanative/estraier_Database.h

javanative/estraier_Document.h

javanative/myconf.c

javanative/myconf.h

javanative/overview.html

javapure/Call.java

javapure/Condition.java

javapure/Document.java

javapure/Makefile.in

javapure/Node.java

javapure/NodeResult.java

javapure/ResultDocument.java

javapure/Utility.java

javapure/configure

javapure/configure.in

javapure/example/Makefile

javapure/overview.html

lab/diffcheck

lab/estdiet

lab/estndgather

lab/gencert

lab/objtoc

lab/searchlist

lab/stepcount

lab/tabcheck

locale/ja/estseek.conf

locale/ja/estseek.tmpl

locale/ja/estseek.top

man/estcall.1

man/estcmd.1

man/estconfig.1

man/estmaster.1

man/estnode.3

man/estraier.3

mastermod.c

mastermod.h

misc/chars.txt

misc/lang-de.html

misc/lang-zh.html

misc/mymemo-ja.html

misc/test001.est

misc/test003.est

misc/test004.est

misc/test005.est

misc/test009.html

misc/test014.eml

myconf.h

rubynative/configure

rubynative/configure.in

rubynative/estcmd.rb

rubynative/estraier-doc.rb

rubynative/example/Makefile

rubynative/overview

rubynative/src/estraier.c

rubynative/src/extconf.rb

rubypure/Makefile.in

rubypure/configure

rubypure/configure.in

rubypure/estcall.rb

rubypure/estraierpure.rb

rubypure/example/Makefile

rubypure/overview

windows/README-en.txt

windows/README-ja.txt

windows/scmutil.c


doc/cguide-en.html

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>

<title>Crawler Guide of Hyper Estraier Version 1</title>

</head>

<body>

<h1>Crawler Guide</h1>

<div class="note">Last Update: Mon, 11 Sep 2006 21:41:45 +0900</div>

<div class="navi">[<span class="void">English</span>/<a href="cguide-ja.html" hreflang="ja">Japanese</a>] [<a href="index.html">HOME</a>]</div>

<hr />

<h2 id="tableofcontents">Table of Contents</h2>

<ol>

<li><a href="#introduction">Introduction</a></li>

<li><a href="#tutorial">Tutorial</a></li>

<li><a href="#estwaver">Crawler Command</a></li>

</ol>

<hr />

<h2 id="introduction">Introduction</h2>

<p>This guide describes the usage of the web crawler of Hyper Estraier. If you haven't read the <a href="uguide-en.html">user's guide</a> and the <a href="nguide-en.html">P2P guide</a> yet, now is a good moment to do so.</p>

<p><code>estcmd</code> can index only files on the local file system. Files on remote hosts can be indexed by mounting them with NFS or SMB, but an unspecified number of web sites on the Internet cannot be mounted that way. Web crawlers such as <code>wget</code> can prefetch those files, but doing so involves high overhead and wastes a lot of disk space.</p>

<p>The command <code>estwaver</code> crawls arbitrary web sites and indexes their documents directly. Besides depth-first and width-first (breadth-first) order, it supports similarity-oriented order, in which documents similar to the specified seed documents are crawled preferentially.</p>
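<p>The difference between these orders can be illustrated with a toy frontier in Python. This is a minimal sketch, not the algorithm of <code>estwaver</code>. The in-memory link graph, the per-URL similarity scores and the function name <code>crawl_order</code> are hypothetical. The similarity-oriented mode simply visits the queued URL with the highest score first.</p>

```python
import heapq
from collections import deque


def crawl_order(links, seeds, mode, score=None):
    """Return the visiting order of a toy crawl over an in-memory link graph.

    Toy model only: `links` (URL -> linked URLs) and `score` (URL -> similarity)
    are hypothetical inputs, not estwaver data structures.
    mode: "depth", "width", or "similarity".
    """
    if mode not in ("depth", "width", "similarity"):
        raise ValueError("unknown mode: %r" % (mode,))
    visited, order = set(), []
    if mode == "similarity":
        # Max-heap by score (negated for heapq); the counter keeps ties stable.
        frontier = [(-score[u], i, u) for i, u in enumerate(seeds)]
        heapq.heapify(frontier)
        counter = len(seeds)
    else:
        frontier = deque(seeds)
    while frontier:
        if mode == "similarity":
            _, _, url = heapq.heappop(frontier)
        elif mode == "depth":
            url = frontier.pop()      # LIFO: follow the newest link first
        else:
            url = frontier.popleft()  # FIFO: finish each level first
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for nxt in links.get(url, []):
            if nxt in visited:
                continue
            if mode == "similarity":
                heapq.heappush(frontier, (-score[nxt], counter, nxt))
                counter += 1
            else:
                frontier.append(nxt)
    return order
```

<p>Take the graph <code>{"s": ["a", "b"], "a": ["c"], "b": ["d"]}</code>. Width-first order visits <code>s, a, b, c, d</code>. Giving <code>a</code> and <code>d</code> high scores lets the similarity mode reach <code>d</code> before <code>c</code>. The <code>queuesize</code> and <code>inherit</code> variables suggest that the real crawler bounds its priority queue and derives scores partly from the parent page. This sketch omits both.</p>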

<hr />

<h2 id="tutorial">Tutorial</h2>

<p>The first step is to create the crawler root directory, which contains a configuration file and some databases. The following command creates <code>casket</code> as the crawler root directory:</p>

<pre>estwaver init casket
</pre>

<p>By default, crawling starts at the project pages of Hyper Estraier. Let's try it as is:</p>

<pre>estwaver crawl casket
</pre>

<p>Documents are then fetched one after another and registered in the index. To stop the operation, press <code>Ctrl-C</code> in the terminal.</p>

<p>When the operation finishes, the crawler root directory contains a directory <code>_index</code>. It is an index that can be handled with <code>estcmd</code> and other tools. Let's search it with the following command:</p>

<pre>estcmd search -vs casket/_index "hyper estraier"
</pre>

<p>To resume crawling, run <code>estwaver crawl</code> again.</p>

<hr />

<h2 id="estwaver">Crawler Command</h2>

<p>This section describes the specification of <code>estwaver</code>, whose purpose is to index documents on the Web.</p>

<h3>Synopsis and Description</h3>

<p><code>estwaver</code> is an aggregation of subcommands. The name of a subcommand is specified by the first argument, and the other arguments are parsed according to each subcommand. The argument <code>rootdir</code> specifies the crawler root directory, which contains the configuration file and so on.</p>

<dl>

<dt><kbd>estwaver init [-xs|-xl|-xh] <var>rootdir</var></kbd></dt>
<dd>Create the crawler root directory.</dd>
<dd>If -xs is specified, the index is tuned to register fewer than 50000 documents.</dd>
<dd>If -xl is specified, the index is tuned to register more than 300000 documents.</dd>
<dd>If -xh is specified, the index is tuned to register more than 1000000 documents.</dd>

</dl>

<dl>

<dt><kbd>estwaver crawl [-restart|-revisit|-revcont] <var>rootdir</var></kbd></dt>

<dd>Start crawling.</dd>

<dd>If -restart is specified, crawling is restarted from the seed documents.</dd>

<dd>If -revisit is specified, collected documents are revisited.</dd>

<dd>If -revcont is specified, collected documents are revisited and then crawling is continued.</dd>

</dl>

<dl>

<dt><kbd>estwaver unittest <var>rootdir</var></kbd></dt>
<dd>Perform unit tests.</dd>

</dl>

<dl>
<dt><kbd>estwaver fetch [-proxy <var>host</var> <var>port</var>] [-tout <var>num</var>] [-il <var>lang</var>] <var>url</var></kbd></dt>
<dd>Fetch a document.</dd>
<dd><var>url</var> specifies the URL of a document.</dd>
<dd>-proxy specifies the host name and the port number of the proxy server.</dd>
<dd>-tout specifies the timeout in seconds.</dd>
<dd>-il specifies the preferred language. By default, it is English.</dd>

</dl>
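<p>When <code>estwaver</code> is driven from a script, the options above map directly onto an argument vector. The Python sketch below builds such vectors. The helper names are hypothetical. The sketch assumes that <code>init</code> options precede <code>rootdir</code>, as they do for <code>crawl</code>. Choosing an <code>init</code> option from an expected document count is only one reading of the documented thresholds: counts between 50000 and 300000 get no option. The value format of <code>-il</code> is not specified here, so it is passed through unchanged.</p>

```python
# Hypothetical helpers; init options are assumed to precede rootdir as in `crawl`.

def init_argv(rootdir, expected_docs=None):
    """Build `estwaver init` arguments, picking a tuning option by expected size."""
    argv = ["estwaver", "init"]
    if expected_docs is not None:
        if expected_docs > 1000000:
            argv.append("-xh")
        elif expected_docs > 300000:
            argv.append("-xl")
        elif expected_docs < 50000:
            argv.append("-xs")
    argv.append(rootdir)
    return argv


def crawl_argv(rootdir, mode=None):
    """Build `estwaver crawl` arguments; mode is None, "restart", "revisit" or "revcont"."""
    if mode not in (None, "restart", "revisit", "revcont"):
        raise ValueError("unknown crawl mode: %r" % (mode,))
    argv = ["estwaver", "crawl"]
    if mode:
        argv.append("-" + mode)
    argv.append(rootdir)
    return argv


def fetch_argv(url, proxy=None, timeout=None, lang=None):
    """Build `estwaver fetch` arguments; proxy is a (host, port) pair."""
    argv = ["estwaver", "fetch"]
    if proxy is not None:
        host, port = proxy
        argv += ["-proxy", host, str(port)]
    if timeout is not None:
        argv += ["-tout", str(timeout)]
    if lang is not None:
        argv += ["-il", lang]  # value passed through as given
    argv.append(url)
    return argv
```

<p>Such a list can be handed to <code>subprocess.run</code> without a shell, so URLs containing characters like <code>?</code> need no quoting.</p>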

<p>All subcommands return 0 if the operation succeeds, and 1 otherwise. A running crawler closes the database and exits when it catches signal 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), or 15 (SIGTERM).</p>
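<p>Because the crawler closes its database only on these signals, a script that limits the crawling time should send one of them rather than kill the process outright. Note that <code>subprocess.run(..., timeout=...)</code> kills the child with SIGKILL, which cannot be caught. Below is a minimal Python sketch, assuming a POSIX system. The function name <code>run_for</code> is hypothetical.</p>

```python
import signal
import subprocess


def run_for(argv, seconds):
    """Run argv, then send SIGTERM if it is still running after `seconds`.

    Hypothetical helper. Returns the exit status: estwaver documents 0 for
    success and 1 for failure; a negative value means the child was
    terminated by that signal.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # SIGTERM (15) is one of the signals on which the crawler closes
        # its database before exiting; SIGKILL would give it no chance.
        proc.send_signal(signal.SIGTERM)
        return proc.wait()
```

<p>For example, <code>run_for(["estwaver", "crawl", "casket"], 3600)</code> stops a crawl after an hour. The <code>period</code> variable of the configuration file offers a similar limit inside the crawler itself.</p>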

<p>When crawling finishes, the crawler root directory contains a directory <code>_index</code>, which is an index usable by <code>estcmd</code> and other tools.</p>

<h3>Structure of the Crawler Root Directory</h3>

<p>The crawler root directory contains the following files and directories.</p>
<ul>
<li><kbd>_conf</kbd> : configuration file.</li>
<li><kbd>_meta</kbd> : database file for metadata.</li>
<li><kbd>_queue</kbd> : priority queue of URLs to be crawled.</li>
<li><kbd>_trace/</kbd> : tracking records of crawled URLs.</li>
<li><kbd>_index/</kbd> : index directory.</li>
<li><kbd>_tmp/</kbd> : directory for temporary files.</li>

</ul>
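<p>A script can check that a directory looks like a crawler root before pointing <code>estwaver crawl</code> at it. The Python sketch below only tests for the entries listed above. A trailing slash marks a directory; the other entries are only checked for existence. It is an assumption that every entry is present in a root directory in use, and the function name is hypothetical.</p>

```python
import os

# Entries listed for the crawler root directory; a trailing "/" marks a directory.
# Assumption: all of them are present once the root directory is in use.
ROOT_ENTRIES = ["_conf", "_meta", "_queue", "_trace/", "_index/", "_tmp/"]


def missing_entries(rootdir):
    """Return the listed entries that are absent (or not a directory when required)."""
    missing = []
    for entry in ROOT_ENTRIES:
        path = os.path.join(rootdir, entry.rstrip("/"))
        ok = os.path.isdir(path) if entry.endswith("/") else os.path.exists(path)
        if not ok:
            missing.append(entry)
    return missing
```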

<h3>Configuration File</h3>
<p>The configuration file is composed of lines, each of which contains the name of a variable and its value separated by "<code>:</code>". By default, the configuration is as follows.</p>

<pre>seed: 1.5|http://hyperestraier.sourceforge.net/uguide-en.html
seed: 1.0|http://hyperestraier.sourceforge.net/pguide-en.html
seed: 1.0|http://hyperestraier.sourceforge.net/nguide-en.html
seed: 0.0|http://qdbm.sourceforge.net/
proxyhost:
proxyport:
interval: 500
timeout: 30
strategy: 0
inherit: 0.4
seeddepth: 0
maxdepth: 20
masscheck: 500
queuesize: 50000
replace: ^http://127.0.0.1/{{!}}http://localhost/
allowrx: ^http://
denyrx: \.(css|js|csv|tsv|log|md5|crc|conf|ini|inf|lnk|sys|tmp|bak)$
denyrx: \.(zip|tar|tgz|gz|bz2|tbz2|z|lha|lzh)(\?.*)?$
denyrx: ://(localhost|[a-z]*\.localdomain|127\.0\.0\.1)/
noidxrx: /\?[a-z]=[a-z](;|$)
urlrule: \.est${{!}}text/x-estraier-draft
urlrule: \.(eml|mime|mht|mhtml)${{!}}message/rfc822
typerule: ^text/x-estraier-draft${{!}}[DRAFT]
typerule: ^text/plain${{!}}[TEXT]
typerule: ^(text/html|application/xhtml+xml)${{!}}[HTML]
typerule: ^message/rfc822${{!}}[MIME]
language: 0
textlimit: 128
seedkeynum: 256
savekeynum: 32
threadnum: 10
docnum: 10000
period: 10000s
revisit: 7d
cachesize: 256
#nodeserv: 1|http://admin:admin@localhost:1978/node/node1
#nodeserv: 2|http://admin:admin@localhost:1978/node/node2
#nodeserv: 3|http://admin:admin@localhost:1978/node/node3
logfile: _log
loglevel: 2
draftdir:
entitydir:
postproc:
</pre>
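<p>Reading this format from a script takes only a few lines. The Python sketch below splits each line at the first "<code>:</code>", so URLs in values stay intact. It strips surrounding whitespace and collects every value of a variable in order, since variables such as <code>seed</code> or <code>denyrx</code> may appear more than once. Skipping lines that begin with "<code>#</code>" is an assumption based on the commented-out <code>nodeserv</code> lines. So is skipping lines without "<code>:</code>". This is not the parser used by <code>estwaver</code>.</p>

```python
def parse_conf(text):
    """Parse `name: value` lines into a dict mapping each name to a list of values."""
    conf = {}
    for line in text.splitlines():
        # Assumption: blank lines and lines starting with "#" carry no setting.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        name, sep, value = line.partition(":")  # split at the first ":" only
        if not sep:
            continue  # assumption: not a `name: value` line, so ignore it
        conf.setdefault(name.strip(), []).append(value.strip())
    return conf
```

<p>For example, <code>parse_conf(open("casket/_conf").read())["seed"]</code> lists the seed entries of the crawler root directory created in the tutorial.</p>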

<p>The meaning of each variable is as follows.</p>
<ul>

<li><kbd>seed</kbd> : specifies the weight and the URL of a seed document, separated by "<code>|</code>". This can be specified more than once.</li>
<li><kbd>proxyhost</kbd> : specifies the host name of the proxy server.</li>
<li><kbd>proxyport</kbd> : specifies the port number of the proxy server.</li>
<li><kbd>interval</kbd> : specifies the waiting interval of each request (in milliseconds).</li>
<li><kbd>timeout</kbd> : specifies the timeout of each request (in seconds).</li>
<li><kbd>strategy</kbd> : specifies the strategy of the crawling path (0:balanced, 1:similarity, 2:depth, 3:width, 4:random).</li>
<li><kbd>inherit</kbd> : specifies the inheritance ratio of similarity from the parent.</li>
<li><kbd>seeddepth</kbd> : specifies the maximum depth of seed documents.</li>
<li><kbd>maxdepth</kbd> : specifies the maximum depth of recursion.</li>
<li><kbd>masscheck</kbd> : specifies the standard value for checking mass sites.</li>
<li><kbd>queuesize</kbd> : specifies the maximum number of records of the priority queue.</li>
<li><kbd>replace</kbd> : specifies regular expressions and replacement strings to normalize URLs, separated by "<code>{{!}}</code>". This can be specified more than once.</li>
<li><kbd>allowrx</kbd> : specifies regular expressions of URLs allowed to be visited. This can be specified more than once.</li>
<li><kbd>denyrx</kbd> : specifies regular expressions of URLs denied to be visited. This can be specified more than once.</li>
<li><kbd>noidxrx</kbd> : specifies regular expressions of URLs not to be indexed. This can be specified more than once.</li>
<li><kbd>urlrule</kbd> : specifies URL rules (regular expressions and media types, separated by "<code>{{!}}</code>"). This can be specified more than once.</li>
<li><kbd>typerule</kbd> : specifies media type rules (regular expressions and filter commands, separated by "<code>{{!}}</code>"). This can be specified more than once.</li>
<li><kbd>language</kbd> : specifies the preferred language (0:English, 1:Japanese, 2:Chinese, 3:Korean, 4:misc).</li>
<li><kbd>textlimit</kbd> : specifies the text size limitation (in kilobytes).</li>
<li><kbd>seedkeynum</kbd> : specifies the total number of keywords for seed documents.</li>
<li><kbd>savekeynum</kbd> : specifies the number of keywords saved for each document.</li>
<li><kbd>threadnum</kbd> : specifies the number of threads running in parallel.</li>
<li><kbd>docnum</kbd> : specifies the number of documents to collect.</li>
<li><kbd>period</kbd> : specifies the running time period (in s:seconds, m:minutes, h:hours, d:days).</li>
<li><kbd>revisit</kbd> : specifies the revisit span (in s:seconds, m:minutes, h:hours, d:days).</li>
<li><kbd>cachesize</kbd> : specifies the maximum size of the index cache (in megabytes).</li>
<li><kbd>nodeserv</kbd> : specifies the ID number and the URL of a node server, separated by "<code>|</code>". This can be specified more than once.</li>
<li><kbd>logfile</kbd> : specifies the path of the log file (relative path or absolute path).</li>
<li><kbd>loglevel</kbd> : specifies the logging level (1:debug, 2:information, 3:warning, 4:error, 5:none).</li>
<li><kbd>draftdir</kbd> : specifies the path of the draft directory (relative path or absolute path).</li>
<li><kbd>entitydir</kbd> : specifies the path of the entity directory (relative path or absolute path).</li>
<li><kbd>postproc</kbd> : specifies the postprocessor for retrieved files.</li>

</ul>
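<p>Several values have an internal structure. <code>seed</code> and <code>nodeserv</code> join two fields with "<code>|</code>". The rule variables join a regular expression and its counterpart with "<code>{{!}}</code>", as in the default configuration. <code>period</code> and <code>revisit</code> carry a unit suffix. The Python helpers below decode these forms. They are a sketch, and treating a number without a unit as seconds is an assumption.</p>

```python
# Multipliers for the unit suffixes of `period` and `revisit`.
SPAN_UNITS = {"s": 1, "m": 60, "h": 60 * 60, "d": 24 * 60 * 60}


def parse_seed(value):
    """Split a `seed` value "weight|URL" into (float weight, URL)."""
    weight, sep, url = value.partition("|")
    if not sep:
        raise ValueError("expected weight|URL: %r" % (value,))
    return float(weight), url


def parse_nodeserv(value):
    """Split a `nodeserv` value "ID|URL" into (int ID, URL)."""
    node_id, sep, url = value.partition("|")
    if not sep:
        raise ValueError("expected ID|URL: %r" % (value,))
    return int(node_id), url


def parse_rule(value):
    """Split a replace/urlrule/typerule value at "{{!}}" into (regex, right-hand side)."""
    regex, sep, rhs = value.partition("{{!}}")
    if not sep:
        raise ValueError("expected regex{{!}}value: %r" % (value,))
    return regex, rhs


def parse_span(value):
    """Convert a span such as "10000s" or "7d" into seconds."""
    value = value.strip()
    if value and value[-1] in SPAN_UNITS:
        return int(value[:-1]) * SPAN_UNITS[value[-1]]
    return int(value)  # assumption: a bare number is taken as seconds
```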

<p><code>allowrx</code>, <code>denyrx</code>, and <code>noidxrx</code> are evaluated in the order in which they are written. Alphabetical characters are matched case-insensitively.</p>
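<p>The order matters because a later rule can override an earlier one. The Python sketch below shows one plausible reading, assumed here rather than taken from the <code>estwaver</code> source. A URL starts out rejected. Every matching <code>allowrx</code> accepts it, and every matching <code>denyrx</code> rejects it again, so the last matching rule wins. A corresponding check against <code>noidxrx</code> is not shown.</p>

```python
import re


def compile_rules(pairs):
    """Compile (kind, pattern) pairs, kind being "allowrx" or "denyrx", case-insensitively."""
    return [(kind, re.compile(pattern, re.IGNORECASE)) for kind, pattern in pairs]


def is_allowed(url, rules):
    """Evaluate rules in order; assumed semantics: the last matching rule decides."""
    allowed = False  # assumption: a URL matching no allowrx is not visited
    for kind, rx in rules:
        if rx.search(url):
            allowed = (kind == "allowrx")
    return allowed
```

<p>Under first-match-wins semantics, the leading <code>allowrx: ^http://</code> of the default configuration would accept every HTTP URL, and the <code>denyrx</code> lines would have no effect. That is why the sketch assumes that later rules override earlier ones.</p>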

<p>Arbitrary filter commands can be specified with <code>typerule</code>. The interface of a filter command is the same as that of the <code>-fx</code> option of <code>estcmd gather</code>. For example, the following rule processes PDF documents.</p>

<pre>typerule: ^application/pdf${{!}}H@/usr/local/share/hyperestraier/filter/estfxpdftohtml
</pre>
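<p>Taken together, <code>urlrule</code> assigns a media type from the URL, and <code>typerule</code> chooses a filter from the media type. The Python sketch below resolves both and returns the filter value, including any prefix such as <code>H@</code>, unchanged. It assumes the following, none of which is stated above:</p>
<ul>
<li>the first matching rule is used;</li>
<li>a matching URL rule overrides the media type reported by the server;</li>
<li>media type parameters such as <code>charset</code> are ignored.</li>
</ul>
<p>The function name is hypothetical.</p>

```python
import re


def resolve_filter(url, media_type, urlrules, typerules):
    """Return (media type, filter) for a fetched document, or (media type, None).

    urlrules / typerules: lists of (regex, value) pairs in configuration order.
    """
    # Assumption: parameters such as "; charset=UTF-8" are ignored.
    media_type = media_type.split(";", 1)[0].strip().lower()
    for regex, rule_type in urlrules:
        if re.search(regex, url, re.IGNORECASE):
            media_type = rule_type  # assumption: first matching URL rule wins
            break
    for regex, filt in typerules:
        if re.search(regex, media_type, re.IGNORECASE):
            return media_type, filt  # assumption: first matching type rule wins
    return media_type, None
```

<p>As written, the default HTML rule does not match <code>application/xhtml+xml</code> literally. In a regular expression, <code>+</code> is a quantifier. A pattern written as <code>xhtml\+xml</code> would match it.</p>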


<hr />


</body>


</html>
