~grubng-dev/grubng/tools-urlsdb

*NOTE* This is an alpha version of the program; use it at your own risk. So far the
program has been tested only on GNU/Linux systems.

To work, the program needs Mono and MySQL, and optionally Solr (if you enable it).

1. First run
After installation, run the program with each of the following commands:
- mono urlsdb.exe createconf - creates the configuration file. Next, you must
edit the default configuration and adjust it to your needs.
- mono urlsdb.exe createdb - creates the URL database. The program is now ready to work.
Two additional files are shipped with the program:
- checkworkunits.sh - Bash script which checks the number of workunits ready to send. If
no workunits are available, the script checks whether the program is running. If it is
not, the script runs it to generate new workunits. To use this script, you must edit in
it the path to the directory with workunits (the same one as in the configuration file)
and the path to the program root. *NOTE* All paths must be full paths.
- grubworkunits - cron job settings for the program - it runs checkworkunits.sh every
5 minutes. To use this file, you must edit the user name and the path to the
checkworkunits.sh file.
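
The decision made by checkworkunits.sh, as described above, can be sketched in Python.
This is only an illustration, not the shipped Bash script: the function names are
hypothetical, counting every regular file as a workunit is an assumption, and so is
detecting the running program with pgrep. The command that generates workunits is left
out because this README does not name it.

```python
import os
import subprocess


def count_workunits(directory):
    """Count regular files in the workunits directory (assumed: one file per workunit)."""
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


def is_running(process_pattern="urlsdb.exe"):
    """Assumption: the program is found by matching its command line with pgrep -f."""
    result = subprocess.run(["pgrep", "-f", process_pattern],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0  # pgrep exits with 0 when a process matches


def should_generate(workunit_count, program_running):
    """Generate new workunits only when none are left and the program is idle."""
    return workunit_count == 0 and not program_running
```

A cron-driven wrapper would call should_generate(count_workunits(path), is_running())
and start the program only when the result is True.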

2. Normal run
mono urlsdb.exe - shows all available options with short information on how to use them.

3. Configuration variables
- enablesolr - Enable or disable Solr (optimization, deleting URLs). Value "Y" enables
Solr, value "N" disables it (default: "Y")
- solraddress - Full URL (with http://) to Solr Update interface (for example:
http://localhost:8180/solr/update)
- solrusername - Username for HTTP Basic authentication with the Solr server
- solrpassword - Password for HTTP Basic authentication with the Solr server
- mysqlhost - MySQL host of the URL database
- mysqldb - Name of the MySQL database with URLs
- mysqluser - MySQL user for the URL database
- mysqlpassword - MySQL password for the URL database
- workunitsdirectory - Full path to the directory where generated workunits are stored
- workunitspassword - Password used when generating workunits. *NOTE* Must be the
same as on the upload server.
- useragent - HTTP User-Agent string used in generated workunits
- urlsamount - Number of URLs in one generated workunit (default: 250)
- httpversion - HTTP protocol version used in generated workunits (default: 1.0)
- accept - HTTP Accept string used in generated workunits (default:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8)
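
To illustrate the effect of urlsamount, the sketch below splits a list of URLs into
batches of at most 250 entries. It is a minimal illustration only: the function name is
hypothetical, and how the program actually selects URLs and writes workunit files is
not described in this README.

```python
def split_into_workunits(urls, urls_amount=250):
    """Yield consecutive batches of at most urls_amount URLs."""
    if urls_amount < 1:
        raise ValueError("urls_amount must be at least 1")
    for start in range(0, len(urls), urls_amount):
        yield urls[start:start + urls_amount]


# 600 made-up URLs give two full batches and one partial batch.
urls = [f"example.com/page{i}" for i in range(600)]
batches = list(split_into_workunits(urls))
print([len(batch) for batch in batches])  # [250, 250, 100]
```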

4. Difference between options "fastfilldb" and "filldb"
The main difference is that the "filldb" option checks whether each URL already
exists in the database before inserting it. For example:
The database contains the URL: example.com/
The file with URLs contains:
example1.com/
www.example.com/
example.com/
If you upload the file with the "fastfilldb" option, all 3 URLs will be inserted into
the database.
If you upload it with the "filldb" option, only the first URL (example1.com/) will be
inserted.
Thus, use the "fastfilldb" option only if you are absolutely sure that your file
contains no duplicate URLs and that none of its URLs is already in the database.
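
The example above can be reproduced with a small in-memory sketch. It is not the
program's code: the real options work against MySQL, the function names only mirror
the option names, and normalize() is a hypothetical helper. Treating www.example.com/
and example.com/ as the same URL is an assumption inferred from the example.

```python
def normalize(url):
    """Hypothetical normalization: drop a leading "www." before comparing URLs."""
    return url[4:] if url.startswith("www.") else url


def fastfilldb(database, new_urls):
    """Insert every URL without any existence check."""
    return database + list(new_urls)


def filldb(database, new_urls):
    """Insert only URLs whose normalized form is not in the database yet."""
    seen = {normalize(url) for url in database}
    result = list(database)
    for url in new_urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result


database = ["example.com/"]
new_urls = ["example1.com/", "www.example.com/", "example.com/"]
print(len(fastfilldb(database, new_urls)) - len(database))  # 3
print(filldb(database, new_urls))  # ['example.com/', 'example1.com/']
```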

13.01.2011 Bartek thindil Jasicki