~ubuntu-branches/ubuntu/saucy/checkbot/saucy-proposed

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
Checkbot -- a WWW link verifier

Checkbot is a perl5 script which can verify links within a region of
the World Wide Web. It checks all pages within an identified region,
and all links within that region. After checking all links within the
region, it will also check all links which point outside of the
region, and then stop.

Checkbot regularly writes reports on its findings, including all
servers found in the region, and all links with problems on those
servers.

Checkbot was written originally to check a number of servers at
once. This has implied some design decisions, so you might want to
keep that in mind when making suggestions. Speaking of which, be sure
to check the to do file on the website for things which have been
suggested for Checkbot.

INSTALLATION

Making and installing Checkbot is easy:

    perl Makefile.PL
    make
    make install

You will need to have the following Perl modules installed in order to 
properly install Checkbot:

    LWP
    URI
    HTML::Parser
    MIME::Base64
    Net::FTP
    Mail::Send (optional, contained in the MailTools package)
	Time::Duration (optional, used for additional info in report)


WHERE TO FIND IT

Checkbot is distributed at: http://degraaff.org/checkbot/

Problems, bug reports, and feature enhancements are welcome at
http://sourceforge.net/projects/checkbot/

There is an announcement mailing list to which announcements of new
versions are posted. You can sign up for the list at 
https://lists.sourceforge.net/lists/listinfo/checkbot-announce

Hans de Graaff <hans@degraaff.org>


RECENT CHANGES

Changes in versino 1.80 (15-Oct-2008)

    * Fix handling of nofollow robots tag.
    * Require newer version of LWP for better handling of character
      encodings.
    * Ignore mms scheme.
    * Minor clarification in output.

Changes in version 1.79 (3-Feb-2007)

    * Correctly parse documents to avoid problems with UTF-8
      documents. This avoids the "Parsing of undecoded UTF-8 will give
      garbage when decoding entities" messages.
    * Allow regular expressions in the suppression file, and complain if
      the suppression file is not a proper file.
    * More robust handling of HTTP and FTP servers that have problems
      responding to HEAD requests.
    * Use the original URL to report problems.
    * Ensure XHTML compliance.

Changes in version 1.78 (3-May-2006)

    * Don't throw errors for links that cannot be expected to be valid
      all the time (e.g. the classid attribute of an object element)
    * Better fallbacks for some cases where the HEAD request does not
      work
    * Add more classes and ids to allow more styling of results pages
      (including example CSS file)
    * Ensure XHTML compliance
    * Better checks for optional dependencies

Changes in version 1.77 (28-Jul-2005)

    * Fix silly build-related problem that prevented checkbot 1.76 from
      running at all.
    * Check for presence of robots meta tag and act on it.

Changes in version 1.76 (25-Jul-2005)

    * Error reports now include the page title for easier identification.
    * javascript: links are now ignored because they cannot be checked.
    * Documentation updates.

Changes in version 1.75 (22-Apr-2004)

    * New --cookies option to accept cookies from servers while checking.
    * New --noproxy option indicates which domains should not be
      passed through the proxy.
    * New error code for unknown domains; only known non-checkable
      schemes are ignored now.
    * Minor bug fixes.
    * Documentation updates.

Changes in version 1.74 (17-Dec-2003)

    * New --suppress option allows Response code/URL combinations not
      to be reported as problems.
    * Checkbot warnings are now handled as pseudo-HTTP status messages
      so that they can make use of all Checkbot features such as
      --dontwarn.
    * Option --allow-simple-hosts is deprecated due to this change.
    * More robust handling of (lack of) status messages.
    * Checkbot now requires LWP 5.70 due to bugfixes in this release,
      although it should still also work with older LWP versions.
    * Documentation fixes.

Changes in version 1.73 (31-Aug-2003)

    * Checkbot now tries to produce valid XHTML 1.1
    * URLs matching the --ignore option are now completely ignored;
      they used to be checked but not reported.
    * Proxy support works again, but --proxy now applies to all links
    * Documentation fixes

Changes in version 1.72 (04-May-2003)

    * URLs with query strings are now checked by default, the
      --exclude option can be used to revert to the previous behavior
    * The server results page contains shortcut links to each section
    * Removed warning for unqualified hostnames for news: URLs
    * Handling of signals such as SIGINT
    * Bug and documentation fixes

Changes in version 1.71 (29-Dec-2002)

    * New --filter option allows rewriting of URLs before they will be checked
    * Problematic links are now reported for each page on which they occur
    * New statistics which should work correctly
    * Much simplified storage of information on problem links
    * Duplicate links are now properly detected and not checked twice
    * Rewritten internals for link checking, as a consequence internal
      and external links are checked at the same time now, not in two
      passes like before
    * Rewritten internals for message output
    * A simple test case for 'make test'
    * Minor cleanups of the code

Version 1.70 was only released for testing purposes
Changes in version 1.69

    * Improved makefile and packaging
    * Better default for --match argument
    * Additional instance of using GET instead of HEAD added
    * Bug fixes in printing of web server feedback

Changes in version 1.68

    * Add --allow-simple-hosts which doesn't check for unqualified hosts
    * Mention --style option in help and added example style file
    * Change --sleep implementation so that fractional seconds can be used
    * Fix a bug with handling <base> tags
    * Tighten checks for http and https schemes
    * Remove harmless warnings