Mailman patch #820723: Mailman/pipermail/MHonArc integration patch
------------------------------------------------------------------

Objectives and Description
--------------------------

This patch tightly integrates the MHonArc mail-to-HTML convertor (Earl Hood's
description, not mine) with Mailman and its internal pipermail archiving code.
The purpose of the patch is to produce a fusion of (hopefully) the best features
of pipermail and MHonArc for handling Mailman mailing list archives.

Although pipermail has a number of weaknesses, it has some good features,
including structuring mail archives by per-list configurable periods and
supporting both private and public archive access.

MHonArc is represented by its proponents as being superior to pipermail in
handling attachments and multipart MIME messages, and in being able to
build/rebuild archives from large UNIX mboxes.

In contrast, pipermail's "database" creaks and groans with large mboxes and
lists with high levels of traffic, and it collapses into a heap when asked to
rebuild from very large mboxes. The latest pipermail handling of attachments
and MIME is OK-ish but appears to be weaker than MHonArc's.

With MHonArc, I could not quickly find an obvious way to organise the HTML
archives it generates into optional Yearly/Monthly/Weekly/Daily periods as
pipermail does. It is clearly possible (for instance, see
http://www.mail-archive.com/mailman-users%40python.org/) but how ...

List archive privacy can be important for some user communities and the
Mailman/pipermail solution works well. 

I also wanted searchable archives and saw no reason not to continue using my
Mailman/HTdig integration patch for per-list archive searching. 

Finally, I thought it would be neat to make the choice of whether MHonArc or
pipermail generates a list's archive pages a per-list configuration option,
and to have Mailman's archive builder $prefix/bin/arch work whichever choice
is made for a given list.

The upshot was this patch, which is a hack, but one which works. The code is not
pretty, but I defy even the best cosmetic surgeon to produce movie star looks
when he is grafting a wart onto a boil. It meets my needs and will do so
until Mailman version 3 comes along incorporating a whizzy new archiver and
archive search capability ...

Alternative ways of using MHonArc and HTdig in conjunction with Mailman exist,
and doubtless some would argue that they are superior. There is no compulsion
to use this patch; it is just another option you can choose from.

Implementation Details
----------------------

The implementation operates with pipermail in charge of archiving. That is, the
pipermail code generates the top level archive TOC pages for each list,
organises each list's archive directory structure and sorts out the archiving
period stuff. But when it comes to generating the message and index pages of the
HTML archives, whether pipermail or MHonArc is used depends on the option set
for each list via the Archiving Options page of the admin web GUI.

For lists set for MHonArc archiving, pipermail uses an instance of MHonArc
instead of its own code to generate the HTML message and index pages. For such
lists, pipermail maintains only a vestigial database, so the problems of large
pipermail databases are avoided.

The organization of the archives on disk is pretty much the same for both
pipermail'ed and MHonArc'ed lists. The top level list TOC is the same as for
normal Mailman, as are the per-period sub-directory structure and the per-period
text (mbox) archive. The naming of message files is different for MHonArc, as is
the storage of extracted attachment files: pipermail uses a separate directory
structure for extracted attachments, while MHonArc puts them in the same
directory as the messages. This is actually a win for MHonArc, because the
links it generates on message pages to extracted attachments are relative
URLs, whereas pipermail generates absolute URLs. For pipermail generated message
pages, this means that if a list archive is changed from private to public or
vice versa, the links on message pages are wrong and can only be corrected by
rebuilding the entire list archive. With MHonArc'ed archives, list privacy
changes are a non-event: Mailman's creation and deletion of the symlink to
the list archives in $prefix/archives/public/ does all that is needed.
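
Python's standard urllib.parse.urljoin resolves a link the way a browser does, so it can show the difference between relative and absolute links. This is only an illustration. The host, list name, period and file names are invented. The /pipermail/ and /mailman/private/ paths are assumed to be the usual Mailman 2.1 public and private archive locations. The sketch resolves the same message page under both.

```python
from urllib.parse import urljoin

# Hypothetical URLs of one message page, reached through the public archive
# path and through the private archive CGI respectively.
PUBLIC_PAGE = "http://lists.example.com/pipermail/mylist/2003-September/msg00001.html"
PRIVATE_PAGE = "http://lists.example.com/mailman/private/mylist/2003-September/msg00001.html"

# MHonArc-style relative link to an attachment stored beside the message page
# (the file name is made up for illustration).
RELATIVE_LINK = "bin00001.bin"

# pipermail-style absolute link, fixed at the time the page was generated
# (here, while the list was public; the path is made up for illustration).
ABSOLUTE_LINK = "http://lists.example.com/pipermail/mylist/attachments/20030930/abc123/attachment.bin"


def resolve(page_url, link):
    """Return the URL a browser would fetch for `link` found on `page_url`."""
    return urljoin(page_url, link)
```

The relative link stays under whichever path the page is served from. For example, resolve(PRIVATE_PAGE, RELATIVE_LINK) lands in /mailman/private/mylist/2003-September/. By contrast, resolve(PRIVATE_PAGE, ABSOLUTE_LINK) still points at the public path, which is the breakage that forces a pipermail rebuild.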

$prefix/bin/arch works as normal, regardless of whether pipermail or MHonArc is
generating message and index pages. Indeed, after changing the archiver option
for a list, $prefix/bin/arch --wipe must be run for the change to take effect;
this is because the per-list database, message file naming and attachment
storage schemes used by the two options are incompatible.

Because I like the date/thread/subject/author indexes produced by pipermail,
MHonArc (as used by this patch) produces the same set. The layout of message and
index pages generated by MHonArc is controlled using three MHonArc resource
configuration files (MRCFs), which must reside in Mailman's template directory
structure. The MRCFs are selected for a list in the same way that any other
template file is associated with that list, that is, the template hierarchy is
searched. Before pipermail invokes MHonArc to handle a message or a group of
messages, it selects which MRCFs apply and passes these as parameters to
MHonArc. The default MRCFs reside in $prefix/templates/en as mhonarc.mrc,
author.mrc and subject.mrc. The look and feel the default MRCFs produce is
similar to that of the archive and index pages of regular pipermail'ed
archives. The only difference in the operation of the templates for MHonArc is
that Mailman performs no variable substitution on them; instead, some useful
values (such as list name and archive name) are passed to MHonArc in
environment variables, which can be referred to in the MRCFs to characterize
the pages being generated.
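
The template-hierarchy lookup described above amounts to returning the first directory in a search path that contains the requested MRCF. This is a hypothetical sketch, not code from the patch. The function names are invented, and so is the search order: most specific first (list, then virtual host, then site), ending with the default language directory. That order is only modelled on how Mailman 2.1 finds its other templates.

```python
import os

# The three default MRCF names shipped by the patch in $prefix/templates/en.
MRCF_NAMES = ("mhonarc.mrc", "author.mrc", "subject.mrc")


def find_mrcf(name, search_dirs):
    """Return the path of the first copy of `name` found in `search_dirs`.

    `search_dirs` is ordered from most specific (a list's own template
    directory) to least specific (e.g. $prefix/templates/en).
    Returns None if no directory contains the file.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def select_mrcfs(search_dirs):
    """Map each MRCF name to the file that applies for one list."""
    return {name: find_mrcf(name, search_dirs) for name in MRCF_NAMES}
```

With this approach, a list that overrides only author.mrc picks up its own author.mrc and still falls back to the default mhonarc.mrc and subject.mrc.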

Whether the MRCFs are passed to MHonArc on the command line when it is invoked
depends on:

1. Whether the message(s) being passed to MHonArc are the first for a new
   period; if so, the MRCFs are passed, regardless of any other factor.

2. The value of the MHONARC_SAVE_RESOURCES MM config variable (default value
   True). This config variable tells pipermail to pass MHonArc either the
   -saveresources or the -nosaveresources command line option. If
   MHONARC_SAVE_RESOURCES is True, the MRCFs are not passed as command line
   options (except in the situation described in 1), because MHonArc will
   already have the template information from the MRCFs in its database.
    
The downside to MHONARC_SAVE_RESOURCES being True is that new messages for an
existing archive period will continue to use the existing templates despite
changes to the mhonarc.mrc, author.mrc or subject.mrc files applicable to the
list concerned, even if bin/mailmanctl restart has been run. To get the
revised templates into use, run bin/arch --wipe for the list.
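
The two rules above can be sketched as a small function that builds the MHonArc command line for one period. -saveresources, -nosaveresources, -rcfile, -outdir and -add are MHonArc options. Everything else is an assumption: the function name, the choice of -add, and the single rcfile argument. This document does not say exactly how the patch hands all three MRCFs to MHonArc.

```python
def mhonarc_args(mhonarc_path, outdir, rcfile, new_period, save_resources=True):
    """Build a hypothetical MHonArc command line for one archive period.

    mhonarc_path   -- value of MHONARC_ARCHIVER_PATH
    outdir         -- the period's archive directory
    rcfile         -- an MRCF selected from the template hierarchy
    new_period     -- True if these are the first messages of the period
    save_resources -- value of MHONARC_SAVE_RESOURCES
    """
    args = [mhonarc_path, "-add", "-outdir", outdir]
    # Tell MHonArc whether to keep the resources in its period database.
    args.append("-saveresources" if save_resources else "-nosaveresources")
    # Rule 1: always pass the MRCF for a new period.
    # Rule 2: otherwise pass it only if MHonArc is not saving resources.
    if new_period or not save_resources:
        args.extend(["-rcfile", rcfile])
    return args
```

With the default MHONARC_SAVE_RESOURCES = True, only the first batch of a period carries -rcfile. That is why MRCF edits are invisible to an existing period until bin/arch --wipe rebuilds it.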

Following installation of this patch, the availability of the features it
provides depends on MHonArc being installed (see Necessary Precursors below)
and on MHONARC_ARCHIVER_PATH being set to a non-empty string.

MHonArc has to be installed on the Mailman server. I found installing it into
$prefix/mhonarc worked for me. The patch adds a new Mailman configuration
variable, MHONARC_ARCHIVER_PATH, to $prefix/Mailman/Defaults.py; it tells
Mailman where MHonArc is installed. Setting it as follows was the sum total of
setup I had to do for the MHonArc installation:

MHONARC_ARCHIVER_PATH = os.path.join(PREFIX, 'mhonarc', 'bin', 'mhonarc')

The patch adds an option, with radio buttons to choose pipermail or MHonArc as
the archiver for a list, to the Archiving Options page of the web admin GUI.
This option is only displayed once a non-blank value has been assigned to
MHONARC_ARCHIVER_PATH. The default choice of archiver is set by another new
Mailman configuration variable added to $prefix/Mailman/Defaults.py by the
patch:

# Which archiver to use by default to generate archive pages: 
# 0 - pipermail 
# 1 - mhonarc
DEFAULT_WHICH_ARCHIVER = 0

When a new list is created and when/if archiving is enabled for it, it will use
the archiver specified by DEFAULT_WHICH_ARCHIVER at the time of the list's
creation. Lists with existing archives that pre-date adoption of this patch will
continue to be pipermail archives unless their choice of archiver is changed via
the web admin GUI. 
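
The rules above can be modelled in a few lines of Python. which_archiver is the attribute name given in the History section below. The MailList class and the function names are hypothetical, and the sketch mirrors the described behaviour rather than reproducing the patch's code.

```python
# Values documented for DEFAULT_WHICH_ARCHIVER.
PIPERMAIL = 0
MHONARC = 1


def archiver_option_visible(mhonarc_archiver_path):
    """The Archiving Options radio buttons appear only for a non-empty path."""
    return bool(mhonarc_archiver_path)


class MailList:
    """Hypothetical stand-in for a Mailman list object."""


def init_new_list(mlist, default_which_archiver):
    """A new list takes the site default in force when it is created."""
    mlist.which_archiver = default_which_archiver


def upgrade_existing_list(mlist):
    """Lists that pre-date the patch stay on pipermail unless changed later."""
    if not hasattr(mlist, "which_archiver"):
        mlist.which_archiver = PIPERMAIL
```

With this approach, later changes to DEFAULT_WHICH_ARCHIVER never touch lists that already exist. Only the per-list setting on the web admin GUI changes them.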

As noted above, when the archiver nominated for a given list is changed the
change will not take effect until $prefix/bin/arch --wipe is run for the list.
This is not done automatically at the time the option is changed using the web
admin GUI for a good reason: if the list has a large mbox then the amount of
processing involved in rebuilding the archive is inappropriate for a process
being run "in-line" as part of a web server transaction. 
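
One way to model "the change does not take effect until bin/arch --wipe" is to keep two attributes per list. One holds the choice made in the GUI. The other holds the archiver that built the archive currently on disk. The attribute names which_archiver and archiver_when_wiped come from the History section below. Their exact semantics here, and the class and method names, are assumptions made for illustration.

```python
PIPERMAIL = 0
MHONARC = 1


class ListArchiveState:
    """Hypothetical model of a list's two archiver attributes."""

    def __init__(self, which_archiver=PIPERMAIL):
        # Choice made on the Archiving Options page.
        self.which_archiver = which_archiver
        # Archiver that built the archive currently on disk (assumed meaning).
        self.archiver_when_wiped = which_archiver

    def effective_archiver(self):
        """Archiver used for newly arriving messages until the next --wipe."""
        return self.archiver_when_wiped

    def set_choice(self, which):
        """A web GUI change is recorded but not yet acted on."""
        self.which_archiver = which

    def wipe_and_rebuild(self):
        """What bin/arch --wipe would do here: adopt the recorded choice."""
        self.archiver_when_wiped = self.which_archiver
```

Keeping the two values apart means a GUI change is a cheap attribute update. The expensive full rebuild is left to an explicit, out-of-band bin/arch --wipe run.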

With pipermail in charge, MHonArc only gets to see the archive one period at a
time: each period of a list's archive is, in effect, a separate archive as far
as MHonArc is concerned, and it maintains a database and index pages for each
period quite separately. This means that thread and date links generated by
MHonArc terminate at the boundary of the archiving period. The only index linking
all of the period archives for a list is the top level TOC page for the list's
archives, which is generated by pipermail in the normal way. Thus far, this
characteristic has not been a problem for me and I've not given much thought to
changing it.
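
The boundary effect is easier to see if you map a message date to the period directory it lands in. The names below are modelled on pipermail's period directory names (for example "2003-September" for monthly archiving). Treat the exact formats, and the function itself, as assumptions rather than pipermail's code.

```python
import datetime


def period_name(day, period):
    """Name of the archive period directory containing `day` (a date)."""
    if period == "year":
        return day.strftime("%Y")
    if period == "quarter":
        return "%dq%d" % (day.year, (day.month - 1) // 3 + 1)
    if period == "month":
        # %B is locale dependent; the C locale gives English month names.
        return day.strftime("%Y-%B")
    if period == "week":
        monday = day - datetime.timedelta(days=day.weekday())
        return monday.strftime("Week-of-Mon-%Y%m%d")
    if period == "day":
        return day.strftime("%Y%m%d")
    raise ValueError("unknown archive period: %r" % (period,))
```

Take a message dated 30 September 2003 and a reply dated 1 October 2003. With monthly archiving they map to different directories, and so to different MHonArc databases, so their thread link is cut. With weekly archiving both fall in Week-of-Mon-20030929.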

When $prefix/bin/arch is run on a list configured to use MHonArc, pipermail,
in keeping with the period archive approach, passes the messages for each period
to MHonArc for processing in temporary mbox files. Because of the way this is
done, the memory demanded by Mailman/pipermail when handling a large mbox is
no bigger than that needed for the biggest message in the input file. By
contrast, a single message being archived (by the ArchRunner, say) is passed to
MHonArc via stdin.
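
Here is a minimal sketch of the bulk path, assuming the mbox holds messages in date order. It streams the file one message at a time and writes each run of same-period messages to a temporary mbox, which it hands to a callback (for example, one that runs MHonArc on it). The function names and the callback interface are assumptions, as is the grouping by consecutive runs. This is not the patch's pipermail code.

```python
import os
import tempfile


def iter_mbox_messages(fp):
    """Yield each message of a binary mbox stream as bytes, one at a time.

    A message starts at each line beginning with b"From " (the mbox
    separator). Body lines starting with "From " are assumed to have been
    escaped (e.g. as ">From ") by whatever wrote the mbox.
    """
    current = []
    for line in fp:
        if line.startswith(b"From ") and current:
            yield b"".join(current)
            current = []
        current.append(line)
    if current:
        yield b"".join(current)


def batch_by_period(fp, period_of, handle_batch):
    """Write each run of same-period messages to a temporary mbox file.

    period_of(message_bytes) -> period name   (hypothetical helper)
    handle_batch(period, path)                (e.g. run MHonArc on path)
    Only one message is held in memory at a time.
    Returns the number of batches handed to handle_batch.
    """
    batches = 0
    period = None
    tmp = None

    def flush():
        # Close the current temp mbox, hand it over, then always delete it.
        tmp.close()
        try:
            handle_batch(period, tmp.name)
        finally:
            os.unlink(tmp.name)

    for msg in iter_mbox_messages(fp):
        key = period_of(msg)
        if tmp is None or key != period:
            if tmp is not None:
                flush()
                batches += 1
            period = key
            tmp = tempfile.NamedTemporaryFile(suffix=".mbox", delete=False)
        tmp.write(msg)
    if tmp is not None:
        flush()
        batches += 1
    return batches
```

If a period's messages are not contiguous in the input, that period simply receives more than one batch. In this sketch each later batch is then added to the existing period archive.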

My Mailman-HTdig integration patch works as normal with archives generated by
both pipermail and MHonArc. Just install and use it in the same way as usual. 

Mailman's pipermail features and configuration for obscuring mail addresses in
the HTML archives to thwart email address harvesters are not automatically
applied to MHonArc generated pages. Instead, features provided by MHonArc can be
used by modifying the MHonArc resource configuration files that control archive
index and message page generation: take copies of the default MRCFs installed in
$prefix/templates/en, modify them and add them to a site, virtual host or list
specific sub-directory of the template hierarchy.

The default MRCFs have default indexing control directives for HTdig embedded
in them, in anticipation of HTdig being used for archive search. With pipermail
generated HTML pages, the effect of changing the values of the Mailman config
variables ARCHIVE_INDEXING_ENABLE and ARCHIVE_INDEXING_DISABLE in
$prefix/Mailman/mm_cfg.py is dynamically incorporated into the pages. With
MHonArc generated pages, the same must be achieved by copying and modifying the
default MRCFs installed in $prefix/templates/en and adding them to a site,
virtual host or list specific sub-directory of the template hierarchy.
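
Both paragraphs above start with the same step: copy the default MRCFs into a more specific template directory, then edit the copies. Here is a minimal sketch of that copy step. The function name is hypothetical, and target_dir stands for whichever site, virtual host or list template directory your installation uses. The sketch skips files that already exist there, so local edits are not clobbered.

```python
import os
import shutil

# The three default MRCF names shipped by the patch in $prefix/templates/en.
MRCF_NAMES = ("mhonarc.mrc", "author.mrc", "subject.mrc")


def copy_default_mrcfs(default_dir, target_dir, overwrite=False):
    """Copy the default MRCFs into `target_dir` for local editing.

    Files already present in `target_dir` are kept unless `overwrite` is
    true. Returns the list of paths that were written.
    """
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for name in MRCF_NAMES:
        src = os.path.join(default_dir, name)
        dst = os.path.join(target_dir, name)
        if os.path.exists(dst) and not overwrite:
            continue
        shutil.copy2(src, dst)  # copy contents and file metadata
        copied.append(dst)
    return copied
```

After copying, edit the new files to enable MHonArc's own address-obscuring features or to change the HTdig directives. Then run bin/arch --wipe for the affected lists if MHONARC_SAVE_RESOURCES is True.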

Things Still To Do (Maybe)
--------------------------

1.	Integrated support for different languages 
2.	Obscuring mail addresses in the HTML archives


Applicability
-------------

This version of the patch is applicable to Mailman (MM) 2.1.18-1.

Necessary Precursors
--------------------

The following must be in place before applying this patch:

Note: This version of the patch does not require Mailman Patch #760567 (item 1
below); it has been adjusted for the DATA_FILE_VERSION of Mailman 2.1.18-1.
Ignore item 1.

1.	Mailman Patch #760567: both this patch and #760567 update the version number
of the Mailman list database in order to add extra attributes to it.
Note that this means that this MHonArc integration patch and #760567 will have
to be updated the next time the standard DATA_FILE_VERSION value is updated in
$prefix/Mailman/Version.py 
2.	MHonArc has to be installed on the Mailman server machine. My development
and testing have been done with MHonArc 2.6.8. When I installed MHonArc 2.6.8, I
just followed the instructions and the initial terminal dialogue looked like
this (where $prefix is the value of the Mailman --with-prefix ./configure
option):


> install.me
Checking dependencies:
        Fcntl ......................... ok
        File::Basename ................ ok
        Getopt::Long .................. ok
        Symbol ........................ ok
        Time::Local ................... ok
Pathname of perl executable: ("/usr/bin/perl") 
Directory to install executables: ("/usr/bin") $prefix/mhonarc/bin
Directory to install library files: ("/usr/lib/perl5/site_perl/5.6.1")
$prefix/mhonarc/lib/perl5/site_perl/5.6.1
Directory to install documentation: ("/usr/doc/MHonArc")
$prefix/mhonarc/doc/MHonArc
Directory to install manpages: ("/usr/share/man") $prefix/mhonarc/share/man
You have specified the following:
        Perl path: /usr/bin/perl
        Bin directory: $prefix/mhonarc/bin
        Lib directory: $prefix/mhonarc/lib/perl5/site_perl/5.6.1
        Doc directory: $prefix/mhonarc/doc/MHonArc
        Man directory: $prefix/mhonarc/share/man
Is this correct? ['y'] 
...


Changes Made
------------
 
See the Description and Implementation Details above. 

Applying the patch
------------------
 
Apply the patch from within the Mailman build directory using the command: 


    patch -p1 < path-to-patch-file

History
-------

Patch Version 	Changes
-------------   -------

2.1.18-1.1      1. Updated for MM 2.1.18-1 compatibility.
                2. Added code to Mailman/versions.py to add the which_archiver
                   and archiver_when_wiped attributes to existing lists.

2.1.11-0.1      1. Updated for MM 2.1.11 compatibility. 
                2. Code that spawns MHonArc now sets umask prior to running the
                   sub-process, to try to ensure file permissions are generated
                   correctly by MHonArc.

2.1.10-0.1      1. Update for MM 2.1.10 compatibility.

2.1.9-0.1       1. Update for MM 2.1.9 compatibility.

2.1.7-0.1       1. Update for MM 2.1.7 compatibility.

2.1.6-0.3       1. Corrects a long standing omission in the code of 
                   Mailman/Cgi/create.py which fails to get the initial setup
                   of lists created through the web quite right. This leads to
                   spurious errors being logged on message archiving until 
                   bin/arch --wipe is run for such a list. Lists created with
                   bin/newlist did not have this problem.

2.1.6-0.2       1. Corrects an error in bin/arch [an omitted mlist.Save()]
                   introduced in patch 2.1.6-0.1.

2.1.6-0.1       1. Update for MM 2.1.6 compatibility.

2.1.5-0.1       1. Update for MM 2.1.5 compatibility.

2.1.4-0.1       1. Update for MM 2.1.4 compatibility.

2.1.3-0.6       1. Added MHONARC_SAVE_RESOURCES config variable to Defaults.py.
                2. Associated changes in Mailman/Archiver/pipermail.py

2.1.3-0.5 	    1.  Fixed minor HTML syntax error in mhonarc.mrc and 
                    author.mrc that affected date and author index pages.

2.1.3-0.4 	    1.  Changed default value of MHONARC_ARCHIVER_PATH in 
                    Defaults.py to the empty string '' 
                2.  Changed behaviour so that if MHONARC_ARCHIVER_PATH is the 
                    empty string, the ability to change which archiver to use on 
                    the Archiving Options pages of the web admin GUI of lists is 
                    not displayed. In effect, until the configuration variable is
                    defined, the installation of this patch is not visible.


2.1.3-0.3       1.  Change to tolerate MHonArc prematurely (and validly) closing 
                    the pipe through which it is receiving a message from 
                    pipermail. 


2.1.3-0.2       First 'official release' 
2.1.3-0.1       Original 'unofficial release'