~msapiro/mailman/htdig_mhonarc

Mailman Patch #444884 (Mailman-htdig integration) Installation Details
----------------------------------------------------------------------
 

The information below is also available in the INSTALL.htdig-mm.html file which
is installed by this patch. 

Table of Contents
----------------
 

*	Patch identification 
*	Prerequisites 
*	Current version 
*	Changes introduced by this patch version 


*	Introduction 
*	Installing and Building Mailman with this patch 
*	What is Installed by the Patch 
*	Configuration of Mailman-htdig Integration 
*	Health Warning on the packet! 
*	Starting from Scratch (Again) 
*	General 
*	Permissions Considerations 
*	htdig 
*	Apache 


*	Local htdig Configuration 
*	Remote htdig Configuration 
*	Upgrading an Existing Standard Mailman Installation 
*	Changing from local to remote htdig or vice versa 
*	Coping with htdig Upgrades 
*	Changing the Addressing Scheme of your web_page_url 


*	Operational Information 
*	Notes and Warnings 
*	Archive security problems resolved by htdig-2.1.3-0.2 patch 
*	Private archive security problem prior to htdig-2.1.1-0.2.patch version 
*	Maintaining archive security with htdig-2.1.1-0.2.patch version and later 
*	Upgrading to htdig-2.1.1-0.2.patch or later from an earlier patch version 
*	Redhat 7.1 and 7.2 installations 
*	Apache/htdig issues 


*	Contributors 
*	History 
*	Compatibility 
*	Changes 


*	Appendices 
*	Appendix 1 -Technique for htdigging when Mailman's DEFAULT_URL uses the
https scheme 




Patch Identification
----------------

Different versions of this patch are available for different versions of
Mailman. There may be different versions of this patch for any given version of
Mailman, typically as a result of MM version specific improvements or
corrections of bugs in the patched code. The names of patch files for this patch
are structured as follows: 


    htdig-<MM-version-no>-<patch-version-no>.patch[.gz]
 

Thus, for instance, patch file htdig-2.1.4-0.1.patch is patch version 0.1 for
application to MM version 2.1.4 source code. 

The <patch-version-no> is reset to 0.1 for the first patch version applicable to
each new version of Mailman. 

The .gz suffix, if present, says that the patch file has been compressed using
gzip. 

As a general rule, you should use the highest patch version number for the MM
version you are installing. 
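
The naming scheme above can be checked mechanically. The following is only a sketch: parse_patch_name and newest_patch are hypothetical helpers written for these notes and are not part of the patch.

```python
import re

# htdig-<MM-version-no>-<patch-version-no>.patch[.gz]
PATCH_NAME = re.compile(
    r"^htdig-(?P<mm>\d+(?:\.\d+)*)-(?P<patch>\d+\.\d+)\.patch(?P<gz>\.gz)?$"
)


def parse_patch_name(filename):
    """Split a patch file name into its MM version, patch version and gzip flag."""
    match = PATCH_NAME.match(filename)
    if match is None:
        raise ValueError("not an htdig patch file name: %r" % filename)
    return {
        "mm_version": match.group("mm"),
        "patch_version": match.group("patch"),
        "gzipped": match.group("gz") is not None,
    }


def newest_patch(filenames, mm_version):
    """Return the name with the highest patch version for one MM version."""
    best_name, best_key = None, None
    for name in filenames:
        info = parse_patch_name(name)
        if info["mm_version"] != mm_version:
            continue
        # Compare patch versions numerically, so that 0.10 sorts after 0.9.
        key = tuple(int(part) for part in info["patch_version"].split("."))
        if best_key is None or key > best_key:
            best_name, best_key = name, key
    return best_name
```

For example, parse_patch_name("htdig-2.1.4-0.1.patch") reports MM version 2.1.4, patch version 0.1 and no gzip compression.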

Current Version
----------------

The current version of this patch is for Mailman 2.1.10: 
Mailman 2.1.10 	- 	htdig-2.1.10-0.1.patch 

Be sure to read the notes in the Changes section below about the patch version 
you are going to use.

Patches for previous versions of Mailman are frozen at the highest revision
level they reached while those previous versions of MM were current. 

Information about older Mailman and patch versions is given in the history
section below. 

Changes introduced by this patch version
----------------

The following changes are introduced by version 0.1 of this patch: 

*  Updated patch for MM 2.1.10 compatibility and made a change to the
   setup_htdig() function in HyperArch.py suggested by Mark Sapiro.
*  The frequency with which extra languages are being supported by Mailman 
   exceeds my capacity to cope. From htdig-2.1.9-0.1.patch on, only the English 
   language (default) templates are guaranteed to have been patched. If another
   language is used, the following files in that language's default template
   directory should be checked after installation of this patch and, if
   necessary, modified per the changes made to the en language templates:
                    
       templates/<lang>/archidxfoot.html
       templates/<lang>/archidxhead.html
       templates/<lang>/archtoc.html
       templates/<lang>/archtocentry.html
       templates/<lang>/archtocnombox.html
       templates/<lang>/article.html

Prerequisites
----------------

A working Htdig installation
----------------

You must have a working installation of htdig with htsearch available and
installed on either the machine on which you are running Mailman or on another
machine which has access to Mailman list archives via NFS or some similarly
competent network file sharing scheme. 

Regardless of how you configure things to provide Mailman's Web UI, if it gives
normal operation of the /mailman/private CGI script for providing access to
private list archives, it should also support access to htdig search results via
the /mailman/mmsearch and /mailman/htdig CGI scripts. 

Warning: This patch has been tested with HTdig 3.1.6 and no testing has been
done with the Beta versions of HTdig 3.2 at the time of writing. You may or may
not encounter problems/issues not described here if you use HTdig 3.2 beta or
stable releases. 

Other Mailman patches
---------------- 

Prior to installing this patch you may also need to install the other MM
patches. This will depend on the version of Mailman and the version of this
patch you are dealing with. For version 0.3 of this patch for MM 2.1.3 the
latest version of patch #444879, indexing-2.1.3-x.y.patch , is required. It is
available from: 

*	http://sourceforge.net/tracker/index.php?func=detail&aid=444879&group_id=103&atid=300103 
*	http://www.openinfo.co.uk/mailman/patches/444879/index.html 


For any other version of this patch details of its prerequisites are in the
version of INSTALL.htdig-mm file which is installed by that patch. 

Introduction
----------------

This integration enables use of the htdig (http://www.htdig.org) search engine
for searching mail list archives produced by pipermail, Mailman's built-in
archiver. 

You can use htdig without applying these patches to Mailman but you may find it
awkward to achieve some of the features offered by this patch. 

The main features of the patch are: 

1.	per list search facility with a search form on each list's TOC page. 
2.	maintenance of privacy of private archives. The user has to establish their
credentials via the normal private archive access mechanism before any access
via htdig is allowed. 
3.	a common base URL for both public and private archive access via htsearch
results. This means that htdig indices are unaffected by changing an archive
from private to public and vice versa. All access to archives via htdig is
controlled by wrapped CGI scripts called htdig.py and mmsearch.py. 

Note that Mailman's attachment scrubber creates a problem when it extracts
attachments from messages as they are being archived because it embeds absolute
URLs to what it has extracted in the archived messages. This can only be fixed
by running $prefix/bin/arch to rebuild the list's archive from its mbox file
after changing its archive from private to public or vice versa. This problem
is generic and unrelated to the use of this patch. One way of resolving it is to
use the Mailman-MHonArc integration patch #???????? available from 
*	TBA 
*	http://www.openinfo.co.uk/mailman/patches/mhonarc/index.html 


4.	a choice of running htdig on the machine running Mailman (aka local htdig)
or running htdig on another machine which has access to Mailman's archives via
NFS or some similarly competent network file sharing scheme (aka remote htdig). 
5.	cron activated scripts and crontab entry to run htdig regularly to maintain
the per list search indices. 
6.	automatic creation, deletion and maintenance of htdig configuration files
and such. Beyond installing htdig and telling Mailman where it is via mm_cfg you
do not have to do much other setup. 
7.	htdig search related web page elements are retrieved from the
$prefix/templates/ directory hierarchy so that site, virtual host, list and
language tailoring of them can be done. 


Installing and Building Mailman with this patch
----------------

Create your Mailman build directory in the normal way. 

You can apply the patch to either a fresh expansion of the Mailman source
distribution or the one you used to build a currently working Mailman
installation. 

Execute the following command in the Mailman build directory: 


    patch -p1 < path-to-htdig-2.m.n-x.y.patch
 

Follow the configure and make procedures for regular Mailman as given in the
$build/INSTALL file. 

Then follow the Mailman-htdig configuration instructions given below. 

What is Installed by the Patch
----------------
The patch amends:
 

$build/INSTALL 

Adds a reference to this file to the standard installation notes. 

$prefix/bin/check_perms 

To set the permissions for access to $prefix/archives/private/<listname>/htdig/
subdirectories to 2770. This prevents access by 'other', as a security measure. 
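
To see what a mode of 2770 grants, it can be decoded with Python's standard stat module. This is an illustrative sketch only; describe_mode and other_has_access are hypothetical helpers and are not code taken from check_perms.

```python
import stat

# Mode applied to the per-list htdig subdirectories: setgid plus rwx for
# owner and group, nothing for 'other'.
HTDIG_DIR_MODE = 0o2770


def describe_mode(mode):
    """Render a directory mode the way 'ls -l' would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


def other_has_access(mode):
    """True if any read, write or execute bit is set for 'other'."""
    return bool(mode & stat.S_IRWXO)
```

describe_mode(HTDIG_DIR_MODE) gives 'drwxrws---': the 's' in the group triplet is the setgid bit and the trailing '---' shows that 'other' is locked out.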

$prefix/Mailman/Archiver/HyperArch.py 

The changes in this file set up the per list htdig stuff such as config files
and adds the search forms to the list TOC pages. 

$prefix/Mailman/Queue/ArchRunner.py 

The changes in this file rewrite a list's TOC page if, when archiving a new
message for the list, the update time of the list's TOC page is after the last
time that rundig was run. This is only of relevance when one of the
remote_nightly_htdig series of cron scripts (see below) is being used. 

The only deficiency with this approach is that if no message is sent to the list
after rundig is run for the list the TOC page is not rewritten to reflect that
rundig was run. 

$prefix/Mailman/Cgi/private.py 

There is a security hole in the released Mailman code via which private.py will
serve files such as a list's archive pipermail.pck and files in the list's
archive database sub-directory. This hole also allows access to the list's
archive htdig sub-directory. Fixes for this are applied. As htdig.py (see below)
is based on private.py the same security fix has been incorporated into it. 

$build/Mailman/Defaults.py.in 

Adds the default configuration variables needed to support the mailman-htdig
integration 

$build/cron/crontab.in.in 

Adds the nightly_htdig cron script to the default crontab 

$build/configure 
$build/configure.in 
$build/Makefile.in 
$build/cron/Makefile.in 
$build/src/Makefile.in 
$build/bin/Makefile.in 

Changes to configuration and Makefiles used for installing Mailman 



The patch adds:
 

$build/INSTALL.htdig-mm and $build/INSTALL.htdig-mm.html 

These contain the material you are reading. 

$prefix/cgi-bin/htdig 
$prefix/Mailman/Cgi/htdig.py 

These are a CGI script and its wrapper. The script is always on the path of URLs
returned from searches of htdig indices. It provides secure access to
such URLs in the same way as $prefix/cgi-bin/private and
$prefix/Mailman/Cgi/private.py do. Both htdig.py and private.py ensure private
archives are kept private, applying the same criteria for permitting access.
Additionally, htdig.py delivers material from public archives without demanding
any authentication. 

$prefix/cgi-bin/mmsearch 
$prefix/Mailman/Cgi/mmsearch.py 

These are a CGI script and its wrapper. The script acts as a security wrapper
for htdig's htsearch CGI script. It will only run htsearch if the user is
authorized to access a list's archive. It applies the same criteria as
$prefix/Mailman/Cgi/private.py. In the case of local htdig operation, this
script runs htsearch as a sub-process and returns its results. In the case of
remote htdig operation mmsearch runs htsearch on the remote machine via one or
other of the CGI scripts remote_mmsearch and remote-mmsearch. 

$prefix/Mailman/Cgi/remote_mmsearch 
$prefix/Mailman/Cgi/remote-mmsearch 

these are companion scripts of mmsearch for use with remote htdig operation.
They are run by mmsearch via HTTP requests, and in turn run htsearch as a sub
process, returning the results it delivers. 

$prefix/bin/blow_away_htdig 

this is a utility script for removing per list htdig data, e.g. the config file
and indices/db files. This is necessary when: 
a.	ceasing use of the Mailman-htdig integration 
b.	moving from local to remote htdig or vice-versa 
c.	upgrading to a version of htdig which has an incompatible index/db file
format 
d.	changing the addressing scheme (http versus https) in the web_page_url
configuration variable of a list 
e.	reconstructing per-list htdig configuration files after upgrading to
htdig-2.1.1-0.2.patch or later from an earlier patch version, and prior to
running nightly_htdig 


$prefix/cron/nightly_htdig 
$prefix/cron/remote_nightly_htdig 
$prefix/cron/remote_nightly_htdig_noshare 
$prefix/cron/remote_nightly_htdig.pl 

These scripts all do the same thing; they can be installed as a cron task and
run regularly to invoke htdig's rundig script to update mailing list search
indices. Only one of these scripts is used, the choice of which depending on
your system configuration. 

nightly_htdig is used where Mailman and htdig run on the same system. 

the remote_... scripts are used where Mailman and htdig live on different
systems. You choose which one suits your needs best: 

remote_nightly_htdig uses the same python files on both systems, that is the
same .py and .pyc files are accessed, and it hence depends on compatible
bytecode between the Mailman system and htdig system. It also accesses Mailman
data files and depends on compatibility of data files contents, for example
pickled Python values. This should work OK if the same version of Python is
being run on both systems even where the systems are heterogeneous, for
example one is Sun/Solaris and the other is PC/Linux. 

remote_nightly_htdig_noshare shares no Python files between the two systems.
While it is still written in Python it acquires information from the file system
using directory listings and stat operations. 

remote_nightly_htdig.pl is a rewrite of remote_nightly_htdig_noshare in Perl. It
is for use where the htdig system does not have Python available on it: in which
case, shame on you. 
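
The choice between the four cron scripts can be summarised as a small decision function. This is a sketch of the reasoning above, not code shipped with the patch; choose_cron_script and its parameter names are hypothetical.

```python
def choose_cron_script(remote_htdig, python_on_htdig_host=True,
                       shares_python_files=False):
    """Pick the one cron script that fits a given system layout.

    remote_htdig         -- htdig runs on a different machine from Mailman
    python_on_htdig_host -- Python is available on the htdig machine
    shares_python_files  -- both machines see the same .py/.pyc and data
                            files and run compatible Python versions
    """
    if not remote_htdig:
        return "nightly_htdig"
    if not python_on_htdig_host:
        return "remote_nightly_htdig.pl"
    if shares_python_files:
        return "remote_nightly_htdig"
    return "remote_nightly_htdig_noshare"
```

For instance, choose_cron_script(True, python_on_htdig_host=False) selects the Perl rewrite, while choose_cron_script(False) selects nightly_htdig.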

$prefix/templates/en/TOC_htsearch.html 
$prefix/templates/en/htdig_access_error.html 
$prefix/templates/en/htdig_auth_failure.html 
$prefix/templates/en/htdig_conf.txt 

These are English language templates special to the htdig integration: 

TOC_htsearch.html 
the HTML of the search form that is embedded in a list's archive TOC page. 
htdig_access_error.html 
HTML page returned by mmsearch.py in the event of an access error for a page
access. 
htdig_auth_failure.html 
HTML page returned by mmsearch.py in the event of an authentication error for a
page access. 
htdig_conf.txt 
template for the per-list htdig.conf files generated by the patched code. 


Configuration of Mailman-htdig Integration
----------------

Configuration of the Mailman-htdig integration is carried out on the Mailman
side. While you must have to hand some information about your htdig
installation, you should not have to tinker much with htdig for the integration
to work. 

Most of the configuration of the integration is done by values assigned to
python variables in either $prefix/Mailman/Defaults.py or
$prefix/Mailman/mm_cfg.py. 

If you opt to run htdig on a different machine or under a different HTTP server
to the one running the HTTP server which provides Mailman's Web UI you will also
have to edit whichever of the patch's three htdig related cron scripts you opt
to run (remote_nightly_htdig, remote_nightly_htdig_noshare, or
remote_nightly_htdig.pl) to add a small amount of configuration information. 

Health Warning on the packet!
----------------

Be careful when editing configuration information in $prefix/Mailman/mm_cfg.py:
the only Mailman config file you should be editing. Check, double check and then
recheck before going ahead. If you get either variable names or their values
wrong a lot of confusion in the operation of both Mailman and htdig can result. 

You (and others supporting you) can spend hours trying to identify problems and
looking for non-existent bugs as a consequence of such editing errors. Expect to
find errors in these instructions; compensate for them and tell me when you do
(r.barrett at openinfo.co.uk). 

Also do read the htdig documentation, release notes etc. This patch integrates a
working htdig with htsearch available. These notes are about Mailman and
integrating it with that working htdig. It is up to you to sort out the htdig
end of things. 

Starting from Scratch (Again)
----------------

This is getting ahead of things but some of you may already be asking "What if
I've already been using an older version of this patch and want to start
afresh?", or "I want to change from local to remote htdig or vice versa?" 

In these cases your friend will be the $prefix/bin/blow_away_htdig script. It
removes existing htdig related stuff out of your Mailman installation to the
extent that it was added by this patch and added to by the normal operation of
pipermail and nightly_htdig. With that removed and a revised Mailman
configuration, the patched code will start rebuilding the htdig data. 

But before you get carried away with blow_away_htdig, read the rest of these
notes. 

General
----------------

This patch adds a number of default variables to the file
$prefix/Mailman/Defaults.py that affect operation of the Mailman-htdig
integration. These are in addition to the standard Mailman defaults in that
file. If, in the light of what is said below, you decide any of these are
incorrect, you can override them in $prefix/Mailman/mm_cfg.py [NOT IN
Defaults.py! See the comments in Defaults.py for why]. 

By default the Mailman-htdig integration is NOT ENABLED by the installation of
this patch; the default value of the USE_HTDIG variable in Defaults.py turns off
the operation of the integration. You have to actively override that default in
mm_cfg.py to turn on operation of the integration. 

Once a list is created, changing most of these variables will have either no
effect or a bad effect. You will need to run $prefix/bin/blow_away_htdig script
and/or $prefix/bin/arch to rebuild the archive pages if you make significant
changes to the Mailman-htdig integration configuration variables. 

The install process will not overwrite an existing mm_cfg.py file so you can
freely make changes to this file. If you are re-installing a later version of
this patch you may have to change what is already configured in the existing
file and, if necessary, add extra configuration variables to it. 

Most of the Mailman-htdig control variables default to sensible values which you
will not need to change, especially if you are using local htdig. The semantics
of most variables apply to both local and remote htdig operation but with some
the values assigned will depend on whether htdig is viewing things from the same
or a remote machine. 

The first two variables control what is indexed by htdig. The values assigned
are both embedded in the HTML generated by pipermail in the list archives and
added. Changing the values of these variables will mean that all previously
generated HTML pages in list archives will be out of date and you will probably
want to rebuild existing archives using $prefix/bin/arch: 

ARCHIVE_INDEXING_ENABLE 

Defines a string telling htdig that it should look at the following material
when building its indices. 


    Default: ARCHIVE_INDEXING_ENABLE = '<!--/htdig_noindex-->'
 
ARCHIVE_INDEXING_DISABLE 

Defines a string telling htdig that it should not look at the following
material when building its indices. 


    Default: ARCHIVE_INDEXING_DISABLE = '<!--htdig_noindex-->'
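
As an illustration of how the two marker strings work together, the sketch below wraps a fragment of HTML so that htdig skips it. The marker values are the defaults given above; hide_from_htdig is a hypothetical helper and is not the code pipermail actually uses.

```python
ARCHIVE_INDEXING_ENABLE = '<!--/htdig_noindex-->'
ARCHIVE_INDEXING_DISABLE = '<!--htdig_noindex-->'


def hide_from_htdig(html_fragment):
    """Bracket a fragment so that htdig ignores it when building indices."""
    # DISABLE switches indexing off, ENABLE switches it back on afterwards.
    return ARCHIVE_INDEXING_DISABLE + html_fragment + ARCHIVE_INDEXING_ENABLE
```

Note the order: the DISABLE marker comes first and the ENABLE marker closes the unindexed region.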
 
USE_HTDIG 

Semantics: 0 - don't use integrated htdig, 1 - use it 

Turns Mailman-htdig integration on or off. 


    Default: USE_HTDIG = 0


Notes: 
1.	when USE_HTDIG is turned on the patched code in Mailman will start adding
htdig stuff for any archiving-enabled mail lists as new posts for each list are
handled by Mailman. Until a new post is made after enabling with USE_HTDIG an
existing mail list's archive will not be htdig searchable. When the new post is
handled: 
a.	the list's personalised htdig config file is created 
b.	necessary links to the htdig config file are created 
c.	a search form is added to the TOC page for the list 


Even with this done, htdig searches only become available when htdig indices are
constructed. This is done when one or other of the patch's htdig related cron
scripts are run (nightly_htdig, remote_nightly_htdig,
remote_nightly_htdig_noshare, or remote_nightly_htdig.pl, depending on how you
configure your system). These can be run from the command line ahead of their
scheduled cron time to get htdig searches operational. 

2.	Turning USE_HTDIG off will not remove htdig indices or search forms from
existing archive-enabled lists. It will however stop htdig features from being
added to newly created lists. If you want to eliminate htdig from your existing
lists then use the $prefix/bin/blow_away_htdig script. 
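
A minimal mm_cfg.py override that turns the integration on might look like the fragment below. It is a sketch only and assumes every other htdig variable keeps its default from Defaults.py.

```python
# Fragment for $prefix/Mailman/mm_cfg.py (never edit Defaults.py).
# A stock mm_cfg.py already begins with "from Defaults import *", so only
# the override itself is shown here.

# Turn the Mailman-htdig integration on; the patch's default is 0 (off).
USE_HTDIG = 1
```

Existing lists only pick up htdig support when their next post is archived, as described in note 1 above.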



HTDIG_FILES_URL 

This is the URL of the directory containing various HTML and Graphics files
installed by htdig; files such as buttonr.gif, buttonl.gif and button1-10.gif.
The URL must end with a '/'. 


    Default: HTDIG_FILES_URL = '/htdig/'
 

The default assumes the HTTP servers providing access to htdig and to Mailman's
web UI are on the same machine and a symbolic link called 'htdig' has been put
into your HTTP server's top level HTML directory which points to the directory
your htdig install has put the actual files into; this link is often to
/usr/share/htdig. This value will depend on your htdig installation decisions
and HTTP server's configuration files (typically /etc/httpd/httpd.conf on a late
model Apache installation) i.e the Alias through which the link to the htdig
files are reached. 

HTDIG_CONF_LINK_DIR 

This is the name of a directory in which links to list specific htdig config
files are placed. 


    Default: HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig')
 

The VAR_PREFIX of the default is resolved to an actual file system path when
Mailman's 'make install' is run. The 'os.path.join' creates a full file
system path by gluing together the three pieces when Mailman is run. This
definition puts the directory alongside the default PUBLIC_ARCHIVE_FILE_DIR and
PRIVATE_ARCHIVE_FILE_DIR. Unless you are changing the value of these variables
you probably do not want to change HTDIG_CONF_LINK_DIR. 
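
To make the default concrete, the sketch below evaluates the same os.path.join expressions for an assumed VAR_PREFIX. The value '/var/mailman' is only an example; your installation's VAR_PREFIX is whatever configure resolved it to.

```python
import os

# Assumed value for illustration only; the real one is set at build time.
VAR_PREFIX = '/var/mailman'

# Standard Mailman archive directories and the htdig link directory,
# all built from the same two leading path components.
PUBLIC_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'public')
PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'private')
HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig')

# All three share one parent directory, which is why the link directory
# sits alongside the public and private archive directories.
ARCHIVES_PARENT = os.path.dirname(HTDIG_CONF_LINK_DIR)
```

On a POSIX system with this assumed prefix, HTDIG_CONF_LINK_DIR resolves to /var/mailman/archives/htdig.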

HTDIG_RUNDIG_PATH 

This is the path in your file system to the rundig shell script that is
installed as part of htdig. This tells one or other of the patch's htdig related
cron scripts (nightly_htdig and remote_nightly_htdig) where to find rundig in
order that they can execute it. 


    Default: HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
 
HTDIG_HTSEARCH_PATH 

This is the file path to the htsearch program in the htdig package. 


    Default: HTDIG_HTSEARCH_PATH = '/usr/local/bin/rundig'
 

This value will depend on your htdig installation decisions. This path is used
by either the mmsearch CGI script (for local htdig) or the
remote_mmsearch/remote-mmsearch CGI script (for remote htdig) to execute
htsearch as a sub-process. 

HTDIG_EXCLUDED_URLS 

See htdig's configuration file documentation. The value of this MM variable is
inserted into per-list htdig.conf files when they are created as the value of an
htdig excluded_urls directive. But if an exclusion in this value would prevent
indexing of URLs for accessing the htdig.py cgi wrapper then that exclusion is
omitted from that per-list htdig.conf file. 


    Default: HTDIG_EXCLUDED_URLS = '/cgi-bin/ .cgi'


Note: these are the same as the htdig 3.1.6 default values. 
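
The rule for dropping an exclusion can be pictured as follows. This is a sketch under an assumption: it treats each exclusion as a plain substring test against the URL of the htdig.py wrapper, which may not match the patch's actual logic. usable_exclusions and the example host name are hypothetical.

```python
def usable_exclusions(excluded_urls, htdig_cgi_url):
    """Drop any exclusion that would also exclude the htdig.py wrapper URL.

    excluded_urls -- space separated patterns, as in HTDIG_EXCLUDED_URLS
    htdig_cgi_url -- URL through which the list's archive is reached
    """
    kept = [pattern for pattern in excluded_urls.split()
            if pattern not in htdig_cgi_url]
    return ' '.join(kept)
```

With the default value, a wrapper URL such as http://lists.example.com/cgi-bin/mailman/htdig/mylist/ loses the '/cgi-bin/' exclusion and keeps '.cgi', whereas a URL without '/cgi-bin/' in its path keeps both.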

REMOTE_HTDIG 

Semantics: 0 - htdig runs on local machine, 1 - on remote machine 

Says whether htdig is going to be run on the same machine as Mailman or on another
machine. 


    Default: REMOTE_HTDIG = 0
 
REMOTE_PRIVATE_ARCHIVE_FILE_DIR 

Only relevant if REMOTE_HTDIG = 1. It is the file system path to the directory
in which Mailman stores private archives, as seen by the machine running htdig. 


    Default: REMOTE_PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 
                                              'archives', 'private')
 

The VAR_PREFIX of the default is resolved to an actual file system path when
Mailman's 'make install' is run. The 'os.path.join' creates a full file
system path by gluing together the three pieces when Mailman is run. If you
assign a value to this in mm_cfg.py, just put the relevant explicit file system
path in. 

REMOTE_MMSEARCH_URL 

Only relevant if REMOTE_HTDIG = 1. It is the URL on the htdig machine through
which whichever of the remote_mmsearch/remote-mmsearch CGI scripts you have
opted to use can be reached via an HTTP request. 


    Default: REMOTE_MMSEARCH_URL = '/cgi-bin/remote-mmsearch'
 
HTDIG_STRICT_FILE_PERM 

Semantics: 0 - 'other' access allowed, 1 - 'other' access denied 

Says whether 'other' has access permissions for per-list
$prefix/archives/private/<listname>/htdig/ directories. For local htdig
operation such access is not required and is a security hole if allowed. Such
access may be needed if remote htdig is used; see notes on "Apache".
$prefix/bin/check_perms should be run after changing the value of this variable
in mm_cfg.py to update access permissions of existing directories. 


    Default: HTDIG_STRICT_FILE_PERM = 1


HTDIG_EXTRAS 

You can assign a string value to this config variable and that string will be
included in all of your site's list specific htdig configuration files when they
are created. The value of the string can be any attribute declarations as
defined at http://www.htdig.org/confindex.html. 

Be cautious in what you do with this. Most sites will not need to use this at
all. But if you have some idiosyncratic htdig installation it might help
overcome problems in integrating with Mailman. If you think you need to use it I
suggest: 
a.	You try creating a test list without assigning a value to HTDIG_EXTRAS in
$prefix/Mailman/mm_cfg.py 
b.	Enable archiving for that test list. 
c.	Send a message to the test list so that its archive is created together with
its htdig configuration file. 
d.	Review the content of the list's htdig conf file in
$prefix/archives/private/<listname>/htdig/<listname>.conf. 
e.	You will see where the default value of HTDIG_EXTRAS from
$prefix/Mailman/Defaults.py has been inserted. This value is only an htdig
comment and does nothing. 
f.	Consider whether what you will assign to HTDIG_EXTRAS in
$prefix/Mailman/mm_cfg.py will make sense in the context of the rest of the
htdig conf file's contents. 
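
If, after the review above, you do decide to set HTDIG_EXTRAS, the assignment in
mm_cfg.py might look something like the sketch below. The two attributes and
their values are chosen purely for illustration and are assumptions, not
recommendations; check each attribute against the htdig documentation for the
version you run.

```python
# Illustrative only: the attributes and values are assumptions, not
# recommendations. Each line uses htdig's "attribute: value" form.
HTDIG_EXTRAS = """
max_doc_size: 400000
max_head_length: 20000
"""
```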




Permissions Considerations
----------------

htdig
----------------

Python scripts added by this patch (nightly_htdig and its relatives) run the
htdig rundig script identified by HTDIG_RUNDIG_PATH to build search indices for
Mailman archives. Code added by this patch generates per-list htdig
configuration files which are passed as a parameter to the rundig script. These
configuration files identify a list specific directory
($prefix/archives/private/<listname>/htdig) in which list specific data files
generated by and used by htdig are to be placed. 

However, the rundig script identified by HTDIG_RUNDIG_PATH may attempt to
generate some files in htdig's COMMON_DIR when it is first run by nightly_htdig;
the files concerned are likely to be root2word.db, word2root.db, synonyms.db and
possibly some others generated by htdig's htfuzzy program. The standard rundig
script generates these files selectively if they do not already exist. Depending
on how you have installed htdig and how the rundig script is first run, there
may be a permissions problem when nightly_htdig executes rundig under the mailman
UID if it tries to generate these files. 

You may need to either give the mailman UID write permission over htdig's
COMMON_DIR or, before the nightly_htdig script is first run, run htdig's htfuzzy
executable with a sufficiently privileged UID in the manner that the rundig
script would run htfuzzy, to create any necessary files in COMMON_DIR. 

See htdig's documentation for further information on this topic. 
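
A quick way to see whether this is likely to bite is to check, while running as
the mailman UID, whether htdig's COMMON_DIR is writable. The following is a
minimal sketch written for Python 3; the path is a hypothetical placeholder and
must be replaced with the COMMON_DIR of your own htdig installation, and the
function name is invented for this example.

```python
import os

# Hypothetical placeholder: substitute the COMMON_DIR of your htdig install.
COMMON_DIR = '/usr/local/share/htdig/common'


def can_create_files(directory):
    """Return True if the current user may create files in directory."""
    # Creating a file needs both write and search (execute) permission.
    return os.path.isdir(directory) and os.access(directory,
                                                  os.W_OK | os.X_OK)


if __name__ == '__main__':
    if can_create_files(COMMON_DIR):
        print('OK: this user can create files in %s' % COMMON_DIR)
    else:
        print('WARNING: this user cannot create files in %s' % COMMON_DIR)
```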

Apache
----------------

When remote_mmsearch or remote-mmsearch scripts are used as part of a remote
htdig strategy you may encounter a file permissions problem. This is because
these scripts, which in turn execute htsearch as a sub-process, will be run with
UID and GID of the remote Apache server. 

By default, the permissions of the per-list
$prefix/archives/private/<listname>/htdig/ directories only allow access for the
mailman UID and GID and hence the remotely executed htsearch will be unable to
access them. 

If this problem is encountered, then you will have to use the
HTDIG_STRICT_FILE_PERM configuration variable to say "open up the permissions"
before running $prefix/bin/check_perms. You can then use a RewriteRule or
similar in the Apache server's httpd.conf file to restrict access to
$prefix/archives/private/<listname>/htdig/ directories via the web server. 
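
To see how a given per-list htdig directory currently stands, the permission
bits for 'other' can be inspected directly. This is only a sketch for checking
by hand, written for Python 3; the function name is made up for this example
and is not part of Mailman.

```python
import os
import stat


def other_can_enter(directory):
    """Return True if 'other' has both read and execute on directory."""
    mode = os.stat(directory).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed
```

Called with the path of a list's htdig directory, a False result means that a
process running with neither the mailman UID nor the mailman GID, such as
htsearch started by the remote Apache server, would be unable to read the
directory.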

Local htdig Configuration
----------------

This configuration is for when you are running Mailman, htdig, the HTTP server
used to provide Mailman's web UI and htdig's htsearch CGI script, on the same
machine. 

You will need to: 

a.	If different to the default value, add the definition of HTDIG_RUNDIG_PATH
to file $prefix/Mailman/mm_cfg.py. 
b.	If different to the default value, add the definition of HTDIG_HTSEARCH_PATH
to file $prefix/Mailman/mm_cfg.py. 
c.	Add the definition of USE_HTDIG with the value 1 to
$prefix/Mailman/mm_cfg.py. 



        USE_HTDIG = 1
 

If necessary you can override the values of any of the other configuration
variables in file $prefix/Mailman/mm_cfg.py. 

In particular you might need to change the HTDIG_FILES_URL variable from its
default. This URL can be just the path i.e. absolute URL on the same server as
that which serves Mailman's Web UI, or a full URL identifying the scheme (http),
server, server port and path, for example
http://mailer.yourdomain.tld:8080/htdig/ 
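
Pulling the steps above together, the htdig-related part of mm_cfg.py for a
local setup might look like the sketch below. The two program paths are assumed
values for illustration and are only needed if they differ from the defaults;
the URL reuses the example just given.

```python
# htdig-related additions to $prefix/Mailman/mm_cfg.py (local htdig).
# The two paths are assumed values; omit them if the defaults are correct.
HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
HTDIG_HTSEARCH_PATH = '/usr/local/bin/htsearch'

# Turn on the Mailman-htdig integration.
USE_HTDIG = 1

# Only needed if the default does not suit your web server layout.
HTDIG_FILES_URL = 'http://mailer.yourdomain.tld:8080/htdig/'
```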

Remote htdig Configuration
----------------

This configuration is for when you are running htdig and an HTTP server
providing access to htsearch via remote_mmsearch or remote-mmsearch on a
different machine from the one running Mailman. 

For this configuration to work, htdig's programs, both those run from command
lines such as rundig and those run via CGI such as htsearch, must be able to see
Mailman archives through NFS. In the examples below we'll assume that
/mnt/mailman-archives on the htdig machine maps to $prefix/mailman/archives on
the Mailman machine. 

You should also arrange for the mailman UID and its GID to be common to both
machines. Remember that when rundig is called on the htdig machine to produce
search indices for each list it will be trying to write those files via NFS in
Mailman's archive area and will thus need to run with an appropriate identity
and permissions. 

The differences between the local and remote configuration are: 

1.	configuration values telling htdig where to find files are as viewed from
the remote machine. 
2.	configuration values giving URLs that refer to htdiggy things have to be as
viewed from the Mailman machine. 


You will need to: 

1.	Add the definition of HTDIG_HTSEARCH_PATH to file $prefix/Mailman/mm_cfg.py.
This is the path to htdig's htsearch on the remote machine running htdig. For
example: 


    HTDIG_HTSEARCH_PATH = '/usr/local/bin/htsearch'
 
2.	Add the definition of HTDIG_RUNDIG_PATH to file $prefix/Mailman/mm_cfg.py.
This is the path to rundig on the remote machine running htdig. For example: 


    HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
 
3.	Add the definition of REMOTE_MMSEARCH_URL to file $prefix/Mailman/mm_cfg.py.
This must be a full URL referring to one of Mailman's
remote_mmsearch/remote-mmsearch CGI scripts on the remote htdig machine, as seen
from the Mailman local machine. For example: 


    REMOTE_MMSEARCH_URL = 'http://htdiggy.your.com/cgi-bin/remote-mmsearch'
 
4.	Add the definition of HTDIG_FILES_URL to file $prefix/Mailman/mm_cfg.py.
This must be a full URL referring to the directory containing htdig files on the
remote htdig machine as seen from the Mailman local machine. This URL must end
with a '/'. For example:


    HTDIG_FILES_URL = 'http://htdiggy.your.com/htdig/'
 
5.	Add the definition of REMOTE_PRIVATE_ARCHIVE_FILE_DIR to
$prefix/Mailman/mm_cfg.py. This must be the absolute file system path to the
directory in which Mailman stores private archives as seen by the machine
running htdig. For example: 


    REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '/mnt/mailman-archives/private'
 
6.	Add the definition of USE_HTDIG with the value 1 to
$prefix/Mailman/mm_cfg.py. 


    USE_HTDIG = 1
 
7.	Add the definition of REMOTE_HTDIG with the value 1 to
$prefix/Mailman/mm_cfg.py. 


    REMOTE_HTDIG = 1
 
8.	If necessary add the definition of HTDIG_STRICT_FILE_PERM with the value 0
to $prefix/Mailman/mm_cfg.py. This may be needed if the UID/GID that Apache on
the htdig server will run the remote mmsearch script as is not mailman or in the
mailman group. This change will open up a security hole which you may want to
consider plugging; see under the heading "Apache" above for more details.


    HTDIG_STRICT_FILE_PERM = 0
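
Two of the rules above are easy to get wrong: REMOTE_MMSEARCH_URL and
HTDIG_FILES_URL must be full URLs, and HTDIG_FILES_URL must end with '/'. The
sketch below is a stand-alone check written for Python 3, not part of Mailman,
that could be run against the values before they go into mm_cfg.py; the
function name is invented for this example.

```python
from urllib.parse import urlsplit


def check_remote_urls(mmsearch_url, files_url):
    """Return a list of problems found in the two remote htdig URLs."""
    problems = []
    for name, url in (('REMOTE_MMSEARCH_URL', mmsearch_url),
                      ('HTDIG_FILES_URL', files_url)):
        parts = urlsplit(url)
        # A full URL has both a scheme (http or https) and a host.
        if parts.scheme not in ('http', 'https') or not parts.netloc:
            problems.append('%s is not a full URL: %r' % (name, url))
    if not files_url.endswith('/'):
        problems.append("HTDIG_FILES_URL must end with '/'")
    return problems


# The example values from steps 3 and 4 above.
for problem in check_remote_urls(
        'http://htdiggy.your.com/cgi-bin/remote-mmsearch',
        'http://htdiggy.your.com/htdig/'):
    print(problem)
```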
 


You have to choose one of the two remote mmsearch scripts found in
$prefix/Mailman/Cgi - remote-mmsearch (a Perl script) and remote_mmsearch (a
Python script) - to use and transfer it to the htdig machine. You need to add
this script to the directory in which the web server on the htdig machine
expects to find CGI scripts. Having transferred the script to your htdig machine
you will need to use a text editor to set the values of four configuration
variables below the heading "Edit the following configuration variables to suit
your installation", namely: 


MAILTO 
this is the default mail address for your installation. 
VALID_IP_LIST 
this is a list of IP numbers from which the script should accept an HTTP
request. Normally this should be set to the IP number of your machine running
Mailman. If the list is empty the script will accept HTTP requests from any
machine and be vulnerable to the exploit described under the heading "Private
archive security problem prior to htdig-2.1.1-0.2.patch version" below. 
HTDIG_CONF_LINK_DIR 
this is the file path to the directory in which links to list specific htdig
config files are placed, as viewed from the remote machine running htdig. 
HTDIG_HTSEARCH_PATH 
this is the file path to the htsearch program in the htdig package as viewed
from the remote machine running htdig. 
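
For the Python remote_mmsearch script, the edited settings might end up looking
something like the following. All four values are hypothetical, and the exact
form each variable takes in the shipped script is an assumption here, so follow
the layout already present in the script. The small function after them is not
taken from the script; it only illustrates the VALID_IP_LIST rule described
above, including the fact that an empty list accepts every requester.

```python
# Hypothetical values; keep whatever form the shipped script already uses.
MAILTO = 'mailman-owner@your.com'
VALID_IP_LIST = ['192.0.2.10']   # address of the machine running Mailman
HTDIG_CONF_LINK_DIR = '/mnt/mailman-archives/htdig'
HTDIG_HTSEARCH_PATH = '/usr/local/bin/htsearch'


def request_allowed(remote_addr, valid_ips=VALID_IP_LIST):
    """Illustrate the rule: an empty list means any requester is accepted."""
    if not valid_ips:
        return True
    return remote_addr in valid_ips
```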


See "What is Installed by the Patch" for an explanation of the differences
between these remote mmsearch scripts which both do the same job: being a
security wrapper around htdig's htsearch program to restrict searching of a
list's archive indexes to users authorised to see the contents of that archive. 

Note: You may need to change the '#!' on the first line of whichever of the
remote-mmsearch (Perl) and remote_mmsearch (Python) scripts you opt for so that
the correct interpreter is used for running the script on the remote htdig
machine. You may also need to verify the supporting packages/modules used by the
selected script are installed on that system. 


You have to choose one of the three remote_nightly_htdig scripts found in
$prefix/cron - remote_nightly_htdig, remote_nightly_htdig_noshare and
remote_nightly_htdig.pl - and transfer it to the htdig machine. See above under
heading "What is Installed by the Patch" for an explanation of the differences
between these scripts, which all do the same basic job. You should add the
script to the crontab for the mailman UID on the htdig machine. But first you
need to edit the selected script to add some configuration information. What has
to be added depends on which script you opt to use. In each case the variables
concerned are declared near the top of the script and you just have to enter the
appropriate values: 


remote_nightly_htdig 

you only need to set the value of the python variable MAILMAN_PATH to be the
directory $prefix as seen from the htdig machine. The whole Mailman installation
must be accessible via NFS in order to use this script. 

remote_nightly_htdig_noshare 

you need to copy the values for the following configuration variables from
either $prefix/Mailman/mm_cfg.py or $prefix/Mailman/Defaults.py to the script:
REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. The variables declared in
remote_nightly_htdig_noshare use the same names. This script only requires that
the archives directory of the Mailman installation be accessible via NFS. 

remote_nightly_htdig.pl 

you need to copy the values for the following configuration variables from
either $prefix/Mailman/mm_cfg.py or $prefix/Mailman/Defaults.py to the script:
REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. Being a Perl script, the
variables in remote_nightly_htdig.pl use the same names but prefixed with the
'$' character. This script only requires that the archives directory of the
Mailman installation be accessible via NFS. 
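
For example, using the NFS mount point assumed earlier in this section, the
edited variables in the two Python scripts might read as follows. All values
are illustrative, and '/mnt/mailman' in particular is a hypothetical mount
point for the whole Mailman installation; for remote_nightly_htdig.pl the same
names would carry a '$' prefix and use Perl syntax.

```python
# remote_nightly_htdig: $prefix as seen from the htdig machine.
# '/mnt/mailman' is a hypothetical NFS mount point.
MAILMAN_PATH = '/mnt/mailman'

# remote_nightly_htdig_noshare: values copied from mm_cfg.py or Defaults.py
# on the Mailman machine. Both are illustrative.
REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '/mnt/mailman-archives/private'
HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
```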



Note: You may need to change the '#!' on the first line of whichever of these
scripts you opt for so that the correct interpreter is used for running the
script on the remote htdig machine. You may also need to verify the supporting
packages/modules used by the selected script are installed on that system. 

As with the nightly_htdig script when running with local htdig, these scripts
can be run from the command line using the mailman UID in order to get htdig to
construct an initial set of indices. 


Upgrading an Existing Standard Mailman Installation
----------------

1.	You will want to suspend operation of Mailman while doing the upgrade.
Consider doing a shutdown of the MTA delivering mail to Mailman and removing
Mailman's crontab. 
2.	Configure and install as described above. 
3.	Restart Mailman's crontab and restart your MTA's delivery to Mailman. 
4.	If your installation already has archives: 
a.	Send a message to each of your archive-enabled lists. This will stimulate
the setup of the new per list htdig config files in the Mailman archives. 
b.	Consider rebuilding your existing archives with $prefix/bin/arch. This will
embed the ARCHIVE_INDEXING_ENABLE and ARCHIVE_INDEXING_DISABLE in the
regenerated archive pages and, after nightly_htdig has been run, give improved
search results. 
c.	Run the nightly_htdig script from the command line to generate an initial
set of per-list htdig search indices. 


Changing from local to remote htdig or vice versa
----------------

1.	You will want to suspend operation of Mailman while making this change.
Consider doing a shutdown of the MTA delivering mail to Mailman and removing
Mailman's crontab. 
2.	Run the $prefix/bin/blow_away_htdig script to remove all existing per list
htdig config files and htdig indices/db files. 
3.	Configure per the instructions above for the local or remote target. 
4.	Restart Mailman's crontab and restart your MTA's delivery to Mailman. 
5.	Send a message to each of your archive-enabled lists. This will stimulate
the set up of the new per list htdig config files in Mailman archives. 
6.	Run the nightly_htdig script from the command line to generate a new set of
per list htdig search indices. 


Coping with htdig Upgrades
----------------

If you change the version of htdig you run, you may find that the indices built
with the earlier version are not compatible with the newer version of htdig's
programs. In that case do the following: 

1.	You will want to suspend operation of Mailman while making this change.
Consider doing a shutdown of the MTA delivering mail to Mailman and removing
Mailman's crontab. 
2.	Run the $prefix/bin/blow_away_htdig script with the -i flag to remove all
existing per list htdig indices/db files. 
3.	Restart Mailman's crontab and restart your MTA's delivery to Mailman. 
4.	Run the nightly_htdig script from the command line to generate new sets of
per-list htdig search indices. 
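
Steps 2 and 4 could be scripted along the following lines. This is a sketch,
not something shipped with the patch: the function name and the
'/usr/local/mailman' prefix are assumptions, and the commands are only
assembled and printed here so that they can be checked before being run under
the mailman UID.

```python
import os.path


def reindex_commands(prefix):
    """Return the two command lines used after an htdig upgrade."""
    return [
        # Step 2: remove existing per-list htdig indices/db files only.
        [os.path.join(prefix, 'bin', 'blow_away_htdig'), '-i'],
        # Step 4: rebuild the per-list search indices.
        [os.path.join(prefix, 'cron', 'nightly_htdig')],
    ]


# '/usr/local/mailman' is an assumed value for $prefix.
for command in reindex_commands('/usr/local/mailman'):
    print(' '.join(command))
```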


Changing the Addressing Scheme of your web_page_url
----------------

If you change the addressing scheme of the web_page_url for a list to or from
http then you will need to rebuild the list's htdig configuration file(s) and
the related htdig indices. Do the following: 

1.	You may want to suspend operation of Mailman while making this change.
Consider doing a shutdown of the MTA delivering mail to Mailman and removing
Mailman's crontab. 
2.	Run the $prefix/bin/blow_away_htdig script to remove all existing per list
htdig material for the list(s) concerned. 
3.	Restart Mailman's crontab and restart your MTA's delivery to Mailman. 
4.	Send a message to each affected list to provoke reconstruction of the list's
htdig config file(s). 
5.	Run the nightly_htdig script from the command line to generate new sets of
per list htdig search indices. 


Operational Information
----------------

If you have just turned USE_HTDIG on or just used $prefix/bin/blow_away_htdig
(without the -i flag) there will be no per-list htdig information saved in the
archives. 

When the first post to each archive-enabled list is archived by pipermail, the
per-list htdig config file will be constructed and some directories and links
added to your Mailman archive directories. The htdig search form will be added
to list's TOC page. 

However, until one of the nightly_htdig scripts is run no htdig indices will be
constructed. You can either wait for the script to run as a cron job or run it
(while using the mailman UID) from the command line. 

Notes and Warnings
----------------

Archive security problems resolved by htdig-2.1.3-0.2 patch
----------------

This patch is hopefully the final step in closing security holes in archive
access. 

In version htdig-2.1.3-0.1.patch, htdig.py was rebased on the standard MM
release's private.py which had moved on since the snapshot of it used as the
basis for htdig.py was originally taken. Among other things, htdig.py had been
modified to prevent access to some files in list archive directories such as a
list's archive pipermail.pck and files in the list's archive database
sub-directory. 

This rebasing action re-introduced to htdig.py the security holes, still extant
in private.py despite it being later code, via which private.py would serve
files such as a list's archive pipermail.pck and files in the list's archive
database sub-directory. 

The permissions on these files and directories mean that they are inaccessible
via the web server using /pipermail/ URIs if a list's archive is public. 

Additionally, check_perms is now modified so that the list archive htdig
subdirectory permissions are set to 2770 by default. Prior to
htdig-2.1.1-0.2.patch, this could not be done as the htsearch script, being run
with uid and gid of the Apache server, could then not gain access to files in
the htdig subdirectories. But, since the introduction of the mmsearch script,
which runs with the mailman gid and spawns htsearch, it can. This prevents
access to the list archive htdig subdirectories via /pipermail/ URIs. Up until
htdig-2.1.3-0.2.patch this could only be achieved by using a RewriteRule or
similar in the Apache server's httpd.conf file. 

The only residual problem is that the revised permissions on the archive htdig
subdirectories may cause problems if the remote_mmsearch or remote-mmsearch
scripts are used. This is because they will be run with uid and gid of the
Apache server. If this problem is encountered, then you will have to manually
add read and execute permissions for 'other' to the archive htdig
subdirectories and read permission to their contents, and then use RewriteRule
or similar in the Apache server's httpd.conf file for protection. 
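
The effect of these modes can be seen by decoding them with the stat module of
Python 3. The sketch below is purely illustrative arithmetic on mode bits and
touches no files.

```python
import stat

# 2770: setgid directory, full access for owner and group, none for 'other'.
strict = stat.S_IFDIR | 0o2770

# The same directory after adding read and execute for 'other'.
opened = strict | stat.S_IROTH | stat.S_IXOTH

print(stat.filemode(strict))
print(stat.filemode(opened))
```

The last three characters of each printed string are the permissions for
'other': none in the first case, read and execute in the second.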

The solution to this problem has been superseded in htdig-2.1.3-0.3.patch, which
introduced the HTDIG_STRICT_FILE_PERM Mailman config variable as part of dealing
with the issue of htsearch access permissions to per-list htdig directories
when operating with remote htdig. See under the "Apache" heading above. 

Private archive security problem prior to htdig-2.1.1-0.2.patch version
----------------

Versions of the Mailman-htdig integration patch installed by versions of this
patch prior to htdig-2.1.1-0.2.patch allow a security exploit which can expose
information, held in the per-list search indexes of private list archives, to
unauthorised users. 

Via the exploit an unauthorized user can submit a search query to htdig's
htsearch CGI program without having been authenticated as a user allowed
to access the list archive concerned. The results, returned in good faith by
htsearch, will expose some information that the user is not entitled to see. 

However, the security breakdown is not complete. Attempts to follow links
returned by htsearch, which go via the htdig CGI script installed by this patch,
will be blocked if the user is not authorized to access the list archive. 

Maintaining archive security with htdig-2.1.1-0.2.patch version and later
----------------

With htdig-2.1.1-0.2.patch and later versions of the patch: 

1.	htsearch is no longer used directly via CGI for searching list archives. 
2.	The symbolic link named by the HTDIG_MAILMAN_LINK configuration variable is
no longer used. Indeed, when upgrading earlier installations this symlink should
be deleted and the configuration variable deleted. Without this symlink, on a
normally configured system, htsearch no longer has the unaided ability to access
the per-list htdig configuration and other list archive associated files. 
3.	Thus, even if htsearch can be reached via CGI, it cannot undertake a search
of list archives when requested to do so by an HTTP request which seeks to
circumvent list archive security. 
4.	A new script, $prefix/Mailman/Cgi/mmsearch.py, is now used to search list
archives. This script applies the same user authentication as private.py and
htdig.py. Only if a user is authorised to access a list, does mmsearch use
htdig's htsearch to search a list's archive. In this case, mmsearch provides
htsearch with the information it needs to access the per-list htdig
configuration and other list archive associated files. 
5.	Where htdig and Mailman are run on the same machine, mmsearch acts as a
security wrapper, runs htsearch as a sub-process and list security is preserved
by this means. 
6.	Where htdig is run on a different machine to Mailman, mmsearch can perform
user authentication but has problems in acting as a security wrapper for
htsearch. The solution adopted is for one of two companion CGI scripts
(remote-mmsearch written in Perl or remote_mmsearch written in Python) to be
invoked on the remote htdig machine by an HTTP request made by mmsearch on the
Mailman machine. These scripts run htsearch, providing it with the information
it needs to access the per-list htdig configuration and other list archive
associated files. But such an HTTP request can be made by other means and thus
the same security exploit we are trying to avoid still exists. The only
protection in the case of remote htdig operation is that the
remote-mmsearch/remote_mmsearch scripts can be configured to operate only on
HTTP requests originating from specified IP numbers. By restricting operation to
requests originating on the Mailman server some semblance of list privacy can be
preserved. 


Upgrading to htdig-2.1.1-0.2.patch or later from an earlier patch version
----------------

If you are upgrading a Mailman installation that has an earlier version of the
Mailman-htdig integration patch than that installed by htdig-2.1.1-0.2.patch
or later, you need to make some changes to that installation: 

1.	You must delete from your file system the symbolic link named by the
HTDIG_MAILMAN_LINK Mailman configuration variable. This link previously gave
htdig programs access to per list htdig configuration files. This is now done by
other means and the symlink allows a security exploit that prejudices the
privacy of list archives. 
2.	You must delete the HTDIG_MAILMAN_LINK Mailman configuration variable from
the $prefix/Mailman/mm_cfg.py file. 


These changes are in addition to the normal installation instructions given
below. Having configured and installed the newly patched version of Mailman you
must: 

1.	Run the script $prefix/bin/blow_away_htdig with the -c option to rebuild
per-list htdig conf files and delete existing per-list search indexes. 
2.	Run the $prefix/cron/nightly_htdig script from the command line to rebuild
per-list search indexes using the revised per-list htdig conf files just created
by blow_away_htdig. 


Redhat 7.1 and 7.2 installations
----------------

If you install htdig from the htdig-3.2.0 binary rpm of RH7.1/2 Binary CD 1 of 2
you also have to install the htdig-web-3.2.0 binary rpm. This may be from RH
7.1/2 Binary CD 2 of 2 or CD 1 of 2 depending on whether you are using actual
CDs or downloaded CD images. 

Apache/htdig issues
----------------

htdig's graphics files must be accessible via your web server and the Mailman
configuration variable HTDIG_FILES_URL set up accordingly. Depending on how you
install htdig and Apache you may need to add Alias and/or ScriptAlias directives
to your Apache configuration file to make the htdig components accessible. Check
the Apache and htdig documentation. 

Contributors
----------------

Original author and maintainer: 
Richard Barrett - <r.barrett at openinfo.co.uk> 
Past bug fixes: 
Nigel Metheringham <nigel.metheringham at vdata.co.uk> 
Stephan Berndts <stb-mm at spline.de>
Testers: 
*	Mark T. Valites <valites at geneseo.edu> 
*	Rehan van der Merwe <rehan at nha.co.za> 
Suggested Improvements:
Mark Sapiro <msapiro at msapiro.net>

History
----------------

Compatibility
----------------
Version of patch 	Version of Mailman 
htdig-2.1.10-0.1.patch 	Mailman 2.1.10 
htdig-2.1.9-0.1.patch 	Mailman 2.1.9 
htdig-2.1.7-0.1.patch 	Mailman 2.1.7 and 2.1.8
htdig-2.1.6-0.1.patch 	Mailman 2.1.6 
htdig-2.1.4-0.1.patch 	Mailman 2.1.4 
htdig-2.1.3-0.5.patch 	Mailman 2.1.3 
htdig-2.1.3-0.4.patch 	Mailman 2.1.3 
htdig-2.1.3-0.3.patch 	Mailman 2.1.3 
htdig-2.1.3-0.2.patch 	Mailman 2.1.3 
htdig-2.1.3-0.1.patch 	Mailman 2.1.3 
htdig-2.1.2-0.4.patch 	Mailman 2.1.2 
htdig-2.1.2-0.3.patch 	Mailman 2.1.2 
htdig-2.1.2-0.2.patch 	Mailman 2.1.2 
htdig-2.1.2-0.1.patch 	Mailman 2.1.2 
htdig-2.1.1-0.5.patch 	Mailman 2.1.1 
htdig-2.1.1-0.4.patch 	Mailman 2.1.1 
htdig-2.1.1-0.3.patch 	Mailman 2.1.1 
htdig-2.1.1-0.2.patch 	Mailman 2.1.1 
htdig-2.1.1-0.1.patch 	Mailman 2.1.1 
htdig-2.1-0.3.patch 	Mailman 2.1 
htdig-2.1-0.2.patch 	Mailman 2.1 
htdig-2.1-0.1.patch 	Mailman 2.1 
htdig-2.1b6-0.1.patch 	Mailman 2.1b6 
htdig-2.1b5-0.1.patch 	Mailman 2.1b5 
htdig-2.1b4-0.1.patch 	Mailman 2.1b4 
htdig-2.1b3-0.3.patch 	Mailman 2.1b3 
htdig-2.1b3-0.2.patch 	Mailman 2.1b3 
htdig-2.1b3-0.1.patch 	Mailman 2.1b3 
htdig-2.1b2-0.1.patch 	Mailman 2.1b2 
htdig-2.0.13-0.2.patch 	Mailman 2.0.13 
htdig-2.0.13-0.1.patch 	Mailman 2.0.13 
htdig-2.0.12-0.1.patch 	Mailman 2.0.12 
htdig-2.0.11-0.1.patch 	Mailman 2.0.11 
htdig-2.0.10-0.2.patch 	Mailman 2.0.10 
htdig-2.0.10-0.1.patch 	Mailman 2.0.10 
htdig-2.0.9-0.1.patch 	Mailman 2.0.9 
htdig-2.0.8-0.1.patch 	Mailman 2.0.8, 2.0.7, 2.0.6 and probably 2.0.3, 2.0.4
and 2.0.5 

Changes
----------------

htdig-2.1.10-0.1.patch:
1.  Updated patch for MM 2.1.10 compatibility.
2.  Change to setup_htdig() function in HyperArch.py suggested by 
    Mark Sapiro to ensure correct permissions set on directory creation.
    
htdig-2.1.9-0.1.patch:
1.  Updated patch for MM 2.1.9 compatibility.

htdig-2.1.7-0.1.patch:
1.  Updated patch for MM 2.1.7 compatibility.

htdig-2.1.6-0.1.patch:
1.  Updated patch for MM 2.1.6 compatibility.
    
    Note: the templates in $build/templates/<lang>/ for the following 
    languages are NOT modified by this patch or by its precursor indexing 
    patch: ca, eu, sr, sv
    
    The following files in a language's default template directory should be 
    modified per the changes made to the en language templates after 
    installation of this patch if that other language is used:
                    
        templates/<lang>/archidxfoot.html
        templates/<lang>/archidxhead.html
        templates/<lang>/archtoc.html
        templates/<lang>/archtocentry.html
        templates/<lang>/archtocnombox.html
        templates/<lang>/article.html

htdig-2.1.4-0.1.patch:
1.  Updated patch for MM 2.1.4 compatibility.
2.  Removed untranslated versions of htdig.html from per-language directories
    under $build/templates, with the exception of the default templates/en/ 
    directory, that were present in previous versions of this patch.

htdig-2.1.3-0.5.patch:
1.  Modified htdig.py and private.py; the security changes introduced by
    htdig-2.1.3-0.2 patch to these scripts incorrectly blocked access to the
    <listname>.mbox/<listname>.mbox file. The 0.5 revision of the patch corrects
    this error. This problem and a suggested fix were pointed out to me in a
    private email by Stephan Berndts <stb-mm at spline.de> 

htdig-2.1.3-0.4.patch:
1.  Modified htdig.py and introduced htdig.html templates. The changes mean that 
    if the user is challenged for authentication, when the credentials are 
    submitted and accepted, the URL requested which led to the challenge is then 
    presented.

htdig-2.1.3-0.3.patch: 
1.	Patch documentation layout revised and simplified. 
2.	Changes to $prefix/bin/check_perms and $prefix/Mailman/Archiver/HyperArch.py
to improve handling of htdig subdirectory permissions if remote htdig is used.
End result is the same as with prior patch version in the case of local htdig. 
3.	Introduced the HTDIG_STRICT_FILE_PERM Mailman config variable as part of
dealing with htsearch access to per-list htdig directories permissions issue
when operating with remote htdig. See under the "Apache" heading above. 


htdig-2.1.3-0.2.patch: 
1.	This patch is hopefully the final step in closing security holes in archive
access. See the discussion above under the heading "Archive security problems
resolved by htdig-2.1.3-0.2 patch". 


htdig-2.1.3-0.1.patch: 
1.	updated patch for MM 2.1.3 compatibility. 


htdig-2.1.2-0.4.patch: 
1.	corrected error in mmsearch.py and remote_mmsearch. This caused a problem if
https was being used for accessing the archives as a pattern match to extract
the list name was misused. 


htdig-2.1.2-0.3.patch: 
1.	updates HyperArch.py so htdig related code uses quick_maketext() function
instead of the Utils.Maketext() function. 


htdig-2.1.2-0.2.patch: 
1.	corrects stupid error inserted in unpublished htdig-2.1.1-0.5.patch and
carried forward into htdig-2.1.2-0.1.patch 


htdig-2.1.2-0.1.patch: 
1.	updated patch for MM 2.1.2 compatibility 


htdig-2.1.1-0.5.patch: 
1.	with the previous version the prototype htdig_conf.txt contained an htdig
exclude_urls directive for /cgi-bin/ and .cgi. If MM is configured so that the
URL for accessing the htdig.py cgi wrapper matches these excluded URLs (for
instance by running ./configure with --with-cgi-ext=".cgi") then nothing gets
indexed by rundig. The revised patch: 
a.	makes the excluded URL configurable through a MM config variable
HTDIG_EXCLUDED_URLS which defaults to the old hard-wired value. 
b.	when generating a list-specific htdig.conf file a check is made against
HTDIG_EXCLUDED_URLS and if anything in it would prevent indexing of the URL for
accessing the htdig.py cgi wrapper for that list, it is omitted from the
exclude_urls directive in that htdig.conf file. 
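
The check in (b) amounts to dropping any excluded pattern that occurs in the
list's htdig URL. The sketch below illustrates the idea only; it is not the
patch's code, the function name and the example URL are invented, and it
assumes htdig treats each space-separated exclude_urls entry as a substring to
match against URLs.

```python
def usable_exclusions(excluded_urls, htdig_url):
    """Keep only the patterns that would not block indexing of htdig_url."""
    return [pattern for pattern in excluded_urls.split()
            if pattern not in htdig_url]


# Default patterns, checked against a hypothetical list URL on a site
# configured with a '.cgi' extension on its CGI wrappers.
kept = usable_exclusions('/cgi-bin/ .cgi',
                         'http://www.your.com/mailman/htdig.cgi/test-list/')
print(' '.join(kept))
```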


htdig-2.1.1-0.4.patch: 
1.	mmsearch.py and its remote kin remote-mmsearch and mm_search were overly
restrictive on the form fields they were willing to accept. Extended the list so
that multi-page search results worked. 


htdig-2.1.1-0.3.patch: 
1.	corrects silly error in raising an exception in mmsearch.py. This will only
show if there is a problem with mmsearch running the htsearch program. 


htdig-2.1.1-0.2.patch: 
1.	This version corrects a security exploit which allowed a URL to obtain an
htsearch results page without the user being authorised to access the list. Any
attempt to follow links on the results page was blocked correctly by
$prefix/Mailman/htdig.py but there was leakage of private information from the
list's search indexes on the page returned by htdig's htsearch CGI program. The
exploit is removed by this patch's revisions. The following sections describe
the problem, the solution and special actions required when updating a Mailman
installation using an earlier version of this patch: 
a.	Private archive security problem prior to htdig-2.1.1-0.2.patch version. 
b.	Maintaining private archive security with htdig-2.1.1-0.2.patch version and
later. 
c.	Upgrading to htdig-2.1.1-0.2.patch or later from an earlier patch version. 

Note that there is no patch revision to deal with this security problem for MM
2.0.13 or earlier and you should seriously consider updating to MM 2.1.x if you
want to implement this security fix. 


htdig-2.1.1-0.1.patch: 
1.	No functional change. Applies without offset warnings to MM 2.1.1 


htdig-2.1-0.3.patch: 
1.	corrects errors in the way $prefix/Mailman/htdig.py worked out content type
of file being returned. 
2.	$prefix/Mailman/htdig.py adopts revised method for establishing the default
URL introduced in 2.1 and as used in $prefix/Mailman/MailList.py 
3.	removed unnecessary setup of variable DEFAULT_URL in cron scripts
$prefix/cron/remote_nightly_htdig_noshare and
$prefix/cron/remote_nightly_htdig.pl 
4.	Changes references to DEFAULT_URL in this document to DEFAULT_URL_PATTERN. 


htdig-2.1-0.2.patch: 
1.	improved content type and security handling in $prefix/Mailman/htdig.py.
Fixes bug with htdig.py and problem of interaction with bug in
$prefix/scripts/driver script (see patch #668685 for more details) 


htdig-2.1-0.1.patch: 
1.	Reworked patch for compatibility with MM 2.1. 


htdig-2.1b6-0.1.patch: 
1.	Reworked patch for compatibility with MM 2.1b6. 


htdig-2.1b5-0.1.patch: 
1.	Reworked patch for compatibility with MM 2.1b5. 


htdig-2.1b4-0.1.patch: 
1.	Reworked patch for compatibility with MM 2.1b4. As a consequence, the
remainder of the mailman-htdig integration templates that were strings declared
in Mailman/Archiver/HyperArch.py have been extracted into files under the
templates directory. Edit these with care if you must. 


htdig-2.1b3-0.3.patch: 
1.	Removed unnecessary code dependency on Python 2.2 file() function 


htdig-2.1b3-0.2.patch: 
1.	Removed syntax error in htdig-2.1b3-0.1.patch which showed up as logged
errors in the operation of the ArchRunner qrunner at line 721 of HyperArch.py 


htdig-2.1b3-0.1.patch: 
1.	Reworked patch for compatibility with MM 2.1b3 
2.	Removed non-English language template files which were acting as
placeholders until someone actually translated them. 
3.	Removed updateTOC.py and replaced it with an alternate mechanism in a patch
to $prefix/Mailman/Queue/ArchRunner.py to update list TOC page after reindexing
by htdig. This new method is only exercised when the remote_nightly_htdig series
of cron scripts are used. 
4.	Changes to remote_nightly_htdig series of cron scripts to reflect demise of
updateTOC cgi script. 
5.	Multiple instances of code hygiene and conformance to MM "standards"
cleanup. 
6.	Tidied up this documentation. 


htdig-2.1b2-0.1.patch: 
1.	reworked patch for compatibility with MM 2.1b2 


htdig-2.0.13-0.2.patch: 
1.	Added license header 


htdig-2.0.13-0.1.patch: 
1.	Rebuilt patch to get no-comment application on Mailman 2.0.13 


htdig-2.0.12-0.1.patch: 
1.	Rebuilt patch to get no-comment application on Mailman 2.0.12 
2.	Added HTDIG_EXTRAS config variable to allow arbitrary htdig configuration
parameters to be specified for addition to every htdig.conf file created, i.e.
site-wide additions. 


htdig-2.0.11-0.1.patch: 
1.	No substantive change. Simply rebuilt patch to get no-comment application on
Mailman 2.0.11 


htdig-2.0.10-0.2.patch: 
1.	Python 2.2 compatibility fixes to nightly_htdig cron script and its
relatives. Doing import * inside a function removed. 
2.	Added note on potential problems with htdig and file permissions. 


htdig-2.0.10-0.1.patch: 
1.	change in src/Makefile.in to get clean patch application to MM 2.0.10 


htdig-2.0.9-0.1.patch: 
1.	minor cosmetic changes to get clean patch application to MM 2.0.9 


htdig-2.0.8-0.1.patch: 
1.	Resolves a problem with the integration of htdig when the web_page_url for a
list, which is usually the same as DEFAULT_URL from either
$prefix/Mailman/Defaults.py or $prefix/Mailman/mm_cfg.py, does not use the http
addressing scheme. The problem arises because htdig will only build indices if
the URLs for pages use the http addressing scheme. There is a work-around for
this problem posted in htdig's mail archives; see the copy in Appendix 1 to
this document. 
2.	This patch revision implements the solution documented in that e-mail. If
non-http URLs are used by the web_page_url of a list an additional htdig
configuration file for use by htsearch is generated. 
3.	In all other respects the operation of the Mailman-htdig integration remains
unchanged. There is no benefit in upgrading to this revised patch unless you
need to use other than http addressing in your DEFAULT_URL or set other than
http addressing in the web_page_url configuration of any of your lists. 
4.	If changing to or from a non-http addressing scheme then the per list htdig
config files of the lists affected and their associated htdig indices must be
reconstructed. See the section below entitled "Changing the Addressing Scheme of
your web_page_url" for details of how to do this. 
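
As a rough illustration of the test implied by items 2 and 4 above, the
following Python sketch decides whether a list's web_page_url would call for
the additional htsearch configuration file. The function name
needs_htsearch_config and the example URLs are hypothetical and are not part of
the patch; the code is written for current Python 3, not the Python versions
that MM 2.0.x ran on.

```python
from urllib.parse import urlsplit


def needs_htsearch_config(web_page_url):
    """Return True when the URL uses any scheme other than plain http.

    Hypothetical helper: htdig only indexes http URLs, so a list whose
    web_page_url uses another scheme needs a separate htsearch config.
    """
    return urlsplit(web_page_url).scheme.lower() != "http"


# Example URLs are placeholders, not real list addresses.
plain = needs_htsearch_config("http://lists.example.com/mailman/")
secure = needs_htsearch_config("https://lists.example.com/mailman/")
```

A list whose result changes between runs of such a check is one whose htdig
config files and indices would need to be reconstructed.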


htdig-2.0.6-0.3.patch: 
1.	adds support for remote htdig, that is: running htdig on a different system
to Mailman. 
2.	enhances the configurability of the integration. Some of the programmed
assumptions made in previous versions are now configurable in mm_cfg.py. The
configuration variables concerned default to the previous fixed values so that
this version is backwards compatible with earlier versions. 
3.	does some minor cosmetic code changes. 
4.	extends the associated documentation. 




Appendices
----------------

Appendix 1 - Technique for htdigging when Mailman's web_page_url uses the https
scheme
 


A technique for htdigging when Mailman's web_page_url uses the https 
addressing scheme is described in this archived e-mail: 
http://www.htdig.org/mail/1999/10/0187.html

The text of that e-mail is as follows:

[htdig] Re: Help about htdig indexing https files

------------------------------------------------------------------------
Gilles Detillieux (grdetil at scrc.umanitoba.ca)
Wed, 27 Oct 1999 10:18:31 -0500 (CDT) 


------------------------------------------------------------------------
According to Edouard DESSIOUX: 
> >Currently, htdig will not support URLs that begin with https://, even
> >when using local_urls to bypass the server. A trick that might work 
> >would be to index using http:// instead, but use local_urls to point 
> >to the directory that contains the contents of the secure server. 
> 
> I used that, and now, when i use htsearch, it work, except the fact 
> that all my URL are http://x.y.z/ instead of https://x.y.z/ 
> 
> >You'd need to use separate 
> >configuration files for digging and searching, and use 
> >url_part_aliases in each of these configuration files to rewrite the 
> >http:// into https:// in the search results. 
> 
> This is the part i dont understand, and i would like you to explain. 


It basically works as a search and replace. One url_part_aliases in the 
configuration file used by htdig maps the http://x.y.z/ into some 
special code like "*site", and another url_part_aliases in the 
configuration file used by htsearch maps the "*site" back into the value 
you want, i.e. https://x.y.z/. The substitution is left to right in 
htdig, and right to left in htsearch. So, if you use the same config 
file for both, or the same setting for both, you get back what you 
started with (but saved some space in the database because of the 
encoding). However, if you use two separate config files with different 
url_part_aliases setting for htdig and htsearch, you can remap parts of 
URLs from one substring to another. 


I hope this makes things clearer. I thought the current description at 
http://www.htdig.org/attrs.html#url_part_aliases was already quite 
clear. 



-- 
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
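
The search-and-replace behaviour described in the e-mail can be modelled in a
few lines of Python. This is only a simplified sketch of the idea, not htdig's
implementation: the functions encode_url and decode_url, the two alias tables
and the example URL are all assumptions made for illustration.

```python
# Simplified model of the url_part_aliases idea; not htdig's actual code.

def encode_url(url, aliases):
    """Digging side: replace the first matching 'from' string with its code."""
    for original, code in aliases:
        if original in url:
            return url.replace(original, code, 1)
    return url


def decode_url(stored, aliases):
    """Search side: replace the first matching code with its 'from' string."""
    for original, code in aliases:
        if code in stored:
            return stored.replace(code, original, 1)
    return stored


# Hypothetical alias tables, one per configuration file.
DIG_ALIASES = [("http://x.y.z/", "*site")]      # config used by htdig
SEARCH_ALIASES = [("https://x.y.z/", "*site")]  # config used by htsearch

original_url = "http://x.y.z/pipermail/mylist/index.html"
stored = encode_url(original_url, DIG_ALIASES)   # value kept in the database
shown = decode_url(stored, SEARCH_ALIASES)       # value shown in results
round_trip = decode_url(stored, DIG_ALIASES)     # same table on both sides
```

With the same table on both sides the original http URL comes back unchanged;
with the two different tables the stored URL is presented with the https
scheme in the search results.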