<p> This document describes the symptoms of Postfix SMTP server
overload. It presents permanent <a href="postconf.5.html">main.cf</a> changes to avoid overload
during normal operation, and temporary <a href="postconf.5.html">main.cf</a> changes to cope with
an unexpected burst of mail. When overload is caused by botnets or
other malware, the document suggests configuration settings that
help to minimize the impact on legitimate mail. Finally, it describes
stress-adaptive behavior, available since Postfix 2.5, which
automatically switches configuration settings under overload, and
makes specific suggestions both for Postfix 2.5 and later and for
earlier Postfix versions that lack this support. </p>
<p> Topics covered in this document: </p>

<li><a href="#hangup"> Disconnect suspicious SMTP clients </a>

<li><a href="#legacy"> Temporary measures for older Postfix releases </a>

<li><a href="#adapt"> Automatic stress-adaptive behavior </a>

<li><a href="#feature"> Detecting support for stress-adaptive behavior </a>

<li><a href="#forcing"> Forcing stress-adaptive behavior on or off </a>

<li><a href="#other"> Other measures to off-load zombies </a>

<li><a href="#credits"> Credits </a>

<h2><a name="overload"> Symptoms of Postfix SMTP server overload </a></h2>

<p> Under normal conditions, the Postfix SMTP server responds
immediately when an SMTP client connects to it; the time to deliver
mail is noticeable only with large messages. Performance degrades
dramatically when the number of SMTP clients exceeds the number of
Postfix SMTP server processes. When an SMTP client connects while
all Postfix SMTP server processes are busy, the client must wait
until a server process becomes available. </p>

<p> SMTP server overload may be caused by a surge of legitimate
mail (example: a DNS registrar opens a new zone for registrations),
by mistake (mail explosion caused by a forwarding loop), or by malice
(worm outbreak, botnet, or other illegitimate activity). </p>

<p> Symptoms of Postfix SMTP server overload are: </p>

<li> <p> Remote SMTP clients experience a long delay before Postfix
sends the "220 hostname.example.com ESMTP Postfix" greeting. </p>

<li> <p> NOTE: Broken DNS configurations can also cause lengthy
delays before Postfix sends "220 hostname.example.com ...". These
delays also exist when Postfix is NOT overloaded. </p>

<li> <p> NOTE: To avoid "overload" delays for end-user mail
clients, enable the "submission" service entry in <a href="master.5.html">master.cf</a> (present
since Postfix 2.1), and tell users to connect to this instead of
the public SMTP service. </p>

<li> <p> The Postfix SMTP server logs an increased number of "lost
connection after CONNECT" events. This happens because remote SMTP
clients disconnect before Postfix answers the connection. </p>

<li> <p> NOTE: A portscan for open SMTP ports can also result in
"lost connection ..." logfile messages. </p>

<li> <p> Postfix 2.3 and later logs a warning that all server ports
are busy. </p>

<p> Legitimate mail that doesn't get through during an episode of
Postfix SMTP server overload is not necessarily lost. It should
still arrive once the situation returns to normal, as long as the
overload condition is temporary. </p>

<h2><a name="concurrency"> Service more SMTP clients at the same time </a> </h2>

<p> One measure to avoid the "all server processes busy" condition
is to service more SMTP clients simultaneously. For this you need
to increase the number of Postfix SMTP server processes. This will
improve the
responsiveness for remote SMTP clients, as long as the server machine
has enough hardware and software resources to run the additional
processes, and as long as the file system can keep up with the
199
207
mail. See <a href="BACKSCATTER_README.html">BACKSCATTER_README</a> for examples of the latter.

<li> <p> Group your <a href="postconf.5.html#header_checks">header_checks</a> and <a href="postconf.5.html#body_checks">body_checks</a> patterns to avoid
unnecessary pattern matching operations: </p>

<blockquote>
<pre>
 1  /etc/postfix/header_checks:
 2      if /^Subject:/
 3      /^Subject: virus found in mail from you/ reject
 4      /^Subject: ..other../ reject
 5      endif
 6
 7      if /^Received:/
 8      /^Received: from (postfix\.org) / reject forged client name in received header: $1
 9      /^Received: from ..other../ reject ....
10      endif
</pre>
</blockquote>

<li> <p> Use "521" SMTP reply codes (Postfix 2.6 and later) or "421"
(Postfix 2.3-2.5) to hang up on clients that match botnet-related
RBLs (see next bullet) or that match selected non-RBL restrictions
such as SMTP access maps. The Postfix SMTP server will reject mail
and disconnect without waiting for the remote SMTP client to send
a QUIT command. </p>

<li> <p> To hang up connections from blacklisted zombies, you can
set specific Postfix SMTP server reject codes for specific RBLs,
and for individual responses from specific RBLs. We'll use
zen.spamhaus.org as an example; by the time you read this document,
details may have changed. Right now, their documents say that a
response of 127.0.0.10 or 127.0.0.11 indicates a dynamic client IP
address, which means that the machine is probably running a bot of
some kind. To give a 521 response instead of the default 554
response, use something like: </p>

<blockquote>
<pre>
 1 /etc/postfix/<a href="postconf.5.html">main.cf</a>:
 8     <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = hash:/etc/postfix/rbl_reply_maps
 9
10 /etc/postfix/rbl_reply_maps:
11     # With Postfix 2.3-2.5 use "421" to hang up connections.
12     zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13        $rbl_class [$rbl_what] blocked using
14        $rbl_domain${rbl_reason?; $rbl_reason}
15
16     zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17        $rbl_class [$rbl_what] blocked using
18        $rbl_domain${rbl_reason?; $rbl_reason}
</pre>
</blockquote>

<p> Although the above example shows three RBL lookups (lines 4-6),
Postfix will only do a single DNS query, so it does not affect
performance. </p>

<li> <p> With Postfix 2.3-2.5, use reply code 421 (521 will not
cause Postfix to disconnect). The down-side of replying with 421
is that it is appropriate only for zombies and other malware. If the
client is running a real MTA, then it may connect again several times
until the mail expires in its queue. When this is a problem, stick with
the default 554 reply, and use "<a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1" as
described below. </p>

<li> <p> You can automatically turn on the above overload measure
with Postfix 2.5 and later, or with earlier releases that contain
the stress-adaptive behavior source code patch from the mirrors
listed at <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>. Simply replace line
8 above with: </p>

<blockquote>
<pre>
 8     <a href="postconf.5.html#rbl_reply_maps">rbl_reply_maps</a> = ${stress?hash:/etc/postfix/rbl_reply_maps}
</pre>
</blockquote>

<p> More information about automatic stress-adaptive behavior is
in section "<a href="#adapt">Automatic stress-adaptive behavior</a>". </p>

<h2><a name="desperate"></a><a name="legacy"> Temporary measures for older Postfix releases </a></h2>

<p> See the next section, "<a href="#adapt">Automatic stress-adaptive
behavior</a>", if you are running Postfix version 2.5 or later, or
if you have applied the source code patch for stress-adaptive
behavior from the mirrors listed at <a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>. </p>

<p> The following measures can be applied temporarily during overload.
They still allow <b>most</b> legitimate clients to connect and send
mail, but may affect some legitimate clients. </p>
308
330
names that didn't bother to unsubscribe. No mail should be lost,
309
331
as long as this measure is used only temporarily. </p>

<li> <p> Disable remote SMTP client hostname lookups, so that all
SMTP client hostnames become "unknown" (line 5 below). This feature
was introduced with Postfix 2.3. Unfortunately, this measure is
more problematic than the other ones proposed so far. First, this
will result in loss of mail when you use hostname-based access rules
that reject mail from "unknown" SMTP clients (examples:
<a href="postconf.5.html#reject_unknown_client_hostname">reject_unknown_client_hostname</a>, <a href="postconf.5.html#reject_unknown_reverse_client_hostname">reject_unknown_reverse_client_hostname</a>).
Second, this may result in loss of mail when you subject "unknown"
SMTP clients to additional restrictions such as <a href="postconf.5.html#reject_unverified_sender">reject_unverified_sender</a>. </p>

<li> <p> Use an <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> of 1 instead of the default
100. This prevents clients from keeping idle connections open by
repeatedly sending NOOP or RSET commands. </p>

<blockquote>
<pre>
1 /etc/postfix/<a href="postconf.5.html">main.cf</a>:
2     <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = 10
3     <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 1
4     # Caution: line 5 may trigger REJECTs by hostname-based access rules
5     <a href="postconf.5.html#smtpd_peername_lookup">smtpd_peername_lookup</a> = no
6     <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> = 1
</pre>
</blockquote>

<p> Except when remote SMTP client hostname lookups are disabled,
no mail should be lost, as long
as these measures are used only temporarily. The next section of
this document introduces a way to automate this process. </p>
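
<p> As a minimal sketch of undoing the temporary measures once the
overload has passed: the fragment below assumes that the site ran
with the default values named in this document (300 seconds, 20 and
100) before the overload; sites that used other values should
restore those instead. Removing the temporary lines from <a href="postconf.5.html">main.cf</a>
has the same effect, and that is also the way to undo a temporary
"<a href="postconf.5.html#smtpd_peername_lookup">smtpd_peername_lookup</a> = no". Follow the change with a "postfix
reload" command. </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    # Assumed pre-overload values (the documented defaults).
    <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = 300s
    <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = 20
    <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> = 100
</pre>
</blockquote>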

<h2><a name="adapt"> Automatic stress-adaptive behavior </a></h2>

<p> Postfix version 2.5 introduces automatic stress-adaptive behavior.
This is also available as a source code patch for Postfix versions
2.4 and 2.3 from the mirrors listed at
<a href="http://www.postfix.org/download.html">http://www.postfix.org/download.html</a>. </p>

<p> It works as follows. When a "public" network service such as
the SMTP server runs into an "all server ports are busy" condition,
the Postfix <a href="master.8.html">master(8)</a> daemon logs a warning, restarts the service
(without interrupting existing network sessions), and runs the
service with "-o stress=yes" on the server process command line: </p>

<blockquote>
<pre>
80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes
</pre>
</blockquote>

<p> Normally, the Postfix <a href="master.8.html">master(8)</a> daemon runs such a service with
"-o stress=" on the command line (i.e. with an empty parameter
value): </p>

<blockquote>
<pre>
83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=
</pre>
</blockquote>

<p> Services that have local access only never have "-o stress"
parameters on the command line. This includes services internal to
Postfix such as the queue manager, and services that listen on a
loopback interface only, such as after-filter SMTP services. </p>

<p> The "stress" parameter value is the key to making <a href="postconf.5.html">main.cf</a>
parameter settings stress adaptive. The following settings are the
default with Postfix 2.6 and later. With earlier Postfix versions
that have stress-adaptive support, append the lines below to the
<a href="postconf.5.html">main.cf</a> file and issue a "postfix reload" command: </p>

<blockquote>
<pre>
1 <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> = ${stress?10}${stress:300}s
2 <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a> = ${stress?1}${stress:20}
3 <a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> = ${stress?1}${stress:100}
</pre>
</blockquote>

<li> <p> Line 1: under conditions of stress, use an <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a>
value of 10 seconds instead of the default 300 seconds. Experience
on the postfix-users list from a variety of sysadmins shows that
reducing the "normal" <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to 60s is unlikely to affect
legitimate clients. However, it is unlikely to become the Postfix
default because it's not RFC compliant. Setting <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> to
10s (line 1 above) or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail
from some clients. No mail should be lost, as long as this measure
is used only temporarily. </p>

<li> <p> Line 2: under conditions of stress, use an <a href="postconf.5.html#smtpd_hard_error_limit">smtpd_hard_error_limit</a>
of 1 instead of the default 20. This helps by disconnecting clients
after a single error, giving other clients a chance to connect.
However, this may cause significant delays with legitimate mail,
such as a mailing list that contains a few no-longer-active user
names that didn't bother to unsubscribe. No mail should be lost,
as long as this measure is used only temporarily. </p>

<li> <p> Line 3: under conditions of stress, use an
<a href="postconf.5.html#smtpd_junk_command_limit">smtpd_junk_command_limit</a> of 1 instead of the default 100. This
prevents clients from keeping idle connections open by repeatedly
sending NOOP or RSET commands. </p>
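
<p> To make the conditional syntax in these settings concrete, here
is an informal sketch of how the <a href="postconf.5.html#smtpd_timeout">smtpd_timeout</a> value would expand in
each state. It relies on the Postfix conventions that "${name?value}"
expands to "value" when "name" is non-empty, and that "${name:value}"
expands to "value" when "name" is empty. The lines are annotation
only, written as comments; they are not settings to copy into <a href="postconf.5.html">main.cf</a>. </p>

<blockquote>
<pre>
# Under overload, the service runs with "-o stress=yes":
#     ${stress?10}${stress:300}s    expands to    10s
#
# Normally, the service runs with "-o stress=" (empty value):
#     ${stress?10}${stress:300}s    expands to    300s
</pre>
</blockquote>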

<h2><a name="other"> Other measures to off-load zombies </a> </h2>

<p> OpenBSD <a href="http://www.openbsd.org/spamd/">spamd</a>
implements a daemon that handles all connections from "new" clients.
Only well-behaved mail clients are allowed to talk to the mail
server. Other clients are tarpitted, and will never get a chance
to affect mail server performance. </p>

<p> At some point in the future, Postfix may come with a simple
front-end daemon that does basic greylisting and pipelining detection
to keep zombies and other ratware away from Postfix itself. This
would use the "pass" service type which has been available in
stable Postfix releases since Postfix 2.5. </p>
<h2><a name="credits"> Credits </a></h2>