* Service more SMTP clients at the same time
* Spend less time per SMTP client
* Disconnect suspicious SMTP clients
* Temporary measures for older Postfix releases
* Automatic stress-adaptive behavior
* Detecting support for stress-adaptive behavior
* Forcing stress-adaptive behavior on or off
* Other measures to off-load zombies

Symptoms of Postfix SMTP server overload

Under normal conditions, the Postfix SMTP server responds immediately when an
SMTP client connects to it; the time to deliver mail is noticeable only with
large messages. Performance degrades dramatically when the number of SMTP
clients exceeds the number of Postfix SMTP server processes. When an SMTP
client connects while all Postfix SMTP server processes are busy, the client
must wait until a server process becomes available.

SMTP server overload may be caused by a surge of legitimate mail (example: a
DNS registrar opens a new zone for registrations), by mistake (mail explosion
caused by a forwarding loop) or by malice (worm outbreak, botnet, or other
illegitimate activity).

Symptoms of Postfix SMTP server overload are:

* Remote SMTP clients experience a long delay before Postfix sends the "220
hostname.example.com ESMTP Postfix" greeting.

o NOTE: Broken DNS configurations can also cause lengthy delays before
Postfix sends "220 hostname.example.com ...". These delays also exist
when Postfix is NOT overloaded.

o NOTE: To avoid "overload" delays for end-user mail clients, enable the
"submission" service entry in master.cf (present since Postfix 2.1),
and tell users to connect to this instead of the public SMTP service.

* The Postfix SMTP server logs an increased number of "lost connection after
CONNECT" events. This happens because remote SMTP clients disconnect before
Postfix answers the connection.

o NOTE: A portscan for open SMTP ports can also result in "lost
connection ..." logfile messages.

* Postfix 2.3 and later logs a warning that all server ports are busy:

Oct 3 20:39:27 spike postfix/master[28905]: warning: service "smtp"
condition, increase the process count in master.cf or reduce the
service time per client
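
To check a mail log for the symptoms listed above, a small script can count them per minute. The Python sketch below makes several assumptions. It expects the traditional syslog line format shown in the example warning. It matches only the message fragments quoted in this document ("lost connection after CONNECT" and "increase the process count in master.cf"). The name scan_overload_symptoms is hypothetical and is not part of Postfix.

```python
import re
import sys
from collections import Counter

# Message fragments quoted in this document. Real log lines contain more
# text; matching on a fragment keeps the sketch independent of exact wording.
LOST_AFTER_CONNECT = "lost connection after CONNECT"
PROCESS_LIMIT_HINT = "increase the process count in master.cf"

# Assumed traditional syslog prefix, as in the example above:
#   "Oct 3 20:39:27 spike postfix/master[28905]: ..."
# RFC 3339 timestamps, as written by some syslog daemons, will not match.
SYSLOG_PREFIX = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):\d{2}\s+\S+\s+(\S+?)\[\d+\]:"
)


def scan_overload_symptoms(lines):
    """Count overload symptoms per minute (hypothetical helper).

    Returns two Counters keyed by "Mon D HH:MM": smtpd "lost connection
    after CONNECT" events, and master(8) process-limit warnings.
    """
    lost = Counter()
    busy = Counter()
    for line in lines:
        m = SYSLOG_PREFIX.match(line)
        if m is None:
            continue  # not a syslog line in the assumed format
        month, day, hour, minute, program = m.groups()
        key = f"{month} {int(day)} {hour}:{minute}"
        if program.endswith("/smtpd") and LOST_AFTER_CONNECT in line:
            lost[key] += 1
        elif program.endswith("/master") and PROCESS_LIMIT_HINT in line:
            busy[key] += 1
    return {"lost_after_connect": lost, "process_limit_warnings": busy}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_overload.py /path/to/maillog (the path varies by system)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        counts = scan_overload_symptoms(log)
    lost, busy = counts["lost_after_connect"], counts["process_limit_warnings"]
    for key in sorted(set(lost) | set(busy)):  # sorted as text, not by date
        print(f"{key}  lost-after-CONNECT={lost[key]}  process-limit={busy[key]}")
```

A high "lost connection" count alone is not proof of overload, because a portscan produces the same messages. Minutes that also show master(8) process-limit warnings are the stronger signal.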

Legitimate mail that doesn't get through during an episode of Postfix SMTP
server overload is not necessarily lost. It should still arrive once the
situation returns to normal, as long as the overload condition is temporary.

Service more SMTP clients at the same time

One measure to avoid the "all server processes busy" condition is to service
more SMTP clients simultaneously. For this you need to increase the number of
Postfix SMTP server processes. This will improve the responsiveness for remote
SMTP clients, as long as the server machine has enough hardware and software
resources to run the additional processes, and as long as the file system can
keep up with the additional load.
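
You can roughly estimate how many server processes a given load needs with Little's law. It says the average number of busy processes equals the connection arrival rate times the average time each client keeps a process busy. The Python sketch below is only a back-of-the-envelope model and is not something Postfix computes. The function name and the numbers in it are hypothetical.

```python
import math


def busy_processes(connections_per_second: float, seconds_per_session: float) -> int:
    """Estimate the average number of busy SMTP server processes.

    Little's law: L = lambda * W, where lambda is the connection arrival
    rate and W the average time a session occupies a process. The result
    is rounded up because a process limit is a whole number. It is an
    average; bursts need headroom above it.
    """
    if connections_per_second < 0 or seconds_per_session < 0:
        raise ValueError("rate and session time must be non-negative")
    return math.ceil(connections_per_second * seconds_per_session)


if __name__ == "__main__":
    # Hypothetical load: 5 connections/s, each session lasting 4 s on average.
    print(busy_processes(5, 4))    # 20
    # Hypothetical idle zombies that each hold a process until smtpd_timeout:
    print(busy_processes(2, 300))  # 600 with the 300s default
    print(busy_processes(2, 10))   # 20 with a 10s timeout
```

The last two numbers show why adding processes is not the only lever: shortening the time spent per client reduces the demand just as effectively.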

by hanging up on suspicious clients, so that other clients get a chance to talk
* Use "521" SMTP reply codes (Postfix 2.6 and later) or "421" (Postfix
2.3-2.5) to hang up on clients that match botnet-related RBLs (see next
bullet) or that match selected non-RBL restrictions such as SMTP access
maps. The Postfix SMTP server will reject mail and disconnect without
waiting for the remote SMTP client to send a QUIT command.

* To hang up connections from blacklisted zombies, you can set specific
Postfix SMTP server reject codes for specific RBLs, and for individual
responses from specific RBLs. We'll use zen.spamhaus.org as an example; by
the time you read this document, details may have changed. Right now, their
documents say that a response of 127.0.0.10 or 127.0.0.11 indicates a
dynamic client IP address, which means that the machine is probably running
a bot of some kind. To give a 521 response instead of the default 554
response, use something like:

1 /etc/postfix/main.cf:
8 rbl_reply_maps = hash:/etc/postfix/rbl_reply_maps
10 /etc/postfix/rbl_reply_maps:
11 # With Postfix 2.3-2.5 use "421" to hang up connections.
12 zen.spamhaus.org=127.0.0.10 521 4.7.1 Service unavailable;
13     $rbl_class [$rbl_what] blocked using
14     $rbl_domain${rbl_reason?; $rbl_reason}
16 zen.spamhaus.org=127.0.0.11 521 4.7.1 Service unavailable;
17     $rbl_class [$rbl_what] blocked using
18     $rbl_domain${rbl_reason?; $rbl_reason}

Although the above example shows three RBL lookups (lines 4-6), Postfix
will only do a single DNS query, so the extra lookups do not affect
performance.

* With Postfix 2.3-2.5, use reply code 421 (521 will not cause Postfix to
disconnect). The downside of replying with 421 is that it works only for
zombies and other malware. If the client is running a real MTA, then it may
connect again several times until the mail expires in its queue. When this
is a problem, stick with the default 554 reply, and use
"smtpd_hard_error_limit = 1" as described below.

* You can automatically turn on the above overload measure with Postfix 2.5
and later, or with earlier releases that contain the stress-adaptive
behavior source code patch from the mirrors listed at
http://www.postfix.org/download.html. Simply replace line 8 above with:

8 rbl_reply_maps = ${stress?hash:/etc/postfix/rbl_reply_maps}
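
The version rules in the bullets above, which decide whether a reply code makes the Postfix SMTP server hang up, fit in a small table. This Python sketch encodes only what this section states. Reply code 421 disconnects with Postfix 2.3 and later, 521 disconnects with Postfix 2.6 and later, and the default 554 rejects without disconnecting. The function names are hypothetical, and no other Postfix behavior is modeled.

```python
from __future__ import annotations

# Minimum Postfix version at which each reply code makes smtpd hang up,
# as stated in this section. Codes not listed (e.g. the default 554)
# reject mail but leave the connection open.
HANGUP_SINCE = {
    421: (2, 3),
    521: (2, 6),
}


def hangs_up(reply_code: int, version: tuple[int, int]) -> bool:
    """True if reply_code disconnects the client on this (major, minor) release."""
    since = HANGUP_SINCE.get(reply_code)
    return since is not None and version >= since


def botnet_reply_code(version: tuple[int, int]) -> int | None:
    """Reply code this section suggests for botnet-related RBL matches.

    Prefers 521 where it disconnects, falls back to 421, and returns None
    for releases older than 2.3, where neither code hangs up.
    """
    for code in (521, 421):
        if hangs_up(code, version):
            return code
    return None
```

The caveat above still applies. A real MTA treats 421 as a temporary error and retries. When that is a problem, keep the default 554 and use "smtpd_hard_error_limit = 1" instead.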

More information about automatic stress-adaptive behavior is in section
"Automatic stress-adaptive behavior".

Temporary measures for older Postfix releases

See the next section, "Automatic stress-adaptive behavior", if you are running
Postfix version 2.5 or later, or if you have applied the source code patch for
stress-adaptive behavior from the mirrors listed at http://www.postfix.org/

The following measures can be applied temporarily during overload. They still
allow most legitimate clients to connect and send mail, but may affect some
legitimate clients.

* Reduce smtpd_timeout (default: 300s). Experience on the postfix-users list
from a variety of sysadmins shows that reducing the "normal" smtpd_timeout
longer-active user names that didn't bother to unsubscribe. No mail should
be lost, as long as this measure is used only temporarily.

* Use an smtpd_junk_command_limit of 1 instead of the default 100. This
prevents clients from keeping idle connections open by repeatedly sending
NOOP or RSET commands.

1 /etc/postfix/main.cf:
2 smtpd_timeout = 10
3 smtpd_hard_error_limit = 1
4 smtpd_junk_command_limit = 1

With these measures, no mail should be lost, as long as they are used only
temporarily. The next section of this document introduces a way to automate
this process.
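
If you apply the temporary settings above by hand, a script can do both the change and the later revert. The Python sketch below only builds and prints the command lines, so it is a dry run. It assumes the postconf(1) "-e" option, which edits main.cf in place, and the "postfix reload" command. The revert step restores the defaults quoted in this document (300s, 20, 100), not whatever values your main.cf had before. The function and variable names are hypothetical.

```python
from __future__ import annotations

import shlex
import sys

# Temporary overload settings from the main.cf example above.
OVERLOAD = {
    "smtpd_timeout": "10",
    "smtpd_hard_error_limit": "1",
    "smtpd_junk_command_limit": "1",
}

# Defaults quoted in this document. If your main.cf used other values,
# record them first (e.g. with "postconf -n") and restore those instead.
DEFAULTS = {
    "smtpd_timeout": "300s",
    "smtpd_hard_error_limit": "20",
    "smtpd_junk_command_limit": "100",
}


def build_commands(settings: dict[str, str]) -> list[list[str]]:
    """Return argv lists: one "postconf -e" edit, then "postfix reload"."""
    pairs = [f"{name}={value}" for name, value in settings.items()]
    return [["postconf", "-e", *pairs], ["postfix", "reload"]]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    mode = sys.argv[1] if len(sys.argv) > 1 else "apply"
    settings = DEFAULTS if mode == "revert" else OVERLOAD
    for argv in build_commands(settings):
        print(shlex.join(argv))
```

To actually run the commands, pass each argv list to subprocess.run(argv, check=True) with root privileges.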

Automatic stress-adaptive behavior

Postfix version 2.5 introduces automatic stress-adaptive behavior. This is also
available as a source code patch for Postfix versions 2.4 and 2.3 from the
mirrors listed at http://www.postfix.org/download.html.

It works as follows. When a "public" network service such as the SMTP server
runs into an "all server ports are busy" condition, the Postfix master(8)
daemon logs a warning, restarts the service (without interrupting existing
network sessions), and runs the service with "-o stress=yes" on the server
process command line:

80821 ?? S 0:00.24 smtpd -n smtp -t inet -u -c -o stress=yes

Normally, the Postfix master(8) daemon runs such a service with "-o stress=" on
the command line (i.e. with an empty parameter value):

83326 ?? S 0:00.28 smtpd -n smtp -t inet -u -c -o stress=

Services that have local access only never have "-o stress" parameters on the
command line. This includes services internal to Postfix such as the queue
manager, and services that listen on a loopback interface only, such as
after-filter SMTP services.
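
To see whether master(8) is currently running a service in stress mode, look for the "-o stress=..." argument in process listings like the two above. The Python sketch below parses one such line. It assumes the five-column "PID TT STAT TIME COMMAND" layout shown in the examples, which is typically what "ps ax" prints. The function name is hypothetical.

```python
from __future__ import annotations


def stress_state(ps_line: str) -> str | None:
    """Return the "-o stress=..." value from a ps(1) line, or None.

    Assumes the "PID TT STAT TIME COMMAND" layout of the example lines.
    Returns "yes" when the service runs under stress, "" when it is
    stress-adaptive but not under stress, and None when there is no
    stress argument at all (for example, a local-only service).
    """
    fields = ps_line.split(None, 4)
    if len(fields) < 5:
        return None
    argv = fields[4].split()
    # Look at consecutive argument pairs for "-o" followed by "stress=...".
    for option, value in zip(argv, argv[1:]):
        if option == "-o" and value.startswith("stress="):
            return value[len("stress="):]
    return None
```

For example, you can pass it each line of subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout.splitlines() and report smtpd processes for which it returns "yes".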

The "stress" parameter value is the key to making main.cf parameter settings
stress adaptive. The following settings are the default with Postfix 2.6 and
later. With earlier Postfix versions that have stress-adaptive support, append
the lines below to the main.cf file and issue a "postfix reload" command:

1 smtpd_timeout = ${stress?10}${stress:300}s
2 smtpd_hard_error_limit = ${stress?1}${stress:20}
3 smtpd_junk_command_limit = ${stress?1}${stress:100}

* Line 1: under conditions of stress, use an smtpd_timeout value of 10
seconds instead of the default 300 seconds. Experience on the postfix-users
list from a variety of sysadmins shows that reducing the "normal"
smtpd_timeout to 60s is unlikely to affect legitimate clients. However, it
is unlikely to become the Postfix default because it is not RFC compliant.
Setting smtpd_timeout to 10s or even 5s under stress will still allow most
legitimate clients to connect and send mail, but may delay mail from some
clients. No mail should be lost, as long as this measure is used only
temporarily.

* Line 2: under conditions of stress, use an smtpd_hard_error_limit of 1
instead of the default 20. This helps by disconnecting clients after a
single error, giving other clients a chance to connect. However, this may
cause significant delays with legitimate mail, such as a mailing list that
contains a few no-longer-active user names that didn't bother to
unsubscribe. No mail should be lost, as long as this measure is used only
temporarily.

* Line 3: under conditions of stress, use an smtpd_junk_command_limit of 1
instead of the default 100. This prevents clients from keeping idle
connections open by repeatedly sending NOOP or RSET commands.

The syntax of ${name?value} and ${name:value} is explained at the beginning of
the postconf(5) manual page.
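
As a rough model of how the stress-adaptive settings above expand, the Python sketch below implements only the two conditional forms. ${name?value} gives value when $name is non-empty, and ${name:value} gives value when $name is empty or undefined, following the postconf(5) description. This is a simplified illustration, not Postfix's parser. It does not expand nested $name references inside value, and it does not handle any other syntax. The function name is hypothetical.

```python
from __future__ import annotations

import re

# ${name?value} or ${name:value}. Simplification: value may not contain "}",
# and $name references inside value are left unexpanded.
CONDITIONAL = re.compile(r"\$\{(\w+)([?:])([^}]*)\}")


def expand(template: str, params: dict[str, str]) -> str:
    """Expand the two conditional forms against params.

    ${name?value}: value if params[name] is non-empty, else "".
    ${name:value}: value if params[name] is empty or undefined, else "".
    """
    def substitute(match: re.Match[str]) -> str:
        name, operator, value = match.groups()
        is_set = params.get(name, "") != ""
        if operator == "?":
            return value if is_set else ""
        return "" if is_set else value

    return CONDITIONAL.sub(substitute, template)
```

Under this model, "${stress?10}${stress:300}s" becomes "10s" when master(8) passes "-o stress=yes" and "300s" when it passes "-o stress=". Likewise, "${stress?hash:/etc/postfix/rbl_reply_maps}" names the reply map only under stress, so the default 554 reply applies otherwise.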