<h2>Purpose of this document </h2>
<p> This document is an introduction to Postfix queue congestion analysis.
It explains how the qshape(1) program can help to track down the
reason for queue congestion. qshape(1) is bundled with Postfix
2.1 and later source code, under the "auxiliary" directory. This
document describes qshape(1) as bundled with Postfix 2.4. </p>

<p> This document covers the following topics: </p>
take measures to ensure that the mail is deferred instead or even
add an access(5) rule asking the sender to try again later. </p>

<p> If a high volume destination exhibits frequent bursts of consecutive
connections refused by all MX hosts or "421 Server busy errors", it
is possible for the queue manager to mark the destination as "dead"
despite the transient nature of the errors. The destination will be
retried again after the expiration of a $minimal_backoff_time timer.
If the error bursts are frequent enough it may be that only a small
quantity of email is delivered before the destination is again marked
"dead". In some cases enabling static (not on demand) connection
caching by listing the appropriate nexthop domain in a table included in
"smtp_connection_cache_destinations" may help to reduce the error rate,
because most messages will re-use existing connections. </p>

<p> The MTA that has been observed most frequently to exhibit such
bursts of errors is Microsoft Exchange, which refuses connections
under load. Some proxy virus scanners in front of the Exchange
server propagate the refused connection to the client as a "421"
error. </p>

<p> Note that it is now possible to configure Postfix to exhibit similarly
erratic behavior by misconfiguring the anvil(8) service. Do not use
anvil(8) for steady-state rate limiting; its purpose is (unintentional)
DoS prevention and the rate limits set should be very generous! </p>

<p> If one finds oneself needing to deliver a high volume of mail to a
destination that exhibits frequent brief bursts of errors and connection
caching does not solve the problem, there is a subtle workaround. </p>

<li> <p> Postfix version 2.5 and later: </p>

<li> <p> In master.cf set up a dedicated clone of the "smtp" transport
for the destination in question. In the example below we will call
it "fragile". </p>

<li> <p> In master.cf configure a reasonable process limit for the
cloned smtp transport (a number in the 10-20 range is typical). </p>

<li> <p> IMPORTANT!!! In main.cf configure a large per-destination
pseudo-cohort failure limit for the cloned smtp transport. </p>

/etc/postfix/main.cf:
transport_maps = hash:/etc/postfix/transport
fragile_destination_concurrency_failed_cohort_limit = 100
fragile_destination_concurrency_limit = 20

/etc/postfix/transport:
example.com fragile:
/etc/postfix/master.cf:
# service type private unpriv chroot wakeup maxproc command
fragile unix - - n - 20 smtp

<p> See also the documentation for
default_destination_concurrency_failed_cohort_limit and
default_destination_concurrency_limit. </p>
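
<p> After editing the transport map, its indexed file needs to be
rebuilt and Postfix told to re-read its configuration (illustrative
commands, assuming the default hash map type and configuration
directory): </p>

# postmap /etc/postfix/transport
# postfix reload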

<li> <p> Earlier Postfix versions: </p>

<li> <p> In master.cf set up a dedicated clone of the "smtp"
transport for the destination in question. In the example below
we will call it "fragile". </p>

<li> <p> In master.cf configure a reasonable process limit for the
transport (a number in the 10-20 range is typical). </p>

<li> <p> IMPORTANT!!! In main.cf configure a very large initial
and destination concurrency limit for this transport (say 2000). </p>

/etc/postfix/main.cf:
transport_maps = hash:/etc/postfix/transport
initial_destination_concurrency = 2000
fragile_destination_concurrency_limit = 2000

/etc/postfix/transport:
example.com fragile:

/etc/postfix/master.cf:
# service type private unpriv chroot wakeup maxproc command
fragile unix - - n - 20 smtp

<p> See also the documentation for default_destination_concurrency_limit. </p>

<p> The effect of this configuration is that up to 2000
consecutive errors are tolerated without marking the destination
dead, while the total concurrency remains reasonable (10-20
processes). This trick is only for a very specialized situation:
high volume delivery into a channel with multi-error bursts
that is capable of high throughput, but is repeatedly throttled by
the bursts of errors. </p>

<p> When a destination is unable to handle the load even after the
Postfix process limit is reduced to 1, a desperate measure is to
insert brief delays between delivery attempts. </p>

<li> <p> Postfix version 2.5 and later: </p>

<li> <p> In master.cf set up a dedicated clone of the "smtp" transport
for the problem destination. In the example below we call it "slow". </p>

<li> <p> In main.cf configure a short delay between deliveries to
the same destination. </p>

/etc/postfix/main.cf:
transport_maps = hash:/etc/postfix/transport
slow_destination_rate_delay = 1

/etc/postfix/transport:
example.com slow:

/etc/postfix/master.cf:
# service type private unpriv chroot wakeup maxproc command
slow unix - - n - - smtp

<p> See also the documentation for default_destination_rate_delay. </p>

<p> This solution forces the Postfix smtp(8) client to wait for
$slow_destination_rate_delay seconds between deliveries to the same
destination. </p>

<li> <p> Earlier Postfix versions: </p>

<li> <p> In the transport map entry for the problem destination,
specify a dead host as the primary nexthop. </p>

<li> <p> In the master.cf entry for the transport specify the
problem destination as the fallback_relay and specify a small
smtp_connect_timeout value. </p>

/etc/postfix/main.cf:
transport_maps = hash:/etc/postfix/transport

/etc/postfix/transport:
example.com slow:[dead.host]

/etc/postfix/master.cf:
# service type private unpriv chroot wakeup maxproc command
slow unix - - n - 1 smtp
-o fallback_relay=problem.example.com
-o smtp_connect_timeout=1
-o smtp_connection_cache_on_demand=no

<p> This solution forces the Postfix smtp(8) client to wait for
$smtp_connect_timeout seconds between deliveries. The connection
caching feature is disabled to prevent the client from skipping
over the dead host. </p>

<h2><a name="queues">Postfix queue directories</a></h2>

<p> The following sections describe Postfix queues: their purpose,
what normal behavior looks like, and how to diagnose abnormal
growth. </p>

<p> The administrator can define "smtpd" access(5) policies, or
cleanup(8) header/body checks that cause messages to be automatically
diverted from normal processing and placed indefinitely in the
"hold" queue. Messages placed in the "hold" queue stay there until
the administrator intervenes. No periodic delivery attempts are
made for messages in the "hold" queue. The postsuper(1) command
can be used to manually release messages into the "deferred" queue. </p>

<p> Messages can potentially stay in the "hold" queue longer than
$maximal_queue_lifetime. If such "old" messages need to be released from
the "hold" queue, they should typically be moved into the "maildrop"
queue using "postsuper -r", so that the message gets a new timestamp and
is given more than one opportunity to be delivered. Messages that are
"young" can be moved directly into the "deferred" queue using
"postsuper -H". </p>
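
<p> For example (illustrative commands; "ALL" operates on every message
in the named queue, so releasing a specific queue ID is often safer): </p>

# postsuper -r ALL hold     (old mail: requeue via "maildrop" with a fresh timestamp)
# postsuper -H ALL          (young mail: release directly into the "deferred" queue)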

<p> The "hold" queue plays little role in Postfix performance, and
monitoring of the "hold" queue is typically more closely motivated
by tracking spam and malware than by performance issues. </p>

<p> The incoming queue grows when the message input rate spikes
above the rate at which the queue manager can import messages into
the active queue. The main factors slowing down the queue manager
are disk I/O and lookup queries to the trivial-rewrite service. If the queue
manager is routinely not keeping up, consider not using "slow"
lookup services (MySQL, LDAP, ...) for transport lookups or speeding
up the hosts that provide the lookup service. If the problem is I/O
starvation, consider striping the queue over more disks, faster controllers
with a battery write cache, or other hardware improvements. At the very
least, make sure that the queue directory is mounted with the "noatime"
option if applicable to the underlying filesystem. </p>
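
<p> For example, on a Linux system the queue filesystem could be mounted
with the "noatime" option as follows (an illustrative fstab entry; the
device name and filesystem type are assumptions): </p>

/etc/fstab:
/dev/sdb1 /var/spool/postfix ext3 defaults,noatime 1 2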

<p> The in_flow_delay parameter is used to clamp the input rate
when the queue manager starts to fall behind. The cleanup(8) service
concurrency limit. </p>

<p> Multiple recipient groups (from one or more messages) are queued
for delivery grouped by transport/nexthop combination. The
<b>destination</b> concurrency limit for the transports caps the number
of simultaneous delivery attempts for each nexthop. Transports with
a <b>recipient</b> concurrency limit of 1 are special: these are grouped
by the actual recipient address rather than the nexthop, yielding
per-recipient concurrency limits rather than per-domain
concurrency limits. Per-recipient limits are appropriate when
performing final delivery to mailboxes rather than when relaying
to a remote server. </p>

<p> Congestion occurs in the active queue when one or more destinations
drain slower than the corresponding message input rate. </p>

<p> Input into the active queue comes both from new mail in the "incoming"
queue, and retries of mail in the "deferred" queue. Should the "deferred"
queue get really large, retries of old mail can dominate the arrival
rate of new mail. Systems with more CPU, faster disks and more network
bandwidth can deal with larger deferred queues, but as a rule of thumb
the deferred queue scales to somewhere between 100,000 and 1,000,000
messages, with good performance unlikely above that "limit". Systems with
queues this large should typically stop accepting new mail, or put the
backlog "on hold" until the underlying issue is fixed (provided that
there is enough capacity to handle just the new mail). </p>
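
<p> Placing the backlog "on hold" can be done with postsuper(1)
(an illustrative command; it affects every message currently in the
"deferred" queue): </p>

# postsuper -h ALL deferred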

<p> When a destination is down for some time, the queue manager will
mark it dead, and immediately defer all mail for the destination without
trying to assign it to a delivery agent. In this case the messages
will quickly leave the active queue and end up in the deferred queue
(with Postfix &lt; 2.4, this is done directly by the queue manager,
with Postfix &ge; 2.4 this is done via the "retry" delivery agent). </p>

<p> When the destination is instead simply slow, or there is a problem
causing an excessive arrival rate, the active queue will grow and will
become dominated by mail to the congested destination. </p>

<p> The only way to reduce congestion is to either reduce the input
rate or increase the throughput. Increasing the throughput requires
is draining slowly and the system and network are not loaded, raise
the "smtp" and/or "relay" process limits! </p>

<p> When a high volume destination is served by multiple MX hosts with
typically low delivery latency, performance can suffer dramatically when
one of the MX hosts is unresponsive and SMTP connections to that host
time out. For example, if there are 2 equal weight MX hosts, the SMTP
connection timeout is 30 seconds and one of the MX hosts is down, the
average SMTP connection will take approximately 15 seconds to complete.
With a default per-destination concurrency limit of 20 connections,
throughput falls to just over 1 message per second. </p>
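
<p> The arithmetic behind this example can be reproduced with a short
calculation (a sketch; treating a successful connection as taking
negligible time is a simplifying assumption): </p>

```python
# Throughput estimate for a destination where one of two MX hosts is down.
mx_hosts = 2        # equal-weight MX hosts
timeout = 30.0      # SMTP connection timeout, in seconds
concurrency = 20    # default per-destination concurrency limit

# On average half of the connection attempts hit the dead MX first and
# waste the full timeout, so the mean time per delivery is ~15 seconds.
avg_latency = timeout / mx_hosts

# Throughput = simultaneous deliveries / average time per delivery.
throughput = concurrency / avg_latency
print(round(throughput, 2))  # about 1.33 messages per second
```

This matches the "just over 1 message per second" figure quoted above.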

<p> The best way to avoid bottlenecks when one or more MX hosts is
non-responsive is to use connection caching. Connection caching was
introduced with Postfix 2.2 and is by default enabled on demand for
destinations with a backlog of mail in the active queue. When connection
caching is in effect for a particular destination, established connections
are re-used to send additional messages; this reduces the number of
connections made per message delivery and maintains good throughput even
in the face of partial unavailability of the destination's MX hosts. </p>

<p> If connection caching is not available (Postfix &lt; 2.2) or does
not provide a sufficient latency reduction, especially for the "relay"
transport used to forward mail to "your own" domains, consider setting
lower than default SMTP connection timeouts (1-5 seconds) and higher
than default destination concurrency limits. This will further reduce
latency and provide more concurrency to maintain throughput should
one or more MX hosts be unresponsive. </p>

<p> Setting high concurrency limits to domains that are not your own may
be viewed as hostile by the receiving system, and steps may be taken
to prevent you from monopolizing the destination system's resources.
The defensive measures may substantially reduce your throughput or block
access entirely. Do not set aggressive concurrency limits to remote
domains without coordinating with the administrators of the target
domain. </p>

<p> If necessary, dedicate and tune custom transports for selected high
volume destinations. The "relay" transport is provided for forwarding mail
to domains for which your server is a primary or backup MX host. These can
make up a substantial fraction of your email traffic. Use the "relay" and
not the "smtp" transport to send email to these domains. Using the "relay"
transport allocates a separate delivery agent pool to these destinations
and allows separate tuning of timeouts and concurrency limits. </p>

<p> Another common cause of congestion is unwarranted flushing of the
entire deferred queue. The deferred queue holds messages that are likely
to fail to be delivered and are also likely to be slow to fail delivery
(time out). As a result the most common reaction to a large deferred queue
(flush it!) is more than likely counter-productive, and typically makes
the congestion worse. Do not flush the deferred queue unless you expect
that most of its content has recently become deliverable (e.g. relayhost
back up after an outage)! </p>

<p> Note that whenever the queue manager is restarted, there may
already be messages in the active queue directory, but the "real"
might succeed later), the message is placed in the deferred queue. </p>

<p> The queue manager scans the deferred queue periodically. The scan
interval is controlled by the queue_run_delay parameter. While a deferred
queue scan is in progress, if an incoming queue scan is also in progress
(ideally these are brief since the incoming queue should be short), the
queue manager alternates between looking for messages in the "incoming"
queue and in the "deferred" queue. This "round-robin" strategy prevents
starvation of either the incoming or the deferred queues. </p>
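
<p> The alternation described above can be illustrated with a few lines
of Python (a sketch of the scheduling idea, not Postfix's actual
implementation): </p>

```python
from itertools import zip_longest

# Two queue scans in progress at the same time: the queue manager takes
# one message from each in turn, so neither queue starves the other.
incoming = ["inc-1", "inc-2", "inc-3"]
deferred = ["def-1", "def-2"]

# Interleave the two scans round-robin style; the longer scan simply
# continues on its own once the shorter one is exhausted.
order = [m for pair in zip_longest(incoming, deferred)
         for m in pair if m is not None]
print(order)  # ['inc-1', 'def-1', 'inc-2', 'def-2', 'inc-3']
```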

<p> Each deferred queue scan only brings a fraction of the deferred
queue back into the active queue for a retry. This is because each
message in the deferred queue is assigned a "cool-off" time when
it is deferred. This is done by time-warping the modification
time of the queue file into the future. The queue file is not
eligible for a retry if its modification time is not yet reached.
retried more often than old messages. </p>

<p> If a high volume site routinely has large deferred queues, it
may be useful to adjust the queue_run_delay, minimal_backoff_time and
maximal_backoff_time to provide short enough delays on first failure
(Postfix &ge; 2.4 has a sensibly low minimal backoff time by default),
with perhaps longer delays after multiple failures, to reduce the
retransmission rate of old messages and thereby reduce the quantity
of previously deferred mail in the active queue. If you want a really
low minimal_backoff_time, you may also want to lower queue_run_delay,
but understand that more frequent scans will increase the demand for
disk I/O. </p>
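
<p> The effect of these tunables on an individual message can be sketched
as follows (an illustration assuming a simple doubling backoff clamped to
the configured bounds; Postfix's real retry timing also depends on queue
scan intervals and other details): </p>

```python
# Example values; see minimal_backoff_time and maximal_backoff_time.
minimal_backoff_time = 300    # seconds
maximal_backoff_time = 4000   # seconds

# Successive "cool-off" delays for one message that keeps failing:
# double the delay per attempt, never exceeding the configured maximum.
delay = minimal_backoff_time
cool_off_times = []
for attempt in range(6):      # six consecutive delivery failures
    cool_off_times.append(delay)
    delay = min(delay * 2, maximal_backoff_time)

print(cool_off_times)  # [300, 600, 1200, 2400, 4000, 4000]
```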

<p> One common cause of large deferred queues is failure to validate
recipients at the SMTP input stage. Since spammers routinely launch
dictionary attacks from unrepliable sender addresses, the bounces
for invalid recipient addresses clog the deferred queue (and at high
volumes proportionally clog the active queue). Recipient validation
is strongly recommended through use of the local_recipient_maps and
relay_recipient_maps parameters. Even when bounces drain quickly they
inundate innocent victims of forgery with unwanted email. To avoid
this, do not accept mail for invalid recipients. </p>
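
<p> A typical configuration enables both parameters (a sketch; the
local_recipient_maps value shown is the Postfix default, and the relay
map file name is an assumption): </p>

/etc/postfix/main.cf:
local_recipient_maps = proxy:unix:passwd.byname $alias_maps
relay_recipient_maps = hash:/etc/postfix/relay_recipients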

<p> When a host with lots of deferred mail is down for some time,
it is possible for the entire deferred queue to reach its retry
time simultaneously. This can lead to a very full active queue once
the host comes back up. The phenomenon can repeat approximately
every maximal_backoff_time seconds if the messages are again deferred
after a brief burst of congestion. Perhaps a future Postfix release
will add a random offset to the retry time (or use a combination
of strategies) to reduce the odds of repeated complete deferred
queue flushes. </p>
<h2><a name="credits">Credits</a></h2>