for updates. The PDF version is available on <ulink url="http://doc.powerdns.com/pdf">http://doc.powerdns.com/pdf</ulink>, a text file is
on <ulink url="http://doc.powerdns.com/txt">http://doc.powerdns.com/txt/</ulink>.
<sect1 id="changelog">
<title>Release notes</title>
Before proceeding, it is advised to check the release notes for your PDNS version, as specified in the name of the distribution
<sect2 id="changelog-2-9-21"><title>PowerDNS Authoritative Server version 2.9.21</title>
Released the 21st of April 2007.
This is the first release of the PowerDNS Authoritative Server since the Recursor was split off to a separate product, and also marks the transfer
of the new technology developed specifically for the recursor, back to the authoritative server.
This move has reduced the amount of code of the Authoritative server by over 2000 lines, while improving the quality
of the program enormously.
However, since so much has been changed, care should be taken when deploying 2.9.21.
To signify the magnitude of the underlying improvements, the next release of the PowerDNS Authoritative Server will be called 3.0.
This release would not have been possible without large amounts of help and support from the PowerDNS Community. We specifically want to thank
Massimo Bandinelli of Italy's <ulink url="http://register.it">Register.it</ulink>, <ulink url="http://aaldering-ict.nl">Dave Aaldering of Aaldering ICT</ulink>,
<ulink url="http://true.nl">True BV</ulink>, <ulink url="http://www.xs4all.nl">XS4ALL</ulink>, Daniel Bilik of <ulink url="http://www.neosystem.cz">Neosystem</ulink>,
<ulink url="http://www.easydns.com">EasyDNS</ulink>, <ulink url="http://www.siemens.com">Heinrich Ruthensteiner</ulink> of Siemens,
<ulink url="http://schwer.us">Augie Schwer</ulink>, <ulink url="http://www.wikipedia.org">Mark Bergsma</ulink>, <ulink url="http://www.forfun.net">Marco Davids</ulink>,
<ulink url="http://www.opensuse.org">Marcus Rueckert of OpenSUSE</ulink>, Andre Muraro of <ulink url="http://www.locaweb.com.br">Locaweb</ulink>,
Antony Lesuisse, <ulink url="http://www.linuxnetworks.de">Norbert Sendetzky</ulink>, <ulink url="http://www.aruba.it">Marco Chiavacci</ulink>, Christoph Haas,
Ralf van der Enden and Ruben Kerkhof.
The previous packet parsing and generating code contained no known bugs, but was very lengthy and overly complex, and might have had
security problems. The new code is 'inherently safe' because it relies on bounds-checking C++ constructs. Therefore, a move to 2.9.21 is highly
Pre-2.9.21, communication between master and slave nameservers was not checked as rigidly as possible, possibly allowing third parties to disrupt
but not modify such communications.
The 'bind1' legacy version of our BIND backend has been dropped! There should be no need to rely on this old version anymore, as the main BIND backend
has been very well tested recently.
Multi-part TXT records weren't supported. This has been fixed, and regression tests have been added. Code in commits c1016, c996, c994.
Email addresses with embedded dots in SOA records were not parsed correctly, nor were other embedded dots. Noted by 'Bastiaan', fixed in c1026.
BIND backend treated the 'm' TTL modifier as 'months' and not 'minutes'. Closes Debian bug 406462. Addressed in c1026.
Our snapshots were built against a static version of PostgreSQL that was incompatible with many Linux distributions, leading to instant
crashes on startup. Fixed in c1022 and c1023.
CNAME referrals to child zones gave improper responses. Noted by Augie Schwer in t123, fixed in c992.
When passing a port number with the <command>recursor</command> setting, this would sometimes generate errors during additional processing. Switched off
overly helpful additional processing for recursive queries to remove this problem. Implemented in c1031, spotted by Ralf van der Enden.
NS to a nameserver with the name of the zone itself generated problems. Spotted by Augie Schwer, fixed in c947.
Multi-line records in the BIND backend were not always parsed correctly. Fixed in c1014.
The LOC-record had problems operating outside of the eastern hemisphere of the northern part of the world! Fixed in c1011.
Backends were compiled without multithreading preprocessor flags. As far as we can determine, this would only cause problems for the BIND backend,
but we cannot rule out this caused instability in other backends. Fixed in c1001.
The BIND backend was highly unstable under reloads, and leaked memory and file descriptors.
Thanks to Mark Bergsma and Massimo Bandinelli for respectively pointing this out to us and testing
large amounts of patches to fix the problem. The fixes have resulted in better performance, less code, and a remarkable simplification
of this backend. Commits c1039, c1034, c1035, c1006, c999, c905 and previous.
BIND backend gave convincing NXDOMAINs on unloaded zones in some cases. Spotted and fixed by Daniel Bilik in c984.
SOA records in zone transfers sometimes contained the wrong SOA TTL. Spotted by Christian Kuehn, fixed in c902.
PowerDNS could get confused by very high SOA serial numbers. Spotted and fixed by Dan Bilik in c626.
Some versions of FreeBSD perform very strict checks on socket address sizes passed to 'connect', which could lead to problems retrieving zones over AXFR.
Some versions of FreeBSD perform very strict checks on IPv6 socket addresses, leading to problems. Discovered by Sten Spans, fixed in c885 and c886.
IXFR requests were not logged properly. Noted by Ralf van der Enden, fixed in c990.
Some NAPTR records needed an additional space character to encode correctly. Spotted by Heinrich Ruthensteiner, fixed in c1029.
Many bugs in the TCP nameserver led to a PowerDNS process that, over time, stopped responding to TCP queries. Many fixes were provided by
Dan Bilik; other problems were fixed by rewriting our TCP handling code. Commits c982, c980, c950, c924, c889, c874, c869, c685, c684.
Fix crashes on the ARM processor due to alignment errors. Thanks to Sjoerd Simons. Closes Debian bug 397031.
Missing data in generic SQL backends would sometimes lead to faked SOA serial data. Spotted by Leander Lakkas from True. Fix in c866.
When receiving two quick notifications in succession, the packet cache would sometimes "process" the second one, leading PowerDNS to ignore it. Spotted by
Dan Bilik, fixed in c686.
Geobackend (by Mark Bergsma) did not properly override the getSOA method, breaking non-overlay operation of this fine backend. The geobackend now also
skips '.hidden' configuration files, and now properly disregards empty configuration files. Additionally, the overlapping abilities were improved. Details
available in c876, by Mark.
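The 'm' TTL modifier fix in the list above concerns BIND-style time suffixes in zone files. As a minimal sketch, and not the PowerDNS parser itself, the following Python shows one way such TTL values could be converted to seconds, with 'm' meaning minutes. The function name <command>parse_ttl</command> and the unit table are assumptions made for illustration only.

```python
import re

# Seconds per unit suffix as commonly used in BIND-style zone files.
# 'm' means minutes; months are not a TTL unit.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

# One number followed by an optional unit letter.
_PART = re.compile(r"(\d+)([smhdw]?)", re.IGNORECASE)


def parse_ttl(text):
    """Convert a TTL such as '3600', '5m' or '1h30m' to seconds (hypothetical helper)."""
    text = text.strip()
    if not text:
        raise ValueError("empty TTL")
    total = 0
    pos = 0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError("invalid TTL: %r" % text)
        number, unit = match.groups()
        # A bare number is interpreted as seconds.
        total += int(number) * _UNIT_SECONDS[unit.lower() or "s"]
        pos = match.end()
    return total
```

With this sketch, <command>parse_ttl("5m")</command> yields 300 seconds rather than a number of months.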
Thanks to <ulink url="http://www.easydns.com">EasyDNS</ulink>, PowerDNS now supports multiple masters per domain. For configuration
details, see <xref linkend="slave">. Implemented in c1018, c1017.
Thanks to <ulink url="http://www.easydns.com">EasyDNS</ulink>, PowerDNS now supports the KEY record type, as well as the SPF record. In c976.
Added support for CERT, SSHFP, DNSKEY, DS, NSEC, RRSIG record types, as part of the move to the new DNS parsing/generating code.
Support for the AFSDB record type, as requested by 'Bastian'. Implemented in c978, closing t129.
Support for the MR record type. Implemented in c941 and c1019.
Gsqlite3 backend was added by Antony Lesuisse in c942.
Added the ability to send out light-weight root-referrals that save bandwidth yet still placate mediocre resolver implementations. Implemented in c912,
enable with 'root-referral=lean'.
Miscellaneous OpenDBX and LDAP backend improvements by Norbert Sendetzky. Applied in c977 and c1040.
SGML source of the documentation was cleaned up by Ruben Kerkhof in c936.
Speedups in core DNS label processing code. Implemented in c928, c654, c1020.
When communicating with master servers and encountering errors, more useful details are logged. Reported by Stefan Arentz in t137, closed by c1015.
Database errors are now logged with more details. Addressed in c1004.
pdns_control problems are now logged more verbosely. Change in c910.
Erroneous address configuration was logged unclearly. Spotted by River Tarnell, fixed in c888.
Example configuration shipped with PowerDNS was very old. Noted by Leen Besselink, fixed in c946.
PowerDNS neglected to chdir to the root when chrooted. This closes t110, fixed in c944.
Microsoft resolver had problems with responses we generated for CNAMEs pointing out of our bailiwick. Fixed in c983 and expedited by Locaweb.com.br.
Built-in webserver logs errors more verbosely. Closes t82, fixed in c991.
Queries containing '@' no longer flood the logs. Addressed in c1014.
The build process now looks for PostgreSQL in more places. Implemented in c998, closes t90.
Speedups in the BIND backend now mean large installations enjoy startup times up to 30 times faster than with the original BIND nameserver. Many thanks
to Massimo Bandinelli.
BIND backend now offers full support for query logging, implemented in c1026, c1029.
BIND backend named.conf parsing is now fully case-insensitive for domain names. This closes Debian bug 406461, fixed in c1027.
IPv6 and IPv4 address parsing routines have been replaced, which should result in prettier output in some cases. c962, c1012 and others.
Five new regression tests have been added to ensure old bugs do not return.
Fix small issues with very modern compilers and BOOST snapshots. Noted by Marcus Rueckert, addressed in c954, c964, c965, c1003.
<sect2 id="changelog-recursor-3-1-4"><title>Recursor version 3.1.4</title>
Released the 13th of November 2006.
This release contains almost no new features, but consists mostly of minor and major bug fixes. It also addresses two major security issues, which makes
this release a highly recommended upgrade.
Large TCP questions followed by garbage could cause the recursor to crash. This critical security issue has been assigned CVE-2006-4251, and is fixed in
c915. More information can be found in <xref linkend="powerdns-advisory-2006-01">.
CNAME loops with zero second TTLs could cause crashes in some conditions. These loops could be constructed by malicious parties,
making this issue a potential denial of service attack. This security issue has been assigned CVE-2006-4252 and is fixed by c919.
More information can be found in <xref linkend="powerdns-advisory-2006-02">. Many thanks to David Gavarret for helping pin down this problem.
On certain error conditions, PowerDNS would neglect to close a socket, so the process might eventually run out of sockets. Spotted by Stefan Schmidt, fixed in commits c892, c897, c899.
Some nameservers (including PowerDNS in rare circumstances) emit a SOA record in the authority section. The recursor mistakenly interpreted this as an
authoritative "NXRRSET". Spotted by Bryan Seitz, fixed in c893.
In some circumstances, PowerDNS could end up with a useless (not working, or no longer working) set of nameserver records for a domain. This release contains logic
to invalidate such broken NSSETs, without overloading authoritative servers. This problem had previously been spotted by Bryan Seitz, 'Cerb' and Darren Gamble.
Invalidations of NSSETs can be plotted using the "nsset-invalidations" metric, available through <command>rec_control get</command>.
Implemented in c896 and c901.
PowerDNS could crash while dumping the cache using <command>rec_control dump-cache</command>. Reported by Wouter of WideXS and Stefan Schmidt and many others, fixed in c900.
Under rare circumstances (depleted TCP buffers), PowerDNS might send out incomplete questions to remote servers. Additionally, on big-endian systems (non-Intel and non-AMD
generally), sending out large TCP answers or questions would not work at all, and possibly crash. Brought to our attention by David Gavarret, fixed in c903.
The recursor contained the potential for a dead-lock processing an invalid domain name. It is not known how this might be triggered,
but it has been observed by 'Cerb' on #powerdns. Several dead-locks where PowerDNS consumed all CPU, but did not answer questions,
have been reported in the past few months. These might be fixed by c904.
IPv6 'allow-from' matching had problems with the least significant bits, sometimes allowing disallowed addresses, but mostly disallowing allowed addresses. Spotted by Wouter
from WideXS, fixed in c916.
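The 'allow-from' item above mentions trouble with the least significant bits of a netmask, which is where prefix lengths that are not a multiple of 8 need care. The following Python is a minimal sketch of such a comparison, assuming a byte-wise approach; it is not the PowerDNS implementation, and the function name <command>prefix_matches</command> is hypothetical.

```python
import ipaddress


def prefix_matches(address, network, prefix_len):
    """Return True if the first prefix_len bits of both addresses agree."""
    addr = ipaddress.ip_address(address).packed
    net = ipaddress.ip_address(network).packed
    if len(addr) != len(net):
        return False  # IPv4 versus IPv6: never a match in this sketch
    if not 0 <= prefix_len <= len(addr) * 8:
        raise ValueError("prefix length out of range")
    full_bytes, extra_bits = divmod(prefix_len, 8)
    # Bytes fully covered by the prefix must be identical.
    if addr[:full_bytes] != net[:full_bytes]:
        return False
    if extra_bits == 0:
        return True
    # Keep only the top extra_bits bits of the partially covered byte.
    mask = (0xFF << (8 - extra_bits)) & 0xFF
    return (addr[full_bytes] & mask) == (net[full_bytes] & mask)
```

Forgetting the final masked comparison, or building the mask from the wrong end of the byte, produces the kind of wrongly allowed or wrongly refused addresses described in the item.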
PowerDNS has support to drop answers from so called 'delegation only' zones. A statistic ("dlg-only-drops") is now available to plot how often this happens. Implemented in c890.
Hint-file parameter was mistakenly named "hints-file" in the documentation. Spotted by Marco Davids, fixed in c898.
<command>rec_control quit</command> should be near instantaneous now, as it no longer meticulously cleans up memory before exiting. Problem spotted by Darren Gamble, fixed in
init.d script no longer refers to the Recursor as the Authoritative Server. Spotted by Wouter of WideXS, fixed in c913.
A potentially serious warning for users of the GNU C Library version 2.5 was fixed. Spotted by Marcus Rueckert, fixed in c920.
<sect2 id="changelog-recursor-3-1-3"><title>Recursor version 3.1.3</title>
Released the 12th of September 2006.
Compared to 3.1.2, this release again consists of a number of mostly minor bug fixes, and some slight improvements.
Many thanks are again due to Darren Gamble who together with his team has discovered many misconfigured domains that do work
with some other name servers. DNS has long been tolerant of misconfigurations; PowerDNS intends to uphold that tradition. Almost all of
the domains found by Darren now work as well in PowerDNS as in other name server implementations.
Thanks to some recent migrations, this release, or something very close to it, is powering over 40 million internet connections that
we know of. We appreciate hearing about successful as well as unsuccessful migrations; please feel free to notify pdns.bd@powerdns.com of your
experiences, good or bad.
The MThread default stack size was too small, which led to problems, mostly on 64-bit platforms. This stack size is now configurable
using the <command>stack-size</command> setting should our estimate be off. Discovered by Darren Gamble, Sten Spans and a number of others.
Plug a small memory leak discovered by Kai and Darren Gamble, fixed in c870.
Switch from the excellent nedmalloc to dlmalloc, based on advice by the nedmalloc author. Nedmalloc is optimised for multithreaded
operation, whereas the PowerDNS recursor is single threaded. The version of nedmalloc shipped contained a number of possible bugs,
which are probably resolved by moving to dlmalloc. Some reported crashes on hitting 2G of allocated memory on 64-bit systems might
be solved by this switch, which should also increase performance. See c873 for details.
The cache is now explicitly aware of the difference between authoritative and unauthoritative data, allowing it to deal
with some domains that have different data in the parent zone than in the authoritative zone. Patch in c867.
No longer try to parse DNS updates as if they were queries. Discovered and fixed by Jan Gyselinck, fix in c871.
Rebalance logging priorities for less log cluttering and add IP address to a remote server error message.
Noticed and fixed by Jan Gyselinck (c877).
Add <command>logging-facility</command> setting, allowing syslog to send PowerDNS logging to a separate file. Added in c871.
<sect2 id="changelog-recursor-3-1-2"><title>Recursor version 3.1.2</title>
Released Monday 26th of June 2006.
Compared to 3.1.1, this release consists almost exclusively of bug-fixes and speedups. A quick update is recommended, as some of the bugs
impact operators of authoritative zones on the internet. This version has been tested by some of the largest internet providers on the planet,
and is expected to perform well for everybody.
Many thanks are due to Darren Gamble, Stefan Schmidt and Bryan Seitz who all provided excellent feedback based on their large-scale
tests of the recursor.
Internal authoritative server did not differentiate between 'NXDOMAIN' and 'NXRRSET', in other words, it would answer
'no such host' when an AAAA query came in for a domain that did exist, but did not have an AAAA record. This only affects
users with <command>auth-zones</command> configured. Discovered by Bryan Seitz, fixed in c848.
ANY queries for hosts where nothing was present in the cache would not work. This did not cause real problems as ANY queries are
not reliable (by design) for anything other than debugging, but did slow down the nameserver and cause unnecessary load on remote
nameservers. Fixed in c854.
When exceeding the configured maximum amount of TCP sessions, TCP support would break and the nameserver would waste CPU trying to accept TCP
connections on UDP ports. Noted by Bryan Seitz, fixed in c849.
DNS queries come in two flavours: recursion desired and non-recursion desired. The latter is not very useful for a recursor, but is
sometimes (erroneously) used by monitoring software or loadbalancers to detect nameserver availability. A non-rd query would not only not recurse,
but also not query authoritative zones, which is confusing. Fixed in c847.
Non-standard DNS TCP queries, which did occur in practice, could drive the recursor to 100% CPU usage for extended periods of time. This did not disrupt service
immediately, but did waste a lot of CPU, possibly exhausting resources. Discovered by Bryan Seitz, fixed in c858, which is post-3.1.2-rc1.
The PowerDNS recursor did not honour the rare but standardised 'ANY' query class (normally 'ANY' refers to the query type, not class), upsetting the Wildfire
Jabber server. Discovered and debugged by Daniel Nauck, fixed in c859, which is post-3.1.2-rc1.
Everybody's favorite, when starting up under high load, a bogus line of statistics was sometimes logged. Fixed in c851.
Remove some spurious debugging output on dropping a packet by an unauthorized host. Discovered by Kai. Fixed in c854.
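The first item in the list above turns on the difference between 'NXDOMAIN' (the name does not exist) and 'NXRRSET' (the name exists, but has no record of the requested type). As an illustrative sketch only, with a toy in-memory zone that is an assumption and has nothing to do with how PowerDNS stores <command>auth-zones</command> data, the distinction can be expressed like this:

```python
# Toy zone data: name -> {record type -> list of values}. Purely illustrative.
ZONE = {
    "www.example.org.": {"A": ["192.0.2.10"]},
}


def lookup(name, qtype):
    """Return (rcode, answers) for a query against the toy zone."""
    records = ZONE.get(name.lower())
    if records is None:
        # The name does not exist at all.
        return "NXDOMAIN", []
    # The name exists: the response code is NOERROR even when no record
    # of the requested type is present (the 'NXRRSET' case).
    return "NOERROR", list(records.get(qtype, []))
```

An AAAA query for <command>www.example.org.</command> then gives an empty NOERROR answer instead of 'no such host'.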
Misconfigured domains, with a broken nameserver in the parent zone, should now work better. Changes motivated and suggested by
Darren Gamble. This makes PowerDNS more compliant with RFC 2181 by making it prefer authoritative data over non-authoritative data.
PowerDNS can now listen on multiple ports, using the <command>local-address</command> setting. Added in c845.
A number of speedups which should have a noticeable impact, implemented in commits c850, c852, c853, c855.
The recursor now works around an issue with the Linux kernel 2.6.8, as shipped by Debian. Fixed by Christof Meerwald in c860, which is post 3.1.2-rc1.
<sect2 id="changelog-recursor-3-1-1"><title>Recursor version 3.1.1</title>
3.1.1 is identical to 3.1 except for a bug in the packet chaining code which would mainly manifest itself for IPv6 enabled Konqueror
users with very fast connections to their PowerDNS installation. However, all 3.1 users are urged to upgrade to 3.1.1.
Many thanks to Alessandro Bono for his quick aid in solving this problem.
Released on the 23rd of May 2006. Many thanks are due to the operators of some of the largest internet access providers in the world,
each having many millions of customers, who have tested the various 3.1 pre-releases for suitability. They have uncovered and helped
fix bugs that could impact us all, but are only (quickly) noticeable with such vast amounts of DNS traffic.
After version 3.0.1 proved to hold up very well under tremendous loads, 3.1 adds important new features:
Ability to serve authoritative data from 'BIND' style zone files (using <command>auth-zones</command> statement).
Ability to forward domains so configured to external servers (using <command>forward-zones</command>).
Possibility of 'serving' the contents of <filename>/etc/hosts</filename> over DNS, which is very well
suited to simple domestic router/DNS setups. Enabled using <command>export-etc-hosts</command>.
As recommended by recent standards documents, the PowerDNS recursor is now authoritative for RFC-1918 private IP space
zones by default (suggested by Paul Vixie).
Full outgoing IPv6 support (off by default) with IPv6 servers getting equal treatment with IPv4, nameserver
addresses are chosen based on average response speed, irrespective of protocol.
Initial Windows support, including running as a service ('NET START "POWERDNS RECURSOR"'). <command>rec_channel</command> is still missing,
the rest should work. Performance appears to be below that of the UNIX versions; this situation is expected to improve.
No longer send out SRV and MX record priorities as zero on big-endian platforms (UltraSPARC). Discovered by Eric Sproul, fixed in c773.
SRV records need additional processing, especially in an Active Directory setting. Reported by Kenneth Marshall, fixed in c774.
The root-records were not being refreshed, which could lead to problems under inconceivable conditions. Fixed in c780.
Fix resolving domain names for nameservers with multiple IP addresses, with one of these addresses being lame. Other nameserver implementations
were also unable to resolve these domains, so not a big bug. Fixed in c780.
For a period of 5 minutes after expiring a negative cache entry, the domain would not be re-cached negatively, leading to a lot of duplicate
outgoing queries for this short period. This fix has raised the average cache hit rate of the recursor by a few percent. Fixed in c783.
Query throttling was not aggressive enough and not all sorts of queries were throttled. Implemented in c786.
Fix possible crash during startup when parsing empty configuration lines (c807).
Fix possible crash when the first query after wiping a cache entry was for the just deleted entry. Rare in production servers. Fixed in c820.
Recursor would send out differing TTLs when receiving a misconfigured, standards violating, RRSET with different TTLs. Implement fix as mandated by
RFC 2181, paragraph 5.2. Reported by Stephen Harker (c819).
The <command>top-remotes</command> would list remotes duplicately, once per source port. Discovered by Jorn Ekkelenkamp, fixed in c827, which is post 3.1-pre1.
Default <command>allow-from</command> allowed queries from fe80::/16, corrected to fe80::/10. Spotted by Niels Bakker, fixed in c829, which is post 3.1-pre1.
While PowerDNS blocks failing queries quickly, multiple packets could briefly be in flight for the same domain and nameserver. This situation is now
explicitly detected and queries are chained to identical queries already in flight. Fixed in c833 and c834, post 3.1-pre1.
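One item in the list above refers to RFC 2181, paragraph 5.2, which requires that all records of one RRset carry the same TTL and describes treating a malformed set from an authoritative source as if every TTL had the lowest value in the set. The Python below is a minimal sketch of that rule under stated assumptions: the <command>Record</command> tuple and the function name are hypothetical, and this is not the recursor's own code.

```python
from collections import namedtuple

# Hypothetical record representation used only for this sketch.
Record = namedtuple("Record", "name rtype ttl content")


def normalize_rrset_ttl(records):
    """Give every record of one RRset the lowest TTL found in that set."""
    if not records:
        return []
    lowest = min(r.ttl for r in records)
    return [r._replace(ttl=lowest) for r in records]
```

Applied to two A records with TTLs of 3600 and 300, both are handed out with a TTL of 300.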
ANY queries are now implemented as in other nameserver implementations, leading to a decrease in outgoing queries. The RFCs are not very
clear on desired behaviour, what is implemented now saves bandwidth and CPU and brings us in line with existing practice. Previously
ANY queries were not cached by the PowerDNS recursor. Implemented in c784.
<command>rec_control</command> was very sparse in its error reporting, and user unfriendly as well. Reported by Erik Bos, fixed in c818 and c820.
IPv6 addresses were printed in a non-standard way, fixed in c788.
TTLs of records are now capped at two weeks, c820.
<command>allow-from</command> IPv4 netmasks now automatically work for IPv4-to-IPv6 mapped IPv4 addresses, which appear when running on the wildcard
<command>::</command> IPv6 address. Lack of feature noted by Marcus 'darix' Rueckert. Fixed in c826, which is post 3.1-pre1.
Errors before daemonizing are now also sent to syslog. Suggested by Marcus 'darix' Rueckert. Fixed in c825, which is post 3.1-pre1.
When launching without any form of configured network connectivity, all root-servers would be cached as 'down' for some time. Detect this special case
and treat it as a resource-constraint, which is not accounted against specific nameservers. Spotted by Seth Arnold, fixed in c835, which is post 3.1-pre1.
The recursor now does not allow authoritative servers to keep supplying their own NS records into perpetuity, which causes problems
when a domain is redelegated but the old authoritative servers are not updated to this effect. Noticed and explained at length by Darren
Gamble of Shaw Communications, addressed by c837, which is post 3.1-pre2.
Some operators may want to follow RFC 2181 paragraphs 5.2 and 5.4. This harms performance and does not solve any real problem,
but does make PowerDNS more compliant. If you want this, enable <command>auth-can-lower-ttl</command>. Implemented in c838, which is
<sect2 id="changelog-recursor-3-0-1"><title>Recursor version 3.0.1</title>
Released 25th of April 2006, <ulink url="http://www.powerdns.com/en/downloads.aspx">download</ulink>.
This release consists of nothing but tiny fixes to 3.0, including one with security implications. An upgrade is highly recommended.
Compilation used both <filename>cc</filename> and <filename>gcc</filename>, leading to the possibility of compiling with different compiler versions (c766).
<command>rec_control</command> would leave files named <filename>lsockXXXXXX</filename> around in the configured socket-dir. Operators
may wish to remove these files from their socket-dir (often <filename>/var/run</filename>), quite a few might have accumulated already (c767).
Certain malformed packets could crash the recursor. As far as we can determine these packets could only lead to a crash,
but as always, there are no guarantees. A quick upgrade is highly recommended (commits c760, c761). Reported by David Gavarret.
Recursor would not distinguish between NXDOMAIN and NXRRSET (c756). Reported and debugged by Jorn Ekkelenkamp.
Some error messages and trace logging statements were improved (commits c756, c758, c759).
stderr was closed during daemonizing, but not dupped to /dev/null, leading to a slight chance of odd behaviour on reporting errors (c757).
Operating system specific fixes:
The stock Debian sarge Linux kernel, 2.6.8, claims to support epoll but fails at runtime. The epoll self-testing code has been improved,
and PowerDNS will fall back to a select based multiplexer if needed (c758). Reported by Michiel van Es.
Solaris 8 compilation and runtime issues were addressed. See the README for details (c765). Reported by Juergen Georgi and Kenneth Marshall.
Solaris 10 x86_64 compilation issues were addressed (c755). Reported and debugged by Eric Sproul.
<sect2 id="changelog-recursor-3-0"><title>Recursor version 3.0</title>
Released 20th of April 2006, <ulink url="http://www.powerdns.com/en/downloads.aspx">download</ulink>.
This is the first separate release of the PowerDNS Recursor. There are many reasons for this, one of the most important ones is that
previously we could only do a release when both the recursor and the authoritative nameserver were fully tested and in good shape. The split
allows us to release new versions when each part is ready.
Now for the real news. This version of the PowerDNS recursor powers the network access of over two million internet connections. Two large
access providers have been running pre-releases of 3.0 for the past few weeks and results are good. Furthermore, the various pre-releases
have been tested nearly non-stop with DNS traffic replayed at 3000 queries/second.
As expected, the 2 million households shook out some very rare bugs. But even a rare bug happens once in a while when there are this many users.
We consider this version of the PowerDNS recursor to be the most advanced resolver publicly available. Given current levels of spam, phishing
and other forms of internet crime we think no recursor should offer less than the best in spoofing protection. We urge all
operators of resolvers without proper spoofing countermeasures to consider PowerDNS, as it is a Better Internet Nameserver Daemon.
A good article on DNS spoofing can be found <ulink url="http://www.securesphere.net/download/papers/dnsspoof.htm">here</ulink>. Some
more information, based on a previous version of PowerDNS, can be found on the
<ulink url="http://blog.netherlabs.nl/articles/2006/04/14/holy-cow-1-3-million-additional-ip-addresses-served-by-powerdns">PowerDNS development blog</ulink>.
Because of recent DNS based denial of service attacks, running an open recursor has become a security risk. Therefore, unless configured otherwise
this version of PowerDNS will only listen on localhost, which means it does not resolve for hosts on your network.
To fix, configure the <command>local-address</command> setting with all addresses you want to listen on. Additionally, by default
service is restricted to RFC 1918 private IP addresses. Use <command>allow-from</command> to selectively open up the recursor
for your own network. See <xref linkend="recursor-settings"> for details.
Important new features of the PowerDNS recursor 3.0:
Best spoofing protection and detection we know of. Not only is spoofing made harder by using a new network address for each query,
PowerDNS detects when an attempt is made to spoof it, and temporarily ignores the data. For details, see <xref linkend="anti-spoofing">.
First nameserver to benefit from epoll/kqueue/Solaris completion ports event reporting framework, for stellar performance.
Best statistics of any recursing nameserver we know of, see <xref linkend="recursor-stats">.
Least-recently-used based cache cleanup algorithm, keeping the 'best' records in memory.
First class Solaris support, built on a 'try and buy' Sun CoolThreads T 2000.
Full IPv6 support, implemented natively.
Access filtering, both for IPv4 and IPv6.
Experimental SMP support for nearly double performance. See <xref linkend="recursor-performance">.
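The feature list above mentions a cache cleanup algorithm based on how recently records were used. As a generic illustration of a least-recently-used (LRU) policy, and not a description of the recursor's actual cache, which is written in C++, a minimal Python sketch could look as follows. The class name <command>LRUCache</command> and its methods are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache: the entry unused for the longest time is evicted first."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()  # oldest use first, newest use last

    def __contains__(self, key):
        return key in self._entries

    def get(self, key):
        if key not in self._entries:
            return None
        # A hit makes this entry the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict from the least recently used end until we fit again.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
```

Records that keep being asked for stay in memory, while entries nobody queries drift to the front of the queue and are dropped first.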
Many people helped package and test this release. Jorn Ekkelenkamp of ISP-Services helped find the '8000 SOAs' bug and spotted
many other oddities and <ulink url="http://www.xs4all.nl">XS4ALL</ulink> internet funded a lot of the recent development.
Joaquín M López Muñoz of the boost::multi_index_container was again of great help.
<sect2 id="changelog-2-9-20"><title>Version 2.9.20</title>
Released the 15th of March 2006
Besides adding OpenDBX, this release is mostly about fixing problems and speeding up the recursor. This release has been made possible by
</varlistentry>
<term>version</term>
Print version of this binary. Useful for checking which version of the PowerDNS recursor is installed on a system. Available since 3.1.5.
<term>version-string</term>
By default, PowerDNS replies to the 'version.bind' query with its version number. Security conscious users may wish to override
the reply PowerDNS issues.
</variablelist>
<sect2 id="verisign"><title>Verisign weirdness</title>
Verisign, the current operator of the COM and NET zones, decided to add a wildcard record so as to draw all queries for non-existing
domains to their own page, which lists domains you might want to visit instead.
To reinstate old behaviour, add <command>delegation-only=com,net</command> to your recursor configuration.
6013
What this does is reject all authoritative answers from the COM and NET servers. ISC, the current maintainers of BIND, have
6014
implemented this feature first; PowerDNS has mostly copied their algorithm. Thanks!
6017
Verisign might decide to evade our tactic with wildcard NS records, by which time other measures will be needed to restore the
7374
<sect1 id="rec-control"><title>Controlling and querying the recursor</title>
7376
To control and query the PowerDNS recursor, the tool <filename>rec_control</filename> is provided. This program
7377
talks to the recursor over the 'controlsocket', often stored in <filename>/var/run</filename>.
7380
As a sample command, try:
7387
When not running as root, <command>--socket-dir=/tmp</command> might be appropriate.
7390
All rec_control commands are documented below:
7393
<term>dump-cache filename</term>
7396
Dumps the entire cache to the filename mentioned. This file should not exist already; PowerDNS
7397
will refuse to overwrite it. While dumping, the recursor will not answer questions.
7402
<term>get statistic</term>
7405
Retrieve a statistic. For items that can be queried, see below.
7413
Check if server is alive.
7421
Request shutdown of the recursor.
7426
<term>reload-zones</term>
7429
Reload data about all authoritative and forward zones. The configuration file is also scanned
7430
to see if the <command>auth-domain</command>, <command>forward-domain</command> and <command>export-etc-hosts</command>
7431
statements have changed, and if so, these changes are incorporated.
7436
<term>top-remotes</term>
7439
Shows the top-20 most active remote hosts. Statistics are over the last 'remotes-ringbuffer-entries' queries, which
7445
<term>wipe-cache domain0. [domain1. domain2.]</term>
7448
Wipe entries from the cache. This is useful if, for example, an important server has a new IP address, but the TTL has not
7449
yet expired. Multiple domain names can be passed. For versions before 3.1, you must terminate a domain with a '.'! So to wipe powerdns.org,
7450
issue 'rec_control wipe-cache powerdns.org.'. For later versions, the dot is optional.
7453
Note that deletion is exact: wiping 'com.' will leave 'www.powerdns.com.' untouched!
7458
In PowerDNS versions 3.0.0 and 3.0.1 this command is slightly buggy and might cause your nameserver to crash if the first
7459
query after wiping the cache is for the domain you just wiped.
7464
Don't just wipe 'www.somedomain.com': its NS records or CNAME target may still be undesired, so wipe 'somedomain.com' as well.
7473
The command 'get' can query a large number of statistics, which are detailed in <xref linkend="recursor-stats">.
7477
More details on what 'throttled' queries and the like are can be found below in <xref linkend="recursor-details">.
6024
<sect1><title>Details</title>
6026
PowerDNS implements a very simple but effective nameserver. Care has been taken not to overload remote servers in case
6027
of overly active clients.
6030
This is implemented using the 'throttle'. This accounts for all recent traffic and prevents queries that have been sent out
6031
recently from going out again.
6034
There are three levels of throttling.
7480
<sect1 id="recursor-performance"><title>PowerDNS Recursor performance</title>
7482
To get the best out of the PowerDNS recursor, which is important if you are doing thousands of queries per second, please
7483
consider the following.
6038
If a remote server indicates that it is lame for a zone, the exact question won't
6039
be repeated in the next 60 seconds.
6044
After 4 ServFail responses in 60 seconds, the query gets throttled too.
6049
5 timeouts in 20 seconds also lead to query suppression.
7487
Limit the size of the cache to a sensible value. Cache hit rate does not improve meaningfully beyond 4 million <command>max-cache-entries</command>, and
7488
reducing the memory footprint reduces CPU cache misses.
7493
Compile using g++ 4.1 or later. This compiler really does a good job on PowerDNS, much better than 3.4 or 4.0.
7498
Consider performing a 'profiled build' as described in the README. This is good for a 20% performance boost in some cases.
7503
When running with >3000 queries per second, and running Linux versions prior to 2.6.17 on some motherboards, your computer may
7504
spend an inordinate amount of time working around an ACPI bug for each call to gettimeofday. This is solved by rebooting with 'clock=tsc'
7505
or upgrading to a 2.6.17 kernel.
7508
The above is relevant if dmesg shows <command>Using pmtmr for high-res timesource</command>
7513
A busy server may need hundreds of file descriptors on startup, and deals with spikes better if it has that many available
7514
later on. Linux by default restricts processes to 1024 file descriptors, which should suffice most of the time, but Solaris
7515
has a default limit of 256. This can be raised using the ulimit command. FreeBSD has a default limit that is high enough for even
7516
very heavy duty use.
7521
If you need it, try <command>--fork</command>. This will fork the daemon into two halves, allowing it to benefit from a second CPU.
7522
This feature almost doubles performance, but is a bit of a hack.
6052
7525
</itemizedlist>
6055
<sect1><title>Statistics</title>
7526
Following the instructions above, you should be able to attain very high query rates.
7529
<sect1 id="recursor-details"><title>Details</title>
7530
<sect2 id="anti-spoofing"><title>Anti-spoofing</title>
7532
The PowerDNS recursor 3.0 uses a fresh UDP source port for each outgoing query, making spoofing around 64000 times harder. This
7533
raises the bar from 'easily doable given some time' to 'very hard'. Under some circumstances, 'some time' has been measured at 2 seconds.
7534
This technique was first used by <filename>dnscache</filename> by Dan J. Bernstein.
7537
In addition, PowerDNS detects when it is being sent too many unexpected answers, and mistrusts a proper answer if found within
7538
a clutch of unexpected ones.
7541
This behaviour can be tuned using the <command>spoof-nearmiss-max</command> setting.
7544
<sect2><title>Throttling</title>
7546
PowerDNS implements a very simple but effective nameserver. Care has been taken not to overload remote servers in case
7547
of overly active clients.
7550
This is implemented using the 'throttle'. This accounts for all recent traffic and prevents queries that have been sent out
7551
recently from going out again.
7554
There are three levels of throttling.
7558
If a remote server indicates that it is lame for a zone, the exact question won't
7559
be repeated in the next 60 seconds.
7564
After 4 ServFail responses in 60 seconds, the query gets throttled too.
7569
5 timeouts in 20 seconds also lead to query suppression.
7577
<sect1 id="recursor-stats"><title>Statistics</title>
7579
The <command>rec_control get</command> command can be used to query the following keys, either single keys or multiple keys
7582
all-outqueries counts the number of outgoing UDP queries since starting
7583
answers0-1 counts the number of queries answered within 1 millisecond
7584
answers100-1000 counts the number of queries answered within 1 second
7585
answers10-100 counts the number of queries answered within 100 milliseconds
7586
answers1-10 counts the number of queries answered within 10 milliseconds
7587
answers-slow counts the number of queries answered after 1 second
7588
cache-entries shows the number of entries in the cache
7589
cache-hits counts the number of cache hits since starting
7590
cache-misses counts the number of cache misses since starting
7591
chain-resends number of queries chained to existing outstanding query
7592
client-parse-errors counts number of client packets that could not be parsed
7593
concurrent-queries shows the number of MThreads currently running
7594
dlg-only-drops number of records dropped because of delegation only setting
7595
negcache-entries shows the number of entries in the Negative answer cache
7596
noerror-answers counts the number of times it answered NOERROR since starting
7597
nsspeeds-entries shows the number of entries in the NS speeds map
7598
nsset-invalidations number of times an nsset was dropped because it no longer worked
7599
nxdomain-answers counts the number of times it answered NXDOMAIN since starting
7600
outgoing-timeouts counts the number of timeouts on outgoing UDP queries since starting
7601
qa-latency shows the current latency average
7602
questions counts all End-user initiated queries with the RD bit set
7603
resource-limits counts number of queries that could not be performed because of resource limits
7604
server-parse-errors counts number of server replied packets that could not be parsed
7605
servfail-answers counts the number of times it answered SERVFAIL since starting
7606
spoof-prevents number of times PowerDNS considered itself spoofed, and dropped the data
7607
sys-msec number of CPU milliseconds spent in 'system' mode
7608
tcp-client-overflow number of times an IP address was denied TCP access because it already had too many connections
7609
tcp-outqueries counts the number of outgoing TCP queries since starting
7610
tcp-questions counts all incoming TCP queries (since starting)
7611
throttled-out counts the number of throttled outgoing UDP queries since starting
7612
throttle-entries shows the number of entries in the throttle map
7613
unauthorized-tcp number of TCP questions denied because of allow-from restrictions
7614
unauthorized-udp number of UDP questions denied because of allow-from restrictions
7615
unexpected-packets number of answers from remote servers that were unexpected (might point to spoofing)
7616
uptime number of seconds process has been running (since 3.1.5)
7617
user-msec number of CPU milliseconds spent in 'user' mode
7619
In the <filename>rrd/</filename> subdirectory a number of rrdtool scripts is provided to make nice
7620
graphs of all these numbers.
6057
7623
Every half hour or so, the recursor outputs a line with statistics. More infrastructure is planned so as to allow
6058
7624
for Cricket or MRTG graphs. To force the output of statistics, send the process a SIGUSR1. A line of statistics looks
6072
7638
Finally, 12% of queries were not performed because identical queries had gone out previously, saving load on servers worldwide.
7641
<sect1 id="recursor-design-and-engineering">
7642
<title>Design and Engineering of the PowerDNS Recursor</title>
7646
This section is aimed at programmers wanting to contribute to the recursor, or to help fix bugs. It is not required
7647
reading for a PowerDNS operator, although it might prove interesting.
7651
<para>The PowerDNS Recursor consists of very little code; the core DNS logic is less than a thousand lines.</para>
7653
<para>This smallness is achieved through the use of some fine infrastructure: MTasker, MOADNSParser, MPlexer and the C++ Standard Library/Boost. This page will explain the conceptual relation between these components, and the route of a packet through the program.</para>
7656
<title>The PowerDNS Recursor</title>
7657
<para>The Recursor started out as a tiny project, mostly a technology demonstration. These days it consists of the core plus 9000 lines of features. This combined with a need for very high performance has made the recursor code less accessible than it was. The page you are reading hopes to rectify this situation.</para>
7661
<title>Synchronous code using MTasker</title>
7662
<para>The original name of the program was <command>syncres</command>, which is still reflected in the filename <literal>syncres.cc</literal>, and the class SyncRes. This means that PowerDNS is written naively, with one thread of execution per query, synchronously waiting for packets. Normally this would lead to very bad performance (unless running on a computer with very fast threading, like possibly the Sun CoolThreads family), so PowerDNS employs <ulink url="http://ds9a.nl/mtasker">MTasker</ulink> for very fast userspace threading.</para>
7664
<para>MTasker, which was developed separately from PowerDNS, does not provide a full multithreading system but restricts itself to those features a nameserver needs. It offers cooperative multitasking, which means there is no forced preemption of threads. This in turn means that no two <command>MThreads</command> ever really run at the same time.</para>
7666
<para>This is both good and bad, but mostly good. It means PowerDNS does not have to think about locking. No two threads will ever be talking to the DNS cache at the same time, for example.</para>
7668
<para>It also means that the recursor could block if any operation takes too long.</para>
7670
<para>The core interaction with MTasker is through the waitEvent() and sendEvent() functions. These pass around PacketID objects. Everything PowerDNS needs to wait for is described by a PacketID event, so the name is a bit misleading. Waiting for a TCP socket to have data available is also passed via a PacketID, for example.</para>
7672
<para>The version of MTasker in PowerDNS is newer than that described at the MTasker site, with a vital difference being that the waitEvent() structure passes along a copy of the exact PacketID sendEvent() transmitted. Furthermore, threads can trawl through the list of events being waited for and modify the respective PacketIDs. This is used for example with <command>near miss</command> packets: packets that appear to answer questions we asked, but differ in the DNS id. On seeing such a packet, the recursor trawls through all PacketIDs and if it finds any nearmisses, it updates the PacketID::nearMisses counter. The actual PacketID thus lives inside MTasker while any thread is waiting for it.</para>
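<para>The near-miss bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the recursor's code: the PacketID here is a cut-down, hypothetical struct, and the function name deliverOrCountNearMiss is invented for the example.</para>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the real PacketID.
struct PacketID {
  std::string remote;   // address of the server we queried
  std::string domain;   // name we asked about
  uint16_t id;          // DNS id of our outgoing query
  unsigned nearMisses;  // answers that matched everything except the id
};

// Returns true if some waiter matches the incoming packet exactly.
// Otherwise bumps nearMisses on every waiter that differs only in the DNS id.
bool deliverOrCountNearMiss(std::vector<PacketID>& waiters, const PacketID& seen)
{
  for (const auto& w : waiters)
    if (w.remote == seen.remote && w.domain == seen.domain && w.id == seen.id)
      return true;
  for (auto& w : waiters)
    if (w.remote == seen.remote && w.domain == seen.domain)
      ++w.nearMisses;
  return false;
}
```

<para>A waiting thread could then compare its nearMisses counter against a limit before trusting an answer; how the real code makes that decision is not shown here.</para>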
7676
<title>MPlexer</title>
7677
<para>The Recursor uses a separate socket per outgoing query. This has the important benefit of making spoofing 64000 times harder, and additionally means that ICMP errors are reported back to the program. In measurements this appears to happen to one in ten queries, which would otherwise take a two-second timeout before PowerDNS moves on to another nameserver.</para>
7679
<para>However, this means that the program routinely needs to wait on hundreds or even thousands of sockets. Different operating systems offer various ways to monitor the state of sockets or more generally, filedescriptors. To abstract out the differing strategies (<function>select</function>, <function>epoll</function>, <function>kqueue</function>, <function>completion ports</function>), PowerDNS contains <command>MPlexer</command> classes, all of which descend from the FDMultiplexer class.</para>
7681
<para>This class is very simple and offers only five important methods: addReadFD(), addWriteFD(), removeReadFD(), removeWriteFD() and run().</para>
7683
<para>The arguments to the <command>add</command> functions consist of an fd, a callback, and a boost::any variable that is passed as a reference to the callback.</para>
7685
<para>This might remind you of the MTasker above, and it is indeed the same trick: state is stored within the MPlexer. As long as a filedescriptor remains within either the Read or Write active list, its state will remain stored.</para>
7687
<para>On arrival of a packet (or more generally, when an FD becomes readable or writable, which for example might mean a new TCP connection), the callback is called with the aforementioned reference to its parameter.</para>
7689
<para>The callback is free to call removeReadFD() or removeWriteFD() to remove itself from the active list.</para>
7691
<para>PowerDNS defines such callbacks as newUDPQuestion(), newTCPConnection(), handleRunningTCPConnection().</para>
7693
<para>Finally, the run() method needs to be called whenever the program is ready for new data. This happens in the main loop in pdns_recursor.cc. This loop is what MTasker refers to as <command>the kernel</command>. In this loop, any packets or other MPlexer events get translated either into new MThreads within MTasker, or into calls to sendEvent(), which in turn wakes up other MThreads.</para>
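<para>To make the shape of such a multiplexer concrete, here is a small sketch built on POSIX poll(). It is an assumption-laden miniature, not the FDMultiplexer from the source tree: the class name TinyMultiplexer, the callback signature and the timeout argument are invented, only the read side is shown, and std::any (C++17) stands in for boost::any.</para>

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <poll.h>
#include <unistd.h>
#include <vector>

// Hypothetical miniature of an FDMultiplexer-style class, read side only.
class TinyMultiplexer {
public:
  using Callback = std::function<void(int fd, std::any& param)>;

  // Register fd; the callback and its parameter are stored inside the multiplexer.
  void addReadFD(int fd, Callback cb, std::any param = {}) {
    d_readers[fd] = std::make_shared<Entry>(Entry{std::move(cb), std::move(param)});
  }
  void removeReadFD(int fd) { d_readers.erase(fd); }

  // Waits up to timeoutMs and invokes the callback of every readable fd.
  // Returns the number of callbacks invoked.
  int run(int timeoutMs) {
    std::vector<pollfd> fds;
    for (const auto& r : d_readers)
      fds.push_back(pollfd{r.first, POLLIN, 0});
    if (fds.empty() || poll(fds.data(), fds.size(), timeoutMs) <= 0)
      return 0;
    int called = 0;
    for (const auto& p : fds) {
      auto it = d_readers.find(p.fd);
      if ((p.revents & POLLIN) && it != d_readers.end()) {
        std::shared_ptr<Entry> e = it->second;  // keeps state alive if the callback removes itself
        e->cb(p.fd, e->param);
        ++called;
      }
    }
    return called;
  }

private:
  struct Entry { Callback cb; std::any param; };
  std::map<int, std::shared_ptr<Entry>> d_readers;
};
```

<para>Holding each entry through a shared pointer is one way to let a callback call removeReadFD() on its own descriptor without invalidating the parameter it was handed.</para>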
7697
<title>MOADNSParser</title>
7698
<para>Yes, this does stand for <command>the Mother of All DNS Parsers</command>. And even that name does not do it justice! The MOADNSParser is the third attempt I've made at writing a DNS packet parser and after two miserable failures, I think I've finally gotten it right.</para>
7700
<para>Writing and parsing DNS packets, and the DNS records they contain, consists of four things:
7704
Parsing a DNS record (from packet) into memory
7709
Generating a DNS record from memory (to packet)
7714
Writing out memory to user-readable zone format
7719
Reading said zone format into memory
7725
<para>This gets tedious very quickly, as one needs to implement all four operations for each new record type, and there are dozens of them.</para>
7727
<para>While writing the MOADNSParser, it was discovered there is a remarkable symmetry between these four transitions. DNS Records are nearly always laid out in the same order in memory as in their zone format representation. And reading is nothing but inverse writing.</para>
7729
<para>So, the MOADNSParser is built around the notion of a <command>Conversion</command>, and we write all Conversion types once. So we have a Conversion from IP address in memory to an IP address in a DNS packet, and vice versa. And we have a Conversion from an IP address in zone format to memory, and vice versa.</para>
7731
<para>This in turn means that the entire implementation of the ARecordContent is as follows (wait for it!)</para>
7733
<literallayout class="monospaced">conv.xfrIP(d_ip);</literallayout>
7734
<para>Through the use of the magic called <literal>c++ Templates</literal>, this one line does everything needed to perform the four operations mentioned above.</para>
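<para>The idea of describing a record once and letting the converter decide the direction can be sketched as below. All class names here (ToPacket, FromPacket, ToZone, ARecordSketch) are hypothetical and much simpler than the real MOADNSParser classes; only three of the four directions are shown, and zone-format parsing would be a fourth converter with the same xfrIP() member.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical converter that serializes fields into wire format.
struct ToPacket {
  std::vector<uint8_t> bytes;
  void xfrIP(uint32_t& ip) {
    for (int shift = 24; shift >= 0; shift -= 8)
      bytes.push_back(static_cast<uint8_t>(ip >> shift));
  }
};

// Hypothetical converter that fills fields from wire format.
struct FromPacket {
  const std::vector<uint8_t>& bytes;
  std::size_t pos;
  void xfrIP(uint32_t& ip) {
    ip = 0;
    for (int i = 0; i < 4; ++i)
      ip = (ip << 8) | bytes.at(pos++);
  }
};

// Hypothetical converter that writes dotted-quad zone format.
struct ToZone {
  std::string text;
  void xfrIP(uint32_t& ip) {
    text = std::to_string(ip >> 24) + "." + std::to_string((ip >> 16) & 0xff) + "." +
           std::to_string((ip >> 8) & 0xff) + "." + std::to_string(ip & 0xff);
  }
};

// The record states its layout once; the converter type picks the operation.
struct ARecordSketch {
  uint32_t d_ip = 0;
  template <typename Convertor>
  void xfr(Convertor& conv) { conv.xfrIP(d_ip); }
};
```

<para>A record type with more fields would simply list more xfr calls in the same order as its wire and zone representations.</para>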
7736
<para>At one point, I got really obsessed with PowerDNS memory use. So, how do we store DNS data in the PowerDNS recursor? I mentioned <command>memory</command> above a lot - this means we could just store the DNSRecordContent objects. However, this would be wasteful.</para>
7738
<para>For example, storing the following:</para>
7740
<literallayout class="monospaced">www.ds9a.nl 3600 IN CNAME outpost.ds9a.nl.</literallayout>
7741
<para>Would duplicate a lot of data. So, what is actually stored is a partial DNS packet. To store the CNAMEDNSRecordContent that corresponds to the above, we generate a DNS packet that has <command>www.ds9a.nl IN CNAME</command> as its question. Then we add <command>3600 IN CNAME outpost.ds9a.nl.</command> as its answer. Then we chop off the question part, and store the rest in the <command>www.ds9a.nl IN CNAME</command> key in our cache.</para>
7743
<para>When we need to retrieve <command>www.ds9a.nl IN CNAME</command>, the inverse happens. We find the proper partial packet, prefix it with a question for <command>www.ds9a.nl IN CNAME</command>, and expand the resulting packet into the answer <command>3600 IN CNAME outpost.ds9a.nl.</command>.</para>
7745
<para>Why do we go through all these motions? Because of DNS compression, which allows us to omit the whole <command>.ds9a.nl.</command> part, saving us 9 bytes. This is amplified when storing multiple MX records which all look more or less alike. This optimization is not performed yet though.</para>
7747
<para>Even without compression, it makes sense as all records are automatically stored very compactly.</para>
7749
<para>The PowerDNS recursor only parses a number of <command>well known record types</command> and passes all other information across verbatim - it doesn't have to know about the content it is serving.</para>
7753
<title>The C++ Standard Library / Boost</title>
7754
<para>C++ is a powerful language. Perhaps a bit too powerful at times: you can turn a program into a real freakshow if you so desire.</para>
7756
<para>PowerDNS generally tries not to go overboard in this respect, but we do build upon a very advanced part of the <ulink url="http://www.boost.org">Boost</ulink> C++ library:
7757
<ulink url="http://boost.org/libs/multi_index/doc/index.html">boost::multi index container</ulink>.</para>
7759
<para>This container provides the equivalent of SQL indexes on multiple keys. It also implements compound keys, which PowerDNS uses as well.</para>
7761
<para>The main DNS cache is implemented as a multi index container object, with a compound key on the name and type of a record. Furthermore, the cache is sequenced: each time a record is accessed it is moved to the end of the list. When cleanup is performed, we start at the beginning. New records also get inserted at the end. For DNS correctness, the sort order of the cache is case insensitive.</para>
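<para>The behaviour of such a sequenced, case-insensitively keyed cache can be approximated with standard containers. The sketch below deliberately swaps boost::multi_index_container for a std::list plus a std::map, so it shows the access pattern rather than the actual data structure; LRUCacheSketch and its members are invented names.</para>

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <utility>

// Compound key: (name, type). The name is compared case-insensitively.
typedef std::pair<std::string, uint16_t> CacheKey;

static std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

struct CaseInsensitiveLess {
  bool operator()(const CacheKey& a, const CacheKey& b) const {
    const std::string la = toLower(a.first), lb = toLower(b.first);
    if (la != lb) return la < lb;
    return a.second < b.second;
  }
};

class LRUCacheSketch {
public:
  // Insert or refresh a record; either way it ends up at the back.
  void put(const CacheKey& key, const std::string& content) {
    auto it = d_index.find(key);
    if (it != d_index.end()) {
      d_order.erase(it->second);
      d_index.erase(it);
    }
    d_order.push_back(std::make_pair(key, content));
    d_index[key] = std::prev(d_order.end());
  }
  // Look up a record; a hit moves it to the back of the sequence.
  const std::string* get(const CacheKey& key) {
    auto it = d_index.find(key);
    if (it == d_index.end()) return nullptr;
    d_order.splice(d_order.end(), d_order, it->second);
    return &it->second->second;
  }
  // Cleanup starts at the front, where the least recently used records sit.
  void prune(std::size_t keep) {
    while (d_order.size() > keep) {
      d_index.erase(d_order.front().first);
      d_order.pop_front();
    }
  }
  std::size_t size() const { return d_order.size(); }

private:
  typedef std::list<std::pair<CacheKey, std::string>> Sequence;
  Sequence d_order;
  std::map<CacheKey, Sequence::iterator, CaseInsensitiveLess> d_index;
};
```

<para>The multi index container keeps both views in one object and avoids the double bookkeeping this sketch needs.</para>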
7763
<para>The multi index container appears in other parts of PowerDNS, and MTasker as well.</para>
7767
<title>Actual DNS Algorithm</title>
7768
<para>The DNS RFCs do define the DNS algorithm, but you can't actually implement it exactly that way; it was written in 1987.</para>
7770
<para>Also, like what happened to HTML, it is expected that even non-standards conforming domains work, and a sizeable fraction of them is misconfigured these days.</para>
7772
<para>Everything begins with SyncRes::beginResolve(), which knows nothing about sockets, and needs to be passed a domain name, dns type and dns class which we are interested in. It returns a vector of DNSResourceRecord objects, ready for writing either into an answer packet, or for internal use.</para>
7774
<para>After checking if the query is for any of the hardcoded domains (localhost, version.bind, id.server), the query is passed to SyncRes::doResolve, together with two vital parameters: the <literal>depth</literal> and <literal>beenthere</literal> set. As the word <command>recursor</command> implies, we will need to recurse for answers. The <command>depth</command> parameter documents how deep we've recursed already.</para>
7776
<para>The <literal>beenthere</literal> set prevents loops. At each step, when a nameserver is queried, it is added to the <literal>beenthere</literal> set. No nameserver in the set will ever be queried again for the same question in the recursion process - we know for a fact it won't help us further. This prevents the process from getting stuck in loops.</para>
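<para>In its simplest form, such a loop guard is a set that remembers which nameserver was already asked which question. The sketch below assumes the set holds (nameserver, question) pairs of strings; the element type used by the recursor itself may well differ, and firstVisit is an invented helper name.</para>

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical element type: (nameserver name, question) already tried.
typedef std::set<std::pair<std::string, std::string>> BeenThere;

// Records the visit and returns true only the first time this
// nameserver is asked this question during one resolution.
bool firstVisit(BeenThere& beenthere, const std::string& nameserver, const std::string& question)
{
  return beenthere.insert(std::make_pair(nameserver, question)).second;
}
```

<para>A resolver loop would skip any nameserver for which firstVisit() returns false and move on to the next candidate.</para>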
7778
<para>SyncRes::doResolve first checks if there is a CNAME in cache, using SyncRes::doCNAMECacheCheck, for the domain name and type queried and if so, changes the query (which is passed by reference) to the domain the CNAME points to. This is the cause of many DNS problems: a CNAME record really means <command>start over with this query</command>.</para>
7780
<para>This is followed by a call to SyncRes::doCacheCheck, which consults the cache for a straight answer to the question (as possibly rerouted by a CNAME). This function also consults the so called negative cache, but we won't go into that just yet.</para>
7782
<para>If this function finds the correct answer, and the answer hasn't expired yet, it gets returned and we are (almost) done. This happens in 80 to 90% of all queries. Which is good, as what follows is a lot of work.</para>
7788
beginResolve() - entry point, does checks for hardcoded domains
7793
doResolve() - start of recursion process, gets passed <literal>depth</literal> of 0 and empty <literal>beenthere</literal> set
7798
doCNAMECacheCheck() - check if there is a CNAME in cache which would reroute the query
7803
doCacheCheck() - see if cache contains straight answer to possibly rerouted query.
7808
<para>If the data we were queried for was in the cache, we are almost done. One final step, which might as well be optional as nobody benefits from it, is SyncRes::addCruft. This function does additional processing, which means that if the query was for the MX record of a domain, we also add the IP address of the mail exchanger.</para>
7812
<title>The non-cached case</title>
7813
<para>This is where things get interesting, because we start out with a nearly empty cache and have to go out to the net to get answers to fill it.</para>
7815
<para>The way DNS works, if you don't know the answer to a question, you find somebody who does. Initially you have no other place to go than the root servers. This is embodied in the SyncRes::getBestNSNamesFromCache method, which gets passed the domain we are interested in, as well as the <literal>depth</literal> and <literal>beenthere</literal> parameters mentioned earlier.</para>
7817
<para>From now on, assume our query will be for <command><literal>www.powerdns.com.</literal></command>. SyncRes::getBestNSNamesFromCache will first check if there are NS records in cache for <literal><command>www.powerdns.com.</command></literal>, but there won't be. It then checks <literal>powerdns.com. NS</literal>, and while these records do exist on the internet, the recursor doesn't know about them yet. So, we go on to check the cache for <literal><command>com. NS</command></literal>, for which the same holds. Finally we end up checking for <literal><command>. NS</command></literal>, and these we do know about: they are the root servers and were loaded into PowerDNS on startup.</para>
7819
<para>So, SyncRes::getBestNSNamesFromCache fills out a set with the <command>names</command> of nameservers it knows about for the <command><literal>.</literal></command> zone.</para>
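<para>The label-by-label walk towards the root can be sketched as follows. This is an illustration under simplifying assumptions: the cache is modelled as a plain map from zone name to nameserver names, names are fully qualified with a trailing dot, and the function names chopLabel and bestNSNames are invented for the example.</para>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical NS cache: zone name -> names of its nameservers.
typedef std::map<std::string, std::set<std::string>> NSCache;

// Strip the leftmost label: "www.powerdns.com." -> "powerdns.com.", "com." -> "."
std::string chopLabel(const std::string& name)
{
  std::string::size_type dot = name.find('.');
  if (dot == std::string::npos || dot + 1 >= name.size())
    return ".";
  return name.substr(dot + 1);
}

// Walk from qname towards the root until NS names are found in the cache.
// 'zone' receives the zone the names belong to, or is cleared if none are known.
std::set<std::string> bestNSNames(const NSCache& cache, std::string qname, std::string& zone)
{
  for (;;) {
    auto it = cache.find(qname);
    if (it != cache.end()) {
      zone = qname;
      return it->second;
    }
    if (qname == ".") {
      zone.clear();
      return std::set<std::string>();
    }
    qname = chopLabel(qname);
  }
}
```

<para>With only the root hints loaded, a lookup for www.powerdns.com. falls through to the root zone; once NS records for com. have been cached, the same call stops one step earlier.</para>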
7821
<para>This set, together with the original query <command><literal>www.powerdns.com</literal></command> gets passed to SyncRes::doResolveAt. This function can't yet go to work immediately though, it only knows the names of nameservers it can try. This is like asking for directions and instead of hearing <command>take the third right</command> you are told <command>go to 123 Fifth Avenue, and take a right</command> - the answer doesn't help you further unless you know where 123 Fifth Avenue is.</para>
7823
<para>SyncRes::doResolveAt first shuffles the nameservers both randomly and on performance order. If it knows a nameserver was fast in the past, it will get queried first. More about this later.</para>
7825
<para>Ok, here is the part where things get a bit scary. How does SyncRes::doResolveAt find the IP address of a nameserver? Well, by calling SyncRes::getAs (<command>get A records</command>), which in turn calls.. SyncRes::doResolve. Hang on! That's where we came from! Massive potential for loops here. Well, it turns out that for any domain which can be resolved, this loop terminates. We do pass the <literal>beenthere</literal> set again, which makes sure we don't keep on asking the same questions to the same nameservers.</para>
7827
<para>Ok, SyncRes::getAs will give us the IP addresses of the chosen root-server, because these IP addresses were loaded on startup. We then ask these IP addresses (nameservers can have several) for their best answer for <command><literal>www.powerdns.com.</literal></command>. This is done using the LWRes class and specifically LWRes::asyncresolve, which gets passed domain name, type and IP address. This function interacts with MTasker and MPlexer above in ways which needn't concern us now. When it returns, the LWRes object contains the best answers the queried server had for our domain, which in this case means it tells us about the nameservers of <literal>com.</literal>, and their IP addresses.</para>
7829
<para>All the relevant answers it gives are stored in the cache (or actually, merged), after which SyncRes::doResolveAt (which we are still in) evaluates what to do now.</para>
7831
<para>There are 6 options:
7835
The final answer is in, we are done, return to SyncRes::doResolve and SyncRes::beginResolve
7840
The nameserver we queried tells us the domain we asked for authoritatively does not exist. In case of the root-servers, this happens when we query for <emphasis><literal>www.powerdns.kom.</literal></emphasis> for example, there is no <emphasis><literal>kom.</literal></emphasis>. Return to SyncRes::beginResolve, we are done.
7845
A lesser form - it tells us it is authoritative for the query we asked about, but there is no record matching our type. This happens when querying for the IPv6 address of a host which only has an IPv4 address. Return to SyncRes::beginResolve, we are done.
7850
The nameserver passed us a CNAME to another domain, and we need to reroute. Go to SyncRes::doResolve for the new domain.
7855
The nameserver did not know about the domain, but does know who does, a <emphasis>referral</emphasis>. Stay within doResolveAt and loop to these new nameservers.
7860
The nameserver replied saying <emphasis>no idea</emphasis>. This is called a <emphasis>lame delegation</emphasis>. Stay within SyncRes::doResolveAt and try the other nameservers we have for this domain.
7865
<para>When not redirected using a CNAME, this function will loop until it has exhausted all nameservers and all their IP addresses. DNS is surprisingly resilient: there is often only a single non-broken nameserver left to answer queries, and we need to be prepared for that.</para>
7867
<para>This is the whole DNS algorithm in PowerDNS, all in less than 700 lines of code. It contains a lot of tricky bits though, related to the cache.</para>
7871
<title>Some of the things we glossed over</title>
7872
<para>Whenever a packet is sent to a remote nameserver, the response time is stored in the SyncRes::s_nsSpeeds map, using an exponentially weighted moving average. This EWMA averages out different response times, and also makes them decrease over time. This means that a nameserver that hasn't been queried recently gradually becomes <command>faster</command> in the eyes of PowerDNS, giving it a chance again.</para>
7874
<para>A timeout is accounted as a 1s response time, which should take that server out of the running for a while.</para>
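<para>One way to build an average that both smooths new measurements and fades while a server goes unqueried is shown below. The formula and the time constant are assumptions chosen for illustration; the text above does not state the exact weighting the recursor uses, and DecayingEwmaSketch is an invented name.</para>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical decaying average of response times, in milliseconds.
class DecayingEwmaSketch {
public:
  // tau: assumed time constant in seconds (illustrative parameter).
  explicit DecayingEwmaSketch(double tau) : d_tau(tau), d_val(0), d_last(0) {}

  // Mix in a measurement taken at time 'now' (seconds). The older the
  // previous value, the less it weighs against the new sample.
  void submit(double millis, double now) {
    const double factor = std::exp(-(now - d_last) / d_tau);
    d_val = factor * d_val + (1.0 - factor) * millis;
    d_last = now;
  }

  // Read the average at time 'now'; it shrinks while no new samples arrive.
  double get(double now) const {
    return d_val * std::exp(-(now - d_last) / d_tau);
  }

private:
  double d_tau, d_val, d_last;
};
```

<para>Submitting 1000 for a timeout pushes a server to the back of the queue, and the decay in get() gradually lets it compete again.</para>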
7876
<para>Furthermore, queries are throttled. This means that each query to a nameserver that has failed is accounted in the <literal>s_throttle</literal> object. Before performing a new query, the query and the nameserver are looked up via shouldThrottle. If so, the query is assumed to have failed without even being performed. This saves a lot of network traffic and makes PowerDNS quick to respond to lame servers.</para>
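<para>A throttle of this kind can be sketched as a map from (nameserver, question) to a failure count with an expiry time. The sketch is an assumption, not the s_throttle implementation: the class ThrottleSketch, the reportFailure() method and its window and limit parameters are invented, and only shouldThrottle is a name taken from the text.</para>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

class ThrottleSketch {
public:
  // Hypothetical key: (nameserver address, question).
  typedef std::pair<std::string, std::string> Key;

  // Record a failure at time 'now' (seconds). Once 'limit' failures have been
  // seen within 'window' seconds, the key is suppressed until the window ends.
  void reportFailure(const Key& key, long now, long window, unsigned limit) {
    Entry& e = d_entries[key];
    if (now >= e.windowEnd) {
      e.windowEnd = now + window;
      e.failures = 0;
    }
    ++e.failures;
    e.blocked = e.failures >= limit;
  }

  // True if the query should be treated as failed without being sent.
  bool shouldThrottle(const Key& key, long now) {
    auto it = d_entries.find(key);
    if (it == d_entries.end()) return false;
    if (now >= it->second.windowEnd) {
      d_entries.erase(it);  // expired entries are dropped on lookup
      return false;
    }
    return it->second.blocked;
  }

private:
  struct Entry {
    long windowEnd = 0;
    unsigned failures = 0;
    bool blocked = false;
  };
  std::map<Key, Entry> d_entries;
};
```

<para>With a window of 60 and a limit of 4 this mirrors the ServFail rule mentioned elsewhere in this documentation; a window of 20 and a limit of 5 would mirror the timeout rule.</para>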
7878
<para>It also offers a modicum of protection against birthday attack powered spoofing attempts, as PowerDNS will not inundate a broken server with queries.</para>
7880
<para>The negative query cache we mentioned earlier caches the cases 2 and 3 in the enumeration above. This data needs to be stored separately, as it represents <command>non-data</command>. Each negcache query entry is the name of the SOA record that was presented with the evidence of non-existence. This SOA record is then retrieved from the regular cache, but with the TTL that originally came with the NXDOMAIN (case 2) or NXRRSET (case 3).</para>
7884
<title>The Recursor Cache</title>
7885
<para>As mentioned before, the cache stores partial packets. It also stores not the <command>Time To Live</command> of records, but in fact the <command>Time To Die</command>. If the cache contains data, but it is expired, that data should not be deemed present. This bit of PowerDNS has proven tricky, leading to deadlocks in the past.</para>
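<para>Storing the moment of expiry instead of the remaining lifetime can be sketched as below. The class and member names are hypothetical, and the record content is kept as a plain string rather than a partial packet to keep the example short.</para>

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>

// Hypothetical cache entry: content plus its absolute Time To Die.
struct CachedRecordSketch {
  std::string content;
  time_t ttd;
};

class TtdCacheSketch {
public:
  // Convert the relative TTL into an absolute expiry time once, on insert.
  void store(const std::string& key, const std::string& content, unsigned ttl, time_t now) {
    d_records[key] = CachedRecordSketch{content, now + static_cast<time_t>(ttl)};
  }

  // Returns the remaining TTL in seconds, or -1 if the record is absent
  // or has expired. Expired data is treated exactly like missing data.
  long remainingTTL(const std::string& key, time_t now) const {
    auto it = d_records.find(key);
    if (it == d_records.end() || it->second.ttd <= now)
      return -1;
    return static_cast<long>(it->second.ttd - now);
  }

private:
  std::map<std::string, CachedRecordSketch> d_records;
};
```

<para>The TTL handed to a client is then recomputed at answer time, so nothing in the cache has to be touched as the clock advances.</para>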
7887
<para>There are some other very tricky things to deal with. For example, through a process called <command>more details</command>, a domain might have more nameservers than listed in its parent zone. So, there might only be two nameservers for <literal><command>powerdns.com.</command></literal> in the <command><literal>com.</literal></command> zone, but the <command><literal>powerdns.com</literal></command> zone might list more.</para>
7889
<para>This means that the cache should not, when talking to the <command><literal>com.</literal></command> servers later on, overwrite this fuller set of nameservers with only the two that the <command><literal>com.</literal></command> servers pass us.</para>
7891
<para>However, in other cases (like for example for SOA and CNAME records), new data should overwrite old data.</para>
7892
<para>Note that PowerDNS deviates from RFC 2181 (section 5.4.1) in this respect.</para>
7896
<title>Some small things</title>
7897
<para>The server-side part of PowerDNS (<literal>pdns_recursor.cc</literal>), which listens to queries by end-users, is fully IPv6 capable using the ComboAddress class. This class is in fact a union of a <literal>struct sockaddr_in</literal> and a <literal>struct sockaddr_in6</literal>. As long as the <literal>sin_family</literal> (or <literal>sin6_family</literal>) and <literal>sin_port</literal> members are in the same place, this works just fine, allowing us to pass a ComboAddress*, cast to a <literal>sockaddr*</literal> to the socket functions. For convenience, the ComboAddress also offers a length() method which can be used to indicate the length - either sizeof(sockaddr_in) or sizeof(sockaddr_in6).</para>
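<para>The union trick can be illustrated with a stripped-down sketch. ComboAddressSketch is a hypothetical miniature, not the real ComboAddress class; the static_assert lines spell out the layout assumption the paragraph above relies on.</para>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical miniature of a ComboAddress-style union.
union ComboAddressSketch {
  struct sockaddr_in sin4;
  struct sockaddr_in6 sin6;

  // Start out as an all-zero IPv4 address.
  ComboAddressSketch() {
    std::memset(static_cast<void*>(this), 0, sizeof(*this));
    sin4.sin_family = AF_INET;
  }

  // Length to hand to the socket functions along with a sockaddr*.
  socklen_t length() const {
    return sin4.sin_family == AF_INET ? sizeof(sin4) : sizeof(sin6);
  }
};

// The trick only works if family and port sit at the same offsets in both structs.
static_assert(offsetof(sockaddr_in, sin_family) == offsetof(sockaddr_in6, sin6_family),
              "address family members must line up");
static_assert(offsetof(sockaddr_in, sin_port) == offsetof(sockaddr_in6, sin6_port),
              "port members must line up");
```

<para>A pointer to such a union can be cast to sockaddr* and passed to calls like bind() or sendto() together with length().</para>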
7899
<para>Access to the recursor is governed through the NetmaskGroup class, which internally contains Netmasks, which in turn contain a ComboAddress.</para>
6076
7903
<chapter id="replication"><title>Master/Slave operation & replication</title>