DNS Magic

This document outlines how we set up our DNS data.

This design has been in production use at some very large sites with complex DNS requirements for over 20 years.

We use an SCM system (cvs, git, hg etc.) and make to ensure reliable operation while allowing for multiple hostmasters and automated tools.

The scheme relies on rather rigid rules for naming zone files etc., but thanks to a simple script, converting your old DNS setup to this method is quite painless.

The advantages of this setup are:

Source Code Management (SCM)

An SCM allows you to keep an audit trail of changes. This is very important: after a DNS outage caused by human error, you will be able to work out who changed what, and how to prevent it happening again via the regression suite.

More importantly, it ensures that the directory named loads from is never used for editing. Editing is done in a separate tree (or trees), to avoid the risk of named loading a partially edited zone file.

The setup described here can utilize a number of different SCMs. A key requirement is the ability to configure pre-commit checks.

The original distribution used the venerable CVS, which, while it lacks many features compared to more recent SCMs, is adequate for this purpose.

Git and Mercurial (hg) are also good choices with easily configured pre-commit hooks. However, unlike CVS, the hook has to be enabled in each clone of the repository that commits will be made from. This is not a problem, since we do it automatically the first time make is run.

Note: although CVS (like SVN) is a centralized SCM, the repository must exist on the same machine where edits are being made, so that the pre-commit checks can access the client's context.

CVS

For our purpose this may still be the simplest SCM to use.

The hostmaster might:

$ cd $HOME
$ cvs checkout named

to obtain a private copy of the DNS data for editing. Making changes thereafter is simply a case of:

$ cd ~/named
$ cvs update -dP                # pick up changes made by others
$ vi hosts/crufty.net.db        # make changes as desired
$ cvs diff -cb                  # see what we have changed
$ cvs commit -m"log comment"    # check in the changes.

If the log comment is omitted then the user is dropped into an editor ($EDITOR or vi) to make a log entry. If multiple files have been changed, they can be committed separately or as a group by adding file names to the commit command.

See here for more detail on using CVS.

Git

The Git setup works best with a bare repository acting as the central repo, which edits are pushed to and the live data is pulled from.

The hostmaster might:

$ cd $HOME
$ git clone /share/gits/named.git named
$ cd ~/named
$ make

to obtain a clone of the repo to work with. That first make command will, among other things, set up .git/hooks/pre-commit.
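The hook installation is handled by dnsmagic.mk, which is not reproduced here. As a rough illustration only, a hypothetical helper (install_git_hook is our name for it, not part of DNS Magic) might write a pre-commit hook that runs make regress, leaving any existing hook alone:

```sh
# Hypothetical helper, for illustration only: install a pre-commit hook
# that runs the regression suite. Not the actual dnsmagic.mk logic.
install_git_hook() {
    top=${1:-.}                         # top of the working tree
    hook="$top/.git/hooks/pre-commit"
    # Leave an existing hook alone so local customizations survive.
    if [ -f "$hook" ]; then
        return 0
    fi
    mkdir -p "$top/.git/hooks" || return 1
    cat > "$hook" <<'EOF'
#!/bin/sh
# Abort the commit unless the regression suite passes.
exec make regress
EOF
    chmod +x "$hook"
}
```

Git runs hooks from the top of the working tree, so make regress runs next to the Makefile, and a non-zero exit from the hook aborts the commit.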

Then as before we make our changes and:

$ vi hosts/crufty.net.db        # make changes as desired
$ git add hosts/crufty.net.db   # stage changed files
$ git commit -m"log comment"    # commit if regression suite happy

Unlike the CVS example, we can make multiple commits without the live named picking up any of them until we:

$ git push

Mercurial (hg)

Usage is virtually identical to Git.

make

Make is a tool used to keep files up to date with their dependencies. We use it to ensure that the zone files loaded by named are up to date and that the zone file serial numbers are updated when any of the data within the zone is.

We provide a Makefile for bmake (BSD make) and a GNUmakefile for gmake. Both of these provide setup and then include dnsmagic.mk where the actual logic is kept.

zone files

To achieve our goal, the zone files referenced by named.conf (or, more specifically, by primary.zones) contain nothing but the SOA record (where the serial number lives) and the appropriate $INCLUDE directives.

Since make is most conveniently driven by filename suffixes, we use the convention that the SOA file has an extension of .soa and the included zone file has an extension of .db.

An example always helps:

# make depend
dnsdeps -N named.conf
# touch ns.list
# make
updsoa hosts/crufty.net.soa
updsoa rev/203.12.250.soa
bouncedns

In the above, we ran make depend which uses dnsdeps to ensure that all the dependencies of all the zone files referenced by primary.zones are recorded. We then simply touch a file that some zones are dependent on and run make, which runs updsoa to update the serial number of zones that were dependent on ns.list.
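The exact serial scheme updsoa uses is not spelled out here, but the BIND-9 example later on shows a date-based serial (2010072100). Assuming a YYYYMMDDnn layout, the core of the update might look like this hypothetical next_serial function:

```sh
# Hypothetical sketch: compute the next SOA serial in YYYYMMDDnn form.
# $1 = current (integer) serial, $2 = today as YYYYMMDD (default: now).
next_serial() {
    old=$1
    today=${2:-$(date +%Y%m%d)}
    base=$((today * 100))               # first serial of the day
    if [ "$old" -ge "$base" ]; then
        # Already changed today (or the serial is ahead): just bump it.
        echo $((old + 1))
    else
        echo "$base"
    fi
}
```

The serial only ever increases, which is what secondaries rely on to notice a change; more than 99 updates in one day simply spill into the next day's range. Dotted serials are not handled by this sketch.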

# rm hosts/crufty.net.soa
# make
updsoa hosts/crufty.net.soa
bouncedns

In the above example, we remove one of the .soa files to simulate an accident (or perhaps the addition of a new .db file). When we then run make, the .soa file is created automagically.
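A minimal sketch of how a missing .soa might be generated for a .db file. The function name make_soa, the NS and HOSTMASTER variables, and the timer values (copied from the example SOA further down this page) are all assumptions for illustration:

```sh
# Hypothetical sketch: create the .soa that goes with a .db file.
# NS, HOSTMASTER and the timer values are example values only.
make_soa() {
    db=$1
    soa=${db%.db}.soa
    # Never clobber an existing .soa; it holds the live serial number.
    if [ -f "$soa" ]; then
        return 0
    fi
    serial=${2:-$(date +%Y%m%d)00}
    cat > "$soa" <<EOF
;; DO NOT EDIT THIS FILE it is maintained by magic
\$TTL    14400
@       IN      SOA     ${NS:-ns.crufty.net.} ${HOSTMASTER:-hostmaster.crufty.net.}  (
                                $serial ; serial
                                7200    ; Refresh 2 hour
                                1800    ; Retry 1/2 hour
                                3600000 ; Expire 1000 hours
                                14400 ) ; Minimum
\$INCLUDE ns.list
\$INCLUDE $db
EOF
}
```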

On systems that do not have perl and cannot install it, we have shell scripts that provide the same functionality at lower speed. Performance really only matters, though, when generating PTR records for a large domain. The biggest benefit of the perl scripts is debuggability.

dependencies

The makefile runs dnsdeps whenever the named.conf or primary.zones files are updated. The purpose is to ensure that make knows about all the files that a zone file depends on. The .depend file produced looks like:

hosts/crufty.net.soa: \
        hosts/crufty.net.db \
        mx/crufty.net \
        ns.list

rev/203.12.250.soa: \
        rev/203.12.250.db \
        ns.list

.zones: \
        hosts/crufty.net.soa \
        named.local \
        rev/203.12.250.soa
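dnsdeps itself starts from named.conf and follows nested includes (mx/crufty.net above is presumably pulled in by hosts/crufty.net.db). As a much simpler, hypothetical stand-in, the following prints a stanza in the same format from the $INCLUDE lines of a single .soa file, without following nesting:

```sh
# Hypothetical, simplified stand-in for dnsdeps: emit a make
# dependency stanza for one .soa from its $INCLUDE directives.
# Nested $INCLUDEs inside the included files are NOT followed.
soa_deps() {
    soa=$1
    awk -v target="$soa" '
        toupper($1) == "$INCLUDE" { deps[++n] = $2 }
        END {
            printf "%s:", target
            # continuation backslash, newline and tab before each dep
            for (i = 1; i <= n; i++) printf " \\\n\t%s", deps[i]
            printf "\n"
        }' "$soa"
}
```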

Reverse maps

Small sites can easily keep their in-addr.arpa zones in sync with the rest of their DNS data. For large nets or for bootstrapping, updrev can be used to build in-addr .db files for all the A records found in the zone files.

Updrev works with data gleaned from a named_dump.db file and respects any existing PTR records provided that a matching A record exists.

Thus, updrev can be used to generate the reverse maps initially, and a human can then edit them to override the tool's choices; such overrides will persist. For sites that object to using perl, updrev.sh is a shell version of the same logic.

The tool is reasonably efficient: updrev.pl can generate or update reverse maps at about 10,000 A records per minute (measured on a Sparc Classic many years ago ;-).

Note that updrev only supports the DNS arrangement described in this document.
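The heart of generating a reverse map is reversing the octets of each address. A hypothetical sketch, assuming input lines of the form "A name address" (the format produced by getdata, described under Regression Testing) and ignoring updrev's handling of existing PTR records:

```sh
# Hypothetical sketch: turn "A name address" lines into
# "PTR reversed-address.in-addr.arpa name" lines.
# Reads the named files, or stdin if none are given.
a_to_ptr() {
    awk '$1 == "A" && split($3, o, ".") == 4 {
        printf "PTR %s.%s.%s.%s.in-addr.arpa %s\n", o[4], o[3], o[2], o[1], $2
    }' "$@"
}
```

For example, "A gate.crufty.net 203.12.250.130" becomes "PTR 130.250.12.203.in-addr.arpa gate.crufty.net", the same form shown in the getdata output later on.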

bouncedns

On a nameserver for a large network, it is not practical to reload/restart named every time a change is made. Even on a small nameserver, we want to reload named when .soa files are updated, but not once for each .soa file that is updated.

For this reason, the bouncedns command above simply touches a flag file to indicate that a DNS restart is needed. The same command is then run regularly from cron, and if the flag file exists, named is restarted.

Note that it is worthwhile coordinating the cron jobs on secondary servers such that the bouncedns jobs do not all run at the same time.
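A minimal sketch of the flag-file idea, written as two hypothetical functions. The flag location (BOUNCE_FLAG) and the command run when the flag is set (RESTART_CMD, defaulting here to rndc reload) are assumptions; the real bouncedns may do this differently:

```sh
# Hypothetical sketch of the flag-file scheme behind bouncedns.
BOUNCE_FLAG=${BOUNCE_FLAG:-/var/named/.needbounce}   # assumed path
RESTART_CMD=${RESTART_CMD:-"rndc reload"}           # assumed command

# Run by make after updating a .soa: just record that a bounce is needed.
request_bounce() {
    touch "$BOUNCE_FLAG"
}

# Run from cron: act at most once per run, and only if asked to.
cron_bounce() {
    [ -f "$BOUNCE_FLAG" ] || return 0
    # Remove the flag before acting, so an update made meanwhile
    # leaves a fresh flag for the next cron run.
    rm -f "$BOUNCE_FLAG"
    # RESTART_CMD is deliberately unquoted so it splits into words;
    # on failure, put the flag back so the next run retries.
    $RESTART_CMD || { touch "$BOUNCE_FLAG"; return 1; }
}
```

However many .soa files make touches, named is bounced once per cron run.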

upddns

To update the tree that named loads from, we have cron run dnsmagic on a regular basis. This script updates the live tree from the SCM and then runs make there.

The astute reader will note that doing an automated SCM update in the live tree risks updating that tree between two related commits, possibly introducing just the sort of problem we are trying to avoid. For this reason, if a file named .noscm is present, the SCM update step is skipped.

As long as administrators are aware of the issue, the .noscm file can be removed and automated updates allowed. When an extensive set of changes is to be performed, .noscm should be created in the live tree to ensure no automated updates occur until the commits are complete.

Truly rigid sites might only allow updates of the live tree to be done manually and under change management.

For many sites, the cronjob modules included with DNS Magic should prove quite useful.
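A hypothetical sketch of the cron-driven update, showing where the .noscm check fits. LIVE, SCM_UPDATE (defaulting to a CVS update; git pull or hg pull -u would be the equivalents for those SCMs) and MAKE are assumed variables, not DNS Magic settings:

```sh
# Hypothetical sketch of the cron job that updates the live tree.
# LIVE, SCM_UPDATE and MAKE are assumed variables for illustration.
# The body runs in a subshell so the cd does not affect the caller.
upddns_sketch() (
    cd "${LIVE:-/var/named}" || exit 1
    if [ -f .noscm ]; then
        # Someone is part-way through a set of related commits.
        echo "upddns: .noscm present, skipping SCM update" >&2
    else
        # Unquoted on purpose so the command splits into words.
        ${SCM_UPDATE:-cvs -q update -dP} || exit 1
    fi
    # Bring .soa serials etc. up to date; make runs bouncedns as needed.
    ${MAKE:-make}
)
```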

Installation

The installation instructions assume you are using our configs tool. You can download a suitable archive of DNSMagic for unpacking within the /configs tree from http://www.crufty.net/ftp/pub/unix/

Eventually we may provide a self contained DNSMagic archive.

Setup

Setup is quite simple thanks to dns_convert.sh. An example will probably suffice:

$ mkdir /tmp/named
$ cd /tmp/named
$ dns_convert.sh
$ ls
makefile hosts/ mx/ ns.list db.auth named.conf primary.zones
named.ca rev/ secondary/

If using CVS:

$ cvs import -m"original data" named NAMED NAMED_0
$ su
# cd /var
# mv named named.old
# cvs checkout named
...

If using Git:

$ cd named
$ git init
$ git add --all
$ git commit -m"original data"
$ git clone --bare . /share/gits/named.git
$ su
# cd /var
# mv named named.old
# git clone /share/gits/named.git named
...
Then:

# cd named
# make
dnsdeps -N named.conf
updsoa hosts/crufty.net.soa
bouncedns
...
# cd /etc
# mv named.conf named.conf.old
# ln -s /var/named/named.conf .
# /etc/rc_d/bouncedns -f
Stopping named
Restarting named
# exit
$ cd $HOME
$ cvs checkout named

Thereafter, changes you make in ~/named can be committed to the repository, and simply running upddns in /var/named will sort it out.

Regression Testing

A corrupted primary DNS zone can bring a company to its knees. For this reason, regression testing is a must for all but trivial setups.

The basic idea is to run named in test mode, and check that it can load the uncommitted configuration without complaint.

pre-commit checks

As noted earlier, we rely on the SCM's pre-commit hooks to ensure our regression suite is run.

Git/Mercurial

The setup for these is trivial: since everything happens within the context of the repo you are committing to, the pre-commit hook simply runs make regress.

These SCMs also have the advantage that edits can be made on a machine other than the one where the main repository resides.
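For Mercurial, the per-clone hook lives in .hg/hgrc, and a precommit hook that exits non-zero causes the commit to fail. A hypothetical helper (install_hg_hook is our name, not part of DNS Magic) to enable the regression suite in one clone might be:

```sh
# Hypothetical helper: enable the regression suite as a Mercurial
# precommit hook in one clone.
install_hg_hook() {
    rc="${1:-.}/.hg/hgrc"
    # Do nothing if a precommit hook is already configured.
    if grep -q '^precommit' "$rc" 2>/dev/null; then
        return 0
    fi
    printf '\n[hooks]\nprecommit = make regress\n' >> "$rc"
}
```

If the clone's hgrc already has a [hooks] section, it is simpler to add the precommit line there by hand.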

CVS commitinfo

Fortunately, CVS makes it simple to enforce regression testing before changes can be committed to the repository.

Simply add a line like:

^named/         /usr/local/share/dns/regress

to $CVSROOT/CVSROOT/commitinfo, and that command will be run whenever a commit is made to $CVSROOT/named. Most systems support starting named with an alternate port and bootfile. This allows named to be started and given a chance to verify its input without interfering with normal DNS service.

Note that if a large number of files have been updated, CVS may fail to invoke the regression suite due to too many arguments (or rather, too long a command line). This then causes the commit to fail. The only workaround is to commit the files in several batches. The exact number of files that is too many is system dependent.

An alternative is to modify CVS such that the pre-commit filter is fed its args via stdin rather than the command line. We have a patch which does this if the filter command begins with xargs or its full path. For sites with more than 200 in-addr zone files this is a good option - or use Git.

dns/regress

dns/regress is a symlink to rc.sh, so it looks for the directory dns/regress.d and performs all the checks found there (those whose names start with S; see rc.sh(8) for details). If all of the checks pass, the commit proceeds.
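rc.sh is not described further here; the following is only a sketch of the behaviour above, assuming each S* check is run with sh in lexical order and the first failure stops the run. It also honours FORCE_COMMIT as described under "Forcing a commit". The example output further down suggests rc.sh sources modules ending in .sh rather than running them; this sketch ignores that distinction. run_checks is a hypothetical name:

```sh
# Hypothetical sketch of an rc.sh-style runner: run every S* check in
# lexical order and stop at the first failure.
run_checks() {
    dir=$1
    # Forced commits skip all checking.
    [ -n "$FORCE_COMMIT" ] && return 0
    for check in "$dir"/S*; do
        [ -f "$check" ] || continue     # no matches: glob left as-is
        echo "regress: $check start"
        sh "$check" || return 1
    done
    return 0
}
```

The sparse numbering (S10, S20, ...) works because the glob expands in sorted order, so a new check slots in simply by choosing a number.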

The basic modules are (most of these do nothing if NO_NAMED is set in the environment):

dns/regress.d/S10regress.sh

See regress.sh(1) for details. It sets up the environment, and if this is the first call for the current cvs commit, it starts named on a different port, with a trimmed named.conf (produced by dns/Makefile) that does not contain any slave zones. The named process is killed when dns/regress terminates.

For subsequent calls by the same cvs process, we skip the above by setting NO_NAMED (which subsequent tests check), and if the original tests failed we bail out immediately. Since we rely on scanning the syslog output from named, we take great pains to verify that syslog is actually working before starting. Syslog can fail to log due to lack of space or simply due to bugs (at least one major UNIX vendor has a very unreliable syslogd).

dns/regress.d/S20checklog

This module simply checks the syslog output from named for problems. It is deliberately pedantic (ok, fascist), but that's what we want for regression testing. If it sees anything it is looking for, the game is over.

dns/regress.d/S20chkorigin

With the DNS setup we are advocating, there is no need for $ORIGIN records in the zone files. Used incorrectly, they can cause data to disappear mysteriously (mysterious to the victim anyway). This module complains bitterly if it finds any $ORIGIN records and suggests an alternative.
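A sketch of the core of such a check. chk_origin is a hypothetical name, and the real module's message and suggested alternative are not reproduced:

```sh
# Hypothetical sketch of the core of S20chkorigin: print any $ORIGIN
# directives (with line numbers) and fail if there are any.
# Note: an unreadable file makes grep exit 2, which this treats as a pass.
chk_origin() {
    if grep -n '^\$ORIGIN' "$@"; then
        echo 'chkorigin: $ORIGIN found' >&2
        return 1
    fi
    return 0
}
```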

dns/regress.d/S40getdb

This module causes named to dump its cache to named_dump.db and then runs getdata which produces a format which is easily searchable using grep:

SOA crufty.net ns.crufty.net hostmaster@crufty.net
SOA 250.12.203.in-addr.arpa ns.crufty.net
PTR 1.250.12.203.in-addr.arpa gate.crufty.net
PTR 130.250.12.203.in-addr.arpa gate.crufty.net
NS crufty.net ns.crufty.net
MX crufty.net gate.crufty.net 100
A ns.crufty.net 203.12.250.1
A gate.crufty.net 203.12.250.1
A gate.crufty.net 203.12.250.130

This saves us having to support a DNS client which can query named on a non-standard port. It can be omitted if no subsequent tests need to look at the data.
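As an example of how easily this format can be searched, here is a hypothetical extra check (not one of the modules listed here) that reports any address with an A record but no PTR record at all:

```sh
# Hypothetical extra check over getdata output: report A record
# addresses for which no PTR record exists. Reads files or stdin.
missing_ptr() {
    awk '
        $1 == "PTR" { have[$2] = 1 }
        $1 == "A" && split($3, o, ".") == 4 {
            rev = o[4] "." o[3] "." o[2] "." o[1] ".in-addr.arpa"
            want[rev] = $2 " " $3
        }
        # PTRs may come after the A records, so decide at the end.
        END { for (r in want) if (!(r in have)) print want[r] }
    ' "$@"
}
```

Against the sample output above it prints nothing, since both 203.12.250.1 and 203.12.250.130 have PTR records.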

dns/regress.d/local.sh

This module looks for a regress.d directory within the tree being committed and if found runs the tests therein. This is a simple means for providing tests specific to a portion of your DNS data.

dns/regress.d/chkwildmx

Wildcard MX records are evil. The only excuse for using them is in an external DNS which basically only provides some MX records. Note that this module is not run by default. Link it as, say, dns/regress.d/S45chkwildmx (or into named/regress.d); it needs S40getdb to have run first. It simply checks that there is at least one wildcard MX record for each domain in $WILD_MX, and complains if not.
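A sketch of that check over the getdata output. How getdata renders a wildcard owner is not shown on this page, so the sketch assumes lines like "MX *.crufty.net gate.crufty.net 100"; chk_wild_mx is a hypothetical name:

```sh
# Hypothetical sketch of chkwildmx over getdata output ($1).
# Assumes wildcard owners appear as "*.domain" in that output.
chk_wild_mx() {
    data=$1
    rc=0
    for dom in $WILD_MX; do
        # Literal string comparison in awk avoids escaping "*" and ".".
        if ! awk -v want="*.$dom" '$1 == "MX" && $2 == want { found = 1 }
            END { exit !found }' "$data"; then
            echo "chkwildmx: no wildcard MX for $dom" >&2
            rc=1
        fi
    done
    return $rc
}
```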

dns/regress.d/S70chkcvs

This module runs cvs update to see which files have not been added or committed to CVS. It then runs make .depend to get the list of files that named will need when it reloads. If any of the needed files have not been added to CVS, it generates an error. If any needed files have been added but not yet committed, it issues a warning to that effect. The goal is to avoid committing files that rely on others which have not been committed and thus will not be available to the live named.

dns/regress.d/S90cleanup

Just as the name implies.

The simple process of feeding the DNS config into named will pick up the majority of errors. Sites with complex requirements may well find it necessary to add specific tests. Note that the numbering above is quite sparse so it is simple to instantiate new tests.

As mentioned above, if the variable NO_NAMED is set in the environment, the above tests do very little. Presumably other tests will check the validity of the data in this case. Note that if a group of changes is to be committed individually, loading up named each time is overkill. This is the main reason for the variable NO_NAMED: regress.sh sets it if it detects that it is not the first child of a CVS process and that the original did not fail.

Forcing a commit

If the variable FORCE_COMMIT is set in the environment, then dns/regress.d/regress.sh terminates dns/regress immediately and no checking is done. Obviously, this should be used with caution.

Example

This example was run with BIND 9.5, which generally stops after the first error, so quite a few iterations are needed.

named.conf:

include "/etc/rndc.key";

controls {
        inet 127.0.0.1 port 953
        allow { 127.0.0.1; } keys { "rndc-key"; };
};

zone "127.in-addr.arpa" {
        type master;
        file "named.local";
};

include "primary.zones";

primary.zones:

zone "test.it" {
        type master;
        file "hosts/test.soa";
};

hosts/test.soa:

@       IN      SOA     ns.crufty.net. hostmaster.crufty.net.  (
                                1.2     ; Last changed by - sjg
                                7200    ; Refresh 2 hour
                                1800    ; Retry 1/2 hour
                                3600000 ; Expire 1000 hours
                                14400 ) ; Minimum
$INCLUDE n.list
$INCLUDE hosts/test.db

hosts/test.db:

cool    IN      A       192.168.168.42
        IN      MX      100 cool

foo     IN      A       192.168.168.1
        IN      A       192.168.168.2
        IN      A       192.168.168.
        IN      MX      foo

fool    IN      CNAME   foo
        IN      MX      foo

A first run through regress produces:

regress: checking start
regress: making sure dependencies are up to date
dnsdeps -N named.conf
dnsdeps: cannot open: n.list
make: ignoring stale .depend for n.list
updsoa hosts/test.soa
bouncedns reload
regress: /bin/sh /share/dns/regress.d/S20checklog start
dns_master_load: hosts/test.soa:11: n.list: file not found
zone test.it/IN: loading from master file hosts/test.soa failed: file not found

Since BIND-9 does not support dotted serial numbers, updsoa converted it. After fixing the other errors:

;; DO NOT EDIT THIS FILE it is maintained by magic
;; see sjg for questions...
;;
$TTL    14400
@       IN      SOA     ns.crufty.net. hostmaster.crufty.net.  (
                                2010072100 ; Last changed by - sjg
                                7200    ; Refresh 2 hour
                                1800    ; Retry 1/2 hour
                                3600000 ; Expire 1000 hours
                                14400 ) ; Minimum
$INCLUDE ns.list
$INCLUDE hosts/test.db

Now regress says:

regress: /bin/sh /share/dns/regress.d/S20checklog start
zone test.it/IN: loading from master file hosts/test.soa failed: bad dotted quad

Ok, fixed that...

hosts/test.db:

cool    IN      A       192.168.168.42
        IN      MX      100 cool

foo     IN      A       192.168.168.1
        IN      A       192.168.168.2
        IN      A       192.168.168.3
        IN      MX      foo

fool    IN      CNAME   foo
        IN      MX      foo

and we get:

regress: /bin/sh /share/dns/regress.d/S20checklog start
dns_rdata_fromtext: hosts/test.db:11: near 'foo': not a valid number
zone test.it/IN: loading from master file hosts/test.soa failed: not a valid number

Fix the MX records:

cool    IN      A       192.168.168.42
        IN      MX      100 cool

foo     IN      A       192.168.168.1
        IN      A       192.168.168.2
        IN      A       192.168.168.3
        IN      MX      10 foo

fool    IN      CNAME   foo
        IN      MX      10 foo

and, one last item:

regress: /bin/sh /share/dns/regress.d/S20checklog start
zone test.it/IN: loading from master file hosts/test.soa failed: CNAME and other data

Remove the offending line and BIND (and hence regress) is happy:

regress: checking start
regress: making sure dependencies are up to date
dnsdeps -N named.conf
regress: /bin/sh /share/dns/regress.d/S20checklog start
regress: /bin/sh /share/dns/regress.d/S20chkorigin start
regress: /bin/sh /share/dns/regress.d/S40getdb start
regress: . /share/dns/regress.d/S60local.sh
regress: /bin/sh /share/dns/regress.d/S70chkcvs start
regress: /bin/sh /share/dns/regress.d/S90cleanup start

BIND-9

As of late 2009, pretty well all Internet sites running BIND should be using 9.5 or later. We are thus removing support for earlier versions.

logging

We put the following into named.conf:

logging {
        // we want to know about all problems
        // so that the regression suite will pick them up
        // we only need this on the master.
        category cname { default_syslog; };
        category lame-servers { default_syslog; };
        category insist { default_syslog; };
        // we may also want some of these
        category xfer-out { default_syslog; };
        category statistics { default_syslog; };
        category update { default_syslog; };
        category security { default_syslog; };
        category os { default_syslog; };
        category notify { default_syslog; };
        category response-checks { default_syslog; };
        category maintenance { default_syslog; };
};

BIND-9 issues

BIND-9 is a complete re-write of BIND and is incompatible with earlier versions in several ways.

Cannot listen on port 0

BIND-8 allowed us to set the listen port to 0 (which gave us a random high-numbered port) when running the regression suite. This is not allowed with BIND-9, so we have to revert to picking a port and hoping it is unused. This is far from ideal.

Must use rndc for dumping

To a large extent, BIND-9 abandons the use of signals for controlling named, so we have to detect BIND-9 and use rndc instead for many operations. We use rndc dumpdb -all, and rndc blocks until the dump is complete, so this is actually a big improvement.

9.10

BIND 9.10 defaults to creating a session key for dynamic DNS in /var/run, which causes problems for regress. So we add the following to Makefile.inc:

CONF_TEST_SED+= -e 's,pid-file.*,session-keyfile "./s.key";,'
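To see what that expression does, here it is applied directly with sed in a hypothetical make_test_conf function. It performs only this one substitution; the other trimming the real rule does when building the test config (such as dropping slave zones) is not shown:

```sh
# Hypothetical sketch: apply the CONF_TEST_SED substitution above to a
# config file, writing the result to stdout. Any pid-file option line
# is replaced by a session-keyfile in the current directory.
make_test_conf() {
    sed -e 's,pid-file.*,session-keyfile "./s.key";,' "$1"
}
```

A line such as pid-file "/var/run/named/named.pid"; thus becomes session-keyfile "./s.key"; with its indentation preserved, so the test named never touches /var/run.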

The latest version of this page can be found at:
http://www.crufty.net/help/dns/DNSMagic.htm
Author:sjg@crufty.net /* imagine something very witty here */
Revision:$Id: DNSMagic.txt,v 1.4 2016/04/10 19:11:32 sjg Exp $
Copyright:1997-2016 Simon J. Gerraty