Collating Kevin

Learn from my mistakes. Build it better.

Building a High Availability DNS Recursive Resolver Utilizing DoT

Why Do You Need a Highly Available Forward Resolver?

The domain name system (DNS) is the backbone of the internet. Acting like a directory, DNS is a critical service that translates human-readable domain names such as example.com into IP addresses such as 93.184.215.14.

At the heart of this system are 13 root servers which, rather than holding every domain on the internet, know where to find the nameservers for each top-level domain (TLD) [1].

Below the root sit the TLD nameservers and, beneath them, the authoritative nameservers for individual domains. Authoritative records are managed through domain registrars, so when you purchase a domain from a reseller you’re simply paying them to update the information stored in those servers; when a resolver walks down from the root to find your domain, it gets the information that you provided.

A flowchart describing the flow of DNS requests from user to root nameserver to TLD nameserver to authoritative nameserver

We’re concerned with the recursive resolver, or forwarder. That’s the part of our infrastructure that walks this chain of nameservers (or forwards the query to a server that does) and returns the answer to your users. This is an extremely important part of the DNS puzzle, and so it is the part that we want to have zero downtime.

Additionally, if you’re on an Active Directory domain, you will already have domain controllers running DNS. Your computers will be configured to use those name servers to resolve addresses in your primary zone (your.domain.tld), as well as domains on the internet at large. This setup normally works well, but there are a number of caveats to keep in mind. I like to run separate resolvers for three important reasons:

  • More control over load balancing and performance.
  • If something goes wrong with the zone servers, you will still have internet access.
  • As of this writing, the Windows DNS server does not support upstream DNS over HTTPS or DNS over TLS, which can present some security concerns.

A flowchart describing the flow of DNS requests from user to root nameserver to TLD nameserver to authoritative nameserver, an additional domain server is present.

But Why?

DNS is required for almost all web functionality. Even several minutes of DNS downtime can be catastrophic. Additionally, slow DNS can cause many problems, from long page loads to login failures if you’re on a domain.

We’re setting out today to set up a system which can resolve names quickly, reliably, and can be updated in place without downtime.

What is DNS over HTTPS (DoH)/DNS over TLS (DoT)?

Throughout the first part of this article we mention DoH and DoT. These are extensions to DNS that allow requests to travel over encrypted channels. Normal DNS is sent entirely in the clear; the sites that you visit are visible to anyone with the ability to capture packets on the wire (your ISP, for example).

Securing DNS is important because it keeps this information from leaking out. If someone can see which websites you are visiting, there is a greater potential for targeted threats like spearphishing or spoofing, so it is extremely important to ‘hide’ this information and make it more difficult for prying eyes to collect.

DNS over HTTPS returns DNS responses over HTTPS, the same protocol that secures web traffic, while DNS over TLS is normal DNS with encryption built into the transport. DoH typically uses TCP port 443, and DoT uses TCP port 853 for traffic. For more information, see this article from Cloudflare.
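If you want to see DoT in action before building anything, kdig from the knot-dnsutils package (assuming it’s installed; it isn’t required for the rest of this tutorial) can send a query over TLS on port 853 and verify the server’s certificate name:

kdig @9.9.9.9 +tls-ca +tls-hostname=dns.quad9.net example.com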

Prerequisites

Today’s tutorial will be done on a Debian based host (Debian/Ubuntu). You will need:

  • A dedicated host; one separate from the rest of the network is best, and a VM will work too.
  • Shell access to the host.
  • Docker, which can be installed quickly using the quickstart script here.
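If Docker isn’t installed yet, the commonly used convenience script (this may or may not be the exact quickstart script linked above) boils down to:

curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh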

Creating Directories

First, we need to create the directories that we will store configurations in. For the purpose of this tutorial we will create the two following directories:

mkdir -p /srv/docker/dnslb-{traefik,unbound}

This will leave us with two directories under /srv/docker: dnslb-traefik and dnslb-unbound.

Enabling Swarm Mode

We will need to enable Docker’s swarm mode in order to perform load balancing and get high availability features. We can do this with the following command:

docker swarm init

This will change Docker into swarm mode and allow us to perform the next step.
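If you want to confirm the swarm is active before moving on (an optional check), both of these should succeed on the manager node:

docker info --format '{{.Swarm.LocalNodeState}}'
docker node ls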

Creating the Overlay Network

Next, we need to create the network that the DNS services will live on; this is required specifically in swarm mode. This can be accomplished with the following:

docker network create --driver overlay proxy_bridge

Overlay networks are shared amongst hosts running in swarm mode. We only have a single host for now, but we still need to use some of the other features of overlay networks in order to get this working.
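To verify that the network was created with the overlay driver (optional), list the overlay networks and look for proxy_bridge:

docker network ls --filter driver=overlay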

Create the Traefik Configuration

Create the following files.

.
└── srv/
    └── docker/
        └── dnslb-traefik/
            ├── docker-compose.yml
            └── traefik.yml

docker-compose.yml needs the following in it.

services:
  proxy:
    image: traefik:latest
    networks:
      - proxy_bridge
    ports:
      - 53:53
      - 53:53/udp
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /srv/docker/dnslb-traefik/traefik.yml:/traefik.yml:ro
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          # Traefik NEEDS to run on manager nodes only
          # it requires special docker api access.
          - node.role == manager 
      restart_policy:
        condition: any
        delay: 2s
      update_config:
        order: start-first
        parallelism: 1
        delay: 10s
        failure_action: continue
        monitor: 60s
        max_failure_ratio: 0.3

networks:
  proxy_bridge:
    external: true

While traefik.yml needs the following content:

api:
  dashboard: true
  insecure: true

entryPoints:
  dns:
    address: ":53/udp"
  dns-tcp:
    address: ":53"

log:
  level: DEBUG
  format: json

providers:
  swarm:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false

Now we should be able to pull the Traefik image. While in the directory with our docker-compose.yml file, run docker compose pull; this should pull the latest version of Traefik. Once we’re sure that everything is working, we can start the service with docker stack deploy -c docker-compose.yml dnslb-proxy, which deploys Traefik in a detached state.
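Putting those steps together (assuming the /srv/docker/dnslb-traefik path from earlier; the docker stack services line is just an optional check that both replicas come up):

cd /srv/docker/dnslb-traefik
docker compose pull
docker stack deploy -c docker-compose.yml dnslb-proxy
docker stack services dnslb-proxy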

Creating the Unbound Container

We’re going to use the image from Matthew Vance. Create the following directory layout.

.
└── srv/
    └── docker/
        └── dnslb-unbound/
            ├── docker-compose.yml
            └── config/
                └── forward-records.conf

In our docker-compose.yml file, paste the following:

services:
  unbound:
    image: mvance/unbound:latest
    networks:
      - proxy_bridge
    volumes:
      - /srv/docker/dnslb-unbound/config/forward-records.conf:/opt/unbound/etc/unbound/forward-records.conf:ro
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: any
        delay: 2s
      update_config:
        order: start-first
        parallelism: 2
        delay: 5s
        failure_action: continue
        monitor: 60s
        max_failure_ratio: 0.3
      labels:
        - "name=unbound"
        - "traefik.enable=true"
        - "traefik.docker.network=proxy_bridge"
        - "traefik.udp.routers.dnslb.entrypoints=dns"
        - "traefik.udp.routers.dnslb.service=dnslb"
        - "traefik.udp.services.dnslb.loadbalancer.server.port=53"
        - "traefik.tcp.routers.dnslb.entrypoints=dns-tcp"
        - "traefik.tcp.routers.dnslb.service=dnslb"
        - "traefik.tcp.services.dnslb.loadbalancer.server.port=53"

networks:
  proxy_bridge:
    external: true

NOTE: Take special note of the labels: element. Notice that it sits underneath the deploy: element; this is required for swarm mode services, and the labels will not be read by Traefik otherwise.

We also need to configure Unbound. This image automatically reads specific mounted files, so create the file config/forward-records.conf with the configuration below:

forward-zone:
    # Forward all queries (except those in cache and local zone) to
    # upstream recursive servers
    name: "."
    # Queries to this forward zone use TLS
    forward-tls-upstream: yes
    forward-addr: 9.9.9.9@853#dns.quad9.net

    # Enable query caching
    forward-no-cache: no

NOTE: You can find a list of public DoT resolvers at dnsprivacy.org

Configure a resolver on the forward-addr: line in the format ${ip_address}@${port}#${hostname}. This specific format is required for DoT servers; the hostname after the # is used to verify the upstream server’s TLS certificate. We used Quad9 here in our example.
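Additional upstreams are just extra forward-addr: lines within the same forward-zone. As a sketch, the zone with Quad9 plus a second illustrative upstream (Cloudflare here; not part of the original configuration) would look like:

forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 9.9.9.9@853#dns.quad9.net
    # Illustrative second upstream; any DoT resolver from dnsprivacy.org works here
    forward-addr: 1.1.1.1@853#cloudflare-dns.com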

You can now start the container with docker stack deploy -c docker-compose.yml dnslb-unbound while in the directory with the docker-compose.yml file.
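Before testing, it’s worth confirming the stack converged (optional; the stack and service names assume the commands above):

docker stack services dnslb-unbound
docker service ps dnslb-unbound_unbound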

Testing it Out

Issue the command dig google.com @127.0.0.1 +short and you should get a list of IP addresses similar to:

142.251.163.113
142.251.163.102
142.251.163.100
142.251.163.139
142.251.163.138
142.251.163.101
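Since Traefik is also routing TCP on port 53, you can force a TCP query as well (an optional extra check, not in the original walkthrough):

dig +tcp google.com @127.0.0.1 +short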

Congratulations! You’ve got your resolver up and running, and now for the fun part. Run the command docker service scale dnslb-unbound_unbound=${x} where ${x} is the number of containers you want to spin up. You’ll see docker begin to start containers with output like this:

A screenshot of docker swarm spinning up 15 unbound containers

Play around with scaling and see just how many instances of unbound you can create! You might be surprised.

Next, set the DNS server on your computer to the IP of the machine we just set all of this up on. You should be able to access the internet as usual. Congratulations, you are done!

Updating the Services

Since we’re using Docker, updates are a snap. To update a service, run the command docker service update --image mvance/unbound:latest dnslb-unbound_unbound; this will pull the latest image and begin a rolling update of the unbound service. While this is running, you should notice no difference in your internet browsing experience.
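For reference, here is the update command as a block, plus Docker’s built-in rollback in case an update misbehaves (the rollback step is an extra suggestion, not part of the original instructions):

docker service update --image mvance/unbound:latest dnslb-unbound_unbound
# Roll back to the previously deployed service spec if needed
docker service rollback dnslb-unbound_unbound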

We’re Finished

Keep an eye out for the next article in this series, as we’ll be expanding upon some of the concepts. As always, happy hacking.