DNS

For more than 40 years, the Domain Name System (DNS) has been a critical piece of Internet infrastructure. Even though it shows its age and additional layers are required to provide privacy and security on top of the original protocol, it is a testament to good engineering.

History

DNS was created around 1983 to replace HOSTS.TXT, a static file which mapped host (machine) names to addresses. Having a single, centrally managed and manually maintained database was quickly becoming unsustainable, and Paul Mockapetris was tasked to choose among five competing proposals to fix the problem.

He chose to invent DNS instead.

Today, DNS is used for much more than just name to address translation, although that remains the major use case. It is arguably the most successful distributed database in widespread use today.

Use cases

Before we describe DNS in depth, we’d like to illustrate the various use cases it has accumulated over the years. To query DNS, we’ll be using drill(1), a command-line utility for DNS debugging.

Translation of names to IP addresses (forward DNS lookup)

DNS is used chiefly to translate names to IP addresses; without arguments, drill does just that:

~% drill quad9.net
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 248
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; quad9.net.   IN      A

;; ANSWER SECTION:
quad9.net.      543     IN      A       216.21.3.77

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 533 msec
;; SERVER: 127.0.0.1
;; WHEN: Thu Nov 21 09:15:02 2024
;; MSG SIZE  rcvd: 43

The address corresponding to a name is held in so-called A records (address) and that’s what drill retrieved. We could query the A records directly (drill A quad9.net.) and that would give the same result [1].

Let’s break the output down. The lines starting with ;; are comments and they are produced by drill; they didn’t come from the name server.

;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 248
;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

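The header contains several interesting bits of information: the opcode (QUERY, a standard query), the response code (NOERROR, meaning the server processed the query without error), the message id (which matches a response to its query), the flags (qr: this message is a response; rd: recursion desired; ra: recursion available), and the number of records in each of the four sections that follow.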

;; QUESTION SECTION:
;; quad9.net.   IN      A

The question section contains the query (request) that drill sent to the target DNS server. This listing tells us that we asked for a record called quad9.net. (with a trailing dot) in the IN (internet) class of type A (address). In other words, we want to retrieve an IPv4 address for the name.

;; ANSWER SECTION:
quad9.net.      543     IN      A       216.21.3.77

The answer section contains the server’s best response to our query. The section may be empty if no match was found or the query failed, or may contain more than you asked for, or something else altogether. In the simple case above, we got the exact answer we hoped for.

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

The authority and additional sections will be important later; in the listing above, they are empty.

;; Query time: 533 msec
;; SERVER: 127.0.0.1
;; WHEN: Thu Nov 21 09:15:02 2024
;; MSG SIZE  rcvd: 43

The trailing comments are useful for debugging. For example, you can see that the query was served by a locally running resolver and took 533 ms. This is unusually long, but in this case there’s a good reason for it [3].
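To make the header fields less abstract, here is a minimal sketch of how the fixed 12-byte DNS header could be decoded. The function name parse_header and the dictionary layout are our own choices for illustration; the bit positions follow RFC 1035 (with the ad and cd bits added by DNSSEC). The example header is constructed by hand to mirror the listing above, not captured from the wire.

```python
import struct

# Bit masks of the individual flags within the 16-bit flags field.
FLAG_BITS = {
    "qr": 0x8000, "aa": 0x0400, "tc": 0x0200, "rd": 0x0100,
    "ra": 0x0080, "ad": 0x0020, "cd": 0x0010,
}
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    msg_id, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": msg_id,
        "opcode": (flags >> 11) & 0xF,  # 0 means a standard QUERY
        "rcode": RCODES.get(flags & 0xF, str(flags & 0xF)),
        "flags": [name for name, bit in FLAG_BITS.items() if flags & bit],
        "counts": {"QUERY": qd, "ANSWER": an, "AUTHORITY": ns, "ADDITIONAL": ar},
    }


# A hand-built header equivalent to the listing: id 248, flags qr rd ra,
# one question and one answer.
example = struct.pack("!6H", 248, 0x8000 | 0x0100 | 0x0080, 1, 1, 0, 0)
print(parse_header(example))
```

The question, answer, authority and additional sections follow the header in that order; the four counters tell the parser how many records to expect in each.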

Translation of IP addresses to names (reverse DNS lookup or rDNS)

It is also possible to perform a reverse lookup with -x. Let’s try to resolve 216.21.3.77 (the IPv4 address we got as a response to A quad9.net.) back to the name [4]:

~% drill -x 216.21.3.77
;; [...]
;; QUESTION SECTION:
;; 77.3.21.216.in-addr.arpa.    IN      PTR

;; ANSWER SECTION:
77.3.21.216.in-addr.arpa.       559     IN      PTR     web1.sjc.rrdns.pch.net.
;; [...]

As you can see, the query was for a PTR record (pointer) with a special name (the IPv4 address from the query with its bytes in reverse order) in the in-addr.arpa. domain. The reverse mappings (from IP addresses back to names) are represented in a name tree rooted at the arpa. domain.

Note: drill -x 216.21.3.77 is the same as drill PTR 77.3.21.216.in-addr.arpa.; the -x is just a convenience so that you don’t have to construct the unwieldy name yourself [5].
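Constructing the reverse name by hand is simple enough to sketch in a few lines. The helper name reverse_name_v4 is ours; the code only reverses the four octets and appends the suffix, which is what -x does for an IPv4 address.

```python
import ipaddress


def reverse_name_v4(address: str) -> str:
    """Build the in-addr.arpa. name used for a reverse lookup of an IPv4 address."""
    # Parsing first rejects malformed input such as "216.21.3".
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa."


print(reverse_name_v4("216.21.3.77"))  # 77.3.21.216.in-addr.arpa.
```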

Forward and reverse DNS with IPv6 addresses

Note: IPv6 will be discussed in Networking III.

With IPv6, forward and reverse DNS work similarly with some minor modifications. To translate a name to an IPv6 address, one would look up AAAA records [6] rather than A records:

~% drill AAAA zajic.v.pytli.cz
;; [...]
;; QUESTION SECTION:
;; zajic.v.pytli.cz.    IN      AAAA

;; ANSWER SECTION:
zajic.v.pytli.cz.       1800    IN      AAAA    2a0c:7040::b:1
;; [...]

A reverse query uses PTR records as before, but the suffix for reverse lookup is ip6.arpa. and nibbles (half-bytes) rather than bytes of the address are reversed:

~% drill -x 2a0c:7040::b:1
;; [...]
;; QUESTION SECTION:
;; 1.0.0.0.b.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.0.7.c.0.a.2.ip6.arpa.    IN      PTR

;; ANSWER SECTION:
1.0.0.0.b.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.0.7.c.0.a.2.ip6.arpa.       340     IN      PTR     ipv6-2a0c-7040--b-1.dalunet.cz.
;; [...]
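The nibble reversal is easier to see in code. The following sketch (the helper name reverse_name_v6 is our own) expands the address to its full 32 hexadecimal digits, reverses them and appends the ip6.arpa. suffix.

```python
import ipaddress


def reverse_name_v6(address: str) -> str:
    """Build the ip6.arpa. name used for a reverse lookup of an IPv6 address."""
    # .exploded undoes the :: shorthand, e.g. 2a0c:7040:0000:...:000b:0001
    nibbles = ipaddress.IPv6Address(address).exploded.replace(":", "")
    return ".".join(reversed(nibbles)) + ".ip6.arpa."


print(reverse_name_v6("2a0c:7040::b:1"))
```

Every nibble becomes its own label, which is why the name always has 32 labels in front of ip6.arpa. regardless of how short the address looks when written with the :: shorthand.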

Note: The type of address being resolved (IPv4 or IPv6) has nothing to do with whether the request is made over IPv4 or IPv6. It’s possible to query an IPv6 address using IPv4 and vice-versa.

Locating mail servers and other services

But DNS can be used for more than just translating between names and addresses. Let’s ask for MX records (mail exchanger) of proton.me, a privacy-friendly e-mail provider:

~% drill MX proton.me
;; [...]
;; QUESTION SECTION:
;; proton.me.   IN      MX

;; ANSWER SECTION:
proton.me.      1200    IN      MX      10 mail.protonmail.ch.
proton.me.      1200    IN      MX      20 mailsec.protonmail.ch.
;; [...]

The MX records list the host names of machines which accept the mail on behalf of a domain, in this case proton.me. Two mail servers are listed for the domain: the primary server mail.protonmail.ch with priority 10 (lower → more preferred) and a secondary (backup) server mailsec.protonmail.ch with priority 20.
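As an illustration of how a sending mail server might use these records, here is a small sketch that orders the exchangers by preference. The parse_mx helper is hypothetical and simply splits the presentation format printed by drill; a real mail server would also randomize among exchangers with equal preference and fall back to the next one on failure.

```python
def parse_mx(line: str) -> tuple[int, str]:
    """Extract (preference, exchange) from one MX line of drill output."""
    # Fields: owner, TTL, class, type, preference, exchange
    fields = line.split()
    return int(fields[4]), fields[5]


# The answer section from the listing, deliberately out of order.
answer = [
    "proton.me.      1200    IN      MX      20 mailsec.protonmail.ch.",
    "proton.me.      1200    IN      MX      10 mail.protonmail.ch.",
]

# Lower preference values are tried first.
delivery_order = [host for _, host in sorted(parse_mx(line) for line in answer)]
print(delivery_order)  # ['mail.protonmail.ch.', 'mailsec.protonmail.ch.']
```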

SRV records (service) are similar to MX records but are protocol agnostic.

Arbitrary data: text records

TXT records (text) allow you to capture arbitrary information within DNS:

~% drill TXT dcepelik.cz
;; [...]
;; QUESTION SECTION:
;; dcepelik.cz. IN      TXT

;; ANSWER SECTION:
dcepelik.cz.    1800    IN      TXT     "google-site-verification=ty11Gju7xnbZmQcHmHCCHMPiFX4Zk4pE5MDGDatbAhw"
dcepelik.cz.    1800    IN      TXT     "v=spf1 mx ~all"
;; [...]

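As you can see, there are two TXT records served for my domain name: a Google site-verification token, which proves to Google that I control the domain, and an SPF policy (v=spf1 mx ~all), which says that the hosts named in the domain’s MX records may send mail on its behalf and that mail from any other host should be treated as suspicious (a soft fail).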

DNS-based Authentication of Named Entities (DANE)

A quick recap: when you establish a connection to a remote host on the network, you have no guarantee you are talking to the intended recipient. Encryption doesn’t solve this in and of itself, which is why TLS relies on certificates endorsed by certificate authorities (CAs).

The original idea behind DANE (RFC 6698) is to get rid of CAs for certificate issuance. We can just publish a (self-signed) certificate as a special type of record in DNS. By using an additional security layer called DNSSEC, the client can verify that this DNS record was indeed provisioned by the administrator of the zone, the same level of attestation that an endorsement from a CA normally provides.

From a security standpoint, this approach is justified:

At the moment, DANE for TLS isn’t supported by any major browser, and we suspect this has something to do with the fact that it’s not at all in the best interest of the various three-letter agencies.

DANE can be used for more than just TLS:

[Figure: NSA doesn’t like the taste of SSL (2014).]

Dynamic DNS

One thing all of the above use cases have in common is that they concern mostly static data (that which does not change, or only changes infrequently).

Dynamic DNS refers to using DNS with short-lived, automatically provisioned DNS records:

~% drill netflix.com
;; [...]
;; QUESTION SECTION:
;; netflix.com. IN      A

;; ANSWER SECTION:
netflix.com.    30      IN      A       18.200.8.190
netflix.com.    30      IN      A       54.73.148.110
netflix.com.    30      IN      A       54.155.246.232
;; [...]

If you run this command repeatedly (watch drill netflix.com), you’ll notice several things:

DNS64

As a final example, let’s consider what happens when a client in an IPv6-only network (with no IPv4 address) wishes to connect to an IPv4-only service (with no IPv6 address).

Spoiler: it doesn’t work. Since IPv4 and IPv6 are two distinct protocols and the target does not exist in the IPv6 Internet, a special kind of network address translation (called NAT64) is necessary.

In NAT64, the local IPv6-only network dedicates an IPv6 prefix (usually 64:ff9b::/96) to represent IPv4 addresses. A client in the IPv6-only network combines the target IPv4 address (e.g. 37.205.14.117 [8], which is 25cd:e75 in hex) with the prefix to obtain 64:ff9b::25cd:e75. The client will connect to this synthetic IPv6 address instead of the intended IPv4 target. These packets will reach the NAT64 device (which has both IPv4 and IPv6 connectivity) and it will translate between IPv4 (the target) and IPv6 (the client) transparently.

The client can then reach both the IPv6 Internet (ideally without any NAT) and also the IPv4 Internet (with NAT64 support) and both the network and the client can be configured for IPv6 only. The only thing the client has to do is to discover the IPv6 prefix used for NAT64 and synthesize the IPv6 address.

This finally brings us back to DNS. The prefix can either be advertised by the DHCP server, or it can be discovered by resolving a special DNS name. Better still, DNS resolvers (such as Unbound) are able to synthesize the IPv6 addresses on the fly, so the client needn’t worry about NAT64 at all: if an AAAA query results in no IPv6 address but there is an IPv4 address, the resolver will return the synthetic IPv6 address.
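The synthesis itself is just bit arithmetic, and the resolver-side decision is a simple fallback. The sketch below illustrates both under stated assumptions: the function names, the records dictionary (an in-memory stand-in for real lookups) and the host name v4only.example. are all hypothetical, and only /96 prefixes are handled.

```python
import ipaddress

WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(ipv4: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> str:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    return str(ipaddress.IPv6Address(int(prefix.network_address) | int(v4)))


def dns64_aaaa(name: str, records: dict) -> list[str]:
    """Answer an AAAA query the way a DNS64 resolver would.

    `records` maps (name, type) to a list of addresses and stands in
    for the real lookups a resolver would perform.
    """
    native = records.get((name, "AAAA"), [])
    if native:
        return native  # genuine IPv6 addresses are returned untouched
    return [synthesize(a) for a in records.get((name, "A"), [])]


records = {("v4only.example.", "A"): ["37.205.14.117"]}
print(dns64_aaaa("v4only.example.", records))  # ['64:ff9b::25cd:e75']
```

Note that the client never sees the IPv4 address in this scheme; it simply connects to whatever AAAA answer it receives.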

In order for this to work, the client needs to use the local DNS resolver obtained from DHCP. Since I’m always using my locally running Unbound, DNS64 doesn’t work for me. With Ondra Caletka, we wrote godns64 to fix that. The README for the project contains additional details in case you’re interested.

Principle of operation

We’ve seen many interesting use cases for DNS. Let’s take a look now at how it works.

DNS zones and delegation

We’ve mentioned before that DNS is a distributed database. To understand what that really means, we first must grasp the concept of DNS zones and delegation:

[Figure: DNS zones and delegation.]

This figure shows four DNS servers (lighter gray) and four DNS zones (darker gray). Each zone contains some delegation records (blue) and some non-delegation records (red). The delegation records pass authority (responsibility) for a part of the name tree to one or more other name servers. This is depicted by the blue arrows.

As you can see from the figure above:

Zones versus subdomains

You would be forgiven for thinking that a DNS zone always corresponds to a particular (sub)domain and at first glance, that’s what the figure seems to show.

Although it’s true that delegation often happens for each component of a domain name (for $login.nswi106.cz, delegation happens for cz., nswi106.cz. and $login.nswi106.cz.), this is not the rule:

Formally, a zone is a subtree of the global namespace, minus subtrees of that tree that are delegated elsewhere.
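The formal definition translates almost directly into code. In this sketch, a zone is described by its apex and a list of delegation points (zone cuts); the function names and the delegated name alice.nswi106.cz. are made up for illustration, and all names are assumed to be lower-case and fully qualified (with a trailing dot).

```python
def is_subdomain(name: str, ancestor: str) -> bool:
    """True if `name` equals `ancestor` or lies below it in the name tree."""
    if ancestor == ".":
        return True
    return name == ancestor or name.endswith("." + ancestor)


def in_zone(name: str, apex: str, delegations: list[str]) -> bool:
    """A name is in the zone if it is under the apex but not under any zone cut.

    The NS records at a cut also appear in the parent zone as
    non-authoritative data; this sketch ignores that nuance.
    """
    if not is_subdomain(name, apex):
        return False
    return not any(is_subdomain(name, cut) for cut in delegations)


apex = "nswi106.cz."
cuts = ["alice.nswi106.cz."]  # hypothetical delegation to a student's server

print(in_zone("eval.nswi106.cz.", apex, cuts))       # True
print(in_zone("www.alice.nswi106.cz.", apex, cuts))  # False: delegated away
print(in_zone("nic.cz.", apex, cuts))                # False: outside the apex
```

A name several labels below the apex, such as a.b.nswi106.cz., is still part of the same zone as long as no delegation sits above it, which is why zones and subdomains do not coincide in general.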

A DNS zone always contains a SOA record (start of authority) at the root of the zone:

~% drill -Q SOA nswi106.cz.
praha.dcepelik.cz. d.dcepelik.cz. 413588 1800 900 1209600 900

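The SOA record contains: the primary name server of the zone (praha.dcepelik.cz.), the mailbox of the person responsible for it with the @ written as a dot (d.dcepelik.cz., i.e. d@dcepelik.cz), the zone’s serial number (413588), and four timers in seconds: refresh (1800) and retry (900), which tell secondary servers how often to check for changes, expire (1209600, or 14 days), after which a secondary that cannot reach the primary stops serving the zone, and the minimum (900), which is used as the TTL for negative answers.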

The SOA record affects the entire zone and all its records.
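For reference, the seven fields of the SOA record can be picked apart as follows. The SOA class and parse_soa helper are our own naming; the field order (MNAME, RNAME, SERIAL, REFRESH, RETRY, EXPIRE, MINIMUM) is the one defined in RFC 1035.

```python
from typing import NamedTuple


class SOA(NamedTuple):
    mname: str    # primary name server for the zone
    rname: str    # mailbox of the responsible person (first dot stands for @)
    serial: int   # version number of the zone data
    refresh: int  # seconds between a secondary's checks for changes
    retry: int    # seconds to wait before retrying a failed refresh
    expire: int   # seconds after which a secondary stops serving stale data
    minimum: int  # TTL for negative answers (RFC 2308)


def parse_soa(rdata: str) -> SOA:
    """Parse the presentation format printed by drill -Q SOA."""
    mname, rname, *numbers = rdata.split()
    return SOA(mname, rname, *(int(n) for n in numbers))


soa = parse_soa("praha.dcepelik.cz. d.dcepelik.cz. 413588 1800 900 1209600 900")
print(soa)
print(soa.expire // 86400, "days until secondaries give up")  # 14 days
```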

Authoritative name servers

A server is authoritative for a zone if the zone has been delegated to it (with NS records) from a parent zone. For example, praha.dcepelik.cz. is authoritative for nswi106.cz. because corresponding delegation records exist in the cz. zone.

That’s how things work from the perspective of the global DNS name space. From a single server’s point of view, the situation is a bit more nuanced:

You can easily set up an authoritative name server on your laptop for the entire cz. zone and point nic.cz. to whatever IP address you like. If you query your local name server for the address of nic.cz., it will return the record you configured and will indicate that it does consider itself authoritative for the zone (and the record). But from the perspective of the global namespace, the server is not authoritative for the zone because there are no corresponding delegation records in the root zone.

NSD is our authoritative-only name server of choice.

Recursive-only DNS

Recursive-only DNS servers aren’t authoritative for any DNS zones. It makes no sense to delegate a zone to a recursive-only server, as it has no DNS data of its own to serve.

Even though they don’t distribute any original data, recursors are an incredibly important part of the global DNS deployment. Their primary responsibility lies in lowering the load on the authoritative servers and improving DNS lookup times for the clients. A typical recursor:

Unbound is our recursive-only caching name server of choice.

DNS recursion algorithm

Because the information needed to resolve a single name is typically spread across several name servers, in most cases, it’s not enough to query a single authoritative name server to obtain an answer to a query. Instead, we must query the name servers successively, starting from the root.

Let’s see how we may query A eval.nswi106.cz. by hand, step by step. With drill, you can specify the name server to query with the @server syntax (either a host name or an IP address). Starting at the root servers:

~% drill @a.root-servers.net eval.nswi106.cz
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 59430
;; flags: qr rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 8
;; QUESTION SECTION:
;; eval.nswi106.cz.     IN      A

;; ANSWER SECTION:

;; AUTHORITY SECTION:
cz.     172800  IN      NS      a.ns.nic.cz.
cz.     172800  IN      NS      c.ns.nic.cz.
cz.     172800  IN      NS      b.ns.nic.cz.
cz.     172800  IN      NS      d.ns.nic.cz.

;; ADDITIONAL SECTION:
a.ns.nic.cz.    172800  IN      A       194.0.12.1
a.ns.nic.cz.    172800  IN      AAAA    2001:678:f::1
c.ns.nic.cz.    172800  IN      A       194.0.14.1
c.ns.nic.cz.    172800  IN      AAAA    2001:678:11::1
b.ns.nic.cz.    172800  IN      A       194.0.13.1
b.ns.nic.cz.    172800  IN      AAAA    2001:678:10::1
d.ns.nic.cz.    172800  IN      A       193.29.206.1
d.ns.nic.cz.    172800  IN      AAAA    2001:678:1::1
;; [...]

All previous drill examples were using my local recursive resolver. But now we are talking to an authoritative server; you can see the ra (recursion available) flag missing.

The server couldn’t answer our query (hence the empty answer section), but provided us with clues: it responded with a list of name servers for cz.. It also provided A and AAAA records for the name servers in the additional section.

These additional addresses are called glue records. They solve a problem which is exemplified here: the name servers of cz. have host names ({a,b,c,d}.ns.nic.cz.) that themselves lie within the cz. zone. So to get the IP addresses of the name servers, we would first have to know the IP addresses of the name servers, so that we could query them for their own IP addresses. Glue records break the loop. They are normally provisioned in the parent zone and they are not authoritative information, just a hint.

Let’s pick a name server at random and continue with our query:

~% drill @a.ns.nic.cz eval.nswi106.cz
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 5904
;; flags: qr rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 4
;; QUESTION SECTION:
;; eval.nswi106.cz.     IN      A

;; ANSWER SECTION:

;; AUTHORITY SECTION:
nswi106.cz.     3600    IN      NS      brno.dcepelik.cz.
nswi106.cz.     3600    IN      NS      praha.dcepelik.cz.

;; ADDITIONAL SECTION:
brno.dcepelik.cz.       3600    IN      A       37.205.14.117
brno.dcepelik.cz.       3600    IN      AAAA    2a03:3b40:fe:3c::1
praha.dcepelik.cz.      3600    IN      A       37.205.14.117
praha.dcepelik.cz.      3600    IN      AAAA    2a03:3b40:fe:3c::1
;; [...]

And finally:

~% drill @brno.dcepelik.cz eval.nswi106.cz
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 57673
;; flags: qr aa rd ; QUERY: 1, ANSWER: 2, AUTHORITY: 2, ADDITIONAL: 0
;; QUESTION SECTION:
;; eval.nswi106.cz.     IN      A

;; ANSWER SECTION:
eval.nswi106.cz.        300     IN      CNAME   z.nswi106.cz.
z.nswi106.cz.   300     IN      A       135.181.230.236

;; AUTHORITY SECTION:
nswi106.cz.     300     IN      NS      brno.dcepelik.cz.
nswi106.cz.     300     IN      NS      praha.dcepelik.cz.

;; ADDITIONAL SECTION:
;; [...]
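The three manual steps above follow a simple loop: ask a server, and either accept its answer or follow its referral. The sketch below simulates that loop without touching the network. The SERVERS table is a hand-written, heavily simplified stand-in for the three servers we queried (one name server per zone, no glue, no caching), and the function names are ours; a real recursor is considerably more involved.

```python
# Each simulated server knows either referrals to child zones or final records.
SERVERS = {
    "a.root-servers.net.": {"referrals": {"cz.": ["a.ns.nic.cz."]}, "records": {}},
    "a.ns.nic.cz.": {"referrals": {"nswi106.cz.": ["brno.dcepelik.cz."]}, "records": {}},
    "brno.dcepelik.cz.": {
        "referrals": {},
        "records": {
            ("eval.nswi106.cz.", "CNAME"): "z.nswi106.cz.",
            ("z.nswi106.cz.", "A"): "135.181.230.236",
        },
    },
}


def query(server, name, rtype):
    """Return ('answer', value), ('referral', servers) or ('nodata', None)."""
    data = SERVERS[server]
    records = data["records"]
    # Like the last listing, the server follows CNAMEs it holds itself.
    while rtype != "CNAME" and (name, "CNAME") in records:
        name = records[(name, "CNAME")]
    if (name, rtype) in records:
        return "answer", records[(name, rtype)]
    for zone, servers in data["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return "referral", servers
    return "nodata", None


def resolve(name, rtype="A", root="a.root-servers.net."):
    """Follow referrals from the root until some server answers."""
    server, path = root, []
    for _ in range(16):  # guard against referral loops
        path.append(server)
        kind, value = query(server, name, rtype)
        if kind == "answer":
            return value, path
        if kind == "referral":
            server = value[0]  # a real resolver would choose among all of them
            continue
        break
    raise LookupError(f"could not resolve {name} {rtype}")


print(resolve("eval.nswi106.cz."))
```

Only the last server in the path sets the aa (authoritative answer) flag in the real listings; the first two merely point us further down the tree.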

DNS privacy: DoT/DoH

If you’re using your ISP’s DNS servers, you’re making it exceptionally easy for them to spy on your traffic. Your ISP likely knows your rough browsing history (all host names you visit), can guess what kind of OS you’re using, whether you’re using antivirus software and what kind, etc. That’s not great. Luckily, there are ways to fix this.

The simplest option is not to use your ISP’s DNS servers and instead use one of the public DNS servers. That way it’s a bit harder for the ISP to sniff your traffic, since it’s no longer enough to just read the DNS logs. But DNS is unencrypted by default and they can still easily decode your DNS queries if they want to.

That means you should encrypt your DNS traffic in addition to using a trustworthy [11] public DNS provider. That’s exceptionally simple thanks to DNS over TLS (DoT) and DNS over HTTPS (DoH). There are privacy-friendly DNS providers such as Quad9 that support DoT and promise not to spy on your traffic.

Simply set up a locally running Unbound instance and add the following directive:

forward-zone:
  name: .
  forward-tls-upstream: yes
  forward-addr: 9.9.9.9@853#dns.quad9.net

And then configure your network manager to use that DNS server. You could also just replace /etc/resolv.conf and list only your local name server [12]:

~% cat /etc/resolv.conf
# Use local Unbound instance pointed to Quad9 public DNS using DoT.
nameserver 127.0.0.1

It’s a good idea to do a before-and-after packet capture to verify that your setup works as intended.

With this setup:

Warning about Server Name Indication (SNI)

All TLS traffic, such as all HTTPS traffic, is still using the Server Name Indication (SNI) TLS extension. This is a set of clear-text fields in the Client Hello packet which contain the host name you’re connecting to. Yes, really. This part of TLS is, unfortunately, unencrypted to this day. Try to capture your HTTPS traffic and you’ll see the clear-text host names in Wireshark.

So, the ISP still can see what host names you connect to, even when you’re using HTTPS. But we don’t expect them to log this by default (unless you are specifically being targeted) since it sounds like work. But you’re not safe. You can read more about this problem and possible solutions here.

If you have a reason to worry about this, use Tor.

Security problems

The DNS protocol was designed for a different world. By default, DNS uses unencrypted UDP or TCP transport. This has far-reaching consequences:

  1. Anybody who can capture packets anywhere along the network path can analyze our DNS traffic. We can mitigate this problem by using DoT.

  2. An attacker on our network path can modify DNS responses to our queries and serve us with fake records. DNSSEC is meant to fix this problem.

Classes of DNS attacks

Let’s take a look at some practical attacks on DNS infrastructure so that we can see how DNSSEC mitigates them:

The terms “DNS spoofing” and “DNS hijacking” are used somewhat interchangeably, so don’t get too attached to them.

Example: DNS-enabled phishing

Serving the user a fake DNS record is usually not an attack in itself; usually, it will be the first step of a more sophisticated endeavor, such as phishing. Let’s take a look at how an attacker may use the inherent insecurity of DNS to carry out a phishing attack.

Let’s assume that an attacker is able to modify or outright fake the A (or AAAA, or CNAME) record of d3s.mff.cuni.cz, to serve you a fake website of the Department of Distributed and Dependable Systems. When you type that domain name into your browser, the following will happen with most browsers in their default configuration:

The browser will first resolve the domain name to the attacker-chosen IPv4 or IPv6 address, and will initiate an HTTP (as opposed to HTTPS) connection to it. This is the default behavior of many browsers when you don’t type the https:// scheme as part of the URL yourself.

Note: Chrome tries HTTPS first since version 90. Firefox can also default to HTTPS in case you visited the site over HTTPS in the past, see HSTS. Firefox will also default to HTTPS First in private browsing.

A legit d3s.mff.cuni.cz web server will respond with a 302 redirect to the HTTPS version of the same site:

~% curl -i http://d3s.mff.cuni.cz
HTTP/1.1 302 Found
Date: Tue, 13 Dec 2022 11:13:52 GMT
Server: Apache/2.4.29 (Ubuntu)
Location: https://d3s.mff.cuni.cz/
Content-Length: 289
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://d3s.mff.cuni.cz/">here</a>.</p>
<hr>
<address>Apache/2.4.29 (Ubuntu) Server at d3s.mff.cuni.cz Port 80</address>
</body></html>

You can see it’s an Apache web server serving us the redirect. It is configured to redirect all HTTP URLs to their HTTPS counterparts, which is considered best practice.

However, since the DNS response was spoofed, you’re not talking to the legit web server, so the game is different. The attacker’s goal is to trick you into thinking you’re looking at d3s.mff.cuni.cz. They now have two options:

[Figure: Your connection is not secure. It’s a good idea to check the address bar. Also, it’s not enough.]

Since most users are nowadays quite vigilant about the address bar, especially when accessing sensitive sites, the attacker has to go the alternate route, and serve you their fake website over HTTPS, albeit with a different domain name in the address bar. And you would certainly notice that! Or would you?

In this text, I have deliberately misspelled ⅾ3s.mff.cuni.cz once. Did you notice? And would you notice in the address bar?

Typosquatting and Unicode confusables

Typosquatting is a term describing the deliberate use of a misspelled name. It’s a common technique used outside of the realm of domain names, for example with Python packages (paper). Chances are you’ll not notice the typo in the address bar, and you’ll feel safe because of a slick green lock icon telling you everything is perfectly fine. The icon isn’t wrong: your connection to the attacker’s server is indeed secure.

But let’s assume for a while that you are extremely vigilant, and you check each and every URL for typos. In the above text, one of the occurrences of d3s.mff.cuni.cz is not what it seems! Can you find it?

Depending on your system font, d3s.mff.cuni.cz and ⅾ3s.mff.cuni.cz might look exactly the same (they do on my machine):

[Figure: Two distinct Unicode code-points rendered exactly the same. On my machine, the two strings appear exactly the same.]

But they are not the same:

~% cat <<EOF | hexdump -C
Depending on your system font, d3s.mff.cuni.cz and ⅾ3s.mff.cuni.cz might look
exactly the same (they do on my machine), but they are _not_ the same:
EOF
00000000  44 65 70 65 6e 64 69 6e  67 20 6f 6e 20 79 6f 75  |Depending on you|
00000010  72 20 73 79 73 74 65 6d  20 66 6f 6e 74 2c 20 64  |r system font, d|
00000020  33 73 2e 6d 66 66 2e 63  75 6e 69 2e 63 7a 20 61  |3s.mff.cuni.cz a|
00000030  6e 64 20 e2 85 be 33 73  2e 6d 66 66 2e 63 75 6e  |nd ...3s.mff.cun|
00000040  69 2e 63 7a 20 6d 69 67  68 74 20 6c 6f 6f 6b 0a  |i.cz might look.|
00000050  65 78 61 63 74 6c 79 20  74 68 65 20 73 61 6d 65  |exactly the same|
00000060  20 28 74 68 65 79 20 64  6f 20 6f 6e 20 6d 79 20  | (they do on my |
00000070  6d 61 63 68 69 6e 65 29  2c 20 62 75 74 20 74 68  |machine), but th|
00000080  65 79 20 61 72 65 20 5f  6e 6f 74 5f 20 74 68 65  |ey are _not_ the|
00000090  20 73 61 6d 65 3a 0a                              | same:.|

Take a look at the output carefully: d3s.mff.cuni.cz is written with a d (UTF-8/ASCII byte 0x64) the first time, and a ⅾ (UTF-8 bytes 0xe2 0x85 0xbe) the second time.

In case you wonder what the other letter which looks like d is, it’s a “small roman numeral five hundred”, a completely different Unicode code-point. The reason it looks exactly the same (or almost the same) is simple: there are too many things which ought to look like a d, but a limited number of creative designs for a d-looking glyph. So they’ll either be the exact same design, or at least they’ll look very similar.
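If you don’t feel like reading hex dumps, the same check can be sketched in a few lines of Python. The helper non_ascii_chars is our own; it simply lists every character outside ASCII together with its code point and official Unicode name. The look-alike is written with an escape sequence so that the source code itself stays unambiguous.

```python
import unicodedata


def non_ascii_chars(text: str) -> list[tuple[int, str, str]]:
    """List (position, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


genuine = "d3s.mff.cuni.cz"
lookalike = "\u217e3s.mff.cuni.cz"  # first character is U+217E, not ASCII d

print(non_ascii_chars(genuine))    # []
print(non_ascii_chars(lookalike))  # [(0, 'U+217E', 'SMALL ROMAN NUMERAL FIVE HUNDRED')]
print(lookalike.encode("utf-8")[:3].hex(" "))  # e2 85 be
```

A check like this is cheap to run on host names (or source code) before trusting them, although it obviously cannot tell a malicious non-ASCII character from a legitimate one.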

Unicode confusables are distinct Unicode code-points which are the same or similar in appearance; here’s an app. Check that link, you’ll find many other creative ways to write d3s.mff.cuni.cz-looking text.

Now for the really bad news: Internationalized Domain Names (IDNs) make it possible to include non-ASCII (Unicode) code-points in your domain names. So, the attacker can HTTPS redirect you to d3s.mff.ⅽuni.cz (see what I did there?) and get a TLS web certificate for that domain instead (they can—it’s their domain). Now will you notice that?

Note: With the cz. TLD, this isn’t currently possible; CZ.NIC does not allow IDNs (yet). See háčkyčárky.cz for rationale (the page is also available in English).

You cannot reasonably expect to notice this shenanigan, unless your font doesn’t have a glyph for the confusable at all (and renders an ugly box), or happens to render the confusable in a radically different way.

And just as it was the case with typosquatting, Unicode confusables are a problem for much more than just domain names. Consider the following snippet of Ruby [14] which decides whether nuclear warheads should be launched:

#!/usr/bin/env ruby

ALLOWED_TARGETS = ["dresden", "paris", "vienna"]

def missile_launch_allowed(target, secret_key)
  allowed = true
  аllowed = false if secret_key != 1234
  allowed = false unless ALLOWED_TARGETS.include?(target)
  allowed
end

puts(missile_launch_allowed("dresden", 9999))

Run this, then run.

Side quest: securing the browser

Hopefully the example above convinced you that DNS is just one of many problems we have. We’d like to share some tips on how you can make the above scenario at least implausible:

DNSSEC

DNSSEC makes it possible to verify that the response you got from a DNS server can be trusted. It relies on asymmetric cryptography and several additional DNS record types.

Resource record signatures: RRSIGs

DNSSEC adds several new record types to attach signatures to all records in a zone. If you compare a regular query:

~% drill mff.cuni.cz
;; [...]
;; QUESTION SECTION:
;; mff.cuni.cz. IN      A

;; ANSWER SECTION:
mff.cuni.cz.    21135   IN      A       195.113.27.221
;; [...]

To a DNSSEC-enabled query (mind the -D):

~% drill -D mff.cuni.cz
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 11920
;; flags: qr rd ra ad ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; mff.cuni.cz. IN      A

;; ANSWER SECTION:
mff.cuni.cz.    21141   IN      A       195.113.27.221
mff.cuni.cz.    21141   IN      RRSIG   A 13 3 28800 20230305204805 20221119191842 47500 mff.cuni.cz. EAwgM0+3xvrROUPDp4qRpWALK6qMuqYHMbHXFbX9Hk7ba8yYVXS0lVvf6wXFwANyFKlvvNTsLoLModKbMkI9kQ==

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 23 msec
;; EDNS: version 0; flags: do ; udp: 512
;; SERVER: 8.8.8.8
;; WHEN: Tue Dec 13 19:39:18 2022
;; MSG SIZE  rcvd: 163

You’ll see immediately that additional RRSIG records are retrieved. These are asymmetric signatures of the regular records; in the example above, the RRSIG record is a signature for the A record.

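The RRSIG record contains the following information: the type of the records it covers (A), the signature algorithm (13, ECDSA P-256 with SHA-256), the number of labels in the signed name (3), the original TTL of the signed records (28800), the expiration and inception times of the signature (20230305204805 and 20221119191842, in UTC), the key tag of the key that made the signature (47500), the name of the signer (mff.cuni.cz.), and finally the Base64-encoded signature itself.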

In other words, whenever you receive a DNS response, you can verify each record’s authenticity (that it is indeed the record as published by the domain’s authoritative server) and integrity (that it wasn’t modified in transit).

Note: As you can see, DNSSEC does not encrypt the records, it just signs them. You still need to use DoT/DoH for encryption.
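To get a feel for the RRSIG fields, here is a sketch that parses the presentation format shown above and checks the signature’s validity window. The RRSIG class and helper names are ours, and the code deliberately stops short of verifying the cryptographic signature, which would need the DNSKEY and an ECDSA implementation.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RRSIG(NamedTuple):
    type_covered: str
    algorithm: int
    labels: int
    original_ttl: int
    expiration: datetime
    inception: datetime
    key_tag: int
    signer: str
    signature: str


def _timestamp(text: str) -> datetime:
    # Presentation format is YYYYMMDDHHmmSS in UTC (RFC 4034).
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def parse_rrsig(rdata: str) -> RRSIG:
    covered, alg, labels, ttl, exp, inc, tag, signer, *sig = rdata.split()
    return RRSIG(covered, int(alg), int(labels), int(ttl),
                 _timestamp(exp), _timestamp(inc), int(tag), signer, "".join(sig))


def is_current(sig: RRSIG, now: datetime) -> bool:
    """A signature is only usable between its inception and expiration."""
    return sig.inception <= now <= sig.expiration


sig = parse_rrsig(
    "A 13 3 28800 20230305204805 20221119191842 47500 mff.cuni.cz. "
    "EAwgM0+3xvrROUPDp4qRpWALK6qMuqYHMbHXFbX9Hk7ba8yYVXS0lVvf6wXFwANyFKlvvNTsLoLModKbMkI9kQ=="
)
print(sig.key_tag, sig.signer)
print(is_current(sig, datetime(2022, 12, 13, tzinfo=timezone.utc)))  # True
print(is_current(sig, datetime(2024, 1, 1, tzinfo=timezone.utc)))    # False
```

The limited validity window is the reason a signed zone has to be re-signed periodically: once the expiration time passes, validating resolvers reject the records.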

Chain of trust

The signature in the RRSIG is computed using the domain’s private key, which is, of course, kept secret. The public key is then used to verify the signature.

The public key is published as a DNSKEY record within the zone itself, so that it’s easy to obtain it for anybody who wishes to verify the RRSIGs (verifying resolvers, such as Unbound, do that). Let’s query the DNSKEY record:

~% drill -D DNSKEY mff.cuni.cz
;; [...]
;; QUESTION SECTION:
;; mff.cuni.cz. IN      DNSKEY

;; ANSWER SECTION:
mff.cuni.cz.    21600   IN      DNSKEY  257 3 13 1PMTgkDSUJEO8PbtFEtJ6sqtBUwlqv5yWMAQpedPoJtvJ9Oxoen3OJoFxEnZCFBCouNsR58PYdzYDowWEQAJVw== ;{id = 47500 (ksk), size = 256b}
mff.cuni.cz.    21600   IN      RRSIG   DNSKEY 13 3 28800 20230319201841 20221119191842 47500 mff.cuni.cz. yZvUmiVeja4HBZaSDKlX1dzkFo3onJ293BD7i7VS50SCefWEKVZQKp/Yu7kaia/PLXSNQA/XWAX2fteB+buFDA==
;; [...]

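The DNSKEY record contains the following information: the flags (257 marks a key signing key; zone signing keys use 256), the protocol (always 3), the algorithm (13, ECDSA P-256 with SHA-256), and the Base64-encoded public key. The trailing comment is added by drill and shows the key tag (47500), the key type and the key size.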

The DNSKEY record contains the public key material needed to verify the record signature. But how do we know we can trust the key? We cannot blindly trust the DNSKEY from the zone: a man-in-the-middle could forge the records, sign them with their private key, and then forge the DNSKEY record as well. The signature would match, but that would say nothing about the validity of the signed records.

To indicate that the key is trusted, the parent zone (cuni.cz.) will publish a DS record (delegation signer):

~% drill -D DS mff.cuni.cz
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 32133
;; flags: qr rd ra ad ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; mff.cuni.cz. IN      DS

;; ANSWER SECTION:
mff.cuni.cz.    21600   IN      DS      47500 13 2 6e5316a92bf1ac95f4f9916f57195d305a9e08ef7d0fdda6274f47b03a6abc19
mff.cuni.cz.    21600   IN      RRSIG   DS 13 3 86400 20221223202239 20221123201842 32757 cuni.cz. BvWsWUcxKpGn3wEMfeyy+4wRftcDwWXoW+ENifAdIea78kLuD/Bosj+Dkg9Ge3kjvzX/UqMDoTUWoM96tfKhpg==

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 34 msec
;; EDNS: version 0; flags: do ; udp: 512
;; SERVER: 8.8.8.8
;; WHEN: Tue Dec 13 21:05:54 2022
;; MSG SIZE  rcvd: 191

Crucially, the DS record contains a hash of the DNSKEY record stored in the child zone. This conveys the following message: “if you trust the parent zone, you can trust this key in the child zone”.
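The relationship between the DNSKEY and the DS record can be reproduced with the standard library. The sketch below follows our reading of RFC 4034: the key tag is a simple checksum over the DNSKEY RDATA (Appendix B), and the DS digest for digest type 2 is SHA-256 over the owner name in wire format followed by the DNSKEY RDATA. The function names are ours. The script prints the computed values so that you can compare them with the key tag (47500) and the digest in the listings above.

```python
import base64
import hashlib
import struct


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key_b64: str) -> bytes:
    """Wire-format RDATA of a DNSKEY record."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(public_key_b64)


def key_tag(rdata: bytes) -> int:
    """Key tag as described in RFC 4034, Appendix B (all algorithms except 1)."""
    total = sum(byte << 8 if i % 2 == 0 else byte for i, byte in enumerate(rdata))
    total += (total >> 16) & 0xFFFF
    return total & 0xFFFF


def name_to_wire(name: str) -> bytes:
    """Canonical (lower-case) wire format of a domain name."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """Digest field of a DS record with digest type 2 (SHA-256)."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest()


# Values copied from the DNSKEY listing for mff.cuni.cz.
rdata = dnskey_rdata(
    257, 3, 13,
    "1PMTgkDSUJEO8PbtFEtJ6sqtBUwlqv5yWMAQpedPoJtvJ9Oxoen3OJoFxEnZCFBCouNsR58PYdzYDowWEQAJVw==",
)
print("key tag:", key_tag(rdata))
print("DS digest:", ds_digest_sha256("mff.cuni.cz.", rdata))
```

A validating resolver performs essentially this comparison: it fetches the DS from the parent, hashes the child’s DNSKEY and accepts the key only if the two digests agree.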

All that remains is recursion: to trust the DNSKEY of cuni.cz, you can query cz for the DS record of cuni.cz; to trust the DNSKEY of cz, you can query the root zone (.). The root zone key is trusted, and its public keys are distributed out-of-band. For example on Arch Linux, you’ll likely find out that you have dnssec-anchors installed:

% pacman -Ql dnssec-anchors
...
dnssec-anchors /etc/trusted-key.key
...
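The recursion towards the root can be written down as a short walk over the name. This sketch assumes, for simplicity, a zone cut at every label, which holds for mff.cuni.cz. but is not true in general (see the discussion of zones versus subdomains); the function name chain_of_trust is our own.

```python
def chain_of_trust(zone: str) -> list[str]:
    """Zones visited when validating `zone`, from the zone itself up to the root."""
    labels = zone.rstrip(".").split(".") if zone != "." else []
    chain = [".".join(labels[i:]) + "." for i in range(len(labels))]
    return chain + ["."]


chain = chain_of_trust("mff.cuni.cz.")
for child, parent in zip(chain, chain[1:]):
    print(f"DNSKEY of {child} is vouched for by a DS record in {parent}")
print("DNSKEY of . is checked against the locally installed trust anchor")
```

The last step is the only one that does not involve DNS at all: the root key has to come from somewhere you already trust, such as your distribution’s package.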

To summarize:

The above description is a simplification, but is sufficient for now. DNSSEC requires more than the described new record types, and it also requires several extensions to the protocol. If you’re interested, check out this nice summary.

Key signing keys and zone signing keys

The best practice is to have two keys associated with a domain: a key signing key and a zone signing key.

There are some advantages to splitting the responsibilities this way:

However, this setup is a bit more complex. Having a separate KSK and ZSK is optional, and DNSSEC works with a single key just fine.

Fun fact: did you know that a key signing ceremony is held whenever the KSK is used to sign a new ZSK for the root zone?

DNSSEC deployment statistics

As of this writing (Nov 2024), about 60 % of cz. domains use DNSSEC (stats). Globally, the numbers appear to be much lower (stats).

Whois

Whois, although it has nothing to do with DNS, is another kind of distributed database that appeared in the 1980s. It can be used to query information about network objects (domain names, IP addresses and AS numbers). You probably know the web-based Whois interface of CZ.NIC.

There’s also a command-line tool, whois(1). Since the protocol itself is trivial (RFC 3912), the command-line tool mostly just tries to guess what kind of data you are looking for and which Whois database your query should be directed to.
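To show just how trivial the protocol is, here is a minimal Python sketch of a Whois client as I read RFC 3912: open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The function name whois_query and its parameters are my own choices, and the UTF-8 encoding is an assumption, since the RFC does not specify a character set.

```python
import socket


def whois_query(server: str, query: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one Whois query and return the whole response as text."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        # The request is a single line terminated by CRLF.
        conn.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # Encoding is an assumption; undecodable bytes are replaced, not fatal.
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example (needs network access):
#   print(whois_query("whois.nic.cz", "nic.cz"))
```

Everything else that whois(1) does, such as picking the server for you, happens before this exchange.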

For example, when you query something that looks like a domain name, it will send your query to the Whois server of the responsible registry, chosen by the suffix. For .cz domains, you get something like:

~% whois dcepelik.cz
%  (c) 2006-2021 CZ.NIC, z.s.p.o.
%
% [...]
%
% Whoisd Server Version: 3.15.0
% Timestamp: Fri Nov 22 08:49:54 2024

domain:       dcepelik.cz
registrant:   DCEPELIK
nsset:        DCEPELIK-NSSET
keyset:       DCEPELIK-KEYSET
registrar:    REG-INTERNET-CZ
registered:   27.01.2017 23:04:41
changed:      04.02.2017 17:10:37
expire:       27.01.2025

contact:      DCEPELIK
name:         David Čepelík
registrar:    REG-MOJEID
created:      22.12.2013 21:24:45
changed:      24.08.2023 22:26:25

[...]

Whereas for .com domains, you get:

~% whois google.com
   Domain Name: GOOGLE.COM
   Registry Domain ID: 2138514_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.markmonitor.com
   Registrar URL: http://www.markmonitor.com
   Updated Date: 2019-09-09T15:39:04Z
   Creation Date: 1997-09-15T04:00:00Z
   Registry Expiry Date: 2028-09-14T04:00:00Z
   Registrar: MarkMonitor Inc.
   Registrar IANA ID: 292
   Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
   Registrar Abuse Contact Phone: +1.2086851750
   Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Domain Status: serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited
   Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
   Domain Status: serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited
   Name Server: NS1.GOOGLE.COM
   Name Server: NS2.GOOGLE.COM
   Name Server: NS3.GOOGLE.COM
   Name Server: NS4.GOOGLE.COM
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/

For domain names, the dispatch happens here.
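The idea behind such a dispatch can be sketched as a simple suffix lookup. This is not the actual code of whois(1): the table below is a tiny illustrative subset, and both the function name pick_server and the fallback to whois.iana.org are assumptions of mine.

```python
# Illustrative subset only; the real tool ships a much longer table.
TLD_SERVERS = {
    "cz": "whois.nic.cz",
    "com": "whois.verisign-grs.com",
    "net": "whois.verisign-grs.com",
}
FALLBACK_SERVER = "whois.iana.org"  # assumption: ask IANA when nothing matches


def pick_server(domain: str) -> str:
    """Pick a Whois server for a domain name by its longest known suffix."""
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest suffix first, so that a more specific entry (say, a
    # second-level suffix) would win over a plain top-level one.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in TLD_SERVERS:
            return TLD_SERVERS[suffix]
    return FALLBACK_SERVER


for name in ("dcepelik.cz", "google.com", "example.invalid"):
    print(name, "->", pick_server(name))
```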

Missing bits

  1. If you run the same query twice, the result is unlikely to be identical, as some parts (such as the query ID or the remaining TTLs of the records) change between calls. But the addresses retrieved would be the same, unless the zone has changed in between queries.

  2. But it can also be used over TCP; see drill -t. Using TCP is sometimes necessary (for example when the response doesn’t fit into a single UDP packet; that can happen with many records or when DNSSEC is used).

  3. I queried my local Unbound just after it was started.

  4. Listing was redacted for brevity.

  5. For example (non-portable):

    printf "arpa.in-addr.%s\n" "$ip" | tr . \\n | tac | paste -s -d.
    
  6. Because the IPv6 address is four times the length of an IPv4 address, duh!

  7. See /etc/ssl/certs/ca-certificates.crt, update-ca-certificates(8), /etc/ca-certificates.conf.

  8. Examples of non-orthogonal and/or non-secure channels: TODO.

  9. For example here in Unbound, a recursive caching DNS server.

  10. This is just the e-mail address with @ replaced by a dot (.). If the user portion of the email contains dots, they must be escaped: foo\.bar.example.com. Escaping happens only for the user part.

  11. Emphasis on “trustworthy”. While you could use Google DNS (8.8.8.8, 8.8.4.4) just fine, it probably wouldn’t do much for you in terms of privacy.

  12. This mostly works fine, but breaks in at least two cases that I know of: captive portals and DNS64.

  13. To make the process itself resilient against DNS attacks, Let’s Encrypt resolves the host name for which it’s issuing the certificate against multiple DNS servers.

  14. Ruby was chosen since it doesn’t require you to declare variables before first use, which is needed for this particular script to work. Otherwise I wouldn’t be worried at all if the nuclear arsenal of the world was controlled by a Ruby script, because Compiling Nokogiri native extensions…
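
The conversion described in footnote 10 can be sketched in a few lines of Python. The function name email_to_rname is hypothetical, and the sketch handles only the dot escaping mentioned in the footnote; other special characters in the user part are not handled.

```python
def email_to_rname(email: str) -> str:
    """Turn an e-mail address into the dotted form described in footnote 10."""
    # Split on the last @, so the domain part is whatever follows it.
    user, _, domain = email.rpartition("@")
    if not user or not domain:
        raise ValueError(f"not an e-mail address: {email!r}")
    # Dots are escaped only in the user part; the domain is left untouched.
    return user.replace(".", "\\.") + "." + domain


print(email_to_rname("foo.bar@example.com"))
```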