Resolving Domain Names

When we type www.google.com in our web browser, the browser must first resolve the domain name to an IP address; only then can it establish a TCP connection to that address.

To illustrate, if we use curl to send a GET request to www.google.com/, we can see that the domain name is resolved before the request is sent.

$ curl --ipv4 -v www.google.com/
* Trying 216.58.201.228...
...
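The same kind of lookup can be triggered from a program. Below is a minimal Python sketch that asks the operating system's resolver for the IPv4 addresses of a name, using the standard library's socket.getaddrinfo. The helper name resolve_ipv4 is our own, and the port number 80 is only a placeholder required by the call.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (address, port) for IPv4
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # An IP literal needs no lookup, so this works even without a network.
    print(resolve_ipv4("127.0.0.1"))
```

Calling resolve_ipv4("www.google.com") on a connected machine should return one or more addresses, though which ones you get depends on where and when you ask.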

But how does this resolution process actually work?

/etc/nsswitch.conf

First, curl (or other programs on your computer, such as a web browser) will check your computer's Name Service Switch (NSS) configuration file, which is located at /etc/nsswitch.conf.

The NSS is a generic facility used by services to determine which sources, and in what order, they should use to obtain name-service information. For example, if a service needs to look up a user's account (say, to authenticate them), it may check /etc/nsswitch.conf and see an entry for passwd.

$ grep passwd /etc/nsswitch.conf
passwd: files ldap

This means the service should first check with the local /etc/passwd file before querying the Lightweight Directory Access Protocol (LDAP) service.

Likewise, the same NSS configuration file can be used to determine which sources to use to resolve hostnames.

$ grep hosts /etc/nsswitch.conf
hosts: files dns

In this configuration, a program will try the local /etc/hosts file first, and use the Domain Name System (DNS) only if the name isn't found there. So let's take a look inside /etc/hosts.
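To make the "which sources, in what order" idea concrete, here is a small Python sketch that reads nsswitch-style text and returns the ordered sources for one database. The function name lookup_order is our own, and the parsing is deliberately simplified: it ignores bracketed action items such as [NOTFOUND=return] rather than interpreting them.

```python
import re


def lookup_order(nsswitch_text: str, database: str) -> list[str]:
    """Return the sources listed for `database`, in the order they appear."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        name, _, sources = line.partition(":")
        if name.strip() == database:
            # Simplification: remove bracketed action items instead of honouring them.
            sources = re.sub(r"\[[^\]]*\]", " ", sources)
            return sources.split()
    return []


SAMPLE = """\
passwd: files ldap
hosts: files dns
"""

print(lookup_order(SAMPLE, "hosts"))  # ['files', 'dns']
```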

/etc/hosts

The /etc/hosts file is simply a text file, where each line contains two or more values, separated by one or more spaces or tabs. The left-most value is an IP address, the next value is the hostname that should resolve to that IP address, and the rest are aliases of that hostname. For example, your hosts file may have the following entry:

127.0.0.1 localhost  

This means when we send requests to localhost, it'll be resolved to 127.0.0.1.

$ curl -v localhost/
*   Trying 127.0.0.1...
...

127.0.0.1 is a special IP address that represents your current machine.

You can test this yourself by modifying the /etc/hosts file to something like the following:

127.0.0.1 localhost  
127.0.0.1 google.com google  

Now, when we send requests to google.com or google, it will resolve to 127.0.0.1.
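The file format is simple enough to parse in a few lines. The following Python sketch turns hosts-file text into a lookup table that maps every hostname and alias to its IP address. The function name parse_hosts is our own, and the "first entry for a name wins" rule is an assumption made for this illustration.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map every hostname and alias in hosts-file text to its IP address."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        # Strip comments, then split on any run of spaces or tabs.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        ip, names = fields[0], fields[1:]
        for name in names:
            # Assumption for this sketch: the first entry for a name wins.
            table.setdefault(name.lower(), ip)
    return table


SAMPLE = """\
127.0.0.1 localhost
127.0.0.1 google.com google
"""

print(parse_hosts(SAMPLE)["google"])  # 127.0.0.1
```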

In theory, we could have a massive /etc/hosts file with all domains and their corresponding IP addresses, and it would be able to resolve all hostnames. However, this is not feasible because there are hundreds of millions of registered domain names, so that file would be gigantic. Furthermore, the IP addresses for a domain can change frequently, so we'd have to update this file often.

There are consolidated hosts files available that are used to block dubious sites, such as fake news sites, gambling sites, and pornographic sites. You can find one such file at github.com/StevenBlack/hosts, but even this file only contains 42,915 entries, nowhere close to the hundreds of millions of domain names on the internet.

Therefore, under normal use cases, it's very unlikely that services will be able to find the hostname they're after in the /etc/hosts file.

When a hostname cannot be resolved using the /etc/hosts file, the lookup will fall back to using the DNS.
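This try-one-source-then-the-next behaviour can be sketched in Python as follows. Everything here is illustrative: HOSTS_TABLE stands in for a parsed /etc/hosts, FAKE_DNS stands in for a real DNS query (its address comes from a range reserved for documentation), and the function names are our own.

```python
from typing import Callable, Optional

# Stand-in for a parsed /etc/hosts file.
HOSTS_TABLE = {"localhost": "127.0.0.1"}
# Stand-in for the DNS; the answer is made up for this example.
FAKE_DNS = {"example.com": "203.0.113.7"}


def lookup_files(name: str) -> Optional[str]:
    return HOSTS_TABLE.get(name)


def lookup_dns(name: str) -> Optional[str]:
    return FAKE_DNS.get(name)


SOURCES: dict[str, Callable[[str], Optional[str]]] = {
    "files": lookup_files,
    "dns": lookup_dns,
}


def resolve(name: str, order: list[str]) -> Optional[tuple[str, str]]:
    """Try each source in order; return (source, ip) for the first hit."""
    for source in order:
        ip = SOURCES[source](name)
        if ip is not None:
            return source, ip
    return None  # no source knew the name


print(resolve("localhost", ["files", "dns"]))    # ('files', '127.0.0.1')
print(resolve("example.com", ["files", "dns"]))  # ('dns', '203.0.113.7')
```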

Understanding the DNS

You can think of the DNS as the internet's equivalent to the postal system: it allows us to address machines on the internet by easy-to-remember names (in other words, domain names) rather than IP addresses, just as you'd send letters to your friends' postal addresses rather than specifying their geographic coordinates.

What is a domain?

When we talk about a "domain", such as google.com, what we are technically talking about is a fully qualified domain name (FQDN). Using our previous analogy, an FQDN is similar to a full postal address.

There are different parts to a postal address, each giving increasingly granular details about the location of the destination. For example, when we read the address Main Street, Washington, Connecticut, USA from right to left, we first learn that this address is in the USA, then that it is in the state of Connecticut, then the city of Washington, and finally the actual street address. This lets us distinguish it from Main Street, Watertown, Connecticut, USA or Main Street, Vancouver, Washington, USA.

Likewise, an FQDN also has this property of increasing specificity. A valid FQDN contains at least two parts: a top-level domain and a second-level domain. For example, the domain google.com. has a second-level domain of google, and a top-level domain of com. The specificity increases from right to left, starting with . (root, least specific), then com, then google (most specific).

Technically, a fully qualified name ends with a period (.), which denotes the root node within the DNS; this convention comes from the DNS specifications themselves (RFC 1034). However, since the root is the same for all domains, your browser allows you to leave out the trailing period so as to improve user experience. (Similarly, you don't have to specify Earth when you send mail, as that's implied.)
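The right-to-left ordering is easy to see if we split a name into its parts. This short Python sketch lists the parts from least to most specific, treating a missing trailing period as implied. The function name labels_from_root is our own.

```python
def labels_from_root(fqdn: str) -> list[str]:
    """List the parts of a domain name from least to most specific."""
    # Drop a single trailing period, if present; it stands for the root.
    name = fqdn[:-1] if fqdn.endswith(".") else fqdn
    if not name:
        return ["."]  # the root on its own
    # Names are written most-specific first, so reverse the parts.
    return ["."] + name.split(".")[::-1]


print(labels_from_root("google.com."))     # ['.', 'com', 'google']
print(labels_from_root("www.google.com"))  # ['.', 'com', 'google', 'www']
```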

Nameservers

Like our local /etc/hosts file, the purpose of the DNS is to resolve a name to an IP address. However, instead of using a single /etc/hosts file, it relies on a network of nameservers. These nameservers read the FQDN and respond with an IP address.

There are several types of nameservers: resolving nameservers, root nameservers, TLD nameservers, and domain-level nameservers.

Resolving nameservers

Normally, the only type of nameserver your machine will interact with is a resolving nameserver, which is usually provided by your Internet Service Provider (ISP) or another business such as Google (which provides the 8.8.8.8 and 8.8.4.4 resolving nameservers), Cloudflare (1.1.1.1), or OpenDNS (208.67.222.222).

The resolving nameserver acts as an intermediary between your machine and the more "central" nameservers, and sends requests to those nameservers on your behalf. It may have to query multiple nameservers before it gets the actual IP address.

The benefit of using a resolving nameserver is that the results can be cached. For example, if there were 1,000 requests for the website yahoo.com, instead of sending 1,000 requests to the "central" nameservers, the resolving nameserver can cache the first result, and reply to the 999 subsequent requests with the cached IP address. This makes IP resolution much quicker, and reduces the load on the DNS as a whole.
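The effect of caching can be shown with a toy resolver in Python. This is a sketch only: the class name CachingResolver and the fake_upstream function are our own, the returned address is made up, and entries never expire here, whereas a real resolver would refresh its cache after some time.

```python
from typing import Callable


class CachingResolver:
    """Toy resolving nameserver that remembers answers it has already fetched."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self._upstream = upstream
        self._cache: dict[str, str] = {}
        self.upstream_queries = 0  # how often we had to ask the "central" servers

    def resolve(self, name: str) -> str:
        if name not in self._cache:
            self.upstream_queries += 1
            self._cache[name] = self._upstream(name)
        return self._cache[name]


def fake_upstream(name: str) -> str:
    # Stand-in for querying the wider DNS; the address is made up.
    return "203.0.113.10"


resolver = CachingResolver(fake_upstream)
for _ in range(1000):
    resolver.resolve("yahoo.com")
print(resolver.upstream_queries)  # 1
```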

All your internet traffic passes through your ISP, who provides the resolving nameserver service. It's important to remember that they may record details of your messages. Therefore, it's very important for your privacy that your data is encrypted over-the-wire, so only the intended recipients will be able to read your messages.

However, even if you use end-to-end encryption on all your messages, your ISP will still get information about which sites you are communicating with. For example, if you visit e-commerce shops often, your ISP can pick up on this, because it is the one resolving the domain for you. Whether your ISP records this data, and how it utilizes that data, is up to your ISP.

One way you can prevent this data from being recorded is to use a virtual private network (VPN), where you send all requests through the VPN, which then relays your messages on to the intended hosts. In this setup, the VPN acts as a proxy server. However, you must do your due diligence to ensure that the VPN provider can be trusted, and that it does not hold on to traffic information for too long.

Root and TLD nameservers

When your resolving nameserver cannot find a cached record, or it wants to refresh its cache, it will send a request to the broader DNS network, which is made up of many root servers and top-level domain (TLD) servers.

The DNS is hierarchical. At the top of this system are the root servers, and one layer below them are the TLD servers. It'd be very simple for a single root server to resolve all domains; however, the load would be too much. Therefore, ICANN, the authority that manages domain names, delegates a subset of domain names to the corresponding TLD servers, based on the domain's TLD. For example, the TLD server for .net is able to resolve all domains with the .net TLD.

Requests can be made to any nameserver, and if that server does not have a record of the domain, it will give you the IP address of a TLD server that does. If it does not know which TLD server can resolve this domain, it will return the IP address of one of the root servers, which will be able to tell you which TLD server is able to resolve it.

Ultimately, once your request goes past your resolving nameserver, it'll almost always end up at a TLD server.

There are currently 13 root servers, operated by 12 different organizations, and each one is mirrored in many locations. The interesting thing about these root servers is that all the mirrors of a given root server share the same IP address (a technique known as anycast), and so act as if they were a single machine.

You can see all root servers at root-servers.org.

TLD server to domain-level nameservers

When we register a domain, we do so through a domain registrar, which is a business entity that has been accredited by ICANN to register domain names. A registrar checks with a registry to see if the domain is available and, if so, registers it for you.

Once your registration is complete, the registrar submits a record to the relevant TLD server, which is run by the registry for that TLD. The record consists of one or more pointers that point the domain name to the IP address of one or more domain-level nameservers (a.k.a. domain nameservers, or authoritative nameservers); these are often run by the registrar who registered the domain.

Each domain nameserver holds a zone file that maps hosts to IP addresses. For example, it'll say api.hobnob.social should point to our server at 142.93.241.63. A domain nameserver uses the zone files it holds to resolve a hostname to the actual IP address of the corresponding resource.
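Putting the layers together, the chain of referrals from root server to TLD server to domain nameserver can be modelled with a few dictionaries. This Python sketch is a simplification under stated assumptions: the nameserver addresses are made up (taken from ranges reserved for documentation), each "server" is just a lookup table, and the registered domain is assumed to be the last two parts of the hostname. Only the final mapping of api.hobnob.social to 142.93.241.63 comes from the example above.

```python
# Root server: TLD -> IP of a TLD server (address is made up).
ROOT_SERVER = {"social": "192.0.2.1"}
# TLD servers, keyed by IP: domain -> IP of its domain nameserver (made up).
TLD_SERVERS = {"192.0.2.1": {"hobnob.social": "198.51.100.1"}}
# Domain nameservers, keyed by IP: zone data mapping host -> IP.
DOMAIN_SERVERS = {"198.51.100.1": {"api.hobnob.social": "142.93.241.63"}}


def resolve(hostname: str) -> tuple[str, list[str]]:
    """Follow referrals root -> TLD -> domain nameserver; return (ip, trace)."""
    labels = hostname.rstrip(".").lower().split(".")
    tld = labels[-1]
    # Assumption: the registered domain is the last two parts of the name.
    domain = ".".join(labels[-2:])
    host = ".".join(labels)

    trace: list[str] = []
    tld_server = ROOT_SERVER[tld]
    trace.append(f"root server -> ask TLD server {tld_server}")
    domain_server = TLD_SERVERS[tld_server][domain]
    trace.append(f"TLD server -> ask domain nameserver {domain_server}")
    ip = DOMAIN_SERVERS[domain_server][host]
    trace.append(f"domain nameserver -> {host} is {ip}")
    return ip, trace


ip, trace = resolve("api.hobnob.social")
for step in trace:
    print(step)
print(ip)  # 142.93.241.63
```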

Summary

The DNS resolves FQDNs into IP addresses. When you type a URL in your browser, your computer first tries to resolve the name locally, by checking your /etc/hosts file. If it can't find it there, it passes the request on to a resolving nameserver, which checks its cache. If the resolving nameserver cannot find it either, it checks with one of the root or TLD nameservers, which returns the IP address of a domain nameserver that holds the zone file for that domain. The domain nameserver then returns the actual IP address of the resource that was originally requested.
