Exploring Internet Communication: From protocols to web browsing

Nalan Ozgedik
Published in Jotform Tech
10 min read · Sep 29, 2023

Every day, we use the internet for everything from visiting a website to checking our email to downloading files. Have you ever wondered about all the stuff happening behind the scenes when you type a web address and press “Enter” in your browser?

To communicate with distant computers and send data, we need to follow certain rules called protocols. A network protocol is like a guide that tells computers how to communicate in a network. Thanks to RFCs (Requests for Comments), documents published by standard-setting bodies for the internet such as the IETF, these protocols became official and universally adopted. An RFC provides the engineering guidelines for implementing a specified protocol.

For instance, consider RFC 9114, which governs the exact behavior of HTTP/3 requests and responses. Because these rules are published openly and become standards, HTTP/3 can be implemented by many different vendors. Consequently, different devices, such as an IBM server and an iPhone, can communicate effectively.

When it comes to network protocols, the TCP/IP (Transmission Control Protocol/Internet Protocol) model is like the official rulebook. It describes, end to end, how data should be packed, routed, and received, and it specifies how devices exchange data with one another over the internet.

TCP/IP was developed independently and was implemented for real-world networking needs, in contrast to the OSI (Open Systems Interconnection) model. OSI was created later as a conceptual framework to standardize network communication. While both models describe ways to communicate over the network, they have different origins and purposes. TCP/IP is the practical foundation of the internet, whereas the OSI model is more of a theoretical reference.

TCP/IP consists of two main protocols. TCP helps applications set up communication: it breaks messages into smaller pieces before sending them and puts them back together at the other end. IP decides how to route each packet and addresses it so that it can reach the right destination.
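To make the "break apart and put back together" idea concrete, here is a minimal sketch. The function names `segment` and `reassemble` and the 8-byte segment size are assumptions chosen for illustration; real TCP numbers bytes starting from a random initial value and negotiates the segment size with the other side.

```python
def segment(message: bytes, mss: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, chunk) pairs of at most mss bytes."""
    return [(offset, message[offset:offset + mss])
            for offset in range(0, len(message), mss)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Sort the segments by sequence number and join their chunks."""
    return b"".join(chunk for _, chunk in sorted(segments))


message = b"GET /index.html HTTP/1.1"
segments = segment(message, mss=8)

# Segments can arrive out of order; the sequence numbers restore the order.
shuffled = list(reversed(segments))
restored = reassemble(shuffled)
```

The sequence number stored with each chunk is what lets the receiving side rebuild the original message even when the pieces arrive in a different order.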

The TCP/IP model has four layers, and each of them adds its own bit of information. This process is called encapsulation.
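As a rough illustration of encapsulation, the sketch below wraps application data in one text "header" per layer and then unwraps it again. The header format, the addresses, and the names `LAYERS`, `encapsulate`, and `decapsulate` are all made up for this example; real headers are binary structures, not bracketed strings.

```python
# Hypothetical headers, listed in the order the layers add them on the way down.
LAYERS = [
    ("TCP", "src_port=49152 dst_port=80"),
    ("IP", "src=192.0.2.10 dst=198.51.100.7"),
    ("ETH", "src=02:00:00:00:00:01 dst=02:00:00:00:00:02"),
]


def encapsulate(data: str) -> str:
    """Prepend one header per layer, from the transport layer down to the link."""
    for name, fields in LAYERS:
        data = f"[{name} {fields}]{data}"
    return data


def decapsulate(frame: str) -> str:
    """Strip the headers in the opposite order, outermost first."""
    for name, _ in reversed(LAYERS):
        header, _, frame = frame.partition("]")
        if not header.startswith(f"[{name} "):
            raise ValueError(f"expected a {name} header")
    return frame


frame = encapsulate("GET / HTTP/1.1")
payload = decapsulate(frame)
```

The outermost header belongs to the lowest layer, which is why the receiving side has to peel the headers off in reverse.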

Here is a brief summary of these layers, their functions, the protocol and devices used on each layer, and the encapsulation pipeline for each.

The TCP/IP model allows us to use many communication protocols. To stay on track and avoid distractions by other application layer protocols, let’s stick with the HTTP protocol as an example.

When we type a URL into the address bar, the first thing the browser does is parse the URL into its components: scheme (http/https), domain, port, path, and fragment. This helps organize the data before sending it to the target machine. Then the application layer identifies the protocol from the scheme to figure out how to connect to the target server. Even if you forget to type “http” or “https” at the beginning of the web address, the browser adds it automatically to the URL in the background.
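The parsing step can be approximated with Python's standard `urllib.parse` module. This is only a sketch: the helper name `parse_url` is hypothetical, and defaulting to "https" when the scheme is missing is an assumption about browser behavior, which in practice varies by browser and site.

```python
from urllib.parse import urlsplit


def parse_url(raw: str) -> dict:
    """Split a typed address into the components a browser works with."""
    if "://" not in raw:
        raw = "https://" + raw  # assumed default scheme when none is typed
    parts = urlsplit(raw)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not name a port
        "path": parts.path or "/",
        "query": parts.query,
        "fragment": parts.fragment,
    }


parsed = parse_url("www.jotform.com/myforms?page=2#top")
```

Here `parsed["port"]` is `None` because the address names no port; the browser falls back to the default port for the scheme.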

The application constructs an HTTP request (e.g., GET or POST) according to the selected protocol, which is HTTP in our case. That request is based on the URL and carries any additional headers or data required. Then the port assignment process comes into play. Even if you don’t explicitly mention a port in the URL, the browser automatically uses the default port for the scheme (80 for HTTP, 443 for HTTPS), just like it adds the protocol automatically.
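A minimal sketch of this step, assuming the plain-text HTTP/1.1 message format (HTTP/2 and HTTP/3 encode requests in binary instead). The function name `build_get_request` and the example URL are illustrative only, and nothing is sent over the network.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, raw_request) for a simple HTTP/1.1 GET."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL names none.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"  # a blank line ends the header section
    )
    return parts.hostname, port, request.encode("ascii")


host, port, request = build_get_request("http://example.com/index.html")
```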

DNS Lookup and How the DNS Server Works

When dealing with websites, we usually use domain names instead of IP addresses, mostly because it’s much simpler to remember words than numbers. But the communication between two computers must be over IP (unless devices are located in the same network, using hubs for broadcasting data, or communicating through MAC addresses over a switch, which will be explained in detail later in this article). When an HTTP request is being prepared, an IP address of the target machine must be resolved from the domain name at the application layer. This resolved IP address will be added to the data packet later at the Network Layer in the TCP/IP stack.

Initially, the browser examines its internal cache memory, known as the internal DNS cache, to see if it has the IP address associated with the given domain name stored. You can check chrome://net-internals/#dns to get Chrome’s cache.

If the IP address isn’t found in the browser’s cache, the operating system’s DNS cache is checked next. On Windows, the DNS cache is kept by the DNS Client service. On macOS and Linux, the operating system’s networking subsystem takes care of the DNS cache. You can inspect it with “ipconfig /displaydns” on Windows, “dscacheutil” on macOS, and “nscd” or “systemd-resolve” on Linux, depending on which caching service your distribution uses.

Speaking of which, I’d like to mention DNS spoofing, also known as DNS cache poisoning. In this type of attack, the intruder plants incorrect records in a DNS cache so that the client is redirected to a harmful website. The attack can also target conditional DNS (CDNS) resolvers and potentially compromise entire TLDs by using techniques such as MaginotDNS, which was discovered by a team of researchers from UC Irvine and Tsinghua University.

If the IP address is found in neither the browser’s nor the OS’s DNS cache, the client sends the DNS query to the resolver server, which is generally run by your ISP (Internet Service Provider). The resolver checks its own cache to find the specified IP address.

If the resolver server still can’t find the IP address, it sends the query to the next level, the root servers. The DNS hierarchy has three primary tiers of servers: root servers, top-level domain servers, and authoritative name servers. The root servers sit at the top of this hierarchy. There are 13 sets of root servers (labeled A through M), strategically placed around the world and operated by 12 independent organizations.

By the way, while we’re on the topic, I’d like to highlight that you can find the nearest root server by using this great service, which was developed by my colleague Evren Tan.

The root server does not hold the IP address itself, but it knows where to look next and directs the resolver to the TLD (top-level domain) server. The resolver then asks the TLD server about the specified domain. A TLD server stores the address information for a top-level domain such as .com, .net, or .org. So, as an example, when a TLD server receives a query for the IP address of jotform.com, it is not going to know that IP address. Instead, it guides the resolver to the authoritative name server, which is the ultimate authority on the subject and has all the important information about the domain.

The authoritative name server uses DNS zone files. A zone file is a plain-text file containing all the important information about a domain name. It contains resource records (such as the A record for an IPv4 address and the AAAA record for an IPv6 address) that map domain names to their corresponding IP addresses.

Finally, the resolver queries the authoritative name server to obtain the IP address. Once the resolver gets the IP address, it stores it in its cache so it does not have to repeat all these steps for the next query.
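The lookup chain described above can be simulated with a few dictionaries. Everything in this sketch is invented for illustration: the server names, the `resolve` function, and the address 192.0.2.10, which comes from a range reserved for documentation and is not the real address of jotform.com. No network traffic is involved.

```python
# Toy data standing in for the three tiers of the DNS hierarchy.
ROOT = {"com": "tld-server-for-com"}
TLD = {"tld-server-for-com": {"jotform.com": "authoritative-server-for-jotform"}}
AUTHORITATIVE = {"authoritative-server-for-jotform": {"jotform.com": "192.0.2.10"}}

resolver_cache: dict[str, str] = {}


def resolve(domain: str) -> tuple[str, list[str]]:
    """Return (ip_address, tiers_consulted) for a domain."""
    if domain in resolver_cache:
        return resolver_cache[domain], ["resolver cache"]
    tld_label = domain.rsplit(".", 1)[-1]       # "com" for jotform.com
    tld_server = ROOT[tld_label]                 # root points to the TLD server
    auth_server = TLD[tld_server][domain]        # TLD points to the authoritative server
    ip_address = AUTHORITATIVE[auth_server][domain]
    resolver_cache[domain] = ip_address          # remember the answer for next time
    return ip_address, ["root", "tld", "authoritative"]


first = resolve("jotform.com")
second = resolve("jotform.com")
```

The first call walks all three tiers; the second is answered from the resolver's cache, which is the shortcut the caching step exists to provide.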

All of the above processes take place in the application layer, which is responsible for handling high-level protocols and application-related tasks. At the application layer, we are just dealing with the data, which has not been broken into segments yet. Once the data is formatted and the destination IP address is resolved, the request is moved to the transport layer.

The first thing the transport layer does is divide data into smaller segments (for TCP connection) or datagrams (for UDP connection). Then it assigns virtual port numbers to handle multiple services simultaneously. So what is a virtual port and how is it different from a physical port that is located on a network device?

Virtual Ports

Each client-server connection is uniquely identified by the combination of source IP, source port, destination IP, and destination port. This combination ensures that multiple concurrent connections from the same client can be distinguished from one another.
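A small sketch of why the four values are enough: two connections from the same client to the same server port still get different keys because their source ports differ. The `ConnectionKey` type, the addresses, and the labels are hypothetical.

```python
from typing import NamedTuple


class ConnectionKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


connections: dict[ConnectionKey, str] = {}

# Same client, same server, same destination port; only the source port differs.
connections[ConnectionKey("192.0.2.10", 49152, "198.51.100.7", 443)] = "browser tab 1"
connections[ConnectionKey("192.0.2.10", 49153, "198.51.100.7", 443)] = "browser tab 2"

owner = connections[ConnectionKey("192.0.2.10", 49153, "198.51.100.7", 443)]
```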

At this point, we are not talking about the physical ports (which we will discuss later in the article). The virtual port is a logical endpoint that programs and services use to exchange information. It determines which program or service on a computer is going to be used, whether it’s pulling up a web page (80 or 443), using an FTP service (21), or sending email over SMTP (25).

Each port is identified by a unique number in the range 0–65535. Port numbers are assigned by an organization called the Internet Assigned Numbers Authority (IANA) and are separated into 3 different categories:

  • 0–1023 are system ports
  • 1024–49151 are registered ports, which are registered by companies or developers
  • 49152–65535 are dynamic or private ports
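The three categories translate directly into a range check. The function name `port_category` is an assumption for this sketch, and the registered range is taken to start at 1024, directly after the system ports.

```python
def port_category(port: int) -> str:
    """Classify a port number into one of the three ranges listed above."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if port <= 1023:
        return "system"
    if port <= 49151:
        return "registered"
    return "dynamic"


categories = {port: port_category(port) for port in (80, 443, 8080, 49152)}
```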

A port number is always associated with an IP address, and the two work together to exchange data on a network. The IP address identifies the machine on the network, while the port number determines which service on that machine the client wants to use.

Besides the target machine’s port number, the client machine also needs to assign a port number to itself temporarily during a session. This is called ephemeral port assignment, and it is handled by the client’s operating system at the transport layer of the networking stack. The selection is typically based on an algorithm that ensures uniqueness and fairness. The selected ephemeral port is reserved for the duration of the connection. This prevents other applications on the same client system from using the same port for concurrent connections, avoiding port conflicts.

When the client’s application communicates with a server, the selected ephemeral port number is used as the source port in the TCP header of outgoing packets. This allows the server to know which client initiated the connection and how to send responses back to the correct port. After the connection is terminated, the operating system releases the previously assigned ephemeral port. It becomes available for future use by other connections.
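You can watch the operating system hand out an ephemeral port with Python's `socket` module. The sketch below stays entirely on the loopback interface; the function name `observe_ephemeral_port` is hypothetical, and the exact range the port comes from depends on the operating system and may differ from the dynamic range listed earlier.

```python
import socket


def observe_ephemeral_port() -> tuple[int, int, tuple[str, int]]:
    """Open a loopback connection and report the ports on both ends."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 asks the OS to pick a free port
        server.listen(1)
        server_port = server.getsockname()[1]

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # The client never names its own port; the OS assigns one on connect.
            client.connect(("127.0.0.1", server_port))
            client_port = client.getsockname()[1]

            conn, peer = server.accept()  # peer is the (ip, port) the server sees
            conn.close()

    return server_port, client_port, peer


server_port, client_port, peer = observe_ephemeral_port()
```

The `peer` address the server receives contains the client's ephemeral port, which is how the server knows where to send its response.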

Finally, the transport layer adds a header to each segment or datagram. For TCP, it then establishes a connection between the two devices by performing a 3-way handshake; for HTTPS, a TLS handshake follows to set up encryption.
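The TCP 3-way handshake boils down to three messages whose sequence and acknowledgment numbers depend on each other. This sketch only models that bookkeeping; the function name, the dictionary layout, and the initial sequence numbers 1000 and 5000 are assumptions (real stacks pick initial sequence numbers unpredictably).

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three handshake messages as simple dictionaries."""
    # 1. The client proposes its initial sequence number.
    syn = {"from": "client", "flags": "SYN", "seq": client_isn, "ack": None}
    # 2. The server acknowledges it (client_isn + 1) and proposes its own.
    syn_ack = {"from": "server", "flags": "SYN+ACK",
               "seq": server_isn, "ack": syn["seq"] + 1}
    # 3. The client acknowledges the server's number (server_isn + 1).
    ack = {"from": "client", "flags": "ACK",
           "seq": syn_ack["ack"], "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


steps = three_way_handshake(client_isn=1000, server_isn=5000)
```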

Once the destination IP address is resolved and the port number added to the segment at the transport layer, the internet layer (where the internet protocol operates) comes into play. Its main responsibility is encapsulating packets with the header information (source and destination IP addresses) and forwarding them to the next router.

Routers are devices that forward data from one network to another based on the destination IP address. When a packet arrives at a router, the router determines whether the packet is meant for its own network or for another network. If the packet is meant for another network, the router consults its routing table to choose the best next router to send the packet to, according to the destination IP address in the packet. This decision is made in real time, as packets are being forwarded, based on the current state of the network.
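One common way a routing table picks the next router is longest-prefix matching: among all entries that contain the destination, the most specific one wins. The table below is invented for illustration (all addresses come from documentation ranges), and real routers also weigh metrics and policies that this sketch ignores.

```python
import ipaddress

# Hypothetical routing table: (destination network, next-hop router).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),          # default route
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.2"),
    (ipaddress.ip_network("198.51.100.128/25"), "192.0.2.3"),  # more specific
]


def next_hop(destination: str) -> str:
    """Pick the next hop using the longest matching prefix."""
    address = ipaddress.ip_address(destination)
    matches = [(network, hop) for network, hop in ROUTES if address in network]
    _, best_hop = max(matches, key=lambda match: match[0].prefixlen)
    return best_hop


hop_specific = next_hop("198.51.100.200")
hop_default = next_hop("203.0.113.9")
```

Because the default route 0.0.0.0/0 matches every IPv4 address, the list of matches is never empty in this table.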

Each packet also has a TTL (time to live) field in its header, which is decremented at each hop (each router); the packet is discarded if the value reaches zero. If the router determines that the packet is meant for its own network, it accepts the packet through the gateway of the local network. At this point, the router uses a service called NAT (Network Address Translation) to translate the public IP address to the private IP address of the target device.
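The TTL rule is easy to simulate: subtract one per router and drop the packet when the counter hits zero. The function name `traverse` and the hop counts are assumptions for this sketch.

```python
def traverse(ttl: int, routers_on_path: int) -> tuple[bool, int]:
    """Return (delivered, routers_reached) for a packet with the given TTL."""
    for router in range(1, routers_on_path + 1):
        ttl -= 1            # every router decrements the TTL
        if ttl <= 0:
            return False, router  # discarded at this router
    return True, routers_on_path


long_lived = traverse(ttl=64, routers_on_path=12)
short_lived = traverse(ttl=3, routers_on_path=5)
```

With a TTL of 3 and five routers on the path, the packet is dropped at the third router, which is what stops packets from circulating forever in a routing loop.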

The reason we need NAT and private IP addresses is that, although there are over 4 billion IPv4 addresses available, that is not enough for today’s internet infrastructure. To overcome this issue, private IP addresses were introduced for use in local networks. The router assigns each internal device a private IP address, and NAT in the router translates those private addresses to the network’s one public IP address. With IPv6, NAT is no longer needed for this purpose: every single device in the world can have its own public IP address, since there are about 340 undecillion IPv6 addresses available.
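The address counts follow from the address widths (32 bits for IPv4, 128 bits for IPv6), and the translation itself can be sketched as a lookup table. The variant shown maps each connection to a different public port, which is one common form of NAT; the class name `Nat`, the starting port 50000, and all addresses are assumptions for illustration.

```python
import ipaddress

IPV4_TOTAL = 2 ** 32    # 4,294,967,296 addresses
IPV6_TOTAL = 2 ** 128   # roughly 3.4 x 10^38 addresses


class Nat:
    """Toy port-based NAT: many private addresses share one public address."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.table: dict[int, tuple[str, int]] = {}
        self.next_port = 50000  # arbitrary starting point for public ports

    def outbound(self, private_ip: str, private_port: int) -> tuple[str, int]:
        """Rewrite a private source address to the shared public one."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (private_ip, private_port)
        return self.public_ip, public_port

    def inbound(self, public_port: int) -> tuple[str, int]:
        """Map a reply arriving on a public port back to the private device."""
        return self.table[public_port]


nat = Nat("203.0.113.5")
laptop = nat.outbound("192.168.1.20", 49152)
phone = nat.outbound("192.168.1.21", 49152)
reply_target = nat.inbound(phone[1])
is_private = ipaddress.ip_address("192.168.1.20").is_private
```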

The internet layer also supports quality of service (QoS) by prioritizing traffic when activities like video or voice communication need more bandwidth.

The fourth layer, the network access layer, is responsible for physical network communication and data link protocols. It contains two sublayers: LLC (Logical Link Control), which provides services to the layer above and talks to the internet layer, and MAC (Media Access Control), which defines how devices access the medium. MAC addressing (host addressing) is necessary once we reach the local area network, where it is used to locate the target device.

Hub vs Switch

Once the packet reaches the gateway of a LAN (local area network), it can pass through devices such as a hub or a switch on the way to its destination. The purpose of a hub is to connect all of your network devices together on the internal network. It is a device with multiple physical ports that accept Ethernet connections from other network devices.

The main thing about a hub is its lack of intelligence. A hub has no idea where the data is supposed to be sent. So when data arrives at one of its ports, it is copied to all of the ports and sent (broadcast) to every device connected to them. This design not only creates security issues, but it also generates unnecessary traffic on the network, which wastes bandwidth.

The switch is very similar to a hub, but it is intelligent. A switch can actually learn physical addresses, called MAC addresses, of the devices that are connected to it. The MAC (media access control) address is a unique identifier for a particular device on a network. It is a six-byte hexadecimal number that is burned into every NIC (network interface card) by its manufacturer.
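The difference between the two devices can be sketched in a few lines. The class names, the four-port size, and the MAC addresses are hypothetical, and the hub here repeats data to every port except the one it arrived on.

```python
class Hub:
    """Repeats incoming data out of every other port."""

    def __init__(self, port_count: int) -> None:
        self.ports = range(port_count)

    def forward(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        return [port for port in self.ports if port != in_port]


class Switch:
    """Learns which MAC address sits behind which port."""

    def __init__(self, port_count: int) -> None:
        self.ports = range(port_count)
        self.mac_table: dict[str, int] = {}

    def forward(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        self.mac_table[src_mac] = in_port          # learn the sender's port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]       # known: send to one port only
        return [port for port in self.ports if port != in_port]  # unknown: flood


MAC_A, MAC_B = "02:00:00:00:00:0a", "02:00:00:00:00:0b"
hub, switch = Hub(4), Switch(4)

hub_out = hub.forward(0, MAC_A, MAC_B)
first = switch.forward(0, MAC_A, MAC_B)    # B is unknown, so the switch floods
second = switch.forward(1, MAC_B, MAC_A)   # A was learned on port 0
third = switch.forward(0, MAC_A, MAC_B)    # B was learned on port 1
```

After a frame from each device, the switch delivers to a single port, while the hub keeps sending everything everywhere.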

The MAC address, also called the physical address or hardware address, is broken up into two parts. The first three bytes (the OUI, or organizationally unique identifier) identify the manufacturer of the NIC, and the last three bytes (the NIC identifier) are a unique number from the manufacturer that identifies each device on a network. MAC addresses are used to identify a device, whereas IP addresses are used to locate the device.
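Splitting a MAC address into its two halves is a matter of taking the first three and last three bytes. The function name `split_mac` is an assumption, and the address used is a made-up example rather than a real device.

```python
def split_mac(mac: str) -> tuple[str, str]:
    """Return (oui, nic_identifier) for a colon- or hyphen-separated MAC address."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(octet) != 2 for octet in octets):
        raise ValueError(f"not a six-byte MAC address: {mac}")
    int("".join(octets), 16)  # raises ValueError if any character is not hex
    return ":".join(octets[:3]), ":".join(octets[3:])


oui, nic = split_mac("00:1A:2B:3C:4D:5E")
```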

After the packet is received by the target device, the whole encapsulation process occurs in reverse, which is called decapsulation.

In conclusion, when you click on a website, there’s a lot going on behind the scenes to make it work smoothly. Think of it like a well-organized team making your internet adventures happen effortlessly. So, enjoy your online journey knowing there’s a whole system working for you :).

References

HTTP/3

What happens when you type a URL into your browser?

Root Servers

Nearest Root Server Finder

MaginotDNS attacks exploit weak checks for DNS cache poisoning

DNS Zone

TCP/IP Protocols

TCP 3 Way Handshake