In this modern world engulfed in technology, I believe almost everyone has interacted with the internet. Children as young as five are already exposed to powerful devices that give them access to free and abundant information.
Have you ever wondered about the sequence of processes involved whenever you type something into your browser in the quest to get informed? That is the purpose of this article: to dissect and explain the nitty-gritty behind the engineering of the internet.
Let's dive in.
Brief Description and Understanding
Whenever you search for something in your browser, in our case https://www.google.com, a series of processes takes place in a fraction of a second before you get a response.
Let me expand on this as a list:
When you type google.com and press Enter, your computer, which in this case is referred to as the client, first sends a request to the Domain Name System (DNS), a service that maps domain names to IP addresses much like an address book. In other words, it returns the IP address that https://www.google.com points to.
Once your computer has this IP address, it can establish a connection with the server at that address. This connection uses the Transmission Control Protocol (TCP), which runs on top of the Internet Protocol (IP). The process of establishing this kind of connection is referred to as a handshake. If the computer is behind a firewall, the firewall has to allow the request to pass through.
After establishing the connection, your browser sends the request using an encryption protocol, Secure Sockets Layer (SSL) or its successor Transport Layer Security (TLS), because we are making an HTTPS request (the 's' stands for secure).
Sometimes people or companies that receive a lot of traffic use what we call a load balancer. The work of the load balancer, just as the name suggests, is to distribute the load of traffic among the servers based on the algorithm it is configured with. The request from the client therefore arrives at the load balancer first, and the load balancer, depending on its algorithm, chooses which server to send the request to.
The server that receives the request from the load balancer then builds a response, which most of the time consists of the HTML, CSS and JavaScript files that make up the Google homepage. For dynamic content, the web server contacts the application server, which may in turn make a request to the database server depending on the information needed.
Once it has all this information, the web server sends it to the load balancer as a response, and the load balancer finally sends it to the client (the browser). The browser then does its magic to render that information in a human-readable manner, thanks to HTML, CSS and the rest.
You may be wondering how all of these steps can take place so quickly. The answer is that computers and networks are really fast. Let us appreciate technology!
DNS Request
DNS and How it works
DNS stands for Domain Name System. It is like the phonebook of the internet. Humans access information online through domain names, like google.com, while web browsers locate servers through Internet Protocol (IP) addresses. Since humans cannot easily memorize IP addresses, DNS translates domain names to IP addresses so browsers can load internet resources. Each device reachable on the internet has an IP address that other devices use to find it.
A series of steps takes place whenever you type a domain name in the browser. First, before DNS is involved, the browser checks its own cache for a copy of the DNS record for the domain you are looking up. If one exists, the browser takes the IP address from the cache and sends its request to the server without going to DNS.
Let's explore the DNS lookup process
The browser sends a request to the local DNS resolver, which is usually provided by the Internet Service Provider (ISP). The DNS resolver checks its cache for a copy of the DNS record for the domain. If it is available, the resolver sends the IP address to the browser.
If a copy of the DNS record is not available, the local DNS resolver sends a request to a root name server, which responds with the address of the Top Level Domain (TLD) nameserver, e.g. the one for .com or .org. The resolver then asks the TLD nameserver, which responds with the address of the authoritative nameserver for the domain.
The local DNS resolver then sends a request to the authoritative nameserver, which responds with the IP address of the domain. The local DNS resolver finally sends this IP address to the browser, and the browser sends a request to that IP address to retrieve information.
Once the browser receives the IP address, it caches it for future requests, and the local DNS resolver keeps its own copy as well. This shortens later lookups, since the address is already available in the cache.
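To make the lookup steps above more concrete, here is a small Python sketch of a caching resolver. It is only a simulation that runs in memory: the server names (tld-com, ns-google), the Resolver class and the address 192.0.2.10 (a reserved documentation address, not Google's real IP) are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical in-memory "servers", for illustration only.
ROOT_SERVER = {"com": "tld-com"}                         # root -> TLD nameserver
TLD_SERVERS = {"tld-com": {"google.com": "ns-google"}}   # TLD -> authoritative nameserver
AUTHORITATIVE = {"ns-google": {"www.google.com": "192.0.2.10"}}


class Resolver:
    """Toy local DNS resolver that caches the answers it finds."""

    def __init__(self) -> None:
        self.cache = {}            # hostname -> IP address
        self.upstream_queries = 0  # how many servers we had to ask

    def resolve(self, hostname: str) -> Optional[str]:
        # Step 0: answer from the cache when possible.
        if hostname in self.cache:
            return self.cache[hostname]

        labels = hostname.split(".")
        tld = labels[-1]                # e.g. "com"
        domain = ".".join(labels[-2:])  # e.g. "google.com"

        # Step 1: the root server points us to the TLD nameserver.
        self.upstream_queries += 1
        tld_server = ROOT_SERVER.get(tld)
        if tld_server is None:
            return None

        # Step 2: the TLD nameserver points us to the authoritative nameserver.
        self.upstream_queries += 1
        auth_server = TLD_SERVERS[tld_server].get(domain)
        if auth_server is None:
            return None

        # Step 3: the authoritative nameserver returns the IP address.
        self.upstream_queries += 1
        address = AUTHORITATIVE[auth_server].get(hostname)
        if address is not None:
            self.cache[hostname] = address  # remember it for next time
        return address
```

Calling resolve("www.google.com") twice on the same Resolver only contacts the simulated servers the first time; the second call is answered from the cache.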
TCP/IP
TCP/IP stands for Transmission Control Protocol/Internet Protocol. It is a suite of communication protocols used to interconnect network devices on the internet. It is also used for communication within private networks, such as intranets and extranets.
TCP/IP specifies how data is exchanged over the internet by providing end-to-end communications that identify how it should be broken into packets, addressed, transmitted, routed and received. TCP/IP uses the client-server model of communication in which a user is provided a service by another computer (server).
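As a rough illustration of data being broken into packets and put back together, the sketch below splits a message into numbered chunks and reassembles them even when they arrive out of order. The function names and the chunk size of 8 bytes are assumptions for this example; real TCP adds checksums, acknowledgements and retransmission on top of this idea.

```python
def split_into_segments(data: bytes, size: int) -> list:
    """Split data into (offset, chunk) pairs; the offset acts as a sequence number."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]


def reassemble(segments) -> bytes:
    """Put the chunks back in order using their offsets."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return b"".join(chunk for _offset, chunk in ordered)


message = b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
segments = split_into_segments(message, 8)
# Even if the segments arrive in reverse order, the receiver can rebuild the message.
restored = reassemble(list(reversed(segments)))
```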
When a browser uses an IP address to ask a server for a connection, the server sends back a message acknowledging the request, and the connection is established through the handshake process.
After this, the browser can send a request for the resource it wants to access (for example, google.com). Upon receiving the request, the server sends a response, which is basically the HTML code for the Google homepage. TCP makes this exchange reliable by delivering the data in order and resending anything that gets lost along the way.
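Here is a minimal sketch of that exchange using Python's socket module on the local machine. The tiny server, its fixed reply and the helper names (serve_once, fetch, demo) are assumptions for illustration and are nothing like Google's real servers. The point is that the TCP handshake happens when the client connects, before any request is sent.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read the request and send a fixed reply."""
    conn, _addr = listener.accept()
    with conn:
        request = conn.recv(1024)
        if request.startswith(b"GET "):
            conn.sendall(b"HTTP/1.1 200 OK\r\n\r\n<html>hello</html>")


def fetch(host: str, port: int) -> bytes:
    """Connect over TCP, send a request and read the whole response."""
    # create_connection returns only after the TCP handshake has completed.
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo() -> bytes:
    """Run the toy server and client together on the loopback interface."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
    port = listener.getsockname()[1]
    thread = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    thread.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        thread.join(timeout=5)
        listener.close()
```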
Firewall
A firewall is a network security device that monitors incoming and outgoing network traffic and permits or blocks data packets based on a set of security rules. Its purpose is to establish a barrier between your internal network and incoming traffic from external sources (such as the internet) to block malicious traffic like viruses and hackers.
Firewalls carefully analyze incoming traffic based on pre-established rules and filter traffic coming from unsecured or suspicious sources to prevent attacks. Firewalls guard traffic at a computer's entry points, called ports, which are where information is exchanged with external devices.
Firewall rules can be grouped into two categories, based on where the traffic comes from and on what type of traffic it is:
Rules that block traffic based on the source and destination of the request. These rules block traffic from certain geographical locations or allow only certain IP addresses to access the network.
Rules that block traffic based on its type. These rules block traffic on certain ports, such as ones mostly used by malware, or allow only specific types of traffic, e.g. HTTPS traffic only.
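The two kinds of rules can be sketched in a few lines of Python. This is only a toy model: the Rule class, the decide function and the example addresses (taken from reserved documentation ranges) are assumptions, and a real firewall inspects far more than a source address and a port.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                   # "allow" or "block"
    source: Optional[str] = None  # CIDR network; None matches any source
    port: Optional[int] = None    # destination port; None matches any port


def decide(rules, source_ip: str, port: int, default: str = "block") -> str:
    """Return the action of the first rule that matches, else the default."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        source_ok = rule.source is None or address in ipaddress.ip_network(rule.source)
        port_ok = rule.port is None or rule.port == port
        if source_ok and port_ok:
            return rule.action
    return default


RULES = [
    Rule("block", source="203.0.113.0/24"),  # source-based rule: a blocked network
    Rule("allow", port=443),                 # type-based rule: allow HTTPS only
]
```

With these rules, a request from 203.0.113.7 is blocked even on port 443, while other sources are allowed on port 443 and blocked everywhere else by the default.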
HTTPS/SSL
HTTPS stands for Hypertext Transfer Protocol Secure. It is a secure version of HTTP and is used to transmit data on the internet securely.
SSL, on the other hand, stands for Secure Sockets Layer. It is an encryption protocol used to secure data transmitted over HTTPS. Modern browsers and servers use its successor, Transport Layer Security (TLS), although the name SSL is still widely used.
When you enter the URL google.com in the browser, Google's server sends its public key and certificate (signed by GeoTrust in this example) to the browser. The browser verifies the authenticity of the certificate, that is, that it really was signed by GeoTrust. Since browsers come with a pre-installed list of public keys from major Certificate Authorities (CAs), the browser picks the public key of GeoTrust in this case and uses it to check the certificate's digital signature, which was created with GeoTrust's private key.
If the signature checks out, the browser treats the website as trustworthy and proceeds to retrieve information.
During this handshake, the browser and the server also agree on a shared session key. When Google sends data such as the requested HTML documents and other HTTP data to the browser, it first encrypts the data with this session key, and the browser decrypts the data with its own copy of the session key. Similarly, when the browser sends data to the Google server, it encrypts it with the session key, and the server decrypts it on the other side.
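For a feel of how a client sets this up in practice, here is a small sketch using Python's ssl module. The helper names are assumptions for this example. ssl.create_default_context() loads the system's trusted CA certificates and turns on certificate and hostname checks, which is roughly what a browser does with its pre-installed CA list. The second function needs network access, so it is only defined here and not called.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS context that verifies the server's certificate and hostname."""
    return ssl.create_default_context()


def fetch_peer_certificate(hostname: str, port: int = 443) -> dict:
    """Connect over TLS and return the server certificate as a dictionary.

    The TLS handshake (certificate check and session key agreement) happens
    inside wrap_socket; an untrusted certificate raises an ssl.SSLError subclass.
    """
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.getpeercert()


context = make_client_context()
```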
Load balancer
Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool.
Google, being a big and established company, receives very high traffic: hundreds of thousands, if not millions, of concurrent requests from users or clients, each of which must be answered with the required content. To cost-effectively scale to meet these high volumes, modern computing best practice generally requires adding more servers.
A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.
In our case, if a browser is trying to access google.com, the request is first received by the load balancer. The load balancer then forwards the request to one of the servers in the Google server network using whichever algorithm it implements. This server processes the request and sends the response back to the load balancer, which then forwards it to the browser.
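One of the simplest algorithms a load balancer can use is round robin, where requests are handed to the servers in turn. The sketch below is a toy version in Python; the class name, its methods and the server names (web-1, web-2, ...) are assumptions for illustration, and nothing here reflects which algorithm Google actually uses.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # position of the next server to try

    def mark_down(self, server) -> None:
        self.healthy.discard(server)

    def add_server(self, server) -> None:
        self.servers.append(server)
        self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

If a server is marked as down it simply stops being chosen, and a newly added server starts receiving requests on the next rotation.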
Web server
A web server is a computer program that is responsible for handling requests from other computers. The web server receives requests from clients, processes them and sends back a response.
When a client tries to access google.com, the request is sent to the load balancer, which in turn forwards it to the web server. The web server processes the request and generates a response, which is basically a set of HTML, CSS and JavaScript files. The web server then sends this response to the load balancer, which finally forwards it to the browser for rendering and viewing.
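The request-in, response-out job of a web server can be sketched with Python's WSGI interface. The app function, the page content and the call_app helper below are assumptions made up for this example; call_app invokes the application directly instead of going through a real network connection.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A tiny WSGI application: HTML for '/', a 404 page for anything else."""
    if environ["PATH_INFO"] == "/":
        status = "200 OK"
        body = b"<html><body><h1>Home</h1></body></html>"
    else:
        status = "404 Not Found"
        body = b"<html><body>Not found</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(path: str):
    """Simulate one request to the app and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

The same app could be served over HTTP with wsgiref.simple_server.make_server from the standard library.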
Application server
Application servers are high-powered computers providing resources to users and web clients. They sit between database servers storing application data and web servers communicating with clients.
When you type the URL google.com in the browser, the request is sent to the load balancer, which then sends it to the web server. The web server forwards the request to the application server, which processes it and generates the search results.
Database server
A database server runs a database management system and provides database services to clients. The server manages data access and retrieval and completes clients’ requests. The database server stores the Database Management System (DBMS) and the database itself. Its main role is to receive requests from client machines, search for the required data, and pass back the results.
Whenever a request is sent, it goes through the load balancer to the web server. From the web server, it is forwarded to the application server, and depending on the request, the application server may need to contact the database to retrieve information, which is later rendered to the client as a response.
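The last hop, from application server to database and back, might look like the sketch below. It uses an in-memory SQLite database from Python's standard library; the pages table, its rows, the example.com URLs and the function names are all assumptions for illustration.

```python
import html
import sqlite3


def create_database() -> sqlite3.Connection:
    """Create an in-memory database with a few hypothetical pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("Python tutorial", "https://example.com/tutorial"),
            ("Python sockets", "https://example.com/sockets"),
            ("DNS basics", "https://example.com/dns"),
        ],
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> list:
    """Application-server logic: ask the database for matching pages."""
    query = "SELECT title, url FROM pages WHERE title LIKE ? ORDER BY title"
    return conn.execute(query, (f"%{term}%",)).fetchall()


def render_results(rows) -> str:
    """Turn the database rows into HTML for the web server to send back."""
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for title, url in rows
    )
    return f"<ul>{items}</ul>"
```

The search term is passed as a query parameter rather than pasted into the SQL string, which keeps user input from being interpreted as SQL.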
Conclusion
In summary, this is what it takes for a page to be rendered in your browser whenever you type https://www.google.com.
NOTE: This is a technical write-up for the ALX Africa Software Engineering program, which required a detailed explanation of what happens when you type google.com in your browser and press Enter. I hope you enjoyed it and are now better informed about the engineering behind client-server communication.