What happens when you type https://www.holbertonschool.com in your browser and press Enter?
Today we will try to answer this question by walking through the following sections:
- DNS request
- TCP / IP
- Firewall
- HTTPS / SSL
- Load-balancer
- Web server
- Application server
- Database
We will address each of the mentioned sections and rest assured that after reading this complete article, you will understand how all these processes work.
What is DNS?
The Domain Name System (DNS) is the telephone directory of the Internet. People access information online using domain names, such as nytimes.com or espn.com. Web browsers interact using Internet Protocol (IP) addresses. DNS translates domain names to IP addresses so that browsers can load Internet resources.
Each device connected to the Internet has a unique IP address that other computers can use to find it. DNS servers eliminate the need for humans to memorize IP addresses such as 192.168.1.1 (in IPv4) or the newer, more complex alphanumeric IP addresses, such as 2400:cb00:2048:1::c629:d7a2 (in IPv6).
Those are the basics of DNS. We will go deeper in a few moments, but first let’s continue with the other sections.
What is a domain name?
A domain name is a string of text that maps to a numeric IP address, used to access a website from client software. In plain English, a domain name is the text that a user types into a browser window to reach a particular website. For instance, the domain name for Google is ‘google.com’.
The actual address of a website is a complex numerical IP address (e.g. 103.21.244.0), but thanks to DNS, users are able to enter human-friendly domain names and be routed to the websites they are looking for. This process is known as a DNS lookup.
What is HTTPS?
HTTPS stands for hypertext transfer protocol secure and is the encrypted version of HTTP. It is used for secure communication across the internet or a network. The communication protocol is encrypted using Transport Layer Security (TLS) or, formerly, Secure Sockets Layer (SSL).
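As a minimal sketch of the client side, Python's standard `ssl` module can build a TLS configuration with the verification settings a browser would typically insist on. The minimum protocol version chosen here is an assumption for illustration, not something this article prescribes.

```python
import ssl

# Build a client-side TLS context with the library's secure defaults.
context = ssl.create_default_context()

# Assumption for illustration: refuse anything older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# The defaults already require a valid certificate that matches the host name.
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

No connection is opened here; the context only describes how a later connection would be secured.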
What is a Subdomain?
In the DNS (Domain Name System) hierarchy, a subdomain is a domain that is part of a larger domain, and it can host an entirely separate website. On your domain, you can create multiple independent websites this way, often at no extra cost.
If your website domain is example.com, you can set up different subdomains, such as blog.example.com and shop.example.com.
Here are some popular examples of what a subdomain is:
- analytics.twitter.com
- business.facebook.com
- support.google.com
- blog.hubspot.com
What Does Top-Level Domain (TLD) Mean?
Top-level domain (TLD) refers to the last segment of a domain name, the part that follows the final “dot” symbol.
For example, in the internet address: https://www.holbertonschool.com, the “.com” portion is the TLD.
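To make the pieces of a host name concrete, here is a small sketch that splits one into subdomain, domain, and TLD. The function name `split_hostname` is hypothetical, and the approach is deliberately naive: it assumes the TLD is a single label, so suffixes such as ".co.uk" would be split incorrectly.

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a host name into subdomain, domain, and TLD.

    Assumes the TLD is a single label (".com"); multi-label public
    suffixes such as ".co.uk" are not handled by this sketch.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least two labels: {hostname!r}")
    return {
        "subdomain": ".".join(labels[:-2]),  # empty when there is none
        "domain": labels[-2],
        "tld": labels[-1],
    }


parts = split_hostname("www.holbertonschool.com")
print(parts)
```

For `www.holbertonschool.com`, the sketch reports `www` as the subdomain, `holbertonschool` as the domain, and `com` as the TLD.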
TLDs are mainly classified into two categories: generic TLDs and country-specific TLDs.
Examples of some of the popular TLDs include:
- .com
- .org
- .net
- .gov
- .biz
- .edu
The Internet Corporation for Assigned Names and Numbers (ICANN) is the entity that coordinates domains and IP addresses for the internet.
Historically, TLDs represented the purpose and type of domain or the geographical area from which it originated. ICANN has generally been very strict about opening up new TLDs, but in 2011, it decided to allow the creation of numerous new generic TLDs as well as TLDs for company-specific trademarks.
What Is an IP Address?
Put simply, an IP address (short for Internet Protocol address) is a unique identifier for your machine. Computers have them, but so do tablets and smartphones. And, just like a fingerprint or a snowflake, no two IP addresses are exactly the same.
There are standards for these sorts of things, of course, and the Internet Assigned Numbers Authority (IANA) sets them. There are two primary types of IP addresses in use today: IP version 4 (IPv4) and IP version 6 (IPv6). The former has been around since January 1983, and is still the most common. IPv4 addresses are 32-bit numbers expressed as four octets in so-called “dotted decimal” notation, for example 192.0.2.53.
By 1999, with the commercialization of internet access well underway, experts were concerned that the IANA could actually run out of valid IPv4 addresses. So, the Internet Engineering Task Force, a nonprofit standards organization based in Fremont, California, engineered its successor, IPv6. These are 128-bit numbers, expressed in hexadecimal strings — for instance, 2001:0db8:582:ae33::29.
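The difference between the two formats can be checked with Python's standard `ipaddress` module. This is only a sketch that reuses the two example addresses from the text above.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.53")
v6 = ipaddress.ip_address("2001:0db8:582:ae33::29")

for addr in (v4, v6):
    # max_prefixlen is the address width in bits: 32 for IPv4, 128 for IPv6.
    print(addr.version, addr.max_prefixlen, addr.exploded)

# A dotted-decimal IPv4 address is a 32-bit integer written octet by octet.
as_int = int(v4)
print(as_int)
```

The `exploded` form shows what the `::` shorthand in the IPv6 address stands for: the run of all-zero groups that was left out.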
URL stands for Uniform Resource Locator. URL is the address of the website which you can find in the address bar of your web browser. It is a reference to a resource on the internet, be it images, hypertext pages, audio/video files, etc.
Example:
https://www.holbertonschool.com
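Before anything goes over the network, the browser has to take the URL apart. A rough sketch of that step with Python's `urllib.parse` follows; the `DEFAULT_PORTS` table is an assumption added for illustration.

```python
from urllib.parse import urlsplit

url = urlsplit("https://www.holbertonschool.com")

# The scheme implies a port when the URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}
port = url.port or DEFAULT_PORTS[url.scheme]

# An empty path is requested as "/".
path = url.path or "/"

print(url.scheme, url.hostname, port, path)
```

The host name is what gets handed to DNS, while the scheme and port decide how the connection is made.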
Steps for what happens when we enter a URL:
- The browser checks its caches for a DNS entry to find the IP address of the website.
It looks in the following caches, in order, moving on to the next only if the record is not found:
- Browser cache
- Operating system cache
- Router cache
- ISP cache
- If the record is not found in any cache, the ISP’s (Internet Service Provider) DNS server initiates a DNS query to find the IP address of the server that hosts the domain name.
The requests are sent as small data packets that carry the content of the request and the IP address it is destined for.
- The browser initiates a TCP (Transmission Control Protocol) connection with the server using synchronize (SYN) and acknowledge (ACK) messages.
- The browser sends an HTTP request to the web server, typically a GET or POST request.
- The server on the host computer handles that request and assembles a response in a format such as JSON, XML, or HTML.
- The server sends out an HTTP response along with the status of the response.
- The browser displays the HTML content.
- Finally, done.
But we must not stop here. As we mentioned, we still need to detail what happens when a request reaches our server, and this is where things get even more interesting.
Let’s take a look at how our server infrastructure is put together so that we can go much deeper.
But first, let’s understand which components make up that infrastructure. Don’t worry, we have a diagram prepared to make it even clearer.
Firewall defined
A firewall is a network security device that monitors incoming and outgoing network traffic and permits or blocks data packets based on a set of security rules. Its purpose is to establish a barrier between your internal network and incoming traffic from external sources (such as the internet) in order to block malicious traffic like viruses and hackers.
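A rule-based packet filter can be sketched in a few lines. The rule set below is hypothetical, and the "first match wins, otherwise deny" policy is one common convention rather than the behavior of any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str           # "allow" or "deny"
    source: str           # CIDR block the rule applies to
    port: Optional[int]   # destination port, or None for any port


# Hypothetical rule set: rules are checked top to bottom, first match wins.
RULES = [
    Rule("deny", "203.0.113.0/24", None),  # block one misbehaving network
    Rule("allow", "0.0.0.0/0", 443),       # HTTPS from anywhere
    Rule("allow", "0.0.0.0/0", 80),        # HTTP from anywhere
]


def decide(source_ip: str, dest_port: int) -> str:
    """Return the action of the first rule matching the packet."""
    for rule in RULES:
        in_network = ip_address(source_ip) in ip_network(rule.source)
        port_matches = rule.port is None or rule.port == dest_port
        if in_network and port_matches:
            return rule.action
    return "deny"  # default policy: drop anything not explicitly allowed


print(decide("198.51.100.7", 443))
print(decide("203.0.113.9", 443))
print(decide("198.51.100.7", 22))
```

With these rules, web traffic is let through, the blocked network is refused on every port, and an SSH attempt falls through to the default deny.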
Load Balancer
Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool.
Modern high‑traffic websites must serve hundreds of thousands, if not millions, of concurrent requests from users or clients and return the correct text, images, video, or application data, all in a fast and reliable manner. To cost‑effectively scale to meet these high volumes, modern computing best practice generally requires adding more servers.
A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.
In this manner, a load balancer performs the following functions:
- Distributes client requests or network load efficiently across multiple servers
- Ensures high availability and reliability by sending requests only to servers that are online
- Provides the flexibility to add or subtract servers as demand dictates
Load Balancing Algorithms
Different load balancing algorithms provide different benefits; the choice of load balancing method depends on your needs:
- Round Robin — Requests are distributed across the group of servers sequentially.
- Least Connections — A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.
- Least Time — Sends requests to the server selected by a formula that combines the fastest response time and fewest active connections. Exclusive to NGINX Plus.
- Hash — Distributes requests based on a key you define, such as the client IP address or the request URL. NGINX Plus can optionally apply a consistent hash to minimize redistribution of loads if the set of upstream servers changes.
- IP Hash — The IP address of the client is used to determine which server receives the request.
- Random with Two Choices — Picks two servers at random and sends the request to the one that is selected by then applying the Least Connections algorithm (or for NGINX Plus the Least Time algorithm, if so configured).
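Two of the algorithms above, Round Robin and Least Connections, are simple enough to sketch directly. This is an illustrative toy, not NGINX's implementation: the server names are hypothetical, and the Least Connections version ignores the per-server capacity weighting mentioned above.

```python
import itertools


class RoundRobin:
    """Hand out servers one after another, wrapping around at the end."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server that currently has the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        # On a tie, the server listed first wins.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


servers = ["web-01", "web-02", "web-03"]  # hypothetical backend names

rr = RoundRobin(servers)
rr_order = [rr.pick() for _ in range(4)]

lc = LeastConnections(servers)
first, second = lc.pick(), lc.pick()
lc.release(first)   # the first request finishes
third = lc.pick()   # so its server is the least loaded again
print(rr_order, first, second, third)
```

Round Robin needs no knowledge of the servers' state, while Least Connections has to be told when a request finishes, which is the price of its better balance under uneven load.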
Benefits of Load Balancing
- Reduced downtime
- Scalable
- Redundancy
- Flexibility
- Efficiency
Monitoring
Web server monitoring provides information about the availability and performance of web servers, such as Apache, Microsoft Internet Information Services (IIS), IBM HTTP Server, and Oracle HTTP Server. Using web server monitoring, administrators can detect and resolve web server availability and performance problems.
Web Server
A web server is software, hardware, or both working together that, at a basic level, stores content and delivers it to a web browser. Web servers communicate with browsers using Hypertext Transfer Protocol (HTTP). Some web server products also support SMTP (Simple Mail Transfer Protocol) and FTP (File Transfer Protocol).
Web servers are also used for hosting websites and data for web applications. They can host single websites and multiple websites using virtualization.
Why is it important to understand the answer to the question, how does a web server work? The success of a website doesn’t just depend on its content and functionality but also the efficiency of the web server used to power it. This requires an understanding of a web server’s capabilities and limitations. When discussing how a Web server works, it is not enough to simply outline a diagram of how low-level network packets go in and out of a Web server.
What is NGINX?
NGINX is open source software for web serving, reverse proxying, caching, load balancing, media streaming, and more. It started out as a web server designed for maximum performance and stability. In addition to its HTTP server capabilities, NGINX can also function as a proxy server for email (IMAP, POP3, and SMTP) and a reverse proxy and load balancer for HTTP, TCP, and UDP servers.
Application Server
Applications come in all shapes, sizes, and use cases. In a world where we rely on a host of critical business processes, application servers are the high-powered computers providing application resources to users and web clients.
Application servers physically or virtually sit between database servers storing application data and web servers communicating with clients. App servers and akin middleware are the operating systems supporting an application’s development and delivery. Whether it’s a desktop, mobile, or web app, application servers play a critical role in connecting a world of devices.
Database
A database is a systematic collection of data. Databases support electronic storage and manipulation of data, and they make data management easy.
Let us discuss a database example: An online telephone directory uses a database to store data of people, phone numbers, and other contact details. Your electricity service provider uses a database to manage billing, client-related issues, handle fault data, etc.
Let us also consider Facebook. It needs to store, manipulate, and present data related to members, their friends, member activities, messages, advertisements, and a lot more. We can provide a countless number of examples for the usage of databases.
MySQL is a relational database management system based on SQL — Structured Query Language. The application is used for a wide range of purposes, including data warehousing, e-commerce, and logging applications.
What Does Replication Mean?
Replication is the continuous copying of data changes from one database (publisher) to another database (subscriber). The two databases are generally located on different physical servers, which makes it possible to balance load by distributing database queries between them and provides failover capability. The server for the subscriber database may be configured as a backup in the event of failure of the server for the publisher database.
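The publisher/subscriber idea can be illustrated with a toy in-memory model. The class and method names here are hypothetical, and real database replication (in MySQL, for example) is considerably more involved; the sketch only shows changes being recorded in order and replayed on a copy.

```python
class Publisher:
    """Holds the primary copy and records every change in an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Subscriber:
    """Holds a copy and replays the changes it has not applied yet."""

    def __init__(self):
        self.data = {}
        self.position = 0  # number of log entries already applied

    def catch_up(self, publisher):
        for key, value in publisher.log[self.position:]:
            self.data[key] = value
        self.position = len(publisher.log)


publisher, subscriber = Publisher(), Subscriber()
publisher.write("user:1", "Ada")
publisher.write("user:2", "Linus")
subscriber.catch_up(publisher)

publisher.write("user:1", "Ada Lovelace")   # a later change
lagging = dict(subscriber.data)             # subscriber has not seen it yet
subscriber.catch_up(publisher)
print(lagging, subscriber.data)
```

Between two calls to `catch_up`, the subscriber serves slightly stale data, which is the trade-off that comes with reading from a replica.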
Take a close look at this diagram I made to give us a clearer idea of all the procedures that are involved.
Now what happens when I type holbertonschool.com?
1. You enter the URL in the browser.
Suppose you want to visit the website of Holberton School. So you type holbertonschool.com in the address bar of your browser. When you type any URL, you basically want to reach the server where the website is hosted.
2. The browser looks for the IP address of the domain name in the DNS (Domain Name System).
DNS maps domain names to their corresponding IP addresses, just as a telephone book maps people’s names to phone numbers. We could access a website directly by typing its IP address, but imagine having to remember a group of numbers for every website you visit. Instead, we only remember the name of the website, and DNS maps the name to the IP address.
The lookup checks the following places for the IP address, in order:
- Check Browser Cache: The browser maintains a cache of DNS records for a fixed amount of time. It is the first place the lookup checks.
- Check OS Cache: If the browser cache doesn’t contain the record, the browser asks the underlying operating system, which also maintains a cache of DNS records.
- Router Cache: If your computer doesn’t have the record cached, the query goes to the router, which also keeps a cache of DNS records.
- ISP (Internet Service Provider) Cache: If the IP address is not found in the three places above, the cache of DNS records that the ISP maintains is searched. If it is not found there either, the ISP’s DNS server performs a recursive search. In a “DNS recursive search”, a DNS server initiates a DNS query that communicates with several other DNS servers to find the IP address.
So, the domain name you entered gets converted into an IP address. Suppose the domain name holbertonschool.com has the IP address 100.95.224.1. If we type https://100.95.224.1 in the browser, the request reaches the same server, although the browser may warn that the certificate does not match, and a server that hosts several sites may not know which one to serve.
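The lookup order described above can be sketched as a chain of caches with a fallback. Everything here is illustrative: the caches are plain dictionaries, `fake_recursive_lookup` is a hypothetical stand-in for the ISP's recursive search, and the address is the example one used in this article. Real caches also expire entries after a time to live, which this sketch leaves out.

```python
def resolve(name, caches, recursive_lookup):
    """Check each cache in order, falling back to a recursive lookup.

    Returns the address together with the place where it was found.
    """
    for label, cache in caches:
        if name in cache:
            return cache[name], label

    address = recursive_lookup(name)
    # Remember the answer at every level so the next lookup is faster.
    for _, cache in caches:
        cache[name] = address
    return address, "recursive lookup"


def fake_recursive_lookup(name):
    # Hypothetical stand-in for the ISP resolver querying other DNS servers.
    return {"holbertonschool.com": "100.95.224.1"}[name]


caches = [
    ("browser cache", {}),
    ("OS cache", {}),
    ("router cache", {}),
    ("ISP cache", {}),
]

first_lookup = resolve("holbertonschool.com", caches, fake_recursive_lookup)
second_lookup = resolve("holbertonschool.com", caches, fake_recursive_lookup)
print(first_lookup, second_lookup)
```

The first lookup has to go all the way to the recursive search, while the second is answered straight from the browser cache.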
3. The Browser initiates a TCP connection with the server.
Once the browser has the IP address, it builds a connection to the server using an internet transport protocol. The most common one is TCP. The connection is established using a three-way handshake, a three-step process.
- Step 1 (SYN): The client wants to establish a connection, so it sends a SYN (synchronize sequence number) segment to the server, which tells the server that the client wants to start communicating.
- Step 2 (SYN + ACK): If the server is ready to accept connections and has the port open, it acknowledges the packet sent by the client with a SYN-ACK packet.
- Step 3 (ACK): In the last step, the client acknowledges the response of the server by sending an ACK packet. Hence, a reliable connection is established and data transmission can start now.
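The three steps can be modeled as a small simulation that tracks the sequence and acknowledgment numbers. No real packets are sent; the `Segment` class and the initial sequence numbers are assumptions for illustration, since real TCP stacks choose unpredictable initial values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    sender: str
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments exchanged to open a TCP connection."""
    # Step 1: the client announces its initial sequence number.
    syn = Segment("client", "SYN", seq=client_isn)
    # Step 2: the server answers with its own number and acknowledges
    # the client's by adding one to it.
    syn_ack = Segment("server", "SYN+ACK", seq=server_isn, ack=syn.seq + 1)
    # Step 3: the client acknowledges the server's number the same way.
    ack = Segment("client", "ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```

Each acknowledgment number is the other side's sequence number plus one, which is how both ends confirm they received the opening segment.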
4. The browser sends an HTTP request to the server.
The browser sends a GET request to the server asking for the holbertonschool.com web page. It also sends the cookies that the browser has for this domain. Cookies let websites remember stateful information (such as items in the shopping cart or wishlist on a site like Amazon) or record the user’s browsing activity. The request also carries header fields such as User-Agent, which allow the client to pass information about the request, and about the client itself, to the server. Other header fields, such as Accept-Language, tell the server which languages the client is able to understand. All these header fields are added together to form an HTTP request.
Sample HTTP Request: Now let’s put it all together to form an HTTP request. The HTTP request below fetches the abc.htm page from the web server running on afteracademy.com.
GET /abc.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.afteracademy.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
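A request like the one above is just text with a fixed layout, so it can be assembled by hand. The helper name `build_get_request` is hypothetical; the sketch only builds the bytes and does not send them anywhere.

```python
def build_get_request(host: str, path: str, headers: dict) -> bytes:
    """Assemble the bytes of a minimal HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF, and an empty line closes the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request(
    "www.afteracademy.com",
    "/abc.htm",
    {
        "Accept-Language": "en-us",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "Keep-Alive",
    },
)
print(request.decode("ascii"))
```

A GET request has no body, so the blank line after the headers is the end of the message.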
5. The server handles the incoming request and sends an HTTP response.
The server handles the HTTP request and sends a response. The first line of the response is called the status line. A status line consists of the protocol version (e.g. HTTP/1.1) followed by a numeric status code (e.g. 200) and its associated textual phrase (e.g. OK). The status code is important because it tells the client the outcome of the request.
- 1xx: Informational: It means the request was received and the process is continuing.
- 2xx: Success: It means the action was successful.
- 3xx: Redirection: It means further action must be taken in order to complete the request. It may redirect the client to some other URL.
- 4xx: Client Error: It means there is some sort of error on the client’s part.
- 5xx: Server Error: It means there is some error on the server-side.
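The five classes follow directly from the first digit of the status code, which a short sketch can show. It uses the standard `http.HTTPStatus` enum for the textual phrases; the `describe` helper is hypothetical.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the code, its standard phrase, and its class."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return f"{code} {status.phrase} ({STATUS_CLASSES[code // 100]})"


for code in (200, 301, 404, 500):
    print(describe(code))
```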
It also contains response header fields like Server, Location, etc. These header fields give information about the server. A Content-Length header is a number denoting the exact byte length of the HTTP body. All these headers along with some additional information are added to form an HTTP response.
Sample HTTP Response: Now let’s put it all together to form an HTTP response for a request to fetch the abc.htm page from the web server running on afteracademy.com.
HTTP/1.1 200 OK
Date: Tue, 28 Jan 2020 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Mon, 22 Jul 2019 19:15:56 GMT
Content-Length: 88
Content-Type: text/html
Connection: close
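Going the other way, a client has to split a response like this into its status line and header fields. The following is a simplified sketch with a hypothetical `parse_response_head` helper. It assumes well-formed input and skips details a real client must handle, such as repeated header fields.

```python
def parse_response_head(raw: str):
    """Split a raw HTTP response into status line parts and headers."""
    head, _, _body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/2.2.14 (Win32)\r\n"
    "Content-Length: 88\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)
version, code, reason, headers = parse_response_head(raw)
print(version, code, reason, headers["content-length"])
```

The Content-Length value tells the client how many bytes of body to read after the blank line.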
6. The browser displays the HTML content.
Now the browser gets the response and renders the HTML web page in phases. First, it gets the HTML structure, and then it sends multiple GET requests to fetch the embedded links, images, CSS, JavaScript files, and so on. The web page is then rendered and, in this case, the holbertonschool.com web page is displayed.
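The follow-up requests come from what the browser finds while parsing the HTML. A rough sketch with Python's standard `html.parser` shows the idea. The page content and file names are made up for illustration, and a real browser discovers many more kinds of resources than the three handled here.

```python
from html.parser import HTMLParser


class SubresourceFinder(HTMLParser):
    """Collect URLs the browser would fetch after parsing the HTML."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(attrs["href"])


# Hypothetical page used only for this example.
page = """<html><head><link rel="stylesheet" href="/style.css">
<script src="/app.js"></script></head>
<body><img src="/logo.png"></body></html>"""

finder = SubresourceFinder()
finder.feed(page)
print(finder.urls)
```

Each collected URL would trigger its own GET request, going through the same request and response cycle described in the steps above.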