How Does the Internet Work?

 Contents

  1. Introduction
  2. Where to Begin? Internet Addresses
  3. Protocol Stacks and Packets
  4. Networking Infrastructure
  5. Internet Infrastructure
  6. The Internet Routing Hierarchy
  7. Domain Names and Address Resolution
  8. Internet Protocols Revisited
  9. Application Protocols: HTTP and the World Wide Web
  10. Application Protocols: SMTP and Electronic Mail
  11. Transmission Control Protocol
  12. Internet Protocol
  13. Wrap Up
  14. Bibliography

Introduction


How does the Internet work? Good question! The Internet has grown tremendously, and it seems impossible to escape the constant barrage of www.com's seen on television, heard on the radio, and read in periodicals. Because the Internet has become such an integral part of our lives, a good understanding of it is needed to make the most of this technology.

This whitepaper explains the underlying infrastructure and technologies that make the Internet work. It does not go into great depth, but it covers enough of each area to give a basic understanding of the concepts involved. For any unanswered questions, a list of resources is provided at the end of the paper. Any suggestions, questions, or other feedback are welcome and should be sent to the author at the email address listed above.


Where to Begin? Internet Addresses


Because the Internet is a global network of computers, each computer connected to the Internet must have a unique address. Internet addresses are in the form nnn.nnn.nnn.nnn, where each nnn is a number from 0 to 255. This address is known as an IP address. (IP stands for Internet Protocol; more on this later.)

The diagram below shows two computers connected to the Internet, one with an IP address of 1.2.3.4 and the other with an IP address of 5.6.7.8. The Internet is represented as an abstract object in the middle. (As this paper progresses, the Internet portion of Diagram 1 will be explained and redrawn several times as the details of the Internet are exposed.)
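
To make the dotted form concrete, here is a small Python sketch. It is only an illustration: the helper name dotted_to_int is made up for this example. It shows that an address such as 1.2.3.4 is simply four numbers in the range 0 to 255 packed into one 32-bit value, and checks the result against Python's standard ipaddress module.

```python
import ipaddress

def dotted_to_int(address: str) -> int:
    """Convert nnn.nnn.nnn.nnn into the single 32-bit number it represents."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        value = value * 256 + part   # each field contributes 8 bits
    return value

# The standard library agrees with the hand-rolled conversion.
assert dotted_to_int("1.2.3.4") == int(ipaddress.IPv4Address("1.2.3.4"))
print(dotted_to_int("1.2.3.4"))   # 16909060
```

Because each of the four fields holds 8 bits, the whole address fits in 32 bits, which is why the fields cannot exceed 255.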

Diagram 1


When you use an Internet Service Provider (ISP) to connect to the Internet, you are normally given a temporary IP address for the duration of your dial-up session. If you connect to the Internet through a LAN, your computer may have a permanent IP address or receive a temporary one from a DHCP (Dynamic Host Configuration Protocol) server. In any event, your computer has a unique IP address if it is linked to the Internet.


Protocol Stacks and Packets


So your computer is connected to the Internet and has a unique address. How does it 'talk' to other computers connected to the Internet? An example should serve here: suppose your IP address is 1.2.3.4 and you want to send a message to the computer 5.6.7.8. The message you want to send is "Hello computer 5.6.7.8!". Obviously, the message must be transmitted over whatever kind of wire connects your computer to the Internet. Let's say you've dialled into your ISP from home, so the message must be transmitted over the phone line. The message must therefore be translated from alphabetic text into electronic signals, transmitted over the Internet, and then translated back into alphabetic text. How is this accomplished? Through the use of a protocol stack. Every computer needs one to communicate on the Internet, and it is usually built into the computer's operating system (e.g. Windows, Unix, etc.). The protocol stack used on the Internet is referred to as the TCP/IP protocol stack because of the two major communication protocols used.


If we followed the path taken by the message "Hello computer 5.6.7.8!" from our computer to the computer with the IP address 5.6.7.8, it would look like this:

Diagram 2

  1. The message would begin at the top of your computer's protocol stack and work its way down.
  2. If the message to be sent is long, each stack layer that the message passes through may break it up into smaller chunks of data. This is because data sent over the Internet (and most computer networks) is sent in manageable chunks. On the Internet, these chunks of data are known as packets.
  3. The packets would pass through the Application Layer and continue to the TCP layer. Each packet is assigned a port number. Ports will be explained later, but suffice it to say that many programmes may be using the TCP/IP stack and sending messages. We need to know which programme on the destination computer should receive the message, because that programme will be listening on a specific port.
  4. After passing through the TCP layer, the packets proceed to the IP layer. This is where each packet receives its destination address, 5.6.7.8.
  5. Now that our message packets have a port number and an IP address, they are ready to be sent over the Internet. The hardware layer takes care of turning the packets containing the alphabetic text of our message into electronic signals and transmitting them over the phone line.
  6. Your ISP has a direct link to the Internet on the other end of the phone line. Each packet's destination address is examined by the ISP's router, which determines where it should be sent. Another router is frequently the packet's next stop. Later, we'll talk about routers and Internet infrastructure.
  7. The packets eventually make it to computer 5.6.7.8. The packets begin at the bottom of the TCP/IP stack on the destination machine and work their way up.
  8. As the packets go upwards through the stack, all routing data that the sending computer's stack added (such as IP address and port number) is stripped from the packets.
  9. When the data reaches the top of the stack, the packets have been re-assembled into their original form, "Hello computer 5.6.7.8!"
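
The journey described in the steps above can be imitated with a toy Python sketch. This is not how a real TCP/IP stack is implemented. The Packet class, the chunk size of 8 characters, the port number 80, and the sequence field used to restore the order are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dest_ip: str      # added by the IP layer
    dest_port: int    # added by the TCP layer
    sequence: int     # assumed field: lets the receiver put chunks back in order
    data: str         # one chunk of the application's message

def send(message: str, dest_ip: str, dest_port: int, chunk_size: int = 8) -> list:
    """Walk a message down the stack: split into chunks, then add port and address."""
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    return [Packet(dest_ip, dest_port, seq, chunk) for seq, chunk in enumerate(chunks)]

def receive(packets: list) -> str:
    """Walk packets up the stack: order them, strip the headers, rejoin the data."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return "".join(p.data for p in ordered)

packets = send("Hello computer 5.6.7.8!", "5.6.7.8", 80)
packets.reverse()                 # pretend the network delivered them out of order
print(len(packets))               # 3
print(receive(packets))           # Hello computer 5.6.7.8!
```

The sending side only adds information on the way down, and the receiving side only removes it on the way up, which mirrors steps 1 to 5 and 7 to 9.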


Networking Infrastructure


So now you know how packets travel from one computer to another over the Internet. But what's in-between? What actually makes up the Internet? Let's look at another diagram:

Diagram 3

Diagram 1 has been redrawn in greater detail. The physical connection to the Internet Service Provider via the phone network may have been obvious, but what happened after that may require some explanation.

The ISP maintains a pool of modems for its dial-in customers. This is managed by some form of computer (usually a dedicated one) which controls data flow from the modem pool to a backbone or dedicated line router. This setup may be referred to as a port server, as it 'serves' access to the network. Billing and usage information is usually collected here as well.

After passing through the phone network and your ISP's local equipment, your packets are routed to the ISP's backbone or a backbone from which the ISP purchases bandwidth. The packets will then travel through multiple routers, backbones, dedicated lines, and other networks until they reach their destination, the computer with the IP address 5.6.7.8. But wouldn't it be great if we could see the exact path our packets took through the Internet? There is, as it turns out, a way...

One way is the traceroute program, which lists the routers a packet passes through on its way to a destination. When you use traceroute, you'll notice that your packets must travel through many things before they reach their destination. Most have long names such as sjc2-core1-h2-0-0.atlas.digex.net and fddi0-0.br4.SJC.globalcenter.net. These are Internet routers that decide where to send your packets. Several routers are shown in Diagram 3, but only a few. Diagram 3 is meant to show a simple network structure. The Internet is much more complex.


Internet Infrastructure


The Internet backbone is made up of many large networks which interconnect with each other. These large networks are known as Network Service Providers, or NSPs. Some of the large NSPs are UUNet, CerfNet, IBM, BBN Planet, SprintNet, and PSINet, among others. These networks peer with each other to exchange packet traffic. Each NSP is required to connect to three Network Access Points, or NAPs. At the NAPs, packet traffic may jump from one NSP's backbone to another NSP's backbone. NSPs also interconnect at Metropolitan Area Exchanges, or MAEs.

MAEs serve the same purpose as the NAPs but are privately owned. NAPs were the original Internet interconnect points. Both NAPs and MAEs are referred to as Internet Exchange Points, or IXs. NSPs also sell bandwidth to smaller networks, such as ISPs and smaller bandwidth providers. The diagram below shows this hierarchical infrastructure.

Diagram 4



This is not a true representation of an actual piece of the Internet. Diagram 4 is only meant to demonstrate how the NSPs could interconnect with each other and with smaller ISPs. None of the physical network components are shown in Diagram 4 as they are in Diagram 3. This is because a single NSP's backbone infrastructure is a complex drawing by itself. Most NSPs publish maps of their network infrastructure on their web sites, where they can be found easily. Drawing an actual map of the Internet would be nearly impossible due to its size, complexity, and ever-changing structure.

The Internet Routing Hierarchy

So how do packets find their way across the Internet? Does every computer connected to the Internet know where the other computers are? Do packets simply get 'broadcast' to every computer on the Internet? The answer to both of the preceding questions is 'no'. No computer knows where all of the other computers are, and packets do not get sent to every computer. The information used to get packets to their destinations is contained in routing tables kept by each router connected to the Internet.

Routers are packet switches. A router is usually connected between networks to route packets between them. Each router knows about its sub-networks and which IP addresses they use. The router usually doesn't know what IP addresses are 'above' it. Examine Diagram 5 below. The black boxes connecting the backbones are routers. The larger NSP backbones at the top are connected at a NAP. Under them are several sub-networks, and under those, more sub-networks. At the bottom are two local area networks with computers attached.

Diagram 5

When a packet arrives at a router, the router examines the IP address put there by the IP protocol layer on the originating computer. The router then checks its routing table. If the network containing the IP address is found, the packet is sent to that network. If the network containing the IP address is not found, the router sends the packet on a default route, usually up the backbone hierarchy to the next router. Hopefully the next router will know where to send the packet. If it does not, the packet is again routed upwards until it reaches an NSP backbone. The routers connected to the NSP backbones hold the largest routing tables, and here the packet will be routed to the correct backbone, where it will begin its journey 'downward' through smaller and smaller networks until it finds its destination.
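
The table lookup with a default route can be sketched in a few lines of Python. The networks and hop names below are hypothetical, and the rule of preferring the most specific matching network (longest prefix) is an assumption borrowed from common router practice rather than something this paper spells out.

```python
import ipaddress

# Hypothetical routing table: (network, where to forward matching packets).
ROUTES = [
    (ipaddress.ip_network("5.6.7.0/24"), "local network A"),
    (ipaddress.ip_network("5.6.0.0/16"), "regional router"),
]
DEFAULT_ROUTE = "upstream router"   # used when no listed network matches

def next_hop(destination: str) -> str:
    """Return where a router holding the table above would forward a packet."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    if not matches:
        return DEFAULT_ROUTE
    # Prefer the most specific (longest prefix) network containing the address.
    return max(matches, key=lambda item: item[0].prefixlen)[1]

print(next_hop("5.6.7.8"))    # local network A
print(next_hop("5.6.200.1"))  # regional router
print(next_hop("1.2.3.4"))    # upstream router
```

The last call shows the 'upward' step: a router that knows nothing about 1.2.3.4 simply hands the packet to the router above it.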


Domain Names and Address Resolution


But what if you don't know the IP address of the computer you want to connect to? What if you need to access a web server referred to as www.anothercomputer.com? How does your web browser know where on the Internet this computer lives? The answer to all of these questions is the Domain Name System, or DNS. DNS is a distributed database which keeps track of computers' names and their corresponding IP addresses on the Internet.

Many computers connected to the Internet host part of the DNS database and the software that allows others to access it. These computers are known as DNS servers. No DNS server contains the entire database; each contains only a subset of it. If a DNS server does not contain the domain name requested by another computer, the DNS server re-directs the requesting computer to another DNS server.

Diagram 6

The Domain Name System is structured as a hierarchy similar to the IP routing hierarchy. The computer requesting a name resolution will be re-directed 'up' the hierarchy until a DNS server is found that can resolve the domain name in the request. Diagram 6 illustrates a portion of the hierarchy. At the top of the tree are the domain roots. Some of the older, more common domains are seen near the top. What is not shown is the multitude of DNS servers around the world which form the rest of the hierarchy.

When an Internet connection is set up (e.g. for a LAN or Dial-Up Networking in Windows), one primary and one or more secondary DNS servers are usually specified as part of the installation. This way, any Internet applications that need domain name resolution will be able to function correctly. For example, when you enter a web address into your web browser, the browser first connects to your primary DNS server. After obtaining the IP address for the domain name you entered, the browser then connects to the target computer and requests the web page you wanted.
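
The re-direction from one DNS server to another can be imitated with a toy Python sketch. It does not speak the real DNS protocol. The server names, the referral chain, and the name www.mycomputer.com are invented for illustration, and each imaginary server simply holds a few records plus the name of the server to ask next.

```python
# Hypothetical servers: each knows a few names and which server to ask otherwise.
SERVERS = {
    "local": {"records": {"www.mycomputer.com": "1.2.3.4"}, "refer_to": "root"},
    "root":  {"records": {}, "refer_to": "com"},
    "com":   {"records": {"www.anothercomputer.com": "5.6.7.8"}, "refer_to": None},
}

def resolve(name: str, start: str = "local"):
    """Follow referrals until some server knows the name; return (ip, servers asked)."""
    asked = []
    server = start
    while server is not None:
        asked.append(server)
        entry = SERVERS[server]
        if name in entry["records"]:
            return entry["records"][name], asked
        server = entry["refer_to"]   # this server cannot answer: try the next one
    raise LookupError(f"no server could resolve {name}")

ip, path = resolve("www.anothercomputer.com")
print(ip)     # 5.6.7.8
print(path)   # ['local', 'root', 'com']
```

No single entry in SERVERS holds every name, which is the point of a distributed database: the answer is found by asking a chain of servers.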


Internet Protocols Revisited

As hinted earlier in the section on protocol stacks, many communication protocols are needed for the Internet to function. These include the TCP and IP protocols, routing protocols, medium access control protocols, application level protocols, and so on. The following sections describe some of the more important and commonly used protocols on the Internet. Higher-level protocols are discussed first, followed by lower-level protocols.


Application Protocols: HTTP and the World Wide Web


One of the most commonly used services on the Internet is the World Wide Web (WWW). The application protocol that makes the web work is the Hypertext Transfer Protocol, or HTTP. Do not confuse this with the Hypertext Markup Language (HTML). HTML is the language used to write web pages. HTTP is the protocol that web browsers and web servers use to communicate with each other over the Internet. It is an application level protocol because it sits on top of the TCP layer in the protocol stack and is used by specific applications to talk to one another. In this case the applications are web browsers and web servers.

HTTP is a connectionless, text-based protocol. Clients (web browsers) send requests to web servers for web elements such as web pages and images. After the request is serviced by a server, the connection between client and server across the Internet is closed. A new connection must be made for each request. Most protocols are connection-oriented. This means that the two computers communicating with each other keep the connection open over the Internet. HTTP, as described here, does not: before a client can make an HTTP request, it must open a new connection to the server.

When you type a URL into a web browser, this is what happens:

  1. If the URL includes a domain name, the browser initially connects to a domain name server and obtains the web server's IP address.
  2. The web browser connects to the web server and sends an HTTP request for the desired web page (through the protocol stack).
  3. The web server receives the request and checks for the desired page. If the page exists, the web server sends it. If the server cannot find the requested page, it will send an HTTP 404 error message. (404 means 'Not Found', as anyone who has surfed the web probably knows.)
  4. The page is returned to the web browser, and the connection is ended.
  5. The browser then parses the page, looking for extra page elements that it requires to finish the web page. Images, applets, and other multimedia are common examples.
  6. For each element that is required, the browser establishes extra connections and sends HTTP requests to the server.
  7. The page will be completely loaded in the browser window once the browser has finished loading all pictures, applets, and other elements.

Most Internet protocols are specified in documents known as Requests For Comments, or RFCs. RFCs can be found in a variety of places on the Internet. For appropriate URLs, see the Resources section below. RFC 1945 specifies HTTP version 1.0.
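
Because HTTP is text-based, the request and the first line of the response are easy to build and read by hand. The Python sketch below is a minimal illustration under a few assumptions: the function names are invented, the path /index.html is hypothetical, and the Host line is included because many servers expect it even though HTTP 1.0 itself does not require it.

```python
def build_request(path: str, host: str) -> bytes:
    """Build the text a browser might send to ask for one page using HTTP 1.0."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")   # a blank line ends the request

def parse_status(response: bytes):
    """Extract the status code and reason phrase from the first response line."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason

print(build_request("/index.html", "www.anothercomputer.com").decode("ascii"))
print(parse_status(b"HTTP/1.0 404 Not Found\r\n\r\n"))   # (404, 'Not Found')
```

A real browser would send these bytes over a TCP connection to the web server and read the reply from the same connection; the sketch leaves the network out so that it can be run anywhere.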

Application Protocols: SMTP and Electronic Mail


Another commonly used Internet service is electronic mail. E-mail uses an application level protocol called the Simple Mail Transfer Protocol, or SMTP. SMTP is also a text-based protocol, but unlike HTTP, SMTP is connection-oriented. SMTP is also more complicated than HTTP. There are many more commands and considerations in SMTP than there are in HTTP.

When you open your mail client to read your e-mail, this is what typically happens:

  1. The mail client (Netscape Mail, Lotus Notes, Microsoft Outlook, etc.) opens a connection to its default mail server. The mail server's IP address or domain name is typically set up when the mail client is installed.
  2. The mail server will always transmit the first message to identify itself.
  3. The client will send an SMTP HELO command, to which the server will respond with a 250 OK message.
  4. Depending on whether the client is checking mail, sending mail, etc., the appropriate SMTP commands will be sent to the server, which will respond accordingly.
  5. This request/response transaction will continue until the client sends an SMTP QUIT command. The server will then say goodbye and the connection will be closed.

Below is a simple 'conversation' between an SMTP client and an SMTP server. S: denotes messages sent by the client (sender) and R: denotes messages sent by the server (receiver).

Smith at host USC-ISIF sends mail to Jones, Green, and Brown at host BBN-UNIX in this SMTP example. We'll suppose that host USC-ISIF makes direct contact with host BBN-UNIX. Jones and Brown's mail is accepted. At host BBN-UNIX, Green does not have a mailbox.

------------------------------------------------------

         R: 220 BBN-UNIX.ARPA Simple Mail Transfer Service Ready

         S: HELO USC-ISIF.ARPA

         R: 250 BBN-UNIX.ARPA


         S: MAIL FROM:<Smith@USC-ISIF.ARPA>

         R: 250 OK


         S: RCPT TO:<Jones@BBN-UNIX.ARPA>

         R: 250 OK


         S: RCPT TO:<Green@BBN-UNIX.ARPA>

         R: 550 No such user here


         S: RCPT TO:<Brown@BBN-UNIX.ARPA>

         R: 250 OK


         S: DATA

         R: 354 Start mail input; end with <CRLF>.<CRLF>

         S: Blah blah blah...

         S: ...etc. etc. etc.

         S: .

         R: 250 OK


         S: QUIT

         R: 221 BBN-UNIX.ARPA Service closing transmission channel


This SMTP transaction is taken from RFC 821, which specifies SMTP.
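
The receiver's side of the dialogue above can be imitated with a toy Python class. It is only a sketch: the class name ToySmtpReceiver and the MAILBOXES set are invented for this example, only the handful of commands seen in the transcript are handled, and no real network connection is involved.

```python
from typing import Optional

MAILBOXES = {"Jones@BBN-UNIX.ARPA", "Brown@BBN-UNIX.ARPA"}   # Green has no mailbox

class ToySmtpReceiver:
    """Answers one command line at a time, like the R: side of the dialogue."""

    def __init__(self, host: str = "BBN-UNIX.ARPA") -> None:
        self.host = host
        self.in_data = False      # True while message text is being received

    def handle(self, line: str) -> Optional[str]:
        if self.in_data:
            if line == ".":       # a lone dot ends the message text
                self.in_data = False
                return "250 OK"
            return None           # message lines get no individual reply
        command = line.upper()
        if command.startswith("HELO"):
            return f"250 {self.host}"
        if command.startswith("MAIL FROM:"):
            return "250 OK"
        if command.startswith("RCPT TO:"):
            mailbox = line[len("RCPT TO:"):].strip("<> ")
            return "250 OK" if mailbox in MAILBOXES else "550 No such user here"
        if command == "DATA":
            self.in_data = True
            return "354 Start mail input; end with <CRLF>.<CRLF>"
        if command == "QUIT":
            return f"221 {self.host} Service closing transmission channel"
        return "500 Syntax error, command unrecognized"

receiver = ToySmtpReceiver()
for line in ["HELO USC-ISIF.ARPA", "RCPT TO:<Green@BBN-UNIX.ARPA>", "QUIT"]:
    print("S:", line)
    print("R:", receiver.handle(line))
```

Each reply begins with a three-digit status code, which is what lets a mail client react to the server without having to understand the human-readable text that follows.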


Transmission Control Protocol


Under the application layer in the protocol stack is the TCP layer. When applications open a connection to another computer on the Internet, the messages they send (using a specific application layer protocol) get passed down the stack to the TCP layer. TCP is responsible for routing application protocols to the correct application on the destination computer. To accomplish this, port numbers are used. Ports can be thought of as separate channels on each computer. For example, you can surf the web while reading e-mail. This is because these two applications (the web browser and the mail client) use different port numbers. When a packet arrives at a computer and makes its way up the protocol stack, the TCP layer decides which application receives the packet based on a port number.

TCP works like this:

  • When the TCP layer gets the protocol data from the application layer, it divides it into manageable 'chunks' and adds a TCP header with appropriate TCP information to each 'chunk.' The TCP header contains information such as the port number of the application to which the data must be transferred.

  • When the TCP layer gets a packet from the IP layer below it, it strips the TCP header data from the packet, does any necessary data reconstruction, and then transmits the data to the correct application using the port number extracted from the TCP header.

This is how TCP routes the data moving through the protocol stack to the correct application.
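
A toy Python sketch of this port-based delivery follows. The LISTENERS table and both function names are invented for illustration, and a dictionary stands in for the real TCP header. Ports 80 and 25 are the customary ports for web and mail servers.

```python
# Hypothetical table of applications listening on ports of one computer.
LISTENERS = {80: "web server", 25: "mail server"}

def add_tcp_header(data: bytes, dest_port: int) -> dict:
    """Sending side: wrap a chunk of application data with its destination port."""
    return {"dest_port": dest_port, "data": data}

def deliver(segment: dict):
    """Receiving side: strip the header and pick the application by port number."""
    application = LISTENERS.get(segment["dest_port"])
    if application is None:
        raise LookupError(f"no application is listening on port {segment['dest_port']}")
    return application, segment["data"]

print(deliver(add_tcp_header(b"GET / HTTP/1.0", 80)))      # ('web server', b'GET / HTTP/1.0')
print(deliver(add_tcp_header(b"HELO USC-ISIF.ARPA", 25)))  # ('mail server', b'HELO USC-ISIF.ARPA')
```

Both chunks arrive at the same computer, yet each reaches a different application purely because of the port number carried in its header.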

Unlike HTTP and SMTP, TCP is not a text-based protocol. TCP is a connection-oriented, reliable, byte stream service. Connection-oriented means that two applications using TCP must first establish a connection before exchanging data. TCP is reliable because for each packet received, an acknowledgement is sent to the sender to confirm the delivery. TCP also includes a checksum in its header for error-checking the received data. The TCP header looks like this:

Diagram 7

You'll notice that there is no place for an IP address in the TCP header. This is because TCP doesn't know anything about IP addresses. TCP's job is to get application level data from application to application reliably. The task of getting data from computer to computer is the job of IP.
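
The fixed part of the TCP header can be written out with Python's struct module, which makes the absence of any IP address easy to see. This is a sketch under assumptions: the field order follows the standard 20-byte layout from the TCP specification (RFC 793), the function name and default values are invented, and the checksum is left at zero rather than computed.

```python
import struct

# Field order of the fixed 20-byte TCP header; note that no IP address appears.
TCP_HEADER = struct.Struct("!HHIIHHHH")

def pack_tcp_header(src_port: int, dst_port: int, seq: int, ack: int,
                    flags: int = 0x02, window: int = 65535) -> bytes:
    """Pack a minimal TCP header; flags default to SYN (0x02)."""
    offset_and_flags = (5 << 12) | flags   # header length of 5 32-bit words, then flag bits
    checksum = 0                           # left at zero in this sketch
    urgent_pointer = 0
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           offset_and_flags, window, checksum, urgent_pointer)

header = pack_tcp_header(src_port=49152, dst_port=80, seq=1, ack=0)
print(len(header))                          # 20
print(struct.unpack("!HH", header[:4]))     # (49152, 80)
```

The first four bytes are the source and destination ports; everything else concerns ordering, acknowledgement, flow control, and error checking.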

Internet Protocol

Unlike TCP, IP is an unreliable, connectionless protocol. IP doesn't care whether a packet gets to its destination or not. Nor does IP know about connections and port numbers. IP's job is to send and route packets to other computers. IP packets are independent entities and may arrive out of order or not at all. It is TCP's job to make sure packets arrive and are in the correct order. About the only thing IP has in common with TCP is the way it receives data and adds its own IP header information to the TCP data. The IP header looks like this:

Diagram 8

Above we see the IP addresses of the sending and receiving computers in the IP header. Below is what a packet looks like after passing through the application layer, TCP layer, and IP layer. The application layer data is segmented in the TCP layer, the TCP header is added, the packet continues to the IP layer, the IP header is added, and then the packet is transmitted across the Internet.

Diagram 9
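
The final wrapping step can be sketched in Python as well. The layout follows the standard 20-byte IPv4 header (RFC 791), but the function name, the time-to-live value of 64, and the placeholder payload are assumptions for illustration, and the header checksum is left at zero.

```python
import socket
import struct

IP_HEADER = struct.Struct("!BBHHHBBH4s4s")   # fixed 20-byte IPv4 header

def wrap_in_ip(tcp_segment: bytes, src_ip: str, dst_ip: str) -> bytes:
    """Prepend a minimal IPv4 header to an already-built TCP segment."""
    version_and_length = (4 << 4) | 5        # IPv4, header of 5 32-bit words
    total_length = IP_HEADER.size + len(tcp_segment)
    header = IP_HEADER.pack(
        version_and_length, 0, total_length,
        0, 0,                                # identification, flags/fragment offset
        64, socket.IPPROTO_TCP, 0,           # time to live, protocol (TCP), checksum left at 0
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    return header + tcp_segment

packet = wrap_in_ip(b"<tcp header + data>", "1.2.3.4", "5.6.7.8")
print(len(packet))                           # 39
print(socket.inet_ntoa(packet[16:20]))       # 5.6.7.8
```

The IP layer treats the whole TCP segment as opaque data: it only adds the addresses and a few bookkeeping fields in front of it.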

Wrap Up

Now you know how the Internet works. But how long will it stay this way? The version of IP currently used on the Internet (version 4) only allows 2^32 addresses (about 4.3 billion). Eventually there won't be any free IP addresses left. Surprised? Don't worry. IP version 6 is being tested right now on a research backbone by a consortium of research institutions and corporations. And after that? Who knows. The Internet has come a long way since its inception as a Defense Department research project. No one really knows what the Internet will become. One thing is sure, however. The Internet will unite the world like no other mechanism ever has. The Information Age is in full stride and I am glad to be a part of it.

Rus Shuler, 1998
Updates made 2002


Bibliography

The following books are excellent resources and helped greatly in the writing of this paper. I believe Stevens' book is the best TCP/IP reference ever written and can be considered the bible of the Internet. Sheldon's book covers a much wider scope and contains a vast amount of networking information.

  • TCP/IP Illustrated, Volume 1, The Protocols.
         W. Richard Stevens.
         Addison-Wesley, Reading, Massachusetts. 1994.
  • Encyclopedia of Networking.
         Tom Sheldon.
         Osborne McGraw-Hill, New York. 1998.
