Exploring the Depths of the Internet
Hey everyone, how's it going?
In this post, I want to quickly explain how the Internet works and how vast it is. I won't go too deep so everyone can absorb at least the basics.
Anyone reading this text has Internet access, is using a browser, and is accessing this web page via HTTPS.
Two words were mentioned: Internet and Web. What's the difference?
Internet
It's the global infrastructure that connects millions of computer networks around the world. It enables communication between devices through a series of communication protocols, which are standardized rules for information exchange. Some of the main protocols that make the Internet work include:
- TCP/IP (Transmission Control Protocol/Internet Protocol): The fundamental set of Internet protocols. IP is responsible for addressing and routing data packets between devices, while TCP makes the transmission reliable, retransmitting lost or corrupted packets and reordering them when necessary.
- DNS (Domain Name System): Translates human-readable domain names (like google.com) into IP addresses that machines use to locate devices on the network.
- FTP (File Transfer Protocol): A protocol used to transfer files between devices.
- SMTP (Simple Mail Transfer Protocol): Used to send emails.
- HTTP/HTTPS (Hypertext Transfer Protocol/Secure): Protocols used on the Web to transfer pages and resources between servers and browsers.
In a very generic way, to keep it simple, that's it.
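To make TCP's reordering role from the list above a bit more concrete, here is a minimal sketch in Python. It is a toy model, not how a real TCP stack is implemented: the `reassemble` function and the `(offset, data)` segment format are assumptions made up for this illustration.

```python
def reassemble(segments):
    """Rebuild a byte stream from (offset, data) segments that may arrive
    out of order or duplicated. Toy model only: real TCP would wait for a
    retransmission instead of raising an error when data is missing."""
    received = {}
    for offset, data in segments:
        received.setdefault(offset, data)  # keep the first copy, drop duplicates

    stream = b""
    expected = 0
    for offset in sorted(received):
        if offset != expected:
            raise ValueError(f"missing data at offset {expected}")
        stream += received[offset]
        expected += len(received[offset])
    return stream


# Segments arrive out of order, and one of them arrives twice.
segments = [(6, b"world"), (0, b"hello "), (6, b"world")]
message = reassemble(segments)
```

Here `message` ends up as the original byte stream even though the segments arrived shuffled, which is the guarantee TCP gives to the applications on top of it.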
Every computer that accesses the Internet is part of a network, which sits inside a larger network, which sits inside another, and so on. When a connection needs to be established for information exchange, the packet climbs to larger networks until it finds a possible path to the destination; in theory, both ends share a common network at some level. For example, if a computer on the purple network wants to reach another on the orange network, both are part of the same blue network, so they can communicate.
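The "climb until you find a shared network" idea can be sketched as a small tree walk. This is only an illustration of the nesting described above: the `PARENT` table and the top-level `"internet"` entry are made up for the example, and real routers rely on routing tables and routing protocols rather than a single tree.

```python
# Hypothetical nesting: purple and orange both sit inside blue.
PARENT = {
    "purple": "blue",
    "orange": "blue",
    "blue": "internet",
    "internet": None,
}


def path_to_root(network):
    """List the network itself and every larger network that contains it."""
    path = []
    while network is not None:
        path.append(network)
        network = PARENT[network]
    return path


def first_shared_network(a, b):
    """Climb from `a` until reaching a network that also contains `b`."""
    containing_b = set(path_to_root(b))
    for network in path_to_root(a):
        if network in containing_b:
            return network
    return None


def route(a, b):
    """Go up from `a` to the shared network, then down to `b`."""
    shared = first_shared_network(a, b)
    up = path_to_root(a)
    up = up[: up.index(shared) + 1]
    down = path_to_root(b)
    down = down[: down.index(shared)]
    return up + down[::-1]


hops = route("purple", "orange")
```

With this table, the packet only needs to climb one level: it goes from purple up to blue and then down to orange.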
Web - World Wide Web
It's a set of pages and documents accessible via the Internet, using protocols like HTTP/HTTPS. When we access a website through a browser, we're using the Web, which is just a part of what the Internet offers.
It's a software layer: some server offers the pages we want to access, and the browser fetches them.
An FTP file transfer, for example, uses the Internet but is not part of the Web.
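One rough way to see the distinction is to look at the scheme at the start of a URL. The sketch below uses Python's standard `urllib.parse`; the helper name `is_web_url` and the example addresses on `example.org` are assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Schemes that belong to the Web; everything else just uses the Internet.
WEB_SCHEMES = {"http", "https"}


def is_web_url(url: str) -> bool:
    """Return True when the URL is served over HTTP or HTTPS."""
    return urlsplit(url).scheme.lower() in WEB_SCHEMES


web_page = is_web_url("https://devsecops.puziol.com.br")
file_transfer = is_web_url("ftp://files.example.org/report.pdf")
```

The first address is a Web page, while the second is an FTP transfer: both travel over the Internet, but only the first one is part of the Web.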
DNS (Domain Name System)
DNS works like an address book. It's the name resolver that lets us remember sites by name instead of memorizing IP addresses. When we type devsecops.puziol.com.br, the browser asks a DNS server for the IP of the server that hosts that name. That's why we buy domains: to put our name in an address book that points to the correct IP.
Roughly speaking, each website responds on an IP, but it's perfectly possible for it to respond on a group of IPs.
This is a more complex subject that I plan to write about in more detail in the future.
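Until then, the address book analogy can be sketched with a plain dictionary. Everything here is made up for illustration: the `ADDRESS_BOOK` table, the `resolve` helper, and the IP addresses, which come from ranges reserved for documentation and are not the real addresses of these sites.

```python
# Toy "address book": names on the left, one or more IPs on the right.
# The addresses are placeholders from documentation ranges, not real records.
ADDRESS_BOOK = {
    "devsecops.puziol.com.br": ["192.0.2.10"],
    "example.org": ["198.51.100.7", "198.51.100.8"],
}


def resolve(name: str) -> list:
    """Look a name up in the toy address book, ignoring case and a trailing dot."""
    normalized = name.rstrip(".").lower()
    try:
        return ADDRESS_BOOK[normalized]
    except KeyError:
        raise LookupError(f"no record for {name}") from None


single = resolve("devsecops.puziol.com.br")
several = resolve("Example.ORG.")
```

The second lookup returns two addresses, which matches the point above that a site can respond on a group of IPs. On a real system, a call such as `socket.getaddrinfo` asks the operating system's resolver to do the actual lookup.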
Surface Web vs Deep Web
Understanding the above concepts, we can move to the next stage.
The Surface Web consists of sites visible to search engines. They're "public" sites that don't require any type of credentials for access. Commonly cited estimates put it at only around 5% of what's available on the Internet, although figures like this are rough.
On the other hand, any website or area that requires access control is part of what we call the Deep Web. That page from your bank or your private profile on some site, which can't be reached directly by a search engine like Google, is part of the Deep Web. Other examples include:
- Emails in inboxes
- Intranets
- Cloud Drives
- Etc.
The Deep Web is legitimate and, by commonly cited rough estimates, represents about 95% of the Internet. It is composed largely of private and secured information that should not be publicly accessible.
Dark Web
It's a small part of the Deep Web, accessible only through specialized software such as the Tor Browser. Pages hosted within the Tor network usually have a long, hard-to-remember address that is derived from the service's cryptographic key and ends in .onion.
Tor (The Onion Router) is a legitimate project and aims to preserve human rights to privacy and freedom of expression. Believe it or not, the onion routing technology behind Tor was originally developed at the U.S. Naval Research Laboratory, and Tor is now maintained by a non-profit organization called the Tor Project.
The Dark Web is deliberately hidden and encrypted, offering anonymity to its users. Although it's famous for illegal activities, such as drug and weapons trafficking, it's also used for legitimate purposes, such as anonymous communication in authoritarian regimes or by journalists who want to protect their sources.
Although Tor provides anonymity, it's important to be cautious. Many sites on the Dark Web are known for illegal activities. Avoid sharing personal information and use security tools like VPNs when possible.
Using the browser won't prevent access to sites we already know from the Surface Web, but it will change the way communication is done, making it slower due to the multiple layers of encryption in the Onion network that we'll see further ahead.
It's important to remember that, even when using Tor, a normal Surface Web site can still save your account data. Tor only makes traceability to the connection source difficult, but if you're accessing your own account, identification will still be possible because the account is linked to you.
How the Onion Network Works
On the normal Internet, your connection is "direct," from your device to the server, which makes it easy to track your IP and other information. Of course, there are routings and some layers, but let's abstract that. Data is also encrypted using HTTPS, but that protects the content, not the source and destination addresses, so the communication can still be traced from origin to destination at various points. Don't be fooled into thinking the Internet is decentralized because it's not, despite there being a movement to make it so.
In Tor, the network hides your identity (IP) by passing your traffic through several intermediate nodes, so the final destination only sees the exit node's IP. These servers are called relays, and a circuit of three relays (or "hops") is normally used to forward network traffic securely and anonymously.
Each node only knows the previous node and the next one in the chain, and the message is encrypted at each stage. This makes it difficult to track where the original request comes from and where it's going, but it can also make browsing a bit slower, since data needs to pass through several nodes around the world. The Tor network uses encryption over encryption, as we'll see further ahead in the figure.
These nodes are:
- Entry (Guard Node): Your connection is encrypted and sent to the first node in the network, known as the entry node. This node knows your real IP address but doesn't know what the final destination is.
- Middle (Middle Node): The connection is then relayed through a middle node (one in a standard three-hop circuit), which only knows the previous node and the next one, with no visibility of the origin or destination of the communication.
- Exit (Exit Node): The last node removes the final layer of Tor encryption and sends the data to the destination website or service. This node knows the destination but not the original requester's IP.
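The "encryption over encryption" idea can be sketched as follows. This is a toy model and must not be used for real security: the XOR cipher built from SHA-256 stands in for the real ciphers Tor uses, and the relay names, keys, and the `wrap` and `peel` helpers are all assumptions invented for this example.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudo-random byte stream from a key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR with the keystream (same operation both ways)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(payload: bytes, route, destination: str) -> bytes:
    """Build the onion. `route` is ordered guard -> middle -> exit, and each
    layer hides the name of the next hop plus everything inside it."""
    next_hop = destination
    onion = payload
    for name, key in reversed(route):
        onion = xor_cipher(key, next_hop.encode() + b"\n" + onion)
        next_hop = name
    return onion


def peel(key: bytes, onion: bytes):
    """Remove one layer: learn only the next hop and an opaque inner blob."""
    plain = xor_cipher(key, onion)
    next_hop, _, inner = plain.partition(b"\n")
    return next_hop.decode(), inner


# Hypothetical relays, each sharing a different key with the client.
route = [("guard", b"key-guard"), ("middle", b"key-middle"), ("exit", b"key-exit")]
onion = wrap(b"GET / HTTP/1.1", route, "example.org")

seen = []
for name, key in route:  # each relay peels exactly one layer
    next_hop, onion = peel(key, onion)
    seen.append((name, next_hop))
```

After the loop, `seen` records what each relay learned: the guard only sees that the next hop is the middle relay, the middle only sees the exit, and only the exit sees the destination together with the original payload.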
When you access a .onion site, the Tor network resolves the address within its own infrastructure. There's no need for external DNS servers, since the mapping between the .onion name and the hidden service is managed directly by the network nodes.
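Part of the reason no external DNS is needed is that a .onion name is computed from the service's own public key. The sketch below follows the encoding described in the Tor rendezvous specification for version 3 addresses, as far as I understand it; the function name `onion_address_v3` is made up, and `placeholder_key` is just 32 arbitrary bytes, not the key of any real service.

```python
import base64
import hashlib


def onion_address_v3(public_key: bytes) -> str:
    """Encode a 32-byte Ed25519 public key as a version 3 .onion address:
    base32(public_key + checksum + version) followed by ".onion"."""
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte Ed25519 public key")
    version = b"\x03"
    # The checksum is the first two bytes of a SHA3-256 hash.
    checksum = hashlib.sha3_256(b".onion checksum" + public_key + version).digest()[:2]
    encoded = base64.b32encode(public_key + checksum + version).decode("ascii").lower()
    return encoded + ".onion"


# Placeholder bytes for illustration only; not a real service key.
placeholder_key = bytes(range(32))
address = onion_address_v3(placeholder_key)
```

The result is a 56-character name followed by `.onion`, which is why these addresses look so complicated: the name is essentially the key itself, so the client can check that it is talking to the right service.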
Testing Tor
Using the Tor Browser without a VPN to access a Surface Web site, we can observe that the returned IP is in California. This IP belongs to an exit relay, that is, the last of the three nodes the request passed through and the one that delivered it.
Now let's test a domain that's only accessible within the Tor network, the Tor project's own site. See what the domain used looks like.