Hashes
A hash is the value produced by a hash function, an algorithm that transforms an input of any size (such as text or a file) into a fixed-length value, usually displayed as a hexadecimal string. These values, called "hash values" or simply "hashes," act as a fingerprint of the input: even a small change in the original input results in a completely different hash.

Important Hash Properties:
- Determinism: The same input always generates the same hash.
- Speed: Hash functions are fast to compute.
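- Uniqueness (collision resistance): Different inputs should produce different hashes. Because the output has a fixed length, collisions (two inputs with the same hash) must exist, but in a good hash function they are impractical to find.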
- Irreversibility: Given a hash, it is impractical to reconstruct the original input. It should not be possible for another function to reverse the process.
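The properties above can be observed directly. The following is a minimal sketch using Python's standard `hashlib` module with SHA-256; the helper name `sha256_hex` and the sample strings are chosen only for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("hello world")
again = sha256_hex("hello world")
changed = sha256_hex("hello world!")

# Determinism: the same input always gives the same hash.
print(first == again)
# Fixed length: 64 hex characters (256 bits) regardless of input size.
print(len(first), len(sha256_hex("a" * 10_000)))
# A one-character change in the input yields a different hash.
print(first != changed)
```

Irreversibility cannot be demonstrated by running code; it is a property of the function's design.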
Common Uses:
- Security: Hashes are used to store passwords securely. For example, when a user registers in your application and types a password, we should not store that password directly in the database. Anyone with access to the database, or an attacker who breaches it, would see every credential. To avoid this, we pass the password through a hash function and store only the hash. When the user types the password again, we pass it through the same hash function and compare the result with the stored value.
- If you store users' passwords in plain text and this data is leaked, the problem can be bigger than it seems. Many users reuse the same password in many places, so a leak could compromise much more than access to one application.
Knowing this, enable 2FA on all your accounts whenever possible, never reuse a password, and always use strong passwords to resist brute-force attacks (trial and error).
- Data Integrity: File integrity verification (such as using MD5 or SHA-256 to check whether a file has been corrupted or altered). The same file always produces the same hash under the same hash function, so a mismatch means the contents changed. For file integrity we can use faster hash functions than those used for passwords, though a broken function such as MD5 only protects against accidental corruption, not deliberate tampering.
- Data Structures: In structures like hash tables, hashing the key allows for fast lookups. This is widely used in software development and in databases, especially NoSQL.
- Cryptography: Hashes are building blocks of many cryptographic algorithms and authentication schemes.
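To make the data-structure use more concrete, here is a small sketch of how a hash table might pick a bucket for a key. It is an illustration under assumptions, not how any particular database works: `NUM_BUCKETS`, `bucket_index`, `put`, and `get` are hypothetical names, and real hash tables normally use faster non-cryptographic hash functions rather than SHA-256.

```python
import hashlib
from typing import Optional

NUM_BUCKETS = 8  # hypothetical table size for this sketch


def bucket_index(key: str) -> int:
    """Map a key to a bucket by hashing it and reducing modulo the table size."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


# Each bucket holds (key, value) pairs; keys that collide share a bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]


def put(key: str, value: str) -> None:
    bucket = buckets[bucket_index(key)]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:
            bucket[i] = (key, value)  # overwrite an existing key
            return
    bucket.append((key, value))


def get(key: str) -> Optional[str]:
    # Only one bucket is scanned, not the whole table.
    for existing_key, value in buckets[bucket_index(key)]:
        if existing_key == key:
            return value
    return None


put("alice", "admin")
put("bob", "viewer")
print(get("alice"), get("carol"))
```

The lookup is fast because the hash tells us which bucket to inspect, so the cost does not grow with the total number of stored keys as long as buckets stay small.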
Examples of common hash functions:
- MD5: Generates 128-bit hashes, but is no longer considered secure.
- SHA-1: Generates 160-bit hashes, but also has vulnerabilities.
- SHA-256: Part of the SHA-2 family, is much more secure and commonly used today.
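The output sizes listed above can be checked with Python's standard `hashlib` module. This is a quick sketch; the input `b"hello"` is arbitrary, and some hardened Python builds may refuse to provide MD5.

```python
import hashlib

sizes = {}
for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name, b"hello")
    # digest_size is in bytes; multiply by 8 to get bits.
    sizes[name] = h.digest_size * 8
    print(name, sizes[name], "bits", h.hexdigest())
```

A longer output is not what makes SHA-256 safer on its own; MD5 and SHA-1 are discouraged because practical collision attacks against them are known.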
We can also apply a hash function repeatedly, feeding the output of one pass into the next, so that each guess becomes more expensive for an attacker.
The algorithms mentioned above were designed to be efficient and are mainly used for integrity. For password hashing, programmers turn to algorithms that are deliberately less efficient but much more secure in this role. Being slow is an advantage here: the faster the algorithm, the more guesses per second an attacker can try when attempting to break a password. In this scenario, programmers generally use bcrypt and PBKDF2, which are designed to be costly to run and have an adjustable work factor, making password cracking much harder. Strictly speaking, these are not plain hash functions but password hashing and key derivation functions: they take a short or weak password and derive a longer value from it that is expensive to recompute.
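As an illustration of the store-and-compare flow with a deliberately slow algorithm, here is a minimal sketch using PBKDF2 from Python's standard library. Several details are assumptions added for the example: the function names, the iteration count, and the random salt. The salt is not discussed above, but it is standard practice so that two users with the same password do not end up with the same stored value.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); both would be stored in the database."""
    salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key from the typed password and compare it."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Raising `ITERATIONS` makes every login slightly slower, but it multiplies the cost of each guess for someone trying to brute-force a leaked database.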
In summary, hashes are fundamental for various applications in technology, especially where data security and integrity are crucial.
Hashes and Integrity
Let's download Kubernetes. The release changelog lists the download link for each file.
Download one of the files and compare its hash to ensure file integrity. The advantage of packaging everything in a single tar.gz file is that we can compute one hash over the whole archive. Copy the link and let's go to the terminal.

❯ wget https://dl.k8s.io/v1.31.0/kubernetes-server-linux-amd64.tar.gz
--2024-08-20 11:50:02-- https://dl.k8s.io/v1.31.0/kubernetes-server-linux-amd64.tar.gz
Resolving dl.k8s.io (dl.k8s.io)... 34.107.204.206, 2600:1901:0:26f3::
Connecting to dl.k8s.io (dl.k8s.io)|34.107.204.206|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn.dl.k8s.io/release/v1.31.0/kubernetes-server-linux-amd64.tar.gz [following]
--2024-08-20 11:50:02-- https://cdn.dl.k8s.io/release/v1.31.0/kubernetes-server-linux-amd64.tar.gz
Resolving cdn.dl.k8s.io (cdn.dl.k8s.io)... 151.101.93.55, 2a04:4e42:16::311
Connecting to cdn.dl.k8s.io (cdn.dl.k8s.io)|151.101.93.55|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 359694851 (343M) [application/x-tar]
Saving to: 'kubernetes-server-linux-amd64.tar.gz'
kubernetes-server-linux-amd64.tar.gz 100%[===================================================================================================>] 343,03M 3,69MB/s in 94s
2024-08-20 11:51:37 (3,67 MB/s) - 'kubernetes-server-linux-amd64.tar.gz' saved [359694851/359694851]
# Compute the same hash function locally and compare the values. The value published in the changelog is 4d73777e4f139c67c4551c1ca30aefa4782b2d9f3e5c48b8b010ffc329065e90ae9df3fd515cc13534c586f6edd58c3324943ce9ac48e60bb4fa49113a2e09d4
❯ sha512sum kubernetes-server-linux-amd64.tar.gz
4d73777e4f139c67c4551c1ca30aefa4782b2d9f3e5c48b8b010ffc329065e90ae9df3fd515cc13534c586f6edd58c3324943ce9ac48e60bb4fa49113a2e09d4 kubernetes-server-linux-amd64.tar.gz
The computed value matches the published one, which confirms the integrity of the downloaded file.
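The same check can be automated instead of comparing the strings by eye. Below is a minimal sketch in Python; the function names `sha512_of_file` and `matches_published_hash` and the chunk size are assumptions made for this example. The file is read in chunks so that a large archive does not have to fit in memory.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-512 hash of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() keeps calling f.read until it returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's hash with a published hexadecimal value."""
    # Normalize whitespace and case before comparing.
    return sha512_of_file(path) == published_hex.strip().lower()
```

Calling `matches_published_hash` with the path `kubernetes-server-linux-amd64.tar.gz` and the value copied from the changelog should return `True` for an intact download, and `False` if the file was corrupted or altered.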