Basic Cryptography
Cryptography has been the most challenging topic on my path through cybersecurity. To understand it, one must grasp the real-world implications of abstract mathematical concepts.
Back in school, physics and chemistry always came more easily to me than pure math, because there I could see math as a tool for understanding the world around me rather than a subject in itself. My approach to cryptography is very similar, and it helps me deal with its complexities.
Cryptosystems
Cryptography is primarily used to establish confidentiality: only an authorized individual can access a secret message. For this, cryptography makes use of cryptosystems. A cryptosystem is a suite of algorithms and keys that converts a message, called a plaintext, into an incomprehensible set of characters known as ciphertext. The operation combines the plaintext with a cryptographic key inside the algorithm's step-by-step transformation process, producing the ciphertext as output.
The effectiveness of a cryptosystem is measured by the level of confusion and diffusion it provides. Confusion hides the relationship between the key and the ciphertext, so that studying the ciphertext reveals nothing about how the key produced it. Diffusion is the degree to which a change in the plaintext spreads through the ciphertext; ideally, flipping a single plaintext bit should change at least fifty percent of the ciphertext bits. Together, confusion and diffusion make attacks difficult, since guessing and pattern identification become infeasible.
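A quick way to see this "avalanche" behavior in practice is to compare the outputs of a cryptographic primitive for two inputs that differ by a tiny amount. The minimal sketch below uses SHA-256, a hash function covered later in this post, purely as a convenient stand-in; the messages are made up for illustration.

```python
import hashlib

def bit_diff(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg1 = b"transfer $100 to Alice"
msg2 = b"transfer $900 to Alice"   # a one-character change to the input

h1 = hashlib.sha256(msg1).digest()
h2 = hashlib.sha256(msg2).digest()

print(f"{bit_diff(h1, h2)} of {len(h1) * 8} output bits changed")
# Typically close to half of the 256 output bits differ.
```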
As noted above, cryptosystems are composed of cryptographic keys and algorithms. Let's discuss keys first, since algorithms are largely distinguished by how they use keys in their operations.
Keys
When discussing algorithms, one fundamental concept is that their inner workings are not secret. On the contrary, good algorithms are the ones that have been peer-reviewed by the cybersecurity community. What actually stands in an attacker's way is the key.
Keys are random or pseudo-random numbers combined with the message in a cryptographic operation. They provide the unpredictability needed to make a ciphertext hard to break. In addition to randomness, key strength depends on key length: the longer the key, the harder it is to guess. Length is generally measured in bits (e.g., a 256-bit key). Some algorithms generate their own keys at the required size, but passwords can also serve as the basis for keys, provided they are long and random enough; in practice, a password is first run through a key-derivation function to produce a key of the right length.
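As a minimal sketch of both approaches, the snippet below uses Python's standard library: os.urandom for a randomly generated key and PBKDF2 (a common key-derivation function) to turn a password into a key. The password, salt, and iteration count are illustrative values, not recommendations.

```python
import hashlib
import os

# A randomly generated 256-bit key, suitable for a symmetric cipher such as AES-256.
key = os.urandom(32)

# Deriving a 256-bit key from a password with PBKDF2-HMAC-SHA256.
# The salt is random; the iteration count is an illustrative example.
salt = os.urandom(16)
derived_key = hashlib.pbkdf2_hmac(
    "sha256", b"correct horse battery staple", salt, 600_000, dklen=32
)

print(len(key) * 8, len(derived_key) * 8)   # 256 256
```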
A key should not be used indefinitely; a common recommendation is to rotate keys at least once a year, with the exact period depending on the key's type and use. Cryptosystems often go further, generating keys that are used for a single session and immediately discarded. These are known as ephemeral keys. Session keys are presented more thoroughly below, when discussing perfect forward secrecy and its relation to asymmetric algorithms.
Because the cryptographic key plays such a central role in encryption, keys must be generated, stored, and transmitted securely and reliably, since the recipient must have a way to decrypt the message. Solutions exist for all three demands, implemented in specialized hardware and software, including other encryption algorithms. They are discussed in more detail below.
Initialization vectors (IVs) and nonces complement keys. They are frequently used in block ciphers and hashing functions (both discussed below). An IV is a random or pseudo-random value that gives the algorithm an initial state to work from. A nonce (a "number used once") works much like an IV but must never repeat under the same key; it is often implemented with a counter to guarantee uniqueness. This makes nonces well suited to countering replay attacks, in which a bad actor tries to replay a captured session.
Symmetric Algorithms
When explaining cryptographic algorithms, it's necessary to distinguish their various types. The most straightforward to understand are symmetric algorithms. They use the same key for encryption and decryption, hence the term symmetric. They are also called secret key algorithms, a name that reflects the importance of safeguarding the key: decrypting a ciphertext is easy for anyone who gains access to it.
Symmetric algorithms are further divided into two subtypes, according to how encryption is performed: stream ciphers and block ciphers. Stream ciphers encrypt data bit by bit (or byte by byte), while block ciphers operate on fixed-length blocks of bits.
There are other significant differences. Block ciphers consume more computational resources and tend to be slower than stream ciphers, which makes stream ciphers attractive where speed and low overhead matter. On the other hand, block ciphers provide higher diffusion and offer several modes of operation, some of which effectively turn them into stream ciphers.
Another drawback of block ciphers is the need to pad the data when the last chunk of the message doesn't fill the block's fixed length. In addition, changing or corrupting a few bits corrupts the entire resulting encrypted block, whereas in a stream cipher the damage is confined to the affected bits. This error propagation is inconvenient, but it can also expose attempts to maliciously insert data into the message.
Block cipher’s operation modes may implement nonces and IVs differently, even using the previous encrypted block to execute encryption on the next block. These modes are:
- Electronic Codebook (ECB) is the most straightforward mode. It splits the message into blocks of a pre-defined length and encrypts each block independently with the same key. Because identical plaintext blocks produce identical ciphertext blocks, ECB leaks patterns in the data, which makes it significantly vulnerable to cryptanalytic attacks;
- Cipher Block Chaining (CBC) seeks to overcome this ECB problem. CBC starts like ECB, dividing the message into fixed-length blocks. The difference appears in the next step: CBC combines an IV with the first plaintext block using an exclusive-or (XOR) operation and then encrypts the combination with the key. The resulting ciphertext block is XOR'd with the second plaintext block, acting as that block's IV, and the combination is again encrypted with the key. That ciphertext is combined with the third plaintext block, and the same calculation repeats until the entire message is encrypted;
- Counter Mode (CTR) uses a nonce combined with a counter. It encrypts the nonce-and-counter value with the key and XORs the result with the first plaintext block. The counter is then increased by one, producing a new counter block, which is encrypted with the same key and XOR'd with the next plaintext block. The operation repeats for the remaining blocks until the end of the message. Because the plaintext itself is never fed through the block cipher, CTR effectively turns a block cipher into a stream cipher. Galois/Counter Mode (GCM) is a variant of CTR that adds authentication by computing an authentication tag over the ciphertext and any additional associated data.
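As a minimal sketch of two of these modes, the snippet below uses the third-party cryptography package (an assumption on my part; any AES implementation would do). It shows that CBC needs an IV and PKCS7 padding, while CTR needs only a unique nonce and no padding at all.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)          # AES-256 key
plaintext = b"the quick brown fox jumps over the lazy dog"

# CBC: requires an IV and padding up to the 128-bit block size.
iv = os.urandom(16)
padder = padding.PKCS7(128).padder()
padded = padder.update(plaintext) + padder.finalize()
encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
cbc_ciphertext = encryptor.update(padded) + encryptor.finalize()

# CTR: requires a unique nonce/counter block, but no padding.
nonce = os.urandom(16)
encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
ctr_ciphertext = encryptor.update(plaintext) + encryptor.finalize()

# Decryption mirrors encryption; for CBC an unpadder would also be needed.
decryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
assert decryptor.update(ctr_ciphertext) + decryptor.finalize() == plaintext
```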
The Data Encryption Standard (DES), 3DES, the Advanced Encryption Standard (AES), Blowfish, Twofish, and RC4 are well-known examples of symmetric algorithms. Their particularities will be discussed in a future post.
Asymmetric Algorithms
Asymmetric algorithms implement cryptography using two keys instead of one. The private key, as the name implies, stays secret with its owner, whereas the public key is shared with other parties. When Alice wants to send a secret message to Bob, she uses Bob's public key to encrypt it, and Bob uses his private key to decrypt it. The two keys are mathematically related, but it must be computationally infeasible to derive the private key from the public one.
These algorithms are ideal for exchanging information over unsecured channels, since only the encryption key is publicly available. Symmetric algorithms, on the other hand, while fast and efficient, require a previously secured channel, because the same key is used for both encryption and decryption: an attacker who intercepts the secret key gains access to all encrypted information.
Because of these pros and cons, symmetric and asymmetric algorithms are used together. The secure channel the former demands is established by the latter: the secret key is encrypted with the recipient's public key, and the recipient uses its private key to decrypt it. Both parties then share a symmetric key for secret and fast communication. RSA, Diffie-Hellman, and Elliptic Curve Cryptography (ECC) are examples of algorithms that provide secure key exchange.
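A minimal sketch of this hybrid pattern, again assuming the third-party cryptography package: Alice wraps a freshly generated symmetric session key with Bob's RSA public key (using OAEP padding), and Bob unwraps it with his private key.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Bob's key pair; in practice his public key would be distributed via a certificate.
bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob_public = bob_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Alice generates a symmetric session key and wraps it with Bob's public key.
session_key = os.urandom(32)
wrapped_key = bob_public.encrypt(session_key, oaep)

# Bob unwraps it with his private key; both now share the same symmetric key.
assert bob_private.decrypt(wrapped_key, oaep) == session_key
```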
Asymmetric algorithms can also be used to establish session keys. As discussed above, cryptographic keys often secure a single session rather than being reused indefinitely. At the start of each session, the parties run a fresh, ephemeral key exchange to derive a new secret key for that session; when the session ends, the key is discarded and a new one is negotiated for the next. This provides an additional, essential advantage: even if an attacker compromises one session key, or later obtains the long-term private key, previously recorded sessions cannot be decrypted. This property is called Perfect Forward Secrecy (PFS). It does not lessen the need to protect private keys: the long-term private key still authenticates the exchange, and compromising it lets an attacker impersonate its owner in future sessions.
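One common way to get this property is an ephemeral Diffie-Hellman exchange. The sketch below (assuming the cryptography package and Curve25519; the "handshake demo" label is made up) shows two parties deriving the same session key from throwaway key pairs, then stretching it with HKDF.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Fresh (ephemeral) key pairs generated for this session only.
alice_ephemeral = X25519PrivateKey.generate()
bob_ephemeral = X25519PrivateKey.generate()

# Each side combines its own private key with the other's public key.
alice_shared = alice_ephemeral.exchange(bob_ephemeral.public_key())
bob_shared = bob_ephemeral.exchange(alice_ephemeral.public_key())
assert alice_shared == bob_shared

# The shared secret is stretched into a session key and discarded after use.
session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                   salt=None, info=b"handshake demo").derive(alice_shared)
```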
Aside from ensuring confidentiality, asymmetric algorithms can also provide authentication and non-repudiation. Since the private key belongs only to a specific party, it can be used to produce a digital signature, i.e., proof that the message came from the claimed sender. Authentication happens as follows: the sender encrypts a hash of the message with its private key, and the output is the digital signature. When the message reaches its destination, the recipient uses the sender's public key to decrypt the signature and compares the recovered hash with the hash it computes from the message on the spot. If the two hashes are equal, the message came from the claimed sender and was not altered in transit.
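In practice, libraries bundle the hashing and padding into a single sign/verify call rather than "encrypting a hash" by hand. A minimal sketch with RSA-PSS (a modern signature padding, assumed here along with the cryptography package):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

sender_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
sender_public = sender_private.public_key()

message = b"I, Alice, authorize this transfer."
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

# The sender signs a hash of the message with the private key.
signature = sender_private.sign(message, pss, hashes.SHA256())

# The recipient verifies the signature with the sender's public key.
try:
    sender_public.verify(signature, message, pss, hashes.SHA256())
    print("signature valid: message is authentic and unmodified")
except InvalidSignature:
    print("signature invalid: message forged or altered")
```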
An important factor was omitted here for simplicity's sake: there has to be a guarantee that the public-private key pair really belongs to the sender. A third party, called a certificate authority, issues a digital certificate that binds an identity to its public key. Digital certificates are a detailed subject left for a future post, since this one is lengthy already.
Non-repudiation is a consequence of the authentication process. Because only the sender's private key could have produced the signature, the sender cannot later deny having sent the message, and with a signed acknowledgment the sender can likewise prove that the message was received. Neither party can deny its role in the interaction.
Aside from RSA, Diffie-Hellman, and ECC, other standards built on asymmetric cryptography include Pretty Good Privacy (PGP) and GnuPG. All of them will be discussed in a future post.
Hashing
Besides confidentiality, authentication, and non-repudiation, cryptography can also provide integrity. Message integrity is obtained by processing the message with a one-way cryptographic function called a hash function. The output, the hash or message digest, acts as a mathematical checksum: if any part of the message changes, the hash changes as well. That is why hashes are ideal for verifying file integrity. Since the function is one-way, hashing does not provide confidentiality.
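A minimal sketch of integrity checking with Python's standard hashlib: compute a file's SHA-256 digest and compare it with the value the publisher lists. The file name and expected digest below are hypothetical placeholders.

```python
import hashlib

def sha256_of_file(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical values: compare the computed digest with the published one.
expected = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(sha256_of_file("download.iso") == expected)
```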
Hash functions can have vulnerabilities. Collisions are one of them: two different inputs may produce the same hash. Attackers can exploit this by, for example, gaining unauthorized access with a different password that happens to produce the same hash value as the real one.
Attackers can also build enormous tables of precomputed hashes of, for example, common passwords. In this procedure, called a rainbow table attack, they guess a user's password quickly by comparing the stored hash of the password with the precomputed hashes. To counteract this, organizations apply salting, which concatenates random data (the salt) with the actual data and hashes the two together. The added unpredictability makes precomputed hashes of known inputs useless.
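A conceptual sketch of salting, sticking to the "concatenate and hash" description above; real systems use dedicated slow password-hashing functions (PBKDF2, bcrypt, scrypt, Argon2) rather than a single SHA-256, so treat this only as an illustration of why precomputed tables stop working.

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Concatenate a random salt with the password and hash them together."""
    salt = salt or os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest

salt, stored = hash_password("hunter2")

# The same password with a different salt yields a completely different hash,
# so a precomputed (rainbow) table built without the salt is useless.
_, other = hash_password("hunter2")
print(stored == other)   # False

# Verification re-uses the stored salt.
print(hash_password("hunter2", salt)[1] == stored)   # True
```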
Pre-conditions for Cryptography Implementation
As noted, cryptography ensures confidentiality, integrity, authentication, and non-repudiation. When implementing cryptography, organizations must first clearly understand these objectives and establish which facets of their operations and infrastructure the adopted cryptographic procedures must protect.
Another important aspect is choosing cryptographic service providers (CSPs), i.e., the hardware, software, and firmware that implement cryptography. Cryptographic operations require significant computational power, and ideally companies can use special-purpose hardware such as hardware security modules (HSMs). These perform proper key management (generation, storage, and use) and can also provide digital signatures, cryptographic acceleration, and homomorphic processing.
The other side of this coin is how to provide adequate cryptographic protection on low-power equipment, such as IoT devices. Lightweight cryptography is the still-maturing answer to this problem.
Software that uses cryptography must rely on well-known, peer-reviewed algorithms. Organizations should avoid developing and using proprietary algorithms, since the security community has not properly assessed their vulnerabilities and limits. They must also follow other security standards, such as recommended modes of operation, proper key lengths, and reliable random number generators.
The National Institute of Standards and Technology (NIST) publishes the security requirements that cryptographic modules must meet in the Federal Information Processing Standard 140–2 (aka NIST FIPS 140–2).
Steganography
One final topic for this very long text is steganography. The peculiar term derives from Greek roots meaning "covered writing." Although the concept may seem similar to cryptography, steganography doesn't convert plaintext into ciphertext. Its aim is to hide sensitive information inside ordinary files, such as an image or a music file.
For example, information can be hidden inside an image file in such a way that looking at or casually analyzing the picture reveals nothing; only by processing the file in a specific way, such as converting it into an audio stream or reading particular bits of its pixels, does the hidden content become accessible.
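A conceptual sketch of one common technique, least-significant-bit (LSB) embedding: each bit of the secret replaces the lowest bit of a "pixel" byte, changing the cover data imperceptibly. Here a random byte buffer stands in for real image pixel data, purely for illustration.

```python
import os

def hide(pixels, secret):
    """Hide each bit of `secret` in the least significant bit of successive pixel bytes."""
    bits = [(byte >> i) & 1 for byte in secret for i in range(8)]
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit   # overwrite only the lowest bit
    return out

def reveal(pixels, length):
    """Reassemble `length` hidden bytes from the pixels' least significant bits."""
    bits = [p & 1 for p in pixels[:length * 8]]
    return bytes(sum(bits[i * 8 + j] << j for j in range(8)) for i in range(length))

cover = bytearray(os.urandom(256))          # stand-in for raw image pixel data
stego = hide(cover, b"meet at dawn")
print(reveal(stego, len(b"meet at dawn")))  # b'meet at dawn'
```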