This course covers the theory behind modern cryptography. The goal of this field is to secure information. To do this, we want to uphold:
- Data Privacy: Ensure that adversaries cannot see or obtain the message.
- Data Integrity / Authentication: The message origin can be verified (authenticity), and the message has not been modified in transit (integrity).
Common symbols / terminology:
- : The message we want to keep secure
- : A key we use to encrypt and/or decrypt the message
- : The encrypted message; ciphertext.
Symmetric Key Encryption (Ciphers)
One of the driving principles behind symmetric key encryption is as follows:
Kerckhoffs' Principle
A good cipher scheme should be able to be public, without adversaries being able to use it to decrypt messages. To do this, both parties share a secret key for encryption / decryption which adversaries are assumed to not know.
We always have to assume that the crypto designs are public! This is more suitable for large-scale usage of cryptography, and exposes schemes to the public eye so that they can be revised and strengthened.
Historical Ciphers
For each of the following historical schemes, we will discuss the encryption algorithm, the decryption algorithm, the key-space (set of all possible keys) and secret key, and how to break the scheme.
Atbash Cipher
The Atbash Cipher substitutes the first letter with the last, the 2nd with the 2nd last, and so on. To encrypt and decrypt, simply swap the letters.
Because there is nothing that is secret, there is no key for this algorithm.
Shift / Caesar Cipher
The Caeser Cipher replaces each letter by the letter which is positions away in the alphabet. On a message, this has the effect of βshiftingβ every letter positions in the alphabet.
As there are 26 letters in the alphabet, there are 26 possible shifts we could do. Thus, our key-space has a size of 26.
Because the key-space is so small, we can easily brute-force this cipher by trying all possible shifts! Thus, this cipher is suspectible to a brute-force search.
For a secure scheme, itβs necessary that the key space is large! However, a large key space does not imply security.
Scytale Cipher
The Scytale Cipher stacks letters of the message into equal-sized rows. Reading the message column-by-column will give you the ciphertext (encryption), and reading the message row-by-row will give you the plain-text (decryption).
As we could have a nearly infinite number of rows, the key-space is arbitrarily large!
That does not make this cipher secure however, as we may be able to recover the key! Starting from the first letter, we can take letters that are offset positions from the start, to try to form a word. Once we get something that makes sense, weβll have recovered our key!
Information about our ciphertext can let us reverse engineer the key!
Monoalphabetic Substitution
The Monoalphabetic Substitution maps every plaintext letter to a different ciphertext character, and substitutes them as so. Using this map, we can encrypt and decrypt our message.
The size of our key-space is , which is really large! So, brute force search is not possible, but because of the 1-1 mapping, we could do a frequency analysis to recover the key!
Frequency Analysis
Frequency analysis is based off the fact that letters in (gramatically correct) English generally have different usage frequencies. If we know the frequencies of the ciphertext characters, we can guess what the mappings are.
Definitions
Symmetric Encryption Scheme
Here, we define notation for a symmetric encrpytion scheme.
A symmetric encryption scheme is defined by 3 algorithms: Gen
, Enc
, Dec
. Let be our message space, with .
Gen
: The key-generation algorithm (typically probabilistic), which creates a key according to some distribution (must be probabilistic).- denotes the keyspace, the set of all possible keys.
Enc
: The encryption algorithm, which takes an input key and message to create ciphertext (could be probabilitic).- denotes the ciphertext space, the set of all possible ciphertexts.
Dec
: The decryption algorithm, which takes an input key and ciphertext to create message (must be deterministic).
Correctness mandates that . If this does not hold, our encryption scheme wonβt be very practical.
Each of the spaces can be given as random variables with probability distributions. Let denote the probability function. Then:
- denotes the probability that the key output by
Gen
yields .Typically, we will assume that has the uniform probability distribution (each key is of equal probability)
- denotes the probability that the message is equal to , modeling some prior knowledge the adversary may have about the message.
- denotes the probability that the ciphertext is , which is fully determined by the distribution on , , and
Enc
.
We assume that the distributions over and are independent.
Perfect Secrecy
We say an encryption scheme over a message space is perfectly secret if probability distributions over , such that ,
In other words, the probability that our message is does not change even if we know what the ciphertext is. Sometimes, we denote as the a priori distribution, and as the a posteriori distribution.
Thus, even if the adversary sees , this does not change any of their knowledge about the message space!
Lemma 1
An encryption scheme over a message space is perfectly secret if and only if probability distribution over , , ,
In other words, this is saying that the ciphertext is independent of the message (as we can still vary the key ).
Proof (One-Way)
Suppose that we have a perfectly secret scheme. We wish to show that is also true.
Fix message distribution , , . By definition, we know that the following must hold by perfect secrecy.
and by Bayeβs Rule,
Lemma 2: Perfect Indistinguishability
An encryption scheme over a message space is perfectly secret if and only if probability distribution , , and ,
Example: Perfect Secrecy Example
An encryption scheme with message space is perfectly secret if and only if probability distribution over , and , we have
Prove or refute this.
This is false. For every perfectly secret encryption scheme, we can always choose a distribution on for which this is false.
By way of contradiction, suppose this is true. Now, choose a distribution such that
Then, by definition of perfect secrecy,
Which is a contradiction!
Example: Perfect Secrecy Example (2)
An encryption scheme with message space is perfectly secret if and only if probability distribution over , and , ,
- Explain this definition in English.
- Why is this a bad definition? Describe an encryption scheme that leaks information about the message but still satisfies the definition.
Seeing the ciphertext does not tell us any information about the key.
This is bad, because we could just choose an encryption scheme with only one key, so itβs completely deterministic! For example, we do a shift cipher with only possible key . Then, for any ciphertext, , but we can easily reverse the encryption scheme.
Another solution is, this definition says nothing about the message! So, we could have a scheme that doesnβt do anything to the message (leaves it unencrypted), and generates a random key!